[Biopython-dev] Parsing fastq files with SeqIO.parser(handle)

Fri Apr 19 11:29:28 UTC 2013

Dear BioPython Devs,

Probably a strange request, but I was wondering if it might be a good idea  
to make the fasta parser raise an error when it is asked to parse  
incorrectly formatted files.

I ask because a while ago I made a simple command-line utility to
convert sequence files to/from various formats, using SeqIO.parse. It's
attached if anyone's interested.

My supervisor's now using it to filter fastq formatted sequences by
length, but keeps forgetting to add a '-format fastq' option. By default
the script assumes fasta formatted sequences, which is by design, as
with SeqIO.parse. The problem is that the fasta parser doesn't mind at
all when it is given a fastq file that doesn't contain a single ">"
character.

Are there any interfaces to make the fasta parser stricter? This error is
completely silent until picked up by external programs; HMMER, in this
instance. Ideally, an error would be raised much earlier in the process,
especially as the department's NFS servers take ages to retrieve and
convert an IonTorrent dataset. (I've got him using /var/tmp for the
converted files, but he keeps the original fastq files in an NFS home
folder, which is sloooooow).
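
To illustrate the kind of early failure meant here, a pre-check along
these lines could run before the file is handed to the fasta parser.
This is only a rough sketch: assert_looks_like_fasta and
parse_strict_fasta are made-up names, not part of Biopython, and the
check assumes the first non-blank line of a valid fasta file starts
with ">" (so a file with free text before the first record would be
rejected too).

```python
def assert_looks_like_fasta(path):
    """Raise ValueError unless the first non-blank line starts with '>'.

    An empty file passes, on the assumption that it simply holds no records.
    (Hypothetical helper, not a Biopython function.)
    """
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip leading blank lines
            if line.startswith(">"):
                return
            # FASTQ records begin with '@', so give a hint in that case.
            hint = " (looks like FASTQ?)" if line.startswith("@") else ""
            raise ValueError(
                "%s: expected '>' at start of first record, got %r%s"
                % (path, line[:20], hint)
            )


def _fasta_records(path):
    # Biopython is imported lazily so the check above works without it.
    from Bio import SeqIO

    with open(path) as handle:
        for record in SeqIO.parse(handle, "fasta"):
            yield record


def parse_strict_fasta(path):
    """Check the file up front, then return an iterator of SeqRecord objects."""
    assert_looks_like_fasta(path)  # fails here, before any record is parsed
    return _fasta_records(path)
```

With something like this, calling parse_strict_fasta on a fastq file
would raise a ValueError straight away, after reading only the first
line or so over NFS, rather than producing output that HMMER rejects
later.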

The department's using BioPython 1.57 btw.

Thanks for your time.
Kind regards,
Alex

P.S. I don't suppose there are any plans to implement any parsers as
C-extensions?

---
Alex Leach. BSc, MRes
Chong & Redeker Labs
Department of Biology
University of York
YO10 5DD
Tel: 07940 480 771
EMAIL DISCLAIMER: http://www.york.ac.uk/docs/disclaimer/email.htm
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seqDB.py
Type: application/octet-stream
Size: 10674 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20130419/9da044eb/attachment-0002.obj>