[Biopython-dev] Parsing fastq files with SeqIO.parser(handle)
Alex Leach
albl500 at york.ac.uk
Fri Apr 19 07:29:28 EDT 2013
Dear BioPython Devs,
Probably a strange request, but I was wondering if it might be a good idea
to make the fasta parser raise an error when it is asked to parse
incorrectly formatted files.
I ask because, a while ago, I made a simple command line utility to
convert sequence files to/from various formats, using SeqIO.parse. It's
attached if anyone's interested.
My supervisor's now using it to filter fastq formatted sequences by
length, but keeps forgetting to add the '-format fastq' option. By default
the script assumes fasta formatted sequences. That is by design, as is
SeqIO.parse's behaviour, but the problem is that the fasta parser doesn't
mind at all when it is given a fastq file that doesn't contain a single
">" character.
Are there any interfaces to make the fasta parser stricter? This error is
completely silent until picked up by external programs (hmmer, in this
instance). Ideally, an error would be raised much earlier in the process,
especially as the department's NFS servers take ages to retrieve and
convert an IonTorrent dataset. (I've got him using /var/tmp for the
converted files, but he keeps the original fastq files in an NFS home
folder, which is sloooooow).
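To illustrate the kind of early check I mean, here is a minimal sketch. The
helper name require_fasta is hypothetical and not part of Biopython; it
assumes a seekable text handle (a regular file is fine) and only looks at
the first non-blank line, so it is a sanity check rather than a validator.

```python
import io


def require_fasta(handle):
    """Return handle unchanged, or raise ValueError if it does not look like fasta.

    Hypothetical helper, not a Biopython function. Assumes the handle is
    seekable and opened in text mode. Empty input is allowed through, on the
    assumption that zero records is a legitimate result.
    """
    start = handle.tell()
    line = handle.readline()
    # Skip leading blank lines before inspecting the first real line.
    while line and not line.strip():
        line = handle.readline()
    # Rewind so a downstream parser sees the input from where it started.
    handle.seek(start)
    if line and not line.startswith(">"):
        raise ValueError(
            "Expected a fasta record starting with '>', got %r" % line[:20]
        )
    return handle


# Small in-memory demonstration; real use would pass an open file instead.
fasta_handle = io.StringIO(u">seq1\nACGT\n")
fastq_handle = io.StringIO(u"@read1\nACGT\n+\nIIII\n")

require_fasta(fasta_handle)  # passes silently
try:
    require_fasta(fastq_handle)
except ValueError as err:
    print("Rejected: %s" % err)
```

In the script this would presumably be used as
SeqIO.parse(require_fasta(handle), "fasta"), so the mistake shows up before
any records are read or written.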
The department's using BioPython 1.57 btw.
Thanks for your time.
Kind regards,
Alex
p.s. Don't suppose there are any plans to implement any parsers as
C-extensions?
---
Alex Leach. BSc, MRes
Chong & Redeker Labs
Department of Biology
University of York
YO10 5DD
Tel: 07940 480 771
EMAIL DISCLAIMER: http://www.york.ac.uk/docs/disclaimer/email.htm
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seqDB.py
Type: application/octet-stream
Size: 10674 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20130419/9da044eb/attachment.obj>