[BioPython] Reading Fasta-like Qual Files

Peter biopython at maubp.freeserve.co.uk
Fri Feb 27 10:54:26 UTC 2009


On Fri, Feb 27, 2009 at 7:26 AM, David Michael Schruth
<dschruth at gmail.com> wrote:
> Hello,
>
> This is my first post on the list.  I'm enjoying using Biopython but am
> running into some snags when trying to incorporate quality information into
> my analysis.  Namely, I can't quite read in QUAL files (output from 454,
> SOLiD) the way I would like: the spaces between the two-digit
> integer PHRED scores are stripped, so the scores run together
> indistinguishably.  I've
> actually fixed this in my own copy of the code by removing the
>
> .replace(" ","")
>
> call from around line 58 of FastaIO.py (in the FastaIterator class).
>
> Hopefully this doesn't have any adverse effects that I might not have
> foreseen.  In the meantime, it would be nice to have some sort of more
> permanent solution to this: some way to specify or otherwise
> accommodate these FASTA-like QUAL files in FastaIO and Biopython in
> general.  Supporting the FASTQ format would also be nice.

The Bio.SeqIO.FastaIO parser is intended for FASTA sequence files
only, where any whitespace should be removed.  Even with your
suggested change, it doesn't make sense to stick a string of numbers
into a Seq object.  Basically, that parser just isn't intended to be
used with QUAL files.  But there is some good news: your email is
quite timely!
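
In the meantime, here is a rough sketch of why a QUAL file needs its own
parser: the scores have to be split on whitespace and kept as integers
rather than joined into one string.  This is not Biopython code;
parse_qual is a hypothetical helper written for illustration, and it
assumes every record starts with a ">" line followed by
whitespace-separated integer scores.

```python
from io import StringIO


def parse_qual(handle):
    """Yield (identifier, scores) pairs from a FASTA-like QUAL handle.

    Hypothetical helper for illustration only, not part of Biopython.
    """
    name = None
    scores = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, scores
            # Use the first word after ">" as the identifier.
            parts = line[1:].split(None, 1)
            name = parts[0] if parts else ""
            scores = []
        elif name is not None:
            # Split on whitespace so two-digit scores stay distinct.
            scores.extend(int(token) for token in line.split())
    if name is not None:
        yield name, scores


# Small made-up example: two reads, the first wrapped over two lines.
example = StringIO(">read1 example\n40 40 38 7\n10 9\n>read2\n30 31\n")
records = list(parse_qual(example))
```

Keeping the scores as a list of integers, separate from any Seq object,
avoids the problem of the numbers being squashed together.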

> The only mention of biopython and quality I've run across is on the
> Biopython-dev list:
> http://portal.open-bio.org/pipermail/biopython-dev/2007-October/003131.html
> The email is dated 2007, and I doubt that any progress on this front has
> been made.

Maybe the search engines haven't indexed the latest discussions, just
this month:
http://lists.open-bio.org/pipermail/biopython-dev/2009-February/005340.html

Also, searching our Bugzilla should turn up a few enhancement
requests.  There isn't any code checked into CVS yet, but things
are happening and I hope we'll have built-in support for reading (and
probably writing) FASTQ and QUAL files in the next release of
Biopython.  If you are interested in the details, you might want to
sign up to the dev mailing list, but I'm sure we'll have some
announcement or discussion on this list too.

Peter



