[Open-bio-l] FASTQ support in Biopython, BioPerl, and EMBOSS

Peter biopython at maubp.freeserve.co.uk
Thu Jul 30 15:55:56 UTC 2009


On Thu, Jul 30, 2009 at 4:46 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> The EMBOSS patch I was testing from Peter Rice went for a
>> silent truncation; in Biopython we have also, for the moment, gone
>> for silently imposing the maximum scores (ASCII 126, 0x7e)
>> of 93, 62 and 62 for the three formats. Another reason for this
>> is speed.
>>
>> Peter
>
> Speed is one reason to worry, but we also should think carefully about
> silently truncating the data w/o the user's knowledge.  One thing we
> don't want to propagate is loss of data w/o warning.
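
For reference, the maximum scores quoted above follow from the highest printable ASCII character (126, `~`) minus each variant's offset: 33 for Sanger FASTQ and 64 for both Solexa and Illumina 1.3+ FASTQ. A quick check in Python (the dictionary names and format labels here are illustrative only):

```python
# Highest printable ASCII character usable in a FASTQ quality string: '~' (126, 0x7e)
MAX_ASCII = ord("~")

# ASCII offsets for the three FASTQ variants discussed in this thread
OFFSETS = {
    "fastq-sanger": 33,    # PHRED scores, offset 33
    "fastq-solexa": 64,    # Solexa scores, offset 64
    "fastq-illumina": 64,  # Illumina 1.3+ PHRED scores, offset 64
}

# Maximum encodable score = highest character code minus the format's offset
MAX_SCORES = {fmt: MAX_ASCII - offset for fmt, offset in OFFSETS.items()}
print(MAX_SCORES)
# {'fastq-sanger': 93, 'fastq-solexa': 62, 'fastq-illumina': 62}
```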

Yes and no. Do you warn about converting from EMBL/GenBank to
FASTA? Or from a PFAM alignment to a ClustalW or PHYLIP
alignment? In those cases, anyone familiar with the file formats will
expect data loss as you are going from a richly annotated file format
to something much simpler.

Likewise here, anyone familiar with the FASTQ variants (and our
documentation should cover this) shouldn't be surprised by this
quality truncation. But I must concede that this is a more subtle,
less obvious form of data loss. So maybe you are right.

I can take a look at this and see how badly it would impact the
speed for Biopython...
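
One way to keep the common case cheap while still telling the user is to do a single max() check over the scores and only clamp (and warn) when something is out of range. This is a minimal sketch for the Sanger variant only, assuming PHRED scores arrive as a list of ints; the function name `encode_sanger_quality` and the `warn_on_truncation` flag are hypothetical, not Biopython's actual API:

```python
import warnings

SANGER_OFFSET = 33
SANGER_MAX = 93  # ord("~") - 33, the highest score Sanger FASTQ can encode


def encode_sanger_quality(scores, warn_on_truncation=True):
    """Encode PHRED scores as a Sanger FASTQ quality string.

    Hypothetical helper for illustration. Scores above 93 would need
    characters beyond ASCII 126, so they are capped at 93. If
    warn_on_truncation is True, one warning is issued per call when
    any capping happens; otherwise the truncation is silent.
    """
    # One extra pass over the data; the clamping pass below only runs
    # when at least one score is actually out of range.
    if scores and max(scores) > SANGER_MAX:
        if warn_on_truncation:
            warnings.warn(
                f"PHRED scores above {SANGER_MAX} truncated to {SANGER_MAX}",
                UserWarning,
            )
        scores = [min(q, SANGER_MAX) for q in scores]
    return "".join(chr(q + SANGER_OFFSET) for q in scores)
```

For example, `encode_sanger_quality([40, 93, 100])` returns `"I~~"` and issues a warning, while passing `warn_on_truncation=False` gives the silent behaviour described earlier in the thread.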

Peter