[Open-bio-l] More FASTQ examples for cross project testing

Peter biopython at maubp.freeserve.co.uk
Tue Sep 1 11:28:06 EDT 2009


On Wed, Aug 26, 2009 at 11:04 PM, Peter<biopython at maubp.freeserve.co.uk> wrote:
>
> I didn't want to clog up the mailing list with attachments, but just
> for the record, I've sent my first attempt at this to Peter (EMBOSS)
> and Chris (BioPerl) for comment (and checking).

I've emailed the latest test cases (off the mailing list) to Peter
(EMBOSS), Chris (BioPerl), Michael (BioJava) and Naohisa
(BioRuby). These files are also in Biopython's repository.

I've just run these against bioperl-live SVN, and most of them
work as I would expect. Note that writing Solexa FASTQ files,
where the scores must be converted from PHRED values, isn't
working yet (Chris knows about this):
http://lists.open-bio.org/pipermail/bioperl-l/2009-August/031064.html
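
For reference, here is a minimal sketch of the PHRED to Solexa
conversion in question. It is not BioPerl's or Biopython's actual
code, and the function names are made up for illustration. It assumes
Sanger FASTQ uses an ASCII offset of 33, Solexa FASTQ an offset of 64,
and that Solexa scores are rounded to the nearest integer and clamped
at -5 (PHRED 0 has no finite Solexa equivalent, so mapping it to -5 is
a convention assumed here):

```python
import math


def solexa_from_phred(phred_q):
    """Convert a PHRED quality to the nearest integer Solexa quality.

    Uses Q_solexa = 10 * log10(10 ** (Q_phred / 10) - 1).
    """
    if phred_q < 0:
        raise ValueError("PHRED qualities cannot be negative")
    if phred_q == 0:
        # Error probability 1 gives log10(0); clamp to -5 (assumption).
        return -5
    solexa = 10 * math.log10(10 ** (phred_q / 10.0) - 1)
    return max(-5, int(round(solexa)))


def sanger_to_solexa_quality(qual):
    """Re-encode a Sanger FASTQ quality string as a Solexa one."""
    return "".join(chr(solexa_from_phred(ord(c) - 33) + 64) for c in qual)
```

The two scales differ mostly at the low end: high scores are almost
unchanged, while PHRED values near zero map to negative Solexa scores.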

All the error_*.fastq files are correctly rejected by BioPerl, except
those with invalid characters in the quality string (e.g. a DEL
character), which are treated as a warning condition (rather than
aborting with an exception):

error_qual_del.fastq
error_qual_escape.fastq
error_qual_null.fastq
error_qual_space.fastq
error_qual_tab.fastq
error_qual_unit_sep.fastq
error_qual_vtab.fastq

Presumably this is in line with (Bio)Perl norms? i.e. make a best guess
at what the file is trying to say, issue a warning, but continue?

In Biopython (in line with Python norms), we don't try to guess. Giving
an error and aborting is the only clear and unambiguous action.
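
To make the two behaviours concrete, here is a rough sketch of a
quality string check with a strict mode (raise an exception) and a
lenient mode (warn and carry on). The function name check_quality and
its strict argument are hypothetical, and this is not how either
BioPerl or Biopython implements it. It assumes the valid quality
characters are printable ASCII 33 to 126, which excludes space, tab,
vertical tab, null, escape, unit separator and DEL:

```python
import warnings


def check_quality(qual, strict=True):
    """Return True if every quality character is ASCII 33 to 126.

    With strict=True an invalid character raises ValueError;
    with strict=False it issues a warning and returns False.
    """
    bad = sorted({c for c in qual if not 33 <= ord(c) <= 126})
    if not bad:
        return True
    message = "Invalid character(s) in quality string: %r" % bad
    if strict:
        raise ValueError(message)
    warnings.warn(message)
    return False
```

Either mode tells the user that something is wrong with the file; the
only difference is whether parsing stops.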

Would it suffice to agree that all the OBF projects will read these
error_*.fastq files and either raise an exception (abort), or at least
issue a warning?

Peter

