[Bioperl-l] Next-gen modules

Peter biopython at maubp.freeserve.co.uk
Fri Jul 24 09:00:23 EDT 2009


Hi all,

On Fri, Jul 24, 2009 at 1:19 PM, Chris Fields <cjfields at illinois.edu> wrote:
>>
>> Have you guys (BioPerl) also gone for "fastq-sanger" instead of
>> just "fastq" for the Sanger Standard version of FASTQ (like EMBOSS)?
>> Does BioPerl use just "fastq" to mean anything?
>
> Short answer: yes, and yes.
>
> Slightly longer answer: I've set up SeqIO so it converts "new(-format =>
> 'foo-bar')" to "new(-format => 'foo', -variant => 'bar')".  In the fastq
> constructor, if the variant is expected but isn't defined (i.e. for 'fastq')
> it defaults to sanger.  Makes it a bit easier maintenance-wise if a new
> variant pops up.

Right, so BioPerl understands "fastq" and "fastq-sanger" to mean the
Sanger standard FASTQ files.
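A minimal Python sketch of the name-splitting scheme Chris describes, assuming a simple split on the first hyphen. The helper split_format and the DEFAULT_VARIANTS table are hypothetical illustrations of the behaviour, not BioPerl's or Biopython's actual API:

```python
from typing import Optional, Tuple

# Hypothetical table: formats that expect a variant, mapped to the default
# variant used when the name has none (e.g. plain "fastq" -> "sanger").
DEFAULT_VARIANTS = {"fastq": "sanger"}


def split_format(name: str) -> Tuple[str, Optional[str]]:
    """Split 'foo-bar' into ('foo', 'bar'); fill in a default if no variant."""
    fmt, _, variant = name.partition("-")
    if not variant:
        # No variant given: use the format's default, or None if it has none.
        return fmt, DEFAULT_VARIANTS.get(fmt)
    return fmt, variant


print(split_format("fastq"))           # ('fastq', 'sanger')
print(split_format("fastq-illumina"))  # ('fastq', 'illumina')
print(split_format("fasta"))           # ('fasta', None)
```

Keeping the variant separate from the base format means a new FASTQ variant only needs a new value, not a new format name.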

I've just updated Biopython to also allow "fastq-sanger" as an alias for
"fastq", so we are consistent here:
http://lists.open-bio.org/pipermail/biopython-dev/2009-July/006466.html

Biopython, BioPerl and EMBOSS now all agree on the format names:
* "fastq-sanger" - PHRED scores offset 33
* "fastq-solexa" - Solexa scores offset 64
* "fastq-illumina" - PHRED scores offset 64

And Biopython and BioPerl also agree on the meaning of "fastq" as
an alias for "fastq-sanger". Unfortunately EMBOSS differs here, see:
http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000599.html
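For illustration, a minimal Python sketch of how the offsets listed above decode a quality string, with "fastq" treated as an alias for "fastq-sanger" as in Biopython and BioPerl. The names FASTQ_VARIANTS, solexa_to_phred and phred_scores are hypothetical; the Solexa-to-PHRED step uses the standard conversion Q_phred = 10*log10(10^(Q_solexa/10) + 1):

```python
import math
from typing import List

# Hypothetical table: offset and score scale for each FASTQ variant.
FASTQ_VARIANTS = {
    "fastq-sanger": (33, "phred"),
    "fastq-solexa": (64, "solexa"),
    "fastq-illumina": (64, "phred"),
}
# Biopython and BioPerl treat plain "fastq" as "fastq-sanger" (EMBOSS differs).
FASTQ_VARIANTS["fastq"] = FASTQ_VARIANTS["fastq-sanger"]


def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa-scaled score to the equivalent PHRED score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_scores(qual_line: str, fmt: str) -> List[float]:
    """Decode a FASTQ quality string into PHRED scores for the given format."""
    offset, scale = FASTQ_VARIANTS[fmt]
    raw = [ord(ch) - offset for ch in qual_line]
    if scale == "solexa":
        # Solexa scores are on a different scale, so convert them to PHRED.
        return [solexa_to_phred(q) for q in raw]
    return [float(q) for q in raw]


print(phred_scores("I", "fastq-sanger"))    # [40.0]
print(phred_scores("h", "fastq-illumina"))  # [40.0]
print(phred_scores(";", "fastq-solexa"))    # about [1.19]
```

The same character therefore means different things in each variant, which is why the format name has to be explicit.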

Does BioJava or BioRuby have a SeqIO equivalent where they need
to give different sequence formats unique names? If so, we should
talk to them soon...

Peter
