[Bioperl-l] Next-gen modules

Peter biopython at maubp.freeserve.co.uk
Wed Jun 17 12:21:17 UTC 2009


On Wed, Jun 17, 2009 at 12:29 PM, Elia Stupka <e.stupka at ucl.ac.uk> wrote:
>
> Dear all,
>
> after several years of absence I am slowly coming back to Bioperl, and hope
> to contribute again to its development.
>
> One area that I was thinking of starting from, since we are actively
> involved with it, is to improve Bioperl's support for next-gen sequencing
> data, tools, etc. Since I am sure I have missed out on a lot of recent
> developments, do let me know if/what is useful.
>
> One example that comes to mind is that the conversion of various formats
> to/from FASTQ does not seem to be supported. Some code can be found within
> Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be
> good if it could make its way into SeqIO? And similarly, potentially, for
> other next-gen sequence formats?

If you do add FASTQ support to BioPerl's SeqIO (and I think that is a
good idea), please could you follow the format names used by Biopython
- as this time we got there first ;)

I'm asking this as Biopython's SeqIO tries to use the same format
names as BioPerl's SeqIO and EMBOSS, see
http://biopython.org/wiki/SeqIO

Specifically,
* "fastq" in Biopython means the original Sanger standard FASTQ files
encoding PHRED qualities using an ASCII offset of 33.
* "fastq-solexa" in Biopython means the early Solexa/Illumina style
FASTQ files which encode Solexa qualities using an ASCII offset of 64.
* "fastq-illumina" in Biopython will mean recent Solexa/Illumina style
FASTQ files (from pipeline version 1.3+) which encode PHRED qualities
using an ASCII offset of 64. This is in the Biopython repository, but
hasn't been released yet - so the name "fastq-illumina" isn't set in
stone yet.
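
To make the difference between the three variants concrete, here is a
minimal Python sketch of decoding a quality string under each format
name. It is only an illustration: OFFSETS, decode_quality and
encode_quality are hypothetical names, not part of Biopython or BioPerl,
and only the format names and ASCII offsets are taken from the list above.

```python
# Hypothetical lookup table: format name -> ASCII offset, as listed above.
OFFSETS = {
    "fastq": 33,           # Sanger standard: PHRED scores, offset 33
    "fastq-solexa": 64,    # early Solexa/Illumina: Solexa scores, offset 64
    "fastq-illumina": 64,  # Illumina pipeline 1.3+: PHRED scores, offset 64
}


def decode_quality(quality_string, format_name):
    """Turn a FASTQ quality string into a list of integer scores.

    The scores are PHRED or Solexa values depending on the format;
    this function only removes the ASCII offset and does not convert
    between the two scales.
    """
    offset = OFFSETS[format_name]
    return [ord(char) - offset for char in quality_string]


def encode_quality(scores, format_name):
    """Inverse of decode_quality: integer scores back to a string."""
    offset = OFFSETS[format_name]
    return "".join(chr(score + offset) for score in scores)


if __name__ == "__main__":
    # The same scores (40, 40, 20, 0) written with the two offsets.
    print(decode_quality("II5!", "fastq"))
    print(decode_quality("hhT@", "fastq-illumina"))
    # Solexa scores can be negative, so characters below '@' are valid here.
    print(decode_quality("hhT;", "fastq-solexa"))
```

Note that "fastq-solexa" and "fastq-illumina" share an offset, so the
format name is the only thing that says which score scale the numbers
are on.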

For good quality reads, PHRED and Solexa scores are approximately
equal, so the "fastq-solexa" and "fastq-illumina" variants are almost
equivalent.
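
The "approximately equal" point can be checked numerically. The sketch
below assumes the usual definitions, which are not spelled out in this
message: a PHRED score is -10*log10(p) and a Solexa score is
-10*log10(p/(1-p)), where p is the estimated error probability. The
function names are hypothetical.

```python
import math


def solexa_to_phred(solexa_score):
    """Convert a Solexa score to the PHRED scale (as a float).

    Assumes PHRED = -10*log10(p) and Solexa = -10*log10(p/(1-p)),
    which gives PHRED = 10*log10(10**(Solexa/10) + 1).
    """
    return 10 * math.log10(10 ** (solexa_score / 10.0) + 1)


def phred_to_solexa(phred_score):
    """Convert a PHRED score to the Solexa scale (as a float).

    Not defined for a PHRED score of 0 (p = 1), where math.log10
    raises ValueError.
    """
    return 10 * math.log10(10 ** (phred_score / 10.0) - 1)


if __name__ == "__main__":
    # Print the two scales side by side; the gap shrinks as quality rises.
    for solexa in (-5, 0, 10, 20, 40):
        phred = solexa_to_phred(solexa)
        print("Solexa %3d -> PHRED %.3f" % (solexa, phred))
```

Under these assumptions a Solexa score of 40 differs from the
corresponding PHRED score by less than 0.001, while a Solexa score of -5
maps to a PHRED score of about 1.19, so the two scales only diverge
noticeably for poor quality reads.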

> Similarly, there seems to be little in bioperl-run to support tools that
> have been developed in this area, such as Maq, BowTie, TopHat, etc?
>
> Do let me know if there is a past thread on this, or other people actively
> developing, etc. so that I can find out what priorities are.

Have you seen these recent threads?
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html

Regards,

Peter (at Biopython)


