[Bioperl-l] Next-gen modules
Peter
biopython at maubp.freeserve.co.uk
Wed Jun 17 08:21:17 EDT 2009
On Wed, Jun 17, 2009 at 12:29 PM, Elia Stupka<e.stupka at ucl.ac.uk> wrote:
>
> Dear all,
>
> after several years of absence I am slowly coming back to Bioperl, and hope
> to contribute again to its development.
>
> One area that I was thinking of starting from, since we are actively
> involved with it, is to improve Bioperl's support for next-gen sequencing
> data, tools, etc. Since I am sure I have missed out on a lot of recent
> developments, do let me know if/what is useful.
>
> One example that comes to mind is that the conversion of various formats
> to/from FASTQ does not seem to be supported. Some code can be found within
> Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be
> good if it could make its way into SeqIO? And similarly, potentially, for
> other next-gen sequence formats?

If you do add FASTQ support to BioPerl's SeqIO (and I think that is a
good idea), please could you follow the format names used by Biopython
- as this time we got there first ;)

I'm asking this as Biopython's SeqIO tries to use the same format
names as BioPerl's SeqIO and EMBOSS, see
http://biopython.org/wiki/SeqIO

Specifically,
* "fastq" in Biopython means the original Sanger standard FASTQ files
encoding PHRED qualities using an ASCII offset of 33.
* "fastq-solexa" in Biopython means the early Solexa/Illumina style
FASTQ files which encode Solexa qualities using an ASCII offset of 64.
* "fastq-illumina" in Biopython will mean recent Solexa/Illumina style
FASTQ files (from pipeline version 1.3+) which encode PHRED qualities
using an ASCII offset of 64. This is in the Biopython repository, but
hasn't been released yet - so the name "fastq-illumina" isn't set in
stone yet.
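
To make the difference between the three names concrete, here is a
minimal sketch of decoding a quality line under each convention. It is
plain Python and not Biopython's actual implementation; the names
FASTQ_VARIANTS and decode_qualities are made up for illustration. Only
the offsets and score types come from the list above.

```python
# ASCII offset and score type for each format name, as listed above.
# (Illustrative table only, not Biopython's internal data structure.)
FASTQ_VARIANTS = {
    "fastq": (33, "phred"),           # Sanger standard
    "fastq-solexa": (64, "solexa"),   # early Solexa/Illumina
    "fastq-illumina": (64, "phred"),  # Illumina pipeline 1.3+
}


def decode_qualities(quality_string, variant):
    """Return the integer scores encoded in one FASTQ quality line."""
    offset, _score_type = FASTQ_VARIANTS[variant]
    return [ord(char) - offset for char in quality_string]


# The same scores look quite different on disk under the two offsets:
print(decode_qualities("II5!", "fastq"))           # [40, 40, 20, 0]
print(decode_qualities("hhT@", "fastq-illumina"))  # [40, 40, 20, 0]
# Solexa scores can be negative, which PHRED scores cannot:
print(decode_qualities("hT@;", "fastq-solexa"))    # [40, 20, 0, -5]
```

Note that the offset alone does not tell you whether the numbers are
PHRED or Solexa scores, which is why "fastq-solexa" and
"fastq-illumina" need separate names despite sharing offset 64.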

For good quality reads, PHRED and Solexa scores are approximately
equal (the two scales only diverge noticeably at low quality values),
so the "fastq-solexa" and "fastq-illumina" variants are almost
equivalent.
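
To put numbers on "approximately equal": the usual relation between the
two scales is Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). A small
sketch of that conversion follows; the function name solexa_to_phred is
chosen here for illustration and is not taken from any library.

```python
import math


def solexa_to_phred(solexa_quality):
    """Convert a Solexa quality score to the equivalent PHRED score."""
    return 10 * math.log10(10 ** (solexa_quality / 10.0) + 1)


# High scores barely change; low scores diverge.
for solexa in (40, 10, 0, -5):
    print(solexa, round(solexa_to_phred(solexa), 2))
# 40 40.0
# 10 10.41
# 0 3.01
# -5 1.19
```

So treating a "fastq-solexa" file as "fastq-illumina" does little harm
for high quality reads, but the low scores would be misread.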
> Similarly, there seems to be little in bioperl-run to support tools that
> have been developed in this area, such as Maq, BowTie, TopHat, etc?
>
> Do let me know if there is a past thread on this, or other people actively
> developing, etc. so that I can find out what priorities are.

Have you seen these recent threads?
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html
Regards,
Peter (at Biopython)