[Bioperl-l] Next-gen modules

Chris Fields cjfields at illinois.edu
Wed Jun 17 12:57:52 UTC 2009


Elia,

As Mark indicated, we recently discussed on list the lack of support
for next-gen data, at least re: FASTQ.  I may be hit with the same
thing in a few months' time myself, and I recall Jason and a few
others mentioning the same.  Heikki wrote some code for Illumina FASTQ
for SeqIO and related modules, but I don't believe it has been
committed to trunk yet, so maybe he can answer.

From prior discussions, IIRC, the issues were:

1) distinguishing the various FASTQ variants (Sanger, Illumina 1.0,
Illumina 1.3) from one another (so maybe some optional validation), and
2) deciding whether the Seq object should 'know' which variant it
contains, or whether we store PHRED scores and convert to and from
each variant as needed (I think the latter makes more sense).
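
To make the second option concrete, here is a rough Python sketch (not
Heikki's code, and not what Bio::SeqIO::fastq does) of keeping PHRED
scores internally and converting at the edges.  It assumes the commonly
described encodings: Sanger is PHRED with ASCII offset 33, Illumina 1.3
is PHRED with offset 64, and Illumina 1.0 uses Solexa-scaled scores
with offset 64.  The names VARIANTS, to_phred and from_phred are made
up for illustration.

```python
import math

# Assumed encodings (not spelled out in the thread):
#   sanger       PHRED scores,  ASCII offset 33
#   illumina1.0  Solexa scores, ASCII offset 64 (scores may go down to -5)
#   illumina1.3  PHRED scores,  ASCII offset 64
# Each entry is (ASCII offset, lowest legal raw score).
VARIANTS = {
    "sanger": (33, 0),
    "illumina1.0": (64, -5),
    "illumina1.3": (64, 0),
}


def solexa_to_phred(q):
    """Convert one Solexa-scaled score to the PHRED scale."""
    return 10 * math.log10(10 ** (q / 10) + 1)


def to_phred(qual, variant):
    """Decode a FASTQ quality string into a list of integer PHRED scores."""
    offset, lowest = VARIANTS[variant]
    raw = [ord(c) - offset for c in qual]
    # Optional validation: a score below the variant's minimum usually
    # means the wrong variant was named for this file.
    if any(q < lowest for q in raw):
        raise ValueError("quality string is out of range for " + variant)
    if variant == "illumina1.0":
        return [round(solexa_to_phred(q)) for q in raw]
    return raw


def from_phred(scores, variant="sanger"):
    """Encode PHRED scores as a quality string (PHRED-scaled variants only)."""
    if variant == "illumina1.0":
        raise NotImplementedError("Solexa output is left out of this sketch")
    offset, _ = VARIANTS[variant]
    return "".join(chr(q + offset) for q in scores)
```

With this, converting Illumina 1.3 qualities to Sanger is just
from_phred(to_phred(qual, "illumina1.3"), "sanger").  Note that the
range check can only reject a wrong guess; it cannot reliably tell
Illumina 1.0 from Illumina 1.3 on its own, since both use offset 64.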

Peter's suggestions are also reasonable, though does Biopython have a
separate module for each of these variants?  Our version (I believe)
mainly varied the conversion within Bio::SeqIO::fastq itself, based on
the FASTQ variant passed in as a separate named argument.
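
As an illustration of the "one parser, variant as a named argument"
design (as opposed to a module per variant), a minimal Python sketch
might look like the following.  It is not the BioPerl or Biopython
API: read_fastq, its variant argument and the QUAL_OFFSET table are
hypothetical, and it assumes unwrapped four-line records and only the
two PHRED-scaled variants.

```python
import io

# Assumed ASCII offsets for the two PHRED-scaled variants.
QUAL_OFFSET = {"sanger": 33, "illumina1.3": 64}


def read_fastq(handle, variant="sanger"):
    """Yield (name, sequence, phred_scores) from four-line FASTQ records."""
    offset = QUAL_OFFSET[variant]  # the only variant-specific step
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near " + repr(header))
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in " + header)
        yield header[1:], seq, [ord(c) - offset for c in qual]


# The same reader handles either variant; only the argument changes.
sanger = list(read_fastq(io.StringIO("@r1\nACGT\n+\nIIII\n")))
illumina = list(read_fastq(io.StringIO("@r1\nACGT\n+\nhhhh\n"),
                           variant="illumina1.3"))
```

The parsing logic is shared and only the score conversion depends on
the variant, which is the main argument for not splitting this into
separate modules.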

As for the wrappers, we would most certainly welcome them!

chris

On Jun 17, 2009, at 6:29 AM, Elia Stupka wrote:

> Dear all,
>
> after several years of absence I am slowly coming back to Bioperl,  
> and hope to contribute again to its development.
>
> One area that I was thinking of starting from, since we are actively  
> involved with it, is to improve Bioperl's support for next-gen  
> sequencing data, tools, etc. Since I am sure I have missed out on a  
> lot of recent developments, do let me know if/what is useful.
>
> One example that comes to mind is that the conversion of various  
> formats to/from FASTQ does not seem to be supported. Some code can  
> be found within Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl 
>  but it would be good if it could make its way into SeqIO? And  
> similarly, potentially, for other next-gen sequence formats?
>
> Similarly, there seems to be little in bioperl-run to support tools  
> that have been developed in this area, such as Maq, BowTie, TopHat,  
> etc?
>
> Do let me know if there is a past thread on this, or other people  
> actively developing, etc. so that I can find out what priorities are.
>
> thanks and best regards to all (old friends and new),
>
> Elia
>
> ---
> Senior Lecturer, Bioinformatics
> UCL Cancer Institute
> Paul O' Gorman Building
> University College London
> Gower Street
> WC1E 6BT
> London
> UK
>
> Office (UCL): +44 207 679 6493
> Office (ICMS): +44 0207 8822374
>
> Mobile: +44 7597 566 194
> Mobile (Italy): +39 338 8448801