[Bioperl-l] fastq splitter

Wed Feb 29 15:27:38 UTC 2012

On Feb 29, 2012, at 4:32 AM, Peter Cock wrote:

> On Wed, Feb 29, 2012 at 2:42 AM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> Frankly, there never seemed to be a real fixed standard in the way that FASTQ
>> headers were written (and just when it seems there is some consensus, Illumina
>> pulls the rug out from under you), hence the reason I leave it alone.  We could
>> add some ID munging in there if needed, would just need a qr// with a standard
>> fallback.
>> 
>> chris
> 
> Indeed - just like FASTA, it seems every company/tool/database has its own
> conventions about the FASTQ ID line and how to stuff as much meta-data
> into it as possible. This is a major reason why I hope unaligned reads in
> SAM/BAM take off - places like the Sanger and Broad use this in their
> pipelines.
> 
> http://blastedbio.blogspot.com/2011/10/fastq-must-die-long-live-sambam.html
> 
> Peter

Unaligned BAM makes the most sense.  I've also been talking sporadically with the HDF5 folks here; they're still keen on promoting BioHDF (it is pretty fast), though that has cooled considerably.

Anyone working directly with CRAM in their pipelines?

chris
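
For anyone wondering what the "qr// with a standard fallback" idea quoted above might look like, here is a rough sketch. It is in Python rather than Perl and is not anything that exists in Bio::SeqIO::fastq; the function name munge_fastq_id, the field names, and the choice of which two Illumina header layouts to recognise are all assumptions made for illustration. Anything that matches neither pattern falls through to a plain "first word is the ID, rest is the description" split.

```python
import re

# Assumed layout for Casava 1.8+ headers:
#   instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
CASAVA_18 = re.compile(
    r"^(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+):"
    r"(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"\s+(?P<read>\d+):(?P<filtered>[YN]):(?P<control>\d+)"
    r"(?::(?P<index>\S*))?$"
)

# Assumed layout for older Illumina headers:
#   instrument:lane:tile:x:y#index/read
OLD_ILLUMINA = re.compile(
    r"^(?P<instrument>[^:\s]+):(?P<lane>\d+):(?P<tile>\d+):"
    r"(?P<x>\d+):(?P<y>\d+)"
    r"(?:#(?P<index>[^/\s]+))?(?:/(?P<read>[12]))?$"
)

# Patterns are tried in order; the first match wins.
PATTERNS = [("casava_1.8", CASAVA_18), ("illumina_pre_1.8", OLD_ILLUMINA)]


def munge_fastq_id(header):
    """Split a FASTQ header line into named fields (hypothetical helper)."""
    text = header[1:] if header.startswith("@") else header
    text = text.strip()
    for style, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            # Keep only the groups that actually matched.
            fields = {k: v for k, v in match.groupdict().items() if v is not None}
            fields["style"] = style
            return fields
    # Standard fallback: first whitespace-delimited token is the ID,
    # everything after it is kept as a free-text description.
    parts = text.split(None, 1)
    return {
        "style": "unknown",
        "id": parts[0] if parts else "",
        "desc": parts[1] if len(parts) > 1 else "",
    }


if __name__ == "__main__":
    for line in (
        "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG",
        "@HWUSI-EAS100R:6:73:941:1973#0/1",
        "@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36",
    ):
        print(munge_fastq_id(line))
```

Supporting another vendor's convention would just mean adding another entry to PATTERNS, which is about as far as I would go given how often the header layout changes.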