[Bioperl-l] fastq splitter

Wed Feb 29 10:56:11 EST 2012

On Feb 29, 2012, at 9:32 AM, Peter Cock wrote:

> On Wed, Feb 29, 2012 at 3:27 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> On Feb 29, 2012, at 4:32 AM, Peter Cock wrote:
>> 
>>> On Wed, Feb 29, 2012 at 2:42 AM, Fields, Christopher J
>>> <cjfields at illinois.edu> wrote:
>>>> Frankly, there never seemed to be a real fixed standard in the way that FASTQ
>>>> headers were written (and just when it seems there is some consensus, Illumina
>>>> pulls the rug out from under you), hence the reason I leave it alone.  We could
>>>> add some ID munging in there if needed; it would just need a qr// with a standard
>>>> fallback.
>>>> 
>>>> chris
>>> 
>>> Indeed - just like FASTA, it seems every company/tool/database has its own
>>> conventions about the FASTQ ID line and how to stuff as much meta-data
>>> into it as possible. This is a major reason why I hope unaligned reads in
>>> SAM/BAM take off - places like the Sanger and Broad use this in their
>>> pipelines.
>>> 
>>> http://blastedbio.blogspot.com/2011/10/fastq-must-die-long-live-sambam.html
>>> 
>>> Peter
>> 
>> Unaligned BAM makes the most sense.  I've also been talking with the
>> HDF5 folks here sporadically; they're still keen on promoting BioHDF
>> (it is pretty fast), though that has cooled considerably.
>> 
>> Anyone working directly with CRAM in their pipelines?
>> 
>> chris
> 
> I understand that Sanger are looking at moving their pipelines from BAM to
> CRAM later this year, but CRAM is still quite new and in flux.
> 
> Peter

Yeah, I wasn't sure how the community outside of Sanger is approaching this.

chris
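
For reference, the ID munging idea from earlier in the thread (a compiled regex with a standard fallback) could be sketched roughly as below. This is only an illustration in Python, not BioPerl code: `split_header`, `PATTERNS`, and both regexes are hypothetical names and patterns, and they assume just two common Illumina header conventions (the older `.../1` suffix and the CASAVA 1.8 style `1:N:0:INDEX` comment). Anything else falls through to the fallback, which takes the ID as everything up to the first whitespace.

```python
import re

# Hypothetical patterns, tried in order; the first one that matches wins.
# CASAVA 1.8 style: "<id> <pair>:<filtered Y/N>:<control bits>:<index>"
CASAVA_18 = re.compile(r"^(?P<id>\S+)\s+(?P<pair>[12]):[YN]:\d+:\S*$")
# Older Illumina style: "<id>/<pair>" with no whitespace in the header
OLD_ILLUMINA = re.compile(r"^(?P<id>\S+?)/(?P<pair>[12])$")
PATTERNS = [CASAVA_18, OLD_ILLUMINA]


def split_header(header):
    """Split a FASTQ header (without the leading '@') into (read_id, pair).

    pair is '1' or '2' when one of the assumed conventions is recognized,
    and None when the fallback is used.
    """
    for pattern in PATTERNS:
        match = pattern.match(header)
        if match:
            return match.group("id"), match.group("pair")
    # Standard fallback: the ID is everything up to the first whitespace.
    parts = header.split(None, 1)
    return (parts[0] if parts else ""), None
```

New conventions would be handled by appending another compiled pattern to `PATTERNS`, so the fallback stays untouched when a vendor changes its header layout again.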