[Bioperl-l] fastq splitter
Fields, Christopher J
cjfields at illinois.edu
Wed Feb 29 09:38:50 EST 2012
On Feb 29, 2012, at 7:07 AM, Michael Muratet wrote:
> On Feb 28, 2012, at 4:01 PM, Sean O'Keeffe wrote:
>> Hi Chris,
>> Unfortunately the read pairs are not consecutive. It seems they are cat'd,
>> i.e. one file appended to the other. I guess I could use split -l at the
>> line number where they're glued together.
>> If this is an overnight job for a bunch of files, I can wait, so I don't
>> mind using the module if it works.
>> Someone pointed out I need to switch $seqin->desc to $inseq->desc.
>> However, now it spits out FASTA output instead of FASTQ and emits a bunch
>> of warnings: "Seq/Qual descriptions don't match; using sequence description".
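For the cat'd case, here is a rough Python sketch of the split -l idea. It assumes strict 4-line records (no wrapped sequence or quality lines) and that both halves hold the same number of reads. split_point and split_stream are hypothetical helper names, not BioPerl calls. The script counts lines once, then streams each line to the first or second output, which matches what split -l with half the total line count would give you.

```python
import sys


def split_point(n_lines):
    """Line index where the second (cat'd) file begins.

    Assumes strict 4-line FASTQ records and equal-sized halves, so the
    total line count must be a multiple of 8.
    """
    if n_lines % 8 != 0:
        raise ValueError(f"{n_lines} lines cannot be two equal sets of 4-line records")
    return n_lines // 2


def split_stream(lines, cut, write_first, write_second):
    """Send lines before `cut` to write_first and the rest to write_second."""
    for i, line in enumerate(lines):
        (write_first if i < cut else write_second)(line)


def main(path, out1_path, out2_path):
    # First pass: count lines without loading the file into memory.
    with open(path) as fh:
        n_lines = sum(1 for _ in fh)
    cut = split_point(n_lines)
    # Second pass: stream each line to the appropriate output file.
    with open(path) as fh, open(out1_path, "w") as out1, open(out2_path, "w") as out2:
        split_stream(fh, cut, out1.write, out2.write)


if __name__ == "__main__" and len(sys.argv) == 4:
    main(*sys.argv[1:4])
```

Run it as `python split_pairs.py reads.fastq reads_1.fastq reads_2.fastq` (the script and file names are made up). Reading the file twice avoids holding a multi-GB file in memory.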
> Hi Sean
> Apparently the BioPerl parser expects the 'second' header line, i.e., the
> '+' line, to repeat the same (redundant) identifier. When it finds nothing
> after the '+', which is the way the Illumina pipeline writes it out, it warns you.
> I think you have to explicitly write out the quality scores in fastq format.
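Here is a minimal Python sketch of the '+' line handling discussed above. It assumes strict 4-line records. A bare '+' is accepted, a '+' that repeats the title must match it, and records are written back out with their qualities as FASTQ rather than FASTA. parse_fastq and format_fastq are hypothetical names, and this is not the BioPerl implementation. Unlike BioPerl, which warns, it raises an error on a mismatch.

```python
def _take(lines, what):
    """Return the next line without its newline, or fail on truncation."""
    line = next(lines, None)
    if line is None:
        raise ValueError(f"truncated record: missing {what} line")
    return line.rstrip("\n")


def parse_fastq(lines):
    """Yield (id, description, sequence, quality) from 4-line FASTQ records.

    Assumes no line wrapping. The '+' line may be bare (as Illumina writes
    it) or repeat the '@' title; anything else is treated as an error.
    """
    lines = iter(lines)
    for title in lines:
        title = title.rstrip("\n")
        if not title:
            continue  # tolerate stray blank lines between records
        if not title.startswith("@"):
            raise ValueError(f"expected '@' title line, got {title!r}")
        seq = _take(lines, "sequence")
        plus = _take(lines, "'+'")
        qual = _take(lines, "quality")
        if not plus.startswith("+"):
            raise ValueError(f"expected '+' line, got {plus!r}")
        # A non-empty '+' title must repeat the '@' title exactly.
        if plus[1:] and plus[1:] != title[1:]:
            raise ValueError(f"'+' title {plus[1:]!r} != {title[1:]!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {title!r}")
        ident, _, desc = title[1:].partition(" ")
        yield ident, desc, seq, qual


def format_fastq(ident, desc, seq, qual):
    """Render one record as FASTQ (qualities kept), with a bare '+' line."""
    title = f"{ident} {desc}" if desc else ident
    return f"@{title}\n{seq}\n+\n{qual}\n"
```

The parser reads by line position instead of searching for '@', because '@' is also a legal quality character and can start a quality line.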
Actually no, that's not true for the latest versions. The parser was completely refactored, in coordination with Peter Cock (Biopython), the other Bio* toolkits and EMBOSS, to parse a wide range of FASTQ data (including the Solexa/Illumina variants) and also to catch badly formatted files. See this pub:
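For the variants mentioned, here is a small Python sketch that re-encodes a quality string as Sanger (Phred+33). It assumes the usual offsets: Sanger uses Phred+33, Illumina 1.3+ uses Phred+64, and Solexa stores Solexa-scale scores at +64. Solexa scores are mapped to Phred with Q_phred = 10*log10(10^(Q_solexa/10) + 1). The variant labels and function names are my own, not a toolkit API.

```python
import math

# Hypothetical labels for the three FASTQ variants and their ASCII offsets.
OFFSETS = {"sanger": 33, "illumina": 64, "solexa": 64}
# Lowest legal score on each variant's own scale.
MIN_SCORE = {"sanger": 0, "illumina": 0, "solexa": -5}


def solexa_to_phred(q_solexa):
    """Map a Solexa-scale score to the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def to_sanger(qual, variant):
    """Re-encode a quality string from `variant` as Sanger (Phred+33)."""
    offset = OFFSETS[variant]
    out = []
    for ch in qual:
        q = ord(ch) - offset
        if q < MIN_SCORE[variant]:
            raise ValueError(f"{ch!r} is below the {variant} range; wrong variant?")
        if variant == "solexa":
            q = round(solexa_to_phred(q))
        # Sanger FASTQ tops out at Phred 93 ('~').
        out.append(chr(min(q, 93) + 33))
    return "".join(out)
```

For example, to_sanger("h", "illumina") gives "I" (Phred 40). A file read with the wrong variant often shows up as scores below the legal minimum, so the sketch rejects those.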
This is one of the primary test examples that passes: