[Bioperl-l] fastq splitter - working but not before xmas!!

Sean O'Keeffe limericksean at gmail.com
Wed May 16 14:05:28 EDT 2012


So now I've got a bunch of fastq files, each about 17GB in size. The script is
puttering along, but it is tediously slow.
I tried the fastq-dump tool from the SRA Toolkit but it didn't like my
command (fastq-dump --split-files <input_fastq_file>) - my ignorance no
doubt.

Any ideas out there on speeding up Bio::SeqIO::fastq output?
Thanks.
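
One thing that may be worth trying is bypassing Bio::SeqIO for the split itself and copying the raw lines, since building a sequence object for every record is probably where much of the time goes. Below is a minimal sketch, not a tested replacement. It assumes the input is made of unwrapped four-line records and that read names end in /1 or /2; the sub names next_record and split_fastq and the output file names are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read one 4-line FASTQ record from a filehandle; returns nothing at EOF.
# ASSUMPTION: records are unwrapped (exactly four lines each).
sub next_record {
    my ($fh) = @_;
    my $head = <$fh>;
    return unless defined $head;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "truncated record at line $.\n" unless defined $qual;
    die "expected '\@' at line ", $. - 3, "\n" unless $head =~ /^@/;
    return [ $head, $seq, $plus, $qual ];
}

# Copy each record to the read1 or read2 handle.
# ASSUMPTION: the header line ends in /1 or /2.
sub split_fastq {
    my ( $in, $out1, $out2 ) = @_;
    my %count = ( 1 => 0, 2 => 0 );
    while ( my $rec = next_record($in) ) {
        my ($end) = $rec->[0] =~ m{/([12])\s*$}
            or die "no /1 or /2 suffix in header: $rec->[0]";
        print { $end == 1 ? $out1 : $out2 } @$rec;
        $count{$end}++;
    }
    return ( $count{1}, $count{2} );
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my ($file) = @ARGV;
    open my $in, '<', $file        or die "$file: $!";
    open my $o1, '>', "$file.1.fq" or die $!;
    open my $o2, '>', "$file.2.fq" or die $!;
    my ( $n1, $n2 ) = split_fastq( $in, $o1, $o2 );
    close $_ for $o1, $o2;
    warn "read1: $n1, read2: $n2\n";
}
```

This does no validation beyond the header check, so the caution below about confirming that read1 and read2 names still line up afterwards applies here too.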

On 1 March 2012 03:16, Joel Martin <j_martin at lbl.gov> wrote:

> Just a caution to double check that the read1 and read2 names match after
> splitting.  I don't know if this thread jinxed me or what, but I just
> received, for the first time, a concatenated fastq file formatted as you
> describe, except the first read1 doesn't match the first read2.  zut alors!
>
> I came up with converting to scarf, running /usr/bin/sort on the scarf, then
> reading the sorted output, tossing reads into single or paired files and
> reconverting to fastq in the process.  It wasn't too bad, but I don't think
> bioperl has a scarf conversion; it's basically fastq with : substituted for
> \n.  Most delimiters that aren't : would work better, but I already had a
> fastq2scarf from early solexa days (I think).
>
> # this was the last step, if it's handy for this plague of hideous files;
> # the fixed fields for : would need adjusting
> use strict;
>
> open( my $oph, '>', 'paired.fq' ) or die $!;
> open( my $osh, '>', 'single.fq' ) or die $!;
>
> my ( $pend, $pname, $pline );
>
> while ( <>) {
>   my ( $name, $end ) = /^(\S+)\s(\d)/;
>
>   if ( $end == 1 ) {
>     if ( $pend ) {
>       print_reads( $osh, $pline );
>     }
>     $pend = $end;
>     $pname = $name;
>     $pline = $_;
>   }
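>   elsif ( $end == 2 ) {
>     my $fh = $pend && $pname eq $name ? $oph : $osh;
>     # only print the held read1 if one is actually pending, so a stray
>     # read2 doesn't re-print a read1 that was already written
>     print_reads( $fh, $pend ? $pline : (), $_ );
>     $pend = '';
>   }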
>   else {
>     die "ERROR: can't interpret line $. $_";
>   }
> }
> # flush a read1 still pending at the end of the input
> print_reads( $osh, $pline ) if $pend;
> sub print_reads {
>   my ( $fh, @reads ) = @_;
>   for my $scarf ( @reads ) {
>     my @stuff = split /:/,$scarf,12;
>     print $fh '@',join(':',@stuff[0..9]),"\n$stuff[10]\n+\n$stuff[11]";
>   }
> }
>
> Joel
>
> On Wed, Feb 29, 2012 at 11:52 AM, George Hartzell <hartzell at alerce.com> wrote:
>
>> Fields, Christopher J writes:
>>  > Just want to say, if you can set up a local perl and local::lib it
>>  > makes your life a LOT easier.  Particularly if you are running jobs
>>  > on older versions of RHEL, which notoriously stuck with
>>  > outdated/broken versions of perl (as well as other tools).
>>  > [...]
>>
>> And Perlbrew takes away your last excuse for not building perls and
>> setting up local::lib's.
>>
>>  http://perlbrew.pl/
>>
>> g.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
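
For anyone wanting to try the scarf round trip Joel describes in the quoted message, here is a rough sketch of the two conversions. It is not Joel's fastq2scarf, which wasn't posted. It assumes read names with 10 ':'-separated fields, matching the fixed fields in his print_reads, and the sub names fastq_to_scarf and scarf_to_fastq are made up for illustration.

```perl
use strict;
use warnings;

# Join one FASTQ record into a single ':'-delimited line: header without
# the leading '@', then sequence, then quality. The '+' line is dropped
# because it can be regenerated on the way back.
sub fastq_to_scarf {
    my ( $head, $seq, $qual ) = @_;
    chomp( $head, $seq, $qual );
    $head =~ s/^@//;
    return join( ':', $head, $seq, $qual ) . "\n";
}

# Reverse the conversion. $nhead is the number of ':'-separated header
# fields (ASSUMPTION: 10 by default). The split limit keeps any ':'
# characters inside the quality string intact.
sub scarf_to_fastq {
    my ( $scarf, $nhead ) = @_;
    $nhead //= 10;
    chomp $scarf;
    my @f = split /:/, $scarf, $nhead + 2;
    die "too few fields: $scarf\n" unless @f == $nhead + 2;
    my $head = join ':', @f[ 0 .. $nhead - 1 ];
    return "\@$head\n$f[$nhead]\n+\n$f[$nhead + 1]\n";
}
```

With one record per line, sort places the two ends of a pair next to each other whenever their names differ only in the end number, which is what the pairing script relies on. Since ':' is itself a valid quality character, the quality string has to stay the last field.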

