[Bioperl-l] fastq splitter

Joel Martin j_martin at lbl.gov
Thu Mar 1 08:16:10 UTC 2012


Just a caution: double-check that the read1 and read2 names match after
splitting.  I don't know if this thread jinxed me or what, but I just
received, for the first time, a concatenated fastq file formatted as you
describe, except that the first read1 doesn't match the first read2.
Zut alors!
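
For the name check itself, something along these lines might do.  It is
only a sketch: first_name_mismatch is a made-up name, and it assumes
four-line fastq records whose read name is the header text up to the
first whitespace, with an optional trailing /1 or /2.

```perl
use strict;
use warnings;

# Hypothetical checker (not part of bioperl): walks two fastq handles in
# step and returns the 1-based record number of the first pair whose
# names differ, or 0 if every record matches.
sub first_name_mismatch {
  my ( $fh1, $fh2 ) = @_;
  my $rec = 0;
  while ( 1 ) {
    my $h1 = <$fh1>;
    my $h2 = <$fh2>;
    return 0 unless defined $h1 or defined $h2;       # both ended together
    $rec++;
    return $rec unless defined $h1 and defined $h2;   # one file is shorter

    # assumed name format: '@', then everything up to the first whitespace
    my ( $n1 ) = $h1 =~ /^\@(\S+)/;
    my ( $n2 ) = $h2 =~ /^\@(\S+)/;
    return $rec unless defined $n1 and defined $n2;   # malformed header
    s{/[12]$}{} for $n1, $n2;                         # drop old-style /1 /2
    return $rec unless $n1 eq $n2;

    # skip the sequence, '+' and quality lines of both records
    for ( 1 .. 3 ) {
      my $skip1 = <$fh1>;
      my $skip2 = <$fh2>;
    }
  }
}
```

Open the two split files, pass the handles in, and a non-zero return
tells you which record to go look at.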

I came up with converting to scarf, running /usr/bin/sort on the scarf,
then reading the sorted output, tossing reads into single or paired files
and reconverting to fastq in the process.  It wasn't too bad, but I don't
think bioperl has a scarf conversion; scarf is basically fastq with :
substituted for \n.  Most delimiters other than : would work better, but I
already had a fastq2scarf from the early solexa days (I think).
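
The fastq-to-scarf direction is roughly the following.  This is a sketch
rather than the original fastq2scarf script: the sub name
fastq_to_scarf_stream is made up, and it assumes plain four-line fastq
records with nothing useful on the '+' line.

```perl
use strict;
use warnings;

# Hypothetical helper: read 4-line fastq records from $in and write one
# scarf line per record to $out, i.e. header:sequence:quality.
# The leading '@' is dropped because the reconversion step adds it back.
sub fastq_to_scarf_stream {
  my ( $in, $out ) = @_;
  while ( my $head = <$in> ) {
    my $seq  = <$in>;
    my $plus = <$in>;    # the '+' line is read and thrown away
    my $qual = <$in>;
    die "ERROR: truncated fastq record near line $.\n"
      unless defined $qual;
    chomp( $head, $seq, $qual );
    $head =~ s/^\@//;
    # quality goes last: it may itself contain ':', so anything splitting
    # the line back apart needs a field limit
    print {$out} join( ':', $head, $seq, $qual ), "\n";
  }
}
```

Calling fastq_to_scarf_stream( \*STDIN, \*STDOUT ) and piping the result
through sort puts read1 and read2 of the same name on adjacent lines.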

# This was the last step, in case it's handy for this plague of hideous files.
# The fixed field counts for : would need adjusting to match your headers.
use strict;

open( my $oph, '>', 'paired.fq' ) or die $!;
open( my $osh, '>', 'single.fq' ) or die $!;

my ( $pend, $pname, $pline );

while ( <>) {
  my ( $name, $end ) = /^(\S+)\s(\d)/;

  if ( $end == 1 ) {
    if ( $pend ) {
      print_reads( $osh, $pline );
    }
    $pend = $end;
    $pname = $name;
    $pline = $_;
  }
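  elsif ( $end == 2 ) {
    if ( $pend ) {
      # pending read1: paired if the names match, otherwise both are singles
      my $fh = $pname eq $name ? $oph : $osh;
      print_reads( $fh, $pline, $_ );
    }
    else {
      # read2 with no pending read1: write it alone, not with a stale line
      print_reads( $osh, $_ );
    }
    $pend = '';
  }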
  else {
    die "ERROR: can't interpret line $. $_";
  }
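}

# a read1 still pending at end of input never found its read2
print_reads( $osh, $pline ) if $pend;
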
sub print_reads {
  my ( $fh, @reads ) = @_;
  for my $scarf ( @reads ) {
    my @stuff = split /:/,$scarf,12;
    print $fh '@',join(':',@stuff[0..9]),"\n$stuff[10]\n+\n$stuff[11]";
  }
}

Joel

On Wed, Feb 29, 2012 at 11:52 AM, George Hartzell <hartzell at alerce.com> wrote:

> Fields, Christopher J writes:
>  > Just want to say, if you can set up a local perl and local::lib it
>  > makes your life a LOT easier.  Particularly if you are running jobs
>  > on older versions of RHEL, which notoriously stuck with
>  > outdated/broken versions of perl (as well as other tools).
>  > [...]
>
> And Perlbrew takes away your last excuse for not building perls and
> setting up local::lib's.
>
>  http://perlbrew.pl/
>
> g.


