[Bioperl-l] fastq splitter - working but not before xmas!!
Fields, Christopher J
cjfields at illinois.edu
Thu May 24 15:05:40 EDT 2012
Sean,
I have been working on a XS-based interface that basically just returns hashrefs, this uses Heng Li's kseq.h library. I can probably push this out to CPAN sometime in the next week or so. I did some initial (very rough) benchmarks and when using a simple count it parsed 30M reads in about 25-30 seconds.
chris
On May 16, 2012, at 1:05 PM, Sean O'Keeffe wrote:
> So now I've got a bunch of fastq's all about 17GB in size. The script is
> puttering away but this is tediously slow.
> I tried the the fastq-dump tool from sra toolkit but it didn't like my
> commands (fastq-dump --split-files <input_fastq_file> ) - my ignorance no
> doubt.
>
> Any ideas out there on speeding up Bio::SeqIO::fastq output?
> Thanks.
>
> On 1 March 2012 03:16, Joel Martin <j_martin at lbl.gov> wrote:
>
>> Just a caution to double check that the read1 and read2 names match after
>> splitting. I don't know if this thread jinxed me or what, but I just for
>> the first time received a concatenated fastq file formatted as you
>> describe, except the first read1 doesn't match the first read2. zut alores!
>>
>> came up with converting to scarf, /usr/bin/sort the scarf, then read that
>> with tossing into single or paired files and reconverting to fastq in the
>> process. it wasn't too bad, but I don't think bioperl has a scarf
>> conversion, it's basically fastq with : substituted for \n. most
>> delimeters that aren't : would work better but i already had a fastq2scarf
>> from early solexa days ( i think ).
>>
>> # this was the last step, if it's handy for this plague of hideous files,
>> the fixed fields for : would need adjusting
>> use strict;
>>
>> open( my $oph, '>', 'paired.fq' ) or die $!;
>> open( my $osh, '>', 'single.fq' ) or die $!;
>>
>> my ( $pend, $pname, $pline );
>>
>> while ( <>) {
>> my ( $name, $end ) = /^(\S+)\s(\d)/;
>>
>> if ( $end == 1 ) {
>> if ( $pend ) {
>> print_reads( $osh, $pline );
>> }
>> $pend = $end;
>> $pname = $name;
>> $pline = $_;
>> }
>> elsif ( $end == 2 ) {
>> my $fh = $pend == 1 && $pname eq $name ? $oph : $osh;
>> print_reads( $fh, $pline, $_ );
>> $pend = '';
>> }
>> else {
>> die "ERROR: can't interpret line $. $_";
>> }
>> }
>> sub print_reads {
>> my ( $fh, @reads ) = @_;
>> for my $scarf ( @reads ) {
>> my @stuff = split /:/,$scarf,12;
>> print $fh '@',join(':', at stuff[0..9]),"\n$stuff[10]\n+\n$stuff[11]";
>> }
>> }
>>
>> Joel
>>
>> On Wed, Feb 29, 2012 at 11:52 AM, George Hartzell <hartzell at alerce.com>wrote:
>>
>>> Fields, Christopher J writes:
>>>> Just want to say, if you can set up a local perl and local::lib it
>>>> makes your life a LOT easier. Particularly if you are running jobs
>>>> on older versions of RHEL, which notoriously stuck with
>>>> outdated/broken versions of perl (as well as other tools).
>>>> [...]
>>>
>>> And Perlbrew takes away your last excuse for not building perls and
>>> setting up local::lib's.
>>>
>>> http://perlbrew.pl/
>>>
>>> g.
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list