[Bioperl-l] About FASTQ parser

Chris Fields cjfields at illinois.edu
Thu Sep 17 23:26:52 UTC 2009


By default, most FASTQ writers leave off the optional repeated header on
the '+' line (it increases the file size substantially).  You can add it
back by setting quality_header():

my $out = Bio::SeqIO->new(-format => 'fastq', -file => ">$file",
                          -quality_header => 1);
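A minimal pure-Perl sketch of the two record layouts at issue here, for readers who want to see exactly what differs in the output. `format_fastq` is a hypothetical helper, not part of BioPerl; it only illustrates that the header option repeats the read ID after the '+', which is where the extra file size comes from.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): format one FASTQ record.
# With $with_header true, the ID is repeated on the '+' line, the layout
# that quality_header is meant to restore; otherwise '+' stands alone.
sub format_fastq {
    my ($id, $seq, $qual, $with_header) = @_;
    my $plus = $with_header ? "+$id" : '+';
    return "\@$id\n$seq\n$plus\n$qual\n";
}

my ($id, $seq, $qual) = ('HWI-EAS397:1:1:11:252#NNNTNN/1', 'NACAAT', 'DNXPMX');
print format_fastq($id, $seq, $qual, 0);   # third line is "+"
print format_fastq($id, $seq, $qual, 1);   # third line repeats the ID
```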

Again, let me know if that works okay.

chris

On Sep 17, 2009, at 1:16 PM, Abhishek Pratap wrote:

> Hi Chris
>
> I am wondering whether the following omission from a FASTQ record is
> intentional or a bug.
>
> After reading in each record from a FASTQ file, writing the same
> record back out ( $out->write_seq($seq) ) drops the text after the
> + sign.
>
>
> E.g.:
>
> @HWI-EAS397:1:1:11:252#NNNTNN/1
> NACAATATCAATTAGAGGATTGCTTNGTTNAAGGNNTNGNTNNNANTNT
> +
> DNXPMXNYXMPVXZVTXYZ[[BBBBBBBBBBBBBBBBBBBBBBBBBBBB
>
>
> PS: In our case we need the exact record printed out, because we split
> the FASTQ file into multiple FASTQ files based on the read index in the
> @ line.  Exact output is needed to avoid conflicts with downstream
> processing pipelines.
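Since the goal here is byte-exact output, one option that sidesteps any writer's formatting choices is to copy each four-line record verbatim instead of parsing and re-serialising it. A minimal sketch, assuming unwrapped four-line records and headers of the form @...#INDEX/READ as in the example above; `split_by_index` and the per-index output naming are hypothetical, not BioPerl code.

```perl
use strict;
use warnings;

# Hypothetical helper: route each four-line FASTQ record, copied
# byte-for-byte, to the filehandle returned by $fh_for->(INDEX).
# Assumes unwrapped records and headers like "@...#INDEX/READ".
sub split_by_index {
    my ($in_fh, $fh_for) = @_;
    my %count;
    while (defined(my $header = <$in_fh>)) {
        my @rest;
        for (1 .. 3) { push @rest, scalar(<$in_fh>); }  # seq, '+', quality
        die "truncated record: $header" if grep { !defined } @rest;
        my ($index) = $header =~ m{#([^/\s]+)};         # e.g. NNNTNN
        $index = 'unknown' unless defined $index;
        print { $fh_for->($index) } $header, @rest;     # no re-formatting
        $count{$index}++;
    }
    return \%count;
}

# Usage with in-memory data; for real files, open e.g. "reads_$idx.fq".
my $data = "\@r1#ACGT/1\nAC\n+r1#ACGT/1\nII\n\@r2#TTTT/1\nGG\n+\nJJ\n";
open my $in, '<', \$data or die "cannot open input: $!";
my (%buf, %out);
my $count = split_by_index($in, sub {
    my ($idx) = @_;
    $out{$idx} //= do {
        open my $fh, '>', \$buf{$idx} or die "cannot open output: $!";
        $fh;
    };
    return $out{$idx};
});
close $_ for values %out;
print "$_: $count->{$_}\n" for sort keys %$count;
```

Because lines are copied as-is, whatever was on the '+' line (bare or with a repeated ID) survives unchanged; the trade-off is that nothing is validated beyond the four-line structure.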
>
> Thanks,
> -Abhi
>
> On Thu, Sep 17, 2009 at 12:39 AM, Chris Fields <cjfields at illinois.edu> wrote:
>> Abhi,
>>
>> The FASTQ parser hasn't been released to CPAN yet.  It is available via
>> bioperl-live.  We haven't added any code to the HOWTOs yet, but the
>> SYNOPSIS example in Bio::SeqIO::fastq should be sufficient to get you
>> started.
>>
>> Bio::Seq::Quality is the object returned by next_seq(); it can be queried
>> for PHRED quality scores and other bits.  If you want to split things up,
>> call next_seq(), then generate a FASTQ output stream in the variant you
>> want:
>>
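>> use Bio::SeqIO;
>>
>> # input stream; pick the variant that matches your data
>> my $in = Bio::SeqIO->new(-format => 'fastq-illumina', -file => 'mydata.fq');
>>
>> # write_fasta() and write_qual() write FASTA and quality files
>> my $outfasta = Bio::SeqIO->new(-format => 'fastq-sanger',
>>                                -file   => '>fasta.file');
>> my $outqual  = Bio::SeqIO->new(-format => 'fastq-sanger',
>>                                -file   => '>qual.file');
>>
>> while (my $seq = $in->next_seq) {
>>   $outfasta->write_fasta($seq);   # sequence only
>>   $outqual->write_qual($seq);     # quality scores only
>> }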
>>
>> Note I haven't tested that yet, but it should work.  Let me know if it
>> doesn't.
>>
>> chris
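Bio::Seq::Quality holds decoded PHRED scores, and the variant in the format name (fastq-sanger, fastq-illumina) decides how the quality characters are decoded. BioPerl does this conversion itself; the sketch below only shows the arithmetic, assuming the standard ASCII offsets of 33 (Sanger) and 64 (Illumina 1.3+). `phred_scores` and `encode_scores` are hypothetical helpers, not BioPerl methods.

```perl
use strict;
use warnings;

# Standard ASCII offsets for PHRED-scaled FASTQ variants.
# (Solexa-scaled 'fastq-solexa' scores use a different formula
# and are deliberately not handled in this sketch.)
my %OFFSET = (sanger => 33, illumina => 64);

# Hypothetical helper: quality string -> array ref of PHRED scores.
sub phred_scores {
    my ($qual, $variant) = @_;
    my $offset = $OFFSET{$variant} or die "unsupported variant '$variant'\n";
    return [ map { ord($_) - $offset } split //, $qual ];
}

# Hypothetical helper: PHRED scores -> quality string for a variant.
sub encode_scores {
    my ($scores, $variant) = @_;
    my $offset = $OFFSET{$variant} or die "unsupported variant '$variant'\n";
    return join '', map { chr($_ + $offset) } @$scores;
}

# Quality characters from the record quoted in this thread (Illumina 1.3+).
my $scores = phred_scores('DNXPM', 'illumina');
print "@$scores\n";                              # 4 14 24 16 13
print encode_scores($scores, 'sanger'), "\n";    # %/91.
```

The same scores re-encoded with the Sanger offset give different characters, which is why writing to a 'fastq-sanger' stream can change the quality line of an Illumina input even when nothing else changes.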
>>
>> On Sep 16, 2009, at 3:13 PM, Abhishek Pratap wrote:
>>
>>> Hi Chris
>>>
>>> I remember seeing a recent email about the new BioPerl FASTQ parser.
>>> Is it part of the BioPerl 1.6 distribution?  I installed it, and based
>>> on the docs here
>>> (http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/fastq.html)
>>> I am a bit lost.
>>>
>>> I see two approaches there: using Bio::SeqIO::fastq and using
>>> Bio::Seq::Quality.  Do both return the same data, with the latter
>>> giving a speed-up?
>>>
>>> This is not meant to offend any developer, but a small example or two
>>> in the HOWTOs would help a lot.
>>>
>>> The current example (copied below) does not work.  I guess it is based
>>> on a previous version of the code.
>>>
>>> # grabs the FASTQ parser, specifies the Illumina variant
>>> my $in = Bio::SeqIO->new(-format    => 'fastq-illumina',
>>>                         -file      => 'mydata.fq');
>>>
>>>
>>> My basic requirement is to read each FASTQ record and split it into
>>> header, read, and quality.
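For the header/read/quality split alone, a full parser is not strictly needed. A minimal pure-Perl sketch, assuming unwrapped four-line records; `next_fastq_record` is a hypothetical helper, not BioPerl code. It reads exactly four lines per record rather than scanning for '@', because '@' is also a legal quality character.

```perl
use strict;
use warnings;

# Hypothetical helper: read one unwrapped four-line FASTQ record from $fh
# and return { header, read, quality }, or undef at end of input.
sub next_fastq_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return unless defined $header;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "truncated record: $header" unless defined $qual;
    chomp($header, $seq, $plus, $qual);
    $header =~ s/^\@// or die "bad header line: $header\n";
    $plus =~ /^\+/     or die "bad separator line: $plus\n";
    length($seq) == length($qual)
        or die "sequence/quality length mismatch in $header\n";
    return { header => $header, read => $seq, quality => $qual };
}

# First six bases of the record quoted in this thread.
my $data = "\@HWI-EAS397:1:1:11:252#NNNTNN/1\nNACAAT\n+\nDNXPMX\n";
open my $fh, '<', \$data or die "cannot open input: $!";
while (my $rec = next_fastq_record($fh)) {
    print join("\t", @{$rec}{qw(header read quality)}), "\n";
}
```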
>>>
>>>
>>> Thanks,
>>> -Abhi
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>