[Bioperl-l] Next Gen Formats
Chris Fields
cjfields at illinois.edu
Fri Mar 12 08:04:53 EST 2010
On Mar 12, 2010, at 4:06 AM, Peter wrote:
> On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields <cjfields at illinois.edu> wrote:
>> Ryan,
>>
>> We would have to see example files to get an idea of how feasible it is.
>> You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual
>> stream, and interleave the two somehow. However, BioPerl qual
>> scores are PHRED-based by default, and I'm not sure how color-space
>> data would work within that schematic.
>>
>> chris
>
> Chris,
>
> I am under the (possibly mistaken) assumption that PHRED scores
> are used for SOLiD color space QUAL files - the key issue is each
> score corresponds to the color call in the color sequence.
>
> Ignoring color-space for a moment, are there BioPerl examples
> of iterating over a pair of sequence-space FASTA and QUAL files?
> i.e. What you'd get if you had a FASTQ file to iterate over.
>
> [I guess Ryan could just merge the color-space FASTA and
> QUAL into a color-space FASTQ file and iterate over that]
>
> Peter
If they're PHRED scores then it should be fine, though we may need to work in a few color-space-specific things.
Iterating over pairs is something that has popped up before. For output, in the Bio::SeqIO::fastq module there is code for writing fasta/qual (to two separate streams), where I'm assuming one could do something like:
--------------------------------
use Bio::SeqIO;

my $in   = Bio::SeqIO->new(-format => 'fastq', -file => 'foo.fastq');
my $out1 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fasta');
my $out2 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.qual');
while (my $seq = $in->next_seq) {
    $out1->write_fasta($seq);   # sequence only, FASTA-formatted
    $out2->write_qual($seq);    # quality scores only, QUAL-formatted
}
--------------------------------
Note that all three streams use the 'fastq' format rather than 'fasta' or 'qual'. Using those formats for the output streams should work as well; I just haven't tried it myself (it's a bug otherwise).
I'm assuming for input it would be something like:
--------------------------------
use Bio::SeqIO;

my $in1 = Bio::SeqIO->new(-format => 'fasta', -file => 'foo.fasta');
my $in2 = Bio::SeqIO->new(-format => 'qual',  -file => 'foo.qual');
my $out = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fastq');
# assumption (untested): the 'qual' parser joins the two streams
while (my $seq = $in2->next_seq($in1)) {
    $out->write_seq($seq);
}
--------------------------------
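If the 'qual' parser turns out not to accept a second stream like that, the pairing can also be done by hand. Below is a minimal sketch in core Perl (no BioPerl at all) of walking a FASTA and a QUAL file in lockstep and emitting FASTQ. The sub names next_record and merge_fasta_qual are made up for illustration and are not part of any BioPerl module. It assumes sequence-space data, records in the same order in both files, whitespace-separated PHRED scores, and Sanger-style output (score + 33 as an ASCII character). Color-space data would need its own handling.

```perl
use strict;
use warnings;

# Hypothetical helper: read one '>'-delimited record from a filehandle.
# Returns (id, body lines joined with $sep), or an empty list at end of input.
sub next_record {
    my ($fh, $sep) = @_;
    local $/ = "\n>";                  # a record ends where the next header starts
    defined(my $chunk = <$fh>) or return;
    chomp $chunk;                      # drop the trailing "\n>" separator, if present
    $chunk =~ s/^>//;                  # the first record still carries its own '>'
    my ($header, @body) = split /\n/, $chunk;
    my ($id) = split ' ', $header;     # id = first whitespace-delimited token
    return ($id, join($sep, @body));
}

# Hypothetical helper: pair FASTA and QUAL records and print FASTQ to $out_fh.
sub merge_fasta_qual {
    my ($fasta_fh, $qual_fh, $out_fh) = @_;
    while (my ($id, $seq) = next_record($fasta_fh, '')) {
        my ($qid, $qual) = next_record($qual_fh, ' ');
        die "QUAL input ended early at '$id'\n" unless defined $qid;
        die "Record mismatch: '$id' vs '$qid'\n" unless $id eq $qid;

        my @scores = split ' ', $qual;
        die "Length mismatch for '$id'\n" unless @scores == length($seq);

        # Sanger FASTQ encoding: PHRED score + 33, as one ASCII character each
        my $encoded = join '', map { chr($_ + 33) } @scores;
        print {$out_fh} '@', "$id\n$seq\n+\n$encoded\n";
    }
}

# Example with in-memory data; real use would open foo.fasta and foo.qual.
my $fasta = ">read1 demo\nACGT\n>read2\nGG\nCC\n";
my $qual  = ">read1 demo\n40 30 20 10\n>read2\n10 10\n20 20\n";
open my $fa_fh, '<', \$fasta or die $!;
open my $q_fh,  '<', \$qual  or die $!;
merge_fasta_qual($fa_fh, $q_fh, \*STDOUT);
```

For read1 the scores 40 30 20 10 come out as the quality string I?5+. The id and length checks are there because nothing else guarantees the two files stay in step.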
chris