[Bioperl-l] How to input fasta and qual into Bio::Seq::Quality?

Mark Johnson johnsonm at gmail.com
Thu Jul 24 19:35:00 UTC 2008


On Thu, Jul 24, 2008 at 10:09 AM, Phillip San Miguel <pmiguel at purdue.edu> wrote:

> Specifically, what I would like is for someone to tell me that my way of
> creating a Bio::Seq::Quality object is unnecessarily brute force. I would
> like this to be possible:
>
> my $BSQ_in_obj   =Bio::SeqIO
>       ->new( -file    =>$seq_infile,
>                   -qfile  =>$qual_infile
>             );
> such that $BSQ_in_obj->next_seq would create a Bio::Seq::Quality object.

I'm no design pattern ninja, but I think what you're looking for is
something like a Facade for Bio::SeqIO::fasta and Bio::SeqIO::qual:

my $seqio = Bio::SeqIO::SomeCleverName->new(-seq => $seq_file, -qual
=> $qual_file);

# $bsq is a Bio::Seq::Quality
while (my $bsq = $seqio->next_seq()) {
    # Do Stuff
}

I don't think this is implemented anywhere in Bioperl, though I have
not done an exhaustive search.

> But I don't have a good overall sense of the guts of bioperl, so I'm not
> sure if this would create other problems.

A wrapper would have just should have zero impact.  There seems to be
some sort of precedent for this sort of thing (see
Bio::SeqIO::MultiFile).  You could code this up and start using it
yourself immediately.  If you were so inclined, you could contribute
it back to Bioperl (no guarantees it would be accepted, at least as
is, but talking about it on the list first helps grease the tracks, so
to speak).

> while (1){
>   #create actual objects for both a seq and its associated qual
>   my $seq_obj =$in_seq_obj->next_seq() || last;
>   my $qual_obj=$in_qual_obj->next_seq();

Your code assumes that the files are in the same order.  That is, that
the Nth fasta sequence in the seq file goes with the Nth fasta quality
sequence in the qual file.  I suppose that's a safe assumption if you
generated the files, but if you were to code this up as a wrapper for
Bio::SeqIO::fasta and Bio::SeqIO::qual you'd want some sort of sanity
check that would throw an exception if the identifiers didn't match.

I suppose another option would be to convert your seq/qual files to
fastq format (See Bio::SeqIO::fastq).



More information about the Bioperl-l mailing list