[Bioperl-l] Fasta Qual files

Ewan Birney birney@ebi.ac.uk
Thu, 14 Sep 2000 08:52:22 +0100 (GMT)


On Wed, 13 Sep 2000, Brian A. Desany, Ph.D. wrote:

> I am just starting to get my feet wet with bioperl, so I apologize if the
> answer to my question seems self-evident. (I was unable to answer this by
> using the available documentation). Background: I created a tiny little
> script that sorts sequences in an initial fasta file into files included.fa
> and excluded.fa, based on a list of sequence ID's in a different file. That
> is, for each sequence in the initial fasta file, the script goes down the
> list of ID's in the index file and writes the sequence to excluded.fa if
> there is a match, and if it gets to the end of the index list without
> finding a match, it writes the sequence to included.fa. Then it moves on to
> the next sequence in the initial fasta file and goes down the index list
> again.
> 
> This works great. My question is, does bioperl have the capability of
> treating fasta.qual files like this? I tried it on a fasta.qual file and it
> definitely did not work! Error message about unrecognized alphabet. It
> turned out that it actually did sort everything properly, but the quality
> values in the output files were occasionally concatenated, such that, say,
> in a run of 40 40 40 40 40 40 40, you would see 40 40 4040 40 40 40. So it
> almost worked, but not quite. Is there a means to deal with fasta.qual files
> within bioperl (Bio::SeqIO::fastaqual, like), or does anyone have any other
> suggestions as to how to handle this independently? Thanks,

The way to handle this is probably to set up the following:

  Bio::PrimarySeqWithQualityScores - an object representing the sequence with
quality scores. This should at least implement the Bio::PrimarySeqI
interface and possibly inherit from Bio::PrimarySeq for its
implementation.

  Bio::SeqIO::quality - a SeqIO module to read PrimarySeqWithQualityScores
objects.
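
To make the first object concrete, here is a minimal sketch, written in
Python rather than Perl and not taken from any existing bioperl code. The
names SeqWithQuality, subseq and format_qual are hypothetical. The only
assumptions are that there is exactly one integer score per base and that
scores are written space-separated, as in the "40 40 40" example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SeqWithQuality:
    """Hypothetical stand-in for the proposed PrimarySeqWithQualityScores."""

    seq_id: str
    seq: str
    qual: List[int]

    def __post_init__(self) -> None:
        # Assumption: one quality score per base; anything else is an error.
        if len(self.seq) != len(self.qual):
            raise ValueError(
                f"{self.seq_id}: {len(self.seq)} bases but {len(self.qual)} scores"
            )

    def subseq(self, start: int, end: int) -> "SeqWithQuality":
        """Return bases start..end (1-based, inclusive) with their scores."""
        return SeqWithQuality(
            self.seq_id, self.seq[start - 1:end], self.qual[start - 1:end]
        )


def format_qual(record: SeqWithQuality, per_line: int = 20) -> str:
    """Format the scores as a '>'-headed record, space-separated on each line."""
    lines = [f">{record.seq_id}"]
    for i in range(0, len(record.qual), per_line):
        lines.append(" ".join(str(q) for q in record.qual[i:i + per_line]))
    return "\n".join(lines) + "\n"
```

Keeping the scores as a list of integers rather than as a string means a
writer has to put the separators in explicitly, so neighbouring values
cannot run together on output.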




Of course, the problem is that the quality scores and the sequence actually
come in different files (though usually named the same). So this is not
a perfect fit. A second object might come into play.
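
One way to cope with the two-file layout is to read both files and pair the
records by ID. The following is only a sketch in Python, not bioperl code;
read_records and pair_seq_and_qual are hypothetical names, and it assumes
both files use '>'-headed records whose first header word is the ID. Note
that sequence lines are joined directly while quality lines are joined with
a space; joining quality lines with no separator is presumably what produces
fused values such as "4040".

```python
import io
from typing import Dict, Iterator, List, TextIO, Tuple


def read_records(handle: TextIO) -> Iterator[Tuple[str, List[str]]]:
    """Yield (id, body_lines) for each '>'-headed record in the handle."""
    seq_id, body = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, body
            fields = line[1:].split()
            # Assumption: the ID is the first word of the header line.
            seq_id, body = (fields[0] if fields else ""), []
        elif seq_id is not None and line.strip():
            body.append(line)
    if seq_id is not None:
        yield seq_id, body


def pair_seq_and_qual(
    fasta: TextIO, qual: TextIO
) -> Dict[str, Tuple[str, List[int]]]:
    """Return {id: (sequence, scores)} for IDs present in both files."""
    # Sequence lines can be concatenated with no separator.
    seqs = {sid: "".join(l.strip() for l in lines)
            for sid, lines in read_records(fasta)}
    paired = {}
    for sid, lines in read_records(qual):
        if sid not in seqs:
            continue
        # Quality lines need a space between them before splitting into scores.
        scores = [int(tok) for tok in " ".join(lines).split()]
        if len(scores) != len(seqs[sid]):
            raise ValueError(f"{sid}: sequence and quality lengths differ")
        paired[sid] = (seqs[sid], scores)
    return paired


if __name__ == "__main__":
    fasta = io.StringIO(">r1 example\nAC\nGT\n")
    qual = io.StringIO(">r1\n40 40\n40 40\n")
    print(pair_seq_and_qual(fasta, qual))
```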

Does this give you some ideas?





> -Brian.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------