[Bioperl-l] Alphabet guessing

Jason Stajich jason.stajich at duke.edu
Tue Oct 18 08:06:41 EDT 2005


From the Bio::SeqIO documentation:

-alphabet

Sets the alphabet ('dna', 'rna', or 'protein'). When the alphabet is
set then Bioperl will not attempt to guess what the alphabet is. This
may be important because Bioperl does not always guess correctly.


You can pre-specify the alphabet:

$seqio = Bio::SeqIO->new(-format   => 'fasta',
                         -file     => "fifteen_million_sequence_file.fa",
                         -alphabet => 'dna');

-jason
On Oct 18, 2005, at 3:49 AM, Dmitri Bichko wrote:

> Hi,
>
> Is being unable to guess the sequence alphabet really an unrecoverable
> error?  I'm referring to this bit in PrimarySeq.pm:
>
>   my $str = $self->seq();
>   $str =~ s/[-.?x]//gi;
>   my $total = CORE::length($str);
>   if( $total == 0 ) {
>     $self->throw("Got a sequence with no letters in it ".
>       "cannot guess alphabet [$str]");
>   }
>
> Problem is that if you happen on a seq that's all X's, you get a fatal
> exception, which can be very annoying when you are in the middle of a
> 15 million sequence fasta stream (where you don't care about, nor even
> expect the alphabet type; and the docs suggest that you can't
> necessarily recover after catching exceptions).
>
> Might not something along these lines make more sense:
>
>   if( $total == 0 ) {
>     $self->warn("Got a sequence with no letters in it, ".
>       "assuming 'dna' alphabet.");
>     $self->alphabet('dna');
>     return 'dna';
>   }
>
> Or should the seqio factories catch the guessing exceptions?
>
> Thanks,
> Dmitri
>
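
To see why an all-X record trips the check quoted above, here is a minimal standalone sketch of the same stripping step. The helper name has_guessable_letters is hypothetical and not part of BioPerl. It only reuses the character class from the quoted PrimarySeq.pm snippet and reports whether anything would be left to guess from.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): applies the same
# substitution as the quoted PrimarySeq.pm code and reports whether
# any characters remain for alphabet guessing.
sub has_guessable_letters {
    my ($str) = @_;            # local copy, the caller's string is untouched
    $str =~ s/[-.?x]//gi;      # drop gaps, '?', and X/x, as in the quoted code
    return length($str) > 0 ? 1 : 0;
}

# An all-X sequence leaves nothing behind, so guessing would throw.
printf "%-10s guessable=%d\n", $_, has_guessable_letters($_)
    for ("XXXXXXXX", "ACGTNNXX");
```

A check along these lines could be used to spot the records that would raise the exception, but setting -alphabet up front avoids the guess entirely.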

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12
