[Bioperl-l] Alphabet guessing

Dmitri Bichko dbichko at aveopharma.com
Tue Oct 18 03:49:10 EDT 2005


Hi,

Is being unable to guess the sequence alphabet really an unrecoverable
error?  I'm referring to this bit in PrimarySeq.pm:

  my $str = $self->seq();
  $str =~ s/[-.?x]//gi;
  my $total = CORE::length($str);
  if( $total == 0 ) {
    $self->throw("Got a sequence with no letters in it ". 
      "cannot guess alphabet [$str]");
  }

The problem is that if you happen upon a sequence that is all X's, you get
a fatal exception. That can be very annoying in the middle of a
15-million-sequence FASTA stream, where you neither care about nor even
expect the alphabet type. On top of that, the docs suggest that you can't
necessarily recover after catching exceptions.
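A quick standalone check of the stripping step quoted above shows why an
all-X record reaches the throw: every character is removed before the
length test. This is plain Perl with no BioPerl needed, and
`residue_count` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Count the residues that survive the same stripping step as the quoted
# PrimarySeq.pm code: '-', '.', '?' and x/X are removed case-insensitively.
sub residue_count {
    my ($str) = @_;          # work on a copy of the caller's string
    $str =~ s/[-.?x]//gi;
    return length $str;
}

for my $seq ('ACGTACGT', 'xxXX--..', 'XXXXXXXXXX') {
    my $n = residue_count($seq);
    printf "%-12s %2d %s\n", $seq, $n,
        $n == 0 ? '(would throw)' : '(guessable)';
}
```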

Might not something along these lines make more sense: 

  if( $total == 0 ) {
    $self->warn("Got a sequence with no letters in it, assuming 'dna' alphabet.");
    $self->alphabet('dna');
    return 'dna';
  }
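A minimal standalone sketch of that fallback, written as a plain function
rather than the real method. `guess_alphabet` is a hypothetical name, and
the nucleotide-fraction test with its 0.85 cutoff is a simplified
assumption standing in for the existing guessing code, not a copy of it.
Only the empty-sequence branch reflects the suggestion above.

```perl
use strict;
use warnings;

# Hypothetical stand-alone guesser. Only the empty-sequence branch follows
# the proposal; the 0.85 nucleotide-fraction cutoff is an assumed,
# simplified heuristic, not BioPerl's actual code.
sub guess_alphabet {
    my ($seq) = @_;
    (my $str = $seq) =~ s/[-.?x]//gi;    # same stripping as PrimarySeq.pm
    my $total = length $str;
    if ($total == 0) {
        # Proposed behaviour: warn and default instead of dying.
        warn "Got a sequence with no letters in it, assuming 'dna' alphabet.\n";
        return 'dna';
    }
    my $u    = ($str =~ tr/Uu//);           # count uracil
    my $atgc = ($str =~ tr/ATGCNatgcn//);   # count other nucleotide codes
    return 'dna' if $atgc / $total > 0.85;
    return 'rna' if ($atgc + $u) / $total > 0.85;
    return 'protein';
}

for my $seq ('XXXXXXXX', 'ACGTACGT', 'ACGUACGU', 'MKVLHEEW') {
    printf "%-10s %s\n", $seq, guess_alphabet($seq);
}
```

The all-X record now produces a warning on STDERR and a 'dna' label
instead of killing the loop.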

Or should the SeqIO factories catch the alphabet-guessing exceptions?
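For comparison, the caller-side alternative amounts to wrapping each guess
in `eval` and moving on. This sketch works on plain (id, sequence) pairs;
`strict_guess` and `process_records` are hypothetical names, and
`strict_guess` merely dies the way the quoted code throws. It does not
show whether a Bio::SeqIO stream is still usable after such an exception,
which is the open question here.

```perl
use strict;
use warnings;

# Hypothetical guesser that dies the way the quoted PrimarySeq.pm code
# throws; real alphabet detection is omitted and 'dna' is a placeholder.
sub strict_guess {
    my ($seq) = @_;
    (my $str = $seq) =~ s/[-.?x]//gi;
    die "Got a sequence with no letters in it, cannot guess alphabet\n"
        unless length $str;
    return 'dna';
}

# Trap guessing failures per record so one all-X sequence does not
# abort the whole run; failed ids are collected and reported.
sub process_records {
    my @records = @_;
    my (%count, @skipped);
    for my $rec (@records) {
        my ($id, $seq) = @$rec;
        my $alphabet = eval { strict_guess($seq) };
        unless (defined $alphabet) {
            my $err = $@ || "unknown error\n";
            warn "skipping $id: $err";
            push @skipped, $id;
            next;
        }
        $count{$alphabet}++;
    }
    return (\%count, \@skipped);
}

my ($count, $skipped) = process_records(
    [ seq1 => 'ACGTACGT' ],
    [ seq2 => 'XXXXXXXX' ],
    [ seq3 => 'GGCCTTAA' ],
);
printf "processed: %d, skipped: %s\n", $count->{dna}, join(',', @$skipped);
```

Testing the return value of `eval` rather than `$@` alone is the safer
idiom, since `$@` can be clobbered by destructors running during unwind.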

Thanks,
Dmitri
