[Biopython-dev] Determining if seq alphabet is protein/dna/rna
Peter (BioPython Dev)
biopython-dev at maubp.freeserve.co.uk
Mon Oct 30 00:13:57 UTC 2006
Hello all,
I've been looking at writing multiple sequence alignments in Nexus
format for the new Bio.SeqIO code, and came up with the following little
problem:
Given one or more Seq objects, how can I reliably decide if they are
protein, DNA, or RNA?
(These are the relevant choices in a Nexus file's format datatype=...
header.)
I'm resigned to the fact that if the Seq object has the generic alphabet,
this boils down to looking at the sequence strings and making an
educated guess (probably following an established algorithm from an
alignment program). Does any such code already exist in Biopython?
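To make the "educated guess" concrete, here is a rough sketch of what such a fallback could look like. It is not taken from any alignment program; the function name guess_sequence_type, the 90% threshold, and the set of ignored gap characters are all assumptions made up for illustration.

```python
def guess_sequence_type(sequences, threshold=0.9):
    """Guess "protein", "dna" or "rna" from raw sequence strings.

    Hypothetical heuristic: if at least `threshold` of the residues are
    nucleotide letters, call it nucleotide; then use U versus T to pick
    RNA or DNA. The threshold value is an arbitrary assumption.
    """
    ignored = set("-.?*")  # assumed gap / missing-data / stop characters
    letters = [
        c
        for seq in sequences
        for c in str(seq).upper()
        if c not in ignored and not c.isspace()
    ]
    if not letters:
        raise ValueError("no residues to inspect")

    # Count letters that are plausible in a nucleotide sequence.
    nucleotide_count = sum(1 for c in letters if c in "ACGTUN")
    if nucleotide_count / len(letters) < threshold:
        return "protein"

    # U without T suggests RNA; everything else is treated as DNA.
    if "U" in letters and "T" not in letters:
        return "rna"
    return "dna"


print(guess_sequence_type(["ACGT-ACGT", "ACGTTACGA"]))
print(guess_sequence_type(["ACGU-ACGU"]))
print(guess_sequence_type(["MKVLAAGIVG"]))
```

A short protein made up mostly of A, C, G and T residues would be misclassified by this, which is why it is only a guess of last resort.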
However, is there a nice/official way to ask an alphabet object what it
is (protein, DNA, RNA)?
Looking over the code in Bio.Alphabet, the only thing I can think of is
to get the class name as a string and search it(!). We can't look at the
letters property, as this is None for the base classes like ProteinAlphabet.
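One possible alternative to searching the class name is an isinstance test against the base classes. The sketch below uses stand-in classes named after those in Bio.Alphabet so that it runs on its own; the stand-ins, the helper name alphabet_kind, and the example subclass ExampleDNA are hypothetical, and whether isinstance covers every alphabet in the real module is an assumption.

```python
# Stand-in classes mimicking the Bio.Alphabet base classes (assumed
# hierarchy, reproduced here so the snippet runs without Biopython).
class Alphabet:
    letters = None


class SingleLetterAlphabet(Alphabet):
    pass


class ProteinAlphabet(SingleLetterAlphabet):
    pass


class NucleotideAlphabet(SingleLetterAlphabet):
    pass


class DNAAlphabet(NucleotideAlphabet):
    pass


class RNAAlphabet(NucleotideAlphabet):
    pass


class ExampleDNA(DNAAlphabet):
    """Hypothetical concrete alphabet that does define its letters."""
    letters = "GATC"


def alphabet_kind(alphabet):
    """Return "protein", "dna", "rna", or None if it cannot be told."""
    if isinstance(alphabet, ProteinAlphabet):
        return "protein"
    if isinstance(alphabet, DNAAlphabet):
        return "dna"
    if isinstance(alphabet, RNAAlphabet):
        return "rna"
    # Generic or plain nucleotide alphabets carry no usable information.
    return None


print(alphabet_kind(ExampleDNA()))
print(alphabet_kind(ProteinAlphabet()))
print(alphabet_kind(Alphabet()))
```

This works even when letters is None, since it relies only on the class hierarchy, but it returns None for the generic alphabet and for a bare NucleotideAlphabet.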
If we are prepared to meddle with the alphabet system, we might add
attributes like "isProtein", "isNucleotide", "isRNA", "isDNA" to these
base classes. Alternatively, we could simply have a "sequence_type"
method, which the subclasses can re-define as required.
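As a sketch of the "sequence_type" idea, assuming stand-in classes rather than the real Bio.Alphabet code: the base class returns None and each subclass overrides the method. The return strings and the helper nexus_datatype are hypothetical choices for illustration, not an existing API.

```python
# Stand-in classes; the sequence_type method is the proposed addition
# and does not exist in Bio.Alphabet as described in this message.
class Alphabet:
    letters = None

    def sequence_type(self):
        # Generic alphabet: the type is unknown.
        return None


class ProteinAlphabet(Alphabet):
    def sequence_type(self):
        return "protein"


class NucleotideAlphabet(Alphabet):
    def sequence_type(self):
        return "nucleotide"


class DNAAlphabet(NucleotideAlphabet):
    def sequence_type(self):
        return "dna"


class RNAAlphabet(NucleotideAlphabet):
    def sequence_type(self):
        return "rna"


def nexus_datatype(alphabets):
    """Return one of "protein", "dna", "rna" if all alphabets agree.

    Returns None for mixed, generic, or ambiguous nucleotide alphabets,
    in which case the caller would have to guess from the sequences.
    """
    kinds = {alphabet.sequence_type() for alphabet in alphabets}
    if len(kinds) == 1:
        kind = kinds.pop()
        if kind in ("protein", "dna", "rna"):
            return kind
    return None


print(nexus_datatype([DNAAlphabet(), DNAAlphabet()]))
print(nexus_datatype([DNAAlphabet(), ProteinAlphabet()]))
```

A single method keeps the question in one place, whereas four separate "is..." attributes would each need to be kept consistent across subclasses.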
(I wasn't meaning to reopen the whole "do we need alphabets"
conversation last discussed in July 2006. At least, not yet...)
Peter