[Biopython] StockholmIO replaces "." with "-", why?

Peter biopython at maubp.freeserve.co.uk
Fri Apr 9 09:21:03 EDT 2010


On Fri, Apr 9, 2010 at 1:51 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
>
> Just curious, b/c this is a point of contention in BioPerl.  How does BioPython
> internally set what symbols correspond to residues/gaps/frameshifts/other?
> BioPerl retains the original sequence but uses regexes for validation and
> methods that return symbol-related information (e.g. gap counts).
>
> (BTW, the contention here isn't that we use regexes, but that we set them globally).
>
> chris

Hi Chris,

The short answer is that gaps are "-" by default and stop codons are "*";
beyond that, it is down to user code to interpret any other symbols.

Our sequences have an alphabet object which can specify the letters (as
a set of expected characters), with explicit support for a single gap
character (usually "-"), and for proteins a single stop codon symbol (usually
"*"). This could in theory be extended to define other symbols too. The gap
character does get special treatment in some of the alignment code (e.g.
when calling a consensus), but I don't think we have anything built in for
frameshifts.
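
As a rough illustration of the idea, here is a minimal sketch in plain
Python. It is not Biopython's actual alphabet code: the names SketchAlphabet,
count_gaps and simple_consensus, and the "X" placeholder for ambiguous
columns, are all made up for this example. It only shows how a set of
expected letters plus one gap character and one stop symbol could be used to
flag odd symbols (such as ".") and to skip gaps when calling a consensus.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchAlphabet:
    """Hypothetical stand-in for an alphabet object (not Biopython's class)."""

    letters: frozenset                  # expected residue characters
    gap_char: Optional[str] = "-"       # single gap character, if any
    stop_symbol: Optional[str] = None   # single stop codon symbol, if any

    def unexpected(self, sequence):
        """Return the sorted symbols in sequence that the alphabet does not define."""
        allowed = set(self.letters)
        if self.gap_char:
            allowed.add(self.gap_char)
        if self.stop_symbol:
            allowed.add(self.stop_symbol)
        return sorted(set(sequence) - allowed)


def count_gaps(sequence, alphabet):
    """Count occurrences of the alphabet's gap character in sequence."""
    if not alphabet.gap_char:
        return 0
    return sequence.count(alphabet.gap_char)


def simple_consensus(rows, alphabet, ambiguous="X"):
    """Pick the most common non-gap letter per column.

    Columns that are all gaps, or where the top two letters tie,
    get the ambiguous placeholder instead.
    """
    consensus = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != alphabet.gap_char)
        ranked = counts.most_common(2)
        tie = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        if not ranked or tie:
            consensus.append(ambiguous)
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


protein = SketchAlphabet(
    letters=frozenset("ACDEFGHIKLMNPQRSTVWY"),
    gap_char="-",
    stop_symbol="*",
)
```

With this sketch, protein.unexpected("MK.V*") reports "." as a symbol the
alphabet does not know about, which is the kind of case where user code
(or a parser) has to decide whether to treat it as a gap.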

Peter


