[Biopython] StockholmIO replaces "." with "-", why?

Peter biopython at maubp.freeserve.co.uk
Fri Apr 9 13:30:55 UTC 2010


On Fri, Apr 9, 2010 at 2:09 PM, Ivan Rossi <ivan at biodec.com> wrote:
>
> On Fri, 9 Apr 2010, Peter wrote:
>
>> So a Stockholm file using a mixture of "." and "-" would be
>> valid but a bit odd. Why would anyone do that?
>
> IIRC the "." are used for "gaps" at the extremes of sequences in an MSA. When
> you do local sequence alignments, like BLAST and most HMMs do, gaps at the
> extremes of sequences do not pay the usual penalty for gap opening. So the
> Stockholm format distinguishes between gaps for which you paid a price during
> the alignment ("-") and gaps-for-free (".") which are there just to pad each
> row to the MSA width.

So internal gaps (true gaps), versus leading or trailing padding. That makes
sense - and is certainly how PFAM does things according to their FAQ:

Quoting from http://pfam.sanger.ac.uk/help#tabview=tab3
>>> What is the difference between the - and . characters in your full alignments ?
>>>
>>> The '-' and '.' characters both represent gap characters. However they
>>> do tell you some extra information about how the HMM has generated
>>> the alignment. The '-' symbols are where the alignment of the sequence
>>> has used a delete state in the HMM to jump past a match state. This
>>> means that the sequence is missing a column that the HMM was
>>> expecting to be there. The '.' character is used to pad gaps where one
>>> sequence in the alignment has sequence from the HMMs insert state.
>>> See the alignment below where both characters are used. The HMM
>>> states emitting each column are shown. Note that residues emitted
>>> from the Insert (I) state are in lower case.
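
To make the two readings concrete, here is a minimal sketch in plain Python
(no Biopython needed). It assumes the convention in the FAQ text quoted above:
"-" marks a delete state, "." pads against another sequence's insert, and
lower case marks an inserted residue. The function names and the example rows
are made up for illustration, not part of any library.

```python
def classify_row(row):
    """Label each character of one aligned row.

    Assumption: '-' is a delete state, '.' is padding opposite an
    insert, lower case is an inserted residue, anything else is a match.
    """
    labels = []
    for ch in row:
        if ch == "-":
            labels.append("delete")
        elif ch == ".":
            labels.append("pad")
        elif ch.islower():
            labels.append("insert")
        else:
            labels.append("match")
    return labels


def terminal_gap_lengths(row, gap_chars="-."):
    """Return (leading, trailing) gap run lengths for one aligned row.

    This covers the other reading, where the gaps of interest are the
    ones at the two ends of the row.
    """
    leading = len(row) - len(row.lstrip(gap_chars))
    trailing = len(row) - len(row.rstrip(gap_chars))
    return leading, trailing


if __name__ == "__main__":
    print(classify_row("AC-g.T"))
    print(terminal_gap_lengths("..AC-GT."))
```

The first function follows the per-column reading in the FAQ, the second the
leading/trailing reading, so running both on the same row shows where the two
interpretations disagree.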

I wonder why this doesn't get mentioned anywhere in the format definitions:
http://sonnhammer.sbc.su.se/Stockholm.html
http://en.wikipedia.org/wiki/Stockholm_format
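
For reference, the kind of replacement in the subject line can be sketched as
a one-line normalisation. This is only an illustration of the idea, not
Biopython's actual StockholmIO code; the function name and the example rows
are hypothetical.

```python
def normalise_gaps(row):
    """Map every '.' in an aligned row to '-' (a sketch, not library code)."""
    return row.replace(".", "-")


if __name__ == "__main__":
    a = "..AC-gT"
    b = "-.AC.gT"
    # Two rows that differ only in which gap character they use
    # become identical after normalisation.
    print(normalise_gaps(a))
    print(normalise_gaps(a) == normalise_gaps(b))
```

Rows that differ only in their choice of gap character collapse to the same
string, so the "." versus "-" distinction cannot be recovered afterwards.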

Peter


