[Biopython] StockholmIO replaces "." with "-", why?

Bryan Lunt lunt at ctbp.ucsd.edu
Fri Apr 9 15:55:31 UTC 2010


Hello Peter,
The HMMER suite of tools and the Pfam website use "-" to indicate that
the HMM visited a deletion state for this sequence, and "." to indicate
that the HMM visited an insertion state for some other sequence; the
"." gap is only added to keep this sequence aligned with that insert.


>foo
AA...BBB---CCC
>bar
AAbazBBBDDDCCC

In this example, the sequence "foo" lacks the DDD section of the
profile HMM (a deletion, shown as "-"). The second sequence, "bar",
has not only the full model but also an insert, "baz", that is not
part of the HMM, for example an extra-long loop; "foo" is padded
with "." opposite that insert.
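To make the distinction concrete, here is a minimal sketch that labels each column of the example above as a model (match) column or an insert column. It assumes, as in HMMER's aligned output, that inserted residues are written in lowercase and that "." only appears in insert columns; the function names classify_columns and match_states_only are hypothetical, not part of Biopython or HMMER.

```python
def classify_columns(rows):
    """Label each alignment column as 'match' or 'insert'.

    Assumption: insert columns hold lowercase residues or '.',
    while match columns hold uppercase residues or '-'.
    """
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("all rows must have the same length")
    labels = []
    for i in range(length):
        column = [r[i] for r in rows]
        if any(c == "." or c.islower() for c in column):
            labels.append("insert")
        else:
            labels.append("match")
    return labels


def match_states_only(row, labels):
    """Drop insert columns, keeping only columns aligned to the model."""
    return "".join(c for c, lab in zip(row, labels) if lab == "match")


foo = "AA...BBB---CCC"
bar = "AAbazBBBDDDCCC"
labels = classify_columns([foo, bar])
print(match_states_only(foo, labels))  # AABBB---CCC
print(match_states_only(bar, labels))  # AABBBDDDCCC
```

Removing the insert columns leaves "foo" with its "-" deletion intact, which is the part of the alignment that maps onto the profile HMM.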

I hope this helps...
-Bryan

On Fri, Apr 9, 2010 at 5:08 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Apr 8, 2010 at 9:04 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> On Thu, Apr 8, 2010 at 1:57 AM, Bryan Lunt <lunt at ctbp.ucsd.edu> wrote:
>>> Greetings All!
>>>
>>> It looks like line 364 of Bio.AlignIO.StockholmIO reads:
>>>
>>> seqs[id] += seq.replace(".","-")
>>>
>>> So when you load alignments that use "." to mark the gaps created
>>> to align other sequences' inserts (such as Pfam alignments or
>>> HMMER output), that information is lost.
>>>
>>> I know there must be a good reason for this, but I am finding it a
>>> problem on my end.
>>>
>>> -Bryan Lunt
>>
>> Hi Bryan,
>>
>> Yes, it is done deliberately. The dot is a problem: it has a quite
>> specific meaning of "same as above" in other alignment file
>> formats, while "-" is an almost universal shorthand for gap/insertion.
>> Consider the use case of Stockholm to PHYLIP/FASTA/Clustal
>> conversion.
>>
>> Have you got a sample output file we can use as a unit test, or
>> at least discuss? As I recall, on the Pfam alignments I looked
>> at there was no data loss from the dot-to-dash mapping.
>
> According to http://sonnhammer.sbc.su.se/Stockholm.html
>>> Sequence letters may include any characters except
>>> whitespace. Gaps may be indicated by "." or "-".
>
> So a Stockholm file using a mixture of "." and "-" would be
> valid but a bit odd. Why would anyone do that?
>
> Peter
>
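One way to reconcile both concerns in this thread, sketched here as an assumption rather than anything Biopython does, is to keep the raw "." and "-" characters when reading a Stockholm record and only apply the dot-to-dash mapping when the caller asks for it (for example before writing PHYLIP/FASTA/Clustal). The reader below is a hypothetical minimal sketch, not StockholmIO: it skips every "#" annotation line, stops at "//", and concatenates rows that share an id.

```python
def read_stockholm_rows(lines, keep_dots=True):
    """Collect aligned sequence rows from one Stockholm record.

    With keep_dots=False this applies the same '.' -> '-' mapping
    as the StockholmIO line quoted earlier in the thread.
    """
    seqs = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, header, or markup annotation
        if line == "//":
            break  # end of this alignment
        seq_id, seq = line.split(None, 1)
        seq = seq.strip()
        if not keep_dots:
            seq = seq.replace(".", "-")
        # Interleaved (multi-block) records repeat each id, so append.
        seqs[seq_id] = seqs.get(seq_id, "") + seq
    return seqs


record = """# STOCKHOLM 1.0
foo AA...BBB---CCC
bar AAbazBBBDDDCCC
//
""".splitlines()

print(read_stockholm_rows(record)["foo"])                   # AA...BBB---CCC
print(read_stockholm_rows(record, keep_dots=False)["foo"])  # AA---BBB---CCC
```

Deferring the mapping this way keeps the insert/deletion distinction available in memory while still producing "-"-only gaps for formats where "." would be misread.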