[Biopython] StockholmIO replaces "." with "-", why?

Peter biopython at maubp.freeserve.co.uk
Fri Apr 9 12:08:03 UTC 2010


On Thu, Apr 8, 2010 at 9:04 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Apr 8, 2010 at 1:57 AM, Bryan Lunt <lunt at ctbp.ucsd.edu> wrote:
>> Greetings All!
>>
>> It looks like line 364 of Bio.AlignIO.StockholmIO reads:
>>
>> seqs[id] += seq.replace(".","-")
>>
>> So when you load into memory alignments that use "." to mark gaps
>> created to allow alignment to inserts (such as PFam alignments or the
>> output of hmmer), that information is lost.
>>
>> I know there must be a good reason for this, but I am finding it a
>> problem on my end.
>>
>> -Bryan Lunt
>
> Hi Bryan,
>
> Yes, it is done deliberately. The dot is a problem: it has a quite
> specific meaning of "same as above" in other alignment file
> formats, while "-" is an almost universal shorthand for gap/insertion.
> Consider the use case of Stockholm to PHYLIP/FASTA/Clustal
> conversion.
>
> Have you got a sample output file we can use as a unit test or
> at least discuss? As I recall, on the PFAM alignments I looked
> at there was no data loss by doing the dot to dash mapping.
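To make the effect of the quoted line concrete, here is a minimal sketch that reads the sequence rows of a small, made-up Pfam-style Stockholm alignment with and without the dot-to-dash mapping. `read_stockholm_rows` and its `normalise_gaps` flag are hypothetical names for illustration, not Biopython's parser or API; the helper only handles plain "name sequence" rows and ignores "#" markup and the "//" terminator.

```python
def read_stockholm_rows(lines, normalise_gaps=True):
    """Collect the sequence rows of a single Stockholm alignment.

    Hypothetical helper for illustration only (not Biopython's parser):
    it handles plain "name sequence" rows, concatenates interleaved blocks,
    and skips blank lines, "#" markup lines and the "//" terminator.
    """
    seqs = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or line == "//":
            continue
        seq_id, seq = line.split(None, 1)
        if normalise_gaps:
            # Same mapping as the quoted StockholmIO line: "." becomes "-"
            seq = seq.replace(".", "-")
        seqs[seq_id] = seqs.get(seq_id, "") + seq
    return seqs


# Made-up Pfam-style example: "-" marks a deletion in a match column,
# "." pads the insert columns where seq2 has extra (lowercase) residues.
SAMPLE = """# STOCKHOLM 1.0
#=GF ID example
seq1  AC-Dk.
seq2  ACGDkm

seq1  EF
seq2  EF
//""".splitlines()

print(read_stockholm_rows(SAMPLE))
# {'seq1': 'AC-Dk-EF', 'seq2': 'ACGDkmEF'}
print(read_stockholm_rows(SAMPLE, normalise_gaps=False))
# {'seq1': 'AC-Dk.EF', 'seq2': 'ACGDkmEF'}
```

In this toy case the insert residues stay lowercase after the mapping, so the insert columns can still be spotted; the sketch does not show whether that holds for every Pfam or hmmer file.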

According to http://sonnhammer.sbc.su.se/Stockholm.html
>> Sequence letters may include any characters except
>> whitespace. Gaps may be indicated by "." or "-".

So a Stockholm file using a mixture of "." and "-" would be
valid but a bit odd. Why would anyone do that?
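Bryan's description of PFam and hmmer output suggests one answer: "." for gaps opposite insert residues, "-" elsewhere. A quick way to check whether a particular file really mixes the two symbols is sketched below. `gap_characters` is a hypothetical helper, not part of Biopython, and like any minimal sketch it only looks at plain sequence rows, skipping "#" markup and the "//" terminator.

```python
def gap_characters(lines):
    """Return which gap symbols ("." and/or "-") appear in Stockholm sequence rows.

    Minimal sketch: every non-blank line that is not "#" markup or the
    "//" terminator is treated as a "name sequence" row.
    """
    used = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or line == "//":
            continue
        _name, seq = line.split(None, 1)
        used.update(ch for ch in seq if ch in ".-")
    return used


# Made-up examples: one file mixing both symbols, one using "-" only
MIXED = ["# STOCKHOLM 1.0", "seq1  AC-Dk.EF", "seq2  ACGDkmEF", "//"]
DASH_ONLY = ["# STOCKHOLM 1.0", "seq1  AC-DEF", "seq2  ACGDEF", "//"]

print(sorted(gap_characters(MIXED)))      # ['-', '.']
print(sorted(gap_characters(DASH_ONLY)))  # ['-']
```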

Peter


