[Biopython] StockholmIO replaces "." with "-", why?

Peter biopython at maubp.freeserve.co.uk
Thu Apr 8 08:04:27 UTC 2010


On Thu, Apr 8, 2010 at 1:57 AM, Bryan Lunt <lunt at ctbp.ucsd.edu> wrote:
> Greetings All!
>
> It looks like line 364 of Bio.AlignIO.StockholmIO reads:
>
> seqs[id] += seq.replace(".","-")
>
> So when you load into memory alignments that mark gaps created to
> allow alignment to inserts with ".", (such as PFam alignments or the
> output of hmmer) that information is lost.
>
> I know there must be a good reason for this, but I am finding it a
> problem on my end..
>
> -Bryan Lunt

Hi Bryan,

Yes, it is done deliberately. The dot is a problem - it has a quite
specific meaning of "same as above" in other alignment file
formats, while "-" is an almost universal shorthand for gap/insertion.
Consider the use case of Stockholm to PHYLIP/FASTA/Clustal
conversion.
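
To make the ambiguity concrete, here is a minimal pure-Python sketch. It is
not Biopython code, and both helper names are made up for illustration. The
first helper applies the same dot to dash replacement as the line quoted
above; the second shows how a reader for a format where "." means "same as
above" might interpret the identical string.

```python
def dots_to_gaps(seq):
    """Map '.' to '-', mirroring the quoted seq.replace(".", "-") call."""
    return seq.replace(".", "-")


def expand_match_dots(reference, seq):
    """Hypothetical reader for a format where '.' means 'same residue as
    the reference (first) sequence in this column'."""
    if len(reference) != len(seq):
        raise ValueError("aligned sequences must have equal length")
    return "".join(r if s == "." else s for r, s in zip(reference, seq))


reference = "ACDEFGH"
row = "AC..FGH"

print(dots_to_gaps(row))                  # AC--FGH
print(expand_match_dots(reference, row))  # ACDEFGH
```

The same row is either two gap columns or a full copy of the reference,
depending on which convention the reader assumes, which is why writing the
dots out unchanged to another format would be risky.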

Have you got a sample output file we can use as a unit test or
at least discuss? As I recall, on the PFAM alignments I looked
at there was no data loss by doing the dot to dash mapping.
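
One rough way to check a given file for this kind of data loss is to see
whether its sequence rows use both "." and "-". The sketch below is only an
assumption about how such a check could look: the function name is made up,
the sample alignment is invented, and the parsing is deliberately simplistic
(it skips "#" markup lines and the "//" terminator and treats every other
non-blank line as "name sequence").

```python
def gap_chars_used(stockholm_text):
    """Return the set of gap characters ('.' and/or '-') that appear in
    the sequence rows of a Stockholm-formatted string."""
    used = set()
    for line in stockholm_text.splitlines():
        line = line.strip()
        # Skip blank lines, markup/annotation lines, and the terminator.
        if not line or line.startswith("#") or line == "//":
            continue
        parts = line.split(None, 1)  # name, then aligned sequence
        if len(parts) == 2:
            used.update(ch for ch in parts[1] if ch in ".-")
    return used


# Invented example data, not taken from a real PFAM or hmmer file.
sample = """# STOCKHOLM 1.0
seq1  AC..FGH
seq2  AC-.FGH
seq3  ACDEFGH
//
"""

if gap_chars_used(sample) == {".", "-"}:
    print("both '.' and '-' are used; mapping '.' to '-' merges them")
```

If only one of the two characters appears, the replacement changes the
symbol but cannot merge two different kinds of gap.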

Peter


