[Biopython] Struggling with MSA...

Dan Bolser dan.bolser at outsee.co.uk
Thu Feb 15 10:09:58 EST 2024


Sorry, I can't follow the docs (or find the right docs).

I've got the 'seed' Stockholm alignment for this domain:

and I'm trying to reproduce the signature it shows here:

I'm not sure a) why the probabilities in the profile differ from the
frequencies in the seed alignment, or b) how to keep only the alignment
columns that have a match state in the model (see columns 4-6 in the
alignment, which are gaps in the model).
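
To make b) concrete, here is a minimal sketch of the kind of filtering I
mean. It assumes the alignment comes with a per-column reference line (the
"#=GC RF" annotation that HMMER-style Stockholm files can carry), where gap
characters mark insert columns; I have not confirmed that the Pfam seed file
has one. The function names and the toy data are made up for illustration.
With Biopython the rows would be str(record.seq) for each record, and I think
the RF string may be available through align.column_annotations, but that is
an assumption I still need to check.

```python
# Characters treated as "no match state" in the reference (RF) line.
GAP_CHARS = set(".-~")


def match_column_indices(rf):
    """Return the 0-based indices of columns whose RF character is not a gap."""
    return [i for i, ch in enumerate(rf) if ch not in GAP_CHARS]


def keep_match_columns(rows, rf):
    """Drop insert columns from every aligned row (rows are equal-length strings)."""
    keep = match_column_indices(rf)
    return ["".join(row[i] for i in keep) for row in rows]


# Toy data, not taken from PF08241: columns 4-6 (1-based) are inserts.
rf = "xxx...xx"
rows = ["ACDefgHI", "ACD...HI", "A-Dk..HI"]
filtered = keep_match_columns(rows, rf)
print(filtered)
```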

I think if I can answer b) then the answer to a) will be, "look at the full

Here is my crude 'best guess' code:

import gzip
import Bio.AlignIO

# msa = "PF08241.alignment.full.gz"
msa = "PF08241.alignment.seed.gz"

# Open in text mode ("rt") so the Stockholm parser receives str, not bytes.
with gzip.open(msa, "rt") as handle:
    align = Bio.AlignIO.read(handle, "stockholm")
ncols = align.get_alignment_length()

for col in range(ncols):
    # Count each symbol in this column (gap characters are counted too).
    amino_acids = dict()
    for s in align[:, col]:
        amino_acids[s] = amino_acids.get(s, 0) + 1
    # Print the count and the fraction of sequences for each symbol.
    for s in amino_acids:
        print(f"{s}: {amino_acids[s]:3d} {amino_acids[s] / len(align):.3f}")

I have the feeling I'm doin it rong...

The above is just a warm-up; what I really want is the conservation score,
residue by residue, for a given protein in the alignment (where it matches the
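
Roughly what I have in mind, as a sketch only: score each column by one minus
its Shannon entropy (normalised by log2(20), which is my own choice of
scaling), then walk along one sequence and report the score at each of its
non-gap positions. This ignores sequence weighting and pseudocounts, so I
would not expect it to match the profile, and the function names and toy
rows are hypothetical. Rows are plain strings, e.g. str(record.seq) for each
record in the alignment.

```python
import math
from collections import Counter

GAP_CHARS = set(".-~")


def column_conservation(column):
    """Return 1 - normalised Shannon entropy of the non-gap residues (1.0 = invariant)."""
    residues = [c.upper() for c in column if c not in GAP_CHARS]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(20)


def per_residue_conservation(rows, target_index):
    """List (position in the ungapped target, residue, score) for one sequence."""
    scores = []
    position = 0
    for col in range(len(rows[0])):
        residue = rows[target_index][col]
        if residue in GAP_CHARS:
            continue  # the target has no residue in this column
        position += 1
        column = [row[col] for row in rows]
        scores.append((position, residue, column_conservation(column)))
    return scores


# Toy alignment, not real data.
rows = ["AC-D", "ACGD", "AT-D"]
scores = per_residue_conservation(rows, 0)
for position, residue, score in scores:
    print(f"{position:4d} {residue} {score:.3f}")
```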

Many thanks for any suggestions, and sorry for not being able to find the
right document to answer these questions.
