[Biopython] Struggling with MSA...

Thu Feb 15 10:09:58 EST 2024

Hi,

Sorry, I can't follow the docs (or find the right docs).

I've got the 'seed' stockholm alignment for this domain:
https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/entry_alignments/?type=seed

and I'm trying to reproduce the signature it shows here:
https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/logo/

I'm not sure a) why the probabilities in the profile differ from those in
the seed alignment, or b) how to keep only the alignment columns that
correspond to a match state in the model (see columns 4-6 in the alignment,
which are gaps in the model).

I think if I can answer b) then the answer to a) will be, "look at the full
alignment".

Here is my crude 'best guess' code:

import gzip
import Bio.AlignIO

# msa = "PF08241.alignment.full.gz"
msa = "PF08241.alignment.seed.gz"

# Read the gzipped Stockholm file in text mode
with gzip.open(msa, "rt") as handle:
    align = Bio.AlignIO.read(handle, "stockholm")
ncols = align.get_alignment_length()

for col in range(ncols):
    # Count each character (residues and gaps) in this column
    amino_acids = dict()
    for s in align[:, col]:
        amino_acids[s] = amino_acids.get(s, 0) + 1
    print(amino_acids)
    # Print the count and the fraction of sequences for each character
    for s in amino_acids:
        print(f"{s}: {amino_acids[s]:3d} {amino_acids[s] / len(align):.3f}")
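
For (b), here is a minimal sketch of one possible column filter. It rests on an assumption, not on anything the Pfam pages confirm: a column is treated as "match-like" when at least half of the rows have a residue in it. The model behind the logo may have been built with a different rule, so the kept columns may not line up exactly. The names `match_like_columns`, `column_counts` and `min_residue_fraction` are made up for this illustration, and the four-row alignment is toy data.

```python
from collections import Counter

# Characters treated as gaps in Stockholm-style alignments
GAP_CHARS = "-."


def column_counts(rows):
    """Return one Counter per alignment column (gap characters included)."""
    return [Counter(col) for col in zip(*rows)]


def match_like_columns(rows, min_residue_fraction=0.5):
    """Return indices of columns where enough rows carry a residue.

    Assumption: "enough" means at least min_residue_fraction of the rows.
    This is a heuristic stand-in for the model's real match-state columns.
    """
    nrows = len(rows)
    keep = []
    for i, col in enumerate(zip(*rows)):
        residues = sum(1 for c in col if c not in GAP_CHARS)
        if residues / nrows >= min_residue_fraction:
            keep.append(i)
    return keep


# Toy alignment: the third column is mostly gaps
rows = ["MK-VL", "MR-VL", "MKAIL", "-K-VL"]
keep = match_like_columns(rows)
counts = column_counts(rows)
for i in keep:
    print(i, dict(counts[i]))
```

With a real alignment read by Bio.AlignIO, the rows could be built with `rows = [str(rec.seq) for rec in align]`.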

I have the feeling I'm doin it rong...

The above is just a 'warm up'. What I really want is the conservation score,
residue by residue, for a given protein in the alignment (where it matches
the model).
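
A rough sketch of what such a per-residue score could look like, assuming "conservation" means the information content of a column (log2 of the alphabet size minus the Shannon entropy of the observed residues, gaps ignored). This is an assumed definition, not necessarily what the logo page plots: it uses raw counts with no sequence weighting, priors, or background frequencies, which may be part of why the numbers differ. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

# Characters treated as gaps in Stockholm-style alignments
GAP_CHARS = "-."


def column_information(column, alphabet_size=20):
    """Information content in bits of one column, ignoring gaps.

    Returns 0.0 for an all-gap column. Note that a column with a single
    residue scores as fully conserved, which can be misleading.
    """
    residues = [c.upper() for c in column if c not in GAP_CHARS]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum(
        (n / total) * math.log2(n / total) for n in Counter(residues).values()
    )
    return math.log2(alphabet_size) - entropy


def per_residue_scores(rows, target_index):
    """Return (position, residue, score) for each residue of one sequence.

    Positions are 1-based and count only the target's own residues, so
    columns where the target has a gap are skipped.
    """
    scores = []
    position = 0
    for col in zip(*rows):
        aa = col[target_index]
        if aa in GAP_CHARS:
            continue
        position += 1
        scores.append((position, aa, column_information(col)))
    return scores


# Toy alignment; score the first sequence
rows = ["MK-VL", "MR-VL", "MKAIL", "-K-VL"]
scores = per_residue_scores(rows, 0)
for position, aa, score in scores:
    print(f"{position:3d} {aa} {score:.3f}")
```

To restrict this to positions that match the model, the columns would first need to be filtered by whatever rule defines a match column.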

Many thanks for any suggestions, and sorry for not being able to find the
right document to answer these questions.

kthxbi,
Dan.