[Biopython] Struggling with MSA...
Peter Cock
p.j.a.cock at googlemail.com
Thu Feb 15 10:28:43 EST 2024
Hello Dan,
It that is a probability at the end, you have the wrong denominator -
should be len(align) == sum(amino_acids.values()
Note this seed alignment has 164 columns, yet the model page says the
model has 95 columns. If I count the number of columns which are half
gaps that's pretty close...
Peter
On Thu, Feb 15, 2024 at 3:09 PM Dan Bolser <dan.bolser at outsee.co.uk> wrote:
>
> Hi,
>
> Sorry, I can't follow the docs (or find the right docs).
>
> I've got the 'seed' stockholm alignment for this domain:
> https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/entry_alignments/?type=seed
>
> and I'm trying to reproduce the signature it shows here:
> https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/logo/
>
> I'm not sure a) why the probabilities differ in the profile relative to the seed alignment, or b) how to filter columns in the alignment by those that have a match in the model (see columns 4-6 in the alignment, which are gaps in the model).
>
> I think if I can answer b) then the answer to a) will be, "look at the full alignment".
>
> Here is my crude 'best guess' code:
>
> import gzip
> import Bio.AlignIO
>
> # msa = "PF08241.alignment.full.gz"
> msa = "PF08241.alignment.seed.gz"
>
> with gzip.open(msa, "rt") as handle:
> align = Bio.AlignIO.read(handle, "stockholm")
> ncols = align.get_alignment_length()
>
> for col in range(ncols):
> amino_acids = dict()
> for s in align[:, col]:
> amino_acids[s] = amino_acids.get(s, 0) + 1
> print(amino_acids)
> for s in amino_acids:
> print(f"{s}: {amino_acids[s]:3d} {amino_acids[s] / len(align):.3f}")
>
>
>
> I have the feeling I'm doin it rong...
>
> The above is just a 'warm up', really I want to see the conservation score, base by base on a given protein in the alignment (where it matches the model).
>
> Many thanks for any suggestions, and sorry for not being able to find the right document to answer these questions.
>
>
> kthxbi,
> Dan.
> _______________________________________________
> Biopython mailing list - Biopython at biopython.org
> https://mailman.open-bio.org/mailman/listinfo/biopython
More information about the Biopython
mailing list