[Biopython] Struggling with MSA...

Thu Feb 15 10:47:33 EST 2024

Hi Peter,

On Thu, 15 Feb 2024 at 15:28, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hello Dan,
>
> It that is a probability at the end, you have the wrong denominator -
> should be len(align) == sum(amino_acids.values()
>

Makes no difference, they are the same (including -s). The profile reports
-s as 'Insert Probability'.

Note this seed alignment has 164 columns, yet the model page says the
> model has 95 columns. If I count the number of columns which are half
> gaps that's pretty close...
>

I don't understand what that means or implies... Is something wrong
somewhere?

Cheers,
Dan.

Peter
>
> On Thu, Feb 15, 2024 at 3:09 PM Dan Bolser <dan.bolser at outsee.co.uk>
> wrote:
> >
> > Hi,
> >
> > Sorry, I can't follow the docs (or find the right docs).
> >
> > I've got the 'seed' stockholm alignment for this domain:
> >
> https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/entry_alignments/?type=seed
> >
> > and I'm trying to reproduce the signature it shows here:
> > https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/logo/
> >
> > I'm not sure a) why the probabilities differ in the profile relative to
> the seed alignment, or b) how to filter columns in the alignment by those
> that have a match in the model (see columns 4-6 in the alignment, which are
> gaps in the model).
> >
> > I think if I can answer b) then the answer to a) will be, "look at the
> full alignment".
> >
> > Here is my crude 'best guess' code:
> >
> > import gzip
> > import Bio.AlignIO
> >
> > # msa = "PF08241.alignment.full.gz"
> > msa = "PF08241.alignment.seed.gz"
> >
> > with gzip.open(msa, "rt") as handle:
> > align = Bio.AlignIO.read(handle, "stockholm")
> > ncols = align.get_alignment_length()
> >
> > for col in range(ncols):
> > amino_acids = dict()
> > for s in align[:, col]:
> > amino_acids[s] = amino_acids.get(s, 0) + 1
> > print(amino_acids)
> > for s in amino_acids:
> > print(f"{s}: {amino_acids[s]:3d} {amino_acids[s] / len(align):.3f}")
> >
> >
> >
> > I have the feeling I'm doin it rong...
> >
> > The above is just a 'warm up', really I want to see the conservation
> score, base by base on a given protein in the alignment (where it matches
> the model).
> >
> > Many thanks for any suggestions, and sorry for not being able to find
> the right document to answer these questions.
> >
> >
> > kthxbi,
> > Dan.
> > _______________________________________________
> > Biopython mailing list  -  Biopython at biopython.org
> > https://mailman.open-bio.org/mailman/listinfo/biopython
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.open-bio.org/pipermail/biopython/attachments/20240215/7e8b068a/attachment.htm>