[Biopython-dev] [Bug 2542] AlignInfo.py fails a test

Fri Jul 11 21:19:50 UTC 2008

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-07-11 17:19 EST -------
> Yes, this code with Bio.AlignIO also fails (I tried right now with
> AlignInfo.py rev. 1.17):
> 
> from Bio.Align import AlignInfo
> from Bio.Align.AlignInfo import SummaryInfo
> from Bio import AlignIO
> fn = open("secu3.aln")
> alignment = AlignIO.read(fn, "clustal")
> summary = SummaryInfo(alignment)
> print summary.information_content()
> 
> And I got (and this time I am not supplying any alphabet, at least not
> explicit):
> 
> Traceback (most recent call last):
> ...
> ValueError: Error in alphabet: not Nucleotide or Protein, supply expected
> frequencies

Good.  That seems to be working as intended: alignment formats like FASTA or
Clustal do not specify the sequence type (unlike, for example, the Nexus format).

Perhaps Bio.AlignIO.read() and parse() should be able to accept an optional
alphabet argument?  I had already been considering this for Bio.SeqIO so this
is a natural extension.  See Bug 2443.

Unless information_content() can determine the sequence type (protein or
nucleotide) from the alignment alphabet, you have to help it by supplying an
appropriate e_freq_table argument.
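
To illustrate why the expected frequencies matter, here is a rough standalone
sketch of a per-column relative entropy calculation.  This is plain Python on
strings, not the AlignInfo code itself; the function names and the toy
alignment are made up for the example, and I am assuming log base 2 and that
ignored characters are simply dropped from the counts:

```python
import math

def column_information(column, expected, chars_to_ignore=("-",), log_base=2):
    """Relative entropy of one alignment column against expected frequencies."""
    letters = [c for c in column if c not in chars_to_ignore]
    if not letters:
        # Nothing but ignored characters: the column contributes nothing.
        return 0.0
    total = float(len(letters))
    info = 0.0
    for letter in set(letters):
        observed = letters.count(letter) / total
        # Each letter needs an expected frequency; a missing key raises KeyError.
        info += observed * math.log(observed / expected[letter], log_base)
    return info

def information_content(sequences, expected, chars_to_ignore=("-",)):
    """Sum the per-column values over equal-length aligned strings."""
    return sum(column_information(col, expected, chars_to_ignore)
               for col in zip(*sequences))

# Hypothetical toy data: uniform DNA background and a three-row alignment.
uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
aligned = ["ACGT", "ACGA", "AC-T"]
total = information_content(aligned, uniform_dna)
```

A fully conserved DNA column scores 2 bits against a uniform background, and
without a table of expected frequencies there is nothing to divide by, which
is why the sequence type has to be known or supplied.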

Perhaps:

from Bio.Alphabet import IUPAC
from Bio.SubsMat import FreqTable
from Bio.Align.AlignInfo import SummaryInfo
from Bio import AlignIO

fn = open("secu3.aln")
alignment = AlignIO.read(fn, "clustal")
summary = SummaryInfo(alignment)

# The parsed alignment has a generic alphabet with no declared gap
# character, so the expected frequencies and the characters to ignore
# must be provided explicitly:
expected = FreqTable.FreqTable({"A":0.25,"G":0.25,"T":0.25,"C":0.25},
                               FreqTable.FREQ, IUPAC.unambiguous_dna)
print summary.information_content(e_freq_table=expected,
                                  chars_to_ignore=['-'])

This is probably the safest approach.  I doubt that information_content() will
do the right thing if given mixed-case or lower-case sequences; if it does
not, that should be filed as a new bug.
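
As a quick way to spot the case problem before calling anything in AlignInfo,
something like the following could be used.  Again this is only a sketch on
plain strings: the helper name is hypothetical, and I have not checked how
information_content() itself treats lower case.  It just lists the letters
that have no entry in an upper-case frequency table:

```python
def letters_missing_from_table(sequences, expected, chars_to_ignore=("-",)):
    """Return the sorted letters in the alignment with no expected frequency."""
    seen = set()
    for seq in sequences:
        seen.update(seq)
    return sorted(seen - set(expected) - set(chars_to_ignore))

# Hypothetical toy data: one row of the alignment is in lower case.
uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
mixed_case = ["ACGT", "acgt", "AC-T"]
unexpected = letters_missing_from_table(mixed_case, uniform_dna)

# Upper-casing the rows first is one possible workaround.
normalised = [seq.upper() for seq in mixed_case]
still_unexpected = letters_missing_from_table(normalised, uniform_dna)
```

If the first list is not empty, the frequency table and the sequences disagree
about case, and the result should not be trusted without normalising one of
them first.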

>
> > P.S. Please update to Biopython 1.47 rather than using 1.46
> 
> I was using Biopython 1.47, but I reported as 1.46 just because 1.47
> it is not available from the drop-down menu in bugzilla form.

Thanks for the reminder - I've added that to Bugzilla now :)

I'm marking this bug as fixed now (after the updates to AlignInfo.py).
