[BioPython] Clustalw.parse_file errors

Peter biopython at maubp.freeserve.co.uk
Tue Aug 5 21:27:50 UTC 2008


Peter wrote:
> Nick Matzke wrote:
>> Hi all,
>>
>> I'm running through the excellent biopython tutorial here:
>> http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc100
>
> I'm glad you are enjoying the Tutorial (apart from the parsing bug!).
> I can't take any credit for this bit ;)

[I meant that I can't take credit for this bit of the tutorial; the
bug, however, was mine!]

>> ...basically the Clustalw parser won't parse even the given example
>> alignment file (protein.aln) or another example file from elsewhere
>> (example.aln).

I've also checked that I can read this file:
http://www.pasteur.fr/recherche/unites/sis/formation/python/data/example.aln

For example:
>>> from Bio import AlignIO
>>> a = AlignIO.read(open("/tmp/example.aln"), "clustal")
>>> print a
SingleLetterAlphabet() alignment with 12 rows and 1168 columns
MESGHLLWALLFMQSLWPQLTDGATRVYYLGIRDVQWNYAPKGR...FKQ Q9C058
...
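
As a rough illustration of where the row and column counts come from,
here is a toy reader for the interleaved Clustal layout. This is only a
sketch and is not how Biopython's parser is implemented: the function
name read_clustal and the two-sequence sample text are made up for the
example, and it assumes sequence names contain no spaces. Each block
repeats the sequence names, and the chunks for each name are joined to
give the full row.

```python
def read_clustal(text):
    """Toy reader: collect the rows of an interleaved Clustal alignment."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("not a Clustal file: missing CLUSTAL header")
    rows = {}  # sequence name -> list of chunks, in order of first appearance
    for line in lines[1:]:
        # Skip blank lines and the conservation line (it starts with a space).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError("unexpected line: %r" % line)
        # An optional third field (a running residue count) is ignored.
        name, chunk = fields[0], fields[1]
        rows.setdefault(name, []).append(chunk)
    # Join the per-block chunks into one full-length row per sequence.
    return {name: "".join(chunks) for name, chunks in rows.items()}


# Hypothetical two-block sample, not taken from example.aln.
EXAMPLE = """CLUSTAL W (1.83) multiple sequence alignment

seqA            MESGHLLWAL
seqB            MESG-LLWAL
                **** *****

seqA            LFMQSL
seqB            LFMQ-L
                **** *
"""

alignment = read_clustal(EXAMPLE)
for name, seq in alignment.items():
    print(name, seq)
print(len(alignment), "rows,", len(next(iter(alignment.values()))), "columns")
```

A real parser also has to cope with headers from other tools, the
conservation line's column offsets, and validation of row lengths,
none of which this sketch attempts.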

Currently Bio.AlignIO does not let you specify the alphabet. See:
http://bugzilla.open-bio.org/show_bug.cgi?id=2443

Alternatively, you can use Bio.Clustalw, which does let you define an alphabet:
>>> from Bio import Clustalw
>>> from Bio.Alphabet import IUPAC, Gapped
>>> a = Clustalw.parse_file("/tmp/example.aln", Gapped(IUPAC.protein,"-"))
>>> print a

Note that with Bio.Clustalw you get a sub-class of the generic
alignment, which has a different __str__ method (meaning "print a" will
re-create the alignment in Clustal format).
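
To make the sub-class point concrete, here is a minimal sketch of the
general Python mechanism. The class names GenericAlignment and
ClustalStyleAlignment are hypothetical stand-ins, not the real
Biopython classes, and the output layout is simplified. The only point
is that overriding __str__ in a sub-class changes what print shows.

```python
class GenericAlignment:
    """Hypothetical stand-in for a generic alignment object."""

    def __init__(self, rows):
        self.rows = rows  # list of (name, sequence) tuples

    def __str__(self):
        # Short summary, in the spirit of the generic alignment's output.
        n_cols = len(self.rows[0][1]) if self.rows else 0
        return "alignment with %d rows and %d columns" % (len(self.rows), n_cols)


class ClustalStyleAlignment(GenericAlignment):
    """Hypothetical sub-class whose string form is a Clustal-like dump."""

    def __str__(self):
        # Overrides the parent's summary with a (simplified) file-style layout.
        lines = ["CLUSTAL (toy output)", ""]
        for name, seq in self.rows:
            lines.append("%-16s%s" % (name, seq))
        return "\n".join(lines)


rows = [("seqA", "MESG"), ("seqB", "ME-G")]
generic = GenericAlignment(rows)
clustal = ClustalStyleAlignment(rows)

print(generic)  # one-line summary
print(clustal)  # header plus one line per sequence
```

Both objects hold the same data; only the string representation
differs, which is why the two "print a" calls in this message give
such different output.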

Peter


