[Biopython-dev] Multiple alignment - Clustalw etc...

Cymon Cox cy at cymon.org
Mon Mar 30 11:42:00 UTC 2009


Hi Folks,

I've been trying to formalize a bunch of randomly scattered bits of code to
support the alignment programme Muscle (http://www.drive5.com/muscle/). I
prefer this software to Clustalw - subjectively, it seems to give the most
accurate alignments. (Whether Biopython would want to support a second
alignment programme/external dependency is another matter...)

Anyway, while doing so, I realised just how awkward the current interface to
Clustalw is: it doesn't fit the SeqIO/AlignIO paradigm well.

Currently, if we have a bunch of SeqRecords, say after downloading them from
GenBank or pulling them from a BioSQL db, we have to write them to disk and
call clustalw on the file:

>>> from Bio import Clustalw
>>> from Bio.Clustalw import MultipleAlignCL
>>> cline = MultipleAlignCL("f002", command="clustalw")
>>> align = Clustalw.do_alignment(cline)

It seems to me more appropriate to be able to call clustalw directly on a
bunch of SeqRecords:

e.g. (suggested implementation)
>>> from Bio import SeqIO
>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
>>> from Bio.Align import MultipleAlignment
>>> align = MultipleAlignment(records, executable="clustalw")
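
For illustration only, here is a rough sketch of what such a wrapper might do
behind the scenes: write the records to a temporary FASTA file, run clustalw
on it, and hand back the output file. The names write_fasta_tempfile,
align_records and the Record stand-in are made up for this example; only the
.id and .seq attributes of a SeqRecord are assumed, and -INFILE/-OUTFILE are
the usual Clustalw command line options.

```python
import os
import subprocess
import tempfile
from collections import namedtuple

# Stand-in for Bio.SeqRecord.SeqRecord; only .id and .seq are used here.
Record = namedtuple("Record", ["id", "seq"])


def write_fasta_tempfile(records):
    """Write records to a temporary FASTA file and return its path."""
    handle = tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False)
    with handle:
        for record in records:
            handle.write(">%s\n%s\n" % (record.id, record.seq))
    return handle.name


def align_records(records, executable="clustalw"):
    """Hypothetical helper: align records, return the path of the output file."""
    infile = write_fasta_tempfile(records)
    outfile = infile + ".aln"
    try:
        # Runs the external program; raises CalledProcessError on failure.
        subprocess.check_call(
            [executable, "-INFILE=" + infile, "-OUTFILE=" + outfile]
        )
    finally:
        # Remove the temporary input; other files clustalw may write
        # (e.g. a guide tree) are not cleaned up in this sketch.
        os.remove(infile)
    return outfile
```

The returned file could then be read back with AlignIO.read(path, "clustal"),
so the caller never has to deal with the intermediate files.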

Secondly, the Biopython interface does not support calling Clustalw to
perform profile alignments:

(suggested implementation)
# The scaffold alignment:
>>> from Bio import AlignIO, SeqIO
>>> align = AlignIO.read(open("blah.nex", "r"), "nexus")
# The sequences we want to add to it:
>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
>>> from Bio.Align import ProfileAlignment
>>> align = ProfileAlignment(align, records, executable="clustalw")
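
As a sketch of the command line such a ProfileAlignment call might end up
building: Clustalw takes the existing alignment as -PROFILE1, the new
sequences as -PROFILE2, and -SEQUENCES to add them one by one to the first
profile. The function name profile_command is hypothetical, and the scaffold
is assumed to have already been written out in a format Clustalw can read.

```python
def profile_command(profile_file, sequences_file, outfile,
                    executable="clustalw"):
    """Build (but do not run) a clustalw profile alignment command."""
    return [
        executable,
        "-PROFILE1=" + profile_file,    # the scaffold alignment
        "-PROFILE2=" + sequences_file,  # the sequences to add to it
        "-SEQUENCES",                   # add profile2 sequences one at a time
        "-OUTFILE=" + outfile,
    ]


cmd = profile_command("scaffold.aln", "f002", "combined.aln")
```

The resulting list can be passed straight to subprocess.check_call.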

Calls to MultipleAlignment and ProfileAlignment would take a **options
parameter to collect any additional command line options.
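
One possible way to handle **options, sketched under the assumption that
keyword names map directly onto Clustalw option names: True becomes a bare
flag (-ALIGN), any other value becomes -NAME=value, and False/None are
skipped. build_command is a made-up name for the example, not an existing
Biopython function.

```python
def build_command(infile, executable="clustalw", **options):
    """Turn keyword arguments into a clustalw argument list."""
    cmd = [executable, "-INFILE=" + infile]
    # Sort so the generated command line is reproducible.
    for name, value in sorted(options.items()):
        flag = "-" + name.upper()
        if value is True:
            cmd.append(flag)  # bare switch, e.g. -ALIGN
        elif value is not False and value is not None:
            cmd.append("%s=%s" % (flag, value))  # e.g. -GAPOPEN=10
    return cmd


cmd = build_command("f002", align=True, gapopen=10, gapext=0.2)
```

This keeps the wrapper thin, at the cost of not checking option names; a
misspelt keyword would only be caught by clustalw itself.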

Thirdly, should an alignment object have an
Alignment.refine_alignment(executable="clustalw")
method?

Any thoughts?

Cheers, C.
-- 
____________________________________________________________________

Cymon J. Cox

Centro de Ciencias do Mar
Faculdade de Ciencias do Mar e Ambiente (FCMA)
Universidade do Algarve
Campus de Gambelas
8005-139 Faro
Portugal

Phone: +0351 289800909 ext 7909
Fax: +0351 289800051
Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com
HomePage : http://biology.duke.edu/bryology/cymon.html
-8.63/-6.77


