[Biopython-dev] Multiple alignment - Clustalw etc...
Cymon Cox
cy at cymon.org
Mon Mar 30 11:42:00 UTC 2009
Hi Folks,
I've been trying to formalize a bunch of randomly scattered bits of code to
support the use of the alignment programme Muscle
(http://www.drive5.com/muscle/). I prefer this software to Clustalw;
subjectively, it seems to give the most accurate alignments. (Whether
Biopython would want to support a second alignment programme/external
dependency is another matter...)
Anyway, while doing so, I realised just how awkward the current interface to
Clustalw is; it doesn't fit the SeqIO/AlignIO paradigm well.
Currently, if we have a bunch of SeqRecords, say after downloading them from
GenBank or pulling them from a BioSQL db, we have to write them to disk and
call clustalw on the file:
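>>> from Bio import Clustalw
>>> from Bio.Clustalw import MultipleAlignCL
# Build a command line to align the sequences in the file "f002":
>>> cline = MultipleAlignCL("f002", command="clustalw")
# Run clustalw and parse its output into an alignment object:
>>> align = Clustalw.do_alignment(cline)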
It seems to me more appropriate to be able to call clustalw directly on a
bunch of SeqRecords, e.g. (suggested implementation):
>>> from Bio import SeqIO
>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
>>> from Bio.Align import MultipleAlignment
>>> align = MultipleAlignment(records, executable="clustalw")
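One way MultipleAlignment might work internally, as a minimal sketch only:
write the records to a temporary FASTA file, run clustalw on it, and read the
result back. To keep it standard-library only, the records here are plain
(id, sequence) pairs rather than SeqRecords. The function name and the
injectable `runner` argument are hypothetical, and the `-infile`/`-outfile`
switches are assumed from clustalw's command-line help. In Biopython itself
the writing step would be `SeqIO.write(records, handle, "fasta")` and the
reading step `AlignIO.read(handle, "clustal")`.

```python
import os
import subprocess
import tempfile


def multiple_alignment(records, executable="clustalw", runner=subprocess.run):
    """Hypothetical sketch of the proposed MultipleAlignment call.

    records: iterable of (identifier, sequence) string pairs, standing in
             for SeqRecord objects so no Biopython import is needed.
    runner:  callable that launches the program (subprocess.run by default);
             it can be swapped out for testing.
    Returns the raw Clustal-format alignment text written by the program.
    """
    with tempfile.TemporaryDirectory() as workdir:
        infile = os.path.join(workdir, "input.fasta")
        outfile = os.path.join(workdir, "output.aln")
        # clustalw only reads files, so dump the in-memory records to disk.
        with open(infile, "w") as handle:
            for identifier, sequence in records:
                handle.write(f">{identifier}\n{sequence}\n")
        # Assumed clustalw switches naming the input and output files.
        cmd = [executable, f"-infile={infile}", f"-outfile={outfile}"]
        runner(cmd, check=True)
        # Read the result before the temporary directory is deleted.
        with open(outfile) as handle:
            return handle.read()
```

Using a temporary directory rather than a single temporary file should also
clean up the `.dnd` guide tree that clustalw writes next to its input.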
Secondly, the Biopython interface does not support calling Clustalw to
perform profile alignments, e.g. (suggested implementation):
>>> from Bio import AlignIO, SeqIO
# The scaffold alignment:
>>> align = AlignIO.read(open("blah.nex", "r"), "nexus")
# The sequences we want to add to it:
>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
>>> from Bio.Align import ProfileAlignment
>>> align = ProfileAlignment(align, records, executable="clustalw")
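As a rough sketch, the command line ProfileAlignment might build once both
inputs are on disk is shown below. The helper name is hypothetical; the
`-profile1`, `-profile2` and `-sequences` switches are taken from clustalw's
help, where `-sequences` adds the profile2 sequences to the profile1
alignment one at a time (as opposed to `-profile`, which merges two
alignments). clustalw may not read NEXUS directly, so the scaffold would
probably need writing out first in a format it does read, e.g. with
`AlignIO.write(..., "clustal")`.

```python
def profile_alignment_command(profile_file, sequence_file, executable="clustalw"):
    """Hypothetical helper: clustalw command line for adding new sequences
    to an existing (scaffold) alignment.

    profile_file:  the scaffold alignment, passed as -profile1.
    sequence_file: the sequences to add, passed as -profile2.
    """
    return [
        executable,
        f"-profile1={profile_file}",
        f"-profile2={sequence_file}",
        # Add profile2 sequences one by one rather than merging profiles.
        "-sequences",
    ]
```

For example, `profile_alignment_command("scaffold.aln", "f002")` returns
`["clustalw", "-profile1=scaffold.aln", "-profile2=f002", "-sequences"]`,
ready to hand to `subprocess.run`.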
Calls to MultipleAlignment and ProfileAlignment would take a **options
parameter to collect any additional command line options.
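A minimal sketch of how such a **options parameter might be turned into
clustalw-style arguments. The function name and the mapping rules (True
becomes a bare flag, False/None drops the option, anything else becomes
`-key=value`) are assumptions for illustration. `gapopen`, `gapext`,
`quicktree` and `outorder` are real clustalw option names, used here only
as sample keys.

```python
def options_to_args(**options):
    """Hypothetical mapping of keyword arguments to command-line arguments.

    True        -> bare flag, e.g. quicktree=True -> "-quicktree"
    False/None  -> option omitted entirely
    other value -> "-key=value", e.g. gapopen=10 -> "-gapopen=10"
    """
    args = []
    for key, value in options.items():
        # Identity checks, so a numeric 0 is passed through, not dropped.
        if value is None or value is False:
            continue
        if value is True:
            args.append(f"-{key}")
        else:
            args.append(f"-{key}={value}")
    return args
```

For example, `options_to_args(gapopen=10, quicktree=True, outorder=None)`
returns `["-gapopen=10", "-quicktree"]`; keyword order is preserved, so the
arguments appear in the order the caller gave them.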
Thirdly, should an alignment object have an
Alignment.refine_alignment(executable="clustalw") method?
Any thoughts?
Cheers, C.
--
____________________________________________________________________
Cymon J. Cox
Centro de Ciencias do Mar
Faculdade de Ciencias do Mar e Ambiente (FCMA)
Universidade do Algarve
Campus de Gambelas
8005-139 Faro
Portugal
Phone: +0351 289800909 ext 7909
Fax: +0351 289800051
Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com
HomePage : http://biology.duke.edu/bryology/cymon.html
-8.63/-6.77