[Bioperl-l] Annotation-assisted (and/or BLAST assisted) multiple sequence alignment tool?
Brian Foley
brianfoleynm at gmail.com
Thu Apr 18 19:17:59 EDT 2013
At the HIV Sequence and Immunology Databases (http://www.hiv.lanl.gov),
where I work, we have used a bit of creativity to solve some difficult
problems in multiple sequence alignment, because we often want to produce
an alignment of gene sequences from more than 20,000 different isolates
of HIV-1 in a few minutes or less.
We are very good at "deep" multiple alignment: thousands of copies of the
same small genome.
My problem comes when I want to align the genomes of other viruses, or
gene regions of similar size (the complete mitochondrial genomes of
vertebrates, for example, which are roughly 17 kb in size), because they
do not always have the same gene order.
A good example is the mitochondrial genomes of birds and mammals, which
are mostly co-linear, but with the NADH6 gene moved to a different
location. See the attached JPG of the Aardvark and Japanese Eagle-Hawk
mitochondrial genomes.
In other cases (I think it is the primate mitochondrial genomes), the
authors used a different site as "base #1" of the circular genome. So
although the primate mitochondrial genomes are 100% co-linear with those
of other vertebrates, we have to chop several thousand bases off the
right end and paste them onto the left end (the 5' end, the beginning)
to make them align with the mt-genomes of other mammals.
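The "chop and paste" step above amounts to rotating a circular sequence
to a shared start point. A minimal sketch in Python follows, assuming the
genome is held as a plain string and only the forward strand matters; the
function names and the idea of locating the new start with a short anchor
motif are illustrative assumptions, not part of any existing tool.

```python
def rotate_circular(seq: str, new_start: int) -> str:
    """Rotate a circular sequence so 0-based index new_start becomes position 0."""
    if not seq:
        return seq
    k = new_start % len(seq)
    # Bases before the new start are moved to the end of the sequence.
    return seq[k:] + seq[:k]


def rotate_to_anchor(seq: str, anchor: str) -> str:
    """Rotate so the sequence begins at the first occurrence of anchor.

    The search runs on seq plus a short wrap-around prefix, so an anchor
    that spans the point where the circle was linearized is still found.
    """
    if not anchor:
        raise ValueError("anchor must not be empty")
    wrapped = seq + seq[: len(anchor) - 1]
    pos = wrapped.find(anchor)
    if pos == -1:
        raise ValueError("anchor not found in circular sequence")
    return rotate_circular(seq, pos)


# Toy example (made-up bases, not real mitochondrial sequence).
toy = "ACGTTGCA"
print(rotate_circular(toy, 5))
print(rotate_to_anchor(toy, "CAAC"))
```

In practice the anchor would more likely be the annotated start of a
gene shared by every genome than a literal motif, and the annotation
coordinates would have to be shifted by the same offset.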
So it seems to me that there ought to be a multiple sequence alignment
tool that can read GenBank files with their annotation and use the
annotation to help with the alignment process.
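One way such a tool might use annotation is to cut each genome into its
annotated genes, group the pieces by gene name, and align each group
separately, so that a gene that has moved is still compared with its
counterpart. The sketch below is only an illustration of that grouping
step: it assumes the annotation has already been parsed (for example
with BioPerl's Bio::SeqIO) into 0-based, half-open (gene, start, end)
tuples, that gene names agree across records, and that all genes lie on
the forward strand. The names group_genes and shared_genes are
hypothetical.

```python
from collections import defaultdict


def group_genes(genomes):
    """Map gene name -> {genome name: gene subsequence}.

    genomes maps a genome name to (sequence, features), where features is
    a list of (gene, start, end) tuples with 0-based, half-open coordinates.
    """
    groups = defaultdict(dict)
    for genome_name, (seq, features) in genomes.items():
        for gene, start, end in features:
            groups[gene][genome_name] = seq[start:end]
    return dict(groups)


def shared_genes(groups, n_genomes):
    """Return the genes that are annotated in every genome, sorted by name."""
    return sorted(g for g, members in groups.items() if len(members) == n_genomes)


# Toy data: geneB and geneC have swapped places in the second genome.
genomes = {
    "mammal": ("AAACCCGGGTTT", [("geneA", 0, 3), ("geneB", 3, 6), ("geneC", 6, 9)]),
    "bird": ("AAAGGGCCCTTT", [("geneA", 0, 3), ("geneC", 3, 6), ("geneB", 6, 9)]),
}
groups = group_genes(genomes)
for gene in shared_genes(groups, len(genomes)):
    print(gene, groups[gene])
```

Each group could then be passed to any aligner and the per-gene
alignments joined in the gene order of a chosen reference genome. Real
GenBank records would also need handling for reverse-strand genes,
inconsistent gene names, and unannotated intergenic regions.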
One tool that I am aware of, which can help a lot, is the "Artemis Genome
Comparison Tool" (ACT) and its
associated DOUBLE-ACT server:
http://www.hpa-bioinfotools.org.uk/pise/double_act.html
The DOUBLE-ACT server uses BLAST to find regions on a pair of genomes
that are homologous/similar, and creates a table of these matched
regions. The Artemis Comparison Tool then loads both genomes into an
ARTEMIS Genome Browser tool and uses the BLAST hit table to help the
browser keep both genomes "in sync" with each other as you browse the
genomes.
Although the DOUBLE-ACT BLAST step here is not dependent on annotations at
all, the annotations
are visible when browsing the genomes in ACT.
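A table of matched regions like this can also be used to flag gene-order
differences automatically. The following is a rough sketch, assuming the
hits are in the 12-column tab-delimited layout that BLAST+ writes with
-outfmt 6 (query start and end in columns 7-8, subject start and end in
columns 9-10); the file DOUBLE-ACT actually produces may be laid out
differently, and the names Hit, parse_hits and order_breaks are
hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    qstart: int
    qend: int
    sstart: int
    send: int


def parse_hits(text: str) -> List[Hit]:
    """Parse tab-delimited BLAST hits, keeping only the coordinates."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        qstart, qend, sstart, send = (int(c) for c in cols[6:10])
        # Reverse-strand hits are reported with sstart > send; normalize them.
        hits.append(Hit(qstart, qend, min(sstart, send), max(sstart, send)))
    return hits


def order_breaks(hits: List[Hit]) -> List[Tuple[Hit, Hit]]:
    """Return adjacent hit pairs (in query order) whose subject order is reversed."""
    ordered = sorted(hits, key=lambda h: h.qstart)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b.sstart < a.sstart]


# Toy table: the second and third matched regions are swapped in the subject.
table = "\n".join([
    "q\ts\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t150",
    "q\ts\t88.0\t100\t12\t0\t101\t200\t301\t400\t1e-18\t140",
    "q\ts\t91.0\t100\t9\t0\t201\t300\t101\t200\t1e-21\t155",
])
for before, after in order_breaks(parse_hits(table)):
    print("order break between", before, "and", after)
```

Two genomes with no breaks are co-linear over the matched regions. A
single break near the end of the query is also what a circular genome
with a different "base #1" would produce, so the two cases need to be
told apart before deciding how to align.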
I am quite sure that I am not the only one in the world who needs this type
of tool. I am increasingly seeing
large multiple sequence alignments being done for classification of
organisms, where the authors
could have used such a tool.
Please let me know if you have any ideas about where to look for such a
tool, or which groups of
bioinformatics workers might be able to develop one.
Brian T. Foley, PhD
HIV Databases
Los Alamos National Laboratory
btf at lanl.gov
505 665-1970
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Avian_Mammal_mtGenomeMaps.jpg
Type: image/jpeg
Size: 170205 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20130418/b27e9858/attachment-0002.jpg>