[Biopython] Multiple Sequence Alignment Conversion: A2M and A3M from Fasta

Peter biopython at maubp.freeserve.co.uk
Thu Feb 3 00:10:02 UTC 2011


On Wed, Feb 2, 2011 at 11:54 PM, Brett Bowman <bnbowman at gmail.com> wrote:
> I'm writing a Biopython script to pipeline the following process:
> 1) Parse Fasta From File
> 2) Blast it against NCBI and pull down a range of solid hits
> 3) Align the sequence with Muscle or ClustalW
> 4) Build a HMM profile of the alignment with HHmake
>
> Steps 1-3 I've got down pat; it's step 4 that seems to be the problem.
> In particular, HHmake appears to prefer A2M or A3M format alignments,
> and produces inferior results when fed an Aligned Fasta (*.AFA).  Both
> alignment programs output to Fasta or ClustalW, but not A2M or A3M, and in
> addition I can't seem to find a definition for either format online
> anywhere.
>
> So: Does anyone know if there is a way to convert to A2M or A3M
> with Biopython?  They do not appear to be supported by AlignIO.  Otherwise, does
> anyone know where I could find a definition for the formats online so that I
> can write my own conversion?

Have you seen this HHmake manual?

ftp://toolkit.lmb.uni-muenchen.de/HHsearch/HHsearch1.5.1/HHsearch-guide.pdf

This describes the A2M and A3M formats, which I had not heard
of before. I suspect these file formats are specific to HHmake.
It also says HHmake comes with a Perl script, reformat.pl, which can
convert Clustal (or Stockholm) format to A3M, so why not just
use that instead?
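
If you would rather stay in Python, here is a rough sketch of the
conversion, not a tested replacement for reformat.pl. It assumes my
reading of the guide is right: match columns hold upper-case residues
with '-' for deletions, insert columns hold lower-case residues with '.'
for gaps, and A3M is simply A2M with the '.' characters dropped. It also
assumes the first sequence decides which columns are match columns, which
is only one possible rule. The function names are made up for this
example; they are not part of Biopython or HHmake.

```python
def aligned_fasta_to_a2m(records):
    """Convert (name, aligned_sequence) pairs to A2M-style pairs.

    Assumption: a column is a match column when the FIRST sequence
    has a residue there; every other column is an insert column.
    """
    if not records:
        return []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all aligned sequences must have the same length")

    gap_chars = "-."
    is_match = [char not in gap_chars for char in records[0][1]]

    converted = []
    for name, seq in records:
        chars = []
        for char, match in zip(seq, is_match):
            if match:
                # Match column: upper-case residue, '-' for a deletion.
                chars.append("-" if char in gap_chars else char.upper())
            else:
                # Insert column: lower-case residue, '.' for a gap.
                chars.append("." if char in gap_chars else char.lower())
        converted.append((name, "".join(chars)))
    return converted


def a2m_to_a3m(records):
    """Drop the '.' insert-gap characters (assumed A2M to A3M rule)."""
    return [(name, seq.replace(".", "")) for name, seq in records]


def format_fasta_like(records):
    """Render the pairs as '>name' header lines followed by sequences."""
    return "".join(">%s\n%s\n" % (name, seq) for name, seq in records)


aligned = [("query", "AC-GT"), ("hit1", "ACWG-"), ("hit2", "A--GT")]
a2m = aligned_fasta_to_a2m(aligned)
a3m = a2m_to_a3m(a2m)
print(format_fasta_like(a3m))
```

To feed it a real alignment, something like
[(rec.id, str(rec.seq)) for rec in AlignIO.read(handle, "fasta")]
should give the list of pairs it expects. Do check the output against
what reformat.pl produces before trusting it with HHmake.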

Peter



