[BioPython] FormatConverter: from Fasta format to ClustalW format

Peter biopython at maubp.freeserve.co.uk
Fri Jan 4 08:20:26 EST 2008


On Jan 2, 2008 5:46 PM, Peter wrote:
> On Jan 2, 2008 1:44 PM, Lee,Byung-chul wrote:
> > Following your explanation, I tried to use SeqIO, but another error occurred.
> > I did it as shown below:
>
> My fault, sorry. I wasn't at a computer with Biopython installed, so I
> had to guess.  I'll try to put together a proper example for you
> tomorrow.

This should work on Biopython 1.43 or later. I have tested it using
the simple FASTA file you gave earlier:

from Bio.Alphabet.IUPAC import IUPACProtein
from Bio.Alphabet import Gapped
from Bio import SeqIO
from Bio.Align import AlignInfo
gapped_protein = Gapped(IUPACProtein())

records = list(SeqIO.parse(open('tmp.fasta'), "fasta"))
for rec in records:
    # Override the default generic alphabet:
    rec.seq.alphabet = gapped_protein
# Turn these records into an alignment
alignment = SeqIO.to_alignment(records, gapped_protein)
del records

summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

The problem with my previous, shorter suggestion was that the Bio.SeqIO
FASTA parser returns SeqRecord objects with a generic alphabet, while
the alignment summary expects a gapped alphabet.  I'm beginning to
think that the Bio.SeqIO.parse() function should allow an alphabet to
be specified as an optional argument for this sort of situation.

Alternatively, going back to your original code, how about:

from Bio.Fasta import FastaAlign
from Bio.Align import AlignInfo

alignment = FastaAlign.parse_file('tmp.fasta', type='PROTEIN')
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

This works using Biopython 1.44 with either mxTextTools 2.0 or 3.0.
It should work with older versions of Biopython using mxTextTools 2.0
as well.
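
If it helps to see what a threshold-based consensus is doing to a gapped
FASTA alignment, here is a rough sketch using only the Python standard
library. It is not Biopython's implementation: read_fasta() and
simple_consensus() are hypothetical helper names, and the 0.7 threshold
and the "X" placeholder are assumptions chosen for the illustration.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Return a list of (title, sequence) pairs from a FASTA handle.

    Hypothetical helper for this sketch, not part of Biopython.
    """
    records = []
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new title line closes the previous record, if any.
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def simple_consensus(sequences, threshold=0.7, ambiguous="X", gap="-"):
    """Column-by-column consensus that ignores gap characters.

    A residue is reported only if it makes up at least `threshold` of
    the non-gap characters in its column; otherwise `ambiguous` is used.
    The default values are assumptions made for this sketch.
    """
    if not sequences:
        return ""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must all be the same length")
    consensus = []
    for column in zip(*sequences):
        counts = Counter(c for c in column if c != gap)
        if not counts:
            # The column holds nothing but gaps.
            consensus.append(ambiguous)
            continue
        residue, count = counts.most_common(1)[0]
        total = sum(counts.values())
        consensus.append(residue if count / total >= threshold else ambiguous)
    return "".join(consensus)


# A made-up three-sequence alignment standing in for tmp.fasta.
example = StringIO(">a\nMK-LV\n>b\nMKALV\n>c\nMRALI\n")
seqs = [seq for _, seq in read_fasta(example)]
print(simple_consensus(seqs))
```

The sequences must already be the same length, gaps included. That is
the same requirement the alignment object enforces in the examples
above, which is why the gapped alphabet matters there.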

Peter

