[Biopython-dev] Bio.AlignIO, Bio.Nexus, MrBayes, polymorphic sites, maximum line length
Nick Loman
n.j.loman at bham.ac.uk
Thu Dec 2 15:25:06 UTC 2010
Peter wrote:
>> Is this the best way of doing it? Would a method call in AlignIO to
>> do the same thing be useful to others?
>>
> I've got some code somewhere for iterating over the columns of
> the alignment, and think I filed an enhancement bug for this.
> Would that do what you want?
>
Hi Peter,
Yes, that would definitely make the code more readable. I'm not sure
whether you think a function that returns an alignment containing just
the polymorphic sites would also be useful to others.
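For illustration, both ideas (iterating over columns, and keeping only polymorphic sites) can be sketched on top of `MultipleSeqAlignment`. These are hypothetical helpers, not Biopython's API. The sketch assumes a Biopython version where `alignment[:, i]` returns column `i` as a string, and it assumes gaps and missing/ambiguous data (`-`, `?`, `N`) should not count as distinct states. That choice is a judgement call, so it is exposed as the `ignore` parameter.

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def iter_columns(alignment):
    """Yield each alignment column as a string (hypothetical helper)."""
    for i in range(alignment.get_alignment_length()):
        # alignment[:, i] joins the i-th character of every row
        yield alignment[:, i]


def polymorphic_sites(alignment, ignore="-?N"):
    """Return a new alignment keeping only polymorphic columns.

    Hypothetical helper, not part of Bio.AlignIO. A column counts as
    polymorphic if it has more than one distinct character after
    upper-casing and discarding anything in `ignore`.
    """
    ignored = set(ignore.upper())
    keep = [
        i
        for i, column in enumerate(iter_columns(alignment))
        if len({c.upper() for c in column} - ignored) > 1
    ]
    records = []
    for rec in alignment:
        seq = str(rec.seq)
        records.append(
            SeqRecord(
                Seq("".join(seq[i] for i in keep)),
                id=rec.id,
                description=rec.description,
                annotations=dict(rec.annotations),  # keeps e.g. molecule_type
            )
        )
    return MultipleSeqAlignment(records)
```

The filter works column by column and then rebuilds each row, so the output has the same records in the same order, just with the monomorphic columns dropped.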
>> 2) When outputting long alignments in Nexus format, MrBayes refuses to read
>> the resulting files saying that the maximum line length is 19900 characters.
>> I'm assuming that is not the maximum input to MrBayes and that it can handle
>> longer alignments if they are split in some way. Would it be possible for
>> Bio.Nexus to split alignments in the appropriate format?
>>
>
> Are you outputting the large alignment using Bio.AlignIO or using
> Bio.Nexus directly?
>
In this case I was using Bio.Nexus, but it would be the same with
Bio.AlignIO.
> The file format details are not fresh in my mind, but I think that long
> sequences can be split over multiple lines - so if the problem is
> just with how MrBayes parses the file, that might be fixable. Can
> you give me a test case for this (maybe generate a simple but
> large alignment in code) with the MrBayes call that fails?
>
Sure thing:
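from Bio import AlignIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Align import MultipleSeqAlignment
import subprocess
# Two 20,000 bp DNA sequences, so each Nexus matrix line exceeds MrBayes's
# 19,900-character limit. (Biopython 1.78+ removed Bio.Alphabet; the
# molecule type is now set per record via annotations.)
align1 = MultipleSeqAlignment([
    SeqRecord(Seq("A" * 20000), id="Alpha", annotations={"molecule_type": "DNA"}),
    SeqRecord(Seq("A" * 20000), id="Beta", annotations={"molecule_type": "DNA"}),
])
AlignIO.write([align1], "out.nex", "nexus")
# Start MrBayes ("mb"), execute the file, then quit; stdin expects bytes.
p = subprocess.Popen(["mb"], stdin=subprocess.PIPE)
p.communicate(b"execute out.nex\nquit\n")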
This gives the error:
MrBayes > execute out.nex
Executing file "out.nex"
UNIX line termination
Longest line length = 20006
A maximum of 19900 characters is allowed on a single line
in a file. The longest line of the file out.nex
contains at least one line with 20056 characters.
Error in command "Execute"
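Peter's point that long sequences can be split over multiple lines corresponds to the NEXUS interleaved matrix, where the matrix is written as a series of blocks and each block holds one slice of every sequence. Below is a minimal hand-rolled sketch of such a writer. `write_interleaved_nexus` and its `block_size` parameter are hypothetical (this is not Bio.Nexus's writer). It assumes plain taxon names that need no NEXUS quoting, and it has not been checked against MrBayes here.

```python
import io


def write_interleaved_nexus(records, handle, block_size=1000, datatype="dna"):
    """Write (name, sequence) pairs as an interleaved NEXUS DATA block.

    Hypothetical helper, not part of Bio.Nexus. Each matrix line holds at
    most `block_size` characters of sequence, so line length no longer
    grows with alignment length. Assumes names need no NEXUS quoting.
    """
    names = [name for name, _ in records]
    seqs = [seq for _, seq in records]
    nchar = len(seqs[0])
    if any(len(s) != nchar for s in seqs):
        raise ValueError("all sequences must have the same length")
    width = max(len(n) for n in names) + 1  # pad names so blocks line up
    handle.write("#NEXUS\n")
    handle.write("begin data;\n")
    handle.write(f"  dimensions ntax={len(records)} nchar={nchar};\n")
    handle.write(f"  format datatype={datatype} missing=? gap=- interleave=yes;\n")
    handle.write("  matrix\n")
    for start in range(0, nchar, block_size):
        for name, seq in zip(names, seqs):
            # one slice of every sequence per block
            handle.write(f"{name.ljust(width)}{seq[start:start + block_size]}\n")
        handle.write("\n")  # blank line separates interleaved blocks
    handle.write("  ;\nend;\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_interleaved_nexus([("Alpha", "A" * 20000), ("Beta", "A" * 20000)], buf)
    print(max(len(line) for line in buf.getvalue().splitlines()))  # prints 1006
```

With `block_size=1000`, the 20,000-column test case above is written as 20 blocks, and the longest line is the padded name plus 1,000 characters. That is well under the 19,900-character limit MrBayes reports.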
Cheers
Nick