[Biopython-dev] Bio.AlignIO, Bio.Nexus, MrBayes, polymorphic sites, maximum line length

Peter biopython at maubp.freeserve.co.uk
Thu Dec 2 11:43:28 UTC 2010


On Thu, Dec 2, 2010 at 10:50 AM, Nick Loman <n.j.loman at bham.ac.uk> wrote:
> Hi there
>
> Two questions for the developers.
>
> 1) I wanted to extract polymorphic sites from a multiple alignment and ended
> up with some code like this:
>
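>   from Bio import AlignIO
>
>   alignment = AlignIO.read(fn, "nexus")  # fn is the input NEXUS filename
>   new_alignment = None
>   for n in range(alignment.get_alignment_length()):
>       column = alignment[:, n]  # column n as a string, one character per row
>       # Polymorphic if the column holds more than one distinct character
>       # (gaps and ambiguity codes count as characters here)
>       if len(set(column)) > 1:
>           if new_alignment is None:
>               new_alignment = alignment[:, n:n + 1]
>           else:
>               new_alignment += alignment[:, n:n + 1]
>   if new_alignment is not None:
>       with open(fn + ".ply", "w") as handle:
>           AlignIO.write([new_alignment], handle, "nexus")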
>
> Is this the best way of doing it? Would a method call in AlignIO to
> do the same thing be useful to others?

I've got some code somewhere for iterating over the columns of
the alignment, and I think I filed an enhancement bug for this.
Would that do what you want?
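For comparison, here is a minimal sketch of a column iterator plus a polymorphic-site filter. It is written in plain Python over a list of equal-length row strings, so it does not depend on any particular Biopython version. The names `iter_columns` and `polymorphic_columns` are hypothetical. They are not an existing Bio.AlignIO API, and this is not the code mentioned above.

```python
from typing import Iterator, List, Sequence


def iter_columns(rows: Sequence[str]) -> Iterator[str]:
    """Yield each alignment column as a string, reading the rows top to bottom."""
    if not rows:
        return
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all rows must have the same length")
    for n in range(length):
        yield "".join(row[n] for row in rows)


def polymorphic_columns(rows: Sequence[str], ignore: str = "") -> List[int]:
    """Return the indices of columns with more than one distinct state.

    Characters listed in `ignore` (for example "-?") are dropped before
    comparing, so a column like "A-A" counts as invariant when ignore="-".
    """
    sites = []
    for n, column in enumerate(iter_columns(rows)):
        states = set(column) - set(ignore)
        if len(states) > 1:
            sites.append(n)
    return sites
```

With a Biopython alignment, the rows could be taken as `[str(rec.seq) for rec in alignment]`, and each returned index used to slice out `alignment[:, i:i + 1]` as in the loop above. The optional `ignore` argument is an assumption about what should count as a state. By default gaps and ambiguity codes count, which matches the original loop.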

> 2) When outputting long alignments in Nexus format, MrBayes refuses to read
> the resulting files saying that the maximum line length is 19900 characters.
> I'm assuming that is not the maximum input to MrBayes and that it can handle
> longer alignments if they are split in some way. Would it be possible for
> Bio.Nexus to split alignments in the appropriate format?

Are you outputting the large alignment using Bio.AlignIO or using
Bio.Nexus directly?

The file format details are not fresh in my mind, but I think long
sequences can be split over multiple lines (NEXUS allows an interleaved
matrix), so if the problem is just the line length MrBayes will accept,
that might be fixable. Can you give me a test case for this (maybe
generate a simple but large alignment in code) along with the MrBayes
call that fails?
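NEXUS supports splitting the matrix through the `interleave` token in the FORMAT command. The matrix is then written as a series of blocks, and each block repeats every taxon name followed by the next chunk of its sequence. Below is a minimal sketch of writing such a DATA block by hand. It assumes DNA data and taxon names without spaces or punctuation, since those would otherwise need quoting. The helper name `interleaved_nexus` is hypothetical. This is not the Bio.Nexus writer, and it has not been checked against MrBayes itself.

```python
from typing import Dict


def interleaved_nexus(seqs: Dict[str, str], block: int = 60, datatype: str = "dna") -> str:
    """Return a NEXUS DATA block whose matrix is split into interleaved chunks."""
    if block < 1:
        raise ValueError("block must be positive")
    names = list(seqs)
    if not names:
        raise ValueError("need at least one sequence")
    nchar = len(seqs[names[0]])
    if any(len(s) != nchar for s in seqs.values()):
        raise ValueError("sequences must all be the same length")
    width = max(len(name) for name in names)  # pad names so chunks line up
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(names)} nchar={nchar};",
        f"  format datatype={datatype} missing=? gap=- interleave;",
        "  matrix",
    ]
    for start in range(0, nchar, block):
        if start:
            lines.append("")  # blank line between interleaved blocks
        for name in names:
            chunk = seqs[name][start:start + block]
            lines.append(f"  {name.ljust(width)}  {chunk}")
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"
```

A test case along the lines requested could use something like `{"taxon%d" % i: "ACGT" * 10000 for i in range(4)}`. In sequential form, each matrix line of that 40,000-column alignment would exceed 19900 characters. In the interleaved output, each matrix line is only a little longer than `block`.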

Peter



