[Biopython-dev] Bio.AlignIO, Bio.Nexus, MrBayes, polymorphic sites, maximum line length

Cymon Cox cy at cymon.org
Thu Dec 2 12:03:55 UTC 2010


On 2 December 2010 11:43, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Thu, Dec 2, 2010 at 10:50 AM, Nick Loman <n.j.loman at bham.ac.uk> wrote:
> > Hi there
> [...]
> > 2) When outputting long alignments in Nexus format, MrBayes refuses to read
> > the resulting files, saying that the maximum line length is 19900 characters.
> > I'm assuming that is not the maximum input to MrBayes and that it can handle
> > longer alignments if they are split in some way. Would it be possible for
> > Bio.Nexus to split alignments in the appropriate format?
>
> The file format details are not fresh in my mind, but I think that long
> sequences can be split over multiple lines


This is valid interleaved Nexus format:

"""
#NEXUS

begin data;
Dimensions ntax=4 nchar=3;
Format interleave datatype=dna gap=-;
Matrix

taxon1 AA
taxon2 GG
taxon3 CC
taxon4 TT

taxon1 A
taxon2 G
taxon3 C
taxon4 T
;

end;
"""

Note the "interleave" keyword on the Format line. Also beware that some Nexus
parsers do not check that the taxa in later blocks are in the same order as in
the first block; they simply assume they are.
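
As a rough illustration of the layout above, here is a minimal pure-Python
sketch (standard library only, not Bio.Nexus code). The function names
write_interleaved_nexus and read_interleaved_matrix and the block_width
parameter are hypothetical, made up for this example. The writer cuts every
sequence into fixed-width blocks; the reader rebuilds the sequences and,
unlike the lenient parsers mentioned above, refuses blocks whose taxon order
differs from the first block.

```python
def write_interleaved_nexus(seqs, block_width=1000, datatype="dna", gap="-"):
    """Return an interleaved Nexus data block as a string.

    seqs is a list of (taxon_name, sequence) pairs of equal sequence length.
    block_width (hypothetical parameter) caps the characters per matrix line.
    """
    if block_width < 1:
        raise ValueError("block_width must be at least 1")
    lengths = {len(seq) for _, seq in seqs}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    nchar = lengths.pop()
    pad = max(len(name) for name, _ in seqs)  # align the sequence columns
    lines = [
        "#NEXUS",
        "",
        "begin data;",
        f"Dimensions ntax={len(seqs)} nchar={nchar};",
        f"Format interleave datatype={datatype} gap={gap};",
        "Matrix",
    ]
    for start in range(0, nchar, block_width):
        lines.append("")  # a blank line separates consecutive blocks
        for name, seq in seqs:
            lines.append(f"{name.ljust(pad)} {seq[start:start + block_width]}")
    lines += [";", "", "end;"]
    return "\n".join(lines) + "\n"


def read_interleaved_matrix(text):
    """Rebuild (taxon_name, sequence) pairs, checking taxon order per block.

    Assumes simple input like the writer's output: no quoted taxon names,
    no comments, and a single Matrix command.
    """
    body = text.split("Matrix", 1)[1].split(";", 1)[0]
    blocks, current = [], []
    for line in body.splitlines():
        if line.strip():
            name, chunk = line.split(None, 1)
            current.append((name, chunk.strip()))
        elif current:  # a blank line closes the block collected so far
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    order = [name for name, _ in blocks[0]]
    pieces = {name: [] for name in order}
    for number, block in enumerate(blocks, start=1):
        if [name for name, _ in block] != order:
            raise ValueError(f"taxon order differs in block {number}")
        for name, chunk in block:
            pieces[name].append(chunk)
    return [(name, "".join(pieces[name])) for name in order]
```

Calling write_interleaved_nexus with the four three-character taxa above and
block_width=2 gives the same two-block matrix as in the example. For the
MrBayes case, any block_width that keeps each line under the reported
19900-character limit should do.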

You can write interleaved Nexus data with
Nexus.write_nexus_data(interleave_by_partition=True), provided you have a
character partition set.

Cheers, C.



> - so if the problem is
> just with how MrBayes parses the file, that might be fixable. Can
> you give me a test case for this (maybe generate a simple but
> large alignment in code) with the MrBayes call that fails?
>
> Peter
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>



-- 
____________________________________________________________________

Cymon J. Cox

Auxiliary Investigator
Plant Systematics and Bioinformatics Research Group (PSB)
Centro de Ciencias do Mar (CCMAR) - CIMAR-Lab. Assoc.

Mailing address:
Rm. 2.77
Faculdade de Ciências e Tecnologia (FCT), Ed.7,
Universidade do Algarve
Campus de Gambelas
8005-139 Faro
Portugal

Phone: +0351 289800909 ext 7380
Fax: +0351 289800051
Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com
HomePage : http://www.ccmar.ualg.pt/home/index.php?id=202
-8.63/-6.77



