[Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names
bugzilla-daemon at portal.open-bio.org
Mon Jun 30 14:21:41 UTC 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-30 10:21 EST -------
Can I repeat my request that you upload example FASTA and NEXUS files that
don't work for you (as attachments to this bug)?
Here is a small Nexus file I just created by hand, in which the taxon
CYS1_DICDI appears twice (with almost identical sequences), followed by some
example code that uses Bio.Nexus to parse it.
==================================
#NEXUS
[TITLE: NoName]
begin data;
dimensions ntax=4 nchar=50;
format interleave datatype=protein gap=- symbols="FSTNKEYVQMCLAWPHDRIG";
matrix
CYS1_DICDI -----MKVIL LFVLAVFTVF VSS------- --------RG IPPEEQ----
ALEU_HORVU MAHARVLLLA LAVLATAAVA VASSSSFADS NPIRPVTDRA ASTLESAVLG
CATH_HUMAN ------MWAT LPLLCAGAWL LGV------- -PVCGAAELS VNSLEK----
CYS1_DICDI -----MKVIL LFVLAVFTVF VSS------- --------RG IPPEEQ---X
;
end;
==================================
Then, in Python:
>>> filename = ...
>>> handle = open(filename)
>>> from Bio.Nexus import Nexus
>>> n = Nexus.Nexus(handle)
>>> print n.matrix.keys()
['CATH_HUMAN', 'CYS1_DICDI', 'CYS1_DICDI.copy', 'ALEU_HORVU']
>>> n.matrix['CYS1_DICDI']
Seq('-----MKVILLFVLAVFTVFVSS---------------RGIPPEEQ----', IUPACProtein())
>>> n.matrix['CYS1_DICDI.copy']
Seq('-----MKVILLFVLAVFTVFVSS---------------RGIPPEEQ---X', IUPACProtein())
Note that Bio.Nexus has automatically renamed the duplicate entry to
'CYS1_DICDI.copy', and that both of the (slightly different) sequences have
been loaded correctly.
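To make the renaming behaviour concrete, here is a minimal pure-Python sketch of one way a parser can keep duplicate taxon names apart instead of silently overwriting the first sequence. It is not Bio.Nexus's actual code. The helper name parse_matrix_rows is hypothetical, and the sketch assumes a simple matrix with one taxon per row. It also assumes that any further duplicates get ".copy" appended again, because the session above only shows what happens with a single duplicate.

```python
def parse_matrix_rows(lines, suffix=".copy"):
    """Build a {taxon: sequence} dict from simple NEXUS-style matrix rows.

    Hypothetical helper, not Bio.Nexus code. Each row is assumed to hold
    one taxon name followed by its whole sequence, possibly split into
    space-separated blocks. A repeated name gets `suffix` appended (again
    and again if needed) until the key is unique, so no sequence is lost.
    """
    matrix = {}
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank rows
            continue
        name, sequence = fields[0], "".join(fields[1:])
        key = name
        while key in matrix:  # duplicate taxon name: rename, don't overwrite
            key += suffix
        matrix[key] = sequence
    return matrix


if __name__ == "__main__":
    # The matrix rows from the example file above.
    rows = [
        "CYS1_DICDI -----MKVIL LFVLAVFTVF VSS------- --------RG IPPEEQ----",
        "ALEU_HORVU MAHARVLLLA LAVLATAAVA VASSSSFADS NPIRPVTDRA ASTLESAVLG",
        "CATH_HUMAN ------MWAT LPLLCAGAWL LGV------- -PVCGAAELS VNSLEK----",
        "CYS1_DICDI -----MKVIL LFVLAVFTVF VSS------- --------RG IPPEEQ---X",
    ]
    matrix = parse_matrix_rows(rows)
    print(sorted(matrix))
    # ['ALEU_HORVU', 'CATH_HUMAN', 'CYS1_DICDI', 'CYS1_DICDI.copy']
```

Appending a suffix rather than overwriting the existing key is what preserves both CYS1_DICDI sequences. A numbered scheme would work just as well, as long as the generated key is checked for uniqueness.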