[Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names
Frank Kauff
fkauff at biologie.uni-kl.de
Mon Jun 30 13:10:15 EDT 2008
bugzilla-daemon at portal.open-bio.org wrote:
>
>
> In the Bio.SeqIO code that calls Bio.Nexus, I hadn't realized that Bio.Nexus
> kept the un-edited taxon names around. It is this list of the non-unique
> original identifiers that Bio.SeqIO was using, which explains why you end up
> with two copies of HI99.Line5.
>
> Sorry Frank - I was pointing fingers when it was my own bug after all!
>
>
> Looking back, the reason I was using the original_taxon_order list was I wanted
> to get the sequences in their original order. I see now that I can't use the
> elements in this list as keys to the matrix because the matrix keys are the
> modified taxon names.
>
> Is there any way to get the modified taxon names in the original order? Other
> than looping over original_taxon_order and repeating your naming algorithm?
>
Actually, this *IS* a bug. All fingers were pointing correctly...
original_taxon_order was kept only for compatibility, and is the
same as taxlabels. taxlabels is supposed to hold the unique identifiers;
it just doesn't work correctly with non-unique IDs in interleaved data
sets.
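
For illustration, the behaviour asked about above (unique names, returned in
the original order) can be sketched in plain Python. This is not the Bio.Nexus
code: the function name unique_labels_in_order and the ".copy" suffix scheme
are assumptions made for this example only.

```python
def unique_labels_in_order(original_labels, suffix=".copy"):
    """Return labels made unique, preserving the input order.

    Hypothetical helper, not part of Bio.Nexus. The suffix scheme
    (".copy", ".copy2", ...) is an assumption for illustration.
    """
    seen = set()
    result = []
    for label in original_labels:
        candidate = label
        n = 0
        # Keep trying new suffixes until the name has not been used yet.
        while candidate in seen:
            n += 1
            candidate = label + suffix + (str(n) if n > 1 else "")
        seen.add(candidate)
        result.append(candidate)
    return result


labels = ["HI99.Line5", "HI98.Line2", "HI99.Line5"]
print(unique_labels_in_order(labels))
```

A list built this way could serve both as the taxon order and as the keys to
the matrix, since each entry is unique and the positions match the input.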
Fix following soon
Frank