[Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names

Frank Kauff fkauff at biologie.uni-kl.de
Mon Jun 30 13:10:15 EDT 2008



bugzilla-daemon at portal.open-bio.org wrote:
>
>
> In the Bio.SeqIO code that calls Bio.Nexus, I hadn't realized that Bio.Nexus
> kept the un-edited taxon names around.  It is this list of the non-unique
> original identifiers that Bio.SeqIO was using, which explains why you end up
> with two copies of HI99.Line5.
>
> Sorry Frank - I was pointing fingers when it was my own bug after all!
>
>
> Looking back, the reason I was using the original_taxon_order list was I wanted
> to get the sequences in their original order.  I see now that I can't use the
> elements in this list as keys to the matrix because the matrix keys are the
> modified taxon names.
>
> Is there any way to get the modified taxon names in the original order?  Other
> than looping over original_taxon_order and repeating your naming algorithm?
>
Actually, this *IS* a bug. All fingers were pointing correctly...
original_taxon_order was kept just for compatibility, and is the
same as taxlabels. taxlabels is supposed to hold the unique identifiers;
it just doesn't work correctly with non-unique ids in interleaved data
sets.
A fix will follow soon.

Frank
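
For illustration only, here is a minimal sketch of the idea discussed above:
making repeated taxon names unique while keeping the original order, so the
resulting list can be used both for ordering and as matrix keys. The function
name unique_labels and the ".2", ".3" suffix scheme are assumptions made up
for this example; they are not the naming algorithm Bio.Nexus actually uses,
and the dictionary below is only a stand-in for the Nexus matrix.

```python
def unique_labels(names):
    """Return names in their original order, with repeats made unique.

    Hypothetical scheme: the second occurrence of "x" becomes "x.2",
    the third "x.3", and so on. Not the Bio.Nexus algorithm.
    """
    seen = set()
    result = []
    for name in names:
        candidate = name
        n = 1
        # Keep bumping the suffix until the label has not been used yet.
        while candidate in seen:
            n += 1
            candidate = "%s.%d" % (name, n)
        seen.add(candidate)
        result.append(candidate)
    return result


# Non-unique identifiers, as in the bug report.
original = ["HI99.Line5", "HI98.Line1", "HI99.Line5"]
modified = unique_labels(original)

# Stand-in for a matrix keyed by the modified (unique) names.
matrix = dict((label, "ACGT") for label in modified)

# Iterating over the modified list gives the records in original order,
# and every element is a valid key into the matrix.
ordered = [(label, matrix[label]) for label in modified]
```

With a list like this, a caller would not need to loop over the original
names and repeat the renaming step itself.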

