[Biopython-dev] [Bug 2494] New: _retrieve_taxon in BioSQL.py needs urgent optimization

Sat Apr 26 03:47:28 UTC 2008

http://bugzilla.open-bio.org/show_bug.cgi?id=2494

           Summary: _retrieve_taxon in BioSQL.py needs urgent optimization
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: ericgibert at yahoo.fr

I ran the Perl script to update the BioSQL tables 'taxon' and 'taxon_name'.
The taxon table now contains 419036 rows and taxon_name contains 584058 rows.

To retrieve the taxonomy of a DBSeqRecord, the function DBSeq._retrieve_taxon()
uses an SQL query based on the nested sets defined by the left and right values.
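
For illustration, here is a minimal sketch of the kind of nested-set ancestor
query described above. It is not the actual code from BioSQL.py. The toy table,
its rows, the column names left_value and right_value, and the helper
lineage_nested_sets() are all assumptions made for this example, and the name
is stored directly in the taxon table only to keep the sketch short.

```python
import sqlite3

# Toy stand-in for the BioSQL taxon table (hypothetical data).
# Keeping the name in the same table is a simplification for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon ("
    " taxon_id INTEGER PRIMARY KEY,"
    " parent_taxon_id INTEGER,"
    " name TEXT,"
    " left_value INTEGER,"
    " right_value INTEGER)"
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "root", 1, 10),
        (2, 1, "Eukaryota", 2, 9),
        (3, 2, "Homo", 3, 6),
        (4, 3, "Homo sapiens", 4, 5),
        (5, 2, "Mus", 7, 8),
    ],
)


def lineage_nested_sets(conn, taxon_id):
    """Return names from the root down to taxon_id using left/right values."""
    rows = conn.execute(
        # An ancestor's interval encloses the interval of the node itself.
        "SELECT anc.name"
        " FROM taxon AS node"
        " JOIN taxon AS anc"
        "   ON anc.left_value <= node.left_value"
        "  AND anc.right_value >= node.right_value"
        " WHERE node.taxon_id = ?"
        " ORDER BY anc.left_value",
        (taxon_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(lineage_nested_sets(conn, 4))
```

The join condition is a pair of range comparisons against every candidate
ancestor row, so how well it performs on a large table depends on the indexes
available on the left and right columns.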

This approach becomes extremely time-consuming once the tables grow large. For a
bottom-up search, in this case "all parent taxa of this species", it is better
to follow the child/parent links given by the parent_taxon_id field.
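
The bottom-up alternative could look roughly like the sketch below. It is not
the script referred to in this report and not a proposed patch. The toy table,
its rows, and the helper lineage_by_parent() are hypothetical. Whether the root
row stores NULL or its own id as parent is also an assumption, so the loop
guards against both.

```python
import sqlite3

# Toy stand-in for the BioSQL taxon table (hypothetical data).
# Keeping the name in the same table is a simplification for this sketch.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE taxon ("
    " taxon_id INTEGER PRIMARY KEY,"
    " parent_taxon_id INTEGER,"
    " name TEXT)"
)
db.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?)",
    [
        (1, None, "root"),
        (2, 1, "Eukaryota"),
        (3, 2, "Homo"),
        (4, 3, "Homo sapiens"),
        (5, 2, "Mus"),
    ],
)


def lineage_by_parent(db, taxon_id):
    """Walk parent_taxon_id links upward; return names from the root down."""
    lineage = []
    while taxon_id is not None:
        row = db.execute(
            "SELECT name, parent_taxon_id FROM taxon WHERE taxon_id = ?",
            (taxon_id,),
        ).fetchone()
        if row is None:
            break  # dangling parent link: stop rather than fail
        name, parent_id = row
        lineage.append(name)
        if parent_id == taxon_id:
            break  # root stored as its own parent (assumption)
        taxon_id = parent_id
    lineage.reverse()  # collected leaf-first, reported root-first
    return lineage


print(lineage_by_parent(db, 4))
```

Each step is a single lookup by primary key, so the number of queries equals
the depth of the lineage and does not grow with the number of rows in the
table.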

Please refer to the next post, which has an attached script demonstrating my point.

Eric
