[Biopython-dev] [Bug 2494] New: _retrieve_taxon in BioSQL.py needs urgent optimization
bugzilla-daemon at portal.open-bio.org
Sat Apr 26 03:47:28 UTC 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2494
Summary: _retrieve_taxon in BioSQL.py needs urgent optimization
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: BioSQL
AssignedTo: biopython-dev at biopython.org
ReportedBy: ericgibert at yahoo.fr
I ran the Perl script to update the BioSQL tables 'taxon' and 'taxon_name'.
The taxon table contains 419036 rows and taxon_name contains 584058 rows.
To retrieve the taxonomy of a DBSeqRecord, the function DBSeq._retrieve_taxon()
uses an SQL query based on the nested sets defined by the left and right values.
This approach becomes extremely time consuming once the tables grow large. When
the search is bottom-up, as in this case ("all parent taxa of this species"),
it is better to follow the child/parent links through the parent_taxon_id field.
Please refer to the next post, which has an attached script demonstrating my point.
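
As a rough illustration of the difference between the two lookups, here is a
minimal sketch. It is not the attached script and not the actual code in
BioSQL.py. It assumes a cut-down 'taxon' / 'taxon_name' layout (only the
columns needed here), an in-memory SQLite database, four made-up rows, and
that parent_taxon_id points at taxon_id. The function names are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the BioSQL taxon tables (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE taxon (
            taxon_id        INTEGER PRIMARY KEY,
            parent_taxon_id INTEGER,
            left_value      INTEGER,
            right_value     INTEGER
        );
        CREATE TABLE taxon_name (
            taxon_id   INTEGER,
            name       TEXT,
            name_class TEXT
        );
        """
    )
    # (taxon_id, parent_taxon_id, left_value, right_value, name): illustrative data only
    rows = [
        (1, None, 1, 8, "root"),
        (2, 1, 2, 7, "Eukaryota"),
        (3, 2, 3, 6, "Homo"),
        (4, 3, 4, 5, "Homo sapiens"),
    ]
    for taxon_id, parent_id, left, right, name in rows:
        conn.execute(
            "INSERT INTO taxon VALUES (?, ?, ?, ?)",
            (taxon_id, parent_id, left, right),
        )
        conn.execute(
            "INSERT INTO taxon_name VALUES (?, ?, 'scientific name')",
            (taxon_id, name),
        )
    return conn


def lineage_nested_sets(conn, taxon_id):
    """Top-down style: one query comparing left/right values against the taxon table."""
    cursor = conn.execute(
        """
        SELECT n.name
        FROM taxon t, taxon anc, taxon_name n
        WHERE t.taxon_id = ?
          AND anc.left_value <= t.left_value
          AND anc.right_value >= t.right_value
          AND n.taxon_id = anc.taxon_id
          AND n.name_class = 'scientific name'
        ORDER BY anc.left_value
        """,
        (taxon_id,),
    )
    return [name for (name,) in cursor]


def lineage_parent_walk(conn, taxon_id):
    """Bottom-up style: one primary-key lookup per level, following parent_taxon_id."""
    names = []
    current = taxon_id
    while current is not None:
        row = conn.execute(
            """
            SELECT t.parent_taxon_id, n.name
            FROM taxon t
            JOIN taxon_name n ON n.taxon_id = t.taxon_id
            WHERE t.taxon_id = ?
              AND n.name_class = 'scientific name'
            """,
            (current,),
        ).fetchone()
        if row is None:
            break
        parent, name = row
        names.append(name)
        if parent == current:
            # Guard against a root node that lists itself as its own parent.
            break
        current = parent
    names.reverse()  # collected species-first; return root-first
    return names
```

Both functions return the same root-first lineage on the demo data. The
parent walk issues one lookup by taxon_id per level of the lineage, whereas
the nested-set query has to compare left/right ranges, which is where I would
expect the cost to grow with the size of the table.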
Eric