[Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing taxon entries in lineage

Wed Apr 2 13:41:35 UTC 2008

http://bugzilla.open-bio.org/show_bug.cgi?id=2475

------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk  2008-04-02 09:41 EST -------
In reply to comment 11,
> Ok to have the code in Loader.py.
> When a SeqRecord is to be added, we create a Taxonomy instance.
> I will add a function to return a copy of the _NCBI_lineage list.
> Then from "top" to "bottom", check if the taxon exists, if not,
> add it, until the species itself (there ensure that the
> parent_taxon_id is well populated).

Something like that sounds fine.  But I think we should settle the
Bio.Entrez.Taxonomy code first.

> By default, we assume that taxon_id == NCBI_taxon_id.

Why do you say that?  I don't think we should make this assumption.
See also BioSQL project Bug 2470.

> If this is not the case, do I raise an error or
> "fall to plan B" and let the database to auto assign
> the taxon_id?

I am inclined to let the database assign the taxon_id, unless after discussion
on the BioSQL mailing list it is agreed that "attempting" to use the NCBI taxon
id as the taxon_id is encouraged.
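
For illustration, the top-to-bottom lineage walk with a database-assigned
taxon_id could be sketched roughly as below. This is not the Loader.py code.
It assumes a simplified, hypothetical taxon table in an in-memory SQLite
database, loosely modelled on the BioSQL taxon table, and a hypothetical
helper get_or_create_lineage(). Existing entries are looked up by
ncbi_taxon_id only, so nothing assumes taxon_id == NCBI taxon id.

```python
import sqlite3

# Simplified stand-in for the BioSQL taxon table (assumption: the real
# schema has more columns, and taxon names live in a separate table).
SCHEMA = """
CREATE TABLE taxon (
    taxon_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    ncbi_taxon_id   INTEGER UNIQUE,
    parent_taxon_id INTEGER,
    node_rank       TEXT
)
"""


def get_or_create_lineage(conn, lineage):
    """Walk a lineage from the top node down to the species.

    lineage is a list of (ncbi_taxon_id, node_rank) tuples ordered from
    the highest rank to the species. Existing rows are reused; missing
    rows are inserted with parent_taxon_id pointing at the row above.
    Returns the database-assigned taxon_id of the last entry.
    """
    cur = conn.cursor()
    parent_id = None
    for ncbi_id, rank in lineage:
        cur.execute(
            "SELECT taxon_id FROM taxon WHERE ncbi_taxon_id = ?", (ncbi_id,)
        )
        row = cur.fetchone()
        if row is None:
            # Let the database pick taxon_id; only record the NCBI id.
            cur.execute(
                "INSERT INTO taxon (ncbi_taxon_id, parent_taxon_id, node_rank)"
                " VALUES (?, ?, ?)",
                (ncbi_id, parent_id, rank),
            )
            taxon_id = cur.lastrowid
        else:
            taxon_id = row[0]
        parent_id = taxon_id
    conn.commit()
    return parent_id


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Two species sharing one family node (NCBI ids used as sample data).
human = get_or_create_lineage(
    conn, [(9604, "family"), (9605, "genus"), (9606, "species")]
)
chimp = get_or_create_lineage(
    conn, [(9604, "family"), (9596, "genus"), (9598, "species")]
)
count = conn.execute("SELECT COUNT(*) FROM taxon").fetchone()[0]
```

The second call reuses the shared family row rather than inserting a
duplicate, so the table ends up with five rows instead of six.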

> On missing point: the left and right value. Do you know
> what to do? I have run the Perl script on a test database
> and plan to look into the created records to clarify it...
> but you can save me the effort if you already know their logic.

Sorry, I haven't yet gone through this enough to be confident in
the correct usage (and Brad's comments in the relevant old bit of
Loader.py weren't very helpful).  It might be worth discussing this
on the BioSQL mailing list.
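
As a starting point for that discussion, here is a small sketch of how left
and right values are numbered under the usual "nested set" convention for
storing trees in SQL. Whether BioSQL's left_value and right_value follow
exactly this convention is an assumption that still needs confirming. The
function name nested_set_values() and the toy tree are made up for
illustration.

```python
import itertools


def nested_set_values(children, root):
    """Assign (left, right) numbers by a depth-first walk of the tree.

    children maps each node to a list of its child nodes. A node gets its
    left number on the way down and its right number on the way back up,
    so every descendant's numbers fall strictly between its ancestor's.
    """
    counter = itertools.count(1)
    values = {}

    def visit(node):
        left = next(counter)
        for child in children.get(node, []):
            visit(child)
        values[node] = (left, next(counter))

    visit(root)
    return values


# Toy tree: one family with two genera, each with one species.
tree = {
    "Hominidae": ["Homo", "Pan"],
    "Homo": ["Homo sapiens"],
    "Pan": ["Pan troglodytes"],
}
values = nested_set_values(tree, "Hominidae")
```

Under this convention, all descendants of a node can be selected with a
single range test on the left value, which is presumably why the columns
exist. Note that inserting a new taxon would mean renumbering the nodes
that come after it.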

Peter
