[BioSQL-l] Loading sequences with novel NCBI taxon id
Hilmar Lapp
hlapp at gmx.net
Tue Mar 18 12:30:34 UTC 2008
On Mar 17, 2008, at 12:08 PM, Peter wrote:
>> [...]
>> It's pretty unreliable actually. There is not only synonymy but also
>> rampant homonymy in taxonomic names. There are plenty of examples for
>> the same scientific name in use for a plant and for some animal, for
>> example. So in order to be unambiguous you will need to know (and
>> check) the kingdom.
>
> I don't think the current Biopython code for recording the lineages
> checks the kingdom... could someone point me at the relevant bit of
> BioPerl and I'll see if I can understand exactly what they do?
Bioperl-db looks up a taxon by NCBI taxon ID first and then by
scientific name. It does not take kingdom into account.
You can find the persisted columns, unique key queries, etc. in
Bio/DB/BioSQL, in the respective adapter, in this case
SpeciesAdapter.pm. The unique key queries are defined in
get_unique_key_query().
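For the Biopython side, the lookup order described above (NCBI taxon ID
first, then scientific name, no kingdom check) could be sketched roughly
as follows. This is only an illustration, not the Bioperl-db code: the
two tables are a cut-down version of the BioSQL taxon and taxon_name
tables, find_taxon() and make_demo_db() are hypothetical names, the
"Exemplum homonymum" rows and their IDs are made up to stand in for a
homonym, and raising an error on an ambiguous name is my own choice
rather than anything Bioperl-db is known to do.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a simplified BioSQL-like taxon schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE taxon (
            taxon_id      INTEGER PRIMARY KEY,
            ncbi_taxon_id INTEGER UNIQUE
        );
        CREATE TABLE taxon_name (
            taxon_id   INTEGER NOT NULL REFERENCES taxon (taxon_id),
            name       TEXT NOT NULL,
            name_class TEXT NOT NULL,
            UNIQUE (taxon_id, name, name_class)
        );
        """
    )
    # The primary key is deliberately different from the NCBI taxon ID.
    # 900001 and 900002 are made-up IDs for a made-up homonym pair.
    conn.executemany(
        "INSERT INTO taxon (taxon_id, ncbi_taxon_id) VALUES (?, ?)",
        [(1, 9606), (2, 900001), (3, 900002)],
    )
    conn.executemany(
        "INSERT INTO taxon_name (taxon_id, name, name_class) "
        "VALUES (?, ?, 'scientific name')",
        [(1, "Homo sapiens"), (2, "Exemplum homonymum"), (3, "Exemplum homonymum")],
    )
    return conn


def find_taxon(conn, ncbi_taxon_id=None, scientific_name=None):
    """Return the taxon primary key, or None if nothing matches.

    Tries the NCBI taxon ID first and falls back to the scientific name.
    Kingdom is not consulted, so a homonym cannot be resolved by name alone;
    in that case a LookupError is raised (an assumption of this sketch).
    """
    if ncbi_taxon_id is not None:
        row = conn.execute(
            "SELECT taxon_id FROM taxon WHERE ncbi_taxon_id = ?",
            (ncbi_taxon_id,),
        ).fetchone()
        if row is not None:
            return row[0]
    if scientific_name is not None:
        rows = conn.execute(
            "SELECT taxon_id FROM taxon_name "
            "WHERE name = ? AND name_class = 'scientific name'",
            (scientific_name,),
        ).fetchall()
        if len(rows) > 1:
            raise LookupError("ambiguous scientific name: %r" % scientific_name)
        if rows:
            return rows[0][0]
    return None
```

Note that the function returns the primary key (taxon_id), which in this
demo is not the same number as ncbi_taxon_id, so nothing here depends on
the two being identical.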
>
> Hilmar Lapp wrote:
>> If I remember correctly, the script makes (and hence expects) the
>> primary key and the NCBI taxonomy ID to be identical.
>> ...
>> Doing that isn't a big deal but I guess this could also be fixed in
>> load_ncbi_taxonomy.pl so that it doesn't need to rely on this
>> assumption. Would someone mind filing the bug report? (We have a
>> BioSQL category now on bugzilla.open-bio.org.)
>
> I've filed Bug 2470 on this:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2470
Thanks for the help, great, appreciated!
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================