[BioSQL-l] Loading sequences with novel NCBI taxon id

Tue Mar 18 12:30:34 UTC 2008

On Mar 17, 2008, at 12:08 PM, Peter wrote:

>> [...]
>>  It's pretty unreliable actually. There is not only synonymy but also
>>  rampant homonymy in taxonomic names. There are plenty of examples for
>>  the same scientific name in use for a plant and for some animal, for
>>  example. So in order to be unambiguous you will need to know (and
>>  check) the kingdom.
>
> I don't think the current Biopython code for recording the lineages
> checks the kingdom... could someone point me at the relevant bit of
> BioPerl and I'll see if I can understand exactly what they do?

Bioperl-db looks up a taxon by NCBI taxon ID first and then by
scientific name. It does not take kingdom into account.
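
To make the lookup order concrete, here is a minimal Python sketch of "NCBI taxon ID first, then scientific name". It is not the Bioperl-db code. The schema is a cut-down version of the BioSQL taxon and taxon_name tables, the function names find_taxon() and build_demo_db() are made up for illustration, and raising an error on homonyms is a choice of this sketch, not something Bioperl-db is known to do.

```python
import sqlite3

# Simplified stand-in for the BioSQL taxon tables (assumption: most
# columns are left out; only those needed for the lookup remain).
SCHEMA = """
CREATE TABLE taxon (
    taxon_id      INTEGER PRIMARY KEY,
    ncbi_taxon_id INTEGER UNIQUE
);
CREATE TABLE taxon_name (
    taxon_id   INTEGER NOT NULL REFERENCES taxon(taxon_id),
    name       TEXT NOT NULL,
    name_class TEXT NOT NULL,
    UNIQUE (name, name_class, taxon_id)
);
"""


def build_demo_db():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # The internal primary key (1) deliberately differs from the NCBI id.
    # Rows 2 and 3 carry no NCBI id; they model a homonym, since the genus
    # name Morus is used both for mulberries and for gannets.
    conn.executemany(
        "INSERT INTO taxon (taxon_id, ncbi_taxon_id) VALUES (?, ?)",
        [(1, 9606), (2, None), (3, None)],
    )
    conn.executemany(
        "INSERT INTO taxon_name (taxon_id, name, name_class) VALUES (?, ?, ?)",
        [
            (1, "Homo sapiens", "scientific name"),
            (2, "Morus", "scientific name"),
            (3, "Morus", "scientific name"),
        ],
    )
    return conn


def find_taxon(conn, ncbi_taxon_id=None, scientific_name=None):
    """Return the internal taxon_id, or None if nothing matches.

    Tries the NCBI taxon id first and falls back to the scientific name.
    Kingdom is not consulted, so a homonymous name cannot be resolved
    and raises LookupError instead.
    """
    if ncbi_taxon_id is not None:
        row = conn.execute(
            "SELECT taxon_id FROM taxon WHERE ncbi_taxon_id = ?",
            (ncbi_taxon_id,),
        ).fetchone()
        if row is not None:
            return row[0]
    if scientific_name is not None:
        rows = conn.execute(
            "SELECT taxon_id FROM taxon_name "
            "WHERE name = ? AND name_class = 'scientific name'",
            (scientific_name,),
        ).fetchall()
        if len(rows) > 1:
            raise LookupError(
                "ambiguous scientific name: %r matches %d taxa"
                % (scientific_name, len(rows))
            )
        if rows:
            return rows[0][0]
    return None
```

With the demo data, find_taxon(conn, ncbi_taxon_id=9606) returns the internal key 1 without ever touching the names, while looking up "Morus" by name alone hits two rows. That second case is exactly where a kingdom check would be needed to disambiguate.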

You can find the persisted columns, unique key queries, etc. in
Bio/DB/BioSQL and then the respective adapter, in this case
SpeciesAdapter.pm. The unique key queries are defined in
get_unique_key_query().

>
> Hilmar Lapp wrote:
>>  If I remember correctly, the script makes (and hence expects) the
>>  primary key and the NCBI taxonomy ID to be identical.
>>  ...
>>  Doing that isn't a big deal but I guess this could also be fixed in
>>  load_ncbi_taxonomy.pl so that it doesn't need to rely on this
>>  assumption. Would someone mind filing the bug report? (We have a
>>  BioSQL category now on bugzilla.open-bio.org.)
>
> I've filed Bug 2470 on this:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2470

Great, thanks for the help. Much appreciated!

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================