[BioSQL-l] Loading sequences with novel NCBI taxon id
Hilmar Lapp
hlapp at gmx.net
Thu Mar 13 19:41:43 EDT 2008
On Mar 13, 2008, at 7:13 PM, Peter wrote:
> On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>> [...]
>> The load_ncbi_taxonomy.pl script is designed to update the taxon
>> tables in a non-disruptive way, and if there weren't many changes it
>> shouldn't actually take that long (except that recalculating the
>> nested set values may take a couple of minutes).
>
> Do you think when faced with a novel taxon id, Biopython/BioPerl/...
> could write some minimal taxonomy entry (without any guess work based
> on the species name), in order to record the sequence's taxon
This is what Bioperl-db does, and no guesswork is involved. If the
Bio::Species object has lineage information, though, Bioperl-db will
insert the lineage information as well.
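To make the idea of a minimal taxonomy entry concrete, here is a rough sketch in Python using the standard sqlite3 module. It is not how Bioperl-db or Biopython implement this. The two tables are a simplified, hypothetical subset of the BioSQL taxon tables, and the helper name get_or_create_taxon_stub is made up for illustration. The stub records only the NCBI taxon ID and, if the sequence record supplies one, the scientific name as given.

```python
import sqlite3

# Simplified, hypothetical subset of the BioSQL taxon tables (SQLite dialect).
SCHEMA = """
CREATE TABLE taxon (
    taxon_id      INTEGER PRIMARY KEY,   -- assigned by the database
    ncbi_taxon_id INTEGER UNIQUE
);
CREATE TABLE taxon_name (
    taxon_id   INTEGER NOT NULL REFERENCES taxon(taxon_id),
    name       TEXT NOT NULL,
    name_class TEXT NOT NULL
);
"""


def get_or_create_taxon_stub(conn, ncbi_taxon_id, scientific_name=None):
    """Return the taxon_id for ncbi_taxon_id, inserting a bare stub if absent.

    Hypothetical helper: nothing is inferred from the species name.
    """
    row = conn.execute(
        "SELECT taxon_id FROM taxon WHERE ncbi_taxon_id = ?", (ncbi_taxon_id,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO taxon (ncbi_taxon_id) VALUES (?)", (ncbi_taxon_id,)
    )
    taxon_id = cur.lastrowid
    if scientific_name is not None:
        conn.execute(
            "INSERT INTO taxon_name (taxon_id, name, name_class) "
            "VALUES (?, ?, 'scientific name')",
            (taxon_id, scientific_name),
        )
    return taxon_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
stub_id = get_or_create_taxon_stub(conn, 9606, "Homo sapiens")
```

Because the database assigns taxon_id in this sketch, the stub's primary key will generally differ from its NCBI taxon ID.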
> - and then running an improved load_ncbi_taxonomy.pl at a later date
> would sort out the proper taxonomy?
If I remember correctly, the script makes (and hence expects) the
primary key and the NCBI taxonomy ID to be identical. If your loading
procedure already achieves that, then load_ncbi_taxonomy.pl should
pick up your entries and fix them. You can test this: load the
taxonomy through the script, choose an arbitrary taxon, create a stub
bioentry and set its taxon_id foreign key to the chosen taxon, then
change that taxon's taxon_name.name to some bogus value (for the
'scientific name' class, for example). Feel free to change the
left_id and right_id values in taxon too. Then rerun the script. It
should undo the changes you made, and your bioentry should still
point to the same taxon, because the taxon's primary key neither
changed nor got deleted. (Had the row been deleted, the bioentry
would now have a null value in the foreign key.)
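The experiment above can be mimicked on a toy database. The following Python sketch uses sqlite3 with a simplified, hypothetical subset of the schema; reload_taxonomy is only a stand-in for an update keyed on the primary key and is not the logic of load_ncbi_taxonomy.pl. The nested-set columns are called left_value and right_value here, which is an assumption about the schema's column names, as are the ON DELETE SET NULL constraint and the made-up reference values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE taxon (
    taxon_id      INTEGER PRIMARY KEY,
    ncbi_taxon_id INTEGER UNIQUE,
    left_value    INTEGER,
    right_value   INTEGER
);
CREATE TABLE taxon_name (
    taxon_id   INTEGER NOT NULL REFERENCES taxon(taxon_id),
    name       TEXT NOT NULL,
    name_class TEXT NOT NULL
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    taxon_id    INTEGER REFERENCES taxon(taxon_id) ON DELETE SET NULL,
    name        TEXT NOT NULL
);
""")

# Made-up reference data: NCBI ID -> (scientific name, left, right).
REFERENCE = {9606: ("Homo sapiens", 1, 2)}


def reload_taxonomy(conn, reference):
    """Toy reload: update rows in place, assuming taxon_id == NCBI taxon ID."""
    for ncbi_id, (name, left, right) in reference.items():
        updated = conn.execute(
            "UPDATE taxon SET ncbi_taxon_id = ?, left_value = ?, right_value = ? "
            "WHERE taxon_id = ?",
            (ncbi_id, left, right, ncbi_id),
        ).rowcount
        if updated == 0:
            conn.execute(
                "INSERT INTO taxon (taxon_id, ncbi_taxon_id, left_value, right_value) "
                "VALUES (?, ?, ?, ?)",
                (ncbi_id, ncbi_id, left, right),
            )
        conn.execute(
            "DELETE FROM taxon_name WHERE taxon_id = ? AND name_class = 'scientific name'",
            (ncbi_id,),
        )
        conn.execute(
            "INSERT INTO taxon_name (taxon_id, name, name_class) "
            "VALUES (?, ?, 'scientific name')",
            (ncbi_id, name),
        )


# 1. Initial load, then a stub bioentry pointing at the chosen taxon.
reload_taxonomy(conn, REFERENCE)
conn.execute("INSERT INTO bioentry (taxon_id, name) VALUES (9606, 'stub entry')")

# 2. Deliberately corrupt the name and the nested-set values.
conn.execute("UPDATE taxon_name SET name = 'bogus' WHERE taxon_id = 9606")
conn.execute("UPDATE taxon SET left_value = 999, right_value = 1000 WHERE taxon_id = 9606")

# 3. Rerun the load and inspect the result.
reload_taxonomy(conn, REFERENCE)
repaired_name = conn.execute(
    "SELECT name FROM taxon_name WHERE taxon_id = 9606"
).fetchone()[0]
repaired_bounds = conn.execute(
    "SELECT left_value, right_value FROM taxon WHERE taxon_id = 9606"
).fetchone()
bioentry_taxon = conn.execute("SELECT taxon_id FROM bioentry").fetchone()[0]
```

In this toy model the row is updated rather than deleted and re-created, so the bioentry's foreign key is never touched.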
Bioperl-db's way of storing things does not give it control over
primary key assignment; the database assigns the key, so it will
generally not equal the NCBI taxonomy ID.
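If stub rows with database-assigned keys had to be brought in line with the expectation that the primary key equals the NCBI taxonomy ID, one conceivable approach is a re-keying pass. The Python sketch below is only a guess at the general idea on a toy sqlite3 schema. It is not the SymAtlas SQL script mentioned further down, the function name rekey_stubs is hypothetical, and the ON UPDATE CASCADE constraint is an assumption made so that referencing bioentry rows follow the key change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE taxon (
    taxon_id      INTEGER PRIMARY KEY,
    ncbi_taxon_id INTEGER UNIQUE
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    taxon_id    INTEGER REFERENCES taxon(taxon_id) ON UPDATE CASCADE,
    name        TEXT NOT NULL
);
""")


def rekey_stubs(conn):
    """Set taxon_id = ncbi_taxon_id for rows where the two differ.

    Hypothetical helper. Rows whose target key is already in use are
    skipped and left for manual review.
    """
    rows = conn.execute(
        "SELECT taxon_id, ncbi_taxon_id FROM taxon "
        "WHERE ncbi_taxon_id IS NOT NULL AND taxon_id <> ncbi_taxon_id"
    ).fetchall()
    moved = []
    for old_id, ncbi_id in rows:
        taken = conn.execute(
            "SELECT 1 FROM taxon WHERE taxon_id = ?", (ncbi_id,)
        ).fetchone()
        if taken is not None:
            continue
        conn.execute(
            "UPDATE taxon SET taxon_id = ? WHERE taxon_id = ?", (ncbi_id, old_id)
        )
        moved.append((old_id, ncbi_id))
    return moved


# A stub whose key was assigned by the database, plus a bioentry using it.
stub_key = conn.execute(
    "INSERT INTO taxon (ncbi_taxon_id) VALUES (9606)"
).lastrowid
conn.execute(
    "INSERT INTO bioentry (taxon_id, name) VALUES (?, 'stub entry')", (stub_key,)
)

moved = rekey_stubs(conn)
bioentry_taxon = conn.execute("SELECT taxon_id FROM bioentry").fetchone()[0]
```

On a real database the same pass would also have to cover every other table that references taxon, which this sketch leaves out.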
> [...]
>> For the SymAtlas project we had this situation (new species in
>> sequence updates that the last NCBI taxonomy update hadn't yet
>> brought in) quite regularly. I wrote a SQL script that would fix those
>> 'haphazard' additions such that load_ncbi_taxonomy would update them
>> to their correct values come the next NCBI taxonomy update. I can
>> send you the script (it would be for the Oracle version), but I'm
>> not sure this is a widely viable strategy.
>
> So this wasn't integrated with load_ncbi_taxonomy.pl at all?
No, but now that you mention it, I don't see any reason why it
couldn't be. Maybe that's just what I should do.
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================