[Biopython-dev] [BioSQL-l] Loading sequences with novel NCBI taxon id

Michiel de Hoon mjldehoon at yahoo.com
Mon Mar 17 02:43:55 UTC 2008


> Thank you for your mail recommending the usage of NCBI.WWW.
> I have modified my class/script according to your suggestion
> without problem. Once 1.45 is out, I will switch to NCBI.Entrez
> as you advised.

Just to avoid any confusion: In Biopython 1.45, the module will be "Bio.Entrez", not "Bio.NCBI.Entrez".

> In any case, I do not claim to have a fantastic piece of code, but it gets
> the job done. If you find this interesting, I would be pleased to contribute
> to BioPython.

Bio.Entrez will need some parsers for the XML results, although that probably won't happen before the 1.45 release. I think your script could be very useful when writing those parsers. Could you open a bug report on Bugzilla and upload your script there? Note that to upload a script to Bugzilla, you need to create the bug report first and then upload the script as a separate step.

Thanks!

--Michiel.



Eric Gibert <ericgibert at yahoo.fr> wrote:

Dear Peter,

Regarding the update of the BioSQL tables taxon and taxon_name, I have
created a class "TaxonUpdate" (how original!) which does two things:

1) as a class itself, it will fetch the taxon's information from NCBI as XML
based on the taxon_id passed to the constructor, parse the returned XML
to get the genus, class, order, family (10 levels) and update the taxon
table accordingly. If taxon_name needs an update/insert, it does that too.

2) run as an independent script (__main__), it will look for all species in
the taxon table for which the genus (parent) does not have an ncbi_taxon_id
(i.e. it is NULL, as this is the current result after adding a new sequence
to BioSQL). For each incomplete record found, it will perform the update
as in (1).
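
The lookup described in (2) can be sketched roughly as below. This is only
an illustration, not the TaxonUpdate script itself: it uses an in-memory
SQLite database with a simplified stand-in for the BioSQL taxon table, the
column subset is an assumption modeled on that table, and the rows
(including the NCBI id 999999) are made up.

```python
import sqlite3

# Simplified stand-in for the BioSQL taxon table (assumed column subset).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (
    taxon_id        INTEGER PRIMARY KEY,
    ncbi_taxon_id   INTEGER,
    parent_taxon_id INTEGER,
    node_rank       TEXT
);
-- A complete genus/species pair.
INSERT INTO taxon VALUES (1, 9605, NULL, 'genus');
INSERT INTO taxon VALUES (2, 9606, 1, 'species');
-- A pair as left behind by a fresh sequence load: the genus has no NCBI id.
-- 999999 is a made-up id used only for this illustration.
INSERT INTO taxon VALUES (3, NULL, NULL, 'genus');
INSERT INTO taxon VALUES (4, 999999, 3, 'species');
""")


def find_incomplete_species(connection):
    """Return (taxon_id, ncbi_taxon_id) for species whose parent has no NCBI id."""
    query = """
        SELECT child.taxon_id, child.ncbi_taxon_id
        FROM taxon AS child
        JOIN taxon AS parent ON parent.taxon_id = child.parent_taxon_id
        WHERE child.node_rank = 'species'
          AND parent.ncbi_taxon_id IS NULL
        ORDER BY child.taxon_id
    """
    return connection.execute(query).fetchall()


incomplete = find_incomplete_species(conn)
print(incomplete)
```

Each row returned would then be handed to the update step from (1).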

After the addition of a new sequence to a BioSQL database, a simple call to
this code (passing the taxon_id) will do the updating job.
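
As a rough sketch of the parsing part of such a call: the snippet below
extracts the lineage (id, name, rank) from a taxonomy record using only the
standard library. The sample XML is a hand-written, heavily abbreviated
record modeled on the layout NCBI returns for taxonomy lookups, and the
function name parse_lineage is hypothetical. Fetching the XML from NCBI and
writing to the taxon tables are left out.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hand-written sample modeled on an NCBI taxonomy XML record.
SAMPLE_XML = """<TaxaSet>
  <Taxon>
    <TaxId>9606</TaxId>
    <ScientificName>Homo sapiens</ScientificName>
    <Rank>species</Rank>
    <LineageEx>
      <Taxon><TaxId>40674</TaxId><ScientificName>Mammalia</ScientificName><Rank>class</Rank></Taxon>
      <Taxon><TaxId>9443</TaxId><ScientificName>Primates</ScientificName><Rank>order</Rank></Taxon>
      <Taxon><TaxId>9604</TaxId><ScientificName>Hominidae</ScientificName><Rank>family</Rank></Taxon>
      <Taxon><TaxId>9605</TaxId><ScientificName>Homo</ScientificName><Rank>genus</Rank></Taxon>
    </LineageEx>
  </Taxon>
</TaxaSet>"""


def _as_tuple(node):
    """Turn one <Taxon> element into (ncbi_taxon_id, name, rank)."""
    return (
        int(node.findtext("TaxId")),
        node.findtext("ScientificName"),
        node.findtext("Rank"),
    )


def parse_lineage(xml_text):
    """Return the lineage from the highest listed rank down to the taxon itself."""
    root = ET.fromstring(xml_text)
    taxon = root.find("Taxon")
    lineage = [_as_tuple(node) for node in taxon.findall("LineageEx/Taxon")]
    lineage.append(_as_tuple(taxon))
    return lineage


lineage = parse_lineage(SAMPLE_XML)
for ncbi_id, name, rank in lineage:
    print(ncbi_id, name, rank)
```

The last tuple is the species itself and the one before it is its genus,
which is the parent record that needs its ncbi_taxon_id filled in.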


Dear Michiel,

Thank you for your mail recommending the usage of NCBI.WWW. I have modified
my class/script according to your suggestion without problem. Once 1.45 is
out, I will switch to NCBI.Entrez as you advised.


In any case, I do not claim to have a fantastic piece of code, but it gets
the job done. If you find this interesting, I would be pleased to contribute
to BioPython.


Eric


-----Original Message-----
From: biosql-l-bounces at lists.open-bio.org
[mailto:biosql-l-bounces at lists.open-bio.org] On Behalf Of Peter
Sent: Thursday, March 13, 2008 11:06 PM
To: BioSQL
Subject: [BioSQL-l] Loading sequences with novel NCBI taxon id

Dear list,

One of the unresolved issues with Biopython's BioSQL interface is
dealing with the NCBI taxon ID when loading sequences into the
database.

As I understand it, ideally before loading any sequences, the user
will have loaded in the entire NCBI taxonomy using the
load_ncbi_taxonomy.pl script, as I described here:
http://biopython.org/wiki/BioSQL#NCBI_Taxonomy

When a new sequence is added to the database with a known taxon id,
there is no problem.  But what happens if it's a recently sequenced organism
which isn't defined yet in the BioSQL taxonomy tables?  Could/should
the user re-run load_ncbi_taxonomy.pl, and then load in their new
sequence?

Right now in Biopython, due to what appears to have been intended as a
short term hack, we simply don't record the taxon id at all (!), and I
would like to fix this (bug 2422).
http://bugzilla.open-bio.org/show_bug.cgi?id=2422

How do BioPerl et al. deal with this issue?  Do they try to update the
taxonomy tables using the available information in the new record's
annotation (i.e. the new taxon id and the species name)?  Do they
look up the NCBI taxonomy definition via the internet?  Do they throw
an error and halt?

Thanks,

Peter
(Biopython)




       