[BioPython] Taxon/organism/source in Biopython

Eric Gibert ericgibert at yahoo.fr
Sun Nov 11 02:11:54 UTC 2007


I found one answer to my question: in BioSQL/Loader.py (Biopython version
1.44), inside the function _load_bioentry_table, there is a *commented-out*
call:

 

#     taxon_id = self._get_taxon_id(record)           

 

I uncommented it and also modified the "INSERT INTO bioentry" statement
slightly below to include the taxon_id column and value.

 

Thereafter, the call to self._get_taxon_id ensures that the taxon is handled
and inserted in the database. The Sequence.annotations now contains:

- 'taxonomy': ['the Genus', 'theSpecies']

- 'ncbi_taxid' : 123456L

- 'organism' : 'thespecies'

 

Which is exactly what I was looking for :-)

 

Caution: inside the _get_taxon_id function, in the section starting with
the comment "# XXX -- Brad:......", inserts are performed without checking
whether the taxon already exists. The lowest-level taxon clearly cannot be in
the database yet (otherwise the function would already have returned), but
inserting the higher levels without an existence check is not correct: I
imported two sequences of the same genus but different species, and the
genus was created twice.

 

This is also because the genus has no ncbi_taxon_id to match on...

 

Thus I propose to check whether the taxon is already in the table before
inserting it, based on:

      SELECT taxon_id FROM taxon_name
      WHERE name=%s AND name_class='scientific name'
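
The check-before-insert idea can be sketched roughly as follows. This is only an illustration, not the actual BioSQL/Loader.py code: the helper name get_or_create_taxon is hypothetical, the two tables are a cut-down version of the BioSQL taxon and taxon_name tables, and an in-memory SQLite database is assumed so the example runs on its own (hence the "?" placeholders instead of "%s").

```python
import sqlite3


def get_or_create_taxon(conn, name, rank=None, parent_taxon_id=None):
    """Return the taxon_id for a scientific name, inserting it only if absent.

    Hypothetical helper for illustration; not part of Biopython.
    """
    cur = conn.cursor()
    # Existence check based on the proposed SELECT.
    cur.execute(
        "SELECT taxon_id FROM taxon_name "
        "WHERE name = ? AND name_class = 'scientific name'",
        (name,),
    )
    row = cur.fetchone()
    if row is not None:
        return row[0]  # already present: reuse it, no second INSERT

    # Not found: create the taxon row, then its scientific name.
    cur.execute(
        "INSERT INTO taxon (parent_taxon_id, node_rank) VALUES (?, ?)",
        (parent_taxon_id, rank),
    )
    taxon_id = cur.lastrowid
    cur.execute(
        "INSERT INTO taxon_name (taxon_id, name, name_class) "
        "VALUES (?, ?, 'scientific name')",
        (taxon_id, name),
    )
    return taxon_id


# Assumed, simplified schema (the real BioSQL tables have more columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (
    taxon_id        INTEGER PRIMARY KEY,
    ncbi_taxon_id   INTEGER,
    parent_taxon_id INTEGER,
    node_rank       TEXT
);
CREATE TABLE taxon_name (
    taxon_id   INTEGER NOT NULL,
    name       TEXT NOT NULL,
    name_class TEXT NOT NULL
);
""")

# Two species of the same genus: the genus is looked up twice but stored once.
genus_first = get_or_create_taxon(conn, "Homo", rank="genus")
genus_second = get_or_create_taxon(conn, "Homo", rank="genus")
```

One caveat with matching on the name alone: the same scientific name can exist in more than one lineage, so a real implementation would probably also want to compare the parent taxon or the rank.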

 

What do you think?

 

PS1: Hilmar, I was typing this mail when I received your mail commenting on
the taxon manipulation. As you can see, this provides some answers. Thank
you for taking the time to detail the tables content: as you guessed, I only
access NCBI :-)

 

PS2: Hilmar mentions that BioPerl has a script 'load_ncbi_taxonomy.pl'.
Does Biopython have one too (I could not find one)? If there is none, shall
we/I try to provide one?

 

 

-----Original Message-----
From: biopython-bounces at lists.open-bio.org
[mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Eric Gibert
Sent: Saturday, November 10, 2007 7:17 PM
To: biopython at lists.open-bio.org
Subject: Re: [BioPython] error on insert new sequences from GenBank:
no annotations saved in BioSQL database

 

Dear Peter,

 

 

 

My problem is not that the tables have no entries, but that no
interpretation of the features is performed.

 

 

 

Example:

 

In the tutorial and in BioJava, 'source' is an annotation:

 

 

 

      # from the Biopython Tutorial and Cookbook

 

      print "from: %s" % seq_record.annotations['source']

 

 

 

This returns a "KeyError: 'source'"

 

 

 

On the other hand, after some tinkering, I found that I can get a feature
with type='source' from the list seq_record.features, and that it contains a
"qualifiers['organism']"... Quite cumbersome.
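
A small fallback helper along these lines might make the lookup less cumbersome. It is only a sketch: describe_source is a hypothetical name, and the Record and Feature stand-ins below merely mimic the attributes used here (annotations, features, type, qualifiers) so that the example runs without a database or Biopython installed. It also assumes that qualifier values are usually lists of strings.

```python
from collections import namedtuple

# Stand-ins that mimic only the attributes used below; with Biopython these
# would be a SeqRecord and its SeqFeature objects.
Feature = namedtuple("Feature", ["type", "qualifiers"])
Record = namedtuple("Record", ["annotations", "features"])


def describe_source(record):
    """Return the source/organism of a record, or None if nothing is found.

    Hypothetical helper: tries the annotations first, then the 'source' feature.
    """
    # 1. Preferred: a record-level annotation, if the loader stored one.
    for key in ("source", "organism"):
        value = record.annotations.get(key)  # .get() avoids the KeyError
        if value:
            return value

    # 2. Fallback: the 'organism' qualifier of the first 'source' feature.
    for feature in record.features:
        if feature.type == "source":
            organisms = feature.qualifiers.get("organism")
            if organisms:
                # Assumed to be a list of strings; accept a bare string too.
                return organisms[0] if isinstance(organisms, list) else organisms
    return None


# A record with no annotations, as retrieved in the situation described above.
record = Record(
    annotations={},
    features=[
        Feature("gene", {}),
        Feature("source", {"organism": ["Homo sapiens"]}),
    ],
)
print("from: %s" % describe_source(record))
```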

 

 

 

But maybe there is another, more straightforward way that I did not find.

Can you tell me?

 



