[Biopython-dev] BioSQL bugs

Brad Chapman chapmanb at uga.edu
Wed Mar 24 16:35:53 EST 2004


Hi Marc;

> First,  I've added support for pgdb to DBUtils and did some testing the 
> diff is at the end. 

Thanks. I've just checked your patch in. The only problem I have is
with the autocommit functionality. I dug around on mailing lists and
the like, and I do see that PyGreSQL doesn't support anything like
this. However, do you have any ideas for making the Tests work
without this type of functionality?

The problem (as far as I can see it right now) is that if a
connection is opened then you can't do DROPs (or CREATEs?). However,
if you don't have an open connection, then you can't execute SQL, so
you can't do the DROPs either. So I guess maybe it's a catch-22 that
really only affects the tests (where we need to do this annoying
dropping and creating automatically), but do you (or anyone) have
any clever ideas to work around this so that the Tests will work?
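
One possible angle, sketched below under assumptions that are not
confirmed anywhere in this thread: (1) the failing statements are
DROP DATABASE / CREATE DATABASE, (2) the driver silently opens a
transaction before the first statement, and (3) sending a literal
COMMIT through the cursor ends that transaction so the next statement
runs outside of it. The names recreate_database and connect are
hypothetical. connect stands for any zero-argument callable that
returns a DB-API connection to some other, always-present database,
so the test database is never the one we are connected to.

```python
def recreate_database(connect, dbname):
    """Drop and re-create `dbname` via a connection to a *different* database.

    `connect` is a hypothetical zero-argument callable returning a DB-API
    connection to a maintenance database. `dbname` is trusted test
    configuration; it is pasted into the SQL text without any quoting.
    """
    conn = connect()
    cursor = conn.cursor()
    try:
        # Assumption: the driver has implicitly opened a transaction.
        # End it by hand so the DDL below is not inside a transaction block.
        cursor.execute("COMMIT")
        try:
            cursor.execute("DROP DATABASE " + dbname)
        except Exception:
            # Assumed to mean "database does not exist yet" (first test run).
            pass
        cursor.execute("CREATE DATABASE " + dbname)
    finally:
        cursor.close()
        conn.close()
```

The test setup would call this once before opening its real
connection to the freshly created database. Whether a driver tolerates
a hand-written COMMIT like this would need checking against the driver
itself.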

> Second the fix for taxon doesn't work. The problem 
> is that it tries to enter NULLs for fields that are required to be 
> unique.
> 
> BioSQL.Loader, line 188:
>         parent_taxon_id = None
>         for taxon in lineage:
>             self.adaptor.execute(
>                 "INSERT INTO taxon(parent_taxon_id, ncbi_taxon_id, node_rank,"
>                 " left_value, right_value)"
>                 " VALUES (%s, %s, %s, %s, %s)",
>                 (parent_taxon_id, taxon[0], taxon[1],
>                  left_value, right_value))
> 
> This might work the first time, but since parent_taxon and other need 
> to be unique this fails. I don't know a simple solution for this, 
> except to give up and not put in a taxon_id (which isn't required for a 
> bioentry).

Okay, I was playing around with this and fixed it for a problem I
was having (with non-unique right_values) in an ugly way which I'm
sure is not right.

My real problem is I don't understand the table:

CREATE TABLE taxon (
       taxon_id		INT(10) UNSIGNED NOT NULL auto_increment,
       ncbi_taxon_id 	INT(10),
       parent_taxon_id	INT(10) UNSIGNED,
       node_rank	VARCHAR(32),
       genetic_code	TINYINT UNSIGNED,
       mito_genetic_code TINYINT UNSIGNED,
       left_value	INT(10) UNSIGNED,
       right_value	INT(10) UNSIGNED,
       PRIMARY KEY (taxon_id),
       UNIQUE (ncbi_taxon_id),
       UNIQUE (left_value),
       UNIQUE (right_value)
) TYPE=INNODB;

Okay, so the problem is that I have no idea what parent_taxon_id,
left_value and right_value are. I assume that they are supposed to
represent some kind of hierarchy of taxonomy. As near as I can
figure, if you have a tree like:

A -> B -> C -> D
          |
           --> E

Then this table would be filled for C with parent_taxon_id to be B's
taxon_id, left_value to be D's taxon_id and right_value to be E's
taxon_id.

Is this right at all or am I completely confused? I can take a stab
at this, but without really getting the table I've been stumped so
far and just stare at it scratching my head.

Thanks for the work on BioSQL. Sorry if I am a bit (a lot) confused 
about things at the moment.
Brad


