[Biopython-dev] BioSQL bugs

Fri Mar 26 10:48:07 EST 2004

>
>
>Hi Marc;
>
>> First, I've added support for pgdb to DBUtils and did some testing; the
>> diff is at the end.
>
>Thanks. I've just checked your patch in. The only problem I have is
>with the autocommit functionality. I dug around on mailing lists and
>the like and do see that PyGreSQL doesn't support anything like this
>-- however, do you have any ideas to make the Tests work without
>this type of functionality?
>
>The problem (as far as I can see it right now) is that if a
>connection is opened then you can't do DROPs (or CREATE?). However
>if you don't have an open connection, then you can't execute SQL so
>you can't do the DROPs either. So I guess maybe it's a catch-22 that
>really only affects the tests (where we need to do this annoying
>dropping and creating automatically), but do you (or anyone) have
>any clever ideas to work around this so that the Tests will work?
>  
>

I think bioperl makes use of functions (see the biosqldb-pg.sql). I was 
thinking about adding some of these function calls to the DBUtils 
section to speed up the transactions. Removing some of the constraints 
will increase the speed as the database grows. This code works fine for 
small sets, but it quickly slows down (probably because of the checks).

>> Second, the fix for taxon doesn't work. The problem
>> is that it tries to enter NULLs for fields that are required to be
>> unique.
>>
>> BioSQL.Loader
>> line 188 parent_taxon_id = None
>>         for taxon in lineage:
>>             self.adaptor.execute(
>>                 "INSERT INTO taxon(parent_taxon_id, ncbi_taxon_id, node_rank,"\
>>                 " left_value, right_value)" \
>>                 " VALUES (%s, %s, %s, %s, %s)", (parent_taxon_id,
>>                                                  taxon[0],
>>                                                  taxon[1],
>>                                                  left_value,
>>                                                  right_value))
>>
>> This might work the first time, but since parent_taxon and others need
>> to be unique, this fails. I don't know a simple solution for this,
>> except to give up and not put in a taxon_id (which isn't required for a
>> bioentry).
>
>Okay, I was playing around with this and fixed it for a problem I
>was having (with non-unique right_values) in an ugly way which I'm
>sure is not right.
>
>My real problem is I don't understand the table:
>
>CREATE TABLE taxon (
>       taxon_id		INT(10) UNSIGNED NOT NULL auto_increment,
>       ncbi_taxon_id 	INT(10),
>       parent_taxon_id	INT(10) UNSIGNED,
>       node_rank	VARCHAR(32),
>       genetic_code	TINYINT UNSIGNED,
>       mito_genetic_code TINYINT UNSIGNED,
>       left_value	INT(10) UNSIGNED,
>       right_value	INT(10) UNSIGNED,
>       PRIMARY KEY (taxon_id),
>       UNIQUE (ncbi_taxon_id),
>       UNIQUE (left_value),
>       UNIQUE (right_value)
>) TYPE=INNODB;
>
>Okay, so the problem is that I have no idea what parent_taxon_id,
>left_value and right_value are. I assume that they are supposed to
>represent some kind of hierarchy of taxonomy. As near as I can
>figure if you have a tree like:
>  
>

These values are needed for nested-set representation 
<http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html?page=1>. 
They are used to quickly limit a branch of a tree. Selecting on the 
values >= the left and <= the right gives you all the elements under 
that part of the tree. I don't think it would be easy to add a new 
element to the tree without rebuilding the whole representation. 
Therefore, I just skip it and put in a null (and print out that it 
wasn't known). This needs to be fixed in the source of the data.
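
To make the nested-set idea concrete, here is a rough sketch in plain
Python. It is not the BioSQL loader code: the tree (A -> B -> C, with C
having two children D and E) and the names number_nested_set and subtree
are made up for illustration. Each node gets a left value when a
depth-first walk enters it and a right value when the walk leaves it, so
a whole branch falls inside its root's (left, right) interval.

```python
# Hypothetical tree for illustration: A -> B -> C, and C has children D and E.
children = {"A": ["B"], "B": ["C"], "C": ["D", "E"], "D": [], "E": []}


def number_nested_set(children, root):
    """Assign (left, right) values to every node by a depth-first walk."""
    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1            # entering the node: this is its left value
        left = counter
        for child in children[node]:
            visit(child)
        counter += 1            # leaving the node: this is its right value
        bounds[node] = (left, counter)

    visit(root)
    return bounds


def subtree(bounds, node):
    """Return the node and everything below it, using only the intervals."""
    left, right = bounds[node]
    return sorted(
        name for name, (l, r) in bounds.items() if l >= left and r <= right
    )


bounds = number_nested_set(children, "A")
print(bounds["C"])           # (left, right) interval for C
print(subtree(bounds, "C"))  # C together with its descendants
```

Note that the left and right values are positions in the walk, not
taxon_ids, and that adding a node shifts the values of every node
numbered after it, which is why a single new taxon cannot just be
appended.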

>A -> B -> C -> D
>          |
>           --> E
>
>Then this table would be filled for C with parent_taxon_id to be B's
>taxon_id, left_value to be D's taxon_id and right_value to be E's
>taxon_id.
>
>Is this right at all or am I completely confused? I can take a stab
>at this, but without really getting the table I've been stumped so
>far and just stare at it scratching my head.
>
>Thanks for the work on BioSQL. Sorry if I am a bit (a lot) confused 
>about things at the moment.
>Brad
>