[BioSQL-l] Loading sequences with novel NCBI taxon_id

Sat Apr 12 03:23:21 UTC 2008

On Apr 11, 2008, at 9:32 PM, Martin MOKREJŠ wrote:
> I decided to re-import the same data to mimic at least somehow
> the future updates, although no record should be UPDATEd,
> except zapping left and right values with NULL. :((

Not sure what made you frown here?

> [...]
>
> I believe the recalculation step could be made faster. See what
> happens:
> [...]
> The columns left_value and right_value have NULL value upon
> the table is created, so no need to write again NULL into
> them.

But that's only true the first time you load. For almost all real  
databases, all except the first run of the script won't be able to  
take advantage of that.

> This would mean writing a wrapper function which would
> mimic update() but before doing that it would do 'SELECT * FROM',
> compare the values with those to be written and include in the
> final UPDATE statement only those columns for which values have
> been changed. We use such a smart wrapper for our code in python.
> ;-)

What you see is the "optimization" for MySQL. For all other RDBMSs it  
does both left and right in one update.

BTW note that SELECT does not have zero cost, it requires both an  
index and a table read, only to find on average 50% of the time that  
you will need to update anyway. So what you gain 50% of the time you  
lose the other 50% of the time.

>
> When the columns for left and right are to be made NULL during
> update of an existing database, I think it would be much faster
> to drop the columns and re-create them again with NULL values.

In terms of speed, that may be how MySQL works indeed. In PostgreSQL  
it would even be transactional (but very slow with concurrent  
queries), but with most databases you are now outside of a  
transaction (because it is DDL), which not only leaves the data in an  
inconsistent state, but also will immediately break any application  
you run against it because the table structure changed under its feet.

> [...] I think it could be investigated more the possibility to create
> empty taxon and taxon_name tables as MyISAM tables and only after
> all the import and updates they could be converted into InnoDB
> tables.

I'm sure there are lots of hacks and tricks that would make this  
faster for one particular RBDMS, and you are welcome to explore  
those. But the script is written to deal with several RDBMSs, and it  
does so as transactionally safe as possible. The assumption is that  
you are running this against a live database that is being queried  
concurrently.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================