[BioSQL-l] Loading sequences with novel NCBI taxon_id
Hilmar Lapp
hlapp at gmx.net
Sat Apr 12 03:23:21 UTC 2008
On Apr 11, 2008, at 9:32 PM, Martin MOKREJŠ wrote:
> I decided to re-import the same data to mimic at least somehow
> the future updates, although no record should be UPDATEd,
> except zapping left and right values with NULL. :((
Not sure what made you frown here?
> [...]
>
> I believe the recalculation step could be made faster. See what
> happens:
> [...]
> The columns left_value and right_value have NULL value upon
> the table is created, so no need to write again NULL into
> them.
But that's only true the first time you load. For almost all real
databases, all except the first run of the script won't be able to
take advantage of that.
> This would mean writing a wrapper function which would
> mimic update() but before doing that it would do 'SELECT * FROM',
> compare the values with those to be written and include in the
> final UPDATE statement only those columns for which values have
> been changed. We use such a smart wrapper for our code in python.
> ;-)
What you see is the "optimization" for MySQL. For all other RDBMSs it
does both left and right in one update.
BTW note that SELECT does not have zero cost, it requires both an
index and a table read, only to find on average 50% of the time that
you will need to update anyway. So what you gain 50% of the time you
lose the other 50% of the time.
>
> When the columns for left and right are to be made NULL during
> update of an existing database, I think it would be much faster
> to drop the columns and re-create them again with NULL values.
In terms of speed, that may be how MySQL works indeed. In PostgreSQL
it would even be transactional (but very slow with concurrent
queries), but with most databases you are now outside of a
transaction (because it is DDL), which not only leaves the data in an
inconsistent state, but also will immediately break any application
you run against it because the table structure changed under its feet.
> [...] I think it could be investigated more the possibility to create
> empty taxon and taxon_name tables as MyISAM tables and only after
> all the import and updates they could be converted into InnoDB
> tables.
I'm sure there are lots of hacks and tricks that would make this
faster for one particular RBDMS, and you are welcome to explore
those. But the script is written to deal with several RDBMSs, and it
does so as transactionally safe as possible. The assumption is that
you are running this against a live database that is being queried
concurrently.
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================
More information about the BioSQL-l
mailing list