[Biopython-dev] [Bug 2833] Features insertion on previous bioentry_id
bugzilla-daemon at portal.open-bio.org
Tue Jun 2 17:00:56 UTC 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2833
------- Comment #19 from biopython-bugzilla at maubp.freeserve.co.uk 2009-06-02 13:00 EST -------
(In reply to comment #18)
> (In reply to comment #17)
> > How do you feel about this simplistic solution?: if the rules are present,
> > before loading a new record, do a query to check to make sure there isn't a
> > duplicate already present, and if there is raise an IntegrityError.
>
> Now that's a much better solution than the way I've been trying to go...
>
> This does the trick:
> ...
> +        if self.postgres_rules_present:
> +            self.adaptor.execute("SELECT bioentry_id FROM bioentry "
> +                                 "WHERE identifier = '%s'" % cur_record.id)
> +            if self.adaptor.cursor.fetchone():
> +                raise self.adaptor.conn.IntegrityError("Duplicate record "
> +                        "detected: record has not been inserted")
While the above code looks sensible, I don't think it covers all the cases yet.
Essentially the two bioentry rules relate to these two uniqueness rules in the
default schema:
UNIQUE ( identifier , biodatabase_id )
UNIQUE ( accession , biodatabase_id , version )
According to rule_bioentry_i1 (or the equivalent rule) we should allow the same
bioentry.identifier to appear in different namespaces (i.e. as long as
bioentry.biodatabase_id differs), so you need something like this in your code:
"SELECT bioentry_id FROM bioentry WHERE identifier = '%s' AND biodatabase_id =
%s" % (cur_record.id, self.dbid)
Then for rule_bioentry_i2 we also need to check that the combination of
accession, version and biodatabase_id has not been used before.
Both checks could probably be done as a single more complex SQL query.
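As a rough sketch of such a combined check (not the actual Biopython patch):
the function name find_duplicate_bioentry is hypothetical, and sqlite3 with a
cut-down bioentry table stands in for a real BioSQL database so the example
runs on its own. The values are passed to the driver as parameters rather than
formatted into the string, which avoids quoting problems; sqlite3 uses ?
placeholders, whereas psycopg2 and MySQLdb use %s.

```python
import sqlite3


def find_duplicate_bioentry(cursor, dbid, identifier, accession, version):
    """Return the bioentry_id of a clashing row, or None if there is none.

    Hypothetical helper: a row clashes if, within the same biodatabase_id,
    it has the same identifier OR the same (accession, version) pair.
    """
    cursor.execute(
        "SELECT bioentry_id FROM bioentry "
        "WHERE biodatabase_id = ? "
        "AND (identifier = ? OR (accession = ? AND version = ?))",
        (dbid, identifier, accession, version),
    )
    row = cursor.fetchone()
    return row[0] if row else None


# Stand-in table with only the columns the two uniqueness rules mention.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE bioentry ("
    "bioentry_id INTEGER PRIMARY KEY, biodatabase_id INTEGER, "
    "identifier TEXT, accession TEXT, version INTEGER)"
)
cur.execute("INSERT INTO bioentry VALUES (1, 1, '12345', 'AB000001', 1)")

# Same identifier in the same namespace: reported as a duplicate.
clash = find_duplicate_bioentry(cur, 1, "12345", "ZZ999999", 1)
# Same identifier and accession in a different namespace: allowed.
no_clash = find_duplicate_bioentry(cur, 2, "12345", "AB000001", 1)
```

The loader could then raise IntegrityError whenever the helper returns
something other than None.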
Also, when we check for the rules, do you think we should check for
rule_bioentry_i2 as well as rule_bioentry_i1? In principle they will either
both be there, or neither. What about the other rules - might they also cause
problems in Biopython?
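On checking for the rules, one possible approach (an assumption, not tested
against a live BioSQL server) is to ask PostgreSQL's pg_rules system view which
rules exist on the bioentry table. The helper name find_bioentry_rules is
hypothetical, and treating either rule as enough to enable the duplicate check
is a design guess:

```python
# The two rule names discussed in this bug.
BIOENTRY_RULES = frozenset(["rule_bioentry_i1", "rule_bioentry_i2"])


def find_bioentry_rules(cursor):
    """Return the subset of BIOENTRY_RULES defined on the bioentry table.

    Hypothetical helper; expects a DB-API cursor connected to PostgreSQL,
    where pg_rules lists one row per rewrite rule.
    """
    cursor.execute("SELECT rulename FROM pg_rules WHERE tablename = 'bioentry'")
    found = set(row[0] for row in cursor.fetchall())
    return BIOENTRY_RULES & found
```

A loader could set its flag with bool(find_bioentry_rules(cursor)), so the
duplicate check is enabled if either rule is present, and could warn if only
one of the two is found.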
Finally, on a code style thing, I'd make postgres_rules_present private, i.e.
call it _postgres_rules_present instead. Anyway, in principle it looks like
this approach should work :)
Peter