[Biopython-dev] [Bug 2833] Features insertion on previous bioentry_id

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Jun 2 17:00:56 UTC 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2833





------- Comment #19 from biopython-bugzilla at maubp.freeserve.co.uk  2009-06-02 13:00 EST -------
(In reply to comment #18)
> (In reply to comment #17)
> > How do you feel about this simplistic solution?: if the rules are present,
> > before loading a new record, do a query to check to make sure there isn't a
> > duplicate already present, and if there is raise an IntegrityError.
> 
> Now that's a much better solution than the way I've been trying to go...
> 
> This does the trick:
> ...
> +            if self.postgres_rules_present:
> +                self.adaptor.execute("SELECT bioentry_id FROM bioentry "
> +                                     "WHERE identifier = '%s'" % cur_record.id)
> +                if self.adaptor.cursor.fetchone():
> +                    raise self.adaptor.conn.IntegrityError("Duplicate record " 
> +                        "detected: record has not been inserted")

While the above code looks sensible, I don't think it covers all the cases yet.
Essentially the two bioentry rules relate to these two uniqueness rules in the
default schema:

UNIQUE ( identifier , biodatabase_id ) 
UNIQUE ( accession , biodatabase_id , version )

According to rule_bioentry_i1 (or the equivalent rule) we should allow the same
bioentry.identifier to appear in different namespaces (i.e. as long as
bioentry.biodatabase_id differs), so the check needs something like this in
your code:

"SELECT bioentry_id FROM bioentry "
"WHERE identifier = '%s' AND biodatabase_id = %s" % (cur_record.id, self.dbid)

Then for rule_bioentry_i2 we also need to check that the combination of
accession, version and biodatabase_id has not been used before.

Both checks could probably be done as a single more complex SQL query.
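
As a rough sketch of what such a combined query might look like (this is not
the code in the patch): find_duplicate and make_demo_db are hypothetical names,
the table is a cut-down stand-in for the real bioentry table, and sqlite3 is
used only so the example runs on its own. With psycopg2 or MySQLdb the
placeholders would be %s rather than ?. Passing the values as query parameters
also avoids the quoting problems of building the SQL with % string formatting.

```python
import sqlite3

# Hypothetical helper, not part of Biopython. One query covers both
# uniqueness rules within a single namespace (biodatabase_id):
#   UNIQUE (identifier, biodatabase_id)
#   UNIQUE (accession, biodatabase_id, version)
DUPLICATE_SQL = (
    "SELECT bioentry_id FROM bioentry "
    "WHERE biodatabase_id = ? "
    "AND (identifier = ? OR (accession = ? AND version = ?))"
)


def find_duplicate(cursor, dbid, identifier, accession, version):
    """Return the bioentry_id of a clashing row, or None if there is none."""
    cursor.execute(DUPLICATE_SQL, (dbid, identifier, accession, version))
    row = cursor.fetchone()
    return row[0] if row else None


def make_demo_db():
    """Build an in-memory table with one record (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bioentry ("
        "bioentry_id INTEGER PRIMARY KEY, "
        "biodatabase_id INTEGER NOT NULL, "
        "identifier TEXT, "
        "accession TEXT NOT NULL, "
        "version INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT INTO bioentry "
        "(bioentry_id, biodatabase_id, identifier, accession, version) "
        "VALUES (1, 1, '12345', 'AB000001', 1)"
    )
    return conn
```

In the loader, a non-None result would be the point at which to raise the
IntegrityError instead of attempting the insert.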

Also, when we check for the rules, do you think we should check for
rule_bioentry_i2 as well as rule_bioentry_i1? In principle they will either
both be there, or neither. What about the other rules - might they also cause
problems in Biopython?
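
One way to look for both rules at once, as a sketch only: PostgreSQL lists
rules in the pg_rules system view, so a single query can report which of the
two are defined. The function name bioentry_rules_found is hypothetical, the
cursor is assumed to be a DB-API cursor on a PostgreSQL connection, and how
the result should be used (require both, or act if either is present) is left
open.

```python
# Hypothetical helper, not part of Biopython.
EXPECTED_RULES = frozenset(["rule_bioentry_i1", "rule_bioentry_i2"])


def bioentry_rules_found(cursor):
    """Return the subset of EXPECTED_RULES defined on the bioentry table."""
    # pg_rules is a PostgreSQL system view; this query is PostgreSQL specific.
    cursor.execute("SELECT rulename FROM pg_rules WHERE tablename = 'bioentry'")
    found = set(row[0] for row in cursor.fetchall())
    return found & EXPECTED_RULES
```

An empty set would mean neither rule is present; a set with only one name
would flag the "in principle impossible" half-configured case.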

Finally, on a code style thing, I'd make postgres_rules_present private, i.e.
call it _postgres_rules_present instead. Anyway, in principle it looks like
this approach should work :)

Peter

