[Biopython-dev] [Bug 2833] Features insertion on previous bioentry_id

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Jun 2 17:25:54 UTC 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2833





------- Comment #20 from andrea at biodec.com  2009-06-02 13:25 EST -------
(In reply to comment #19)
> (In reply to comment #18)
> > (In reply to comment #17)
> > > How do you feel about this simplistic solution?: if the rules are present,
> > > before loading a new record, do a query to check to make sure there isn't a
> > > duplicate already present, and if there is raise an IntegrityError.
> > 
> > Now that's a much better solution than the way I've been trying to go...
> > 
> > This does the trick:
> > ...
> > +            if self.postgres_rules_present:
> > +                self.adaptor.execute("SELECT bioentry_id FROM bioentry "
> > +                                     "WHERE identifier = '%s'" % cur_record.id)
> > +                if self.adaptor.cursor.fetchone():
> > +                    raise self.adaptor.conn.IntegrityError("Duplicate record " 
> > +                        "detected: record has not been inserted")
> 
> While the above code looks sensible, I don't think it covers all the cases yet.
> Essentially the two bioentry rules relate to these two uniqueness rules in the
> default schema:
> 
> UNIQUE ( identifier , biodatabase_id ) 
> UNIQUE ( accession , biodatabase_id , version )

What I think:
 1) The solution is almost correct.
 2) But we definitely have to consider both rules: I tried it, and they
    work fully independently, so we need to check both.
 3) Uniqueness is scoped to the biodatabase, so I can add two records with
    the same accession, the same identifier, or both, as long as they are
    in different biodatabases, and this works perfectly.
 4) Finally, I would also like to add a warning, because the presence of
    the rules adds overhead to every insertion (they trigger extra
    queries), so it would be useful to inform the user.
> 
> According to rule_bioentry_i1 (or the equivalent rule) we should allow the same
> bioentry.identifier to appear in different namespaces (i.e. as long as
> bioentry.biodatabase_id differs). i.e. something like this in your code:
> 
> "SELECT bioentry_id FROM bioentry WHERE identifier = '%s' AND biodatabase_id =
> %s" % (cur_record.id, self.dbid)
> 
> Then for rule_bioentry_i2 we also need to check the accession, version and
> biodatabase_id have not been used before.

sure
> 
> Both checks could probably be done as a single more complex SQL query.

"SELECT bioentry_id FROM bioentry WHERE (identifier = '%s' AND biodatabase_id =
%s) OR (accession = '%s' AND version = %s AND biodatabase_id = %s)"

If either condition (or both) matches, the query returns a bioentry_id,
which means the new record would run into the problem.
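The combined query above can also be run with DB-API placeholders instead of Python % formatting, which leaves quoting to the driver and avoids the mismatched-quote mistakes that are easy to make by hand. The sketch below is hypothetical: check_not_duplicate is an illustrative name, not a Biopython function, and sqlite3 (placeholder "?") stands in for the PostgreSQL driver (psycopg2 uses "%s"). Inside the Biopython loader, the exception would instead come from the driver, as in the quoted patch (self.adaptor.conn.IntegrityError).

```python
import sqlite3

# Combined check for both uniqueness rules, scoped to one biodatabase.
# "?" is sqlite3's placeholder style; a PostgreSQL driver would use "%s".
DUPLICATE_SQL = (
    "SELECT bioentry_id FROM bioentry "
    "WHERE (identifier = ? AND biodatabase_id = ?) "
    "OR (accession = ? AND version = ? AND biodatabase_id = ?)"
)


def check_not_duplicate(cursor, dbid, identifier, accession, version):
    """Raise IntegrityError if either uniqueness rule would be violated.

    Hypothetical helper: name and signature are for illustration only.
    """
    cursor.execute(DUPLICATE_SQL, (identifier, dbid, accession, version, dbid))
    row = cursor.fetchone()
    if row is not None:
        raise sqlite3.IntegrityError(
            "Duplicate record detected (bioentry_id %s): "
            "record has not been inserted" % row[0]
        )


# Usage against a toy table holding one existing record in biodatabase 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, biodatabase_id INTEGER, "
    "identifier TEXT, accession TEXT, version INTEGER)"
)
conn.execute("INSERT INTO bioentry VALUES (1, 1, 'X55053', 'X55053', 1)")
cur = conn.cursor()
check_not_duplicate(cur, 2, "X55053", "X55053", 1)  # other biodatabase: passes
try:
    check_not_duplicate(cur, 1, "Z99999", "X55053", 1)  # accession/version clash
except sqlite3.IntegrityError as err:
    print(err)  # Duplicate record detected (bioentry_id 1): record has not been inserted
```

A single query means one round trip per record, which matters given the overhead concern raised earlier in this reply.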

> 
> Also, when we check for the rules, do you think we should check for
> rule_bioentry_i2 as well as rule_bioentry_i1? In principle they will either
> both be there, or neither. What about the other rules - might they also cause
> problems in Biopython?

Both. It's exactly the same situation: you have the same problem on
accession, version, biodatabase_id.
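Detecting the rules and issuing the overhead warning suggested earlier in this reply could look roughly like the sketch below. It assumes a PostgreSQL DB-API cursor and queries pg_rules, PostgreSQL's system view of rewrite rules; the rule names are the ones used in this thread. The function name bioentry_rules_present is hypothetical. Its result would be stored in something like self._postgres_rules_present, following Peter's naming suggestion. Both rules are looked up, and finding either one counts as "present".

```python
import warnings

# Rule names as used in this thread; assumed to be defined on bioentry.
BIOENTRY_RULES = ("rule_bioentry_i1", "rule_bioentry_i2")


def bioentry_rules_present(cursor):
    """Return True if either bioentry insert rule is installed.

    Queries PostgreSQL's pg_rules view. The rule names are inlined as
    literals so the query needs no driver-specific placeholders.
    """
    names = ", ".join("'%s'" % name for name in BIOENTRY_RULES)
    cursor.execute(
        "SELECT rulename FROM pg_rules "
        "WHERE tablename = 'bioentry' AND rulename IN (%s)" % names
    )
    found = {row[0] for row in cursor.fetchall()}
    if found:
        # The rules, plus the extra duplicate-check query, slow down every
        # insert, so tell the user once when they are detected.
        warnings.warn(
            "BioSQL rules %s detected on bioentry: loading will be slower "
            "because of the extra duplicate checks" % ", ".join(sorted(found)),
            RuntimeWarning,
        )
    return bool(found)
```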
> 
> Finally, on a code style thing, I'd make postgres_rules_present private, i.e.
> call it _postgres_rules_present instead. Anyway, in principle it looks like
> this approach should work :)

ok

> 
> Peter
> 
thanks
andrea




