[Biopython-dev] [Bug 2840] When a record has been loaded from BioSQL, trying to save it to another database fails with loader db_loader.load_seqrecord in _load_reference

bugzilla-daemon at portal.open-bio.org
Mon May 25 20:14:40 EDT 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2840





------- Comment #3 from cymon.cox at gmail.com  2009-05-25 20:14 EST -------
(In reply to comment #1)
> I have modified the dbtestcase.py script to show the contents of the reference
> of the record downloaded from genbank, and from the record recovered from
> BioSQL.
> 
> Here is a print out of the last two references before saving to BioSQL:
> 
> authors: Sugita,M., Sugiura,C., Arikawa,T. and Higuchi,M.
> title: Molecular evidence of an rpoA gene in the basal moss chloroplast
> genomes: rpoA is a useful molecular marker for phylogenetic analysis of mosses
> journal: Hikobia 14, 171-175 (2004)
> medline id: 
> pubmed id: 
> comment: 
> 
> location: [0:789]
> authors: Sugita,M.
> title: Direct Submission
> journal: Submitted (25-DEC-2002) Mamoru Sugita, Nagoya University, Center for
> Gene Research; Chikusa-ku, Nagoya, Aichi 464-8602, Japan
> (E-mail:sugita at gene.nagoya-u.ac.jp, Tel:81-52-789-3080(ex.3080),
> Fax:81-52-789-3080)
> medline id: 
> pubmed id: 
> comment: 
> 
> --- note: no location in the first one; only a location in the last reference
> (why? - should references have a location?  I suppose they might, if they
> referred to a part of a chromosome?)
> 
> Now, after saving to BioSQL and recovering, all the records have a location,
> but in some cases, it is [None:None]; here are the same two records.
> 
> location: [None:None]
> authors: Sugita,M., Sugiura,C., Arikawa,T. and Higuchi,M.
> title: Molecular evidence of an rpoA gene in the basal moss chloroplast
> genomes: rpoA is a useful molecular marker for phylogenetic analysis of mosses
> journal: Hikobia 14, 171-175 (2004)
> medline id: 
> pubmed id: 
> comment: 
> 
> location: [0:789]
> authors: Sugita,M.
> title: Direct Submission
> journal: Submitted (25-DEC-2002) Mamoru Sugita, Nagoya University, Center for
> Gene Research; Chikusa-ku, Nagoya, Aichi 464-8602, Japan
> (E-mail:sugita at gene.nagoya-u.ac.jp, Tel:81-52-789-3080(ex.3080),
> Fax:81-52-789-3080)
> medline id: 
> pubmed id: 
> comment: 
> 
> 
> After this, the db.load method calls _load_reference.  
> 
> I think the problem is that the last statement doesn't cope with None values.
> If one edits _load_reference to put that last statement inside a test for the
> null condition
> 
>         if (start is not None and end is not None):
>             sql = "INSERT INTO bioentry_reference (bioentry_id, reference_id," \
>                   " start_pos, end_pos, rank)" \
>                   " VALUES (%s, %s, %s, %s, %s)"
>             self.adaptor.execute(sql, (bioentry_id, reference_id,
>                                        start, end, rank + 1))
> 
> Then the problem is solved, but I'm not sure how this fits in the bigger scheme
> of things.
> 
> d
> 

The BioSQL loader stores None for "start" and "end" if a reference doesn't have a
location. When the reference is retrieved, a location is still built from those
values, so it comes back as [None:None] instead of being left empty.
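
To illustrate the storage side of this, here is a minimal sketch using sqlite3
and a cut-down table. The table definition is an assumption made for the
example: it only has the five columns named in the INSERT statement quoted
above, whereas the real BioSQL schema has more constraints, and the real loader
goes through its adaptor with %s placeholders rather than sqlite3's "?".

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down, hypothetical version of the bioentry_reference table.
conn.execute(
    "CREATE TABLE bioentry_reference ("
    " bioentry_id INTEGER, reference_id INTEGER,"
    " start_pos INTEGER, end_pos INTEGER, rank INTEGER)"
)

rows_in = [
    (1, 10, None, None, 1),  # reference without a location
    (1, 11, 1, 789, 2),      # reference covering [0:789], stored 1-based
]
conn.executemany(
    "INSERT INTO bioentry_reference VALUES (?, ?, ?, ?, ?)", rows_in
)

rows_out = conn.execute(
    "SELECT start_pos, end_pos FROM bioentry_reference ORDER BY rank"
).fetchall()
print(rows_out)  # prints [(None, None), (1, 789)]
```

The None values survive the round trip as NULLs, so the row linking the record
to its reference is kept; what needs care is how those values are turned back
into a location on retrieval.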

Try this alteration to BioSeq.py; it should solve your problem:
cymon at gyra:~/git/github-master/BioSQL$ git diff BioSeq.py
diff --git a/BioSQL/BioSeq.py b/BioSQL/BioSeq.py
index cc47cf4..8d1e02a 100644
--- a/BioSQL/BioSeq.py
+++ b/BioSQL/BioSeq.py
@@ -351,8 +351,11 @@ def _retrieve_reference(adaptor, primary_id):
     references = []
     for start, end, location, title, authors, dbname, accession in refs:
         reference = SeqFeature.Reference()
-        if start: start -= 1
-        reference.location = [SeqFeature.FeatureLocation(start, end)]
+        if start:
+            start -= 1
+            reference.location = [SeqFeature.FeatureLocation(start, end)]
+        else:
+            reference.location = []
         #Don't replace the default "" with None.
         if authors : reference.authors = authors
         if title : reference.title = title
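
The logic of that change can be sketched on its own, without Biopython
installed. Location below is a hypothetical stand-in for
SeqFeature.FeatureLocation, and reference_location is a made-up helper name,
not a function in BioSeq.py. Unlike the patch, which tests "if start:", this
sketch checks explicitly for None, and it treats a row with only one of the two
values missing as having no location; both of those are assumptions of the
sketch.

```python
from collections import namedtuple

# Hypothetical stand-in for Bio.SeqFeature.FeatureLocation, so the sketch
# runs without Biopython installed.
Location = namedtuple("Location", ["start", "end"])


def reference_location(start, end):
    """Return the location list for one retrieved reference row.

    start and end are the stored start_pos/end_pos values (start is
    1-based), or None when the reference had no location.
    """
    if start is None or end is None:
        # No stored position: leave the location list empty rather than
        # building a Location(None, None).
        return []
    # Convert the stored 1-based start to a 0-based start.
    return [Location(start - 1, end)]


print(reference_location(None, None))  # prints []
print(reference_location(1, 789))      # prints [Location(start=0, end=789)]
```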


Here's a patch for the unit test to compare locations of injected and retrieved
records:
diff --git a/Tests/test_BioSQL_SeqIO.py b/Tests/test_BioSQL_SeqIO.py
index 2d8caf8..9479e02 100644
--- a/Tests/test_BioSQL_SeqIO.py
+++ b/Tests/test_BioSQL_SeqIO.py
@@ -360,6 +360,19 @@ def compare_records(old, new) :
             assert len(old.annotations[key]) == len(new.annotations[key])
             for old_r, new_r in zip(old.annotations[key], new.annotations[key]) :
                 compare_references(old_r, new_r)
+            for old_ref, new_ref in zip(old.annotations[key],
+                    new.annotations[key]):
+                if old_ref.location == []:
+                    assert new_ref.location == [], "old_reference.location %s !=" \
+                        "new_reference location %s" % (old_ref.location,
+                        new_ref.location)
+                else:
+                    assert old_ref.location[0].start == new_ref.location[0].start, \
+                    "old ref.location[0].start %s != new ref.location[0].start %s" % \
+                    (old_ref.location[0].start, new_ref.location[0].start)
+                    assert old_ref.location[0].end == new_ref.location[0].end, \
+                    "old ref.location[0].end %s != new ref.location[0].end %s" % \
+                    (old_ref.location[0].end, new_ref.location[0].end)
         elif key == "comment":
             if isinstance(old.annotations[key], list):
                 old_comment = [comm.replace("\n", " ") for comm in \
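
The same check can be written as a standalone helper, sketched here with plain
stand-in objects so it runs without Biopython or a database. Location,
Reference and compare_reference_locations are hypothetical names for this
example only. The sketch also compares every location in the list rather than
just the first one, which goes slightly beyond what the patch does.

```python
from collections import namedtuple

# Hypothetical stand-ins for Biopython's FeatureLocation and Reference.
Location = namedtuple("Location", ["start", "end"])
Reference = namedtuple("Reference", ["location"])


def compare_reference_locations(old_ref, new_ref):
    """Assert that two references carry equivalent location lists."""
    assert len(old_ref.location) == len(new_ref.location), (
        "old location %r != new location %r"
        % (old_ref.location, new_ref.location)
    )
    for old_loc, new_loc in zip(old_ref.location, new_ref.location):
        assert old_loc.start == new_loc.start, (
            "old start %s != new start %s" % (old_loc.start, new_loc.start)
        )
        assert old_loc.end == new_loc.end, (
            "old end %s != new end %s" % (old_loc.end, new_loc.end)
        )


# A reference without a location should come back without one.
compare_reference_locations(Reference([]), Reference([]))
# A reference with a location should come back with the same coordinates.
compare_reference_locations(
    Reference([Location(0, 789)]), Reference([Location(0, 789)])
)
```

With the behaviour described earlier in this thread, the first call would fail
before the BioSeq.py change, because the retrieved reference carried a
[None:None] location instead of an empty list.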

Cheers, Cymon

