[Biopython-dev] [Bug 2840] When a record has been loaded from BioSQL, trying to save it to another database fails with loader db_loader.load_seqrecord in _load_reference
bugzilla-daemon at portal.open-bio.org
Mon May 25 20:14:40 EDT 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2840
------- Comment #3 from cymon.cox at gmail.com 2009-05-25 20:14 EST -------
(In reply to comment #1)
> I have modified the dbtestcase.py script to show the contents of the reference
> of the record downloaded from genbank, and from the record recovered from
> BioSQL.
>
> Here is a print out of the last two references before saving to BioSQL:
>
> authors: Sugita,M., Sugiura,C., Arikawa,T. and Higuchi,M.
> title: Molecular evidence of an rpoA gene in the basal moss chloroplast
> genomes: rpoA is a useful molecular marker for phylogenetic analysis of mosses
> journal: Hikobia 14, 171-175 (2004)
> medline id:
> pubmed id:
> comment:
>
> location: [0:789]
> authors: Sugita,M.
> title: Direct Submission
> journal: Submitted (25-DEC-2002) Mamoru Sugita, Nagoya University, Center for
> Gene Research; Chikusa-ku, Nagoya, Aichi 464-8602, Japan
> (E-mail:sugita at gene.nagoya-u.ac.jp, Tel:81-52-789-3080(ex.3080),
> Fax:81-52-789-3080)
> medline id:
> pubmed id:
> comment:
>
> --- note: no location in the first one; only a location in the last reference
> (why? - should references have a location? I suppose they might, if they
> referred to a part of a chromosome?)
>
> Now, after saving to BioSQL and recovering, all the records have a location,
> but in some cases, it is [None:None]; here are the same two records.
>
> location: [None:None]
> authors: Sugita,M., Sugiura,C., Arikawa,T. and Higuchi,M.
> title: Molecular evidence of an rpoA gene in the basal moss chloroplast
> genomes: rpoA is a useful molecular marker for phylogenetic analysis of mosses
> journal: Hikobia 14, 171-175 (2004)
> medline id:
> pubmed id:
> comment:
>
> location: [0:789]
> authors: Sugita,M.
> title: Direct Submission
> journal: Submitted (25-DEC-2002) Mamoru Sugita, Nagoya University, Center for
> Gene Research; Chikusa-ku, Nagoya, Aichi 464-8602, Japan
> (E-mail:sugita at gene.nagoya-u.ac.jp, Tel:81-52-789-3080(ex.3080),
> Fax:81-52-789-3080)
> medline id:
> pubmed id:
> comment:
>
>
> After this, the db.load method calls _load_reference.
>
> I think the problem is because the last line doesn't cope with None values.
> If one edits _load_reference to put the last reference inside a test for the
> null condition
>
> if (start is not None and end is not None):
>     sql = "INSERT INTO bioentry_reference (bioentry_id, reference_id," \
>           " start_pos, end_pos, rank)" \
>           " VALUES (%s, %s, %s, %s, %s)"
>     self.adaptor.execute(sql, (bioentry_id, reference_id,
>                                start, end, rank + 1))
>
> Then the problem is solved, but I'm not sure how this fits in the bigger scheme
> of things.
>
> d
>
The BioSQL loader uses None for "start" and "end" if a reference doesn't have a
location. When the reference is retrieved, a location is still built from those
values, so it comes back as [None:None] rather than as an empty list.
Try this alteration to BioSeq.py; it should solve your problem:
cymon at gyra:~/git/github-master/BioSQL$ git diff BioSeq.py
diff --git a/BioSQL/BioSeq.py b/BioSQL/BioSeq.py
index cc47cf4..8d1e02a 100644
--- a/BioSQL/BioSeq.py
+++ b/BioSQL/BioSeq.py
@@ -351,8 +351,11 @@ def _retrieve_reference(adaptor, primary_id):
     references = []
     for start, end, location, title, authors, dbname, accession in refs:
         reference = SeqFeature.Reference()
-        if start: start -= 1
-        reference.location = [SeqFeature.FeatureLocation(start, end)]
+        if start:
+            start -= 1
+            reference.location = [SeqFeature.FeatureLocation(start, end)]
+        else:
+            reference.location = []
         #Don't replace the default "" with None.
         if authors : reference.authors = authors
         if title : reference.title = title
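
The effect of that change can be sketched outside Biopython. This is only an
illustration under stated assumptions: `Location` and
`build_reference_location` are hypothetical stand-ins (not BioSQL or SeqFeature
names), and the arguments mimic the (start, end) values read back from the
database, with None meaning no position was stored.

```python
from collections import namedtuple

# Hypothetical stand-in for SeqFeature.FeatureLocation (an assumption for
# this sketch, not the real class).
Location = namedtuple("Location", ["start", "end"])


def build_reference_location(start, end):
    """Mirror the patched branch: build a location only if a start was stored."""
    if start:
        # As in the patch, subtract 1 from the stored start before use.
        return [Location(start - 1, end)]
    # Nothing stored for this reference: leave the location list empty.
    return []


with_position = build_reference_location(1, 789)
without_position = build_reference_location(None, None)
print(with_position)
print(without_position)
```

With this shape, a reference that had no location before saving should also
have none after retrieval, which is what the loader expects to see.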
Here's a patch for the unit test to compare the reference locations of injected
and retrieved records:
diff --git a/Tests/test_BioSQL_SeqIO.py b/Tests/test_BioSQL_SeqIO.py
index 2d8caf8..9479e02 100644
--- a/Tests/test_BioSQL_SeqIO.py
+++ b/Tests/test_BioSQL_SeqIO.py
@@ -360,6 +360,19 @@ def compare_records(old, new) :
             assert len(old.annotations[key]) == len(new.annotations[key])
             for old_r, new_r in zip(old.annotations[key], new.annotations[key]) :
                 compare_references(old_r, new_r)
+            for old_ref, new_ref in zip(old.annotations[key],
+                                        new.annotations[key]):
+                if old_ref.location == []:
+                    assert new_ref.location == [], "old_reference.location %s !=" \
+                        "new_reference location %s" % (old_ref.location,
+                                                       new_ref.location)
+                else:
+                    assert old_ref.location[0].start == new_ref.location[0].start, \
+                        "old ref.location[0].start %s != new ref.location[0].start %s" % \
+                        (old_ref.location[0].start, new_ref.location[0].start)
+                    assert old_ref.location[0].end == new_ref.location[0].end, \
+                        "old ref.location[0].end %s != new ref.location[0].end %s" % \
+                        (old_ref.location[0].end, new_ref.location[0].end)
         elif key == "comment":
             if isinstance(old.annotations[key], list):
                 old_comment = [comm.replace("\n", " ") for comm in \
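
The check added by the test patch can also be read as a small standalone
helper. Again this is a sketch, not code from the test suite: `Loc`, `Ref` and
`same_location` are hypothetical names, and the extra guard for an empty new
location is my own addition so that the helper returns False instead of
raising an IndexError.

```python
from collections import namedtuple

# Hypothetical stand-ins for location and reference objects (assumptions).
Loc = namedtuple("Loc", ["start", "end"])
Ref = namedtuple("Ref", ["location"])


def same_location(old_ref, new_ref):
    """Return True if two references agree on their optional location."""
    if old_ref.location == []:
        # No location before saving: expect none after retrieval either.
        return new_ref.location == []
    if new_ref.location == []:
        # Guard not present in the patch: avoids indexing an empty list.
        return False
    old_loc, new_loc = old_ref.location[0], new_ref.location[0]
    return old_loc.start == new_loc.start and old_loc.end == new_loc.end


# A record whose reference had no location, compared against a retrieved
# copy that came back as [None:None], is reported as a mismatch.
print(same_location(Ref([]), Ref([Loc(None, None)])))
```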
Cheers, Cymon