[Biopython-dev] Strange Genbank feature description: how should biopython handle this?
Danny Yoo
dyoo at acoma.Stanford.EDU
Wed Aug 7 16:46:45 EDT 2002
Hi everyone,
OK, I'm fiddling around with the GenBank parser. In one of my test cases,
there's one particular entry that's very evil. It comes from AP000423
(GI:5881673), where it is annotated as gene RPS12:
     gene            join(complement(98562..98793),complement(97999..98024),
                     complement(69611..69724),139856..140087,140625..140650)
                     /gene="rps12"
Here's how Biopython is initializing this feature:
###
type: gene
location: (98561..140650)
ref: None:None
strand: None
qualifiers:
    Key: gene, Value: ['rps12']
Sub-Features
    type: gene
    location: (98561..98793)
    ref: None:None
    strand: -1
    qualifiers:
    type: gene
    location: (97998..98024)
    ref: None:None
    strand: -1
    qualifiers:
    type: gene
    location: (69610..69724)
    ref: None:None
    strand: -1
    qualifiers:
    type: gene
    location: (139855..140087)
    ref: None:None
    strand: None
    qualifiers:
    type: gene
    location: (140624..140650)
    ref: None:None
    strand: None
    qualifiers:
###
The LocationParser itself appears to be doing its job, as I see that it produces:
###
Function('join', [Function('complement', [AbsoluteLocation(None,
Range(Integer(98562), Integer(98793)))]), Function('complement',
[AbsoluteLocation(None, Range(Integer(97999), Integer(98024)))]),
Function('complement', [AbsoluteLocation(None, Range(Integer(69611),
Integer(69724)))]), AbsoluteLocation(None, Range(Integer(139856),
Integer(140087))), AbsoluteLocation(None, Range(Integer(140625),
Integer(140650)))])
###
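To make the expected result concrete, here is a rough standalone sketch of the strands I would expect for each piece of that location. It does not use the LocationParser or any Biopython code; `split_top_level` and `sub_locations` are made-up helper names, it only understands `join`, `complement`, and plain `a..b` ranges, and it assumes the positive strand unless a `complement` says otherwise. Starts are shifted by one to match the 0-based starts in the dump above.

```python
import re


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def sub_locations(location, strand=1):
    """Return (start, end, strand) tuples for a GenBank-style location.

    Hypothetical helper, not part of Biopython. The default strand of 1
    is the assumption under discussion: positive unless complemented.
    """
    location = "".join(location.split())  # drop line breaks and spaces
    match = re.fullmatch(r"(join|complement)\((.*)\)", location)
    if match is None:
        # A plain range such as 139856..140087.
        start, end = location.split("..")
        return [(int(start) - 1, int(end), strand)]
    name, inner = match.groups()
    if name == "complement":
        strand = -strand
    result = []
    for part in split_top_level(inner):
        result.extend(sub_locations(part, strand))
    return result


rps12 = """join(complement(98562..98793),complement(97999..98024),
complement(69611..69724),139856..140087,140625..140650)"""

for start, end, strand in sub_locations(rps12):
    print(start, end, strand)
```

With this sketch the three complemented pieces come out with strand -1 and the last two with strand 1, which is what I would have expected the sub-features to show.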
Having a strand of 'None' doesn't appear to be right. I've been staring
at 'Bio/GenBank/__init__.py' for a while, and it appears that the default
value for the strand is only set when self._seq_type is equal to "DNA".
For this record that condition evidently isn't met, so the top-level
feature keeps a strand of None and the sub-features without a complement
inherit it. I don't quite understand all of the code yet, but the following
change appears to fix this particular case:
Index: Bio/GenBank/__init__.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/GenBank/__init__.py,v
retrieving revision 1.29
diff -u -r1.29 __init__.py
--- Bio/GenBank/__init__.py 2002/04/16 15:45:26 1.29
+++ Bio/GenBank/__init__.py 2002/08/07 20:43:28
@@ -636,8 +636,9 @@
# assume positive strand to start with if we have DNA. The
# complement in the location will change this later.
- if self._seq_type == "DNA":
- self._cur_feature.strand = 1
+## if self._seq_type == "DNA":
+## self._cur_feature.strand = 1
+ self._cur_feature.strand = 1
def location(self, content):
"""Parse out location information from the location string.
@@ -735,7 +736,7 @@
new_sub_feature.ref = cur_feature.ref
new_sub_feature.ref_db = cur_feature.ref_db
new_sub_feature.strand = cur_feature.strand
-
+ assert(new_sub_feature.strand in (1, -1)) ## debug
# set the information for the inner element
self._set_location_info(inner_element, new_sub_feature)
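As far as I can tell, the behavior boils down to something like the toy model below. This is only my reading of the code, not the real implementation: `Feature`, `start_feature`, `make_sub_feature`, and the `always_default` flag are invented names for illustration, and the only things taken from the actual source are the `"DNA"` check and the fact that a sub-feature copies its parent's strand.

```python
class Feature:
    """Stand-in for a feature object; only the strand matters here."""

    def __init__(self, strand=None):
        self.strand = strand


def start_feature(seq_type, always_default=False):
    """Create a top-level feature.

    With always_default=False this mimics the current check: the strand
    is only defaulted to 1 when the sequence type is exactly "DNA".
    With always_default=True it mimics the patched behavior.
    """
    feature = Feature()
    if always_default or seq_type == "DNA":
        feature.strand = 1
    return feature


def make_sub_feature(parent, complemented):
    """Copy the parent's strand, then flip to -1 for a complement."""
    sub = Feature(parent.strand)
    if complemented:
        sub.strand = -1
    return sub


# Current behavior when the sequence type is anything other than "DNA":
parent = start_feature("")
print(make_sub_feature(parent, complemented=True).strand)
print(make_sub_feature(parent, complemented=False).strand)

# Patched behavior: the default is always applied.
patched = start_feature("", always_default=True)
print(make_sub_feature(patched, complemented=False).strand)
```

In the unpatched case the non-complemented sub-feature ends up with None, which matches the dump; with the default always applied it ends up with 1.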
What's the right way of fixing this problem? Thank you!