[Biopython] Parsing problem
Iwan Grin
iwan.grin at googlemail.com
Tue Dec 8 18:52:13 UTC 2009
Hi all,
I am having a little problem parsing a GenBank (or rather GenPept) file
with Biopython. I am trying to extract the position on the genome from
the "coded_by" qualifier of the CDS feature of a protein.
The "coded_by" string in this specific case looks like this:
'complement(NC_012967.1:3622110..3624728)'
Now, when I run
Bio.GFF.easy.LocationFromString('complement(NC_012967.1:3622110..3624728)' )
I get
File "/usr/lib/pymodules/python2.6/Bio/GFF/easy.py", line 419, in __init__
  list.__init__(self, [int(location_str)-1]) # zero based, nip it in the bud
ValueError: invalid literal for int() with base 10: 'NC_012967.1:3622110..3624728'
Is there another way to parse this location string or do I have to cook up
some kind of custom RegExp?
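For reference, a minimal sketch of what such a custom regular expression could look like. It assumes the value is a single span in the form shown above (an accession, a colon, and start..end, optionally wrapped in complement(...)); join(...) and fuzzy positions such as <1..200 are not handled. The name parse_coded_by is made up for this sketch and is not part of Biopython. Coordinates are returned as written in the file (1-based), not shifted to zero-based as Bio.GFF.easy does.

```python
import re

# Hypothetical helper, not part of Biopython: handles only a single,
# un-joined span such as 'complement(NC_012967.1:3622110..3624728)'.
_CODED_BY = re.compile(
    r"^(?P<comp>complement\()?"        # optional complement( wrapper
    r"(?P<acc>[A-Za-z0-9_.]+):"        # accession.version of the nucleotide record
    r"(?P<start>\d+)\.\.(?P<end>\d+)"  # coordinates as written in the file
    r"(?(comp)\))$"                    # closing parenthesis only if the wrapper was present
)


def parse_coded_by(text):
    """Return (accession, start, end, is_complement) or raise ValueError."""
    # Line wrapping in the flat file can leave whitespace inside the value.
    cleaned = "".join(text.split())
    match = _CODED_BY.match(cleaned)
    if match is None:
        raise ValueError("unsupported coded_by value: %r" % text)
    return (
        match.group("acc"),
        int(match.group("start")),
        int(match.group("end")),
        match.group("comp") is not None,
    )
```

Calling parse_coded_by(feature.qualifiers["coded_by"][0]) in place of the LocationFromString call would then give the accession, both coordinates, and the strand flag as a tuple.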
Iwan
P.S.: Code snippet:
from Bio import Entrez
from Bio import SeqIO
from Bio import GFF

gi = 254163455
handle = Entrez.efetch(db="protein", id=gi, rettype="gb")
record = SeqIO.read(handle, "genbank")
handle.close()

for feature in record.features:
    # Protein records keep the genomic location in the "coded_by" qualifier
    if feature.type == "CDS" and "coded_by" in feature.qualifiers:
        print feature.qualifiers["coded_by"][0],
        # This is the call that raises the ValueError shown above
        loc = GFF.easy.LocationFromString(feature.qualifiers["coded_by"][0])
        print loc.start(), loc.end(), loc.complement