[Biopython] GenBank parser
Timothy Wu
2huggie at gmail.com
Wed Mar 16 04:26:44 EDT 2011
Hi,
I'm using Biopython to parse human genome files with code like this:
    from Bio import SeqIO
    for seq_record in SeqIO.parse(fd, "genbank"):
        pass  # do something with seq_record
However, the parser raised an exception:
Traceback (most recent call last):
  File "./buildSyn.py", line 26, in <module>
    main()
  File "./buildSyn.py", line 19, in main
    gene2SynMapping, syn2GeneMapping = mapper.getMappingDicts(files)
  File "/home/thw/MyPythonPackage/frameworks/BioProg/idmapping/idmapper/human_genome_id_mapper.py", line 29, in getMappingDicts
    self.parseAndGetMapping(fd, gene2syn)
  File "/home/thw/MyPythonPackage/frameworks/BioProg/idmapping/idmapper/human_genome_id_mapper.py", line 74, in parseAndGetMapping
    for seq_record in SeqIO.parse(fd, "genbank"):
  File "/usr/lib/pymodules/python2.6/Bio/SeqIO/__init__.py", line 525, in parse
    for r in i:
  File "/usr/lib/pymodules/python2.6/Bio/GenBank/Scanner.py", line 437, in parse_records
    record = self.parse(handle, do_features)
  File "/usr/lib/pymodules/python2.6/Bio/GenBank/Scanner.py", line 420, in parse
    if self.feed(handle, consumer, do_features):
  File "/usr/lib/pymodules/python2.6/Bio/GenBank/Scanner.py", line 392, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/usr/lib/pymodules/python2.6/Bio/GenBank/Scanner.py", line 344, in _feed_feature_table
    consumer.location(location_string)
  File "/usr/lib/pymodules/python2.6/Bio/GenBank/__init__.py", line 975, in location
    raise LocationParserError(location_line)
Bio.GenBank.LocationParserError: 958574^958575..958886
The GenBank file involved has the following structure:
     CDS             958574^958575..958772
                     /gene="CSH2"
                     /gene_synonym="CS-2; CSB; hCS-B"
                     /exception="unclassified translation discrepancy"
                     /note="placental lactogen; chorionic somatomammotropin B;
                     Derived by automated computational analysis using gene
                     prediction method: Curated Genomic."
                     /codon_start=1
                     /product="chorionic somatomammotropin hormone 2 isoform 3"
                     /protein_id="NP_072171.1"
                     /db_xref="GI:12408694"
                     /db_xref="CCDS:CCDS42368.1"
                     /db_xref="GeneID:1443"
                     /db_xref="HGNC:2441"
                     /db_xref="MIM:118820"
This isn't the first such location in the file. When I manually delete the
caret part (the equivalent of "^958575") from a location, parsing works.
Is there something I can do about this? For now I edit the GenBank file by
hand, since I won't be needing the location information.
I'm also not sure what the caret is supposed to represent.
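The manual edit described above could be automated roughly as follows. This is only a sketch, assuming Python 3 and assuming that dropping the "^M" part is acceptable because the location information is not needed. The helper names strip_caret_positions and cleaned_handle are hypothetical and are not part of Biopython.

```python
import io
import re

# Matches "N^M" only when it is directly followed by "..",
# e.g. the "958574^958575" in "958574^958575..958772".
_CARET_BEFORE_RANGE = re.compile(r"(\d+)\^\d+(?=\.\.)")


def strip_caret_positions(lines):
    """Yield lines with "N^M.." rewritten to "N..".

    Lines whose first non-blank character is "/" (the start of a
    qualifier) are passed through unchanged. Qualifier continuation
    lines are not detected, which is an accepted limitation here.
    """
    for line in lines:
        if not line.lstrip().startswith("/"):
            line = _CARET_BEFORE_RANGE.sub(r"\1", line)
        yield line


def cleaned_handle(handle):
    """Return an in-memory text handle holding the cleaned file content."""
    return io.StringIO("".join(strip_caret_positions(handle)))
```

The cleaned handle could then be passed to the parser in place of the original one, as in SeqIO.parse(cleaned_handle(fd), "genbank"). Note that cleaned_handle reads the whole file into memory, which may matter for large genome files, and that the rewritten locations no longer match the original annotation.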
Thanks for your attention.
Timothy