[Biopython-dev] [Bug 1680] New: Problems with the GenBank indexing
bugzilla-daemon at portal.open-bio.org
Tue Aug 17 12:22:37 EDT 2004
http://bugzilla.open-bio.org/show_bug.cgi?id=1680
Summary: Problems with the GenBank indexing
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: major
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: sameet at nccs.res.in
I am using the latest Biopython (1.3) and the latest Python (2.3.4). I want to
build an index file from a GenBank file. I am doing the following:
>>> from Bio import GenBank
>>> dict_file = r'C:\Sameet\correspondence\genbank.gb'
>>> index_file = r'C:\Sameet\correspondence\genbank.idx'
>>> GenBank.index_file(dict_file, index_file)
I get the following trace:
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in -toplevel-
    GenBank.index_file(dict_file, index_file)
  File "C:\Python23\Lib\site-packages\Bio\GenBank\__init__.py", line 1283, in index_file
    SimpleSeqRecord.create_flatdb([filename], indexname, indexer)
  File "C:\Python23\Lib\site-packages\Bio\Mindy\SimpleSeqRecord.py", line 152, in create_flatdb
    creator.load(filename, builder = builder, fileid_info = {})
  File "C:\Python23\Lib\site-packages\Bio\Mindy\BaseDB.py", line 52, in load
    for record in iterator.iterate(source, cont_handler = builder):
  File "C:\Python23\Lib\site-packages\Martel\IterParser.py", line 76, in iterateFile
    self.record_parser.parseString(rec)
  File "C:\Python23\Lib\site-packages\Martel\Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "C:\Python23\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
ParserPositionException: error parsing at or beyond character 1843
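The traceback gives only a character offset. The small helper below turns that offset into a line and column so the failing line can be inspected. It is a hypothetical, stdlib-only sketch written for current Python 3; it is not part of Biopython or Martel. The failing frame is `record_parser.parseString(rec)`, so the offset is probably relative to the record being parsed rather than the whole file. For that reason the sketch splits the text on `//` lines first. It also assumes the offset is 0-based, which is not confirmed here.

```python
from __future__ import annotations


def split_records(text: str) -> list[str]:
    """Split GenBank flat-file text into records, each ending with its '//' line."""
    records: list[str] = []
    current: list[str] = []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.rstrip() == "//":
            records.append("".join(current))
            current = []
    # Keep any trailing text that never reached a '//' terminator.
    leftover = "".join(current)
    if leftover.strip():
        records.append(leftover)
    return records


def offset_to_line_col(text: str, offset: int) -> tuple[int, int]:
    """Return the 1-based (line, column) of a 0-based character offset."""
    if not 0 <= offset < len(text):
        raise ValueError(f"offset {offset} outside text of length {len(text)}")
    line = text.count("\n", 0, offset) + 1
    # rfind returns -1 when there is no earlier newline, which still
    # yields the correct column on the first line.
    column = offset - text.rfind("\n", 0, offset)
    return line, column


def locate_offset(text: str, offset: int) -> list[tuple[int, int, int, str]]:
    """For each record long enough to contain the offset, report
    (record number, line, column, text of that line)."""
    hits = []
    for number, record in enumerate(split_records(text), start=1):
        if offset < len(record):
            line, column = offset_to_line_col(record, offset)
            hits.append((number, line, column, record.splitlines()[line - 1]))
    return hits
```

For example, read the attached file into a string and call `locate_offset(data, 1843)`. This lists, for each record longer than 1843 characters, the line the parser may have been looking at. Martel's own record splitting may differ slightly from this one.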
I am also attaching the file that I used in this experiment; its text
follows:
------------GenBank File------------------
LOCUS       AY517242                 312 bp    DNA     linear   PRI 29-FEB-2004
DEFINITION  Homo sapiens clone HIV1-H9-362 HIV-1 integration site.
ACCESSION   AY517242
VERSION     AY517242.1  GI:42767637
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 312)
  AUTHORS   Wu,X., Li,Y., Crise,B. and Burgess,S.M.
  TITLE     Transcription start regions in the human genome are favored targets
            for MLV integration
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 312)
  AUTHORS   Wu,X., Li,Y., Crise,B. and Burgess,S.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JAN-2004) Genome Technology Branch, National Human
            Genome Research Institute, NIH, 50 South Drive, Rm 5537, Bethesda,
            MD 20892, USA
FEATURES             Location/Qualifiers
     source          70..312
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /clone="HIV1-H9-362"
                     /focus
     source          1..69
                     /organism="Human immunodeficiency virus 1"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:11676"
     LTR             <1..69
ORIGIN
        1 gtctgttgtg tgactctggt aactagagat ccctcagacc cttttagtca gtgtggaaaa
       61 tctctagcac aagccaaagt gataatcaaa actcaggaac agtaagtcgg atgcgtcaca
      121 ttttttactc aatttatcaa acagtagcta catataggac atttgccagt tatataagga
      181 tgtacaaagt tcttgcacac tcaaatagca tatccagcca attgtctcaa tccagactgt
      241 tttagtgatg tgaaagaatg tggcaggtct ggaagtatca gaagctcagg aaagggcgga
      301 gatttttgtt aa
//
LOCUS       AY517241                 310 bp    DNA     linear   PRI 29-FEB-2004
DEFINITION  Homo sapiens clone HIV1-H9-361 HIV-1 integration site.
ACCESSION   AY517241
VERSION     AY517241.1  GI:42767636
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 310)
  AUTHORS   Wu,X., Li,Y., Crise,B. and Burgess,S.M.
  TITLE     Transcription start regions in the human genome are favored targets
            for MLV integration
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 310)
  AUTHORS   Wu,X., Li,Y., Crise,B. and Burgess,S.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JAN-2004) Genome Technology Branch, National Human
            Genome Research Institute, NIH, 50 South Drive, Rm 5537, Bethesda,
            MD 20892, USA
FEATURES             Location/Qualifiers
     source          70..310
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /clone="HIV1-H9-361"
                     /focus
     source          1..69
                     /organism="Human immunodeficiency virus 1"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:11676"
     LTR             <1..69
ORIGIN
        1 gtctgttgtg tgactctggt aactagagat ccctcagacc cttttagtca gtgtggaaaa
       61 tctctagcag tccttgagtt cgctcagtaa gtccagcacc tgcacatttt cttccgaatc
      121 accaacatca agttctccca ggaggcgctg agtggtcacc agttcatgct gcaccacttt
      181 tcataaaagc ccaaagatgg aaacagccaa atgtctatca gatgatggat aaacaaaatg
      241 tggtacaggc acacctcata tttattgcag ttcactttat cgtgctttac aaatattgct
      301 ttttttttaa
//
LOCUS       AY517240                 310 bp    DNA     linear   PRI 29-FEB-2004
DEFINITION  Homo sapiens clone HIV1-H9-360 HIV-1 integration site.
ACCESSION   AY517240
VERSION     AY517240.1  GI:42767635
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 310)
  AUTHORS   Wu,X., Li,Y., Crise,B. and Burgess,S.M.
  TITLE     Transcription start regions in the human genome are favored targets
            for MLV integration
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 310)
  AUTHORS   Wu,X., Li,Y., Crise,B. and Burgess,S.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JAN-2004) Genome Technology Branch, National Human
            Genome Research Institute, NIH, 50 South Drive, Rm 5537, Bethesda,
            MD 20892, USA
FEATURES             Location/Qualifiers
     source          70..310
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /clone="HIV1-H9-360"
                     /focus
     source          1..69
                     /organism="Human immunodeficiency virus 1"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:11676"
     LTR             <1..69
ORIGIN
        1 gtctgttgtg tgactctggt aactagagat ccctcagacc cttttagtca gtgtggaaaa
       61 tctctagcag attcctcaac aaattaaaaa agtaaaagta ctctgtgata agtgctatag
      121 taagggtagg caaaaggtgc tatggagccc agagaagaag accagagggt aagaggaggg
      181 gagggaaatg ttgacagggt gctgggacag gctgcctgga ctgtcacaaa acctcccatg
      241 gtcacaccag ggcctgcaag cagaagtctt cctcttgatt acatgatcca tggatggtca
      301 gcatttttaa
//
-------------GenBank File-------------------------
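For comparison, the core of an offset index over a file like this is small. The sketch below is hypothetical, stdlib-only and written for current Python 3. It is not Biopython's `GenBank.index_file` or the Mindy code. It keys each record on its LOCUS name, which is an assumption (in the attached file the LOCUS name and accession coincide). It stores byte offsets so a record can be read back with `seek()`. It never parses the feature table and relies only on the `LOCUS` and `//` lines.

```python
from __future__ import annotations

import io
from typing import BinaryIO


def build_offset_index(handle: BinaryIO) -> dict[str, tuple[int, int]]:
    """Map each record's LOCUS name to its (start, end) byte offsets."""
    index: dict[str, tuple[int, int]] = {}
    start = handle.tell()
    name = None
    # readline() on a binary handle keeps tell() accurate while scanning.
    for line in iter(handle.readline, b""):
        if line.startswith(b"LOCUS"):
            # The locus name is the second whitespace-separated field.
            name = line.split()[1].decode("ascii")
        elif line.rstrip() == b"//":
            end = handle.tell()
            if name is not None:
                index[name] = (start, end)
            # The next record starts right after this terminator.
            start, name = end, None
    return index


def fetch_record(handle: BinaryIO, index: dict[str, tuple[int, int]], name: str) -> bytes:
    """Seek to a stored offset and read back one raw record."""
    start, end = index[name]
    handle.seek(start)
    return handle.read(end - start)


if __name__ == "__main__":
    # Tiny in-memory example with two truncated records.
    demo = io.BytesIO(
        b"LOCUS       AY517242  312 bp\n//\n"
        b"LOCUS       AY517241  310 bp\n//\n"
    )
    idx = build_offset_index(demo)
    print(idx)
    print(fetch_record(demo, idx, "AY517241"))
```

On a real file, open it with `open(path, "rb")` so that the offsets are byte positions that `seek()` can reuse.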
This problem has occurred repeatedly, irrespective of the operating system.
I think this is a bug.
Regards,
Sameet