[Biopython-dev] [Bug 1680] New: Problems with the GenBank indexing

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Aug 17 12:22:37 EDT 2004


http://bugzilla.open-bio.org/show_bug.cgi?id=1680

           Summary: Problems with the GenBank indexing
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: major
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: sameet at nccs.res.in


I am using the latest Biopython (1.3) and the latest Python (2.3.4).  I want to
build an index file from a GenBank file.  I am doing the following:
>>> from Bio import GenBank 
>>> dict_file = r'C:\Sameet\correspondence\genbank.gb' 
>>> index_file = r'C:\Sameet\correspondence\genbank.idx' 
>>> GenBank.index_file(dict_file, index_file) 

I get the following traceback:
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in -toplevel-
    GenBank.index_file(dict_file, index_file)
  File "C:\Python23\Lib\site-packages\Bio\GenBank\__init__.py", line 1283, in index_file
    SimpleSeqRecord.create_flatdb([filename], indexname, indexer)
  File "C:\Python23\Lib\site-packages\Bio\Mindy\SimpleSeqRecord.py", line 152, in create_flatdb
    creator.load(filename, builder = builder, fileid_info = {})
  File "C:\Python23\Lib\site-packages\Bio\Mindy\BaseDB.py", line 52, in load
    for record in iterator.iterate(source, cont_handler = builder):
  File "C:\Python23\Lib\site-packages\Martel\IterParser.py", line 76, in iterateFile
    self.record_parser.parseString(rec)
  File "C:\Python23\Lib\site-packages\Martel\Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "C:\Python23\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
ParserPositionException: error parsing at or beyond character 1843
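
The offset reported in the last line of the traceback can be mapped back to a
position in the input to see which text the parser stopped on. The following
is a minimal sketch, assuming the offset is a 0-based character index into the
text being parsed. It may be relative to a single record rather than to the
whole file, since the traceback shows records being passed to parseString one
at a time. show_context is a hypothetical helper and is not part of Biopython
or Martel.

```python
def show_context(text, offset, width=40):
    """Return (line_number, column, snippet) for a 0-based character offset.

    line_number and column are 1-based; snippet is the text within
    `width` characters on either side of the offset.
    """
    # Clamp so an offset past the end of the text still gives a usable answer.
    offset = max(0, min(offset, len(text)))
    # Lines completed before the offset, plus one for the current line.
    line_number = text.count("\n", 0, offset) + 1
    # rfind returns -1 when there is no earlier newline, so +1 yields 0.
    line_start = text.rfind("\n", 0, offset) + 1
    column = offset - line_start + 1
    snippet = text[max(0, offset - width):offset + width]
    return line_number, column, snippet


# Example use (assumes the text that was handed to the parser is in `record`):
# line_number, column, snippet = show_context(record, 1843)
# print(line_number, column, repr(snippet))
```

Printing the snippet with repr() makes stray whitespace and line endings
visible, which is often where flat-file format parsers disagree with the data.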

I am also attaching the file that I used in this experiment; its text
follows:
------------GenBank File------------------
LOCUS       AY517242                 312 bp    DNA     linear   PRI 29-FEB-2004
DEFINITION  Homo sapiens clone HIV1-H9-362 HIV-1 integration site.
ACCESSION   AY517242
VERSION     AY517242.1  GI:42767637
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 312)
  AUTHORS   Wu,X., Li,Y., Crise,B. and Burgess,S.M.
  TITLE     Transcription start regions in the human genome are favored targets
            for MLV integration
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 312)
  AUTHORS   Wu,X., Li,Y., Crise,B. and Burgess,S.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JAN-2004) Genome Technology Branch, National Human
            Genome Research Institute, NIH, 50 South Drive, Rm 5537, Bethesda,
            MD 20892, USA
FEATURES             Location/Qualifiers
     source          70..312
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /clone="HIV1-H9-362"
                     /focus
     source          1..69
                     /organism="Human immunodeficiency virus 1"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:11676"
     LTR             <1..69
ORIGIN      
        1 gtctgttgtg tgactctggt aactagagat ccctcagacc cttttagtca gtgtggaaaa
       61 tctctagcac aagccaaagt gataatcaaa actcaggaac agtaagtcgg atgcgtcaca
      121 ttttttactc aatttatcaa acagtagcta catataggac atttgccagt tatataagga
      181 tgtacaaagt tcttgcacac tcaaatagca tatccagcca attgtctcaa tccagactgt
      241 tttagtgatg tgaaagaatg tggcaggtct ggaagtatca gaagctcagg aaagggcgga
      301 gatttttgtt aa
//

LOCUS       AY517241                 310 bp    DNA     linear   PRI 29-FEB-2004
DEFINITION  Homo sapiens clone HIV1-H9-361 HIV-1 integration site.
ACCESSION   AY517241
VERSION     AY517241.1  GI:42767636
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 310)
  AUTHORS   Wu,X., Li,Y., Crise,B. and Burgess,S.M.
  TITLE     Transcription start regions in the human genome are favored targets
            for MLV integration
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 310)
  AUTHORS   Wu,X., Li,Y., Crise,B. and Burgess,S.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JAN-2004) Genome Technology Branch, National Human
            Genome Research Institute, NIH, 50 South Drive, Rm 5537, Bethesda,
            MD 20892, USA
FEATURES             Location/Qualifiers
     source          70..310
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /clone="HIV1-H9-361"
                     /focus
     source          1..69
                     /organism="Human immunodeficiency virus 1"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:11676"
     LTR             <1..69
ORIGIN      
        1 gtctgttgtg tgactctggt aactagagat ccctcagacc cttttagtca gtgtggaaaa
       61 tctctagcag tccttgagtt cgctcagtaa gtccagcacc tgcacatttt cttccgaatc
      121 accaacatca agttctccca ggaggcgctg agtggtcacc agttcatgct gcaccacttt
      181 tcataaaagc ccaaagatgg aaacagccaa atgtctatca gatgatggat aaacaaaatg
      241 tggtacaggc acacctcata tttattgcag ttcactttat cgtgctttac aaatattgct
      301 ttttttttaa
//

LOCUS       AY517240                 310 bp    DNA     linear   PRI 29-FEB-2004
DEFINITION  Homo sapiens clone HIV1-H9-360 HIV-1 integration site.
ACCESSION   AY517240
VERSION     AY517240.1  GI:42767635
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 310)
  AUTHORS   Wu,X., Li,Y., Crise,B. and Burgess,S.M.
  TITLE     Transcription start regions in the human genome are favored targets
            for MLV integration
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 310)
  AUTHORS   Wu,X., Li,Y., Crise,B. and Burgess,S.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JAN-2004) Genome Technology Branch, National Human
            Genome Research Institute, NIH, 50 South Drive, Rm 5537, Bethesda,
            MD 20892, USA
FEATURES             Location/Qualifiers
     source          70..310
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /clone="HIV1-H9-360"
                     /focus
     source          1..69
                     /organism="Human immunodeficiency virus 1"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:11676"
     LTR             <1..69
ORIGIN      
        1 gtctgttgtg tgactctggt aactagagat ccctcagacc cttttagtca gtgtggaaaa
       61 tctctagcag attcctcaac aaattaaaaa agtaaaagta ctctgtgata agtgctatag
      121 taagggtagg caaaaggtgc tatggagccc agagaagaag accagagggt aagaggaggg
      181 gagggaaatg ttgacagggt gctgggacag gctgcctgga ctgtcacaaa acctcccatg
      241 gtcacaccag ggcctgcaag cagaagtctt cctcttgatt acatgatcca tggatggtca
      301 gcatttttaa
//
-------------GenBank File-------------------------
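
For comparison, the kind of index being attempted here can be approximated
with the standard library alone, which may help to check whether the file
itself is readable record by record. This is only a sketch under stated
assumptions: index_genbank is a hypothetical helper, it is not how
GenBank.index_file or Bio.Mindy build their index, and it assumes each record
starts with a LOCUS line, carries one ACCESSION line, and ends with a line
containing only "//".

```python
import io


def index_genbank(handle):
    """Map each ACCESSION to the byte offset of its record's LOCUS line.

    `handle` must be a file-like object opened in binary mode.
    """
    index = {}
    locus_offset = None
    while True:
        position = handle.tell()      # offset of the line about to be read
        line = handle.readline()
        if not line:                  # empty bytes means end of file
            break
        if line.startswith(b"LOCUS"):
            locus_offset = position
        elif line.startswith(b"ACCESSION") and locus_offset is not None:
            fields = line.split()
            if len(fields) > 1:
                index[fields[1].decode("ascii")] = locus_offset
        elif line.strip() == b"//":   # record terminator
            locus_offset = None
    return index


# Tiny in-memory stand-in for a GenBank file, with a blank line between
# records as in the file above.
sample = io.BytesIO(
    b"LOCUS       AY517242\nACCESSION   AY517242\n//\n"
    b"\n"
    b"LOCUS       AY517241\nACCESSION   AY517241\n//\n"
)
offsets = index_genbank(sample)
```

With a real file, open it in binary mode (open(path, "rb")) so that the
offsets are true byte positions and are not affected by newline translation
on Windows; a record can then be re-read by seeking to its stored offset.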

This problem has occurred repeatedly, irrespective of the operating system.

I think this is a bug.

Regards,
Sameet





