[BioPython] GenBank.FeatureParser() error in Locus field?

Cymon Cox cymon@duke.edu
04 Apr 2002 18:50:34 -0500


Dear BioPython Folks, 

The GenBank.FeatureParser() appears to bomb on the topology keyword
('linear') in the LOCUS line. Fetching the same record without a parser
works and returns the raw text:

Code: 

from Bio import GenBank

gi_list = GenBank.search_for("Bryum AND rps4")

# No parser: the dictionary hands back the raw GenBank text (this works).
ncbi_dict = GenBank.NCBIDictionary()
gb_record = ncbi_dict[gi_list[0]]

print gb_record

# Same record through FeatureParser: this is the call that fails.
record_parser = GenBank.FeatureParser()
ncbi_dict = GenBank.NCBIDictionary(parser = record_parser)

gb_seqrecord = ncbi_dict[gi_list[0]]

print gb_seqrecord

Result: 
>>> 
LOCUS       BBI251310                642 bp    DNA     linear   PLN 29-MAR-2001
DEFINITION  Bryum billarderi chloroplast partial rps4 gene for ribosomal
            protein, subunit 4exit.
ACCESSION   AJ251310
VERSION     AJ251310.1  GI:11121108
KEYWORDS    ribosomal protein; RPS4 gene; subunit 4.
SOURCE      Bryum billarderi.
  ORGANISM  Plastid Bryum billarderi
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Bryophyta;
            Bryopsida; Bryidae; Bryales; Bryaceae; Bryum.
REFERENCE   1  (bases 1 to 642)
  AUTHORS   Cox,C.J., Goffinet,B., Newton,A.N., Shaw,A.J. and Hedderson,T.A.
  TITLE     Phylogenetic relationships among the diplolepideous-alternate
            mosses (Bryidae) inferred from nuclear and chloroplast DNA
            sequences
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 642)
  AUTHORS   Cox,C.J.
  TITLE     Direct Submission
  JOURNAL   Submitted (22-OCT-1999) Cox C.J., Botany, The Natural History
            Museum, Cromwell Road, London SW7 5BD, United Kingdom
FEATURES             Location/Qualifiers
     source          1..642
                     /organism="Bryum billarderi"
                     /organelle="plastid"
                     /db_xref="taxon:109056"
     gene            1..601
                     /gene="rps4"
     CDS             <1..601
                     /gene="rps4"
                     /codon_start=2
                     /transl_table=11
                     /product="ribosomal protein subunit 4"
                     /protein_id="CAC14741.1"
                     /db_xref="GI:11121109"
                     /translation="YRGPRVRIIRRLGALTGLTNKTPQLKTNSINQSISNKKISQYRI
                     RLEEKQKLRFHYGITERQLLNYVRIARKAKGSTGEVLLQLLEMRLDNVIFRLGMAPTI
                     PGARQLVNHRHILVNDRIVNIPSYRCKPEDSITIKDRQKSQAIISKNLNLYQKYKTPN
                     HLTYNFLKKKGLVNQILDRESIGLKINELLVVEYYSRQA"
     misc_feature    602..>642
                     /note="intergenic spacer"
BASE COUNT      260 a     89 c     92 g    201 t
ORIGIN      
        1 gtatcgagga cctcgtgtaa gaataatacg ccgtttagga gctttaacag gactaactaa
       61 taaaacaccc cagttaaaaa ctaattcgat caatcaatca atatctaata aaaaaatttc
      121 tcaatatcgc attcgtttgg aagaaaaaca aaaattacgt tttcattatg gaataacaga
      181 gcgacaatta cttaattatg tacgtattgc tagaaaagca aaagggtcaa caggtgaagt
      241 gttattacaa ttacttgaaa tgcgcttaga taacgttatt tttcgattag gtatggctcc
      301 tacgattcct ggagcaaggc aactagtaaa tcatagacat attttagtta atgatcgtat
      361 agtaaatata ccgagttacc gttgtaaacc tgaggattct attactataa aagatcgaca
      421 aaaatctcag gctataatta gtaaaaatct aaatttgtat caaaaatata aaacaccaaa
      481 tcatttaact tataattttt taaaaaaaaa aggattagtt aatcaaatac tagatcgtga
      541 atccattggt ttaaaaataa atgaattatt agttgtagaa tattattctc gtcaagctta
      601 attagcaact aagagtattt ttaattatat acataataaa aa
//

Traceback (most recent call last):
  File "<string>", line 1, in ?
  File "/home/cymon/python/moss_db2/genbank_parser2temp.py", line 19, in ?
    gb_seqrecord = ncbi_dict[gi_list[0]]
  File "/usr/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1555, in __getitem__
    return self.parser.parse(handle)
  File "/usr/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 268, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1250, in feed
    self._parser.parseFile(handle)
  File "/usr/lib/python2.2/site-packages/Martel/Parser.py", line 230, in parseFile
    self.parseString(fileobj.read())
  File "/usr/lib/python2.2/site-packages/Martel/Parser.py", line 258, in parseString
    self._err_handler.fatalError(result)
  File "/var/tmp/python2-2.2-root/usr/lib/python2.2/xml/sax/handler.py", line 38, in fatalError
ParserPositionException: error parsing at or beyond character 55

>>> 

Character 55 is the last space before the 'l' of linear. If 'linear' is
removed from the locus field the record parses just fine.
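
For anyone who wants to reproduce the position or try the same workaround
on raw record text, here is a rough sketch. It is plain string handling and
does not touch Bio.GenBank at all. The helper names (topology_offset,
strip_topology, TOPOLOGY_WORD) are made up for illustration, the LOCUS line
is rebuilt assuming the spacing shown above is what NCBI actually sent, and
it is written for a current Python interpreter rather than the 2.2 in the
traceback. I have not checked that the parser accepts the exact spacing
this leaves behind.

```python
import re

# Topology words that can appear in a GenBank LOCUS line.
TOPOLOGY_WORD = re.compile(r"\b(?:linear|circular)\b")


def topology_offset(locus_line):
    """Return the 0-based offset of the topology word, or -1 if absent."""
    match = TOPOLOGY_WORD.search(locus_line)
    return match.start() if match else -1


def strip_topology(record_text):
    """Delete the topology word from the LOCUS line; other lines are untouched."""
    first, newline, rest = record_text.partition("\n")
    if first.startswith("LOCUS"):
        # Remove only the word itself; the surrounding spaces stay in place.
        first = TOPOLOGY_WORD.sub("", first, count=1)
    return first + newline + rest


# LOCUS line from the post, rebuilt field by field so the widths are explicit
# (assumed spacing: 7, 16, 4, 5 and 3 spaces between the fields).
locus = ("LOCUS" + " " * 7 + "BBI251310" + " " * 16 + "642 bp" + " " * 4
         + "DNA" + " " * 5 + "linear" + " " * 3 + "PLN 29-MAR-2001")
record = locus + "\nDEFINITION  Bryum billarderi chloroplast partial rps4 gene\n"

print(topology_offset(locus))                  # 55 when counting from 0
print(strip_topology(record).splitlines()[0])  # LOCUS line without 'linear'
```

Counting from 0, 'linear' starts at offset 55, so counting from 1 the
character at position 55 is the space just before it, which matches the
position in the error message.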

Thanks for all your work,

Cheers, Cymon

-- 
___________________________________________________

Cymon J. Cox
Research Associate
Department of Biology
Duke University
Durham NC 27708

___________________________________________________