[BioPython] GenBank records again

Andreas Kuntzagk andreas.kuntzagk at mdc-berlin.de
Thu Feb 27 09:17:25 EST 2003


Hi,

> Thank you very much for the GenBank record things. Now I am trying to
> retrieve protein sequences with a file of GenBank ids. My script is the following:
> 
> from Bio import GenBank
> import sys
> 
> file = sys.argv[1]
> fp1 = open(file, 'r+')    #file of gi
> ids = fp1.read()
> 
> lids = ids.split()
> recNum = len(lids)
> 
> protein_ncbi_dict = GenBank.NCBIDictionary(database='protein',
>                         format='gp', parser=GenBank.FeatureParser())
> 
> for i in range(0, recNum):
>     gb_record = protein_ncbi_dict[lids[i]]
>     print '>'+ gb_record.id[0:-2] + '   ' + gb_record.seq.data
> 
> The script works well most of the time, but sometimes it gives an error
> message:
> 
> Traceback (most recent call last):
>   File "getGBRecords.py", line 25, in ?
>     gb_record = protein_ncbi_dict[lids[i]]
>   File "/bio/python2.2/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1563, in __getitem__
>     return self.parser.parse(handle)
>   File "/bio/python2.2/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 268, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "/bio/python2.2/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1255, in feed
>     self._parser.parseFile(handle)
>   File "/bio/python2.2/lib/python2.2/site-packages/Martel/Parser.py", line 338, in parseFile
>     self.parseString(fileobj.read())
>   File "/bio/python2.2/lib/python2.2/site-packages/Martel/Parser.py", line 366, in parseString
>     self._err_handler.fatalError(result)
>   File "/bio/python2.2/lib/python2.2/xml/sax/handler.py", line 38, in fatalError
>     raise exception
> Martel.Parser.ParserPositionException: error parsing at or beyond character 14
> 
> 
> What is the reason for the problem? It seems that the problem is in the
> parser part, but I just don't know why.  Can anybody help?

It would probably help if you could post the ids where this happens.
You could also use
parser=GenBank.FeatureParser(debug=2)
This gives some information about where the parser chokes. (It is quite noisy,
though, and not easy to understand.)
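
One way to find the ids where this happens is to wrap each lookup in a
try/except and collect the failures instead of stopping at the first one.
What follows is only a sketch, written in current Python syntax rather than
the Python 2.2 of the quoted script. The names find_failing_ids and
fake_lookup are made up for illustration; fake_lookup is a stand-in for the
real dictionary lookup so that the example runs without network access.

```python
def find_failing_ids(ids, lookup):
    """Try lookup(id) for every id; return (good_ids, [(bad_id, message), ...])."""
    good = []
    bad = []
    for gi in ids:
        try:
            lookup(gi)
        except Exception as exc:
            # Deliberately broad: the goal is to log the id and keep going.
            bad.append((gi, str(exc)))
        else:
            good.append(gi)
    return good, bad


def fake_lookup(gi):
    """Hypothetical stand-in for the real lookup; fails on non-numeric ids."""
    if not gi.isdigit():
        raise ValueError("error parsing at or beyond character 14")
    return "record " + gi


good, bad = find_failing_ids(["123", "abc", "456"], fake_lookup)
for gi, message in bad:
    print(gi, message)
```

With the quoted script, the call would presumably look like
find_failing_ids(lids, protein_ncbi_dict.__getitem__), and the resulting
list of bad ids is what is worth posting to the list.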

I think character 14 means the problem is somewhere near the beginning of the
entry.
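
To see what sits at a given character position, you can save the raw text of
a failing entry and translate the offset into a line and column. This is a
minimal sketch: locate_offset is a hypothetical helper, the sample text is
made up, and it assumes the parser counts characters from 0, which I have not
checked.

```python
def locate_offset(text, offset):
    """Translate a 0-based character offset into 1-based (line, column)."""
    before = text[:offset]
    line = before.count("\n") + 1
    # rfind returns -1 when there is no newline, so the line starts at index 0.
    line_start = before.rfind("\n") + 1
    column = offset - line_start + 1
    return line, column


# Made-up entry text, only to have something to index into.
sample = "LOCUS       EXAMPLE_1   100 aa\nDEFINITION  made-up entry.\n"

line, column = locate_offset(sample, 14)
print(line, column)
print(repr(sample[:30]))
```

An offset that small falls on the first line of the text, so the start of the
raw entry is the first place to look.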

Ciao, Andreas


