[BioPython] GenBank parser used to break recently on rRNA records

Peter biopython at maubp.freeserve.co.uk
Fri Jul 27 14:08:38 UTC 2007


Martin MOKREJŠ wrote:
> Hi,
>   I tried to parse all ESTs and cDNAs from GenBank using a Biopython CVS
> checkout about 3 weeks old, and it choked here:
> 
> Will parse file 'ftp://ftp.ncbi.nlm.nih.gov/genbank/gbhtc12.seq.gz'
> Traceback (most recent call last):
>   File "translate_ESTs.py", line 27, in ?
>     _record = _iterator.next()
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 142, in next
>     return self._parser.parse(self.handle)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 208, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
>     self._feed_first_line(consumer, self.line)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 820, in _feed_first_line
>     assert line[47:54].strip() in ['','DNA','RNA','tRNA','mRNA','uRNA','snRNA','cDNA'], \
> AssertionError: LOCUS line does not contain valid sequence type (DNA, RNA, ...):
> LOCUS       DQ369798                 725 bp    rRNA    linear   HTC 14-JUN-2007
> 
>   However, the code has been revamped as I see in current CVS, so this is
> just for your information. I can parse the file with current code. ;-)
> Martin

It looks like the NCBI have introduced another sequence type to their
databases, 'rRNA' in this case. I think this validates the recent change,
under which the parser now accepts any sequence type with 'RNA' or 'DNA'
in the string :)
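
For illustration, here is a minimal sketch of the difference between the two
checks. This is not the actual Biopython code: the function names are
hypothetical, the fixed list and the [47:54] slice are copied from the
traceback above, and treating a blank field as valid in the relaxed check is
an assumption carried over from the old list.

```python
# Sketch only: compares the old fixed-list check with a relaxed
# substring check. Function names are hypothetical, not Biopython API.

# The list of accepted types, as it appears in the traceback.
OLD_VALID_TYPES = ['', 'DNA', 'RNA', 'tRNA', 'mRNA', 'uRNA', 'snRNA', 'cDNA']

# Rebuild the failing LOCUS line field by field so the column
# positions are explicit (sequence type starts at index 47).
LOCUS_LINE = (
    "LOCUS       DQ369798".ljust(37)
    + "725 bp".ljust(10)
    + "rRNA".ljust(8)
    + "linear   HTC 14-JUN-2007"
)


def sequence_type(locus_line):
    """Extract the sequence type using the same slice as the traceback."""
    return locus_line[47:54].strip()


def old_check(seq_type):
    """Old behaviour: only types in the fixed list are accepted."""
    return seq_type in OLD_VALID_TYPES


def relaxed_check(seq_type):
    """Relaxed behaviour: anything containing 'DNA' or 'RNA' is accepted.

    Accepting a blank field is an assumption, kept from the old list.
    """
    return seq_type == '' or 'DNA' in seq_type or 'RNA' in seq_type


if __name__ == "__main__":
    seq_type = sequence_type(LOCUS_LINE)
    print("sequence type:", repr(seq_type))
    print("old check passes:", old_check(seq_type))
    print("relaxed check passes:", relaxed_check(seq_type))
```

With the relaxed check, a future type such as 'rRNA' passes without the list
having to be updated, while a field with neither 'DNA' nor 'RNA' in it is
still rejected.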

Peter


