[BioPython] GenBank parser used to break recently on rRNA records
Peter
biopython at maubp.freeserve.co.uk
Fri Jul 27 14:08:38 UTC 2007
Martin MOKREJŠ wrote:
> Hi,
> I tried to parse all ESTs and cDNAs from GenBank using a Biopython CVS
> checkout about 3 weeks old, and it choked here:
>
> Will parse file 'ftp://ftp.ncbi.nlm.nih.gov/genbank/gbhtc12.seq.gz'
> Traceback (most recent call last):
>   File "translate_ESTs.py", line 27, in ?
>     _record = _iterator.next()
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 142, in next
>     return self._parser.parse(self.handle)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 208, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
>     self._feed_first_line(consumer, self.line)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 820, in _feed_first_line
>     assert line[47:54].strip() in ['','DNA','RNA','tRNA','mRNA','uRNA','snRNA','cDNA'], \
> AssertionError: LOCUS line does not contain valid sequence type (DNA, RNA, ...):
> LOCUS DQ369798 725 bp rRNA linear HTC 14-JUN-2007
>
> However, the code has been revamped as I see in current CVS, so this is
> just for your information. I can parse the file with current code. ;-)
> Martin
It looks like the NCBI have introduced another sequence type to their
databases, 'rRNA' in this case. I think this validates the recent change
to the parser, which now accepts any sequence type containing 'RNA' or
'DNA' :)
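
To illustrate the difference, here is a rough sketch of the two checks.
This is not the actual Scanner.py code: the function names are made up
for this example, only the list of types is copied from the traceback
above, and treating a blank field as valid in the relaxed check is my
assumption (carried over from the old list).

```python
# Sketch only: old_check and relaxed_check are hypothetical names,
# not functions from Bio.GenBank.

# Fixed list of sequence types, as quoted in the traceback.
OLD_TYPES = ['', 'DNA', 'RNA', 'tRNA', 'mRNA', 'uRNA', 'snRNA', 'cDNA']


def old_check(seq_type):
    """Accept only the sequence types in the fixed list."""
    return seq_type.strip() in OLD_TYPES


def relaxed_check(seq_type):
    """Accept any type containing 'DNA' or 'RNA'.

    A blank field is also accepted (assumption, mirroring the old list).
    """
    seq_type = seq_type.strip()
    return seq_type == '' or 'DNA' in seq_type or 'RNA' in seq_type


for seq_type in ['DNA', 'mRNA', 'rRNA', 'protein']:
    print("%-8s old=%s relaxed=%s"
          % (seq_type, old_check(seq_type), relaxed_check(seq_type)))
```

With the fixed list, every new type the NCBI introduces needs a code
change; the substring test should keep working for any future
'xRNA'-style label while still rejecting things like 'protein'.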
Peter