[Biopython] Error while parsing gbk file
ning luwen
bioinformaticsing at gmail.com
Thu Jul 19 23:56:51 EDT 2012
Hi Bow,
Thank you for your reply. The patch by Lenna fixes the error that was
interrupting the parse.
P.S. These GenBank files were recently downloaded from
ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/H_sapiens/ (with the extension
.gbs.gz), and the file containing the invalid GenBank annotation is
ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/H_sapiens/CHR_02/hs_ref_GRCh37.p5_chr2.gbs.gz
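To locate such annotations in a downloaded file without going through the parser, the gzip-compressed flat file can be scanned line by line. The following is a minimal standard-library sketch; `find_bad_ranges` and `BAD_RANGE` are hypothetical names, and the pattern only covers the form reported in this thread (a `^` inside the second endpoint of a `..` range).

```python
import gzip
import re

# Hypothetical pattern for the form reported in this thread: a ".." range
# whose second endpoint contains "^", e.g. "68451760..68452073^68452074".
BAD_RANGE = re.compile(r"\.\.[<>]?\d+\^\d+")


def find_bad_ranges(path):
    """Yield (line_number, stripped_line) for lines with a '^' range endpoint.

    Reads a gzip-compressed GenBank flat file (such as *.gbs.gz) as text,
    one line at a time, so the whole chromosome file is never held in memory.
    """
    with gzip.open(path, "rt") as handle:
        for number, line in enumerate(handle, start=1):
            if BAD_RANGE.search(line):
                yield number, line.strip()
```

In Python 3, a handle opened the same way (`gzip.open(path, "rt")`) can also be passed to `Bio.SeqIO.parse(handle, "genbank")`; the scan above simply avoids the parser so that it cannot stop at the first bad location.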
On Thu, Jul 19, 2012 at 4:50 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Ning,
>
> Thanks for reporting the error. A similar issue has been reported in
> the bug tracker here: https://redmine.open-bio.org/issues/3175 (it
> also looks like the same coordinate). It may, however, be an invalid
> GenBank coordinate produced by NCBI.
>
> Which chromosome does this coordinate come from? Is it the latest draft?
>
> cheers,
> Bow
>
>
> On Thu, Jul 19, 2012 at 5:36 AM, ning luwen <bioinformaticsing at gmail.com> wrote:
>> Hi everyone,
>>
>> I encountered an error when parsing a GenBank (.gbk) file.
>>
>> The error message is as follows:
>>
>> Traceback (most recent call last):
>>   File "stat_refseq_gbs.py", line 10, in <module>
>>     for seq in f:
>>   File "/media/disk2/bio/bin/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 537, in parse
>>     for r in i:
>>   File "/media/disk2/bio/bin/lib/python2.7/site-packages/Bio/GenBank/Scanner.py", line 445, in parse_records
>>     record = self.parse(handle, do_features)
>>   File "/media/disk2/bio/bin/lib/python2.7/site-packages/Bio/GenBank/Scanner.py", line 428, in parse
>>     if self.feed(handle, consumer, do_features):
>>   File "/media/disk2/bio/bin/lib/python2.7/site-packages/Bio/GenBank/Scanner.py", line 400, in feed
>>     self._feed_feature_table(consumer, self.parse_features(skip=False))
>>   File "/media/disk2/bio/bin/lib/python2.7/site-packages/Bio/GenBank/Scanner.py", line 350, in _feed_feature_table
>>     consumer.location(location_string)
>>   File "/media/disk2/bio/bin/lib/python2.7/site-packages/Bio/GenBank/__init__.py", line 970, in location
>>     int(e),
>> ValueError: invalid literal for int() with base 10: '68452073^68452074'
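The last frame shows where the parse stops: a range endpoint string is handed to `int()`, and `'68452073^68452074'` is not an integer literal. The sketch below is a deliberately simplified illustration of that step, not Biopython's actual location parser; `parse_simple_range` is a hypothetical helper that handles only a single `a..b` range, optionally wrapped in `complement(...)`.

```python
def parse_simple_range(location):
    """Return the two 1-based endpoints of 'a..b' as integers.

    Hypothetical, simplified helper: strips one complement(...) wrapper and
    any '<' / '>' partial markers, then converts each endpoint with int().
    An endpoint such as '68452073^68452074' makes int() raise ValueError.
    """
    if location.startswith("complement(") and location.endswith(")"):
        location = location[len("complement("):-1]
    start_text, end_text = location.split("..")
    start = int(start_text.lstrip("<>"))
    end = int(end_text.lstrip("<>"))
    return start, end
```

A well-formed range such as `complement(68451760..68452073)` converts cleanly, while the reported location reproduces the same `ValueError` message as the traceback.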
>>
>> The file being parsed is ref_GRCh37.p5 and the Biopython version is 1.60.
>> The lines causing the error are probably:
>>
>> V_segment complement(68451760..68452073^68452074)
>> CDS complement(<68451760..68452072^68452073)
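In the INSDC feature-table syntax, `x..y` denotes a span of bases and `x^y` a site between two adjacent bases; a `^` is not expected as an endpoint of a `..` span, which is why both lines above look malformed. A minimal sketch of that distinction, assuming a linear sequence and simple locations only (no `join(...)`, `order(...)` or circular `end^1` sites); `classify_span` and `strip_complement` are hypothetical names, not Biopython API.

```python
import re

# Span with 1-based endpoints; "<" or ">" marks a partial (fuzzy) end.
RANGE = re.compile(r"^[<>]?\d+\.\.[<>]?\d+$")
# Site between two bases, e.g. "68452073^68452074".
BETWEEN = re.compile(r"^(\d+)\^(\d+)$")


def strip_complement(location):
    """Remove a single complement(...) wrapper, if present."""
    if location.startswith("complement(") and location.endswith(")"):
        return location[len("complement("):-1]
    return location


def classify_span(location):
    """Return 'range', 'between', 'invalid' or 'unknown' for a simple location."""
    span = strip_complement(location)
    if RANGE.match(span):
        return "range"
    match = BETWEEN.match(span)
    if match:
        left, right = int(match.group(1)), int(match.group(2))
        # Assumes a linear sequence: the two bases must be adjacent.
        return "between" if right == left + 1 else "invalid"
    if ".." in span and "^" in span:
        # A "^" inside a span endpoint, as in the two lines reported above.
        return "invalid"
    return "unknown"
```

Anything outside these simple shapes is reported as `"unknown"` rather than guessed at.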
>>
>> --
>> regards,
>> luwen ning
>> _______________________________________________
>> Biopython mailing list - Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
--
regards,
luwen ning