[Biopython] Error while parsing gbk file

ning luwen bioinformaticsing at gmail.com
Thu Jul 19 23:56:51 EDT 2012


Hi Bow,

      Thank you for your reply. A patch by Lenna stops the parse from
being interrupted.

      P.S. These gbk files were recently downloaded from
ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/H_sapiens/ (with the extension
gbs.gz), and the file containing the "invalid GenBank annotation" is
ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/H_sapiens/CHR_02/hs_ref_GRCh37.p5_chr2.gbs.gz
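
For anyone who wants to find such lines before parsing, here is a minimal
sketch (Python 3, standard library only). The function names and the regular
expression are my own illustration and are not part of Biopython. It flags
single-line feature locations that combine a range ("..") with a
between-bases "^" site, which is the form that appears in the traceback
quoted below. Locations wrapped over several lines are not handled.

```python
import gzip
import re

# Matches a feature-table line whose location mixes a range ("..") with a
# between-bases site ("^"), e.g. complement(68451760..68452073^68452074).
# Feature keys start in column 6, so exactly five leading spaces are required;
# qualifier lines (indented further) are skipped.
SUSPECT = re.compile(r"^ {5}(\S+)\s+(\S*\d+\.\.[<>]?\d+\^\d+\S*)\s*$")


def find_suspect_locations(lines):
    """Yield (line_number, feature_key, location) for suspicious locations."""
    for number, line in enumerate(lines, start=1):
        match = SUSPECT.match(line)
        if match:
            yield number, match.group(1), match.group(2)


def scan_file(path):
    """Scan a plain or gzip-compressed GenBank file (the path is up to you)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(find_suspect_locations(handle))
```

Calling scan_file() on the downloaded .gbs.gz file should list the line
numbers worth inspecting by hand. Whether each hit is really invalid is a
separate question for NCBI.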

On Thu, Jul 19, 2012 at 4:50 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Ning,
>
> Thanks for reporting the error. A similar issue has been reported in
> the bug tracker here: https://redmine.open-bio.org/issues/3175 (it
> also looks like it's the same coordinate). It seems that this could be
> an invalid GenBank coordinate made by NCBI, though.
>
> Which chromosome is this coordinate coming from? Is it the latest draft?
>
> cheers,
> Bow
>
>
> On Thu, Jul 19, 2012 at 5:36 AM, ning luwen <bioinformaticsing at gmail.com> wrote:
>> Hi everyone,
>>
>> I encountered an error when parsing a gbk file.
>>
>> The error message is as follows:
>>
>> Traceback (most recent call last):
>>   File "stat_refseq_gbs.py", line 10, in <module>
>>     for seq in f:
>>   File "/media/disk2/bio/bin/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 537, in parse
>>     for r in i:
>>   File "/media/disk2/bio/bin/lib/python2.7/site-packages/Bio/GenBank/Scanner.py", line 445, in parse_records
>>     record = self.parse(handle, do_features)
>>   File "/media/disk2/bio/bin/lib/python2.7/site-packages/Bio/GenBank/Scanner.py", line 428, in parse
>>     if self.feed(handle, consumer, do_features):
>>   File "/media/disk2/bio/bin/lib/python2.7/site-packages/Bio/GenBank/Scanner.py", line 400, in feed
>>     self._feed_feature_table(consumer, self.parse_features(skip=False))
>>   File "/media/disk2/bio/bin/lib/python2.7/site-packages/Bio/GenBank/Scanner.py", line 350, in _feed_feature_table
>>     consumer.location(location_string)
>>   File "/media/disk2/bio/bin/lib/python2.7/site-packages/Bio/GenBank/__init__.py", line 970, in location
>>     int(e),
>> ValueError: invalid literal for int() with base 10: '68452073^68452074'
>>
>> The file parsed is ref_GRCh37.p5, the Biopython version is 1.60, and
>> the lines causing the error may be:
>>
>>      V_segment       complement(68451760..68452073^68452074)
>>      CDS             complement(<68451760..68452072^68452073)
>>
>> --
>> regards,
>> luwen ning
>> _______________________________________________
>> Biopython mailing list  -  Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython



-- 
regards,
luwen ning

