[Biopython-dev] Notification: incoming/30
biopython-bugs at bioperl.org
Fri May 4 12:58:56 EDT 2001
JitterBug notification
chapmanb changed notes
Message summary for PR#30
From: dimlight at lgci.co.kr
Subject: PRIVATE: About Genbank Iterator
Date: Thu, 3 May 2001 20:18:39 -0400
0 replies 0 followups
Notes: The problem was GenBank record NM_006141.1, which lacks a REFERENCE section.
The parser has been fixed to handle this case; the fix is in CVS.
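To see how many other records in a flat file share this property, the file can be scanned without the parser. The following is only a rough sketch in present-day Python 3, not the fix that went into CVS. It assumes that every record starts with a LOCUS line, ends with a "//" line, and that REFERENCE appears at the start of a line when present. The function name records_missing_reference is made up for this illustration.

```python
import io


def records_missing_reference(handle):
    """Yield (record_number, locus_line) for records with no REFERENCE line.

    Assumes each record starts with a LOCUS line and ends with a "//" line.
    Record numbers are 1-based and count LOCUS lines.
    """
    index = 0
    locus = None
    has_reference = False
    in_record = False
    for line in handle:
        if line.startswith("LOCUS"):
            index += 1
            locus = line.rstrip()
            has_reference = False
            in_record = True
        elif line.startswith("REFERENCE"):
            has_reference = True
        elif line.startswith("//") and in_record:
            if not has_reference:
                yield index, locus
            in_record = False


# Tiny in-memory example; the two records here are invented, not real entries.
sample = (
    "LOCUS       WITHREF 10 bp\n"
    "REFERENCE   1\n"
    "ORIGIN\n"
    "        1 acgtacgtac\n"
    "//\n"
    "LOCUS       NOREF 10 bp\n"
    "ORIGIN\n"
    "        1 acgtacgtac\n"
    "//\n"
)
for number, locus_line in records_missing_reference(io.StringIO(sample)):
    print(number, locus_line)
```

To run it on the real data, pass an open file handle for hs.gbff instead of the in-memory sample.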
====> ORIGINAL MESSAGE FOLLOWS <====
>From dimlight at lgci.co.kr Thu May 3 20:18:39 2001
Received: from localhost (localhost [127.0.0.1])
by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f440Ic208720
for <biopython-bugs at pw600a.bioperl.org>; Thu, 3 May 2001 20:18:39 -0400
Date: Thu, 3 May 2001 20:18:39 -0400
Message-Id: <200105040018.f440Ic208720 at pw600a.bioperl.org>
From: dimlight at lgci.co.kr
To: biopython-bugs at bioperl.org
Subject: PRIVATE: About Genbank Iterator
Full_Name: Wankyu Kim
Module: GenBank,SeqFeature
Version: biopython-1.00a1
OS: win98
Submission from: cache14.bora.net (210.120.192.31)
I tried parsing a GenBank-formatted file and simply printing every element to the screen.
I downloaded the RefSeq flat file in GenBank format from the following site:
ftp://ncbi.nlm.nih.gov/refseq/H_sapiens/mRNA_Prot/hs.gbff.gz
After unzipping the hs.gbff.gz file, I tried parsing every element of each RefSeq
record.
It seemed to work very well, and I could see the parsed elements scrolling by
on and on...
But on parsing the 5287th record, I got the following error message.
Traceback (innermost last):
  File "C:\Python20\genbank_element.py", line 11, in ?
    cur_record = gb_iterator.next()
  File "c:\python20\Bio\GenBank\__init__.py", line 156, in next
    return self._parser.parse(File.StringHandle(data))
  File "c:\python20\Bio\GenBank\__init__.py", line 233, in parse
    self._scanner.feed(handle, self._consumer)
  File "c:\python20\Bio\GenBank\__init__.py", line 1004, in feed
    self._parser.parseFile(handle)
  File "c:\python20\Martel\Parser.py", line 206, in parseFile
    self.parseString(fileobj.read())
  File "c:\python20\Martel\Parser.py", line 234, in parseString
    self._err_handler.fatalError(result)
  File "c:\python20\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
ParserPositionException: error parsing at or beyond character 446
I had similar errors on RedHat 6.2 too.
Please cut & paste my code and test it. It will take hours to test.
< Code >
from Bio import GenBank
from Bio import SeqFeature

gb_file = "hs.gbff"
gb_handle = open(gb_file, 'r')
feature_parser = GenBank.FeatureParser()
gb_iterator = GenBank.Iterator(gb_handle, feature_parser)

k = 0
while 1:
    cur_record = gb_iterator.next()
    # Stop at the end of the file, before counting or printing anything.
    if cur_record is None:
        break
    k = k + 1
    print
    print "record no", k
    print
    print "cur_record.seq:", cur_record.seq.tostring()
    print
    print "cur_record.id", cur_record.id
    print
    print "cur_record.name", cur_record.name
    print
    print "cur_record.description", cur_record.description
    print
    print "cur_record.annotations"
    print "gi : ", cur_record.annotations['gi']
    print "organism : ", cur_record.annotations['organism']
    print "taxonomy : ", cur_record.annotations['taxonomy'][:]
    print "keywords : ", cur_record.annotations['keywords']
    print "data_file_division : ", cur_record.annotations['data_file_division']
    print "date : ", cur_record.annotations['date']

    # Print every literature reference attached to the record.
    for ref in cur_record.annotations['references']:
        print ref.journal
        print ref.title
        print ref.authors
        print ref.medline_id
        print ref.pubmed_id
        print ref.comment

    # Print every feature with its location and qualifiers.
    print len(cur_record.features)
    for feature in cur_record.features:
        print "type:", '\t\t', feature.type
        print "location:", '\t', feature.location
        for key in feature.qualifiers.keys():
            print key, '\t', feature.qualifiers[key]
    print
    print

print "Congratulations!!! You've gone through the RefSeq file"
print
print
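
Because the script takes hours to reach the failing record, it may help to cut that one record out of the flat file and feed only it to the parser. What follows is only a sketch in present-day Python 3 (the script above is Python 2). It assumes that every record ends with a line beginning with "//" and that record numbers match the counter k in the script. The helper name nth_record is made up for this illustration.

```python
import io


def nth_record(handle, n):
    """Return the text of the n-th record (1-based), or None if there are fewer.

    Assumes each record is terminated by a line beginning with "//".
    Any header text before the first record is returned as part of record 1.
    """
    count = 0
    lines = []
    for line in handle:
        lines.append(line)
        if line.startswith("//"):
            count += 1
            if count == n:
                return "".join(lines)
            lines = []
    return None


# Tiny in-memory example with invented records.
sample = "LOCUS       ONE\n//\nLOCUS       TWO\n//\n"
print(nth_record(io.StringIO(sample), 2))
```

For the case reported here, the call would be nth_record(open("hs.gbff"), 5287). The returned text can be written to a small file and passed to the same GenBank.Iterator loop, so the error shows up in seconds rather than hours.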