[BioPython] Updates to GenBank stuff
Brad Chapman
chapmanb@arches.uga.edu
Sat, 19 May 2001 16:20:19 -0400
Hey all;
I've been working on the GenBank parser stuff these past few days and
thought I would let everyone know what's new:
* Peter wrote in, saying that it would be useful to be able to output
GenBank as well. So, in the spirit of the Fasta parser, you can now
output a GenBank Record class object in GenBank flat file format with
'print genbank_record' or 'str(genbank_record)'. For example,
assuming you have a GenBank file called 'my_file.gb', you can parse a
record and write it back out with:
>>> from Bio import GenBank
>>> parser = GenBank.RecordParser()
>>> handle = open('my_file.gb', 'r')
>>> iterator = GenBank.Iterator(handle, parser)
>>> cur_record = iterator.next()
>>> repr(cur_record)
'<Bio.GenBank.Record.Record instance at 0x1025bbdc>'
>>> print cur_record
LOCUS HUGLUT1 741 bp DNA PRI 16-NOV-1994
DEFINITION Human fructose transporter (GLUT5) gene, promoter and exon 1.
ACCESSION U05344
...
In most cases, the GenBank file you output will be identical to the
original file. In some cases the whitespace may differ, but I'm
working to minimize these cases as much as possible.
* I changed how the GenBank parser handles whitespace in some cases,
to try to make it "smarter." These changes include keeping the
newlines in COMMENT fields (where newlines tend to be very
significant) and removing the extra spaces from the values of
'/translation=' keys. In general these fixes should make the parser
easier to use and have it do the "right" thing more often.
Hopefully :-).
* A bunch of bugs got shaken out while working on this, and they have
been fixed.
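For anyone curious what the two whitespace rules amount to, here is a
rough sketch of the idea in plain Python. It is not the parser's
actual code: clean_translation and clean_comment are hypothetical
helpers for illustration only, and clean_comment assumes the usual
GenBank layout, where the keyword (or blanks) fills columns 1-12 and
the data starts at column 13.

```python
import re


def clean_translation(value):
    """Join a wrapped /translation= value into one unbroken string.

    Hypothetical helper: a protein sequence has no meaningful spaces,
    so every run of whitespace (newlines plus the indentation of the
    continuation lines) is dropped.
    """
    return re.sub(r"\s+", "", value)


def clean_comment(raw_lines):
    """Join the raw lines of a COMMENT field, keeping the line breaks.

    Hypothetical helper: assumes each raw line has the keyword or blank
    padding in columns 1-12, so that prefix and any trailing blanks are
    removed, and the lines are rejoined with newlines instead of spaces.
    """
    return "\n".join(line[12:].rstrip() for line in raw_lines)


# A translation wrapped onto a continuation line collapses to one string.
print(clean_translation("MKVLLA\n                     QRSTVW"))

# Two COMMENT lines stay on two lines rather than being run together.
comment = ["COMMENT".ljust(12) + "First note.",
           " " * 12 + "Second note."]
print(clean_comment(comment))
```

Joining the COMMENT lines with spaces instead would give "First note.
Second note." on a single line, which is the kind of lost structure
the newline rule is meant to avoid.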
If people would be willing to take the newest stuff for a test drive
and report any bugs, problems, suggestions, etc., I would be very
appreciative. All of the new code is available in CVS, which you can
get anonymously following the instructions at:
http://cvs.biopython.org
Thanks in advance!
Brad