[Biopython] Deprecate Bio.GenBank.Record based GenBank parser?

Peter Cock p.j.a.cock at googlemail.com
Wed Oct 3 14:30:42 UTC 2018


Hello all,

Am I right in thinking almost everyone working with GenBank
or EMBL files in Biopython does so via Bio.SeqIO these days?

Underneath, this calls the scanner/consumer parser defined in
Bio.GenBank, where the scanner code breaks up the file into
logical bits which are passed to a consumer which turns them
into a Biopython data structure. For Bio.SeqIO, we build up a
SeqRecord object, but there is an alternative consumer which
builds up Bio.GenBank.Record objects instead.
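To make the scanner/consumer split concrete, here is a toy Python sketch. It is not Biopython's actual code. The class and method names (ToyScanner, RecordStyleConsumer, SeqRecordStyleConsumer, locus, feature, result) are hypothetical, and the "scanner" only understands the LOCUS line and feature-key lines of a simplified record. The point is that one scanner can drive two different consumers, one keeping locations as raw strings and one interpreting them.

```python
# Toy illustration of the scanner/consumer pattern. NOT Biopython's code:
# all class and method names below are hypothetical.

class ToyScanner:
    """Break a tiny GenBank-like text into events and feed them to a consumer."""

    def feed(self, text, consumer):
        for line in text.splitlines():
            if line.startswith("LOCUS"):
                consumer.locus(line.split()[1])
            elif line.startswith("     ") and line[5:21].strip():
                # Feature key sits in columns 6-20, the location starts at column 22.
                # Qualifier lines (21 leading spaces) have a blank key and are skipped.
                consumer.feature(line[5:21].strip(), line[21:].strip())
        return consumer.result()


class RecordStyleConsumer:
    """Stay close to the file: feature locations are kept as plain strings."""

    def __init__(self):
        self.data = {"locus": None, "features": []}

    def locus(self, name):
        self.data["locus"] = name

    def feature(self, key, location):
        self.data["features"].append((key, location))

    def result(self):
        return self.data


class SeqRecordStyleConsumer:
    """Interpret the data: a simple 'start..end' becomes a 0-based (start, end) pair."""

    def __init__(self):
        self.name = None
        self.features = []

    def locus(self, name):
        self.name = name

    def feature(self, key, location):
        start, end = location.split("..")  # only simple ranges are handled here
        self.features.append((key, int(start) - 1, int(end)))

    def result(self):
        return {"name": self.name, "features": self.features}


SAMPLE = "\n".join([
    "LOCUS       TOY0001  12 bp  DNA",
    "FEATURES             Location/Qualifiers",
    "     " + "gene".ljust(16) + "1..12",
    " " * 21 + '/gene="toyA"',
])

print(ToyScanner().feed(SAMPLE, RecordStyleConsumer()))
print(ToyScanner().feed(SAMPLE, SeqRecordStyleConsumer()))
```

The SeqRecord-style consumer above only copes with plain start..end ranges; real GenBank locations (join, complement, fuzzy ends) need a proper location parser, which is part of what the SeqIO route provides.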

If you use the Bio.GenBank.read(...) or Bio.GenBank.parse(...)
functions, you will get Bio.GenBank.Record objects, which are
quite a direct representation of the underlying data structure,
and str(...) will give you a GenBank formatted string. Here,
for example, the feature locations are left as plain strings.

Does anyone use the Bio.GenBank.Record based GenBank
parser? Could we deprecate it (in favour of only using the
GenBank parser via Bio.SeqIO)? This would mean in a few
releases' time, we could remove the old record class and
potentially then simplify the GenBank/EMBL parsing.

Peter

