[Biopython-dev] test_AlignIO to python 3

Peter Cock p.j.a.cock at googlemail.com
Mon Jul 5 13:18:12 UTC 2010


Tiago wrote:
>>> 3. The big one: No sgmllib in p3.

Peter wrote:
>> A lot of the things using sgmllib are already deprecated (e.g.
>> Bio.NetCatch and Bio.Prosite). I think that leaves just Bio.UniGene
>> and Bio.InterPro - which isn't such a big issue.

Michiel wrote:
> In Bio.UniGene and Bio.InterPro, sgmllib is used for parsing HTML pages,
> which tends to break easily anyway because the HTML format keeps
> changing. As a case in point, the parser in Bio.InterPro doesn't seem to
> work with current HTML pages from InterPro.

So that one is ready for deprecation (assuming no one steps forward
to update it).

> I haven't tried Bio.UniGene, but Bio.UniGene can also parse UniGene
> flat files so I doubt that there is a real need to parse UniGene html files.

Again, perhaps this HTML parser can be deprecated.

> In test_AlignIO, the import for sgmllib is coming from the SGMLStripper
> class in Bio.File, imported from Bio.ParserSupport, imported from
> Bio.GenBank, imported from Bio.SeqIO. But Bio.SeqIO doesn't
> actually use SGMLStripper, which has been deprecated.

That's been fixed by making Bio.File skip the deprecated SGMLStripper
code if sgmllib isn't available.
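
For anyone unfamiliar with the pattern, here is a minimal sketch of an
optional-import guard of this kind. It is not the actual Bio.File code:
the names HAVE_SGMLLIB, TagStripper and strip_tags are hypothetical and
only illustrate the idea of defining the SGML-dependent code when the
import succeeds.

```python
# Illustrative sketch only; names here are hypothetical, not Bio.File's.
try:
    import sgmllib  # in the Python 2 standard library, removed in Python 3
except ImportError:
    sgmllib = None

HAVE_SGMLLIB = sgmllib is not None

if HAVE_SGMLLIB:
    # Only define the SGML-dependent class when the module can be imported.
    class TagStripper(sgmllib.SGMLParser):
        def __init__(self):
            sgmllib.SGMLParser.__init__(self)
            self.data = []

        def handle_data(self, data):
            # Collect the text found between tags.
            self.data.append(data)


def strip_tags(text):
    """Return text with markup removed, or fail clearly without sgmllib."""
    if not HAVE_SGMLLIB:
        raise ImportError("sgmllib is not available (removed in Python 3)")
    parser = TagStripper()
    parser.feed(text)
    parser.close()
    return "".join(parser.data)
```

With this arrangement the module itself still imports cleanly on
Python 3, and only callers of the SGML-dependent function see an error.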

> So I suggest that instead of fixing the modules that depend on sgmllib,
> we replace the relevant pieces of code by a NotImplementedError, and
> see if anybody complains.

How about just deprecation instead?
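
To make the difference concrete, here is a rough sketch of what
deprecation looks like compared with raising NotImplementedError: the
code keeps working but warns the caller. The function parse_html_page is
a hypothetical stand-in, not a real Biopython function, and I'm assuming
the standard library DeprecationWarning for the example.

```python
import io
import warnings


def parse_html_page(handle):
    """Hypothetical stand-in for an HTML parser that is being retired."""
    warnings.warn(
        "This HTML parser is deprecated and may be removed in a "
        "future release.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    # The old behaviour is kept, so existing scripts continue to run.
    return handle.read()


# Usage: record the warning so it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    content = parse_html_page(io.StringIO("<html></html>"))
```

Existing users get time to complain or migrate before anything is
actually removed, whereas a NotImplementedError breaks their scripts
immediately.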

> For the longer term, it would be nice if the code in Bio.GenBank
> could be moved to Bio.SeqIO, and made independent of
> Bio.ParserSupport.

That makes sense, except that Bio.GenBank is still useful for
"low level" work (not using a SeqRecord), for example with WGS files.
Long term, I certainly think we could drop Bio.GenBank and have a
simplified, SeqRecord-only parser in Bio.SeqIO. My recent location
parsing work is a step in that direction.

Peter


