[Biopython-dev] test_AlignIO to python 3
Peter Cock
p.j.a.cock at googlemail.com
Mon Jul 5 13:18:12 UTC 2010
Tiago wrote:
>>> 3. The big one: No sgmllib in Python 3.
Peter wrote:
>> A lot of the things using sgmllib are already deprecated (e.g.
>> Bio.NetCatch and Bio.Prosite). I think that leaves just Bio.UniGene
>> and Bio.InterPro - which isn't such a big issue.
Michiel wrote:
> In Bio.UniGene and Bio.InterPro, sgmllib is used for parsing HTML pages,
> which tends to break easily anyway because the HTML format keeps
> changing. As a case in point, the parser in Bio.InterPro doesn't seem to
> work with current HTML pages from InterPro.
So that one is ready for deprecation (assuming no one steps forward
to update it).
> I haven't tried Bio.UniGene, but Bio.UniGene can also parse UniGene
> flat files, so I doubt that there is a real need to parse UniGene HTML files.
Again, perhaps this HTML parser can be deprecated.
> In test_AlignIO, the import for sgmllib is coming from the SGMLStripper
> class in Bio.File, imported from Bio.ParserSupport, imported from
> Bio.GenBank, imported from Bio.SeqIO. But Bio.SeqIO doesn't
> actually use SGMLStripper, which has been deprecated.
That's been fixed by making Bio.File ignore the deprecated SGML
stuff if sgmllib isn't available.
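For anyone wondering what that kind of guard looks like, here is a minimal sketch. It is not the actual Bio.File code, and the names `_TagStripper` and `strip_tags` are hypothetical. The import is attempted once, and the SGML-dependent class is only defined when it succeeds:

```python
# Sketch only: the names below are hypothetical, not the real Bio.File contents.
try:
    import sgmllib  # present on Python 2, removed in Python 3
except ImportError:
    sgmllib = None

if sgmllib is not None:
    class _TagStripper(sgmllib.SGMLParser):
        """Collects text content, discarding the markup."""

        def __init__(self):
            sgmllib.SGMLParser.__init__(self)
            self.data = []

        def handle_data(self, data):
            self.data.append(data)


def strip_tags(text):
    """Return text with SGML tags removed (requires sgmllib)."""
    if sgmllib is None:
        raise ImportError("sgmllib is not available (removed in Python 3)")
    parser = _TagStripper()
    parser.feed(text)
    parser.close()
    return "".join(parser.data)
```

With this pattern the module itself still imports cleanly on Python 3, and only code that actually calls the deprecated helper sees an error.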
> So I suggest that instead of fixing the modules that depend on sgmllib,
> we replace the relevant pieces of code by a NotImplementedError, and
> see if anybody complains.
How about just deprecation instead?
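To make the difference concrete, here is a rough sketch of the deprecation route, assuming a hypothetical entry point called `parse_html_page` (not a real Biopython function). The old code keeps working but warns, rather than failing outright as it would with a NotImplementedError:

```python
import warnings


def parse_html_page(handle):
    """Hypothetical stand-in for one of the HTML-parsing entry points."""
    warnings.warn(
        "This HTML parser is deprecated and may be removed in a future "
        "release; please get in touch if you still need it.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return handle.read()  # placeholder for the real parsing work
```

One caveat: from Python 2.7 onwards DeprecationWarning is hidden by default, so users may only see it if they run with warnings enabled (e.g. `python -W default`).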
> For the longer term, it would be nice if the code in Bio.GenBank
> could be moved to Bio.SeqIO, and made independent of
> Bio.ParserSupport.
That makes sense except for the fact that Bio.GenBank is still useful
for "low level" work (not using a SeqRecord), for example WGS files.
Certainly in the long term I think we could drop Bio.GenBank and have a
simplified SeqRecord-only parser in Bio.SeqIO. My recent location
parsing work is a step in that direction.
Peter