[Biopython-dev] rebase

Sat Jul 29 21:20:46 EDT 2000

----- Original Message -----
From: Jeffrey Chang <jchang at SMI.Stanford.EDU>
To: Cayte <katel at worldpath.net>
Cc: <Biopython-dev at biopython.org>; <bioperl-guts at bioperl.org>
Sent: Monday, July 24, 2000 3:28 AM
Subject: Re: [Biopython-dev] rebase

> There's already a class that strips HTML tags in:
> Bio.File.SGMLHandle
>
> It decorates a file handle to HTML data (e.g. a socket to a web page) and
> returns only the non-tag data.  It uses Python's built-in sgmllib library,
> since stripping tags is non-trivial.
>
> There's also a consumer decorator so that you can build consumers that
> don't have to deal with tags:
> Bio.ParserSupport.SGMLStrippingConsumer
>
> Jeff
>
   The consumer decorator doesn't solve the problem, because the problem
occurs in the _Scanner.

   SGMLHandle works, except that the linefeeds are placed in such a way that
there may be no separation between a keyword and data from a previous field.
As an experiment, I hacked handle_data in a copy of File.py and was able to
solve the problem.  But to do it cleanly in production code, I would need to
be able to pass my own parser to SGMLStripper as an optional parameter.  The
alternative would be to subclass both SGMLStripper and SGMLHandle, because
handle_data is deeply buried in these classes.
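
   A minimal sketch of the "optional parser" idea, for illustration only.  It
is not the Bio.File code: the class names TagStripper, SpacedTagStripper and
StrippingHandle are hypothetical, and it assumes a modern Python, where
html.parser.HTMLParser stands in for sgmllib.  Stripping line by line also
assumes that no tag is split across two lines.

```python
import io
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical default stripper: keeps only the non-tag text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called once per run of text found between tags.
        self._chunks.append(data)

    def strip(self, text):
        """Return text with the tags removed."""
        self.reset()          # discard parser state left from the last call
        self._chunks = []
        self.feed(text)
        self.close()          # flush any text the parser is still buffering
        return "".join(self._chunks)


class SpacedTagStripper(TagStripper):
    """Hypothetical variant: keeps text runs from fusing where a tag was."""

    def handle_data(self, data):
        previous = self._chunks[-1] if self._chunks else ""
        # Add a separator only when neither side already supplies whitespace.
        if previous and not previous[-1:].isspace() and not data[:1].isspace():
            self._chunks.append(" ")
        self._chunks.append(data)


class StrippingHandle:
    """Hypothetical handle decorator that takes the stripper as an option."""

    def __init__(self, handle, parser=None):
        self._handle = handle
        # The caller may pass in a stripper; otherwise use the default one.
        self._parser = parser if parser is not None else TagStripper()

    def readline(self):
        line = self._handle.readline()
        return self._parser.strip(line) if line else line


if __name__ == "__main__":
    page = "<b>RECOGNITION</b>GAATTC<br>METHYLATION\n"
    print(repr(StrippingHandle(io.StringIO(page)).readline()))
    print(repr(StrippingHandle(io.StringIO(page), SpacedTagStripper()).readline()))
```

   With the parser passed in this way, only handle_data has to be overridden
in one small subclass, and the handle class itself stays untouched.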

   Isn't it spooky, the way the coding problems we deal with echo the problems
our molecules solve?

                                                            Cayte