[Biopython-dev] ScanProsite

Peter biopython at maubp.freeserve.co.uk
Mon Mar 2 10:26:38 UTC 2009


On Sun, Mar 1, 2009 at 12:17 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> ScanProsite is a web tool to scan protein sequences against the PROSITE
> database (see http://www.expasy.org/tools/scanprosite/). Biopython contains
> code in Bio.Prosite to interact with ScanProsite. However, this code needs to
> be updated, as it does not work with the current ScanProsite web pages:
> Neither accessing ScanProsite nor extracting the hits from the HTML page works.
>
> This problem is relatively easy to solve, since ExPASy nowadays allows
> programmatic access to ScanProsite
> (see http://www.expasy.org/tools/scanprosite/ScanPrositeREST.html). This
> returns the Prosite hits in XML format, which can be parsed easily in Python.
>
> The only issue now is how this should be presented to the user. ...
> ...
> This is more straightforward, but on the other hand people may want to save the
> XML search results in an XML file, and for that purpose we'd need a function that
> does the parsing only.
>
> Any opinions?

I would definitely have two functions, one returning a handle to the
XML, and one for parsing XML from a handle.  This would be more
consistent with Bio.Entrez and other parsers, and more flexible.  For
example, the user can opt to save the XML to disk, and they can use
our parser either on saved files or directly on the handle from the
remote site - plus of course they can use any other XML parser they
may prefer.
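
To make the split concrete, here is a rough sketch of what I have in mind. It is only an illustration under some assumptions: the function names scan and parse, the query parameter names (seq, output), and the <matchset>/<match> element layout in the sample are all made up for the example and not checked against the real ScanProsite service, and base_url is left for the caller to supply.

```python
import io
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def scan(seq, base_url, **params):
    """Submit a query and return a handle to the raw XML, unparsed.

    base_url and the parameter names 'seq' and 'output' are
    assumptions for this sketch, not the confirmed REST interface.
    """
    query = dict(params, seq=seq, output="xml")
    url = base_url + "?" + urllib.parse.urlencode(query)
    return urllib.request.urlopen(url)


def parse(handle):
    """Parse XML from any file-like handle.

    Returns a list of dicts, one per <match> element, mapping each
    child tag to its text. The element names are assumed.
    """
    tree = ET.parse(handle)
    return [
        {child.tag: child.text for child in match}
        for match in tree.getroot().iter("match")
    ]


# The parser does not care where the XML came from: here it reads an
# in-memory stand-in for a results file saved to disk earlier.
sample = io.BytesIO(
    b"<matchset>"
    b"<match><sequence_ac>USERSEQ1</sequence_ac>"
    b"<start>7</start><stop>98</stop></match>"
    b"</matchset>"
)
hits = parse(sample)
```

With this layout, parse(scan(seq, base_url)) covers the one-step case, while anyone who wants the raw XML can read the handle from scan and write it to a file, then run parse on that file later.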

I like your suggestion to have a REST/XML-based module under
Bio.ExPASy, which means we can deprecate the HTML-based Bio.Prosite
module and in the process make the top-level list of modules in
Biopython a bit shorter.  In the long term I think that will help
people find functionality.

Peter


