[Biopython-dev] Accessing ExPASy through Bio.SwissProt / Bio.SeqIO

Peter biopython at maubp.freeserve.co.uk
Wed Dec 5 11:55:45 UTC 2007


On 12/5/07, Michiel De Hoon <mdehoon at c2b2.columbia.edu> wrote:
> > One idea I had been thinking about was adding a new function
> > Bio.SeqIO.fetch(...) or Bio.SeqIO.online_fetch(...) which would act as
> > a proxy to all our supported online sequence databases, and either
> > return a handle to the requested record(s), or perhaps return
> > SeqRecord(s).
>
> I believe that Bio.db has such functionality, but I don't think it is used
> much. Anyway, we currently have too many functions in Biopython to
> access databases rather than too few. So I think we should not add any
> new ones.

Certainly before taking my suggestion seriously we should try and take
stock of where we stand at the moment with respect to online
databases.
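To make the proxy idea concrete, here is a minimal offline sketch of what such a dispatcher might look like. Everything in it is hypothetical: `online_fetch`, the `_FETCHERS` table and the two stub fetchers are illustrative names, not existing Biopython API, and the stubs return canned text instead of contacting a server. The only point is that a proxy can be a thin lookup table mapping a database name to a function that returns a handle.

```python
import io


# Hypothetical per-database fetch functions. Real ones would contact an
# online service (e.g. the way Bio.ExPASy.get_sprot_raw does); these stubs
# return canned text wrapped in a handle so the sketch runs offline.
def _fetch_swissprot(identifier):
    return io.StringIO("ID   %s\n//\n" % identifier)


def _fetch_genbank(identifier):
    return io.StringIO("LOCUS       %s\n//\n" % identifier)


# Table mapping a database name to the function that fetches from it.
_FETCHERS = {
    "swissprot": _fetch_swissprot,
    "genbank": _fetch_genbank,
}


def online_fetch(database, identifier):
    """Hypothetical proxy: dispatch to the right fetcher, return a handle."""
    try:
        fetcher = _FETCHERS[database]
    except KeyError:
        raise ValueError("Unsupported database: %r" % database) from None
    return fetcher(identifier)


# Usage: the caller gets a handle back and decides how to parse it.
handle = online_fetch("swissprot", "O12345")
text = handle.read()
```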

> > Can ExPASyDictionary return anything that get_sprot_raw can't?
> > Otherwise from the user's point of view it's just a coding style issue
> > (dictionary versus function).
>
> ExPASyDictionary is just a wrapper around get_sprot_raw, so get_sprot_raw can
> return any record that ExPASyDictionary can return.
> There are two differences between the two:
> 1) ExPASyDictionary behaves as a dictionary, get_sprot_raw as a function. As
> you write, this is just a coding style issue.
> 2) When creating an ExPASyDictionary, users can pass a parser to parse the
> records before returning them. This is in essence only a coding style issue.
> In particular, do we want:
>   >>> from Bio.SwissProt import SProt
>   >>> sprot_parser = SProt.RecordParser()
>   >>> dictionary = SProt.ExPASyDictionary(parser = sprot_parser)
>   >>> record = dictionary["O12345"]
>   or
>   >>> from Bio.SwissProt import SProt
>   >>> from Bio import ExPASy
>   >>> handle = ExPASy.get_sprot_raw("O12345")
>   >>> record = SProt.parse(handle)

Or do we want to encourage Bio.SeqIO (which happens to call
Bio.SwissProt.SProt internally)?

>>> from Bio import SeqIO
>>> from Bio import ExPASy
>>> handle = ExPASy.get_sprot_raw("O12345")
>>> record = next(SeqIO.parse(handle, "swiss"))

This is the style I prefer (and is very similar to the related
examples I added to the tutorial).  It separates fetching the data (as
a handle) and parsing it (via SeqIO).

> For SeqRecords, in the ExPASyDictionary approach we'd use a different parser,
> in the get_sprot_raw approach we call SeqIO.parse instead of SProt.parse.
> For plain-text output, in the ExPASyDictionary approach we pass no parser,
> and in the get_sprot_raw approach we call read() on the handle directly.
> To get a handle, in the ExPASyDictionary approach we can use StringIO to
> convert the text output to a handle; in the get_sprot_raw approach we don't
> need to do anything.
>
> In my opinion, both 1) and 2) are just coding style issues. Maintaining both
> ExPASyDictionary and get_sprot_raw is a burden for the developers, and causes
> confusion for users. So I suggest we focus on one of these, and deprecate the
> other.
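The conversions between plain text and handles described above need only the standard library. This sketch uses `io.StringIO` (the Python 3 equivalent of the `StringIO` module available at the time) on an illustrative record string, not a real SwissProt entry.

```python
import io

# Plain-text record, e.g. what a dictionary created without a parser
# would return (illustrative data only).
text = "ID   TEST_HUMAN\nAC   O12345;\n//\n"

# Wrapping the string in StringIO gives a file-like handle, so code that
# expects a handle (such as a parser) can consume it line by line.
handle = io.StringIO(text)
lines = [line.rstrip("\n") for line in handle]

# The other direction: calling read() on a handle gives the plain text back.
round_trip = io.StringIO(text).read()
```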

As ExPASyDictionary just wraps get_sprot_raw with a parser
object, the additional overhead is minimal.  The dictionary metaphor
is quite nice, even if you don't actually gain much functionality.
However, setting up the dictionary as it is now (requiring an "old
fashioned" parser object) is fairly fiddly/confusing.

> The ExPASy.get_sprot_raw approach seems closer to how Bio.SeqIO is
> organized, and therefore has my preference.

I agree: if we wanted to deprecate one, I would keep
get_sprot_raw and drop ExPASyDictionary. However, we should try to
have a coherent API for the other online tools as well.

> Two more issues:
> 1) I am not sure why the SwissProt code is kept in a separate SProt submodule
> of Bio.SwissProt. Currently, Bio/SwissProt/__init__.py is empty, so we can
> save ourselves some typing by keeping all the SwissProt code there instead of
> in SProt.py.

Or just encourage using it via Bio.SeqIO (then we can move things
later if wanted).

> 2) A SwissProt.parse function currently doesn't exist. Right now it is a
> three-step process:
>   >>> s_parser = SProt.RecordParser()
>   >>> s_iterator = SProt.Iterator(handle, s_parser)
>   >>> record = s_iterator.next()
>   A SwissProt.parse function would just contain these three steps, or
> perhaps only the first two.

Bio.SeqIO.parse() is very close to this, though.
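The three steps listed above could be wrapped in a single generator. This is a sketch under stated assumptions: `RecordParser`, `iterate_records` and `parse` are hypothetical stand-ins written for illustration, not the SProt classes themselves, and the toy parser only splits each line into its two-letter line code and value. It does rely on the SwissProt flat-file convention that each record ends with a `//` line.

```python
import io


class RecordParser:
    """Hypothetical stand-in for SProt.RecordParser: turn the text of one
    record into a dict mapping two-letter line codes to lists of values."""

    def parse_text(self, text):
        record = {}
        for line in text.splitlines():
            code, value = line[:2], line[5:].strip()
            record.setdefault(code, []).append(value)
        return record


def iterate_records(handle):
    """Yield the raw text of each record; records end with a '//' line.
    Any text after the final '//' is ignored in this sketch."""
    lines = []
    for line in handle:
        if line.startswith("//"):
            yield "".join(lines)
            lines = []
        else:
            lines.append(line)


def parse(handle):
    """Sketch of a SwissProt.parse-style function: create the parser,
    iterate over the records, and yield each parsed record."""
    parser = RecordParser()
    for text in iterate_records(handle):
        yield parser.parse_text(text)


# Usage on two illustrative records held in memory.
handle = io.StringIO(
    "ID   A_HUMAN\nAC   P00001;\n//\nID   B_HUMAN\nAC   P00002;\n//\n"
)
records = list(parse(handle))
```

Because `parse` is a generator, callers can take a single record with `next(...)` or loop over a whole file, which mirrors how Bio.SeqIO.parse is used.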

Peter



