[Biopython-dev] EZRetrieve

Peter biopython-dev at maubp.freeserve.co.uk
Sat Mar 22 11:18:42 UTC 2008


>  I have a proposal, so it could be implemented in the next version (1.46?).
>  Change the output of EZRetrieve.retrieve_single. It currently returns
>  a FASTA formatted sequence. I think it should return a SeqRecord object
>  (if you want this SeqRecord object to be printed or stored as FASTA,
>  just use formatIO).
>  Here are the proposed changes: http://www.pastecode.com.ar/f3baff314
>  I can file this as an enhancement in the bug tracker if you agree.

So there is currently one function, retrieve_single, which can return
a handle but by default extracts and returns a FASTA record as a
string. It does this by calling the parse_single function, which
reads from the handle, parses the HTML file, and extracts just the
FASTA-style text, throwing away the other annotation data (like the
chromosome or range requested).
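
To make that concrete, here is a rough sketch of the kind of extraction
described above. It is not the actual parse_single code. The function
name extract_fasta, the sample page, and the assumption that the record
appears as plain text lines once the tags are stripped are all mine; the
real EZRetrieve HTML may look quite different.

```python
import html
import re


def extract_fasta(page):
    """Return the first FASTA-style record found in an HTML page.

    Hypothetical helper: assumes the record survives as plain text
    lines once the tags are removed and entities are unescaped.
    """
    text = html.unescape(re.sub(r"<[^>]+>", "", page))
    record = []
    for line in (raw.strip() for raw in text.splitlines()):
        if not record:
            # Skip everything until the header line.
            if line.startswith(">"):
                record.append(line)
        elif re.fullmatch(r"[A-Za-z*-]+", line):
            record.append(line)
        else:
            # First blank or non-sequence line ends the record.
            break
    if len(record) < 2:
        raise ValueError("no FASTA record found")
    return "\n".join(record) + "\n"


# Made-up page layout, for illustration only.
SAMPLE_PAGE = """<html><body>
<h2>EZRetrieve result</h2>
<p>Chromosome: 1</p>
<pre>
&gt;BC014651 -200 to 200
ACGTACGTAC
GGTTAACC
</pre>
<p>Done.</p>
</body></html>"""

fasta = extract_fasta(SAMPLE_PAGE)
```

Note that everything outside the record (the made-up "Chromosome" line
here) is simply dropped, which is the loss of annotation mentioned above.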

Here is an example URL, constructed by hand:
http://siriusb.umdnj.edu:18080/EZRetrieve/single_r_run.jsp?org=0&AccType=0&input=BC014651&from=-200&to=200
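
The same URL can be assembled programmatically. This is only a sketch:
the parameter names (org, AccType, input, from, to) are copied from the
URL above, but the function name build_single_url and its argument names
are hypothetical, and I have not checked what the values 0 for org and
AccType actually mean.

```python
from urllib.parse import urlencode

BASE_URL = "http://siriusb.umdnj.edu:18080/EZRetrieve/single_r_run.jsp"


def build_single_url(accession, start, end, org=0, acc_type=0):
    """Build a single-retrieval URL (hypothetical helper).

    The defaults for org and acc_type just mirror the example URL;
    their meaning is not documented here.
    """
    params = [
        ("org", org),
        ("AccType", acc_type),
        ("input", accession),
        ("from", start),
        ("to", end),
    ]
    return BASE_URL + "?" + urlencode(params)


url = build_single_url("BC014651", -200, 200)
```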

Parsing HTML is nasty, especially if the site updates the formatting
every so often.  I suppose just looking for the FASTA sequence is
fairly reliable.  I can see the case for an EZRetrieve HTML to
SeqRecord parser, but I would be tempted to try to parse more of the
annotation as well.
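
As a sketch of what such a parser might hand back, the following turns
FASTA text into a small record that also carries annotation. SimpleRecord
is a stand-in written for this example, not Biopython's SeqRecord class,
and fasta_to_record and the annotation keys are hypothetical names. The
annotation is passed in by the caller here because I am not assuming
anything about where it sits in the HTML.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleRecord:
    """Stand-in for a SeqRecord-like object (not the Biopython class)."""
    id: str
    description: str
    seq: str
    annotations: dict = field(default_factory=dict)


def fasta_to_record(fasta_text, **annotations):
    """Convert one FASTA record to a SimpleRecord (hypothetical helper)."""
    lines = [ln.strip() for ln in fasta_text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    title = lines[0][1:].strip()
    # Use the first word of the title as the identifier.
    identifier = title.split(None, 1)[0] if title else ""
    return SimpleRecord(
        id=identifier,
        description=title,
        seq="".join(lines[1:]),
        annotations=dict(annotations),
    )


record = fasta_to_record(">BC014651 example\nACGT\nTTGA\n", start=-200, end=200)
```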

How many people do you think are using the retrieve_single function?
It would be very annoying for them if its behaviour suddenly changed.
Maybe we can add a new parse function, and call it from
retrieve_single only when the optional argument is set to parse=2?
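
Roughly, the dispatch could look like the sketch below. This is not the
existing retrieve_single code: process_result, _to_fasta and _to_record
are hypothetical names, the argument name parse follows the suggestion
above, and the two parsers are trivial stand-ins that take FASTA text
rather than HTML. The point is only that a false value still returns the
handle, the default still returns a FASTA string, and parse=2 opts in to
the new behaviour.

```python
import io


def _to_fasta(text):
    # Stand-in for the existing extraction step; input is already FASTA here.
    return text.strip() + "\n"


def _to_record(text):
    # Stand-in for the proposed new parser; returns a plain dict.
    header, *seq_lines = text.strip().splitlines()
    return {
        "id": header[1:].split()[0],
        "seq": "".join(s.strip() for s in seq_lines),
    }


def process_result(handle, parse=1):
    """Return the handle, a FASTA string, or a record, depending on parse."""
    if not parse:
        return handle            # raw handle, as before
    text = handle.read()
    if parse == 2:
        return _to_record(text)  # new, opt-in behaviour
    return _to_fasta(text)       # unchanged default


DATA = ">BC014651 demo\nACGT\nGG\n"
as_string = process_result(io.StringIO(DATA))
as_record = process_result(io.StringIO(DATA), parse=2)
```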

Peter


