[Bioperl-l] LWP in Bio::DB::GenBank.pm

Aaron J Mackey ajm6q@virginia.edu
Sun, 26 Nov 2000 12:17:15 -0500 (EST)


On Sat, 25 Nov 2000, Hilmar Lapp wrote:

> Regarding streaming through LWP, it is not really clear to me how to
> turn a callback into a stream of sequences that can be processed by
> client-code one after the other. If you look at the code of LWP, you may
> notice that processing of the stream (reading from the socket) is deeply
> buried, subclassing LWP::UserAgent to me doesn't seem to help.

I agree you have a problem here. LWP's callback mechanism is focused on
the incoming stream of data from the HTTP request, and will call the
callback function for each chunk of data read. At best, you could
generate a FIFO queue of Bio::Seq objects, or even just raw data, but if
the client isn't using them up fast enough, you're back at the original
problem: a potentially huge in-memory representation of a large sequence
file.
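
To make the queue idea concrete, here is a rough sketch of such a
callback. It assumes GenBank flat-file records terminated by a line
holding only "//" and Unix line endings; make_record_queue is a name
I made up for illustration, not anything in LWP or Bioperl. LWP hands
the callback each chunk as its first argument, so the same code ref
could be passed as the second argument to request().

```perl
use strict;
use warnings;

# Hypothetical helper: returns a callback that buffers incoming chunks
# and pushes each complete record (ending in a "//" line) onto @$queue.
sub make_record_queue {
    my ($queue) = @_;
    my $buffer = '';    # holds the trailing, still incomplete record
    return sub {
        my ($chunk) = @_;    # LWP passes the data chunk first
        $buffer .= $chunk;
        # Peel off every complete record currently in the buffer.
        while ($buffer =~ s{\A(.*?^//\n)}{}ms) {
            push @$queue, $1;
        }
    };
}

# Simulate two chunks arriving, split in the middle of a record.
my @queue;
my $cb = make_record_queue(\@queue);
$cb->("LOCUS A\nORIGIN\n//\nLOC");
$cb->("US B\nORIGIN\n//\n");
print scalar(@queue), " records queued\n";
```

Nothing here bounds the size of @queue, which is exactly the problem:
a slow consumer lets it grow to the size of the whole download.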

It seems that the second option, the temporary file storage, would be the
better idea: save the data away into a safe tempfile, open a handle on it,
then delete it (anonymous file handles).
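
A minimal sketch of the tempfile approach, assuming a Unix-like system
where an unlinked file stays readable through its open handle. The
names anon_spool and fetch_to_handle are my own, and fetch_to_handle
is untested glue around the standard LWP::UserAgent request() callback
form.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns an anonymous read/write handle and a
# callback that appends each chunk it receives to that handle.
sub anon_spool {
    my ($fh, $path) = tempfile();    # handle is opened read/write
    binmode $fh;
    # Remove the name right away; the data lives on until $fh closes.
    unlink $path or warn "could not unlink $path: $!";
    my $sink = sub {
        my ($chunk) = @_;            # LWP passes the data chunk first
        print {$fh} $chunk or die "write to tempfile failed: $!";
    };
    return ($fh, $sink);
}

# Hypothetical wrapper: spool a URL to disk, then hand back a rewound
# handle that a sequence parser can read at its own pace.
sub fetch_to_handle {
    my ($url) = @_;
    require LWP::UserAgent;
    require HTTP::Request;
    my ($fh, $sink) = anon_spool();
    my $res = LWP::UserAgent->new->request(
        HTTP::Request->new(GET => $url), $sink);
    die "request failed: " . $res->status_line unless $res->is_success;
    seek $fh, 0, 0 or die "seek failed: $!";
    return $fh;
}
```

The whole response is read off the socket before the client sees the
first record, so memory use stays flat no matter how slowly the
client works through the file.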

As another option, you could ask Gisle Aas for a fourth type of request:
if you pass an already appropriately blessed/globbed file handle to
request(), LWP::UserAgent would duplicate its incoming data handle and
give it back to you.  This would of course also require your users to
upgrade LWP, which isn't exactly on everyone's TODO list.

And finally, you could duplicate the critical aspects of LWP (socket code,
proxy handling, etc.) in your very own lightweight NetIO package, in which
it would be easy to pass off handles as necessary.

On a related note, however, many web servers will cease data transmission
back to your client if the data isn't being read (i.e. the server's writes
to the socket are blocked for more than some period of time).  So if
your client is doing some kind of heavy computation in its pipeline
before reading the next sequence, the pipeline may get cut off from the
web server's side.  One more argument for the temporary file
solution.

Good luck,

-Aaron

-- 
 o ~   ~   ~   ~   ~   ~  o
/ Aaron J Mackey           \
\  Dr. Pearson Laboratory  /
 \ University of Virginia  \
 /  (804) 924-2821          \
 \  amackey@virginia.edu    /
  o ~   ~   ~   ~   ~   ~  o