[Bioperl-l] Bio::Root::IO reads URLs from -file

Lincoln Stein lstein at cshl.edu
Thu Aug 12 18:08:40 EDT 2004


The "perlish" way to do this is:

	-file => "GET http://foobar.com/index.html |"

This will, in fact, work using the current codebase (without Allen's patches), 
provided that the GET command from LWP is installed.  If you don't have GET 
installed, you can do the same thing with "wget" or any of a large number of 
URL fetcher commands.
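
A minimal sketch of why this works, assuming the -file value ends up in
Perl's two-argument open(), where a trailing "|" means "run this command and
read its output". The helper name read_from_command is hypothetical, and
"echo" stands in for GET so the example runs without LWP or network access.

```perl
use strict;
use warnings;

# Hypothetical helper: hand a "command |" string to two-argument open(),
# which runs the command and returns a handle on its standard output.
sub read_from_command {
    my ($spec) = @_;
    open(my $fh, $spec) or die "cannot run '$spec': $!";
    my @lines = <$fh>;
    close($fh);
    return @lines;
}

# "echo hello |" plays the role of "GET http://foobar.com/index.html |".
my @lines = read_from_command("echo hello |");
print @lines;
```

With BioPerl the same string would simply be passed as the -file argument,
for example to Bio::SeqIO->new(). Because the string is executed as a shell
command, it should not be built from untrusted input.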

Lincoln

On Tuesday 10 August 2004 03:57 pm, Allen Day wrote:
> On Tue, 10 Aug 2004, Jason Stajich wrote:
> > On Tue, 10 Aug 2004, Allen Day wrote:
> > > On Tue, 10 Aug 2004, Peter van Heusden wrote:
> > > > Hilmar Lapp wrote:
> > > > > I lean with Ewan to -url as I like explicit commands better than
> > > > > possibly dubious magic behind the scenes ... imagine someone stores
> > > > > files by names that match their url ...
> > > > >
> > > > > There's one thing though that's important IMO that Jason brings up:
> > > > > I don't know how you implemented this but I think Bio::Root::IO
> > > > > must not be dependent on LWP or any such beast that doesn't come
> > > > > with perl.
> > > >
> > > > I'm with the majority in that 'magic' creates possible confusion and
> > > > more room for error. As to Hilmar's idea of not depending on LWP, I
> > > > think this is also a good idea, and maybe the URL code can be a kind
> > > > of 'mixin' - i.e. implement it in another module and then have
> > > > Bio::Root::IO optionally add it as a plugin. What do you intend to do
> > > > with this capability? Is there going to be another module that
> > > > depends on the -url ability?
> > >
> > > Bio::Root::IO::_initialize_io() now accepts a '-url' argument.
> > >
> > > If present, and if LWP is loadable, _initialize_io() attempts to use
> > > LWP::Simple::getstore() to download the url to a local tempfile, and
> > > assigns that tempfile to the equivalent of _initialize_io()'s '-file'
> > > argument.  This works for HTTP, HTTPS, FTP, and all other protocols
> > > supported by LWP.  If a request fails, a retry loop re-attempts the
> > > fetch a few times.
> >
> > The tempfile gets cleaned up by LWP?  We do this sort of thing in
> > Bio::Tools::Run::RemoteBlast and within Bio::DB::NCBIHelper, et al.  Perhaps
> > we can localize some of that code to a -url param where it is a GET
> > request...
>
> no, i use Bio::Root::IO::tempfile() to generate the tempfile, and use LWP
> to write into that.  LWP doesn't know how to clean up after itself, as far
> as i can tell.
>
> > > If LWP is not loadable, _initialize_io() uses Bio::Root::HTTPget to open
> > > a socket to the file's host and sets '-fh' to read from this socket. 
> > > This only works for HTTP.  There is no retry loop in place here, as
> > > Bio::Root::HTTPget throws an error if it can't open the socket.  It's
> > > possible to modify Bio::Root::HTTPget to do retries, but I didn't feel
> > > like poking around in there.
> > >
> > > Still remaining to be done:
> > >
> > >   [1] add -url to the documentation
> > >   [2] check for the existence of clashing '-file' or '-fh' arguments
> > >   [3] add additional tests to t/RootIO.t for testing https and ftp
> > >       retrievals.
> > >
> > > Regarding another module depending on this, yes, there will be one,
> > > that's the only reason I added this :).  I have a new FeatureIO
> > > subsystem.  One format it can parse is GFF v3.  Valid GFF v3 requires
> > > features to be typed according to the Sequence Ontology or an extension
> > > thereof.  As part of the parse it downloads the Sequence Ontology
> > > DAG-Edit files, parses them into a Bio::Ontology, and returns
> > > Bio::SeqFeatureI objects with Annotation::OntologyTerms attached.
> > >
> > > I will commit the FeatureIO code soon.
> >
> > Cool! Will it also support some sort of caching of SO too?  Maybe we can
> > change Tools::GFF to delegate to FeatureIO for GFF3 files instead of
> > having 2 modules doing the same thing.
>
> well, it just caches to the tempfile right now and deletes on program
> termination, but if you want to add functionality for -url to store the
> file somewhere (perhaps in a filename in $TEMPDIR that is the md5 sum of
> the URL?), be my guest.
>
> my idea is to do away with Bio::Tools::GFF entirely.
>
> > Also, have you worked on the alignment <-> GFF3 at all either?  It is an
> > almost-doable thing with HSP->cigar_line but I am not sure we have a
> > cigar2HSP factory yet.
>
> nope, haven't looked at this.  i don't really do alignments so i don't
> need this functionality.
>
> -allen
>
> > -jason
> >
> > > -Allen
> > >
> > > > Peter
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at portal.open-bio.org
> > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu

-- 
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)