[Bioperl-l] Bio::Root::IO reads URLs from -file

Jason Stajich jason at cgt.duhs.duke.edu
Tue Aug 10 15:43:13 EDT 2004


On Tue, 10 Aug 2004, Allen Day wrote:

> On Tue, 10 Aug 2004, Peter van Heusden wrote:
>
> > Hilmar Lapp wrote:
> >
> > > I lean with Ewan toward -url, as I like explicit commands better than
> > > possibly dubious magic behind the scenes ... imagine someone who stores
> > > files under names that happen to match their URLs ...
> > >
> > > There's one important point, though, IMO, that Jason brings up: I
> > > don't know how you implemented this, but I think Bio::Root::IO must not
> > > depend on LWP or any such beast that doesn't come with perl.
> > >
> > I'm with the majority in that 'magic' creates possible confusion and
> > more room for error. As to Hilmar's point about not depending on LWP, I
> > think this is also a good idea; maybe the URL code can be a kind of
> > 'mixin' - i.e. implemented in another module that Bio::Root::IO
> > optionally loads as a plugin (see the sketch below). What do you intend
> > to do with this capability? Is there going to be another module that
> > depends on the -url ability?
>
> Bio::Root::IO::_initialize_io() now accepts a '-url' argument.
>
> If present, and if LWP is loadable, _initialize_io() attempts to use
> LWP::Simple::getstore() to download the URL to a local tempfile, and
> assigns that tempfile to the equivalent of _initialize_io()'s '-file'
> argument.  This works for HTTP, HTTPS, FTP, and any other protocol
> supported by LWP.  If a request fails, a retry loop attempts the fetch
> a few more times before giving up.

Does the tempfile get cleaned up by LWP?  We do this sort of thing in
Bio::Tools::Run::RemoteBlast and within Bio::DB::NCBIHelper et al.;
perhaps we can consolidate some of that code onto a -url param wherever
the request is a simple GET...
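
For reference, I'd expect the LWP branch to boil down to something like
this (untested sketch; getstore() just writes to whatever filename it is
given, so cleanup would be File::Temp's job, e.g. via UNLINK):

  use strict;
  use File::Temp qw(tempfile);
  use LWP::Simple qw(getstore is_success);

  # Download $url to a tempfile and return the filename; UNLINK => 1
  # makes File::Temp remove the file automatically at program exit.
  sub _url_to_tempfile {
      my ($url, $retries) = @_;
      $retries = 3 unless defined $retries;
      my ($fh, $tmpname) = tempfile(UNLINK => 1);
      for my $try (1 .. $retries) {
          my $status = getstore($url, $tmpname);
          return $tmpname if is_success($status);
          warn "attempt $try for $url failed (HTTP $status)\n";
      }
      die "could not fetch $url after $retries attempts\n";
  }

  # the result would then feed the normal code path, i.e. the
  # equivalent of: $self->_initialize_io(-file => _url_to_tempfile($url));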

>
> If LWP is not loadable, _initialize_io() uses Bio::Root::HTTPget to open a
> socket to the file's host and sets '-fh' to read from this socket.  This
> only works for HTTP.  There is no retry loop here, as Bio::Root::HTTPget
> throws an error if it can't open the socket.  It's possible to modify
> Bio::Root::HTTPget to do retries, but I didn't feel like poking around
> in there.
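
For concreteness, that fallback amounts to something like this (sketch;
I'm recalling HTTPget's filehandle-returning call from memory, so check
the actual function name):

  use strict;
  use Bio::Root::HTTPget;

  # No LWP: open a socket to the host ourselves and read from it.
  # HTTP only, and HTTPget throws if the socket can't be opened.
  my $url = 'http://www.bioperl.org/index.html';
  my $fh  = Bio::Root::HTTPget::getFH($url);
  # ... the equivalent of _initialize_io(-fh => $fh) from here on
  while (my $line = <$fh>) {
      print $line;
  }
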
>
> Still remaining to be done:
>
>   [1] add -url to the documentation
>   [2] check for the existence of clashing '-file' or '-fh' arguments
>   [3] add tests to t/RootIO.t covering HTTPS and FTP retrievals.
>
> Regarding another module depending on this: yes, there will be one; that's
> the only reason I added this :).  I have a new FeatureIO subsystem.  One
> format it can parse is GFF v3.  Valid GFF v3 requires features to be typed
> according to the Sequence Ontology or an extension thereof.  As part of
> the parse it downloads the Sequence Ontology DAG-Edit files, parses them
> into a Bio::Ontology, and returns Bio::SeqFeatureI objects with
> Annotation::OntologyTerm objects attached.
>
> I will commit the FeatureIO code soon.
>

Cool!  Will it also support some sort of caching of the SO files?  Maybe
we can change Tools::GFF to delegate to FeatureIO for GFF3 files instead
of having two modules doing the same thing.
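
I'm guessing usage will mirror SeqIO, i.e. something like this (pure
speculation on the interface until the code is committed):

  use strict;
  use Bio::FeatureIO;   # API guessed by analogy with Bio::SeqIO

  my $in = Bio::FeatureIO->new(-file   => 'genes.gff3',
                               -format => 'gff');
  while (my $feature = $in->next_feature) {
      # primary_tag should be an SO term after ontology validation
      print $feature->primary_tag, "\n";
  }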

Also, have you worked on alignment <-> GFF3 conversion at all?  It's
almost doable with HSP->cigar_line, but I'm not sure we have a cigar2HSP
factory yet.
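
The parsing half of such a factory is the easy part, e.g. (sketch; M/I/D
are the ops EnsEMBL-style cigar lines use):

  use strict;

  # Split a cigar line like '120M2D30M' into [length, op] pairs; the
  # missing piece is turning these back into Bio::Search HSP objects.
  sub parse_cigar {
      my ($cigar) = @_;
      my @ops;
      push @ops, [ $1, $2 ] while $cigar =~ /(\d+)([MID])/g;
      return @ops;
  }

  print "$_->[0]$_->[1]\n" for parse_cigar('120M2D30M');  # 120M 2D 30M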


-jason



> -Allen
>
>
> >
> > Peter
> >

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

