[Bioperl-l] standard IO & URL handling

Hilmar Lapp hlapp@gmx.net
Tue, 26 Sep 2000 13:41:28 +0200


Ewan Birney wrote:
> 
> I am not sure how IOManager is set out, but I would like to see in this
> case a true base class (? perhaps IOManager) which IO orientated modules
> in bioperl would inherit from. This would
> 
>         (a) save people from having to type get/sets for fh/filename all
> the time and
> 
>         (b) make bioperl more consistent.
> 
> For Network orientated modules, something similar might occur or the IO
> system might be good enough.
> 
> Does this sound sane Hilmar? Would you like to propose the system we
> should try to stick to?
> 

I don't feel that I'm the one with enough of an overview of the code and
the 'standard' modules available, but I can offer a starting point,
hoping that people out there will iron out the flaws.

The wish-list/requirements from my point of view are the following:

1) Basic FileIO functionality supplied ready-to-use by a core module,
implemented through inheritance or delegation. This functionality should
at a minimum comprise (see the sketch after this list):
	o -file and -fh parameters in new()/_initialize() being dealt with
	o method fh() (or _filehandle(), whatever you prefer)
	o support for keeping track of the filename if one was supplied
	o method _pushback()
	o method _readline()
	o method close()
	o support for the capability of tying a filehandle to the object
	o ability to deal with any sort of IO::* handles
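
To make this concrete, here is a rough sketch of what such a core class
could look like. Bio::Root::StreamIO is just the hypothetical name
proposed further below, all internals are illustrative only, and the
tying support is left out of the sketch:

package Bio::Root::StreamIO;
use strict;
use IO::File;

sub new {
    my ($class, %args) = @_;
    my $self = bless { _pushback => [] }, $class;
    if (defined $args{-fh}) {
        # accept any IO::* handle or plain glob ref as-is
        $self->{_fh} = $args{-fh};
    } elsif (defined $args{-file}) {
        # remember the filename and open it
        $self->{_file} = $args{-file};
        $self->{_fh} = IO::File->new($args{-file}, '<')
            or die "could not open $args{-file}: $!";
    }
    return $self;
}

sub fh   { my $self = shift; $self->{_fh} = shift if @_; return $self->{_fh}; }
sub file { return shift->{_file}; }

# _readline() honours lines returned to the buffer by _pushback()
sub _readline {
    my $self = shift;
    return shift @{$self->{_pushback}} if @{$self->{_pushback}};
    my $fh = $self->{_fh} or return;
    my $line = <$fh>;
    return $line;
}

sub _pushback {
    my ($self, $line) = @_;
    unshift @{$self->{_pushback}}, $line;
}

sub close {
    my $self = shift;
    # works for IO::* objects and, with IO::Handle loaded, for globs too
    $self->{_fh}->close() if $self->{_fh};
    undef $self->{_fh};
}

1;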

The last two items obviously refer to comments made by others. I have to
admit that so far I have never been at a point where the tying
possibility would have saved me a lot of hassle, but since it is there in
Perl it is certainly very useful in certain places.

While I think Matthew's point is basically right, two things hold: first,
most BioPerl modules using file IO use only one file at a time, and
second, probably at least half of BioPerl does use file IO. So, utilizing
a central file IO facility should be as easy and straightforward as
possible.

A proposal for an implementation is then (a sketch follows the list):
	o a base class implementing the requirements, like Bio::Root::StreamIO
	o a module in need of stream IO inherits from this base class
          -- or see below
	o a module that needs multiple streams creates multiple (or, in the
          case of inheritance, additional) instances of this class
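
For illustration, a parser taking the inheritance route could then look
roughly like this (Bio::Tool::MyParser and next_record() are made-up
names, and the '//' record terminator is just an example):

package Bio::Tool::MyParser;
use strict;
use vars qw(@ISA);
use Bio::Root::StreamIO;
# fh()/_readline()/_pushback()/close() all come for free
@ISA = qw(Bio::Root::StreamIO);

# read up to the next '//' terminator and return the raw record
sub next_record {
    my $self = shift;
    my @lines;
    while (defined(my $line = $self->_readline())) {
        last if $line =~ m{^//};
        push @lines, $line;
    }
    return @lines ? join('', @lines) : undef;
}

1;

# usage: a filename or an open handle, both handled by the base class
# my $parser = Bio::Tool::MyParser->new(-file => 'records.dat');
# while (defined(my $rec = $parser->next_record())) { ... }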

The downside of simple inheritance is that the implementing class is
hard-coded and cannot be changed (set) at run-time. An alternative that
circumvents this could be to have something like the following in
Bio::Root::Object (or a descendant), sketched below:
	o method stream() which gets/sets the StreamIO-implementing object
          (note that this can be smart in not creating anything until it
           is accessed, and in accepting named parameters, too)
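
Roughly sketched (the method body is illustrative only and would live in
whatever class we settle on):

sub stream {
    my ($self, @args) = @_;
    if (@args == 1 && ref($args[0])) {
        # set: accept any ready-made StreamIO-compatible object
        $self->{_stream} = $args[0];
    } elsif (@args) {
        # named parameters (-file, -fh, ...) go straight to the constructor
        $self->{_stream} = Bio::Root::StreamIO->new(@args);
    } elsif (! defined $self->{_stream}) {
        # lazy: nothing gets created until the stream is first accessed
        $self->{_stream} = Bio::Root::StreamIO->new();
    }
    return $self->{_stream};
}

# $self->stream(-file => $file);          # configure once
# my $line = $self->stream->_readline();  # then simply delegate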

This may sound complicated or like a lot of code, but if Perl is indeed
so rich in modules supporting all of this, it should in fact be very
straightforward to implement. And a programmer implementing a new BioPerl
parser does not have to worry about how to code portable and consistent
IO if he/she just sticks to what the core supplies.

Concerning the URL/HTTP stuff, what I'd like to have is what I described
as 'delegate the guts'. Usually you have a URL you want to GET from or
POST to, and you have a table of key/value pairs (yes, of course, keys
may have multiple values), and you don't want to bother with how HTTP
works or how to get through a firewall. You may not even want to bother
with which particular protocol your URL refers to (ftp, file, etc.). So,
a core module supporting consistent net IO should, from my point of view,
enable something like the following (a sketch follows):
	o $stream = $netio->openURL($url, 'GET', \%query);
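
As a sketch of one possible implementation, wrapping LWP::UserAgent (if
we decide to rely on it), such a module could look roughly like this;
Bio::Root::NetIO is a made-up name and the internals are illustrative
only:

package Bio::Root::NetIO;
use strict;
use LWP::UserAgent;
use HTTP::Request::Common qw(GET POST);
use URI;
use IO::String;   # wraps the response body in a read-only handle

sub new {
    my $class = shift;
    my $ua = LWP::UserAgent->new;
    $ua->env_proxy;    # honour http_proxy/ftp_proxy, i.e. firewalls
    return bless { _ua => $ua }, $class;
}

# $stream = $netio->openURL($url, 'GET'|'POST', \%query);
sub openURL {
    my ($self, $url, $method, $query) = @_;
    $method = uc($method || 'GET');
    my $request;
    if ($method eq 'POST') {
        # form-encoded POST built from the key/value table
        $request = POST($url, [ %{$query || {}} ]);
    } else {
        my $uri = URI->new($url);
        $uri->query_form(%{$query || {}}) if $query;
        $request = GET($uri);
    }
    my $response = $self->{_ua}->request($request);
    die "request for $url failed: ", $response->status_line, "\n"
        unless $response->is_success;
    # hand back something that behaves like a read-only filehandle
    return IO::String->new($response->content);
}

1;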

If this is already out there (probably it is), that's fine. It should
then be very straightforward to implement a core module for this,
complaints about inconsistent behaviour across BioPerl modules (one is
firewall-safe, another one is not) should become history, and a BioPerl
programmer in need of URL queries will find his/her way very quickly.

One remark concerning LWP: it is indeed already on our list of
dependencies, but only as a very optional one. For example, I haven't
installed it, and at present the module for running remote BLAST is not
functional anyway (because of the NCBI BLAST server changes). LWP's long
list of dependencies on non-core packages makes it not really attractive
from an industrial-environment point of view, I have to admit. Anyway, as
I have no overview of what packages are available for this purpose and
how likely they are to become a Perl standard, the vote should be cast by
those who know better.

Do any of the things proposed make sense to people out there?

	Hilmar

-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
NFI Vienna, IFD/Bioinformatics             phone: +43 1 86634 631
A-1235 Vienna                                fax: +43 1 86634 727
-----------------------------------------------------------------