Bioperl: SeqIO drivers

hilmar.lapp@pharma.Novartis.com
Mon, 8 May 2000 17:47:36 +0100


Dear all,

I've partially rewritten and extended the Bio::SeqIO.pm module to allow more
flexible handling of parsers.

The problem I have with the current implementation is that its entire operation
is essentially hardwired, and some methods are not implemented as cleanly as
they could be (e.g. new(), alongside which there is also a _new()). As a result,
if you need your own sequence parser, because you have your own sequence format
or the BioPerl parser doesn't work for you, you have two options: either do it
as before, without any notion of BioPerl objects, or try to fit it
transparently into the existing BioPerl framework, which includes, e.g., the
standard call to Bio::SeqIO->new() when you open a sequence stream. However,
the latter requires you either to mess around in the BioPerl modules or to
write at least two modules of your own, one containing the actual parser and
one deriving from SeqIO.pm, and from then on to use that derived class instead
of Bio::SeqIO.

I found this inconvenient, so I restructured the module around the notion of a
SeqIO manager, which knows how to handle streams and file handles and leaves
the actual parsing to a driver. In this sense, the modules in Bio::SeqIO::* are
drivers. Among other things, I added a method that lets you register a driver;
to do so, you supply the name of the format for which you want to register
your driver and the name of the module implementing it. Optionally, you can
also provide a regular expression that format guessing uses to recognize, from
a file's name, whether the file may be in your format. The Bio::SeqIO::*
drivers and their formats are known by default, so you don't have to worry
about them. This way you can easily plug in your own driver, and it will be
treated exactly like the BioPerl drivers. It also comes in handy if you want to
replace a BioPerl driver with your own for a specific format: if you register a
driver for a format that already has one, the existing driver is replaced by
yours. There's no need to hack the BioPerl core, and you can transparently use
Bio::SeqIO->new() whenever you open a sequence stream.
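To make the manager/driver split concrete, here is a minimal sketch of such a
registry in plain Perl. It assumes a hash keyed by format name that holds the
driver module and an optional filename regex for format guessing. The package
SeqIOManager, the methods register_driver(), driver_for() and guess_format(),
the driver My::QualFasta, and the default regexes are all hypothetical names
chosen for illustration; this is not the actual Bio::SeqIO interface.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical SeqIO "manager": it resolves a driver and delegates to it.
# Names are illustrative only, not the real Bio::SeqIO API.
package SeqIOManager;

# format name => { module => driver class, guess => optional filename regex }
# (the regexes are illustrative assumptions)
my %DRIVERS = (
    fasta   => { module => 'Bio::SeqIO::fasta',   guess => qr/\.(?:fa|fasta|seq)$/i },
    genbank => { module => 'Bio::SeqIO::genbank', guess => qr/\.(?:gb|gbk|genbank)$/i },
    embl    => { module => 'Bio::SeqIO::embl',    guess => qr/\.embl$/i },
    swiss   => { module => 'Bio::SeqIO::swiss',   guess => qr/\.(?:sp|swiss)$/i },
);

# Register a driver for a format; an existing entry for that format is replaced.
sub register_driver {
    my ($class, $format, $module, $guess) = @_;
    die "format and module are required\n" unless $format && $module;
    $DRIVERS{lc $format} = { module => $module, guess => $guess };
    return;
}

# Return the driver module registered for a format, or undef.
sub driver_for {
    my ($class, $format) = @_;
    my $entry = $DRIVERS{lc $format} or return;
    return $entry->{module};
}

# Guess the format from a file name using the registered regexes.
sub guess_format {
    my ($class, $filename) = @_;
    for my $format (sort keys %DRIVERS) {
        my $re = $DRIVERS{$format}{guess} or next;
        return $format if $filename =~ $re;
    }
    return;
}

# Resolve the driver and hand it the arguments; the driver does the parsing.
sub new {
    my ($class, %args) = @_;
    my $format = $args{-format} // $class->guess_format($args{-file} // '')
        or die "cannot determine format\n";
    my $module = $class->driver_for($format)
        or die "no driver registered for format '$format'\n";
    # Load the driver module unless its package is already defined.
    unless ($module->can('new')) {
        (my $path = "$module.pm") =~ s{::}{/}g;
        require $path;
    }
    return $module->new(%args);
}

# Hypothetical user-supplied driver, e.g. for fasta plus quality values.
package My::QualFasta;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

package main;

# Plug in a driver for a new format...
SeqIOManager->register_driver('qual', 'My::QualFasta', qr/\.qual$/i);
my $in = SeqIOManager->new(-file => 'reads.qual');
print ref($in), "\n";                           # prints "My::QualFasta"

# ...or replace the default driver of an existing format.
SeqIOManager->register_driver('fasta', 'My::QualFasta');
print SeqIOManager->driver_for('fasta'), "\n";  # prints "My::QualFasta"
```

The manager itself never parses anything; it only maps a format (given
explicitly or guessed from the file name) to a module and calls that module's
new(), which is what lets a user's driver and a BioPerl driver be treated the
same way.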

I've used this to implement and use a driver for fasta files accompanied by
quality values in a pseudo-fasta format. So far it has worked, but it has not
yet been tested sufficiently.

If you think it is useful, I can replace the existing SeqIO.pm module with this
version. (It doesn't affect any other module, hopefully.)

     Hilmar

-----------------------------------------------------------------------------
Hilmar Lapp                            email: Hilmar.Lapp@pharma.novartis.com
NFI Vienna, IFD/Bioinformatics         phone: +43 1 86634 631
A-1235 Vienna                          fax:   +43 1 86634 727
ROI: Bioinformatics (arrays, expression, seqs), Programming (OO), Databases
-----------------------------------------------------------------------------


=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================