[Bioperl-l] Bio::Tools reorganisation (long)

Chris Fields cjfields at illinois.edu
Sat Oct 25 19:05:09 UTC 2008


On Oct 24, 2008, at 5:04 AM, Heikki Lehvaslaiho wrote:

> I was thinking of the proposed simplification of the BioPerl core and
> reading http://www.bioperl.org/wiki/Proposed_core_modules_changes.
>
> I realised that Bio::Tools really should be reorganised. At the
> moment it holds at least five different categories of "tools".
>
> The most difficult question here is, of course: How much backward
> compatibility should be kept? I am in favour of doing quite
> drastic changes if they help clarify the purpose of the modules.

It's probably best to just bite the bullet, fix as many bugs as is
reasonable over the next month or two, and put out a 1.6 release.
Past that point I think we should focus on what a bioperl 2.0 should
be like and work towards that without the overhead of worrying about
breaking old code.

> Also, what is the defining rule to categorize the modules?
>
> Possibilities:
>
> 1. Local/web
> 2. Type of data: Sequences in databases/Analysis tools
> 3. Type of analysis
>
> Below I've outlined what I think needs to be done, based on the
> assumption that the local/web rule is primary and type of analysis is
> the secondary organising principle.
>
>
> None of this is in the bioperl wiki, but it can be put there at any
> point of the discussion.
>
>
>   -Heikki

You're free to modify the 'proposed changes' page as you want; we can  
use the discussion page as well for ideas.

> 1. Core functionality
> =====================
>
> Used by core sequence objects.
>
> e.g.:
> Bio::Tools::CodonTable
> Bio::Tools::GuessSeqFormat
>
> Suggestion: Not called directly, so moving to, e.g., Bio::Seq
> should not be a problem. Can be implemented immediately.

+1

> 2. Utilities
> ============
>
> Perform a simple analysis related to sequences or sequence
> formats. All the code is present within the module.
>
> e.g.:
> Bio::Tools::IUPAC
> Bio::Tools::OddCodes
> Bio::Tools::ECnumber (?)
>
> Suggestion: Separate them from tools into Bio::Utils within the
> core package. Seldom used, so should not break backward
> compatibility too much.

+1
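
If some backward compatibility were wanted for a move like the one in
section 2, one option is a thin shim left under the old name. This is
a minimal sketch only: Bio::Utils::IUPAC is hypothetical (Bio::Utils
is just a proposal at this point), and the new() and describe()
methods are placeholders for illustration, not the real
Bio::Tools::IUPAC API.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical new home for the module; Bio::Utils is only a proposal.
package Bio::Utils::IUPAC;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Placeholder method standing in for the real functionality.
sub describe {
    my ($self) = @_;
    return ref($self) . ' instance';
}

# Old name kept as a thin subclass so existing callers keep working.
package Bio::Tools::IUPAC;

our @ISA = ('Bio::Utils::IUPAC');

sub new {
    my ($class, @args) = @_;
    # Tell callers about the move each time the old name is used.
    warn "Bio::Tools::IUPAC is deprecated; use Bio::Utils::IUPAC\n";
    return $class->SUPER::new(@args);
}

package main;

# Code written against the old name still gets a working object.
my $old = Bio::Tools::IUPAC->new;
print $old->describe, "\n";
print $old->isa('Bio::Utils::IUPAC') ? "ok\n" : "not ok\n";
```

The old package only adds a deprecation warning and inherits
everything else, so it could be dropped later without touching the
new module.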

> 3. Parsers for program outputs
> ==============================
>
> Bulk of the Bio::Tools name space content. They need to be sorted into
> categories when possible according to convention:
> Bio::Tools::Alignment, Bio::Tools::Phylo.
>
> Suggestion: Move into Bio::Tools::Parser (or Bio::Parser).

Some of the tools combine parsers with simple container objects, so  
they aren't easily separated (e.g. Bio::Tools::EUtilities, which  
parses output from NCBI's eutils and represents data from them as Bio*  
container objects).  I suppose I could move the simple containers into  
their own unique namespace...

> 4. External local programme wrappers
> ====================================
>
> Most of these, but not all, are in Bio::Tools::Run and already in
> bioperl-run package. They use parsers in Bio::Tools name
> space (category 3.).
>
> Suggestion: Move into Bio::Tools::RunLocal (or Bio::RunLocal) to
> shorten the name.

Maybe just Bio::Run or Bio::Wrapper?

> 5. Wrappers for remote (Web based) services
> ===========================================
>
> Most of the service wrappers follow Bio::SimpleAnalysisI and are in
> Bio::Tools::Analysis.
>
> Examples of modules that are using web but are among local
> application wrappers:
>
> Bio::Tools::Protparam
> Bio::Tools::WebBlat

WebBlat is deprecated (no longer maintained).

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/13520

> Modules using Web access, but are in the bioperl-run package:
>
> Bio::Tools::Run::Pise*
>
> Modules accessing web for retrieving sequences: Bio::DB.
>
> This name space contains modules for managing local sequence
> databases, accessing web based sequence databases, and a variety of
> other objects: Bibliographic references, sequence annotation, MeSH
> terms, Taxonomy.
>
> Suggestion: move to Bio::Tools::RunExternal (or Bio::Web).
> Reorganise Bio::DB in a similar manner into logical categories.

I think this is okay, though the shorter the namespace the better.

chris

> -- 
> ______ _/      _/_____________________________________________________
>      _/      _/
>     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>   _/  _/  _/  SANBI, South African National Bioinformatics Institute
>  _/  _/  _/  University of Western Cape, South Africa
>     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________


