[Bioperl-l] Another Taxonomy modules to CPAN

Wed Nov 3 22:34:30 EDT 2010

Miguel,

(Caveat: You should also ask this on the Perl module-authors list, just in case: http://lists.perl.org/list/module-authors.html)

Not sure how the other devs feel, but I personally don't think the Bio* namespace is reserved only for BioPerl modules (see Bio::Phylo, for example). It's a fairly generic top-level name. My only worry is that these names may be too similar to existing BioPerl modules; Bio::Taxonomy and Bio::DB::Taxonomy are already taken on CPAN by BioPerl modules.

That said, tagging them as *::Lite might be fine, as long as the documentation indicates they are not related to BioPerl. Anyone else want to chime in? Maybe release them under a top-level Taxonomy namespace?

chris

On Nov 3, 2010, at 4:42 AM, Miguel Pignatelli wrote:

> Hi all,
> 
> I have written a couple of modules whose functionality overlaps with parts of Bio::DB::Taxonomy and Bio::Taxon. I had to write them because constraints in the environment I had to run them in (a GRID) made it impossible to use a BioPerl-based solution.
> 
> The main features of these modules are:
> 
> + No dependencies on non-standard Perl modules
> + NCBI and RDP based taxonomies supported
> + Very fast, with a low memory footprint: orders of magnitude faster than the BioPerl modules (for the tasks they are designed for).
> 
> Of course, they do not compete with Bio::DB::Taxonomy and Bio::Taxon in completeness or in integration with other tools (e.g. the rest of the BioPerl suite), but they are handy for mapping very large datasets (for example, BLAST results) onto the NCBI or RDP taxonomy.
> 
> The modules are:
> 
> Taxonomy::Base -- Finds ancestors and ranks; converts between
>                  names, ranks and IDs, etc.
> 
> Taxonomy::RDP  -- Reads the taxonomic tree from the RDP XML file
> 
> Taxonomy::NCBI -- Reads the taxonomic tree from flat NCBI files
>                  (nodes.dmp and names.dmp)
>                  (Similar to Bio::DB::Taxonomy::flatfile)
> 
> Taxonomy::NCBI::Gi2taxid -- Converts NCBI GIs to taxids
>                            quickly and efficiently,
>                            using a binary lookup table.
> 
> These modules are now used by several groups, mainly ones working with large metagenomics datasets, and I am considering uploading them to CPAN, but I am not sure where on CPAN they should go.
> 
> How do you think I should name these modules (i.e. where should they live on CPAN)? Their natural place could be under Bio::DB::Taxonomy, maybe Bio::DB::Taxonomy::Lite, Bio::DB::Taxonomy::Lite::NCBI, etc. Is this possible (and advisable) without being part of BioPerl? Any other suggestions?
> 
> Thank you very much in advance,
> 
> M;
> 
> ----------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
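The Gi2taxid entry in the quoted message says only that it "uses a binary lookup table". One common way to build such a table is sketched below, in Python purely for illustration (the modules under discussion are Perl). It assumes a dense, fixed-width layout in which the taxid for a GI is stored as a 4-byte unsigned big-endian integer at byte offset GI * 4, built from tab-separated "gi taxid" lines sorted by GI. The post does not describe Gi2taxid's actual file format, so this layout and the names `build_table` and `lookup` are hypothetical.

```python
"""Hypothetical sketch of a GI -> taxid binary lookup table.

Assumed layout (not taken from Taxonomy::NCBI::Gi2taxid): one 4-byte
unsigned big-endian taxid per GI, stored at byte offset gi * 4, so a
lookup is a single seek plus a single 4-byte read.
"""
import io
import struct

RECORD = struct.Struct(">I")  # 4-byte unsigned int, big-endian


def build_table(dump_lines, out):
    """Write a dense binary table from 'gi<TAB>taxid' lines.

    Input is assumed to be sorted by GI with no duplicates. GIs missing
    from the input are zero-filled, so a taxid of 0 means "not mapped".
    """
    next_gi = 0
    for line in dump_lines:
        gi_str, taxid_str = line.split()
        gi, taxid = int(gi_str), int(taxid_str)
        if gi < next_gi:
            raise ValueError("input must be sorted by GI without duplicates")
        out.write(b"\x00" * RECORD.size * (gi - next_gi))  # fill the gap
        out.write(RECORD.pack(taxid))
        next_gi = gi + 1


def lookup(table, gi):
    """Return the taxid for gi, or 0 if gi is unmapped or past the end."""
    table.seek(gi * RECORD.size)
    chunk = table.read(RECORD.size)
    if len(chunk) < RECORD.size:
        return 0
    return RECORD.unpack(chunk)[0]


if __name__ == "__main__":
    # Made-up toy rows, not real NCBI data.
    buf = io.BytesIO()
    build_table(["2\t100", "5\t200", "7\t300"], buf)
    print(lookup(buf, 5))    # 200
    print(lookup(buf, 3))    # 0 (gap, unmapped)
    print(lookup(buf, 100))  # 0 (past end of table)
```

The trade-off in this layout is disk space for speed: the file grows with the largest GI rather than with the number of mapped GIs, but each lookup costs constant time and almost no memory, which fits the "very large datasets" use case described in the post.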