[Bioperl-l] discussion/advice on non-bioperl bioinformatics modules

Hilmar Lapp hilmarl@yahoo.com
Thu, 23 Aug 2001 13:52:01 +0200


Sean Quinlan wrote:
> 
> =from posting
> Current functions in CompBio.pm:
> # note - table format refers to tab delimited, such as id\tsequence[\n|\tother fields\n]
> 
> new - create new CompBio object
> check_type - try to determine what format sequence data is in
> tbl_to_fa - convert sequence data in table format to fasta
> tbl_to_ig - convert sequence data in table format to IntelliGenetics
> fa_to_tbl - convert sequence data in fasta to table format
> ig_to_tbl - convert sequence data in IntelliGenetics format to table
> dna_to_protein - convert dna sequence to protein sequence
> complement - convert dna sequence to its complement
> six_frame - translate dna sequence to protein across all six frames
> aa_hash - hash lookup of aa using codons as keys - includes ambiguous codes
> _stop - internal method used by six_frame
> wu_blast - interface to WUBlast; old, ugly and not portable - next project after catching up Simple.pm
> _error - internal method for varying error handling behavior without extra typing every time
> 
> Planned (in most cases some code already exists in BMERC::bio or elsewhere):
> 
> ncbi_blast - interface to NCBI's version of the blast tools
> parse_blast - simple blast output parser - may need to be separate versions for WU and NCBI blasts. Return tab delimited data in consistent format, such as score, %identity, start/stop positions of match, etc.
> calculate_scores - calculates %equivalent identities and #effective identities from blast output
> dnastar_to_tbl - convert sequence data in dna* format to table
> tbl_to_dnastar - and back
> gcg_to_tbl - convert sequence data in gcg format to table
> tbl_to_gcg - and back
> ncbi_to_tbl - convert sequence data in ncbi 'format' (as cut and pasted from .gbk reports or ncbi's website) to table
> tbl_to_ncbi - and back
> 
> 
> OK, now I'll try to get to the real reason I am making this post. 
> I'd like to volunteer. Regardless of whether or not they get listed, 
> I would like to offer any code in the modules, or any of the utilities 
> attached, to the bioperl project. 

Sean, thanks for offering your time and code, we appreciate that.
From your listing I can imagine several useful pieces of code:
1) SeqIO: table format (what's that?), IntelliGenetics format (sorry,
never heard of it, what is it used for?), dna* format (again,
forgive my ignorance, what is it used for?), and automatic format
checking. The present code in bioperl determines the format by file
extension; detection from actual content would have to be
switchable, because some streams may not support rewind.
2) SeqIO::gcg.pm could certainly use a maintainer, i.e.,
someone who takes care that it is up-to-date and supports the
latest GCG versions (I think it is somewhat outdated, e.g. in the
checksum calculation, but I'm not sure).
3) The Blast parser (BPlite) could probably use more hands, too.
Maybe you could check with Roger Hall (roger@iosea.com) and Jason
(jason@chg.mc.duke.edu).
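
To illustrate the point in 1) about switchable content-based format
detection, here is a minimal sketch. It is written in Python for
brevity, not Perl, and is not Bioperl code: the function names, the
extension table, and the content heuristics are all assumptions made
up for this example. The idea is to read one line ahead and hand that
line back in front of the rest of the stream, so no rewind is needed.

```python
import itertools
import os

# Hypothetical extension table for illustration; not Bioperl's actual mapping.
EXTENSION_FORMATS = {
    ".fa": "fasta",
    ".fasta": "fasta",
    ".gcg": "gcg",
    ".tbl": "table",
}


def guess_by_extension(filename):
    """Return a format name based on the file extension, or None."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_FORMATS.get(ext)


def guess_by_content(first_line):
    """Return a format name based on the first line, or None.

    The heuristics are deliberately crude and are assumptions:
    '>' starts a FASTA header, ';' starts an IntelliGenetics comment,
    and a tab suggests the id<TAB>sequence table format.
    """
    line = first_line.rstrip("\r\n")
    if line.startswith(">"):
        return "fasta"
    if line.startswith(";"):
        return "ig"
    if "\t" in line:
        return "table"
    return None


def open_with_format(lines, filename=None, sniff=False):
    """Return (format, line_iterator) for an iterable of lines.

    The extension is tried first. Content sniffing only happens when
    sniff=True, and the line consumed for sniffing is chained back in
    front of the remaining lines, so the stream is never rewound.
    """
    lines = iter(lines)
    fmt = guess_by_extension(filename) if filename else None
    if fmt is None and sniff:
        first = next(lines, None)
        if first is not None:
            fmt = guess_by_content(first)
            lines = itertools.chain([first], lines)
    return fmt, lines
```

With sniff left at its default of False the stream is never touched,
which keeps the behaviour safe for pipes and sockets; a caller who
knows that reading one line ahead is acceptable can switch it on.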

If you'd like to contribute even more, there is plenty of work to
be done in writing more rigorous tests and fixing bugs.

As for potential clashes of Bioperl with your CompBio work, I'm
not sure why there would be name clashes. Regarding the APIs,
these look very different, so keeping them interoperable seems to
me a big effort that you probably don't want to take on. Maybe
I'm missing something.

	-hilmar
-- 
----------------------------------------------------------------
Hilmar Lapp                              email: hilmarl@yahoo.com
A-1120 Vienna
-----------------------------------------------------------------