[Bioperl-l] Unigene proposal and basic implementation

Andrew Macgregor andrew@anatomy.otago.ac.nz
Mon, 15 Apr 2002 16:32:11 +1200


Recently I emailed Elia regarding the state of unigene in bioperl as I found
a few posts from about a year ago in the mailing list archive. He said that
not much had happened on that front and that if I was working on something I
should post a proposal.

I have been working on coding a unigene parser and seeing whether I could
make it fit into bioperl in any way. I have scripts that do everything I
need outside of bioperl but would like to contribute. I'm new to bioperl and
this is my first foray into OO Perl, but you gotta start somewhere, right?!
I've worked on producing what I would need, but tried to follow the
structure of bioperl.

So I've coded a basic implementation of UniGene.pm and UniGeneIO.pm based on
Seq.pm and SeqIO.pm. I've also coded a unigene format module based on those
used by SeqIO. They work roughly like this.

- UniGeneIO reads from an NCBI unigene file using the unigene format module
and returns a unigene object for each unigene record.
- Each unigene object has methods to return info like unigene_id, title,
gene, locuslink, etc.
- Each unigene object has methods to return the associated sequences,
protsims, express tissues, etc., either one by one or as an array.
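
To give a rough idea of the read loop and container described in the list
above, here is a minimal sketch. It is written in Python purely for
illustration and is not the Perl modules themselves. The class and field
names, the read_unigene function and the sample records are all made up, and
the flat-file layout (a tag, whitespace, then a value, with "//" closing each
record) is an assumption about the input.

```python
import io
from dataclasses import dataclass, field
from typing import Iterator, List, TextIO


@dataclass
class UniGene:
    """Container for one record (hypothetical field names)."""
    unigene_id: str = ""
    title: str = ""
    gene: str = ""
    locuslink: str = ""
    express: List[str] = field(default_factory=list)
    protsims: List[str] = field(default_factory=list)
    sequences: List[str] = field(default_factory=list)


def read_unigene(handle: TextIO) -> Iterator[UniGene]:
    """Yield one UniGene object per record; assumes '//' ends a record."""
    record, seen = UniGene(), False
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            if seen:
                yield record
            record, seen = UniGene(), False
            continue
        if not line.strip():
            continue
        # Split "TAG      value" at the first space; unknown tags are ignored.
        tag, _, value = line.partition(" ")
        value = value.strip()
        seen = True
        if tag == "ID":
            record.unigene_id = value
        elif tag == "TITLE":
            record.title = value
        elif tag == "GENE":
            record.gene = value
        elif tag == "LOCUSLINK":
            record.locuslink = value
        elif tag == "EXPRESS":
            # Assumed to be a semicolon-separated list of tissues.
            record.express.extend(t.strip() for t in value.split(";") if t.strip())
        elif tag == "PROTSIM":
            record.protsims.append(value)
        elif tag == "SEQUENCE":
            record.sequences.append(value)
    if seen:
        # Tolerate a final record that lacks the closing '//'.
        yield record


# Made-up records, only to exercise the reader above.
SAMPLE = """\
ID          Hs.0
TITLE       Made-up example gene
GENE        EXMPL
LOCUSLINK   0
EXPRESS     liver; brain
PROTSIM     ORG=Example; PCT=50
SEQUENCE    ACC=X00000
SEQUENCE    ACC=X00001
//
ID          Hs.1
TITLE       Second made-up record
//
"""

if __name__ == "__main__":
    for rec in read_unigene(io.StringIO(SAMPLE)):
        print(rec.unigene_id, rec.title, len(rec.sequences))
```

Single-valued fields are kept as plain attributes, and repeated ones
(sequences, protsims, express tissues) as lists, so they can be walked one by
one or taken as a whole array.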

So basically a unigene object is a container specific to unigene, as far as I
can see. It could, I guess, have a more abstract container above it. It could
also be made to return each sequence as a seq object.

What I am now wondering is where to from here? Is there interest in using
this? Is this on the right track? etc etc. I'm happy to contribute this if
it will be useful, and look after it. I'll post the code, rough though it is,
if there is interest.

Cheers, Andrew.