[Bioperl-l] est2genome

Jason Stajich jason@cgt.mc.duke.edu
Fri, 11 Oct 2002 10:28:04 -0400 (EDT)


I wrote a very basic est2genome parser in Bio::Tools::Est2Genome and a
test in t/est2genome.

Now, I didn't really do this the way I'd like: I'm returning an array of
either Bio::SeqFeature::SimilarityPair (exons) or Bio::SeqFeature::Generic
(introns), and next_feature isn't supported yet because I don't think the
current gene objects fit properly with this data.

They do not allow attachment of evidence, nor do they capture the fact that
an exon might carry paired information for the genomic and cDNA/pep
sequences.


Additionally, we don't really seem to do a good job of serializing (GFF,
GAME, GenBank/EMBL/Swissprot) Bio::SeqFeature objects which aren't
Bio::SeqFeature::Generic.

I think we need to add the hooks to make this simpler so one can, for
example, parse with Est2Genome and output as annotation in GFF or
GenBank/EMBL formats.  We can use tag/value pairs to output the score and
alignment information in either of these formats, and allow users to
override this if they have a specialized way they want to output it.

The problem comes with the composite objects (FeaturePair, SimilarityPair):
these can't be written out properly because the genbank/embl and gff writers
never see the feature2()/hit() component of the data, nor extra fields like
significance.  So we need a better way to register what the available
outputs are, in a sort of recursive fashion, so that they are available as
tag/values and may have non-unique tag names.

Does anyone have good ideas of how to structure this? Some sort of 'get
all the tag values and all of your children's tag/values pairs and any
registered data functions'.
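
A minimal sketch of that idea, written in Python purely as a
language-neutral illustration rather than as BioPerl code.  Every name in
it (Feature, register, add_child, all_tag_values, the "hit." prefix) is
hypothetical and is not an existing Bio::SeqFeature method.  It assumes
pairs are kept in a list rather than a hash, so that non-unique tag names
survive, and that child tags are prefixed with the child's role:

```python
from typing import Callable, List, Tuple

TagValue = Tuple[str, str]


class Feature:
    """Hypothetical feature: own tags, registered data functions, children."""

    def __init__(self, name: str) -> None:
        self.name = name
        # A list of pairs (not a dict) so the same tag name may repeat.
        self.tags: List[TagValue] = []
        self.data_functions: List[Tuple[str, Callable[[], object]]] = []
        self.children: List[Tuple[str, "Feature"]] = []

    def add_tag_value(self, tag: str, value: object) -> None:
        self.tags.append((tag, str(value)))

    def register(self, tag: str, func: Callable[[], object]) -> None:
        """Register a function whose return value is emitted as a tag/value."""
        self.data_functions.append((tag, func))

    def add_child(self, role: str, child: "Feature") -> None:
        self.children.append((role, child))

    def all_tag_values(self, prefix: str = "") -> List[TagValue]:
        """Own tags, then registered functions, then each child, recursively."""
        pairs = [(prefix + tag, value) for tag, value in self.tags]
        pairs += [(prefix + tag, str(func())) for tag, func in self.data_functions]
        for role, child in self.children:
            pairs += child.all_tag_values(prefix + role + ".")
        return pairs


# An exon with a repeated tag, a registered field, and a "hit" child.
exon = Feature("exon")
exon.add_tag_value("note", "est2genome exon")
exon.add_tag_value("note", "second note")
exon.register("significance", lambda: 1e-20)

hit = Feature("hit")
hit.add_tag_value("seq_id", "cdna1")
exon.add_child("hit", hit)

pairs = exon.all_tag_values()
```

A writer would then only need the flat list of pairs, and would not have to
know whether the feature is a plain one or a composite.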

Also, as a final note, Ensembl is starting to standardize their function
names from each_XX to get_all_XX.  I think we have an implicit convention
that each_XX returns a list, while next_XX is an iterator method.  I don't
think this impacts us too much, but we should try to ensure we are being
consistent across the board so people aren't misled.
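
To make the distinction concrete, here is a small sketch, again in Python
as a language-neutral illustration.  The ExonParser class and its method
names are hypothetical and only mirror the get_all_XX / next_XX naming
pattern; they are not taken from BioPerl or Ensembl:

```python
from typing import Iterator, List, Optional


class ExonParser:
    """Hypothetical parser used only to illustrate the two naming styles."""

    def __init__(self, exons: List[str]) -> None:
        self._exons = list(exons)
        self._cursor: Iterator[str] = iter(self._exons)

    def get_all_exons(self) -> List[str]:
        """List-style accessor: returns everything at once."""
        return list(self._exons)

    def next_exon(self) -> Optional[str]:
        """Iterator-style accessor: one item per call, None when exhausted."""
        return next(self._cursor, None)


parser = ExonParser(["exon1", "exon2"])
all_exons = parser.get_all_exons()  # the full list, independent of the cursor
first = parser.next_exon()
second = parser.next_exon()
done = parser.next_exon()           # None once the iterator is exhausted
```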

-jason

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu