[Biocorba-l] idl 0.2 discussion
Ewan Birney
birney@ebi.ac.uk
Mon, 20 Nov 2000 23:18:54 +0000 (GMT)
On Mon, 20 Nov 2000 ableasby@hgmp.mrc.ac.uk wrote:
> Jason writes:
>
> >(I'm not leaving EMBOSS out in the cold here intentionally, I just don't
> >know how/where they fit in right now.)
>
> First, just committed some EMBOSS (ajax) code to the repository on
> bio.perl.org along with an application "corbatest" to retrieve
> sequence and feature data using IDL 0.2 (not pretty but enough
> for what's described below).
Cool.
>
> Secondly, I'm not sure either without knowing what the other projects
> are doing with corba. From the EMBOSS point of view (the word
> philosophy is overused) we want to be able to provide seamless access
> to data from any source. So, you can get EMBL entries from emblcd
> indexed flatfiles, GCG databases (God help us), BLAST databases, SRS
> servers or by firing up external applications. The sequences are easy.
> When the database provision method gives feature table information it
> can be used. Currently the feature tables must be as supplied by EMBL
> or GenBank, so (e.g.) SRS is OK, BLAST isn't. Applications allow you,
> for example, to construct CDS sequences however they're joined,
> complemented or spread across entries.
>
Ok. This is not something that people have tackled head-on in biocorba yet,
though we could decide on this. This sort of "exploded" feature
is another pain-in-the-arse bequeathed to us by EMBL/GenBank. Again,
I know that bioperl has punted on this, as, I believe, has biocorba
(actually, it hasn't, though that makes implementing "start" and "end" on a
top seqfeature a challenge).
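To make the start/end problem concrete, here is a minimal sketch (Python, purely for illustration) of an "exploded" feature held as a list of segments. The `Segment` type and the `assemble_cds` / `feature_span` helpers are hypothetical; they are not part of the BioCORBA IDL, BioPerl or EMBOSS/AJAX. It assumes every segment lies on one sequence (features spread across entries are not handled) and that segments are listed in transcription order.

```python
from dataclasses import dataclass

# Hypothetical segment type, for illustration only (not BioCORBA IDL,
# not EMBOSS/AJAX). Coordinates are 1-based and inclusive, as in
# EMBL/GenBank feature tables.
@dataclass(frozen=True)
class Segment:
    start: int
    end: int
    minus: bool = False  # True if the segment lies on the complement strand

# Maps each base to its complement; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def assemble_cds(seq: str, segments: list[Segment]) -> str:
    """Concatenate segments in the order given (assumed transcription
    order), reverse-complementing minus-strand segments."""
    parts = []
    for seg in segments:
        if not (1 <= seg.start <= seg.end <= len(seq)):
            raise ValueError(f"bad segment {seg}")
        piece = seq[seg.start - 1 : seg.end]
        if seg.minus:
            piece = piece.translate(_COMPLEMENT)[::-1]
        parts.append(piece)
    return "".join(parts)


def feature_span(segments: list[Segment]) -> tuple[int, int]:
    """Start/end of the parent feature: the outermost bounds of all its
    segments. The span alone does not record the gaps between them."""
    if not segments:
        raise ValueError("feature has no segments")
    return min(s.start for s in segments), max(s.end for s in segments)


seq = "ATGAAACCCGGGTTT"
# Analogous to EMBL join(1..3,10..12)
plus = [Segment(1, 3), Segment(10, 12)]
# Analogous to complement(join(1..3,10..12)),
# i.e. join(complement(10..12),complement(1..3))
minus = [Segment(10, 12, True), Segment(1, 3, True)]
print(assemble_cds(seq, plus), assemble_cds(seq, minus), feature_span(plus))
```

The parent's span (1..12 here, for either strand) is easy to compute, but it throws away the gap between the segments, which is why a plain "start" and "end" on the top feature is awkward for split locations.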
> At the moment we're reorganising the internal representation of features
> to what I've christened the EMBOSS Flexible Open Feature Format (EFOFF)
> although the following points apply to the current format also.
>
> Given the IDL we can get the sequence quite quickly. To get the full
> feature information takes lots of iterations and this can take quite a
> while to load up an object.
> We'll then have to write another routine to convert this info to EFOFF.
> (that's why I've committed the code; so another developer who's doing
> EFOFF can have a look to see if the information from IDL 0.2 can mesh).
>
Right.
> What would be ideal from our point of view would be if the original-format
> feature text could be retrieved in one (or a few) slurps. We could then
> use the existing parsers to get the information into EFOFF quickly.
>
I am *mightily* against this. In many cases (e.g. Ensembl, Biojava etc.)
there is no "original-format" feature table. It is just an object.
We have to cope with EMBL/GenBank features, but we don't have to remain
slavishly tied to them.
Can you describe EFOFF, perhaps as a C header file or a set of C structs?
It may well be that I could suggest a solution which did not require
extensive monkeying around.
Both Bioperl and Biojava very effectively wrap Biocorba so that it "looks
like" Bioperl/Biojava respectively. This leverages all the internals of
each library through one simple client layer. This would be my
recommendation for the EMBOSS client layer.
In addition, I would strongly suggest separating out the corba client and
corba server from the main EMBOSS code; otherwise EMBOSS gets a dependency on
ORBit. These days (with the prevalence of GNOME, and wherever GNOME goes,
ORBit goes...) this ain't so bad. But your users will bitch... ;)
> Other applications of CORBA for us will be (e.g.) remote execution of
> EMBOSS (or bio*) applications.
>
One thing I would really enjoy seeing is an EMBOSS biocorba *server*. This
would give access to all these icky database formats (e.g. BLAST, GCG etc.)
directly. Again, I can be helpful here...
> Hope that explains things a bit.
>
> (the other)
> Alan
>
> _______________________________________________
> Biocorba-l mailing list
> Biocorba-l@biocorba.org
> http://www.biocorba.org/mailman/listinfo/biocorba-l
>
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------