[Bioperl-l] Re: GO dbxrefs in swissprot

Hilmar Lapp hlapp at gnf.org
Tue Jul 6 16:49:22 EDT 2004


Hi Ewan, how are you? :-)


On Jul 6, 2004, at 12:43 PM, Ewan Birney wrote:

> Ensembl is best accessed through the Ensembl Perl API

Bluntly, this is - although less extreme - like NCBI saying RefSeq is 
best accessed through the NCBI toolkit, and here's how you install 
that, and by the way we don't have time to create a genbank-formatted 
dump.

That is, I readily believe that if I wanted every detail and every 
context of the genome annotation that Ensembl produces, right down to 
every special case, then I shouldn't settle for anything less than the 
full power of the Ensembl Perl API.

Many times though the "best" access is the least troublesome, or most 
familiar, with some loss of content acknowledged. I'm willing to bet 
that most people access RefSeq not through the NCBI toolkit, and that 
that wouldn't change even if there were some content that would be 
absent in the genbank-formatted dump.

Do you foremost want to do a service to the community, or a service to 
your development group?

What would be extremely useful is if Ensembl provided a dump in a 
common flat-file format that contained all Ensembl-originated content 
that one cannot reproduce without a very significant computing and 
maintenance effort. As I see it, this would consist of all gene 
predictions, transcript predictions, protein predictions, and the 
results of the Ensembl annotation pipeline(s) for those predictions; 
localizations would be nice, but not required. It doesn't have to be 
EMBL format; any flat format that Bio::SeqIO supports and that doesn't 
require me to install yet another huge library whose update cycle I 
need to keep up with would be very helpful.

(Actually, would the gene-only dump you mentioned have all that as 
features and tags?)

IMNSHO this wouldn't be a nice-to-have; it would be terrific and 
tremendously increase the value of Ensembl once you're outside of the 
Ensembl website. It would also allow people (read: me ;) to, e.g., 
effortlessly load Ensembl along with RefSeq and Swiss-Prot into BioSQL.

Affy probe and any other public sequence mappings I can do myself given 
the genome sequence and my own BLAT server (besides, even without one, 
UCSC provides all of that for download anyway).

Anyway, my $0.02, which turns out to approach being worth less than a 
GBP penny ...

Beer in Glasgow? By now I have even managed to convince my credit card 
company not to shut down my account, and that Concorde Services is not 
a fraudulent UK male entertainment enterprise :-)

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


