[Biojava-dev] retrieving species (common name)

Tue Jun 13 08:58:21 UTC 2006

At present, BioJavaX (BJX) only has bindings to BioSQL (which can be
installed in Oracle, MySQL, PostgreSQL, or HSQLDB, depending on your
preference). It doesn't know how to access sequence/taxonomy data stored
in other databases. Of course, it can still read flat files.

Without a database that BJX understands, the only way to do what you
describe is to load taxonomy data from the NCBI taxonomy files into
memory on startup, then set up some mechanism for parsing GenBank records
on the fly according to accession number. I could go into detail, but
it's a bit complex.
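
For what it's worth, here is a minimal plain-Java sketch of the in-memory
idea. It is not BioJavaX code. It assumes the names.dmp layout from the
NCBI taxonomy dump (pipe-delimited fields, with the name class in the
fourth field) and that each GenBank record carries a /db_xref="taxon:..."
qualifier on its source feature. The class and method names
(TaxonNameLookup, loadCommonNames, taxonIdOf) are made up for
illustration, and fetching the record by accession is left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaxonNameLookup {

    // Matches the taxon cross-reference in a GenBank source feature,
    // e.g. /db_xref="taxon:9606"
    private static final Pattern TAXON_XREF =
            Pattern.compile("/db_xref=\"taxon:(\\d+)\"");

    /** Builds a taxon id -> common name map from names.dmp-style lines. */
    public static Map<Integer, String> loadCommonNames(Iterable<String> lines) {
        Map<Integer, String> names = new HashMap<>();
        for (String line : lines) {
            // Fields: tax_id | name_txt | unique name | name class |
            String[] fields = line.split("\\|");
            if (fields.length < 4) {
                continue; // skip malformed lines
            }
            int taxonId;
            try {
                taxonId = Integer.parseInt(fields[0].trim());
            } catch (NumberFormatException e) {
                continue; // skip lines without a numeric taxon id
            }
            String name = fields[1].trim();
            String nameClass = fields[3].trim();
            if (nameClass.equals("genbank common name")) {
                names.put(taxonId, name);         // preferred name class
            } else if (nameClass.equals("common name")) {
                names.putIfAbsent(taxonId, name); // fallback only
            }
        }
        return names;
    }

    /** Returns the taxon id found in a GenBank record, or -1 if none. */
    public static int taxonIdOf(String genbankRecord) {
        Matcher m = TAXON_XREF.matcher(genbankRecord);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        // Two sample lines in names.dmp layout (real files have millions).
        List<String> dump = Arrays.asList(
                "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|",
                "9606\t|\thuman\t|\t\t|\tgenbank common name\t|");
        Map<Integer, String> commonNames = loadCommonNames(dump);

        // Fragment of a GenBank source feature, as it would be read from a file.
        String record = "     source          1..100\n"
                + "                     /organism=\"Homo sapiens\"\n"
                + "                     /db_xref=\"taxon:9606\"\n";
        int taxonId = taxonIdOf(record);
        System.out.println(taxonId + " -> " + commonNames.get(taxonId));
    }
}
```

In practice you would stream names.dmp from disk once at startup and keep
the map around for the lifetime of the program.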

So the short answer is no: you can't do that kind of query unless you
either come up with some clever way of using file parsers efficiently on
the fly, or store everything in a BioSQL database. Have a look at
RichSequenceListener if you want to selectively parse sequence files.

cheers,
Richard

On Mon, 2006-06-12 at 10:36 -0600, Hubert Prielinger wrote:
> > If your sequences and taxonomy data are not stored in BioSQL, then the
> > only way to do this is to parse the taxonomy data on startup, parse the
> > sequences on startup into a simple in-memory system such as
> > HashRichSequenceDB, then use the methods on the RichSequenceDB interface
> > to obtain sequences by accession before continuing as per the example
> > above.
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416