[Biojava-dev] retrieving species (common name)

Richard Holland richard.holland at ebi.ac.uk
Mon Jun 12 08:52:53 UTC 2006


I'm assuming your sequences and taxonomy data are stored in BioSQL. In
that case, it's fairly straightforward to get this information out
without having to drag all the features and annotations out as well, by
using BioEntry instead of RichSequence to query the database. Code like
this should work (I haven't tested it, but it gives you an idea of how
things should go):

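	// imports needed by this snippet
	import java.util.Iterator;
	import org.hibernate.Session;
	import org.biojavax.RichObjectFactory;
	import org.biojavax.bio.BioEntry;
	import org.biojavax.bio.db.BioEntryDB;
	import org.biojavax.bio.db.biosql.BioSQLBioEntryDB;
	import org.biojavax.bio.taxa.NCBITaxon;

	// connect to BioSQL and establish a Hibernate Session
	Session sess = ...;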

	// set up BioJavaX to use the session
	RichObjectFactory.connectToBioSQL(sess);

	// instantiate the class that gets BioEntries from BioSQL.
	// use BioSQLRichSequenceDB instead if you want features and
	// annotations included.
	BioEntryDB db = new BioSQLBioEntryDB(sess);

	// get BioEntry for accession (accession must be the
	// primary accession of the sequence, as found in the
	// 'name' column in the 'bioentry' table in the database).
	BioEntry be = db.getBioEntry("YPOL_IBDVS");

	// get BioEntry's taxon object
	NCBITaxon tax = be.getTaxon();

	// print the names. Each name belongs to a name class.
	for (Iterator i = tax.getNameClasses().iterator(); i.hasNext(); ) {
		String nameClass = (String)i.next();
		for (Iterator k = tax.getNames(nameClass).iterator(); k.hasNext(); ) {
			String name = (String)k.next();
			System.out.println(nameClass+" : "+name);
		}
	}
	}
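
The loop above prints every name in every name class. If only the
scientific and common names are wanted, the same structure can be
queried by name class. The following is a minimal plain-Java sketch of
that idea, not BioJava code: TaxonNames, addName and firstName are
hypothetical names made up for illustration, and the strings
"scientific name" and "common name" are the name classes used in the
NCBI taxonomy dump, assumed here to be stored unchanged in the
database.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the name-class -> names structure that the
// loop above prints. TaxonNames is NOT a BioJava class.
class TaxonNames {
    private final Map<String, Set<String>> namesByClass = new LinkedHashMap<>();

    // Record one name under its name class.
    void addName(String nameClass, String name) {
        namesByClass.computeIfAbsent(nameClass, k -> new LinkedHashSet<>()).add(name);
    }

    // First name recorded for the class, or null if the class has none.
    String firstName(String nameClass) {
        Set<String> names = namesByClass.get(nameClass);
        return (names == null || names.isEmpty()) ? null : names.iterator().next();
    }

    public static void main(String[] args) {
        TaxonNames tax = new TaxonNames();
        // Illustrative values only, not read from any database.
        tax.addName("scientific name", "Homo sapiens");
        tax.addName("common name", "human");
        System.out.println("scientific: " + tax.firstName("scientific name"));
        System.out.println("common: " + tax.firstName("common name"));
    }
}
```

A taxon may have no name in a given class, so the null return needs
handling before the value is written back to a database.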
	

If your sequences and taxonomy data are not stored in BioSQL, then the
only way to do this is to load everything at startup: parse the
taxonomy data, parse the sequences into a simple in-memory system such
as HashRichSequenceDB, then use the methods on the RichSequenceDB
interface to obtain each sequence by accession before continuing from
the getTaxon() call as per the example above.
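
To make the in-memory route concrete, here is a rough plain-Java
sketch of the lookup pattern only: accession to taxon id, then taxon
id to names. It is not HashRichSequenceDB and uses no BioJava classes;
InMemoryTaxonIndex and its methods are hypothetical, parsing of the
sequence and taxonomy files is left out, and the accession
"EXAMPLE_HUMAN" is a made-up placeholder.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory index, filled once at startup.
// NOT a BioJava class; it only illustrates the two-step lookup.
class InMemoryTaxonIndex {
    private final Map<String, Integer> taxonIdByAccession = new HashMap<>();
    private final Map<Integer, Map<String, String>> namesByTaxonId = new HashMap<>();

    // Called while parsing sequences: remember each sequence's taxon id.
    void addSequence(String accession, int taxonId) {
        taxonIdByAccession.put(accession, taxonId);
    }

    // Called while parsing taxonomy data. Simplification: keeps only
    // one name per name class for each taxon.
    void addTaxonName(int taxonId, String nameClass, String name) {
        namesByTaxonId.computeIfAbsent(taxonId, k -> new HashMap<>()).put(nameClass, name);
    }

    // Returns the name, or null if the accession or name class is unknown.
    String lookup(String accession, String nameClass) {
        Integer taxonId = taxonIdByAccession.get(accession);
        if (taxonId == null) {
            return null;
        }
        Map<String, String> names = namesByTaxonId.get(taxonId);
        return names == null ? null : names.get(nameClass);
    }

    public static void main(String[] args) {
        InMemoryTaxonIndex index = new InMemoryTaxonIndex();
        index.addSequence("EXAMPLE_HUMAN", 9606); // placeholder accession
        index.addTaxonName(9606, "scientific name", "Homo sapiens");
        System.out.println(index.lookup("EXAMPLE_HUMAN", "scientific name"));
    }
}
```

Everything lives in memory, so this only suits data sets small enough
to load at startup.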

cheers,
Richard


On Fri, 2006-06-09 at 14:51 -0600, Hubert Prielinger wrote:
> hi,
> sorry for replying that late,
> I have XML BLAST outputs, from which you can retrieve information such as
> accession id, protein name, length of sequence, and so on,
> but there is no way to retrieve the taxonomy (especially the
> scientific name or common name).
> I need the common and scientific name for each BLAST hit. I have found
> in biojava-live/src/org/biojava/bibliography/taxa a few code examples
> that could suit my
> task (e.g. simpleTaxon.java)
> 
> e.g. I have the accession id YPOL_IBDVS
> and I want to get the taxonomy of that protein, not necessarily the
> entire taxonomy, just the scientific and common name mentioned above.
> I don't know exactly how to get the taxonomy; it seems that there is
> no direct way from the accession id, only via the taxon id, but I
> don't know how to get that either.....
> It must be possible to map the accession id to the taxon id and then
> request the taxonomy with the taxon id, if I understand it right.....
> 
> thanks in advance
> regards
> Hubert
> 
> 
> Richard Holland wrote:
> > I'm not sure what you're asking for here. Could you explain in a little
> > more detail? Maybe write some example program code that assumes BioJava
> > works the way you'd like it to work in this situation, making up the
> > names of classes/methods that you might call in BioJava but don't yet
> > exist, then we can help you fill in the gaps. 
> >
> > cheers,
> > Richard
> >
> > On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote:
> >   
> >> hi,
> >> Is it possible with biojava to retrieve the species, not the entire
> >> taxonomy, only the common name, if I only have the accession id or the
> >> name of the protein, and if yes,
> >> how to start.....
> >> In my case:
> >> I would retrieve the accession id from my local database, then pass it as a
> >> parameter to the program, retrieve the common name and write the common name
> >> back into the database....
> >> What I want to know is: is this retrieval possible with biojava?
> >>
> >> thanks for help
> >>
> >> Hubert
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416



