From mark.schreiber at novartis.com  Tue Jun  6 02:45:19 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 6 Jun 2006 14:45:19 +0800
Subject: [Biojava-dev] Proposed change to RichFormat interface
Message-ID: <OFFEE60AED.663C8D30-ON48257185.0024D97F-48257185.00251B97@EU.novartis.net>

Hi all -

I would like to propose a change to the RichFormat interface. I think we 
should do this now, as we haven't done a stable biojavax rollout yet, so 
interface changes should still be allowed. The additional methods would be:

public String currentLine();
public int currentLineNumber();

This would make debugging a lot easier. It would also make construction of 
a RichSeqIOListener that logs and debugs much easier. I was trying to do 
this a while back: I started a background process that parsed 6GB of 
GenBank records looking for records that failed. It worked OK but would be 
much better with the ability to query the RichFormat in the above way. We 
might even be able to make it a utility that people could run on suspect 
files to generate standard bug reports, making it easier for us to debug 
the parser code.

What do people think??
- Mark


From richard.holland at ebi.ac.uk  Tue Jun  6 04:10:40 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 06 Jun 2006 09:10:40 +0100
Subject: [Biojava-dev] Proposed change to RichFormat interface
In-Reply-To: <OFFEE60AED.663C8D30-ON48257185.0024D97F-48257185.00251B97@EU.novartis.net>
References: <OFFEE60AED.663C8D30-ON48257185.0024D97F-48257185.00251B97@EU.novartis.net>
Message-ID: <1149581440.3947.56.camel@texas.ebi.ac.uk>

Go for it. It would be very helpful.

-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From mark.schreiber at novartis.com  Tue Jun  6 05:41:41 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 6 Jun 2006 17:41:41 +0800
Subject: [Biojava-dev] Proposed change to RichFormat interface
Message-ID: <OFEF0C340B.CE10DB23-ON48257185.0034EC03-48257185.0035415D@EU.novartis.net>

Maybe the method should be something like

public String currentParseString()

The question is: should currentLineNumber be the start of the parse 
block or the end?

I would favour the start of the parse block. This would be more like 
compiler-type behaviour but might be trickier to code.
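
To make the "start of the parse block" option concrete, here is a minimal sketch. Only the method names currentLine() and currentLineNumber() come from the proposal; the ParsePositionAware interface and the ToyBlockParser class are hypothetical stand-ins, not BioJava code, and a "block" here is assumed to be a run of non-blank lines.

```java
import java.util.List;

// Hypothetical interface: only the two method names come from the proposal.
interface ParsePositionAware {
    String currentLine();      // text of the block currently being parsed
    int currentLineNumber();   // 1-based line on which that block started
}

// Toy parser, not a real RichFormat: a block is a run of non-blank lines.
final class ToyBlockParser implements ParsePositionAware {
    private final List<String> lines;
    private int next = 0;          // index of the next unread line
    private int blockStart = 0;    // 1-based start line of the current block
    private String block = "";

    ToyBlockParser(List<String> lines) { this.lines = lines; }

    // Advances to the next block; returns false once the input is exhausted.
    boolean nextBlock() {
        // Skip blank separator lines before the block.
        while (next < lines.size() && lines.get(next).trim().isEmpty()) next++;
        if (next >= lines.size()) return false;
        blockStart = next + 1;     // remember where the block began
        StringBuilder sb = new StringBuilder();
        while (next < lines.size() && !lines.get(next).trim().isEmpty()) {
            if (sb.length() > 0) sb.append('\n');
            sb.append(lines.get(next++));
        }
        block = sb.toString();
        return true;
    }

    @Override public String currentLine() { return block; }
    @Override public int currentLineNumber() { return blockStart; }
}
```

With this shape, a logging listener that catches a parse failure could report "block starting at line N" together with the whole block text, which is what a compiler-style message would need.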

- Mark


Richard Holland <richard.holland at ebi.ac.uk>
06/06/2006 05:31 PM

 
        To:     Mark Schreiber <mark.schreiber at novartis.com>
        cc: 
        Subject:        Re: [Biojava-dev] Proposed change to RichFormat interface


It's worth pointing out that most of the parsers bunch together lines,
so the methods below would probably print out the line number on which
the group of lines started, followed by the entire group. Not sure if
that's exactly what you had in mind, but I'm sure it'd help a little
bit.

On Tue, 2006-06-06 at 14:45 +0800, mark.schreiber at novartis.com wrote:
> public String currentLine();
> public int currentLineNumber();


From hubert.prielinger at gmx.at  Mon Jun  5 18:49:29 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 05 Jun 2006 16:49:29 -0600
Subject: [Biojava-dev] retrieving species (common name)
Message-ID: <4484B4F9.9000502@gmx.at>

hi,
Is it possible with BioJava to retrieve just the species, meaning only the 
common name rather than the entire taxonomy, if I only have the accession ID 
or the name of the protein? If so, how do I start?
In my case:
I would retrieve the accession ID from my local database, pass it as a 
parameter to the program, retrieve the common name, and write the common name 
back into the database.
What I want to know is whether this retrieval is possible with BioJava.

thanks for help

Hubert

From richard.holland at ebi.ac.uk  Tue Jun  6 11:17:41 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 06 Jun 2006 16:17:41 +0100
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <4484B4F9.9000502@gmx.at>
References: <4484B4F9.9000502@gmx.at>
Message-ID: <1149607062.3947.92.camel@texas.ebi.ac.uk>

I'm not sure what you're asking for here. Could you explain in a little
more detail? Maybe write some example program code that assumes BioJava
works the way you'd like it to work in this situation, making up the
names of classes/methods that you might call in BioJava but don't yet
exist, then we can help you fill in the gaps. 

cheers,
Richard



From mark.schreiber at novartis.com  Wed Jun  7 02:02:51 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 7 Jun 2006 14:02:51 +0800
Subject: [Biojava-dev] Proposed change to RichFormat interface
Message-ID: <OFDD055B07.C740ACD8-ON48257186.00206C63-48257186.00213864@EU.novartis.net>

That might be a more elegant solution.

Could even make the InputStream implement RichSeqIOListener, so that it would 
be sending data to the RichFormat and listening to what the RichFormat 
makes of the data.

The InputStreamIOListener could note when the RichFormat emits a 
startXXX() event, record the line number, and start buffering all the data 
sent as the readLine() requests are made (while also sending it to the 
RichFormat). When the RichFormat emits the corresponding endXXX() event, 
the buffer can be cleared and the process starts again.

The only problem might be what to do when the RichFormat consumes data in 
between emitting events (which is allowed).
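
The buffering idea above can be sketched as follows. This is only an illustration under stated assumptions: ParseBlockTracker and its method names are hypothetical, a real version would be wired into an InputStream and a RichSeqIOListener, and lines read outside any start/end pair are simply kept in a separate list, which is one possible answer to the open question rather than a settled design.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: tracks which input lines belong to the open event.
final class ParseBlockTracker {
    private int lineNumber = 0;        // lines handed to the parser so far
    private int blockStartLine = -1;   // first line read after the start event
    private boolean inBlock = false;
    private final List<String> buffer = new ArrayList<>();
    private final List<String> outside = new ArrayList<>();

    // Called for every line the wrapped stream hands to the parser.
    void onLineRead(String line) {
        lineNumber++;
        if (inBlock) {
            buffer.add(line);
        } else {
            outside.add(line);         // consumed between events
        }
    }

    // Called when the parser emits a startXXX() event.
    void startEvent() {
        inBlock = true;
        blockStartLine = lineNumber + 1;   // assumption: block begins at the next line read
        buffer.clear();
    }

    // Called when the parser emits the matching endXXX() event.
    void endEvent() {
        inBlock = false;
        buffer.clear();
    }

    int blockStartLine() { return blockStartLine; }
    String bufferedText() { return String.join("\n", buffer); }
    List<String> linesOutsideEvents() { return outside; }
}
```

If the parser throws while an event is open, blockStartLine() and bufferedText() give the position and the raw text to put into a bug report.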

- Mark


Michael Heuer <heuermh at acm.org>
Sent by: Michael Heuer <heuermh at shell3.shore.net>
06/07/2006 01:51 PM

 
        To:     mark.schreiber at novartis.com
        cc:     biojava-dev at biojava.org
        Subject:        Re: [Biojava-dev] Proposed change to RichFormat interface


Another possibility would be to leave this sort of progress tracking up
to the client, in that they could wrap the InputStream in something like
a CountingInputStream before passing it to the parser(s):

http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html

   michael


From heuermh at acm.org  Wed Jun  7 01:51:42 2006
From: heuermh at acm.org (Michael Heuer)
Date: Wed, 7 Jun 2006 01:51:42 -0400 (EDT)
Subject: [Biojava-dev] Proposed change to RichFormat interface
In-Reply-To: <OFFEE60AED.663C8D30-ON48257185.0024D97F-48257185.00251B97@EU.novartis.net>
Message-ID: <Pine.GSO.4.44.0606070148350.26221-100000@shell3.shore.net>

Another possibility would be to leave this sort of progress tracking up
to the client, in that they could wrap the InputStream in something like
a CountingInputStream before passing it to the parser(s):

http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html
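
As a rough illustration of the client-side wrapping idea, here is a JDK-only sketch. It is a stand-in written for this example, not the Commons IO class: as far as I know the Commons IO CountingInputStream counts bytes, whereas the hypothetical LineCountingInputStream below also counts newline characters, since line numbers are what the thread is after.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical wrapper: counts bytes and '\n' characters as they are read.
final class LineCountingInputStream extends FilterInputStream {
    private long byteCount = 0;
    private long lineCount = 0;

    LineCountingInputStream(InputStream in) { super(in); }

    @Override public int read() throws IOException {
        int b = super.read();
        if (b >= 0) count(b);
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = 0; i < n; i++) count(buf[off + i]);
        return n;
    }

    // Re-reading after reset() would be counted twice, so marks are disabled.
    @Override public boolean markSupported() { return false; }

    private void count(int b) {
        byteCount++;
        if (b == '\n') lineCount++;
    }

    long getByteCount() { return byteCount; }
    long getLineCount() { return lineCount; }

    // Convenience for the example: read a stream to the end.
    static void drain(InputStream in) {
        byte[] buf = new byte[8192];
        try {
            while (in.read(buf) != -1) { /* discard */ }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One caveat of this approach: a parser that reads through a BufferedReader pulls data in large chunks, so the counts reflect how far the buffer has read ahead, not the line the parser is currently interpreting.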

   michael


From richard.holland at ebi.ac.uk  Wed Jun  7 08:36:49 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Wed, 07 Jun 2006 13:36:49 +0100
Subject: [Biojava-dev] Proposed change to RichFormat interface
In-Reply-To: <OFDD055B07.C740ACD8-ON48257186.00206C63-48257186.00213864@EU.novartis.net>
References: <OFDD055B07.C740ACD8-ON48257186.00206C63-48257186.00213864@EU.novartis.net>
Message-ID: <1149683810.3947.131.camel@texas.ebi.ac.uk>

Hi guys.

See org.biojavax.bio.seq.io.DebuggingRichSeqIOListener.

It extends BufferedInputStream, so can be used to wrap a normal
InputStream before being passed around.

It also implements RichSeqIOListener.

The idea is that you do something like this:

	Namespace ns = RichObjectFactory.getDefaultNamespace();
	InputStream is = new FileInputStream("myFastaFile.fasta");
	FASTAFormat format = new FASTAFormat();

	// Wrap the raw stream; the wrapper is both the input and the listener.
	DebuggingRichSeqIOListener debug = 
		new DebuggingRichSeqIOListener(is);
	BufferedReader br = new BufferedReader(
		new InputStreamReader(debug));

	SymbolTokenization symParser = format.guessSymbolTokenization(debug);

	// Parse one sequence, sending all events to the debugging listener.
	format.readRichSequence(
		br,
		symParser,
		debug,
		ns);

This will then dump out everything as it is read, and all events as they
happen in-line with the input as it is interpreted.

Hope this helps?

cheers,
Richard
 



From mark.schreiber at novartis.com  Wed Jun  7 21:03:22 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 8 Jun 2006 09:03:22 +0800
Subject: [Biojava-dev] Proposed change to RichFormat interface
Message-ID: <OF8EDFF654.E3B483EE-ON48257187.0005BED1-48257187.0005CD56@EU.novartis.net>

Very cool!

Can you put this example in the cookbook?

- Mark




From richard.holland at ebi.ac.uk  Mon Jun 12 04:52:53 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Mon, 12 Jun 2006 09:52:53 +0100
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <4489DF3F.4060504@gmx.at>
References: <4484B4F9.9000502@gmx.at>
	<1149607062.3947.92.camel@texas.ebi.ac.uk>  <4489DF3F.4060504@gmx.at>
Message-ID: <1150102373.3952.21.camel@texas.ebi.ac.uk>

I'm assuming your sequences and taxonomy data are stored in BioSQL. In
which case, it's fairly straightforward to get this information out
without having to drag all the features and annotations out as well, by
using BioEntry instead of RichSequence to query the database. Code like
this should work (hasn't been checked or anything, but it gives you an
idea as to how things should go):

	// connect to BioSQL and establish a Hibernate Session
	Session sess = ...;

	// set up BioJavaX to use the session
	RichObjectFactory.connectToBioSQL(sess);

	// instantiate the class that gets BioEntries from BioSQL.
	// use BioSQLRichSequenceDB instead if you want features and
	// annotations included.
	BioEntryDB db = new BioSQLBioEntryDB(sess);

	// get BioEntry for accession (accession must be the
	// primary accession of the sequence, as found in the
	// 'name' column in the 'bioentry' table in the database).
	BioEntry be = db.getBioEntry("YPOL_IBDVS");

	// get BioEntry's taxon object
	NCBITaxon tax = be.getTaxon();

	// print the names. Each name belongs to a name class.
	for (Iterator i = tax.getNameClasses().iterator(); 
		i.hasNext(); 
	) {
		String nameClass = (String)i.next();
		for (Iterator k = tax.getNames(nameClass).iterator(); 
			k.hasNext(); 
		) {
			String name = (String)k.next();
			System.out.println(nameClass+" : "+name);
		}
	}
	

If your sequences and taxonomy data are not stored in BioSQL, then the
only way to do this is to parse the taxonomy data on startup, parse the
sequences on startup into a simple in-memory system such as
HashRichSequenceDB, then use the methods on the RichSequenceDB interface
to obtain sequences by accession before continuing as per the example
above.
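
The name-class loop in the example above prints every name. To pick out just one, such as the common name, a small helper along these lines might do. This is a sketch under assumptions: TaxonNamePicker is a hypothetical class, a plain Map stands in for the taxon object, and the name-class strings ("common name", "genbank common name", "scientific name") are assumed to match the ones stored with your taxonomy data.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper: chooses one name from a name-class -> names mapping.
final class TaxonNamePicker {
    // Returns the first name found in the first matching class, or null if none match.
    static String pick(Map<String, Set<String>> namesByClass, String... preferredClasses) {
        for (String nameClass : preferredClasses) {
            Set<String> names = namesByClass.get(nameClass);
            if (names != null && !names.isEmpty()) {
                return names.iterator().next();
            }
        }
        return null;
    }
}
```

With a real taxon, the map could be built from getNameClasses() and getNames(nameClass) as in the loop above, and a call such as pick(map, "common name", "genbank common name") would fall back to the second class when the first is absent.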

cheers,
Richard




From Robin.Emig at pioneer.com  Mon Jun 12 15:01:12 2006
From: Robin.Emig at pioneer.com (Emig, Robin)
Date: Mon, 12 Jun 2006 12:01:12 -0700
Subject: [Biojava-dev] Read/Write Account
Message-ID: <BE0B1B72A3A05C448AD3F6FB6B6D9A137D668E@rcy1ms01.phibred.com>

Can I get a read write account for biojava? I used to have one under
remig, or raemig.

Thanks

Robin

 
Robin Emig
Pioneer HiBred/Dupont
700A Bay Road
Redwood City, CA 94063
650-298-3564

 
From hubert.prielinger at gmx.at  Fri Jun  9 16:51:11 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 09 Jun 2006 14:51:11 -0600
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <1149607062.3947.92.camel@texas.ebi.ac.uk>
References: <4484B4F9.9000502@gmx.at>
	<1149607062.3947.92.camel@texas.ebi.ac.uk>
Message-ID: <4489DF3F.4060504@gmx.at>

hi,
sorry for replying so late.
I have XML BLAST outputs, from which you can retrieve information such as 
accession ID, protein name, length of sequence, and so on,
but there is no way to retrieve the taxonomy (especially the 
scientific name or common name).
I need the common and scientific name for each BLAST hit. I have found 
in biojava-live/src/org/biojava/bibliography/taxa a few code examples 
that could suit my
task (e.g. simpleTaxon.java).

e.g. I have the accession ID YPOL_IBDVS
and I want to get the taxonomy of that protein, not necessarily the 
entire taxonomy but the scientific and common names mentioned above.
I don't know exactly how to get the taxonomy. It seems that there is 
no direct way from the accession ID, only via the taxon ID, but I 
don't know how to get that either.
It must be possible to map the accession ID to the taxon ID and then 
request the taxonomy with the taxon ID, if I understand it correctly.

thanks in advance
regards
Hubert




From hubert.prielinger at gmx.at  Fri Jun  9 18:10:12 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 09 Jun 2006 16:10:12 -0600
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <1149607062.3947.92.camel@texas.ebi.ac.uk>
References: <4484B4F9.9000502@gmx.at>
	<1149607062.3947.92.camel@texas.ebi.ac.uk>
Message-ID: <4489F1C4.3030803@gmx.at>

OK, by accession ID I mean the GenBank ID. If I have the GenBank ID, 
is there a direct way to get the common name?




From hubert.prielinger at gmx.at  Mon Jun 12 12:36:32 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 12 Jun 2006 10:36:32 -0600
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <1150102373.3952.21.camel@texas.ebi.ac.uk>
References: <4484B4F9.9000502@gmx.at>	
	<1149607062.3947.92.camel@texas.ebi.ac.uk>
	<4489DF3F.4060504@gmx.at>
	<1150102373.3952.21.camel@texas.ebi.ac.uk>
Message-ID: <448D9810.1060703@gmx.at>

hi,
No, I'm not using BioSQL; it is an ordinary MySQL database and I have only 
the GenBank accession ID available. I want to get the taxonomy from the 
accession ID, if that is possible.

regards
Hubert




From richard.holland at ebi.ac.uk  Tue Jun 13 04:58:21 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 13 Jun 2006 09:58:21 +0100
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <448D9810.1060703@gmx.at>
References: <4484B4F9.9000502@gmx.at>
	<1149607062.3947.92.camel@texas.ebi.ac.uk>  <4489DF3F.4060504@gmx.at>
	<1150102373.3952.21.camel@texas.ebi.ac.uk>  <448D9810.1060703@gmx.at>
Message-ID: <1150189101.3952.47.camel@texas.ebi.ac.uk>

At present, BJX only has bindings to BioSQL (which can be installed in
Oracle, MySQL, PostgreSQL, or HSQL depending on your preference). It
doesn't know how to access sequence/taxonomy data stored in other
databases. Of course, it can still read flat files.

Without a database which BJX understands, the only way to do what you
describe is to load taxonomy data from the NCBI taxonomy files into
memory on startup, then set up some mechanism of parsing Genbank records
on the fly according to accession number... I could go into detail but
it's a bit complex.

So the short answer is - no, you can't do that kind of query without
coming up with some clever way of using file parsers efficiently on the
fly, or by storing everything in a BioSQL database. Have a look at
RichSequenceListener if you want to selectively parse sequence files.
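
To give a rough idea of the first half (getting the taxonomy names into
memory), here is a minimal sketch in plain modern Java rather than
anything from BJX. It assumes the usual names.dmp layout from the NCBI
taxonomy dump (tax_id | name_txt | unique name | name class |, with tabs
around the bars). TaxonNames and its methods are made-up names for
illustration, and the step that maps an accession to a taxon id (which
would come from the parsed sequence records) is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BioJava: keeps NCBI names in memory.
class TaxonNames {
    // taxon id -> name class -> names
    private final Map<Integer, Map<String, List<String>>> byTaxon = new HashMap<>();

    // Adds one line of names.dmp; lines with too few fields are skipped.
    void addLine(String line) {
        // fields are separated by TAB '|' TAB, and the line ends with TAB '|'
        String[] f = line.split("\\t\\|\\t?");
        if (f.length < 4) {
            return;
        }
        int taxId = Integer.parseInt(f[0].trim());
        byTaxon.computeIfAbsent(taxId, k -> new HashMap<>())
               .computeIfAbsent(f[3].trim(), k -> new ArrayList<>())
               .add(f[1].trim());
    }

    // Returns the names of one class for a taxon, or an empty list.
    List<String> names(int taxId, String nameClass) {
        return byTaxon.getOrDefault(taxId, Collections.emptyMap())
                      .getOrDefault(nameClass, Collections.emptyList());
    }
}
```

You would call addLine() for every line of names.dmp on startup, then
ask for names(taxId, "scientific name") or names(taxId, "genbank common
name") once you know the taxon id for an accession.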

cheers,
Richard

On Mon, 2006-06-12 at 10:36 -0600, Hubert Prielinger wrote:
> > If your sequences and taxonomy data are not stored in BioSQL, then the
> > only way to do this is to parse the taxonomy data on startup, parse the
> > sequences on startup into a simple in-memory system such as
> > HashRichSequenceDB, then use the methods on the RichSequenceDB interface
> > to obtain sequences by accession before continuing as per the example
> > above.
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From richard.holland at ebi.ac.uk  Tue Jun 13 11:20:15 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 13 Jun 2006 16:20:15 +0100
Subject: [Biojava-dev] Read/Write Account
In-Reply-To: <BE0B1B72A3A05C448AD3F6FB6B6D9A137D668E@rcy1ms01.phibred.com>
References: <BE0B1B72A3A05C448AD3F6FB6B6D9A137D668E@rcy1ms01.phibred.com>
Message-ID: <1150212015.3952.121.camel@texas.ebi.ac.uk>

Hi Robin. Mark should be able to set you up with one, or point you to
the person who can. I can never remember who's in charge.

Purely out of interest, what are you planning on changing once you get
access? It's useful to know what people are up to out there so we don't
duplicate effort.

cheers,
Richard

On Mon, 2006-06-12 at 12:01 -0700, Emig, Robin wrote:
> Can I get a read write account for biojava? I used to have one under
> remig, or raemig.
> 
> Thanks
> 
> Robin
> 
>  
> 
> Robin Emig
> 
> Pioneer HiBred/Dupont
> 
> 700A Bay Road
> 
> Redwood City, CA 94063
> 
> 650-298-3564
> 
>  
> 
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From mark.schreiber at novartis.com  Sun Jun 18 22:38:38 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Mon, 19 Jun 2006 10:38:38 +0800
Subject: [Biojava-dev] retrieving species (common name)
Message-ID: <OFACBEF79C.494DD122-ON48257192.000E71BA-48257192.000E861E@EU.novartis.net>

You could try NCBI's e-utils webservice. This might enable you to get the 
common name using a GI number.
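
As a starting point, here is a minimal sketch of how the request URLs
could be built in Java. It assumes the esummary and efetch endpoints of
E-utilities under eutils.ncbi.nlm.nih.gov; EUtilsUrls and its methods are
made-up names, nothing is fetched or parsed here, and whether a given
identifier (GI number or accession) is accepted by the protein database
is something you would need to check against NCBI yourself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of BioJava: builds E-utilities URLs only.
class EUtilsUrls {
    static final String BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    // Summary of a protein record; the XML reply should include a TaxId item.
    static String proteinSummary(String giOrAccession) {
        try {
            return BASE + "esummary.fcgi?db=protein&id="
                    + URLEncoder.encode(giOrAccession, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    // Full taxonomy record for a taxon id, returned as XML.
    static String taxonRecord(int taxId) {
        return BASE + "efetch.fcgi?db=taxonomy&id=" + taxId + "&retmode=xml";
    }
}
```

The idea would be two requests per hit: one to get from the protein to
its taxon id, and one to get the names for that taxon id.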

- Mark


Hubert Prielinger <hubert.prielinger at gmx.at>
Sent by: biojava-dev-bounces at lists.open-bio.org
06/13/2006 12:36 AM

 
        To:     Richard Holland <richard.holland at ebi.ac.uk>, 
biojava-dev at lists.open-bio.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        Re: [Biojava-dev] retrieving species (common name)


hi,
No, I'm not using BioSQL; it is an ordinary MySQL database and I have only 
the genbank accession id available. I want to get the taxonomy with the 
accession id, if that is possible.

regards
Hubert


Richard Holland wrote:
> I'm assuming your sequences and taxonomy data are stored in BioSQL. In
> which case, it's fairly straightforward to get this information out
> without having to drag all the features and annotations out as well, by
> using BioEntry instead of RichSequence to query the database. Code like
> this should work (hasn't been checked or anything, but it gives you an
> idea as to how things should go):
>
>	// connect to BioSQL and establish a Hibernate Session
>	Session sess = ...;
>
>	// set up BioJavaX to use the session
>	RichObjectFactory.connectToBioSQL(sess);
>
>	// instantiate the class that gets BioEntries from BioSQL.
>	// use BioSQLRichSequenceDB instead if you want features and
>	// annotations included.
>	BioEntryDB db = new BioSQLBioEntryDB(sess);
>
>	// get BioEntry for accession (accession must be the
>	// primary accession of the sequence, as found in the
>	// 'name' column in the 'bioentry' table in the database).
>	BioEntry be = db.getBioEntry("YPOL_IBDVS");
>
>	// get BioEntry's taxon object
>	NCBITaxon tax = be.getTaxon();
>
>	// print the names. Each name belongs to a name class.
>	for (Iterator i = tax.getNameClasses().iterator(); 
>		i.hasNext(); 
>	) {
>		String nameClass = (String)i.next();
>		for (Iterator k = tax.getNames(nameClass).iterator(); 
>			k.hasNext(); 
>		) {
>			String name = (String)k.next();
>			System.out.println(nameClass+" : "+name);
>		}
>	}
> 
>
> If your sequences and taxonomy data are not stored in BioSQL, then the
> only way to do this is to parse the taxonomy data on startup, parse the
> sequences on startup into a simple in-memory system such as
> HashRichSequenceDB, then use the methods on the RichSequenceDB interface
> to obtain sequences by accession before continuing as per the example
> above.
>
> cheers,
> Richard
>
>
> On Fri, 2006-06-09 at 14:51 -0600, Hubert Prielinger wrote:
> 
>> hi,
>> sorry for replying that late,
>> I have XML blast outputs, from which you can retrieve information like 
>> accession id, protein name, length of sequence, and so on,
>> but there is no possibility to retrieve the taxonomy (especially the 
>> scientific name or common name)
>> I need the common and scientific name from each blast hit. I have found 
>> in biojava-live/src/org/biojava/bibliography/taxa  a few code examples 
>> that could suit my
>> task (e.g: simpleTaxon.java)
>>
>> eg: I have the accession id: YPOL_IBDVS
>> and I want to get the taxonomy of that protein, not necessarily the 
>> entire taxonomy but the scientific and common name mentioned above.
>> I don't know exactly how to get the taxonomy; it seems that there is 
>> no direct way from the accession id, only via the taxon id, but I 
>> don't know how to get that either.....
>> it must be possible to map the accession id to the taxon id and then 
>> request with the taxon id the taxonomy, if I get it right.....
>>
>> thanks in advance
>> regards
>> Hubert
>>
>>
>> Richard Holland wrote:
>> 
>>> I'm not sure what you're asking for here. Could you explain in a little
>>> more detail? Maybe write some example program code that assumes BioJava
>>> works the way you'd like it to work in this situation, making up the
>>> names of classes/methods that you might call in BioJava but don't yet
>>> exist, then we can help you fill in the gaps. 
>>>
>>> cheers,
>>> Richard
>>>
>>> On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote:
>>> 
>>> 
>>>> hi,
>>>> Is it possible with biojava to retrieve the species not the entire 
>>>> taxonomy, only the common name if I only have the accession id or the 
>>>> name of the protein and if yes
>>>> how to start.....
>>>> In my case:
>>>> I would retrieve the accession id from my local database then assign as 
>>>> parameter to the program, retrieve common name and write the common name 
>>>> back into the database....
>>>> the thing I want to know is the retrieving possible with biojava?
>>>>
>>>> thanks for help
>>>>
>>>> Hubert
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>> 
>>>> 

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev


From gwaldon at geneinfinity.org  Thu Jun 29 19:33:55 2006
From: gwaldon at geneinfinity.org (george waldon)
Date: Thu, 29 Jun 2006 16:33:55 -0700
Subject: [Biojava-dev] Problem with SimpleDocRefTest
Message-ID: <200606292333.k5TNXwdC026213@mmm1924.dulles19-verio.com>

Hi,

I am trying to run the tests but my compiler complains about org.biojavax.SimpleDocRefTest at compilation:
missing constructor SimpleDocRef(java.util.List, java.lang.String)

Was it expected to interpret that as SimpleDocRef(java.util.List, java.lang.String, null) (the three-argument constructor)?

Thanks,
George

From richard.holland at ebi.ac.uk  Fri Jun 30 05:11:11 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 30 Jun 2006 10:11:11 +0100
Subject: [Biojava-dev] Problem with SimpleDocRefTest
In-Reply-To: <200606292333.k5TNXwdC026213@mmm1924.dulles19-verio.com>
References: <200606292333.k5TNXwdC026213@mmm1924.dulles19-verio.com>
Message-ID: <1151658671.3942.75.camel@texas.ebi.ac.uk>

The SimpleDocRef constructor changed recently to include document titles
and I don't think the test was updated to match it. Sorry about that -
the head branch of CVS is always under development so cannot always be
guaranteed to work out 100%.

Mark, can you update the tests?

cheers,
Richard

On Thu, 2006-06-29 at 16:33 -0700, george waldon wrote:
> Hi,
> 
> I am trying to run the tests but my compiler complains about org.biojavax.SimpleDocRefTest at compilation:
> missing constructor SimpleDocRef(java.util.List, java.lang.String)
> 
> Was it expected to interpret that as SimpleDocRef(java.util.List, java.lang.String, null) (the three-argument constructor)?
> 
> Thanks,
> George
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From mark.schreiber at novartis.com  Tue Jun  6 09:41:41 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 6 Jun 2006 17:41:41 +0800
Subject: [Biojava-dev] Proposed change to RichFormat interface
Message-ID: <OFEF0C340B.CE10DB23-ON48257185.0034EC03-48257185.0035415D@EU.novartis.net>

maybe the method should be something like

public String currentParseString()

The question is should the currentLineNumber be the start of the parse 
block or the end?

I would favour the start of the parse block. This would be more like 
compiler-type behaviour but might be trickier to code??
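
To make the "start of the parse block" idea concrete, here is a minimal
sketch in plain Java. It is not a proposal for the actual RichFormat
code: ParseBlockTracker and startBlock() are made-up names, and it
assumes the parser can tell the tracker when it begins a new group of
lines.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical wrapper: remembers where the current group of lines began.
class ParseBlockTracker {
    private final BufferedReader in;
    private final StringBuilder block = new StringBuilder();
    private int linesRead = 0;   // lines handed out so far
    private int blockStart = 0;  // line number on which the current block began

    ParseBlockTracker(BufferedReader in) {
        this.in = in;
    }

    // Called by the parser just before it reads the first line of a block.
    void startBlock() {
        block.setLength(0);
        blockStart = linesRead + 1;
    }

    // Reads one line and appends it to the current block.
    String readLine() throws IOException {
        String line = in.readLine();
        if (line != null) {
            linesRead++;
            block.append(line).append('\n');
        }
        return line;
    }

    // Everything read since the last startBlock().
    String currentParseString() {
        return block.toString();
    }

    // Line number of the first line of the current block.
    int currentLineNumber() {
        return blockStart;
    }
}
```

Reporting the end of the block instead would only mean returning
linesRead rather than blockStart, so the start is not much harder as
long as the parser marks where each block begins.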

- Mark


Richard Holland <richard.holland at ebi.ac.uk>
06/06/2006 05:31 PM

 
        To:     Mark Schreiber <mark.schreiber at novartis.com>
        cc: 
        Subject:        Re: [Biojava-dev] Proposed change to RichFormat interface


It's worth pointing out that most of the parsers bunch together lines,
so the methods below would probably print out the line number on which
the group of lines started, followed by the entire group. Not sure if
that's exactly what you had in mind, but I'm sure it'd help a little
bit.

On Tue, 2006-06-06 at 14:45 +0800, mark.schreiber at novartis.com wrote:
> public String currentLine();
> public int currentLineNumber();
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From hubert.prielinger at gmx.at  Mon Jun  5 22:49:29 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 05 Jun 2006 16:49:29 -0600
Subject: [Biojava-dev] retrieving species (common name)
Message-ID: <4484B4F9.9000502@gmx.at>

hi,
Is it possible with biojava to retrieve the species not the entire 
taxonomy, only the common name if I only have the accession id or the 
name of the protein and if yes
how to start.....
In my case:
I would retrieve the accession id from my local database then assign as 
parameter to the program, retrieve common name and write the common name 
back into the database....
the thing I want to know is the retrieving possible with biojava?

thanks for help

Hubert


From richard.holland at ebi.ac.uk  Tue Jun  6 15:17:41 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 06 Jun 2006 16:17:41 +0100
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <4484B4F9.9000502@gmx.at>
References: <4484B4F9.9000502@gmx.at>
Message-ID: <1149607062.3947.92.camel@texas.ebi.ac.uk>

I'm not sure what you're asking for here. Could you explain in a little
more detail? Maybe write some example program code that assumes BioJava
works the way you'd like it to work in this situation, making up the
names of classes/methods that you might call in BioJava but don't yet
exist, then we can help you fill in the gaps. 

cheers,
Richard

On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote:
> hi,
> Is it possible with biojava to retrieve the species not the entire 
> taxonomy, only the common name if I only have the accession id or the 
> name of the protein and if yes
> how to start.....
> In my case:
> I would retrieve the accession id from my local database then assign as 
> parameter to the program, retrieve common name and write the common name 
> back into the database....
> the thing I want to know is the retrieving possible with biojava?
> 
> thanks for help
> 
> Hubert
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From mark.schreiber at novartis.com  Wed Jun  7 06:02:51 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 7 Jun 2006 14:02:51 +0800
Subject: [Biojava-dev] Proposed change to RichFormat interface
Message-ID: <OFDD055B07.C740ACD8-ON48257186.00206C63-48257186.00213864@EU.novartis.net>

That might be a more elegant solution.

Could even make the InputStream implement RichSeqIOListener thus it would 
be sending data to the RichFormat and listening to what the RichFormat 
makes of the data.

The InputStreamIOListener could remember when the RichFormat emits a 
startXXX() event record the line number and start buffering all the data 
sent as the readLine() requests are made (while also sending it to the 
RichFormat). When the RichFormat emits the corresponding endXXX() event 
the buffer can be cleared and the process starts again.

Only problem might be what to do when the RichFormat consumes data in 
between emitting events (which is allowed).
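
A minimal sketch of the buffering side of this, in plain Java with no
BioJava types. EventLineBuffer and its methods are made-up names; it
assumes something calls lineRead() for every line handed to the format
and start()/end() for each event pair, and it does not handle nested
events. Lines consumed while no event is open are simply kept in a
separate list, which is one possible (not necessarily good) answer to
the problem above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical buffer: groups lines by the event that was open when they were read.
class EventLineBuffer {
    private final List<String> inEvent = new ArrayList<>();
    private final List<String> unclaimed = new ArrayList<>();
    private boolean open = false;
    private int linesSeen = 0;
    private int eventStartLine = -1;

    // Called for every line handed to the format.
    void lineRead(String line) {
        linesSeen++;
        (open ? inEvent : unclaimed).add(line);
    }

    // Called on a startXXX() event: note the next line number, reset the buffer.
    void start() {
        open = true;
        eventStartLine = linesSeen + 1;
        inEvent.clear();
    }

    // Called on the matching endXXX() event: hand back what was buffered.
    List<String> end() {
        open = false;
        List<String> out = new ArrayList<>(inEvent);
        inEvent.clear();
        return out;
    }

    // Line number of the first line read after the last start().
    int eventStartLine() {
        return eventStartLine;
    }

    // Lines consumed while no event was open.
    List<String> unclaimed() {
        return unclaimed;
    }
}
```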

- Mark


Michael Heuer <heuermh at acm.org>
Sent by: Michael Heuer <heuermh at shell3.shore.net>
06/07/2006 01:51 PM

 
        To:     mark.schreiber at novartis.com
        cc:     biojava-dev at biojava.org
        Subject:        Re: [Biojava-dev] Proposed change to RichFormat interface


Mark Schreiber wrote:

> Hi all -
>
> I would like to propose a change  to the RichFormat interface. I think we
> should do this now as we haven't done a stable biojavax roll out yet so
> interface changes should still be allowed. The additional methods would be:
>
> public String currentLine();
> public int currentLineNumber();
>
> This would make debugging a lot easier, it would also make construction of
> a RichSeqIOListener that logs and debugs much easier. I was trying to do
> this a while back. I started a background process that parsed 6GB of
> genbank records looking for records that failed. It worked ok but would be
> much better with the ability to query the RichFormat in the above way. We
> might even be able to make it  a utility that people could run on suspect
> files and generate standard bug reports to make it easier for us to debug
> the parser code.
>
> What do people think??

Another possibility would be to leave this sort of progress tracking up
to the client, in that they could wrap the InputStream in something like
a CountingInputStream before passing it to the parser(s):

http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html

   michael


From heuermh at acm.org  Wed Jun  7 05:51:42 2006
From: heuermh at acm.org (Michael Heuer)
Date: Wed, 7 Jun 2006 01:51:42 -0400 (EDT)
Subject: [Biojava-dev] Proposed change to RichFormat interface
In-Reply-To: <OFFEE60AED.663C8D30-ON48257185.0024D97F-48257185.00251B97@EU.novartis.net>
Message-ID: <Pine.GSO.4.44.0606070148350.26221-100000@shell3.shore.net>

Mark Schreiber wrote:

> Hi all -
>
> I would like to propose a change  to the RichFormat interface. I think we
> should do this now as we haven't done a stable biojavax roll out yet so
> interface
> changes should still be allowed. The additional methods would be:
>
> public String currentLine();
> public int currentLineNumber();
>
> This would make debugging a lot easier, it would also make construction of
> a RichSeqIOListener that logs and debugs much easier. I was trying to do
> this a while back. I started a background process that parsed 6GB of
> genbank records looking for records that failed. It worked ok but would be
>
> much better with the ability to query the RichFormat in the above way. We
> might even be able to make it  a utility that people could run on suspect
> files and generate standard bug reports to make it easier for us to debug
> the parser code.
>
> What do people think??

Another possibility would be to leave this sort of progress tracking up
to the client, in that they could wrap the InputStream in something like
a CountingInputStream before passing it to the parser(s):

http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html
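
Roughly what that could look like, sketched with a hand-rolled
java.io.FilterInputStream instead of the commons-io class so that it
also counts newlines. LineCountingInputStream is a made-up name and this
is only an illustration of the wrapping idea.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: counts bytes and newlines as they pass through.
class LineCountingInputStream extends FilterInputStream {
    private long bytes = 0;
    private int newlines = 0;

    LineCountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            count(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = 0; i < n; i++) {
            count(buf[off + i]);
        }
        return n;
    }

    // mark/reset would make the counts wrong, so do not advertise support
    @Override
    public boolean markSupported() {
        return false;
    }

    private void count(int b) {
        bytes++;
        if (b == '\n') {
            newlines++;
        }
    }

    long bytesRead() {
        return bytes;
    }

    int linesRead() {
        return newlines;
    }
}
```

One caveat: if the parser reads through a BufferedReader, the counts
show how far the reader has pulled from the stream, which can be ahead
of the line the parser is actually working on.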

   michael


From richard.holland at ebi.ac.uk  Wed Jun  7 12:36:49 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Wed, 07 Jun 2006 13:36:49 +0100
Subject: [Biojava-dev] Proposed change to RichFormat interface
In-Reply-To: <OFDD055B07.C740ACD8-ON48257186.00206C63-48257186.00213864@EU.novartis.net>
References: <OFDD055B07.C740ACD8-ON48257186.00206C63-48257186.00213864@EU.novartis.net>
Message-ID: <1149683810.3947.131.camel@texas.ebi.ac.uk>

Hi guys.

See org.biojavax.bio.seq.io.DebuggingRichSeqIOListener.

It extends BufferedInputStream, so can be used to wrap a normal
InputStream before being passed around.

It also implements RichSeqIOListener.

The idea is that you do something like this:

	// default namespace for the sequences being read
	Namespace ns = RichObjectFactory.getDefaultNamespace();
	InputStream is = new FileInputStream("myFastaFile.fasta");
	FastaFormat format = new FastaFormat();

	// wrap the raw stream; the wrapper doubles as the listener
	DebuggingRichSeqIOListener debug = 
		new DebuggingRichSeqIOListener(is);
	BufferedReader br = new BufferedReader(
		new InputStreamReader(debug));

	SymbolTokenization symParser = format.guessSymbolTokenization(debug);

	format.readRichSequence(
		br,
		symParser,
		debug,
		ns);

This will then dump out everything as it is read, and all events as they
happen in-line with the input as it is interpreted.

Hope this helps?

cheers,
Richard
 

On Wed, 2006-06-07 at 14:02 +0800, mark.schreiber at novartis.com wrote:
> That might be a more elegant solution.
> 
> Could even make the InputStream implement RichSeqIOListener thus it would 
> be sending data to the RichFormat and listening to what the RichFormat 
> makes of the data.
> 
> The InputStreamIOListener could remember when the RichFormat emits a 
> startXXX() event record the line number and start buffering all the data 
> sent as the readLine() requests are made (while also sending it to the 
> RichFormat). When the RichFormat emits the corresponding endXXX() event 
> the buffer can be cleared and the process starts again.
> 
> Only problem might be what to do when the RichFormat consumes data in 
> between emitting events (which is allowed).
> 
> - Mark
> 
> 
> 
> 
> 
> Michael Heuer <heuermh at acm.org>
> Sent by: Michael Heuer <heuermh at shell3.shore.net>
> 06/07/2006 01:51 PM
> 
>  
>         To:     mark.schreiber at novartis.com
>         cc:     biojava-dev at biojava.org
>         Subject:        Re: [Biojava-dev] Proposed change to RichFormat interface
> 
> 
> Mark Schreiber wrote:
> 
> > Hi all -
> >
> > I would like to propose a change  to the RichFormat interface. I think we
> > should do this now as we haven't done a stable biojavax roll out yet so
> > interface changes should still be allowed. The additional methods would be:
> >
> > public String currentLine();
> > public int currentLineNumber();
> >
> > This would make debugging a lot easier, it would also make construction of
> > a RichSeqIOListener that logs and debugs much easier. I was trying to do
> > this a while back. I started a background process that parsed 6GB of
> > genbank records looking for records that failed. It worked ok but would be
> > much better with the ability to query the RichFormat in the above way. We
> > might even be able to make it  a utility that people could run on suspect
> > files and generate standard bug reports to make it easier for us to debug
> > the parser code.
> >
> > What do people think??
> 
> Another possibility would be to leave this sort of progress tracking up
> to the client, in that they could wrap the InputStream in something like
> a CountingInputStream before passing it to the parser(s):
> 
> http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html
> 
>    michael
> 
> 
> 
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From mark.schreiber at novartis.com  Thu Jun  8 01:03:22 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 8 Jun 2006 09:03:22 +0800
Subject: [Biojava-dev] Proposed change to RichFormat interface
Message-ID: <OF8EDFF654.E3B483EE-ON48257187.0005BED1-48257187.0005CD56@EU.novartis.net>

Very cool!

Can you put this example in the cookbook?

- Mark


Richard Holland <richard.holland at ebi.ac.uk>
Sent by: biojava-dev-bounces at lists.open-bio.org
06/07/2006 08:36 PM

 
        To:     Mark Schreiber <mark.schreiber at novartis.com>
        cc:     biojava-dev <biojava-dev at biojava.org>, Michael Heuer <heuermh at acm.org>, 
Michael Heuer <heuermh at shell3.shore.net>
        Subject:        Re: [Biojava-dev] Proposed change to RichFormat interface


Hi guys.

See org.biojavax.bio.seq.io.DebuggingRichSeqIOListener.

It extends BufferedInputStream, so can be used to wrap a normal
InputStream before being passed around.

It also implements RichSeqIOListener.

The idea is that you do something like this:

                 Namespace ns = RichObjectFactory.getDefaultNamespace();
                 InputStream is = new FileInputStream("myFastaFile.fasta");
                 FastaFormat format = new FastaFormat();

                 DebuggingRichSeqIOListener debug = 
                                 new DebuggingRichSeqIOListener(is);
                 BufferedReader br = new BufferedReader(
                                 new InputStreamReader(debug));

                 SymbolTokenization symParser = format.guessSymbolTokenization(debug);

                 format.readRichSequence(
                                 br,
                                 symParser,
                                 debug,
                                 ns);

This will then dump out everything as it is read, and all events as they
happen in-line with the input as it is interpreted.

Hope this helps?

cheers,
Richard
 

On Wed, 2006-06-07 at 14:02 +0800, mark.schreiber at novartis.com wrote:
> That might be a more elegant solution.
> 
> Could even make the InputStream implement RichSeqIOListener thus it would 
> be sending data to the RichFormat and listening to what the RichFormat 
> makes of the data.
> 
> The InputStreamIOListener could remember when the RichFormat emits a 
> startXXX() event record the line number and start buffering all the data 
> sent as the readLine() requests are made (while also sending it to the 
> RichFormat). When the RichFormat emits the corresponding endXXX() event 
> the buffer can be cleared and the process starts again.
> 
> Only problem might be what to do when the RichFormat consumes data in 
> between emitting events (which is allowed).
> 
> - Mark
> 
> 
> 
> 
> 
> Michael Heuer <heuermh at acm.org>
> Sent by: Michael Heuer <heuermh at shell3.shore.net>
> 06/07/2006 01:51 PM
> 
> 
>         To:     mark.schreiber at novartis.com
>         cc:     biojava-dev at biojava.org
>         Subject:        Re: [Biojava-dev] Proposed change to RichFormat interface
> 
> 
> Mark Schreiber wrote:
> 
> > Hi all -
> >
> > I would like to propose a change  to the RichFormat interface. I think we
> > should do this now as we haven't done a stable biojavax roll out yet so
> > interface changes should still be allowed. The additional methods would be:
> >
> > public String currentLine();
> > public int currentLineNumber();
> >
> > This would make debugging a lot easier, it would also make construction of
> > a RichSeqIOListener that logs and debugs much easier. I was trying to do
> > this a while back. I started a background process that parsed 6GB of
> > genbank records looking for records that failed. It worked ok but would be
> > much better with the ability to query the RichFormat in the above way. We
> > might even be able to make it  a utility that people could run on suspect
> > files and generate standard bug reports to make it easier for us to debug
> > the parser code.
> >
> > What do people think??
> 
> Another possibility would be to leave this sort of progress tracking up
> to the client, in that they could wrap the InputStream in something like
> a CountingInputStream before passing it to the parser(s):
> 
> http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html
> 
>    michael
> 
> 
> 
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
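
Michael's client-side idea can also be sketched with the JDK alone. The
example below swaps in java.io.LineNumberReader for the Commons IO
CountingInputStream he mentions, so it counts lines rather than bytes.
TrackingReader and positionAfter are hypothetical names made up for this
illustration, not BioJava API, and the sketch assumes the parser pulls
its input through readLine(); a parser that reads raw character buffers
would not update currentLine().

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class LineTrackingSketch {

    /** Hypothetical helper: a LineNumberReader that also remembers the last line it returned. */
    public static final class TrackingReader extends LineNumberReader {
        private String currentLine;

        public TrackingReader(Reader in) {
            super(in);
        }

        @Override
        public String readLine() throws IOException {
            String line = super.readLine();
            if (line != null) {
                currentLine = line; // keep the most recent non-null line for error reports
            }
            return line;
        }

        public String currentLine() {
            return currentLine;
        }
    }

    /** Reads up to linesToRead lines (a stand-in for a parser) and reports where reading stopped. */
    public static String positionAfter(String text, int linesToRead) {
        TrackingReader in = new TrackingReader(new StringReader(text));
        try {
            for (int i = 0; i < linesToRead; i++) {
                if (in.readLine() == null) {
                    break; // end of input reached early
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return in.getLineNumber() + ": " + in.currentLine();
    }

    public static void main(String[] args) {
        // Made-up, GenBank-like text used only to drive the example.
        String record = "LOCUS       EXAMPLE\nDEFINITION  made-up record\n//\n";
        System.out.println(positionAfter(record, 2));
    }
}
```

Because LineNumberReader is a BufferedReader, such a wrapper can be
handed to any code that accepts a BufferedReader, and the caller can ask
it for the line number and line text when a parse exception is caught.
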
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev


From richard.holland at ebi.ac.uk  Mon Jun 12 08:52:53 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Mon, 12 Jun 2006 09:52:53 +0100
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <4489DF3F.4060504@gmx.at>
References: <4484B4F9.9000502@gmx.at>
	<1149607062.3947.92.camel@texas.ebi.ac.uk>  <4489DF3F.4060504@gmx.at>
Message-ID: <1150102373.3952.21.camel@texas.ebi.ac.uk>

I'm assuming your sequences and taxonomy data are stored in BioSQL. In
which case, it's fairly straightforward to get this information out
without having to drag all the features and annotations out as well, by
using BioEntry instead of RichSequence to query the database. Code like
this should work (hasn't been checked or anything, but it gives you an
idea as to how things should go):

	// connect to BioSQL and establish a Hibernate Session
	Session sess = ...;

	// set up BioJavaX to use the session
	RichObjectFactory.connectToBioSQL(sess);

	// instantiate the class that gets BioEntries from BioSQL.
	// use BioSQLRichSequenceDB instead if you want features and
	// annotations included.
	BioEntryDB db = new BioSQLBioEntryDB(sess);

	// get BioEntry for accession (accession must be the
	// primary accession of the sequence, as found in the
	// 'name' column in the 'bioentry' table in the database).
	BioEntry be = db.getBioEntry("YPOL_IBDVS");

	// get BioEntry's taxon object
	NCBITaxon tax = be.getTaxon();

	// print the names. Each name belongs to a name class.
	for (Iterator i = tax.getNameClasses().iterator(); 
		i.hasNext(); 
	) {
		String nameClass = (String)i.next();
		for (Iterator k = tax.getNames(nameClass).iterator(); 
			k.hasNext(); 
		) {
			String name = (String)k.next();
			System.out.println(nameClass+" : "+name);
		}
	}
	

If your sequences and taxonomy data are not stored in BioSQL, then the
only way to do this is to parse the taxonomy data on startup, parse the
sequences on startup into a simple in-memory system such as
HashRichSequenceDB, then use the methods on the RichSequenceDB interface
to obtain sequences by accession before continuing as per the example
above.

cheers,
Richard


On Fri, 2006-06-09 at 14:51 -0600, Hubert Prielinger wrote:
> hi,
> sorry for replying so late.
> I have XML BLAST outputs, from which you can retrieve information like
> accession id, protein name, length of sequence and so on,
> but there is no way to retrieve the taxonomy (especially the
> scientific name or common name).
> I need the common and scientific name for each BLAST hit. I have found
> in biojava-live/src/org/biojava/bibliography/taxa a few code examples
> that could suit my task (e.g. simpleTaxon.java).
> 
> e.g. I have the accession id YPOL_IBDVS
> and I want to get the taxonomy of that protein, not necessarily the
> entire taxonomy, just the scientific and common names mentioned above.
> I don't know exactly how to get the taxonomy. It seems that there is
> no direct way from the accession id, only via the taxon id, but I
> don't know how to get that either.
> It must be possible to map the accession id to the taxon id and then
> request the taxonomy with the taxon id, if I understand it correctly.
> 
> thanks in advance
> regards
> Hubert
> 
> 
> Richard Holland wrote:
> > I'm not sure what you're asking for here. Could you explain in a little
> > more detail? Maybe write some example program code that assumes BioJava
> > works the way you'd like it to work in this situation, making up the
> > names of classes/methods that you might call in BioJava but don't yet
> > exist, then we can help you fill in the gaps. 
> >
> > cheers,
> > Richard
> >
> > On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote:
> >   
> >> hi,
> >> Is it possible with biojava to retrieve the species not the entire 
> >> taxonomy, only the common name if I only have the accession id or the 
> >> name of the protein and if yes
> >> how to start.....
> >> In my case:
> >> I would retrieve the accession id from my local database then assign as 
> >> parameter to the program, retrieve common name and write the common name 
> >> back into the database....
> >> the thing I want to know is the retrieving possible with biojava?
> >>
> >> thanks for help
> >>
> >> Hubert
> >> _______________________________________________
> >> biojava-dev mailing list
> >> biojava-dev at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/biojava-dev
> >>     
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From Robin.Emig at pioneer.com  Mon Jun 12 19:01:12 2006
From: Robin.Emig at pioneer.com (Emig, Robin)
Date: Mon, 12 Jun 2006 12:01:12 -0700
Subject: [Biojava-dev] Read/Write Account
Message-ID: <BE0B1B72A3A05C448AD3F6FB6B6D9A137D668E@rcy1ms01.phibred.com>

Can I get a read/write account for biojava? I used to have one under
remig or raemig.

Thanks

Robin

 
Robin Emig
Pioneer HiBred/Dupont
700A Bay Road
Redwood City, CA 94063
650-298-3564

 
From hubert.prielinger at gmx.at  Fri Jun  9 20:51:11 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 09 Jun 2006 14:51:11 -0600
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <1149607062.3947.92.camel@texas.ebi.ac.uk>
References: <4484B4F9.9000502@gmx.at>
	<1149607062.3947.92.camel@texas.ebi.ac.uk>
Message-ID: <4489DF3F.4060504@gmx.at>

hi,
sorry for replying so late.
I have XML BLAST outputs, from which you can retrieve information like
accession id, protein name, length of sequence and so on,
but there is no way to retrieve the taxonomy (especially the
scientific name or common name).
I need the common and scientific name for each BLAST hit. I have found
in biojava-live/src/org/biojava/bibliography/taxa a few code examples
that could suit my task (e.g. simpleTaxon.java).

e.g. I have the accession id YPOL_IBDVS
and I want to get the taxonomy of that protein, not necessarily the
entire taxonomy, just the scientific and common names mentioned above.
I don't know exactly how to get the taxonomy. It seems that there is
no direct way from the accession id, only via the taxon id, but I
don't know how to get that either.
It must be possible to map the accession id to the taxon id and then
request the taxonomy with the taxon id, if I understand it correctly.

thanks in advance
regards
Hubert


Richard Holland wrote:
> I'm not sure what you're asking for here. Could you explain in a little
> more detail? Maybe write some example program code that assumes BioJava
> works the way you'd like it to work in this situation, making up the
> names of classes/methods that you might call in BioJava but don't yet
> exist, then we can help you fill in the gaps. 
>
> cheers,
> Richard
>
> On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote:
>   
>> hi,
>> Is it possible with biojava to retrieve the species not the entire 
>> taxonomy, only the common name if I only have the accession id or the 
>> name of the protein and if yes
>> how to start.....
>> In my case:
>> I would retrieve the accession id from my local database then assign as 
>> parameter to the program, retrieve common name and write the common name 
>> back into the database....
>> the thing I want to know is the retrieving possible with biojava?
>>
>> thanks for help
>>
>> Hubert
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>     


From hubert.prielinger at gmx.at  Fri Jun  9 22:10:12 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 09 Jun 2006 16:10:12 -0600
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <1149607062.3947.92.camel@texas.ebi.ac.uk>
References: <4484B4F9.9000502@gmx.at>
	<1149607062.3947.92.camel@texas.ebi.ac.uk>
Message-ID: <4489F1C4.3030803@gmx.at>

OK, by accession id I mean the GenBank id. If I have the GenBank id,
is there a direct way to get the common name?


Richard Holland wrote:
> I'm not sure what you're asking for here. Could you explain in a little
> more detail? Maybe write some example program code that assumes BioJava
> works the way you'd like it to work in this situation, making up the
> names of classes/methods that you might call in BioJava but don't yet
> exist, then we can help you fill in the gaps. 
>
> cheers,
> Richard
>
> On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote:
>   
>> hi,
>> Is it possible with biojava to retrieve the species not the entire 
>> taxonomy, only the common name if I only have the accession id or the 
>> name of the protein and if yes
>> how to start.....
>> In my case:
>> I would retrieve the accession id from my local database then assign as 
>> parameter to the program, retrieve common name and write the common name 
>> back into the database....
>> the thing I want to know is the retrieving possible with biojava?
>>
>> thanks for help
>>
>> Hubert
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>     


From hubert.prielinger at gmx.at  Mon Jun 12 16:36:32 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 12 Jun 2006 10:36:32 -0600
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <1150102373.3952.21.camel@texas.ebi.ac.uk>
References: <4484B4F9.9000502@gmx.at>	
	<1149607062.3947.92.camel@texas.ebi.ac.uk>
	<4489DF3F.4060504@gmx.at>
	<1150102373.3952.21.camel@texas.ebi.ac.uk>
Message-ID: <448D9810.1060703@gmx.at>

hi,
No, I'm not using BioSQL; it is an ordinary MySQL database, and I have
only the GenBank accession id available. I want to get the taxonomy from
the accession id, if that is possible.

regards
Hubert


Richard Holland wrote:
> I'm assuming your sequences and taxonomy data are stored in BioSQL. In
> which case, it's fairly straightforward to get this information out
> without having to drag all the features and annotations out as well, by
> using BioEntry instead of RichSequence to query the database. Code like
> this should work (hasn't been checked or anything, but it gives you an
> idea as to how things should go):
>
> 	// connect to BioSQL and establish a Hibernate Session
> 	Session sess = ...;
>
> 	// set up BioJavaX to use the session
> 	RichObjectFactory.connectToBioSQL(sess);
>
> 	// instantiate the class that gets BioEntries from BioSQL.
> 	// use BioSQLRichSequenceDB instead if you want features and
> 	// annotations included.
> 	BioEntryDB db = new BioSQLBioEntryDB(sess);
>
> 	// get BioEntry for accession (accession must be the
> 	// primary accession of the sequence, as found in the
> 	// 'name' column in the 'bioentry' table in the database).
> 	BioEntry be = db.getBioEntry("YPOL_IBDVS");
>
> 	// get BioEntry's taxon object
> 	NCBITaxon tax = be.getTaxon();
>
> 	// print the names. Each name belongs to a name class.
> 	for (Iterator i = tax.getNameClasses().iterator(); 
> 		i.hasNext(); 
> 	) {
> 		String nameClass = (String)i.next();
> 		for (Iterator k = tax.getNames(nameClass).iterator(); 
> 			k.hasNext(); 
> 		) {
> 			String name = (String)k.next();
> 			System.out.println(nameClass+" : "+name);
> 		}
> 	}
> 	
>
> If your sequences and taxonomy data are not stored in BioSQL, then the
> only way to do this is to parse the taxonomy data on startup, parse the
> sequences on startup into a simple in-memory system such as
> HashRichSequenceDB, then use the methods on the RichSequenceDB interface
> to obtain sequences by accession before continuing as per the example
> above.
>
> cheers,
> Richard
>


From richard.holland at ebi.ac.uk  Tue Jun 13 08:58:21 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 13 Jun 2006 09:58:21 +0100
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <448D9810.1060703@gmx.at>
References: <4484B4F9.9000502@gmx.at>
	<1149607062.3947.92.camel@texas.ebi.ac.uk>  <4489DF3F.4060504@gmx.at>
	<1150102373.3952.21.camel@texas.ebi.ac.uk>  <448D9810.1060703@gmx.at>
Message-ID: <1150189101.3952.47.camel@texas.ebi.ac.uk>

At present, BioJavaX (BJX) only has bindings to BioSQL (which can be
installed in Oracle, MySQL, PostgreSQL, or HSQLDB depending on your
preference). It doesn't know how to access sequence/taxonomy data stored
in other database schemas. Of course, it can still read flat files.

Without a database which BJX understands, the only way to do what you
describe is to load taxonomy data from the NCBI taxonomy files into
memory on startup, then set up some mechanism of parsing Genbank records
on the fly according to accession number... I could go into detail but
it's a bit complex.

So the short answer is: no, you can't do that kind of query without
coming up with some clever way of using file parsers efficiently on the
fly, or by storing everything in a BioSQL database. Have a look at
RichSeqIOListener if you want to selectively parse sequence files.
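
The "load taxonomy data into memory on startup" step can be sketched
with plain JDK collections. This is a minimal illustration, not BioJavaX
code: it assumes the names.dmp file from the NCBI taxonomy dump, where
each line holds tax_id, name_txt, unique name and name class separated
by tab-pipe-tab and terminated by tab-pipe. TaxonNamesSketch, addLine
and getNames are hypothetical names. Mapping an accession to its taxon
id is a separate step that this sketch does not cover.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaxonNamesSketch {

    // taxon id -> name class (e.g. "scientific name") -> names in that class
    private final Map<Integer, Map<String, List<String>>> names = new HashMap<>();

    /** Adds one names.dmp line; lines with fewer than four fields are ignored. */
    public void addLine(String line) {
        // Drop the trailing "\t|" terminator, then split on the "\t|\t" field separator.
        String body = line.endsWith("\t|") ? line.substring(0, line.length() - 2) : line;
        String[] fields = body.split("\\t\\|\\t", -1);
        if (fields.length < 4) {
            return;
        }
        int taxId = Integer.parseInt(fields[0].trim());
        String name = fields[1].trim();
        String nameClass = fields[3].trim();
        names.computeIfAbsent(taxId, k -> new HashMap<>())
             .computeIfAbsent(nameClass, k -> new ArrayList<>())
             .add(name);
    }

    /** Returns the names of one class for a taxon, or an empty list if none are known. */
    public List<String> getNames(int taxId, String nameClass) {
        Map<String, List<String>> byClass = names.get(taxId);
        if (byClass == null || !byClass.containsKey(nameClass)) {
            return Collections.emptyList();
        }
        return byClass.get(nameClass);
    }

    public static void main(String[] args) {
        TaxonNamesSketch taxa = new TaxonNamesSketch();
        // Two lines in names.dmp layout; a real program would read the whole file.
        taxa.addLine("9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|");
        taxa.addLine("9606\t|\thuman\t|\t\t|\tgenbank common name\t|");
        System.out.println(taxa.getNames(9606, "scientific name"));
        System.out.println(taxa.getNames(9606, "genbank common name"));
    }
}
```

The nested map mirrors the name-class loop in the BioSQL example earlier
in the thread: each taxon has several name classes, and each class can
hold more than one name.
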

cheers,
Richard

On Mon, 2006-06-12 at 10:36 -0600, Hubert Prielinger wrote:
> > If your sequences and taxonomy data are not stored in BioSQL, then
> > the only way to do this is to parse the taxonomy data on startup,
> > parse the sequences on startup into a simple in-memory system such
> > as HashRichSequenceDB, then use the methods on the RichSequenceDB
> > interface to obtain sequences by accession before continuing as per
> > the example above.
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From richard.holland at ebi.ac.uk  Tue Jun 13 15:20:15 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 13 Jun 2006 16:20:15 +0100
Subject: [Biojava-dev] Read/Write Account
In-Reply-To: <BE0B1B72A3A05C448AD3F6FB6B6D9A137D668E@rcy1ms01.phibred.com>
References: <BE0B1B72A3A05C448AD3F6FB6B6D9A137D668E@rcy1ms01.phibred.com>
Message-ID: <1150212015.3952.121.camel@texas.ebi.ac.uk>

Hi Robin. Mark should be able to set you up with one, or point you to
the person who can. I can never remember who's in charge.

Purely out of interest, what are you planning on changing once you get
access? It's useful to know what people are up to out there so we don't
duplicate effort.

cheers,
Richard

On Mon, 2006-06-12 at 12:01 -0700, Emig, Robin wrote:
> Can I get a read/write account for biojava? I used to have one under
> remig or raemig.
> 
> Thanks
> 
> Robin
> 
>  
> 
> Robin Emig
> Pioneer HiBred/Dupont
> 700A Bay Road
> Redwood City, CA 94063
> 650-298-3564
> 
>  
> 
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416


From mark.schreiber at novartis.com  Mon Jun 19 02:38:38 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Mon, 19 Jun 2006 10:38:38 +0800
Subject: [Biojava-dev] retrieving species (common name)
Message-ID: <OFACBEF79C.494DD122-ON48257192.000E71BA-48257192.000E861E@EU.novartis.net>

You could try NCBI's E-utilities (eutils) web service. This might enable
you to get the common name using a GI number.
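
As a rough sketch of what that could look like, the code below only
builds the two request URLs and does not contact NCBI. It assumes the
current E-utilities base URL, an elink call from the protein database to
the taxonomy database, and an esummary call on the resulting taxon id,
whose summary is expected to carry the scientific and common names.
EutilsUrlSketch and its method names are hypothetical, and the id 12345
is a made-up placeholder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EutilsUrlSketch {

    // Assumed base URL of the NCBI E-utilities service.
    private static final String BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    /** Step 1: URL asking which taxonomy record is linked to a protein id. */
    public static String proteinToTaxonomyLink(String proteinId) {
        return BASE + "elink.fcgi?dbfrom=protein&db=taxonomy&id=" + encode(proteinId);
    }

    /** Step 2: URL asking for the summary of a taxonomy record. */
    public static String taxonomySummary(String taxId) {
        return BASE + "esummary.fcgi?db=taxonomy&id=" + encode(taxId);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(proteinToTaxonomyLink("12345")); // placeholder id
        System.out.println(taxonomySummary("9606"));
    }
}
```

Each URL returns XML, so the taxon id from the first response would be
parsed out and fed into the second request. NCBI asks clients to limit
their request rate, so a batch job over many BLAST hits should pause
between calls.
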

- Mark


Hubert Prielinger <hubert.prielinger at gmx.at>
Sent by: biojava-dev-bounces at lists.open-bio.org
06/13/2006 12:36 AM

 
        To:     Richard Holland <richard.holland at ebi.ac.uk>, 
biojava-dev at lists.open-bio.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        Re: [Biojava-dev] retrieving species (common name)


hi,
No, I'm not using BioSQL; it is an ordinary MySQL database, and I have
only the GenBank accession id available. I want to get the taxonomy from
the accession id, if that is possible.

regards
Hubert




From gwaldon at geneinfinity.org  Thu Jun 29 23:33:55 2006
From: gwaldon at geneinfinity.org (george waldon)
Date: Thu, 29 Jun 2006 16:33:55 -0700
Subject: [Biojava-dev] Problem with SimpleDocRefTest
Message-ID: <200606292333.k5TNXwdC026213@mmm1924.dulles19-verio.com>

Hi,

I am trying to run the tests but my compiler complains about org.biojavax.SimpleDocRefTest at compilation:
missing constructor SimpleDocRef(java.util.List, java.lang.String)

Was it expected to interpret that as SimpleDocRef(java.util.List, java.lang.String, null) (the three-argument constructor)?

Thanks,
George


From richard.holland at ebi.ac.uk  Fri Jun 30 09:11:11 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 30 Jun 2006 10:11:11 +0100
Subject: [Biojava-dev] Problem with SimpleDocRefTest
In-Reply-To: <200606292333.k5TNXwdC026213@mmm1924.dulles19-verio.com>
References: <200606292333.k5TNXwdC026213@mmm1924.dulles19-verio.com>
Message-ID: <1151658671.3942.75.camel@texas.ebi.ac.uk>

The SimpleDocRef constructor changed recently to include document titles
and I don't think the test was updated to match it. Sorry about that -
the head branch of CVS is always under development, so it cannot be
guaranteed to compile and pass its tests at any given moment.

Mark, can you update the tests?
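
On George's question: Java never fills in a missing argument, so a
two-argument call will not be treated as the three-argument constructor
with a trailing null. Old call sites only keep compiling if a delegating
overload is kept, or if they are changed to pass the third argument
themselves. The sketch below shows the first option on a hypothetical
class, DocRefSketch, which is not the real SimpleDocRef.

```java
import java.util.Arrays;
import java.util.List;

public class DocRefSketch {

    private final List<String> authors;
    private final String location;
    private final String title; // may be null when no title is known

    /** Full constructor, analogous to a constructor that gained a title argument. */
    public DocRefSketch(List<String> authors, String location, String title) {
        this.authors = authors;
        this.location = location;
        this.title = title;
    }

    /** Convenience overload: keeps two-argument callers compiling by passing a null title. */
    public DocRefSketch(List<String> authors, String location) {
        this(authors, location, null);
    }

    public List<String> getAuthors() {
        return authors;
    }

    public String getLocation() {
        return location;
    }

    public String getTitle() {
        return title;
    }

    public static void main(String[] args) {
        // Made-up reference data used only to exercise both constructors.
        DocRefSketch oldStyle = new DocRefSketch(Arrays.asList("A. Author"), "Some Journal 1:1-2");
        DocRefSketch newStyle = new DocRefSketch(Arrays.asList("A. Author"), "Some Journal 1:1-2", "A title");
        System.out.println(oldStyle.getTitle());
        System.out.println(newStyle.getTitle());
    }
}
```
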

cheers,
Richard

On Thu, 2006-06-29 at 16:33 -0700, george waldon wrote:
> Hi,
> 
> I am trying to run the tests but my compiler complains about org.biojavax.SimpleDocRefTest at compilation:
> missing constructor SimpleDocRef(java.util.List, java.lang.String)
> 
> Was it expected to interpret that as SimpleDocRef(java.util.List, java.lang.String, null) (the three-argument constructor)?
> 
> Thanks,
> George
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416