From mark.schreiber at novartis.com Tue Jun 6 02:45:19 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 6 Jun 2006 14:45:19 +0800
Subject: [Biojava-dev] Proposed change to RichFormat interface
Message-ID:

Hi all -

I would like to propose a change to the RichFormat interface. I think we
should do this now as we haven't done a stable biojavax roll out yet so
interface changes should still be allowed. The additional methods would be:

    public String currentLine();
    public int currentLineNumber();

This would make debugging a lot easier, it would also make construction of
a RichSeqIOListener that logs and debugs much easier. I was trying to do
this a while back. I started a background process that parsed 6GB of
genbank records looking for records that failed. It worked ok but would be
much better with the ability to query the RichFormat in the above way. We
might even be able to make it a utility that people could run on suspect
files and generate standard bug reports to make it easier for us to debug
the parser code.

What do people think??

- Mark

From richard.holland at ebi.ac.uk Tue Jun 6 04:10:40 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 06 Jun 2006 09:10:40 +0100
Subject: [Biojava-dev] Proposed change to RichFormat interface
In-Reply-To:
References:
Message-ID: <1149581440.3947.56.camel@texas.ebi.ac.uk>

Go for it. It would be very helpful.

On Tue, 2006-06-06 at 14:45 +0800, mark.schreiber at novartis.com wrote:
> Hi all -
>
> I would like to propose a change to the RichFormat interface. I think we
> should do this now as we haven't done a stable biojavax roll out yet so
> interface
> changes should still be allowed. The additional methods would be:
>
> public String currentLine();
> public int currentLineNumber();
>
> This would make debugging a lot easier, it would also make construction of
> a RichSeqIOListener that logs and debugs much easier. I was trying to do
> this a while back.
I started a background process that parsed 6GB of
> genbank records looking for records that failed. It worked ok but would be
> much better with the ability to query the RichFormat in the above way. We
> might even be able to make it a utility that people could run on suspect
> files and generate standard bug reports to make it easier for us to debug
> the parser code.
>
> What do people think??
> - Mark
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From mark.schreiber at novartis.com Tue Jun 6 05:41:41 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 6 Jun 2006 17:41:41 +0800
Subject: [Biojava-dev] Proposed change to RichFormat interface
Message-ID:

Maybe the method should be something like

    public String currentParseString()

The question is: should the currentLineNumber be the start of the parse
block or the end? I would favour the start of the parse block. This would
be more like compiler-type behaviour, but might be trickier to code?

- Mark

Richard Holland
06/06/2006 05:31 PM
To: Mark Schreiber
cc:
Subject: Re: [Biojava-dev] Proposed change to RichFormat interface

It's worth pointing out that most of the parsers bunch together lines, so
the methods below would probably print out the line number on which the
group of lines started, followed by the entire group. Not sure if that's
exactly what you had in mind, but I'm sure it'd help a little bit.
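
The "start of the parse block" behaviour discussed here could be pictured roughly as below. This is only a minimal sketch in plain Java, not BioJava code: the class `BlockTrackingReader` and its `startBlock()` method are hypothetical names made up for illustration, and only `currentParseString()` and `currentLineNumber()` are borrowed from the proposal. It assumes the parser can tell the reader when it begins collecting a new group of lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical helper (not part of BioJava): remembers the current group of lines. */
class BlockTrackingReader {
    private final BufferedReader in;
    private int linesRead = 0;       // total number of lines consumed so far
    private int blockStartLine = 1;  // 1-based line number where the current block began
    private final StringBuilder block = new StringBuilder();

    BlockTrackingReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    /** Assumed to be called by the parser when it starts bunching a new group of lines. */
    void startBlock() {
        block.setLength(0);
        blockStartLine = linesRead + 1;
    }

    /** Reads one line and appends it to the current block; returns null at end of input. */
    String readLine() {
        try {
            String line = in.readLine();
            if (line != null) {
                linesRead++;
                block.append(line).append('\n');
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** All lines read since the last startBlock(), each terminated by '\n'. */
    String currentParseString() {
        return block.toString();
    }

    /** Line number on which the current block started. */
    int currentLineNumber() {
        return blockStartLine;
    }
}
```

With this shape, a failure report would show the line on which the group started followed by the whole group, which matches the behaviour described above. Reporting the end of the block instead would only need `linesRead` to be exposed.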
On Tue, 2006-06-06 at 14:45 +0800, mark.schreiber at novartis.com wrote: > public String currentLine(); > public int currentLineNumber(); -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From hubert.prielinger at gmx.at Mon Jun 5 18:49:29 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Mon, 05 Jun 2006 16:49:29 -0600 Subject: [Biojava-dev] retrieving species (common name) Message-ID: <4484B4F9.9000502@gmx.at> hi, Is it possible with biojava to retrieve the species not the entire taxonomy, only the common name if I only have the accession id or the name of the protein and if yes how to start..... In my case: I would retrieve the accession id from my local database then assign as parameter to the program, retrieve common name and write the common name back into the database.... the thing I want to know is the retrieving possible with biojava? thanks for help Hubert From richard.holland at ebi.ac.uk Tue Jun 6 11:17:41 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Tue, 06 Jun 2006 16:17:41 +0100 Subject: [Biojava-dev] retrieving species (common name) In-Reply-To: <4484B4F9.9000502@gmx.at> References: <4484B4F9.9000502@gmx.at> Message-ID: <1149607062.3947.92.camel@texas.ebi.ac.uk> I'm not sure what you're asking for here. Could you explain in a little more detail? Maybe write some example program code that assumes BioJava works the way you'd like it to work in this situation, making up the names of classes/methods that you might call in BioJava but don't yet exist, then we can help you fill in the gaps. cheers, Richard On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote: > hi, > Is it possible with biojava to retrieve the species not the entire > taxonomy, only the common name if I only have the accession id or the > name of the protein and if yes > how to start..... 
> In my case: > I would retrieve the accession id from my local database then assign as > parameter to the program, retrieve common name and write the common name > back into the database.... > the thing I want to know is the retrieving possible with biojava? > > thanks for help > > Hubert > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From mark.schreiber at novartis.com Wed Jun 7 02:02:51 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Wed, 7 Jun 2006 14:02:51 +0800 Subject: [Biojava-dev] Proposed change to RichFormat interface Message-ID: That might be a more elegant solution. Could even make the InputStream implement RichSeqIOListener thus it would be sending data to the RichFormat and listening to what the RichFormat makes of the data. The InputStreamIOListener could remember when the RichFormat emits a startXXX() event record the line number and start buffering all the data sent as the readLine() requests are made (while also sending it to the RichFormat). When the RichFormat emits the corresponding endXXX() event the buffer can be cleared and the process starts again. Only problem might be what to do when the RichFormat consumes data in between emitting events (which is allowed). - Mark Michael Heuer Sent by: Michael Heuer 06/07/2006 01:51 PM To: mark.schreiber at novartis.com cc: biojava-dev at biojava.org Subject: Re: [Biojava-dev] Proposed change to RichFormat interface Mark Schreiber wrote: > Hi all - > > I would like to propose a change to the RichFormat interface. I think we > should do this now as we haven't done a stable biojavax roll out yet so > interface > changes should still be allowed. 
The additional methods would be: > > public String currentLine(); > public int currentLineNumber(); > > This would make debugging a lot easier, it would also make construction of > a RichSeqIOListener that logs and debugs much easier. I was trying to do > this a while back. I started a background process that parsed 6GB of > genbank records looking for records that failed. It worked ok but would be > > much better with the ability to query the RichFormat in the above way. We > might even be able to make it a utility that people could run on suspect > files and generate standard bug reports to make it easier for us to debug > the parser code. > > What do people think?? Another possibility would be to leave this sort of progress tracking up to the client, in that they could wrap the InputStream in something like an CountingInputStream before passing it to the parser(s): http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html michael From heuermh at acm.org Wed Jun 7 01:51:42 2006 From: heuermh at acm.org (Michael Heuer) Date: Wed, 7 Jun 2006 01:51:42 -0400 (EDT) Subject: [Biojava-dev] Proposed change to RichFormat interface In-Reply-To: Message-ID: Mark Schreiber wrote: > Hi all - > > I would like to propose a change to the RichFormat interface. I think we > should do this now as we haven't done a stable biojavax roll out yet so > interface > changes should still be allowed. The additional methods would be: > > public String currentLine(); > public int currentLineNumber(); > > This would make debugging a lot easier, it would also make construction of > a RichSeqIOListener that logs and debugs much easier. I was trying to do > this a while back. I started a background process that parsed 6GB of > genbank records looking for records that failed. It worked ok but would be > > much better with the ability to query the RichFormat in the above way. 
We
> might even be able to make it a utility that people could run on suspect
> files and generate standard bug reports to make it easier for us to debug
> the parser code.
>
> What do people think??

Another possibility would be to leave this sort of progress tracking up
to the client, in that they could wrap the InputStream in something like
a CountingInputStream before passing it to the parser(s):

http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html

   michael

From richard.holland at ebi.ac.uk Wed Jun 7 08:36:49 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Wed, 07 Jun 2006 13:36:49 +0100
Subject: [Biojava-dev] Proposed change to RichFormat interface
In-Reply-To:
References:
Message-ID: <1149683810.3947.131.camel@texas.ebi.ac.uk>

Hi guys.

See org.biojavax.seq.io.DebuggingRichSeqIOListener. It extends
BufferedInputStream, so it can be used to wrap a normal InputStream before
being passed around. It also implements RichSeqIOListener. The idea is that
you do something like this:

    Namespace ns = RichObjectFactory.getDefaultNamespace();
    InputStream is = new FileInputStream("myFastaFile.fasta");
    FASTAFormat format = new FASTAFormat();

    DebuggingRichSeqIOListener debug = new DebuggingRichSeqIOListener(is);
    BufferedReader br = new BufferedReader(new InputStreamReader(debug));

    SymbolTokenization symParser = format.guessSymbolTokenization(debug);

    format.readRichSequence(br, symParser, debug, ns);

This will then dump out everything as it is read, and all events as they
happen, in line with the input as it is interpreted.

Hope this helps?

cheers,
Richard

On Wed, 2006-06-07 at 14:02 +0800, mark.schreiber at novartis.com wrote:
> That might be a more elegant solution.
>
> Could even make the InputStream implement RichSeqIOListener thus it would
> be sending data to the RichFormat and listening to what the RichFormat
> makes of the data.
>
> Another possibility would be to leave this sort of progress tracking up
> to the client, in that they could wrap the InputStream in something like
> a CountingInputStream before passing it to the parser(s):
>
> http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html
>
>    michael
>
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From mark.schreiber at novartis.com Wed Jun 7 21:03:22 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 8 Jun 2006 09:03:22 +0800
Subject: [Biojava-dev] Proposed change to RichFormat interface
Message-ID:

Very cool!

Can you put this example in the cookbook?

- Mark

Richard Holland
Sent by: biojava-dev-bounces at lists.open-bio.org
06/07/2006 08:36 PM
To: Mark Schreiber
cc: biojava-dev , Michael Heuer , Michael Heuer
Subject: Re: [Biojava-dev] Proposed change to RichFormat interface

Hi guys.

See org.biojavax.seq.io.DebuggingRichSeqIOListener. It extends
BufferedInputStream, so it can be used to wrap a normal InputStream before
being passed around. It also implements RichSeqIOListener. The idea is that
you do something like this:

    Namespace ns = RichObjectFactory.getDefaultNamespace();
    InputStream is = new FileInputStream("myFastaFile.fasta");
    FASTAFormat format = new FASTAFormat();

    DebuggingRichSeqIOListener debug = new DebuggingRichSeqIOListener(is);
    BufferedReader br = new BufferedReader(new InputStreamReader(debug));

    SymbolTokenization symParser = format.guessSymbolTokenization(debug);

    format.readRichSequence(br, symParser, debug, ns);

This will then dump out everything as it is read, and all events as they
happen, in line with the input as it is interpreted.

Hope this helps?
cheers, Richard On Wed, 2006-06-07 at 14:02 +0800, mark.schreiber at novartis.com wrote: > That might be a more elegant solution. > > Could even make the InputStream implement RichSeqIOListener thus it would > be sending data to the RichFormat and listening to what the RichFormat > makes of the data. > > The InputStreamIOListener could remember when the RichFormat emits a > startXXX() event record the line number and start buffering all the data > sent as the readLine() requests are made (while also sending it to the > RichFormat). When the RichFormat emits the corresponding endXXX() event > the buffer can be cleared and the process starts again. > > Only problem might be what to do when the RichFormat consumes data in > between emitting events (which is allowed). > > - Mark > > > > > > Michael Heuer > Sent by: Michael Heuer > 06/07/2006 01:51 PM > > > To: mark.schreiber at novartis.com > cc: biojava-dev at biojava.org > Subject: Re: [Biojava-dev] Proposed change to RichFormat interface > > > Mark Schreiber wrote: > > > Hi all - > > > > I would like to propose a change to the RichFormat interface. I think > we > > should do this now as we haven't done a stable biojavax roll out yet so > > interface > > changes should still be allowed. The additional methods would be: > > > > public String currentLine(); > > public int currentLineNumber(); > > > > This would make debugging a lot easier, it would also make construction > of > > a RichSeqIOListener that logs and debugs much easier. I was trying to do > > this a while back. I started a background process that parsed 6GB of > > genbank records looking for records that failed. It worked ok but would > be > > > > much better with the ability to query the RichFormat in the above way. > We > > might even be able to make it a utility that people could run on > suspect > > files and generate standard bug reports to make it easier for us to > debug > > the parser code. > > > > What do people think?? 
>
> Another possibility would be to leave this sort of progress tracking up
> to the client, in that they could wrap the InputStream in something like
> a CountingInputStream before passing it to the parser(s):
>
> http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html
>
>    michael
>
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From richard.holland at ebi.ac.uk Mon Jun 12 04:52:53 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Mon, 12 Jun 2006 09:52:53 +0100
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <4489DF3F.4060504@gmx.at>
References: <4484B4F9.9000502@gmx.at> <1149607062.3947.92.camel@texas.ebi.ac.uk> <4489DF3F.4060504@gmx.at>
Message-ID: <1150102373.3952.21.camel@texas.ebi.ac.uk>

I'm assuming your sequences and taxonomy data are stored in BioSQL. In
which case, it's fairly straightforward to get this information out
without having to drag all the features and annotations out as well, by
using BioEntry instead of RichSequence to query the database. Code like
this should work (hasn't been checked or anything, but it gives you an
idea as to how things should go):

    // connect to BioSQL and establish a Hibernate Session
    Session sess = ...;

    // set up BioJavaX to use the session
    RichObjectFactory.connectToBioSQL(sess);

    // instantiate the class that gets BioEntries from BioSQL.
    // use BioSQLRichSequenceDB instead if you want features and
    // annotations included.
    BioEntryDB db = new BioSQLBioEntryDB(sess);

    // get BioEntry for accession (accession must be the
    // primary accession of the sequence, as found in the
    // 'name' column in the 'bioentry' table in the database).
    BioEntry be = db.getBioEntry("YPOL_IBDVS");

    // get BioEntry's taxon object
    NCBITaxon tax = be.getTaxon();

    // print the names. Each name belongs to a name class.
    for (Iterator i = tax.getNameClasses().iterator(); i.hasNext(); ) {
        String nameClass = (String)i.next();
        for (Iterator k = tax.getNames(nameClass).iterator(); k.hasNext(); ) {
            String name = (String)k.next();
            System.out.println(nameClass+" : "+name);
        }
    }

If your sequences and taxonomy data are not stored in BioSQL, then the
only way to do this is to parse the taxonomy data on startup, parse the
sequences on startup into a simple in-memory system such as
HashRichSequenceDB, then use the methods on the RichSequenceDB interface
to obtain sequences by accession before continuing as per the example
above.

cheers,
Richard

On Fri, 2006-06-09 at 14:51 -0600, Hubert Prielinger wrote:
> hi,
> sorry for replying that late,
> I have XML blast outputs, which you can retrieve information like
> accession id, protein name, length of sequence aso....
> but there is no possibility to retrieve the taxonomy (especially the
> scientific name or common name)
> I need the common and scientific name from each blast hit. I have found
> in biojava-live/src/org/biojava/bibliography/taxa a few code examples
> that could suit my
> task (e.g: simpleTaxon.java)
>
> eg: I have the accession id: YPOL_IBDVS
> and I want to get the taxonomy of that protein, not necessarily the
> entire taxonomy but mentioned above scientific and common name.
> and I don't know exactly how to get the taxonomy, it seems that there is
> no direct way from the accession id, but over the taxon id, but I
> don't know how to get that either.....
> it must be possible to map the accession id to the taxon id and then > request with the taxon id the taxonomy, if I get it right..... > > thanks in advance > regards > Hubert > > > Richard Holland wrote: > > I'm not sure what you're asking for here. Could you explain in a little > > more detail? Maybe write some example program code that assumes BioJava > > works the way you'd like it to work in this situation, making up the > > names of classes/methods that you might call in BioJava but don't yet > > exist, then we can help you fill in the gaps. > > > > cheers, > > Richard > > > > On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote: > > > >> hi, > >> Is it possible with biojava to retrieve the species not the entire > >> taxonomy, only the common name if I only have the accession id or the > >> name of the protein and if yes > >> how to start..... > >> In my case: > >> I would retrieve the accession id from my local database then assign as > >> parameter to the program, retrieve common name and write the common name > >> back into the database.... > >> the thing I want to know is the retrieving possible with biojava? > >> > >> thanks for help > >> > >> Hubert > >> _______________________________________________ > >> biojava-dev mailing list > >> biojava-dev at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From Robin.Emig at pioneer.com Mon Jun 12 15:01:12 2006 From: Robin.Emig at pioneer.com (Emig, Robin) Date: Mon, 12 Jun 2006 12:01:12 -0700 Subject: [Biojava-dev] Read/Write Account Message-ID: Can I get a read write account for biojava? I used to have one under remig, or raemig. 
Thanks
Robin

Robin Emig
Pioneer HiBred/Dupont
700A Bay Road
Redwood City, CA 94063
650-298-3564

From hubert.prielinger at gmx.at Fri Jun 9 16:51:11 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 09 Jun 2006 14:51:11 -0600
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <1149607062.3947.92.camel@texas.ebi.ac.uk>
References: <4484B4F9.9000502@gmx.at> <1149607062.3947.92.camel@texas.ebi.ac.uk>
Message-ID: <4489DF3F.4060504@gmx.at>

Hi,

Sorry for replying so late.

I have XML BLAST outputs, from which you can retrieve information like
accession id, protein name, length of sequence and so on, but there is no
way to retrieve the taxonomy (especially the scientific name or common
name). I need the common and scientific name for each BLAST hit. I have
found in biojava-live/src/org/biojava/bibliography/taxa a few code examples
that could suit my task (e.g. simpleTaxon.java).

E.g. I have the accession id YPOL_IBDVS and I want to get the taxonomy of
that protein, not necessarily the entire taxonomy, just the scientific and
common names mentioned above. I don't know exactly how to get the taxonomy.
It seems that there is no direct way from the accession id, only via the
taxon id, but I don't know how to get that either. It must be possible to
map the accession id to the taxon id and then request the taxonomy with the
taxon id, if I get it right.

Thanks in advance
Regards
Hubert

Richard Holland wrote:
> I'm not sure what you're asking for here. Could you explain in a little
> more detail? Maybe write some example program code that assumes BioJava
> works the way you'd like it to work in this situation, making up the
> names of classes/methods that you might call in BioJava but don't yet
> exist, then we can help you fill in the gaps.
> > cheers, > Richard > > On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote: > >> hi, >> Is it possible with biojava to retrieve the species not the entire >> taxonomy, only the common name if I only have the accession id or the >> name of the protein and if yes >> how to start..... >> In my case: >> I would retrieve the accession id from my local database then assign as >> parameter to the program, retrieve common name and write the common name >> back into the database.... >> the thing I want to know is the retrieving possible with biojava? >> >> thanks for help >> >> Hubert >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> From hubert.prielinger at gmx.at Fri Jun 9 18:10:12 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 09 Jun 2006 16:10:12 -0600 Subject: [Biojava-dev] retrieving species (common name) In-Reply-To: <1149607062.3947.92.camel@texas.ebi.ac.uk> References: <4484B4F9.9000502@gmx.at> <1149607062.3947.92.camel@texas.ebi.ac.uk> Message-ID: <4489F1C4.3030803@gmx.at> ok, with accession id, I mean the genbank id, if I have the genbank id, is there a direct way to get the common name... Richard Holland wrote: > I'm not sure what you're asking for here. Could you explain in a little > more detail? Maybe write some example program code that assumes BioJava > works the way you'd like it to work in this situation, making up the > names of classes/methods that you might call in BioJava but don't yet > exist, then we can help you fill in the gaps. > > cheers, > Richard > > On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote: > >> hi, >> Is it possible with biojava to retrieve the species not the entire >> taxonomy, only the common name if I only have the accession id or the >> name of the protein and if yes >> how to start..... 
>> In my case: >> I would retrieve the accession id from my local database then assign as >> parameter to the program, retrieve common name and write the common name >> back into the database.... >> the thing I want to know is the retrieving possible with biojava? >> >> thanks for help >> >> Hubert >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> From hubert.prielinger at gmx.at Mon Jun 12 12:36:32 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Mon, 12 Jun 2006 10:36:32 -0600 Subject: [Biojava-dev] retrieving species (common name) In-Reply-To: <1150102373.3952.21.camel@texas.ebi.ac.uk> References: <4484B4F9.9000502@gmx.at> <1149607062.3947.92.camel@texas.ebi.ac.uk> <4489DF3F.4060504@gmx.at> <1150102373.3952.21.camel@texas.ebi.ac.uk> Message-ID: <448D9810.1060703@gmx.at> hi, No, I'm not using BioSQL, it is an usual mySQL database and I have only the genbank accession id available. I want to get the taxonomy with the accession id, if that is possible. regards Hubert Richard Holland wrote: > I'm assuming your sequences and taxonomy data are stored in BioSQL. In > which case, it's fairly straightforward to get this information out > without having to drag all the features and annotations out as well, by > using BioEntry instead of RichSequence to query the database. Code like > this should work (hasn't been checked or anything, but it gives you an > idea as to how things should go): > > // connect to BioSQL and establish a Hibernate Session > Session sess = ...; > > // set up BioJavaX to use the session > RichObjectFactory.connectToBioSQL(sess); > > // instantiate the class that gets BioEntries from BioSQL. > // use BioSQLRichSequenceDB instead if you want features and > // annotations included. 
> BioEntryDB db = new BioSQLBioEntryDB(sess); > > // get BioEntry for accession (accession must be the > // primary accession of the sequence, as found in the > // 'name' column in the 'bioentry' table in the database). > BioEntry be = db.getBioEntry("YPOL_IBDVS"); > > // get BioEntry's taxon object > NCBITaxon tax = be.getTaxon(); > > // print the names. Each name belongs to a name class. > for (Iterator i = tax.getNameClasses().iterator(); > i.hasNext(); > ) { > String nameClass = (String)i.next(); > for (Iterator k = tax.getNames(nameClass).iterator(); > k.hasNext(); > ) { > String name = (String)k.next(); > System.out.println(nameClass+" : "+name); > } > } > > > If your sequences and taxonomy data are not stored in BioSQL, then the > only way to do this is to parse the taxonomy data on startup, parse the > sequences on startup into a simple in-memory system such as > HashRichSequenceDB, then use the methods on the RichSequenceDB interface > to obtain sequences by accession before continuing as per the example > above. > > cheers, > Richard > > > On Fri, 2006-06-09 at 14:51 -0600, Hubert Prielinger wrote: > >> hi, >> sorry for replying that late, >> I have XML blast outputs, which you can retrieve information like >> accession id, protein name, length of sequnence aso.... >> but there is no possibility to retrieve the taxonomy (especially the >> scientific name or common name) >> I need the common and scientific name from each blast hit. I have found >> in biojava-live/src/org/biojava/bibliography/taxa a few code examples >> that could suit my >> task (e.g: simpleTaxon.java) >> >> eg: I have the accession id: YPOL_IBDVS >> and I want to get the taxonomy of that protein, not neccessarily the >> entire taxonomy but mentioned above scientific and common name. >> and I don't know exactly how to get the taxonomy, it seems that there is >> no directly way from the accession id, but over the taxon id, but I >> don't know how to get that either..... 
>>>>
>>>> thanks for help
>>>>
>>>> Hubert
>>>>

From richard.holland at ebi.ac.uk Tue Jun 13 04:58:21 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 13 Jun 2006 09:58:21 +0100
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <448D9810.1060703@gmx.at>
References: <4484B4F9.9000502@gmx.at> <1149607062.3947.92.camel@texas.ebi.ac.uk> <4489DF3F.4060504@gmx.at> <1150102373.3952.21.camel@texas.ebi.ac.uk> <448D9810.1060703@gmx.at>
Message-ID: <1150189101.3952.47.camel@texas.ebi.ac.uk>

At present, BJX only has bindings to BioSQL (which can be installed in
Oracle, MySQL, PostgreSQL, or HSQL depending on your preference). It
doesn't know how to access sequence/taxonomy data stored in other
databases. Of course, it can still read flat files.

Without a database which BJX understands, the only way to do what you
describe is to load taxonomy data from the NCBI taxonomy files into memory
on startup, then set up some mechanism of parsing Genbank records on the
fly according to accession number... I could go into detail but it's a bit
complex.

So the short answer is - no, you can't do that kind of query without
coming up with some clever way of using file parsers efficiently on the
fly, or by storing everything in a BioSQL database.

Have a look at RichSequenceListener if you want to selectively parse
sequence files.
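
The first half of that, loading NCBI taxonomy names into memory, might look roughly like the sketch below. It is plain Java with no BioJava calls, and `TaxonNameIndex` with its `load` and `getName` methods are hypothetical names invented for this illustration. It assumes rows in the layout of the NCBI taxdump `names.dmp` file (tax_id, name_txt, unique name, name class, separated by `|` with tabs around each field). Mapping an accession to a taxon id is not covered here and would still have to come from the parsed sequence records.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of BioJava): in-memory lookup of NCBI taxon names. */
class TaxonNameIndex {
    // taxon id -> (name class -> first name seen for that class)
    private final Map<Integer, Map<String, String>> names = new HashMap<>();

    /** Loads rows shaped like names.dmp: tax_id | name_txt | unique name | name class | */
    static TaxonNameIndex load(Reader source) {
        TaxonNameIndex index = new TaxonNameIndex();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split("\\|");
                if (fields.length < 4) {
                    continue; // skip rows that do not have the four expected fields
                }
                int taxId = Integer.parseInt(fields[0].trim());
                String name = fields[1].trim();
                String nameClass = fields[3].trim();
                index.names
                     .computeIfAbsent(taxId, k -> new HashMap<>())
                     .putIfAbsent(nameClass, name);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }

    /** Returns the name for the given taxon id and name class, or null if unknown. */
    String getName(int taxId, String nameClass) {
        Map<String, String> byClass = names.get(taxId);
        return byClass == null ? null : byClass.get(nameClass);
    }
}
```

A caller would then ask for, say, `index.getName(taxId, "scientific name")` or `index.getName(taxId, "genbank common name")`. Keeping only the first name per class is a simplification, since a taxon can carry several synonyms in the same class.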
cheers, Richard On Mon, 2006-06-12 at 10:36 -0600, Hubert Prielinger wrote: > > If your sequences and taxonomy data are not stored in BioSQL, then > the > > only way to do this is to parse the taxonomy data on startup, parse > the > > sequences on startup into a simple in-memory system such as > > HashRichSequenceDB, then use the methods on the RichSequenceDB > interface > > to obtain sequences by accession before continuing as per the > example > > above. -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From richard.holland at ebi.ac.uk Tue Jun 13 11:20:15 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Tue, 13 Jun 2006 16:20:15 +0100 Subject: [Biojava-dev] Read/Write Account In-Reply-To: References: Message-ID: <1150212015.3952.121.camel@texas.ebi.ac.uk> Hi Robin. Mark should be able to set you up with one, or point you to the person who can. I can never remember who's in charge. Purely out of interest, what are you planning on changing once you get access? It's useful to know what people are up to out there so we don't duplicate effort. cheers, Richard On Mon, 2006-06-12 at 12:01 -0700, Emig, Robin wrote: > Can I get a read write account for biojava? I used to have one under > remig, or raemig. 
> > Thanks > > Robin > > > > Robin Emig > > Pioneer HiBred/Dupont > > 700A Bay Road > > Redwood City, CA 94063 > > 650-298-3564 > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From mark.schreiber at novartis.com Sun Jun 18 22:38:38 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Mon, 19 Jun 2006 10:38:38 +0800 Subject: [Biojava-dev] retrieving species (common name) Message-ID: You could try NCBI's e-utils webservice. This might enable you to get the common name using a GI number. - Mark Hubert Prielinger Sent by: biojava-dev-bounces at lists.open-bio.org 06/13/2006 12:36 AM To: Richard Holland , biojava-dev at lists.open-bio.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: Re: [Biojava-dev] retrieving species (common name) hi, No, I'm not using BioSQL, it is an usual mySQL database and I have only the genbank accession id available. I want to get the taxonomy with the accession id, if that is possible. regards Hubert Richard Holland wrote: > I'm assuming your sequences and taxonomy data are stored in BioSQL. In > which case, it's fairly straightforward to get this information out > without having to drag all the features and annotations out as well, by > using BioEntry instead of RichSequence to query the database. Code like > this should work (hasn't been checked or anything, but it gives you an > idea as to how things should go): > > // connect to BioSQL and establish a Hibernate Session > Session sess = ...; > > // set up BioJavaX to use the session > RichObjectFactory.connectToBioSQL(sess); > > // instantiate the class that gets BioEntries from BioSQL. > // use BioSQLRichSequenceDB instead if you want features and > // annotations included. 
> BioEntryDB db = new BioSQLBioEntryDB(sess); > > // get BioEntry for accession (accession must be the > // primary accession of the sequence, as found in the > // 'name' column in the 'bioentry' table in the database). > BioEntry be = db.getBioEntry("YPOL_IBDVS"); > > // get BioEntry's taxon object > NCBITaxon tax = be.getTaxon(); > > // print the names. Each name belongs to a name class. > for (Iterator i = tax.getNameClasses().iterator(); > i.hasNext(); > ) { > String nameClass = (String)i.next(); > for (Iterator k = tax.getNames(nameClass).iterator(); > k.hasNext(); > ) { > String name = (String)k.next(); > System.out.println(nameClass+" : "+name); > } > } > > > If your sequences and taxonomy data are not stored in BioSQL, then the > only way to do this is to parse the taxonomy data on startup, parse the > sequences on startup into a simple in-memory system such as > HashRichSequenceDB, then use the methods on the RichSequenceDB interface > to obtain sequences by accession before continuing as per the example > above. > > cheers, > Richard > > > On Fri, 2006-06-09 at 14:51 -0600, Hubert Prielinger wrote: > >> hi, >> sorry for replying that late, >> I have XML blast outputs, which you can retrieve information like >> accession id, protein name, length of sequnence aso.... >> but there is no possibility to retrieve the taxonomy (especially the >> scientific name or common name) >> I need the common and scientific name from each blast hit. I have found >> in biojava-live/src/org/biojava/bibliography/taxa a few code examples >> that could suit my >> task (e.g: simpleTaxon.java) >> >> eg: I have the accession id: YPOL_IBDVS >> and I want to get the taxonomy of that protein, not neccessarily the >> entire taxonomy but mentioned above scientific and common name. >> and I don't know exactly how to get the taxonomy, it seems that there is >> no directly way from the accession id, but over the taxon id, but I >> don't know how to get that either..... 
>> it must be possible to map the accession id to the taxon id and then >> request with the taxon id the taxonomy, if I get it right..... >> >> thanks in advance >> regards >> Hubert >> >> >> Richard Holland wrote: >> >>> I'm not sure what you're asking for here. Could you explain in a little >>> more detail? Maybe write some example program code that assumes BioJava >>> works the way you'd like it to work in this situation, making up the >>> names of classes/methods that you might call in BioJava but don't yet >>> exist, then we can help you fill in the gaps. >>> >>> cheers, >>> Richard >>> >>> On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote: >>> >>> >>>> hi, >>>> Is it possible with biojava to retrieve the species not the entire >>>> taxonomy, only the common name if I only have the accession id or the >>>> name of the protein and if yes >>>> how to start..... >>>> In my case: >>>> I would retrieve the accession id from my local database then assign as >>>> parameter to the program, retrieve common name and write the common name >>>> back into the database.... >>>> the thing I want to know is the retrieving possible with biojava? 
>>>>
>>>> thanks for help
>>>>
>>>> Hubert
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>
>>>>

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From gwaldon at geneinfinity.org Thu Jun 29 19:33:55 2006
From: gwaldon at geneinfinity.org (george waldon)
Date: Thu, 29 Jun 2006 16:33:55 -0700
Subject: [Biojava-dev] Problem with SimpleDocRefTest
Message-ID: <200606292333.k5TNXwdC026213@mmm1924.dulles19-verio.com>

Hi,

I am trying to run the tests but my compiler complains about org.biojavax.SimpleDocRefTest at compilation:
missing constructor SimpleDocRef(java.util.List, java.lang.String)

Was it expected to interpret that as SimpleDocRef(java.util.List, java.lang.String, null) (the three-argument constructor)?

Thanks,
George

From richard.holland at ebi.ac.uk Fri Jun 30 05:11:11 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 30 Jun 2006 10:11:11 +0100
Subject: [Biojava-dev] Problem with SimpleDocRefTest
In-Reply-To: <200606292333.k5TNXwdC026213@mmm1924.dulles19-verio.com>
References: <200606292333.k5TNXwdC026213@mmm1924.dulles19-verio.com>
Message-ID: <1151658671.3942.75.camel@texas.ebi.ac.uk>

The SimpleDocRef constructor changed recently to include document titles and I don't think the test was updated to match it. Sorry about that - the head branch of CVS is always under development so cannot always be guaranteed to work 100%. Mark, can you update the tests?
cheers, Richard On Thu, 2006-06-29 at 16:33 -0700, george waldon wrote: > Hi, > > I am trying to run the tests but my compiler complains about org.biojavax.SimpleDocRefTest at compilation: > missing constructor SimpleDocRef(java.util.List, java.lang.String) > > Was-It expected to interpretate that as SimpleDocRef(java.util.List, java.lang.String, null) (the three-argument constructor)? > > Thanks, > George > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From mark.schreiber at novartis.com Tue Jun 6 06:45:19 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Tue, 6 Jun 2006 14:45:19 +0800 Subject: [Biojava-dev] Proposed change to RichFormat interface Message-ID: Hi all - I would like to propose a change to the RichFormat interface. I think we should do this now as we haven't done a stable biojavax roll out yet so interface changes should still be allowed. The additional methods would be: public String currentLine(); public int currentLineNumber(); This would make debugging a lot easier, it would also make construction of a RichSeqIOListener that logs and debugs much easier. I was trying to do this a while back. I started a background process that parsed 6GB of genbank records looking for records that failed. It worked ok but would be much better with the ability to query the RichFormat in the above way. We might even be able to make it a utility that people could run on suspect files and generate standard bug reports to make it easier for us to debug the parser code. What do people think?? 
- Mark From richard.holland at ebi.ac.uk Tue Jun 6 08:10:40 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Tue, 06 Jun 2006 09:10:40 +0100 Subject: [Biojava-dev] Proposed change to RichFormat interface In-Reply-To: References: Message-ID: <1149581440.3947.56.camel@texas.ebi.ac.uk> Go for it. It would be very helpful. On Tue, 2006-06-06 at 14:45 +0800, mark.schreiber at novartis.com wrote: > Hi all - > > I would like to propose a change to the RichFormat interface. I think we > should do this now as we haven't done a stable biojavax roll out yet so > interface > changes should still be allowed. The additional methods would be: > > public String currentLine(); > public int currentLineNumber(); > > This would make debugging a lot easier, it would also make construction of > a RichSeqIOListener that logs and debugs much easier. I was trying to do > this a while back. I started a background process that parsed 6GB of > genbank records looking for records that failed. It worked ok but would be > > much better with the ability to query the RichFormat in the above way. We > might even be able to make it a utility that people could run on suspect > files and generate standard bug reports to make it easier for us to debug > the parser code. > > What do people think?? 
> - Mark
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From mark.schreiber at novartis.com Tue Jun 6 09:41:41 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 6 Jun 2006 17:41:41 +0800
Subject: [Biojava-dev] Proposed change to RichFormat interface
Message-ID:

Maybe the method should be something like

public String currentParseString()

The question is: should currentLineNumber be the start of the parse block or the end? I would favour the start of the parse block. This would be more like compiler-type behaviour but might be trickier to code??

- Mark

Richard Holland
06/06/2006 05:31 PM

To: Mark Schreiber
cc:
Subject: Re: [Biojava-dev] Proposed change to RichFormat interface

It's worth pointing out that most of the parsers bunch together lines, so the methods below would probably print out the line number on which the group of lines started, followed by the entire group. Not sure if that's exactly what you had in mind, but I'm sure it'd help a little bit.

On Tue, 2006-06-06 at 14:45 +0800, mark.schreiber at novartis.com wrote:
> public String currentLine();
> public int currentLineNumber();
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From hubert.prielinger at gmx.at Mon Jun 5 22:49:29 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 05 Jun 2006 16:49:29 -0600
Subject: [Biojava-dev] retrieving species (common name)
Message-ID: <4484B4F9.9000502@gmx.at>

hi,
Is it possible with biojava to retrieve the species not the entire taxonomy, only the common name if I only have the accession id or the name of the protein and if yes how to start.....
In my case: I would retrieve the accession id from my local database then assign as parameter to the program, retrieve common name and write the common name back into the database.... the thing I want to know is the retrieving possible with biojava? thanks for help Hubert From richard.holland at ebi.ac.uk Tue Jun 6 15:17:41 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Tue, 06 Jun 2006 16:17:41 +0100 Subject: [Biojava-dev] retrieving species (common name) In-Reply-To: <4484B4F9.9000502@gmx.at> References: <4484B4F9.9000502@gmx.at> Message-ID: <1149607062.3947.92.camel@texas.ebi.ac.uk> I'm not sure what you're asking for here. Could you explain in a little more detail? Maybe write some example program code that assumes BioJava works the way you'd like it to work in this situation, making up the names of classes/methods that you might call in BioJava but don't yet exist, then we can help you fill in the gaps. cheers, Richard On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote: > hi, > Is it possible with biojava to retrieve the species not the entire > taxonomy, only the common name if I only have the accession id or the > name of the protein and if yes > how to start..... > In my case: > I would retrieve the accession id from my local database then assign as > parameter to the program, retrieve common name and write the common name > back into the database.... > the thing I want to know is the retrieving possible with biojava? 
> > thanks for help > > Hubert > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From mark.schreiber at novartis.com Wed Jun 7 06:02:51 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Wed, 7 Jun 2006 14:02:51 +0800 Subject: [Biojava-dev] Proposed change to RichFormat interface Message-ID: That might be a more elegant solution. Could even make the InputStream implement RichSeqIOListener thus it would be sending data to the RichFormat and listening to what the RichFormat makes of the data. The InputStreamIOListener could remember when the RichFormat emits a startXXX() event record the line number and start buffering all the data sent as the readLine() requests are made (while also sending it to the RichFormat). When the RichFormat emits the corresponding endXXX() event the buffer can be cleared and the process starts again. Only problem might be what to do when the RichFormat consumes data in between emitting events (which is allowed). - Mark Michael Heuer Sent by: Michael Heuer 06/07/2006 01:51 PM To: mark.schreiber at novartis.com cc: biojava-dev at biojava.org Subject: Re: [Biojava-dev] Proposed change to RichFormat interface Mark Schreiber wrote: > Hi all - > > I would like to propose a change to the RichFormat interface. I think we > should do this now as we haven't done a stable biojavax roll out yet so > interface > changes should still be allowed. The additional methods would be: > > public String currentLine(); > public int currentLineNumber(); > > This would make debugging a lot easier, it would also make construction of > a RichSeqIOListener that logs and debugs much easier. I was trying to do > this a while back. 
I started a background process that parsed 6GB of > genbank records looking for records that failed. It worked ok but would be > > much better with the ability to query the RichFormat in the above way. We > might even be able to make it a utility that people could run on suspect > files and generate standard bug reports to make it easier for us to debug > the parser code. > > What do people think?? Another possibility would be to leave this sort of progress tracking up to the client, in that they could wrap the InputStream in something like an CountingInputStream before passing it to the parser(s): http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html michael From heuermh at acm.org Wed Jun 7 05:51:42 2006 From: heuermh at acm.org (Michael Heuer) Date: Wed, 7 Jun 2006 01:51:42 -0400 (EDT) Subject: [Biojava-dev] Proposed change to RichFormat interface In-Reply-To: Message-ID: Mark Schreiber wrote: > Hi all - > > I would like to propose a change to the RichFormat interface. I think we > should do this now as we haven't done a stable biojavax roll out yet so > interface > changes should still be allowed. The additional methods would be: > > public String currentLine(); > public int currentLineNumber(); > > This would make debugging a lot easier, it would also make construction of > a RichSeqIOListener that logs and debugs much easier. I was trying to do > this a while back. I started a background process that parsed 6GB of > genbank records looking for records that failed. It worked ok but would be > > much better with the ability to query the RichFormat in the above way. We > might even be able to make it a utility that people could run on suspect > files and generate standard bug reports to make it easier for us to debug > the parser code. > > What do people think?? 
Another possibility would be to leave this sort of progress tracking up to the client, in that they could wrap the InputStream in something like a CountingInputStream before passing it to the parser(s):

http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html

michael

From richard.holland at ebi.ac.uk Wed Jun 7 12:36:49 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Wed, 07 Jun 2006 13:36:49 +0100
Subject: [Biojava-dev] Proposed change to RichFormat interface
In-Reply-To:
References:
Message-ID: <1149683810.3947.131.camel@texas.ebi.ac.uk>

Hi guys.

See org.biojavax.seq.io.DebuggingRichSeqIOListener. It extends BufferedInputStream, so can be used to wrap a normal InputStream before being passed around. It also implements RichSeqIOListener. The idea is that you do something like this:

    Namespace ns = RichObjectFactory.getDefaultNamespace();
    InputStream is = new FileInputStream("myFastaFile.fasta");
    FASTAFormat format = new FASTAFormat();

    DebuggingRichSeqIOListener debug = new DebuggingRichSeqIOListener(is);
    BufferedReader br = new BufferedReader(
        new InputStreamReader(debug));

    SymbolTokenization symParser = format.guessSymbolTokenization(debug);

    format.readRichSequence(
        br, symParser, debug, ns);

This will then dump out everything as it is read, and all events as they happen in-line with the input as it is interpreted.

Hope this helps?

cheers,
Richard

On Wed, 2006-06-07 at 14:02 +0800, mark.schreiber at novartis.com wrote:
> That might be a more elegant solution.
>
> Could even make the InputStream implement RichSeqIOListener thus it would
> be sending data to the RichFormat and listening to what the RichFormat
> makes of the data.
>
> The InputStreamIOListener could remember when the RichFormat emits a
> startXXX() event record the line number and start buffering all the data
> sent as the readLine() requests are made (while also sending it to the
> RichFormat).
When the RichFormat emits the corresponding endXXX() event > the buffer can be cleared and the process starts again. > > Only problem might be what to do when the RichFormat consumes data in > between emitting events (which is allowed). > > - Mark > > > > > > Michael Heuer > Sent by: Michael Heuer > 06/07/2006 01:51 PM > > > To: mark.schreiber at novartis.com > cc: biojava-dev at biojava.org > Subject: Re: [Biojava-dev] Proposed change to RichFormat interface > > > Mark Schreiber wrote: > > > Hi all - > > > > I would like to propose a change to the RichFormat interface. I think > we > > should do this now as we haven't done a stable biojavax roll out yet so > > interface > > changes should still be allowed. The additional methods would be: > > > > public String currentLine(); > > public int currentLineNumber(); > > > > This would make debugging a lot easier, it would also make construction > of > > a RichSeqIOListener that logs and debugs much easier. I was trying to do > > this a while back. I started a background process that parsed 6GB of > > genbank records looking for records that failed. It worked ok but would > be > > > > much better with the ability to query the RichFormat in the above way. > We > > might even be able to make it a utility that people could run on > suspect > > files and generate standard bug reports to make it easier for us to > debug > > the parser code. > > > > What do people think?? 
> > Another possibility would be to leave this sort of progress tracking up > to the client, in that they could wrap the InputStream in something like > an CountingInputStream before passing it to the parser(s): > > http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html > > michael > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From mark.schreiber at novartis.com Thu Jun 8 01:03:22 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Thu, 8 Jun 2006 09:03:22 +0800 Subject: [Biojava-dev] Proposed change to RichFormat interface Message-ID: Very cool! Can you put this example in the cookbook? - Mark Richard Holland Sent by: biojava-dev-bounces at lists.open-bio.org 06/07/2006 08:36 PM To: Mark Schreiber cc: biojava-dev , Michael Heuer , Michael Heuer Subject: Re: [Biojava-dev] Proposed change to RichFormat interface Hi guys. See org.biojavax.seq.io.DebuggingRichSeqIOListener. It extends BufferedInputStream, so can be used to wrap a normal InputStream before being passed around. It also implements RichSeqIOListener. The idea is that you do something like this: Namespace ns = RichObjectFactory.getDefaultNamespace(); InputStream is = new FileInputStream("myFastaFile.fasta"); FASTAFormat format = new FASTAFormat(); DebuggingRichSeqIOListener debug = new DebuggingRichSeqIOListener(is); BufferedReader br = new BufferedReader( new InputStreamReader(debug)); SymbolTokenization symParser = format.guessSymbolTokenization(debug); format.readRichSequence( br, symParser, debug, ns); This will then dump out everything as it is read, and all events as they happen in-line with the input as it is interpreted. Hope this helps? 
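The start-of-block bookkeeping discussed in this thread can be sketched with the standard library alone. This is not DebuggingRichSeqIOListener and not BioJava code: BlockLineTracker, endBlock() and the two accessors are hypothetical names. The sketch assumes the parser reads through readLine() (reads made through read(char[]) would bypass the counter) and that something, for instance a listener receiving an endXXX() event, calls endBlock().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper (not a BioJava class): remembers where a parse block began. */
class BlockLineTracker extends BufferedReader {
    private int linesRead = 0;       // lines handed to the parser so far
    private int blockStartLine = 1;  // 1-based line on which the current block began
    private final StringBuilder block = new StringBuilder();

    BlockLineTracker(Reader in) {
        super(in);
    }

    @Override
    public String readLine() throws IOException {
        String line = super.readLine();
        if (line != null) {
            linesRead++;
            block.append(line).append('\n'); // buffer the text of the current block
        }
        return line;
    }

    /** Call when the parser has finished a block; the next line starts a new one. */
    void endBlock() {
        block.setLength(0);
        blockStartLine = linesRead + 1;
    }

    int currentBlockStartLine() {
        return blockStartLine;
    }

    String currentBlockText() {
        return block.toString();
    }

    /** Reads two lines as one block, then one line as the next block. */
    static String demo() {
        try (BlockLineTracker in =
                new BlockLineTracker(new StringReader("ID 1\nSQ aaa\nID 2\n"))) {
            in.readLine();
            in.readLine();
            int first = in.currentBlockStartLine();
            in.endBlock();
            in.readLine();
            int second = in.currentBlockStartLine();
            return first + "," + second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Reporting the start line rather than the end line follows the preference stated earlier in the thread; the buffered block text is what a bug report would quote alongside it.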
cheers, Richard On Wed, 2006-06-07 at 14:02 +0800, mark.schreiber at novartis.com wrote: > That might be a more elegant solution. > > Could even make the InputStream implement RichSeqIOListener thus it would > be sending data to the RichFormat and listening to what the RichFormat > makes of the data. > > The InputStreamIOListener could remember when the RichFormat emits a > startXXX() event record the line number and start buffering all the data > sent as the readLine() requests are made (while also sending it to the > RichFormat). When the RichFormat emits the corresponding endXXX() event > the buffer can be cleared and the process starts again. > > Only problem might be what to do when the RichFormat consumes data in > between emitting events (which is allowed). > > - Mark > > > > > > Michael Heuer > Sent by: Michael Heuer > 06/07/2006 01:51 PM > > > To: mark.schreiber at novartis.com > cc: biojava-dev at biojava.org > Subject: Re: [Biojava-dev] Proposed change to RichFormat interface > > > Mark Schreiber wrote: > > > Hi all - > > > > I would like to propose a change to the RichFormat interface. I think > we > > should do this now as we haven't done a stable biojavax roll out yet so > > interface > > changes should still be allowed. The additional methods would be: > > > > public String currentLine(); > > public int currentLineNumber(); > > > > This would make debugging a lot easier, it would also make construction > of > > a RichSeqIOListener that logs and debugs much easier. I was trying to do > > this a while back. I started a background process that parsed 6GB of > > genbank records looking for records that failed. It worked ok but would > be > > > > much better with the ability to query the RichFormat in the above way. > We > > might even be able to make it a utility that people could run on > suspect > > files and generate standard bug reports to make it easier for us to > debug > > the parser code. > > > > What do people think?? 
> > Another possibility would be to leave this sort of progress tracking up > to the client, in that they could wrap the InputStream in something like > an CountingInputStream before passing it to the parser(s): > > http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html > > michael > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From richard.holland at ebi.ac.uk Mon Jun 12 08:52:53 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Mon, 12 Jun 2006 09:52:53 +0100 Subject: [Biojava-dev] retrieving species (common name) In-Reply-To: <4489DF3F.4060504@gmx.at> References: <4484B4F9.9000502@gmx.at> <1149607062.3947.92.camel@texas.ebi.ac.uk> <4489DF3F.4060504@gmx.at> Message-ID: <1150102373.3952.21.camel@texas.ebi.ac.uk> I'm assuming your sequences and taxonomy data are stored in BioSQL. In which case, it's fairly straightforward to get this information out without having to drag all the features and annotations out as well, by using BioEntry instead of RichSequence to query the database. Code like this should work (hasn't been checked or anything, but it gives you an idea as to how things should go): // connect to BioSQL and establish a Hibernate Session Session sess = ...; // set up BioJavaX to use the session RichObjectFactory.connectToBioSQL(sess); // instantiate the class that gets BioEntries from BioSQL. // use BioSQLRichSequenceDB instead if you want features and // annotations included. 
    BioEntryDB db = new BioSQLBioEntryDB(sess);

    // get BioEntry for accession (accession must be the
    // primary accession of the sequence, as found in the
    // 'name' column in the 'bioentry' table in the database).
    BioEntry be = db.getBioEntry("YPOL_IBDVS");

    // get BioEntry's taxon object
    NCBITaxon tax = be.getTaxon();

    // print the names. Each name belongs to a name class.
    for (Iterator i = tax.getNameClasses().iterator();
         i.hasNext();
        ) {
        String nameClass = (String)i.next();
        for (Iterator k = tax.getNames(nameClass).iterator();
             k.hasNext();
            ) {
            String name = (String)k.next();
            System.out.println(nameClass+" : "+name);
        }
    }

If your sequences and taxonomy data are not stored in BioSQL, then the only way to do this is to parse the taxonomy data on startup, parse the sequences on startup into a simple in-memory system such as HashRichSequenceDB, then use the methods on the RichSequenceDB interface to obtain sequences by accession before continuing as per the example above.

cheers,
Richard

On Fri, 2006-06-09 at 14:51 -0600, Hubert Prielinger wrote:
> hi,
> sorry for replying that late,
> I have XML blast outputs, which you can retrieve information like
> accession id, protein name, length of sequnence aso....
> but there is no possibility to retrieve the taxonomy (especially the
> scientific name or common name)
> I need the common and scientific name from each blast hit. I have found
> in biojava-live/src/org/biojava/bibliography/taxa a few code examples
> that could suit my
> task (e.g: simpleTaxon.java)
>
> eg: I have the accession id: YPOL_IBDVS
> and I want to get the taxonomy of that protein, not neccessarily the
> entire taxonomy but mentioned above scientific and common name.
> and I don't know exactly how to get the taxonomy, it seems that there is
> no directly way from the accession id, but over the taxon id, but I
> don't know how to get that either.....
> it must be possible to map the accession id to the taxon id and then > request with the taxon id the taxonomy, if I get it right..... > > thanks in advance > regards > Hubert > > > Richard Holland wrote: > > I'm not sure what you're asking for here. Could you explain in a little > > more detail? Maybe write some example program code that assumes BioJava > > works the way you'd like it to work in this situation, making up the > > names of classes/methods that you might call in BioJava but don't yet > > exist, then we can help you fill in the gaps. > > > > cheers, > > Richard > > > > On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote: > > > >> hi, > >> Is it possible with biojava to retrieve the species not the entire > >> taxonomy, only the common name if I only have the accession id or the > >> name of the protein and if yes > >> how to start..... > >> In my case: > >> I would retrieve the accession id from my local database then assign as > >> parameter to the program, retrieve common name and write the common name > >> back into the database.... > >> the thing I want to know is the retrieving possible with biojava? > >> > >> thanks for help > >> > >> Hubert > >> _______________________________________________ > >> biojava-dev mailing list > >> biojava-dev at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> > -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From Robin.Emig at pioneer.com Mon Jun 12 19:01:12 2006 From: Robin.Emig at pioneer.com (Emig, Robin) Date: Mon, 12 Jun 2006 12:01:12 -0700 Subject: [Biojava-dev] Read/Write Account Message-ID: Can I get a read write account for biojava? I used to have one under remig, or raemig. 
Thanks

Robin

Robin Emig
Pioneer HiBred/Dupont
700A Bay Road
Redwood City, CA 94063
650-298-3564

From hubert.prielinger at gmx.at Fri Jun 9 20:51:11 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 09 Jun 2006 14:51:11 -0600
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <1149607062.3947.92.camel@texas.ebi.ac.uk>
References: <4484B4F9.9000502@gmx.at> <1149607062.3947.92.camel@texas.ebi.ac.uk>
Message-ID: <4489DF3F.4060504@gmx.at>

hi,
sorry for replying so late.
I have XML BLAST outputs, from which you can retrieve information like accession id, protein name, length of sequence and so on, but there is no way to retrieve the taxonomy (especially the scientific name or common name).
I need the common and scientific name for each BLAST hit. I have found in biojava-live/src/org/biojava/bibliography/taxa a few code examples that could suit my task (e.g. simpleTaxon.java).

E.g. I have the accession id YPOL_IBDVS and I want to get the taxonomy of that protein, not necessarily the entire taxonomy, just the scientific and common name mentioned above.
I don't know exactly how to get the taxonomy. It seems that there is no direct way from the accession id, only via the taxon id, but I don't know how to get that either.
It must be possible to map the accession id to the taxon id and then request the taxonomy with the taxon id, if I get it right.

thanks in advance
regards
Hubert

Richard Holland wrote:
> I'm not sure what you're asking for here. Could you explain in a little
> more detail? Maybe write some example program code that assumes BioJava
> works the way you'd like it to work in this situation, making up the
> names of classes/methods that you might call in BioJava but don't yet
> exist, then we can help you fill in the gaps.
>
> cheers,
> Richard
>
> On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote:
>
>> hi,
>> Is it possible with biojava to retrieve the species not the entire
>> taxonomy, only the common name if I only have the accession id or the
>> name of the protein and if yes
>> how to start.....
>> In my case:
>> I would retrieve the accession id from my local database then assign as
>> parameter to the program, retrieve common name and write the common name
>> back into the database....
>> the thing I want to know is the retrieving possible with biojava?
>>
>> thanks for help
>>
>> Hubert
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>

From hubert.prielinger at gmx.at Fri Jun 9 22:10:12 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 09 Jun 2006 16:10:12 -0600
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <1149607062.3947.92.camel@texas.ebi.ac.uk>
References: <4484B4F9.9000502@gmx.at> <1149607062.3947.92.camel@texas.ebi.ac.uk>
Message-ID: <4489F1C4.3030803@gmx.at>

ok, by accession id I mean the genbank id. If I have the genbank id, is
there a direct way to get the common name?

Richard Holland wrote:
> I'm not sure what you're asking for here. Could you explain in a little
> more detail? Maybe write some example program code that assumes BioJava
> works the way you'd like it to work in this situation, making up the
> names of classes/methods that you might call in BioJava but don't yet
> exist, then we can help you fill in the gaps.
>
> cheers,
> Richard
>
> On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote:
>
>> hi,
>> Is it possible with biojava to retrieve the species not the entire
>> taxonomy, only the common name if I only have the accession id or the
>> name of the protein and if yes
>> how to start.....
>> In my case:
>> I would retrieve the accession id from my local database then assign as
>> parameter to the program, retrieve common name and write the common name
>> back into the database....
>> the thing I want to know is the retrieving possible with biojava?
>>
>> thanks for help
>>
>> Hubert
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>

From hubert.prielinger at gmx.at Mon Jun 12 16:36:32 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 12 Jun 2006 10:36:32 -0600
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <1150102373.3952.21.camel@texas.ebi.ac.uk>
References: <4484B4F9.9000502@gmx.at> <1149607062.3947.92.camel@texas.ebi.ac.uk> <4489DF3F.4060504@gmx.at> <1150102373.3952.21.camel@texas.ebi.ac.uk>
Message-ID: <448D9810.1060703@gmx.at>

hi,
No, I'm not using BioSQL. It is an ordinary MySQL database and I have only
the genbank accession id available. I want to get the taxonomy with the
accession id, if that is possible.

regards
Hubert

Richard Holland wrote:
> I'm assuming your sequences and taxonomy data are stored in BioSQL. In
> which case, it's fairly straightforward to get this information out
> without having to drag all the features and annotations out as well, by
> using BioEntry instead of RichSequence to query the database. Code like
> this should work (hasn't been checked or anything, but it gives you an
> idea as to how things should go):
>
> // connect to BioSQL and establish a Hibernate Session
> Session sess = ...;
>
> // set up BioJavaX to use the session
> RichObjectFactory.connectToBioSQL(sess);
>
> // instantiate the class that gets BioEntries from BioSQL.
> // use BioSQLRichSequenceDB instead if you want features and
> // annotations included.
> BioEntryDB db = new BioSQLBioEntryDB(sess);
>
> // get BioEntry for accession (accession must be the
> // primary accession of the sequence, as found in the
> // 'name' column in the 'bioentry' table in the database).
> BioEntry be = db.getBioEntry("YPOL_IBDVS");
>
> // get BioEntry's taxon object
> NCBITaxon tax = be.getTaxon();
>
> // print the names. Each name belongs to a name class.
> for (Iterator i = tax.getNameClasses().iterator(); i.hasNext(); ) {
>     String nameClass = (String)i.next();
>     for (Iterator k = tax.getNames(nameClass).iterator(); k.hasNext(); ) {
>         String name = (String)k.next();
>         System.out.println(nameClass+" : "+name);
>     }
> }
>
>
> If your sequences and taxonomy data are not stored in BioSQL, then the
> only way to do this is to parse the taxonomy data on startup, parse the
> sequences on startup into a simple in-memory system such as
> HashRichSequenceDB, then use the methods on the RichSequenceDB interface
> to obtain sequences by accession before continuing as per the example
> above.
>
> cheers,
> Richard
>
>
> On Fri, 2006-06-09 at 14:51 -0600, Hubert Prielinger wrote:
>
>> hi,
>> sorry for replying that late,
>> I have XML blast outputs, which you can retrieve information like
>> accession id, protein name, length of sequence aso....
>> but there is no possibility to retrieve the taxonomy (especially the
>> scientific name or common name)
>> I need the common and scientific name from each blast hit. I have found
>> in biojava-live/src/org/biojava/bibliography/taxa a few code examples
>> that could suit my task (e.g: simpleTaxon.java)
>>
>> eg: I have the accession id: YPOL_IBDVS
>> and I want to get the taxonomy of that protein, not necessarily the
>> entire taxonomy but mentioned above scientific and common name.
>> and I don't know exactly how to get the taxonomy, it seems that there is
>> no directly way from the accession id, but over the taxon id, but I
>> don't know how to get that either.....
>> it must be possible to map the accession id to the taxon id and then
>> request with the taxon id the taxonomy, if I get it right.....
>>
>> thanks in advance
>> regards
>> Hubert
>>
>>
>> Richard Holland wrote:
>>
>>> I'm not sure what you're asking for here. Could you explain in a little
>>> more detail? Maybe write some example program code that assumes BioJava
>>> works the way you'd like it to work in this situation, making up the
>>> names of classes/methods that you might call in BioJava but don't yet
>>> exist, then we can help you fill in the gaps.
>>>
>>> cheers,
>>> Richard
>>>
>>> On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote:
>>>
>>>
>>>> hi,
>>>> Is it possible with biojava to retrieve the species not the entire
>>>> taxonomy, only the common name if I only have the accession id or the
>>>> name of the protein and if yes
>>>> how to start.....
>>>> In my case:
>>>> I would retrieve the accession id from my local database then assign as
>>>> parameter to the program, retrieve common name and write the common name
>>>> back into the database....
>>>> the thing I want to know is the retrieving possible with biojava?
>>>>
>>>> thanks for help
>>>>
>>>> Hubert
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>
>>>>

From richard.holland at ebi.ac.uk Tue Jun 13 08:58:21 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 13 Jun 2006 09:58:21 +0100
Subject: [Biojava-dev] retrieving species (common name)
In-Reply-To: <448D9810.1060703@gmx.at>
References: <4484B4F9.9000502@gmx.at> <1149607062.3947.92.camel@texas.ebi.ac.uk> <4489DF3F.4060504@gmx.at> <1150102373.3952.21.camel@texas.ebi.ac.uk> <448D9810.1060703@gmx.at>
Message-ID: <1150189101.3952.47.camel@texas.ebi.ac.uk>

At present, BJX only has bindings to BioSQL (which can be installed in
Oracle, MySQL, PostgreSQL, or HSQL depending on your preference). It
doesn't know how to access sequence/taxonomy data stored in other
databases. Of course, it can still read flat files.

Without a database which BJX understands, the only way to do what you
describe is to load taxonomy data from the NCBI taxonomy files into memory
on startup, then set up some mechanism of parsing Genbank records on the
fly according to accession number... I could go into detail but it's a bit
complex. So the short answer is - no, you can't do that kind of query
without coming up with some clever way of using file parsers efficiently
on the fly, or by storing everything in a BioSQL database.

Have a look at RichSequenceListener if you want to selectively parse
sequence files.
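
The "load taxonomy data into memory on startup" step above could look roughly like the following. This is only a sketch in plain Java, not BioJavaX code: the class name TaxonNameIndex and its methods are made up for illustration, and it assumes the pipe-delimited names.dmp layout from the NCBI taxonomy dump (tax_id | name_txt | unique name | name class |). The accession-to-taxon-id mapping is not covered here and would have to come from somewhere else, for example the taxon db_xref of the parsed Genbank record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory index of NCBI taxon names (not part of BioJava). */
class TaxonNameIndex {
    // taxon id -> (name class -> names in the order they were added)
    private final Map<Integer, Map<String, List<String>>> names = new HashMap<>();

    /** Adds one names.dmp line: tax_id | name_txt | unique name | name class | */
    void addLine(String line) {
        String[] fields = line.split("\\|");
        if (fields.length < 4) {
            return; // skip lines that do not have the expected columns
        }
        int taxId = Integer.parseInt(fields[0].trim());
        String name = fields[1].trim();
        String nameClass = fields[3].trim();
        names.computeIfAbsent(taxId, k -> new HashMap<>())
             .computeIfAbsent(nameClass, k -> new ArrayList<>())
             .add(name);
    }

    /** Returns the first name of the given class, or null if none is known. */
    String firstName(int taxId, String nameClass) {
        Map<String, List<String>> byClass = names.get(taxId);
        if (byClass == null) {
            return null;
        }
        List<String> list = byClass.get(nameClass);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }
}
```

Once every line of names.dmp has been fed through addLine, a call such as firstName(taxId, "scientific name") gives the scientific name for a taxon id, and the same call with a common-name class gives the common name where one exists.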
cheers,
Richard

On Mon, 2006-06-12 at 10:36 -0600, Hubert Prielinger wrote:
> > If your sequences and taxonomy data are not stored in BioSQL, then the
> > only way to do this is to parse the taxonomy data on startup, parse the
> > sequences on startup into a simple in-memory system such as
> > HashRichSequenceDB, then use the methods on the RichSequenceDB interface
> > to obtain sequences by accession before continuing as per the example
> > above.
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From richard.holland at ebi.ac.uk Tue Jun 13 15:20:15 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 13 Jun 2006 16:20:15 +0100
Subject: [Biojava-dev] Read/Write Account
In-Reply-To:
References:
Message-ID: <1150212015.3952.121.camel@texas.ebi.ac.uk>

Hi Robin.

Mark should be able to set you up with one, or point you to the person who
can. I can never remember who's in charge.

Purely out of interest, what are you planning on changing once you get
access? It's useful to know what people are up to out there so we don't
duplicate effort.

cheers,
Richard

On Mon, 2006-06-12 at 12:01 -0700, Emig, Robin wrote:
> Can I get a read write account for biojava? I used to have one under
> remig, or raemig.
>
> Thanks
>
> Robin
>
> Robin Emig
> Pioneer HiBred/Dupont
> 700A Bay Road
> Redwood City, CA 94063
> 650-298-3564
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From mark.schreiber at novartis.com Mon Jun 19 02:38:38 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Mon, 19 Jun 2006 10:38:38 +0800
Subject: [Biojava-dev] retrieving species (common name)
Message-ID:

You could try NCBI's e-utils webservice. This might enable you to get the
common name using a GI number.

- Mark

Hubert Prielinger
Sent by: biojava-dev-bounces at lists.open-bio.org
06/13/2006 12:36 AM

To: Richard Holland, biojava-dev at lists.open-bio.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: Re: [Biojava-dev] retrieving species (common name)

hi,
No, I'm not using BioSQL. It is an ordinary MySQL database and I have only
the genbank accession id available. I want to get the taxonomy with the
accession id, if that is possible.

regards
Hubert

Richard Holland wrote:
> I'm assuming your sequences and taxonomy data are stored in BioSQL. In
> which case, it's fairly straightforward to get this information out
> without having to drag all the features and annotations out as well, by
> using BioEntry instead of RichSequence to query the database. Code like
> this should work (hasn't been checked or anything, but it gives you an
> idea as to how things should go):
>
> // connect to BioSQL and establish a Hibernate Session
> Session sess = ...;
>
> // set up BioJavaX to use the session
> RichObjectFactory.connectToBioSQL(sess);
>
> // instantiate the class that gets BioEntries from BioSQL.
> // use BioSQLRichSequenceDB instead if you want features and
> // annotations included.
> BioEntryDB db = new BioSQLBioEntryDB(sess);
>
> // get BioEntry for accession (accession must be the
> // primary accession of the sequence, as found in the
> // 'name' column in the 'bioentry' table in the database).
> BioEntry be = db.getBioEntry("YPOL_IBDVS");
>
> // get BioEntry's taxon object
> NCBITaxon tax = be.getTaxon();
>
> // print the names. Each name belongs to a name class.
> for (Iterator i = tax.getNameClasses().iterator(); i.hasNext(); ) {
>     String nameClass = (String)i.next();
>     for (Iterator k = tax.getNames(nameClass).iterator(); k.hasNext(); ) {
>         String name = (String)k.next();
>         System.out.println(nameClass+" : "+name);
>     }
> }
>
>
> If your sequences and taxonomy data are not stored in BioSQL, then the
> only way to do this is to parse the taxonomy data on startup, parse the
> sequences on startup into a simple in-memory system such as
> HashRichSequenceDB, then use the methods on the RichSequenceDB interface
> to obtain sequences by accession before continuing as per the example
> above.
>
> cheers,
> Richard
>
>
> On Fri, 2006-06-09 at 14:51 -0600, Hubert Prielinger wrote:
>
>> hi,
>> sorry for replying that late,
>> I have XML blast outputs, which you can retrieve information like
>> accession id, protein name, length of sequence aso....
>> but there is no possibility to retrieve the taxonomy (especially the
>> scientific name or common name)
>> I need the common and scientific name from each blast hit. I have found
>> in biojava-live/src/org/biojava/bibliography/taxa a few code examples
>> that could suit my task (e.g: simpleTaxon.java)
>>
>> eg: I have the accession id: YPOL_IBDVS
>> and I want to get the taxonomy of that protein, not necessarily the
>> entire taxonomy but mentioned above scientific and common name.
>> and I don't know exactly how to get the taxonomy, it seems that there is
>> no directly way from the accession id, but over the taxon id, but I
>> don't know how to get that either.....
>> it must be possible to map the accession id to the taxon id and then
>> request with the taxon id the taxonomy, if I get it right.....
>>
>> thanks in advance
>> regards
>> Hubert
>>
>>
>> Richard Holland wrote:
>>
>>> I'm not sure what you're asking for here. Could you explain in a little
>>> more detail? Maybe write some example program code that assumes BioJava
>>> works the way you'd like it to work in this situation, making up the
>>> names of classes/methods that you might call in BioJava but don't yet
>>> exist, then we can help you fill in the gaps.
>>>
>>> cheers,
>>> Richard
>>>
>>> On Mon, 2006-06-05 at 16:49 -0600, Hubert Prielinger wrote:
>>>
>>>
>>>> hi,
>>>> Is it possible with biojava to retrieve the species not the entire
>>>> taxonomy, only the common name if I only have the accession id or the
>>>> name of the protein and if yes
>>>> how to start.....
>>>> In my case:
>>>> I would retrieve the accession id from my local database then assign as
>>>> parameter to the program, retrieve common name and write the common name
>>>> back into the database....
>>>> the thing I want to know is the retrieving possible with biojava?
>>>>
>>>> thanks for help
>>>>
>>>> Hubert
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>
>>>>

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From gwaldon at geneinfinity.org Thu Jun 29 23:33:55 2006
From: gwaldon at geneinfinity.org (george waldon)
Date: Thu, 29 Jun 2006 16:33:55 -0700
Subject: [Biojava-dev] Problem with SimpleDocRefTest
Message-ID: <200606292333.k5TNXwdC026213@mmm1924.dulles19-verio.com>

Hi,

I am trying to run the tests but my compiler complains about
org.biojavax.SimpleDocRefTest at compilation:
missing constructor SimpleDocRef(java.util.List, java.lang.String)

Was it expected to interpret that as
SimpleDocRef(java.util.List, java.lang.String, null)
(the three-argument constructor)?

Thanks,
George

From richard.holland at ebi.ac.uk Fri Jun 30 09:11:11 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 30 Jun 2006 10:11:11 +0100
Subject: [Biojava-dev] Problem with SimpleDocRefTest
In-Reply-To: <200606292333.k5TNXwdC026213@mmm1924.dulles19-verio.com>
References: <200606292333.k5TNXwdC026213@mmm1924.dulles19-verio.com>
Message-ID: <1151658671.3942.75.camel@texas.ebi.ac.uk>

The SimpleDocRef constructor changed recently to include document titles
and I don't think the test was updated to match it. Sorry about that - the
head branch of CVS is always under development so cannot always be
guaranteed to work out 100%.

Mark, can you update the tests?
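
On the question of whether a two-argument call is read as the three-argument constructor with a trailing null: Java never supplies a missing argument, so a two-argument form only compiles if it is declared. The sketch below shows that delegation pattern with a hypothetical stand-in class (DocRefSketch is not the BioJavaX SimpleDocRef, and the assumption that the third argument is the title is taken only from the description above).

```java
import java.util.List;

/** Hypothetical stand-in for a document reference; not the BioJavaX class. */
class DocRefSketch {
    private final List<String> authors;
    private final String location;
    private final String title; // null when no title is known

    /** Three-argument form; the title argument is assumed to be optional. */
    DocRefSketch(List<String> authors, String location, String title) {
        if (authors == null || location == null) {
            throw new IllegalArgumentException("authors and location are required");
        }
        this.authors = authors;
        this.location = location;
        this.title = title;
    }

    /** Convenience overload: has to be written out, Java will not infer it. */
    DocRefSketch(List<String> authors, String location) {
        this(authors, location, null);
    }

    List<String> getAuthors() { return authors; }

    String getLocation() { return location; }

    String getTitle() { return title; }
}
```

Without the second constructor, callers (including tests) would have to pass the null title explicitly.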
cheers,
Richard

On Thu, 2006-06-29 at 16:33 -0700, george waldon wrote:
> Hi,
>
> I am trying to run the tests but my compiler complains about
> org.biojavax.SimpleDocRefTest at compilation:
> missing constructor SimpleDocRef(java.util.List, java.lang.String)
>
> Was it expected to interpret that as
> SimpleDocRef(java.util.List, java.lang.String, null)
> (the three-argument constructor)?
>
> Thanks,
> George
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416