From sylvain.foisy at diploide.net Tue Jun 2 12:26:22 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Tue, 02 Jun 2009 12:26:22 -0400
Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services
Message-ID:

Hi,

In response to Scooter, and from having used some of these BLAST
implementations in the past, I would suggest that we use the QBlast service
from NCBI first, for a number of reasons:

- It has been in operation for a long time and its usage is well documented;

- Because of this, there is little chance that it will change;

- Coming from NCBI, it will probably be there for some time to come ;-)

Our friends at BioPerl have been using this technique for a long time now
with the Bio::Tools::Run::RemoteBlast module. We might try to emulate it at
first and, of course, do better :-)

Any input?

Best regards

Sylvain

===================================================================

Sylvain Foisy, Ph. D.
Consultant Bio-informatique / Bioinformatics
Diploide.net - TI pour la vie / IT for Life

Courriel: sylvain.foisy at diploide.net
Web: http://www.diploide.net

===================================================================

From andreas at sdsc.edu Thu Jun 4 05:12:01 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 4 Jun 2009 11:12:01 +0200
Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services
In-Reply-To:
References:
Message-ID: <59a41c430906040212m659046c0y820d32079607f34d@mail.gmail.com>

Hi Sylvain,

Do you mean the URL API for the NCBI BLAST searches? I could not find a
link for a WSDL...
http://www.ncbi.nlm.nih.gov/BLAST/developer.shtml

Andreas

From HWillis at scripps.edu Thu Jun 4 07:38:43 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 4 Jun 2009 07:38:43 -0400
Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services
References: <59a41c430906040212m659046c0y820d32079607f34d@mail.gmail.com>
Message-ID: <061BFD133FA1584693D19C79A0072F5F76C86A@FLMAIL1.fl.ad.scripps.edu>

Looks like they rolled their own URL interface and did not do a WSDL. Not a
big deal, but it does appear that they have some sort of submit step that
returns a "ticket", after which you check back with the "ticket" identifier
for the results. The BioJava API would hide the transport layer, so you could
use a custom URL approach or web services.
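
A minimal Java sketch of the ticket-style contract described above, under
stated assumptions: RemoteSearchTransport, FakeTransport and TicketDemo are
hypothetical names invented for illustration (they are not BioJava or NCBI
types), and the in-memory FakeTransport only imitates a service that needs a
couple of status checks before a job completes. A real implementation would
sit behind the same interface and talk to a URL API or a web service.

```java
import java.util.HashMap;
import java.util.Map;

/** Demo entry point: submit, check back with the ticket, then fetch. */
class TicketDemo {
    public static void main(String[] args) {
        RemoteSearchTransport service = new FakeTransport();
        String ticket = service.submit("MKVLAAGIVGL");
        while (!service.isReady(ticket)) {
            System.out.println(ticket + " still running");
        }
        System.out.println(service.fetchResult(ticket));
    }
}

/** Hypothetical transport-neutral contract; not an existing BioJava type. */
interface RemoteSearchTransport {
    /** Sends the query and returns the service's ticket for it. */
    String submit(String querySequence);

    /** Asks the service whether the job behind the ticket has finished. */
    boolean isReady(String ticket);

    /** Returns the raw report; only valid once isReady(ticket) is true. */
    String fetchResult(String ticket);
}

/** In-memory stand-in for a real URL or SOAP transport (an assumption). */
class FakeTransport implements RemoteSearchTransport {
    private final Map<String, Integer> checksLeft = new HashMap<>();
    private final Map<String, String> queries = new HashMap<>();
    private int nextId = 1;

    public String submit(String querySequence) {
        String ticket = "TICKET-" + nextId++;
        checksLeft.put(ticket, 2);   // pretend the job needs two status checks
        queries.put(ticket, querySequence);
        return ticket;
    }

    public boolean isReady(String ticket) {
        int left = remaining(ticket);
        if (left > 0) {
            checksLeft.put(ticket, left - 1);
            return false;
        }
        return true;
    }

    public String fetchResult(String ticket) {
        if (remaining(ticket) > 0) {
            throw new IllegalStateException("job not finished: " + ticket);
        }
        return "report for " + queries.get(ticket);
    }

    /** Looks up the remaining status checks, rejecting unknown tickets. */
    private int remaining(String ticket) {
        Integer left = checksLeft.get(ticket);
        if (left == null) {
            throw new IllegalArgumentException("unknown ticket: " + ticket);
        }
        return left;
    }
}
```

Because callers only see the interface, swapping the fake for a URL-based or
SOAP-based implementation would not change client code, which is the sense in
which the transport layer is hidden.
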
Not sure how the other WSDL interfaces handle long-running tasks, but I
assume the web services can handle a call that takes, say, 5 minutes to
respond without timing out. Some process would need to distinguish between a
long-running server task and a server that is no longer responding.

Scooter

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From andreas at sdsc.edu Thu Jun 4 09:00:43 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 4 Jun 2009 14:00:43 +0100
Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C86A@FLMAIL1.fl.ad.scripps.edu>
References: <59a41c430906040212m659046c0y820d32079607f34d@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C86A@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <59a41c430906040600k74bd525frce89d79943542a6e@mail.gmail.com>

Although it uses a different API, this system is similar to the sequence
search service provided by Pfam...

http://pfam.sanger.ac.uk/help#services

Andreas

From sylvain.foisy at diploide.net Thu Jun 4 09:07:04 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Thu, 04 Jun 2009 09:07:04 -0400
Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C86A@FLMAIL1.fl.ad.scripps.edu>
Message-ID:

Hi Scooter,

On 04/06/09 07:38, "[NAME]" <[ADDRESS]> wrote:

> Looks like they rolled their own URL interface and did not do a WSDL. Not a big
> deal but does appear they have some sort of submit get a "ticket" and then
> check back with the "ticket" identifier for the results. The BioJava API would
> hide the transport layer so you could use a custom URL approach or web
> services.

That is basically the way it works. I am working on a RemoteBlastWrapper
class that would do exactly what you describe.

> Not sure how the other WSDL interfaces handle long running tasks but I assume
> the Web Services can handle a call that takes say 5 minutes to respond without
> timing out. Some process would need to distinguish between a long running
> server task and a server that is no longer responding.

We'll have to try ;-)

Best regards

Sylvain

===================================================================

Sylvain Foisy, Ph. D.
Consultant Bio-informatique / Bioinformatics
Diploide.net - TI pour la vie / IT for Life

Courriel: sylvain.foisy at diploide.net
Web: http://www.diploide.net
Tel: (514) 893-4363
===================================================================

From HWillis at scripps.edu Thu Jun 4 09:28:20 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 4 Jun 2009 09:28:20 -0400
Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services
In-Reply-To:
References: <061BFD133FA1584693D19C79A0072F5F76C86A@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <061BFD133FA1584693D19C79A0072F5F95FBA3@FLMAIL1.fl.ad.scripps.edu>

Sylvain

Given that BioJava already has a BLAST file parser that returns results, the
goal should be to have a remote/web call return the same set of classes as if
you had parsed the file locally. That is going to be my approach. Once we get
a couple of services working, we can integrate them into a common
factory/interface approach.

Thanks

Scooter

From sylvain.foisy at diploide.net Thu Jun 4 09:57:03 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Thu, 04 Jun 2009 09:57:03 -0400
Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F95FBA3@FLMAIL1.fl.ad.scripps.edu>
Message-ID:

Hi Scooter,

That is one way of doing it ;-) I was thinking of creating an object that the
user could either:

- feed into the BJ BLAST parser;
- use for something else entirely.

Best regards

Sylvain

From HWillis at scripps.edu Thu Jun 4 10:16:07 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 4 Jun 2009 10:16:07 -0400
Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services
In-Reply-To:
References: <061BFD133FA1584693D19C79A0072F5F95FBA3@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <061BFD133FA1584693D19C79A0072F5F95FBB1@FLMAIL1.fl.ad.scripps.edu>

Sylvain

I think the way you submit the query/parameters of the search and the way you
parse a BLAST file would be different, and we would not worry about the SAX
API/file dependency of parsing a file. We do need a class that would contain
the search parameters, and as an object it should expose the union of the
inputs available through the HTML interfaces of the supported BLAST engines.
Some search engines will have more inputs or finer control than others, so
that will require some analysis. This search parameter class should be
independent of any particular BLAST web service engine, allowing a user to
submit the same search to multiple services with minimum overhead.

But once you get the results, being able to use the same general iteration
over results/hits will allow those who have invested in the BLAST file
parsing API to easily switch to the new web services approach.

From the BioJava cookbook, SeqSimilaritySearchHit is the class that contains
the results and should be the class used to contain the results from the web
service query.
In the web service approach you should be able to get the collection of
SeqSimilaritySearchResult and SeqSimilaritySearchHit objects from each of the
supported BLAST web services. The assumption is that
SeqSimilaritySearchResult and SeqSimilaritySearchHit have been properly
designed to represent BLAST data.

Scooter

  // 'results' is the List of SeqSimilaritySearchResult objects filled in
  // by the parser earlier in the cookbook example

  //output some blast details
  for (Iterator i = results.iterator(); i.hasNext(); ) {
    SeqSimilaritySearchResult result = (SeqSimilaritySearchResult)i.next();

    Annotation anno = result.getAnnotation();

    for (Iterator j = anno.keys().iterator(); j.hasNext(); ) {
      Object key = j.next();
      Object property = anno.getProperty(key);
      System.out.println(key+" : "+property);
    }
    System.out.println("Hits: ");

    //list the hits
    for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
      SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)k.next();
      System.out.print("\tmatch: "+hit.getSubjectID());
      System.out.println("\te score: "+hit.getEValue());
    }

    System.out.println("\n");
  }

From mark.schreiber at novartis.com Thu Jun 4 23:47:42 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 5 Jun 2009 11:47:42 +0800
Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F95FBB1@FLMAIL1.fl.ad.scripps.edu>
Message-ID:

Hi -

Just some observations from past experience:

You could write an interface called something like RemoteSimilaritySearch
which contains the minimal information that all SOAP/CGI-BIN sequence search
services might be expected to require and return, although it's pretty hard
to anticipate what that might be. Possibly more useful would be RemoteBLAST,
RemoteFASTA, etc. interfaces that could extend RemoteSimilaritySearch.
Concrete implementations of, for example, RemoteBLAST could include the SOAP
service at EBI and the CGI-BIN service at NCBI.

The RemoteBLAST and RemoteFASTA interfaces should make it possible to modify
any parameter of BLAST/FASTA as appropriate, and should have the option to
throw an UnsupportedOperationException, as not all service interfaces will
allow the setting of all parameters.

In general, trying to make an implementation that will talk to an HTML
interface to BLAST is asking for trouble (as they can change very easily). It
is best to code to a SOAP/REST service or, if you have to, a CGI-BIN
interface. You should only make an implementation that talks to a web form as
a last resort, and even then it probably shouldn't go into BioJava (maybe
post it on the cookbook).
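
The interface layering suggested above could be sketched roughly as follows.
The interface names RemoteSimilaritySearch, RemoteBLAST and RemoteFASTA come
from this message; every method name, the FixedWordSizeBlast class and the
RemoteBlastDemo class are assumptions made up for illustration, and no real
service is contacted.

```java
/** Demo entry point showing an unsupported parameter being rejected. */
class RemoteBlastDemo {
    public static void main(String[] args) {
        RemoteBLAST blast = new FixedWordSizeBlast();
        blast.setDatabase("nr");
        try {
            blast.setWordSize(3);
        } catch (UnsupportedOperationException e) {
            System.out.println("not supported here: " + e.getMessage());
        }
        System.out.println(blast.submit("MKVLAAGIVGL"));
    }
}

/** Minimal contract assumed to be common to all remote search services. */
interface RemoteSimilaritySearch {
    /** Submits a query and returns a job identifier (hypothetical method). */
    String submit(String querySequence);
}

/** BLAST-specific parameters; a given service may reject any setter. */
interface RemoteBLAST extends RemoteSimilaritySearch {
    void setDatabase(String database);
    void setExpect(double eValueCutoff);
    void setWordSize(int wordSize);
}

/** FASTA-specific parameters, shown only to illustrate the layering. */
interface RemoteFASTA extends RemoteSimilaritySearch {
    void setKtup(int ktup);
}

/** Hypothetical service wrapper that cannot change the word size. */
class FixedWordSizeBlast implements RemoteBLAST {
    private String database = "nr";
    private double expect = 10.0;
    private int jobs = 0;

    public void setDatabase(String database) {
        this.database = database;
    }

    public void setExpect(double eValueCutoff) {
        this.expect = eValueCutoff;
    }

    public void setWordSize(int wordSize) {
        // This pretend service offers no such option, so say so loudly.
        throw new UnsupportedOperationException("word size is fixed");
    }

    public String submit(String querySequence) {
        jobs++;
        // A real implementation would send the request here.
        return "job-" + jobs + " db=" + database + " expect=" + expect;
    }
}
```

Throwing from the setter, rather than silently ignoring the value, lets a
caller that targets several services find out early which parameters a given
service cannot honour.
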
The most stable version of the BLAST output is the XML. Parsing the text/HTML
output has been a constant source of headaches for BioJava. Implementations
of remote BLAST services should try to parse the XML format if it is
available (SOAP and REST will be XML anyway, although not always BLAST XML).

All the BLAST services I have used will return a job number, not a result.
The client will then need to poll that job number until the job is complete
and then get the results for the job. The client will need to handle this
sensibly without timing out (unless the user wants to allow a timeout).
Sensible threading will be required.

Converting results back into SeqSimilaritySearchResult makes sense, although
please note that Andreas has suggested renaming the packages for these (which
I support, as the old package name is not informative). Under a mavenized
system the whole similarity search system could go into its own module.

Just my $0.02

- Mark
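
The job-number polling described in the message above (poll until the job is
complete, optionally give up after a user-chosen timeout) might look roughly
like the sketch below. JobPoller and waitUntilDone are hypothetical names, the
status check is passed in as a plain BooleanSupplier so that no particular
service API is assumed, and the code needs Java 8 or later for the lambda.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a status check until done or timed out. */
class JobPoller {

    /**
     * Calls jobFinished repeatedly, sleeping between checks.
     *
     * @return true if the job reported completion, false if the timeout
     *         elapsed or the waiting thread was interrupted
     */
    static boolean waitUntilDone(BooleanSupplier jobFinished,
                                 long pollIntervalMillis,
                                 long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (jobFinished.getAsBoolean()) {
                return true;
            }
            // Subtraction keeps the comparison safe if nanoTime wraps around.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] checks = {0};
        // Pretend job that completes on the third status check.
        boolean finished = waitUntilDone(() -> ++checks[0] >= 3, 10, 1000);
        System.out.println("finished=" + finished + " after " + checks[0] + " checks");
    }
}
```

To keep a user interface responsive, a call like this would normally run on a
background thread. A status check that throws (for example on a refused
connection) is one possible way to tell a dead server apart from a job that
is merely slow, but that policy is left open here.
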
From bugzilla-daemon at portal.open-bio.org Wed Jun 10 17:59:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 10 Jun 2009 17:59:30 -0400
Subject: [Biojava-dev] [Bug 2854] New: Selection of protein alphabet is hardcoded in ProteinTools class
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2854

           Summary: Selection of protein alphabet is hardcoded in
                    ProteinTools class
           Product: BioJava
           Version: live (CVS source)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: seq
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: mdharsee at ocbn.ca


In our application we are calling createProtein() in class
org.biojava.bio.seq.ProteinTools to generate SymbolList objects to encapsulate
peptide sequences that are composed of the 20 common amino acid symbols, as
well as the 'X' ambiguity symbol.

However, createProtein() forces the selection of the PROTEIN-TERM alphabet
from AlphabetManager.xml, through the call to 'getTAlphabet()' as copied
below:

  public static SymbolList createProtein(String theProtein)
          throws IllegalSymbolException
  {
    SymbolTokenization p = null;
    try {
      p = getTAlphabet().getTokenization("token");
    } catch (BioException e) {
      throw new BioError("Something has gone badly wrong with Protein", e);
    }
    return new SimpleSymbolList(p, theProtein);
  }

This selection should be made based on the symbol content of the input
sequence(s), rather than being hardcoded. Only if the input data contains the
symbol 'TER' (terminus) or some ambiguity symbol that covers the PROTEIN-TERM
alphabet should the PROTEIN-TERM alphabet be selected. Otherwise the simpler
PROTEIN alphabet should be selected.

On a related note, the PROTEIN alphabet defined in AlphabetManager.xml
consists of 22 residues and includes the less commonly found 'SEC'
(selenocysteine, U) and 'PYR' (pyrrolysine, O).
However, many applications only require the
common 20-symbol alphabet that excludes the latter two residues. It would be
useful to include a new alphabet in AlphabetManager.xml that defines the
simpler 20-symbol set of common amino acids. Perhaps this point should be a
feature request?

Cheers,
Moyez

From markjschreiber at gmail.com Wed Jun 10 21:30:00 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 11 Jun 2009 09:30:00 +0800
Subject: [Biojava-dev] [Bug 2854] New: Selection of protein alphabet is hardcoded in ProteinTools class
In-Reply-To:
References:
Message-ID: <93b45ca50906101830s56abf2o28fae8b901f60d56@mail.gmail.com>

This actually raises an interesting point for the development of
biojava3. Do we actually need separate protein alphabets? I can't
actually remember the reason these are separate. Is there a good
argument for this???

- Mark

From andreas at sdsc.edu Wed Jun 10 22:50:42 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 10 Jun 2009 19:50:42 -0700
Subject: [Biojava-dev] [Bug 2854] New: Selection of protein alphabet is hardcoded in ProteinTools class
In-Reply-To: <93b45ca50906101830s56abf2o28fae8b901f60d56@mail.gmail.com>
References: <93b45ca50906101830s56abf2o28fae8b901f60d56@mail.gmail.com>
Message-ID: <59a41c430906101950q7a592d9dh8a71cbda2e47065c@mail.gmail.com>

Hi Mark,

The way I see the protein structure modules develop is that I will try to get
rid of the dependency on the alphabets and replace it with support for the
Chemical Component Dictionary http://www.wwpdb.org/ccd.html . The dictionary
contains a list of standard and modified residues as well as small molecule
ligands. If applicable, it provides parent/child relationships between
compounds. There are too many modified residues, and sometimes the boundaries
to ligands are also not straightforward to draw...

Andreas

From sylvain.foisy at diploide.net Thu Jun 11 09:52:01 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Thu, 11 Jun 2009 09:52:01 -0400
Subject: [Biojava-dev] First draft of a remote blast service class
Message-ID:

Hi to all,

I've been working on this for the past week or so and after discussing this
with Andreas, I am putting my code here for critical review. I'll put this
stuff in biojava-live as soon as Andreas can fix my SVN access.

First, an interface called RemotePairwiseAlignementService defines the basic
components of a remote service: sequence/database/program/run options/output
options. RemoteQBlastService implements this interface, runs remote
QBlast requests and creates output in either text, XML or HTML. At present,
regular blastall programs work; there is no blastpgp/megablast support yet.

I'll need some guidance to make it work on other types of web services like
EBI.

Best regards

Sylvain

===================================================================

 Sylvain Foisy, Ph. D.
 Consultant Bio-informatique / Bioinformatics
 Diploide.net - TI pour la vie / IT for Life

 Courriel: sylvain.foisy at diploide.net
 Web: http://www.diploide.net
 Tel: (514) 893-4363
===================================================================

import java.io.InputStream;

import org.biojava.bio.BioException;

/**
 * This interface specifies minimal information needed to execute a pairwise alignment on a remote service.
 *
 * Example of service: QBlast service at NCBI
 *                     Web Service at EBI
 *
 * @author Sylvain Foisy
 * @since 1.8
 *
 */
public interface RemotePairwiseAlignementService {

    /**
     * This field specifies that the output format of results
     * is text.
     *
     */
    public static final int TEXT = 0;

    /**
     * This field specifies that the output format of results
     * is XML.
     *
     */
    public static final int XML = 1;

    /**
     * This field specifies that the output format of results
     * is HTML.
     *
     */
    public static final int HTML = 2;

    /**
     * Setting the database to use for doing the pairwise alignment
     *
     * @param db: a String with a valid database ID for the service used.
     *
     */
    public void setDatabase(String db);

    /**
     * Setting the sequence to be aligned for this request
     *
     * @param seq: a String with a sequence to be aligned.
     *
     */
    public void setSequence(String seq);

    /**
     * Setting the program to use for this pairwise alignment
     *
     * @param prog: a String with a valid program ID for the service used.
     *
     */
    public void setProgram(String prog);

    /**
     * Setting all other options to use for this pairwise alignment
     *
     * @param str: a String with the additional options for the service used.
     *
     */
    public void setAdvancedOptions(String str);

    /**
     * Doing the actual analysis on the instantiated service
     *
     * @throws BioException
     */
    public void executeSearch() throws BioException;

    /**
     * Getting the actual alignment results from this instantiated service
     *
     * @return : an InputStream with the actual alignment results
     * @throws BioException
     */
    public InputStream getAlignmentResults() throws BioException;
}

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

import org.biojava.bio.BioException;

/**
 * RemoteQBlastService - A simple way of submitting BLAST requests to the QBlast
 * service at NCBI.
 *
 * NCBI provides a Blast server through a CGI-BIN interface. RemoteQBlastService simply
 * encapsulates an access to it by giving users access to get/set methods to fix
 * sequence, program and database as well as advanced options.
 *
 * As of version 1.0, only blastall programs are usable. Support for blastpgp and
 * megablast is a high priority.
 *
 * @author Sylvain Foisy
 * @version 1.0
 * @since 1.8
 *
 */
public class RemoteQBlastService implements RemotePairwiseAlignementService {

//    public static final int TEXT = 0;
//    public static final int XML = 1;
//    public static final int HTML = 2;

    private static String baseurl = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi";
    private URL aUrl;
    private URLConnection uConn;
    private OutputStreamWriter fromQBlast;
    private BufferedReader rd;

    private String seq = null;
    private String prog = null;
    private String db = null;
    private String outputFormat = null;
    private String advanced = null;

    private String rid;
    private long step;
    private boolean done = false;
    private long start;

    public RemoteQBlastService() throws BioException {
        try {
            aUrl = new URL(baseurl);
            uConn = setQBlastProperties(aUrl.openConnection());

            outputFormat = "Text";
        }
        /*
         * Needed but should never be thrown since the URL is static and known to exist
         */
        catch (MalformedURLException e) {
            throw new BioException("It looks like the URL for NCBI QBlast service is bad");
        }
        /*
         * Intercept if the program can't connect to QBlast service
         */
        catch (IOException e) {
            throw new BioException(
                    "Impossible to connect to QBlast service at this time. Check your network connection");
        }
    }

    /**
     * This method executes the Blast request via the Put command of the CGI-BIN
     * interface. It gets the estimated time of completion by capturing the
     * value of the RTOE variable and sets a loop that will check for completion
     * of analysis at intervals specified by RTOE.
     *
     * It also captures the value of the RID variable, necessary for fetching
     * the actual results after completion.
     *
     * @throws BioException
     *             if it is not possible to send the BLAST command
     */
    public void executeSearch() throws BioException {

        if (seq == null || db == null || prog == null) {
            throw new BioException(
                    "Impossible to execute QBlast request. One or more of seq|db|prog has not been set");
        }
        /*
         * sending the command to execute the Blast analysis
         */
        String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&"
                + db + "&" + "FORMAT_TYPE=HTML";

        if (advanced != null) {
            // Append the advanced options once. The earlier
            // "cmd += cmd + ..." repeated the whole command string.
            cmd += "&" + advanced;
        }

        try {

            uConn = setQBlastProperties(aUrl.openConnection());

            fromQBlast = new OutputStreamWriter(uConn.getOutputStream());

            fromQBlast.write(cmd);
            fromQBlast.flush();

            // Get the response
            rd = new BufferedReader(new InputStreamReader(uConn
                    .getInputStream()));

            String line = "";

            while ((line = rd.readLine()) != null) {
                if (line.contains("RID")) {
                    String[] arr = line.split("=");
                    rid = arr[1].trim();
                } else if (line.contains("RTOE")) {
                    String[] arr = line.split("=");
                    // RTOE is given in seconds; keep the polling step in milliseconds
                    step = Long.parseLong(arr[1].trim()) * 1000;
                    start = System.currentTimeMillis() + step;
                }
            }
        } catch (IOException e) {
            throw new BioException(
                    "Can't submit sequence to BLAST server at this time.");
        }
        /*
         * Getting the info out of the NCBI system
         */
        while (!done) {
            long prez = System.currentTimeMillis();
            done = isReady(rid, prez);
        }
    }

    /**
     * This method is used only by executeSearch() to check for completion of
     * the request, using the RTOE variable specified by NCBI.
     *
     * @param id the RID of the submitted request
     * @param present the current time, in milliseconds
     * @return true once NCBI reports the request as READY
     */
    private boolean isReady(String id, long present) {

        boolean ready = false;
        String check = "CMD=Get&RID=" + id;
        /*
         * If present time is less than the start of the search added to step
         * obtained from NCBI, just do nothing ;-)
         */
        if (present < start) {
            ;
        }
        /*
         * If we are at least step milliseconds in the future from the actual
         * call of method executeSearch()
         */
        else {
            try {
                uConn = setQBlastProperties(aUrl.openConnection());

                fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
                fromQBlast.write(check);
                fromQBlast.flush();

                rd = new BufferedReader(new InputStreamReader(uConn
                        .getInputStream()));

                String line = "";

                while ((line = rd.readLine()) != null) {
                    if (line.contains("READY")) {
                        ready = true;
                    } else if (line.contains("WAITING")) {
                        /*
                         * Else, move start forward in time...
                         */
                        start = present + step;
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return ready;
    }

    /**
     * This method extracts the actual Blast report. The default format is Text, but it can be
     * changed beforehand with the method setQBlastOutputFormat.
     *
     * @return an InputStream with the Blast report
     * @throws BioException
     */
    public InputStream getAlignmentResults() throws BioException {
        String srid = "CMD=Get&RID=" + rid;
        srid += "&FORMAT_TYPE=" + outputFormat;

        if (!this.done) {
            throw new BioException("Unable to get report at this time. Your Blast request has not been processed yet.");
        }

        try {
            uConn = setQBlastProperties(aUrl.openConnection());

            fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
            fromQBlast.write(srid);
            fromQBlast.flush();

            return uConn.getInputStream();

        } catch (IOException ioe) {
            throw new BioException(
                    "It is not possible to fetch Blast report from NCBI at this time");
        }
    }

    /**
     * Set the sequence to be blasted using the String that corresponds to the
     * sequence.
     *
     * Take note that this method is mutually exclusive to setGIToBlast() for a
     * given Blast request.
     *
     * @param aStr
     *            : a String with the sequence
     */
    public void setSequence(String aStr) {
        this.seq = "QUERY=" + aStr;
    }

    /**
     * Simply return a string with the blasted sequence.
     *
     * @return seq : a string with the sequence
     */
    public String getSeqToBlast() {
        return this.seq;
    }

    /**
     * Set the sequence to be blasted using the NCBI GI value. At this time,
     * there is no effort made to check the validity of this GI.
     *
     * Take note that this method is mutually exclusive to setSequence() for a
     * given Blast request.
     *
     * @param gi
     *            : a String with the NCBI GI
     */
    public void setGIToBlast(String gi) {
        this.seq = "QUERY=" + gi;
    }

    /**
     * Simply return a string with the GI of the sequence blasted.
     *
     * @return GI : a String with the GI of the blasted sequence
     */
    public String getGIToBlast() {
        return this.seq;
    }

    /**
     * This method sets the program to be used to blast the given sequence/GI. At
     * this time, there is no attempt at checking the matching of sequence type
     * to program.
     *
     * @param prog
     *            : a String representing the program specified for this QBlast
     *            request.
     *
     */
    public void setProgram(String prog) {
        this.prog = "PROGRAM=" + prog;
    }

    /**
     * Simply returns the program used for the given Blast request.
     *
     * @return prog : a String with the program used for this QBlast request.
     */
    public String getProgram() {
        return this.prog;
    }

    /**
     * This method sets the database to be used to blast the given sequence/GI.
     * At this time, there is no attempt at checking the matching of sequence
     * type to database.
     *
     * @param db: a String for the database specified for this QBlast request
     */
    public void setDatabase(String db) {
        this.db = "DATABASE=" + db;
    }

    /**
     * Simply returns the database used for the given Blast request.
     *
     * @return db: a String with the database used for this QBlast request.
     */
    public String getBlastDatabase() {
        return this.db;
    }

    /**
     * This method lets the user specify which format to use for generating the output.
     *
     * @param type: an integer taken from the static constants of the interface, either TEXT, XML or HTML
     */
    public void setQBlastOutputFormat(int type) {
        // Use the named constants from RemotePairwiseAlignementService
        // rather than the literals 0, 1 and 2.
        switch (type) {
        case TEXT:
            this.outputFormat = "Text";
            break;
        case XML:
            this.outputFormat = "XML";
            break;
        case HTML:
            this.outputFormat = "HTML";
            break;
        }
    }

    /**
     * Simply returns the output format used for the given Blast report.
     *
     * @return outputFormat : a String with the format specified for the QBlast report.
     */
    public String getQBlastOutputFormat() {
        return this.outputFormat;
    }

    /**
     * This method is to be used if a request is to use non-default values at submission. According to QBlast info,
     * the accepted parameters for PUT requests are:
     *
     *   -G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / non-affine for megablast
     *   -E: cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / non-affine for megablast
     *   -r: integer to reward for match. Default = 1
     *   -q: negative integer for penalty to allow mismatch. Default = -3
     *   -e: expectation value. Default = 10.0
     *   -W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 (megablast)
     *   -y: dropoff for blast extensions in bits, using default if not specified. Default = 20 for blastn, 7 for all others
     *       (except megablast for which it is not applicable).
     *   -X: X dropoff value for gapped alignment, in bits. Default = 30 for blastn/megablast, 15 for all others.
     *   -Z: final X dropoff value for gapped alignment, in bits. Default = 50 for blastn, 25 for all others
     *       (except megablast for which it is not applicable)
     *   -P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. Does not apply to blastn or megablast.
     *   -A: multiple hits window size. Default = 0 (for single hit algorithm)
     *   -I: number of database sequences to save hits for. Default = 500
     *   -Y: effective length of the search space. Default = 0 (0 represents using the whole space)
     *   -z: a real specifying the effective length of the database to use. Default = 0 (0 represents the real size)
     *   -c: an integer representing pseudocount constant for PSI-BLAST. Default = 7
     *   -F: any filtering directive
     *
     * Be aware that this class performs no error checking on the use of these parameters.
     *
     * @param aStr: a String with any number of optional parameters with an associated value.
     *
     */
    public void setAdvancedOptions(String aStr) {
        this.advanced = "OTHER_ADVANCED=" + aStr;
    }

    /**
     *
     * Simply return the string given as argument via setAdvancedOptions
     *
     * @return advanced: the string with the advanced options
     */
    public String getBlastAdvancedOptions() {
        return this.advanced;
    }

    /**
     *
     * Simply return the QBlast RID for this specific QBlast request
     *
     * @return rid: the string with the RID
     */
    public String getBlastRID() {
        return this.rid;
    }

    /**
     * A simple method to check the availability of the QBlast service
     *
     * @throws BioException
     */
    public void printRemoteBlastInfo() throws BioException {
        try {
            // Open a fresh connection, as the other methods do: a connection
            // cannot be written to again once its response has been read.
            URLConnection conn = setQBlastProperties(aUrl.openConnection());
            OutputStreamWriter out = new OutputStreamWriter(conn
                    .getOutputStream());

            out.write("CMD=Info");
            out.flush();

            // Get the response
            BufferedReader rd = new BufferedReader(new InputStreamReader(conn
                    .getInputStream()));

            String line = "";

            while ((line = rd.readLine()) != null) {
                System.out.println(line);
            }

            out.close();
            rd.close();
        } catch (IOException e) {
            throw new BioException(
                    "Impossible to get info from QBlast service at this time. Check your network connection");
        }
    }

    private URLConnection setQBlastProperties(URLConnection conn) {

        conn.setDoOutput(true);
        conn.setUseCaches(false);

        conn.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService");
        conn.setRequestProperty("Connection", "Keep-Alive");
        conn.setRequestProperty("Content-type",
                "application/x-www-form-urlencoded");
        // Content-length is no longer forced to "200": request bodies vary in
        // length, and the connection derives the header from the bytes written.

        return conn;
    }
}

From james at carmanconsulting.com Thu Jun 11 10:24:44 2009
From: james at carmanconsulting.com (James Carman)
Date: Thu, 11 Jun 2009 10:24:44 -0400
Subject: [Biojava-dev] First draft of a remote blast service class
In-Reply-To:
References:
Message-ID:

Are we allowed to use JDK5? Why not use enums rather than int codes?
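
For reference, the enum suggestion could look roughly like the sketch below, assuming JDK 5 or later is acceptable for the project. The names OutputFormat, qblastValue() and formatParameter() are hypothetical and not part of the posted classes; the three string values are the ones setQBlastOutputFormat already maps the int codes to.

```java
// Hypothetical sketch only: none of these names exist in the posted code.
class OutputFormatSketch {

    /** Possible replacement for the TEXT / XML / HTML int constants. */
    public enum OutputFormat {
        TEXT("Text"), XML("XML"), HTML("HTML");

        // Value sent to QBlast in the FORMAT_TYPE parameter
        private final String qblastValue;

        OutputFormat(String qblastValue) {
            this.qblastValue = qblastValue;
        }

        public String qblastValue() {
            return qblastValue;
        }
    }

    /** Builds the FORMAT_TYPE fragment of a request from the enum value. */
    public static String formatParameter(OutputFormat format) {
        return "FORMAT_TYPE=" + format.qblastValue();
    }

    public static void main(String[] args) {
        for (OutputFormat f : OutputFormat.values()) {
            System.out.println(f + " -> " + formatParameter(f));
        }
    }
}
```

With an enum the compiler rejects any value outside the three formats, so the switch in setQBlastOutputFormat, which silently ignores unknown int codes, would no longer be needed.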
On Thu, Jun 11, 2009 at 9:52 AM, Sylvain Foisy wrote: > Hi to all, > > I've been working on this for the past week or so and after discussing this > with Andreas, I am putting my code here for critical review. I'll put this > stuff in biojava-live as soon as Andreas can fix my SVN access. > > First, an interface called RemotePairwiseAlignementSerivce defines the basic > components of a remote service: sequence/database/progam/run options/output > options. RemoteQBlastService implements this interface and runs remote > Qblast requests and creates output in either text, XML or HTML. At present > time, regular blastall programs work, no blastpgp/megablast support yet. > > I'll need some guidance to make it work on other type of web services like > EBI. > > Best regards > > Sylvain > > =================================================================== > > ?Sylvain Foisy, Ph. D. > ?Consultant Bio-informatique / Bioinformatics > ?Diploide.net - TI pour la vie / IT for Life > > ?Courriel: sylvain.foisy at diploide.net > ?Web: http://www.diploide.net > ?Tel: (514) 893-4363 > =================================================================== > > import java.io.InputStream; > > import org.biojava.bio.BioException; > /** > ?* This interface specifies minimal information needed to execute a pairwise > alignment on a remote service. > ?* > ?* Example of service: QBlast service at NCBI > ?* ? ? ? ? ? ? ? ? ? ? Web Service at EBI > ?* > ?* @author Sylvain Foisy > ?* @since 1.8 > ?* > ?*/ > public interface RemotePairwiseAlignementService { > > ? ?/** > ? ? * This field specifies that the output format of results > ? ? * is text. > ? ? * > ? ? */ > ? ?public static final int TEXT = 0; > > ? ?/** > ? ? * This field specifies that the output format of results > ? ? * is XML. > ? ? * > ? ? */ > ? ?public static final int XML = 1; > > ? ?/** > ? ? * This field specifies that the output format of results > ? ? * is HTML. > ? ? * > ? ? */ > ? ?public static final int HTML = 2; > > ? 
?/** > ? ? * Setting the database to use for doing the pairwise alignment > ? ? * > ? ? * @param db: a String with a valid database ID for the > service used. > ? ? * > ? ? */ > ? ?public void setDatabase(String db); > > ? ?/** > ? ? * Setting the sequence to be align for this for this request > ? ? * > ? ? * @param seq: a String with a sequence to be aligned. > ? ? * > ? ? */ > ? ?public void setSequence(String seq); > > ? ?/** > ? ? * Setting the program to use for this pairwise alignment > ? ? * > ? ? * @param prog: a String with a valid database ID for the > service used. > ? ? * > ? ? */ > ? ?public void setProgram(String prog); > > ? ?/** > ? ? * Setting all other options to use for this pairwise alignment > ? ? * > ? ? * @param db: a String with a valid database ID for the > service used. > ? ? * > ? ? */ > ? ?public void setAdvancedOptions(String str); > > ? ?/** > ? ? * Doing the actual analysis on the instantiated service > ? ? * > ? ? * @throws BioException > ? ? */ > ? ?public void executeSearch() throws BioException; > > ? ?/** > ? ? * Getting the actual alignment results from this instantiated service > ? ? * > ? ? * @return : an InputStream with the actual alignment > results > ? ? * @throws BioException > ? ? */ > ? ?public InputStream getAlignmentResults() throws BioException; > } > > import java.io.BufferedReader; > import java.io.IOException; > import java.io.InputStream; > import java.io.InputStreamReader; > import java.io.OutputStreamWriter; > import java.net.MalformedURLException; > import java.net.URL; > import java.net.URLConnection; > > import org.biojava.bio.BioException; > > /** > ?* RemoteQBlastService - A simple way of submitting BLAST request to the > QBlast > ?* service at NCBI. > ?* > ?*

> ?* NCBI provides a Blast server through a CGI-BIN interface. > RemoteQBlastService simply > ?* encapsulates an access to it by giving users access to get/set methods to > fix > ?* sequence, program and database as well as advanced options. > ?*

> ?* > ?*

> ?* As of version 1.0, only blastall programs are usable. blastpgp and > megablast are high-priorities. > ?*

> ?* > ?* @author Sylvain Foisy > ?* @version 1.0 > ?* @since 1.8 > ?* > ?* > ?*/ > public class RemoteQBlastService implements RemotePairwiseAlignementService{ > > // ? ?public static final int TEXT = 0; > // ? ?public static final int XML = 1; > // ? ?public static final int HTML = 2; > > ? ?private static String baseurl = > "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"; > ? ?private URL aUrl; > ? ?private URLConnection uConn; > ? ?private OutputStreamWriter fromQBlast; > ? ?private BufferedReader rd; > > ? ?private String seq = null; > ? ?private String prog = null; > ? ?private String db = null; > ? ?private String outputFormat = null; > ? ?private String advanced = null; > > ? ?private String rid; > ? ?private long step; > ? ?private boolean done = false; > ? ?private long start; > > ? ?public RemoteQBlastService() throws BioException { > ? ? ? ?try { > ? ? ? ? ? ?aUrl = new URL(baseurl); > ? ? ? ? ? ?uConn = setQBlastProperties(aUrl.openConnection()); > > ? ? ? ? ? ?outputFormat = "Text"; > ? ? ? ?} > ? ? ? ?/* > ? ? ? ? * Needed but should never be thrown since the URL is static and > known to exist > ? ? ? ? */ > ? ? ? ?catch (MalformedURLException e) { > ? ? ? ? ? ?throw new BioException("It looks like the URL for NCBI QBlast > service is bad"); > ? ? ? ?} > ? ? ? ?/* > ? ? ? ? * Intercept if the program can't connect to QBlast service > ? ? ? ? */ > ? ? ? ?catch (IOException e) { > ? ? ? ? ? ?throw new BioException( > ? ? ? ? ? ? ? ? ? ?"Impossible to connect to QBlast service at this time. > Check your network connection"); > ? ? ? ?} > ? ?} > > ? ?/** > ? ? * This method execute the Blast request via the Put command of the > CGI-BIN > ? ? * interface. It gets the estimated time of completion by capturing the > ? ? * value of the RTOE variable and sets a loop that will check for > completion > ? ? * of analysis at intervals specified by RTOE. > ? ? * > ? ? *

> ? ? * It also capture the value for the RID variable, necessary for > fetching > ? ? * the actual results after completion. > ? ? *

> ? ? * > ? ? * @throws BioException > ? ? * ? ? ? ? ? ? if it is not possible to sent the BLAST command > ? ? */ > ? ?public void executeSearch() throws BioException { > > ? ? ? ?if (seq == null || db == null || prog == null) { > ? ? ? ? ? ?throw new BioException( > ? ? ? ? ? ? ? ? ? ?"Impossible to execute QBlast request. One or more of > seq|db|prog has not been set"); > ? ? ? ?} > ? ? ? ?/* > ? ? ? ? * sending the command to execute the Blast analysis > ? ? ? ? */ > ? ? ? ?String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&" > ? ? ? ? ? ? ? ?+ db + "&" + "FORMAT_TYPE=HTML"; > > ? ? ? ?if (advanced != null) { > ? ? ? ? ? ?cmd += cmd + "&" + advanced; > ? ? ? ?} > > ? ? ? ?try { > > ? ? ? ? ? ?uConn = setQBlastProperties(aUrl.openConnection()); > > ? ? ? ? ? ?fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); > > ? ? ? ? ? ?fromQBlast.write(cmd); > ? ? ? ? ? ?fromQBlast.flush(); > > ? ? ? ? ? ?// Get the response > ? ? ? ? ? ?rd = new BufferedReader(new InputStreamReader(uConn > ? ? ? ? ? ? ? ? ? ?.getInputStream())); > > ? ? ? ? ? ?String line = ""; > > ? ? ? ? ? ?while ((line = rd.readLine()) != null) { > ? ? ? ? ? ? ? ?if (line.contains("RID")) { > ? ? ? ? ? ? ? ? ? ?String[] arr = line.split("="); > ? ? ? ? ? ? ? ? ? ?rid = arr[1].trim(); > ? ? ? ? ? ? ? ?} else if (line.contains("RTOE")) { > ? ? ? ? ? ? ? ? ? ?String[] arr = line.split("="); > ? ? ? ? ? ? ? ? ? ?step = Long.parseLong(arr[1].trim()) * 1000; > ? ? ? ? ? ? ? ? ? ?start = System.currentTimeMillis() + step; > ? ? ? ? ? ? ? ?} > ? ? ? ? ? ?} > ? ? ? ?} catch (IOException e) { > ? ? ? ? ? ?throw new BioException( > ? ? ? ? ? ? ? ? ? ?"Can't submit sequence to BLAST server at this time."); > ? ? ? ?} > ? ? ? ?/* > ? ? ? ? * Getting the info out of the NCBI system > ? ? ? ? */ > ? ? ? ?while (!done) { > ? ? ? ? ? ?long prez = System.currentTimeMillis(); > ? ? ? ? ? ?done = isReady(rid, prez); > ? ? ? ?} > ? ?} > > ? ?/** > ? ? *

This method is used only for the executeBlastSearch method to > check for completion of > ? ? * request using the NCBI specified RTOE variable

> ? ? * > ? ? * @param id > ? ? * @param present > ? ? * @return > ? ? */ > ? ?private boolean isReady(String id, long present) { > > ? ? ? ?boolean ready = false; > ? ? ? ?String check = "CMD=Get&RID=" + id; > ? ? ? ?/* > ? ? ? ? * If present time is less than the start of the search added to > step > ? ? ? ? * obtained from NCBI, just do nothing ;-) > ? ? ? ? */ > ? ? ? ?if (present < start) { > ? ? ? ? ? ?; > ? ? ? ?} > ? ? ? ?/* > ? ? ? ? * If we are at least step seconds in the future from the actual > call of > ? ? ? ? * method executeBlastSearch() > ? ? ? ? */ > ? ? ? ?else { > ? ? ? ? ? ?try { > ? ? ? ? ? ? ? ?uConn = setQBlastProperties(aUrl.openConnection()); > > ? ? ? ? ? ? ? ?fromQBlast = new > OutputStreamWriter(uConn.getOutputStream()); > ? ? ? ? ? ? ? ?fromQBlast.write(check); > ? ? ? ? ? ? ? ?fromQBlast.flush(); > > ? ? ? ? ? ? ? ?rd = new BufferedReader(new InputStreamReader(uConn > ? ? ? ? ? ? ? ? ? ? ? ?.getInputStream())); > > ? ? ? ? ? ? ? ?String line = ""; > > ? ? ? ? ? ? ? ?while ((line = rd.readLine()) != null) { > ? ? ? ? ? ? ? ? ? ?if (line.contains("READY")) { > ? ? ? ? ? ? ? ? ? ? ? ?ready = true; > ? ? ? ? ? ? ? ? ? ?} else if (line.contains("WAITING")) { > ? ? ? ? ? ? ? ? ? ? ? ?/* > ? ? ? ? ? ? ? ? ? ? ? ? * Else, move start forward in time... > ? ? ? ? ? ? ? ? ? ? ? ? */ > ? ? ? ? ? ? ? ? ? ? ? ?start = present + step; > ? ? ? ? ? ? ? ? ? ?} > ? ? ? ? ? ? ? ?} > ? ? ? ? ? ?} catch (IOException e) { > ? ? ? ? ? ? ? ?e.printStackTrace(); > ? ? ? ? ? ?} > ? ? ? ?} > ? ? ? ?return ready; > ? ?} > > ? ?/** > ? ? *

This method extracts this actual Blast report. The default format > is Text but can be changed before with the method > ? ? * setQBlastOutputFormat.

> ? ? * > ? ? * > ? ? * @return > ? ? * @throws BioException > ? ? */ > ? ?public InputStream getAlignmentResults() throws BioException { > ? ? ? ?String srid = "CMD=Get&RID=" + rid; > ? ? ? ?srid += "&FORMAT_TYPE=" + outputFormat; > > ? ? ? ?if(!this.done){ > ? ? ? ? ? ?throw new BioException("Unable to get report at this time. Your > Blast request has not been processed yet."); > ? ? ? ?} > > ? ? ? ?try { > ? ? ? ? ? ?uConn = setQBlastProperties(aUrl.openConnection()); > > ? ? ? ? ? ?fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); > ? ? ? ? ? ?fromQBlast.write(srid); > ? ? ? ? ? ?fromQBlast.flush(); > > ? ? ? ? ? ?return uConn.getInputStream(); > > ? ? ? ?} catch (IOException ioe) { > ? ? ? ? ? ?throw new BioException( > ? ? ? ? ? ? ? ? ? ?"It is not possible to fetch Blast report from NCBI at > this time"); > ? ? ? ?} > ? ?} > > ? ?/** > ? ? *

> ? ? * Set the sequence to be blasted using the String that correspond to > the > ? ? * sequence. > ? ? *

> ? ? * > ? ? *

> ? ? * Take note that this method is mutually exclusive to setGIToBlast() > for a > ? ? * given Blast request. > ? ? *

>      *
>      * @param aStr
>      *            : a String with the sequence
>      */
>     public void setSequence(String aStr) {
>         this.seq = "QUERY=" + aStr;
>     }
>
>     /**
>      * Simply return a string with the blasted sequence.
>      *
>      * @return seq : a string with the sequence
>      */
>     public String getSeqToBlast() {
>         return this.seq;
>     }
>
>     /**
>      * <p>
>      * Set the sequence to be blasted using the NCBI GI value. At this time,
>      * there is no effort made to check the validity of this GI.
>      * </p>
>      *
>      * <p>
>      * Take note that this method is mutually exclusive to setSequence() for a
>      * given Blast request.
>      * </p>
>      *
>      * @param gi
>      *            : an integer value representing a NCBI GI
>      */
>     public void setGIToBlast(String gi) {
>         this.seq = "QUERY=" + gi;
>     }
>
>     /**
>      * <p>
>      * Simply return a string with the sequence blasted.
>      * </p>
>      *
>      * @return GI : a String with the GI of the blasted sequence
>      */
>     public String getGIToBlast() {
>         return this.seq;
>     }
>
>     /**
>      * <p>
>      * This method sets the program to be used to blast the given sequence/GI. At
>      * this time, there is no attempt at checking the matching of sequence type
>      * to program.
>      * </p>
>      *
>      * @param prog
>      *            : a String representing the program specified for this QBlast
>      *            request.
>      *
>      */
>     public void setProgram(String prog) {
>         this.prog = "PROGRAM=" + prog;
>     }
>
>     /**
>      * <p>
>      * Simply returns the program used for the given Blast request.
>      * </p>
>      *
>      * @return prog : a String with the program used for this QBlast request.
>      */
>     public String getProgram() {
>         return this.prog;
>     }
>
>     /**
>      * <p>
>      * This method sets the database to be used to blast the given sequence/GI.
>      * At this time, there is no attempt at checking the matching of sequence
>      * type to database.
>      * </p>
>      *
>      * @param db: a String for the database specified for this QBlast request
>      */
>     public void setDatabase(String db) {
>         this.db = "DATABASE=" + db;
>     }
>
>     /**
>      * <p>
>      * Simply returns the database used for the given Blast request.
>      * </p>
>      *
>      * @return db: a String with the database used for this QBlast request.
>      */
>     public String getBlastDatabase() {
>         return this.db;
>     }
>
>     /**
>      * <p>This method lets the user specify which format to use for generating the output.</p>
>      *
>      * @param type: an integer taken from the static constants of this class, one of TEXT, XML or HTML
>      */
>     public void setQBlastOutputFormat(int type) {
>
>         switch (type) {
>             case 0:
>                 this.outputFormat = "Text";
>                 break;
>             case 1:
>                 this.outputFormat = "XML";
>                 break;
>             case 2:
>                 this.outputFormat = "HTML";
>                 break;
>         }
>     }
>
>     /**
>      * <p>
>      * Simply returns the output format used for the given Blast report.
>      * </p>
>      *
>      * @return outputFormat : a String with the format specified for the QBlast report.
>      */
>     public String getQBlastOutputFormat() {
>         return this.outputFormat;
>     }
>
>     /**
>      * <p>This method is to be used if a request is to use non-default values at submission. According to QBlast info,
>      * the accepted parameters for PUT requests are:</p>
>      *
>      * <ul>
>      * <li>-G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / non-affine for megablast</li>
>      * <li>-E: Cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / non-affine for megablast</li>
>      * <li>-r: integer to reward for match. Default = 1</li>
>      * <li>-q: negative integer for penalty to allow mismatch. Default = -3</li>
>      * <li>-e: expectation value. Default = 10.0</li>
>      * <li>-W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 (megablast)</li>
>      * <li>-y: dropoff for blast extensions in bits, using default if not specified. Default = 20 for blastn, 7 for all others
>      * (except megablast for which it is not applicable).</li>
>      * <li>-X: X dropoff value for gapped alignment, in bits. Default = 30 for blastn/megablast, 15 for all others.</li>
>      * <li>-Z: final X dropoff value for gapped alignment, in bits. Default = 50 for blastn, 25 for all others
>      * (except megablast for which it is not applicable)</li>
>      * <li>-P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. Does not apply to blastn or megablast.</li>
>      * <li>-A: multiple hits window size. Default = 0 (for single hit algorithm)</li>
>      * <li>-I: number of database sequences to save hits for. Default = 500</li>
>      * <li>-Y: effective length of the search space. Default = 0 (0 represents using the whole space)</li>
>      * <li>-z: a real specifying the effective length of the database to use. Default = 0 (0 represents the real size)</li>
>      * <li>-c: an integer representing pseudocount constant for PSI-BLAST. Default = 7</li>
>      * <li>-F: any filtering directive</li>
>      * </ul>
>      *
>      * <p>You have to be aware that at no moment is there any error checking on the use of these parameters by this class.</p>
>      * @param aStr: a String with any number of optional parameters with an associated value.
>      *
>      */
>     public void setAdvancedOptions(String aStr) {
>         this.advanced = "OTHER_ADVANCED=" + aStr;
>     }
>
>     /**
>      *
>      * Simply return the string given as argument via setAdvancedOptions
>      *
>      * @return advanced: the string with the advanced options
>      */
>     public String getBlastAdvancedOptions() {
>         return this.advanced;
>     }
>
>     /**
>      *
>      * Simply return the QBlast RID for this specific QBlast request
>      *
>      * @return rid: the string with the RID
>      */
>     public String getBlastRID() {
>         return this.rid;
>     }
>
>     /**
>      * A simple method to check the availability of the QBlast service
>      *
>      * @throws BioException
>      */
>     public void printRemoteBlastInfo() throws BioException {
>         try {
>             OutputStreamWriter out = new OutputStreamWriter(uConn
>                     .getOutputStream());
>
>             out.write("CMD=Info");
>             out.flush();
>
>             // Get the response
>             BufferedReader rd = new BufferedReader(new InputStreamReader(uConn
>                     .getInputStream()));
>
>             String line = "";
>
>             while ((line = rd.readLine()) != null) {
>                 System.out.println(line);
>             }
>
>             out.close();
>             rd.close();
>         } catch (IOException e) {
>             throw new BioException(
>                     "Impossible to get info from QBlast service at this time. Check your network connection");
>         }
>     }
>
>     private URLConnection setQBlastProperties(URLConnection conn) {
>
>         URLConnection tmp = conn;
>
>         conn.setDoOutput(true);
>         conn.setUseCaches(false);
>
>         tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService");
>         tmp.setRequestProperty("Connection", "Keep-Alive");
>         tmp.setRequestProperty("Content-type",
>                 "application/x-www-form-urlencoded");
>         tmp.setRequestProperty("Content-length", "200");
>
>         return tmp;
>     }
> }

From sylvain.foisy at diploide.net Thu Jun 11 10:45:23 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Thu, 11 Jun 2009 10:45:23 -0400
Subject: [Biojava-dev] First draft of a remote blast service class
In-Reply-To: <1244729855.5546.52.camel@buzzybee>
Message-ID:

Hi Richard,

On 11/06/09 10:17, "[NAME]" <[ADDRESS]> wrote:

> Good stuff! My 2p's worth:

Thanks. It's my first Java code in the past year or so... Managing projects really kills programming habits :-(

> setSequence() should be overloaded to accept all forms of possible
> sequence input - whatever is decided on as the standard way of
> referencing sequence data in BJ3. The original plan for BJ3 was to allow
> String/CharSequence and List (see
> http://www.biojava.org/wiki/BioJava3:HowTo )

Good point. I'll work on this. The List is a bit tricky: one would need to create a timed sequence of requests so that the program would not flood the service. My own .02 cents: this should be done at the program level, not the class level. The class should only need to be "preoccupied" with a single request.

> setAdvancedOptions() should not accept a String, but rather a Properties
> or a Map, where the keys of the Map/Properties are
> restricted to a range of acceptable values determined (and published,
> maybe as an enum?) by each of the implementation classes (e.g.
> RemoteQBlastService). The implementation class then uses this to
> construct the call string.
> The reason for doing it this way is that (a)
> it allows the parameters to be verified by checking them against a known
> list of allowable key/values, and (b) it allows for non-URL based remote
> requests to be constructed from the values, e.g. SOAP calls.

Most definitely a good thing!

> I would also replace the static int HTML/TEXT/XML with an enum as
> numeric constants are sometimes a Bad Thing.
>
> The setProgram() method in my mind is specific to Blast, as opposed to
> being a generic Pairwise Alignment concept. Therefore it might be better
> to move this to a Blast-specific sub-interface or make it only appear in
> the implementation classes that refer to Blast.

I am too used to using BLAST only...

> Finally, the JavaDocs for the various set() methods are incorrect -
> they're all mostly the same in fact! :)

I am looking and not seeing the same thing... OK, I'll triple-check ;-)

> But overall it looks good.

Thanks. I'll be working on this in the next few days.

Best regards

Sylvain

===================================================================
Sylvain Foisy, Ph. D.
Consultant Bio-informatique / Bioinformatics
Diploide.net - TI pour la vie / IT for Life

Courriel: sylvain.foisy at diploide.net
Web: http://www.diploide.net
Tel: (514) 893-4363
===================================================================

From andreas.prlic at gmail.com Thu Jun 11 11:24:20 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Thu, 11 Jun 2009 08:24:20 -0700
Subject: [Biojava-dev] First draft of a remote blast service class
In-Reply-To:
References:
Message-ID: <59a41c430906110824l2f1167cfp236cf69cc9dee94c@mail.gmail.com>

I would pass the parameters as a bean rather than a string...

Andreas

On Thu, Jun 11, 2009 at 6:52 AM, Sylvain Foisy wrote:
> Hi to all,
>
> I've been working on this for the past week or so and after discussing this
> with Andreas, I am putting my code here for critical review.
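The option handling suggested in this thread (a Map whose keys are restricted to a published enum, plus an enum in place of the int constants TEXT/XML/HTML) might look roughly like the sketch below. It is only an illustration under stated assumptions: the names QBlastOptionsSketch, Option, OutputFormat and toPutCommand are hypothetical and are not part of the posted classes. Only the request keys QUERY, PROGRAM, DATABASE, OTHER_ADVANCED and FORMAT_TYPE are taken from the posted code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.EnumMap;
import java.util.Map;

class QBlastOptionsSketch {

    /** Hypothetical enum replacing the int constants TEXT, XML and HTML. */
    enum OutputFormat {
        TEXT("Text"), XML("XML"), HTML("HTML");

        private final String wireName;

        OutputFormat(String wireName) {
            this.wireName = wireName;
        }

        String wireName() {
            return wireName;
        }
    }

    /** Hypothetical list of the keys this implementation agrees to accept. */
    enum Option {
        QUERY, PROGRAM, DATABASE, OTHER_ADVANCED
    }

    /** Builds the body of a Put request; only enum keys can get in, values are URL-encoded. */
    static String toPutCommand(Map<Option, String> options, OutputFormat format) {
        StringBuilder cmd = new StringBuilder("CMD=Put");
        for (Map.Entry<Option, String> entry : options.entrySet()) {
            cmd.append('&').append(entry.getKey().name())
               .append('=').append(encode(entry.getValue()));
        }
        cmd.append("&FORMAT_TYPE=").append(format.wireName());
        return cmd.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every JVM is required to support
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // EnumMap iterates in enum declaration order, whatever the insertion order
        Map<Option, String> options = new EnumMap<Option, String>(Option.class);
        options.put(Option.OTHER_ADVANCED, "-e 10");
        options.put(Option.QUERY, "MKVLA");
        System.out.println(toPutCommand(options, OutputFormat.TEXT));
    }
}
```

Because the keys are enum constants, a misspelled option fails at compile time instead of being sent to the server, and the same map could later feed a non-URL transport.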
> I'll put this stuff in biojava-live as soon as Andreas can fix my SVN access.
>
> First, an interface called RemotePairwiseAlignementService defines the basic
> components of a remote service: sequence/database/program/run options/output
> options. RemoteQBlastService implements this interface and runs remote
> QBlast requests and creates output in either text, XML or HTML. At present,
> regular blastall programs work; there is no blastpgp/megablast support yet.
>
> I'll need some guidance to make it work on other types of web services like
> EBI.
>
> Best regards
>
> Sylvain
>
> ===================================================================
>
>  Sylvain Foisy, Ph. D.
>  Consultant Bio-informatique / Bioinformatics
>  Diploide.net - TI pour la vie / IT for Life
>
>  Courriel: sylvain.foisy at diploide.net
>  Web: http://www.diploide.net
>  Tel: (514) 893-4363
> ===================================================================
>
> import java.io.InputStream;
>
> import org.biojava.bio.BioException;
> /**
>  * This interface specifies minimal information needed to execute a pairwise
>  * alignment on a remote service.
>  *
>  * Example of service: QBlast service at NCBI
>  *                     Web Service at EBI
>  *
>  * @author Sylvain Foisy
>  * @since 1.8
>  *
>  */
> public interface RemotePairwiseAlignementService {
>
>     /**
>      * This field specifies that the output format of results
>      * is text.
>      *
>      */
>     public static final int TEXT = 0;
>
>     /**
>      * This field specifies that the output format of results
>      * is XML.
>      *
>      */
>     public static final int XML = 1;
>
>     /**
>      * This field specifies that the output format of results
>      * is HTML.
>      *
>      */
>     public static final int HTML = 2;
>
>     /**
>      * Setting the database to use for doing the pairwise alignment
>      *
>      * @param db: a String with a valid database ID for the service used.
>      *
>      */
>     public void setDatabase(String db);
>
>     /**
>      * Setting the sequence to be aligned for this request
>      *
>      * @param seq: a String with a sequence to be aligned.
>      *
>      */
>     public void setSequence(String seq);
>
>     /**
>      * Setting the program to use for this pairwise alignment
>      *
>      * @param prog: a String with a valid database ID for the service used.
>      *
>      */
>     public void setProgram(String prog);
>
>     /**
>      * Setting all other options to use for this pairwise alignment
>      *
>      * @param db: a String with a valid database ID for the service used.
>      *
>      */
>     public void setAdvancedOptions(String str);
>
>     /**
>      * Doing the actual analysis on the instantiated service
>      *
>      * @throws BioException
>      */
>     public void executeSearch() throws BioException;
>
>     /**
>      * Getting the actual alignment results from this instantiated service
>      *
>      * @return : an InputStream with the actual alignment results
>      * @throws BioException
>      */
>     public InputStream getAlignmentResults() throws BioException;
> }
>
> import java.io.BufferedReader;
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.InputStreamReader;
> import java.io.OutputStreamWriter;
> import java.net.MalformedURLException;
> import java.net.URL;
> import java.net.URLConnection;
>
> import org.biojava.bio.BioException;
>
> /**
>  * RemoteQBlastService - A simple way of submitting BLAST requests to the QBlast
>  * service at NCBI.
>  *
>  * <p>
>  * NCBI provides a Blast server through a CGI-BIN interface. RemoteQBlastService simply
>  * encapsulates an access to it by giving users access to get/set methods to fix
>  * sequence, program and database as well as advanced options.
>  * </p>
>  *
>  * <p>
>  * As of version 1.0, only blastall programs are usable. blastpgp and megablast are high priorities.
>  * </p>
>  *
>  * @author Sylvain Foisy
>  * @version 1.0
>  * @since 1.8
>  *
>  *
>  */
> public class RemoteQBlastService implements RemotePairwiseAlignementService{
>
> //    public static final int TEXT = 0;
> //    public static final int XML = 1;
> //    public static final int HTML = 2;
>
>     private static String baseurl = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi";
>     private URL aUrl;
>     private URLConnection uConn;
>     private OutputStreamWriter fromQBlast;
>     private BufferedReader rd;
>
>     private String seq = null;
>     private String prog = null;
>     private String db = null;
>     private String outputFormat = null;
>     private String advanced = null;
>
>     private String rid;
>     private long step;
>     private boolean done = false;
>     private long start;
>
>     public RemoteQBlastService() throws BioException {
>         try {
>             aUrl = new URL(baseurl);
>             uConn = setQBlastProperties(aUrl.openConnection());
>
>             outputFormat = "Text";
>         }
>         /*
>          * Needed but should never be thrown since the URL is static and known to exist
>          */
>         catch (MalformedURLException e) {
>             throw new BioException("It looks like the URL for NCBI QBlast service is bad");
>         }
>         /*
>          * Intercept if the program can't connect to QBlast service
>          */
>         catch (IOException e) {
>             throw new BioException(
>                     "Impossible to connect to QBlast service at this time. Check your network connection");
>         }
>     }
>
>     /**
>      * This method executes the Blast request via the Put command of the CGI-BIN
>      * interface. It gets the estimated time of completion by capturing the
>      * value of the RTOE variable and sets a loop that will check for completion
>      * of analysis at intervals specified by RTOE.
>      *
>      * <p>
>      * It also captures the value for the RID variable, necessary for fetching
>      * the actual results after completion.
>      * </p>
>      *
>      * @throws BioException
>      *             if it is not possible to send the BLAST command
>      */
>     public void executeSearch() throws BioException {
>
>         if (seq == null || db == null || prog == null) {
>             throw new BioException(
>                     "Impossible to execute QBlast request. One or more of seq|db|prog has not been set");
>         }
>         /*
>          * sending the command to execute the Blast analysis
>          */
>         String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&"
>                 + db + "&" + "FORMAT_TYPE=HTML";
>
>         if (advanced != null) {
>             cmd += "&" + advanced;
>         }
>
>         try {
>
>             uConn = setQBlastProperties(aUrl.openConnection());
>
>             fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>
>             fromQBlast.write(cmd);
>             fromQBlast.flush();
>
>             // Get the response
>             rd = new BufferedReader(new InputStreamReader(uConn
>                     .getInputStream()));
>
>             String line = "";
>
>             while ((line = rd.readLine()) != null) {
>                 if (line.contains("RID")) {
>                     String[] arr = line.split("=");
>                     rid = arr[1].trim();
>                 } else if (line.contains("RTOE")) {
>                     String[] arr = line.split("=");
>                     step = Long.parseLong(arr[1].trim()) * 1000;
>                     start = System.currentTimeMillis() + step;
>                 }
>             }
>         } catch (IOException e) {
>             throw new BioException(
>                     "Can't submit sequence to BLAST server at this time.");
>         }
>         /*
>          * Getting the info out of the NCBI system
>          */
>         while (!done) {
>             long prez = System.currentTimeMillis();
>             done = isReady(rid, prez);
>         }
>     }
>
>     /**
>      * <p>This method is used only by the executeSearch method to check for completion of
>      * the request using the NCBI-specified RTOE variable</p>
>      *
>      * @param id
>      * @param present
>      * @return
>      */
>     private boolean isReady(String id, long present) {
>
>         boolean ready = false;
>         String check = "CMD=Get&RID=" + id;
>         /*
>          * If present time is less than the start of the search added to step
>          * obtained from NCBI, just do nothing ;-)
>          */
>         if (present < start) {
>             ;
>         }
>         /*
>          * If we are at least step seconds in the future from the actual call of
>          * method executeSearch()
>          */
>         else {
>             try {
>                 uConn = setQBlastProperties(aUrl.openConnection());
>
>                 fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>                 fromQBlast.write(check);
>                 fromQBlast.flush();
>
>                 rd = new BufferedReader(new InputStreamReader(uConn
>                         .getInputStream()));
>
>                 String line = "";
>
>                 while ((line = rd.readLine()) != null) {
>                     if (line.contains("READY")) {
>                         ready = true;
>                     } else if (line.contains("WAITING")) {
>                         /*
>                          * Else, move start forward in time...
>                          */
>                         start = present + step;
>                     }
>                 }
>             } catch (IOException e) {
>                 e.printStackTrace();
>             }
>         }
>         return ready;
>     }
>
>     /**
>      * <p>This method extracts the actual Blast report. The default format is
>      * Text but can be changed beforehand with the method setQBlastOutputFormat.</p>
>      *
>      *
>      * @return
>      * @throws BioException
>      */
>     public InputStream getAlignmentResults() throws BioException {
>         String srid = "CMD=Get&RID=" + rid;
>         srid += "&FORMAT_TYPE=" + outputFormat;
>
>         if(!this.done){
>             throw new BioException("Unable to get report at this time. Your Blast request has not been processed yet.");
>         }
>
>         try {
>             uConn = setQBlastProperties(aUrl.openConnection());
>
>             fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>             fromQBlast.write(srid);
>             fromQBlast.flush();
>
>             return uConn.getInputStream();
>
>         } catch (IOException ioe) {
>             throw new BioException(
>                     "It is not possible to fetch Blast report from NCBI at this time");
>         }
>     }
>
>     /**
>      * <p>
>      * Set the sequence to be blasted using the String that corresponds to the
>      * sequence.
>      * </p>
>      *
>      * <p>
>      * Take note that this method is mutually exclusive to setGIToBlast() for a
>      * given Blast request.
>      * </p>
>      *
>      * @param aStr
>      *            : a String with the sequence
>      */
>     public void setSequence(String aStr) {
>         this.seq = "QUERY=" + aStr;
>     }
>
>     /**
>      * Simply return a string with the blasted sequence.
>      *
>      * @return seq : a string with the sequence
>      */
>     public String getSeqToBlast() {
>         return this.seq;
>     }
>
>     /**
>      * <p>
>      * Set the sequence to be blasted using the NCBI GI value. At this time,
>      * there is no effort made to check the validity of this GI.
>      * </p>
>      *
>      * <p>
>      * Take note that this method is mutually exclusive to setSequence() for a
>      * given Blast request.
>      * </p>
>      *
>      * @param gi
>      *            : an integer value representing a NCBI GI
>      */
>     public void setGIToBlast(String gi) {
>         this.seq = "QUERY=" + gi;
>     }
>
>     /**
>      * <p>
>      * Simply return a string with the sequence blasted.
>      * </p>
>      *
>      * @return GI : a String with the GI of the blasted sequence
>      */
>     public String getGIToBlast() {
>         return this.seq;
>     }
>
>     /**
>      * <p>
>      * This method sets the program to be used to blast the given sequence/GI. At
>      * this time, there is no attempt at checking the matching of sequence type
>      * to program.
>      * </p>
>      *
>      * @param prog
>      *            : a String representing the program specified for this QBlast
>      *            request.
>      *
>      */
>     public void setProgram(String prog) {
>         this.prog = "PROGRAM=" + prog;
>     }
>
>     /**
>      * <p>
>      * Simply returns the program used for the given Blast request.
>      * </p>
>      *
>      * @return prog : a String with the program used for this QBlast request.
>      */
>     public String getProgram() {
>         return this.prog;
>     }
>
>     /**
>      * <p>
>      * This method sets the database to be used to blast the given sequence/GI.
>      * At this time, there is no attempt at checking the matching of sequence
>      * type to database.
>      * </p>
>      *
>      * @param db: a String for the database specified for this QBlast request
>      */
>     public void setDatabase(String db) {
>         this.db = "DATABASE=" + db;
>     }
>
>     /**
>      * <p>
>      * Simply returns the database used for the given Blast request.
>      * </p>
>      *
>      * @return db: a String with the database used for this QBlast request.
>      */
>     public String getBlastDatabase() {
>         return this.db;
>     }
>
>     /**
>      * <p>This method lets the user specify which format to use for generating the output.</p>
>      *
>      * @param type: an integer taken from the static constants of this class, one of TEXT, XML or HTML
>      */
>     public void setQBlastOutputFormat(int type) {
>
>         switch (type) {
>             case 0:
>                 this.outputFormat = "Text";
>                 break;
>             case 1:
>                 this.outputFormat = "XML";
>                 break;
>             case 2:
>                 this.outputFormat = "HTML";
>                 break;
>         }
>     }
>
>     /**
>      * <p>
>      * Simply returns the output format used for the given Blast report.
>      * </p>
>      *
>      * @return outputFormat : a String with the format specified for the QBlast report.
>      */
>     public String getQBlastOutputFormat() {
>         return this.outputFormat;
>     }
>
>     /**
>      * <p>This method is to be used if a request is to use non-default values at submission. According to QBlast info,
>      * the accepted parameters for PUT requests are:</p>
>      *
>      * <ul>
>      * <li>-G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / non-affine for megablast</li>
>      * <li>-E: Cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / non-affine for megablast</li>
>      * <li>-r: integer to reward for match. Default = 1</li>
>      * <li>-q: negative integer for penalty to allow mismatch. Default = -3</li>
>      * <li>-e: expectation value. Default = 10.0</li>
>      * <li>-W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 (megablast)</li>
>      * <li>-y: dropoff for blast extensions in bits, using default if not specified. Default = 20 for blastn, 7 for all others
>      * (except megablast for which it is not applicable).</li>
>      * <li>-X: X dropoff value for gapped alignment, in bits. Default = 30 for blastn/megablast, 15 for all others.</li>
>      * <li>-Z: final X dropoff value for gapped alignment, in bits. Default = 50 for blastn, 25 for all others
>      * (except megablast for which it is not applicable)</li>
>      * <li>-P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. Does not apply to blastn or megablast.</li>
>      * <li>-A: multiple hits window size. Default = 0 (for single hit algorithm)</li>
>      * <li>-I: number of database sequences to save hits for. Default = 500</li>
>      * <li>-Y: effective length of the search space. Default = 0 (0 represents using the whole space)</li>
>      * <li>-z: a real specifying the effective length of the database to use. Default = 0 (0 represents the real size)</li>
>      * <li>-c: an integer representing pseudocount constant for PSI-BLAST. Default = 7</li>
>      * <li>-F: any filtering directive</li>
>      * </ul>
>      *
>      * <p>You have to be aware that at no moment is there any error checking on the use of these parameters by this class.</p>
>      * @param aStr: a String with any number of optional parameters with an associated value.
>      *
>      */
>     public void setAdvancedOptions(String aStr) {
>         this.advanced = "OTHER_ADVANCED=" + aStr;
>     }
>
>     /**
>      *
>      * Simply return the string given as argument via setAdvancedOptions
>      *
>      * @return advanced: the string with the advanced options
>      */
>     public String getBlastAdvancedOptions() {
>         return this.advanced;
>     }
>
>     /**
>      *
>      * Simply return the QBlast RID for this specific QBlast request
>      *
>      * @return rid: the string with the RID
>      */
>     public String getBlastRID() {
>         return this.rid;
>     }
>
>     /**
>      * A simple method to check the availability of the QBlast service
>      *
>      * @throws BioException
>      */
>     public void printRemoteBlastInfo() throws BioException {
>         try {
>             OutputStreamWriter out = new OutputStreamWriter(uConn
>                     .getOutputStream());
>
>             out.write("CMD=Info");
>             out.flush();
>
>             // Get the response
>             BufferedReader rd = new BufferedReader(new InputStreamReader(uConn
>                     .getInputStream()));
>
>             String line = "";
>
>             while ((line = rd.readLine()) != null) {
>                 System.out.println(line);
>             }
>
>             out.close();
>             rd.close();
>         } catch (IOException e) {
>             throw new BioException(
>                     "Impossible to get info from QBlast service at this time. Check your network connection");
>         }
>     }
>
>     private URLConnection setQBlastProperties(URLConnection conn) {
>
>         URLConnection tmp = conn;
>
>         conn.setDoOutput(true);
>         conn.setUseCaches(false);
>
>         tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService");
>         tmp.setRequestProperty("Connection", "Keep-Alive");
>         tmp.setRequestProperty("Content-type",
>                 "application/x-www-form-urlencoded");
>         tmp.setRequestProperty("Content-length", "200");
>
>         return tmp;
>     }
> }

From holland at eaglegenomics.com Thu Jun 11 11:30:11 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 11 Jun 2009 16:30:11 +0100
Subject: [Biojava-dev] First draft of a remote blast service class
In-Reply-To: <59a41c430906110824l2f1167cfp236cf69cc9dee94c@mail.gmail.com>
References: <59a41c430906110824l2f1167cfp236cf69cc9dee94c@mail.gmail.com>
Message-ID: <1244734211.5546.62.camel@buzzybee>

Excellent idea. Even better than a Map or Properties! One parameters bean type per implementation type, complete with all its own validation, extending a placeholder interface that can be used in the generic interface declaration for RemotePairwiseAlignmentService. That would be sweet.

On Thu, 2009-06-11 at 08:24 -0700, Andreas Prlic wrote:
> I would pass the parameters as a bean rather than a string...
>
> Andreas
>
> On Thu, Jun 11, 2009 at 6:52 AM, Sylvain Foisy wrote:
> > Hi to all,
> >
> > I've been working on this for the past week or so and after discussing this
> > with Andreas, I am putting my code here for critical review. I'll put this
> > stuff in biojava-live as soon as Andreas can fix my SVN access.
> >
> > First, an interface called RemotePairwiseAlignementService defines the basic
> > components of a remote service: sequence/database/program/run options/output
> > options. RemoteQBlastService implements this interface and runs remote
> > QBlast requests and creates output in either text, XML or HTML. At present,
> > regular blastall programs work; there is no blastpgp/megablast support yet.
> >
> > I'll need some guidance to make it work on other types of web services like
> > EBI.
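The parameters-bean idea from this message (one bean type per implementation, carrying its own validation, behind a placeholder interface) could be sketched as follows. Everything here is an assumption made for illustration: RemoteAlignmentProperties and QBlastProperties are hypothetical names, and nothing like them exists in the posted code. The accepted program names are the five blastall programs, and the default expectation value of 10.0 is the one given in the -e entry of the posted javadoc.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class AlignmentPropertiesSketch {

    /** Hypothetical placeholder interface that a generic service interface could accept. */
    interface RemoteAlignmentProperties {
    }

    /** Hypothetical QBlast-specific bean; each setter validates its own value. */
    static final class QBlastProperties implements RemoteAlignmentProperties {

        // The five programs run by blastall
        private static final Set<String> PROGRAMS = Collections.unmodifiableSet(
                new HashSet<String>(Arrays.asList(
                        "blastn", "blastp", "blastx", "tblastn", "tblastx")));

        private String program;
        private double expect = 10.0; // default from the -e entry in the posted javadoc

        public String getProgram() {
            return program;
        }

        public void setProgram(String program) {
            if (!PROGRAMS.contains(program)) {
                throw new IllegalArgumentException("Unknown BLAST program: " + program);
            }
            this.program = program;
        }

        public double getExpect() {
            return expect;
        }

        public void setExpect(double expect) {
            // written as !(x > 0) so that NaN is rejected as well
            if (!(expect > 0.0)) {
                throw new IllegalArgumentException("Expectation value must be positive: " + expect);
            }
            this.expect = expect;
        }
    }

    public static void main(String[] args) {
        QBlastProperties props = new QBlastProperties();
        props.setProgram("blastp");
        props.setExpect(0.001);
        System.out.println(props.getProgram() + " with e-value " + props.getExpect());
    }
}
```

With this shape, a generic method such as a hypothetical setAlignmentOptions(RemoteAlignmentProperties) would stay service-neutral, while each implementation would check that it received its own bean type.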
> >
> > Best regards
> >
> > Sylvain
> >
> > ===================================================================
> >
> > Sylvain Foisy, Ph. D.
> > Consultant Bio-informatique / Bioinformatics
> > Diploide.net - TI pour la vie / IT for Life
> >
> > Courriel: sylvain.foisy at diploide.net
> > Web: http://www.diploide.net
> > Tel: (514) 893-4363
> > ===================================================================
> >
> > import java.io.InputStream;
> >
> > import org.biojava.bio.BioException;
> > /**
> >  * This interface specifies minimal information needed to execute a pairwise
> >  * alignment on a remote service.
> >  *
> >  * Example of service: QBlast service at NCBI
> >  *                     Web Service at EBI
> >  *
> >  * @author Sylvain Foisy
> >  * @since 1.8
> >  *
> >  */
> > public interface RemotePairwiseAlignementService {
> >
> >     /**
> >      * This field specifies that the output format of results
> >      * is text.
> >      *
> >      */
> >     public static final int TEXT = 0;
> >
> >     /**
> >      * This field specifies that the output format of results
> >      * is XML.
> >      *
> >      */
> >     public static final int XML = 1;
> >
> >     /**
> >      * This field specifies that the output format of results
> >      * is HTML.
> >      *
> >      */
> >     public static final int HTML = 2;
> >
> >     /**
> >      * Setting the database to use for doing the pairwise alignment
> >      *
> >      * @param db: a String with a valid database ID for the service used.
> >      *
> >      */
> >     public void setDatabase(String db);
> >
> >     /**
> >      * Setting the sequence to be aligned for this request
> >      *
> >      * @param seq: a String with a sequence to be aligned.
> >      *
> >      */
> >     public void setSequence(String seq);
> >
> >     /**
> >      * Setting the program to use for this pairwise alignment
> >      *
> >      * @param prog: a String with a valid database ID for the service used.
> >      *
> >      */
> >     public void setProgram(String prog);
> >
> >     /**
> >      * Setting all other options to use for this pairwise alignment
> >      *
> >      * @param db: a String with a valid database ID for the service used.
> >      *
> >      */
> >     public void setAdvancedOptions(String str);
> >
> >     /**
> >      * Doing the actual analysis on the instantiated service
> >      *
> >      * @throws BioException
> >      */
> >     public void executeSearch() throws BioException;
> >
> >     /**
> >      * Getting the actual alignment results from this instantiated service
> >      *
> >      * @return : an InputStream with the actual alignment results
> >      * @throws BioException
> >      */
> >     public InputStream getAlignmentResults() throws BioException;
> > }
> >
> > import java.io.BufferedReader;
> > import java.io.IOException;
> > import java.io.InputStream;
> > import java.io.InputStreamReader;
> > import java.io.OutputStreamWriter;
> > import java.net.MalformedURLException;
> > import java.net.URL;
> > import java.net.URLConnection;
> >
> > import org.biojava.bio.BioException;
> >
> > /**
> >  * RemoteQBlastService - A simple way of submitting BLAST requests to the QBlast
> >  * service at NCBI.
> >  *
> >  * <p>
> >  * NCBI provides a Blast server through a CGI-BIN interface. RemoteQBlastService simply
> >  * encapsulates an access to it by giving users access to get/set methods to fix
> >  * sequence, program and database as well as advanced options.
> >  * </p>
> >  *
> >  * <p>
> >  * As of version 1.0, only blastall programs are usable. blastpgp and megablast are high priorities.
> >  * </p>
> >  *
> >  * @author Sylvain Foisy
> >  * @version 1.0
> >  * @since 1.8
> >  *
> >  *
> >  */
> > public class RemoteQBlastService implements RemotePairwiseAlignementService{
> >
> > //    public static final int TEXT = 0;
> > //    public static final int XML = 1;
> > //    public static final int HTML = 2;
> >
> >     private static String baseurl = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi";
> >     private URL aUrl;
> >     private URLConnection uConn;
> >     private OutputStreamWriter fromQBlast;
> >     private BufferedReader rd;
> >
> >     private String seq = null;
> >     private String prog = null;
> >     private String db = null;
> >     private String outputFormat = null;
> >     private String advanced = null;
> >
> >     private String rid;
> >     private long step;
> >     private boolean done = false;
> >     private long start;
> >
> >     public RemoteQBlastService() throws BioException {
> >         try {
> >             aUrl = new URL(baseurl);
> >             uConn = setQBlastProperties(aUrl.openConnection());
> >
> >             outputFormat = "Text";
> >         }
> >         /*
> >          * Needed but should never be thrown since the URL is static and known to exist
> >          */
> >         catch (MalformedURLException e) {
> >             throw new BioException("It looks like the URL for NCBI QBlast service is bad");
> >         }
> >         /*
> >          * Intercept if the program can't connect to QBlast service
> >          */
> >         catch (IOException e) {
> >             throw new BioException(
> >                     "Impossible to connect to QBlast service at this time. Check your network connection");
> >         }
> >     }
> >
> >     /**
> >      * This method executes the Blast request via the Put command of the CGI-BIN
> >      * interface. It gets the estimated time of completion by capturing the
> >      * value of the RTOE variable and sets a loop that will check for completion
> >      * of analysis at intervals specified by RTOE.
> >      *
> >      *

> > * It also capture the value for the RID variable, necessary for > > fetching > > * the actual results after completion. > > *

> > * > > * @throws BioException > > * if it is not possible to sent the BLAST command > > */ > > public void executeSearch() throws BioException { > > > > if (seq == null || db == null || prog == null) { > > throw new BioException( > > "Impossible to execute QBlast request. One or more of > > seq|db|prog has not been set"); > > } > > /* > > * sending the command to execute the Blast analysis > > */ > > String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&" > > + db + "&" + "FORMAT_TYPE=HTML"; > > > > if (advanced != null) { > > cmd += cmd + "&" + advanced; > > } > > > > try { > > > > uConn = setQBlastProperties(aUrl.openConnection()); > > > > fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); > > > > fromQBlast.write(cmd); > > fromQBlast.flush(); > > > > // Get the response > > rd = new BufferedReader(new InputStreamReader(uConn > > .getInputStream())); > > > > String line = ""; > > > > while ((line = rd.readLine()) != null) { > > if (line.contains("RID")) { > > String[] arr = line.split("="); > > rid = arr[1].trim(); > > } else if (line.contains("RTOE")) { > > String[] arr = line.split("="); > > step = Long.parseLong(arr[1].trim()) * 1000; > > start = System.currentTimeMillis() + step; > > } > > } > > } catch (IOException e) { > > throw new BioException( > > "Can't submit sequence to BLAST server at this time."); > > } > > /* > > * Getting the info out of the NCBI system > > */ > > while (!done) { > > long prez = System.currentTimeMillis(); > > done = isReady(rid, prez); > > } > > } > > > > /** > > *

This method is used only for the executeBlastSearch method to > > check for completion of > > * request using the NCBI specified RTOE variable

> > * > > * @param id > > * @param present > > * @return > > */ > > private boolean isReady(String id, long present) { > > > > boolean ready = false; > > String check = "CMD=Get&RID=" + id; > > /* > > * If present time is less than the start of the search added to > > step > > * obtained from NCBI, just do nothing ;-) > > */ > > if (present < start) { > > ; > > } > > /* > > * If we are at least step seconds in the future from the actual > > call of > > * method executeBlastSearch() > > */ > > else { > > try { > > uConn = setQBlastProperties(aUrl.openConnection()); > > > > fromQBlast = new > > OutputStreamWriter(uConn.getOutputStream()); > > fromQBlast.write(check); > > fromQBlast.flush(); > > > > rd = new BufferedReader(new InputStreamReader(uConn > > .getInputStream())); > > > > String line = ""; > > > > while ((line = rd.readLine()) != null) { > > if (line.contains("READY")) { > > ready = true; > > } else if (line.contains("WAITING")) { > > /* > > * Else, move start forward in time... > > */ > > start = present + step; > > } > > } > > } catch (IOException e) { > > e.printStackTrace(); > > } > > } > > return ready; > > } > > > > /** > > *

This method extracts this actual Blast report. The default format > > is Text but can be changed before with the method > > * setQBlastOutputFormat.

> > * > > * > > * @return > > * @throws BioException > > */ > > public InputStream getAlignmentResults() throws BioException { > > String srid = "CMD=Get&RID=" + rid; > > srid += "&FORMAT_TYPE=" + outputFormat; > > > > if(!this.done){ > > throw new BioException("Unable to get report at this time. Your > > Blast request has not been processed yet."); > > } > > > > try { > > uConn = setQBlastProperties(aUrl.openConnection()); > > > > fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); > > fromQBlast.write(srid); > > fromQBlast.flush(); > > > > return uConn.getInputStream(); > > > > } catch (IOException ioe) { > > throw new BioException( > > "It is not possible to fetch Blast report from NCBI at > > this time"); > > } > > } > > > > /** > > *

> > * Set the sequence to be blasted using the String that correspond to > > the > > * sequence. > > *

> > * > > *

> > * Take note that this method is mutually exclusive to setGIToBlast() > > for a > > * given Blast request. > > *

> > * > > * @param aStr > > * : a String with the sequence > > */ > > public void setSequence(String aStr) { > > this.seq = "QUERY=" + aStr; > > } > > > > /** > > * Simply return a string with the blasted sequence. > > * > > * @return seq : a string with the sequence > > */ > > public String getSeqToBlast() { > > return this.seq; > > } > > > > /** > > *

> > * Set the sequence to be blasted using the NCBI GI value. At this time, > > * there is no effort made to check the validity of this GI. > > *

> > * > > *

> > * Take note that this method is mutually exclusive to setSeqToBlast() > > for a > > * given Blast request. > > *

> > * > > * @param gi > > * : an integer value representing a NCBI GI > > */ > > public void setGIToBlast(String gi) { > > this.seq = "QUERY=" + gi; > > } > > > > /** > > *

> > * Simply return a string with the sequence blasted. > > *

> > * > > * @return GI : a String with the GI of the blasted sequence > > */ > > public String getGIToBlast() { > > return this.seq; > > } > > > > /** > > *

> > * This method set the program to be used to blast the given > > sequence/GI. At > > * this time, there is no attempt at checking the matching of sequence > > type > > * to program. > > *

> > * > > * @param prog > > * : a String representing the program specified for this > > QBlast > > * request. > > * > > */ > > public void setProgram(String prog) { > > this.prog = "PROGRAM=" + prog; > > } > > > > /** > > *

> > * Simply returns the program used for the given Blast request. > > *

> > * > > * @return prog : a String with the program used for this QBlast > > request. > > */ > > public String getProgram() { > > return this.prog; > > } > > > > /** > > *

> > * This method set the database to be used to blast the given > > sequence/GI. > > * At this time, there is no attempt at checking the matching of > > sequence > > * type to database. > > *

> > * > > * @param db: a String for the database specified for this QBlast > > request > > */ > > public void setDatabase(String db) { > > this.db = "DATABASE=" + db; > > } > > > > /** > > *

> > * Simply returns the database used for the given Blast request. > > *

> > * > > * @return db: a String with the database used for this QBlast request. > > */ > > public String getBlastDatabase() { > > return this.db; > > } > > > > /** > > *

This method let the user specify which format to use for > > generating the output.

> > * > > * @param type:an integer taken from the static constant of this class, > > either be TEXT, XML or HTML > > */ > > public void setQBlastOutputFormat(int type) { > > > > switch (type) { > > case 0: > > this.outputFormat = "Text"; > > break; > > case 1: > > this.outputFormat = "XML"; > > break; > > case 2: > > this.outputFormat = "HTML"; > > break; > > } > > } > > > > /** > > *

> > * Simply returns the output format used for the given Blast report. > > *

> > * > > * @return outputFormat : a String with the format specified for the > > QBlast report. > > */ > > public String getQBlastOutputFormat() { > > return this.outputFormat; > > } > > > > /** > > *

This method is to be used if a request is to use non-default > > values at submission. According to QBlast info, > > * the accepted parameters for PUT requests are:

> > * > > *
    > > *
  • -G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / > > non-affine for megablast
  • > > *
  • -E: Cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / > > non-affine for megablast
  • > > *
  • -r: integer to reward for match. Default = 1
  • > > *
  • -q: negative integer for penalty to allow mismatch. Default = > > -3
  • > > *
  • -e: expectation value. Default = 10.0
  • > > *
  • -W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 > > (megablast)
  • > > *
  • -y: dropoff for blast extensions in bits, using default if not > > specified. Default = 20 for blastn, 7 for all others > > * (except megablast for which it is not applicable).
  • > > *
  • -X: X dropoff value for gapped alignment, in bits. Default = 30 > > for blastn/megablast, 15 for all others.
  • > > *
  • -Z: final X dropoff value for gapped alignement, in bits. Default > > = 50 for blastn, 25 for all others > > * (except megablast for which it is not applicable)
  • > > *
  • -P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. > > Does not apply to blastn ou megablast.
  • > > *
  • -A: multiple hits window size. Default = 0 (for single hit > > algorithm)
  • > > *
  • -I: number of database sequences to save hits for. Default = > > 500
  • > > *
  • -Y: effective length of the search space. Default = 0 (0 > > represents using the whole space)
  • > > *
  • -z: a real specifying the effective length of the database to > > use. Default = 0 (0 represents the real size)
  • > > *
  • -c: an integer representing pseudocount constant for PSI-BLAST. > > Default = 7
  • > > *
  • -F: any filtering directive
  • > > *
> > * > > *

You have to be aware that at not moment is there any error > > checking on the use of these parameters by this class.

> > * @param aStr: a String with any number of optional parameters with an > > associated value. > > * > > */ > > public void setAdvancedOptions(String aStr) { > > this.advanced = "OTHER_ADVANCED=" + aStr; > > } > > > > /** > > * > > * Simply return the string given as argument via > > setBlastAdvancedOptions > > * > > * @return advanced: the string with the advanced options > > */ > > public String getBlastAdvancedOptions() { > > return this.advanced; > > } > > > > /** > > * > > * Simply return the QBlast RID for this specific QBlast request > > * > > * @return rid: the string with the RID > > */ > > public String getBlastRID() { > > return this.rid; > > } > > > > /** > > * A simple method to check the availability of the QBlast service > > * > > * @throws BioException > > */ > > public void printRemoteBlastInfo() throws BioException { > > try { > > OutputStreamWriter out = new OutputStreamWriter(uConn > > .getOutputStream()); > > > > out.write("CMD=Info"); > > out.flush(); > > > > // Get the response > > BufferedReader rd = new BufferedReader(new > > InputStreamReader(uConn > > .getInputStream())); > > > > String line = ""; > > > > while ((line = rd.readLine()) != null) { > > System.out.println(line); > > } > > > > out.close(); > > rd.close(); > > } catch (IOException e) { > > throw new BioException( > > "Impossible to get info from QBlast service at this > > time. 
Check your network connection");
> >         }
> >     }
> >
> >     private URLConnection setQBlastProperties(URLConnection conn) {
> >
> >         URLConnection tmp = conn;
> >
> >         conn.setDoOutput(true);
> >         conn.setUseCaches(false);
> >
> >         tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService");
> >         tmp.setRequestProperty("Connection", "Keep-Alive");
> >         tmp.setRequestProperty("Content-type",
> >                 "application/x-www-form-urlencoded");
> >         tmp.setRequestProperty("Content-length", "200");
> >
> >         return tmp;
> >     }
> > }
> >
> >
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> >
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com Thu Jun 11 10:17:35 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 11 Jun 2009 15:17:35 +0100
Subject: [Biojava-dev] First draft of a remote blast service class
In-Reply-To:
References:
Message-ID: <1244729855.5546.52.camel@buzzybee>

Good stuff! My 2p's worth:

setSequence() should be overloaded to accept all forms of possible
sequence input - whatever is decided on as the standard way of
referencing sequence data in BJ3. The original plan for BJ3 was to allow
String/CharSequence and List (see
http://www.biojava.org/wiki/BioJava3:HowTo )

setAdvancedOptions() should not accept a String, but rather a Properties
or a Map, where the keys of the Map/Properties are
restricted to a range of acceptable values determined (and published,
maybe as an enum?) by each of the implementation classes (e.g.
RemoteQBlastService). The implementation class then uses this to
construct the call string.
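
A minimal sketch of that idea in Java, not a definitive design. All names
here (QBlastOptionsSketch, Option, toAdvancedString) are hypothetical and
are not BioJava API. The flags are taken from the setAdvancedOptions()
Javadoc in the draft, and the space-separated "flag value" layout of the
resulting string is an assumption about what OTHER_ADVANCED expects.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in BioJava. */
final class QBlastOptionsSketch {

    /** The published set of allowed keys (a few flags from the draft Javadoc). */
    enum Option {
        GAP_OPEN("-G"), GAP_EXTEND("-E"), EXPECT("-e"), WORD_SIZE("-W");

        private final String flag;

        Option(String flag) {
            this.flag = flag;
        }

        String flag() {
            return flag;
        }
    }

    /**
     * Builds the value an implementation could send as OTHER_ADVANCED.
     * The "flag value" layout is assumed, not confirmed by the service docs.
     */
    static String toAdvancedString(Map<Option, String> options) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Option, String> e : options.entrySet()) {
            String value = e.getValue();
            // Minimal validation: every key must carry a non-blank value.
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalArgumentException("No value for " + e.getKey());
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey().flag()).append(' ').append(value.trim());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Option, String> options = new EnumMap<Option, String>(Option.class);
        options.put(Option.EXPECT, "10.0");
        options.put(Option.GAP_OPEN, "5");
        // EnumMap iterates in enum declaration order, so -G comes before -e.
        System.out.println(toAdvancedString(options));
    }
}
```

Because the keys are an enum, a caller cannot pass an unknown option at
all, and each implementation class can publish its own enum. URL-encoding
of the resulting string before it goes into the request is left out of the
sketch.
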
The reason for doing it this way is that (a)
it allows the parameters to be verified by checking them against a known
list of allowable key/values, and (b) it allows for non-URL based remote
requests to be constructed from the values, e.g. SOAP calls.

I would also replace the static int HTML/TEXT/XML with an enum as
numeric constants are sometimes a Bad Thing.

The setProgram() method in my mind is specific to Blast, as opposed to
being a generic Pairwise Alignment concept. Therefore it might be better
to move this to a Blast-specific sub-interface or make it only appear in
the implementation classes that refer to Blast.

Finally, the JavaDocs for the various set() methods are incorrect -
they're all mostly the same in fact! :)

But overall it looks good.

cheers,
Richard

On Thu, 2009-06-11 at 09:52 -0400, Sylvain Foisy wrote:
> Hi to all,
>
> I've been working on this for the past week or so and after discussing this
> with Andreas, I am putting my code here for critical review. I'll put this
> stuff in biojava-live as soon as Andreas can fix my SVN access.
>
> First, an interface called RemotePairwiseAlignementSerivce defines the basic
> components of a remote service: sequence/database/progam/run options/output
> options. RemoteQBlastService implements this interface and runs remote
> Qblast requests and creates output in either text, XML or HTML. At present
> time, regular blastall programs work, no blastpgp/megablast support yet.
>
> I'll need some guidance to make it work on other type of web services like
> EBI.
>
> Best regards
>
> Sylvain
>
> ===================================================================
>
> Sylvain Foisy, Ph. D.
> Consultant Bio-informatique / Bioinformatics > Diploide.net - TI pour la vie / IT for Life > > Courriel: sylvain.foisy at diploide.net > Web: http://www.diploide.net > Tel: (514) 893-4363 > =================================================================== > > import java.io.InputStream; > > import org.biojava.bio.BioException; > /** > * This interface specifies minimal information needed to execute a pairwise > alignment on a remote service. > * > * Example of service: QBlast service at NCBI > * Web Service at EBI > * > * @author Sylvain Foisy > * @since 1.8 > * > */ > public interface RemotePairwiseAlignementService { > > /** > * This field specifies that the output format of results > * is text. > * > */ > public static final int TEXT = 0; > > /** > * This field specifies that the output format of results > * is XML. > * > */ > public static final int XML = 1; > > /** > * This field specifies that the output format of results > * is HTML. > * > */ > public static final int HTML = 2; > > /** > * Setting the database to use for doing the pairwise alignment > * > * @param db: a String with a valid database ID for the > service used. > * > */ > public void setDatabase(String db); > > /** > * Setting the sequence to be align for this for this request > * > * @param seq: a String with a sequence to be aligned. > * > */ > public void setSequence(String seq); > > /** > * Setting the program to use for this pairwise alignment > * > * @param prog: a String with a valid database ID for the > service used. > * > */ > public void setProgram(String prog); > > /** > * Setting all other options to use for this pairwise alignment > * > * @param db: a String with a valid database ID for the > service used. 
> * > */ > public void setAdvancedOptions(String str); > > /** > * Doing the actual analysis on the instantiated service > * > * @throws BioException > */ > public void executeSearch() throws BioException; > > /** > * Getting the actual alignment results from this instantiated service > * > * @return : an InputStream with the actual alignment > results > * @throws BioException > */ > public InputStream getAlignmentResults() throws BioException; > } > > import java.io.BufferedReader; > import java.io.IOException; > import java.io.InputStream; > import java.io.InputStreamReader; > import java.io.OutputStreamWriter; > import java.net.MalformedURLException; > import java.net.URL; > import java.net.URLConnection; > > import org.biojava.bio.BioException; > > /** > * RemoteQBlastService - A simple way of submitting BLAST request to the > QBlast > * service at NCBI. > * > *

> * NCBI provides a Blast server through a CGI-BIN interface. > RemoteQBlastService simply > * encapsulates an access to it by giving users access to get/set methods to > fix > * sequence, program and database as well as advanced options. > *

> * > *

> * As of version 1.0, only blastall programs are usable. blastpgp and > megablast are high-priorities. > *

> * > * @author Sylvain Foisy > * @version 1.0 > * @since 1.8 > * > * > */ > public class RemoteQBlastService implements RemotePairwiseAlignementService{ > > // public static final int TEXT = 0; > // public static final int XML = 1; > // public static final int HTML = 2; > > private static String baseurl = > "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"; > private URL aUrl; > private URLConnection uConn; > private OutputStreamWriter fromQBlast; > private BufferedReader rd; > > private String seq = null; > private String prog = null; > private String db = null; > private String outputFormat = null; > private String advanced = null; > > private String rid; > private long step; > private boolean done = false; > private long start; > > public RemoteQBlastService() throws BioException { > try { > aUrl = new URL(baseurl); > uConn = setQBlastProperties(aUrl.openConnection()); > > outputFormat = "Text"; > } > /* > * Needed but should never be thrown since the URL is static and > known to exist > */ > catch (MalformedURLException e) { > throw new BioException("It looks like the URL for NCBI QBlast > service is bad"); > } > /* > * Intercept if the program can't connect to QBlast service > */ > catch (IOException e) { > throw new BioException( > "Impossible to connect to QBlast service at this time. > Check your network connection"); > } > } > > /** > * This method execute the Blast request via the Put command of the > CGI-BIN > * interface. It gets the estimated time of completion by capturing the > * value of the RTOE variable and sets a loop that will check for > completion > * of analysis at intervals specified by RTOE. > * > *

> * It also capture the value for the RID variable, necessary for > fetching > * the actual results after completion. > *

> * > * @throws BioException > * if it is not possible to sent the BLAST command > */ > public void executeSearch() throws BioException { > > if (seq == null || db == null || prog == null) { > throw new BioException( > "Impossible to execute QBlast request. One or more of > seq|db|prog has not been set"); > } > /* > * sending the command to execute the Blast analysis > */ > String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&" > + db + "&" + "FORMAT_TYPE=HTML"; > > if (advanced != null) { > cmd += cmd + "&" + advanced; > } > > try { > > uConn = setQBlastProperties(aUrl.openConnection()); > > fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); > > fromQBlast.write(cmd); > fromQBlast.flush(); > > // Get the response > rd = new BufferedReader(new InputStreamReader(uConn > .getInputStream())); > > String line = ""; > > while ((line = rd.readLine()) != null) { > if (line.contains("RID")) { > String[] arr = line.split("="); > rid = arr[1].trim(); > } else if (line.contains("RTOE")) { > String[] arr = line.split("="); > step = Long.parseLong(arr[1].trim()) * 1000; > start = System.currentTimeMillis() + step; > } > } > } catch (IOException e) { > throw new BioException( > "Can't submit sequence to BLAST server at this time."); > } > /* > * Getting the info out of the NCBI system > */ > while (!done) { > long prez = System.currentTimeMillis(); > done = isReady(rid, prez); > } > } > > /** > *

This method is used only for the executeBlastSearch method to > check for completion of > * request using the NCBI specified RTOE variable

> * > * @param id > * @param present > * @return > */ > private boolean isReady(String id, long present) { > > boolean ready = false; > String check = "CMD=Get&RID=" + id; > /* > * If present time is less than the start of the search added to > step > * obtained from NCBI, just do nothing ;-) > */ > if (present < start) { > ; > } > /* > * If we are at least step seconds in the future from the actual > call of > * method executeBlastSearch() > */ > else { > try { > uConn = setQBlastProperties(aUrl.openConnection()); > > fromQBlast = new > OutputStreamWriter(uConn.getOutputStream()); > fromQBlast.write(check); > fromQBlast.flush(); > > rd = new BufferedReader(new InputStreamReader(uConn > .getInputStream())); > > String line = ""; > > while ((line = rd.readLine()) != null) { > if (line.contains("READY")) { > ready = true; > } else if (line.contains("WAITING")) { > /* > * Else, move start forward in time... > */ > start = present + step; > } > } > } catch (IOException e) { > e.printStackTrace(); > } > } > return ready; > } > > /** > *

This method extracts this actual Blast report. The default format > is Text but can be changed before with the method > * setQBlastOutputFormat.

> * > * > * @return > * @throws BioException > */ > public InputStream getAlignmentResults() throws BioException { > String srid = "CMD=Get&RID=" + rid; > srid += "&FORMAT_TYPE=" + outputFormat; > > if(!this.done){ > throw new BioException("Unable to get report at this time. Your > Blast request has not been processed yet."); > } > > try { > uConn = setQBlastProperties(aUrl.openConnection()); > > fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); > fromQBlast.write(srid); > fromQBlast.flush(); > > return uConn.getInputStream(); > > } catch (IOException ioe) { > throw new BioException( > "It is not possible to fetch Blast report from NCBI at > this time"); > } > } > > /** > *

> * Set the sequence to be blasted using the String that correspond to > the > * sequence. > *

> * > *

> * Take note that this method is mutually exclusive to setGIToBlast() > for a > * given Blast request. > *

> * > * @param aStr > * : a String with the sequence > */ > public void setSequence(String aStr) { > this.seq = "QUERY=" + aStr; > } > > /** > * Simply return a string with the blasted sequence. > * > * @return seq : a string with the sequence > */ > public String getSeqToBlast() { > return this.seq; > } > > /** > *

> * Set the sequence to be blasted using the NCBI GI value. At this time, > * there is no effort made to check the validity of this GI. > *

> * > *

> * Take note that this method is mutually exclusive to setSeqToBlast() > for a > * given Blast request. > *

> * > * @param gi > * : an integer value representing a NCBI GI > */ > public void setGIToBlast(String gi) { > this.seq = "QUERY=" + gi; > } > > /** > *

> * Simply return a string with the sequence blasted. > *

> * > * @return GI : a String with the GI of the blasted sequence > */ > public String getGIToBlast() { > return this.seq; > } > > /** > *

> * This method set the program to be used to blast the given > sequence/GI. At > * this time, there is no attempt at checking the matching of sequence > type > * to program. > *

> * > * @param prog > * : a String representing the program specified for this > QBlast > * request. > * > */ > public void setProgram(String prog) { > this.prog = "PROGRAM=" + prog; > } > > /** > *

> * Simply returns the program used for the given Blast request. > *

> * > * @return prog : a String with the program used for this QBlast > request. > */ > public String getProgram() { > return this.prog; > } > > /** > *

> * This method set the database to be used to blast the given > sequence/GI. > * At this time, there is no attempt at checking the matching of > sequence > * type to database. > *

> * > * @param db: a String for the database specified for this QBlast > request > */ > public void setDatabase(String db) { > this.db = "DATABASE=" + db; > } > > /** > *

> * Simply returns the database used for the given Blast request. > *

> * > * @return db: a String with the database used for this QBlast request. > */ > public String getBlastDatabase() { > return this.db; > } > > /** > *

This method let the user specify which format to use for > generating the output.

> * > * @param type:an integer taken from the static constant of this class, > either be TEXT, XML or HTML > */ > public void setQBlastOutputFormat(int type) { > > switch (type) { > case 0: > this.outputFormat = "Text"; > break; > case 1: > this.outputFormat = "XML"; > break; > case 2: > this.outputFormat = "HTML"; > break; > } > } > > /** > *

> * Simply returns the output format used for the given Blast report. > *

> * > * @return outputFormat : a String with the format specified for the > QBlast report. > */ > public String getQBlastOutputFormat() { > return this.outputFormat; > } > > /** > *

This method is to be used if a request is to use non-default > values at submission. According to QBlast info, > * the accepted parameters for PUT requests are:

> * > *
    > *
  • -G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / > non-affine for megablast
  • > *
  • -E: Cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / > non-affine for megablast
  • > *
  • -r: integer to reward for match. Default = 1
  • > *
  • -q: negative integer for penalty to allow mismatch. Default = > -3
  • > *
  • -e: expectation value. Default = 10.0
  • > *
  • -W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 > (megablast)
  • > *
  • -y: dropoff for blast extensions in bits, using default if not > specified. Default = 20 for blastn, 7 for all others > * (except megablast for which it is not applicable).
  • > *
  • -X: X dropoff value for gapped alignment, in bits. Default = 30 > for blastn/megablast, 15 for all others.
  • > *
  • -Z: final X dropoff value for gapped alignement, in bits. Default > = 50 for blastn, 25 for all others > * (except megablast for which it is not applicable)
  • > *
  • -P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. > Does not apply to blastn ou megablast.
  • > *
  • -A: multiple hits window size. Default = 0 (for single hit > algorithm)
  • > *
  • -I: number of database sequences to save hits for. Default = > 500
  • > *
  • -Y: effective length of the search space. Default = 0 (0 > represents using the whole space)
  • > *
  • -z: a real specifying the effective length of the database to > use. Default = 0 (0 represents the real size)
  • > *
  • -c: an integer representing pseudocount constant for PSI-BLAST. > Default = 7
  • > *
  • -F: any filtering directive
  • > *
> * > *

You have to be aware that at not moment is there any error > checking on the use of these parameters by this class.

> * @param aStr: a String with any number of optional parameters with an > associated value. > * > */ > public void setAdvancedOptions(String aStr) { > this.advanced = "OTHER_ADVANCED=" + aStr; > } > > /** > * > * Simply return the string given as argument via > setBlastAdvancedOptions > * > * @return advanced: the string with the advanced options > */ > public String getBlastAdvancedOptions() { > return this.advanced; > } > > /** > * > * Simply return the QBlast RID for this specific QBlast request > * > * @return rid: the string with the RID > */ > public String getBlastRID() { > return this.rid; > } > > /** > * A simple method to check the availability of the QBlast service > * > * @throws BioException > */ > public void printRemoteBlastInfo() throws BioException { > try { > OutputStreamWriter out = new OutputStreamWriter(uConn > .getOutputStream()); > > out.write("CMD=Info"); > out.flush(); > > // Get the response > BufferedReader rd = new BufferedReader(new > InputStreamReader(uConn > .getInputStream())); > > String line = ""; > > while ((line = rd.readLine()) != null) { > System.out.println(line); > } > > out.close(); > rd.close(); > } catch (IOException e) { > throw new BioException( > "Impossible to get info from QBlast service at this > time. 
Check your network connection");
>         }
>     }
>
>     private URLConnection setQBlastProperties(URLConnection conn) {
>
>         URLConnection tmp = conn;
>
>         conn.setDoOutput(true);
>         conn.setUseCaches(false);
>
>         tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService");
>         tmp.setRequestProperty("Connection", "Keep-Alive");
>         tmp.setRequestProperty("Content-type",
>                 "application/x-www-form-urlencoded");
>         tmp.setRequestProperty("Content-length", "200");
>
>         return tmp;
>     }
> }
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From ayates at ebi.ac.uk Thu Jun 11 11:53:35 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 11 Jun 2009 16:53:35 +0100
Subject: [Biojava-dev] First draft of a remote blast service class
In-Reply-To: <1244729855.5546.52.camel@buzzybee>
References: <1244729855.5546.52.camel@buzzybee>
Message-ID: <4A31287F.9070102@ebi.ac.uk>

Really the map/enum pattern is nearly knocking on the door of the
prototype pattern & is a very good way to go for this kind of system
where target values are never set in stone (well only for a particular
release of a service). If anyone is interested there's a very good bit
of information from:

http://steve-yegge.blogspot.com/2008/10/universal-design-pattern.html

Andy

Richard Holland wrote:
> Good stuff! My 2p's worth:
>
> setSequence() should be overloaded to accept all forms of possible
> sequence input - whatever is decided on as the standard way of
> referencing sequence data in BJ3. The original plan for BJ3 was to allow
> String/CharSequence and List (see
> http://www.biojava.org/wiki/BioJava3:HowTo )
>
> setAdvancedOptions() should not accept a String, but rather a Properties
> or a Map, where the keys of the Map/Properties are
> restricted to a range of acceptable values determined (and published,
> maybe as an enum?) by each of the implementation classes (e.g.
> RemoteQBlastService). The implementation class then uses this to
> construct the call string. The reason for doing it this way is that (a)
> it allows the parameters to be verified by checking them against a known
> list of allowable key/values, and (b) it allows for non-URL based remote
> requests to be constructed from the values, e.g. SOAP calls.
>
> I would also replace the static int HTML/TEXT/XML with an enum as
> numeric constants are sometimes a Bad Thing.
>
> The setProgram() method in my mind is specific to Blast, as opposed to
> being a generic Pairwise Alignment concept. Therefore it might be better
> to move this to a Blast-specific sub-interface or make it only appear in
> the implementation classes that refer to Blast.
>
> Finally, the JavaDocs for the various set() methods are incorrect -
> they're all mostly the same in fact! :)
>
> But overall it looks good.
>
> cheers,
> Richard
>
> On Thu, 2009-06-11 at 09:52 -0400, Sylvain Foisy wrote:
>> Hi to all,
>>
>> I've been working on this for the past week or so and after discussing this
>> with Andreas, I am putting my code here for critical review. I'll put this
>> stuff in biojava-live as soon as Andreas can fix my SVN access.
>>
>> First, an interface called RemotePairwiseAlignementSerivce defines the basic
>> components of a remote service: sequence/database/progam/run options/output
>> options. RemoteQBlastService implements this interface and runs remote
>> Qblast requests and creates output in either text, XML or HTML. At present
>> time, regular blastall programs work, no blastpgp/megablast support yet.
>> >> I'll need some guidance to make it work on other type of web services like >> EBI. >> >> Best regards >> >> Sylvain >> >> =================================================================== >> >> Sylvain Foisy, Ph. D. >> Consultant Bio-informatique / Bioinformatics >> Diploide.net - TI pour la vie / IT for Life >> >> Courriel: sylvain.foisy at diploide.net >> Web: http://www.diploide.net >> Tel: (514) 893-4363 >> =================================================================== >> >> import java.io.InputStream; >> >> import org.biojava.bio.BioException; >> /** >> * This interface specifies minimal information needed to execute a pairwise >> alignment on a remote service. >> * >> * Example of service: QBlast service at NCBI >> * Web Service at EBI >> * >> * @author Sylvain Foisy >> * @since 1.8 >> * >> */ >> public interface RemotePairwiseAlignementService { >> >> /** >> * This field specifies that the output format of results >> * is text. >> * >> */ >> public static final int TEXT = 0; >> >> /** >> * This field specifies that the output format of results >> * is XML. >> * >> */ >> public static final int XML = 1; >> >> /** >> * This field specifies that the output format of results >> * is HTML. >> * >> */ >> public static final int HTML = 2; >> >> /** >> * Setting the database to use for doing the pairwise alignment >> * >> * @param db: a String with a valid database ID for the >> service used. >> * >> */ >> public void setDatabase(String db); >> >> /** >> * Setting the sequence to be align for this for this request >> * >> * @param seq: a String with a sequence to be aligned. >> * >> */ >> public void setSequence(String seq); >> >> /** >> * Setting the program to use for this pairwise alignment >> * >> * @param prog: a String with a valid database ID for the >> service used. 
>> * >> */ >> public void setProgram(String prog); >> >> /** >> * Setting all other options to use for this pairwise alignment >> * >> * @param db: a String with a valid database ID for the >> service used. >> * >> */ >> public void setAdvancedOptions(String str); >> >> /** >> * Doing the actual analysis on the instantiated service >> * >> * @throws BioException >> */ >> public void executeSearch() throws BioException; >> >> /** >> * Getting the actual alignment results from this instantiated service >> * >> * @return : an InputStream with the actual alignment >> results >> * @throws BioException >> */ >> public InputStream getAlignmentResults() throws BioException; >> } >> >> import java.io.BufferedReader; >> import java.io.IOException; >> import java.io.InputStream; >> import java.io.InputStreamReader; >> import java.io.OutputStreamWriter; >> import java.net.MalformedURLException; >> import java.net.URL; >> import java.net.URLConnection; >> >> import org.biojava.bio.BioException; >> >> /** >> * RemoteQBlastService - A simple way of submitting BLAST request to the >> QBlast >> * service at NCBI. >> * >> *

>> * NCBI provides a Blast server through a CGI-BIN interface. >> RemoteQBlastService simply >> * encapsulates an access to it by giving users access to get/set methods to >> fix >> * sequence, program and database as well as advanced options. >> *

>> * >> *

>> * As of version 1.0, only blastall programs are usable. blastpgp and >> megablast are high-priorities. >> *

>> * >> * @author Sylvain Foisy >> * @version 1.0 >> * @since 1.8 >> * >> * >> */ >> public class RemoteQBlastService implements RemotePairwiseAlignementService{ >> >> // public static final int TEXT = 0; >> // public static final int XML = 1; >> // public static final int HTML = 2; >> >> private static String baseurl = >> "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"; >> private URL aUrl; >> private URLConnection uConn; >> private OutputStreamWriter fromQBlast; >> private BufferedReader rd; >> >> private String seq = null; >> private String prog = null; >> private String db = null; >> private String outputFormat = null; >> private String advanced = null; >> >> private String rid; >> private long step; >> private boolean done = false; >> private long start; >> >> public RemoteQBlastService() throws BioException { >> try { >> aUrl = new URL(baseurl); >> uConn = setQBlastProperties(aUrl.openConnection()); >> >> outputFormat = "Text"; >> } >> /* >> * Needed but should never be thrown since the URL is static and >> known to exist >> */ >> catch (MalformedURLException e) { >> throw new BioException("It looks like the URL for NCBI QBlast >> service is bad"); >> } >> /* >> * Intercept if the program can't connect to QBlast service >> */ >> catch (IOException e) { >> throw new BioException( >> "Impossible to connect to QBlast service at this time. >> Check your network connection"); >> } >> } >> >> /** >> * This method execute the Blast request via the Put command of the >> CGI-BIN >> * interface. It gets the estimated time of completion by capturing the >> * value of the RTOE variable and sets a loop that will check for >> completion >> * of analysis at intervals specified by RTOE. >> * >> *

>> * It also capture the value for the RID variable, necessary for >> fetching >> * the actual results after completion. >> *

>>      *
>>      * @throws BioException
>>      *             if it is not possible to send the BLAST command
>>      */
>>     public void executeSearch() throws BioException {
>>
>>         if (seq == null || db == null || prog == null) {
>>             throw new BioException(
>>                     "Impossible to execute QBlast request. One or more of seq|db|prog has not been set");
>>         }
>>         /*
>>          * sending the command to execute the Blast analysis
>>          */
>>         String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&"
>>                 + db + "&" + "FORMAT_TYPE=HTML";
>>
>>         // append the optional advanced parameters to the request
>>         if (advanced != null) {
>>             cmd += "&" + advanced;
>>         }
>>
>>         try {
>>
>>             uConn = setQBlastProperties(aUrl.openConnection());
>>
>>             fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>>
>>             fromQBlast.write(cmd);
>>             fromQBlast.flush();
>>
>>             // Get the response
>>             rd = new BufferedReader(new InputStreamReader(uConn
>>                     .getInputStream()));
>>
>>             String line = "";
>>
>>             while ((line = rd.readLine()) != null) {
>>                 if (line.contains("RID")) {
>>                     String[] arr = line.split("=");
>>                     rid = arr[1].trim();
>>                 } else if (line.contains("RTOE")) {
>>                     String[] arr = line.split("=");
>>                     step = Long.parseLong(arr[1].trim()) * 1000;
>>                     start = System.currentTimeMillis() + step;
>>                 }
>>             }
>>         } catch (IOException e) {
>>             throw new BioException(
>>                     "Can't submit sequence to BLAST server at this time.");
>>         }
>>         /*
>>          * Getting the info out of the NCBI system
>>          */
>>         while (!done) {
>>             long prez = System.currentTimeMillis();
>>             done = isReady(rid, prez);
>>         }
>>     }
>>
>>     /**
>>      *

This method is used only by the executeSearch method to >> check for completion of >> * the request using the NCBI-specified RTOE variable

>> * >> * @param id >> * @param present >> * @return >> */ >> private boolean isReady(String id, long present) { >> >> boolean ready = false; >> String check = "CMD=Get&RID=" + id; >> /* >> * If present time is less than the start of the search added to >> step >> * obtained from NCBI, just do nothing ;-) >> */ >> if (present < start) { >> ; >> } >> /* >> * If we are at least step seconds in the future from the actual >> call of >> * method executeBlastSearch() >> */ >> else { >> try { >> uConn = setQBlastProperties(aUrl.openConnection()); >> >> fromQBlast = new >> OutputStreamWriter(uConn.getOutputStream()); >> fromQBlast.write(check); >> fromQBlast.flush(); >> >> rd = new BufferedReader(new InputStreamReader(uConn >> .getInputStream())); >> >> String line = ""; >> >> while ((line = rd.readLine()) != null) { >> if (line.contains("READY")) { >> ready = true; >> } else if (line.contains("WAITING")) { >> /* >> * Else, move start forward in time... >> */ >> start = present + step; >> } >> } >> } catch (IOException e) { >> e.printStackTrace(); >> } >> } >> return ready; >> } >> >> /** >> *

This method extracts the actual Blast report. The default format >> is Text but can be changed beforehand with the method >> * setQBlastOutputFormat.

>> * >> * >> * @return >> * @throws BioException >> */ >> public InputStream getAlignmentResults() throws BioException { >> String srid = "CMD=Get&RID=" + rid; >> srid += "&FORMAT_TYPE=" + outputFormat; >> >> if(!this.done){ >> throw new BioException("Unable to get report at this time. Your >> Blast request has not been processed yet."); >> } >> >> try { >> uConn = setQBlastProperties(aUrl.openConnection()); >> >> fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); >> fromQBlast.write(srid); >> fromQBlast.flush(); >> >> return uConn.getInputStream(); >> >> } catch (IOException ioe) { >> throw new BioException( >> "It is not possible to fetch Blast report from NCBI at >> this time"); >> } >> } >> >> /** >> *

>> * Set the sequence to be blasted using the String that correspond to >> the >> * sequence. >> *

>> * >> *

>> * Take note that this method is mutually exclusive to setGIToBlast() >> for a >> * given Blast request. >> *

>> * >> * @param aStr >> * : a String with the sequence >> */ >> public void setSequence(String aStr) { >> this.seq = "QUERY=" + aStr; >> } >> >> /** >> * Simply return a string with the blasted sequence. >> * >> * @return seq : a string with the sequence >> */ >> public String getSeqToBlast() { >> return this.seq; >> } >> >> /** >> *

>> * Set the sequence to be blasted using the NCBI GI value. At this time, >> * there is no effort made to check the validity of this GI. >> *

>> * >> *

>> * Take note that this method is mutually exclusive to setSeqToBlast() >> for a >> * given Blast request. >> *

>> * >> * @param gi >> * : an integer value representing a NCBI GI >> */ >> public void setGIToBlast(String gi) { >> this.seq = "QUERY=" + gi; >> } >> >> /** >> *

>> * Simply return a string with the sequence blasted. >> *

>> * >> * @return GI : a String with the GI of the blasted sequence >> */ >> public String getGIToBlast() { >> return this.seq; >> } >> >> /** >> *

>> * This method set the program to be used to blast the given >> sequence/GI. At >> * this time, there is no attempt at checking the matching of sequence >> type >> * to program. >> *

>> * >> * @param prog >> * : a String representing the program specified for this >> QBlast >> * request. >> * >> */ >> public void setProgram(String prog) { >> this.prog = "PROGRAM=" + prog; >> } >> >> /** >> *

>> * Simply returns the program used for the given Blast request. >> *

>> * >> * @return prog : a String with the program used for this QBlast >> request. >> */ >> public String getProgram() { >> return this.prog; >> } >> >> /** >> *

>> * This method set the database to be used to blast the given >> sequence/GI. >> * At this time, there is no attempt at checking the matching of >> sequence >> * type to database. >> *

>> * >> * @param db: a String for the database specified for this QBlast >> request >> */ >> public void setDatabase(String db) { >> this.db = "DATABASE=" + db; >> } >> >> /** >> *

>> * Simply returns the database used for the given Blast request. >> *

>> * >> * @return db: a String with the database used for this QBlast request. >> */ >> public String getBlastDatabase() { >> return this.db; >> } >> >> /** >> *

This method lets the user specify which format to use for >> generating the output.

>> * >> * @param type:an integer taken from the static constant of this class, >> either be TEXT, XML or HTML >> */ >> public void setQBlastOutputFormat(int type) { >> >> switch (type) { >> case 0: >> this.outputFormat = "Text"; >> break; >> case 1: >> this.outputFormat = "XML"; >> break; >> case 2: >> this.outputFormat = "HTML"; >> break; >> } >> } >> >> /** >> *

>> * Simply returns the output format used for the given Blast report. >> *

>> * >> * @return outputFormat : a String with the format specified for the >> QBlast report. >> */ >> public String getQBlastOutputFormat() { >> return this.outputFormat; >> } >> >> /** >> *

This method is to be used if a request is to use non-default >> values at submission. According to QBlast info, >> * the accepted parameters for PUT requests are:

>> * >> *
    >> *
  • -G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / >> non-affine for megablast
  • >> *
  • -E: Cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / >> non-affine for megablast
  • >> *
  • -r: integer to reward for match. Default = 1
  • >> *
  • -q: negative integer for penalty to allow mismatch. Default = >> -3
  • >> *
  • -e: expectation value. Default = 10.0
  • >> *
  • -W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 >> (megablast)
  • >> *
  • -y: dropoff for blast extensions in bits, using default if not >> specified. Default = 20 for blastn, 7 for all others >> * (except megablast for which it is not applicable).
  • >> *
  • -X: X dropoff value for gapped alignment, in bits. Default = 30 >> for blastn/megablast, 15 for all others.
  • >> *
  • -Z: final X dropoff value for gapped alignment, in bits. Default >> = 50 for blastn, 25 for all others >> * (except megablast for which it is not applicable)
  • >> *
  • -P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. >> Does not apply to blastn or megablast.
  • >> *
  • -A: multiple hits window size. Default = 0 (for single hit >> algorithm)
  • >> *
  • -I: number of database sequences to save hits for. Default = >> 500
  • >> *
  • -Y: effective length of the search space. Default = 0 (0 >> represents using the whole space)
  • >> *
  • -z: a real specifying the effective length of the database to >> use. Default = 0 (0 represents the real size)
  • >> *
  • -c: an integer representing pseudocount constant for PSI-BLAST. >> Default = 7
  • >> *
  • -F: any filtering directive
  • >> *
>> * >> *

Be aware that at no moment is there any error >> checking on the use of these parameters by this class.

>> * @param aStr: a String with any number of optional parameters with an >> associated value. >> * >> */ >> public void setAdvancedOptions(String aStr) { >> this.advanced = "OTHER_ADVANCED=" + aStr; >> } >> >> /** >> * >> * Simply return the string given as argument via >> setBlastAdvancedOptions >> * >> * @return advanced: the string with the advanced options >> */ >> public String getBlastAdvancedOptions() { >> return this.advanced; >> } >> >> /** >> * >> * Simply return the QBlast RID for this specific QBlast request >> * >> * @return rid: the string with the RID >> */ >> public String getBlastRID() { >> return this.rid; >> } >> >> /** >> * A simple method to check the availability of the QBlast service >> * >> * @throws BioException >> */ >> public void printRemoteBlastInfo() throws BioException { >> try { >> OutputStreamWriter out = new OutputStreamWriter(uConn >> .getOutputStream()); >> >> out.write("CMD=Info"); >> out.flush(); >> >> // Get the response >> BufferedReader rd = new BufferedReader(new >> InputStreamReader(uConn >> .getInputStream())); >> >> String line = ""; >> >> while ((line = rd.readLine()) != null) { >> System.out.println(line); >> } >> >> out.close(); >> rd.close(); >> } catch (IOException e) { >> throw new BioException( >> "Impossible to get info from QBlast service at this >> time. 
Check your network connection"); >> } >> } >> >> private URLConnection setQBlastProperties(URLConnection conn) { >> >> URLConnection tmp = conn; >> >> conn.setDoOutput(true); >> conn.setUseCaches(false); >> >> tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService"); >> tmp.setRequestProperty("Connection", "Keep-Alive"); >> tmp.setRequestProperty("Content-type", >> "application/x-www-form-urlencoded"); >> tmp.setRequestProperty("Content-length", "200"); >> >> return tmp; >> } >> } >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev From HWillis at scripps.edu Thu Jun 11 11:58:22 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Thu, 11 Jun 2009 11:58:22 -0400 Subject: [Biojava-dev] First draft of a remote blast service class In-Reply-To: References: Message-ID: <061BFD133FA1584693D19C79A0072F5F95FFD9@FLMAIL1.fl.ad.scripps.edu> Sylvain My first reaction was that I was expecting BLAST code but came across RemotePairwiseAlignementService which made me pause thinking I would be looking at a sequence alignment code. RemoteBLASTService would be a better description specific to doing Remote BLAST. I agree that everything should be an enum if possible but encapsulated in a single search/parameter class. The enums should not have any URL specific association with the remote service but should be abstracted to something that makes sense to a developer wanting to use a service they know nothing about and don't want to take the time to read. The query parameters should be defined as a Java class that could be passed around to different service providers and then internally to the service provider the values would be mapped to the specific requirements of that service. 
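To make the parameter-class idea a bit more concrete, here is a minimal sketch of what I have in mind. All the names in it (BlastQuerySketch, Param, qblastKey, toQBlastPut) are made up for illustration and are not part of BioJava; the only things taken from the draft are the QUERY, PROGRAM and DATABASE keys and the CMD=Put command. The caller only sees the neutral enum, and the provider-specific mapping (plus URL encoding of the values, which the draft does not do yet) stays inside the service class.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/** Hypothetical sketch; none of these names exist in BioJava. */
final class BlastQuerySketch {

    /** Service-neutral parameter names that a caller would use. */
    enum Param { PROGRAM, DATABASE, SEQUENCE }

    /** Provider-side mapping from a neutral name to the QBlast form key. */
    static String qblastKey(Param p) {
        switch (p) {
            case PROGRAM:  return "PROGRAM";
            case DATABASE: return "DATABASE";
            case SEQUENCE: return "QUERY";
            default: throw new IllegalArgumentException("Unsupported parameter: " + p);
        }
    }

    /** Builds the body of a CMD=Put request, URL-encoding every value. */
    static String toQBlastPut(Map<Param, String> params) {
        StringBuilder sb = new StringBuilder("CMD=Put");
        for (Map.Entry<Param, String> e : params.entrySet()) {
            try {
                sb.append('&').append(qblastKey(e.getKey())).append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            } catch (UnsupportedEncodingException ex) {
                // UTF-8 support is mandatory on every Java platform
                throw new IllegalStateException("UTF-8 not available", ex);
            }
        }
        return sb.toString();
    }
}
```

If the caller fills a java.util.EnumMap, the keys come out in the order the enum declares them, so the request string is predictable. A SOAP-based provider could take the same map and build its own call from it.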
Doing a quick view of the form for NCBI BLASTN you have human readable labels that when the query is submitted will map to a value that the programmer wanted to use as short hand. http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&BLAST_PROGRAMS=me gaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&BLAST_SPEC=&LINK_LOC=blas ttab&LAST_PAGE=tblastx If you click on blastn,blastp,blastx,tblastn, tblastx tabs on the above link you will see that the forms are very similar but do have variations. I would use each input form as the model for the class to do the appropriate search. What is common to the 5 tabs would be in the base abstract search class and any input requirements that are different would go in an extended class. This gives you a generic class for modeling the search parameters that is easily understood. The hard part is then mapping the easy to understand version to the specific search query parameters of a particular service. Either way you should be able to pass the search class to different providers without knowing anything about that specific service. It would also be nice to have a listener interface so the class that is responsible for doing the query also checks if the results are available based on some poll value. The external calling code shouldn't need to worry about bookkeeping of unique identifiers for a particular service provider. The implementation class should hide all those details. You also have the results returning in text, XML or HTML. It would be nice if the results could be returned as a collection of SeqSimilaritySearchResult and collection of SeqSimilaritySearchHit found at http://www.biojava.org/wiki/BioJava:CookBook:Blast:Parser This may require you to parse the text/HTML/XML code in your implementation class. This way you can tweak or adjust for anything specific to the service provider. 
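For the listener idea mentioned above, something along these lines might do as a starting point. It is only a sketch under my own assumptions: ResultListener, BlastPollerSketch and pollUntilReady are hypothetical names, and the readiness check is passed in as a plain Callable so that each provider can plug in its own "is the ticket done yet" request. Unlike the loop in the draft, it sleeps between checks instead of spinning, and it gives up after a fixed number of attempts.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch; none of these names exist in BioJava. */
final class BlastPollerSketch {

    /** Callback that the calling code implements to be told about the outcome. */
    interface ResultListener {
        void resultsReady(String requestId);
        void failed(String requestId, Exception cause);
    }

    /**
     * Runs readyCheck until it returns true, sleeping between attempts.
     * Returns the number of checks made, or -1 if polling failed or gave up.
     */
    static int pollUntilReady(String requestId, Callable<Boolean> readyCheck,
                              long intervalMillis, int maxChecks,
                              ResultListener listener) {
        for (int attempt = 1; attempt <= maxChecks; attempt++) {
            try {
                if (readyCheck.call()) {
                    listener.resultsReady(requestId);
                    return attempt;
                }
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // restore the interrupt flag so callers can see it
                Thread.currentThread().interrupt();
                listener.failed(requestId, e);
                return -1;
            } catch (Exception e) {
                listener.failed(requestId, e);
                return -1;
            }
        }
        listener.failed(requestId,
                new IllegalStateException("Gave up after " + maxChecks + " checks"));
        return -1;
    }
}
```

In the QBlast case the interval would presumably come from the RTOE value and the readiness check would wrap the CMD=Get request, with the RID kept inside the implementation class.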
Other BLAST web services WSDL providers will return a collection of Java classes specific to that implementation that then need to be mapped to SeqSimilaritySearchResult and SeqSimilaritySearchHit. The benefit is that API hides all the ugly details from the developer who is using the BLAST service. NCBIBlast has a formal WSDL interface which may make the process easier for you. http://bioinfo.unice.fr/web_services/Using_NCBI-Blast.html If you click on this link http://www.ebi.ac.uk/Tools/webservices/wsdl/WSNCBIBlast.wsdl you will see all the web services magic that you hand off to your favorite IDE and it writes the code for you. I did a quick test in Netbeans and they are using Jax-RPC for the web service calls where I don't see a nice set of Java classes for structured results. This means parsing a string. It also appears they are providing a similar interface for WU-Blast http://bioinfo.unice.fr/web_services/Using_WU-Blast.html#General_Informa tion and http://www.ebi.ac.uk/Tools/webservices/wsdl/WSWUBlast.wsdl The advantage of using the web service interface is that it should be stable where you can't control changes they are making to the CGI form submission which would break the biojava code. Scooter -----Original Message----- From: biojava-dev-bounces at lists.open-bio.org [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain Foisy Sent: Thursday, June 11, 2009 9:52 AM To: biojava-dev at lists.open-bio.org Subject: [Biojava-dev] First draft of a remote blast service class Hi to all, I've been working on this for the past week or so and after discussing this with Andreas, I am putting my code here for critical review. I'll put this stuff in biojava-live as soon as Andreas can fix my SVN access. First, an interface called RemotePairwiseAlignementSerivce defines the basic components of a remote service: sequence/database/progam/run options/output options. 
RemoteQBlastService implements this interface and runs remote Qblast requests and creates output in either text, XML or HTML. At present time, regular blastall programs work, no blastpgp/megablast support yet. I'll need some guidance to make it work on other type of web services like EBI. Best regards Sylvain =================================================================== Sylvain Foisy, Ph. D. Consultant Bio-informatique / Bioinformatics Diploide.net - TI pour la vie / IT for Life Courriel: sylvain.foisy at diploide.net Web: http://www.diploide.net Tel: (514) 893-4363 =================================================================== import java.io.InputStream; import org.biojava.bio.BioException; /** * This interface specifies minimal information needed to execute a pairwise alignment on a remote service. * * Example of service: QBlast service at NCBI * Web Service at EBI * * @author Sylvain Foisy * @since 1.8 * */ public interface RemotePairwiseAlignementService { /** * This field specifies that the output format of results * is text. * */ public static final int TEXT = 0; /** * This field specifies that the output format of results * is XML. * */ public static final int XML = 1; /** * This field specifies that the output format of results * is HTML. * */ public static final int HTML = 2; /** * Setting the database to use for doing the pairwise alignment * * @param db: a String with a valid database ID for the service used. * */ public void setDatabase(String db); /** * Setting the sequence to be align for this for this request * * @param seq: a String with a sequence to be aligned. * */ public void setSequence(String seq); /** * Setting the program to use for this pairwise alignment * * @param prog: a String with a valid database ID for the service used. * */ public void setProgram(String prog); /** * Setting all other options to use for this pairwise alignment * * @param db: a String with a valid database ID for the service used. 
* */ public void setAdvancedOptions(String str); /** * Doing the actual analysis on the instantiated service * * @throws BioException */ public void executeSearch() throws BioException; /** * Getting the actual alignment results from this instantiated service * * @return : an InputStream with the actual alignment results * @throws BioException */ public InputStream getAlignmentResults() throws BioException; } import java.io.BufferedReader; import java.io.IOException; import java.io.InputStream; import java.io.InputStreamReader; import java.io.OutputStreamWriter; import java.net.MalformedURLException; import java.net.URL; import java.net.URLConnection; import org.biojava.bio.BioException; /** * RemoteQBlastService - A simple way of submitting BLAST request to the QBlast * service at NCBI. * *

* NCBI provides a Blast server through a CGI-BIN interface. RemoteQBlastService simply * encapsulates an access to it by giving users access to get/set methods to fix * sequence, program and database as well as advanced options. *

* *

* As of version 1.0, only blastall programs are usable. blastpgp and megablast are high-priorities. *

* * @author Sylvain Foisy * @version 1.0 * @since 1.8 * * */ public class RemoteQBlastService implements RemotePairwiseAlignementService{ // public static final int TEXT = 0; // public static final int XML = 1; // public static final int HTML = 2; private static String baseurl = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"; private URL aUrl; private URLConnection uConn; private OutputStreamWriter fromQBlast; private BufferedReader rd; private String seq = null; private String prog = null; private String db = null; private String outputFormat = null; private String advanced = null; private String rid; private long step; private boolean done = false; private long start; public RemoteQBlastService() throws BioException { try { aUrl = new URL(baseurl); uConn = setQBlastProperties(aUrl.openConnection()); outputFormat = "Text"; } /* * Needed but should never be thrown since the URL is static and known to exist */ catch (MalformedURLException e) { throw new BioException("It looks like the URL for NCBI QBlast service is bad"); } /* * Intercept if the program can't connect to QBlast service */ catch (IOException e) { throw new BioException( "Impossible to connect to QBlast service at this time. Check your network connection"); } } /** * This method execute the Blast request via the Put command of the CGI-BIN * interface. It gets the estimated time of completion by capturing the * value of the RTOE variable and sets a loop that will check for completion * of analysis at intervals specified by RTOE. * *

* It also capture the value for the RID variable, necessary for fetching * the actual results after completion. *

     *
     * @throws BioException
     *             if it is not possible to send the BLAST command
     */
    public void executeSearch() throws BioException {

        if (seq == null || db == null || prog == null) {
            throw new BioException(
                    "Impossible to execute QBlast request. One or more of seq|db|prog has not been set");
        }
        /*
         * sending the command to execute the Blast analysis
         */
        String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&"
                + db + "&" + "FORMAT_TYPE=HTML";

        // append the optional advanced parameters to the request
        if (advanced != null) {
            cmd += "&" + advanced;
        }

        try {

            uConn = setQBlastProperties(aUrl.openConnection());

            fromQBlast = new OutputStreamWriter(uConn.getOutputStream());

            fromQBlast.write(cmd);
            fromQBlast.flush();

            // Get the response
            rd = new BufferedReader(new InputStreamReader(uConn
                    .getInputStream()));

            String line = "";

            while ((line = rd.readLine()) != null) {
                if (line.contains("RID")) {
                    String[] arr = line.split("=");
                    rid = arr[1].trim();
                } else if (line.contains("RTOE")) {
                    String[] arr = line.split("=");
                    step = Long.parseLong(arr[1].trim()) * 1000;
                    start = System.currentTimeMillis() + step;
                }
            }
        } catch (IOException e) {
            throw new BioException(
                    "Can't submit sequence to BLAST server at this time.");
        }
        /*
         * Getting the info out of the NCBI system
         */
        while (!done) {
            long prez = System.currentTimeMillis();
            done = isReady(rid, prez);
        }
    }

    /**
     *

This method is used only by the executeSearch method to check for completion of * the request using the NCBI-specified RTOE variable

* * @param id * @param present * @return */ private boolean isReady(String id, long present) { boolean ready = false; String check = "CMD=Get&RID=" + id; /* * If present time is less than the start of the search added to step * obtained from NCBI, just do nothing ;-) */ if (present < start) { ; } /* * If we are at least step seconds in the future from the actual call of * method executeBlastSearch() */ else { try { uConn = setQBlastProperties(aUrl.openConnection()); fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); fromQBlast.write(check); fromQBlast.flush(); rd = new BufferedReader(new InputStreamReader(uConn .getInputStream())); String line = ""; while ((line = rd.readLine()) != null) { if (line.contains("READY")) { ready = true; } else if (line.contains("WAITING")) { /* * Else, move start forward in time... */ start = present + step; } } } catch (IOException e) { e.printStackTrace(); } } return ready; } /** *

This method extracts the actual Blast report. The default format is Text but can be changed beforehand with the method * setQBlastOutputFormat.

* * * @return * @throws BioException */ public InputStream getAlignmentResults() throws BioException { String srid = "CMD=Get&RID=" + rid; srid += "&FORMAT_TYPE=" + outputFormat; if(!this.done){ throw new BioException("Unable to get report at this time. Your Blast request has not been processed yet."); } try { uConn = setQBlastProperties(aUrl.openConnection()); fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); fromQBlast.write(srid); fromQBlast.flush(); return uConn.getInputStream(); } catch (IOException ioe) { throw new BioException( "It is not possible to fetch Blast report from NCBI at this time"); } } /** *

* Set the sequence to be blasted using the String that correspond to the * sequence. *

* *

* Take note that this method is mutually exclusive to setGIToBlast() for a * given Blast request. *

* * @param aStr * : a String with the sequence */ public void setSequence(String aStr) { this.seq = "QUERY=" + aStr; } /** * Simply return a string with the blasted sequence. * * @return seq : a string with the sequence */ public String getSeqToBlast() { return this.seq; } /** *

* Set the sequence to be blasted using the NCBI GI value. At this time, * there is no effort made to check the validity of this GI. *

* *

* Take note that this method is mutually exclusive to setSeqToBlast() for a * given Blast request. *

* * @param gi * : an integer value representing a NCBI GI */ public void setGIToBlast(String gi) { this.seq = "QUERY=" + gi; } /** *

* Simply return a string with the sequence blasted. *

* * @return GI : a String with the GI of the blasted sequence */ public String getGIToBlast() { return this.seq; } /** *

* This method set the program to be used to blast the given sequence/GI. At * this time, there is no attempt at checking the matching of sequence type * to program. *

* * @param prog * : a String representing the program specified for this QBlast * request. * */ public void setProgram(String prog) { this.prog = "PROGRAM=" + prog; } /** *

* Simply returns the program used for the given Blast request. *

* * @return prog : a String with the program used for this QBlast request. */ public String getProgram() { return this.prog; } /** *

* This method sets the database to be used to blast the given sequence/GI. * At this time, no attempt is made to check that the sequence type matches * the database. *

* * @param db: a String for the database specified for this QBlast request */ public void setDatabase(String db) { this.db = "DATABASE=" + db; } /** *

* Simply returns the database used for the given Blast request. *

* * @return db: a String with the database used for this QBlast request. */ public String getBlastDatabase() { return this.db; } /** *

This method lets the user specify which format to use for generating the output.

* * @param type: an integer taken from the static constants of this class, one of TEXT, XML or HTML */ public void setQBlastOutputFormat(int type) { switch (type) { case 0: this.outputFormat = "Text"; break; case 1: this.outputFormat = "XML"; break; case 2: this.outputFormat = "HTML"; break; } } /** *

* Simply returns the output format used for the given Blast report. *

* * @return outputFormat : a String with the format specified for the QBlast report. */ public String getQBlastOutputFormat() { return this.outputFormat; } /** *

Use this method if a request needs non-default values at submission. According to the QBlast documentation, * the accepted parameters for PUT requests are:

* *
    *
  • -G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / non-affine for megablast
  • *
  • -E: cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / non-affine for megablast
  • *
  • -r: integer to reward for match. Default = 1
  • *
  • -q: negative integer for penalty to allow mismatch. Default = -3
  • *
  • -e: expectation value. Default = 10.0
  • *
  • -W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 (megablast)
  • *
  • -y: dropoff for blast extensions in bits, using default if not specified. Default = 20 for blastn, 7 for all others * (except megablast for which it is not applicable).
  • *
  • -X: X dropoff value for gapped alignment, in bits. Default = 30 for blastn/megablast, 15 for all others.
  • *
  • -Z: final X dropoff value for gapped alignment, in bits. Default = 50 for blastn, 25 for all others * (except megablast for which it is not applicable)
  • *
  • -P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. Does not apply to blastn or megablast.
  • *
  • -A: multiple hits window size. Default = 0 (for single hit algorithm)
  • *
  • -I: number of database sequences to save hits for. Default = 500
  • *
  • -Y: effective length of the search space. Default = 0 (0 represents using the whole space)
  • *
  • -z: a real specifying the effective length of the database to use. Default = 0 (0 represents the real size)
  • *
  • -c: an integer representing pseudocount constant for PSI-BLAST. Default = 7
  • *
  • -F: any filtering directive
  • *
* *

Be aware that this class performs no error checking on the use of these parameters.

* @param aStr: a String with any number of optional parameters with an associated value. * */ public void setAdvancedOptions(String aStr) { this.advanced = "OTHER_ADVANCED=" + aStr; } /** * * Simply return the string given as argument via setAdvancedOptions * * @return advanced: the string with the advanced options */ public String getBlastAdvancedOptions() { return this.advanced; } /** * * Simply return the QBlast RID for this specific QBlast request * * @return rid: the string with the RID */ public String getBlastRID() { return this.rid; } /** * A simple method to check the availability of the QBlast service * * @throws BioException */ public void printRemoteBlastInfo() throws BioException { try { OutputStreamWriter out = new OutputStreamWriter(uConn .getOutputStream()); out.write("CMD=Info"); out.flush(); /* Get the response */ BufferedReader rd = new BufferedReader(new InputStreamReader(uConn .getInputStream())); String line = ""; while ((line = rd.readLine()) != null) { System.out.println(line); } out.close(); rd.close(); } catch (IOException e) { throw new BioException( "Impossible to get info from QBlast service at this time. Check your network connection"); } } private URLConnection setQBlastProperties(URLConnection conn) { URLConnection tmp = conn; conn.setDoOutput(true); conn.setUseCaches(false); tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService"); tmp.setRequestProperty("Connection", "Keep-Alive"); tmp.setRequestProperty("Content-type", "application/x-www-form-urlencoded"); return tmp; } } _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From sylvain.foisy at diploide.net Thu Jun 11 12:21:01 2009 From: sylvain.foisy at diploide.net (Sylvain Foisy) Date: Thu, 11 Jun 2009 12:21:01 -0400 Subject: [Biojava-dev] First draft of a remote blast service class In-Reply-To: <061BFD133FA1584693D19C79A0072F5F95FFD9@FLMAIL1.fl.ad.scripps.edu> Message-ID: Hi to all, I have read all of the comments that my code generated and I am taking notes. I have to admit that some of the material is way above what I am used to and will need some in-depth reading/exploration before I address it. Thanks for the input; I look forward to making it better ;-) Best regards Sylvain =================================================================== Sylvain Foisy, Ph. D. 
Consultant Bio-informatique / Bioinformatics Diploide.net - TI pour la vie / IT for Life Courriel: sylvain.foisy at diploide.net Web: http://www.diploide.net Tel: (514) 893-4363 =================================================================== From andreas at sdsc.edu Mon Jun 15 01:27:55 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Sun, 14 Jun 2009 22:27:55 -0700 Subject: [Biojava-dev] first three modules Message-ID: <59a41c430906142227w5f21f18u265dc44d3ca24384@mail.gmail.com> Hi, I tested a couple of things today and I have come up with the first 3 new modules biojava-core, biojava-das, and biojava-structure what is common to all modules is the following directory organization: * all modules have a trunk, branches and tags directory, where the trunk directory contains the main code base. * Inside of a module the following directories exist: - src - the code - tests - Junit tests - demos - a few examples classes that contain main methods that can be run as an example the location of the modules in svn is at: svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/branches/modules/ if you want to browse through the new modules, please see here: http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/branches/modules The maven build will be added a bit later, once a few more modules have been refactored out. Any comments so far? Andreas From andreas at sdsc.edu Mon Jun 15 23:54:07 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 15 Jun 2009 20:54:07 -0700 Subject: [Biojava-dev] next modules: blast and phylo Message-ID: <59a41c430906152054k54e9eee4v341bbe46395d8d84@mail.gmail.com> Hi, just a quick update - next two modules in SVN are: biojava-blast and biojava-phylo What about a module: biojava-biosql ? 
to repeat: you can also view it in your browser at: http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/branches/modules anonymous svn is at: svn co svn://code.open-bio.org/biojava/biojava-live/branches/modules/ svn for developers is at: svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/branches/modules/ and Andreas From abdul.qaddos at gmail.com Tue Jun 16 20:45:14 2009 From: abdul.qaddos at gmail.com (Abdul Qaddus) Date: Wed, 17 Jun 2009 05:45:14 +0500 Subject: [Biojava-dev] Fwd: Need help for resolving the args[0] issues. In-Reply-To: References: Message-ID: Hello Support, I am new to BioJava. I have a good knowledge of Java and of biology, but this tool is new to me and I have run into a problem with it. Below is the code for reading a GenBank file and converting it into "DNA", "RNA" or "Protein". I have already added the BioJava library to my project. From the comments in the code I understand that it needs three arguments to run: the first is the file name, the second is the file type and the third is the alphabet. My problem is that I do not know how to pass these three values into the source code as args[0], args[1] and args[2]. When I passed the values as strings, the code generated an error: "illegal statement". Please help me fix this problem; I would be very thankful for a quick reply. package biojava; import java.io.*; import org.biojava.bio.*; import org.biojava.bio.seq.*; import org.biojava.bio.seq.io.*; public class ReadFasta2 { /** * This program will read any file supported by SeqIOTools. It takes three * arguments: the first is the file name, the second is the name of * a file format supported by SeqIOTools, e.g. fasta, genbank etc. * The third argument is the alphabet (e.g. dna, rna, protein). * * Both the format and alphabet names are case insensitive. 
* */ public static void main(String[] args) { try { /* prepare a BufferedReader for file I/O */ BufferedReader br = new BufferedReader(new FileReader(args[0])); String format = args[1]; String alphabet = args[2]; /* * get a Sequence Iterator over all the sequences in the file. * SeqIOTools.fileToBiojava() returns an Object. If the file read * is an alignment format like MSF, an Alignment object is returned; * otherwise a SequenceIterator is returned. */ SequenceIterator iter = (SequenceIterator)SeqIOTools.fileToBiojava(format,alphabet, br); } catch (FileNotFoundException ex) { /* can't find file specified by args[0] */ ex.printStackTrace(); }catch (BioException ex) { /* error parsing requested format */ ex.printStackTrace(); } } } -- Abdul Qaddus www.futurelinkers.com Cell No:- +92-3336540863 From fbristow at gmail.com Thu Jun 18 22:31:13 2009 From: fbristow at gmail.com (Franklin Bristow) Date: Thu, 18 Jun 2009 21:31:13 -0500 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer Message-ID: <50a7756d0906181931t744faa52r734df8c3c10b78cb@mail.gmail.com> Hi Everyone, I've just spent the last few days putting together an extended ABIF parser and an SCF writer. The parser that I wrote extends the existing ABIFParser but takes into account much of the information that was made available a few years ago when ABI released the ABIF File Format specification ( http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf). I've heavily based my code and methods on the Perl implementation of the ABIF parser Bio::Trace::ABIF by Nicola Vitacolonna. I also wrote a writer for SCF formatted chromatograms. I wrote this mostly using the documentation found in the Staden formats documentation ( http://staden.sourceforge.net/manual/formats_unix_2.html and http://iubio.bio.indiana.edu/soft/molbio/molbio.old/staden/www_pages/scf-rfc.html ). 
Finally, I have written a small utility class that will prepare an ABIFChromatogram for writing out as an SCF formatted file. This is the entire reason that I wrote both of the above classes. I will admit that there is a pretty nasty hack in the SCFUtils class, but it was the quickest way I could think of doing what I needed to do. I use reflection in order to make a protected method accessible so that I could set the value myself without having to subclass ABIFChromatogram. Of course, I would like to change this but the circumstances under which I have had to write this code forced me to do it this way for now. All of this code is written for Java 5, but if it is necessary to change it for inclusion into your source tree I will make the change. So, I welcome comments and suggestions on how I can improve this to make it appealing enough to have it included in biojava in the future. Since the code is rather long, I've attached it as a zip file. Andreas told me that he would keep an eye on the filters for it and would let it through when he saw it, so hopefully it makes it through okay. Thanks everyone for your time! -- Franklin -------------- next part -------------- A non-text attachment was scrubbed... Name: ABIFParser.zip Type: application/zip Size: 19225 bytes Desc: not available URL: From mark.schreiber at novartis.com Fri Jun 19 01:23:22 2009 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 19 Jun 2009 13:23:22 +0800 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer In-Reply-To: <50a7756d0906181931t744faa52r734df8c3c10b78cb@mail.gmail.com> Message-ID: Hi Franklin - If there is a good argument for making the protected setBaseCallAlignment method public then we could look at changing this so you don't need to use reflection. As you say in your code comments this reflection will not work unless the security policy allows it which will not be the case in many systems. 
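To make the trade-off above concrete, here is a minimal sketch (not taken from the attached zip) contrasting the two ways of reaching a protected setter: through reflection, and through a small subclass. DemoChromatogram, ExposedChromatogram and ReflectionAccess are hypothetical stand-ins written for this illustration; the real ABIFChromatogram and its setBaseCallAlignment method may have different signatures.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for ABIFChromatogram; the real class has a different API.
class DemoChromatogram {
    private Object baseCallAlignment;

    // Protected setter, analogous in spirit to setBaseCallAlignment.
    protected void setBaseCallAlignment(Object alignment) {
        this.baseCallAlignment = alignment;
    }

    public Object getBaseCallAlignment() {
        return baseCallAlignment;
    }
}

// Approach 1: a subclass re-exposes the protected setter through a public method.
// No reflection is involved, so no special security permission is needed.
class ExposedChromatogram extends DemoChromatogram {
    public void assignBaseCallAlignment(Object alignment) {
        setBaseCallAlignment(alignment);
    }
}

// Approach 2: reflection. setAccessible(true) may be refused with a
// SecurityException when a restrictive security policy is in force.
class ReflectionAccess {
    static void setViaReflection(DemoChromatogram target, Object alignment) {
        try {
            Method m = DemoChromatogram.class
                    .getDeclaredMethod("setBaseCallAlignment", Object.class);
            m.setAccessible(true);
            m.invoke(target, alignment);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not call protected setter", e);
        }
    }
}

class ProtectedAccessDemo {
    public static void main(String[] args) {
        ExposedChromatogram viaSubclass = new ExposedChromatogram();
        viaSubclass.assignBaseCallAlignment("ACGT");
        System.out.println(viaSubclass.getBaseCallAlignment());

        DemoChromatogram viaReflection = new DemoChromatogram();
        ReflectionAccess.setViaReflection(viaReflection, "TTGA");
        System.out.println(viaReflection.getBaseCallAlignment());
    }
}
```

Both approaches end up calling the same setter; the subclass version is checked by the compiler and keeps working if the method is renamed along with its callers, whereas the reflective lookup by name fails only at run time.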
Another alternative would be to modify ABIFChromatogram and provide a public method that lets people safely call the setBaseCallAlignment (requires write access to the SVN). Finally you could extend ABIFChromatogram and add a public method that will call the protected method (of course this won't work if the method is private). Nice to see well documented code! - Mark biojava-dev-bounces at lists.open-bio.org wrote on 06/19/2009 10:31:13 AM: > Hi Everyone, > I've just spent the last few days putting together an extended ABIF parser > and and SCF writer. The parser that I wrote extends the existing ABIFParser > but takes into account much of the information that was made available a few > years ago when ABI released the ABIF File Format specification ( > http://www.appliedbiosystems. > com/support/software_community/ABIF_File_Format.pdf). > I've heavily based my code and methods on the perl implementation of the > ABIF parser Bio::Trace::ABIF by Nicola Vitacolonna. > > I also wrote a writer for SCF formatted chromatograms. I wrote this mostly > using the documentation found in the staden formats documentation ( > http://staden.sourceforge.net/manual/formats_unix_2.html and > http://iubio.bio.indiana.edu/soft/molbio/molbio. > old/staden/www_pages/scf-rfc.html > ). > > Finally, I have written a small utility class that will prepare an > ABIFChromatogram for writing out as an SCF formatted file. This is the > entire reason that I wrote both of the above classes. I will admit that > there is a pretty nasty hack in the SCFUtils class, but it was the quickest > way I could think of doing what I needed to do. I use reflection in order > to make a protected method accessible so that I could set the value myself > without having to subclass ABIFChromatogram. Of course, I would like to > change this but the circumstances under which I have had to write this code > forced me to do it this way for now. 
> > All of this code is written for Java 5, but if it is necessary to change it > for inclusion into your source tree I will make the change. > > So, I welcome comments and suggestions on how I can improve this to make it > appealing enough to have it included in biojava in the future. > > Since the code is rather long, I've attached it as a zip file. Andreas told > me that he would keep an eye on the filters for it and would let it through > when he saw it, so hopefully it makes it through okay. > > Thanks everyone for your time! > > -- > Franklin > [attachment "ABIFParser.zip" deleted by Mark Schreiber/GP/Novartis] > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev _________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. From fbristow at gmail.com Fri Jun 19 10:52:13 2009 From: fbristow at gmail.com (Franklin Bristow) Date: Fri, 19 Jun 2009 09:52:13 -0500 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer In-Reply-To: References: <50a7756d0906181931t744faa52r734df8c3c10b78cb@mail.gmail.com> Message-ID: <50a7756d0906190752t76ac1895h5ea3fa776a582cb@mail.gmail.com> Hi Richard, I've written a very small private class that extends ABIFChromatogram. 
This class has a method that basically copies what you have done in ABIFChromatogram when you load a file as an ABIFChromatogram, specifically: > /** > * Create an instance of an ExtendedABIFChromatogram using the > supplied > * file. This is meant to be called in lieu of the static create > method > * that is found in {@link ABIFChromatogram}. > * > * @param f > * the ABIF formatted file > * @return an instance of ExtendedABIFChromatogram > * @throws UnsupportedChromatogramFormatException > * the file supplied is not an ABIF formatted > chromatogram > * @throws IOException > * if an I/O error occurs > */ > public ExtendedABIFChromatogram createExtended(File f) > throws UnsupportedChromatogramFormatException, IOException > { > new Parser(f); > return this; > } > This removes the need for using reflection to alter the accessibility of the methods. I've attached the updated code to this message, I hope that you will allow it through your filters again. Thanks again for having a look at my code! Thanks, Franklin On Fri, Jun 19, 2009 at 12:23 AM, wrote: > > Hi Franklin - > > If there is a good argument for making the protected setBaseCallAlignment > method public then we could look at changing this so you don't need to use > reflection. As you say in your code comments this reflection will not work > unless the security policy allows it which will not be the case in many > systems. > > Another alternative would be to modify ABIFChromatogram and provide a > public method that lets people safely call the setBaseCallAlignment > (requires write access to the SVN). Finally you could extend > ABIFChromatogram and add a public method that will call the protected method > (of course this won't work if the method is private). > > Nice to see well documented code! > > - Mark > > biojava-dev-bounces at lists.open-bio.org wrote on 06/19/2009 10:31:13 AM: > > > > Hi Everyone, > > I've just spent the last few days putting together an extended ABIF > parser > > and and SCF writer. 
The parser that I wrote extends the existing > ABIFParser > > but takes into account much of the information that was made available a > few > > years ago when ABI released the ABIF File Format specification ( > > http://www.appliedbiosystems. > > com/support/software_community/ABIF_File_Format.pdf). > > I've heavily based my code and methods on the perl implementation of the > > ABIF parser Bio::Trace::ABIF by Nicola Vitacolonna. > > > > I also wrote a writer for SCF formatted chromatograms. I wrote this > mostly > > using the documentation found in the staden formats documentation ( > > http://staden.sourceforge.net/manual/formats_unix_2.html and > > http://iubio.bio.indiana.edu/soft/molbio/molbio. > > old/staden/www_pages/scf-rfc.html > > ). > > > > Finally, I have written a small utility class that will prepare an > > ABIFChromatogram for writing out as an SCF formatted file. This is the > > entire reason that I wrote both of the above classes. I will admit that > > there is a pretty nasty hack in the SCFUtils class, but it was the > quickest > > way I could think of doing what I needed to do. I use reflection in > order > > to make a protected method accessible so that I could set the value > myself > > without having to subclass ABIFChromatogram. Of course, I would like to > > change this but the circumstances under which I have had to write this > code > > forced me to do it this way for now. > > > > All of this code is written for Java 5, but if it is necessary to change > it > > for inclusion into your source tree I will make the change. > > > > So, I welcome comments and suggestions on how I can improve this to make > it > > appealing enough to have it included in biojava in the future. > > > > Since the code is rather long, I've attached it as a zip file. Andreas > told > > me that he would keep an eye on the filters for it and would let it > through > > when he saw it, so hopefully it makes it through okay. > > > > Thanks everyone for your time! 
> > > > -- > > Franklin > > [attachment "ABIFParser.zip" deleted by Mark Schreiber/GP/Novartis] > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > _________________________ > > CONFIDENTIALITY NOTICE > > The information contained in this e-mail message is intended only for the > exclusive use of the individual or entity named above and may contain > information that is privileged, confidential or exempt from disclosure under > applicable law. If the reader of this message is not the intended recipient, > or the employee or agent responsible for delivery of the message to the > intended recipient, you are hereby notified that any dissemination, > distribution or copying of this communication is strictly prohibited. If you > have received this communication in error, please notify the sender > immediately by e-mail and delete the material from any computer. Thank you. > -- Franklin -------------- next part -------------- A non-text attachment was scrubbed... Name: abifparser.zip Type: application/zip Size: 20254 bytes Desc: not available URL: From holland at eaglegenomics.com Fri Jun 19 11:00:49 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 19 Jun 2009 16:00:49 +0100 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer In-Reply-To: <50a7756d0906190752t76ac1895h5ea3fa776a582cb@mail.gmail.com> References: <50a7756d0906181931t744faa52r734df8c3c10b78cb@mail.gmail.com> <50a7756d0906190752t76ac1895h5ea3fa776a582cb@mail.gmail.com> Message-ID: <1245423649.16991.13.camel@buzzybee> Sorry I haven't been following the thread... what problem is this a solution for? thanks, Richard On Fri, 2009-06-19 at 09:52 -0500, Franklin Bristow wrote: > Hi Richard, > I've written a very small private class that extends ABIFChromatogram. 
This > class has a method that basically copies what you have done in > ABIFChromatogram when you load a file as an ABIFChromatogram, specifically: > > > /** > > * Create an instance of an ExtendedABIFChromatogram using the > > supplied > > * file. This is meant to be called in lieu of the static create > > method > > * that is found in {@link ABIFChromatogram}. > > * > > * @param f > > * the ABIF formatted file > > * @return an instance of ExtendedABIFChromatogram > > * @throws UnsupportedChromatogramFormatException > > * the file supplied is not an ABIF formatted > > chromatogram > > * @throws IOException > > * if an I/O error occurs > > */ > > public ExtendedABIFChromatogram createExtended(File f) > > throws UnsupportedChromatogramFormatException, IOException > > { > > new Parser(f); > > return this; > > } > > > This removes the need for using reflection to alter the accessibility of the > methods. > > I've attached the updated code to this message, I hope that you will allow > it through your filters again. Thanks again for having a look at my code! > > Thanks, > Franklin > > On Fri, Jun 19, 2009 at 12:23 AM, wrote: > > > > > Hi Franklin - > > > > If there is a good argument for making the protected setBaseCallAlignment > > method public then we could look at changing this so you don't need to use > > reflection. As you say in your code comments this reflection will not work > > unless the security policy allows it which will not be the case in many > > systems. > > > > Another alternative would be to modify ABIFChromatogram and provide a > > public method that lets people safely call the setBaseCallAlignment > > (requires write access to the SVN). Finally you could extend > > ABIFChromatogram and add a public method that will call the protected method > > (of course this won't work if the method is private). > > > > Nice to see well documented code! 
> > > > - Mark > > > > biojava-dev-bounces at lists.open-bio.org wrote on 06/19/2009 10:31:13 AM: > > > > > > > Hi Everyone, > > > I've just spent the last few days putting together an extended ABIF > > parser > > > and and SCF writer. The parser that I wrote extends the existing > > ABIFParser > > > but takes into account much of the information that was made available a > > few > > > years ago when ABI released the ABIF File Format specification ( > > > http://www.appliedbiosystems. > > > com/support/software_community/ABIF_File_Format.pdf). > > > I've heavily based my code and methods on the perl implementation of the > > > ABIF parser Bio::Trace::ABIF by Nicola Vitacolonna. > > > > > > I also wrote a writer for SCF formatted chromatograms. I wrote this > > mostly > > > using the documentation found in the staden formats documentation ( > > > http://staden.sourceforge.net/manual/formats_unix_2.html and > > > http://iubio.bio.indiana.edu/soft/molbio/molbio. > > > old/staden/www_pages/scf-rfc.html > > > ). > > > > > > Finally, I have written a small utility class that will prepare an > > > ABIFChromatogram for writing out as an SCF formatted file. This is the > > > entire reason that I wrote both of the above classes. I will admit that > > > there is a pretty nasty hack in the SCFUtils class, but it was the > > quickest > > > way I could think of doing what I needed to do. I use reflection in > > order > > > to make a protected method accessible so that I could set the value > > myself > > > without having to subclass ABIFChromatogram. Of course, I would like to > > > change this but the circumstances under which I have had to write this > > code > > > forced me to do it this way for now. > > > > > > All of this code is written for Java 5, but if it is necessary to change > > it > > > for inclusion into your source tree I will make the change. 
> > > > > > So, I welcome comments and suggestions on how I can improve this to make > > it > > > appealing enough to have it included in biojava in the future. > > > > > > Since the code is rather long, I've attached it as a zip file. Andreas > > told > > > me that he would keep an eye on the filters for it and would let it > > through > > > when he saw it, so hopefully it makes it through okay. > > > > > > Thanks everyone for your time! > > > > > > -- > > > Franklin > > > [attachment "ABIFParser.zip" deleted by Mark Schreiber/GP/Novartis] > > > _______________________________________________ > > > biojava-dev mailing list > > > biojava-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > _________________________ > > > > CONFIDENTIALITY NOTICE > > > > The information contained in this e-mail message is intended only for the > > exclusive use of the individual or entity named above and may contain > > information that is privileged, confidential or exempt from disclosure under > > applicable law. If the reader of this message is not the intended recipient, > > or the employee or agent responsible for delivery of the message to the > > intended recipient, you are hereby notified that any dissemination, > > distribution or copying of this communication is strictly prohibited. If you > > have received this communication in error, please notify the sender > > immediately by e-mail and delete the material from any computer. Thank you. 
> > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From fbristow at gmail.com Fri Jun 19 11:29:51 2009 From: fbristow at gmail.com (Franklin Bristow) Date: Fri, 19 Jun 2009 10:29:51 -0500 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer In-Reply-To: <1245423649.16991.13.camel@buzzybee> References: <50a7756d0906181931t744faa52r734df8c3c10b78cb@mail.gmail.com> <50a7756d0906190752t76ac1895h5ea3fa776a582cb@mail.gmail.com> <1245423649.16991.13.camel@buzzybee> Message-ID: <50a7756d0906190829s30646cefv7055c80d1fce39e9@mail.gmail.com> Sorry Richard, I meant to respond to Mark, I'm very sleepy this morning... Thanks, Franklin On Fri, Jun 19, 2009 at 10:00 AM, Richard Holland wrote: > Sorry I haven't been following the thread... what problem is this a > solution for? > > thanks, > Richard > > On Fri, 2009-06-19 at 09:52 -0500, Franklin Bristow wrote: > > Hi Richard, > > I've written a very small private class that extends ABIFChromatogram. > This > > class has a method that basically copies what you have done in > > ABIFChromatogram when you load a file as an ABIFChromatogram, > specifically: > > > > > /** > > > * Create an instance of an ExtendedABIFChromatogram using the > > > supplied > > > * file. This is meant to be called in lieu of the static > create > > > method > > > * that is found in {@link ABIFChromatogram}. 
> > > * > > > * @param f > > > * the ABIF formatted file > > > * @return an instance of ExtendedABIFChromatogram > > > * @throws UnsupportedChromatogramFormatException > > > * the file supplied is not an ABIF formatted > > > chromatogram > > > * @throws IOException > > > * if an I/O error occurs > > > */ > > > public ExtendedABIFChromatogram createExtended(File f) > > > throws UnsupportedChromatogramFormatException, > IOException > > > { > > > new Parser(f); > > > return this; > > > } > > > > > This removes the need for using reflection to alter the accessibility of > the > > methods. > > > > I've attached the updated code to this message, I hope that you will > allow > > it through your filters again. Thanks again for having a look at my > code! > > > > Thanks, > > Franklin > > > > On Fri, Jun 19, 2009 at 12:23 AM, wrote: > > > > > > > > Hi Franklin - > > > > > > If there is a good argument for making the protected > setBaseCallAlignment > > > method public then we could look at changing this so you don't need to > use > > > reflection. As you say in your code comments this reflection will not > work > > > unless the security policy allows it which will not be the case in many > > > systems. > > > > > > Another alternative would be to modify ABIFChromatogram and provide a > > > public method that lets people safely call the setBaseCallAlignment > > > (requires write access to the SVN). Finally you could extend > > > ABIFChromatogram and add a public method that will call the protected > method > > > (of course this won't work if the method is private). > > > > > > Nice to see well documented code! > > > > > > - Mark > > > > > > biojava-dev-bounces at lists.open-bio.org wrote on 06/19/2009 10:31:13 > AM: > > > > > > > > > > Hi Everyone, > > > > I've just spent the last few days putting together an extended ABIF > > > parser > > > > and and SCF writer. 
The parser that I wrote extends the existing > > > ABIFParser > > > > but takes into account much of the information that was made > available a > > > few > > > > years ago when ABI released the ABIF File Format specification ( > > > > http://www.appliedbiosystems. > > > > com/support/software_community/ABIF_File_Format.pdf). > > > > I've heavily based my code and methods on the perl implementation of > the > > > > ABIF parser Bio::Trace::ABIF by Nicola Vitacolonna. > > > > > > > > I also wrote a writer for SCF formatted chromatograms. I wrote this > > > mostly > > > > using the documentation found in the staden formats documentation ( > > > > http://staden.sourceforge.net/manual/formats_unix_2.html and > > > > http://iubio.bio.indiana.edu/soft/molbio/molbio. > > > > old/staden/www_pages/scf-rfc.html > > > > ). > > > > > > > > Finally, I have written a small utility class that will prepare an > > > > ABIFChromatogram for writing out as an SCF formatted file. This is > the > > > > entire reason that I wrote both of the above classes. I will admit > that > > > > there is a pretty nasty hack in the SCFUtils class, but it was the > > > quickest > > > > way I could think of doing what I needed to do. I use reflection in > > > order > > > > to make a protected method accessible so that I could set the value > > > myself > > > > without having to subclass ABIFChromatogram. Of course, I would like > to > > > > change this but the circumstances under which I have had to write > this > > > code > > > > forced me to do it this way for now. > > > > > > > > All of this code is written for Java 5, but if it is necessary to > change > > > it > > > > for inclusion into your source tree I will make the change. > > > > > > > > So, I welcome comments and suggestions on how I can improve this to > make > > > it > > > > appealing enough to have it included in biojava in the future. > > > > > > > > Since the code is rather long, I've attached it as a zip file. 
> Andreas > > > told > > > > me that he would keep an eye on the filters for it and would let it > > > through > > > > when he saw it, so hopefully it makes it through okay. > > > > > > > > Thanks everyone for your time! > > > > > > > > -- > > > > Franklin > > > > [attachment "ABIFParser.zip" deleted by Mark Schreiber/GP/Novartis] > > > > _______________________________________________ > > > > biojava-dev mailing list > > > > biojava-dev at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > _________________________ > > > > > > CONFIDENTIALITY NOTICE > > > > > > The information contained in this e-mail message is intended only for > the > > > exclusive use of the individual or entity named above and may contain > > > information that is privileged, confidential or exempt from disclosure > under > > > applicable law. If the reader of this message is not the intended > recipient, > > > or the employee or agent responsible for delivery of the message to the > > > intended recipient, you are hereby notified that any dissemination, > > > distribution or copying of this communication is strictly prohibited. > If you > > > have received this communication in error, please notify the sender > > > immediately by e-mail and delete the material from any computer. Thank > you. > > > > > > > > > > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > -- Franklin -------------- next part -------------- A non-text attachment was scrubbed... 
Name: abifparser.zip Type: application/zip Size: 20254 bytes Desc: not available URL: From andreas at sdsc.edu Sat Jun 20 12:45:51 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Sat, 20 Jun 2009 09:45:51 -0700 Subject: [Biojava-dev] BioJava user meeting at ISMB/BOSC Message-ID: <59a41c430906200945q598503ccj52717cf708b67083@mail.gmail.com> Hi, Next week the ISMB and BOSC conferences will take place in Stockholm, Sweden. As has become kind of a tradition, we will have a BioJava user meeting around BOSC. If you are in Stockholm at the time please join us on Sunday, late afternoon. We will meet during the "Birds of a Feather" session. http://open-bio.org/wiki/BOSC_2009/Birds-of-a-Feather Looking forward to meeting you there, Andreas From bugzilla-daemon at portal.open-bio.org Mon Jun 29 16:30:11 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Jun 2009 16:30:11 -0400 Subject: [Biojava-dev] [Bug 2540] RichSequenceIterator does not skip sequence when exception is thrown In-Reply-To: Message-ID: <200906292030.n5TKUBwq020788@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2540 vdmerwe.karen at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From markjschreiber at gmail.com Tue Jun 30 04:33:28 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 30 Jun 2009 16:33:28 +0800 Subject: [Biojava-dev] Singletons are bad Message-ID: <93b45ca50906300133w58109024vb89c6970a8446fed@mail.gmail.com> I came across this today which is an interesting article about how singletons seem like a good idea but after a while you realise they get you into serious trouble. 
After playing with BioJava for over 10 years I completely concur. Singletons and flyweight objects are (IMHO) the most serious problem in the BioJava code base and, as the article predicts, the BJ code base is completely infected with them.

The article is here:
http://tech.puredanger.com/2007/07/03/pattern-hate-singleton/

I have copied the section below as it seems to offer a way out without completely breaking everything. This should be seriously considered for future BJ releases.

... paste starts here

But I already have a bunch of singletons in my code!

Sometimes, you'll have a system (built by you or someone else) that is heavily dependent on some singletons. Often, you will find this annoying as you try to test and/or add functionality to the system.

To refactor the singletons out of your system, you need to start from each point of use and allow the singleton to be set as a dependency on the component using it, rather than calling the singleton's getInstance() method. Doing so moves the singleton access (but not use) up one level. Repeat until the singleton's getInstance() method is called in as few places as possible (ideally one). At this point, all components in the system declare their dependence on the concrete singleton class and that singleton class is instantiated at a very few points at the "top" of your architecture (then passed down through the system).

Next, it's time to apply some classic refactoring. Most importantly, we want to change the concrete singleton class into an interface and move the existing concrete implementation into a new default implementation class implementing the interface. Finally, you'll probably want to clean up the calls to getInstance() with either a call to new the concrete default implementation or a factory method that can do that for you.
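As a rough illustration of the steps described so far, here is a minimal Java sketch. The names AlphabetRegistry, DefaultAlphabetRegistry and SequenceValidator are hypothetical and are not BioJava classes; the sketch only shows the shape of the refactoring, assuming the former singleton can be reduced to an interface with one default implementation that is handed to each component through its constructor.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface extracted from a former singleton (not a BioJava type).
interface AlphabetRegistry {
    boolean isKnownSymbol(char symbol);
}

// The body of the old singleton moves into an ordinary default implementation.
class DefaultAlphabetRegistry implements AlphabetRegistry {
    private final Set<Character> symbols = new HashSet<Character>();

    DefaultAlphabetRegistry(String allowed) {
        for (char c : allowed.toCharArray()) {
            symbols.add(Character.valueOf(c));
        }
    }

    public boolean isKnownSymbol(char symbol) {
        return symbols.contains(Character.valueOf(symbol));
    }
}

// The component declares its dependency instead of calling getInstance().
class SequenceValidator {
    private final AlphabetRegistry registry;

    SequenceValidator(AlphabetRegistry registry) {
        this.registry = registry;
    }

    boolean isValid(String sequence) {
        for (char c : sequence.toCharArray()) {
            if (!registry.isKnownSymbol(c)) {
                return false;
            }
        }
        return true;
    }
}

class SingletonRefactoringSketch {
    public static void main(String[] args) {
        // The single place at the "top" of the application that builds the implementation.
        AlphabetRegistry dna = new DefaultAlphabetRegistry("acgt");
        SequenceValidator validator = new SequenceValidator(dna);
        System.out.println(validator.isValid("gattaca"));
        System.out.println(validator.isValid("gaxtaca"));
    }
}
```

In a unit test the validator can be given any other AlphabetRegistry, for example a small anonymous class, without touching global state.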
This transformation should make all of your components dependent on an injectable, interface-defined component, which is easy to mock or swap in during unit testing of the component itself. It also typically makes testing of the concrete singleton implementation itself a breeze compared to the prior implementation.

Note that the first phase of bubbling the singleton instantiation up through the architecture can be done as slowly as needed and does not need to be done all at once. You'll find the second phase is fairly easy with any modern IDE once you get to that point.

From xuxiang at sibs.ac.cn Mon Jun 1 01:54:46 2009
From: xuxiang at sibs.ac.cn (xuxiang)
Date: Mon, 1 Jun 2009 09:54:46 +0800
Subject: [Biojava-dev] Next Generation Sequencing
Message-ID: <200906010954385937117@sibs.ac.cn>

Hi all,

I am working with sequencing data from the Illumina Genome Analyzer (Next Generation Sequencing). Are there any tools in BioJava for analyzing Next Generation Sequencing data?

2009-06-01
xuxiang

From sylvain.foisy at diploide.net Tue Jun 2 16:26:22 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Tue, 02 Jun 2009 12:26:22 -0400
Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services
Message-ID:

Hi,

In response to Scooter, and from having used some of these BLAST implementations in the past, I would suggest that we use the QBlast service from NCBI first, for a number of reasons:

- It has been in operation for a long time and its usage is well documented;

- Because of this, there is little chance that it will change;

- Coming from NCBI, it will probably be there for some time to come ;-)

Our friends at BioPerl have been using this technique for a long time now with the Bio::Tools::Run::RemoteBlast module. We might try to emulate it at first and, of course, do better :-)

Any input?

Best regards

Sylvain

===================================================================

Sylvain Foisy, Ph. D.
Consultant Bio-informatique / Bioinformatics Diploide.net - TI pour la vie / IT for Life Courriel: sylvain.foisy at diploide.net Web: http://www.diploide.net =================================================================== From andreas at sdsc.edu Thu Jun 4 09:12:01 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 4 Jun 2009 11:12:01 +0200 Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services In-Reply-To: References: Message-ID: <59a41c430906040212m659046c0y820d32079607f34d@mail.gmail.com> Hi Sylvain, Do you mean the URL api for the NCBI Blast searches? Could not find a link for a WSDL... http://www.ncbi.nlm.nih.gov/BLAST/developer.shtml Andreas On Tue, Jun 2, 2009 at 6:26 PM, Sylvain Foisy wrote: > Hi, > > In response to Scooter and from using some of these BLAST implementations in > the past, I would suggest that we use the QBlast service from NCI first for > a number of reasons: > > - It has been in operation for a long time and its usage is well documented; > > - Because of this, there is few chances that it will change; > > - Coming from NCBI, it will probably be there for some time to come ;-) > > Our friends at BioPerl have been using this technique for a long time now > with the BioPerl Module:Bio::Tools::Run::RemoteBlast module. We might try to > emulate at first and of course, do better :-) > > Any inputs? > > Best regards > > Sylvain > > =================================================================== > > ?Sylvain Foisy, Ph. D. 
> ?Consultant Bio-informatique / Bioinformatics > ?Diploide.net - TI pour la vie / IT for Life > > ?Courriel: sylvain.foisy at diploide.net > ?Web: http://www.diploide.net > > =================================================================== > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From HWillis at scripps.edu Thu Jun 4 11:38:43 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Thu, 4 Jun 2009 07:38:43 -0400 Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services References: <59a41c430906040212m659046c0y820d32079607f34d@mail.gmail.com> Message-ID: <061BFD133FA1584693D19C79A0072F5F76C86A@FLMAIL1.fl.ad.scripps.edu> Looks like the rolled their own URL interface and did not do a WSDL. Not a big deal but does appear they have some sort of submit get a "ticket" and then check back with the "ticket" identifier for the results. The BioJava API would hide the transport layer so you could use a custom URL approach or web services. Not sure how the other WSDL interfaces handle long running tasks but I assume the Web Services can handle a call that takes say 5 minutes to respond without timing out. Some process would need to distinguish between a long running server task and a server that is no longer responding. Scooter ________________________________ From: biojava-dev-bounces at lists.open-bio.org on behalf of Andreas Prlic Sent: Thu 6/4/2009 5:12 AM To: Sylvain Foisy Cc: biojava-dev at lists.open-bio.org Subject: Re: [Biojava-dev] Biojava Interface to BLAST web/remote services Hi Sylvain, Do you mean the URL api for the NCBI Blast searches? Could not find a link for a WSDL... 
http://www.ncbi.nlm.nih.gov/BLAST/developer.shtml Andreas On Tue, Jun 2, 2009 at 6:26 PM, Sylvain Foisy wrote: > Hi, > > In response to Scooter and from using some of these BLAST implementations in > the past, I would suggest that we use the QBlast service from NCI first for > a number of reasons: > > - It has been in operation for a long time and its usage is well documented; > > - Because of this, there is few chances that it will change; > > - Coming from NCBI, it will probably be there for some time to come ;-) > > Our friends at BioPerl have been using this technique for a long time now > with the BioPerl Module:Bio::Tools::Run::RemoteBlast module. We might try to > emulate at first and of course, do better :-) > > Any inputs? > > Best regards > > Sylvain > > =================================================================== > > Sylvain Foisy, Ph. D. > Consultant Bio-informatique / Bioinformatics > Diploide.net - TI pour la vie / IT for Life > > Courriel: sylvain.foisy at diploide.net > Web: http://www.diploide.net > > =================================================================== > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From andreas at sdsc.edu Thu Jun 4 13:00:43 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 4 Jun 2009 14:00:43 +0100 Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C86A@FLMAIL1.fl.ad.scripps.edu> References: <59a41c430906040212m659046c0y820d32079607f34d@mail.gmail.com> <061BFD133FA1584693D19C79A0072F5F76C86A@FLMAIL1.fl.ad.scripps.edu> Message-ID: <59a41c430906040600k74bd525frce89d79943542a6e@mail.gmail.com> although using a different API this system is similar to 
the sequence search service provided by Pfam ... http://pfam.sanger.ac.uk/help#services Andreas On Thu, Jun 4, 2009 at 12:38 PM, Scooter Willis wrote: > Looks like the rolled their own URL interface and did not do a WSDL. Not a > big deal but does appear they have some sort of submit get a "ticket" and > then check back with the "ticket" identifier for the results. The BioJava > API would hide the transport layer so you could use a custom URL approach or > web services. > > Not sure how the other WSDL interfaces handle long running tasks but I > assume the Web Services can handle a call that takes say 5 minutes to > respond without timing out. Some process would need to distinguish between a > long running server task and a server that is no longer responding. > > Scooter > ________________________________ > From: biojava-dev-bounces at lists.open-bio.org on behalf of Andreas Prlic > Sent: Thu 6/4/2009 5:12 AM > To: Sylvain Foisy > Cc: biojava-dev at lists.open-bio.org > Subject: Re: [Biojava-dev] Biojava Interface to BLAST web/remote services > > Hi Sylvain, > > Do you mean the URL api for the NCBI Blast searches? Could not find a > link for a WSDL... > http://www.ncbi.nlm.nih.gov/BLAST/developer.shtml > > Andreas > > > On Tue, Jun 2, 2009 at 6:26 PM, Sylvain Foisy > wrote: >> Hi, >> >> In response to Scooter and from using some of these BLAST implementations >> in >> the past, I would suggest that we use the QBlast service from NCI first >> for >> a number of reasons: >> >> - It has been in operation for a long time and its usage is well >> documented; >> >> - Because of this, there is few chances that it will change; >> >> - Coming from NCBI, it will probably be there for some time to come ;-) >> >> Our friends at BioPerl have been using this technique for a long time now >> with the BioPerl Module:Bio::Tools::Run::RemoteBlast module. We might try >> to >> emulate at first and of course, do better :-) >> >> Any inputs? 
>> >> Best regards >> >> Sylvain >> >> =================================================================== >> >> ?Sylvain Foisy, Ph. D. >> ?Consultant Bio-informatique / Bioinformatics >> ?Diploide.net - TI pour la vie / IT for Life >> >> ?Courriel: sylvain.foisy at diploide.net >> ?Web: http://www.diploide.net >> >> =================================================================== >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From sylvain.foisy at diploide.net Thu Jun 4 13:07:04 2009 From: sylvain.foisy at diploide.net (Sylvain Foisy) Date: Thu, 04 Jun 2009 09:07:04 -0400 Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services In-Reply-To: <061BFD133FA1584693D19C79A0072F5F76C86A@FLMAIL1.fl.ad.scripps.edu> Message-ID: Hi Scooter, On 04/06/09 07:38, "[NAME]" <[ADDRESS]> wrote: > Looks like the rolled their own URL interface and did not do a WSDL. Not a big > deal but does appear they have some sort of submit get a "ticket" and then > check back with the "ticket" identifier for the results. The BioJava API would > hide the transport layer so you could use a custom URL approach or web > services. That is basically the way it works. I am working on a RemoteBlastWrapper class that would do exactly what you are writing. > Not sure how the other WSDL interfaces handle long running tasks but I assume > the Web Services can handle a call that takes say 5 minutes to respond without > timing out. Some process would need to distinguish between a long running > server task and a server that is no longer responding. We'll have to try ;-) Best regards Sylvain =================================================================== Sylvain Foisy, Ph. D. 
Consultant Bio-informatique / Bioinformatics Diploide.net - TI pour la vie / IT for Life Courriel: sylvain.foisy at diploide.net Web: http://www.diploide.net Tel: (514) 893-4363 =================================================================== From HWillis at scripps.edu Thu Jun 4 13:28:20 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Thu, 4 Jun 2009 09:28:20 -0400 Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services In-Reply-To: References: <061BFD133FA1584693D19C79A0072F5F76C86A@FLMAIL1.fl.ad.scripps.edu> Message-ID: <061BFD133FA1584693D19C79A0072F5F95FBA3@FLMAIL1.fl.ad.scripps.edu> Sylvain Given that BioJava already has a BLAST file parser that returns results the goal should be to have a remote/web call return the same set of classes as if you had parsed the file locally. That is going to be my approach. Once we get a couple services working we can integrate into a common factory/interface approach. Thanks Scooter -----Original Message----- From: biojava-dev-bounces at lists.open-bio.org [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain Foisy Sent: Thursday, June 04, 2009 9:07 AM To: Scooter Willis; Andreas Prlic Cc: biojava-dev at lists.open-bio.org Subject: Re: [Biojava-dev] Biojava Interface to BLAST web/remote services Hi Scooter, On 04/06/09 07:38, "[NAME]" <[ADDRESS]> wrote: > Looks like the rolled their own URL interface and did not do a WSDL. Not a big > deal but does appear they have some sort of submit get a "ticket" and then > check back with the "ticket" identifier for the results. The BioJava API would > hide the transport layer so you could use a custom URL approach or web > services. That is basically the way it works. I am working on a RemoteBlastWrapper class that would do exactly what you are writing. > Not sure how the other WSDL interfaces handle long running tasks but I assume > the Web Services can handle a call that takes say 5 minutes to respond without > timing out. 
Some process would need to distinguish between a long running > server task and a server that is no longer responding. We'll have to try ;-) Best regards Sylvain =================================================================== Sylvain Foisy, Ph. D. Consultant Bio-informatique / Bioinformatics Diploide.net - TI pour la vie / IT for Life Courriel: sylvain.foisy at diploide.net Web: http://www.diploide.net Tel: (514) 893-4363 =================================================================== _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From sylvain.foisy at diploide.net Thu Jun 4 13:57:03 2009 From: sylvain.foisy at diploide.net (Sylvain Foisy) Date: Thu, 04 Jun 2009 09:57:03 -0400 Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services In-Reply-To: <061BFD133FA1584693D19C79A0072F5F95FBA3@FLMAIL1.fl.ad.scripps.edu> Message-ID: Hi Scooter, That is one way of doing it ;-) I was thinking of creating an object that the user would either: - Feed into the BJ Blast parser - Do something else entirely. Best regards Sylvain On 04/06/09 09:28, "[NAME]" <[ADDRESS]> wrote: > Sylvain > > Given that BioJava already has a BLAST file parser that returns results > the goal should be to have a remote/web call return the same set of > classes as if you had parsed the file locally. That is going to be my > approach. Once we get a couple services working we can integrate into a > common factory/interface approach. 
> > Thanks > > Scooter > > > -----Original Message----- > From: biojava-dev-bounces at lists.open-bio.org > [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain > Foisy > Sent: Thursday, June 04, 2009 9:07 AM > To: Scooter Willis; Andreas Prlic > Cc: biojava-dev at lists.open-bio.org > Subject: Re: [Biojava-dev] Biojava Interface to BLAST web/remote > services > > Hi Scooter, > > On 04/06/09 07:38, "[NAME]" <[ADDRESS]> wrote: > >> Looks like the rolled their own URL interface and did not do a WSDL. > Not a big >> deal but does appear they have some sort of submit get a "ticket" and > then >> check back with the "ticket" identifier for the results. The BioJava > API would >> hide the transport layer so you could use a custom URL approach or web >> services. > > That is basically the way it works. I am working on a RemoteBlastWrapper > class that would do exactly what you are writing. > > >> Not sure how the other WSDL interfaces handle long running tasks but I > assume >> the Web Services can handle a call that takes say 5 minutes to respond > without >> timing out. Some process would need to distinguish between a long > running >> server task and a server that is no longer responding. > > We'll have to try ;-) > > Best regards > > Sylvain > > > =================================================================== > > Sylvain Foisy, Ph. D. 
> Consultant Bio-informatique / Bioinformatics
> Diploide.net - TI pour la vie / IT for Life
>
> Courriel: sylvain.foisy at diploide.net
> Web: http://www.diploide.net
> Tel: (514) 893-4363
> ===================================================================
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

From HWillis at scripps.edu Thu Jun 4 14:16:07 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 4 Jun 2009 10:16:07 -0400
Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services
In-Reply-To:
References: <061BFD133FA1584693D19C79A0072F5F95FBA3@FLMAIL1.fl.ad.scripps.edu>
Message-ID: <061BFD133FA1584693D19C79A0072F5F95FBB1@FLMAIL1.fl.ad.scripps.edu>

Sylvain

I think the way you submit the query/parameters of the search would be different from the way you parse a BLAST file, and we would not worry about the SAX API/File dependency of parsing a file. We do need a class that would contain the search parameters, and as an object it should follow the same inputs that are available via the union of the HTML interfaces for the supported BLAST engines. Some search engines will have more inputs or more specificity than others, so that will require some analysis. This search parameter class should be independent of any particular BLAST web service engine, allowing a user to submit the same search to multiple services with minimum overhead.

But once you get the results, having the ability to use the same general iteration over results/hits will allow those who have invested in the BLAST file parsing API to easily insert the new web services approach.

From the BioJava cookbook, SeqSimilaritySearchHit is the class that contains the results and should be the class used to contain the results from the web service query. In the web service approach you should be able to get the collection of SeqSimilaritySearchResult and SeqSimilaritySearchHit from each of the supported BLAST web services. The assumption is that SeqSimilaritySearchResult and SeqSimilaritySearchHit have been properly designed to represent BLAST data.

Scooter

    // 'results' is the List of SeqSimilaritySearchResult objects
    // filled in by the parser in the cookbook example.

    // output some blast details
    for (Iterator i = results.iterator(); i.hasNext(); ) {
        SeqSimilaritySearchResult result =
            (SeqSimilaritySearchResult) i.next();

        Annotation anno = result.getAnnotation();

        // print every annotation key/value attached to this result
        for (Iterator j = anno.keys().iterator(); j.hasNext(); ) {
            Object key = j.next();
            Object property = anno.getProperty(key);
            System.out.println(key + " : " + property);
        }
        System.out.println("Hits: ");

        // list the hits
        for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
            SeqSimilaritySearchHit hit =
                (SeqSimilaritySearchHit) k.next();
            System.out.print("\tmatch: " + hit.getSubjectID());
            System.out.println("\te score: " + hit.getEValue());
        }

        System.out.println("\n");
    }

-----Original Message-----
From: Sylvain Foisy [mailto:sylvain.foisy at diploide.net]
Sent: Thursday, June 04, 2009 9:57 AM
To: Scooter Willis; Andreas Prlic
Cc: biojava-dev at lists.open-bio.org
Subject: Re: [Biojava-dev] Biojava Interface to BLAST web/remote services

Hi Scooter,

That is one way of doing it ;-) I was thinking of creating an object that the user would either:

- Feed into the BJ Blast parser
- Do something else entirely.

Best regards

Sylvain

On 04/06/09 09:28, "[NAME]" <[ADDRESS]> wrote:

> Sylvain
>
> Given that BioJava already has a BLAST file parser that returns results
> the goal should be to have a remote/web call return the same set of
> classes as if you had parsed the file locally. That is going to be my
> approach. Once we get a couple services working we can integrate into a
> common factory/interface approach.
> > Thanks > > Scooter > > > -----Original Message----- > From: biojava-dev-bounces at lists.open-bio.org > [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain > Foisy > Sent: Thursday, June 04, 2009 9:07 AM > To: Scooter Willis; Andreas Prlic > Cc: biojava-dev at lists.open-bio.org > Subject: Re: [Biojava-dev] Biojava Interface to BLAST web/remote > services > > Hi Scooter, > > On 04/06/09 07:38, "[NAME]" <[ADDRESS]> wrote: > >> Looks like the rolled their own URL interface and did not do a WSDL. > Not a big >> deal but does appear they have some sort of submit get a "ticket" and > then >> check back with the "ticket" identifier for the results. The BioJava > API would >> hide the transport layer so you could use a custom URL approach or web >> services. > > That is basically the way it works. I am working on a RemoteBlastWrapper > class that would do exactly what you are writing. > > >> Not sure how the other WSDL interfaces handle long running tasks but I > assume >> the Web Services can handle a call that takes say 5 minutes to respond > without >> timing out. Some process would need to distinguish between a long > running >> server task and a server that is no longer responding. > > We'll have to try ;-) > > Best regards > > Sylvain > > > =================================================================== > > Sylvain Foisy, Ph. D. 
> Consultant Bio-informatique / Bioinformatics
> Diploide.net - TI pour la vie / IT for Life
>
> Courriel: sylvain.foisy at diploide.net
> Web: http://www.diploide.net
> Tel: (514) 893-4363
> ===================================================================
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

From mark.schreiber at novartis.com Fri Jun 5 03:47:42 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 5 Jun 2009 11:47:42 +0800
Subject: [Biojava-dev] Biojava Interface to BLAST web/remote services
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F95FBB1@FLMAIL1.fl.ad.scripps.edu>
Message-ID:

Hi -

Just some observations from past experience:

You could write an interface called something like RemoteSimilaritySearch which contains the minimal information that all SOAP/CGI-BIN sequence search services might be expected to require and return, although it's pretty hard to anticipate what that might be. Possibly more useful would be RemoteBLAST, RemoteFASTA etc. interfaces that could extend RemoteSimilaritySearch. Concrete implementations of, for example, RemoteBLAST could include the SOAP service at EBI and the CGI-BIN service at NCBI.

RemoteBLAST and RemoteFASTA should make it possible to modify any parameter of BLAST/FASTA as appropriate, and should have the option to throw an UnsupportedOperationException, as not all interfaces will allow the setting of all parameters.

In general, trying to make an implementation that will talk to an HTML interface to BLAST is asking for trouble (as they can change very easily). It is best to code to a SOAP/REST service or, if you have to, a CGI-BIN interface. You should only make an implementation that talks to a web form as a last resort, and even then it probably shouldn't go into BioJava (maybe post it on the cookbook).
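The interface layering suggested above could look roughly like the following Java sketch. Only the names RemoteSimilaritySearch and RemoteBLAST come from the message; every method name, the parameter setters and the FakeRemoteBlast class are assumptions made up for illustration, and nothing here talks to a real service.

```java
import java.io.IOException;

// Minimal contract; method names are illustrative assumptions, not an existing BioJava API.
interface RemoteSimilaritySearch {
    String submit(String querySequence) throws IOException; // returns a job identifier
    boolean isReady(String jobId) throws IOException;
    String fetchReport(String jobId) throws IOException;    // ideally BLAST XML
}

// BLAST-specific parameters; an implementation may reject any it cannot honour.
interface RemoteBLAST extends RemoteSimilaritySearch {
    void setProgram(String program);   // e.g. "blastp"
    void setDatabase(String database); // e.g. "nr"
    void setExpect(double eValue);
    void setWordSize(int wordSize);
}

// In-memory stand-in for a concrete service; it makes no network calls.
class FakeRemoteBlast implements RemoteBLAST {
    private String program = "blastn";
    private String database = "nr";
    private double expect = 10.0;
    private int jobs = 0;

    public void setProgram(String program) { this.program = program; }
    public void setDatabase(String database) { this.database = database; }
    public void setExpect(double eValue) { this.expect = eValue; }

    public void setWordSize(int wordSize) {
        // This pretend service does not expose the word size parameter.
        throw new UnsupportedOperationException("word size cannot be set on this service");
    }

    public String submit(String querySequence) {
        jobs++;
        return "JOB-" + jobs;
    }

    public boolean isReady(String jobId) { return true; }

    public String fetchReport(String jobId) {
        return jobId + ": " + program + " vs " + database + " (expect " + expect + ")";
    }
}

class RemoteBlastSketch {
    public static void main(String[] args) throws IOException {
        RemoteBLAST blast = new FakeRemoteBlast();
        blast.setProgram("blastp");
        blast.setDatabase("swissprot");
        try {
            blast.setWordSize(3);
        } catch (UnsupportedOperationException e) {
            System.out.println("Skipping unsupported parameter: " + e.getMessage());
        }
        String jobId = blast.submit("MKTAYIAKQR");
        System.out.println(blast.fetchReport(jobId));
    }
}
```

Client code written against RemoteBLAST can then treat an UnsupportedOperationException as "this service does not offer that parameter" and carry on, which is the behaviour described above.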
The most stable version of the BLAST output is the XML. Parsing the text/html output has been a constant source of headaches for BioJava. Implementations of remote blast services should try and parse that format if it is available (SOAP and REST will be XML anyway although not always BLAST.XML). All the BLAST services I have used will return a job number not a result. The client will then need to poll that job number until it is complete and then get the results for the job. The client will need to handle this sensibly without timing out (unless the user wants to allow a time out). Sensible threading will be required. Converting results back into SeqSimilaritySearchResult makes sense although please note that Andreas has suggested renaming the packages for these (which I support as the old package name is not informative). Under a mavenized system the whole Similarity search system could go into it's own module. Just my $0.02 - Mark biojava-dev-bounces at lists.open-bio.org wrote on 06/04/2009 10:16:07 PM: > Sylvain > > I think the way you submit the query/paramaters of the seearch or parse > a BLAST file would be different and we would not worry about the SAX > API/File dependency of parsing a file. We do need a Class that would > contain the search parameters and this should as an object follow the > same inputs available via the union of HTML interfaces for the supported > BLAST engines. Some search engines will have more inputs or specificity > over others so that will require some analysis. This search parameter > class should be independent of a particular BLAST web service engine > allowing a user to submit the same search to multiple services with > minimum overhead. > > But once you get the results then having the ability to use the same > general iteration of results/hits will allow those who have invested in > the BLAST file parsing API to easily insert the new web services > approach. 
> > >From the biojava cookbook SeqSimilaritySearchHit is the class that > contains the results and should be the class used to contain the results > from the web service query. In the web service approach you should be > able to get the collection of SeqSimilaritySearchResult and > SeqSimilaritySearchHit from each of the supported BLAST web services. > The assumption is that SeqSimilaritySearchResult and > SeqSimilaritySearchHit have been properly designed to represent BLAST > data. > > Scooter > > //output some blast details > for (Iterator i = results.iterator(); i.hasNext(); ) { > SeqSimilaritySearchResult result = > (SeqSimilaritySearchResult)i.next(); > > Annotation anno = result.getAnnotation(); > > for (Iterator j = anno.keys().iterator(); j.hasNext(); ) { > Object key = j.next(); > Object property = anno.getProperty(key); > System.out.println(key+" : "+property); > } > System.out.println("Hits: "); > > //list the hits > for (Iterator k = result.getHits().iterator(); k.hasNext(); ) { > SeqSimilaritySearchHit hit = > (SeqSimilaritySearchHit)k.next(); > System.out.print("\tmatch: "+hit.getSubjectID()); > System.out.println("\te score: "+hit.getEValue()); > } > > System.out.println("\n"); > } > > } > > -----Original Message----- > From: Sylvain Foisy [mailto:sylvain.foisy at diploide.net] > Sent: Thursday, June 04, 2009 9:57 AM > To: Scooter Willis; Andreas Prlic > Cc: biojava-dev at lists.open-bio.org > Subject: Re: [Biojava-dev] Biojava Interface to BLAST web/remote > services > > Hi Scooter, > > That is one way of doing it ;-) I was thinking of creating an object > that > the user would either: > > - Feed into the BJ Blast parser > - Do something else entirely. > > Best regards > > Sylvain > > On 04/06/09 09:28, "[NAME]" <[ADDRESS]> wrote: > > > Sylvain > > > > Given that BioJava already has a BLAST file parser that returns > results > > the goal should be to have a remote/web call return the same set of > > classes as if you had parsed the file locally. 
That is going to be my > > approach. Once we get a couple services working we can integrate into > a > > common factory/interface approach. > > > > Thanks > > > > Scooter > > > > > > -----Original Message----- > > From: biojava-dev-bounces at lists.open-bio.org > > [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain > > Foisy > > Sent: Thursday, June 04, 2009 9:07 AM > > To: Scooter Willis; Andreas Prlic > > Cc: biojava-dev at lists.open-bio.org > > Subject: Re: [Biojava-dev] Biojava Interface to BLAST web/remote > > services > > > > Hi Scooter, > > > > On 04/06/09 07:38, "[NAME]" <[ADDRESS]> wrote: > > > >> Looks like the rolled their own URL interface and did not do a WSDL. > > Not a big > >> deal but does appear they have some sort of submit get a "ticket" and > > then > >> check back with the "ticket" identifier for the results. The BioJava > > API would > >> hide the transport layer so you could use a custom URL approach or > web > >> services. > > > > That is basically the way it works. I am working on a > RemoteBlastWrapper > > class that would do exactly what you are writing. > > > > > >> Not sure how the other WSDL interfaces handle long running tasks but > I > > assume > >> the Web Services can handle a call that takes say 5 minutes to > respond > > without > >> timing out. Some process would need to distinguish between a long > > running > >> server task and a server that is no longer responding. > > > > We'll have to try ;-) > > > > Best regards > > > > Sylvain > > > > > > =================================================================== > > > > Sylvain Foisy, Ph. D. 
> > Consultant Bio-informatique / Bioinformatics > > Diploide.net - TI pour la vie / IT for Life > > > > Courriel: sylvain.foisy at diploide.net > > Web: http://www.diploide.net > > Tel: (514) 893-4363 > > =================================================================== > > > > > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev _________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. 
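The submit-then-poll flow described in this message (the service returns a job number, and the client checks it until the job completes, with an optional time-out) can be sketched roughly as follows. This is only a sketch under stated assumptions: it assumes Java 8 or later, and `JobPoller`, `Status` and `pollUntilReady` are hypothetical names used for illustration, not part of BioJava or of any BLAST service API. The status check is passed in as a function so that the transport (custom URL calls or web services) stays hidden.

```java
import java.util.function.Supplier;

// Hypothetical helper, not BioJava API: polls a remote job until it finishes or a time-out expires.
final class JobPoller {

    // Possible answers from a single status check (names are assumptions for this sketch).
    enum Status { WAITING, READY, FAILED }

    /**
     * Calls statusCheck repeatedly, sleeping intervalMillis between calls.
     * Returns READY or FAILED as soon as the check reports it, or WAITING
     * if timeoutMillis would be exceeded (or the thread is interrupted) first.
     */
    static Status pollUntilReady(Supplier<Status> statusCheck,
                                 long intervalMillis,
                                 long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            Status status = statusCheck.get();
            if (status != Status.WAITING) {
                return status; // job finished, successfully or not
            }
            if (System.currentTimeMillis() + intervalMillis > deadline) {
                return Status.WAITING; // give up: the next check would be past the deadline
            }
            try {
                Thread.sleep(intervalMillis); // wait instead of spinning the CPU
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return Status.WAITING;
            }
        }
    }
}
```

A caller would typically run this on a worker thread and fetch the results only when `READY` comes back; returning `WAITING` on time-out lets the user decide whether to keep waiting or abandon the job.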
From bugzilla-daemon at portal.open-bio.org Wed Jun 10 21:59:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 10 Jun 2009 17:59:30 -0400 Subject: [Biojava-dev] [Bug 2854] New: Selection of protein alphabet is hardcoded in ProteinTools class Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2854 Summary: Selection of protein alphabet is hardcoded in ProteinTools class Product: BioJava Version: live (CVS source) Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: seq AssignedTo: biojava-dev at biojava.org ReportedBy: mdharsee at ocbn.ca In our application we are calling createProtein() in class org.biojava.bio.seq.ProteinTools to generate SymbolList objects to encapsulate peptide sequences that are composed of the 20 common amino acid symbols, as well as the 'X' ambiguity symbol. However createProtein() forces the selection of the PROTEIN-TERM alphabet from AlphabetManager.xml, through the call to 'getTAlphabet()' as copied below: public static SymbolList createProtein(String theProtein) throws IllegalSymbolException { SymbolTokenization p = null; try { p = getTAlphabet().getTokenization("token"); } catch (BioException e) { throw new BioError("Something has gone badly wrong with Protein", e); } return new SimpleSymbolList(p, theProtein); } This selection should rather be made based on the symbol content of the input sequence(s), rather than being hardcoded. Only if the input data contains the symbol 'TER' (terminus) or some abiguity symbol that covers the PROTEIN-TERM alphabet, should the PROTEIN-TERM alphabet be selected. Otherwise the simpler PROTEIN alphabet should be selected. On a related note, the PROTEIN alphabet defined in AlphabetManager.xml consists of 22 residues and includes the less commonly found 'SEC' (selenocysteine, U) and 'PYR' (pyroglutamic acid, O). 
However, many applications only require the common 20-symbol alphabet that excludes the latter two residues. It would be useful to include a new alphabet in AlphabetManager.xml that defines the simpler 20-symbol set of common amino acids. Perhaps this point should be a feature request? Cheers, Moyez -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From markjschreiber at gmail.com Thu Jun 11 01:30:00 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 11 Jun 2009 09:30:00 +0800 Subject: [Biojava-dev] [Bug 2854] New: Selection of protein alphabet is hardcoded in ProteinTools class In-Reply-To: References: Message-ID: <93b45ca50906101830s56abf2o28fae8b901f60d56@mail.gmail.com> This actually raises an interesting point for the development of biojava3. Do we actually need separate protein alphabets? I can't actually remember the reason these are separate. Is there a good argument for this??? - Mark On Thu, Jun 11, 2009 at 5:59 AM, wrote: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2854 > > ? ? ? ? ? Summary: Selection of protein alphabet is hardcoded in > ? ? ? ? ? ? ? ? ? ?ProteinTools class > ? ? ? ? ? Product: BioJava > ? ? ? ? ? Version: live (CVS source) > ? ? ? ? ?Platform: All > ? ? ? ?OS/Version: All > ? ? ? ? ? ?Status: NEW > ? ? ? ? ?Severity: normal > ? ? ? ? ?Priority: P2 > ? ? ? ? Component: seq > ? ? ? ?AssignedTo: biojava-dev at biojava.org > ? ? ? ?ReportedBy: mdharsee at ocbn.ca > > > In our application we are calling createProtein() in class > org.biojava.bio.seq.ProteinTools to generate SymbolList objects to encapsulate > peptide sequences that are composed of the 20 common amino acid symbols, as > well as the 'X' ambiguity symbol. 
> > However createProtein() forces the selection of the PROTEIN-TERM alphabet from > AlphabetManager.xml, through the call to 'getTAlphabet()' as copied below: > > ?public static SymbolList createProtein(String theProtein) > ? ? ? ? ?throws IllegalSymbolException > ?{ > ? ?SymbolTokenization p = null; > ? ?try { > ? ? ?p = getTAlphabet().getTokenization("token"); > ? ?} catch (BioException e) { > ? ? ?throw new BioError("Something has gone badly wrong with Protein", e); > ? ?} > ? ?return new SimpleSymbolList(p, theProtein); > ?} > > This selection should rather be made based on the symbol content of the input > sequence(s), rather than being hardcoded. Only if the input data contains the > symbol 'TER' (terminus) or some abiguity symbol that covers the PROTEIN-TERM > alphabet, should the PROTEIN-TERM alphabet be selected. Otherwise the simpler > PROTEIN alphabet should be selected. > > On a related note, the PROTEIN alphabet defined in AlphabetManager.xml consists > of 22 residues and includes the less commonly found 'SEC' (selenocysteine, U) > and 'PYR' (pyroglutamic acid, O). However, many applications only require the > common 20-symbol alphabet that excludes the latter two residues. It would be > useful to include a new alphabet in AlphabetManager.xml that defines the > simpler 20-symbol set of common amino acids. Perhaps this point should be a > feature request? > > Cheers, > Moyez > > > -- > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. 
> _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From andreas at sdsc.edu Thu Jun 11 02:50:42 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 10 Jun 2009 19:50:42 -0700 Subject: [Biojava-dev] [Bug 2854] New: Selection of protein alphabet is hardcoded in ProteinTools class In-Reply-To: <93b45ca50906101830s56abf2o28fae8b901f60d56@mail.gmail.com> References: <93b45ca50906101830s56abf2o28fae8b901f60d56@mail.gmail.com> Message-ID: <59a41c430906101950q7a592d9dh8a71cbda2e47065c@mail.gmail.com> Hi Mark, The way I see the protein structure modules develop is that I will try to get rid of dependency on the alphabets and replace it with support for the Chemical component dictionary http://www.wwpdb.org/ccd.html . The dictionary contains a list standard and modified residues as well as small molecule ligands. If applicable it provides parent/child relationship between compounds. There are too many modified residues and sometimes the boundaries to ligands are also not straightforward to draw... Andreas On Wed, Jun 10, 2009 at 6:30 PM, Mark Schreiber wrote: > This actually raises an interesting point for the development of > biojava3. Do we actually need separate protein alphabets? I can't > actually remember the reason these are separate. Is there a good > argument for this??? > > - Mark > > On Thu, Jun 11, 2009 at 5:59 AM, wrote: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2854 >> >> ? ? ? ? ? Summary: Selection of protein alphabet is hardcoded in >> ? ? ? ? ? ? ? ? ? ?ProteinTools class >> ? ? ? ? ? Product: BioJava >> ? ? ? ? ? Version: live (CVS source) >> ? ? ? ? ?Platform: All >> ? ? ? ?OS/Version: All >> ? ? ? ? ? ?Status: NEW >> ? ? ? ? ?Severity: normal >> ? ? ? ? ?Priority: P2 >> ? ? ? ? Component: seq >> ? ? ? ?AssignedTo: biojava-dev at biojava.org >> ? ? ? 
?ReportedBy: mdharsee at ocbn.ca >> >> >> In our application we are calling createProtein() in class >> org.biojava.bio.seq.ProteinTools to generate SymbolList objects to encapsulate >> peptide sequences that are composed of the 20 common amino acid symbols, as >> well as the 'X' ambiguity symbol. >> >> However createProtein() forces the selection of the PROTEIN-TERM alphabet from >> AlphabetManager.xml, through the call to 'getTAlphabet()' as copied below: >> >> ?public static SymbolList createProtein(String theProtein) >> ? ? ? ? ?throws IllegalSymbolException >> ?{ >> ? ?SymbolTokenization p = null; >> ? ?try { >> ? ? ?p = getTAlphabet().getTokenization("token"); >> ? ?} catch (BioException e) { >> ? ? ?throw new BioError("Something has gone badly wrong with Protein", e); >> ? ?} >> ? ?return new SimpleSymbolList(p, theProtein); >> ?} >> >> This selection should rather be made based on the symbol content of the input >> sequence(s), rather than being hardcoded. Only if the input data contains the >> symbol 'TER' (terminus) or some abiguity symbol that covers the PROTEIN-TERM >> alphabet, should the PROTEIN-TERM alphabet be selected. Otherwise the simpler >> PROTEIN alphabet should be selected. >> >> On a related note, the PROTEIN alphabet defined in AlphabetManager.xml consists >> of 22 residues and includes the less commonly found 'SEC' (selenocysteine, U) >> and 'PYR' (pyroglutamic acid, O). However, many applications only require the >> common 20-symbol alphabet that excludes the latter two residues. It would be >> useful to include a new alphabet in AlphabetManager.xml that defines the >> simpler 20-symbol set of common amino acids. Perhaps this point should be a >> feature request? >> >> Cheers, >> Moyez >> >> >> -- >> Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email >> ------- You are receiving this mail because: ------- >> You are the assignee for the bug, or are watching the assignee. 
>> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From sylvain.foisy at diploide.net Thu Jun 11 13:52:01 2009 From: sylvain.foisy at diploide.net (Sylvain Foisy) Date: Thu, 11 Jun 2009 09:52:01 -0400 Subject: [Biojava-dev] First draft of a remote blast service class Message-ID: Hi to all, I've been working on this for the past week or so and after discussing this with Andreas, I am putting my code here for critical review. I'll put this stuff in biojava-live as soon as Andreas can fix my SVN access. First, an interface called RemotePairwiseAlignementSerivce defines the basic components of a remote service: sequence/database/progam/run options/output options. RemoteQBlastService implements this interface and runs remote Qblast requests and creates output in either text, XML or HTML. At present time, regular blastall programs work, no blastpgp/megablast support yet. I'll need some guidance to make it work on other type of web services like EBI. Best regards Sylvain =================================================================== Sylvain Foisy, Ph. D. Consultant Bio-informatique / Bioinformatics Diploide.net - TI pour la vie / IT for Life Courriel: sylvain.foisy at diploide.net Web: http://www.diploide.net Tel: (514) 893-4363 =================================================================== import java.io.InputStream; import org.biojava.bio.BioException; /** * This interface specifies minimal information needed to execute a pairwise alignment on a remote service. 
 *
 * Example of service: QBlast service at NCBI
 *                     Web Service at EBI
 *
 * @author Sylvain Foisy
 * @since 1.8
 *
 */
public interface RemotePairwiseAlignementService {

    /**
     * This field specifies that the output format of results
     * is text.
     *
     */
    public static final int TEXT = 0;

    /**
     * This field specifies that the output format of results
     * is XML.
     *
     */
    public static final int XML = 1;

    /**
     * This field specifies that the output format of results
     * is HTML.
     *
     */
    public static final int HTML = 2;

    /**
     * Setting the database to use for doing the pairwise alignment
     *
     * @param db: a String with a valid database ID for the service used.
     *
     */
    public void setDatabase(String db);

    /**
     * Setting the sequence to be aligned for this request
     *
     * @param seq: a String with a sequence to be aligned.
     *
     */
    public void setSequence(String seq);

    /**
     * Setting the program to use for this pairwise alignment
     *
     * @param prog: a String with a valid program name for the service used.
     *
     */
    public void setProgram(String prog);

    /**
     * Setting all other options to use for this pairwise alignment
     *
     * @param str: a String with the additional options for the service used.
     *
     */
    public void setAdvancedOptions(String str);

    /**
     * Doing the actual analysis on the instantiated service
     *
     * @throws BioException
     */
    public void executeSearch() throws BioException;

    /**
     * Getting the actual alignment results from this instantiated service
     *
     * @return : an InputStream with the actual alignment results
     * @throws BioException
     */
    public InputStream getAlignmentResults() throws BioException;
}

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

import org.biojava.bio.BioException;

/**
 * RemoteQBlastService - A simple way of submitting BLAST requests to the QBlast
 * service at NCBI.
 *
 *

 * NCBI provides a Blast server through a CGI-BIN interface. RemoteQBlastService simply
 * encapsulates access to it by giving users get/set methods for the
 * sequence, program and database, as well as advanced options.
 *

* *

 * As of version 1.0, only blastall programs are usable. Support for blastpgp and megablast is a high priority.
 *

* * @author Sylvain Foisy * @version 1.0 * @since 1.8 * * */ public class RemoteQBlastService implements RemotePairwiseAlignementService{ // public static final int TEXT = 0; // public static final int XML = 1; // public static final int HTML = 2; private static String baseurl = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"; private URL aUrl; private URLConnection uConn; private OutputStreamWriter fromQBlast; private BufferedReader rd; private String seq = null; private String prog = null; private String db = null; private String outputFormat = null; private String advanced = null; private String rid; private long step; private boolean done = false; private long start; public RemoteQBlastService() throws BioException { try { aUrl = new URL(baseurl); uConn = setQBlastProperties(aUrl.openConnection()); outputFormat = "Text"; } /* * Needed but should never be thrown since the URL is static and known to exist */ catch (MalformedURLException e) { throw new BioException("It looks like the URL for NCBI QBlast service is bad"); } /* * Intercept if the program can't connect to QBlast service */ catch (IOException e) { throw new BioException( "Impossible to connect to QBlast service at this time. Check your network connection"); } } /** * This method execute the Blast request via the Put command of the CGI-BIN * interface. It gets the estimated time of completion by capturing the * value of the RTOE variable and sets a loop that will check for completion * of analysis at intervals specified by RTOE. * *

     * It also captures the value of the RID variable, necessary for fetching
     * the actual results after completion.
     *

     *
     * @throws BioException
     *             if it is not possible to send the BLAST command
     */
    public void executeSearch() throws BioException {

        if (seq == null || db == null || prog == null) {
            throw new BioException(
                    "Impossible to execute QBlast request. One or more of seq|db|prog has not been set");
        }
        /*
         * sending the command to execute the Blast analysis
         */
        String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&"
                + db + "&" + "FORMAT_TYPE=HTML";

        // Append the advanced options once; "cmd += cmd + ..." would repeat the whole request
        if (advanced != null) {
            cmd += "&" + advanced;
        }

        try {

            uConn = setQBlastProperties(aUrl.openConnection());

            fromQBlast = new OutputStreamWriter(uConn.getOutputStream());

            fromQBlast.write(cmd);
            fromQBlast.flush();

            // Get the response
            rd = new BufferedReader(new InputStreamReader(uConn
                    .getInputStream()));

            String line = "";

            while ((line = rd.readLine()) != null) {
                if (line.contains("RID")) {
                    String[] arr = line.split("=");
                    rid = arr[1].trim();
                } else if (line.contains("RTOE")) {
                    String[] arr = line.split("=");
                    step = Long.parseLong(arr[1].trim()) * 1000;
                    start = System.currentTimeMillis() + step;
                }
            }
        } catch (IOException e) {
            throw new BioException(
                    "Can't submit sequence to BLAST server at this time.");
        }
        /*
         * Getting the info out of the NCBI system
         */
        while (!done) {
            long prez = System.currentTimeMillis();
            done = isReady(rid, prez);
        }
    }

    /**
     *

     * This method is used only by the executeSearch method to check for completion of
     * the request, using the NCBI-specified RTOE variable

     *
     * @param id : the RID of the request to check
     * @param present : the current time, in milliseconds
     * @return true if the results are ready to be fetched
     */
    private boolean isReady(String id, long present) {
        boolean ready = false;
        String check = "CMD=Get&RID=" + id;
        /*
         * If present time is less than the start of the search added to step
         * obtained from NCBI, just do nothing ;-)
         */
        if (present < start) {
            // too early to ask again
        }
        /*
         * If we are at least step seconds in the future from the actual call of
         * method executeSearch()
         */
        else {
            try {
                uConn = setQBlastProperties(aUrl.openConnection());

                fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
                fromQBlast.write(check);
                fromQBlast.flush();

                rd = new BufferedReader(new InputStreamReader(uConn
                        .getInputStream()));

                String line = "";

                while ((line = rd.readLine()) != null) {
                    if (line.contains("READY")) {
                        ready = true;
                    } else if (line.contains("WAITING")) {
                        /*
                         * Else, move start forward in time...
                         */
                        start = present + step;
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return ready;
    }

    /**
     *

     * This method extracts the actual Blast report. The default format is Text, but it can be changed beforehand with the method
     * setQBlastOutputFormat.

* * * @return * @throws BioException */ public InputStream getAlignmentResults() throws BioException { String srid = "CMD=Get&RID=" + rid; srid += "&FORMAT_TYPE=" + outputFormat; if(!this.done){ throw new BioException("Unable to get report at this time. Your Blast request has not been processed yet."); } try { uConn = setQBlastProperties(aUrl.openConnection()); fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); fromQBlast.write(srid); fromQBlast.flush(); return uConn.getInputStream(); } catch (IOException ioe) { throw new BioException( "It is not possible to fetch Blast report from NCBI at this time"); } } /** *

     * Set the sequence to be blasted using the String that corresponds to the
     * sequence.
     *

* *

* Take note that this method is mutually exclusive to setGIToBlast() for a * given Blast request. *

* * @param aStr * : a String with the sequence */ public void setSequence(String aStr) { this.seq = "QUERY=" + aStr; } /** * Simply return a string with the blasted sequence. * * @return seq : a string with the sequence */ public String getSeqToBlast() { return this.seq; } /** *

* Set the sequence to be blasted using the NCBI GI value. At this time, * there is no effort made to check the validity of this GI. *

* *

     * Take note that this method is mutually exclusive with setSequence() for a
     * given Blast request.
     *

     *
     * @param gi
     *            : a String with the NCBI GI of the sequence
     */
    public void setGIToBlast(String gi) {
        this.seq = "QUERY=" + gi;
    }

    /**
     *

* Simply return a string with the sequence blasted. *

* * @return GI : a String with the GI of the blasted sequence */ public String getGIToBlast() { return this.seq; } /** *

     * This method sets the program to be used to blast the given sequence/GI. At
     * this time, no attempt is made to check that the sequence type matches
     * the program.
     *

* * @param prog * : a String representing the program specified for this QBlast * request. * */ public void setProgram(String prog) { this.prog = "PROGRAM=" + prog; } /** *

* Simply returns the program used for the given Blast request. *

* * @return prog : a String with the program used for this QBlast request. */ public String getProgram() { return this.prog; } /** *

     * This method sets the database to be used to blast the given sequence/GI.
     * At this time, no attempt is made to check that the sequence type
     * matches the database.
     *

* * @param db: a String for the database specified for this QBlast request */ public void setDatabase(String db) { this.db = "DATABASE=" + db; } /** *

* Simply returns the database used for the given Blast request. *

* * @return db: a String with the database used for this QBlast request. */ public String getBlastDatabase() { return this.db; } /** *

     * This method lets the user specify which format to use for generating the output.

     *
     * @param type: an integer taken from the static constants inherited from RemotePairwiseAlignementService, either TEXT, XML or HTML
     */
    public void setQBlastOutputFormat(int type) {
        // Use the named constants rather than the bare values 0, 1 and 2
        switch (type) {
        case TEXT:
            this.outputFormat = "Text";
            break;
        case XML:
            this.outputFormat = "XML";
            break;
        case HTML:
            this.outputFormat = "HTML";
            break;
        }
    }

    /**
     *

* Simply returns the output format used for the given Blast report. *

* * @return outputFormat : a String with the format specified for the QBlast report. */ public String getQBlastOutputFormat() { return this.outputFormat; } /** *

This method is to be used if a request is to use non-default values at submission. According to QBlast info, * the accepted parameters for PUT requests are:

* *
    *
  • -G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / non-affine for megablast
  • *
  • -E: Cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / non-affine for megablast
  • *
  • -r: integer to reward for match. Default = 1
  • *
  • -q: negative integer for penalty to allow mismatch. Default = -3
  • *
  • -e: expectation value. Default = 10.0
  • *
  • -W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 (megablast)
  • *
  • -y: dropoff for blast extensions in bits, using default if not specified. Default = 20 for blastn, 7 for all others * (except megablast for which it is not applicable).
  • *
  • -X: X dropoff value for gapped alignment, in bits. Default = 30 for blastn/megablast, 15 for all others.
  • *
  • -Z: final X dropoff value for gapped alignment, in bits. Default = 50 for blastn, 25 for all others * (except megablast for which it is not applicable)
  • *
  • -P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. Does not apply to blastn or megablast.
  • *
  • -A: multiple hits window size. Default = 0 (for single hit algorithm)
  • *
  • -I: number of database sequences to save hits for. Default = 500
  • *
  • -Y: effective length of the search space. Default = 0 (0 represents using the whole space)
  • *
  • -z: a real specifying the effective length of the database to use. Default = 0 (0 represents the real size)
  • *
  • -c: an integer representing pseudocount constant for PSI-BLAST. Default = 7
  • *
  • -F: any filtering directive
  • *
* *

     * Be aware that this class does not perform any error checking on the use of these parameters.

     * @param aStr: a String with any number of optional parameters with an associated value.
     *
     */
    public void setAdvancedOptions(String aStr) {
        this.advanced = "OTHER_ADVANCED=" + aStr;
    }

    /**
     *
     * Simply return the string given as argument via setAdvancedOptions
     *
     * @return advanced: the string with the advanced options
     */
    public String getBlastAdvancedOptions() {
        return this.advanced;
    }

    /**
     *
     * Simply return the QBlast RID for this specific QBlast request
     *
     * @return rid: the string with the RID
     */
    public String getBlastRID() {
        return this.rid;
    }

    /**
     * A simple method to check the availability of the QBlast service
     *
     * @throws BioException
     */
    public void printRemoteBlastInfo() throws BioException {
        try {
            OutputStreamWriter out = new OutputStreamWriter(uConn
                    .getOutputStream());
            out.write("CMD=Info");
            out.flush();

            // Get the response
            BufferedReader rd = new BufferedReader(new InputStreamReader(uConn
                    .getInputStream()));

            String line = "";

            while ((line = rd.readLine()) != null) {
                System.out.println(line);
            }

            out.close();
            rd.close();
        } catch (IOException e) {
            throw new BioException(
                    "Impossible to get info from QBlast service at this time. Check your network connection");
        }
    }

    private URLConnection setQBlastProperties(URLConnection conn) {

        URLConnection tmp = conn;

        conn.setDoOutput(true);
        conn.setUseCaches(false);

        tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService");
        tmp.setRequestProperty("Connection", "Keep-Alive");
        tmp.setRequestProperty("Content-type",
                "application/x-www-form-urlencoded");
        // No fixed "Content-length" of 200 here: request bodies vary in size,
        // so the length is left for the connection to work out.

        return tmp;
    }
}

From james at carmanconsulting.com Thu Jun 11 14:24:44 2009
From: james at carmanconsulting.com (James Carman)
Date: Thu, 11 Jun 2009 10:24:44 -0400
Subject: [Biojava-dev] First draft of a remote blast service class
In-Reply-To:
References:
Message-ID:

Are we allowed to use JDK5? Why not use enums rather than int codes?
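As an illustration of the enum suggestion, the three int codes could be replaced by something like the sketch below. It assumes JDK 5 or later. `OutputFormat` and `toQueryParameter` are hypothetical names, not part of the posted code; only the `FORMAT_TYPE` values "Text", "XML" and "HTML" are taken from the draft class.

```java
// Hypothetical replacement for the TEXT/XML/HTML int constants (requires JDK 5+).
enum OutputFormat {
    TEXT("Text"),
    XML("XML"),
    HTML("HTML");

    // Value sent to the service, as used by the draft's setQBlastOutputFormat
    private final String formatType;

    OutputFormat(String formatType) {
        this.formatType = formatType;
    }

    /** Builds the request fragment for this format, e.g. "FORMAT_TYPE=XML". */
    String toQueryParameter() {
        return "FORMAT_TYPE=" + formatType;
    }
}
```

With an enum, a setter such as `setQBlastOutputFormat(OutputFormat format)` could no longer be called with an out-of-range value, and the `switch` on 0, 1 and 2 would become unnecessary.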
On Thu, Jun 11, 2009 at 9:52 AM, Sylvain Foisy wrote: > Hi to all, > > I've been working on this for the past week or so and after discussing this > with Andreas, I am putting my code here for critical review. I'll put this > stuff in biojava-live as soon as Andreas can fix my SVN access. > > First, an interface called RemotePairwiseAlignementSerivce defines the basic > components of a remote service: sequence/database/progam/run options/output > options. RemoteQBlastService implements this interface and runs remote > Qblast requests and creates output in either text, XML or HTML. At present > time, regular blastall programs work, no blastpgp/megablast support yet. > > I'll need some guidance to make it work on other type of web services like > EBI. > > Best regards > > Sylvain > > =================================================================== > > ?Sylvain Foisy, Ph. D. > ?Consultant Bio-informatique / Bioinformatics > ?Diploide.net - TI pour la vie / IT for Life > > ?Courriel: sylvain.foisy at diploide.net > ?Web: http://www.diploide.net > ?Tel: (514) 893-4363 > =================================================================== > > import java.io.InputStream; > > import org.biojava.bio.BioException; > /** > ?* This interface specifies minimal information needed to execute a pairwise > alignment on a remote service. > ?* > ?* Example of service: QBlast service at NCBI > ?* ? ? ? ? ? ? ? ? ? ? Web Service at EBI > ?* > ?* @author Sylvain Foisy > ?* @since 1.8 > ?* > ?*/ > public interface RemotePairwiseAlignementService { > > ? ?/** > ? ? * This field specifies that the output format of results > ? ? * is text. > ? ? * > ? ? */ > ? ?public static final int TEXT = 0; > > ? ?/** > ? ? * This field specifies that the output format of results > ? ? * is XML. > ? ? * > ? ? */ > ? ?public static final int XML = 1; > > ? ?/** > ? ? * This field specifies that the output format of results > ? ? * is HTML. > ? ? * > ? ? */ > ? ?public static final int HTML = 2; > > ? 
?/** > ? ? * Setting the database to use for doing the pairwise alignment > ? ? * > ? ? * @param db: a String with a valid database ID for the > service used. > ? ? * > ? ? */ > ? ?public void setDatabase(String db); > > ? ?/** > ? ? * Setting the sequence to be align for this for this request > ? ? * > ? ? * @param seq: a String with a sequence to be aligned. > ? ? * > ? ? */ > ? ?public void setSequence(String seq); > > ? ?/** > ? ? * Setting the program to use for this pairwise alignment > ? ? * > ? ? * @param prog: a String with a valid database ID for the > service used. > ? ? * > ? ? */ > ? ?public void setProgram(String prog); > > ? ?/** > ? ? * Setting all other options to use for this pairwise alignment > ? ? * > ? ? * @param db: a String with a valid database ID for the > service used. > ? ? * > ? ? */ > ? ?public void setAdvancedOptions(String str); > > ? ?/** > ? ? * Doing the actual analysis on the instantiated service > ? ? * > ? ? * @throws BioException > ? ? */ > ? ?public void executeSearch() throws BioException; > > ? ?/** > ? ? * Getting the actual alignment results from this instantiated service > ? ? * > ? ? * @return : an InputStream with the actual alignment > results > ? ? * @throws BioException > ? ? */ > ? ?public InputStream getAlignmentResults() throws BioException; > } > > import java.io.BufferedReader; > import java.io.IOException; > import java.io.InputStream; > import java.io.InputStreamReader; > import java.io.OutputStreamWriter; > import java.net.MalformedURLException; > import java.net.URL; > import java.net.URLConnection; > > import org.biojava.bio.BioException; > > /** > ?* RemoteQBlastService - A simple way of submitting BLAST request to the > QBlast > ?* service at NCBI. > ?* > ?*

>  * NCBI provides a Blast server through a CGI-BIN interface.
> RemoteQBlastService simply
>  * encapsulates an access to it by giving users access to get/set methods to
> fix
>  * sequence, program and database as well as advanced options.
>  *

> ?* > ?*

>  * As of version 1.0, only blastall programs are usable. blastpgp and
> megablast are high-priorities.
>  *

> ?* > ?* @author Sylvain Foisy > ?* @version 1.0 > ?* @since 1.8 > ?* > ?* > ?*/ > public class RemoteQBlastService implements RemotePairwiseAlignementService{ > > // ? ?public static final int TEXT = 0; > // ? ?public static final int XML = 1; > // ? ?public static final int HTML = 2; > > ? ?private static String baseurl = > "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"; > ? ?private URL aUrl; > ? ?private URLConnection uConn; > ? ?private OutputStreamWriter fromQBlast; > ? ?private BufferedReader rd; > > ? ?private String seq = null; > ? ?private String prog = null; > ? ?private String db = null; > ? ?private String outputFormat = null; > ? ?private String advanced = null; > > ? ?private String rid; > ? ?private long step; > ? ?private boolean done = false; > ? ?private long start; > > ? ?public RemoteQBlastService() throws BioException { > ? ? ? ?try { > ? ? ? ? ? ?aUrl = new URL(baseurl); > ? ? ? ? ? ?uConn = setQBlastProperties(aUrl.openConnection()); > > ? ? ? ? ? ?outputFormat = "Text"; > ? ? ? ?} > ? ? ? ?/* > ? ? ? ? * Needed but should never be thrown since the URL is static and > known to exist > ? ? ? ? */ > ? ? ? ?catch (MalformedURLException e) { > ? ? ? ? ? ?throw new BioException("It looks like the URL for NCBI QBlast > service is bad"); > ? ? ? ?} > ? ? ? ?/* > ? ? ? ? * Intercept if the program can't connect to QBlast service > ? ? ? ? */ > ? ? ? ?catch (IOException e) { > ? ? ? ? ? ?throw new BioException( > ? ? ? ? ? ? ? ? ? ?"Impossible to connect to QBlast service at this time. > Check your network connection"); > ? ? ? ?} > ? ?} > > ? ?/** > ? ? * This method execute the Blast request via the Put command of the > CGI-BIN > ? ? * interface. It gets the estimated time of completion by capturing the > ? ? * value of the RTOE variable and sets a loop that will check for > completion > ? ? * of analysis at intervals specified by RTOE. > ? ? * > ? ? *

> ? ? * It also capture the value for the RID variable, necessary for > fetching > ? ? * the actual results after completion. > ? ? *

> ? ? * > ? ? * @throws BioException > ? ? * ? ? ? ? ? ? if it is not possible to sent the BLAST command > ? ? */ > ? ?public void executeSearch() throws BioException { > > ? ? ? ?if (seq == null || db == null || prog == null) { > ? ? ? ? ? ?throw new BioException( > ? ? ? ? ? ? ? ? ? ?"Impossible to execute QBlast request. One or more of > seq|db|prog has not been set"); > ? ? ? ?} > ? ? ? ?/* > ? ? ? ? * sending the command to execute the Blast analysis > ? ? ? ? */ > ? ? ? ?String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&" > ? ? ? ? ? ? ? ?+ db + "&" + "FORMAT_TYPE=HTML"; > > ? ? ? ?if (advanced != null) { > ? ? ? ? ? ?cmd += cmd + "&" + advanced; > ? ? ? ?} > > ? ? ? ?try { > > ? ? ? ? ? ?uConn = setQBlastProperties(aUrl.openConnection()); > > ? ? ? ? ? ?fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); > > ? ? ? ? ? ?fromQBlast.write(cmd); > ? ? ? ? ? ?fromQBlast.flush(); > > ? ? ? ? ? ?// Get the response > ? ? ? ? ? ?rd = new BufferedReader(new InputStreamReader(uConn > ? ? ? ? ? ? ? ? ? ?.getInputStream())); > > ? ? ? ? ? ?String line = ""; > > ? ? ? ? ? ?while ((line = rd.readLine()) != null) { > ? ? ? ? ? ? ? ?if (line.contains("RID")) { > ? ? ? ? ? ? ? ? ? ?String[] arr = line.split("="); > ? ? ? ? ? ? ? ? ? ?rid = arr[1].trim(); > ? ? ? ? ? ? ? ?} else if (line.contains("RTOE")) { > ? ? ? ? ? ? ? ? ? ?String[] arr = line.split("="); > ? ? ? ? ? ? ? ? ? ?step = Long.parseLong(arr[1].trim()) * 1000; > ? ? ? ? ? ? ? ? ? ?start = System.currentTimeMillis() + step; > ? ? ? ? ? ? ? ?} > ? ? ? ? ? ?} > ? ? ? ?} catch (IOException e) { > ? ? ? ? ? ?throw new BioException( > ? ? ? ? ? ? ? ? ? ?"Can't submit sequence to BLAST server at this time."); > ? ? ? ?} > ? ? ? ?/* > ? ? ? ? * Getting the info out of the NCBI system > ? ? ? ? */ > ? ? ? ?while (!done) { > ? ? ? ? ? ?long prez = System.currentTimeMillis(); > ? ? ? ? ? ?done = isReady(rid, prez); > ? ? ? ?} > ? ?} > > ? ?/** > ? ? *

This method is used only for the executeBlastSearch method to > check for completion of > ? ? * request using the NCBI specified RTOE variable

> ? ? * > ? ? * @param id > ? ? * @param present > ? ? * @return > ? ? */ > ? ?private boolean isReady(String id, long present) { > > ? ? ? ?boolean ready = false; > ? ? ? ?String check = "CMD=Get&RID=" + id; > ? ? ? ?/* > ? ? ? ? * If present time is less than the start of the search added to > step > ? ? ? ? * obtained from NCBI, just do nothing ;-) > ? ? ? ? */ > ? ? ? ?if (present < start) { > ? ? ? ? ? ?; > ? ? ? ?} > ? ? ? ?/* > ? ? ? ? * If we are at least step seconds in the future from the actual > call of > ? ? ? ? * method executeBlastSearch() > ? ? ? ? */ > ? ? ? ?else { > ? ? ? ? ? ?try { > ? ? ? ? ? ? ? ?uConn = setQBlastProperties(aUrl.openConnection()); > > ? ? ? ? ? ? ? ?fromQBlast = new > OutputStreamWriter(uConn.getOutputStream()); > ? ? ? ? ? ? ? ?fromQBlast.write(check); > ? ? ? ? ? ? ? ?fromQBlast.flush(); > > ? ? ? ? ? ? ? ?rd = new BufferedReader(new InputStreamReader(uConn > ? ? ? ? ? ? ? ? ? ? ? ?.getInputStream())); > > ? ? ? ? ? ? ? ?String line = ""; > > ? ? ? ? ? ? ? ?while ((line = rd.readLine()) != null) { > ? ? ? ? ? ? ? ? ? ?if (line.contains("READY")) { > ? ? ? ? ? ? ? ? ? ? ? ?ready = true; > ? ? ? ? ? ? ? ? ? ?} else if (line.contains("WAITING")) { > ? ? ? ? ? ? ? ? ? ? ? ?/* > ? ? ? ? ? ? ? ? ? ? ? ? * Else, move start forward in time... > ? ? ? ? ? ? ? ? ? ? ? ? */ > ? ? ? ? ? ? ? ? ? ? ? ?start = present + step; > ? ? ? ? ? ? ? ? ? ?} > ? ? ? ? ? ? ? ?} > ? ? ? ? ? ?} catch (IOException e) { > ? ? ? ? ? ? ? ?e.printStackTrace(); > ? ? ? ? ? ?} > ? ? ? ?} > ? ? ? ?return ready; > ? ?} > > ? ?/** > ? ? *

This method extracts this actual Blast report. The default format > is Text but can be changed before with the method > ? ? * setQBlastOutputFormat.

> ? ? * > ? ? * > ? ? * @return > ? ? * @throws BioException > ? ? */ > ? ?public InputStream getAlignmentResults() throws BioException { > ? ? ? ?String srid = "CMD=Get&RID=" + rid; > ? ? ? ?srid += "&FORMAT_TYPE=" + outputFormat; > > ? ? ? ?if(!this.done){ > ? ? ? ? ? ?throw new BioException("Unable to get report at this time. Your > Blast request has not been processed yet."); > ? ? ? ?} > > ? ? ? ?try { > ? ? ? ? ? ?uConn = setQBlastProperties(aUrl.openConnection()); > > ? ? ? ? ? ?fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); > ? ? ? ? ? ?fromQBlast.write(srid); > ? ? ? ? ? ?fromQBlast.flush(); > > ? ? ? ? ? ?return uConn.getInputStream(); > > ? ? ? ?} catch (IOException ioe) { > ? ? ? ? ? ?throw new BioException( > ? ? ? ? ? ? ? ? ? ?"It is not possible to fetch Blast report from NCBI at > this time"); > ? ? ? ?} > ? ?} > > ? ?/** > ? ? *

>      * Set the sequence to be blasted using the String that corresponds to the
>      * sequence.
>      *
>      * Take note that this method is mutually exclusive to setGIToBlast() for a
>      * given Blast request.
>      *
>      * @param aStr
>      *            : a String with the sequence
>      */
>     public void setSequence(String aStr) {
>         this.seq = "QUERY=" + aStr;
>     }
>
>     /**
>      * Simply return a string with the blasted sequence.
>      *
>      * @return seq : a string with the sequence
>      */
>     public String getSeqToBlast() {
>         return this.seq;
>     }
>
>     /**
>      * Set the sequence to be blasted using the NCBI GI value. At this time,
>      * there is no effort made to check the validity of this GI.
>      *
>      * Take note that this method is mutually exclusive to setSeqToBlast() for a
>      * given Blast request.
>      *
>      * @param gi
>      *            : an integer value representing a NCBI GI
>      */
>     public void setGIToBlast(String gi) {
>         this.seq = "QUERY=" + gi;
>     }
>
>     /**
>      * Simply return a string with the sequence blasted.
>      *
>      * @return GI : a String with the GI of the blasted sequence
>      */
>     public String getGIToBlast() {
>         return this.seq;
>     }
>
>     /**
>      * This method sets the program to be used to blast the given sequence/GI. At
>      * this time, there is no attempt at checking the matching of sequence type
>      * to program.
>      *
>      * @param prog
>      *            : a String representing the program specified for this QBlast
>      *            request.
>      *
>      */
>     public void setProgram(String prog) {
>         this.prog = "PROGRAM=" + prog;
>     }
>
>     /**
>      * Simply returns the program used for the given Blast request.
>      *
>      * @return prog : a String with the program used for this QBlast request.
>      */
>     public String getProgram() {
>         return this.prog;
>     }
>
>     /**
>      * This method sets the database to be used to blast the given sequence/GI.
>      * At this time, there is no attempt at checking the matching of sequence
>      * type to database.
>      *
>      * @param db: a String for the database specified for this QBlast request
>      */
>     public void setDatabase(String db) {
>         this.db = "DATABASE=" + db;
>     }
>
>     /**
>      * Simply returns the database used for the given Blast request.
>      *
>      * @return db: a String with the database used for this QBlast request.
>      */
>     public String getBlastDatabase() {
>         return this.db;
>     }
>
>     /**
>      * This method lets the user specify which format to use for generating the
>      * output.
>      *
>      * @param type: an integer taken from the static constants of this class,
>      *            either TEXT, XML or HTML
>      */
>     public void setQBlastOutputFormat(int type) {
>
>         switch (type) {
>             case 0:
>                 this.outputFormat = "Text";
>                 break;
>             case 1:
>                 this.outputFormat = "XML";
>                 break;
>             case 2:
>                 this.outputFormat = "HTML";
>                 break;
>         }
>     }
>
>     /**
>      * Simply returns the output format used for the given Blast report.
>      *
>      * @return outputFormat : a String with the format specified for the QBlast report.
>      */
>     public String getQBlastOutputFormat() {
>         return this.outputFormat;
>     }
>
>     /**
>      *

>      * This method is to be used if a request is to use non-default values at
>      * submission. According to QBlast info, the accepted parameters for PUT
>      * requests are:
>      *
>      *   -G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / non-affine for megablast
>      *   -E: cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / non-affine for megablast
>      *   -r: integer to reward for match. Default = 1
>      *   -q: negative integer for penalty to allow mismatch. Default = -3
>      *   -e: expectation value. Default = 10.0
>      *   -W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 (megablast)
>      *   -y: dropoff for blast extensions in bits, using default if not specified.
>      *       Default = 20 for blastn, 7 for all others (except megablast, for which it is not applicable).
>      *   -X: X dropoff value for gapped alignment, in bits. Default = 30 for blastn/megablast, 15 for all others.
>      *   -Z: final X dropoff value for gapped alignment, in bits. Default = 50 for blastn, 25 for all others
>      *       (except megablast, for which it is not applicable)
>      *   -P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. Does not apply to blastn or megablast.
>      *   -A: multiple hits window size. Default = 0 (for single hit algorithm)
>      *   -I: number of database sequences to save hits for. Default = 500
>      *   -Y: effective length of the search space. Default = 0 (0 represents using the whole space)
>      *   -z: a real specifying the effective length of the database to use. Default = 0 (0 represents the real size)
>      *   -c: an integer representing pseudocount constant for PSI-BLAST. Default = 7
>      *   -F: any filtering directive
>      *
>      * Be aware that at no moment is there any error checking on the use of
>      * these parameters by this class.
>      *
>      * @param aStr: a String with any number of optional parameters with an associated value.
>      *
>      */
>     public void setAdvancedOptions(String aStr) {
>         this.advanced = "OTHER_ADVANCED=" + aStr;
>     }
>
>     /**
>      *
>      * Simply return the string given as argument via setBlastAdvancedOptions
>      *
>      * @return advanced: the string with the advanced options
>      */
>     public String getBlastAdvancedOptions() {
>         return this.advanced;
>     }
>
>     /**
>      *
>      * Simply return the QBlast RID for this specific QBlast request
>      *
>      * @return rid: the string with the RID
>      */
>     public String getBlastRID() {
>         return this.rid;
>     }
>
>     /**
>      * A simple method to check the availability of the QBlast service
>      *
>      * @throws BioException
>      */
>     public void printRemoteBlastInfo() throws BioException {
>         try {
>             OutputStreamWriter out = new OutputStreamWriter(uConn
>                     .getOutputStream());
>
>             out.write("CMD=Info");
>             out.flush();
>
>             // Get the response
>             BufferedReader rd = new BufferedReader(new InputStreamReader(uConn
>                     .getInputStream()));
>
>             String line = "";
>
>             while ((line = rd.readLine()) != null) {
>                 System.out.println(line);
>             }
>
>             out.close();
>             rd.close();
>         } catch (IOException e) {
>             throw new BioException(
>                     "Impossible to get info from QBlast service at this time. Check your network connection");
>         }
>     }
>
>     private URLConnection setQBlastProperties(URLConnection conn) {
>
>         URLConnection tmp = conn;
>
>         conn.setDoOutput(true);
>         conn.setUseCaches(false);
>
>         tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService");
>         tmp.setRequestProperty("Connection", "Keep-Alive");
>         tmp.setRequestProperty("Content-type",
>                 "application/x-www-form-urlencoded");
>         tmp.setRequestProperty("Content-length", "200");
>
>         return tmp;
>     }
> }

From sylvain.foisy at diploide.net Thu Jun 11 14:45:23 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Thu, 11 Jun 2009 10:45:23 -0400
Subject: [Biojava-dev] First draft of a remote blast service class
In-Reply-To: <1244729855.5546.52.camel@buzzybee>
Message-ID:

Hi Richard,

On 11/06/09 10:17, "[NAME]" <[ADDRESS]> wrote:

> Good stuff! My 2p's worth:

Thanks. It's my first Java code in the past year or so... Managing projects really kills programming habits :-(

> setSequence() should be overloaded to accept all forms of possible
> sequence input - whatever is decided on as the standard way of
> referencing sequence data in BJ3. The original plan for BJ3 was to allow
> String/CharSequence and List (see
> http://www.biojava.org/wiki/BioJava3:HowTo )

Good point. I'll work on this. The List is a bit tricky: one would need to create a timed sequence of requests so that the program would not flood the service. My own two cents: this should be done at the program level, not the class level. The class should only be "preoccupied" with a single request.

> setAdvancedOptions() should not accept a String, but rather a Properties
> or a Map, where the keys of the Map/Properties are
> restricted to a range of acceptable values determined (and published,
> maybe as an enum?) by each of the implementation classes (e.g.
> RemoteQBlastService). The implementation class then uses this to
> construct the call string.
> The reason for doing it this way is that (a)
> it allows the parameters to be verified by checking them against a known
> list of allowable key/values, and (b) it allows for non-URL based remote
> requests to be constructed from the values, e.g. SOAP calls.

Most definitely a good thing!

> I would also replace the static int HTML/TEXT/XML with an enum as
> numeric constants are sometimes a Bad Thing.
>
> The setProgram() method in my mind is specific to Blast, as opposed to
> being a generic Pairwise Alignment concept. Therefore it might be better
> to move this to a Blast-specific sub-interface or make it only appear in
> the implementation classes that refer to Blast.

I am too used to using BLAST only...

> Finally, the JavaDocs for the various set() methods are incorrect -
> they're all mostly the same in fact! :)

I am looking and not seeing the same thing... OK, I'll triple-check ;-)

> But overall it looks good.

Thanks. I'll be working on this in the next few days.

Best regards

Sylvain

===================================================================

 Sylvain Foisy, Ph. D.
 Consultant Bio-informatique / Bioinformatics
 Diploide.net - TI pour la vie / IT for Life

 Courriel: sylvain.foisy at diploide.net
 Web: http://www.diploide.net
 Tel: (514) 893-4363
===================================================================

From andreas.prlic at gmail.com Thu Jun 11 15:24:20 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Thu, 11 Jun 2009 08:24:20 -0700
Subject: [Biojava-dev] First draft of a remote blast service class
In-Reply-To:
References:
Message-ID: <59a41c430906110824l2f1167cfp236cf69cc9dee94c@mail.gmail.com>

I would pass the parameters as a bean rather than a string...

Andreas

On Thu, Jun 11, 2009 at 6:52 AM, Sylvain Foisy wrote:
> Hi to all,
>
> I've been working on this for the past week or so and after discussing this
> with Andreas, I am putting my code here for critical review.
> I'll put this stuff in biojava-live as soon as Andreas can fix my SVN access.
>
> First, an interface called RemotePairwiseAlignementService defines the basic
> components of a remote service: sequence/database/program/run options/output
> options. RemoteQBlastService implements this interface, runs remote
> QBlast requests and creates output in either text, XML or HTML. At the present
> time, regular blastall programs work; there is no blastpgp/megablast support yet.
>
> I'll need some guidance to make it work on other types of web services, like
> EBI.
>
> Best regards
>
> Sylvain
>
> ===================================================================
>
>  Sylvain Foisy, Ph. D.
>  Consultant Bio-informatique / Bioinformatics
>  Diploide.net - TI pour la vie / IT for Life
>
>  Courriel: sylvain.foisy at diploide.net
>  Web: http://www.diploide.net
>  Tel: (514) 893-4363
> ===================================================================
>
> import java.io.InputStream;
>
> import org.biojava.bio.BioException;
> /**
>  * This interface specifies minimal information needed to execute a pairwise
>  * alignment on a remote service.
>  *
>  * Example of service: QBlast service at NCBI
>  *                     Web Service at EBI
>  *
>  * @author Sylvain Foisy
>  * @since 1.8
>  *
>  */
> public interface RemotePairwiseAlignementService {
>
>     /**
>      * This field specifies that the output format of results
>      * is text.
>      *
>      */
>     public static final int TEXT = 0;
>
>     /**
>      * This field specifies that the output format of results
>      * is XML.
>      *
>      */
>     public static final int XML = 1;
>
>     /**
>      * This field specifies that the output format of results
>      * is HTML.
>      *
>      */
>     public static final int HTML = 2;
>
>     /**
>      * Setting the database to use for doing the pairwise alignment
>      *
>      * @param db: a String with a valid database ID for the service used.
>      *
>      */
>     public void setDatabase(String db);
>
>     /**
>      * Setting the sequence to be aligned for this request
>      *
>      * @param seq: a String with a sequence to be aligned.
>      *
>      */
>     public void setSequence(String seq);
>
>     /**
>      * Setting the program to use for this pairwise alignment
>      *
>      * @param prog: a String with a valid database ID for the service used.
>      *
>      */
>     public void setProgram(String prog);
>
>     /**
>      * Setting all other options to use for this pairwise alignment
>      *
>      * @param db: a String with a valid database ID for the service used.
>      *
>      */
>     public void setAdvancedOptions(String str);
>
>     /**
>      * Doing the actual analysis on the instantiated service
>      *
>      * @throws BioException
>      */
>     public void executeSearch() throws BioException;
>
>     /**
>      * Getting the actual alignment results from this instantiated service
>      *
>      * @return : an InputStream with the actual alignment results
>      * @throws BioException
>      */
>     public InputStream getAlignmentResults() throws BioException;
> }
>
> import java.io.BufferedReader;
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.InputStreamReader;
> import java.io.OutputStreamWriter;
> import java.net.MalformedURLException;
> import java.net.URL;
> import java.net.URLConnection;
>
> import org.biojava.bio.BioException;
>
> /**
>  * RemoteQBlastService - A simple way of submitting BLAST requests to the QBlast
>  * service at NCBI.
>  *

>  * NCBI provides a Blast server through a CGI-BIN interface. RemoteQBlastService simply
>  * encapsulates an access to it by giving users access to get/set methods to fix
>  * sequence, program and database as well as advanced options.
>  *
>  * As of version 1.0, only blastall programs are usable. blastpgp and
>  * megablast are high priorities.
>  *
>  * @author Sylvain Foisy
>  * @version 1.0
>  * @since 1.8
>  *
>  */
> public class RemoteQBlastService implements RemotePairwiseAlignementService {
>
>     // public static final int TEXT = 0;
>     // public static final int XML = 1;
>     // public static final int HTML = 2;
>
>     private static String baseurl = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi";
>     private URL aUrl;
>     private URLConnection uConn;
>     private OutputStreamWriter fromQBlast;
>     private BufferedReader rd;
>
>     private String seq = null;
>     private String prog = null;
>     private String db = null;
>     private String outputFormat = null;
>     private String advanced = null;
>
>     private String rid;
>     private long step;
>     private boolean done = false;
>     private long start;
>
>     public RemoteQBlastService() throws BioException {
>         try {
>             aUrl = new URL(baseurl);
>             uConn = setQBlastProperties(aUrl.openConnection());
>
>             outputFormat = "Text";
>         }
>         /*
>          * Needed but should never be thrown since the URL is static and known to exist
>          */
>         catch (MalformedURLException e) {
>             throw new BioException("It looks like the URL for NCBI QBlast service is bad");
>         }
>         /*
>          * Intercept if the program can't connect to QBlast service
>          */
>         catch (IOException e) {
>             throw new BioException(
>                     "Impossible to connect to QBlast service at this time. Check your network connection");
>         }
>     }
>
>     /**
>      * This method executes the Blast request via the Put command of the CGI-BIN
>      * interface. It gets the estimated time of completion by capturing the
>      * value of the RTOE variable and sets a loop that will check for completion
>      * of analysis at intervals specified by RTOE.
>      *
>      * It also captures the value for the RID variable, necessary for fetching
>      * the actual results after completion.
>      *
>      * @throws BioException
>      *             if it is not possible to send the BLAST command
>      */
>     public void executeSearch() throws BioException {
>
>         if (seq == null || db == null || prog == null) {
>             throw new BioException(
>                     "Impossible to execute QBlast request. One or more of seq|db|prog has not been set");
>         }
>         /*
>          * sending the command to execute the Blast analysis
>          */
>         String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&"
>                 + db + "&" + "FORMAT_TYPE=HTML";
>
>         if (advanced != null) {
>             cmd += cmd + "&" + advanced;
>         }
>
>         try {
>
>             uConn = setQBlastProperties(aUrl.openConnection());
>
>             fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>
>             fromQBlast.write(cmd);
>             fromQBlast.flush();
>
>             // Get the response
>             rd = new BufferedReader(new InputStreamReader(uConn
>                     .getInputStream()));
>
>             String line = "";
>
>             while ((line = rd.readLine()) != null) {
>                 if (line.contains("RID")) {
>                     String[] arr = line.split("=");
>                     rid = arr[1].trim();
>                 } else if (line.contains("RTOE")) {
>                     String[] arr = line.split("=");
>                     step = Long.parseLong(arr[1].trim()) * 1000;
>                     start = System.currentTimeMillis() + step;
>                 }
>             }
>         } catch (IOException e) {
>             throw new BioException(
>                     "Can't submit sequence to BLAST server at this time.");
>         }
>         /*
>          * Getting the info out of the NCBI system
>          */
>         while (!done) {
>             long prez = System.currentTimeMillis();
>             done = isReady(rid, prez);
>         }
>     }
>
>     /**
>      * This method is used only by the executeBlastSearch method to check for
>      * completion of a request using the NCBI-specified RTOE variable.
>      *
>      * @param id
>      * @param present
>      * @return
>      */
>     private boolean isReady(String id, long present) {
>
>         boolean ready = false;
>         String check = "CMD=Get&RID=" + id;
>         /*
>          * If present time is less than the start of the search added to step
>          * obtained from NCBI, just do nothing ;-)
>          */
>         if (present < start) {
>             ;
>         }
>         /*
>          * If we are at least step seconds in the future from the actual call of
>          * method executeBlastSearch()
>          */
>         else {
>             try {
>                 uConn = setQBlastProperties(aUrl.openConnection());
>
>                 fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>                 fromQBlast.write(check);
>                 fromQBlast.flush();
>
>                 rd = new BufferedReader(new InputStreamReader(uConn
>                         .getInputStream()));
>
>                 String line = "";
>
>                 while ((line = rd.readLine()) != null) {
>                     if (line.contains("READY")) {
>                         ready = true;
>                     } else if (line.contains("WAITING")) {
>                         /*
>                          * Else, move start forward in time...
>                          */
>                         start = present + step;
>                     }
>                 }
>             } catch (IOException e) {
>                 e.printStackTrace();
>             }
>         }
>         return ready;
>     }
>
>     /**
>      * This method extracts the actual Blast report. The default format is Text
>      * but can be changed beforehand with the method setQBlastOutputFormat.
>      *
>      * @return
>      * @throws BioException
>      */
>     public InputStream getAlignmentResults() throws BioException {
>         String srid = "CMD=Get&RID=" + rid;
>         srid += "&FORMAT_TYPE=" + outputFormat;
>
>         if(!this.done){
>             throw new BioException("Unable to get report at this time. Your Blast request has not been processed yet.");
>         }
>
>         try {
>             uConn = setQBlastProperties(aUrl.openConnection());
>
>             fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>             fromQBlast.write(srid);
>             fromQBlast.flush();
>
>             return uConn.getInputStream();
>
>         } catch (IOException ioe) {
>             throw new BioException(
>                     "It is not possible to fetch Blast report from NCBI at this time");
>         }
>     }
>
>     /**
>      *

>      * Set the sequence to be blasted using the String that corresponds to the
>      * sequence.
>      *
>      * Take note that this method is mutually exclusive to setGIToBlast() for a
>      * given Blast request.
>      *
>      * @param aStr
>      *            : a String with the sequence
>      */
>     public void setSequence(String aStr) {
>         this.seq = "QUERY=" + aStr;
>     }
>
>     /**
>      * Simply return a string with the blasted sequence.
>      *
>      * @return seq : a string with the sequence
>      */
>     public String getSeqToBlast() {
>         return this.seq;
>     }
>
>     /**
>      * Set the sequence to be blasted using the NCBI GI value. At this time,
>      * there is no effort made to check the validity of this GI.
>      *
>      * Take note that this method is mutually exclusive to setSeqToBlast() for a
>      * given Blast request.
>      *
>      * @param gi
>      *            : an integer value representing a NCBI GI
>      */
>     public void setGIToBlast(String gi) {
>         this.seq = "QUERY=" + gi;
>     }
>
>     /**
>      * Simply return a string with the sequence blasted.
>      *
>      * @return GI : a String with the GI of the blasted sequence
>      */
>     public String getGIToBlast() {
>         return this.seq;
>     }
>
>     /**
>      * This method sets the program to be used to blast the given sequence/GI. At
>      * this time, there is no attempt at checking the matching of sequence type
>      * to program.
>      *
>      * @param prog
>      *            : a String representing the program specified for this QBlast
>      *            request.
>      *
>      */
>     public void setProgram(String prog) {
>         this.prog = "PROGRAM=" + prog;
>     }
>
>     /**
>      * Simply returns the program used for the given Blast request.
>      *
>      * @return prog : a String with the program used for this QBlast request.
>      */
>     public String getProgram() {
>         return this.prog;
>     }
>
>     /**
>      * This method sets the database to be used to blast the given sequence/GI.
>      * At this time, there is no attempt at checking the matching of sequence
>      * type to database.
>      *
>      * @param db: a String for the database specified for this QBlast request
>      */
>     public void setDatabase(String db) {
>         this.db = "DATABASE=" + db;
>     }
>
>     /**
>      * Simply returns the database used for the given Blast request.
>      *
>      * @return db: a String with the database used for this QBlast request.
>      */
>     public String getBlastDatabase() {
>         return this.db;
>     }
>
>     /**
>      * This method lets the user specify which format to use for generating the
>      * output.
>      *
>      * @param type: an integer taken from the static constants of this class,
>      *            either TEXT, XML or HTML
>      */
>     public void setQBlastOutputFormat(int type) {
>
>         switch (type) {
>             case 0:
>                 this.outputFormat = "Text";
>                 break;
>             case 1:
>                 this.outputFormat = "XML";
>                 break;
>             case 2:
>                 this.outputFormat = "HTML";
>                 break;
>         }
>     }
>
>     /**
>      * Simply returns the output format used for the given Blast report.
>      *
>      * @return outputFormat : a String with the format specified for the QBlast report.
>      */
>     public String getQBlastOutputFormat() {
>         return this.outputFormat;
>     }
>
>     /**
>      *

<p>This method is to be used if a request is to use non-default values at submission. According to QBlast info,
>      * the accepted parameters for PUT requests are:</p>
>      *
>      * <ul>
>      * <li>-G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / non-affine for megablast</li>
>      * <li>-E: cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / non-affine for megablast</li>
>      * <li>-r: integer to reward for match. Default = 1</li>
>      * <li>-q: negative integer for penalty to allow mismatch. Default = -3</li>
>      * <li>-e: expectation value. Default = 10.0</li>
>      * <li>-W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 (megablast)</li>
>      * <li>-y: dropoff for blast extensions in bits, using default if not specified. Default = 20 for blastn, 7 for all others
>      * (except megablast for which it is not applicable).</li>
>      * <li>-X: X dropoff value for gapped alignment, in bits. Default = 30 for blastn/megablast, 15 for all others.</li>
>      * <li>-Z: final X dropoff value for gapped alignment, in bits. Default = 50 for blastn, 25 for all others
>      * (except megablast for which it is not applicable)</li>
>      * <li>-P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. Does not apply to blastn or megablast.</li>
>      * <li>-A: multiple hits window size. Default = 0 (for single hit algorithm)</li>
>      * <li>-I: number of database sequences to save hits for. Default = 500</li>
>      * <li>-Y: effective length of the search space. Default = 0 (0 represents using the whole space)</li>
>      * <li>-z: a real specifying the effective length of the database to use. Default = 0 (0 represents the real size)</li>
>      * <li>-c: an integer representing pseudocount constant for PSI-BLAST. Default = 7</li>
>      * <li>-F: any filtering directive</li>
>      * </ul>
>      *
>      * <p>You have to be aware that at no moment is there any error checking on the use of these parameters by this class.</p>

> ? ? * @param aStr: a String with any number of optional parameters with an > associated value. > ? ? * > ? ? */ > ? ?public void setAdvancedOptions(String aStr) { > ? ? ? ?this.advanced = "OTHER_ADVANCED=" + aStr; > ? ?} > > ? ?/** > ? ? * > ? ? * Simply return the string given as argument via > setBlastAdvancedOptions > ? ? * > ? ? * @return advanced: the string with the advanced options > ? ? */ > ? ?public String getBlastAdvancedOptions() { > ? ? ? ?return this.advanced; > ? ?} > > ? ?/** > ? ? * > ? ? * Simply return the QBlast RID for this specific QBlast request > ? ? * > ? ? * @return rid: the string with the RID > ? ? */ > ? ?public String getBlastRID() { > ? ? ? ?return this.rid; > ? ?} > > ? ?/** > ? ? * A simple method to check the availability of the QBlast service > ? ? * > ? ? * @throws BioException > ? ? */ > ? ?public void printRemoteBlastInfo() throws BioException { > ? ? ? ?try { > ? ? ? ? ? ?OutputStreamWriter out = new OutputStreamWriter(uConn > ? ? ? ? ? ? ? ? ? ?.getOutputStream()); > > ? ? ? ? ? ?out.write("CMD=Info"); > ? ? ? ? ? ?out.flush(); > > ? ? ? ? ? ?// Get the response > ? ? ? ? ? ?BufferedReader rd = new BufferedReader(new > InputStreamReader(uConn > ? ? ? ? ? ? ? ? ? ?.getInputStream())); > > ? ? ? ? ? ?String line = ""; > > ? ? ? ? ? ?while ((line = rd.readLine()) != null) { > ? ? ? ? ? ? ? ?System.out.println(line); > ? ? ? ? ? ?} > > ? ? ? ? ? ?out.close(); > ? ? ? ? ? ?rd.close(); > ? ? ? ?} catch (IOException e) { > ? ? ? ? ? ?throw new BioException( > ? ? ? ? ? ? ? ? ? ?"Impossible to get info from QBlast service at this > time. Check your network connection"); > ? ? ? ?} > ? ?} > > ? ?private URLConnection setQBlastProperties(URLConnection conn) { > > ? ? ? ?URLConnection tmp = conn; > > ? ? ? ?conn.setDoOutput(true); > ? ? ? ?conn.setUseCaches(false); > > ? ? ? ?tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService"); > ? ? ? ?tmp.setRequestProperty("Connection", "Keep-Alive"); > ? ? ? 
tmp.setRequestProperty("Content-type",
>                 "application/x-www-form-urlencoded");
>         tmp.setRequestProperty("Content-length", "200");
>
>         return tmp;
>     }
> }
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From holland at eaglegenomics.com Thu Jun 11 15:30:11 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 11 Jun 2009 16:30:11 +0100
Subject: [Biojava-dev] First draft of a remote blast service class
In-Reply-To: <59a41c430906110824l2f1167cfp236cf69cc9dee94c@mail.gmail.com>
References: <59a41c430906110824l2f1167cfp236cf69cc9dee94c@mail.gmail.com>
Message-ID: <1244734211.5546.62.camel@buzzybee>

Excellent idea. Even better than a Map or Properties!

One parameters bean type per implementation type complete with all its own validation, extending a placeholder interface that can be used in the generic interface declaration for RemotePairwiseAlignmentService. That would be sweet.

On Thu, 2009-06-11 at 08:24 -0700, Andreas Prlic wrote:
> I would pass the parameters as a bean rather than a string...
>
> Andreas
>
> On Thu, Jun 11, 2009 at 6:52 AM, Sylvain Foisy wrote:
> > Hi to all,
> >
> > I've been working on this for the past week or so and after discussing this
> > with Andreas, I am putting my code here for critical review. I'll put this
> > stuff in biojava-live as soon as Andreas can fix my SVN access.
> >
> > First, an interface called RemotePairwiseAlignementService defines the basic
> > components of a remote service: sequence/database/program/run options/output
> > options. RemoteQBlastService implements this interface and runs remote
> > QBlast requests and creates output in either text, XML or HTML. At present,
> > regular blastall programs work, no blastpgp/megablast support yet.
> >
> > I'll need some guidance to make it work on other types of web services like
> > EBI.
> > > > Best regards > > > > Sylvain > > > > =================================================================== > > > > Sylvain Foisy, Ph. D. > > Consultant Bio-informatique / Bioinformatics > > Diploide.net - TI pour la vie / IT for Life > > > > Courriel: sylvain.foisy at diploide.net > > Web: http://www.diploide.net > > Tel: (514) 893-4363 > > =================================================================== > > > > import java.io.InputStream; > > > > import org.biojava.bio.BioException; > > /** > > * This interface specifies minimal information needed to execute a pairwise > > alignment on a remote service. > > * > > * Example of service: QBlast service at NCBI > > * Web Service at EBI > > * > > * @author Sylvain Foisy > > * @since 1.8 > > * > > */ > > public interface RemotePairwiseAlignementService { > > > > /** > > * This field specifies that the output format of results > > * is text. > > * > > */ > > public static final int TEXT = 0; > > > > /** > > * This field specifies that the output format of results > > * is XML. > > * > > */ > > public static final int XML = 1; > > > > /** > > * This field specifies that the output format of results > > * is HTML. > > * > > */ > > public static final int HTML = 2; > > > > /** > > * Setting the database to use for doing the pairwise alignment > > * > > * @param db: a String with a valid database ID for the > > service used. > > * > > */ > > public void setDatabase(String db); > > > > /** > > * Setting the sequence to be align for this for this request > > * > > * @param seq: a String with a sequence to be aligned. > > * > > */ > > public void setSequence(String seq); > > > > /** > > * Setting the program to use for this pairwise alignment > > * > > * @param prog: a String with a valid database ID for the > > service used. 
> > * > > */ > > public void setProgram(String prog); > > > > /** > > * Setting all other options to use for this pairwise alignment > > * > > * @param db: a String with a valid database ID for the > > service used. > > * > > */ > > public void setAdvancedOptions(String str); > > > > /** > > * Doing the actual analysis on the instantiated service > > * > > * @throws BioException > > */ > > public void executeSearch() throws BioException; > > > > /** > > * Getting the actual alignment results from this instantiated service > > * > > * @return : an InputStream with the actual alignment > > results > > * @throws BioException > > */ > > public InputStream getAlignmentResults() throws BioException; > > } > > > > import java.io.BufferedReader; > > import java.io.IOException; > > import java.io.InputStream; > > import java.io.InputStreamReader; > > import java.io.OutputStreamWriter; > > import java.net.MalformedURLException; > > import java.net.URL; > > import java.net.URLConnection; > > > > import org.biojava.bio.BioException; > > > > /** > > * RemoteQBlastService - A simple way of submitting BLAST request to the > > QBlast > > * service at NCBI. > > * > > *

> > * NCBI provides a Blast server through a CGI-BIN interface. > > RemoteQBlastService simply > > * encapsulates an access to it by giving users access to get/set methods to > > fix > > * sequence, program and database as well as advanced options. > > *

> > * > > *

> > * As of version 1.0, only blastall programs are usable. blastpgp and > > megablast are high-priorities. > > *

> > * > > * @author Sylvain Foisy > > * @version 1.0 > > * @since 1.8 > > * > > * > > */ > > public class RemoteQBlastService implements RemotePairwiseAlignementService{ > > > > // public static final int TEXT = 0; > > // public static final int XML = 1; > > // public static final int HTML = 2; > > > > private static String baseurl = > > "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"; > > private URL aUrl; > > private URLConnection uConn; > > private OutputStreamWriter fromQBlast; > > private BufferedReader rd; > > > > private String seq = null; > > private String prog = null; > > private String db = null; > > private String outputFormat = null; > > private String advanced = null; > > > > private String rid; > > private long step; > > private boolean done = false; > > private long start; > > > > public RemoteQBlastService() throws BioException { > > try { > > aUrl = new URL(baseurl); > > uConn = setQBlastProperties(aUrl.openConnection()); > > > > outputFormat = "Text"; > > } > > /* > > * Needed but should never be thrown since the URL is static and > > known to exist > > */ > > catch (MalformedURLException e) { > > throw new BioException("It looks like the URL for NCBI QBlast > > service is bad"); > > } > > /* > > * Intercept if the program can't connect to QBlast service > > */ > > catch (IOException e) { > > throw new BioException( > > "Impossible to connect to QBlast service at this time. > > Check your network connection"); > > } > > } > > > > /** > > * This method execute the Blast request via the Put command of the > > CGI-BIN > > * interface. It gets the estimated time of completion by capturing the > > * value of the RTOE variable and sets a loop that will check for > > completion > > * of analysis at intervals specified by RTOE. > > * > > *

> > * It also capture the value for the RID variable, necessary for > > fetching > > * the actual results after completion. > > *

> >      *
> >      * @throws BioException
> >      *             if it is not possible to send the BLAST command
> >      */
> >     public void executeSearch() throws BioException {
> >
> >         if (seq == null || db == null || prog == null) {
> >             throw new BioException(
> >                     "Impossible to execute QBlast request. One or more of seq|db|prog has not been set");
> >         }
> >         /*
> >          * sending the command to execute the Blast analysis
> >          */
> >         String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&"
> >                 + db + "&" + "FORMAT_TYPE=HTML";
> >
> >         if (advanced != null) {
> >             // append the advanced options once; "cmd += cmd + ..." would duplicate the whole command
> >             cmd += "&" + advanced;
> >         }
> >
> >         try {
> >
> >             uConn = setQBlastProperties(aUrl.openConnection());
> >
> >             fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
> >
> >             fromQBlast.write(cmd);
> >             fromQBlast.flush();
> >
> >             // Get the response
> >             rd = new BufferedReader(new InputStreamReader(uConn
> >                     .getInputStream()));
> >
> >             String line = "";
> >
> >             while ((line = rd.readLine()) != null) {
> >                 if (line.contains("RID")) {
> >                     String[] arr = line.split("=");
> >                     rid = arr[1].trim();
> >                 } else if (line.contains("RTOE")) {
> >                     String[] arr = line.split("=");
> >                     step = Long.parseLong(arr[1].trim()) * 1000;
> >                     start = System.currentTimeMillis() + step;
> >                 }
> >             }
> >         } catch (IOException e) {
> >             throw new BioException(
> >                     "Can't submit sequence to BLAST server at this time.");
> >         }
> >         /*
> >          * Getting the info out of the NCBI system
> >          */
> >         while (!done) {
> >             long prez = System.currentTimeMillis();
> >             done = isReady(rid, prez);
> >         }
> >     }
> >
> >     /**
> >      *

This method is used only for the executeBlastSearch method to > > check for completion of > > * request using the NCBI specified RTOE variable

> > * > > * @param id > > * @param present > > * @return > > */ > > private boolean isReady(String id, long present) { > > > > boolean ready = false; > > String check = "CMD=Get&RID=" + id; > > /* > > * If present time is less than the start of the search added to > > step > > * obtained from NCBI, just do nothing ;-) > > */ > > if (present < start) { > > ; > > } > > /* > > * If we are at least step seconds in the future from the actual > > call of > > * method executeBlastSearch() > > */ > > else { > > try { > > uConn = setQBlastProperties(aUrl.openConnection()); > > > > fromQBlast = new > > OutputStreamWriter(uConn.getOutputStream()); > > fromQBlast.write(check); > > fromQBlast.flush(); > > > > rd = new BufferedReader(new InputStreamReader(uConn > > .getInputStream())); > > > > String line = ""; > > > > while ((line = rd.readLine()) != null) { > > if (line.contains("READY")) { > > ready = true; > > } else if (line.contains("WAITING")) { > > /* > > * Else, move start forward in time... > > */ > > start = present + step; > > } > > } > > } catch (IOException e) { > > e.printStackTrace(); > > } > > } > > return ready; > > } > > > > /** > > *

This method extracts this actual Blast report. The default format > > is Text but can be changed before with the method > > * setQBlastOutputFormat.

> > * > > * > > * @return > > * @throws BioException > > */ > > public InputStream getAlignmentResults() throws BioException { > > String srid = "CMD=Get&RID=" + rid; > > srid += "&FORMAT_TYPE=" + outputFormat; > > > > if(!this.done){ > > throw new BioException("Unable to get report at this time. Your > > Blast request has not been processed yet."); > > } > > > > try { > > uConn = setQBlastProperties(aUrl.openConnection()); > > > > fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); > > fromQBlast.write(srid); > > fromQBlast.flush(); > > > > return uConn.getInputStream(); > > > > } catch (IOException ioe) { > > throw new BioException( > > "It is not possible to fetch Blast report from NCBI at > > this time"); > > } > > } > > > > /** > > *

> > * Set the sequence to be blasted using the String that correspond to > > the > > * sequence. > > *

> > * > > *

> > * Take note that this method is mutually exclusive to setGIToBlast() > > for a > > * given Blast request. > > *

> > * > > * @param aStr > > * : a String with the sequence > > */ > > public void setSequence(String aStr) { > > this.seq = "QUERY=" + aStr; > > } > > > > /** > > * Simply return a string with the blasted sequence. > > * > > * @return seq : a string with the sequence > > */ > > public String getSeqToBlast() { > > return this.seq; > > } > > > > /** > > *

> > * Set the sequence to be blasted using the NCBI GI value. At this time, > > * there is no effort made to check the validity of this GI. > > *

> > * > > *

> > * Take note that this method is mutually exclusive to setSeqToBlast() > > for a > > * given Blast request. > > *

> > * > > * @param gi > > * : an integer value representing a NCBI GI > > */ > > public void setGIToBlast(String gi) { > > this.seq = "QUERY=" + gi; > > } > > > > /** > > *

> > * Simply return a string with the sequence blasted. > > *

> > * > > * @return GI : a String with the GI of the blasted sequence > > */ > > public String getGIToBlast() { > > return this.seq; > > } > > > > /** > > *

> > * This method set the program to be used to blast the given > > sequence/GI. At > > * this time, there is no attempt at checking the matching of sequence > > type > > * to program. > > *

> > * > > * @param prog > > * : a String representing the program specified for this > > QBlast > > * request. > > * > > */ > > public void setProgram(String prog) { > > this.prog = "PROGRAM=" + prog; > > } > > > > /** > > *

> > * Simply returns the program used for the given Blast request. > > *

> > * > > * @return prog : a String with the program used for this QBlast > > request. > > */ > > public String getProgram() { > > return this.prog; > > } > > > > /** > > *

> > * This method set the database to be used to blast the given > > sequence/GI. > > * At this time, there is no attempt at checking the matching of > > sequence > > * type to database. > > *

> > * > > * @param db: a String for the database specified for this QBlast > > request > > */ > > public void setDatabase(String db) { > > this.db = "DATABASE=" + db; > > } > > > > /** > > *

> > * Simply returns the database used for the given Blast request. > > *

> > * > > * @return db: a String with the database used for this QBlast request. > > */ > > public String getBlastDatabase() { > > return this.db; > > } > > > > /** > > *

This method let the user specify which format to use for > > generating the output.

> > * > > * @param type:an integer taken from the static constant of this class, > > either be TEXT, XML or HTML > > */ > > public void setQBlastOutputFormat(int type) { > > > > switch (type) { > > case 0: > > this.outputFormat = "Text"; > > break; > > case 1: > > this.outputFormat = "XML"; > > break; > > case 2: > > this.outputFormat = "HTML"; > > break; > > } > > } > > > > /** > > *

> > * Simply returns the output format used for the given Blast report. > > *

> > * > > * @return outputFormat : a String with the format specified for the > > QBlast report. > > */ > > public String getQBlastOutputFormat() { > > return this.outputFormat; > > } > > > > /** > > *

<p>This method is to be used if a request is to use non-default values at submission. According to QBlast info,
> >      * the accepted parameters for PUT requests are:</p>
> >      *
> >      * <ul>
> >      * <li>-G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / non-affine for megablast</li>
> >      * <li>-E: cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / non-affine for megablast</li>
> >      * <li>-r: integer to reward for match. Default = 1</li>
> >      * <li>-q: negative integer for penalty to allow mismatch. Default = -3</li>
> >      * <li>-e: expectation value. Default = 10.0</li>
> >      * <li>-W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 (megablast)</li>
> >      * <li>-y: dropoff for blast extensions in bits, using default if not specified. Default = 20 for blastn, 7 for all others
> >      * (except megablast for which it is not applicable).</li>
> >      * <li>-X: X dropoff value for gapped alignment, in bits. Default = 30 for blastn/megablast, 15 for all others.</li>
> >      * <li>-Z: final X dropoff value for gapped alignment, in bits. Default = 50 for blastn, 25 for all others
> >      * (except megablast for which it is not applicable)</li>
> >      * <li>-P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. Does not apply to blastn or megablast.</li>
> >      * <li>-A: multiple hits window size. Default = 0 (for single hit algorithm)</li>
> >      * <li>-I: number of database sequences to save hits for. Default = 500</li>
> >      * <li>-Y: effective length of the search space. Default = 0 (0 represents using the whole space)</li>
> >      * <li>-z: a real specifying the effective length of the database to use. Default = 0 (0 represents the real size)</li>
> >      * <li>-c: an integer representing pseudocount constant for PSI-BLAST. Default = 7</li>
> >      * <li>-F: any filtering directive</li>
> >      * </ul>
> >      *
> >      * <p>You have to be aware that at no moment is there any error checking on the use of these parameters by this class.</p>

> > * @param aStr: a String with any number of optional parameters with an > > associated value. > > * > > */ > > public void setAdvancedOptions(String aStr) { > > this.advanced = "OTHER_ADVANCED=" + aStr; > > } > > > > /** > > * > > * Simply return the string given as argument via > > setBlastAdvancedOptions > > * > > * @return advanced: the string with the advanced options > > */ > > public String getBlastAdvancedOptions() { > > return this.advanced; > > } > > > > /** > > * > > * Simply return the QBlast RID for this specific QBlast request > > * > > * @return rid: the string with the RID > > */ > > public String getBlastRID() { > > return this.rid; > > } > > > > /** > > * A simple method to check the availability of the QBlast service > > * > > * @throws BioException > > */ > > public void printRemoteBlastInfo() throws BioException { > > try { > > OutputStreamWriter out = new OutputStreamWriter(uConn > > .getOutputStream()); > > > > out.write("CMD=Info"); > > out.flush(); > > > > // Get the response > > BufferedReader rd = new BufferedReader(new > > InputStreamReader(uConn > > .getInputStream())); > > > > String line = ""; > > > > while ((line = rd.readLine()) != null) { > > System.out.println(line); > > } > > > > out.close(); > > rd.close(); > > } catch (IOException e) { > > throw new BioException( > > "Impossible to get info from QBlast service at this > > time. 
Check your network connection");
> >         }
> >     }
> >
> >     private URLConnection setQBlastProperties(URLConnection conn) {
> >
> >         URLConnection tmp = conn;
> >
> >         conn.setDoOutput(true);
> >         conn.setUseCaches(false);
> >
> >         tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService");
> >         tmp.setRequestProperty("Connection", "Keep-Alive");
> >         tmp.setRequestProperty("Content-type",
> >                 "application/x-www-form-urlencoded");
> >         tmp.setRequestProperty("Content-length", "200");
> >
> >         return tmp;
> >     }
> > }
> >
> >
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> >
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com Thu Jun 11 14:17:35 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 11 Jun 2009 15:17:35 +0100
Subject: [Biojava-dev] First draft of a remote blast service class
In-Reply-To: References:
Message-ID: <1244729855.5546.52.camel@buzzybee>

Good stuff! My 2p's worth:

setSequence() should be overloaded to accept all forms of possible sequence input - whatever is decided on as the standard way of referencing sequence data in BJ3. The original plan for BJ3 was to allow String/CharSequence and List (see http://www.biojava.org/wiki/BioJava3:HowTo )

setAdvancedOptions() should not accept a String, but rather a Properties or a Map, where the keys of the Map/Properties are restricted to a range of acceptable values determined (and published, maybe as an enum?) by each of the implementation classes (e.g. RemoteQBlastService). The implementation class then uses this to construct the call string.
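
As a rough illustration of the Map-with-enum-keys suggestion (a sketch only, not code posted in this thread): the names QBlastOptionsSketch, QBlastOption and buildAdvanced are hypothetical and do not exist in BioJava, and the four switches are an assumed subset of the blastall options listed in the setAdvancedOptions() Javadoc. The enum restricts which keys can be used, and the implementation class builds the OTHER_ADVANCED form field from the map.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in BioJava.
class QBlastOptionsSketch {

    /** Assumed subset of the blastall switches listed in the Javadoc. */
    enum QBlastOption {
        GAP_OPEN("-G"), GAP_EXTEND("-E"), EXPECT("-e"), WORD_SIZE("-W");

        final String flag;

        QBlastOption(String flag) {
            this.flag = flag;
        }
    }

    /** Turns checked key/value pairs into the OTHER_ADVANCED form field. */
    static String buildAdvanced(Map<QBlastOption, String> options) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<QBlastOption, String> e : options.entrySet()) {
            String value = e.getValue();
            // Reject missing values up front instead of sending them to NCBI.
            if (value == null || value.trim().length() == 0) {
                throw new IllegalArgumentException("No value for " + e.getKey());
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey().flag).append(' ').append(value.trim());
        }
        try {
            // Form-encode the value so spaces survive the POST body.
            return "OTHER_ADVANCED=" + URLEncoder.encode(sb.toString(), "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        Map<QBlastOption, String> options =
                new EnumMap<QBlastOption, String>(QBlastOption.class);
        options.put(QBlastOption.EXPECT, "10.0");
        options.put(QBlastOption.GAP_OPEN, "5");
        System.out.println(buildAdvanced(options)); // OTHER_ADVANCED=-G+5+-e+10.0
    }
}
```

An EnumMap iterates in the enum's declaration order, so the generated string is the same regardless of the order in which the options were put in.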
The reason for doing it this way is that (a) it allows the parameters to be verified by checking them against a known list of allowable key/values, and (b) it allows for non-URL based remote requests to be constructed from the values, e.g. SOAP calls.

I would also replace the static int HTML/TEXT/XML with an enum as numeric constants are sometimes a Bad Thing.

The setProgram() method in my mind is specific to Blast, as opposed to being a generic Pairwise Alignment concept. Therefore it might be better to move this to a Blast-specific sub-interface or make it only appear in the implementation classes that refer to Blast.

Finally, the JavaDocs for the various set() methods are incorrect - they're all mostly the same in fact! :)

But overall it looks good.

cheers,
Richard

On Thu, 2009-06-11 at 09:52 -0400, Sylvain Foisy wrote:
> Hi to all,
>
> I've been working on this for the past week or so and after discussing this
> with Andreas, I am putting my code here for critical review. I'll put this
> stuff in biojava-live as soon as Andreas can fix my SVN access.
>
> First, an interface called RemotePairwiseAlignementService defines the basic
> components of a remote service: sequence/database/program/run options/output
> options. RemoteQBlastService implements this interface and runs remote
> QBlast requests and creates output in either text, XML or HTML. At present,
> regular blastall programs work, no blastpgp/megablast support yet.
>
> I'll need some guidance to make it work on other types of web services like
> EBI.
>
> Best regards
>
> Sylvain
>
> ===================================================================
>
> Sylvain Foisy, Ph. D.
> Consultant Bio-informatique / Bioinformatics > Diploide.net - TI pour la vie / IT for Life > > Courriel: sylvain.foisy at diploide.net > Web: http://www.diploide.net > Tel: (514) 893-4363 > =================================================================== > > import java.io.InputStream; > > import org.biojava.bio.BioException; > /** > * This interface specifies minimal information needed to execute a pairwise > alignment on a remote service. > * > * Example of service: QBlast service at NCBI > * Web Service at EBI > * > * @author Sylvain Foisy > * @since 1.8 > * > */ > public interface RemotePairwiseAlignementService { > > /** > * This field specifies that the output format of results > * is text. > * > */ > public static final int TEXT = 0; > > /** > * This field specifies that the output format of results > * is XML. > * > */ > public static final int XML = 1; > > /** > * This field specifies that the output format of results > * is HTML. > * > */ > public static final int HTML = 2; > > /** > * Setting the database to use for doing the pairwise alignment > * > * @param db: a String with a valid database ID for the > service used. > * > */ > public void setDatabase(String db); > > /** > * Setting the sequence to be align for this for this request > * > * @param seq: a String with a sequence to be aligned. > * > */ > public void setSequence(String seq); > > /** > * Setting the program to use for this pairwise alignment > * > * @param prog: a String with a valid database ID for the > service used. > * > */ > public void setProgram(String prog); > > /** > * Setting all other options to use for this pairwise alignment > * > * @param db: a String with a valid database ID for the > service used. 
> * > */ > public void setAdvancedOptions(String str); > > /** > * Doing the actual analysis on the instantiated service > * > * @throws BioException > */ > public void executeSearch() throws BioException; > > /** > * Getting the actual alignment results from this instantiated service > * > * @return : an InputStream with the actual alignment > results > * @throws BioException > */ > public InputStream getAlignmentResults() throws BioException; > } > > import java.io.BufferedReader; > import java.io.IOException; > import java.io.InputStream; > import java.io.InputStreamReader; > import java.io.OutputStreamWriter; > import java.net.MalformedURLException; > import java.net.URL; > import java.net.URLConnection; > > import org.biojava.bio.BioException; > > /** > * RemoteQBlastService - A simple way of submitting BLAST request to the > QBlast > * service at NCBI. > * > *

> * NCBI provides a Blast server through a CGI-BIN interface. > RemoteQBlastService simply > * encapsulates an access to it by giving users access to get/set methods to > fix > * sequence, program and database as well as advanced options. > *

> * > *

> * As of version 1.0, only blastall programs are usable. blastpgp and > megablast are high-priorities. > *

> * > * @author Sylvain Foisy > * @version 1.0 > * @since 1.8 > * > * > */ > public class RemoteQBlastService implements RemotePairwiseAlignementService{ > > // public static final int TEXT = 0; > // public static final int XML = 1; > // public static final int HTML = 2; > > private static String baseurl = > "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"; > private URL aUrl; > private URLConnection uConn; > private OutputStreamWriter fromQBlast; > private BufferedReader rd; > > private String seq = null; > private String prog = null; > private String db = null; > private String outputFormat = null; > private String advanced = null; > > private String rid; > private long step; > private boolean done = false; > private long start; > > public RemoteQBlastService() throws BioException { > try { > aUrl = new URL(baseurl); > uConn = setQBlastProperties(aUrl.openConnection()); > > outputFormat = "Text"; > } > /* > * Needed but should never be thrown since the URL is static and > known to exist > */ > catch (MalformedURLException e) { > throw new BioException("It looks like the URL for NCBI QBlast > service is bad"); > } > /* > * Intercept if the program can't connect to QBlast service > */ > catch (IOException e) { > throw new BioException( > "Impossible to connect to QBlast service at this time. > Check your network connection"); > } > } > > /** > * This method execute the Blast request via the Put command of the > CGI-BIN > * interface. It gets the estimated time of completion by capturing the > * value of the RTOE variable and sets a loop that will check for > completion > * of analysis at intervals specified by RTOE. > * > *

> * It also capture the value for the RID variable, necessary for > fetching > * the actual results after completion. > *

>      *
>      * @throws BioException
>      *             if it is not possible to send the BLAST command
>      */
>     public void executeSearch() throws BioException {
>
>         if (seq == null || db == null || prog == null) {
>             throw new BioException(
>                     "Impossible to execute QBlast request. One or more of seq|db|prog has not been set");
>         }
>         /*
>          * sending the command to execute the Blast analysis
>          */
>         String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&"
>                 + db + "&" + "FORMAT_TYPE=HTML";
>
>         if (advanced != null) {
>             // append the advanced options once; "cmd += cmd + ..." would duplicate the whole command
>             cmd += "&" + advanced;
>         }
>
>         try {
>
>             uConn = setQBlastProperties(aUrl.openConnection());
>
>             fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>
>             fromQBlast.write(cmd);
>             fromQBlast.flush();
>
>             // Get the response
>             rd = new BufferedReader(new InputStreamReader(uConn
>                     .getInputStream()));
>
>             String line = "";
>
>             while ((line = rd.readLine()) != null) {
>                 if (line.contains("RID")) {
>                     String[] arr = line.split("=");
>                     rid = arr[1].trim();
>                 } else if (line.contains("RTOE")) {
>                     String[] arr = line.split("=");
>                     step = Long.parseLong(arr[1].trim()) * 1000;
>                     start = System.currentTimeMillis() + step;
>                 }
>             }
>         } catch (IOException e) {
>             throw new BioException(
>                     "Can't submit sequence to BLAST server at this time.");
>         }
>         /*
>          * Getting the info out of the NCBI system
>          */
>         while (!done) {
>             long prez = System.currentTimeMillis();
>             done = isReady(rid, prez);
>         }
>     }
>
>     /**
>      *

This method is used only for the executeBlastSearch method to > check for completion of > * request using the NCBI specified RTOE variable

> * > * @param id > * @param present > * @return > */ > private boolean isReady(String id, long present) { > > boolean ready = false; > String check = "CMD=Get&RID=" + id; > /* > * If present time is less than the start of the search added to > step > * obtained from NCBI, just do nothing ;-) > */ > if (present < start) { > ; > } > /* > * If we are at least step seconds in the future from the actual > call of > * method executeBlastSearch() > */ > else { > try { > uConn = setQBlastProperties(aUrl.openConnection()); > > fromQBlast = new > OutputStreamWriter(uConn.getOutputStream()); > fromQBlast.write(check); > fromQBlast.flush(); > > rd = new BufferedReader(new InputStreamReader(uConn > .getInputStream())); > > String line = ""; > > while ((line = rd.readLine()) != null) { > if (line.contains("READY")) { > ready = true; > } else if (line.contains("WAITING")) { > /* > * Else, move start forward in time... > */ > start = present + step; > } > } > } catch (IOException e) { > e.printStackTrace(); > } > } > return ready; > } > > /** > *

This method extracts this actual Blast report. The default format > is Text but can be changed before with the method > * setQBlastOutputFormat.

> * > * > * @return > * @throws BioException > */ > public InputStream getAlignmentResults() throws BioException { > String srid = "CMD=Get&RID=" + rid; > srid += "&FORMAT_TYPE=" + outputFormat; > > if(!this.done){ > throw new BioException("Unable to get report at this time. Your > Blast request has not been processed yet."); > } > > try { > uConn = setQBlastProperties(aUrl.openConnection()); > > fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); > fromQBlast.write(srid); > fromQBlast.flush(); > > return uConn.getInputStream(); > > } catch (IOException ioe) { > throw new BioException( > "It is not possible to fetch Blast report from NCBI at > this time"); > } > } > > /** > *

> * Set the sequence to be blasted using the String that correspond to > the > * sequence. > *

> * > *

> * Take note that this method is mutually exclusive to setGIToBlast() > for a > * given Blast request. > *

> * > * @param aStr > * : a String with the sequence > */ > public void setSequence(String aStr) { > this.seq = "QUERY=" + aStr; > } > > /** > * Simply return a string with the blasted sequence. > * > * @return seq : a string with the sequence > */ > public String getSeqToBlast() { > return this.seq; > } > > /** > *

> * Set the sequence to be blasted using the NCBI GI value. At this time, > * there is no effort made to check the validity of this GI. > *

> * > *

> * Take note that this method is mutually exclusive to setSeqToBlast() > for a > * given Blast request. > *

> * > * @param gi > * : an integer value representing a NCBI GI > */ > public void setGIToBlast(String gi) { > this.seq = "QUERY=" + gi; > } > > /** > *

> * Simply return a string with the sequence blasted. > *

> * > * @return GI : a String with the GI of the blasted sequence > */ > public String getGIToBlast() { > return this.seq; > } > > /** > *

> * This method set the program to be used to blast the given > sequence/GI. At > * this time, there is no attempt at checking the matching of sequence > type > * to program. > *

> * > * @param prog > * : a String representing the program specified for this > QBlast > * request. > * > */ > public void setProgram(String prog) { > this.prog = "PROGRAM=" + prog; > } > > /** > *

> * Simply returns the program used for the given Blast request. > *

> * > * @return prog : a String with the program used for this QBlast > request. > */ > public String getProgram() { > return this.prog; > } > > /** > *

> * This method set the database to be used to blast the given > sequence/GI. > * At this time, there is no attempt at checking the matching of > sequence > * type to database. > *

> * > * @param db: a String for the database specified for this QBlast > request > */ > public void setDatabase(String db) { > this.db = "DATABASE=" + db; > } > > /** > *

> * Simply returns the database used for the given Blast request. > *

> * > * @return db: a String with the database used for this QBlast request. > */ > public String getBlastDatabase() { > return this.db; > } > > /** > *

This method lets the user specify which format to use for > generating the output.

> * > * @param type:an integer taken from the static constant of this class, > either be TEXT, XML or HTML > */ > public void setQBlastOutputFormat(int type) { > > switch (type) { > case 0: > this.outputFormat = "Text"; > break; > case 1: > this.outputFormat = "XML"; > break; > case 2: > this.outputFormat = "HTML"; > break; > } > } > > /** > *

> * Simply returns the output format used for the given Blast report. > *

> * > * @return outputFormat : a String with the format specified for the > QBlast report. > */ > public String getQBlastOutputFormat() { > return this.outputFormat; > } > > /** > *

This method is to be used if a request is to use non-default > values at submission. According to QBlast info, > * the accepted parameters for PUT requests are:

> * > *
    > *
  • -G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / > non-affine for megablast
  • > *
  • -E: Cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / > non-affine for megablast
  • > *
  • -r: integer to reward for match. Default = 1
  • > *
  • -q: negative integer for penalty to allow mismatch. Default = > -3
  • > *
  • -e: expectation value. Default = 10.0
  • > *
  • -W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 > (megablast)
  • > *
  • -y: dropoff for blast extensions in bits, using default if not > specified. Default = 20 for blastn, 7 for all others > * (except megablast for which it is not applicable).
  • > *
  • -X: X dropoff value for gapped alignment, in bits. Default = 30 > for blastn/megablast, 15 for all others.
  • > *
  • -Z: final X dropoff value for gapped alignment, in bits. Default > = 50 for blastn, 25 for all others > * (except megablast for which it is not applicable)
  • > *
  • -P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. > Does not apply to blastn or megablast.
  • > *
  • -A: multiple hits window size. Default = 0 (for single hit > algorithm)
  • > *
  • -I: number of database sequences to save hits for. Default = > 500
  • > *
  • -Y: effective length of the search space. Default = 0 (0 > represents using the whole space)
  • > *
  • -z: a real specifying the effective length of the database to > use. Default = 0 (0 represents the real size)
  • > *
  • -c: an integer representing pseudocount constant for PSI-BLAST. > Default = 7
  • > *
  • -F: any filtering directive
  • > *
> * > *

Be aware that at no point does this class perform any error > checking on the use of these parameters.

> * @param aStr: a String with any number of optional parameters with an > associated value. > * > */ > public void setAdvancedOptions(String aStr) { > this.advanced = "OTHER_ADVANCED=" + aStr; > } > > /** > * > * Simply return the string given as argument via > setBlastAdvancedOptions > * > * @return advanced: the string with the advanced options > */ > public String getBlastAdvancedOptions() { > return this.advanced; > } > > /** > * > * Simply return the QBlast RID for this specific QBlast request > * > * @return rid: the string with the RID > */ > public String getBlastRID() { > return this.rid; > } > > /** > * A simple method to check the availability of the QBlast service > * > * @throws BioException > */ > public void printRemoteBlastInfo() throws BioException { > try { > OutputStreamWriter out = new OutputStreamWriter(uConn > .getOutputStream()); > > out.write("CMD=Info"); > out.flush(); > > // Get the response > BufferedReader rd = new BufferedReader(new > InputStreamReader(uConn > .getInputStream())); > > String line = ""; > > while ((line = rd.readLine()) != null) { > System.out.println(line); > } > > out.close(); > rd.close(); > } catch (IOException e) { > throw new BioException( > "Impossible to get info from QBlast service at this > time. 
Check your network connection"); > } > } > > private URLConnection setQBlastProperties(URLConnection conn) { > > URLConnection tmp = conn; > > conn.setDoOutput(true); > conn.setUseCaches(false); > > tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService"); > tmp.setRequestProperty("Connection", "Keep-Alive"); > tmp.setRequestProperty("Content-type", > "application/x-www-form-urlencoded"); > tmp.setRequestProperty("Content-length", "200"); > > return tmp; > } > } > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ayates at ebi.ac.uk Thu Jun 11 15:53:35 2009 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 11 Jun 2009 16:53:35 +0100 Subject: [Biojava-dev] First draft of a remote blast service class In-Reply-To: <1244729855.5546.52.camel@buzzybee> References: <1244729855.5546.52.camel@buzzybee> Message-ID: <4A31287F.9070102@ebi.ac.uk> Really the map/enum pattern is nearly knocking on the door of the prototype pattern & is a very good way to go for this kind of system where target values are never set in stone (well only for a particular release of a service). If anyone is interested there's a very good bit of information from: http://steve-yegge.blogspot.com/2008/10/universal-design-pattern.html Andy Richard Holland wrote: > Good stuff! My 2p's worth: > > setSequence() should be overloaded to accept all forms of possible > sequence input - whatever is decided on as the standard way of > referencing sequence data in BJ3. 
The original plan for BJ3 was to allow > String/CharSequence and List (see > http://www.biojava.org/wiki/BioJava3:HowTo ) > > setAdvancedOptions() should not accept a String, but rather a Properties > or a Map, where the keys of the Map/Properties are > restricted to a range of acceptable values determined (and published, > maybe as an enum?) by each of the implementation classes (e.g. > RemoteQBlastService). The implementation class then uses this to > construct the call string. The reason for doing it this way is that (a) > it allows the parameters to be verified by checking them against a known > list of allowable key/values, and (b) it allows for non-URL based remote > requests to be constructed from the values, e.g. SOAP calls. > > I would also replace the static int HTML/TEXT/XML with an enum as > numeric constants are sometimes a Bad Thing. > > The setProgram() method in my mind is specific to Blast, as opposed to > being a generic Pairwise Alignment concept. Therefore it might be better > to move this to a Blast-specific sub-interface or make it only appear in > the implementation classes that refer to Blast. > > Finally, the JavaDocs for the various set() methods are incorrect - > they're all mostly the same in fact! :) > > But overall it looks good. > > cheers, > Richard > > On Thu, 2009-06-11 at 09:52 -0400, Sylvain Foisy wrote: >> Hi to all, >> >> I've been working on this for the past week or so and after discussing this >> with Andreas, I am putting my code here for critical review. I'll put this >> stuff in biojava-live as soon as Andreas can fix my SVN access. >> >> First, an interface called RemotePairwiseAlignementSerivce defines the basic >> components of a remote service: sequence/database/progam/run options/output >> options. RemoteQBlastService implements this interface and runs remote >> Qblast requests and creates output in either text, XML or HTML. At present >> time, regular blastall programs work, no blastpgp/megablast support yet. 
>> >> I'll need some guidance to make it work on other type of web services like >> EBI. >> >> Best regards >> >> Sylvain >> >> =================================================================== >> >> Sylvain Foisy, Ph. D. >> Consultant Bio-informatique / Bioinformatics >> Diploide.net - TI pour la vie / IT for Life >> >> Courriel: sylvain.foisy at diploide.net >> Web: http://www.diploide.net >> Tel: (514) 893-4363 >> =================================================================== >> >> import java.io.InputStream; >> >> import org.biojava.bio.BioException; >> /** >> * This interface specifies minimal information needed to execute a pairwise >> alignment on a remote service. >> * >> * Example of service: QBlast service at NCBI >> * Web Service at EBI >> * >> * @author Sylvain Foisy >> * @since 1.8 >> * >> */ >> public interface RemotePairwiseAlignementService { >> >> /** >> * This field specifies that the output format of results >> * is text. >> * >> */ >> public static final int TEXT = 0; >> >> /** >> * This field specifies that the output format of results >> * is XML. >> * >> */ >> public static final int XML = 1; >> >> /** >> * This field specifies that the output format of results >> * is HTML. >> * >> */ >> public static final int HTML = 2; >> >> /** >> * Setting the database to use for doing the pairwise alignment >> * >> * @param db: a String with a valid database ID for the >> service used. >> * >> */ >> public void setDatabase(String db); >> >> /** >> * Setting the sequence to be align for this for this request >> * >> * @param seq: a String with a sequence to be aligned. >> * >> */ >> public void setSequence(String seq); >> >> /** >> * Setting the program to use for this pairwise alignment >> * >> * @param prog: a String with a valid database ID for the >> service used. 
>> * >> */ >> public void setProgram(String prog); >> >> /** >> * Setting all other options to use for this pairwise alignment >> * >> * @param db: a String with a valid database ID for the >> service used. >> * >> */ >> public void setAdvancedOptions(String str); >> >> /** >> * Doing the actual analysis on the instantiated service >> * >> * @throws BioException >> */ >> public void executeSearch() throws BioException; >> >> /** >> * Getting the actual alignment results from this instantiated service >> * >> * @return : an InputStream with the actual alignment >> results >> * @throws BioException >> */ >> public InputStream getAlignmentResults() throws BioException; >> } >> >> import java.io.BufferedReader; >> import java.io.IOException; >> import java.io.InputStream; >> import java.io.InputStreamReader; >> import java.io.OutputStreamWriter; >> import java.net.MalformedURLException; >> import java.net.URL; >> import java.net.URLConnection; >> >> import org.biojava.bio.BioException; >> >> /** >> * RemoteQBlastService - A simple way of submitting BLAST request to the >> QBlast >> * service at NCBI. >> * >> *

>> * NCBI provides a Blast server through a CGI-BIN interface. >> RemoteQBlastService simply >> * encapsulates an access to it by giving users access to get/set methods to >> fix >> * sequence, program and database as well as advanced options. >> *

>> * >> *

>> * As of version 1.0, only blastall programs are usable. blastpgp and >> megablast are high-priorities. >> *

>> * >> * @author Sylvain Foisy >> * @version 1.0 >> * @since 1.8 >> * >> * >> */ >> public class RemoteQBlastService implements RemotePairwiseAlignementService{ >> >> // public static final int TEXT = 0; >> // public static final int XML = 1; >> // public static final int HTML = 2; >> >> private static String baseurl = >> "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"; >> private URL aUrl; >> private URLConnection uConn; >> private OutputStreamWriter fromQBlast; >> private BufferedReader rd; >> >> private String seq = null; >> private String prog = null; >> private String db = null; >> private String outputFormat = null; >> private String advanced = null; >> >> private String rid; >> private long step; >> private boolean done = false; >> private long start; >> >> public RemoteQBlastService() throws BioException { >> try { >> aUrl = new URL(baseurl); >> uConn = setQBlastProperties(aUrl.openConnection()); >> >> outputFormat = "Text"; >> } >> /* >> * Needed but should never be thrown since the URL is static and >> known to exist >> */ >> catch (MalformedURLException e) { >> throw new BioException("It looks like the URL for NCBI QBlast >> service is bad"); >> } >> /* >> * Intercept if the program can't connect to QBlast service >> */ >> catch (IOException e) { >> throw new BioException( >> "Impossible to connect to QBlast service at this time. >> Check your network connection"); >> } >> } >> >> /** >> * This method execute the Blast request via the Put command of the >> CGI-BIN >> * interface. It gets the estimated time of completion by capturing the >> * value of the RTOE variable and sets a loop that will check for >> completion >> * of analysis at intervals specified by RTOE. >> * >> *

>> * It also capture the value for the RID variable, necessary for >> fetching >> * the actual results after completion. >> *

>> * >> * @throws BioException >> * if it is not possible to sent the BLAST command >> */ >> public void executeSearch() throws BioException { >> >> if (seq == null || db == null || prog == null) { >> throw new BioException( >> "Impossible to execute QBlast request. One or more of >> seq|db|prog has not been set"); >> } >> /* >> * sending the command to execute the Blast analysis >> */ >> String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&" >> + db + "&" + "FORMAT_TYPE=HTML"; >> >> if (advanced != null) { >> cmd += cmd + "&" + advanced; >> } >> >> try { >> >> uConn = setQBlastProperties(aUrl.openConnection()); >> >> fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); >> >> fromQBlast.write(cmd); >> fromQBlast.flush(); >> >> // Get the response >> rd = new BufferedReader(new InputStreamReader(uConn >> .getInputStream())); >> >> String line = ""; >> >> while ((line = rd.readLine()) != null) { >> if (line.contains("RID")) { >> String[] arr = line.split("="); >> rid = arr[1].trim(); >> } else if (line.contains("RTOE")) { >> String[] arr = line.split("="); >> step = Long.parseLong(arr[1].trim()) * 1000; >> start = System.currentTimeMillis() + step; >> } >> } >> } catch (IOException e) { >> throw new BioException( >> "Can't submit sequence to BLAST server at this time."); >> } >> /* >> * Getting the info out of the NCBI system >> */ >> while (!done) { >> long prez = System.currentTimeMillis(); >> done = isReady(rid, prez); >> } >> } >> >> /** >> *

This method is used only by the executeSearch method to >> check for completion of >> * the request, using the NCBI-specified RTOE variable

>> * >> * @param id >> * @param present >> * @return >> */ >> private boolean isReady(String id, long present) { >> >> boolean ready = false; >> String check = "CMD=Get&RID=" + id; >> /* >> * If present time is less than the start of the search added to >> step >> * obtained from NCBI, just do nothing ;-) >> */ >> if (present < start) { >> ; >> } >> /* >> * If we are at least step seconds in the future from the actual >> call of >> * method executeBlastSearch() >> */ >> else { >> try { >> uConn = setQBlastProperties(aUrl.openConnection()); >> >> fromQBlast = new >> OutputStreamWriter(uConn.getOutputStream()); >> fromQBlast.write(check); >> fromQBlast.flush(); >> >> rd = new BufferedReader(new InputStreamReader(uConn >> .getInputStream())); >> >> String line = ""; >> >> while ((line = rd.readLine()) != null) { >> if (line.contains("READY")) { >> ready = true; >> } else if (line.contains("WAITING")) { >> /* >> * Else, move start forward in time... >> */ >> start = present + step; >> } >> } >> } catch (IOException e) { >> e.printStackTrace(); >> } >> } >> return ready; >> } >> >> /** >> *

This method fetches the actual Blast report. The default format >> is Text but can be changed beforehand with the method >> * setQBlastOutputFormat.

>> * >> * >> * @return >> * @throws BioException >> */ >> public InputStream getAlignmentResults() throws BioException { >> String srid = "CMD=Get&RID=" + rid; >> srid += "&FORMAT_TYPE=" + outputFormat; >> >> if(!this.done){ >> throw new BioException("Unable to get report at this time. Your >> Blast request has not been processed yet."); >> } >> >> try { >> uConn = setQBlastProperties(aUrl.openConnection()); >> >> fromQBlast = new OutputStreamWriter(uConn.getOutputStream()); >> fromQBlast.write(srid); >> fromQBlast.flush(); >> >> return uConn.getInputStream(); >> >> } catch (IOException ioe) { >> throw new BioException( >> "It is not possible to fetch Blast report from NCBI at >> this time"); >> } >> } >> >> /** >> *

>> * Set the sequence to be blasted using the String that correspond to >> the >> * sequence. >> *

>> * >> *

>> * Take note that this method is mutually exclusive to setGIToBlast() >> for a >> * given Blast request. >> *

>> * >> * @param aStr >> * : a String with the sequence >> */ >> public void setSequence(String aStr) { >> this.seq = "QUERY=" + aStr; >> } >> >> /** >> * Simply return a string with the blasted sequence. >> * >> * @return seq : a string with the sequence >> */ >> public String getSeqToBlast() { >> return this.seq; >> } >> >> /** >> *

>> * Set the sequence to be blasted using the NCBI GI value. At this time, >> * there is no effort made to check the validity of this GI. >> *

>> * >> *

>> * Take note that this method is mutually exclusive to setSeqToBlast() >> for a >> * given Blast request. >> *

>> * >> * @param gi >> * : an integer value representing a NCBI GI >> */ >> public void setGIToBlast(String gi) { >> this.seq = "QUERY=" + gi; >> } >> >> /** >> *

>> * Simply return a string with the sequence blasted. >> *

>> * >> * @return GI : a String with the GI of the blasted sequence >> */ >> public String getGIToBlast() { >> return this.seq; >> } >> >> /** >> *

>> * This method set the program to be used to blast the given >> sequence/GI. At >> * this time, there is no attempt at checking the matching of sequence >> type >> * to program. >> *

>> * >> * @param prog >> * : a String representing the program specified for this >> QBlast >> * request. >> * >> */ >> public void setProgram(String prog) { >> this.prog = "PROGRAM=" + prog; >> } >> >> /** >> *

>> * Simply returns the program used for the given Blast request. >> *

>> * >> * @return prog : a String with the program used for this QBlast >> request. >> */ >> public String getProgram() { >> return this.prog; >> } >> >> /** >> *

>> * This method set the database to be used to blast the given >> sequence/GI. >> * At this time, there is no attempt at checking the matching of >> sequence >> * type to database. >> *

>> * >> * @param db: a String for the database specified for this QBlast >> request >> */ >> public void setDatabase(String db) { >> this.db = "DATABASE=" + db; >> } >> >> /** >> *

>> * Simply returns the database used for the given Blast request. >> *

>> * >> * @return db: a String with the database used for this QBlast request. >> */ >> public String getBlastDatabase() { >> return this.db; >> } >> >> /** >> *

This method lets the user specify which format to use for >> generating the output.

>> * >> * @param type:an integer taken from the static constant of this class, >> either be TEXT, XML or HTML >> */ >> public void setQBlastOutputFormat(int type) { >> >> switch (type) { >> case 0: >> this.outputFormat = "Text"; >> break; >> case 1: >> this.outputFormat = "XML"; >> break; >> case 2: >> this.outputFormat = "HTML"; >> break; >> } >> } >> >> /** >> *

>> * Simply returns the output format used for the given Blast report. >> *

>> * >> * @return outputFormat : a String with the format specified for the >> QBlast report. >> */ >> public String getQBlastOutputFormat() { >> return this.outputFormat; >> } >> >> /** >> *

This method is to be used if a request is to use non-default >> values at submission. According to QBlast info, >> * the accepted parameters for PUT requests are:

>> * >> *
    >> *
  • -G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / >> non-affine for megablast
  • >> *
  • -E: Cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / >> non-affine for megablast
  • >> *
  • -r: integer to reward for match. Default = 1
  • >> *
  • -q: negative integer for penalty to allow mismatch. Default = >> -3
  • >> *
  • -e: expectation value. Default = 10.0
  • >> *
  • -W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 >> (megablast)
  • >> *
  • -y: dropoff for blast extensions in bits, using default if not >> specified. Default = 20 for blastn, 7 for all others >> * (except megablast for which it is not applicable).
  • >> *
  • -X: X dropoff value for gapped alignment, in bits. Default = 30 >> for blastn/megablast, 15 for all others.
  • >> *
  • -Z: final X dropoff value for gapped alignment, in bits. Default >> = 50 for blastn, 25 for all others >> * (except megablast for which it is not applicable)
  • >> *
  • -P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. >> Does not apply to blastn or megablast.
  • >> *
  • -A: multiple hits window size. Default = 0 (for single hit >> algorithm)
  • >> *
  • -I: number of database sequences to save hits for. Default = >> 500
  • >> *
  • -Y: effective length of the search space. Default = 0 (0 >> represents using the whole space)
  • >> *
  • -z: a real specifying the effective length of the database to >> use. Default = 0 (0 represents the real size)
  • >> *
  • -c: an integer representing pseudocount constant for PSI-BLAST. >> Default = 7
  • >> *
  • -F: any filtering directive
  • >> *
>> * >> *

Be aware that at no point does this class perform any error >> checking on the use of these parameters.

>> * @param aStr: a String with any number of optional parameters with an >> associated value. >> * >> */ >> public void setAdvancedOptions(String aStr) { >> this.advanced = "OTHER_ADVANCED=" + aStr; >> } >> >> /** >> * >> * Simply return the string given as argument via >> setBlastAdvancedOptions >> * >> * @return advanced: the string with the advanced options >> */ >> public String getBlastAdvancedOptions() { >> return this.advanced; >> } >> >> /** >> * >> * Simply return the QBlast RID for this specific QBlast request >> * >> * @return rid: the string with the RID >> */ >> public String getBlastRID() { >> return this.rid; >> } >> >> /** >> * A simple method to check the availability of the QBlast service >> * >> * @throws BioException >> */ >> public void printRemoteBlastInfo() throws BioException { >> try { >> OutputStreamWriter out = new OutputStreamWriter(uConn >> .getOutputStream()); >> >> out.write("CMD=Info"); >> out.flush(); >> >> // Get the response >> BufferedReader rd = new BufferedReader(new >> InputStreamReader(uConn >> .getInputStream())); >> >> String line = ""; >> >> while ((line = rd.readLine()) != null) { >> System.out.println(line); >> } >> >> out.close(); >> rd.close(); >> } catch (IOException e) { >> throw new BioException( >> "Impossible to get info from QBlast service at this >> time. 
Check your network connection"); >> } >> } >> >> private URLConnection setQBlastProperties(URLConnection conn) { >> >> URLConnection tmp = conn; >> >> conn.setDoOutput(true); >> conn.setUseCaches(false); >> >> tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService"); >> tmp.setRequestProperty("Connection", "Keep-Alive"); >> tmp.setRequestProperty("Content-type", >> "application/x-www-form-urlencoded"); >> tmp.setRequestProperty("Content-length", "200"); >> >> return tmp; >> } >> } >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev From HWillis at scripps.edu Thu Jun 11 15:58:22 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Thu, 11 Jun 2009 11:58:22 -0400 Subject: [Biojava-dev] First draft of a remote blast service class In-Reply-To: References: Message-ID: <061BFD133FA1584693D19C79A0072F5F95FFD9@FLMAIL1.fl.ad.scripps.edu> Sylvain My first reaction was that I was expecting BLAST code but came across RemotePairwiseAlignementService which made me pause thinking I would be looking at a sequence alignment code. RemoteBLASTService would be a better description specific to doing Remote BLAST. I agree that everything should be an enum if possible but encapsulated in a single search/parameter class. The enums should not have any URL specific association with the remote service but should be abstracted to something that makes sense to a developer wanting to use a service they know nothing about and don't want to take the time to read. The query parameters should be defined as a Java class that could be passed around to different service providers and then internally to the service provider the values would be mapped to the specific requirements of that service. 
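To make the idea above a bit more concrete, here is a minimal sketch in Java. It is an illustration only, not a proposal for the final API: every name in it (BlastQuerySketch, SearchOption, SearchParameters, toQBlastAdvanced) is made up, and only the -G, -E, -e and -W flags are taken from the option list in Sylvain's JavaDoc. The calling code sets provider-neutral options, and a provider-specific method maps them to whatever that service expects.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: provider-neutral search options mapped to QBlast-style flags.
final class BlastQuerySketch {

    /** Provider-neutral option names (illustrative, not an existing BioJava type). */
    enum SearchOption { GAP_OPEN_COST, GAP_EXTEND_COST, EXPECT_VALUE, WORD_SIZE }

    /** Holds the options for one search without any service-specific syntax. */
    static final class SearchParameters {
        private final Map<SearchOption, String> options =
                new EnumMap<SearchOption, String>(SearchOption.class);

        SearchParameters set(SearchOption option, String value) {
            options.put(option, value);
            return this; // allow chained calls
        }

        Map<SearchOption, String> options() {
            return options;
        }
    }

    /** Maps one neutral option to the flag listed in the QBlast JavaDoc. */
    static String flagFor(SearchOption option) {
        switch (option) {
            case GAP_OPEN_COST:   return "-G";
            case GAP_EXTEND_COST: return "-E";
            case EXPECT_VALUE:    return "-e";
            case WORD_SIZE:       return "-W";
            default: throw new IllegalArgumentException("Unmapped option: " + option);
        }
    }

    /** Builds the flag string one provider would need; other providers get their own mapper. */
    static String toQBlastAdvanced(SearchParameters params) {
        StringBuilder sb = new StringBuilder();
        // EnumMap iterates in enum declaration order, so the output is deterministic.
        for (Map.Entry<SearchOption, String> e : params.options().entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(flagFor(e.getKey())).append(' ').append(e.getValue());
        }
        return sb.toString();
    }
}
```

Setting EXPECT_VALUE to 1e-5 and GAP_OPEN_COST to 11 and then calling toQBlastAdvanced gives "-G 11 -e 1e-5". A SOAP-based provider would take the same SearchParameters object and build its own request from it, which is the point of keeping the enum free of URL syntax.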
A quick look at the form for NCBI BLASTN shows human-readable labels that, when the query is submitted, map to the shorthand values the programmer chose. http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&BLAST_SPEC=&LINK_LOC=blasttab&LAST_PAGE=tblastx If you click on the blastn, blastp, blastx, tblastn and tblastx tabs on the above link you will see that the forms are very similar but do have variations. I would use each input form as the model for the class that does the corresponding search. What is common to the 5 tabs would be in the base abstract search class and any input requirements that are different would go in an extended class. This gives you a generic class for modeling the search parameters that is easily understood. The hard part is then mapping the easy-to-understand version to the specific search query parameters of a particular service. Either way you should be able to pass the search class to different providers without knowing anything about that specific service. It would also be nice to have a listener interface, so that the class responsible for doing the query also checks whether the results are available, based on some poll value. The external calling code shouldn't need to worry about bookkeeping of unique identifiers for a particular service provider. The implementation class should hide all those details. You also have the results returning in text, XML or HTML. It would be nice if the results could be returned as a collection of SeqSimilaritySearchResult and a collection of SeqSimilaritySearchHit, found at http://www.biojava.org/wiki/BioJava:CookBook:Blast:Parser This may require you to parse the text/HTML/XML output in your implementation class. This way you can tweak or adjust for anything specific to the service provider.
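The listener idea could look roughly like the sketch below. It is an illustration under assumptions: ResultListener, StatusCheck and pollUntilReady are hypothetical names, and the status check is passed in as a parameter so that the example does not depend on any particular service. The polling and the request identifier stay inside the implementation, and the caller only receives a callback. The thread sleeps between polls instead of spinning.

```java
// Hypothetical sketch of a polling loop that reports back through a listener.
final class PollingSketch {

    /** Callback interface the calling code implements (illustrative name). */
    interface ResultListener {
        void resultsReady(String requestId);
        void searchFailed(String requestId, Exception cause);
    }

    /** Abstracts "ask the remote service whether this request is finished". */
    interface StatusCheck {
        boolean isReady(String requestId) throws Exception;
    }

    /**
     * Polls the service at a fixed interval until it reports the request as
     * ready, then notifies the listener. Gives up after maxPolls attempts.
     */
    static void pollUntilReady(String requestId, StatusCheck check,
                               long pollMillis, int maxPolls,
                               ResultListener listener) {
        try {
            for (int attempt = 0; attempt < maxPolls; attempt++) {
                if (check.isReady(requestId)) {
                    listener.resultsReady(requestId);
                    return;
                }
                Thread.sleep(pollMillis); // wait before asking again
            }
            listener.searchFailed(requestId,
                    new IllegalStateException("Not ready after " + maxPolls + " polls"));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            listener.searchFailed(requestId, e);
        } catch (Exception e) {
            listener.searchFailed(requestId, e);
        }
    }
}
```

For a QBlast-style service, the poll interval would presumably come from the RTOE value returned at submission, and the StatusCheck would wrap the CMD=Get request that isReady() sends today.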
Other BLAST web service (WSDL) providers will return a collection of Java classes specific to that implementation, which then need to be mapped to SeqSimilaritySearchResult and SeqSimilaritySearchHit. The benefit is that the API hides all the ugly details from the developer who is using the BLAST service. NCBIBlast has a formal WSDL interface which may make the process easier for you. http://bioinfo.unice.fr/web_services/Using_NCBI-Blast.html If you click on this link http://www.ebi.ac.uk/Tools/webservices/wsdl/WSNCBIBlast.wsdl you will see all the web services magic that you hand off to your favorite IDE and it writes the code for you. I did a quick test in Netbeans and they are using JAX-RPC for the web service calls, where I don't see a nice set of Java classes for structured results. This means parsing a string. It also appears they are providing a similar interface for WU-Blast http://bioinfo.unice.fr/web_services/Using_WU-Blast.html#General_Information and http://www.ebi.ac.uk/Tools/webservices/wsdl/WSWUBlast.wsdl The advantage of using the web service interface is that it should be stable, whereas you can't control changes they make to the CGI form submission, which would break the biojava code. Scooter -----Original Message----- From: biojava-dev-bounces at lists.open-bio.org [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain Foisy Sent: Thursday, June 11, 2009 9:52 AM To: biojava-dev at lists.open-bio.org Subject: [Biojava-dev] First draft of a remote blast service class Hi to all, I've been working on this for the past week or so and after discussing this with Andreas, I am putting my code here for critical review. I'll put this stuff in biojava-live as soon as Andreas can fix my SVN access. First, an interface called RemotePairwiseAlignementService defines the basic components of a remote service: sequence/database/program/run options/output options.
RemoteQBlastService implements this interface and runs remote Qblast requests and creates output in either text, XML or HTML. At present time, regular blastall programs work, no blastpgp/megablast support yet. I'll need some guidance to make it work on other type of web services like EBI. Best regards Sylvain =================================================================== Sylvain Foisy, Ph. D. Consultant Bio-informatique / Bioinformatics Diploide.net - TI pour la vie / IT for Life Courriel: sylvain.foisy at diploide.net Web: http://www.diploide.net Tel: (514) 893-4363 =================================================================== import java.io.InputStream; import org.biojava.bio.BioException; /** * This interface specifies minimal information needed to execute a pairwise alignment on a remote service. * * Example of service: QBlast service at NCBI * Web Service at EBI * * @author Sylvain Foisy * @since 1.8 * */ public interface RemotePairwiseAlignementService { /** * This field specifies that the output format of results * is text. * */ public static final int TEXT = 0; /** * This field specifies that the output format of results * is XML. * */ public static final int XML = 1; /** * This field specifies that the output format of results * is HTML. * */ public static final int HTML = 2; /** * Setting the database to use for doing the pairwise alignment * * @param db: a String with a valid database ID for the service used. * */ public void setDatabase(String db); /** * Setting the sequence to be align for this for this request * * @param seq: a String with a sequence to be aligned. * */ public void setSequence(String seq); /** * Setting the program to use for this pairwise alignment * * @param prog: a String with a valid database ID for the service used. * */ public void setProgram(String prog); /** * Setting all other options to use for this pairwise alignment * * @param db: a String with a valid database ID for the service used. 
* */ public void setAdvancedOptions(String str); /** * Doing the actual analysis on the instantiated service * * @throws BioException */ public void executeSearch() throws BioException; /** * Getting the actual alignment results from this instantiated service * * @return : an InputStream with the actual alignment results * @throws BioException */ public InputStream getAlignmentResults() throws BioException; } import java.io.BufferedReader; import java.io.IOException; import java.io.InputStream; import java.io.InputStreamReader; import java.io.OutputStreamWriter; import java.net.MalformedURLException; import java.net.URL; import java.net.URLConnection; import org.biojava.bio.BioException; /** * RemoteQBlastService - A simple way of submitting BLAST request to the QBlast * service at NCBI. * *

 * NCBI provides a Blast server through a CGI-BIN interface. RemoteQBlastService simply
 * encapsulates access to it by giving users get/set methods to fix the
 * sequence, program and database as well as advanced options.
 *
 * As of version 1.0, only blastall programs are usable. blastpgp and megablast are high priorities.
 *
 * @author Sylvain Foisy
 * @version 1.0
 * @since 1.8
 *
 */
public class RemoteQBlastService implements RemotePairwiseAlignementService {

    // public static final int TEXT = 0;
    // public static final int XML = 1;
    // public static final int HTML = 2;

    private static String baseurl = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi";
    private URL aUrl;
    private URLConnection uConn;
    private OutputStreamWriter fromQBlast;
    private BufferedReader rd;

    private String seq = null;
    private String prog = null;
    private String db = null;
    private String outputFormat = null;
    private String advanced = null;

    private String rid;
    private long step;
    private boolean done = false;
    private long start;

    public RemoteQBlastService() throws BioException {
        try {
            aUrl = new URL(baseurl);
            uConn = setQBlastProperties(aUrl.openConnection());
            outputFormat = "Text";
        }
        /*
         * Needed but should never be thrown since the URL is static and known to exist
         */
        catch (MalformedURLException e) {
            throw new BioException("It looks like the URL for NCBI QBlast service is bad");
        }
        /*
         * Intercept if the program can't connect to QBlast service
         */
        catch (IOException e) {
            throw new BioException(
                    "Impossible to connect to QBlast service at this time. Check your network connection");
        }
    }

    /**
     * This method executes the Blast request via the Put command of the CGI-BIN
     * interface. It gets the estimated time of completion by capturing the
     * value of the RTOE variable and sets a loop that will check for completion
     * of analysis at intervals specified by RTOE.
     *
     * It also captures the value of the RID variable, necessary for fetching
     * the actual results after completion.
     *
     * @throws BioException
     *             if it is not possible to send the BLAST command
     */
    public void executeSearch() throws BioException {

        if (seq == null || db == null || prog == null) {
            throw new BioException(
                    "Impossible to execute QBlast request. One or more of seq|db|prog has not been set");
        }

        /*
         * sending the command to execute the Blast analysis
         */
        String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&"
                + db + "&" + "FORMAT_TYPE=HTML";

        if (advanced != null) {
            // append the options once; "cmd += cmd + ..." would send the whole command twice
            cmd += "&" + advanced;
        }

        try {
            uConn = setQBlastProperties(aUrl.openConnection());

            fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
            fromQBlast.write(cmd);
            fromQBlast.flush();

            // Get the response
            rd = new BufferedReader(new InputStreamReader(uConn
                    .getInputStream()));

            String line = "";

            while ((line = rd.readLine()) != null) {
                if (line.contains("RID")) {
                    String[] arr = line.split("=");
                    rid = arr[1].trim();
                } else if (line.contains("RTOE")) {
                    String[] arr = line.split("=");
                    step = Long.parseLong(arr[1].trim()) * 1000;
                    start = System.currentTimeMillis() + step;
                }
            }
        } catch (IOException e) {
            throw new BioException(
                    "Can't submit sequence to BLAST server at this time.");
        }

        /*
         * Getting the info out of the NCBI system
         */
        while (!done) {
            long prez = System.currentTimeMillis();
            done = isReady(rid, prez);

            if (!done) {
                try {
                    // pause between checks instead of spinning in a tight loop
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new BioException(
                            "Interrupted while waiting for the Blast results");
                }
            }
        }
    }

    /**
     * This method is used only by the executeSearch method to check for completion of
     * the request using the NCBI-specified RTOE variable.
     *
     * @param id the RID of the request to check
     * @param present the current time, in milliseconds
     * @return true if the results are ready, false otherwise
     */
    private boolean isReady(String id, long present) {
        boolean ready = false;
        String check = "CMD=Get&RID=" + id;

        /*
         * If present time is less than the start of the search added to step
         * obtained from NCBI, just do nothing ;-)
         *
         * Otherwise, we are at least step seconds in the future from the
         * actual call of method executeSearch()
         */
        if (present >= start) {
            try {
                uConn = setQBlastProperties(aUrl.openConnection());

                fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
                fromQBlast.write(check);
                fromQBlast.flush();

                rd = new BufferedReader(new InputStreamReader(uConn
                        .getInputStream()));

                String line = "";

                while ((line = rd.readLine()) != null) {
                    if (line.contains("READY")) {
                        ready = true;
                    } else if (line.contains("WAITING")) {
                        /*
                         * Else, move start forward in time...
                         */
                        start = present + step;
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return ready;
    }

    /**
     * This method extracts the actual Blast report. The default format is Text but it can be
     * changed beforehand with the method setQBlastOutputFormat.
     *
     * @return an InputStream with the Blast report
     * @throws BioException
     */
    public InputStream getAlignmentResults() throws BioException {
        String srid = "CMD=Get&RID=" + rid;
        srid += "&FORMAT_TYPE=" + outputFormat;

        if (!this.done) {
            throw new BioException(
                    "Unable to get report at this time. Your Blast request has not been processed yet.");
        }

        try {
            uConn = setQBlastProperties(aUrl.openConnection());

            fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
            fromQBlast.write(srid);
            fromQBlast.flush();

            return uConn.getInputStream();

        } catch (IOException ioe) {
            throw new BioException(
                    "It is not possible to fetch Blast report from NCBI at this time");
        }
    }

    /**
     * Set the sequence to be blasted using the String that corresponds to the
     * sequence.
     *
     * Take note that this method is mutually exclusive with setGIToBlast() for a
     * given Blast request.
     *
     * @param aStr a String with the sequence
     */
    public void setSequence(String aStr) {
        this.seq = "QUERY=" + aStr;
    }

    /**
     * Simply returns a string with the blasted sequence.
     *
     * @return seq a String with the sequence
     */
    public String getSeqToBlast() {
        return this.seq;
    }

    /**
     * Set the sequence to be blasted using the NCBI GI value. At this time,
     * there is no effort made to check the validity of this GI.
     *
     * Take note that this method is mutually exclusive with setSequence() for a
     * given Blast request.
     *
     * @param gi a String with an NCBI GI
     */
    public void setGIToBlast(String gi) {
        this.seq = "QUERY=" + gi;
    }

    /**
     * Simply returns a string with the sequence blasted.
     *
     * @return GI a String with the GI of the blasted sequence
     */
    public String getGIToBlast() {
        return this.seq;
    }

    /**
     * This method sets the program to be used to blast the given sequence/GI. At
     * this time, there is no attempt at checking that the sequence type
     * matches the program.
     *
     * @param prog a String representing the program specified for this QBlast
     *            request.
     */
    public void setProgram(String prog) {
        this.prog = "PROGRAM=" + prog;
    }

    /**
     * Simply returns the program used for the given Blast request.
     *
     * @return prog a String with the program used for this QBlast request.
     */
    public String getProgram() {
        return this.prog;
    }

    /**
     * This method sets the database to be used to blast the given sequence/GI.
     * At this time, there is no attempt at checking that the sequence
     * type matches the database.
     *
     * @param db a String for the database specified for this QBlast request
     */
    public void setDatabase(String db) {
        this.db = "DATABASE=" + db;
    }

    /**
     * Simply returns the database used for the given Blast request.
     *
     * @return db a String with the database used for this QBlast request.
     */
    public String getBlastDatabase() {
        return this.db;
    }

    /**
     * This method lets the user specify which format to use for generating the output.
     *
     * @param type an integer taken from the static constants of this class: either TEXT, XML or HTML
     */
    public void setQBlastOutputFormat(int type) {
        switch (type) {
        case TEXT:
            this.outputFormat = "Text";
            break;
        case XML:
            this.outputFormat = "XML";
            break;
        case HTML:
            this.outputFormat = "HTML";
            break;
        }
    }

    /**
     * Simply returns the output format used for the given Blast report.
     *
     * @return outputFormat a String with the format specified for the QBlast report.
     */
    public String getQBlastOutputFormat() {
        return this.outputFormat;
    }

    /**
     * This method is to be used if a request is to use non-default values at submission. According to QBlast info,
     * the accepted parameters for PUT requests are:
     *
     *   • -G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) / non-affine for megablast
     *   • -E: cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) / non-affine for megablast
     *   • -r: integer to reward for match. Default = 1
     *   • -q: negative integer for penalty to allow mismatch. Default = -3
     *   • -e: expectation value. Default = 10.0
     *   • -W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28 (megablast)
     *   • -y: dropoff for blast extensions in bits, using default if not specified. Default = 20 for blastn,
     *     7 for all others (except megablast, for which it is not applicable).
     *   • -X: X dropoff value for gapped alignment, in bits. Default = 30 for blastn/megablast, 15 for all others.
     *   • -Z: final X dropoff value for gapped alignment, in bits. Default = 50 for blastn, 25 for all others
     *     (except megablast, for which it is not applicable)
     *   • -P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass. Does not apply to blastn or megablast.
     *   • -A: multiple hits window size. Default = 0 (for single hit algorithm)
     *   • -I: number of database sequences to save hits for. Default = 500
     *   • -Y: effective length of the search space. Default = 0 (0 represents using the whole space)
     *   • -z: a real specifying the effective length of the database to use. Default = 0 (0 represents the real size)
     *   • -c: an integer representing pseudocount constant for PSI-BLAST. Default = 7
     *   • -F: any filtering directive
     *
     * Be aware that this class does no error checking at all on the use of these parameters.
     *
     * @param aStr a String with any number of optional parameters with an associated value.
     */
    public void setAdvancedOptions(String aStr) {
        this.advanced = "OTHER_ADVANCED=" + aStr;
    }

    /**
     * Simply returns the string given as argument via setAdvancedOptions
     *
     * @return advanced the string with the advanced options
     */
    public String getBlastAdvancedOptions() {
        return this.advanced;
    }

    /**
     * Simply returns the QBlast RID for this specific QBlast request
     *
     * @return rid the string with the RID
     */
    public String getBlastRID() {
        return this.rid;
    }

    /**
     * A simple method to check the availability of the QBlast service
     *
     * @throws BioException
     */
    public void printRemoteBlastInfo() throws BioException {
        try {
            // use a fresh connection: a URLConnection cannot be written to
            // again once its response has been read by an earlier request
            URLConnection conn = setQBlastProperties(aUrl.openConnection());

            OutputStreamWriter out = new OutputStreamWriter(conn
                    .getOutputStream());

            out.write("CMD=Info");
            out.flush();

            // Get the response
            BufferedReader rd = new BufferedReader(new InputStreamReader(conn
                    .getInputStream()));

            String line = "";

            while ((line = rd.readLine()) != null) {
                System.out.println(line);
            }

            out.close();
            rd.close();
        } catch (IOException e) {
            throw new BioException(
                    "Impossible to get info from QBlast service at this time. Check your network connection");
        }
    }

    private URLConnection setQBlastProperties(URLConnection conn) {
        conn.setDoOutput(true);
        conn.setUseCaches(false);

        conn.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService");
        conn.setRequestProperty("Connection", "Keep-Alive");
        conn.setRequestProperty("Content-type",
                "application/x-www-form-urlencoded");
        // Content-length is left to URLConnection: a hard-coded "200" does
        // not match the length of the request body actually written

        return conn;
    }
}

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From sylvain.foisy at diploide.net Thu Jun 11 16:21:01 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Thu, 11 Jun 2009 12:21:01 -0400
Subject: [Biojava-dev] First draft of a remote blast service class
In-Reply-To: <061BFD133FA1584693D19C79A0072F5F95FFD9@FLMAIL1.fl.ad.scripps.edu>
Message-ID:

Hi to all,

I have read all of the comments that my code generated and I am taking notes.
I have to admit that some of the material is way above what I am used to and
will need some profound reading/exploration before I address it.

Thanks for the input; I am looking forward to making it better ;-)

Best regards

Sylvain

===================================================================

 Sylvain Foisy, Ph. D.
 Consultant Bio-informatique / Bioinformatics
 Diploide.net - TI pour la vie / IT for Life

 Courriel: sylvain.foisy at diploide.net
 Web: http://www.diploide.net
 Tel: (514) 893-4363

===================================================================

From andreas at sdsc.edu Mon Jun 15 05:27:55 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 14 Jun 2009 22:27:55 -0700
Subject: [Biojava-dev] first three modules
Message-ID: <59a41c430906142227w5f21f18u265dc44d3ca24384@mail.gmail.com>

Hi,

I tested a couple of things today and I have come up with the first 3 new
modules:

biojava-core, biojava-das, and biojava-structure

What is common to all modules is the following directory organization:

* All modules have a trunk, branches and tags directory, where the trunk
  directory contains the main code base.
* Inside a module the following directories exist:
  - src - the code
  - tests - JUnit tests
  - demos - a few example classes that contain main methods that can be run
    as an example

The location of the modules in svn is:

svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/branches/modules/

If you want to browse through the new modules, please see here:

http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/branches/modules

The Maven build will be added a bit later, once a few more modules have been
refactored out.

Any comments so far?

Andreas

From andreas at sdsc.edu Tue Jun 16 03:54:07 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 15 Jun 2009 20:54:07 -0700
Subject: [Biojava-dev] next modules: blast and phylo
Message-ID: <59a41c430906152054k54e9eee4v341bbe46395d8d84@mail.gmail.com>

Hi,

Just a quick update: the next two modules in SVN are

biojava-blast and biojava-phylo

What about a module biojava-biosql?

To repeat, you can also view it in your browser at:

http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/branches/modules

Anonymous svn is at:

svn co svn://code.open-bio.org/biojava/biojava-live/branches/modules/

svn for developers is at:

svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/branches/modules/

Andreas

From abdul.qaddos at gmail.com Wed Jun 17 00:45:14 2009
From: abdul.qaddos at gmail.com (Abdul Qaddus)
Date: Wed, 17 Jun 2009 05:45:14 +0500
Subject: [Biojava-dev] Fwd: Need help for resolving the args[0] issues.
In-Reply-To:
References:
Message-ID:

Hello Support,

I am a new BioJava developer. I have a good knowledge of Java and biology,
but this tool is new to me and I have run into a problem. Below is the code
for reading a GenBank file and converting it into "DNA", "RNA" or "Protein".
I have already added the BioJava library to my project.

From the comments in this code, I understand that running it requires three
arguments: the first is the file name, the second is the file format and the
third is the alphabet. My problem is how to pass these three values into the
source code as args[0], args[1] and args[2]. When I passed these values as
strings, the code generated an error, "illegal statement".

Please help me fix this problem. I would be very thankful for a quick reply.

package biojava;

import java.io.*;

import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.io.*;

public class ReadFasta2 {

    /**
     * This program will read any file supported by SeqIOTools. It takes three
     * arguments: the first is the file name, the second is the name of
     * a file format supported by SeqIOTools (e.g. fasta, genbank etc.).
     * The third argument is the alphabet (e.g. dna, rna, protein).
     *
     * Both the format and alphabet names are case insensitive.
     *
     */
    public static void main(String[] args) {
        try {
            // prepare a BufferedReader for file io
            BufferedReader br = new BufferedReader(new FileReader(args[0]));

            String format = args[1];
            String alphabet = args[2];

            /*
             * get a Sequence Iterator over all the sequences in the file.
             * SeqIOTools.fileToBiojava() returns an Object. If the file read
             * is an alignment format like MSF, an Alignment object is returned;
             * otherwise a SequenceIterator is returned.
             */
            SequenceIterator iter =
                    (SequenceIterator) SeqIOTools.fileToBiojava(format, alphabet, br);
        } catch (FileNotFoundException ex) {
            // can't find file specified by args[0]
            ex.printStackTrace();
        } catch (BioException ex) {
            // error parsing requested format
            ex.printStackTrace();
        }
    }
}

--
Abdul Qaddus
www.futurelinkers.com
Cell No:- +92-3336540863

From fbristow at gmail.com Fri Jun 19 02:31:13 2009
From: fbristow at gmail.com (Franklin Bristow)
Date: Thu, 18 Jun 2009 21:31:13 -0500
Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer
Message-ID: <50a7756d0906181931t744faa52r734df8c3c10b78cb@mail.gmail.com>

Hi Everyone,

I've just spent the last few days putting together an extended ABIF parser
and an SCF writer. The parser that I wrote extends the existing ABIFParser
but takes into account much of the information that was made available a few
years ago when ABI released the ABIF File Format specification
(http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf).
I've heavily based my code and methods on the Perl implementation of the
ABIF parser, Bio::Trace::ABIF, by Nicola Vitacolonna.

I also wrote a writer for SCF formatted chromatograms. I wrote this mostly
using the documentation found in the Staden formats documentation
(http://staden.sourceforge.net/manual/formats_unix_2.html and
http://iubio.bio.indiana.edu/soft/molbio/molbio.old/staden/www_pages/scf-rfc.html).

Finally, I have written a small utility class that will prepare an
ABIFChromatogram for writing out as an SCF formatted file. This is the
entire reason that I wrote both of the above classes. I will admit that
there is a pretty nasty hack in the SCFUtils class, but it was the quickest
way I could think of doing what I needed to do. I use reflection in order
to make a protected method accessible so that I could set the value myself
without having to subclass ABIFChromatogram. Of course, I would like to
change this, but the circumstances under which I have had to write this code
forced me to do it this way for now.

All of this code is written for Java 5, but if it is necessary to change it
for inclusion into your source tree I will make the change.

So, I welcome comments and suggestions on how I can improve this to make it
appealing enough to have it included in BioJava in the future.

Since the code is rather long, I've attached it as a zip file. Andreas told
me that he would keep an eye on the filters for it and would let it through
when he saw it, so hopefully it makes it through okay.

Thanks everyone for your time!

--
Franklin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ABIFParser.zip
Type: application/zip
Size: 19225 bytes
Desc: not available
URL:

From mark.schreiber at novartis.com Fri Jun 19 05:23:22 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 19 Jun 2009 13:23:22 +0800
Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer
In-Reply-To: <50a7756d0906181931t744faa52r734df8c3c10b78cb@mail.gmail.com>
Message-ID:

Hi Franklin -

If there is a good argument for making the protected setBaseCallAlignment
method public then we could look at changing this so you don't need to use
reflection. As you say in your code comments, this reflection will not work
unless the security policy allows it, which will not be the case in many
systems.
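
As a rough illustration of the reflection workaround under discussion, here is a minimal sketch. It assumes a stand-in class: `StandInChromatogram`, its String-based setter and `ReflectionSketch` are hypothetical names made up for this example and are not BioJava's actual classes or signatures. It only shows the general shape of calling a protected method reflectively and where a restrictive security policy would make it fail.

```java
import java.lang.reflect.Method;

/** Hypothetical stand-in for a library class with a protected setter (not BioJava's class). */
class StandInChromatogram {
    private String baseCalls = "";

    protected void setBaseCallAlignment(String calls) {
        this.baseCalls = calls;
    }

    public String getBaseCalls() {
        return baseCalls;
    }
}

class ReflectionSketch {

    /** Calls the protected setter reflectively; returns false if the call is not permitted. */
    static boolean setViaReflection(StandInChromatogram target, String calls) {
        try {
            Method m = StandInChromatogram.class
                    .getDeclaredMethod("setBaseCallAlignment", String.class);
            // From another package this call is what lifts the access check;
            // it throws SecurityException when the security policy forbids it.
            m.setAccessible(true);
            m.invoke(target, calls);
            return true;
        } catch (SecurityException e) {
            return false;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        StandInChromatogram c = new StandInChromatogram();
        boolean ok = setViaReflection(c, "ACGT");
        System.out.println(ok + " " + c.getBaseCalls());
    }
}
```

The caller has to handle the failure case at run time, which is the main drawback compared with a compile-time checked public method.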
Another alternative would be to modify ABIFChromatogram and provide a public method that lets people safely call the setBaseCallAlignment (requires write access to the SVN). Finally you could extend ABIFChromatogram and add a public method that will call the protected method (of course this won't work if the method is private). Nice to see well documented code! - Mark biojava-dev-bounces at lists.open-bio.org wrote on 06/19/2009 10:31:13 AM: > Hi Everyone, > I've just spent the last few days putting together an extended ABIF parser > and and SCF writer. The parser that I wrote extends the existing ABIFParser > but takes into account much of the information that was made available a few > years ago when ABI released the ABIF File Format specification ( > http://www.appliedbiosystems. > com/support/software_community/ABIF_File_Format.pdf). > I've heavily based my code and methods on the perl implementation of the > ABIF parser Bio::Trace::ABIF by Nicola Vitacolonna. > > I also wrote a writer for SCF formatted chromatograms. I wrote this mostly > using the documentation found in the staden formats documentation ( > http://staden.sourceforge.net/manual/formats_unix_2.html and > http://iubio.bio.indiana.edu/soft/molbio/molbio. > old/staden/www_pages/scf-rfc.html > ). > > Finally, I have written a small utility class that will prepare an > ABIFChromatogram for writing out as an SCF formatted file. This is the > entire reason that I wrote both of the above classes. I will admit that > there is a pretty nasty hack in the SCFUtils class, but it was the quickest > way I could think of doing what I needed to do. I use reflection in order > to make a protected method accessible so that I could set the value myself > without having to subclass ABIFChromatogram. Of course, I would like to > change this but the circumstances under which I have had to write this code > forced me to do it this way for now. 
> > All of this code is written for Java 5, but if it is necessary to change it > for inclusion into your source tree I will make the change. > > So, I welcome comments and suggestions on how I can improve this to make it > appealing enough to have it included in biojava in the future. > > Since the code is rather long, I've attached it as a zip file. Andreas told > me that he would keep an eye on the filters for it and would let it through > when he saw it, so hopefully it makes it through okay. > > Thanks everyone for your time! > > -- > Franklin > [attachment "ABIFParser.zip" deleted by Mark Schreiber/GP/Novartis] > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev _________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. From fbristow at gmail.com Fri Jun 19 14:52:13 2009 From: fbristow at gmail.com (Franklin Bristow) Date: Fri, 19 Jun 2009 09:52:13 -0500 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer In-Reply-To: References: <50a7756d0906181931t744faa52r734df8c3c10b78cb@mail.gmail.com> Message-ID: <50a7756d0906190752t76ac1895h5ea3fa776a582cb@mail.gmail.com> Hi Richard, I've written a very small private class that extends ABIFChromatogram. 
This class has a method that basically copies what you have done in ABIFChromatogram when you load a file as an ABIFChromatogram, specifically: > /** > * Create an instance of an ExtendedABIFChromatogram using the > supplied > * file. This is meant to be called in lieu of the static create > method > * that is found in {@link ABIFChromatogram}. > * > * @param f > * the ABIF formatted file > * @return an instance of ExtendedABIFChromatogram > * @throws UnsupportedChromatogramFormatException > * the file supplied is not an ABIF formatted > chromatogram > * @throws IOException > * if an I/O error occurs > */ > public ExtendedABIFChromatogram createExtended(File f) > throws UnsupportedChromatogramFormatException, IOException > { > new Parser(f); > return this; > } > This removes the need for using reflection to alter the accessibility of the methods. I've attached the updated code to this message, I hope that you will allow it through your filters again. Thanks again for having a look at my code! Thanks, Franklin On Fri, Jun 19, 2009 at 12:23 AM, wrote: > > Hi Franklin - > > If there is a good argument for making the protected setBaseCallAlignment > method public then we could look at changing this so you don't need to use > reflection. As you say in your code comments this reflection will not work > unless the security policy allows it which will not be the case in many > systems. > > Another alternative would be to modify ABIFChromatogram and provide a > public method that lets people safely call the setBaseCallAlignment > (requires write access to the SVN). Finally you could extend > ABIFChromatogram and add a public method that will call the protected method > (of course this won't work if the method is private). > > Nice to see well documented code! > > - Mark > > biojava-dev-bounces at lists.open-bio.org wrote on 06/19/2009 10:31:13 AM: > > > > Hi Everyone, > > I've just spent the last few days putting together an extended ABIF > parser > > and and SCF writer. 
The parser that I wrote extends the existing > ABIFParser > > but takes into account much of the information that was made available a > few > > years ago when ABI released the ABIF File Format specification ( > > http://www.appliedbiosystems. > > com/support/software_community/ABIF_File_Format.pdf). > > I've heavily based my code and methods on the perl implementation of the > > ABIF parser Bio::Trace::ABIF by Nicola Vitacolonna. > > > > I also wrote a writer for SCF formatted chromatograms. I wrote this > mostly > > using the documentation found in the staden formats documentation ( > > http://staden.sourceforge.net/manual/formats_unix_2.html and > > http://iubio.bio.indiana.edu/soft/molbio/molbio. > > old/staden/www_pages/scf-rfc.html > > ). > > > > Finally, I have written a small utility class that will prepare an > > ABIFChromatogram for writing out as an SCF formatted file. This is the > > entire reason that I wrote both of the above classes. I will admit that > > there is a pretty nasty hack in the SCFUtils class, but it was the > quickest > > way I could think of doing what I needed to do. I use reflection in > order > > to make a protected method accessible so that I could set the value > myself > > without having to subclass ABIFChromatogram. Of course, I would like to > > change this but the circumstances under which I have had to write this > code > > forced me to do it this way for now. > > > > All of this code is written for Java 5, but if it is necessary to change > it > > for inclusion into your source tree I will make the change. > > > > So, I welcome comments and suggestions on how I can improve this to make > it > > appealing enough to have it included in biojava in the future. > > > > Since the code is rather long, I've attached it as a zip file. Andreas > told > > me that he would keep an eye on the filters for it and would let it > through > > when he saw it, so hopefully it makes it through okay. > > > > Thanks everyone for your time! 
> > > > -- > > Franklin > > [attachment "ABIFParser.zip" deleted by Mark Schreiber/GP/Novartis] > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > _________________________ > > CONFIDENTIALITY NOTICE > > The information contained in this e-mail message is intended only for the > exclusive use of the individual or entity named above and may contain > information that is privileged, confidential or exempt from disclosure under > applicable law. If the reader of this message is not the intended recipient, > or the employee or agent responsible for delivery of the message to the > intended recipient, you are hereby notified that any dissemination, > distribution or copying of this communication is strictly prohibited. If you > have received this communication in error, please notify the sender > immediately by e-mail and delete the material from any computer. Thank you. > -- Franklin -------------- next part -------------- A non-text attachment was scrubbed... Name: abifparser.zip Type: application/zip Size: 20254 bytes Desc: not available URL: From holland at eaglegenomics.com Fri Jun 19 15:00:49 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 19 Jun 2009 16:00:49 +0100 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer In-Reply-To: <50a7756d0906190752t76ac1895h5ea3fa776a582cb@mail.gmail.com> References: <50a7756d0906181931t744faa52r734df8c3c10b78cb@mail.gmail.com> <50a7756d0906190752t76ac1895h5ea3fa776a582cb@mail.gmail.com> Message-ID: <1245423649.16991.13.camel@buzzybee> Sorry I haven't been following the thread... what problem is this a solution for? thanks, Richard On Fri, 2009-06-19 at 09:52 -0500, Franklin Bristow wrote: > Hi Richard, > I've written a very small private class that extends ABIFChromatogram. 
This > class has a method that basically copies what you have done in > ABIFChromatogram when you load a file as an ABIFChromatogram, specifically: > > > /** > > * Create an instance of an ExtendedABIFChromatogram using the > > supplied > > * file. This is meant to be called in lieu of the static create > > method > > * that is found in {@link ABIFChromatogram}. > > * > > * @param f > > * the ABIF formatted file > > * @return an instance of ExtendedABIFChromatogram > > * @throws UnsupportedChromatogramFormatException > > * the file supplied is not an ABIF formatted > > chromatogram > > * @throws IOException > > * if an I/O error occurs > > */ > > public ExtendedABIFChromatogram createExtended(File f) > > throws UnsupportedChromatogramFormatException, IOException > > { > > new Parser(f); > > return this; > > } > > > This removes the need for using reflection to alter the accessibility of the > methods. > > I've attached the updated code to this message, I hope that you will allow > it through your filters again. Thanks again for having a look at my code! > > Thanks, > Franklin > > On Fri, Jun 19, 2009 at 12:23 AM, wrote: > > > > > Hi Franklin - > > > > If there is a good argument for making the protected setBaseCallAlignment > > method public then we could look at changing this so you don't need to use > > reflection. As you say in your code comments this reflection will not work > > unless the security policy allows it which will not be the case in many > > systems. > > > > Another alternative would be to modify ABIFChromatogram and provide a > > public method that lets people safely call the setBaseCallAlignment > > (requires write access to the SVN). Finally you could extend > > ABIFChromatogram and add a public method that will call the protected method > > (of course this won't work if the method is private). > > > > Nice to see well documented code! 
> > > > - Mark > > > > biojava-dev-bounces at lists.open-bio.org wrote on 06/19/2009 10:31:13 AM: > > > > > > > Hi Everyone, > > > I've just spent the last few days putting together an extended ABIF > > parser > > > and an SCF writer. The parser that I wrote extends the existing > > ABIFParser > > > but takes into account much of the information that was made available a > > few > > > years ago when ABI released the ABIF File Format specification ( > > > http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf). > > > I've heavily based my code and methods on the perl implementation of the > > > ABIF parser Bio::Trace::ABIF by Nicola Vitacolonna. > > > > > > I also wrote a writer for SCF formatted chromatograms. I wrote this > > mostly > > > using the documentation found in the staden formats documentation ( > > > http://staden.sourceforge.net/manual/formats_unix_2.html and > > > http://iubio.bio.indiana.edu/soft/molbio/molbio.old/staden/www_pages/scf-rfc.html > > > ). > > > > > > Finally, I have written a small utility class that will prepare an > > > ABIFChromatogram for writing out as an SCF formatted file. This is the > > > entire reason that I wrote both of the above classes. I will admit that > > > there is a pretty nasty hack in the SCFUtils class, but it was the > > quickest > > > way I could think of doing what I needed to do. I use reflection in > > order > > > to make a protected method accessible so that I could set the value > > myself > > > without having to subclass ABIFChromatogram. Of course, I would like to > > > change this but the circumstances under which I have had to write this > > code > > > forced me to do it this way for now. > > > > > > All of this code is written for Java 5, but if it is necessary to change > > it > > > for inclusion into your source tree I will make the change.
> > > > > > So, I welcome comments and suggestions on how I can improve this to make > > it > > > appealing enough to have it included in biojava in the future. > > > > > > Since the code is rather long, I've attached it as a zip file. Andreas > > told > > > me that he would keep an eye on the filters for it and would let it > > through > > > when he saw it, so hopefully it makes it through okay. > > > > > > Thanks everyone for your time! > > > > > > -- > > > Franklin > > > [attachment "ABIFParser.zip" deleted by Mark Schreiber/GP/Novartis]
-- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From fbristow at gmail.com Fri Jun 19 15:29:51 2009 From: fbristow at gmail.com (Franklin Bristow) Date: Fri, 19 Jun 2009 10:29:51 -0500 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer In-Reply-To: <1245423649.16991.13.camel@buzzybee> References: <50a7756d0906181931t744faa52r734df8c3c10b78cb@mail.gmail.com> <50a7756d0906190752t76ac1895h5ea3fa776a582cb@mail.gmail.com> <1245423649.16991.13.camel@buzzybee> Message-ID: <50a7756d0906190829s30646cefv7055c80d1fce39e9@mail.gmail.com> Sorry Richard, I meant to respond to Mark, I'm very sleepy this morning... Thanks, Franklin On Fri, Jun 19, 2009 at 10:00 AM, Richard Holland wrote: > Sorry I haven't been following the thread... what problem is this a > solution for? > > thanks, > Richard > > On Fri, 2009-06-19 at 09:52 -0500, Franklin Bristow wrote: > > Hi Richard, > > I've written a very small private class that extends ABIFChromatogram. > This > > class has a method that basically copies what you have done in > > ABIFChromatogram when you load a file as an ABIFChromatogram, > specifically: > > > > > /** > > > * Create an instance of an ExtendedABIFChromatogram using the > > > supplied > > > * file. This is meant to be called in lieu of the static > create > > > method > > > * that is found in {@link ABIFChromatogram}.
> > > * > > > * @param f > > > * the ABIF formatted file > > > * @return an instance of ExtendedABIFChromatogram > > > * @throws UnsupportedChromatogramFormatException > > > * the file supplied is not an ABIF formatted > > > chromatogram > > > * @throws IOException > > > * if an I/O error occurs > > > */ > > > public ExtendedABIFChromatogram createExtended(File f) > > > throws UnsupportedChromatogramFormatException, > IOException > > > { > > > new Parser(f); > > > return this; > > > } > > > > > This removes the need for using reflection to alter the accessibility of > the > > methods. > > > > I've attached the updated code to this message, I hope that you will > allow > > it through your filters again. Thanks again for having a look at my > code! > > > > Thanks, > > Franklin > > > > On Fri, Jun 19, 2009 at 12:23 AM, wrote: > > > > > > > > Hi Franklin - > > > > > > If there is a good argument for making the protected > setBaseCallAlignment > > > method public then we could look at changing this so you don't need to > use > > > reflection. As you say in your code comments this reflection will not > work > > > unless the security policy allows it which will not be the case in many > > > systems. > > > > > > Another alternative would be to modify ABIFChromatogram and provide a > > > public method that lets people safely call the setBaseCallAlignment > > > (requires write access to the SVN). Finally you could extend > > > ABIFChromatogram and add a public method that will call the protected > method > > > (of course this won't work if the method is private). > > > > > > Nice to see well documented code! > > > > > > - Mark > > > > > > biojava-dev-bounces at lists.open-bio.org wrote on 06/19/2009 10:31:13 > AM: > > > > > > > > > > Hi Everyone, > > > > I've just spent the last few days putting together an extended ABIF > > > parser > > > > and an SCF writer.
The parser that I wrote extends the existing > > > ABIFParser > > > > but takes into account much of the information that was made > available a > > > few > > > > years ago when ABI released the ABIF File Format specification ( > > > > http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf). > > > > I've heavily based my code and methods on the perl implementation of > the > > > > ABIF parser Bio::Trace::ABIF by Nicola Vitacolonna. > > > > > > > > I also wrote a writer for SCF formatted chromatograms. I wrote this > > > mostly > > > > using the documentation found in the staden formats documentation ( > > > > http://staden.sourceforge.net/manual/formats_unix_2.html and > > > > http://iubio.bio.indiana.edu/soft/molbio/molbio.old/staden/www_pages/scf-rfc.html > > > > ). > > > > > > > > Finally, I have written a small utility class that will prepare an > > > > ABIFChromatogram for writing out as an SCF formatted file. This is > the > > > > entire reason that I wrote both of the above classes. I will admit > that > > > > there is a pretty nasty hack in the SCFUtils class, but it was the > > > quickest > > > > way I could think of doing what I needed to do. I use reflection in > > > order > > > > to make a protected method accessible so that I could set the value > > > myself > > > > without having to subclass ABIFChromatogram. Of course, I would like > to > > > > change this but the circumstances under which I have had to write > this > > > code > > > > forced me to do it this way for now. > > > > > > > > All of this code is written for Java 5, but if it is necessary to > change > > > it > > > > for inclusion into your source tree I will make the change. > > > > > > > > So, I welcome comments and suggestions on how I can improve this to > make > > > it > > > > appealing enough to have it included in biojava in the future. > > > > > > > > Since the code is rather long, I've attached it as a zip file.
> Andreas > > > told > > > > me that he would keep an eye on the filters for it and would let it > > > through > > > > when he saw it, so hopefully it makes it through okay. > > > > > > > > Thanks everyone for your time! > > > > > > > > -- > > > > Franklin > > > > [attachment "ABIFParser.zip" deleted by Mark Schreiber/GP/Novartis] > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > -- Franklin -------------- next part -------------- A non-text attachment was scrubbed...
Name: abifparser.zip Type: application/zip Size: 20254 bytes Desc: not available URL: From andreas at sdsc.edu Sat Jun 20 16:45:51 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Sat, 20 Jun 2009 09:45:51 -0700 Subject: [Biojava-dev] BioJava user meeting at ISMB/BOSC Message-ID: <59a41c430906200945q598503ccj52717cf708b67083@mail.gmail.com> Hi, Next week the ISMB and BOSC conferences will take place in Stockholm, Sweden. As has become kind of a tradition, we will have a BioJava user meeting around BOSC. If you are in Stockholm at the time please join us on Sunday, late afternoon. We will meet during the "Birds of a Feather" session. http://open-bio.org/wiki/BOSC_2009/Birds-of-a-Feather Looking forward to meeting you there, Andreas From bugzilla-daemon at portal.open-bio.org Mon Jun 29 20:30:11 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Jun 2009 16:30:11 -0400 Subject: [Biojava-dev] [Bug 2540] RichSequenceIterator does not skip sequence when exception is thrown In-Reply-To: Message-ID: <200906292030.n5TKUBwq020788@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2540 vdmerwe.karen at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From markjschreiber at gmail.com Tue Jun 30 08:33:28 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 30 Jun 2009 16:33:28 +0800 Subject: [Biojava-dev] Singletons are bad Message-ID: <93b45ca50906300133w58109024vb89c6970a8446fed@mail.gmail.com> I came across an interesting article today about how singletons seem like a good idea, but after a while you realise they get you into serious trouble.
After playing with BioJava for over 10 years I completely concur. Singletons and flyweight objects are (IMHO) the most serious problem in the BioJava code base and, as the article predicts, the BJ code base is completely infected with them. The article is here: http://tech.puredanger.com/2007/07/03/pattern-hate-singleton/ But I have copied the relevant passage below, as it seems to offer a way out without completely breaking everything. This should be seriously considered for future BJ releases. ... paste starts here But I already have a bunch of singletons in my code! Sometimes, you'll have a system (built by you or someone else) that is heavily dependent on some singletons. Often, you will find this annoying as you try to test and/or add functionality to the system. To refactor the singletons out of your system, you need to start from each point of use and allow the singleton to be set as a dependency on the component using it, rather than calling to the singleton's getInstance() method. Doing so moves the singleton access (but not use) up one level. Repeat until the singleton's getInstance() method is called in as few places as possible (ideally one). At this point, all components in the system declare their dependence on the concrete singleton class and that singleton class is instantiated at a very few points at the "top" of your architecture (then passed down through the systems). Next, it's time to apply some classic refactoring. Most importantly, we want to change the concrete singleton class into an interface and move the existing concrete implementation into a new default implementation class implementing the interface. Finally, you'll probably want to clean up the calls to getInstance() with either a call to new the concrete default implementation or a factory method that can do that for you.
This transformation should make all of your components dependent on an injectable, interface-defined component, which is easy to mock or swap in during unit testing of the component itself. It also typically makes testing of the concrete singleton implementation itself a breeze compared to the prior implementation. Note that the first phase of bubbling the singleton instantiation up through the architecture can be done as slowly as needed and does not need to be done all at once. You'll find the second phase is fairly easy with any modern IDE once you get to that point.
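
For anyone who prefers to see it in code, the two phases described in the pasted text might look roughly like the sketch below. This is only an illustration under my own assumptions: SymbolRegistry, DefaultSymbolRegistry and SequenceDescriber are hypothetical names invented for the example and are not BioJava classes, and the lookup table is a toy. The point is just that the component takes the former singleton as a constructor argument typed by an interface, and the concrete class is created once at the "top".

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: every type here is hypothetical, not BioJava API.
class SingletonRefactorDemo {

    /** Phase 2: the former singleton's behaviour expressed as an interface. */
    interface SymbolRegistry {
        String lookup(String name);
    }

    /** The old concrete singleton logic, moved into a default implementation. */
    static class DefaultSymbolRegistry implements SymbolRegistry {
        private final Map<String, String> symbols = new HashMap<>();

        DefaultSymbolRegistry() {
            symbols.put("a", "adenine");
            symbols.put("g", "guanine");
        }

        @Override
        public String lookup(String name) {
            String symbol = symbols.get(name);
            if (symbol == null) {
                throw new IllegalArgumentException("Unknown symbol: " + name);
            }
            return symbol;
        }
    }

    /** Phase 1: the component is handed its dependency; no getInstance() call. */
    static class SequenceDescriber {
        private final SymbolRegistry registry;

        SequenceDescriber(SymbolRegistry registry) {
            this.registry = registry;
        }

        /** Joins the looked-up name of each character with '-'. */
        String describe(String sequence) {
            StringBuilder sb = new StringBuilder();
            for (char c : sequence.toCharArray()) {
                if (sb.length() > 0) {
                    sb.append('-');
                }
                sb.append(registry.lookup(String.valueOf(c)));
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        // The single place at the "top" where the concrete class is created.
        SymbolRegistry registry = new DefaultSymbolRegistry();
        SequenceDescriber describer = new SequenceDescriber(registry);
        System.out.println(describer.describe("ga")); // prints "guanine-adenine"
    }
}
```

In a unit test the describer can then be given a stub registry (for example a lambda that upper-cases its argument) instead of the default implementation, which is the "easy to mock or swap in" benefit the article mentions.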