[Biojava-dev] First draft of a remote blast service class
Richard Holland
holland at eaglegenomics.com
Thu Jun 11 14:17:35 UTC 2009
Good stuff! My 2p's worth:
setSequence() should be overloaded to accept all forms of possible
sequence input - whatever is decided on as the standard way of
referencing sequence data in BJ3. The original plan for BJ3 was to allow
String/CharSequence and List<Symbol> (see
http://www.biojava.org/wiki/BioJava3:HowTo )
setAdvancedOptions() should not accept a String, but rather a Properties
or a Map<String,String>, where the keys of the Map/Properties are
restricted to a range of acceptable values determined (and published,
maybe as an enum?) by each of the implementation classes (e.g.
RemoteQBlastService). The implementation class then uses this to
construct the call string. The reason for doing it this way is that (a)
it allows the parameters to be verified by checking them against a known
list of allowable key/values, and (b) it allows for non-URL based remote
requests to be constructed from the values, e.g. SOAP calls.
I would also replace the static int HTML/TEXT/XML with an enum as
numeric constants are sometimes a Bad Thing.
The setProgram() method in my mind is specific to Blast, as opposed to
being a generic Pairwise Alignment concept. Therefore it might be better
to move this to a Blast-specific sub-interface or make it only appear in
the implementation classes that refer to Blast.
Finally, the JavaDocs for the various set() methods are incorrect -
they're all mostly the same in fact! :)
But overall it looks good.
cheers,
Richard
On Thu, 2009-06-11 at 09:52 -0400, Sylvain Foisy wrote:
> Hi to all,
>
> I've been working on this for the past week or so and after discussing this
> with Andreas, I am putting my code here for critical review. I'll put this
> stuff in biojava-live as soon as Andreas can fix my SVN access.
>
> First, an interface called RemotePairwiseAlignementSerivce defines the basic
> components of a remote service: sequence/database/progam/run options/output
> options. RemoteQBlastService implements this interface and runs remote
> Qblast requests and creates output in either text, XML or HTML. At present
> time, regular blastall programs work, no blastpgp/megablast support yet.
>
> I'll need some guidance to make it work on other type of web services like
> EBI.
>
> Best regards
>
> Sylvain
>
> ===================================================================
>
> Sylvain Foisy, Ph. D.
> Consultant Bio-informatique / Bioinformatics
> Diploide.net - TI pour la vie / IT for Life
>
> Courriel: sylvain.foisy at diploide.net
> Web: http://www.diploide.net
> Tel: (514) 893-4363
> ===================================================================
>
> import java.io.InputStream;
>
> import org.biojava.bio.BioException;
> /**
> * This interface specifies minimal information needed to execute a pairwise
> alignment on a remote service.
> *
> * Example of service: QBlast service at NCBI
> * Web Service at EBI
> *
> * @author Sylvain Foisy
> * @since 1.8
> *
> */
> public interface RemotePairwiseAlignementService {
>
> /**
> * This field specifies that the output format of results
> * is text.
> *
> */
> public static final int TEXT = 0;
>
> /**
> * This field specifies that the output format of results
> * is XML.
> *
> */
> public static final int XML = 1;
>
> /**
> * This field specifies that the output format of results
> * is HTML.
> *
> */
> public static final int HTML = 2;
>
> /**
> * Setting the database to use for doing the pairwise alignment
> *
> * @param db: a <code>String</code> with a valid database ID for the
> service used.
> *
> */
> public void setDatabase(String db);
>
> /**
> * Setting the sequence to be align for this for this request
> *
> * @param seq: a <code>String</code> with a sequence to be aligned.
> *
> */
> public void setSequence(String seq);
>
> /**
> * Setting the program to use for this pairwise alignment
> *
> * @param prog: a <code>String</code> with a valid database ID for the
> service used.
> *
> */
> public void setProgram(String prog);
>
> /**
> * Setting all other options to use for this pairwise alignment
> *
> * @param db: a <code>String</code> with a valid database ID for the
> service used.
> *
> */
> public void setAdvancedOptions(String str);
>
> /**
> * Doing the actual analysis on the instantiated service
> *
> * @throws BioException
> */
> public void executeSearch() throws BioException;
>
> /**
> * Getting the actual alignment results from this instantiated service
> *
> * @return : an <code>InputStream</code> with the actual alignment
> results
> * @throws BioException
> */
> public InputStream getAlignmentResults() throws BioException;
> }
>
> import java.io.BufferedReader;
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.InputStreamReader;
> import java.io.OutputStreamWriter;
> import java.net.MalformedURLException;
> import java.net.URL;
> import java.net.URLConnection;
>
> import org.biojava.bio.BioException;
>
> /**
> * RemoteQBlastService - A simple way of submitting BLAST request to the
> QBlast
> * service at NCBI.
> *
> * <p>
> * NCBI provides a Blast server through a CGI-BIN interface.
> RemoteQBlastService simply
> * encapsulates an access to it by giving users access to get/set methods to
> fix
> * sequence, program and database as well as advanced options.
> * </p>
> *
> * <p>
> * As of version 1.0, only blastall programs are usable. blastpgp and
> megablast are high-priorities.
> * </p>
> *
> * @author Sylvain Foisy
> * @version 1.0
> * @since 1.8
> *
> *
> */
> public class RemoteQBlastService implements RemotePairwiseAlignementService{
>
> // public static final int TEXT = 0;
> // public static final int XML = 1;
> // public static final int HTML = 2;
>
> private static String baseurl =
> "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi";
> private URL aUrl;
> private URLConnection uConn;
> private OutputStreamWriter fromQBlast;
> private BufferedReader rd;
>
> private String seq = null;
> private String prog = null;
> private String db = null;
> private String outputFormat = null;
> private String advanced = null;
>
> private String rid;
> private long step;
> private boolean done = false;
> private long start;
>
> public RemoteQBlastService() throws BioException {
> try {
> aUrl = new URL(baseurl);
> uConn = setQBlastProperties(aUrl.openConnection());
>
> outputFormat = "Text";
> }
> /*
> * Needed but should never be thrown since the URL is static and
> known to exist
> */
> catch (MalformedURLException e) {
> throw new BioException("It looks like the URL for NCBI QBlast
> service is bad");
> }
> /*
> * Intercept if the program can't connect to QBlast service
> */
> catch (IOException e) {
> throw new BioException(
> "Impossible to connect to QBlast service at this time.
> Check your network connection");
> }
> }
>
> /**
> * This method execute the Blast request via the Put command of the
> CGI-BIN
> * interface. It gets the estimated time of completion by capturing the
> * value of the RTOE variable and sets a loop that will check for
> completion
> * of analysis at intervals specified by RTOE.
> *
> * <p>
> * It also capture the value for the RID variable, necessary for
> fetching
> * the actual results after completion.
> * </p>
> *
> * @throws BioException
> * if it is not possible to sent the BLAST command
> */
> public void executeSearch() throws BioException {
>
> if (seq == null || db == null || prog == null) {
> throw new BioException(
> "Impossible to execute QBlast request. One or more of
> seq|db|prog has not been set");
> }
> /*
> * sending the command to execute the Blast analysis
> */
> String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&"
> + db + "&" + "FORMAT_TYPE=HTML";
>
> if (advanced != null) {
> cmd += cmd + "&" + advanced;
> }
>
> try {
>
> uConn = setQBlastProperties(aUrl.openConnection());
>
> fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>
> fromQBlast.write(cmd);
> fromQBlast.flush();
>
> // Get the response
> rd = new BufferedReader(new InputStreamReader(uConn
> .getInputStream()));
>
> String line = "";
>
> while ((line = rd.readLine()) != null) {
> if (line.contains("RID")) {
> String[] arr = line.split("=");
> rid = arr[1].trim();
> } else if (line.contains("RTOE")) {
> String[] arr = line.split("=");
> step = Long.parseLong(arr[1].trim()) * 1000;
> start = System.currentTimeMillis() + step;
> }
> }
> } catch (IOException e) {
> throw new BioException(
> "Can't submit sequence to BLAST server at this time.");
> }
> /*
> * Getting the info out of the NCBI system
> */
> while (!done) {
> long prez = System.currentTimeMillis();
> done = isReady(rid, prez);
> }
> }
>
> /**
> * <p>This method is used only for the executeBlastSearch method to
> check for completion of
> * request using the NCBI specified RTOE variable</p>
> *
> * @param id
> * @param present
> * @return
> */
> private boolean isReady(String id, long present) {
>
> boolean ready = false;
> String check = "CMD=Get&RID=" + id;
> /*
> * If present time is less than the start of the search added to
> step
> * obtained from NCBI, just do nothing ;-)
> */
> if (present < start) {
> ;
> }
> /*
> * If we are at least step seconds in the future from the actual
> call of
> * method executeBlastSearch()
> */
> else {
> try {
> uConn = setQBlastProperties(aUrl.openConnection());
>
> fromQBlast = new
> OutputStreamWriter(uConn.getOutputStream());
> fromQBlast.write(check);
> fromQBlast.flush();
>
> rd = new BufferedReader(new InputStreamReader(uConn
> .getInputStream()));
>
> String line = "";
>
> while ((line = rd.readLine()) != null) {
> if (line.contains("READY")) {
> ready = true;
> } else if (line.contains("WAITING")) {
> /*
> * Else, move start forward in time...
> */
> start = present + step;
> }
> }
> } catch (IOException e) {
> e.printStackTrace();
> }
> }
> return ready;
> }
>
> /**
> * <p>This method extracts this actual Blast report. The default format
> is Text but can be changed before with the method
> * setQBlastOutputFormat.</p>
> *
> *
> * @return
> * @throws BioException
> */
> public InputStream getAlignmentResults() throws BioException {
> String srid = "CMD=Get&RID=" + rid;
> srid += "&FORMAT_TYPE=" + outputFormat;
>
> if(!this.done){
> throw new BioException("Unable to get report at this time. Your
> Blast request has not been processed yet.");
> }
>
> try {
> uConn = setQBlastProperties(aUrl.openConnection());
>
> fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
> fromQBlast.write(srid);
> fromQBlast.flush();
>
> return uConn.getInputStream();
>
> } catch (IOException ioe) {
> throw new BioException(
> "It is not possible to fetch Blast report from NCBI at
> this time");
> }
> }
>
> /**
> * <p>
> * Set the sequence to be blasted using the String that correspond to
> the
> * sequence.
> * </p>
> *
> * <p>
> * Take note that this method is mutually exclusive to setGIToBlast()
> for a
> * given Blast request.
> * </p>
> *
> * @param aStr
> * : a String with the sequence
> */
> public void setSequence(String aStr) {
> this.seq = "QUERY=" + aStr;
> }
>
> /**
> * Simply return a string with the blasted sequence.
> *
> * @return seq : a string with the sequence
> */
> public String getSeqToBlast() {
> return this.seq;
> }
>
> /**
> * <p>
> * Set the sequence to be blasted using the NCBI GI value. At this time,
> * there is no effort made to check the validity of this GI.
> * </p>
> *
> * <p>
> * Take note that this method is mutually exclusive to setSeqToBlast()
> for a
> * given Blast request.
> * </p>
> *
> * @param gi
> * : an integer value representing a NCBI GI
> */
> public void setGIToBlast(String gi) {
> this.seq = "QUERY=" + gi;
> }
>
> /**
> * <p>
> * Simply return a string with the sequence blasted.
> * </p>
> *
> * @return GI : a String with the GI of the blasted sequence
> */
> public String getGIToBlast() {
> return this.seq;
> }
>
> /**
> * <p>
> * This method set the program to be used to blast the given
> sequence/GI. At
> * this time, there is no attempt at checking the matching of sequence
> type
> * to program.
> * </p>
> *
> * @param prog
> * : a String representing the program specified for this
> QBlast
> * request.
> *
> */
> public void setProgram(String prog) {
> this.prog = "PROGRAM=" + prog;
> }
>
> /**
> * <p>
> * Simply returns the program used for the given Blast request.
> * </p>
> *
> * @return prog : a String with the program used for this QBlast
> request.
> */
> public String getProgram() {
> return this.prog;
> }
>
> /**
> * <p>
> * This method set the database to be used to blast the given
> sequence/GI.
> * At this time, there is no attempt at checking the matching of
> sequence
> * type to database.
> * </p>
> *
> * @param db: a String for the database specified for this QBlast
> request
> */
> public void setDatabase(String db) {
> this.db = "DATABASE=" + db;
> }
>
> /**
> * <p>
> * Simply returns the database used for the given Blast request.
> * </p>
> *
> * @return db: a String with the database used for this QBlast request.
> */
> public String getBlastDatabase() {
> return this.db;
> }
>
> /**
> * <p>This method let the user specify which format to use for
> generating the output.</p>
> *
> * @param type:an integer taken from the static constant of this class,
> either be TEXT, XML or HTML
> */
> public void setQBlastOutputFormat(int type) {
>
> switch (type) {
> case 0:
> this.outputFormat = "Text";
> break;
> case 1:
> this.outputFormat = "XML";
> break;
> case 2:
> this.outputFormat = "HTML";
> break;
> }
> }
>
> /**
> * <p>
> * Simply returns the output format used for the given Blast report.
> * </p>
> *
> * @return outputFormat : a String with the format specified for the
> QBlast report.
> */
> public String getQBlastOutputFormat() {
> return this.outputFormat;
> }
>
> /**
> * <p>This method is to be used if a request is to use non-default
> values at submission. According to QBlast info,
> * the accepted parameters for PUT requests are:</p>
> *
> * <ul>
> * <li>-G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) /
> non-affine for megablast</li>
> * <li>-E: Cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) /
> non-affine for megablast</li>
> * <li>-r: integer to reward for match. Default = 1</li>
> * <li>-q: negative integer for penalty to allow mismatch. Default =
> -3</li>
> * <li>-e: expectation value. Default = 10.0</li>
> * <li>-W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28
> (megablast)</li>
> * <li>-y: dropoff for blast extensions in bits, using default if not
> specified. Default = 20 for blastn, 7 for all others
> * (except megablast for which it is not applicable).</li>
> * <li>-X: X dropoff value for gapped alignment, in bits. Default = 30
> for blastn/megablast, 15 for all others.</li>
> * <li>-Z: final X dropoff value for gapped alignement, in bits. Default
> = 50 for blastn, 25 for all others
> * (except megablast for which it is not applicable)</li>
> * <li>-P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass.
> Does not apply to blastn ou megablast.</li>
> * <li>-A: multiple hits window size. Default = 0 (for single hit
> algorithm)</li>
> * <li>-I: number of database sequences to save hits for. Default =
> 500</li>
> * <li>-Y: effective length of the search space. Default = 0 (0
> represents using the whole space)</li>
> * <li>-z: a real specifying the effective length of the database to
> use. Default = 0 (0 represents the real size)</li>
> * <li>-c: an integer representing pseudocount constant for PSI-BLAST.
> Default = 7</li>
> * <li>-F: any filtering directive</li>
> * </ul>
> *
> * <p>You have to be aware that at not moment is there any error
> checking on the use of these parameters by this class.</p>
> * @param aStr: a String with any number of optional parameters with an
> associated value.
> *
> */
> public void setAdvancedOptions(String aStr) {
> this.advanced = "OTHER_ADVANCED=" + aStr;
> }
>
> /**
> *
> * Simply return the string given as argument via
> setBlastAdvancedOptions
> *
> * @return advanced: the string with the advanced options
> */
> public String getBlastAdvancedOptions() {
> return this.advanced;
> }
>
> /**
> *
> * Simply return the QBlast RID for this specific QBlast request
> *
> * @return rid: the string with the RID
> */
> public String getBlastRID() {
> return this.rid;
> }
>
> /**
> * A simple method to check the availability of the QBlast service
> *
> * @throws BioException
> */
> public void printRemoteBlastInfo() throws BioException {
> try {
> OutputStreamWriter out = new OutputStreamWriter(uConn
> .getOutputStream());
>
> out.write("CMD=Info");
> out.flush();
>
> // Get the response
> BufferedReader rd = new BufferedReader(new
> InputStreamReader(uConn
> .getInputStream()));
>
> String line = "";
>
> while ((line = rd.readLine()) != null) {
> System.out.println(line);
> }
>
> out.close();
> rd.close();
> } catch (IOException e) {
> throw new BioException(
> "Impossible to get info from QBlast service at this
> time. Check your network connection");
> }
> }
>
> private URLConnection setQBlastProperties(URLConnection conn) {
>
> URLConnection tmp = conn;
>
> conn.setDoOutput(true);
> conn.setUseCaches(false);
>
> tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService");
> tmp.setRequestProperty("Connection", "Keep-Alive");
> tmp.setRequestProperty("Content-type",
> "application/x-www-form-urlencoded");
> tmp.setRequestProperty("Content-length", "200");
>
> return tmp;
> }
> }
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
More information about the biojava-dev
mailing list