[Biojava-dev] First draft of a remote blast service class

Andy Yates ayates at ebi.ac.uk
Thu Jun 11 15:53:35 UTC 2009


Really, the map/enum pattern is nearly knocking on the door of the
prototype pattern, and it is a very good way to go for this kind of system,
where target values are never set in stone (or at least are fixed only for
a particular release of a service).
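To make the point concrete, here is a minimal sketch of the Properties (prototype) pattern described in Steve Yegge's post on the Universal Design Pattern, assuming option values are plain strings. Each object keeps its own map and falls back to a prototype, such as a set of service defaults, for any key it does not override. `PropertyBag` is a hypothetical name, not a BioJava class, and the keys used with it are only illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical property bag (not part of BioJava): lookups that miss locally
// fall back to a parent "prototype" bag, as in the Properties pattern.
final class PropertyBag {
    private final Map<String, String> props = new HashMap<>();
    private final PropertyBag prototype; // may be null for a root bag

    PropertyBag(PropertyBag prototype) {
        this.prototype = prototype;
    }

    void put(String key, String value) {
        props.put(key, value);
    }

    // Local value first, then walk up the prototype chain; null if absent everywhere.
    String get(String key) {
        if (props.containsKey(key)) {
            return props.get(key);
        }
        return prototype == null ? null : prototype.get(key);
    }
}
```

A per-request bag created with the service defaults as its prototype only stores the few values a user changes; everything else is looked up in the defaults, which can change with each release of the service without touching the request code.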

If anyone is interested, there's a very good write-up on this at:

http://steve-yegge.blogspot.com/2008/10/universal-design-pattern.html

Andy

Richard Holland wrote:
> Good stuff! My 2p's worth:
> 
> setSequence() should be overloaded to accept all forms of possible
> sequence input - whatever is decided on as the standard way of
> referencing sequence data in BJ3. The original plan for BJ3 was to allow
> String/CharSequence and List<Symbol> (see
> http://www.biojava.org/wiki/BioJava3:HowTo )
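A minimal sketch of the overloading suggested here. A single `CharSequence` overload already covers `String`. The `List` variant uses `SimpleSymbol`, a hypothetical one-method stand-in for a symbol type, because BioJava's own `Symbol` API is not modelled here. `SequenceInputDemo` is likewise a hypothetical name.

```java
import java.util.List;

// Hypothetical stand-in for a BJ3 symbol type (BioJava's real Symbol API differs).
interface SimpleSymbol {
    char toChar();
}

// Hypothetical service fragment: one setSequence() per supported input form.
class SequenceInputDemo {
    private String query;

    // Covers String, StringBuilder and any other CharSequence.
    void setSequence(CharSequence seq) {
        this.query = seq.toString();
    }

    // Converts a list of symbols into the plain string the remote service expects.
    void setSequence(List<? extends SimpleSymbol> symbols) {
        StringBuilder sb = new StringBuilder(symbols.size());
        for (SimpleSymbol s : symbols) {
            sb.append(s.toChar());
        }
        setSequence(sb);
    }

    String getQuery() {
        return query;
    }
}
```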
> 
> setAdvancedOptions() should not accept a String, but rather a Properties
> or a Map<String,String>, where the keys of the Map/Properties are
> restricted to a range of acceptable values determined (and published,
> maybe as an enum?) by each of the implementation classes (e.g.
> RemoteQBlastService). The implementation class then uses this to
> construct the call string. The reason for doing it this way is that (a)
> it allows the parameters to be verified by checking them against a known
> list of allowable key/values, and (b) it allows for non-URL based remote
> requests to be constructed from the values, e.g. SOAP calls.
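A minimal sketch of this idea, assuming each implementation publishes its allowed keys as an enum. `QBlastOption`, its three constants and `AdvancedOptions` are hypothetical names chosen for illustration; the real set of allowed keys would have to come from the service's own documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical option keys an implementation might publish.
enum QBlastOption {
    EXPECT, WORD_SIZE, HITLIST_SIZE
}

// Hypothetical holder that validates keys and builds a form-encoded fragment.
final class AdvancedOptions {
    private final Map<QBlastOption, String> values = new EnumMap<>(QBlastOption.class);

    // Accepts a Map<String,String>; any key not named by QBlastOption is rejected.
    void setAll(Map<String, String> options) {
        for (Map.Entry<String, String> e : options.entrySet()) {
            QBlastOption key;
            try {
                key = QBlastOption.valueOf(e.getKey());
            } catch (IllegalArgumentException ex) {
                throw new IllegalArgumentException("Unsupported option: " + e.getKey());
            }
            values.put(key, e.getValue());
        }
    }

    // Builds "KEY=value&KEY=value" in enum declaration order, URL-encoding each value.
    String toQueryString() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<QBlastOption, String> e : values.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            try {
                sb.append(e.getKey().name()).append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            } catch (UnsupportedEncodingException ex) {
                throw new IllegalStateException(ex); // UTF-8 is always available
            }
        }
        return sb.toString();
    }
}
```

Because the values are held by key rather than as one preformatted string, a SOAP-based implementation could read the same map and build its request in a completely different way.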
> 
> I would also replace the static int HTML/TEXT/XML with an enum as
> numeric constants are sometimes a Bad Thing.
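A minimal sketch of such an enum, carrying the same `FORMAT_TYPE` strings ("Text", "XML", "HTML") that `setQBlastOutputFormat` uses in the posted code. `OutputFormat` and `getQBlastValue` are hypothetical names.

```java
// Hypothetical replacement for the TEXT/XML/HTML int constants.
enum OutputFormat {
    TEXT("Text"), XML("XML"), HTML("HTML");

    // Value sent to QBlast as FORMAT_TYPE.
    private final String qblastValue;

    OutputFormat(String qblastValue) {
        this.qblastValue = qblastValue;
    }

    String getQBlastValue() {
        return qblastValue;
    }
}
```

A setter such as `setQBlastOutputFormat(OutputFormat type)` could then simply store `type.getQBlastValue()`. The compiler rejects any argument outside the three constants, which an `int` parameter cannot do.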
> 
> The setProgram() method in my mind is specific to Blast, as opposed to
> being a generic Pairwise Alignment concept. Therefore it might be better
> to move this to a Blast-specific sub-interface or make it only appear in
> the implementation classes that refer to Blast.
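A minimal sketch of the sub-interface option. Both interface names are hypothetical, and the generic interface is trimmed down (with `BioException` replaced by `Exception`) only so that the snippet compiles on its own.

```java
import java.io.InputStream;

// Simplified, hypothetical generic interface: nothing Blast-specific here.
interface PairwiseService {
    void setDatabase(String db);
    void setSequence(String seq);
    void executeSearch() throws Exception;
    InputStream getAlignmentResults() throws Exception;
}

// Hypothetical Blast-specific sub-interface: only BLAST services expose setProgram().
interface RemoteBlastService extends PairwiseService {
    void setProgram(String program);
}
```

A non-BLAST service (for example an EBI pairwise aligner) would implement only `PairwiseService`, while a QBlast client would implement `RemoteBlastService`.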
> 
> Finally, the JavaDocs for the various set() methods are incorrect -
> they're all mostly the same in fact! :)
> 
> But overall it looks good.
> 
> cheers,
> Richard
> 
>  On Thu, 2009-06-11 at 09:52 -0400, Sylvain Foisy wrote:
>> Hi to all,
>>
>> I've been working on this for the past week or so and after discussing this
>> with Andreas, I am putting my code here for critical review. I'll put this
>> stuff in biojava-live as soon as Andreas can fix my SVN access.
>>
>> First, an interface called RemotePairwiseAlignementService defines the basic
>> components of a remote service: sequence, database, program, run options and
>> output options. RemoteQBlastService implements this interface; it runs remote
>> QBlast requests and produces output in text, XML or HTML. At present, regular
>> blastall programs work; there is no blastpgp/megablast support yet.
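The QBlast submission (CMD=Put) hands back two values that drive the rest of the exchange: the request ID (RID) and the estimated time to completion in seconds (RTOE). A minimal parser, assuming they appear on their own lines as `RID = ...` and `RTOE = ...` in the response. `QBlastPutResponse` is a hypothetical helper, not part of BioJava.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical helper: pulls RID and RTOE out of a QBlast CMD=Put response.
final class QBlastPutResponse {
    final String rid;
    final long rtoeSeconds; // -1 if the response carried no RTOE line

    private QBlastPutResponse(String rid, long rtoeSeconds) {
        this.rid = rid;
        this.rtoeSeconds = rtoeSeconds;
    }

    // Assumes "RID = ..." and "RTOE = ..." each sit on a line of their own.
    static QBlastPutResponse parse(BufferedReader in) throws IOException {
        String rid = null;
        long rtoe = -1;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.startsWith("RID =")) {
                rid = line.substring("RID =".length()).trim();
            } else if (line.startsWith("RTOE =")) {
                rtoe = Long.parseLong(line.substring("RTOE =".length()).trim());
            }
        }
        if (rid == null) {
            throw new IOException("No RID found in QBlast response");
        }
        return new QBlastPutResponse(rid, rtoe);
    }
}
```

Matching on the `RID =` prefix rather than on any line that merely contains "RID" avoids picking up unrelated HTML that happens to include those letters.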
>>
>> I'll need some guidance to make it work with other types of web services,
>> such as the one at EBI.
>>
>> Best regards
>>
>> Sylvain
>>
>> ===================================================================
>>
>>  Sylvain Foisy, Ph. D.
>>  Consultant Bio-informatique / Bioinformatics
>>  Diploide.net - TI pour la vie / IT for Life
>>
>>  Courriel: sylvain.foisy at diploide.net
>>  Web: http://www.diploide.net
>>  Tel: (514) 893-4363
>> ===================================================================
>>
>> import java.io.InputStream;
>>
>> import org.biojava.bio.BioException;
>> /**
>>  * This interface specifies the minimal information needed to execute a
>>  * pairwise alignment on a remote service.
>>  * 
>>  * Examples of services: QBlast service at NCBI
>>  *                       Web Service at EBI
>>  * 
>>  * @author Sylvain Foisy
>>  * @since 1.8
>>  *
>>  */
>> public interface RemotePairwiseAlignementService {
>>
>>     /**
>>      * This field specifies that the output format of results
>>      * is text.
>>      * 
>>      */
>>     public static final int TEXT = 0;
>>
>>     /**
>>      * This field specifies that the output format of results
>>      * is XML.
>>      * 
>>      */
>>     public static final int XML = 1;
>>
>>     /**
>>      * This field specifies that the output format of results
>>      * is HTML.
>>      * 
>>      */
>>     public static final int HTML = 2;
>>
>>     /**
>>      * Sets the database to use for the pairwise alignment.
>>      *  
>>      * @param db a <code>String</code> with a valid database ID for the
>>      *           service used.
>>      */
>>     public void setDatabase(String db);
>>
>>     /**
>>      * Sets the sequence to be aligned for this request.
>>      *  
>>      * @param seq a <code>String</code> with the sequence to be aligned.
>>      */
>>     public void setSequence(String seq);
>>
>>     /**
>>      * Sets the program to use for this pairwise alignment.
>>      *  
>>      * @param prog a <code>String</code> with a valid program name for the
>>      *             service used.
>>      */
>>     public void setProgram(String prog);
>>
>>     /**
>>      * Sets all other options to use for this pairwise alignment.
>>      *  
>>      * @param str a <code>String</code> with the advanced options, in the
>>      *            format expected by the service used.
>>      */
>>     public void setAdvancedOptions(String str);
>>     
>>     /**
>>      * Runs the actual analysis on the instantiated service.
>>      * 
>>      * @throws BioException if the request cannot be submitted or completed
>>      */
>>     public void executeSearch() throws BioException;
>>     
>>     /**
>>      * Gets the alignment results from this instantiated service.
>>      * 
>>      * @return an <code>InputStream</code> with the alignment results
>>      * @throws BioException if the results cannot be retrieved
>>      */
>>     public InputStream getAlignmentResults() throws BioException;
>> }
>>
>> import java.io.BufferedReader;
>> import java.io.IOException;
>> import java.io.InputStream;
>> import java.io.InputStreamReader;
>> import java.io.OutputStreamWriter;
>> import java.net.MalformedURLException;
>> import java.net.URL;
>> import java.net.URLConnection;
>>
>> import org.biojava.bio.BioException;
>>
>> /**
>>  * RemoteQBlastService - A simple way of submitting BLAST requests to the
>>  * QBlast service at NCBI.
>>  * 
>>  * <p>
>>  * NCBI provides a BLAST server through a CGI-BIN interface. RemoteQBlastService
>>  * simply encapsulates access to it by giving users get/set methods to fix the
>>  * sequence, program and database, as well as advanced options.
>>  * </p>
>>  * 
>>  * <p>
>>  * As of version 1.0, only blastall programs are usable. blastpgp and
>>  * megablast are high priorities.
>>  * </p>
>>  * 
>>  * @author Sylvain Foisy
>>  * @version 1.0
>>  * @since 1.8
>>  * 
>>  * 
>>  */
>> public class RemoteQBlastService implements RemotePairwiseAlignementService{
>>
>> //    public static final int TEXT = 0;
>> //    public static final int XML = 1;
>> //    public static final int HTML = 2;
>>
>>     private static final String baseurl = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi";
>>     private URL aUrl;
>>     private URLConnection uConn;
>>     private OutputStreamWriter fromQBlast;
>>     private BufferedReader rd;
>>
>>     private String seq = null;
>>     private String prog = null;
>>     private String db = null;
>>     private String outputFormat = null;
>>     private String advanced = null;
>>
>>     private String rid;
>>     private long step;
>>     private boolean done = false;
>>     private long start;
>>
>>     public RemoteQBlastService() throws BioException {
>>         try {
>>             aUrl = new URL(baseurl);
>>             uConn = setQBlastProperties(aUrl.openConnection());
>>
>>             outputFormat = "Text";
>>         }
>>         /*
>>          * Needed but should never be thrown since the URL is static and
>>          * known to exist
>>          */
>>         catch (MalformedURLException e) {
>>             throw new BioException(
>>                     "It looks like the URL for NCBI QBlast service is bad");
>>         }
>>         /*
>>          * Intercept if the program can't connect to QBlast service
>>          */
>>         catch (IOException e) {
>>             throw new BioException(
>>                     "Impossible to connect to QBlast service at this time. "
>>                     + "Check your network connection");
>>         }
>>     }
>>
>>     /**
>>      * This method executes the BLAST request via the CMD=Put command of the
>>      * CGI-BIN interface. It gets the estimated time of completion by capturing
>>      * the value of the RTOE variable and sets up a loop that checks for
>>      * completion of the analysis at intervals specified by RTOE.
>>      * 
>>      * <p>
>>      * It also captures the value of the RID variable, which is needed to fetch
>>      * the actual results after completion.
>>      * </p>
>>      * 
>>      * @throws BioException
>>      *             if it is not possible to send the BLAST command
>>      */
>>     public void executeSearch() throws BioException {
>>
>>         if (seq == null || db == null || prog == null) {
>>             throw new BioException(
>>                     "Impossible to execute QBlast request. "
>>                     + "One or more of seq|db|prog has not been set");
>>         }
>>         /*
>>          * sending the command to execute the Blast analysis
>>          */
>>         String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&"
>>                 + db + "&" + "FORMAT_TYPE=HTML";
>>
>>         if (advanced != null) {
>>             cmd += "&" + advanced;
>>         }
>>
>>         try {
>>
>>             uConn = setQBlastProperties(aUrl.openConnection());
>>
>>             fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>>
>>             fromQBlast.write(cmd);
>>             fromQBlast.flush();
>>
>>             // Get the response
>>             rd = new BufferedReader(new InputStreamReader(uConn
>>                     .getInputStream()));
>>
>>             String line;
>>
>>             // QBlast reports "RID = ..." and "RTOE = ..." on lines of their own
>>             while ((line = rd.readLine()) != null) {
>>                 line = line.trim();
>>                 if (line.startsWith("RID =")) {
>>                     rid = line.substring("RID =".length()).trim();
>>                 } else if (line.startsWith("RTOE =")) {
>>                     step = Long.parseLong(line.substring("RTOE =".length()).trim()) * 1000;
>>                     start = System.currentTimeMillis() + step;
>>                 }
>>             }
>>             rd.close();
>>         } catch (IOException e) {
>>             throw new BioException(
>>                     "Can't submit sequence to BLAST server at this time.");
>>         }
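>>         if (rid == null) {
>>             throw new BioException("QBlast did not return a request ID (RID)");
>>         }
>>         /*
>>          * Poll the NCBI system until the results are ready, sleeping
>>          * between checks instead of busy-waiting
>>          */
>>         done = false;
>>         while (!done) {
>>             try {
>>                 Thread.sleep(Math.max(1000L, start - System.currentTimeMillis()));
>>             } catch (InterruptedException e) {
>>                 Thread.currentThread().interrupt();
>>                 throw new BioException("Interrupted while waiting for QBlast results");
>>             }
>>             done = isReady(rid, System.currentTimeMillis());
>>         }
>>     }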
>>
>>     /**
>>      * <p>This method is used only by executeSearch() to check for completion
>>      * of the request, using the NCBI-specified RTOE variable.</p>
>>      * 
>>      * @param id the RID of this QBlast request
>>      * @param present the current time, in milliseconds
>>      * @return true if the results are ready to be fetched
>>      * @throws BioException if QBlast reports the request as failed or unknown
>>      */
>>     private boolean isReady(String id, long present) throws BioException {
>>
>>         /*
>>          * If the present time is earlier than the next scheduled check
>>          * (start), there is nothing to do yet
>>          */
>>         if (present < start) {
>>             return false;
>>         }
>>
>>         boolean ready = false;
>>         String check = "CMD=Get&RID=" + id;
>>
>>         try {
>>             uConn = setQBlastProperties(aUrl.openConnection());
>>
>>             fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>>             fromQBlast.write(check);
>>             fromQBlast.flush();
>>
>>             rd = new BufferedReader(new InputStreamReader(uConn
>>                     .getInputStream()));
>>
>>             String line;
>>
>>             while ((line = rd.readLine()) != null) {
>>                 if (line.contains("Status=READY")) {
>>                     ready = true;
>>                 } else if (line.contains("Status=WAITING")) {
>>                     // Not finished yet: schedule the next check step ms from now
>>                     start = present + step;
>>                 } else if (line.contains("Status=FAILED")
>>                         || line.contains("Status=UNKNOWN")) {
>>                     throw new BioException("QBlast request " + id
>>                             + " failed or has expired");
>>                 }
>>             }
>>             rd.close();
>>         } catch (IOException e) {
>>             // Network problem: report it and let the caller check again later
>>             e.printStackTrace();
>>         }
>>         return ready;
>>     }
>>
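>>     /**
>>      * <p>This method fetches the actual BLAST report. The default format is
>>      * Text, but it can be changed beforehand with the method
>>      * setQBlastOutputFormat.</p>
>>      * 
>>      * @return an <code>InputStream</code> with the BLAST report
>>      * @throws BioException if the request has not completed yet or the
>>      *         report cannot be fetched
>>      */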
>>     public InputStream getAlignmentResults() throws BioException {
>>         String srid = "CMD=Get&RID=" + rid;
>>         srid += "&FORMAT_TYPE=" + outputFormat;
>>
>>         if(!this.done){
>>             throw new BioException("Unable to get report at this time. "
>>                     + "Your Blast request has not been processed yet.");
>>         }
>>         
>>         try {
>>             uConn = setQBlastProperties(aUrl.openConnection());
>>
>>             fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
>>             fromQBlast.write(srid);
>>             fromQBlast.flush();
>>
>>             return uConn.getInputStream();
>>
>>         } catch (IOException ioe) {
>>             throw new BioException(
>>                     "It is not possible to fetch Blast report from NCBI at this time");
>>         }
>>     }
>>
>>     /**
>>      * <p>
>>      * Sets the sequence to be blasted using the String that corresponds to
>>      * the sequence.
>>      * </p>
>>      * 
>>      * <p>
>>      * Take note that this method is mutually exclusive with setGIToBlast()
>>      * for a given Blast request.
>>      * </p>
>>      * 
>>      * @param aStr
>>      *            : a String with the sequence
>>      */
>>     public void setSequence(String aStr) {
>>         this.seq = "QUERY=" + aStr;
>>     }
>>
>>     /**
>>      * Simply returns a string with the blasted sequence.
>>      * 
>>      * @return seq : a string with the sequence
>>      */
>>     public String getSeqToBlast() {
>>         return this.seq;
>>     }
>>
>>     /**
>>      * <p>
>>      * Sets the sequence to be blasted using its NCBI GI value. At this time,
>>      * no effort is made to check the validity of this GI.
>>      * </p>
>>      * 
>>      * <p>
>>      * Take note that this method is mutually exclusive with setSequence()
>>      * for a given Blast request.
>>      * </p>
>>      * 
>>      * @param gi
>>      *            : a String representing an NCBI GI
>>      */
>>     public void setGIToBlast(String gi) {
>>         this.seq = "QUERY=" + gi;
>>     }
>>
>>     /**
>>      * <p>
>>      * Simply returns a string with the GI of the blasted sequence.
>>      * </p>
>>      * 
>>      * @return GI : a String with the GI of the blasted sequence
>>      */
>>     public String getGIToBlast() {
>>         return this.seq;
>>     }
>>
>>     /**
>>      * <p>
>>      * This method sets the program to be used to blast the given sequence/GI.
>>      * At this time, there is no attempt at checking that the program matches
>>      * the sequence type.
>>      * </p>
>>      * 
>>      * @param prog
>>      *            : a String representing the program specified for this
>>      *            QBlast request.
>>      * 
>>      */
>>     public void setProgram(String prog) {
>>         this.prog = "PROGRAM=" + prog;
>>     }
>>
>>     /**
>>      * <p>
>>      * Simply returns the program used for the given Blast request.
>>      * </p>
>>      * 
>>      * @return prog : a String with the program used for this QBlast request.
>>      */
>>     public String getProgram() {
>>         return this.prog;
>>     }
>>
>>     /**
>>      * <p>
>>      * This method sets the database to be used to blast the given sequence/GI.
>>      * At this time, there is no attempt at checking that the database matches
>>      * the sequence type.
>>      * </p>
>>      * 
>>      * @param db a String for the database specified for this QBlast request
>>      */
>>     public void setDatabase(String db) {
>>         this.db = "DATABASE=" + db;
>>     }
>>
>>     /**
>>      * <p>
>>      * Simply returns the database used for the given Blast request.
>>      * </p>
>>      * 
>>      * @return db: a String with the database used for this QBlast request.
>>      */
>>     public String getBlastDatabase() {
>>         return this.db;
>>     }
>>
>>     /**
>>      * <p>This method lets the user specify which format to use for
>>      * generating the output.</p>
>>      * 
>>      * @param type one of the static constants TEXT, XML or HTML
>>      * @throws IllegalArgumentException if type is not one of these constants
>>      */
>>     public void setQBlastOutputFormat(int type) {
>>
>>         switch (type) {
>>             case TEXT:
>>                 this.outputFormat = "Text";
>>                 break;
>>             case XML:
>>                 this.outputFormat = "XML";
>>                 break;
>>             case HTML:
>>                 this.outputFormat = "HTML";
>>                 break;
>>             default:
>>                 throw new IllegalArgumentException("Unknown output format: " + type);
>>         }
>>     }
>>
>>     /**
>>      * <p>
>>      * Simply returns the output format used for the given Blast report.
>>      * </p>
>>      * 
>>      * @return outputFormat : a String with the format specified for the QBlast report.
>>      */
>>     public String getQBlastOutputFormat() {
>>         return this.outputFormat;
>>     }
>>
>>     /**
>>      * <p>This method is to be used if a request should use non-default values
>>      * at submission. According to the QBlast documentation, the accepted
>>      * parameters for Put requests are:</p>
>>      * 
>>      * <ul>
>>      * <li>-G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) /
>>      * non-affine for megablast</li>
>>      * <li>-E: cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) /
>>      * non-affine for megablast</li>
>>      * <li>-r: integer reward for a match. Default = 1</li>
>>      * <li>-q: negative integer penalty for a mismatch. Default = -3</li>
>>      * <li>-e: expectation value. Default = 10.0</li>
>>      * <li>-W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28
>>      * (megablast)</li>
>>      * <li>-y: dropoff for blast extensions in bits, using default if not
>>      * specified. Default = 20 for blastn, 7 for all others (except megablast,
>>      * for which it is not applicable).</li>
>>      * <li>-X: X dropoff value for gapped alignment, in bits. Default = 30
>>      * for blastn/megablast, 15 for all others.</li>
>>      * <li>-Z: final X dropoff value for gapped alignment, in bits. Default
>>      * = 50 for blastn, 25 for all others (except megablast, for which it is
>>      * not applicable)</li>
>>      * <li>-P: 0 for multiple hits 1-pass, 1 for single hit 1-pass. Does not
>>      * apply to blastn or megablast.</li>
>>      * <li>-A: multiple hits window size. Default = 0 (for single hit
>>      * algorithm)</li>
>>      * <li>-I: number of database sequences to save hits for. Default = 500</li>
>>      * <li>-Y: effective length of the search space. Default = 0 (0 represents
>>      * using the whole space)</li>
>>      * <li>-z: a real specifying the effective length of the database to use.
>>      * Default = 0 (0 represents the real size)</li>
>>      * <li>-c: an integer representing the pseudocount constant for PSI-BLAST.
>>      * Default = 7</li>
>>      * <li>-F: any filtering directive</li>
>>      * </ul>
>>      * 
>>      * <p>Be aware that this class performs no error checking at all on the
>>      * use of these parameters.</p>
>>      * @param aStr a String with any number of optional parameters, each with
>>      *             an associated value.
>>      *
>>      */
>>     public void setAdvancedOptions(String aStr) {
>>         this.advanced = "OTHER_ADVANCED=" + aStr;
>>     }
>>
>>     /**
>>      * 
>>      * Simply returns the OTHER_ADVANCED parameter built from the string given
>>      * to setAdvancedOptions
>>      * 
>>      * @return advanced: the string with the advanced options
>>      */
>>     public String getBlastAdvancedOptions() {
>>         return this.advanced;
>>     }
>>
>>     /**
>>      * 
>>      * Simply returns the QBlast RID for this specific QBlast request
>>      * 
>>      * @return rid: the string with the RID
>>      */
>>     public String getBlastRID() {
>>         return this.rid;
>>     }
>>
>>     /**
>>      * A simple method to check the availability of the QBlast service: it
>>      * prints the response to a CMD=Info request to standard output.
>>      * 
>>      * @throws BioException if the QBlast service cannot be reached
>>      */
>>     public void printRemoteBlastInfo() throws BioException {
>>         try {
>>             // A URLConnection can only be used once, so open a fresh one
>>             uConn = setQBlastProperties(aUrl.openConnection());
>>
>>             OutputStreamWriter out = new OutputStreamWriter(uConn
>>                     .getOutputStream());
>>
>>             out.write("CMD=Info");
>>             out.flush();
>>
>>             // Get the response
>>             BufferedReader rd = new BufferedReader(new InputStreamReader(uConn
>>                     .getInputStream()));
>>
>>             String line;
>>
>>             while ((line = rd.readLine()) != null) {
>>                 System.out.println(line);
>>             }
>>
>>             out.close();
>>             rd.close();
>>         } catch (IOException e) {
>>             throw new BioException(
>>                     "Impossible to get info from QBlast service at this time. "
>>                     + "Check your network connection");
>>         }
>>     }
>>
>>     private URLConnection setQBlastProperties(URLConnection conn) {
>>
>>         conn.setDoOutput(true);
>>         conn.setUseCaches(false);
>>
>>         conn.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService");
>>         conn.setRequestProperty("Connection", "Keep-Alive");
>>         conn.setRequestProperty("Content-type",
>>                 "application/x-www-form-urlencoded");
>>         // Content-Length is left to the connection, which knows the real body size
>>
>>         return conn;
>>     }
>> }
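The wait-for-completion step in executeSearch() can be factored into a small reusable helper that sleeps between checks and gives up after an overall timeout, so a request that never reports READY cannot hang the caller forever. This is a minimal sketch assuming Java 8's `java.util.function.BooleanSupplier`; `Poller` and `waitUntilReady` are hypothetical names, and in the QBlast case the interval would be derived from RTOE.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll a status check until it succeeds or a timeout passes.
final class Poller {
    private Poller() {
    }

    // Calls isReady, sleeping up to intervalMillis between attempts.
    // Returns true as soon as isReady does, or false once timeoutMillis has elapsed.
    static boolean waitUntilReady(BooleanSupplier isReady, long intervalMillis,
                                  long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (isReady.getAsBoolean()) {
                return true;
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            // Never sleep past the deadline.
            Thread.sleep(Math.min(intervalMillis, remaining));
        }
    }
}
```

Passing the status check in as a `BooleanSupplier` keeps the timing logic independent of QBlast, so the same helper could serve an EBI implementation of the interface.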
>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev


