[Biojava-dev] First draft of a remote blast service class
Richard Holland
holland at eaglegenomics.com
Thu Jun 11 15:30:11 UTC 2009
Excellent idea. Even better than a Map or Properties! One parameters
bean type per implementation type complete with all its own validation,
extending a placeholder interface that can be used in the generic
interface declaration for RemotePairwiseAlignmentService. That would be
sweet.
On Thu, 2009-06-11 at 08:24 -0700, Andreas Prlic wrote:
> I would pass the parameters as a bean rather than a string...
>
> Andreas
>
> On Thu, Jun 11, 2009 at 6:52 AM, Sylvain
> Foisy<sylvain.foisy at diploide.net> wrote:
> > Hi to all,
> >
> > I've been working on this for the past week or so and after discussing this
> > with Andreas, I am putting my code here for critical review. I'll put this
> > stuff in biojava-live as soon as Andreas can fix my SVN access.
> >
> > First, an interface called RemotePairwiseAlignementSerivce defines the basic
> > components of a remote service: sequence/database/progam/run options/output
> > options. RemoteQBlastService implements this interface and runs remote
> > Qblast requests and creates output in either text, XML or HTML. At present
> > time, regular blastall programs work, no blastpgp/megablast support yet.
> >
> > I'll need some guidance to make it work on other type of web services like
> > EBI.
> >
> > Best regards
> >
> > Sylvain
> >
> > ===================================================================
> >
> > Sylvain Foisy, Ph. D.
> > Consultant Bio-informatique / Bioinformatics
> > Diploide.net - TI pour la vie / IT for Life
> >
> > Courriel: sylvain.foisy at diploide.net
> > Web: http://www.diploide.net
> > Tel: (514) 893-4363
> > ===================================================================
> >
> > import java.io.InputStream;
> >
> > import org.biojava.bio.BioException;
> > /**
> > * This interface specifies minimal information needed to execute a pairwise
> > alignment on a remote service.
> > *
> > * Example of service: QBlast service at NCBI
> > * Web Service at EBI
> > *
> > * @author Sylvain Foisy
> > * @since 1.8
> > *
> > */
> > public interface RemotePairwiseAlignementService {
> >
> > /**
> > * This field specifies that the output format of results
> > * is text.
> > *
> > */
> > public static final int TEXT = 0;
> >
> > /**
> > * This field specifies that the output format of results
> > * is XML.
> > *
> > */
> > public static final int XML = 1;
> >
> > /**
> > * This field specifies that the output format of results
> > * is HTML.
> > *
> > */
> > public static final int HTML = 2;
> >
> > /**
> > * Setting the database to use for doing the pairwise alignment
> > *
> > * @param db: a <code>String</code> with a valid database ID for the
> > service used.
> > *
> > */
> > public void setDatabase(String db);
> >
> > /**
> > * Setting the sequence to be align for this for this request
> > *
> > * @param seq: a <code>String</code> with a sequence to be aligned.
> > *
> > */
> > public void setSequence(String seq);
> >
> > /**
> > * Setting the program to use for this pairwise alignment
> > *
> > * @param prog: a <code>String</code> with a valid database ID for the
> > service used.
> > *
> > */
> > public void setProgram(String prog);
> >
> > /**
> > * Setting all other options to use for this pairwise alignment
> > *
> > * @param db: a <code>String</code> with a valid database ID for the
> > service used.
> > *
> > */
> > public void setAdvancedOptions(String str);
> >
> > /**
> > * Doing the actual analysis on the instantiated service
> > *
> > * @throws BioException
> > */
> > public void executeSearch() throws BioException;
> >
> > /**
> > * Getting the actual alignment results from this instantiated service
> > *
> > * @return : an <code>InputStream</code> with the actual alignment
> > results
> > * @throws BioException
> > */
> > public InputStream getAlignmentResults() throws BioException;
> > }
> >
> > import java.io.BufferedReader;
> > import java.io.IOException;
> > import java.io.InputStream;
> > import java.io.InputStreamReader;
> > import java.io.OutputStreamWriter;
> > import java.net.MalformedURLException;
> > import java.net.URL;
> > import java.net.URLConnection;
> >
> > import org.biojava.bio.BioException;
> >
> > /**
> > * RemoteQBlastService - A simple way of submitting BLAST request to the
> > QBlast
> > * service at NCBI.
> > *
> > * <p>
> > * NCBI provides a Blast server through a CGI-BIN interface.
> > RemoteQBlastService simply
> > * encapsulates an access to it by giving users access to get/set methods to
> > fix
> > * sequence, program and database as well as advanced options.
> > * </p>
> > *
> > * <p>
> > * As of version 1.0, only blastall programs are usable. blastpgp and
> > megablast are high-priorities.
> > * </p>
> > *
> > * @author Sylvain Foisy
> > * @version 1.0
> > * @since 1.8
> > *
> > *
> > */
> > public class RemoteQBlastService implements RemotePairwiseAlignementService{
> >
> > // public static final int TEXT = 0;
> > // public static final int XML = 1;
> > // public static final int HTML = 2;
> >
> > private static String baseurl =
> > "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi";
> > private URL aUrl;
> > private URLConnection uConn;
> > private OutputStreamWriter fromQBlast;
> > private BufferedReader rd;
> >
> > private String seq = null;
> > private String prog = null;
> > private String db = null;
> > private String outputFormat = null;
> > private String advanced = null;
> >
> > private String rid;
> > private long step;
> > private boolean done = false;
> > private long start;
> >
> > public RemoteQBlastService() throws BioException {
> > try {
> > aUrl = new URL(baseurl);
> > uConn = setQBlastProperties(aUrl.openConnection());
> >
> > outputFormat = "Text";
> > }
> > /*
> > * Needed but should never be thrown since the URL is static and
> > known to exist
> > */
> > catch (MalformedURLException e) {
> > throw new BioException("It looks like the URL for NCBI QBlast
> > service is bad");
> > }
> > /*
> > * Intercept if the program can't connect to QBlast service
> > */
> > catch (IOException e) {
> > throw new BioException(
> > "Impossible to connect to QBlast service at this time.
> > Check your network connection");
> > }
> > }
> >
> > /**
> > * This method execute the Blast request via the Put command of the
> > CGI-BIN
> > * interface. It gets the estimated time of completion by capturing the
> > * value of the RTOE variable and sets a loop that will check for
> > completion
> > * of analysis at intervals specified by RTOE.
> > *
> > * <p>
> > * It also capture the value for the RID variable, necessary for
> > fetching
> > * the actual results after completion.
> > * </p>
> > *
> > * @throws BioException
> > * if it is not possible to sent the BLAST command
> > */
> > public void executeSearch() throws BioException {
> >
> > if (seq == null || db == null || prog == null) {
> > throw new BioException(
> > "Impossible to execute QBlast request. One or more of
> > seq|db|prog has not been set");
> > }
> > /*
> > * sending the command to execute the Blast analysis
> > */
> > String cmd = "CMD=Put&SERVICE=plain" + "&" + seq + "&" + prog + "&"
> > + db + "&" + "FORMAT_TYPE=HTML";
> >
> > if (advanced != null) {
> > cmd += cmd + "&" + advanced;
> > }
> >
> > try {
> >
> > uConn = setQBlastProperties(aUrl.openConnection());
> >
> > fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
> >
> > fromQBlast.write(cmd);
> > fromQBlast.flush();
> >
> > // Get the response
> > rd = new BufferedReader(new InputStreamReader(uConn
> > .getInputStream()));
> >
> > String line = "";
> >
> > while ((line = rd.readLine()) != null) {
> > if (line.contains("RID")) {
> > String[] arr = line.split("=");
> > rid = arr[1].trim();
> > } else if (line.contains("RTOE")) {
> > String[] arr = line.split("=");
> > step = Long.parseLong(arr[1].trim()) * 1000;
> > start = System.currentTimeMillis() + step;
> > }
> > }
> > } catch (IOException e) {
> > throw new BioException(
> > "Can't submit sequence to BLAST server at this time.");
> > }
> > /*
> > * Getting the info out of the NCBI system
> > */
> > while (!done) {
> > long prez = System.currentTimeMillis();
> > done = isReady(rid, prez);
> > }
> > }
> >
> > /**
> > * <p>This method is used only for the executeBlastSearch method to
> > check for completion of
> > * request using the NCBI specified RTOE variable</p>
> > *
> > * @param id
> > * @param present
> > * @return
> > */
> > private boolean isReady(String id, long present) {
> >
> > boolean ready = false;
> > String check = "CMD=Get&RID=" + id;
> > /*
> > * If present time is less than the start of the search added to
> > step
> > * obtained from NCBI, just do nothing ;-)
> > */
> > if (present < start) {
> > ;
> > }
> > /*
> > * If we are at least step seconds in the future from the actual
> > call of
> > * method executeBlastSearch()
> > */
> > else {
> > try {
> > uConn = setQBlastProperties(aUrl.openConnection());
> >
> > fromQBlast = new
> > OutputStreamWriter(uConn.getOutputStream());
> > fromQBlast.write(check);
> > fromQBlast.flush();
> >
> > rd = new BufferedReader(new InputStreamReader(uConn
> > .getInputStream()));
> >
> > String line = "";
> >
> > while ((line = rd.readLine()) != null) {
> > if (line.contains("READY")) {
> > ready = true;
> > } else if (line.contains("WAITING")) {
> > /*
> > * Else, move start forward in time...
> > */
> > start = present + step;
> > }
> > }
> > } catch (IOException e) {
> > e.printStackTrace();
> > }
> > }
> > return ready;
> > }
> >
> > /**
> > * <p>This method extracts this actual Blast report. The default format
> > is Text but can be changed before with the method
> > * setQBlastOutputFormat.</p>
> > *
> > *
> > * @return
> > * @throws BioException
> > */
> > public InputStream getAlignmentResults() throws BioException {
> > String srid = "CMD=Get&RID=" + rid;
> > srid += "&FORMAT_TYPE=" + outputFormat;
> >
> > if(!this.done){
> > throw new BioException("Unable to get report at this time. Your
> > Blast request has not been processed yet.");
> > }
> >
> > try {
> > uConn = setQBlastProperties(aUrl.openConnection());
> >
> > fromQBlast = new OutputStreamWriter(uConn.getOutputStream());
> > fromQBlast.write(srid);
> > fromQBlast.flush();
> >
> > return uConn.getInputStream();
> >
> > } catch (IOException ioe) {
> > throw new BioException(
> > "It is not possible to fetch Blast report from NCBI at
> > this time");
> > }
> > }
> >
> > /**
> > * <p>
> > * Set the sequence to be blasted using the String that correspond to
> > the
> > * sequence.
> > * </p>
> > *
> > * <p>
> > * Take note that this method is mutually exclusive to setGIToBlast()
> > for a
> > * given Blast request.
> > * </p>
> > *
> > * @param aStr
> > * : a String with the sequence
> > */
> > public void setSequence(String aStr) {
> > this.seq = "QUERY=" + aStr;
> > }
> >
> > /**
> > * Simply return a string with the blasted sequence.
> > *
> > * @return seq : a string with the sequence
> > */
> > public String getSeqToBlast() {
> > return this.seq;
> > }
> >
> > /**
> > * <p>
> > * Set the sequence to be blasted using the NCBI GI value. At this time,
> > * there is no effort made to check the validity of this GI.
> > * </p>
> > *
> > * <p>
> > * Take note that this method is mutually exclusive to setSeqToBlast()
> > for a
> > * given Blast request.
> > * </p>
> > *
> > * @param gi
> > * : an integer value representing a NCBI GI
> > */
> > public void setGIToBlast(String gi) {
> > this.seq = "QUERY=" + gi;
> > }
> >
> > /**
> > * <p>
> > * Simply return a string with the sequence blasted.
> > * </p>
> > *
> > * @return GI : a String with the GI of the blasted sequence
> > */
> > public String getGIToBlast() {
> > return this.seq;
> > }
> >
> > /**
> > * <p>
> > * This method set the program to be used to blast the given
> > sequence/GI. At
> > * this time, there is no attempt at checking the matching of sequence
> > type
> > * to program.
> > * </p>
> > *
> > * @param prog
> > * : a String representing the program specified for this
> > QBlast
> > * request.
> > *
> > */
> > public void setProgram(String prog) {
> > this.prog = "PROGRAM=" + prog;
> > }
> >
> > /**
> > * <p>
> > * Simply returns the program used for the given Blast request.
> > * </p>
> > *
> > * @return prog : a String with the program used for this QBlast
> > request.
> > */
> > public String getProgram() {
> > return this.prog;
> > }
> >
> > /**
> > * <p>
> > * This method set the database to be used to blast the given
> > sequence/GI.
> > * At this time, there is no attempt at checking the matching of
> > sequence
> > * type to database.
> > * </p>
> > *
> > * @param db: a String for the database specified for this QBlast
> > request
> > */
> > public void setDatabase(String db) {
> > this.db = "DATABASE=" + db;
> > }
> >
> > /**
> > * <p>
> > * Simply returns the database used for the given Blast request.
> > * </p>
> > *
> > * @return db: a String with the database used for this QBlast request.
> > */
> > public String getBlastDatabase() {
> > return this.db;
> > }
> >
> > /**
> > * <p>This method let the user specify which format to use for
> > generating the output.</p>
> > *
> > * @param type:an integer taken from the static constant of this class,
> > either be TEXT, XML or HTML
> > */
> > public void setQBlastOutputFormat(int type) {
> >
> > switch (type) {
> > case 0:
> > this.outputFormat = "Text";
> > break;
> > case 1:
> > this.outputFormat = "XML";
> > break;
> > case 2:
> > this.outputFormat = "HTML";
> > break;
> > }
> > }
> >
> > /**
> > * <p>
> > * Simply returns the output format used for the given Blast report.
> > * </p>
> > *
> > * @return outputFormat : a String with the format specified for the
> > QBlast report.
> > */
> > public String getQBlastOutputFormat() {
> > return this.outputFormat;
> > }
> >
> > /**
> > * <p>This method is to be used if a request is to use non-default
> > values at submission. According to QBlast info,
> > * the accepted parameters for PUT requests are:</p>
> > *
> > * <ul>
> > * <li>-G: cost to create a gap. Default = 5 (nuc-nuc) / 11 (protein) /
> > non-affine for megablast</li>
> > * <li>-E: Cost to extend a gap. Default = 2 (nuc-nuc) / 1 (protein) /
> > non-affine for megablast</li>
> > * <li>-r: integer to reward for match. Default = 1</li>
> > * <li>-q: negative integer for penalty to allow mismatch. Default =
> > -3</li>
> > * <li>-e: expectation value. Default = 10.0</li>
> > * <li>-W: word size. Default = 3 (proteins) / 11 (nuc-nuc) / 28
> > (megablast)</li>
> > * <li>-y: dropoff for blast extensions in bits, using default if not
> > specified. Default = 20 for blastn, 7 for all others
> > * (except megablast for which it is not applicable).</li>
> > * <li>-X: X dropoff value for gapped alignment, in bits. Default = 30
> > for blastn/megablast, 15 for all others.</li>
> > * <li>-Z: final X dropoff value for gapped alignement, in bits. Default
> > = 50 for blastn, 25 for all others
> > * (except megablast for which it is not applicable)</li>
> > * <li>-P: equals 0 for multiple hits 1-pass, 1 for single hit 1-pass.
> > Does not apply to blastn ou megablast.</li>
> > * <li>-A: multiple hits window size. Default = 0 (for single hit
> > algorithm)</li>
> > * <li>-I: number of database sequences to save hits for. Default =
> > 500</li>
> > * <li>-Y: effective length of the search space. Default = 0 (0
> > represents using the whole space)</li>
> > * <li>-z: a real specifying the effective length of the database to
> > use. Default = 0 (0 represents the real size)</li>
> > * <li>-c: an integer representing pseudocount constant for PSI-BLAST.
> > Default = 7</li>
> > * <li>-F: any filtering directive</li>
> > * </ul>
> > *
> > * <p>You have to be aware that at not moment is there any error
> > checking on the use of these parameters by this class.</p>
> > * @param aStr: a String with any number of optional parameters with an
> > associated value.
> > *
> > */
> > public void setAdvancedOptions(String aStr) {
> > this.advanced = "OTHER_ADVANCED=" + aStr;
> > }
> >
> > /**
> > *
> > * Simply return the string given as argument via
> > setBlastAdvancedOptions
> > *
> > * @return advanced: the string with the advanced options
> > */
> > public String getBlastAdvancedOptions() {
> > return this.advanced;
> > }
> >
> > /**
> > *
> > * Simply return the QBlast RID for this specific QBlast request
> > *
> > * @return rid: the string with the RID
> > */
> > public String getBlastRID() {
> > return this.rid;
> > }
> >
> > /**
> > * A simple method to check the availability of the QBlast service
> > *
> > * @throws BioException
> > */
> > public void printRemoteBlastInfo() throws BioException {
> > try {
> > OutputStreamWriter out = new OutputStreamWriter(uConn
> > .getOutputStream());
> >
> > out.write("CMD=Info");
> > out.flush();
> >
> > // Get the response
> > BufferedReader rd = new BufferedReader(new
> > InputStreamReader(uConn
> > .getInputStream()));
> >
> > String line = "";
> >
> > while ((line = rd.readLine()) != null) {
> > System.out.println(line);
> > }
> >
> > out.close();
> > rd.close();
> > } catch (IOException e) {
> > throw new BioException(
> > "Impossible to get info from QBlast service at this
> > time. Check your network connection");
> > }
> > }
> >
> > private URLConnection setQBlastProperties(URLConnection conn) {
> >
> > URLConnection tmp = conn;
> >
> > conn.setDoOutput(true);
> > conn.setUseCaches(false);
> >
> > tmp.setRequestProperty("User-Agent", "Biojava/RemoteQBlastService");
> > tmp.setRequestProperty("Connection", "Keep-Alive");
> > tmp.setRequestProperty("Content-type",
> > "application/x-www-form-urlencoded");
> > tmp.setRequestProperty("Content-length", "200");
> >
> > return tmp;
> > }
> > }
> >
> >
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> >
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
More information about the biojava-dev
mailing list