From p.j.a.cock at googlemail.com Thu Sep 1 11:47:53 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 1 Sep 2011 16:47:53 +0100
Subject: [emboss-dev] Problems with EMBOSS seqret GenBank to GFF3
In-Reply-To: <4E5D0649.3010905@ebi.ac.uk>
References: <4E54D432.8030309@ebi.ac.uk> <4E550E8D.8010506@ebi.ac.uk>
 <4E56539E.6030400@ebi.ac.uk> <4E5D0649.3010905@ebi.ac.uk>
Message-ID:

On Tue, Aug 30, 2011 at 4:48 PM, Peter Rice wrote:
>
> [cut]
>
> On 08/26/2011 03:27 AM, Peter Cock wrote:
>> This is probably a good example to discuss on the GFF3
>> song-devel mailing list - small and apparently very simple
>> except for how to represent the (forward strand) join location.
>
> We could propose something for the
> http://www.sequenceontology.org/wiki/index.php/GFF3_best_practices
> page to describe how to represent EMBL/GenBank entries
> in GFF3 (after due discussion on the SONG-devel list)

That sounds like a plan :)

I'm on leave this week so I may not get to look into the details
of this until next week - but don't worry, I'll only be ignoring
you in the short term ;)

Thanks,

Peter C.

From ajb at ebi.ac.uk Tue Sep 6 04:54:02 2011
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Tue, 6 Sep 2011 09:54:02 +0100 (BST)
Subject: [emboss-dev] EMBOSS and mEMBOSS bug-fix set 1-21 released
Message-ID: <35576.82.26.12.214.1315299242.squirrel@imap04.ebi.ac.uk>

New bug-fixes are available for EMBOSS-6.4.0 and, for Windows users,
a new version of mEMBOSS is available. The bugs fixed include those
recently fixed (12-21), listed below, and all those fixed by previous
patches (1-11).

1) UNIX

As usual, the most convenient way of applying the bug-fixes is to
apply the patch file:

ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/patch-1-21.gz

to a freshly extracted copy of the EMBOSS-6.4.0.tar.gz source code
and recompiling/installing.
(see ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/README.patch
for instructions on using 'patch').

Alternatively, you can individually copy the patched files from the
ftp://emboss.open-bio.org/pub/EMBOSS/fixes/ directory if your system
does not support 'patch'.

2) mEMBOSS

The new version incorporates all new and previous bug-fixes.
Uninstall your previous mEMBOSS installation and download and install
the new setup file from:

ftp://emboss.open-bio.org/pub/EMBOSS/windows/mEMBOSS-6.4.0.3-setup.exe

Alan

-----------------------------------------------------------------------

Fix 12. EMBOSS-6.4.0/nucleus/embgroup.c

18-Aug-2011: An internal string variable should be initialised as NULL.

Fix 13. EMBOSS-6.4.0/ajax/core/ajseqread.c

18-Aug-2011: Reading protein GFF3 files ignored the EMBOSS type comment.
This appears to be the only way to detect a protein GFF3 file.

Fix 14. EMBOSS-6.4.0/emboss/data/Efeatures.gff3protein

18-Aug-2011: In writing GFF3 protein file, uses the current term name
from the Sequence Ontology to clear errors from the GFF3 online
validator.

Fix 15. EMBOSS-6.4.0/ajax/core/ajfeatwrite.c

18-Aug-2011: When writing GFF3 format, tag names are explicitly
converted to lower case as required by the GFF3 standard. This includes
EC_number and /PCR_conditions in EMBL/GenBank/DDBJ and several RefseqP
tags. The score is written using g format to represent very low values.

Fix 16. EMBOSS-6.4.0/ajax/core/ajnexus.c
        EMBOSS-6.4.0/ajax/core/ajseqread.c

22-Aug-2011: When reading nexus data format with no taxlabels block
the attempt to read the taxa (sequence names) from the matrix block
failed.

Fix 17. EMBOSS-6.4.0/ajax/ajaxdb/ajtextdb.c

22-Aug-2011: The SRS access method added a stray '+' character to the
getz command line.

Fix 18. EMBOSS-6.4.0/ajax/core/ajquery.c

25-Aug-2011: In some cases a query using a simple identifier could try
to test an undefined "sv" field.

Fix 19. EMBOSS-6.4.0/ajax/core/ajseqread.c

02-Sep-2011: Reading "raw" sequence format failed when piped from
standard input.
In release 6.4.0 "raw" format was redefined as a binary format to catch
binary files that start with one or more sequence characters followed
by a NULL character. This fix continues to check binary files, but has
to drop the check for data piped through standard input which is read
as text and cannot be reread as binary.

Fix 20. EMBOSS-6.4.0/ajax/core/ajnam.c

02-Sep-2011: Complex database definitions with more than one type or
format are allowed in 6.4.0 but caused an error message from showdb
when the type and format were tested.

Fix 21. EMBOSS-6.4.0/emboss/drfinddata.c
        EMBOSS-6.4.0/emboss/drfindformat.c
        EMBOSS-6.4.0/emboss/drfindid.c
        EMBOSS-6.4.0/emboss/drfindresource.c

02-Sep-2011: Running with -debug fails. Debug calls used obsolete
datatype for data resource internals. Without -debug there was no
problem.

From p.j.a.cock at googlemail.com Tue Sep 6 05:48:17 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 6 Sep 2011 10:48:17 +0100
Subject: [emboss-dev] Problems with EMBOSS seqret GenBank to GFF3
In-Reply-To:
References: <4E54D432.8030309@ebi.ac.uk> <4E550E8D.8010506@ebi.ac.uk>
 <4E56539E.6030400@ebi.ac.uk> <4E5D0649.3010905@ebi.ac.uk>
Message-ID:

On Thu, Sep 1, 2011 at 4:47 PM, Peter Cock wrote:
> On Tue, Aug 30, 2011 at 4:48 PM, Peter Rice wrote:
>>
>> [cut]
>>
>> On 08/26/2011 03:27 AM, Peter Cock wrote:
>>> This is probably a good example to discuss on the GFF3
>>> song-devel mailing list - small and apparently very simple
>>> except for how to represent the (forward strand) join location.
>>
>> We could propose something for the
>> http://www.sequenceontology.org/wiki/index.php/GFF3_best_practices
>> page to describe how to represent EMBL/GenBank entries
>> in GFF3 (after due discussion on the SONG-devel list)
>
> That sounds like a plan :)
>
> I'm on leave this week so I may not get to look into the details
> of this until next week - but don't worry, I'll only be ignoring
> you in the short term ;)
>
> Thanks,
>
> Peter C.
>

I see you've released some patches for EMBOSS 6.4.0 to address
some of the issues we've been discussing:

http://lists.open-bio.org/pipermail/emboss-announce/2011-September/000030.html

Peter

From pmr at ebi.ac.uk Thu Sep 29 10:43:04 2011
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 29 Sep 2011 15:43:04 +0100
Subject: [emboss-dev] Commandline changes in EMBOSS applications
Message-ID: <4E8483F8.4080908@ebi.ac.uk>

A question for our developer community...

I am working through the GALAXY wrappers for EMBOSS applications. GALAXY has
a very clean way to define command line applications which is close to
EMBOSS's ACD definitions, so most applications are easy to define.

I have problems where the default values in the ACD file depend on other
values. Two examples from prettyplot illustrate the problem. In both cases,
the current GALAXY definitions ignore these qualifiers.

  integer: residuesperline [
    default: "50"
    information: "Number of residues to be displayed on each
                  line"
  ]

  integer: resbreak [
    information: "Residues before a space"
    default: "$(residuesperline)"
    expected: "Same as -residuesperline to give no breaks"
  ]

The second qualifier defaults to the value of the first. GALAXY is unable to
interpret this. It could be defined with a default of "50" for GALAXY, but I
would prefer to remove this qualifier and add a new one "-blocksperline"
with a default of 1. In this way the dependency disappears, and the results
are cleaner.

The second value is a calculation from sequence properties:

  float: plurality [
    information: "Plurality check value (totweight/2)"
    default: "@( $(sequences.totweight) / 2)"
    expected: "Half the total sequence weighting"
  ]

This has a long history, back to the EGCG version of prettyplot where the
command line options were extensions of a GCG program. The "weight" is by
default 1.0 per sequence, but GCG format had a way to adjust weights in the
input file. Plurality is nice in that it allows a definition of how many of
the sequences should match.

In this case, it seems easier to ignore the weight-based value and instead
to define -percent 50.0 then multiply the total weight (or number of
sequences) by 0.50 and get the same results.

I am a little nervous about removing command line options because of the
risk of breaking some interfaces.

So:

1. Should I go ahead and add the new options?
2. Do I remove the old options so old wrappers, scripts, etc. break with
   "unknown qualifier -plurality"
3. Or, do we keep the old options, declare them obsolete, object to their
   use but keep going

As option 3 would also complicate life for wrappers - anyone making new
wrappers would most probably include the obsolete options - I prefer 1+2 but
I would appreciate some feedback.

regards,

Peter Rice
EMBOSS Team

From p.j.a.cock at googlemail.com Thu Sep 29 11:03:08 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 29 Sep 2011 16:03:08 +0100
Subject: [emboss-dev] Commandline changes in EMBOSS applications
In-Reply-To: <4E8483F8.4080908@ebi.ac.uk>
References: <4E8483F8.4080908@ebi.ac.uk>
Message-ID:

On Thu, Sep 29, 2011 at 3:43 PM, Peter Rice wrote:
> A question for our developer community...
>
> I am working through the GALAXY wrappers for EMBOSS applications. GALAXY has
> a very clean way to define command line applications which is close to
> EMBOSS's ACD definitions, so most applications are easy to define.
>
> I have problems where the default values in the ACD file depend on other
> values. Two examples from prettyplot illustrate the problem. In both cases,
> the current GALAXY definitions ignore these qualifiers.
>
>  integer: residuesperline [
>    default: "50"
>    information: "Number of residues to be displayed on each
>                  line"
>  ]
>
>  integer: resbreak [
>    information: "Residues before a space"
>    default: "$(residuesperline)"
>    expected: "Same as -residuesperline to give no breaks"
>  ]
>
>
> The second qualifier defaults to the value of the first. GALAXY is unable to
> interpret this. It could be defined with a default of "50" for GALAXY, but I
> would prefer to remove this qualifier and add a new one "-blocksperline"
> with a default of 1. In this way the dependency disappears, and the results
> are cleaner.
>
> The second value is a calculation from sequence properties:
>
>  float: plurality [
>    information: "Plurality check value (totweight/2)"
>    default: "@( $(sequences.totweight) / 2)"
>    expected: "Half the total sequence weighting"
>  ]
>
> This has a long history, back to the EGCG version of prettyplot where the
> command line options were extensions of a GCG program. The "weight" is by
> default 1.0 per sequence, but GCG format had a way to adjust weights in the
> input file. Plurality is nice in that it allows a definition of how many of
> the sequences should match.
>
> In this case, it seems easier to ignore the weight-based value and instead
> to define -percent 50.0 then multiply the total weight (or number of
> sequences) by 0.50 and get the same results.
>
> I am a little nervous about removing command line options because of the
> risk of breaking some interfaces.
>
> So:
>
> 1. Should I go ahead and add the new options?
> 2. Do I remove the old options so old wrappers, scripts, etc. break with
> "unknown qualifier -plurality"
> 3. Or, do we keep the old options, declare them obsolete, object to their
> use but keep going
>
> As option 3 would also complicate life for wrappers - anyone making new
> wrappers would most probably include the obsolete options - I prefer 1+2 but
> I would appreciate some feedback.
>
> regards,
>
> Peter Rice

Hi Peter R,

In theory you can use an optional integer parameter in Galaxy,
with an empty default, meaning the user doesn't have to put in
a value. You can then check this in the tool wrapper's XML
<command> tag with Cheetah syntax to decide if you add
the -switch value to the command string (with the user's value),
or not (to get the EMBOSS default).

Perhaps I have misunderstood, but I think it is supported in
Galaxy although probably quite fiddly.

Peter C.

From p.j.a.cock at googlemail.com Thu Sep 29 11:13:46 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 29 Sep 2011 16:13:46 +0100
Subject: [emboss-dev] Commandline changes in EMBOSS applications
In-Reply-To:
References: <4E8483F8.4080908@ebi.ac.uk>
Message-ID:

On Thu, Sep 29, 2011 at 4:03 PM, Peter Cock wrote:
> On Thu, Sep 29, 2011 at 3:43 PM, Peter Rice wrote:
>> A question for our developer community...
>>
>> I am working through the GALAXY wrappers for EMBOSS applications. GALAXY has
>> a very clean way to define command line applications which is close to
>> EMBOSS's ACD definitions, so most applications are easy to define.
>>
>> I have problems where the default values in the ACD file depend on other
>> values. Two examples from prettyplot illustrate the problem. In both cases,
>> the current GALAXY definitions ignore these qualifiers.
>>
>>  integer: residuesperline [
>>    default: "50"
>>    information: "Number of residues to be displayed on each
>>                  line"
>>  ]
>>
>>  integer: resbreak [
>>    information: "Residues before a space"
>>    default: "$(residuesperline)"
>>    expected: "Same as -residuesperline to give no breaks"
>>  ]
>>
>>
>> The second qualifier defaults to the value of the first. GALAXY is unable to
>> interpret this. It could be defined with a default of "50" for GALAXY, but I
>> would prefer to remove this qualifier and add a new one "-blocksperline"
>> with a default of 1. In this way the dependency disappears, and the results
>> are cleaner.
>>
>> The second value is a calculation from sequence properties:
>>
>>  float: plurality [
>>    information: "Plurality check value (totweight/2)"
>>    default: "@( $(sequences.totweight) / 2)"
>>    expected: "Half the total sequence weighting"
>>  ]
>>
>> This has a long history, back to the EGCG version of prettyplot where the
>> command line options were extensions of a GCG program. The "weight" is by
>> default 1.0 per sequence, but GCG format had a way to adjust weights in the
>> input file. Plurality is nice in that it allows a definition of how many of
>> the sequences should match.
>>
>> In this case, it seems easier to ignore the weight-based value and instead
>> to define -percent 50.0 then multiply the total weight (or number of
>> sequences) by 0.50 and get the same results.
>>
>> I am a little nervous about removing command line options because of the
>> risk of breaking some interfaces.
>>
>> So:
>>
>> 1. Should I go ahead and add the new options?
>> 2. Do I remove the old options so old wrappers, scripts, etc. break with
>> "unknown qualifier -plurality"
>> 3. Or, do we keep the old options, declare them obsolete, object to their
>> use but keep going
>>
>> As option 3 would also complicate life for wrappers - anyone making new
>> wrappers would most probably include the obsolete options - I prefer 1+2 but
>> I would appreciate some feedback.
>>
>> regards,
>>
>> Peter Rice
>
> Hi Peter R,
>
> In theory you can use an optional integer parameter in Galaxy,
> with an empty default, meaning the user doesn't have to put in
> a value. You can then check this in the tool wrapper's XML
> <command> tag with Cheetah syntax to decide if you add
> the -switch value to the command string (with the user's value),
> or not (to get the EMBOSS default).
>
> Perhaps I have misunderstood, but I think it is supported in
> Galaxy although probably quite fiddly.
>

To try and clarify, have a look at the NCBI BLAST+ wrapper for
blastn as a related example, where max_hits is an integer option
defaulting to zero. This pre-dated Galaxy fixing optional integer
arguments - at the time the best you could do was a default (here
zero) which you could recognise.

In the <command> tag, I treat zero as meaning use the defaults, i.e.
don't add the -max_target_seqs switch to the command string:

#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
-max_target_seqs $adv_opts.max_hits
#end if

You should now be able to use a blank default, and in the Cheetah
if statement, check for non-blank. But it is the same basic idea.

Peter

From pmr at ebi.ac.uk Thu Sep 29 11:20:24 2011
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 29 Sep 2011 16:20:24 +0100
Subject: [emboss-dev] Commandline changes in EMBOSS applications
In-Reply-To:
References: <4E8483F8.4080908@ebi.ac.uk>
Message-ID: <4E848CB8.1010805@ebi.ac.uk>

On 09/29/2011 04:03 PM, Peter Cock wrote:
> On Thu, Sep 29, 2011 at 3:43 PM, Peter Rice wrote:
>> A question for our developer community...
>>
>> I am working through the GALAXY wrappers for EMBOSS applications. GALAXY has
>> a very clean way to define command line applications which is close to
>> EMBOSS's ACD definitions, so most applications are easy to define.
>>
>> I have problems where the default values in the ACD file depend on other
>> values. Two examples from prettyplot illustrate the problem. In both cases,
>> the current GALAXY definitions ignore these qualifiers.

> In theory you can use an optional integer parameter in Galaxy,
> with an empty default, meaning the user doesn't have to put in
> a value. You can then check this in the tool wrapper's XML
> <command> tag with Cheetah syntax to decide if you add
> the -switch value to the command string (with the user's value),
> or not (to get the EMBOSS default).

Thanks for the tip. I found the GALAXY documentation on this at
http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax

So, possible in GALAXY but these ACD files are an issue for other
wrapper developers. Therefore I would still like to replace them as far
as possible.

The GALAXY conditionals would be very useful for the phylipnew EMBASSY
package where qualifiers depend on a set of selection menus... but I am
not planning to add those applications just yet ...
unless there is a demand for them, of course.

regards,

Peter Rice
EMBOSS Team
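
A minimal sketch, in Python rather than Cheetah, of the two ideas discussed
in this thread: leaving a qualifier off the command line when the user gives
no value (so that EMBOSS applies its own ACD default), and deriving a
plurality from a percentage of the total sequence weight. The function names
build_command and plurality_from_percent are hypothetical, -percent is only
the option proposed in the thread and not an existing prettyplot qualifier,
and nothing here is taken from the Galaxy or EMBOSS sources.

```python
def build_command(program, params):
    """Return an argv list, skipping qualifiers the user left blank.

    A blank or missing value means "do not pass the switch", so the
    program falls back to its own default (for EMBOSS, the ACD default).
    """
    argv = [program]
    for name, value in params.items():
        text = "" if value is None else str(value).strip()
        if text:  # only add the switch when a value was supplied
            argv.extend(["-" + name, text])
    return argv


def plurality_from_percent(total_weight, percent=50.0):
    """Convert a percentage into a plurality value.

    With the default of 50.0 this equals total_weight / 2, the same
    value as the ACD expression @( $(sequences.totweight) / 2).
    """
    return total_weight * percent / 100.0


# resbreak and plurality are left blank, so only -residuesperline is passed.
print(build_command("prettyplot",
                    {"residuesperline": 50, "resbreak": "", "plurality": None}))
# prints ['prettyplot', '-residuesperline', '50']

# Four sequences at the default weight of 1.0 each give a total weight of 4.0.
print(plurality_from_percent(4.0))
# prints 2.0
```

The blank check plays the role of the Cheetah #if test in the wrapper: the
wrapper never needs to know the dependent default, it only needs to know
whether the user supplied a value.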