From jrvalverde at cnb.uam.es Wed Mar 12 05:04:33 2003
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Wed, 12 Mar 2003 11:04:33 +0100
Subject: Fw: [Fwd: IEEE Bioinformatics Conference]
Message-ID: <20030312110433.21d781ab.jrvalverde@cnb.uam.es>

Please, find attached an announcement for the IEEE Bioinformatics Conference.

j
--
José R. Valverde
EMBnet/CNB
-------------- next part --------------
An embedded message was scrubbed...
From: "Guigó, Roderic"
Subject: [Fwd: IEEE Bioinformatics Conference]
Date: Tue, 11 Mar 2003 17:23:39 +0100
Size: 6772
Url: http://lists.open-bio.org/pipermail/emboss-dev/attachments/20030312/72e530a1/attachment.mht

From richard at seqbio.com Fri Mar 21 15:59:22 2003
From: richard at seqbio.com (Richard Cote)
Date: Fri, 21 Mar 2003 15:59:22 -0500
Subject: JAVA ACD parameter validation
Message-ID: <001901c2efec$cb45e130$6901010a@Talisker>

Hello.

I'm working on a development project using emboss. I'd like to know if
there are any pure java implementations of ACD parameter validation
(i.e. has anybody ever written the ajax library in java, or has anybody
written a JNI interface to libajax).

Thank you for your assistance,
R.Cote

From tcarver at hgmp.mrc.ac.uk Sat Mar 22 04:08:37 2003
From: tcarver at hgmp.mrc.ac.uk (Dr T. Carver)
Date: Sat, 22 Mar 2003 09:08:37 +0000 (GMT)
Subject: JAVA ACD parameter validation
In-Reply-To: <001901c2efec$cb45e130$6901010a@Talisker>
Message-ID:

Hi Richard

You possibly should look at the Jemboss code. This contains a Java ACD
parser (org.emboss.jemboss.parser.ParseAcd) and a JNI interface to libajax
(org.emboss.jemboss.parser.Ajax - ajjava.c).

Let me know if you need more information,
Tim Carver
HGMP-RC

On Fri, 21 Mar 2003, Richard Cote wrote:

> Hello.
>
> I'm working on a development project using emboss. I'd like to know if
> there are any pure java implementations of ACD parameter validation
> (i.e.
> has anybody ever written the ajax library in java, or has anybody
> written a JNI interface to libajax).
>
> Thank you for your assistance,
> R.Cote
>

From pmr at ebi.ac.uk Mon Mar 24 07:21:11 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 24 Mar 2003 12:21:11 +0000
Subject: More than one output for EMBOSS apps
Message-ID: <3E7EF837.9070509@ebi.ac.uk>

I am looking through the output definitions for EMBOSS, with the aim of
cleaning up some problems that especially hit interfaces that parse the
results, and definitions of web services.

Several applications have more than one output. Some of the output files
are optional, some are more complicated.

I would like to clean these up so that, in general,

an EMBOSS application has only one output file
(and the interface can choose the format)

A graph can be produced in addition to another output
(seems hard to avoid)

... but the choices should be clearly defined so that interface wrappers
know what they can expect.

If there is more than one output, it should be clear where this is
an "either/or" choice - so that, for example, two wrappers can be built,
one for each output (many interfaces do something similar with DNA and
protein sequence input already).

More detailed thoughts in the next message.

Peter

From pmr at ebi.ac.uk Mon Mar 24 07:21:50 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 24 Mar 2003 12:21:50 +0000
Subject: More than one output ... detail
Message-ID: <3E7EF85E.2070202@ebi.ac.uk>

Some detailed proposals (and questions) on the text, feature and sequence
outputs (graphics to follow). Comments please...

The applications with multiple output files (now or in the near future) are:

1. checktrans: outfile, outseq, featout

We can replace the output with a report file.
featout is obsolete (a report file can write features).
If we add "-rformat fasta" we can write the sequences in FASTA format
for any report (with annotation from the report features).
Does anyone really need all 3 outputs?
(or more than one of them from one run?)

2. cpgplot: outfile, featout

outfile could become a report with the outfile notes in the header,
and a table output (gff is a -rformat option to write the featout file)

3. cpgreport: outfile, featout

outfile could become a report in rformat table ... see cpgplot.

4. diffseq: report featout featout

diffseq reports differences between 2 sequences, as a special report.
It also writes the differences to feature files in GFF format - we can
make this a separate -rformat with both sequences annotated in one GFF file.
Does anyone need the separate featout files from diffseq?

5. einverted (and palindrome): outfile (report, align)

The output file should be an alignment showing the inverted repeat(s).
This is complicated by being a pairwise alignment of one sequence with
itself. Can this be an alignment format, to avoid the need for a separate
report file? If so, will the alignment routines know there is only one
sequence so GFF output can be merged? Palindrome has the same problems.

6. emma: seqoutset outfile

emma produces a sequence file with aligned sequences (should this be an
alignment file instead?) and a text file which is a copy of clustalw's
dendrogram output.
Should the dendrogram be a special output file type?
Should emma be simplified to remove the dendrogram option, and make
a separate application to generate it?
Does anyone use the .dnd output file from emma?
Should we rewrite the emma interface to make it a lot simpler?

Note: emma uses a "string" ACD type for the old dendrogram (-dendfile)
input filename, and for other optional input files. This should be changed
to use infile ... to help wrappers, and to validate the filenames.

7. equicktandem: report, outfile

For those who parsed the old format, equicktandem still produces a second
outfile. Does anyone still use it? Can we flag it as obsolete so interfaces
can safely ignore it? (it has nullok set)

8. est2genome: outfile (report, align)

est2genome needs to be converted to produce report and alignment output.
The alignment output is optional. Many est2genome users depend on the old
text output, so this may need to be preserved - but preferably as an
obsolete output (see etandem) or a renamed "oldest2genome" application
that the rest of us can ignore.

I would prefer to have 2 outputs (report and align) with the align
optional. The align ACD definition can set "nullok:Y" and depend on the
-align option. This also needs a new alignment format (est2genome
alignments are not simple :-)

9. etandem: report, outfile

For those who parsed the old format, etandem still produces a second
outfile. Does anyone still use it? Can we flag it as obsolete so interfaces
can safely ignore it? (it has nullok set).

10. megamerger: outfile, seqout

I guess we need both outputs. Can the outfile be a report in the style
of diffseq? It reports what happens to each mismatch region.

Or ... can the output be a sequence with features, and use a report
format as the default feature format? It may mean "merging" some
featout and report qualifiers and attributes.

Any output sequence from EMBOSS can have a feature table. We usually do not
define output feature tables (feature: "Y") for output sequence types
in ACD. They are not shown in the "-help" output.

11. merger: align seqout

Do we need both outputs? Maybe we do. We could make a "-aformat=consensus"
alignment option to get the sequence. Or we could make one or both outputs
optional (with the nullok attribute in ACD) so wrappers can turn them off.

12. notseq: seqoutall seqoutall

Writes the sequences remaining, and possibly the sequences excluded.
Can we have a simple switch that writes one or the other set,
to a single output file?

13. sirna: report, seqoutall

Can the seqoutall output be a report format? Maybe not, as the sequence
reported is not the same as the original sequence.
Can we use a sequence with a feature report?
(see megamerger)

14. sixpack: outfile, seqoutall

Tricky, because both files are used by some interfaces (SRS for example -
although it cheats by merging them).

The output file is like "remap -translation". It does not work as a report
format - too many command line options to change the appearance of the
translation.

I think we must keep the existing 2 output files in this case.

15. supermatcher: align, outfile

The outfile is only an error report. Can we simply use standard error
and let it be redirected? Or do we really need it? There is only a
"no start point" message written to it. Standard error, and a verbose
option for this message would probably be good enough.

16. vectorstrip: outfile, seqoutall

Would be better to have a report file. Can this have the sequence output as
an extra format or file? Not easy to match the report file to the seqoutall
(the report file shows sequences excluded, the seqoutall file shows
sequences that remain). Should be possible to tag the features in the
report to make this work.

Can we use standard error as the output file, with a verbose option to
report vectors (see supermatcher)?

17. wordmatch: align featout featout

Do we need the featout files? They report the matched regions between the
two input sequences. This could be simply a new alignment format (a GFF
file with each of the aligned sequences represented).

regards,

Peter

From ableasby at hgmp.mrc.ac.uk Mon Mar 24 07:41:18 2003
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Mon, 24 Mar 2003 12:41:18 GMT
Subject: More than one output ... detail
Message-ID: <200303241241.h2OCfIL07830@bromine.hgmp.mrc.ac.uk>

I'll look more closely later. One thing that mustn't disappear
is the emma DND output though. It's extremely useful!

Cheers

Alan

From jkb at mrc-lmb.cam.ac.uk Mon Mar 24 08:49:18 2003
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Mon, 24 Mar 2003 13:49:18 +0000
Subject: More than one output for EMBOSS apps
In-Reply-To: <3E7EF837.9070509@ebi.ac.uk>; from pmr@ebi.ac.uk on Mon, Mar 24, 2003 at 12:21:11PM +0000
References: <3E7EF837.9070509@ebi.ac.uk>
Message-ID: <20030324134918.A15040@arran.mrc-lmb.cam.ac.uk>

On Mon, Mar 24, 2003 at 12:21:11PM +0000, Peter Rice wrote:
> Several applications have more than one output. Some of the output files
> are optional, some are more complicated.

This raises a related topic; how we determine what the output files are
called?

For text outputs these are typically (always?) an option and can be set by the
user. For graphical outputs saved with -graph data the names are just
generated. This has several problems:

1. You can't run emboss in such a manner in a read-only directory.
   (see below)

2. The output always overwrites the previous output.

3. There's no way of knowing what files have been generated short of deleting
   (regexp) progname[0-9]+\.dat, running the code, and then globbing.

I thought I'd better just check the change log and I see that from 2.7.0
onwards some of these are solved (specifically 1, which makes dealing with 2
and 3 easier). So things are getting there :)

James

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Fax: (+44) 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

From senger at ebi.ac.uk Mon Mar 24 09:37:46 2003
From: senger at ebi.ac.uk (Martin Senger)
Date: Mon, 24 Mar 2003 14:37:46 +0000 (GMT)
Subject: More than one output for EMBOSS apps
In-Reply-To: <3E7EF837.9070509@ebi.ac.uk>
Message-ID:

From the perspective of EMBOSS applications, you are doing a nice job to
clean up the acd files - and it cleans up also the interface around EMBOSS.
But I (and the others as well) consider the ACD files useful also
for the non-emboss applications which one cannot clean up and which still
may produce several outputs. The ACD file syntax, therefore, in my
opinion, should keep the possibility to specify more outputs - even though
this will not be used by the emboss appls themselves.

Martin

--
Martin Senger

EMBL Outstation - Hinxton                Senger at EBI.ac.uk
European Bioinformatics Institute        Phone: (+44) 1223 494636
Wellcome Trust Genome Campus             (Switchboard: 494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger

From pmr at ebi.ac.uk Mon Mar 24 10:20:43 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 24 Mar 2003 15:20:43 +0000
Subject: More than one output for EMBOSS apps
References:
Message-ID: <3E7F224B.1020800@ebi.ac.uk>

Martin Senger wrote:
> But I (and the others as well) consider the ACD files useful also
> for the non-emboss applications which one cannot clean up and which still
> may produce several outputs. The ACD file syntax, therefore, in my
> opinion, should keep the possibility to specify more outputs - even though
> this will not be used by the emboss appls themselves.

Yes indeed. I am planning something with the working title "ACIDIFY" which
will be an ACD syntax wrapper around other applications.

ACIDIFY will validate the inputs (for example, blast database name, blast
gap penalty options), and convert input and output formats.

Imagine blast as an EMBOSS application (or more likely as several EMBOSS
applications). A wrapper (for example SoapLab) could define inputs and
outputs in the same way as for any other EMBOSS application.

The validation may be complicated ... but only if we allow the full horrors
of blast gap penalty options, rather than a list of valid combinations :-)

Additional data types should map on to existing ACD types. For example,
blast databases could be presented as a list of valid strings.
ACIDIFY will check what blast databases a service provider has. It could
easily report that list to a service user so they know what options to make
available. It could also report the release number, number of entries, or
other properties.

The complicated part is the validation, of course.

A list of useful non-EMBOSS applications to "acidify" would be a great help.

regards,

Peter

From pmr at ebi.ac.uk Mon Mar 24 10:25:33 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 24 Mar 2003 15:25:33 +0000
Subject: More than one output for EMBOSS apps
References: <3E7EF837.9070509@ebi.ac.uk> <20030324134918.A15040@arran.mrc-lmb.cam.ac.uk>
Message-ID: <3E7F236D.7060605@ebi.ac.uk>

James Bonfield wrote:
> This raises a related topic; how we determine what the output files are
> called?
>
> For text outputs these are typically (always?) an option and can be set by the
> user. For graphical outputs saved with -graph data the names are just
> generated. This has several problems:

I was planning to cover graph outputs in a later message.

You are right - any graph output with multiple files (*.png *.dat) will
have this problem. -gdirectory helps, by letting you specify where the files
go, but you also need to know what files were written and have better
control over the naming.

At present with graphs, -graph ps writes one file, but if you specify:

-graph data (or -graph png)
-goutfile fred

You get fred1.dat (fred1.png) fred2.dat (fred2.png) and so on.

Specifying -gdirectory jim (or EMBOSS_OUTDIRECTORY for all output files)
will put those files in the jim/ directory. It will also write a list of
files to standard error, which you can trap and use to find the graphics
output files.

Perhaps we can make this easier somehow?
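
The globbing workaround discussed in this thread can be sketched roughly as follows. This is only an illustration for a wrapper script, assuming the naming described above (a -goutfile prefix followed by a number and the format extension, for example fred1.dat, fred2.dat in the -gdirectory directory). The function name find_graph_outputs is hypothetical and not part of EMBOSS, and the sketch does not parse the file list written to standard error, since its format is not given here.

```python
import re
from pathlib import Path


def find_graph_outputs(directory, prefix, extension):
    """Return files named <prefix><N>.<extension> in directory, sorted by N.

    Assumption: graph files follow the fred1.dat, fred2.dat, ... pattern
    described in the thread. This helper is hypothetical, not an EMBOSS API.
    """
    # Match the prefix, one or more digits, a dot and the extension exactly.
    pattern = re.compile(
        rf"^{re.escape(prefix)}(\d+)\.{re.escape(extension)}$"
    )
    numbered = []
    for path in Path(directory).iterdir():
        match = pattern.match(path.name)
        if match and path.is_file():
            # Keep the number as an int so that fred10 sorts after fred2.
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]
```

For a run with -graph data -goutfile fred -gdirectory jim, a wrapper could call find_graph_outputs("jim", "fred", "dat") after the program finishes. Clearing old matches first, or using a fresh directory per run, avoids picking up files left over from a previous run.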