From gbottu at ben.vub.ac.be  Sat Feb 12 12:28:24 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Sat, 12 Feb 2005 18:28:24 +0100
Subject: suggestion for ACD syntax
Message-ID: <20050212172824.GB23437@bigben.ulb.ac.be>

Dear developers,

Yesterday I had a discussion with Marc Colet, developer of the EMBOSS
interface wEMBOSS, and we have a suggestion for a small extension of the
ACD syntax. It would be nice if the parameter type "infile" had an
attribute "extension", so that the program would only accept input files
with a name ending with ... This would perhaps not be so useful at the
command line, but in a GUI this would allow for a selector with a filter
showing only the appropriate files.

Regards,
Guy Bottu, BEN

From jrvalverde at cnb.uam.es  Mon Feb 14 05:16:58 2005
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Mon, 14 Feb 2005 11:16:58 +0100
Subject: suggestion for ACD syntax
In-Reply-To: <20050212172824.GB23437@bigben.ulb.ac.be>
References: <20050212172824.GB23437@bigben.ulb.ac.be>
Message-ID: <20050214111658.2d0b684a.jrvalverde@cnb.uam.es>

This is long, but please read to the end. I raise several serious
considerations and important concerns.

On Sat, 12 Feb 2005 18:28:24 +0100
Guy Bottu wrote:
> Dear developers,
>
> Yesterday I had a discussion with Marc Colet, developer of the EMBOSS
> interface wEMBOSS, and we have a suggestion for a small extension of the
> ACD syntax. It would be nice if the parameter type "infile" had an
> attribute "extension", so that the program would only accept input files
> with a name ending with ... This would perhaps not be so useful at the
> command line, but in a GUI this would allow for a selector with a filter
> showing only the appropriate files.
>

That looks interesting. The problem I see is that users follow no
standard naming convention. Most often they come from Mac/MSW
environments, where the system takes care of this transparently (sort
of) for them. This implies they are used to typing names with no
extensions. Furthermore, some packages encourage this behaviour (e.g.
Phylip) or use extensions in contradictory ways (e.g. ".fasta" for both
a FastA formatted file and a FastA result listing).

Thus, either we adopt a standard naming convention and force all GUIs
to adopt it and apply it transparently (i.e. removing the extension from
the name in listings), or it may become a serious problem. Even so, data
transport from other packages may be a problem.

OTOH, if a GUI is to impose naming conventions and do so transparently
to the user, then it may as well offer the selection as a menu in the
'open' box, just like Netscape, MSW, and many others do (i.e. under the
filename offer "Show only sequences (.seq, .aa, .nt, .gcg,
.fasta...)"/"Show all text files..."/"Show all files (*)"...).

This raises a side issue that should be obvious. Let's try to follow the
example just given:

    "Show only sequences (.seq, .gcg, .fasta, .pir, .abi, .nrl3d, .embl,
    .genbank, .swissprot, .sw, .ddbj,
    .and-so-on-and-on-for-a-long-very-long-indeed-list-of-extensions...)"

The problem stems from the fact that EMBOSS automagically manages many
kinds of sequences. If I state in my ACD "extension = .gcg", I am
doomed, because I will no longer handle all the formats. If I have to
list them all, I may forget some (or some may be added later).

For these reasons, should something be added, I would prefer to see
"abstract types or kinds" of possible infiles (sequence, codontable,
text, image, 2D-pstruct, 3D-pstruct, na-struct, etc...) so that one
could pick up *all* files of *any* suitable extension that can be
processed by a program.

This leads to a proposal: MIME types or the like. I.e. what would be
really helpful for GUIs is to be able to specify types 'a la' MIME: e.g.
this is an "x-emb-sequence/x-fasta-nt", "x-emb-msa/x-phylip",
"structure/pdb", etc. In this way, if one is willing to accept any
EMBOSS-known sequence format, one may select by "x-emb-sequence", and if
one needs to be more specific, one can use "x-emb-msa/x-phylip" and get
all the discrimination needed.

Associating MIME types with files would then be left to a) GUI designers
and b) users. Say, if I as a user prefer to call my FASTA formatted
sequences .aa or .nt, I can set my browser prefs to associate these with
the type I need, and a click on a ".aa" or ".nt" file would open the
appropriate program on my side (e.g. a sequence editor).

It also has another plus: it fits very well with all the other
standardisation and objectization initiatives. We may come up with a
fairly complete listing and offer it to the standardisation body (OMG,
whatever) as a well-founded proposal, which would a) benefit the whole
community and b) prevent commercial companies from locking users into
their own dialects by offering a standard designed to be incompatible
with FOSS (say, like the MS patent on Office-XML formats).

Plus, has anybody else noticed the recent news about MS entering the
field of Bioinformatics with a huge project incorporating various EU
universities? How long do you think it will take before they start
producing incompatible formats and standards to wipe out EMBOSS, GCG and
others?

Don't get me wrong: I am certain some conventions would be really
useful, but ~20 years of dealing with users has shown me that many of
them just don't follow the conventions, and you need to find a way to
accommodate them.

				j
--
	Jose R. Valverde
	EMBnet/CNB

From pmr at ebi.ac.uk  Mon Feb 14 05:40:09 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 14 Feb 2005 10:40:09 +0000
Subject: suggestion for ACD syntax
In-Reply-To: <20050212172824.GB23437@bigben.ulb.ac.be>
References: <20050212172824.GB23437@bigben.ulb.ac.be>
Message-ID: <42108009.6060000@ebi.ac.uk>

Guy Bottu wrote:
> Dear developers,
>
> Yesterday I had a discussion with Marc Colet, developer of the EMBOSS
> interface wEMBOSS, and we have a suggestion for a small extension of the
> ACD syntax. It would be nice if the parameter type "infile" had an
> attribute "extension", so that the program would only accept input files
> with a name ending with ... This would perhaps not be so useful at the
> command line, but in a GUI this would allow for a selector with a filter
> showing only the appropriate files.

Good suggestion. We have this already for "directory".

However ... there could be problems (as JR Valverde has pointed out)
with systems that use different extensions for specific file formats.

We have another way to do this ....

We are adding "knowntype" definitions for infile, outfile and other ACD
types. The "acdvalid" utility now warns if the knowntype is missing, or
is not defined in the file knowntypes.standard.

Interfaces could convert these knowntype values (and sequence types for
the sequence inputs) into a standard set of extensions to filter on.

This would not need any change to ACD, but would need some agreement on
the knowntype attribute values.

Hope this helps,

Peter

From gbottu at ben.vub.ac.be  Tue Feb 15 04:34:33 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Tue, 15 Feb 2005 10:34:33 +0100
Subject: extension and knowntype
Message-ID: <20050215093433.GB14727@bigben.ulb.ac.be>

Dear Jose and Peter,

I agree with Jose that imposing an extension for the infile name, and
hence limiting the user's choice of file names, would restrict
flexibility. Actually, I myself can think of only one case where it
would work fine: a databank in BLAST format, since the format itself
imposes the extensions.

Using the knowntype attribute is certainly a solution. I will discuss
this with Marc Colet.

Regards,
Guy Bottu, BEN
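
Peter Rice's suggestion in the thread above, that an interface could
translate knowntype values into a set of extensions for a file-selector
filter, might look roughly like the following Python sketch. The
knowntype names, the extension lists, and the function names are all
hypothetical placeholders chosen for illustration; none of them is taken
from knowntypes.standard or from any existing EMBOSS interface. An
unknown or missing knowntype falls back to showing every file, so that
users who do not follow any naming convention are never locked out.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from ACD knowntype values to filename patterns.
# The keys and the pattern lists are illustrative assumptions only; a real
# interface would have to agree on them with the EMBOSS developers.
KNOWNTYPE_PATTERNS = {
    "codon usage": ["*.cut"],
    "blast database": ["*.nhr", "*.nin", "*.nsq", "*.phr", "*.pin", "*.psq"],
}

# Fallback filter, equivalent to a "Show all files (*)" menu entry.
SHOW_ALL = ["*"]


def patterns_for(knowntype):
    """Return the glob patterns a file selector could offer for a knowntype.

    A missing or unrecognised knowntype yields the show-all pattern.
    """
    if knowntype is None:
        return SHOW_ALL
    return KNOWNTYPE_PATTERNS.get(knowntype.strip().lower(), SHOW_ALL)


def filter_files(filenames, knowntype, show_all=False):
    """Keep only the filenames that match the patterns for a knowntype.

    Matching ignores case. Passing show_all=True bypasses the filter, so
    the user can always reach a file with an unconventional name.
    """
    patterns = SHOW_ALL if show_all else patterns_for(knowntype)
    return [
        name
        for name in filenames
        if any(fnmatchcase(name.lower(), pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    listing = ["nr.phr", "human.CUT", "notes.txt", "myseq"]
    print(filter_files(listing, "blast database"))
    print(filter_files(listing, "blast database", show_all=True))
```

Keeping the table in the interface rather than in the ACD file matches
the point made in the thread: the ACD only names an abstract kind of
file, and each GUI (or user) remains free to decide which extensions
correspond to it.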