From pmr at ebi.ac.uk Thu Apr 7 12:44:14 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 07 Apr 2005 17:44:14 +0100
Subject: Genetic codes and other repeated ACD lists
Message-ID: <4255635E.8030609@ebi.ac.uk>

I have found a way to save writing and maintaining lists like these in ACD files:

list: table [
  additional: "Y"
  default: "0"
  minimum: "1"
  maximum: "1"
  header: "Genetic codes"
  values: "0:Standard; 1:Standard (with alternative initiation
    codons); 2:Vertebrate Mitochondrial; 3:Yeast Mitochondrial;
    4:Mold, Protozoan, Coelenterate Mitochondrial and
    Mycoplasma/Spiroplasma; 5:Invertebrate Mitochondrial; 6:Ciliate
    Macronuclear and Dasycladacean; 9:Echinoderm Mitochondrial;
    10:Euplotid Nuclear; 11:Bacterial; 12:Alternative Yeast Nuclear;
    13:Ascidian Mitochondrial; 14:Flatworm Mitochondrial;
    15:Blepharisma Macronuclear; 16:Chlorophycean Mitochondrial;
    21:Trematode Mitochondrial; 22:Scenedesmus obliquus;
    23:Thraustochytrium Mitochondrial"
  delimiter: ";"
  codedelimiter: ":"
  information: "Code to use"
  knowntype: "genetic code"
]

Using the "knowntype" attribute it is possible to delete the value attribute, and to define a standard list using a "resource" definition in the emboss.default (or .embossrc) file like this:

RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ]

(for just 2 genetic codes)

or

RESOURCE genetic_code [ type: "list" value: "@EGC.index" ]

(for a list of all the genetic codes - this will read a datafile EGC.index which is new in CVS).

Other resource definitions could be commands to execute.

I have not yet decided whether to allow a value of "@EGC.index" in the ACD file itself. It could be a nice short cut, but I like using a "knowntype" to control the results.

There are some problems to solve:

1. the resource is tested in too many places - it should replace the "value" attribute when it is first used. Not hard to do.

2. there should be a clean way to define a default value for each knowntype - for example calling an ajTrn function to resolve the "genetic code" knowntype to a value. Functions can be defined for list knowntypes in ajacd.c

3. anyone parsing the ACD file will wonder where the value has gone - perhaps acdpretty can be made to fill in missing values when an environment variable is set. Would that be acceptable to those who need it?

Future uses for this:

1. standard list of genetic codes with descriptions

2. standard reading frame names

3. list of known codon usage files, matrices, etc. by specifying "?" as the value

4. a list of blast databases for a blastall wrapper :-)

5. replacing "string" qualifiers which have a knowntype with a selection that can display and test the list of acceptable values in ACD, to avoid a run-time failure

Comments please ....

Peter

From jison at hgmp.mrc.ac.uk Fri Apr 8 06:34:51 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Fri, 08 Apr 2005 11:34:51 +0100
Subject: Genetic codes and other repeated ACD lists
References: <4255635E.8030609@ebi.ac.uk>
Message-ID: <42565E4B.1232945@hgmp.mrc.ac.uk>

Hi Peter

Comments below.
Cheers

Jon

Peter Rice wrote:
>
> I have found a way to save writing and maintaining lists like these in ACD files:
>
> list: table [
> additional: "Y"
> default: "0"
> minimum: "1"
> maximum: "1"
> header: "Genetic codes"
> values: "0:Standard; 1:Standard (with alternative initiation
> codons); 2:Vertebrate Mitochondrial; 3:Yeast Mitochondrial;
> 4:Mold, Protozoan, Coelenterate Mitochondrial and
> Mycoplasma/Spiroplasma; 5:Invertebrate Mitochondrial; 6:Ciliate
> Macronuclear and Dasycladacean; 9:Echinoderm Mitochondrial;
> 10:Euplotid Nuclear; 11:Bacterial; 12:Alternative Yeast Nuclear;
> 13:Ascidian Mitochondrial; 14:Flatworm Mitochondrial;
> 15:Blepharisma Macronuclear; 16:Chlorophycean Mitochondrial;
> 21:Trematode Mitochondrial; 22:Scenedesmus obliquus;
> 23:Thraustochytrium Mitochondrial"
> delimiter: ";"
> codedelimiter: ":"
> information: "Code to use"
> knowntype: "genetic code"
> ]
>
> Using the "knowntype" attribute it is possible to delete the value attribute,
> and to define a standard list using a "resource" definition in the
> emboss.default (or .embossrc) file like this:
>
> RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ]
>
> (for just 2 genetic codes)
>
> or
>
> RESOURCE genetic_code [ type: "list" value: "@EGC.index" ]
>
> (for a list of all the genetic codes - this will read a datafile EGC.index
> which is new in CVS).
>
> Other resource definitions could be commands to execute.

It'd be cleaner, more flexible and easier to maintain and, if not a requirement now, probably an increasing one in the future. I've two progs that would benefit from it now.

> I have not yet decided whether to allow a value of "@EGC.index" in the ACD
> file itself. It could be a nice short cut, but I like using a "knowntype" to
> control the results.

Could be confusing to allow that in the ACD file because the punter might think EGC existed, e.g. as a data item, in the file itself and get confused when they can't find it.
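For illustration only: the inline form of the resource value quoted above ("0:Standard;11:Bacterial") can be split using the same delimiter (";") and codedelimiter (":") that the ACD list declares. The following is a minimal Python sketch of that idea, not EMBOSS code; the function name parse_list_values is hypothetical and the error handling is an assumption.

```python
def parse_list_values(value, delimiter=";", codedelimiter=":"):
    """Split an ACD-style list value into (code, description) pairs.

    Hypothetical helper for illustration; EMBOSS itself does this in C.
    """
    entries = []
    for item in value.split(delimiter):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing delimiter or blank item
        code, sep, description = item.partition(codedelimiter)
        if not sep:
            raise ValueError(f"missing {codedelimiter!r} in list item {item!r}")
        entries.append((code.strip(), description.strip()))
    return entries


# The two-code resource value from the message above.
print(parse_list_values("0:Standard;11:Bacterial"))
```

Splitting on the first codedelimiter only (via partition) keeps any later ":" inside a description intact.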
> There are some problems to solve: > > 1. the resource is tested in too many places - it should replace the "value" > attribute when it is first used. Not hard to do. > > 2. there should be a clean way to define a default value for each knowntype - > for example calling an ajTrn function to resolve the "genetic code" knowntype > to a value. Functions can be defined for list knowntypes in ajacd.c Couldn't the default be specified in the same place / file as the values themselves? Presumably the default value would be needed before run-time proper and could be retrieved at the same time as the values are. > > 3. anyone parsing the ACD file will wonder where the value has gone - perhaps > acdpretty can be made to fill in missing values with an environment variable > set. Would that be acceptable to those who need it? I think it would be nice to support both "standard" lists (ie. ones *with* "values" attribute) and the new style. Perhaps something like: values: "@knowntype" to indicate to use the knowntype to get the values, *or* values: "0: Standard ... etc" as before. Then the values attribute would always be there, with the ACD developer having the option to specify a standard list of values or to get the values from the knowntype. > Future uses for this: > > 1. standard list of genetic codes with descriptions > > 2. standard reading frame names > > 3. list of known codon usage files, matrices, etc. by specifying "?" as the value > > 4. a list of blast databases for a blastall wrapper :-) > > 5. replacing "string" qualifiers which have a knowntype with a selection that > can display and test the list of acceptable values in ACD, to avoid a run-time > failure > > Comments please .... > > Peter -- Jon C. 
Ison, PhD Proteomics Applications Group MRC Rosalind Franklin Centre for Genomics Research Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK Tel: +44 1223 494500 Fax: +44 1223 494512 E-mail: jison at rfcgr.mrc.ac.uk Web: http://www.rfcgr.mrc.ac.uk From pmr at ebi.ac.uk Fri Apr 8 06:55:02 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 08 Apr 2005 11:55:02 +0100 Subject: Genetic codes and other repeated ACD lists In-Reply-To: <42565E4B.1232945@hgmp.mrc.ac.uk> References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> Message-ID: <42566306.3000908@ebi.ac.uk> Dr J.C. Ison wrote: > Peter Rice wrote: >>I have found a way to save writing and maintaining lists like these in ACD files: >> >>Using the "knowntype" attribute it is possible to delet the value atttribute, >>and to define a standard list using a "resource" definition in the >>emboss.default (or .embossrc) file like this: > > It'd be cleaner, more flexible and and easier to maintain and if not a > requirement now probably an increasing one in the future. I've two progs > that would benefit from it now. Thanks. Domainatrix I assume. Which ones? I will take a look and see how they fit in. > Couldn't the default be specified in the same place / file as the values themselves? > Presumably the default value would be needed before run-time proper and could > be retrieved at the same time as the values are. Good point. I need to think some more about whether a knowntype should have a default. Genetics codes are a good example - we use a default of 0 but strictly genetic code numbers are 1 to 23 (0 is code 1 with only ATG as a start). > I think it would be nice to support both "standard" lists (ie. ones *with* "values" > attribute) and the new style. Perhaps something like: > > values: "@knowntype" A missing value will do this. Normally the value is required, so an ACD developer (or parser) will know it needs a value. 
(I hope :-)

> to indicate to use the knowntype to get the values, *or*
>
> values: "0: Standard ... etc" as before.

Yes, that will override the knowntype. Maybe acdvalid can warn if a list or select has a knowntype (with its own standard value) and a defined value attribute.

More comments please!

Peter

From jison at hgmp.mrc.ac.uk Fri Apr 8 07:46:43 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Fri, 08 Apr 2005 12:46:43 +0100
Subject: Genetic codes and other repeated ACD lists
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> <42566306.3000908@ebi.ac.uk>
Message-ID: <42566F23.F80F2CFD@hgmp.mrc.ac.uk>

Peter Rice wrote:
> Thanks. Domainatrix I assume. Which ones? I will take a look and see how they
> fit in.

The newly committed matgen3d and siggenlig, which both take an "environment definition" (amino acid 3D environment) from a list.

At the moment, the environment names are "Env1", "Env2" etc but would get more meaningful names once the definitions themselves are more settled (pending further research on which ones are most useful). The progs. then use the selection (Env1 or Env2 etc) to call appropriate functions within the application code.

> > I think it would be nice to support both "standard" lists (ie. ones *with* "values"
> > attribute) and the new style. Perhaps something like:
> >
> > values: "@knowntype"
>
> A missing value will do this. Normally the value is required, so an ACD
> developer (or parser) will know it needs a value. (I hope :-)

They could simply acdprettyify the files as described in your prev. email before parsing, so they wouldn't need to do any new coding.

> > > to indicate to use the knowntype to get the values, *or*
> > >
> > > values: "0: Standard ... etc" as before.
>
> Yes, that will override the knowntype. Maybe acdvalid can warn if a list or
> select has a knowntype (with its own standard value) and a defined value attribute

The override / warning are intuitive / sensible.
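The precedence agreed in this exchange (an explicit values attribute overrides the knowntype, a missing value falls back to the resource for the knowntype, and acdvalid might warn when both are present) can be sketched as follows. This is a hedged Python illustration rather than the ajacd.c logic: resolve_values and the dictionary of resources are hypothetical, and mapping the knowntype "genetic code" to the resource name "genetic_code" by replacing spaces with underscores is an assumption inferred from the examples in the thread.

```python
def resolve_values(values, knowntype, resources):
    """Return (values, warnings) for a list or selection qualifier.

    Hypothetical sketch: `resources` maps resource names to list values.
    """
    warnings = []
    # Assumption: knowntype "genetic code" maps to resource "genetic_code".
    resource_name = knowntype.replace(" ", "_") if knowntype else None
    standard = resources.get(resource_name) if resource_name else None

    if values:
        # An explicit values attribute overrides the knowntype list.
        if standard is not None:
            warnings.append(
                f"knowntype {knowntype!r} has a standard list; "
                "the explicit values attribute overrides it"
            )
        return values, warnings

    if standard is None:
        raise ValueError("no values attribute and no resource for the knowntype")
    # A missing value falls back to the resource definition.
    return standard, warnings


resources = {"genetic_code": "0:Standard;11:Bacterial"}
print(resolve_values(None, "genetic code", resources))
```

Returning the warnings alongside the value, instead of printing them, leaves it to a validator such as acdvalid to decide how to report them.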
From pmr at ebi.ac.uk Fri Apr 8 09:18:42 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 14:18:42 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <42566F23.F80F2CFD@hgmp.mrc.ac.uk>
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> <42566306.3000908@ebi.ac.uk> <42566F23.F80F2CFD@hgmp.mrc.ac.uk>
Message-ID: <425684B2.1000709@ebi.ac.uk>

Dr J.C. Ison wrote:
> Peter Rice wrote:
>>Thanks. Domainatrix I assume. Which ones? I will take a look and see how they
>>fit in.
>
> The newly committed matgen3d and siggenlig, which both take an "environment
> definition" (amino acid 3D environment) from a list.
>
> At the moment, the environment names are "Env1", "Env2" etc but would get more
> meaningful names once the definitions themselves are more settled (pending
> further research on which ones are most useful).

The genetic code format is very simple: each line holds the name, a space and the value. Leading spaces and #commented lines are ignored. This is the EGC.index file for an "@EGC.index" resource value:

 0 Standard with AUG start only
 1 Standard
 2 Vertebrate mitochondrial
 3 Yeast mitochondrial
 4 Mold, Protozoan, and Coelenterate Mitochondrial and Mycoplasma/Spiroplasma
 5 Invertebrate Mitochondrial
 6 Ciliate, Dasycladacean and Hexamita Nuclear
# 7 *Kinetoplast code now merged in code id 4
# 8 *Plant chloroplast all differences due to RNA edit use code id 1
 9 Echinoderm and Flatworm Mitochondrial
10 Euplotid Nuclear
11 Bacterial and Plant Plastid
12 Alternative Yeast Nuclear
13 Ascidian Mitochondrial
14 Alternative Flatworm Mitochondrial
15 Blepharisma Nuclear
16 Chlorophycean Mitochondrial
#17 Never defined
#18 Never defined
#19 Never defined
#20 Never defined
21 Trematode Mitochondrial
22 Scenedesmus obliquus Mitochondrial
23 Thraustochytrium Mitochondrial

> They could simply acdprettyify the files as described in your prev. email
> before parsing, so they wouldn't need to do any new coding.
But maybe not those who use the ACD file at run time :-)

regards,

Peter

From pmr at ebi.ac.uk Fri Apr 8 09:26:59 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 14:26:59 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To:
References: <4255635E.8030609@ebi.ac.uk>
Message-ID: <425686A3.2080808@ebi.ac.uk>

Tim Carver wrote:
>Peter Rice wrote:
>>3. anyone parsing the ACD file will wonder where the value has gone - perhaps
>>acdpretty can be made to fill in missing values with an environment variable
>>set. Would that be acceptable to those who need it?
>
> I guess so. If we just loop over the ACD's after installation and get
> 'acdpretty' to convert them that shouldn't be too bad I would have
> thought... it would only need to be done once.

For list: and selection: the acdpretty output would look normal (the value: "" attribute can be filled in with the knowntype value).

For matrix: and matrixf: we can leave everything unchanged (add nothing to the ACD file in acdpretty), or we can offer a list of known matrix filenames using some new attribute name. This is a little tricky ... for the alignment programs, there will be separate lists for nucleotide (only EDNAFULL and EDNAMAT) and protein (EBLOSUM* and EPAM*) with the allowed values depending on the type of the input sequences. Of course, as matrix input the user can choose any other available matrix file if the interface allows.

Any preference (or any special requests to help JEMBOSS?)

regards,

Peter

From pmr at ebi.ac.uk Fri Apr 8 09:30:34 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 14:30:34 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <425685BA.A9658C5F@hgmp.mrc.ac.uk>
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> <42566306.3000908@ebi.ac.uk> <42566F23.F80F2CFD@hgmp.mrc.ac.uk> <425684B2.1000709@ebi.ac.uk> <425685BA.A9658C5F@hgmp.mrc.ac.uk>
Message-ID: <4256877A.4090609@ebi.ac.uk>

Dr J.C. Ison wrote:
> That format would be ideal. "Env1", "Env2" etc could be replaced by "1", "2" etc
> then text could be added giving a meaningful description of the environment.

The name would be whatever the program accepts for a list (for selection it is the value, but list is generally preferred in ACD files). I know domainatrix often uses "1", "2", etc. but they are not always the best choices.

A thought - perhaps the file could have a default marked with * before the name, or default to the first in the list?

regards,

Peter

From gbottu at ben.vub.ac.be Wed Apr 13 09:22:12 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 13 Apr 2005 15:22:12 +0200
Subject: Genetic codes and other repeated ACD
Message-ID: <20050413132212.GA15521@bigben.ulb.ac.be>

Dear Peter, dear all,

Allow me to add something to the recent discussion about geneticcodes. I talked about it with Marc Colet, developer of wEMBOSS, and he considers, for the sake of GUI maintenance, that it is better to avoid making the ACD syntax too complicated and certainly to avoid changing it too often.

A few ideas :

- Currently emboss.defaults does not contain items that are absolutely needed. We think it is better not to change that philosophy by putting e.g. the geneticcodes in it. It could however be an idea to put in emboss.defaults a list of databanks in BLAST format, for the sake of BLAST wrappers.

- For items like reading frames and maybe geneticcodes, that appear over and over again in several ACD files, yet are not user or installation customizable, the best proposal among those made in this discussion list seems to me to have it defined in one central file, for the purpose of the software development, but to "acdpretty" it into the ACD files before they are distributed, for the sake of GUI functioning.

- There is the case of items where users can choose to use their own data instead of the EMBOSS distribution data, like symbol comparison matrices and codon usage tables (would genetic codes fall into this category ?). Till now a new ACD object type was defined each time, like matrix and cfile. Is shifting to the use of "knowntype" a good idea ? I do not know, but let's keep consistent.

- There is the issue of the program embossdata, useful for the advanced user and a possible tool for displaying choice lists in GUI's. Currently, when we run it at the BEN site with just the parameter -showall it produces a monstrously long list, because all the databanks (including CUTG) have been downloaded and "extracted". Maybe let it by default display only the data files in the main data directory ? Note that e.g. the list of PRINTS files is anyway not very interesting, since you cannot do anything with them as such. Could it be modified so that you can easily get a list of the alternative data files used by a particular program (or could a library routine called by the program itself do that) ?

Sincerely,

Guy Bottu, BEN

From pmr at ebi.ac.uk Wed Apr 13 10:30:05 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 13 Apr 2005 15:30:05 +0100
Subject: Genetic codes and other repeated ACD
In-Reply-To: <20050413132212.GA15521@bigben.ulb.ac.be>
References: <20050413132212.GA15521@bigben.ulb.ac.be>
Message-ID: <425D2CED.60503@ebi.ac.uk>

Guy Bottu wrote:
> - Currently emboss.defaults does not contain items that are absolutely
> needed. We think it is better not to change that philosophy by putting
> e.g. the geneticcodes in it. It could however be an idea to put in
> emboss.defaults a list of databanks in BLAST format, for the sake of BLAST
> wrappers.

They will not be absolutely needed. There will be a default - a list of values, a file with a list of values, or a script that finds everything.
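As an illustration of the "file with a list of values" case, here is a minimal Python sketch that reads an index in the layout described earlier in the thread for EGC.index (a name, a space, then the value, with leading spaces and # comment lines ignored). It is an assumption-laden example, not the EMBOSS reader: parse_index is a hypothetical name, and the sample text is a shortened excerpt of the listing posted in the thread.

```python
def parse_index(text):
    """Parse index text into (name, description) pairs.

    Hypothetical helper: skips blank lines and lines starting with '#'
    after leading spaces have been removed.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The name runs up to the first space; the rest is the value.
        name, _, description = line.partition(" ")
        entries.append((name, description.strip()))
    return entries


sample = """\
 0 Standard with AUG start only
 1 Standard
# 7 *Kinetoplast code now merged in code id 4
11 Bacterial and Plant Plastid
#17 Never defined
"""

for name, description in parse_index(sample):
    print(name, description)
```

A default entry marked with * before the name, as floated earlier in the thread, is deliberately not handled here because that convention was only a suggestion.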
> - For items like reading frames and maybe geneticcodes, that appear over > and over again in several ACD files, yet are not user or installation > customizable, the best proposal among those made in this discussion list > seems to me to have it defined in one central file, for the purpose of the > software developement, but to "acdpretty" it into the ACD files before > they are distributed, for the sake of GUI functioning. This will be the default ... but the distributed files will *not* have the values filled in (if we fill the values in, the automatic list will not work when users add new options :-). You will need to run acdpretty yourself. That way, if you add extra options locally you will get them in the acdpretty file. There is nothing to stop you copying that file on top of the original acd file. > - There is the case of items where users can choose to use their own data > instead of the EMBOSS distribution data, like symbol comparison matrices > and codon usage tables (would genetic codes fall into this catagory ?). > Till now there was each time a new ACD object type defined, like matrix > and cfile. Is shifting to the use of "knowntype" a good idea ? I do not > know, but, let's keep consistent. The same will happen for these ... but matrix files are complicated. For programs that read nucleotide and protein, the list will have to include all matrix files. > - There is the issue of the program embossdata, useful for the advanced > user and a possible tool for displaying choice lists in GUI's. Currently, > when we run it at the BEN site with just the parameter -showall it produces a > monstruous long list, because all the databanks (including CUTG) have been > downloaded and "extracted". Maybe let it by default display only the data > files in the main data directory ? Note that e.g. the list of PRINTS files > is anyway not very interesting, since you cannot do anything with them as > such. 
Could it be modified so that you can easily get a list of the
> alternative data files used by a particular program (or could a library
> routine called by the program itself do that) ?

I have modified embossdata to prompt always for a filename (default of no file still lists all files). Options to select the other directories are interesting because (1) you get less output and (2) we will have a new internal default for the list of directories used by embossdata!

Hope that makes things clearer, and thanks for the comments.

Peter

From senger at ebi.ac.uk Tue Apr 19 06:11:46 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Tue, 19 Apr 2005 11:11:46 +0100 (BST)
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <4255635E.8030609@ebi.ac.uk>
Message-ID:

> RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ]
>

I am not knowledgeable enough about EMBOSS (in particular I know almost nothing about EGC.index etc.) to be helpful here, but please allow me to ask a question:

If I understand it correctly you are actually talking about replacing often-repeated pieces of ACD files by a reference to a common (shared) place where the piece is stored just once. But that seems to be exactly the scenario addressed by all kinds of 'include' directives. So what about adding a general syntax for inclusion to ACD? Then you could replace not only genetic codes but any other repeating piece any time you wish. And it would be transparent for the ACD parsers (they just need to know where to look for the included files).

Just my 2 cents,
Martin

--
Martin Senger

EMBL Outstation - Hinxton         Senger at EBI.ac.uk
European Bioinformatics Institute Phone: (+44) 1223 494636
Wellcome Trust Genome Campus      (Switchboard:     494444)
Hinxton                           Fax  : (+44) 1223 494468
Cambridge CB10 1SD United Kingdom http://industry.ebi.ac.uk/~senger

From jrvalverde at cnb.uam.es Thu Apr 21 05:58:51 2005
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Thu, 21 Apr 2005 11:58:51 +0200
Subject: Wiki
Message-ID: <20050421115851.49380dc9.jrvalverde@cnb.uam.es>

I would rather welcome a Wiki for EMBOSS documentation. I can host it at Es.EMBnet.Org/es.emboss.org, no problem at that.

The reason is that as I run into problems/tricks/tasks to do, I see comments that might be added here and there in the documentation. I would rather go to a single site and make the changes myself than go through the hassle of devising a 'diff' comment, finding out who to mail, mailing them and waiting for a new doc release.

If there is interest, I can set it up straight away.

j

From jrvalverde at cnb.uam.es Thu Apr 21 06:33:21 2005
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Thu, 21 Apr 2005 12:33:21 +0200
Subject: CUTG
Message-ID: <20050421123321.5574df12.jrvalverde@cnb.uam.es>

I just saw there are new improvements in cutgextract... Great!

However, if I may make a suggestion, it would be nice if it were to save the codon tables in a hierarchical arrangement.

I just converted CUTG... 25k files in all. Amazing! Useful! All that deserves a great Yes! but has a serious problem: users of the command line may try an 'ls Emyorganism*' and find their table, but users of GUIs will have a tough time navigating through a pull-down menu with 25 thousand options !

Certainly, the GUI might take (partial) care of that by grouping tables through the pre-underscore part (organism name), but still too many would result.

So, perhaps it would be better if CUTG were stored in $EMBOSS_DATA/CUTG, with each section under its own directory, and tables in each section arranged by e.g. organism or first/two-first letter(s).
This may become an interesting question for the emboss users mailing list..

j

From pmr at ebi.ac.uk Thu Apr 21 06:38:30 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Apr 2005 11:38:30 +0100
Subject: CUTG
In-Reply-To: <20050421123321.5574df12.jrvalverde@cnb.uam.es>
References: <20050421123321.5574df12.jrvalverde@cnb.uam.es>
Message-ID: <426782A6.5020900@ebi.ac.uk>

José R. Valverde wrote:
> I just saw there are new improvements in cutgextract... Great!
>
> However, if I may make a suggestion, it would be nice if it were to
> save the codon tables in a hierarchical arrangement.
>
> I just converted CUTG... 25k files in all. Amazing! Useful! All that
> deserves a great Yes! but has a serious problem: users of the command
> line may try an 'ls Emyorganism*' and find their table, but users of
> GUIs will have a tough time navigating through a pull-down menu with
> 25 thousand options !

The plan I have is a little different ...

... to allow a CUTG entry to be retrieved from SRS (haha - has everyone seen the news from LION?) or from the CUTG server through some non-sequence access method that can return the text of an entry from CUTG, PROSITE, and other databases.

But at least CUTGEXTRACT can now extract a single species for you so there is no need to extract all 25,000 entries.

Hope this helps

Peter

From pmr at ebi.ac.uk Thu Apr 21 12:20:24 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Apr 2005 17:20:24 +0100
Subject: [EMBOSS] Wiki
In-Reply-To: <20050421115851.49380dc9.jrvalverde@cnb.uam.es>
References: <20050421115851.49380dc9.jrvalverde@cnb.uam.es>
Message-ID: <4267D2C8.10009@ebi.ac.uk>

José R. Valverde wrote:
> I would rather welcome a Wiki for EMBOSS documentation.
We have all the documentation (including the sourceforge web pages) in CVS. Any member of the development/documentation team can make updates there.

No need for a wiki for this - and a wiki would be difficult to manage as most of the documentation is generated automatically.

> The reason is that as I run into problems/tricks/tasks to do, I see
> comments that might be added here and there in the documentation. I
> would rather go to a single site and make the changes myself than
> go through the hassle of devising a 'diff' comment, finding out who
> to mail, mailing them and waiting for a new doc release.

Just mail anything like that to emboss-bug.

After all ... there is not much point in changing a wiki version of the documentation if we are busy changing the application and the real documentation :-)

regards,

Peter

From jrvalverde at cnb.uam.es Fri Apr 22 04:11:18 2005
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Fri, 22 Apr 2005 10:11:18 +0200
Subject: [EMBOSS] Wiki (and Macs)
In-Reply-To: <4267D2C8.10009@ebi.ac.uk>
References: <20050421115851.49380dc9.jrvalverde@cnb.uam.es> <4267D2C8.10009@ebi.ac.uk>
Message-ID: <20050422101118.33b19892.jrvalverde@cnb.uam.es>

On Thu, 21 Apr 2005 17:20:24 +0100 Peter Rice wrote:
>
> After all ... there is not much point in changing a wiki version of the
> documentation if we are busy changing the application and the real
> documentation :-)
>
> regards,
>
> Peter

Right you are Sir. I guess it's better as it is for now. And yet...

Speaking generally, it probably boils down to the management model we want for EMBOSS. As it is now I tend to see it more like a Cathedral than a Bazaar. Truly it isn't, but you must agree it is not so evident from the docs what the procedures are for participation. At least not at first sight.

I'm more for the Bazaar model, one where everyone is welcome and making changes is as trivial as possible (specially for end-users and end-user-related material, like docs).
I'd rather have that as a 'common' to build a user community around. Game theory shows that to be the best strategy in the long run (see e.g. http://encyclopedia.laborlawtalk.com/Tragedy_of_the_commons ).

In the short run, with resources as limited as the EMBOSS team's currently are, you are right that it takes a significant effort and portion of the existing resources. It makes more sense to concentrate on the short term now and on surviving long enough to drive new resources in. But I think we should have that in sight for the long term.

j

From jrvalverde at cnb.uam.es Fri Apr 22 04:20:58 2005
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Fri, 22 Apr 2005 10:20:58 +0200
Subject: Macintosh EMBOSS
In-Reply-To: <4267D2C8.10009@ebi.ac.uk>
References: <20050421115851.49380dc9.jrvalverde@cnb.uam.es> <4267D2C8.10009@ebi.ac.uk>
Message-ID: <20050422102058.2ca36edb.jrvalverde@cnb.uam.es>

I'm trying to find ways to fund EMBOSS in a way that I can justify locally.

Mac users are a growing 'market' and a promising community. I've got hundreds of Macs here, and they need a solution that is easy to use, install and manage. What is needed (they tell me) is a good editor, and some interactive graphic facilities for common, simple tasks. Actually, locally, we are going to spend a significant amount on buying a handful of licenses for commercial software.

I've tried Erik's CD, but it has some drawbacks regarding the configuration on non-user-managed Macs (such as those where root belongs to a central authority): here they can install software but not make modifications. I can't either, being on the SciComp side and not on the Offimatic end.
I don't have the resources to do that locally, but would welcome a sensible way to fund it (like buying 'licenses', packages, CDs or manuals from an EMBOSS-centered company). I for one would certainly welcome a Macintosh edition ready to run, and easy to configure to use central databases. If I were to chose, I'd try to add those facilities to Jemboss (a sequence editor, and interactive drawing of clones and molecular graphics). This is the most lacking thing in EMBOSS now that every user has or can have a UNIX machine at their desktop. And, certainly, I would happily recommend locally that we buy a hundred+ licenses at a reasonable price if that would help fund EMBOSS. Most ideally, something like the LiveDVD from AT.EMBnet.Org but for Macs would be a candy. And an easy to justify buy. Any recommendations? Takers? Pointers? j -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/emboss-dev/attachments/20050422/49e14ae1/attachment.bin From pmr at ebi.ac.uk Thu Apr 7 16:44:14 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 07 Apr 2005 17:44:14 +0100 Subject: Genetic codes and other repeated ACD lists Message-ID: <4255635E.8030609@ebi.ac.uk> I have found a way to save writing and maintaining lists like these in ACD files: list: table [ additional: "Y" default: "0" minimum: "1" maximum: "1" header: "Genetic codes" values: "0:Standard; 1:Standard (with alternative initiation codons); 2:Vertebrate Mitochondrial; 3:Yeast Mitochondrial; 4:Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma; 5:Invertebrate Mitochondrial; 6:Ciliate Macronuclear and Dasycladacean; 9:Echinoderm Mitochondrial; 10:Euplotid Nuclear; 11:Bacterial; 12:Alternative Yeast Nuclear; 13:Ascidian Mitochondrial; 14:Flatworm Mitochondrial; 15:Blepharisma Macronuclear; 16:Chlorophycean Mitochondrial; 21:Trematode Mitochondrial; 
22:Scenedesmus obliquus; 23:Thraustochytrium Mitochondrial"
  delimiter: ";"
  codedelimiter: ":"
  information: "Code to use"
  knowntype: "genetic code"
]

Using the "knowntype" attribute it is possible to delete the value attribute,
and to define a standard list using a "resource" definition in the
emboss.default (or .embossrc) file like this:

RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ]

(for just 2 genetic codes)

or

RESOURCE genetic_code [ type: "list" value: "@EGC.index" ]

(for a list of all the genetic codes - this will read a datafile EGC.index
which is new in CVS).

Other resource definitions could be commands to execute.

I have not yet decided whether to allow a value of "@EGC.index" in the ACD
file itself. It could be a nice short cut, but I like using a "knowntype" to
control the results.

There are some problems to solve:

1. the resource is tested in too many places - it should replace the "value"
attribute when it is first used. Not hard to do.

2. there should be a clean way to define a default value for each knowntype -
for example calling an ajTrn function to resolve the "genetic code" knowntype
to a value. Functions can be defined for list knowntypes in ajacd.c

3. anyone parsing the ACD file will wonder where the value has gone - perhaps
acdpretty can be made to fill in missing values with an environment variable
set. Would that be acceptable to those who need it?

Future uses for this:

1. standard list of genetic codes with descriptions

2. standard reading frame names

3. list of known codon usage files, matrices, etc. by specifying "?" as the value

4. a list of blast databases for a blastall wrapper :-)

5. replacing "string" qualifiers which have a knowntype with a selection that
can display and test the list of acceptable values in ACD, to avoid a run-time
failure

Comments please ....

Peter

From jison at hgmp.mrc.ac.uk Fri Apr 8 10:34:51 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C.
Ison) Date: Fri, 08 Apr 2005 11:34:51 +0100 Subject: Genetic codes and other repeated ACD lists References: <4255635E.8030609@ebi.ac.uk> Message-ID: <42565E4B.1232945@hgmp.mrc.ac.uk> Hi Peter Comments below. Cheers Jon Peter Rice wrote: > > I have found a way to save writing and maintaining lists like these in ACD files: > > list: table [ > additional: "Y" > default: "0" > minimum: "1" > maximum: "1" > header: "Genetic codes" > values: "0:Standard; 1:Standard (with alternative initiation > codons); 2:Vertebrate Mitochondrial; 3:Yeast Mitochondrial; > 4:Mold, Protozoan, Coelenterate Mitochondrial and > Mycoplasma/Spiroplasma; 5:Invertebrate Mitochondrial; 6:Ciliate > Macronuclear and Dasycladacean; 9:Echinoderm Mitochondrial; > 10:Euplotid Nuclear; 11:Bacterial; 12:Alternative Yeast Nuclear; > 13:Ascidian Mitochondrial; 14:Flatworm Mitochondrial; > 15:Blepharisma Macronuclear; 16:Chlorophycean Mitochondrial; > 21:Trematode Mitochondrial; 22:Scenedesmus obliquus; > 23:Thraustochytrium Mitochondrial" > delimiter: ";" > codedelimiter: ":" > information: "Code to use" > knowntype: "genetic code" > ] > > Using the "knowntype" attribute it is possible to delet the value atttribute, > and to define a standard list using a "resource" definition in the > emboss.default (or .embossrc) file like this: > > RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ] > > (for just 2 genetic codes) > > or > > RESOURCE genetic_code [ type: "list" value: "@EGC.index" ] > > (for a list of all the genetic codes - this will read a datafile EGC.index > which is new in CVS). > > Other resource definitions could be commands to execute. It'd be cleaner, more flexible and and easier to maintain and if not a requirement now probably an increasing one in the future. I've two progs that would benefit from it now. > I have not yet decided whether to allow a value of "@EGC.index" in the ACD > file itself. 
It could be a nice short cut, but I like using a "knowntype" to > control the results. Could be confusing to allow that in the ACD file because the punter might think EGC existed, e.g. as a data item, in the file itself and get confused when they can't find it. > There are some problems to solve: > > 1. the resource is tested in too many places - it should replace the "value" > attribute when it is first used. Not hard to do. > > 2. there should be a clean way to define a default value for each knowntype - > for example calling an ajTrn function to resolve the "genetic code" knowntype > to a value. Functions can be defined for list knowntypes in ajacd.c Couldn't the default be specified in the same place / file as the values themselves? Presumably the default value would be needed before run-time proper and could be retrieved at the same time as the values are. > > 3. anyone parsing the ACD file will wonder where the value has gone - perhaps > acdpretty can be made to fill in missing values with an environment variable > set. Would that be acceptable to those who need it? I think it would be nice to support both "standard" lists (ie. ones *with* "values" attribute) and the new style. Perhaps something like: values: "@knowntype" to indicate to use the knowntype to get the values, *or* values: "0: Standard ... etc" as before. Then the values attribute would always be there, with the ACD developer having the option to specify a standard list of values or to get the values from the knowntype. > Future uses for this: > > 1. standard list of genetic codes with descriptions > > 2. standard reading frame names > > 3. list of known codon usage files, matrices, etc. by specifying "?" as the value > > 4. a list of blast databases for a blastall wrapper :-) > > 5. replacing "string" qualifiers which have a knowntype with a selection that > can display and test the list of acceptable values in ACD, to avoid a run-time > failure > > Comments please .... > > Peter -- Jon C. 
Ison, PhD Proteomics Applications Group MRC Rosalind Franklin Centre for Genomics Research Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK Tel: +44 1223 494500 Fax: +44 1223 494512 E-mail: jison at rfcgr.mrc.ac.uk Web: http://www.rfcgr.mrc.ac.uk From pmr at ebi.ac.uk Fri Apr 8 10:55:02 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 08 Apr 2005 11:55:02 +0100 Subject: Genetic codes and other repeated ACD lists In-Reply-To: <42565E4B.1232945@hgmp.mrc.ac.uk> References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> Message-ID: <42566306.3000908@ebi.ac.uk> Dr J.C. Ison wrote: > Peter Rice wrote: >>I have found a way to save writing and maintaining lists like these in ACD files: >> >>Using the "knowntype" attribute it is possible to delet the value atttribute, >>and to define a standard list using a "resource" definition in the >>emboss.default (or .embossrc) file like this: > > It'd be cleaner, more flexible and and easier to maintain and if not a > requirement now probably an increasing one in the future. I've two progs > that would benefit from it now. Thanks. Domainatrix I assume. Which ones? I will take a look and see how they fit in. > Couldn't the default be specified in the same place / file as the values themselves? > Presumably the default value would be needed before run-time proper and could > be retrieved at the same time as the values are. Good point. I need to think some more about whether a knowntype should have a default. Genetics codes are a good example - we use a default of 0 but strictly genetic code numbers are 1 to 23 (0 is code 1 with only ATG as a start). > I think it would be nice to support both "standard" lists (ie. ones *with* "values" > attribute) and the new style. Perhaps something like: > > values: "@knowntype" A missing value will do this. Normally the value is required, so an ACD developer (or parser) will know it needs a value. 
(I hope :-)

> to indicate to use the knowntype to get the values, *or*
>
> values: "0: Standard ... etc" as before.

Yes, that will override the knowntype. Maybe acdvalid can warn if a list or
select has a knowntype (with its own standard value) and a defined value attribute.

More comments please!

Peter

From jison at hgmp.mrc.ac.uk Fri Apr 8 11:46:43 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Fri, 08 Apr 2005 12:46:43 +0100
Subject: Genetic codes and other repeated ACD lists
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> <42566306.3000908@ebi.ac.uk>
Message-ID: <42566F23.F80F2CFD@hgmp.mrc.ac.uk>

Peter Rice wrote:
> Thanks. Domainatrix I assume. Which ones? I will take a look and see how they
> fit in.

The newly committed matgen3d and siggenlig, which both take an "environment
definition" (amino acid 3D environment) from a list.

At the moment, the environment names are "Env1", "Env2" etc but would get more
meaningful names once the definitions themselves are more settled (pending
further research on which ones are most useful). The progs. then use the
selection (Env1 or Env2 etc) to call appropriate functions within the
application code.

> > I think it would be nice to support both "standard" lists (ie. ones *with* "values"
> > attribute) and the new style. Perhaps something like:
> >
> > values: "@knowntype"
>
> A missing value will do this. Normally the value is required, so an ACD
> developer (or parser) will know it needs a value. (I hope :-)

They could simply acdprettyify the files as described in your prev. email
before parsing, so they wouldn't need to do any new coding.

> > to indicate to use the knowntype to get the values, *or*
> >
> > values: "0: Standard ... etc" as before.
>
> Yes, that will override the knowntype. Maybe acdvalid can warn if a list or
> select has a knowntype (with its own standard value) and a defined value attribute

The override / warning are intuitive / sensible.
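
The override and warning behaviour agreed in the exchange above can be sketched roughly as follows. This is only an illustration in Python, not the ajacd.c code: the function names, the dictionary standing in for the RESOURCE definitions, and the rule that maps the knowntype "genetic code" to the resource name genetic_code are all assumptions made for the example.

```python
import warnings


def parse_values(values, delimiter=";", codedelimiter=":"):
    """Split an ACD-style values string into an ordered {code: description} dict."""
    entries = {}
    for item in values.split(delimiter):
        item = item.strip()
        if not item:
            continue
        # Split on the first code delimiter only, so descriptions may contain ":".
        code, _, description = item.partition(codedelimiter)
        entries[code.strip()] = description.strip()
    return entries


def resolve_list(acd_values, knowntype, resources):
    """Return the effective list: explicit ACD values win over the knowntype."""
    # Assumption: the resource name is the knowntype with spaces as underscores,
    # as in knowntype "genetic code" and RESOURCE genetic_code.
    standard = resources.get(knowntype.replace(" ", "_"))
    if acd_values:
        if standard is not None:
            # Mirrors the suggested acdvalid warning (hypothetical wording).
            warnings.warn(
                f"knowntype '{knowntype}' has a standard list but values are also defined"
            )
        return parse_values(acd_values)
    if standard is None:
        raise ValueError(f"no values and no resource for knowntype '{knowntype}'")
    return parse_values(standard)


# Hypothetical stand-in for the RESOURCE entries in emboss.default.
resources = {"genetic_code": "0:Standard;11:Bacterial"}
print(resolve_list(None, "genetic code", resources))
```

A missing values attribute falls back to the resource list, while an explicit one is returned unchanged and only triggers a warning, which matches the "override plus warning" outcome of the thread.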
From pmr at ebi.ac.uk Fri Apr 8 13:18:42 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 14:18:42 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <42566F23.F80F2CFD@hgmp.mrc.ac.uk>
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> <42566306.3000908@ebi.ac.uk> <42566F23.F80F2CFD@hgmp.mrc.ac.uk>
Message-ID: <425684B2.1000709@ebi.ac.uk>

Dr J.C. Ison wrote:
> Peter Rice wrote:
>>Thanks. Domainatrix I assume. Which ones? I will take a look and see how they
>>fit in.
>
> The newly committed matgen3d and siggenlig, which both take an "environment
> definition" (amino acid 3D environment) from a list.
>
> At the moment, the environment names are "Env1", "Env2" etc but would get more
> meaningful names once the definitions themselves are more settled (pending
> further research on which ones are most useful).

The genetic code format is very simple - the name, a space and the value
with leading spaces and #commented lines ignored (this is the EGC.index file
for an "@EGC.index" resource value)

 0 Standard with AUG start only
 1 Standard
 2 Vertebrate mitochondrial
 3 Yeast mitochondrial
 4 Mold, Protozoan, and Coelenterate Mitochondrial and Mycoplasma/Spiroplasma
 5 Invertebrate Mitochondrial
 6 Ciliate, Dasycladacean and Hexamita Nuclear
# 7 *Kinetoplast code now merged in code id 4
# 8 *Plant chloroplast all differences due to RNA edit use code id 1
 9 Echinoderm and Flatworm Mitochondrial
10 Euplotid Nuclear
11 Bacterial and Plant Plastid
12 Alternative Yeast Nuclear
13 Ascidian Mitochondrial
14 Alternative Flatworm Mitochondrial
15 Blepharisma Nuclear
16 Chlorophycean Mitochondrial
#17 Never defined
#18 Never defined
#19 Never defined
#20 Never defined
21 Trematode Mitochondrial
22 Scenedesmus obliquus Mitochondrial
23 Thraustochytrium Mitochondrial

> They could simply acdprettyify the files as described in your prev. email
> before parsing, so they wouldn't need to do any new coding.
But maybe not those who use the ACD file at run time :-)

regards,

Peter

From pmr at ebi.ac.uk Fri Apr 8 13:26:59 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 14:26:59 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To:
References: <4255635E.8030609@ebi.ac.uk>
Message-ID: <425686A3.2080808@ebi.ac.uk>

Tim Carver wrote:
>Peter Rice wrote:
>>3. anyone parsing the ACD file will wonder where the value has gone - perhaps
>>acdpretty can be made to fill in missing values with an environment variable
>>set. Would that be acceptable to those who need it?
>
> I guess so. If we just loop over the ACD's after installation and get
> 'acdpretty' to convert them that shouldn't be too bad I would have
> thought... it would only need to be done once.

For list: and selection: the acdpretty output would look normal (the value: ""
attribute can be filled in with the knowntype value).

For matrix: and matrixf: we can leave everything unchanged (add nothing to
the ACD file in acdpretty), or we can offer a list of known matrix filenames
using some new attribute name.

This is a little tricky ... for the alignment programs, there will be
separate lists for nucleotide (only EDNAFULL and EDNAMAT) and protein
(EBLOSUM* and EPAM*) with the allowed values depending on the type of the
input sequences. Of course, as matrix input the user can choose any other
available matrix file if the interface allows.

Any preference (or any special requests to help JEMBOSS)?

regards,

Peter

From pmr at ebi.ac.uk Fri Apr 8 13:30:34 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 14:30:34 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <425685BA.A9658C5F@hgmp.mrc.ac.uk>
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> <42566306.3000908@ebi.ac.uk> <42566F23.F80F2CFD@hgmp.mrc.ac.uk> <425684B2.1000709@ebi.ac.uk> <425685BA.A9658C5F@hgmp.mrc.ac.uk>
Message-ID: <4256877A.4090609@ebi.ac.uk>

Dr J.C.
Ison wrote:
> That format would be ideal. "Env1", "Env2" etc could be replaced by "1", "2" etc
> then text could be added giving a meaningful description of the environment.

The name would be whatever the program accepts for a list (for selection it is
the value, but list is generally preferred in ACD files). I know domainatrix
often uses "1", "2", etc. but they are not always the best choices.

A thought - perhaps the file could have a default marked with * before the
name, or default to the first in the list?

regards,

Peter

From gbottu at ben.vub.ac.be Wed Apr 13 13:22:12 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 13 Apr 2005 15:22:12 +0200
Subject: Genetic codes and other repeated ACD
Message-ID: <20050413132212.GA15521@bigben.ulb.ac.be>

Dear Peter, dear all,

Allow me to add something to the recent discussion about geneticcodes. I
talked about it with Marc Colet, developer of wEMBOSS, and he considers, for
the sake of GUI maintenance, that it is better to avoid making the ACD syntax
too complicated and certainly to avoid changing it too often.

A few ideas :

- Currently emboss.defaults does not contain items that are absolutely
needed. We think it is better not to change that philosophy by putting
e.g. the geneticcodes in it. It could however be an idea to put in
emboss.defaults a list of databanks in BLAST format, for the sake of BLAST
wrappers.

- For items like reading frames and maybe geneticcodes, that appear over
and over again in several ACD files, yet are not user or installation
customizable, the best proposal among those made in this discussion list
seems to me to have it defined in one central file, for the purpose of the
software development, but to "acdpretty" it into the ACD files before
they are distributed, for the sake of GUI functioning.

- There is the case of items where users can choose to use their own data
instead of the EMBOSS distribution data, like symbol comparison matrices
and codon usage tables (would genetic codes fall into this category ?).
Until now a new ACD object type was defined each time, like matrix
and cfile. Is shifting to the use of "knowntype" a good idea ? I do not
know, but let's stay consistent.

- There is the issue of the program embossdata, useful for the advanced
user and a possible tool for displaying choice lists in GUI's. Currently,
when we run it at the BEN site with just the parameter -showall it produces a
monstrously long list, because all the databanks (including CUTG) have been
downloaded and "extracted". Maybe let it by default display only the data
files in the main data directory ? Note that e.g. the list of PRINTS files
is anyway not very interesting, since you cannot do anything with them as
such. Could it be modified so that you can easily get a list of the
alternative data files used by a particular program (or could a library
routine called by the program itself do that) ?

Sincerely,

Guy Bottu, BEN

From pmr at ebi.ac.uk Wed Apr 13 14:30:05 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 13 Apr 2005 15:30:05 +0100
Subject: Genetic codes and other repeated ACD
In-Reply-To: <20050413132212.GA15521@bigben.ulb.ac.be>
References: <20050413132212.GA15521@bigben.ulb.ac.be>
Message-ID: <425D2CED.60503@ebi.ac.uk>

Guy Bottu wrote:
> - Currently emboss.defaults does not contain items that are absolutely
> needed. We think it is better not to change that philosophy by putting
> e.g. the geneticcodes in it. It could however be an idea to put in
> emboss.defaults a list of databanks in BLAST format, for the sake of BLAST
> wrappers.

They will not be absolutely needed. There will be a default - a list of
values, a file with a list of values, or a script that finds everything.
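
As a rough illustration of the first two default sources mentioned here (an inline list of values, or a file with a list of values), the sketch below reads the EGC.index layout described earlier in the thread: name, a space, the description, with leading spaces and # lines ignored. It is written in Python for brevity and is not the EMBOSS implementation; the function names are hypothetical, the "*" default marker is only the idea floated in the thread, and the script case is left out.

```python
from pathlib import Path


def parse_index(text):
    """Parse '<name> <description>' lines; return (entries, default_name)."""
    entries = {}
    default = None
    for raw in text.splitlines():
        line = raw.strip()                 # leading spaces are ignored
        if not line or line.startswith("#"):
            continue                       # blank and #commented lines are skipped
        marked = line.startswith("*")      # assumed syntax: "*" before the name
        if marked:
            line = line[1:].lstrip()
        name, _, description = line.partition(" ")
        entries[name] = description.strip()
        if marked and default is None:
            default = name
    if default is None and entries:
        default = next(iter(entries))      # otherwise default to the first entry
    return entries, default


def load_resource(value, data_dir="."):
    """Resolve a resource value: '@file' reads an index file, else an inline list."""
    if value.startswith("@"):
        return parse_index((Path(data_dir) / value[1:]).read_text())
    entries = {}
    for item in value.split(";"):
        if item.strip():
            code, _, description = item.partition(":")
            entries[code.strip()] = description.strip()
    return entries, next(iter(entries), None)


print(load_resource("0:Standard;11:Bacterial"))
```

Keeping the default in the same file as the values follows the suggestion made earlier in the thread that both could be retrieved at the same time.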
> - For items like reading frames and maybe geneticcodes, that appear over > and over again in several ACD files, yet are not user or installation > customizable, the best proposal among those made in this discussion list > seems to me to have it defined in one central file, for the purpose of the > software developement, but to "acdpretty" it into the ACD files before > they are distributed, for the sake of GUI functioning. This will be the default ... but the distributed files will *not* have the values filled in (if we fill the values in, the automatic list will not work when users add new options :-). You will need to run acdpretty yourself. That way, if you add extra options locally you will get them in the acdpretty file. There is nothing to stop you copying that file on top of the original acd file. > - There is the case of items where users can choose to use their own data > instead of the EMBOSS distribution data, like symbol comparison matrices > and codon usage tables (would genetic codes fall into this catagory ?). > Till now there was each time a new ACD object type defined, like matrix > and cfile. Is shifting to the use of "knowntype" a good idea ? I do not > know, but, let's keep consistent. The same will happen for these ... but matrix files are complicated. For programs that read nucleotide and protein, the list will have to include all matrix files. > - There is the issue of the program embossdata, useful for the advanced > user and a possible tool for displaying choice lists in GUI's. Currently, > when we run it at the BEN site with just the parameter -showall it produces a > monstruous long list, because all the databanks (including CUTG) have been > downloaded and "extracted". Maybe let it by default display only the data > files in the main data directory ? Note that e.g. the list of PRINTS files > is anyway not very interesting, since you cannot do anything with them as > such. 
Could it be modified so that you can easily get a list of the
> alternative data files used by a particular program (or could a library
> routine called by the program itself do that) ?

I have modified embossdata to always prompt for a filename (default of no
file still lists all files). Options to select the other directories are
interesting because (1) you get less output and (2) we will have a new
internal default for the list of directories used by embossdata!

Hope that makes things clearer, and thanks for the comments.

Peter

From senger at ebi.ac.uk Tue Apr 19 10:11:46 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Tue, 19 Apr 2005 11:11:46 +0100 (BST)
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <4255635E.8030609@ebi.ac.uk>
Message-ID:

> RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ]
>

I am not knowledgeable enough about EMBOSS (in particular I know almost
nothing about EGC.index etc.) to be helpful here, but please allow me to
ask a question:

If I understand it correctly you are actually talking about replacing
often-repeated pieces of ACD files by a reference to a common (shared)
place where the piece is stored just once. But that seems to be exactly
the scenario addressed by all kinds of 'include' directives. So what
about adding a general syntax for inclusion to ACD? Then you could
replace not only genetic codes but any other repeating piece any time you
wish. And it would be transparent for the ACD parsers (they just need to
know where to look for the included files).

Just my 2cents,
Martin

--
Martin Senger

EMBL Outstation - Hinxton                Senger at EBI.ac.uk
European Bioinformatics Institute        Phone: (+44) 1223 494636
Wellcome Trust Genome Campus             (Switchboard: 494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger

From jrvalverde at cnb.uam.es Thu Apr 21 09:58:51 2005
From: jrvalverde at cnb.uam.es (José R.
Valverde)
Date: Thu, 21 Apr 2005 11:58:51 +0200
Subject: Wiki
Message-ID: <20050421115851.49380dc9.jrvalverde@cnb.uam.es>

I would rather welcome a Wiki for EMBOSS documentation. I can host it at
Es.EMBnet.Org/es.emboss.org, no problem at that.

The reason is that as I run into problems/tricks/tasks to do, I see
comments that might be added here and there in the documentation. I
would rather go to a single site and make the changes myself than
go through the hassle of devising a 'diff' comment, finding out who
to mail, mailing them and waiting for a new doc release.

If there is interest, I can set it up straight away.

j

From jrvalverde at cnb.uam.es Thu Apr 21 10:33:21 2005
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Thu, 21 Apr 2005 12:33:21 +0200
Subject: CUTG
Message-ID: <20050421123321.5574df12.jrvalverde@cnb.uam.es>

I just saw there are new improvements in cutgextract... Great!

However, if I may make a suggestion, it would be nice if it were to
save the codon tables in a hierarchical arrangement.

I just converted CUTG... 25k files in all. Amazing! Useful! All that
deserves a great Yes! but has a serious problem: users of the command
line may try an 'ls Emyorganism*' and find their table, but users of
GUIs will have a tough time navigating through a pull-down menu with
25 thousand options !

Certainly, the GUI might take (partial) care of that by grouping tables
by the pre-underscore part (organism name), but there would still be too
many.

So, perhaps it would be better if CUTG were stored in $EMBOSS_DATA/CUTG,
with each section under its own directory, and tables in each section
arranged by e.g. organism or first letter / first two letters.

This may become an interesting question for the emboss users mailing list..
j -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From pmr at ebi.ac.uk Thu Apr 21 10:38:30 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 21 Apr 2005 11:38:30 +0100 Subject: CUTG In-Reply-To: <20050421123321.5574df12.jrvalverde@cnb.uam.es> References: <20050421123321.5574df12.jrvalverde@cnb.uam.es> Message-ID: <426782A6.5020900@ebi.ac.uk> Jos? R. Valverde wrote: > I just saw there are new improvements in cutgextract... Great! > > However, if I may make a suggestion, it would be nice if it where to > save the codon tables in a hierarchical arrangement. > > I just converted CUTG... 25k files in all. Amazing! Useful! all thay > deserves a great Yes! but has a serious problem: users of the command > line may try an 'ls Emyorganism*' and find their table, but users of > GUIs will have a tough time to navigate through a pull-down menu with > 25 thousand options ! The plan I have is a little different ... ... to allow a CUTG entry to be retrieved from SRS (haha - has everyone seen the news from LION?) or from the CUTG server through some non-sequence access method that can return the text of an entry from CUTG, PROSITE, and otehr databases. But at least CUTGEXTRACT can now extract a single species for you so there is no need to extract all 25,000 entries. Hope this helps Peter From pmr at ebi.ac.uk Thu Apr 21 16:20:24 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 21 Apr 2005 17:20:24 +0100 Subject: [EMBOSS] Wiki In-Reply-To: <20050421115851.49380dc9.jrvalverde@cnb.uam.es> References: <20050421115851.49380dc9.jrvalverde@cnb.uam.es> Message-ID: <4267D2C8.10009@ebi.ac.uk> Jos? R. Valverde wrote: > I would rather welcome a Wiki for EMBOSS documentation. We have all the documentation (including the sourceforge web pages) in CVS. Any member of the development/documentation team can make updates there. 
No need for a wiki for this - and a wiki would be difficult to manage as most of the documentation is generated automatically. > The reason is that as I run into problems/tricks/tasks to do, I see > comments that might be added here and there in the documentation. I > would rather go to a single site and make the changes myself than > go throught he hassle of devising a 'diff' comment, finding out who > to mail, mailing them andn waiting for a new doc release. Just mail anything like that to emboss-bug. After all ... there is not much point in changing a wiki version of the documentation if we are busy changing the application and the real documentation :-) regards, Peter From jrvalverde at cnb.uam.es Fri Apr 22 08:11:18 2005 From: jrvalverde at cnb.uam.es (=?ISO-8859-15?Q?Jos=E9?= R. Valverde) Date: Fri, 22 Apr 2005 10:11:18 +0200 Subject: [EMBOSS] Wiki (and Macs) In-Reply-To: <4267D2C8.10009@ebi.ac.uk> References: <20050421115851.49380dc9.jrvalverde@cnb.uam.es> <4267D2C8.10009@ebi.ac.uk> Message-ID: <20050422101118.33b19892.jrvalverde@cnb.uam.es> On Thu, 21 Apr 2005 17:20:24 +0100 Peter Rice wrote: > > After all ... there is not much point in changing a wiki version of the > documentation if we are busy changing the application and the real > documentation :-) > > regards, > > Peter Right you are Sir. I guess it's better as it is for now. And yet... Speaking generally, it probably boils down to the management model we want for EMBOSS. As it is now I tend to see it much like a Cathedral than a Bazaar. Truly it isn't, but you must agree it is not so evident from the docs what the procedures are for participation. At least not at first sight. I'm more for the Bazaar model, one where everyone is welcome and making changes is as trivial as possible (specially for end-users and end-user-related material, like docs). I'd rather have that as a 'common' to build a user community around. Game theory shows that to be the best strategy in the long run (see e.g. 
http://encyclopedia.laborlawtalk.com/Tragedy_of_the_commons ). In the short run, with limited resources as the EMBOSS team currently is, you are right it takes a significant effort and portion of the existing resources. It makes more sense to concentrate on the short term now and surviving enough to drive new resources in. But I think we should have that in sight for the long term. j -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From jrvalverde at cnb.uam.es Fri Apr 22 08:20:58 2005 From: jrvalverde at cnb.uam.es (=?ISO-8859-15?Q?Jos=E9?= R. Valverde) Date: Fri, 22 Apr 2005 10:20:58 +0200 Subject: Macintosh EMBOSS In-Reply-To: <4267D2C8.10009@ebi.ac.uk> References: <20050421115851.49380dc9.jrvalverde@cnb.uam.es> <4267D2C8.10009@ebi.ac.uk> Message-ID: <20050422102058.2ca36edb.jrvalverde@cnb.uam.es> I'm trying to find out ways to fund EMBOSS in a way that I can justify locally. Mac users are a growing 'market' and a promising community. I've got here hundreds of Macs, and they need an easy to use, install and manage solution. What is needed (they tell me) is a good editor, and some interactive graphic facilities for common, simple tasks. Actually, locally, we are going to spend a significant amount into buying a handful of licenses for commercial software. I've tried Erik's CD, but it has some drawbacks regarding the configuration on non-user-managed Macs (as those where root belongs to a central authority): Here they can install software but not make modifications. I can't either, being on the SciComp side and not on the Offimatic end. I don't have the resources to do that locally, but would welcome a sensible way to fund it (like buying 'licenses', packages, CDs or manuals from an EMBOSS-centered company). I for one would certainly welcome a Macintosh edition ready to run, and easy to configure to use central databases. 
If I were to chose, I'd try to add those facilities to Jemboss (a sequence editor, and interactive drawing of clones and molecular graphics). This is the most lacking thing in EMBOSS now that every user has or can have a UNIX machine at their desktop. And, certainly, I would happily recommend locally that we buy a hundred+ licenses at a reasonable price if that would help fund EMBOSS. Most ideally, something like the LiveDVD from AT.EMBnet.Org but for Macs would be a candy. And an easy to justify buy. Any recommendations? Takers? Pointers? j -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: