From pmr at ebi.ac.uk  Thu Apr  7 12:44:14 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 07 Apr 2005 17:44:14 +0100
Subject: Genetic codes and other repeated ACD lists
Message-ID: <4255635E.8030609@ebi.ac.uk>

I have found a way to save writing and maintaining lists like these in ACD files:

   list: table  [
     additional: "Y"
     default: "0"
     minimum: "1"
     maximum: "1"
     header: "Genetic codes"
     values: "0:Standard; 1:Standard (with alternative initiation
              codons); 2:Vertebrate Mitochondrial; 3:Yeast Mitochondrial;
              4:Mold, Protozoan, Coelenterate Mitochondrial and
              Mycoplasma/Spiroplasma; 5:Invertebrate Mitochondrial; 6:Ciliate
              Macronuclear and Dasycladacean; 9:Echinoderm Mitochondrial;
              10:Euplotid Nuclear; 11:Bacterial; 12:Alternative Yeast Nuclear;
              13:Ascidian Mitochondrial; 14:Flatworm Mitochondrial;
              15:Blepharisma Macronuclear; 16:Chlorophycean Mitochondrial;
              21:Trematode Mitochondrial; 22:Scenedesmus obliquus;
              23:Thraustochytrium Mitochondrial"
     delimiter: ";"
     codedelimiter: ":"
     information: "Code to use"
     knowntype: "genetic code"
   ]


Using the "knowntype" attribute it is possible to delete the value attribute, 
and to define a standard list using a "resource" definition in the 
emboss.default (or .embossrc) file like this:

RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ]

(for just 2 genetic codes)

or

RESOURCE genetic_code [ type: "list" value: "@EGC.index" ]

(for a list of all the genetic codes - this will read a datafile EGC.index 
which is new in CVS).
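
To make the two forms of resource value concrete, here is a minimal Python
sketch (not EMBOSS code; the function name resolve_resource_value is
hypothetical). It assumes the same ";" and ":" delimiters as the ACD list
above, and only classifies the value; it does not read any file.

```python
def resolve_resource_value(value, delimiter=";", codedelimiter=":"):
    """Classify a resource value as a file reference or an inline list.

    Hypothetical helper for illustration, not part of EMBOSS.
    """
    value = value.strip()
    if value.startswith("@"):
        # "@EGC.index" means: take the entries from the named data file.
        return ("file", value[1:])
    pairs = []
    for item in value.split(delimiter):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing delimiter
        code, _, label = item.partition(codedelimiter)
        pairs.append((code.strip(), label.strip()))
    return ("inline", pairs)


print(resolve_resource_value("0:Standard;11:Bacterial"))
print(resolve_resource_value("@EGC.index"))
```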

Other resource definitions could be commands to execute.

I have not yet decided whether to allow a value of "@EGC.index" in the ACD 
file itself. It could be a nice short cut, but I like using a "knowntype" to 
control the results.

There are some problems to solve:

1. the resource is tested in too many places - it should replace the "value" 
attribute when it is first used. Not hard to do.

2. there should be a clean way to define a default value for each knowntype - 
for example calling an ajTrn function to resolve the "genetic code" knowntype 
to a value. Functions can be defined for list knowntypes in ajacd.c

3. anyone parsing the ACD file will wonder where the value has gone - perhaps 
acdpretty can be made to fill in missing values with an environment variable 
set. Would that be acceptable to those who need it?

Future uses for this:

1. standard list of genetic codes with descriptions

2. standard reading frame names

3. list of known codon usage files, matrices, etc. by specifying "?" as the value

4. a list of blast databases for a blastall wrapper :-)

5. replacing "string" qualifiers which have a knowntype with a selection that 
can display and test the list of acceptable values in ACD, to avoid a run-time 
failure

Comments please ....

Peter


From jison at hgmp.mrc.ac.uk  Fri Apr  8 06:34:51 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Fri, 08 Apr 2005 11:34:51 +0100
Subject: Genetic codes and other repeated ACD lists
References: <4255635E.8030609@ebi.ac.uk>
Message-ID: <42565E4B.1232945@hgmp.mrc.ac.uk>

Hi Peter

Comments below.

Cheers

Jon


Peter Rice wrote:
> 
> I have found a way to save writing and maintaining lists like these in ACD files:
> 
>    list: table  [
>      additional: "Y"
>      default: "0"
>      minimum: "1"
>      maximum: "1"
>      header: "Genetic codes"
>      values: "0:Standard; 1:Standard (with alternative initiation
>               codons); 2:Vertebrate Mitochondrial; 3:Yeast Mitochondrial;
>               4:Mold, Protozoan, Coelenterate Mitochondrial and
>               Mycoplasma/Spiroplasma; 5:Invertebrate Mitochondrial; 6:Ciliate
>               Macronuclear and Dasycladacean; 9:Echinoderm Mitochondrial;
>               10:Euplotid Nuclear; 11:Bacterial; 12:Alternative Yeast Nuclear;
>               13:Ascidian Mitochondrial; 14:Flatworm Mitochondrial;
>               15:Blepharisma Macronuclear; 16:Chlorophycean Mitochondrial;
>               21:Trematode Mitochondrial; 22:Scenedesmus obliquus;
>               23:Thraustochytrium Mitochondrial"
>      delimiter: ";"
>      codedelimiter: ":"
>      information: "Code to use"
>      knowntype: "genetic code"
>    ]
> 
> Using the "knowntype" attribute it is possible to delete the value attribute,
> and to define a standard list using a "resource" definition in the
> emboss.default (or .embossrc) file like this:
> 
> RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ]
> 
> (for just 2 genetic codes)
> 
> or
> 
> RESOURCE genetic_code [ type: "list" value: "@EGC.index" ]
> 
> (for a list of all the genetic codes - this will read a datafile EGC.index
> which is new in CVS).
> 
> Other resource definitions could be commands to execute.

It'd be cleaner, more flexible and easier to maintain, and if it is not a 
requirement now it will probably be an increasing one in the future.  I've two 
progs that would benefit from it now.

 
> I have not yet decided whether to allow a value of "@EGC.index" in the ACD
> file itself. It could be a nice short cut, but I like using a "knowntype" to
> control the results.

Could be confusing to allow that in the ACD file because the punter might 
think EGC existed, e.g. as a data item, in the file itself and get confused
when they can't find it.

 
> There are some problems to solve:
> 
> 1. the resource is tested in too many places - it should replace the "value"
> attribute when it is first used. Not hard to do.
> 
> 2. there should be a clean way to define a default value for each knowntype -
> for example calling an ajTrn function to resolve the "genetic code" knowntype
> to a value. Functions can be defined for list knowntypes in ajacd.c

Couldn't the default be specified in the same place / file as the values themselves?
Presumably the default value would be needed before run-time proper and could
be retrieved at the same time as the values are.

> 
> 3. anyone parsing the ACD file will wonder where the value has gone - perhaps
> acdpretty can be made to fill in missing values with an environment variable
> set. Would that be acceptable to those who need it?


I think it would be nice to support both "standard" lists (ie. ones *with* "values" 
attribute) and the new style.  Perhaps something like:

      values: "@knowntype"  

to indicate to use the knowntype to get the values, *or*

      values: "0: Standard ... etc" as before.

Then the values attribute would always be there, with the ACD developer having 
the option to specify a standard list of values or to get the values from the 
knowntype.


> Future uses for this:
> 
> 1. standard list of genetic codes with descriptions
> 
> 2. standard reading frame names
> 
> 3. list of known codon usage files, matrices, etc. by specifying "?" as the value
> 
> 4. a list of blast databases for a blastall wrapper :-)
> 
> 5. replacing "string" qualifiers which have a knowntype with a selection that
> can display and test the list of acceptable values in ACD, to avoid a run-time
> failure
> 
> Comments please ....
> 
> Peter

-- 
Jon C. Ison, PhD
Proteomics Applications Group
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494500  Fax: +44 1223 494512
E-mail: jison at rfcgr.mrc.ac.uk  Web: http://www.rfcgr.mrc.ac.uk


From pmr at ebi.ac.uk  Fri Apr  8 06:55:02 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 11:55:02 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <42565E4B.1232945@hgmp.mrc.ac.uk>
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk>
Message-ID: <42566306.3000908@ebi.ac.uk>

Dr J.C. Ison wrote:

> Peter Rice wrote:
>>I have found a way to save writing and maintaining lists like these in ACD files:
>>
>>Using the "knowntype" attribute it is possible to delete the value attribute,
>>and to define a standard list using a "resource" definition in the
>>emboss.default (or .embossrc) file like this:
> 
> It'd be cleaner, more flexible and easier to maintain, and if it is not a 
> requirement now it will probably be an increasing one in the future.  I've two 
> progs that would benefit from it now.

Thanks. Domainatrix I assume. Which ones? I will take a look and see how they 
fit in.

> Couldn't the default be specified in the same place / file as the values themselves?
> Presumably the default value would be needed before run-time proper and could
> be retrieved at the same time as the values are.

Good point. I need to think some more about whether a knowntype should have a 
default. Genetic codes are a good example - we use a default of 0 but 
strictly genetic code numbers are 1 to 23 (0 is code 1 with only ATG as a start).

> I think it would be nice to support both "standard" lists (ie. ones *with* "values" 
> attribute) and the new style.  Perhaps something like:
> 
>       values: "@knowntype"  

A missing value will do this. Normally the value is required, so an ACD 
developer (or parser) will know it needs a value. (I hope :-)

> to indicate to use the knowntype to get the values, *or*
> 
>       values: "0: Standard ... etc" as before.

Yes, that will override the knowntype. Maybe acdvalid can warn if a list or 
select has a knowntype (with its own standard value) and a defined value attribute.
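
The override and the warning could be sketched like this in Python. This is
only an illustration, not what acdvalid does: the function name
effective_values is hypothetical, the resources are modelled as a plain dict,
and mapping the knowntype "genetic code" to the resource name "genetic_code"
by replacing spaces with underscores is an assumption.

```python
def effective_values(values, knowntype, resources):
    """Return (values, warnings) for a list or selection qualifier.

    Hypothetical sketch: an explicit values attribute wins over the
    standard list defined for the knowntype, but triggers a warning.
    """
    warnings = []
    # Assumption: knowntype "genetic code" maps to resource "genetic_code".
    resource_name = knowntype.replace(" ", "_") if knowntype else None
    standard = resources.get(resource_name) if resource_name else None
    if values:
        if standard is not None:
            warnings.append(
                "values attribute overrides the standard list for "
                f"knowntype '{knowntype}'"
            )
        return values, warnings
    # No values in the ACD file: fall back to the knowntype's standard list.
    return standard, warnings


resources = {"genetic_code": "0:Standard;11:Bacterial"}
print(effective_values(None, "genetic code", resources))
print(effective_values("0:Standard", "genetic code", resources))
```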

More comments please!

Peter


From jison at hgmp.mrc.ac.uk  Fri Apr  8 07:46:43 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Fri, 08 Apr 2005 12:46:43 +0100
Subject: Genetic codes and other repeated ACD lists
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> <42566306.3000908@ebi.ac.uk>
Message-ID: <42566F23.F80F2CFD@hgmp.mrc.ac.uk>

Peter Rice wrote:

> Thanks. Domainatrix I assume. Which ones? I will take a look and see how they
> fit in.

The newly committed matgen3d and siggenlig, which both take an "environment 
definition" (amino acid 3D environment) from a list.

At the moment, the environment names are "Env1", "Env2" etc but would get more 
meaningful names once the definitions themselves are more settled (pending 
further research on which ones are most useful).

The progs. then use the selection (Env1 or Env2 etc) to call appropriate functions
within the application code.

> > I think it would be nice to support both "standard" lists (ie. ones *with* "values"
> > attribute) and the new style.  Perhaps something like:
> >
> >       values: "@knowntype"
> 
> A missing value will do this. Normally the value is required, so an ACD
> developer (or parser) will know it needs a value. (I hope :-)

They could simply acdprettyify the files as described in your prev. email 
before parsing, so they wouldn't need to do any new coding.


> 
> > to indicate to use the knowntype to get the values, *or*
> >
> >       values: "0: Standard ... etc" as before.
> 
> Yes, that will override the knowntype. Maybe acdvalid can warn if a list or
> select has a knowntype (with its own standard value) and a defined value attribute

The override / warning are intuitive / sensible.


From pmr at ebi.ac.uk  Fri Apr  8 09:18:42 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 14:18:42 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <42566F23.F80F2CFD@hgmp.mrc.ac.uk>
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> <42566306.3000908@ebi.ac.uk> <42566F23.F80F2CFD@hgmp.mrc.ac.uk>
Message-ID: <425684B2.1000709@ebi.ac.uk>

Dr J.C. Ison wrote:

> Peter Rice wrote:
>>Thanks. Domainatrix I assume. Which ones? I will take a look and see how they
>>fit in.
> 
> The newly committed matgen3d and siggenlig, which both take an "environment 
> definition" (amino acid 3D environment) from a list.
> 
> At the moment, the environment names are "Env1", "Env2" etc but would get more 
> meaningful names once the definitions themselves are more settled (pending 
> further research on which ones are most useful).

The genetic code format is very simple - each line has the name, a space and 
the value. Leading spaces and #commented lines are ignored (this is the 
EGC.index file for an "@EGC.index" resource value):

  0 Standard with AUG start only
  1 Standard
  2 Vertebrate mitochondrial
  3 Yeast mitochondrial
  4 Mold, Protozoan, and Coelenterate Mitochondrial and Mycoplasma/Spiroplasma
  5 Invertebrate Mitochondrial
  6 Ciliate, Dasycladacean and Hexamita Nuclear
# 7 *Kinetoplast code now merged in code id 4
# 8 *Plant chloroplast all differences due to RNA edit use code id 1
  9 Echinoderm and Flatworm Mitochondrial
10 Euplotid Nuclear
11 Bacterial and Plant Plastid
12 Alternative Yeast Nuclear
13 Ascidian Mitochondrial
14 Alternative Flatworm Mitochondrial
15 Blepharisma Nuclear
16 Chlorophycean Mitochondrial
#17 Never defined
#18 Never defined
#19 Never defined
#20 Never defined
21 Trematode Mitochondrial
22 Scenedesmus obliquus Mitochondrial
23 Thraustochytrium Mitochondrial
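
Reading that format takes only a few lines. The following Python sketch (the
function name parse_index is hypothetical, and this is not the EMBOSS
implementation) follows the rules as described: leading spaces are dropped,
lines starting with # are skipped, and the first space separates the name from
the value.

```python
def parse_index(text):
    """Parse index-file text into a list of (name, value) pairs."""
    entries = []
    for line in text.splitlines():
        stripped = line.lstrip()  # leading spaces are ignored
        if not stripped or stripped.startswith("#"):
            continue  # skip blank and #commented lines
        name, _, value = stripped.partition(" ")
        entries.append((name, value.strip()))
    return entries


sample = """\
  0 Standard with AUG start only
  1 Standard
# 7 *Kinetoplast code now merged in code id 4
10 Euplotid Nuclear
#17 Never defined
"""
for name, value in parse_index(sample):
    print(name, "=", value)
```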


> They could simply acdprettyify the files as described in your prev. email 
> before parsing, so they wouldn't need to do any new coding.

But maybe not those who use the ACD file at run time :-)

regards,

Peter


From pmr at ebi.ac.uk  Fri Apr  8 09:26:59 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 14:26:59 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <Pine.OSF.4.58.0504081324290.449852@sidious.internal.sanger.ac.uk>
References: <4255635E.8030609@ebi.ac.uk> <Pine.OSF.4.58.0504081324290.449852@sidious.internal.sanger.ac.uk>
Message-ID: <425686A3.2080808@ebi.ac.uk>

Tim Carver wrote:

>Peter Rice wrote: 
>>3. anyone parsing the ACD file will wonder where the value has gone - perhaps
>>acdpretty can be made to fill in missing values with an environment variable
>>set. Would that be acceptable to those who need it?
> 
> I guess so. If we just loop over the ACD's after installation and get
> 'acdpretty' to convert them that shouldn't be too bad I would have
> thought... it would only need to be done once.

For list: and selection: the acdpretty output would look normal (the value: "" 
attribute can be filled in with the knowntype value).

For matrix: and matrixf: we can leave everything unchanged (add nothing to the 
ACD file in acdpretty), or we can offer a list of known matrix filenames using 
some new attribute name. This is a little tricky ... for the alignment 
programs, there will be separate lists for nucleotide (only EDNAFULL and 
EDNAMAT) and protein (EBLOSUM* and EPAM*) with the allowed values depending on 
the type of the input sequences. Of course, as matrix input the user can 
choose any other available matrix file if the interface allows.
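
As a rough illustration of the two lists, here is a Python sketch that filters
a set of available matrix file names by sequence type. The function name and
the pattern table are hypothetical; the patterns are the ones named above, and
the example file names are only assumed to be present in the data directory.

```python
from fnmatch import fnmatchcase

# Patterns taken from the description above; the table itself is hypothetical.
MATRIX_PATTERNS = {
    "nucleotide": ["EDNAFULL", "EDNAMAT"],
    "protein": ["EBLOSUM*", "EPAM*"],
}


def matrices_for(seqtype, available):
    """Return the names in 'available' that are allowed for 'seqtype'."""
    patterns = MATRIX_PATTERNS[seqtype]
    return [
        name
        for name in available
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Assumed directory contents, for illustration only.
available = ["EBLOSUM62", "EDNAFULL", "EPAM250", "EDNAMAT", "Eother"]
print(matrices_for("nucleotide", available))
print(matrices_for("protein", available))
```

A program that reads both nucleotide and protein sequences would need the
union of the two lists.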

Any preference (or any special requests to help JEMBOSS)?

regards,

Peter


From pmr at ebi.ac.uk  Fri Apr  8 09:30:34 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 14:30:34 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <425685BA.A9658C5F@hgmp.mrc.ac.uk>
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> <42566306.3000908@ebi.ac.uk> <42566F23.F80F2CFD@hgmp.mrc.ac.uk> <425684B2.1000709@ebi.ac.uk> <425685BA.A9658C5F@hgmp.mrc.ac.uk>
Message-ID: <4256877A.4090609@ebi.ac.uk>

Dr J.C. Ison wrote:

> That format would be ideal.  "Env1", "Env2" etc could be replaced by "1", "2" etc
> then text could be added giving a meaningful description of the environment.

The name would be whatever the program accepts for a list (for selection it is 
the value, but list is generally preferred in ACD files). I know domainatrix 
often uses "1", "2", etc. but they are not always the best choices.

A thought - perhaps the file could have a default marked with * before the 
name, or default to the first in the list?
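
Both rules fit in one small function. This Python sketch is only a guess at how
it might work: the function name is hypothetical, and it assumes the * is
written directly in front of the name with no space in between.

```python
def read_names_with_default(text):
    """Return (names, default) from index-file text.

    Assumption: a default is marked by '*' immediately before the name.
    If no entry is marked, the first entry in the list is the default.
    """
    names = []
    default = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank and #commented lines
        name = stripped.split(None, 1)[0]
        if name.startswith("*") and len(name) > 1:
            name = name[1:]
            if default is None:
                default = name  # first marked entry wins
        names.append(name)
    if default is None and names:
        default = names[0]
    return names, default


print(read_names_with_default("0 Standard with AUG start only\n*1 Standard\n"))
print(read_names_with_default("0 Standard with AUG start only\n1 Standard\n"))
```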

regards,

Peter


From gbottu at ben.vub.ac.be  Wed Apr 13 09:22:12 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 13 Apr 2005 15:22:12 +0200
Subject: Genetic codes and other repeated ACD
Message-ID: <20050413132212.GA15521@bigben.ulb.ac.be>

	Dear Peter, dear all,

Allow me to add something to the recent discussion about geneticcodes. I 
talked about it with Marc Colet, developer of wEMBOSS, and he considers, 
for the sake of GUI maintenance, that it is better to avoid making the ACD 
syntax too complicated and certainly to avoid changing it too often.
A few ideas :

- Currently emboss.defaults does not contain items that are absolutely 
needed. We think it is better not to change that philosophy by putting 
e.g. the geneticcodes in it. It could however be an idea to put in 
emboss.defaults a list of databanks in BLAST format, for the sake of BLAST 
wrappers.

- For items like reading frames and maybe geneticcodes, that appear over 
and over again in several ACD files, yet are not user or installation 
customizable, the best proposal among those made in this discussion list 
seems to me to have it defined in one central file, for the purpose of the 
software development, but to "acdpretty" it into the ACD files before 
they are distributed, for the sake of GUI functioning.

- There is the case of items where users can choose to use their own data 
instead of the EMBOSS distribution data, like symbol comparison matrices 
and codon usage tables (would genetic codes fall into this category ?). 
Till now there was each time a new ACD object type defined, like matrix 
and cfile. Is shifting to the use of "knowntype" a good idea ? I do not 
know, but, let's keep consistent.

- There is the issue of the program embossdata, useful for the advanced 
user and a possible tool for displaying choice lists in GUI's. Currently, 
when we run it at the BEN site with just the parameter -showall it produces a 
monstrously long list, because all the databanks (including CUTG) have been 
downloaded and "extracted". Maybe let it by default display only the data 
files in the main data directory ? Note that e.g. the list of PRINTS files 
is anyway not very interesting, since you cannot do anything with them as 
such. Could it be modified so that you can easily get a list of the 
alternative data files used by a particular program (or could a library 
routine called by the program itself do that) ?
 
	Sincerely,
	Guy Bottu,
	BEN


From pmr at ebi.ac.uk  Wed Apr 13 10:30:05 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 13 Apr 2005 15:30:05 +0100
Subject: Genetic codes and other repeated ACD
In-Reply-To: <20050413132212.GA15521@bigben.ulb.ac.be>
References: <20050413132212.GA15521@bigben.ulb.ac.be>
Message-ID: <425D2CED.60503@ebi.ac.uk>

Guy Bottu wrote:

> - Currently emboss.defaults does not contain items that are absolutely 
> needed. We think it is better not to change that philosophy by putting 
> e.g. the geneticcodes in it. It could however be an idea to put in 
> emboss.defaults a list of databanks in BLAST format, for the sake of BLAST 
> wrappers.

They will not be absolutely needed. There will be a default - a list of 
values, a file with a list of values, or a script that finds everything.

> - For items like reading frames and maybe geneticcodes, that appear over 
> and over again in several ACD files, yet are not user or installation 
> customizable, the best proposal among those made in this discussion list 
> seems to me to have it defined in one central file, for the purpose of the 
> software development, but to "acdpretty" it into the ACD files before 
> they are distributed, for the sake of GUI functioning.

This will be the default ... but the distributed files will *not* have the 
values filled in (if we fill the values in, the automatic list will not work 
when users add new options :-).

You will need to run acdpretty yourself. That way, if you add extra options 
locally you will get them in the acdpretty file. There is nothing to stop you 
copying that file on top of the original acd file.

> - There is the case of items where users can choose to use their own data 
> instead of the EMBOSS distribution data, like symbol comparison matrices 
> and codon usage tables (would genetic codes fall into this category ?). 
> Till now there was each time a new ACD object type defined, like matrix 
> and cfile. Is shifting to the use of "knowntype" a good idea ? I do not 
> know, but, let's keep consistent.

The same will happen for these ... but matrix files are complicated. For 
programs that read nucleotide and protein, the list will have to include all 
matrix files.

> - There is the issue of the program embossdata, useful for the advanced 
> user and a possible tool for displaying choice lists in GUI's. Currently, 
> when we run it at the BEN site with just the parameter -showall it produces a 
> monstrously long list, because all the databanks (including CUTG) have been 
> downloaded and "extracted". Maybe let it by default display only the data 
> files in the main data directory ? Note that e.g. the list of PRINTS files 
> is anyway not very interesting, since you cannot do anything with them as 
> such. Could it be modified so that you can easily get a list of the 
> alternative data files used by a particular program (or could a library 
> routine called by the program itself do that) ?

I have modified embossdata to prompt always for a filename (default of no file 
still lists all files).

Options to select the other directories are interesting because (1) you get 
less output and (2) we will have a new internal default for the list of 
directories used by embossdata!

Hope that makes things clearer, and thanks for the comments.

Peter


From senger at ebi.ac.uk  Tue Apr 19 06:11:46 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Tue, 19 Apr 2005 11:11:46 +0100 (BST)
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <4255635E.8030609@ebi.ac.uk>
Message-ID: <Pine.LNX.4.44.0504191101190.15035-100000@bagheera.ebi.ac.uk>

> RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ]
>
   I am not knowledgeable enough about EMBOSS, and in particular I know almost
nothing about EGC.index etc., to be helpful here, but please allow me to ask
a question:
   If I understand correctly, you are talking about replacing
often-repeated pieces of ACD files with a reference to a common (shared)
place where the piece is stored just once. But that is exactly the
scenario that 'include' directives of all kinds are for. So what about
adding a general syntax for inclusion to ACD? Then you could
replace not only genetic codes but any other repeating piece any time you
wish. And it would be transparent to the ACD parsers (they just need to
know where to look for the included files).

   Just my 2cents,
   Martin

-- 
Martin Senger

EMBL Outstation - Hinxton                Senger at EBI.ac.uk     
European Bioinformatics Institute        Phone: (+44) 1223 494636      
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger


From jrvalverde at cnb.uam.es  Thu Apr 21 05:58:51 2005
From: jrvalverde at cnb.uam.es (=?ISO-8859-15?Q?Jos=E9?= R. Valverde)
Date: Thu, 21 Apr 2005 11:58:51 +0200
Subject: Wiki
Message-ID: <20050421115851.49380dc9.jrvalverde@cnb.uam.es>

I would rather welcome a Wiki for EMBOSS documentation.

I can host it at Es.EMBnet.Org/es.emboss.org, no problem at that.

The reason is that as I run into problems/tricks/tasks to do, I see
comments that might be added here and there in the documentation. I
would rather go to a single site and make the changes myself than 
go through the hassle of devising a 'diff' comment, finding out who
to mail, mailing them and waiting for a new doc release.

If there is interest, I can set it up straight away.

				j

From jrvalverde at cnb.uam.es  Thu Apr 21 06:33:21 2005
From: jrvalverde at cnb.uam.es (=?ISO-8859-15?Q?Jos=E9?= R. Valverde)
Date: Thu, 21 Apr 2005 12:33:21 +0200
Subject: CUTG
Message-ID: <20050421123321.5574df12.jrvalverde@cnb.uam.es>

I just saw there are new improvements in cutgextract... Great!

However, if I may make a suggestion, it would be nice if it were to
save the codon tables in a hierarchical arrangement.

I just converted CUTG... 25k files in all. Amazing! Useful! All that
deserves a great Yes! but it has a serious problem: users of the command
line may try an 'ls Emyorganism*' and find their table, but users of
GUIs will have a tough time to navigate through a pull-down menu with
25 thousand options !

Certainly, the GUI might take (partially) care of that by grouping
tables through the pre-underscore part (organism name), but still
too many would result.

So, perhaps it would be better if CUTG were stored in $EMBOSS_DATA/CUTG,
with each section under its own directory, and tables in each section
arranged by e.g organism or first/two-first letter(s).
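
To show what I mean, here is a small Python sketch of how a table's location
could be computed. Everything in it is an assumption for illustration: the
function name, the section name "pri", the file name, and the idea that the
leading "E" of the file name is skipped when picking the letter(s).

```python
from pathlib import PurePosixPath


def cutg_path(section, filename, letters=1):
    """Return a path relative to $EMBOSS_DATA for one codon usage table.

    Hypothetical layout: CUTG/<section>/<first letter(s)>/<filename>.
    """
    # Assumption: names start with "E" (as in 'ls Emyorganism*'), so the
    # organism name starts at the second character.
    stem = filename[1:] if filename.startswith("E") else filename
    bucket = stem[:letters].lower()
    return PurePosixPath("CUTG", section, bucket, filename)


print(cutg_path("pri", "Ehomo_sapiens.cut"))
print(cutg_path("pri", "Ehomo_sapiens.cut", letters=2))
```

With two letters instead of one, each directory holds fewer tables, at the
cost of one more level of choice in a GUI.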

This may become an interesting question for the emboss users mailing
list..

				j

From pmr at ebi.ac.uk  Thu Apr 21 06:38:30 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Apr 2005 11:38:30 +0100
Subject: CUTG
In-Reply-To: <20050421123321.5574df12.jrvalverde@cnb.uam.es>
References: <20050421123321.5574df12.jrvalverde@cnb.uam.es>
Message-ID: <426782A6.5020900@ebi.ac.uk>

José R. Valverde wrote:

> I just saw there are new improvements in cutgextract... Great!
> 
> However, if I may make a suggestion, it would be nice if it were to
> save the codon tables in a hierarchical arrangement.
> 
> I just converted CUTG... 25k files in all. Amazing! Useful! All that
> deserves a great Yes! but it has a serious problem: users of the command
> line may try an 'ls Emyorganism*' and find their table, but users of
> GUIs will have a tough time to navigate through a pull-down menu with
> 25 thousand options !

The plan I have is a little different ...

... to allow a CUTG entry to be retrieved from SRS (haha - has everyone seen 
the news from LION?) or from the CUTG server through some non-sequence access 
method that can return the text of an entry from CUTG, PROSITE, and other 
databases.

But at least CUTGEXTRACT can now extract a single species for you so there is 
no need to extract all 25,000 entries.

Hope this helps

Peter


From pmr at ebi.ac.uk  Thu Apr 21 12:20:24 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Apr 2005 17:20:24 +0100
Subject: [EMBOSS] Wiki
In-Reply-To: <20050421115851.49380dc9.jrvalverde@cnb.uam.es>
References: <20050421115851.49380dc9.jrvalverde@cnb.uam.es>
Message-ID: <4267D2C8.10009@ebi.ac.uk>

José R. Valverde wrote:

> I would rather welcome a Wiki for EMBOSS documentation.

We have all the documentation (including the sourceforge web pages) in CVS. 
Any member of the development/documentation team can make updates there.

No need for a wiki for this - and a wiki would be difficult to manage as most 
of the documentation is generated automatically.

> The reason is that as I run into problems/tricks/tasks to do, I see
> comments that might be added here and there in the documentation. I
> would rather go to a single site and make the changes myself than 
> go through the hassle of devising a 'diff' comment, finding out who
> to mail, mailing them and waiting for a new doc release.


Just mail anything like that to emboss-bug.

After all ... there is not much point in changing a wiki version of the 
documentation if we are busy changing the application and the real 
documentation :-)

regards,

Peter


From jrvalverde at cnb.uam.es  Fri Apr 22 04:11:18 2005
From: jrvalverde at cnb.uam.es (=?ISO-8859-15?Q?Jos=E9?= R. Valverde)
Date: Fri, 22 Apr 2005 10:11:18 +0200
Subject: [EMBOSS] Wiki (and Macs)
In-Reply-To: <4267D2C8.10009@ebi.ac.uk>
References: <20050421115851.49380dc9.jrvalverde@cnb.uam.es>
	<4267D2C8.10009@ebi.ac.uk>
Message-ID: <20050422101118.33b19892.jrvalverde@cnb.uam.es>

On Thu, 21 Apr 2005 17:20:24 +0100
Peter Rice <pmr at ebi.ac.uk> wrote:
> 
> After all ... there is not much point in changing a wiki version of the 
> documentation if we are busy changing the application and the real 
> documentation :-)
> 
> regards,
> 
> Peter

Right you are Sir. I guess it's better as it is for now. And yet...

Speaking generally, it probably boils down to the management model we
want for EMBOSS. As it is now I tend to see it more like a Cathedral
than a Bazaar. Truly it isn't, but you must agree it is not so evident
from the docs what the procedures are for participation. At least not
at first sight.

I'm more for the Bazaar model, one where everyone is welcome and 
making changes is as trivial as possible (especially for end-users
and end-user-related material, like docs). I'd rather have that as
a 'common' to build a user community around. Game theory shows that
to be the best strategy in the long run (see e.g. 
http://encyclopedia.laborlawtalk.com/Tragedy_of_the_commons ).

In the short run, with resources as limited as the EMBOSS team's currently
are, you are right that it would take a significant effort and portion of
the existing resources. It makes more sense to concentrate on the short
term now and on surviving long enough to bring new resources in.

But I think we should have that in sight for the long term.

				j

From jrvalverde at cnb.uam.es  Fri Apr 22 04:20:58 2005
From: jrvalverde at cnb.uam.es (=?ISO-8859-15?Q?Jos=E9?= R. Valverde)
Date: Fri, 22 Apr 2005 10:20:58 +0200
Subject: Macintosh EMBOSS
In-Reply-To: <4267D2C8.10009@ebi.ac.uk>
References: <20050421115851.49380dc9.jrvalverde@cnb.uam.es>
	<4267D2C8.10009@ebi.ac.uk>
Message-ID: <20050422102058.2ca36edb.jrvalverde@cnb.uam.es>

I'm trying to find out ways to fund EMBOSS in a way that I can
justify locally.

Mac users are a growing 'market' and a promising community. I've got 
here hundreds of Macs, and they need an easy to use, install and
manage solution.

What is needed (they tell me) is a good editor, and some interactive
graphic facilities for common, simple tasks. Actually, locally, we are
going to spend a significant amount on buying a handful of licenses
for commercial software.

I've tried Erik's CD, but it has some drawbacks regarding the configuration
on non-user-managed Macs (as those where root belongs to a central
authority): Here they can install software but not make modifications.
I can't either, being on the SciComp side and not on the Offimatic
end.

I don't have the resources to do that locally, but would welcome a
sensible way to fund it (like buying 'licenses', packages, CDs or
manuals from an EMBOSS-centered company).

I for one would certainly welcome a Macintosh edition ready to run,
and easy to configure to use central databases. If I were to choose,
I'd try to add those facilities to Jemboss (a sequence editor, and
interactive drawing of clones and molecular graphics). This is the
most lacking thing in EMBOSS now that every user has or can have a 
UNIX machine at their desktop.

And, certainly, I would happily recommend locally that we buy a 
hundred+ licenses at a reasonable price if that would help
fund EMBOSS.

Most ideally, something like the LiveDVD from AT.EMBnet.Org but for
Macs would be a candy. And an easy to justify buy.

Any recommendations? Takers? Pointers?

				j

From pmr at ebi.ac.uk  Thu Apr  7 16:44:14 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 07 Apr 2005 17:44:14 +0100
Subject: Genetic codes and other repeated ACD lists
Message-ID: <4255635E.8030609@ebi.ac.uk>

I have found a way to save writing and maintaining lists like these in ACD files:

   list: table  [
     additional: "Y"
     default: "0"
     minimum: "1"
     maximum: "1"
     header: "Genetic codes"
     values: "0:Standard; 1:Standard (with alternative initiation
              codons); 2:Vertebrate Mitochondrial; 3:Yeast Mitochondrial;
              4:Mold, Protozoan, Coelenterate Mitochondrial and
              Mycoplasma/Spiroplasma; 5:Invertebrate Mitochondrial; 6:Ciliate
              Macronuclear and Dasycladacean; 9:Echinoderm Mitochondrial;
              10:Euplotid Nuclear; 11:Bacterial; 12:Alternative Yeast Nuclear;
              13:Ascidian Mitochondrial; 14:Flatworm Mitochondrial;
              15:Blepharisma Macronuclear; 16:Chlorophycean Mitochondrial;
              21:Trematode Mitochondrial; 22:Scenedesmus obliquus;
              23:Thraustochytrium Mitochondrial"
     delimiter: ";"
     codedelimiter: ":"
     information: "Code to use"
     knowntype: "genetic code"
   ]


Using the "knowntype" attribute it is possible to delete the value attribute, 
and to define a standard list using a "resource" definition in the 
emboss.default (or .embossrc) file like this:

RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ]

(for just 2 genetic codes)

or

RESOURCE genetic_code [ type: "list" value: "@EGC.index" ]

(for a list of all the genetic codes - this will read a datafile EGC.index 
which is new in CVS).
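
As a rough illustration of how an inline resource value such as
"0:Standard;11:Bacterial" might be split into codes and descriptions, using
the same ";" and ":" delimiters as the ACD list above, here is a minimal
sketch. It is Python for illustration only; parse_list_value is a
hypothetical name and not an EMBOSS function.

```python
def parse_list_value(value, delimiter=";", codedelimiter=":"):
    """Split an ACD-style list value into (code, description) pairs.

    Hypothetical helper: mirrors the "delimiter" and "codedelimiter"
    attributes of the ACD list, but is not part of EMBOSS itself.
    """
    pairs = []
    for item in value.split(delimiter):
        # Collapse line wraps and padding inside one item.
        item = " ".join(item.split())
        if not item:
            continue
        code, _, description = item.partition(codedelimiter)
        pairs.append((code.strip(), description.strip()))
    return pairs


if __name__ == "__main__":
    for code, description in parse_list_value("0:Standard;11:Bacterial"):
        print(code, description)
```

Long values that wrap over several lines, as in the ACD file, are handled
by collapsing the whitespace inside each item.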

Other resource definitions could be commands to execute.

I have not yet decided whether to allow a value of "@EGC.index" in the ACD 
file itself. It could be a nice short cut, but I like using a "knowntype" to 
control the results.

There are some problems to solve:

1. the resource is tested in too many places - it should replace the "value" 
attribute when it is first used. Not hard to do.

2. there should be a clean way to define a default value for each knowntype - 
for example calling an ajTrn function to resolve the "genetic code" knowntype 
to a value. Functions can be defined for list knowntypes in ajacd.c

3. anyone parsing the ACD file will wonder where the value has gone - perhaps 
acdpretty can be made to fill in missing values with an environment variable 
set. Would that be acceptable to those who need it?

Future uses for this:

1. standard list of genetic codes with descriptions

2. standard reading frame names

3. list of known codon usage files, matrices, etc. by specifying "?" as the value

4. a list of blast databases for a blastall wrapper :-)

5. replacing "string" qualifiers which have a knowntype with a selection that 
can display and test the list of acceptable values in ACD, to avoid a run-time 
failure

Comments please ....

Peter


From jison at hgmp.mrc.ac.uk  Fri Apr  8 10:34:51 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Fri, 08 Apr 2005 11:34:51 +0100
Subject: Genetic codes and other repeated ACD lists
References: <4255635E.8030609@ebi.ac.uk>
Message-ID: <42565E4B.1232945@hgmp.mrc.ac.uk>

Hi Peter

Comments below.

Cheers

Jon


Peter Rice wrote:
> 
> I have found a way to save writing and maintaining lists like these in ACD files:
> 
>    list: table  [
>      additional: "Y"
>      default: "0"
>      minimum: "1"
>      maximum: "1"
>      header: "Genetic codes"
>      values: "0:Standard; 1:Standard (with alternative initiation
>               codons); 2:Vertebrate Mitochondrial; 3:Yeast Mitochondrial;
>               4:Mold, Protozoan, Coelenterate Mitochondrial and
>               Mycoplasma/Spiroplasma; 5:Invertebrate Mitochondrial; 6:Ciliate
>               Macronuclear and Dasycladacean; 9:Echinoderm Mitochondrial;
>               10:Euplotid Nuclear; 11:Bacterial; 12:Alternative Yeast Nuclear;
>               13:Ascidian Mitochondrial; 14:Flatworm Mitochondrial;
>               15:Blepharisma Macronuclear; 16:Chlorophycean Mitochondrial;
>               21:Trematode Mitochondrial; 22:Scenedesmus obliquus;
>               23:Thraustochytrium Mitochondrial"
>      delimiter: ";"
>      codedelimiter: ":"
>      information: "Code to use"
>      knowntype: "genetic code"
>    ]
> 
> Using the "knowntype" attribute it is possible to delete the value attribute,
> and to define a standard list using a "resource" definition in the
> emboss.default (or .embossrc) file like this:
> 
> RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ]
> 
> (for just 2 genetic codes)
> 
> or
> 
> RESOURCE genetic_code [ type: "list" value: "@EGC.index" ]
> 
> (for a list of all the genetic codes - this will read a datafile EGC.index
> which is new in CVS).
> 
> Other resource definitions could be commands to execute.

It'd be cleaner, more flexible and easier to maintain. If it is not a 
requirement now, it will probably be an increasing one in the future.  I've two progs 
that would benefit from it now.

 
> I have not yet decided whether to allow a value of "@EGC.index" in the ACD
> file itself. It could be a nice short cut, but I like using a "knowntype" to
> control the results.

Could be confusing to allow that in the ACD file because the punter might 
think EGC existed, e.g. as a data item, in the file itself and get confused
when they can't find it.

 
> There are some problems to solve:
> 
> 1. the resource is tested in too many places - it should replace the "value"
> attribute when it is first used. Not hard to do.
> 
> 2. there should be a clean way to define a default value for each knowntype -
> for example calling an ajTrn function to resolve the "genetic code" knowntype
> to a value. Functions can be defined for list knowntypes in ajacd.c

Couldn't the default be specified in the same place / file as the values themselves?
Presumably the default value would be needed before run-time proper and could
be retrieved at the same time as the values are.

> 
> 3. anyone parsing the ACD file will wonder where the value has gone - perhaps
> acdpretty can be made to fill in missing values with an environment variable
> set. Would that be acceptable to those who need it?


I think it would be nice to support both "standard" lists (ie. ones *with* "values" 
attribute) and the new style.  Perhaps something like:

      values: "@knowntype"  

to indicate to use the knowntype to get the values, *or*

      values: "0: Standard ... etc" as before.

Then the values attribute would always be there, with the ACD developer having 
the option to specify a standard list of values or to get the values from the 
knowntype.


> Future uses for this:
> 
> 1. standard list of genetic codes with descriptions
> 
> 2. standard reading frame names
> 
> 3. list of known codon usage files, matrices, etc. by specifying "?" as the value
> 
> 4. a list of blast databases for a blastall wrapper :-)
> 
> 5. replacing "string" qualifiers which have a knowntype with a selection that
> can display and test the list of acceptable values in ACD, to avoid a run-time
> failure
> 
> Comments please ....
> 
> Peter

-- 
Jon C. Ison, PhD
Proteomics Applications Group
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494500  Fax: +44 1223 494512
E-mail: jison at rfcgr.mrc.ac.uk  Web: http://www.rfcgr.mrc.ac.uk


From pmr at ebi.ac.uk  Fri Apr  8 10:55:02 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 11:55:02 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <42565E4B.1232945@hgmp.mrc.ac.uk>
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk>
Message-ID: <42566306.3000908@ebi.ac.uk>

Dr J.C. Ison wrote:

> Peter Rice wrote:
>>I have found a way to save writing and maintaining lists like these in ACD files:
>>
>>Using the "knowntype" attribute it is possible to delete the value attribute,
>>and to define a standard list using a "resource" definition in the
>>emboss.default (or .embossrc) file like this:
> 
> It'd be cleaner, more flexible and easier to maintain. If it is not a 
> requirement now, it will probably be an increasing one in the future.  I've two progs 
> that would benefit from it now.

Thanks. Domainatrix I assume. Which ones? I will take a look and see how they 
fit in.

> Couldn't the default be specified in the same place / file as the values themselves?
> Presumably the default value would be needed before run-time proper and could
> be retrieved at the same time as the values are.

Good point. I need to think some more about whether a knowntype should have a 
default. Genetic codes are a good example - we use a default of 0 but 
strictly genetic code numbers are 1 to 23 (0 is code 1 with only ATG as a start).

> I think it would be nice to support both "standard" lists (ie. ones *with* "values" 
> attribute) and the new style.  Perhaps something like:
> 
>       values: "@knowntype"  

A missing value will do this. Normally the value is required, so an ACD 
developer (or parser) will know it needs a value. (I hope :-)

> to indicate to use the knowntype to get the values, *or*
> 
>       values: "0: Standard ... etc" as before.

Yes, that will override the knowntype. Maybe acdvalid can warn if a list or 
select has a knowntype (with its own standard value) and a defined value attribute.
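
The precedence described here (an explicit value overrides the knowntype,
a missing value falls back to the standard list, and a validator may warn
when both are present) could be sketched like this. The function name, the
dictionary standing in for RESOURCE definitions, and the mapping of the
knowntype "genetic code" to the resource name "genetic_code" are
assumptions for illustration, not the ajacd.c behaviour.

```python
import warnings


def resolve_values(acd_value, knowntype, resources):
    """Return the list value to use for an ACD list or select.

    acd_value  - the value written in the ACD file ("" if missing)
    knowntype  - the knowntype attribute, e.g. "genetic code"
    resources  - dict of resource name -> standard list value
                 (a stand-in for RESOURCE definitions; hypothetical)
    """
    # Assumed naming rule: spaces in the knowntype become underscores.
    standard = resources.get(knowntype.replace(" ", "_"))
    if acd_value:
        if standard is not None:
            # What an acdvalid-style check might report.
            warnings.warn(
                f'knowntype "{knowntype}" has a standard value '
                "but the ACD file also defines its own value"
            )
        return acd_value  # an explicit value overrides the knowntype
    if standard is None:
        raise ValueError(f'no value and no resource for "{knowntype}"')
    return standard


if __name__ == "__main__":
    resources = {"genetic_code": "0:Standard;11:Bacterial"}
    print(resolve_values("", "genetic code", resources))
```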

More comments please!

Peter


From jison at hgmp.mrc.ac.uk  Fri Apr  8 11:46:43 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Fri, 08 Apr 2005 12:46:43 +0100
Subject: Genetic codes and other repeated ACD lists
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> <42566306.3000908@ebi.ac.uk>
Message-ID: <42566F23.F80F2CFD@hgmp.mrc.ac.uk>

Peter Rice wrote:

> Thanks. Domainatrix I assume. Which ones? I will take a look and see how they
> fit in.

The newly committed matgen3d and siggenlig, which both take an "environment 
definition" (amino acid 3D environment) from a list.

At the moment, the environment names are "Env1", "Env2" etc but would get more 
meaningful names once the definitions themselves are more settled (pending 
further research on which ones are most useful).

The progs. then use the selection (Env1 or Env2 etc) to call appropriate functions
within the application code.

> > I think it would be nice to support both "standard" lists (ie. ones *with* "values"
> > attribute) and the new style.  Perhaps something like:
> >
> >       values: "@knowntype"
> 
> A missing value will do this. Normally the value is required, so an ACD
> developer (or parser) will know it needs a value. (I hope :-)

They could simply acdprettyify the files as described in your prev. email 
before parsing, so they wouldn't need to do any new coding.


> 
> > to indicate to use the knowntype to get the values, *or*
> >
> >       values: "0: Standard ... etc" as before.
> 
> Yes, that will override the knowntype. Maybe acdvalid can warn if a list or
> select has a knowntype (with its own standard value) and a defined value attribute.

The override / warning are intuitive / sensible.


From pmr at ebi.ac.uk  Fri Apr  8 13:18:42 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 14:18:42 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <42566F23.F80F2CFD@hgmp.mrc.ac.uk>
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> <42566306.3000908@ebi.ac.uk> <42566F23.F80F2CFD@hgmp.mrc.ac.uk>
Message-ID: <425684B2.1000709@ebi.ac.uk>

Dr J.C. Ison wrote:

> Peter Rice wrote:
>>Thanks. Domainatrix I assume. Which ones? I will take a look and see how they
>>fit in.
> 
> The newly committed matgen3d and siggenlig, which both take an "environment 
> definition" (amino acid 3D environment) from a list.
> 
> At the moment, the environment names are "Env1", "Env2" etc but would get more 
> meaningful names once the definitions themselves are more settled (pending 
> further research on which ones are most useful).

The genetic code format is very simple - the name, a space and the value with 
leading spaces and #commented lines ignored (this is the EGC.index file for an 
"@EGC.index" resource value)

  0 Standard with AUG start only
  1 Standard
  2 Vertebrate mitochondrial
  3 Yeast mitochondrial
  4 Mold, Protozoan, and Coelenterate Mitochondrial and Mycoplasma/Spiroplasma
  5 Invertebrate Mitochondrial
  6 Ciliate, Dasycladacean and Hexamita Nuclear
# 7 *Kinetoplast code now merged in code id 4
# 8 *Plant chloroplast all differences due to RNA edit use code id 1
  9 Echinoderm and Flatworm Mitochondrial
10 Euplotid Nuclear
11 Bacterial and Plant Plastid
12 Alternative Yeast Nuclear
13 Ascidian Mitochondrial
14 Alternative Flatworm Mitochondrial
15 Blepharisma Nuclear
16 Chlorophycean Mitochondrial
#17 Never defined
#18 Never defined
#19 Never defined
#20 Never defined
21 Trematode Mitochondrial
22 Scenedesmus obliquus Mitochondrial
23 Thraustochytrium Mitochondrial
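
To make the format concrete, a minimal sketch of a reader for such an index
file follows: leading spaces are stripped, lines starting with "#" are
skipped, and the first space separates the name from the rest of the line.
This is Python for illustration only; parse_index and to_list_value are
hypothetical names, and the real reader in the EMBOSS library may differ.

```python
def parse_index(text):
    """Parse index-file text into a list of (name, description) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()  # leading (and trailing) spaces are ignored
        if not line or line.startswith("#"):
            continue  # blank and #commented lines are ignored
        name, _, description = line.partition(" ")
        entries.append((name, description.strip()))
    return entries


def to_list_value(entries, delimiter=";", codedelimiter=":"):
    """Join entries into a "code:description;..." list value."""
    return delimiter.join(f"{n}{codedelimiter}{d}" for n, d in entries)


if __name__ == "__main__":
    sample = """\
  0 Standard with AUG start only
  1 Standard
# 7 *Kinetoplast code now merged in code id 4
 10 Euplotid Nuclear
#17 Never defined
"""
    print(to_list_value(parse_index(sample)))
```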


> They could simply acdprettyify the files as described in your prev. email 
> before parsing, so they wouldn't need to do any new coding.

But maybe not those who use the ACD file at run time :-)

regards,

Peter


From pmr at ebi.ac.uk  Fri Apr  8 13:26:59 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 14:26:59 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <Pine.OSF.4.58.0504081324290.449852@sidious.internal.sanger.ac.uk>
References: <4255635E.8030609@ebi.ac.uk> <Pine.OSF.4.58.0504081324290.449852@sidious.internal.sanger.ac.uk>
Message-ID: <425686A3.2080808@ebi.ac.uk>

Tim Carver wrote:

>Peter Rice wrote: 
>>3. anyone parsing the ACD file will wonder where the value has gone - perhaps
>>acdpretty can be made to fill in missing values with an environment variable
>>set. Would that be acceptable to those who need it?
> 
> I guess so. If we just loop over the ACD's after installation and get
> 'acdpretty' to convert them that shouldn't be too bad I would have
> thought... it would only need to be done once.

For list: and selection: the acdpretty output would look normal (the value: "" 
attribute can be filled in with the knowntype value).

For matrix: and matrixf: we can leave everything unchanged (add nothing to the 
ACD file in acdpretty), or we can offer a list of known matrix filenames using 
some new attribute name. This is a little tricky ... for the alignment 
programs, there will be separate lists for nucleotide (only EDNAFULL and 
EDNAMAT) and protein (EBLOSUM* and EPAM*) with the allowed values depending on 
the type of the input sequences. Of course, as matrix input the user can 
choose any other available matrix file if the interface allows.
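
A small sketch of how the allowed matrix names could depend on the input
sequence type, using only the name patterns mentioned above. It is Python
for illustration; MATRIX_PATTERNS and matrices_for are hypothetical names,
and how the list of available files is obtained is left open.

```python
from fnmatch import fnmatchcase

# Name patterns per sequence type, taken from the description above.
MATRIX_PATTERNS = {
    "nucleotide": ["EDNAFULL", "EDNAMAT"],
    "protein": ["EBLOSUM*", "EPAM*"],
}


def matrices_for(sequence_type, available):
    """Return the available matrix names allowed for a sequence type."""
    patterns = MATRIX_PATTERNS[sequence_type]
    return [
        name
        for name in available
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    files = ["EBLOSUM62", "EDNAFULL", "EPAM250", "EDNAMAT"]
    print(matrices_for("protein", files))
    print(matrices_for("nucleotide", files))
```

A program that reads both nucleotide and protein input would need the union
of both lists.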

Any preference (or any special requests to help JEMBOSS)?

regards,

Peter


From pmr at ebi.ac.uk  Fri Apr  8 13:30:34 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 08 Apr 2005 14:30:34 +0100
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <425685BA.A9658C5F@hgmp.mrc.ac.uk>
References: <4255635E.8030609@ebi.ac.uk> <42565E4B.1232945@hgmp.mrc.ac.uk> <42566306.3000908@ebi.ac.uk> <42566F23.F80F2CFD@hgmp.mrc.ac.uk> <425684B2.1000709@ebi.ac.uk> <425685BA.A9658C5F@hgmp.mrc.ac.uk>
Message-ID: <4256877A.4090609@ebi.ac.uk>

Dr J.C. Ison wrote:

> That format would be ideal.  "Env1", "Env2" etc could be replaced by "1", "2" etc
> then text could be added giving a meaningful description of the environment.

The name would be whatever the program accepts for a list (for selection it is 
the value, but list is generally preferred in ACD files). I know domainatrix 
often uses "1", "2", etc. but they are not always the best choices.

A thought - perhaps the file could have a default marked with * before the 
name, or default to the first in the list?
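
One way this idea could work, sketched in Python for illustration: a line
whose name is prefixed with "*" marks the default, and if no line is
marked the first entry is used. The function name is hypothetical and the
"*" convention is only the suggestion above, not an implemented format.

```python
def parse_index_with_default(text):
    """Parse index text; return (entries, default_name).

    A "*" directly before a name marks the default entry. Without any
    marked entry, the first entry in the list is the default.
    """
    entries = []
    default = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments may contain "*" without any effect
        marked = line.startswith("*")
        if marked:
            line = line[1:].lstrip()
        name, _, description = line.partition(" ")
        entries.append((name, description.strip()))
        if marked and default is None:
            default = name  # the first marked entry wins
    if default is None and entries:
        default = entries[0][0]
    return entries, default


if __name__ == "__main__":
    print(parse_index_with_default("0 Standard with AUG start only\n*1 Standard\n"))
```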

regards,

Peter


From gbottu at ben.vub.ac.be  Wed Apr 13 13:22:12 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 13 Apr 2005 15:22:12 +0200
Subject: Genetic codes and other repeated ACD
Message-ID: <20050413132212.GA15521@bigben.ulb.ac.be>

	Dear Peter, dear all,

Allow me to add something to the recent discussion about genetic codes. I 
talked about it with Marc Colet, developer of wEMBOSS, and he considers, 
for the sake of GUI maintenance, that it is better to avoid making the ACD 
syntax too complicated and certainly to avoid changing it too often.
A few ideas:

- Currently emboss.defaults does not contain items that are absolutely 
needed. We think it is better not to change that philosophy by putting 
e.g. the genetic codes in it. It could however be an idea to put in 
emboss.defaults a list of databanks in BLAST format, for the sake of BLAST 
wrappers.

- For items like reading frames and maybe genetic codes, that appear over 
and over again in several ACD files, yet are not user or installation 
customizable, the best proposal among those made in this discussion list 
seems to me to be to define them in one central file, for the purpose of 
software development, but to "acdpretty" them into the ACD files before 
they are distributed, for the sake of GUI functioning.

- There is the case of items where users can choose to use their own data 
instead of the EMBOSS distribution data, like symbol comparison matrices 
and codon usage tables (would genetic codes fall into this category?). 
Until now a new ACD object type was defined each time, like matrix 
and cfile. Is shifting to the use of "knowntype" a good idea? I do not 
know, but let's stay consistent.

- There is the issue of the program embossdata, useful for the advanced 
user and a possible tool for displaying choice lists in GUIs. Currently, 
when we run it at the BEN site with just the parameter -showall it produces a 
monstrously long list, because all the databanks (including CUTG) have been 
downloaded and "extracted". Maybe let it display by default only the data 
files in the main data directory? Note that e.g. the list of PRINTS files 
is anyway not very interesting, since you cannot do anything with them as 
such. Could it be modified so that you can easily get a list of the 
alternative data files used by a particular program (or could a library 
routine called by the program itself do that)?
 
	Sincerely,
	Guy Bottu,
	BEN


From pmr at ebi.ac.uk  Wed Apr 13 14:30:05 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 13 Apr 2005 15:30:05 +0100
Subject: Genetic codes and other repeated ACD
In-Reply-To: <20050413132212.GA15521@bigben.ulb.ac.be>
References: <20050413132212.GA15521@bigben.ulb.ac.be>
Message-ID: <425D2CED.60503@ebi.ac.uk>

Guy Bottu wrote:

> - Currently emboss.defaults does not contain items that are absolutely 
> needed. We think it is better not to change that philosophy by putting 
> e.g. the genetic codes in it. It could however be an idea to put in 
> emboss.defaults a list of databanks in BLAST format, for the sake of BLAST 
> wrappers.

They will not be absolutely needed. There will be a default - a list of 
values, a file with a list of values, or a script that finds everything.

> - For items like reading frames and maybe genetic codes, that appear over 
> and over again in several ACD files, yet are not user or installation 
> customizable, the best proposal among those made in this discussion list 
> seems to me to be to define them in one central file, for the purpose of 
> software development, but to "acdpretty" them into the ACD files before 
> they are distributed, for the sake of GUI functioning.

This will be the default ... but the distributed files will *not* have the 
values filled in (if we fill the values in, the automatic list will not work 
when users add new options :-).

You will need to run acdpretty yourself. That way, if you add extra options 
locally you will get them in the acdpretty file. There is nothing to stop you 
copying that file on top of the original acd file.

> - There is the case of items where users can choose to use their own data 
> instead of the EMBOSS distribution data, like symbol comparison matrices 
> and codon usage tables (would genetic codes fall into this category?). 
> Until now a new ACD object type was defined each time, like matrix 
> and cfile. Is shifting to the use of "knowntype" a good idea? I do not 
> know, but let's stay consistent.

The same will happen for these ... but matrix files are complicated. For 
programs that read nucleotide and protein, the list will have to include all 
matrix files.

> - There is the issue of the program embossdata, useful for the advanced 
> user and a possible tool for displaying choice lists in GUIs. Currently, 
> when we run it at the BEN site with just the parameter -showall it produces a 
> monstrously long list, because all the databanks (including CUTG) have been 
> downloaded and "extracted". Maybe let it display by default only the data 
> files in the main data directory? Note that e.g. the list of PRINTS files 
> is anyway not very interesting, since you cannot do anything with them as 
> such. Could it be modified so that you can easily get a list of the 
> alternative data files used by a particular program (or could a library 
> routine called by the program itself do that)?

I have modified embossdata to prompt always for a filename (default of no file 
still lists all files).

Options to select the other directories are interesting because (1) you get 
less output and (2) we will have a new internal default for the list of 
directories used by embossdata!

Hope that makes things clearer, and thanks for the comments.

Peter


From senger at ebi.ac.uk  Tue Apr 19 10:11:46 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Tue, 19 Apr 2005 11:11:46 +0100 (BST)
Subject: Genetic codes and other repeated ACD lists
In-Reply-To: <4255635E.8030609@ebi.ac.uk>
Message-ID: <Pine.LNX.4.44.0504191101190.15035-100000@bagheera.ebi.ac.uk>

> RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ]
>
   I am not knowledgeable enough about EMBOSS (in particular, I know almost 
nothing about EGC.index etc.) to be helpful here, but please allow 
me to ask a question:
   If I understand it correctly, you are actually talking about replacing
often-repeated pieces of ACD files by a reference to a common (shared)
place where the piece is stored just once. But that seems to be exactly the
scenario that all kinds of 'include' directives address. So what about
adding a general syntax for inclusion to ACD? Then you can
replace not only genetic codes but any other repeating piece any time you
wish. And it will be transparent for the ACD parsers (they just need to
know where to look for the included files).

   Just my 2cents,
   Martin

-- 
Martin Senger

EMBL Outstation - Hinxton                Senger at EBI.ac.uk     
European Bioinformatics Institute        Phone: (+44) 1223 494636      
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger


From jrvalverde at cnb.uam.es  Thu Apr 21 09:58:51 2005
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Thu, 21 Apr 2005 11:58:51 +0200
Subject: Wiki
Message-ID: <20050421115851.49380dc9.jrvalverde@cnb.uam.es>

I would rather welcome a Wiki for EMBOSS documentation.

I can host it at Es.EMBnet.Org/es.emboss.org, no problem at that.

The reason is that as I run into problems/tricks/tasks to do, I see
comments that might be added here and there in the documentation. I
would rather go to a single site and make the changes myself than 
go through the hassle of devising a 'diff' comment, finding out who
to mail, mailing them and waiting for a new doc release.

If there is interest, I can set it up straight away.

				j

From jrvalverde at cnb.uam.es  Thu Apr 21 10:33:21 2005
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Thu, 21 Apr 2005 12:33:21 +0200
Subject: CUTG
Message-ID: <20050421123321.5574df12.jrvalverde@cnb.uam.es>

I just saw there are new improvements in cutgextract... Great!

However, if I may make a suggestion, it would be nice if it were to
save the codon tables in a hierarchical arrangement.

I just converted CUTG... 25k files in all. Amazing! Useful! All that
deserves a great Yes! but it has a serious problem: users of the command
line may try an 'ls Emyorganism*' and find their table, but users of
GUIs will have a tough time navigating through a pull-down menu with
25 thousand options!

Certainly, the GUI might take (partially) care of that by grouping
tables through the pre-underscore part (organism name), but still
too many would result.

So, perhaps it would be better if CUTG were stored in $EMBOSS_DATA/CUTG,
with each section under its own directory, and tables in each section
arranged by e.g. organism or first letter (or first two letters).
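
As a sketch of what such a layout might look like, the following computes a
relative path for one codon table. It is Python for illustration only: the
function name, the section name "pri", the file name "Ehomo_sapiens.cut"
and the rule of dropping a leading "E" before taking the first letter(s)
are all assumptions, not what cutgextract does.

```python
from pathlib import PurePosixPath


def cutg_path(section, filename, letters=1):
    """Return a hierarchical relative path for one codon usage table.

    section  - name of the CUTG section (hypothetical example: "pri")
    filename - table file name, assumed to start with an "E" prefix
    letters  - how many leading letters of the organism name to group by
    """
    organism = filename[1:] if filename.startswith("E") else filename
    bucket = organism[:letters].lower()
    return PurePosixPath("CUTG") / section / bucket / filename


if __name__ == "__main__":
    print(cutg_path("pri", "Ehomo_sapiens.cut"))
    print(cutg_path("pri", "Ehomo_sapiens.cut", letters=2))
```

A GUI could then offer the section and letter as two short menus before
listing the tables themselves.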

This may become an interesting question for the emboss users mailing
list..

				j

From pmr at ebi.ac.uk  Thu Apr 21 10:38:30 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Apr 2005 11:38:30 +0100
Subject: CUTG
In-Reply-To: <20050421123321.5574df12.jrvalverde@cnb.uam.es>
References: <20050421123321.5574df12.jrvalverde@cnb.uam.es>
Message-ID: <426782A6.5020900@ebi.ac.uk>

José R. Valverde wrote:

> I just saw there are new improvements in cutgextract... Great!
> 
> However, if I may make a suggestion, it would be nice if it were to
> save the codon tables in a hierarchical arrangement.
> 
> I just converted CUTG... 25k files in all. Amazing! Useful! All that
> deserves a great Yes! but it has a serious problem: users of the command
> line may try an 'ls Emyorganism*' and find their table, but users of
> GUIs will have a tough time navigating through a pull-down menu with
> 25 thousand options!

The plan I have is a little different ...

... to allow a CUTG entry to be retrieved from SRS (haha - has everyone seen 
the news from LION?) or from the CUTG server through some non-sequence access 
method that can return the text of an entry from CUTG, PROSITE, and other 
databases.

But at least CUTGEXTRACT can now extract a single species for you so there is 
no need to extract all 25,000 entries.

Hope this helps

Peter


From pmr at ebi.ac.uk  Thu Apr 21 16:20:24 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Apr 2005 17:20:24 +0100
Subject: [EMBOSS] Wiki
In-Reply-To: <20050421115851.49380dc9.jrvalverde@cnb.uam.es>
References: <20050421115851.49380dc9.jrvalverde@cnb.uam.es>
Message-ID: <4267D2C8.10009@ebi.ac.uk>

José R. Valverde wrote:

> I would rather welcome a Wiki for EMBOSS documentation.

We have all the documentation (including the sourceforge web pages) in CVS. 
Any member of the development/documentation team can make updates there.

No need for a wiki for this - and a wiki would be difficult to manage as most 
of the documentation is generated automatically.

> The reason is that as I run into problems/tricks/tasks to do, I see
> comments that might be added here and there in the documentation. I
> would rather go to a single site and make the changes myself than 
> go through the hassle of devising a 'diff' comment, finding out who
> to mail, mailing them and waiting for a new doc release.


Just mail anything like that to emboss-bug.

After all ... there is not much point in changing a wiki version of the 
documentation if we are busy changing the application and the real 
documentation :-)

regards,

Peter


From jrvalverde at cnb.uam.es  Fri Apr 22 08:11:18 2005
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Fri, 22 Apr 2005 10:11:18 +0200
Subject: [EMBOSS] Wiki (and Macs)
In-Reply-To: <4267D2C8.10009@ebi.ac.uk>
References: <20050421115851.49380dc9.jrvalverde@cnb.uam.es>
	<4267D2C8.10009@ebi.ac.uk>
Message-ID: <20050422101118.33b19892.jrvalverde@cnb.uam.es>

On Thu, 21 Apr 2005 17:20:24 +0100
Peter Rice <pmr at ebi.ac.uk> wrote:
> 
> After all ... there is not much point in changing a wiki version of the 
> documentation if we are busy changing the application and the real 
> documentation :-)
> 
> regards,
> 
> Peter

Right you are Sir. I guess it's better as it is for now. And yet...

Speaking generally, it probably boils down to the management model we
want for EMBOSS. As it is now I tend to see it more like a Cathedral
than a Bazaar. Truly it isn't, but you must agree it is not so evident
from the docs what the procedures are for participation. At least not
at first sight.

I'm more for the Bazaar model, one where everyone is welcome and 
making changes is as trivial as possible (especially for end-users
and end-user-related material, like docs). I'd rather have that as
a 'common' to build a user community around. Game theory shows that
to be the best strategy in the long run (see e.g. 
http://encyclopedia.laborlawtalk.com/Tragedy_of_the_commons ).

In the short run, with resources as limited as the EMBOSS team's currently
are, you are right that it takes a significant effort and portion of the
existing resources. It makes more sense to concentrate on the short
term now and on surviving long enough to bring new resources in.

But I think we should keep that in sight for the long term.

				j
