From gbottu at ben.vub.ac.be Mon Oct 7 06:03:10 2002
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 7 Oct 2002 12:03:10 +0200 (CEST)
Subject: which restriction sites are used ?
Message-ID: <200210071003.MAA1302581@black.vub.ac.be>

from : BEN

Dear developers,

I am using EMBOSS version 2.4.1. I noticed the following:

ApeKI is a restriction enzyme without a commercial provider. If I do "redata ApeKI" I get:

------------
ApeKI

Recognition site is GCWGC leaving sticky ends
Cut positions 5':0 3':0 [5':0 3':0]
Organism: Aeropyrum pernix K1
Methylated: ?(5)
Source: ATCC 700893
Isoschizomers: TseI AceI Taq52I TseBI
References:
Kawarabayasi, Y. et al., (1999) DNA Res., vol. 6, pp. 83-101.
Xu, S.-Y., Opitz, L., Unpublished observations.
----------------

So it is there, and its recognition site GCWGC (W = A or T) should match GCAGC. Yet when I run remap or restrict with the parameters -nocommercial -nolimit on a sequence containing GCAGC, the restriction site is not found. Explicitly asking for the enzyme ApeKI does not help either: it is not even listed as an "enzyme that does not cut". Is that normal?

Guy Bottu

From jrvalverde at cnb.uam.es Tue Oct 8 10:14:29 2002
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Tue, 8 Oct 2002 16:14:29 +0200
Subject: Suggestions
Message-ID: <20021008161429.49d4b558.jrvalverde@cnb.uam.es>

A couple of suggestions for the Jemboss user interface:

1) Why not integrate B/Biomer with Jemboss, so that structure files (PDB) may be loaded, visualised, built AND modelled by molecular dynamics/mechanics? B/Biomer already has all this, and is open source and free academic software.

http://www.scripps.edu/case/Biomer/index.html

I've been wanting to do this for almost two years now, but have not been able to find the time needed to start on it (sic). I would still like to, and perhaps with some minor help I could.
In principle all that would be needed is a connection similar to the one available for Cinema: a menu item opening a window where one can select/drag a PDB file, and perhaps an association of .pdb/.ent/.brk files with the program. It might be a good starting point for hooking in other MD software (but again, my adaptation of TINKER to EMBASSY is long overdue).

2) Simpler: a way to save user preferences locally, most prominently things like the "mail server" to be used by CINEMA, which is a pain and needs localisation almost everywhere.

j

From jrvalverde at cnb.uam.es Tue Oct 8 10:40:26 2002
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Tue, 8 Oct 2002 16:40:26 +0200
Subject: More Jemboss suggestions
Message-ID: <20021008164026.6d68be7c.jrvalverde@cnb.uam.es>

Another tool that would be interesting to integrate is TreeApp from Phylodendron. It would allow drawing the phylogenetic trees obtained by other programs.

LoopDloop might also be interesting for drawing RNA secondary structures, coupled with Zuker's RNAfold.

Phylodendron and LoopDloop are available at

http://iubio.bio.indiana.edu/soft/molbio/java/apps/

j

From gwilliam at hgmp.mrc.ac.uk Mon Oct 14 05:36:23 2002
From: gwilliam at hgmp.mrc.ac.uk (gwilliam at hgmp.mrc.ac.uk)
Date: Mon, 14 Oct 2002 10:36:23 +0100 (BST)
Subject: documentation
Message-ID: <200210140936.g9E9aNN07725@californium.hgmp.mrc.ac.uk>

Damian Counsell wrote:

> Whatever happens with the HTML stylesheets, they will probably be
> subordinated to whatever the HGMP's new house style looks like. The
> whole point of these stylesheets is that they just tailor the "look"
> of the page to whatever environment they are being viewed in; they
> don't change the content in any way.
>
> We can have as many different looks as we like. James is creating /
> has created new stylesheets for the HGMP's new Web pages, which we
> ought to pay lip service to at least when preparing stylesheets for
> our copy of the online documentation.
The HGMP house style has nothing whatsoever to do with the EMBOSS
documentation.

The EMBOSS project is not an HGMP project, the HGMP merely hosts some of
the EMBOSS web pages temporarily because it is convenient for us.

We can and will move the EMBOSS pages elsewhere if there is any
interference from the HGMP.

--
Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK

From cbeazley at hgmp.mrc.ac.uk Mon Oct 14 05:48:08 2002
From: cbeazley at hgmp.mrc.ac.uk (Claude Beazley)
Date: Mon, 14 Oct 2002 10:48:08 +0100
Subject: Reorganisation of EMBOSS
Message-ID: <200210141048.08268.cbeazley@hgmp.mrc.ac.uk>

The UNIX design philosophy (which has served us well for nearly 40 years) is KISS (Keep It Simple Stupid), i.e. create lots of small well designed tools, each of which serves a very specific function. These tools can then be linked together (through piping, redirection or simple file writing/reading) to perform very complex tasks. It keeps debugging sane and allows users/developers to utilise the tools in ways that the original developers may not have thought of. This gives the biologists the high degree of creativity which is important for research. OK, so they have to learn stuff, but that is nothing new for Biologists. If they really have problems developing new tools, then they can approach us or get their IT dept to develop in-house solutions.

Also, from a distributed programming perspective, monolithic code structures would be an absolute nightmare to deal with.

This modular approach also largely negates the problem of reinventing the wheel for each app. It also means that bench-biologists don't have the problem of massive code-bloat when they want to run a simple app. Many researchers (for example in third world countries) have old and underpowered PCs (people still use 486s would you believe).
Keeping EMBOSS modular allows them to still use these machines whereas the total integrated approach would force them to enter the hardware upgrade treadmill. This is a prohibitive path which would keep them from using EMBOSS. (One of the reasons that GNU/Linux is so successful is that it can run pretty much on anything, due to this approach.)

KISS keeps stuff running efficiently, cleanly and with sanity.

>On Monday 14 Oct 2002 10:15 am, Lisa Mullan wrote:
> Here is the eagerly awaited ppt file!
>
> I don't seem to have BeeJay's address, or anyone else from the EBI, but
> trust it will be shared.
>
> To continue the conversation we were having at the meeting regarding the
> splitting, or amalgamating of programs, I think we should look for a
> consensus on one thing or the other.
>
> For my part, I think it is rather silly that bench biologists (who are the
> main users of these tools) have to wade through hundreds of programs often
> with names that bear little or no relation to what they do (although we
> find them funny!)
>
> On the side of people that are developing their own software using EMBOSS
> applications, I feel that it would be possible to switch off the functions
> they don't need by using the option flags.
>
> I cannot see an argument for so many tiny programs, apart from the
> author's own convenience, which should perhaps be lower down the priority
> list?
>
> Lisa
>
> Lisa Mullan
> HGMP Resource Centre
> Hinxton,
> Cambridge, CB10 1SB
> Tel: 01223 494526
> Email: lmullan at hgmp.mrc.ac.uk

-------------------------------------------------------

From cbeazley at hgmp.mrc.ac.uk Mon Oct 14 06:09:30 2002
From: cbeazley at hgmp.mrc.ac.uk (Claude Beazley)
Date: Mon, 14 Oct 2002 11:09:30 +0100
Subject: Reorganisation of EMBOSS
In-Reply-To: <200210141048.08268.cbeazley@hgmp.mrc.ac.uk>
References: <200210141048.08268.cbeazley@hgmp.mrc.ac.uk>
Message-ID: <200210141109.30864.cbeazley@hgmp.mrc.ac.uk>

However, having said that, UNIX does not have 15 different "ls" commands (one ls for listing the files, another for listing the files with file sizes, a third one for sorting and listing the files, etc.); it has one "ls" command with switches for sorting, stats etc. So if that is the sort of integration that Lisa is thinking of, then it should be carefully considered.

claude

>On Monday 14 Oct 2002 10:48 am, Claude Beazley wrote:
> The UNIX design philosophy (which has served us well for nearly 40 years) is
> KISS (Keep It Simple Stupid), i.e. create lots of small well designed tools,
> each of which serves a very specific function. These tools can then be
> linked together (through piping, redirection or simple file
> writing/reading) to perform very complex tasks. It keeps debugging sane
> and allows
> users/developers to utilise the tools in ways that the original developers
> may not have thought of. This gives the biologists the high degree of
> creativity which is important for research. OK, so they have to learn
> stuff, but that is nothing new for Biologists. If they really have problems
> developing new tools, then they can approach us or get their IT dept to
> develop in-house solutions.
>
> Also, from a distributed programming perspective, monolithic code structures
> would be an absolute nightmare to deal with.
>
> This modular approach also largely negates the problem of reinventing the
> wheel for each app.
> It also means that bench-biologists don't have the
> problem of massive code-bloat when they want to run a simple app. Many
> researchers (for example in third world countries) have old and
> underpowered PCs (people still use 486s would you believe). Keeping EMBOSS
> modular allows them to still use these machines whereas the total
> integrated approach would force them to enter the hardware upgrade
> treadmill. This is a prohibitive path which would keep them from using
> EMBOSS. (One of the reasons that GNU/Linux is so successful is that it can
> run pretty much on anything, due to this approach.)
>
>
> KISS keeps stuff running efficiently, cleanly and with sanity.
>
> >On Monday 14 Oct 2002 10:15 am, Lisa Mullan wrote:
> > Here is the eagerly awaited ppt file!
> >
> > I don't seem to have BeeJay's address, or anyone else from the EBI, but
> > trust it will be shared.
> >
> > To continue the conversation we were having at the meeting regarding the
> > splitting, or amalgamating of programs, I think we should look for a
> > consensus on one thing or the other.
> >
> > For my part, I think it is rather silly that bench biologists (who are
> > the main users of these tools) have to wade through hundreds of programs
> > often with names that bear little or no relation to what they do
> > (although we find them funny!)
> >
> > On the side of people that are developing their own software using EMBOSS
> > applications, I feel that it would be possible to switch off the
> > functions they don't need by using the option flags.
> >
> > I cannot see an argument for so many tiny programs, apart from the
> > author's own convenience, which should perhaps be lower down the priority
> > list?
> >
> > Lisa
> >
> > Lisa Mullan
> > HGMP Resource Centre
> > Hinxton,
> > Cambridge, CB10 1SB
> > Tel: 01223 494526
> > Email: lmullan at hgmp.mrc.ac.uk
>
> -------------------------------------------------------

From d.counsell at hgmp.mrc.ac.uk Mon Oct 14 06:11:18 2002
From: d.counsell at hgmp.mrc.ac.uk (Damian Counsell)
Date: Mon, 14 Oct 2002 11:11:18 +0100
Subject: documentation
In-Reply-To: <200210140936.g9E9aNN07725@californium.hgmp.mrc.ac.uk>; from gwilliam@hgmp.mrc.ac.uk on Mon, Oct 14, 2002 at 10:36:23AM +0100
References: <200210140936.g9E9aNN07725@californium.hgmp.mrc.ac.uk>
Message-ID: <20021014111118.C1880@dev4.hgmp.mrc.ac.uk>

Gary!

* Gary Williams [021014 10:36]:
>
> Damian Counsell wrote:
>
> > Whatever happens with the HTML stylesheets, they will probably be
> > subordinated to whatever the HGMP's new house style looks like. The
> > whole point of these stylesheets is that they just tailor the "look"
> > of the page to whatever environment they are being viewed in; they
> > don't change the content in any way.
> >
> > We can have as many different looks as we like. James is creating /
> > has created new stylesheets for the HGMP's new Web pages, which we
> > ought to pay lip service to at least when preparing stylesheets for
> > our copy of the online documentation.
>
> The HGMP house style has nothing whatsoever to do with the EMBOSS
> documentation.

...in exactly the same way that the HTML stylesheets have nothing whatsoever to do with the EMBOSS documentation. The crucial phrase in my last sentence is "our copy".

> The EMBOSS project is not an HGMP project, the HGMP merely hosts some of
> the EMBOSS web pages temporarily because it is convenient for us.

So, when they are hosting the pages, we can (if we so choose) have the same colour scheme, say, as the rest of the local pages. Just as, if the documentation is hosted at NIH, those pages can inherit the NIH house style from *their* HTML stylesheets.
Or, at Lincoln's Inn Fields, the Cancer Research UK stylesheet can apply, or at the Pasteur, their stylesheets can apply.

> We can and will move the EMBOSS pages elsewhere if there is any
> interference from the HGMP.

As the Scousers would say: "Calm down. Calm down" :-) . The point I've been making (in the last three emails I have sent out on this subject) is that the HTML stylesheets are independent of the XML stylesheets and of the documentation itself.

all the best

Damian
--
Damian COUNSELL email: d.counsell at hgmp.mrc.ac.uk
MRC Human Genome Mapping Project RC phone: +44 (0)1223 494500
Cambridge CB10 1SB direct: +44 (0)1223 494585
http://www.hgmp.mrc.ac.uk/~dcounsel/ fax: +44 (0)1223 494512

From gwilliam at hgmp.mrc.ac.uk Mon Oct 14 06:26:52 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Mon, 14 Oct 2002 11:26:52 +0100
Subject: Reorganisation of EMBOSS
References: <3DAA8CEE.95F6FC5A@hgmp.mrc.ac.uk> <3DAA9A15.F962575D@hgmp.mrc.ac.uk>
Message-ID: <3DAA9BEC.7991C41D@hgmp.mrc.ac.uk>

Sorry - I'm guilty of imprecise terminology here.

When I said "A program should have a function", I meant that a program
should have a single biological job to do, not that a program should
only have one call to a single function in the code libraries, (which I
agree is very silly.)

Gary

Tim Carver wrote:
>
> I agree with Lisa...... because this is the sort of feedback that comes from the courses! It does
> not look impressive having so many programs that do a single function.
>
> Surely a function is what the libraries are for and a program can then call a number of these
> functions. You still have modular libraries that the enthusiast can cobble together to do what
> they want, but combine highly related programs and then just add flags!
>
> If you expect them to wade through the groups to get to the program that they are looking for then
> the groups maybe could do with a re-think.
>
> Tim
>
> "Gary Williams, Tel 01223 494522" wrote:
>
> > Which programs were you proposing to amalgamate?
> >
> > Surely it is the job of an interface to group programs and to guide the
> > user to the appropriate function?
> >
> > A program should have a function.
> >
> > A program should not have several functions.
> >
> > Perhaps a good rule is:
> > If the function cannot be simply expressed in one line of description,
> > then the program is too complex and should be split. If two programs can
> > be described together in one line then they complement or overlap in
> > function and should be merged.
> >
> > Gary
> >
> > Lisa Mullan wrote:
> > >
> > > Here is the eagerly awaited ppt file!
> > >
> > > I don't seem to have BeeJay's address, or anyone else from the EBI, but
> > > trust it will be shared.
> > >
> > > To continue the conversation we were having at the meeting regarding the
> > > splitting, or amalgamating of programs, I think we should look for a
> > > consensus on one thing or the other.
> > >
> > > For my part, I think it is rather silly that bench biologists (who are the
> > > main users of these tools) have to wade through hundreds of programs often
> > > with names that bear little or no relation to what they do (although we
> > > find them funny!)
> > >
> > > On the side of people that are developing their own software using EMBOSS
> > > applications, I feel that it would be possible to switch off the functions
> > > they don't need by using the option flags.
> > >
> > > I cannot see an argument for so many tiny programs, apart from the
> > > author's own convenience, which should perhaps be lower down the priority
> > > list?
> > >
> > > Lisa
> > >
> > > Lisa Mullan
> > > HGMP Resource Centre
> > > Hinxton,
> > > Cambridge, CB10 1SB
> > > Tel: 01223 494526
> > > Email: lmullan at hgmp.mrc.ac.uk
> > >
> > > ------------------------------------------------------------------------
> > > Name: emboss_coordination.ppt
> > > emboss_coordination.ppt Type: Microsoft PowerPoint Show (application/vnd.ms-powerpoint)
> > > Encoding: BASE64
> >
> > --
> > Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512
> > mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/
> > Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK

--
Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK

From jrvalverde at cnb.uam.es Mon Oct 14 10:12:15 2002
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Mon, 14 Oct 2002 16:12:15 +0200
Subject: Reorganisation of EMBOSS
In-Reply-To: <3DAA9BEC.7991C41D@hgmp.mrc.ac.uk>
References: <3DAA8CEE.95F6FC5A@hgmp.mrc.ac.uk> <3DAA9A15.F962575D@hgmp.mrc.ac.uk> <3DAA9BEC.7991C41D@hgmp.mrc.ac.uk>
Message-ID: <20021014161215.12c681e5.jrvalverde@cnb.uam.es>

On Mon, 14 Oct 2002 11:26:52 +0100
"Gary Williams, Tel 01223 494522" wrote:

>
> Sorry - I'm guilty of imprecise terminology here.
>
> When I said "A program should have a function", I meant that a program
> should have a single biological job to do, not that a program should
> only have one call to a single function in the code libraries, (which I
> agree is very silly.)
>
> Gary

Not so. Not only is it not silly, it is a very good idea indeed: ideally a program should have a single functional entry point. So the general layout would be

main()
{
    args = process_arguments();
    do_work(args);
}

This has been recommended for at least the fifteen years I can remember.
The rationale is that doing it this way is no more complicated than the classic approach of doing everything directly in "main(argc,argv)", but it adds a very useful layer: instead of having to process command-line arguments or text strings, the routine that is actually your program gets actual binary arguments. Thus building new programs that need to use the functionality of an existing one is easy: simply call the routine with the appropriate arguments. Doing the same without a single-routine entry point, i.e. with a traditional program, would entail renaming main to something else, converting all actual arguments to strings and calling that something, or rewriting the program's main() entirely.

The approach of first processing the command line and then calling a single routine forces you to treat the program as something that carries out one conceptually functional task, and saves work if you later want to build on top of it. It is more elegant, and demonstrates that you really have a clear idea of what the program is supposed to achieve and what it needs to do it.

That is most important in today's world: if you want to make that functionality remotely invokable for distributed computing and it is already a routine, you simply write the IDL and you're done; otherwise you either redesign the program or write a wrapper to invoke it, which results in additional overhead.

Ditto for adding interfaces: the point of separating the command-line processing is to have that ability. If you now want to build a new interface, it is easy to use the single-routine entry as a hook to call the program instead of marshalling the arguments into text strings again...

All in all, it's old wisdom that, for future maintenance, it is much better to have your program invoke a single function to do its work.
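The main() { args = process_arguments(); do_work(args); } layout described above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not EMBOSS code: only the names process_arguments and do_work come from the pseudocode, while the WorkArgs structure, the sliding-window G+C count used as the program's "work", and tool_main are hypothetical. tool_main stands in for main() so the same pieces could also be linked into another program.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameter block: every value is already in binary form,
 * so the routine below never sees command-line text. */
typedef struct {
    const char *seq;   /* nucleotide sequence to scan */
    int window;        /* window length in bases, must be >= 1 */
} WorkArgs;

/* 1 for G or C (either case), 0 for anything else. */
static int is_gc(char c)
{
    return c == 'G' || c == 'C' || c == 'g' || c == 'c';
}

/* The single functional entry point (a hypothetical task): returns the
 * highest G+C count in any window of args->window consecutive bases,
 * or -1 if the window is shorter than 1 or longer than the sequence. */
int do_work(const WorkArgs *args)
{
    size_t len = strlen(args->seq);
    size_t w, i;
    int count = 0, best;

    if (args->window < 1 || (size_t) args->window > len)
        return -1;
    w = (size_t) args->window;
    for (i = 0; i < w; i++)              /* count the first window */
        count += is_gc(args->seq[i]);
    best = count;
    for (i = w; i < len; i++) {          /* slide the window one base */
        count += is_gc(args->seq[i]) - is_gc(args->seq[i - w]);
        if (count > best)
            best = count;
    }
    return best;
}

/* Command-line layer: turns "SEQUENCE WINDOW" text into a WorkArgs.
 * Returns 0 on success, -1 on a usage error. */
int process_arguments(int argc, char **argv, WorkArgs *args)
{
    char *end;
    long w;

    if (argc != 3 || argv[2][0] == '\0')
        return -1;
    w = strtol(argv[2], &end, 10);
    if (*end != '\0' || w < 1 || w > INT_MAX)
        return -1;
    args->seq = argv[1];
    args->window = (int) w;
    return 0;
}

/* What main() reduces to: parse, then hand off to the routine.  A
 * standalone program would simply define
 *     int main(int argc, char **argv) { return tool_main(argc, argv); }  */
int tool_main(int argc, char **argv)
{
    WorkArgs args;
    int best;

    if (process_arguments(argc, argv, &args) != 0) {
        fprintf(stderr, "usage: %s SEQUENCE WINDOW\n",
                argc > 0 ? argv[0] : "tool");
        return EXIT_FAILURE;
    }
    best = do_work(&args);
    if (best < 0) {
        fprintf(stderr, "window is longer than the sequence\n");
        return EXIT_FAILURE;
    }
    printf("%d\n", best);
    return EXIT_SUCCESS;
}
```

Another tool could fill in a WorkArgs itself and call do_work() directly, with no argument strings to build or parse, which is the kind of reuse argued for above; a GUI or remote-call wrapper would likewise only need to construct the structure.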
j From shibl at seqbio.com Wed Oct 30 11:13:08 2002 From: shibl at seqbio.com (Shibl Mourad) Date: Wed, 30 Oct 2002 11:13:08 -0500 Subject: Emboss Expert System Message-ID: <002c01c2802f$3fec6370$2602a8c0@SEQUENCE> Dear EMBOSS user, We are currently developing an expert system that will complement EMBOSS. As there are roughly 200 tools packaged within EMBOSS alone, the task to locate the 'right' tool, especially if you are newcomer to the bioinformatics field, can be overwhelming. Our expert system, openExpert, aims to simulate the 'question and answer' conversation one would have with a bioinformatics 'expert' - but minus their presence and wage. Although it is currently populated with only the EMBOSS suite, we aim to broaden the knowledge base of openExpert to encompass all known bioinformatics tools. We are looking for 5 EMBOSS users to review the system. The review should not take more than 30 minutes of your time and it would be of great value to us. If you are interested, please email shibl at seqbio.com. If you would like to try openExpert without providing a review, please indicate so in your email and we will provide with free access. Help us make openExpert a valuable expert system for bioinformatics. Thank you, Shibl Mourad, President Sequence Bioinformatics From gbottu at ben.vub.ac.be Mon Oct 7 10:03:10 2002 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Mon, 7 Oct 2002 12:03:10 +0200 (CEST) Subject: which restriction sites are used ? Message-ID: <200210071003.MAA1302581@black.vub.ac.be> from : BEN Dear developers, I am using EMBOSS version 2.4.1. I noticed the following : ApeKI is a restriction enzyme without commercial provider. If I do "redata ApeKI" I get : ------------ ApeKI Recognition site is GCWGC leaving sticky ends Cut positions 5':0 3':0 [5':0 3':0] Organism: Aeropyrum pernix K1 Methylated: ?(5) Source: ATCC 700893 Isoschizomers: TseI AceI Taq52I TseBI References: Kawarabayasi, Y. et al., (1999) DNA Res., vol. 6, pp. 83-101. 
Xu, S.-Y., Opitz, L., Unpublished observations. ---------------- So, it is there and has a cut site that should match GCAGC. Yet, when I run remap or restrict with parameters -nocommercial -nolimit on a sequence containg GCAGC the restriction site is not found. Asking explicitly to look for enzyme ApeKI does not help, the enzyme is not even mentioned as "enzyme that does not cut". Is that normal ? Guy Bottu From jrvalverde at cnb.uam.es Tue Oct 8 14:14:29 2002 From: jrvalverde at cnb.uam.es (José R. Valverde) Date: Tue, 8 Oct 2002 16:14:29 +0200 Subject: Suggestions Message-ID: <20021008161429.49d4b558.jrvalverde@cnb.uam.es> A couple of suggestions for the Jemboss user interface: 1) Why not integrate B/Biomer with Jemboss so structure files (PDB) may be loaded, visualised, built AND modelised by molecular dynamics/mechanics. B/Biomer has already all this, and is open code and free academic software. http://www.scripps.edu/case/Biomer/index.html I've been wanting to do this for almost two years now, but have not been able to find the time needed to start with it (sic). I still would like to, and perhaps with some minor help I could. In principle all that would be needed is a connection similar to that available for Cinema: a menu item opening a window where one can select/drag a PDB file and perhaps a "connection" of .pdb/.ent/.brk files to the program. It might be a good starting point to grasp other MD software (but again, my adaptation of TINKER to embassy is long due and delayed). 2) Simpler: a way to save user preferences locally, and most prominently among them, things like the "mail server" to be used by CINEMA, which is a pain that needs localisation almost everywhere. j From jrvalverde at cnb.uam.es Tue Oct 8 14:40:26 2002 From: jrvalverde at cnb.uam.es (José R. 
Valverde) Date: Tue, 8 Oct 2002 16:40:26 +0200 Subject: More Jemboss suggestions Message-ID: <20021008164026.6d68be7c.jrvalverde@cnb.uam.es> Another tools interesting to be integrated is TreeApp from phylodendron. It would allow drawing the phylogenetic trees obtained by other programs. LoopDloop might as well be interesting for drawing RNA secondary structures coupled with Zuker's RNAfold. Phylodendron and LoopDloop are available at http://iubio.bio.indiana.edu/soft/molbio/java/apps/ j From gwilliam at hgmp.mrc.ac.uk Mon Oct 14 09:36:23 2002 From: gwilliam at hgmp.mrc.ac.uk (gwilliam at hgmp.mrc.ac.uk) Date: Mon, 14 Oct 2002 10:36:23 +0100 (BST) Subject: documentation Message-ID: <200210140936.g9E9aNN07725@californium.hgmp.mrc.ac.uk> Damian Counsell wrote: > Whatever happens with the HTML stylesheets, they will probably be > subordinated to whatever the HGMP's new house style looks like. The > whole point of these stylesheets is that they just tailor the "look" > of the page to whatever environment they are being viewed in; they > don't change the content in any way. > > We can have as many different looks as we like. James is creating / > has created new stylesheets for the HGMP's new Web pages, which we > ought to pay lip service to at least when preparing stylesheets for > our copy of the online documentation. The HGMP house style has nothing whatsoever to do with the EMBOSS documentation. The EMBOSS project is not an HGMP project, the HGMP merely hosts some of the EMBOSS web pages temporarily because it is convenient for us. We can and will move the EMBOSS pages elsewhere if there is any interference from the HGMP. 
-- Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512 mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/ Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK From cbeazley at hgmp.mrc.ac.uk Mon Oct 14 09:48:08 2002 From: cbeazley at hgmp.mrc.ac.uk (Claude Beazley) Date: Mon, 14 Oct 2002 10:48:08 +0100 Subject: Reorganisation of EMBOSS Message-ID: <200210141048.08268.cbeazley@hgmp.mrc.ac.uk> The UNIX design philsophy (which has served us well for nearly 40 years) is KISS (Keep It Simple Stupid). i.e. Create lots of small well designed tools each of which serve a very specific function. These tools can be then linked together (through piping, redirection or simple file writing/reading) to perform very complex tasks. It keeps debugging sane and allows users/developers to utilise the tools in ways that the original developers may not have thought of. This allows the biologists the high degree of creativity which is important for research. OK. so they have to learn stuff, but that is nothing new for Biologists. If they really have problems developing new tools, then they can approach us or get their IT dept to develop in-house solutions. Also from a distributed programming perception, monolithic code structures would be an absolute nightmare to deal with. This modular approach also largely negates the problem of reinventing the wheel for each app. It also means that bench-biologists don't have the problem of massive code-bloat when they want to run a simple app. Many researchers (for example in third world countries) have old and underpowered PCs (people still use 486s would you believe). Keeping EMBOSS modular allows them to still use these machines whereas the total integrated approach would force them to enter the hardware upgrade treadmill. This is a prohibitive path which would keep them from using EMBOSS. (One of the reasons that GNU/Linux is so successful is that it can run pretty much on anything, due to this approach.) 
KISS keeps stuff running efficiently, cleanly and with sanity. >On Monday 14 Oct 2002 10:15 am, Lisa Mullan wrote: > Here is the eagerly awaited ppt file! > > I don't seem to have BeeJay's address, or anyone else from theEBI, but > trust it will be shared. > > To continue the conversation we were having at the meeting regarding the > splitting, or amalgamating of programs, I think we should look for a > consensus on one thing or the other. > > For my part, I think it is rather silly that bench biologists (who are the > main users of these tools) have to wade through hundreds of programs often > with names that bear little or no relation to what they do (although we > find them funny!) > > On the side of people that are developing their own softwre using EMBOSS > applications, I feel that it would be possible to switch off the functions > they don't need by using the option flags. > > I cannot see an argument for so many tiny programs, apart from the > author's own convenience, which should perhaps be lower down the priority > list? > > Lisa > > Lisa Mullan > HGMP Resource Centre > Hinxton, > Cambridge, CB10 1SB > Tel: 01223 494526 > Email: lmullan at hgmp.mrc.ac.uk ------------------------------------------------------- From cbeazley at hgmp.mrc.ac.uk Mon Oct 14 10:09:30 2002 From: cbeazley at hgmp.mrc.ac.uk (Claude Beazley) Date: Mon, 14 Oct 2002 11:09:30 +0100 Subject: Reorganisation of EMBOSS In-Reply-To: <200210141048.08268.cbeazley@hgmp.mrc.ac.uk> References: <200210141048.08268.cbeazley@hgmp.mrc.ac.uk> Message-ID: <200210141109.30864.cbeazley@hgmp.mrc.ac.uk> However, having said that, UNIX does not have 15 different "ls" commands (one ls for listing the files, another for listing the files with file sizes, a third one for sorting and listing the files... etc) it does have one "ls" command with switches for sorting, stats etc. So if that is the sort of integration that Lisa is thinking of, then it should be carefully considered. 
claude >On Monday 14 Oct 2002 10:48 am, Claude Beazley wrote: > The UNIX design philsophy (which has served us well for nearly 40 years) is > KISS (Keep It Simple Stupid). i.e. Create lots of small well designed tools > each of which serve a very specific function. These tools can be then > linked together (through piping, redirection or simple file > writing/reading) to perform very complex tasks. It keeps debugging sane > and allows > users/developers to utilise the tools in ways that the original developers > may not have thought of. This allows the biologists the high degree of > creativity which is important for research. OK. so they have to learn > stuff, but that is nothing new for Biologists. If they really have problems > developing new tools, then they can approach us or get their IT dept to > develop in-house solutions. > > Also from a distributed programming perception, monolithic code structures > would be an absolute nightmare to deal with. > > This modular approach also largely negates the problem of reinventing the > wheel for each app. It also means that bench-biologists don't have the > problem of massive code-bloat when they want to run a simple app. Many > researchers (for example in third world countries) have old and > underpowered PCs (people still use 486s would you believe). Keeping EMBOSS > modular allows them to still use these machines whereas the total > integrated approach would force them to enter the hardware upgrade > treadmill. This is a prohibitive path which would keep them from using > EMBOSS. (One of the reasons that GNU/Linux is so successful is that it can > run pretty much on anything, due to this approach.) > > > KISS keeps stuff running efficiently, cleanly and with sanity. > > >On Monday 14 Oct 2002 10:15 am, Lisa Mullan wrote: > > Here is the eagerly awaited ppt file! > > > > I don't seem to have BeeJay's address, or anyone else from theEBI, but > > trust it will be shared. 
> > > > To continue the conversation we were having at the meeting regarding the > > splitting, or amalgamating of programs, I think we should look for a > > consensus on one thing or the other. > > > > For my part, I think it is rather silly that bench biologists (who are > > the main users of these tools) have to wade through hundreds of programs > > often with names that bear little or no relation to what they do > > (although we find them funny!) > > > > On the side of people that are developing their own softwre using EMBOSS > > applications, I feel that it would be possible to switch off the > > functions they don't need by using the option flags. > > > > I cannot see an argument for so many tiny programs, apart from the > > author's own convenience, which should perhaps be lower down the priority > > list? > > > > Lisa > > > > Lisa Mullan > > HGMP Resource Centre > > Hinxton, > > Cambridge, CB10 1SB > > Tel: 01223 494526 > > Email: lmullan at hgmp.mrc.ac.uk > > ------------------------------------------------------- From d.counsell at hgmp.mrc.ac.uk Mon Oct 14 10:11:18 2002 From: d.counsell at hgmp.mrc.ac.uk (Damian Counsell) Date: Mon, 14 Oct 2002 11:11:18 +0100 Subject: documentation In-Reply-To: <200210140936.g9E9aNN07725@californium.hgmp.mrc.ac.uk>; from gwilliam@hgmp.mrc.ac.uk on Mon, Oct 14, 2002 at 10:36:23AM +0100 References: <200210140936.g9E9aNN07725@californium.hgmp.mrc.ac.uk> Message-ID: <20021014111118.C1880@dev4.hgmp.mrc.ac.uk> Gary! * Gary Williams [021014 10:36]: > > Damian Counsell wrote: > > > Whatever happens with the HTML stylesheets, they will probably be > > subordinated to whatever the HGMP's new house style looks like. The > > whole point of these stylesheets is that they just tailor the "look" > > of the page to whatever environment they are being viewed in; they > > don't change the content in any way. > > > > We can have as many different looks as we like. 
James is creating / > > has created new stylesheets for the HGMP's new Web pages, which we > > ought to pay lip service to at least when preparing stylesheets for > > our copy of the online documentation. > > The HGMP house style has nothing whatsoever to do with the EMBOSS > documentation. ...in exactly the same way that the HTML stylesheets have nothing whatsoever to do with the EMBOSS documentation. The crucial phrase in my last sentence is "our copy". > The EMBOSS project is not an HGMP project, the HGMP merely hosts some of > the EMBOSS web pages temporarily because it is convenient for us. So, when they are hosting the pages, we can (if we so choose) have the same colour scheme, say, as the rest of the local pages. Just as, if the documentation is hosted at NIH, those pages can inherit the NIH house style from *their* HTML stylesheets. Or, at Lincoln's Inn Fields, the Cancer Research UK stylesheet can apply, or at the Pasteur, their stylesheets can apply. > We can and will move the EMBOSS pages elsewhere if there is any > interference from the HGMP. As the Scousers would say: "Calm down. Calm down" :-) . The point I've been making (in the last three emails I have sent out on this subject) is that the HTML stylesheets are independent of the XML stylesheets and of the documentation itself. all the best Damian -- Damian COUNSELL email: d.counsell at hgmp.mrc.ac.uk MRC Human Genome Mapping Project RC phone: +44 (0)1223 494500 Cambridge CB10 1SB direct: +44 (0)1223 494585 http://www.hgmp.mrc.ac.uk/~dcounsel/ fax: +44 (0)1223 494512 From gwilliam at hgmp.mrc.ac.uk Mon Oct 14 10:26:52 2002 From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522) Date: Mon, 14 Oct 2002 11:26:52 +0100 Subject: Reorganisation of EMBOSS References: <3DAA8CEE.95F6FC5A@hgmp.mrc.ac.uk> <3DAA9A15.F962575D@hgmp.mrc.ac.uk> Message-ID: <3DAA9BEC.7991C41D@hgmp.mrc.ac.uk> Sorry - I'm guilty of imprecise terminology here. 
When I said "A program should have a function", I meant that a program should have a single biological job to do, not that a program should only have one call to a single function in the code libraries (which I agree is very silly). Gary Tim Carver wrote: > > I agree with Lisa... because this is the sort of feedback that comes from the courses! It does > not look impressive having so many programs that each do a single function. > > Surely a function is what the libraries are for, and a program can then call a number of these > functions. You still have modular libraries that the enthusiast can cobble together to do what > they want, but combine highly related programs and then just add flags! > > If you expect them to wade through the groups to get to the program that they are looking for, then > the groups could maybe do with a rethink. > > Tim > > "Gary Williams, Tel 01223 494522" wrote: > > > Which programs were you proposing to amalgamate? > > > > Surely it is the job of an interface to group programs and to guide the > > user to the appropriate function? > > > > A program should have a function. > > > > A program should not have several functions. > > > > Perhaps a good rule is: > > If the function cannot be simply expressed in one line of description, > > then the program is too complex and should be split. If two programs can > > be described together in one line then they complement or overlap in > > function and should be merged. > > > > Gary > > > > Lisa Mullan wrote: > > > > > > Here is the eagerly awaited ppt file! > > > > > > I don't seem to have BeeJay's address, or anyone else's from the EBI, but > > > trust it will be shared.
> > > > > > For my part, I think it is rather silly that bench biologists (who are the > > > main users of these tools) have to wade through hundreds of programs often > > > with names that bear little or no relation to what they do (although we > > > find them funny!) > > > > > > On the side of people that are developing their own software using EMBOSS > > > applications, I feel that it would be possible to switch off the functions > > > they don't need by using the option flags. > > > > > > I cannot see an argument for so many tiny programs, apart from the > > > author's own convenience, which should perhaps be lower down the priority > > > list? > > > > > > Lisa > > > > > > Lisa Mullan > > > HGMP Resource Centre > > > Hinxton, > > > Cambridge, CB10 1SB > > > Tel: 01223 494526 > > > Email: lmullan at hgmp.mrc.ac.uk > > > > > > ------------------------------------------------------------------------ > > > Name: emboss_coordination.ppt > > > Type: Microsoft PowerPoint Show (application/vnd.ms-powerpoint) > > > Encoding: BASE64 > > > > -- > > Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512 > > mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/ > > Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK -- Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512 mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/ Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK From jrvalverde at cnb.uam.es Mon Oct 14 14:12:15 2002 From: jrvalverde at cnb.uam.es (José R.
Valverde) Date: Mon, 14 Oct 2002 16:12:15 +0200 Subject: Reorganisation of EMBOSS In-Reply-To: <3DAA9BEC.7991C41D@hgmp.mrc.ac.uk> References: <3DAA8CEE.95F6FC5A@hgmp.mrc.ac.uk> <3DAA9A15.F962575D@hgmp.mrc.ac.uk> <3DAA9BEC.7991C41D@hgmp.mrc.ac.uk> Message-ID: <20021014161215.12c681e5.jrvalverde@cnb.uam.es> On Mon, 14 Oct 2002 11:26:52 +0100 "Gary Williams, Tel 01223 494522" wrote: > > Sorry - I'm guilty of imprecise terminology here. > > When I said "A program should have a function", I meant that a program > should have a single biological job to do, not that a program should > only have one call to a single function in the code libraries, (which I > agree is very silly.) > > Gary Not so. Not only is it not silly, it is a very good idea indeed: ideally a program should have a single functional entry point. So the general layout would be main() { args = process_arguments(); do_work(args); } This has been advocated for at least the fifteen years that I can remember. The rationale is that doing it this way is no more complicated than the classic approach of working directly in "main(argc,argv)", but it adds a very useful layer: instead of having to process command-line arguments or text strings, the routine that actually is your program receives real binary arguments. Thus building new programs that need to use the functionality of an existing one is easy: simply call the routine with the appropriate arguments. Doing so without a single routine entry point, i.e. with a traditional program, would entail renaming main() to something else, converting all the actual arguments to strings and calling that something, or rewriting the program's main() entirely. The approach of first processing the command line and then calling a single routine forces you to treat the program as something that carries out one conceptually distinct piece of work, and saves work if you later want to build on top of it.
It is more elegant and demonstrates that you really have a clear idea of what the program is supposed to achieve and what it needs to do it. That is most important in today's world: if you want to make that functionality into something remotely invokable for distributed computing and it is a routine, you simply write the IDL and you're done; otherwise you either redesign the program or write a wrapper to invoke it, which results in additional overhead. Ditto for adding interfaces: the point of separating out the command-line processing is to have that ability. If you now want to build a new interface, it is easy to use the single routine entry as a hook to call the program instead of marshalling the arguments into text strings again... All in all, it's old wisdom that, for future maintenance, it is much better to have your program invoke a single function to do its work. j From shibl at seqbio.com Wed Oct 30 16:13:08 2002 From: shibl at seqbio.com (Shibl Mourad) Date: Wed, 30 Oct 2002 11:13:08 -0500 Subject: Emboss Expert System Message-ID: <002c01c2802f$3fec6370$2602a8c0@SEQUENCE> Dear EMBOSS user, We are currently developing an expert system that will complement EMBOSS. As there are roughly 200 tools packaged within EMBOSS alone, the task of locating the 'right' tool, especially if you are a newcomer to the bioinformatics field, can be overwhelming. Our expert system, openExpert, aims to simulate the 'question and answer' conversation one would have with a bioinformatics 'expert' - but minus their presence and wage. Although it is currently populated with only the EMBOSS suite, we aim to broaden the knowledge base of openExpert to encompass all known bioinformatics tools. We are looking for 5 EMBOSS users to review the system. The review should not take more than 30 minutes of your time and it would be of great value to us. If you are interested, please email shibl at seqbio.com.
If you would like to try openExpert without providing a review, please indicate so in your email and we will provide you with free access. Help us make openExpert a valuable expert system for bioinformatics. Thank you, Shibl Mourad, President Sequence Bioinformatics