From jkb at mrc-lmb.cam.ac.uk Wed Feb 7 10:13:29 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Wed, 7 Feb 2001 15:13:29 +0000
Subject: required vs optional
Message-ID: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk>

Hello all,

I'm confused about the precise use for the required and optional
parameters. The observed usage varies and seems to disagree with the
documentation. Some examples help, but I should warn you that I'm still using
emboss 1.5.5 so things may have changed.

syco.acd:

  bool: plot [ opt: N info: "Produce plot" ]

  xygraph: graph [ req: $(plot) multi: 3 ]

  outfile: outfile [ req: @(!$(plot)) ]

Here plot is a boolean. If "Produce plot" is answered as yes then the xygraph
type is a suitable question (although whether or not it's asked for is
another matter) and the outfile is redundant. If "Produce plot" is answered as
no then vice versa is true. It's clear to see how this is implemented via the
"req: $(plot)" code. In the graphical interface, where all command options are
displayed simultaneously, this produces code which automatically "greys-out"
arguments that are superfluous.

However if we look at shuffleseq.acd

  int: shuffle [ req: N def: 1 info: "Number of shuffles" ]

Here this is implying that 'shuffle' is never required. So required is both
indicating parameters which are merely optional and parameters which are not
needed. In my current GUI this causes shuffle to be permanently greyed-out,
which is obviously incorrect. As a hack I can ignore fixed "req:N" statements
and only grey-out when the value of req: is an expression, but that's also
wrong.

Even more confusing is pepwheel.acd:

  bool: wheel [
    def: Y
    info: "Plot the wheel"
  ]

  int: steps [
    opt: Y
    min: 2
    max: 100
    def: 18
    info: "Number of steps"
    help: "The number of residues plotted per turn is this value
           divided by the 'turns' value."
  ]

steps is listed using "opt" and not "req". It also has no dependency on wheel,
which doesn't make sense.
Wheel mentions neither optional nor required settings
- what are the default values for these?

I also do not understand the rules for working out which parameters are listed
in the help as mandatory, optional, or advanced. The help for pepwheel
indicates that -steps is optional and -wheel is advanced, which isn't too
sensible. I assume this information is derived from the use of opt, and req.

Anyway, ideally I'd like a needed parameter so that I can distinguish between
options which have no use (eg "steps" in pepwheel after "wheel" is set to 0)
and options that have a use but are simply optional. The required setting seems
redundant, and could be a source of error (what does req:Y opt:Y mean, or
req:N opt:N?).

Any suggestions?

	James

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk)  Tel: 01223 402499  Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

From Peter.Rice at uk.lionbioscience.com Wed Feb 7 10:32:40 2001
From: Peter.Rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 07 Feb 2001 15:32:40 +0000
Subject: required vs optional
References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk>
Message-ID: <3A816A98.2BA3C3FF@lionbio.co.uk>

Hi James,

>I'm confused about the precise use for the required and optional
>parameters.

It is all very simple. Honest. Only the expressions make it complicated.

By default, everything is "advanced" (i.e. never prompted for).

If you say "req: Y" then it means required, and will be prompted.

If you say "param: Y" this also means required, and will be prompted.

Incidentally, as "req: N" is the default it is not needed in shuffleseq.acd

If you say "opt: Y" then it means optional, and is only prompted for
when you run with -options on the command line (or EMBOSS_OPTIONS defined
as true)

If you want to make life complicated, you can say "req: " and work
out at run time whether it is required or not.
These are the cases that cause you problems. If you can figure out the
result, then fine. If not, you can assume a true value or make up your own
"may be needed, but I can't tell" category.

>I also do not understand the rules for working out which parameters are listed
>in the help as mandatory, optional, or advanced. The help for pepwheel
>indicates that -steps is optional and -wheel is advanced, which isn't too
>sensible. I assume this information is derived from the use of opt, and req.

The help has the same problem you do. Only worse - it has to guess what
would happen with any expression. At least you get to see the user input :-)

>Anyway, ideally I'd like a needed parameter so that I can distinguish between
>options which have no use (eg "steps" in pepwheel after "wheel" is set to 0)
>and options that have a use but are simply optional. The required setting seems
>redundant, and could be a source of error (what does req:Y opt:Y mean, or
>req:N opt:N?).

In pepwheel.acd, plotting the wheel is 'advanced' (req: N) so users can only
turn off the plot from the command line. The step increment is optional so
users can try running "pepwheel -options" and be prompted for the step. Yes,
we could turn it off if there is no plot, but the program does not mind and it
would make life even more difficult for interface developers if we had too
many expressions. The obvious solution would be to say "opt: $(wheel)" in
steps.

I would guess leaving it as it currently stands is 'best'. Who knows, one day
the program might calculate something that needs the step value, without
displaying the plot :-)

Of course, you are free to play with the settings to find a way that makes
sense for your graphical interface. Most variations would make sense to the
program (just try it from the command line to check what happens).
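
As a rough illustration of the prompting rules described in this message, here is a minimal Python sketch. It is not EMBOSS code: the dictionary representation of an ACD entry and the names is_yes, classify and will_prompt are assumptions made for the example, and it ignores values already supplied on the command line.

```python
# Sketch of the rules: req/param -> required, opt -> optional,
# anything else -> advanced. Hypothetical helper names, not EMBOSS code.

def is_yes(value):
    """Treat Y/y (and a few spellings) as true; anything else is false."""
    return str(value).strip().lower() in ("y", "yes", "true", "1")


def classify(attrs):
    """Return 'required', 'optional' or 'advanced' for one ACD entry.

    attrs is a plain dict of already-evaluated attribute values,
    e.g. {"opt": "Y", "def": "18"}. Missing attributes default to N.
    """
    if is_yes(attrs.get("req", "N")) or is_yes(attrs.get("param", "N")):
        return "required"
    if is_yes(attrs.get("opt", "N")):
        return "optional"
    return "advanced"


def will_prompt(attrs, options=False):
    """Required entries always prompt; optional ones only with -options."""
    kind = classify(attrs)
    return kind == "required" or (kind == "optional" and options)


shuffle = {"req": "N", "def": "1"}   # from shuffleseq.acd
steps = {"opt": "Y", "def": "18"}    # from pepwheel.acd
wheel = {"def": "Y"}                 # from pepwheel.acd

for name, attrs in (("shuffle", shuffle), ("steps", steps), ("wheel", wheel)):
    print(name, classify(attrs), will_prompt(attrs), will_prompt(attrs, options=True))
```

With these definitions shuffle and wheel both come out as advanced, and steps is prompted for only when the options flag is set, which is the behaviour described for pepwheel above.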
Hope this helps,

Peter

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From david.martin at biotek.uio.no Wed Feb 7 10:41:28 2001
From: david.martin at biotek.uio.no (David Martin)
Date: Wed, 7 Feb 2001 16:41:28 +0100
Subject: [EMBnet ADMIN] required vs optional
In-Reply-To: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk>
Message-ID:

On Wed, 7 Feb 2001, James Bonfield wrote:

> Hello all,
>
> I'm confused about the precise use for the required and optional
> parameters. The observed usage varies and seems to disagree with the
> documentation. Some examples help, but I should warn you that I'm still using
> emboss 1.5.5 so things may have changed.

Unless I am mistaken parameters are defined in the following way:

req: Y - a required parameter
opt: y - an optional parameter
everything else : an advanced parameter

I would expect the opt: N to be redundant. The reason one can set opt: N or
req: N is so that it can be implemented programmatically (as your first
example shows.)

>
> syco.acd:
>
> bool: plot [ opt: N info: "Produce plot" ]
>
> xygraph: graph [ req: $(plot) multi: 3 ]
>
> outfile: outfile [ req: @(!$(plot)) ]
>
> Here plot is a boolean. If "Produce plot" is answered as yes then the xygraph
> type is a suitable question (although whether or not it's asked for is
> another matter) and the outfile is redundant. If "Produce plot" is answered as
> no then vice versa is true. It's clear to see how this is implemented via the
> "req: $(plot)" code. In the graphical interface, where all command options are
> displayed simultaneously, this produces code which automatically "greys-out"
> arguments that are superfluous.
>
> However if we look at shuffleseq.acd
>
> int: shuffle [ req: N def: 1 info: "Number of shuffles" ]
>
> Here this is implying that 'shuffle' is never required.
> So required is both
> indicating parameters which are merely optional and parameters which are not
> needed. In my current GUI this causes shuffle to be permanently greyed-out,
> which is obviously incorrect. As a hack I can ignore fixed "req:N" statements
> and only grey-out when the value of req: is an expression, but that's also
> wrong.

I would interpret this as meaning that shuffle is an advanced qualifier, which
is how EMBOSS appears to treat it. It is not that it isn't required, but that
it is already set. It is probably needed but isn't required to be prompted
for.

>
> Even more confusing is pepwheel.acd:
>
> bool: wheel [
>   def: Y
>   info: "Plot the wheel"
> ]
>
> int: steps [
>   opt: Y
>   min: 2
>   max: 100
>   def: 18
>   info: "Number of steps"
>   help: "The number of residues plotted per turn is this value
>          divided by the 'turns' value."
> ]
>
> steps is listed using "opt" and not "req". It also has no dependency on wheel,
> which doesn't make sense. Wheel mentions neither optional nor required settings
> - what are the default values for these?

But the opt/req controls which parameters are prompted for. I suppose it
should really be dependent so it is not prompted if -nowheel is set on the
command line.

>
> I also do not understand the rules for working out which parameters are listed
> in the help as mandatory, optional, or advanced. The help for pepwheel
> indicates that -steps is optional and -wheel is advanced, which isn't too
> sensible. I assume this information is derived from the use of opt, and req.

yup. if req is Y then it is in mandatory, opt is Y then it is in optional,
neither then it is in advanced. What it does if both are set is a mystery,
probably throws an error.

>
> Anyway, ideally I'd like a needed parameter so that I can distinguish between
> options which have no use (eg "steps" in pepwheel after "wheel" is set to 0)
> and options that have a use but are simply optional.
> The required setting seems
> redundant, and could be a source of error (what does req:Y opt:Y mean, or
> req:N opt:N?).

Can both be defined in a valid ACD file?

On a sideways but related note, is it possible to configure the Windows
versions of SIP/NIP to import sequences from an external program? This would
be very nice. I am thinking of a Java thingy with a GUI to cover
seqret/entret/showdb from a remote EMBOSS machine.

..d

---------------------------------------------------------------------
* Dr. David Martin               Biotechnology Centre of Oslo       *
* Node Manager                   Gaustadalleen 21                   *
* The Norwegian EMBNet Node      P.O. box 1125 Blindern             *
* tel +47 22 84 05 35            N-0317 Oslo                        *
* fax +47 22 84 05 01            Norway                             *
---------------------------------------------------------------------
I will be leaving the Norwegian EMBnet node on 23rd February. All work
related mail should be addressed to admin at embnet.uio.no where my
successor, Rune Groven will deal with it. All personal email should be
sent to dmartin at hgmp.mrc.ac.uk from whence it will be automatically
forwarded to me. Spam should continue to be sent to /dev/null

From jkb at mrc-lmb.cam.ac.uk Wed Feb 7 11:59:30 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Wed, 7 Feb 2001 16:59:30 +0000
Subject: required vs optional
In-Reply-To: <3A816A98.2BA3C3FF@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Wed, Feb 07, 2001 at 03:32:40PM +0000
References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk>
Message-ID: <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk>

Hi Peter,

Thanks for some clarification, but I still have further queries.

> If you say "req: Y" then it means required, and will be prompted.
> If you say "param: Y" this also means required, and will be prompted.

So for the purposes of the GUI I can just treat them the same and always
specify the qualifier anyway.
> If you want to make life complicated, you can say "req: " and work
> out at run time whether it is required or not. These are the cases that cause
> you problems. If you can figure out the result, then fine. If not, you can

Actually these aren't a problem at all. I'm handling expressions already, even
in cases for specifying the minimum and maximum values (I don't know if these
exist, but I hacked an example to test it). So my GUI for stretcher, for
example, automatically handles changing the gappenalty and gaplength
parameters when the user changes from DNA to protein or back again. If the
user modifies gappenalty and then changes the sequence type then the program
remembers that the user has already adjusted the value and so it then does not
automatically change it.

Internally this is all performed by Tcl's excellent variable trace options.
When a value is an expression I work out which variables are contained within
the expression and produce a Tcl trace for each one. Then whenever a dependent
variable is modified a callback procedure is invoked which re-evaluates the
expression. This is even working (wrongly as it turns out) for the required
parameters, so setting $(wheel) to 0, from 1, (for example) could
automatically grey-out the other parameters.

> would make life even more difficult for interface developers if we had too
> many expressions. The obvious solution would be to say "opt: $(wheel)" in
> steps.

To be honest it'd make my life easier as things are more consistent, and it
looks better for the user too.

> Of course, you are free to play with the settings to find a way that makes
> sense for your graphical interface. Most variations would make sense to the
> program (just try it from the command line to check what happens).

I still feel that a way of indicating unnecessary or pointless questions would
be useful. It's additional to required as that just indicates whether or not
the default value is enough.
I'm just trying to think what dialogues professional applications are likely
to have, and greying-out things on the fly is a common technique. The more I
think about it the more I'm certain. So I propose the following addition to
each command line qualifier.

  needed: @expression

The default is 1. If an expression is specified and it evaluates to 0 then
the option is not asked for (even with -options) and in GUIs it will be
greyed-out. For me, this is trivial to do (ie I've already written it).
Any comments on this proposal?

	James

From Peter.Rice at uk.lionbioscience.com Wed Feb 7 12:07:34 2001
From: Peter.Rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 07 Feb 2001 17:07:34 +0000
Subject: required vs optional
References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk>
Message-ID: <3A8180D6.F158B05C@lionbio.co.uk>

James Bonfield wrote:

> > If you say "req: Y" then it means required, and will be prompted.
> > If you say "param: Y" this also means required, and will be prompted.
>
> So for the purposes of the GUI I can just treat them the same and always
> specify the qualifier anyway.

Yes.

> I still feel that a way of indicating unnecessary or pointless questions would
> be useful. It's additional to required as that just indicates whether or not
> the default value is enough. I'm just trying to think what dialogues
> professional applications are likely to have, and greying-out things on the
> fly is a common technique. The more I think about it the more I'm certain.
> So I propose the following addition to each command line qualifier.
>
>   needed: @expression
>
> The default is 1.
> If an expression is specified and it evaluates to 0 then
> the option is not asked for (even with -options) and in GUIs it will be
> greyed-out. For me, this is trivial to do (ie I've already written it).
> Any comments on this proposal?

I don't see how this is different from testing required, parameter and
optional in turn, which is what the code will do.

Basically, needed is 'required or parameter or optional' ... every time.

You would then have the choice of including/excluding the optional ones, just
like a 'real' user.

From jkb at mrc-lmb.cam.ac.uk Wed Feb 7 12:16:49 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Wed, 7 Feb 2001 17:16:49 +0000
Subject: required vs optional
In-Reply-To: <3A8180D6.F158B05C@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Wed, Feb 07, 2001 at 05:07:34PM +0000
References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk>
Message-ID: <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk>

On Wed, Feb 07, 2001 at 05:07:34PM +0000, Peter Rice wrote:
> I don't see how this is different from testing required, parameter and
> optional in turn, which is what the code will do.
>
> Basically, needed is 'required or parameter or optional' ... every time.
>
> You would then have the choice of including/excluding the optional ones, just
> like a 'real' user.

But I've already found one case (shuffleseq) where it is stated to be "req: N"
(and defaults to opt:y), in which case I (wrongly) grey it out as it's never
needed.

And of course there are other cases where the existing syntax is sufficient,
but it's not actually used (eg all those options in pepwheel).

What am I missing?
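
For reference, the rule Peter gives above ('needed' is 'required or parameter or optional') can be sketched in a few lines of Python. This is an illustration only, not EMBOSS or GUI code: it handles just the three value forms that appear in this thread (a constant Y/N, $(name), and @(!$(name))), and the names variables, evaluate and needed are made up for the example.

```python
import re

# $(name) is the variable-reference form seen in syco.acd.
_VAR = re.compile(r"\$\((\w+)\)")


def variables(text):
    """Names referenced by an attribute value (what a GUI would watch)."""
    return set(_VAR.findall(text))


def evaluate(text, values):
    """Evaluate a constant (Y/N), $(name), or @(!$(name)) against values."""
    text = text.strip()
    if text.startswith("@(") and text.endswith(")"):
        inner = text[2:-1].strip()
        if inner.startswith("!"):
            return not evaluate(inner[1:], values)
        return evaluate(inner, values)
    match = _VAR.fullmatch(text)
    if match:
        return bool(values[match.group(1)])
    return text.lower() in ("y", "yes", "true", "1")


def needed(attrs, values):
    """'required or parameter or optional', each defaulting to N."""
    return any(evaluate(attrs.get(key, "N"), values)
               for key in ("req", "param", "opt"))


graph = {"req": "$(plot)", "multi": "3"}   # from syco.acd
outfile = {"req": "@(!$(plot))"}           # from syco.acd
shuffle = {"req": "N", "def": "1"}         # from shuffleseq.acd

for plot in (True, False):
    values = {"plot": plot}
    print(plot, needed(graph, values), needed(outfile, values),
          needed(shuffle, values))
```

Under this rule graph and outfile switch places as plot changes, while shuffle is never "needed" because neither req, param nor opt is set for it, which is exactly the shuffleseq case raised above.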
	James

From Peter.Rice at uk.lionbioscience.com Wed Feb 7 12:30:48 2001
From: Peter.Rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 07 Feb 2001 17:30:48 +0000
Subject: required vs optional
References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk>
Message-ID: <3A818648.72D3658A@lionbio.co.uk>

James Bonfield wrote:

> But I've already found one case (shuffleseq) where it is stated to be "req: N"
> (and defaults to opt:y), in which case I (wrongly) grey it out as it's never
> needed.

I am a little confused there. This definition:

  int: shuffle [ req: N def: 1 info: "Number of shuffles" ]

is opt: N (the default - all these default to false)

EMBOSS will never prompt for it, but you can certainly set it from the command
line, and if you like you can offer it to the user.

EMBOSS will accept anything on the command line (there is no complaint if the
user puts something on the command line that is not wanted). But the
assumption for graphical user interfaces is that only the optional cases are
offered.

I guess, being pedantic, it would be reasonable to change shuffle to be
"opt: y" in the acd file, and the same for most other options you might like
to turn on. My suspicion is that only a few keen users are really using the
-options qualifier, but in any case it was aimed as much at the GUI developer
as at the user who didn't want too many prompts.

> And of course there are other cases where the existing syntax is sufficient,
> but it's not actually used (eg all those options in pepwheel).
We could clean them up, or we could remove the 'turn off the plot' option :-)

From jkb at mrc-lmb.cam.ac.uk Wed Feb 7 12:50:53 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Wed, 7 Feb 2001 17:50:53 +0000
Subject: required vs optional
In-Reply-To: <3A818648.72D3658A@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Wed, Feb 07, 2001 at 05:30:48PM +0000
References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk>
Message-ID: <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk>

On Wed, Feb 07, 2001 at 05:30:48PM +0000, Peter Rice wrote:
> I am a little confused there. This definition:
>
>   int: shuffle [ req: N def: 1 info: "Number of shuffles" ]
>
> is opt: N (the default - all these default to false)

Sorry, I thought opt: Y was the default.

Incidentally is there a list anywhere of all the default values? Eg including
info, prompt, etc. Sometimes info: is used and sometimes prompt:, but often
none are present. However emboss still prompts with an appropriate query -
where does this come from? I can see some in the ajacd.c code, but not
all. Besides reading the code to find the defaults isn't my idea of fun :-)

> on. My suspicion is that only a few keen users are really using the -options
> qualifier, but in any case it was aimed as much at the GUI developer as at the
> user who didn't want too many prompts.

Agreed. So just to clarify things, if the physical values or results from
expressions give the following, then I need to enable or disable (grey-out)
the question appropriately:

  req     opt     greyed out
  Y       N       N
  Y       Y       N
  N       N       N
  N       Y       Y

Do req:N opt:N and req:Y opt:Y ever occur? They sound like inconsistencies.
Anyway, only when it's optional and not required should I grey them out.
However as optional defaults to N only places where opt:Y is explicitly stated
will I ever grey out parameters, which unfortunately includes all those
graphics options (eg this is syco):

  bool: plot [ opt: N info: "Produce plot" ]

  xygraph: graph [ req: $(plot) multi: 3 ]

  outfile: outfile [ req: @(!$(plot)) ]

I assume that I can just add opt:Y to the graph and outfile without breaking
anything. I must admit though that this would be much easier if there was one
single parameter to check as it makes the tcl variable tracing harder to
write - rather than a trace of a single variable I have to produce a new
variable consisting of an expression (using req and opt) and trace that
instead.

	James

From Peter.Rice at uk.lionbioscience.com Wed Feb 7 13:18:20 2001
From: Peter.Rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 07 Feb 2001 18:18:20 +0000
Subject: required vs optional
References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk>
Message-ID: <3A81916C.7E87ED03@lionbio.co.uk>

James Bonfield wrote:

> Incidentally is there a list anywhere of all the default values? Eg including
> info, prompt, etc. Sometimes info: is used and sometimes prompt:, but often
> none are present. However emboss still prompts with an appropriate query -
> where does this come from? I can see some in the ajacd.c code, but not
> all.
> Besides reading the code to find the defaults isn't my idea of fun :-)

All very much cleaned up since PISE started building their interface.

All applications should be using info, and so should you. That was its
original purpose. For cases where the prompt to the interactive user would be
different, we invented 'prompt:' which applications can use if available, but
the (unwritten) standard says that when prompts are being specified, info must
always be there.

However, certain data types have their own prompts. Sequence for example.
Those are defined internally as secret codes, and have their own automatic
prompts.

If you try the following:

  % make check
  % entrails emboss.ent

Almost all will be revealed in file 'emboss.ent'. entrails was written for
interface developers who need to see these kinds of internals. For some
reason, the 'make check' applications do not get installed so you need to run
them from the original emboss directory.

It should be not too hard to add the default prompts to entrails. Oops. I see
it has the explanations swapped for input and output formats too. Needs some
work. Which means ...

Let me know what internals you need to peek at, and I can hack it for you
while I fix the rest.

> Agreed. So just to clarify things, if the physical values or results from
> expressions give the following, then I need to enable or disable (grey-out)
> the question appropriately:
>
>   req     opt     greyed out
>   Y       N       N
>   Y       Y       N
>   N       N       N
>   N       Y       Y

Swap those last two.
opt:y means it can be useful
opt:n means you can change it if you dare.

> Do req:N opt:N and req:Y opt:Y ever occur? They sound like inconsistencies.

They are rare, but they are possible (especially with expressions around).

EMBOSS only has to decide whether to prompt.

req:y prompts always.
opt:y has to test whether -options (or EMBOSS_OPTIONS) is set.

No conflict. EMBOSS will always prompt if it finds a reason to.

> Anyway, only when it's optional and not required should I grey them out.
> However as optional defaults to N only places where opt:Y is explicitly stated
> will I ever grey out parameters, which unfortunately includes all those
> graphics options (eg this is syco):

Some misunderstanding of 'required' perhaps?

Required means the application would like to know what the user really wants,
and it will prompt if it doesn't know.

Optional means the user may want to set it, and he/she can ask for a prompt.

With nothing set, the user can only set the value on the command line. These
are the ones I would expect you to grey out.

In processing, EMBOSS sets everything. If there is no prompt, and nothing on
the command line, there is always a default value.

> bool: plot [ opt: N info: "Produce plot" ]
>
> xygraph: graph [ req: $(plot) multi: 3 ]
>
> outfile: outfile [ req: @(!$(plot)) ]
>
> I assume that I can just add opt:Y to the graph and outfile without breaking
> anything. I must admit though that this would be much easier if there was one
> single parameter to check as it makes the tcl variable tracing harder to
> write - rather than a trace of a single variable I have to produce a new
> variable consisting of an expression (using req and opt) and trace that
> instead.

Can you just generate the tcl variable you need? If only req or opt will be
used you can just pick the value of whichever one is being set. Spare 'N'
settings can be safely ignored, as they are the default anyway.

From leonardz at bioinfo.sickkids.on.ca Wed Feb 7 16:32:35 2001
From: leonardz at bioinfo.sickkids.on.ca (Len F. Zaifman)
Date: Wed, 07 Feb 2001 16:32:35 -0500
Subject: EMBOSS - Indexing breaks on large databases
Message-ID: <3A81BEF3.308A85CF@bioinfo.sickkids.on.ca>

I have installed emboss 1.9.1 on an O2000. It installed nicely once I
gave up on installing it shared.

The issue came up in indexing genbank files.
Most divisions indexed fine with dbiflat. However, when I try to index
est, or all of genbank, the indexing breaks due to sort running out of
memory:

explicitly:
I run

  dbiflat -idformat GB -directory /data/genbank \
    -indexdirectory /tools/emboss1.9.1/data/indices/est \
    -dbname GenBankEst -filenames gbest*.seq -date 06/02/01 \
    -sortoptions '-T /tmp_disk/scratch4/applicat/est -k1,1'

&

  dbiflat -idformat GB -directory /data/genbank \
    -indexdirectory /tools/emboss1.9.1/data/indices/genbank \
    -dbname GenBank -filenames *.seq -date 06/02/01 \
    -sortoptions '-T /tmp_disk/scratch4/applicat/genbank -k1,1'

& get

  UX:sort: ERROR: Out of memory before merge: Not enough space

sort is run with -T /scratch4 -k1,1 , where scratch4 has a 10 GB quota.
I checked the environment and it is using the system sort (/bin/sort).
There were no syslog errors.

All other smaller divisions seemed to work. I have a scheduled reboot
where I am going to set the maximum resident set size to 1 GB (it is
currently 1/2 GB). However, is there a more clever way of doing this (ie if I
did this on my work station I would be limited to 1/8 GB or swap like crazy).

Details:

I configure using:

  ./configure --prefix=/tools/emboss1.9.1 --disable-shared --with-x --with-pngdriver

on an O2K running Irix 6.5.10 and the MipsPro 7.3.1.2 Compilers

Any ideas??

As a side note: when I tried indexing all of genbank I got almost 60000
sequences generating the following warning notice:

  This is a warning: Duplicate ID skipped: 'XXXXXXXX'

Is this an indication that the initial data needs to be cleaned up
first, or a non-issue?

Thanks.

From jrvalverde at cnb.uam.es Thu Feb 8 04:00:49 2001
From: jrvalverde at cnb.uam.es (jrvalverde at cnb.uam.es)
Date: Thu, 8 Feb 2001 10:00:49 +0100 (MET)
Subject: EMBOSS - Indexing breaks on large databases
In-Reply-To: <3A81BEF3.308A85CF@bioinfo.sickkids.on.ca>
Message-ID: <200102080900.f1890pU3292249@embnet.cnb.uam.es>

"Len F. Zaifman" wrote:
> I have installed emboss 1.9.1 on an O2000.
> It installed nicely once I
> gave up on installing it shared.

BTW, I have succeeded in compiling EMBOSS for IRIX using 64 bit
compilation.

It required some tweaking, but works. The recipe for those willing to
give it a try is

	- remove 'gcc' from your path
	- define COMPILER_DEFAULTS_PATH appropriately (see pe_environ)
	  to look for a compiler.defaults file containing e.g.
		:abi=64:isa=4:proc=r10k
	- ./configure in EMBOSS and all EMBASSY subdirs
	- search in all files for 'CC = cc' and substitute it
	  for 'CC = cc -64'
	- same for 'LD = /bin/ld' -> 'LD = /bin/ld -64'
	- make

The reason is that compiling depends on the Makefile and on libtool,
as well as linking. I didn't spend much time looking at configure since
the above steps were so straightforward. I know I should look into
the configure script and add an option for 64-bit-irix-compile or some
such, but that'll have to wait till I have time for it.

Yes, I know, the search and substitute thing looks tedious, but it
isn't, honest: create a 'chfile.sh' outside the EMBOSS source hierarchy
containing

	#/bin/sh
	cp $1 $1.orig
	mv $1 tmpfile
	sed -e 's/CC="cc"/CC="cc -64"/g' $1 > tmpfile
	sed -e 's/CC = cc/CC = cc -64/g' $1 > tmpfile
	sed -e 's/\/bin\/ld/\/bin\/ld -64/g' $1 > tmpfile
	rm tmpfile
	## if you are sure, uncomment this
	#rm $1.orig

'cd' to the emboss directory and run

	find . -type f -exec /path/to/chfile.sh {} \; -print

and you are done with the CC changes. Libtool requires special
treatment since it uses quotes

				j

From david.martin at biotek.uio.no Thu Feb 8 04:32:12 2001
From: david.martin at biotek.uio.no (David Martin)
Date: Thu, 8 Feb 2001 10:32:12 +0100
Subject: [EMBnet ADMIN] EMBOSS - Indexing breaks on large databases
In-Reply-To: <3A81BEF3.308A85CF@bioinfo.sickkids.on.ca>
Message-ID:

On Wed, 7 Feb 2001, Len F. Zaifman wrote:

> I have installed emboss 1.9.1 on an O2000. It installed nicely once I
> gave up on installing it shared.

Which compiler were you using? I note that you have the MIPS compiler.
Have you tried using gcc, which seems (on my o200, which shouldn't be so
different) to work just fine on EMBL?

..d

>
> The issue came up in indexing genbank files. Most divisions indexed fine
> with dbiflat. However, when I try to index
> est , or all of genbank , the indexing breaks due to sort running out of
> memory:
>
> explicitly:
> I run
> dbiflat -idformat GB -directory /data/genbank -indexdirectory
> /tools/emboss1.9.1/data/indices/est -dbname GenBankEst -filenames
> gbest*.seq -date 06/02/01 -sortoptions '-T
> /tmp_disk/scratch4/applicat/est -k1,1'
> &
> dbiflat -idformat GB -directory /data/genbank -indexdirectory
> /tools/emboss1.9.1/data/indices/genbank -dbname GenBank -filenames
> *.seq -date 06/02/01 -sortoptions '-T
> /tmp_disk/scratch4/applicat/genbank -k1,1'
> & get
>
> UX:sort: ERROR: Out of memory before merge: Not enough space
>
>
> sort is run with -T /scratch4 -k1,1 , where scratch4 has a 10 GB
> quota
> I checked the environment and it is using the system sort (/bin/sort).
> There were no syslog errors.
>
> All other smaller divisions seemed to work. I have a scheduled reboot
> where I am going to set the
> maximum resident set size to 1 GB (it is currently 1/2 GB). However, is
> there a more clever way of doing this (ie if I did this on my work
> station I would be limited to 1/8 GB or swap like crazy).
>
> Details:
>
> I configure using:
> ./configure --prefix=/tools/emboss1.9.1 --disable-shared --with-x
> --with-pngdriver
>
> on an O2K running Irix 6.5.10 and the MipsPro 7.3.1.2 Compilers
>
> Any ideas??
>
>
>
> As a side note: when I tried indexing all of genbank I got almost 60000
> sequences generating the following warning notice:
>
>
>
> This is a warning: Duplicate ID skipped: 'XXXXXXXX'
>
> Is this an indication that the initial data needs to be cleaned up
> first, or a non-issue?
>
> Thanks.
>

From david.martin at biotek.uio.no Thu Feb 8 04:41:28 2001
From: david.martin at biotek.uio.no (David Martin)
Date: Thu, 8 Feb 2001 10:41:28 +0100
Subject: [EMBnet ADMIN] Re: EMBOSS - Indexing breaks on large databases
In-Reply-To: <200102080900.f1890pU3292249@embnet.cnb.uam.es>
Message-ID:

On Thu, 8 Feb 2001 jrvalverde at cnb.uam.es wrote:

> "Len F. Zaifman" wrote:
> > I have installed emboss 1.9.1 on an O2000. It installed nicely once I
> > gave up on installing it shared.
>
> BTW, I have succeeded in compiling EMBOSS for IRIX using 64 bit
> compilation.
>
> It required some tweaking, but works. The recipe for those willing to
> give it a try is
>
> 	- remove 'gcc' from your path
> 	- define COMPILER_DEFAULTS_PATH appropriately (see pe_environ)
> 	  to look for a compiler.defaults file containing e.g.
> 		:abi=64:isa=4:proc=r10k
> 	- ./configure in EMBOSS and all EMBASSY subdirs
> 	- search in all files for 'CC = cc' and substitute it
> 	  for 'CC = cc -64'
> 	- same for 'LD = /bin/ld' -> 'LD = /bin/ld -64'
> 	- make
>
> The reason is that compiling depends on the Makefile and on libtool,
> as well as linking. I didn't spend much time looking at configure since
> the above steps were so straightforward. I know I should look into
> the configure script and add an option for 64-bit-irix-compile or some
> such, but that'll have to wait till I have time for it.
> > Yes, I know, the search and substitute thing looks tedious, but it > isn't, honest: create a 'chfile.sh' out of the EMBOSS source hierarchy > containing > > #/bin/sh > cp $1 $1.orig > mv $1 tmpfile > sed -e 's/CC="cc"/CC="cc -64"/g' $1 > tmpfile > sed -e 's/CC = cc/CC = cc -64/g' $1 > tmpfile > sed -e 's/\/bin\/ld/\/bin\/ld -64/g' $1 > tmpfile > rm tmpfile > ## if you are sure, uncomment this > #rm $1.orig This will break if you have more than one ofthe sed commands run. It will either overwrite the previous substitutions or die if noclobber is set. put the sed commands in a file and source them with sed -f and it should work. alternatively setenv CC "cc -64"; setenv LD "/bin/ld -64"; ./configure may be easier (if configure is working properly). > > 'cd' to the emboss directory and run > > find . -type f -exec /path/to/chfile.sh {} \; -print > > and you are done with the CC changes. Libtool requires special > treatment since it uses quotes > ..d --------------------------------------------------------------------- * Dr. David Martin Biotechnology Centre of Oslo * * Node Manager Gaustadalleen 21 * * The Norwegian EMBNet Node P.O. box 1125 Blindern * * tel +47 22 84 05 35 N-0317 Oslo * * fax +47 22 84 05 01 Norway * --------------------------------------------------------------------- I will be leaving the Norwegian EMBnet node on 23rd February. All work related mail should be addressed to admin at embnet.uio.no where my successor, Rune Groven will deal with it. All personal email should be sent to dmartin at hgmp.mrc.ac.uk from whence it will be automatically forwarded to me. 
Spam should continue to be sent to /dev/null

From jrvalverde at cnb.uam.es Thu Feb 8 05:45:29 2001
From: jrvalverde at cnb.uam.es (jrvalverde at cnb.uam.es)
Date: Thu, 8 Feb 2001 11:45:29 +0100 (MET)
Subject: [EMBnet ADMIN] Re: EMBOSS - Indexing breaks on large databases
In-Reply-To:
Message-ID: <200102081045.f18AjVM3331405@embnet.cnb.uam.es>

David Martin wrote:
> On Thu, 8 Feb 2001 jrvalverde at cnb.uam.es wrote:
>
> This will break if you have more than one of the sed commands run. It will
> either overwrite the previous substitutions or die if noclobber is set.
>
> put the sed commands in a file and source them with sed -f and it should
> work.
>

Certainly! I was writing from memory and a user interrupted me with a problem meanwhile; I wrote it in a hurry and didn't realize it was mid-way (copy-pasted and half-edited lines). Should have been

> sed -e 's/CC="cc"/CC="cc -64"/g' $1 > tmpfile
> sed -e 's/CC = cc/CC = cc -64/g' tmpfile > $1
> sed -e 's/\/bin\/ld/\/bin\/ld -64/g' $1 > tmpfile
> mv tmpfile > $1

Yeek! I should learn to be more careful with the [send] button or have a less window clobbered screen.

j

From jrvalverde at cnb.uam.es Thu Feb 8 05:52:56 2001
From: jrvalverde at cnb.uam.es (jrvalverde at cnb.uam.es)
Date: Thu, 8 Feb 2001 11:52:56 +0100 (MET)
Subject: [EMBnet ADMIN] Re: EMBOSS - Indexing breaks on large databases
In-Reply-To:
Message-ID: <200102081052.f18Aqv53259410@embnet.cnb.uam.es>

David Martin wrote:
> On Thu, 8 Feb 2001 jrvalverde at cnb.uam.es wrote:
>
> may be easier (if configure is working properly).
>

Nope, for some reason it didn't work for me, I still got the '-n32' on libtool.sh ...

Which reminds me, I had better have a look at the configure and fix it before anyone tries it, I may be forgetting some details. I keep being interrupted every two words I write.

Lemme see, yes, libtool.sh contained LD="/bin/ld -n32" after ./configure, and the setenv doesn't work if you have gcc, for it finds it and uses it instead.
Damn, forget my mails, I'll write again when I don't get interrupted!

j

From jkb at mrc-lmb.cam.ac.uk Thu Feb 8 07:09:58 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Thu, 8 Feb 2001 12:09:58 +0000
Subject: required vs optional
In-Reply-To: <3A81916C.7E87ED03@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Wed, Feb 07, 2001 at 06:18:20PM +0000
References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk>
 <3A816A98.2BA3C3FF@lionbio.co.uk>
 <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk>
 <3A8180D6.F158B05C@lionbio.co.uk>
 <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk>
 <3A818648.72D3658A@lionbio.co.uk>
 <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk>
 <3A81916C.7E87ED03@lionbio.co.uk>
Message-ID: <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk>

On Wed, Feb 07, 2001 at 06:18:20PM +0000, Peter Rice wrote:
> > req   opt   greyed out
> > Y     N     N
> > Y     Y     N
> > N     N     N
> > N     Y     Y
>
> Swap those last two. opt:y means it can be useful opt:n means you can change
> it if you dare.

This still leaves me confused. Swapping the last two gives:

req   opt   greyed out
Y     N     N
Y     Y     N
N     N     Y
N     Y     N

The defaults are both N, so not specifying either req or opt indicates a greyed-out option, which cannot be true surely?

Can you please spell out for me exactly when we can be sure (during processing, not just at the time of producing help) when we know a question will be ignored (ie changing its value has no effect)?

> > Anyway, only when it's optional and not required should I grey them out.
> > However as optional defaults to N only places where opt:Y is explicitly stated
> > will I ever grey out parameters, which unfortunately includes all those
> > graphics options (eg this is syco):
>
> Some misunderstanding of 'required' perhaps?

And optional too - I considered optional:N to indicate a mandatory question, but that's the purpose of required.

> Can you just generate the tcl variable you need? If only req or opt will be
> used you can just pick the value of whichever one is being set.
Spare 'N' > settings can be safely ignored, as they are the default anyway. What if both req and opt are used (ignoring any opt:N and req:N settings)? embossdata looks to be the only one that uses both (but I didn't also check for param). Filename is defined as opt:Y req:$(fetch). I suppose as it's opt:Y then it may be useful and so should never be displayed as greyed-out. (I have been wondering about an "optional" tab so that the option values are automatically put into a separate area of the dialogue.) James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From Peter.Rice at uk.lionbioscience.com Thu Feb 8 07:19:26 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 08 Feb 2001 12:19:26 +0000 Subject: required vs optional References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A828ECE.E5E7F2BB@lionbio.co.uk> James Bonfield wrote: > The defaults are both N, so not specifying either req or opt indicates a > greyed-out option, which cannot be true surely? If both are N, EMBOSS will not prompt. The application will always receive a value, and it can always be set through the command line. > Can you please spell out for me exactly when we can be sure (during > processing, not just at the time of producing help) when we know a question > will be ignored (ie changing it's value has no effect)? You can't be sure - only the application code can tell you that. 
But you can be sure that the command line version will process everything in one pass so all the information is in the ACD file. > What if both req and opt are used (ignoring any opt:N and req:N settings)? > embossdata looks to be the only one that uses both (but I didn't also check > for param). Filename is defined as opt:Y req:$(fetch). I suppose as it's opt:Y > then it may be useful and so should never be displayed as greyed-out. Yes. If opt or req are Y you should not grey out. > (I have been wondering about an "optional" tab so that the option values are > automatically put into a separate area of the dialogue.) That is what we rather expected GUI developers to do. If you feel daring, you could add an advanced tab for the greyed out ones, but they may be ignored by the application. Our assumption was that anything that should appear in the GUI would be opt, but there are certainly some options that have no opt:Y at present. Feel free to suggest cases for promption to opt:Y. -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From jkb at mrc-lmb.cam.ac.uk Thu Feb 8 07:43:49 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Thu, 8 Feb 2001 12:43:49 +0000 Subject: required vs optional In-Reply-To: <3A828ECE.E5E7F2BB@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Thu, Feb 08, 2001 at 12:19:26PM +0000 References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> <3A828ECE.E5E7F2BB@lionbio.co.uk> Message-ID: <20010208124349.F23361@arran.mrc-lmb.cam.ac.uk> On Thu, Feb 08, 2001 at 12:19:26PM +0000, Peter Rice wrote: > If both are N, 
EMBOSS will not prompt. The application will always receive a > value, and it can always be set through the command line. I think this is where the confusion comes from. ACD is designed for command lines where each question is presented one at a time, whereas I'm dealing in GUIs where the questions need to be shown all at once. For added complexity the users may answer questions out of order. > > Can you please spell out for me exactly when we can be sure (during > > processing, not just at the time of producing help) when we know a question > > will be ignored (ie changing it's value has no effect)? > > You can't be sure - only the application code can tell you that. So this is getting back to my original proposal. I'd come to the conclusion that the required and optional attributes where not enough to indicate whether an application needs a value for an option. Hence the suggestion of an extra attribute that indicates this (via an expression). Without this I basically have to have all options permanently available to the user even though I know many are useless (eg the user has selected not to perform a plot in pepwheel, but they are still asked to provide information about how the plot should look). I know on a web-form there's nothing that can be done about such matters, but most real windows of unix applications do make use of greying out unneeded options so that the complexity to the user is less. Would you have any objections to me making use of a needed: attribute? > But you can be sure that the command line version will process everything in > one pass so all the information is in the ACD file. But above you indicated that some information is known only by the application and so it isn't in the ACD file. > > (I have been wondering about an "optional" tab so that the option values are > > automatically put into a separate area of the dialogue.) > > That is what we rather expected GUI developers to do. 
The main problem is that the natural grouping is separated. Eg pepwheel -h indicates that the options tab would contain how to plot the wheel, but whether to perform the plot is in a different tab (advanced). Also for syco -graph and -outfile are both listed as mandatory, and yet the program will only ever use one or the other and never both (depending on the advanced parameter -plot). This would be rather confusing to the user. James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From Peter.Rice at uk.lionbioscience.com Thu Feb 8 07:58:21 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 08 Feb 2001 12:58:21 +0000 Subject: required vs optional References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> <3A828ECE.E5E7F2BB@lionbio.co.uk> <20010208124349.F23361@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A8297ED.256D80E8@lionbio.co.uk> James Bonfield wrote: > I think this is where the confusion comes from. ACD is designed for command > lines where each question is presented one at a time, whereas I'm dealing in > GUIs where the questions need to be shown all at once. For added complexity > the users may answer questions out of order. ACD is intended to cope with both. It forces the application to demand all input at the beginning. All we need to do is tweak whatever is needed to make GUIs comfortable. > Would you have any objections to me making use of a needed: attribute? Can we make opt/req do what you want? 
The original intention was to use opt for this (with an expression if necessary).

> > But you can be sure that the command line version will process everything in
> > one pass so all the information is in the ACD file.
>
> But above you indicated that some information is known only by the application
> and so it isn't in the ACD file.

Only in so far as the application can get a value for any option, and do whatever it wants with it. In effect, by greying out options, you are forcing EMBOSS to use the default value (which is what I would expect).

> The main problem is that the natural grouping is separated. Eg pepwheel -h
> indicates that the options tab would contain how to plot the wheel, but
> whether to perform the plot is in a different tab (advanced).

Hmmm. Well, we could make that option an opt:Y as well but it seems your tab system is rather artificial if you only have required and optional tabs. Is this fixed more easily by the option groups we were discussing earlier?

> Also for syco -graph and -outfile are both listed as mandatory, and yet the
> program will only ever use one or the other and never both (depending on the
> advanced parameter -plot). This would be rather confusing to the user.

Help will list them as mandatory because they both could be true. When you run "syco -help" it has no way to guess what the value of "plot" will be.

Your interface can be more cunning by greying out -graph unless -plot is selected (ah, but as it is not currently optional you will not display it).

Question: can you avoid greying out all options that are used in dependencies (that appear as $(name) or $(name.something) in the ACD definitions)?
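The question above, about never greying out options that other definitions depend on, can be sketched in a few lines. This is only an illustration in Python under stated assumptions: the dictionary layout and the names `referenced_names` and `may_grey_out` are hypothetical, not part of EMBOSS or of any existing GUI, and the pattern only recognises the plain `$(name)` and `$(name.something)` forms mentioned in the message.

```python
import re

# Matches $(name) and $(name.something); only the name part is captured.
_REFERENCE = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)(?:\.[A-Za-z0-9_]+)?\)")


def referenced_names(definitions):
    """Collect every option name used in a dependency expression.

    `definitions` is a hypothetical in-memory form of an ACD file:
    {option name: {attribute name: raw attribute text}}.
    """
    names = set()
    for attributes in definitions.values():
        for raw_value in attributes.values():
            names.update(_REFERENCE.findall(raw_value))
    return names


def may_grey_out(name, definitions):
    """An option that other definitions depend on should stay editable."""
    return name not in referenced_names(definitions)


# The syco definitions quoted at the start of the thread.
SYCO = {
    "plot": {"opt": "N", "info": "Produce plot"},
    "graph": {"req": "$(plot)", "multi": "3"},
    "outfile": {"req": "@(!$(plot))"},
}
```

With these definitions only `plot` is referenced by another option, so it is the only one that would be protected from greying out.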
-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From Peter.Rice at uk.lionbioscience.com Thu Feb 8 08:13:45 2001
From: Peter.Rice at uk.lionbioscience.com (Peter Rice)
Date: Thu, 08 Feb 2001 13:13:45 +0000
Subject: required vs optional
References: <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk>
 <3A8180D6.F158B05C@lionbio.co.uk>
 <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk>
 <3A818648.72D3658A@lionbio.co.uk>
 <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk>
 <3A81916C.7E87ED03@lionbio.co.uk>
 <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk>
 <3A828ECE.E5E7F2BB@lionbio.co.uk>
 <20010208124349.F23361@arran.mrc-lmb.cam.ac.uk>
 <3A8297ED.256D80E8@lionbio.co.uk>
 <20010208130349.G23361@arran.mrc-lmb.cam.ac.uk>
Message-ID: <3A829B89.EC8C7891@lionbio.co.uk>

James Bonfield wrote:
> I think you understand why I want to grey-out options. I do not want to
> grey-out things which are merely optional, but things which really have no
> effect whatsoever. For example -graph when -plot is 0. In this case EMBOSS
> will not be using a default value, rather it won't be using it at all (and
> specifying it on the command line will have no effect). This sort of thing
> happens more often than you'd expect.
>
> Hence I do not think we can use opt/req for this as they already have other
> purposes.

OK. Since you are a good boy (i.e. you can cope with ACD expressions and stuff) let us experiment with a new option for GUIs only.

How about:

needed:y or needed:n

default value would be assumed to be (parameter or required or optional) unless you think a default of Y is better.

For cases where this is not enough, we expect to use an expression.

In ajax/ajacd.c:

static int nDefAttr = 11;
... change to 12

enum AcdEDef {
... add at the end: DEF_NEEDED (with a comma for DEF_EXPECTED on the line above)

AcdOAttr acdAttrDef[] = {
... add before the NULL, VT_NULL line:
{"needed", VT_BOOL}, /* value is needed, i.e. useful */

EMBOSS should now accept needed: in ACD files for all definitions

(note: this is how comment, corba and style should have been added for cleanness)

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From jrvalverde at cnb.uam.es Thu Feb 8 08:27:48 2001
From: jrvalverde at cnb.uam.es (jrvalverde at cnb.uam.es)
Date: Thu, 8 Feb 2001 14:27:48 +0100 (MET)
Subject: EMBOSS - Indexing breaks on large databases
In-Reply-To: <200102080900.f1890pU3292249@embnet.cnb.uam.es>
Message-ID: <200102081327.f18DRnN3428709@embnet.cnb.uam.es>

wrote:
>
> #/bin/sh
> cp $1 $1.orig
> mv $1 tmpfile
> sed -e 's/CC="cc"/CC="cc -64"/g' $1 > tmpfile
> sed -e 's/CC = cc/CC = cc -64/g' $1 > tmpfile
> sed -e 's/\/bin\/ld/\/bin\/ld -64/g' $1 > tmpfile
> rm tmpfile
> ## if you are sure, uncomment this
> #rm $1.orig
>

Sorry, that won't work, I was (and am) being interrupted and writing from memory, and slipped on this. The correct recipe should obviously not rewrite temporary files, but rather be

#!/bin/sh
cp $1 $1.orig
mv $1 tmpfile
sed -e 's/CC="cc"/CC="cc -64"/g' tmpfile > $1
sed -e 's/CC = cc/CC = cc -64/g' $1 > tmpfile
sed -e 's/\/bin\/ld -n32/\/bin\/ld -64/g' tmpfile > $1
rm tmpfile
## if you are sure, uncomment this
#rm $1.orig

The problem seems to be that some parts of the compilation are handled by Makefile and others by libtool.sh, which makes overriding more difficult. Both are generated by configure, hence the right thing to do (if I find enough uninterrupted time) would be to change it instead.
j From ableasby at hgmp.mrc.ac.uk Thu Feb 8 09:04:04 2001 From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk) Date: Thu, 8 Feb 2001 14:04:04 GMT Subject: EMBOSS - Indexing breaks on large databases Message-ID: <200102081404.OAA25562@bromine.hgmp.mrc.ac.uk> You should indeed be worried about the duplicate entries (normally caused by indexing the database and updates at the same time) as you'll never be certain which one is retrieved. It is best to index them separately. wrt the 64-bit thing on this same thread. Coincidentally I'm working on this at the moment. The compilation flags are not the only concern. There's ftell64's, ftello's [and fseeks] and, of course, the indexing. That's not even mentioning ajints and ajlongs :-) Alan From jkb at mrc-lmb.cam.ac.uk Thu Feb 8 09:46:27 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Thu, 8 Feb 2001 14:46:27 +0000 Subject: required vs optional In-Reply-To: <3A829B89.EC8C7891@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Thu, Feb 08, 2001 at 01:13:45PM +0000 References: <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> <3A828ECE.E5E7F2BB@lionbio.co.uk> <20010208124349.F23361@arran.mrc-lmb.cam.ac.uk> <3A8297ED.256D80E8@lionbio.co.uk> <20010208130349.G23361@arran.mrc-lmb.cam.ac.uk> <3A829B89.EC8C7891@lionbio.co.uk> Message-ID: <20010208144627.I23361@arran.mrc-lmb.cam.ac.uk> On Thu, Feb 08, 2001 at 01:13:45PM +0000, Peter Rice wrote: > OK. Since you are a good boy (i.e. you can cope with ACD expressions and > stuff) let us experiment with a new option for GUIs only. > > How about: > > needed:y or needed:n > > default value would be assumed to be (parameter or required or optional) > unless you think a default of Y is better. I think that's enough. The default will indeed cover most cases. 
I'll need to ponder about the best way to code this default, but I guess that's my problem :-}

On a related note, I've just had a hunt for cases where param and req are both specified differently. I found one case in vectorstrip:

...
bool: vectorfile [
  param: Y
  def: Y
  prompt: "Are your vector sequences in a file?"
]

infile: vectors [
  param: Y
  req: @($(vectorfile)?Y:N)
  nullok: Y
  def: ""
  prompt: "Name of vectorfile"
]
...

The help lists this as:

   Mandatory qualifiers (* if not always prompted):
  [-sequence]          seqall     (no help text) seqall value
  [-[no]vectorfile]    bool       Are your vector sequences in a file?
* [-vectors]           infile     Name of vectorfile
*  -linkera            string     5' sequence
*  -linkerb            string     3' sequence
   -mismatch           integer    Max allowed % mismatch
   -[no]besthits       bool       Show only the best hits (minimise mismatches)?
  [-outf]              outfile    (no help text) outfile value
  [-outseq]            seqoutall  (no help text) seqoutall value

   Optional qualifiers: (none)
   Advanced qualifiers: (none)

So on the command line we can do:

vectorstrip dna.embl 1 vector_file outfile

This asks a few remaining questions, then uses a vector file named vector_file, and finally saves the results to outfile.

However if I try:

vectorstrip dna.embl 0 outfile

then I have problems because outfile isn't a vector file. To specify this correctly I have to use:

vectorstrip dna.embl 0 vector_file outfile

vector_file must exist, even though it's not used.

This is because of specifying both req and param. Param is always Y regardless of the previous question, but it's not always required. This makes sense when specifying "-qualifier value" syntax, but not with the param syntax.
I've checked and the following works as I originally expected: infile: vectors [ param: @($(vectorfile)?Y:N) nullok: Y def: "" prompt: "Name of vectorfile" ] However this does mean there's a different number of items on the command line depending on whether vectorfile was set or not, however such things are presumably only an issue for programs invoking vectorstrip, in which case they'd be more self-documenting if they use -qualifier anyway. James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From leonardz at bioinfo.sickkids.on.ca Thu Feb 8 09:51:44 2001 From: leonardz at bioinfo.sickkids.on.ca (Len F. Zaifman) Date: Thu, 08 Feb 2001 09:51:44 -0500 Subject: EMBOSS - Indexing breaks on large databases References: <200102081404.OAA25562@bromine.hgmp.mrc.ac.uk> Message-ID: <3A82B280.5F5878D@bioinfo.sickkids.on.ca> ableasby at hgmp.mrc.ac.uk wrote: > > You should indeed be worried about the duplicate entries > (normally caused by indexing the database and updates > at the same time) as you'll never be certain which one is > retrieved. It is best to index them separately. > > wrt the 64-bit thing on this same thread. Coincidentally > I'm working on this at the moment. The compilation flags > are not the only concern. There's ftell64's, ftello's > [and fseeks] and, of course, the indexing. That's not > even mentioning ajints and ajlongs :-) > > Alan Thanks Alan but I think I may not have been clear enough: I don't think that dbiflat it self has the problem (although I could be wrong). I think the problem comes from: > the indexing breaks due to sort running out of memory: << my comment > > UX:sort: ERROR: Out of memory before merge: Not enough space << the error reported So I believe it is a system issue where my resident memory set size needs to be increased. 
I was hoping someone had a workaround to get sort to work within available memory, and not request an amount beyond the limit.

Having said that, getting a 64 bit clean emboss would be great.

Thanks to the others who responded as well.

From ableasby at hgmp.mrc.ac.uk Thu Feb 8 10:01:30 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Thu, 8 Feb 2001 15:01:30 GMT
Subject: EMBOSS - Indexing breaks on large databases
Message-ID: <200102081501.PAA06513@bromine.hgmp.mrc.ac.uk>

Hi Leonard,

No, you were indeed perfectly clear. I was replying on the second matter since the sort error message is a system one. The emboss indexing programs just use a C "system()" call for performing the sorts so if anything goes wrong after that point it's an SEP (Somebody else's problem) as far as EMBOSS is concerned.

There is a sort option for those programs but I would NOT recommend using it. I don't think the person who wrote that bit of code/modified it was ever entirely happy with it and I am going to clean it up as part of the 64 bit process.

Rgds
Alan

From Peter.Rice at uk.lionbioscience.com Thu Feb 8 10:07:27 2001
From: Peter.Rice at uk.lionbioscience.com (Peter Rice)
Date: Thu, 08 Feb 2001 15:07:27 +0000
Subject: EMBOSS - Indexing breaks on large databases
References: <200102081501.PAA06513@bromine.hgmp.mrc.ac.uk>
Message-ID: <3A82B62F.D1FE8406@lionbio.co.uk>

ableasby at hgmp.mrc.ac.uk wrote:
> There is a sort option for those programs but I would NOT
> recommend using it. I don't think the person who wrote that bit
> of code/modified it was ever entirely happy with it and I am
> going to clean it up as part of the 64 bit process.
That was originally added because of very strange problems with GNU sort in Norway, which appeared to have a mind of its own (GNU sort, not Norway) when deciding how to sort entry names.

"dbiflat -nocleanup" will leave the temporary files around, so you can try sorting them by hand to see what resources it needs.

The "-debug" command line option will write a dbiflat.dbg file that includes the sort commands used. The last one will be the one that fell over.

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From jkb at mrc-lmb.cam.ac.uk Thu Feb 8 10:36:19 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Thu, 8 Feb 2001 15:36:19 +0000
Subject: Corrections to ACD Syntax manual
Message-ID: <20010208153619.A26987@arran.mrc-lmb.cam.ac.uk>

Just a couple of small problems I encountered near the start of this manual.

1. "Parameters and qualifiers are defined by a single token followed by either a colon ':'" and "The first token in the file must be "application" directly followed by a colon ':' or an equal sign '='."

I think the phrase "directly followed" is misleading, as "token : value" is just as valid as "token: value". Whether this is deliberate or not I do not know, but I see both examples liberally used.

2. "Values can be delimited (i.e. treated as one token) by any of the following pairs, which are stripped as the value is parsed :

'' {} () [] <> "

The only quoting I find used is double quotes, which isn't listed above.

I also wonder whether this many quoting styles is just a symptom of the lack of escaping mechanism. Adding backslash support would probably allow great simplification of this.

James

-- 
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From Peter.Rice at uk.lionbioscience.com Thu Feb 8 10:48:03 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 08 Feb 2001 15:48:03 +0000 Subject: Corrections to ACD Syntax manual References: <20010208153619.A26987@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A82BFB3.339115D7@lionbio.co.uk> James Bonfield wrote: > "The first token in the file must be "application" directly followed by a > colon ':' or an equal sign '='." > > I think the phrase "directly followed" is misleading, as "token : value" is > just as valid as "token: value". Whether this is deliberate or not I do not > know, but I see both examples liberally used. The original parser was very forgiving. It allows a few other formats too. It should be fixed. Meanwhile, the documentation can be economical with the truth by only giving the officially approved style. > 2. "Values can be delimited (i.e. treated as one token) by any of the > following pairs, which are stripped as the value is parsed : > > '' {} () [] <> > > The only quoting I find used is double quotes, which isn't listed above. Oops. Never noticed that one. Thanks. > I also wonder whether this many quoting styles is just a symptom of the lack > of escaping mechanism. Adding backslash support would probably allow great > simplification of this. Spot on there. Plus it was great fun to write :-) The alternatives never caught on, and others have pointed out that the brackets could be useful for alternative forms of syntax (lists of values, for example) in some future release. Should be fixed - at least backslash support should be added. Probably not the highest priority, something for the weekend perhaps. 
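For anyone parsing ACD values in the meantime, the delimiter stripping described in the manual excerpt can be sketched as follows. This is a rough Python illustration, not the EMBOSS parser: the name `strip_delimiters` is hypothetical, double quotes are added to the manual's list because the thread notes they are the style actually used, and no backslash handling is attempted since that support is only being proposed here.

```python
# Opening delimiter -> matching closing delimiter. The first entry (double
# quotes) is not in the manual's list but is the style found in ACD files.
DELIMITER_PAIRS = {
    '"': '"',
    "'": "'",
    "{": "}",
    "(": ")",
    "[": "]",
    "<": ">",
}


def strip_delimiters(token):
    """Remove one surrounding delimiter pair from a value token, if present.

    Tokens that are not wrapped in a matching pair are returned unchanged.
    """
    if len(token) >= 2 and DELIMITER_PAIRS.get(token[0]) == token[-1]:
        return token[1:-1]
    return token
```

For example, `strip_delimiters('"Produce plot"')` gives `Produce plot`, while a bare `N` is left alone.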
-- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From jkb at mrc-lmb.cam.ac.uk Thu Feb 8 10:57:18 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Thu, 8 Feb 2001 15:57:18 +0000 Subject: required vs optional In-Reply-To: <3A829B89.EC8C7891@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Thu, Feb 08, 2001 at 01:13:45PM +0000 References: <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> <3A828ECE.E5E7F2BB@lionbio.co.uk> <20010208124349.F23361@arran.mrc-lmb.cam.ac.uk> <3A8297ED.256D80E8@lionbio.co.uk> <20010208130349.G23361@arran.mrc-lmb.cam.ac.uk> <3A829B89.EC8C7891@lionbio.co.uk> Message-ID: <20010208155718.J23361@arran.mrc-lmb.cam.ac.uk> On Thu, Feb 08, 2001 at 01:13:45PM +0000, Peter Rice wrote: > How about: > > needed:y or needed:n > > default value would be assumed to be (parameter or required or optional) > unless you think a default of Y is better. Looking at this in more detail, I'm unsure of the default value. Specifically "parameter or required or optional" implies that the default is no, as the default for all three dependent variables is also no. A rough check implies there are some 400ish items where none of parameter, require or optional are defined (to be anything other than N or just the default). Indeed it's only likely to be the case that we wish to set needed:n for an option that has required or optional as an expression, which is probably only a small percentage. I found roughly 50 expressions used in opt and req. If we set the default of needed to be Y then most (but perhaps not all) of these will need the req: expression duplicating in the needed: value. Given that 50 is much less than 400 may I ask for the default to be needed:y? 
I'm only thinking about producing less emboss changes and it has nothing to do with the fact that dealing with a non-constant default value is a bit tricky for me. :-)

James

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

From jkb at mrc-lmb.cam.ac.uk Thu Feb 8 11:05:06 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Thu, 8 Feb 2001 16:05:06 +0000
Subject: Corrections to ACD Syntax manual
In-Reply-To: <3A82BFB3.339115D7@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Thu, Feb 08, 2001 at 03:48:03PM +0000
References: <20010208153619.A26987@arran.mrc-lmb.cam.ac.uk> <3A82BFB3.339115D7@lionbio.co.uk>
Message-ID: <20010208160506.K23361@arran.mrc-lmb.cam.ac.uk>

On Thu, Feb 08, 2001 at 03:48:03PM +0000, Peter Rice wrote:
> Should be fixed - at least backslash support should be added.

The only tricky bit is dealing with the existing backslash mechanism which is used for adding newline characters (eg see transeq.acd). (Which incidentally I couldn't find documented either...)

For what it's worth, my own parser (written in vanilla tcl) uses the following regular expressions for strings:

    set tlist {
        {^.(.*).$}      {\1}
        {\\[ \n\r]+}    {\\n}
        {[ \n\r]+}      { }
        {\\n}           "\n"
        {\\(.)}         {\1}
    }

    set rules [format {
        # ...
        {"(\\.|[^"\\])*"}       STRING {%s}
        {'(\\.|[^'\\])*'}       STRING {%s}
        {<(\\.|[^>\\])*>}       STRING {%s}
        {\{(\\.|[^\}\\])*\}}    STRING {%s}
        # ...
    } $tlist $tlist $tlist $tlist]

This is my own lex-style hack. The rules are matched one at a time in the order they are listed. If a rule matches then the token type (STRING) is added to the token list and the token value is edited, if appropriate, based on a series of 'regsub' calls (listed in the $tlist variable here).
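For readers who do not use Tcl, the substitution list described above can be sketched in Python. This is only an illustration of the five regsub steps as they are described in the message, not the EMBOSS parser and not James's actual code; the names TLIST and clean_string_token are hypothetical, and it assumes every substitution is applied globally, in order, to an already matched STRING token.

```python
import re

# Each pair mirrors one entry of the Tcl tlist: (pattern, replacement).
# re.DOTALL lets '.' match a newline, which Tcl regular expressions do by default.
TLIST = [
    (re.compile(r'^.(.*).$', re.DOTALL), r'\1'),  # strip the opening and closing delimiter
    (re.compile(r'\\[ \n\r]+'), r'\\n'),          # backslash + whitespace -> the two characters \n
    (re.compile(r'[ \n\r]+'), ' '),               # squash whitespace runs into one space
    (re.compile(r'\\n'), '\n'),                   # the two characters \n -> a real newline
    (re.compile(r'\\(.)', re.DOTALL), r'\1'),     # any other backslash escape -> the bare character
]


def clean_string_token(token):
    """Apply the substitutions in order (hypothetical helper name)."""
    for pattern, replacement in TLIST:
        token = pattern.sub(replacement, token)
    return token


if __name__ == "__main__":
    # A backslash at the end of a line becomes a newline in the value;
    # other whitespace runs collapse to a single space.
    print(repr(clean_string_token('"first line\\\n    second   line"')))
```

The order matters: the continuation backslash is rewritten before whitespace is squashed, otherwise the line break it marks would already have been turned into a space.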
My STRING definition does include backslashing already, which is of course not strictly correct at the moment (but I don't know how often it really matters). The substitutions (tlist) are a way to simulate the emboss acd parser mechanism of squashing multiple white-space into a single space character, and adding newlines with a single backslash. I'm not suggesting for a minute anyone copies my code, but the regular expressions may be handy for other people trying to parse ACD. James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From Peter.Rice at uk.lionbioscience.com Thu Feb 8 11:07:13 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 08 Feb 2001 16:07:13 +0000 Subject: required vs optional References: <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> <3A828ECE.E5E7F2BB@lionbio.co.uk> <20010208124349.F23361@arran.mrc-lmb.cam.ac.uk> <3A8297ED.256D80E8@lionbio.co.uk> <20010208130349.G23361@arran.mrc-lmb.cam.ac.uk> <3A829B89.EC8C7891@lionbio.co.uk> <20010208155718.J23361@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A82C431.FBC08E98@lionbio.co.uk> James Bonfield wrote: > Given that 50 is much less than 400 may I ask for the default to be needed:y? > > I'm only thinking about producing less emboss changes and it has nothing to do > with the fact that dealing with a non-constant default value is a bit tricky > for me. :-) No problem at all - we plan to ignore it, unless we can think of a use for it (like a -needed command line qualifier that only prompts for the needed values. You are completely free to choose your own default. 
-- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From Peter.Rice at uk.lionbioscience.com Thu Feb 8 11:30:58 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 08 Feb 2001 16:30:58 +0000 Subject: required vs optional References: Message-ID: <3A82C9C2.D99E240E@lionbio.co.uk> David Martin wrote (but not to the list): > > On Thu, 8 Feb 2001, Peter Rice wrote: > > > No problem at all - we plan to ignore it, unless we can think of a use for it > > (like a -needed command line qualifier that only prompts for the needed > > values. You are completely free to choose your own default. > > Umm.. surely there should be a defined default at the ACD level, not at > the acd processeing application level, and James is right with needed: y > because it is somewhat stupid to have a default for options that says they > are not needed (in which case why put them in?) EMBOSS will have a default (N) but will not be using it. We can define a default behaviour later if we want to make use of it. A -needed qualifier would be fun, but probably not of general interest. > Is there a copy of the ACD guide in anything other than HTML or PS, and if > so, can I have it to add to the collection. I am tempted to go mad and > rewrite everything into DocBook.. There was an MS-Word original, but I believe it is now maintained in HTML. 
-- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From gwilliam at hgmp.mrc.ac.uk Thu Feb 8 11:36:04 2001 From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522) Date: Thu, 08 Feb 2001 16:36:04 +0000 Subject: required vs optional References: <3A82C9C2.D99E240E@lionbio.co.uk> Message-ID: <3A82CAF4.D0F38DE5@hgmp.mrc.ac.uk> Peter Rice wrote: > > Is there a copy of the ACD guide in anything other than HTML or PS, and if > > so, can I have it to add to the collection. I am tempted to go mad and > > rewrite everything into DocBook.. > > There was an MS-Word original, but I believe it is now maintained in HTML. This is the case. -- Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512 mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/ Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK From jrvalverde at cnb.uam.es Fri Feb 9 07:41:37 2001 From: jrvalverde at cnb.uam.es (jrvalverde at cnb.uam.es) Date: Fri, 9 Feb 2001 13:41:37 +0100 (MET) Subject: 64 bit & irix Message-ID: <200102091241.f19Cfcj3491505@embnet.cnb.uam.es> So, I finally had time to take a look at the configure script. It already contains a check for 64 bit irix compilation, and, when correctly used, generates the correct compilation scripts and makefiles. The problem is that it checks whether 64-bit is enabled _by default_, not whether it is available or has been user-selected. What ./configure does is create a trivial C file, compile it and check whether it is -o32, -n32 or -64, and then decide which compiler to use. Now, there is a simpler trick than the one I proposed: it all reduces to making the compiler/linker use the appropriate defaults _before_ running configure. 
For this:

    - remove gcc from your path

    - create a compiler.defaults file containing

        -DEFAULT:abi=64:isa=mips4:proc=r10k

    - define COMPILER_DEFAULTS_PATH

        setenv COMPILER_DEFAULTS_PATH `pwd`

    - run configure

And that's it.

Defining cc to be 'cc -64' and ld to be 'ld -64' and running configure doesn't work. In that case, the linker used is still the -n32 linker, and build fails with linking errors. You need to go the obscure trick way.

One reason is configure hard codes the 'default' compilation method (which being default, it shouldn't need to) for the linker. Hence even if one aliases 'cc/gcc' to 64bit, it will still hardcode ld to be the default 'ld -32'. Should it just use "cc/gcc" and "ld" only, this wouldn't happen as defaults would be carried all along; as it is, the explicit flag overrides the aliases. So unless you get the explicit flags right by using the above trick, it won't work.

BTW, this mess seems to happen (from what I gather looking at ./configure) only on IRIX. Makes sense, since that's probably the only system too that tries to maintain three incompatible binary systems defaulting compilation to 32 bits in 64 bit machines (sigh).

On a side note: it won't work for gcc: the -64 test in configure is only used if the compiler is 'cc'. No check for -mabi=64 seems to be present. For gcc I suspect one will need to run ./configure and then substitute all instances of 'gcc' by 'gcc -mabi=64' and of 'ld -n32' by 'ld -64'. Haven't tested though. Or configure as above with system cc and later alias cc to gcc -mabi=64.

BTW, David, might you update the trick you added to the admin guide? Thanks.

j

From david.martin at biotek.uio.no Fri Feb 9 08:53:46 2001
From: david.martin at biotek.uio.no (David Martin)
Date: Fri, 9 Feb 2001 14:53:46 +0100
Subject: [EMBnet ADMIN] 64 bit & irix
In-Reply-To: <200102091241.f19Cfcj3491505@embnet.cnb.uam.es>
Message-ID:

OK, I'll try this. I'll have to try to get 64 bit versions of the PNG libraries first.
..d The admin guide will get updated in due course. On Fri, 9 Feb 2001 jrvalverde at cnb.uam.es wrote: > So, I finally had time to take a look at the configure script. > > It already contains a check for 64 bit irix compilation, and, > when correctly used, generates the correct compilation scripts > and makefiles. The problem is that it checks whether 64-bit is > enabled _by default_, not whether it is available or has been > user-selected. > > What ./configure does is create a trivial C file, compile it > and check whether it is -o32, -n32 or -64, and then decide > which compiler to use. > > Now, there is a simpler trick than the one I proposed: it all > reduces to making the compiler/linker use the appropriate > defaults _before_ running configure. For this: > > - remove gcc from your path > > - create a compiler.defaults file containing > > -DEFAULT:abi=64:isa=mips4:proc=r10k > > - define COMPILER_DEFAULTS_PATH > > setenv COMPILER_DEFAULTS_PATH `pwd` > > - run configure > > And that's it. > > Defining cc to be 'cc -64' and ld to be 'ld -64' and running > configure dosn't work. In that case, the linker used is still > the -n32 linker, and build fails with linking errors. You > need to go the obscure trick way. > > One reason is configure hard codes the 'default' compilation > method (which being default, it shouldn't need to) for the > linker. Hence although if one aliases 'cc/gcc' to 64bit, > it will still hardcode ld to be the default 'ld -32'. Should > it just use "cc/gcc" and "ld" only, this wouldn't happen as > defaults would be carried all along, this way the explicit > flag overrides the aliases. So unless you get the explicit > flags right by using the above trick, it won't work. > > BTW, this mess seems to happen (from what I gather looking at > ./configure) only on IRIX. Makes sense, since that's probably > the only system too that tries to maintain three incompatible > binary systems defaulting compilation to 32 bits in 64 bit > machines (sigh). 
> > On a side note: it won't work for gcc: the -64 test in configure > is only used if the compiler is 'cc'. No check for --mabi=64 > seems to be present. For gcc I suspect one will need to run > ./configure and then substitute all instances of 'gcc' by > 'gcc --mabi=64' and of 'ld -n32' by 'ld -64'. Haven't tested > though. Or configure as above with system cc and later alias > cc to gcc --mabi=64. > > BTW, David, might you update the trick you added to the admin > guide? Thanks. > > j > > > > --------------------------------------------------------------------- * Dr. David Martin Biotechnology Centre of Oslo * * Node Manager Gaustadalleen 21 * * The Norwegian EMBNet Node P.O. box 1125 Blindern * * tel +47 22 84 05 35 N-0317 Oslo * * fax +47 22 84 05 01 Norway * --------------------------------------------------------------------- I will be leaving the Norwegian EMBnet node on 23rd February. All work related mail should be addressed to admin at embnet.uio.no where my successor, Rune Groven will deal with it. All personal email should be sent to dmartin at hgmp.mrc.ac.uk from whence it will be automatically forwarded to me. Spam should continue to be sent to /dev/null From jkb at mrc-lmb.cam.ac.uk Tue Feb 13 08:03:06 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Tue, 13 Feb 2001 13:03:06 +0000 Subject: More ACD Syntax manual corrections Message-ID: <20010213130306.A25301@arran.mrc-lmb.cam.ac.uk> The calculated attributes are poorly defined between seqall and seqset. The main table under the heading of "Ajax Data Types" lists totweight as calculated in seqset, but not seqall. Also all of the sequence types have nucleic as a calculated attribute. Later on in the "Calculated Attributes" section another table is presented where seqset is mislabeled as seqall. Nucleic is also missing from all of the types shown. Finally, dbifasta, dbiflat and dbigcg all use the keyword "is:" in ACD expressions. 
For now I'm assuming that "is: $(dbname)" means $(dbname) != "". Could is: be added to the documentation? (Perhaps it is, but searching for "is" is lunacy, and is: finds nothing relevant.) James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From jkb at mrc-lmb.cam.ac.uk Tue Feb 13 14:11:53 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Tue, 13 Feb 2001 19:11:53 +0000 Subject: pepwindow bugs? Message-ID: <20010213191153.A1360@arran.mrc-lmb.cam.ac.uk> I'm having trouble getting pepwindow to look anything like the results produce using xpip's (OLD code!) Kyte & Doolittle implementation. For an easier example than xpip (which is available from our ftp site, but I don't expect many to have it) also try looking at the java program at: http://arbl.cvmbs.colostate.edu/molkit/hydropathy/ On the demo sequence there xpip and this java app give essentially identical results, while pepwindow gives something looking VERY different. Any explanations? James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From jkb at mrc-lmb.cam.ac.uk Tue Feb 13 14:16:14 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Tue, 13 Feb 2001 19:16:14 +0000 Subject: abiview bug Message-ID: <20010213191614.A9162@arran.mrc-lmb.cam.ac.uk> I'm getting assertion failed in quite a few ABI3100 files with abiview. Unfortunately these files are not public so I cannot make them available to the emboss-dev team. So are there any obvious ways I can provide more debugging info? 
(Except for this:)

    Uncaught exception
     Assertion failed
     raised at ../../ajax/ajmem.c:78

     EMBOSS An error in ../../ajax/ajexcept.c at line 56: aborting...

By breaking at _exit I get:

    #0 0x40486ca0 in _exit () from /lib/libc.so.6
    #1 0x40405085 in exit () at exit.c:82
    #2 0x40171846 in ajMessCrashFL () from /nfs/arran/home5/pubseq/share/private/emboss/EMBOSS-1.9.1//linux-binaries/lib/libajax.so.0
    #3 0x4015e1ee in ajExceptRaise () from /nfs/arran/home5/pubseq/share/private/emboss/EMBOSS-1.9.1//linux-binaries/lib/libajax.so.0
    #4 0x40170c4d in ajMemCalloc () from /nfs/arran/home5/pubseq/share/private/emboss/EMBOSS-1.9.1//linux-binaries/lib/libajax.so.0
    #5 0x40170cfb in ajMemCalloc0 () from /nfs/arran/home5/pubseq/share/private/emboss/EMBOSS-1.9.1//linux-binaries/lib/libajax.so.0
    #6 0x4012ab2c in ajGraphxyDataNewI () from /nfs/arran/home5/pubseq/share/private/emboss/EMBOSS-1.9.1//linux-binaries/lib/libajaxg.so.0
    #7 0x8049859 in graphDisplay ()
    #8 0x804962f in main ()
    #9 0x403f1b5c in __libc_start_main (main=0x804932c <main>, argc=2, ubp_av=0xbffff0e4, init=0x8048e70 <_init>, fini=0x8049b7c <_fini>, rtld_fini=0x4000d634 <_dl_fini>, stack_end=0xbffff0dc) at ../sysdeps/generic/libc-start.c:129

Alas this is compiled with the standard cc -O options, but if this isn't a known issue then I can try producing a debug version.

James

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

From ableasby at hgmp.mrc.ac.uk Tue Feb 13 19:48:43 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Wed, 14 Feb 2001 00:48:43 GMT
Subject: Pepwindow bug?
Message-ID: <200102140048.AAA06352@tin.hgmp.mrc.ac.uk>

Hi Jack (et al),

I can comment on a function but not the program :-) From what you say an ajStrCleanWhite which removes leading, trailing and excess whitespace from a string might do the trick. Author? :-)

Alan

From ableasby at hgmp.mrc.ac.uk Tue Feb 13 20:27:31 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Wed, 14 Feb 2001 01:27:31 GMT
Subject: Pepwindow bug?
Message-ID: <200102140127.BAA04190@bromine.hgmp.mrc.ac.uk>

Yes, (if I'd got the function name right), just add an

    ajStrClean(&buffer);

after each of the "line++" lines. You can tell it's not one of mine..... I'd have used "++line" .... not that it matters.

Alan

From gwilliam at hgmp.mrc.ac.uk Wed Feb 14 04:17:33 2001
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Wed, 14 Feb 2001 09:17:33 +0000
Subject: [Fwd: GP and Arka -- software for molecular biology]
Message-ID: <3A8A4D2D.CF873866@hgmp.mrc.ac.uk>

Don Gilbert wrote:
>
> Gary,
>
> Re: wrappers (guis) for emboss -- I have done enough tests
> to know that emboss components can be run from SeqPup's
> sequence analysis program (Java based).
It may be a while > till I can find time to do enough work though to make > emboss usable from seqpup, but I'm hopeful maybe by this > summer to have something. I'll pass on to emboss > mail lists any progress in using w/ SeqPup. > > -- Don > > -- > -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 > -- gilbertd at bio.indiana.edu -- Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512 mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/ Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK From jkb at mrc-lmb.cam.ac.uk Wed Feb 14 04:37:42 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Wed, 14 Feb 2001 09:37:42 +0000 Subject: pepwindow bugs? In-Reply-To: ; from jackl@cmbi.kun.nl on Wed, Feb 14, 2001 at 01:41:19AM +0100 References: <20010213191153.A1360@arran.mrc-lmb.cam.ac.uk> Message-ID: <20010214093742.B9543@arran.mrc-lmb.cam.ac.uk> Hi Jack, > > results, while pepwindow gives something looking VERY different. Any > > explanations? > > You're absolutely right, the output is absolute rubbish! Very good > for a non-biologist! :-) Credit has to be shared with David Judge as it was a joint discovery. When a true fix is found (I haven't got time to follow the hints at present) could someone please email me the new C file? James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. 
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

From ableasby at hgmp.mrc.ac.uk Wed Feb 14 04:52:12 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Wed, 14 Feb 2001 09:52:12 GMT
Subject: patched pepwindow(all).c
Message-ID: <200102140952.JAA01756@bromine.hgmp.mrc.ac.uk>

Hi James,

ftp://ftp.uk.embnet.org/pub/EMBOSS/patchfiles/

From jkb at mrc-lmb.cam.ac.uk Wed Feb 14 05:22:51 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Wed, 14 Feb 2001 10:22:51 +0000
Subject: abiview bug
In-Reply-To: <3A8A5660.F4D90E58@hgmp.mrc.ac.uk>; from tcarver@hgmp.mrc.ac.uk on Wed, Feb 14, 2001 at 09:56:48AM +0000
References: <20010213191614.A9162@arran.mrc-lmb.cam.ac.uk> <20010214095307.C9543@arran.mrc-lmb.cam.ac.uk> <3A8A5660.F4D90E58@hgmp.mrc.ac.uk>
Message-ID: <20010214102251.D9543@arran.mrc-lmb.cam.ac.uk>

Hello all,

I've now identified the problem in the abiview program. The program was assuming that the position of the PLOC block immediately follows the PBAS block, which is not always true. All blocks should be considered as files in a directory - they can physically be stored in any order. Hence you have to query the directory in order to find the file location. Fortunately the patch is simple:

*** abiview.c~ Wed Feb 14 09:48:07 2001
--- abiview.c Wed Feb 14 10:05:59 2001
***************
*** 71,76 ****
--- 71,77 ----
      int i;
      int base;
      long int baseO;
+     long int basePosO;
      long int numBases;
      long int numPoints;
      long int dataOffset[4];
***************
*** 121,126 ****
--- 122,129 ----
      res4 = (char)(fwo_&BYTE[0]);

      ajSeqABIReadSeq(fp,baseO,numBases,&nseq);
+     basePosO = ajSeqABIGetBasePosOffset(fp); /* find PLOC tag & get offset */
+     ajFileSeek(fp, basePosO, SEEK_SET);
      ajSeqABIGetBasePosition(fp,numBases,&basePositions);

On a more general note, most people do not keep ABI files on disk as they are simply too large. They typically convert them to SCF instead. (Indeed some machines, eg Licor, write SCF as their native format.)
We maintain a freely available library (io_lib) of routines for reading and writing ABI, ALF, SCF, CTF (Jean Thierry-mieg's compressed format) and ZTR (my own compressed format). All of ABI->* filters are lossy as there's lots of other bits in the ABI files which no one quite knows what to do with, however the SCF->CTF and SCF->ZTR are lossless (and SCF->ZTR is typically slightly smaller than bzipped SCF). Maybe it makes sense not to duplicate work. io_lib is free although it isn't yet GPLed. That shouldn't be a problem, but if it is I cannot see an issue with GPLing io_lib. Indeed it already looks like parts of io_lib (or at least "ted" which much of it came from) are in emboss; there's a striking similarities in the seqABIGetFlag and getABIIndexEntryLW functions (eg the same bizarre flow controls, identical code layout, and some identical variable names). James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From gwilliam at hgmp.mrc.ac.uk Wed Feb 14 05:37:14 2001 From: gwilliam at hgmp.mrc.ac.uk (gwilliam at hgmp.mrc.ac.uk) Date: Wed, 14 Feb 2001 10:37:14 GMT Subject: -sreverse and feature tables Message-ID: <200102141037.KAA05396@californium.hgmp.mrc.ac.uk> Peter, Currently when you do a -sreverse, the feature table(s) of the reversed sequence(s) are not also reversed (start and end positions changed to be length-start+1, length-end+1 and sense negated). 
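The coordinate change described in the parenthesis can be worked through in a few lines of Python. This is a sketch of the arithmetic only, not EMBOSS code: the function name and the (start, end, strand) representation are hypothetical, positions are assumed to be 1-based and inclusive, and the two mapped positions are swapped so that start stays less than or equal to end.

```python
def reverse_feature(start, end, strand, length):
    """Map a feature onto the reverse-complemented sequence.

    Position p on the original strand becomes length - p + 1, so the old
    end becomes the new start, the old start becomes the new end, and the
    sense is negated.
    """
    new_start = length - end + 1
    new_end = length - start + 1
    return new_start, new_end, -strand


if __name__ == "__main__":
    # A forward-strand feature at 2..4 on a 10-base sequence
    # ends up at 7..9 on the reverse strand.
    print(reverse_feature(2, 4, 1, 10))
```

Applying the function twice returns the original coordinates, which is a quick sanity check for any implementation of the mapping.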
This is shown when you do % showseq em:hsfau1 -format=2 stdout and showseq em:hsfau1 -format=2 stdout -srev (The sequence gets reverse-complemented, but the feature table stays the same and so inappropriate regions are indicated in the display) Thanks, Gary From tcarver at hgmp.mrc.ac.uk Wed Feb 14 05:46:29 2001 From: tcarver at hgmp.mrc.ac.uk (Tim Carver) Date: Wed, 14 Feb 2001 10:46:29 +0000 Subject: abiview bug References: <20010213191614.A9162@arran.mrc-lmb.cam.ac.uk> <20010214095307.C9543@arran.mrc-lmb.cam.ac.uk> <3A8A5660.F4D90E58@hgmp.mrc.ac.uk> <20010214102251.D9543@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A8A6204.8D771C67@hgmp.mrc.ac.uk> Hi James, Many thanks for the patch. Glad it was simple. This has now been installed here. I agree with your comments about io_lib and re-inventing the wheel. At some stage it would be good to get all these formats in. Regards Tim > Hello all, > > I've now identified the problem in the abiview program. > The program was assuming that the position of the PLOC block immediately > follows the PBAS block, which is not always true. All blocks should be > considered as files in a directory - they can physically be stored in any > order. Hence you have to query the directory in order to find the file > location. Fortunately the patch is simple: > > *** abiview.c~ Wed Feb 14 09:48:07 2001 > --- abiview.c Wed Feb 14 10:05:59 2001 > *************** > *** 71,76 **** > --- 71,77 ---- > int i; > int base; > long int baseO; > + long int basePosO; > long int numBases; > long int numPoints; > long int dataOffset[4]; > *************** > *** 121,126 **** > --- 122,129 ---- > res4 = (char)(fwo_&BYTE[0]); > > ajSeqABIReadSeq(fp,baseO,numBases,&nseq); > + basePosO = ajSeqABIGetBasePosOffset(fp); /* find PLOC tag & get offset */ > + ajFileSeek(fp, basePosO, SEEK_SET); > ajSeqABIGetBasePosition(fp,numBases,&basePositions); > > On a more general note, most people do not keep ABI files on disk as they are > simply too large. 
They typically convert them to SCF instead. (Indeed some > machines, eg Licor, write SCF as their native format.) We maintain a freely > available library (io_lib) of routines for reading and writing ABI, ALF, SCF, > CTF (Jean Thierry-mieg's compressed format) and ZTR (my own compressed > format). All of ABI->* filters are lossy as there's lots of other bits in the > ABI files which no one quite knows what to do with, however the SCF->CTF and > SCF->ZTR are lossless (and SCF->ZTR is typically slightly smaller than bzipped > SCF). Maybe it makes sense not to duplicate work. > > io_lib is free although it isn't yet GPLed. That shouldn't be a problem, but > if it is I cannot see an issue with GPLing io_lib. Indeed it already looks > like parts of io_lib (or at least "ted" which much of it came from) are in > emboss; there's a striking similarities in the seqABIGetFlag and > getABIIndexEntryLW functions (eg the same bizarre flow controls, identical > code layout, and some identical variable names). > > James > > -- > James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 > Medical Research Council - Laboratory of Molecular Biology, > Hills Road, Cambridge, CB2 2QH, England. > Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From jkb at mrc-lmb.cam.ac.uk Wed Feb 14 07:20:46 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Wed, 14 Feb 2001 12:20:46 +0000 Subject: sequence.feature attribute Message-ID: <20010214122046.C10074@arran.mrc-lmb.cam.ac.uk> The feature attribute of type sequence (et al) doesn't appear to be documented. One thing I have noticed is that it prevents reading of simple format files.
Eg:

    jkb at jura[work/emboss]& cat seq1
    ATCGTACGATCGGACTAGC
    jkb at jura[work/emboss]& diffseq seq1
    Find differences (SNPs) between nearly identical sequences
    EMBOSS An error in ajfile.c at line 980: Error reading from file '.'
    jkb at jura[work/emboss]& cat seq2
    >test
    ATCGTACGATCGGACTAGC
    jkb at jura[work/emboss]& diffseq seq2
    Find differences (SNPs) between nearly identical sequences
    Second sequence:

I don't know why fasta format would work and plain text does not, as neither support features.

James

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www/pubseq/

From jkb at mrc-lmb.cam.ac.uk Wed Feb 14 10:01:12 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Wed, 14 Feb 2001 15:01:12 +0000
Subject: Lists, selections & defaults
Message-ID: <20010214150112.D10074@arran.mrc-lmb.cam.ac.uk>

Some more ACD questions. If I've missed something in the documentation then please point me to it, otherwise please note it as a request for documentation improvements.

For the list type, what value should default have? I've seen examples thus:

showseq.acd:

    list: things [
        default: "B N T S F"
        values: "S:Sequence, B:Blank line, 1:Frame1 translation, 2:Frame2 translation, 3:Frame3 translation,
    ...

ajbad.acd: (is this deliberately a badly formatted acd file?)

    list: frames [
        default: "1,2,3,4,5,6"
        values: "0: None, 1: F1,2: F2,3: F3,4: R1,5: R2,6: R3"
    ...

dbiblast.acd:

    List: seqtype [
        req: Y
        prompt: "Sequence type"
        value: "N:nucleic;P:protein;?:unknown"
        max: 1
        min: 1
        def: unknown
    ]

So the default is sometimes the 'code' (as in showseq.acd) and sometimes the 'value' (dbiblast.acd). When default is a list, sometimes it is space separated, and sometimes comma separated. How should I be parsing this?
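Until the documentation settles the question, one way a front end could cope with all three styles is to be deliberately tolerant. The Python sketch below is an assumption about what a forgiving reader might do, not a description of how EMBOSS itself resolves list defaults; parse_values and resolve_default are hypothetical names. It accepts ',' or ';' between code:label pairs, and accepts a default given either as codes (space or comma separated) or as a label.

```python
import re


def parse_values(spec):
    """Split 'code:label' pairs separated by ',' or ';' into a dict."""
    pairs = {}
    for item in re.split(r'[,;]', spec):
        if ':' not in item:
            continue  # skip fragments without a code, e.g. a trailing comma
        code, label = item.split(':', 1)
        pairs[code.strip()] = label.strip()
    return pairs


def resolve_default(default, values):
    """Return the list of codes named by a default string."""
    by_label = {label.lower(): code for code, label in values.items()}
    whole = default.strip()
    if whole.lower() in by_label:  # default written as a label
        return [by_label[whole.lower()]]
    codes = []
    for token in re.split(r'[,\s]+', whole):
        if token in values:  # default written as a code
            codes.append(token)
        elif token.lower() in by_label:
            codes.append(by_label[token.lower()])
        # tokens matching nothing are dropped in this sketch;
        # a real front end would probably want to report them
    return codes


if __name__ == "__main__":
    seqtype = parse_values("N:nucleic;P:protein;?:unknown")
    print(resolve_default("unknown", seqtype))
    frames = parse_values("0: None, 1: F1,2: F2,3: F3,4: R1,5: R2,6: R3")
    print(resolve_default("1,2,3,4,5,6", frames))
```

Trying the whole default as a label first avoids breaking labels that contain spaces, such as "Blank line", when the string is later split on whitespace.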
James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From leonardz at bioinfo.sickkids.on.ca Thu Feb 15 16:56:09 2001 From: leonardz at bioinfo.sickkids.on.ca (Len F. Zaifman) Date: Thu, 15 Feb 2001 16:56:09 -0500 Subject: Problem with water program Message-ID: <3A8C5079.D791FDDF@bioinfo.sickkids.on.ca> We are using water on a large sequence ( > 10000000 bases) and aligning a smaller sequence ( < 1000 bases ) to it. This is for emboss 1.9.1. The water manual says: > Diagnostic Error Messages > > Uncaught exception > Assertion failed > raised at ajmem.c:xxx > > > Probably means you have run out of memory. Try using supermatcher or matcher if this happens. This is on an SGI \Origin with Gigabytes of Ram, so I upped the amount of memory a single process could use from .5 GB to 2 GB and still got the same result. On the SGI I use ssusage to determine real memory usage and the memory usage didn't change in spite of the above change from kernel memory parameters real memory use rlimit_rss_max = 536870912 (0x20000000) ll 75216 mxrss 4 4k page size this is 300 MB of ram rlimit_rss_cur = 536870912 (0x20000000) ll to rlimit_rss_max = 2147483648 (0x80000000) ll 75216 mxrss rlimit_rss_cur = 2147483648 (0x80000000) ll (& yes I rebooted the system to ensure I got the new limits). So I ran water using par to trace the program and got the detailed output below. It looks to me like it is failing on opening my output file My.output. Any comments? By the way, this is a file and directory where I do have write permission, so it is not that. 
    15728.339mS(+ 22uS)[ 16] water.dbg( 1807): close(5)   << this closes /tools/emboss1.9.1/share/EMBOSS/data/EDNAMAT
    15728.366mS(+ 28uS)[ 16] water.dbg( 1807): END-close() OK
    15728.649mS(+ 284uS)[ 16] water.dbg( 1807): open("MY.output", O_WRONLY|O_CREAT|O_TRUNC, 0666)
    15728.760mS(+ 109uS)[ 16] water.dbg( 1807): END-open() = 5   << This is my outputfile from water specified by -outfile
    15729.497mS(+ 738uS)[ 16] water.dbg( 1807): write(2, "Uncaught exception", 18)
    15729.545mS(+ 47uS)[ 16] water.dbg( 1807): END-write() = 18
    15729.558mS(+ 12uS)[ 16] water.dbg( 1807): write(2, "\n", 1)
    15729.622mS(+ 65uS)[ 16] water.dbg( 1807): END-write() = 1
    15729.642mS(+ 17uS)[ 16] water.dbg( 1807): write(2, " Assertion failed", 17)
    15729.649mS(+ 8uS)[ 16] water.dbg( 1807): END-write() = 17
    15729.655mS(+ 6uS)[ 16] water.dbg( 1807): write(2, "\n", 1)
    15729.665mS(+ 9uS)[ 16] water.dbg( 1807): END-write() = 1
    15729.686mS(+ 20uS)[ 16] water.dbg( 1807): write(2, " raised at ajmem.c:167\n", 23)
    15729.695mS(+ 10uS)[ 16] water.dbg( 1807): END-write() = 23
    15729.700mS(+ 4uS)[ 16] water.dbg( 1807): write(2, "\n", 1)
    15729.712mS(+ 11uS)[ 16] water.dbg( 1807): END-write() = 1
    15729.853mS(+ 140uS)[ 16] water.dbg( 1807): write(2, "\n EMBOSS An error in ajexcep", 58)
    15729.893mS(+ 40uS)[ 16] water.dbg( 1807): END-write() = 58
    15729.900mS(+ 7uS)[ 16] water.dbg( 1807): write(2, "\n", 1)
    15729.955mS(+ 55uS)[ 16] water.dbg( 1807): END-write() = 1
    15729.998mS(+ 43uS)[ 16] water.dbg( 1807): prctl(PR_LASTSHEXIT)
    15730.005mS(+ 6uS)[ 16] water.dbg( 1807): END-prctl() = 1
    15730.048mS(+ 44uS)[ 16] water.dbg( 1807): exit(1)

From jkb at mrc-lmb.cam.ac.uk Fri Feb 16 07:48:08 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Fri, 16 Feb 2001 12:48:08 +0000
Subject: abiview -graph data
Message-ID: <20010216124808.A15098@arran.mrc-lmb.cam.ac.uk>

Just another minor buglet: abiview does not accept "-graph data". It gripes with:

    Writing graph 1 data to abiview1.dat
    Writing graph 2 data to abiview2.dat
    Writing graph 3 data to abiview3.dat
    Writing graph 4 data to abiview4.dat

    *** PLPLOT ERROR ***
    pladv: Please call plinit first, aborting operation

    *** PLPLOT ERROR ***
    etc

This isn't causing me problems at all (as we have our own trace viewer), so please don't feel that I'm waiting for a fix.

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

From jkb at mrc-lmb.cam.ac.uk Fri Feb 16 07:50:23 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Fri, 16 Feb 2001 12:50:23 +0000
Subject: Spin update
Message-ID: <20010216125023.A3774@arran.mrc-lmb.cam.ac.uk>

The EMBOSS interface in Spin is coming along well. My parser processes all 156 ACD files (including the collection of test and demo ones) in the 1.9.1 distribution and produces tcl code (using [incr widgets]) for each dialogue. All the tcl code runs and produces dialogue, but I expect many of them fail when I press the OK button. I'm working through them slowly.

Technology wise, I think I've implemented pretty much most of the hard stuff now, with expressions in default, information, min, max, required and needed all working. Quite a bit of checking is performed (sequence types, min/max ranges, etc), but I've still got more work to do there (eg maximum number of values within lists).
I'm using our existing code (that Kathryn Beal wrote) for drawing the graphical output from EMBOSS. However there seems to be little consistency in the arguments here. Many programs take a -graph option which can have the type "data". Others take a -data argument which saves the graphical information in another format. It would be good if these could be merged together somehow - all using the same method. I recall Kathryn discussing this with Alan before, but I don't know what the outcome was. Could you (Alan) please remind me what the conclusion was?

One other obvious thing that strikes me is that we need a big reorganisation of the appl: group field. I'm using these to generate menus and cascading submenus, but it's almost impossible to guess where things will be. I'll perhaps have a go at reorganising it once I've finished acd2tcl itself.

James

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

From jkb at mrc-lmb.cam.ac.uk Mon Feb 19 10:43:08 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Mon, 19 Feb 2001 15:43:08 +0000
Subject: Strings vs expressions
Message-ID: <20010219154308.C25705@arran.mrc-lmb.cam.ac.uk>

Hello all,

When is an expression within a string a real expression, and when is it just part of the string? Clearly not all $ symbols in strings form expressions. Eg dbiblast.acd contains a string:

pattern: "^([0-9]+.[0-9]+.[0-9]+)?$"

Mostly complex expressions are wrapped in @(), but not always. Eg see transeq.acd:

def: $(sequence.begin)-$(sequence.end)

How can I tell expressions from just internal dollars? There doesn't appear to be any escaping syntax, which means this _must_ be clearly defined to avoid bugs. My current ruling is that if $ isn't followed by ( then it's not an expression. Is that valid?
James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From peter.rice at uk.lionbioscience.com Mon Feb 19 10:54:15 2001 From: peter.rice at uk.lionbioscience.com (rice) Date: Mon, 19 Feb 2001 15:54:15 +0000 Subject: Strings vs expressions References: <20010219154308.C25705@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A9141A7.AB587BFD@uk.lionbioscience.com> James Bonfield wrote: > When is a expression within a string a real expression, and when is it just > part of the string? > Mostly complex expressions are wrapped in @(), but not always. Eg see > transeq.acd: > > def: $(sequence.begin)-$(sequence.end) This is a silly one - it only looks like an expression. $(name) is replaced by the variable value @(expression) is replaced by the expression result The example you give ends up as something like "1-1000" in a string. It is not an expression. Maybe we should call it a grimace :-) When () are nested, the inner ones are always evaluated first. $ or @ on their own, with no (), are unchanged. -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From jkb at mrc-lmb.cam.ac.uk Mon Feb 19 10:57:39 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Mon, 19 Feb 2001 15:57:39 +0000 Subject: Strings vs expressions In-Reply-To: <3A9141A7.AB587BFD@uk.lionbioscience.com>; from peter.rice@uk.lionbioscience.com on Mon, Feb 19, 2001 at 03:54:15PM +0000 References: <20010219154308.C25705@arran.mrc-lmb.cam.ac.uk> <3A9141A7.AB587BFD@uk.lionbioscience.com> Message-ID: <20010219155738.D25705@arran.mrc-lmb.cam.ac.uk> On Mon, Feb 19, 2001 at 03:54:15PM +0000, rice wrote: > When () are nested, the inner ones are always evaluated first. > > $ or @ on their own, with no (), are unchanged. 
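The substitution rules quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the EMBOSS parser: the names `resolve`, `INNERMOST` and `toy_evaluate` are hypothetical, the evaluator understands nothing beyond a leading `!` on a Y/N value, and plain parentheses nested inside an expression are not handled.

```python
import re

# Matches the innermost $(...) or @(...): a marker, "(", a body with no
# parentheses in it, then ")". A "$" or "@" not followed by "(" never matches.
INNERMOST = re.compile(r"([$@])\(([^()]*)\)")


def toy_evaluate(expression):
    """Hypothetical stand-in for an ACD expression evaluator: only "!Y" / "!N"."""
    expression = expression.strip()
    if expression.startswith("!"):
        return "N" if expression[1:].strip().upper() in ("Y", "1") else "Y"
    return expression


def resolve(text, variables, evaluate=toy_evaluate, max_passes=100):
    """Substitute innermost $(name) and @(expr) first, working outwards."""
    for _ in range(max_passes):
        match = INNERMOST.search(text)
        if match is None:
            return text  # nothing left that looks like $(...) or @(...)
        marker, body = match.groups()
        # $(name) becomes the variable value, @(expr) the expression result.
        value = variables[body] if marker == "$" else evaluate(body)
        text = text[:match.start()] + value + text[match.end():]
    raise ValueError("too many substitution passes (self-referencing value?)")


# Assumed example values; the strings are the ones discussed in this thread.
variables = {"plot": "Y", "sequence.begin": "1", "sequence.end": "1000"}
print(resolve("$(sequence.begin)-$(sequence.end)", variables))  # 1-1000
print(resolve("@(!$(plot))", variables))                        # N
print(resolve("^([0-9]+.[0-9]+.[0-9]+)?$", variables))          # unchanged
```

Because the pattern only ever matches a `$` or `@` that is immediately followed by `(`, the dbiblast pattern string passes through untouched, and repeatedly replacing the innermost match gives the "inner ones first" order without needing a real parser.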
So what you're saying basically is that $(...) and @(...) are expressions (to the matching bracket), and any other occurrence of $, ( and @ are just basic text. (That's fine as it now agrees with how I changed things - previously I was working on $ and @ at the start of a word indicating an expression.) Could this behaviour please be documented? The ACD docs seem to indicate that all parsing is done on a word by word basis, which is where I originally concluded that $/@ starting a word indicates an expression. James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From jkb at mrc-lmb.cam.ac.uk Wed Feb 7 15:13:29 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Wed, 7 Feb 2001 15:13:29 +0000 Subject: required vs optional Message-ID: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> Hello all, I'm confused about the precise use for the required and optional parameters. The observed usage varies and seems to disagree with the documentation. Some examples help, but I should warn you that I'm still using emboss 1.5.5 so things may have changed. syco.acd: bool: plot [ opt: N info: "Produce plot" ] xygraph: graph [ req: $(plot) multi: 3 ] outfile: outfile [ req: @(!$(plot)) ] Here plot is a boolean. If "Produce plot" is answered as yes then the xygraph type is a suitable question (althouugh whether or not it's asked for is another matter) and the outfile is redundant. The produce plot is answered as no then vice versa is true. It's clear to see how this is implemented via the "req: $(plot)" code. In the graphical interface, where all command options are displayed simultaneously, this produces code which automatically "greys-out" arguments that are superfluous. 
However if we look at shuffleseq.acd int: shuffle [ req: N def: 1 info: "Number of shuffles" ] Here this is implying that 'shuffle' is never required. So required is both indicating parameters which are merely optional and parameters which are not needed. In my current GUI this causes shuffle to be permanently greyed-out, which is obviously incorrect. As a hack I can ignore fixed "req:N" statements and only grey-out when the value of req: is an expression, but that's also wrong. Even more confusing is pepwheel.acd: bool: wheel [ def: Y info: "Plot the wheel" ] int: steps [ opt: Y min: 2 max: 100 def: 18 info: "Number of steps" help: "The number of residues plotted per turn is this value divided by the 'turns' value." ] steps is listed using "opt" and not "req". It also has no dependency on wheel, which doesn't make sense. Wheel mentions neither optional or required settings - what are the default values for these? I also do not understand the rules for working out which parameters are listed in the help as mandatory, optional, or advanced. The help for pepwheel indicates that -steps is optional and -wheel is advanced, which isn't too sensible. I assume this information is derived from the use of opt, and req. Anyway, ideally I'd like a needed paramater so that I can distinguish between options which have no use (eg "steps" in pepwheel after "wheel" is set to 0) and options that have a use but a simply optional. The required setting seems redundant, and could be a source of error (what does req:Y opt:Y mean, or req:N opt:N?). Any suggestions? James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. 
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From Peter.Rice at uk.lionbioscience.com Wed Feb 7 15:32:40 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Wed, 07 Feb 2001 15:32:40 +0000 Subject: required vs optional References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A816A98.2BA3C3FF@lionbio.co.uk> Hi James, >I'm confused about the precise use for the required and optional >parameters. It is all very simple. Honest. Only the expressions make it complicated. By default, everything is "advanced" (i.e. never prompted for). If you say "req: Y" then it means required, and will be prompted. If you say "param: Y" this also means required, and will be prompted. Incidentally, as "req: N" is the default it is not needed in shuffleseq.acd If you say "opt: Y" then it means optional, and is only prompted for when you run with -options on the command line (or EMBOSS_OPTIONS defined as true) If you want to make life complicated, you can say "req: " and work out at run time whether it is required or not. These are the cases that cause you problems. If you can figure out the result, then fine. If not, you can assume a true value or make up your own "may be needed, but I can't tell" category. >I also do not understand the rules for working out which parameters are listed >in the help as mandatory, optional, or advanced. The help for pepwheel >indicates that -steps is optional and -wheel is advanced, which isn't too >sensible. I assume this information is derived from the use of opt, and req. The help has the same problem you do. Only worse - it has to guess what would happen with any expression. At least you get to see the user input :-) >Anyway, ideally I'd like a needed paramater so that I can distinguish between >options which have no use (eg "steps" in pepwheel after "wheel" is set to 0) >and options that have a use but a simply optional. 
The required setting seems >redundant, and could be a source of error (what does req:Y opt:Y mean, or >req:N opt:N?). In pepwheel.acd, plotting the wheel is 'advanced' (req: N) so users can only turn off the plot from the command line. The step increment is optional so users can try running "pepwheel -options" and be prompted for the step. Yes, we could turn it off if there is no plot, but the program does not mind and it would make life even more difficult for interface developers if we had too many expressions. The obvious solution would be to say "opt: $(wheel)" in steps. I would guess leaving it as it currently stands is 'best'. Who knows, one day the program might calculate something that needs the step value, without displaying the plot :-) Of course, you are free to play with the settings to find a way that makes sense for your graphical interface. Most variations would make sense to the program (just try it from the command line to check what happens). Hope this helps, Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From david.martin at biotek.uio.no Wed Feb 7 15:41:28 2001 From: david.martin at biotek.uio.no (David Martin) Date: Wed, 7 Feb 2001 16:41:28 +0100 Subject: [EMBnet ADMIN] required vs optional In-Reply-To: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> Message-ID: On Wed, 7 Feb 2001, James Bonfield wrote: > Hello all, > > I'm confused about the precise use for the required and optional > parameters. The observed usage varies and seems to disagree with the > documentation. Some examples help, but I should warn you that I'm still using > emboss 1.5.5 so things may have changed. Unless I am mistaken parameters are defined inthe following way: req: Y - a required parameter opt: y - an optional parameter everything else : an advanced parameter I would expect the opt: N to be redundant. 
The reason one can set opt: N or req: N is so that it can be implemented programatically (as your first example shows.) > > syco.acd: > > bool: plot [ opt: N info: "Produce plot" ] > > xygraph: graph [ req: $(plot) multi: 3 ] > > outfile: outfile [ req: @(!$(plot)) ] > > Here plot is a boolean. If "Produce plot" is answered as yes then the xygraph > type is a suitable question (althouugh whether or not it's asked for is > another matter) and the outfile is redundant. The produce plot is answered as > no then vice versa is true. It's clear to see how this is implemented via the > "req: $(plot)" code. In the graphical interface, where all command options are > displayed simultaneously, this produces code which automatically "greys-out" > arguments that are superfluous. > > However if we look at shuffleseq.acd > > int: shuffle [ req: N def: 1 info: "Number of shuffles" ] > > Here this is implying that 'shuffle' is never required. So required is both > indicating parameters which are merely optional and parameters which are not > needed. In my current GUI this causes shuffle to be permanently greyed-out, > which is obviously incorrect. As a hack I can ignore fixed "req:N" statements > and only grey-out when the value of req: is an expression, but that's also > wrong. I would interpret this as being that shuffle is an advanced qualifier, which is as EMBOSS appears to give it. It is not that it isn't required, but that it is already set. It is probably needed but isn't required to be prompted for. > > Even more confusing is pepwheel.acd: > > bool: wheel [ > def: Y > info: "Plot the wheel" > ] > > int: steps [ > opt: Y > min: 2 > max: 100 > def: 18 > info: "Number of steps" > help: "The number of residues plotted per turn is this value > divided by the 'turns' value." > ] > > steps is listed using "opt" and not "req". It also has no dependency on wheel, > which doesn't make sense. 
Wheel mentions neither optional or required settings > - what are the default values for these? But the opt/req controlls which parameters are prompted for.. I suppose it should really be dependent so it is not prompted if -nowheel is set on the comamnd line. > > I also do not understand the rules for working out which parameters are listed > in the help as mandatory, optional, or advanced. The help for pepwheel > indicates that -steps is optional and -wheel is advanced, which isn't too > sensible. I assume this information is derived from the use of opt, and req. yup. if req is Y then it is in mandatory, opt is Y then it is in optional, neither then it is in advanced. What it does if both are set is a mystery, probably throws an error. > > Anyway, ideally I'd like a needed paramater so that I can distinguish between > options which have no use (eg "steps" in pepwheel after "wheel" is set to 0) > and options that have a use but a simply optional. The required setting seems > redundant, and could be a source of error (what does req:Y opt:Y mean, or > req:N opt:N?). Can both be defined in a valid ACD file? On a sideways but related note, is it possible to configure the windows versions of SIP/NIP to import sequences from an external program? this would be very nice. I am thinking of a java thinghy with a gui to cover seqret/entret/showdb from a remote EMBOSS machine. ..d --------------------------------------------------------------------- * Dr. David Martin Biotechnology Centre of Oslo * * Node Manager Gaustadalleen 21 * * The Norwegian EMBNet Node P.O. box 1125 Blindern * * tel +47 22 84 05 35 N-0317 Oslo * * fax +47 22 84 05 01 Norway * --------------------------------------------------------------------- I will be leaving the Norwegian EMBnet node on 23rd February. All work related mail should be addressed to admin at embnet.uio.no where my successor, Rune Groven will deal with it. 
All personal email should be sent to dmartin at hgmp.mrc.ac.uk from whence it will be automatically forwarded to me. Spam should continue to be sent to /dev/null From jkb at mrc-lmb.cam.ac.uk Wed Feb 7 16:59:30 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Wed, 7 Feb 2001 16:59:30 +0000 Subject: required vs optional In-Reply-To: <3A816A98.2BA3C3FF@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Wed, Feb 07, 2001 at 03:32:40PM +0000 References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> Message-ID: <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> Hi Peter, Thanks for some clarification, but I still have further queries. > If you say "req: Y" then it means required, and will be prompted. > If you say "param: Y" this also means required, and will be prompted. So for the purposes of the GUI I can just treat them the same and always specify the qualifier anyway. > If you want to make life complicated, you can say "req: " and work > out at run time whether it is required or not. These are the cases that cause > you problems. If you can figure out the result, then fine. If not, you can Actually these aren't a problem at all. I'm handling expressions already, even in cases for specifying the minimum and maximum values (I don't know if these exist, but I hacked an example to test it). So my GUI for stretcher, for example, automatically handles changing the gappenalty and gaplength parameters when the user changes from DNA to protein or back again. If the user modifies gappenalty and then changes the sequence type then the program remembers that the user has already adjusted the value and so it then does not automatically change it. Internally this is all performed by Tcl's excellent variable trace options. When a value is an expression I work out which variables are contained within the expression and produce a Tcl trace for each one. 
Then whenever a dependent variable is modified a callback procedure is invoked which reevaluations the expression. This is even working (wrongly as it turns out) for the required parameters, so setting $(wheel) to 0, from 1, (for example) could automatically grey-out the other parameters. > would make life even more difficult for interface developers if we had too > many expressions. The obvious solution would be to say "opt: $(wheel)" in > steps. To be honest it'd make my life easier as things are more consistent, and it looks better for the user too. > Of course, you are free to play with the settings to find a way that makes > sense for your graphical interface. Most variations would make sense to the > program (just try it from the command line to check what happens). I still feel that a way of indicating unnecessary or pointless questions would be useful. It's additional to required as that just indicates whether or not the default value is enough. I'm just trying to think what dialogues professional applications are likely to have, and greying-out things on the fly is a common technique. The more I think about it the more I'm certain. So I propose the following addition to each command line qualifier. needed: @expression The default is 1. If an expression is specified and it evaluations to 0 then the option is not asked for (even with -options) and in GUIs it will be greyed-out. For me, this is trivial to do (ie I've already written it). Any comments on this proposal? James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. 
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From Peter.Rice at uk.lionbioscience.com Wed Feb 7 17:07:34 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Wed, 07 Feb 2001 17:07:34 +0000 Subject: required vs optional References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A8180D6.F158B05C@lionbio.co.uk> James Bonfield wrote: > > If you say "req: Y" then it means required, and will be prompted. > > If you say "param: Y" this also means required, and will be prompted. > > So for the purposes of the GUI I can just treat them the same and always > specify the qualifier anyway. Yes. > I still feel that a way of indicating unnecessary or pointless questions would > be useful. It's additional to required as that just indicates whether or not > the default value is enough. I'm just trying to think what dialogues > professional applications are likely to have, and greying-out things on the > fly is a common technique. The more I think about it the more I'm certain. > So I propose the following addition to each command line qualifier. > > needed: @expression > > The default is 1. If an expression is specified and it evaluations to 0 then > the option is not asked for (even with -options) and in GUIs it will be > greyed-out. For me, this is trivial to do (ie I've already written it). > Any comments on this proposal? I don't see how this is different from testing required, parameter and optional in turn, which is what the code will do. Basically, needed is 'required or parameter or optional' ... every time. You would then have the choice of including/excluding the optional ones, just like a 'real' user. 
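As a rough illustration of the rules discussed so far in this thread (everything is "advanced" by default, req: or param: makes a qualifier required, opt: makes it optional and prompted only with -options), here is a small Python sketch. It is not EMBOSS code: the function names and the dictionary representation of a qualifier are hypothetical, attribute values are assumed to be already evaluated to Y/N, and treating req as winning when both req and opt are set is an assumption.

```python
def is_true(value):
    """Treat Y/yes/1/true (any case) as true; anything else, or missing, as false."""
    return str(value).strip().lower() in ("y", "yes", "1", "true")


def category(attrs):
    """Classify one qualifier as 'required', 'optional' or 'advanced'."""
    if is_true(attrs.get("req", "N")) or is_true(attrs.get("param", "N")):
        return "required"
    if is_true(attrs.get("opt", "N")):
        return "optional"
    return "advanced"  # the default: never prompted for


def will_prompt(attrs, options=False):
    """Would the command line prompt? `options` stands for -options / EMBOSS_OPTIONS."""
    kind = category(attrs)
    return kind == "required" or (kind == "optional" and options)


# Qualifier definitions taken from the ACD fragments quoted in this thread.
shuffle = {"req": "N", "def": "1", "info": "Number of shuffles"}
steps = {"opt": "Y", "min": "2", "max": "100", "def": "18"}
wheel = {"def": "Y", "info": "Plot the wheel"}

for name, attrs in (("shuffle", shuffle), ("steps", steps), ("wheel", wheel)):
    print(name, category(attrs), will_prompt(attrs), will_prompt(attrs, options=True))
```

A GUI generator could call `category` once per qualifier after evaluating any expressions, so that a single derived value, rather than req and opt separately, drives what is shown or greyed out.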
-- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From jkb at mrc-lmb.cam.ac.uk Wed Feb 7 17:16:49 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Wed, 7 Feb 2001 17:16:49 +0000 Subject: required vs optional In-Reply-To: <3A8180D6.F158B05C@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Wed, Feb 07, 2001 at 05:07:34PM +0000 References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> Message-ID: <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> On Wed, Feb 07, 2001 at 05:07:34PM +0000, Peter Rice wrote: > I don't see how this is different from testing required, parameter and > optional in turn, which is what the code will do. > > Basically, needed is 'required or parameter or optional' ... every time. > > You would then have the choice of including/excluding the optional ones, just > like a 'real' user. But I've already found one case (shuffleseq) where it is stated to be "req: N" (and defaults to opt:y), in which case I (wrongly) grey it out as it's never needed. And of course there are other cases where the existing syntax is sufficient, but it's not actually used (eg all those options in pepwheel). What am I missing? James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. 
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From Peter.Rice at uk.lionbioscience.com Wed Feb 7 17:30:48 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Wed, 07 Feb 2001 17:30:48 +0000 Subject: required vs optional References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A818648.72D3658A@lionbio.co.uk> James Bonfield wrote: > But I've already found one case (shuffleseq) where it is stated to be "req: N" > (and defaults to opt:y), in which case I (wrongly) grey it out as it's never > needed. I am a little confused there. This definition: int: shuffle [ req: N def: 1 info: "Number of shuffles" ] is opt: N (the default - all these default to false) EMBOSS will never prompt for it, but you can certainly set it from the command line, and if you like you can offer it to the user. EMBOSS will accept anything on the command line (there is no complaint if the user puts something on the command line that is not wanted). But the assumption for graphical user interfaces is that only the optional cases are offered. I guess, being pedantic, it would be reasonable to change shuffle to be "opt: y" in the acd file, and the same for most other options you might like to turn on. My suspicion is that only a few keen users are really using the -options qualifier, but in any case it was aimed as much at the GUI developer as at the user who didn't want too many prompts. > And of course there are other cases where the existing syntax is sufficient, > but it's not actually used (eg all those options in pepwheel). 
We could clean them up, or we could remove the 'turn off the plot' option :-)

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From jkb at mrc-lmb.cam.ac.uk Wed Feb 7 17:50:53 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Wed, 7 Feb 2001 17:50:53 +0000
Subject: required vs optional
In-Reply-To: <3A818648.72D3658A@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Wed, Feb 07, 2001 at 05:30:48PM +0000
References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk>
Message-ID: <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk>

On Wed, Feb 07, 2001 at 05:30:48PM +0000, Peter Rice wrote:
> I am a little confused there. This definition:
>
> int: shuffle [ req: N def: 1 info: "Number of shuffles" ]
>
> is opt: N (the default - all these default to false)

Sorry, I thought opt: Y was the default.

Incidentally is there a list anywhere of all the default values? Eg including info, prompt, etc. Sometimes info: is used and sometimes prompt:, but often none are present. However emboss still prompts with an appropriate query - where does this come from? I can see some in the ajacd.c code, but not all. Besides reading the code to find the defaults isn't my idea of fun :-)

> on. My suspicion is that only a few keen users are really using the -options
> qualifier, but in any case it was aimed as much at the GUI developer as at the
> user who didn't want too many prompts.

Agreed. So just to clarify things, if the physical values or results from expressions give the following, then I need to enable or disable (grey-out) the question appropriately:

req  opt  greyed out
Y    N    N
Y    Y    N
N    N    N
N    Y    Y

Do req:N opt:N and req:Y opt:Y ever occur? They sound like inconsistencies.

Anyway, only when it's optional and not required should I grey them out. However as optional defaults to N only places where opt:Y is explicitly stated will I ever grey out parameters, which unfortunately includes all those graphics options (eg this is syco):

bool: plot [ opt: N info: "Produce plot" ]

xygraph: graph [ req: $(plot) multi: 3 ]

outfile: outfile [ req: @(!$(plot)) ]

I assume that I can just add opt:Y to the graph and outfile without breaking anything. I must admit though that this would be much easier if there was one single parameter to check as it makes the tcl variable tracing harder to write - rather than a trace of a single variable I have to produce a new variable consisting of an expression (using req and opt) and trace that instead.

James

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

From Peter.Rice at uk.lionbioscience.com Wed Feb 7 18:18:20 2001
From: Peter.Rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 07 Feb 2001 18:18:20 +0000
Subject: required vs optional
References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk>
Message-ID: <3A81916C.7E87ED03@lionbio.co.uk>

James Bonfield wrote:
> Incidentally is there a list anywhere of all the default values? Eg including
> info, prompt, etc. Sometimes info: is used and sometimes prompt:, but often
> none are present. However emboss still prompts with an appropriate query -
> where does this come from? I can see some in the ajacd.c code, but not
> all.
> Besides reading the code to find the defaults isn't my idea of fun :-)

All very much cleaned up since PISE started building their interface.

All applications should be using info, and so should you. That was its original purpose. For cases where the prompt to the interactive user would be different, we invented 'prompt:' which applications can use if available, but the (unwritten) standard says that when prompts are being specified, info must always be there.

However, certain data types have their own prompts. Sequence for example. Those are defined internally as secret codes, and have their own automatic prompts.

If you try the following:

% make check
% entrails emboss.ent

Almost all will be revealed in file 'emboss.ent'. entrails was written for interface developers who need to see these kinds of internals.

For some reason, the 'make check' applications do not get installed so you need to run them from the original emboss directory.

It should be not too hard to add the default prompts to entrails. Oops. I see it has the explanations swapped for input and output formats too. Needs some work. Which means ...

Let me know what internals you need to peek at, and I can hack it for you while I fix the rest.

> Agreed. So just to clarify things, if the physical values or results from
> expressions give the following, then I need to enable or disable (grey-out)
> the question appropriately:
>
> req  opt  greyed out
> Y    N    N
> Y    Y    N
> N    N    N
> N    Y    Y

Swap those last two.

opt:y means it can be useful
opt:n means you can change it if you dare.

> Do req:N opt:N and req:Y opt:Y ever occur? They sound like inconsistencies.

They are rare, but they are possible (especially with expressions around).

EMBOSS only has to decide whether to prompt. req:y prompts always. opt:y has to test whether -options (or EMBOSS_OPTIONS) is set. No conflict. EMBOSS will always prompt if it finds a reason to.

> Anyway, only when it's optional and not required should I grey them out.
> However as optional defaults to N only places where opt:Y is explicitly stated > will I ever grey out paramaters, which unfortunately includes all those > graphics options (eg this is syco): Some misunderstanding of 'required' perhaps? Required means the application would like to know what the user really wants, and it will prompt if it doesn't know. Optional means the user may want to set it, and he/she can ask for a prompt. With nothing set, the user can only set the value on the command line. These are the ones I would expect you to grey out. In processing, EMBOSS sets everything. If there is no prompt, and nothing on the command line, there is always a default value. > bool: plot [ opt: N info: "Produce plot" ] > > xygraph: graph [ req: $(plot) multi: 3 ] > > outfile: outfile [ req: @(!$(plot)) ] > > I assume that I can just add opt:Y to the graph and outfile without breaking > anything. I must admit though that this would be much easier if there was one > single parameter to check as it makes the tcl variable tracing harder to > write - rather than a trace of a single variable I have to produce a new > variable consistencing of an expression (using req and opt) and trace that > instead. Can you just generate the tcl variable you need? If only req or opt will be used you can just pick the value of whichever one is being set. Spare 'N' settings can be safely ignored, as they are the default anyway. -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From leonardz at bioinfo.sickkids.on.ca Wed Feb 7 21:32:35 2001 From: leonardz at bioinfo.sickkids.on.ca (Len F. Zaifman) Date: Wed, 07 Feb 2001 16:32:35 -0500 Subject: EMBOSS - Indexing breaks on large databases Message-ID: <3A81BEF3.308A85CF@bioinfo.sickkids.on.ca> I have installed emboss 1.9.1 on an O2000. It installed nicely once I gave up on installing it shared. The issue came up in indexing genbank files. 
Most divisions indexed fine with dbiflat. However, when I try to index est , or all of genbank , the indexing breaks due to sort running out of memory: explicitly: I run dbiflat -idformat GB -directory /data/genbank -indexdirectory /tools/emboss1.9.1/data/indices/est -dbname GenBankEst -filenames gbest*.seq -date 06/02/01 -sortoptions '-T /tmp_disk/scratch4/applicat/est -k1,1' & dbiflat -idformat GB -directory /data/genbank -indexdirectory /tools/emboss1.9.1/data/indices/genbank -dbname GenBank -filenames *.seq -date 06/02/01 -sortoptions '-T /tmp_disk/scratch4/applicat/genbank -k1,1' & get UX:sort: ERROR: Out of memory before merge: Not enough space sort is run with -T /scratch4 -k1,1 , where scratch4 has a 10 GB quota I checked the environment and it is using the system sort (/bin/sort). There were no syslog errors. All other smaller divisions seemed to work. I have a scheduled reboot where I am going to set the maximum resident set size to 1 GB (it is currently 1/2 GB). However, is there a more clever way of doing this (ie if I did this on my work station I would be limited to 1/8 GB or swap like crazy). Details: I configure using: ./configure --prefix=/tools/emboss1.9.1 --disable-shared --with-x --with-pngdriver on an O2K running Irix 6.5.10 and the MipsPro 7.3.1.2 Compilers Any ideas?? As a side note: when I tried indexing all of genbank I got almost 60000 sequences generating the following warning notice: This is a warning: Duplicate ID skipped: 'XXXXXXXX' Is this an indication that the initial data needs to be cleaned up first, or a non-issue? Thanks. From jrvalverde at cnb.uam.es Thu Feb 8 09:00:49 2001 From: jrvalverde at cnb.uam.es (jrvalverde at cnb.uam.es) Date: Thu, 8 Feb 2001 10:00:49 +0100 (MET) Subject: EMBOSS - Indexing breaks on large databases In-Reply-To: <3A81BEF3.308A85CF@bioinfo.sickkids.on.ca> Message-ID: <200102080900.f1890pU3292249@embnet.cnb.uam.es> "Len F. Zaifman" wrote: > I have installed emboss 1.9.1 on an O2000. 
It installed nicely once I > gave up on installing it shared. BTW, I have succeeded in compiling EMBOSS for IRIX using 64 bit compilation. It required some tweaking, but works. The recipe for those willing to give it a try is - remove 'gcc' from your path - define COMPILER_DEFAULTS_PATH appropriately (see pe_environ) to look for a compiler.defaults file containing e.g. :abi=64:isa=4:proc=r10k - ./configure in EMBOSS and all EMBASSY subdirs - search in all files for 'CC = cc' and substitute it for 'CC = cc -64' - same for 'LD = /bin/ld' -> 'LD = /bin/ld -64' - make The reason is that compiling depends on the Makefile and on libtool, as well as linking. I didn't spend much in looking at configure since the above steps where so straightforward. I know I should look into the configure script and add an option for 64-bit-irix-compile or some such, but that'll have to wait till I have time for it. Yes, I know, the search and substitute thing looks tedious, but it isn't, honest: create a 'chfile.sh' out of the EMBOSS source hierarchy containing #/bin/sh cp $1 $1.orig mv $1 tmpfile sed -e 's/CC="cc"/CC="cc -64"/g' $1 > tmpfile sed -e 's/CC = cc/CC = cc -64/g' $1 > tmpfile sed -e 's/\/bin\/ld/\/bin\/ld -64/g' $1 > tmpfile rm tmpfile ## if you are sure, uncomment this #rm $1.orig 'cd' to the emboss directory and run find . -type f -exec /path/to/chfile.sh {} \; -print and you are done with the CC changes. Libtool requires special treatment since it uses quotes j From david.martin at biotek.uio.no Thu Feb 8 09:32:12 2001 From: david.martin at biotek.uio.no (David Martin) Date: Thu, 8 Feb 2001 10:32:12 +0100 Subject: [EMBnet ADMIN] EMBOSS - Indexing breaks on large databases In-Reply-To: <3A81BEF3.308A85CF@bioinfo.sickkids.on.ca> Message-ID: On Wed, 7 Feb 2001, Len F. Zaifman wrote: > I have installed emboss 1.9.1 on an O2000. It installed nicely once I > gave up on installing it shared. Which compiler were you using? I note that you have the MIPS compiler. 
Have you tried using gcc which seems (on my o200 which shouldn't be so different) to work just fine on EMBL. ..d > > The issue came up in indexing genbank files. Most divisions indexed fine > with dbiflat. However, when I try to index > est , or all of genbank , the indexing breaks due to sort running out of > memory: > > explicitly: > I run > dbiflat -idformat GB -directory /data/genbank -indexdirectory > /tools/emboss1.9.1/data/indices/est -dbname GenBankEst -filenames > gbest*.seq -date 06/02/01 -sortoptions '-T > /tmp_disk/scratch4/applicat/est -k1,1' > & > dbiflat -idformat GB -directory /data/genbank -indexdirectory > /tools/emboss1.9.1/data/indices/genbank -dbname GenBank -filenames > *.seq -date 06/02/01 -sortoptions '-T > /tmp_disk/scratch4/applicat/genbank -k1,1' > & get > > UX:sort: ERROR: Out of memory before merge: Not enough space > > > sort is run with -T /scratch4 -k1,1 , where scratch4 has a 10 GB > quota > I checked the environment and it is using the system sort (/bin/sort). > There were no syslog errors. > > All other smaller divisions seemed to work. I have a scheduled reboot > where I am going to set the > maximum resident set size to 1 GB (it is currently 1/2 GB). However, is > there a more clever way of doing this (ie if I did this on my work > station I would be limited to 1/8 GB or swap like crazy). > > Details: > > I configure using: > ./configure --prefix=/tools/emboss1.9.1 --disable-shared --with-x > --with-pngdriver > > on an O2K running Irix 6.5.10 and the MipsPro 7.3.1.2 Compilers > > Any ideas?? > > > > As a side note: when I tried indexing all of genbank I got almost 60000 > sequences generating the following warning notice: > > > > This is a warning: Duplicate ID skipped: 'XXXXXXXX' > > Is this an indication that the initial data needs to be cleaned up > first, or a non-issue? > > Thanks. > > > > --------------------------------------------------------------------- * Dr. 
David Martin Biotechnology Centre of Oslo * * Node Manager Gaustadalleen 21 * * The Norwegian EMBNet Node P.O. box 1125 Blindern * * tel +47 22 84 05 35 N-0317 Oslo * * fax +47 22 84 05 01 Norway * --------------------------------------------------------------------- I will be leaving the Norwegian EMBnet node on 23rd February. All work related mail should be addressed to admin at embnet.uio.no where my successor, Rune Groven will deal with it. All personal email should be sent to dmartin at hgmp.mrc.ac.uk from whence it will be automatically forwarded to me. Spam should continue to be sent to /dev/null From david.martin at biotek.uio.no Thu Feb 8 09:41:28 2001 From: david.martin at biotek.uio.no (David Martin) Date: Thu, 8 Feb 2001 10:41:28 +0100 Subject: [EMBnet ADMIN] Re: EMBOSS - Indexing breaks on large databases In-Reply-To: <200102080900.f1890pU3292249@embnet.cnb.uam.es> Message-ID: On Thu, 8 Feb 2001 jrvalverde at cnb.uam.es wrote: > "Len F. Zaifman" wrote: > > I have installed emboss 1.9.1 on an O2000. It installed nicely once I > > gave up on installing it shared. > > BTW, I have succeeded in compiling EMBOSS for IRIX using 64 bit > compilation. > > It required some tweaking, but works. The recipe for those willing to > give it a try is > > - remove 'gcc' from your path > - define COMPILER_DEFAULTS_PATH appropriately (see pe_environ) > to look for a compiler.defaults file containing e.g. > :abi=64:isa=4:proc=r10k > - ./configure in EMBOSS and all EMBASSY subdirs > - search in all files for 'CC = cc' and substitute it > for 'CC = cc -64' > - same for 'LD = /bin/ld' -> 'LD = /bin/ld -64' > - make > > The reason is that compiling depends on the Makefile and on libtool, > as well as linking. I didn't spend much in looking at configure since > the above steps where so straightforward. I know I should look into > the configure script and add an option for 64-bit-irix-compile or some > such, but that'll have to wait till I have time for it. 
> > Yes, I know, the search and substitute thing looks tedious, but it > isn't, honest: create a 'chfile.sh' out of the EMBOSS source hierarchy > containing > > #/bin/sh > cp $1 $1.orig > mv $1 tmpfile > sed -e 's/CC="cc"/CC="cc -64"/g' $1 > tmpfile > sed -e 's/CC = cc/CC = cc -64/g' $1 > tmpfile > sed -e 's/\/bin\/ld/\/bin\/ld -64/g' $1 > tmpfile > rm tmpfile > ## if you are sure, uncomment this > #rm $1.orig This will break if you have more than one ofthe sed commands run. It will either overwrite the previous substitutions or die if noclobber is set. put the sed commands in a file and source them with sed -f and it should work. alternatively setenv CC "cc -64"; setenv LD "/bin/ld -64"; ./configure may be easier (if configure is working properly). > > 'cd' to the emboss directory and run > > find . -type f -exec /path/to/chfile.sh {} \; -print > > and you are done with the CC changes. Libtool requires special > treatment since it uses quotes > ..d --------------------------------------------------------------------- * Dr. David Martin Biotechnology Centre of Oslo * * Node Manager Gaustadalleen 21 * * The Norwegian EMBNet Node P.O. box 1125 Blindern * * tel +47 22 84 05 35 N-0317 Oslo * * fax +47 22 84 05 01 Norway * --------------------------------------------------------------------- I will be leaving the Norwegian EMBnet node on 23rd February. All work related mail should be addressed to admin at embnet.uio.no where my successor, Rune Groven will deal with it. All personal email should be sent to dmartin at hgmp.mrc.ac.uk from whence it will be automatically forwarded to me. 
Spam should continue to be sent to /dev/null From jrvalverde at cnb.uam.es Thu Feb 8 10:45:29 2001 From: jrvalverde at cnb.uam.es (jrvalverde at cnb.uam.es) Date: Thu, 8 Feb 2001 11:45:29 +0100 (MET) Subject: [EMBnet ADMIN] Re: EMBOSS - Indexing breaks on large databases In-Reply-To: Message-ID: <200102081045.f18AjVM3331405@embnet.cnb.uam.es> David Martin wrote: > On Thu, 8 Feb 2001 jrvalverde at cnb.uam.es wrote: > > This will break if you have more than one ofthe sed commands run. It will > either overwrite the previous substitutions or die if noclobber is set. > > put the sed commands in a file and source them with sed -f and it should > work. > Certainly! I was writing from memory and got a user interrupting me with a problem meanwhile, I wrote it in a hassle and didn't realize it was mid-way (copy-pasted and half-edited lines). Shuld have been > sed -e 's/CC="cc"/CC="cc -64"/g' $1 > tmpfile > sed -e 's/CC = cc/CC = cc -64/g' tmpfile > $1 > sed -e 's/\/bin\/ld/\/bin\/ld -64/g' $1 > tmpfile > mv tmpfile > $1 Yeek! I should learn to be more careful with the [send] button or have a less window clobbered screen. j From jrvalverde at cnb.uam.es Thu Feb 8 10:52:56 2001 From: jrvalverde at cnb.uam.es (jrvalverde at cnb.uam.es) Date: Thu, 8 Feb 2001 11:52:56 +0100 (MET) Subject: [EMBnet ADMIN] Re: EMBOSS - Indexing breaks on large databases In-Reply-To: Message-ID: <200102081052.f18Aqv53259410@embnet.cnb.uam.es> David Martin wrote: > On Thu, 8 Feb 2001 jrvalverde at cnb.uam.es wrote: > > may be easier (if configure is working properly). > Nope, for some reason it didn't work for me, I got still the '-n32' on libtool.sh ... Which reminds me, I better have a look at the configure and fix it before anyone tries it, I may be forgetting some details. I keep being interrupted every two words I write. Lemme see, yes, libtool.sh contained LD="/bin/ld -n32" after ./configure, and the setenv doesn't work if you have gcc, for it finds it and uses it instead. 
Damn, forget my mails I'll write again when I don't get interrupted! j From jkb at mrc-lmb.cam.ac.uk Thu Feb 8 12:09:58 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Thu, 8 Feb 2001 12:09:58 +0000 Subject: required vs optional In-Reply-To: <3A81916C.7E87ED03@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Wed, Feb 07, 2001 at 06:18:20PM +0000 References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> Message-ID: <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> On Wed, Feb 07, 2001 at 06:18:20PM +0000, Peter Rice wrote: > > req opt greyed out > > Y N N > > Y Y N > > N N N > > N Y Y > > Swap those last two. opt:y means it can be useful opt:n means you can change > it if you dare. This still leaves me confused. Swapping the last two gives: req opt greyed out Y N N Y Y N N N Y N Y N The defaults are both N, so not specifying either req or opt indicates a greyed-out option, which cannot be true surely? Can you please spell out for me exactly when we can be sure (during processing, not just at the time of producing help) when we know a question will be ignored (ie changing it's value has no effect)? > > Anyway, only when it's optional and not required should I grey them out. > > However as optional defaults to N only places where opt:Y is explicitly stated > > will I ever grey out paramaters, which unfortunately includes all those > > graphics options (eg this is syco): > > Some misunderstanding of 'required' perhaps? And optional too - I considered optional:N to indicate a mandatory question, but that's the purpose of required. > Can you just generate the tcl variable you need? If only req or opt will be > used you can just pick the value of whichever one is being set. 
Spare 'N' > settings can be safely ignored, as they are the default anyway. What if both req and opt are used (ignoring any opt:N and req:N settings)? embossdata looks to be the only one that uses both (but I didn't also check for param). Filename is defined as opt:Y req:$(fetch). I suppose as it's opt:Y then it may be useful and so should never be displayed as greyed-out. (I have been wondering about an "optional" tab so that the option values are automatically put into a separate area of the dialogue.) James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From Peter.Rice at uk.lionbioscience.com Thu Feb 8 12:19:26 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 08 Feb 2001 12:19:26 +0000 Subject: required vs optional References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A828ECE.E5E7F2BB@lionbio.co.uk> James Bonfield wrote: > The defaults are both N, so not specifying either req or opt indicates a > greyed-out option, which cannot be true surely? If both are N, EMBOSS will not prompt. The application will always receive a value, and it can always be set through the command line. > Can you please spell out for me exactly when we can be sure (during > processing, not just at the time of producing help) when we know a question > will be ignored (ie changing it's value has no effect)? You can't be sure - only the application code can tell you that. 
But you can be sure that the command line version will process everything in one pass so all the information is in the ACD file. > What if both req and opt are used (ignoring any opt:N and req:N settings)? > embossdata looks to be the only one that uses both (but I didn't also check > for param). Filename is defined as opt:Y req:$(fetch). I suppose as it's opt:Y > then it may be useful and so should never be displayed as greyed-out. Yes. If opt or req are Y you should not grey out. > (I have been wondering about an "optional" tab so that the option values are > automatically put into a separate area of the dialogue.) That is what we rather expected GUI developers to do. If you feel daring, you could add an advanced tab for the greyed out ones, but they may be ignored by the application. Our assumption was that anything that should appear in the GUI would be opt, but there are certainly some options that have no opt:Y at present. Feel free to suggest cases for promption to opt:Y. -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From jkb at mrc-lmb.cam.ac.uk Thu Feb 8 12:43:49 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Thu, 8 Feb 2001 12:43:49 +0000 Subject: required vs optional In-Reply-To: <3A828ECE.E5E7F2BB@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Thu, Feb 08, 2001 at 12:19:26PM +0000 References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> <3A828ECE.E5E7F2BB@lionbio.co.uk> Message-ID: <20010208124349.F23361@arran.mrc-lmb.cam.ac.uk> On Thu, Feb 08, 2001 at 12:19:26PM +0000, Peter Rice wrote: > If both are N, 
EMBOSS will not prompt. The application will always receive a > value, and it can always be set through the command line. I think this is where the confusion comes from. ACD is designed for command lines where each question is presented one at a time, whereas I'm dealing in GUIs where the questions need to be shown all at once. For added complexity the users may answer questions out of order. > > Can you please spell out for me exactly when we can be sure (during > > processing, not just at the time of producing help) when we know a question > > will be ignored (ie changing it's value has no effect)? > > You can't be sure - only the application code can tell you that. So this is getting back to my original proposal. I'd come to the conclusion that the required and optional attributes where not enough to indicate whether an application needs a value for an option. Hence the suggestion of an extra attribute that indicates this (via an expression). Without this I basically have to have all options permanently available to the user even though I know many are useless (eg the user has selected not to perform a plot in pepwheel, but they are still asked to provide information about how the plot should look). I know on a web-form there's nothing that can be done about such matters, but most real windows of unix applications do make use of greying out unneeded options so that the complexity to the user is less. Would you have any objections to me making use of a needed: attribute? > But you can be sure that the command line version will process everything in > one pass so all the information is in the ACD file. But above you indicated that some information is known only by the application and so it isn't in the ACD file. > > (I have been wondering about an "optional" tab so that the option values are > > automatically put into a separate area of the dialogue.) > > That is what we rather expected GUI developers to do. 
The main problem is that the natural grouping is separated. Eg pepwheel -h indicates that the options tab would contain how to plot the wheel, but whether to perform the plot is in a different tab (advanced). Also for syco -graph and -outfile are both listed as mandatory, and yet the program will only ever use one or the other and never both (depending on the advanced parameter -plot). This would be rather confusing to the user. James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From Peter.Rice at uk.lionbioscience.com Thu Feb 8 12:58:21 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 08 Feb 2001 12:58:21 +0000 Subject: required vs optional References: <20010207151329.A12981@arran.mrc-lmb.cam.ac.uk> <3A816A98.2BA3C3FF@lionbio.co.uk> <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> <3A828ECE.E5E7F2BB@lionbio.co.uk> <20010208124349.F23361@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A8297ED.256D80E8@lionbio.co.uk> James Bonfield wrote: > I think this is where the confusion comes from. ACD is designed for command > lines where each question is presented one at a time, whereas I'm dealing in > GUIs where the questions need to be shown all at once. For added complexity > the users may answer questions out of order. ACD is intended to cope with both. It forces the application to demand all input at the beginning. All we need to do is tweak whatever is needed to make GUIs comfortable. > Would you have any objections to me making use of a needed: attribute? Can we make opt/req do what you want? 
The original intention was to use opt for this (with an expression if necessary). > > But you can be sure that the command line version will process everything in > > one pass so all the information is in the ACD file. > > But above you indicated that some information is known only by the application > and so it isn't in the ACD file. Only in so far as the application can get a value for any option, and do whatever it wants with it. In effect, by greying out options, you are forcing EMBOSS to use the default value (which is what I would expect). > The main problem is that the natural grouping is separated. Eg pepwheel -h > indicates that the options tab would contain how to plot the wheel, but > whether to perform the plot is in a different tab (advanced). Hmmm. Well, we could make that option an opt:Y as well but it seems your tab system is rather artificial if you only have required and optional tabs. Is this fixed more easily by the option groups we were discussing earlier? > Also for syco -graph and -outfile are both listed as mandatory, and yet the > program will only ever use one or the other and never both (depending on the > advanced parameter -plot). This would be rather confusing to the user. Help will list them as mandatory because they both could be true. When you run "syco -help" it has no way to guess what the value of "plot" will be. Your interface can be more cunning by greying out -graph unless -plot is selected (ah, but as it is not currently optional you will not display it). Question: can you avoid greying out all options that are used in dependencies (that appear as $(name) or $(name.something) in the ACD definitions)?
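One way to find the options that are used in dependencies is to scan the ACD file for $(name) references. A rough sketch only: acd_deps is a made-up name, not an EMBOSS tool, and it assumes option names are made of letters, digits and underscores and that grep supports -o (GNU and BSD grep do, but it is not POSIX).

```shell
#!/bin/sh
# acd_deps FILE (hypothetical helper, not part of EMBOSS):
# print each option name that appears as $(name) or $(name.something)
# in an ACD file, one per line, sorted, duplicates removed.
acd_deps() {
    # grep -o prints only the matched text, e.g. "$(plot";
    # the match stops at the first character that is not part of a name,
    # so "$(name.something)" yields "$(name".
    grep -o '\$([A-Za-z_][A-Za-z0-9_]*' "$1" |
        sed 's/^\$(//' |   # strip the leading "$("
        sort -u
}
```

Run against the syco definitions quoted earlier in the thread this would list plot, so a GUI could keep -plot enabled even though it has neither req nor opt set.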
-- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From Peter.Rice at uk.lionbioscience.com Thu Feb 8 13:13:45 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 08 Feb 2001 13:13:45 +0000 Subject: required vs optional References: <20010207165930.C19630@arran.mrc-lmb.cam.ac.uk> <3A8180D6.F158B05C@lionbio.co.uk> <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> <3A828ECE.E5E7F2BB@lionbio.co.uk> <20010208124349.F23361@arran.mrc-lmb.cam.ac.uk> <3A8297ED.256D80E8@lionbio.co.uk> <20010208130349.G23361@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A829B89.EC8C7891@lionbio.co.uk> James Bonfield wrote: > I think you understand why I want to grey-out options. I do not want to > grey-out things which are merely optional, but things which really have no > effect whatsoever. For example -graph when -plot is 0. In this case EMBOSS > will not be using a default value, rather it won't be using it at all (and > specifying it on the command line will have no effect). This sort of thing > happens more often than you'd expect. > > Hence I do not think we can use opt/req for this as they already have other > purposes. OK. Since you are a good boy (i.e. you can cope with ACD expressions and stuff) let us experiment with a new option for GUIs only. How about: needed:y or needed:n default value would be assumed to be (parameter or required or optional) unless you think a default of Y is better. for cases where this is not enough, we expect to use an expression. In ajax/ajacd.c: static int nDefAttr = 11; ... change to 12 enum AcdEDef { ... add at the end: DEF_NEEDED (with a comma for DEF_EXPECTED on the line above) AcdOAttr acdAttrDef[] = { ... 
add before the NULL, VT_NULL line: {"needed", VT_BOOL}, /* value is needed, i.e. useful */ EMBOSS should now accept needed: in ACD files for all definitions (note: this is how comment, corba and style should have been added for cleanness) -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From jrvalverde at cnb.uam.es Thu Feb 8 13:27:48 2001 From: jrvalverde at cnb.uam.es (jrvalverde at cnb.uam.es) Date: Thu, 8 Feb 2001 14:27:48 +0100 (MET) Subject: EMBOSS - Indexing breaks on large databases In-Reply-To: <200102080900.f1890pU3292249@embnet.cnb.uam.es> Message-ID: <200102081327.f18DRnN3428709@embnet.cnb.uam.es> wrote: > > #/bin/sh > cp $1 $1.orig > mv $1 tmpfile > sed -e 's/CC="cc"/CC="cc -64"/g' $1 > tmpfile > sed -e 's/CC = cc/CC = cc -64/g' $1 > tmpfile > sed -e 's/\/bin\/ld/\/bin\/ld -64/g' $1 > tmpfile > rm tmpfile > ## if you are sure, uncomment this > #rm $1.orig > Sorry, that won't work: I was (and am) being interrupted and writing from memory, and slipped on this. The correct recipe should obviously not overwrite the temporary file at every step, but rather be #!/bin/sh cp $1 $1.orig mv $1 tmpfile sed -e 's/CC="cc"/CC="cc -64"/g' tmpfile > $1 sed -e 's/CC = cc/CC = cc -64/g' $1 > tmpfile sed -e 's/\/bin\/ld -n32/\/bin\/ld -64/g' tmpfile > $1 rm tmpfile ## if you are sure, uncomment this #rm $1.orig The problem seems to be that some parts of the compilation are handled by Makefile and others by libtool.sh, which makes overriding more difficult. Both are generated by configure, hence the right thing to do (if I find enough uninterrupted time) would be to change it instead.
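The same recipe can also be written as a single sed call with several -e expressions, which has much the same effect as the "sed -f" suggestion made earlier in the thread and avoids juggling a temporary file. A minimal sketch, not a tested IRIX build recipe: add_64_flags is a made-up name, and the three substitutions are simply the ones from the script above.

```shell
#!/bin/sh
# add_64_flags FILE (hypothetical helper, not part of EMBOSS):
# keep a backup in FILE.orig, then rewrite FILE with the three
# substitutions applied in one pass.
add_64_flags() {
    cp "$1" "$1.orig" || return 1
    # Read from the backup and write over the original file, so the
    # input is never truncated while sed is still reading it, and the
    # file keeps its permissions (libtool must stay executable).
    sed -e 's/CC="cc"/CC="cc -64"/g' \
        -e 's/CC = cc/CC = cc -64/g' \
        -e 's|/bin/ld -n32|/bin/ld -64|g' \
        "$1.orig" > "$1"
}

# Process every file named on the command line (nothing if none given).
for f in "$@"; do
    add_64_flags "$f"
done
```

Saved as chfile.sh it could be driven with something like: find . -type f ! -name '*.orig' -exec /path/to/chfile.sh {} \; where the extra -name test keeps find from processing the backups it has just created. Note that the substitutions are not idempotent: running the script twice over the same tree would append -64 twice.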
j From ableasby at hgmp.mrc.ac.uk Thu Feb 8 14:04:04 2001 From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk) Date: Thu, 8 Feb 2001 14:04:04 GMT Subject: EMBOSS - Indexing breaks on large databases Message-ID: <200102081404.OAA25562@bromine.hgmp.mrc.ac.uk> You should indeed be worried about the duplicate entries (normally caused by indexing the database and updates at the same time) as you'll never be certain which one is retrieved. It is best to index them separately. wrt the 64-bit thing on this same thread. Coincidentally I'm working on this at the moment. The compilation flags are not the only concern. There's ftell64's, ftello's [and fseeks] and, of course, the indexing. That's not even mentioning ajints and ajlongs :-) Alan From jkb at mrc-lmb.cam.ac.uk Thu Feb 8 14:46:27 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Thu, 8 Feb 2001 14:46:27 +0000 Subject: required vs optional In-Reply-To: <3A829B89.EC8C7891@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Thu, Feb 08, 2001 at 01:13:45PM +0000 References: <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> <3A828ECE.E5E7F2BB@lionbio.co.uk> <20010208124349.F23361@arran.mrc-lmb.cam.ac.uk> <3A8297ED.256D80E8@lionbio.co.uk> <20010208130349.G23361@arran.mrc-lmb.cam.ac.uk> <3A829B89.EC8C7891@lionbio.co.uk> Message-ID: <20010208144627.I23361@arran.mrc-lmb.cam.ac.uk> On Thu, Feb 08, 2001 at 01:13:45PM +0000, Peter Rice wrote: > OK. Since you are a good boy (i.e. you can cope with ACD expressions and > stuff) let us experiment with a new option for GUIs only. > > How about: > > needed:y or needed:n > > default value would be assumed to be (parameter or required or optional) > unless you think a default of Y is better. I think that's enough. The default will indeed cover most cases. 
I'll need to ponder about the best way to code this default, but I guess that's my problem :-} On a related note, I've just had a hunt for cases where param and req are both specified differently. I found one case in vectorstrip: ... bool: vectorfile [ param: Y def: Y prompt: "Are your vector sequences in a file?" ] infile: vectors [ param: Y req: @($(vectorfile)?Y:N) nullok: Y def: "" prompt: "Name of vectorfile" ] ... The help lists this as: Mandatory qualifiers (* if not always prompted): [-sequence] seqall (no help text) seqall value [-[no]vectorfile] bool Are your vector sequences in a file? * [-vectors] infile Name of vectorfile * -linkera string 5' sequence * -linkerb string 3' sequence -mismatch integer Max allowed % mismatch -[no]besthits bool Show only the best hits (minimise mismatches)? [-outf] outfile (no help text) outfile value [-outseq] seqoutall (no help text) seqoutall value Optional qualifiers: (none) Advanced qualifiers: (none) So on the command line we can do: vectorstrip dna.embl 1 vector_file outfile This asks a few remaining questions, then uses a vector file named vector_file, and finally saves the results to outfile. However if I try: vectorstrip dna.embl 0 outfile then I have problems because outfile isn't a vector file. To specify this correctly I have to use: vectorstrip dna.embl 0 vector_file outfile vector_file must exist, even though it's not used. This is because of specifying both req and param. Param is always Y regardless of the previous question, but it's not always required. This makes sense when specifying "-qualifier value" syntax, but not with the param syntax. 
I've checked and the following works as I originally expected: infile: vectors [ param: @($(vectorfile)?Y:N) nullok: Y def: "" prompt: "Name of vectorfile" ] However this does mean there's a different number of items on the command line depending on whether vectorfile was set or not, however such things are presumably only an issue for programs invoking vectorstrip, in which case they'd be more self-documenting if they use -qualifier anyway. James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From leonardz at bioinfo.sickkids.on.ca Thu Feb 8 14:51:44 2001 From: leonardz at bioinfo.sickkids.on.ca (Len F. Zaifman) Date: Thu, 08 Feb 2001 09:51:44 -0500 Subject: EMBOSS - Indexing breaks on large databases References: <200102081404.OAA25562@bromine.hgmp.mrc.ac.uk> Message-ID: <3A82B280.5F5878D@bioinfo.sickkids.on.ca> ableasby at hgmp.mrc.ac.uk wrote: > > You should indeed be worried about the duplicate entries > (normally caused by indexing the database and updates > at the same time) as you'll never be certain which one is > retrieved. It is best to index them separately. > > wrt the 64-bit thing on this same thread. Coincidentally > I'm working on this at the moment. The compilation flags > are not the only concern. There's ftell64's, ftello's > [and fseeks] and, of course, the indexing. That's not > even mentioning ajints and ajlongs :-) > > Alan Thanks Alan but I think I may not have been clear enough: I don't think that dbiflat it self has the problem (although I could be wrong). I think the problem comes from: > the indexing breaks due to sort running out of memory: << my comment > > UX:sort: ERROR: Out of memory before merge: Not enough space << the error reported So I believe it is a system issue where my resident memory set size needs to be increased. 
I was hoping someone had a workaround to get sort to work within available memory, and not request an amount beyond the limit. Having said that, getting a 64 bit clean emboss would be great. Thanks to the others who responded as well. From ableasby at hgmp.mrc.ac.uk Thu Feb 8 15:01:30 2001 From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk) Date: Thu, 8 Feb 2001 15:01:30 GMT Subject: EMBOSS - Indexing breaks on large databases Message-ID: <200102081501.PAA06513@bromine.hgmp.mrc.ac.uk> Hi Leonard, No, you were indeed perfectly clear. I was replying on the second matter since the sort error message is a system one. The emboss indexing programs just use a C "system()" call for performing the sorts so if anything goes wrong after that point it's an SEP (Somebody Else's Problem) as far as EMBOSS is concerned. There is a sort option for those programs but I would NOT recommend using it. I don't think the person who wrote that bit of code/modified it was ever entirely happy with it and I am going to clean it up as part of the 64 bit process. Rgds Alan From Peter.Rice at uk.lionbioscience.com Thu Feb 8 15:07:27 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 08 Feb 2001 15:07:27 +0000 Subject: EMBOSS - Indexing breaks on large databases References: <200102081501.PAA06513@bromine.hgmp.mrc.ac.uk> Message-ID: <3A82B62F.D1FE8406@lionbio.co.uk> ableasby at hgmp.mrc.ac.uk wrote: > There is a sort option for those programs but I would NOT > recommend using it. I don't think the person who wrote that bit > of code/modified it was ever entirely happy with it and I am > going to clean it up as part of the 64 bit process.
That was originally added because of very strange problems with GNU sort in Norway, which appeared to have a mind of its own (GNU sort, not Norway) when deciding how to sort entry names. "dbiflat -nocleanup" will leave the temporary files around, so you can try sorting them by hand to see what resources it needs. The "-debug" command line option will write a dbiflat.dbg file that includes the sort commands used. The last one will be the one that fell over. -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From jkb at mrc-lmb.cam.ac.uk Thu Feb 8 15:36:19 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Thu, 8 Feb 2001 15:36:19 +0000 Subject: Corrections to ACD Syntax manual Message-ID: <20010208153619.A26987@arran.mrc-lmb.cam.ac.uk> Just a couple of small problems I encountered near the start of this manual. 1. "Parameters and qualifiers are defined by a single token followed by either a colon ':'" and "The first token in the file must be "application" directly followed by a colon ':' or an equal sign '='." I think the phrase "directly followed" is misleading, as "token : value" is just as valid as "token: value". Whether this is deliberate or not I do not know, but I see both examples liberally used. 2. "Values can be delimited (i.e. treated as one token) by any of the following pairs, which are stripped as the value is parsed : '' {} () [] <> " The only quoting I find used is double quotes, which isn't listed above. I also wonder whether this many quoting styles is just a symptom of the lack of an escaping mechanism. Adding backslash support would probably allow great simplification of this. James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From Peter.Rice at uk.lionbioscience.com Thu Feb 8 15:48:03 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 08 Feb 2001 15:48:03 +0000 Subject: Corrections to ACD Syntax manual References: <20010208153619.A26987@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A82BFB3.339115D7@lionbio.co.uk> James Bonfield wrote: > "The first token in the file must be "application" directly followed by a > colon ':' or an equal sign '='." > > I think the phrase "directly followed" is misleading, as "token : value" is > just as valid as "token: value". Whether this is deliberate or not I do not > know, but I see both examples liberally used. The original parser was very forgiving. It allows a few other formats too. It should be fixed. Meanwhile, the documentation can be economical with the truth by only giving the officially approved style. > 2. "Values can be delimited (i.e. treated as one token) by any of the > following pairs, which are stripped as the value is parsed : > > '' {} () [] <> > > The only quoting I find used is double quotes, which isn't listed above. Oops. Never noticed that one. Thanks. > I also wonder whether this many quoting styles is just a symptom of the lack > of escaping mechanism. Adding backslash support would probably allow great > simplification of this. Spot on there. Plus it was great fun to write :-) The alternatives never caught on, and others have pointed out that the brackets could be useful for alternative forms of syntax (lists of values, for example) in some future release. Should be fixed - at least backslash support should be added. Probably not the highest priority, something for the weekend perhaps. 
-- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From jkb at mrc-lmb.cam.ac.uk Thu Feb 8 15:57:18 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Thu, 8 Feb 2001 15:57:18 +0000 Subject: required vs optional In-Reply-To: <3A829B89.EC8C7891@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Thu, Feb 08, 2001 at 01:13:45PM +0000 References: <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> <3A828ECE.E5E7F2BB@lionbio.co.uk> <20010208124349.F23361@arran.mrc-lmb.cam.ac.uk> <3A8297ED.256D80E8@lionbio.co.uk> <20010208130349.G23361@arran.mrc-lmb.cam.ac.uk> <3A829B89.EC8C7891@lionbio.co.uk> Message-ID: <20010208155718.J23361@arran.mrc-lmb.cam.ac.uk> On Thu, Feb 08, 2001 at 01:13:45PM +0000, Peter Rice wrote: > How about: > > needed:y or needed:n > > default value would be assumed to be (parameter or required or optional) > unless you think a default of Y is better. Looking at this in more detail, I'm unsure of the default value. Specifically "parameter or required or optional" implies that the default is no, as the default for all three dependent variables is also no. A rough check implies there are some 400ish items where none of parameter, require or optional are defined (to be anything other than N or just the default). Indeed it's only likely to be the case that we wish to set needed:n for an option that has required or optional as an expression, which is probably only a small percentage. I found roughly 50 expressions used in opt and req. If we set the default of needed to be Y then most (but perhaps not all) of these will need the req: expression duplicating in the needed: value. Given that 50 is much less than 400 may I ask for the default to be needed:y? 
I'm only thinking about producing less emboss changes and it has nothing to do with the fact that dealing with a non-constant default value is a bit tricky for me. :-) James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From jkb at mrc-lmb.cam.ac.uk Thu Feb 8 16:05:06 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Thu, 8 Feb 2001 16:05:06 +0000 Subject: Corrections to ACD Syntax manual In-Reply-To: <3A82BFB3.339115D7@lionbio.co.uk>; from Peter.Rice@uk.lionbioscience.com on Thu, Feb 08, 2001 at 03:48:03PM +0000 References: <20010208153619.A26987@arran.mrc-lmb.cam.ac.uk> <3A82BFB3.339115D7@lionbio.co.uk> Message-ID: <20010208160506.K23361@arran.mrc-lmb.cam.ac.uk> On Thu, Feb 08, 2001 at 03:48:03PM +0000, Peter Rice wrote: > Should be fixed - at least backslash support should be added. The only tricky bit is dealing with the existing backslash mechanism which is used for adding newline characters (eg see transeq.acd). (Which incidentally I couldn't find documented either...) For what it's worth, my own parser (written in vanilla tcl) uses the following regular expressions for strings: set tlist { {^.(.*).$} {\1} {\\[ \n\r]+} {\\n} {[ \n\r]+} { } {\\n} "\n" {\\(.)} {\1} } set rules [format { # ... {"(\\.|[^"\\])*"} STRING {%s} {'(\\.|[^'\\])*'} STRING {%s} {<(\\.|[^>\\])*>} STRING {%s} {\{(\\.|[^\}\\])*\}} STRING {%s} # ... } $tlist $tlist $tlist $tlist] This is my own lex-style hack. The rules are matched one at a time in the order they are listed. If a rule matches then the token type (STRING) is added to the token list and the token value is edited, if appropriate, based on a series of 'regsub' calls (listed in the $tlist variable here).
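For readers who do not use Tcl, the same lex-style scheme can be sketched in Python. This is an approximation under stated assumptions, not a port of the real parser: the four STRING patterns and the five substitutions are transcribed from the Tcl above, substitutions are assumed to be applied globally (as with "regsub -all"), and the catch-all WORD rule and the `tokenize` function are hypothetical additions standing in for the elided "# ..." rules.

```python
import re

# Edits applied, in order, to the text of every STRING token.
STRING_EDITS = [
    (r"^.(.*).$", r"\1"),      # drop the opening and closing delimiter
    (r"\\[ \n\r]+", r"\\n"),   # backslash + whitespace -> the two characters \n
    (r"[ \n\r]+", " "),        # squash remaining whitespace runs to one space
    (r"\\n", "\n"),            # the two characters \n -> a real newline
    (r"\\(.)", r"\1"),         # any other backslash escape -> the bare character
]

# (pattern, token type, edits), tried in order at the current position.
RULES = [
    (re.compile(r'"(\\.|[^"\\])*"', re.DOTALL), "STRING", STRING_EDITS),
    (re.compile(r"'(\\.|[^'\\])*'", re.DOTALL), "STRING", STRING_EDITS),
    (re.compile(r"<(\\.|[^>\\])*>", re.DOTALL), "STRING", STRING_EDITS),
    (re.compile(r"\{(\\.|[^}\\])*\}", re.DOTALL), "STRING", STRING_EDITS),
    # Assumed catch-all so the sketch runs; the original rules are elided.
    (re.compile(r"\S+"), "WORD", []),
]


def tokenize(text):
    """Return a list of (token type, edited value) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for pattern, kind, edits in RULES:
            match = pattern.match(text, pos)
            if match:
                value = match.group(0)
                for regex, replacement in edits:
                    value = re.sub(regex, replacement, value, flags=re.DOTALL)
                tokens.append((kind, value))
                pos = match.end()
                break
        else:
            raise ValueError("no rule matches at offset %d" % pos)
    return tokens


print(tokenize('info: "Number   of\n   steps"'))
# [('WORD', 'info:'), ('STRING', 'Number of steps')]
```

The order of the substitutions matters: a backslash followed by whitespace is first turned into a marker so that the whitespace squashing in the next step cannot remove the intended newline.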
My STRING definition does include backslashing already, which is of course not strictly correct at the moment (but I don't know how often it really matters). The substitutions (tlist) are a way to simulate the emboss acd parser mechanism of squashing multiple white-space into a single space character, and adding newlines with a single backslash. I'm not suggesting for a minute anyone copies my code, but the regular expressions may be handy for other people trying to parse ACD. James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From Peter.Rice at uk.lionbioscience.com Thu Feb 8 16:07:13 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 08 Feb 2001 16:07:13 +0000 Subject: required vs optional References: <20010207171649.A25071@arran.mrc-lmb.cam.ac.uk> <3A818648.72D3658A@lionbio.co.uk> <20010207175053.C25071@arran.mrc-lmb.cam.ac.uk> <3A81916C.7E87ED03@lionbio.co.uk> <20010208120958.D23361@arran.mrc-lmb.cam.ac.uk> <3A828ECE.E5E7F2BB@lionbio.co.uk> <20010208124349.F23361@arran.mrc-lmb.cam.ac.uk> <3A8297ED.256D80E8@lionbio.co.uk> <20010208130349.G23361@arran.mrc-lmb.cam.ac.uk> <3A829B89.EC8C7891@lionbio.co.uk> <20010208155718.J23361@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A82C431.FBC08E98@lionbio.co.uk> James Bonfield wrote: > Given that 50 is much less than 400 may I ask for the default to be needed:y? > > I'm only thinking about producing less emboss changes and it has nothing to do > with the fact that dealing with a non-constant default value is a bit tricky > for me. :-) No problem at all - we plan to ignore it, unless we can think of a use for it (like a -needed command line qualifier that only prompts for the needed values. You are completely free to choose your own default. 
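The two candidate defaults for "needed" discussed in this thread can be compared with a short sketch. It is only an illustration: `needed_derived`, `needed_constant` and `is_needed` are hypothetical names, and the attribute values are assumed to be already evaluated to booleans rather than ACD expressions.

```python
# Hypothetical helpers for a GUI front end; not part of EMBOSS.
def needed_derived(attrs):
    """Default suggested first: parameter or required or optional."""
    return bool(attrs.get("parameter") or attrs.get("required")
                or attrs.get("optional"))


def needed_constant(attrs):
    """Default requested by James: always Y unless stated otherwise."""
    return True


def is_needed(attrs, default_rule):
    """An explicit needed: value wins; otherwise fall back to the rule."""
    if "needed" in attrs:
        return attrs["needed"]
    return default_rule(attrs)


wheel = {}                    # pepwheel "wheel": no req/opt/param given
steps = {"optional": True}    # pepwheel "steps": opt: Y

print(is_needed(wheel, needed_derived))   # False
print(is_needed(wheel, needed_constant))  # True
print(is_needed(steps, needed_derived))   # True
```

With the derived rule every "advanced" qualifier such as `wheel` would be treated as not needed unless its ACD entry says otherwise, which is why a constant Y touches fewer ACD files.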
-- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From Peter.Rice at uk.lionbioscience.com Thu Feb 8 16:30:58 2001 From: Peter.Rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 08 Feb 2001 16:30:58 +0000 Subject: required vs optional References: Message-ID: <3A82C9C2.D99E240E@lionbio.co.uk> David Martin wrote (but not to the list): > > On Thu, 8 Feb 2001, Peter Rice wrote: > > > No problem at all - we plan to ignore it, unless we can think of a use for it > > (like a -needed command line qualifier that only prompts for the needed > > values. You are completely free to choose your own default. > > Umm.. surely there should be a defined default at the ACD level, not at > the acd processeing application level, and James is right with needed: y > because it is somewhat stupid to have a default for options that says they > are not needed (in which case why put them in?) EMBOSS will have a default (N) but will not be using it. We can define a default behaviour later if we want to make use of it. A -needed qualifier would be fun, but probably not of general interest. > Is there a copy of the ACD guide in anything other than HTML or PS, and if > so, can I have it to add to the collection. I am tempted to go mad and > rewrite everything into DocBook.. There was an MS-Word original, but I believe it is now maintained in HTML. 
-- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From gwilliam at hgmp.mrc.ac.uk Thu Feb 8 16:36:04 2001 From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522) Date: Thu, 08 Feb 2001 16:36:04 +0000 Subject: required vs optional References: <3A82C9C2.D99E240E@lionbio.co.uk> Message-ID: <3A82CAF4.D0F38DE5@hgmp.mrc.ac.uk> Peter Rice wrote: > > Is there a copy of the ACD guide in anything other than HTML or PS, and if > > so, can I have it to add to the collection. I am tempted to go mad and > > rewrite everything into DocBook.. > > There was an MS-Word original, but I believe it is now maintained in HTML. This is the case. -- Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512 mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/ Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK From jrvalverde at cnb.uam.es Fri Feb 9 12:41:37 2001 From: jrvalverde at cnb.uam.es (jrvalverde at cnb.uam.es) Date: Fri, 9 Feb 2001 13:41:37 +0100 (MET) Subject: 64 bit & irix Message-ID: <200102091241.f19Cfcj3491505@embnet.cnb.uam.es> So, I finally had time to take a look at the configure script. It already contains a check for 64 bit irix compilation, and, when correctly used, generates the correct compilation scripts and makefiles. The problem is that it checks whether 64-bit is enabled _by default_, not whether it is available or has been user-selected. What ./configure does is create a trivial C file, compile it and check whether it is -o32, -n32 or -64, and then decide which compiler to use. Now, there is a simpler trick than the one I proposed: it all reduces to making the compiler/linker use the appropriate defaults _before_ running configure. 
For this: - remove gcc from your path - create a compiler.defaults file containing -DEFAULT:abi=64:isa=mips4:proc=r10k - define COMPILER_DEFAULTS_PATH setenv COMPILER_DEFAULTS_PATH `pwd` - run configure And that's it. Defining cc to be 'cc -64' and ld to be 'ld -64' and running configure dosn't work. In that case, the linker used is still the -n32 linker, and build fails with linking errors. You need to go the obscure trick way. One reason is configure hard codes the 'default' compilation method (which being default, it shouldn't need to) for the linker. Hence although if one aliases 'cc/gcc' to 64bit, it will still hardcode ld to be the default 'ld -32'. Should it just use "cc/gcc" and "ld" only, this wouldn't happen as defaults would be carried all along, this way the explicit flag overrides the aliases. So unless you get the explicit flags right by using the above trick, it won't work. BTW, this mess seems to happen (from what I gather looking at ./configure) only on IRIX. Makes sense, since that's probably the only system too that tries to maintain three incompatible binary systems defaulting compilation to 32 bits in 64 bit machines (sigh). On a side note: it won't work for gcc: the -64 test in configure is only used if the compiler is 'cc'. No check for --mabi=64 seems to be present. For gcc I suspect one will need to run ./configure and then substitute all instances of 'gcc' by 'gcc --mabi=64' and of 'ld -n32' by 'ld -64'. Haven't tested though. Or configure as above with system cc and later alias cc to gcc --mabi=64. BTW, David, might you update the trick you added to the admin guide? Thanks. j From david.martin at biotek.uio.no Fri Feb 9 13:53:46 2001 From: david.martin at biotek.uio.no (David Martin) Date: Fri, 9 Feb 2001 14:53:46 +0100 Subject: [EMBnet ADMIN] 64 bit & irix In-Reply-To: <200102091241.f19Cfcj3491505@embnet.cnb.uam.es> Message-ID: OK, I'll try this. I'll have to try to get 64 bit versions of the PNG libraries first. 
..d The admin guide will get updated in due course. On Fri, 9 Feb 2001 jrvalverde at cnb.uam.es wrote: > So, I finally had time to take a look at the configure script. > > It already contains a check for 64 bit irix compilation, and, > when correctly used, generates the correct compilation scripts > and makefiles. The problem is that it checks whether 64-bit is > enabled _by default_, not whether it is available or has been > user-selected. > > What ./configure does is create a trivial C file, compile it > and check whether it is -o32, -n32 or -64, and then decide > which compiler to use. > > Now, there is a simpler trick than the one I proposed: it all > reduces to making the compiler/linker use the appropriate > defaults _before_ running configure. For this: > > - remove gcc from your path > > - create a compiler.defaults file containing > > -DEFAULT:abi=64:isa=mips4:proc=r10k > > - define COMPILER_DEFAULTS_PATH > > setenv COMPILER_DEFAULTS_PATH `pwd` > > - run configure > > And that's it. > > Defining cc to be 'cc -64' and ld to be 'ld -64' and running > configure dosn't work. In that case, the linker used is still > the -n32 linker, and build fails with linking errors. You > need to go the obscure trick way. > > One reason is configure hard codes the 'default' compilation > method (which being default, it shouldn't need to) for the > linker. Hence although if one aliases 'cc/gcc' to 64bit, > it will still hardcode ld to be the default 'ld -32'. Should > it just use "cc/gcc" and "ld" only, this wouldn't happen as > defaults would be carried all along, this way the explicit > flag overrides the aliases. So unless you get the explicit > flags right by using the above trick, it won't work. > > BTW, this mess seems to happen (from what I gather looking at > ./configure) only on IRIX. Makes sense, since that's probably > the only system too that tries to maintain three incompatible > binary systems defaulting compilation to 32 bits in 64 bit > machines (sigh). 
> > On a side note: it won't work for gcc: the -64 test in configure > is only used if the compiler is 'cc'. No check for --mabi=64 > seems to be present. For gcc I suspect one will need to run > ./configure and then substitute all instances of 'gcc' by > 'gcc --mabi=64' and of 'ld -n32' by 'ld -64'. Haven't tested > though. Or configure as above with system cc and later alias > cc to gcc --mabi=64. > > BTW, David, might you update the trick you added to the admin > guide? Thanks. > > j > > > > --------------------------------------------------------------------- * Dr. David Martin Biotechnology Centre of Oslo * * Node Manager Gaustadalleen 21 * * The Norwegian EMBNet Node P.O. box 1125 Blindern * * tel +47 22 84 05 35 N-0317 Oslo * * fax +47 22 84 05 01 Norway * --------------------------------------------------------------------- I will be leaving the Norwegian EMBnet node on 23rd February. All work related mail should be addressed to admin at embnet.uio.no where my successor, Rune Groven will deal with it. All personal email should be sent to dmartin at hgmp.mrc.ac.uk from whence it will be automatically forwarded to me. Spam should continue to be sent to /dev/null From jkb at mrc-lmb.cam.ac.uk Tue Feb 13 13:03:06 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Tue, 13 Feb 2001 13:03:06 +0000 Subject: More ACD Syntax manual corrections Message-ID: <20010213130306.A25301@arran.mrc-lmb.cam.ac.uk> The calculated attributes are poorly defined between seqall and seqset. The main table under the heading of "Ajax Data Types" lists totweight as calculated in seqset, but not seqall. Also all of the sequence types have nucleic as a calculated attribute. Later on in the "Calculated Attributes" section another table is presented where seqset is mislabeled as seqall. Nucleic is also missing from all of the types shown. Finally, dbifasta, dbiflat and dbigcg all use the keyword "is:" in ACD expressions. 
For now I'm assuming that "is: $(dbname)" means $(dbname) != "". Could is: be added to the documentation? (Perhaps it is, but searching for "is" is lunacy, and is: finds nothing relevant.) James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From jkb at mrc-lmb.cam.ac.uk Tue Feb 13 19:11:53 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Tue, 13 Feb 2001 19:11:53 +0000 Subject: pepwindow bugs? Message-ID: <20010213191153.A1360@arran.mrc-lmb.cam.ac.uk> I'm having trouble getting pepwindow to look anything like the results produce using xpip's (OLD code!) Kyte & Doolittle implementation. For an easier example than xpip (which is available from our ftp site, but I don't expect many to have it) also try looking at the java program at: http://arbl.cvmbs.colostate.edu/molkit/hydropathy/ On the demo sequence there xpip and this java app give essentially identical results, while pepwindow gives something looking VERY different. Any explanations? James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From jkb at mrc-lmb.cam.ac.uk Tue Feb 13 19:16:14 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Tue, 13 Feb 2001 19:16:14 +0000 Subject: abiview bug Message-ID: <20010213191614.A9162@arran.mrc-lmb.cam.ac.uk> I'm getting assertion failed in quite a few ABI3100 files with abiview. Unfortunately these files are not public so I cannot make them available to the emboss-dev team. So are there any obvious ways I can provide more debugging info? 
(Except for this:) Uncaught exception Assertion failed raised at ../../ajax/ajmem.c:78 EMBOSS An error in ../../ajax/ajexcept.c at line 56: aborting... By breaking at _exit I get: #0 0x40486ca0 in _exit () from /lib/libc.so.6 #1 0x40405085 in exit () at exit.c:82 #2 0x40171846 in ajMessCrashFL () from /nfs/arran/home5/pubseq/share/private/emboss/EMBOSS-1.9.1//linux-binaries/lib/libajax.so.0 #3 0x4015e1ee in ajExceptRaise () from /nfs/arran/home5/pubseq/share/private/emboss/EMBOSS-1.9.1//linux-binaries/lib/libajax.so.0 #4 0x40170c4d in ajMemCalloc () from /nfs/arran/home5/pubseq/share/private/emboss/EMBOSS-1.9.1//linux-binaries/lib/libajax.so.0 #5 0x40170cfb in ajMemCalloc0 () from /nfs/arran/home5/pubseq/share/private/emboss/EMBOSS-1.9.1//linux-binaries/lib/libajax.so.0 #6 0x4012ab2c in ajGraphxyDataNewI () from /nfs/arran/home5/pubseq/share/private/emboss/EMBOSS-1.9.1//linux-binaries/lib/libajaxg.so.0 #7 0x8049859 in graphDisplay () #8 0x804962f in main () #9 0x403f1b5c in __libc_start_main (main=0x804932c <main>, argc=2, ubp_av=0xbffff0e4, init=0x8048e70 <_init>, fini=0x8049b7c <_fini>, rtld_fini=0x4000d634 <_dl_fini>, stack_end=0xbffff0dc) at ../sysdeps/generic/libc-start.c:129 Alas this is compiled with the standard cc -O options, but if this isn't a known issue then I can try producing a debug version. James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From ableasby at hgmp.mrc.ac.uk Wed Feb 14 00:48:43 2001 From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk) Date: Wed, 14 Feb 2001 00:48:43 GMT Subject: Pepwindow bug? Message-ID: <200102140048.AAA06352@tin.hgmp.mrc.ac.uk> Hi Jack (et al), I can comment on a function but not the program :-) From what you say an ajStrCleanWhite which removes leading, trailing and excess whitespace from a string might do the trick. Author? :-) Alan From ableasby at hgmp.mrc.ac.uk Wed Feb 14 01:27:31 2001 From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk) Date: Wed, 14 Feb 2001 01:27:31 GMT Subject: Pepwindow bug? Message-ID: <200102140127.BAA04190@bromine.hgmp.mrc.ac.uk> Yes, (if I'd got the function name right), just add an ajStrClean(&buffer); after each of the "line++" lines. You can tell it's not one of mine..... I'd have used "++line" .... not that it matters. Alan From gwilliam at hgmp.mrc.ac.uk Wed Feb 14 09:17:33 2001 From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522) Date: Wed, 14 Feb 2001 09:17:33 +0000 Subject: [Fwd: GP and Arka -- software for molecular biology] Message-ID: <3A8A4D2D.CF873866@hgmp.mrc.ac.uk> Don Gilbert wrote: > > Gary, > > Re: wrappers (guis) for emboss -- I have done enough tests > to know that emboss components can be run from SeqPup's > sequence analysis program (Java based).
It may be a while > till I can find time to do enough work though to make > emboss usable from seqpup, but I'm hopeful maybe by this > summer to have something. I'll pass on to emboss > mail lists any progress in using w/ SeqPup. > > -- Don > > -- > -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 > -- gilbertd at bio.indiana.edu -- Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512 mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/ Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK From jkb at mrc-lmb.cam.ac.uk Wed Feb 14 09:37:42 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Wed, 14 Feb 2001 09:37:42 +0000 Subject: pepwindow bugs? In-Reply-To: ; from jackl@cmbi.kun.nl on Wed, Feb 14, 2001 at 01:41:19AM +0100 References: <20010213191153.A1360@arran.mrc-lmb.cam.ac.uk> Message-ID: <20010214093742.B9543@arran.mrc-lmb.cam.ac.uk> Hi Jack, > > results, while pepwindow gives something looking VERY different. Any > > explanations? > > You're absolutely right, the output is absolute rubbish! Very good > for a non-biologist! :-) Credit has to be shared with David Judge as it was a joint discovery. When a true fix is found (I haven't got time to follow the hints at present) could someone please email me the new C file? James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. 
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From ableasby at hgmp.mrc.ac.uk Wed Feb 14 09:52:12 2001 From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk) Date: Wed, 14 Feb 2001 09:52:12 GMT Subject: patched pepwindow(all).c Message-ID: <200102140952.JAA01756@bromine.hgmp.mrc.ac.uk> Hi James, ftp://ftp.uk.embnet.org/pub/EMBOSS/patchfiles/ From jkb at mrc-lmb.cam.ac.uk Wed Feb 14 10:22:51 2001 From: jkb at mrc-lmb.cam.ac.uk (James Bonfield) Date: Wed, 14 Feb 2001 10:22:51 +0000 Subject: abiview bug In-Reply-To: <3A8A5660.F4D90E58@hgmp.mrc.ac.uk>; from tcarver@hgmp.mrc.ac.uk on Wed, Feb 14, 2001 at 09:56:48AM +0000 References: <20010213191614.A9162@arran.mrc-lmb.cam.ac.uk> <20010214095307.C9543@arran.mrc-lmb.cam.ac.uk> <3A8A5660.F4D90E58@hgmp.mrc.ac.uk> Message-ID: <20010214102251.D9543@arran.mrc-lmb.cam.ac.uk> Hello all, I've now identified the problem in the abiview program. The program was assuming that the position of the PLOC block immediately follows the PBAS block, which is not always true. All blocks should be considered as files in a directory - they can physically be stored in any order. Hence you have to query the directory in order to find the file location. Fortunately the patch is simple: *** abiview.c~ Wed Feb 14 09:48:07 2001 --- abiview.c Wed Feb 14 10:05:59 2001 *************** *** 71,76 **** --- 71,77 ---- int i; int base; long int baseO; + long int basePosO; long int numBases; long int numPoints; long int dataOffset[4]; *************** *** 121,126 **** --- 122,129 ---- res4 = (char)(fwo_&BYTE[0]); ajSeqABIReadSeq(fp,baseO,numBases,&nseq); + basePosO = ajSeqABIGetBasePosOffset(fp); /* find PLOC tag & get offset */ + ajFileSeek(fp, basePosO, SEEK_SET); ajSeqABIGetBasePosition(fp,numBases,&basePositions); On a more general note, most people do not keep ABI files on disk as they are simply too large. They typically convert them to SCF instead. (Indeed some machines, eg Licor, write SCF as their native format.) 
We maintain a freely available library (io_lib) of routines for reading and writing ABI, ALF, SCF, CTF (Jean Thierry-mieg's compressed format) and ZTR (my own compressed format). All of ABI->* filters are lossy as there's lots of other bits in the ABI files which no one quite knows what to do with, however the SCF->CTF and SCF->ZTR are lossless (and SCF->ZTR is typically slightly smaller than bzipped SCF). Maybe it makes sense not to duplicate work. io_lib is free although it isn't yet GPLed. That shouldn't be a problem, but if it is I cannot see an issue with GPLing io_lib. Indeed it already looks like parts of io_lib (or at least "ted" which much of it came from) are in emboss; there's a striking similarities in the seqABIGetFlag and getABIIndexEntryLW functions (eg the same bizarre flow controls, identical code layout, and some identical variable names). James -- James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556 Medical Research Council - Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, England. Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/ From gwilliam at hgmp.mrc.ac.uk Wed Feb 14 10:37:14 2001 From: gwilliam at hgmp.mrc.ac.uk (gwilliam at hgmp.mrc.ac.uk) Date: Wed, 14 Feb 2001 10:37:14 GMT Subject: -sreverse and feature tables Message-ID: <200102141037.KAA05396@californium.hgmp.mrc.ac.uk> Peter, Currently when you do a -sreverse, the feature table(s) of the reversed sequence(s) are not also reversed (start and end positions changed to be length-start+1, length-end+1 and sense negated). 
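The coordinate change asked for here can be written out as a short sketch. It assumes 1-based inclusive positions and a strand of +1 or -1; `reverse_feature` is a hypothetical helper, not an AJAX function. Each position p maps to length-p+1 as described above, and the two results are swapped (an assumption of this sketch) so that start stays less than or equal to end.

```python
def reverse_feature(start, end, strand, length):
    """Map a feature onto the reverse complement of its sequence.

    Positions are 1-based and inclusive; strand is +1 or -1.
    """
    new_start = length - end + 1    # old end becomes the new start
    new_end = length - start + 1    # old start becomes the new end
    return new_start, new_end, -strand


# A feature at 2..4 on the forward strand of a 10-base sequence.
print(reverse_feature(2, 4, +1, 10))   # (7, 9, -1)
```

Applying the function twice returns the original coordinates, which is a handy sanity check for any implementation.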
This is shown when you do % showseq em:hsfau1 -format=2 stdout and showseq em:hsfau1 -format=2 stdout -srev (The sequence gets reverse-complemented, but the feature table stays the same and so inappropriate regions are indicated in the display) Thanks, Gary From tcarver at hgmp.mrc.ac.uk Wed Feb 14 10:46:29 2001 From: tcarver at hgmp.mrc.ac.uk (Tim Carver) Date: Wed, 14 Feb 2001 10:46:29 +0000 Subject: abiview bug References: <20010213191614.A9162@arran.mrc-lmb.cam.ac.uk> <20010214095307.C9543@arran.mrc-lmb.cam.ac.uk> <3A8A5660.F4D90E58@hgmp.mrc.ac.uk> <20010214102251.D9543@arran.mrc-lmb.cam.ac.uk> Message-ID: <3A8A6204.8D771C67@hgmp.mrc.ac.uk> Hi James, Many thanks for the patch. Glad it was simple. This has now been installed here. I agree with your comments about io_lib and re-inventing the wheel. At some stage it would be good to get all these formats in. Regards Tim > Hello all, > > I've now identified the problem in the abiview program. > The program was assuming that the position of the PLOC block immediately > follows the PBAS block, which is not always true. All blocks should be > considered as files in a directory - they can physically be stored in any > order. Hence you have to query the directory in order to find the file > location. Fortunately the patch is simple: > > *** abiview.c~ Wed Feb 14 09:48:07 2001 > --- abiview.c Wed Feb 14 10:05:59 2001 > *************** > *** 71,76 **** > --- 71,77 ---- > int i; > int base; > long int baseO; > + long int basePosO; > long int numBases; > long int numPoints; > long int dataOffset[4]; > *************** > *** 121,126 **** > --- 122,129 ---- > res4 = (char)(fwo_&BYTE[0]); > > ajSeqABIReadSeq(fp,baseO,numBases,&nseq); > + basePosO = ajSeqABIGetBasePosOffset(fp); /* find PLOC tag & get offset */ > + ajFileSeek(fp, basePosO, SEEK_SET); > ajSeqABIGetBasePosition(fp,numBases,&basePositions); > > On a more general note, most people do not keep ABI files on disk as they are > simply too large. 
> They typically convert them to SCF instead. (Indeed some
> machines, eg Licor, write SCF as their native format.) We maintain a freely
> available library (io_lib) of routines for reading and writing ABI, ALF, SCF,
> CTF (Jean Thierry-mieg's compressed format) and ZTR (my own compressed
> format). All of the ABI->* filters are lossy as there's lots of other bits in
> the ABI files which no one quite knows what to do with, however the SCF->CTF
> and SCF->ZTR are lossless (and SCF->ZTR is typically slightly smaller than
> bzipped SCF). Maybe it makes sense not to duplicate work.
>
> io_lib is free although it isn't yet GPLed. That shouldn't be a problem, but
> if it is I cannot see an issue with GPLing io_lib. Indeed it already looks
> like parts of io_lib (or at least "ted" which much of it came from) are in
> emboss; there are striking similarities in the seqABIGetFlag and
> getABIIndexEntryLW functions (eg the same bizarre flow controls, identical
> code layout, and some identical variable names).
>
> James
>
> --
> James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
> Medical Research Council - Laboratory of Molecular Biology,
> Hills Road, Cambridge, CB2 2QH, England.
> Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jkb at mrc-lmb.cam.ac.uk Wed Feb 14 12:20:46 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Wed, 14 Feb 2001 12:20:46 +0000
Subject: sequence.feature attribute
Message-ID: <20010214122046.C10074@arran.mrc-lmb.cam.ac.uk>

The feature attribute of type sequence (et al) doesn't appear to be
documented. One thing I have noticed is that it prevents reading of simple
format files. Eg:

jkb at jura[work/emboss]& cat seq1
ATCGTACGATCGGACTAGC
jkb at jura[work/emboss]& diffseq seq1
Find differences (SNPs) between nearly identical sequences
   EMBOSS An error in ajfile.c at line 980:
Error reading from file '.'
jkb at jura[work/emboss]& cat seq2
>test
ATCGTACGATCGGACTAGC
jkb at jura[work/emboss]& diffseq seq2
Find differences (SNPs) between nearly identical sequences
Second sequence:

I don't know why fasta format would work and plain text does not, as neither
supports features.

James

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

From jkb at mrc-lmb.cam.ac.uk Wed Feb 14 15:01:12 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Wed, 14 Feb 2001 15:01:12 +0000
Subject: Lists, selections & defaults
Message-ID: <20010214150112.D10074@arran.mrc-lmb.cam.ac.uk>

Some more ACD questions. If I've missed something in the documentation then
please point me to it, otherwise please note it as a request for
documentation improvements.

For the list type, what value should default have? I've seen examples thus:

showseq.acd:
  list: things [
    default: "B N T S F"
    values: "S:Sequence, B:Blank line, 1:Frame1 translation, 2:Frame2 translation, 3:Frame3 translation, ...

ajbad.acd: (is this deliberately a badly formatted acd file?)
  list: frames [
    default: "1,2,3,4,5,6"
    values: "0: None, 1: F1,2: F2,3: F3,4: R1,5: R2,6: R3"
    ...

dbiblast.acd:
  List: seqtype [
    req: Y
    prompt: "Sequence type"
    value: "N:nucleic;P:protein;?:unknown"
    max: 1
    min: 1
    def: unknown
  ]

So the default is sometimes the 'code' (as in showseq.acd) and sometimes the
'value' (dbiblast.acd). When default is a list, sometimes it is space
separated, and sometimes comma separated. How should I be parsing this?

James

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

From leonardz at bioinfo.sickkids.on.ca Thu Feb 15 21:56:09 2001
From: leonardz at bioinfo.sickkids.on.ca (Len F. Zaifman)
Date: Thu, 15 Feb 2001 16:56:09 -0500
Subject: Problem with water program
Message-ID: <3A8C5079.D791FDDF@bioinfo.sickkids.on.ca>

We are using water on a large sequence ( > 10000000 bases) and aligning a
smaller sequence ( < 1000 bases ) to it. This is for emboss 1.9.1.

The water manual says:

> Diagnostic Error Messages
>
> Uncaught exception
> Assertion failed
> raised at ajmem.c:xxx
>
>
> Probably means you have run out of memory. Try using supermatcher or matcher if this happens.

This is on an SGI Origin with gigabytes of RAM, so I upped the amount of
memory a single process could use from .5 GB to 2 GB and still got the same
result. On the SGI I use ssusage to determine real memory usage and the
memory usage didn't change in spite of the above change from

kernel memory parameters                       real memory use
rlimit_rss_max = 536870912 (0x20000000) ll     75216 mxrss
                                               4 4k page size this is 300 MB of ram
rlimit_rss_cur = 536870912 (0x20000000) ll

to

rlimit_rss_max = 2147483648 (0x80000000) ll    75216 mxrss
rlimit_rss_cur = 2147483648 (0x80000000) ll

(& yes I rebooted the system to ensure I got the new limits).

So I ran water using par to trace the program and got the detailed output
below. It looks to me like it is failing on opening my output file
My.output. Any comments?

By the way, this is a file and directory where I do have write permission,
so it is not that.
15728.339mS(+ 22uS)[ 16] water.dbg( 1807): close(5) << this closes /tools/emboss1.9.1/share/EMBOSS/data/EDNAMAT
15728.366mS(+ 28uS)[ 16] water.dbg( 1807): END-close() OK
15728.649mS(+ 284uS)[ 16] water.dbg( 1807): open("MY.output", O_WRONLY|O_CREAT|O_TRUNC, 0666)
15728.760mS(+ 109uS)[ 16] water.dbg( 1807): END-open() = 5 << This is my outputfile from water specified by -outfile
15729.497mS(+ 738uS)[ 16] water.dbg( 1807): write(2, "Uncaught exception", 18)
15729.545mS(+ 47uS)[ 16] water.dbg( 1807): END-write() = 18
15729.558mS(+ 12uS)[ 16] water.dbg( 1807): write(2, "\n", 1)
15729.622mS(+ 65uS)[ 16] water.dbg( 1807): END-write() = 1
15729.642mS(+ 17uS)[ 16] water.dbg( 1807): write(2, " Assertion failed", 17)
15729.649mS(+ 8uS)[ 16] water.dbg( 1807): END-write() = 17
15729.655mS(+ 6uS)[ 16] water.dbg( 1807): write(2, "\n", 1)
15729.665mS(+ 9uS)[ 16] water.dbg( 1807): END-write() = 1
15729.686mS(+ 20uS)[ 16] water.dbg( 1807): write(2, " raised at ajmem.c:167\n", 23)
15729.695mS(+ 10uS)[ 16] water.dbg( 1807): END-write() = 23
15729.700mS(+ 4uS)[ 16] water.dbg( 1807): write(2, "\n", 1)
15729.712mS(+ 11uS)[ 16] water.dbg( 1807): END-write() = 1
15729.853mS(+ 140uS)[ 16] water.dbg( 1807): write(2, "\n EMBOSS An error in ajexcep", 58)
15729.893mS(+ 40uS)[ 16] water.dbg( 1807): END-write() = 58
15729.900mS(+ 7uS)[ 16] water.dbg( 1807): write(2, "\n", 1)
15729.955mS(+ 55uS)[ 16] water.dbg( 1807): END-write() = 1
15729.998mS(+ 43uS)[ 16] water.dbg( 1807): prctl(PR_LASTSHEXIT)
15730.005mS(+ 6uS)[ 16] water.dbg( 1807): END-prctl() = 1
15730.048mS(+ 44uS)[ 16] water.dbg( 1807): exit(1)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: leonardz.vcf
Type: text/x-vcard
Size: 358 bytes
Desc: Card for Len F.
Zaifman
URL:

From jkb at mrc-lmb.cam.ac.uk Fri Feb 16 12:48:08 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Fri, 16 Feb 2001 12:48:08 +0000
Subject: abiview -graph data
Message-ID: <20010216124808.A15098@arran.mrc-lmb.cam.ac.uk>

Just another minor buglet:

abiview does not accept "-graph data". It gripes with:

Writing graph 1 data to abiview1.dat
Writing graph 2 data to abiview2.dat
Writing graph 3 data to abiview3.dat
Writing graph 4 data to abiview4.dat

*** PLPLOT ERROR ***
pladv: Please call plinit first, aborting operation

*** PLPLOT ERROR ***
etc

This isn't causing me problems at all (as we have our own trace viewer), so
please don't feel that I'm waiting for a fix.

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

From jkb at mrc-lmb.cam.ac.uk Fri Feb 16 12:50:23 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Fri, 16 Feb 2001 12:50:23 +0000
Subject: Spin update
Message-ID: <20010216125023.A3774@arran.mrc-lmb.cam.ac.uk>

The EMBOSS interface in Spin is coming along well. My parser processes all
156 ACD files (including the collection of test and demo ones) in the 1.9.1
distribution and produces tcl code (using [incr widgets]) for each dialogue.
All the tcl code runs and produces dialogue, but I expect many of them fail
when I press the OK button. I'm working through them slowly.

Technology wise, I think I've implemented pretty much most of the hard stuff
now, with expressions in default, information, min, max, required and needed
all working. Quite a bit of checking is performed (sequence types, min/max
ranges, etc), but I've still got more work to do there (eg maximum number of
values within lists).

I'm using our existing code (that Kathryn Beal wrote) for drawing the
graphical output from EMBOSS.
However there seems to be little consistency in the arguments here. Many
programs take a -graph option which can have the type "data". Others take a
-data argument which saves the graphical information in another format. It
would be good if these could be merged together somehow - all using the same
method. I recall Kathryn discussing this with Alan before, but I don't know
what the outcome was. Could you (Alan) please remind me what the conclusion
was?

One other obvious thing that strikes me is that we need a big reorganisation
of the appl: group field. I'm using these to generate menus and cascading
submenus, but it's almost impossible to guess where things will be. I'll
perhaps have a go at reorganising it once I've completed acd2tcl itself.

James

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

From jkb at mrc-lmb.cam.ac.uk Mon Feb 19 15:43:08 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Mon, 19 Feb 2001 15:43:08 +0000
Subject: Strings vs expressions
Message-ID: <20010219154308.C25705@arran.mrc-lmb.cam.ac.uk>

Hello all,

When is an expression within a string a real expression, and when is it just
part of the string? Clearly not all $ symbols in strings form expressions.
Eg dbiblast.acd contains a string:

pattern: "^([0-9]+.[0-9]+.[0-9]+)?$"

Mostly complex expressions are wrapped in @(), but not always. Eg see
transeq.acd:

def: $(sequence.begin)-$(sequence.end)

How can I tell expressions from just internal dollars? There doesn't appear
to be any escaping syntax, which means this _must_ be clearly defined to
avoid bugs. My current ruling is that if a $ isn't followed by ( then it's
not an expression. Is that valid?
James

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

From peter.rice at uk.lionbioscience.com Mon Feb 19 15:54:15 2001
From: peter.rice at uk.lionbioscience.com (rice)
Date: Mon, 19 Feb 2001 15:54:15 +0000
Subject: Strings vs expressions
References: <20010219154308.C25705@arran.mrc-lmb.cam.ac.uk>
Message-ID: <3A9141A7.AB587BFD@uk.lionbioscience.com>

James Bonfield wrote:

> When is an expression within a string a real expression, and when is it just
> part of the string?

> Mostly complex expressions are wrapped in @(), but not always. Eg see
> transeq.acd:
>
> def: $(sequence.begin)-$(sequence.end)

This is a silly one - it only looks like an expression.

$(name) is replaced by the variable value
@(expression) is replaced by the expression result

The example you give ends up as something like "1-1000" in a string.
It is not an expression. Maybe we should call it a grimace :-)

When () are nested, the inner ones are always evaluated first.

$ or @ on their own, with no (), are unchanged.

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From jkb at mrc-lmb.cam.ac.uk Mon Feb 19 15:57:39 2001
From: jkb at mrc-lmb.cam.ac.uk (James Bonfield)
Date: Mon, 19 Feb 2001 15:57:39 +0000
Subject: Strings vs expressions
In-Reply-To: <3A9141A7.AB587BFD@uk.lionbioscience.com>;
 from peter.rice@uk.lionbioscience.com on Mon, Feb 19, 2001 at 03:54:15PM +0000
References: <20010219154308.C25705@arran.mrc-lmb.cam.ac.uk>
 <3A9141A7.AB587BFD@uk.lionbioscience.com>
Message-ID: <20010219155738.D25705@arran.mrc-lmb.cam.ac.uk>

On Mon, Feb 19, 2001 at 03:54:15PM +0000, rice wrote:
> When () are nested, the inner ones are always evaluated first.
>
> $ or @ on their own, with no (), are unchanged.
So what you're saying basically is that $(...) and @(...) are expressions
(to the matching bracket), and any other occurrences of $, ( and @ are just
basic text. (That's fine as it now agrees with how I changed things -
previously I was working on $ and @ at the start of a word indicating an
expression.)

Could this behaviour please be documented? The ACD docs seem to indicate
that all parsing is done on a word by word basis, which is where I
originally concluded that $/@ starting a word indicates an expression.

James

--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/
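
The rule agreed in this thread ($(...) and @(...) run to the matching
bracket, inner brackets are handled first, and any other $, @ or ( is plain
text) can be sketched roughly as below. This is only an illustration in
Python, not EMBOSS or acd2tcl code: the names find_matching_paren, tokenize
and expand_vars are hypothetical, @(...) results are left unevaluated, and
treating an unmatched "$(" as plain text is an assumption the thread does
not settle.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (kind, text); kind is "text", "var" or "expr"


def find_matching_paren(s: str, open_idx: int) -> int:
    """Return the index of the ')' matching the '(' at open_idx, or -1."""
    depth = 0
    for i in range(open_idx, len(s)):
        if s[i] == "(":
            depth += 1
        elif s[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def tokenize(s: str) -> List[Token]:
    """Split s into literal text, $(name) and @(expression) spans."""
    tokens: List[Token] = []
    literal: List[str] = []
    i = 0
    while i < len(s):
        # Only "$(" or "@(" starts a span; a lone $ or @ is literal text.
        if s[i] in "$@" and i + 1 < len(s) and s[i + 1] == "(":
            close = find_matching_paren(s, i + 1)
            if close != -1:
                if literal:
                    tokens.append(("text", "".join(literal)))
                    literal = []
                kind = "var" if s[i] == "$" else "expr"
                tokens.append((kind, s[i + 2:close]))
                i = close + 1
                continue
            # Assumption: an unmatched "$(" falls through as plain text.
        literal.append(s[i])
        i += 1
    if literal:
        tokens.append(("text", "".join(literal)))
    return tokens


def expand_vars(s: str, variables: Dict[str, object]) -> str:
    """Substitute $(name) values, innermost first; keep @(...) unevaluated."""
    out: List[str] = []
    for kind, text in tokenize(s):
        if kind == "text":
            out.append(text)
        elif kind == "var":
            # Expand nested references in the name before the lookup.
            out.append(str(variables[expand_vars(text, variables)]))
        else:
            out.append("@(" + expand_vars(text, variables) + ")")
    return "".join(out)


if __name__ == "__main__":
    values = {"sequence.begin": 1, "sequence.end": 1000, "plot": "Y"}
    print(expand_vars("$(sequence.begin)-$(sequence.end)", values))  # 1-1000
    print(expand_vars("@(!$(plot))", values))                        # @(!Y)
    print(tokenize("^([0-9]+.[0-9]+.[0-9]+)?$"))  # a single "text" token
```

With these assumptions the transeq.acd default expands to the "1-1000"
style string described above, and the dbiblast.acd pattern passes through
untouched because its $ is not followed by (. An unknown variable name
raises KeyError in this sketch; a real parser would need its own policy.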