From gbottu at ben.vub.ac.be Fri Sep 17 15:15:56 2004
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 17 Sep 2004 21:15:56 +0200
Subject: bizarre behaviour of ajSeqGetUsa
Message-ID: <20040917191556.GA3184@bigben.ulb.ac.be>

Dear support,

I discovered something bizarre. To see it, try the following program under
EMBOSS 2.9.0 :

----------------------------
#include "emboss.h"
#include

int main(int argc, char **argv)
{
    AjPSeq seq1;
    AjPSeq seq2;

    embInit ("test", argc, argv);

    seq1 = ajAcdGetSeq("asequence");
    seq2 = ajAcdGetSeq("bsequence");

    ajFmtPrint("We have %S and %S\n", ajSeqGetUsa(seq1), ajSeqGetUsa(seq2));

    ajExit ();
}
------------------------------
application: test [
]

sequence: asequence [
  parameter: "Y"
]

sequence: bsequence [
  parameter: "Y"
]
----------------------------

In my hands the command

test sw:papa_carpa sw:tpa_human

outputs

We have sw-id:TPA_HUMAN and sw-id:TPA_HUMAN

Where is sw:PAPA_CARPA gone ? I do not know whether this is a bug or a
"feature", but it is surely counterintuitive behaviour.

Sincerely,
Guy Bottu

From pmr at ebi.ac.uk Sat Sep 18 06:15:30 2004
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Sat, 18 Sep 2004 11:15:30 +0100 (BST)
Subject: [EMBOSS-BUG] bizarre behaviour of ajSeqGetUsa
In-Reply-To: <20040917191556.GA3184@bigben.ulb.ac.be>
References: <20040917191556.GA3184@bigben.ulb.ac.be>
Message-ID: <1147.217.134.102.200.1095502530.squirrel@webmail.ebi.ac.uk>

Dear Guy,

> I discovered something bizarre. To see it, try the following program under
> EMBOSS 2.9.0 :

> ajFmtPrint("We have %S and %S\n", ajSeqGetUsa(seq1), ajSeqGetUsa(seq2));

> test sw:papa_carpa sw:tpa_human
>
> outputs
>
> We have sw-id:TPA_HUMAN and sw-id:TPA_HUMAN
>
> Where is sw:PAPA_CARPA gone ? I do not know whether this is a bug or a
> "feature", but it is surely counterintuitive behaviour.

It's a feature, but one that does need careful documentation.

ajSeqGetUsa returns a pointer to an AjPStr ... but that string has to
exist somewhere, and as the USA is not necessarily in the original AjPSeq
it makes one up.

This leaves your second ajSeqGetUsa call overwriting the string before the
first one has been printed.

Any function returning a "const AjPStr" could, in theory, have such
effects. It depends though on whether the AjPStr is likely to change, or
whether it depends entirely on the input to the function. We should at
least mark such cases in the documentation.

Making a new string on each call would cause a memory leak.

The recommended solutions would be:

1. split the ajFmtPrint statement into two
2. copy the ajSeqGetUsa result into a string, and clean it up

Our solution could be (for a future release) populating a Usa attribute of
the AjPSeq and passing that back. As an AjPSeq attribute it would then be
cleaned up by the AjPSeq destructor. At present this attribute contains
the original USA but I see no reason not to "correct" it.

The alternative would involve writing a lot of extra documentation ....
I think I prefer the code fix.

Hope this helps,

Peter

From henrikki.almusa at helsinki.fi Fri Sep 24 03:23:33 2004
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Fri, 24 Sep 2004 10:23:33 +0300
Subject: New program: makeseq
Message-ID: <200409241023.33656.henrikki.almusa@helsinki.fi>

Hello,

I have written a new program for emboss. The code is made against
emboss-2.9.0 and the program is called 'makeseq'. It creates random
sequences, but since the biological world isn't quite that random, it can
use either pepstats output (for proteins) or cusp output (for nucleotides)
to create a distribution. This should give users the ability to create
random sequences biased according to their own sequence triplet or amino
acid distributions. The program also allows inserting a given sequence
(insert) within the created sequence. However, I've encountered a few
problems where I need help.

1. Acd handling
I've tried to make the program query something depending on another
selection. Sequence type should be asked if there is no distribution file,
and the start point of the insertion should be asked if an insert has been
given. I can't make it query these properly.

2. Segfaults
The program segfaults when asked to make a nucleotide sequence with a given
insert. This is caused by the insert's sequence type check. The stack trace
is:

#0 0x40103531 in cvt_s () from /work/hena/emboss-2.9.0/lib/libajax.so.0
#1 0x4010487a in ajFmtVfmt () from /work/hena/emboss-2.9.0/lib/libajax.so.0
#2 0x4010445c in ajFmtVfmtStrCL ()
   from /work/hena/emboss-2.9.0/lib/libajax.so.0
#3 0x40104367 in ajFmtPrintS () from /work/hena/emboss-2.9.0/lib/libajax.so.0
#4 0x40145998 in seqTypeCharDnaGap ()
   from /work/hena/emboss-2.9.0/lib/libajax.so.0
#5 0x40144e70 in ajSeqTypeDnaS ()
   from /work/hena/emboss-2.9.0/lib/libajax.so.0
#6 0x08049234 in main ()
#7 0x40466a67 in __libc_start_main () from /lib/i686/libc.so.6

Protein typechecking works ok.

3. Uniformity
This problem appears when making a pure random sequence. I tried to use
the 'ajax/seqtype.c' lines

char seqCharProtPure[] = "ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy";
char seqCharNucPure[] = "ACGTUacgtu";

with the following additions to the file

int seqCharProtPureLength = 40;
int seqCharNucPureLength = 10;

Now, this did not work. Therefore, I just copied them within the program
and it worked fine. However, I don't think this is the proper way to do it,
since the program then uses its own settings for what is a good character
and what is not. Is there a way to use something more generic, so that if
emboss changes these things, they would be applied to this program as well?

Any help is most appreciated. I would like to submit this when these things
are fixed. 'makeseq.c' and 'makeseq.acd' are attached. Also basic help on
creating a help page for makeseq would be appreciated.

Thanks,
--
Henrikki Almusa
-------------- next part --------------
application: makeseq [
  documentation: "Creates random sequences"
  groups: "Edit"
]

section: input [
  information: "Input section"
  type: "page"
]

  infile: data [
    information: "Distribution file"
    help: "This file should be pepstats output file to create protein sequences or cusp output to create nucleotide sequence. Nucleotide sequences will be created as triplets with end trimmed to be correct length."
    additional: "Y"
    nullok: "Y"
  ]

endsection: input

section: required [
  information: "Required section"
  type: "page"
]

  integer: amount [
    standard: "Y"
    default: "100"
    minimum: "1"
    information: "Number of sequences"
  ]

  integer: length [
    standard: "Y"
    default: "100"
    minimum: "1"
    information: "Length of single sequence"
  ]

endsection: required

section: advanced [
  information: "Advanced section"
  type: "page"
]

  # this should be queried if no data file
  boolean: protein [
    standard: "@(!$(data) > 0 ? Y : N)"
    default: "N"
    additional: "Y"
    information: "Make protein sequences"
  ]

  string: insert [
    information: "Inserted string"
    help: "String that is inserted into sequence"
    additional: "Y"
    nullok: "Y"
    knowntype: "sequence"
  ]

  # this isn't always queried even as insert given
  integer: start [
    standard: "@($(insert) ? Y : N)"
    information: "Start point of inserted sequence"
    minimum: "1"
    default: "1"
    # maximum: "@($(length) - @($(insert) ? $(insert.length)-1 : 0))"
  ]

endsection: advanced

section: output [
  information: "Output section"
  type: "page"
]

  seqoutall: outseq [
    parameter: "Y"
    type: "any"
    name: "makeseq"
  ]

endsection: output
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makeseq.c
Type: text/x-csrc
Size: 8999 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/emboss-dev/attachments/20040924/3a8b4444/attachment.bin

From jison at hgmp.mrc.ac.uk Fri Sep 24 11:02:39 2004
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Fri, 24 Sep 2004 16:02:39 +0100
Subject: New program: makeseq
References: <200409241023.33656.henrikki.almusa@helsinki.fi>
Message-ID: <4154370F.FE074597@hgmp.mrc.ac.uk>

Hi Henrikki

Sounds very useful. I've tried to address all your points (below).

Cheers

Jon

Henrikki Almusa wrote:
>
> Hello,
>
> I have written a new program for emboss. The code is made against emboss-2.9.0
> and the program is called 'makeseq'. It creates random sequences, but since
> the biological world isn't quite that random, it can use either pepstats
> output (for proteins) or cusp output (for nucleotides) to create a
> distribution. This should give users the ability to create random sequences
> biased according to their own sequence triplet or amino acid distributions.
> The program also allows inserting a given sequence (insert) within the
> created sequence. However, I've encountered a few problems where I need help.
>
> 1. Acd handling
> I've tried to make the program query something depending on another selection.
> Sequence type should be asked if there is no distribution file and start
> point of the insertion should be asked if insert has been given. I can't make
> it query these properly.

The cleanest way to do this is to use a "Toggle" ACD data item, e.g.

toggle: distro [
  standard: "Y"
  information: "Do you want to make an insert?"
  default: "N"
]

integer: start [
  standard: "$(distro)"
  information: "Start point of inserted sequence"
  default: "1"
]

You don't need to prompt the user for sequence type though, because
"sequence" data items have attributes:

sequence: sequence [
  parameter: "Y"
  type: protein
]

sequence.begin    (start residue, i.e. -sbegin value)
sequence.end      (end residue, i.e. -send value)
sequence.length   (length)
sequence.protein  (true if sequence is protein)
sequence.nucleic  (true if sequence is nucleic)
sequence.name     (name)
sequence.weight   (alignment weight for a seqset)
sequence.count    (no. of sequences in a seqset)

You access them in ACD by e.g. $(sequence.begin) etc.
e.g. to ensure your insert isn't past the end of the sequence use
maximum: $(sequence.end)

>
> 2. Segfaults
> The program segfaults when asked to make a nucleotide sequence with a given
> insert. This is caused by the insert's sequence type check. The stack trace
> is:
>
> #0 0x40103531 in cvt_s () from /work/hena/emboss-2.9.0/lib/libajax.so.0
> #1 0x4010487a in ajFmtVfmt () from /work/hena/emboss-2.9.0/lib/libajax.so.0
> #2 0x4010445c in ajFmtVfmtStrCL ()
> from /work/hena/emboss-2.9.0/lib/libajax.so.0
> #3 0x40104367 in ajFmtPrintS () from /work/hena/emboss-2.9.0/lib/libajax.so.0
> #4 0x40145998 in seqTypeCharDnaGap ()
> from /work/hena/emboss-2.9.0/lib/libajax.so.0
> #5 0x40144e70 in ajSeqTypeDnaS ()
> from /work/hena/emboss-2.9.0/lib/libajax.so.0
> #6 0x08049234 in main ()
> #7 0x40466a67 in __libc_start_main () from /lib/i686/libc.so.6
>
> Protein typechecking works ok.
>

If you really can't fix it get back in touch and I can run it through
Purify.

> 3. Uniformity
> This problem appears when making a pure random sequence. I tried to use
> the 'ajax/seqtype.c' lines
>
> char seqCharProtPure[] = "ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy";
> char seqCharNucPure[] = "ACGTUacgtu";
>
> with the following additions to the file
>
> int seqCharProtPureLength = 40;
> int seqCharNucPureLength = 10;
>
> Now, this did not work. Therefore, I just copied them within the program and
> it worked fine. However, I don't think this is the proper way to do it, since
> the program then uses its own settings for what is a good character
> and what is not.

I'm presuming the 10 and 40 are the sizes of your two arrays. If you want
to treat them as strings you have to leave space for the terminating NUL,
so 41 and 11 would do it. All arbitrary limits really should be avoided
though, use e.g.

AjPStr seqCharProtPure=NULL;
seqCharProtPure=ajStrNewC("ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy");

and ajStrChar to return a single character from a string at a given
position.

> Is there a way to use something more generic, so that if
> emboss changes these things, they would be applied to this program as well?
>

There might (perhaps should!) be - Alan Bleasby (ableasby at rfcgr.mrc.ac.uk)
is the best man to ask about that.

> Any help is most appreciated. I would like to submit this when these things
> are fixed. 'makeseq.c' and 'makeseq.acd' are attached. Also basic help on
> creating help page for makeseq would be appreciated.

I've attached the template I use for the DOMAINATRIX documentation, e.g. :
http://www.rfcgr.mrc.ac.uk/Software/EMBOSS/Apps/domainatrix/rocon.html
With this template, I document stuff by hand. The only external program
I use is "acdtable" to get the ACD stuff. This is slightly different
from the format used for EMBOSS apps though.

>
> Thanks,
> --
> Henrikki Almusa
>

Hope this helps and thanks for the interest

Cheers

Jon

> ------------------------------------------------------------------------
> Name: makeseq.acd
> makeseq.acd Type: Plain Text (text/plain)
> Encoding: 7bit
>
> Name: makeseq.c
> makeseq.c Type: text/x-csrc
> Encoding: 7bit

--
Jon C. Ison, PhD
Proteomics Applications Group
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494500
Fax: +44 1223 494512
E-mail: jison at rfcgr.mrc.ac.uk
Web: http://www.rfcgr.mrc.ac.uk
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/emboss-dev/attachments/20040924/5eef7754/attachment.html

From henrikki.almusa at helsinki.fi Mon Sep 27 04:33:16 2004
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Mon, 27 Sep 2004 11:33:16 +0300
Subject: New program: makeseq
In-Reply-To: <4154370F.FE074597@hgmp.mrc.ac.uk>
References: <200409241023.33656.henrikki.almusa@helsinki.fi> <4154370F.FE074597@hgmp.mrc.ac.uk>
Message-ID: <200409271133.16344.henrikki.almusa@helsinki.fi>

On Friday 24 September 2004 18:02, Dr J.C. Ison wrote:
> > 1. Acd handling
> The cleanest way to do this is to use a "Toggle" ACD data item.
> e.g.

Ok, this part seems to work ok. I put a toggle for both the data file and
the insert info.

> You don't need to prompt the user for sequence type though, because
> "sequence" data items have attributes:
>
> sequence: sequence
> [
> parameter: "Y"
> type: protein
> ]
>
> sequence.begin (start residue, i.e. -sbegin value)
>
> You access them in ACD by e.g. $(sequence.begin) etc.
> e.g. to ensure your insert isn't past the end of the sequence use
> maximum: $(sequence.end)

Well, I don't have a sequence there anywhere. And the problem also comes
from the fact that the data file can determine the type as well. It is now
queried if the data file is not given. And since the insert is counted
within the sequence length, the maximum place to start the insert is
length - insert.length. That calculation doesn't seem to work either, so
I'm checking that inside the code.

> > 2. Segfaults
> If you really can't fix it get back in touch and I can run it through
> Purify.

That would be nice. I honestly can't figure this one out. I checked that
the insert goes there (the insert's AjPStr can be printed with ajFmtPrint()
before the test).

> > 3. Uniformity
> I'm presuming the 10 and 40 are the sizes of your two arrays. If you want
> to treat them as strings you have to leave space for the terminating
> NUL, so 41 and 11 would do it. All arbitrary limits really should be
> avoided though, use e.g.
>
> AjPStr seqCharProtPure=NULL;
> seqCharProtPure=ajStrNewC("ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy");
>
> and ajStrChar to return a single character from a string at a given
> position.

I use the length to tell me the size of the char array that exists. Then
when creating a random sequence, I can just ask for a random number between
0 and length to get a character for the sequence. Well, there is one
abstraction layer between that char array and the final one used in
randomised selection, but that's because of cusp.
There are no arbitrary limits as such. The above char arrays are used in
the makeseq_default_chars function.

> > Is there a way to use something more generic, so that if
> > emboss changes these things, they would be applied to this program as
> > well?
>
> There might (perhaps should!) be - Alan Bleasby
> (ableasby at rfcgr.mrc.ac.uk) is the best man to ask about that.

Ok. I'll put another post to emboss-dev later on this.

> I've attached the template I use for the DOMAINATRIX documentation, e.g.
>
> http://www.rfcgr.mrc.ac.uk/Software/EMBOSS/Apps/domainatrix/rocon.html
> With this template, I document stuff by hand. The only external program
> I use is "acdtable" to get the ACD stuff. This is slightly different
> from the format used for EMBOSS apps though.

So there is no script to run to get basic info from the acd file into an
html file. Then it's just manual labour of copying and writing the html
file :).

> Hope this helps and thanks for the interest
>
> Cheers
>
> Jon

Thanks for the help. I attached the new versions of the .c and .acd files.

--
Henrikki Almusa
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makeseq.c
Type: text/x-csrc
Size: 8999 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/emboss-dev/attachments/20040927/eea2d167/attachment.bin
-------------- next part --------------
application: makeseq [
  documentation: "Creates random sequences"
  groups: "Edit"
]

section: required [
  information: "Required section"
  type: "page"
]

  integer: amount [
    standard: "Y"
    default: "100"
    minimum: "1"
    information: "Number of sequences"
  ]

  integer: length [
    standard: "Y"
    default: "100"
    minimum: "1"
    information: "Length of single sequence"
  ]

  toggle: useinsert [
    standard: "Y"
    information: "Do you want to make an insert"
    default: "N"
  ]

  string: insert [
    standard: "$(useinsert)"
    information: "Inserted string"
    help: "String that is inserted into sequence"
    # nullok: "Y"
    knowntype: "sequence"
  ]

  integer: start [
    standard: "$(useinsert)"
    information: "Start point of inserted sequence"
    minimum: "1"
    default: "1"
    # maximum: "@($(length) - @($(insert) ? $(insert.length)-1 : 0))"
  ]

  toggle: usedata [
    standard: "Y"
    information: "Do you want to use distribution file"
    default: "N"
  ]

endsection: required

section: input [
  information: "Input section"
  type: "page"
]

  infile: data [
    standard: "$(usedata)"
    information: "Distribution file"
    help: "This file should be pepstats output file to create protein sequences or cusp output to create nucleotide sequence. Nucleotide sequences will be created as triplets with end trimmed to be correct length."
    nullok: "Y"
  ]

endsection: input

section: advanced [
  information: "Advanced section"
  type: "page"
]

  boolean: protein [
    standard: "@($(usedata) ? N : Y)"
    default: "N"
    information: "Make protein sequences"
  ]

endsection: advanced

section: output [
  information: "Output section"
  type: "page"
]

  seqoutall: outseq [
    parameter: "Y"
    type: "any"
    name: "makeseq"
  ]

endsection: output

From henrikki.almusa at helsinki.fi Mon Sep 27 05:04:07 2004
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Mon, 27 Sep 2004 12:04:07 +0300
Subject: ACD, toggle handling
Message-ID: <200409271204.07484.henrikki.almusa@helsinki.fi>

Hello,

I've done a program that uses toggle in acd to allow one or more options
to depend on others. E.g.

toggle: useinsert [
  standard: "Y"
  information: "Do you want to make an insert"
  default: "N"
]

string: insert [
  standard: "$(useinsert)"
  information: "Inserted string"
]

integer: start [
  standard: "$(useinsert)"
  information: "Start point of inserted sequence"
  default: "1"
]

Now if I give the command-line option '-insert y', it asks "Do you want
to make an insert[N]: ". If I answer no, it makes the insert with the
default start value. If I say yes, it then asks for the start point.

I would suggest that since I have already given an option that depends on
the toggle, the program should automatically assume that useinsert is 'Y',
rather than ask whether I want to add an insert and then ask for the start
point of the insert. Most often I use the command-line options when giving
out files, as bash has autocompletion. Would this be a good way to handle
this?

Sincerely,
--
Henrikki Almusa

From ableasby at hgmp.mrc.ac.uk Mon Sep 27 05:34:26 2004
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Mon, 27 Sep 2004 10:34:26 +0100 (BST)
Subject: ACD, toggle handling
Message-ID: <200409270934.i8R9YQ3M006048@bromine.hgmp.mrc.ac.uk>

Hello Henrikki,

You have the definition:

toggle: useinsert [
  standard: "Y"
  information: "Do you want to make an insert"
  default: "N"
]

The more usual form would be to omit the "standard" definition,
in which case it will not prompt.

Alan

From henrikki.almusa at helsinki.fi Mon Sep 27 05:44:54 2004
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Mon, 27 Sep 2004 12:44:54 +0300
Subject: ACD, toggle handling
In-Reply-To: <200409270934.i8R9YQ3M006048@bromine.hgmp.mrc.ac.uk>
References: <200409270934.i8R9YQ3M006048@bromine.hgmp.mrc.ac.uk>
Message-ID: <200409271244.54334.henrikki.almusa@helsinki.fi>

On Monday 27 September 2004 12:34, Alan Bleasby wrote:
> Hello Henrikki,
>
> You have the definition:
>
> toggle: useinsert [
> standard: "Y"
> information: "Do you want to make an insert"
> default: "N"
> ]
>
>
> The more usual form would be to omit the "standard" definition
> in which case it will not prompt.

Yes, but then it still won't prompt for the other option (in this example
'start') that depends on 'useinsert', and that prompt is needed. To get it
I would have to give '-useinsert' on the command line as well. I think that
'-useinsert -insert y' is not very user-friendly.

--
Henrikki Almusa

From pmr at ebi.ac.uk Mon Sep 27 10:34:19 2004
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Mon, 27 Sep 2004 15:34:19 +0100 (BST)
Subject: ACD, toggle handling
In-Reply-To: <200409271204.07484.henrikki.almusa@helsinki.fi>
References: <200409271204.07484.henrikki.almusa@helsinki.fi>
Message-ID: <61381.204.50.11.2.1096295659.squirrel@webmail.ebi.ac.uk>

Hi Henrikki,

> Now if I give in commandline option '-insert y', then it asks the "Do you
> want
> to make an insert[N]: ". If i answer no to it, it makes the insert with
> default start value. If I say yes, then it asks to start point.
>
> Now i would suggest that since I have already given the option that
> depends on
> the toggle option, program should automatically assume that useinsert is
> 'Y'.
> Not ask if want to add insert and then ask for the start point of insert.
> Most often I use the command line options when giving out files, as bash
> has
> autocompletion. Would this be a good way to handle this?

When the ACD file is processed, the "useinsert" toggle is processed first. The value for "insert" has not yet been processed, and cannot be used. Toggles are usually not prompted for, so removing the 'standard: "Y"' from useinsert would perhaps give the behaviour you want (you can also make that default to "Y" and have -nouseinsert as a commandline oiption to turn off prompting). As -useinsert only controls whether the other options are prompted for (that is what toggle is for) you can set the default either way. Hope this helps, Peter From henrikki.almusa at helsinki.fi Wed Sep 29 02:41:50 2004 From: henrikki.almusa at helsinki.fi (Henrikki Almusa) Date: Wed, 29 Sep 2004 09:41:50 +0300 Subject: ajSeqTypeCheckS Message-ID: <200409290941.50650.henrikki.almusa@helsinki.fi> Hello, I was reading through the ajseqtype.c to find more explicit way to check sequence type for protein (other than ajSeqTypeAnyprotS) and found ajSeqTypeCheckS. This would be good function. Only seems to return ajTrue even if the check should fail. Code that would return ajFalse has been commented out. Why and what would be needed to do for it be uncommented? Thanks, -- Henrikki Almusa From gbottu at ben.vub.ac.be Fri Sep 17 19:15:56 2004 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Fri, 17 Sep 2004 21:15:56 +0200 Subject: bizarre behaviour of ajSeqGetUsa Message-ID: <20040917191556.GA3184@bigben.ulb.ac.be> Dear support, I discovered something bizarre. 
To see it, try the following program under EMBOSS 2.9.0 : ---------------------------- #include "emboss.h" #include int main(int argc, char **argv) { AjPSeq seq1; AjPSeq seq2; embInit ("test", argc, argv); seq1 = ajAcdGetSeq("asequence"); seq2 = ajAcdGetSeq("bsequence"); ajFmtPrint("We have %S and %S\n", ajSeqGetUsa(seq1), ajSeqGetUsa(seq2)); ajExit (); } ------------------------------ application: test [ ] sequence: asequence [ parameter: "Y" ] sequence: bsequence [ parameter: "Y" ] ---------------------------- In my hands the command test sw:papa_carpa sw:tpa_human outputs We have sw-id:TPA_HUMAN and sw-id:TPA_HUMAN Where is sw:PAPA_CARPA gone ? I do not know whether this is a bug or a "feature", but it is surely counterintuituve behaviour. Sincerely, Guy Bottu From pmr at ebi.ac.uk Sat Sep 18 10:15:30 2004 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Sat, 18 Sep 2004 11:15:30 +0100 (BST) Subject: [EMBOSS-BUG] bizarre behaviour of ajSeqGetUsa In-Reply-To: <20040917191556.GA3184@bigben.ulb.ac.be> References: <20040917191556.GA3184@bigben.ulb.ac.be> Message-ID: <1147.217.134.102.200.1095502530.squirrel@webmail.ebi.ac.uk> Dear Guy, > I discovered something bizarre. To see it, try the following program under > EMBOSS 2.9.0 : > ajFmtPrint("We have %S and %S\n", ajSeqGetUsa(seq1), ajSeqGetUsa(seq2)); > test sw:papa_carpa sw:tpa_human > > outputs > > We have sw-id:TPA_HUMAN and sw-id:TPA_HUMAN > > Where is sw:PAPA_CARPA gone ? I do not know whether this is a bug or a > "feature", but it is surely counterintuituve behaviour. It's a feature, but one that does need careful documentation. ajSeqGetUsa returns a pointer to an AjPStr ... but that string has to exist somewhere, and as the USA is not necessarily in the original AjPSeq is makes one up. This leaves your second ajSeqGetUsa call overwriting the string before the first one has been printed. Any function returning a "const AjPStr" could, in thory, have such effects. 
It depends though on whether the AjPStr is likely to change, or whether it depends entirely on the input to the function. We should at least mark such cases in the documentation. Making a new string would cause a memory leak. The recommended solutions would be: 1. split the ajFmtPrint statement into two 2. copy the ajSeqGetUsa result into a string, and clean it up Our solutions could be (for a future release) populating a Usa attribute of the AjPSeq and passing that back. As an AjPSeq attribute it would then be cleaned up by the AjPSeq destructor. At present this attribute contains the original USA but I see no reason not to "correct" it. The alternative would involve writing a lot of extra documentation .... I think I prefer the code fix. Hpe this helps, Peter From henrikki.almusa at helsinki.fi Fri Sep 24 07:23:33 2004 From: henrikki.almusa at helsinki.fi (Henrikki Almusa) Date: Fri, 24 Sep 2004 10:23:33 +0300 Subject: New program: makeseq Message-ID: <200409241023.33656.henrikki.almusa@helsinki.fi> Hello, I have written a new program for emboss. The code is made against emboss-2.9.0 and the program is called 'makeseq'. It creates random sequences, but since the biological world isn't quite that random, it can use either pepstats output (for proteins) or cusp output (for nucleotides) to create a distribution. This should give users the ability to create random sequences biased according to their own sequence triplet or amino acid distributions. The program also allows inserting a given sequence (insert) within the created sequence. However, I've encountered a few problems where I need help. 1. Acd handling I've tried to make the program query something depending on other selection. Sequence type should be asked if there is no distribution file and start point of the insertion should be asked if insert has been given. I can't make it query the these properly properly. 2. Segfaults The program segfaults when asked to make a nucleotide sequence with a given insert. 
This is caused by the inserts sequence type check. The stack trace is: #0 0x40103531 in cvt_s () from /work/hena/emboss-2.9.0/lib/libajax.so.0 #1 0x4010487a in ajFmtVfmt () from /work/hena/emboss-2.9.0/lib/libajax.so.0 #2 0x4010445c in ajFmtVfmtStrCL () from /work/hena/emboss-2.9.0/lib/libajax.so.0 #3 0x40104367 in ajFmtPrintS () from /work/hena/emboss-2.9.0/lib/libajax.so.0 #4 0x40145998 in seqTypeCharDnaGap () from /work/hena/emboss-2.9.0/lib/libajax.so.0 #5 0x40144e70 in ajSeqTypeDnaS () from /work/hena/emboss-2.9.0/lib/libajax.so.0 #6 0x08049234 in main () #7 0x40466a67 in __libc_start_main () from /lib/i686/libc.so.6 Protein typechecking works ok. 3. Uniformity This problem appears when making pure random sequence. I tried to use 'ajax/seqtype.c' lines char seqCharProtPure[] = "ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy"; char seqCharNucPure[] = "ACGTUacgtu"; with the following addeitions to the file int seqCharProtPureLength = 40; int seqCharNucPureLength = 10; Now, this did not work. Therefore, I just copied them within the program and it worked fine. However, I don't think this is the proper way to do, since the program doesn't then uses it's own settings for what is good character and what is not. Is there a way to use something more generic, so that if emboss changes these things, they would be applied to this program as well? Any help is most appreciated. I would like to submit this when these things are fixed. 'makeseq.c' and 'makeseq.acd' are attached. Also basic help on creating help page for makeseq would be appreciated. Thanks, -- Henrikki Almusa -------------- next part -------------- application: makeseq [ documentation: "Creates random sequences" groups: "Edit" ] section: input [ information: "Input section" type: "page" ] infile: data [ information: "Distribution file" help: "This file should be pepstats output file to create protein sequences or cusp output to create nucleotide sequence. 
Nucleotide sequences will be created as triplets with end trimmed to be correct length." additional: "Y" nullok: "Y" ] endsection: input section: required [ information: "Required section" type: "page" ] integer: amount [ standard: "Y" default: "100" minimum: "1" information: "Number of sequences" ] integer: length [ standard: "Y" default: "100" minimum: "1" information: "Length of single sequence" ] endsection: required section: advanced [ information: "Advanced section" type: "page" ] # this should be queried if no data file boolean: protein [ standard: "@(!$(data) > 0 ? Y : N)" default: "N" additional: "Y" information: "Make protein sequences" ] string: insert [ information: "Inserted string" help: "String that is inserted into sequence" additional: "Y" nullok: "Y" knowntype: "sequence" ] # this isn't always queried even as insert given integer: start [ standard: "@($(insert) ? Y : N)" information: "Start point of inserted sequence" minimum: "1" default: "1" # maximum: "@($(length) - @($(insert) ? $(insert.length)-1 : 0))" ] endsection: advanced section: output [ information: "Output section" type: "page" ] seqoutall: outseq [ parameter: "Y" type: "any" name: "makeseq" ] endsection: output -------------- next part -------------- A non-text attachment was scrubbed... Name: makeseq.c Type: text/x-csrc Size: 8999 bytes Desc: not available URL: From jison at hgmp.mrc.ac.uk Fri Sep 24 15:02:39 2004 From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison) Date: Fri, 24 Sep 2004 16:02:39 +0100 Subject: New program: makeseq References: <200409241023.33656.henrikki.almusa@helsinki.fi> Message-ID: <4154370F.FE074597@hgmp.mrc.ac.uk> Hi Henrikki Sounds very useful. I've tried to address all your points (below). Cheers Jon Henrikki Almusa wrote: > > Hello, > > I have written a new program for emboss. The code is made against emboss-2.9.0 > and the program is called 'makeseq'. 
It creates random sequences, but since > the biological world isn't quite that random, it can use either pepstats > output (for proteins) or cusp output (for nucleotides) to create a > distribution. This should give users the ability to create random sequences > biased according to their own sequence triplet or amino acid distributions. > The program also allows inserting a given sequence (insert) within the > created sequence. However, I've encountered a few problems where I need help. > > 1. Acd handling > I've tried to make the program query something depending on other selection. > Sequence type should be asked if there is no distribution file and start > point of the insertion should be asked if insert has been given. I can't make > it query the these properly properly. The cleanest way to do this is to use a "Toggle" ACD data item. e.g. toggle: distro [ standard: "Y" information: "Do you want to use make an insert?" default: "N" ] integer: start [ standard: "$(retain)" information: "Start point of inserted sequence" default: "1" ] You don't need to prompt the user for sequnce type though, because "sequence" data items have attributes: sequence: sequence [ parameter: "Y" type: protein ] sequence.begin (start residue, i.e. -sbegin value) sequence.end (end residue, i.e. -send value) sequence.length (length) sequence.protein (true if sequence is protein) sequence.nucleic (true if sequence is nucleic) sequence.name (name) sequence.weight (alignment weight for a seqset) sequence.count (no. of sequences in a seqset) You access them in ACD by e.g. $(sequence.begin) etc. e.g. to ensure your insert isn't past the end of the sequence use maximum: $(sequence.end) > > 2. Segfaults > The program segfaults when asked to make a nucleotide sequence with a given > insert. This is caused by the inserts sequence type check. 
> The stack trace is:
>
> #0 0x40103531 in cvt_s () from /work/hena/emboss-2.9.0/lib/libajax.so.0
> #1 0x4010487a in ajFmtVfmt () from /work/hena/emboss-2.9.0/lib/libajax.so.0
> #2 0x4010445c in ajFmtVfmtStrCL ()
>    from /work/hena/emboss-2.9.0/lib/libajax.so.0
> #3 0x40104367 in ajFmtPrintS () from /work/hena/emboss-2.9.0/lib/libajax.so.0
> #4 0x40145998 in seqTypeCharDnaGap ()
>    from /work/hena/emboss-2.9.0/lib/libajax.so.0
> #5 0x40144e70 in ajSeqTypeDnaS ()
>    from /work/hena/emboss-2.9.0/lib/libajax.so.0
> #6 0x08049234 in main ()
> #7 0x40466a67 in __libc_start_main () from /lib/i686/libc.so.6
>
> Protein typechecking works ok.
>

If you really can't fix it get back in touch and I can run it through Purify.

> 3. Uniformity
> This problem appears when making pure random sequence. I tried to use
> 'ajax/seqtype.c' lines
>
> char seqCharProtPure[] = "ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy";
> char seqCharNucPure[] = "ACGTUacgtu";
>
> with the following additions to the file
>
> int seqCharProtPureLength = 40;
> int seqCharNucPureLength = 10;
>
> Now, this did not work. Therefore, I just copied them within the program and
> it worked fine. However, I don't think this is the proper way to do it, since
> the program doesn't then use its own settings for what is good character
> and what is not.

I'm presuming the 10 and 40 are the sizes of your two arrays. If you want
to treat them as strings you have to leave space for your terminating
NUL, so 41 and 11 would do it. All arbitrary limits really should be
avoided though, use e.g.

AjPStr seqCharProtPure=NULL;
seqCharProtPure=ajStrNewC("ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy");

and ajStrChar to return a single character from a string at a given
position.

> Is there a way to use something more generic, so that if
> emboss changes these things, they would be applied to this program as well?
>

There might (perhaps should!) be - Alan Bleasby (ableasby at rfcgr.mrc.ac.uk)
is the best man to ask about that.
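
A minimal plain-C sketch of the sizing point in item 3, assuming ordinary
char arrays rather than AJAX strings. The names prot_pure, nuc_pure, N_CHARS
and char_at are hypothetical and are not part of EMBOSS. It shows that the
compiler already knows both numbers, so neither 40/10 nor 41/11 has to be
typed in by hand:

```c
#include <assert.h>
#include <stddef.h>

/* Same literals as the ones quoted from ajax/seqtype.c in the thread.
   The array names here are hypothetical. */
const char prot_pure[] = "ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy";
const char nuc_pure[]  = "ACGTUacgtu";

/* Number of usable characters in a string-literal array.
   sizeof counts the terminating NUL, so subtract one.
   Only valid on real arrays, not on pointers. */
#define N_CHARS(arr) (sizeof(arr) - 1)

/* Return the character at position pos (0-based).
   The assert guards against reading the NUL or running off the end. */
char char_at(const char *alphabet, size_t n_chars, size_t pos)
{
    assert(pos < n_chars);
    return alphabet[pos];
}
```

With this, sizeof prot_pure is the storage size including the NUL and
N_CHARS(prot_pure) is the number of characters, so the two cannot drift
apart if the literal is ever edited.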
> Any help is most appreciated. I would like to submit this when these things
> are fixed. 'makeseq.c' and 'makeseq.acd' are attached. Also basic help on
> creating help page for makeseq would be appreciated.

I've attached the template I use for the DOMAINATRIX documentation, e.g.:
http://www.rfcgr.mrc.ac.uk/Software/EMBOSS/Apps/domainatrix/rocon.html
With this template, I document stuff by hand. The only external program I
use is "acdtable" to get the ACD stuff. This is slightly different from the
format used for EMBOSS apps though.

>
> Thanks,
> --
> Henrikki Almusa
>

Hope this helps and thanks for the interest

Cheers

Jon

> ------------------------------------------------------------------------
> Name: makeseq.acd
> makeseq.acd Type: Plain Text (text/plain)
> Encoding: 7bit
>
> Name: makeseq.c
> makeseq.c Type: text/x-csrc
> Encoding: 7bit

--
Jon C. Ison, PhD
Proteomics Applications Group
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494500
Fax: +44 1223 494512
E-mail: jison at rfcgr.mrc.ac.uk
Web: http://www.rfcgr.mrc.ac.uk
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From henrikki.almusa at helsinki.fi Mon Sep 27 08:33:16 2004
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Mon, 27 Sep 2004 11:33:16 +0300
Subject: New program: makeseq
In-Reply-To: <4154370F.FE074597@hgmp.mrc.ac.uk>
References: <200409241023.33656.henrikki.almusa@helsinki.fi> <4154370F.FE074597@hgmp.mrc.ac.uk>
Message-ID: <200409271133.16344.henrikki.almusa@helsinki.fi>

On Friday 24 September 2004 18:02, Dr J.C. Ison wrote:
> > 1. Acd handling
> The cleanest way to do this is to use a "Toggle" ACD data item.
> e.g.

Ok, this part seems to work ok. I put toggle for both data file and insert info.
> You don't need to prompt the user for sequence type though, because
> "sequence" data items have attributes:
>
> sequence: sequence
> [
>   parameter: "Y"
>   type: protein
> ]
>
> sequence.begin (start residue, i.e. -sbegin value)
>
> You access them in ACD by e.g. $(sequence.begin) etc.
> e.g. to ensure your insert isn't past the end of the sequence use
> maximum: $(sequence.end)

Well, I don't have a sequence there anywhere. And the problem also comes from
the fact that the data file can determine the type as well. It is now queried
if the data file is not given. And since the insert is counted within the
sequence length, the maximum place to start the insert is length -
insert.length. That calculation doesn't seem to work either, so I'm checking
that inside the code.

> > 2. Segfaults
> If you really can't fix it get back in touch and I can run it through
> Purify.

That would be nice. I honestly can't figure this one out. I checked that the
insert goes there (the insert's AjPStr can be printed with ajFmtPrint() before
the test).

> > 3. Uniformity
> I'm presuming the 10 and 40 are size of your two arrays. If you want
> to treat them as strings you have to leave space for your terminating
> NUL, so 41 and 11 would do it. All arbitrary limits really should be
> avoided though, use e.g.
>
> AjPStr seqCharProtPure=NULL;
> seqCharProtPure=ajStrNewC("ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy");
>
> and ajStrChar to return a single character from a string at a given
> position.

I use the length to tell me the size of the char array that exists. Then, when
creating a random sequence, I can just ask for a random number between 0 and
length to get a character for the sequence. Well, there is one abstraction
layer between that char array and the final one used in randomised selection,
but that's because of cusp. There are no arbitrary limits as such. The above
char arrays are used in the makeseq_default_chars function.
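
The selection scheme described above (a random number used as an index into
the character array) could look roughly like this in plain C. This is only a
sketch, not the code from makeseq.c: demo_alphabet, fill_random and
fill_random_str are hypothetical names, and rand() stands in for whatever
random number source the program actually uses. The point it illustrates is
that the index must stay strictly below the number of characters, otherwise
the terminating NUL can be picked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical alphabet; same characters as the nucleotide literal
   quoted in the thread. */
const char demo_alphabet[] = "ACGTUacgtu";

/* Write `length` random characters taken from alphabet[0 .. n_chars-1]
   into out, then NUL-terminate. out must have room for length + 1 bytes. */
void fill_random(char *out, size_t length,
                 const char *alphabet, size_t n_chars)
{
    size_t i;

    assert(n_chars > 0);
    for (i = 0; i < length; i++) {
        /* rand() % n_chars lies in 0 .. n_chars-1, so the NUL at
           alphabet[n_chars] is never selected. The modulo adds a slight
           bias, which is acceptable for a sketch. */
        out[i] = alphabet[(size_t)rand() % n_chars];
    }
    out[length] = '\0';
}

/* Convenience wrapper that derives the character count with strlen()
   instead of a separately maintained length constant. */
void fill_random_str(char *out, size_t length, const char *alphabet)
{
    fill_random(out, length, alphabet, strlen(alphabet));
}
```

A caller would seed the generator once, e.g. with srand(), and then call
fill_random_str(buf, 32, demo_alphabet) with a buffer of at least 33 bytes.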
> > Is there a way to use something more generic, so that if
> > emboss changes these things, they would be applied to this program as
> > well?
>
> There might (perhaps should!) be - Alan Bleasby
> (ableasby at rfcgr.mrc.ac.uk) is the best man to ask about that.

Ok. I'll put another post to emboss-dev later on this.

> I've attached the template I use for the DOMAINATRIX documentation, e.g.
>
> http://www.rfcgr.mrc.ac.uk/Software/EMBOSS/Apps/domainatrix/rocon.html
> With this template, I document stuff by hand. The only external program
> I use is "acdtable" to get the ACD stuff. This is slightly different
> from the format used for EMBOSS apps though.

So there is no script to run to get basic info from the acd file into an html
file. Then it's just manual labour of copying and writing the html file :).

> Hope this helps and thanks for the interest
>
> Cheers
>
> Jon

Thanks for the help. I attached the new versions of the .c and .acd files.

--
Henrikki Almusa
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makeseq.c
Type: text/x-csrc
Size: 8999 bytes
Desc: not available
URL:
-------------- next part --------------
application: makeseq [
  documentation: "Creates random sequences"
  groups: "Edit"
]

section: required [
  information: "Required section"
  type: "page"
]

integer: amount [
  standard: "Y"
  default: "100"
  minimum: "1"
  information: "Number of sequences"
]

integer: length [
  standard: "Y"
  default: "100"
  minimum: "1"
  information: "Length of single sequence"
]

toggle: useinsert [
  standard: "Y"
  information: "Do you want to make an insert"
  default: "N"
]

string: insert [
  standard: "$(useinsert)"
  information: "Inserted string"
  help: "String that is inserted into sequence"
#  nullok: "Y"
  knowntype: "sequence"
]

integer: start [
  standard: "$(useinsert)"
  information: "Start point of inserted sequence"
  minimum: "1"
  default: "1"
#  maximum: "@($(length) - @($(insert) ?
#   $(insert.length)-1 : 0))"
]

toggle: usedata [
  standard: "Y"
  information: "Do you want to use distribution file"
  default: "N"
]

endsection: required

section: input [
  information: "Input section"
  type: "page"
]

infile: data [
  standard: "$(usedata)"
  information: "Distribution file"
  help: "This file should be pepstats output file to create protein sequences or cusp output to create nucleotide sequence. Nucleotide sequences will be created as triplets with end trimmed to be correct length."
  nullok: "Y"
]

endsection: input

section: advanced [
  information: "Advanced section"
  type: "page"
]

boolean: protein [
  standard: "@($(usedata) ? N : Y)"
  default: "N"
  information: "Make protein sequences"
]

endsection: advanced

section: output [
  information: "Output section"
  type: "page"
]

seqoutall: outseq [
  parameter: "Y"
  type: "any"
  name: "makeseq"
]

endsection: output

From henrikki.almusa at helsinki.fi Mon Sep 27 09:04:07 2004
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Mon, 27 Sep 2004 12:04:07 +0300
Subject: ACD, toggle handling
Message-ID: <200409271204.07484.henrikki.almusa@helsinki.fi>

Hello,

I've written a program that uses a toggle in ACD to allow one or more options
to depend on others. E.g.

toggle: useinsert [
  standard: "Y"
  information: "Do you want to make an insert"
  default: "N"
]

string: insert [
  standard: "$(useinsert)"
  information: "Inserted string"
]

integer: start [
  standard: "$(useinsert)"
  information: "Start point of inserted sequence"
  default: "1"
]

Now if I give the command-line option '-insert y', it asks "Do you want to
make an insert[N]: ". If I answer no, it makes the insert with the default
start value. If I say yes, it then asks for the start point.

I would suggest that, since I have already given an option that depends on
the toggle, the program should automatically assume that useinsert is 'Y'.
It should not ask whether I want to add an insert and then ask for the start
point of the insert.
Most often I use the command line options when giving out files, as bash has
autocompletion. Would this be a good way to handle this?

Sincerely,
--
Henrikki Almusa

From ableasby at hgmp.mrc.ac.uk Mon Sep 27 09:34:26 2004
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Mon, 27 Sep 2004 10:34:26 +0100 (BST)
Subject: ACD, toggle handling
Message-ID: <200409270934.i8R9YQ3M006048@bromine.hgmp.mrc.ac.uk>

Hello Henrikki,

You have the definition:

toggle: useinsert [
  standard: "Y"
  information: "Do you want to make an insert"
  default: "N"
]

The more usual form would be to omit the "standard" definition,
in which case it will not prompt.

Alan

From henrikki.almusa at helsinki.fi Mon Sep 27 09:44:54 2004
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Mon, 27 Sep 2004 12:44:54 +0300
Subject: ACD, toggle handling
In-Reply-To: <200409270934.i8R9YQ3M006048@bromine.hgmp.mrc.ac.uk>
References: <200409270934.i8R9YQ3M006048@bromine.hgmp.mrc.ac.uk>
Message-ID: <200409271244.54334.henrikki.almusa@helsinki.fi>

On Monday 27 September 2004 12:34, Alan Bleasby wrote:
> Hello Henrikki,
>
> You have the definition:
>
> toggle: useinsert [
>   standard: "Y"
>   information: "Do you want to make an insert"
>   default: "N"
> ]
>
>
> The more usual form would be to omit the "standard" definition
> in which case it will not prompt.

Yes, but then it still won't prompt for the other option (in this example
'start') that depends on 'useinsert', and that prompt is needed. To get it,
one would have to give '-useinsert' on the command line as well. I think that
'-useinsert -insert y' is not very user-friendly.
--
Henrikki Almusa

From pmr at ebi.ac.uk Mon Sep 27 14:34:19 2004
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Mon, 27 Sep 2004 15:34:19 +0100 (BST)
Subject: ACD, toggle handling
In-Reply-To: <200409271204.07484.henrikki.almusa@helsinki.fi>
References: <200409271204.07484.henrikki.almusa@helsinki.fi>
Message-ID: <61381.204.50.11.2.1096295659.squirrel@webmail.ebi.ac.uk>

Hi Henrikki,

> Now if I give the command-line option '-insert y', it asks "Do you want to
> make an insert[N]: ". If I answer no, it makes the insert with the default
> start value. If I say yes, it then asks for the start point.
>
> I would suggest that, since I have already given an option that depends on
> the toggle, the program should automatically assume that useinsert is 'Y'.
> It should not ask whether I want to add an insert and then ask for the start
> point of the insert.
> Most often I use the command line options when giving out files, as bash has
> autocompletion. Would this be a good way to handle this?

When the ACD file is processed, the "useinsert" toggle is processed first.
The value for "insert" has not yet been processed, and cannot be used.

Toggles are usually not prompted for, so removing the 'standard: "Y"' from
useinsert would perhaps give the behaviour you want (you can also make that
default to "Y" and have -nouseinsert as a command-line option to turn off
prompting).

As -useinsert only controls whether the other options are prompted for (that
is what toggle is for) you can set the default either way.

Hope this helps,

Peter

From henrikki.almusa at helsinki.fi Wed Sep 29 06:41:50 2004
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Wed, 29 Sep 2004 09:41:50 +0300
Subject: ajSeqTypeCheckS
Message-ID: <200409290941.50650.henrikki.almusa@helsinki.fi>

Hello,

I was reading through ajseqtype.c to find a more explicit way to check the
sequence type for protein (other than ajSeqTypeAnyprotS) and found
ajSeqTypeCheckS. This would be a good function.
However, it seems to return ajTrue even when the check should fail. The code
that would return ajFalse has been commented out. Why is that, and what would
need to be done for it to be uncommented?

Thanks,
--
Henrikki Almusa