From gbottu at ben.vub.ac.be Fri Dec 7 05:59:49 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 7 Dec 2001 11:59:49 +0100 (MET)
Subject: compiling EMNU on CompaqTru64
Message-ID: <200112071059.LAA16923@bigben.vub.ac.be>

from : BEN

Dear colleagues,

I have a problem. I am trying to compile EMNU on our new computer. We have OS
CompaqTru64 5.1 and compiler GNU gcc 3.0.1
It does not work because the files menu.h, form.h, eti.h, libmenu.a and
libform.a are lacking. Anyone an idea where to obtain these ?

Guy Bottu

From gwilliam at hgmp.mrc.ac.uk Fri Dec 7 06:09:48 2001
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 07 Dec 2001 11:09:48 +0000
Subject: compiling EMNU on CompaqTru64
References: <200112071059.LAA16923@bigben.vub.ac.be>
Message-ID: <3C10A37C.7C9CB92A@hgmp.mrc.ac.uk>

The libmenu.a, menu.h and libform.a, form.h files are part of the
standard curses (or ncurses) UNIX libraries.
Check that these are set up correctly.

ncurses is available from:
ftp://dickey.his.com/ncurses/
or
ftp://ftp.gnu.org/pub/gnu/ncurses

Read emnu's INSTALL file for 'configure's arguments to point to the
required libraries.

Guy Bottu wrote:
>
> from : BEN
>
> Dear colleagues,
>
> I have a problem. I am trying to compile EMNU on our new computer. We have OS
> CompaqTru64 5.1 and compiler GNU gcc 3.0.1
> It does not work because the files menu.h, form.h, eti.h, libmenu.a and
> libform.a are lacking. Anyone an idea where to obtain these ?
>
> Guy Bottu

--
Gary Williams                Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk    http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK

From ableasby at hgmp.mrc.ac.uk Fri Dec 7 06:10:26 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Fri, 7 Dec 2001 11:10:26 GMT
Subject: compiling EMNU on CompaqTru64
Message-ID: <200112071110.LAA24602@bromine.hgmp.mrc.ac.uk>

Hi Guy,

I believe you'll find them if you install GNU ncurses from ftp.gnu.org

Cheers

Alan

From mad at biol.unlp.edu.ar Fri Dec 7 09:44:33 2001
From: mad at biol.unlp.edu.ar (Sarachu Martin)
Date: Fri, 07 Dec 2001 11:44:33 -0300 (ART)
Subject: gcg and solaris 8
Message-ID: <1007736273.3c10d5d1aaead@www.biol.unlp.edu.ar>

Hi,

sorry for the off-topic but maybe you can help me. Do you know if GCG 9 does
run on a UltraSparc/Solaris 8 system? I installed GCG 9 on a Intel/Solaris 8
system and got a "cannot execute exe file" error on several files. GCG doesn't
run on a PC platform?

Thanks,

martin.

From ztu at msi.umn.edu Fri Dec 7 09:54:15 2001
From: ztu at msi.umn.edu (Zheng Jin Tu)
Date: Fri, 7 Dec 2001 08:54:15 -0600 (CST)
Subject: gcg and solaris 8
In-Reply-To: <1007736273.3c10d5d1aaead@www.biol.unlp.edu.ar>
Message-ID:

Hi Sarachu:

The best place to ask is Accelrys. The company will have a better idea of
which operating systems are supported.

Email: Help at GCG.Com

Thanks,

Tu

On Fri, 7 Dec 2001, Sarachu Martin wrote:

> Hi,
>
> sorry for the off-topic but maybe you can help me. Do you know if GCG 9 does
> run on a UltraSparc/Solaris 8 system? I installed GCG 9 on a Intel/Solaris 8
> system and got a "cannot execute exe file" error on several files. GCG doesn't
> run on a PC platform?
>
> Thanks,
>
> martin.
>

From mathog at mendel.bio.caltech.edu Wed Dec 12 13:44:25 2001
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Wed, 12 Dec 2001 10:44:25 -0800
Subject: quick questions
Message-ID:

1. Is this list archived in a searchable form somewhere?

2. What Ajax call or calls say if a command line switch was or wasn't
present? For instance, at the moment when this

   foo = AjGetInt("Somekey");

returns foo = 0 I can't tell if the user entered "-somekey=0" or just
left it off the line.

3. What entries have to go in the makefile to result in an EMBOSS
executable that gdb will debug? This is on Solaris 8. I tried using
-g alone, but gdb didn't like the resulting executable. It would
start it, but "bt" (backtrace) only showed binary addresses.
GDB's exact message was:

This GDB was configured as
"sparc-sun-solaris2.8"..."/usr/local/src/EMBOSS/embassy/ESIM4-1.0.0/source/esim4":
not in executable format: File format not recognized

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From peter.rice at uk.lionbioscience.com Thu Dec 13 05:09:35 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Thu, 13 Dec 2001 10:09:35 +0000
Subject: quick questions
References:
Message-ID: <3C187E5F.9D94B7D3@uk.lionbioscience.com>

Hi David,

>2. what Ajax call or calls say if a command line switch was or wasn't
>present?

None. Values can be set on the command line, or by dependence on other
values, or just default.

Why would you like to know what was on the command line? It could be tricky
for GUI interfaces if they deliberately put everything on the command line,
default values and all.

>3. What entries have to go in the makefile to result in an EMBOSS
>executable that gdb will debug?

None. Just run:

./configure --enable-debug

before you make.

regards,

Peter Rice

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From kkmattil at csc.fi Thu Dec 13 08:08:52 2001
From: kkmattil at csc.fi (Kimmo Mattila)
Date: Thu, 13 Dec 2001 15:08:52 +0200 (EET)
Subject: Problems with fuzzpro and ehmmer
Message-ID:

Dear EMBOSS people.

I have had a few problems with fuzzpro, patmatdb and ehmmer. If any of you
have suggestions on how to solve them, please tell me.

FUZZPRO and PATMATDB

I am using fuzzpro and patmatdb with GCG formatted databases. If I run a
search against a whole database (e.g. swiss:*), the programs do find the right
hit sequences, but pick wrong names for the found entries. With plain
sequence files or with sequence name lists, this error does not occur. I have
checked both the EMBOSS indexing and the GCG database files and they should be
OK.
Other EMBOSS and GCG applications give correct results when the same
database files are used. Has someone else had similar troubles? If the
indexing of the databases is in order, what might cause this?

EHMMER

We have successfully installed EMBOSS-HMMER; however, unlike the native
HMMER, the EMBOSS version is not able to use multiple processors (even though
the -cpu option is mentioned in the help data). When I compared the Makefile of
EMBOSS-HMMER to the native one, I noticed that the EMBOSS version lacks the
settings for compiling a multiprocessor version of HMMER. Has someone managed to
circumvent this with some simple trick like copying some parts of the original
HMMER Makefile to the Makefile of the EMBOSS version?

Secondly, when I use ehmmsearch, long output files are not complete. After
about 200 lines ehmmsearch starts writing the output to the screen
instead of the output file. The last line in the output file seems to be

Domain top hits:

And after this the alignments are printed to the screen. What might cause
this? Is there, e.g., some limit on the output file size?

Regards,

Kimmo Mattila

---------------------------------------------------------------
Kimmo Mattila                 Science Support
kimmo.mattila at csc.fi       Center for Scientific Computing
tel. +358 (0)9 457 2708       Tekniikantie 15a D, PL 405
fax. +358 (0)9 457 2302       FIN-02101 Espoo, Finland
---------------------------------------------------------------

From mathog at mendel.bio.caltech.edu Thu Dec 13 11:04:30 2001
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Thu, 13 Dec 2001 08:04:30 -0800
Subject: quick questions
Message-ID:

> >2. what Ajax call or calls say if a command line switch was or wasn't
> >present?
>
> None. Values can be set on the command line, or by dependence on other
> values, or just default.
>
> Why would you like to know what was on the command line? It could be tricky
> for GUI interfaces if they deliberately put everything on the command line,
> default values and all.

Consider an optional integer parameter "foobar" for which 0 is a valid
value and also where if foobar is not specified, it is calculated based
on the input sequences. That is, it does not have a fixed default
value. I see no way to distinguish between "calculate value" and "use this
value" when AjAcdGetInt returns 0.

The workaround would be to set the default in the .acd file to a magic
default value, say -1000000, which is out of range for the desired variable,
and interpret that value as "not specified". There are three problems
with this approach:

1. There may be cases for which there are no magic values available.
2. In w2h the default value shows up filled in on the Web interface.
   So the user sees -1000000 and wonders what the heck that means, or thinks
   that -900000 might also be valid.
3. It requires that range checking be disabled or special cased.

I guess I'll have a look at the code for AjAcdGetInt and see if it's
possible to modify that into AjAcdItemExists, returning a boolean
T/F for when the item has been specified. Then the code would be
(more or less like on GCG)

  if(AjAcdItemExists("foobar")){
    ifoobar=AjAcdGetInt("foobar");
  }
  else {
    ifoobar=calculated_value();
  }

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From peter.rice at uk.lionbioscience.com Thu Dec 13 11:21:21 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Thu, 13 Dec 2001 16:21:21 +0000
Subject: quick questions
References:
Message-ID: <3C18D581.7A22FDB8@uk.lionbioscience.com>

David Mathog wrote:
>
> Consider an optional integer parameter "foobar" for which 0 is a valid
> value and also where if foobar is not specified, it is calculated based
> on the input sequences.
>
> I guess I'll have a look at the code for AjAcdGetInt and see if it's
> possible to modify that into AjAcdItemExists, returning a boolean
> T/F for when the item has been specified.
> Then the code would be
> (more or less like on GCG)
>
>   if(AjAcdItemExists("foobar")){
>     ifoobar=AjAcdGetInt("foobar");
>   }
>   else {
>     ifoobar=calculated_value();
>   }

Calculated values are intended to be calculated in the ACD file. Interfaces
such as W2H should be able to do this in JavaScript, though in some cases
they have to simply treat values as integers.

Try this ACD file. Save it as 'foobar.acd' and run as 'acdc foobar'.

It will prompt for a sequence, then prompt for foobar with the sequence
length as default but will accept any value from 0 to the sequence length.
The 'echo' string is defined so you can see the value of foobar in the prompt.

The default value can be calculated in more exotic ways too ... see the @()
functions and the other calculated attributes. More can be easily added.

====================
appl: foobar [
  documentation: "ACD example"
  groups: "test"
]

sequence: sequence [
  required: "Y"
]

integer: foobar [
  required: "Y"
  default: "$(sequence.len)"
  minimum: "0"
  maximum: "$(sequence.len)"
]

string: echo [
  prompt: "Foobar is $(foobar)"
  required: "Y"
]
===================

There are many other ways to set options. You could set a boolean to calculate
a value, and another value to define the calculation.

Testing the command line will have real problems for your original idea,
because an interface might be writing every option, with what it considers
the default value, on the command line.

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From mathog at mendel.bio.caltech.edu Thu Dec 13 12:33:48 2001
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Thu, 13 Dec 2001 09:33:48 -0800
Subject: quick questions
Message-ID:

> Calculated values are intended to be calculated in the ACD file. Interfaces
> such as W2H should be able to do this in JavaScript, though in some cases
> they have to simply treat values as integers.

The default value in this case is the end result of at least a hundred
lines of C code.

>
> Testing the command line will have real problems for your original idea,
> because an interface might be writing every option, with what it considers
> the default value, on the command line.
>

That's a good point. W2H isn't like that, but some other interface might
be. I guess it won't hurt to add a couple of extra booleans to cover those
variables whose default is difficult to calculate prior to the program
running.

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From mathog at mendel.bio.caltech.edu Thu Dec 13 14:01:43 2001
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Thu, 13 Dec 2001 11:01:43 -0800
Subject: quick questions
Message-ID:

Hmm, after going up and down through the ACD notation I can't find
what I'm looking for there either. Consider this notation:

bool: usermspA [
  opt: Y
  def: N
  info: "False: esim4 calculates mspA, True: mspA from command line."
]

int: mspA [
  opt: $(usermspA)
  req: $(usermspA)
  def: 16
  info: "long description. default of 16 is not used unless usermspA is specified.."
]

If the command is issued with

  -usermspA

then it will prompt for -mspA if it wasn't also specified, which gives
the desired results. However, if the command has only this on the
command line.

  -mspA=16

it clearly means that the user really wants to use the value of 16 for the
parameter. How then to switch the state on -usermspA automatically, or
failing that, prompt for -usermspA?

16 happens to be the default value. It wasn't set to an illegal (magic)
value because we don't want -1000000 showing up in a GUI. But it isn't
normally used because -usermspA will be false. As before, we could use a
sort of magic number and do:

bool: usermspA [
  opt: Y
  def: @($(mspa)!=16)
  info: "False: esim4 calculates mspA, True: mspA from command line."
]

and it will correctly flip the bit when the user specifies it - except when
by bad luck they choose to specify the default value. And round and round the
logic goes.

I don't suppose that there is a ".specified" or ".online" attribute in ACD?
Ie, this would do the job:

bool: usermspA [
  opt: Y
  def: $(mspa.online)
  info: "False: esim4 calculates mspA, True: mspA from command line."
]

The desired GUI interaction in that case could be one of:

1. changing value in mspA toggles state of usermspA (messy)
2. -mspA slot is grayed out unless -usermspA is set (simpler)

In some interfaces this could be covered over with Javascript - but the
command line variant still wouldn't work exactly right.

Or am I missing something?

Summary:

works:                                   command
works:                                   command -usermspA -mspA 16
works (prompts for mspA):                command -usermspA
fails to prompt or override usermspA:    command -mspA 16

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From peter.rice at uk.lionbioscience.com Fri Dec 14 04:41:17 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 14 Dec 2001 09:41:17 +0000
Subject: quick questions
References:
Message-ID: <3C19C93D.A3FD713F@uk.lionbioscience.com>

David Mathog wrote:
>
> Hmm, after going up and down through the ACD notation I can't find
> what I'm looking for there either.
>
> The desired GUI interaction in that case could be one of:
>
> 1. changing value in mspA toggles state of usermspA (messy)

This means 'mspA depends on usermspA' and 'usermspA depends on mspA'.
ACD expressly forbids this. All dependencies must be to something defined
earlier in the file.

> 2. -mspA slot is grayed out unless -usermspA is set (simpler)

Could be done with an extra ACD attribute, with a value of "$(usermspA)", but
you would expect most GUIs to ignore this.

In general, you can expect to have options in EMBOSS that are not used by the
program but can still be set on the command line. Your -mspA is just another
case.

Having said that, adding an ACD function (you would only need one) to test
whether a value was set by the user is fairly trivial (setting via the
command line or by replying to a prompt if there is one).

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From charles at moulinette.dyndns.org Tue Dec 18 03:56:25 2001
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Tue, 18 Dec 2001 09:56:25 +0100
Subject: phylogenic analysis with emboss
Message-ID: <20011218085625.GB803@gizmotronics.dyndns.org>

Hi,

I was wondering which tools you used for phylogenic analysis, since I can't
find any treedrawing in either emboss or embassy's phylip.

Charles

From gbottu at ben.vub.ac.be Tue Dec 18 04:34:43 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Tue, 18 Dec 2001 10:34:43 +0100 (MET)
Subject: phylogenic analysis with emboss
Message-ID: <200112180934.KAA04579@bigben.vub.ac.be>

from : BEN

>I was wondering which tools you used for phylogenic analysis, since I can't
>find any treedrawing in either emboss or embassy's phylip.

If I am not wrong, the EMBOSS on-line help states explicitly that the tree
drawing programs and the tree editors of PHYLIP were not included in the
embassy PHYLIP. So, you should retrieve the original PHYLIP package from
evolution.genetics.washington.edu and use the programs drawgram and drawtree.

Note that while embassy has integrated PHYLIP version 3.53c, there is now a
version 3.6a2, which is definitely better. drawgram/drawtree now have an
X display for previewing the graphic, and the generated PostScript files can
not only be sent directly to a printer but can also be incorporated into
documents such as MS-Word doc files.

Another useful freeware tool I know about is NJplot, which is distributed
together with CLUSTAL (which you must install anyway in order for emma to work).

Guy Bottu

From letondal at pasteur.fr Wed Dec 19 09:03:18 2001
From: letondal at pasteur.fr (Catherine Letondal)
Date: Wed, 19 Dec 2001 15:03:18 +0100
Subject: how to cite EMBOSS?
Message-ID: <200112191403.fBJE3IW452649@electre.pasteur.fr>

Hi,

Sorry if this is an FAQ, but I was not able to find any reference
in EMBOSS documentation and Web site (apart from the original
algorithms of course). Is there any reference for the EMBOSS project?

Thanks a lot,

--
Catherine Letondal -- Pasteur Institute Computing Center

From gwilliam at hgmp.mrc.ac.uk Wed Dec 19 09:05:49 2001
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Wed, 19 Dec 2001 14:05:49 +0000
Subject: how to cite EMBOSS?
References: <200112191403.fBJE3IW452649@electre.pasteur.fr>
Message-ID: <3C209EBD.763B54F7@hgmp.mrc.ac.uk>

See the FAQ file:

Q) Is there a reference I can cite for EMBOSS?

A) Rice,P. Longden,I. and Bleasby,A.
   "EMBOSS: The European Molecular Biology Open Software Suite"
   Trends in Genetics June 2000, vol 16, No 6. pp.276-277

You are right - it should be in a more obvious place.

Gary

Catherine Letondal wrote:
>
> Hi,
>
> Sorry if this is an FAQ, but I was not able to find any reference
> in EMBOSS documentation and Web site (apart from the original
> algorithms of course). Is there any reference for the EMBOSS project?
>
> Thanks a lot,
>
> --
> Catherine Letondal -- Pasteur Institute Computing Center

--
Gary Williams                Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk    http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK

From letondal at pasteur.fr Wed Dec 19 09:10:44 2001
From: letondal at pasteur.fr (Catherine Letondal)
Date: Wed, 19 Dec 2001 15:10:44 +0100
Subject: how to cite EMBOSS?
In-Reply-To: Your message of "Wed, 19 Dec 2001 14:05:49 GMT."
             <3C209EBD.763B54F7@hgmp.mrc.ac.uk>
Message-ID: <200112191410.fBJEAiW438064@electre.pasteur.fr>

"Gary Williams, Tel 01223 494522" wrote:
>
> See the FAQ file:
>
> Q) Is there a reference I can cite for EMBOSS?
>
> A) Rice,P. Longden,I. and Bleasby,A.
>    "EMBOSS: The European Molecular Biology Open Software Suite"
>    Trends in Genetics June 2000, vol 16, No 6. pp.276-277
>
> You are right - it should be in a more obvious place.

Thanks - yes, maybe in the
http://www.uk.embnet.org/Software/EMBOSS/general.html page?

>
> Gary
>
> Catherine Letondal wrote:
> >
> > Hi,
> >
> > Sorry if this is an FAQ, but I was not able to find any reference
> > in EMBOSS documentation and Web site (apart from the original
> > algorithms of course). Is there any reference for the EMBOSS project?
> >
> > Thanks a lot,
> >
> > --
> > Catherine Letondal -- Pasteur Institute Computing Center
>
> --
> Gary Williams                Tel: +44 1223 494522  Fax: +44 1223 494512
> mailto:G.Williams at hgmp.mrc.ac.uk    http://www.hgmp.mrc.ac.uk/
> Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK

--
Catherine Letondal -- Pasteur Institute Computing Center

From gwilliam at hgmp.mrc.ac.uk Wed Dec 19 09:21:09 2001
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Wed, 19 Dec 2001 14:21:09 +0000
Subject: how to cite EMBOSS?
References: <200112191410.fBJEAiW438064@electre.pasteur.fr>
Message-ID: <3C20A255.80D77C57@hgmp.mrc.ac.uk>

Catherine Letondal wrote:
>
> "Gary Williams, Tel 01223 494522" wrote:
> >
> > See the FAQ file:
> >
> > Q) Is there a reference I can cite for EMBOSS?
> >
> > A) Rice,P. Longden,I. and Bleasby,A.
> >    "EMBOSS: The European Molecular Biology Open Software Suite"
> >    Trends in Genetics June 2000, vol 16, No 6. pp.276-277
> >
> > You are right - it should be in a more obvious place.
>
> Thanks - yes, maybe in the
> http://www.uk.embnet.org/Software/EMBOSS/general.html page?

Done.

Gary

--
Gary Williams                Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk    http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK

From simon.andrews at bbsrc.ac.uk Wed Dec 19 09:55:43 2001
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Wed, 19 Dec 2001 14:55:43 -0000
Subject: Farm files for databases
Message-ID: <2DC41140A89ED411989D00508BDCD9EDEA51EC@bi-exsrv1.iapc.bbsrc.ac.uk>

I'm trying to find the best way to do the following:

I have an application which returns an identifier (effectively an
accession number), which could be present in any one of 4 separate
EMBOSS databases.

I'd like to be able to search all of these databases and retrieve the
sequence from whichever one finds it (I know that the identifiers are
unique between the different databases - so I'll only ever find one
entry).

Having read the EMBOSS documentation the only reference I could find for
doing this sort of thing was to make a database entry with an "EXTERNAL"
format, and then have seqret query a script to return the sequence.
However the details for this are pretty sketchy.

What exactly would a script of this type have to do? What input is it
supplied with (and how), and what must it return? Is this the only (or
best) way to do what I'm trying to do?

Any help is much appreciated.

TTFN

Simon.

----
Simon Andrews PhD
Bioinformatics Dept
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0)1223 496463

From peter.rice at uk.lionbioscience.com Wed Dec 19 10:16:46 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 19 Dec 2001 15:16:46 +0000
Subject: Farm files for databases
References: <2DC41140A89ED411989D00508BDCD9EDEA51EC@bi-exsrv1.iapc.bbsrc.ac.uk>
Message-ID: <3C20AF5E.F905F683@uk.lionbioscience.com>

"simon andrews (BI)" wrote:
>
> I have an application which returns an identifier (effectively an
> accession number), which could be present in any one of 4 separate
> EMBOSS databases.
> I'd like to be able to search all of these databases and retrieve the
> sequence from whichever one finds it (I know that the identifiers are
> unique between the different databases - so I'll only ever find one
> entry).

This sounds like a job for SRS, although the query could be complicated if
there is a possibility of getting more than one copy returned.

A script is a good solution. The script should read the dbname:id query from
the commandline, and return the sequence in some specified format.

What the script does is up to you. If there is no sequence found, it can
simply return nothing.

The original 'external' applications were the 'efetch' utility in acedb, and
GCG's 'typedata'.

Peter

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From jason at cgt.mc.duke.edu Wed Dec 19 16:01:23 2001
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Wed, 19 Dec 2001 16:01:23 -0500 (EST)
Subject: alignment sequence reading with stop codons (bug?)
Message-ID:

I noticed this in playing with our new bioperl wrappers for EMBOSS.
Apparently -seqall does not read sequences with stop codons.
I can submit as a bug if that is more appropriate. Getting warmed up to
the EMBOSS dev process.

This occurs with both EMBOSS-1.9.1 and CVS code I checked out today
(2.0.1 I guess).

The work around is of course to specify the arguments in the correct way
or replace the stop codon with something like X. I know which sequence
will have potential stop codons so I can work around this in my own code.

[jason at gordola crypto_intergenic]$ cat jason.seq
>SW-CC27_YEAST SW:CC27_YEAST P38042 saccharomyces cerevisiae (baker's yeast). cell division control protein 27.
10/2001; PIR:S45825 cell division control protein CDC27 - yeast (Saccharomyces cerevisia
MAVNPELAPFTLSRGIPSFDDQALSTIIQLQDCIQQAIQQLNYSTAEFLAELLYAECSIL
DKSSVYWSDAVYLYALSLFLNKSYHTAFQISKEFKEYHLGIAYIFGRCALQLSQGVNEAI
LTLLSIINVFSSNSSNTRINMVLNSNLVHIPDLATLNCLLGNLYMKLDHSKEGAFYHSEA
LAINPYLWESYEAICKMRATVDLKRVFFDIAGKKSNSHNNNAASSFPSTSLSHFEPRSQP
SLYSKTNKNGNNNINNNVNTLFQSSNSPPSTSASSFSSIQHFSRSQQQQANTSIRTCQNK
NTQTPKNPAINSKTSSALPNNISMNLVSPSSKQPTISSLAKVYNRNKLLTTPPSKLLNND
RNHQNNNNNNNNNNNNNNNNNNNNNNNNIINKTTFKTPRNLYSSTGRLTTSKKNPRSLII
SNSILTSDYQITLPEIMYNFALILRSSSQYNSFKAIRLFESQIPSHIKDTMPWCLVQLGK
LHFEIINYDMSLKYFNRLKDLQPARVKDMEIFSTLLWHLHDKVKSSNLANGLMDTMPNKP
ETWCCIGNLLSLQKDHDAAIKAFEKATQLDPNFAYAYTLQGHEHSSNDSSDSAKTCYRKA
LACDPQHYNAYYGLGTSAMKLGQYEEALLYFEKARSINPVNVVLICCCGGSLEKLGYKEK
ALQYYELACHLQPTSSLSKYKMGQLLYSMTRYNVALQTFEELVKLVPDDATAHYLLGQTY
RIVGRKKDAIKELTVAMNLDPKGNQVIIDELQKCHMQE

[jason at gordola crypto_intergenic]$ cat prot.seq
>Contig5745
CLIF*RLLLIQMIHPQARRAFTFLQQQEPYRIQSMEQLSTLLWHLADLPALSHLSQSLIS
ISRSSPQAWIAVGNCFSLQKDHDEAMRCFRRATQVDEGCAYAWTLCGYEAVEMEEYERAM
AFYRTAIRTDARHYNAWYVLFFFFFFFFVPGDIDS*PKKGMEWG*FISKRIDRGMRSIIL
KEPSKSIQLIPFFYVALVW*VGVSSYPLETMTNIDFPKKKKALEKSNDVVQALHFYERAS
KYAPTSAMVQFKRIRALVALQRYDEAISALVPLTHSAPDEANVFFLLGKCLLKKERRQEA
TMAFTNARELEPK

[jason at gordola crypto_intergenic]$ water jason.seq prot.seq
Smith-Waterman local alignment.

   An error has been found: Sequence Contig5745 must be protein sequence, found bad character '*'

   An error has been found: option -seqall: Unable to read sequence 'prot.seq'

   There is a serious problem: water terminated: Bad value for option and no prompt

[jason at gordola crypto_intergenic]$ water prot.seq jason.seq
Smith-Waterman local alignment.
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output file [contig5745.water]:

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From bauer at genprofile.com Thu Dec 20 02:02:56 2001
From: bauer at genprofile.com (David Bauer)
Date: Thu, 20 Dec 2001 08:02:56 +0100
Subject: alignment sequence reading with stop codons (bug?)
References:
Message-ID: <3C218D20.D752C59@genprofile.com>

Hi,

the protein alignment programs don't like the '*' in your protein
sequences. They are designed to align true proteins which usually do not
contain stop codons.
If these are putative ORFs, a solution would be to split them up at the
stops, creating a separate protein sequence for each ORF.

I also guess you are misinterpreting the -seqall. This means to return
all sequences from a file containing more than one sequence (like a
fasta formatted file with several sequences separated by their
description lines). For me the -seqall option does not make much sense
in the case of alignment programs which need exactly 2 sequences to
align.
There you must always pass the two sequence files which you want to align
as arguments to the alignment program and each file must contain exactly
one sequence.

I hope this helps,

David Bauer.

Jason Stajich wrote:
>
> I noticed this in playing with our new bioperl wrappers for EMBOSS.
> Apparently -seqall does not read sequences with stop codons.
> I can submit as a bug if that is more appropriate. Getting warmed up to
> the EMBOSS dev process.

From simon.andrews at bbsrc.ac.uk Thu Dec 20 04:21:43 2001
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 20 Dec 2001 09:21:43 -0000
Subject: Bug in entret.
Message-ID: <2DC41140A89ED411989D00508BDCD9EDEA51F1@bi-exsrv1.iapc.bbsrc.ac.uk>

Following on from my query yesterday, I have hit a problem trying to
implement a multiple search because of what appears to be a bug in
entret.

I am using a series of fasta flat files, indexed with dbifasta.
What I am finding is that although I can retrieve entries from the
database with seqret, using entret always returns an empty file with the
same accession number:

############

%> entret htg_mus:AC092094_v6_c8
Reads and writes (returns) flatfile entries
Output file [ac092094_v6_c8.entret]:
%> more ac092094_v6_c8.entret
%> seqret htg_mus:AC092094_v6_c8
Reads and writes (returns) sequences
Output sequence [ac092094_v6_c8.fasta]:
%> more ac092094_v6_c8.fasta
>AC092094_v6_c8 Mus musculus clone RP23-261m19, WORKING DRAFT SEQUENCE, 8 unordered pieces.
CAGGACAGCCAGGGCTACACAGAGAAACCCTGTCTCAAAAAACAAAAAAACAAAAAAAAA
ACAAAAGAAGAAGAAAATGTCTGTGAATACCCTGGAAAAGTTACTCAGTGAAAGTAGATG
AGTCCCTGAGTCAGTGACAGGAAGTGAGTGCAGTCTGAGCACTGGCTTGTGACCAATGAC
AAAAACATAAGCTAGACTTGCTCTGCAAAGTGGAGGACAGAACAGACAAAGCCCCAGAGT
etc. etc.

############

entret doesn't produce any errors, but if I run it with the -debug
option I see the following lines in entret.dbg

############

Initializing seqInFormat, 40 formats
ajSeqRead: input file '/data/MOUSE/HTG/htg_mus.fasta' still there, try again
seqRead: single access - count 1 - call access routine again
seqAccessEmblcd type 1
query data all finished
seqRead: seqin->Query->Access->Access(seqin) *failed*
ajSeqallNext failed
closing file 'ac092094_v6_c4.entret'

############

I've checked, and the /data/MOUSE/HTG/htg_mus.fasta file is definitely
there, and is readable, so I suspect that something in the EMBOSS
internals is going wrong.

This is using EMBOSS 2.0.0.

Is this a known bug? Is there a fix on the way? I can bluff the script
using seqret in this case, but I'd like to make a more general solution
eventually.

Cheers

Simon.

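A minimal sketch of the "bluff it with seqret" fallback mentioned in the message above, written in Python rather than as part of any posted script. The function names `run_emboss` and `fetch_entry` are hypothetical, and two things are assumptions rather than anything stated in the thread: that entret accepts `stdout` as its output file name, and that the `-auto` qualifier suppresses prompting. Only the `seqret db:id fasta::stdout` form appears in the list traffic itself.

```python
import subprocess


def run_emboss(args):
    """Run an EMBOSS command line and return whatever it wrote to stdout."""
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    return result.stdout


def fetch_entry(usa, run=run_emboss):
    """Try entret first; fall back to seqret if entret returns nothing.

    usa is a database:identifier string such as 'htg_mus:AC092094_v6_c8'.
    run is injectable so the logic can be exercised without EMBOSS installed.
    Returns (tool_name, text), or (None, '') when neither tool finds the entry.
    """
    # Assumption: 'stdout' is accepted as the output file name and
    # -auto turns off interactive prompts.
    text = run(["entret", usa, "stdout", "-auto"])
    if text.strip():
        return "entret", text
    # Fallback: the seqret form used elsewhere in this thread.
    text = run(["seqret", usa, "fasta::stdout", "-auto"])
    if text.strip():
        return "seqret", text
    return None, ""
```

Note that the fallback returns only the FASTA sequence, not the full flatfile entry, so it is a stopgap rather than a replacement for a working entret.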
From simon.andrews at bbsrc.ac.uk Thu Dec 20 07:24:18 2001
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 20 Dec 2001 12:24:18 -0000
Subject: Farm files for databases
Message-ID: <2DC41140A89ED411989D00508BDCD9EDEA51F3@bi-exsrv1.iapc.bbsrc.ac.uk>

After getting some useful info from Peter Rice about how to create a
database farm in EMBOSS I thought I'd share the script I'm now using to
do this.

To use this simply copy and paste the text of the script at the bottom
of this message to a file on your system, then make sure that this file
is readable and executable by everyone (chmod 755 filename).

The comments in the script tell you what changes you need to make to the
script itself, and the format of the entry you need to create in
emboss.default.

Because of the bug I previously reported in entret, this script will not
work from an entret query to the farm. It will work with seqret (and
will output any format you like), and can also be used as part of a USA
for any of the standard EMBOSS programs.

The script requires a unix-like OS, but could trivially be adapted to
run under Win32 if anyone is running EMBOSS under windows.

TTFN

Simon.

------ Script Starts Here -- Beware of long lines wrapping ----------------------

#!/usr/bin/perl -w
use strict;

# EMBOSS farm file script
#
# Written by Simon Andrews
# simon.andrews at bbsrc.ac.uk
# Dec 2001
#
# This script allows you to set up a farm
# of EMBOSS databases which can be queried
# by a single instance of seqret. The
# program must be accompanied by an entry
# in emboss.default which looks like this:
#
# DB name_of_database [
#    type: N   (or P if we're dealing with proteins)
#    method: app
#    format: fasta
#    app: "/path/to/this/script"
#    comment: "Whatever text you'd like to see in showdb" ]
#
# First we need to set a few preferences
#
# What is the full path to seqret?
# If you are sure that seqret will always
# be somewhere in your path, then you can
# just leave this as 'seqret'.
my $seqret_path = 'seqret'; # Now we need to know the names of the # databases you'd like included in the # search. These must be dabases which # have already been indexed, and installed # correctly into emboss.default. Simply # enter the database names between the # brackets, separated by spaces. my @databases = qw(dbase1 dbase2 dbase3); ##### End of bits which need to be edited ######### my ($reference) = @ARGV; if ($reference =~ /:(.+)$/){ $reference = $1; } else { die "\n*** FARM ERROR *** Couldn't get accession after : from $reference\n\n"; } foreach my $database (@databases){ my $sequence = `$seqret_path $database:$reference fasta::stdout 2>/dev/null`; if ($sequence){ print $sequence; exit; } } warn "\n*** FARM ERROR *** Couldn't find $reference in any of '@databases'\n\n"; From lukem at bioinfo.pbi.nrc.ca Thu Dec 20 10:10:19 2001 From: lukem at bioinfo.pbi.nrc.ca (Luke McCarthy) Date: Thu, 20 Dec 2001 09:10:19 -0600 Subject: alignment sequence reading with stop codons (bug?) References: <3C218D20.D752C59@genprofile.com> Message-ID: <3C21FF5B.2C4251BF@bioinfo.pbi.nrc.ca> David Bauer wrote: > > I also guess you are misinterpreting the -seqall. This means to return > all sequences from a file containing more than one sequence (like a > fasta formated file with several sequences separated by theire > description lines). For me the -seqall option does not make much sense > in the case of alignment programs which need exactly 2 sequences to > align. Nevertheless, the acd files for water and needle clearly state that the second argument is a parameter of type seqall. Which makes perfect sense if one wants to align a probe sequence against a database of others (a la BLAST) Cheers, Luke From jason at cgt.mc.duke.edu Thu Dec 20 10:12:48 2001 From: jason at cgt.mc.duke.edu (Jason Stajich) Date: Thu, 20 Dec 2001 10:12:48 -0500 (EST) Subject: alignment sequence reading with stop codons (bug?) 
In-Reply-To: <3C218D20.D752C59@genprofile.com> Message-ID: On Thu, 20 Dec 2001, David Bauer wrote: > Hi, > > the protein alignment programs don't like the '*' in your protein > sequences. They are designed to align true proteins which usualy do not > contain stop codons. > If this are putative ORFs, a solution would be to split them up at the > stops, creating a separate protein sequence for each ORF. > Re-aligning blastx hsps in some distant fungi so am hitting pseudogenes or sequencing errors, hence the stop codons. What is confusing to me wrt to the actual alignment programs, is if they don't like stop codons at all, they still allow an alignment when the sequence containing the stop codon is the query (-sequencea) but not when the sequence is in the subject db - ie the behavior in my previous msg. I may just recode the stop codons as an unknown aa to achieve what I need for the alignment. I realize it is silly to try and align these proteins with stop codons but I am looking for conserved regions for degenerate PCR primer picking. [jason at gordola crypto_intergenic]$ head -6 contig5745.water Local: Contig5745 vs SW-CC27_YEAST Score: 367.50 Contig5745 1 CLIF*RLLLIQMI.HPQARRAFTFLQQQEPYRIQSMEQLSTLLWH 44 ||: | ::| : : : | |: :| |:: || |||||| SW-CC27_YEAST 474 CLVQLGKLHFEIINYDMSLKYFNRLKDLQPARVKDMEIFSTLLWH 518 > I also guess you are misinterpreting the -seqall. This means to return > all sequences from a file containing more than one sequence (like a > fasta formated file with several sequences separated by theire > description lines). For me the -seqall option does not make much sense > in the case of alignment programs which need exactly 2 sequences to > align. > There you must always pass the two sequence files which you want align > as arguments to the alignment program and each file must contain exactly > one sequence. 
> In the alignment program context -seqall is the name of the db to search the query (-sequencea) against - so one will get an alignment of the first sequence against the whole db of sequences. I am only interested in 1 pairwise comparison so the order of the sequences didn't really matter to me. We have a SW alignment module in bioperl (written in C - before you gag) for protein alignments but was trying out our new EMBOSS wrappers in bioperl, hence the reported issue. > I hope this helps, > > David Bauer. > > > Jason Stajich wrote: > > > > I noticed this in playing with our new bioperl wrappers for EMBOSS. > > Apparently -seqall does not read sequences with stop codons. > > I can submit as a bug if that is more appropriate. Getting warmed up to > > the EMBOSS dev process. > -- Jason Stajich Duke University jason at cgt.mc.duke.edu From gbottu at ben.vub.ac.be Thu Dec 20 12:25:52 2001 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Thu, 20 Dec 2001 18:25:52 +0100 (MET) Subject: Farm files for databases, using SRS Message-ID: <200112201725.SAA11576@bigben.vub.ac.be> from : BEN You can also, like Peter suggested, use SRS. For example, I wanted to access the databanks IMGT/LIGM and IMGT/MHC as one databank with name imgt and shortname im. I use SRS for retrieving one or several sequences eventually with their documentation and a direct access to a databank for a full search (faster ?). 
I wrote, in .../emboss/share/EMBOSS/emboss.default :

DB imgt [
  type: N
  comment: 'Immunogenetics Databases'
  methodquery: srs
  dbalias: IMGT
  formatquery: embl
  methodall: direct
  dir: /sw/emboss/DBlink
  file: 'I*'
  formatall: fasta
]

DB im [
  type: N
  comment: 'Immunogenetics Databases'
  methodquery: srs
  dbalias: IMGT
  formatquery: embl
  methodall: direct
  dir: /sw/emboss/DBlink
  file: 'I*'
  formatall: fasta
]

and in .../srs/icarus/site/site.i (hidden so that it does not show up in the WWW page of SRS) :

$imgt_db=$Library:[IMGT
  format:$EMBL_FORMAT
  virtualInfo:$LibVirtual:[
    memberLibs:{$IMGT_DB $MHC_DB}
  ]
  type:hidden
]

The directory /sw/emboss/DBlink contains :

Iligm -> /dbfb/imgt/ligm
Imhc -> /dbfb/imgt/mhc

Guy Bottu

From ableasby at hgmp.mrc.ac.uk Mon Dec 24 11:41:15 2001 From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk) Date: Mon, 24 Dec 2001 16:41:15 GMT Subject: EMBOSS 2.1.0 released Message-ID: <200112241641.QAA23771@bromine.hgmp.mrc.ac.uk>

EMBOSS 2.1.0, coming some 6 months after the previous release, now includes an alpha release of the client/server GUI called Jemboss, written at the HGMP by Tim Carver. The complete package is available for download from: http://www.uk.embnet.org/Software/EMBOSS

Several new applications are provided, including primer3. There has also been considerable work done in the transition towards standard report formats, and many applications now use these. Any alignment program can use the -aformat qualifier to choose from a variety of standard outputs (e.g. pair, markx0, markx1, srs). Reports for non-alignment programs similarly use the -rformat qualifier. All have sensible defaults. Reports will be further integrated throughout the EMBOSS version 2 distributions.
EMBOSS will work as usual without Jemboss, however if you wish to try using Jemboss (server or client) see: http://www.uk.embnet.org/Software/EMBOSS/Jemboss/download/setup.html Alan From smcmahan at facstaff.wisc.edu Sat Dec 1 00:41:40 2001 From: smcmahan at facstaff.wisc.edu (Scott McMahan) Date: Fri, 30 Nov 2001 18:41:40 -0600 Subject: Modifying existing programs Message-ID: <3C082744.1020802@facstaff.wisc.edu> I've modified pepstats.c (and necessary support files) to include the calculation of molar extinction coefficient at 280 and the expected A280 of a 1mg/ml solution. I've looked on the website, but couldn't find documentation about how to handle additions to existing applications. Could someone please point me in the right direction? -- Scott McMahan smcmahan at facstaff.wisc.edu
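For readers who have not met the calculation Scott McMahan mentions, here is a minimal sketch in plain C of a molar extinction coefficient at 280 nm and the corresponding A280 of a 1 mg/ml solution. It is not the code from the modified pepstats.c, which was not posted. The per-residue coefficients are an assumption (the commonly quoted Gill and von Hippel values), the function names are hypothetical, and the molecular weight is passed in rather than computed.

```c
#include <stddef.h>
#include <ctype.h>

/* ASSUMED per-residue molar extinction coefficients at 280 nm, in
 * M^-1 cm^-1 (commonly quoted Gill & von Hippel values). These are not
 * necessarily the values used in the modified pepstats.c. */
#define EXT_TRP     5690.0
#define EXT_TYR     1280.0
#define EXT_CYSTINE  120.0

/* Hypothetical helper: estimate the molar extinction coefficient of a
 * protein from its one-letter sequence. If cys_paired is non-zero, every
 * two cysteines are counted as one cystine (disulphide) and contribute;
 * otherwise cysteines are ignored. */
double molar_extinction_280(const char *seq, int cys_paired)
{
    size_t n_trp = 0, n_tyr = 0, n_cys = 0;
    const char *p;
    double ext;

    for (p = seq; *p != '\0'; ++p) {
        switch (toupper((unsigned char)*p)) {
        case 'W': ++n_trp; break;
        case 'Y': ++n_tyr; break;
        case 'C': ++n_cys; break;
        default:  break;   /* other residues do not absorb at 280 nm here */
        }
    }

    ext = (double)n_trp * EXT_TRP + (double)n_tyr * EXT_TYR;
    if (cys_paired)
        ext += (double)(n_cys / 2) * EXT_CYSTINE;
    return ext;
}

/* Hypothetical helper: expected absorbance at 280 nm of a 1 mg/ml
 * solution in a 1 cm cell. From A = ext * c * l with c = 1/MW mol/l,
 * this is simply ext / MW. Returns 0 for a non-positive weight. */
double a280_1mg_per_ml(double ext, double mol_weight)
{
    return (mol_weight > 0.0) ? ext / mol_weight : 0.0;
}
```

The molecular weight is taken as an argument because pepstats already reports it, so the sketch does not need its own table of residue masses.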
From ztu at msi.umn.edu Fri Dec 7 14:54:15 2001 From: ztu at msi.umn.edu (Zheng Jin Tu) Date: Fri, 7 Dec 2001 08:54:15 -0600 (CST) Subject: gcg and solaris 8 In-Reply-To: <1007736273.3c10d5d1aaead@www.biol.unlp.edu.ar> Message-ID: Hi Sarachu: The best place is asking Acclerys. The company has better idea what operating system should be. Email: Help at GCG.Com Thanks, Tu On Fri, 7 Dec 2001, Sarachu Martin wrote: > Hi, > > sorry for the off-topic but maybe you can help me. Do you know if GCG 9 does > run on a UltraSparc/Solaris 8 system? I installed GCG 9 on a Intel/Solaris 8 > system and got a "cannot execute exe file" error on several files. GCG doesn?t > run on a PC platform? > > Thanks, > > martin. > From mathog at mendel.bio.caltech.edu Wed Dec 12 18:44:25 2001 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Wed, 12 Dec 2001 10:44:25 -0800 Subject: quick questions Message-ID: 1.
Is this list archived in a searchable form somewhere?

2. What Ajax call or calls say if a command line switch was or wasn't present? For instance, at the moment when this

foo = AjGetInt("Somekey");

returns foo = 0 I can't tell if the user entered "-somekey=0" or just left it off the line.

3. What entries have to go in the makefile to result in an EMBOSS executable that gdb will debug? This is on Solaris 8. I tried using -g alone, but gdb didn't like the resulting executable. It would start it, but "bt" (backtrace) only showed binary addresses. GDB's exact message was:

This GDB was configured as "sparc-sun-solaris2.8"..."/usr/local/src/EMBOSS/embassy/ESIM4-1.0.0/source/esim4": not in executable format: File format not recognized

Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech

From peter.rice at uk.lionbioscience.com Thu Dec 13 10:09:35 2001 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 13 Dec 2001 10:09:35 +0000 Subject: quick questions References: Message-ID: <3C187E5F.9D94B7D3@uk.lionbioscience.com>

Hi David,

>2. what Ajax call or calls say if a command line switch was or wasn't
>present?

None. Values can be set on the command line, or by dependence on other values, or just default.

Why would you like to know what was on the command line? It could be tricky for GUI interfaces if they deliberately put everything on the command line, default values and all.

>3. What entries have to go in the makefile to result in an EMBOSS
>executable that gdb will debug?

None. Just run:

./configure --enable-debug

before you make.

regards, Peter Rice -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723

From kkmattil at csc.fi Thu Dec 13 13:08:52 2001 From: kkmattil at csc.fi (Kimmo Mattila) Date: Thu, 13 Dec 2001 15:08:52 +0200 (EET) Subject: Problems with fuzzpro and ehmmer Message-ID: Dear EMBOSS people.
I have had a few problems with fuzzpro, patmatdb and ehmmer. If any of you have suggestions on how to solve them, please tell me.

FUZZPRO and PATMATDB

I am using fuzzpro and patmatdb with GCG formatted databases. If I run a search against a whole database (e.g. swiss:*), the programs do find the right hit sequences, but pick the wrong names for the found entries. With plain sequence files or with sequence name lists, this error does not occur. I have checked both the EMBOSS indexing and the GCG database files and they should be OK. Other EMBOSS and GCG applications give correct results when the same database files are used. Has someone else had similar troubles? If the indexing of the databases is in order, what might cause this?

EHMMER

We have successfully installed EMBOSS-HMMER; however, unlike the native HMMER, the EMBOSS version is not able to use multiple processors (even though the -cpu option is mentioned in the help data). When I compared the Makefile of EMBOSS-HMMER to the native one, I noticed that the EMBOSS version lacks the settings for compiling the multiprocessor version of HMMER. Has someone managed to circumvent this with some simple trick, like copying some parts of the original HMMER Makefile to the Makefile of the EMBOSS version?

Secondly, when I use ehmmsearch, long output files are not complete. After about 200 lines ehmmsearch starts writing the output to the screen instead of the output file. The last line in the output file seems to be

Domain top hits:

and after this the alignments are printed to the screen. What might cause this? Is there, for example, some limit on the output file size?

Regards, Kimmo Mattila --------------------------------------------------------------- Kimmo Mattila Science Support kimmo.mattila at csc.fi Center for Scientific Computing tel. +358 (0)9 457 2708 Tekniikantie 15a D, PL 405 fax.
+358 (0)9 457 2302 FIN-02101 Espoo, Finland ---------------------------------------------------------------

From mathog at mendel.bio.caltech.edu Thu Dec 13 16:04:30 2001 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Thu, 13 Dec 2001 08:04:30 -0800 Subject: quick questions Message-ID:

> >2. what Ajax call or calls say if a command line switch was or wasn't
> >present?
>
> None. Values can be set on the command line, or by dependence on other
> values, or just default.
>
> Why would you like to know what was on the command line? It could be tricky
> for GUI interfaces if they deliberately put everything on the command line,
> default values and all.

Consider an optional integer parameter "foobar" for which 0 is a valid value and also where, if foobar is not specified, it is calculated based on the input sequences. That is, it does not have a fixed default value. I see no way to distinguish between "calculate value" and "use this value" when AjAcdGetInt returns 0.

The workaround would be to set the default in the .acd file to a magic default value, say -1000000, which is out of range for the desired variable, and interpret that value as "not specified". There are three problems with this approach:

1. There may be cases for which there are no magic values available.
2. In w2h the default value shows up filled in on the Web interface. So the user sees -1000000 and wonders what the heck that means, or thinks that -900000 might also be valid.
3. It requires that range checking be disabled or special cased.

I guess I'll have a look at the code for AjAcdGetInt and see if it's possible to modify that into AjAcdItemExists, returning a boolean T/F for when the item has been specified.
Then the code would be (more or less like on GCG)

if(AjAcdItemExists("foobar")){
  ifoobar=AjAcdGetInt("foobar");
}
else {
  ifoobar=calculated_value();
}

Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech

From peter.rice at uk.lionbioscience.com Thu Dec 13 16:21:21 2001 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 13 Dec 2001 16:21:21 +0000 Subject: quick questions References: Message-ID: <3C18D581.7A22FDB8@uk.lionbioscience.com>

David Mathog wrote:
>
> Consider an optional integer parameter "foobar" for which 0 is a valid
> value and also where if foobar is not specified, it is calculated based
> on the input sequences.
>
> I guess I'll have a look at the code for AjAcdGetInt and see if it's
> possible to modify that into AjAcdItemExists, returning a boolean
> T/F for when the item has been specified. Then the code would be
> (more or less like on GCG)
>
> if(AjAcdItemExists("foobar")){
>   ifoobar=AjAcdGetInt("foobar");
> }
> else {
>   ifoobar=calculated_value();
> }

Calculated values are intended to be calculated in the ACD file. Interfaces such as W2H should be able to do this in JavaScript, though in some cases they have to simply treat values as integers.

Try this ACD file. Save it as 'foobar.acd' and run as 'acdc foobar'. It will prompt for a sequence, then prompt for foobar with the sequence length as default but will accept any value from 0 to the sequence length. The 'echo' string is defined so you can see the value of foobar in the prompt.

The default value can be calculated in more exotic ways too ... see the @() functions and the other calculated attributes. More can be easily added.
====================
appl: foobar [
  documentation: "ACD example"
  groups: "test"
]

sequence: sequence [
  required: "Y"
]

integer: foobar [
  required: "Y"
  default: "$(sequence.len)"
  minimum: "0"
  maximum: "$(sequence.len)"
]

string: echo [
  prompt: "Foobar is $(foobar)"
  required: "Y"
]
===================

There are many other ways to set options. You could set a boolean to calculate a value, and another value to define the calculation.

Testing the command line will have real problems for your original idea, because an interface might be writing every option, with what it considers the default value, on the command line.

-- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723

From mathog at mendel.bio.caltech.edu Thu Dec 13 17:33:48 2001 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Thu, 13 Dec 2001 09:33:48 -0800 Subject: quick questions Message-ID:

> Calculated values are intended to be calculated in the ACD file. Interfaces
> such as W2H should be able to do this in JavaScript, though in some cases
> they have to simply treat values as integers.

The default value in this case is the end result of at least a hundred lines of C code.

>
> Testing the command line will have real problems for your original idea,
> because an interface might be writing every option, with what it considers
> the default value, on the command line.
>

That's a good point. W2H isn't like that, but some other interface might be. I guess it won't hurt to add a couple of extra booleans to cover those variables whose default is difficult to calculate prior to the program running.
David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From mathog at mendel.bio.caltech.edu Thu Dec 13 19:01:43 2001 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Thu, 13 Dec 2001 11:01:43 -0800 Subject: quick questions Message-ID: Hmm, after going up and down through the ACD notation I can't find what I'm looking for there either. Consider this notation: bool: usermspA [ opt: Y def: N info: "False: esim4 calculates mspA, True: mspA from command line." ] int: mspA [ opt: $(usermspA) req: $(usermspA) def: 16 info: "long description. default of 16 is not used unless usermspA is specified.." ] If the command is issued with -usermspA then it will prompt for -mspA if it wasn't also specified, which gives the desired results. However, if the command has only this on the command line. -mspA=16 it clearly means that the user really wants to use the value of 16 for the parameter. How then to switch the state on -usermspA automatically, or failing that, prompt for -usermspA? 16 happens to be the default value. It wasn't set to an illegal (magic) value because we don't want -1000000 showing up in a GUI. But it isn't normally used because -usermspA will be false. As before, we could use a sort of magic number and do: bool: usermspA [ opt: Y def: @($(mspa)!=16) info: "False: esim4 calculates mspA, True: mspA from command line." ] and it will correctly flip the bit when the user specifies it - except when by bad luck they choose to specify the default value. And round and round the logic goes. I don't suppose that there is a ".specified" or ".online" attribute in ACD? Ie, this would do the job: bool: usermspA [ opt: Y def: $(mspa.online) info: "False: esim4 calculates mspA, True: mspA from command line." ] The desired GUI interaction in that case could be one of: 1. changing value in mspA toggles state of usermspA (messy) 2. 
-mspA slot is grayed out unless -usermspA is set (simpler) In some interfaces this could be covered over with Javascript - but the command line variant still wouldn't work exactly right. Or am I missing something? Summary: works: command works: command -usermspA -mspA 16 works (prompts for mspA): command -usermspA fails to prompt or override usermspA: command -mspA 16 Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From peter.rice at uk.lionbioscience.com Fri Dec 14 09:41:17 2001 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Fri, 14 Dec 2001 09:41:17 +0000 Subject: quick questions References: Message-ID: <3C19C93D.A3FD713F@uk.lionbioscience.com> David Mathog wrote: > > Hmm, after going up and down through the ACD notation I can't find > what I'm looking for there either. > > The desired GUI interaction in that case could be one of: > > 1. changing value in mspA toggles state of usermspA (messy) This means 'mspA depends on usermspA' and 'usermspA depends on mspA'. ACD expressly forbids this. All dependencies must be to something defined earlier in the file. > 2. -mspA slot is grayed out unless -usermspA is set (simpler) Could be done with an extra ACD attribute, with a value of "$(usermspA)", but you would expect most GUIs to ignore this. In general, you can expect to have options in EMBOSS that are not used by the program but can still be set on the command line. Your -mspA is just another case. Having said that, adding an ACD function (you would only need one) to test whether a value was set by the user is fairly trivial (setting via the command line or by replying to a prompt if there is one). 
-- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723

From charles at moulinette.dyndns.org Tue Dec 18 08:56:25 2001 From: charles at moulinette.dyndns.org (Charles Plessy) Date: Tue, 18 Dec 2001 09:56:25 +0100 Subject: phylogenic analysis with emboss Message-ID: <20011218085625.GB803@gizmotronics.dyndns.org>

Hi,

I was wondering which tools you used for phylogenic analysis, since I can't find any tree drawing in either emboss or embassy's phylip.

Charles

From gbottu at ben.vub.ac.be Tue Dec 18 09:34:43 2001 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Tue, 18 Dec 2001 10:34:43 +0100 (MET) Subject: phylogenic analysis with emboss Message-ID: <200112180934.KAA04579@bigben.vub.ac.be>

from : BEN

>I was wondering which tools you used for phylogenic analysis, since I can't
>find any treedrawing in either emboss or embassy's phylip.

If I am not wrong, the EMBOSS on-line help states explicitly that the tree drawing programs and the tree editors of PHYLIP were not included in the embassy PHYLIP. So, you should retrieve the original PHYLIP package from evolution.genetics.washington.edu and use the programs drawgram and drawtree. Note that while embassy has integrated PHYLIP version 3.53c, there is now a version 3.6a2, which is definitely better. drawgram and drawtree now have an X display for previewing the graphic, and the generated PostScript files can not only be sent directly to a printer, but can also be incorporated into documents like MS-Word doc files.

Another useful freeware tool I know about is NJplot, which is distributed together with CLUSTAL (which you must install anyway for emma to work).

Guy Bottu

From letondal at pasteur.fr Wed Dec 19 14:03:18 2001 From: letondal at pasteur.fr (Catherine Letondal) Date: Wed, 19 Dec 2001 15:03:18 +0100 Subject: how to cite EMBOSS?
Message-ID: <200112191403.fBJE3IW452649@electre.pasteur.fr> Hi, Sorry if this is an FAQ, but I was not able to find any reference in EMBOSS documentation and Web site (apart from the original algorithms of course). Is there any reference for the EMBOSS project? Thanks a lot, -- Catherine Letondal -- Pasteur Institute Computing Center From gwilliam at hgmp.mrc.ac.uk Wed Dec 19 14:05:49 2001 From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522) Date: Wed, 19 Dec 2001 14:05:49 +0000 Subject: how to cite EMBOSS? References: <200112191403.fBJE3IW452649@electre.pasteur.fr> Message-ID: <3C209EBD.763B54F7@hgmp.mrc.ac.uk> See the FAQ file: Q) Is there a reference I can cite for EMBOSS? A) Rice,P. Longden,I. and Bleasby,A. "EMBOSS: The European Molecular Biology Open Software Suite" Trends in Genetics June 2000, vol 16, No 6. pp.276-277 You are right - it should be in a more obvious place. Gary Catherine Letondal wrote: > > Hi, > > Sorry if this is an FAQ, but I was not able to find any reference > in EMBOSS documentation and Web site (apart from the original > algorithms of course). Is there any reference for the EMBOSS project? > > Thanks a lot, > > -- > Catherine Letondal -- Pasteur Institute Computing Center -- Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512 mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/ Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK From letondal at pasteur.fr Wed Dec 19 14:10:44 2001 From: letondal at pasteur.fr (Catherine Letondal) Date: Wed, 19 Dec 2001 15:10:44 +0100 Subject: how to cite EMBOSS? In-Reply-To: Your message of "Wed, 19 Dec 2001 14:05:49 GMT." <3C209EBD.763B54F7@hgmp.mrc.ac.uk> Message-ID: <200112191410.fBJEAiW438064@electre.pasteur.fr> "Gary Williams, Tel 01223 494522" wrote: > > See the FAQ file: > > Q) Is there a reference I can cite for EMBOSS? > > A) Rice,P. Longden,I. and Bleasby,A. 
> "EMBOSS: The European Molecular Biology Open Software Suite" > Trends in Genetics June 2000, vol 16, No 6. pp.276-277 > > You are right - it should be in a more obvious place. Thanks - yes, maybe in the http://www.uk.embnet.org/Software/EMBOSS/general.html page? > > Gary > > Catherine Letondal wrote: > > > > Hi, > > > > Sorry if this is an FAQ, but I was not able to find any reference > > in EMBOSS documentation and Web site (apart from the original > > algorithms of course). Is there any reference for the EMBOSS project? > > > > Thanks a lot, > > > > -- > > Catherine Letondal -- Pasteur Institute Computing Center > > -- > Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512 > mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/ > Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK -- Catherine Letondal -- Pasteur Institute Computing Center From gwilliam at hgmp.mrc.ac.uk Wed Dec 19 14:21:09 2001 From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522) Date: Wed, 19 Dec 2001 14:21:09 +0000 Subject: how to cite EMBOSS? References: <200112191410.fBJEAiW438064@electre.pasteur.fr> Message-ID: <3C20A255.80D77C57@hgmp.mrc.ac.uk> Catherine Letondal wrote: > > "Gary Williams, Tel 01223 494522" wrote: > > > > See the FAQ file: > > > > Q) Is there a reference I can cite for EMBOSS? > > > > A) Rice,P. Longden,I. and Bleasby,A. > > "EMBOSS: The European Molecular Biology Open Software Suite" > > Trends in Genetics June 2000, vol 16, No 6. pp.276-277 > > > > You are right - it should be in a more obvious place. > > Thanks - yes, maybe in the http://www.uk.embnet.org/Software/EMBOSS/general.html page? Done. 
Gary -- Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512 mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/ Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK From simon.andrews at bbsrc.ac.uk Wed Dec 19 14:55:43 2001 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Wed, 19 Dec 2001 14:55:43 -0000 Subject: Farm files for databases Message-ID: <2DC41140A89ED411989D00508BDCD9EDEA51EC@bi-exsrv1.iapc.bbsrc.ac.uk> I'm trying to find the best way to do the following: I have an application which returns an identifier (effectively an accession number), which could be present in any one of 4 separate EMBOSS databases. I'd like to be able to search all of these databases and retrieve the sequence from whichever one finds it (I know that the identifiers are unique between the different databases - so I'll only ever find one entry). Having read the EMBOSS documentation the only reference I could find for doing this sort of thing was to make a database entry with an "EXTERNAL" format, and then have seqret query a script to return the sequence. However the details for this are pretty sketchy. What exactly would a script of this type have to do? What input is it supplied with (and how), and what must it return? Is this the only (or best) way to do what I'm trying to do? Any help is much appreciated. TTFN Simon. ---- Simon Andrews PhD Bioinformatics Dept The Babraham Institute simon.andrews at bbsrc.ac.uk +44 (0)1223 496463 From peter.rice at uk.lionbioscience.com Wed Dec 19 15:16:46 2001 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Wed, 19 Dec 2001 15:16:46 +0000 Subject: Farm files for databases References: <2DC41140A89ED411989D00508BDCD9EDEA51EC@bi-exsrv1.iapc.bbsrc.ac.uk> Message-ID: <3C20AF5E.F905F683@uk.lionbioscience.com> "simon andrews (BI)" wrote: > > I have an application which returns an identifier (effectively an > accession number), which could be present in any one of 4 separate > EMBOSS databases. 
> I'd like to be able to search all of these databases and retrieve the
> sequence from whichever one finds it (I know that the identifiers are
> unique between the different databases - so I'll only ever find one
> entry).

This sounds like a job for SRS, although the query could be complicated
if there is a possibility of getting more than one copy returned.

A script is a good solution. The script should read the dbname:id query
from the commandline, and return the sequence in some specified format.
What the script does is up to you. If there is no sequence found, it can
simply return nothing.

The original 'external' applications were the 'efetch' utility in acedb,
and GCG's 'typedata'.

Peter

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From jason at cgt.mc.duke.edu Wed Dec 19 21:01:23 2001
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Wed, 19 Dec 2001 16:01:23 -0500 (EST)
Subject: alignment sequence reading with stop codons (bug?)
Message-ID:

I noticed this in playing with our new bioperl wrappers for EMBOSS.
Apparently -seqall does not read sequences with stop codons.
I can submit as a bug if that is more appropriate. Getting warmed up to
the EMBOSS dev process.

This occurs with both EMBOSS-1.9.1 and CVS code I checked out today
(2.0.1 I guess).

The work around is of course to specify the arguments in the correct way
or replace the stop codon with something like X. I know which sequence
will have potential stop codons so I can work around this in my own
code.

[jason at gordola crypto_intergenic]$ cat jason.seq
>SW-CC27_YEAST SW:CC27_YEAST P38042 saccharomyces cerevisiae (baker's yeast). cell division control protein 27.
10/2001; PIR:S45825 cell division control protein CDC27 - yeast (Saccharomyces cerevisia
MAVNPELAPFTLSRGIPSFDDQALSTIIQLQDCIQQAIQQLNYSTAEFLAELLYAECSIL
DKSSVYWSDAVYLYALSLFLNKSYHTAFQISKEFKEYHLGIAYIFGRCALQLSQGVNEAI
LTLLSIINVFSSNSSNTRINMVLNSNLVHIPDLATLNCLLGNLYMKLDHSKEGAFYHSEA
LAINPYLWESYEAICKMRATVDLKRVFFDIAGKKSNSHNNNAASSFPSTSLSHFEPRSQP
SLYSKTNKNGNNNINNNVNTLFQSSNSPPSTSASSFSSIQHFSRSQQQQANTSIRTCQNK
NTQTPKNPAINSKTSSALPNNISMNLVSPSSKQPTISSLAKVYNRNKLLTTPPSKLLNND
RNHQNNNNNNNNNNNNNNNNNNNNNNNNIINKTTFKTPRNLYSSTGRLTTSKKNPRSLII
SNSILTSDYQITLPEIMYNFALILRSSSQYNSFKAIRLFESQIPSHIKDTMPWCLVQLGK
LHFEIINYDMSLKYFNRLKDLQPARVKDMEIFSTLLWHLHDKVKSSNLANGLMDTMPNKP
ETWCCIGNLLSLQKDHDAAIKAFEKATQLDPNFAYAYTLQGHEHSSNDSSDSAKTCYRKA
LACDPQHYNAYYGLGTSAMKLGQYEEALLYFEKARSINPVNVVLICCCGGSLEKLGYKEK
ALQYYELACHLQPTSSLSKYKMGQLLYSMTRYNVALQTFEELVKLVPDDATAHYLLGQTY
RIVGRKKDAIKELTVAMNLDPKGNQVIIDELQKCHMQE
[jason at gordola crypto_intergenic]$ cat prot.seq
>Contig5745
CLIF*RLLLIQMIHPQARRAFTFLQQQEPYRIQSMEQLSTLLWHLADLPALSHLSQSLIS
ISRSSPQAWIAVGNCFSLQKDHDEAMRCFRRATQVDEGCAYAWTLCGYEAVEMEEYERAM
AFYRTAIRTDARHYNAWYVLFFFFFFFFVPGDIDS*PKKGMEWG*FISKRIDRGMRSIIL
KEPSKSIQLIPFFYVALVW*VGVSSYPLETMTNIDFPKKKKALEKSNDVVQALHFYERAS
KYAPTSAMVQFKRIRALVALQRYDEAISALVPLTHSAPDEANVFFLLGKCLLKKERRQEA
TMAFTNARELEPK
[jason at gordola crypto_intergenic]$ water jason.seq prot.seq
Smith-Waterman local alignment.
An error has been found: Sequence Contig5745 must be protein sequence, found bad character '*'
An error has been found: option -seqall: Unable to read sequence 'prot.seq'
There is a serious problem: water terminated: Bad value for option and no prompt
[jason at gordola crypto_intergenic]$ water prot.seq jason.seq
Smith-Waterman local alignment.
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output file [contig5745.water]:

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From bauer at genprofile.com Thu Dec 20 07:02:56 2001
From: bauer at genprofile.com (David Bauer)
Date: Thu, 20 Dec 2001 08:02:56 +0100
Subject: alignment sequence reading with stop codons (bug?)
References:
Message-ID: <3C218D20.D752C59@genprofile.com>

Hi,

the protein alignment programs don't like the '*' in your protein
sequences. They are designed to align true proteins which usually do not
contain stop codons.
If these are putative ORFs, a solution would be to split them up at the
stops, creating a separate protein sequence for each ORF.

I also guess you are misinterpreting the -seqall. This means to return
all sequences from a file containing more than one sequence (like a
fasta formatted file with several sequences separated by their
description lines). For me the -seqall option does not make much sense
in the case of alignment programs which need exactly 2 sequences to
align.
There you must always pass the two sequence files which you want to align
as arguments to the alignment program and each file must contain exactly
one sequence.

I hope this helps,

David Bauer.

Jason Stajich wrote:
>
> I noticed this in playing with our new bioperl wrappers for EMBOSS.
> Apparently -seqall does not read sequences with stop codons.
> I can submit as a bug if that is more appropriate. Getting warmed up to
> the EMBOSS dev process.

From simon.andrews at bbsrc.ac.uk Thu Dec 20 09:21:43 2001
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 20 Dec 2001 09:21:43 -0000
Subject: Bug in entret.
Message-ID: <2DC41140A89ED411989D00508BDCD9EDEA51F1@bi-exsrv1.iapc.bbsrc.ac.uk>

Following on from my query yesterday, I have hit a problem trying to
implement a multiple search because of what appears to be a bug in
entret.

I am using a series of fasta flat files, indexed with dbifasta.
What I am finding is that although I can retrieve entries from the
database with seqret, using entret always returns an empty file with the
same accession number:

############
%> entret htg_mus:AC092094_v6_c8
Reads and writes (returns) flatfile entries
Output file [ac092094_v6_c8.entret]:
%> more ac092094_v6_c8.entret
%> seqret htg_mus:AC092094_v6_c8
Reads and writes (returns) sequences
Output sequence [ac092094_v6_c8.fasta]:
%> more ac092094_v6_c8.fasta
>AC092094_v6_c8 Mus musculus clone RP23-261m19, WORKING DRAFT SEQUENCE, 8 unordered pieces.
CAGGACAGCCAGGGCTACACAGAGAAACCCTGTCTCAAAAAACAAAAAAACAAAAAAAAA
ACAAAAGAAGAAGAAAATGTCTGTGAATACCCTGGAAAAGTTACTCAGTGAAAGTAGATG
AGTCCCTGAGTCAGTGACAGGAAGTGAGTGCAGTCTGAGCACTGGCTTGTGACCAATGAC
AAAAACATAAGCTAGACTTGCTCTGCAAAGTGGAGGACAGAACAGACAAAGCCCCAGAGT
etc. etc.
############

entret doesn't produce any errors, but if I run it with the -debug
option I see the following lines in entret.dbg

############
Initializing seqInFormat, 40 formats
ajSeqRead: input file '/data/MOUSE/HTG/htg_mus.fasta' still there, try again
seqRead: single access - count 1 - call access routine again
seqAccessEmblcd type 1
query data all finished
seqRead: seqin->Query->Access->Access(seqin) *failed*
ajSeqallNext failed
closing file 'ac092094_v6_c4.entret'
############

I've checked, and the /data/MOUSE/HTG/htg_mus.fasta file is definitely
there, and is readable, so I suspect that something in the EMBOSS
internals is going wrong.

This is using EMBOSS 2.0.0.

Is this a known bug? Is there a fix on the way? I can bluff the script
using seqret in this case, but I'd like to make a more general solution
eventually.

Cheers

Simon.
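
[Editorial sketch, not part of the original thread.] The fallback Simon
mentions (use seqret when entret silently returns nothing) could be
sketched roughly as below. This is only an illustration under stated
assumptions: entret and seqret are on the PATH, both accept the EMBOSS
general qualifier -auto to suppress prompts, and entret accepts
"-outfile stdout" to write to standard output. The helper names
run_command and fetch_entry are hypothetical. Note that seqret returns
the sequence in FASTA format, not the raw flatfile entry, so the two
outputs are only equivalent for FASTA flat files like the ones described
here.

```python
import subprocess
from typing import Callable, List


def run_command(argv: List[str]) -> str:
    """Run a command and return its standard output ('' if it cannot be started)."""
    try:
        completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return completed.stdout


def fetch_entry(usa: str, run: Callable[[List[str]], str] = run_command) -> str:
    """Return the entry for a 'dbname:id' query, trying entret first, then seqret."""
    attempts = [
        # Assumed invocation: write the flatfile entry to standard output.
        ["entret", usa, "-outfile", "stdout", "-auto"],
        # Assumed invocation: FASTA on standard output via the fasta::stdout USA.
        ["seqret", usa, "fasta::stdout", "-auto"],
    ]
    for argv in attempts:
        output = run(argv)
        if output.strip():  # entret may exit quietly with an empty result
            return output
    return ""  # nothing found by either program
```

The run parameter exists only so the logic can be exercised without
EMBOSS installed; in normal use fetch_entry("htg_mus:AC092094_v6_c8")
would call the real programs.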
From simon.andrews at bbsrc.ac.uk Thu Dec 20 12:24:18 2001
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 20 Dec 2001 12:24:18 -0000
Subject: Farm files for databases
Message-ID: <2DC41140A89ED411989D00508BDCD9EDEA51F3@bi-exsrv1.iapc.bbsrc.ac.uk>

After getting some useful info from Peter Rice about how to create a
database farm in EMBOSS I thought I'd share the script I'm now using to
do this.

To use this simply copy and paste the text of the script at the bottom
of this message to a file on your system, then make sure that this file
is readable and executable by everyone (chmod 755 filename). The
comments in the script tell you what changes you need to make to the
script itself, and the format of the entry you need to create in
emboss.default.

Because of the bug I previously reported in entret, this script will not
work from an entret query to the farm. It will work with seqret (and
will output any format you like), and can also be used as part of a USA
for any of the standard EMBOSS programs.

The script requires a unix-like OS, but could trivially be adapted to
run under Win32 if anyone is running EMBOSS under windows.

TTFN

Simon.

------ Script Starts Here -- Beware of long lines wrapping ----------------------

#!/usr/bin/perl -w
use strict;

# EMBOSS farm file script
#
# Written by Simon Andrews
# simon.andrews at bbsrc.ac.uk
# Dec 2001
#
# This script allows you to set up a farm
# of EMBOSS databases which can be queried
# by a single instance of seqret. The
# program must be accompanied by an entry
# in emboss.default which looks like this:
#
# DB name_of_database [
#   type: N (or P if we're dealing with proteins)
#   method: app
#   format: fasta
#   app: "/path/to/this/script"
#   comment: "Whatever text you'd like to see in showdb" ]
#
# First we need to set a few preferences
#
# What is the full path to seqret?
# If you are sure that seqret will always
# be somewhere in your path, then you can
# just leave this as 'seqret'.

my $seqret_path = 'seqret';

# Now we need to know the names of the
# databases you'd like included in the
# search. These must be databases which
# have already been indexed, and installed
# correctly into emboss.default. Simply
# enter the database names between the
# brackets, separated by spaces.

my @databases = qw(dbase1 dbase2 dbase3);

##### End of bits which need to be edited #########

my ($reference) = @ARGV;

if ($reference =~ /:(.+)$/){
  $reference = $1;
}
else {
  die "\n*** FARM ERROR *** Couldn't get accession after : from $reference\n\n";
}

foreach my $database (@databases){
  my $sequence = `$seqret_path $database:$reference fasta::stdout 2>/dev/null`;
  if ($sequence){
    print $sequence;
    exit;
  }
}

warn "\n*** FARM ERROR *** Couldn't find $reference in any of '@databases'\n\n";

From lukem at bioinfo.pbi.nrc.ca Thu Dec 20 15:10:19 2001
From: lukem at bioinfo.pbi.nrc.ca (Luke McCarthy)
Date: Thu, 20 Dec 2001 09:10:19 -0600
Subject: alignment sequence reading with stop codons (bug?)
References: <3C218D20.D752C59@genprofile.com>
Message-ID: <3C21FF5B.2C4251BF@bioinfo.pbi.nrc.ca>

David Bauer wrote:
>
> I also guess you are misinterpreting the -seqall. This means to return
> all sequences from a file containing more than one sequence (like a
> fasta formatted file with several sequences separated by their
> description lines). For me the -seqall option does not make much sense
> in the case of alignment programs which need exactly 2 sequences to
> align.

Nevertheless, the acd files for water and needle clearly state that the
second argument is a parameter of type seqall. Which makes perfect sense
if one wants to align a probe sequence against a database of others (a
la BLAST)

Cheers,
Luke

From jason at cgt.mc.duke.edu Thu Dec 20 15:12:48 2001
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Thu, 20 Dec 2001 10:12:48 -0500 (EST)
Subject: alignment sequence reading with stop codons (bug?)
In-Reply-To: <3C218D20.D752C59@genprofile.com>
Message-ID:

On Thu, 20 Dec 2001, David Bauer wrote:

> Hi,
>
> the protein alignment programs don't like the '*' in your protein
> sequences. They are designed to align true proteins which usually do not
> contain stop codons.
> If these are putative ORFs, a solution would be to split them up at the
> stops, creating a separate protein sequence for each ORF.
>

Re-aligning blastx hsps in some distant fungi so am hitting pseudogenes
or sequencing errors, hence the stop codons.

What is confusing to me wrt the actual alignment programs, is if they
don't like stop codons at all, they still allow an alignment when the
sequence containing the stop codon is the query (-sequencea) but not
when the sequence is in the subject db - ie the behavior in my previous
msg.

I may just recode the stop codons as an unknown aa to achieve what I
need for the alignment.

I realize it is silly to try and align these proteins with stop codons
but I am looking for conserved regions for degenerate PCR primer
picking.

[jason at gordola crypto_intergenic]$ head -6 contig5745.water
Local: Contig5745 vs SW-CC27_YEAST
Score: 367.50

Contig5745       1 CLIF*RLLLIQMI.HPQARRAFTFLQQQEPYRIQSMEQLSTLLWH 44
                   ||:    |  ::| :  : : |  |:  :| |:: ||  ||||||
SW-CC27_YEAST  474 CLVQLGKLHFEIINYDMSLKYFNRLKDLQPARVKDMEIFSTLLWH 518

> I also guess you are misinterpreting the -seqall. This means to return
> all sequences from a file containing more than one sequence (like a
> fasta formatted file with several sequences separated by their
> description lines). For me the -seqall option does not make much sense
> in the case of alignment programs which need exactly 2 sequences to
> align.
> There you must always pass the two sequence files which you want to align
> as arguments to the alignment program and each file must contain exactly
> one sequence.
>

In the alignment program context -seqall is the name of the db to search
the query (-sequencea) against - so one will get an alignment of the
first sequence against the whole db of sequences. I am only interested
in 1 pairwise comparison so the order of the sequences didn't really
matter to me.

We have a SW alignment module in bioperl (written in C - before you gag)
for protein alignments but was trying out our new EMBOSS wrappers in
bioperl, hence the reported issue.

> I hope this helps,
>
> David Bauer.
>
>
> Jason Stajich wrote:
> >
> > I noticed this in playing with our new bioperl wrappers for EMBOSS.
> > Apparently -seqall does not read sequences with stop codons.
> > I can submit as a bug if that is more appropriate. Getting warmed up to
> > the EMBOSS dev process.
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From gbottu at ben.vub.ac.be Thu Dec 20 17:25:52 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Thu, 20 Dec 2001 18:25:52 +0100 (MET)
Subject: Farm files for databases, using SRS
Message-ID: <200112201725.SAA11576@bigben.vub.ac.be>

from : BEN

You can also, like Peter suggested, use SRS. For example, I wanted to
access the databanks IMGT/LIGM and IMGT/MHC as one databank with name
imgt and shortname im. I use SRS for retrieving one or several
sequences, optionally with their documentation, and direct access to a
databank for a full search (faster ?).
I wrote :

in .../emboss/share/EMBOSS/emboss.default :

DB imgt [
   type: N
   comment: 'Immunogenetics Databases'
   methodquery: srs
   dbalias: IMGT
   formatquery: embl
   methodall: direct
   dir: /sw/emboss/DBlink
   file: 'I*'
   formatall: fasta
]

DB im [
   type: N
   comment: 'Immunogenetics Databases'
   methodquery: srs
   dbalias: IMGT
   formatquery: embl
   methodall: direct
   dir: /sw/emboss/DBlink
   file: 'I*'
   formatall: fasta
]

and in .../srs/icarus/site/site.i (hidden so that it does not show up in
the WWW page of SRS) :

$imgt_db=$Library:[IMGT
   format:$EMBL_FORMAT
   virtualInfo:$LibVirtual:[
      memberLibs:{$IMGT_DB $MHC_DB}
   ]
   type:hidden
]

The directory /sw/emboss/DBlink contains :

Iligm -> /dbfb/imgt/ligm
Imhc -> /dbfb/imgt/mhc

Guy Bottu

From ableasby at hgmp.mrc.ac.uk Mon Dec 24 16:41:15 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Mon, 24 Dec 2001 16:41:15 GMT
Subject: EMBOSS 2.1.0 released
Message-ID: <200112241641.QAA23771@bromine.hgmp.mrc.ac.uk>

EMBOSS 2.1.0, coming some 6 months after the previous release, now
includes an alpha release of the client/server GUI called Jemboss,
written at the HGMP by Tim Carver.

The complete package is available for download from:
http://www.uk.embnet.org/Software/EMBOSS

Several new applications are provided including primer3. There has also
been considerable work done in the transition towards standard report
formats and many applications now use these. Any alignment program can
use the -aformat qualifier to choose a variety of standard outputs (e.g.
pair, markx0, markx1, srs). Reports for non-alignment programs similarly
use the -rformat qualifier. All have sensible defaults. Reports will be
further integrated throughout the EMBOSS version 2 distributions.

EMBOSS will work as usual without Jemboss; however, if you wish to try
using Jemboss (server or client) see:
http://www.uk.embnet.org/Software/EMBOSS/Jemboss/download/setup.html

Alan
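
[Editorial sketch, not part of the original announcement.] As an
illustration of the -aformat qualifier described above, the following
builds a water command line that requests one of the alignment formats
named in the announcement. It is a sketch under assumptions: the two
sequences are given positionally (as in the "water prot.seq jason.seq"
example earlier in this archive), and -outfile and -auto are assumed to
behave as the usual EMBOSS output and no-prompt qualifiers. The function
name water_command is hypothetical, and the format list covers only the
names quoted in the announcement; other formats may exist.

```python
from typing import List

# Only the formats quoted in the 2.1.0 announcement; not an exhaustive list.
ANNOUNCED_AFORMATS = ("pair", "markx0", "markx1", "srs")


def water_command(query: str, subject: str, outfile: str,
                  aformat: str = "pair") -> List[str]:
    """Build an argument list for water using a chosen alignment format."""
    if aformat not in ANNOUNCED_AFORMATS:
        raise ValueError(f"unrecognised alignment format: {aformat!r}")
    return [
        "water", query, subject,   # query first, then the sequence(s) to search
        "-outfile", outfile,       # assumed qualifier for the report file
        "-aformat", aformat,       # qualifier named in the announcement
        "-auto",                   # assumed: accept defaults without prompting
    ]


if __name__ == "__main__":
    # Prints the command; pass the list to subprocess.run to execute it.
    print(" ".join(water_command("prot.seq", "jason.seq", "out.water", "markx0")))
```

Building the argument list separately from running it keeps the choice
of format easy to check before anything is executed.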