From starksb at ebi.ac.uk Fri Nov 1 04:45:29 2002
From: starksb at ebi.ac.uk (David Starks-Browning)
Date: Fri, 1 Nov 2002 09:45:29 +0000
Subject: emboss in cygwin
In-Reply-To: <3DC16BAA.1050201@bigfoot.com>
References: <3DC16BAA.1050201@bigfoot.com>
Message-ID: <4429-Fri01Nov2002094530+0000-starksb@ebi.ac.uk>

On Thursday 31 Oct 02, clwu writes:
> Hi, group,
> I am new to group. I tried to compile EMBOSS under
> win2K/cygwin but I failed. EMBOSS website at HGMP mentioned that
> "Richard Bruskiewich and Simon Kelley at the Sanger Centre have
> succeeded in compiling EMBOSS under Windows NT using the CygWin package.
> The resulting executables have been tested but not thoroughly enough for
> a release. Contact Richard Bruskiewich for more information. ". But I
> can not follow the link in this page to get help.
> Does anyone have the successful experience on this?

I just built EMBOSS-2.5.1 on Win98 using the latest Cygwin downloaded
from . There is no libgd.[a|dll] so no PNG support. But everything
else appeared to build fine. I've not tested the applications though.

Note that you will need much more from Cygwin's setup.exe than is
installed by default.

If you provide details about what failed, I may be able to help you.
Feel free to respond off-list, as a Cygwin build may not be
interesting to the rest of the emboss list. We can always summarise
to the emboss list once we get it sorted, if there is interest.

Regards,
David
(Cygwin FAQ maintainer)
-------------------------------------------------------------------
David Starks-Browning                 | starksb at ebi.ac.uk
EMBL Outstation --                    |
The European Bioinformatics Institute |
Wellcome Trust Genome Campus          | tel: +44 (1223) 494 616
Hinxton, Cambridge, CB10 1SD, UK      | fax: +44 (1223) 494 468
-------------------------------------------------------------------

From peter.rice at uk.lionbioscience.com Fri Nov 1 05:12:58 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 01 Nov 2002 10:12:58 +0000
Subject: emboss in cygwin
References: <3DC16BAA.1050201@bigfoot.com>
Message-ID: <3DC253AA.8080401@uk.lionbioscience.com>

clwu wrote:
> I am new to group. I tried to compile EMBOSS under
> win2K/cygwin but I failed. EMBOSS website at HGMP mentioned that
> "Richard Bruskiewich and Simon Kelley at the Sanger Centre have
> succeeded in compiling EMBOSS under Windows NT using the CygWin package.
> The resulting executables have been tested but not thoroughly enough for
> a release. Contact Richard Bruskiewich for more information. ". But I
> can not follow the link in this page to get help.

That is rather old information.

The history is that Richard Bruskiewich made a Windows port of an early
ACEDB version, and they both tried porting an early EMBOSS release using
cygwin - which worked apart from the graphics library and Windows file
naming.

Neither Richard nor Simon has been working on this recently.

David Starks-Browning at EBI has built EMBOSS but not yet tried the
applications. I hear of other groups who have also tried.

You can expect problems with Windows filenames which clash with EMBOSS
"USA" syntax. We can try to fix these - perhaps by requiring all
database names to have more than one letter so Windows drive letters work.

Any suggestions on changes needed to make EMBOSS work better (or work
at all) on Windows systems?
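
The one-letter rule suggested above could be sketched roughly as
follows. This is only an illustration in Python, not EMBOSS code: the
function name classify_usa, the tuple it returns and the set of
database names are all made up for the example, and real USA parsing
handles more cases than the three shown here.

```python
def classify_usa(usa, databases):
    """Classify a USA-like string as a file or a database reference.

    Hypothetical sketch of the proposed rule: a single letter before
    ':' is always taken as a Windows drive letter, never as a database.
    Returns (format_or_None, kind, remainder).
    """
    # Split off an optional "format::" prefix, as in fasta::filename.
    fmt, sep, rest = usa.partition("::")
    if not sep:
        fmt, rest = None, usa

    head, colon, _tail = rest.partition(":")
    if colon and len(head) == 1 and head.isalpha():
        # One letter followed by ':' -> treat as a drive letter.
        return (fmt, "file", rest)
    if colon and head.lower() in databases:
        # Multi-letter prefix that names a known database.
        return (fmt, "database", rest)
    return (fmt, "file", rest)


if __name__ == "__main__":
    known = {"embl", "srsembl"}  # assumed database names
    for usa in ("srsembl:J00231", r"C:\data\test.seq",
                r"fasta::C:\data\test.seq", "test.seq"):
        print(usa, "->", classify_usa(usa, known))
```

With this rule a database called "c" could never be addressed as
c:entry, which is the price of letting C:\... work unchanged.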

Peter

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From pageauma at ESI.UMontreal.CA Fri Nov 1 15:01:07 2002
From: pageauma at ESI.UMontreal.CA (Marie PAGEAU)
Date: Fri, 1 Nov 2002 15:01:07 -0500
Subject: test sequence
Message-ID:

Dear colleagues,

A professor of biochemistry at the Universite de Montreal sent me the
following request. Would you be kind enough to help us? Your help
would be highly appreciated.

Best regards,

Marie Pageau

-----------------------------------------------------------
From: Muriel Aubry
Sent: 30 October 2002 14:47
Subject: test sequence

Hi,

I am presently using the restrict and showseq programs from EMBOSS. I
have noticed that some very common enzymes, such as XhoI and PstI and a
few others, are not detected by the programs when the complete list of
enzymes is used.

I have here below a test sequence that should contain XhoI, EcoRI, PstI,
EcoRV, HindIII, KpnI, SacII, ApaI, SmaI, BamHI and XbaI. XhoI, PstI and
EcoRV are not detected by restrict and showseq in the test sequence shown
below. Is there a problem with the restriction enzyme list?

Test Sequence:
gagcagggggatctcggcgagctctcgagaattctcacgcgtctgcaggatatcaagcttgcggtaccgcgg
gcccggg

From ableasby at hgmp.mrc.ac.uk Fri Nov 1 15:21:29 2002
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Fri, 1 Nov 2002 20:21:29 GMT
Subject: test sequence
Message-ID: <200211012021.UAA12495@bromine.hgmp.mrc.ac.uk>

There is probably not a problem. EMBOSS only reports one isoschizomer
for cases where several REs have the same cut site. If the -preferred
switch is given to these programs then the more easily available
of the isoschizomers will be reported. This is controlled by the
file:
    embossre.equ
where, for each RE, you can specify which isoschizomer should be reported.
So, first try adding -preferred.

If you just want to search for a particular set of enzymes they can
be given as a comma-separated list using the -enzymes qualifier
e.g. -enzymes "ecori,bamhi"

HTH

Alan Bleasby
HGMP

PS: NEB supply an emboss-format set of files which are just the most
common REs. You can rename them (e.g. to embossre.enz/ref/sup) and
overwrite your current set in the emboss REBASE directory.

From David.Bauer at SCHERING.DE Mon Nov 4 03:35:51 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Mon, 4 Nov 2002 09:35:51 +0100
Subject: test sequence
Message-ID:

Hi Alan,

I wonder how the "preferred" list is created. Restrict finds the site
"CTGCAG" as the recognition site of BstMAI. This is a rather exotic
enzyme, available from only one company, which I didn't know before. I
apologize for my ignorance if this is a common supplier in the UK ;-)

On the other hand, PstI is available from about 20 suppliers and this is
also the enzyme name used in various catalogue pictures of multiple
cloning sites in vectors (puc19 polylinker etc.).

So I would suggest adding the BstMAI -> PstI mapping to the distribution
version of embossre.equ.

David.

> There is probably not a problem. EMBOSS only reports one isoschizomer
> for cases where several REs have the same cut site. If the -preferred
> switch is given to these programs then the more easily available
> of the isoschizomers will be reported. This is controlled by the
> file:
>     embossre.equ
> where, for each RE, you can specify which isoschizomer should be reported.
> So, first try adding -preferred.
>
> If you just want to search for a particular set of enzymes they can
> be given as a comma-separated list using the -enzymes qualifier
> e.g. -enzymes "ecori,bamhi"
>
> HTH
>
> Alan Bleasby
> HGMP
>
> PS: NEB supply an emboss-format set of files which are just the most
> common REs. You can rename them (e.g. to embossre.enz/ref/sup) and
> overwrite your current set in the emboss REBASE directory.
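
A quick way to confirm that the XhoI, PstI and EcoRV sites really are in
the test sequence posted earlier in this thread, whatever isoschizomer
name restrict chooses to report, is a plain string search. The sketch
below is Python, not EMBOSS; the recognition sequences are the standard
ones (CTCGAG, CTGCAG, GATATC) and find_sites is just an illustrative
helper. All three sites are palindromic, so searching one strand is
enough here.

```python
# Test sequence from the original post, joined across the line wrap.
TEST_SEQ = (
    "gagcagggggatctcggcgagctctcgagaattctcacgcgtctgcaggatatcaagcttgcggtaccgcgg"
    "gcccggg"
).upper()

# Recognition sequences of the three enzymes reported as "missing".
SITES = {"XhoI": "CTCGAG", "PstI": "CTGCAG", "EcoRV": "GATATC"}


def find_sites(seq, sites):
    """Return {enzyme: [1-based start positions]} for exact site matches."""
    hits = {}
    for name, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)  # report 1-based coordinates
            start = seq.find(site, start + 1)
        hits[name] = positions
    return hits


HITS = find_sites(TEST_SEQ, SITES)

if __name__ == "__main__":
    for enzyme, positions in HITS.items():
        print(enzyme, SITES[enzyme], positions)
```

If a site shows up here but not under its familiar name in the restrict
output, the isoschizomer explanation above is the likely cause.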

From gbottu at ben.vub.ac.be Mon Nov 4 05:05:58 2002
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 4 Nov 2002 11:05:58 +0100 (CET)
Subject: Remote getz from emboss
Message-ID: <200211041005.LAA1502643@black.vub.ac.be>

from : BEN

At the BEN site we do have a Perl script that reproduces more or less
the functionality of the GCG program lookup. It can access a local (with
getz) or a remote (with rsh getz) SRS server (switching is a simple
matter of commenting out a line inside the script; the option exists
because we once had our SRS server on a different computer). It can be
run interactively at the command line or put behind an EMBOSS wrapper
program and thus behind Staden or the EMBOSS WWW interfaces.
Is this what you are looking for?

Guy Bottu

From duhaimj at ircm.qc.ca Tue Nov 5 16:34:57 2002
From: duhaimj at ircm.qc.ca (Johanne Duhaime)
Date: Tue, 05 Nov 2002 16:34:57 -0500
Subject: MSE will not save on Exit
Message-ID: <3DC83981.625D3E83@ircm.qc.ca>

Hello

I am trying to use MSE (MSE -0.04.tar.gz just installed) but I cannot
save with the exit command. After I modify a sequence, when I type
Exit on the command line I get:

Sequences modified do you wish to continue exiting [N]

Answering Y or N does not save anything. For now I have to use "write".

Any idea what the problem is?

--
Johanne Duhaime
IRCM
110 Ave des Pins O
Montreal, Quebec
987-5556 (tel)
987-5644 (fax)
Johanne_Duhaime at ircm.qc.ca
http://www.ircm.qc.ca

From w2hgcg at netscape.net Tue Nov 5 22:01:47 2002
From: w2hgcg at netscape.net (w2hgcg at netscape.net)
Date: Tue, 05 Nov 2002 22:01:47 -0500
Subject: epitope search
Message-ID: <56BE5F2D.194AD14A.000665E2@netscape.net>

I know this is not the place but perhaps...

I am working with LSA-3. I have been able to produce some antibodies in
rabbits; when I run an immunoblot, the antibodies recognize specific
protein bands. How can I search for epitopes in P. falciparum against my
immunogens?

Sorry for my English...

Lucia Goncalvez

From ray at leicester.ac.uk Wed Nov 6 08:24:38 2002
From: ray at leicester.ac.uk (Dalgleish, Dr R.)
Date: Wed, 6 Nov 2002 13:24:38 -0000
Subject: Suggestion for new EMBOSS program
Message-ID:

I find GCG framealign very useful to align
a protein with its DNA sequence. Could
somebody find the time to write an EMBOSS
equivalent?

Thanks,

Raymond Dalgleish
Genetics
Leicester

From peter.rice at uk.lionbioscience.com Wed Nov 6 08:32:13 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 06 Nov 2002 13:32:13 +0000
Subject: Suggestion for new EMBOSS program
References:
Message-ID: <3DC919DD.5020306@uk.lionbioscience.com>

Dalgleish, Dr R. wrote:
> I find GCG framealign very useful to align
> a protein with its DNA sequence. Could
> somebody find the time to write an EMBOSS
> equivalent?

Sounds rather like genewise in the (free) Wise2 package
http://www.sanger.ac.uk/Software/Wise2/

You can try it at
http://www.sanger.ac.uk/Software/Wise2/genewiseform.shtml

Can you be more specific about whether this is what you need, and what
you want in EMBOSS?

regards,

Peter

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From heme at postmark.net Thu Nov 7 07:05:55 2002
From: heme at postmark.net (Per Johansson)
Date: Thu, 07 Nov 2002 12:05:55 +0000
Subject: Suggestion for new use for EMBOSS program
Message-ID: <20021107120555.8757.qmail@venus.postmark.net>

I am finding it difficult to find a replacement for the GCG program
FINDPATTERNS. The EMBOSS program fuzznuc cannot use a database of
patterns (primers).
Other alignment programs in EMBOSS like supermatcher
are useful but, among other things, you can't choose mismatch settings.

The best replacement I've found is the EMBOSS program tfscan! Tfscan
uses a database of patterns, but you can't reverse the patterns (you
have to put copies of both the forward and the reverse primer sequences
in the database). The tfscan algorithm is ideal (and is much faster than
FINDPATTERNS) but obviously a few minor changes to the input and output
would be required if it were used in a replacement program.

Obviously, I could write a script to wrap tfscan but I'd like to
avoid this. A new program with this functionality would be beneficial
for the EMBOSS package.

Per

From charles at moulinette.dyndns.org Thu Nov 7 08:20:42 2002
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Thu, 7 Nov 2002 14:20:42 +0100
Subject: Suggestion for new use for EMBOSS program
In-Reply-To: <20021107120555.8757.qmail@venus.postmark.net>
References: <20021107120555.8757.qmail@venus.postmark.net>
Message-ID: <20021107132042.GA9854@moulinette.dyndns.org>

> The best replacement I've found is the EMBOSS program tfscan!

> Obviously, I could write a script to wrap tfscan but I'd like to
> avoid this. A new program with this functionality would be beneficial
> for the EMBOSS package.

An alternative would be to write a script that builds a transfac-format
database from a flatfile containing names and corresponding consensus
sequences (this would also allow the pattern.dat file of GCG to be
migrated).

Charles

From heme at postmark.net Thu Nov 7 10:11:43 2002
From: heme at postmark.net (Per Johansson)
Date: Thu, 07 Nov 2002 15:11:43 +0000
Subject: Fwd: Re: Suggestion for new use for EMBOSS program
Message-ID: <20021107151143.25950.qmail@venus.postmark.net>

It's OK to reformat the GCG pattern file to a transfac-format database
and use tfscan. But I still miss some functions in FINDPATTERNS.
I can't search the reverse primer strand, the output is limited to ONE
format (no alignment format), and tfscan doesn't accept wobble
(ambiguity) bases in primers (e.g. K = G or T, V = G or C or A, not T ...).

Otherwise the tfscan algorithm is a very nice and fast word-matching
algorithm, and it COULD be used for other purposes as well!

Per

--- Forwarded Message ---
To: EMBOSS
From: Charles Plessy
Reply-To: c.plessy at mangoosta.net
Subject: Re: Suggestion for new use for EMBOSS program
Date: Thu, 7 Nov 2002 14:20:42 +0100

> The best replacement I've found is the EMBOSS program tfscan!

> Obviously, I could write a script to wrap tfscan but I'd like to
> avoid this. A new program with this functionality would be beneficial
> for the EMBOSS package.

An alternative would be to write a script that builds a transfac-format
database from a flatfile containing names and corresponding consensus
sequences (this would also allow the pattern.dat file of GCG to be
migrated).

Charles

From heme at postmark.net Fri Nov 8 09:28:53 2002
From: heme at postmark.net (Per Johansson)
Date: Fri, 08 Nov 2002 14:28:53 +0000
Subject: EMBOSS default program settings
Message-ID: <20021108142853.4941.qmail@venus.postmark.net>

I have problems with EMBOSS default program settings in the
emboss.defaults file.

set emboss_stdout 1     Works fine, output goes to stdout

set emboss_verbose 1    Doesn't work

set emboss_format embl  The programs still output fasta format by
default! And the ONLY sequence format the EMBOSS programs accept as
input format is embl!

It doesn't work as it should.

Per

From peter.rice at uk.lionbioscience.com Fri Nov 8 10:52:05 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 08 Nov 2002 15:52:05 +0000
Subject: EMBOSS default program settings
References: <20021108142853.4941.qmail@venus.postmark.net>
Message-ID: <3DCBDDA5.3070106@uk.lionbioscience.com>

Per Johansson wrote:
> I have problems with EMBOSS default program settings in the
> emboss.defaults file.
>
> set emboss_stdout 1     Works fine, output goes to stdout
>
> set emboss_verbose 1    Doesn't work

Because help is generated as soon as the -help option is tested.
Changed in the next release to set -verbose before -help.

> set emboss_format embl  The programs still output fasta format by
> default! And the ONLY sequence format the EMBOSS programs accept as
> input format is embl!
>
> It doesn't work as it should.

Well .... emboss_format sets the default *input* format.

You can still say fasta::filename to read fasta format.

The output format is specified as emboss_outformat.

EMBOSS will read all input formats if you only set emboss_outformat.

I think you really mean to say:

set emboss_outformat embl

Hope this helps

Peter

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From heme at postmark.net Mon Nov 11 01:32:50 2002
From: heme at postmark.net (Per Johansson)
Date: Mon, 11 Nov 2002 06:32:50 +0000
Subject: Fwd: Re: EMBOSS default program settings
Message-ID: <20021111063250.19730.qmail@venus.postmark.net>

Thank you Peter,

I DO mean

set emboss_outformat embl

(but I can't find emboss_outformat in the documentation).

When I use

set emboss_outformat embl

in the emboss.default file I don't see any effect at all; the default
output format is still fasta. I use emboss-2.4.1, but I assume this
is not version dependent.

Per

Per Johansson
heme at postmark.net

--- Forwarded Message ---
To: Per Johansson
Cc: EMBOSS
From: Peter Rice
Subject: Re: EMBOSS default program settings
Date: Fri, 08 Nov 2002 15:52:05 +0000

Per Johansson wrote:
> I have problems with EMBOSS default program settings in the
> emboss.defaults file.
>
> set emboss_stdout 1     Works fine, output goes to stdout
>
> set emboss_verbose 1    Doesn't work

Because help is generated as soon as the -help option is tested.
Changed in the next release to set -verbose before -help.

> set emboss_format embl  The programs still output fasta format by
> default! And the ONLY sequence format the EMBOSS programs accept as
> input format is embl!
>
> It doesn't work as it should.

Well .... emboss_format sets the default *input* format.

You can still say fasta::filename to read fasta format.

The output format is specified as emboss_outformat.

EMBOSS will read all input formats if you only set emboss_outformat.

I think you really mean to say:

set emboss_outformat embl

Hope this helps

Peter

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From gwilliam at hgmp.mrc.ac.uk Mon Nov 11 04:20:01 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Mon, 11 Nov 2002 09:20:01 +0000
Subject: Fwd: Re: EMBOSS default program settings
References: <20021111063250.19730.qmail@venus.postmark.net>
Message-ID: <3DCF7641.E2ED9B73@hgmp.mrc.ac.uk>

Per Johansson wrote:
>
> Thank you Peter,
>
> I DO mean
>
> set emboss_outformat embl (but I can't find emboss_outformat in the
> documentation).

It is documented in:
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Usa/databases.html#global

--
Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK

From heme at postmark.net Mon Nov 11 08:12:50 2002
From: heme at postmark.net (Per Johansson)
Date: Mon, 11 Nov 2002 13:12:50 +0000
Subject: Fwd: Re: EMBOSS default program settings
Message-ID: <20021111131250.4618.qmail@www2.postmark.net>

Thank you,

That solves the problem, ALWAYS use the latest version!

Per

Gary Williams, Tel 01223 494522 wrote:
> Per Johansson wrote:
> >
> > Thank you Peter,
> >
> > I DO mean
> >
> > set emboss_outformat embl (but I can't find emboss_outformat in the
> > documentation).
>
> It is documented in:
> http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Usa/databases.html#global
>
> --
> Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512
> mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/
> Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK

From sebastian.bassi at ar.advantaseeds.com Mon Nov 11 07:10:54 2002
From: sebastian.bassi at ar.advantaseeds.com (Sebastian Bassi)
Date: Mon, 11 Nov 2002 13:10:54 +0100
Subject: Problem with EMBOSS GUI
Message-ID:

Hi,

I've just installed the EMBOSS GUI on http://genes.unq.edu.ar/EMBOSS
(it should look like http://bioinfo.pbi.nrc.ca:8090/EMBOSS/).

The problem, as you can see on the web page, is that the programs are
missing from the left column (all the EMBOSS programs should appear
there). I think this is probably a path problem. To help you evaluate
it, I attach two files:

embossdir.txt, a capture of the ls -Ra from my emboss inst. directory
(/opt/emboss).
emboss.pl, the emboss.pl file (so you can see whether the paths are right).

The emboss.zip file contains both files; I made it because attached
text sometimes gets corrupted by some mailers.

I hope you can help me.

Note: EMBOSS was compiled using this:

configure --prefix=/opt/emboss --without-x --x-includes="" --x-libraries=""

The "without x" part is because it was compiled on a RH web server
without X. EMBOSS itself works fine; the problem is this GUI.

Sebastian Bassi.
Advanta Seeds. Balcarce Research Station.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: emboss.zip
Type: application/x-zip-compressed
Size: 15092 bytes
Desc: emboss.zip
Url : http://lists.open-bio.org/pipermail/emboss/attachments/20021111/3f4c80e9/attachment.bin

From mad at biol.unlp.edu.ar Mon Nov 11 14:18:00 2002
From: mad at biol.unlp.edu.ar (Martín Sarachu)
Date: Mon, 11 Nov 2002 16:18:00 -0300
Subject: tfextract not indexing?
Message-ID: <3DD00268.6090206@biol.unlp.edu.ar>

Hi,

tfextract is apparently running ok, but its output files are empty.

The command line is

> # tfextract -debug -warning -error -fatal -die -verbose
> Extract data from TRANSFAC
> Full pathname of transfac SITE.DAT: /home/work/dbs/transfac/site.dat
> #

and

> # ls -s /usr/local/emboss/share/EMBOSS/data/tf*
> 0 /usr/local/emboss/share/EMBOSS/data/tffungi
> 0 /usr/local/emboss/share/EMBOSS/data/tfinsect
> 0 /usr/local/emboss/share/EMBOSS/data/tfother
> 0 /usr/local/emboss/share/EMBOSS/data/tfplant
> 0 /usr/local/emboss/share/EMBOSS/data/tfvertebrate

...a sample from site.dat

> VV TRANSFAC SITES TABLE, V.2.4 25-08-1995
> XX
> //
> AC R00001
> XX
> ID HS$6-16_01
> XX
> DT 20.06.90 (created); .
> DT 24.08.95 10:48:05 (updated); EWI.
> XX
> TY DNA
> XX
> DE 6-16
> XX
> SE gGGAAAaTGAAACT
> XX
> EL ISRE
> XX
> SF -127
> ST -89
> XX
> ...
> ...
> SO 0811; B103
> ME gel shift competition
> RN [1]
> RA Suzuki-Yagawa Y., Kawakami K., Nagano K.
> RT Housekeeping Na,K-ATPase alpha1 subunit gene promoter is composed
> RT of multiple cis elements to which common and cell type-specific
> RT factors bind
> RL Mol. Cell. Biol. 12:4046-4055 (1992).
> DR EMBL; X52560; HSNFIL6(37:74).
> //

Am I missing something?

Thanks,

martin

--
Martín Sarachu
mad at biol.unlp.edu.ar
EMBNet Argentina
http://www.ar.embnet.org

From Gunnar.Andersson at imbim.uu.se Tue Nov 12 05:28:47 2002
From: Gunnar.Andersson at imbim.uu.se (Gunnar Andersson)
Date: Tue, 12 Nov 2002 11:28:47 +0100
Subject: DAN output Tm
Message-ID:

How should I interpret the Tm calculated by DAN? Is Tm an estimated
melting point of the entire sequence or of the sequence in the window?
How can this Tm (window=100nt) be higher than Tmprod of the full 160 nt
sequence?

--
Gunnar Andersson
Institutionen för medicinsk biokemi och mikrobiologi
Uppsala Biomedicinska Centrum (BMC), Husarg.
3 Box 582, 751 23 UPPSALA
E-mail: Gunnar.Andersson at imbim.uu.se
Telephone: 018-471 45 87 Fax: 018-50 98 76

From r.bowden at vir.gla.ac.uk Thu Nov 14 10:36:23 2002
From: r.bowden at vir.gla.ac.uk (Rory Bowden)
Date: Thu, 14 Nov 2002 15:36:23 -0000
Subject: Fw: Other: EMBOSS versus GCG?
Message-ID: <007e01c28bf3$96beddc0$6886d182@vir.gla.ac.uk>

This appeared on the 'evoldir' list, which is the main international
mailing list in evolutionary biology. Would anyone like to make any
comments for me to pass on? While I'm definitely not in Canada, I would
say that this question is likely to come up here (in the UK) at the
institutional if not research council level. Does anyone have an opinion
they'd like to articulate, e.g. about whether EMBOSS is ready to supplant
GCG for end-users?

Rory Bowden
MRC Virology Unit
Glasgow UK

----- Original Message -----
From: "EvolDir"
To:
Sent: Thursday, November 14, 2002 9:34 AM
Subject: Other: EMBOSS versus GCG?

>
> Since its inception as the "Wisconsin package" in the early 1980s, the GCG
> suite of programs has provided a continuously improving "gold standard"
> for evolutionary bioinformatics software. The GCG suite is featured
> extensively in the latest bioinformatics textbooks (e.g. Mount) and in
> software reviews (e.g. The Scientist, August 19). Although some individual
> GCG programs have been surpassed by others, their range and flexibility,
> permitting linkage of programs together in innovative ways, has no current
> equivalent.
>
> Recently, "open source" advocates have pointed to the EMBOSS suite as
> providing a free alternative to the commercial package (supplied by
> Accelrys, with whom I have no financial connection). It is my impression
> that GCG is in a different league. For example, compare the GCG program
> "Window" with its proposed EMBOSS alternative "Freak":
>
> TASK: Determination of the number of occurrences of a motif in a sequence
> window.
>
> GCG program WINDOW
>
> 1. Allows up to 6 motifs at a time
> 2. Outputs absolute values and has a variety of other output options.
> 3. Extensive input menu
>
> EMBOSS program FREAK
>
> 1. Allows only 1 motif at a time.
> 2. Outputs a calculated fraction.
> 3. Very limited input menu.
>
> However, in Canada the open-source agenda has won out. In April 2002
> the publicly-funded, Halifax-based, Canadian Bioinformatics Resource (CBR)
> abandoned GCG, apparently with the consent of the Canadian evolutionary
> bioinformatics community. In this respect, I would be interested to hear
> from concerned parties in Canada with respect to the following questions:
>
> 1. Does your institution (or do you yourself) support GCG, so that you do
> not need CBR to supply GCG?
>
> 2. If you do not have independent access, do you find EMBOSS a suitable
> substitute for GCG?
>
> That Canada, which has spent hundreds of millions on genome
> projects, cannot give its researchers and their students a choice from
> among the relatively-inexpensive software packages that are available to
> analyze genomics data, seems to me very strange.
>
> Donald Forsdyke, Department of Biochemistry,
> Queen's University, Canada
> http://post.queensu.ca/~forsdyke/bioinfor.htm
>
>
>
>

From peter.rice at uk.lionbioscience.com Fri Nov 15 08:20:13 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 15 Nov 2002 13:20:13 +0000
Subject: Fw: Other: EMBOSS versus GCG?
References: <007e01c28bf3$96beddc0$6886d182@vir.gla.ac.uk>
Message-ID: <3DD4F48D.2020606@uk.lionbioscience.com>

Rory Bowden wrote:
> This on the 'evoldir' list, which is the main international mailing list in
> evolutionary biology. Would anyone like to make any comments for me to pass
> on?

>> It is my impression
>>that GCG is in a different league. For example, compare the GCG program
>>"Window" with its proposed EMBOSS alternative "Freak":
>>
>>TASK: Determination of the number of occurrences of a motif in a sequence
>>window.
>>
>>GCG program WINDOW
>>
>>1. Allows up to 6 motifs at a time
>>2. Outputs absolute values and has a variety of other output options.
>>3. Extensive input menu
>>
>>EMBOSS program FREAK
>>
>>1. Allows only 1 motif at a time.
>>2. Outputs a calculated fraction.
>>3. Very limited input menu.

Window : produces scores over a 'window' (a base range).
StatPlot : Plots Window results

EMBOSS : reports have scores over a base range as a general output format.
    Freak: frequency of matches
    FuzzNuc/FuzzPro/FuzzTran: Pattern matches with ambiguity codes
    Restrict: Pattern matches with a pattern file
    etc...

This makes it possible to develop some really nice new EMBOSS applications.

So ... how about a program which reads EMBOSS report files and produces
a summary report (think of window), and another that plots them all
(think of statplot). Scores could be plotted if we have a good way to
compare them.

Yes, I know freak does not produce a report file ... but that is a very
easy change.

It could also read in EMBL/SwissProt feature tables as annotation.

So, suggestions please for EMBOSS applications to plot reports/features...

For example:

1. xy plot of scores as points at the centre of a feature, with the
sequence position on the x axis and the score on the y axis. Possibly
split into multiple plots by program/feature-type/named-tag-value
(e.g. pattern) (like statplot only much more versatile).

2. xy plot of lines for each feature

3. GANTT (bar) chart of features by position, annotated with feature
type/program/score as appropriate

4. Combine these - xy plot of features with scores, and other features
reported underneath (think of the -mark option in statplot - but with
far more annotation possible below the x axis)

Maybe we can make some mock-ups on the EMBOSS pages to show the
possibilities?
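
As a starting point for such a summary program, counting several motifs
per window (the task in the quoted comparison) might look something like
the sketch below. It is plain Python rather than EMBOSS code, and it is
not the algorithm used by freak or by GCG Window; window_counts, its
parameters and the row layout are assumptions for illustration only. It
does exact matching, with no ambiguity codes.

```python
def count_overlapping(text, motif):
    """Count occurrences of motif in text, allowing overlaps."""
    count = 0
    pos = text.find(motif)
    while pos != -1:
        count += 1
        pos = text.find(motif, pos + 1)
    return count


def window_counts(seq, motifs, window, step):
    """Return one (start, end, {motif: count}) row per window.

    Coordinates are 1-based and inclusive; counts are absolute numbers
    of exact matches lying wholly inside the window.
    """
    seq = seq.upper()
    rows = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        counts = {m: count_overlapping(chunk, m.upper()) for m in motifs}
        rows.append((start + 1, start + len(chunk), counts))
    return rows


if __name__ == "__main__":
    # Toy input; any number of motifs can be given at once.
    for row in window_counts("ACGCGCGTAT", ["CG", "AT"], window=6, step=2):
        print(row)
```

Rows in this shape (position range plus a score per motif) are the kind
of thing the xy plots suggested above could take as input.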

regards,

Peter

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From sjmiller at u.arizona.edu Fri Nov 15 14:23:21 2002
From: sjmiller at u.arizona.edu (Susan J. Miller)
Date: Fri, 15 Nov 2002 12:23:21 -0700
Subject: newcpgreport vs newcpgseek
Message-ID: <3DD549A9.9170F056@u.arizona.edu>

I could not find an emboss FAQ...is there one?

I'm trying to figure out the differences between cpgreport,
newcpgreport and newcpgseek.

--
Thanks,
-susan

Susan J. Miller
Biotechnology Computing Facility
Arizona Research Laboratories
Bio West 228
University of Arizona
Tucson, AZ 85721
(520) 626-2597

From rls at ebi.ac.uk Fri Nov 15 20:08:02 2002
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Sat, 16 Nov 2002 01:08:02 -0000
Subject: newcpgreport vs newcpgseek
In-Reply-To: <3DD549A9.9170F056@u.arizona.edu>
Message-ID: <000501c28d0c$9cf06780$0a0868d5@castafiore>

Hi Susan,

Yes. I never had the time to document these. Briefly:

newcpgreport use the same method to find islands but produce different
output. The method is described in:

Larsen,F., Gundersen,G., Lopez,R., Prydz,H.
CpG islands as gene markers in the human genome.
(1992) Genomics 13 (4):1095-107
MedlineID: 92372002
PubMedID: 1505946

Cpgreport uses a scoring method based on sum/frequencies which
overpredicts islands but finds the smaller ones around primary exons.

Cpgseek is deprecated at the moment.

For all practical purposes I use newcpgreport. I actually use it to
produce the human cpgisland database you can find on the EBI's ftp
server as well as on the EBI's SRS server.

Hope this helps,

R:)

> -----Original Message-----
> From: owner-emboss at hgmp.mrc.ac.uk
> [mailto:owner-emboss at hgmp.mrc.ac.uk] On Behalf Of Susan J. Miller
> Sent: 15 November 2002 19:23
> To: emboss at embnet.org
> Subject: newcpgreport vs newcpgseek
>
>
> I could not find an emboss FAQ...is there one?
>
> I'm trying to figure out the differences between cpgreport,
> newcpgreport and newcpgseek.
>
> --
> Thanks,
> -susan
>
> Susan J. Miller
> Biotechnology Computing Facility
> Arizona Research Laboratories
> Bio West 228
> University of Arizona
> Tucson, AZ 85721
> (520) 626-2597
>

From David.Bauer at SCHERING.DE Wed Nov 20 09:45:24 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Wed, 20 Nov 2002 15:45:24 +0100
Subject: vectorstrip
Message-ID:

Hi,

If I run vectorstrip on a file with many sequences, the output file
contains only the sequences from which vector was stripped. I would find
it more useful if vectorstrip could (maybe optionally) also write the
sequences with no vector hit to the output file.

Or have I overlooked something?

David.

From Myrian_Grondin at UQTR.CA Wed Nov 20 11:15:20 2002
From: Myrian_Grondin at UQTR.CA (Myrian_Grondin at UQTR.CA)
Date: Wed, 20 Nov 2002 11:15:20 -0500
Subject: Install Emboss with Windows??
Message-ID: <1037808920.3ddbb518dbbac@courriel.uqtr.ca>

Hi,
We are working on a PC running Windows 98, and we would like to know if
it's possible to install Emboss on our machine. If so, which software do
we have to install to be able to run Emboss?
Thanks a lot (excuse me, my English is so poor...)
Myrian

From stefanielager at fastmail.ca Thu Nov 21 03:19:32 2002
From: stefanielager at fastmail.ca (Stefanie Lager)
Date: Thu, 21 Nov 2002 03:19:32 -0500 (EST)
Subject: EMBOOS end EMBL entryname
Message-ID: <3DDC9714.000059.00380@ns.interchange.ca>

Hi,

Does EMBL still stick to entrynames (the ID line) of "nine uppercase
alphanumeric Characters"?
(http://www.ebi.ac.uk/embl/Documentation/User_manual/id_line.html)
I can't retrieve sequences from the International Protein Index (IPI)
database (11 characters in the ID entryname) in EMBL or SWISS format
using EMBOSS programs.
The EMBOSS programs only accept 10 characters for ID in EMBL or SWISS format. Is this problem fixed in EMBOSS versions later than 2.4.1? EMBL can have whatever policy they want but it would be nice if the EMBOSS programs would accept ANY length of ID also in EMBL and SWISS format. Stefanie _________________________________________________________________ http://fastmail.ca/ - Fast Secure Web Email for Canadians From sharmila at ebi.ac.uk Thu Nov 21 06:12:33 2002 From: sharmila at ebi.ac.uk (Sharmila Pillai) Date: Thu, 21 Nov 2002 11:12:33 +0000 Subject: Install Emboss with Windows?? Message-ID: Hi, From what I understand there is a cygwin compiled version but it's not tested and cannot handle graphics. You should refer to what Rodrigo Lopez wrote to the emboss list on 1/11/02 in response to subject: Remote getz from emboss. Though this is not the solution for your problem today, this could be the direction for Windows users. I'll try to explain a bit of it here: At the EBI's External Services group, I am working on a webservice for EMBOSS using SOAP. Basically, this enables the user to use EMBOSS applications remotely. % seqret srsembl:J00231 -lhttp://servername:portnum/axis/services The above command would use AXIS/SOAP to access the 'servername' and the 'portnum' which in turn would retrieve data from srsembl (as defined in emboss.default) and pass it on to the application (seqret, in this example). The result is sent to stdout. All the user (using any OS) needs is a client which understands/interprets a command line as above and some libraries for Axis/SOAP. We have an experimental service using both Java and Perl running on Axis/Tomcat. I don't think EBI provides remote access to many EMBOSS applications today. If our experimental service survives our tests and there is enough demand for such a service, EBI can soon start opening up webservice access to EMBOSS. //Sharmila.
From Georg.Beckmann at Schering.DE Thu Nov 21 03:25:39 2002 From: Georg.Beckmann at Schering.DE (Georg.Beckmann at Schering.DE) Date: Thu, 21 Nov 2002 09:25:39 +0100 Subject: OldDistances Message-ID: Hi, does anybody know if EMBOSS offers a program similar to OldDistances in GCG? OldDistances - which previously had yet another name that I don't remember - calculates a matrix of pairwise similarities from a multiple alignment. As far as I can see, there is no such program. Is somebody working on such a program for EMBOSS? Thanks. Ciao, Georg Beckmann From newgene at bigfoot.com Thu Nov 21 09:55:24 2002 From: newgene at bigfoot.com (clwu) Date: Thu, 21 Nov 2002 08:55:24 -0600 Subject: Install Emboss with Windows?? References: <1037808920.3ddbb518dbbac@courriel.uqtr.ca> Message-ID: <3DDCF3DC.9020202@bigfoot.com> I recently compiled EMBOSS successfully under cygwin/win2K (thanks to David Starks-Browning's great help). So far, all the applications I have used work fine (graphics output is also OK under openbox/cygwin). I think you should install cygwin and give it a try. Good luck. Chunlei Myrian_Grondin at UQTR.CA wrote: >Hi, >We are working on PC, OS Windows 98, and we would like to know if it's possible >to install Emboss on our machine. If so, which software have we to install to >be able to run Emboss? >Thanks a lot (excuse me, my English is so poor...) >Myrian > > > >------------------------------------------------- >Courriel exp?di? via https://courriel.uqtr.ca > > From lukem at gene.pbi.nrc.ca Thu Nov 21 10:28:13 2002 From: lukem at gene.pbi.nrc.ca (Luke McCarthy) Date: Thu, 21 Nov 2002 09:28:13 -0600 (CST) Subject: Install Emboss with Windows?? In-Reply-To: <1037808920.3ddbb518dbbac@courriel.uqtr.ca> Message-ID: On Wed, 20 Nov 2002 Myrian_Grondin at UQTR.CA wrote: > Hi, > We are working on PC, OS Windows 98, and we would like to know if it's > possible to install Emboss on our machine. If so, which software have we to > install to be able to run Emboss?
Other posts have addressed the issue of installing EMBOSS locally on a Windows box, but if you have an immediate pressing need to use the EMBOSS applications, the Canadian Bioinformatics Resource offers access through a web interface at http://www.cbr.nrc.ca/services/emboss_e.php ou en francais: http://www.cbr.nrc.ca/services/emboss_f.php Unfortunately, the interface itself is English only, but then so are the EMBOSS applications (at least as far as I know...) Cheers, Luke From newgene at bigfoot.com Thu Nov 21 11:52:11 2002 From: newgene at bigfoot.com (clwu) Date: Thu, 21 Nov 2002 10:52:11 -0600 Subject: mfold Message-ID: <3DDD0F3B.8020109@bigfoot.com> Hi, group, Does anybody know if there is a EMBOSS equivalence for "mfold" program in GCG? Thanks. Chunlei From stefanielager at fastmail.ca Fri Nov 22 03:33:29 2002 From: stefanielager at fastmail.ca (Stefanie Lager) Date: Fri, 22 Nov 2002 03:33:29 -0500 (EST) Subject: mfold Message-ID: <3DDDEBD9.000009.03475@ns.interchange.ca> > Hi, group, > Does anybody know if there is a EMBOSS equivalence for > "mfold" program in GCG? > > Thanks. > > Chunlei NO, but there are plenty of RNA structure software out there, both as servers and for local installation. http://www.bioinfo.rpi.edu/~zukerm/rna/node3.html#SECTION00031 _________________________________________________________________ http://fastmail.ca/ - Fast Secure Web Email for Canadians From mikep at angis.org.au Sun Nov 24 17:25:41 2002 From: mikep at angis.org.au (Michael Poidinger) Date: Mon, 25 Nov 2002 09:25:41 +1100 Subject: codon useage tables In-Reply-To: <3DDDEBD9.000009.03475@ns.interchange.ca> Message-ID: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au> Is there a site somewhere which describes which organisms/data sets the EMBOSS codon useage tables are derived from? some are obvious from their name, others are not. 
Thanks, Mike ------------------------------------ Dr Michael Poidinger PhD(virology) PGDipSci (computer science) CEO, Australian Genome Information Centre Head, Australian National Genome Information Service ph 61-2-93518617 mob 0413146765 fax 61-2-93518618 email head at angis.org.au ------------------------------------------ From areagp61 at yahoo.it Mon Nov 25 04:38:59 2002 From: areagp61 at yahoo.it (Graziano P.) Date: Mon, 25 Nov 2002 10:38:59 +0100 Subject: codon useage tables References: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au> Message-ID: <007701c29466$8127a7f0$18105709@italy.ibm.com> Not every file but most are described in the README file from ftp://ftp.ebi.ac.uk/pub/databases/codonusage Hope this helps Graziano Pappad? ----- Original Message ----- From: "Michael Poidinger" To: Sent: Sunday, November 24, 2002 11:25 PM Subject: codon useage tables > Is there a site somewhere which describes which organisms/data sets the > EMBOSS codon useage tables are derived from? some are obvious from their > name, others are not. > > Thanks, > Mike > ------------------------------------ > Dr Michael Poidinger > PhD(virology) PGDipSci (computer science) > CEO, Australian Genome Information Centre > Head, Australian National Genome Information Service > ph 61-2-93518617 > mob 0413146765 > fax 61-2-93518618 > email head at angis.org.au > ------------------------------------------ > ______________________________________________________________________ Per te Blu American Express ? gratis! 
http://it.yahoo.com/mail_it/foot/?http://www.americanexpress.it/land_yahoo From mikep at angis.org.au Mon Nov 25 16:58:02 2002 From: mikep at angis.org.au (Michael Poidinger) Date: Tue, 26 Nov 2002 08:58:02 +1100 Subject: codon useage tables In-Reply-To: <007701c29466$8127a7f0$18105709@italy.ibm.com> References: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au> Message-ID: <5.2.0.9.2.20021126085548.02f344e8@morgan.angis.org.au> At 10:38 AM 25/11/2002 +0100, Graziano P. wrote: >Not every file but most are described in the README file >from ftp://ftp.ebi.ac.uk/pub/databases/codonusage > >Hope this helps Thanks, it helps with quite a few. Do you (or anyone else) know the difference between related files? such as Ehum and Ehuman Eeco, Eeco_h and Eecoli Emus, Emussp etc. Thanks, Mike ------------------------------------ Dr Michael Poidinger PhD(virology) PGDipSci (computer science) CEO, Australian Genome Information Centre Head, Australian National Genome Information Service ph 61-2-93518617 mob 0413146765 fax 61-2-93518618 email head at angis.org.au ------------------------------------------ From peter.rice at uk.lionbioscience.com Tue Nov 26 05:40:04 2002 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Tue, 26 Nov 2002 10:40:04 +0000 Subject: codon useage tables References: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au> <5.2.0.9.2.20021126085548.02f344e8@morgan.angis.org.au> Message-ID: <3DE34F84.90108@uk.lionbioscience.com> Michael Poidinger wrote: > Do you (or anyone else) know the difference between related files? > > such as > Ehum and Ehuman > Eeco, Eeco_h and Eecoli > Emus, Emussp The codon usage files were set up a long time ago. It was not so easy to find a good set of tables that were free to use. The first tables (if I recall correctly) came from the TRANSTERM database Short names (Eeco) are reformatted TRANSTERM codon usage tables with an E (EMBOSS) prefix and a .cut suffix to identify the format. 
Names with _h (Eeco_h) are highly expressed genes (high Codon Adaptation Index values). sp endings? Help! Ysp is "Yeast S.pombe" of course. I assume the others are for a genus (e.g. Mus sp. = Mus musculus and Mus domesticus) rather than a single species. Emussp.cut is a reformat of TRANSTERM's mussp.cod file. The EBI's FTP copy of TRANSTERM did not document exactly what these names mean. The original TRANSTERM documentation also leaves you to guess at the 3-letter species codes. The TRANSTERM website seems to be only partly available. Longer names (Eecoli) are added from elsewhere (I need to check on their origin) and only include a few genes (count the stop codons!) so I assume they are old and probably obsolete. mt endings are mitochondrial genes. cp endings are chloroplast genes. Time to review these tables I suspect!!! How about replacing them with annotated tables from CUTG for selected species? We need to be careful about default table names in some programs, but they are easy to update. Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From Joerg.Schaber at uv.es Tue Nov 26 07:41:44 2002 From: Joerg.Schaber at uv.es (Joerg Schaber) Date: Tue, 26 Nov 2002 13:41:44 +0100 Subject: duplicate ID Message-ID: <3DE36C08.6030603@uv.es> Hi, when creating an NCBI database using dbiflat I always get, a few times, the message "Warning: Duplicate ID skipped: '' All hits will point to first ID found". Even though it does not seem to have severe effects I would like to know which duplicate IDs are meant. I checked the genome IDs and acnums and they seem to be OK (all *gbk files downloaded from NCBI) and they all have entries and are not 'null'. Any idea what's the problem? Here is the command I use: dbiflat -idformat gb -directory "."
-filename "*.gbk" -dbname "ncbibac" -release "1.0" -date "26/11/02" -fields acnum,des,taxon Greetings, joerg From david.vilanova at rdls.nestle.com Tue Nov 26 08:43:40 2002 From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS) Date: Tue, 26 Nov 2002 14:43:40 +0100 Subject: Matcher Message-ID: <89466355CEFE7244AC3A013E45641C180144ECDD@lsmail2.crn.nestrd.ch> Dear all, I was wondering if the matcher program accepts a sequence via stdin. The following example doesn't work for me. matcher -sequencea 'ATGCGA' -sequenceb 'ATCTAGATATGCGA' >cannot open ATGCGA file for read. Is there any way to submit a sequence via stdin ??? Thanks, David From peter.rice at uk.lionbioscience.com Tue Nov 26 08:53:50 2002 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Tue, 26 Nov 2002 13:53:50 +0000 Subject: Matcher References: <89466355CEFE7244AC3A013E45641C180144ECDD@lsmail2.crn.nestrd.ch> Message-ID: <3DE37CEE.5090903@uk.lionbioscience.com> Vilanova,David,LAUSANNE,NRC/BS wrote: > Dear all, > I was wondering if matcher program accepts a sequence via stdin. > > the following exemple doesn't work for me. > > matcher -sequencea 'ATGCGA' -sequenceb 'ATCTAGATATGCGA' > >>cannot open ATGCGA file for read. > > > Is there anyway to submit a sequence via stdin ??? You don't mean stdin (that can only read one sequence anyway) ... you mean "can I specify a sequence on the command line?" Yes!!!! You need the "asis" special format.
matcher -sequencea 'asis::ATGCGA' -sequenceb 'asis::ATCTAGATATGCGA' (assuming your shell allows the command line to be long enough for your sequences :-) Hope this helps Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From david.vilanova at rdls.nestle.com Tue Nov 26 09:12:19 2002 From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS) Date: Tue, 26 Nov 2002 15:12:19 +0100 Subject: Matcher Message-ID: <89466355CEFE7244AC3A013E45641C180144ECE0@lsmail2.crn.nestrd.ch> Thanks Peter, Sorry for the mistake. I'm writing a bioperl script which automatically runs an emboss application. I could have generated a new file for each sequence I read, but it looks pretty nice like that. Regards, David

#! /usr/bin/perl -w

use Bio::Factory::EMBOSS;
use Bio::SeqIO;

die "Usage: perl script.pl [seqfileA] [seqfileB] [outfile]\n" unless @ARGV == 3;

#Read input files
($seqfileA,$seqfileB,$outfile) = @ARGV;

#Initialize Object
$EMBOSS = new Bio::Factory::EMBOSS;

#Define emboss program to run
$application = $EMBOSS->program('matcher');

#Manipulate SeqfileA file
$seqA = new Bio::SeqIO (-file => $seqfileA,
                        -format => 'fasta');

while ($seqinA = $seqA->next_seq){
    $inseqA = "asis::".$seqinA->seq;
    $seqidA = $seqinA->id;
    #$seqoutA->write_seq($inseqA);

    print "####$seqidA\n";
    #Initialize seqB at every iteration of SeqA
    $seqB = new Bio::SeqIO (-file => $seqfileB,
                            -format => 'fasta');

    while ($seqinB = $seqB->next_seq){
        $inseqB = "asis::".$seqinB->seq; #Format like asis::ATGCGA (required for emboss)
        $seqidB = $seqinB->id;
        #$seqoutB->write_seq($inseqB);
        #print "####$inseqA\n";
        print "Processing sequence $seqidA..vs..$seqidB...";

        #Define program parameters and run...
        $application->run({ -sequencea => $inseqA,
                            -sequenceb => $inseqB,
                            -outfile => $outfile });
        print "done\n";
        ....
        Manipulate alignments.....
        ....
    }
}

-----Original Message----- From: Peter Rice [mailto:peter.rice at uk.lionbioscience.com] Sent: mardi, 26. novembre 2002 14:54 To: Vilanova,David,LAUSANNE,NRC/BS Cc: 'emboss at embnet.org' Subject: Re: Matcher Vilanova,David,LAUSANNE,NRC/BS wrote: > Dear all, > I was wondering if matcher program accepts a sequence via stdin. > > the following exemple doesn't work for me. > > matcher -sequencea 'ATGCGA' -sequenceb 'ATCTAGATATGCGA' > >>cannot open ATGCGA file for read. > > > Is there anyway to submit a sequence via stdin ??? You don't mean stdin (that can only read one sequence anyway) ... you mean "can I specify a sequence on the command line?" Yes!!!! You need the "asis" special format. matcher -sequencea 'asis::ATGCGA' -sequenceb 'asis::ATCTAGATATGCGA' (assuming your shell allows the command line to be long enough for your sequences :-) Hope this helps Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From jason at cgt.mc.duke.edu Tue Nov 26 09:54:19 2002 From: jason at cgt.mc.duke.edu (Jason Stajich) Date: Tue, 26 Nov 2002 09:54:19 -0500 (EST) Subject: Matcher In-Reply-To: <89466355CEFE7244AC3A013E45641C180144ECE0@lsmail2.crn.nestrd.ch> References: <89466355CEFE7244AC3A013E45641C180144ECE0@lsmail2.crn.nestrd.ch> Message-ID: Bioperl will also do the behind-the-scenes work of creating the tempfile and cleaning it up for you if you just pass in a Bio::PrimarySeqI object. It detects if you pass in an object or a string and proceeds accordingly. Jason Stajich Duke University jason at cgt.mc.duke.edu On Tue, 26 Nov 2002, Vilanova,David,LAUSANNE,NRC/BS wrote: > Thanks Peter, > Sorry for the mistake. > I'm writing a bioperl script which automatically runs an emboss aplication. > I could have worked by generating foreach sequence I read a new file but it > looks pretty nice like that. > > Regards, > David > > > #!
/usr/bin/perl -w > > use Bio::Factory::EMBOSS; > use Bio::SeqIO; > > die "Usage: perl script.pl [seqfileA] [seqfileB] [outfile]\n" unless @ARGV > eq '3'; > > #Read input files > ($seqfileA,$seqfileB,$outfile) = @ARGV; > > #Initialize Object > $EMBOSS = new Bio::Factory::EMBOSS; > > #Define emboss program to run > $application = $EMBOSS->program('matcher'); > > #Manipulate SeqfileA file > $seqA = new Bio::SeqIO (-file => $seqfileA, > -format => 'fasta'); > > > while ($seqinA = $seqA->next_seq){ > $inseqA = "asis::".$seqinA->seq; > $seqidA = $seqinA->id; > #$seqoutA->write_seq($inseqA); > > print "####$seqidA\n"; > #Initialize seqB at every iteration of SeqA > $seqB = new Bio::SeqIO (-file => $seqfileB, > -format => 'fasta'); > > while ($seqinB = $seqB->next_seq){ > $inseqB = "asis::".$seqinB->seq; #Format like asis::ATGCGA (required > for emboss) > $seqidB = $seqinB->id; > #$seqoutB->write_seq($inseqB); > #print "####$inseqA\n"; > print "Processing sequence $seqidA..vs..$seqidB..."; > > > #Define program parameters and run... > $application->run({ > -sequencea => $inseqA, > -sequenceb => $inseqB, > -outfile => $outfile }); > print "done\n"; > .... > Manipulate alignments..... > .... > } > > } > > > > > > -----Original Message----- > From: Peter Rice [mailto:peter.rice at uk.lionbioscience.com] > Sent: mardi, 26. novembre 2002 14:54 > To: Vilanova,David,LAUSANNE,NRC/BS > Cc: 'emboss at embnet.org' > Subject: Re: Matcher > > > Vilanova,David,LAUSANNE,NRC/BS wrote: > > Dear all, > > I was wondering if matcher program accepts a sequence via stdin. > > > > the following exemple doesn't work for me. > > > > matcher -sequencea 'ATGCGA' -sequenceb 'ATCTAGATATGCGA' > > > >>cannot open ATGCGA file for read. > > > > > > Is there anyway to submit a sequence via stdin ??? > > You don't mean stdin (that can only read one sequence anyway) ... you mean > "can I specify a sequence on the command line?" > > Yes!!!! You need the "asis" special format. 
> > matcher -sequencea 'asis::ATGCGA' -sequenceb 'asis::ATCTAGATATGCGA' > > (assuming your shell allows the command line to be long enough for your > sequences :-) > > Hope this helps > > Peter > > -- > ------------------------------------------------ > Peter Rice, LION Bioscience Ltd, Cambridge, UK > peter.rice at uk.lionbioscience.com +44 1223 224723 > From david.vilanova at rdls.nestle.com Tue Nov 26 10:58:32 2002 From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS) Date: Tue, 26 Nov 2002 16:58:32 +0100 Subject: Bioperl and matcher Message-ID: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch> Hello, I have problems retrieving the alignments from an emboss output. The program belows read 2 files and runs a matcher of all against all. Matcher gives me an msf output and then I try to parse this alignment with Bio::AlignIO. However I get an exception... Processing sequence 1..vs..3...done ------------- EXCEPTION ------------- MSG: 1 exists as an alignment line but not in the header. Not confident of what is going on! STACK Bio::AlignIO::msf::next_aln /usr/local/lib/perl5/site_perl/5.8.0/Bio/AlignIO/msf.pm:106 STACK toplevel Run_Emboss.pl:50 -------------------------------------- Here is the output from matcher: !!NA_MULTIPLE_ALIGNMENT 1.0 out MSF: 5 Type: N 26/11/02 CompCheck: 2090 .. Name: EMBOSS_001 Len: 5 Check: 1045 Weight: 1.00 Name: EMBOSS_002 Len: 5 Check: 1045 Weight: 1.00 // 1 5 EMBOSS_001 CGGCG EMBOSS_002 CGGCG ########################################################### It doesn't work for fasta format as well in my script (see output below): Processing sequence 1..vs..3...done Use of uninitialized value in sprintf at /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 257, line 4. Use of uninitialized value in hash element at /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268, line 4. 
Use of uninitialized value in hash element at /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268, line 4.
Use of uninitialized value in hash element at /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 270, line 4.

#########################

#Script
#! /usr/bin/perl -w

use Bio::Factory::EMBOSS;
use Bio::SeqIO;
use Bio::AlignIO;

die "Usage: perl script.pl [seqfileA] [seqfileB] [outfile]\n" unless @ARGV == 3;

#Read input files
($seqfileA,$seqfileB,$outfile) = @ARGV;

#Initialize Object
$EMBOSS = new Bio::Factory::EMBOSS;

#Define emboss program to run
$application = $EMBOSS->program('matcher');

#Manipulate SeqfileA file
$seqA = new Bio::SeqIO (-file => $seqfileA,
                        -format => 'fasta');

while ($seqinA = $seqA->next_seq){
    $inseqA = "asis::".$seqinA->seq;
    $seqidA = $seqinA->id;

    print "####$seqidA\n";
    #Initialize seqB at every iteration of SeqA
    $seqB = new Bio::SeqIO (-file => $seqfileB,
                            -format => 'fasta');

    while ($seqinB = $seqB->next_seq){
        $inseqB = "asis::".$seqinB->seq; #Format like asis::ATGCGA (required for emboss)
        $seqidB = $seqinB->id;

        print "Processing sequence $seqidA..vs..$seqidB...";

        #Define program parameters and run...
        $application->run({ -sequencea => $inseqA,
                            -sequenceb => $inseqB,
                            -aformat => 'msf',
                            -outfile => $outfile });
        print "done\n";

        $alnin = new Bio::AlignIO(-format => 'msf',
                                  -file => $outfile );

        while ($aln = $alnin->next_aln){
            print $aln->no_residues,"\n";
            #print $aln->consensus_string,"\n";
        }
    }
}

From jason at cgt.mc.duke.edu Tue Nov 26 11:05:22 2002 From: jason at cgt.mc.duke.edu (Jason Stajich) Date: Tue, 26 Nov 2002 11:05:22 -0500 (EST) Subject: Bioperl and matcher In-Reply-To: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch> References: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch> Message-ID: Our msf parser is seeing something it isn't expecting - not sure why - what happens when you just use the straight 'emboss' parser with standard emboss alignment output which is the route that has been most heavily tested? -jason Jason Stajich Duke University jason at cgt.mc.duke.edu On Tue, 26 Nov 2002, Vilanova,David,LAUSANNE,NRC/BS wrote: > > Hello, > I have problems retrieving the alignments from an emboss output. > The program belows read 2 files and runs a matcher of all against all. > Matcher gives me an msf output and then I try to parse this alignment with > Bio::AlignIO. > However I get an exception... > > Processing sequence 1..vs..3...done > > ------------- EXCEPTION ------------- > MSG: 1 exists as an alignment line but not in the header. Not confident of > what is going on! > STACK Bio::AlignIO::msf::next_aln > /usr/local/lib/perl5/site_perl/5.8.0/Bio/AlignIO/msf.pm:106 > STACK toplevel Run_Emboss.pl:50 > > -------------------------------------- > > Here is the output from matcher: > !!NA_MULTIPLE_ALIGNMENT 1.0 > > out MSF: 5 Type: N 26/11/02 CompCheck: 2090 ..
> > Name: EMBOSS_001 Len: 5 Check: 1045 Weight: 1.00 > Name: EMBOSS_002 Len: 5 Check: 1045 Weight: 1.00 > > // > > 1 5 > EMBOSS_001 CGGCG > EMBOSS_002 CGGCG > > > ########################################################### > It doesn't work for fasta format as well in my script (see output below): > Processing sequence 1..vs..3...done > Use of uninitialized value in sprintf at > /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 257, > line 4. > Use of uninitialized value in hash element at > /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268, > line 4. > Use of uninitialized value in hash element at > /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268, > line 4. > Use of uninitialized value in hash element at > /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 270, > line 4. > > ######################### > > > #Script > #! /usr/bin/perl -w > > use Bio::Factory::EMBOSS; > use Bio::SeqIO; > use Bio::AlignIO; > > die "Usage: perl script.pl [seqfileA] [seqfileB] [outfile]\n" unless @ARGV > eq '3'; > > #Read input files > ($seqfileA,$seqfileB,$outfile) = @ARGV; > > #Initialize Object > $EMBOSS = new Bio::Factory::EMBOSS; > > #Define emboss program to run > $application = $EMBOSS->program('matcher'); > > #Manipulate SeqfileA file > $seqA = new Bio::SeqIO (-file => $seqfileA, > -format => 'fasta'); > > > while ($seqinA = $seqA->next_seq){ > $inseqA = "asis::".$seqinA->seq; > $seqidA = $seqinA->id; > > > print "####$seqidA\n"; > #Initialize seqB at every iteration of SeqA > $seqB = new Bio::SeqIO (-file => $seqfileB, > -format => 'fasta'); > > while ($seqinB = $seqB->next_seq){ > $inseqB = "asis::".$seqinB->seq; #Format like asis::ATGCGA (required for > emboss) > $seqidB = $seqinB->id; > > print "Processing sequence $seqidA..vs..$seqidB..."; > > #Define program parameters and run... 
> $application->run({ > -sequencea => $inseqA, > -sequenceb => $inseqB, > -aformat => 'msf', > -outfile => $outfile }); > print "done\n"; > > $alnin = new Bio::AlignIO(-format => 'msf', > -file => $outfile ); > > while ($aln = $alnin->next_aln){ > print $aln->no_residues,"\n"; > #print $aln->consensus_string,"\n"; > > } > } > } > > > > > > > > > From peter.rice at uk.lionbioscience.com Tue Nov 26 11:12:46 2002 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Tue, 26 Nov 2002 16:12:46 +0000 Subject: Bioperl and matcher References: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch> Message-ID: <3DE39D7E.9080403@uk.lionbioscience.com> Vilanova,David,LAUSANNE,NRC/BS wrote: > > Hello, > I have problems retrieving the alignments from an emboss output. > The program belows read 2 files and runs a matcher of all against all. > Matcher gives me an msf output and then I try to parse this alignment with > Bio::AlignIO. > However I get an exception... > > Processing sequence 1..vs..3...done > > ------------- EXCEPTION ------------- > MSG: 1 exists as an alignment line but not in the header. Not confident of > what is going on! BioPerl seems to be having trouble with the EMBOSS MSF format output. It could be something about the naming of the sequences? EMBOSS is making up names for your sequences. I assume you are using asis::CGGCG to pass them to matcher. You can put -sid after each sequence to give them names, for example: matcher -out x.x -af msf asis:ccggc -sid cg asis::cgggc -sid gg (-sid, like -aformat, is an associated qualifier. 
It must follow the asis:: sequence because it is positional (putting it first on the command line for example would refer to all sequences - fine for -sformat but not a good idea for -sid :-) Hope this helps Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From david.vilanova at rdls.nestle.com Tue Nov 26 11:14:19 2002 From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS) Date: Tue, 26 Nov 2002 17:14:19 +0100 Subject: Bioperl and matcher Message-ID: <89466355CEFE7244AC3A013E45641C180144ECE8@lsmail2.crn.nestrd.ch> Ok,I use: $alnin = new Bio::AlignIO(-format =>'emboss', -file => $outfile ); while ($aln = $alnin->next_aln){ print $aln->no_residues,"\n"; } I don't specify any format to emboss so I get the standard alignment. In this case It doesn't work, it never enters this loop... but the program doesn't crash. It does all the alignements, store the aln in outfile but seems not to read it..!! bizarre ??? David -----Original Message----- From: Jason Stajich [mailto:jason at cgt.mc.duke.edu] Sent: mardi, 26. novembre 2002 17:05 To: Vilanova,David,LAUSANNE,NRC/BS Cc: 'bioperl-l at bioperl.org'; 'emboss at embnet.org' Subject: Re: Bioperl and matcher Our msf parser is seeing something it isn't expecting - not sure why - what happens when you just use the straight 'emboss' parser with standard emboss alignment output which is the route that has been most heavily tested? -jason Jason Stajich Duke University jason at cgt.mc.duke.edu On Tue, 26 Nov 2002, Vilanova,David,LAUSANNE,NRC/BS wrote: > > Hello, > I have problems retrieving the alignments from an emboss output. > The program belows read 2 files and runs a matcher of all against all. > Matcher gives me an msf output and then I try to parse this alignment with > Bio::AlignIO. > However I get an exception... 
> > Processing sequence 1..vs..3...done > > ------------- EXCEPTION ------------- > MSG: 1 exists as an alignment line but not in the header. Not confident of > what is going on! > STACK Bio::AlignIO::msf::next_aln > /usr/local/lib/perl5/site_perl/5.8.0/Bio/AlignIO/msf.pm:106 > STACK toplevel Run_Emboss.pl:50 > > -------------------------------------- > > Here is the output from matcher: > !!NA_MULTIPLE_ALIGNMENT 1.0 > > out MSF: 5 Type: N 26/11/02 CompCheck: 2090 .. > > Name: EMBOSS_001 Len: 5 Check: 1045 Weight: 1.00 > Name: EMBOSS_002 Len: 5 Check: 1045 Weight: 1.00 > > // > > 1 5 > EMBOSS_001 CGGCG > EMBOSS_002 CGGCG > > > ########################################################### > It doesn't work for fasta format as well in my script (see output below): > Processing sequence 1..vs..3...done > Use of uninitialized value in sprintf at > /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 257, > line 4. > Use of uninitialized value in hash element at > /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268, > line 4. > Use of uninitialized value in hash element at > /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268, > line 4. > Use of uninitialized value in hash element at > /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 270, > line 4. > > ######################### > > > #Script > #! 
/usr/bin/perl -w > > use Bio::Factory::EMBOSS; > use Bio::SeqIO; > use Bio::AlignIO; > > die "Usage: perl script.pl [seqfileA] [seqfileB] [outfile]\n" unless @ARGV > eq '3'; > > #Read input files > ($seqfileA,$seqfileB,$outfile) = @ARGV; > > #Initialize Object > $EMBOSS = new Bio::Factory::EMBOSS; > > #Define emboss program to run > $application = $EMBOSS->program('matcher'); > > #Manipulate SeqfileA file > $seqA = new Bio::SeqIO (-file => $seqfileA, > -format => 'fasta'); > > > while ($seqinA = $seqA->next_seq){ > $inseqA = "asis::".$seqinA->seq; > $seqidA = $seqinA->id; > > > print "####$seqidA\n"; > #Initialize seqB at every iteration of SeqA > $seqB = new Bio::SeqIO (-file => $seqfileB, > -format => 'fasta'); > > while ($seqinB = $seqB->next_seq){ > $inseqB = "asis::".$seqinB->seq; #Format like asis::ATGCGA (required for > emboss) > $seqidB = $seqinB->id; > > print "Processing sequence $seqidA..vs..$seqidB..."; > > #Define program parameters and run... > $application->run({ > -sequencea => $inseqA, > -sequenceb => $inseqB, > -aformat => 'msf', > -outfile => $outfile }); > print "done\n"; > > $alnin = new Bio::AlignIO(-format => 'msf', > -file => $outfile ); > > while ($aln = $alnin->next_aln){ > print $aln->no_residues,"\n"; > #print $aln->consensus_string,"\n"; > > } > } > } > > > > > > > > > From david.vilanova at rdls.nestle.com Tue Nov 26 11:33:56 2002 From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS) Date: Tue, 26 Nov 2002 17:33:56 +0100 Subject: Bioperl and matcher Message-ID: <89466355CEFE7244AC3A013E45641C180144ECEC@lsmail2.crn.nestrd.ch> I tried that but it still doesn't fix the problem... -----Original Message----- From: Peter Rice [mailto:peter.rice at uk.lionbioscience.com] Sent: mardi, 26. 
novembre 2002 17:13 To: Vilanova,David,LAUSANNE,NRC/BS Cc: 'bioperl-l at bioperl.org'; 'emboss at embnet.org' Subject: Re: Bioperl and matcher Vilanova,David,LAUSANNE,NRC/BS wrote: > > Hello, > I have problems retrieving the alignments from an emboss output. > The program belows read 2 files and runs a matcher of all against all. > Matcher gives me an msf output and then I try to parse this alignment with > Bio::AlignIO. > However I get an exception... > > Processing sequence 1..vs..3...done > > ------------- EXCEPTION ------------- > MSG: 1 exists as an alignment line but not in the header. Not confident of > what is going on! BioPerl seems to be having trouble with the EMBOSS MSF format output. It could be something about the naming of the sequences? EMBOSS is making up names for your sequences. I assume you are using asis::CGGCG to pass them to matcher. You can put -sid after each sequence to give them names, for example: matcher -out x.x -af msf asis:ccggc -sid cg asis::cgggc -sid gg (-sid, like -aformat, is an associated qualifier. It must follow the asis:: sequence because it is positional (putting it first on the command line for example would refer to all sequences - fine for -sformat but not a good idea for -sid :-) Hope this helps Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From vz_silvana at verizon-uweb.com Wed Nov 27 15:30:03 2002 From: vz_silvana at verizon-uweb.com (Silvana Paredes) Date: Wed, 27 Nov 2002 15:30:03 -0500 Subject: Inquire about login jemboss Message-ID: <200211272030.PAA18916@www22.ureach.com> To whom it may concern: I downloaded the jemboss software, but when I try to use it, it asks me for a login and a password, and I can't find a way to set up an account or to use EMBOSS without logging in. I would appreciate it if you could give me instructions on how to start using it or create an account.
Thank you so much,
Best regards,
Silvana Paredes

From pageauma at ESI.UMontreal.CA Fri Nov 1 20:01:07 2002
From: pageauma at ESI.UMontreal.CA (Marie PAGEAU)
Date: Fri, 1 Nov 2002 15:01:07 -0500
Subject: test sequence
Message-ID:

Dear colleagues,

A lady, professor of biochemistry at the Universite de Montreal, sent me
the following request. Would you please be nice enough to help us? Your help
would be highly appreciated.

Best regards,
Marie Pageau

-----------------------------------------------------------
From: Muriel Aubry
Sent: 30 October 2002 14:47
Subject: test sequence

Hi,

I am presently using the restrict and showseq programs from EMBOSS. I have
noticed that some very usual enzymes are not detected by the program such as
XhoI and PstI and a few others when the complete list of enzymes is used.

I have here below a test sequence that should contain XhoI, EcoRI, PstI,
EcoRV, HindIII, KpnI, SacII, ApaI, SmaI, BamHI and XbaI. XhoI, PstI and
EcoRV are not detected by restrict and showseq in the test sequence shown
below. Is there a problem with the restriction enzyme list?

Test Sequence:
gagcagggggatctcggcgagctctcgagaattctcacgcgtctgcaggatatcaagcttgcggtaccgcgg
gcccggg

From ableasby at hgmp.mrc.ac.uk Fri Nov 1 20:21:29 2002
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Fri, 1 Nov 2002 20:21:29 GMT
Subject: test sequence
Message-ID: <200211012021.UAA12495@bromine.hgmp.mrc.ac.uk>

There is probably not a problem. EMBOSS only reports one isoschizomer
for cases where several REs have the same cut site. If the -preferred
switch is given to these programs then the more easily available of
the isoschizomers will be reported. This is controlled by the file:
embossre.equ
where, for each RE, you can specify which isoschizomer should be reported.

So, first try adding -preferred.
If you just want to search for a particular set of enzymes they can
be given as a comma-separated list using the -enzymes qualifier
e.g. -enzymes "ecori,bamhi"

HTH

Alan Bleasby
HGMP

PS: NEB supply an emboss-format set of files which are just the
most common REs. You can rename them (e.g. to embossre.enz/ref/sup)
and overwrite your current set in the emboss REBASE directory.

From David.Bauer at SCHERING.DE Mon Nov 4 08:35:51 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Mon, 4 Nov 2002 09:35:51 +0100
Subject: test sequence
Message-ID:

Hi Alan,

I wonder how the "preferred" list is created. Restrict finds the site
"CTGCAG" as recognition site of BstMAI. This is a rather exotic enzyme,
available only from one single company, which I didn't know before.
I apologize for my ignorance if this is a common supplier in the UK ;-)

On the other hand, PstI is available from about 20 suppliers and this is
also the enzyme name used in various catalogue pictures of multiple cloning
sites in vectors (puc19 polylinker etc.)

So I would suggest adding the BstMAI -> PstI mapping to the distribution
version of embossre.equ.

David.

There is probably not a problem. EMBOSS only reports one isoschizomer
for cases where several REs have the same cut site. If the -preferred
switch is given to these programs then the more easily available of
the isoschizomers will be reported. This is controlled by the file:
embossre.equ
where, for each RE, you can specify which isoschizomer should be reported.

So, first try adding -preferred.

If you just want to search for a particular set of enzymes they can
be given as a comma-separated list using the -enzymes qualifier
e.g. -enzymes "ecori,bamhi"

HTH

Alan Bleasby
HGMP

PS: NEB supply an emboss-format set of files which are just the
most common REs. You can rename them (e.g. to embossre.enz/ref/sup)
and overwrite your current set in the emboss REBASE directory.
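
As a cross-check on the test sequence posted at the start of this thread, here is a minimal Python sketch that scans it for the standard recognition sites of the enzymes named in the original question. It is independent of EMBOSS and does not read embossre.equ or any REBASE file; the SITES table and the find_sites helper are illustrative names, not part of any package. Only exact matches on the given strand are reported, which is enough here because all eleven sites are palindromic.

```python
# Recognition sites (5'->3') for the enzymes named in the original question.
SITES = {
    "XhoI": "CTCGAG",
    "EcoRI": "GAATTC",
    "PstI": "CTGCAG",
    "EcoRV": "GATATC",
    "HindIII": "AAGCTT",
    "KpnI": "GGTACC",
    "SacII": "CCGCGG",
    "ApaI": "GGGCCC",
    "SmaI": "CCCGGG",
    "BamHI": "GGATCC",
    "XbaI": "TCTAGA",
}

# Test sequence exactly as posted (two lines joined).
TEST_SEQUENCE = (
    "gagcagggggatctcggcgagctctcgagaattctcacgcgtctgcaggatatcaagcttgcggtaccgcgg"
    "gcccggg"
)


def find_sites(sequence, sites=SITES):
    """Return {enzyme: [1-based start positions]} for exact site matches."""
    seq = sequence.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)        # report 1-based coordinates
            start = seq.find(site, start + 1)  # allow overlapping matches
        hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for enzyme, positions in find_sites(TEST_SEQUENCE).items():
        shown = positions if positions else "not found"
        print(f"{enzyme:8s} {SITES[enzyme]}  {shown}")
```

On the sequence as it appears in this archive, the sketch does find the XhoI, PstI and EcoRV sites, which fits Alan's explanation that restrict reported an isoschizomer name rather than missing the site. It finds no exact GGATCC (BamHI) or TCTAGA (XbaI) match in the string as archived here.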
From gbottu at ben.vub.ac.be Mon Nov 4 10:05:58 2002 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Mon, 4 Nov 2002 11:05:58 +0100 (CET) Subject: Remote getz from emboss Message-ID: <200211041005.LAA1502643@black.vub.ac.be> from : BEN At the BEN site we do have a Perl script that reproduces more or less the functionality of the GCG program lookup. It can access a local (with getz) or a remote (with rsh getz) SRS server (simple outcomment inside the script, was because we once had our SRS server on a different computer). It can be run interactively at the command line or put behind an EMBOSS wrapper program and thus behind Staden or the EMBOSS WWW interfaces. Is this what you are looking for ? Guy Bottu From duhaimj at ircm.qc.ca Tue Nov 5 21:34:57 2002 From: duhaimj at ircm.qc.ca (Johanne Duhaime) Date: Tue, 05 Nov 2002 16:34:57 -0500 Subject: MSE will not save on Exit Message-ID: <3DC83981.625D3E83@ircm.qc.ca> Hello I am trying to use MSE (MSE -0.04.tar.gz just installed) but I cannot save with the exit command. After I modified a sequence, when I type Exit on the command line I have: Sequences modified do you wish to continue exiting [N] Saying Y or N will not save anything. For now I have to use "write". Any idea of the problem? -- Johanne Duhaime IRCM 110 Ave des Pins O Montreal, Quebec 987-5556 (tel) 987-5644 (fax) Johanne_Duhaime at ircm.qc.ca http://www.ircm.qc.ca From w2hgcg at netscape.net Wed Nov 6 03:01:47 2002 From: w2hgcg at netscape.net (w2hgcg at netscape.net) Date: Tue, 05 Nov 2002 22:01:47 -0500 Subject: epitope search Message-ID: <56BE5F2D.194AD14A.000665E2@netscape.net> I know this is not the place but perhaps... I am working with LSA-3, I have been able to produce some antibodys in rabits, when I run Inmunoblot the antiboys recognize specific proteic bands (bandas, do not know the right english word), how can I search for epitopes in the pfalciparum against my inmunogenos? sorry for my english... 
Lucia Goncalvez

From ray at leicester.ac.uk Wed Nov 6 13:24:38 2002
From: ray at leicester.ac.uk (Dalgleish, Dr R.)
Date: Wed, 6 Nov 2002 13:24:38 -0000
Subject: Suggestion for new EMBOSS program
Message-ID:

I find GCG framealign very useful to align
a protein with its DNA sequence. Could
somebody find the time to write an EMBOSS
equivalent?

Thanks,

Raymond Dalgleish
Genetics
Leicester

From peter.rice at uk.lionbioscience.com Wed Nov 6 13:32:13 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 06 Nov 2002 13:32:13 +0000
Subject: Suggestion for new EMBOSS program
References:
Message-ID: <3DC919DD.5020306@uk.lionbioscience.com>

Dalgleish, Dr R. wrote:
> I find GCG framealign very useful to align
> a protein with its DNA sequence. Could
> somebody find the time to write an EMBOSS
> equivalent?

Sounds rather like genewise in the (free) Wise2 package
http://www.sanger.ac.uk/Software/Wise2/

You can try it at
http://www.sanger.ac.uk/Software/Wise2/genewiseform.shtml

Can you be more specific about whether this is what you need,
and what you want in EMBOSS?

regards,

Peter

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From heme at postmark.net Thu Nov 7 12:05:55 2002
From: heme at postmark.net (Per Johansson)
Date: Thu, 07 Nov 2002 12:05:55 +0000
Subject: Suggestion for new use for EMBOSS program
Message-ID: <20021107120555.8757.qmail@venus.postmark.net>

I have it difficult to find a replacement for the GCG program
FINDPATTERNS. The EMBOSS program fuzznuc cannot use a database of
patterns (primers).
Other alignment programs in EMBOSS like supermatcher are useful but, among other things, you can't choose mismatch settings. The best replacement I've found is the EMBOSS program tfscan! Tfscan uses a database of patterns, but you can't reverse the patterns (you have to put in copies of forward and reverse primer sequences in the database). The tfscan algorithm is ideal (and is much faster than find patterns) but obviously a few minor changes to the input and output would be required if it were used in a replacement program. Obviously, I could write a script to wrap tfscan but I'd like to avoid this. A new program with this functionality would be beneficial for the EMBOSS package. Per From charles at moulinette.dyndns.org Thu Nov 7 13:20:42 2002 From: charles at moulinette.dyndns.org (Charles Plessy) Date: Thu, 7 Nov 2002 14:20:42 +0100 Subject: Suggestion for new use for EMBOSS program In-Reply-To: <20021107120555.8757.qmail@venus.postmark.net> References: <20021107120555.8757.qmail@venus.postmark.net> Message-ID: <20021107132042.GA9854@moulinette.dyndns.org> > The best replacement I've found is the EMBOSS program tfscan! > Obviously, I could write a script to wrap tfscan but I'd like to > avoid this. A new program with this functionality would be beneficial > for the EMBOSS package. An alternative would be to write a script that builds a transfac-format database from a flatfile containing names and corresponding consensus (This would also allow to migrate the pattern.dat file of GCG). Charles From heme at postmark.net Thu Nov 7 15:11:43 2002 From: heme at postmark.net (Per Johansson) Date: Thu, 07 Nov 2002 15:11:43 +0000 Subject: Fwd: Re: Suggestion for new use for EMBOSS program Message-ID: <20021107151143.25950.qmail@venus.postmark.net> It's OK to reformat the GCG pattern file to a transfac-format database and use tfscan. But I still miss some functions in FINDPATTERNS. 
I can't search the reverse primer strand; the output is limited to ONE
format, with no alignment format; and tfscan doesn't accept wobble
(ambiguous) bases in primers (e.g. K = G or T, V = G or C or A, i.e. not T ...).

But otherwise the tfscan algorithm is a very nice and fast word-matching
algorithm, and it COULD be used for other purposes as well!

Per

--- Forwarded Message ---
To: EMBOSS
From: Charles Plessy
Reply-To: c.plessy at mangoosta.net
Subject: Re: Suggestion for new use for EMBOSS program
Date: Thu, 7 Nov 2002 14:20:42 +0100

> The best replacement I've found is the EMBOSS program tfscan!
> Obviously, I could write a script to wrap tfscan but I'd like to
> avoid this. A new program with this functionality would be beneficial
> for the EMBOSS package.

An alternative would be to write a script that builds a transfac-format
database from a flatfile containing names and corresponding consensus
(This would also allow to migrate the pattern.dat file of GCG).

Charles

From heme at postmark.net Fri Nov 8 14:28:53 2002
From: heme at postmark.net (Per Johansson)
Date: Fri, 08 Nov 2002 14:28:53 +0000
Subject: EMBOSS default program settings
Message-ID: <20021108142853.4941.qmail@venus.postmark.net>

I have problems with EMBOSS default program settings in the
emboss.defaults file.

set emboss_stdout 1     Works fine, output goes to stdout

set emboss_verbose 1    Doesn't work

set emboss_format embl  The programs still output fasta format by
default! And the ONLY sequence format the EMBOSS programs accept as
input format is embl!

It doesn't work as it should.

Per

From peter.rice at uk.lionbioscience.com Fri Nov 8 15:52:05 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 08 Nov 2002 15:52:05 +0000
Subject: EMBOSS default program settings
References: <20021108142853.4941.qmail@venus.postmark.net>
Message-ID: <3DCBDDA5.3070106@uk.lionbioscience.com>

Per Johansson wrote:
> I have problems with EMBOSS default program settings in the
> emboss.defaults file.
> > set emboss_stdout 1 Works fine, output goes to stdout > > set emboss_verbose 1 Dosen't work Because help is generated as soon as the -help option is tested. Changed in the next release to set -verbose before -help. > set emboss_format embl The programs still outputs fasta format by > default! And the ONLY sequence format the EMBOSS programs accepts as > input format is embl! > > It dosen't work as it should. Well .... emboss_format sets the default *input* format. You can still say fasta::filename to read fasta format The output format is specified as emboss_outformat EMBOSS will read all input formats if you only set emboss_outformat I think you really mean to say: set emboss_outformat embl Hope this helps Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From heme at postmark.net Mon Nov 11 06:32:50 2002 From: heme at postmark.net (Per Johansson) Date: Mon, 11 Nov 2002 06:32:50 +0000 Subject: Fwd: Re: EMBOSS default program settings Message-ID: <20021111063250.19730.qmail@venus.postmark.net> Thank you Peter, I DO mean set emboss_outformat embl (but I can't find emboss_outformat in the documentation). When I use set emboss_outformat embl in the emboss.default file I don't see any effect at all, the default output format is still fasta, I use emboss-2.4.1 . But I assume this is not version dependent. Per Per Johansson heme at postmark.net --- Forwarded Message --- To: Per Johansson Cc: EMBOSS From: Peter Rice Subject: Re: EMBOSS default program settings Date: Fri, 08 Nov 2002 15:52:05 +0000 Per Johansson wrote: > I have problems with EMBOSS default program settings in the > emboss.defaults file. > > set emboss_stdout 1 Works fine, output goes to stdout > > set emboss_verbose 1 Dosen't work Because help is generated as soon as the -help option is tested. Changed in the next release to set -verbose before -help. 
> set emboss_format embl The programs still outputs fasta format by > default! And the ONLY sequence format the EMBOSS programs accepts as > input format is embl! > > It dosen't work as it should. Well .... emboss_format sets the default *input* format. You can still say fasta::filename to read fasta format The output format is specified as emboss_outformat EMBOSS will read all input formats if you only set emboss_outformat I think you really mean to say: set emboss_outformat embl Hope this helps Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From gwilliam at hgmp.mrc.ac.uk Mon Nov 11 09:20:01 2002 From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522) Date: Mon, 11 Nov 2002 09:20:01 +0000 Subject: Fwd: Re: EMBOSS default program settings References: <20021111063250.19730.qmail@venus.postmark.net> Message-ID: <3DCF7641.E2ED9B73@hgmp.mrc.ac.uk> Per Johansson wrote: > > Thank you Peter, > > I DO mean > > set emboss_outformat embl (but I can't find emboss_outformat in the > documentation). It is documented in: http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Usa/databases.html#global -- Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512 mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/ Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK From heme at postmark.net Mon Nov 11 13:12:50 2002 From: heme at postmark.net (Per Johansson) Date: Mon, 11 Nov 2002 13:12:50 +0000 Subject: Fwd: Re: EMBOSS default program settings Message-ID: <20021111131250.4618.qmail@www2.postmark.net> Tnak you, That solves the problem, ALWAYS use the latest version! Per Gary Williams, Tel 01223 494522 wrote: > Per Johansson wrote: > > > > Thank you Peter, > > > > I DO mean > > > > set emboss_outformat embl (but I can't find emboss_outformat in the > > documentation). 
> > It is documented in: > http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Usa/databases.html#global > > -- > Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512 > mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/ > Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK From sebastian.bassi at ar.advantaseeds.com Mon Nov 11 12:10:54 2002 From: sebastian.bassi at ar.advantaseeds.com (Sebastian Bassi) Date: Mon, 11 Nov 2002 13:10:54 +0100 Subject: Problem with EMBOSS GUI Message-ID: Hi, I've just installed the EMBOSS GUI on http://genes.unq.edu.ar/EMBOSS (this should look like this http://bioinfo.pbi.nrc.ca:8090/EMBOSS/) The problem as you can see on the webpage is the missing programs on the left column (it should appear there all the EMBOSS programs). I think this should be a path problem. For you to help me evaluate it, I attach two files: embossdir.txt, a capture of the ls -Ra from my emboss inst. directory (/opt/emboss). emboss.pl, the emboss.pl file (for you to see if the path are right). The emboss.zip file contains both files and I made it because sometimes attached text get corrupted by some mailers. I hope you can help me. Note: The EMBOSS were compiled using this: configure --prefix=/opt/emboss --without-x --x-includes="" --x-libraries="" The "without x" part is because it was comiled on a RH web server without X. The emboss works fine, the problem is this GUI. Sebastian Bassi. Advanta Seeds. Balcarce Research Station. -------------- next part -------------- A non-text attachment was scrubbed... Name: emboss.zip Type: application/x-zip-compressed Size: 15092 bytes Desc: emboss.zip URL: From mad at biol.unlp.edu.ar Mon Nov 11 19:18:00 2002 From: mad at biol.unlp.edu.ar (=?ISO-8859-1?Q?Mart=EDn_Sarachu?=) Date: Mon, 11 Nov 2002 16:18:00 -0300 Subject: tfextract not indexing? Message-ID: <3DD00268.6090206@biol.unlp.edu.ar> Hi, tfextract is apparently running ok, but it's output are files are empty. 
The command line is

> # tfextract -debug -warning -error -fatal -die -verbose
> Extract data from TRANSFAC
> Full pathname of transfac SITE.DAT: /home/work/dbs/transfac/site.dat
> #

and

> # ls -s /usr/local/emboss/share/EMBOSS/data/tf*
> 0 /usr/local/emboss/share/EMBOSS/data/tffungi
> 0 /usr/local/emboss/share/EMBOSS/data/tfinsect
> 0 /usr/local/emboss/share/EMBOSS/data/tfother
> 0 /usr/local/emboss/share/EMBOSS/data/tfplant
> 0 /usr/local/emboss/share/EMBOSS/data/tfvertebrate

...a sample from site.dat

> VV TRANSFAC SITES TABLE, V.2.4 25-08-1995
> XX
> //
> AC R00001
> XX
> ID HS$6-16_01
> XX
> DT 20.06.90 (created); .
> DT 24.08.95 10:48:05 (updated); EWI.
> XX
> TY DNA
> XX
> DE 6-16
> XX
> SE gGGAAAaTGAAACT
> XX
> EL ISRE
> XX
> SF -127
> ST -89
> XX
> ...
> ...
> SO 0811; B103
> ME gel shift competition
> RN [1]
> RA Suzuki-Yagawa Y., Kawakami K., Nagano K.
> RT Housekeeping Na,K-ATPase alpha1 subunit gene promoter is composed
> RT of multiple cis elements to which common and cell type-specific
> RT factors bind
> RL Mol. Cell. Biol. 12:4046-4055 (1992).
> DR EMBL; X52560; HSNFIL6(37:74).
> //

am I missing something?

Thanks,
martin

--
Martín Sarachu
mad at biol.unlp.edu.ar
EMBNet Argentina
http://www.ar.embnet.org

From Gunnar.Andersson at imbim.uu.se Tue Nov 12 10:28:47 2002
From: Gunnar.Andersson at imbim.uu.se (Gunnar Andersson)
Date: Tue, 12 Nov 2002 11:28:47 +0100
Subject: DAN output Tm
Message-ID:

How should I interpret the Tm calculated by dan? Is Tm an estimated
melting point of the entire sequence or of the sequence in the window? How
can this Tm (window=100nt) be higher than Tmprod of the full 160 nt
sequence?

--
Gunnar Andersson
Institutionen för medicinsk biokemi och mikrobiologi
Uppsala Biomedicinska Centrum (BMC), Husarg.
3 Box 582, 751 23 UPPSALA E-post : Gunnar.Andersson at imbim.uu.se Telefon: 018-471 45 87 Fax:018-50 98 76 From r.bowden at vir.gla.ac.uk Thu Nov 14 15:36:23 2002 From: r.bowden at vir.gla.ac.uk (Rory Bowden) Date: Thu, 14 Nov 2002 15:36:23 -0000 Subject: Fw: Other: EMBOSS versus GCG? Message-ID: <007e01c28bf3$96beddc0$6886d182@vir.gla.ac.uk> This on the 'evoldir' list, which is the main international mailing list in evolutionary biology. Would anyone like to make any comments for me to pass on? while I'm definitely not in Canada I would say that this question is likely to come up here (in the UK) at the institutional if not research council level. Does anyone have an opinion they'd like to articulate e.g. about whether EMBOSS is ready to supplant GCG for end-users. Rory Bowden MRC Virology Unit Glasgow UK ----- Original Message ----- From: "EvolDir" To: Sent: Thursday, November 14, 2002 9:34 AM Subject: Other: EMBOSS versus GCG? > > Since its inception as the "Wisconsin package" in the early 1980s, the GCG > suite of programs have provided a continuously improving "gold standard" > for evolutionary bioinformatics software. The GCG suite is featured > extensively in the latest bioinformatics textbooks (e.g. Mount) and in > software reviews (e.g. The Scientist, August 19). Although some individual > GCG programs have been surpassed by others, their range and flexibility, > permitting linkage of programs together in innovative ways, has no current > equivalent. > > Recently, "open source" advocates have pointed to the EMBOSS suite as > providing a free alternative to the commercial package (supplied by > Accelrys, with whom I have no financial connection). It is my impression > that GCG is in a different league. For example, compare the GCG program > "Window" with its proposed EMBOSS alternative "Freak": > > TASK: Determination of the number of occurences of a motif in a sequence > window. > > GGC program WINDOW > > 1. Allows up to 6 motifs at a time > 2. 
Outputs absolute values and has a variety of other output options. > 3. Extensive input menu > > EMBOSS program FREAK > > 1. Allows only 1 motif at a time. > 2. Outputs a calculated fraction. > 3. Very limited input menu. > > However, in Canada the open-source agenda has won out. In April 2002 > the publicly-funded, Halifax-based, Canadian Bioinformatics Resource (CBR) > abandonned GCG, apparently with the consent of the Canadian evolutionary > bioinformatics community. In this respect, I would be interested to hear > from concerned parties in Canada with respect to the following questions: > > 1. Does your institution (or do you yourself) support GCG, so that you do > not need CBR to supply GCG? > > 2. If you do not have independent access, do you find EMBOSS a suitable > substitute for GCG? > > That Canada, which has spent hundreds of millions on genome > projects, cannot give its researchers and their students a choice from > among the relatively-inexpensive software packages that are available to > analyze genomics data, seems to me very strange. > > Donald Forsdyke, Department of Biochemistry, > Queen's University, Canada > http://post.queensu.ca/~forsdyke/bioinfor.htm > > > > From peter.rice at uk.lionbioscience.com Fri Nov 15 13:20:13 2002 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Fri, 15 Nov 2002 13:20:13 +0000 Subject: Fw: Other: EMBOSS versus GCG? References: <007e01c28bf3$96beddc0$6886d182@vir.gla.ac.uk> Message-ID: <3DD4F48D.2020606@uk.lionbioscience.com> Rory Bowden wrote: > This on the 'evoldir' list, which is the main international mailing list in > evolutionary biology. Would anyone like to make any comments for me to pass > on? >> It is my impression >>that GCG is in a different league. For example, compare the GCG program >>"Window" with its proposed EMBOSS alternative "Freak": >> >>TASK: Determination of the number of occurences of a motif in a sequence >>window. >> >>GGC program WINDOW >> >>1. 
Allows up to 6 motifs at a time >>2. Outputs absolute values and has a variety of other output options. >>3. Extensive input menu >> >>EMBOSS program FREAK >> >>1. Allows only 1 motif at a time. >>2. Outputs a calculated fraction. >>3. Very limited input menu. Window : produces scores over a 'window' (a base range). StatPlot : Plots Window results EMBOSS : reports have scores over a base range as a general output format. Freak: frequency of matches FuzzNuc/FuzzPro/FuzzTran: Pattern matches with ambiguity codes Restrict: Pattern matches with a pattern file etc... This makes it possible to develop some really nice new EMBOSS applications. So ... how about a program which reads EMBOSS report files and produces a summary report (think of window), and another that plots them all (think of statplot). Scores could be plotted if we have a good way to compare them. Yes, I know freak does not produce a report file ... but that is a very easy change. It could also read in EMBL/SwissProt feature tables as annotation. So, suggestions please for EMBOSS applications to plot reports/features... For example: 1. xy plot of scores as points at the centre of a feature, with the sequence position on the x axis and the score on the y axis. Possibly split into multiple plots by program/feature-type/named-tag-value (e.g. pattern) (like statplot only much more versatile). 2. xy plot of lines for each feature 3. GANTT (bar) chart of features by position, annotated with feature type/program/score as appropriate 4. Combine these - xy plot of features with scores, and other features reported underneath (think of the -mark option in statplot - but with far more annotation possible below the x axis) Maybe we can make some mock-ups on the EMBOSS pages to show the possibilities? 
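
The Window/freak comparison quoted above comes down to counting motif occurrences in a sliding window. The following is a minimal Python sketch of that idea, assuming nothing about how either program is actually implemented: the function name window_motif_counts, its parameters, and the definition of "fraction" (matches divided by the number of possible start positions in the window) are all choices made for this illustration. It accepts several motifs at once and returns both the absolute count and the fraction for each window.

```python
def window_motif_counts(sequence, motifs, window=30, step=10):
    """Count motif occurrences in sliding windows.

    Returns a list of dicts, one per window, with 1-based 'start' and 'end'
    coordinates and, for each motif, a (count, fraction) tuple. Overlapping
    occurrences are counted separately. 'fraction' is count divided by the
    number of possible start positions for that motif in the window
    (an assumption of this sketch, not taken from Window or freak).
    """
    seq = sequence.upper()
    rows = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        row = {"start": start + 1, "end": start + len(chunk)}
        for motif in motifs:
            m = motif.upper()
            slots = len(chunk) - len(m) + 1
            count = sum(
                1 for i in range(max(slots, 0)) if chunk[i:i + len(m)] == m
            )
            row[m] = (count, count / slots if slots > 0 else 0.0)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in window_motif_counts("CGCGATATCG", ["CG", "AT"], window=6, step=2):
        print(row)
```

Each row could then be written out as one line of a score-over-a-base-range report, which is the kind of output a plotting program like the one proposed above would read.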
regards, Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From sjmiller at u.arizona.edu Fri Nov 15 19:23:21 2002 From: sjmiller at u.arizona.edu (Susan J. Miller) Date: Fri, 15 Nov 2002 12:23:21 -0700 Subject: newcpgreport vs newcpgseek Message-ID: <3DD549A9.9170F056@u.arizona.edu> I could not find an emboss FAQ...is there one? I'm trying to figure out the differences between cpgreport, newcpgreport and newcpgseek. -- Thanks, -susan Susan J. Miller Biotechnology Computing Facility Arizona Research Laboratories Bio West 228 University of Arizona Tucson, AZ 85721 (520) 626-2597 From rls at ebi.ac.uk Sat Nov 16 01:08:02 2002 From: rls at ebi.ac.uk (Rodrigo Lopez) Date: Sat, 16 Nov 2002 01:08:02 -0000 Subject: newcpgreport vs newcpgseek In-Reply-To: <3DD549A9.9170F056@u.arizona.edu> Message-ID: <000501c28d0c$9cf06780$0a0868d5@castafiore> Hi Susan, Yes. I never had the time to document these. Briefly: newcpgreport use the same method to find islands but produce different output. The method is described in: Larsen,F., Gundersen,G., Lopez,R., Prydz,H. CpG islands as gene markers in the human genome. (1992) Genomics 13 (4):1095-107 MedlineID: 92372002 PubMedID: 1505946 Cpgreport uses a scoring method based on sum/frequencies which overpredicts islands but finds the smaller ones around primary exons. Cpgseek is deprecated at the moment. For all practical purposes I use newcpgreport. I actually use it to produce the human cpgisland database you can find on the EBI's ftp server as well as on the EBI's SRS server. Hope this helps, R:) > -----Original Message----- > From: owner-emboss at hgmp.mrc.ac.uk > [mailto:owner-emboss at hgmp.mrc.ac.uk] On Behalf Of Susan J. Miller > Sent: 15 November 2002 19:23 > To: emboss at embnet.org > Subject: newcpgreport vs newcpgseek > > > I could not find an emboss FAQ...is there one? 
> > I'm trying to figure out the differences between cpgreport, > newcpgreport and newcpgseek. > > -- > Thanks, > -susan > > Susan J. Miller > Biotechnology Computing Facility > Arizona Research Laboratories > Bio West 228 > University of Arizona > Tucson, AZ 85721 > (520) 626-2597 > From David.Bauer at SCHERING.DE Wed Nov 20 14:45:24 2002 From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE) Date: Wed, 20 Nov 2002 15:45:24 +0100 Subject: vectorstrip Message-ID: Hi, If I run vectorstrip on a file with many sequences, the output file contains only sequences where the vector was stripped. I would find it more usefull, if vectorstrip would (maybe optionally) also send the sequences with no hit to the vector in the output file. Or have I overseen something ? David. From Myrian_Grondin at UQTR.CA Wed Nov 20 16:15:20 2002 From: Myrian_Grondin at UQTR.CA (Myrian_Grondin at UQTR.CA) Date: Wed, 20 Nov 2002 11:15:20 -0500 Subject: Install Emboss with Windows?? Message-ID: <1037808920.3ddbb518dbbac@courriel.uqtr.ca> Hi, We are working on PC, OS Windows 98, and we would like to know if it's possible to install Emboss on our machine. If so, which software have we to install to be able to run Emboss? Thanks a lot (excuse me, my English is so poor...) Myrian ------------------------------------------------- Courriel exp?di? via https://courriel.uqtr.ca From stefanielager at fastmail.ca Thu Nov 21 08:19:32 2002 From: stefanielager at fastmail.ca (Stefanie Lager) Date: Thu, 21 Nov 2002 03:19:32 -0500 (EST) Subject: EMBOOS end EMBL entryname Message-ID: <3DDC9714.000059.00380@ns.interchange.ca> Hi, Does EMBL still stick to entrynames (the ID line)of "nine uppercase alphanumeric Characters"? (http://www.ebi.ac.uk/embl/Documentation/User_manual/id_line.html) .I can't retrive sequences from the International Protein Index (IPI) database (11 characters in ID entryname) in EMBL or SWISS format using EMBOSS programs. 
The EMBOSS programs only accept 10 characters for the ID in EMBL or SWISS format. Is this problem fixed in EMBOSS versions later than 2.4.1?

EMBL can have whatever policy they want, but it would be nice if the EMBOSS programs would accept ANY length of ID in EMBL and SWISS format as well.

Stefanie

_________________________________________________________________
http://fastmail.ca/ - Fast Secure Web Email for Canadians

From sharmila at ebi.ac.uk Thu Nov 21 11:12:33 2002
From: sharmila at ebi.ac.uk (Sharmila Pillai)
Date: Thu, 21 Nov 2002 11:12:33 +0000
Subject: Install Emboss with Windows??
Message-ID:

Hi,

From what I understand there is a cygwin-compiled version, but it's not tested and cannot handle graphics.

You should refer to what Rodrigo Lopez wrote to the emboss list on 1/11/02 in response to the subject "Remote getz from emboss". Though this is not the solution for your problem today, it could be the direction for Windows users. I'll try to explain a bit of it here:

At the EBI's External Services group, I am working on a webservice for EMBOSS using SOAP. Basically, this enables the user to use EMBOSS applications remotely.

% seqret srsembl:J00231 -lhttp://servername:portnum/axis/services

The above command would use AXIS/SOAP to access the 'servername' and the 'portnum', which in turn would retrieve data from srsembl (as defined in emboss.default) and pass it on to the application (seqret, in this example). The result is sent to stdout.

All the user (using any OS) needs is a client which understands/interprets a command line as above and some libraries for Axis/SOAP. We have an experimental service using both Java and Perl running on Axis/Tomcat.

I don't think EBI provides remote access to many EMBOSS applications today. Hoping our experimental service survives our tests and there is enough demand for such a service, EBI can soon start opening up webservice access to EMBOSS.

//Sharmila.
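
A minimal sketch of the client-side step described in the message above (a client that "understands/interprets a command line as above"). It only splits such a command line into the application name, its ordinary arguments and the service endpoint; it does not speak SOAP. The function name parse_remote_command and the host example.org:8080 are hypothetical, and the -l<url> convention is taken from the message, not from a released EMBOSS option.

```python
from urllib.parse import urlsplit


def parse_remote_command(argv):
    """Split an EMBOSS-style remote command line into its parts.

    argv is a list such as
    ['seqret', 'srsembl:J00231', '-lhttp://example.org:8080/axis/services'].
    The '-l<url>' form is an assumption based on the message above.
    """
    app = argv[0]
    endpoint = None
    args = []
    for arg in argv[1:]:
        # Treat '-l' immediately followed by a URL as the service endpoint.
        if arg.startswith("-lhttp://") or arg.startswith("-lhttps://"):
            endpoint = arg[2:]
        else:
            args.append(arg)
    if endpoint is None:
        raise ValueError("no -l<url> endpoint given")
    url = urlsplit(endpoint)
    return {
        "app": app,          # EMBOSS application to run remotely
        "args": args,        # remaining arguments, e.g. the USA
        "host": url.hostname,
        "port": url.port,
        "path": url.path,
    }


if __name__ == "__main__":
    print(parse_remote_command(
        ["seqret", "srsembl:J00231", "-lhttp://example.org:8080/axis/services"]
    ))
```

A real client would then hand these parts to whatever SOAP library it uses; that step is left out here because the thread does not describe the service interface.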
From Georg.Beckmann at Schering.DE Thu Nov 21 08:25:39 2002
From: Georg.Beckmann at Schering.DE (Georg.Beckmann at Schering.DE)
Date: Thu, 21 Nov 2002 09:25:39 +0100
Subject: OldDistances
Message-ID:

Hi,

does anybody know if EMBOSS offers a program similar to OldDistances in GCG? OldDistances - which previously had yet another name that I don't remember - calculates a matrix of pairwise similarities from a multiple alignment.

As far as I can see, there is no such program. Is somebody working on such a program for EMBOSS?

Thanks.

Ciao,
Georg Beckmann

From newgene at bigfoot.com Thu Nov 21 14:55:24 2002
From: newgene at bigfoot.com (clwu)
Date: Thu, 21 Nov 2002 08:55:24 -0600
Subject: Install Emboss with Windows??
References: <1037808920.3ddbb518dbbac@courriel.uqtr.ca>
Message-ID: <3DDCF3DC.9020202@bigfoot.com>

I recently compiled EMBOSS successfully under cygwin/win2K (thanks to David Starks-Browning's great help). So far, all the applications I have used work fine (graphics output is also OK under openbox/cygwin). I think you should install cygwin and give it a try.

Good luck.

Chunlei

Myrian_Grondin at UQTR.CA wrote:

>Hi,
>We are working on PC, OS Windows 98, and we would like to know if it's possible
>to install Emboss on our machine. If so, which software have we to install to
>be able to run Emboss?
>Thanks a lot (excuse me, my English is so poor...)
>Myrian

From lukem at gene.pbi.nrc.ca Thu Nov 21 15:28:13 2002
From: lukem at gene.pbi.nrc.ca (Luke McCarthy)
Date: Thu, 21 Nov 2002 09:28:13 -0600 (CST)
Subject: Install Emboss with Windows??
In-Reply-To: <1037808920.3ddbb518dbbac@courriel.uqtr.ca>
Message-ID:

On Wed, 20 Nov 2002 Myrian_Grondin at UQTR.CA wrote:

> Hi,
> We are working on PC, OS Windows 98, and we would like to know if it's
> possible to install Emboss on our machine. If so, which software have we to
> install to be able to run Emboss?

Other posts have addressed the issue of installing EMBOSS locally on a Windows box, but if you have an immediate pressing need to use the EMBOSS applications, the Canadian Bioinformatics Resource offers access through a web interface at

http://www.cbr.nrc.ca/services/emboss_e.php

or in French:

http://www.cbr.nrc.ca/services/emboss_f.php

Unfortunately, the interface itself is English only, but then so are the EMBOSS applications (at least as far as I know...)

Cheers,
Luke

From newgene at bigfoot.com Thu Nov 21 16:52:11 2002
From: newgene at bigfoot.com (clwu)
Date: Thu, 21 Nov 2002 10:52:11 -0600
Subject: mfold
Message-ID: <3DDD0F3B.8020109@bigfoot.com>

Hi, group,
Does anybody know if there is an EMBOSS equivalent of the "mfold" program in GCG?

Thanks.

Chunlei

From stefanielager at fastmail.ca Fri Nov 22 08:33:29 2002
From: stefanielager at fastmail.ca (Stefanie Lager)
Date: Fri, 22 Nov 2002 03:33:29 -0500 (EST)
Subject: mfold
Message-ID: <3DDDEBD9.000009.03475@ns.interchange.ca>

> Hi, group,
> Does anybody know if there is a EMBOSS equivalence for
> "mfold" program in GCG?
>
> Thanks.
>
> Chunlei

No, but there is plenty of RNA structure software out there, both as servers and for local installation.

http://www.bioinfo.rpi.edu/~zukerm/rna/node3.html#SECTION00031

From mikep at angis.org.au Sun Nov 24 22:25:41 2002
From: mikep at angis.org.au (Michael Poidinger)
Date: Mon, 25 Nov 2002 09:25:41 +1100
Subject: codon useage tables
In-Reply-To: <3DDDEBD9.000009.03475@ns.interchange.ca>
Message-ID: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au>

Is there a site somewhere which describes which organisms/data sets the EMBOSS codon usage tables are derived from? Some are obvious from their name, others are not.

Thanks,
Mike
------------------------------------
Dr Michael Poidinger
PhD(virology) PGDipSci (computer science)
CEO, Australian Genome Information Centre
Head, Australian National Genome Information Service
ph 61-2-93518617
mob 0413146765
fax 61-2-93518618
email head at angis.org.au
------------------------------------------

From areagp61 at yahoo.it Mon Nov 25 09:38:59 2002
From: areagp61 at yahoo.it (Graziano P.)
Date: Mon, 25 Nov 2002 10:38:59 +0100
Subject: codon useage tables
References: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au>
Message-ID: <007701c29466$8127a7f0$18105709@italy.ibm.com>

Not every file, but most, are described in the README file from ftp://ftp.ebi.ac.uk/pub/databases/codonusage

Hope this helps

Graziano Pappadà

----- Original Message -----
From: "Michael Poidinger"
To:
Sent: Sunday, November 24, 2002 11:25 PM
Subject: codon useage tables

> Is there a site somewhere which describes which organisms/data sets the
> EMBOSS codon useage tables are derived from? some are obvious from their
> name, others are not.
>
> Thanks,
> Mike
> ------------------------------------
> Dr Michael Poidinger
> PhD(virology) PGDipSci (computer science)
> CEO, Australian Genome Information Centre
> Head, Australian National Genome Information Service
> ph 61-2-93518617
> mob 0413146765
> fax 61-2-93518618
> email head at angis.org.au
> ------------------------------------------
>

From mikep at angis.org.au Mon Nov 25 21:58:02 2002
From: mikep at angis.org.au (Michael Poidinger)
Date: Tue, 26 Nov 2002 08:58:02 +1100
Subject: codon useage tables
In-Reply-To: <007701c29466$8127a7f0$18105709@italy.ibm.com>
References: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au>
Message-ID: <5.2.0.9.2.20021126085548.02f344e8@morgan.angis.org.au>

At 10:38 AM 25/11/2002 +0100, Graziano P. wrote:
>Not every file but most are described in the README file
>from ftp://ftp.ebi.ac.uk/pub/databases/codonusage
>
>Hope this helps

Thanks, it helps with quite a few.

Do you (or anyone else) know the difference between related files, such as

Ehum and Ehuman
Eeco, Eeco_h and Eecoli
Emus, Emussp

etc.?

Thanks,
Mike
------------------------------------
Dr Michael Poidinger
PhD(virology) PGDipSci (computer science)
CEO, Australian Genome Information Centre
Head, Australian National Genome Information Service
ph 61-2-93518617
mob 0413146765
fax 61-2-93518618
email head at angis.org.au
------------------------------------------

From peter.rice at uk.lionbioscience.com Tue Nov 26 10:40:04 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 26 Nov 2002 10:40:04 +0000
Subject: codon useage tables
References: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au> <5.2.0.9.2.20021126085548.02f344e8@morgan.angis.org.au>
Message-ID: <3DE34F84.90108@uk.lionbioscience.com>

Michael Poidinger wrote:
> Do you (or anyone else) know the difference between related files?
>
> such as
> Ehum and Ehuman
> Eeco, Eeco_h and Eecoli
> Emus, Emussp

The codon usage files were set up a long time ago. It was not so easy to find a good set of tables that were free to use. The first tables (if I recall correctly) came from the TRANSTERM database.

Short names (Eeco) are reformatted TRANSTERM codon usage tables with an E (EMBOSS) prefix and a .cut suffix to identify the format.

Names with _h (Eeco_h) are highly expressed genes (high Codon Adaptation Index values).

sp endings? Help! Ysp is "Yeast S.pombe" of course. I assume the others are for a genus (e.g. Mus sp. = Mus musculus and Mus domesticus) rather than a single species. Emussp.cut is a reformat of TRANSTERM's mussp.cod file. The EBI's FTP copy of TRANSTERM did not document exactly what these names mean. The original TRANSTERM documentation also leaves you to guess at the 3-letter species codes. The TRANSTERM website seems to be only partly available.

Longer names (Eecoli) were added from elsewhere (I need to check on their origin) and only include a few genes (count the stop codons!), so I assume they are old and probably obsolete.

mt endings are mitochondrial genes

cp endings are chloroplast genes

Time to review these tables, I suspect!!!

How about replacing them with annotated tables from CUTG for selected species? We need to be careful about default table names in some programs, but they are easy to update.

Peter

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From Joerg.Schaber at uv.es Tue Nov 26 12:41:44 2002
From: Joerg.Schaber at uv.es (Joerg Schaber)
Date: Tue, 26 Nov 2002 13:41:44 +0100
Subject: duplicate ID
Message-ID: <3DE36C08.6030603@uv.es>

Hi,

when creating an ncbi database using dbiflat I always get, a few times, the message "Warning: Duplicate ID skipped: '' All hits will point to first ID found".

Even though it does not seem to have severe effects, I would like to know which duplicate IDs are meant. I checked the genome IDs and acnums and they seem to be OK (all *gbk files downloaded from NCBI): they all have entries and are not 'null'.

Any idea what the problem is?

Here is the command I use:

dbiflat -idformat gb -directory "." -filename "*.gbk" -dbname "ncbibac" -release "1.0" -date "26/11/02" -fields acnum,des,taxon

greetings,

joerg

From david.vilanova at rdls.nestle.com Tue Nov 26 13:43:40 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 14:43:40 +0100
Subject: Matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECDD@lsmail2.crn.nestrd.ch>

Dear all,
I was wondering if the matcher program accepts a sequence via stdin.

The following example doesn't work for me.

matcher -sequencea 'ATGCGA' -sequenceb 'ATCTAGATATGCGA'

>cannot open ATGCGA file for read.

Is there any way to submit a sequence via stdin ???

Thanks,
David

From peter.rice at uk.lionbioscience.com Tue Nov 26 13:53:50 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 26 Nov 2002 13:53:50 +0000
Subject: Matcher
References: <89466355CEFE7244AC3A013E45641C180144ECDD@lsmail2.crn.nestrd.ch>
Message-ID: <3DE37CEE.5090903@uk.lionbioscience.com>

Vilanova,David,LAUSANNE,NRC/BS wrote:
> Dear all,
> I was wondering if matcher program accepts a sequence via stdin.
>
> the following exemple doesn't work for me.
>
> matcher -sequencea 'ATGCGA' -sequenceb 'ATCTAGATATGCGA'
>
>>cannot open ATGCGA file for read.
>
>
> Is there anyway to submit a sequence via stdin ???

You don't mean stdin (that can only read one sequence anyway) ... you mean "can I specify a sequence on the command line?"

Yes!!!! You need the "asis" special format.

matcher -sequencea 'asis::ATGCGA' -sequenceb 'asis::ATCTAGATATGCGA'

(assuming your shell allows the command line to be long enough for your sequences :-)

Hope this helps

Peter

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From david.vilanova at rdls.nestle.com Tue Nov 26 14:12:19 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 15:12:19 +0100
Subject: Matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECE0@lsmail2.crn.nestrd.ch>

Thanks Peter,
Sorry for the mistake.
I'm writing a bioperl script which automatically runs an EMBOSS application. I could have made it work by writing a new file for each sequence I read, but it looks pretty nice like this.

Regards,
David

#! /usr/bin/perl -w

use Bio::Factory::EMBOSS;
use Bio::SeqIO;

die "Usage: perl script.pl [seqfileA] [seqfileB] [outfile]\n" unless @ARGV == 3;

#Read input files
($seqfileA,$seqfileB,$outfile) = @ARGV;

#Initialize Object
$EMBOSS = new Bio::Factory::EMBOSS;

#Define emboss program to run
$application = $EMBOSS->program('matcher');

#Manipulate SeqfileA file
$seqA = new Bio::SeqIO (-file   => $seqfileA,
                        -format => 'fasta');

while ($seqinA = $seqA->next_seq){
    $inseqA = "asis::".$seqinA->seq;
    $seqidA = $seqinA->id;
    #$seqoutA->write_seq($inseqA);

    print "####$seqidA\n";
    #Initialize seqB at every iteration of SeqA
    $seqB = new Bio::SeqIO (-file   => $seqfileB,
                            -format => 'fasta');

    while ($seqinB = $seqB->next_seq){
        $inseqB = "asis::".$seqinB->seq; #Format like asis::ATGCGA (required for emboss)
        $seqidB = $seqinB->id;
        #$seqoutB->write_seq($inseqB);
        #print "####$inseqA\n";
        print "Processing sequence $seqidA..vs..$seqidB...";

        #Define program parameters and run...
        $application->run({ -sequencea => $inseqA,
                            -sequenceb => $inseqB,
                            -outfile   => $outfile });
        print "done\n";
        ....
        Manipulate alignments.....
        ....
    }
}

From jason at cgt.mc.duke.edu Tue Nov 26 14:54:19 2002
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Tue, 26 Nov 2002 09:54:19 -0500 (EST)
Subject: Matcher
In-Reply-To: <89466355CEFE7244AC3A013E45641C180144ECE0@lsmail2.crn.nestrd.ch>
References: <89466355CEFE7244AC3A013E45641C180144ECE0@lsmail2.crn.nestrd.ch>
Message-ID:

Bioperl will also do the behind-the-scenes work of creating the tempfile and cleaning it up for you if you just pass in a Bio::PrimarySeqI object. It detects whether you pass in an object or a string and proceeds accordingly.

Jason Stajich
Duke University
jason at cgt.mc.duke.edu

On Tue, 26 Nov 2002, Vilanova,David,LAUSANNE,NRC/BS wrote:

> I'm writing a bioperl script which automatically runs an emboss aplication.
> I could have worked by generating foreach sequence I read a new file but it
> looks pretty nice like that.

From david.vilanova at rdls.nestle.com Tue Nov 26 15:58:32 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 16:58:32 +0100
Subject: Bioperl and matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch>

Hello,
I have problems retrieving the alignments from EMBOSS output. The program below reads 2 files and runs matcher on all against all. Matcher gives me MSF output and then I try to parse this alignment with Bio::AlignIO. However I get an exception...

Processing sequence 1..vs..3...done

------------- EXCEPTION -------------
MSG: 1 exists as an alignment line but not in the header. Not confident of what is going on!
STACK Bio::AlignIO::msf::next_aln /usr/local/lib/perl5/site_perl/5.8.0/Bio/AlignIO/msf.pm:106
STACK toplevel Run_Emboss.pl:50

--------------------------------------

Here is the output from matcher:

!!NA_MULTIPLE_ALIGNMENT 1.0

 out MSF: 5 Type: N 26/11/02 CompCheck: 2090 ..

 Name: EMBOSS_001 Len: 5 Check: 1045 Weight: 1.00
 Name: EMBOSS_002 Len: 5 Check: 1045 Weight: 1.00

//

           1   5
EMBOSS_001 CGGCG
EMBOSS_002 CGGCG

###########################################################
It doesn't work for fasta format in my script either (see output below):

Processing sequence 1..vs..3...done
Use of uninitialized value in sprintf at /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 257, line 4.
Use of uninitialized value in hash element at /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268, line 4.
Use of uninitialized value in hash element at /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268, line 4.
Use of uninitialized value in hash element at /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 270, line 4.

#########################

#Script
#! /usr/bin/perl -w

use Bio::Factory::EMBOSS;
use Bio::SeqIO;
use Bio::AlignIO;

die "Usage: perl script.pl [seqfileA] [seqfileB] [outfile]\n" unless @ARGV == 3;

#Read input files
($seqfileA,$seqfileB,$outfile) = @ARGV;

#Initialize Object
$EMBOSS = new Bio::Factory::EMBOSS;

#Define emboss program to run
$application = $EMBOSS->program('matcher');

#Manipulate SeqfileA file
$seqA = new Bio::SeqIO (-file   => $seqfileA,
                        -format => 'fasta');

while ($seqinA = $seqA->next_seq){
    $inseqA = "asis::".$seqinA->seq;
    $seqidA = $seqinA->id;

    print "####$seqidA\n";
    #Initialize seqB at every iteration of SeqA
    $seqB = new Bio::SeqIO (-file   => $seqfileB,
                            -format => 'fasta');

    while ($seqinB = $seqB->next_seq){
        $inseqB = "asis::".$seqinB->seq; #Format like asis::ATGCGA (required for emboss)
        $seqidB = $seqinB->id;

        print "Processing sequence $seqidA..vs..$seqidB...";

        #Define program parameters and run...
        $application->run({ -sequencea => $inseqA,
                            -sequenceb => $inseqB,
                            -aformat   => 'msf',
                            -outfile   => $outfile });
        print "done\n";

        $alnin = new Bio::AlignIO(-format => 'msf',
                                  -file   => $outfile );

        while ($aln = $alnin->next_aln){
            print $aln->no_residues,"\n";
            #print $aln->consensus_string,"\n";
        }
    }
}

From jason at cgt.mc.duke.edu Tue Nov 26 16:05:22 2002
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Tue, 26 Nov 2002 11:05:22 -0500 (EST)
Subject: Bioperl and matcher
In-Reply-To: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch>
References: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch>
Message-ID:

Our msf parser is seeing something it isn't expecting - not sure why. What happens when you just use the straight 'emboss' parser with standard emboss alignment output, which is the route that has been most heavily tested?

-jason

Jason Stajich
Duke University
jason at cgt.mc.duke.edu

On Tue, 26 Nov 2002, Vilanova,David,LAUSANNE,NRC/BS wrote:

> Matcher gives me an msf output and then I try to parse this alignment with
> Bio::AlignIO.
> However I get an exception...
>
> MSG: 1 exists as an alignment line but not in the header. Not confident of
> what is going on!

From peter.rice at uk.lionbioscience.com Tue Nov 26 16:12:46 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 26 Nov 2002 16:12:46 +0000
Subject: Bioperl and matcher
References: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch>
Message-ID: <3DE39D7E.9080403@uk.lionbioscience.com>

Vilanova,David,LAUSANNE,NRC/BS wrote:
>
> Hello,
> I have problems retrieving the alignments from an emboss output.
> The program belows read 2 files and runs a matcher of all against all.
> Matcher gives me an msf output and then I try to parse this alignment with
> Bio::AlignIO.
> However I get an exception...
>
> Processing sequence 1..vs..3...done
>
> ------------- EXCEPTION -------------
> MSG: 1 exists as an alignment line but not in the header. Not confident of
> what is going on!

BioPerl seems to be having trouble with the EMBOSS MSF format output. It could be something about the naming of the sequences?

EMBOSS is making up names for your sequences. I assume you are using asis::CGGCG to pass them to matcher.

You can put -sid after each sequence to give them names, for example:

matcher -out x.x -af msf asis::ccggc -sid cg asis::cgggc -sid gg

(-sid, like -aformat, is an associated qualifier.
It must follow the asis:: sequence because it is positional (putting it first on the command line, for example, would refer to all sequences - fine for -sformat but not a good idea for -sid :-)

Hope this helps

Peter

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From david.vilanova at rdls.nestle.com Tue Nov 26 16:14:19 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 17:14:19 +0100
Subject: Bioperl and matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECE8@lsmail2.crn.nestrd.ch>

Ok, I use:

$alnin = new Bio::AlignIO(-format => 'emboss',
                          -file   => $outfile );

while ($aln = $alnin->next_aln){
    print $aln->no_residues,"\n";
}

I don't specify any format to EMBOSS, so I get the standard alignment. In this case it doesn't work: it never enters this loop... but the program doesn't crash. It does all the alignments and stores them in the outfile, but seems not to read it..!! Bizarre ???

David

-----Original Message-----
From: Jason Stajich [mailto:jason at cgt.mc.duke.edu]
Sent: Tuesday, 26 November 2002 17:05
To: Vilanova,David,LAUSANNE,NRC/BS
Cc: 'bioperl-l at bioperl.org'; 'emboss at embnet.org'
Subject: Re: Bioperl and matcher

Our msf parser is seeing something it isn't expecting - not sure why - what happens when you just use the straight 'emboss' parser with standard emboss alignment output which is the route that has been most heavily tested?

-jason

From david.vilanova at rdls.nestle.com Tue Nov 26 16:33:56 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 17:33:56 +0100
Subject: Bioperl and matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECEC@lsmail2.crn.nestrd.ch>

I tried that but it still doesn't fix the problem...

-----Original Message-----
From: Peter Rice [mailto:peter.rice at uk.lionbioscience.com]
Sent: Tuesday, 26 November 2002 17:13
To: Vilanova,David,LAUSANNE,NRC/BS
Cc: 'bioperl-l at bioperl.org'; 'emboss at embnet.org'
Subject: Re: Bioperl and matcher

EMBOSS is making up names for your sequences. I assume you are using asis::CGGCG to pass them to matcher.

You can put -sid after each sequence to give them names, for example:

matcher -out x.x -af msf asis::ccggc -sid cg asis::cgggc -sid gg

From vz_silvana at verizon-uweb.com Wed Nov 27 20:30:03 2002
From: vz_silvana at verizon-uweb.com (Silvana Paredes)
Date: Wed, 27 Nov 2002 15:30:03 -0500
Subject: Inquire about login jemboss
Message-ID: <200211272030.PAA18916@www22.ureach.com>

To whom it may concern:

I downloaded the Jemboss software, but when I try to use it, it asks me for a login and a password, and I can't find a way to set up an account or to use EMBOSS without logging in. I would appreciate it if you could give me instructions on how to start using it or how to create an account.

Thank you so much,

Best regards,
Silvana Paredes