From tmargus at ebc.ee Thu Sep 5 12:28:19 2002
From: tmargus at ebc.ee (Tõnu Margus)
Date: Thu, 5 Sep 2002 19:28:19 +0300
Subject: how to specify seq parameters?
Message-ID: <002e01c254f9$3f0b72e0$1e1728c1@ebc.ee>

Hi,

It is a simple question, but I didn't find an answer in the documentation.

How do I tell dotmatcher, or any other program, on the command line
that it should read a sequence in reverse/complement?
If I run it with the -sask option it asks me, but how do I do it from the command line?
Help shows db:seq1 (Parameter1) db:seq2 (Parameter2)
where the parameters are optional. I can't find in the documentation how
to specify "parameters".

Thanks in advance,
Tõnu Margus

From peter.rice at uk.lionbioscience.com Thu Sep 5 12:35:14 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Thu, 05 Sep 2002 17:35:14 +0100
Subject: how to specify seq parameters?
References: <002e01c254f9$3f0b72e0$1e1728c1@ebc.ee>
Message-ID: <3D7787C2.3020108@uk.lionbioscience.com>

Tõnu Margus wrote:
> It is a simple question, but I didn't find an answer in the documentation.
>
> How do I tell dotmatcher, or any other program, on the command line
> that it should read a sequence in reverse/complement?
> If I run it with the -sask option it asks me, but how do I do it from the command line?
> Help shows db:seq1 (Parameter1) db:seq2 (Parameter2)
> where the parameters are optional.

You have many options!!!

To reverse the first sequence:

% dotmatcher db:seq1 -sreverse
% dotmatcher -sreverse1 db:seq1
% dotmatcher -sequencea_sreverse db:seq1
% dotmatcher db:seq1[::r]

Note: -sreverse is an "associated qualifier" (dotmatcher -help -verbose
lists them).

Associated qualifiers, specified as a simple qualifier such as '-sreverse',
apply to the previous true qualifier, so -sreverse after db:seq1 applies
only to db:seq1.
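The binding rule for associated qualifiers can be made concrete with a toy Python model. This is only an illustration under stated assumptions, not EMBOSS's own command-line parser: it assumes two sequence parameters named sequencea and sequenceb (as in dotmatcher), treats every token that is not an associated qualifier as the next positional parameter, recognises the three spellings in Peter's examples (-sreverse, -sreverse1, -sequencea_sreverse), and applies a bare qualifier given before any parameter to every sequence.

```python
"""Toy model of how an associated qualifier such as -sreverse is bound.

Illustration only: NOT the EMBOSS parser. Assumed rules:
  * bare form after a parameter  -> applies to that parameter only;
  * bare form before any parameter -> applies to every sequence parameter;
  * -sreverse1 / -sequencea_sreverse -> name the target explicitly.
"""

from typing import Dict, Iterable, Optional, Sequence, Set, Tuple


def classify(arg: str, params: Sequence[str],
             names: Tuple[str, ...]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (qualifier, explicit target or None), or None for a parameter."""
    if not arg.startswith("-"):
        return None
    word = arg[1:]
    for name in names:
        if word == name:
            return name, None  # bare form: target depends on position
        suffix = word[len(name):]
        if word.startswith(name) and suffix.isdigit():
            n = int(suffix)  # 1-based sequence number, e.g. -sreverse1
            if 1 <= n <= len(params):
                return name, params[n - 1]
        for p in params:
            if word == f"{p}_{name}":  # e.g. -sequencea_sreverse
                return name, p
    return None


def bind_qualifiers(argv: Sequence[str],
                    params: Sequence[str] = ("sequencea", "sequenceb"),
                    names: Iterable[str] = ("sreverse",)) -> Dict[str, Set[str]]:
    """Map each sequence parameter to the associated qualifiers it receives."""
    names = tuple(names)
    bound: Dict[str, Set[str]] = {p: set() for p in params}
    seen = 0  # positional parameters consumed so far
    for arg in argv:
        hit = classify(arg, params, names)
        if hit is None:
            if seen == len(params):
                raise ValueError(f"unexpected extra parameter: {arg!r}")
            seen += 1
            continue
        name, target = hit
        if target is not None:
            bound[target].add(name)
        elif seen == 0:
            for p in params:  # start of the line: every sequence
                bound[p].add(name)
        else:
            bound[params[seen - 1]].add(name)  # the preceding parameter
    return bound
```

With this model, bind_qualifiers(["db:seq1", "-sreverse", "db:seq2"]) reverses only sequencea, while moving -sreverse to the front of the list reverses both sequences.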
If you put them at the start of the line, they apply to all sequences.

-sreverse1 is the first -sreverse (for the first sequence).

-sequencea_sreverse is the -sreverse for parameter sequencea (the name of
the first sequence in dotmatcher; see dotmatcher -help for these names).

db:seq1[::r] is an extension to the USA syntax. You can specify -sbegin,
-send and -sreverse in the USA.

regards,

Peter Rice
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From gbottu at ben.vub.ac.be Fri Sep 6 03:39:00 2002
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 6 Sep 2002 09:39:00 +0200 (CEST)
Subject: question about plotcon
Message-ID: <200209060739.JAA1355059@black.vub.ac.be>

from : BEN

Dear colleagues,

Does anyone know how plotcon computes the average similarity score?
When I give as input a 5-protein alignment and a similarity matrix that
scores identical amino acid pairs +1, the plotcon graph rises nowhere
above +0.2. Is this OK?

Guy Bottu

From s.roehrig at xantos.de Fri Sep 6 07:33:04 2002
From: s.roehrig at xantos.de (Roehrig, Sascha)
Date: Fri, 6 Sep 2002 13:33:04 +0200
Subject: indexing refseq with dbiflat

Dear all,

I am having trouble indexing RefSeq (release in GenBank format from
yesterday). During indexing I get a lot of errors complaining about
duplicate IDs:

Index a flat file database
Warning: Duplicate ID skipped: '0610012A05Rik'
Warning: Duplicate ID skipped: '0610043B10Rik'
Warning: Duplicate ID skipped: '1110004B13Rik'
Warning: Duplicate ID skipped: '1110020A23Rik'
Warning: Duplicate ID skipped: '14'
Warning: Duplicate ID skipped: '14'
Warning: Duplicate ID skipped: '14'
Warning: Duplicate ID skipped: '14'
Warning: Duplicate ID skipped: '1500000C01Rik'
...
...

After indexing, I am not able to retrieve a lot of entries which are
present in the flat file, e.g.:

NM_000303
NM_005693
...
...

Any suggestions would be greatly appreciated.
I noticed that one of the changes in version 2.4.1 (I am using 2.5.0)
addressed the indexing of RefSeq.

Best regards
Sascha

From abrown at nimr.mrc.ac.uk Fri Sep 6 09:27:44 2002
From: abrown at nimr.mrc.ac.uk (Alex Brown)
Date: Fri, 6 Sep 2002 14:27:44 +0100
Subject: eprimer3
Message-ID: <6D152F0A-C19C-11D6-BA35-0003938768AC@nimr.mrc.ac.uk>

Hi.

I'm running EMBOSS on a Mac PowerBook G4 (OS X), under Darwin.

The EMBOSS program eprimer3 requires the program primer3 from the
Whitehead Institute
(http://www-genome.wi.mit.edu/ftp/distribution/software/primer3_0_9_test.tar.gz).
I have downloaded this. When I try to install it using 'make primer3', I
get the following error message:

[xxxx:/primer3_0_9_test/src] me% make primer3
make: *** No rule to make target `primer3'. Stop.

Being a UNIX grasshopper, I'm not sure how to overcome this problem. Has
anyone installed primer3 on Darwin successfully?

Many thanks,

Alex Brown.

From David.Bauer at SCHERING.DE Fri Sep 6 10:06:33 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Fri, 6 Sep 2002 16:06:33 +0200
Subject: infoseq db name in fasta file

Hi,

infoseq can read a FASTA file with a database name in front of the identifier:

>database:name accession description

Is there a way to preserve the database name on output?
In the USA and NAME columns only the name appears, not the database.

Thanks,
David.

From mathog at mendel.bio.caltech.edu Fri Sep 6 10:47:50 2002
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Fri, 06 Sep 2002 07:47:50 -0700
Subject: eprimer3

> The EMBOSS program eprimer3 requires the program primer3 from the
> Whitehead Institute
> (http://www-genome.wi.mit.edu/ftp/distribution/software/primer3_0_9_test.tar.gz).
> I have downloaded this.
> When I try to install it using 'make primer3', I
> get the following error message:
>
> [xxxx:/primer3_0_9_test/src] me% make primer3
> make: *** No rule to make target `primer3'. Stop.
>

Just use "make". The executable produced is actually called primer3_core.
In older versions of primer3, before it was wrapped in a web interface,
the program was indeed called primer3, as you'd expect.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From peter.rice at uk.lionbioscience.com Fri Sep 6 10:54:17 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 06 Sep 2002 15:54:17 +0100
Subject: infoseq db name in fasta file
Message-ID: <3D78C199.6000308@uk.lionbioscience.com>

David.Bauer at SCHERING.DE wrote:
> infoseq can read a FASTA file with a database name in front of the identifier:
>
>>database:name accession description
>
> Is there a way to preserve the database name on output?
> In the USA and NAME columns only the name appears, not the database.

Tricky ...

The USA is of course the filename for EMBOSS.

If you 'cheat' with -sdb database the USA will be database-id:name

The real problem is that EMBOSS cannot easily know "database" is valid.

Any suggestions for how this could work?

Peter

From abrown at nimr.mrc.ac.uk Fri Sep 6 11:07:08 2002
From: abrown at nimr.mrc.ac.uk (Alex Brown)
Date: Fri, 6 Sep 2002 16:07:08 +0100
Subject: eprimer3
In-Reply-To: <3D78B34F.2000002@clondiag.com>
Message-ID: <4FC704BA-C1AA-11D6-BA35-0003938768AC@nimr.mrc.ac.uk>

Hi.

Thanks for the help. This appears to have worked to some extent, after I
modified the Makefile. I had to change the line

CC = gcc

to

CC = CC

(I think that CC on Darwin is GCC).
This appeared to do something, but this time I got a load of error messages:

[xxxx:/primer3_0_9_test/src] me% make
cc -c -g -Wall -D__USE_FIXED_PROTOTYPES__ -include /usr/include/sys/types.h -O2 -DDPAL_MAX_ALIGN=36 -DMAX_PRIMER_LENGTH=36 primer3_main.c
primer3_main.c: In function `oligo_param':
primer3_main.c:1113: warning: passing arg 3 of `oligo_overlaps_interval' from incompatible pointer type
primer3_main.c:1147: warning: passing arg 3 of `oligo_overlaps_interval' from incompatible pointer type
primer3_main.c:1151: warning: passing arg 3 of `oligo_overlaps_interval' from incompatible pointer type
cc -g -Wall -D__USE_FIXED_PROTOTYPES__ -include /usr/include/sys/types.h -O2 -c -o oligotm.o oligotm.c
primer3_release.h:3: warning: `pr_release' defined but not used
cc -c -g -Wall -D__USE_FIXED_PROTOTYPES__ -include /usr/include/sys/types.h -O2 -DDPAL_MAX_ALIGN=36 -DMAX_PRIMER_LENGTH=36 -o dpal_primer.o dpal.c
dpal.c: In function `_dpal_generic':
dpal.c:332: warning: `i0' might be used uninitialized in this function
dpal.c:332: warning: `j0' might be used uninitialized in this function
dpal.c:336: warning: `I' might be used uninitialized in this function
dpal.c:336: warning: `J' might be used uninitialized in this function
dpal.c:338: warning: `score' might be used uninitialized in this function
dpal.c: In function `_dpal_long_nopath_generic':
dpal.c:560: warning: `I' might be used uninitialized in this function
dpal.c:560: warning: `J' might be used uninitialized in this function
dpal.c: At top level:
primer3_release.h:3: warning: `pr_release' defined but not used
cc -c -g -Wall -D__USE_FIXED_PROTOTYPES__ -include /usr/include/sys/types.h -O2 -DDPAL_MAX_ALIGN=36 -DMAX_PRIMER_LENGTH=36 -o format_output.o format_output.c
cc -c -g -Wall -D__USE_FIXED_PROTOTYPES__ -include /usr/include/sys/types.h -O2 -DDPAL_MAX_ALIGN=36 -DMAX_PRIMER_LENGTH=36 -o boulder_input.o boulder_input.c
primer3_release.h:3: warning: `pr_release' defined but not used
cc -g -o primer3_core primer3_main.o oligotm.o dpal_primer.o format_output.o boulder_input.o '-static' -lm
/usr/bin/ld: can't locate file for: -lcrt0.o
make: *** [primer3_core] Error 1

According to the README file, the warnings about i0, j0, I, J and score
being possibly used unset in dpal.c are normal and harmless. However,
'make' is still failing, and the primer3_core executable does not appear.
The problem seems to be in the line
'/usr/bin/ld: can't locate file for: -lcrt0.o'.

Any ideas?

Many thanks,

Alex Brown.

From estienne at sanbi.ac.za Mon Sep 9 05:53:43 2002
From: estienne at sanbi.ac.za (Estienne Swart)
Date: Mon, 09 Sep 2002 11:53:43 +0200
Subject: Piping with emboss utilites that expect two files as input
Message-ID: <3D7C6FA7.7030803@sanbi.ac.za>

Hi,

Is there any way to directly pipe two sequences into utilities such as
water, matcher, etc., which usually require two files as input?

It is possible to pipe one or the other (seqA or seqB), e.g.

water -auto -filter -SeqA sample1.seq < sample2.seq

but not both at once, e.g.

water -auto -filter < sample1.seq < sample2.seq

I know not all shells can read from multiple pipes(?) like zsh can, e.g.:

cat < sample2.seq
A less simple sequence
ACGAGCACTAGCATGCATCGAGCATCGGGAGGACTTAGCAGTCGAGCATCGTACGAGCT
>A simple sequence
AGAGGCAGGCGACGAGCGAGCGAGCGAGC

But it would really be useful (from a scripting perspective) if these
utilities could at least take a pair of sequences from stdin and align them.

Estienne Swart

From hkawai at venus.dti.ne.jp Mon Sep 9 23:33:53 2002
From: hkawai at venus.dti.ne.jp (河合宏紀)
Date: Tue, 10 Sep 2002 12:33:53 +0900
Subject: GenBank indexing Trouble
Message-ID: <200209100333.g8A3X2Yn024400@smtp1.dti.ne.jp>

Hello

I'm using the EMBOSS package. I appreciate the developers' efforts.
Unfortunately, I found a problem when I indexed GenBank 130 and
called it with entret/seqret.

First of all, I made an index for all files of GenBank 130 (except
EST, GSS, HTG) as described below.
--------------------------------------
% /usr/local/EMBOSS/2.5.0/bin/dbiflat
Index a flat file database
     EMBL : EMBL
    SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
       GB : Genbank, DDBJ
Entry format [SWISS]: GB
Database directory [.]:
Wildcard database filename [*.dat]: *.seq
Database name: GB
Release number [0.0]:
Index date [00/00/00]:
Warning: Duplicate ID skipped: 'AY071141'
--------------------------------------

When I called L11995 with "entret gb:L11995", I got the incorrect entry
whose accession is M20152. And when I tried to get gb:M20152, I got M20153.
These three entries appear consecutively in the gbrod3.seq file. This
problem does not occur when I call entries whose 'LOCUS' and 'ACCESSION'
fields are identical (e.g. BC003860). Because this problem occurs with
dbiflat in versions 2.4.1 and 2.5.0 but not in 2.3.1, I'm now using
EMBOSS 2.3.1 only for dbiflat/dbifasta, and 2.4.1 for other programs
(entret/seqret and so on).

My hypothesis about this problem is described below.
I focused on the duplicate ID AY071141 and removed one AY071141 entry
(from the gbinv4.seq file). In this case, I could get the correct entries.
When dbiflat finds a duplicate ID to be skipped, I guess, the index
counters of both LOCUS and ACCESSION should be increased (or decreased).
But in this version, ONLY the LOCUS counter is increased (or decreased),
and the ACCESSION counter is not.

I hope my report will be helpful for the developers.

Best regards

Kawai

From peter.rice at uk.lionbioscience.com Tue Sep 10 04:27:51 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 10 Sep 2002 09:27:51 +0100
Subject: GenBank indexing Trouble
References: <200209100333.g8A3X2Yn024400@smtp1.dti.ne.jp>
Message-ID: <3D7DAD07.3000407@uk.lionbioscience.com>

Dear Kawai,

> I'm using the EMBOSS package. I appreciate the developers' efforts.
> Unfortunately, I found a problem when I indexed GenBank 130 and
> called it with entret/seqret.
>
> I focused on the duplicate ID AY071141 and removed one AY071141 entry
> (from the gbinv4.seq file).

There is a problem with duplicate IDs which will be solved in EMBOSS 2.5.1.

Alan: can we build 2.5.1 or put the updated dbi*.c and embdbi.* files in
the patchfiles directory?

regards,

Peter Rice

From ableasby at hgmp.mrc.ac.uk Tue Sep 10 06:09:22 2002
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Tue, 10 Sep 2002 11:09:22 +0100 (BST)
Subject: GenBank indexing Trouble
Message-ID: <200209101009.LAA26312@bromine.hgmp.mrc.ac.uk>

>Alan: can we build 2.5.1 or put the updated dbi*.c and embdbi.* files in
>the patchfiles directory?

I just have to test that some Solaris configuration modifications don't
break anything, and will hopefully put 2.5.1 out late morning or early
afternoon.

Alan

From ableasby at hgmp.mrc.ac.uk Tue Sep 10 08:03:42 2002
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Tue, 10 Sep 2002 13:03:42 +0100 (BST)
Subject: EMBOSS 2.5.1 released
Message-ID: <200209101203.NAA10569@bromine.hgmp.mrc.ac.uk>

This release fixes problems associated with non-unique identifiers
in some databases (e.g. REFSEQ). Note that there is now a specific
indexing option for that database in dbiflat.

Alan

From abrown at nimr.mrc.ac.uk Tue Sep 10 12:02:20 2002
From: abrown at nimr.mrc.ac.uk (Alex Brown)
Date: Tue, 10 Sep 2002 17:02:20 +0100
Subject: eprimer3 again

Hi.

Thanks to David Mathog and Peter Slickers for their help in sorting out
my last problem. I had to edit the Makefile to remove the line

LIBOPTS ='-static'

and then remove all references to $(LIBOPTS), as well as changing the CC
line to CC = CC (gcc in Darwin is cc). This new Makefile produced the
executable primer3_core.
However, on running the test program primer_test.pl in the ../test
directory, I got the following errors:

[****:/primer3_0_9_test/test] me primer_test.pl
/primer3_0_9_test/test/primer_test.pl: testing ../src/primer3_core
Tue Sep 10 16:49:41 BST 2002
Testing fatal errors
1,5d0
< PRIMER_SEQUENCE_ID=seq1
< SEQUENCE=ATGCTAGCTAGTCGATCGATGCTAGCTAGTCGATCTACTATCATCATGCTAGCATGGGGCATCTGTGGGATGCTGTCGTACTGATCGATGCTAGCTAGCTAGTCGATCGTAGCTATCATCATCTAGTCATCGTAGCTATCATCGTAGC
< PRIMER_MISPRIMING_LIBRARY=repeat_file
< PRIMER_ERROR=Cannot open mispriming library repeat_file
< =
Difference found between primer_global_err/repeat_lib_err.out and primer_global_err/repeat_lib_err.tmp
1d0
< ../src/primer3_core: Cannot open mispriming library repeat_file
Difference found between primer_global_err/repeat_lib_err.out2 and primer_global_err/repeat_lib_err.tmp2
1,6d0
< PRIMER_SEQUENCE_ID=seq1
< SEQUENCE=ATGCTAGCTAGTCGATCGATGCTAGCTAGTCGATCTACTATCATCATGCTAGCATGGGGCATCTGTGGGATGCTGTCGTACTGATCGATGCTAGCTAGCTAGTCGATCGTAGCTATCATCATCTAGTCATCGTAGCTATCATCGTAGC
< PRIMER_PICK_INTERNAL_OLIGO=1
< PRIMER_INTERNAL_OLIGO_MISHYB_LIBRARY=repeat_file
< PRIMER_ERROR=Cannot open internal oligo mishyb library repeat_file
< =
Difference found between primer_global_err/repeat_lib_int_err.out and primer_global_err/repeat_lib_int_err.tmp
1d0
< ../src/primer3_core: Cannot open internal oligo mishyb library repeat_file
Difference found between primer_global_err/repeat_lib_int_err.out2 and primer_global_err/repeat_lib_int_err.tmp2
FAILED
primer_boundary OK
primer_internal OK
primer_boundary_formated OK
primer_internal_formated OK
primer_start_codon OK
primer_boundary1 OK
primer_internal1 OK
primer_task OK
primer_task_formated OK
primer_boundary1_formated Cannot read primer_boundary1_formated_output at /Downloads/primer3_0_9_test/test/primer_test.pl line 80.

Something appears to be not quite right.
I should point out that I had to edit the primer_test.pl file to change
the first line to '#!/usr//bin/perl5.6.0 -w' to make it compatible with
Darwin.

I then copied primer3_core to my EMBOSS bin directory, and tested it
using the sequence file given in the EMBOSS manual. Without the -explain
flag, the results were perfect, as shown in the manual. However, with the
-explain flag set, I get the following error message:

Picks PCR primers and hybridization oligos
Output file [hsfau1.eprimer3]:
*** malloc[422]: error for object 0x1d7c80: Incorrect check sum for freed object - object was probably modified after beeing freed; break at szone_error
Bus error

Why is the test program failing, and why is the -explain flag causing
these problems? Are there some files missing from the installation?

Hope someone can help.

Many thanks.

Alex Brown

From hkawai at venus.dti.ne.jp Wed Sep 11 06:13:20 2002
From: hkawai at venus.dti.ne.jp (Hironori Kawai)
Date: Wed, 11 Sep 2002 19:13:20 +0900
Subject: GenBank indexing Trouble
In-Reply-To: <3D7DAD07.3000407@uk.lionbioscience.com>
References: <3D7DAD07.3000407@uk.lionbioscience.com>
Message-ID: <200209111012.g8BACTYn000007@smtp1.dti.ne.jp>

Thanks for uploading the new version quickly.
The problem I had reported does not occur in the new version.

But I would like to discuss another issue.
In my previous report, I mentioned the duplicate
ID 'AY071141'. The duplicate entries are shown below.
--------------------------------------------------
LOCUS AY071141 2622 bp
DEFINITION Drosophila melanogaster RE17910 full length cDNA.
ACCESSION AY071141

LOCUS AY071141 2958 bp
DEFINITION Drosophila melanogaster RE17910 full insert cDNA.
ACCESSION AY119119
---------------------------------------------------
Even if I use AY119119 with entret/seqret, the former entry is output.
I think this is dangerous, because it's difficult to notice that an
incorrect entry has been output.
In this case, I wish entret/seqret would output the latter entry, or
output no entry but a warning.
Best regards

Kawai

From peter.rice at uk.lionbioscience.com Wed Sep 11 06:29:41 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 11 Sep 2002 11:29:41 +0100
Subject: GenBank indexing Trouble
References: <3D7DAD07.3000407@uk.lionbioscience.com> <200209111012.g8BACTYn000007@smtp1.dti.ne.jp>
Message-ID: <3D7F1B15.70609@uk.lionbioscience.com>

Hironori Kawai wrote:
> Thanks for uploading the new version quickly.
> The problem I had reported does not occur in the new version.

Thanks.

> But I would like to discuss another issue.
> In my previous report, I mentioned the duplicate
> ID 'AY071141'. The duplicate entries are shown below.
> --------------------------------------------------
> LOCUS AY071141 2622 bp
> DEFINITION Drosophila melanogaster RE17910 full length cDNA.
> ACCESSION AY071141
>
> LOCUS AY071141 2958 bp
> DEFINITION Drosophila melanogaster RE17910 full insert cDNA.
> ACCESSION AY119119
> ---------------------------------------------------
> Even if I use AY119119 with entret/seqret, the former entry is output.
> I think this is dangerous, because it's difficult to notice that an
> incorrect entry has been output.
> In this case, I wish entret/seqret would output the latter entry, or
> output no entry but a warning.

This is a known problem. Only one AY071141 entry can be indexed, but
EMBOSS does not have control over which of the 2 (or more) entries will
be found first in the sorted index.

We save all information from the duplicates (accession, for example)
simply because we do not know which one to discard. So your search
results are 'correct'.

The root cause in EMBOSS is that the EMBLCD/Staden index format EMBOSS
uses can only have one unique ID for each entry. The solution is likely
to be a new EMBOSS index format.

The root cause in real life is databases that have duplicate IDs.
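The one-ID-per-entry limitation can be illustrated with a toy Python model. It is a simplification under stated assumptions, not the real EMBLCD/Staden file layout: the hypothetical index keeps an ID to entry-position map (so a duplicate ID is skipped; in this toy the first record seen wins, whereas in EMBOSS the winner depends on the sorted index) plus an accession to ID map that keeps the accessions of every record. Looking up AY119119 then returns the AY071141 record that was indexed. The hypothetical fetch_checked function sketches one way to do what Kawai asks for: raise an error instead of silently returning a record whose accession does not match.

```python
"""Toy model of an index that allows only one entry per ID.

Illustration only; real EMBLCD/Staden index files look different.
Assumptions: the index stores ID -> entry position and accession -> ID,
and the first record with a given ID wins.
"""

from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entry:
    locus: str      # LOCUS name, used as the unique ID
    accession: str  # primary ACCESSION
    length: int     # sequence length in bp


Index = Tuple[Dict[str, int], Dict[str, str], List[str]]


def build_index(entries: List[Entry]) -> Index:
    """Return (id_to_pos, acc_to_id, skipped_ids)."""
    id_to_pos: Dict[str, int] = {}
    acc_to_id: Dict[str, str] = {}
    skipped: List[str] = []
    for pos, entry in enumerate(entries):
        if entry.locus in id_to_pos:
            skipped.append(entry.locus)  # cf. "Warning: Duplicate ID skipped"
        else:
            id_to_pos[entry.locus] = pos
        # Accessions of every record are kept and point at the shared ID.
        acc_to_id[entry.accession] = entry.locus
    return id_to_pos, acc_to_id, skipped


def fetch(entries: List[Entry], index: Index, acc: str) -> Entry:
    """Accession -> ID -> the single entry indexed under that ID."""
    id_to_pos, acc_to_id, _ = index
    return entries[id_to_pos[acc_to_id[acc]]]


def fetch_checked(entries: List[Entry], index: Index, acc: str) -> Entry:
    """Like fetch, but refuse to return a record with a different accession."""
    entry = fetch(entries, index, acc)
    if entry.accession != acc:
        raise LookupError(
            f"{acc} shares ID {entry.locus} with {entry.accession}; "
            "the entry for this accession was not indexed")
    return entry


# The two GenBank 130 records discussed in this thread.
ENTRIES = [
    Entry("AY071141", "AY071141", 2622),
    Entry("AY071141", "AY119119", 2958),
]
INDEX = build_index(ENTRIES)
```

In this model fetch(ENTRIES, INDEX, "AY119119") returns the 2622 bp AY071141 record, which matches the behaviour Kawai reports, while fetch_checked raises LookupError for the same accession.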
Surprising that it can happen in GenBank.

regards,

Peter

From tchiang at bioinfo.sickkids.on.ca Wed Sep 11 11:51:38 2002
From: tchiang at bioinfo.sickkids.on.ca (Ted Chiang)
Date: Wed, 11 Sep 2002 11:51:38 -0400 (EDT)
Subject: counting fasta files

Hello,

I have a file that contains several hundred FASTA sequences. Is there
a function/program that will count the number of sequences in this file
and report it?

-Ted

=====================================
Ted Chiang, Analyst
Centre for Computational Biology
Hospital for Sick Children, Toronto
416.813.7028
tchiang at bioinfo.sickkids.on.ca
=====================================

From jason at cgt.mc.duke.edu Wed Sep 11 11:52:19 2002
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Wed, 11 Sep 2002 11:52:19 -0400 (EDT)
Subject: counting fasta files

% grep "^>" file.fa | wc -l

On Wed, 11 Sep 2002, Ted Chiang wrote:

>
> Hello,
>
> I have a file that contains several hundred FASTA sequences. Is there
> a function/program that will count the number of sequences in this file
> and report it?
>
> -Ted

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From chenna at embl-heidelberg.de Wed Sep 11 12:02:53 2002
From: chenna at embl-heidelberg.de (Ramu Chenna)
Date: Wed, 11 Sep 2002 18:02:53 +0200
Subject: counting fasta files

grep '>' myfasta.file | wc

ramu

On Wed, 11 Sep 2002, Ted Chiang wrote:

>
> Hello,
>
> I have a file that contains several hundred FASTA sequences. Is there
> a function/program that will count the number of sequences in this file
> and report it?
>
> -Ted

From mathog at mendel.bio.caltech.edu Wed Sep 11 12:05:20 2002
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Wed, 11 Sep 2002 09:05:20 -0700
Subject: counting fasta files

> I have a file that contains several hundred FASTA sequences. Is there
> a function/program that will count the number of sequences in this file
> and report it?

Try this:

ftp://saf.bio.caltech.edu/pub/software/molbio/fastaproperties.c

Reads a FASTA file (filename from arg 1, or stdin if that's "-") and
emits one status line to stdout, which is:

N M TYPE MINLEN MAXLEN AVELEN

where
N is the number of sequences in the file
M is the total number of bp/aa in the file (over all sequences)
TYPE is P or N, the best guess for protein or nucleic acid. If it can't
tell, it will emit P.
MINLEN MAXLEN

1.0.2 11-JUL-2002, DRM. Added the statistics at the end of the line.
1.0.1 22-MAY-2002, DRM. Revised count of bases so that it doesn't mess
up on counts > int4 range. Use of long long is an extension to older
ANSI C standards.

Regards,

David Mathog

From fernan at iib.unsam.edu.ar Wed Sep 11 12:24:47 2002
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Wed, 11 Sep 2002 13:24:47 -0300
Subject: counting fasta files
Message-ID: <20020911162447.GB34980@iib.unsam.edu.ar>

+----[ Thus spoke Ted Chiang (tchiang at bioinfo.sickkids.on.ca):
|
| I have a file that contains several hundred FASTA sequences. Is there
| a function/program that will count the number of sequences in this file
| and report it?
|
+----]

Here's another one:

cat file.fasta | grep -c \>

--
F e r n a n   A g u e r o
http://genoma.unsam.edu.ar/~fernan

From flo at ebi.ac.uk Wed Sep 11 13:32:52 2002
From: flo at ebi.ac.uk (Florence Servant)
Date: Wed, 11 Sep 2002 17:32:52 +0000
Subject: counting fasta files
References: <20020911162447.GB34980@iib.unsam.edu.ar>
Message-ID: <3D7F7E43.6E148028@ebi.ac.uk>

Fernan Aguero wrote:

> +----[ Thus spoke Ted Chiang (tchiang at bioinfo.sickkids.on.ca):
> |
> | I have a file that contains several hundred FASTA sequences. Is there
> | a function/program that will count the number of sequences in this file
> | and report it?
> |
> +----]
>
> Here's another one:
>
> cat file.fasta | grep -c \>
>

Hi all,

I would suggest adding a constraint, which is that the line must start
with >:

cat file.fasta | grep -c ^\>

To be really sure it works for all the FASTA files you can have, you
also have to take into account that several comment lines starting
with > can occur for a single sequence:

ggrep -A 1 ^\> file.fasta | grep -v ^-- | grep -v ^\> | wc -l

Flo

--
Florence SERVANT
EBI - European Bioinformatics Institute - Room A2-40
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom
Tel : (+44) 01223 494 686

From crowdy at bioinfo.sickkids.on.ca Wed Sep 11 15:05:21 2002
From: crowdy at bioinfo.sickkids.on.ca (Edgar Crowdy)
Date: Wed, 11 Sep 2002 15:05:21 -0400
Subject: Y and N args
Message-ID: <3D7F93F1.476514DB@bioinfo.sickkids.on.ca>

Hi,

I have a question about the standards for yes/no options in EMBOSS programs.

dotpath works as I expect it to. There are three Y/N parameters: "data",
"overlap" and "boxit". On the command line I represent them like this:

-nodata -overlaps -boxit

(meaning 'no' to the first and 'yes' to the last two). In other words,
the command line switch on its own is a 'yes', and to say 'no' you
prepend 'no' to the switch.

I'm having trouble with a program called cirdna.
In that program I can use "-noticklines" and "-nointersymbol" but I can't
express a 'yes' by saying -ticklines or -intersymbol as in other programs.
It only runs if I say '-intersymbol Y', for example.

Is cirdna an exception to the rule, or are there other programs?
Will placing a Y or N after all Y/N parameters work? I need a standard
that works for all EMBOSS programs with Y/N parameters.

--
Edgar Crowdy, Programmer
crowdy at bioinfo.sickkids.on.ca
Centre for Computational Biology,
Hospital for Sick Children, Toronto

From haruna at sgi.com Wed Sep 11 23:23:07 2002
From: haruna at sgi.com (Haruna Cofer)
Date: Wed, 11 Sep 2002 23:23:07 -0400
Subject: SGI Porting Notes for EMBOSS 2.5.1
References: <200209101203.NAA10569@bromine.hgmp.mrc.ac.uk>
Message-ID: <3D80089B.57DA99AF@sgi.com>

Hello --

I have updated my SGI porting notes for the latest 2.5.1 release of
EMBOSS, EMBASSY, and Jemboss:

http://www.sgi.com/industries/sciences/chembio/resources/emboss/

Enjoy! ;)

-- Haruna :)

--
Haruna N. Cofer
Silicon Graphics Inc.
ChemPharm Applications

From peter.rice at uk.lionbioscience.com Thu Sep 12 05:31:12 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Thu, 12 Sep 2002 10:31:12 +0100
Subject: Y and N args
References: <3D7F93F1.476514DB@bioinfo.sickkids.on.ca>
Message-ID: <3D805EE0.4080303@uk.lionbioscience.com>

Edgar Crowdy wrote:

> I'm having trouble with a program called cirdna. In that program I can
> use "-noticklines" and "-nointersymbol" but I can't express a 'yes' by
> saying -ticklines or -intersymbol as in other programs. It only runs if I
> say '-intersymbol Y', for example.
>
> Is cirdna an exception to the rule or are there other programs?
> Will placing a Y or N after all Y/N parameters work?

Ugh!!!!!

cirdna is using strings for ticklines and intersymbol. That is why the
value is required.

This needs to be fixed - that part is easy. Simply make cirdna use
boolean options as it should.

More worrying is that -nointersymbol was allowed.
That really must be only for booleans. For example, seqret -nooutfile
will write to a file called "N".

Not too surprising that nobody has tried this before :-)

The boolean testing code is in the right function. Should be easy to
modify.

regards,

Peter Rice

From areagp61 at yahoo.it Thu Sep 12 10:39:04 2002
From: areagp61 at yahoo.it (Graziano P.)
Date: Thu, 12 Sep 2002 16:39:04 +0200
Subject: emma
Message-ID: <015201c25a6a$26427d00$18105709@italy.ibm.com>

Hi,

I am trying to use the following emma options: -norgap and -nopercent,
but when I write

emma sequence.fasta -norgap

the program returns the following error message:

EMBOSS An error in ajacd.c at line 11262:
unknown qualifier -norgap

The same happens if I write

emma sequence.fasta -nopercent

EMBOSS An error in ajacd.c at line 11262:
unknown qualifier -nopercent

I have looked in the emma.acd file and these two qualifiers exist!!
The lines in the emma.acd file for these two options are:

bool: norgap [
  optional: "$(prot)"
  default: "No"
  information: "No residue specific gaps"
  help: "'Residue specific penalties' are amino acid specific gap penalties that ......"
]

bool: nopercent [
  optional: "@(!$(slow))"
  default: "No"
  information: "Fast pairwise alignment: similarity scores: suppresses percentage score"
]

Has anyone already found this problem?

Thanks

Graziano
-----------------------------------------------
Graziano Pappad?
areagp61 at yahoo.it
----------------------------------------------

From peter.rice at uk.lionbioscience.com Thu Sep 12 10:51:08 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Thu, 12 Sep 2002 15:51:08 +0100
Subject: emma
References: <015201c25a6a$26427d00$18105709@italy.ibm.com>
Message-ID: <3D80A9DC.5050105@uk.lionbioscience.com>

Graziano P. wrote:
> I am trying to use the following emma options: -norgap and -nopercent,
> but when I write
>
> emma sequence.fasta -norgap
>
> the program returns the following error message:
>
> EMBOSS An error in ajacd.c at line 11262:
> unknown qualifier -norgap
>
> The same happens if I write
>
> emma sequence.fasta -nopercent
>
> EMBOSS An error in ajacd.c at line 11262:
> unknown qualifier -nopercent
>
> I have looked in the emma.acd file and these two qualifiers exist!!

Oops. We are working on it.

"no" in front of a qualifier is automatically removed, and the value is
set to "N". It should check (a) that the qualifier is boolean and (b)
that the name is correct.

Qualifiers should not begin with "no". For emma, -nopercent and -norgap
will need new names to work correctly. We should make them -percent and
-rgap :-)

> Has anyone already found this problem?

This is the same as the problem reported this morning ... before today,
nobody had reported it!!!

Peter

From shay at bioinfo.sickkids.on.ca Fri Sep 13 13:22:29 2002
From: shay at bioinfo.sickkids.on.ca (Shayanthan Parameswaran)
Date: Fri, 13 Sep 2002 13:22:29 -0400
Subject: GenBank indexing Trouble (fwd)
Message-ID: <3D821ED5.F36F50BE@bioinfo.sickkids.on.ca>

To all,

We installed EMBOSS 2.5.1 and indexed GenBank 131 with the GB format
option, using the new dbiflat that corrected the error of incorrect
entry retrieval.
We tried the new REFSEQ option in dbiflat to index refseq, however, the error that was fixed in the dbiflat GB option does not seem to be fixed in the REFSEQ format option. Seqret retrieves the entry NM_066922.1 instead of NM_066918. Has anyone else experienced this error with the REFSEQ format option? Shay > > Date: Tue, 10 Sep 2002 13:03:42 +0100 (BST) > From: ableasby at hgmp.mrc.ac.uk > To: emboss at hgmp.mrc.ac.uk > Subject: EMBOSS 2.5.1 released > > This release fixes problems associated with non-unique identifiers > in some databases (e.g. REFSEQ). Note that there is now a specific > indexing option for that database in dbiflat. > > Alan > > > Date: Tue, 10 Sep 2002 12:33:53 +0900 > From: "[ISO-2022-JP] 河合宏紀" > To: emboss at embnet.org > Subject: GenBank indexing Trouble > > Hello > > I'm using EMBOSS package. I appreciate developers' efforts. > Unfortunately, I found a trouble when I indexed GenBank 130 and > called it with entret/seqret. > > First of all, I made index for all files of GenBank 130 (except > EST,GSS,HTG) described below. > -------------------------------------- > % /usr/local/EMBOSS/2.5.0/bin/dbiflat > Index a flat file database > EMBL : EMBL > SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew > GB : Genbank, DDBJ > Entry format [SWISS]: GB > Database directory [.]: > Wildcard database filename [*.dat]: *.seq > Database name: GB > Release number [0.0]: > Index date [00/00/00]: > Warning: Duplicate ID skipped: 'AY071141' > -------------------------------------- > > When I called L11995 with "entret gb:L11995", I got the incorrect entry > whose accession is M20152. And I tried to get gb:M20152, I got M20153. > These three entries exist on the gbrod3.seq file sequentially. This > trouble does not occur when I called entries whose 'LOCUS' and > 'ACCESSION' fields are identical (e.g.BC003860). 
Because this trouble > occurs with dbiflat in version 2.4.1 or 2.5.0 but does not in 2.3.1, I'm > now using EMBOSS 2.3.1 for only dbiflat/dbifasta, and 2.4.1 for other > programs (entret/seqret and so on). > > My hypothesis of this trouble is described below. > I focused on the duplicate ID AY071141 and I removed one AY071141entry > (from gbinv4.seq file). > In this case, I could get correct entries. > When dbiflat finds duplicate ID to be skipped, I guess, the index counter > of LOCUS and ACCESSION should be increased (or decreased). But in this > version, ONLY LOCUS counter would be increased (or decreased) and > ACCESSION's one would not be increased (or decreased). > > I hope my report will be helpfull for developers. > > Best regards > > Kawai -- Shayanthan Parameswaran Bioinformatics Supercomputing Centre Programmer (416) 813-8030 555 University Avenue email: shay at bioinfo.sickkids.on.ca The Hospital for Sick Children http: www.bioinfo.sickkids.on.ca Toronto, ON, M5G 1X8, CANADA. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/emboss/attachments/20020913/f5914740/attachment.html From jison at hgmp.mrc.ac.uk Mon Sep 16 12:14:04 2002 From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison) Date: Mon, 16 Sep 2002 17:14:04 +0100 Subject: requestion for Emboss Demo version References: <20020916125719.36123.qmail@web13805.mail.yahoo.com> Message-ID: <3D86034B.103507D1@hgmp.mrc.ac.uk> Kuzhali - - please provide more details of your problem. One of us might be able to help then. Cheers J. kuzhali subbiah wrote: > hello sir > i need a Demo version of emboss,& i ve got a doubt > sir ,i ve downloaded the software & > database(swissprot) > ,will u plz help me how to link the software & > database sir,it will be very helpfull if u provide me > the informations.i request u to send the informations > as soon as possible. 
> > kuzhali > From shilpahalesh at yahoo.co.in Tue Sep 17 01:34:53 2002 From: shilpahalesh at yahoo.co.in (shilpa halesh) Date: Tue, 17 Sep 2002 06:34:53 +0100 (BST) Subject: info needed Message-ID: <20020917053453.67174.qmail@web8004.mail.in.yahoo.com> hello sir, i am PG student doing medical software course.we are required to work on bioinformatics tools.it is given as an assignment.i have selected emboss as the topic.we are asked to understand the source code and to change it another compatible language. i have downloaded the emboss software,the database,i have downloaded the tutorial also,but i am not able to understand how to link the software with the database.i request you to please guide me so that i can understand the source code in a proper way.i hope you would consider my request.my mail id is shilpahalesh at yahoo.co.in shilpa From jison at hgmp.mrc.ac.uk Tue Sep 17 03:06:55 2002 From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison) Date: Tue, 17 Sep 2002 08:06:55 +0100 Subject: info needed References: <20020917053453.67174.qmail@web8004.mail.in.yahoo.com> Message-ID: <3D86D48F.832F053D@hgmp.mrc.ac.uk> If it's a sequence database then some of the functions you'll need are in the "Sequence reading and writing" AJAX library files:

ajseq       General sequence handling
ajseqdata   Sequence data types
ajseqdb     Sequence database access
ajseqread   Sequence reading
ajseqtype   Sequence types
ajseqwrite  Sequence writing

Links to them are from: http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Doc/Ajax/index.html J. shilpa halesh wrote: > hello sir, > i am PG student doing medical software course.we are > required to work on bioinformatics tools.it is given > as an assignment.i have selected emboss as the > topic.we are asked to understand the source code and > to change it another compatible language.
> i have downloaded the emboss software,the database,i > have downloaded the tutorial also,but i am not able to > understand how to link the software with the > database.i request you to please guide me so that i > can understand the source code in a proper way.i hope > you would consider my request.my mail id is > shilpahalesh at yahoo.co.in > > shilpa -- Jon C. Ison, PhD Bioinformatics Applications Group UK MRC Human Genome Mapping Project Resource Centre Hinxton, Cambridge, CB10 1SB, UK E-mail : jison at hgmp.mrc.ac.uk Tel : 01223 49-4548 HGMP-RC: http://www.hgmp.mrc.ac.uk/ EMBOSS : http://www.hgmp.mrc.ac.uk/Software/EMBOSS/ CCP11 : http://www.hgmp.mrc.ac.uk/CCP11/ From gbottu at ben.vub.ac.be Tue Sep 17 03:50:05 2002 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Tue, 17 Sep 2002 09:50:05 +0200 (CEST) Subject: -sbegin -send blues Message-ID: <200209170750.JAA1152221@black.vub.ac.be> from : BEN Dear colleagues, I just installed Staden and its EMBOSS interface. I noted a few problems and sent a bug report, with among other things the following : Why is it that sometimes you can make a program operate only on a range of the sequence by filling in the "Start position" "End position" boxes, but sometimes not. An example of a program that operates on the complete sequence no matter what you do is octanol. I received as reply : All EMBOSS programs that take a -sequence option should also accept -sbegin and -send options for indicating the range, but it appears some do not. I view this as an EMBOSS bug (in Octanol). There doesn't appear to be any component of the ACD file to indicate whether sbegin and send are valid options, and indeed octanol on the command line is happy to accept (and ignore) them. What a surprise ! I had not thought of testing at the command line.
Does someone know why it is that some EMBOSS programs behave in this nonstandard way ? Sincerely, Guy Bottu From ableasby at hgmp.mrc.ac.uk Tue Sep 17 04:33:36 2002 From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk) Date: Tue, 17 Sep 2002 09:33:36 +0100 (BST) Subject: -sbegin -send blues Message-ID: <200209170833.JAA02474@bromine.hgmp.mrc.ac.uk> >Does someone know why it is that some EMBOSS programs behave in this >nonstandard way ? Yes. Usually an oversight by the author of an application. It is usually fixed now by one simple call to ajSeqTrim(). Historically functions such as ajSeqBegin() and ajSeqEnd() had to be used and some confusion was possibly caused. It's either that or incompetence :-) Cheers Alan From thierry.jaccaud at wanadoo.fr Wed Sep 18 16:31:16 2002 From: thierry.jaccaud at wanadoo.fr (Thierry JACCAUD) Date: Wed, 18 Sep 2002 22:31:16 +0200 Subject: L'Ecologiste no. 7 is out! Message-ID: For the attention of emboss at embnet.org Hello, I am pleased to send you below a presentation of issue no. 7 of L'Ecologiste, published at the end of June and still available. The next issue of L'Ecologiste will appear at the beginning of October. Happy reading! Yours sincerely, Thierry Jaccaud, editor-in-chief. Issue no. 7 feature: how to feed humanity? 84 colour pages, at newsstands or by order, 6 euros. Detailed contents at www.ecologiste.org * GMOs, hunger and the Académie des sciences: read an exceptional contribution by Jean-Pierre Berlan, research director at INRA, addressed to all members of the Académie. * A 48-page central feature: how to feed humanity? The false solutions? Development, free trade, a certain form of food aid... advocated by the major international institutions, these solutions appear rather to be essential causes of hunger. The real solutions?
Land reform, soil protection, traditional seeds, small diversified farms whose productivity is higher than that of agribusiness, as Vandana Shiva shows, and a challenge to the meat-heavy diets of Westerners. The solutions exist, even with the population increase usually forecast for 2050. The feature concludes with detailed studies of Cuba, Poland and the Philippines. * Also in issue no. 7: The new Minister of the Environment says nuclear power is the least polluting of industries? L'Ecologiste publishes an assessment of Chernobyl by Corinne Castanier, director of Criirad. New members of parliament have just been elected, but their legislative power is in practice threatened by WTO rules, argue Agnès Bertrand and Laurence Kalafatidès. Also in this issue: articles on grey whales endangered by Esso off the Russian coast, pesticides and the European Union, the example of an ecovillage in the Alps, a report on the conference on post-development, and editorials on the Cree Indians, the climate and dioxins. Also featuring reviews of magazines and books, and Alain Hervé's column! ********************************************************** "At last, the French edition of The Ecologist!" Le Monde "A benchmark title" Libération. "A great magazine is born" Politis "Excellent" Le Monde diplomatique Price per issue 6 euros (postage included). One-year subscription, four issues: 22.50 euros. Two-year subscription, eight issues: 43 euros. L'Ecologiste, 25, rue de Fécamp - 75012 Paris. Tel 01 46 28 70 32 Fax 01 43 47 03 38. New e-mail: contact at ecologiste.org - Website: www.ecologiste.org Publication director: Teddy Goldsmith. Editor-in-chief: Thierry Jaccaud. Available at newsstands in France, Belgium, Canada, Switzerland, Luxembourg and the main French-speaking countries.
If the newsagent has sold out, they can still order a copy for you (restock available, NMPP code: 1848). *********************************************************** If you no longer wish to receive these e-mails, simply reply with "unsubscribe" in the subject line. These messages are sent roughly once every three months. *********************************************************** N.B. Teddy Goldsmith's book "Le défi du XXIème siècle" has just been reissued by Éditions du Rocher under the title "Le Tao de L'Ecologie", with a revised and corrected text and bibliography. "Read this book and you will never see the world the same way again!" Jean-Marie Pelt. Available from L'Ecologiste, 23 euros, postage and packing included. From mad at biol.unlp.edu.ar Thu Sep 19 05:22:01 2002 From: mad at biol.unlp.edu.ar (Martin Sarachu) Date: Thu, 19 Sep 2002 12:22:01 +0300 Subject: emboss on openMosix Message-ID: <3D899739.7A61160F@biol.unlp.edu.ar> Hi, anybody knows if EMBOSS runs with openMosix clusters? Does EMBOSS apps use shared memory? Cheers, martin -- Martin Sarachu mad at biol.unlp.edu.ar EMBnet Argentina http://www.ar.embnet.org From ableasby at hgmp.mrc.ac.uk Thu Sep 19 12:15:52 2002 From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk) Date: Thu, 19 Sep 2002 17:15:52 +0100 (BST) Subject: emboss on openMosix Message-ID: <200209191615.RAA02367@bromine.hgmp.mrc.ac.uk> We haven't tried OpenMosix here. As far as shared memory is concerned, not the IPC kind (i.e. no shm) only shared libraries. Alan From Wiepert.Mathieu at mayo.edu Fri Sep 20 13:15:20 2002 From: Wiepert.Mathieu at mayo.edu (Wiepert, Mathieu) Date: Fri, 20 Sep 2002 12:15:20 -0500 Subject: report formats (-rformat) Message-ID: <2F41CC6C9777D311ACBD009027B108EA03EAFF1D@excsrv32.mayo.edu> Hi, I was wondering how -rformat works. I was doing a fuzzpro search, and got formats with many header lines.
The output was

SeqName     Start  End  Score  Mismatch
NU5M_RHIUN  150    157  8      .

SeqName     Start  End  Score  Mismatch
NU5M_FELCA  150    157  8      .

SeqName     Start  End  Score  Mismatch
TBB2_NEIMB  572    579  8      .
...

But I was hoping for

SeqName     Start  End  Score  Mismatch
NU5M_RHIUN  150    157  8      .
NU5M_FELCA  150    157  8      .
TBB2_NEIMB  572    579  8      .
...

fuzzpro -sequence swissprot:* -rformat -pattern [MF]-[LS]-[F]-[LPA]-[GL]-X-[GAY]-X -mismatch 0 -outfile fuzzpro.excel

Did I do something wrong, or are my expectations wrong. It wasn't too clear to me from the formats page how that might work. If this has been asked 100 times before, sorry! Is there an archive I can search? Googling didn't come up with much. -Mat From lenaganesh2k2 at yahoo.com Tue Sep 24 06:34:30 2002 From: lenaganesh2k2 at yahoo.com (LENA GANESH) Date: Tue, 24 Sep 2002 03:34:30 -0700 (PDT) Subject: Pls suggest to get Postscript o/p perfectly in Perl In-Reply-To: <20020923153739.66796.qmail@web12905.mail.yahoo.com> Message-ID: <20020924103430.1099.qmail@web12906.mail.yahoo.com> Sir, I'm Developing Perl program to run the emboss commands in Web Browser. when i'm trying this command in Pel ***************** $cmd="/emboss/bin/dottup F1.TXT F2.TXT -wordsize 4 -graph ps"; system($cmd); ***************** I'm getting only ps script file, but not dotted image.If i run same this command in shell, i'm getting the result perfectly. Is there any alternate way? Pls suggests.
From simon.andrews at bbsrc.ac.uk Tue Sep 24 07:31:31 2002 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Tue, 24 Sep 2002 12:31:31 +0100 Subject: Pls suggest to get Postscript o/p perfectly in Perl Message-ID: <2DC41140A89ED411989D00508BDCD9ED01E286FC@bi-exsrv1.iapc.bbsrc.ac.uk> From: LENA GANESH [mailto:lenaganesh2k2 at yahoo.com] Subject: Pls suggest to get Postscript o/p perfectly in Perl > I'm Developing Perl program to run the emboss commands in Web Browser. > when i'm trying this command in Pel > ***************** > $cmd="/emboss/bin/dottup F1.TXT F2.TXT -wordsize 4 -graph ps"; > system($cmd); > ***************** > I'm getting only ps script file, but not dotted image.If i run same this > command in shell, i'm getting the result perfectly. Is there any alternate > way? I'm not sure what you mean here. If you say you're getting the dottup.ps file created, then that IS the 'dotted image', it's just in postscript format, which requires that you open it in a suitable viewer. If you want to display the output of dottup on a web page then postscript format is not much help as most browsers won't know what to do with it. You'd be better off using PNG format. Unfortunately there doesn't seem to be a way to make dottup output its PNG to STDOUT, so you'll have to let it create dottup.1.png, then read it and pass it back to the browser. Something like this should get you started. Just put it in your cgi-bin directory, and alter the path on line 5 to wherever your F1 and F2 files are situated. You will also need to ensure that the user your web server runs under has permission to create files in that directory.
Watch for long lines being wrapped:
#########################################################################
#!/usr/bin/perl -w
use strict;
use CGI::Carp qw(fatalsToBrowser);
$ENV{'PLPLOT_LIB'}='/emboss/share/EMBOSS';
chdir('/wherever/your/files/are') || die "Couldn't change directory: $!";

# Remove any PNG left over from an earlier request, so the -e test
# below only succeeds if this run of dottup produced output
unlink('dottup.1.png');

system ('/emboss/bin/dottup F1.TXT F2.TXT -wordsize 4 -data 0 -auto -graph png > /dev/null');

unless (-e 'dottup.1.png') {
  die "No output created by dottup - check web server error logs";
}

# binmode() is harmless on Unix and needed on non-Unix systems,
# so the binary PNG data reaches the browser unaltered
open (PNG,'dottup.1.png') || die "Couldn't read PNG file: $!";
binmode(PNG);
binmode(STDOUT);
print "Content-type: image/png\n\n";
print while (<PNG>);
close(PNG);
##########################################################################
If you are just trying to make a general EMBOSS interface then are you aware that there are several in existence already? Have a look at http://www.hgmp.mrc.ac.uk/Software/EMBOSS/interfaces.html for a list of them. Hope this helps Simon. From David.Bauer at SCHERING.DE Tue Sep 24 07:38:55 2002 From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE) Date: Tue, 24 Sep 2002 13:38:55 +0200 Subject: Pls suggest to get Postscript o/p perfectly in Perl Message-ID: Hello Lena, if you want to use emboss via web you should have a look at Luke's interface: http://www.cbr.nrc.ca/EMBOSS/ The interface is built in perl as perl modules. It's easy to set up and you can use the modules from your own perl cgi-scripts. David.
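Kawai's GenBank indexing report (quoted in Shay Parameswaran's forwarded message above) guesses that when dbiflat skips a duplicate ID, one internal counter advances and the other does not, so accession lookups land on the following entry. The Python toy below reproduces that symptom under stated assumptions. It is not dbiflat's code or index file layout: `build_index`, `lookup`, the `LOCUS_*` names and the idea of looking up by ID first and accession second are all hypothetical, and the records are made-up stand-ins (the real AY071141 duplicate was in gbinv4.seq, not next to these entries).

```python
# Toy records (entry_id, accession, text) in file order; LOCUS_A/B/C are
# made-up IDs so that these entries can only be found via their accession
TOY_ENTRIES = [
    ("AY071141", "AY071141", "entry AY071141 (first copy)"),
    ("AY071141", "AY071141", "entry AY071141 (second copy)"),
    ("LOCUS_A", "L11995", "entry L11995"),
    ("LOCUS_B", "M20152", "entry M20152"),
    ("LOCUS_C", "M20153", "entry M20153"),
    ("BC003860", "BC003860", "entry BC003860"),
]


def build_index(entries, buggy):
    """Index records; rows stands in for a table of entry positions.

    id_index and acc_index both point into rows by row number.
    """
    rows = []
    id_index = {}
    acc_index = {}
    entries_read = 0  # counts every entry, including skipped duplicates
    for entry_id, accession, text in entries:
        entries_read += 1
        if entry_id in id_index:
            continue  # "Duplicate ID skipped": no row is written for it
        rows.append(text)
        id_index[entry_id] = len(rows) - 1
        # Correct: the row just written. Buggy (the hypothesis): a counter
        # that also counted the skipped duplicate, so it runs one ahead
        acc_index[accession] = (entries_read - 1) if buggy else (len(rows) - 1)
    return rows, id_index, acc_index


def lookup(index, name):
    """Toy retrieval: try the ID table first, then the accession table."""
    rows, id_index, acc_index = index
    row = id_index.get(name, acc_index.get(name))
    if row is None or row >= len(rows):
        return None
    return rows[row]
```

With `buggy=True`, `lookup(index, "L11995")` returns the M20152 record and `"M20152"` returns the M20153 record, the pattern Kawai saw, while `"BC003860"` (LOCUS identical to ACCESSION) is found correctly through the ID table. With `buggy=False` every accession finds its own entry. The thread announces EMBOSS 2.5.1 as fixing the duplicate-identifier problem; whether the actual fix matches this model is not stated.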
From scop at mrc-lmb.cam.ac.uk Fri Sep 27 05:22:39 2002 From: scop at mrc-lmb.cam.ac.uk (Scop authors) Date: Fri, 27 Sep 2002 10:22:39 +0100 (BST) Subject: position available to work on SCOP Message-ID: Medical Research Council- Centre for Protein Engineering Research Position to work on Structural Classification of Proteins [Ref:CPE/802/11] Applications are invited for a scientific programmer position to work with Dr.Alexey Murzin on further development of the SCOP (Structural Classification of Protein) database. The SCOP database: scop.mrc-lmb.cam.ac.uk/scop is a widely used resource for the investigation of protein structures and sequences. The successful applicant will be responsible for collaboration with other public databases on co-ordination, integration and distribution of sequence and structural family data. He or she will develop tools to support the annotation of SCOP entries and will contribute to the extension of SCOP functionalities in close collaboration with the other members of the SCOP team. The successful applicant will also have the opportunity to carry out independent research on novel computational techniques or protein evolution, according to his or her interests. These tasks require a broad computer science background and strong programming skills. The ideal candidate will be able to tackle a variety of problems, quickly learn new tools, design and implement solutions as needed. He or she will also have some specific competences at the interface of computer science and biology, together with the ability to work in an interdisciplinary area and a genuine interest for biology in its modern, computerized form. Desirable experience includes some background in computational molecular biology and in the management of large amount of partially unstructured data. Commitment to both data and software quality is indispensable and ability to work in a team environment essential. 
If you think you fit the job description and requirements, you are encouraged to apply even if you fail short of some of the ideal candidate's attributes, but please consider that in order to be successful you must have demonstrated strong computational skills and a genuine interest in biology. For further information please contact us at scop at mrc-lmb.cam.ac.uk. You can also visit the CPE website at: www.mrc-cpe.cam.ac.uk -------------------------------------------------------------------------- This position is grant funded up to 5 years and will be to MRC pay band 4 with a starting salary in the range of 20,625 to 24,750 pounds per annum, depending on qualifications and experience. This post also attracts a scientific supplement of 9% per annum. Applicants should include a full CV, covering letter and details of three professional referees who can be contacted prior to interview. Please quote the relevant job reference CPE/802/11 and email to recruit at mrc-lmb.cam.ac.uk or post to: Kelly Andrews, Personnel Assistant, MRC Centre, Hills Road, Cambridge, CB2 2QH, UK. We are not able to guarantee considering candidates who apply after 1 December 2002. -------------------------------------------------------------------------- `Leading Science for Better Health' The Medical Research Council is an Equal Opportunities Employer and operates a strict no smoking policy. -------------------------------------------------------------------------- From dmerberg at Phylos.com Mon Sep 30 11:59:49 2002 From: dmerberg at Phylos.com (David Merberg) Date: Mon, 30 Sep 2002 11:59:49 -0400 Subject: Setting graphic size in EMBOSS Message-ID: <7E2FACE572C5D61180F300A0C9E97E8F03D6FA@NTSERVER1> Hi all, Is it possible to set the size of graphic output in EMBOSS? For example, I'd like to see the output of ABIview in one long image that could be scrolled horizontally. Thanks, David Merberg Phylos, Inc. 
128 Spring Street Lexington, MA 01720 From tmargus at ebc.ee Thu Sep 5 16:28:19 2002 From: tmargus at ebc.ee (Tõnu Margus) Date: Thu, 5 Sep 2002 19:28:19 +0300 Subject: how to specify seq parameters? Message-ID: <002e01c254f9$3f0b72e0$1e1728c1@ebc.ee> Hi, It is simple question, but I didn't find an answer from documentation. How to tell dotmatcher or whatever an other program in command line, that it will read sequence in reverse/complement? If to run it with an option -sask then it will ask, but from command line? Help tells db:seq1 (Parameter1) db:seq2 (Parameter2) where parameters are optional. I can't find from documentation how to specify "parameters". Thanks in Advance, Tõnu Margus From peter.rice at uk.lionbioscience.com Thu Sep 5 16:35:14 2002 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Thu, 05 Sep 2002 17:35:14 +0100 Subject: how to specify seq parameters? References: <002e01c254f9$3f0b72e0$1e1728c1@ebc.ee> Message-ID: <3D7787C2.3020108@uk.lionbioscience.com> Tõnu Margus wrote: > It is simple question, but I didn't find an answer from documentation. > > How to tell dotmatcher or whatever an other program in command line, > that it will read sequence in reverse/complement? > If to run it with an option -sask then it will ask, but from command line? > Help tells db:seq1 (Parameter1) db:seq2 (Parameter2) > where parameters are optional. You have many options!!! To reverse the first sequence:

% dotmatcher db:seq1 -sreverse
% dotmatcher -sreverse1 db:seq1
% dotmatcher -sequencea_sreverse db:seq1
% dotmatcher db:seq1[::r]

Note: -sreverse is an "associated qualifier" (dotmatcher -help -verbose lists them). Associated qualifiers, specified as a simple qualifier such as '-sreverse', apply to the previous true qualifier, so -sreverse after db:seq1 applies only to db:seq1. If you put them at the start of the line, they apply to all sequences.
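The four equivalent command lines above, and the rule that a bare associated qualifier binds to the preceding sequence (or to all sequences when it comes first), can be modelled with a short Python sketch. This is not EMBOSS's ACD parser: it handles only positional sequence arguments and -sreverse in its three spellings, the `parse_sequences` name is hypothetical, "sequenceb" as dotmatcher's second parameter name is an assumption (the reply only confirms "sequencea"), and the `[begin:end:r]` pattern follows the USA range extension Peter describes in the rest of his reply.

```python
import re

# Assumed parameter names for a two-sequence program such as dotmatcher.
# "sequencea" is confirmed in the thread; "sequenceb" is an assumption.
PARAMS = ["sequencea", "sequenceb"]

# USA with the range extension: name[begin:end] or name[begin:end:r]
USA_RANGE = re.compile(r"(?P<usa>[^\[]+)\[(?P<begin>\d*):(?P<end>\d*)(?P<rev>:r)?\]")


def parse_sequences(argv, params=PARAMS):
    """Return {parameter: settings} for a simplified dotmatcher-style argv."""
    seqs = {p: {"usa": None, "sbegin": None, "send": None, "sreverse": False}
            for p in params}
    reverse_all = False  # bare -sreverse seen before any sequence
    last = None          # most recent sequence parameter on the line
    n_seen = 0           # positional sequences consumed so far
    for arg in argv:
        if not arg.startswith("-"):
            # Positional values fill the sequence parameters in order
            last = params[n_seen]
            n_seen += 1
            m = USA_RANGE.fullmatch(arg)
            if m:
                seqs[last]["usa"] = m["usa"]
                seqs[last]["sbegin"] = int(m["begin"]) if m["begin"] else None
                seqs[last]["send"] = int(m["end"]) if m["end"] else None
                if m["rev"]:
                    seqs[last]["sreverse"] = True
            else:
                seqs[last]["usa"] = arg
        elif arg == "-sreverse":
            # Associated qualifier: binds to the previous sequence,
            # or to every sequence if no sequence has been given yet
            if last is None:
                reverse_all = True
            else:
                seqs[last]["sreverse"] = True
        elif (m := re.fullmatch(r"-sreverse(\d+)", arg)):
            # -sreverse1, -sreverse2: numbered by sequence position
            seqs[params[int(m[1]) - 1]]["sreverse"] = True
        elif (m := re.fullmatch(r"-(\w+)_sreverse", arg)) and m[1] in seqs:
            # -sequencea_sreverse: named by the sequence parameter
            seqs[m[1]]["sreverse"] = True
        else:
            raise ValueError(f"not handled in this sketch: {arg}")
    if reverse_all:
        for settings in seqs.values():
            settings["sreverse"] = True
    return seqs
```

Under these assumptions, each of the four forms from the reply (with a second sequence `db:seq2` added) marks only sequencea as reversed, while `["-sreverse", "db:seq1", "db:seq2"]` marks both. The walrus operator requires Python 3.8 or later.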
-sreverse1 is the first -sreverse (for the first sequence) -sequencea_sreverse is the -sreverse for parameter sequencea (the name of the first sequence in dotmatcher, see dotmatcher -help for these names) db:seq1[::r] is an extension to the USA syntax. You can specify -sbegin, -send and -sreverse in the USA. regards, Peter Rice -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From gbottu at ben.vub.ac.be Fri Sep 6 07:39:00 2002 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Fri, 6 Sep 2002 09:39:00 +0200 (CEST) Subject: question about plotcon Message-ID: <200209060739.JAA1355059@black.vub.ac.be> from : BEN Dear colleagues, Does someone know how plotcon computes the average similarity score ? When I give as input a 5 protein alignment and a similarity matrix that scores identical amino acid pairs +1, the plotcon graphic raises nowhere above +0.2 Is this O.K. ? Guy Bottu From s.roehrig at xantos.de Fri Sep 6 11:33:04 2002 From: s.roehrig at xantos.de (Roehrig, Sascha) Date: Fri, 6 Sep 2002 13:33:04 +0200 Subject: indexing refseq with dbiflat Message-ID: Dear all, I am having trouble indexing refseq (release in genbank format from yesterday). During indexing I get a lot of errors complaining about duplicate ids: Index a flat file database Warning: Duplicate ID skipped: '0610012A05Rik' Warning: Duplicate ID skipped: '0610043B10Rik' Warning: Duplicate ID skipped: '1110004B13Rik' Warning: Duplicate ID skipped: '1110020A23Rik' Warning: Duplicate ID skipped: '14' Warning: Duplicate ID skipped: '14' Warning: Duplicate ID skipped: '14' Warning: Duplicate ID skipped: '14' Warning: Duplicate ID skipped: '1500000C01Rik' ... ... After indexing, I am not able to retrieve a lot of entries which are present in the flatfile, i.e.: NM_000303 NM_005693 ... ... Any suggestions would be greatly appreciated. 
I noticed that one of the changes in version 2.4.1 (I am using 2.5.0) addressed fixing the indexing of refseq. Best regards Sascha -------------- next part -------------- An HTML attachment was scrubbed... URL: From abrown at nimr.mrc.ac.uk Fri Sep 6 13:27:44 2002 From: abrown at nimr.mrc.ac.uk (Alex Brown) Date: Fri, 6 Sep 2002 14:27:44 +0100 Subject: eprimer3 Message-ID: <6D152F0A-C19C-11D6-BA35-0003938768AC@nimr.mrc.ac.uk> Hi. I'm running EMBOSS on a Mac PowerBook G4 (OSX), under Darwin The EMBOSS program eprimer3 requires the program primer3 from the Whiltehead Institute ( http://www- genome.wi.mit.edu/ftp/distribution/software/primer3_0_9_test.tar.gz ). I have downloaded this. When I try to install it using 'make primer3', I get the following error message: [xxxx:/primer3_0_9_test/src] me% make primer3 make: *** No rule to make target `primer3'. Stop. Being a UNIX grasshopper, I'm not sure how to overcome this problem. Has anyone installed primer3 onto Darwin successfully ? Many Thanks, Alex Brown. From David.Bauer at SCHERING.DE Fri Sep 6 14:06:33 2002 From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE) Date: Fri, 6 Sep 2002 16:06:33 +0200 Subject: infoseq db name in fasta file Message-ID: Hi, infoseq can read fasta file with a database name in front of the identifier. >database:name accession description Is there a way to preserve the database name on output? In the USA and NAME colums appears only the name and not the database. Thanks, David. From mathog at mendel.bio.caltech.edu Fri Sep 6 14:47:50 2002 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Fri, 06 Sep 2002 07:47:50 -0700 Subject: eprimer3 Message-ID: > The EMBOSS program eprimer3 requires the program primer3 from the > Whiltehead Institute ( http://www- > genome.wi.mit.edu/ftp/distribution/software/primer3_0_9_test.tar.gz ). I > have downloaded this. 
When I try to install it using 'make primer3', I > get the following error message: > > [xxxx:/primer3_0_9_test/src] me% make primer3 > make: *** No rule to make target `primer3'. Stop. > Just use "make". The executable produced is actually called primer3_core. In older versions of primer3 before it was wrapped in a web interface the program was indeed called primer3, as you'd expect. Regards David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From peter.rice at uk.lionbioscience.com Fri Sep 6 14:54:17 2002 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Fri, 06 Sep 2002 15:54:17 +0100 Subject: infoseq db name in fasta file References: Message-ID: <3D78C199.6000308@uk.lionbioscience.com> David.Bauer at SCHERING.DE wrote: > infoseq can read fasta file with a database name in front of the identifier. > >>database:name accession description > > Is there a way to preserve the database name on output? > In the USA and NAME colums appears only the name and not the database. Tricky ... The USA is of course the filename for EMBOSS. If you 'cheat' with -sdb database the USA will be database-id:name The real problem is that EMBOSS cannot easily know "database" is valid. Any suggestions for how this could work? Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From abrown at nimr.mrc.ac.uk Fri Sep 6 15:07:08 2002 From: abrown at nimr.mrc.ac.uk (Alex Brown) Date: Fri, 6 Sep 2002 16:07:08 +0100 Subject: eprimer3 In-Reply-To: <3D78B34F.2000002@clondiag.com> Message-ID: <4FC704BA-C1AA-11D6-BA35-0003938768AC@nimr.mrc.ac.uk> Hi. Thanks for the help. This appears to have worked to some extent, after I modified the Makefile. I had to change the line CC = gcc to CC = CC (I think that CC on Darwin is GCC). 
This appeared to do something, but I this time I got a load of error messages: [xxxx:/primer3_0_9_test/src] me% make cc -c -g -Wall -D__USE_FIXED_PROTOTYPES__ -include /usr/include/sys/types.h -O2 -DDPAL_MAX_ALIGN=36 -DMAX_PRIMER_LENGTH=36 primer3_main.c primer3_main.c: In function `oligo_param': primer3_main.c:1113: warning: passing arg 3 of `oligo_overlaps_interval' from incompatible pointer type primer3_main.c:1147: warning: passing arg 3 of `oligo_overlaps_interval' from incompatible pointer type primer3_main.c:1151: warning: passing arg 3 of `oligo_overlaps_interval' from incompatible pointer type cc -g -Wall -D__USE_FIXED_PROTOTYPES__ -include /usr/include/sys/types.h -O2 -c -o oligotm.o oligotm.c primer3_release.h:3: warning: `pr_release' defined but not used cc -c -g -Wall -D__USE_FIXED_PROTOTYPES__ -include /usr/include/sys/types.h -O2 -DDPAL_MAX_ALIGN=36 -DMAX_PRIMER_LENGTH=36 -o dpal_primer.o dpal.c dpal.c: In function `_dpal_generic': dpal.c:332: warning: `i0' might be used uninitialized in this function dpal.c:332: warning: `j0' might be used uninitialized in this function dpal.c:336: warning: `I' might be used uninitialized in this function dpal.c:336: warning: `J' might be used uninitialized in this function dpal.c:338: warning: `score' might be used uninitialized in this function dpal.c: In function `_dpal_long_nopath_generic': dpal.c:560: warning: `I' might be used uninitialized in this function dpal.c:560: warning: `J' might be used uninitialized in this function dpal.c: At top level: primer3_release.h:3: warning: `pr_release' defined but not used cc -c -g -Wall -D__USE_FIXED_PROTOTYPES__ -include /usr/include/sys/types.h -O2 -DDPAL_MAX_ALIGN=36 -DMAX_PRIMER_LENGTH=36 -o format_output.o format_output.c cc -c -g -Wall -D__USE_FIXED_PROTOTYPES__ -include /usr/include/sys/types.h -O2 -DDPAL_MAX_ALIGN=36 -DMAX_PRIMER_LENGTH=36 -o boulder_input.o boulder_input.c primer3_release.h:3: warning: `pr_release' defined but not used cc -g -o primer3_core 
primer3_main.o oligotm.o dpal_primer.o format_output.o boulder_input.o '-static' -lm /usr/bin/ld: can't locate file for: -lcrt0.o make: *** [primer3_core] Error 1 According to the README file, the warnings about i0, j0, I, J and score being possibly used unset in dpal.c are normal and harmless. However, the 'make' is still failing: the primer3_core executable does not appear. The problem seems to be in the line '/usr/bin/ld: can't locate file for: -lcrt0.o'. Any ideas ?? Many thanks, Alex Brown. From estienne at sanbi.ac.za Mon Sep 9 09:53:43 2002 From: estienne at sanbi.ac.za (Estienne Swart) Date: Mon, 09 Sep 2002 11:53:43 +0200 Subject: Piping with emboss utilites that expect two files as input Message-ID: <3D7C6FA7.7030803@sanbi.ac.za> Hi, Is there any way to directly pipe two sequences into utilities such as water/matcher, etc. which usually require two files as input? It is possible to pipe one or the other (seqA or seqB, e.g. water -auto -filter -SeqA sample1.seq < sample2.seq ), but not both at once (e.g. water -auto -filter < sample1.seq < sample2.seq ). I know not all shells can read from multiple pipes(?), like zsh can, e.g.: cat < sample2.seq A less simple sequence ACGAGCACTAGCATGCATCGAGCATCGGGAGGACTTAGCAGTCGAGCATCGTACGAGCT >A simple sequence AGAGGCAGGCGACGAGCGAGCGAGCGAGC But it would really be useful (from a scripting perspective) if these utilities could at least take a pair of sequences from stdin, and align them. Estienne Swart From hkawai at venus.dti.ne.jp Tue Sep 10 03:33:53 2002 From: hkawai at venus.dti.ne.jp (河合宏紀) Date: Tue, 10 Sep 2002 12:33:53 +0900 Subject: GenBank indexing Trouble Message-ID: <200209100333.g8A3X2Yn024400@smtp1.dti.ne.jp> Hello I'm using EMBOSS package. I appreciate developers' efforts. Unfortunately, I found a trouble when I indexed GenBank 130 and called it with entret/seqret. First of all, I made index for all files of GenBank 130 (except EST,GSS,HTG) described below.
--------------------------------------
% /usr/local/EMBOSS/2.5.0/bin/dbiflat
Index a flat file database
EMBL : EMBL
SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
GB : Genbank, DDBJ
Entry format [SWISS]: GB
Database directory [.]:
Wildcard database filename [*.dat]: *.seq
Database name: GB
Release number [0.0]:
Index date [00/00/00]:
Warning: Duplicate ID skipped: 'AY071141'
--------------------------------------

When I called L11995 with "entret gb:L11995", I got the incorrect entry
whose accession is M20152. And I tried to get gb:M20152, I got M20153.
These three entries exist on the gbrod3.seq file sequentially. This
trouble does not occur when I called entries whose 'LOCUS' and
'ACCESSION' fields are identical (e.g. BC003860). Because this trouble
occurs with dbiflat in version 2.4.1 or 2.5.0 but does not in 2.3.1, I'm
now using EMBOSS 2.3.1 for only dbiflat/dbifasta, and 2.4.1 for other
programs (entret/seqret and so on).

My hypothesis of this trouble is described below.
I focused on the duplicate ID AY071141 and I removed one AY071141 entry
(from gbinv4.seq file).
In this case, I could get correct entries.
When dbiflat finds duplicate ID to be skipped, I guess, the index counter
of LOCUS and ACCESSION should be increased (or decreased). But in this
version, ONLY LOCUS counter would be increased (or decreased) and
ACCESSION's one would not be increased (or decreased).

I hope my report will be helpful for developers.

Best regards

Kawai

From peter.rice at uk.lionbioscience.com Tue Sep 10 08:27:51 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 10 Sep 2002 09:27:51 +0100
Subject: GenBank indexing Trouble
References: <200209100333.g8A3X2Yn024400@smtp1.dti.ne.jp>
Message-ID: <3D7DAD07.3000407@uk.lionbioscience.com>

Dear Kawai,

> I'm using EMBOSS package. I appreciate developers' efforts.
> Unfortunately, I found a trouble when I indexed GenBank 130 and
> called it with entret/seqret.
> I focused on the duplicate ID AY071141 and I removed one AY071141 entry
> (from gbinv4.seq file).

There is a problem with duplicate IDs which will be solved in EMBOSS 2.5.1.

Alan: can we build 2.5.1 or put the updated dbi*.c and embdbi.* files in
the patchfiles directory?

regards,

Peter Rice
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From ableasby at hgmp.mrc.ac.uk Tue Sep 10 10:09:22 2002
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Tue, 10 Sep 2002 11:09:22 +0100 (BST)
Subject: GenBank indexing Trouble
Message-ID: <200209101009.LAA26312@bromine.hgmp.mrc.ac.uk>

>Alan: can we build 2.5.1 or put the updated dbi*.c and embdbi.* files in
>the patchfiles directory?

I just have to test whether some Solaris configuration modifications
don't break anything and will hopefully put 2.5.1 out late morning or
early afternoon.

Alan

From ableasby at hgmp.mrc.ac.uk Tue Sep 10 12:03:42 2002
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Tue, 10 Sep 2002 13:03:42 +0100 (BST)
Subject: EMBOSS 2.5.1 released
Message-ID: <200209101203.NAA10569@bromine.hgmp.mrc.ac.uk>

This release fixes problems associated with non-unique identifiers
in some databases (e.g. REFSEQ). Note that there is now a specific
indexing option for that database in dbiflat.

Alan

From abrown at nimr.mrc.ac.uk Tue Sep 10 16:02:20 2002
From: abrown at nimr.mrc.ac.uk (Alex Brown)
Date: Tue, 10 Sep 2002 17:02:20 +0100
Subject: eprimer3 again
Message-ID:

Hi.

Thanks to David Mathog and Peter Slickers for their help in sorting out my last problem. I had to edit the Makefile to remove the line LIBOPTS ='-static' and then remove all references to $(LIBOPTS), as well as changing the CC = CC (gcc in Darwin is cc). This new Makefile produced the executable primer3_core.
However, on running the test program primer_test.pl in the ../test directory, I got the following errors:

[****:/primer3_0_9_test/test] me primer_test.pl
/primer3_0_9_test/test/primer_test.pl: testing ../src/primer3_core
Tue Sep 10 16:49:41 BST 2002
Testing fatal errors
1,5d0
< PRIMER_SEQUENCE_ID=seq1
< SEQUENCE=ATGCTAGCTAGTCGATCGATGCTAGCTAGTCGATCTACTATCATCATGCTAGCATGGGGCATCTGTGGGATGCTGTCGTACTGATCGATGCTAGCTAGCTAGTCGATCGTAGCTATCATCATCTAGTCATCGTAGCTATCATCGTAGC
< PRIMER_MISPRIMING_LIBRARY=repeat_file
< PRIMER_ERROR=Cannot open mispriming library repeat_file
< =
Difference found between primer_global_err/repeat_lib_err.out and primer_global_err/repeat_lib_err.tmp
1d0
< ../src/primer3_core: Cannot open mispriming library repeat_file
Difference found between primer_global_err/repeat_lib_err.out2 and primer_global_err/repeat_lib_err.tmp2
1,6d0
< PRIMER_SEQUENCE_ID=seq1
< SEQUENCE=ATGCTAGCTAGTCGATCGATGCTAGCTAGTCGATCTACTATCATCATGCTAGCATGGGGCATCTGTGGGATGCTGTCGTACTGATCGATGCTAGCTAGCTAGTCGATCGTAGCTATCATCATCTAGTCATCGTAGCTATCATCGTAGC
< PRIMER_PICK_INTERNAL_OLIGO=1
< PRIMER_INTERNAL_OLIGO_MISHYB_LIBRARY=repeat_file
< PRIMER_ERROR=Cannot open internal oligo mishyb library repeat_file
< =
Difference found between primer_global_err/repeat_lib_int_err.out and primer_global_err/repeat_lib_int_err.tmp
1d0
< ../src/primer3_core: Cannot open internal oligo mishyb library repeat_file
Difference found between primer_global_err/repeat_lib_int_err.out2 and primer_global_err/repeat_lib_int_err.tmp2
FAILED
primer_boundary OK
primer_internal OK
primer_boundary_formated OK
primer_internal_formated OK
primer_start_codon OK
primer_boundary1 OK
primer_internal1 OK
primer_task OK
primer_task_formated OK
primer_boundary1_formated Cannot read primer_boundary1_formated_output at /Downloads/primer3_0_9_test/test/primer_test.pl line 80.

Something appears to be not quite right.
I should point out that I had to edit the primer_test.pl file to change the first line to '#!/usr//bin/perl5.6.0 -w' to make it compatible with Darwin.

I then copied primer3_core to my EMBOSS bin directory, and tested it using the sequence file given in the EMBOSS manual. Without the -explain flag, the results were perfect, as shown in the manual. However, with the -explain flag set, I get the following error message:

Picks PCR primers and hybridization oligos
Output file [hsfau1.eprimer3]:
*** malloc[422]: error for object 0x1d7c80: Incorrect check sum for freed object - object was probably modified after beeing freed; break at szone_error
Bus error

Why is the test program failing, and why is the -explain flag causing these problems? Are there some files missing from the installation?

Hope someone can help.

Many thanks.

Alex Brown

From hkawai at venus.dti.ne.jp Wed Sep 11 10:13:20 2002
From: hkawai at venus.dti.ne.jp (Hironori Kawai)
Date: Wed, 11 Sep 2002 19:13:20 +0900
Subject: GenBank indexing Trouble
In-Reply-To: <3D7DAD07.3000407@uk.lionbioscience.com>
References: <3D7DAD07.3000407@uk.lionbioscience.com>
Message-ID: <200209111012.g8BACTYn000007@smtp1.dti.ne.jp>

Thanks for uploading new version quickly.
The problem I had reported did not occur in new version.

But, I would like to discuss another issue.
In my previous report, I mentioned duplicate
ID 'AY071141'. The duplicate entries are shown below.
--------------------------------------------------
LOCUS AY071141 2622 bp
DEFINITION Drosophila melanogaster RE17910 full length cDNA.
ACCESSION AY071141

LOCUS AY071141 2958 bp
DEFINITION Drosophila melanogaster RE17910 full insert cDNA.
ACCESSION AY119119
---------------------------------------------------
Even if I use AY119119 with entret/seqret, the former entry is output.
I think it is dangerous because it's difficult to notice incorrect entries have been output.
In this case, I wish entret/seqret output the latter entry or output no entry but warning.
Best regards

Kawai

From peter.rice at uk.lionbioscience.com Wed Sep 11 10:29:41 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 11 Sep 2002 11:29:41 +0100
Subject: GenBank indexing Trouble
References: <3D7DAD07.3000407@uk.lionbioscience.com> <200209111012.g8BACTYn000007@smtp1.dti.ne.jp>
Message-ID: <3D7F1B15.70609@uk.lionbioscience.com>

Hironori Kawai wrote:
> Thanks for uploading new version quickly.
> The problem I had reported did not occur in new version.

Thanks.

> But, I would like to discuss another issue.
> In my previous report, I mentioned duplicate
> ID 'AY071141'. The duplicate entries are shown below.
> --------------------------------------------------
> LOCUS AY071141 2622 bp
> DEFINITION Drosophila melanogaster RE17910 full length cDNA.
> ACCESSION AY071141
>
> LOCUS AY071141 2958 bp
> DEFINITION Drosophila melanogaster RE17910 full insert cDNA.
> ACCESSION AY119119
> ---------------------------------------------------
> Even if I use AY119119 with entret/seqret, the former entry is output.
> I think it is dangerous because it's difficult to notice incorrect entries have been output.
> In this case, I wish entret/seqret output the latter entry or output no entry but warning.

This is a known problem. Only one AY071141 entry can be indexed, but
EMBOSS does not have control over which of the 2 (or more) entries will
be found first in the sorted index.

We save all information from the duplicates (accession for example)
simply because we do not know which one to discard. So your search
results are 'correct'.

The root cause in EMBOSS is that the EMBLCD/Staden index format EMBOSS
uses can only have one unique ID for each entry. The solution is likely
to be a new EMBOSS index format.

The root cause in real life is databases that have duplicate IDs.
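The lookup behaviour described above can be reproduced with a small toy model. This is a sketch under stated assumptions, not the EMBLCD/Staden index format or the dbiflat code: it assumes the ID index holds exactly one record per ID (here, simply the first one seen, whereas the real sorted index gives no such guarantee), while the accession index keeps every accession and points back to an entry ID. The `Entry` class and the byte offsets are hypothetical.

```python
"""Toy model of an entry index that allows only one record per ID.

Assumption: this is NOT the EMBLCD/Staden binary index written by dbiflat;
it only mimics the behaviour discussed in this thread.
"""
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    entry_id: str   # LOCUS name, used as the unique ID
    accession: str  # primary ACCESSION
    offset: int     # hypothetical byte offset of the entry in the flat file


def build_index(entries: List[Entry]) -> Tuple[Dict[str, Entry], Dict[str, str]]:
    id_index: Dict[str, Entry] = {}  # ID -> the single entry kept for that ID
    acc_index: Dict[str, str] = {}   # accession -> ID, saved for every entry
    for entry in entries:
        if entry.entry_id in id_index:
            # Only one entry per ID fits; in this model the first one wins.
            print(f"Warning: Duplicate ID skipped: '{entry.entry_id}'")
        else:
            id_index[entry.entry_id] = entry
        # The accession is still saved for the skipped duplicate.
        acc_index[entry.accession] = entry.entry_id
    return id_index, acc_index


def lookup(query: str, id_index: Dict[str, Entry],
           acc_index: Dict[str, str]) -> Optional[Entry]:
    """Resolve a query as an ID first, then as an accession."""
    if query in id_index:
        return id_index[query]
    entry_id = acc_index.get(query)
    return id_index.get(entry_id) if entry_id is not None else None


entries = [
    Entry("AY071141", "AY071141", 0),
    Entry("AY071141", "AY119119", 5000),
]
id_index, acc_index = build_index(entries)
hit = lookup("AY119119", id_index, acc_index)
print(hit.accession)  # AY071141: the accession resolves to the kept entry
```

In this model a caller could compare `hit.accession` with the query and print a warning when they differ, which is the kind of warning Kawai asked for; whether EMBOSS itself should do that is not settled in this thread.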
Surprising that it can happen in GenBank.

regards,

Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From tchiang at bioinfo.sickkids.on.ca Wed Sep 11 15:51:38 2002
From: tchiang at bioinfo.sickkids.on.ca (Ted Chiang)
Date: Wed, 11 Sep 2002 11:51:38 -0400 (EDT)
Subject: counting fasta files
Message-ID:

Hello,

I have a file that contains several hundreds of fasta sequences. Is there
a function/program that will count the number of sequences in this file
and report it?

-Ted

=====================================
Ted Chiang, Analyst
Centre for Computational Biology
Hospital for Sick Children, Toronto
416.813.7028
tchiang at bioinfo.sickkids.on.ca
=====================================

From jason at cgt.mc.duke.edu Wed Sep 11 15:52:19 2002
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Wed, 11 Sep 2002 11:52:19 -0400 (EDT)
Subject: counting fasta files
In-Reply-To:
Message-ID:

% grep "^>" file.fa | wc -l

On Wed, 11 Sep 2002, Ted Chiang wrote:

>
> Hello,
>
> I have a file that contains several hundreds of fasta sequences. Is there
> a function/program that will count the number of sequences in this file
> and report it?
>
> -Ted
>
>
> =====================================
> Ted Chiang, Analyst
> Centre for Computational Biology
> Hospital for Sick Children, Toronto
> 416.813.7028
> tchiang at bioinfo.sickkids.on.ca
> =====================================
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From chenna at embl-heidelberg.de Wed Sep 11 16:02:53 2002
From: chenna at embl-heidelberg.de (Ramu Chenna)
Date: Wed, 11 Sep 2002 18:02:53 +0200
Subject: counting fasta files
In-Reply-To:
Message-ID:

grep '>' myfasta.file | wc

ramu

On Wed, 11 Sep 2002, Ted Chiang wrote:

>
> Hello,
>
> I have a file that contains several hundreds of fasta sequences. Is there
> a function/program that will count the number of sequences in this file
> and report it?
>
> -Ted
>
>
> =====================================
> Ted Chiang, Analyst
> Centre for Computational Biology
> Hospital for Sick Children, Toronto
> 416.813.7028
> tchiang at bioinfo.sickkids.on.ca
> =====================================
>
>
>

From mathog at mendel.bio.caltech.edu Wed Sep 11 16:05:20 2002
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Wed, 11 Sep 2002 09:05:20 -0700
Subject: counting fasta files
Message-ID:

> I have a file that contains several hundreds of fasta sequences. Is there
> a function/program that will count the number of sequences in this file
> and report it?

Try this:

ftp://saf.bio.caltech.edu/pub/software/molbio/fastaproperties.c

Reads a fasta file (as filename from arg 1, stdin if that's "-") and
emits one status line to stdout which is:

N M TYPE MINLEN MAXLEN AVELEN

where
N is the number of sequences in the file
M is the total number of bp/aa in the file (over all sequences)
TYPE is P or N, the best guess for protein or nucleic acid. If it can't tell it will emit P.
MINLEN
MAXLEN

1.0.2 11-JUL-2002, DRM. Added the statistics at the end of the line.
1.0.1 22-MAY-2002, DRM. Revised count of bases so that it doesn't mess up on counts > int4 range. Use of long long is an extension to older ANSI C standards.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From fernan at iib.unsam.edu.ar Wed Sep 11 16:24:47 2002
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Wed, 11 Sep 2002 13:24:47 -0300
Subject: counting fasta files
In-Reply-To:
References:
Message-ID: <20020911162447.GB34980@iib.unsam.edu.ar>

+----[ Thus spoke Ted Chiang (tchiang at bioinfo.sickkids.on.ca):
|
| I have a file that contains several hundreds of fasta sequences. Is there
| a function/program that will count the number of sequences in this file
| and report it?
|
+----]

Here's another one

cat file.fasta | grep -c \>

--
F e r n a n A g u e r o
http://genoma.unsam.edu.ar/~fernan

From flo at ebi.ac.uk Wed Sep 11 17:32:52 2002
From: flo at ebi.ac.uk (Florence Servant)
Date: Wed, 11 Sep 2002 17:32:52 +0000
Subject: counting fasta files
References: <20020911162447.GB34980@iib.unsam.edu.ar>
Message-ID: <3D7F7E43.6E148028@ebi.ac.uk>

Fernan Aguero wrote:

> +----[ Thus spoke Ted Chiang (tchiang at bioinfo.sickkids.on.ca):
> |
> | I have a file that contains several hundreds of fasta sequences. Is there
> | a function/program that will count the number of sequences in this file
> | and report it?
> |
> +----]
>
> Here's another one
>
> cat file.fasta | grep -c \>
>

Hi all,

I would suggest adding a constraint, which is that the line must start with >:

cat file.fasta | grep -c ^\>

To be really sure it works for all the fasta files you can have, you also have to take into account that several comment lines starting with > can occur for a single sequence:

ggrep -A 1 ^\> file.fasta | grep -v ^-- | grep -v ^\> | wc -l

Flo

--
Florence SERVANT
EBI - European Bioinformatics Institute - Room A2-40
Wellcome Trust Genome Campus
Hinxton Cambridge CB10 1SD
United Kingdom
Tel : (+44) 01223 494 686

From crowdy at bioinfo.sickkids.on.ca Wed Sep 11 19:05:21 2002
From: crowdy at bioinfo.sickkids.on.ca (Edgar Crowdy)
Date: Wed, 11 Sep 2002 15:05:21 -0400
Subject: Y and N args
Message-ID: <3D7F93F1.476514DB@bioinfo.sickkids.on.ca>

Hi,

I have a question about the standards for yes/no options in EMBOSS programs.

dotpath works as I expect it to. There are three Y/N parameters: "data", "overlap" and "boxit". On the command line I represent them like this:

-nodata -overlaps -boxit

(meaning 'no' to the first and 'yes' to the last two). In other words, the command line switch on its own is a 'yes', and to say 'no' you prepend 'no' to the switch.

I'm having trouble with a program called cirdna.
In that program I can use "-noticklines" and "-nointersymbol" but I can't express a 'yes' by saying -ticklines or -intersymbol as in other programs. It only runs if I say '-intersymbol Y', for example.

Is cirdna an exception to the rule or are there other programs? Will placing a Y or N after all Y/N parameters work? I need a standard that works for all EMBOSS programs with Y/N parameters.

--
Edgar Crowdy, Programmer
crowdy at bioinfo.sickkids.on.ca
Centre for Computational Biology, Hospital for Sick Children, Toronto

From haruna at sgi.com Thu Sep 12 03:23:07 2002
From: haruna at sgi.com (Haruna Cofer)
Date: Wed, 11 Sep 2002 23:23:07 -0400
Subject: SGI Porting Notes for EMBOSS 2.5.1
References: <200209101203.NAA10569@bromine.hgmp.mrc.ac.uk>
Message-ID: <3D80089B.57DA99AF@sgi.com>

Hello --

I have updated my SGI porting notes for the latest 2.5.1 release of EMBOSS, EMBASSY, and Jemboss:

http://www.sgi.com/industries/sciences/chembio/resources/emboss/

Enjoy! ;)
-- Haruna :)

--
Haruna N. Cofer
Silicon Graphics Inc.
ChemPharm Applications

From peter.rice at uk.lionbioscience.com Thu Sep 12 09:31:12 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Thu, 12 Sep 2002 10:31:12 +0100
Subject: Y and N args
References: <3D7F93F1.476514DB@bioinfo.sickkids.on.ca>
Message-ID: <3D805EE0.4080303@uk.lionbioscience.com>

Edgar Crowdy wrote:
> I'm having trouble with a program called cirdna. In that program I can
> use "-noticklines" and "-nointersymbol" but I can't express a 'yes' by
> saying -ticklines or -intersymbol as in other programs. It only runs if I
> say '-intersymbol Y', for example.
>
> Is cirdna an exception to the rule or are there other programs?
> Will placing a Y or N after all Y/N parameters work?

Ugh!!!!!

cirdna is using strings for ticklines and intersymbol. That is why the
value is required.

This needs to be fixed - that part is easy. Simply make cirdna use
boolean options as it should.

More worrying is that -nointersymbol was allowed.
That really must be only for booleans. For example, seqret -nooutfile
will write to a file called "N". Not too surprising that nobody has
tried this before :-)

The boolean testing code is in the right function. Should be easy to
modify.

regards,

Peter Rice
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From areagp61 at yahoo.it Thu Sep 12 14:39:04 2002
From: areagp61 at yahoo.it (Graziano P.)
Date: Thu, 12 Sep 2002 16:39:04 +0200
Subject: emma
Message-ID: <015201c25a6a$26427d00$18105709@italy.ibm.com>

Hi,

I am trying to use the following emma options: -norgap and -nopercent, but when I write

emma sequence.fasta -norgap

the program returns the following error message:

EMBOSS An error in ajacd.c at line 11262:
unknown qualifier -norgap

the same is if I write

emma sequence.fasta -nopercent

EMBOSS An error in ajacd.c at line 11262:
unknown qualifier -nopercent

I have seen in emma.acd file and these two qualifiers exist!!

Lines in emma.acd file about these two options are:

bool: norgap [
  optional: "$(prot)"
  default: "No"
  information: "No residue specific gaps"
  help: "'Residue specific penalties' are amino acid specific gap penalties that ......"
]

bool: nopercent [
  optional: "@(!$(slow))"
  default: "No"
  information: "Fast pairwise alignment: similarity scores: suppresses percentage score"
]

Anyone have already found this problem?

Thanks

Graziano

-----------------------------------------------
Graziano Pappad?
areagp61 at yahoo.it
----------------------------------------------
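The rule Peter states above (a "no" prefix should only be accepted for boolean qualifiers) can be sketched as a small parser. This is a hypothetical Python model, not the ajacd.c implementation: qualifiers are declared in a plain dictionary of name to type, a bare boolean switch means 'yes', and, as one possible design choice made here, an exact qualifier name is matched before the "no" prefix is stripped, so a boolean literally named `norgap` (as in Graziano's emma.acd lines) would still be recognised.

```python
"""Hypothetical model of -qualifier / -noqualifier command-line parsing.

Not the ajacd.c code: declarations are a simple dict of name -> type,
and only the cases discussed on the list are handled.
"""
from typing import Dict, List, Tuple, Union

Value = Union[bool, str]


def parse_qualifiers(args: List[str], declared: Dict[str, str]
                     ) -> Tuple[Dict[str, Value], List[str]]:
    values: Dict[str, Value] = {}
    positional: List[str] = []
    i = 0
    while i < len(args):
        arg = args[i]
        if not arg.startswith("-"):
            positional.append(arg)    # e.g. a sequence file name or USA
            i += 1
            continue
        name = arg[1:]
        if name in declared:          # exact name first (e.g. 'norgap')
            if declared[name] == "boolean":
                values[name] = True   # a bare boolean switch means 'yes'
                i += 1
            else:
                if i + 1 >= len(args):
                    raise ValueError(f"qualifier -{name} needs a value")
                values[name] = args[i + 1]  # e.g. '-intersymbol Y'
                i += 2
        elif name.startswith("no") and declared.get(name[2:]) == "boolean":
            values[name[2:]] = False  # '-noX' only allowed when X is boolean
            i += 1
        else:
            # Unknown names, and 'no' on non-booleans such as -nooutfile.
            raise ValueError(f"unknown qualifier -{name}")
    return values, positional


dotpath_quals = {"data": "boolean", "overlap": "boolean", "boxit": "boolean"}
print(parse_qualifiers(["-nodata", "-overlap", "-boxit"], dotpath_quals))
```

With this rule, `-nooutfile` is rejected instead of silently writing to a file called "N", while a string qualifier still takes its value from the next argument.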
From peter.rice at uk.lionbioscience.com Thu Sep 12 14:51:08 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Thu, 12 Sep 2002 15:51:08 +0100
Subject: emma
References: <015201c25a6a$26427d00$18105709@italy.ibm.com>
Message-ID: <3D80A9DC.5050105@uk.lionbioscience.com>

Graziano P. wrote:
> I am trying to use the following emma options: -norgap and -nopercent, but
> when I write
>
> emma sequence.fasta -norgap
>
> the program returns the following error message:
>
> EMBOSS An error in ajacd.c at line 11262:
> unknown qualifier -norgap
>
> the same is if I write
>
> emma sequence.fasta -nopercent
>
> EMBOSS An error in ajacd.c at line 11262:
> unknown qualifier -nopercent
>
> I have seen in emma.acd file and these two qualifiers exist!!

Oops. We are working on it.

"no" in front of a qualifier is automatically removed, and the value is
set to "N". It should check (a) that the qualifier is boolean and (b)
that the name is correct.

Qualifiers should not begin with "no". For emma, -nopercent and -norgap
will need new names to work correctly. We should make it -percent and
-rgap :-)

> Anyone have already found this problem?

This is the same as the problem reported this morning ... before today,
nobody had reported it !!!

Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From shay at bioinfo.sickkids.on.ca Fri Sep 13 17:22:29 2002
From: shay at bioinfo.sickkids.on.ca (Shayanthan Parameswaran)
Date: Fri, 13 Sep 2002 13:22:29 -0400
Subject: GenBank indexing Trouble (fwd)
References:
Message-ID: <3D821ED5.F36F50BE@bioinfo.sickkids.on.ca>

To all,

We installed EMBOSS 2.5.1 and indexed GenBank 131 with the GB format option, using the new dbiflat that corrected the error of incorrect entry retrieval.
We tried the new REFSEQ option in dbiflat to index RefSeq; however, the error that was fixed in the dbiflat GB option does not seem to be fixed in the REFSEQ format option. Seqret retrieves the entry NM_066922.1 instead of NM_066918.

Has anyone else experienced this error with the REFSEQ format option?

Shay

>
> Date: Tue, 10 Sep 2002 13:03:42 +0100 (BST)
> From: ableasby at hgmp.mrc.ac.uk
> To: emboss at hgmp.mrc.ac.uk
> Subject: EMBOSS 2.5.1 released
>
> This release fixes problems associated with non-unique identifiers
> in some databases (e.g. REFSEQ). Note that there is now a specific
> indexing option for that database in dbiflat.
>
> Alan
>
>
> Date: Tue, 10 Sep 2002 12:33:53 +0900
> From: "河合宏紀"
> To: emboss at embnet.org
> Subject: GenBank indexing Trouble
>
> Hello
>
> I'm using EMBOSS package. I appreciate developers' efforts.
> Unfortunately, I found a trouble when I indexed GenBank 130 and
> called it with entret/seqret.
>
> First of all, I made index for all files of GenBank 130 (except
> EST, GSS, HTG) described below.
> --------------------------------------
> % /usr/local/EMBOSS/2.5.0/bin/dbiflat
> Index a flat file database
> EMBL : EMBL
> SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
> GB : Genbank, DDBJ
> Entry format [SWISS]: GB
> Database directory [.]:
> Wildcard database filename [*.dat]: *.seq
> Database name: GB
> Release number [0.0]:
> Index date [00/00/00]:
> Warning: Duplicate ID skipped: 'AY071141'
> --------------------------------------
>
> When I called L11995 with "entret gb:L11995", I got the incorrect entry
> whose accession is M20152. And I tried to get gb:M20152, I got M20153.
> These three entries exist on the gbrod3.seq file sequentially. This
> trouble does not occur when I called entries whose 'LOCUS' and
> 'ACCESSION' fields are identical (e.g. BC003860).
> Because this trouble
> occurs with dbiflat in version 2.4.1 or 2.5.0 but does not in 2.3.1, I'm
> now using EMBOSS 2.3.1 for only dbiflat/dbifasta, and 2.4.1 for other
> programs (entret/seqret and so on).
>
> My hypothesis of this trouble is described below.
> I focused on the duplicate ID AY071141 and I removed one AY071141 entry
> (from gbinv4.seq file).
> In this case, I could get correct entries.
> When dbiflat finds duplicate ID to be skipped, I guess, the index counter
> of LOCUS and ACCESSION should be increased (or decreased). But in this
> version, ONLY LOCUS counter would be increased (or decreased) and
> ACCESSION's one would not be increased (or decreased).
>
> I hope my report will be helpful for developers.
>
> Best regards
>
> Kawai

--
Shayanthan Parameswaran               Bioinformatics Supercomputing Centre
Programmer (416) 813-8030             555 University Avenue
email: shay at bioinfo.sickkids.on.ca  The Hospital for Sick Children
http: www.bioinfo.sickkids.on.ca      Toronto, ON, M5G 1X8, CANADA.

From jison at hgmp.mrc.ac.uk Mon Sep 16 16:14:04 2002
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Mon, 16 Sep 2002 17:14:04 +0100
Subject: requestion for Emboss Demo version
References: <20020916125719.36123.qmail@web13805.mail.yahoo.com>
Message-ID: <3D86034B.103507D1@hgmp.mrc.ac.uk>

Kuzhali - please provide more details of your problem. One of us might be able to help then.

Cheers

J.

kuzhali subbiah wrote:

> hello sir
> i need a Demo version of emboss,& i ve got a doubt
> sir ,i ve downloaded the software &
> database(swissprot)
> ,will u plz help me how to link the software &
> database sir,it will be very helpfull if u provide me
> the informations.i request u to send the informations
> as soon as possible.
>
> kuzhali
>

From shilpahalesh at yahoo.co.in Tue Sep 17 05:34:53 2002
From: shilpahalesh at yahoo.co.in (shilpa halesh)
Date: Tue, 17 Sep 2002 06:34:53 +0100 (BST)
Subject: info needed
Message-ID: <20020917053453.67174.qmail@web8004.mail.in.yahoo.com>

hello sir,
i am PG student doing medical software course.we are
required to work on bioinformatics tools.it is given
as an assignment.i have selected emboss as the
topic.we are asked to understand the source code and
to change it another compatible language.
i have downloaded the emboss software,the database,i
have downloaded the tutorial also,but i am not able to
understand how to link the software with the
database.i request you to please guide me so that i
can understand the source code in a proper way.i hope
you would consider my request.my mail id is
shilpahalesh at yahoo.co.in

shilpa

From jison at hgmp.mrc.ac.uk Tue Sep 17 07:06:55 2002
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Tue, 17 Sep 2002 08:06:55 +0100
Subject: info needed
References: <20020917053453.67174.qmail@web8004.mail.in.yahoo.com>
Message-ID: <3D86D48F.832F053D@hgmp.mrc.ac.uk>

If it's a sequence database then some of the functions you'll need are in the "Sequence reading and writing" AJAX library files:

ajseq       General sequence handling
ajseqdata   Sequence data types
ajseqdb     Sequence database access
ajseqread   Sequence reading
ajseqtype   Sequence types
ajseqwrite  Sequence writing

Links to them are from:
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Doc/Ajax/index.html

J.

shilpa halesh wrote:

> hello sir,
> i am PG student doing medical software course.we are
> required to work on bioinformatics tools.it is given
> as an assignment.i have selected emboss as the
> topic.we are asked to understand the source code and
> to change it another compatible language.
> i have downloaded the emboss software,the database,i
> have downloaded the tutorial also,but i am not able to
> understand how to link the software with the
> database.i request you to please guide me so that i
> can understand the source code in a proper way.i hope
> you would consider my request.my mail id is
> shilpahalesh at yahoo.co.in
>
> shilpa
>

--
Jon C. Ison, PhD
Bioinformatics Applications Group
UK MRC Human Genome Mapping Project Resource Centre
Hinxton, Cambridge, CB10 1SB, UK
E-mail : jison at hgmp.mrc.ac.uk
Tel : 01223 49-4548
HGMP-RC: http://www.hgmp.mrc.ac.uk/
EMBOSS : http://www.hgmp.mrc.ac.uk/Software/EMBOSS/
CCP11 : http://www.hgmp.mrc.ac.uk/CCP11/

From gbottu at ben.vub.ac.be Tue Sep 17 07:50:05 2002
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Tue, 17 Sep 2002 09:50:05 +0200 (CEST)
Subject: -sbegin -send blues
Message-ID: <200209170750.JAA1152221@black.vub.ac.be>

from : BEN

Dear colleagues,

I just installed Staden and its EMBOSS interface. I noted a few problems and sent a bug report, with among other things the following:

Why is it that sometimes you can make a program operate only on a range of the sequence by filling in the "Start position" "End position" boxes, but sometimes not? An example of a program that operates on the complete sequence no matter what you do is octanol.

I received as reply:

All EMBOSS programs that take a -sequence option should also accept -sbegin and -send options for indicating the range, but it appears some do not. I view this as an EMBOSS bug (in Octanol). There doesn't appear to be any component of the ACD file to indicate whether sbegin and send are valid options, and indeed octanol on the command line is happy to accept (and ignore) them.

What a surprise! I had not thought of testing at the command line.
Does someone know why it is that some EMBOSS programs behave in this nonstandard way?

Sincerely,
Guy Bottu

From ableasby at hgmp.mrc.ac.uk Tue Sep 17 08:33:36 2002
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Tue, 17 Sep 2002 09:33:36 +0100 (BST)
Subject: -sbegin -send blues
Message-ID: <200209170833.JAA02474@bromine.hgmp.mrc.ac.uk>

>Does someone know why it is that some EMBOSS programs behave in this
>nonstandard way ?

Yes. Usually an oversight by the author of an application. It is
usually fixed now by one simple call to ajSeqTrim(). Historically
functions such as ajSeqBegin() and ajSeqEnd() had to be used and some
confusion was possibly caused.

It's either that or incompetence :-)

Cheers

Alan

From thierry.jaccaud at wanadoo.fr Wed Sep 18 20:31:16 2002
From: thierry.jaccaud at wanadoo.fr (Thierry JACCAUD)
Date: Wed, 18 Sep 2002 22:31:16 +0200
Subject: L'Ecologiste no. 7 is out!
Message-ID:

For the attention of emboss at embnet.org

Hello,

I am pleased to send you below the presentation of issue no. 7 of L'Ecologiste, published at the end of June and still available. The next issue of L'Ecologiste will appear in early October.

Happy reading!

Sincerely,
Thierry Jaccaud, editor-in-chief

Special report of issue no. 7: how to feed humanity?
84 colour pages, at newsstands or by order, 6 euros.
Detailed contents at www.ecologiste.org

* GMOs, hunger and the Académie des sciences: an exceptional contribution by Jean-Pierre Berlan, research director at INRA, addressed to all members of the Académie.

* A 48-page central report: how to feed humanity?
The false solutions? Development, free trade, a certain form of food aid... Advocated by the major international institutions, these solutions appear rather to be essential causes of hunger.
The real solutions?
Land reform, soil protection, traditional seeds, small diversified farms whose productivity is higher than that of agro-industry, as Vandana Shiva shows, and a rethinking of the meat-heavy diets of Westerners. Solutions exist, even with the population increase usually forecast for 2050. The report concludes with detailed studies of Cuba, Poland and the Philippines.

* Also in issue no. 7: the new Minister for the Environment declares that nuclear power is the least polluting of industries? L'Ecologiste publishes an assessment of Chernobyl by Corinne Castanier, director of Criirad. New members of parliament have just been elected, but their legislative power is concretely threatened by WTO rules, argue Agnès Bertrand and Laurence Kalafatidès. Also: articles on grey whales endangered by Esso off the Russian coast, pesticides and the European Union, the example of an ecovillage in the Alps, a review of the conference on post-development, and editorials on the Cree Indians, the climate and dioxins. Also included: reviews of journals and books, and Alain Hervé's column!

**********************************************************
"At last, the French edition of The Ecologist!" Le Monde
"A reference title" Libération
"A great magazine is born" Politis
"Excellent" Le Monde diplomatique

Price per issue: 6 euros (postage free). One-year subscription, four issues: 22.50 euros. Two-year subscription, eight issues: 43 euros.
L'Ecologiste, 25, rue de Fécamp - 75012 Paris. Tel 01 46 28 70 32, Fax 01 43 47 03 38.
New e-mail: contact at ecologiste.org - Website: www.ecologiste.org
Publication director: Teddy Goldsmith. Editor-in-chief: Thierry Jaccaud.
Available at newsstands in France, Belgium, Canada, Switzerland, Luxembourg and the main French-speaking countries.
If the newsagent has sold all their copies, they can still order one for you (restocking available; NMPP code: 1848). *********************************************************** If you no longer wish to receive these e-mails, simply reply with "unsubscribe" in the subject line. These messages are sent roughly once every three months. *********************************************************** N.B. Teddy Goldsmith's book "Le défi du XXIème siècle" has just been republished by Éditions du Rocher under the title "Le Tao de L'Ecologie", with a revised and corrected text and bibliography. "Read this book and you will never see the world the same way again!" Jean-Marie Pelt. Available from L'Ecologiste, 23 euros, postage and packing included. From mad at biol.unlp.edu.ar Thu Sep 19 09:22:01 2002 From: mad at biol.unlp.edu.ar (Martin Sarachu) Date: Thu, 19 Sep 2002 12:22:01 +0300 Subject: emboss on openMosix Message-ID: <3D899739.7A61160F@biol.unlp.edu.ar> Hi, does anybody know if EMBOSS runs on openMosix clusters? Do EMBOSS apps use shared memory? Cheers, martin -- Martin Sarachu mad at biol.unlp.edu.ar EMBnet Argentina http://www.ar.embnet.org From ableasby at hgmp.mrc.ac.uk Thu Sep 19 16:15:52 2002 From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk) Date: Thu, 19 Sep 2002 17:15:52 +0100 (BST) Subject: emboss on openMosix Message-ID: <200209191615.RAA02367@bromine.hgmp.mrc.ac.uk> We haven't tried OpenMosix here. As far as shared memory is concerned: not the IPC kind (i.e. no shm), only shared libraries. Alan From Wiepert.Mathieu at mayo.edu Fri Sep 20 17:15:20 2002 From: Wiepert.Mathieu at mayo.edu (Wiepert, Mathieu) Date: Fri, 20 Sep 2002 12:15:20 -0500 Subject: report formats (-rformat) Message-ID: <2F41CC6C9777D311ACBD009027B108EA03EAFF1D@excsrv32.mayo.edu> Hi, I was wondering how -rformat works. I was doing a fuzzpro search, and got output with many repeated header lines.
The output was

SeqName     Start   End  Score  Mismatch
NU5M_RHIUN    150   157      8         .
SeqName     Start   End  Score  Mismatch
NU5M_FELCA    150   157      8         .
SeqName     Start   End  Score  Mismatch
TBB2_NEIMB    572   579      8         .
...

But I was hoping for

SeqName     Start   End  Score  Mismatch
NU5M_RHIUN    150   157      8         .
NU5M_FELCA    150   157      8         .
TBB2_NEIMB    572   579      8         .
...

The command was:

fuzzpro -sequence swissprot:* -rformat -pattern [MF]-[LS]-[F]-[LPA]-[GL]-X-[GAY]-X -mismatch 0 -outfile fuzzpro.excel

Did I do something wrong, or are my expectations wrong? It wasn't very clear to me from the formats page how that might work. If this has been asked 100 times before, sorry! Is there an archive I can search? Googling didn't turn up much. -Mat From lenaganesh2k2 at yahoo.com Tue Sep 24 10:34:30 2002 From: lenaganesh2k2 at yahoo.com (LENA GANESH) Date: Tue, 24 Sep 2002 03:34:30 -0700 (PDT) Subject: Pls suggest to get Postscript o/p perfectly in Perl In-Reply-To: <20020923153739.66796.qmail@web12905.mail.yahoo.com> Message-ID: <20020924103430.1099.qmail@web12906.mail.yahoo.com> Sir, I'm developing a Perl program to run EMBOSS commands from a web browser. When I try this command in Perl:

*****************
$cmd="/emboss/bin/dottup F1.TXT F2.TXT -wordsize 4 -graph ps";
system($cmd);
*****************

I get only the PostScript file, but not the dotted image. If I run the same command in the shell, I get the result perfectly. Is there an alternative way? Please advise.

From simon.andrews at bbsrc.ac.uk Tue Sep 24 11:31:31 2002 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Tue, 24 Sep 2002 12:31:31 +0100 Subject: Pls suggest to get Postscript o/p perfectly in Perl Message-ID: <2DC41140A89ED411989D00508BDCD9ED01E286FC@bi-exsrv1.iapc.bbsrc.ac.uk> From: LENA GANESH [mailto:lenaganesh2k2 at yahoo.com] Subject: Pls suggest to get Postscript o/p perfectly in Perl

> I'm developing a Perl program to run EMBOSS commands from a web browser.
> When I try this command in Perl:
> *****************
> $cmd="/emboss/bin/dottup F1.TXT F2.TXT -wordsize 4 -graph ps";
> system($cmd);
> *****************
> I get only the PostScript file, but not the dotted image. If I run the same
> command in the shell, I get the result perfectly. Is there an alternative
> way?

I'm not sure what you mean here. If you're getting the dottup.ps file created, then that IS the 'dotted image'; it's just in PostScript format, which you need to open in a suitable viewer. If you want to display the output of dottup on a web page, then PostScript format is not much help, as most browsers won't know what to do with it. You'd be better off using PNG format. Unfortunately there doesn't seem to be a way to make dottup write its PNG to STDOUT, so you'll have to let it create dottup.1.png, then read that file and pass it back to the browser. Something like this should get you started. Just put it in your cgi-bin directory, and alter the path on line 5 (the chdir() call) to wherever your F1 and F2 files are located. You will also need to ensure that the user your web server runs under has permission to create files in that directory.
Watch for long lines being wrapped:
#########################################################################
#!/usr/bin/perl -w
use strict;
use CGI::Carp qw(fatalsToBrowser);
$ENV{'PLPLOT_LIB'} = '/emboss/share/EMBOSS';   # where PLplot finds its support files
chdir('/wherever/your/files/are') || die "Couldn't change directory: $!";

# Remove any PNG left over from an earlier run, so that the -e test
# below reflects this run only
unlink('dottup.1.png');

system('/emboss/bin/dottup F1.TXT F2.TXT -wordsize 4 -data 0 -auto -graph png > /dev/null') == 0
    || die "dottup failed (exit status " . ($? >> 8) . ") - check web server error logs";

unless (-e 'dottup.1.png') {
    die "No output created by dottup - check web server error logs";
}

open(my $png, '<', 'dottup.1.png') || die "Couldn't read PNG file: $!";

# binmode() makes no difference on Unix, but on other systems it stops
# the binary PNG data from being altered on input and output
binmode($png);
binmode(STDOUT);

print "Content-type: image/png\n\n";
print while (<$png>);
close($png);
##########################################################################
If you are just trying to make a general EMBOSS interface, are you aware that there are several in existence already? Have a look at http://www.hgmp.mrc.ac.uk/Software/EMBOSS/interfaces.html for a list of them. Hope this helps Simon. From David.Bauer at SCHERING.DE Tue Sep 24 11:38:55 2002 From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE) Date: Tue, 24 Sep 2002 13:38:55 +0200 Subject: Pls suggest to get Postscript o/p perfectly in Perl Message-ID: Hello Lena, if you want to use EMBOSS via the web, you should have a look at Luke's interface: http://www.cbr.nrc.ca/EMBOSS/ The interface is built in Perl as Perl modules. It's easy to set up, and you can use the modules from your own Perl CGI scripts. David.
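Simon's script writes dottup.1.png into a single fixed directory, so two requests arriving at the same moment could overwrite, or read, each other's image. A minimal sketch of one way around this, assuming Perl 5 with the core modules Cwd, File::Spec and File::Temp: each request runs the command in its own temporary directory, the command is passed as a list so no shell is involved, the child's STDOUT is sent to the null device (as the "> /dev/null" in Simon's script does), and the wait status in $? is decoded so that a failed run produces an error message rather than a missing image. The helper name run_in_tempdir and the input paths in the commented example are hypothetical, not part of EMBOSS, and the sketch assumes dottup exits with status 0 on success.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of EMBOSS): run @cmd inside a fresh
# temporary directory, then return the contents of $outfile, which the
# command is expected to create there. Dies with a readable message if
# the command cannot be started, fails, or produces no output file.
sub run_in_tempdir {
    my ($outfile, @cmd) = @_;

    my $dir = tempdir(CLEANUP => 1);   # deleted automatically at program exit
    my $old = getcwd();
    chdir($dir) or die "Couldn't change to $dir: $!\n";

    # Send the child's STDOUT to the null device so that any progress
    # messages cannot end up in the CGI response
    open(my $saved, '>&', \*STDOUT) or die "Couldn't dup STDOUT: $!\n";
    open(STDOUT, '>', File::Spec->devnull()) or die "Couldn't redirect STDOUT: $!\n";
    system(@cmd);                      # list form: no shell is involved
    my $wait = $?;                     # raw wait status of the child
    my $err  = "$!";                   # errno; meaningful only if $wait == -1
    open(STDOUT, '>&', $saved) or die "Couldn't restore STDOUT: $!\n";

    my $data;
    if ($wait == 0 && -e $outfile) {
        open(my $fh, '<', $outfile) or die "Couldn't read $outfile: $!\n";
        binmode($fh);                  # PNG output is binary
        local $/;                      # slurp mode: read the whole file at once
        $data = <$fh>;
        close($fh);
    }
    chdir($old) or die "Couldn't change back to $old: $!\n";

    die "Could not start $cmd[0]: $err\n"                   if $wait == -1;
    die "$cmd[0] killed by signal " . ($wait & 127) . "\n"  if $wait & 127;
    die "$cmd[0] exited with status " . ($wait >> 8) . "\n" if $wait >> 8;
    die "$cmd[0] did not create $outfile\n"                 unless defined $data;
    return $data;
}

# Example use in a CGI script (input paths are hypothetical and must be
# absolute, because dottup runs inside the temporary directory):
#
#   my $png = run_in_tempdir('dottup.1.png',
#       '/emboss/bin/dottup', '/data/F1.TXT', '/data/F2.TXT',
#       '-wordsize', '4', '-auto', '-graph', 'png');
#   binmode(STDOUT);
#   print "Content-type: image/png\n\n", $png;
```

Because chdir() affects the whole process, this approach suits a classic one-request-per-process CGI script rather than a persistent server that handles several requests at once.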
From scop at mrc-lmb.cam.ac.uk Fri Sep 27 09:22:39 2002 From: scop at mrc-lmb.cam.ac.uk (Scop authors) Date: Fri, 27 Sep 2002 10:22:39 +0100 (BST) Subject: position available to work on SCOP Message-ID: Medical Research Council - Centre for Protein Engineering Research Position to work on Structural Classification of Proteins [Ref:CPE/802/11] Applications are invited for a scientific programmer position to work with Dr. Alexey Murzin on further development of the SCOP (Structural Classification of Proteins) database. The SCOP database (scop.mrc-lmb.cam.ac.uk/scop) is a widely used resource for the investigation of protein structures and sequences. The successful applicant will be responsible for collaborating with other public databases on the co-ordination, integration and distribution of sequence and structural family data. He or she will develop tools to support the annotation of SCOP entries and will contribute to extending SCOP's functionality in close collaboration with the other members of the SCOP team. The successful applicant will also have the opportunity to carry out independent research on novel computational techniques or protein evolution, according to his or her interests. These tasks require a broad computer science background and strong programming skills. The ideal candidate will be able to tackle a variety of problems, quickly learn new tools, and design and implement solutions as needed. He or she will also have specific competence at the interface of computer science and biology, together with the ability to work in an interdisciplinary area and a genuine interest in biology in its modern, computerized form. Desirable experience includes some background in computational molecular biology and in the management of large amounts of partially unstructured data. Commitment to both data and software quality is indispensable, and the ability to work in a team environment is essential.
If you think you fit the job description and requirements, you are encouraged to apply even if you fall short of some of the ideal candidate's attributes, but please bear in mind that to be successful you must have demonstrated strong computational skills and a genuine interest in biology. For further information please contact us at scop at mrc-lmb.cam.ac.uk. You can also visit the CPE website at: www.mrc-cpe.cam.ac.uk -------------------------------------------------------------------------- This position is grant-funded for up to 5 years and will be on MRC pay band 4, with a starting salary in the range of 20,625 to 24,750 pounds per annum, depending on qualifications and experience. This post also attracts a scientific supplement of 9% per annum. Applicants should include a full CV, a covering letter and details of three professional referees who can be contacted prior to interview. Please quote the job reference CPE/802/11 and email to recruit at mrc-lmb.cam.ac.uk or post to: Kelly Andrews, Personnel Assistant, MRC Centre, Hills Road, Cambridge, CB2 2QH, UK. We cannot guarantee that candidates who apply after 1 December 2002 will be considered. -------------------------------------------------------------------------- 'Leading Science for Better Health' The Medical Research Council is an Equal Opportunities Employer and operates a strict no-smoking policy. -------------------------------------------------------------------------- From dmerberg at Phylos.com Mon Sep 30 15:59:49 2002 From: dmerberg at Phylos.com (David Merberg) Date: Mon, 30 Sep 2002 11:59:49 -0400 Subject: Setting graphic size in EMBOSS Message-ID: <7E2FACE572C5D61180F300A0C9E97E8F03D6FA@NTSERVER1> Hi all, Is it possible to set the size of graphical output in EMBOSS? For example, I'd like to see the output of ABIview as one long image that could be scrolled horizontally. Thanks, David Merberg Phylos, Inc. 128 Spring Street Lexington, MA 01720