From gbottu at ben.vub.ac.be Thu Jun 2 06:09:54 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Thu, 2 Jun 2005 12:09:54 +0200
Subject: [EMBOSS] use water/matcher to find where RNA bybridizes
Message-ID: <20050602100954.GA14063@bigben.ulb.ac.be>

from : Belgian EMBnet Node

Dear colleagues,

One of our users had a problem : how to find the location where a small
molecule of RNA binds to a mRNA and so interferes with its functioning.
Nothing in EMBOSS and nothing found on the WWW. We finally did the
following : use revseq -nocomp to reverse the mRNA and then align the two
sequences using as matrix :
-------------------------------
   A  T  G  C  S  W  R  Y  K  M  B  V  H  D  N  U
A  0  5  0  0  0  5  5  0  0  5  0  5  5  5  5  0
T  5  0  5  0  0  5  0  5  5  0  5  0  5  5  5  5
G  0  5  0  5  5  0  5  0  5  0  5  5  0  5  5  3
C  0  0  5  0  5  0  0  5  0  5  5  5  5  0  5  0
S  0  0  5  5  5  0  5  5  5  5  5  5  5  5  5  0
W  5  5  0  0  0  5  5  5  5  5  5  5  5  5  5  5
R  5  0  5  0  5  5  5  0  5  5  5  5  5  5  5  0
Y  0  5  0  5  5  5  0  5  5  5  5  5  5  5  5  5
K  0  5  5  0  5  5  5  5  5  0  5  5  5  5  5  5
M  5  0  0  5  5  5  5  5  0  5  5  5  5  5  5  0
B  0  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5
V  5  0  5  5  5  5  5  5  5  5  5  5  5  5  5  0
H  5  5  0  5  5  5  5  5  5  5  5  5  5  5  5  5
D  5  5  5  0  5  5  5  5  5  5  5  5  5  5  5  5
N  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5
U  0  5  3  0  0  5  0  5  5  0  5  0  5  5  5  5
-------------------------------
This gave a reasonable result. water made the following alignment :
------------------------------
#=======================================
#
# Aligned_sequences: 2
# 1: mRNA
# 2: RNAi
# Matrix: HYB
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 49
# Identity: 3/49 ( 6.1%)
# Similarity: 0/49 ( 0.0%)
# Gaps: 0/49 ( 0.0%)
# Score: 185.0
#
#
#=======================================

mRNA   2892 AATGTTGTGTGAGGATAATAGTAATAGTAATAGTAATAATAATAATAAT 2940
            .. . . .... ... .. .. .. .. .. ................
RNAi      1 TTTGACCCTGCTACTACTACTACTACTACTACGATTATTATTATTATTA 49
--------------------------------
The only thing which bothers me is that the base pairs (which do have a
positive comparison score) are not labeled as "similar", they get a '.'
instead of a ':'. Does someone know why this is ?

Guy Bottu

From gbottu at ben.vub.ac.be Thu Jun 2 11:08:45 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Thu, 2 Jun 2005 17:08:45 +0200
Subject: [EMBOSS] use water/matcher to find where RNA bybridizes
In-Reply-To:
References:
Message-ID: <20050602150845.GA17226@bigben.ulb.ac.be>

On Thu, Jun 02, 2005 at 07:52:45AM -0700, David Mathog wrote:
> > One of our users had a problem : how to find the location where a small
> > molecule of RNA binds to a mRNA and so interferes with its functioning.
>
> This can also be addressed with Mfold. Let A be the large mRNA of
> length N and B the small one of length M. Create a hybrid RNA sequence
> AB of length N+M. Set the rules in mfold so that
>
> bases 1->N will not bind with bases 1->N
> bases N+1->N+M will not bind with bases N+1->N+M

Clever idea ! As a matter of fact, I had thought of doing that, with the
extra of putting between both a linker of 200 T's which are not allowed to
pair at all. Unfortunately the program mfold crashed with message :

Fill run failed

Maybe there is something unusual in the sequence.

Regards,
Guy Bottu, BEN

From mathog at mendel.bio.caltech.edu Thu Jun 2 10:52:45 2005
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Thu, 02 Jun 2005 07:52:45 -0700
Subject: [EMBOSS] use water/matcher to find where RNA bybridizes
Message-ID:

> One of our users had a problem : how to find the location where a small
> molecule of RNA binds to a mRNA and so interferes with its functioning.

This can also be addressed with Mfold. Let A be the large mRNA of
length N and B the small one of length M. Create a hybrid RNA sequence
AB of length N+M. Set the rules in mfold so that

bases 1->N will not bind with bases 1->N
bases N+1->N+M will not bind with bases N+1->N+M

Run Mfold. Look through the results. If this runs properly you should
see B bound somewhere in A with an energy level you may then use to
compare binding affinities.
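As a rough illustration of the construction described above, the hybrid sequence and the two "no self-pairing" ranges could be computed along these lines. This is only a sketch: the function name `build_hybrid` is made up here, the optional poly-T linker follows the variant mentioned elsewhere in this thread, and how the ranges are then handed to mfold as constraints is not shown and would need to be checked against the mfold documentation.

```python
def build_hybrid(mrna, small_rna, linker_len=0):
    """Concatenate the mRNA (A) and the small RNA (B), optionally with a poly-T linker.

    Returns the hybrid sequence and the 1-based, inclusive coordinate
    ranges occupied by A, the linker (None if absent) and B.
    """
    n, m = len(mrna), len(small_rna)
    hybrid = mrna + "T" * linker_len + small_rna
    ranges = {
        "A": (1, n),
        "linker": (n + 1, n + linker_len) if linker_len else None,
        "B": (n + linker_len + 1, n + linker_len + m),
    }
    return hybrid, ranges


if __name__ == "__main__":
    # Toy sequences, for illustration only.
    seq, ranges = build_hybrid("AATGTTGTGTGAGG", "TTTGACC", linker_len=5)
    print(len(seq), ranges)
```

With no linker the ranges reduce to 1->N and N+1->N+M, as in the rules quoted above.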
Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From fernan at iib.unsam.edu.ar Thu Jun 2 13:08:31 2005 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Thu, 2 Jun 2005 14:08:31 -0300 Subject: [EMBOSS] use water/matcher to find where RNA bybridizes In-Reply-To: <20050602100954.GA14063@bigben.ulb.ac.be> References: <20050602100954.GA14063@bigben.ulb.ac.be> Message-ID: <20050602170831.GW44956@iib.unsam.edu.ar> +----[ Guy Bottu (02.Jun.2005 07:13): | | mRNA 2892 AATGTTGTGTGAGGATAATAGTAATAGTAATAGTAATAATAATAATAAT 2940 | .. . . .... ... .. .. .. .. .. ................ | RNAi 1 TTTGACCCTGCTACTACTACTACTACTACTACGATTATTATTATTATTA 49 | -------------------------------- | The only thing which bothers me is that the base pairs (which do have a | positive comparison score) are not labeled as "similar", they get a '.' | instead of a ':'. Does someone know why this is ? | +----] Guy, just a guess, but '.' and ':' are used in protein-protein comparisons to denote identity and similarity which are both different and meaningful. In dna-dna comparisons, you only care for identity, whether you consider it to be aligning A with A or A with its complement. So I would only expect only one of '.' or ':' used ... don't remember which is used for identity in emboss. My 2 cents guess, Fernan From pmr at ebi.ac.uk Thu Jun 2 13:23:13 2005 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Thu, 2 Jun 2005 18:23:13 +0100 (BST) Subject: [EMBOSS] use water/matcher to find where RNA bybridizes In-Reply-To: <20050602100954.GA14063@bigben.ulb.ac.be> References: <20050602100954.GA14063@bigben.ulb.ac.be> Message-ID: <3729.198.161.30.152.1117732993.squirrel@webmail.ebi.ac.uk> Guy Bottu writes: > One of our users had a problem : how to find the location where a small > molecule of RNA binds to a mRNA and so interferes with its functioning. > Nothing in EMBOSS and nothing found on the WWW. 
We finally did the > following : use revseq -nocomp to reverse the mRNA and then align the two > sequences using as matrix : > ------------------------------- > A T G C S W R Y K M B V H D N U > A 0 5 0 0 0 5 5 0 0 5 0 5 5 5 5 0 > T 5 0 5 0 0 5 0 5 5 0 5 0 5 5 5 5 .......... > ------------------------------- > This gave a reasonable result. water made the following alignment : > ------------------------------ ..... > mRNA 2892 AATGTTGTGTGAGGATAATAGTAATAGTAATAGTAATAATAATAATAAT > 2940 > .. . . .... ... .. .. .. .. .. ................ > RNAi 1 TTTGACCCTGCTACTACTACTACTACTACTACGATTATTATTATTATTA 49 > -------------------------------- > The only thing which bothers me is that the base pairs (which do have a > positive comparison score) are not labeled as "similar", they get a '.' > instead of a ':'. Does someone know why this is ? I believe this is simply because the bases are not identical. A user matrix can have arbitrary values, so the results are marked as similar (A=T scores 5) but identities are only scored at zero and so never appear with ":". You could try setting the scores to match the hydrogen bonds for this experiment (G=C 3 A=T 2 G=T 1) RNA folding is a missing area in EMBOSS. The Vienna package has been suggested as a possible EMBASSY package. Does anyone have any experience with it, or suggestions for alternative RNA packages we could use? regards, Peter From David.Bauer at SCHERING.DE Fri Jun 3 02:37:11 2005 From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE) Date: Fri, 3 Jun 2005 08:37:11 +0200 Subject: Antwort: Re: [EMBOSS] use water/matcher to find where RNA bybridizes Message-ID: Hi, I use the Vienna RNA package. It allows to look for global structure of the complete RNA (RNAfold) or local structures (RNALfold). The global folding accepts also longer sequences (as far as I remember this was a problem with Mfold). Visualization is a bit tricky. 
But there are helper scripts to convert the output to .ct files (b2ct)
which can be used to create different graphical representations.

Regards,
David.

RNA folding is a missing area in EMBOSS. The Vienna package has been
suggested as a possible EMBASSY package. Does anyone have any experience
with it, or suggestions for alternative RNA packages we could use?

regards,
Peter

From gbottu at ben.vub.ac.be Fri Jun 3 04:17:41 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 3 Jun 2005 10:17:41 +0200
Subject: [EMBOSS] use water/matcher to find where RNA bybridizes
In-Reply-To: <20050602170831.GW44956@iib.unsam.edu.ar>
References: <20050602100954.GA14063@bigben.ulb.ac.be> <20050602170831.GW44956@iib.unsam.edu.ar>
Message-ID: <20050603081741.GA23810@bigben.ulb.ac.be>

Dear all,

Thanks for your replies. It is however still not clear to me where the '.'
come from. I thought the EMBOSS "pair" output would put a '|' for
identities and a ':' for similarities (score positive). Maybe the program
is fooled and seriously perturbed by a matrix that assigns a negative
score to identical base pairs.

As for the proposal to distribute ViennaRNA as an Embassadir, why not ?
At the BEN site we have mfold integrated under EMBOSS, but I am afraid
distributing mfold as Embassadir will turn out to be impossible because of
licensing issues.

Note that mfold does not entirely solve the problem, since it operates on
a single sequence, it does not search for a structure composed of two
strands. I guess this is also true for ViennaRNA. We (me and our user) had
tried to use mfold (with as input a sequence composed of the mRNA, a
poly-T linker and the small RNA), but the program crashed with error
message "Cannot get Fill". Maybe the sequence had something unusual.
Regards,
Guy Bottu, BEN

From atorrano at lsi.upc.edu Fri Jun 3 05:15:10 2005
From: atorrano at lsi.upc.edu (Alexis Torrano Martinez)
Date: Fri, 3 Jun 2005 11:15:10 +0200 (MET DST)
Subject: [EMBOSS] external and app
Message-ID: <7479297835atorrano@lsi.upc.es>

Hello

I am trying to execute hmmsearch from EMBOSS. This way I want to have
a kind of wrap over the DDBB and retrieval apps.

DB Pfam [
  method: "app"
  comment: "Pfam with HMMER indexing"
  app: "$HMMERBIN/hmmsearch $EMBOSS_DATA/pfam/Pfam_ls %s"
]

That is my DB specification for EMBOSS. How should I run seqret to
execute hmmsearch properly?

seqret Pfam:$HOME/soft/hmmer/last/tutorial/7LES_DROME

And the next error was unexpected :

Error: Unable to read sequence 'Pfam:/usr/usuaris/it/inb/soft/hmmer/last/tutorial/7LES_DROME'

As the tutorial says, if you specify external, %s receives as value the
second field of the query (ID from seqret DB:ID).

Is there a way to call hmmsearch from EMBOSS?

Many thanks.
Regards.
Alexis Torrano.

--
-----------------------------------------------------
Alexis Torrano Martinez
Instituto Nacional de Bioinformatica (INB)
Nodo Computacional GNHC-2 UPC-CIRI
c/. Jordi Girona 1-3
Modul C6-E201        Tel.   : 934 011 650
E-08034 Barcelona    Fax    : 934 017 014
Catalunya (Spain)    e-mail : atorrano at lsi.upc.edu
-----------------------------------------------------

From gbottu at ben.vub.ac.be Fri Jun 3 06:03:18 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 3 Jun 2005 12:03:18 +0200
Subject: [EMBOSS] external and app
In-Reply-To: <7479297835atorrano@lsi.upc.es>
References: <7479297835atorrano@lsi.upc.es>
Message-ID: <20050603100318.GA24538@bigben.ulb.ac.be>

On Fri, Jun 03, 2005 at 11:15:10AM +0200, Alexis Torrano Martinez wrote:
> I am trying to execute hmmsearch from EMBOSS. This way I want to have
> a kind of wrap over the DDBB and retrieval apps.
>
>
> DB Pfam [
>   method: "app"
>   comment: "Pfam with HMMER indexing"
>   app: "$HMMERBIN/hmmsearch $EMBOSS_DATA/pfam/Pfam_ls %s"
> ]

Dear Alexis,

Your problem is almost certainly that the program defined as "app"
should return a sequence to standard output, so that EMBOSS can take it.
And this is not what hmmsearch does. Furthermore, hmmsearch searches a
HMM against a databank of sequences ; you seem to want to search a
sequence against a databank of HMM's (Pfam_ls), for which you need
hmmpfam. It is maybe a good idea to install the Embassadir HMMER. Note
however that ehmmpfam needs the user to specify where the databank is. At
the BEN site I have a little bit "hacked" the program so that it uses
Pfam_ls by default (and still lets the user choose an alternative). If
you are interested I can send you a mail with "how to".

Guy Bottu,
Belgian EMBnet Node

From pmr at ebi.ac.uk Fri Jun 3 06:08:11 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 3 Jun 2005 11:08:11 +0100 (BST)
Subject: [EMBOSS] use water/matcher to find where RNA bybridizes
In-Reply-To: <20050603081741.GA23810@bigben.ulb.ac.be>
References: <20050602100954.GA14063@bigben.ulb.ac.be> <20050602170831.GW44956@iib.unsam.edu.ar> <20050603081741.GA23810@bigben.ulb.ac.be>
Message-ID: <1543.198.161.30.152.1117793291.squirrel@webmail.ebi.ac.uk>

Dear Guy,

> Thanks for your replies. It is however still not clear to me where the '.'
> come from. I thought the EMBOSS "pair" output would put a '|' for
> identities and a ':' for similarities (score positive). Maybe the program
> is fooled and seriously perturbed by a matrix that assigns a negative
> score to identical base pairs.

I believe it is perturbed by the zero score for identical base pairs.
This makes it unable to find a consensus character for the alignment, and
so the "no consensus found" '.' character appears in the output.

Making the output format understand your non-identical matching is an
interesting challenge. I will look into it a little more.
regards, Peter From Marc.Logghe at devgen.com Fri Jun 3 06:23:05 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Fri, 3 Jun 2005 12:23:05 +0200 Subject: [EMBOSS] external and app Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E802@ANTARESIA.be.devgen.com> Hi, Just wondering, what happens if you use entret in stead of seqret. EMBOSS is supposed to just return the 'sequence' (in this case pfam result), unaltered, unparsed. When you use seqret, EMBOSS will parse the output and try to make a sequence out of it. HTH, Marc > -----Original Message----- > From: owner-emboss at hgmp.mrc.ac.uk > [mailto:owner-emboss at hgmp.mrc.ac.uk] On Behalf Of Guy Bottu > Sent: Friday, June 03, 2005 12:03 PM > To: Alexis Torrano Martinez; emboss at embnet.org > Subject: Re: [EMBOSS] external and app > > On Fri, Jun 03, 2005 at 11:15:10AM +0200, Alexis Torrano > Martinez wrote: > > I am trying to execute hmmsearch from EMBOSS. This way I > want to have > > a kind of wrap over the DDBB and retrieval apps. > > > > > > DB Pfam [ > > method: "app" > > comment: "Pfam with HMMER indexing" > > app: "$HMMERBIN/hmmsearch $EMBOSS_DATA/pfam/Pfam_ls %s" > > ] > > Dear Alexis, > > Your problem is as good as certain that the program defined as "app" > should return a sequence to standard output, so that EMBOSS > can take it. > And this is not what hmmsearch does. Furthermore, hmmsearch > searches a HMM against a databank of sequences ; you seem to > want to search a sequence against a databank of HMM's > (Pfam_ls), for which you need hmmpfam. It is maybe a good > idea to install the Embassadir HMMER. Note however that > ehmmpfam needs the user to specify where the databank is. At > the BEN site I have a little bit "hacked" the program so that > it uses Pfam_ls by default (and still lets the user choose an > alternative). If you are interested I can send you a mail > with "how to". 
>
> Guy Bottu,
> Belgian EMBnet Node
>
>

From pmr at ebi.ac.uk Fri Jun 3 06:49:24 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 3 Jun 2005 11:49:24 +0100 (BST)
Subject: [EMBOSS] external and app
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA606E802@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA606E802@ANTARESIA.be.devgen.com>
Message-ID: <1830.198.161.30.152.1117795764.squirrel@webmail.ebi.ac.uk>

Hi Marc,

> Just wondering, what happens if you use entret in stead of seqret.
> EMBOSS is supposed to just return the 'sequence' (in this case pfam
> result), unaltered, unparsed. When you use seqret, EMBOSS will parse the
> output and try to make a sequence out of it.

Entret has to read the input as a sequence, and then returns the full
text. So entret will fail where seqret fails.

regards,
Peter

From jtk at cmp.uea.ac.uk Fri Jun 3 08:53:35 2005
From: jtk at cmp.uea.ac.uk (Jan T. Kim)
Date: Fri, 3 Jun 2005 13:53:35 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
Message-ID: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk>

Dear EMBOSSers,

is it possible to read both input sequences to a pairwise alignment
from one input stream? With the test input file attached, the command

    water -asequence fasta::x.fasta:seq1 -bsequence fasta::x.fasta:seq2 -outfile stdout -auto

runs as I expect, but the command

    cat x.fasta | water -asequence fasta::stdin:seq1 -bsequence fasta::stdin:seq2 -outfile stdout -auto

gives

    EMBOSS An error in ajfile.c at line 1926:
    Error reading from file 'stdin'

It may well be that water consumes the entire input stream on getting the
first sequence, thus rendering itself unable to acquire the second one.

Is there a solution to this? I would really like to avoid the mess of
temporary files and run water in a clean pipe (pun intended ;-) )

Best regards & thanks in advance, Jan
--
 +- Jan T.
Kim -------------------------------------------------------+ | *NEW* email: jtk at cmp.uea.ac.uk | | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk | *-----=< hierarchical systems are for files, not for humans >=-----* -------------- next part -------------- > seq1 accaacc > seq2 acgagcc From simon.andrews at bbsrc.ac.uk Fri Jun 3 08:16:58 2005 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Fri, 3 Jun 2005 13:16:58 +0100 Subject: [EMBOSS] Reading Two Sequences from stdin with water In-Reply-To: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> Message-ID: <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk> On 3 Jun 2005, at 13:53, Jan T. Kim wrote: > Dear EMBOSSers, > > is it possible to read both input sequences to a pairwise alignment > from one input stream? I spent a while trying to figure this out a few months back. In the end the best solution I came up with was to use the asis: sequence type. This allows you to do: water -auto asis:aaaa asis:ataa stdout which avoids the need for messing with the file system. I seem to remember I found a way to set names for the sequences as well, but can't find that right now. As long as you make sure you don't pass your command through a shell when you launch this from a script then it actually scales pretty well to quite large sequences. Hope this helps Simon. -- Simon Andrews PhD Bioinformatics Dept. The Babraham Institute simon.andrews at bbsrc.ac.uk +44 (0) 1223 496463 From pmr at ebi.ac.uk Fri Jun 3 10:09:03 2005 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Fri, 3 Jun 2005 15:09:03 +0100 (BST) Subject: [EMBOSS] Reading Two Sequences from stdin with water In-Reply-To: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> Message-ID: <1977.198.161.30.152.1117807743.squirrel@webmail.ebi.ac.uk> Jan T. Kim writes: > is it possible to read both input sequences to a pairwise alignment > from one input stream? 
> > cat x.fasta | water -asequence fasta::stdin:seq1 -bsequence > fasta::stdin:seq2 -outfile stdout -auto > > gives > > EMBOSS An error in ajfile.c at line 1926: > Error reading from file 'stdin' > > It may well be that water consumes the entire input stream on getting the > first sequence, thus rendering itself unable to acquire the second one. > > Is there a solution to this? I would really like to avoid the mess of > temporary files and run water in a clean pipe (pun intended ;-) ) EMBOSS will only cleanly read stdin as one input. We should probably trap that internally and give an error if we find stdin opening again. I wonder whether there is any useful way to share the stdin filebuffer. Hmmmm... in the early days of EMBOSS we decided not to allow it, but it could be worth a try. You would still be in trouble if you tried to read the second sequence first though. Assuming your x.fasta file has only seq1 and seq2 in that order, reading seq1 will continue until the first line of seq2 is reached. By then it would be too late for seq2 to be read cleanly. At least you have fasta:: specified - with no specified format, EMBOSS has to read a long way into the input just to check whether it is really GCG format. As for the asis format, I suppose an EMBOSS utility that reads x.fasta and outputs asis::ctagtacgatgcgatcg asis::tgatcgatggctacgtagc would be useful to you - then you could put `sillyname x.fasta` in your command line... at least until the command line gets too long. Hard to preserve the ID and description of the sequences though. "If you think water is pure, just remember what fish do in it." Hope that helps, Peter From jtk at cmp.uea.ac.uk Fri Jun 3 11:40:31 2005 From: jtk at cmp.uea.ac.uk (Jan T. 
Kim) Date: Fri, 3 Jun 2005 16:40:31 +0100 Subject: [EMBOSS] Reading Two Sequences from stdin with water In-Reply-To: <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk> References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk> Message-ID: <20050603154031.GE25735@jtkpc.cmp.uea.ac.uk> On Fri, Jun 03, 2005 at 01:16:58PM +0100, simon andrews wrote: > > On 3 Jun 2005, at 13:53, Jan T. Kim wrote: > > >Dear EMBOSSers, > > > >is it possible to read both input sequences to a pairwise alignment > >from one input stream? > > I spent a while trying to figure this out a few months back. In the > end the best solution I came up with was to use the asis: sequence > type. This allows you to do: > > water -auto asis:aaaa asis:ataa stdout > > which avoids the need for messing with the file system. I seem to > remember I found a way to set names for the sequences as well, but > can't find that right now. That's a good idea which I hadn't thought of. Thanks for that. I don't need any names, other than for purposes of identifying the sequence within a multisequence file, which is not necessary with this solution. > As long as you make sure you don't pass your command through a shell > when you launch this from a script then it actually scales pretty well > to quite large sequences. Hmm... isn't there any OS specific limitation to the length of arguments? But anyway, this is not an issue for me in my case, where sequence length does not exceed a few hundred symbols. Best regards, Jan -- +- Jan T. 
Kim -------------------------------------------------------+
 | *NEW* email: jtk at cmp.uea.ac.uk                          |
 | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=< hierarchical systems are for files, not for humans >=-----*

From simon.andrews at bbsrc.ac.uk Fri Jun 3 10:53:17 2005
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 3 Jun 2005 15:53:17 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
In-Reply-To: <20050603154031.GE25735@jtkpc.cmp.uea.ac.uk>
References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk> <20050603154031.GE25735@jtkpc.cmp.uea.ac.uk>
Message-ID: <297ae8156db03f61d2deb2e786d3bf10@bbsrc.ac.uk>

On 3 Jun 2005, at 16:40, Jan T. Kim wrote:

> On Fri, Jun 03, 2005 at 01:16:58PM +0100, simon andrews wrote:
>> As long as you make sure you don't pass your command through a shell
>> when you launch this from a script then it actually scales pretty well
>> to quite large sequences.
>
> Hmm... isn't there any OS specific limitation to the length of
> arguments?
> But anyway, this is not an issue for me in my case, where sequence
> length does not exceed a few hundred symbols.

The only limit is imposed when the command is passed through a shell,
and is then dependent on the shell you're using. If you can call the
program without going through a shell then there should be no limit
(beyond normal OS memory limits).

The method for doing this varies with the language you're writing the
script in, but for example in Perl:

    system("water -auto asis:gatc asis:gatc stdout");

would pass the arguments through a shell, whereas

    system("water", "-auto", "asis:gatc", "asis:gatc", "stdout");

would not.

Simon.

--
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute simon.andrews at bbsrc.ac.uk +44 (0) 1223 496463 From andrew.warry at bbsrc.ac.uk Fri Jun 3 11:23:57 2005 From: andrew.warry at bbsrc.ac.uk (andrew warry (BITS)) Date: Fri, 3 Jun 2005 16:23:57 +0100 Subject: [EMBOSS] Reading Two Sequences from stdin with water Message-ID: <3AED5B0556B73F4A9B556F43384F5C8501857BEC@bitse2knas1.bits.bbsrc.reserved> >Is there a solution to this? I would really like to avoid the mess of temporary files and >run water in a clean pipe (pun intended ;-) ) Hi How about : nthseq x.fasta -number 2 -stdout -auto | water -aseq stdin -bseq x.fasta -stdout -auto It isn't very neat and does a redundant comparison but it does the job! Andrew ----------------------------------------------------------------------- ANDREW WARRY Computational Molecular Biology Support BBSRC Bioscience IT services West Common Harpenden HERTS AL5 2JE tel: (01582) 714904 fax: (01582) 714901 andrew.warry at bbsrc.ac.uk ----------------------------------------------------------------------- -- Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. This email and any attachments are believed to be free from viruses but BBSRC accepts no liability in connection therewith. From simon.andrews at bbsrc.ac.uk Fri Jun 3 11:32:34 2005 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Fri, 3 Jun 2005 16:32:34 +0100 Subject: [EMBOSS] Reading Two Sequences from stdin with water In-Reply-To: <3AED5B0556B73F4A9B556F43384F5C8501857BEC@bitse2knas1.bits.bbsrc.reserved> References: <3AED5B0556B73F4A9B556F43384F5C8501857BEC@bitse2knas1.bits.bbsrc.reserved> Message-ID: <53838984cac0240ba7aefe6d33f7810d@bbsrc.ac.uk> On 3 Jun 2005, at 16:23, andrew warry ((BITS)) wrote: > >> Is there a solution to this? 
I would really like to avoid the mess of >> temporary files and run water in a clean pipe (pun intended ;-) ) > > Hi > How about : > > nthseq x.fasta -number 2 -stdout -auto | water -aseq stdin -bseq > x.fasta > -stdout -auto > > It isn't very neat and does a redundant comparison but it does the job! But x.fasta still has to appear on the filesystem. You can't run this cleanly in a pipe. Simon. From golharam at umdnj.edu Fri Jun 3 10:57:18 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 03 Jun 2005 10:57:18 -0400 Subject: [EMBOSS] Man pages Message-ID: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1> Hi all, I recently noticed there aren't man pages installed with emboss, but I thought there were in the past. Are there man pages available? If so, where/how do I get them? ----- Ryan Golhar Computational Biologist The Informatics Institute at The University of Medicine & Dentistry of NJ Phone: 973-972-5034 Fax: 973-972-7412 Email: golharam at umdnj.edu From jtk at cmp.uea.ac.uk Fri Jun 3 13:18:01 2005 From: jtk at cmp.uea.ac.uk (Jan T. Kim) Date: Fri, 3 Jun 2005 18:18:01 +0100 Subject: [EMBOSS] Reading Two Sequences from stdin with water In-Reply-To: <1977.198.161.30.152.1117807743.squirrel@webmail.ebi.ac.uk> References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> <1977.198.161.30.152.1117807743.squirrel@webmail.ebi.ac.uk> Message-ID: <20050603171801.GF25735@jtkpc.cmp.uea.ac.uk> On Fri, Jun 03, 2005 at 03:09:03PM +0100, pmr at ebi.ac.uk wrote: > Jan T. Kim writes: > > is it possible to read both input sequences to a pairwise alignment > > from one input stream? > > > > cat x.fasta | water -asequence fasta::stdin:seq1 -bsequence > > fasta::stdin:seq2 -outfile stdout -auto > > > > gives > > > > EMBOSS An error in ajfile.c at line 1926: > > Error reading from file 'stdin' > > > > It may well be that water consumes the entire input stream on getting the > > first sequence, thus rendering itself unable to acquire the second one. 
> >
> > Is there a solution to this? I would really like to avoid the mess of
> > temporary files and run water in a clean pipe (pun intended ;-) )
>
> EMBOSS will only cleanly read stdin as one input. We should probably trap
> that internally and give an error if we find stdin opening again. I wonder
> whether there is any useful way to share the stdin filebuffer. Hmmmm... in
> the early days of EMBOSS we decided not to allow it, but it could be worth
> a try. You would still be in trouble if you tried to read the second
> sequence first though.

Conceptually, this could be cleanly handled (which is why I tried in the
first place), by having the function for obtaining the input sequences
determine the source files in a first pass of the list of sources, and
then obtain all requested sequences that come from the same file in one
go through that file. This could be applied to the standard input just as
to any other file.

However, if the current code acquires the two sequences one after the
other and independently of each other, it will require possibly less than
trivial rewrites to change that -- likely, the API for obtaining a
sequence specified by a USA would have to be extended such that multiple
sequences can be obtained from one file in one pass through that file,
and some functions to group lists of USAs into sublists of USAs that
refer to the same file would have to be provided.

> Assuming your x.fasta file has only seq1 and seq2 in that order, reading
> seq1 will continue until the first line of seq2 is reached. By then it
> would be too late for seq2 to be read cleanly.

Well, the approach outlined above does not have that limitation, and it
also works for interleaved sequence formats. But if the EMBOSS internals
are as I assume above, it's clear to me that this is something for the
long-term wishlist.
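To make the grouping idea concrete, here is a minimal sketch in Python (not EMBOSS code, and not how EMBOSS handles input internally). It assumes a simplified USA of the exact form format::file:entry and plain FASTA input; the function names `parse_usa`, `group_by_file` and `read_fasta_entries` are invented for this illustration.

```python
import io
from collections import defaultdict


def parse_usa(usa):
    """Split a simplified USA of the form 'format::file:entry' (assumed form)."""
    fmt, _, rest = usa.partition("::")
    filename, _, entry = rest.partition(":")
    return fmt, filename, entry


def group_by_file(usas):
    """Map each (format, file) pair to the list of entry names wanted from it."""
    groups = defaultdict(list)
    for usa in usas:
        fmt, filename, entry = parse_usa(usa)
        groups[(fmt, filename)].append(entry)
    return dict(groups)


def read_fasta_entries(stream, wanted):
    """Collect all wanted entries in a single pass over a FASTA stream."""
    found = {}
    current = None
    for line in stream:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            # Only keep sequence lines for entries that were requested.
            current = name if name in wanted else None
            if current is not None:
                found[current] = ""
        elif current is not None:
            found[current] += line
    return found


if __name__ == "__main__":
    usas = ["fasta::stdin:seq1", "fasta::stdin:seq2"]
    # StringIO stands in for a non-seekable stream such as standard input.
    stream = io.StringIO(">seq1\naccaacc\n>seq2\nacgagcc\n")
    for (fmt, filename), entries in group_by_file(usas).items():
        print(filename, read_fasta_entries(stream, set(entries)))
```

Because every entry requested from a given file is collected during the same pass, the order in which the sequences are requested does not matter in this sketch.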
> At least you have fasta:: specified - with no specified format, EMBOSS has > to read a long way into the input just to check whether it is really GCG > format. Yes, heuristic format determination and non-seekable inputs don't mix too well generally... > As for the asis format, I suppose an EMBOSS utility that reads x.fasta and > outputs asis::ctagtacgatgcgatcg asis::tgatcgatggctacgtagc would be useful > to you - then you could put `sillyname x.fasta` in your command line... at > least until the command line gets too long. Hard to preserve the ID and > description of the sequences though. Yes -- in my case, I have the sequences available within a Python script anyway, so the asis approach works fine for me (even with a popen facility that goes through a shell -- I'll have to check how to eliminate that for future occasions where sequences may be too long for the command line, though). > "If you think water is pure, just remember what fish do in it." I like to boil my water, adding an all-natural disinfectant known as "coffee" for this reason... ;-) Best regards, Jan -- +- Jan T. Kim -------------------------------------------------------+ | *NEW* email: jtk at cmp.uea.ac.uk | | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk | *-----=< hierarchical systems are for files, not for humans >=-----* From robin at hms.harvard.edu Fri Jun 3 12:30:33 2005 From: robin at hms.harvard.edu (Robin Colgrove) Date: Fri, 3 Jun 2005 12:30:33 -0400 Subject: [EMBOSS] Man pages in multiple languages? In-Reply-To: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1> References: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1> Message-ID: Hello all, are there EMBOSS man pages in other languages than English? Mandarin and Spanish in particular would help around here. thanks robin colgrove Harvard Medical School From pmr at ebi.ac.uk Fri Jun 3 13:14:08 2005 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Fri, 3 Jun 2005 18:14:08 +0100 (BST) Subject: [EMBOSS] Man pages in multiple languages? 
In-Reply-To: References: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1> Message-ID: <2398.198.161.30.152.1117818848.squirrel@webmail.ebi.ac.uk> Hi Robin, > are there EMBOSS man pages in other languages than English? > > Mandarin and Spanish in particular would help around here. We don't have man pages exactly. We have a text version of the online documentation, with the "tfm" program to display it on the screen. To find out why it is called tfm, you can use the command: tfm tfm Of course, it prints "The F(antastic) Manual" as in "RTFM". For other languages, there may be something out there. We are aware of a Japanese user group that has translated much of the EMBOSS materials. I am sure there are Mandarin speakers who could create a Mandarin version - though on the first ever EMBOSS course (in Beijing) there was a vote against creating a Mandarin version of the commandline. Hope this helps, Peter Rice From luojc at plum.lsc.pku.edu.cn Fri Jun 3 21:15:37 2005 From: luojc at plum.lsc.pku.edu.cn (Jingchu Luo) Date: Sat, 4 Jun 2005 09:15:37 +0800 (CST) Subject: [EMBOSS] Man pages in multiple languages? In-Reply-To: <2398.198.161.30.152.1117818848.squirrel@webmail.ebi.ac.uk> Message-ID: > I am sure there are Mandarin speakers who could create a Mandarin > version - though on the first ever EMBOSS course (in Beijing) there was > a vote against creating a Mandarin version of the commandline. We were running an EMBnet bioinformatics workshop in April 1999. Peter gave a talk about EMBOSS. It might be useful to have a user manual and/or documentation in Chinese for the Chinese user group. We'll see if anyone in mainland has been working on this already.
Jingchu ------- Jingchu Luo Centre of Bioinformatics Peking University Beijing 100871, China Tel: 86-10-6275-7281 Fax: 86-10-6275-9001 Email: luojc at pku.edu.cn URL: http://www.cbi.pku.edu.cn From d.gatherer at vir.gla.ac.uk Wed Jun 15 06:31:33 2005 From: d.gatherer at vir.gla.ac.uk (Derek Gatherer) Date: Wed, 15 Jun 2005 11:31:33 +0100 Subject: [EMBOSS] seqret options Message-ID: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk> Dear EMBOSSers I'm trying to write a pipeline to take a load of paired, aligned homologues from 2 species and submit them sequentially to the yn00 application from the well known PAML package. PAML's applications all take PHYLIP format. I can easily make this by looping over: seqret -auto -osformat phylip infile -out outfile However, PAML requires that the flag "I" be placed on the top line of the phylip fomat to indicate interleaved, eg: 2 663 I c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC barf1 ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT rather than the standard phylip format, given by seqret: 2 663 c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC barf1 ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT I could write a script to open each seqret output file and add this character to the top line of each, but before I dive into this, I'd like to know if there is any flag I can add to seqret to get the "I" added automatically. Failing that, PAML takes the other, non-interleaved phylip format ("sequential") by default, and that would not require any flag insertion. 
Seqret also can produce this (using -osformat phylip3): 1 663 YF c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA but then PAML won't read it because it doesn't like the YF flags inserted by seqret!! So I either have to script to remove flags from sequential or insert them in interleaved, unless seqret has a solution. All assistance gratefully appreciated Derek From David.Bauer at SCHERING.DE Wed Jun 15 07:19:55 2005 From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE) Date: Wed, 15 Jun 2005 13:19:55 +0200 Subject: Antwort: [EMBOSS] seqret options Message-ID: Hi Derek, you can easily change this in the source code. The sequence output formats are defined in ajax/ajseqwrite.c In the function seqWritePhylip3 you find a line: ajFmtPrintF(outseq->File, "1 %d YF\n", ilen); Here you can just delete the YF and recompile emboss. David. Derek Gatherer An: emboss at embnet.org Gesendet von: Kopie: owner-emboss at hgm Thema: [EMBOSS] seqret options p.mrc.ac.uk 15.06.2005 12:31 Dear EMBOSSers I'm trying to write a pipeline to take a load of paired, aligned homologues from 2 species and submit them sequentially to the yn00 application from the well known PAML package. PAML's applications all take PHYLIP format. 
I can easily make this by looping over: seqret -auto -osformat phylip infile -out outfile However, PAML requires that the flag "I" be placed on the top line of the phylip fomat to indicate interleaved, eg: 2 663 I c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC barf1 ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT rather than the standard phylip format, given by seqret: 2 663 c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC barf1 ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT I could write a script to open each seqret output file and add this character to the top line of each, but before I dive into this, I'd like to know if there is any flag I can add to seqret to get the "I" added automatically. Failing that, PAML takes the other, non-interleaved phylip format ("sequential") by default, and that would not require any flag insertion. Seqret also can produce this (using -osformat phylip3): 1 663 YF c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA but then PAML won't read it because it doesn't like the YF flags inserted by seqret!! So I either have to script to remove flags from sequential or insert them in interleaved, unless seqret has a solution. 
All assistance gratefully appreciated Derek From pmr at ebi.ac.uk Wed Jun 15 08:23:48 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 15 Jun 2005 13:23:48 +0100 Subject: [EMBOSS] seqret options In-Reply-To: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk> References: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk> Message-ID: <42B01DD4.8050303@ebi.ac.uk> Derek Gatherer wrote: > Dear EMBOSSers > > I'm trying to write a pipeline to take a load of paired, aligned > homologues from 2 species and submit them sequentially to the yn00 > application from the well known PAML package. PAML's applications all > take PHYLIP format. > Failing that, PAML takes the other, non-interleaved phylip format > ("sequential") by default, and that would not require any flag > insertion. Last time I worked through the PHYLIP formats (for EMBOSS 2.10.0) I found Phylip had changed the format it used. One change was that I removed the YF from phylip3 format because phylip was no longer using it - so updating to EMBOSS 2.10.0 will solve your non-interleaved format problem (and David Bauer's code fix is exactly what you need). Any more feedback on the variations of phylip formats that other packages use would be a great help! We will be releasing the PHYLIP 3.6 integration (as a PHYLIPNEW EMBASSY package) soon and expect to see more use of phylogenetics packages with EMBOSS. 
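Until a fixed seqret is installed, the header line can also be patched after the fact, as Derek suggested. The following is a minimal Python sketch, assuming the first line of the file holds the sequence count, the length and optional flags; the function names are made up for this example, and it leaves the two numbers untouched.

```python
def fix_phylip_header(text, flag=""):
    """Rewrite the first line as ' nseq length [flag]', dropping any other flags."""
    header, sep, rest = text.partition("\n")
    fields = header.split()
    if len(fields) < 2:
        raise ValueError("not a PHYLIP header: %r" % header)
    new_header = " %s %s" % (fields[0], fields[1])
    if flag:
        new_header += " " + flag
    return new_header + sep + rest


def patch_file(path, flag=""):
    """Apply fix_phylip_header to a file in place."""
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(fix_phylip_header(text, flag))
```

Calling `patch_file("barf1.phylip3")` would strip the YF from a sequential file, and `patch_file("barf1.phylip", "I")` would add the "I" to an interleaved one; the file names are just the ones used in this thread.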
regards, Peter Rice From d.gatherer at vir.gla.ac.uk Wed Jun 15 08:44:46 2005 From: d.gatherer at vir.gla.ac.uk (Derek Gatherer) Date: Wed, 15 Jun 2005 13:44:46 +0100 Subject: [EMBOSS] seqret options In-Reply-To: <42B01DD4.8050303@ebi.ac.uk> References: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk> <42B01DD4.8050303@ebi.ac.uk> Message-ID: <6.2.1.2.1.20050615134121.02addff8@lenzie.gla.ac.uk> I do have 2.10.0: [gath01d at gamma seqs]$ seqret -osformat phylip3 barf1_both.seq Reads and writes (returns) sequences Output sequence [c-barf1.phylip3]: barf1.phylip3 [gath01d at gamma seqs]$ more barf1.phylip3 1 663 YF c-barf1ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA [gath01d at gamma seqs]$ embossversion Writes the current EMBOSS version number 2.10.0 Anyway, I know how to do the code fix now, so thanks to all. Cheers Derek At 13:23 15/06/2005, you wrote: >Derek Gatherer wrote: > >>Dear EMBOSSers >>I'm trying to write a pipeline to take a load of paired, aligned >>homologues from 2 species and submit them sequentially to the yn00 >>application from the well known PAML package. PAML's applications all >>take PHYLIP format. > >>Failing that, PAML takes the other, non-interleaved phylip format >>("sequential") by default, and that would not require any flag insertion. > >Last time I worked through the PHYLIP formats (for EMBOSS 2.10.0) I found >Phylip had changed the format it used. > >One change was that I removed the YF from phylip3 format because phylip >was no longer using it - so updating to EMBOSS 2.10.0 will solve your >non-interleaved format problem (and David Bauer's code fix is exactly what >you need). > >Any more feedback on the variations of phylip formats that other packages >use would be a great help! 
> >We will be releasing the PHYLIP 3.6 integration (as a PHYLIPNEW EMBASSY >package) soon and expect to see more use of phylogenetics packages with EMBOSS. > >regards, > >Peter Rice > From pmr at ebi.ac.uk Wed Jun 15 08:49:59 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 15 Jun 2005 13:49:59 +0100 Subject: [EMBOSS] seqret options In-Reply-To: <6.2.1.2.1.20050615134121.02addff8@lenzie.gla.ac.uk> References: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk> <42B01DD4.8050303@ebi.ac.uk> <6.2.1.2.1.20050615134121.02addff8@lenzie.gla.ac.uk> Message-ID: <42B023F7.7010808@ebi.ac.uk> Derek Gatherer wrote: > I do have 2.10.0: > > [gath01d at gamma seqs]$ seqret -osformat phylip3 barf1_both.seq > Reads and writes (returns) sequences > Output sequence [c-barf1.phylip3]: barf1.phylip3 > [gath01d at gamma seqs]$ more barf1.phylip3 > 1 663 YF > c-barf1ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC > CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT > ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA > [gath01d at gamma seqs]$ embossversion > Writes the current EMBOSS version number > 2.10.0 Oops ... make that "will be in 3.0.0" in that case ... it worked for me :-) regards, Peter From d.gatherer at vir.gla.ac.uk Wed Jun 15 09:25:36 2005 From: d.gatherer at vir.gla.ac.uk (Derek Gatherer) Date: Wed, 15 Jun 2005 14:25:36 +0100 Subject: [EMBOSS] seqret again Message-ID: <6.2.1.2.1.20050615142150.02afda58@lenzie.gla.ac.uk> Is this a bug? Compare the following output from seqret when phylip and phylip3 are specified. Shouldn't the first line of the phylip3 output be "2 546 YF" and not "1 546" ? 
[gath01d at gamma EBV]$ seqret -osformat phylip seqs/balf1.both Reads and writes (returns) sequences Output sequence [c-balf1.phylip]: seqs/balf1.phylip [gath01d at gamma EBV]$ more seqs/balf1.phylip 2 546 c-balf1.seATGCAGCCAG CCAAGTCTAC CGATTCGGTG TTTGTGAGGA CCCCGGTCGA balf1.seq ATGAGGCCAG CCAAGTCTAC AGATTCTGTG TTTGTGAGGA CCCCGGTCGA GGCGTGGGTC TCACCCTCGC CCCCGGACGA CAAAGTGGCA GAGACCAGCT GGCGTGGGTC GCGCCCTCGC CGCCGGACGA CAAGGTGGCT GAGTCCAGCT [snip] [gath01d at gamma EBV]$ seqret -osformat phylip3 seqs/balf1.both Reads and writes (returns) sequences Output sequence [c-balf1.phylip3]: seqs/balf1.phylip3 [gath01d at gamma EBV]$ more seqs/balf1.phylip3 1 546 YF c-balf1.seATGCAGCCAG CCAAGTCTAC CGATTCGGTG TTTGTGAGGA CCCCGGTCGA GGCGTGGGTC TCACCCTCGC CCCCGGACGA CAAAGTGGCA GAGACCAGCT ACCTCCTGTT CAGGGCCCTA TACGCTGTGT TCACCCAGGA CGAGACGGAC CTGCCTCTAC CGGCCCTGGT CATGTGCCGG CTCCTGAAGG CCTCCCTGAG [snip] From pmr at ebi.ac.uk Wed Jun 15 09:35:57 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 15 Jun 2005 14:35:57 +0100 Subject: [EMBOSS] seqret again In-Reply-To: <6.2.1.2.1.20050615142150.02afda58@lenzie.gla.ac.uk> References: <6.2.1.2.1.20050615142150.02afda58@lenzie.gla.ac.uk> Message-ID: <42B02EBD.4040800@ebi.ac.uk> Derek Gatherer wrote: > Is this a bug? Compare the following output from seqret when phylip and > phylip3 are specified. Shouldn't the first line of the phylip3 output > be "2 546 YF" and not "1 546" ? > [gath01d at gamma EBV]$ seqret -osformat phylip3 seqs/balf1.both > Reads and writes (returns) sequences > Output sequence [c-balf1.phylip3]: seqs/balf1.phylip3 > [gath01d at gamma EBV]$ more seqs/balf1.phylip3 > 1 546 YF > c-balf1.seATGCAGCCAG CCAAGTCTAC CGATTCGGTG TTTGTGAGGA CCCCGGTCGA > GGCGTGGGTC TCACCCTCGC CCCCGGACGA CAAAGTGGCA GAGACCAGCT > ACCTCCTGTT CAGGGCCCTA TACGCTGTGT TCACCCAGGA CGAGACGGAC > CTGCCTCTAC CGGCCCTGGT CATGTGCCGG CTCCTGAAGG CCTCCCTGAG Yes. Fixed in the next release (and in the current CVS code). Fixed as in "2 546" without the YF. 
Do any programs require the YF? Peter From kertib at linuxlap.hu Wed Jun 15 10:13:44 2005 From: kertib at linuxlap.hu (Kerti Balazs Gabor) Date: Wed, 15 Jun 2005 16:13:44 +0200 Subject: [EMBOSS] Install error (AMD64) Message-ID: <42B03798.1060204@linuxlap.hu> Hello! I would like to install emboss (latest version) from source. The host OS is Fedora Linux Core 4 (2.6.11-1.1369_FC4 #1 Thu Jun 2 22:56:33 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux). The script $ configure --enable 64 ran clear but the make made error this: /bin/sh ../libtool --tag=CC --mode=link gcc -O2 -o aaindexextract aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la ../ajax/libajax.la ../plplot/libplplot.la -lX11 -lm mkdir .libs gcc -O2 -o .libs/aaindexextract aaindexextract.o ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm -Wl,--rpath -Wl,/usr/local/lib /usr/bin/ld: cannot find -lX11 collect2: ld returned 1 exit status make[2]: *** [aaindexextract] Error 1 make[2]: Leaving directory `/usr/src/EMBOSS-2.10.0/emboss' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/usr/src/EMBOSS-2.10.0/emboss' make: *** [all-recursive] Error 1 [root at localhost EMBOSS-2.10.0]# How to solve this? What package(s) need for it? Balazs From ableasby at hgmp.mrc.ac.uk Wed Jun 15 10:26:39 2005 From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby) Date: Wed, 15 Jun 2005 15:26:39 +0100 (BST) Subject: [EMBOSS] Install error (AMD64) Message-ID: <200506151426.j5FEQduS029156@bromine.hgmp.mrc.ac.uk> Dear Balazs, You need to install the xorg-x11-devel RPM, 'make clean' and do the configure step again. Also, there is no need to define --enable64 unless you expect 'user space' applications to consume more than 4Gb of internal memory. 
HTH Alan Bleasby RFCGR/HGMP (for the next month and a half) From aengus.stewart at cancer.org.uk Wed Jun 15 11:46:07 2005 From: aengus.stewart at cancer.org.uk (Aengus Stewart) Date: Wed, 15 Jun 2005 16:46:07 +0100 Subject: [EMBOSS] 3.0.0 Message-ID: <42B04D3F.7020405@cancer.org.uk> Will the ceremonial release of 3.0.0 into the wild be at ISMB? In other words, soon? :-) Regards Aengus -- ----------------------------------------------------------------------- Aengus Stewart Group Leader Tel: +44 (0)20 7269 3679 Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK ----------------------------------------------------------------------- This electronic message contains information which may be privileged and confidential. The information is intended to be for the use of the individual(s) or entity named above. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of the contents of this information is prohibited. If you have received this electronic message in error, please notify me by telephone or email (to the number or address above) immediately. From ableasby at hgmp.mrc.ac.uk Wed Jun 15 12:44:18 2005 From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby) Date: Wed, 15 Jun 2005 17:44:18 +0100 (BST) Subject: [EMBOSS] 3.0.0 Message-ID: <200506151644.j5FGiI8T009556@bromine.hgmp.mrc.ac.uk> Well, we always like to try to release on St Swithin's Day; that date is normally before ISMB, but this year it isn't. EMBOSS will feature at ISMB in all the usual places (BOSC, poster, demo and maybe BOF) and the soon-to-be-released 3.0.0 will certainly be mentioned there. Alan From golharam at umdnj.edu Wed Jun 15 15:01:54 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Wed, 15 Jun 2005 15:01:54 -0400 Subject: [EMBOSS] EMBOSS-GUI Message-ID: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1> Does anyone know if any work is being done on EMBOSS-GUI by Luke McCarthy. The web site doesn't seem to be active and out-of-date. 
If a new version isn't being worked on, I'd like to volunteer to help maintain it for v3.0.0. It's such a simple and clean interface. I haven't found anything else like it. Ryan From andrespinzon at gmail.com Wed Jun 15 16:14:27 2005 From: andrespinzon at gmail.com (Andres Pinzon) Date: Wed, 15 Jun 2005 15:14:27 -0500 Subject: [EMBOSS] EMBOSS-GUI In-Reply-To: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1> References: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1> Message-ID: <8968fc7e0506151314772f91f0@mail.gmail.com> 2005/6/15, Ryan Golhar : > Does anyone know if any work is being done on EMBOSS-GUI by Luke > McCarthy. The web site doesn't seem to be active and out-of-date. > > If a new version isn't being worked on, I'd like to volunteer to help > maintain it for v3.0.0. It's such a simple and clean interface. I > haven't found anything else like it. If you need help to maintain it please ask me! ;-) I really liked that interface too. -- --------- Andrés Pinzón [http://www.andrespinzon.com] Centro de Bioinformatica, Instituto de Biotecnologia http://bioinf.ibun.unal.edu.co Universidad Nacional de Colombia tel. 3165000 ext. 16961 GNU/Linux user number 349752 ---------- From lukem at gene.pbi.nrc.ca Wed Jun 15 15:49:23 2005 From: lukem at gene.pbi.nrc.ca (Luke McCarthy) Date: Wed, 15 Jun 2005 13:49:23 -0600 Subject: [EMBOSS] EMBOSS-GUI In-Reply-To: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1> References: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1> Message-ID: <1118864963.13749.8.camel@incognito.invalid> On Wed, 2005-06-15 at 13:01, Ryan Golhar wrote: > Does anyone know if any work is being done on EMBOSS-GUI by Luke > McCarthy. The web site doesn't seem to be active and out-of-date. > > If a new version isn't being worked on, I'd like to volunteer to help > maintain it for v3.0.0. It's such a simple and clean interface. I > haven't found anything else like it.
I have developed a new version and moved the code to sourceforge (http://sourceforge.net/projects/embossgui/) Since February, the only remaining step has been to wrap it up in a releasable format, but I just haven't found the time. I had considered waiting until the 3.0.0 release of EMBOSS, but if there's interest now I'll do my best to get it out there sooner. Cheers, Luke From golharam at umdnj.edu Thu Jun 16 11:10:49 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 16 Jun 2005 11:10:49 -0400 Subject: [EMBOSS] EMBOSS-GUI In-Reply-To: <1118866913.13749.12.camel@incognito.invalid> Message-ID: <002201c57285$95084090$e6028a0a@GOLHARMOBILE1> The release for EMBOSS 3.0.0 is around July 15th? If so, I can wait for embossgui until then. If you need any help with embossgui, please let me know. I'd be more than happy to contribute what I can. Ryan -----Original Message----- From: Luke McCarthy [mailto:lukem at gene.pbi.nrc.ca] Sent: Wednesday, June 15, 2005 4:22 PM To: Ryan Golhar Subject: Re: [EMBOSS] EMBOSS-GUI * (also copied to emboss at embnet.org) On Wed, 2005-06-15 at 13:01, Ryan Golhar wrote: > Does anyone know if any work is being done on EMBOSS-GUI by Luke > McCarthy. The web site doesn't seem to be active and out-of-date. > > If a new version isn't being worked on, I'd like to volunteer to help > maintain it for v3.0.0. Its such a simple and clean interface. I > haven't found anything else like it. I have developed a new version and moved the code to sourceforge (http://sourceforge.net/projects/embossgui/) Since February, the only remaining step has been to wrap it up in a releasable format, but I just haven't found the time. I had considered waiting until the 3.0.0 release of EMBOSS, but if there's interest now I'll do my best to get it out there sooner. 
Cheers, Luke From msarachu at biol.unlp.edu.ar Thu Jun 16 15:41:23 2005 From: msarachu at biol.unlp.edu.ar (Martin Sarachu) Date: Thu, 16 Jun 2005 16:41:23 -0300 Subject: [EMBOSS] Masking the : character? Message-ID: <42B1D5E3.1000503@biol.unlp.edu.ar> Dear list, is there any way to mask the ':' character so it is not interpreted as a delimiter for DB:sequence? I have this file /home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf and when I run infoseq I get this error $ infoseq /home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf Displays some simple information about sequences Error: failed to open filename '/home/embtest/wProjects/test/.clustal.05.06.15' Error: Unable to read sequence '/home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf' Died: infoseq terminated: Bad value for '-sequence' and no prompt Thanks in advance, Martin -- Martin Sarachu msarachu at biol.unlp.edu.ar AR.EMBnet http://www.ar.embnet.org From yezhiqiang at gmail.com Sat Jun 18 05:28:16 2005 From: yezhiqiang at gmail.com (yezhiqiang at gmail.com) Date: Sat, 18 Jun 2005 17:28:16 +0800 Subject: [EMBOSS] Masking the : character? In-Reply-To: <42B1D5E3.1000503@biol.unlp.edu.ar> References: <42B1D5E3.1000503@biol.unlp.edu.ar> Message-ID: <34198fe4050618022825238622@mail.gmail.com> I have also found this. and \: or using quote cannot solve this problem. But why not just rename your file name? It doesn't bother. 2005/6/17, Martin Sarachu : > Dear list, > > is there any way to mask the ':' character so it is not interpreted as a > delimiter for DB:sequence? 
> I have this file > > /home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf > > and when I run infoseq I get this error > > $ infoseq > /home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf > Displays some simple information about sequences > Error: failed to open filename > '/home/embtest/wProjects/test/.clustal.05.06.15' > Error: Unable to read sequence > '/home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf' > Died: infoseq terminated: Bad value for '-sequence' and no prompt > > Thanks in advance, > > Martin > > -- > Martin Sarachu > msarachu at biol.unlp.edu.ar > AR.EMBnet > http://www.ar.embnet.org > From yezhiqiang at gmail.com Sat Jun 18 05:50:50 2005 From: yezhiqiang at gmail.com (yezhiqiang at gmail.com) Date: Sat, 18 Jun 2005 17:50:50 +0800 Subject: [EMBOSS] Man pages In-Reply-To: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1> References: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1> Message-ID: <34198fe405061802504ace851@mail.gmail.com> EMBOss has its own manual system: tfm try like this: wossname seqret tfm seqret 2005/6/3, Ryan Golhar : > Hi all, > > I recently noticed there aren't man pages installed with emboss, but I > thought there were in the past. Are there man pages available? If so, > where/how do I get them? > > ----- > Ryan Golhar > Computational Biologist > The Informatics Institute at > The University of Medicine & Dentistry of NJ > > Phone: 973-972-5034 > Fax: 973-972-7412 > Email: golharam at umdnj.edu > > From jrvalverde at cnb.uam.es Mon Jun 20 04:55:20 2005 From: jrvalverde at cnb.uam.es (=?ISO-8859-15?Q?Jos=E9?= R. Valverde) Date: Mon, 20 Jun 2005 10:55:20 +0200 Subject: [EMBOSS] Multiplatform filenames (was Re: Masking the : character?) 
In-Reply-To: <34198fe4050618022825238622@mail.gmail.com> References: <42B1D5E3.1000503@biol.unlp.edu.ar> <34198fe4050618022825238622@mail.gmail.com> Message-ID: <20050620105520.736fef76.jrvalverde@cnb.uam.es> On Sat, 18 Jun 2005 17:28:16 +0800 wrote: > I have also found this. > and \: or using quote cannot solve this problem. > > But why not just rename your file name? It doesn't bother. > > > 2005/6/17, Martin Sarachu : > > Dear list, > > > > is there any way to mask the ':' character so it is not interpreted as a > > delimiter for DB:sequence? Renaming. --------- Or in other words (caution, detailed explanation follows): Why should anybody have a database or db. file named something\ or something\\\? But the fact is that by Unix filesystem semantics that is allowed. So, there is no easy way to avoid the ':' problem, as one must accommodate this. Especially since :: is also meaningful to EMBOSS. One would have to introduce the notion of a special escape metacharacter or a quotation method, and while at it, it should integrate easily with shells... meaning that it should not be pre-processed by the shell (e.g. 'file:name' would come out of the shell as file:name, the user would need to type "'file:name'" or some other such horrible combination to escape shell quotations too). The problem arises because the ':' is used for historic reasons as a carry-over from VMS where it had special meaning on pathnames. This does not hold on UNIX where it is a legit character (actually ANY char but '/' and NULL is a legit character on UNIX). This is important as EMBOSS may be used on many locales, and you don't know in advance how a given symbol will be represented on them. Freedom comes at a cost. QUICK SOLUTION - ------------ I think that for the user it is simpler to know that ':' has a special meaning and should be avoided. For the cases where the colon is generated automatically, it may be better to provide a renaming script that changes the colon to something else.
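Such a renaming script could look roughly like the Python sketch below. It is only an illustration: the function name `rename_colons` and the choice of '_' as the replacement are assumptions for this example, and it refuses to overwrite an existing file rather than guess.

```python
import os


def rename_colons(root, replacement="_"):
    """Rename every file and directory below root whose name contains ':'.

    The tree is walked bottom-up so that the contents of a directory are
    renamed before the directory itself. Returns a list of (old, new) paths.
    """
    renamed = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            if ":" not in name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, name.replace(":", replacement))
            if os.path.lexists(new):
                raise FileExistsError(new)
            os.rename(old, new)
            renamed.append((old, new))
    return renamed
```

Run on the directory from the original question, `rename_colons("/home/embtest/wProjects/test")` would turn `.clustal.05.06.15:17.46.27` into `.clustal.05.06.15_17.46.27`, after which the path can be handed to infoseq. The root directory itself is never renamed.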
UI 'PRO' APPROACH - --------------- For GUI writers it is probably better to "translate" any such filenames between the user and EMBOSS. Note the quotes around translate above: it is not immediate. Let me explain: Escaping for the *command line* must be done using some character that is a) meaningful (but those are mostly already taken) and b) easy to type on a keyboard. In any case, this means that the user must be aware of the special case, and if so, renaming is just as good a solution. Escaping for the GUI removes all conditions and gives you full freedom. There are useful tricks to use special quoting/escaping chars on GUIs (hint: look into ASCII 0-32), but translating filenames can NOT be done transparently to the user (unless you can guarantee yours is the only user interface they will use). Any translation will change the filename and make it look differently or even untypable on other interfaces. Note that the problem still remains of distinguishing when a pathname containing a colon is an actual filename and not a database:file specification automatically. On a GUI you may assume a :-containing path is a filename when you are tagging uploaded data or program generated data, but otherwise you should be cautious, highly cautious. I.e. does swiss:prot_human refer to the database entry or to the data the user uploaded and called that way? Is it possible someone has called their database 'sequencer_files' locally and if so how you distinguish the local database of sequencer files from the user batch of sequencer_files:* uploaded sequences? Assuming you can tell, then read on: The trick is to create a special hidden directory on each user directory accessed: e.g. .myGUI-names. 
Then for every file make a suitably processed symlink on that subdirectory and call emboss through the symlink, sort of:

my-gui-store-file(filename)
{
    save(filename);
    sym = concatenate(".myGUI-names/", process(filename));
    make_symlink(filename, sym);
}

my-gui-emboss-access-file(filename)
{
    sym = concatenate(".myGUI-names/", process(filename));
    if (!file_exists(sym))
        make_symlink(filename, sym);
    emboss-access(sym);
}

process(filename)
{
    copy = duplicate(filename);     // leave the original name untouched
    for (p = copy; *p; p++)
        if (*p == ':')
            *p = SUB;               // e.g. ASCII 0x1A
    return copy;
}

And off you go.

Why the SUB? You should try to substitute the colon by something that is guaranteed to be portable. You only have either a) the portable character set (which is all typable) or b) the control character set (ASCII 0-32) which you may assume will be available everywhere, and most probably not used in filenames as they are very difficult to type or use by hand in general. From these we better avoid NUL, BEL, BS, HT, LF, VT, FF, CR and ESC just in case. But we still have plenty to choose from: SUB (substitute), CAN (cancel), DLE (data link escape) have good mnemonics for escaping and STX (start of text) and ETX (end of text) for quoting, but these are only suggestions.

That is to say: in the example above we substituted : by SUB, because we only care about this special case. If there were more cases, then full escaping/quoting might be needed, and then instead we would copy the filename into a new string and fully quote/escape.

I suggest the substitution approach since we are doing the encoding *within* the file name: anything else (quoting/escaping) will introduce additional chars inside the filename and this will reduce the available filename length, hence making it less transparent and potentially dangerous (should there by any chance be two filenames on the length limit containing an escapable sequence and differing only in the last char).
Alternately one may use a hash of the filename instead, but this is more painful to code, maintain and debug and potentially more wasteful in terms of space. Now, the original filenames are in place, and available for the command line, up/downloads, other user interfaces, etc. to manage as they wish, but your GUI is no longer haunted by the infamous colon. Symlinks on UNIX eat very little space: usually just the directory entry. If space is very tight and becomes a concern you may consider either hardlinks or only symlinking special filenames (this last at the cost of additionally complex logic). With current hard disks I wouldn't worry. And, yes, I know this involves many more changes to a UI, but either users accommodate (by avoiding the colon) or the UI does (by hiding limitations). Actually, a similar trick is used by NetATalk, AppleTalk, MacOS X and other systems that have similar metadata problems. j From pmr at ebi.ac.uk Mon Jun 20 05:16:35 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 20 Jun 2005 10:16:35 +0100 Subject: [EMBOSS] Multiplatform filenames (was Re: Masking the : character?) In-Reply-To: <20050620105520.736fef76.jrvalverde@cnb.uam.es> References: <42B1D5E3.1000503@biol.unlp.edu.ar> <34198fe4050618022825238622@mail.gmail.com> <20050620105520.736fef76.jrvalverde@cnb.uam.es> Message-ID: <42B68973.7090105@ebi.ac.uk> José R. Valverde wrote: >>2005/6/17, Martin Sarachu : >>>is there any way to mask the ':' character so it is not interpreted as a >>>delimiter for DB:sequence? > The problem arises because the ':' is used for historic reasons as a > carry-over from VMS where it had special meaning on pathnames.
This > does not hold on UNIX where it is a legit character (actually ANY char > but '/' and NULL is a legit character on UNIX). This is important as > EMBOSS may be used on many locales, and you don't know in advance > how a given symbol will be represented on them. Freedom comes at a > cost. Strictly speaking, the problem arises because ':' has become a standard for bioinformatics users - though, yes, VMS was the source of the special syntax. It was adopted by, among others, GCG and SRS. It is also used, of course, in URN and URL syntax. However, in this case there is a partial solution. Only alphanumeric characters are allowed in EMBOSS database names, and they must be more than one character in length (to avoid clashing with C: on Windows systems). The problem posted was not in a database name. It was the filename:id syntax, where a ':' appeared in the full path of the filename. For a ':' in a directory name (not in the filename) we could try to catch it by not allowing '/' in the ID. However, that can run into problems. For example, PFAM uses '/' in the identifier of a sequence derived from a longer entry. > QUICK SOLUTION > - ------------ > I think that for the user it is simpler to know that ':' has a special > meaning and should be avoided. > > For the cases where the colon is generated automatically, it may be better > to provide a renaming script that changes the colon to something else. That would be my recommendation too. > UI 'PRO' APPROACH > - --------------- > For GUI writers it is probably better to "translate" any such filenames > between the user and EMBOSS. Note the quotes around translate above: it > is not immediate. Let me explain: > > The trick is to create a special hidden directory on each user > directory accessed: e.g. .myGUI-names. Then for every file make a > suitably processed symlink on that subdirectory and call emboss through > the symlink, sort of: Looks like a good approach.
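The symlink trick quoted above can be sketched in Python roughly as follows. This is only an illustration under assumptions: it needs a filesystem with POSIX symlinks, the function names are invented for this example, and the directory name `.myGUI-names` and the SUB character (ASCII 0x1A) are simply taken from the suggestion in this thread. It only helps with a colon in the file name itself; a colon in one of the parent directory names would still end up in the path given to EMBOSS.

```python
import os

SUB = "\x1a"                 # ASCII SUB, used in place of ':' inside link names
LINK_DIR = ".myGUI-names"    # hidden per-directory store of colon-free links


def colon_free(filename):
    """Return the file name with every ':' replaced by SUB."""
    return filename.replace(":", SUB)


def emboss_path(directory, filename):
    """Return a colon-free path to directory/filename, creating a symlink if needed."""
    link_dir = os.path.join(directory, LINK_DIR)
    os.makedirs(link_dir, exist_ok=True)
    link = os.path.join(link_dir, colon_free(filename))
    if not os.path.lexists(link):
        # A relative target keeps the link valid if the whole directory is moved.
        os.symlink(os.path.join("..", filename), link)
    return link
```

A GUI would then pass `emboss_path(user_dir, name)` to the EMBOSS program instead of the original path, while the file itself keeps its original name for every other interface.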
The alternative would be to trap "bad" filenames and ask the user to correct them. regards, Peter From kkmattil at csc.fi Mon Jun 20 07:50:46 2005 From: kkmattil at csc.fi (Kimmo Mattila) Date: Mon, 20 Jun 2005 14:50:46 +0300 (EEST) Subject: [EMBOSS] Installing EMBOSS on a Rocks linux Message-ID: Hi I would like to ask, if anyone of you have managed to install EMBOSS on a linux cluster running Rocks linux. When I tried to install EMBOSS to our Rocks cluster, the standard installation procedure went through without error messages, but when I try to start an EMBOSS application, I get an error message: wossname Segmentation fault (core dumped) Google search about this topic revealed that some one else have had similar problems with Rocks too, but I was not able to find any potential solution. However, EMBOSS is available in Rocks based BioBrew linux distribution. So, any hints about how to install EMBOSS in a Rocks cluster would be welcome. Regards, Kimmo Mattila --------------------------------------------------------------- Kimmo Mattila, sovellusasiantuntija, Bioinformatiikan palvelut, CSC PL 405 02101 Espoo, puh 09 457 2708 , fax (09) 457 2302 CSC on tieteen tietotekniikan keskus, www.csc.fi, s-posti: kimmo.mattila at csc.fi Kimmo Mattila, application scientist, Bioinformatics Support, CSC P.O. Box 405 02101 Espoo, Finland, tel +358 9 4572708, fax +358 9 4572302 CSC is the Finnish IT Center for Science, www.csc.fi, e-mail: kimmo.mattila at csc.fi --------------------------------------------------------------- From smiddha at indiana.edu Mon Jun 20 10:59:56 2005 From: smiddha at indiana.edu (Sumit Middha) Date: Mon, 20 Jun 2005 09:59:56 -0500 Subject: [EMBOSS] Emboss package - file size limitations In-Reply-To: References: Message-ID: <1119279596.42b6d9ec2c52d@webmail.iu.edu> Hi, I looked around for threshold limitations on the size of the files that can be used for analysis, but could not locate any information. 
Is there a limit to the size of files that I can use, and is there a different limit on the web and command line usage. Actually I had the same question for GCG tools. Thanks, Sumit From pmr at ebi.ac.uk Mon Jun 20 11:26:52 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 20 Jun 2005 16:26:52 +0100 Subject: [EMBOSS] Emboss package - file size limitations In-Reply-To: <1119279596.42b6d9ec2c52d@webmail.iu.edu> References: <1119279596.42b6d9ec2c52d@webmail.iu.edu> Message-ID: <42B6E03C.9020306@ebi.ac.uk> Hi Sumit, > Is there a limit to the size of files that I can use, and is there a different > limit on the web and command line usage. EMBOSS has no hard coded limit on sequence or file size. The operating system may have problems with 2Gb file size, and the EMBLCD indexing system we use for database indexing in EMBOSS 2 has a 2Gb file size limit (4 byte file pointers are part of the index format) - there will be a new indexing system in beta release with EMBOSS 3 that will have enough space for large file offsets. Some algorithms will have limits, depending on the memory (real and virtual) on your machine. > Actually I had the same question for GCG tools. I believe sequence length is still up to 350kb unless you have the source code (when I was at Sanger I routinely rebuilt GCG with 750kb as the maximum sequence length so the genome sequencers could still use it on their own sequences!) A future release of GCG is supposed to increase this. Hope that helps, Peter Rice From francis at bii.a-star.edu.sg Tue Jun 21 04:47:51 2005 From: francis at bii.a-star.edu.sg (Francis Tang) Date: Tue, 21 Jun 2005 16:47:51 +0800 Subject: [EMBOSS] Wildfire 2.0 Message-ID: <42B7D437.5060506@bii.a-star.edu.sg> Dear EMBOSS users, On behalf of the Bioinformatics Institute, Singapore, I would like to announce that Wildfire 2.0 is now available for download from http://wildfire.bii.a-star.edu.sg . Wildfire is a GUI application for constructing workflows. 
It has been configured so that you can build workflows using EMBOSS applications immediately. The resulting workflows can run on a cluster or other multi-cpu machine, and exploit parallelism where possible. Wildfire is described in the BMC Bioinformatics article: "Wildfire: distributed, Grid-enabled workflow construction and execution", BMC Bioinformatics 2005, 6:69. http://www.biomedcentral.com/1471-2105/6/69/abstract We invite you all to download and try Wildfire and welcome feedback to wildfire at bii.a-star.edu.sg . Thank you. Francis. -- Francis TANG, Post-Doctoral Research Fellow Bioinformatics Institute, BMSI, A-STAR, Singapore. Tel: +65 64788282 Fax: +65 64789048 Email: francis at bii.a-star.edu.sg Add: Matrix L7, Biopolis WWW: http://www.bii.a-star.edu.sg/~francis/ From jieqiwang at gmail.com Tue Jun 21 10:55:46 2005 From: jieqiwang at gmail.com (Wang Jieqi) Date: Tue, 21 Jun 2005 22:55:46 +0800 Subject: [EMBOSS] Help with retrieving sequences Message-ID: <55162b5205062107555043348@mail.gmail.com> Hello, I started to learn EMBOSS recently. Now, I want to read the CDS of several mRNA sequences. The complete entries of these mRNAs (cDNA) have been retrieved from GenBank into a single file. Could you please tell me what to do next? Also, I find that seqret seems to read only the first molecule; could you please help me out? Thanks. Best regards, Jieqi -- Jieqi Wang Room 121, Department of Biology Tsinghua University Beijing, 100084 China, People's Republic Mobile: +86-13641302483 Dorm: +86-10-51534406 Lab: +86-10-62784794 Fax: +86-10-62794376 From aengus.stewart at cancer.org.uk Tue Jun 21 11:16:41 2005 From: aengus.stewart at cancer.org.uk (Aengus Stewart) Date: Tue, 21 Jun 2005 16:16:41 +0100 Subject: [EMBOSS] Data Lib sizes and indexing progs Message-ID: <42B82F59.5040200@cancer.org.uk> Hi folks, Just wondering how the new indexing methods were coming on.
It's just I had a look at the most recent EMBL release and it's (give or take the odd gig) 250Gb, which means that having the head room to hold a copy while installing a new copy requires >500Gb. Any info on how the new indexing will work, and will it still have to run off uncompressed .dat files or will it produce its own index format? Sorry about the questions, it's just I am rushing around the filesystem deleting anything that may appear to be "deleteable" to scrounge enough space :-) Regards Aengus -- ----------------------------------------------------------------------- Aengus Stewart Group Leader Bioinformatics at CGAL Tel: +44 (0)20 7269 3679 Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK ----------------------------------------------------------------------- From ableasby at hgmp.mrc.ac.uk Tue Jun 21 11:27:56 2005 From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby) Date: Tue, 21 Jun 2005 16:27:56 +0100 (BST) Subject: [EMBOSS] Data Lib sizes and indexing progs Message-ID: <200506211527.j5LFRuRR024742@bromine.hgmp.mrc.ac.uk> The new indexing programs are done (in CVS). The programs are: dbxflat, dbxfasta and dbxgcg and they operate like their 'dbi' counterparts. The dbx and dbi programs will be available in the next release. So, for EMBL, you would typically index the *.dat files. As before, you can create id,acc,sv,key,org & des indexes (though many sites just index id and acc).
An indexing job on the whole of the recently released EMBL will produce id, acc and key indexes of the following sizes. They should give you some idea of the extra disc space you'll need. -rw-r--r-- 1 root root 19950 Jun 19 14:11 embli.ent -rw-r--r-- 1 root root 122 Jun 20 13:41 embli.pxac -rw-r--r-- 1 root root 122 Jun 20 13:41 embli.pxid -rw-r--r-- 1 root root 126 Jun 20 13:41 embli.pxkw -rw-r--r-- 1 root root 8755992576 Jun 20 13:41 embli.xac -rw-r--r-- 1 root root 7482558464 Jun 20 13:41 embli.xid -rw-r--r-- 1 root root 4046751744 Jun 20 13:41 embli.xkw HTH Alan From kellert at ohsu.edu Thu Jun 23 00:06:18 2005 From: kellert at ohsu.edu (Thomas J Keller) Date: Wed, 22 Jun 2005 21:06:18 -0700 Subject: [EMBOSS] source of common vectors in cirdna format Message-ID: <03b2ab1a566cf2386b89bb570e26e6eb@ohsu.edu> Greetings, Is there a source for common vectors in cirdna format available for downloading? Thanks in advance, Tom Keller Tom Keller, Ph.D. http://www.ohsu.edu/research/core kellert at ohsu.edu 503-494-2442 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: text/enriched Size: 259 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/emboss/attachments/20050622/d4ae1249/attachment.bin From clemens.broger at roche.com Thu Jun 23 09:48:24 2005 From: clemens.broger at roche.com (Broger, Clemens) Date: Thu, 23 Jun 2005 15:48:24 +0200 Subject: [EMBOSS] Needle/water, revcomp Message-ID: <7E08B1C16672A147B29D3DE3827536E37E00CC@rbamsem3.emea.roche.com> I have 2 questions: The first is about identity/similarity in nucleotide alignments made with needle (probably the same holds true for water): ######################################## # Program: needle # Rundate: Thu Jun 23 13:29:58 2005 # Align_format: srspair # Report_file: seq0.needle ######################################## #======================================= # # Aligned_sequences: 2 # 1: SEQ0 # 2: SEQ1 # Matrix: EDNAFULL # Gap_penalty: 100.0 # Extend_penalty: 10.0 # # Length: 70 # Length of sequence 1: 70 # Length of sequence 2: 70 # Identity: 46/70 (65.7%) # Similarity: 47/70 (67.1%) # Gaps: 0/70 ( 0.0%) # Score: 162.0 # # #======================================= . . . . . SEQ0 1 aaaaaaaaaaaaaaaaaaaaaaaaacccccgggggtttttuuuuunnnnn 50 |||||||||||||||||||||......|......||:....:|.. SEQ1 1 aaaaaaaaaaaaaaaaaaaaacgtunacgtunacgtunacgtunacgtun 50 . . . . . . . SEQ0 51 aaaaaaaaaaaaaaaaaaaa 70 |||||||||||||||||||| SEQ1 51 aaaaaaaaaaaaaaaaaaaa 70 . . Each base of the set acgtun is aligned against each other. The 20 a's at the beginning and end are only to force an ungapped alignment. Maximum gap penalties were used. I agree with the symbols in the alignment |,: and ., but the 46 identities in the summary imply that the n-n match is also counted. The t-u matches are counted as similar, which is ok, but the n-n match is not counted as similar, although it is counted as identical. I think the n-n match should not be counted both in identity and similarity. Now for ambiguous bases. 
w is a or t ######################################## # Program: needle # Rundate: Thu Jun 23 14:53:33 2005 # Align_format: srspair # Report_file: seq0.needle ######################################## #======================================= # # Aligned_sequences: 2 # 1: SEQ0 # 2: SEQ1 # Matrix: EDNAFULL # Gap_penalty: 100.0 # Extend_penalty: 10.0 # # Length: 26 # Length of sequence 1: 26 # Length of sequence 2: 26 # Identity: 21/26 (80.8%) # Similarity: 23/26 (88.5%) # Gaps: 0/26 ( 0.0%) # Score: 94.0 # # #======================================= . . SEQ0 1 aaaaaaaaaawwwwwwaaaaaaaaaa 26 ||||||||||.. .|||||||||| SEQ1 1 aaaaaaaaaaatwgcuaaaaaaaaaa 26 . . In the alignment I would put a dot at the w-w match (but I could also agree with the way it is handled now). But again the w is counted in the summary as an identity but not as a similarity. The second question is about the handling in EMBOSS of reverse-complemented nucleotide segments such as db:seq[10:20:r] The sequence is first reverse-complemented and then residues 10 to 20 are cut out. Biologists usually expect that residues 10 to 20 are first cut out and then reverse-complemented. Can this be changed? That would be very helpful. Best regards Clemens Dr. Clemens Broger Bioinformatics F. Hoffmann-La Roche Ltd. PRBI 65/303 CH-4070 Basel clemens.broger at roche.com +41-61-688-4447 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/emboss/attachments/20050623/270ec53f/attachment.html From pmr at ebi.ac.uk Thu Jun 23 10:38:25 2005 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Thu, 23 Jun 2005 15:38:25 +0100 (BST) Subject: [EMBOSS] source of common vectors in cirdna format In-Reply-To: <03b2ab1a566cf2386b89bb570e26e6eb@ohsu.edu> References: <03b2ab1a566cf2386b89bb570e26e6eb@ohsu.edu> Message-ID: <2840.12.27.2.2.1119537505.squirrel@webmail.ebi.ac.uk> Tom Keller writes: > Is there a source for common vectors in cirdna format available for > downloading? 
Or is there a source of common vectors that we could convert to cirdna format? regards, Peter Rice From pmr at ebi.ac.uk Thu Jun 23 10:41:34 2005 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Thu, 23 Jun 2005 15:41:34 +0100 (BST) Subject: [EMBOSS] Needle/water, revcomp In-Reply-To: <7E08B1C16672A147B29D3DE3827536E37E00CC@rbamsem3.emea.roche.com> References: <7E08B1C16672A147B29D3DE3827536E37E00CC@rbamsem3.emea.roche.com> Message-ID: <2849.12.27.2.2.1119537694.squirrel@webmail.ebi.ac.uk> Clemens Broger writes: > I have 2 questions: > > The first is about identity/similarity in nucleotide alignments made > with needle (probably the same holds true for water): Tricky. This requires the matrix to define some codes as ambiguity codes so we know w-w is not an identity. I would guess we can extend the matrix formats we use to include this information, or perhaps for nucleotide sequences we can "know" the answer. I will investigate. > The second question is about the handling in EMBOSS of > reverse-complemented nucleotide segments such as > > db:seq[10:20:r] > > The sequence is first reverse-complemented and then residues 10 to 20 > are cut out. > Biologists usually expect that residues 10 to 20 are first cut out and > then reverse-complemented. > > Can this be changed? That would be very helpful. Oops. Yes - will do.
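
The difference between the two orders of operation discussed above can be shown with a small Python sketch. It is purely illustrative and is not EMBOSS code: the function names are made up for the example, positions are 1-based and inclusive as in the `[10:20:r]` syntax, and ambiguity codes are simply left uncomplemented.

```python
# Complement table for plain bases only; ambiguity codes pass through
# unchanged in this simplified sketch.
COMPLEMENT = str.maketrans("acgtuACGTU", "tgcaaTGCAA")

def revcomp(seq):
    """Reverse-complement a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]

def cut(seq, start, end):
    """Extract residues start..end (1-based, inclusive)."""
    return seq[start - 1:end]

def revcomp_then_cut(seq, start, end):
    """The behaviour reported in the message: reverse-complement first."""
    return cut(revcomp(seq), start, end)

def cut_then_revcomp(seq, start, end):
    """The behaviour biologists usually expect: cut first."""
    return revcomp(cut(seq, start, end))

if __name__ == "__main__":
    seq = "aaaaaccccc"
    print(revcomp_then_cut(seq, 2, 4))  # residues 2-4 of "gggggttttt"
    print(cut_then_revcomp(seq, 2, 4))  # reverse complement of "aaa"
```

Unless the requested range happens to be symmetric about the middle of the sequence, the two orders pick different residues of the original entry, which is why the choice matters to users.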
regards, Peter Rice From msarachu at biol.unlp.edu.ar Mon Jun 27 08:28:58 2005 From: msarachu at biol.unlp.edu.ar (Martin Sarachu) Date: Mon, 27 Jun 2005 09:28:58 -0300 Subject: [EMBOSS] Re: wemboss: warning and errors In-Reply-To: <0C9336E1DA90DB479BEBAF2C7C5699E1016EA96D@USINDMDOWM001.dow.com> References: <0C9336E1DA90DB479BEBAF2C7C5699E1016EA96D@USINDMDOWM001.dow.com> Message-ID: <42BFF10A.4090405@biol.unlp.edu.ar> Dear Rani, about the error with ACD, when running distmat from command line (-options to be prompted for all options) I get this error with ACD > # distmat -options > Creates a distance matrix from multiple alignments > Input sequence set: uniprot:papa_* > Multiple substitution correction methods for proteins > 0 : Uncorrected > 1 : Jukes-Cantor > 2 : Kimura Protein > Method to use [0]: 1 > Warning: ACD expression invalid @(!$acdprotein) > > Warning: ACD expression invalid @(!$acdprotein) > > Error: File /usr/local/emboss/share/EMBOSS/acd/distmat.acd line 60: (ambiguous) Bad additional flag N | Y) > but without -options (i.e. default options chosen) runs ok > # distmat > Creates a distance matrix from multiple alignments > Input sequence set: uniprot:papa_* > Multiple substitution correction methods for proteins > 0 : Uncorrected > 1 : Jukes-Cantor > 2 : Kimura Protein > Method to use [0]: 1 > Output file [papa_.distmat]: > Warning: Sequence lengths are not equal! > Warning: Sequence lengths are not equal! > Warning: Sequence lengths are not equal! there is a missing left parenthesis in distmat.acd in line 61, please change this > additional: "@(@(@(!$acdprotein)) & @($(nucmethod)==1)) | to this > additional: "@(@(@(!$(acdprotein)) & @($(nucmethod)==1)) | Regards, Martin PS: working on the exclude problem... Mamidipalli, SudhaRani (S) wrote: > Hello Martin, > > While testing the programs in wEMBOSS,we have encountered couple of problems. > > 1.The 'distmat' program gave some warning. Here is the warning of that program. 
> ------------------------------- > Warning! > "ambiguous" parameter: syntax error (missing left parenthesis) in ACD expression (tell to EMBOSS Manager : this could produce wrong results from program execution!) > ------------------------------- > I went and checked distmat.acd file but couldn't find any error. > > 2. I added some programs, that we don't want to be displayed in wemboss, in the exclude file: /genomics/sw/wEMBOSS-1.4.0/wEMBOSS/data/exclude. And then I re-installed wrappers4EMBOSS and wEMBOSS. Surprisingly, only few programs(for example tranalign,embossversion etc.) got deleted from wemboss whereas few programs (for example textsearch, entret etc.) show up with error > -------- > EMBOSS: error... > chaos has been excluded > ---------- > > Please clarify. > > Thanks and Regards, > Rani. > -- Martin Sarachu msarachu at biol.unlp.edu.ar AR.EMBnet http://www.ar.embnet.org From pmr at ebi.ac.uk Mon Jun 27 10:25:35 2005 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Mon, 27 Jun 2005 15:25:35 +0100 (BST) Subject: [EMBOSS] Re: wemboss: warning and errors In-Reply-To: <42BFF10A.4090405@biol.unlp.edu.ar> References: <0C9336E1DA90DB479BEBAF2C7C5699E1016EA96D@USINDMDOWM001.dow.com> <42BFF10A.4090405@biol.unlp.edu.ar> Message-ID: <1613.12.27.2.2.1119882335.squirrel@webmail.ebi.ac.uk> Martin Srachu writes: > there is a missing left parenthesis in distmat.acd in line 61, please > change this > >> additional: "@(@(@(!$acdprotein)) & @($(nucmethod)==1)) | > > to this > >> additional: "@(@(@(!$(acdprotein)) & @($(nucmethod)==1)) | Already fixed in EMBOSS 2.10.0. But this does highlight a gap in the ACD validation - this expression is only evaluated when needed (when -option is used). I will try adding checks for all strings to generate warnings for unbalanced () and $ or @ without ( to acdvalid before the July 15th release. >> -------- >> EMBOSS: error... 
>> chaos has been excluded >> ---------- I know this is really a wEMBOSS problem, but the message appeals to my sense of humour!!! Can you send me an explanation of it when you have a solution - it may appear in future EMBOSS talks :-) regards, Peter From gbottu at ben.vub.ac.be Wed Jun 29 04:30:02 2005 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Wed, 29 Jun 2005 10:30:02 +0200 Subject: [EMBOSS] bug related to -plasmid parameter Message-ID: <20050629083002.GA4560@bigben.ulb.ac.be> from: Belgian EMBnet Node Dear colleagues, At the BEN site we have on our main computer EMBOSS 2.10.0 under Alpha OSF 5.1A. I just noticed that the programs remap, restrict and restover give a segmentation fault when run with parameter -plasmid. This does however not occur with an EMBOSS installation we have on a Linux. So, this behaviour must be dependant on the OS and maybe on the hardware. Did someone else notice it ? Regards, Guy Bottu From ableasby at hgmp.mrc.ac.uk Wed Jun 29 08:13:15 2005 From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby) Date: Wed, 29 Jun 2005 13:13:15 +0100 (BST) Subject: [EMBOSS] bug related to -plasmid parameter Message-ID: <200506291213.j5TCDFMb014301@bromine.hgmp.mrc.ac.uk> Dear Guy, Thanks for spotting that. It's now fixed in CVS and will be part of the 3.0.0 release. ATB Alan Bleasby RFCGR/HGMP (for one more month) From gbottu at ben.vub.ac.be Thu Jun 2 10:09:54 2005 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Thu, 2 Jun 2005 12:09:54 +0200 Subject: [EMBOSS] use water/matcher to find where RNA bybridizes Message-ID: <20050602100954.GA14063@bigben.ulb.ac.be> from : Belgian EMBnet Node Dear colleagues, One of our users had a problem : how to find the location where a small molecule of RNA binds to a mRNA and so interferes with its functioning. Nothing in EMBOSS and nothing found on the WWW. 
We finally did the following : use revseq -nocomp to reverse the mRNA and then align the two sequences using as matrix : ------------------------------- A T G C S W R Y K M B V H D N U A 0 5 0 0 0 5 5 0 0 5 0 5 5 5 5 0 T 5 0 5 0 0 5 0 5 5 0 5 0 5 5 5 5 G 0 5 0 5 5 0 5 0 5 0 5 5 0 5 5 3 C 0 0 5 0 5 0 0 5 0 5 5 5 5 0 5 0 S 0 0 5 5 5 0 5 5 5 5 5 5 5 5 5 0 W 5 5 0 0 0 5 5 5 5 5 5 5 5 5 5 5 R 5 0 5 0 5 5 5 0 5 5 5 5 5 5 5 0 Y 0 5 0 5 5 5 0 5 5 5 5 5 5 5 5 5 K 0 5 5 0 5 5 5 5 5 0 5 5 5 5 5 5 M 5 0 0 5 5 5 5 5 0 5 5 5 5 5 5 0 B 0 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 V 5 0 5 5 5 5 5 5 5 5 5 5 5 5 5 0 H 5 5 0 5 5 5 5 5 5 5 5 5 5 5 5 5 D 5 5 5 0 5 5 5 5 5 5 5 5 5 5 5 5 N 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 U 0 5 3 0 0 5 0 5 5 0 5 0 5 5 5 5 ------------------------------- This gave a reasonable result. water made the following alignment : ------------------------------ #======================================= # # Aligned_sequences: 2 # 1: mRNA # 2: RNAi # Matrix: HYB # Gap_penalty: 10.0 # Extend_penalty: 0.5 # # Length: 49 # Identity: 3/49 ( 6.1%) # Similarity: 0/49 ( 0.0%) # Gaps: 0/49 ( 0.0%) # Score: 185.0 # # #======================================= mRNA 2892 AATGTTGTGTGAGGATAATAGTAATAGTAATAGTAATAATAATAATAAT 2940 .. . . .... ... .. .. .. .. .. ................ RNAi 1 TTTGACCCTGCTACTACTACTACTACTACTACGATTATTATTATTATTA 49 -------------------------------- The only thing which bothers me is that the base pairs (which do have a positive comparison score) are not labeled as "similar", they get a '.' instead of a ':'. Does someone know why this is ? 
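
The idea of reversing the mRNA without complementing it and then scoring base pairs instead of identities can be sketched in Python as an ungapped scan. This is a simplification, not what water does (water performs a gapped local alignment), and the pair scores below are illustrative assumptions rather than the matrix shown in the post; the function names are likewise made up for the example.

```python
# Illustrative base-pair scores (assumed for this sketch): roughly the
# number of hydrogen bonds, with a small score for G-U/G-T wobble pairs.
PAIR_SCORE = {
    frozenset("GC"): 3,
    frozenset("AT"): 2,
    frozenset("AU"): 2,
    frozenset("GT"): 1,
    frozenset("GU"): 1,
}

def pair_score(x, y):
    """Score two bases by whether they can pair, not by identity."""
    return PAIR_SCORE.get(frozenset((x.upper(), y.upper())), 0)

def best_ungapped_site(mrna, small_rna):
    """Slide small_rna along the reversed (not complemented) mRNA.

    Returns (best_score, start), where start is the 1-based position of
    the best window in the reversed mRNA, or (0, None) if nothing pairs.
    """
    reversed_mrna = mrna[::-1]
    best = (0, None)
    for offset in range(len(reversed_mrna) - len(small_rna) + 1):
        window = reversed_mrna[offset:offset + len(small_rna)]
        score = sum(pair_score(a, b) for a, b in zip(window, small_rna))
        if score > best[0]:
            best = (score, offset + 1)
    return best
```

Because the mRNA is only reversed, a position in the reversed sequence has to be mapped back to the original numbering by hand when interpreting the result.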
Guy Bottu From gbottu at ben.vub.ac.be Thu Jun 2 15:08:45 2005 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Thu, 2 Jun 2005 17:08:45 +0200 Subject: [EMBOSS] use water/matcher to find where RNA bybridizes In-Reply-To: References: Message-ID: <20050602150845.GA17226@bigben.ulb.ac.be> On Thu, Jun 02, 2005 at 07:52:45AM -0700, David Mathog wrote: > > One of our users had a problem : how to find the location where a small > > molecule of RNA binds to a mRNA and so interferes with its functioning. > > This can also be addressed with Mfold. Let A be the large mRNA of > length N and B the small one of length M. Create a hybrid RNA sequence > AB of length N+M. Set the rules in mfold so that > > bases 1->N will not bind with bases 1->N > bases N+1->N+M will not bind with bases N+1->N+M Clever idea ! As a matter of fact, I had thought of doing that, with the extra of putting between both a linker of 200 T's which are not allowed to pair at all. Unfortunately the program mfold crashed with message : Fill run failed Maybe there is something unusual in the sequence. Regards, Guy Bottu, BEN From mathog at mendel.bio.caltech.edu Thu Jun 2 14:52:45 2005 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Thu, 02 Jun 2005 07:52:45 -0700 Subject: [EMBOSS] use water/matcher to find where RNA bybridizes Message-ID: > > One of our users had a problem : how to find the location where a small > molecule of RNA binds to a mRNA and so interferes with its functioning. This can also be addressed with Mfold. Let A be the large mRNA of length N and B the small one of length M. Create a hybrid RNA sequence AB of length N+M. Set the rules in mfold so that bases 1->N will not bind with bases 1->N bases N+1->N+M will not bind with bases N+1->N+M Run Mfold. Look through the results. If this runs properly you should see B bound somewhere in A with an energy level you may then use to compare binding affinities.
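
Building the hybrid sequence and working out which base ranges need the "do not pair with each other" rules can be sketched in Python as below. The helper name and its return layout are assumptions made for illustration; how the constraints are actually passed to mfold is deliberately left out, since that depends on the mfold version in use.

```python
def hybrid_construct(mrna, small_rna, linker_len=0, linker_base="T"):
    """Concatenate mRNA (A) and small RNA (B), optionally with a linker.

    Returns (hybrid, a_range, b_range), where the ranges are 1-based,
    inclusive coordinates of A and B within the hybrid sequence.
    With linker_len=0 this gives ranges 1..N and N+1..N+M.
    """
    linker = linker_base * linker_len
    hybrid = mrna + linker + small_rna

    n, m = len(mrna), len(small_rna)
    a_range = (1, n)
    b_start = n + linker_len + 1
    b_range = (b_start, b_start + m - 1)
    return hybrid, a_range, b_range

if __name__ == "__main__":
    # Toy sequences; a 200-base linker would use linker_len=200.
    seq, a, b = hybrid_construct("ACGU", "GG", linker_len=3)
    print(seq, a, b)
```

The two ranges are the ones within which pairing would be forbidden, so that only pairs between A and B remain possible; any linker bases would additionally have to be kept unpaired.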
Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From fernan at iib.unsam.edu.ar Thu Jun 2 17:08:31 2005 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Thu, 2 Jun 2005 14:08:31 -0300 Subject: [EMBOSS] use water/matcher to find where RNA bybridizes In-Reply-To: <20050602100954.GA14063@bigben.ulb.ac.be> References: <20050602100954.GA14063@bigben.ulb.ac.be> Message-ID: <20050602170831.GW44956@iib.unsam.edu.ar> +----[ Guy Bottu (02.Jun.2005 07:13): | | mRNA 2892 AATGTTGTGTGAGGATAATAGTAATAGTAATAGTAATAATAATAATAAT 2940 | .. . . .... ... .. .. .. .. .. ................ | RNAi 1 TTTGACCCTGCTACTACTACTACTACTACTACGATTATTATTATTATTA 49 | -------------------------------- | The only thing which bothers me is that the base pairs (which do have a | positive comparison score) are not labeled as "similar", they get a '.' | instead of a ':'. Does someone know why this is ? | +----] Guy, just a guess, but '.' and ':' are used in protein-protein comparisons to denote identity and similarity which are both different and meaningful. In dna-dna comparisons, you only care for identity, whether you consider it to be aligning A with A or A with its complement. So I would only expect only one of '.' or ':' used ... don't remember which is used for identity in emboss. My 2 cents guess, Fernan From pmr at ebi.ac.uk Thu Jun 2 17:23:13 2005 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Thu, 2 Jun 2005 18:23:13 +0100 (BST) Subject: [EMBOSS] use water/matcher to find where RNA bybridizes In-Reply-To: <20050602100954.GA14063@bigben.ulb.ac.be> References: <20050602100954.GA14063@bigben.ulb.ac.be> Message-ID: <3729.198.161.30.152.1117732993.squirrel@webmail.ebi.ac.uk> Guy Bottu writes: > One of our users had a problem : how to find the location where a small > molecule of RNA binds to a mRNA and so interferes with its functioning. > Nothing in EMBOSS and nothing found on the WWW. 
We finally did the > following : use revseq -nocomp to reverse the mRNA and then align the two > sequences using as matrix : > ------------------------------- > A T G C S W R Y K M B V H D N U > A 0 5 0 0 0 5 5 0 0 5 0 5 5 5 5 0 > T 5 0 5 0 0 5 0 5 5 0 5 0 5 5 5 5 .......... > ------------------------------- > This gave a reasonable result. water made the following alignment : > ------------------------------ ..... > mRNA 2892 AATGTTGTGTGAGGATAATAGTAATAGTAATAGTAATAATAATAATAAT > 2940 > .. . . .... ... .. .. .. .. .. ................ > RNAi 1 TTTGACCCTGCTACTACTACTACTACTACTACGATTATTATTATTATTA 49 > -------------------------------- > The only thing which bothers me is that the base pairs (which do have a > positive comparison score) are not labeled as "similar", they get a '.' > instead of a ':'. Does someone know why this is ? I believe this is simply because the bases are not identical. A user matrix can have arbitrary values, so the results are marked as similar (A=T scores 5) but identities are only scored at zero and so never appear with ":". You could try setting the scores to match the hydrogen bonds for this experiment (G=C 3 A=T 2 G=T 1) RNA folding is a missing area in EMBOSS. The Vienna package has been suggested as a possible EMBASSY package. Does anyone have any experience with it, or suggestions for alternative RNA packages we could use? regards, Peter From David.Bauer at SCHERING.DE Fri Jun 3 06:37:11 2005 From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE) Date: Fri, 3 Jun 2005 08:37:11 +0200 Subject: Antwort: Re: [EMBOSS] use water/matcher to find where RNA bybridizes Message-ID: Hi, I use the Vienna RNA package. It allows to look for global structure of the complete RNA (RNAfold) or local structures (RNALfold). The global folding accepts also longer sequences (as far as I remember this was a problem with Mfold). Visualization is a bit tricky. 
But there are helper scripts to convert the output to .ct files (b2ct) which can be used to create different graphical representations. Regards, David. RNA folding is a missing area in EMBOSS. The Vienna package has been suggested as a possible EMBASSY package. Does anyone have any experience with it, or suggestions for alternative RNA packages we could use? regards, Peter From gbottu at ben.vub.ac.be Fri Jun 3 08:17:41 2005 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Fri, 3 Jun 2005 10:17:41 +0200 Subject: [EMBOSS] use water/matcher to find where RNA bybridizes In-Reply-To: <20050602170831.GW44956@iib.unsam.edu.ar> References: <20050602100954.GA14063@bigben.ulb.ac.be> <20050602170831.GW44956@iib.unsam.edu.ar> Message-ID: <20050603081741.GA23810@bigben.ulb.ac.be> Dear all, Thanks for your replies. It is however still not clear to me where the '.' come from. I thought the EMBOSS "pair" output would put a '|' for identities and a ':' for similarities (score positive). Maybe the program is fooled and seriously perturbed by a matrix that assigns a negative score to identical base pairs. As for the proposal to distribute ViennaRNA as an Embassadir, why not ? At the BEN site we have mfold integrated under EMBOSS, but I am afraid distributing mfold as Embassadir will turn out to be impossible because of licencing issues. Note that mfold does not entirely solve the problem: since it operates on a single sequence, it does not search for a structure composed of two strands. I guess this is also true for ViennaRNA. We (me and our user) had tried to use mfold (with as input a sequence composed of the mRNA, a poly-T linker and the small RNA), but the program crashed with error message "Cannot get Fill". Maybe the sequence had something unusual.
Regards, Guy Bottu, BEN From atorrano at lsi.upc.edu Fri Jun 3 09:15:10 2005 From: atorrano at lsi.upc.edu (Alexis Torrano Martinez) Date: Fri, 3 Jun 2005 11:15:10 +0200 (MET DST) Subject: [EMBOSS] external and app Message-ID: <7479297835atorrano@lsi.upc.es> Hello I am trying to execute hmmsearch from EMBOSS. This way I want to have a kind of wrap over the DDBB and retrieval apps. DB Pfam [ method: "app" comment: "Pfam with HMMER indexing" app: "$HMMERBIN/hmmsearch $EMBOSS_DATA/pfam/Pfam_ls %s" ] That is my DB specification for EMBOSS. How should I run seqret to execute properly hmmsearch? seqret Pfam:$HOME/soft/hmmer/last/tutorial/7LES_DROME And the next error was unexpected : Error: Unable to read sequence 'Pfam:/usr/usuaris/it/inb/soft/hmmer/last/tutorial/7LES_DROME' As tutorial says, if you specify external, %s receives as value the second field of the query (ID from seqret DB:ID). There is a way to call hmmsearch from EMBOSS? A lot of thanks. Regards. Alexis Torrano. -- ----------------------------------------------------- Alexis Torrano Martinez Instituto Nacional de Bioinformatica (INB) Nodo Computacional GNHC-2 UPC-CIRI c/. Jordi Girona 1-3 Modul C6-E201 Tel. : 934 011 650 E-08034 Barcelona Fax : 934 017 014 Catalunya (Spain) e-mail : atorrano at lsi.upc.edu ----------------------------------------------------- From gbottu at ben.vub.ac.be Fri Jun 3 10:03:18 2005 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Fri, 3 Jun 2005 12:03:18 +0200 Subject: [EMBOSS] external and app In-Reply-To: <7479297835atorrano@lsi.upc.es> References: <7479297835atorrano@lsi.upc.es> Message-ID: <20050603100318.GA24538@bigben.ulb.ac.be> On Fri, Jun 03, 2005 at 11:15:10AM +0200, Alexis Torrano Martinez wrote: > I am trying to execute hmmsearch from EMBOSS. This way I want to have > a kind of wrap over the DDBB and retrieval apps. 
> > > DB Pfam [ > method: "app" > comment: "Pfam with HMMER indexing" > app: "$HMMERBIN/hmmsearch $EMBOSS_DATA/pfam/Pfam_ls %s" > ] Dear Alexis, Your problem is as good as certain that the program defined as "app" should return a sequence to standard output, so that EMBOSS can take it. And this is not what hmmsearch does. Furthermore, hmmsearch searches a HMM against a databank of sequences ; you seem to want to search a sequence against a databank of HMM's (Pfam_ls), for which you need hmmpfam. It is maybe a good idea to install the Embassadir HMMER. Note however that ehmmpfam needs the user to specify where the databank is. At the BEN site I have a little bit "hacked" the program so that it uses Pfam_ls by default (and still lets the user choose an alternative). If you are interested I can send you a mail with "how to". Guy Bottu, Belgian EMBnet Node From pmr at ebi.ac.uk Fri Jun 3 10:08:11 2005 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Fri, 3 Jun 2005 11:08:11 +0100 (BST) Subject: [EMBOSS] use water/matcher to find where RNA bybridizes In-Reply-To: <20050603081741.GA23810@bigben.ulb.ac.be> References: <20050602100954.GA14063@bigben.ulb.ac.be> <20050602170831.GW44956@iib.unsam.edu.ar> <20050603081741.GA23810@bigben.ulb.ac.be> Message-ID: <1543.198.161.30.152.1117793291.squirrel@webmail.ebi.ac.uk> Dear Guy, > Thanks for your replies. It is however still not clear to me where the '.' > come from. I thought the EMBOSS "pair" output would put a '|' for > identities and a ':' for similarities (score positive). Maybe the program > is fooled and seriously perturbed by a matrix that assigns a negative > score to identical base pairs. I believe it is perturbed by the zero score for identical base pairs. This makes it unable to find a consensus character for the alignment, and so the "no consensus found" '.' character appears in the output. Making the output format understand your non-identical matching is an interesting challenge. I will look into it a little more. 
regards, Peter From Marc.Logghe at devgen.com Fri Jun 3 10:23:05 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Fri, 3 Jun 2005 12:23:05 +0200 Subject: [EMBOSS] external and app Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E802@ANTARESIA.be.devgen.com> Hi, Just wondering, what happens if you use entret instead of seqret? With entret, EMBOSS is supposed to just return the 'sequence' (in this case the Pfam result), unaltered and unparsed. When you use seqret, EMBOSS will parse the output and try to make a sequence out of it. HTH, Marc > -----Original Message----- > From: owner-emboss at hgmp.mrc.ac.uk > [mailto:owner-emboss at hgmp.mrc.ac.uk] On Behalf Of Guy Bottu > Sent: Friday, June 03, 2005 12:03 PM > To: Alexis Torrano Martinez; emboss at embnet.org > Subject: Re: [EMBOSS] external and app > > On Fri, Jun 03, 2005 at 11:15:10AM +0200, Alexis Torrano > Martinez wrote: > > I am trying to execute hmmsearch from EMBOSS. This way I > want to have > > a kind of wrap over the DDBB and retrieval apps. > > > > > > DB Pfam [ > > method: "app" > > comment: "Pfam with HMMER indexing" > > app: "$HMMERBIN/hmmsearch $EMBOSS_DATA/pfam/Pfam_ls %s" > > ] > > Dear Alexis, > > Your problem is as good as certain that the program defined as "app" > should return a sequence to standard output, so that EMBOSS > can take it. > And this is not what hmmsearch does. Furthermore, hmmsearch > searches a HMM against a databank of sequences ; you seem to > want to search a sequence against a databank of HMM's > (Pfam_ls), for which you need hmmpfam. It is maybe a good > idea to install the Embassadir HMMER. Note however that > ehmmpfam needs the user to specify where the databank is. At > the BEN site I have a little bit "hacked" the program so that > it uses Pfam_ls by default (and still lets the user choose an > alternative). If you are interested I can send you a mail > with "how to".
> > Guy Bottu, > Belgian EMBnet Node > > From pmr at ebi.ac.uk Fri Jun 3 10:49:24 2005 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Fri, 3 Jun 2005 11:49:24 +0100 (BST) Subject: [EMBOSS] external and app In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA606E802@ANTARESIA.be.devgen.com> References: <0C528E3670D8CE4B8E013F6749231AA606E802@ANTARESIA.be.devgen.com> Message-ID: <1830.198.161.30.152.1117795764.squirrel@webmail.ebi.ac.uk> Hi Marc, > Just wondering, what happens if you use entret in stead of seqret. > EMBOSS is supposed to just return the 'sequence' (in this case pfam > result), unaltered, unparsed. When you use seqret, EMBOSS will parse the > output and try to make a sequence out of it. Entret has to read the input as a sequence, and then returns the full text. So entret will fail where seqret fails. regards, Peter From jtk at cmp.uea.ac.uk Fri Jun 3 12:41:24 2005 From: jtk at cmp.uea.ac.uk (Jan T. Kim) Date: Fri, 3 Jun 2005 13:41:24 +0100 Subject: [EMBOSS] Reading Two Sequences from stdin with water Message-ID: <20050603124124.GI21551@jtkpc.cmp.uea.ac.uk> Dear EMBOSSers, is it possible to read both input sequences to a pairwise alignment from one input stream? With the test input file attached, the command water -asequence fasta::x.fasta:seq1 -bsequence fasta::x.fasta:seq2 -outfile stdout -auto runs as I expect, but the command cat x.fasta | water -asequence fasta::stdin:seq1 -bsequence fasta::stdin:seq2 -outfile stdout -auto gives EMBOSS An error in ajfile.c at line 1926: Error reading from file 'stdin' It may well be that water consumes the entire input stream on getting the first sequence, thus rendering itself unable to acquire the second one. Is there a solution to this? I would really like to avoid the mess of temporary files and run water in a clean pipe (pun intended ;-) ) Best regards & thanks in advance, Jan -- +- Jan T. 
Kim -------------------------------------------------------+ | *NEW* email: jtk at cmp.uea.ac.uk | | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk | *-----=< hierarchical systems are for files, not for humans >=-----* -------------- next part -------------- > seq1 accaacc > seq2 acgagcc From simon.andrews at bbsrc.ac.uk Fri Jun 3 12:16:58 2005 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Fri, 3 Jun 2005 13:16:58 +0100 Subject: [EMBOSS] Reading Two Sequences from stdin with water In-Reply-To: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> Message-ID: <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk> On 3 Jun 2005, at 13:53, Jan T. Kim wrote: > Dear EMBOSSers, > > is it possible to read both input sequences to a pairwise alignment > from one input stream? I spent a while trying to figure this out a few months back. In the end the best solution I came up with was to use the asis: sequence type. This allows you to do: water -auto asis:aaaa asis:ataa stdout which avoids the need for messing with the file system. I seem to remember I found a way to set names for the sequences as well, but can't find that right now. As long as you make sure you don't pass your command through a shell when you launch this from a script then it actually scales pretty well to quite large sequences. Hope this helps Simon. -- Simon Andrews PhD Bioinformatics Dept. The Babraham Institute simon.andrews at bbsrc.ac.uk +44 (0) 1223 496463 From pmr at ebi.ac.uk Fri Jun 3 14:09:03 2005 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Fri, 3 Jun 2005 15:09:03 +0100 (BST) Subject: [EMBOSS] Reading Two Sequences from stdin with water In-Reply-To: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> Message-ID: <1977.198.161.30.152.1117807743.squirrel@webmail.ebi.ac.uk> Jan T. Kim writes: > is it possible to read both input sequences to a pairwise alignment > from one input stream? 
> > cat x.fasta | water -asequence fasta::stdin:seq1 -bsequence > fasta::stdin:seq2 -outfile stdout -auto > > gives > > EMBOSS An error in ajfile.c at line 1926: > Error reading from file 'stdin' > > It may well be that water consumes the entire input stream on getting the > first sequence, thus rendering itself unable to acquire the second one. > > Is there a solution to this? I would really like to avoid the mess of > temporary files and run water in a clean pipe (pun intended ;-) ) EMBOSS will only cleanly read stdin as one input. We should probably trap that internally and give an error if we find stdin opening again. I wonder whether there is any useful way to share the stdin filebuffer. Hmmmm... in the early days of EMBOSS we decided not to allow it, but it could be worth a try. You would still be in trouble if you tried to read the second sequence first though. Assuming your x.fasta file has only seq1 and seq2 in that order, reading seq1 will continue until the first line of seq2 is reached. By then it would be too late for seq2 to be read cleanly. At least you have fasta:: specified - with no specified format, EMBOSS has to read a long way into the input just to check whether it is really GCG format. As for the asis format, I suppose an EMBOSS utility that reads x.fasta and outputs asis::ctagtacgatgcgatcg asis::tgatcgatggctacgtagc would be useful to you - then you could put `sillyname x.fasta` in your command line... at least until the command line gets too long. Hard to preserve the ID and description of the sequences though. "If you think water is pure, just remember what fish do in it." Hope that helps, Peter From jtk at cmp.uea.ac.uk Fri Jun 3 15:40:31 2005 From: jtk at cmp.uea.ac.uk (Jan T. 
Kim) Date: Fri, 3 Jun 2005 16:40:31 +0100 Subject: [EMBOSS] Reading Two Sequences from stdin with water In-Reply-To: <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk> References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk> Message-ID: <20050603154031.GE25735@jtkpc.cmp.uea.ac.uk> On Fri, Jun 03, 2005 at 01:16:58PM +0100, simon andrews wrote: > > On 3 Jun 2005, at 13:53, Jan T. Kim wrote: > > >Dear EMBOSSers, > > > >is it possible to read both input sequences to a pairwise alignment > >from one input stream? > > I spent a while trying to figure this out a few months back. In the > end the best solution I came up with was to use the asis: sequence > type. This allows you to do: > > water -auto asis:aaaa asis:ataa stdout > > which avoids the need for messing with the file system. I seem to > remember I found a way to set names for the sequences as well, but > can't find that right now. That's a good idea which I hadn't thought of. Thanks for that. I don't need any names, other than for purposes of identifying the sequence within a multisequence file, which is not necessary with this solution. > As long as you make sure you don't pass your command through a shell > when you launch this from a script then it actually scales pretty well > to quite large sequences. Hmm... isn't there any OS specific limitation to the length of arguments? But anyway, this is not an issue for me in my case, where sequence length does not exceed a few hundred symbols. Best regards, Jan -- +- Jan T. 
Kim -------------------------------------------------------+ | *NEW* email: jtk at cmp.uea.ac.uk | | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk | *-----=< hierarchical systems are for files, not for humans >=-----* From simon.andrews at bbsrc.ac.uk Fri Jun 3 14:53:17 2005 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Fri, 3 Jun 2005 15:53:17 +0100 Subject: [EMBOSS] Reading Two Sequences from stdin with water In-Reply-To: <20050603154031.GE25735@jtkpc.cmp.uea.ac.uk> References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk> <20050603154031.GE25735@jtkpc.cmp.uea.ac.uk> Message-ID: <297ae8156db03f61d2deb2e786d3bf10@bbsrc.ac.uk> On 3 Jun 2005, at 16:40, Jan T. Kim wrote: > On Fri, Jun 03, 2005 at 01:16:58PM +0100, simon andrews wrote: >> As long as you make sure you don't pass your command through a shell >> when you launch this from a script then it actually scales pretty well >> to quite large sequences. > > Hmm... isn't there any OS specific limitation to the length of > arguments? > But anyway, this is not an issue for me in my case, where sequence > length does not exceed a few hundred symbols. The only limit is imposed when the command is passed through a shell, and is then dependent on the shell you're using. If you can call the program without going through a shell then there should be no limit (beyond normal OS memory limits). The method for doing this varies with the language you're writing the script in, but for example in Perl: system ("water -auto asis:gatc asis:gatc stdout") would pass the arguments through a shell, whereas system("water", "-auto", "asis:gatc","asis:gatc","stdout") would not. Simon. -- Simon Andrews PhD Bioinformatics Dept. 
The Babraham Institute simon.andrews at bbsrc.ac.uk +44 (0) 1223 496463 From andrew.warry at bbsrc.ac.uk Fri Jun 3 15:23:57 2005 From: andrew.warry at bbsrc.ac.uk (andrew warry (BITS)) Date: Fri, 3 Jun 2005 16:23:57 +0100 Subject: [EMBOSS] Reading Two Sequences from stdin with water Message-ID: <3AED5B0556B73F4A9B556F43384F5C8501857BEC@bitse2knas1.bits.bbsrc.reserved> >Is there a solution to this? I would really like to avoid the mess of temporary files and >run water in a clean pipe (pun intended ;-) ) Hi How about : nthseq x.fasta -number 2 -stdout -auto | water -aseq stdin -bseq x.fasta -stdout -auto It isn't very neat and does a redundant comparison but it does the job! Andrew ----------------------------------------------------------------------- ANDREW WARRY Computational Molecular Biology Support BBSRC Bioscience IT services West Common Harpenden HERTS AL5 2JE tel: (01582) 714904 fax: (01582) 714901 andrew.warry at bbsrc.ac.uk ----------------------------------------------------------------------- -- Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. This email and any attachments are believed to be free from viruses but BBSRC accepts no liability in connection therewith. From simon.andrews at bbsrc.ac.uk Fri Jun 3 15:32:34 2005 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Fri, 3 Jun 2005 16:32:34 +0100 Subject: [EMBOSS] Reading Two Sequences from stdin with water In-Reply-To: <3AED5B0556B73F4A9B556F43384F5C8501857BEC@bitse2knas1.bits.bbsrc.reserved> References: <3AED5B0556B73F4A9B556F43384F5C8501857BEC@bitse2knas1.bits.bbsrc.reserved> Message-ID: <53838984cac0240ba7aefe6d33f7810d@bbsrc.ac.uk> On 3 Jun 2005, at 16:23, andrew warry ((BITS)) wrote: > >> Is there a solution to this? 
I would really like to avoid the mess of >> temporary files and run water in a clean pipe (pun intended ;-) ) > > Hi > How about : > > nthseq x.fasta -number 2 -stdout -auto | water -aseq stdin -bseq > x.fasta > -stdout -auto > > It isn't very neat and does a redundant comparison but it does the job! But x.fasta still has to appear on the filesystem. You can't run this cleanly in a pipe. Simon. From golharam at umdnj.edu Fri Jun 3 14:57:18 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 03 Jun 2005 10:57:18 -0400 Subject: [EMBOSS] Man pages Message-ID: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1> Hi all, I recently noticed there aren't man pages installed with emboss, but I thought there were in the past. Are there man pages available? If so, where/how do I get them? ----- Ryan Golhar Computational Biologist The Informatics Institute at The University of Medicine & Dentistry of NJ Phone: 973-972-5034 Fax: 973-972-7412 Email: golharam at umdnj.edu From jtk at cmp.uea.ac.uk Fri Jun 3 17:18:01 2005 From: jtk at cmp.uea.ac.uk (Jan T. Kim) Date: Fri, 3 Jun 2005 18:18:01 +0100 Subject: [EMBOSS] Reading Two Sequences from stdin with water In-Reply-To: <1977.198.161.30.152.1117807743.squirrel@webmail.ebi.ac.uk> References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> <1977.198.161.30.152.1117807743.squirrel@webmail.ebi.ac.uk> Message-ID: <20050603171801.GF25735@jtkpc.cmp.uea.ac.uk> On Fri, Jun 03, 2005 at 03:09:03PM +0100, pmr at ebi.ac.uk wrote: > Jan T. Kim writes: > > is it possible to read both input sequences to a pairwise alignment > > from one input stream? > > > > cat x.fasta | water -asequence fasta::stdin:seq1 -bsequence > > fasta::stdin:seq2 -outfile stdout -auto > > > > gives > > > > EMBOSS An error in ajfile.c at line 1926: > > Error reading from file 'stdin' > > > > It may well be that water consumes the entire input stream on getting the > > first sequence, thus rendering itself unable to acquire the second one. 
> > > > Is there a solution to this? I would really like to avoid the mess of > > temporary files and run water in a clean pipe (pun intended ;-) ) > > EMBOSS will only cleanly read stdin as one input. We should probably trap > that internally and give an error if we find stdin opening again. I wonder > whether there is any useful way to share the stdin filebuffer. Hmmmm... in > the early days of EMBOSS we decided not to allow it, but it could be worth > a try. You would still be in trouble if you tried to read the second > sequence first though. Conceptually, this could be cleanly handled (which is why I tried in the first place), by having the function for obtaining the input sequences determine the source files in a first pass of the list of sources, and then obtain all requested sequences that come from the same file in one go through that file. This could be applied to the standard input just as to any other file. However, if the current code acquires the two sequences one after the other and independently of each other, it will require a possibly less than trivial rewrites to change that -- likely, the API for obtaining a sequence specified by a USA would have to be extended such that multiple sequences can be obtained from one file in one pass through that file, and some functions to group lists of USAs into sublists of USAs that refer to the same file would have to be provided. > Assuming your x.fasta file has only seq1 and seq2 in that order, reading > seq1 will continue until the first line of seq2 is reached. By then it > would be too late for seq2 to be read cleanly. Well, the approach outlined above does not have that limitation, and it also works for interleaved sequence formats. But if the EMBOSS internals are as I assume above, it's clear to me that this is something for the long-term wishlist. 
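The grouping idea described above can be sketched in Python. This is only an illustration under stated assumptions, not EMBOSS code: parse_usa, group_by_source and read_fasta_entries are hypothetical helper names, and the parser handles only the simple format::file:entry form of USA used in this thread.

```python
import io
from collections import defaultdict


def parse_usa(usa):
    """Split a simple 'format::file:entry' USA into (format, file, entry).

    Hypothetical helper: real USAs have more forms than this handles.
    """
    if "::" in usa:
        fmt, rest = usa.split("::", 1)
    else:
        fmt, rest = "", usa
    source, _, entry = rest.partition(":")
    return fmt, source, entry


def group_by_source(usas):
    """Map each (format, file) pair to the entry names requested from it."""
    groups = defaultdict(list)
    for usa in usas:
        fmt, source, entry = parse_usa(usa)
        groups[(fmt, source)].append(entry)
    return dict(groups)


def read_fasta_entries(stream, wanted):
    """Collect all wanted entries in one forward pass over a FASTA stream."""
    wanted = set(wanted)
    found = {}
    current = None  # name of the wanted entry currently being read, if any
    for line in stream:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            current = name if name in wanted else None
            if current is not None:
                found[current] = ""
        elif current is not None:
            found[current] += line
    return found


if __name__ == "__main__":
    groups = group_by_source(["fasta::stdin:seq1", "fasta::stdin:seq2"])
    # Stand-in for the piped x.fasta from the original question.
    stream = io.StringIO(">seq1\naccaacc\n>seq2\nacgagcc\n")
    for (fmt, source), entries in groups.items():
        print(fmt, source, read_fasta_entries(stream, entries))
```

Because every requested entry from one source is collected in a single forward pass, the order in which the entries are requested does not matter, which is what makes the approach usable on a non-seekable input such as a pipe.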
> At least you have fasta:: specified - with no specified format, EMBOSS has > to read a long way into the input just to check whether it is really GCG > format. Yes, heuristic format determination and non-seekable inputs don't mix too well generally... > As for the asis format, I suppose an EMBOSS utility that reads x.fasta and > outputs asis::ctagtacgatgcgatcg asis::tgatcgatggctacgtagc would be useful > to you - then you could put `sillyname x.fasta` in your command line... at > least until the command line gets too long. Hard to preserve the ID and > description of the sequences though. Yes -- in my case, I have the sequences available within a Python script anyway, so the asis approach works fine for me (even with a popen facility that goes through a shell -- I'll have to check how to eliminate that for future occasions where sequences may be too long for the command line, though). > "If you think water is pure, just remember what fish do in it." I like to boil my water, adding an all-natural disinfectant known as "coffee" for this reason... ;-) Best regards, Jan -- +- Jan T. Kim -------------------------------------------------------+ | *NEW* email: jtk at cmp.uea.ac.uk | | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk | *-----=< hierarchical systems are for files, not for humans >=-----* From robin at hms.harvard.edu Fri Jun 3 16:30:33 2005 From: robin at hms.harvard.edu (Robin Colgrove) Date: Fri, 3 Jun 2005 12:30:33 -0400 Subject: [EMBOSS] Man pages in multiple languages? In-Reply-To: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1> References: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1> Message-ID: Hello all, are there EMBOSS man pages in other languages than English? Mandarin and Spanish in particular would help around here. thanks robin colgrove Harvard Medical School From pmr at ebi.ac.uk Fri Jun 3 17:14:08 2005 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Fri, 3 Jun 2005 18:14:08 +0100 (BST) Subject: [EMBOSS] Man pages in multiple languages? 
In-Reply-To: References: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1> Message-ID: <2398.198.161.30.152.1117818848.squirrel@webmail.ebi.ac.uk> Hi Robin, > are there EMBOSS man pages in other languages than English? > > Mandarin and Spanish in particular would help around here. We don't have man pages exactly. We have a text version of the online documentation, with the "tfm" program to display it on the screen. To find out why it is called tfm, you can use the command: tfm tfm Of course, it prints "The F(antastic) Manual" as in "RTFM". For other languages, there may be something out there. We are aware of a Japanese user group that has translated much of the EMBOSS materials. I am sure there are Mandarin speakers who could create a Mandarin version - though on the first ever EMBOSS course (in Beijing) there was a vote against creating a Mandarin version of the commandline. Hope this helps, Peter Rice From luojc at plum.lsc.pku.edu.cn Sat Jun 4 01:15:37 2005 From: luojc at plum.lsc.pku.edu.cn (Jingchu Luo) Date: Sat, 4 Jun 2005 09:15:37 +0800 (CST) Subject: [EMBOSS] Man pages in multiple languages? In-Reply-To: <2398.198.161.30.152.1117818848.squirrel@webmail.ebi.ac.uk> Message-ID: > I am sure there are Mandarin speakers who could create a Mandarin > version - though on the first ever EMBOSS course (in Beijing) there was > a vote against creating a Mandarin version of the commandline. We were running an EMBnet bioinformatics workshop in April 1999. Peter gave a talk about EMBOSS. It might be useful to have user manual and/or documentation in Chinese for the Chinese user group. We'll see if anyone in mainland has been working on this already.
Jingchu ------- Jingchu Luo Centre of Bioinformatics Peking University Beijing 100871, China Tel: 86-10-6275-7281 Fax: 86-10-6275-9001 Email: luojc at pku.edu.cn URL: http://www.cbi.pku.edu.cn From d.gatherer at vir.gla.ac.uk Wed Jun 15 10:31:33 2005 From: d.gatherer at vir.gla.ac.uk (Derek Gatherer) Date: Wed, 15 Jun 2005 11:31:33 +0100 Subject: [EMBOSS] seqret options Message-ID: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk> Dear EMBOSSers I'm trying to write a pipeline to take a load of paired, aligned homologues from 2 species and submit them sequentially to the yn00 application from the well known PAML package. PAML's applications all take PHYLIP format. I can easily make this by looping over: seqret -auto -osformat phylip infile -out outfile However, PAML requires that the flag "I" be placed on the top line of the phylip format to indicate interleaved, e.g.: 2 663 I c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC barf1 ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT rather than the standard phylip format, given by seqret: 2 663 c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC barf1 ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT I could write a script to open each seqret output file and add this character to the top line of each, but before I dive into this, I'd like to know if there is any flag I can add to seqret to get the "I" added automatically. Failing that, PAML takes the other, non-interleaved phylip format ("sequential") by default, and that would not require any flag insertion.
Seqret also can produce this (using -osformat phylip3): 1 663 YF c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA but then PAML won't read it because it doesn't like the YF flags inserted by seqret!! So I either have to script to remove flags from sequential or insert them in interleaved, unless seqret has a solution. All assistance gratefully appreciated Derek From David.Bauer at SCHERING.DE Wed Jun 15 11:19:55 2005 From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE) Date: Wed, 15 Jun 2005 13:19:55 +0200 Subject: Re: [EMBOSS] seqret options Message-ID: Hi Derek, you can easily change this in the source code. The sequence output formats are defined in ajax/ajseqwrite.c. In the function seqWritePhylip3 you find a line: ajFmtPrintF(outseq->File, "1 %d YF\n", ilen); Here you can just delete the YF and recompile EMBOSS. David. Derek Gatherer (sent by owner-emboss at hgmp.mrc.ac.uk), 15.06.2005 12:31 To: emboss at embnet.org Subject: [EMBOSS] seqret options Dear EMBOSSers I'm trying to write a pipeline to take a load of paired, aligned homologues from 2 species and submit them sequentially to the yn00 application from the well known PAML package. PAML's applications all take PHYLIP format.
I can easily make this by looping over: seqret -auto -osformat phylip infile -out outfile However, PAML requires that the flag "I" be placed on the top line of the phylip fomat to indicate interleaved, eg: 2 663 I c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC barf1 ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT rather than the standard phylip format, given by seqret: 2 663 c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC barf1 ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT I could write a script to open each seqret output file and add this character to the top line of each, but before I dive into this, I'd like to know if there is any flag I can add to seqret to get the "I" added automatically. Failing that, PAML takes the other, non-interleaved phylip format ("sequential") by default, and that would not require any flag insertion. Seqret also can produce this (using -osformat phylip3): 1 663 YF c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA but then PAML won't read it because it doesn't like the YF flags inserted by seqret!! So I either have to script to remove flags from sequential or insert them in interleaved, unless seqret has a solution. 
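The scripted workaround mentioned above (adding the "I" flag to interleaved output, or stripping the "YF" flags from sequential output) might look roughly like this in Python. It is a minimal sketch, not part of EMBOSS or PAML: fix_phylip_header and fix_phylip_file are hypothetical names, the header is assumed to be whitespace-separated with the two counts first, and the counts themselves are left unchanged.

```python
from pathlib import Path


def fix_phylip_header(text, interleaved=False):
    """Return text with the first (header) line reduced to the two counts.

    Trailing flags such as "YF" are dropped; if interleaved is true,
    a single "I" flag is appended instead.
    """
    lines = text.splitlines(keepends=True)
    if not lines:
        return text
    header = lines[0]
    fields = header.split()[:2]  # number of sequences, alignment length
    if interleaved:
        fields.append("I")
    ending = "\n" if header.endswith("\n") else ""
    lines[0] = " ".join(fields) + ending
    return "".join(lines)


def fix_phylip_file(path, interleaved=False):
    """Rewrite one seqret output file in place (hypothetical helper)."""
    p = Path(path)
    p.write_text(fix_phylip_header(p.read_text(), interleaved))
```

In a pipeline, fix_phylip_file("outfile", interleaved=True) would be called after each seqret -osformat phylip run, or with interleaved=False after -osformat phylip3; whether PAML accepts the resulting header spacing is an assumption to check against its documentation.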
All assistance gratefully appreciated Derek From pmr at ebi.ac.uk Wed Jun 15 12:23:48 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 15 Jun 2005 13:23:48 +0100 Subject: [EMBOSS] seqret options In-Reply-To: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk> References: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk> Message-ID: <42B01DD4.8050303@ebi.ac.uk> Derek Gatherer wrote: > Dear EMBOSSers > > I'm trying to write a pipeline to take a load of paired, aligned > homologues from 2 species and submit them sequentially to the yn00 > application from the well known PAML package. PAML's applications all > take PHYLIP format. > Failing that, PAML takes the other, non-interleaved phylip format > ("sequential") by default, and that would not require any flag > insertion. Last time I worked through the PHYLIP formats (for EMBOSS 2.10.0) I found Phylip had changed the format it used. One change was that I removed the YF from phylip3 format because phylip was no longer using it - so updating to EMBOSS 2.10.0 will solve your non-interleaved format problem (and David Bauer's code fix is exactly what you need). Any more feedback on the variations of phylip formats that other packages use would be a great help! We will be releasing the PHYLIP 3.6 integration (as a PHYLIPNEW EMBASSY package) soon and expect to see more use of phylogenetics packages with EMBOSS. 
regards, Peter Rice From d.gatherer at vir.gla.ac.uk Wed Jun 15 12:44:46 2005 From: d.gatherer at vir.gla.ac.uk (Derek Gatherer) Date: Wed, 15 Jun 2005 13:44:46 +0100 Subject: [EMBOSS] seqret options In-Reply-To: <42B01DD4.8050303@ebi.ac.uk> References: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk> <42B01DD4.8050303@ebi.ac.uk> Message-ID: <6.2.1.2.1.20050615134121.02addff8@lenzie.gla.ac.uk> I do have 2.10.0: [gath01d at gamma seqs]$ seqret -osformat phylip3 barf1_both.seq Reads and writes (returns) sequences Output sequence [c-barf1.phylip3]: barf1.phylip3 [gath01d at gamma seqs]$ more barf1.phylip3 1 663 YF c-barf1ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA [gath01d at gamma seqs]$ embossversion Writes the current EMBOSS version number 2.10.0 Anyway, I know how to do the code fix now, so thanks to all. Cheers Derek At 13:23 15/06/2005, you wrote: >Derek Gatherer wrote: > >>Dear EMBOSSers >>I'm trying to write a pipeline to take a load of paired, aligned >>homologues from 2 species and submit them sequentially to the yn00 >>application from the well known PAML package. PAML's applications all >>take PHYLIP format. > >>Failing that, PAML takes the other, non-interleaved phylip format >>("sequential") by default, and that would not require any flag insertion. > >Last time I worked through the PHYLIP formats (for EMBOSS 2.10.0) I found >Phylip had changed the format it used. > >One change was that I removed the YF from phylip3 format because phylip >was no longer using it - so updating to EMBOSS 2.10.0 will solve your >non-interleaved format problem (and David Bauer's code fix is exactly what >you need). > >Any more feedback on the variations of phylip formats that other packages >use would be a great help! 
> >We will be releasing the PHYLIP 3.6 integration (as a PHYLIPNEW EMBASSY >package) soon and expect to see more use of phylogenetics packages with EMBOSS. > >regards, > >Peter Rice > From pmr at ebi.ac.uk Wed Jun 15 12:49:59 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 15 Jun 2005 13:49:59 +0100 Subject: [EMBOSS] seqret options In-Reply-To: <6.2.1.2.1.20050615134121.02addff8@lenzie.gla.ac.uk> References: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk> <42B01DD4.8050303@ebi.ac.uk> <6.2.1.2.1.20050615134121.02addff8@lenzie.gla.ac.uk> Message-ID: <42B023F7.7010808@ebi.ac.uk> Derek Gatherer wrote: > I do have 2.10.0: > > [gath01d at gamma seqs]$ seqret -osformat phylip3 barf1_both.seq > Reads and writes (returns) sequences > Output sequence [c-barf1.phylip3]: barf1.phylip3 > [gath01d at gamma seqs]$ more barf1.phylip3 > 1 663 YF > c-barf1ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC > CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT > ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA > [gath01d at gamma seqs]$ embossversion > Writes the current EMBOSS version number > 2.10.0 Oops ... make that "will be in 3.0.0" in that case ... it worked for me :-) regards, Peter From d.gatherer at vir.gla.ac.uk Wed Jun 15 13:25:36 2005 From: d.gatherer at vir.gla.ac.uk (Derek Gatherer) Date: Wed, 15 Jun 2005 14:25:36 +0100 Subject: [EMBOSS] seqret again Message-ID: <6.2.1.2.1.20050615142150.02afda58@lenzie.gla.ac.uk> Is this a bug? Compare the following output from seqret when phylip and phylip3 are specified. Shouldn't the first line of the phylip3 output be "2 546 YF" and not "1 546" ? 
[gath01d at gamma EBV]$ seqret -osformat phylip seqs/balf1.both Reads and writes (returns) sequences Output sequence [c-balf1.phylip]: seqs/balf1.phylip [gath01d at gamma EBV]$ more seqs/balf1.phylip 2 546 c-balf1.seATGCAGCCAG CCAAGTCTAC CGATTCGGTG TTTGTGAGGA CCCCGGTCGA balf1.seq ATGAGGCCAG CCAAGTCTAC AGATTCTGTG TTTGTGAGGA CCCCGGTCGA GGCGTGGGTC TCACCCTCGC CCCCGGACGA CAAAGTGGCA GAGACCAGCT GGCGTGGGTC GCGCCCTCGC CGCCGGACGA CAAGGTGGCT GAGTCCAGCT [snip] [gath01d at gamma EBV]$ seqret -osformat phylip3 seqs/balf1.both Reads and writes (returns) sequences Output sequence [c-balf1.phylip3]: seqs/balf1.phylip3 [gath01d at gamma EBV]$ more seqs/balf1.phylip3 1 546 YF c-balf1.seATGCAGCCAG CCAAGTCTAC CGATTCGGTG TTTGTGAGGA CCCCGGTCGA GGCGTGGGTC TCACCCTCGC CCCCGGACGA CAAAGTGGCA GAGACCAGCT ACCTCCTGTT CAGGGCCCTA TACGCTGTGT TCACCCAGGA CGAGACGGAC CTGCCTCTAC CGGCCCTGGT CATGTGCCGG CTCCTGAAGG CCTCCCTGAG [snip] From pmr at ebi.ac.uk Wed Jun 15 13:35:57 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 15 Jun 2005 14:35:57 +0100 Subject: [EMBOSS] seqret again In-Reply-To: <6.2.1.2.1.20050615142150.02afda58@lenzie.gla.ac.uk> References: <6.2.1.2.1.20050615142150.02afda58@lenzie.gla.ac.uk> Message-ID: <42B02EBD.4040800@ebi.ac.uk> Derek Gatherer wrote: > Is this a bug? Compare the following output from seqret when phylip and > phylip3 are specified. Shouldn't the first line of the phylip3 output > be "2 546 YF" and not "1 546" ? > [gath01d at gamma EBV]$ seqret -osformat phylip3 seqs/balf1.both > Reads and writes (returns) sequences > Output sequence [c-balf1.phylip3]: seqs/balf1.phylip3 > [gath01d at gamma EBV]$ more seqs/balf1.phylip3 > 1 546 YF > c-balf1.seATGCAGCCAG CCAAGTCTAC CGATTCGGTG TTTGTGAGGA CCCCGGTCGA > GGCGTGGGTC TCACCCTCGC CCCCGGACGA CAAAGTGGCA GAGACCAGCT > ACCTCCTGTT CAGGGCCCTA TACGCTGTGT TCACCCAGGA CGAGACGGAC > CTGCCTCTAC CGGCCCTGGT CATGTGCCGG CTCCTGAAGG CCTCCCTGAG Yes. Fixed in the next release (and in the current CVS code). Fixed as in "2 546" without the YF. 
Do any programs require the YF? Peter From kertib at linuxlap.hu Wed Jun 15 14:13:44 2005 From: kertib at linuxlap.hu (Kerti Balazs Gabor) Date: Wed, 15 Jun 2005 16:13:44 +0200 Subject: [EMBOSS] Install error (AMD64) Message-ID: <42B03798.1060204@linuxlap.hu> Hello! I would like to install emboss (latest version) from source. The host OS is Fedora Linux Core 4 (2.6.11-1.1369_FC4 #1 Thu Jun 2 22:56:33 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux). The script $ configure --enable 64 ran clear but the make made error this: /bin/sh ../libtool --tag=CC --mode=link gcc -O2 -o aaindexextract aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la ../ajax/libajax.la ../plplot/libplplot.la -lX11 -lm mkdir .libs gcc -O2 -o .libs/aaindexextract aaindexextract.o ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm -Wl,--rpath -Wl,/usr/local/lib /usr/bin/ld: cannot find -lX11 collect2: ld returned 1 exit status make[2]: *** [aaindexextract] Error 1 make[2]: Leaving directory `/usr/src/EMBOSS-2.10.0/emboss' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/usr/src/EMBOSS-2.10.0/emboss' make: *** [all-recursive] Error 1 [root at localhost EMBOSS-2.10.0]# How to solve this? What package(s) need for it? Balazs From ableasby at hgmp.mrc.ac.uk Wed Jun 15 14:26:39 2005 From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby) Date: Wed, 15 Jun 2005 15:26:39 +0100 (BST) Subject: [EMBOSS] Install error (AMD64) Message-ID: <200506151426.j5FEQduS029156@bromine.hgmp.mrc.ac.uk> Dear Balazs, You need to install the xorg-x11-devel RPM, 'make clean' and do the configure step again. Also, there is no need to define --enable64 unless you expect 'user space' applications to consume more than 4Gb of internal memory. 
HTH Alan Bleasby RFCGR/HGMP (for the next month and a half) From aengus.stewart at cancer.org.uk Wed Jun 15 15:46:07 2005 From: aengus.stewart at cancer.org.uk (Aengus Stewart) Date: Wed, 15 Jun 2005 16:46:07 +0100 Subject: [EMBOSS] 3.0.0 Message-ID: <42B04D3F.7020405@cancer.org.uk> Will the ceremonial release of 3.0.0 into the wild be at ISMB? In other words, soon? :-) Regards Aengus -- ----------------------------------------------------------------------- Aengus Stewart Group Leader Tel: +44 (0)20 7269 3679 Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK ----------------------------------------------------------------------- This electronic message contains information which may be privileged and confidential. The information is intended to be for the use of the individual(s) or entity named above. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of the contents of this information is prohibited. If you have received this electronic message in error, please notify me by telephone or email (to the number or address above) immediately. From ableasby at hgmp.mrc.ac.uk Wed Jun 15 16:44:18 2005 From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby) Date: Wed, 15 Jun 2005 17:44:18 +0100 (BST) Subject: [EMBOSS] 3.0.0 Message-ID: <200506151644.j5FGiI8T009556@bromine.hgmp.mrc.ac.uk> Well, we always like to try to release on St Swithin's Day; that date is normally before ISMB, but this year it isn't. EMBOSS will feature at ISMB in all the usual places (BOSC, poster, demo and maybe BOF) and the soon-to-be-released 3.0.0 will certainly be mentioned there. Alan From golharam at umdnj.edu Wed Jun 15 19:01:54 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Wed, 15 Jun 2005 15:01:54 -0400 Subject: [EMBOSS] EMBOSS-GUI Message-ID: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1> Does anyone know if any work is being done on EMBOSS-GUI by Luke McCarthy. The web site doesn't seem to be active and out-of-date. 
If a new version isn't being worked on, I'd like to volunteer to help maintain it for v3.0.0. It's such a simple and clean interface. I haven't found anything else like it.

Ryan

From andrespinzon at gmail.com Wed Jun 15 20:14:27 2005
From: andrespinzon at gmail.com (Andres Pinzon)
Date: Wed, 15 Jun 2005 15:14:27 -0500
Subject: [EMBOSS] EMBOSS-GUI
In-Reply-To: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>
References: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>
Message-ID: <8968fc7e0506151314772f91f0@mail.gmail.com>

2005/6/15, Ryan Golhar :
> Does anyone know if any work is being done on EMBOSS-GUI by Luke
> McCarthy. The web site doesn't seem to be active and out-of-date.
>
> If a new version isn't being worked on, I'd like to volunteer to help
> maintain it for v3.0.0. Its such a simple and clean interface. I
> haven't found anything else like it.

If you need help to maintain it please ask me! ;-)
I really liked that interface too.

--
---------
Andrés Pinzón [http://www.andrespinzon.com]
Centro de Bioinformatica, Instituto de Biotecnologia
http://bioinf.ibun.unal.edu.co
Universidad Nacional de Colombia
tel. 3165000 ext. 16961
GNU/Linux user number 349752
----------

From lukem at gene.pbi.nrc.ca Wed Jun 15 19:49:23 2005
From: lukem at gene.pbi.nrc.ca (Luke McCarthy)
Date: Wed, 15 Jun 2005 13:49:23 -0600
Subject: [EMBOSS] EMBOSS-GUI
In-Reply-To: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>
References: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>
Message-ID: <1118864963.13749.8.camel@incognito.invalid>

On Wed, 2005-06-15 at 13:01, Ryan Golhar wrote:
> Does anyone know if any work is being done on EMBOSS-GUI by Luke
> McCarthy. The web site doesn't seem to be active and out-of-date.
>
> If a new version isn't being worked on, I'd like to volunteer to help
> maintain it for v3.0.0. Its such a simple and clean interface. I
> haven't found anything else like it.
I have developed a new version and moved the code to sourceforge (http://sourceforge.net/projects/embossgui/) Since February, the only remaining step has been to wrap it up in a releasable format, but I just haven't found the time. I had considered waiting until the 3.0.0 release of EMBOSS, but if there's interest now I'll do my best to get it out there sooner. Cheers, Luke From golharam at umdnj.edu Thu Jun 16 15:10:49 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 16 Jun 2005 11:10:49 -0400 Subject: [EMBOSS] EMBOSS-GUI In-Reply-To: <1118866913.13749.12.camel@incognito.invalid> Message-ID: <002201c57285$95084090$e6028a0a@GOLHARMOBILE1> The release for EMBOSS 3.0.0 is around July 15th? If so, I can wait for embossgui until then. If you need any help with embossgui, please let me know. I'd be more than happy to contribute what I can. Ryan -----Original Message----- From: Luke McCarthy [mailto:lukem at gene.pbi.nrc.ca] Sent: Wednesday, June 15, 2005 4:22 PM To: Ryan Golhar Subject: Re: [EMBOSS] EMBOSS-GUI * (also copied to emboss at embnet.org) On Wed, 2005-06-15 at 13:01, Ryan Golhar wrote: > Does anyone know if any work is being done on EMBOSS-GUI by Luke > McCarthy. The web site doesn't seem to be active and out-of-date. > > If a new version isn't being worked on, I'd like to volunteer to help > maintain it for v3.0.0. Its such a simple and clean interface. I > haven't found anything else like it. I have developed a new version and moved the code to sourceforge (http://sourceforge.net/projects/embossgui/) Since February, the only remaining step has been to wrap it up in a releasable format, but I just haven't found the time. I had considered waiting until the 3.0.0 release of EMBOSS, but if there's interest now I'll do my best to get it out there sooner. 
Cheers, Luke From msarachu at biol.unlp.edu.ar Thu Jun 16 19:41:23 2005 From: msarachu at biol.unlp.edu.ar (Martin Sarachu) Date: Thu, 16 Jun 2005 16:41:23 -0300 Subject: [EMBOSS] Masking the : character? Message-ID: <42B1D5E3.1000503@biol.unlp.edu.ar> Dear list, is there any way to mask the ':' character so it is not interpreted as a delimiter for DB:sequence? I have this file /home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf and when I run infoseq I get this error $ infoseq /home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf Displays some simple information about sequences Error: failed to open filename '/home/embtest/wProjects/test/.clustal.05.06.15' Error: Unable to read sequence '/home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf' Died: infoseq terminated: Bad value for '-sequence' and no prompt Thanks in advance, Martin -- Martin Sarachu msarachu at biol.unlp.edu.ar AR.EMBnet http://www.ar.embnet.org From yezhiqiang at gmail.com Sat Jun 18 09:28:16 2005 From: yezhiqiang at gmail.com (yezhiqiang at gmail.com) Date: Sat, 18 Jun 2005 17:28:16 +0800 Subject: [EMBOSS] Masking the : character? In-Reply-To: <42B1D5E3.1000503@biol.unlp.edu.ar> References: <42B1D5E3.1000503@biol.unlp.edu.ar> Message-ID: <34198fe4050618022825238622@mail.gmail.com> I have also found this. and \: or using quote cannot solve this problem. But why not just rename your file name? It doesn't bother. 2005/6/17, Martin Sarachu : > Dear list, > > is there any way to mask the ':' character so it is not interpreted as a > delimiter for DB:sequence? 
> I have this file > > /home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf > > and when I run infoseq I get this error > > $ infoseq > /home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf > Displays some simple information about sequences > Error: failed to open filename > '/home/embtest/wProjects/test/.clustal.05.06.15' > Error: Unable to read sequence > '/home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf' > Died: infoseq terminated: Bad value for '-sequence' and no prompt > > Thanks in advance, > > Martin > > -- > Martin Sarachu > msarachu at biol.unlp.edu.ar > AR.EMBnet > http://www.ar.embnet.org > From yezhiqiang at gmail.com Sat Jun 18 09:50:50 2005 From: yezhiqiang at gmail.com (yezhiqiang at gmail.com) Date: Sat, 18 Jun 2005 17:50:50 +0800 Subject: [EMBOSS] Man pages In-Reply-To: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1> References: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1> Message-ID: <34198fe405061802504ace851@mail.gmail.com> EMBOss has its own manual system: tfm try like this: wossname seqret tfm seqret 2005/6/3, Ryan Golhar : > Hi all, > > I recently noticed there aren't man pages installed with emboss, but I > thought there were in the past. Are there man pages available? If so, > where/how do I get them? > > ----- > Ryan Golhar > Computational Biologist > The Informatics Institute at > The University of Medicine & Dentistry of NJ > > Phone: 973-972-5034 > Fax: 973-972-7412 > Email: golharam at umdnj.edu > > From jrvalverde at cnb.uam.es Mon Jun 20 08:55:20 2005 From: jrvalverde at cnb.uam.es (=?ISO-8859-15?Q?Jos=E9?= R. Valverde) Date: Mon, 20 Jun 2005 10:55:20 +0200 Subject: [EMBOSS] Multiplatform filenames (was Re: Masking the : character?) 
In-Reply-To: <34198fe4050618022825238622@mail.gmail.com>
References: <42B1D5E3.1000503@biol.unlp.edu.ar> <34198fe4050618022825238622@mail.gmail.com>
Message-ID: <20050620105520.736fef76.jrvalverde@cnb.uam.es>

On Sat, 18 Jun 2005 17:28:16 +0800 wrote:
> I have also found this.
> and \: or using quote cannot solve this problem.
>
> But why not just rename your file name? It doesn't bother.
>
>
> 2005/6/17, Martin Sarachu :
> > Dear list,
> >
> > is there any way to mask the ':' character so it is not interpreted as a
> > delimiter for DB:sequence?

Renaming.

---------

Or in other words (caution, detailed explanation follows):

Why should anybody have a database or db. file named something\ or something\\\? But the fact is that by Unix filesystem semantics that is allowed.

So, there is no easy way to avoid the ':' problem as one must accommodate for this. Especially since :: is also meaningful to EMBOSS. One should introduce the notion of a special escape metacharacter or a quotation method, and while at it, it should integrate easily with shells... meaning that it should not be pre-processed by the shell (e.g. 'file:name' would come out of the shell as file:name, the user would need to type "'file:name'" or some other such horrible combination to escape shell quotations too).

The problem arises because the ':' is used for historic reasons as a carry-over from VMS where it had special meaning on pathnames. This does not hold on UNIX where it is a legit character (actually ANY char but '/' and NUL is a legit character on UNIX). This is important as EMBOSS may be used on many locales, and you don't know in advance how a given symbol will be represented on them. Freedom comes at a cost.

QUICK SOLUTION
- ------------
I think that for the user it is simpler to know that ':' has a special meaning and should be avoided.

For the cases where the colon is generated automatically, it may be better to provide a renaming script that changes the colon to something else.
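A renaming script of the kind suggested here could look roughly like the sketch below. It is only an illustration, not part of EMBOSS: the function name `rename_colons` and the choice of `_` as the replacement character are assumptions made for the example. It walks a directory tree bottom-up and renames every file or directory whose name contains ':'.

```python
import os


def rename_colons(directory, replacement="_"):
    """Rename every entry below `directory` whose name contains ':'.

    The tree is walked bottom-up, so the contents of a directory are
    handled before the directory itself is renamed.
    Returns a list of (old_path, new_path) pairs for the renamed entries.
    `rename_colons` and the '_' default are hypothetical choices.
    """
    renamed = []
    for root, dirs, files in os.walk(directory, topdown=False):
        for name in dirs + files:
            if ":" not in name:
                continue
            old_path = os.path.join(root, name)
            new_path = os.path.join(root, name.replace(":", replacement))
            if os.path.lexists(new_path):
                # Never overwrite an existing entry; skip this one.
                continue
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Run on the directory from the original question, this would turn `.clustal.05.06.15:17.46.27` into `.clustal.05.06.15_17.46.27`, after which the path no longer contains a character that EMBOSS treats as a delimiter. Note that the starting directory itself is not renamed, only what lies below it.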
UI 'PRO' APPROACH - --------------- For GUI writers it is probably better to "translate" any such filenames between the user and EMBOSS. Note the quotes around translate above: it is not immediate. Let me explain: Escaping for the *command line* must be done using some character that is a) meaningful (but those are mostly already taken) and b) easy to type on a keyboard. In any case, this means that the user must be aware of the special case, and if so, renaming is just as good a solution. Escaping for the GUI removes all conditions and gives you full freedom. There are useful tricks to use special quoting/escaping chars on GUIs (hint: look into ASCII 0-32), but translating filenames can NOT be done transparently to the user (unless you can guarantee yours is the only user interface they will use). Any translation will change the filename and make it look differently or even untypable on other interfaces. Note that the problem still remains of distinguishing when a pathname containing a colon is an actual filename and not a database:file specification automatically. On a GUI you may assume a :-containing path is a filename when you are tagging uploaded data or program generated data, but otherwise you should be cautious, highly cautious. I.e. does swiss:prot_human refer to the database entry or to the data the user uploaded and called that way? Is it possible someone has called their database 'sequencer_files' locally and if so how you distinguish the local database of sequencer files from the user batch of sequencer_files:* uploaded sequences? Assuming you can tell, then read on: The trick is to create a special hidden directory on each user directory accessed: e.g. .myGUI-names. 
Then for every file make a suitably processed symlink on that subdirectory and call emboss through the symlink, sort of:

    my-gui-store-file(filename)
    {
        save(filename);
        sym = concatenate(".myGUI-names/", process(filename));
        make_symlink(filename, sym);
    }

    my-gui-emboss-access-file(filename)
    {
        sym = concatenate(".myGUI-names/", process(filename));
        if (!file_exists(sym))
            make_symlink(filename, sym);
        emboss-access(sym);
    }

    process(filename)
    {
        /* work on a copy so the original name is left untouched */
        name = copy(filename);
        for (p = name; *p; p++)
            if (*p == ':')
                *p = SUB;    // e.g. ASCII 0x1A
        return name;
    }

And off you go.

Why the <SUB>? You should try to substitute the colon by something that is guaranteed to be portable. You only have either a) the portable character set (which is all typable) or b) the control character set (ASCII 0-32) which you may assume will be available everywhere, and most probably not used in filenames as they are very difficult to type or use by hand in general.

From these we better avoid NUL, BEL, BS, HT, LF, VT, FF, CR and ESC just in case. But we still have plenty to choose from: SUB (substitute), CAN (cancel), DLE (data link escape) have good mnemonics for escaping and STX (start of text) and ETX (end of text) for quoting, but these are only suggestions.

That is to say: in the example above we substituted : by <SUB>, because we only care about this special case. If there were more cases, then full escaping/quoting might be needed, and then instead we would copy the filename into a new string and fully quote/escape.

I suggest the substitution approach since we are doing the encoding *within* the file name: anything else (quoting/escaping) will introduce additional chars inside the filename and this will reduce the available filename length, hence making it less transparent and potentially dangerous (should there by any chance be two filenames on the length limit containing an escapable sequence and differing only in the last char).
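The pseudocode above can be sketched in runnable form roughly as follows. This is a minimal illustration under stated assumptions, not a tested GUI component: `safe_path` and `process` are names made up for the example, the link directory name `.myGUI-names` and the SUB character come from the text, and only a colon in the file name itself is handled (a colon in a parent directory, as in the original question, would still need the renaming approach or a link directory placed elsewhere).

```python
import os

SUB = "\x1a"                # ASCII SUB (0x1A), the substitute suggested in the text
LINK_DIR = ".myGUI-names"   # hidden directory name taken from the text


def process(filename):
    """Return the link name: the file name with every ':' replaced by SUB."""
    return filename.replace(":", SUB)


def safe_path(directory, filename):
    """Return a colon-free symlink path for directory/filename.

    The hidden link directory and the symlink are created on first use.
    Assumes `directory` itself contains no ':' (hypothetical helper).
    """
    link_dir = os.path.join(directory, LINK_DIR)
    os.makedirs(link_dir, exist_ok=True)
    link = os.path.join(link_dir, process(filename))
    if not os.path.lexists(link):
        # A relative target keeps the link valid if the whole
        # directory is moved somewhere else.
        os.symlink(os.path.join("..", filename), link)
    return link
```

A GUI would then hand the value returned by `safe_path(user_dir, name)` to the EMBOSS program instead of the original name, while the original file stays in place under its own name for every other interface.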
Alternately one may use a hash of the filename instead, but this is more painful to code, maintain and debug and potentially more wasteful in terms of space.

Now, the original filenames are in place, and available for the command line, up/downloads, other user interfaces, etc. to manage as they wish, but your GUI is no longer haunted by the infamous colon.

Symlinks on UNIX eat very little space: usually just the directory entry. If space is very tight and becomes a concern you may consider either hardlinks or only symlinking special filenames (this last at the cost of additionally complex logic). With current hard disks I wouldn't worry.

And, yes, I know this involves many more changes to a UI, but either users accommodate (by avoiding the colon) or the UI does (by hiding limitations). Actually, a similar trick is used by NetATalk, AppleTalk, MacOS X and other systems that have similar metadata problems.

j

From pmr at ebi.ac.uk Mon Jun 20 09:16:35 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 20 Jun 2005 10:16:35 +0100
Subject: [EMBOSS] Multiplatform filenames (was Re: Masking the : character?)
In-Reply-To: <20050620105520.736fef76.jrvalverde@cnb.uam.es>
References: <42B1D5E3.1000503@biol.unlp.edu.ar> <34198fe4050618022825238622@mail.gmail.com> <20050620105520.736fef76.jrvalverde@cnb.uam.es>
Message-ID: <42B68973.7090105@ebi.ac.uk>

José R. Valverde wrote:
>>2005/6/17, Martin Sarachu :
>>>is there any way to mask the ':' character so it is not interpreted as a
>>>delimiter for DB:sequence?

> The problem arises because the ':' is used for historic reasons as a
> carry-over from VMS where it had special meaning on pathnames. This
> does not hold on UNIX where it is a legit character (actually ANY char
> but '/' and NULL is a legit character on UNIX).
This is important as
> EMBOSS may be used on many locales, and you don't know in advance
> how a given symbol will be represented on them. Freedom comes at a
> cost.

Strictly speaking, the problem arises because ':' has become a standard for bioinformatics users - though, yes, VMS was the source of the special syntax. It was adopted by, among others, GCG and SRS. It also is used, of course, in URN and URL syntax.

However, in this case there is a partial solution. Only alphanumeric characters are allowed in EMBOSS database names, and they must be more than one character in length (to avoid clashing with C: on Windows systems).

The problem posted was not in a database name. It was the filename:id syntax, where a ':' appeared in the filename full path.

For a ':' in a directory name (not in the filename) we could try to catch it by not allowing '/' in the ID. However, that can run into problems. For example, PFAM uses '/' in the identifier of a sequence derived from a longer entry.

> QUICK SOLUTION
> - ------------
> I think that for the user it is simpler to know that ':' has a special
> meaning and should be avoided.
>
> For the cases where the colon is generated automatically, it may be better
> to provide a renaming script that changes the colon to something else.

That would be my recommendation too.

> UI 'PRO' APPROACH
> - ---------------
> For GUI writers it is probably better to "translate" any such filenames
> between the user and EMBOSS. Note the quotes around translate above: it
> is not immediate. Let me explain:
>
> The trick is to create a special hidden directory on each user
> directory accessed: e.g. .myGUI-names. Then for every file make a
> suitably processed symlink on that subdirectory and call emboss through
> the symlink, sort of:

Looks like a good approach. The alternative would be to trap "bad" filenames and ask the user to correct them.
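Trapping "bad" filenames in a front end might be sketched as below. This is only an illustration built on the rule stated in this message (database names are alphanumeric and longer than one character); the real EMBOSS parser may well apply further rules, and the names `looks_like_database` and `check_filename` are invented for the example.

```python
import re

# Rule as described in the message: alphanumeric only, more than one character.
DB_NAME = re.compile(r"[A-Za-z0-9]{2,}")


def looks_like_database(spec):
    """True if the text before the first ':' could be a database name."""
    head, sep, _ = spec.partition(":")
    return bool(sep) and DB_NAME.fullmatch(head) is not None


def check_filename(path):
    """Return None if the name is safe, otherwise a message for the user."""
    if ":" not in path:
        return None
    if looks_like_database(path):
        return "'%s' could be read as database:entry; please rename the file" % path
    return "'%s' contains ':', which is treated as a delimiter; please rename it" % path
```

With this sketch, `uniprot:papa_*` is recognised as a possible database reference, `C:\data\seq.fa` is not (the part before the colon is a single character), and the path from the original question is reported as containing a colon so the user can be asked to rename it.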
regards, Peter From kkmattil at csc.fi Mon Jun 20 11:50:46 2005 From: kkmattil at csc.fi (Kimmo Mattila) Date: Mon, 20 Jun 2005 14:50:46 +0300 (EEST) Subject: [EMBOSS] Installing EMBOSS on a Rocks linux Message-ID: Hi I would like to ask, if anyone of you have managed to install EMBOSS on a linux cluster running Rocks linux. When I tried to install EMBOSS to our Rocks cluster, the standard installation procedure went through without error messages, but when I try to start an EMBOSS application, I get an error message: wossname Segmentation fault (core dumped) Google search about this topic revealed that some one else have had similar problems with Rocks too, but I was not able to find any potential solution. However, EMBOSS is available in Rocks based BioBrew linux distribution. So, any hints about how to install EMBOSS in a Rocks cluster would be welcome. Regards, Kimmo Mattila --------------------------------------------------------------- Kimmo Mattila, sovellusasiantuntija, Bioinformatiikan palvelut, CSC PL 405 02101 Espoo, puh 09 457 2708 , fax (09) 457 2302 CSC on tieteen tietotekniikan keskus, www.csc.fi, s-posti: kimmo.mattila at csc.fi Kimmo Mattila, application scientist, Bioinformatics Support, CSC P.O. Box 405 02101 Espoo, Finland, tel +358 9 4572708, fax +358 9 4572302 CSC is the Finnish IT Center for Science, www.csc.fi, e-mail: kimmo.mattila at csc.fi --------------------------------------------------------------- From smiddha at indiana.edu Mon Jun 20 14:59:56 2005 From: smiddha at indiana.edu (Sumit Middha) Date: Mon, 20 Jun 2005 09:59:56 -0500 Subject: [EMBOSS] Emboss package - file size limitations In-Reply-To: References: Message-ID: <1119279596.42b6d9ec2c52d@webmail.iu.edu> Hi, I looked around for threshold limitations on the size of the files that can be used for analysis, but could not locate any information. Is there a limit to the size of files that I can use, and is there a different limit on the web and command line usage. 
Actually I had the same question for GCG tools. Thanks, Sumit From pmr at ebi.ac.uk Mon Jun 20 15:26:52 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 20 Jun 2005 16:26:52 +0100 Subject: [EMBOSS] Emboss package - file size limitations In-Reply-To: <1119279596.42b6d9ec2c52d@webmail.iu.edu> References: <1119279596.42b6d9ec2c52d@webmail.iu.edu> Message-ID: <42B6E03C.9020306@ebi.ac.uk> Hi Sumit, > Is there a limit to the size of files that I can use, and is there a different > limit on the web and command line usage. EMBOSS has no hard coded limit on sequence or file size. The operating system may have problems with 2Gb file size, and the EMBLCD indexing system we use for database indexing in EMBOSS 2 has a 2Gb file size limit (4 byte file pointers are part of the index format) - there will be a new indexing system in beta release with EMBOSS 3 that will have enough space for large file offsets. Some algorithms will have limits, depending on the memory (real and virtual) on your machine. > Actually I had the same question for GCG tools. I believe sequence length is still up to 350kb unless you have the source code (when I was at Sanger I routinely rebuilt GCG with 750kb as the maximum sequence length so the genome sequencers could still use it on their own sequences!) A future release of GCG is supposed to increase this. Hope that helps, Peter Rice From francis at bii.a-star.edu.sg Tue Jun 21 08:47:51 2005 From: francis at bii.a-star.edu.sg (Francis Tang) Date: Tue, 21 Jun 2005 16:47:51 +0800 Subject: [EMBOSS] Wildfire 2.0 Message-ID: <42B7D437.5060506@bii.a-star.edu.sg> Dear EMBOSS users, On behalf of the Bioinformatics Institute, Singapore, I would like to announce that Wildfire 2.0 is now available for download from http://wildfire.bii.a-star.edu.sg . Wildfire is a GUI application for constructing workflows. It has been configured so that you can build workflows using EMBOSS applications immediately. 
The resulting workflows can run on a cluster or other multi-cpu machine, and exploit parallelism where possible.

Wildfire is described in the BMC Bioinformatics article:

"Wildfire: distributed, Grid-enabled workflow construction and execution", BMC Bioinformatics 2005, 6:69.
http://www.biomedcentral.com/1471-2105/6/69/abstract

We invite you all to download and try Wildfire and welcome feedback to wildfire at bii.a-star.edu.sg .

Thank you.

Francis.

--
Francis TANG, Post-Doctoral Research Fellow
Bioinformatics Institute, BMSI, A-STAR, Singapore.
Tel: +65 64788282
Fax: +65 64789048
Email: francis at bii.a-star.edu.sg
Add: Matrix L7, Biopolis
WWW: http://www.bii.a-star.edu.sg/~francis/

From jieqiwang at gmail.com Tue Jun 21 14:55:46 2005
From: jieqiwang at gmail.com (Wang Jieqi)
Date: Tue, 21 Jun 2005 22:55:46 +0800
Subject: [EMBOSS] Help with retrieving sequences
Message-ID: <55162b5205062107555043348@mail.gmail.com>

Hello,

I started to learn EMBOSS recently. Now, I want to read the CDS of several mRNA sequences. The complete entries of these mRNAs (cDNA) have been retrieved from GenBank into a single file. Could you please tell me what to do next? And, I find that seqret seems to only read the first molecule, could you please help me out? Thanks.

Best regards,
Jieqi

--
Jieqi Wang
Room 121, Department of Biology
Tsinghua University
Beijing, 100084
China, People's Republic
Mobile: +86-13641302483
Dorm: +86-10-51534406
Lab: +86-10-62784794
Fax: +86-10-62794376

From aengus.stewart at cancer.org.uk Tue Jun 21 15:16:41 2005
From: aengus.stewart at cancer.org.uk (Aengus Stewart)
Date: Tue, 21 Jun 2005 16:16:41 +0100
Subject: [EMBOSS] Data Lib sizes and indexing progs
Message-ID: <42B82F59.5040200@cancer.org.uk>

Hi folks,

Just wondering how the new indexing methods were coming on.
It's just I had a look at the most recent EMBL release and it's (give or take the odd gig) 250Gb, which means that having the head room to hold a copy while installing a new copy requires >500Gb.

Any info on how the new indexing will work and will it still have to run off uncompressed .dat files or will it produce its own index format?

Sorry about the questions, it's just I am rushing around the filesystem deleting anything that may appear to be "deleteable" to scrounge enough space :-)

Regards
Aengus

--
-----------------------------------------------------------------------
Aengus Stewart
Group Leader
Bioinformatics at CGAL
Tel: +44 (0)20 7269 3679
Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK
-----------------------------------------------------------------------

From ableasby at hgmp.mrc.ac.uk Tue Jun 21 15:27:56 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Tue, 21 Jun 2005 16:27:56 +0100 (BST)
Subject: [EMBOSS] Data Lib sizes and indexing progs
Message-ID: <200506211527.j5LFRuRR024742@bromine.hgmp.mrc.ac.uk>

The new indexing programs are done (in CVS). The programs are: dbxflat, dbxfasta and dbxgcg and they operate like their 'dbi' counterparts. The dbx and dbi programs will be available in the next release.

So, for EMBL, you would typically index the *.dat files. As before, you can create id,acc,sv,key,org & des indexes (though many sites just index id and acc).
An indexing job on the whole of the recently released EMBL will produce id, acc and key indexes of the following sizes. They should give you some idea of the extra disc space you'll need. -rw-r--r-- 1 root root 19950 Jun 19 14:11 embli.ent -rw-r--r-- 1 root root 122 Jun 20 13:41 embli.pxac -rw-r--r-- 1 root root 122 Jun 20 13:41 embli.pxid -rw-r--r-- 1 root root 126 Jun 20 13:41 embli.pxkw -rw-r--r-- 1 root root 8755992576 Jun 20 13:41 embli.xac -rw-r--r-- 1 root root 7482558464 Jun 20 13:41 embli.xid -rw-r--r-- 1 root root 4046751744 Jun 20 13:41 embli.xkw HTH Alan From kellert at ohsu.edu Thu Jun 23 04:06:18 2005 From: kellert at ohsu.edu (Thomas J Keller) Date: Wed, 22 Jun 2005 21:06:18 -0700 Subject: [EMBOSS] source of common vectors in cirdna format Message-ID: <03b2ab1a566cf2386b89bb570e26e6eb@ohsu.edu> Greetings, Is there a source for common vectors in cirdna format available for downloading? Thanks in advance, Tom Keller Tom Keller, Ph.D. http://www.ohsu.edu/research/core kellert at ohsu.edu 503-494-2442 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: text/enriched Size: 259 bytes Desc: not available URL: From clemens.broger at roche.com Thu Jun 23 13:48:24 2005 From: clemens.broger at roche.com (Broger, Clemens) Date: Thu, 23 Jun 2005 15:48:24 +0200 Subject: [EMBOSS] Needle/water, revcomp Message-ID: <7E08B1C16672A147B29D3DE3827536E37E00CC@rbamsem3.emea.roche.com> I have 2 questions: The first is about identity/similarity in nucleotide alignments made with needle (probably the same holds true for water): ######################################## # Program: needle # Rundate: Thu Jun 23 13:29:58 2005 # Align_format: srspair # Report_file: seq0.needle ######################################## #======================================= # # Aligned_sequences: 2 # 1: SEQ0 # 2: SEQ1 # Matrix: EDNAFULL # Gap_penalty: 100.0 # Extend_penalty: 10.0 # # Length: 70 # Length of sequence 1: 70 # Length of sequence 2: 70 # Identity: 46/70 (65.7%) # Similarity: 47/70 (67.1%) # Gaps: 0/70 ( 0.0%) # Score: 162.0 # # #======================================= . . . . . SEQ0 1 aaaaaaaaaaaaaaaaaaaaaaaaacccccgggggtttttuuuuunnnnn 50 |||||||||||||||||||||......|......||:....:|.. SEQ1 1 aaaaaaaaaaaaaaaaaaaaacgtunacgtunacgtunacgtunacgtun 50 . . . . . . . SEQ0 51 aaaaaaaaaaaaaaaaaaaa 70 |||||||||||||||||||| SEQ1 51 aaaaaaaaaaaaaaaaaaaa 70 . . Each base of the set acgtun is aligned against each other. The 20 a's at the beginning and end are only to force an ungapped alignment. Maximum gap penalties were used. I agree with the symbols in the alignment |,: and ., but the 46 identities in the summary imply that the n-n match is also counted. The t-u matches are counted as similar, which is ok, but the n-n match is not counted as similar, although it is counted as identical. I think the n-n match should not be counted both in identity and similarity. Now for ambiguous bases. 
w is a or t ######################################## # Program: needle # Rundate: Thu Jun 23 14:53:33 2005 # Align_format: srspair # Report_file: seq0.needle ######################################## #======================================= # # Aligned_sequences: 2 # 1: SEQ0 # 2: SEQ1 # Matrix: EDNAFULL # Gap_penalty: 100.0 # Extend_penalty: 10.0 # # Length: 26 # Length of sequence 1: 26 # Length of sequence 2: 26 # Identity: 21/26 (80.8%) # Similarity: 23/26 (88.5%) # Gaps: 0/26 ( 0.0%) # Score: 94.0 # # #======================================= . . SEQ0 1 aaaaaaaaaawwwwwwaaaaaaaaaa 26 ||||||||||.. .|||||||||| SEQ1 1 aaaaaaaaaaatwgcuaaaaaaaaaa 26 . . In the alignment I would put a dot at the w-w match (but I could also agree with the way it is handled now). But again the w is counted in the summary as an identity but not as a similarity. The second question is about the handling in EMBOSS of reverse-complemented nucleotide segments such as db:seq[10:20:r] The sequence is first reverse-complemented and then residues 10 to 20 are cut out. Biologists usually expect that residues 10 to 20 are first cut out and then reverse-complemented. Can this be changed? That would be very helpful. Best regards Clemens Dr. Clemens Broger Bioinformatics F. Hoffmann-La Roche Ltd. PRBI 65/303 CH-4070 Basel clemens.broger at roche.com +41-61-688-4447 -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmr at ebi.ac.uk Thu Jun 23 14:38:25 2005 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Thu, 23 Jun 2005 15:38:25 +0100 (BST) Subject: [EMBOSS] source of common vectors in cirdna format In-Reply-To: <03b2ab1a566cf2386b89bb570e26e6eb@ohsu.edu> References: <03b2ab1a566cf2386b89bb570e26e6eb@ohsu.edu> Message-ID: <2840.12.27.2.2.1119537505.squirrel@webmail.ebi.ac.uk> Tom Keller writes: > Is there a source for common vectors in cirdna format available for > downloading? Or is there a source of common vectors that we could convert to cirdna format? 

regards,
Peter Rice

From pmr at ebi.ac.uk  Thu Jun 23 14:41:34 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Thu, 23 Jun 2005 15:41:34 +0100 (BST)
Subject: [EMBOSS] Needle/water, revcomp
In-Reply-To: <7E08B1C16672A147B29D3DE3827536E37E00CC@rbamsem3.emea.roche.com>
References: <7E08B1C16672A147B29D3DE3827536E37E00CC@rbamsem3.emea.roche.com>
Message-ID: <2849.12.27.2.2.1119537694.squirrel@webmail.ebi.ac.uk>

Clemens Broger writes:
> I have 2 questions:
>
> The first is about identity/similarity in nucleotide alignments made
> with needle (probably the same holds true for water):

Tricky. This requires the matrix to define some codes as ambiguity
codes so we know w-w is not an identity.

I would guess we can extend the matrix formats we use to include this
information, or perhaps for nucleotide sequences we can "know" the
answer. I will investigate.

> The second question is about the handling in EMBOSS of
> reverse-complemented nucleotide segments such as
>
> db:seq[10:20:r]
>
> The sequence is first reverse-complemented and then residues 10 to 20
> are cut out.
> Biologists usually expect that residues 10 to 20 are first cut out and
> then reverse-complemented.
>
> Can this be changed? That would be very helpful.

Oops. Yes - will do.
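
To make the difference between the two orders of operation concrete, here is a minimal sketch in plain Python. It is not EMBOSS code: the helper names `revcomp` and `cut` and the 25-base test sequence are made up for illustration, and only unambiguous bases are handled.

```python
# Illustration only: hypothetical helpers, not part of EMBOSS.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def revcomp(seq):
    """Reverse-complement a nucleotide string (unambiguous bases only)."""
    return seq.translate(COMPLEMENT)[::-1]


def cut(seq, begin, end):
    """Return residues begin..end, 1-based and inclusive, like seq[10:20]."""
    return seq[begin - 1:end]


# Made-up sequence: 9 a's, 11 c's, 5 g's (25 bases).
seq = "a" * 9 + "c" * 11 + "g" * 5
n = len(seq)

# Order most users expect: cut residues 10..20, then reverse-complement.
cut_then_revcomp = revcomp(cut(seq, 10, 20))

# Order described in the report: reverse-complement first, then cut 10..20.
revcomp_then_cut = cut(revcomp(seq), 10, 20)

# The two orders agree only if the coordinates are mirrored:
# positions b..e on the forward strand are n-e+1..n-b+1 on the reverse strand.
mirrored = cut(revcomp(seq), n - 20 + 1, n - 10 + 1)

print(cut_then_revcomp)
print(revcomp_then_cut)
print(mirrored == cut_then_revcomp)
```

With this sequence the first order yields the complement of the eleven c's, while the second order picks up part of the complemented a-run as well, so the two results differ unless the range is mirrored.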

regards,
Peter Rice

From msarachu at biol.unlp.edu.ar  Mon Jun 27 12:28:58 2005
From: msarachu at biol.unlp.edu.ar (Martin Sarachu)
Date: Mon, 27 Jun 2005 09:28:58 -0300
Subject: [EMBOSS] Re: wemboss: warning and errors
In-Reply-To: <0C9336E1DA90DB479BEBAF2C7C5699E1016EA96D@USINDMDOWM001.dow.com>
References: <0C9336E1DA90DB479BEBAF2C7C5699E1016EA96D@USINDMDOWM001.dow.com>
Message-ID: <42BFF10A.4090405@biol.unlp.edu.ar>

Dear Rani,

about the error with ACD, when running distmat from command line
(-options to be prompted for all options) I get this error with ACD

> # distmat -options
> Creates a distance matrix from multiple alignments
> Input sequence set: uniprot:papa_*
> Multiple substitution correction methods for proteins
> 0 : Uncorrected
> 1 : Jukes-Cantor
> 2 : Kimura Protein
> Method to use [0]: 1
> Warning: ACD expression invalid @(!$acdprotein)
>
> Warning: ACD expression invalid @(!$acdprotein)
>
> Error: File /usr/local/emboss/share/EMBOSS/acd/distmat.acd line 60: (ambiguous) Bad additional flag N | Y)
>

but without -options (i.e. default options chosen) runs ok

> # distmat
> Creates a distance matrix from multiple alignments
> Input sequence set: uniprot:papa_*
> Multiple substitution correction methods for proteins
> 0 : Uncorrected
> 1 : Jukes-Cantor
> 2 : Kimura Protein
> Method to use [0]: 1
> Output file [papa_.distmat]:
> Warning: Sequence lengths are not equal!
> Warning: Sequence lengths are not equal!
> Warning: Sequence lengths are not equal!

there is a missing left parenthesis in distmat.acd in line 61, please
change this

> additional: "@(@(@(!$acdprotein)) & @($(nucmethod)==1)) |

to this

> additional: "@(@(@(!$(acdprotein)) & @($(nucmethod)==1)) |

Regards,
Martin

PS: working on the exclude problem...

Mamidipalli, SudhaRani (S) wrote:

> Hello Martin,
>
> While testing the programs in wEMBOSS,we have encountered couple of problems.
>
> 1.The 'distmat' program gave some warning. Here is the warning of that program.
> -------------------------------
> Warning!
> "ambiguous" parameter: syntax error (missing left parenthesis) in ACD expression (tell to EMBOSS Manager : this could produce wrong results from program execution!)
> -------------------------------
> I went and checked distmat.acd file but couldn't find any error.
>
> 2. I added some programs, that we don't want to be displayed in wemboss, in the exclude file: /genomics/sw/wEMBOSS-1.4.0/wEMBOSS/data/exclude. And then I re-installed wrappers4EMBOSS and wEMBOSS. Surprisingly, only few programs(for example tranalign,embossversion etc.) got deleted from wemboss whereas few programs (for example textsearch, entret etc.) show up with error
> --------
> EMBOSS: error...
> chaos has been excluded
> ----------
>
> Please clarify.
>
> Thanks and Regards,
> Rani.
>

-- 
Martin Sarachu
msarachu at biol.unlp.edu.ar
AR.EMBnet
http://www.ar.embnet.org

From pmr at ebi.ac.uk  Mon Jun 27 14:25:35 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Mon, 27 Jun 2005 15:25:35 +0100 (BST)
Subject: [EMBOSS] Re: wemboss: warning and errors
In-Reply-To: <42BFF10A.4090405@biol.unlp.edu.ar>
References: <0C9336E1DA90DB479BEBAF2C7C5699E1016EA96D@USINDMDOWM001.dow.com> <42BFF10A.4090405@biol.unlp.edu.ar>
Message-ID: <1613.12.27.2.2.1119882335.squirrel@webmail.ebi.ac.uk>

Martin Sarachu writes:
> there is a missing left parenthesis in distmat.acd in line 61, please
> change this
>
>> additional: "@(@(@(!$acdprotein)) & @($(nucmethod)==1)) |
>
> to this
>
>> additional: "@(@(@(!$(acdprotein)) & @($(nucmethod)==1)) |

Already fixed in EMBOSS 2.10.0.

But this does highlight a gap in the ACD validation - this expression is
only evaluated when needed (when -option is used).

I will try adding checks for all strings to generate warnings for
unbalanced () and $ or @ without ( to acdvalid before the July 15th
release.

>> --------
>> EMBOSS: error...
>> chaos has been excluded
>> ----------

I know this is really a wEMBOSS problem, but the message appeals to my
sense of humour!!! Can you send me an explanation of it when you have a
solution - it may appear in future EMBOSS talks :-)

regards,
Peter

From gbottu at ben.vub.ac.be  Wed Jun 29 08:30:02 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 29 Jun 2005 10:30:02 +0200
Subject: [EMBOSS] bug related to -plasmid parameter
Message-ID: <20050629083002.GA4560@bigben.ulb.ac.be>

from: Belgian EMBnet Node

Dear colleagues,

At the BEN site we have on our main computer EMBOSS 2.10.0 under Alpha
OSF 5.1A. I just noticed that the programs remap, restrict and restover
give a segmentation fault when run with parameter -plasmid. This does
however not occur with an EMBOSS installation we have on a Linux
machine. So, this behaviour must be dependent on the OS and maybe on
the hardware. Did someone else notice it ?

Regards,
Guy Bottu

From ableasby at hgmp.mrc.ac.uk  Wed Jun 29 12:13:15 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Wed, 29 Jun 2005 13:13:15 +0100 (BST)
Subject: [EMBOSS] bug related to -plasmid parameter
Message-ID: <200506291213.j5TCDFMb014301@bromine.hgmp.mrc.ac.uk>

Dear Guy,

Thanks for spotting that. It's now fixed in CVS and will be part of the
3.0.0 release.

ATB

Alan Bleasby
RFCGR/HGMP (for one more month)
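
Returning to the distmat.acd thread above: the kind of static check Peter Rice describes (warn on unbalanced parentheses, and on `$` or `@` not followed by `(`) might look roughly like the following Python sketch. This is only an illustration under assumptions. It is not the acdvalid implementation, the function name `check_acd_expression` is hypothetical, and it ignores quoting and any other ACD syntax rules.

```python
# Illustration only: a hypothetical checker, not the real acdvalid logic.
def check_acd_expression(expr):
    """Return a list of human-readable problems found in an ACD-style expression."""
    problems = []
    depth = 0
    for i, ch in enumerate(expr):
        if ch in "$@" and expr[i + 1:i + 2] != "(":
            # A variable reference or operation must open with '('.
            problems.append("'%s' at position %d is not followed by '('" % (ch, i))
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append("unmatched ')' at position %d" % i)
                depth = 0  # keep scanning so later problems are still reported
    if depth > 0:
        problems.append("%d unclosed '('" % depth)
    return problems


# Sub-expressions taken from the messages above.
print(check_acd_expression("@(!$acdprotein)"))    # flags the bare '$'
print(check_acd_expression("@(!$(acdprotein))"))  # no problems reported
```

Because such a check inspects the string itself, it would flag the bare `$acdprotein` even when the expression is never evaluated, which is the gap noted in the thread.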