From gbottu at ben.vub.ac.be  Thu Jun  2 06:09:54 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Thu, 2 Jun 2005 12:09:54 +0200
Subject: [EMBOSS] use water/matcher to find where RNA hybridizes
Message-ID: <20050602100954.GA14063@bigben.ulb.ac.be>

from : Belgian EMBnet Node

	Dear colleagues,

One of our users had a problem: how to find the location where a small 
molecule of RNA binds to an mRNA and so interferes with its functioning. 
We found nothing for this in EMBOSS and nothing on the WWW. In the end we 
did the following: use revseq -nocomp to reverse the mRNA and then align 
the two sequences using this matrix:
-------------------------------
    A   T   G   C   S   W   R   Y   K   M   B   V   H   D   N   U
A   0   5   0   0   0   5   5   0   0   5   0   5   5   5   5   0
T   5   0   5   0   0   5   0   5   5   0   5   0   5   5   5   5
G   0   5   0   5   5   0   5   0   5   0   5   5   0   5   5   3
C   0   0   5   0   5   0   0   5   0   5   5   5   5   0   5   0
S   0   0   5   5   5   0   5   5   5   5   5   5   5   5   5   0         
W   5   5   0   0   0   5   5   5   5   5   5   5   5   5   5   5
R   5   0   5   0   5   5   5   0   5   5   5   5   5   5   5   0
Y   0   5   0   5   5   5   0   5   5   5   5   5   5   5   5   5
K   0   5   5   0   5   5   5   5   5   0   5   5   5   5   5   5
M   5   0   0   5   5   5   5   5   0   5   5   5   5   5   5   0          
B   0   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   
V   5   0   5   5   5   5   5   5   5   5   5   5   5   5   5   0
H   5   5   0   5   5   5   5   5   5   5   5   5   5   5   5   5
D   5   5   5   0   5   5   5   5   5   5   5   5   5   5   5   5
N   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5
U   0   5   3   0   0   5   0   5   5   0   5   0   5   5   5   5
-------------------------------
This gave a reasonable result. water made the following alignment :
------------------------------
#=======================================
#
# Aligned_sequences: 2
# 1: mRNA
# 2: RNAi
# Matrix: HYB
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 49
# Identity:       3/49 ( 6.1%)
# Similarity:     0/49 ( 0.0%)
# Gaps:           0/49 ( 0.0%)
# Score: 185.0
# 
#
#=======================================

mRNA            2892 AATGTTGTGTGAGGATAATAGTAATAGTAATAGTAATAATAATAATAAT  2940
                     ..  . . .... ... .. .. .. .. ..  ................
RNAi               1 TTTGACCCTGCTACTACTACTACTACTACTACGATTATTATTATTATTA  49
--------------------------------
The only thing that bothers me is that the base pairs (which do have a 
positive comparison score) are not labeled as "similar": they get a '.' 
instead of a ':'. Does anyone know why this is?
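
As a rough illustration of the reverse-then-align idea (not of the water
run itself), the Python sketch below reverses the mRNA without
complementing it and slides the small RNA along it, adding 5 for each A/T
or G/C pair as in the matrix above. It is an ungapped, exhaustive scan
rather than Smith-Waterman, and the function names and toy sequences are
made up for this example.

```python
# Illustration only: ungapped sliding-window pairing score, not what water does.
# Pair scores follow the A/T and G/C entries of the matrix above; everything
# else scores 0. Function names and sequences are hypothetical.

PAIR_SCORE = {("A", "T"): 5, ("T", "A"): 5, ("G", "C"): 5, ("C", "G"): 5}


def reverse_only(seq):
    """Reverse a sequence without complementing it (like revseq -nocomp)."""
    return seq[::-1]


def best_pairing_window(mrna, small_rna):
    """Return (score, start) of the best ungapped window on the reversed mRNA.

    start is a 0-based offset into the reversed mRNA.
    """
    target = reverse_only(mrna.upper())
    probe = small_rna.upper()
    best = (-1, -1)
    for start in range(len(target) - len(probe) + 1):
        window = target[start:start + len(probe)]
        score = sum(PAIR_SCORE.get(pair, 0) for pair in zip(window, probe))
        if score > best[0]:
            best = (score, start)
    return best


if __name__ == "__main__":
    # Toy data: the probe is the reverse complement of CCGGA in the mRNA.
    print(best_pairing_window("AAACCGGATTT", "TCCGG"))
```

Because the mRNA is reversed but not complemented, a perfect antiparallel
duplex shows up as a run of complementary columns, which is what the
pairing scores reward.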

	Guy Bottu


From gbottu at ben.vub.ac.be  Thu Jun  2 11:08:45 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Thu, 2 Jun 2005 17:08:45 +0200
Subject: [EMBOSS] use water/matcher to find where RNA hybridizes
In-Reply-To: <E1Ddr3t-0000sV-00@mendel.bio.caltech.edu>
References: <E1Ddr3t-0000sV-00@mendel.bio.caltech.edu>
Message-ID: <20050602150845.GA17226@bigben.ulb.ac.be>

On Thu, Jun 02, 2005 at 07:52:45AM -0700, David Mathog wrote:
> > One of our users had a problem : how to find the location where a small 
> > molecule of RNA binds to a mRNA and so interferes with its functioning. 
> 
> This can also be addressed with Mfold.  Let A be the large mRNA of
> length N and B the small one of length M. Create a hybrid RNA sequence
> AB of length N+M.  Set the rules in mfold so that
> 
>   bases 1->N will not bind with bases 1->N
>   bases N+1->N+M will not bind with bases N+1->N+M

Clever idea! As a matter of fact, I had thought of doing that, with the 
extra step of putting between the two a linker of 200 T's which are not 
allowed to pair at all. Unfortunately the program mfold crashed with the 
message:

Fill run failed

Maybe there is something unusual in the sequence.

	Regards,
	Guy Bottu,
	BEN


From mathog at mendel.bio.caltech.edu  Thu Jun  2 10:52:45 2005
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Thu, 02 Jun 2005 07:52:45 -0700
Subject: [EMBOSS] use water/matcher to find where RNA hybridizes
Message-ID: <E1Ddr3t-0000sV-00@mendel.bio.caltech.edu>


> 
> One of our users had a problem : how to find the location where a small 
> molecule of RNA binds to a mRNA and so interferes with its functioning. 


This can also be addressed with Mfold.  Let A be the large mRNA of
length N and B the small one of length M. Create a hybrid RNA sequence
AB of length N+M.  Set the rules in mfold so that

  bases 1->N will not bind with bases 1->N
  bases N+1->N+M will not bind with bases N+1->N+M

Run Mfold.
Look through the results.

If this runs properly you should see B bound somewhere in A with
an energy level you may then use to compare binding affinities. 
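
For bookkeeping, the 1-based ranges in the recipe above can be computed as
in the Python sketch below. It only builds the concatenated sequence and
reports the two index ranges; how these are then expressed as Mfold
constraints is not shown, and the function name is hypothetical.

```python
# Sketch: build the hybrid sequence AB and the 1-based ranges of A and B.
# It does not write Mfold constraint syntax; names here are illustrative.

def hybrid_with_ranges(seq_a, seq_b):
    """Return (hybrid, range_a, range_b) with inclusive 1-based ranges."""
    n, m = len(seq_a), len(seq_b)
    hybrid = seq_a + seq_b
    range_a = (1, n)          # bases 1 -> N
    range_b = (n + 1, n + m)  # bases N+1 -> N+M
    return hybrid, range_a, range_b


if __name__ == "__main__":
    hybrid, range_a, range_b = hybrid_with_ranges("GGGAAACCC", "UUUC")
    print(len(hybrid), range_a, range_b)
```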

Regards,  

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From fernan at iib.unsam.edu.ar  Thu Jun  2 13:08:31 2005
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Thu, 2 Jun 2005 14:08:31 -0300
Subject: [EMBOSS] use water/matcher to find where RNA hybridizes
In-Reply-To: <20050602100954.GA14063@bigben.ulb.ac.be>
References: <20050602100954.GA14063@bigben.ulb.ac.be>
Message-ID: <20050602170831.GW44956@iib.unsam.edu.ar>

+----[ Guy Bottu <gbottu at ben.vub.ac.be> (02.Jun.2005 07:13):
|
| mRNA            2892 AATGTTGTGTGAGGATAATAGTAATAGTAATAGTAATAATAATAATAAT  2940
|                      ..  . . .... ... .. .. .. .. ..  ................
| RNAi               1 TTTGACCCTGCTACTACTACTACTACTACTACGATTATTATTATTATTA  49
| --------------------------------
| The only thing which bothers me is that the base pairs (which do have a 
| positive comparison score) are not labeled as "similar", they get a '.' 
| instead of a ':'. Does someone know why this is ?
|
+----]

Guy,

just a guess, but '.' and ':' are used in protein-protein
comparisons to denote identity and similarity, which are different
and both meaningful. In DNA-DNA comparisons you only care about
identity, whether you take that to mean aligning A with A or
A with its complement. So I would expect only one of
'.' or ':' to be used ... I don't remember which is used for
identity in EMBOSS.

My 2 cents guess,

Fernan


From pmr at ebi.ac.uk  Thu Jun  2 13:23:13 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Thu, 2 Jun 2005 18:23:13 +0100 (BST)
Subject: [EMBOSS] use water/matcher to find where RNA hybridizes
In-Reply-To: <20050602100954.GA14063@bigben.ulb.ac.be>
References: <20050602100954.GA14063@bigben.ulb.ac.be>
Message-ID: <3729.198.161.30.152.1117732993.squirrel@webmail.ebi.ac.uk>

Guy Bottu writes:

> One of our users had a problem : how to find the location where a small
> molecule of RNA binds to a mRNA and so interferes with its functioning.
> Nothing in EMBOSS and nothing found on the WWW. We finally did the
> following : use revseq -nocomp to reverse the mRNA and then align the two
> sequences using as matrix :
> -------------------------------
>     A   T   G   C   S   W   R   Y   K   M   B   V   H   D   N   U
> A   0   5   0   0   0   5   5   0   0   5   0   5   5   5   5   0
> T   5   0   5   0   0   5   0   5   5   0   5   0   5   5   5   5


..........

> -------------------------------
> This gave a reasonable result. water made the following alignment :
> ------------------------------


.....

> mRNA            2892 AATGTTGTGTGAGGATAATAGTAATAGTAATAGTAATAATAATAATAAT  2940
>                      ..  . . .... ... .. .. .. .. ..  ................
> RNAi               1 TTTGACCCTGCTACTACTACTACTACTACTACGATTATTATTATTATTA  49
> --------------------------------
> The only thing which bothers me is that the base pairs (which do have a
> positive comparison score) are not labeled as "similar", they get a '.'
> instead of a ':'. Does someone know why this is ?


I believe this is simply because the bases are not identical. A user
matrix can have arbitrary values, so the results are marked as similar
(A=T scores 5) but identities are only scored at zero and so never appear
with ":".

You could try setting the scores to match the hydrogen bonds for this
experiment (G=C 3 A=T 2 G=T 1)
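
A matrix along those lines could be generated as in the Python sketch
below. The layout (a header row of symbols, then one row per symbol)
imitates the matrix quoted at the top of this thread; the mismatch score of
0, the choice of symbols, the treatment of U as T, and the helper names are
all assumptions, and the output should be checked against an existing
EMBOSS matrix file before use.

```python
# Sketch: pairing matrix scored by hydrogen-bond count (G=C 3, A=T 2, G=T 1).
# Assumptions: mismatches score 0, U is treated like T, and the text layout
# copies the matrix shown earlier in the thread.

SYMBOLS = ["A", "T", "G", "C", "U"]
BONDS = {frozenset("GC"): 3, frozenset("AT"): 2, frozenset("GT"): 1}


def pair_score(x, y):
    """Score for base x facing base y; U is looked up as T."""
    x = "T" if x == "U" else x
    y = "T" if y == "U" else y
    return BONDS.get(frozenset((x, y)), 0)


def matrix_text(symbols=SYMBOLS):
    """Return the matrix as text: header row, then one row per symbol."""
    lines = ["    " + "   ".join(symbols)]
    for row in symbols:
        cells = "   ".join(str(pair_score(row, col)) for col in symbols)
        lines.append(row + "   " + cells)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(matrix_text(), end="")
```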

RNA folding is a missing area in EMBOSS. The Vienna package has been
suggested as a possible EMBASSY package. Does anyone have any experience
with it, or suggestions for alternative RNA packages we could use?

regards,

Peter


From David.Bauer at SCHERING.DE  Fri Jun  3 02:37:11 2005
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Fri, 3 Jun 2005 08:37:11 +0200
Subject: Re: [EMBOSS] use water/matcher to find where RNA hybridizes
Message-ID: <OFB791E762.F05DE52B-ONC1257015.0020620D-C1257015.00245D30@schering.net>


Hi,

I use the Vienna RNA package.
It lets you look for the global structure of the complete RNA (RNAfold) or
for local structures (RNALfold).
The global folding also accepts longer sequences (as far as I remember, this
was a problem with Mfold).
Visualization is a bit tricky, but there are helper scripts (b2ct) that
convert the output to .ct files, which can be used to create
different graphical representations.

Regards,
David.


> RNA folding is a missing area in EMBOSS. The Vienna package has been
> suggested as a possible EMBASSY package. Does anyone have any experience
> with it, or suggestions for alternative RNA packages we could use?
>
> regards,
>
> Peter


From gbottu at ben.vub.ac.be  Fri Jun  3 04:17:41 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 3 Jun 2005 10:17:41 +0200
Subject: [EMBOSS] use water/matcher to find where RNA hybridizes
In-Reply-To: <20050602170831.GW44956@iib.unsam.edu.ar>
References: <20050602100954.GA14063@bigben.ulb.ac.be> <20050602170831.GW44956@iib.unsam.edu.ar>
Message-ID: <20050603081741.GA23810@bigben.ulb.ac.be>

	Dear all,

Thanks for your replies. It is however still not clear to me where the '.' 
come from. I thought the EMBOSS "pair" output would put a '|' for 
identities and a ':' for similarities (score positive). Maybe the program 
is fooled and seriously perturbed by a matrix that assigns a negative 
score to identical base pairs.

As for the proposal to distribute ViennaRNA as an Embassadir, why not? At 
the BEN site we have mfold integrated under EMBOSS, but I am afraid 
distributing mfold as an Embassadir will turn out to be impossible because 
of licensing issues.
Note that mfold does not entirely solve the problem, since it operates on 
a single sequence, it does not search for a structure composed of two 
strands. I guess this is also true for ViennaRNA.
We (me and our user) had tried to use mfold (with as input a sequence 
composed of the mRNA, a poly-T linker and the small RNA), but the program 
crashed with error message "Cannot get Fill". Maybe the sequence had 
something unusual.

	Regards,
	Guy Bottu,
	BEN
 

From atorrano at lsi.upc.edu  Fri Jun  3 05:15:10 2005
From: atorrano at lsi.upc.edu (Alexis Torrano Martinez)
Date: Fri, 3 Jun 2005 11:15:10 +0200 (MET DST)
Subject: [EMBOSS] external and app
Message-ID: <7479297835atorrano@lsi.upc.es>


Hello

I am trying to execute hmmsearch from EMBOSS. This way I want to have
a kind of wrapper around the databases and retrieval apps. 

 
 DB Pfam [
 	method: "app"
 	comment: "Pfam with HMMER indexing"
 	app: "$HMMERBIN/hmmsearch $EMBOSS_DATA/pfam/Pfam_ls %s"
 	]

That is my DB specification for EMBOSS. How should I run seqret 
to execute hmmsearch properly? 

 seqret Pfam:$HOME/soft/hmmer/last/tutorial/7LES_DROME


The following error was unexpected:

Error: Unable to read sequence
'Pfam:/usr/usuaris/it/inb/soft/hmmer/last/tutorial/7LES_DROME'

As the tutorial says, if you specify external, %s receives as its value the
second field of the query (the ID from seqret DB:ID).

Is there a way to call hmmsearch from EMBOSS?

           Many thanks.

         Regards.

    Alexis Torrano.     


--
-----------------------------------------------------
Alexis Torrano Martinez

Instituto Nacional de Bioinformatica (INB) Nodo
Computacional GNHC-2
UPC-CIRI
c/. Jordi Girona 1-3
Modul C6-E201           Tel.   : 934 011 650
E-08034 Barcelona       Fax    : 934 017 014
Catalunya (Spain)       e-mail : atorrano at lsi.upc.edu
-----------------------------------------------------


From gbottu at ben.vub.ac.be  Fri Jun  3 06:03:18 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 3 Jun 2005 12:03:18 +0200
Subject: [EMBOSS] external and app
In-Reply-To: <7479297835atorrano@lsi.upc.es>
References: <7479297835atorrano@lsi.upc.es>
Message-ID: <20050603100318.GA24538@bigben.ulb.ac.be>

On Fri, Jun 03, 2005 at 11:15:10AM +0200, Alexis Torrano Martinez wrote:
> I am trying to execute hmmsearch from EMBOSS. This way I want to have
> a kind of wrapper around the databases and retrieval apps. 
> 
>  
>  DB Pfam [
>  	method: "app"
>  	comment: "Pfam with HMMER indexing"
>  	app: "$HMMERBIN/hmmsearch $EMBOSS_DATA/pfam/Pfam_ls %s"
>  	]

	Dear Alexis,

Your problem is almost certainly that the program defined as "app" 
must return a sequence on standard output, so that EMBOSS can read it, 
and this is not what hmmsearch does. Furthermore, hmmsearch searches an HMM 
against a databank of sequences; you seem to want to search a sequence 
against a databank of HMMs (Pfam_ls), for which you need hmmpfam. It may 
be a good idea to install the Embassadir HMMER. Note however that 
ehmmpfam needs the user to specify where the databank is. At the BEN site 
I have "hacked" the program a little so that it uses Pfam_ls by 
default (and still lets the user choose an alternative). If you are 
interested I can send you a mail with a "how to".
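
To make the first point concrete: whatever command is configured under
"app" has to print a sequence entry on standard output. A minimal Python
sketch of such a command follows; the script, its in-memory lookup table
and the entry shown are purely hypothetical stand-ins for a real retrieval
program.

```python
# Sketch of a program usable behind an "app"-style database definition:
# it takes an entry ID and prints that entry as FASTA on standard output.
# The lookup table is a hypothetical stand-in for a real retrieval step.
import sys

ENTRIES = {"DEMO1": ("example entry", "MKTAYIAKQRQISFVKSHFSRQ")}


def fasta_record(entry_id, entries=ENTRIES):
    """Return the entry as FASTA text, or None if the ID is unknown."""
    hit = entries.get(entry_id.upper())
    if hit is None:
        return None
    description, sequence = hit
    return ">{} {}\n{}\n".format(entry_id.upper(), description, sequence)


def main(argv):
    """Write the requested entry to stdout; return a process exit code."""
    if len(argv) != 2:
        sys.stderr.write("usage: getentry.py ID\n")
        return 2
    record = fasta_record(argv[1])
    if record is None:
        sys.stderr.write("no such entry: {}\n".format(argv[1]))
        return 1
    sys.stdout.write(record)
    return 0


# Only act when at least one argument was given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

The important part is that only the sequence entry goes to standard
output; diagnostics go to standard error so they cannot be mistaken for
sequence data.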

	Guy Bottu,
	Belgian EMBnet Node


From pmr at ebi.ac.uk  Fri Jun  3 06:08:11 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 3 Jun 2005 11:08:11 +0100 (BST)
Subject: [EMBOSS] use water/matcher to find where RNA hybridizes
In-Reply-To: <20050603081741.GA23810@bigben.ulb.ac.be>
References: <20050602100954.GA14063@bigben.ulb.ac.be>
    <20050602170831.GW44956@iib.unsam.edu.ar>
    <20050603081741.GA23810@bigben.ulb.ac.be>
Message-ID: <1543.198.161.30.152.1117793291.squirrel@webmail.ebi.ac.uk>

Dear Guy,

> Thanks for your replies. It is however still not clear to me where the '.'
> come from. I thought the EMBOSS "pair" output would put a '|' for
> identities and a ':' for similarities (score positive). Maybe the program
> is fooled and seriously perturbed by a matrix that assigns a negative
> score to identical base pairs.

I believe it is perturbed by the zero score for identical base pairs. This
makes it unable to find a consensus character for the alignment, and so
the "no consensus found" '.' character appears in the output.

Making the output format understand your non-identical matching is an
interesting challenge. I will look into it a little more.

regards,

Peter


From Marc.Logghe at devgen.com  Fri Jun  3 06:23:05 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Fri, 3 Jun 2005 12:23:05 +0200
Subject: [EMBOSS] external and app
Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E802@ANTARESIA.be.devgen.com>

Hi,
Just wondering, what happens if you use entret instead of seqret?
EMBOSS is supposed to just return the 'sequence' (in this case pfam
result), unaltered, unparsed. When you use seqret, EMBOSS will parse the
output and try to make a sequence out of it.
HTH,
Marc


> -----Original Message-----
> From: owner-emboss at hgmp.mrc.ac.uk 
> [mailto:owner-emboss at hgmp.mrc.ac.uk] On Behalf Of Guy Bottu
> Sent: Friday, June 03, 2005 12:03 PM
> To: Alexis Torrano Martinez; emboss at embnet.org
> Subject: Re: [EMBOSS] external and app
> 
> On Fri, Jun 03, 2005 at 11:15:10AM +0200, Alexis Torrano 
> Martinez wrote:
> > I am trying to execute hmmsearch from EMBOSS. This way I 
> want to have 
> > a kind of wrap over the DDBB and retrieval apps.
> > 
> >  
> >  DB Pfam [
> >  	method: "app"
> >  	comment: "Pfam with HMMER indexing"
> >  	app: "$HMMERBIN/hmmsearch $EMBOSS_DATA/pfam/Pfam_ls %s"
> >  	]
> 
> 	Dear Alexis,
> 
> Your problem is as good as certain that the program defined as "app" 
> should return a sequence to standard output, so that EMBOSS 
> can take it. 
> And this is not what hmmsearch does. Furthermore, hmmsearch 
> searches a HMM against a databank of sequences ; you seem to 
> want to search a sequence against a databank of HMM's 
> (Pfam_ls), for which you need hmmpfam. It is maybe a good 
> idea to install the Embassadir HMMER. Note however that 
> ehmmpfam needs the user to specify where the databank is. At 
> the BEN site I have a little bit "hacked" the program so that 
> it uses Pfam_ls by default (and still lets the user choose an 
> alternative). If you are interested I can send you a mail 
> with "how to".
> 
> 	Guy Bottu,
> 	Belgian EMBnet Node
> 
> 


From pmr at ebi.ac.uk  Fri Jun  3 06:49:24 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 3 Jun 2005 11:49:24 +0100 (BST)
Subject: [EMBOSS] external and app
In-Reply-To: 
     <0C528E3670D8CE4B8E013F6749231AA606E802@ANTARESIA.be.devgen.com>
References: 
    <0C528E3670D8CE4B8E013F6749231AA606E802@ANTARESIA.be.devgen.com>
Message-ID: <1830.198.161.30.152.1117795764.squirrel@webmail.ebi.ac.uk>

Hi Marc,

> Just wondering, what happens if you use entret instead of seqret?
> EMBOSS is supposed to just return the 'sequence' (in this case pfam
> result), unaltered, unparsed. When you use seqret, EMBOSS will parse the
> output and try to make a sequence out of it.

Entret has to read the input as a sequence, and then returns the full text.

So entret will fail where seqret fails.

regards,

Peter


From jtk at cmp.uea.ac.uk  Fri Jun  3 08:53:35 2005
From: jtk at cmp.uea.ac.uk (Jan T. Kim)
Date: Fri, 3 Jun 2005 13:53:35 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
Message-ID: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk>

Dear EMBOSSers,

is it possible to read both input sequences to a pairwise alignment
from one input stream?

With the test input file attached, the command

    water -asequence fasta::x.fasta:seq1 -bsequence fasta::x.fasta:seq2 -outfile stdout -auto

runs as I expect, but the command

    cat x.fasta | water -asequence fasta::stdin:seq1 -bsequence fasta::stdin:seq2 -outfile stdout -auto

gives

   EMBOSS An error in ajfile.c at line 1926:
Error reading from file 'stdin'

It may well be that water consumes the entire input stream on getting the
first sequence, thus rendering itself unable to acquire the second one.

Is there a solution to this? I would really like to avoid the mess of
temporary files and run water in a clean pipe (pun intended  ;-)  )

Best regards & thanks in advance, Jan
-- 
 +- Jan T. Kim -------------------------------------------------------+
 |    *NEW*    email: jtk at cmp.uea.ac.uk                               |
 |    *NEW*    WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*
-------------- next part --------------
> seq1
accaacc
> seq2
acgagcc

From simon.andrews at bbsrc.ac.uk  Fri Jun  3 08:16:58 2005
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 3 Jun 2005 13:16:58 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
In-Reply-To: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk>
References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk>
Message-ID: <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk>


On 3 Jun 2005, at 13:53, Jan T. Kim wrote:

> Dear EMBOSSers,
>
> is it possible to read both input sequences to a pairwise alignment
> from one input stream?

I spent a while trying to figure this out a few months back.  In the 
end the best solution I came up with was to use the asis: sequence 
type.  This allows you to do:

water -auto asis:aaaa asis:ataa stdout

which avoids the need for messing with the file system.  I seem to 
remember I found a way to set names for the sequences as well, but 
can't find that right now.

As long as you make sure you don't pass your command through a shell 
when you launch this from a script then it actually scales pretty well 
to quite large sequences.

Hope this helps

Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From pmr at ebi.ac.uk  Fri Jun  3 10:09:03 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 3 Jun 2005 15:09:03 +0100 (BST)
Subject: [EMBOSS] Reading Two Sequences from stdin with water
In-Reply-To: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk>
References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk>
Message-ID: <1977.198.161.30.152.1117807743.squirrel@webmail.ebi.ac.uk>

Jan T. Kim writes:
> is it possible to read both input sequences to a pairwise alignment
> from one input stream?
>
>     cat x.fasta | water -asequence fasta::stdin:seq1 -bsequence
> fasta::stdin:seq2 -outfile stdout -auto
>
> gives
>
>    EMBOSS An error in ajfile.c at line 1926:
> Error reading from file 'stdin'
>
> It may well be that water consumes the entire input stream on getting the
> first sequence, thus rendering itself unable to acquire the second one.
>
> Is there a solution to this? I would really like to avoid the mess of
> temporary files and run water in a clean pipe (pun intended  ;-)  )

EMBOSS will only cleanly read stdin as one input. We should probably trap
that internally and give an error if we find stdin opening again. I wonder
whether there is any useful way to share the stdin filebuffer. Hmmmm... in
the early days of EMBOSS we decided not to allow it, but it could be worth
a try. You would still be in trouble if you tried to read the second
sequence first though.

Assuming your x.fasta file has only seq1 and seq2 in that order, reading
seq1 will continue until the first line of seq2 is reached. By then it
would be too late for seq2 to be read cleanly.

At least you have fasta:: specified - with no specified format, EMBOSS has
to read a long way into the input just to check whether it is really GCG
format.

As for the asis format, I suppose an EMBOSS utility that reads x.fasta and
outputs asis::ctagtacgatgcgatcg asis::tgatcgatggctacgtagc would be useful
to you - then you could put `sillyname x.fasta` in your command line... at
least until the command line gets too long. Hard to preserve the ID and
description of the sequences though.
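
A rough Python sketch of such a helper is below. It reads FASTA text and
prints one asis:: USA per record, which could then be substituted into the
water command line. The script name and function name are invented here,
and as noted the IDs and descriptions are simply dropped.

```python
# Sketch of the "sillyname" helper: FASTA in, one asis:: USA per record out.
# IDs and descriptions are discarded; names are hypothetical.
import sys


def fasta_to_asis(text):
    """Return a list of 'asis::SEQUENCE' strings, one per FASTA record."""
    sequences = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = []
            sequences.append(current)
        elif current is not None:
            current.append(line)
    return ["asis::" + "".join(parts) for parts in sequences]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python fasta2asis.py x.fasta
    with open(sys.argv[1]) as handle:
        print(" ".join(fasta_to_asis(handle.read())))
```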

"If you think water is pure, just remember what fish do in it."

Hope that helps,

Peter


From jtk at cmp.uea.ac.uk  Fri Jun  3 11:40:31 2005
From: jtk at cmp.uea.ac.uk (Jan T. Kim)
Date: Fri, 3 Jun 2005 16:40:31 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
In-Reply-To: <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk>
References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk>
Message-ID: <20050603154031.GE25735@jtkpc.cmp.uea.ac.uk>

On Fri, Jun 03, 2005 at 01:16:58PM +0100, simon andrews wrote:
> 
> On 3 Jun 2005, at 13:53, Jan T. Kim wrote:
> 
> >Dear EMBOSSers,
> >
> >is it possible to read both input sequences to a pairwise alignment
> >from one input stream?
> 
> I spent a while trying to figure this out a few months back.  In the 
> end the best solution I came up with was to use the asis: sequence 
> type.  This allows you to do:
> 
> water -auto asis:aaaa asis:ataa stdout
> 
> which avoids the need for messing with the file system.  I seem to 
> remember I found a way to set names for the sequences as well, but 
> can't find that right now.

That's a good idea which I hadn't thought of. Thanks for that. I don't
need any names, other than for purposes of identifying the sequence
within a multisequence file, which is not necessary with this solution.

> As long as you make sure you don't pass your command through a shell 
> when you launch this from a script then it actually scales pretty well 
> to quite large sequences.

Hmm... isn't there any OS specific limitation to the length of arguments?
But anyway, this is not an issue for me in my case, where sequence
length does not exceed a few hundred symbols.

Best regards, Jan
-- 
 +- Jan T. Kim -------------------------------------------------------+
 |    *NEW*    email: jtk at cmp.uea.ac.uk                               |
 |    *NEW*    WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*


From simon.andrews at bbsrc.ac.uk  Fri Jun  3 10:53:17 2005
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 3 Jun 2005 15:53:17 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
In-Reply-To: <20050603154031.GE25735@jtkpc.cmp.uea.ac.uk>
References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk> <20050603154031.GE25735@jtkpc.cmp.uea.ac.uk>
Message-ID: <297ae8156db03f61d2deb2e786d3bf10@bbsrc.ac.uk>


On 3 Jun 2005, at 16:40, Jan T. Kim wrote:

> On Fri, Jun 03, 2005 at 01:16:58PM +0100, simon andrews wrote:
>> As long as you make sure you don't pass your command through a shell
>> when you launch this from a script then it actually scales pretty well
>> to quite large sequences.
>
> Hmm... isn't there any OS specific limitation to the length of 
> arguments?
> But anyway, this is not an issue for me in my case, where sequence
> length does not exceed a few hundred symbols.

The only limit is imposed when the command is passed through a shell, 
and is then dependent on the shell you're using.  If you can call the 
program without going through a shell then there should be no limit 
(beyond normal OS memory limits).

The method for doing this varies with the language you're writing the 
script in, but for example in Perl:

system("water -auto asis:gatc asis:gatc stdout");

would pass the arguments through a shell, whereas

system("water", "-auto", "asis:gatc", "asis:gatc", "stdout");

would not.
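
The same distinction in Python, as a sketch: passing a list to
subprocess.run starts the program directly, with no shell in between. The
helper names are made up for this example. Note that the operating system
usually still enforces its own cap on the total size of the arguments
(ARG_MAX on POSIX systems), so very long sequences can fail either way.

```python
# Sketch: run water on two literal sequences without going through a shell.
# Helper names are hypothetical; water must be on PATH for run_water to work.
import os
import subprocess


def water_argv(seq_a, seq_b):
    """Build the argument list for water using asis: inputs."""
    return ["water", "-auto", "asis:" + seq_a, "asis:" + seq_b, "stdout"]


def run_water(seq_a, seq_b):
    """Run water directly (list form, no shell) and return its output."""
    result = subprocess.run(
        water_argv(seq_a, seq_b),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def arg_max():
    """Return the OS limit on argument size in bytes, or None if unknown."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return None
```

Calling run_water("gatc", "gatc") corresponds to the second Perl form
above; each list element reaches water as one argument, untouched by any
shell quoting rules.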

Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From andrew.warry at bbsrc.ac.uk  Fri Jun  3 11:23:57 2005
From: andrew.warry at bbsrc.ac.uk (andrew warry (BITS))
Date: Fri, 3 Jun 2005 16:23:57 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
Message-ID: <3AED5B0556B73F4A9B556F43384F5C8501857BEC@bitse2knas1.bits.bbsrc.reserved>


> Is there a solution to this? I would really like to avoid the mess of
> temporary files and run water in a clean pipe (pun intended  ;-)  )

Hi
How about :

nthseq x.fasta -number 2 -stdout -auto | water -aseq stdin -bseq x.fasta -stdout -auto

It isn't very neat and does a redundant comparison but it does the job!


Andrew

----------------------------------------------------------------------- 
ANDREW WARRY 
Computational Molecular Biology Support 
BBSRC Bioscience IT services 
West Common                                      
Harpenden                                        
HERTS AL5 2JE
tel: (01582) 714904
fax: (01582) 714901
andrew.warry at bbsrc.ac.uk      
----------------------------------------------------------------------- 



From simon.andrews at bbsrc.ac.uk  Fri Jun  3 11:32:34 2005
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 3 Jun 2005 16:32:34 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
In-Reply-To: <3AED5B0556B73F4A9B556F43384F5C8501857BEC@bitse2knas1.bits.bbsrc.reserved>
References: <3AED5B0556B73F4A9B556F43384F5C8501857BEC@bitse2knas1.bits.bbsrc.reserved>
Message-ID: <53838984cac0240ba7aefe6d33f7810d@bbsrc.ac.uk>


On 3 Jun 2005, at 16:23, andrew warry ((BITS)) wrote:

>
>> Is there a solution to this? I would really like to avoid the mess of
>> temporary files and run water in a clean pipe (pun intended  ;-)  )
>
> Hi
> How about :
>
> nthseq x.fasta -number 2 -stdout -auto | water -aseq stdin -bseq 
> x.fasta
> -stdout -auto
>
> It isn't very neat and does a redundant comparison but it does the job!

But x.fasta still has to appear on the filesystem.  You can't run this 
cleanly in a pipe.

Simon.


From golharam at umdnj.edu  Fri Jun  3 10:57:18 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 03 Jun 2005 10:57:18 -0400
Subject: [EMBOSS] Man pages
Message-ID: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1>

Hi all,

I recently noticed there aren't man pages installed with emboss, but I
thought there were in the past.  Are there man pages available?  If so,
where/how do I get them?

-----
Ryan Golhar
Computational Biologist
The Informatics Institute at
The University of Medicine & Dentistry of NJ

Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam at umdnj.edu


From jtk at cmp.uea.ac.uk  Fri Jun  3 13:18:01 2005
From: jtk at cmp.uea.ac.uk (Jan T. Kim)
Date: Fri, 3 Jun 2005 18:18:01 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
In-Reply-To: <1977.198.161.30.152.1117807743.squirrel@webmail.ebi.ac.uk>
References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> <1977.198.161.30.152.1117807743.squirrel@webmail.ebi.ac.uk>
Message-ID: <20050603171801.GF25735@jtkpc.cmp.uea.ac.uk>

On Fri, Jun 03, 2005 at 03:09:03PM +0100, pmr at ebi.ac.uk wrote:
> Jan T. Kim writes:
> > is it possible to read both input sequences to a pairwise alignment
> > from one input stream?
> >
> >     cat x.fasta | water -asequence fasta::stdin:seq1 -bsequence
> > fasta::stdin:seq2 -outfile stdout -auto
> >
> > gives
> >
> >    EMBOSS An error in ajfile.c at line 1926:
> > Error reading from file 'stdin'
> >
> > It may well be that water consumes the entire input stream on getting the
> > first sequence, thus rendering itself unable to acquire the second one.
> >
> > Is there a solution to this? I would really like to avoid the mess of
> > temporary files and run water in a clean pipe (pun intended  ;-)  )
> 
> EMBOSS will only cleanly read stdin as one input. We should probably trap
> that internally and give an error if we find stdin opening again. I wonder
> whether there is any useful way to share the stdin filebuffer. Hmmmm... in
> the early days of EMBOSS we decided not to allow it, but it could be worth
> a try. You would still be in trouble if you tried to read the second
> sequence first though.

Conceptually, this could be cleanly handled (which is why I tried in
the first place), by having the function for obtaining the input sequences
determine the source files in a first pass of the list of sources, and
then obtain all requested sequences that come from the same file in one
go through that file. This could be applied to the standard input just
as to any other file.

However, if the current code acquires the two sequences one after the
other and independently of each other, it will require a possibly
non-trivial rewrite to change that -- likely, the API for obtaining a
sequence specified by a USA would have to be extended such that multiple
sequences can be obtained from one file in one pass through that file,
and some functions to group lists of USAs into sublists of USAs that
refer to the same file would have to be provided.
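
As a rough illustration of the grouping step, here is a sketch in Python.
It is not EMBOSS code: the helper names split_usa and group_by_file are
hypothetical, and only the simplified form format::file:entry used in
this thread is handled, not the full USA syntax.

```python
from collections import defaultdict


def split_usa(usa):
    """Split a simplified USA 'format::file:entry' into its three parts.

    Assumption: real USAs have more forms (db:entry, list files, asis::)
    that this sketch does not try to cover.
    """
    fmt, sep, rest = usa.partition("::")
    if not sep:
        # No explicit format prefix was given.
        fmt, rest = "", usa
    filename, _, entry = rest.partition(":")
    return fmt, filename, entry


def group_by_file(usas):
    """Group USAs by (format, file), remembering their original positions."""
    groups = defaultdict(list)
    for index, usa in enumerate(usas):
        fmt, filename, entry = split_usa(usa)
        groups[(fmt, filename)].append((index, entry))
    return dict(groups)
```

With the two USAs from the water command quoted above, both entries end
up in a single group for stdin, so a reader could in principle fetch
them in one pass through that stream.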

> Assuming your x.fasta file has only seq1 and seq2 in that order, reading
> seq1 will continue until the first line of seq2 is reached. By then it
> would be too late for seq2 to be read cleanly.

Well, the approach outlined above does not have that limitation, and
it also works for interleaved sequence formats. But if the EMBOSS
internals are as I assume above, it's clear to me that this is something
for the long-term wishlist.

> At least you have fasta:: specified - with no specified format, EMBOSS has
> to read a long way into the input just to check whether it is really GCG
> format.

Yes, heuristic format determination and non-seekable inputs don't mix
too well generally...

> As for the asis format, I suppose an EMBOSS utility that reads x.fasta and
> outputs asis::ctagtacgatgcgatcg asis::tgatcgatggctacgtagc would be useful
> to you - then you could put `sillyname x.fasta` in your command line... at
> least until the command line gets too long. Hard to preserve the ID and
> description of the sequences though.

Yes -- in my case, I have the sequences available within a Python script
anyway, so the asis approach works fine for me (even with a popen
facility that goes through a shell -- I'll have to check how to eliminate
that for future occasions where sequences may be too long for the
command line, though).
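
For what it's worth, a minimal sketch of the asis approach without a
shell, assuming Python's standard subprocess module and a water
executable on the PATH. The function names are made up for
illustration; the options are the ones from the command line quoted
above. Passing an argument list avoids shell quoting, though the
operating system's own limit on argument length presumably still
applies.

```python
import subprocess


def water_command(seq_a, seq_b):
    """Build the argument list for water with two literal (asis) sequences."""
    return [
        "water",
        "-asequence", "asis::" + seq_a,
        "-bsequence", "asis::" + seq_b,
        "-outfile", "stdout",
        "-auto",
    ]


def run_water(seq_a, seq_b):
    """Run water directly (no shell in between) and return its output text."""
    result = subprocess.run(
        water_command(seq_a, seq_b),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if water exits with an error
    )
    return result.stdout
```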

> "If you think water is pure, just remember what fish do in it."

I like to boil my water, adding an all-natural disinfectant known as
"coffee" for this reason...  ;-)

Best regards, Jan
-- 
 +- Jan T. Kim -------------------------------------------------------+
 |    *NEW*    email: jtk at cmp.uea.ac.uk                               |
 |    *NEW*    WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*


From robin at hms.harvard.edu  Fri Jun  3 12:30:33 2005
From: robin at hms.harvard.edu (Robin Colgrove)
Date: Fri, 3 Jun 2005 12:30:33 -0400
Subject: [EMBOSS] Man pages in multiple languages?
In-Reply-To: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1>
References: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1>
Message-ID: <f4b57570c2458cbe48e5a2fd1468a787@hms.harvard.edu>


Hello all,

are there EMBOSS man pages in other languages than English?

Mandarin and Spanish in particular would help around here.

thanks

robin colgrove
Harvard Medical School


From pmr at ebi.ac.uk  Fri Jun  3 13:14:08 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 3 Jun 2005 18:14:08 +0100 (BST)
Subject: [EMBOSS] Man pages in multiple languages?
In-Reply-To: <f4b57570c2458cbe48e5a2fd1468a787@hms.harvard.edu>
References: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1>
    <f4b57570c2458cbe48e5a2fd1468a787@hms.harvard.edu>
Message-ID: <2398.198.161.30.152.1117818848.squirrel@webmail.ebi.ac.uk>

Hi Robin,

> are there EMBOSS man pages in other languages than English?
>
> Mandarin and Spanish in particular would help around here.

We don't have man pages exactly. We have a text version of the online
documentation, with the "tfm" program to display to the screen.

To find out why it is called tfm, you can use the command:

tfm tfm

Of course, it prints "The F(antastic) Manual" as in "RTFM"

For other languages, there may be something out there. We are aware of a
Japanese user group that has translated much of the EMBOSS materials. I am
sure there are Mandarin speakers who could create a Mandarin version -
though on the first ever EMBOSS course (in Beijing) there was a vote
against creating a Mandarin version of the commandline.

Hope this helps,

Peter Rice


From luojc at plum.lsc.pku.edu.cn  Fri Jun  3 21:15:37 2005
From: luojc at plum.lsc.pku.edu.cn (Jingchu Luo)
Date: Sat, 4 Jun 2005 09:15:37 +0800 (CST)
Subject: [EMBOSS] Man pages in multiple languages?
In-Reply-To: <2398.198.161.30.152.1117818848.squirrel@webmail.ebi.ac.uk>
Message-ID: <Pine.LNX.4.44.0506040817320.25760-100000@plum.lsc.pku.edu.cn>

> I am sure there are Mandarin speakers who could create a Mandarin
> version - though on the first ever EMBOSS course (in Beijing) there was
> a vote against creating a Mandarin version of the commandline.

We were running an EMBnet bioinformatics workshop in April 1999. Peter 
gave a talk about EMBOSS. It might be useful to have a user manual and/or 
documentation in Chinese for the Chinese user group. We'll see if anyone 
in mainland China has been working on this already. 

Jingchu
-------
Jingchu Luo
Centre of Bioinformatics
Peking University
Beijing 100871, China
Tel: 86-10-6275-7281
Fax: 86-10-6275-9001
Email: luojc at pku.edu.cn
URL: http://www.cbi.pku.edu.cn 


From d.gatherer at vir.gla.ac.uk  Wed Jun 15 06:31:33 2005
From: d.gatherer at vir.gla.ac.uk (Derek Gatherer)
Date: Wed, 15 Jun 2005 11:31:33 +0100
Subject: [EMBOSS] seqret options
Message-ID: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk>

Dear EMBOSSers

I'm trying to write a pipeline to take a load of paired, aligned homologues 
from 2 species and submit them sequentially to the yn00 application from 
the well known PAML package.  PAML's applications all take PHYLIP 
format.  I can easily make this by looping over:

seqret -auto -osformat phylip infile -out outfile

However, PAML requires that the flag "I" be placed on the top line of the 
phylip format to indicate interleaved, eg:

  2 663 I
c-barf1  ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
barf1     ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC

           CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
           CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT

rather than the standard phylip format, given by seqret:

  2 663
c-barf1   ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
barf1     ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC

           CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
           CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT

I could write a script to open each seqret output file and add this 
character to the top line of each, but before I dive into this, I'd like to 
know if there is any flag I can add to seqret to get the "I" added 
automatically.

Failing that, PAML takes the other, non-interleaved phylip format 
("sequential") by default, and that would not require any flag 
insertion.  Seqret also can produce this (using -osformat phylip3):

1 663 YF
c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
           CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
           ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA

but then PAML won't read it because it doesn't like the YF flags inserted 
by seqret!!

So I either have to write a script to remove the flags from the sequential
format or insert them in the interleaved one, unless seqret has a solution.
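
The scripted workaround could look roughly like this in Python. This is
a sketch only: fix_phylip_header is a hypothetical helper, and it
assumes the first line holds the sequence count and length followed by
optional flags, as in the examples above.

```python
def fix_phylip_header(text, interleaved):
    """Rewrite the first line of a PHYLIP file.

    Keeps only the sequence count and length, drops any trailing flags
    (such as YF), and appends "I" when the data are interleaved.
    """
    lines = text.splitlines()
    fields = lines[0].split()
    header = fields[:2]  # sequence count and alignment length
    if interleaved:
        header.append("I")
    lines[0] = "  " + " ".join(header)
    return "\n".join(lines) + "\n"
```

Calling it with interleaved=True on seqret's phylip output adds the "I",
and with interleaved=False on the phylip3 output it strips the YF.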

All assistance gratefully appreciated
Derek


From David.Bauer at SCHERING.DE  Wed Jun 15 07:19:55 2005
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Wed, 15 Jun 2005 13:19:55 +0200
Subject: Antwort: [EMBOSS] seqret options
Message-ID: <OFA27F3B1C.3EC36BC7-ONC1257021.003CD8F5-C1257021.003E3FCC@schering.net>


Hi Derek,

you can easily change this in the source code.
The sequence output formats are defined in ajax/ajseqwrite.c
In the function seqWritePhylip3 you find a line:
ajFmtPrintF(outseq->File, "1 %d YF\n", ilen);
Here you can just delete the YF and recompile emboss.

David.


Derek Gatherer <d.gatherer at vir.gla.ac.uk>
Sent by: owner-emboss at hgmp.mrc.ac.uk
15.06.2005 12:31
To:      emboss at embnet.org
Cc:
Subject: [EMBOSS] seqret options

Dear EMBOSSers

I'm trying to write a pipeline to take a load of paired, aligned homologues

from 2 species and submit them sequentially to the yn00 application from
the well known PAML package.  PAML's applications all take PHYLIP
format.  I can easily make this by looping over:

seqret -auto -osformat phylip infile -out outfile

However, PAML requires that the flag "I" be placed on the top line of the
phylip format to indicate interleaved, eg:

  2 663 I
c-barf1  ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
barf1     ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC

           CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
           CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT

rather than the standard phylip format, given by seqret:

  2 663
c-barf1   ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
barf1     ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC

           CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
           CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT

I could write a script to open each seqret output file and add this
character to the top line of each, but before I dive into this, I'd like to

know if there is any flag I can add to seqret to get the "I" added
automatically.

Failing that, PAML takes the other, non-interleaved phylip format
("sequential") by default, and that would not require any flag
insertion.  Seqret also can produce this (using -osformat phylip3):

1 663 YF
c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
           CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
           ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA

but then PAML won't read it because it doesn't like the YF flags inserted
by seqret!!

So I either have to script to remove flags from sequential or insert them
in interleaved, unless seqret has a solution.

All assistance gratefully appreciated
Derek


From pmr at ebi.ac.uk  Wed Jun 15 08:23:48 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 15 Jun 2005 13:23:48 +0100
Subject: [EMBOSS] seqret options
In-Reply-To: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk>
References: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk>
Message-ID: <42B01DD4.8050303@ebi.ac.uk>

Derek Gatherer wrote:

> Dear EMBOSSers
> 
> I'm trying to write a pipeline to take a load of paired, aligned 
> homologues from 2 species and submit them sequentially to the yn00 
> application from the well known PAML package.  PAML's applications all 
> take PHYLIP format.

> Failing that, PAML takes the other, non-interleaved phylip format 
> ("sequential") by default, and that would not require any flag 
> insertion. 

Last time I worked through the PHYLIP formats (for EMBOSS 2.10.0) I found 
Phylip had changed the format it used.

One change was that I removed the YF from phylip3 format because phylip was no 
longer using it - so updating to EMBOSS 2.10.0 will solve your non-interleaved 
format problem (and David Bauer's code fix is exactly what you need).

Any more feedback on the variations of phylip formats that other packages use 
would be a great help!

We will be releasing the PHYLIP 3.6 integration (as a PHYLIPNEW EMBASSY 
package) soon and expect to see more use of phylogenetics packages with EMBOSS.

regards,

Peter Rice


From d.gatherer at vir.gla.ac.uk  Wed Jun 15 08:44:46 2005
From: d.gatherer at vir.gla.ac.uk (Derek Gatherer)
Date: Wed, 15 Jun 2005 13:44:46 +0100
Subject: [EMBOSS] seqret options
In-Reply-To: <42B01DD4.8050303@ebi.ac.uk>
References: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk>
 <42B01DD4.8050303@ebi.ac.uk>
Message-ID: <6.2.1.2.1.20050615134121.02addff8@lenzie.gla.ac.uk>

I do have 2.10.0:

[gath01d at gamma seqs]$ seqret -osformat phylip3 barf1_both.seq
Reads and writes (returns) sequences
Output sequence [c-barf1.phylip3]: barf1.phylip3
[gath01d at gamma seqs]$ more barf1.phylip3
1 663 YF
c-barf1ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
           CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
           ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA
[gath01d at gamma seqs]$ embossversion
Writes the current EMBOSS version number
2.10.0

Anyway, I know how to do the code fix now, so thanks to all.

Cheers
Derek

At 13:23 15/06/2005, you wrote:
>Derek Gatherer wrote:
>
>>Dear EMBOSSers
>>I'm trying to write a pipeline to take a load of paired, aligned 
>>homologues from 2 species and submit them sequentially to the yn00 
>>application from the well known PAML package.  PAML's applications all 
>>take PHYLIP format.
>
>>Failing that, PAML takes the other, non-interleaved phylip format 
>>("sequential") by default, and that would not require any flag insertion.
>
>Last time I worked through the PHYLIP formats (for EMBOSS 2.10.0) I found 
>Phylip had changed the format it used.
>
>One change was that I removed the YF from phylip3 format because phylip 
>was no longer using it - so updating to EMBOSS 2.10.0 will solve your 
>non-interleaved format problem (and David Bauer's code fix is exactly what 
>you need).
>
>Any more feedback on the variations of phylip formats that other packages 
>use would be a great help!
>
>We will be releasing the PHYLIP 3.6 integration (as a PHYLIPNEW EMBASSY 
>package) soon and expect to see more use of phylogenetics packages with EMBOSS.
>
>regards,
>
>Peter Rice
>


From pmr at ebi.ac.uk  Wed Jun 15 08:49:59 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 15 Jun 2005 13:49:59 +0100
Subject: [EMBOSS] seqret options
In-Reply-To: <6.2.1.2.1.20050615134121.02addff8@lenzie.gla.ac.uk>
References: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk> <42B01DD4.8050303@ebi.ac.uk> <6.2.1.2.1.20050615134121.02addff8@lenzie.gla.ac.uk>
Message-ID: <42B023F7.7010808@ebi.ac.uk>

Derek Gatherer wrote:
> I do have 2.10.0:
> 
> [gath01d at gamma seqs]$ seqret -osformat phylip3 barf1_both.seq
> Reads and writes (returns) sequences
> Output sequence [c-barf1.phylip3]: barf1.phylip3
> [gath01d at gamma seqs]$ more barf1.phylip3
> 1 663 YF
> c-barf1ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
>           CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
>           ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA
> [gath01d at gamma seqs]$ embossversion
> Writes the current EMBOSS version number
> 2.10.0

Oops ... make that "will be in 3.0.0" in that case ... it worked for me :-)

regards,

Peter


From d.gatherer at vir.gla.ac.uk  Wed Jun 15 09:25:36 2005
From: d.gatherer at vir.gla.ac.uk (Derek Gatherer)
Date: Wed, 15 Jun 2005 14:25:36 +0100
Subject: [EMBOSS] seqret again
Message-ID: <6.2.1.2.1.20050615142150.02afda58@lenzie.gla.ac.uk>

Is this a bug?  Compare the following output from seqret when phylip and 
phylip3 are specified.  Shouldn't the first line of the phylip3 output be 
"2 546 YF" and not "1 546" ?

[gath01d at gamma EBV]$ seqret -osformat phylip seqs/balf1.both
Reads and writes (returns) sequences
Output sequence [c-balf1.phylip]: seqs/balf1.phylip
[gath01d at gamma EBV]$ more seqs/balf1.phylip
  2 546
c-balf1.seATGCAGCCAG CCAAGTCTAC CGATTCGGTG TTTGTGAGGA CCCCGGTCGA
balf1.seq ATGAGGCCAG CCAAGTCTAC AGATTCTGTG TTTGTGAGGA CCCCGGTCGA

           GGCGTGGGTC TCACCCTCGC CCCCGGACGA CAAAGTGGCA GAGACCAGCT
           GGCGTGGGTC GCGCCCTCGC CGCCGGACGA CAAGGTGGCT GAGTCCAGCT
[snip]

[gath01d at gamma EBV]$ seqret -osformat phylip3 seqs/balf1.both
Reads and writes (returns) sequences
Output sequence [c-balf1.phylip3]: seqs/balf1.phylip3
[gath01d at gamma EBV]$ more seqs/balf1.phylip3
1 546 YF
c-balf1.seATGCAGCCAG CCAAGTCTAC CGATTCGGTG TTTGTGAGGA CCCCGGTCGA
           GGCGTGGGTC TCACCCTCGC CCCCGGACGA CAAAGTGGCA GAGACCAGCT
           ACCTCCTGTT CAGGGCCCTA TACGCTGTGT TCACCCAGGA CGAGACGGAC
           CTGCCTCTAC CGGCCCTGGT CATGTGCCGG CTCCTGAAGG CCTCCCTGAG

[snip]


From pmr at ebi.ac.uk  Wed Jun 15 09:35:57 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 15 Jun 2005 14:35:57 +0100
Subject: [EMBOSS] seqret again
In-Reply-To: <6.2.1.2.1.20050615142150.02afda58@lenzie.gla.ac.uk>
References: <6.2.1.2.1.20050615142150.02afda58@lenzie.gla.ac.uk>
Message-ID: <42B02EBD.4040800@ebi.ac.uk>

Derek Gatherer wrote:

> Is this a bug?  Compare the following output from seqret when phylip and 
> phylip3 are specified.  Shouldn't the first line of the phylip3 output 
> be "2 546 YF" and not "1 546" ?
> [gath01d at gamma EBV]$ seqret -osformat phylip3 seqs/balf1.both
> Reads and writes (returns) sequences
> Output sequence [c-balf1.phylip3]: seqs/balf1.phylip3
> [gath01d at gamma EBV]$ more seqs/balf1.phylip3
> 1 546 YF
> c-balf1.seATGCAGCCAG CCAAGTCTAC CGATTCGGTG TTTGTGAGGA CCCCGGTCGA
>           GGCGTGGGTC TCACCCTCGC CCCCGGACGA CAAAGTGGCA GAGACCAGCT
>           ACCTCCTGTT CAGGGCCCTA TACGCTGTGT TCACCCAGGA CGAGACGGAC
>           CTGCCTCTAC CGGCCCTGGT CATGTGCCGG CTCCTGAAGG CCTCCCTGAG

Yes. Fixed in the next release (and in the current CVS code).

Fixed as in "2 546" without the YF.

Do any programs require the YF?

Peter


From kertib at linuxlap.hu  Wed Jun 15 10:13:44 2005
From: kertib at linuxlap.hu (Kerti Balazs Gabor)
Date: Wed, 15 Jun 2005 16:13:44 +0200
Subject: [EMBOSS] Install error (AMD64)
Message-ID: <42B03798.1060204@linuxlap.hu>

Hello!

I would like to install EMBOSS (latest version) from source. The host OS 
is Fedora Core 4 (2.6.11-1.1369_FC4 #1 Thu Jun 2 22:56:33 EDT 2005 
x86_64 x86_64 x86_64 GNU/Linux).
The configure script
$ configure --enable 64
ran cleanly, but
make
failed with this error:

/bin/sh ../libtool --tag=CC --mode=link gcc  -O2   -o aaindexextract 
aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la 
../ajax/libajax.la ../plplot/libplplot.la -lX11  -lm
mkdir .libs
gcc -O2 -o .libs/aaindexextract aaindexextract.o 
../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so 
../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm 
-Wl,--rpath -Wl,/usr/local/lib
/usr/bin/ld: cannot find -lX11
collect2: ld returned 1 exit status
make[2]: *** [aaindexextract] Error 1
make[2]: Leaving directory `/usr/src/EMBOSS-2.10.0/emboss'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/src/EMBOSS-2.10.0/emboss'
make: *** [all-recursive] Error 1
[root at localhost EMBOSS-2.10.0]#

How can I solve this? Which package(s) are needed?

Balazs


From ableasby at hgmp.mrc.ac.uk  Wed Jun 15 10:26:39 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Wed, 15 Jun 2005 15:26:39 +0100 (BST)
Subject: [EMBOSS] Install error (AMD64)
Message-ID: <200506151426.j5FEQduS029156@bromine.hgmp.mrc.ac.uk>

Dear Balazs,

You need to install the  xorg-x11-devel RPM, 'make clean' and do
the configure step again.

Also, there is no need to specify --enable-64 unless you
expect 'user space' applications to consume more than
4 GB of internal memory.

HTH

Alan Bleasby
RFCGR/HGMP (for the next month and a half)


From aengus.stewart at cancer.org.uk  Wed Jun 15 11:46:07 2005
From: aengus.stewart at cancer.org.uk (Aengus Stewart)
Date: Wed, 15 Jun 2005 16:46:07 +0100
Subject: [EMBOSS] 3.0.0
Message-ID: <42B04D3F.7020405@cancer.org.uk>


Will the ceremonial release of 3.0.0 into the wild be at ISMB?

In other words, soon? :-)


Regards
Aengus


-- 
-----------------------------------------------------------------------
Aengus Stewart
Group Leader
<GROUP NAME GOES HERE>                         Tel: +44 (0)20 7269 3679
Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK
-----------------------------------------------------------------------

This electronic message contains information  which may be privileged and
confidential.  The information is intended to be for the use of the
individual(s) or entity named above.  If you are not the intended recipient,
be aware that any disclosure, copying, distribution or use of the contents
of this information is prohibited. If you have received this electronic
message in error, please notify me by telephone or email (to the number
or address above) immediately.


From ableasby at hgmp.mrc.ac.uk  Wed Jun 15 12:44:18 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Wed, 15 Jun 2005 17:44:18 +0100 (BST)
Subject: [EMBOSS] 3.0.0
Message-ID: <200506151644.j5FGiI8T009556@bromine.hgmp.mrc.ac.uk>

Well, we always like to try to release on St Swithin's Day; that
date is normally before ISMB, but this year it isn't.

EMBOSS will feature at ISMB in all the usual places (BOSC, poster,
demo and maybe BOF) and the soon-to-be-released 3.0.0 will
certainly be mentioned there.

Alan


From golharam at umdnj.edu  Wed Jun 15 15:01:54 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 15 Jun 2005 15:01:54 -0400
Subject: [EMBOSS] EMBOSS-GUI
Message-ID: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>

Does anyone know if any work is being done on EMBOSS-GUI by Luke
McCarthy?  The web site doesn't seem to be active and looks out of date. 

If a new version isn't being worked on, I'd like to volunteer to help
maintain it for v3.0.0.  It's such a simple and clean interface.  I
haven't found anything else like it.

Ryan


From andrespinzon at gmail.com  Wed Jun 15 16:14:27 2005
From: andrespinzon at gmail.com (Andres Pinzon)
Date: Wed, 15 Jun 2005 15:14:27 -0500
Subject: [EMBOSS] EMBOSS-GUI
In-Reply-To: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>
References: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>
Message-ID: <8968fc7e0506151314772f91f0@mail.gmail.com>

2005/6/15, Ryan Golhar <golharam at umdnj.edu>:
> Does anyone know if any work is being done on EMBOSS-GUI by Luke
> McCarthy.  The web site doesn't seem to be active and out-of-date.
> 
> If a new version isn't being worked on, I'd like to volunteer to help
> maintain it for v3.0.0.  Its such a simple and clean interface.  I
> haven't found anything else like it.

If you need help maintaining it, please ask me! ;-)
I really liked that interface too.


-- 
---------
Andrés Pinzón [http://www.andrespinzon.com]   
Centro de Bioinformatica, Instituto de Biotecnologia
http://bioinf.ibun.unal.edu.co
Universidad Nacional de Colombia
tel. 3165000 ext. 16961   
GNU/Linux user number 349752
----------


From lukem at gene.pbi.nrc.ca  Wed Jun 15 15:49:23 2005
From: lukem at gene.pbi.nrc.ca (Luke McCarthy)
Date: Wed, 15 Jun 2005 13:49:23 -0600
Subject: [EMBOSS] EMBOSS-GUI
In-Reply-To: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>
References: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>
Message-ID: <1118864963.13749.8.camel@incognito.invalid>

On Wed, 2005-06-15 at 13:01, Ryan Golhar wrote:
> Does anyone know if any work is being done on EMBOSS-GUI by Luke
> McCarthy.  The web site doesn't seem to be active and out-of-date. 
> 
> If a new version isn't being worked on, I'd like to volunteer to help
> maintain it for v3.0.0.  Its such a simple and clean interface.  I
> haven't found anything else like it.

I have developed a new version and moved the code to sourceforge
(http://sourceforge.net/projects/embossgui/)  Since February, the only
remaining step has been to wrap it up in a releasable format, but I just
haven't found the time.

I had considered waiting until the 3.0.0 release of EMBOSS, but if
there's interest now I'll do my best to get it out there sooner.

Cheers,

Luke


From golharam at umdnj.edu  Thu Jun 16 11:10:49 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 16 Jun 2005 11:10:49 -0400
Subject: [EMBOSS] EMBOSS-GUI
In-Reply-To: <1118866913.13749.12.camel@incognito.invalid>
Message-ID: <002201c57285$95084090$e6028a0a@GOLHARMOBILE1>

Is the release of EMBOSS 3.0.0 around July 15th?  If so, I can wait for
embossgui until then.  If you need any help with embossgui, please let
me know.  I'd be more than happy to contribute what I can.

Ryan


-----Original Message-----
From: Luke McCarthy [mailto:lukem at gene.pbi.nrc.ca] 
Sent: Wednesday, June 15, 2005 4:22 PM
To: Ryan Golhar
Subject: Re: [EMBOSS] EMBOSS-GUI


      * (also copied to emboss at embnet.org)

On Wed, 2005-06-15 at 13:01, Ryan Golhar wrote:
> Does anyone know if any work is being done on EMBOSS-GUI by Luke 
> McCarthy.  The web site doesn't seem to be active and out-of-date.
> 
> If a new version isn't being worked on, I'd like to volunteer to help 
> maintain it for v3.0.0.  Its such a simple and clean interface.  I 
> haven't found anything else like it.

I have developed a new version and moved the code to sourceforge
(http://sourceforge.net/projects/embossgui/)  Since February, the only
remaining step has been to wrap it up in a releasable format, but I just
haven't found the time.

I had considered waiting until the 3.0.0 release of EMBOSS, but if
there's interest now I'll do my best to get it out there sooner.

Cheers,

Luke


From msarachu at biol.unlp.edu.ar  Thu Jun 16 15:41:23 2005
From: msarachu at biol.unlp.edu.ar (Martin Sarachu)
Date: Thu, 16 Jun 2005 16:41:23 -0300
Subject: [EMBOSS] Masking the : character?
Message-ID: <42B1D5E3.1000503@biol.unlp.edu.ar>

Dear list,

is there any way to mask the ':' character so it is not interpreted as a 
delimiter for DB:sequence?
I have this file

/home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf

and when I run infoseq I get this error

$ infoseq 
/home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf
Displays some simple information about sequences
Error: failed to open filename 
'/home/embtest/wProjects/test/.clustal.05.06.15'
Error: Unable to read sequence 
'/home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf'
Died: infoseq terminated: Bad value for '-sequence' and no prompt


Thanks in advance,

Martin

-- 
Martin Sarachu
msarachu at biol.unlp.edu.ar
AR.EMBnet
http://www.ar.embnet.org


From yezhiqiang at gmail.com  Sat Jun 18 05:28:16 2005
From: yezhiqiang at gmail.com (yezhiqiang at gmail.com)
Date: Sat, 18 Jun 2005 17:28:16 +0800
Subject: [EMBOSS] Masking the : character?
In-Reply-To: <42B1D5E3.1000503@biol.unlp.edu.ar>
References: <42B1D5E3.1000503@biol.unlp.edu.ar>
Message-ID: <34198fe4050618022825238622@mail.gmail.com>

I have run into this too.
Neither escaping the colon as \: nor quoting the filename solves the problem.

But why not just rename the file? It is not much bother.


2005/6/17, Martin Sarachu <msarachu at biol.unlp.edu.ar>:
> Dear list,
> 
> is there any way to mask the ':' character so it is not interpreted as a
> delimiter for DB:sequence?
> I have this file
> 
> /home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf
> 
> and when I run infoseq I get this error
> 
> $ infoseq
> /home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf
> Displays some simple information about sequences
> Error: failed to open filename
> '/home/embtest/wProjects/test/.clustal.05.06.15'
> Error: Unable to read sequence
> '/home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf'
> Died: infoseq terminated: Bad value for '-sequence' and no prompt
> 
> Thanks in advance,
> 
> Martin
> 
> --
> Martin Sarachu
> msarachu at biol.unlp.edu.ar
> AR.EMBnet
> http://www.ar.embnet.org
>


From yezhiqiang at gmail.com  Sat Jun 18 05:50:50 2005
From: yezhiqiang at gmail.com (yezhiqiang at gmail.com)
Date: Sat, 18 Jun 2005 17:50:50 +0800
Subject: [EMBOSS] Man pages
In-Reply-To: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1>
References: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1>
Message-ID: <34198fe405061802504ace851@mail.gmail.com>

EMBOSS has its own manual system: tfm

Try it like this:
wossname seqret  
tfm seqret


2005/6/3, Ryan Golhar <golharam at umdnj.edu>:
> Hi all,
> 
> I recently noticed there aren't man pages installed with emboss, but I
> thought there were in the past.  Are there man pages available?  If so,
> where/how do I get them?
> 
> -----
> Ryan Golhar
> Computational Biologist
> The Informatics Institute at
> The University of Medicine & Dentistry of NJ
> 
> Phone: 973-972-5034
> Fax: 973-972-7412
> Email: golharam at umdnj.edu
> 
>


From jrvalverde at cnb.uam.es  Mon Jun 20 04:55:20 2005
From: jrvalverde at cnb.uam.es (=?ISO-8859-15?Q?Jos=E9?= R. Valverde)
Date: Mon, 20 Jun 2005 10:55:20 +0200
Subject: [EMBOSS] Multiplatform filenames (was Re: Masking the : character?)
In-Reply-To: <34198fe4050618022825238622@mail.gmail.com>
References: <42B1D5E3.1000503@biol.unlp.edu.ar>
	<34198fe4050618022825238622@mail.gmail.com>
Message-ID: <20050620105520.736fef76.jrvalverde@cnb.uam.es>

On Sat, 18 Jun 2005 17:28:16 +0800
<yezhiqiang at gmail.com> wrote:
> I have also found this.
> and \:  or using quote cannot solve this problem.
> 
> But why not just rename your file name? It doesn't bother.
> 
> 
> 2005/6/17, Martin Sarachu <msarachu at biol.unlp.edu.ar>:
> > Dear list,
> > 
> > is there any way to mask the ':' character so it is not interpreted as a
> > delimiter for DB:sequence?

	Renaming. 
	---------

Or in other words (caution, detailed explanation follows):

    Why should anybody have a database or db. file named something\ or 
something\\\?

But the fact is that Unix filesystem semantics allow that. So there is
no easy way to avoid the ':' problem, as one must accommodate this,
especially since '::' is also meaningful to EMBOSS. One would have to
introduce a special escape metacharacter or a quotation method, and it
would also have to integrate easily with shells... meaning that it 
should not be pre-processed by the shell (e.g. 'file:name' would come out
of the shell as file:name, so the user would need to type "'file:name'" or
some other such horrible combination to escape shell quotations too).

The problem arises because the ':' is used for historic reasons as a
carry-over from VMS where it had special meaning on pathnames. This 
does not hold on UNIX where it is a legit character (actually ANY char
but '/' and NUL is a legit character on UNIX). This is important as
EMBOSS may be used on many locales, and you don't know in advance
how a given symbol will be represented on them. Freedom comes at a 
cost.

QUICK SOLUTION
--------------
I think that for the user it is simpler to know that ':' has a special
meaning and should be avoided.

For the cases where the colon is generated automatically, it may be better
to provide a renaming script that changes the colon to something else.
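
Such a renaming script might look like the following Python sketch. The
function name and the choice of '_' as the replacement are arbitrary
assumptions, not anything EMBOSS prescribes. It renames directories as
well as files, which matters for paths like the .clustal.* one in the
original question, where the colon sits in a directory name.

```python
import os


def rename_colon_entries(directory, replacement="_"):
    """Rename every entry in `directory` whose name contains ':'.

    Returns a list of (old_name, new_name) pairs. Refuses to overwrite
    an existing entry.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        if ":" not in name:
            continue
        new_name = name.replace(":", replacement)
        source = os.path.join(directory, name)
        target = os.path.join(directory, new_name)
        if os.path.exists(target):
            raise FileExistsError(target)
        os.rename(source, target)
        renamed.append((name, new_name))
    return renamed
```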


UI 'PRO' APPROACH
-----------------
For GUI writers it is probably better to "translate" any such filenames
between the user and EMBOSS. Note the quotes around translate above: it
is not immediate. Let me explain:

	Escaping for the *command line* must be done using some character 
that is a) meaningful (but those are mostly already taken) and b) easy 
to type on a keyboard. In any case, this means that the user must be aware
of the special case, and if so, renaming is just as good a solution.

	Escaping for the GUI removes all conditions and gives you full
freedom. There are useful tricks to use special quoting/escaping chars
on GUIs (hint: look into ASCII 0-31), but translating filenames can NOT
be done transparently to the user (unless you can guarantee yours is
the only user interface they will use). Any translation will change
the filename and make it look different or even untypable on other
interfaces.

	Note that the problem still remains of distinguishing when a
pathname containing a colon is an actual filename and not a database:file
specification automatically. On a GUI you may assume a :-containing path
is a filename when you are tagging uploaded data or program generated
data, but otherwise you should be cautious, highly cautious. I.e. does
swiss:prot_human refer to the database entry or to the data the user
uploaded and called that way? Is it possible someone has called their
database 'sequencer_files' locally, and if so, how do you distinguish the 
local database of sequencer files from the user's batch of sequencer_files:*
uploaded sequences?

	Assuming you can tell, then read on:

	The trick is to create a special hidden directory in each user
directory accessed: e.g. .myGUI-names. Then for every file make a
suitably processed symlink in that subdirectory and call EMBOSS through
the symlink, sort of:

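	/* C-like pseudocode; all helper names are illustrative */
	my_gui_store_file(filename)
	{
		save(filename);
		sym = concatenate(".myGUI-names/", process(filename));
		make_symlink(filename, sym);	/* sym points at the real file */
	}

	my_gui_emboss_access_file(filename)
	{
		sym = concatenate(".myGUI-names/", process(filename));
		if (!file_exists(sym))
			make_symlink(filename, sym);
		emboss_access(sym);
	}

	process(filename)
	{
		copy = duplicate(filename);	/* keep the original name intact */
		for (p = copy; *p; p++)
			if (*p == ':')
				*p = SUB;	/* e.g. ASCII 0x1A */
		return copy;
	}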

And off you go. Why the <SUB>? You should try to substitute the colon by
something that is guaranteed to be portable. You only have either a) the
portable character set (which is all typable) or b) the control character
set (ASCII 0-31) which you may assume will be available everywhere, and
most probably not used in filenames as they are very difficult to type or
use by hand in general. From these we had better avoid NUL, BEL, BS, HT, LF,
VT, FF, CR and ESC just in case. But we still have plenty to choose from:
SUB (substitute), CAN (cancel), DLE (data link escape) have good mnemonics 
for escaping and STX (start of text) and ETX (end of text) 
for quoting, but these are only suggestions.

That is to say: in the example above we substituted : by <SUB>, because
we only care about this special case. If there were more cases, then full
escaping/quoting might be needed, and then instead we would copy the
filename into a new string and fully quote/escape. 

I suggest the substitution approach since we are doing the encoding *within* 
the file name: anything else (quoting/escaping) will introduce additional 
chars inside the filename and this will reduce the available filename length, 
hence making it less transparent and potentially dangerous (should there by
any chance be two filenames at the length limit containing an escapable
sequence and differing only in the last char).
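
For readers who prefer something runnable, the pseudocode above could be
sketched in Python as follows. This is only an illustration under stated
assumptions: the function name link_for is made up here, only colons in the
final path component are handled, and the hidden directory name is the
.myGUI-names example used earlier.

```python
import os

SUB = "\x1a"               # ASCII SUB; any rarely typed control char would do
LINK_DIR = ".myGUI-names"  # hidden directory name from the example above


def process(filename):
    """Return a copy of `filename` with every ':' replaced by SUB."""
    return filename.replace(":", SUB)


def link_for(path):
    """Create (if needed) and return a symlink whose name has no colon.

    The link lives in a hidden subdirectory next to the real file and
    points at it, so the original name stays available to other tools.
    """
    directory, name = os.path.split(os.path.abspath(path))
    link_dir = os.path.join(directory, LINK_DIR)
    os.makedirs(link_dir, exist_ok=True)
    link = os.path.join(link_dir, process(name))
    if not os.path.lexists(link):
        os.symlink(os.path.join(directory, name), link)
    return link
```

A GUI would then hand the returned link path to EMBOSS instead of the
original filename. A colon in one of the parent directories is not covered
by this sketch.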

Alternately one may use a hash of the filename instead, but this is more
painful to code, maintain and debug and potentially more wasteful in terms
of space.

Now, the original filenames are in place, and available for the command
line, up/downloads, other user interfaces, etc. to manage as they wish,
but your GUI is no longer haunted by the infamous colon.

Symlinks on UNIX eat very little space: usually just the directory
entry. If space is very tight and becomes a concern you may consider
either hardlinks or only symlinking special filenames (this last at
the cost of additionally complex logic). With current hard disks I
wouldn't worry.

And, yes, I know this involves many more changes to a UI, but either
users accommodate (by avoiding the colon) or the UI does (by hiding
limitations).

Actually a similar trick is used by NetATalk, AppleTalk, MacOS X 
and other systems that have similar metadata problems.

				j


From pmr at ebi.ac.uk  Mon Jun 20 05:16:35 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 20 Jun 2005 10:16:35 +0100
Subject: [EMBOSS] Multiplatform filenames (was Re: Masking the : character?)
In-Reply-To: <20050620105520.736fef76.jrvalverde@cnb.uam.es>
References: <42B1D5E3.1000503@biol.unlp.edu.ar>	<34198fe4050618022825238622@mail.gmail.com> <20050620105520.736fef76.jrvalverde@cnb.uam.es>
Message-ID: <42B68973.7090105@ebi.ac.uk>

José R. Valverde wrote:

>>2005/6/17, Martin Sarachu <msarachu at biol.unlp.edu.ar>:
>>>is there any way to mask the ':' character so it is not interpreted as a
>>>delimiter for DB:sequence?

> The problem arises because the ':' is used for historic reasons as a
> carry-over from VMS where it had special meaning on pathnames. This 
> does not hold on UNIX where it is a legit character (actually ANY char
> but '/' and NULL is a legit character on UNIX). This is important as
> EMBOSS may be used on many locales, and you don't know in advance
> how a given symbol will be represented on them. Freedom comes at a 
> cost.

Strictly speaking, the problem arises because ':' has become a standard for 
bioinformatics users - though, yes, VMS was the source of the special syntax. 
It was adopted by, among others, GCG and SRS. It is also used, of course, in 
URN and URL syntax.

However, in this case there is a partial solution. Only alphanumeric 
characters are allowed in EMBOSS database names, and they must be more than 
one character in length (to avoid clashing with C: on Windows systems).
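
The two rules just stated can be written down as a small check. The Python
sketch below only encodes those rules as described in this message; the
function name is hypothetical and this is not how EMBOSS itself parses a
database:entry specification.

```python
def could_be_database_name(prefix):
    """True if `prefix` meets both stated rules for a database name:
    more than one character, and ASCII alphanumeric characters only."""
    return len(prefix) > 1 and prefix.isascii() and prefix.isalnum()
```

For example, the text before the first ':' in "swiss:prot_human" passes,
while "C" (a Windows drive letter) and "dir/file.seq" do not. Passing the
check only means the prefix might be a database name, not that such a
database is actually defined.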

The problem posted was not in a database name. It was the filename:id syntax, 
where a ':' appeared in the filename full path.

For a ':' in a directory name (not in the filename) we could try to catch it 
by not allowing '/' in the ID. However, that can run into problems. For 
example, PFAM uses '/' in the identifier of a sequence derived from a longer 
entry.


> QUICK SOLUTION
> - ------------
> I think that for the user it is simpler to know that ':' has a special
> meaning and should be avoided.
> 
> For the cases where the colon is generated automatically, it may be better
> to provide a renaming script that changes the colon to something else.

That would be my recommendation too.

> UI 'PRO' APPROACH
> - ---------------
> For GUI writers it is probably better to "translate" any such filenames
> between the user and EMBOSS. Note the quotes around translate above: it
> is not immediate. Let me explain:
> 

> 	The trick is to create a special hidden directory on each user
> directory accessed: e.g. .myGUI-names. Then for every file make a
> suitably processed symlink on that subdirectory and call emboss through
> the symlink, sort of:

Looks like a good approach. The alternative would be to trap "bad" filenames 
and ask the user to correct them.

regards,

Peter


From kkmattil at csc.fi  Mon Jun 20 07:50:46 2005
From: kkmattil at csc.fi (Kimmo Mattila)
Date: Mon, 20 Jun 2005 14:50:46 +0300 (EEST)
Subject: [EMBOSS] Installing EMBOSS on a Rocks linux
Message-ID: <Pine.LNX.4.62.0506201446440.31123@sampo3.csc.fi>


Hi

I would like to ask whether any of you has managed to install EMBOSS on a
linux cluster running Rocks linux.  When I tried to install EMBOSS on our
Rocks cluster, the standard installation procedure went through without 
error messages, but when I try to start an EMBOSS application,  I get an 
error message:

   wossname

   Segmentation fault (core dumped)

A Google search on this topic revealed that someone else has had 
similar problems with Rocks too, but I was not able to find any potential 
solution. However, EMBOSS is available in the Rocks-based BioBrew linux 
distribution.

So, any hints about how to install EMBOSS in a Rocks cluster would be 
welcome.

Regards,

Kimmo Mattila


---------------------------------------------------------------
Kimmo Mattila, sovellusasiantuntija, Bioinformatiikan palvelut, CSC
PL 405 02101 Espoo, puh 09 457 2708 , fax (09) 457 2302
CSC on tieteen tietotekniikan keskus, www.csc.fi, s-posti: 
kimmo.mattila at csc.fi

Kimmo Mattila, application scientist, Bioinformatics Support, CSC
P.O. Box 405 02101 Espoo, Finland, tel +358 9 4572708, fax +358 9 4572302
CSC is the Finnish IT Center for Science, www.csc.fi, e-mail: 
kimmo.mattila at csc.fi
---------------------------------------------------------------


From smiddha at indiana.edu  Mon Jun 20 10:59:56 2005
From: smiddha at indiana.edu (Sumit Middha)
Date: Mon, 20 Jun 2005 09:59:56 -0500
Subject: [EMBOSS] Emboss package - file size limitations
In-Reply-To: <Pine.LNX.4.62.0506201446440.31123@sampo3.csc.fi>
References: <Pine.LNX.4.62.0506201446440.31123@sampo3.csc.fi>
Message-ID: <1119279596.42b6d9ec2c52d@webmail.iu.edu>


Hi,

I looked around for threshold limitations on the size of the files that can be
used for analysis, but could not locate any information.

Is there a limit to the size of files that I can use, and is there a different
limit on the web and command line usage?

Actually I had the same question for GCG tools.

Thanks,
Sumit


From pmr at ebi.ac.uk  Mon Jun 20 11:26:52 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 20 Jun 2005 16:26:52 +0100
Subject: [EMBOSS] Emboss package - file size limitations
In-Reply-To: <1119279596.42b6d9ec2c52d@webmail.iu.edu>
References: <Pine.LNX.4.62.0506201446440.31123@sampo3.csc.fi> <1119279596.42b6d9ec2c52d@webmail.iu.edu>
Message-ID: <42B6E03C.9020306@ebi.ac.uk>

Hi Sumit,

> Is there a limit to the size of files that I can use, and is there a different
> limit on the web and command line usage.

EMBOSS has no hard coded limit on sequence or file size. The operating system 
may have problems with 2Gb file size, and the EMBLCD indexing system we use 
for database indexing in EMBOSS 2 has a 2Gb file size limit (4 byte file 
pointers are part of the index format) - there will be a new indexing system 
in beta release with EMBOSS 3 that will have enough space for large file offsets.
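
To make the arithmetic behind the 2Gb figure concrete, here is a small
Python sketch. It assumes the 4-byte file pointers are signed, which is what
yields a 2Gb ceiling; the little-endian byte order and the function names
are arbitrary choices for the example and say nothing about the real EMBLCD
layout.

```python
import struct

# Largest value a signed 4-byte integer can hold (assumption: signed pointers).
MAX_SIGNED_32 = 2**31 - 1


def fits_in_index(offset):
    """True if `offset` can be stored as a signed 4-byte file pointer."""
    return 0 <= offset <= MAX_SIGNED_32


def pack_offset(offset):
    """Pack `offset` into 4 bytes; raises struct.error if it does not fit."""
    return struct.pack("<i", offset)
```

Any entry that starts at or beyond byte 2**31 of a flat file therefore
cannot be addressed by such an index, regardless of what the operating
system allows.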

Some algorithms will have limits, depending on the memory (real and virtual) 
on your machine.

> Actually I had the same question for GCG tools.

I believe sequence length is still up to 350kb unless you have the source code 
(when I was at Sanger I routinely rebuilt GCG with 750kb as the maximum 
sequence length so the genome sequencers could still use it on their own 
sequences!) A future release of GCG is supposed to increase this.

Hope that helps,

Peter Rice


From francis at bii.a-star.edu.sg  Tue Jun 21 04:47:51 2005
From: francis at bii.a-star.edu.sg (Francis Tang)
Date: Tue, 21 Jun 2005 16:47:51 +0800
Subject: [EMBOSS] Wildfire 2.0
Message-ID: <42B7D437.5060506@bii.a-star.edu.sg>

Dear EMBOSS users,

On behalf of the Bioinformatics Institute, Singapore, I would like to 
announce that Wildfire 2.0 is now available for download from 
http://wildfire.bii.a-star.edu.sg .

Wildfire is a GUI application for constructing workflows.  It has been 
configured so that you can build workflows using EMBOSS applications 
immediately.  The resulting workflows can run on a cluster or other 
multi-cpu machine, and exploit parallelism where possible.

Wildfire is described in the BMC Bioinformatics article:

     "Wildfire: distributed, Grid-enabled workflow construction and 
execution", BMC Bioinformatics 2005, 6:69.
     http://www.biomedcentral.com/1471-2105/6/69/abstract

We invite you all to download and try Wildfire and welcome feedback to 
wildfire at bii.a-star.edu.sg .

Thank you.

Francis.


-- 
Francis TANG, Post-Doctoral Research Fellow
Bioinformatics Institute, BMSI, A-STAR, Singapore.
Tel: +65 64788282  Fax: +65 64789048  Email: francis at bii.a-star.edu.sg
Add: Matrix L7, Biopolis   WWW: http://www.bii.a-star.edu.sg/~francis/


From jieqiwang at gmail.com  Tue Jun 21 10:55:46 2005
From: jieqiwang at gmail.com (Wang Jieqi)
Date: Tue, 21 Jun 2005 22:55:46 +0800
Subject: [EMBOSS] Help with retrieving sequences
Message-ID: <55162b5205062107555043348@mail.gmail.com>

Hello,
I started to learn EMBOSS recently. Now, I want to read the CDS of
several mRNA sequences. The complete entries of these mRNAs (cDNA) have
been retrieved from GenBank into a single file. Could you please tell
me what to do next? Also, I find that seqret seems to read only the
first molecule; could you please help me out? Thanks.


   Best regards, 
Jieqi
-- 
Jieqi Wang
Room 121, Department of Biology
Tsinghua University
Beijing, 100084
China, People's Republic
Mobile: +86-13641302483
Dorm:   +86-10-51534406
Lab:     +86-10-62784794
Fax:     +86-10-62794376


From aengus.stewart at cancer.org.uk  Tue Jun 21 11:16:41 2005
From: aengus.stewart at cancer.org.uk (Aengus Stewart)
Date: Tue, 21 Jun 2005 16:16:41 +0100
Subject: [EMBOSS] Data Lib sizes and indexing progs
Message-ID: <42B82F59.5040200@cancer.org.uk>


Hi folks,

Just wondering how the new indexing methods were coming on.

It's just that I had a look at the most recent EMBL release and it's (give or
take the odd gig) 250Gb, which means that having the headroom to hold a copy
while installing a new copy requires >500Gb.

Any info on how the new indexing will work? Will it still have to run off
uncompressed .dat files, or will it produce its own index format?

Sorry about the questions, it's just that I am rushing around the filesystem
deleting anything that may appear to be "deleteable" to scrounge enough
space :-)


Regards
Aengus

-- 
-----------------------------------------------------------------------
Aengus Stewart
Group Leader
Bioinformatics at CGAL                            Tel: +44 (0)20 7269 3679
Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK
-----------------------------------------------------------------------



From ableasby at hgmp.mrc.ac.uk  Tue Jun 21 11:27:56 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Tue, 21 Jun 2005 16:27:56 +0100 (BST)
Subject: [EMBOSS] Data Lib sizes and indexing progs
Message-ID: <200506211527.j5LFRuRR024742@bromine.hgmp.mrc.ac.uk>

The new indexing programs are done (in CVS). The programs are:
dbxflat, dbxfasta and dbxgcg  and they operate like their
'dbi' counterparts. The dbx and dbi programs will be available
in the next release.

So, for EMBL, you would typically index the *.dat files.
As before, you can create id,acc,sv,key,org & des indexes
(though many sites just index id and acc). 

An indexing job on the whole of the recently released EMBL will
produce id, acc and key indexes of the following sizes. They
should give you some idea of the extra disc space you'll need.

-rw-r--r--  1 root root      19950 Jun 19 14:11 embli.ent
-rw-r--r--  1 root root        122 Jun 20 13:41 embli.pxac
-rw-r--r--  1 root root        122 Jun 20 13:41 embli.pxid
-rw-r--r--  1 root root        126 Jun 20 13:41 embli.pxkw
-rw-r--r--  1 root root 8755992576 Jun 20 13:41 embli.xac
-rw-r--r--  1 root root 7482558464 Jun 20 13:41 embli.xid
-rw-r--r--  1 root root 4046751744 Jun 20 13:41 embli.xkw
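
As a rough guide to the extra disc space, the three large index files in the
listing above can simply be added up. The Python sketch below copies the
byte counts from that listing; the names used are illustrative only.

```python
# Sizes in bytes, copied from the directory listing above.
INDEX_SIZES = {
    "embli.xac": 8755992576,
    "embli.xid": 7482558464,
    "embli.xkw": 4046751744,
}


def total_bytes(sizes):
    """Sum the sizes of all listed index files."""
    return sum(sizes.values())


def to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


total = total_bytes(INDEX_SIZES)
print(f"{total} bytes = {to_gib(total):.2f} GiB")
```

The small .ent and .px* files are left out because they do not change the
total in any meaningful way.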

HTH

Alan


From kellert at ohsu.edu  Thu Jun 23 00:06:18 2005
From: kellert at ohsu.edu (Thomas J Keller)
Date: Wed, 22 Jun 2005 21:06:18 -0700
Subject: [EMBOSS] source of common vectors in cirdna format
Message-ID: <03b2ab1a566cf2386b89bb570e26e6eb@ohsu.edu>

Greetings,
Is there a source for common vectors in cirdna format available for 
downloading?

Thanks in advance,
Tom Keller

Tom Keller, Ph.D.
http://www.ohsu.edu/research/core
kellert at ohsu.edu
503-494-2442

From clemens.broger at roche.com  Thu Jun 23 09:48:24 2005
From: clemens.broger at roche.com (Broger, Clemens)
Date: Thu, 23 Jun 2005 15:48:24 +0200
Subject: [EMBOSS] Needle/water, revcomp
Message-ID: <7E08B1C16672A147B29D3DE3827536E37E00CC@rbamsem3.emea.roche.com>

I have 2 questions:

The first is about identity/similarity in nucleotide alignments made
with needle (probably the same holds true for water):
 
########################################
# Program:  needle
# Rundate:  Thu Jun 23 13:29:58 2005
# Align_format: srspair
# Report_file: seq0.needle
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: SEQ0
# 2: SEQ1
# Matrix: EDNAFULL
# Gap_penalty: 100.0
# Extend_penalty: 10.0
#
# Length: 70
# Length of sequence 1: 70
# Length of sequence 2: 70
# Identity:      46/70 (65.7%)
# Similarity:    47/70 (67.1%)
# Gaps:           0/70 ( 0.0%)
# Score: 162.0
# 
#
#=======================================

                              .         .         .         .         .
SEQ0               1 aaaaaaaaaaaaaaaaaaaaaaaaacccccgggggtttttuuuuunnnnn     50
                     |||||||||||||||||||||......|......||:....:|..     
SEQ1               1 aaaaaaaaaaaaaaaaaaaaacgtunacgtunacgtunacgtunacgtun     50
                              .         .         .         .         .

                              .         .
SEQ0              51 aaaaaaaaaaaaaaaaaaaa     70
                     ||||||||||||||||||||
SEQ1              51 aaaaaaaaaaaaaaaaaaaa     70
                              .         .

Each base of the set acgtun is aligned against each other. The 20 a's at
the beginning and end are only to force an ungapped alignment. Maximum
gap penalties were used.
 
I agree with the symbols in the alignment |,: and ., but the 46
identities in the summary imply that the n-n match is also counted. The
t-u matches are counted as similar, which is ok, but the n-n match is
not counted as similar, although it is counted as identical. I think the
n-n match should not be counted both in identity and similarity.
 
Now for ambiguous bases. w is a or t
 
########################################
# Program:  needle
# Rundate:  Thu Jun 23 14:53:33 2005
# Align_format: srspair
# Report_file: seq0.needle
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: SEQ0
# 2: SEQ1
# Matrix: EDNAFULL
# Gap_penalty: 100.0
# Extend_penalty: 10.0
#
# Length: 26
# Length of sequence 1: 26
# Length of sequence 2: 26
# Identity:      21/26 (80.8%)
# Similarity:    23/26 (88.5%)
# Gaps:           0/26 ( 0.0%)
# Score: 94.0
# 
#
#=======================================

                              .         .      
SEQ0               1 aaaaaaaaaawwwwwwaaaaaaaaaa     26
                     ||||||||||..   .||||||||||
SEQ1               1 aaaaaaaaaaatwgcuaaaaaaaaaa     26
                              .         .      

In the alignment I would put a dot at the w-w match (but I could also
agree with the way it is handled now). But again the w is counted in the
summary as an identity but not as a similarity.
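
The difference between the two ways of counting can be illustrated with a
short Python sketch. It is not the code used by needle; it only contrasts a
plain character comparison with one that refuses to call two ambiguity codes
identical. The set of unambiguous bases and both function names are
assumptions, and ungapped input is assumed.

```python
# Bases treated as unambiguous in this sketch (an assumption).
UNAMBIGUOUS = set("acgtu")


def naive_identity(a, b):
    """Count aligned positions where the two characters are equal."""
    return sum(1 for x, y in zip(a.lower(), b.lower()) if x == y)


def strict_identity(a, b):
    """Count equal positions, but only for unambiguous base codes."""
    return sum(
        1
        for x, y in zip(a.lower(), b.lower())
        if x == y and x in UNAMBIGUOUS
    )


seq0 = "aaaaaaaaaawwwwwwaaaaaaaaaa"
seq1 = "aaaaaaaaaaatwgcuaaaaaaaaaa"
print(naive_identity(seq0, seq1), strict_identity(seq0, seq1))
```

On the second alignment above, the plain comparison gives the 21 identities
shown in the report header, while the strict variant gives 20 because the
single w-w column is no longer counted.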


The second question is about the handling in EMBOSS of
reverse-complemented nucleotide segments such as  

db:seq[10:20:r]

The sequence is first reverse-complemented and then residues 10 to 20
are cut out.
Biologists usually expect that residues 10 to 20 are first cut out and
then reverse-complemented.
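
The two orders of operation give different results, as the following Python
sketch shows. The helper names and the 1-based inclusive coordinates are
assumptions made for the example; this is not EMBOSS code.

```python
# Complement table for DNA/RNA letters; U is complemented to A.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def revcomp(seq):
    """Return the reverse complement of `seq`."""
    return seq.translate(COMPLEMENT)[::-1]


def cut(seq, start, end):
    """Return residues start..end, 1-based and inclusive."""
    return seq[start - 1:end]


def cut_then_revcomp(seq, start, end):
    """Cut the range first, then reverse-complement it."""
    return revcomp(cut(seq, start, end))


def revcomp_then_cut(seq, start, end):
    """Reverse-complement the whole sequence, then cut the range."""
    return cut(revcomp(seq), start, end)


seq = "AAAAACCCCC"
print(cut_then_revcomp(seq, 1, 5))  # TTTTT
print(revcomp_then_cut(seq, 1, 5))  # GGGGG
```

With the range 1 to 5 on AAAAACCCCC, cutting first yields the complement of
the first five residues, whereas reverse-complementing first yields the
complement of the last five.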

Can this be changed? That would be very helpful.

Best regards

Clemens


Dr. Clemens Broger
Bioinformatics
F. Hoffmann-La Roche Ltd.
PRBI 65/303
CH-4070 Basel
clemens.broger at roche.com
+41-61-688-4447


From pmr at ebi.ac.uk  Thu Jun 23 10:38:25 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Thu, 23 Jun 2005 15:38:25 +0100 (BST)
Subject: [EMBOSS] source of common vectors in cirdna format
In-Reply-To: <03b2ab1a566cf2386b89bb570e26e6eb@ohsu.edu>
References: <03b2ab1a566cf2386b89bb570e26e6eb@ohsu.edu>
Message-ID: <2840.12.27.2.2.1119537505.squirrel@webmail.ebi.ac.uk>

Tom Keller writes:

> Is there a source for common vectors in cirdna format available for
> downloading?

Or is there a source of common vectors that we could convert to cirdna
format?

regards,

Peter Rice


From pmr at ebi.ac.uk  Thu Jun 23 10:41:34 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Thu, 23 Jun 2005 15:41:34 +0100 (BST)
Subject: [EMBOSS] Needle/water, revcomp
In-Reply-To: 
     <7E08B1C16672A147B29D3DE3827536E37E00CC@rbamsem3.emea.roche.com>
References: 
    <7E08B1C16672A147B29D3DE3827536E37E00CC@rbamsem3.emea.roche.com>
Message-ID: <2849.12.27.2.2.1119537694.squirrel@webmail.ebi.ac.uk>

Clemens Broger writes:

> I have 2 questions:
>
> The first is about identity/similarity in nucleotide alignments made
> with needle (probably the same holds true for water):

Tricky. This requires the matrix to define some codes as ambiguity codes
so we know w-w is not an identity. I would guess we can extend the matrix
formats we use to include this information, or perhaps for nucleotide
sequences we can "know" the answer.

I will investigate.

> The second question is about the handling in EMBOSS of
> reverse-complemented nucleotide segments such as
>
> db:seq[10:20:r]
>
> The sequence is first reverse-complemented and then residues 10 to 20
> are cut out.
> Biologists usually expect that residues 10 to 20 are first cut out and
> then reverse-complemented.
>
> Can this be changed? That would be very helpful.

Oops. Yes - will do.

regards,

Peter Rice


From msarachu at biol.unlp.edu.ar  Mon Jun 27 08:28:58 2005
From: msarachu at biol.unlp.edu.ar (Martin Sarachu)
Date: Mon, 27 Jun 2005 09:28:58 -0300
Subject: [EMBOSS] Re: wemboss: warning and errors
In-Reply-To: <0C9336E1DA90DB479BEBAF2C7C5699E1016EA96D@USINDMDOWM001.dow.com>
References: <0C9336E1DA90DB479BEBAF2C7C5699E1016EA96D@USINDMDOWM001.dow.com>
Message-ID: <42BFF10A.4090405@biol.unlp.edu.ar>

Dear Rani,

About the error with ACD: when running distmat from the command line 
with -options (to be prompted for all options) I get this error:

> # distmat -options
> Creates a distance matrix from multiple alignments
> Input sequence set: uniprot:papa_*
> Multiple substitution correction methods for proteins
>          0 : Uncorrected
>          1 : Jukes-Cantor
>          2 : Kimura Protein
> Method to use [0]: 1
> Warning: ACD expression invalid @(!$acdprotein)
> 
> Warning: ACD expression invalid @(!$acdprotein)
> 
> Error: File /usr/local/emboss/share/EMBOSS/acd/distmat.acd line 60: (ambiguous) Bad additional flag N | Y)
> 

but without -options (i.e. default options chosen) runs ok

> # distmat
> Creates a distance matrix from multiple alignments
> Input sequence set: uniprot:papa_*
> Multiple substitution correction methods for proteins
>          0 : Uncorrected
>          1 : Jukes-Cantor
>          2 : Kimura Protein
> Method to use [0]: 1
> Output file [papa_.distmat]:
> Warning: Sequence lengths are not equal!
> Warning: Sequence lengths are not equal!
> Warning: Sequence lengths are not equal!

there is a missing left parenthesis in distmat.acd in line 61, please 
change this

>     additional: "@(@(@(!$acdprotein)) & @($(nucmethod)==1)) |

to this

>     additional: "@(@(@(!$(acdprotein)) & @($(nucmethod)==1)) |


Regards,

Martin

PS: working on the exclude problem...


Mamidipalli, SudhaRani (S) wrote:
> Hello Martin,
> 
> While testing the programs in wEMBOSS,we have encountered couple of problems.
> 
> 1.The 'distmat' program gave some warning. Here is the warning of that program. 
> -------------------------------
> Warning! 
> "ambiguous" parameter: syntax error (missing left parenthesis) in ACD expression (tell to EMBOSS Manager : this could produce wrong results from program execution!) 
> -------------------------------
> I went and checked distmat.acd file but couldn't find any error. 
> 
> 2. I added some programs, that we don't want to be displayed in wemboss, in the exclude file: /genomics/sw/wEMBOSS-1.4.0/wEMBOSS/data/exclude. And then I re-installed wrappers4EMBOSS and wEMBOSS. Surprisingly, only few programs(for example tranalign,embossversion etc.) got deleted from wemboss whereas few programs (for example textsearch, entret etc.) show up with error 
> --------
> EMBOSS: error...
> chaos has been excluded
> ----------
>   
> Please clarify.
> 
> Thanks and Regards,
> Rani.
> 

-- 
Martin Sarachu
msarachu at biol.unlp.edu.ar
AR.EMBnet
http://www.ar.embnet.org


From pmr at ebi.ac.uk  Mon Jun 27 10:25:35 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Mon, 27 Jun 2005 15:25:35 +0100 (BST)
Subject: [EMBOSS] Re: wemboss: warning and errors
In-Reply-To: <42BFF10A.4090405@biol.unlp.edu.ar>
References: <0C9336E1DA90DB479BEBAF2C7C5699E1016EA96D@USINDMDOWM001.dow.com>
    <42BFF10A.4090405@biol.unlp.edu.ar>
Message-ID: <1613.12.27.2.2.1119882335.squirrel@webmail.ebi.ac.uk>

Martin Sarachu writes:

> there is a missing left parenthesis in distmat.acd in line 61, please
> change this
>
>>     additional: "@(@(@(!$acdprotein)) & @($(nucmethod)==1)) |
>
> to this
>
>>     additional: "@(@(@(!$(acdprotein)) & @($(nucmethod)==1)) |

Already fixed in EMBOSS 2.10.0.

But this does highlight a gap in the ACD validation - this expression is
only evaluated when needed (when -options is used). I will try adding
checks for all strings to generate warnings for unbalanced () and $ or @
without ( to acdvalid before the July 15th release.
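
The two checks described here (unbalanced parentheses, and a $ or @ that is
not followed by an opening parenthesis) are simple to sketch. The Python
below is only an illustration of the idea, not acdvalid; the function names
are made up, and a literal @ or $ inside ordinary text would be flagged as
well.

```python
def paren_balance(expr):
    """Return (number of '(') - (number of ')') for `expr`,
    or None if a ')' appears before any matching '('."""
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None
    return depth


def bare_sigils(expr):
    """Return (index, char) for each '$' or '@' not directly followed by '('."""
    return [
        (i, ch)
        for i, ch in enumerate(expr)
        if ch in "$@" and expr[i + 1:i + 2] != "("
    ]
```

Applied to the quoted first line of the distmat expression, bare_sigils
reports the $ in front of acdprotein and reports nothing for the corrected
line. The balance check has to be run on the complete string, since the
quoted line is only the start of a longer expression.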

>> --------
>> EMBOSS: error...
>> chaos has been excluded
>> ----------

I know this is really a wEMBOSS problem, but the message appeals to my
sense of humour!!! Can you send me an explanation of it when you have a
solution - it may appear in future EMBOSS talks :-)

regards,

Peter


From gbottu at ben.vub.ac.be  Wed Jun 29 04:30:02 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 29 Jun 2005 10:30:02 +0200
Subject: [EMBOSS] bug related to -plasmid parameter
Message-ID: <20050629083002.GA4560@bigben.ulb.ac.be>

from: Belgian EMBnet Node

	Dear colleagues,

At the BEN site we have on our main computer EMBOSS 2.10.0 under Alpha OSF 
5.1A. I just noticed that the programs remap, restrict and restover give a 
segmentation fault when run with parameter -plasmid. This does however not 
occur with an EMBOSS installation we have on a Linux machine. So, this
behaviour must be dependent on the OS and maybe on the hardware. Did someone else 
notice it ?

	Regards,
	Guy Bottu


From ableasby at hgmp.mrc.ac.uk  Wed Jun 29 08:13:15 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Wed, 29 Jun 2005 13:13:15 +0100 (BST)
Subject: [EMBOSS] bug related to -plasmid parameter
Message-ID: <200506291213.j5TCDFMb014301@bromine.hgmp.mrc.ac.uk>

Dear Guy,

Thanks for spotting that. It's now fixed in CVS and will be
part of the 3.0.0 release.

ATB

Alan Bleasby
RFCGR/HGMP (for one more month)


From gbottu at ben.vub.ac.be  Thu Jun  2 10:09:54 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Thu, 2 Jun 2005 12:09:54 +0200
Subject: [EMBOSS] use water/matcher to find where RNA bybridizes
Message-ID: <20050602100954.GA14063@bigben.ulb.ac.be>

from : Belgian EMBnet Node

	Dear colleagues,

One of our users had a problem : how to find the location where a small 
molecule of RNA binds to a mRNA and so interferes with its functioning. 
Nothing in EMBOSS and nothing found on the WWW. We finally did the 
following : use revseq -nocomp to reverse the mRNA and then align the two 
sequences using as matrix :
-------------------------------
    A   T   G   C   S   W   R   Y   K   M   B   V   H   D   N   U
A   0   5   0   0   0   5   5   0   0   5   0   5   5   5   5   0
T   5   0   5   0   0   5   0   5   5   0   5   0   5   5   5   5
G   0   5   0   5   5   0   5   0   5   0   5   5   0   5   5   3
C   0   0   5   0   5   0   0   5   0   5   5   5   5   0   5   0
S   0   0   5   5   5   0   5   5   5   5   5   5   5   5   5   0         
W   5   5   0   0   0   5   5   5   5   5   5   5   5   5   5   5
R   5   0   5   0   5   5   5   0   5   5   5   5   5   5   5   0
Y   0   5   0   5   5   5   0   5   5   5   5   5   5   5   5   5
K   0   5   5   0   5   5   5   5   5   0   5   5   5   5   5   5
M   5   0   0   5   5   5   5   5   0   5   5   5   5   5   5   0          
B   0   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   
V   5   0   5   5   5   5   5   5   5   5   5   5   5   5   5   0
H   5   5   0   5   5   5   5   5   5   5   5   5   5   5   5   5
D   5   5   5   0   5   5   5   5   5   5   5   5   5   5   5   5
N   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5
U   0   5   3   0   0   5   0   5   5   0   5   0   5   5   5   5
-------------------------------
This gave a reasonable result. water made the following alignment :
------------------------------
#=======================================
#
# Aligned_sequences: 2
# 1: mRNA
# 2: RNAi
# Matrix: HYB
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 49
# Identity:       3/49 ( 6.1%)
# Similarity:     0/49 ( 0.0%)
# Gaps:           0/49 ( 0.0%)
# Score: 185.0
# 
#
#=======================================

mRNA            2892 AATGTTGTGTGAGGATAATAGTAATAGTAATAGTAATAATAATAATAAT  2940
                     ..  . . .... ... .. .. .. .. ..  ................
RNAi               1 TTTGACCCTGCTACTACTACTACTACTACTACGATTATTATTATTATTA  49
--------------------------------
The only thing which bothers me is that the base pairs (which do have a 
positive comparison score) are not labeled as "similar", they get a '.' 
instead of a ':'. Does someone know why this is ?
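
The idea of reversing one strand without complementing it and then scoring
base pairs instead of identities can be sketched in Python as below. This is
a much simpler technique than water: an ungapped scan over all positions,
not a Smith-Waterman local alignment. The pair scores are a simplified
assumption (5 for Watson-Crick pairs, 3 for G-U or G-T wobble) and are not
the HYB matrix listed above; the function names are hypothetical.

```python
# Simplified pairing scores (an assumption, not the HYB matrix above).
PAIR_SCORE = {
    frozenset("AU"): 5,
    frozenset("AT"): 5,
    frozenset("GC"): 5,
    frozenset("GU"): 3,
    frozenset("GT"): 3,
}


def pair_score(x, y):
    """Score one potential base pair; anything not listed scores 0."""
    return PAIR_SCORE.get(frozenset((x.upper(), y.upper())), 0)


def best_ungapped_site(mrna, small):
    """Return (start, score) of the best antiparallel, ungapped pairing.

    `start` is the 0-based position in `mrna` of the window that pairs
    with `small`. Each window is reversed (not complemented) before
    scoring, mirroring the revseq -nocomp step.
    """
    m = len(small)
    best = (0, -1)
    for start in range(len(mrna) - m + 1):
        window = mrna[start:start + m][::-1]
        score = sum(pair_score(a, b) for a, b in zip(small, window))
        if score > best[1]:
            best = (start, score)
    return best
```

For instance, scanning AAAACCCAAAA with GGG picks out the CCC window. Gaps,
bulges and energy terms are ignored entirely, so this is only a first-pass
filter.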

	Guy Bottu


From gbottu at ben.vub.ac.be  Thu Jun  2 15:08:45 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Thu, 2 Jun 2005 17:08:45 +0200
Subject: [EMBOSS] use water/matcher to find where RNA bybridizes
In-Reply-To: <E1Ddr3t-0000sV-00@mendel.bio.caltech.edu>
References: <E1Ddr3t-0000sV-00@mendel.bio.caltech.edu>
Message-ID: <20050602150845.GA17226@bigben.ulb.ac.be>

On Thu, Jun 02, 2005 at 07:52:45AM -0700, David Mathog wrote:
> > One of our users had a problem : how to find the location where a small 
> > molecule of RNA binds to a mRNA and so interferes with its functioning. 
> 
> This can also be addressed with Mfold.  Let A be the large mRNA of
> length N and B the small one of length M. Create a hybrid RNA sequence
> AB of length N+M.  Set the rules in mfold so that
> 
>   bases 1->N will not bind with bases 1->N
>   bases N+1->N+M will not bind with bases N+1->N+M

Clever idea ! As a matter of fact, I had thought of doing that, with the 
extra of putting between both a linker of 200 T's which are not allowed to 
pair at all. Unfortunately the program mfold crashed with message :

Fill run failed

Maybe there is something unusual in the sequence.

	Regards,
	Guy Bottu,
	BEN


From mathog at mendel.bio.caltech.edu  Thu Jun  2 14:52:45 2005
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Thu, 02 Jun 2005 07:52:45 -0700
Subject: [EMBOSS] use water/matcher to find where RNA bybridizes
Message-ID: <E1Ddr3t-0000sV-00@mendel.bio.caltech.edu>


> 
> One of our users had a problem : how to find the location where a small 
> molecule of RNA binds to a mRNA and so interferes with its functioning. 


This can also be addressed with Mfold.  Let A be the large mRNA of
length N and B the small one of length M. Create a hybrid RNA sequence
AB of length N+M.  Set the rules in mfold so that

  bases 1->N will not bind with bases 1->N
  bases N+1->N+M will not bind with bases N+1->N+M

Run Mfold.
Look through the results.

If this runs properly you should see B bound somewhere in A with
an energy level you may then use to compare binding affinities. 
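
The bookkeeping for the hybrid sequence can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the function
name build_hybrid and its parameters are made up for this note, the optional
linker is an extra that is not part of the recipe above, and the Mfold
constraint syntax itself is not shown because it is not given in this thread.

```python
def build_hybrid(mrna, small_rna, linker_len=0, linker_base="T"):
    """Concatenate A (mRNA) and B (small RNA), optionally with a linker.

    Returns the hybrid sequence and the 1-based inclusive ranges of A,
    the linker (None if absent) and B within it. These ranges are the
    ones the "will not bind" rules would refer to.
    """
    n, m = len(mrna), len(small_rna)
    hybrid = mrna + linker_base * linker_len + small_rna
    ranges = {
        "A": (1, n),
        "linker": (n + 1, n + linker_len) if linker_len else None,
        "B": (n + linker_len + 1, n + linker_len + m),
    }
    return hybrid, ranges
```

With no linker, A occupies bases 1->N and B occupies N+1->N+M, as in the
rules above; with a linker of length L, B shifts to N+L+1->N+L+M.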

Regards,  

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From fernan at iib.unsam.edu.ar  Thu Jun  2 17:08:31 2005
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Thu, 2 Jun 2005 14:08:31 -0300
Subject: [EMBOSS] use water/matcher to find where RNA bybridizes
In-Reply-To: <20050602100954.GA14063@bigben.ulb.ac.be>
References: <20050602100954.GA14063@bigben.ulb.ac.be>
Message-ID: <20050602170831.GW44956@iib.unsam.edu.ar>

+----[ Guy Bottu <gbottu at ben.vub.ac.be> (02.Jun.2005 07:13):
|
| mRNA            2892 AATGTTGTGTGAGGATAATAGTAATAGTAATAGTAATAATAATAATAAT  2940
|                      ..  . . .... ... .. .. .. .. ..  ................
| RNAi               1 TTTGACCCTGCTACTACTACTACTACTACTACGATTATTATTATTATTA  49
| --------------------------------
| The only thing which bothers me is that the base pairs (which do have a 
| positive comparison score) are not labeled as "similar", they get a '.' 
| instead of a ':'. Does someone know why this is ?
|
+----]

Guy,

just a guess, but '.' and ':' are used in protein-protein
comparisons to denote identity and similarity which are both different
and meaningful. In dna-dna comparisons, you only care for
identity, whether you consider it to be aligning A with A or
A with its complement. So I would expect only one of
'.' or ':' to be used ... I don't remember which is used for
identity in EMBOSS.

My 2 cents guess,

Fernan


From pmr at ebi.ac.uk  Thu Jun  2 17:23:13 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Thu, 2 Jun 2005 18:23:13 +0100 (BST)
Subject: [EMBOSS] use water/matcher to find where RNA bybridizes
In-Reply-To: <20050602100954.GA14063@bigben.ulb.ac.be>
References: <20050602100954.GA14063@bigben.ulb.ac.be>
Message-ID: <3729.198.161.30.152.1117732993.squirrel@webmail.ebi.ac.uk>

Guy Bottu writes:

> One of our users had a problem : how to find the location where a small
> molecule of RNA binds to a mRNA and so interferes with its functioning.
> Nothing in EMBOSS and nothing found on the WWW. We finally did the
> following : use revseq -nocomp to reverse the mRNA and then align the two
> sequences using as matrix :
> -------------------------------
>     A   T   G   C   S   W   R   Y   K   M   B   V   H   D   N   U
> A   0   5   0   0   0   5   5   0   0   5   0   5   5   5   5   0
> T   5   0   5   0   0   5   0   5   5   0   5   0   5   5   5   5


..........

> -------------------------------
> This gave a reasonable result. water made the following alignment :
> ------------------------------


.....

> mRNA            2892 AATGTTGTGTGAGGATAATAGTAATAGTAATAGTAATAATAATAATAAT  2940
>                      ..  . . .... ... .. .. .. .. ..  ................
> RNAi               1 TTTGACCCTGCTACTACTACTACTACTACTACGATTATTATTATTATTA  49
> --------------------------------
> The only thing which bothers me is that the base pairs (which do have a
> positive comparison score) are not labeled as "similar", they get a '.'
> instead of a ':'. Does someone know why this is ?


I believe this is simply because the bases are not identical. A user
matrix can have arbitrary values, so the results are marked as similar
(A=T scores 5) but identities are only scored at zero and so never appear
with ":".

You could try setting the scores to match the hydrogen bonds for this
experiment (G=C 3 A=T 2 G=T 1)
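
A matrix with these scores could be generated rather than typed by hand. The
Python sketch below writes it in the same row/column layout as the matrix
quoted above. It rests on assumptions: only the four bases A, T, G and C are
covered (no ambiguity codes, no U), every non-pairing combination gets the
score given by the made-up parameter nonpair (0 here, as in the original
matrix), and the names PAIR_SCORES and pairing_matrix are illustrative only.

```python
# Scores taken from the suggestion above: G=C 3, A=T 2, G=T 1.
PAIR_SCORES = {("G", "C"): 3, ("A", "T"): 2, ("G", "T"): 1}


def pairing_matrix(bases="ATGC", nonpair=0):
    """Return a symmetric base-pairing matrix as text, one row per base."""

    def score(x, y):
        # Look the pair up in either orientation; anything else is a non-pair.
        return PAIR_SCORES.get((x, y), PAIR_SCORES.get((y, x), nonpair))

    lines = ["    " + "   ".join(bases)]
    for x in bases:
        lines.append(x + "".join(f"{score(x, y):4d}" for y in bases))
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file gives a matrix that can be used in place
of the hand-made one; whether a score of 0 or a negative score for non-pairs
gives better alignments is left open here.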

RNA folding is a missing area in EMBOSS. The Vienna package has been
suggested as a possible EMBASSY package. Does anyone have any experience
with it, or suggestions for alternative RNA packages we could use?

regards,

Peter


From David.Bauer at SCHERING.DE  Fri Jun  3 06:37:11 2005
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Fri, 3 Jun 2005 08:37:11 +0200
Subject: Antwort: Re: [EMBOSS] use water/matcher to find where RNA bybridizes
Message-ID: <OFB791E762.F05DE52B-ONC1257015.0020620D-C1257015.00245D30@schering.net>


Hi,

I use the Vienna RNA package.
It lets you look for the global structure of the complete RNA (RNAfold) or
for local structures (RNALfold).
The global folding also accepts longer sequences (as far as I remember this
was a problem with Mfold).
Visualization is a bit tricky. But there are helper scripts to convert the
output to .ct files (b2ct) which can be used to create
different graphical representations.

Regards,
David.


> RNA folding is a missing area in EMBOSS. The Vienna package has been
> suggested as a possible EMBASSY package. Does anyone have any experience
> with it, or suggestions for alternative RNA packages we could use?
>
> regards,
>
> Peter


From gbottu at ben.vub.ac.be  Fri Jun  3 08:17:41 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 3 Jun 2005 10:17:41 +0200
Subject: [EMBOSS] use water/matcher to find where RNA bybridizes
In-Reply-To: <20050602170831.GW44956@iib.unsam.edu.ar>
References: <20050602100954.GA14063@bigben.ulb.ac.be> <20050602170831.GW44956@iib.unsam.edu.ar>
Message-ID: <20050603081741.GA23810@bigben.ulb.ac.be>

	Dear all,

Thanks for your replies. It is however still not clear to me where the '.' 
come from. I thought the EMBOSS "pair" output would put a '|' for 
identities and a ':' for similarities (score positive). Maybe the program 
is fooled and seriously perturbed by a matrix that assigns a negative 
score to identical base pairs.

As for the proposal to distribute ViennaRNA as an Embassadir, why not ? At 
the BEN site we have mfold integrated under EMBOSS, but I am afraid 
distributing mfold as an Embassadir will turn out to be impossible because of 
licensing issues.
Note that mfold does not entirely solve the problem, since it operates on 
a single sequence, it does not search for a structure composed of two 
strands. I guess this is also true for ViennaRNA.
We (me and our user) had tried to use mfold (with as input a sequence 
composed of the mRNA, a poly-T linker and the small RNA), but the program 
crashed with error message "Cannot get Fill". Maybe the sequence had 
something unusual.

	Regards,
	Guy Bottu,
	BEN
 

From atorrano at lsi.upc.edu  Fri Jun  3 09:15:10 2005
From: atorrano at lsi.upc.edu (Alexis Torrano Martinez)
Date: Fri, 3 Jun 2005 11:15:10 +0200 (MET DST)
Subject: [EMBOSS] external and app
Message-ID: <7479297835atorrano@lsi.upc.es>


Hello

I am trying to execute hmmsearch from EMBOSS. This way I want to have
a kind of wrapper around the databases and retrieval apps. 

 
 DB Pfam [
 	method: "app"
 	comment: "Pfam with HMMER indexing"
 	app: "$HMMERBIN/hmmsearch $EMBOSS_DATA/pfam/Pfam_ls %s"
 	]

That is my DB specification for EMBOSS. How should I run seqret 
to execute hmmsearch properly? 

 seqret Pfam:$HOME/soft/hmmer/last/tutorial/7LES_DROME


The following error was unexpected :

Error: Unable to read sequence
'Pfam:/usr/usuaris/it/inb/soft/hmmer/last/tutorial/7LES_DROME'

As the tutorial says, if you specify external, %s receives as its value the
second field of the query (the ID from seqret DB:ID).

Is there a way to call hmmsearch from EMBOSS?

           Many thanks.

         Regards.

    Alexis Torrano.     


--
-----------------------------------------------------
Alexis Torrano Martinez

Instituto Nacional de Bioinformatica (INB) Nodo
Computacional GNHC-2
UPC-CIRI
c/. Jordi Girona 1-3
Modul C6-E201           Tel.   : 934 011 650
E-08034 Barcelona       Fax    : 934 017 014
Catalunya (Spain)       e-mail : atorrano at lsi.upc.edu
-----------------------------------------------------


From gbottu at ben.vub.ac.be  Fri Jun  3 10:03:18 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 3 Jun 2005 12:03:18 +0200
Subject: [EMBOSS] external and app
In-Reply-To: <7479297835atorrano@lsi.upc.es>
References: <7479297835atorrano@lsi.upc.es>
Message-ID: <20050603100318.GA24538@bigben.ulb.ac.be>

On Fri, Jun 03, 2005 at 11:15:10AM +0200, Alexis Torrano Martinez wrote:
> I am trying to execute hmmsearch from EMBOSS. This way I want to have
> a kind of wrapper around the databases and retrieval apps. 
> 
>  
>  DB Pfam [
>  	method: "app"
>  	comment: "Pfam with HMMER indexing"
>  	app: "$HMMERBIN/hmmsearch $EMBOSS_DATA/pfam/Pfam_ls %s"
>  	]

	Dear Alexis,

Your problem is almost certainly that the program defined as "app" 
must return a sequence on standard output, so that EMBOSS can read it. 
And this is not what hmmsearch does. Furthermore, hmmsearch searches a HMM 
against a databank of sequences ; you seem to want to search a sequence 
against a databank of HMM's (Pfam_ls), for which you need hmmpfam. It is 
maybe a good idea to install the Embassadir HMMER. Note however that 
ehmmpfam needs the user to specify where the databank is. At the BEN site 
I have a little bit "hacked" the program so that it uses Pfam_ls by 
default (and still lets the user choose an alternative). If you are 
interested I can send you a mail with "how to".

	Guy Bottu,
	Belgian EMBnet Node


From pmr at ebi.ac.uk  Fri Jun  3 10:08:11 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 3 Jun 2005 11:08:11 +0100 (BST)
Subject: [EMBOSS] use water/matcher to find where RNA bybridizes
In-Reply-To: <20050603081741.GA23810@bigben.ulb.ac.be>
References: <20050602100954.GA14063@bigben.ulb.ac.be>
    <20050602170831.GW44956@iib.unsam.edu.ar>
    <20050603081741.GA23810@bigben.ulb.ac.be>
Message-ID: <1543.198.161.30.152.1117793291.squirrel@webmail.ebi.ac.uk>

Dear Guy,

> Thanks for your replies. It is however still not clear to me where the '.'
> come from. I thought the EMBOSS "pair" output would put a '|' for
> identities and a ':' for similarities (score positive). Maybe the program
> is fooled and seriously perturbed by a matrix that assigns a negative
> score to identical base pairs.

I believe it is perturbed by the zero score for identical base pairs. This
makes it unable to find a consensus character for the alignment, and so
the "no consensus found" '.' character appears in the output.

Making the output format understand your non-identical matching is an
interesting challenge. I will look into it a little more.

regards,

Peter


From Marc.Logghe at devgen.com  Fri Jun  3 10:23:05 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Fri, 3 Jun 2005 12:23:05 +0200
Subject: [EMBOSS] external and app
Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E802@ANTARESIA.be.devgen.com>

Hi,
Just wondering, what happens if you use entret instead of seqret?
EMBOSS is supposed to just return the 'sequence' (in this case pfam
result), unaltered, unparsed. When you use seqret, EMBOSS will parse the
output and try to make a sequence out of it.
HTH,
Marc


> -----Original Message-----
> From: owner-emboss at hgmp.mrc.ac.uk 
> [mailto:owner-emboss at hgmp.mrc.ac.uk] On Behalf Of Guy Bottu
> Sent: Friday, June 03, 2005 12:03 PM
> To: Alexis Torrano Martinez; emboss at embnet.org
> Subject: Re: [EMBOSS] external and app
> 
> On Fri, Jun 03, 2005 at 11:15:10AM +0200, Alexis Torrano 
> Martinez wrote:
> > I am trying to execute hmmsearch from EMBOSS. This way I want to have
> > a kind of wrapper around the databases and retrieval apps.
> > 
> >  
> >  DB Pfam [
> >  	method: "app"
> >  	comment: "Pfam with HMMER indexing"
> >  	app: "$HMMERBIN/hmmsearch $EMBOSS_DATA/pfam/Pfam_ls %s"
> >  	]
> 
> 	Dear Alexis,
> 
> Your problem is almost certainly that the program defined as "app" 
> must return a sequence on standard output, so that EMBOSS 
> can read it. 
> And this is not what hmmsearch does. Furthermore, hmmsearch 
> searches a HMM against a databank of sequences ; you seem to 
> want to search a sequence against a databank of HMM's 
> (Pfam_ls), for which you need hmmpfam. It is maybe a good 
> idea to install the Embassadir HMMER. Note however that 
> ehmmpfam needs the user to specify where the databank is. At 
> the BEN site I have a little bit "hacked" the program so that 
> it uses Pfam_ls by default (and still lets the user choose an 
> alternative). If you are interested I can send you a mail 
> with "how to".
> 
> 	Guy Bottu,
> 	Belgian EMBnet Node
> 
> 


From pmr at ebi.ac.uk  Fri Jun  3 10:49:24 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 3 Jun 2005 11:49:24 +0100 (BST)
Subject: [EMBOSS] external and app
In-Reply-To:      <0C528E3670D8CE4B8E013F6749231AA606E802@ANTARESIA.be.devgen.com>
References:     <0C528E3670D8CE4B8E013F6749231AA606E802@ANTARESIA.be.devgen.com>
Message-ID: <1830.198.161.30.152.1117795764.squirrel@webmail.ebi.ac.uk>

Hi Marc,

> Just wondering, what happens if you use entret instead of seqret?
> EMBOSS is supposed to just return the 'sequence' (in this case pfam
> result), unaltered, unparsed. When you use seqret, EMBOSS will parse the
> output and try to make a sequence out of it.

Entret has to read the input as a sequence, and then returns the full text.

So entret will fail where seqret fails.

regards,

Peter


From jtk at cmp.uea.ac.uk  Fri Jun  3 12:53:35 2005
From: jtk at cmp.uea.ac.uk (Jan T. Kim)
Date: Fri, 3 Jun 2005 13:53:35 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
Message-ID: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk>

Dear EMBOSSers,

is it possible to read both input sequences to a pairwise alignment
from one input stream?

With the test input file attached, the command

    water -asequence fasta::x.fasta:seq1 -bsequence fasta::x.fasta:seq2 -outfile stdout -auto

runs as I expect, but the command

    cat x.fasta | water -asequence fasta::stdin:seq1 -bsequence fasta::stdin:seq2 -outfile stdout -auto

gives

   EMBOSS An error in ajfile.c at line 1926:
Error reading from file 'stdin'

It may well be that water consumes the entire input stream on getting the
first sequence, thus rendering itself unable to acquire the second one.

Is there a solution to this? I would really like to avoid the mess of
temporary files and run water in a clean pipe (pun intended  ;-)  )

Best regards & thanks in advance, Jan
-- 
 +- Jan T. Kim -------------------------------------------------------+
 |    *NEW*    email: jtk at cmp.uea.ac.uk                               |
 |    *NEW*    WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*
-------------- next part --------------
> seq1
accaacc
> seq2
acgagcc

From simon.andrews at bbsrc.ac.uk  Fri Jun  3 12:16:58 2005
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 3 Jun 2005 13:16:58 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
In-Reply-To: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk>
References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk>
Message-ID: <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk>


On 3 Jun 2005, at 13:53, Jan T. Kim wrote:

> Dear EMBOSSers,
>
> is it possible to read both input sequences to a pairwise alignment
> from one input stream?

I spent a while trying to figure this out a few months back.  In the 
end the best solution I came up with was to use the asis: sequence 
type.  This allows you to do:

water -auto asis:aaaa asis:ataa stdout

which avoids the need for messing with the file system.  I seem to 
remember I found a way to set names for the sequences as well, but 
can't find that right now.

As long as you make sure you don't pass your command through a shell 
when you launch this from a script then it actually scales pretty well 
to quite large sequences.

Hope this helps

Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From pmr at ebi.ac.uk  Fri Jun  3 14:09:03 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 3 Jun 2005 15:09:03 +0100 (BST)
Subject: [EMBOSS] Reading Two Sequences from stdin with water
In-Reply-To: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk>
References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk>
Message-ID: <1977.198.161.30.152.1117807743.squirrel@webmail.ebi.ac.uk>

Jan T. Kim writes:
> is it possible to read both input sequences to a pairwise alignment
> from one input stream?
>
>     cat x.fasta | water -asequence fasta::stdin:seq1 -bsequence
> fasta::stdin:seq2 -outfile stdout -auto
>
> gives
>
>    EMBOSS An error in ajfile.c at line 1926:
> Error reading from file 'stdin'
>
> It may well be that water consumes the entire input stream on getting the
> first sequence, thus rendering itself unable to acquire the second one.
>
> Is there a solution to this? I would really like to avoid the mess of
> temporary files and run water in a clean pipe (pun intended  ;-)  )

EMBOSS will only cleanly read stdin as one input. We should probably trap
that internally and give an error if we find stdin opening again. I wonder
whether there is any useful way to share the stdin filebuffer. Hmmmm... in
the early days of EMBOSS we decided not to allow it, but it could be worth
a try. You would still be in trouble if you tried to read the second
sequence first though.

Assuming your x.fasta file has only seq1 and seq2 in that order, reading
seq1 will continue until the first line of seq2 is reached. By then it
would be too late for seq2 to be read cleanly.

At least you have fasta:: specified - with no specified format, EMBOSS has
to read a long way into the input just to check whether it is really GCG
format.

As for the asis format, I suppose an EMBOSS utility that reads x.fasta and
outputs asis::ctagtacgatgcgatcg asis::tgatcgatggctacgtagc would be useful
to you - then you could put `sillyname x.fasta` in your command line... at
least until the command line gets too long. Hard to preserve the ID and
description of the sequences though.
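
Such a helper need not be an EMBOSS program; a few lines of Python would do as
a stopgap. The sketch below is an assumption-laden illustration: the name
fasta_to_asis is made up, only plain FASTA is handled, and IDs and
descriptions are thrown away, as noted above.

```python
def fasta_to_asis(lines):
    """Turn FASTA lines into a list of 'asis::SEQUENCE' strings.

    `lines` can be any iterable of text lines, for example sys.stdin.
    Header lines (starting with '>') only mark record boundaries; their
    contents are discarded.
    """
    usas = []
    current = None  # sequence chunks of the record being read
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                usas.append("asis::" + "".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        usas.append("asis::" + "".join(current))
    return usas
```

Called with sys.stdin as the argument, it lets a script receive x.fasta through
a pipe and build the two sequence arguments for water without touching the
file system.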

"If you think water is pure, just remember what fish do in it."

Hope that helps,

Peter


From jtk at cmp.uea.ac.uk  Fri Jun  3 15:40:31 2005
From: jtk at cmp.uea.ac.uk (Jan T. Kim)
Date: Fri, 3 Jun 2005 16:40:31 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
In-Reply-To: <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk>
References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk>
Message-ID: <20050603154031.GE25735@jtkpc.cmp.uea.ac.uk>

On Fri, Jun 03, 2005 at 01:16:58PM +0100, simon andrews wrote:
> 
> On 3 Jun 2005, at 13:53, Jan T. Kim wrote:
> 
> >Dear EMBOSSers,
> >
> >is it possible to read both input sequences to a pairwise alignment
> >from one input stream?
> 
> I spent a while trying to figure this out a few months back.  In the 
> end the best solution I came up with was to use the asis: sequence 
> type.  This allows you to do:
> 
> water -auto asis:aaaa asis:ataa stdout
> 
> which avoids the need for messing with the file system.  I seem to 
> remember I found a way to set names for the sequences as well, but 
> can't find that right now.

That's a good idea which I hadn't thought of. Thanks for that. I don't
need any names, other than for purposes of identifying the sequence
within a multisequence file, which is not necessary with this solution.

> As long as you make sure you don't pass your command through a shell 
> when you launch this from a script then it actually scales pretty well 
> to quite large sequences.

Hmm... isn't there any OS specific limitation to the length of arguments?
But anyway, this is not an issue for me in my case, where sequence
length does not exceed a few hundred symbols.

Best regards, Jan
-- 
 +- Jan T. Kim -------------------------------------------------------+
 |    *NEW*    email: jtk at cmp.uea.ac.uk                               |
 |    *NEW*    WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*


From simon.andrews at bbsrc.ac.uk  Fri Jun  3 14:53:17 2005
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 3 Jun 2005 15:53:17 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
In-Reply-To: <20050603154031.GE25735@jtkpc.cmp.uea.ac.uk>
References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> <94bd127ae352d650997dc4263fab3b8d@bbsrc.ac.uk> <20050603154031.GE25735@jtkpc.cmp.uea.ac.uk>
Message-ID: <297ae8156db03f61d2deb2e786d3bf10@bbsrc.ac.uk>


On 3 Jun 2005, at 16:40, Jan T. Kim wrote:

> On Fri, Jun 03, 2005 at 01:16:58PM +0100, simon andrews wrote:
>> As long as you make sure you don't pass your command through a shell
>> when you launch this from a script then it actually scales pretty well
>> to quite large sequences.
>
> Hmm... isn't there any OS specific limitation to the length of 
> arguments?
> But anyway, this is not an issue for me in my case, where sequence
> length does not exceed a few hundred symbols.

The only limit is imposed when the command is passed through a shell, 
and is then dependent on the shell you're using.  If you can call the 
program without going through a shell then there should be no limit 
(beyond normal OS memory limits).

The method for doing this varies with the language you're writing the 
script in, but for example in Perl:

system ("water -auto asis:gatc asis:gatc stdout")

would pass the arguments through a shell, whereas

system("water", "-auto", "asis:gatc","asis:gatc","stdout")

would not.
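
For scripts written in Python rather than Perl, the same distinction can be
sketched with the standard subprocess module: passing a list of arguments
(and leaving shell=False, the default) starts the program directly, without a
shell. The helper names water_argv and run_water are made up for this note,
and the sequences are written as asis::SEQUENCE, the spelling used elsewhere
in this thread; adjust it if your installation expects the single-colon form
shown above.

```python
import subprocess


def water_argv(seq_a, seq_b):
    """Build the argument list for water; each list item is one argument."""
    return ["water", "-auto", "asis::" + seq_a, "asis::" + seq_b, "stdout"]


def run_water(seq_a, seq_b):
    """Run water without a shell and return its standard output as text.

    Requires water to be on the PATH; raises CalledProcessError if the
    program exits with a non-zero status.
    """
    result = subprocess.run(
        water_argv(seq_a, seq_b),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Note that the operating system still caps the total size of the arguments
handed to a new process (ARG_MAX on POSIX systems), so very long sequences
may need a file after all.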

Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From andrew.warry at bbsrc.ac.uk  Fri Jun  3 15:23:57 2005
From: andrew.warry at bbsrc.ac.uk (andrew warry (BITS))
Date: Fri, 3 Jun 2005 16:23:57 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
Message-ID: <3AED5B0556B73F4A9B556F43384F5C8501857BEC@bitse2knas1.bits.bbsrc.reserved>


>Is there a solution to this? I would really like to avoid the mess of
>temporary files and run water in a clean pipe (pun intended  ;-)  )

Hi
How about :

nthseq x.fasta -number 2 -stdout -auto | water -aseq stdin -bseq x.fasta -stdout -auto

It isn't very neat and does a redundant comparison but it does the job!


Andrew

----------------------------------------------------------------------- 
ANDREW WARRY 
Computational Molecular Biology Support 
BBSRC Bioscience IT services 
West Common                                      
Harpenden                                        
HERTS AL5 2JE
tel: (01582) 714904
fax: (01582) 714901
andrew.warry at bbsrc.ac.uk      
----------------------------------------------------------------------- 



From simon.andrews at bbsrc.ac.uk  Fri Jun  3 15:32:34 2005
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 3 Jun 2005 16:32:34 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
In-Reply-To: <3AED5B0556B73F4A9B556F43384F5C8501857BEC@bitse2knas1.bits.bbsrc.reserved>
References: <3AED5B0556B73F4A9B556F43384F5C8501857BEC@bitse2knas1.bits.bbsrc.reserved>
Message-ID: <53838984cac0240ba7aefe6d33f7810d@bbsrc.ac.uk>


On 3 Jun 2005, at 16:23, andrew warry ((BITS)) wrote:

>
>> Is there a solution to this? I would really like to avoid the mess of
>> temporary files and run water in a clean pipe (pun intended  ;-)  )
>
> Hi
> How about :
>
> nthseq x.fasta -number 2 -stdout -auto | water -aseq stdin -bseq 
> x.fasta
> -stdout -auto
>
> It isn't very neat and does a redundant comparison but it does the job!

But x.fasta still has to appear on the filesystem.  You can't run this 
cleanly in a pipe.

Simon.


From golharam at umdnj.edu  Fri Jun  3 14:57:18 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 03 Jun 2005 10:57:18 -0400
Subject: [EMBOSS] Man pages
Message-ID: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1>

Hi all,

I recently noticed there aren't man pages installed with emboss, but I
thought there were in the past.  Are there man pages available?  If so,
where/how do I get them?

-----
Ryan Golhar
Computational Biologist
The Informatics Institute at
The University of Medicine & Dentistry of NJ

Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam at umdnj.edu


From jtk at cmp.uea.ac.uk  Fri Jun  3 17:18:01 2005
From: jtk at cmp.uea.ac.uk (Jan T. Kim)
Date: Fri, 3 Jun 2005 18:18:01 +0100
Subject: [EMBOSS] Reading Two Sequences from stdin with water
In-Reply-To: <1977.198.161.30.152.1117807743.squirrel@webmail.ebi.ac.uk>
References: <20050603125335.GA25735@jtkpc.cmp.uea.ac.uk> <1977.198.161.30.152.1117807743.squirrel@webmail.ebi.ac.uk>
Message-ID: <20050603171801.GF25735@jtkpc.cmp.uea.ac.uk>

On Fri, Jun 03, 2005 at 03:09:03PM +0100, pmr at ebi.ac.uk wrote:
> Jan T. Kim writes:
> > is it possible to read both input sequences to a pairwise alignment
> > from one input stream?
> >
> >     cat x.fasta | water -asequence fasta::stdin:seq1 -bsequence
> > fasta::stdin:seq2 -outfile stdout -auto
> >
> > gives
> >
> >    EMBOSS An error in ajfile.c at line 1926:
> > Error reading from file 'stdin'
> >
> > It may well be that water consumes the entire input stream on getting the
> > first sequence, thus rendering itself unable to acquire the second one.
> >
> > Is there a solution to this? I would really like to avoid the mess of
> > temporary files and run water in a clean pipe (pun intended  ;-)  )
> 
> EMBOSS will only cleanly read stdin as one input. We should probably trap
> that internally and give an error if we find stdin opening again. I wonder
> whether there is any useful way to share the stdin filebuffer. Hmmmm... in
> the early days of EMBOSS we decided not to allow it, but it could be worth
> a try. You would still be in trouble if you tried to read the second
> sequence first though.

Conceptually, this could be cleanly handled (which is why I tried in
the first place), by having the function for obtaining the input sequences
determine the source files in a first pass of the list of sources, and
then obtain all requested sequences that come from the same file in one
go through that file. This could be applied to the standard input just
as to any other file.

However, if the current code acquires the two sequences one after the
other and independently of each other, changing that will require a
possibly non-trivial rewrite -- likely, the API for obtaining a
sequence specified by a USA would have to be extended such that multiple
sequences can be obtained from one file in one pass through that file,
and some functions to group lists of USAs into sublists of USAs that
refer to the same file would have to be provided.
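
To make the idea concrete, here is a rough Python sketch of the two steps:
grouping USAs by source file, then collecting all requested sequences in one
pass. It says nothing about how the EMBOSS library is actually organised;
all function names are made up, only USAs of the form format::file:id are
parsed, and only FASTA input is read.

```python
from collections import defaultdict


def parse_usa(usa):
    """Split 'format::file:id' into (format, file, id). Other forms are not handled."""
    fmt, rest = usa.split("::", 1)
    filename, seq_id = rest.rsplit(":", 1)
    return fmt, filename, seq_id


def group_by_file(usas):
    """First pass over the sources: map (format, file) to the IDs wanted from it."""
    groups = defaultdict(list)
    for usa in usas:
        fmt, filename, seq_id = parse_usa(usa)
        groups[(fmt, filename)].append(seq_id)
    return dict(groups)


def read_fasta_once(lines, wanted):
    """Single pass over FASTA lines, keeping only the records whose ID is wanted."""
    found = {}
    current = None  # ID of the wanted record being read, if any
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            current = seq_id if seq_id in wanted else None
            if current is not None:
                found[current] = []
        elif current is not None and line:
            found[current].append(line)
    return {seq_id: "".join(chunks) for seq_id, chunks in found.items()}
```

Because the file is scanned once and every wanted record is kept, the order in
which the sequences are requested does not matter, and a non-seekable input
such as stdin is no worse off than a regular file.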

> Assuming your x.fasta file has only seq1 and seq2 in that order, reading
> seq1 will continue until the first line of seq2 is reached. By then it
> would be too late for seq2 to be read cleanly.

Well, the approach outlined above does not have that limitation, and
it also works for interleaved sequence formats. But if the EMBOSS
internals are as I assume above, it's clear to me that this is something
for the long-term wishlist.

> At least you have fasta:: specified - with no specified format, EMBOSS has
> to read a long way into the input just to check whether it is really GCG
> format.

Yes, heuristic format determination and non-seekable inputs don't mix
too well generally...

> As for the asis format, I suppose an EMBOSS utility that reads x.fasta and
> outputs asis::ctagtacgatgcgatcg asis::tgatcgatggctacgtagc would be useful
> to you - then you could put `sillyname x.fasta` in your command line... at
> least until the command line gets too long. Hard to preserve the ID and
> description of the sequences though.

Yes -- in my case, I have the sequences available within a Python script
anyway, so the asis approach works fine for me (even with a popen
facility that goes through a shell -- I'll have to check how to eliminate
that for future occasions where sequences may be too long for the
command line, though).

> "If you think water is pure, just remember what fish do in it."

I like to boil my water, adding an all-natural disinfectant known as
"coffee" for this reason...  ;-)

Best regards, Jan
-- 
 +- Jan T. Kim -------------------------------------------------------+
 |    *NEW*    email: jtk at cmp.uea.ac.uk                               |
 |    *NEW*    WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*


From robin at hms.harvard.edu  Fri Jun  3 16:30:33 2005
From: robin at hms.harvard.edu (Robin Colgrove)
Date: Fri, 3 Jun 2005 12:30:33 -0400
Subject: [EMBOSS] Man pages in multiple languages?
In-Reply-To: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1>
References: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1>
Message-ID: <f4b57570c2458cbe48e5a2fd1468a787@hms.harvard.edu>


Hello all,

are there EMBOSS man pages in other languages than English?

Mandarin and Spanish in particular would help around here.

thanks

robin colgrove
Harvard Medical School


From pmr at ebi.ac.uk  Fri Jun  3 17:14:08 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 3 Jun 2005 18:14:08 +0100 (BST)
Subject: [EMBOSS] Man pages in multiple languages?
In-Reply-To: <f4b57570c2458cbe48e5a2fd1468a787@hms.harvard.edu>
References: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1>
    <f4b57570c2458cbe48e5a2fd1468a787@hms.harvard.edu>
Message-ID: <2398.198.161.30.152.1117818848.squirrel@webmail.ebi.ac.uk>

Hi Robin,

> are there EMBOSS man pages in other languages than English?
>
> Mandarin and Spanish in particular would help around here.

We don't have man pages exactly. We have a text version of the online
documentation, with the "tfm" program to display to the screen.

To find out why it is called tfm, you can use the command:

tfm tfm

Of course, it prints "The F(antastic) Manual" as in "RTFM"

For other languages, there may be something out there. We are aware of a
Japanese user group that has translated much of the EMBOSS materials. I am
sure there are Mandarin speakers who could create a Mandarin version -
though on the first ever EMBOSS course (in Beijing) there was a vote
against creating a Mandarin version of the commandline.

Hope this helps,

Peter Rice


From luojc at plum.lsc.pku.edu.cn  Sat Jun  4 01:15:37 2005
From: luojc at plum.lsc.pku.edu.cn (Jingchu Luo)
Date: Sat, 4 Jun 2005 09:15:37 +0800 (CST)
Subject: [EMBOSS] Man pages in multiple languages?
In-Reply-To: <2398.198.161.30.152.1117818848.squirrel@webmail.ebi.ac.uk>
Message-ID: <Pine.LNX.4.44.0506040817320.25760-100000@plum.lsc.pku.edu.cn>

> I am sure there are Mandarin speakers who could create a Mandarin
> version - though on the first ever EMBOSS course (in Beijing) there was
> a vote against creating a Mandarin version of the commandline.

We were running an EMBnet bioinformatics workshop in April 1999. Peter 
gave a talk about EMBOSS. It might be useful to have a user manual and/or 
documentation in Chinese for the Chinese user group. We'll see if anyone 
in mainland China has been working on this already. 

Jingchu
-------
Jingchu Luo
Centre of Bioinformatics
Peking University
Beijing 100871, China
Tel: 86-10-6275-7281
Fax: 86-10-6275-9001
Email: luojc at pku.edu.cn
URL: http://www.cbi.pku.edu.cn 


From d.gatherer at vir.gla.ac.uk  Wed Jun 15 10:31:33 2005
From: d.gatherer at vir.gla.ac.uk (Derek Gatherer)
Date: Wed, 15 Jun 2005 11:31:33 +0100
Subject: [EMBOSS] seqret options
Message-ID: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk>

Dear EMBOSSers

I'm trying to write a pipeline to take a load of paired, aligned homologues 
from 2 species and submit them sequentially to the yn00 application from 
the well known PAML package.  PAML's applications all take PHYLIP 
format.  I can easily make this by looping over:

seqret -auto -osformat phylip infile -out outfile

However, PAML requires that the flag "I" be placed on the top line of the 
phylip format to indicate interleaved, eg:

  2 663 I
c-barf1  ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
barf1     ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC

           CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
           CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT

rather than the standard phylip format, given by seqret:

  2 663
c-barf1   ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
barf1     ATGGCCAGGT TCATCGCTCA GCTCCTCCTG TTGGCCTCCT GTGTGGCCGC

           CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
           CGGCCAGGCT GTCACCGCTT TCTTGGGTGA GCGAGTCACC CTGACCTCCT

I could write a script to open each seqret output file and add this 
character to the top line of each, but before I dive into this, I'd like to 
know if there is any flag I can add to seqret to get the "I" added 
automatically.

Failing that, PAML takes the other, non-interleaved phylip format 
("sequential") by default, and that would not require any flag 
insertion.  Seqret also can produce this (using -osformat phylip3):

1 663 YF
c-barf1 ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
           CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
           ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA

but then PAML won't read it because it doesn't like the YF flags inserted 
by seqret!!

So I either have to script to remove flags from sequential or insert them 
in interleaved, unless seqret has a solution.
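
The scripted workaround described above could look roughly like the following. This is a minimal sketch, assuming the header is always the first line and holds whitespace-separated fields (sequence count, alignment length, then optional flags). The function names are hypothetical and are not part of EMBOSS or PAML.

```python
def set_phylip_flags(text, flags=""):
    """Return PHYLIP text whose header line carries exactly `flags`."""
    header, sep, rest = text.partition("\n")
    fields = header.split()
    if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
        raise ValueError("not a PHYLIP header: %r" % header)
    # Keep whatever indentation the writer used for the header line
    indent = header[: len(header) - len(header.lstrip())]
    new_fields = fields[:2] + ([flags] if flags else [])
    return indent + " ".join(new_fields) + sep + rest


def add_interleaved_flag(text):
    """Mark an interleaved file with the "I" flag that PAML expects."""
    return set_phylip_flags(text, "I")


def strip_header_flags(text):
    """Drop flags such as "YF" from the header of a sequential file."""
    return set_phylip_flags(text, "")
```

Either helper can be applied to the contents of each seqret output file before it is handed to yn00; only the first line is changed.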

All assistance gratefully appreciated
Derek


From David.Bauer at SCHERING.DE  Wed Jun 15 11:19:55 2005
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Wed, 15 Jun 2005 13:19:55 +0200
Subject: Antwort: [EMBOSS] seqret options
Message-ID: <OFA27F3B1C.3EC36BC7-ONC1257021.003CD8F5-C1257021.003E3FCC@schering.net>


Hi Derek,

you can easily change this in the source code.
The sequence output formats are defined in ajax/ajseqwrite.c
In the function seqWritePhylip3 you find a line:
ajFmtPrintF(outseq->File, "1 %d YF\n", ilen);
Here you can just delete the YF and recompile emboss.

David.




From pmr at ebi.ac.uk  Wed Jun 15 12:23:48 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 15 Jun 2005 13:23:48 +0100
Subject: [EMBOSS] seqret options
In-Reply-To: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk>
References: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk>
Message-ID: <42B01DD4.8050303@ebi.ac.uk>

Derek Gatherer wrote:

> Dear EMBOSSers
> 
> I'm trying to write a pipeline to take a load of paired, aligned 
> homologues from 2 species and submit them sequentially to the yn00 
> application from the well known PAML package.  PAML's applications all 
> take PHYLIP format.

> Failing that, PAML takes the other, non-interleaved phylip format 
> ("sequential") by default, and that would not require any flag 
> insertion. 

Last time I worked through the PHYLIP formats (for EMBOSS 2.10.0) I found 
Phylip had changed the format it used.

One change was that I removed the YF from phylip3 format because phylip was no 
longer using it - so updating to EMBOSS 2.10.0 will solve your non-interleaved 
format problem (and David Bauer's code fix is exactly what you need).

Any more feedback on the variations of phylip formats that other packages use 
would be a great help!

We will be releasing the PHYLIP 3.6 integration (as a PHYLIPNEW EMBASSY 
package) soon and expect to see more use of phylogenetics packages with EMBOSS.

regards,

Peter Rice


From d.gatherer at vir.gla.ac.uk  Wed Jun 15 12:44:46 2005
From: d.gatherer at vir.gla.ac.uk (Derek Gatherer)
Date: Wed, 15 Jun 2005 13:44:46 +0100
Subject: [EMBOSS] seqret options
In-Reply-To: <42B01DD4.8050303@ebi.ac.uk>
References: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk>
 <42B01DD4.8050303@ebi.ac.uk>
Message-ID: <6.2.1.2.1.20050615134121.02addff8@lenzie.gla.ac.uk>

I do have 2.10.0:

[gath01d at gamma seqs]$ seqret -osformat phylip3 barf1_both.seq
Reads and writes (returns) sequences
Output sequence [c-barf1.phylip3]: barf1.phylip3
[gath01d at gamma seqs]$ more barf1.phylip3
1 663 YF
c-barf1ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
           CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
           ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA
[gath01d at gamma seqs]$ embossversion
Writes the current EMBOSS version number
2.10.0

Anyway, I know how to do the code fix now, so thanks to all.

Cheers
Derek



From pmr at ebi.ac.uk  Wed Jun 15 12:49:59 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 15 Jun 2005 13:49:59 +0100
Subject: [EMBOSS] seqret options
In-Reply-To: <6.2.1.2.1.20050615134121.02addff8@lenzie.gla.ac.uk>
References: <6.2.1.2.1.20050615111255.02adcf50@lenzie.gla.ac.uk> <42B01DD4.8050303@ebi.ac.uk> <6.2.1.2.1.20050615134121.02addff8@lenzie.gla.ac.uk>
Message-ID: <42B023F7.7010808@ebi.ac.uk>

Derek Gatherer wrote:
> I do have 2.10.0:
> 
> [gath01d at gamma seqs]$ seqret -osformat phylip3 barf1_both.seq
> Reads and writes (returns) sequences
> Output sequence [c-barf1.phylip3]: barf1.phylip3
> [gath01d at gamma seqs]$ more barf1.phylip3
> 1 663 YF
> c-barf1ATGGCCAGGC TTTTCGCTCA GCTGCTCCTG CTCGCGGGCT CCGTCGCCTC
>           CTGCCTGGCC GTCACCGCCT TTGTGGGTGA GCGGGCCGTC CTGAGTTCCT
>           ACTGGAAGAG GGTGAGCCTA GGGCCCGAGA TCATGGTGGA ATGGTTCAAA
> [gath01d at gamma seqs]$ embossversion
> Writes the current EMBOSS version number
> 2.10.0

Oops ... make that "will be in 3.0.0" in that case ... it worked for me :-)

regards,

Peter


From d.gatherer at vir.gla.ac.uk  Wed Jun 15 13:25:36 2005
From: d.gatherer at vir.gla.ac.uk (Derek Gatherer)
Date: Wed, 15 Jun 2005 14:25:36 +0100
Subject: [EMBOSS] seqret again
Message-ID: <6.2.1.2.1.20050615142150.02afda58@lenzie.gla.ac.uk>

Is this a bug?  Compare the following output from seqret when phylip and 
phylip3 are specified.  Shouldn't the first line of the phylip3 output be 
"2 546 YF" and not "1 546" ?

[gath01d at gamma EBV]$ seqret -osformat phylip seqs/balf1.both
Reads and writes (returns) sequences
Output sequence [c-balf1.phylip]: seqs/balf1.phylip
[gath01d at gamma EBV]$ more seqs/balf1.phylip
  2 546
c-balf1.seATGCAGCCAG CCAAGTCTAC CGATTCGGTG TTTGTGAGGA CCCCGGTCGA
balf1.seq ATGAGGCCAG CCAAGTCTAC AGATTCTGTG TTTGTGAGGA CCCCGGTCGA

           GGCGTGGGTC TCACCCTCGC CCCCGGACGA CAAAGTGGCA GAGACCAGCT
           GGCGTGGGTC GCGCCCTCGC CGCCGGACGA CAAGGTGGCT GAGTCCAGCT
[snip]

[gath01d at gamma EBV]$ seqret -osformat phylip3 seqs/balf1.both
Reads and writes (returns) sequences
Output sequence [c-balf1.phylip3]: seqs/balf1.phylip3
[gath01d at gamma EBV]$ more seqs/balf1.phylip3
1 546 YF
c-balf1.seATGCAGCCAG CCAAGTCTAC CGATTCGGTG TTTGTGAGGA CCCCGGTCGA
           GGCGTGGGTC TCACCCTCGC CCCCGGACGA CAAAGTGGCA GAGACCAGCT
           ACCTCCTGTT CAGGGCCCTA TACGCTGTGT TCACCCAGGA CGAGACGGAC
           CTGCCTCTAC CGGCCCTGGT CATGTGCCGG CTCCTGAAGG CCTCCCTGAG

[snip]
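
The mismatch reported above (a header that declares 1 sequence for a two-sequence input) can be detected mechanically. The following is a minimal sketch, assuming that each sequence record starts in column 1 and that continuation lines are indented, as in the seqret output shown. The function name is hypothetical.

```python
def declared_and_found(text):
    """Return (count declared in the header, count of records found)."""
    lines = text.splitlines()
    declared = int(lines[0].split()[0])
    # A record starts in column 1; continuation lines begin with whitespace
    found = sum(1 for line in lines[1:] if line and not line[0].isspace())
    return declared, found
```

A file is suspect whenever the two numbers differ.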


From pmr at ebi.ac.uk  Wed Jun 15 13:35:57 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 15 Jun 2005 14:35:57 +0100
Subject: [EMBOSS] seqret again
In-Reply-To: <6.2.1.2.1.20050615142150.02afda58@lenzie.gla.ac.uk>
References: <6.2.1.2.1.20050615142150.02afda58@lenzie.gla.ac.uk>
Message-ID: <42B02EBD.4040800@ebi.ac.uk>

Derek Gatherer wrote:

> Is this a bug?  Compare the following output from seqret when phylip and 
> phylip3 are specified.  Shouldn't the first line of the phylip3 output 
> be "2 546 YF" and not "1 546" ?
> [gath01d at gamma EBV]$ seqret -osformat phylip3 seqs/balf1.both
> Reads and writes (returns) sequences
> Output sequence [c-balf1.phylip3]: seqs/balf1.phylip3
> [gath01d at gamma EBV]$ more seqs/balf1.phylip3
> 1 546 YF
> c-balf1.seATGCAGCCAG CCAAGTCTAC CGATTCGGTG TTTGTGAGGA CCCCGGTCGA
>           GGCGTGGGTC TCACCCTCGC CCCCGGACGA CAAAGTGGCA GAGACCAGCT
>           ACCTCCTGTT CAGGGCCCTA TACGCTGTGT TCACCCAGGA CGAGACGGAC
>           CTGCCTCTAC CGGCCCTGGT CATGTGCCGG CTCCTGAAGG CCTCCCTGAG

Yes. Fixed in the next release (and in the current CVS code).

Fixed as in "2 546" without the YF.

Do any programs require the YF?

Peter


From kertib at linuxlap.hu  Wed Jun 15 14:13:44 2005
From: kertib at linuxlap.hu (Kerti Balazs Gabor)
Date: Wed, 15 Jun 2005 16:13:44 +0200
Subject: [EMBOSS] Install error (AMD64)
Message-ID: <42B03798.1060204@linuxlap.hu>

Hello!

I would like to install emboss (latest version) from source. The host OS 
is Fedora Linux Core 4 (2.6.11-1.1369_FC4 #1 Thu Jun 2 22:56:33 EDT 2005 
x86_64 x86_64 x86_64 GNU/Linux).
The script
$ configure --enable 64
ran cleanly, but
make
failed with this error:

/bin/sh ../libtool --tag=CC --mode=link gcc  -O2   -o aaindexextract 
aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la 
../ajax/libajax.la ../plplot/libplplot.la -lX11  -lm
mkdir .libs
gcc -O2 -o .libs/aaindexextract aaindexextract.o 
../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so 
../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm 
-Wl,--rpath -Wl,/usr/local/lib
/usr/bin/ld: cannot find -lX11
collect2: ld returned 1 exit status
make[2]: *** [aaindexextract] Error 1
make[2]: Leaving directory `/usr/src/EMBOSS-2.10.0/emboss'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/src/EMBOSS-2.10.0/emboss'
make: *** [all-recursive] Error 1
[root at localhost EMBOSS-2.10.0]#

How can I solve this? Which package(s) do I need?

Balazs


From ableasby at hgmp.mrc.ac.uk  Wed Jun 15 14:26:39 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Wed, 15 Jun 2005 15:26:39 +0100 (BST)
Subject: [EMBOSS] Install error (AMD64)
Message-ID: <200506151426.j5FEQduS029156@bromine.hgmp.mrc.ac.uk>

Dear Balazs,

You need to install the  xorg-x11-devel RPM, 'make clean' and do
the configure step again.

Also, there is no need to define --enable64 unless you
expect 'user space' applications to consume more than
4Gb of internal memory.

HTH

Alan Bleasby
RFCGR/HGMP (for the next month and a half)


From aengus.stewart at cancer.org.uk  Wed Jun 15 15:46:07 2005
From: aengus.stewart at cancer.org.uk (Aengus Stewart)
Date: Wed, 15 Jun 2005 16:46:07 +0100
Subject: [EMBOSS] 3.0.0
Message-ID: <42B04D3F.7020405@cancer.org.uk>


Will the ceremonial release of 3.0.0 into the wild be at ISMB?

In other words, soon? :-)


Regards
Aengus


-- 
-----------------------------------------------------------------------
Aengus Stewart
Group Leader
<GROUP NAME GOES HERE>                         Tel: +44 (0)20 7269 3679
Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK
-----------------------------------------------------------------------



From ableasby at hgmp.mrc.ac.uk  Wed Jun 15 16:44:18 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Wed, 15 Jun 2005 17:44:18 +0100 (BST)
Subject: [EMBOSS] 3.0.0
Message-ID: <200506151644.j5FGiI8T009556@bromine.hgmp.mrc.ac.uk>

Well, we always like to try to release on St Swithin's Day; that
date is normally before ISMB, but this year it isn't.

EMBOSS will feature at ISMB in all the usual places (BOSC, poster,
demo and maybe BOF) and the soon-to-be-released 3.0.0 will
certainly be mentioned there.

Alan


From golharam at umdnj.edu  Wed Jun 15 19:01:54 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 15 Jun 2005 15:01:54 -0400
Subject: [EMBOSS] EMBOSS-GUI
Message-ID: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>

Does anyone know if any work is being done on EMBOSS-GUI by Luke
McCarthy?  The web site doesn't seem to be active and looks out of date.

If a new version isn't being worked on, I'd like to volunteer to help
maintain it for v3.0.0.  It's such a simple and clean interface.  I
haven't found anything else like it.

Ryan


From andrespinzon at gmail.com  Wed Jun 15 20:14:27 2005
From: andrespinzon at gmail.com (Andres Pinzon)
Date: Wed, 15 Jun 2005 15:14:27 -0500
Subject: [EMBOSS] EMBOSS-GUI
In-Reply-To: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>
References: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>
Message-ID: <8968fc7e0506151314772f91f0@mail.gmail.com>

2005/6/15, Ryan Golhar <golharam at umdnj.edu>:
> Does anyone know if any work is being done on EMBOSS-GUI by Luke
> McCarthy.  The web site doesn't seem to be active and out-of-date.
> 
> If a new version isn't being worked on, I'd like to volunteer to help
> maintain it for v3.0.0.  Its such a simple and clean interface.  I
> haven't found anything else like it.

If you need help maintaining it, please ask me! ;-)
I really liked that interface too.


-- 
---------
Andrés Pinzón [http://www.andrespinzon.com]   
Centro de Bioinformatica, Instituto de Biotecnologia
http://bioinf.ibun.unal.edu.co
Universidad Nacional de Colombia
tel. 3165000 ext. 16961   
GNU/Linux user number 349752
----------


From lukem at gene.pbi.nrc.ca  Wed Jun 15 19:49:23 2005
From: lukem at gene.pbi.nrc.ca (Luke McCarthy)
Date: Wed, 15 Jun 2005 13:49:23 -0600
Subject: [EMBOSS] EMBOSS-GUI
In-Reply-To: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>
References: <000c01c571dc$b1daaf90$e6028a0a@GOLHARMOBILE1>
Message-ID: <1118864963.13749.8.camel@incognito.invalid>

On Wed, 2005-06-15 at 13:01, Ryan Golhar wrote:
> Does anyone know if any work is being done on EMBOSS-GUI by Luke
> McCarthy.  The web site doesn't seem to be active and out-of-date. 
> 
> If a new version isn't being worked on, I'd like to volunteer to help
> maintain it for v3.0.0.  Its such a simple and clean interface.  I
> haven't found anything else like it.

I have developed a new version and moved the code to sourceforge
(http://sourceforge.net/projects/embossgui/).  Since February, the only
remaining step has been to wrap it up in a releasable format, but I just
haven't found the time.

I had considered waiting until the 3.0.0 release of EMBOSS, but if
there's interest now I'll do my best to get it out there sooner.

Cheers,

Luke


From golharam at umdnj.edu  Thu Jun 16 15:10:49 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 16 Jun 2005 11:10:49 -0400
Subject: [EMBOSS] EMBOSS-GUI
In-Reply-To: <1118866913.13749.12.camel@incognito.invalid>
Message-ID: <002201c57285$95084090$e6028a0a@GOLHARMOBILE1>

The release for EMBOSS 3.0.0 is around July 15th?  If so, I can wait for
embossgui until then.  If you need any help with embossgui, please let
me know.  I'd be more than happy to contribute what I can.

Ryan




From msarachu at biol.unlp.edu.ar  Thu Jun 16 19:41:23 2005
From: msarachu at biol.unlp.edu.ar (Martin Sarachu)
Date: Thu, 16 Jun 2005 16:41:23 -0300
Subject: [EMBOSS] Masking the : character?
Message-ID: <42B1D5E3.1000503@biol.unlp.edu.ar>

Dear list,

is there any way to mask the ':' character so it is not interpreted as a 
delimiter for DB:sequence?
I have this file

/home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf

and when I run infoseq I get this error

$ infoseq 
/home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf
Displays some simple information about sequences
Error: failed to open filename 
'/home/embtest/wProjects/test/.clustal.05.06.15'
Error: Unable to read sequence 
'/home/embtest/wProjects/test/.clustal.05.06.15:17.46.27/ops2_drome.msf'
Died: infoseq terminated: Bad value for '-sequence' and no prompt


Thanks in advance,

Martin

-- 
Martin Sarachu
msarachu at biol.unlp.edu.ar
AR.EMBnet
http://www.ar.embnet.org


From yezhiqiang at gmail.com  Sat Jun 18 09:28:16 2005
From: yezhiqiang at gmail.com (yezhiqiang at gmail.com)
Date: Sat, 18 Jun 2005 17:28:16 +0800
Subject: [EMBOSS] Masking the : character?
In-Reply-To: <42B1D5E3.1000503@biol.unlp.edu.ar>
References: <42B1D5E3.1000503@biol.unlp.edu.ar>
Message-ID: <34198fe4050618022825238622@mail.gmail.com>

I have also run into this. Neither escaping the colon as \: nor quoting
the name solves the problem.

But why not just rename your file? That should be no trouble.




From yezhiqiang at gmail.com  Sat Jun 18 09:50:50 2005
From: yezhiqiang at gmail.com (yezhiqiang at gmail.com)
Date: Sat, 18 Jun 2005 17:50:50 +0800
Subject: [EMBOSS] Man pages
In-Reply-To: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1>
References: <004501c5684c$89f2ef90$e6028a0a@GOLHARMOBILE1>
Message-ID: <34198fe405061802504ace851@mail.gmail.com>

EMBOSS has its own manual system: tfm

Try it like this:
wossname seqret  
tfm seqret


2005/6/3, Ryan Golhar <golharam at umdnj.edu>:
> Hi all,
> 
> I recently noticed there aren't man pages installed with emboss, but I
> thought there were in the past.  Are there man pages available?  If so,
> where/how do I get them?
> 
> -----
> Ryan Golhar
> Computational Biologist
> The Informatics Institute at
> The University of Medicine & Dentistry of NJ
> 
> Phone: 973-972-5034
> Fax: 973-972-7412
> Email: golharam at umdnj.edu
> 
>


From jrvalverde at cnb.uam.es  Mon Jun 20 08:55:20 2005
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Mon, 20 Jun 2005 10:55:20 +0200
Subject: [EMBOSS] Multiplatform filenames (was Re: Masking the : character?)
In-Reply-To: <34198fe4050618022825238622@mail.gmail.com>
References: <42B1D5E3.1000503@biol.unlp.edu.ar>
	<34198fe4050618022825238622@mail.gmail.com>
Message-ID: <20050620105520.736fef76.jrvalverde@cnb.uam.es>

On Sat, 18 Jun 2005 17:28:16 +0800
<yezhiqiang at gmail.com> wrote:
> I have also found this.
> and \:  or using quote cannot solve this problem.
> 
> But why not just rename your file name? It doesn't bother.
> 
> 
> 2005/6/17, Martin Sarachu <msarachu at biol.unlp.edu.ar>:
> > Dear list,
> > 
> > is there any way to mask the ':' character so it is not interpreted as a
> > delimiter for DB:sequence?

	Renaming. 
	---------

Or in other words (caution, detailed explanation follows):

    Why should anybody have a database or db. file named something\ or 
something\\\?

But the fact is that by Unix filesystem semantics that is allowed. So,
there is no easy way to avoid the ':' problem as one must accommodate
this. Especially since :: is also meaningful to EMBOSS. One should introduce
the notion of a special escape metacharacter or a quotation method, and
while at it, it should integrate easily with shells... meaning that it 
should not be pre-processed by the shell (e.g. 'file:name' would come out
of the shell as file:name, the user would need to type "'file:name'" or
some other such horrible combination to escape shell quotations too).

The problem arises because the ':' is used for historic reasons as a
carry-over from VMS where it had special meaning on pathnames. This 
does not hold on UNIX where it is a legit character (actually ANY char
but '/' and NULL is a legit character on UNIX). This is important as
EMBOSS may be used on many locales, and you don't know in advance
how a given symbol will be represented on them. Freedom comes at a 
cost.

QUICK SOLUTION
--------------
I think that for the user it is simpler to know that ':' has a special
meaning and should be avoided.

For the cases where the colon is generated automatically, it may be better
to provide a renaming script that changes the colon to something else.
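
A renaming script of the kind suggested above might look like this. It is a minimal sketch, assuming that replacing every ':' with '_' is acceptable and that the files live on a filesystem that allows colons in names. The function names and the choice of '_' are illustrative only.

```python
import os


def colon_free_name(name, replacement="_"):
    """Return `name` with every ':' replaced."""
    return name.replace(":", replacement)


def rename_colon_paths(root, replacement="_"):
    """Rename files and directories below `root` whose names contain ':'."""
    renamed = []
    # Walk bottom-up so that children are renamed before their parent
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            if ":" not in name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, colon_free_name(name, replacement))
            if os.path.exists(new):
                # Refuse to overwrite an existing file or directory
                raise FileExistsError(new)
            os.rename(old, new)
            renamed.append((old, new))
    return renamed
```

The root directory itself is left alone; only entries below it are renamed. The returned list of (old, new) pairs can be kept if the original names need to be restored later.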


UI 'PRO' APPROACH
-----------------
For GUI writers it is probably better to "translate" any such filenames
between the user and EMBOSS. Note the quotes around translate above: it
is not immediate. Let me explain:

	Escaping for the *command line* must be done using some character 
that is a) meaningful (but those are mostly already taken) and b) easy 
to type on a keyboard. In any case, this means that the user must be aware
of the special case, and if so, renaming is just as good a solution.

	Escaping for the GUI removes all conditions and gives you full
freedom. There are useful tricks to use special quoting/escaping chars
on GUIs (hint: look into ASCII 0-31), but translating filenames can NOT
be done transparently to the user (unless you can guarantee yours is
the only user interface they will use). Any translation will change
the filename and make it look different or even untypable on other
interfaces.

	Note that the problem still remains of distinguishing when a
pathname containing a colon is an actual filename and not a database:file
specification automatically. On a GUI you may assume a :-containing path
is a filename when you are tagging uploaded data or program generated
data, but otherwise you should be cautious, highly cautious. I.e. does
swiss:prot_human refer to the database entry or to the data the user
uploaded and called that way? Is it possible someone has called their
database 'sequencer_files' locally, and if so, how do you distinguish the 
local database of sequencer files from the user's batch of sequencer_files:*
uploaded sequences?

	Assuming you can tell, then read on:

	The trick is to create a special hidden directory on each user
directory accessed: e.g. .myGUI-names. Then for every file make a
suitably processed symlink on that subdirectory and call emboss through
the symlink, sort of:

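	my_gui_store_file(filename)
	{
		save(filename);
		/* the link lives in the hidden directory under a colon-free name */
		sym = concatenate(".myGUI-names/", process(filename));
		make_symlink(filename, sym);	/* sym points at filename */
	}

	my_gui_emboss_access_file(filename)
	{
		sym = concatenate(".myGUI-names/", process(filename));
		if (!file_exists(sym))
			make_symlink(filename, sym);
		emboss_access(sym);	/* EMBOSS only sees the colon-free name */
	}

	process(filename)
	{
		copy = duplicate(filename);	/* leave the original name intact */
		for (p = copy; *p; p++)
			if (*p == ':')
				*p = SUB; /* e.g. ASCII 0x1A */
		return copy;
	}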

And off you go. Why the <SUB>? You should try to substitute the colon by
something that is guaranteed to be portable. You only have either a) the
portable character set (which is all typable) or b) the control character
set (ASCII 0-31) which you may assume will be available everywhere, and
most probably not used in filenames as they are very difficult to type or
use by hand in general. From these we had better avoid NUL, BEL, BS, HT, LF,
VT, FF, CR and ESC just in case. But we still have plenty to choose from:
SUB (substitute), CAN (cancel), DLE (data link escape) have good mnemonics 
for escaping and STX (start of text) and ETX (end of text) 
for quoting, but these are only suggestions.

That is to say: in the example above we substituted : by <SUB>, because
we only care about this special case. If there were more cases, then full
escaping/quoting might be needed, and then instead we would copy the
filename into a new string and fully quote/escape. 

I suggest the substitution approach since we are doing the encoding *within* 
the file name: anything else (quoting/escaping) will introduce additional 
chars inside the filename and this will reduce the available filename length, 
hence making it less transparent and potentially dangerous (should there by 
any chance be two filenames at the length limit containing an escapable
sequence and differing only in the last char).

Alternately one may use a hash of the filename instead, but this is more
painful to code, maintain and debug and potentially more wasteful in terms
of space.

Now, the original filenames are in place, and available for the command
line, up/downloads, other user interfaces, etc.. to manage as they wish,
but your GUI is no longer haunted by the infamous colon.
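
For readers who prefer something runnable, the symlink idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the hidden directory name `.myGUI-names` and the SUB character come from the text above, while the function names are hypothetical. It handles a colon in the file name only; a colon in a parent directory name would still appear in the returned path.

```python
import os

SUB = "\x1a"  # ASCII SUB, the stand-in for ':' suggested in the text


def link_name(filename):
    """Return the colon-free name used for the symlink."""
    return filename.replace(":", SUB)


def emboss_safe_path(path, link_dir=".myGUI-names"):
    """Return a symlink path with a colon-free file name pointing at `path`."""
    directory, filename = os.path.split(os.path.abspath(path))
    hidden = os.path.join(directory, link_dir)
    os.makedirs(hidden, exist_ok=True)
    sym = os.path.join(hidden, link_name(filename))
    if not os.path.lexists(sym):
        # A relative target keeps the link valid if the tree is moved
        os.symlink(os.path.join(os.pardir, filename), sym)
    return sym
```

The GUI would pass the returned path to EMBOSS instead of the original one, while the original file stays in place under its own name.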

Symlinks on UNIX eat very little space: usually just the directory
entry. If space is very tight and becomes a concern you may consider
either hardlinks or only symlinking special filenames (this last at
the cost of additionally complex logic). With current hard disks I
wouldn't worry.

And, yes, I know this involves many more changes to a UI, but either
users accommodate (by avoiding the colon) or the UI does (by hiding
limitations).

Actually, a similar trick is used by NetATalk, AppleTalk, MacOS X 
and other systems that have similar metadata problems.

				j


From pmr at ebi.ac.uk  Mon Jun 20 09:16:35 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 20 Jun 2005 10:16:35 +0100
Subject: [EMBOSS] Multiplatform filenames (was Re: Masking the : character?)
In-Reply-To: <20050620105520.736fef76.jrvalverde@cnb.uam.es>
References: <42B1D5E3.1000503@biol.unlp.edu.ar>	<34198fe4050618022825238622@mail.gmail.com> <20050620105520.736fef76.jrvalverde@cnb.uam.es>
Message-ID: <42B68973.7090105@ebi.ac.uk>

José R. Valverde wrote:

>>2005/6/17, Martin Sarachu <msarachu at biol.unlp.edu.ar>:
>>>is there any way to mask the ':' character so it is not interpreted as a
>>>delimiter for DB:sequence?

> The problem arises because the ':' is used for historic reasons as a
> carry-over from VMS where it had special meaning on pathnames. This 
> does not hold on UNIX where it is a legit character (actually ANY char
> but '/' and NULL is a legit character on UNIX). This is important as
> EMBOSS may be used on many locales, and you don't know in advance
> how a given symbol will be represented on them. Freedom comes at a 
> cost.

Strictly speaking, the problem arises because ':' has become a standard for 
bioinformatics users - though, yes, VMS was the source of the special syntax. 
It was adopted by, among others, GCG and SRS. It is also used, of course, in 
URN and URL syntax.

However, in this case there is a partial solution. Only alphanumeric 
characters are allowed in EMBOSS database names, and they must be more than 
one character in length (to avoid clashing with C: on Windows systems).

The problem posted was not in a database name. It was the filename:id syntax, 
where a ':' appeared in the filename full path.

For a ':' in a directory name (not in the filename) we could try to catch it 
by not allowing '/' in the ID. However, that can run into problems. For 
example, PFAM uses '/' in the identifier of a sequence derived from a longer 
entry.
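
To make the database-name rule concrete, here is a rough Python sketch. It
only illustrates the rule as stated above and is not the actual EMBOSS
parser; looks_like_dbname() is a hypothetical name.

```python
def looks_like_dbname(prefix):
    """True if the text before the first ':' could be a database name:
    alphanumeric only, and more than one character long."""
    return len(prefix) > 1 and prefix.isascii() and prefix.isalnum()


for usa in ("embl:x13776", "C:seqs.fasta", "/data/run:1/seqs.fasta:id"):
    prefix = usa.split(":", 1)[0]
    print(usa, "->", "database?" if looks_like_dbname(prefix) else "file")
```

A file whose name is purely alphanumeric (no extension) would still look
like a database name, so the rule narrows the ambiguity without removing it.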


> QUICK SOLUTION
> - ------------
> I think that for the user it is simpler to know that ':' has a special
> meaning and should be avoided.
> 
> For the cases where the colon is generated automatically, it may be better
> to provide a renaming script that changes the colon to something else.

That would be my recommendation too.

> UI 'PRO' APPROACH
> - ---------------
> For GUI writers it is probably better to "translate" any such filenames
> between the user and EMBOSS. Note the quotes around translate above: it
> is not immediate. Let me explain:
> 

> 	The trick is to create a special hidden directory on each user
> directory accessed: e.g. .myGUI-names. Then for every file make a
> suitably processed symlink on that subdirectory and call emboss through
> the symlink, sort of:

Looks like a good approach. The alternative would be to trap "bad" filenames 
and ask the user to correct them.

regards,

Peter


From kkmattil at csc.fi  Mon Jun 20 11:50:46 2005
From: kkmattil at csc.fi (Kimmo Mattila)
Date: Mon, 20 Jun 2005 14:50:46 +0300 (EEST)
Subject: [EMBOSS] Installing EMBOSS on a Rocks linux
Message-ID: <Pine.LNX.4.62.0506201446440.31123@sampo3.csc.fi>


Hi

I would like to ask whether any of you has managed to install EMBOSS on a
linux cluster running Rocks linux.  When I tried to install EMBOSS on our
Rocks cluster, the standard installation procedure went through without 
error messages, but when I try to start an EMBOSS application,  I get an 
error message:

   wossname

   Segmentation fault (core dumped)

A Google search on this topic revealed that someone else has had 
similar problems with Rocks too, but I was not able to find any potential 
solution. However, EMBOSS is available in the Rocks-based BioBrew linux 
distribution.

So, any hints about how to install EMBOSS in a Rocks cluster would be 
welcome.

Regards,

Kimmo Mattila


---------------------------------------------------------------
Kimmo Mattila, sovellusasiantuntija, Bioinformatiikan palvelut, CSC
PL 405 02101 Espoo, puh 09 457 2708 , fax (09) 457 2302
CSC on tieteen tietotekniikan keskus, www.csc.fi, s-posti: 
kimmo.mattila at csc.fi

Kimmo Mattila, application scientist, Bioinformatics Support, CSC
P.O. Box 405 02101 Espoo, Finland, tel +358 9 4572708, fax +358 9 4572302
CSC is the Finnish IT Center for Science, www.csc.fi, e-mail: 
kimmo.mattila at csc.fi
---------------------------------------------------------------


From smiddha at indiana.edu  Mon Jun 20 14:59:56 2005
From: smiddha at indiana.edu (Sumit Middha)
Date: Mon, 20 Jun 2005 09:59:56 -0500
Subject: [EMBOSS] Emboss package - file size limitations
In-Reply-To: <Pine.LNX.4.62.0506201446440.31123@sampo3.csc.fi>
References: <Pine.LNX.4.62.0506201446440.31123@sampo3.csc.fi>
Message-ID: <1119279596.42b6d9ec2c52d@webmail.iu.edu>


Hi,

I looked around for threshold limitations on the size of the files that can be
used for analysis, but could not locate any information.

Is there a limit to the size of files that I can use, and is there a different
limit for web and command-line usage?

Actually I had the same question for GCG tools.

Thanks,
Sumit


From pmr at ebi.ac.uk  Mon Jun 20 15:26:52 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 20 Jun 2005 16:26:52 +0100
Subject: [EMBOSS] Emboss package - file size limitations
In-Reply-To: <1119279596.42b6d9ec2c52d@webmail.iu.edu>
References: <Pine.LNX.4.62.0506201446440.31123@sampo3.csc.fi> <1119279596.42b6d9ec2c52d@webmail.iu.edu>
Message-ID: <42B6E03C.9020306@ebi.ac.uk>

Hi Sumit,

> Is there a limit to the size of files that I can use, and is there a different
> limit on the web and command line usage.

EMBOSS has no hard coded limit on sequence or file size. The operating system 
may have problems with 2Gb file size, and the EMBLCD indexing system we use 
for database indexing in EMBOSS 2 has a 2Gb file size limit (4 byte file 
pointers are part of the index format) - there will be a new indexing system 
in beta release with EMBOSS 3 that will have enough space for large file offsets.
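
The arithmetic behind the 2Gb figure can be checked with a few lines of
Python. This assumes the 4 byte file pointers are signed integers, which is
what a limit of 2Gb (rather than 4Gb) suggests; fits_in_4_byte_pointer() is
just an illustrative name.

```python
import struct


def fits_in_4_byte_pointer(offset):
    """True if a file offset can be stored as a signed 4-byte integer."""
    if offset < 0:
        return False
    try:
        struct.pack("<i", offset)  # "<i" = little-endian signed 32-bit
        return True
    except struct.error:
        return False


LIMIT = 2**31  # 2147483648 bytes, i.e. the 2Gb boundary
print(fits_in_4_byte_pointer(LIMIT - 1))  # last offset that can be indexed
print(fits_in_4_byte_pointer(LIMIT))      # first offset that cannot
```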

Some algorithms will have limits, depending on the memory (real and virtual) 
on your machine.

> Actually I had the same question for GCG tools.

I believe sequence length is still up to 350kb unless you have the source code 
(when I was at Sanger I routinely rebuilt GCG with 750kb as the maximum 
sequence length so the genome sequencers could still use it on their own 
sequences!) A future release of GCG is supposed to increase this.

Hope that helps,

Peter Rice


From francis at bii.a-star.edu.sg  Tue Jun 21 08:47:51 2005
From: francis at bii.a-star.edu.sg (Francis Tang)
Date: Tue, 21 Jun 2005 16:47:51 +0800
Subject: [EMBOSS] Wildfire 2.0
Message-ID: <42B7D437.5060506@bii.a-star.edu.sg>

Dear EMBOSS users,

On behalf of the Bioinformatics Institute, Singapore, I would like to 
announce that Wildfire 2.0 is now available for download from 
http://wildfire.bii.a-star.edu.sg .

Wildfire is a GUI application for constructing workflows.  It has been 
configured so that you can build workflows using EMBOSS applications 
immediately.  The resulting workflows can run on a cluster or other 
multi-cpu machine, and exploit parallelism where possible.

Wildfire is described in the BMC Bioinformatics article:

     "Wildfire: distributed, Grid-enabled workflow construction and 
execution", BMC Bioinformatics 2005, 6:69.
     http://www.biomedcentral.com/1471-2105/6/69/abstract

We invite you all to download and try Wildfire and welcome feedback to 
wildfire at bii.a-star.edu.sg .

Thank you.

Francis.


-- 
Francis TANG, Post-Doctoral Research Fellow
Bioinformatics Institute, BMSI, A-STAR, Singapore.
Tel: +65 64788282  Fax: +65 64789048  Email: francis at bii.a-star.edu.sg
Add: Matrix L7, Biopolis   WWW: http://www.bii.a-star.edu.sg/~francis/


From jieqiwang at gmail.com  Tue Jun 21 14:55:46 2005
From: jieqiwang at gmail.com (Wang Jieqi)
Date: Tue, 21 Jun 2005 22:55:46 +0800
Subject: [EMBOSS] Help with retrieving sequences
Message-ID: <55162b5205062107555043348@mail.gmail.com>

Hello,
I started to learn EMBOSS recently. Now, I want to read the CDS of
several mRNA sequences. The complete entries of these mRNAs (cDNA) have
been retrieved from GenBank into a single file. Could you please tell
me what to do next? Also, I find that seqret seems to read only the
first molecule; could you please help me out? Thanks.


   Best regards, 
Jieqi
-- 
Jieqi Wang
Room 121, Department of Biology
Tsinghua University
Beijing, 100084
China, People's Republic
Mobile: +86-13641302483
Dorm:   +86-10-51534406
Lab:     +86-10-62784794
Fax:     +86-10-62794376


From aengus.stewart at cancer.org.uk  Tue Jun 21 15:16:41 2005
From: aengus.stewart at cancer.org.uk (Aengus Stewart)
Date: Tue, 21 Jun 2005 16:16:41 +0100
Subject: [EMBOSS] Data Lib sizes and indexing progs
Message-ID: <42B82F59.5040200@cancer.org.uk>


Hi folks,

Just wondering how the new indexing methods were coming on.

It's just that I had a look at the most recent EMBL release and it's (give or take the odd gig) 250Gb, which means that having the headroom to hold a copy while installing a new copy requires >500Gb.

Any info on how the new indexing will work? Will it still have to run off uncompressed .dat files, or will it produce its own index format?

Sorry about the questions, it's just that I am rushing around the filesystem deleting anything that may appear to be "deleteable" to scrounge enough space :-)


Regards
Aengus

-- 
-----------------------------------------------------------------------
Aengus Stewart
Group Leader
Bioinformatics at CGAL                            Tel: +44 (0)20 7269 3679
Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK
-----------------------------------------------------------------------

This electronic message contains information  which may be privileged and
confidential.  The information is intended to be for the use of the
individual(s) or entity named above.  If you are not the intended recipient,
be aware that any disclosure, copying, distribution or use of the contents
of this information is prohibited. If you have received this electronic
message in error, please notify me by telephone or email (to the number
or address above) immediately.


From ableasby at hgmp.mrc.ac.uk  Tue Jun 21 15:27:56 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Tue, 21 Jun 2005 16:27:56 +0100 (BST)
Subject: [EMBOSS] Data Lib sizes and indexing progs
Message-ID: <200506211527.j5LFRuRR024742@bromine.hgmp.mrc.ac.uk>

The new indexing programs are done (in CVS). The programs are
dbxflat, dbxfasta and dbxgcg, and they operate like their
'dbi' counterparts. The dbx and dbi programs will be available
in the next release.

So, for EMBL, you would typically index the *.dat files.
As before, you can create id,acc,sv,key,org & des indexes
(though many sites just index id and acc). 

An indexing job on the whole of the recently released EMBL will
produce id, acc and key indexes of the following sizes. They
should give you some idea of the extra disc space you'll need.

-rw-r--r--  1 root root      19950 Jun 19 14:11 embli.ent
-rw-r--r--  1 root root        122 Jun 20 13:41 embli.pxac
-rw-r--r--  1 root root        122 Jun 20 13:41 embli.pxid
-rw-r--r--  1 root root        126 Jun 20 13:41 embli.pxkw
-rw-r--r--  1 root root 8755992576 Jun 20 13:41 embli.xac
-rw-r--r--  1 root root 7482558464 Jun 20 13:41 embli.xid
-rw-r--r--  1 root root 4046751744 Jun 20 13:41 embli.xkw
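
For convenience, the listing above can be added up with a short Python
snippet. The byte counts are copied from the listing; nothing else is
assumed.

```python
# File sizes in bytes, copied from the ls listing above.
index_sizes = {
    "embli.ent": 19950,
    "embli.pxac": 122,
    "embli.pxid": 122,
    "embli.pxkw": 126,
    "embli.xac": 8755992576,
    "embli.xid": 7482558464,
    "embli.xkw": 4046751744,
}

total_bytes = sum(index_sizes.values())
total_gib = total_bytes / 2**30  # 1 GiB = 2**30 bytes

print(total_bytes)          # 20285323104
print(round(total_gib, 1))  # 18.9
```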

HTH

Alan


From kellert at ohsu.edu  Thu Jun 23 04:06:18 2005
From: kellert at ohsu.edu (Thomas J Keller)
Date: Wed, 22 Jun 2005 21:06:18 -0700
Subject: [EMBOSS] source of common vectors in cirdna format
Message-ID: <03b2ab1a566cf2386b89bb570e26e6eb@ohsu.edu>

Greetings,
Is there a source for common vectors in cirdna format available for 
downloading?

Thanks in advance,
Tom Keller

Tom Keller, Ph.D.
http://www.ohsu.edu/research/core
kellert at ohsu.edu
503-494-2442

From clemens.broger at roche.com  Thu Jun 23 13:48:24 2005
From: clemens.broger at roche.com (Broger, Clemens)
Date: Thu, 23 Jun 2005 15:48:24 +0200
Subject: [EMBOSS] Needle/water, revcomp
Message-ID: <7E08B1C16672A147B29D3DE3827536E37E00CC@rbamsem3.emea.roche.com>

I have 2 questions:

The first is about identity/similarity in nucleotide alignments made
with needle (probably the same holds true for water):
 
########################################
# Program:  needle
# Rundate:  Thu Jun 23 13:29:58 2005
# Align_format: srspair
# Report_file: seq0.needle
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: SEQ0
# 2: SEQ1
# Matrix: EDNAFULL
# Gap_penalty: 100.0
# Extend_penalty: 10.0
#
# Length: 70
# Length of sequence 1: 70
# Length of sequence 2: 70
# Identity:      46/70 (65.7%)
# Similarity:    47/70 (67.1%)
# Gaps:           0/70 ( 0.0%)
# Score: 162.0
# 
#
#=======================================

                              .         .         .         .         .
SEQ0               1 aaaaaaaaaaaaaaaaaaaaaaaaacccccgggggtttttuuuuunnnnn     50
                     |||||||||||||||||||||......|......||:....:|..     
SEQ1               1 aaaaaaaaaaaaaaaaaaaaacgtunacgtunacgtunacgtunacgtun     50
                              .         .         .         .         .

                              .         .
SEQ0              51 aaaaaaaaaaaaaaaaaaaa     70
                     ||||||||||||||||||||
SEQ1              51 aaaaaaaaaaaaaaaaaaaa     70
                              .         .

Each base of the set acgtun is aligned against each other. The 20 a's at
the beginning and end are only to force an ungapped alignment. Maximum
gap penalties were used.
 
I agree with the symbols in the alignment |,: and ., but the 46
identities in the summary imply that the n-n match is also counted. The
t-u matches are counted as similar, which is ok, but the n-n match is
not counted as similar, although it is counted as identical. I think the
n-n match should not be counted both in identity and similarity.
 
Now for ambiguous bases. w is a or t
 
########################################
# Program:  needle
# Rundate:  Thu Jun 23 14:53:33 2005
# Align_format: srspair
# Report_file: seq0.needle
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: SEQ0
# 2: SEQ1
# Matrix: EDNAFULL
# Gap_penalty: 100.0
# Extend_penalty: 10.0
#
# Length: 26
# Length of sequence 1: 26
# Length of sequence 2: 26
# Identity:      21/26 (80.8%)
# Similarity:    23/26 (88.5%)
# Gaps:           0/26 ( 0.0%)
# Score: 94.0
# 
#
#=======================================

                              .         .      
SEQ0               1 aaaaaaaaaawwwwwwaaaaaaaaaa     26
                     ||||||||||..   .||||||||||
SEQ1               1 aaaaaaaaaaatwgcuaaaaaaaaaa     26
                              .         .      

In the alignment I would put a dot at the w-w match (but I could also
agree with the way it is handled now). But again the w is counted in the
summary as an identity but not as a similarity.
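
To illustrate the difference, here is a small Python sketch of two ways of
counting identities in an ungapped alignment. The "naive" count treats any
two equal characters as identical, which reproduces the 21/26 reported
above; the "strict" count is only my proposal (ambiguity codes such as w
and n never count as identities) and is not what needle currently does.

```python
UNAMBIGUOUS = set("acgtu")


def count_identities(seq_a, seq_b, strict):
    """Count identical positions in two aligned, ungapped sequences.

    With strict=True, equal ambiguity codes (w-w, n-n, ...) are skipped.
    """
    count = 0
    for a, b in zip(seq_a.lower(), seq_b.lower()):
        if a != b:
            continue
        if strict and a not in UNAMBIGUOUS:
            continue
        count += 1
    return count


seq0 = "aaaaaaaaaawwwwwwaaaaaaaaaa"
seq1 = "aaaaaaaaaaatwgcuaaaaaaaaaa"
print(count_identities(seq0, seq1, strict=False))  # 21, as in the report
print(count_identities(seq0, seq1, strict=True))   # 20, w-w not counted
```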


The second question is about the handling in EMBOSS of
reverse-complemented nucleotide segments such as  

db:seq[10:20:r]

The sequence is first reverse-complemented and then residues 10 to 20
are cut out.
Biologists usually expect that residues 10 to 20 are first cut out and
then reverse-complemented.

Can this be changed? That would be very helpful.
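
A short Python sketch shows that the two orders give different results.
The function names are made up for illustration, and positions are
1-based and inclusive as in the [10:20:r] syntax.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def revcomp(seq):
    """Reverse-complement a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_then_cut(seq, start, end):
    """Current behaviour as described above: reverse first, then cut."""
    return revcomp(seq)[start - 1:end]


def cut_then_reverse(seq, start, end):
    """What biologists usually expect: cut first, then reverse."""
    return revcomp(seq[start - 1:end])


seq = "a" * 10 + "c" * 10 + "g" * 10
print(reverse_then_cut(seq, 10, 20))  # cgggggggggg
print(cut_then_reverse(seq, 10, 20))  # ggggggggggt
```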

Best regards

Clemens


Dr. Clemens Broger
Bioinformatics
F. Hoffmann-La Roche Ltd.
PRBI 65/303
CH-4070 Basel
clemens.broger at roche.com
+41-61-688-4447


From pmr at ebi.ac.uk  Thu Jun 23 14:38:25 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Thu, 23 Jun 2005 15:38:25 +0100 (BST)
Subject: [EMBOSS] source of common vectors in cirdna format
In-Reply-To: <03b2ab1a566cf2386b89bb570e26e6eb@ohsu.edu>
References: <03b2ab1a566cf2386b89bb570e26e6eb@ohsu.edu>
Message-ID: <2840.12.27.2.2.1119537505.squirrel@webmail.ebi.ac.uk>

Tom Keller writes:

> Is there a source for common vectors in cirdna format available for
> downloading?

Or is there a source of common vectors that we could convert to cirdna
format?

regards,

Peter Rice


From pmr at ebi.ac.uk  Thu Jun 23 14:41:34 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Thu, 23 Jun 2005 15:41:34 +0100 (BST)
Subject: [EMBOSS] Needle/water, revcomp
In-Reply-To:      <7E08B1C16672A147B29D3DE3827536E37E00CC@rbamsem3.emea.roche.com>
References:     <7E08B1C16672A147B29D3DE3827536E37E00CC@rbamsem3.emea.roche.com>
Message-ID: <2849.12.27.2.2.1119537694.squirrel@webmail.ebi.ac.uk>

Clemens Broger writes:

> I have 2 questions:
>
> The first is about identity/similarity in nucleotide alignments made
> with needle (probably the same holds true for water):

Tricky. This requires the matrix to define some codes as ambiguity codes
so we know w-w is not an identity. I would guess we can extend the matrix
formats we use to include this information, or perhaps for nucleotide
sequences we can "know" the answer.

I will investigate.

> The second question is about the handling in EMBOSS of
> reverse-complemented nucleotide segments such as
>
> db:seq[10:20:r]
>
> The sequence is first reverse-complemented and then residues 10 to 20
> are cut out.
> Biologists usually expect that residues 10 to 20 are first cut out and
> then reverse-complemented.
>
> Can this be changed? That would be very helpful.

Oops. Yes - will do.

regards,

Peter Rice


From msarachu at biol.unlp.edu.ar  Mon Jun 27 12:28:58 2005
From: msarachu at biol.unlp.edu.ar (Martin Sarachu)
Date: Mon, 27 Jun 2005 09:28:58 -0300
Subject: [EMBOSS] Re: wemboss: warning and errors
In-Reply-To: <0C9336E1DA90DB479BEBAF2C7C5699E1016EA96D@USINDMDOWM001.dow.com>
References: <0C9336E1DA90DB479BEBAF2C7C5699E1016EA96D@USINDMDOWM001.dow.com>
Message-ID: <42BFF10A.4090405@biol.unlp.edu.ar>

Dear Rani,

about the error with ACD, when running distmat from command line 
(-options to be prompted for all options) I get this error with ACD

> # distmat -options
> Creates a distance matrix from multiple alignments
> Input sequence set: uniprot:papa_*
> Multiple substitution correction methods for proteins
>          0 : Uncorrected
>          1 : Jukes-Cantor
>          2 : Kimura Protein
> Method to use [0]: 1
> Warning: ACD expression invalid @(!$acdprotein)
> 
> Warning: ACD expression invalid @(!$acdprotein)
> 
> Error: File /usr/local/emboss/share/EMBOSS/acd/distmat.acd line 60: (ambiguous) Bad additional flag N | Y)
> 

but without -options (i.e. default options chosen) it runs ok

> # distmat
> Creates a distance matrix from multiple alignments
> Input sequence set: uniprot:papa_*
> Multiple substitution correction methods for proteins
>          0 : Uncorrected
>          1 : Jukes-Cantor
>          2 : Kimura Protein
> Method to use [0]: 1
> Output file [papa_.distmat]:
> Warning: Sequence lengths are not equal!
> Warning: Sequence lengths are not equal!
> Warning: Sequence lengths are not equal!

there is a missing left parenthesis in distmat.acd in line 61, please 
change this

>     additional: "@(@(@(!$acdprotein)) & @($(nucmethod)==1)) |

to this

>     additional: "@(@(@(!$(acdprotein)) & @($(nucmethod)==1)) |


Regards,

Martin

PS: working on the exclude problem...


Mamidipalli, SudhaRani (S) wrote:
> Hello Martin,
> 
> While testing the programs in wEMBOSS, we have encountered a couple of problems.
> 
> 1.The 'distmat' program gave some warning. Here is the warning of that program. 
> -------------------------------
> Warning! 
> "ambiguous" parameter: syntax error (missing left parenthesis) in ACD expression (tell to EMBOSS Manager : this could produce wrong results from program execution!) 
> -------------------------------
> I went and checked distmat.acd file but couldn't find any error. 
> 
> 2. I added some programs that we don't want to be displayed in wemboss to the exclude file: /genomics/sw/wEMBOSS-1.4.0/wEMBOSS/data/exclude. Then I re-installed wrappers4EMBOSS and wEMBOSS. Surprisingly, only a few programs (for example tranalign, embossversion etc.) got deleted from wemboss, whereas a few programs (for example textsearch, entret etc.) show up with the error 
> --------
> EMBOSS: error...
> chaos has been excluded
> ----------
>   
> Please clarify.
> 
> Thanks and Regards,
> Rani.
> 

-- 
Martin Sarachu
msarachu at biol.unlp.edu.ar
AR.EMBnet
http://www.ar.embnet.org


From pmr at ebi.ac.uk  Mon Jun 27 14:25:35 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Mon, 27 Jun 2005 15:25:35 +0100 (BST)
Subject: [EMBOSS] Re: wemboss: warning and errors
In-Reply-To: <42BFF10A.4090405@biol.unlp.edu.ar>
References: <0C9336E1DA90DB479BEBAF2C7C5699E1016EA96D@USINDMDOWM001.dow.com>
    <42BFF10A.4090405@biol.unlp.edu.ar>
Message-ID: <1613.12.27.2.2.1119882335.squirrel@webmail.ebi.ac.uk>

Martin Sarachu writes:

> there is a missing left parenthesis in distmat.acd in line 61, please
> change this
>
>>     additional: "@(@(@(!$acdprotein)) & @($(nucmethod)==1)) |
>
> to this
>
>>     additional: "@(@(@(!$(acdprotein)) & @($(nucmethod)==1)) |

Already fixed in EMBOSS 2.10.0.

But this does highlight a gap in the ACD validation - this expression is
only evaluated when needed (when -options is used). I will try adding
checks for all strings to generate warnings for unbalanced () and $ or @
without ( to acdvalid before the July 15th release.
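
Roughly, the kind of check I have in mind looks like the following Python
sketch. This is not the acdvalid code, and check_acd_expression() is a
made-up name; it only shows the two tests, run on short hypothetical
expressions modelled on the distmat.acd line.

```python
import re


def check_acd_expression(expr):
    """Return a list of warnings for one ACD expression string."""
    problems = []
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append("unmatched ')'")
                depth = 0
    if depth > 0:
        problems.append("unclosed '('")
    # A '$' or '@' must be followed directly by '('.
    if re.search(r"[$@](?!\()", expr):
        problems.append("'$' or '@' not followed by '('")
    return problems


print(check_acd_expression("@(!$acdprotein)"))      # flags the bare '$'
print(check_acd_expression("@(!$(acdprotein))"))    # no warnings
print(check_acd_expression("@($(nucmethod)==1))"))  # flags the extra ')'
```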

>> --------
>> EMBOSS: error...
>> chaos has been excluded
>> ----------

I know this is really a wEMBOSS problem, but the message appeals to my
sense of humour!!! Can you send me an explanation of it when you have a
solution - it may appear in future EMBOSS talks :-)

regards,

Peter


From gbottu at ben.vub.ac.be  Wed Jun 29 08:30:02 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 29 Jun 2005 10:30:02 +0200
Subject: [EMBOSS] bug related to -plasmid parameter
Message-ID: <20050629083002.GA4560@bigben.ulb.ac.be>

from: Belgian EMBnet Node

	Dear colleagues,

At the BEN site we have on our main computer EMBOSS 2.10.0 under Alpha OSF 
5.1A. I just noticed that the programs remap, restrict and restover give a 
segmentation fault when run with parameter -plasmid. This does however not 
occur with an EMBOSS installation we have on a Linux. So, this behaviour 
must be dependent on the OS and maybe on the hardware. Has anyone else 
noticed it?

	Regards,
	Guy Bottu


From ableasby at hgmp.mrc.ac.uk  Wed Jun 29 12:13:15 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Wed, 29 Jun 2005 13:13:15 +0100 (BST)
Subject: [EMBOSS] bug related to -plasmid parameter
Message-ID: <200506291213.j5TCDFMb014301@bromine.hgmp.mrc.ac.uk>

Dear Guy,

Thanks for spotting that. It's now fixed in CVS and will be
part of the 3.0.0 release.

ATB

Alan Bleasby
RFCGR/HGMP (for one more month)