From mathog at mendel.bio.caltech.edu  Tue Apr  2 11:39:48 2002
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Tue, 02 Apr 2002 08:39:48 -0800
Subject: primer3 and qual scores
Message-ID: <E16sRK0-0000ZJ-00@mendel.bio.caltech.edu>

The 0.9 version of primer3 from 

http://www-genome.wi.mit.edu/genome_software/other/primer3.html

comes with a cgi script that puts up a web interface which
drives primer3_core.  That web interface provides a slew
of options for including qual values in a section labeled
"Sequence Quality".  The primer3 program in EMBOSS seems not to
have these options.   Is there some technical reason why this
functionality wasn't included or is it just one of those (many) things
that have yet to percolate to the top of the "to do" list?

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From vvrajarao at yahoo.com  Wed Apr  3 09:35:44 2002
From: vvrajarao at yahoo.com (V V Raja Rao)
Date: Wed, 3 Apr 2002 06:35:44 -0800 (PST)
Subject: tandem repeats
Message-ID: <20020403143544.79253.qmail@web11107.mail.yahoo.com>

Hi,

  I would like to know which algorithm is used by the
tandem repeat finder program in the EMBOSS package.
Could someone mail it to me?

Thanks in advance,
Raja.


From gwilliam at hgmp.mrc.ac.uk  Wed Apr  3 10:56:04 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Wed, 03 Apr 2002 16:56:04 +0100
Subject: primer3 and qual scores
References: <E16sRK0-0000ZJ-00@mendel.bio.caltech.edu>
Message-ID: <3CAB2614.9A8B28A@hgmp.mrc.ac.uk>


As you say "a slew of options"!
I didn't include the quality values as there was pressure from the GUI
community to minimise the number of options.

I, myself,  have never used the quality options.
They could be added in, but there are already rather a lot of options
for this program; do we need more? Which ones do you consider the most
useful, and why?

Gary

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From foisys at mac.com  Thu Apr  4 14:02:47 2002
From: foisys at mac.com (Sylvain Foisy)
Date: Thu, 4 Apr 2002 14:02:47 -0500
Subject: Compiling EMBOSS for Jemboss use in MacOS X
Message-ID: <8DA5625E-47FE-11D6-ABCB-0003936297DA@mac.com>

Hi,

Thanks for supporting OS X in EMBOSS. I got a clean compile and make, and 
it is working OK. I would like to try Jemboss and I have one (I guess, 
pretty stupid) question. Which Java location should I specify in 
the configure command on Mac OS X? There are a lot of different places 
in OS X that might be good...

Any hints?

Sylvain

++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Sylvain Foisy, Ph. D.
Manager
BIONEQ - Le Reseau quebecois de bioinformatique
Genome-Quebec
Tel.: (514) 343-6111 poste 5188
E-mail: foisys at medcn.umontreal.ca
++++++++++++++++++++++++++++++++++++++++++++++++++++++++


From ame at esbs.u-strasbg.fr  Mon Apr  8 05:13:07 2002
From: ame at esbs.u-strasbg.fr (Jean-Christophe Ame)
Date: Mon, 8 Apr 2002 11:13:07 +0200
Subject: Blast database
Message-ID: <D6C83EC3-4AD0-11D6-BFB6-0005024329A7@esbs.u-strasbg.fr>

Hello,

I have a few BLAST-formatted databases for use with BLAST, and I would 
like to share them with EMBOSS. How can I do that? How should I set 
up my .embossrc file? Any answer would be of great help. Thank you.

Jean-Christophe


________________________
Jean-Christophe Amé, PhD
U.P.R. 9003 du CNRS - Cancérogénèse et Mutagénèse Moléculaire et 
Structurale
École Supérieure de Biotechnologie de Strasbourg
Pôle API
Boulevard Sébastien-Brant
67400 Illkirch
France

tel.: 03 90 24 47 05
Fax.: 03 90 24 46 86


From cquijano at iib.uam.es  Tue Apr  9 04:51:30 2002
From: cquijano at iib.uam.es (Carlos Quijano)
Date: Tue, 09 Apr 2002 10:51:30 +0200
Subject: Where is protml (Phylip) ?
Message-ID: <3CB2AB92.5040308@iib.uam.es>

Hello,

Some people have asked me about "activating" the protml application from 
the PHYLIP package. ;-)
We use EMBOSS with EMBASSY, and it seems that protml is documented but 
not compiled or installed. It is not even present under ./src, for 
some reason.

I don't know whether the cause is that protml (something like dnaml, 
but for protein trees) was written in Pascal instead of C.

In the PHYLIP package itself (not EMBASSY's), protml.pas is present among 
the other source files. I have seen that it also comes with the MOLPHY 
package, but in C, I guess.

Does anyone have an idea how to put a compiled protml.pas (or MOLPHY's) 
into EMBOSS and make it accessible to the front-end application in use 
(for me, w2h)?

I know this is perhaps not the best solution, or perhaps a little 
paranoid, because it is possible to use PIE as a web interface for 
PHYLIP, MOLPHY's protml and puzzle.

I am only looking for an easy way to compile protml and make it part 
of the EMBOSS or EMBOSS/w2h application set.


Thank you for your time.


From frank at bioss.ac.uk  Tue Apr  9 06:04:18 2002
From: frank at bioss.ac.uk (Frank Wright)
Date: Tue, 09 Apr 2002 11:04:18 +0100
Subject: Where is protml (Phylip) ?
References: <3CB2AB92.5040308@iib.uam.es>
Message-ID: <3CB2BCA2.F34D3DEC@bioss.ac.uk>

PHYLIP 3.5 includes PROTML (from the MOLPHY package) because v 3.5 does
not have a protein maximum likelihood program.

However, PHYLIP 3.6 (about to go to a beta release) has a new
program, PROML, which is PHYLIP's own protein maximum likelihood program.
PROML has features that PROTML lacks.

I suggest waiting until PHYLIP 3.6 is released and EMBOSS/EMBASSY has been
adapted to access the v 3.6 programs.  In the meantime, PHYLIP 3.6 (alpha
version, but pretty stable) is available from
http://evolution.genetics.washington.edu/phylip.html.

Best Wishes,
Frank
-- 
Frank Wright
Biomathematics and Statistics Scotland, 
SCRI, DUNDEE DD2 5DA, Scotland
frank at bioss.sari.ac.uk


From Guoneng.Zhong at med.nyu.edu  Tue Apr  9 13:37:42 2002
From: Guoneng.Zhong at med.nyu.edu (Guoneng Zhong)
Date: Tue, 9 Apr 2002 13:37:42 -0400
Subject: problem running
Message-ID: <7EDDC060-4BE0-11D6-A32B-0050E41E5C1B@med.nyu.edu>

Hi,
I followed the instructions and installed emboss on a Tru64 unix.  I ran 
the test:
wossname -auto | more

and it worked (at least no weird errors).

But here are two problems:

1. Running jemboss gave me this:
Error: failed /usr/opt/java131/jre/lib/alpha/fast/libjvm.so, because 
dlopen: cannot load /usr/opt/java131/jre/lib/alpha/fast/libjvm.so

2. Running abiviewer gave me this:
Reads ABI file and display the trace
Output sequence [outfile.fasta]:
Graph type [x11]:
PLPLOT_LIB="/usr/local/emboss/lib"

Cannot open library file: plstnd5.fnt

Please set PLPLOT_LIB to the plplot/lib directory under emboss

*** PLPLOT ERROR ***
Unable to open font file
Program aborted

Any hint would help.  Thanks!

Guoneng


From David.Bauer at SCHERING.DE  Wed Apr 10 01:47:02 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Wed, 10 Apr 2002 07:47:02 +0200
Subject: Antwort: problem running
Message-ID: <OFDEBEE501.AA84E38F-ONC1256B97.001E7AAE@schering.de>


Hi,

Answer for question 2:

PLPLOT_LIB must point to a directory containing the .fnt files.
With a standard installation this is "/usr/local/share/EMBOSS".
Alternatively you can use the location where you unpacked the EMBOSS tar file;
in that case it is /.../EMBOSS-..../plplot/lib.
So if you use csh or tcsh you should have
     setenv PLPLOT_LIB /usr/local/share/EMBOSS
in your .cshrc.
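
A quick way to confirm the setting before running a graphics program is
to check that the font file named in the error message, plstnd5.fnt, is
really present in the directory PLPLOT_LIB points to. What follows is only
a minimal sketch in Python; the helper name plplot_lib_ok is made up for
illustration and is not part of EMBOSS or PLplot.

```python
import os


def plplot_lib_ok(directory, font="plstnd5.fnt"):
    """Return True if `directory` contains the PLplot font file.

    Hypothetical helper for illustration; only the file name
    plstnd5.fnt comes from the error message quoted in this thread.
    """
    return os.path.isfile(os.path.join(directory, font))


if __name__ == "__main__":
    # PLPLOT_LIB may be unset, in which case there is nothing to check.
    path = os.environ.get("PLPLOT_LIB", "")
    if path and plplot_lib_ok(path):
        print("PLPLOT_LIB looks fine:", path)
    else:
        print("plstnd5.fnt not found under PLPLOT_LIB=%r" % path)
```

For sh or bash users the counterpart of the setenv line would be
"export PLPLOT_LIB=/usr/local/share/EMBOSS" in the shell startup file.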

Hope this helps.

Ciao,
David.


From john.walshaw at bbsrc.ac.uk  Wed Apr 10 06:44:13 2002
From: john.walshaw at bbsrc.ac.uk (john walshaw (JIC))
Date: Wed, 10 Apr 2002 11:44:13 +0100
Subject: dbifasta/seqret and ncbi-format fasta headers
Message-ID: <E4D0A20B9E9ED4118E3C00508BEED171030AC451@jimserv2.jic.bbsrc.ac.uk>

I have a question about ncbi-type sequence headers in fasta-format files.
I'm using EMBOSS 2.3.1.

The ncbi format for the dbifasta program is described variously as:

      ncbi : >blah|...[|ACC]|ID
      
and
	>...[|accno]|id ...

in the EMBOSS admin guide and by 'tfm dbifasta'.

From these I assumed that within the first of the whitespace-delimited
'fields', the last two '|'-delimited subfields will be treated by dbifasta
as the accession no and ID respectively:

>gi|15375403|dbj|AB039926.1|AB039926 Arabidopsis ...blah...
                 ^^^^^^^^^^ ^^^^^^^^
                    accno      id
		    
- but this doesn't work as seqret reports in this case that AB039926 is not
in my database (which I indexed with dbifasta using idformat 'ncbi', and
specified with method: emblcd  format:fasta & the necessary dir: and
indexdir: fields).
But this sequence works (I can get it with seqret) -

>gi|15383574|gb|AV540904.2|AV540904 AV540904 Arabidopsis thaliana roots ...blah
                           ^^^^^^^^ ^^^^^^^^
- because the second whitespace-delimited field is present AND identical to
the previous subfield. The 2nd field is not simply being used as the accno,
because for example this entry:

>gi|15383574|gb|AV540904.2|XXXXXXX YYYYYYY

cannot be returned by seqret either as XXXXXXX or YYYYYYY (or by any means
other than requesting all sequences in the DB).

Am I doing something stupid? I've looked into this problem a lot, and can
provide debug files for seqret & dbifasta, and I'm sure my db specification
in emboss.default is correct. For the sequences which fail, seqret reads the
correct header line, but then thinks that accno=''. And seqret always
returns the id as 'gi' (even for sequences which can be fetched normally).
All of the correct accnos (e.g. AV540904.2) appear in the acnum.trg file.
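
The reading of the documentation described above can be written down as a
short Python sketch. It encodes only the interpretation given in this
message (the last two '|'-delimited subfields of the first
whitespace-delimited field), not what dbifasta actually does, which is
exactly what is in question here. The function name is hypothetical.

```python
def split_ncbi_header(line):
    """Split an NCBI-style FASTA header into (accno, id).

    Follows the pattern '>...[|accno]|id ...' literally: only the first
    whitespace-delimited field is examined; its last '|'-delimited
    subfield is taken as the id and the one before it (if any) as the
    accession number. This is NOT taken from the dbifasta source.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA header: %r" % line)
    fields = line[1:].split()
    if not fields:
        raise ValueError("empty FASTA header")
    subfields = fields[0].split("|")
    entry_id = subfields[-1]
    accno = subfields[-2] if len(subfields) > 1 else ""
    return accno, entry_id
```

Under this reading the second whitespace-delimited field plays no part at
all, so both example headers above would be expected to index equally well.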

Regards,

John Walshaw
John Innes Centre, Norwich Research Park,
Colney, Norwich NR4 7UH, UK. +44(0)1603 450827


From valenzi at iigb.na.cnr.it  Wed Apr 10 12:49:07 2002
From: valenzi at iigb.na.cnr.it (Marco Valenzi)
Date: Wed, 10 Apr 2002 18:49:07 +0200
Subject: About prima
Message-ID: <a05001901b8da1cfd72f0@[140.164.13.84]>

Hi, I'm Marco Valenzi from Naples.
Why has prima been removed from the current EMBOSS-2.3.1 package?
Many thanks
-- 


Marco Valenzi
Institute of Genetics and Biophysics "Adriano Buzzati Traverso"
via Guglielmo Marconi, 10
80125 Naples ITALY
E-mail valenzi at iigbna.iigb.na.cnr.it
tel. +39 081 7257303


From cox at mshri.on.ca  Sat Apr 13 20:46:07 2002
From: cox at mshri.on.ca (Brian Cox)
Date: Sat, 13 Apr 2002 17:46:07 -0700
Subject: jemboss
Message-ID: <000801c1e34d$c79a9c30$bc66f8ce@rossdell>

Hello, I noticed that there is a Jemboss for Windows on your FTP site.  I downloaded it, but it requires a login and password.  How do I obtain these?  Is this a standalone version like the one for Unix?

Thank you
Brian Cox

From quenzer at informatik.uni-tuebingen.de  Tue Apr 16 08:08:56 2002
From: quenzer at informatik.uni-tuebingen.de (Muriel Quenzer)
Date: Tue, 16 Apr 2002 14:08:56 +0200
Subject: Size of EMBOSS 2.3.1 for Solaris 2.8
Message-ID: <200204161208.g3GC8u721130@tauri.informatik.uni-tuebingen.de>

Hi,

I am new to EMBOSS and have to install the latest EMBOSS version 2.3.1 for Sun 
Solaris 2.8. I have been told that the EMBOSS version 2.0.1 needed 
approximately 15 MB disk space, whereas the EMBOSS version 2.3.1 that I 
compiled needs approximately 520 (!) MB. Is this correct?

Thanks for any suggestions.

Muriel


-- 
With kind regards,

Muriel Quenzer
----------
Universität Tübingen
Wilhelm-Schickard-Institut für Informatik
Zentrum für Bioinformatik
Sand 13, 72076 Tübingen
Germany                            
Tel.: +49 (0)7071/29-70464                               
E-mail: quenzer at informatik.uni-tuebingen.de        
GnuPG PUBLIC KEY on request
Key fingerprint = ADDF 1E38 773F 3D51 682E  1F50 D7CC 47E1 3AE8 E047


From charles at moulinette.dyndns.org  Tue Apr 16 08:55:32 2002
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Tue, 16 Apr 2002 14:55:32 +0200
Subject: Pentium optimisation
Message-ID: <20020416125532.GA22253@moulinette.dyndns.org>

Hi,

I'm running EMBOSS on Debian GNU/Linux with a Pentium IV. Would
computations run faster if I compiled it for i686 rather than
i386 processors (or is that only useful for multimedia apps)?

Charles


From David.Bauer at SCHERING.DE  Tue Apr 16 10:11:39 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Tue, 16 Apr 2002 16:11:39 +0200
Subject: Antwort: Size of EMBOSS 2.3.1 for Solaris 2.8
Message-ID: <OF6E1B5F8D.98AC7DD3-ONC1256B9D.004ABA0E@schering.de>


Hi,

this seems a little overestimated....

I have EMBOSS on Solaris 2.7.
The build tree is ~100 MB with the EMBASSY apps (they are about 15 MB incl.
tar files).
The installed version needs 21.4 MB for the binaries and 94 MB in share/EMBOSS
(of which 75 MB are for PRINTS, PROSITE and REBASE).

With kind regards,
David.


From peacfrog at ptd.net  Tue Apr 16 13:18:07 2002
From: peacfrog at ptd.net (Cynthia Martino)
Date: Tue, 16 Apr 2002 13:18:07 -0400
Subject: Prima
Message-ID: <001301c1e56a$ad3c7880$2d7ce518@msns.str.ptd.net>

Hi there!

In the past I was able to access a number of programs, including prima, via the EMBnet Norway site.  However, now when I click on the program name within the program list all I get is a help page describing the qualifiers.  

Do you know if this and the other programs formerly available (via www.no.embnet.org/Programs/) can still be accessed online? 

Any feedback is greatly appreciated.


From letondal at pasteur.fr  Tue Apr 16 17:41:15 2002
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 16 Apr 2002 23:41:15 +0200
Subject: Prima 
In-Reply-To: Your message of "Tue, 16 Apr 2002 13:18:07 EDT."
             <001301c1e56a$ad3c7880$2d7ce518@msns.str.ptd.net> 
Message-ID: <200204162141.g3GLfF6O453283@electre.pasteur.fr>


"Cynthia Martino" wrote:
> Hi there!
> 
> In the past I was able to access a number of programs, including prima,
> via the EMBnet Norway site.  However, now when I click on the program
> name within the program list all I get is a help page describing the
> qualifiers.
> 
> Do you know if this and the other programs formerly available (via
> www.no.embnet.org/Programs/) can still be accessed online?
> 
> Any feedback is greatly appreciated.
> 

Hi,

If you want to use a similar interface, you can go to:
http://bioweb.pasteur.fr/seqanal/interfaces/prima.html
(see http://bioweb.pasteur.fr/intro-uk.html for all EMBOSS programs)

There are, however, other EMBOSS interfaces that you can use; the EMBOSS
people can tell you about them more accurately than I can.

I don't know what is happening on the no.embnet.org server. 
We are late in distributing the XML Pise programs for the latest EMBOSS
version, which could explain it.

--
Catherine Letondal -- Pasteur Institute Computing Center


From David.Bauer at SCHERING.DE  Wed Apr 17 01:34:38 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Wed, 17 Apr 2002 07:34:38 +0200
Subject: Antwort: Prima
Message-ID: <OFA61313C6.D09988C0-ONC1256B9E.001E5F7C@schering.de>


Hi,

the EMBOSS programs are also available at
http://ubigcg.mdh4.mdc-berlin.de:8080/

Btw. I have updated the system to EMBOSS version 2.3.1.

Ciao, David.


From mathog at mendel.bio.caltech.edu  Wed Apr 17 15:05:47 2002
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Wed, 17 Apr 2002 12:05:47 -0700
Subject: network USA
Message-ID: <E16xukV-0004HN-00@mendel.bio.caltech.edu>

Today I finally realized that NCBI's PmFetch cgi 

  http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html

can be used to retrieve data via gi using a "simple" URL like this:

wget -O dmwhite.genbank \
'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'

Unfortunately it seems not to be able to retrieve by either accession
number or locus name - I'm still waiting to hear if there is some other
NCBI interface for that.

Which is a long way of coming around to considering how a USA could be
used to retrieve remote sequences without exposing end users to truly
hideous constructs.  The semantics of accessing arbitrary network
databases are probably much too complex to include in the USA but one
can imagine burying these details under new types of "database" entries
in the defaults file. Something like this:

DB gigenbank [
  method: remoteurlbyid
  comment: "GENBANK at NCBI by gi number"
  format: -
  dir: -
  file: -
  type: N
#optional
  target: 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=$ID&report=gen&mode=text'
  filter: 'wget -O - $target'
]

Which would then allow something like this to work transparently:

% seqret gigenbank:10873

The USA already has the "program" option but I think in a situation like
this it's much too complex to actually use.  How many users are going to
be able to successfully negotiate this:

% seqret -sequence=fasta::"wget -O - 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=fasta&mode=text' |" -filter

Anyway, what I'm proposing is that the database definition be extended
slightly to allow remote access methods.  This would be particularly
helpful for people running EMBOSS on their own PCs or Macs, who tend not
to have large local databases installed.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From dmartin at bioinformatics.msiwtb.dundee.ac.uk  Wed Apr 17 15:32:40 2002
From: dmartin at bioinformatics.msiwtb.dundee.ac.uk (David Martin)
Date: Wed, 17 Apr 2002 20:32:40 +0100 (BST)
Subject: network USA
In-Reply-To: <E16xukV-0004HN-00@mendel.bio.caltech.edu>
Message-ID: <Pine.LNX.4.33.0204172029230.2149-100000@bioinformatics.msiwtb.dundee.ac.uk>

On Wed, 17 Apr 2002, David Mathog wrote:

> Today I finally realized that the NCBi's PmFetch cgi
>
>   http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html
>
> can be used to retrieve data via gi using a "simple" URL like this:
>
> wget -O dmwhite.genbank \
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'
>
> Unfortunately it seems not to be able to retrieve by either accession
> number or
> locus name - I'm still waiting to hear if there is some other NCBI
> interface for that.
>
> Which is a long way of coming around to considering how a USA could be
> used to retrieve remote sequences without exposing end users to truly
> hideous
> constructs.  The semantics of accessing arbitrary network databases are
> probably much too complex to include in the USA but one can imagine
> burying
> these details under new types of "database" entries in the defaults
> file. Something like this:

Try 'method: url' and use %s instead of $ID. It has been there since
EMBOSS 0.0.4 to allow retrieval from remote SRS servers (or indeed any
arbitrary web address where the id can be passed in the URL).

Around page 19-20 in the admin guide.

If it doesn't work then let the guilty parties know.
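
To make the %s placeholder concrete, here is a small Python sketch of the
substitution applied to the pmfetch URL from the original message. It only
shows the string that would be requested for the id 10873; how EMBOSS
performs the substitution internally is not stated in this thread, and the
function name expand_url is made up for illustration.

```python
def expand_url(template, entry_id):
    """Replace the first %s in `template` with `entry_id`.

    str.replace is used instead of the % operator so that other '%'
    sequences in a URL (e.g. %20) are left alone. This illustrates the
    idea only; it is not EMBOSS code.
    """
    if "%s" not in template:
        raise ValueError("template has no %s placeholder")
    return template.replace("%s", entry_id, 1)


# The pmfetch URL from the original message, with $ID swapped for %s.
PMFETCH = ("http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi"
           "?db=Nucleotide&id=%s&report=gen&mode=text")

if __name__ == "__main__":
    print(expand_url(PMFETCH, "10873"))
```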

..d

----------------------------------
David Martin PhD
Bioinformatics Scientific Officer
Wellcome Trust Biocentre, Dundee
----------------------------------


From David.Bauer at SCHERING.DE  Thu Apr 18 01:50:47 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Thu, 18 Apr 2002 07:50:47 +0200
Subject: Antwort: network USA
Message-ID: <OF31990B94.D3A4AF87-ONC1256B9F.001E1BD2@schering.de>


Hi,

For this I use a workaround: 'method: app' entries that call scripts, which in
turn use two URLs at NCBI.
In emboss.default I have two entries, one for nucleotide and one for protein,
each calling an external script.
############
DB ncbin [ type: N method: app format: genbank
  app: "/bips/bin/emboss/ncbi_fetchn %s"
  comment: "NCBI GenBank Nucleotide" ]

DB ncbip [ type: P method: app format: genbank
  app: "/bips/bin/emboss/ncbi_fetchp %s"
  comment: "NCBI GenBank Protein" ]
##################

Unfortunately the script is not very portable, as it uses a modified Perl LWP
module to work with our firewall.
The basic idea is to use

"http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi?db=nucleotide&term=$id&dopt=genbank"
(or "http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi?db=protein&term=$id&dopt=genpept"
for protein)

to get the gid, where $id is a locus or accession.
If there are several gids for one accession, all of them are returned.
What I have observed is that the gid of the most recent version is returned
first (but I'm not sure whether this is always true).
So I just grab the first gid that comes back and then use the same url you
already mentioned:

"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=text&db=nucleotide&uid=$gid&dopt=GenBank"
("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=text&db=protein&uid=$gid&dopt=genpept")

to return the whole entry.

So the user can just use
ncbin:<some id or acc.>
with seqret or entret to get the sequence or GenBank entry.
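
The two-step lookup can be outlined in Python as follows. This is only a
sketch of how the two URLs quoted above are put together; it is not the
Perl script itself, the function names are invented for illustration, and
parsing the pmqty response is deliberately left out because its format is
not described here.

```python
from urllib.parse import urlencode

ENTREZ = "http://www.ncbi.nlm.nih.gov/entrez"

# dopt values exactly as they appear in the two URLs quoted in the message.
LOOKUP_DOPT = {"nucleotide": "genbank", "protein": "genpept"}
FETCH_DOPT = {"nucleotide": "GenBank", "protein": "genpept"}


def lookup_url(db, term):
    """URL for step 1: ask for the gid(s) matching a locus or accession."""
    query = urlencode({"db": db, "term": term, "dopt": LOOKUP_DOPT[db]})
    return "%s/utils/pmqty.fcgi?%s" % (ENTREZ, query)


def fetch_url(db, gid):
    """URL for step 2: retrieve the whole entry for one gid."""
    query = urlencode({"cmd": "text", "db": db, "uid": gid,
                       "dopt": FETCH_DOPT[db]})
    return "%s/query.fcgi?%s" % (ENTREZ, query)


def pick_gid(gids):
    """Take the first gid, observed (but not guaranteed) to be the newest."""
    if not gids:
        raise LookupError("no gid returned")
    return gids[0]
```

A wrapper script in this style would fetch lookup_url(...), extract the
gids from the reply, and then print the body of fetch_url(db, pick_gid(gids))
to standard output for EMBOSS to read.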


Hope this helps,
Ciao, David.


From inagy at abc.hu  Thu Apr 18 02:50:57 2002
From: inagy at abc.hu (inagy at abc.hu)
Date: Thu, 18 Apr 2002 08:50:57 +0200 (CEST)
Subject: primer3 and -format_output
In-Reply-To: <3CAB2614.9A8B28A@hgmp.mrc.ac.uk>
Message-ID: <XFMail.20020418085057.inagy@abc.hu>


The Whitehead primer3 program has a "-format_output" option that writes a
formatted output of the input sequences and highlights the primer binding sites,
etc.

Would it be possible to include this option into the EMBOSS version too ?
It is sometimes very useful.


Istvan


From gwilliam at hgmp.mrc.ac.uk  Fri Apr 19 04:15:14 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 19 Apr 2002 09:15:14 +0100
Subject: primer3 and -format_output
References: <XFMail.20020418085057.inagy@abc.hu>
Message-ID: <3CBFD212.C29B713@hgmp.mrc.ac.uk>

I'll add this to the list of suggestions for primer3.
Gary

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From grimplet at ensam.inra.fr  Fri Apr 19 05:27:58 2002
From: grimplet at ensam.inra.fr (jérôme Grimplet)
Date: Fri, 19 Apr 2002 11:27:58 +0200
Subject: primer3_core
Message-ID: <200204191129.g3JBTOe14633@ensam.inra.fr>

I believe that somebody already asked this question a few weeks ago, but how can 
I get the primer3_core program? I can't find it in the Whitehead Institute 
package.

Thanks,

Jerome
-- 
Jérôme Grimplet 
Laboratoire de Biochimie Métabolique et Technologie
UMR Sciences Pour l'Oenologie
2, Place Viala
34060 Montpellier Cedex 01
Tel: 33(0)4.99.61.27.56
Fax: 33(0)4.99.61.28.57
grimplet at ensam.inra.fr


From gwilliam at hgmp.mrc.ac.uk  Fri Apr 19 06:51:46 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 19 Apr 2002 11:51:46 +0100
Subject: primer3_core
References: <200204191129.g3JBTOe14633@ensam.inra.fr>
Message-ID: <3CBFF6C2.DCAF6007@hgmp.mrc.ac.uk>


From the eprimer3 documentation:

Notes

   The Whitehead Institute program that is run by this program is
   available from:
   http://www-genome.wi.mit.edu/genome_software/other/primer3.html
   (Then see the link 'Get release 0.9')

   The version that is run by this program is 3.0.9 currently available
   from:
   http://www-genome.wi.mit.edu/ftp/distribution/software/primer3_0_9_test.tar.gz

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From peter.rice at uk.lionbioscience.com  Fri Apr 19 08:49:07 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 19 Apr 2002 13:49:07 +0100
Subject: network USA
References: <E16xukV-0004HN-00@mendel.bio.caltech.edu>
Message-ID: <3CC01243.9190F3FB@uk.lionbioscience.com>

David Mathog wrote:
> 
> Today I finally realized that the NCBi's PmFetch cgi
> 
>   http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html
> 
> can be used to retrieve data via gi using a "simple" URL like this:
> 
> wget -O dmwhite.genbank \
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'
> 
> Unfortunately it seems not to be able to retrieve by either accession
> number or
> locus name - I'm still waiting to hear if there is some other NCBI
> interface for that.

Oops. Will be fixed in 2.4.0 (Alan and I thought it already was, but it
needed one extra line of code in the latest CVS version). The problem is
that EMBOSS checks the ID and accession of the returned entry for the URL
access method, and of course neither matches '10873'.

Which leads on to a new access method for 2.4.0. We are adding an "srswww"
access method that generates the SRS URLs, and can query by id, accession,
seqversion (or GI), keyword, organism or description. We can add at least
some of these for entrez (new access method entrez) if we can gather up
enough URLs. Are there any entrez experts who can help with suggested URLs
to retrieve (preferably plain text, but html will do) from entrez with
queries for each of these fields?

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From sghk100 at sghms.ac.uk  Fri Apr 19 11:19:15 2002
From: sghk100 at sghms.ac.uk (David Winterbourne)
Date: Fri, 19 Apr 2002 16:19:15 +0100
Subject: network USA
Message-ID: <3CC03573.4A76C643@sghms.ac.uk>

David Martin wrote:

> ...
>
> Try 'method: url' and using %s instead of $ID. It has been there from
> EMBOSS 0.0.4 to
> allow retrieval from remote srs servers (or indeed any arbitrary web
> address where the id can be passed in the url).

I have been having a problem accessing the Swiss-Prot database using this
method. I set up URL-based access to the SWISSPROT and EMBL databases at EBI
as follows:

DB sw [ type: P method: url format: swiss
url: "http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[SWISSPROT:%s]"
DB embl [ type: N method: url format: embl
url: "http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[EMBL-id:%s]"

For an EMBL entry, both using the URL in a browser and specifying it in EMBOSS
retrieve the data. However, the equivalent for Swiss-Prot works in a browser
but not in EMBOSS - it just causes the system to hang. Is there a simple
solution?

Regards
David
--
David Winterbourne
Department of Surgery
St. George's Hospital Medical School, London SW17 0RE, England
Tel: 020 8725 5581   Fax: 020 8725 3594


From jfreeman at variagenics.com  Mon Apr 22 17:21:51 2002
From: jfreeman at variagenics.com (James Freeman)
Date: Mon, 22 Apr 2002 17:21:51 -0400
Subject: Jemboss and Resin
Message-ID: <3CC47EEF.635B531E@variagenics.com>

To whom it may concern,

Does anyone know of any problems when using Resin
(http://www.caucho.com/) as a substitute for Tomcat when running
Jemboss?

Thanks for your assistance,

Jim Freeman
Senior Scientist
Variagenics, Inc.


From tchiang at bioinfo.sickkids.on.ca  Tue Apr 23 10:01:40 2002
From: tchiang at bioinfo.sickkids.on.ca (Ted Chiang)
Date: Tue, 23 Apr 2002 10:01:40 -0400 (EDT)
Subject: EMBOSS:complex
Message-ID: <Pine.GSO.4.05.10204230956430.15900-100000@kenny>


Just a quick question.  In the 2.3.1 release, the EMBOSS program 'complex'
is not fully implemented.  Will this program be in the next release or
have we missed something in the installation?

-Ted


=====================================
Ted Chiang
Bioinformatics Supercomputing Centre
Hospital for Sick Children, Toronto
ext. 7028
tchiang at bioinfo.sickkids.on.ca


From peter.rice at uk.lionbioscience.com  Tue Apr 23 10:30:23 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 23 Apr 2002 15:30:23 +0100
Subject: EMBOSS:complex
References: <Pine.GSO.4.05.10204230956430.15900-100000@kenny>
Message-ID: <3CC56FFF.3C3AFB6A@uk.lionbioscience.com>

Ted Chiang wrote:
> 
> Just a quick question.  In the 2.3.1 release, the EMBOSS program 'complex'
> is not fully implemented.  Will this program be in the next release or
> have we missed something in the installation?

complex is a strange application (with Italian command line options) that
the authors have not been maintaining.

We have moved it into the "make check" set of obsolete/testing
applications. If you do need it, the "make check" command will build it,
but you then need to copy the binary and other files to the install
directories by hand.

One unfortunate side effect of moving applications to "make check" (or
removing them) is that the old binaries will stay in the install directory.

Perhaps we can find a way to clean them up ... need to think about that a
little.

regards,

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From letondal at pasteur.fr  Tue Apr 23 11:06:50 2002
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 23 Apr 2002 17:06:50 +0200
Subject: Pise/EMBOSS 2.3.1
Message-ID: <200204231506.g3NF6oop249416@electre.pasteur.fr>


Hi,

I have more or less adapted new ACD types and attributes to Pise.
(ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/emboss_xml_files-2.3.1.tar.gz)

Main changes were for align types, where I could associate a "pipetype" to
chain to other programs taking alignment as input. BTW, I found "MSF" and "fasta" for the
-aformat parameter - are there others?

The main problem I had was with string parameters for specifying a path, with ./ default
value and having corresponding extn parameters.

On a Web interface you cannot really allow path and filename manipulation, and you 
must give the user a means to upload or input data (except if you have a login on the user's
home directory, which I'm aware is the choice for other Web interfaces for EMBOSS).
That's why I had to discard the following programs:
alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign.

I have tried to "guess" that such a parameter is a path from its name, 
with the next parameter being the extension, and from whether it is in the input or 
output section, so I can decide whether it is an InFile, Sequence, or Results Pise parameter. 
But some parameters are in neither the input nor the output section, and it is not safe 
to associate two parameters just because they follow each other.
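
The name-based guess described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the Pise code: the function name and the (type, name) tuple layout are hypothetical, and the rule "a string named <stem>path immediately followed by a string named <stem>extn" is inferred from the siggen example (algpath, algextn).

```python
def pair_path_and_extension(params):
    """Pair each '<stem>path' string with an immediately following '<stem>extn'.

    `params` is a list of (acd_type, name) tuples in the order they appear
    in the ACD file. Returns a list of (path_name, extn_name) pairs.
    """
    pairs = []
    for (type_a, name_a), (type_b, name_b) in zip(params, params[1:]):
        if type_a != "string" or type_b != "string":
            continue
        if name_a.endswith("path") and name_b.endswith("extn"):
            # Require a shared stem so unrelated neighbours are not paired.
            if name_a[:-4] == name_b[:-4]:
                pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    siggen_like = [("string", "algpath"), ("string", "algextn")]
    print(pair_path_and_extension(siggen_like))
```

The shared-stem check is what guards against pairing two parameters merely because they are adjacent, which is the weakness noted above.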

A solution could be to have an explicit type and the extension as an attribute:

path: algpath  [
  parameter: "Y"
  prompt: "Location and extension of alignment files for input"
  default: "./"
  extn: ".align"
]

instead of (siggen.acd):

section: input [ info: "input Section" type: page ]
string: algpath  [
  parameter: "Y"
  prompt: "Location of alignment files for input"
  default: "./"
]

string: algextn  [
  parameter: "Y"
  prompt: "Extension of alignment files for input"
  default: ".align"
]

What do you think?

Thanks a lot in advance,

-- 
Catherine Letondal -- Pasteur Institute Computing Center


From peter.rice at uk.lionbioscience.com  Tue Apr 23 12:03:27 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 23 Apr 2002 17:03:27 +0100
Subject: Pise/EMBOSS 2.3.1
References: <200204231506.g3NF6oop249416@electre.pasteur.fr>
Message-ID: <3CC585CF.D664FB68@uk.lionbioscience.com>

Hi Catherine,

> Main changes were for align types, where I could associate a "pipetype" to
> chain to other programs taking alignment as input. BTW, I found "MSF" and "fasta" for the
> -aformat parameter - are there others?

There are more, but not sequence formats. We should add them to "entrails"
output.

We can easily add more sequence formats. Can you suggest some?

The full list is (from ajax/ajalign.c) :

markx0*, markx1*, markx2*, markx3*, markx10* (from the FASTA package)
multiple
pair*
simple
score
srs, srspair* (for simple parsing in SRS in case the others change)
trace (for debugging only)

Those with '*' are for pairwise alignments only.
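
For a Web interface, the list above can be turned into a small lookup that only offers formats valid for the number of sequences in the alignment. This is a minimal Python sketch, not code from ajax/ajalign.c: the table simply transcribes the names and '*' marks from this message, and the function name formats_for is hypothetical.

```python
# Alignment formats as listed in the message. True marks the formats
# flagged with '*', i.e. those for pairwise alignments only.
ALIGN_FORMATS = {
    "markx0": True, "markx1": True, "markx2": True, "markx3": True,
    "markx10": True,
    "multiple": False,
    "pair": True,
    "simple": False,
    "score": False,
    "srs": False,
    "srspair": True,
    "trace": False,  # described as "for debugging only"
}


def formats_for(n_sequences, include_debug=False):
    """Return the sorted format names usable for an alignment of n sequences."""
    names = []
    for name, pairwise_only in ALIGN_FORMATS.items():
        if pairwise_only and n_sequences != 2:
            continue
        if name == "trace" and not include_debug:
            continue
        names.append(name)
    return sorted(names)


if __name__ == "__main__":
    print(formats_for(2))
    print(formats_for(5))
```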

> The main problem I had was with string parameters for specifying a path, with ./ default
> value and having corresponding extn parameters.
>
> That's why I had to discard the following programs:
> alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign.
>
> A solution could be to have an explicit type and the extension as an attribute:
> 
> path: algpath  [
>   parameter: "Y"
>   prompt: "Location and extension of alignment files for input"
>   default: "./"
>   extn: ".align"
> ]
> 
> What do you think?

The path and extension options  are a terrible 'hack' to avoid having "*"
on the command line for those programs.

This is really just infile with a wild card filename (which works already).

We can make a new ACD type "inwild" which works like infile but with some
small differences. The prompt would be "Input file(s)". The ajAcdGetInwild
function will return an AjPFile. We can add functions to report the
filenames as a string list (the first file is already open, the others are
in a list so it is a little tricky to make the list in an application).

There should be an attribute "inextension:align" (for example) and a
default value of "*". If the user specifies "*.align" the inextension will
be ignored.

Associated qualifiers:

-inextension align
-indirectory /home/user/somewhere (defaults to current directory)

For consistency, we can add the same qualifiers for infile.
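
The proposed rule (default value "*", extension ignored when the user already gives one, directory defaulting to the current one) can be sketched as follows. This is only an illustration in Python of the behaviour proposed above, not EMBOSS code: the function name resolve_inwild is hypothetical, and treating any dot-suffix in the value as "the user specified an extension" is an assumption of the sketch.

```python
import posixpath


def resolve_inwild(value="*", inextension=None, indirectory="."):
    """Return the file pattern implied by a hypothetical 'inwild' value.

    The extension is appended only when the value does not already carry
    one, so "*.align" given by the user overrides inextension.
    """
    name = value
    has_extension = posixpath.splitext(name)[1] != ""
    if inextension and not has_extension:
        name = name + "." + inextension.lstrip(".")
    return posixpath.join(indirectory, name)


if __name__ == "__main__":
    print(resolve_inwild("*", "align"))
    print(resolve_inwild("*.aln", "align", "/home/user/somewhere"))
```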

With "out" instead of "in" we can use the same qualifiers for outfile and a
new ACD type "outwild" (outwild can open a new output file, using a new
ajFileNextOut call, but the application needs to give the base name each
time).

All easy to implement.

One problem ... inwild does not work well as a parameter because it has to
be given as "*" on the command line. Same problem for "outwild". I am sure
users can be educated.

The programs that use the path/extension options do not define them as
parameters anyway. Their ACD files need some corrections.

Comments?

regards,

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From letondal at pasteur.fr  Tue Apr 23 13:38:43 2002
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 23 Apr 2002 19:38:43 +0200
Subject: Pise/EMBOSS 2.3.1 
In-Reply-To: Your message of "Tue, 23 Apr 2002 17:03:27 BST."
             <3CC585CF.D664FB68@uk.lionbioscience.com> 
Message-ID: <200204231738.g3NHchop186093@electre.pasteur.fr>


Peter Rice wrote:
> Hi Catherine,

Hi Peter,


> 
> > Main changes were for align types, where I could associate a "pipetype" to
> > chain to other programs taking alignment as input. BTW, I found "MSF" and "fasta" for the
> > -aformat parameter - are there others?
> 
> There are more, but not sequence formats. We should add them to "entrails"
> output.
> 
> We can easily add more sequence formats. Can you suggest some?

I just asked to know which one to put on the Web interface.
(There are also clustalw or Phylip, but it's not necessary in Pise, since there
are format converters).
 

> Those with '*' are for pairwise alignments only.
> 
> > The main problem I had was with string parameters for specifying a path, with ./ default
> > value and having corresponding extn parameters.
> >
> > That's why I had to discard the following programs:
> > alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign.
> >
> > A solution could be to have an explicit type and the extension as an attribute:
> > 
> > path: algpath  [
> >   parameter: "Y"
> >   prompt: "Location and extension of alignment files for input"
> >   default: "./"
> >   extn: ".align"
> > ]
> > 
> > What do you think?
> 
> The path and extension options  are a terrible 'hack' to avoid having "*"
> on the command line for those programs.

I have the same problem just with ./, since '/' cannot be allowed in a string
parameter on a Web server.

Another problem, for which I have made a workaround, is programs that take '*',
such as extractseqfeat: it is replaced in the Web form by 'all', which the CGI
then replaces by '*'. 

> 
> This is really just infile with a wild card filename (which works already).
> 
> We can make a new ACD type "inwild" which works like infile but with some
> small differences. The prompt would be "Input file(s)". The ajAcdGetInwild
> function will return an AjPFile. We can add functions to report the
> filenames as a string list (the first file is already open, the others are
> in a list so it is a little tricky to make the list in an application).
> 
> There should be an attribute "inextension:align" (for example) and a
> default value of "*". If the user specifies "*.align" the inextension will
> be ignored.
> 
> Associated qualifiers:
> 
> -inextension align
> -indirectory /home/user/somewhere (defaults to current directory)
> 
> For consistency, we can add the same qualifiers for infile.
> 
> With "out" instead of "in" we can use the same qualifiers for outfile and a
> new ACD type "outwild" (outwild can open a new output file, using a new
> ajFileNextOut call, but the application needs to give the base name each
> time).
> 
> All easy to implement.
> 
> One problem ... inwild does not work well as a parameter because it has to
> be given as "*" on the command line. Same problem for "outwild". I am sure
> users can be educated.
> 
> The programs that use the path/extension options do not define them as
> parameters anyway. Their ACD files need some corrections.
> 
> Comments?

As long as there is a way to detect this kind of parameter (in order to replace
it by a simple textarea or file upload on a Web interface), I think it's
very useful! So the type would be inwild or outwild?

PS: Regarding Pise/EMBOSS I forgot to mention that not only output alignments 
are "connected" by Pise menus. I have also added this feature for sequence, seqall, 
seqout, etc...

Thanks for the quick answer!

--
Catherine Letondal -- Pasteur Institute Computing Center


From mathog at mendel.bio.caltech.edu  Tue Apr 23 14:25:34 2002
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Tue, 23 Apr 2002 11:25:34 -0700
Subject: Pise/EMBOSS 2.3.1
Message-ID: <E1704ys-00016w-00@mendel.bio.caltech.edu>

> One problem ... inwild does not work well as a parameter because it has to
> be given as "*" on the command line. Same problem for "outwild". I am sure
> users can be educated.

Sure they can.  That's why thousands of hours are being spent wrapping GUIs
around programs so that users don't have to (horrors) log on or (gasp) type
a command line.

Back to the subject at hand. (And this is stream of consciousness, so please
bear with me.) I think that maybe for purposes of interface design there
should be predefined methods to break out (all) the pieces/options of a USA.
(Perhaps even reduced to perl and C modules in the EMBOSS distribution so
that W2H/Pise/etc don't need to be rewritten for each EMBOSS release.)
Consider something like this:

 program -sequence=genbank:\*

That never translates well directly into a GUI because the end user has to
know what the full USA syntax is and especially that a "*" is a wild card.
And often enough, they don't understand these concepts.  And even if they do,
they may not be able to use certain aspects of that syntax on a given server
(for instance, files and paths, or particular databases).  So it falls to the
GUI to put some glue in between the USA and the user.  The two main web
interfaces for EMBOSS take opposite paths in this regard.  Pise hides the USA
completely and W2H allows the user to manipulate USAs through a tool.
In W2H you generally have to build the USAs ahead of time through a separate
window and store them in a list, then you select one or more USAs from the
list when you run the program.  (USAs can also generally be typed into the
slots within the program - if the user knows what he/she is doing.)
In Pise you can enter a database USA like "genbank:dmwhite" (but it isn't
called a USA) but entering "genbank:*" doesn't work (for instance, with
compseq).  Pise isn't really designed to handle wild cards because it's going
to try to extract that whole sequence from the database and save it in a file
and then run the program on that file.  This is consistent with its typical
"upload data for each program" design.  Pise only ever runs programs with the
"simple file" sort of USA.  So perhaps it's just as well that "genbank:*"
doesn't work at the moment!!!  To get around this wildcard limitation Pise
would have to be reworked enough to recognize wildcards (and USAs in general)
and slot them onto the command line without first extracting the sequences
they refer to.

Anyway, what's really going on with -sequence is that all of the components
of a USA are encoded into a single string for use on the command line and
then are broken out again into separate pieces later within the program.
For a GUI _all_ these pieces need to be broken out explicitly and displayed
to the user (who isn't expected to know anything about USAs or have to learn
anything about them or the interface).  Something like this:

format: default
database:genbank
x ALL_ENTRIES  o BY_STRING
entrystring: (blank)

From that the GUI/cgi can easily enough format a USA for the final
command line.
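
Formatting a USA from the four fields above might look like the following. This is a minimal sketch in Python under stated assumptions, not part of any existing GUI: the function name fields_to_usa is hypothetical, only the database:entry form discussed here is handled, and the "format::" prefix for a non-default format is this sketch's reading of the USA convention.

```python
def fields_to_usa(database, all_entries=True, entrystring="", fmt="default"):
    """Build a database USA string from the broken-out GUI fields."""
    if not database:
        raise ValueError("a database name is required")
    if all_entries:
        entry = "*"  # ALL_ENTRIES selected: wild card for every entry
    elif entrystring:
        entry = entrystring  # BY_STRING selected
    else:
        raise ValueError("BY_STRING selected but entrystring is blank")
    usa = database + ":" + entry
    if fmt and fmt != "default":
        # Assumption: an explicit format is written as a "format::" prefix.
        usa = fmt + "::" + usa
    return usa


if __name__ == "__main__":
    print(fields_to_usa("genbank"))
    print(fields_to_usa("genbank", all_entries=False, entrystring="dmwhite"))
```

When the result is placed on a command line through a shell, the "*" still needs escaping, as in the genbank:\* example earlier in this message.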

But imagine using such an interface.  It's great if you just run an
occasional program but not so wonderful when you're doing something complex.
How do you cut and paste the state of 4 (or more) USA variables from one page
(=program) to another?  That suggests to me that a GUI which always has fully
broken out USA options will probably end up being pretty awkward to use.
However, since the purpose of the GUI is essentially to reformat (implicit)
information in the USA why not make that an explicit option - and let it
reformat in both directions?  Then the "standard" USA GUI interface starts to
look something like this:

[test usa] [from USA] [to USA] [use this] [abort]   <------(buttons)
USA:[  genbank:*   ]
format: default
database:genbank      <-------- (pull down list)
x ALL_ENTRIES  o BY_STRING
entrystring: (blank)
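
The [from USA] direction, filling the fields in from a typed USA, could be sketched like this. It is a minimal Python illustration, not an existing EMBOSS or W2H routine: the function name usa_to_fields is hypothetical, only the simple database:entry form is handled (no files, list files or multiple entries), and the optional "format::" prefix is an assumption about the USA convention.

```python
def usa_to_fields(usa):
    """Split a simple database USA into the fields shown in the mock-up."""
    fmt, sep, rest = usa.strip().partition("::")
    if not sep:
        # No explicit "format::" prefix was given.
        fmt, rest = "default", usa.strip()
    database, sep, entry = rest.partition(":")
    if not sep or not database or not entry:
        raise ValueError("only database:entry USAs are handled in this sketch")
    all_entries = entry == "*"
    return {
        "format": fmt,
        "database": database,
        "all_entries": all_entries,
        "entrystring": "" if all_entries else entry,
    }


if __name__ == "__main__":
    print(usa_to_fields("genbank:*"))
    print(usa_to_fields("genbank:dmwhite"))
```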

Actually it's a LOT more complicated than that, considering that it also
encompasses listfiles, multiple entries (foo.msf{one,two, three}) etc.  If the
user has a USA he/she can plug it into the GUI and fine.  Or they can plug it
in, translate it, and tweak it.  Or if they don't have a USA to start with
they can use this page to build one.  And this USA constructor page can
enable/disable the USA fields as appropriate for each site and/or program.
(No file access?  Can't accept list files or wild cards?  Then don't show
those USA options.  Make the database list from the output of showdb.)

The final problem is that exposing the guts of the USA will take up a lot of
screen space and complicate the program interfaces.  That's less of a problem
though if the GUI for any given EMBOSS program just provides a slot to plug
in a USA and some way to pop up the USA formatter window to fill in that slot
(through javascript or whatever).  The popped up formatter could then drop
the final USA back into the program's USA slot.  (Sort of like what W2H does,
but into the program's slot rather than the working list.)

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From jison at hgmp.mrc.ac.uk  Wed Apr 24 04:40:25 2002
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Wed, 24 Apr 2002 09:40:25 +0100
Subject: Pise/EMBOSS 2.3.1
References: <200204231506.g3NF6oop249416@electre.pasteur.fr> <3CC585CF.D664FB68@uk.lionbioscience.com>
Message-ID: <3CC66F79.D7AB51BA@hgmp.mrc.ac.uk>

> The programs that use the path/extension options do not define them as
> parameters anyway. Their ACD files need some corrections.
>
> Comments?

They are parameters in new versions of the protein structure apps
(alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign etc.)
but I haven't committed them yet - within a month hopefully.

J.


From charles at moulinette.dyndns.org  Fri Apr 26 17:30:56 2002
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Fri, 26 Apr 2002 23:30:56 +0200
Subject: seqret doesn't count more than 99?
Message-ID: <20020426213056.GA26616@moulinette.dyndns.org>

Hello,

I downloaded the draft of the fugu genome (fasta format, 300Mb) and
renamed the headers using the following command line :

sed < fugu_02_04_28.fasta 's/>/>gnl|fugu|/' > fugu_newheaders_02_04_28.fasta
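
The same renaming can be sketched in Python for anyone without sed. This is a minimal illustration, not a drop-in replacement: the function name add_header_prefix is hypothetical, and unlike the sed expression (which rewrites the first '>' found anywhere on a line) it only touches lines that start with '>'.

```python
def add_header_prefix(lines, prefix="gnl|fugu|"):
    """Yield FASTA lines with `prefix` inserted after '>' on header lines."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            # Sequence lines are passed through unchanged.
            yield line


if __name__ == "__main__":
    example = [">Scaffold_7\n", "ACGTACGT\n"]
    for out_line in add_header_prefix(example):
        print(out_line, end="")
```

Because it works on any iterable of lines, it can be applied to an open file handle without loading the whole 300 Mb file into memory.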

I'm not able to index a blast database correctly if the header doesn't
look "ncbi compliant" and formatdb hadn't been run with the -o flag.

I created the blast database and indexed it with dbiblast. The reason
for not formatting the fasta file itself is to save space. This also
enforces a synchronicity between the blast hits names and the names
that I can give to seqret.

Here is now the problem :

charles at pc-1035-a:~$ seqret fugu:Scaffold_7
Reads and writes (returns) sequences
Output sequence [scaffold_7.fasta]:  ==> OK!

charles at pc-1035-a:~$ seqret fugu:Scaffold_99
Reads and writes (returns) sequences
Output sequence [scaffold_99.fasta]: ==> OK!

charles at pc-1035-a:~$ seqret fugu:Scaffold_100
Reads and writes (returns) sequences
Error: Unable to read sequence 'fugu:Scaffold_100'

==> KO :((

seqret can't fetch sequence names like Scaffold_xyz, where xyz >= 100.

Is it due to the length of the name?
I am puzzled by that problem... I can send you more info if you like.

Charles


From simon.andrews at bbsrc.ac.uk  Mon Apr 29 05:49:29 2002
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Mon, 29 Apr 2002 10:49:29 +0100
Subject: seqret doesn't count more than 99?
Message-ID: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk>


> -----Original Message-----
> From: Charles Plessy [mailto:charles at moulinette.dyndns.org]
> Sent: 26 April 2002 22:31
> To: emboss at hgmp.mrc.ac.uk
> Subject: seqret doesn't count more than 99?
> 
> 
> Hello,
> 
> I downloaded the draft of the fugu genome 

[snip]

> I'm not able to index a blast database correctly if the header doesn't
> look "ncbi compliant" and formatdb hadn't been run with the -o flag.

I'd not tried this before, but we see the same thing here.  Running dbiblast
on the indexed raw fugu data seems to work, but seqret fails on the
subsequent retrieval.

The problem seems to be in the accession numbers entered into the .trg file
created by dbiblast.  Running seqret with debug on, shows the following
(edited) entries:

------------------------------------
USA to test: 'fugu_blasttest:Scaffold_1'
[snip]

found dbname fugu_blasttest
wild query 'Scaffold_1' 'Scaffold_1' '' 
database type: 'N' format 'ncbi'
use access method 'blast'
Matched seqAccess[12] 'blast'
seqAccessBlast type 1
[snip]

seqCdIdxSearch (entry 'Scaffold_1')
[several more of these]
idx test 59 'Scaffold_100' -1 (+/- 39)
idx test 49 'Contig_83248'  1 (+/- 18)
idx test 54 'Contig_9376'  1 (+/- 8)
idx test 56 'Scaffold_10' -1 (+/- 3)
idx test 55 'Scaffold_1' -1 (+/- 0)
 
ajFileNewIn '/data/Fugu/EMBOSS/TEST/acnum.trg'
ajNamResolve of '/data/Fugu/EMBOSS/TEST/acnum.trg'
seqCdReadHeader file /data/Fugu/EMBOSS/TEST/acnum.trg
  FileSize: 416800 NRecords: 20825 recsize: 20 idsize: 10
seqCdFileOpen '/data/Fugu/EMBOSS/TEST/acnum.trg' NRecords: 20825 RecSize: 20
ajFileNewIn '/data/Fugu/EMBOSS/TEST/acnum.hit'
ajNamResolve of '/data/Fugu/EMBOSS/TEST/acnum.hit'
seqCdReadHeader file /data/Fugu/EMBOSS/TEST/acnum.hit
  FileSize: 83600 NRecords: 20825 recsize: 4 idsize: -6
seqCdFileOpen '/data/Fugu/EMBOSS/TEST/acnum.hit' NRecords: 20825 RecSize: 4
seqCdTrgSearch 'Scaffold_1' recSize: 20
trg test 10412 'ZZ0010413' -1 (+/- 20825)
trg test 5206 'ZZ0005207' -1 (+/- 10412)
trg test 2603 'ZZ0002604' -1 (+/- 5206)
trg test 1301 'ZZ0001302' -1 (+/- 2603)
trg test 650 'ZZ0000651' -1 (+/- 1301)
trg test 325 'ZZ0000326' -1 (+/- 650)
trg test 162 'ZZ0000163' -1 (+/- 325)
trg test 81 'ZZ0000082' -1 (+/- 162)
trg test 40 'ZZ0000041' -1 (+/- 81)
trg test 20 'ZZ0000021' -1 (+/- 40)
trg test 10 'ZZ0000011' -1 (+/- 20)
trg test 5 'ZZ0000006' -1 (+/- 10)
trg test 2 'ZZ0000003' -1 (+/- 5)
trg test 1 'ZZ0000002' -1 (+/- 2)
trg test 0 'ZZ0000001' -1 (+/- 1)
'SCAFFOLD_1' not found found in .trg

------------------------------------------------

After this it cleans up after itself and exits.  Looking through the .trg
file all the accessions are of the form ZZ0000XXX.  This format of accession
doesn't appear anywhere in my original data, so I don't know where it's
coming from (presumably either dbiblast or formatdb?).  The inability to
reconcile the Scaffold_1 with the ZZ00... accessions seems to be what causes
seqret to fail.
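
The probe positions in the "trg test" lines roughly halve at each step, which is what a binary search over sorted records looks like. The following minimal Python sketch shows why such a search cannot succeed when the index holds only ZZ... keys. It is not the EMBOSS seqCdTrgSearch routine: the function name is hypothetical, the key list is a stand-in modelled on the ZZ0000001 style values in the trace, and its probe positions after the first are not claimed to match the trace exactly.

```python
def search_sorted_index(keys, query):
    """Binary-search a sorted list of keys.

    Returns (position, probes): position is the index of the match or None,
    and probes is the list of positions that were compared on the way.
    """
    lo, hi = 0, len(keys) - 1
    probes = []
    while lo <= hi:
        mid = (lo + hi) // 2
        probes.append(mid)
        if query == keys[mid]:
            return mid, probes
        if query < keys[mid]:
            hi = mid - 1  # query sorts before this record: go left
        else:
            lo = mid + 1  # query sorts after this record: go right
    return None, probes


if __name__ == "__main__":
    # Stand-in index: 20825 invented accessions, as in the NRecords value.
    keys = ["ZZ%07d" % n for n in range(1, 20826)]
    position, probes = search_sorted_index(keys, "SCAFFOLD_1")
    print(position, probes)
```

Since "SCAFFOLD_1" sorts before every ZZ key, each comparison sends the search left until it runs off the start of the index, which matches the run of -1 results ending at record 0.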


> I created the blast database and indexed it with dbiblast. The reason
> for not formatting the fasta file itself is to save space. This also
> enforces a synchronicity between the blast hits names and the names
> that I can give to seqret.

The way we did this was to use the fasta files for both.  I take the point
about the space saving, but the assembled data wasn't all that big.  If you
use the raw fasta files for both formatdb (without header parsing) and
dbifasta, then you can still use the same accession codes as reference in
both.


> Here is now the problem :
> 
> charles at pc-1035-a:~$ seqret fugu:Scaffold_100
> Reads and writes (returns) sequences
> Error: Unable to read sequence 'fugu:Scaffold_100'
> 
> ==> KO :((
> 
> seqret can't fetch sequence names like Scaffold_xyz, where 
> xyz >= 100.
> 
> Is it due to the length of the name?

It might be worth running seqret with the -debug flag on and looking at the
messages at the end of seqret.dbg.  This usually gives some more useful
information about what is going wrong in these cases.

I'd be interested in seeing a resolution to this as well...

	TTFN

	Simon.


From peter.rice at uk.lionbioscience.com  Mon Apr 29 06:41:30 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Mon, 29 Apr 2002 11:41:30 +0100
Subject: seqret doesn't count more than 99?
References: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk>
Message-ID: <3CCD235A.D418202E@uk.lionbioscience.com>

"simon andrews (BI)" wrote:
> The problem seems to be in the accession numbers entered into the .trg file
> created by dbiblast.  Running seqret with debug on, shows the following
> (edited) entries:

The command line:

   seqret fugu_blasttest:Scaffold_1

searches both the entryname and acnum indices.

The ZZ accession numbers are invented by dbiblast so there is something in
the acnum index (they should disappear in 2.4.0, where we handle empty
indices gracefully).

The problem will be in the entryname index, where it seems Scaffold_1 was
found, but not accepted. I am waiting for the example file from Charles,
but I suspect this is a problem already fixed in the code for 2.4.0.

regards,

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From charles at moulinette.dyndns.org  Mon Apr 29 09:22:07 2002
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Mon, 29 Apr 2002 15:22:07 +0200
Subject: seqret doesn't count more than 99?
In-Reply-To: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk>
References: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk>
Message-ID: <20020429132207.GD1818@moulinette.dyndns.org>

> I'd not tried this before, but we see the same thing here.  Running dbiblast
> on the indexed raw fugu data seems to work, but seqret fails on the
> subsequent retrieval.

I have to NCBIze the headers in order to make it work : I use either
lcl|entryname or gnl|dbname|entryname
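
For scripts that need to map such identifiers back to plain entry names (for example, to pass a blast hit name to seqret), a minimal Python sketch follows. The function name is hypothetical and only the two forms mentioned above are recognised; anything else is returned unchanged.

```python
def entry_name_from_id(seq_id):
    """Extract the entry name from 'lcl|entryname' or 'gnl|dbname|entryname'."""
    parts = seq_id.split("|")
    if parts[0] == "lcl" and len(parts) == 2:
        return parts[1]
    if parts[0] == "gnl" and len(parts) == 3:
        return parts[2]
    # Not one of the two forms handled here: leave it untouched.
    return seq_id


if __name__ == "__main__":
    print(entry_name_from_id("gnl|fugu|Scaffold_100"))
    print(entry_name_from_id("lcl|Scaffold_7"))
```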

> > I created the blast database and indexed it with dbiblast. The reason
> > for not formatting the fasta file itself is to save space. This also
> > enforces a synchronicity between the blast hits names and the names
> > that I can give to seqret.
> 
> The way we did this was to use the fasta files for both.  I take the point
> about the space saving, but the assembled data wasn't all that big.  If you
> use the raw fasta files for both formatdb (without header parsing) and
> dbifasta, then you can still use the same accession codes as reference in
> both.

You are right, I was also motivated to do something 'aesthetic' ;)

> It might be worth running seqret with the -debug flag on and looking at the
> messages at the end of seqret.dbg.  This usually gives some more useful
> information about what is going wrong in these cases.

I can send the debug info upon request, the files (one success, one
failure) are not that big (70k) but I think that netiquette doesn't
recommend sending them to all the list.

Charles


From gwilliam at hgmp.mrc.ac.uk  Wed Apr  3 15:56:04 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Wed, 03 Apr 2002 16:56:04 +0100
Subject: primer3 and qual scores
References: <E16sRK0-0000ZJ-00@mendel.bio.caltech.edu>
Message-ID: <3CAB2614.9A8B28A@hgmp.mrc.ac.uk>


As you say "a slew of options"!
I didn't include the quality values as there was pressure from the GUI
community to minimise the number of options.

I, myself,  have never used the quality options.
They could be added in, but there are already rather a lot of options
for this program; do we need more? Which ones do you consider the most
useful, and why?

Gary

David Mathog wrote:
> 
> The 0.9 version of primer3 from
> 
> http://www-genome.wi.mit.edu/genome_software/other/primer3.html
> 
> comes with a cgi script that puts up a web interface which
> drives primer3_core.  That web interface provides a slew
> of options for including qual values in a section labeled
> "Sequence Quality".  The primer3 program in EMBOSS seems not to
> have these options.   Is there some technical reason why this
> functionality wasn't included or is it just one of those (many) things
> that have yet to percolate to the top of the "to do" list?
> 
> Thanks,
> 
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From foisys at mac.com  Thu Apr  4 19:02:47 2002
From: foisys at mac.com (Sylvain Foisy)
Date: Thu, 4 Apr 2002 14:02:47 -0500
Subject: Compiling EMBOSS for Jemboss use in MacOS X
Message-ID: <8DA5625E-47FE-11D6-ABCB-0003936297DA@mac.com>

Hi,

Thanks for supporting OS X in EMBOSS. I got a clean compile and make and 
it is working OK. I would like to try Jemboss and I have one (I guess, 
pretty stupid) question. Which Java location should I specify in 
the configure command on Mac OS X? There are a lot of different places 
in OS X that might be good...

Any hints?

Sylvain

++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Sylvain Foisy, Ph. D.
Manager
BIONEQ - Le Reseau quebecois de bioinformatique
Genome-Quebec
Tel.: (514) 343-6111 poste 5188
E-mail: foisys at medcn.umontreal.ca
++++++++++++++++++++++++++++++++++++++++++++++++++++++++


From ame at esbs.u-strasbg.fr  Mon Apr  8 09:13:07 2002
From: ame at esbs.u-strasbg.fr (Jean-Christophe Ame)
Date: Mon, 8 Apr 2002 11:13:07 +0200
Subject: Blast database
Message-ID: <D6C83EC3-4AD0-11D6-BFB6-0005024329A7@esbs.u-strasbg.fr>

Hello,

I have a few BLAST formatted databases to be used with BLAST and I would 
like them to be shared with emboss. How can I do that? How should I set 
up my .embossrc file? Any answer would be of great help. Thank you.

Jean-Christophe


________________________
Jean-Christophe Amé, PhD
U.P.R. 9003 du CNRS - Cancérogénèse et Mutagénèse Moléculaire et 
Structurale
École Supérieure de Biotechnologie de Strasbourg
Pôle API
Boulevard Sébastien-Brant
France

tel.: 03 90 24 47 05
Fax.: 03 90 24 46 86


From cquijano at iib.uam.es  Tue Apr  9 08:51:30 2002
From: cquijano at iib.uam.es (Carlos Quijano)
Date: Tue, 09 Apr 2002 10:51:30 +0200
Subject: Where is protml (Phylip) ?
Message-ID: <3CB2AB92.5040308@iib.uam.es>

Hello,

Some people asked me about "activating" the protml application from the 
phylip package. ;-)
We use Emboss, with Embassy, and it seems that protml is documented but 
not compiled or installed. Even looking for it under ./src, it is not 
present for some reason.

I don't know if the cause is that protml (something like dnaml 
but for protein trees) was developed in PASCAL instead of C.

In the phylip package (not Embassy's) protml.pas is present among the 
other source files. And I have seen that it comes with the MOLPHY package 
too, but in C, I guess.

Does anyone have any idea how to put a compiled protml.pas (or MOLPHY's) into 
Emboss and make it accessible to the front-end application used (w2h, in my case)?

I know this solution is perhaps not the best one, or perhaps a little 
paranoid, because it's possible to use PIE as a web interface for 
Phylip, MOLPHY's protml and puzzle.

I am only looking for an easy way to compile protml and make it part 
of the Emboss or Emboss/w2h application set.


Thank you for your time.


From frank at bioss.ac.uk  Tue Apr  9 10:04:18 2002
From: frank at bioss.ac.uk (Frank Wright)
Date: Tue, 09 Apr 2002 11:04:18 +0100
Subject: Where is protml (Phylip) ?
References: <3CB2AB92.5040308@iib.uam.es>
Message-ID: <3CB2BCA2.F34D3DEC@bioss.ac.uk>

PHYLIP 3.5 includes PROTML (from the MOLPHY package) because v 3.5 does
not have a protein maximum likelihood program.

However, PHYLIP 3.6 (almost about to go to a Beta release) has a new
program, PROML, which is a PHYLIP protein maximum likelihood program. 
PROML has additional features to PROTML.  

I suggest waiting until PHYLIP 3.6 is released and EMBOSS/EMBASSY has been
adapted to access the v 3.6 programs.  In the meantime, PHYLIP 3.6 (alpha
version, but pretty stable) is available from
http://evolution.genetics.washington.edu/phylip.html.

Best Wishes,
Frank
-- 
Frank Wright
Biomathematics and Statistics Scotland, 
SCRI, DUNDEE DD2 5DA, Scotland
frank at bioss.sari.ac.uk


From Guoneng.Zhong at med.nyu.edu  Tue Apr  9 17:37:42 2002
From: Guoneng.Zhong at med.nyu.edu (Guoneng Zhong)
Date: Tue, 9 Apr 2002 13:37:42 -0400
Subject: problem running
Message-ID: <7EDDC060-4BE0-11D6-A32B-0050E41E5C1B@med.nyu.edu>

Hi,
I followed the instructions and installed emboss on a Tru64 unix.  I ran 
the test:
wossname -auto | more

and it worked (at least no weird errors).

But here are two problems:

1. Running jemboss gave me this:
Error: failed /usr/opt/java131/jre/lib/alpha/fast/libjvm.so, because 
dlopen: cannot load /usr/opt/java131/jre/lib/alpha/fast/libjvm.so

2. Running abiviewer gave me this:
Reads ABI file and display the trace
Output sequence [outfile.fasta]:
Graph type [x11]:
PLPLOT_LIB="/usr/local/emboss/lib"

Cannot open library file: plstnd5.fnt

Please set PLPLOT_LIB to the plplot/lib directory under emboss

*** PLPLOT ERROR ***
Unable to open font file
Program aborted

Any hint would help.  Thanks!

Guoneng


From David.Bauer at SCHERING.DE  Wed Apr 10 05:47:02 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Wed, 10 Apr 2002 07:47:02 +0200
Subject: Antwort: problem running
Message-ID: <OFDEBEE501.AA84E38F-ONC1256B97.001E7AAE@schering.de>


Hi,

Answer for question 2:

The PLPLOT_LIB must point to a directory with the .fnt files.
If you do a standard installation, then this is "/usr/local/share/EMBOSS".
Alternatively you can use the location where you unpacked the emboss tar file.
In that case it is the /.../EMBOSS-..../plplot/lib.
So if you use csh or tcsh you should have a
     setenv PLPLOT_LIB /usr/local/share/EMBOSS
in your .cshrc.
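
The setting can be checked before running abiviewer again. A minimal sketch
in Python, assuming only what is stated above (PLPLOT_LIB must name a
directory that holds plstnd5.fnt); the function name is made up for
illustration and is not part of EMBOSS or PLplot:

```python
import os

FONT_FILE = "plstnd5.fnt"  # the font file named in the PLPLOT error message


def plplot_lib_status(environ):
    """Report whether PLPLOT_LIB points at a directory containing the font file."""
    directory = environ.get("PLPLOT_LIB")
    if not directory:
        return "PLPLOT_LIB is not set"
    if not os.path.isfile(os.path.join(directory, FONT_FILE)):
        return "%s not found in %s" % (FONT_FILE, directory)
    return "ok"


if __name__ == "__main__":
    # Check the environment of the current shell.
    print(plplot_lib_status(os.environ))
```

If this reports that the file was not found, the variable is pointing at
the wrong directory, as in the /usr/local/emboss/lib case in the question.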

Hope this helps.

Ciao,
David.


From john.walshaw at bbsrc.ac.uk  Wed Apr 10 10:44:13 2002
From: john.walshaw at bbsrc.ac.uk (john walshaw (JIC))
Date: Wed, 10 Apr 2002 11:44:13 +0100
Subject: dbifasta/seqret and ncbi-format fasta headers
Message-ID: <E4D0A20B9E9ED4118E3C00508BEED171030AC451@jimserv2.jic.bbsrc.ac.uk>

I have a question about ncbi-type sequence headers in fasta-format files.
I'm using EMBOSS 2.3.1.

The ncbi format for the dbifasta program is described variously as:

      ncbi : >blah|...[|ACC]|ID
      
and
	>...[|accno]|id ...

in the EMBOSS admin guide and by 'tfm dbifasta'.

From these I assumed that within the first of the whitespace-delimited
'fields', the last two '|'-delimited subfields will be treated by dbifasta
as the accession no and ID respectively:

>gi|15375403|dbj|AB039926.1|AB039926 Arabidopsis ...blah...
                 ^^^^^^^^^^ ^^^^^^^^
                    accno      id
		    
- but this doesn't work as seqret reports in this case that AB039926 is not in
my database (which I indexed with dbifasta using idformat 'ncbi', and specified
with method: emblcd  format:fasta & the necessary dir: and indexdir: fields).

But this sequence works (I can get it with seqret) -

>gi|15383574|gb|AV540904.2|AV540904 AV540904 Arabidopsis thaliana roots ...blah
                           ^^^^^^^^ ^^^^^^^^
-because the second whitespace-delimited field is present AND identical to the
previous subfield. The 2nd field is not simply being used as the accno, because
for example this entry:

>gi|15383574|gb|AV540904.2|XXXXXXX YYYYYYY

cannot be returned by seqret either as XXXXXXX or YYYYYYY (or by any means
other than requesting all sequences in the DB).

Am I doing something stupid? I've looked into this problem a lot, and can
provide debug files for seqret & dbifasta, and I'm sure my db specification in
emboss.default is correct. For the sequences which fail, seqret reads the
correct header line, but then thinks that accno=''. And seqret always returns
the id as 'gi' (even for sequences which can be fetched normally). All of the
correct accnos (e.g. AV540904.2) appear in the acnum.trg file.
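
To make the assumption concrete, here is a small Python sketch of the
parsing I expected from the documentation. It only illustrates my reading
of the 'ncbi' idformat description; it is not taken from the dbifasta
source, and the function name is made up:

```python
def parse_ncbi_header(line):
    """Split a FASTA header as the 'ncbi' idformat description suggests.

    Returns (accno, seq_id, description), taking the last two
    '|'-delimited subfields of the first whitespace-delimited field.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    fields = line[1:].split(None, 1)
    if not fields:
        raise ValueError("empty FASTA header")
    subfields = fields[0].split("|")
    seq_id = subfields[-1]
    accno = subfields[-2] if len(subfields) > 1 else ""
    description = fields[1] if len(fields) > 1 else ""
    return accno, seq_id, description


header = ">gi|15375403|dbj|AB039926.1|AB039926 Arabidopsis ...blah..."
print(parse_ncbi_header(header))
```

Under this reading the first example header yields accno AB039926.1 and
id AB039926, which is the result seqret does not give me.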

Regards,

John Walshaw
John Innes Centre, Norwich Research Park,
Colney, Norwich NR4 7UH, UK. +44(0)1603 450827


From valenzi at iigb.na.cnr.it  Wed Apr 10 16:49:07 2002
From: valenzi at iigb.na.cnr.it (Marco Valenzi)
Date: Wed, 10 Apr 2002 18:49:07 +0200
Subject: About prima
Message-ID: <a05001901b8da1cfd72f0@[140.164.13.84]>

Hi, I'm Marco Valenzi from Naples.
Why has prima been removed from the current EMBOSS-2.3.1 package?
Many thanks
-- 


Marco Valenzi
Institute of Genetics and Biophysics "Adriano Buzzati Traverso"
via Guglielmo Marconi, 10
80125 Naples ITALY
E-mail valenzi at iigbna.iigb.na.cnr.it
tel. +39 081 7257303


From cox at mshri.on.ca  Sun Apr 14 00:46:07 2002
From: cox at mshri.on.ca (Brian Cox)
Date: Sat, 13 Apr 2002 17:46:07 -0700
Subject: jemboss
Message-ID: <000801c1e34d$c79a9c30$bc66f8ce@rossdell>

Hello, I noticed that there is a Jemboss for Windows on your FTP site.  I downloaded it, but it requires a login and password.  How do I obtain these?  Is this a standalone version like the one for Unix?

Thank you
Brian Cox

From quenzer at informatik.uni-tuebingen.de  Tue Apr 16 12:08:56 2002
From: quenzer at informatik.uni-tuebingen.de (Muriel Quenzer)
Date: Tue, 16 Apr 2002 14:08:56 +0200
Subject: Size of EMBOSS 2.3.1 for Solaris 2.8
Message-ID: <200204161208.g3GC8u721130@tauri.informatik.uni-tuebingen.de>

Hi,

I am new to EMBOSS and have to install the latest EMBOSS version 2.3.1 for Sun 
Solaris 2.8. I have been told that the EMBOSS version 2.0.1 needed 
approximately 15 MB disk space, whereas the EMBOSS version 2.3.1 that I 
compiled needs approximately 520 (!) MB. Is this correct?

Thanks for any suggestions.

Muriel


-- 
With kind regards,

Muriel Quenzer
----------
Universität Tübingen
Wilhelm-Schickard-Institut für Informatik
Zentrum für Bioinformatik
Sand 13, 72076 Tübingen
Germany
Tel.: +49 (0)7071/29-70464                               
E-mail: quenzer at informatik.uni-tuebingen.de        
GnuPG PUBLIC KEY on request
Key fingerprint = ADDF 1E38 773F 3D51 682E  1F50 D7CC 47E1 3AE8 E047


From charles at moulinette.dyndns.org  Tue Apr 16 12:55:32 2002
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Tue, 16 Apr 2002 14:55:32 +0200
Subject: Pentium optimisation
Message-ID: <20020416125532.GA22253@moulinette.dyndns.org>

Hi,

I'm running EMBOSS on Debian GNU/Linux with a Pentium IV. Would I
increase the speed of computations if I compiled it for i686 rather than
i386 processors (or is that only useful for multimedia apps)?

Charles


From David.Bauer at SCHERING.DE  Tue Apr 16 14:11:39 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Tue, 16 Apr 2002 16:11:39 +0200
Subject: Antwort: Size of EMBOSS 2.3.1 for Solaris 2.8
Message-ID: <OF6E1B5F8D.98AC7DD3-ONC1256B9D.004ABA0E@schering.de>


Hi,

this is a little bit overestimated....

I have EMBOSS on Solaris 2.7.
The build tree is ~100 MB with embassy apps (they are about 15 MB incl. tar
files).
The installed version needs 21.4 MB for the binaries and 94 MB in share/EMBOSS
(where 75 MB are for PRINTS,PROSITE and REBASE).

With kind regards,
David.


From peacfrog at ptd.net  Tue Apr 16 17:18:07 2002
From: peacfrog at ptd.net (Cynthia Martino)
Date: Tue, 16 Apr 2002 13:18:07 -0400
Subject: Prima
Message-ID: <001301c1e56a$ad3c7880$2d7ce518@msns.str.ptd.net>

Hi there!

In the past I was able to access a number of programs, including prima, via the EMBnet Norway site.  However, now when I click on the program name within the program list all I get is a help page describing the qualifiers.  

Do you know if this and the other programs formerly available (via www.no.embnet.org/Programs/) can still be accessed online? 

Any feedback is greatly appreciated.


From letondal at pasteur.fr  Tue Apr 16 21:41:15 2002
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 16 Apr 2002 23:41:15 +0200
Subject: Prima 
In-Reply-To: Your message of "Tue, 16 Apr 2002 13:18:07 EDT."
             <001301c1e56a$ad3c7880$2d7ce518@msns.str.ptd.net> 
Message-ID: <200204162141.g3GLfF6O453283@electre.pasteur.fr>


"Cynthia Martino" wrote:
> Hi there!
> 
> In the past I was able to access a number of programs, including prima,
> via the EMBnet Norway site.  However, now when I click on the program
> name within the program list all I get is a help page describing the
> qualifiers.
> 
> Do you know if this and the other programs formerly available (via
> www.no.embnet.org/Programs/) can still be accessed online?
> 
> Any feedback is greatly appreciated.
> 

Hi,

If you want to use a similar interface, you can go to:
http://bioweb.pasteur.fr/seqanal/interfaces/prima.html
(see http://bioweb.pasteur.fr/intro-uk.html for all EMBOSS programs)

There are, however, other EMBOSS interfaces that you can use; people
from EMBOSS can tell you about them more accurately than I can.

I don't know what is happening on the no.embnet.org server. 
We are late in distributing the XML Pise programs for the latest EMBOSS
version, so that could explain it.

--
Catherine Letondal -- Pasteur Institute Computing Center


From David.Bauer at SCHERING.DE  Wed Apr 17 05:34:38 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Wed, 17 Apr 2002 07:34:38 +0200
Subject: Antwort: Prima
Message-ID: <OFA61313C6.D09988C0-ONC1256B9E.001E5F7C@schering.de>


Hi,

the EMBOSS programs are also available at
http://ubigcg.mdh4.mdc-berlin.de:8080/

Btw. I have updated the system to EMBOSS version 2.3.1.

Ciao, David.


From mathog at mendel.bio.caltech.edu  Wed Apr 17 19:05:47 2002
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Wed, 17 Apr 2002 12:05:47 -0700
Subject: network USA
Message-ID: <E16xukV-0004HN-00@mendel.bio.caltech.edu>

Today I finally realized that the NCBI's PmFetch cgi 

  http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html

can be used to retrieve data via gi using a "simple" URL like this:

wget -O dmwhite.genbank \
'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'

Unfortunately it seems not to be able to retrieve by either accession number
or locus name - I'm still waiting to hear if there is some other NCBI 
interface for that.
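
For anyone scripting this, the URL can be assembled from its parts rather
than typed by hand. A minimal Python sketch, using only the address and
query fields from the wget example above; the function name is made up,
and nothing here is an official NCBI client:

```python
from urllib.parse import urlencode

# Address taken from the wget example; treated here as a plain string.
PMFETCH = "http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi"


def pmfetch_url(gi, db="Nucleotide", report="gen", mode="text"):
    """Build a PmFetch URL for one gi number."""
    query = urlencode([("db", db), ("id", gi), ("report", report), ("mode", mode)])
    return PMFETCH + "?" + query


print(pmfetch_url(10873))
```

The printed URL is the same one passed to wget above; changing report to
"fasta" gives the form used in the longer seqret example.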

Which is a long way of coming around to considering how a USA could be
used to retrieve remote sequences without exposing end users to truly
hideous constructs.  The semantics of accessing arbitrary network databases
are probably much too complex to include in the USA but one can imagine
burying these details under new types of "database" entries in the defaults
file. Something like this:

DB gigenbank [
  method: remoteurlbyid
  comment: "GENBANK at NCBI by gi number"
  format: -
  dir: -
  file: -
  type: N
#optional
  target: 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=$ID&report=gen&mode=text'
  filter: 'wget -O - $target'
]

Which would then allow something like this to work transparently:

% seqret gigenbank:10873

The USA already has the "program" option but I think in a situation like
this it's much too complex to actually use.  How many users are going to be
able to successfully negotiate this:

% seqret -sequence=fasta::"wget -O - 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=fasta&mode=text' |" -filter

Anyway, what I'm proposing is that the database definition be extended
slightly to allow remote access methods.  This would be particularly helpful
for people running EMBOSS on their own PCs or Macs, who tend not to have
large local databases installed.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From dmartin at bioinformatics.msiwtb.dundee.ac.uk  Wed Apr 17 19:32:40 2002
From: dmartin at bioinformatics.msiwtb.dundee.ac.uk (David Martin)
Date: Wed, 17 Apr 2002 20:32:40 +0100 (BST)
Subject: network USA
In-Reply-To: <E16xukV-0004HN-00@mendel.bio.caltech.edu>
Message-ID: <Pine.LNX.4.33.0204172029230.2149-100000@bioinformatics.msiwtb.dundee.ac.uk>

On Wed, 17 Apr 2002, David Mathog wrote:

> Today I finally realized that the NCBi's PmFetch cgi
>
>   http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html
>
> can be used to retrieve data via gi using a "simple" URL like this:
>
> wget -O dmwhite.genbank \
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'
>
> Unfortunately it seems not to be able to retrieve by either accession
> number or
> locus name - I'm still waiting to hear if there is some other NCBI
> interface for that.
>
> Which is a long way of coming around to considering how a USA could be
> used to retrieve remote sequences without exposing end users to truly
> hideous
> constructs.  The semantics of accessing arbitrary network databases are
> probably much too complex to include in the USA but one can imagine
> burying
> these details under new types of "database" entries in the defaults
> file. Something like this:

Try 'method: url' and using %s instead of $ID. It has been there from
EMBOSS 0.0.4 to allow retrieval from remote srs servers (or indeed any
arbitrary web address where the id can be passed in the url).

Around page 19-20 in the admin guide.

If it doesn't work then let the guilty parties know.
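
The idea behind the %s placeholder, as a rough Python sketch. This only
illustrates the substitution; it is not how EMBOSS implements it, and the
function name and the template (your URL with %s in place of $ID) are
assumptions for the example:

```python
from urllib.parse import quote


def expand_url(template, entry_id):
    """Replace the single %s placeholder in a URL template with an entry ID."""
    if template.count("%s") != 1:
        raise ValueError("expected exactly one %s placeholder")
    # Escape the ID so that odd characters cannot break the query string.
    return template.replace("%s", quote(entry_id, safe=""))


TEMPLATE = ("http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi"
            "?db=Nucleotide&id=%s&report=gen&mode=text")
print(expand_url(TEMPLATE, "10873"))
```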

..d

>
> DB gigenbank [
>   method: remoteurlbyid
>   comment: "GENBANK at NCBI by gi number"
>   format: -
>   dir: -
>   file: -
>   type: N
> #optional
>   target:
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=$ID&report=gen&mode=text'
>   filter: 'wget -O - $target'
> ]
>
> Which would then allow something like this to work transparently:
>
> % seqret gigenbank:10873
>
> The USA already has the "program" option but I think in a situation like
> this it's
> much too complex to actually use.  How many users are going to be able
> to successfully negotiate this:
>
> % seqret -sequence=fasta::"wget -O -
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=fasta&mode=text'
> |" -filter
>
> Anyway, what I'm proposing is that the database definition be extended
> slightly
> to allow remote accesss methods.  This would be particularly helpful for
> people
> running EMBOSS on their own PCs or Macs, who tend not to have large
> local databases installed.
>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
>

----------------------------------
David Martin PhD
Bioinformatics Scientific Officer
Wellcome Trust Biocentre, Dundee
----------------------------------


From David.Bauer at SCHERING.DE  Thu Apr 18 05:50:47 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Thu, 18 Apr 2002 07:50:47 +0200
Subject: Antwort: network USA
Message-ID: <OF31990B94.D3A4AF87-ONC1256B9F.001E1BD2@schering.de>


Hi,

For this I use a workaround: method app, calling scripts that use two
URLs at NCBI.
In emboss.default I have two entries, for nucleotide and protein, each of
which calls an external script.
############
DB ncbin [ type: N method: app format: genbank
  app: "/bips/bin/emboss/ncbi_fetchn %s"
  comment: "NCBI GenBank Nucleotide" ]

DB ncbip [ type: P method: app format: genbank
  app: "/bips/bin/emboss/ncbi_fetchp %s"
  comment: "NCBI GenBank Protein" ]
##################

Unfortunately the script is not very portable, as it uses a modified Perl LWP
module to work with our firewall.
The basic idea is to use

"http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi?db=nucleotide&term=$id&dopt=genbank"
or "http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi?db=protein&term=$id&dopt=genpept" for protein

to get the gid, where $id is a locus or acc.
If there are different gids for one acc, then all of them are returned.
What I have observed is that the gid of the most recent version is returned
first (but I'm not sure if this is always true).
So I just grab the first gid that comes and then use the same url you already
mentioned:

"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=text&db=nucleotide&uid=$gid&dopt=GenBank"
("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=text&db=protein&uid=$gid&dopt=genpept")

to return the whole entry.

So the user can just use
ncbin:<some id or acc.>
with seqret or entret to get the sequence or GenBank entry.
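
The two steps can be outlined in Python as follows. This is a sketch of
the logic only, not my Perl script: the HTTP call and the parsing of the
gid list are passed in as functions (both hypothetical here), because they
depend on the local firewall setup and on the format of the NCBI reply,
which is not described above.

```python
from urllib.parse import quote

# Nucleotide URLs as given above, with {id} and {gid} in place of $id and $gid.
QUERY_URL = ("http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi"
             "?db=nucleotide&term={id}&dopt=genbank")
FETCH_URL = ("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
             "?cmd=text&db=nucleotide&uid={gid}&dopt=GenBank")


def fetch_entry(identifier, http_get, extract_gids):
    """Look up the gids for a locus or accession, then fetch the first one.

    http_get(url) must return the response text; extract_gids(text) must
    return the gids found in it, in the order they were returned.
    """
    listing = http_get(QUERY_URL.format(id=quote(identifier, safe="")))
    gids = extract_gids(listing)
    if not gids:
        raise LookupError("no gid found for %r" % identifier)
    # Take the first gid, on the unverified observation that the most
    # recent version is listed first.
    return http_get(FETCH_URL.format(gid=gids[0]))
```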


Hope this helps,
Ciao, David.


From inagy at abc.hu  Thu Apr 18 06:50:57 2002
From: inagy at abc.hu (inagy at abc.hu)
Date: Thu, 18 Apr 2002 08:50:57 +0200 (CEST)
Subject: primer3 and -format_output
In-Reply-To: <3CAB2614.9A8B28A@hgmp.mrc.ac.uk>
Message-ID: <XFMail.20020418085057.inagy@abc.hu>


The Whitehead primer3 program has a "-format_output" option that writes a
formatted output of the input sequences and highlights the primer binding sites,
etc.

Would it be possible to include this option into the EMBOSS version too ?
It is sometimes very useful.


Istvan


From gwilliam at hgmp.mrc.ac.uk  Fri Apr 19 08:15:14 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 19 Apr 2002 09:15:14 +0100
Subject: primer3 and -format_output
References: <XFMail.20020418085057.inagy@abc.hu>
Message-ID: <3CBFD212.C29B713@hgmp.mrc.ac.uk>

I'll add this to the list of suggestions for primer3.
Gary

inagy at abc.hu wrote:
> 
> The Whitehead primer3 program has a "-format_output" option that writes a
> formatted output of the input sequences and highlights the primer binding sites,
> etc.
> 
> Would it be possible to include this option into the EMBOSS version too ?
> It is sometimes very useful.
> 
> Istvan

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From grimplet at ensam.inra.fr  Fri Apr 19 09:27:58 2002
From: grimplet at ensam.inra.fr (jérôme Grimplet)
Date: Fri, 19 Apr 2002 11:27:58 +0200
Subject: primer3_core
Message-ID: <200204191129.g3JBTOe14633@ensam.inra.fr>

I believe that somebody already asked this question a few weeks ago, but how can 
I get the primer3_core program? I can't find it in the Whitehead Institute 
package.

Thanks,

Jerome
-- 
Jérôme Grimplet 
Laboratoire de Biochimie Métabolique et Technologie
UMR Sciences Pour l'Oenologie
2, Place Viala
34060 Montpellier Cedex 01
Tel: 33(0)4.99.61.27.56
Fax: 33(0)4.99.61.28.57
grimplet at ensam.inra.fr


From gwilliam at hgmp.mrc.ac.uk  Fri Apr 19 10:51:46 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 19 Apr 2002 11:51:46 +0100
Subject: primer3_core
References: <200204191129.g3JBTOe14633@ensam.inra.fr>
Message-ID: <3CBFF6C2.DCAF6007@hgmp.mrc.ac.uk>


From the eprimer3 documentation:

Notes

   The Whitehead Institute program that is run by this program is
   available from:
   http://www-genome.wi.mit.edu/genome_software/other/primer3.html
   (Then see the link 'Get release 0.9')

   The version that is run by this program is 3.0.9 currently available
   from:
  
http://www-genome.wi.mit.edu/ftp/distribution/software/primer3_0_9_test.tar.gz
 

jérôme Grimplet wrote:
> 
> I believe that somebody already asked this question a few weeks ago, but how can
> I get the primer3_core program? I can't find it in the Whitehead Institute
> package.
> 
> Thanks,
> 
> Jerome
> --
> Jérôme Grimplet
> Laboratoire de Biochimie Métabolique et Technologie
> UMR Sciences Pour l'Oenologie
> 2, Place Viala
> 34060 Montpellier Cedex 01
> Tel: 33(0)4.99.61.27.56
> Fax: 33(0)4.99.61.28.57
> grimplet at ensam.inra.fr

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From peter.rice at uk.lionbioscience.com  Fri Apr 19 12:49:07 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 19 Apr 2002 13:49:07 +0100
Subject: network USA
References: <E16xukV-0004HN-00@mendel.bio.caltech.edu>
Message-ID: <3CC01243.9190F3FB@uk.lionbioscience.com>

David Mathog wrote:
> 
> Today I finally realized that the NCBi's PmFetch cgi
> 
>   http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html
> 
> can be used to retrieve data via gi using a "simple" URL like this:
> 
> wget -O dmwhite.genbank \
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'
> 
> Unfortunately it seems not to be able to retrieve by either accession
> number or
> locus name - I'm still waiting to hear if there is some other NCBI
> interface for that.

Oops. Will be fixed in 2.4.0 (Alan and I thought it already was, but it
needed one extra line of code in the latest CVS version). The problem is
that EMBOSS checks the ID and accession of the returned entry for the URL
access method, and of course neither matches '10873'.

Which leads on to a new access method for 2.4.0. We are adding an "srswww"
access method that generates the SRS URLs, and can query by id, accession,
seqversion (or GI), keyword, organism or description. We can add at least
some of these for entrez (new access method entrez) if we can gather up
enough URLs. Are there any entrez experts who can help with suggested URLs
to retrieve (preferably plain text, but html will do) from entrez with
queries for each of these fields?

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From sghk100 at sghms.ac.uk  Fri Apr 19 15:19:15 2002
From: sghk100 at sghms.ac.uk (David Winterbourne)
Date: Fri, 19 Apr 2002 16:19:15 +0100
Subject: network USA
Message-ID: <3CC03573.4A76C643@sghms.ac.uk>

David Martin wrote:

> ...
>
> Try 'method: url' and using %s instead of $ID. It has been there from
> EMBOSS 0.0.4 to
> allow retrieval from remote srs servers (or indeed any arbitrary web
> address where the id can be passed in the url).

I have been having a problem accessing the Swiss Prot database using this method. I set up URL based access
to SWISSPROT and EMBL databases at EBI as follows:

DB sw [ type: P method: url format: swiss
url: "http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[SWISSPROT:%s]"
DB embl [ type: N method: url format: embl
url: "http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[EMBL-id:%s]"

For an EMBL entry, using the URL in a browser and specifying it in EMBOSS accesses the data. However, the
equivalent for Swiss Prot works in a browser but not  in EMBOSS - it just causes the system to hang. Is
there a simple solution?

Regards
David
--
David Winterbourne
Department of Surgery
St. George's Hospital Medical School, London SW17 0RE, England
Tel: 020 8725 5581   Fax: 020 8725 3594


From jfreeman at variagenics.com  Mon Apr 22 21:21:51 2002
From: jfreeman at variagenics.com (James Freeman)
Date: Mon, 22 Apr 2002 17:21:51 -0400
Subject: Jemboss and Resin
Message-ID: <3CC47EEF.635B531E@variagenics.com>

To whom it may concern,

Does anyone know of any problems when using Resin
(http://www.caucho.com/) as a substitute for Tomcat when running
Jemboss?

Thanks for your assistance,

Jim Freeman
Senior Scientist
Variagenics, Inc.


From tchiang at bioinfo.sickkids.on.ca  Tue Apr 23 14:01:40 2002
From: tchiang at bioinfo.sickkids.on.ca (Ted Chiang)
Date: Tue, 23 Apr 2002 10:01:40 -0400 (EDT)
Subject: EMBOSS:complex
Message-ID: <Pine.GSO.4.05.10204230956430.15900-100000@kenny>


Just a quick question.  In the 2.3.1 release, the EMBOSS program 'complex'
is not fully implemented.  Will this program be in the next release or
have we missed something in the installation?

-Ted


=====================================
Ted Chiang
Bioinformatics Supercomputing Centre
Hospital for Sick Children, Toronto
ext. 7028
tchiang at bioinfo.sickkids.on.ca


From peter.rice at uk.lionbioscience.com  Tue Apr 23 14:30:23 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 23 Apr 2002 15:30:23 +0100
Subject: EMBOSS:complex
References: <Pine.GSO.4.05.10204230956430.15900-100000@kenny>
Message-ID: <3CC56FFF.3C3AFB6A@uk.lionbioscience.com>

Ted Chiang wrote:
> 
> Just a quick question.  In the 2.3.1 release, the EMBOSS program 'complex'
> is not fully implemented.  Will this program be in the next release or
> have we missed something in the installation?

complex is a strange application (with Italian command line options) that
the authors have not been maintaining.

We have moved it into the "make check" set of obsolete/testing
applications. If you do need it, the "make check" command will build it,
but you then need to copy the binary and other files to the install
directories by hand.

One unfortunate side effect of moving applications to "make check" (or
removing them) is that the old binaries will stay in the install directory.

Perhaps we can find a way to clean them up ... need to think about that a
little.

regards,

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From letondal at pasteur.fr  Tue Apr 23 15:06:50 2002
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 23 Apr 2002 17:06:50 +0200
Subject: Pise/EMBOSS 2.3.1
Message-ID: <200204231506.g3NF6oop249416@electre.pasteur.fr>


Hi,

I have more or less adapted new ACD types and attributes to Pise.
(ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/emboss_xml_files-2.3.1.tar.gz)

Main changes were for align types, where I could associate a "pipetype" to
chain to other programs taking alignment as input. BTW, I found "MSF" and "fasta" for the
-aformat parameter - are there others?

The main problem I had was with string parameters for specifying a path, with ./ default
value and having corresponding extn parameters.

On a Web interface you cannot really allow path and filename manipulation, and you 
must give the user a means to upload or input data (unless users log in to their own
home directory, which I'm aware is the choice made by other Web interfaces for EMBOSS).
That's why I had to discard the following programs:
alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign.

I have tried to "guess" that such a parameter is a path from its name, 
with the next parameter being the extension; if it is in the input or output section, 
I can then decide whether it is an InFile, Sequence, or Results Pise parameter. 
But some parameters are in neither the input nor the output section, and it is not safe 
to associate two parameters just because they follow each other.
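
The guess can be written down as a short Python sketch. It is a
simplified illustration, not the Pise conversion code: the function name
is made up, and requiring the two names to share a stem (algpath/algextn)
is an extra assumption that makes the pairing a little stricter.

```python
def pair_path_extn(names):
    """Pair each '...path' parameter with a '...extn' parameter directly after it.

    names is the list of parameter names in the order they appear in the
    ACD file. Only adjacent parameters with the same stem are paired.
    """
    pairs = []
    for current, following in zip(names, names[1:]):
        if current.endswith("path") and following.endswith("extn"):
            if current[:-4] == following[:-4]:
                pairs.append((current, following))
    return pairs


print(pair_path_extn(["algpath", "algextn", "other"]))
```

Even with the stem check, this still depends on naming conventions and
parameter order, which is why it does not feel safe.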

A solution could be to have an explicit type and the extension as an attribute:

path: algpath  [
  parameter: "Y"
  prompt: "Location and extension of alignment files for input"
  default: "./"
  extn: ".align"
]

instead of (siggen.acd):

section: input [ info: "input Section" type: page ]
string: algpath  [
  parameter: "Y"
  prompt: "Location of alignment files for input"
  default: "./"
]

string: algextn  [
  parameter: "Y"
  prompt: "Extension of alignment files for input"
  default: ".align"
]

What do you think?

Thanks a lot in advance,

-- 
Catherine Letondal -- Pasteur Institute Computing Center


From peter.rice at uk.lionbioscience.com  Tue Apr 23 16:03:27 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 23 Apr 2002 17:03:27 +0100
Subject: Pise/EMBOSS 2.3.1
References: <200204231506.g3NF6oop249416@electre.pasteur.fr>
Message-ID: <3CC585CF.D664FB68@uk.lionbioscience.com>

Hi Catherine,

> Main changes were for align types, where I could associate a "pipetype" to
> chain to other programs taking alignment as input. BTW, I found "MSF" and "fasta" for the
> -aformat parameter - are there others?

There are more, but not sequence formats. We should add them to "entrails"
output.

We can easily add more sequence formats. Can you suggest some?

The full list is (from ajax/ajalign.c) :

markx0*, markx1*, markx2*, markx3*, markx10* (from the FASTA package)
multiple
pair*
simple
score
srs, srspair* (for simple parsing in SRS in case the others change)
trace (for debugging only)

Those with '*' are for pairwise alignments only.

> The main problem I had was with string parameters for specifying a path, with ./ default
> value and having corresponding extn parameters.
>
> That's why I had to discard the following programs:
> alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign.
>
> A solution could be to have an explicit type and the extension as an attribute:
> 
> path: algpath  [
>   parameter: "Y"
>   prompt: "Location and extension of alignment files for input"
>   default: "./"
>   extn: ".align"
> ]
> 
> What do you think?

The path and extension options  are a terrible 'hack' to avoid having "*"
on the command line for those programs.

This is really just infile with a wild card filename (which works already).

We can make a new ACD type "inwild" which works like infile but with some
small differences. The prompt would be "Input file(s)". The ajAcdGetInwild
function will return an AjPFile. We can add functions to report the
filenames as a string list (the first file is already open, the others are
in a list so it is a little tricky to make the list in an application).

There should be an attribute "inextension:align" (for example) and a
default value of "*". If the user specifies "*.align" the inextension will
be ignored.

Associated qualifiers:

-inextension align
-indirectory /home/user/somewhere (defaults to current directory)
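
How the wildcard, -inextension and -indirectory might combine can be
sketched outside of AJAX. This is Python, not the proposed C code;
resolve_inwild is a hypothetical name, and the rule "append the
extension only when the pattern has none" is an assumption based on the
description above.

```python
import glob
import os


def resolve_inwild(pattern="*", inextension=None, indirectory="."):
    """Expand a wildcard file name into a sorted list of paths.

    Assumed rule: the extension is appended only when the pattern
    does not already carry one of its own.
    """
    _, ext = os.path.splitext(pattern)
    if inextension and not ext:
        pattern = f"{pattern}.{inextension}"
    return sorted(glob.glob(os.path.join(indirectory, pattern)))
```

With the defaults and inextension="align" this lists every *.align file
in the directory; a pattern such as "*.txt" overrides the extension.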

For consistency, we can add the same qualifiers for infile.

With "out" instead of "in" we can use the same qualifiers for outfile and a
new ACD type "outwild" (outwild can open a new output file, using a new
ajFileNextOut call, but the application needs to give the base name each
time).

All easy to implement.

One problem ... inwild does not work well as a parameter because it has to
be given as "*" on the command line. Same problem for "outwild". I am sure
users can be educated.

The programs that use the path/extension options do not define them as
parameters anyway. Their ACD files need some corrections.

Comments?

regards,

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From letondal at pasteur.fr  Tue Apr 23 17:38:43 2002
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 23 Apr 2002 19:38:43 +0200
Subject: Pise/EMBOSS 2.3.1 
In-Reply-To: Your message of "Tue, 23 Apr 2002 17:03:27 BST."
             <3CC585CF.D664FB68@uk.lionbioscience.com> 
Message-ID: <200204231738.g3NHchop186093@electre.pasteur.fr>


Peter Rice wrote:
> Hi Catherine,

Hi Peter,


> 
> > Main changes were for align types, where I could associate a "pipetype" to
> > chain to other programs taking alignment as input. BTW, I found "MSF" and "fasta" for the
> > -aformat parameter - are there others?
> 
> There are more, but not sequence formats. We should add them to "entrails"
> output.
> 
> We can easily add more sequence formats. Can you suggest some?

I just asked to know which one to put on the Web interface.
(There are also clustalw or Phylip, but it's not necessary in Pise, since there
are format converters).
 

> Those with '*' are for pairwise alignments only.
> 
> > The main problem I had was with string parameters for specifying a path, with ./ default
> > value and having corresponding extn parameters.
> >
> > That's why I had to discard the following programs:
> > alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign.
> >
> > A solution could be to have an explicit type and the extension as an attribute:
> > 
> > path: algpath  [
> >   parameter: "Y"
> >   prompt: "Location and extension of alignment files for input"
> >   default: "./"
> >   extn: ".align"
> > ]
> > 
> > What do you think?
> 
> The path and extension options  are a terrible 'hack' to avoid having "*"
> on the command line for those programs.

I have the same problem just with ./, since '/' cannot be allowed in a string
parameter on a Web server.

Another problem I have made a workaround for, is the '*' programs such as
extractseqfeat, where it is replaced in the Web form by 'all', then replaced
in the CGI by '*'. 
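
The 'all' workaround amounts to a one-line translation between the form
and the command line. This is a minimal Python sketch, not the actual
Pise code (which is not shown in this thread); form_to_cli, build_argv
and the program and qualifier names in the usage note are hypothetical.

```python
def form_to_cli(value: str) -> str:
    """Translate a Web form value into the string given to the program."""
    return "*" if value.strip().lower() == "all" else value


def build_argv(program: str, qualifier: str, form_value: str) -> list[str]:
    """Build an argument list; passing a list avoids shell expansion of '*'."""
    return [program, f"-{qualifier}", form_to_cli(form_value)]
```

For example, build_argv("someprog", "type", "all") yields
["someprog", "-type", "*"], which can be run without a shell so that
the "*" reaches the program unchanged.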

> 
> This is really just infile with a wild card filename (which works already).
> 
> We can make a new ACD type "inwild" which works like infile but with some
> small differences. The prompt would be "Input file(s)". The ajAcdGetInwild
> function will return an AjPFile. We can add functions to report the
> filenames as a string list (the first file is already open, the others are
> in a list so it is a little tricky to make the list in an application).
> 
> There should be an attribute "inextension:align" (for example) and a
> default value of "*". If the user specifies "*.align" the inextension will
> be ignored.
> 
> Associated qualifiers:
> 
> -inextension align
> -indirectory /home/user/somewhere (defaults to current directory)
> 
> For consistency, we can add the same qualifiers for infile.
> 
> > With "out" instead of "in" we can use the same qualifiers for outfile and a
> new ACD type "outwild" (outwild can open a new output file, using a new
> ajFileNextOut call, but the application needs to give the base name each
> time).
> 
> All easy to implement.
> 
> One problem ... inwild does not work well as a parameter because it has to
> be given as "*" on the command line. Same problem for "outwild". I am sure
> users can be educated.
> 
> The programs that use the path/extension options do not define them as
> parameters anyway. Their ACD files need some corrections.
> 
> Comments?

As long as there is a way to detect this kind of parameter (in order to replace
it with a simple textarea or file upload on a Web interface), I think it's
very useful! So the type would be inwild or outwild?

PS: Regarding Pise/EMBOSS I forgot to mention that not only output alignments
are "connected" by Pise menus. I have also added this feature for sequence, seqall,
seqout, etc...

Thanks for the quick answer!

--
Catherine Letondal -- Pasteur Institute Computing Center


From mathog at mendel.bio.caltech.edu  Tue Apr 23 18:25:34 2002
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Tue, 23 Apr 2002 11:25:34 -0700
Subject: Pise/EMBOSS 2.3.1
Message-ID: <E1704ys-00016w-00@mendel.bio.caltech.edu>

> One problem ... inwild does not work well as a parameter because it has to
> be given as "*" on the command line. Same problem for "outwild". I am sure
> users can be educated.

Sure they can.  That's why thousands of hours are being spent wrapping GUIs
around programs so that users don't have to (horrors) log on or (gasp) type
a command line.

Back to the subject at hand. (And this is stream of consciousness, so please
bear with me.) I think that maybe for purposes of interface design there
should be predefined methods to break out (all) the pieces/options of a USA.
(Perhaps even reduced to perl and C modules in the EMBOSS distribution so that
W2H/Pise/etc. don't need to be rewritten for each EMBOSS release.) Consider
something like this:

 program -sequence=genbank:\*

That never translates directly into a GUI very well because the end user has
to know what the full USA syntax is and especially that a "*" is a wild card.
And often enough, they don't understand these concepts.  And even if they do,
they may not be able to use certain aspects of that syntax on a given server
(for instance, files and paths, or particular databases).  So it falls to the
GUI to put some glue in between the USA and the user.  The two main web
interfaces for EMBOSS take opposite paths in this regard.  Pise hides the USA
completely and W2H allows the user to manipulate USAs through a tool.

In W2H you generally have to build the USAs ahead of time through a separate
window and store them in a list, then you select one or more USAs from the
list when you run the program.  (USAs can also generally be typed into the
slots within the program - if the user knows what he/she is doing.)

In Pise you can enter a database USA like "genbank:dmwhite" (but it isn't
called a USA) but entering "genbank:*" doesn't work (for instance, with
compseq).  Pise isn't really designed to handle wild cards because it's going
to try to extract that whole sequence from the database and save it in a file
and then run the program on that file.  This is consistent with its typical
"upload data for each program" design.  Pise only ever runs programs with the
"simple file" sort of USA.  So perhaps it's just as well that "genbank:*"
doesn't work at the moment!!!  To get around this wildcard limitation Pise
would have to be reworked enough to recognize wildcards (and USAs in general)
and slot them onto the command line without first extracting the sequences
they refer to.

Anyway, what's really going on with -sequence is that all of the
components of a USA are encoded into a single string for use on the
command line and then are broken out again into separate pieces later
within the program.  For a GUI _all_ these pieces need to be broken
out explicitly and displayed to the user (who isn't expected to know
anything about USAs or have to learn anything about them or the interface).
Something like this:

format: default
database:genbank
x ALL_ENTRIES  o BY_STRING
entrystring: (blank)

From that the GUI/cgi can easily enough format a USA for the final
command line.
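
Turning the form fields into a USA string could look like the sketch
below. It is Python, covers only the simple [format::]database:entry
form, and build_usa with its parameter names is hypothetical; list
files, file paths and other USA variants are not handled.

```python
def build_usa(database, entry="", all_entries=False, fmt=None):
    """Assemble a USA string from separately entered form fields.

    Only the simple "[format::]database:entry" shape is covered here.
    """
    if all_entries:
        entry = "*"  # the ALL_ENTRIES radio button stands for the wildcard
    if not database or not entry:
        raise ValueError("a database and an entry (or ALL_ENTRIES) are needed")
    usa = f"{database}:{entry}"
    if fmt and fmt != "default":
        usa = f"{fmt}::{usa}"  # an explicit format is prefixed with '::'
    return usa
```

For the form shown above, build_usa("genbank", all_entries=True) gives
"genbank:*".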

But imagine using such an interface.  It's great if you just run an
occasional program but not so wonderful when you're doing something complex.
How do you cut and paste the state of 4 (or more) USA variables from one page
(=program) to another?  That suggests to me that a GUI which always has fully
broken out USA options will probably end up being pretty awkward to use.
However, since the purpose of the GUI is essentially to reformat (implicit)
information in the USA, why not make that an explicit option - and let it
reformat in both directions?  Then the "standard" USA GUI interface starts to
look something like this:

[test usa] [from USA] [to USA] [use this] [abort]   <------(buttons)
USA:[  genbank:*   ]
format: default
database:genbank      <-------- (pull down list)
x ALL_ENTRIES  o BY_STRING
entrystring: (blank)
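
The "from USA" direction, splitting a typed USA back into the fields of
the form, might be sketched as follows. This is Python, and parse_usa
and the dictionary keys are hypothetical; only the simple
[format::]database:entry form is recognized, and anything else raises
an error.

```python
def parse_usa(usa):
    """Split a simple USA string into the fields shown in the form."""
    fmt = "default"
    rest = usa.strip()
    if "::" in rest:
        fmt, rest = rest.split("::", 1)  # leading "format::" is optional
    database, sep, entry = rest.partition(":")
    if not sep or not database or not entry:
        raise ValueError(f"not a simple database:entry USA: {usa!r}")
    return {
        "format": fmt,
        "database": database,
        "all_entries": entry == "*",
        "entrystring": "" if entry == "*" else entry,
    }
```

Given "genbank:*" it returns the state of the form above: default
format, database genbank, ALL_ENTRIES selected and a blank entry string.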

Actually it's a LOT more complicated than that, considering that it also
encompasses listfiles, multiple entries (foo.msf{one,two, three}) etc.  If the
user has a USA he/she can plug it into the GUI and fine.  Or they can plug it
in, translate it, and tweak it.  Or if they don't have a USA to start with
they can use this page to build one.  And this USA constructor page can
enable/disable the USA fields as appropriate for each site and/or program.
(No file access?  Can't accept list files or wild cards?  Then don't show
those USA options.  Make the database list from the output of showdb.)

The final problem is that exposing the guts of the USA will take up a lot of
screen space and complicate the program interfaces.  That's less of a problem
though if the GUI for any given EMBOSS program just provides a slot to plug in
a USA and some way to pop up the USA formatter window to fill in that slot
(through javascript or whatever).  The popped up formatter could then drop the
final USA back into the program's USA slot.  (Sort of like what W2H does, but
into the program's slot rather than the working list.)

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From jison at hgmp.mrc.ac.uk  Wed Apr 24 08:40:25 2002
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Wed, 24 Apr 2002 09:40:25 +0100
Subject: Pise/EMBOSS 2.3.1
References: <200204231506.g3NF6oop249416@electre.pasteur.fr> <3CC585CF.D664FB68@uk.lionbioscience.com>
Message-ID: <3CC66F79.D7AB51BA@hgmp.mrc.ac.uk>

> The programs that use the path/extension options do not define them as
> parameters anyway. Their ACD files need some corrections.
>
> Comments?

They are parameters in new versions of the protein structure apps
(alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign etc)
but I haven't committed them yet - within a month hopefully.

J.


From charles at moulinette.dyndns.org  Fri Apr 26 21:30:56 2002
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Fri, 26 Apr 2002 23:30:56 +0200
Subject: seqret doesn't count more than 99?
Message-ID: <20020426213056.GA26616@moulinette.dyndns.org>

Hello,

I downloaded the draft of the fugu genome (fasta format, 300Mb) and
renamed the headers using the following command line :

sed < fugu_02_04_28.fasta 's/>/>gnl|fugu|/' > fugu_newheaders_02_04_28.fasta
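
The same renaming can be sketched in Python, touching only lines that
begin with ">" so that a ">" elsewhere is left alone. The function name
add_prefix is hypothetical; the default prefix is the one used in the
sed command above.

```python
def add_prefix(lines, prefix="gnl|fugu|"):
    """Yield FASTA lines with the prefix inserted after the leading '>'."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]  # header line: insert the prefix
        else:
            yield line  # sequence line: pass through unchanged
```

Because it works on any iterable of lines, it can be fed an open file
object and its output written line by line, so the 300Mb file never has
to be held in memory.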

I'm not able to index a blast database correctly if the header doesn't
look "ncbi compliant" and formatdb hasn't been run with the -o flag.

I created the blast database and indexed it with dbiblast. The reason
for not formatting the fasta file itself is to save space. This also
enforces a synchronicity between the blast hits names and the names
that I can give to seqret.

Here is the problem:

charles at pc-1035-a:~$ seqret fugu:Scaffold_7
Reads and writes (returns) sequences
Output sequence [scaffold_7.fasta]:  ==> OK!

charles at pc-1035-a:~$ seqret fugu:Scaffold_99
Reads and writes (returns) sequences
Output sequence [scaffold_99.fasta]: ==> OK!

charles at pc-1035-a:~$ seqret fugu:Scaffold_100
Reads and writes (returns) sequences
Error: Unable to read sequence 'fugu:Scaffold_100'

==> KO :((

seqret can't fetch sequence names like Scaffold_xyz, where xyz >= 100.

Is it due to the length of the name?
I am puzzled by that problem... I can send you more info if you like.

Charles


From simon.andrews at bbsrc.ac.uk  Mon Apr 29 09:49:29 2002
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Mon, 29 Apr 2002 10:49:29 +0100
Subject: seqret doesn't count more than 99?
Message-ID: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk>


> -----Original Message-----
> From: Charles Plessy [mailto:charles at moulinette.dyndns.org]
> Sent: 26 April 2002 22:31
> To: emboss at hgmp.mrc.ac.uk
> Subject: seqret doesn't count more than 99?
> 
> 
> Hello,
> 
> I downloaded the draft of the fugu genome 

[snip]

> I'm not able to index a blast database correctly if the header doesn't
> look "ncbi compliant" and formatdb hasn't been run with the -o flag.

I'd not tried this before, but we see the same thing here.  Running dbiblast
on the indexed raw fugu data seems to work, but seqret fails on the
subsequent retrieval.

The problem seems to be in the accession numbers entered into the .trg file
created by dbiblast.  Running seqret with debug on, shows the following
(edited) entries:

------------------------------------
USA to test: 'fugu_blasttest:Scaffold_1'
[snip]

found dbname fugu_blasttest
wild query 'Scaffold_1' 'Scaffold_1' '' 
database type: 'N' format 'ncbi'
use access method 'blast'
Matched seqAccess[12] 'blast'
seqAccessBlast type 1
[snip]

seqCdIdxSearch (entry 'Scaffold_1')
[several more of these]
idx test 59 'Scaffold_100' -1 (+/- 39)
idx test 49 'Contig_83248'  1 (+/- 18)
idx test 54 'Contig_9376'  1 (+/- 8)
idx test 56 'Scaffold_10' -1 (+/- 3)
idx test 55 'Scaffold_1' -1 (+/- 0)
 
ajFileNewIn '/data/Fugu/EMBOSS/TEST/acnum.trg'
ajNamResolve of '/data/Fugu/EMBOSS/TEST/acnum.trg'
seqCdReadHeader file /data/Fugu/EMBOSS/TEST/acnum.trg
  FileSize: 416800 NRecords: 20825 recsize: 20 idsize: 10
seqCdFileOpen '/data/Fugu/EMBOSS/TEST/acnum.trg' NRecords: 20825 RecSize: 20
ajFileNewIn '/data/Fugu/EMBOSS/TEST/acnum.hit'
ajNamResolve of '/data/Fugu/EMBOSS/TEST/acnum.hit'
seqCdReadHeader file /data/Fugu/EMBOSS/TEST/acnum.hit
  FileSize: 83600 NRecords: 20825 recsize: 4 idsize: -6
seqCdFileOpen '/data/Fugu/EMBOSS/TEST/acnum.hit' NRecords: 20825 RecSize: 4
seqCdTrgSearch 'Scaffold_1' recSize: 20
trg test 10412 'ZZ0010413' -1 (+/- 20825)
trg test 5206 'ZZ0005207' -1 (+/- 10412)
trg test 2603 'ZZ0002604' -1 (+/- 5206)
trg test 1301 'ZZ0001302' -1 (+/- 2603)
trg test 650 'ZZ0000651' -1 (+/- 1301)
trg test 325 'ZZ0000326' -1 (+/- 650)
trg test 162 'ZZ0000163' -1 (+/- 325)
trg test 81 'ZZ0000082' -1 (+/- 162)
trg test 40 'ZZ0000041' -1 (+/- 81)
trg test 20 'ZZ0000021' -1 (+/- 40)
trg test 10 'ZZ0000011' -1 (+/- 20)
trg test 5 'ZZ0000006' -1 (+/- 10)
trg test 2 'ZZ0000003' -1 (+/- 5)
trg test 1 'ZZ0000002' -1 (+/- 2)
trg test 0 'ZZ0000001' -1 (+/- 1)
'SCAFFOLD_1' not found found in .trg

------------------------------------------------

After this it cleans up after itself and exits.  Looking through the .trg
file all the accessions are of the form ZZ0000XXX.  This format of accession
doesn't appear anywhere in my original data, so I don't know where it's
coming from (presumably either dbiblast or formatdb?).  The inability to
reconcile the Scaffold_1 with the ZZ00... accessions seems to be what causes
seqret to fail.
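
The numbers in the trace can be checked with a little arithmetic, and
the shape of the search reproduced with a plain binary search. This is
a Python sketch only, not the seqCdTrgSearch code: the 300-byte header
is an assumption inferred from the trace (416800 - 20825*20 = 300 and
83600 - 20825*4 = 300), and the names expected_size and search are
hypothetical.

```python
HEADER_BYTES = 300  # assumption: inferred from the sizes in the trace


def expected_size(nrecords, recsize, header=HEADER_BYTES):
    """File size implied by a header plus fixed-size records."""
    return header + nrecords * recsize


def search(ids, query):
    """Binary search over sorted ids; returns (index or None, probed ids)."""
    lo, hi = 0, len(ids) - 1
    probes = []
    while lo <= hi:
        mid = (lo + hi) // 2
        probes.append(ids[mid])
        if ids[mid] == query:
            return mid, probes
        if query < ids[mid]:
            hi = mid - 1  # query sorts earlier: continue in the lower half
        else:
            lo = mid + 1  # query sorts later: continue in the upper half
    return None, probes


# Invented accessions of the same form as those in the .trg file.
ids = [f"ZZ{n:07d}" for n in range(1, 20826)]
found, probes = search(ids, "SCAFFOLD_1")
```

Since "SCAFFOLD_1" sorts before every "ZZ..." string, each probe steps
downwards until the list is exhausted and nothing is found, which is
the pattern visible in the "trg test" lines.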


> I created the blast database and indexed it with dbiblast. The reason
> for not formatting the fasta file itself is to save space. This also
> enforces a synchronicity between the blast hits names and the names
> that I can give to seqret.

The way we did this was to use the fasta files for both.  I take the point
about the space saving, but the assembled data wasn't all that big.  If you
use the raw fasta files for both formatdb (without header parsing) and
dbifasta, then you can still use the same accession codes as reference in
both.


> Here is the problem:
> 
> charles at pc-1035-a:~$ seqret fugu:Scaffold_100
> Reads and writes (returns) sequences
> Error: Unable to read sequence 'fugu:Scaffold_100'
> 
> ==> KO :((
> 
> seqret can't fetch sequence names like Scaffold_xyz, where
> xyz >= 100.
> 
> Is it due to the length of the name?

It might be worth running seqret with the -debug flag on and looking at the
messages at the end of seqret.dbg.  This usually gives some more useful
information about what is going wrong in these cases.

I'd be interested in seeing a resolution to this as well...

	TTFN

	Simon.


From peter.rice at uk.lionbioscience.com  Mon Apr 29 10:41:30 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Mon, 29 Apr 2002 11:41:30 +0100
Subject: seqret doesn't count more than 99?
References: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk>
Message-ID: <3CCD235A.D418202E@uk.lionbioscience.com>

"simon andrews (BI)" wrote:
> The problem seems to be in the accession numbers entered into the .trg file
> created by dbiblast.  Running seqret with debug on, shows the following
> (edited) entries:

The command line:

   seqret fugu_blasttest:Scaffold_1

searches both the entryname and acnum indices.

The ZZ accession numbers are invented by dbiblast so there is something in
the acnum index (they should disappear in 2.4.0, where we handle empty
indices gracefully).

The problem will be in the entryname index, where it seems Scaffold_1 was
found, but not accepted. I am waiting for the example file from Charles,
but I suspect this is a problem already fixed in the code for 2.4.0.

regards,

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From charles at moulinette.dyndns.org  Mon Apr 29 13:22:07 2002
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Mon, 29 Apr 2002 15:22:07 +0200
Subject: seqret doesn't count more than 99?
In-Reply-To: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk>
References: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk>
Message-ID: <20020429132207.GD1818@moulinette.dyndns.org>

> I'd not tried this before, but we see the same thing here.  Running dbiblast
> on the indexed raw fugu data seems to work, but seqret fails on the
> subsequent retrieval.

I have to NCBIze the headers in order to make it work : I use either
lcl|entryname or gnl|dbname|entryname
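
Getting the entry name back out of such a header could be sketched as
below. This is Python, handles only the two forms mentioned here
(lcl|entryname and gnl|dbname|entryname), and entry_name is a
hypothetical helper, not something dbiblast or formatdb provides.

```python
def entry_name(defline):
    """Return the entry name from a FASTA header line.

    Only "lcl|name" and "gnl|db|name" identifiers are unpacked;
    any other first token is returned unchanged.
    """
    fields = defline.lstrip(">").split()
    if not fields:
        raise ValueError("empty header line")
    token = fields[0]  # the identifier is the first word of the header
    parts = token.split("|")
    if parts[0] == "lcl" and len(parts) >= 2:
        return parts[1]
    if parts[0] == "gnl" and len(parts) >= 3:
        return parts[2]
    return token
```

With the headers produced by the sed command earlier in the thread,
entry_name(">gnl|fugu|Scaffold_100") returns "Scaffold_100".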

> > I created the blast database and indexed it with dbiblast. The reason
> > for not formatting the fasta file itself is to save space. This also
> > enforces a synchronicity between the blast hits names and the names
> > that I can give to seqret.
> 
> The way we did this was to use the fasta files for both.  I take the point
> about the space saving, but the assembled data wasn't all that big.  If you
> use the raw fasta files for both formatdb (without header parsing) and
> dbifasta, then you can still use the same accession codes as reference in
> both.

You are right, I was also motivated to do something 'aesthetic' ;)

> It might be worth running seqret with the -debug flag on and looking at the
> messages at the end of seqret.dbg.  This usually gives some more useful
> information about what is going wrong in these cases.

I can send the debug info upon request, the files (one success, one
failure) are not that big (70k) but I think that netiquette doesn't
recommend sending them to all the list.

Charles