From econtact at defisc-immo.fr  Fri Dec  7 01:06:34 2001
From: econtact at defisc-immo.fr (DEFISCIMMO)
Date: Fri, 7 Dec 2001 07:06:34 +0100
Subject: INVEST YOUR TAXES
Message-ID: <NFBBIHPCCLODCPNCLFCDEEKDPFAD.econtact@defisc-immo.fr>

**********SAVE YOUR TAXES*********************

To find out more, click the following link:
www.defisc-immo.fr
http://www.defisc-immo.fr/cgi-bin/s.pl?id=453059457;p=index;end;/

---------------------------------------------------------------------------

                                                 INVEST EASILY

                                                        Rents received
         From                                + Tax savings
          200 F/month                       - Loan repayments
                                                   = MINIMAL SAVINGS


Or how, under the LOI BESSON (the Besson Act), to build up:
 - assets
 - retirement capital
 - additional income
thanks to a TENANT and TAX SAVINGS.

* Investment plans on request


             DEFISCIMMO

             info at defisc-immo.fr

We invite you to fill in the form at
 http://www.defisc-immo.fr/cgi-bin/s.pl?id=453059457;p=contact;end


If you no longer wish to receive messages, click the following link
http://www.defisc-immo.fr/contact/pages/mailing.htm
or reply to this email with 'annulation' (cancellation) in the subject line.


From gbottu at ben.vub.ac.be  Fri Dec  7 05:59:49 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 7 Dec 2001 11:59:49 +0100 (MET)
Subject: compiling EMNU on CompaqTru64
Message-ID: <200112071059.LAA16923@bigben.vub.ac.be>

from : BEN

	Dear colleagues,
	
I have a problem. I am trying to compile EMNU on our new computer. We have OS 
CompaqTru64 5.1 and compiler GNU gcc 3.0.1.
It does not work because the files menu.h, form.h, eti.h, libmenu.a and 
libform.a are missing. Does anyone have an idea where to obtain these?

	Guy Bottu


From gwilliam at hgmp.mrc.ac.uk  Fri Dec  7 06:09:48 2001
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 07 Dec 2001 11:09:48 +0000
Subject: compiling EMNU on CompaqTru64
References: <200112071059.LAA16923@bigben.vub.ac.be>
Message-ID: <3C10A37C.7C9CB92A@hgmp.mrc.ac.uk>


The libmenu.a, menu.h and libform.a, form.h files are part of the
standard curses (or ncurses) UNIX libraries.
Check that these are set up correctly.

ncurses is available from:
ftp://dickey.his.com/ncurses/
or
ftp://ftp.gnu.org/pub/gnu/ncurses

Read emnu's INSTALL file for the 'configure' arguments that point to the
required libraries.

Guy Bottu wrote:
> 
> from : BEN
> 
>         Dear colleagues,
> 
> I have a problem. I am trying to compile EMNU on our new computer. We have OS
> CompaqTru64 5.1 and compiler GNU gcc 3.0.1
> It does not work because the files menu.h, form.h, eti.h, libmenu.a and
> libform.a are lacking. Anyone an idea where to obtain these ?
> 
>         Guy Bottu

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From ableasby at hgmp.mrc.ac.uk  Fri Dec  7 06:10:26 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Fri, 7 Dec 2001 11:10:26 GMT
Subject: compiling EMNU on CompaqTru64
Message-ID: <200112071110.LAA24602@bromine.hgmp.mrc.ac.uk>

Hi Guy,

I believe you'll find them if you install GNU ncurses from
ftp.gnu.org

Cheers
Alan


From mad at biol.unlp.edu.ar  Fri Dec  7 09:44:33 2001
From: mad at biol.unlp.edu.ar (Sarachu Martin)
Date: Fri, 07 Dec 2001 11:44:33 -0300 (ART)
Subject: gcg and solaris 8
Message-ID: <1007736273.3c10d5d1aaead@www.biol.unlp.edu.ar>

Hi,

Sorry for the off-topic question, but maybe you can help me. Do you know if GCG 9 
runs on an UltraSparc/Solaris 8 system? I installed GCG 9 on an Intel/Solaris 8 
system and got a "cannot execute exe file" error on several files. Does GCG not 
run on a PC platform?

Thanks,

martin.


From ztu at msi.umn.edu  Fri Dec  7 09:54:15 2001
From: ztu at msi.umn.edu (Zheng Jin Tu)
Date: Fri, 7 Dec 2001 08:54:15 -0600 (CST)
Subject: gcg and solaris 8
In-Reply-To: <1007736273.3c10d5d1aaead@www.biol.unlp.edu.ar>
Message-ID: <Pine.LNX.4.31.0112070852130.27573-100000@virga.msi.umn.edu>

Hi Sarachu:

The best approach is to ask Accelrys.  The company will have a better idea of
which operating systems are supported.

Email: Help at GCG.Com

Thanks,

Tu

On Fri, 7 Dec 2001, Sarachu Martin wrote:

> Hi,
>
> sorry for the off-topic but maybe you can help me. Do you know if GCG 9 does
> run on a UltraSparc/Solaris 8 system? I installed GCG 9 on a Intel/Solaris 8
> system and got a "cannot execute exe file" error on several files. GCG doesn't
> run on a PC platform?
>
> Thanks,
>
> martin.
>


From mathog at mendel.bio.caltech.edu  Wed Dec 12 13:44:25 2001
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Wed, 12 Dec 2001 10:44:25 -0800
Subject: quick questions
Message-ID: <E16EEMj-0002mo-00@mendel.bio.caltech.edu>

1.  Is this list archived in a searchable form somewhere?
2.  what Ajax call or calls say if a command line switch was or wasn't
present?
For instance, at the moment when this

    foo = ajAcdGetInt("somekey");

returns foo = 0 I can't tell if the user entered "-somekey=0" or just
left it off the line.

3.  What entries have to go in the makefile to result in an EMBOSS
executable that gdb will debug?   This is on Solaris 8.   I tried using
-g alone, but gdb didn't like the resulting executable.  It would start
it, but "bt" (backtrace) only showed binary addresses.
GDB's exact message was:

This GDB was configured as
"sparc-sun-solaris2.8"..."/usr/local/src/EMBOSS/embassy/ESIM4-1.0.0/source/esim4":
not in executable format: File format not recognized

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From peter.rice at uk.lionbioscience.com  Thu Dec 13 05:09:35 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Thu, 13 Dec 2001 10:09:35 +0000
Subject: quick questions
References: <E16EEMj-0002mo-00@mendel.bio.caltech.edu>
Message-ID: <3C187E5F.9D94B7D3@uk.lionbioscience.com>

Hi David,

>2.  what Ajax call or calls say if a command line switch was or wasn't
>present?

None. Values can be set on the command line, or by dependence on other
values, or just default.

Why would you like to know what was on the command line? It could be tricky
for GUI interfaces if they deliberately put everything on the command line,
default values and all.

>3.  What entries have to go in the makefile to result in an EMBOSS
>executable that gdb will debug?

None. Just run:

    ./configure --enable-debug

before you make.

regards,

Peter Rice

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From kkmattil at csc.fi  Thu Dec 13 08:08:52 2001
From: kkmattil at csc.fi (Kimmo Mattila)
Date: Thu, 13 Dec 2001 15:08:52 +0200 (EET)
Subject: Problems with fuzzpro and ehmmer
Message-ID: <Pine.LNX.4.33.0112131334250.7017-100000@sampo.csc.fi>


Dear EMBOSS people.

I have had a few problems with fuzzpro, patmatdb and ehmmer. If any of
you have suggestions on how to solve them, please let me know.


FUZZPRO and PATMATDB

I am using fuzzpro and patmatdb with GCG formatted databases. If I run a
search against a whole database (e.g. swiss:*), the programs do find the
right hit sequences, but pick wrong names for the found entries. With
plain sequence files or with sequence name lists, this error does not
occur. I have checked both the EMBOSS indexing and the GCG database files
and they should be OK. Other EMBOSS and GCG applications give correct
results when the same database files are used.

Has someone else had similar troubles? If the indexing of the
databases is in order, what might cause this?


EHMMER

We have successfully installed EMBOSS-HMMER; however, unlike the native
HMMER, the EMBOSS version is not able to use multiple processors (even
though the -cpu option is mentioned in the help data). When I compared the
Makefile of EMBOSS-HMMER to the native one, I noticed that the EMBOSS
version lacks the settings for compiling the multiprocessor version of HMMER.
Has someone managed to circumvent this with some simple trick like copying
some parts of the original HMMER Makefile to the Makefile of the
EMBOSS version?

Secondly, when I use ehmmsearch, long output files are not complete.
After about 200 lines ehmmsearch starts writing the output to the
screen instead of the output file. The last line in the output file seems
to be

 Domain top hits:

And after this the alignments are printed to the screen.
What might cause this? Is there, for example, some limit on the output file
size?

Regards,

Kimmo Mattila

---------------------------------------------------------------
   Kimmo Mattila		Science Support
   kimmo.mattila at csc.fi		Center for Scientific Computing
   tel. +358 (0)9 457 2708	Tekniikantie 15a D, PL 405
   fax. +358 (0)9 457 2302	FIN-02101 Espoo, Finland
---------------------------------------------------------------


From mathog at mendel.bio.caltech.edu  Thu Dec 13 11:04:30 2001
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Thu, 13 Dec 2001 08:04:30 -0800
Subject: quick questions
Message-ID: <E16EYLW-0004za-00@mendel.bio.caltech.edu>

> >2.  what Ajax call or calls say if a command line switch was or
> >wasn't present?
> 
> None. Values can be set on the command line, or by dependence on other
> values, or just default.
> 
> Why would you like to know what was on the command line? It could be
> tricky for GUI interfaces if they deliberately put everything on the
> command line, default values and all.

Consider an optional integer parameter "foobar" for which 0 is a valid
value and also where if foobar is not specified, it is calculated based
on the input sequences.  That is, it does not have a fixed default
value.  I see no way to distinguish between "calculate value" and "use
this value" when ajAcdGetInt returns 0.
The workaround would be to set the default in the .acd file to a magic
default value, say -1000000, which is out of range for the desired
variable, and interpret that value as "not specified".  There are three
problems with this approach:

1.  There may be cases for which there are no magic values available.
2.  In w2h the default value shows up filled in on the Web interface.
So the user sees -1000000 and wonders what the heck that means, or
thinks that -900000 might also be valid.
3.  It requires that range checking be disabled or special-cased.

I guess I'll have a look at the code for ajAcdGetInt and see if it's
possible to modify that into ajAcdItemExists, returning a boolean T/F
for when the item has been specified.  Then the code would be (more or
less like on GCG)

/* ajAcdItemExists is the proposed new call; it does not exist yet */
if(ajAcdItemExists("foobar")){
   ifoobar=ajAcdGetInt("foobar");
}
else {
   ifoobar=calculated_value();
}
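
The "was it specified?" test above can be illustrated outside EMBOSS.
What follows is only a minimal sketch in Python, not AJAX or ACD code:
the option name --foobar and the helper calculated_value() are
hypothetical, and argparse is used purely as an analogy.  Using None as
the default keeps 0 available as a real user value, so no magic number
is needed.

```python
import argparse


def calculated_value(sequence):
    # Hypothetical stand-in for the program's own calculation;
    # here it simply uses the sequence length.
    return len(sequence)


def resolve_foobar(argv, sequence):
    """Return (value, was_specified) for the optional integer --foobar."""
    parser = argparse.ArgumentParser()
    # default=None marks "not given"; 0 stays a legal user value.
    parser.add_argument("--foobar", type=int, default=None)
    args = parser.parse_args(argv)
    if args.foobar is None:
        return calculated_value(sequence), False
    return args.foobar, True


if __name__ == "__main__":
    print(resolve_foobar(["--foobar=0"], "ACGT"))  # user asked for 0
    print(resolve_foobar([], "ACGT"))              # nothing given, calculated
```

This only detects whether the option was present on the command line; an
interface that writes every option explicitly would always look like
"specified".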

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From peter.rice at uk.lionbioscience.com  Thu Dec 13 11:21:21 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Thu, 13 Dec 2001 16:21:21 +0000
Subject: quick questions
References: <E16EYLW-0004za-00@mendel.bio.caltech.edu>
Message-ID: <3C18D581.7A22FDB8@uk.lionbioscience.com>

David Mathog wrote:
> 
> Consider an optional integer parameter "foobar" for which 0 is a valid
> value and also where if foobar is not specified, it is calculated based
> on the input sequences.
>
> I guess I'll have a look at the code for AjAcdGetInt and see if it's
> possible to modify that into AjAcdItemExists, returning a boolean
> T/F for when the item has been specified.  Then the code would be
> (more or less like on GCG)
> 
> if(AjAcdItemExists("foobar")){
>    ifoobar=AjAcdGetInt("foobar");
> }
> else {
>    ifoobar=calculated_value();
> }

Calculated values are intended to be calculated in the ACD file. Interfaces
such as W2H should be able to do this in JavaScript, though in some cases
they have to simply treat values as integers.

Try this ACD file. Save it as 'foobar.acd' and run as 'acdc foobar'.
It will prompt for a sequence, then prompt for foobar with the sequence
length as default but will accept any value from 0 to the sequence length.
The 'echo' string is defined so you can see the value of foobar in the
prompt.

The default value can be calculated in more exotic ways too ... see the @()
functions and the other calculated attributes. More can be easily added.

====================

appl: foobar [
  documentation: "ACD example"
  groups: "test"
]

sequence: sequence  [
  required: "Y"
]

integer: foobar  [
  required: "Y"
  default: "$(sequence.len)"
  minimum: "0"
  maximum: "$(sequence.len)"
]

string: echo  [
  prompt: "Foobar is $(foobar)"
  required: "Y"
]

===================

There are many other ways to set options. You could set a boolean to
calculate a value, and another value to define the calculation.

Testing the command line will have real problems for your original idea,
because an interface might be writing every option, with what it considers
the default value, on the command line.


-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From mathog at mendel.bio.caltech.edu  Thu Dec 13 12:33:48 2001
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Thu, 13 Dec 2001 09:33:48 -0800
Subject: quick questions
Message-ID: <E16EZjw-00054c-00@mendel.bio.caltech.edu>

> Calculated values are intended to be calculated in the ACD file.
> Interfaces such as W2H should be able to do this in JavaScript, though
> in some cases they have to simply treat values as integers.

The default value in this case is the end result of at least a hundred
lines of C code.

> Testing the command line will have real problems for your original
> idea, because an interface might be writing every option, with what it
> considers the default value, on the command line.

That's a good point.  W2H isn't like that, but some other interface
might be.  I guess it won't hurt to add a couple of extra booleans to
cover those variables whose default is difficult to calculate prior to
the program running.

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From mathog at mendel.bio.caltech.edu  Thu Dec 13 14:01:43 2001
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Thu, 13 Dec 2001 11:01:43 -0800
Subject: quick questions
Message-ID: <E16Eb70-0005go-00@mendel.bio.caltech.edu>

Hmm, after going up and down through the ACD notation I can't find
what I'm looking for there either.  Consider this notation:

bool: usermspA [ 
  opt: Y
  def: N
  info: "False: esim4 calculates mspA, True: mspA from command line."
]

int: mspA [ 
  opt: $(usermspA)
  req: $(usermspA)
  def: 16
  info: "long description. Default of 16 is not used unless usermspA is specified."
]

If the command is issued with -usermspA then it will prompt for -mspA if
it wasn't also specified, which gives the desired results. However, if
the command has only this on the command line:

  -mspA=16

it clearly means that the user really wants to use the value of 16 for
the parameter.  How then to switch the state on -usermspA automatically,
or failing that, prompt for -usermspA?  16 happens to be the default
value.  It wasn't set to an illegal (magic) value because we don't want
-1000000 showing up in a GUI.  But it isn't normally used because
-usermspA will be false.  As before, we could use a sort of magic number
and do:

bool: usermspA [ 
  opt: Y
  def: @($(mspa)!=16)
  info: "False: esim4 calculates mspA, True: mspA from command line."
]

and it will correctly flip the bit when the user specifies it - except
when by bad luck they choose to specify the default value.  And round
and round the logic goes.  I don't suppose that there is a ".specified"
or ".online" attribute in ACD?  I.e., this would do the job:

bool: usermspA [ 
  opt: Y
  def: $(mspa.online)
  info: "False: esim4 calculates mspA, True: mspA from command line."
]

The desired GUI interaction in that case could be one of:

1.  changing value in mspA toggles state of usermspA (messy)
2.  -mspA slot is grayed out unless -usermspA is set (simpler)

In some interfaces this could be covered over with JavaScript - but the
command line variant still wouldn't work exactly right.

Or am I missing something?


Summary:

works:  command 
works:  command -usermspA -mspA 16
works (prompts for mspA):  command -usermspA
fails to prompt or override usermspA:    command -mspA 16

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From peter.rice at uk.lionbioscience.com  Fri Dec 14 04:41:17 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 14 Dec 2001 09:41:17 +0000
Subject: quick questions
References: <E16Eb70-0005go-00@mendel.bio.caltech.edu>
Message-ID: <3C19C93D.A3FD713F@uk.lionbioscience.com>

David Mathog wrote:
> 
> Hmm, after going up and down through the ACD notation I can't find
> what I'm looking for there either. 
> 
> The desired GUI interaction in that case could be one of:
> 
> 1.  changing value in mspA toggles state of usermspA (messy)

This means 'mspA depends on usermspA' and 'usermspA depends on mspA'.
ACD expressly forbids this. All dependencies must be to something defined
earlier in the file.

> 2.  -mspA slot is grayed out unless -usermspA is set (simpler)

Could be done with an extra ACD attribute, with a value of "$(usermspA)",
but you would expect most GUIs to ignore this.

In general, you can expect to have options in EMBOSS that are not used by
the program but can still be set on the command line. Your -mspA is just
another case.

Having said that, adding an ACD function (you would only need one) to test
whether a value was set by the user is fairly trivial (setting via the
command line or by replying to a prompt if there is one).

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From charles at moulinette.dyndns.org  Tue Dec 18 03:56:25 2001
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Tue, 18 Dec 2001 09:56:25 +0100
Subject: phylogenic analysis with emboss
Message-ID: <20011218085625.GB803@gizmotronics.dyndns.org>

Hi,

I was wondering which tools you use for phylogenetic analysis, since I can't
find any tree-drawing program in either EMBOSS or EMBASSY's PHYLIP.

Charles


From gbottu at ben.vub.ac.be  Tue Dec 18 04:34:43 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Tue, 18 Dec 2001 10:34:43 +0100 (MET)
Subject: phylogenic analysis with emboss
Message-ID: <200112180934.KAA04579@bigben.vub.ac.be>

from : BEN

>I was wondering which tools you used for phylogenic analysis, since I can't
>find any treedrawing in either emboss or embassy's phylip.

If I am not mistaken, the EMBOSS on-line help states explicitly that the tree 
drawing programs and the tree editors of PHYLIP were not included in the EMBASSY 
PHYLIP. So, you should retrieve the original PHYLIP package from 
evolution.genetics.washington.edu and use the programs drawgram and drawtree.
Note that while EMBASSY has integrated PHYLIP version 3.53c, there is now a 
version 3.6a2, which is definitely better. drawgram/drawtree now has an 
X display for previewing the graphic, and the generated PostScript files can 
not only be sent directly to a printer, but can also be incorporated into 
documents such as MS-Word doc files.

Another useful freeware tool I know about is NJplot, which is distributed 
together with CLUSTAL (which you must install anyway for emma to work).

	Guy Bottu

	
From letondal at pasteur.fr  Wed Dec 19 09:03:18 2001
From: letondal at pasteur.fr (Catherine Letondal)
Date: Wed, 19 Dec 2001 15:03:18 +0100
Subject: how to cite EMBOSS?
Message-ID: <200112191403.fBJE3IW452649@electre.pasteur.fr>


Hi,

Sorry if this is an FAQ, but I was not able to find any reference
in EMBOSS documentation and Web site (apart from the original
algorithms of course). Is there any reference for the EMBOSS project?

Thanks a lot,

-- 
Catherine Letondal -- Pasteur Institute Computing Center


From gwilliam at hgmp.mrc.ac.uk  Wed Dec 19 09:05:49 2001
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Wed, 19 Dec 2001 14:05:49 +0000
Subject: how to cite EMBOSS?
References: <200112191403.fBJE3IW452649@electre.pasteur.fr>
Message-ID: <3C209EBD.763B54F7@hgmp.mrc.ac.uk>


See the FAQ file:

Q) Is there a reference I can cite for EMBOSS?

A) Rice,P. Longden,I. and Bleasby,A.
"EMBOSS: The European Molecular Biology Open Software Suite"
Trends in Genetics June 2000, vol 16, No 6. pp.276-277

You are right - it should be in a more obvious place.

Gary

Catherine Letondal wrote:
> 
> Hi,
> 
> Sorry if this is an FAQ, but I was not able to find any reference
> in EMBOSS documentation and Web site (apart from the original
> algorithms of course). Is there any reference for the EMBOSS project?
> 
> Thanks a lot,
> 
> --
> Catherine Letondal -- Pasteur Institute Computing Center

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From letondal at pasteur.fr  Wed Dec 19 09:10:44 2001
From: letondal at pasteur.fr (Catherine Letondal)
Date: Wed, 19 Dec 2001 15:10:44 +0100
Subject: how to cite EMBOSS? 
In-Reply-To: Your message of "Wed, 19 Dec 2001 14:05:49 GMT."
             <3C209EBD.763B54F7@hgmp.mrc.ac.uk> 
Message-ID: <200112191410.fBJEAiW438064@electre.pasteur.fr>


"Gary Williams, Tel 01223 494522" wrote:
> 
> See the FAQ file:
> 
> Q) Is there a reference I can cite for EMBOSS?
> 
> A) Rice,P. Longden,I. and Bleasby,A.
> "EMBOSS: The European Molecular Biology Open Software Suite"
> Trends in Genetics June 2000, vol 16, No 6. pp.276-277
> 
> You are right - it should be in a more obvious place.

Thanks - yes, maybe in the http://www.uk.embnet.org/Software/EMBOSS/general.html page?

> 
> Gary
> 
> Catherine Letondal wrote:
> > 
> > Hi,
> > 
> > Sorry if this is an FAQ, but I was not able to find any reference
> > in EMBOSS documentation and Web site (apart from the original
> > algorithms of course). Is there any reference for the EMBOSS project?
> > 
> > Thanks a lot,
> > 
> > --
> > Catherine Letondal -- Pasteur Institute Computing Center
> 
> -- 
> Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
> mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
> Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK

--
Catherine Letondal -- Pasteur Institute Computing Center


From gwilliam at hgmp.mrc.ac.uk  Wed Dec 19 09:21:09 2001
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Wed, 19 Dec 2001 14:21:09 +0000
Subject: how to cite EMBOSS?
References: <200112191410.fBJEAiW438064@electre.pasteur.fr>
Message-ID: <3C20A255.80D77C57@hgmp.mrc.ac.uk>

Catherine Letondal wrote:
> 
> "Gary Williams, Tel 01223 494522" wrote:
> >
> > See the FAQ file:
> >
> > Q) Is there a reference I can cite for EMBOSS?
> >
> > A) Rice,P. Longden,I. and Bleasby,A.
> > "EMBOSS: The European Molecular Biology Open Software Suite"
> > Trends in Genetics June 2000, vol 16, No 6. pp.276-277
> >
> > You are right - it should be in a more obvious place.
> 
> Thanks - yes, maybe in the http://www.uk.embnet.org/Software/EMBOSS/general.html page?

Done.
Gary

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From simon.andrews at bbsrc.ac.uk  Wed Dec 19 09:55:43 2001
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Wed, 19 Dec 2001 14:55:43 -0000
Subject: Farm files for databases
Message-ID: <2DC41140A89ED411989D00508BDCD9EDEA51EC@bi-exsrv1.iapc.bbsrc.ac.uk>

I'm trying to find the best way to do the following:

I have an application which returns an identifier (effectively an accession
number), which could be present in any one of 4 separate EMBOSS databases.
I'd like to be able to search all of these databases and retrieve the
sequence from whichever one finds it (I know that the identifiers are unique
between the different databases - so I'll only ever find one entry).

Having read the EMBOSS documentation the only reference I could find for
doing this sort of thing was to make a database entry with an "EXTERNAL"
format, and then have seqret query a script to return the sequence.  However
the details for this are pretty sketchy.

What exactly would a script of this type have to do?  What input is it
supplied with (and how), and what must it return?  Is this the only (or
best) way to do what I'm trying to do?

Any help is much appreciated.

	TTFN

	Simon.

----
Simon Andrews PhD
Bioinformatics Dept
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0)1223 496463 


From peter.rice at uk.lionbioscience.com  Wed Dec 19 10:16:46 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 19 Dec 2001 15:16:46 +0000
Subject: Farm files for databases
References: <2DC41140A89ED411989D00508BDCD9EDEA51EC@bi-exsrv1.iapc.bbsrc.ac.uk>
Message-ID: <3C20AF5E.F905F683@uk.lionbioscience.com>

"simon andrews (BI)" wrote:
> 
> I have an application which returns an identifier (effectively an
> accession number), which could be present in any one of 4 separate
> EMBOSS databases.
> I'd like to be able to search all of these databases and retrieve the
> sequence from whichever one finds it (I know that the identifiers are
> unique between the different databases - so I'll only ever find one
> entry).

This sounds like a job for SRS, although the query could be complicated if
there is a possibility of getting more than one copy returned.

A script is a good solution. The script should read the dbname:id query
from the commandline, and return the sequence in some specified format.
What the script does is up to you.

If there is no sequence found, it can simply return nothing.

The original 'external' applications were the 'efetch' utility in acedb,
and GCG's 'typedata'.
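
A minimal sketch of such a script in Python, under stated assumptions:
the query arrives as a single 'dbname:id' argument, the candidate
databases are plain FASTA files listed in a hypothetical FARM_FILES
list, identifiers are matched case-insensitively, and the entry is
written back in FASTA format.  None of these names come from EMBOSS, and
the matching emboss.default entry is not shown.

```python
import sys

# Hypothetical FASTA files to search, in order; replace with real paths.
FARM_FILES = ["/data/db1.fasta", "/data/db2.fasta"]


def parse_query(query):
    """Split 'dbname:id' into (dbname, id); a bare id gives ('', id)."""
    dbname, sep, entry_id = query.partition(":")
    if not sep:
        return "", dbname
    return dbname, entry_id


def find_entry(lines, entry_id):
    """Return the FASTA record whose first header word equals entry_id."""
    wanted = entry_id.lower()  # assumption: ids compare case-insensitively
    record = []
    keep = False
    for line in lines:
        if line.startswith(">"):
            if keep:
                break  # reached the next record; the match is complete
            words = line[1:].split()
            keep = bool(words) and words[0].lower() == wanted
        if keep:
            record.append(line.rstrip("\n"))
    return "\n".join(record)


def main(query):
    """Print the first matching record; print nothing if none is found."""
    _, entry_id = parse_query(query)
    for path in FARM_FILES:
        try:
            with open(path) as handle:
                record = find_entry(handle, entry_id)
        except OSError:
            continue  # skip files that are missing or unreadable
        if record:
            print(record)
            break


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Searching the files in a fixed order and stopping at the first hit relies
on the identifiers being unique across the databases, as stated in the
original question.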

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From jason at cgt.mc.duke.edu  Wed Dec 19 16:01:23 2001
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Wed, 19 Dec 2001 16:01:23 -0500 (EST)
Subject: alignment sequence reading with stop codons (bug?)
Message-ID: <Pine.LNX.4.33.0112191550310.32707-100000@tenero.genetics.duke.edu>

I noticed this in playing with our new bioperl wrappers for EMBOSS.
Apparently -seqall does not read sequences with stop codons.
I can submit as a bug if that is more appropriate.  Getting warmed up to
the EMBOSS dev process.

This occurs with both
EMBOSS-1.9.1
and
CVS code I checked out today (2.0.1 I guess).
The workaround is of course to specify the arguments in the correct way
or to replace the stop codon with something like X.  I know which sequence
will have potential stop codons so I can work around this in my own code.

[jason at gordola crypto_intergenic]$ cat jason.seq
>SW-CC27_YEAST SW:CC27_YEAST P38042 saccharomyces cerevisiae (baker's yeast). cell division control protein 27. 10/2001; PIR:S45825 cell division control protein CDC27 - yeast (Saccharomyces cerevisia
MAVNPELAPFTLSRGIPSFDDQALSTIIQLQDCIQQAIQQLNYSTAEFLAELLYAECSIL
DKSSVYWSDAVYLYALSLFLNKSYHTAFQISKEFKEYHLGIAYIFGRCALQLSQGVNEAI
LTLLSIINVFSSNSSNTRINMVLNSNLVHIPDLATLNCLLGNLYMKLDHSKEGAFYHSEA
LAINPYLWESYEAICKMRATVDLKRVFFDIAGKKSNSHNNNAASSFPSTSLSHFEPRSQP
SLYSKTNKNGNNNINNNVNTLFQSSNSPPSTSASSFSSIQHFSRSQQQQANTSIRTCQNK
NTQTPKNPAINSKTSSALPNNISMNLVSPSSKQPTISSLAKVYNRNKLLTTPPSKLLNND
RNHQNNNNNNNNNNNNNNNNNNNNNNNNIINKTTFKTPRNLYSSTGRLTTSKKNPRSLII
SNSILTSDYQITLPEIMYNFALILRSSSQYNSFKAIRLFESQIPSHIKDTMPWCLVQLGK
LHFEIINYDMSLKYFNRLKDLQPARVKDMEIFSTLLWHLHDKVKSSNLANGLMDTMPNKP
ETWCCIGNLLSLQKDHDAAIKAFEKATQLDPNFAYAYTLQGHEHSSNDSSDSAKTCYRKA
LACDPQHYNAYYGLGTSAMKLGQYEEALLYFEKARSINPVNVVLICCCGGSLEKLGYKEK
ALQYYELACHLQPTSSLSKYKMGQLLYSMTRYNVALQTFEELVKLVPDDATAHYLLGQTY
RIVGRKKDAIKELTVAMNLDPKGNQVIIDELQKCHMQE

[jason at gordola crypto_intergenic]$ cat prot.seq
>Contig5745
CLIF*RLLLIQMIHPQARRAFTFLQQQEPYRIQSMEQLSTLLWHLADLPALSHLSQSLIS
ISRSSPQAWIAVGNCFSLQKDHDEAMRCFRRATQVDEGCAYAWTLCGYEAVEMEEYERAM
AFYRTAIRTDARHYNAWYVLFFFFFFFFVPGDIDS*PKKGMEWG*FISKRIDRGMRSIIL
KEPSKSIQLIPFFYVALVW*VGVSSYPLETMTNIDFPKKKKALEKSNDVVQALHFYERAS
KYAPTSAMVQFKRIRALVALQRYDEAISALVPLTHSAPDEANVFFLLGKCLLKKERRQEA
TMAFTNARELEPK

[jason at gordola crypto_intergenic]$ water jason.seq prot.seq
Smith-Waterman local alignment.
   An error has been found: Sequence Contig5745 must be protein sequence,
 found bad character '*'
   An error has been found: option -seqall: Unable to read sequence
'prot.seq'
   There is a serious problem: water terminated: Bad value for option and
no prompt

[jason at gordola crypto_intergenic]$ water prot.seq jason.seq
Smith-Waterman local alignment.
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output file [contig5745.water]:

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


From bauer at genprofile.com  Thu Dec 20 02:02:56 2001
From: bauer at genprofile.com (David Bauer)
Date: Thu, 20 Dec 2001 08:02:56 +0100
Subject: alignment sequence reading with stop codons (bug?)
References: <Pine.LNX.4.33.0112191550310.32707-100000@tenero.genetics.duke.edu>
Message-ID: <3C218D20.D752C59@genprofile.com>

Hi,

the protein alignment programs don't like the '*' in your protein
sequences. They are designed to align true proteins, which usually do not
contain stop codons.
If these are putative ORFs, a solution would be to split them up at the
stops, creating a separate protein sequence for each ORF.
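
Splitting at the stops could be sketched as follows.  This is only a
rough Python illustration, not an EMBOSS tool: the function names and
the "_1", "_2" naming scheme are assumptions, and the code merely cuts
at '*' without checking for a start codon.

```python
def split_at_stops(name, sequence, min_length=1):
    """Split a translated sequence at '*' into (name, fragment) pairs."""
    records = []
    for index, fragment in enumerate(sequence.split("*"), start=1):
        # Drop empty or too-short pieces (e.g. between adjacent stops).
        if len(fragment) >= min_length:
            records.append((f"{name}_{index}", fragment))
    return records


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text with wrapped lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_fasta(split_at_stops("Contig5745", "CLIF*RLLLIQ*PKKG")))
```

Each fragment can then be written to its own file and aligned
separately; raising min_length filters out fragments too short to be
worth aligning.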

I also guess you are misinterpreting -seqall. It means: return
all sequences from a file containing more than one sequence (like a
FASTA-formatted file with several sequences separated by their
description lines). For me the -seqall option does not make much sense
in the case of alignment programs, which need exactly 2 sequences to
align.
There you must always pass the two sequence files which you want to align
as arguments to the alignment program, and each file must contain exactly
one sequence.

I hope this helps,

David Bauer.


Jason Stajich wrote:
> 
> I noticed this in playing with our new bioperl wrappers for EMBOSS.
> Apparently -seqall does not read sequences with stop codons.
> I can submit as a bug if that is more appropriate.  Getting warmed up to
> the EMBOSS dev process.


From simon.andrews at bbsrc.ac.uk  Thu Dec 20 04:21:43 2001
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 20 Dec 2001 09:21:43 -0000
Subject: Bug in entret.
Message-ID: <2DC41140A89ED411989D00508BDCD9EDEA51F1@bi-exsrv1.iapc.bbsrc.ac.uk>

Following on from my query yesterday, I have hit a problem trying to
implement a multiple search because of what appears to be a bug in entret.

I am using a series of fasta flat files, indexed with dbifasta.  What I am
finding is that although I can retrieve entries from the database with
seqret, using entret always returns an empty file with the same accession
number:

############

%> entret htg_mus:AC092094_v6_c8
Reads and writes (returns) flatfile entries
Output file [ac092094_v6_c8.entret]: 
%> more ac092094_v6_c8.entret
%> seqret htg_mus:AC092094_v6_c8       
Reads and writes (returns) sequences
Output sequence [ac092094_v6_c8.fasta]: 
%> more ac092094_v6_c8.fasta
>AC092094_v6_c8 Mus musculus clone RP23-261m19, WORKING DRAFT SEQUENCE, 8 unordered pieces.
CAGGACAGCCAGGGCTACACAGAGAAACCCTGTCTCAAAAAACAAAAAAACAAAAAAAAA
ACAAAAGAAGAAGAAAATGTCTGTGAATACCCTGGAAAAGTTACTCAGTGAAAGTAGATG
AGTCCCTGAGTCAGTGACAGGAAGTGAGTGCAGTCTGAGCACTGGCTTGTGACCAATGAC
AAAAACATAAGCTAGACTTGCTCTGCAAAGTGGAGGACAGAACAGACAAAGCCCCAGAGT

etc. etc.
############

entret doesn't produce any errors, but if I run it with the -debug option I
see the following lines in entret.dbg


############
Initializing seqInFormat, 40 formats
ajSeqRead: input file '/data/MOUSE/HTG/htg_mus.fasta' still there, try again
seqRead: single access - count 1 - call access routine again
seqAccessEmblcd type 1
query data all finished
seqRead: seqin->Query->Access->Access(seqin) *failed*
ajSeqallNext failed
closing file 'ac092094_v6_c4.entret'
############

I've checked, and the /data/MOUSE/HTG/htg_mus.fasta file is definitely
there, and is readable, so I suspect that something in the EMBOSS internals
is going wrong.

This is using EMBOSS 2.0.0.  Is this a known bug?  Is there a fix on the
way?  I can bluff the script using seqret in this case, but I'd like to make
a more general solution eventually.

	Cheers
	
	Simon.


From simon.andrews at bbsrc.ac.uk  Thu Dec 20 07:24:18 2001
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 20 Dec 2001 12:24:18 -0000
Subject: Farm files for databases
Message-ID: <2DC41140A89ED411989D00508BDCD9EDEA51F3@bi-exsrv1.iapc.bbsrc.ac.uk>

After getting some useful info from Peter Rice about how to create a
database farm in EMBOSS I thought I'd share the script I'm now using to do
this.

To use this simply copy and paste the text of the script at the bottom of
this message to a file on your system, then make sure that this file is
readable and executable by everyone (chmod 755 filename).  The comments in
the script tell you what changes you need to make to the script itself, and
the format of the entry you need to create in emboss.default.

Because of the bug I previously reported in entret, this script will not
work from an entret query to the farm.  It will work with seqret (and will
output any format you like), and can also be used as part of a USA for any
of the standard EMBOSS programs.

The script requires a Unix-like OS, but could trivially be adapted to run
under Win32 if anyone is running EMBOSS under Windows.

	TTFN

	Simon.

------ Script Starts Here -- Beware of long lines wrapping ------
#!/usr/bin/perl -w
use strict;

# EMBOSS farm file script
#
# Written by Simon Andrews
# simon.andrews at bbsrc.ac.uk
# Dec 2001
#
# This script allows you to set up a farm
# of EMBOSS databases which can be queried
# by a single instance of seqret.  The
# program must be accompanied by an entry
# in emboss.default which looks like this:
#
# DB name_of_database [
#	type: N (or P if we're dealing with proteins)
#	method: app
#	format: fasta
#	app: "/path/to/this/script"
#	comment: "Whatever text you'd like to see in showdb" ]
#

# First we need to set a few preferences
#
# What is the full path to seqret?
# If you are sure that seqret will always
# be somewhere in your path, then you can
# just leave this as 'seqret'.

my $seqret_path = 'seqret';


# Now we need to know the names of the
# databases you'd like included in the
# search.  These must be databases which
# have already been indexed, and installed
# correctly into emboss.default.  Simply
# enter the database names between the
# brackets, separated by spaces.

my @databases = qw(dbase1 dbase2 dbase3);


##### End of bits which need to be edited #########

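# The query arrives as a single argument of the form dbname:accession
my ($reference) = @ARGV;

die "\n*** FARM ERROR *** No query given\n\n" unless defined $reference;

if ($reference =~ /:(.+)$/){
  $reference = $1;
}

else {
  die "\n*** FARM ERROR *** Couldn't get accession after : from $reference\n\n";
}


# Try each database in turn and print the first hit.
# The command must stay on one line so that stderr is really discarded.
foreach my $database (@databases){

  my $sequence = `$seqret_path $database:$reference fasta::stdout 2>/dev/null`;

  if ($sequence){
	print $sequence;
	exit;
  }

}

warn "\n*** FARM ERROR *** Couldn't find $reference in any of '@databases'\n\n";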


From lukem at bioinfo.pbi.nrc.ca  Thu Dec 20 10:10:19 2001
From: lukem at bioinfo.pbi.nrc.ca (Luke McCarthy)
Date: Thu, 20 Dec 2001 09:10:19 -0600
Subject: alignment sequence reading with stop codons (bug?)
References: <Pine.LNX.4.33.0112191550310.32707-100000@tenero.genetics.duke.edu> <3C218D20.D752C59@genprofile.com>
Message-ID: <3C21FF5B.2C4251BF@bioinfo.pbi.nrc.ca>

David Bauer wrote:
> 
> I also guess you are misinterpreting the -seqall. This means to return
> all sequences from a file containing more than one sequence (like a
> fasta formatted file with several sequences separated by their
> description lines). For me the -seqall option does not make much sense
> in the case of alignment programs which need exactly 2 sequences to
> align.

Nevertheless, the acd files for water and needle clearly state that the
second argument is a parameter of type seqall.  Which makes perfect
sense if one wants to align a probe sequence against a database of
others (a la BLAST)

Cheers,

Luke


From jason at cgt.mc.duke.edu  Thu Dec 20 10:12:48 2001
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Thu, 20 Dec 2001 10:12:48 -0500 (EST)
Subject: alignment sequence reading with stop codons (bug?)
In-Reply-To: <3C218D20.D752C59@genprofile.com>
Message-ID: <Pine.LNX.4.33.0112200937250.2664-100000@tenero.genetics.duke.edu>

On Thu, 20 Dec 2001, David Bauer wrote:

> Hi,
>
> the protein alignment programs don't like the '*' in your protein
> sequences. They are designed to align true proteins, which usually do not
> contain stop codons.
> If these are putative ORFs, a solution would be to split them up at the
> stops, creating a separate protein sequence for each ORF.
>

Re-aligning blastx hsps in some distant fungi so am hitting pseudogenes or
sequencing errors, hence the stop codons.

What is confusing to me with respect to the actual alignment programs is
that, if they don't like stop codons at all, they still allow an alignment
when the sequence containing the stop codon is the query (-sequencea) but
not when the sequence is in the subject db - i.e. the behavior in my
previous msg.
I may just recode the stop codons as an unknown aa to achieve what I need
for the alignment.  I realize it is silly to try and align these proteins
with stop codons but I am looking for conserved regions for degenerate PCR
primer picking.
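
The recoding idea could look something like the sketch below. It assumes a
plain NUL-terminated protein string and uses 'X' as the unknown residue.
mask_stops is a hypothetical name, and how 'X' is scored will depend on the
comparison matrix in use.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Replace every '*' in a protein string with 'X' (unknown residue), in
 * place. Returns the number of replacements made.
 * Hypothetical helper for illustration; not part of EMBOSS or bioperl.
 */
size_t mask_stops(char *protein)
{
    size_t replaced = 0;

    assert(protein != NULL);

    for (char *p = protein; *p != '\0'; ++p) {
        if (*p == '*') {
            *p = 'X';
            ++replaced;
        }
    }
    return replaced;
}
```

The sequence keeps its original length, so alignment coordinates still map
back onto the unmasked contig.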

[jason at gordola crypto_intergenic]$ head -6 contig5745.water
Local: Contig5745 vs SW-CC27_YEAST
Score: 367.50

Contig5745      1        CLIF*RLLLIQMI.HPQARRAFTFLQQQEPYRIQSMEQLSTLLWH 44
                         ||:    |  ::| :  : : |  |:  :| |:: ||  ||||||
SW-CC27_YEAST   474      CLVQLGKLHFEIINYDMSLKYFNRLKDLQPARVKDMEIFSTLLWH 518

> I also guess you are misinterpreting the -seqall. This means to return
> all sequences from a file containing more than one sequence (like a
> fasta formatted file with several sequences separated by their
> description lines). For me the -seqall option does not make much sense
> in the case of alignment programs which need exactly 2 sequences to
> align.
> There you must always pass the two sequence files which you want to align
> as arguments to the alignment program and each file must contain exactly
> one sequence.
>

In the alignment program context -seqall is the name of the db to search
the query (-sequencea) against - so one will get an alignment of the first
sequence against the whole db of sequences.  I am only interested in 1
pairwise comparison so the order of the sequences didn't really matter to
me.  We have a SW alignment module in bioperl (written in C - before you
gag) for protein alignments but was trying out our new EMBOSS wrappers in
bioperl, hence the reported issue.

> I hope this helps,
>
> David Bauer.
>
>
> Jason Stajich wrote:
> >
> > I noticed this in playing with our new bioperl wrappers for EMBOSS.
> > Apparently -seqall does not read sequences with stop codons.
> > I can submit as a bug if that is more appropriate.  Getting warmed up to
> > the EMBOSS dev process.
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


From gbottu at ben.vub.ac.be  Thu Dec 20 12:25:52 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Thu, 20 Dec 2001 18:25:52 +0100 (MET)
Subject: Farm files for databases, using SRS
Message-ID: <200112201725.SAA11576@bigben.vub.ac.be>

from : BEN

You can also, as Peter suggested, use SRS. For example, I wanted to access the 
databanks IMGT/LIGM and IMGT/MHC as one databank with name imgt and shortname 
im. I use SRS for retrieving one or several sequences, optionally with their 
documentation, and direct access to the databank for a full search (faster?). I 
wrote:

in .../emboss/share/EMBOSS/emboss.default :

DB imgt [ type: N comment: 'Immunogenetics Databases'
          methodquery: srs dbalias: IMGT formatquery: embl
          methodall: direct dir: /sw/emboss/DBlink file: 'I*' formatall: fasta
]
DB im   [ type: N comment: 'Immunogenetics Databases'
          methodquery: srs dbalias: IMGT formatquery: embl
          methodall: direct dir: /sw/emboss/DBlink file: 'I*' formatall: fasta
]

and in .../srs/icarus/site/site.i (hidden so that it does not show up in the WWW 
page of SRS) :

$imgt_db=$Library:[IMGT
  format:$EMBL_FORMAT
  virtualInfo:$LibVirtual:[
    memberLibs:{$IMGT_DB $MHC_DB}
  ]
  type:hidden
]

The directory /sw/emboss/DBlink contains  :

Iligm -> /dbfb/imgt/ligm
Imhc -> /dbfb/imgt/mhc

	Guy Bottu


From ableasby at hgmp.mrc.ac.uk  Mon Dec 24 11:41:15 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Mon, 24 Dec 2001 16:41:15 GMT
Subject: EMBOSS 2.1.0 released
Message-ID: <200112241641.QAA23771@bromine.hgmp.mrc.ac.uk>

EMBOSS 2.1.0, coming some 6 months after the previous release, now includes
an alpha release of the client/server GUI called Jemboss, written at
the HGMP by Tim Carver. The complete package is available for download
from:
       http://www.uk.embnet.org/Software/EMBOSS

Several new applications are provided including primer3. There has also
been considerable work done in the transition towards standard report
formats and many applications now use these. Any alignment program
can use the -aformat qualifier to choose from a variety of standard outputs
(e.g. pair, markx0, markx1, srs). Reports for non-alignment programs
similarly use the -rformat qualifier. All have sensible defaults.
Reports will be further integrated throughout the EMBOSS vsn 2
distributions. 

EMBOSS will work as usual without Jemboss, however if you wish to try
using Jemboss (server or client) see:

http://www.uk.embnet.org/Software/EMBOSS/Jemboss/download/setup.html


Alan


From smcmahan at facstaff.wisc.edu  Sat Dec  1 00:41:40 2001
From: smcmahan at facstaff.wisc.edu (Scott McMahan)
Date: Fri, 30 Nov 2001 18:41:40 -0600
Subject: Modifying existing programs
Message-ID: <3C082744.1020802@facstaff.wisc.edu>

	I've modified pepstats.c (and necessary support files) to include the 
calculation of molar extinction coefficient at 280 and the expected A280 
of a 1mg/ml solution.  I've looked on the website, but couldn't find 
documentation about how to handle additions to existing applications. 
Could someone please point me in the right direction?

-- 
Scott McMahan
smcmahan at facstaff.wisc.edu


From gbottu at ben.vub.ac.be  Fri Dec  7 10:59:49 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 7 Dec 2001 11:59:49 +0100 (MET)
Subject: compiling EMNU on CompaqTru64
Message-ID: <200112071059.LAA16923@bigben.vub.ac.be>

from : BEN

	Dear colleagues,
	
I have a problem. I am trying to compile EMNU on our new computer. We have OS 
CompaqTru64 5.1 and compiler GNU gcc 3.0.1.
It does not work because the files menu.h, form.h, eti.h, libmenu.a and 
libform.a are lacking. Does anyone have an idea where to obtain these?

	Guy Bottu


From gwilliam at hgmp.mrc.ac.uk  Fri Dec  7 11:09:48 2001
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 07 Dec 2001 11:09:48 +0000
Subject: compiling EMNU on CompaqTru64
References: <200112071059.LAA16923@bigben.vub.ac.be>
Message-ID: <3C10A37C.7C9CB92A@hgmp.mrc.ac.uk>


The libmenu.a, menu.h and libform.a, form.h files are part of the
standard curses (or ncurses) UNIX libraries.
Check that these are set up correctly.

ncurses is available from:
ftp://dickey.his.com/ncurses/
or
ftp://ftp.gnu.org/pub/gnu/ncurses

Read emnu's INSTALL file for the 'configure' arguments that point to the
required libraries.

Guy Bottu wrote:
> 
> from : BEN
> 
>         Dear colleagues,
> 
> I have a problem. I am trying to compile EMNU on our new computer. We have OS
> CompaqTru64 5.1 and compiler GNU gcc 3.0.1.
> It does not work because the files menu.h, form.h, eti.h, libmenu.a and
> libform.a are lacking. Does anyone have an idea where to obtain these?
> 
>         Guy Bottu

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From ableasby at hgmp.mrc.ac.uk  Fri Dec  7 11:10:26 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Fri, 7 Dec 2001 11:10:26 GMT
Subject: compiling EMNU on CompaqTru64
Message-ID: <200112071110.LAA24602@bromine.hgmp.mrc.ac.uk>

Hi Guy,

I believe you'll find them if you install GNU ncurses from
ftp.gnu.org

Cheers
Alan


From mad at biol.unlp.edu.ar  Fri Dec  7 14:44:33 2001
From: mad at biol.unlp.edu.ar (Sarachu Martin)
Date: Fri, 07 Dec 2001 11:44:33 -0300 (ART)
Subject: gcg and solaris 8
Message-ID: <1007736273.3c10d5d1aaead@www.biol.unlp.edu.ar>

Hi,

sorry for the off-topic question, but maybe you can help me. Do you know if 
GCG 9 runs on an UltraSparc/Solaris 8 system? I installed GCG 9 on an 
Intel/Solaris 8 system and got a "cannot execute exe file" error on several 
files. Does GCG not run on a PC platform?

Thanks,

martin.


From ztu at msi.umn.edu  Fri Dec  7 14:54:15 2001
From: ztu at msi.umn.edu (Zheng Jin Tu)
Date: Fri, 7 Dec 2001 08:54:15 -0600 (CST)
Subject: gcg and solaris 8
In-Reply-To: <1007736273.3c10d5d1aaead@www.biol.unlp.edu.ar>
Message-ID: <Pine.LNX.4.31.0112070852130.27573-100000@virga.msi.umn.edu>

Hi Sarachu:

The best approach is to ask Accelrys.  The company will have a better idea of
which operating systems are supported.

Email: Help at GCG.Com

Thanks,

Tu

On Fri, 7 Dec 2001, Sarachu Martin wrote:

> Hi,
>
> sorry for the off-topic question, but maybe you can help me. Do you know if
> GCG 9 runs on an UltraSparc/Solaris 8 system? I installed GCG 9 on an
> Intel/Solaris 8 system and got a "cannot execute exe file" error on several
> files. Does GCG not run on a PC platform?
>
> Thanks,
>
> martin.
>


From mathog at mendel.bio.caltech.edu  Wed Dec 12 18:44:25 2001
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Wed, 12 Dec 2001 10:44:25 -0800
Subject: quick questions
Message-ID: <E16EEMj-0002mo-00@mendel.bio.caltech.edu>

1.  Is this list archived in a searchable form somewhere?
2.  what Ajax call or calls say if a command line switch was or wasn't
present?
For instance, at the moment when this

    foo = AjAcdGetInt("Somekey");

returns foo = 0 I can't tell if the user entered "-somekey=0" or just
left it off the line.

3.  What entries have to go in the makefile to result in an EMBOSS
executable that gdb will debug?   This is on Solaris 8.   I tried using
-g alone, but gdb didn't like the resulting executable.  It would start
it, but "bt" (backtrace) only showed binary addresses.
GDB's exact message was:

This GDB was configured as
"sparc-sun-solaris2.8"..."/usr/local/src/EMBOSS/embassy/ESIM4-1.0.0/source/esim4":
not in executable format: File format not recognized

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From peter.rice at uk.lionbioscience.com  Thu Dec 13 10:09:35 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Thu, 13 Dec 2001 10:09:35 +0000
Subject: quick questions
References: <E16EEMj-0002mo-00@mendel.bio.caltech.edu>
Message-ID: <3C187E5F.9D94B7D3@uk.lionbioscience.com>

Hi David,

>2.  what Ajax call or calls say if a command line switch was or wasn't
>present?

None. Values can be set on the command line, or by dependence on other
values, or just default.

Why would you like to know what was on the command line? It could be tricky
for GUI interfaces if they deliberately put everything on the command line,
default values and all.

>3.  What entries have to go in the makefile to result in an EMBOSS
>executable that gdb will debug?

None. Just run:

    ./configure --enable-debug

before you make.

regards,

Peter Rice

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From kkmattil at csc.fi  Thu Dec 13 13:08:52 2001
From: kkmattil at csc.fi (Kimmo Mattila)
Date: Thu, 13 Dec 2001 15:08:52 +0200 (EET)
Subject: Problems with fuzzpro and ehmmer
Message-ID: <Pine.LNX.4.33.0112131334250.7017-100000@sampo.csc.fi>


Dear EMBOSS people.

I have had a few problems with fuzzpro, patmatdb and ehmmer. If any of
you have suggestions on how to solve them, please tell.


FUZZPRO and PATMATDB

I am using fuzzpro and patmatdb with GCG formatted databases. If I run a
search against a whole database (e.g. swiss:*), the programs do find the
right hit sequences, but pick wrong names for the found entries. With
plain sequence files or with sequence name lists, this error does not
occur. I have checked both the EMBOSS indexing and the GCG database files
and they should be OK. Other EMBOSS and GCG applications give correct
results when the same database files are used.

Has someone else had similar troubles? If the indexing of the
databases is in order, what might cause this?


EHMMER

We have successfully installed EMBOSS-HMMER; however, unlike the native
HMMER, the EMBOSS version is not able to use multiple processors (even
though the -cpu option is mentioned in the help data). When I compared the
Makefile of EMBOSS-HMMER to the native one, I noticed that the EMBOSS
version lacks the settings for compiling a multiprocessor version of HMMER.
Has someone managed to circumvent this with some simple trick like copying
some parts of the original HMMER Makefile to the Makefile of the
EMBOSS version?

Secondly, when I use ehmmsearch, long output files are not complete.
After about 200 lines ehmmsearch starts writing the output to the
screen instead of the output file. The last line in the output file seems
to be

 Domain top hits:

And after this the alignments are printed to the screen.
What might cause this? Is there e.g. some limit on the output file
size?

Regards,

Kimmo Mattila

---------------------------------------------------------------
   Kimmo Mattila		Science Support
   kimmo.mattila at csc.fi		Center for Scientific Computing
   tel. +358 (0)9 457 2708	Tekniikantie 15a D, PL 405
   fax. +358 (0)9 457 2302	FIN-02101 Espoo, Finland
---------------------------------------------------------------


From mathog at mendel.bio.caltech.edu  Thu Dec 13 16:04:30 2001
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Thu, 13 Dec 2001 08:04:30 -0800
Subject: quick questions
Message-ID: <E16EYLW-0004za-00@mendel.bio.caltech.edu>

> >2.  what Ajax call or calls say if a command line switch was or wasn't
> >present?
> 
> None. Values can be set on the command line, or by dependence on other
> values, or just default.
> 
> Why would you like to know what was on the command line? It could be tricky
> for GUI interfaces if they deliberately put everything on the command line,
> default values and all.

Consider an optional integer parameter "foobar" for which 0 is a valid
value and also where if foobar is not specified, it is calculated based
on the input sequences.  That is, it does not have a fixed default value.
I see no way to distinguish between "calculate value" and "use this value"
when AjAcdGetInt returns 0.
The workaround would be to set the default in the .acd file to a magic
default value, say -1000000, which is out of range for the desired
variable, and interpret that value as "not specified".  There are three
problems with this approach:

1.  There may be cases for which there are no magic values available.
2.  In w2h the default value shows up filled in on the Web interface. 
So the user sees -1000000 and wonders what the heck that means, or
thinks that -900000 might
also be valid.
3.  It requires that  range checking be disabled or special cased
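
For illustration only, the magic-value workaround might look like this on
the C side. FOOBAR_UNSET, resolve_sentinel and calc_from_sequences are
made-up names, the calculation is a placeholder, and the option value is
assumed to have been fetched already.

```c
#include <assert.h>

/* Hypothetical sentinel: the same magic default written in the .acd file. */
#define FOOBAR_UNSET (-1000000)

/* Placeholder for the real calculation from the input sequences. */
static int calc_from_sequences(int seq_len)
{
    return seq_len / 2;
}

/*
 * raw is whatever the option parser returned for "foobar".
 * The sentinel means "not specified"; anything else, including 0,
 * is taken as the user's choice.
 */
int resolve_sentinel(int raw, int seq_len)
{
    assert(seq_len >= 0);

    if (raw == FOOBAR_UNSET)
        return calc_from_sequences(seq_len);
    return raw;
}
```

This keeps 0 usable as a real value, but it inherits all three problems
listed above, since the sentinel still has to live in the .acd default.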

I guess I'll have a look at the code for AjAcdGetInt and see if it's
possible to modify that into AjAcdItemExists, returning a boolean T/F for
when the item has been specified.  Then the code would be (more or less
like on GCG)

if(AjAcdItemExists("foobar")){
   ifoobar=AjAcdGetInt("foobar");
}
else {
   ifoobar=calculated_value();
}

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From peter.rice at uk.lionbioscience.com  Thu Dec 13 16:21:21 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Thu, 13 Dec 2001 16:21:21 +0000
Subject: quick questions
References: <E16EYLW-0004za-00@mendel.bio.caltech.edu>
Message-ID: <3C18D581.7A22FDB8@uk.lionbioscience.com>

David Mathog wrote:
> 
> Consider an optional integer parameter "foobar" for which 0 is a valid
> value and also where if foobar is not specified, it is calculated based
> on the input sequences.
>
> I guess I'll have a look at the code for AjAcdGetInt and see if it's
> possible to modify that into AjAcdItemExists, returning a boolean
> T/F for when the item has been specified.  Then the code would be
> (more or less like on GCG)
> 
> if(AjAcdItemExists("foobar")){
>    ifoobar=AjAcdGetInt("foobar");
> }
> else {
>    ifoobar=calculated_value();
> }

Calculated values are intended to be calculated in the ACD file. Interfaces
such as W2H should be able to do this in JavaScript, though in some cases
they have to simply treat values as integers.

Try this ACD file. Save it as 'foobar.acd' and run as 'acdc foobar'.
It will prompt for a sequence, then prompt for foobar with the sequence
length as default but will accept any value from 0 to the sequence length.
The 'echo' string is defined so you can see the value of foobar in the
prompt.

The default value can be calculated in more exotic ways too ... see the @()
functions and the other calculated attributes. More can be easily added.

====================

appl: foobar [
  documentation: "ACD example"
  groups: "test"
]

sequence: sequence  [
  required: "Y"
]

integer: foobar  [
  required: "Y"
  default: "$(sequence.len)"
  minimum: "0"
  maximum: "$(sequence.len)"
]

string: echo  [
  prompt: "Foobar is $(foobar)"
  required: "Y"
]

===================

There are many other ways to set options. You could set a boolean to
calculate a value, and another value to define the calculation.

Testing the command line will have real problems for your original idea,
because an interface might be writing every option, with what it considers
the default value, on the command line.


-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From mathog at mendel.bio.caltech.edu  Thu Dec 13 17:33:48 2001
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Thu, 13 Dec 2001 09:33:48 -0800
Subject: quick questions
Message-ID: <E16EZjw-00054c-00@mendel.bio.caltech.edu>

> Calculated values are intended to be calculated in the ACD file. Interfaces
> such as W2H should be able to do this in JavaScript, though in some cases
> they have to simply treat values as integers.

The default value in this case is the end result of at least a hundred
lines of C code.

> 
> Testing the command line will have real problems for your original idea,
> because an interface might be writing every option, with what it considers
> the default value, on the command line.
> 

That's a good point.  W2H isn't like that, but some other interface
might be.  I guess it won't hurt to add a couple of extra booleans to
cover those variables whose default is difficult to calculate prior to
the program running.
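
A sketch of what the extra-boolean approach could look like in the program
itself. The struct, resolve_flagged and calc_default are hypothetical names,
the calculation is a placeholder, and both option values are assumed to have
been read from ACD already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Option values as the program would see them after parsing.
 * Struct and field names are hypothetical. */
struct foobar_opts {
    bool user_foobar;   /* true: take foobar from the user */
    int  foobar;        /* only meaningful when user_foobar is true */
};

/* Placeholder for the real calculation from the input sequences. */
static int calc_default(int seq_len)
{
    return seq_len / 2;
}

/* Use the user's value only when the boolean says so; otherwise calculate. */
int resolve_flagged(const struct foobar_opts *opts, int seq_len)
{
    assert(opts != NULL);

    return opts->user_foobar ? opts->foobar : calc_default(seq_len);
}
```

With the flag carrying the "was it specified" information, 0 stays a valid
user value and no out-of-range default is needed.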

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From mathog at mendel.bio.caltech.edu  Thu Dec 13 19:01:43 2001
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Thu, 13 Dec 2001 11:01:43 -0800
Subject: quick questions
Message-ID: <E16Eb70-0005go-00@mendel.bio.caltech.edu>

Hmm, after going up and down through the ACD notation I can't find
what I'm looking for there either.  Consider this notation:

bool: usermspA [ 
  opt: Y
  def: N
  info: "False: esim4 calculates mspA, True: mspA from command line."
]

int: mspA [ 
  opt: $(usermspA)
  req: $(usermspA)
  def: 16
  info: "long description. default of 16 is not used unless usermspA is specified."
]

If the command is issued with -usermspA then it will prompt for -mspA if
it wasn't also specified, which gives the desired results. However, if the
command has only this on the command line:

  -mspA=16

it clearly means that the user really wants to use the value of 16 for
the parameter.  How then to switch the state on -usermspA automatically,
or failing that, prompt for -usermspA?  16 happens to be the default
value.  It wasn't set to an illegal (magic) value because we don't want
-1000000 showing up in a GUI.  But it isn't normally used because
-usermspA will be false.  As before, we could use a sort of magic number
and do:

bool: usermspA [ 
  opt: Y
  def: @($(mspa)!=16)
  info: "False: esim4 calculates mspA, True: mspA from command line."
]

and it will correctly flip the bit when the user specifies it - except
when by bad luck they choose to specify the default value.  And round and
round the logic goes.  I don't suppose that there is a ".specified" or
".online" attribute in ACD?  I.e., this would do the job:

bool: usermspA [ 
  opt: Y
  def: $(mspa.online)
  info: "False: esim4 calculates mspA, True: mspA from command line."
]

The desired GUI interaction in that case could be one of:

1.  changing value in mspA toggles state of usermspA (messy)
2.  -mspA slot is grayed out unless -usermspA is set (simpler)

In some interfaces this could be covered over with Javascript - but the
command line variant still wouldn't work exactly right.

Or am I missing something?


Summary:

works:  command 
works:  command -usermspA -mspA 16
works (prompts for mspA):  command -usermspA
fails to prompt or override usermspA:    command -mspA 16

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From peter.rice at uk.lionbioscience.com  Fri Dec 14 09:41:17 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 14 Dec 2001 09:41:17 +0000
Subject: quick questions
References: <E16Eb70-0005go-00@mendel.bio.caltech.edu>
Message-ID: <3C19C93D.A3FD713F@uk.lionbioscience.com>

David Mathog wrote:
> 
> Hmm, after going up and down through the ACD notation I can't find
> what I'm looking for there either. 
> 
> The desired GUI interaction in that case could be one of:
> 
> 1.  changing value in mspA toggles state of usermspA (messy)

This means 'mspA depends on usermspA' and 'usermspA depends on mspA'.
ACD expressly forbids this. All dependencies must be to something defined
earlier in the file.

> 2.  -mspA slot is grayed out unless -usermspA is set (simpler)

Could be done with an extra ACD attribute, with a value of "$(usermspA)",
but you would expect most GUIs to ignore this.

In general, you can expect to have options in EMBOSS that are not used by
the program but can still be set on the command line. Your -mspA is just
another case.

Having said that, adding an ACD function (you would only need one) to test
whether a value was set by the user is fairly trivial (setting via the
command line or by replying to a prompt if there is one).

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From charles at moulinette.dyndns.org  Tue Dec 18 08:56:25 2001
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Tue, 18 Dec 2001 09:56:25 +0100
Subject: phylogenic analysis with emboss
Message-ID: <20011218085625.GB803@gizmotronics.dyndns.org>

Hi,

I was wondering which tools you use for phylogenetic analysis, since I can't
find any tree drawing program in either emboss or embassy's phylip.

Charles


From gbottu at ben.vub.ac.be  Tue Dec 18 09:34:43 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Tue, 18 Dec 2001 10:34:43 +0100 (MET)
Subject: phylogenic analysis with emboss
Message-ID: <200112180934.KAA04579@bigben.vub.ac.be>

from : BEN

>I was wondering which tools you use for phylogenetic analysis, since I can't
>find any tree drawing program in either emboss or embassy's phylip.

If I am not wrong, the EMBOSS on-line help states explicitly that the tree 
drawing programs and the tree editors of PHYLIP were not included in the embassy 
PHYLIP. So, you should retrieve the original PHYLIP package from 
evolution.genetics.washington.edu and use the programs drawgram and drawtree.
Note that while embassy has integrated PHYLIP version 3.53c, there is now a 
version 3.6a2, which is definitely better. drawgram/drawtree now have an 
X display for previewing the graphic, and the generated PostScript files can not 
only be sent directly to a printer, but can also be incorporated into documents 
like MS-Word doc files.

Another useful freeware tool I know about is NJplot, which is distributed 
together with CLUSTAL (which you must install anyway in order for emma to work).

	Guy Bottu

	
From letondal at pasteur.fr  Wed Dec 19 14:03:18 2001
From: letondal at pasteur.fr (Catherine Letondal)
Date: Wed, 19 Dec 2001 15:03:18 +0100
Subject: how to cite EMBOSS?
Message-ID: <200112191403.fBJE3IW452649@electre.pasteur.fr>


Hi,

Sorry if this is an FAQ, but I was not able to find any reference
in EMBOSS documentation and Web site (apart from the original
algorithms of course). Is there any reference for the EMBOSS project?

Thanks a lot,

-- 
Catherine Letondal -- Pasteur Institute Computing Center


From gwilliam at hgmp.mrc.ac.uk  Wed Dec 19 14:05:49 2001
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Wed, 19 Dec 2001 14:05:49 +0000
Subject: how to cite EMBOSS?
References: <200112191403.fBJE3IW452649@electre.pasteur.fr>
Message-ID: <3C209EBD.763B54F7@hgmp.mrc.ac.uk>


See the FAQ file:

Q) Is there a reference I can cite for EMBOSS?

A) Rice,P. Longden,I. and Bleasby,A.
"EMBOSS: The European Molecular Biology Open Software Suite"
Trends in Genetics June 2000, vol 16, No 6. pp.276-277

You are right - it should be in a more obvious place.

Gary

Catherine Letondal wrote:
> 
> Hi,
> 
> Sorry if this is an FAQ, but I was not able to find any reference
> in EMBOSS documentation and Web site (apart from the original
> algorithms of course). Is there any reference for the EMBOSS project?
> 
> Thanks a lot,
> 
> --
> Catherine Letondal -- Pasteur Institute Computing Center

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From letondal at pasteur.fr  Wed Dec 19 14:10:44 2001
From: letondal at pasteur.fr (Catherine Letondal)
Date: Wed, 19 Dec 2001 15:10:44 +0100
Subject: how to cite EMBOSS? 
In-Reply-To: Your message of "Wed, 19 Dec 2001 14:05:49 GMT."
             <3C209EBD.763B54F7@hgmp.mrc.ac.uk> 
Message-ID: <200112191410.fBJEAiW438064@electre.pasteur.fr>


"Gary Williams, Tel 01223 494522" wrote:
> 
> See the FAQ file:
> 
> Q) Is there a reference I can cite for EMBOSS?
> 
> A) Rice,P. Longden,I. and Bleasby,A.
> "EMBOSS: The European Molecular Biology Open Software Suite"
> Trends in Genetics June 2000, vol 16, No 6. pp.276-277
> 
> You are right - it should be in a more obvious place.

Thanks - yes, maybe in the http://www.uk.embnet.org/Software/EMBOSS/general.html page?

> 
> Gary
> 
> Catherine Letondal wrote:
> > 
> > Hi,
> > 
> > Sorry if this is an FAQ, but I was not able to find any reference
> > in EMBOSS documentation and Web site (apart from the original
> > algorithms of course). Is there any reference for the EMBOSS project?
> > 
> > Thanks a lot,
> > 
> > --
> > Catherine Letondal -- Pasteur Institute Computing Center
> 
> -- 
> Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
> mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
> Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK

--
Catherine Letondal -- Pasteur Institute Computing Center


From gwilliam at hgmp.mrc.ac.uk  Wed Dec 19 14:21:09 2001
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Wed, 19 Dec 2001 14:21:09 +0000
Subject: how to cite EMBOSS?
References: <200112191410.fBJEAiW438064@electre.pasteur.fr>
Message-ID: <3C20A255.80D77C57@hgmp.mrc.ac.uk>

Catherine Letondal wrote:
> 
> "Gary Williams, Tel 01223 494522" wrote:
> >
> > See the FAQ file:
> >
> > Q) Is there a reference I can cite for EMBOSS?
> >
> > A) Rice,P. Longden,I. and Bleasby,A.
> > "EMBOSS: The European Molecular Biology Open Software Suite"
> > Trends in Genetics June 2000, vol 16, No 6. pp.276-277
> >
> > You are right - it should be in a more obvious place.
> 
> Thanks - yes, maybe in the http://www.uk.embnet.org/Software/EMBOSS/general.html page?

Done.
Gary

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From simon.andrews at bbsrc.ac.uk  Wed Dec 19 14:55:43 2001
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Wed, 19 Dec 2001 14:55:43 -0000
Subject: Farm files for databases
Message-ID: <2DC41140A89ED411989D00508BDCD9EDEA51EC@bi-exsrv1.iapc.bbsrc.ac.uk>

I'm trying to find the best way to do the following:

I have an application which returns an identifier (effectively an accession
number), which could be present in any one of 4 separate EMBOSS databases.
I'd like to be able to search all of these databases and retrieve the
sequence from whichever one finds it (I know that the identifiers are unique
between the different databases - so I'll only ever find one entry).

Having read the EMBOSS documentation the only reference I could find for
doing this sort of thing was to make a database entry with an "EXTERNAL"
format, and then have seqret query a script to return the sequence.  However
the details for this are pretty sketchy.

What exactly would a script of this type have to do?  What input is it
supplied with (and how), and what must it return?  Is this the only (or
best) way to do what I'm trying to do?

Any help is much appreciated.

	TTFN

	Simon.

----
Simon Andrews PhD
Bioinformatics Dept
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0)1223 496463 


From peter.rice at uk.lionbioscience.com  Wed Dec 19 15:16:46 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 19 Dec 2001 15:16:46 +0000
Subject: Farm files for databases
References: <2DC41140A89ED411989D00508BDCD9EDEA51EC@bi-exsrv1.iapc.bbsrc.ac.uk>
Message-ID: <3C20AF5E.F905F683@uk.lionbioscience.com>

"simon andrews (BI)" wrote:
> 
> I have an application which returns an identifier (effectively an
> accession number), which could be present in any one of 4 separate
> EMBOSS databases.
> I'd like to be able to search all of these databases and retrieve the
> sequence from whichever one finds it (I know that the identifiers are
> unique between the different databases - so I'll only ever find one
> entry).

This sounds like a job for SRS, although the query could be complicated if
there is a possibility of getting more than one copy returned.

A script is a good solution. The script should read the dbname:id query
from the command line and return the sequence in some specified format.
What the script does internally is up to you.

If there is no sequence found, it can simply return nothing.

The original 'external' applications were the 'efetch' utility in acedb,
and GCG's 'typedata'.
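
As a rough illustration of the contract described above, a script along
these lines would take the dbname:id query from the command line and print
the entry, or print nothing when it is not found. This is only a sketch:
the SEQUENCES table and its DEMO1 entry are hypothetical stand-ins for a
real lookup, and Python is used purely for illustration.

```python
import sys

# Hypothetical lookup table standing in for a real database search.
SEQUENCES = {"DEMO1": "ACGTACGTAC"}


def parse_query(query):
    """Return the id part of a 'dbname:id' query, or None if there is no id."""
    dbname, sep, entry_id = query.partition(":")
    return entry_id if sep and entry_id else None


def fetch_fasta(query):
    """Return the entry as FASTA text, or an empty string when nothing is found."""
    entry_id = parse_query(query)
    sequence = SEQUENCES.get(entry_id)
    if sequence is None:
        return ""
    return ">%s\n%s\n" % (entry_id, sequence)


if __name__ == "__main__":
    # The query arrives as the first command-line argument.
    if len(sys.argv) > 1:
        sys.stdout.write(fetch_fasta(sys.argv[1]))
```

Returning an empty string for a missing entry mirrors the "simply return
nothing" behaviour mentioned above.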

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From jason at cgt.mc.duke.edu  Wed Dec 19 21:01:23 2001
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Wed, 19 Dec 2001 16:01:23 -0500 (EST)
Subject: alignment sequence reading with stop codons (bug?)
Message-ID: <Pine.LNX.4.33.0112191550310.32707-100000@tenero.genetics.duke.edu>

I noticed this in playing with our new bioperl wrappers for EMBOSS.
Apparently -seqall does not read sequences with stop codons.
I can submit as a bug if that is more appropriate.  Getting warmed up to
the EMBOSS dev process.

This occurs with both
EMBOSS-1.9.1
and
CVS code I checked out today (2.0.1 I guess).
The workaround is of course to specify the arguments in the correct way
or to replace the stop codon with something like X.  I know which sequence
will have potential stop codons so I can work around this in my own code.

[jason at gordola crypto_intergenic]$ cat jason.seq
>SW-CC27_YEAST SW:CC27_YEAST P38042 saccharomyces cerevisiae (baker's
yeast). cell division control protein 27. 10/2001; PIR:S45825 cell
division control protein CDC27 - yeast (Saccharomyces cerevisia
MAVNPELAPFTLSRGIPSFDDQALSTIIQLQDCIQQAIQQLNYSTAEFLAELLYAECSIL
DKSSVYWSDAVYLYALSLFLNKSYHTAFQISKEFKEYHLGIAYIFGRCALQLSQGVNEAI
LTLLSIINVFSSNSSNTRINMVLNSNLVHIPDLATLNCLLGNLYMKLDHSKEGAFYHSEA
LAINPYLWESYEAICKMRATVDLKRVFFDIAGKKSNSHNNNAASSFPSTSLSHFEPRSQP
SLYSKTNKNGNNNINNNVNTLFQSSNSPPSTSASSFSSIQHFSRSQQQQANTSIRTCQNK
NTQTPKNPAINSKTSSALPNNISMNLVSPSSKQPTISSLAKVYNRNKLLTTPPSKLLNND
RNHQNNNNNNNNNNNNNNNNNNNNNNNNIINKTTFKTPRNLYSSTGRLTTSKKNPRSLII
SNSILTSDYQITLPEIMYNFALILRSSSQYNSFKAIRLFESQIPSHIKDTMPWCLVQLGK
LHFEIINYDMSLKYFNRLKDLQPARVKDMEIFSTLLWHLHDKVKSSNLANGLMDTMPNKP
ETWCCIGNLLSLQKDHDAAIKAFEKATQLDPNFAYAYTLQGHEHSSNDSSDSAKTCYRKA
LACDPQHYNAYYGLGTSAMKLGQYEEALLYFEKARSINPVNVVLICCCGGSLEKLGYKEK
ALQYYELACHLQPTSSLSKYKMGQLLYSMTRYNVALQTFEELVKLVPDDATAHYLLGQTY
RIVGRKKDAIKELTVAMNLDPKGNQVIIDELQKCHMQE

[jason at gordola crypto_intergenic]$ cat prot.seq
>Contig5745
CLIF*RLLLIQMIHPQARRAFTFLQQQEPYRIQSMEQLSTLLWHLADLPALSHLSQSLIS
ISRSSPQAWIAVGNCFSLQKDHDEAMRCFRRATQVDEGCAYAWTLCGYEAVEMEEYERAM
AFYRTAIRTDARHYNAWYVLFFFFFFFFVPGDIDS*PKKGMEWG*FISKRIDRGMRSIIL
KEPSKSIQLIPFFYVALVW*VGVSSYPLETMTNIDFPKKKKALEKSNDVVQALHFYERAS
KYAPTSAMVQFKRIRALVALQRYDEAISALVPLTHSAPDEANVFFLLGKCLLKKERRQEA
TMAFTNARELEPK

[jason at gordola crypto_intergenic]$ water jason.seq prot.seq
Smith-Waterman local alignment.
   An error has been found: Sequence Contig5745 must be protein sequence,
 found bad character '*'
   An error has been found: option -seqall: Unable to read sequence
'prot.seq'
   There is a serious problem: water terminated: Bad value for option and
no prompt

[jason at gordola crypto_intergenic]$ water prot.seq jason.seq
Smith-Waterman local alignment.
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output file [contig5745.water]:

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


From bauer at genprofile.com  Thu Dec 20 07:02:56 2001
From: bauer at genprofile.com (David Bauer)
Date: Thu, 20 Dec 2001 08:02:56 +0100
Subject: alignment sequence reading with stop codons (bug?)
References: <Pine.LNX.4.33.0112191550310.32707-100000@tenero.genetics.duke.edu>
Message-ID: <3C218D20.D752C59@genprofile.com>

Hi,

the protein alignment programs don't like the '*' in your protein
sequences. They are designed to align true proteins, which usually do not
contain stop codons.
If these are putative ORFs, a solution would be to split them up at the
stops, creating a separate protein sequence for each ORF.
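
A minimal sketch of that splitting step, in Python for illustration. The
function name, the name_N naming scheme and the min_length parameter are
assumptions made for this example, and the pieces are simply the stretches
between stops (no check for a start codon is made).

```python
def split_at_stops(name, sequence, min_length=1):
    """Split a translated sequence at '*' and return (name, fragment) pairs.

    Fragments are numbered by their position between stops; fragments
    shorter than min_length (including empty ones) are dropped.
    """
    fragments = []
    for index, fragment in enumerate(sequence.split("*"), start=1):
        if len(fragment) >= min_length:
            fragments.append(("%s_%d" % (name, index), fragment))
    return fragments


# Write each fragment as its own FASTA record.
for frag_name, frag in split_at_stops("Contig5745", "CLIF*RLLLIQ*PKK"):
    print(">%s\n%s" % (frag_name, frag))
```

Each record could then be written to its own file and aligned separately.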

I also guess you are misinterpreting -seqall. It means to return
all sequences from a file containing more than one sequence (like a
FASTA-formatted file with several sequences separated by their
description lines). For me the -seqall option does not make much sense
in the case of alignment programs, which need exactly 2 sequences to
align.
There you must always pass the two sequence files which you want to align
as arguments to the alignment program, and each file must contain exactly
one sequence.

I hope this helps,

David Bauer.


Jason Stajich wrote:
> 
> I noticed this in playing with our new bioperl wrappers for EMBOSS.
> Apparently -seqall does not read sequences with stop codons.
> I can submit as a bug if that is more appropriate.  Getting warmed up to
> the EMBOSS dev process.


From simon.andrews at bbsrc.ac.uk  Thu Dec 20 09:21:43 2001
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 20 Dec 2001 09:21:43 -0000
Subject: Bug in entret.
Message-ID: <2DC41140A89ED411989D00508BDCD9EDEA51F1@bi-exsrv1.iapc.bbsrc.ac.uk>

Following on from my query yesterday, I have hit a problem trying to
implement a multiple search because of what appears to be a bug in entret.

I am using a series of fasta flat files, indexed with dbifasta.  What I am
finding is that although I can retrieve entries from the database with
seqret, using entret always returns an empty file with the same accession
number:

############

%> entret htg_mus:AC092094_v6_c8
Reads and writes (returns) flatfile entries
Output file [ac092094_v6_c8.entret]: 
%> more ac092094_v6_c8.entret
%> seqret htg_mus:AC092094_v6_c8       
Reads and writes (returns) sequences
Output sequence [ac092094_v6_c8.fasta]: 
%> more ac092094_v6_c8.fasta
>AC092094_v6_c8 Mus musculus clone RP23-261m19, WORKING DRAFT SEQUENCE, 8
unordered pieces.
CAGGACAGCCAGGGCTACACAGAGAAACCCTGTCTCAAAAAACAAAAAAACAAAAAAAAA
ACAAAAGAAGAAGAAAATGTCTGTGAATACCCTGGAAAAGTTACTCAGTGAAAGTAGATG
AGTCCCTGAGTCAGTGACAGGAAGTGAGTGCAGTCTGAGCACTGGCTTGTGACCAATGAC
AAAAACATAAGCTAGACTTGCTCTGCAAAGTGGAGGACAGAACAGACAAAGCCCCAGAGT

etc. etc.
############

entret doesn't produce any errors, but if I run it with the -debug option I
see the following lines in entret.dbg


############
Initializing seqInFormat, 40 formats
ajSeqRead: input file '/data/MOUSE/HTG/htg_mus.fasta' still there, try again
seqRead: single access - count 1 - call access routine again
seqAccessEmblcd type 1
query data all finished
seqRead: seqin->Query->Access->Access(seqin) *failed*
ajSeqallNext failed
closing file 'ac092094_v6_c4.entret'
############

I've checked, and the /data/MOUSE/HTG/htg_mus.fasta file is definitely
there, and is readable, so I suspect that something in the EMBOSS internals
is going wrong.

This is using EMBOSS 2.0.0.  Is this a known bug?  Is there a fix on the
way?  I can bluff the script using seqret in this case, but I'd like to make
a more general solution eventually.

	Cheers
	
	Simon.


From simon.andrews at bbsrc.ac.uk  Thu Dec 20 12:24:18 2001
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 20 Dec 2001 12:24:18 -0000
Subject: Farm files for databases
Message-ID: <2DC41140A89ED411989D00508BDCD9EDEA51F3@bi-exsrv1.iapc.bbsrc.ac.uk>

After getting some useful info from Peter Rice about how to create a
database farm in EMBOSS I thought I'd share the script I'm now using to do
this.

To use this simply copy and paste the text of the script at the bottom of
this message to a file on your system, then make sure that this file is
readable and executable by everyone (chmod 755 filename).  The comments in
the script tell you what changes you need to make to the script itself, and
the format of the entry you need to create in emboss.default.

Because of the bug I previously reported in entret, this script will not
work from an entret query to the farm.  It will work with seqret (and will
output any format you like), and can also be used as part of a USA for any
of the standard EMBOSS programs.

The script requires a unix-like OS, but could trivially be adapted to run
under Win32 if anyone is running EMBOSS under windows.

	TTFN

	Simon.

------ Script Starts Here -- Beware of long lines wrapping ------
#!/usr/bin/perl -w
use strict;

# EMBOSS farm file script
#
# Written by Simon Andrews
# simon.andrews at bbsrc.ac.uk
# Dec 2001
#
# This script allows you to set up a farm
# of EMBOSS databases which can be queried
# by a single instance of seqret.  The
# program must be accompanied by an entry
# in emboss.default which looks like this:
#
# DB name_of_database [
#	type: N (or P if we're dealing with proteins)
#	method: app
#	format: fasta
#	app: "/path/to/this/script"
#	comment: "Whatever text you'd like to see in showdb" ]
#

# First we need to set a few preferences
#
# What is the full path to seqret?
# If you are sure that seqret will always
# be somewhere in your path, then you can
# just leave this as 'seqret'.

my $seqret_path = 'seqret';


# Now we need to know the names of the
# databases you'd like included in the
# search.  These must be databases which
# have already been indexed, and installed
# correctly into emboss.default.  Simply
# enter the database names between the
# brackets, separated by spaces.

my @databases = qw(dbase1 dbase2 dbase3);


##### End of bits which need to be edited #########

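my ($reference) = @ARGV;

# The query arrives as dbname:id; keep only the part after the colon
$reference = '' unless defined $reference;

if ($reference =~ /:(.+)$/){
  $reference = $1;
}

else {
  die "\n*** FARM ERROR *** Couldn't get accession after : from $reference\n\n";
}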


foreach my $database (@databases){

  # Try each database in turn, discarding seqret's messages on stderr
  my $sequence = `$seqret_path $database:$reference fasta::stdout 2>/dev/null`;

  if ($sequence){
	print $sequence;
	exit;
  }

}

warn "\n*** FARM ERROR *** Couldn't find $reference in any of '@databases'\n\n";


From lukem at bioinfo.pbi.nrc.ca  Thu Dec 20 15:10:19 2001
From: lukem at bioinfo.pbi.nrc.ca (Luke McCarthy)
Date: Thu, 20 Dec 2001 09:10:19 -0600
Subject: alignment sequence reading with stop codons (bug?)
References: <Pine.LNX.4.33.0112191550310.32707-100000@tenero.genetics.duke.edu> <3C218D20.D752C59@genprofile.com>
Message-ID: <3C21FF5B.2C4251BF@bioinfo.pbi.nrc.ca>

David Bauer wrote:
> 
> I also guess you are misinterpreting the -seqall. This means to return
> all sequences from a file containing more than one sequence (like a
> fasta formated file with several sequences separated by theire
> description lines). For me the -seqall option does not make much sense
> in the case of alignment programs which need exactly 2 sequences to
> align.

Nevertheless, the ACD files for water and needle clearly state that the
second argument is a parameter of type seqall, which makes perfect
sense if one wants to align a probe sequence against a database of
others (a la BLAST).

Cheers,

Luke


From jason at cgt.mc.duke.edu  Thu Dec 20 15:12:48 2001
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Thu, 20 Dec 2001 10:12:48 -0500 (EST)
Subject: alignment sequence reading with stop codons (bug?)
In-Reply-To: <3C218D20.D752C59@genprofile.com>
Message-ID: <Pine.LNX.4.33.0112200937250.2664-100000@tenero.genetics.duke.edu>

On Thu, 20 Dec 2001, David Bauer wrote:

> Hi,
>
> the protein alignment programs don't like the '*' in your protein
> sequences. They are designed to align true proteins which usualy do not
> contain stop codons.
> If this are putative ORFs, a solution would be to split them up at the
> stops, creating a separate protein sequence for each ORF.
>

I am re-aligning blastx HSPs in some distant fungi, so I am hitting pseudogenes
or sequencing errors, hence the stop codons.

What confuses me about the actual alignment programs is that, if they
don't like stop codons at all, they still allow an alignment when the
sequence containing the stop codon is the query (-sequencea) but not when
that sequence is in the subject db, i.e. the behavior in my previous msg.
I may just recode the stop codons as an unknown aa to achieve what I need
for the alignment.  I realize it is silly to try to align these proteins
with stop codons, but I am looking for conserved regions for degenerate PCR
primer picking.
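
The recoding idea could be sketched as follows (Python, for illustration
only). The function name mask_stops and its replacement parameter are
assumptions made for this example. Because each '*' becomes exactly one
'X', the sequence length and the alignment coordinates stay the same.

```python
def mask_stops(fasta_text, replacement="X"):
    """Replace '*' on sequence lines, leaving '>' header lines untouched."""
    masked = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            masked.append(line)
        else:
            masked.append(line.replace("*", replacement))
    return "\n".join(masked) + "\n"


example = ">Contig5745\nCLIF*RLLLIQMIHPQARRAFTFLQQQEPYRIQSMEQLSTLLWHLADLPALSHLSQSLIS\n"
print(mask_stops(example), end="")
```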

[jason at gordola crypto_intergenic]$ head -6 contig5745.water
Local: Contig5745 vs SW-CC27_YEAST
Score: 367.50

Contig5745      1        CLIF*RLLLIQMI.HPQARRAFTFLQQQEPYRIQSMEQLSTLLWH 44
                         ||:    |  ::| :  : : |  |:  :| |:: ||  ||||||
SW-CC27_YEAST   474      CLVQLGKLHFEIINYDMSLKYFNRLKDLQPARVKDMEIFSTLLWH 518

> I also guess you are misinterpreting the -seqall. This means to return
> all sequences from a file containing more than one sequence (like a
> fasta formated file with several sequences separated by theire
> description lines). For me the -seqall option does not make much sense
> in the case of alignment programs which need exactly 2 sequences to
> align.
> There you must always pass the two sequence files which you want align
> as arguments to the alignment program and each file must contain exactly
> one sequence.
>

In the alignment program context, -seqall is the name of the db to search
the query (-sequencea) against, so one will get an alignment of the first
sequence against the whole db of sequences.  I am only interested in 1
pairwise comparison, so the order of the sequences didn't really matter to
me.  We have a SW alignment module in bioperl (written in C, before you
gag) for protein alignments, but I was trying out our new EMBOSS wrappers in
bioperl, hence the reported issue.

> I hope this helps,
>
> David Bauer.
>
>
> Jason Stajich wrote:
> >
> > I noticed this in playing with our new bioperl wrappers for EMBOSS.
> > Apparently -seqall does not read sequences with stop codons.
> > I can submit as a bug if that is more appropriate.  Getting warmed up to
> > the EMBOSS dev process.
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


From gbottu at ben.vub.ac.be  Thu Dec 20 17:25:52 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Thu, 20 Dec 2001 18:25:52 +0100 (MET)
Subject: Farm files for databases, using SRS
Message-ID: <200112201725.SAA11576@bigben.vub.ac.be>

from : BEN

You can also, as Peter suggested, use SRS. For example, I wanted to access the 
databanks IMGT/LIGM and IMGT/MHC as one databank with the name imgt and the 
short name im. I use SRS for retrieving one or several sequences, possibly with 
their documentation, and direct access to the databank for a full search 
(faster?). I wrote: 

in .../emboss/share/EMBOSS/emboss.default :

DB imgt [ type: N comment: 'Immunogenetics Databases'
          methodquery: srs dbalias: IMGT formatquery: embl
          methodall: direct dir: /sw/emboss/DBlink file: 'I*' formatall: fasta
]
DB im   [ type: N comment: 'Immunogenetics Databases'
          methodquery: srs dbalias: IMGT formatquery: embl
          methodall: direct dir: /sw/emboss/DBlink file: 'I*' formatall: fasta
]

and in .../srs/icarus/site/site.i (hidden so that it does not show up in the WWW 
page of SRS) :

$imgt_db=$Library:[IMGT
  format:$EMBL_FORMAT
  virtualInfo:$LibVirtual:[
    memberLibs:{$IMGT_DB $MHC_DB}
  ]
  type:hidden
]

The directory /sw/emboss/DBlink contains these symbolic links:

Iligm -> /dbfb/imgt/ligm
Imhc -> /dbfb/imgt/mhc

	Guy Bottu


From ableasby at hgmp.mrc.ac.uk  Mon Dec 24 16:41:15 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Mon, 24 Dec 2001 16:41:15 GMT
Subject: EMBOSS 2.1.0 released
Message-ID: <200112241641.QAA23771@bromine.hgmp.mrc.ac.uk>

EMBOSS 2.1.0, coming some 6 months after the previous release, now includes
an alpha release of the client/server GUI called Jemboss, written at
the HGMP by Tim Carver. The complete package is available for download
from:
       http://www.uk.embnet.org/Software/EMBOSS

Several new applications are provided, including primer3. There has also
been considerable work done in the transition towards standard report
formats, and many applications now use these. Any alignment program
can use the -aformat qualifier to choose from a variety of standard outputs
(e.g. pair, markx0, markx1, srs). Reports for non-alignment programs
similarly use the -rformat qualifier. All have sensible defaults.
Reports will be further integrated throughout the EMBOSS version 2
distributions. 
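
As a small illustration of the -aformat qualifier, the sketch below builds
a command line for water (Python, for illustration only). It only
assembles and prints the arguments; it does not run EMBOSS. The helper
name water_command and the file names are hypothetical, and the format
list contains only the examples named above.

```python
# Formats named in this announcement; other formats may exist.
ANNOUNCED_FORMATS = ("pair", "markx0", "markx1", "srs")


def water_command(query_file, subject_file, aformat):
    """Build an argument list for water with an explicit -aformat value."""
    if aformat not in ANNOUNCED_FORMATS:
        raise ValueError("format not listed in the announcement: %s" % aformat)
    return ["water", query_file, subject_file, "-aformat", aformat]


print(" ".join(water_command("prot.seq", "jason.seq", "markx0")))
```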

EMBOSS will work as usual without Jemboss; however, if you wish to try
using Jemboss (server or client), see:

http://www.uk.embnet.org/Software/EMBOSS/Jemboss/download/setup.html


Alan