From fchetou at infobiogen.fr  Tue Jul  3 04:55:58 2001
From: fchetou at infobiogen.fr (Farid Chetouani)
Date: Tue, 03 Jul 2001 10:55:58 +0200
Subject: Protein Clustering tool
Message-ID: <3B41889E.3EDC8D1E@infobiogen.fr>

Hello,

I would like to know
whether there are plans in EMBOSS to develop
software to cluster proteins into families (of paralogues/orthologues)
according to sequence similarity.

Thank you for your help.

F

PS: please reply to my email fchetou at infobiogen.fr


From frank at bioss.sari.ac.uk  Tue Jul  3 05:18:20 2001
From: frank at bioss.sari.ac.uk (Frank Wright)
Date: Tue, 03 Jul 2001 10:18:20 +0100
Subject: Protein Clustering tool
References: <3B41889E.3EDC8D1E@infobiogen.fr>
Message-ID: <3B418DDC.F2004E00@bioss.sari.ac.uk>

Hi All,

  If you wish to construct phylogenetic trees (specifically gene trees)
from protein sequences so as to infer duplication and
paralogous/orthologous relationships, then you can use the PHYLIP
package (available as an EMBASSY application).  Genetic distances can be
calculated using EPROTDIST and the distance matrix created can be input
into either EFITCH (slower, more accurate tree) or ENEIGHBOR (faster,
more approximate clustering method, allowing the use of the
Neighbor-Joining algorithm, or the UPGMA algorithm - use the latter only
if you have previously tested that the "molecular clock" assumption is
valid for your dataset).
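
For readers unfamiliar with UPGMA, here is a minimal sketch in Python of the average-linkage idea behind it. It is not the ENEIGHBOR or PHYLIP implementation; the function name `upgma`, the sequence names and the distances are all made up for illustration, and the result is only the joining order (no branch lengths).

```python
from itertools import combinations

def upgma(names, dist):
    """Average-linkage (UPGMA) clustering.

    dist maps frozenset({a, b}) -> distance for every pair of names.
    Returns a nested tuple describing the order in which clusters join.
    """
    clusters = {name: 1 for name in names}   # cluster label -> number of leaves
    d = dict(dist)                           # working copy of the distances
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = clusters.pop(a), clusters.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[merged] = na + nb
    return next(iter(clusters))

# Hypothetical distances between four protein sequences.
pairs = {("A", "B"): 2, ("A", "C"): 8, ("A", "D"): 8,
         ("B", "C"): 8, ("B", "D"): 8, ("C", "D"): 4}
dist = {frozenset(k): v for k, v in pairs.items()}
tree = upgma(["A", "B", "C", "D"], dist)
print(tree)
```

Because UPGMA always joins the closest pair and averages distances, it implicitly assumes equal rates along all lineages, which is why the molecular clock caveat above matters.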

  EPROTDIST, EFITCH and ENEIGHBOR come from version 3.5 of the PHYLIP
package (http://evolution.genetics.washington.edu).  An alpha version of
PHYLIP 3.6 has recently been released.  PROTDIST 3.6 has improved
distances (it copes with among-site rate heterogeneity to give more
accurate genetic distances) and there are also improvements to
NEIGHBOR 3.6 (faster) and to FITCH 3.6.  I presume that PHYLIP 3.6 will
be available as an EMBASSY application once the developers are confident
that there are no serious bugs :-)

I hope that helps,
Best Wishes,
Frank 
-- 
Frank Wright
Biomathematics and Statistics Scotland, 
SCRI, DUNDEE DD2 5DA, Scotland
frank at bioss.sari.ac.uk


From fchetou at pasteur.fr  Tue Jul  3 05:38:29 2001
From: fchetou at pasteur.fr (Farid Chetouani)
Date: Tue, 3 Jul 2001 11:38:29 +0200
Subject: Protein Clustering tool
In-Reply-To: <3B418DDC.F2004E00@bioss.sari.ac.uk>; from frank@bioss.sari.ac.uk on Tue, Jul 03, 2001 at 10:18:20AM +0100
References: <3B41889E.3EDC8D1E@infobiogen.fr> <3B418DDC.F2004E00@bioss.sari.ac.uk>
Message-ID: <20010703113829.A38883@pasteur.fr>

Hello,

Firstly, Frank, thank you for your reply.
I am sorry that my first email was not precise enough.

In fact,
I was wondering whether EMBOSS plans to provide a free clustering tool
that takes a protein FASTA sequence file and produces
a list of protein families.

For instance, thanks to A. Enright & C. Ouzounis,
the GeneRage software is free for academic research
(http://www.ebi.ac.uk/research/cgg/services/rage/),
but the sources are not yet available.

best regards
thank you for your help
F

PS: please reply to my email, fchetou at infobiogen.fr

> 
>   If you wish to construct phylogenetic trees (specifically gene trees)
> from protein sequences so as to infer duplication and
> paralogous/orthologous relationships, then you can use the PHYLIP
> package (available as an EMBASSY application).  Genetic distances can be
> calculated using EPROTDIST and the distance matrix created can be input
> into either EFITCH (slower, more accurate tree) or ENEIGHBOR (faster,
> more approximate clustering method, allowing the use of the
> Neighbor-Joining algorithm, or the UPGMA algorithm - use the latter only
> if you have previously tested that the "molecular clock" assumption is
> valid for your dataset).
> 
>   ePROTDIST, eFITCH and eNEIGHBOR come from version 3.5 of the PHYLIP
> package (http://evolution.genetics.washington.edu).  PHYLIP 3.6 has
> recently been released (alpha version).  However, PROTDIST 3.6 has
> improved distances (copes with among-site rate heterogeneity to give
> more accurate genetic distances) and there are also improvements to
> NEIGHBOR 3.6 (faster) and to FITCH 3.6.  I presume that PHYLIP 3.6 will
> be available as an EMBASSY application once it is confident that there
> are no serious bugs :-)
> 
> I hope that helps,
> Best Wishes,
> Frank 
> -- 
> Frank Wright
> Biomathematics and Statistics Scotland, 
> SCRI, DUNDEE DD2 5DA, Scotland
> frank at bioss.sari.ac.uk


From jison at hgmp.mrc.ac.uk  Tue Jul  3 05:48:04 2001
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Tue, 03 Jul 2001 10:48:04 +0100
Subject: Protein Clustering tool
References: <3B41889E.3EDC8D1E@infobiogen.fr>
Message-ID: <3B4194D4.929C7A3D@hgmp.mrc.ac.uk>

Software to cluster protein sequences into families on
the basis of sequence relatedness is on my list of
jobs to do and should happen within the next 3 months.

I personally need something quite simple-minded. If
you have any specific requirements, let me know and
I can try to pull them into my design.

Cheers

J.

Farid Chetouani wrote:

> Bonjour
>
> I would like to know,
> if there is plan in Emboss to develop
> a software to cluster protein into families (of paralogues/orthologues)
> according to the sequence similarity
>
> thank you for your help
>
> F
>
> PS: please reply to my email fchetou at infobiogen.fr

--
Jon C. Ison, PhD
Bioinformatics Applications Group
UK MRC Human Genome Mapping Project Resource Centre
Hinxton, Cambridge, CB10 1SB, UK
E-mail : jison at hgmp.mrc.ac.uk
Tel    : 01223 49-4548
HGMP-RC: http://www.hgmp.mrc.ac.uk/
EMBOSS : http://www.hgmp.mrc.ac.uk/Software/EMBOSS/
CCP11  : http://www.hgmp.mrc.ac.uk/CCP11/


From gbottu at ben.vub.ac.be  Mon Jul  9 05:35:09 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 9 Jul 2001 11:35:09 +0200 (MET DST)
Subject: No subject
Message-ID: <200107090935.LAA09786@bigben.vub.ac.be>

	Dear friends,
	
I am puzzled by pscan outputs. I do not see the difference between "Not all 
elements match but those that do are in order" and "Remaining partial matches", 
since in both cases there are two matches with the same element. And, in 
general, how does pscan handle cases where the protein really contains the 
same motif several times (e.g. proteins with kringles)? Can Alan or someone 
else answer this?

	Regards,
	Guy Bottu
-------------- next part --------------


CLASS 1
Fingerprints with all elements in order


CLASS 2
All elements match but not all in the correct order

Fingerprint HTHREPRESSR Elements 2
    Accession number PR00031
    Lambda and other repressor helix-turn-helix signature
  Element 1 Threshold 50% Score 73%
             Start position 135 Length 10
  Element 2 Threshold 32% Score 32%
             Start position 74 Length 17


CLASS 3
Not all elements match but those that do are in order

Fingerprint GEMCOATBR1 Elements 7
    Accession number PR00225
    Geminivirus BR1 coat protein signature
  Element 3 Threshold 30% Score 37%
             Start position 281 Length 15
  Element 3 Threshold 30% Score 31%
             Start position 196 Length 15


CLASS 4
Remaining partial matches

Fingerprint GABAARBETA Elements 4
    Accession number PR01160
    Gamma-aminobutyric-acid A receptor beta subunit signature
  Element 1 Threshold 33% Score 34%
             Start position 275 Length 15
  Element 1 Threshold 33% Score 33%
             Start position 187 Length 15


From sgmd at genetik.fu-berlin.de  Tue Jul 10 04:36:25 2001
From: sgmd at genetik.fu-berlin.de (Thomas Siegmund)
Date: Tue, 10 Jul 2001 10:36:25 +0200
Subject: Announce: X GUI for EMBOSS V0.5
Message-ID: <20010710083627.D881617AD6@mercury.hgmp.mrc.ac.uk>

Dear all,

a few months ago I announced my plan to build an X Window GUI for EMBOSS based 
on Kaptain and QT/KDE. Today I'd like to inform you that I have made some 
progress with it. Version 0.5 of EMBOSS.kaptn is available at 
http://userpage.fu-berlin.de/~sgmd .

ChangeLog:
==========
Version 0.5
- Covering 50 EMBOSS applications with (almost) all options
- Integrated EMBOSS help system
- Use new regular expression features of Kaptain 0.6. This allows fallback
  to EMBOSS defaults, if text input fields for parameters like "-outfile"
  are empty.
- Files can be selected by drag & drop
- Addition of embosslauncher, a tool to set the working directory and to run
  different EMBOSS applications with the same sequence file
- Simple install script
Version 0.1
- First simple GUIs for 12 EMBOSS applications
- First public announcement at emboss at embnet.org

Please give it a try and let me know what you think.

With best regards

Thomas

-- 
Thomas Siegmund
Freie Universität Berlin
Institut für Genetik
Arnimallee 7
14195 Berlin
Germany
Tel: +49 30 838 54868
Fax: +49 30 838 54395
http://userpage.fu-berlin.de/~sgmd


From ableasby at hgmp.mrc.ac.uk  Sun Jul 15 08:51:31 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Sun, 15 Jul 2001 13:51:31 +0100 (BST)
Subject: Announcing EMBOSS 2.0.0
Message-ID: <200107151251.NAA11553@bromine.hgmp.mrc.ac.uk>

EMBOSS 2.0.0 includes:

1. Feature table reading: EMBL, Swissprot and PIR feature tables are
   handled by rewritten library routines. Tables can currently be
   read and written (or interconverted) in native or GFF formats.
   For the applications programmer an internal key/value pair
   structure greatly simplifies use.

2. Report Handling: Stub code to enable application output to be
   selected in (a range of) standard output report formats has
   been included. The feature tables above use one of these
   formats. More report formats will be added during the lifetime of
   the 2.x.x series. Release 3.0.0 of EMBOSS will mark the completion
   of this phase.

3. Code purification: All library code and applications are tested
   for memory handling before release. To our knowledge the code does
   not leak a single byte in normal use. A "purify" script is provided
   (mainly for developers).

4. Quality control: code has been written, supplied and used for
   testing code prior to release. This ensures that applications
   produce the same output (where appropriate) after changes to
   the library etc. A QA test script is provided.

5. Code modification: almost all the source code has been revamped
   since the 1.x.x series. All functions, including those in
   applications, have unique names. This now allows you to navigate
   the entire source code using SRS.

6. Protein structure code has been added and, although not yet
   complete, this marks one of many new directions for applications.

Not entirely by coincidence the release of 2.0.0, like 1.0.0, has
happened on St Swithin's Day (15th July) just prior to the ISMB
conference. So, if it works on that day it should work for 40 days
thereafter! We look forward to making the same joke again next year (1).

Alan

On behalf of the development team (apologies if your name has been
omitted by accident) who are:

HGMP: Alan Bleasby, Tim Carver, Jon Ison, Ranjeeva Ranasinghe,
      Gary Williams
Lion Bioscience: Peter Rice

Special thanks to David Martin (University of Dundee) for the
administration guide, and to Lisa Mullan (HGMP training courses) for
providing feedback and suggestions from course attendees.

Thanks to all who have made suggestions, provided bug reports or
contributed code. If we've failed to acknowledge you here you
should be there in the source code. If not, tell us and we'll fix
it!


Footnote:

1.
St. Swithin's Day if thou dost rain, 
For forty days it will remain; 
St. Swithin's Day if thou be fair, 
For forty days 'twill rain na mair.


From ableasby at hgmp.mrc.ac.uk  Mon Jul 16 06:55:04 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Mon, 16 Jul 2001 11:55:04 +0100 (BST)
Subject: EMBOSS 2.0.0 amendment
Message-ID: <200107161055.LAA13298@bromine.hgmp.mrc.ac.uk>

There was an omission in the original EMBOSS-2.0.0.tar.gz file
which would have resulted in 3 of the protein structure acd
files not being copied after a "make install". This has now
been corrected and a replacement file put on the server.

Alan


From dessen at infobiogen.fr  Mon Jul 16 08:25:38 2001
From: dessen at infobiogen.fr (Philippe Dessen)
Date: Mon, 16 Jul 2001 14:25:38 +0200
Subject: fuzznuc
Message-ID: <3B52DC18.3141B98D@infobiogen.fr>

Just a question about fuzznuc:
Is it possible to define a pattern with repetition of a motif (of n
letters with n>1)?
That is not mentioned in the documentation.

The following pattern (a stop codon in a coding frame) seems to be
illegal !
<(NNN)(0,)TGA(NNN)(1,)>

$ fuzznuc seqfile
Nucleic acid pattern search
Search pattern: <(NNN)(0,)TGA(NNN)(1,)>
Number of mismatches [0]:
Output file [rptufrpx.fuzznuc]:
   This is a warning: Illegal character [(]

   EMBOSS An error in fuzznuc.c at line 96:
Illegal pattern

--------
in GCG syntax  you can use  (NNN){1,}


Regards

Philippe Dessen


From gwilliam at hgmp.mrc.ac.uk  Mon Jul 16 08:33:33 2001
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Mon, 16 Jul 2001 13:33:33 +0100
Subject: fuzznuc
References: <3B52DC18.3141B98D@infobiogen.fr>
Message-ID: <3B52DF1D.4B3F2FCF@hgmp.mrc.ac.uk>

Philippe Dessen wrote:
> 
> Just a question about fuzznuc :
> Is it possible to define a pattern with repetition of a motif (as n
> letters with n>1) ?
> That is not mentioned in the documentation.
> 
> The following pattern (a stop codon in a coding frame) seems to be
> illegal !
> <(NNN)(0,)TGA(NNN)(1,)>
> 
> $ fuzznuc seqfile
> Nucleic acid pattern search
> Search pattern: <(NNN)(0,)TGA(NNN)(1,)>
> Number of mismatches [0]:
> Output file [rptufrpx.fuzznuc]:
>    This is a warning: Illegal character [(]
> 
>    EMBOSS An error in fuzznuc.c at line 96:
> Illegal pattern


I think this is illegal in fuzznuc's PROSITE-style of pattern.
You might like to try 'dreg' instead with a regular expression like:

^(...)*TGA(...)+$

Note that these regular expressions are case-sensitive, so put '-supper'
on your command line to force the sequence into the required upper case.
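
To see what the expression above accepts, here is a small sketch using Python's `re` module rather than dreg itself (dreg's regular expression library may differ in details). The helper name `has_in_frame_tga` is made up; the upper-casing step stands in for '-supper'.

```python
import re

# Same expression as suggested above: any number of whole codons,
# then TGA, then at least one more whole codon, anchored at both ends.
IN_FRAME_TGA = re.compile(r"^(...)*TGA(...)+$")

def has_in_frame_tga(seq):
    """Return True if seq has a TGA codon in frame 1 followed by at least one codon."""
    # The expression is case-sensitive, so force upper case first.
    return IN_FRAME_TGA.match(seq.upper()) is not None

print(has_in_frame_tga("atgtgaccc"))   # ATG TGA CCC
print(has_in_frame_tga("AATGACCC"))    # TGA present but out of frame
```

Note that the anchors mean the whole sequence must be a multiple of three in length for a match to be possible.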

Gary

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From kala at avesthagen.com  Mon Jul 23 02:43:47 2001
From: kala at avesthagen.com (Kala)
Date: Mon, 23 Jul 2001 12:13:47 +0530 (IST)
Subject: help pl.
Message-ID: <Pine.LNX.4.33.0107231213370.8793-100000@mail.avesthagen.com>

Hi all,
 Could you please tell me whether I can install EMBOSS on Tru64 UNIX?
 I'm unable to untar it. tar says it does "not look like a tar archive".

 It would be very useful to get a reply soon.

Thanks in advance,
kala


From bauer at genprofile.com  Mon Jul 23 03:06:57 2001
From: bauer at genprofile.com (David Bauer)
Date: Mon, 23 Jul 2001 09:06:57 +0200
Subject: help pl.
References: <Pine.LNX.4.33.0107231213370.8793-100000@mail.avesthagen.com>
Message-ID: <3B5BCD11.C2E54D9B@genprofile.com>

Kala wrote:
> 
> Hi all,
>  Cud u pl.tell me whether i can install Emboss on True64Unix...
>  I'm unable to untar it...It says "not look like a tar archive"...

The download file is a tar file which is compressed with gzip.
If you have GNU tar, use 'tar -xvzf <filename>'. The z option says it's a
compressed tar file.
If your system's tar does not know how to handle compressed archives
(like e.g. the Solaris tar), you must first run 'gunzip <filename>'.

I hope this helps,

Ciao, David.


From dmartin at bioinformatics.msiwtb.dundee.ac.uk  Mon Jul 23 04:52:14 2001
From: dmartin at bioinformatics.msiwtb.dundee.ac.uk (David Martin)
Date: Mon, 23 Jul 2001 09:52:14 +0100 (BST)
Subject: help pl.
In-Reply-To: <Pine.LNX.4.33.0107231213370.8793-100000@mail.avesthagen.com>
Message-ID: <Pine.LNX.4.33.0107230950100.29068-100000@bioinformatics.msiwtb.dundee.ac.uk>

On Mon, 23 Jul 2001, Kala wrote:

> Hi all,
>  Cud u pl.tell me whether i can install Emboss on True64Unix...
>  I'm unable to untar it...It says "not look like a tar archive"...

First ensure that it was transferred in binary mode.

Secondly, you will need to gunzip the archive before untarr'ing. Gnu tar
includes gunzip (tar zxf filename) whereas many vendor supplied versions
don't. The following command line may help

zcat filename | tar xf -
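
If a Python interpreter is at hand, the same two checks can be sketched as follows. This is only an illustration: the helper names `is_gzip` and `list_archive` are made up. It tests for the gzip magic bytes (so you can tell whether the file still needs gunzip) and lists the archive contents using the standard `tarfile` module, which detects compression itself.

```python
import tarfile

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip file

def is_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC

def list_archive(path):
    """Return the member names of a tar archive, compressed or not."""
    # Mode "r:*" lets tarfile work out the compression transparently.
    with tarfile.open(path, "r:*") as tar:
        return tar.getnames()
```

If `is_gzip` returns True but `list_archive` raises an error, the file may have been damaged in transfer (for example by an ASCII-mode FTP download).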

..d

>
>  It'll b very useful if i get a reply soon.
>
> thanx in adv.
> kala
>
>
>

----------------------------------
David Martin PhD
Bioinformatics Scientific Officer
Wellcome Trust Biocentre, Dundee
----------------------------------


From gbottu at ben.vub.ac.be  Wed Jul 25 14:31:27 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 25 Jul 2001 20:31:27 +0200 (MET DST)
Subject: tfscan blues
Message-ID: <200107251831.UAA17816@bigben.vub.ac.be>

from : BEN

	Dear colleagues,
	
I had already posted this question before, but nobody replied. The problem 
is that the value of the program tfscan is decreasing, since we cannot get 
updates of TRANSFAC anymore unless we pay for a licence, and I wonder whether 
EMBnet Nodes have the right to give their users access at all.
For info, see  http://www.biobase.de/academia.html
Does anybody have a comment?

	Guy Bottu


From c.plessy at mangoosta.net  Wed Jul 25 17:18:47 2001
From: c.plessy at mangoosta.net (Charles Plessy)
Date: Wed, 25 Jul 2001 23:18:47 +0200
Subject: tfscan blues
In-Reply-To: <200107251831.UAA17816@bigben.vub.ac.be>
References: <200107251831.UAA17816@bigben.vub.ac.be>
Message-ID: <01072523184702.02531@moulinette>

On Wednesday 25 July 2001 20:31, Guy Bottu wrote:
> from : BEN
>
> 	Dear colleagues,
>
> I had already posted this question before, but nobody had replied. The
> problem is that the value of the program tfscan is decreasing, since we
> cannot get updates of TRANSFAC anymore, unless we pay a licence, and I
> wonder whether at all EMBnet Nodes have the right to give access to their
> users.
> For info, see  http://www.biobase.de/academia.html
> Anybody a comment ?

I have a related question: do you think that it would be possible to build 
fake TRANSFAC databases from a simple file?
Currently I'm adding to an array (in the GCG findpatterns format) any 
binding site of interest that I find in the literature (with a name and 
a reference).
The goal would be to use existing programs to do searches within a set of 
home-selected transcription factors.

Charles PLESSY


From dmartin at bioinformatics.msiwtb.dundee.ac.uk  Thu Jul 26 03:55:54 2001
From: dmartin at bioinformatics.msiwtb.dundee.ac.uk (David Martin)
Date: Thu, 26 Jul 2001 08:55:54 +0100 (BST)
Subject: tfscan blues
In-Reply-To: <01072523184702.02531@moulinette>
Message-ID: <Pine.LNX.4.33.0107260854020.19723-100000@bioinformatics.msiwtb.dundee.ac.uk>

On Wed, 25 Jul 2001, Charles Plessy wrote:

> On Wednesday 25 July 2001 20:31, Guy Bottu wrote:
> > from : BEN
> >
> > 	Dear colleagues,
> >
> > I had already posted this question before, but nobody had replied. The
> > problem is that the value of the program tfscan is decreasing, since we
> > cannot get updates of TRANSFAC anymore, unless we pay a licence, and I
> > wonder whether at all EMBnet Nodes have the right to give access to their
> > users.
> > For info, see  http://www.biobase.de/academia.html
> > Anybody a comment ?
>
> I have a related question : do you think that it would be possible to build
> fake transfac databases from a simple file?
> Currently I'm adding into an array (in the GCG findpattern format) any
> binding site of my interest that i find in the literature. (with a name and
> a reference)
> The goal would be to use existing programs to do searches within a set of
> home-selected transcription factors.

It would be nice to have a public front end for such a database so that
submissions could be sent to a curator. Then we can return the information
to the public domain (all literature referenced of course so we cannot be
accused of stealing TRANSFAC).

..d


----------------------------------
David Martin PhD
Bioinformatics Scientific Officer
Wellcome Trust Biocentre, Dundee
----------------------------------


From charles at moulinette.dyndns.org  Thu Jul 26 18:41:20 2001
From: charles at moulinette.dyndns.org (Charles)
Date: Fri, 27 Jul 2001 00:41:20 +0200 (CEST)
Subject: tfscan blues
In-Reply-To: <Pine.LNX.4.33.0107260854020.19723-100000@bioinformatics.msiwtb.dundee.ac.uk>
Message-ID: <Pine.LNX.4.33.0107270018380.6343-100000@moulinette.dyndns.org>

> > I have a related question : do you think that it would be possible to build
> > fake transfac databases from a simple file?
> > Currently I'm adding into an array (in the GCG findpattern format) any
> > binding site of my interest that i find in the literature. (with a name and
> > a reference)
> > The goal would be to use existing programs to do searches within a set of
> > home-selected transcription factors.
>
> It would be nice to have a public front end for such a database so that
> submissions could be sent to a curator. Then we can return the information
> to the public domain (all literature referenced of course so we cannot be
> accused of stealing TRANSFAC).

Well, I bet that tools allowing public contribution could be a CGI form
or a CVS archive, but setting up those interfaces is far beyond my
capacities.

Currently I have between 40 and 50 entries, very focused on the molecular
biology of development in early vertebrate embryos. Some are complex and
others degenerate. So far I have not been able to get anything interesting
out of them using GCG findpatterns: I get either close to no sites, or
plenty if I allow mismatches.

I can easily imagine a way to store more information in a separate array,
and then build a findpatterns data file using a Perl script. But I'd like
to try some programs that give a score to the matches, in order to search
for high-complexity binding sites more efficiently.
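
As an illustration of the kind of scoring described above, here is a minimal sketch in Python. It is neither GCG findpatterns nor tfscan; the function name `scan`, the default threshold and the example site are assumptions made up for this sketch. Each window of the sequence is compared with an IUPAC pattern, and the score is simply the fraction of positions that match.

```python
# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def scan(seq, pattern, min_score=0.8):
    """Return (position, score) for each window scoring at least min_score.

    The score is the fraction of pattern positions matched by the window.
    Positions are 0-based offsets into seq.
    """
    seq, pattern = seq.upper(), pattern.upper()
    hits = []
    for i in range(len(seq) - len(pattern) + 1):
        window = seq[i:i + len(pattern)]
        matched = sum(base in IUPAC[code] for base, code in zip(window, pattern))
        score = matched / len(pattern)
        if score >= min_score:
            hits.append((i, score))
    return hits

# Hypothetical E-box-like site CANNTG searched in a short made-up sequence.
print(scan("GGCACGTGTT", "CANNTG", min_score=1.0))
```

A more useful score would weight specific positions (A, C, G, T) above degenerate ones such as N, so that complex sites are not swamped by low-information matches; the flat fraction used here is only the simplest starting point.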

Charles


From s.roehrig at xantos.de  Fri Jul 27 08:38:03 2001
From: s.roehrig at xantos.de (Roehrig, Sascha)
Date: Fri, 27 Jul 2001 14:38:03 +0200
Subject: coderet error with embl format file
Message-ID: <A31E2A68EB0ED411950A00D0B72601C10EE701@XAN-BDC>

Dear all,
 
I encountered an error while retrieving the feature table from the sample
embl database entry:
http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[EMBL-ID:'TRBG361'].
 
The cds and mRNA were shown correctly. However, the translation was missing
the first line of amino acids and ended with double quotes.
 
Has anybody else noticed the same?
 
Best wishes,
 
Sascha

From peter.rice at uk.lionbioscience.com  Fri Jul 27 08:50:28 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 27 Jul 2001 13:50:28 +0100
Subject: coderet error with embl format file
References: <A31E2A68EB0ED411950A00D0B72601C10EE701@XAN-BDC>
Message-ID: <3B616394.54F03A92@uk.lionbioscience.com>

"Roehrig, Sascha" wrote:

> I encountered an error while retrieving the feature table from the
>sample embl database entry:
>http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[EMBL-ID:'TRBG361'
> 
>The cds and mRNA were shown correctly. However, the translation was
>missing the first line of amino acids and ended with double quotes.

Works for me in 2.0.0

There were some feature handling code changes in 2.0.0 - perhaps you can
simply install the new version.

regards,

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From ableasby at hgmp.mrc.ac.uk  Sun Jul 29 19:54:37 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Mon, 30 Jul 2001 00:54:37 +0100 (BST)
Subject: EMBOSS 2.0.1 (& HMMER)
Message-ID: <200107292354.AAA28489@bromine.hgmp.mrc.ac.uk>

EMBOSS 2.0.1 fixes an indexing problem with DBIGCG and split entries.
It also incorporates handling of the Selex format as used in the
HMMER package.

HMMER 2.1.1 has been converted for EMBOSS and appears in the
download directory (ftp://ftp.uk.embnet.org/pub/EMBOSS/) as
the 'embassy' package HMMER-2.1.1.tar.gz

Alan


From lukem at bioinfo.pbi.nrc.ca  Mon Jul 30 00:09:24 2001
From: lukem at bioinfo.pbi.nrc.ca (Luke McCarthy)
Date: Sun, 29 Jul 2001 22:09:24 -0600
Subject: EMBOSS GUI
Message-ID: <5.1.0.14.0.20010729190020.00a13a10@bioinfo.pbi.nrc.ca>

Hi everybody,

On and off over the past year or so, I've been developing a GUI for the 
EMBOSS tools designed to operate over the web.  It's been listed at the 
EMBOSS web site for most of that time, but in the last month I've 
significantly improved it to the point where I think it could be very 
useful to the entire EMBOSS user community.  But I'd like a little help to 
that end.

Before I release this interface out into the wild, I'd like it to be as 
polished as possible.  I don't have a lot of time right now to do extensive 
testing of any kind ('testing' to this date has only involved one trial run 
of each application), plus I don't actually use many of these tools in 
practice, so I don't know if they're actually useful the way they're 
currently presented.  So I'd like to solicit your assistance as EMBOSS users:

If you could find the time to drop by 
http://bioinfo.pbi.nrc.ca:8090/EMBOSS/index.html , try out your favourite 
EMBOSS tool, offer any suggestion or criticism that comes to mind, and 
definitely let me know if something doesn't work, I would be very 
grateful.  Criticism of the look and feel (colours, font size, general 
appearance) is appreciated, but all of that is eminently configurable 
through the use of style sheets (which means if you're using an older 
browser, this probably won't work very well for you), so it's not really 
too helpful.  What I'm really looking for are places where the interface is 
awkward or difficult to understand.  And support for the frame/page 
groupings that EMBOSS 2.0.0 allows will be coming in the next couple of 
days, before anyone suggests that.

About the interface itself: the scripts build the input collection pages on 
the fly, reading relevant information from the ACD files (incidentally, in 
the process of building this interface I've written an ACD->XML converter 
if anyone would find that useful ;)  Because of this, it's remarkably 
robust to changes in the tools themselves.  Even the menu is generated 
dynamically, so only those tools which are available on your system will be 
listed (for example, if you haven't installed the EMBASSY stuff it won't 
show up...)  You can also have the script dump all of the input collection 
pages and the menus to static HTML files if you're expecting heavy traffic 
and don't want to waste system resources...

Anyway, that's my story.  I strongly urge anyone who thinks a GUI for the 
EMBOSS tools would be a useful thing to drop by and help me make this one 
all that it can be.  Any questions can be directed to me personally if you 
don't want to clutter up the list.

Cheers,

Luke McCarthy
Bioinformatics Group,
Plant Biotechnology Institute,
National Research Council of Canada
lukem at bioinfo.pbi.nrc.ca


From bauer at genprofile.com  Tue Jul 31 04:03:02 2001
From: bauer at genprofile.com (David Bauer)
Date: Tue, 31 Jul 2001 10:03:02 +0200
Subject: showfeat overlaping CDS
Message-ID: <3B666636.EEFD11CF@genprofile.com>

Hi,

I have an EMBL file with 2 CDS entries which stand for alternatively
spliced products. I would like to display only one of them at a time
with showfeat.
Both have a /gene and /label with the gene name (e.g. gene1 gene2). 
So what I thought was to use:
-matchtype=cds -matchtag=label -matchvalue=gene1
The matchtype works as I expect but with any kind of matchtag or
matchvalue I'm getting core dumps.
So what's wrong with the above example ?

Also if I use -tags with a spliced CDS the tags are displayed only with
the first exon, all other exons get just a CDS so it is not visible
which of the remaining exons belongs to which of the genes.

Thanks, David.
-- 
Dr. David Bauer
GenProfile AG, Max-Delbrueck-Center, Erwin-Negelein-Haus 
Robert-Roessle-Str. 10, D-13125 Berlin, Germany
bauer at genprofile.com, Tel:49-30-94892165, FAX:49-30-94892151


From peter.rice at uk.lionbioscience.com  Tue Jul 31 04:48:14 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 31 Jul 2001 09:48:14 +0100
Subject: showfeat overlaping CDS
References: <3B666636.EEFD11CF@genprofile.com>
Message-ID: <3B6670CE.A7BBDEFA@uk.lionbioscience.com>

David Bauer wrote:

> I have an EMBL file with 2 CDS entries which stand for alternatively
> spliced products. I would like to display only one of them at a time
> with showfeat.
> Both have a /gene and /label with the gene name (e.g. gene1 gene2).
> So what I thought was to use:
> -matchtype=cds -matchtag=label -matchvalue=gene1
> The matchtype works as I expect but with any kind of matchtag or
> matchvalue I'm getting core dumps.

Works for me with 2.0.1, but purify complains horribly - most likely the
same problem. We will fix it and add these command line options to the new
test set.

> Also if I use -tags with a spliced CDS the tags are displayed only with
> the first exon, all other exons get just a CDS so it is not visible
> which of the remaining exons belongs to which of the genes.

Internally the tags are stored with the first exon. They include an
implicit group tag that can be displayed with the other exons. Is that what
you need?

If you print out a feature table in GFF format (with seqretallfeat) what
you see is pretty much what is stored internally. The Sequence and
FeatFlags information is part of the feature data, rather than part of the
tag-value list, and is used for keeping multiple exons together. For
example:

seqretallfeat tembl:hsegl1 gff::hsegl1.gff

We could probably add the Sequence tag to the showfeat output (although it
is not part of the EMBL feature table) or we could duplicate all the tags
if that's what users would prefer.

A short example from the test data set would be:

showfeat tembl:hsegl1 -tags


-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From bauer at genprofile.com  Tue Jul 31 06:54:33 2001
From: bauer at genprofile.com (David Bauer)
Date: Tue, 31 Jul 2001 12:54:33 +0200
Subject: showfeat overlaping CDS
References: <3B666636.EEFD11CF@genprofile.com> <3B6670CE.A7BBDEFA@uk.lionbioscience.com>
Message-ID: <3B668E69.C7FD3FF7@genprofile.com>

Peter Rice wrote:
> Internally the tags are stored with the first exon. They include an
> implicit group tag that can be displayed with the other exons. Is that what
> you need?

Yes, this would be nice. It would be clear which exon belongs to which
splice variant.
The feature display in showseq is similar.

> If you print out a feature table in GFF format (with seqretallfeat) what
> you see is pretty much what is stored internally. The Sequence and
> FeatFlags information is part of the feature data, rather than part of the
> tag-value list, and is used for keeping multiple exons together. For

What I get is a FeatFlags "0x100" for the first exon and a FeatFlags
"0x104" for the consecutive exons. The flags are the same for both CDS.
But the Sequence has a .## which differs between the two CDS. I think it
would be enough if showfeat (and showseq) showed the complete tags with
the first exon and just the Sequence tag with the remaining exons when
the -tags option is used.
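
The grouping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the GFF lines are hypothetical (tab-separated, with `Sequence "..."` and `FeatFlags "..."` in the ninth column, loosely modelled on the description in this thread rather than copied from real seqretallfeat output), and the function name `group_exons` is made up.

```python
import re
from collections import defaultdict

# Matches tag/value pairs such as: Sequence "X.1"
TAG = re.compile(r'(\w+)\s+"([^"]*)"')

def group_exons(gff_lines):
    """Group features by their Sequence tag.

    Returns {sequence_tag: [(start, end, featflags), ...]}.
    """
    groups = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue                       # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue                       # not a full feature line
        tags = dict(TAG.findall(fields[8]))
        groups[tags.get("Sequence", "?")].append(
            (int(fields[3]), int(fields[4]), tags.get("FeatFlags"))
        )
    return dict(groups)

# Hypothetical feature lines for two alternatively spliced CDS.
example = [
    'X\tEMBL\tCDS\t10\t50\t0.0\t+\t.\tSequence "X.1" ; FeatFlags "0x100"',
    'X\tEMBL\tCDS\t80\t120\t0.0\t+\t.\tSequence "X.1" ; FeatFlags "0x104"',
    'X\tEMBL\tCDS\t10\t50\t0.0\t+\t.\tSequence "X.2" ; FeatFlags "0x100"',
    'X\tEMBL\tCDS\t200\t260\t0.0\t+\t.\tSequence "X.2" ; FeatFlags "0x104"',
]
groups = group_exons(example)
for name, exons in groups.items():
    print(name, exons)
```

With the Sequence tag as the key, each exon can be assigned to its splice variant even though the other tags are stored only with the first exon.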

> We could probably add the Sequence tag to the showfeat output (although it
> is not part of the EMBL feature table) or we could duplicate all the tags
> if that's what users would prefer.

I think duplication of all tags is not necessary, the Sequence tag is
sufficient.

David.
-- 
Dr. David Bauer
GenProfile AG, Max-Delbrueck-Center, Erwin-Negelein-Haus 
Robert-Roessle-Str. 10, D-13125 Berlin, Germany
bauer at genprofile.com, Tel:49-30-94892165, FAX:49-30-94892151


From fchetou at infobiogen.fr  Tue Jul  3 08:55:58 2001
From: fchetou at infobiogen.fr (Farid Chetouani)
Date: Tue, 03 Jul 2001 10:55:58 +0200
Subject: Protein Clustering tool
Message-ID: <3B41889E.3EDC8D1E@infobiogen.fr>

Hello,

I would like to know whether there are plans in EMBOSS to develop
software to cluster proteins into families (of paralogues/orthologues)
according to sequence similarity.

thank you for your help

F

PS: please reply to my email fchetou at infobiogen.fr


From frank at bioss.sari.ac.uk  Tue Jul  3 09:18:20 2001
From: frank at bioss.sari.ac.uk (Frank Wright)
Date: Tue, 03 Jul 2001 10:18:20 +0100
Subject: Protein Clustering tool
References: <3B41889E.3EDC8D1E@infobiogen.fr>
Message-ID: <3B418DDC.F2004E00@bioss.sari.ac.uk>

Hi All,

  If you wish to construct phylogenetic trees (specifically gene trees)
from protein sequences so as to infer duplication and
paralogous/orthologous relationships, then you can use the PHYLIP
package (available as an EMBASSY application).  Genetic distances can be
calculated using EPROTDIST and the distance matrix created can be input
into either EFITCH (slower, more accurate tree) or ENEIGHBOR (faster,
more approximate clustering method, allowing the use of the
Neighbor-Joining algorithm, or the UPGMA algorithm - use the latter only
if you have previously tested that the "molecular clock" assumption is
valid for your dataset).
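
To make the distance-matrix clustering step concrete, here is a minimal
Python sketch of UPGMA on a small pairwise distance table. It is only an
illustration of the idea, not the code used by NEIGHBOR/ENEIGHBOR; the
function name, the input layout (a dict keyed by frozenset pairs) and the
nested-tuple tree are all assumptions made for this example.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster labels by UPGMA and return the tree as nested tuples.

    names: list of sequence labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    (Hypothetical input layout, chosen for this sketch only.)
    """
    # Map each current cluster to the number of sequences it contains.
    clusters = {name: 1 for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to the merged cluster is the size-weighted average,
        # which is what implicitly assumes a molecular clock.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]
    table = {
        frozenset(("A", "B")): 2, frozenset(("A", "C")): 6,
        frozenset(("A", "D")): 10, frozenset(("B", "C")): 6,
        frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
    }
    print(upgma(labels, table))
```

Neighbor-Joining differs in how it picks the pair to join (it corrects for
unequal rates), which is why it does not need the clock assumption.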

  ePROTDIST, eFITCH and eNEIGHBOR come from version 3.5 of the PHYLIP
package (http://evolution.genetics.washington.edu).  PHYLIP 3.6 has
recently been released (alpha version).  However, PROTDIST 3.6 has
improved distances (copes with among-site rate heterogeneity to give
more accurate genetic distances) and there are also improvements to
NEIGHBOR 3.6 (faster) and to FITCH 3.6.  I presume that PHYLIP 3.6 will
be available as an EMBASSY application once it is confident that there
are no serious bugs :-)

I hope that helps,
Best Wishes,
Frank 
-- 
Frank Wright
Biomathematics and Statistics Scotland, 
SCRI, DUNDEE DD2 5DA, Scotland
frank at bioss.sari.ac.uk


From fchetou at pasteur.fr  Tue Jul  3 09:38:29 2001
From: fchetou at pasteur.fr (Farid Chetouani)
Date: Tue, 3 Jul 2001 11:38:29 +0200
Subject: Protein Clustering tool
In-Reply-To: <3B418DDC.F2004E00@bioss.sari.ac.uk>; from frank@bioss.sari.ac.uk on Tue, Jul 03, 2001 at 10:18:20AM +0100
References: <3B41889E.3EDC8D1E@infobiogen.fr> <3B418DDC.F2004E00@bioss.sari.ac.uk>
Message-ID: <20010703113829.A38883@pasteur.fr>

Hello,

Firstly, Frank, thank you for your reply.
I am sorry my first email was not precise enough.

In fact,
I was wondering whether EMBOSS plans to provide a free clustering tool
that takes a protein FASTA sequence file and produces
a list of protein families.

For instance, thanks to A. Enright & C. Ouzounis, the
GeneRage software is free for academic research
(http://www.ebi.ac.uk/research/cgg/services/rage/),
but the sources are not yet available.

best regards
thank you for your help
F

PS: please reply to my email, fchetou at infobiogen.fr

> 
>   If you wish to construct phylogenetic trees (specifically gene trees)
> from protein sequences so as to infer duplication and
> paralogous/orthologous relationships, then you can use the PHYLIP
> package (available as an EMBASSY application).  Genetic distances can be
> calculated using EPROTDIST and the distance matrix created can be input
> into either EFITCH (slower, more accurate tree) or ENEIGHBOR (faster,
> more approximate clustering method, allowing the use of the
> Neighbor-Joining algorithm, or the UPGMA algorithm - use the latter only
> if you have previously tested that the "molecular clock" assumption is
> valid for your dataset).
> 
>   ePROTDIST, eFITCH and eNEIGHBOR come from version 3.5 of the PHYLIP
> package (http://evolution.genetics.washington.edu).  PHYLIP 3.6 has
> recently been released (alpha version).  However, PROTDIST 3.6 has
> improved distances (copes with among-site rate heterogeneity to give
> more accurate genetic distances) and there are also improvements to
> NEIGHBOR 3.6 (faster) and to FITCH 3.6.  I presume that PHYLIP 3.6 will
> be available as an EMBASSY application once it is confident that there
> are no serious bugs :-)
> 
> I hope that helps,
> Best Wishes,
> Frank 
> -- 
> Frank Wright
> Biomathematics and Statistics Scotland, 
> SCRI, DUNDEE DD2 5DA, Scotland
> frank at bioss.sari.ac.uk


From jison at hgmp.mrc.ac.uk  Tue Jul  3 09:48:04 2001
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Tue, 03 Jul 2001 10:48:04 +0100
Subject: Protein Clustering tool
References: <3B41889E.3EDC8D1E@infobiogen.fr>
Message-ID: <3B4194D4.929C7A3D@hgmp.mrc.ac.uk>

Software to cluster protein sequences into families on
the basis of sequence relatedness is on my list of
jobs to do and should happen within the next 3 months.

I personally need something quite simple-minded. If
you have any specific requirements, let me know and
I can try to work them into my design.

Cheers

J.

Farid Chetouani wrote:

> Bonjour
>
> I would like to know,
> if there is plan in Emboss to develop
> a software to cluster protein into families (of paralogues/orthologues)
> according to the sequence similarity
>
> thank you for your help
>
> F
>
> PS: please reply to my email fchetou at infobiogen.fr

--
Jon C. Ison, PhD
Bioinformatics Applications Group
UK MRC Human Genome Mapping Project Resource Centre
Hinxton, Cambridge, CB10 1SB, UK
E-mail : jison at hgmp.mrc.ac.uk
Tel    : 01223 49-4548
HGMP-RC: http://www.hgmp.mrc.ac.uk/
EMBOSS : http://www.hgmp.mrc.ac.uk/Software/EMBOSS/
CCP11  : http://www.hgmp.mrc.ac.uk/CCP11/


From gbottu at ben.vub.ac.be  Mon Jul  9 09:35:09 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 9 Jul 2001 11:35:09 +0200 (MET DST)
Subject: No subject
Message-ID: <200107090935.LAA09786@bigben.vub.ac.be>

	Dear friends,
	
I am puzzled by pscan outputs. I do not see the difference between "Not all 
elements match but those that do are in order" and "Remaining partial matches", 
since in both cases there are two matches with the same element. And, in 
general, how does pscan handle cases where the protein really contains the 
same motif several times (e.g. proteins with kringles)? Can Alan or someone 
else answer this?

	Regards,
	Guy Bottu
-------------- next part --------------


CLASS 1
Fingerprints with all elements in order


CLASS 2
All elements match but not all in the correct order

Fingerprint HTHREPRESSR Elements 2
    Accession number PR00031
    Lambda and other repressor helix-turn-helix signature
  Element 1 Threshold 50% Score 73%
             Start position 135 Length 10
  Element 2 Threshold 32% Score 32%
             Start position 74 Length 17


CLASS 3
Not all elements match but those that do are in order

Fingerprint GEMCOATBR1 Elements 7
    Accession number PR00225
    Geminivirus BR1 coat protein signature
  Element 3 Threshold 30% Score 37%
             Start position 281 Length 15
  Element 3 Threshold 30% Score 31%
             Start position 196 Length 15


CLASS 4
Remaining partial matches

Fingerprint GABAARBETA Elements 4
    Accession number PR01160
    Gamma-aminobutyric-acid A receptor beta subunit signature
  Element 1 Threshold 33% Score 34%
             Start position 275 Length 15
  Element 1 Threshold 33% Score 33%
             Start position 187 Length 15


From sgmd at genetik.fu-berlin.de  Tue Jul 10 08:36:25 2001
From: sgmd at genetik.fu-berlin.de (Thomas Siegmund)
Date: Tue, 10 Jul 2001 10:36:25 +0200
Subject: Announce: X GUI for EMBOSS V0.5
Message-ID: <20010710083627.D881617AD6@mercury.hgmp.mrc.ac.uk>

Dear all,

A few months ago I announced my plan to build an X Window GUI for EMBOSS based 
on Kaptain and QT/KDE. Today I'd like to inform you that I have made some 
progress with it. Version 0.5 of EMBOSS.kaptn is available at 
http://userpage.fu-berlin.de/~sgmd .

ChangeLog:
==========
Version 0.5
- Covering 50 EMBOSS applications with (almost) all options
- Integrated EMBOSS help system
- Use new regexpression features of Kaptain 0.6. This allows fallback
  to EMBOSS defaults, if text input fields for parameters like "-outfile"
  are empty.
- Files can be selected by drag & drop
- Addition of embosslauncher, a tool to set the working directory and to run
  different EMBOSS applications with the same sequence file
- Simple install script
Version 0.1
- First simple GUIs for 12 EMBOSS applications
- First public announcement at emboss at embnet.org

Please give it a try and let me know what you think.

With best regards

Thomas

-- 
Thomas Siegmund
Freie Universität Berlin
Institut für Genetik
Arnimallee 7
14195 Berlin
Germany
Tel: +49 30 838 54868
Fax: +49 30 838 54395
http://userpage.fu-berlin.de/~sgmd


From ableasby at hgmp.mrc.ac.uk  Sun Jul 15 12:51:31 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Sun, 15 Jul 2001 13:51:31 +0100 (BST)
Subject: Announcing EMBOSS 2.0.0
Message-ID: <200107151251.NAA11553@bromine.hgmp.mrc.ac.uk>

EMBOSS 2.0.0 includes:

1. Feature table reading: EMBL, Swissprot and PIR feature tables are
   handled by rewritten library routines. Tables can currently be
   read and written (or interconverted) in native or GFF formats.
   For the applications programmer an internal key/value pair
   structure greatly simplifies use.

2. Report Handling: Stub code to enable application output to be
   selected in (a range of) standard output report formats has
   been included. The feature tables above use one of these
   formats. More report formats will be added during the lifetime of
   the 2.x.x series. Release 3.0.0 of EMBOSS will mark the completion
   of this phase.

3. Code purification: All library code and applications are tested
   for memory handling before release. To our knowledge the code does
   not leak a single byte in normal use. A "purify" script is provided
   (mainly for developers).

4. Quality control: code has been written, supplied and used for
   testing code prior to release. This ensures that applications
   produce the same output (where appropriate) after changes to
   the library etc. A QA test script is provided.

5. Code modification: almost all the source code has been revamped
   since the 1.x.x series. All functions, including those in
   applications, have unique names. This now allows you to navigate
   the entire source code using SRS.

6. Protein structure code has been added and, although not yet
   complete, this marks one of many new directions for applications.

Not entirely by coincidence the release of 2.0.0, like 1.0.0, has
happened on St Swithin's Day (15th July) just prior to the ISMB
conference. So, if it works on that day it should work for 40 days
thereafter! We look forward to making the same joke again next year (1).

Alan

On behalf of the development team (apologies if your name has been
omitted by accident) who are:

HGMP: Alan Bleasby, Tim Carver, Jon Ison, Ranjeeva Ranasinghe,
      Gary Williams
Lion Bioscience: Peter Rice

Special thanks to David Martin (University of Dundee) for the
administration guide, and to Lisa Mullan (HGMP training courses) for
providing feedback and suggestions from course attendees.

Thanks to all who have made suggestions, provided bug reports or
contributed code. If we've failed to acknowledge you here you
should be there in the source code. If not, tell us and we'll fix
it!


Footnote:

1.
St. Swithin's Day if thou dost rain, 
For forty days it will remain; 
St. Swithin's Day if thou be fair, 
For forty days 'twill rain na mair.


From ableasby at hgmp.mrc.ac.uk  Mon Jul 16 10:55:04 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Mon, 16 Jul 2001 11:55:04 +0100 (BST)
Subject: EMBOSS 2.0.0 amendment
Message-ID: <200107161055.LAA13298@bromine.hgmp.mrc.ac.uk>

There was an omission in the original EMBOSS-2.0.0.tar.gz file
which would have resulted in 3 of the protein structure acd
files not being copied after a "make install". This has now
been corrected and a replacement file put on the server.

Alan


From dessen at infobiogen.fr  Mon Jul 16 12:25:38 2001
From: dessen at infobiogen.fr (Philippe Dessen)
Date: Mon, 16 Jul 2001 14:25:38 +0200
Subject: fuzznuc
Message-ID: <3B52DC18.3141B98D@infobiogen.fr>

Just a question about fuzznuc:
Is it possible to define a pattern with repetition of a motif (of n
letters with n>1)?
That is not mentioned in the documentation.

The following pattern (a stop codon in a coding frame) seems to be
illegal !
<(NNN)(0,)TGA(NNN)(1,)>

$ fuzznuc seqfile
Nucleic acid pattern search
Search pattern: <(NNN)(0,)TGA(NNN)(1,)>
Number of mismatches [0]:
Output file [rptufrpx.fuzznuc]:
   This is a warning: Illegal character [(]

   EMBOSS An error in fuzznuc.c at line 96:
Illegal pattern

--------
in GCG syntax  you can use  (NNN){1,}


Regards

Philippe Dessen


From gwilliam at hgmp.mrc.ac.uk  Mon Jul 16 12:33:33 2001
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Mon, 16 Jul 2001 13:33:33 +0100
Subject: fuzznuc
References: <3B52DC18.3141B98D@infobiogen.fr>
Message-ID: <3B52DF1D.4B3F2FCF@hgmp.mrc.ac.uk>

Philippe Dessen wrote:
> 
> Just a question about fuzznuc :
> Is it possible to define a pattern with repetition of a motif (as n
> letters with n>1) ?
> That is not mentioned in documentation .
> 
> The following pattern (a stop codon in a coding frame) seems to be
> illegal !
> <(NNN)(0,)TGA(NNN)(1,)>
> 
> $ fuzznuc seqfile
> Nucleic acid pattern search
> Search pattern: <(NNN)(0,)TGA(NNN)(1,)>
> Number of mismatches [0]:
> Output file [rptufrpx.fuzznuc]:
>    This is a warning: Illegal character [(]
> 
>    EMBOSS An error in fuzznuc.c at line 96:
> Illegal pattern


I think this is illegal in fuzznuc's PROSITE-style of pattern.
You might like to try 'dreg' instead with a regular expression like:

^(...)*TGA(...)+$

Note that these regular expressions are case-sensitive, so put '-supper'
on your command line to force the sequence into the required upper case.
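
As a quick way to see what that expression accepts, here is a small sketch
using Python's re module as a stand-in (an assumption for illustration; dreg
uses its own regular expression library, so details may differ). The helper
name is hypothetical. Upper-casing the input plays the role of '-supper'.

```python
import re

# Pattern from the message: any number of whole codons, an in-frame TGA,
# then at least one more whole codon up to the end of the sequence.
IN_FRAME_TGA = re.compile(r"^(...)*TGA(...)+$")


def has_in_frame_tga(seq: str) -> bool:
    """Return True if seq has a TGA in frame 1 followed by >= 1 codon."""
    # The match is case-sensitive, so force upper case first.
    return IN_FRAME_TGA.match(seq.upper()) is not None


if __name__ == "__main__":
    for s in ("ATGTGACCC", "atgtgaccc", "ATGATGACC", "ATGTGA"):
        print(s, has_in_frame_tga(s))
```

Note that the anchors make this an all-or-nothing test on the whole
sequence; it reports whether such a stop codon exists, not where it is.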

Gary

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


From kala at avesthagen.com  Mon Jul 23 06:43:47 2001
From: kala at avesthagen.com (Kala)
Date: Mon, 23 Jul 2001 12:13:47 +0530 (IST)
Subject: help pl.
Message-ID: <Pine.LNX.4.33.0107231213370.8793-100000@mail.avesthagen.com>

Hi all,
 Could you please tell me whether I can install EMBOSS on Tru64 UNIX?
 I'm unable to untar it. It says "not look like a tar archive"...

 It would be very useful if I got a reply soon.

Thanks in advance,
kala


From bauer at genprofile.com  Mon Jul 23 07:06:57 2001
From: bauer at genprofile.com (David Bauer)
Date: Mon, 23 Jul 2001 09:06:57 +0200
Subject: help pl.
References: <Pine.LNX.4.33.0107231213370.8793-100000@mail.avesthagen.com>
Message-ID: <3B5BCD11.C2E54D9B@genprofile.com>

Kala wrote:
> 
> Hi all,
>  Cud u pl.tell me whether i can install Emboss on True64Unix...
>  I'm unable to untar it...It says "not look like a tar archive"...

The download file is a tar file which is compressed with gzip.
If you have GNU tar, use 'tar -xvzf <filename>'. The z option says it's a
compressed tar file.
If your system's tar does not know how to handle compressed archives
(like e.g. the Solaris tar), you must first run 'gunzip <filename>'.

I hope this helps,

Ciao, David.


From dmartin at bioinformatics.msiwtb.dundee.ac.uk  Mon Jul 23 08:52:14 2001
From: dmartin at bioinformatics.msiwtb.dundee.ac.uk (David Martin)
Date: Mon, 23 Jul 2001 09:52:14 +0100 (BST)
Subject: help pl.
In-Reply-To: <Pine.LNX.4.33.0107231213370.8793-100000@mail.avesthagen.com>
Message-ID: <Pine.LNX.4.33.0107230950100.29068-100000@bioinformatics.msiwtb.dundee.ac.uk>

On Mon, 23 Jul 2001, Kala wrote:

> Hi all,
>  Cud u pl.tell me whether i can install Emboss on True64Unix...
>  I'm unable to untar it...It says "not look like a tar archive"...

First ensure that it was transferred in binary mode.

Secondly, you will need to gunzip the archive before untarring. GNU tar
includes gunzip (tar zxf filename) whereas many vendor-supplied versions
don't. The following command line may help:

zcat filename | tar xf -
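
If neither GNU tar nor zcat is to hand, the same two checks can be sketched
in Python, assuming a Python interpreter is available on the machine. The
function name and the error message are made up for this example. It first
looks for the gzip magic bytes (a file mangled by an ASCII-mode transfer will
usually fail here or during decompression) and then unpacks the archive.

```python
import os
import tarfile


def unpack_tar_gz(path, dest):
    """Check that path looks like gzip data, then extract it into dest.

    Returns the list of member names. (Hypothetical helper for illustration.)
    """
    with open(path, "rb") as handle:
        magic = handle.read(2)
    # Every gzip stream starts with the two bytes 0x1f 0x8b.
    if magic != b"\x1f\x8b":
        raise ValueError(f"{path} is not gzip data; was it fetched in binary mode?")
    os.makedirs(dest, exist_ok=True)
    # "r:gz" decompresses and reads the tar stream in one step.
    with tarfile.open(path, mode="r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    return names
```

A call would look like unpack_tar_gz("EMBOSS-2.0.1.tar.gz", "src"), where
both arguments are placeholders for the real file and target directory.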

..d

>
>  It'll b very useful if i get a reply soon.
>
> thanx in adv.
> kala
>
>
>

----------------------------------
David Martin PhD
Bioinformatics Scientific Officer
Wellcome Trust Biocentre, Dundee
----------------------------------


From gbottu at ben.vub.ac.be  Wed Jul 25 18:31:27 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 25 Jul 2001 20:31:27 +0200 (MET DST)
Subject: tfscan blues
Message-ID: <200107251831.UAA17816@bigben.vub.ac.be>

from : BEN

	Dear colleagues,
	
I had already posted this question before, but nobody replied. The problem 
is that the value of the program tfscan is decreasing, since we cannot get 
updates of TRANSFAC anymore unless we pay for a licence, and I wonder whether 
EMBnet Nodes have the right to give access to their users at all.
For info, see  http://www.biobase.de/academia.html
Does anybody have a comment?

	Guy Bottu


From c.plessy at mangoosta.net  Wed Jul 25 21:18:47 2001
From: c.plessy at mangoosta.net (Charles Plessy)
Date: Wed, 25 Jul 2001 23:18:47 +0200
Subject: tfscan blues
In-Reply-To: <200107251831.UAA17816@bigben.vub.ac.be>
References: <200107251831.UAA17816@bigben.vub.ac.be>
Message-ID: <01072523184702.02531@moulinette>

On Wednesday 25 July 2001 20:31, Guy Bottu wrote:
> from : BEN
>
> 	Dear colleagues,
>
> I had already posted this question before, but nobody had replied. The
> problem is that the value of the program tfscan is decreasing, since we
> cannot get updates of TRANSFAC anymore, unless we pay a licence, and I
> wonder whether at all EMBnet Nodes have the right to give access to their
> users.
> For info, see  http://www.biobase.de/academia.html
> Anybody a comment ?

I have a related question: do you think that it would be possible to build 
fake TRANSFAC databases from a simple file?
Currently I'm adding into an array (in the GCG findpattern format) any 
binding site of interest that I find in the literature (with a name and 
a reference).
The goal would be to use existing programs to do searches within a set of 
home-selected transcription factors.

Charles PLESSY


From dmartin at bioinformatics.msiwtb.dundee.ac.uk  Thu Jul 26 07:55:54 2001
From: dmartin at bioinformatics.msiwtb.dundee.ac.uk (David Martin)
Date: Thu, 26 Jul 2001 08:55:54 +0100 (BST)
Subject: tfscan blues
In-Reply-To: <01072523184702.02531@moulinette>
Message-ID: <Pine.LNX.4.33.0107260854020.19723-100000@bioinformatics.msiwtb.dundee.ac.uk>

On Wed, 25 Jul 2001, Charles Plessy wrote:

> On Wednesday 25 July 2001 20:31, Guy Bottu wrote:
> > from : BEN
> >
> > 	Dear colleagues,
> >
> > I had already posted this question before, but nobody had replied. The
> > problem is that the value of the program tfscan is decreasing, since we
> > cannot get updates of TRANSFAC anymore, unless we pay a licence, and I
> > wonder whether at all EMBnet Nodes have the right to give access to their
> > users.
> > For info, see  http://www.biobase.de/academia.html
> > Anybody a comment ?
>
> I have a related question : do you think that it would be possible to build
> fake transfac databases from a simple file?
> Currently I'm adding into an array (in the GCG findpattern format) any
> binding site of my interest that i find in the literature. (with a name and
> a reference)
> The goal would be to use existing programs to do searches within a set of
> home-selected transcription factors.

It would be nice to have a public front end for such a database so that
submissions could be sent to a curator. Then we can return the information
to the public domain (all literature referenced of course so we cannot be
accused of stealing TRANSFAC).

..d


----------------------------------
David Martin PhD
Bioinformatics Scientific Officer
Wellcome Trust Biocentre, Dundee
----------------------------------


From charles at moulinette.dyndns.org  Thu Jul 26 22:41:20 2001
From: charles at moulinette.dyndns.org (Charles)
Date: Fri, 27 Jul 2001 00:41:20 +0200 (CEST)
Subject: tfscan blues
In-Reply-To: <Pine.LNX.4.33.0107260854020.19723-100000@bioinformatics.msiwtb.dundee.ac.uk>
Message-ID: <Pine.LNX.4.33.0107270018380.6343-100000@moulinette.dyndns.org>

> > I have a related question : do you think that it would be possible to build
> > fake transfac databases from a simple file?
> > Currently I'm adding into an array (in the GCG findpattern format) any
> > binding site of my interest that i find in the literature. (with a name and
> > a reference)
> > The goal would be to use existing programs to do searches within a set of
> > home-selected transcription factors.
>
> It would be nice to have a public front end for such a database so that
> submissions could be sent to a curator. Then we can return the information
> to the public domain (all literature referenced of course so we cannot be
> accused of stealing TRANSFAC).

Well, I bet that tools allowing public contribution could be a CGI form
or a CVS archive, but setting up those interfaces is far beyond my
capabilities.

Currently I have between 40 and 50 entries, very focused on the molecular
biology of development in early vertebrate embryos. Some are complex and
others degenerate. I have not got anything interesting out of them for
the moment using GCG findpatterns: I get either close to no sites, or
plenty if I allow mismatches.

I can easily imagine a way to store more information in a separate array,
and then build a findpattern data file using a Perl script. But I'd like
to try some programs that give a score to the matches, in order to search
for high-complexity binding sites more efficiently.
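
One very simple way to attach a score to each hit is to count mismatches
against an IUPAC consensus and rank hits by that count. The Python sketch
below shows the idea only. It is not how tfscan or findpatterns work, the
site name and consensus in the usage section are placeholders, and a real
tool would more likely use weight matrices than a plain mismatch count.

```python
# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def scan(seq, sites, max_mismatch=0):
    """Return (name, start, mismatches) hits, best (fewest mismatches) first.

    sites: dict mapping a site name to an IUPAC consensus string.
    start is a 0-based offset on the forward strand only.
    """
    seq = seq.upper()
    hits = []
    for name, pattern in sites.items():
        width = len(pattern)
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            # A position mismatches when the base is not allowed by the code.
            mismatches = sum(
                base not in IUPAC[code] for base, code in zip(window, pattern)
            )
            if mismatches <= max_mismatch:
                hits.append((name, start, mismatches))
    return sorted(hits, key=lambda hit: (hit[2], hit[1]))


if __name__ == "__main__":
    my_sites = {"example_site": "CANNTG"}  # placeholder entry
    for hit in scan("ttCAGCTGaaCATTTA", my_sites, max_mismatch=1):
        print(hit)
```

Degenerate positions (N and the like) never count against a hit here, so
long, specific sites naturally stand out from short or degenerate ones when
the same mismatch limit is applied.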

Charles


From s.roehrig at xantos.de  Fri Jul 27 12:38:03 2001
From: s.roehrig at xantos.de (Roehrig, Sascha)
Date: Fri, 27 Jul 2001 14:38:03 +0200
Subject: coderet error with embl format file
Message-ID: <A31E2A68EB0ED411950A00D0B72601C10EE701@XAN-BDC>

Dear all,
 
I encountered an error while retrieving the feature table from the sample
embl database entry:
http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[EMBL-ID:'TRBG361']
 
The cds and mRNA were shown correctly. However, the translation was missing
the first line of amino acids and ended with double quotes.
 
Has anybody else noticed the same?
 
Best wishes,
 
Sascha

From peter.rice at uk.lionbioscience.com  Fri Jul 27 12:50:28 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 27 Jul 2001 13:50:28 +0100
Subject: coderet error with embl format file
References: <A31E2A68EB0ED411950A00D0B72601C10EE701@XAN-BDC>
Message-ID: <3B616394.54F03A92@uk.lionbioscience.com>

"Roehrig, Sascha" wrote:

> I encountered an error while retrieving the feature table from the
> sample embl database entry:
> http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[EMBL-ID:'TRBG361'
>
> The cds and mRNA were shown correctly. However, the translation was
> missing the first line of amino acids and ended with double quotes.

Works for me in 2.0.0

There were some feature handling code changes in 2.0.0 - perhaps you can
simply install the new version.

regards,

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723


From ableasby at hgmp.mrc.ac.uk  Sun Jul 29 23:54:37 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Mon, 30 Jul 2001 00:54:37 +0100 (BST)
Subject: EMBOSS 2.0.1 (& HMMER)
Message-ID: <200107292354.AAA28489@bromine.hgmp.mrc.ac.uk>

EMBOSS 2.0.1 fixes an indexing problem with DBIGCG and split entries.
It also incorporates handling of the Selex format as used in the
HMMER package.

HMMER 2.1.1 has been converted for EMBOSS and appears in the
download directory (ftp://ftp.uk.embnet.org/pub/EMBOSS/) as
the 'embassy' package HMMER-2.1.1.tar.gz

Alan


From lukem at bioinfo.pbi.nrc.ca  Mon Jul 30 04:09:24 2001
From: lukem at bioinfo.pbi.nrc.ca (Luke McCarthy)
Date: Sun, 29 Jul 2001 22:09:24 -0600
Subject: EMBOSS GUI
Message-ID: <5.1.0.14.0.20010729190020.00a13a10@bioinfo.pbi.nrc.ca>

Hi everybody,

On and off over the past year or so, I've been developing a GUI for the 
EMBOSS tools designed to operate over the web.  It's been listed at the 
EMBOSS web site for most of that time, but in the last month I've 
significantly improved it to the point where I think it could be very 
useful to the entire EMBOSS user community.  But I'd like a little help to 
that end.

Before I release this interface out into the wild, I'd like it to be as 
polished as possible.  I don't have a lot of time right now to do extensive 
testing of any kind ('testing' to this date has only involved one trial run 
of each application), plus I don't actually use many of these tools in 
practice, so I don't know if they're actually useful the way they're 
currently presented.  So I'd like to solicit your assistance as EMBOSS users:

If you could find the time to drop by 
http://bioinfo.pbi.nrc.ca:8090/EMBOSS/index.html , try out your favourite 
EMBOSS tool, offer any suggestion or criticism that comes to mind, and 
definitely let me know if something doesn't work, I would be very 
grateful.  Criticism of the look and feel (colours, font size, general 
appearance) is appreciated, but all of that is eminently configurable 
through the use of style sheets (which means if you're using an older 
browser, this probably won't work very well for you), so it's not really 
too helpful.  What I'm really looking for are places where the interface is 
awkward or difficult to understand.  And support for the frame/page 
groupings that EMBOSS 2.0.0 allows will be coming in the next couple of 
days, before anyone suggests that.

About the interface itself: the scripts build the input collection pages on 
the fly, reading relevant information from the ACD files (incidentally, in 
the process of building this interface I've written an ACD->XML converter 
if anyone would find that useful ;)  Because of this, it's remarkably 
robust to changes in the tools themselves.  Even the menu is generated 
dynamically, so only those tools which are available on your system will be 
listed (for example, if you haven't installed the EMBASSY stuff it won't 
show up...)  You can also have the script dump all of the input collection 
pages and the menus to static HTML files if you're expecting heavy traffic 
and don't want to waste system resources...

Anyway, that's my story.  I strongly urge anyone who thinks a GUI for the 
EMBOSS tools would be a useful thing to drop by and help me make this one 
all that it can be.  Any questions can be directed to me personally if you 
don't want to clutter up the list.

Cheers,

Luke McCarthy
Bioinformatics Group,
Plant Biotechnology Institute,
National Research Council of Canada
lukem at bioinfo.pbi.nrc.ca


From bauer at genprofile.com  Tue Jul 31 08:03:02 2001
From: bauer at genprofile.com (David Bauer)
Date: Tue, 31 Jul 2001 10:03:02 +0200
Subject: showfeat overlaping CDS
Message-ID: <3B666636.EEFD11CF@genprofile.com>

Hi,

I have an EMBL file with 2 CDS entries which stand for alternatively
spliced products. I would like to display only one of them at a time
with showfeat.
Both have a /gene and /label with the gene name (e.g. gene1 gene2). 
So what I thought was to use:
-matchtype=cds -matchtag=label -matchvalue=gene1
The matchtype works as I expect but with any kind of matchtag or
matchvalue I'm getting core dumps.
So what's wrong with the above example ?

Also if I use -tags with a spliced CDS the tags are displayed only with
the first exon, all other exons get just a CDS so it is not visible
which of the remaining exons belongs to which of the genes.

Thanks, David.
-- 
Dr. David Bauer
GenProfile AG, Max-Delbrueck-Center, Erwin-Negelein-Haus 
Robert-Roessle-Str. 10, D-13125 Berlin, Germany
bauer at genprofile.com, Tel:49-30-94892165, FAX:49-30-94892151


From peter.rice at uk.lionbioscience.com  Tue Jul 31 08:48:14 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 31 Jul 2001 09:48:14 +0100
Subject: showfeat overlaping CDS
References: <3B666636.EEFD11CF@genprofile.com>
Message-ID: <3B6670CE.A7BBDEFA@uk.lionbioscience.com>

David Bauer wrote:

> I have a EMBL file with 2 CDS entries which stand for alternatively
> spliced products. I would like to display only one of them at a time
> with showfeat.
> Both have a /gene and /label with the gene name (e.g. gene1 gene2).
> So what I thought was to use:
> -matchtype=cds -matchtag=label -matchvalue=gene1
> The matchtype works as I expect but with any kind of matchtag or
> matchvalue I'm getting core dumps.

Works for me with 2.0.1, but purify complains horribly - most likely the
same problem. We will fix it and add these command line options to the new
test set.

> Also if I use -tags with a spliced CDS the tags are displayed only with
> the first exon, all other exons get just a CDS so it is not visible
> which of the remaining exons belongs to which of the genes.

Internally the tags are stored with the first exon. They include an
implicit group tag that can be displayed with the other exons. Is that what
you need?

If you print out a feature table in GFF format (with seqretallfeat) what
you see is pretty much what is stored internally. The Sequence and
FeatFlags information is part of the feature data, rather than part of the
tag-value list, and is used for keeping multiple exons together. For
example:

seqretallfeat tembl:hsegl1 gff::hsegl1.gff

We could probably add the Sequence tag to the showfeat output (although it
is not part of the EMBL feature table) or we could duplicate all the tags
if that's what users would prefer.

A short example from the test data set would be:

showfeat tembl:hsegl1 -tags


-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

