From ableasby at hgmp.mrc.ac.uk  Wed Jul 13 10:36:28 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Wed, 13 Jul 2005 15:36:28 +0100 (BST)
Subject: [EMBOSS] New email lists ready
Message-ID: <200507131436.j6DEaSF7027543@bromine.hgmp.mrc.ac.uk>

The new email addresses for the EMBOSS lists are now set up and ready
(excluding any teething problems). They are:

   emboss at emboss.open-bio.org
   emboss-dev at emboss.open-bio.org
   emboss-bug at emboss.open-bio.org
   emboss-submit at emboss.open-bio.org

You can access the archives, subscribe/unsubscribe and alter
the way email is sent to you (e.g. digests) by visiting:

  http://emboss.open-bio.org/mailman/listinfo/emboss
  http://emboss.open-bio.org/mailman/listinfo/emboss-dev
  http://emboss.open-bio.org/mailman/listinfo/emboss-announce
  http://emboss.open-bio.org/mailman/listinfo/emboss-bug

The new FTP server is at:

  ftp://emboss.open-bio.org/pub/EMBOSS


Alan


From tjc at sanger.ac.uk  Wed Jul 13 11:11:40 2005
From: tjc at sanger.ac.uk (Tim Carver)
Date: Wed, 13 Jul 2005 16:11:40 +0100
Subject: [EMBOSS] Jemboss Announcement
Message-ID: <BEFAEDBC.2243%tjc@sanger.ac.uk>


With the imminent closure of the RFCGR, there will be no publicly available
Jemboss server. Jemboss will remain available for download and installation
as part of the EMBOSS distribution. You may find there is a local Jemboss
server already available at your own institution.

If you would like to have your server listed on the Jemboss web page please
contact the EMBOSS group (emboss-dev at emboss.open-bio.org)


Tim Carver
The Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus, Hinxton,
Cambridge, CB10 1SA, UK


From ableasby at hgmp.mrc.ac.uk  Thu Jul 14 19:43:30 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Fri, 15 Jul 2005 00:43:30 +0100 (BST)
Subject: [EMBOSS] EMBOSS 3.0.0 released
Message-ID: <200507142343.j6ENhUn2002328@bromine.hgmp.mrc.ac.uk>

EMBOSS 3.0.0 is now available for download from:

   ftp://emboss.open-bio.org/pub/EMBOSS/

   and, until the 27th July, from:
   ftp://ftp.rfcgr.mrc.ac.uk/pub/EMBOSS/

The following text details some of the changes from the previous
release.

Alan


EMBOSS main package:

New database indexing programs dbxflat, dbxfasta and dbxgcg. A
dbxblast program will be added if we can extract data from the new
BLAST formatdb output. These programs allow indexing of files
larger than 2Gb.
N.B.: Indexes will be created faster if they are written through a
      different disc controller than that used to read the database
      being indexed. If that is not possible then reading from and
      writing to different hard drives on the same controller is
      recommended. Note that each index can be created independently
      of the others e.g. you can create keyword and description
      indexes after you've created the ID and ACC indexes.

To support these programs, the emboss.default and .embossrc files can
include "resource" definitions. See the documentation of these
programs for more information. "resource" definitions are intended to
define anything other than environment variables and databases.

In the emboss.default and .embossrc files the same name can be used
for variables, databases, and resources (we now store them in separate
tables). In previous versions a single table was used and name clashes
could occur. This becomes an issue with the increasing use of resource
definitions.

Sequence sets in ACD have a new attribute "aligned" that reports
whether the sequences are aligned (reading a multiple alignment in for
visualisation) or not (reading a set of sequences into memory for
further processing - perhaps for alignment).

Sequence formats have been reviewed. "experiment" format is that used
by the Staden package. "staden" and "gcg" formats now parse out
comments from anywhere in the sequence. "nexus" and "nexusnon" formats
now correctly report protein sequence datatypes. "nbrf" or "pir"
format data can now be read from an SRSWWW server (for technical
reasons, SRS servers are unable to exactly reproduce NBRF/PIR
format). "clustal" output no longer writes in blocks of 10.  "Phylip3"
output is now renamed "phylipnon" for compatibility with other
non-interleaved output format names. The "phylip3" name remains valid
for back-compatibility. The header record for phylipnon format has
been changed to that accepted by phylip 3.6 (no YF on the header line,
number of sequences specified). Sequence format information on the web
has been updated to reflect these changes.

Codon usage tables can be read in these formats (-format qualifier):
  "emboss",    "EMBOSS codon usage file",
        "All numbers read, #comments for extras"
  "cut",       "EMBOSS codon usage file",
        "Same as EMBOSS, output default format is 'cut'"
  "gcg",       "GCG codon usage file",
        "All numbers read, #comments for extras"
  "cutg",      "CUTG codon usage file",
        "All numbers (cutgaa) read or fraction calculated, extras added"
  "cutgaa",    "CUTG codon usage file with aminoacids",
        "Cutg with all numbers"
  "spsum",     "CUTG species summary file",
        "Number only, species and CDSs in header"
  "cherry",    "Mike Cherry codonusage database file",
        "GCG format with species and CDSs in header"
  "transterm", "TransTerm database file",
        "GCG format with no extras"
  "codehop",   "FHCRC codehop program codon usage file",
        "Freq only, extras at end"
  "staden",    "Staden package codon usage file with percentages",
        "Freq or number only, no extras"
  "numstaden", "Staden package codon usage file with numbers",
        "Number only, no extras. Can be read as 'staden'"

Any of these formats should be readable by default. Some files are
"readable" in more than one format (staden and numstaden for example
can both be read as "staden"). The extra names are used so we can
reuse them as output format names.

For output of codon usage tables, the same formats are available
(-oformat qualifier).

A new application codcopy (not codret because coderet is already an
EMBOSS program name) will convert from one format to another in the
same way as seqret converts sequence formats.

Coderet reports the number of CDS, mRNA and translation sequences.

Correction to sequence numbering for reversed nucleotide sequences in
alignments. Correction to sequence alignment functions returning
slightly suboptimal alignments.

The entrails program reports codon usage formats. Description of
report format entrails output improved. Entrails is built by "make
check" and is provided so that developers of wrappers can obtain all
EMBOSS internal details needed, for example all ACD datatypes and
input/output format names and descriptions.

Sequence types are explicitly set in cons, sixpack and backtranseq as
some output formats failed to recognise them as protein.

EMBASSY packages:

MYEMBOSS is a new EMBASSY package for developing your own code.

Installation requires recent versions of GNU packages autoconf,
automake and libtool.

To install, you must first build the configure and make files with
these commands:

aclocal -I m4

autoconf

automake -a

When you add your own programs, do so by adding source files in
myemboss/source and ACD files in myemboss/emboss_acd and add these
filenames to the Makefile.am files in each directory. There are
"myseq" and "mytest" examples provided to guide you.

There is no need to modify configure or Makefile files - these will be
automatically updated.

To allow MYEMBOSS to be installed by one user, and linked to an EMBOSS
installation maintained for the site by someone else, new variables
are added to locate the ACD files for any EMBASSY package. If myemboss
is not installed in the same place as EMBOSS, define
EMBOSS_MYEMBOSSROOT as the location of the myemboss installed ACD
files or the myemboss/emboss_acd source directory. This requires that
EMBASSY programs call the embInitP function with the name of the
package ("myemboss"). For ACD utilities such as acdvalid or acdc to
work, as these use the EMBOSS embInit call, another variable
EMBOSS_ACDUTILROOT must be defined, pointing to the same directory.

PHYLIP is a beta release port of PHYLIP 3.6b. We welcome comments on
the EMBOSS interface to the programs. Program names are prefixed by
'f' to avoid clashes with the old PHYLIP EMBASSY package. We still
need to work on adding new tree input and output formats, and updating
the code to PHYLIP 3.63 (December 2004). We are also considering
splitting more of the programs to simplify the ACD interface. In this
release seqboot and treedist are already split. seqboot is split by
input type into seqboot, restboot, discboot and freqboot. Treedist is
split by the number of input files into treedist and
treedistpair. Acdvalid objects to the dependencies in other programs,
for example the method used by fdnadist.

The DOMAINATRIX package of earlier releases has been extended and
replaced by 5 EMBASSY packages described below (32 applications in
total).  These tools were developed as part of a research project and
are distinct from other EMBOSS apps in being intended mostly for
computational biologists rather than biologist end-users.

STRUCTURE

The STRUCTURE package is used for parsing the PDB database and
generating secondary databases of coordinate and derived data.  The
tools have the following scope: (i) For parsing PDB files and writing
clean coordinate files (CCF files) that "clean-up" many PDB
inconsistencies.  For example, residue numbers give the correct index
into the biological sequence.  (ii) To generate CCF files for whole
PDB files or individual domains from the SCOP and CATH databases.
(iii) To augment CCF files with residue solvent accessibility and
secondary structure data.  (iv) To generate contact files (CON files)
of intra-chain and inter-chain residue-residue contact data. (v) To
generate CON files of residue-ligand contact data. (vi) Miscellaneous
file handling, e.g. dictionary of heterogen groups.

DOMAINATRIX

The DOMAINATRIX package is used for handling the SCOP and CATH
databases of protein domain classification, the parsable files of
which can be inconvenient, e.g. for comparative studies, extending and
processing.  The tools have the following scope: (i) For parsing raw
SCOP and CATH parsable files and writing domain classification files
(DCF files) with a single, simple and extensible format. (ii) To add
sequence records to a DCF file. (iii) To remove low resolution
domains.  (iv) To flexibly calculate and remove redundancy.  (v)
Primitive tools for secondary structure element mapping to domains in
a DCF file.

DOMALIGN

The DOMALIGN package is used for generating alignments for families of
domains, especially across large datasets, e.g. the whole of SCOP.
The tools have the following scope: (i) For identifying representative
structures for different nodes in the SCOP and CATH hierarchies.  (ii)
For generating annotated, structure-based sequence alignments for
these nodes.  (iii) For extending these domain alignment files (DAF
files) with sequences of unknown structure. (iv) All-versus-all global
sequence alignment.

DOMSEARCH 

The DOMSEARCH package is used for deriving extended sequence families,
especially from large structural datasets such as the whole of SCOP.
The tools have the following scope: (i) To generate domain hits files
(DHF files) of sequence relatives to an alignment or other
sequences. (ii) To remove fragmentary sequences from a DHF file.
(iii) To flexibly calculate and remove redundancy.  (iv) To remove
hits of ambiguous classification and collate sequences into
families.

SIGNATURE

The SIGNATURE package is used for generating, scanning and evaluating
sparse signatures and other predictive elements for protein sequence
characterisation.  The tools have the following scope: (i) To generate
sparse signatures for protein families from alignments and residue
contact data.  (ii) Generate other types of discriminator (e.g. HMMs)
from alignments. (iii) Generate ligand-binding signatures from
residue-ligand contacts.  (iv) Generate domain hits files (DHF files)
and ligand hits files (LHF files) of hits (sequences) from signature
scans. (v) Interpretation and display of signature performance by
using ROC analysis.


Where data, files etc are mentioned above or in the application
documentation, data structures and functions for manipulating such are
usually provided in the AJAX and NUCLEUS C programming libraries.  For
example, there are objects for handling protein atoms, residues,
chains, for SCOP and CATH domains and so on.


From thiago.venancio at gmail.com  Mon Jul 18 08:09:33 2005
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Mon, 18 Jul 2005 09:09:33 -0300
Subject: [EMBOSS] error msg
Message-ID: <44255ea80507180509386875bd@mail.gmail.com>

Hi all.
I am new to EMBOSS. I have installed it and got the problem:

"wossname: error while loading shared libraries: libnucleus.so.3: cannot 
open shared object file: No such file or directory" 

All the EMBOSS programs give the same error.

The installation process went OK and I have set the environment variables.

Thanks in advance.

Thiago


From golharam at umdnj.edu  Tue Jul 19 12:51:30 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 19 Jul 2005 12:51:30 -0400
Subject: [EMBOSS] EMBOSS::GUI Web Interface
Message-ID: <009401c58c82$1d2a3670$2f01a8c0@GOLHARMOBILE1>

Hi Luke,

Any word on when EMBOSS-GUI will be available for EMBOSS 3.0.0?

Thanks,

Ryan


From jacob at biochemistry.ucl.ac.uk  Wed Jul 20 11:36:24 2005
From: jacob at biochemistry.ucl.ac.uk (Jacob Hurst)
Date: Wed, 20 Jul 2005 16:36:24 +0100 (BST)
Subject: [EMBOSS] problem with using accession number....
Message-ID: <Pine.LNX.4.44.0507201629480.21438-100000@localhost.localdomain>

Hello,

If I enter the following id seqret correctly returns the sequence.

acrm3<113>% seqret embl:hsgstpig
Reads and writes (returns) sequences
Output sequence [hsgstpig.fasta]:

however if i enter the corresponding accession number it fails.....

acrm3<114>% seqret embl:X08058
Reads and writes (returns) sequences
Error: Unable to read sequence 'embl:X08058'
Died: seqret terminated: Bad value for '-sequence' and no prompt

I was under the impression that emboss was setup to deal with both 
accession and id. 

regards Jake


-- 
Jacob Hurst Phd
Department of Biochemistry and Molecular Biology,
University College London


From pmr at ebi.ac.uk  Wed Jul 20 11:59:55 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 20 Jul 2005 16:59:55 +0100
Subject: [EMBOSS] problem with using accession number....
In-Reply-To: <Pine.LNX.4.44.0507201629480.21438-100000@localhost.localdomain>
References: <Pine.LNX.4.44.0507201629480.21438-100000@localhost.localdomain>
Message-ID: <42DE74FB.80604@ebi.ac.uk>

Jacob Hurst wrote:
> I was under the impression that emboss was setup to deal with both 
> accession and id. 

Yes, but ... this depends on how the embl database is defined at your site.

Some sites have databases defined to access entries through, for example, a 
URL or an external application (or script) that can only search for entry names.

Hmmmm .... we could add a little more information on this in showdb .... for a 
future release.

If you have difficulty finding out how the database is defined, mail us at 
emboss-bug at emboss.open-bio.org and we can help you track it down.

regards,

Peter Rice


From golharam at umdnj.edu  Thu Jul 21 00:00:03 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 21 Jul 2005 00:00:03 -0400
Subject: [EMBOSS] EMBOSS 3.0.0 RPMs available
Message-ID: <013801c58da8$acc10440$2f01a8c0@GOLHARMOBILE1>

I'm eager to upgrade our installation of EMBOSS on all our linux
workstations, so I've gone ahead and built RPMs for EMBOSS (based on
biolinux version) and MYEMBOSS applications.

You can download the RPMs and source RPMs from
http://serine.umdnj.edu/~golharam/biorpms.

They include (sorry for the capitalization):

DOMAINATRIX
DOMALIGN
DOMSEARCH
EMBOSS
EMBOSS-data
EMBOSS-devel
EMBOSS-Jemboss
EMNU
ESIM4
HMMER
MEME
MSE
MYEMBOSS
PHYLIP
SIGNATURE
STRUCTURE
TOPO

--
Ryan Golhar  -  golharam at umdnj.edu
The Informatics Institute of UMDNJ


From james_tan79 at hotmail.com  Thu Jul 21 05:41:24 2005
From: james_tan79 at hotmail.com (JT)
Date: Thu, 21 Jul 2005 17:41:24 +0800
Subject: [EMBOSS] any DNA or RNA program similar to pepstat ?
Message-ID: <BAY14-DAV2D38F07B066414E7C775995D60@phx.gbl>

Hi,

Is there any program that can output a report of simple DNA/RNA sequence 
information including e.g.
a) Molecular weight
b) Number of residues
c) Average residue weight
d) %G, %C, %A, %T, %GC
e) Melting temp
f) charge etc.

Thanks
James 


From jison at hgmp.mrc.ac.uk  Thu Jul 21 06:49:58 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Thu, 21 Jul 2005 11:49:58 +0100
Subject: [EMBOSS] any DNA or RNA program similar to pepstat ?
References: <BAY14-DAV2D38F07B066414E7C775995D60@phx.gbl>
Message-ID: <42DF7DD6.CD81DF9B@hgmp.mrc.ac.uk>

Hi James

There's no single app to cover all your request, but some of the 
following might help (see http://emboss.sourceforge.net/apps/)

dan         Plot melting temperatures for DNA. 
freak       Residue/base frequency table or plot 
extractfeat Extract features from a sequence 
geecee      Calculates the fractional GC content of nucleic acid sequences 
infoseq     Displays some simple information about sequences 
isochore    Plots isochores in large DNA sequences 
newcpgseek  Reports CpG rich regions 
remap       Display a sequence with restriction cut sites, translation etc.. 
showfeat    Show features of a sequence. 
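
As a rough illustration of items (b) and (d) in the question, length and base composition can be computed directly from the sequence. This is a minimal Python sketch and not an EMBOSS program: the function name `base_composition` is made up for the example, U is treated as T, and ambiguity codes are counted in the length but not in any percentage.

```python
from collections import Counter

def base_composition(seq):
    """Return (length, percentages) for a DNA or RNA sequence."""
    seq = seq.upper().replace("U", "T")   # treat RNA input as DNA
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    # Percentage of each unambiguous base, plus the combined GC content.
    pct = {base: 100.0 * counts[base] / n for base in "ACGT"}
    pct["GC"] = pct["G"] + pct["C"]
    return n, pct

length, pct = base_composition("ATGCGCUA")
print(length, pct)
```

Molecular weight, melting temperature and charge need additional parameters (strand type, salt concentration and so on), so they are left to the programs listed above.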

Please have a look at what's available and if you require something 
else / new functionality etc please get back in touch.

Cheers

Jon


JT wrote:
> 
> Hi,
> 
> Is there any program that can output a report of simple DNA/RNA sequence
> information including e.g.
> a) Molecular weight
> b) Number of residues
> c) Average residue weight
> d) %G, %C, %A, %T, %GC
> e) Melting temp
> f) charge etc.
> 
> Thanks
> James

-- 
Jon C. Ison, PhD
Proteomics Applications Group
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494500  Fax: +44 1223 494512
E-mail: jison at rfcgr.mrc.ac.uk  Web: http://www.rfcgr.mrc.ac.uk


From kertib at linuxlap.hu  Thu Jul 21 07:38:34 2005
From: kertib at linuxlap.hu (Kerti Balázs Gábor)
Date: Thu, 21 Jul 2005 13:38:34 +0200
Subject: [EMBOSS] Some question
Message-ID: <200507211338.35035.kertib@linuxlap.hu>

Hello!

There is some (elementary) question, because I do not find - maybe I do wrong 
- the solution.

- how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ?
- how to generate antisense DNA fragm. from a sens.

Thank you.

Balazs


From pmr at ebi.ac.uk  Thu Jul 21 07:58:58 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Jul 2005 12:58:58 +0100
Subject: [EMBOSS] Some question
In-Reply-To: <200507211338.35035.kertib@linuxlap.hu>
References: <200507211338.35035.kertib@linuxlap.hu>
Message-ID: <42DF8E02.40909@ebi.ac.uk>

Kerti Balázs Gábor wrote:

> There is some (elementary) question, because I do not find - maybe I do wrong 
> - the solution.
> 
> - how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ?

The cDNA will be identical to the mRNA. No backtranslation needed. 
Backtranslation (as in backtranseq) converts a protein sequence into a 
nucleotide sequence that will translate to the same protein sequence (using 
the most frequent codon for each amino acid).

If you only want to convert U (Uracil) to T (thymine) to convert an RNA 
sequence to DNA (all EMBOSS programs will accept both as nucleotide input) you 
can modify the program seqret to specify a nucleotide sequence as input, and 
generate a DNA sequence as output. An easy way to start writing EMBOSS 
programs - copy one program and one ACD file and make 4 small edits.

> - how to generate antisense DNA fragm. from a sens.

In EMBOSS, revseq does this. The antisense strand is simply the reverse 
complement of the original.
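
For readers who want to see the two operations spelled out, here is a minimal Python sketch of U-to-T conversion and reverse complementing. It is not how revseq or seqret are implemented; the function names are invented for the example, and the complement table covers the standard IUPAC nucleotide codes.

```python
# Complement table for the IUPAC nucleotide codes (DNA alphabet),
# upper and lower case.
COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)

def rna_to_dna(seq):
    """Replace U with T, preserving case."""
    return seq.replace("U", "T").replace("u", "t")

def reverse_complement(seq):
    """Return the antisense strand, written 5' to 3' as DNA."""
    return rna_to_dna(seq).translate(COMPLEMENT)[::-1]

print(reverse_complement("AUGGCC"))  # GGCCAT
```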

Hope this helps,

Peter Rice


From jison at hgmp.mrc.ac.uk  Thu Jul 21 08:10:50 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Thu, 21 Jul 2005 13:10:50 +0100
Subject: [EMBOSS] Some question
References: <200507211338.35035.kertib@linuxlap.hu>
Message-ID: <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>

Dear Balazs

See http://emboss.sourceforge.net/apps/ for application documentation.

transeq       Translates nucleic acid sequences.   (i.e. DNA -> protein)
backtranseq   Back translate a protein sequence    (i.e. protein -> DNA)
coderet       Extract CDS, mRNA and translations from feature tables 

I don't think there is anything to interchange sense/antisense or mRNA / DNA
sequences but something could be written if you let us know exactly what you
need / why you need it.

Cheers

Jon


Kerti Balázs Gábor wrote:
> 
> Hello!
> 
> There is some (elementary) question, because I do not find - maybe I do wrong
> - the solution.
> 
> - how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ?
> - how to generate antisense DNA fragm. from a sens.
> 
> Thank you.
> 
> Balazs

-- 
Jon C. Ison, PhD
Proteomics Applications Group
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494500  Fax: +44 1223 494512
E-mail: jison at rfcgr.mrc.ac.uk  Web: http://www.rfcgr.mrc.ac.uk


From faruque at ebi.ac.uk  Thu Jul 21 09:08:30 2005
From: faruque at ebi.ac.uk (Nadeem Faruque)
Date: Thu, 21 Jul 2005 14:08:30 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu>
	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
Message-ID: <42DF9E4E.6060603@ebi.ac.uk>


> See http://emboss.sourceforge.net/apps/ for application documentation.
> 
> transeq       Translates nucleic acid sequences.   (i.e. DNA -> protein)
> backtranseq   Back translate a protein sequence    (i.e. protein -> DNA)
...

While backtranseq is very clever in predicting the cDNA sequence based on peptide sequence by choosing codons according 
to usage, would it not be very useful to have the option for it to return an answer in degenerate bases?

eg in human, the 'peptide' is simply 'M'
backtranseq returns the most likely codon used, ie 'ATG'
but since it could be TTG, CTG or ATG, it may be more useful for some people to return 'HTG'

Returning a degenerate sequence would have the advantage (for some uses) of being usable by normal DNA-savvy 
string-based search methods when finding the peptide coding location in nucleic acid sequences rather than having to use 
similarity searches.  I could also see it being useful for designing PCR primers within coding regions.
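
To make the string-search idea concrete, here is a small Python sketch that turns a degenerate sequence such as 'HTG' into a regular expression and scans a nucleotide sequence with it. The function names `iupac_to_regex` and `find_sites` and the test sequence are assumptions made for the example; only the IUPAC ambiguity codes themselves are standard.

```python
import re

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(degenerate):
    """Turn a degenerate DNA string into a regular expression pattern."""
    parts = []
    for code in degenerate.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

def find_sites(degenerate, dna):
    """Return 0-based start positions of all matches, including overlaps."""
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile("(?=" + iupac_to_regex(degenerate) + ")")
    return [m.start() for m in pattern.finditer(dna.upper())]

print(iupac_to_regex("HTG"))             # [ACT]TG
print(find_sites("HTG", "GGCTGAATGC"))   # [2, 6]
```

This searches one strand only; the other strand could be covered by also searching with the reverse complement of the degenerate sequence.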

Nadeem

-- 
S.M. Nadeem N. Faruque
EMBL Nucleotide Database Curation Team
EMBL Outstation
Tel: +44 1223 494611                     Fax: +44 1223 494472
The European Bioinformatics Institute    URL: http://www.ebi.ac.uk/
Email for data submissions: datasubs at ebi.ac.uk
Email for updates: update at ebi.ac.uk
=============================================================================


From pmr at ebi.ac.uk  Thu Jul 21 10:00:30 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Jul 2005 15:00:30 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DF9E4E.6060603@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu>	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
	<42DF9E4E.6060603@ebi.ac.uk>
Message-ID: <42DFAA7E.2070107@ebi.ac.uk>

Nadeem Faruque wrote:

> While backtranseq is very clever in predicting the cDNA sequence based on peptide sequence by choosing codons according 
> to usage, would it not be very useful to have the option for it to return an answer in degenerate bases?
> 
> eg in human, the 'peptide' is simply 'M'
> backtranseq returns the most likely codon used, ie 'ATG'
> but since it could be TTG, CTG or ATG, it may be more useful for some people to return 'HTG'

Ummmm .... depends on the genetic code. In human I would expect ATG, in 
bacteria GTG is second choice and NTG would be the possible result - but only 
for a start codon of course (just one of the complexities of backtranslating - 
I think we must avoid inventing a start codon if the protein doesn't start 
with 'M' because the numbering gets complicated).

As this would need a different input (a genetic code, rather than a codon 
usage file) I would make this a different program - not difficult to write.

Any good suggestions for a program name?

> Returning a degenerate sequence would have the advantage (for some uses) of being usable by normal DNA-savvy 
> string-based search methods when finding the peptide coding location in nucleic acid sequences rather than having to use 
> similarity searches.  I could also see it being useful for designing PCR primers within coding regions.

... which leads on to whether EMBOSS should include such programs :-)

regards,

Peter Rice


From jcherry at ncbi.nlm.nih.gov  Thu Jul 21 10:58:14 2005
From: jcherry at ncbi.nlm.nih.gov (Josh Cherry)
Date: Thu, 21 Jul 2005 10:58:14 -0400 (EDT)
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DFAA7E.2070107@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu>
	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
	<42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk>
Message-ID: <Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>


Nadeem Faruque wrote:

> Returning a degenerate sequence would have the advantage (for some uses)
> of being usable by normal DNA-savvy string-based search methods when
> finding the peptide coding location in nucleic acid sequences rather
> than having to use similarity searches.

But this won't work the way some might hope due to the nature of the
genetic code, specifically (in the standard code) the three amino acids
that have six codons each (S, L, and R).  Consider serine, encoded by UCN
and AGY.  Would you like this to be back-translated to WSN?  That matches
all six serine codons but also ten non-serine codons.  Some people may
still want to use it in a probe or primer though.
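
The count above can be checked with a few lines of Python. This sketch assumes the standard genetic code (NCBI translation table 1) written in the usual T, C, A, G order; the helper names `expand` and `matched_amino_acids` are made up for the example.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G at each position).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes (U accepted as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(degenerate_codon):
    """List every concrete codon matched by a degenerate codon."""
    choices = [IUPAC[ch] for ch in degenerate_codon.upper()]
    return ["".join(c) for c in product(*choices)]

def matched_amino_acids(degenerate_codon):
    """Map each amino acid (or '*') to the matched codons that encode it."""
    hits = {}
    for codon in expand(degenerate_codon):
        hits.setdefault(CODE[codon], []).append(codon)
    return hits

hits = matched_amino_acids("WSN")
for aa, codons in sorted(hits.items()):
    print(aa, len(codons), " ".join(codons))

non_serine = sum(len(c) for aa, c in hits.items() if aa != "S")
print(len(expand("WSN")), "codons matched,", non_serine, "of them not serine")
# 16 codons matched, 10 of them not serine
```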

Josh

--
Joshua L. Cherry, Ph.D.
NCBI/NLM/NIH (Contractor)
jcherry at ncbi.nlm.nih.gov


From faruque at ebi.ac.uk  Thu Jul 21 11:21:35 2005
From: faruque at ebi.ac.uk (Nadeem Faruque)
Date: Thu, 21 Jul 2005 16:21:35 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>
References: <200507211338.35035.kertib@linuxlap.hu>
	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk>
	<42DFAA7E.2070107@ebi.ac.uk>
	<Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>
Message-ID: <42DFBD7F.7060306@ebi.ac.uk>

Josh Cherry wrote:
> Nadeem Faruque wrote:
> 
> 
>>Returning a degenerate sequence would have the advantage (for some uses)
>>of being usable by normal DNA-savvy string-based search methods when
>>finding the peptide coding location in nucleic acid sequences rather
>>than having to use similarity searches.
> 
> 
> But this won't work the way some might hope due to the nature of the
> genetic code, specifically (in the standard code) the three amino acids
> that have six codons each (S, L, and R).  Consider serine, encoded by UCN
> and AGY.  Would you like this to be back-translated to WSN?  That matches
> all six serine codons but also ten non-serine codons.  Some people may
> still want to use it in a probe or primer though.

I was going to use Serine in my example but realised 'WSN' was a bit too degenerate to be a useful example.
I understand you could not roundtrip peptide->DNA->peptide with my suggested behaviour (as you can currently do with 
backtranseq), but you can do DNA->peptide->DNA in a usable form.
I'm sketchy about its potential use in oligo design, but given a degenerate backtranslation someone could possibly 
design oligos so as to avoid the more degenerate areas (esp for the 3' end of primers).  If they were to use backtranseq 
they would be ignorant of these regions.

Nadeem

-- 
S.M. Nadeem N. Faruque
EMBL Nucleotide Database Curation Team
EMBL Outstation
Tel: +44 1223 494611                     Fax: +44 1223 494472
The European Bioinformatics Institute    URL: http://www.ebi.ac.uk/
Email for data submissions: datasubs at ebi.ac.uk
Email for updates: update at ebi.ac.uk
=============================================================================


From pmr at ebi.ac.uk  Thu Jul 21 11:55:15 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Jul 2005 16:55:15 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DFBD7F.7060306@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu>	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
	<42DF9E4E.6060603@ebi.ac.uk>	<42DFAA7E.2070107@ebi.ac.uk>	<Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>
	<42DFBD7F.7060306@ebi.ac.uk>
Message-ID: <42DFC563.4010600@ebi.ac.uk>

Nadeem Faruque wrote:
> Josh Cherry wrote:
>>But this won't work the way some might hope due to the nature of the
>>genetic code, specifically (in the standard code) the three amino acids
>>that have six codons each (S, L, and R).  Consider serine, encoded by UCN
>>and AGY.  Would you like this to be back-translated to WSN?  That matches
>>all six serine codons but also ten non-serine codons.  Some people may
>>still want to use it in a probe or primer though.
> 
> I was going to use Serine in my example but realised 'WSN' was a bit too degenerate to be a useful example.
> I understand you could not roundtrip peptide->DNA->peptide with my suggested behaviour

... I bet you can!!!  Assuming you have a backtranslated sequence, WSN would 
be surely Serine (as would UCN or AGY). If any of the 3 positions is more 
specific, that could indicate one of the other possibilities.

I would be happy to accept a lower case residue if the result is uncertain (if 
the ambiguity codes do not match what one would expect from the genetic code 
in a backtranslation). For ASN the answer could be T (ACN) S (AGY) or R (AGR) 
with T ('t') the favourite by a majority vote (4/4 codons match, 2/6 for the 
others).
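
One way to read the majority vote is sketched below in Python. It is only an interpretation of the proposal, assuming the standard genetic code: each amino acid scores the fraction of its own codons that the degenerate codon matches, an unambiguous codon gives an upper case residue, a clear winner gives lower case, and a tie gives X. The names `votes` and `resolve` are invented here.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G at each position).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes (U accepted as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# All codons for each amino acid, e.g. CODONS_FOR["W"] == {"TGG"}.
CODONS_FOR = {}
for codon, aa in CODE.items():
    CODONS_FOR.setdefault(aa, set()).add(codon)

def votes(degenerate_codon):
    """Fraction of each amino acid's codons matched by the degenerate codon."""
    choices = [IUPAC[ch] for ch in degenerate_codon.upper()]
    matched = {"".join(c) for c in product(*choices)}
    result = {}
    for aa, codons in CODONS_FOR.items():
        n = len(matched & codons)
        if n:
            result[aa] = Fraction(n, len(codons))
    return result

def resolve(degenerate_codon):
    """Upper case if unambiguous, lower case for a clear winner, 'X' on a tie."""
    v = votes(degenerate_codon)
    if len(v) == 1:
        return next(iter(v))
    best = max(v.values())
    winners = [aa for aa, score in v.items() if score == best]
    return winners[0].lower() if len(winners) == 1 else "X"

print(votes("ASN"))
print(resolve("ASN"), resolve("UCN"), resolve("AGY"))  # t S S
```

Note that this simple vote returns X for WSN, because S, T, C and W are all fully matched; recognising WSN as the expected backtranslation of serine would need an extra lookup against the table of degenerate codons.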

X can be used if all else fails. After all, we could be translating a sequence 
with a SNP. A command line option can give the user a choice of trying to 
resolve unclear positions or using X.

Degenerate codons would be:

A GCN
C UGY
D GAY
E GAR
F UUY
G GGN
H CAY
I AUH
K AAR
L YUN (CUN/UUR) - also matches F (UUY)
M AUG
N AAY
P CCN
Q CAR
R MGN (CGN/AGR) - also matches S (AGY)
S WSN (UCN/AGY) - also matches T (ACN)
                   also matches R (AGR)
                   also matches C and W and * (UGN)

T ACN
V GUN
W UGG
Y UAY
* URR - also matches W (UGG)
m NUG (start codon)
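
The table can be regenerated from the genetic code itself, which is also the core of the proposed program. The Python sketch below assumes the standard code and works in the DNA alphabet, converting T to U only for display; the function names are hypothetical and the start codon entry (NUG) is not derived.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G at each position).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes, and the reverse lookup from a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
SYMBOL = {frozenset(bases): code for code, bases in IUPAC.items()}

def degenerate_codon(codons):
    """Smallest IUPAC codon that matches every codon in the collection."""
    return "".join(SYMBOL[frozenset(column)] for column in zip(*codons))

def degenerate_table():
    """Map each amino acid (and '*') to its degenerate codon."""
    by_aa = {}
    for codon, aa in CODE.items():
        by_aa.setdefault(aa, []).append(codon)
    return {aa: degenerate_codon(codons) for aa, codons in by_aa.items()}

def backtranslate(peptide, rna=False):
    """Degenerate backtranslation of a protein sequence."""
    lookup = degenerate_table()
    dna = "".join(lookup[aa] for aa in peptide.upper())
    return dna.replace("T", "U") if rna else dna

table = degenerate_table()
for aa in sorted(table):
    print(aa, table[aa].replace("T", "U"))
print(backtranslate("MSL"))  # ATGWSNYTN
```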


From lukem at gene.pbi.nrc.ca  Thu Jul 21 17:08:32 2005
From: lukem at gene.pbi.nrc.ca (Luke McCarthy)
Date: Thu, 21 Jul 2005 15:08:32 -0600
Subject: [EMBOSS] EMBOSS explorer
Message-ID: <1121980112.5376.11.camel@incognito.invalid>

Hi everybody,

I'm pleased to finally announce a new release of the EMBOSS interface
formerly known as EMBOSS::GUI, now known as EMBOSS explorer.

Development has moved to SourceForge.net and the new home page for the
interface is http://embossgui.sourceforge.net/  It's quite spartan at
the moment, but I'll be adding a FAQ as questions are frequently asked
(and answered...)

You can download EMBOSS explorer at
http://prdownloads.sourceforge.net/embossgui/emboss-explorer-2.0.0.tar.gz?download

The new release has been tested against EMBOSS-3.0.0, but not
thoroughly.  Please report bugs using the bug tracker at
http://sourceforge.net/tracker/?atid=699414&group_id=124389&func=browse
(as a last resort, email them to mccarthy at users.sourceforge.net, but I'm
hoping that use of the bug tracker will help with duplicate reports and
other organizational issues...)

Cheers,

Luke


From gwilliam at hgmp.mrc.ac.uk  Fri Jul 22 04:21:40 2005
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 22 Jul 2005 09:21:40 +0100
Subject: [EMBOSS] backtranseq
References: <200507211338.35035.kertib@linuxlap.hu>	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
	<42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk>
Message-ID: <42E0AC94.63F132A7@hgmp.mrc.ac.uk>

Peter Rice wrote:
> 
> Nadeem Faruque wrote:
> 
> > While backtranseq is very clever in predicting the cDNA sequence based on peptide sequence by choosing codons according
> > to usage, would it not be very useful to have the option for it to return an answer in degenerate bases?
> >
> > eg in human, the 'peptide' is simply 'M'
> > backtranseq returns the most likely codon used, ie 'ATG'
> > but since it could be TTG, CTG or ATG, it may be more useful for some people to return 'HTG'
> 
> Ummmm .... depends on the genetic code. In human I would expect ATG, in
> bacteria GTG is second choice and NTG would be the possible result - but only
> for a start codon of course (just one of the complexities of backtranslating -
> I think we must avoid inventing a start codon if the protein doesn't start
> with 'M' because the numbering gets complicated).
> 
> As this would need a different input (a genetic code, rather than a codon
> usage file) I would make this a different program - not difficult to write,
> 
> Any good suggestions for a program name?

barebackseq

-- 
Gary Williams
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494522 
(UNTIL END OF JULY 2005)

E-mail: gareth.williams57 at ntlworld.com


From gbottu at ben.vub.ac.be  Fri Jul 22 05:10:17 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 22 Jul 2005 11:10:17 +0200
Subject: [EMBOSS] Some question
In-Reply-To: <42DF8E02.40909@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu> <42DF8E02.40909@ebi.ac.uk>
Message-ID: <20050722091017.GA27340@bigben.ulb.ac.be>

On Thu, Jul 21, 2005 at 12:58:58PM +0100, Peter Rice wrote:
> Kerti Balázs Gábor wrote:
> 
> > There is some (elementary) question, because I do not find - maybe I do wrong 
> > - the solution.
> > 
> > - how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ?
> 
> The cDNA will be identical to the mRNA. No backtranslation needed. 
> Backtranslation (as in backtranseq) converts a protein sequence into a 
> nucleotide sequence that will translate to the same protein sequence (using 
> the most frequent codon for each amino acid).
> 
> If you only want to convert U (Uracil) to T (thymine) to convert an RNA 
> sequence to DNA (all EMBOSS programs will accept both as nucleotide input) you 
> can modify the program seqret to specify a nucleotide sequence as input, and 
> generate a DNA sequence as output. An easy way to start writing EMBOSS 
> programs - copy one program and one ACD file and make 4 small edits.

No need to modify seqret, the EMBOSS program biosed can be used to replace 
U by T in a sequence.
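
Outside EMBOSS the same substitution is a one-liner. A minimal sketch in
Python, assuming the sequence is already in memory as a plain string (the
function name is only illustrative):

```python
def rna_to_dna(seq):
    """Replace U by T (and u by t), leaving every other character untouched."""
    return seq.translate(str.maketrans("Uu", "Tt"))


print(rna_to_dna("AUGGCUuaa"))
```

Case is preserved, so lower-case regions of the input stay lower-case.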

	Guy Bottu,
	Belgian EMBnet Node


From gbottu at ben.vub.ac.be  Fri Jul 22 05:26:38 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 22 Jul 2005 11:26:38 +0200
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DFC563.4010600@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu>
	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk>
	<42DFAA7E.2070107@ebi.ac.uk>
	<Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>
	<42DFBD7F.7060306@ebi.ac.uk> <42DFC563.4010600@ebi.ac.uk>
Message-ID: <20050722092638.GB27340@bigben.ulb.ac.be>

I remember that the GCG program backtranslate let the user choose between 
the most likely backtranslation (as backtranseq does) and the most 
ambiguous backtranslation. So, adding to EMBOSS a program that makes the
most ambiguous backtranslation would bring back this lost functionality.

As for problem cases like Serine, maybe an option to produce, instead of a 
sequence with ambiguity symbols, a regular expression that exactly matches 
the allowed codons? The utility of this may be limited, but if you have a 
peptide you could, for example, use the backtranslation with the program 
dreg to search for the corresponding CDS in a piece of DNA.
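
To illustrate the idea, here is a rough sketch in Python, assuming the
standard genetic code and Python's own regular expression syntax (the
pattern syntax that dreg accepts may differ, and backtranslate_regex is
just a name invented for this example). Each residue becomes an
alternation of exactly its own codons, so Serine no longer drags in T, R,
C, W or stop.

```python
import re
from itertools import product

# Standard genetic code (NCBI translation table 1), DNA bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Residue -> list of codons that encode it.
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(codon))


def backtranslate_regex(peptide):
    """Build a pattern matching every DNA sequence that encodes the peptide.

    Raises KeyError for residues with no codon in the table (e.g. X or B).
    """
    return "".join("(?:" + "|".join(CODONS[aa]) + ")" for aa in peptide.upper())


pattern = re.compile(backtranslate_regex("MSL"))

hit = pattern.search("CCATGAGCTTGAA")   # ATG AGC TTG encodes M S L
miss = pattern.search("CCATGACCTTGAA")  # ATG ACC TTG encodes M T L
print(hit.start(), hit.group(), miss)
```

The second sequence would be accepted by the ambiguity-code form (ACC is
covered by WSN) but is rejected by the regular expression. The search is
not restricted to a reading frame and only looks at the given strand.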

	Regards,
	Guy Bottu,
	Belgian EMBnet Node


From faruque at ebi.ac.uk  Fri Jul 22 06:22:27 2005
From: faruque at ebi.ac.uk (Nadeem Faruque)
Date: Fri, 22 Jul 2005 11:22:27 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <20050722092638.GB27340@bigben.ulb.ac.be>
References: <200507211338.35035.kertib@linuxlap.hu>	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
	<42DF9E4E.6060603@ebi.ac.uk>	<42DFAA7E.2070107@ebi.ac.uk>	<Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>	<42DFBD7F.7060306@ebi.ac.uk>
	<42DFC563.4010600@ebi.ac.uk>
	<20050722092638.GB27340@bigben.ulb.ac.be>
Message-ID: <42E0C8E3.8060900@ebi.ac.uk>

> As for the problem cases like Serine, maybe an option to make instead of a 
> sequence with ambiguity symbols a regular expression that exactly matches 
> the allowed codons ? The utility of this may be limited, but you could 
> e.g. if you have a peptide use the backtranslation with the program dreg to
> search the corresponding CDS in a piece of DNA.

I think we'd be better off with plain old IUPAC rather than venturing into more complex systems or we'll end up with 
weighted matrices or even HMMs.
The advantage of IUPAC is of course that you can plug it into most other programs.

Nadeem

-- 
S.M. Nadeem N. Faruque
EMBL Nucleotide Database Curation Team
EMBL Outstation
Tel: +44 1223 494611                     Fax: +44 1223 494472
The European Bioinformatics Institute    URL: http://www.ebi.ac.uk/
Email for data submissions: datasubs at ebi.ac.uk
Email for updates: update at ebi.ac.uk
=============================================================================


From pmr at ebi.ac.uk  Fri Jul 22 08:52:49 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 22 Jul 2005 13:52:49 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <42E0C8E3.8060900@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu>	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>	<42DF9E4E.6060603@ebi.ac.uk>	<42DFAA7E.2070107@ebi.ac.uk>	<Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>	<42DFBD7F.7060306@ebi.ac.uk>	<42DFC563.4010600@ebi.ac.uk>	<20050722092638.GB27340@bigben.ulb.ac.be>
	<42E0C8E3.8060900@ebi.ac.uk>
Message-ID: <42E0EC21.4030607@ebi.ac.uk>

Nadeem Faruque wrote:

> I think we'd be better off with plain old IUPAC rather than venturing into more complex systems or we'll end up with 
> weighted matrices or even HMMs.
> The advantage of IUPAC is of course that you can plug it into most other programs.

Well .... how about this part of IUPAC:

IUBMB recommends marking unclear codons, for example in
http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html

"To avoid ambiguity, therefore, it is important to make it clear whenever the 
triplet YTN, for example, occurs in a sequence deduced from the occurrence of 
a leucine residue in the corresponding amino acid sequence that it does not 
include TTT or TTC as possibilities, etc. To emphasise this, it may be helpful 
to print such triplets in italics."

... we could use lowercase, rather than italics, to make this clear.

IUPAC also allows uncertain positions with (A,C,D) or (H,I,K,L). EMBOSS allows 
these, but after checking all occurrences in PIR it simply ignores the extra 
characters and assumes the amino acids are in the correct sequence. These are 
needed because Sanger protein sequencing determined composition but usually 
not the order of residues.

I see no codes for a choice of amino acids, other than B (D or N) and Z (E or 
Q), both from amino acid sequence composition, where hydrolyzing all amide 
bonds converted N to D (Asparagine to Aspartate) and Q to E (glutamine to 
glutamate). Also, one IUPAC report notes that NMR data can include J for "I or 
L" as Leucine and Isoleucine are indistinguishable by NMR. EMBOSS so far 
ignores this code (I only discovered it today :-).

U is now officially used for selenocysteine, although many EMBOSS programs 
cannot handle U and have to use X. The only character not used in amino acid 
sequence is O. I have seen it used in DNA sequence (CpG islands represented as 
OJ for specialised alignment scoring in one publication).


From pmr at ebi.ac.uk  Fri Jul 22 11:00:01 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 22 Jul 2005 16:00:01 +0100
Subject: [EMBOSS] EMBOSS in August
Message-ID: <42E109F1.9070604@ebi.ac.uk>

We know it is close to the end of July, and we have not said what is happening 
to the EMBOSS team. We do have a solution, but it is not yet officially confirmed.

The Rosalind Franklin Centre for Genomics Research will close at the end of 
next week. The EMBOSS project will move to the European Bioinformatics 
Institute from August 1st. Development and support will continue as before.

The EMBOSS homepage will remain at http://emboss.sourceforge.net/

The FTP server (to download EMBOSS releases and updates) has moved to 
ftp://emboss.open-bio.org/pub/EMBOSS/

The EMBOSS anonymous CVS server will remain at cvs.open-bio.org hosted by the 
Open Bio Foundation, who will also continue to host the developers' CVS server.

The EMBOSS mailing lists have been moved to the Open Bio Foundation, so the 
addresses are now:

To contact the EMBOSS team:

emboss-bug at emboss.open-bio.org Bug reports and support requests
emboss-submit at emboss.open-bio.org Code submissions

Lists users/developers can subscribe to:

emboss at emboss.open-bio.org Users mailing list
emboss-dev at emboss.open-bio.org Developers mailing list
emboss-announce at emboss.open-bio.org New release announcements list

There are obvious gaps in these details ... more news as soon as we have 
confirmation.

regards,

Peter Rice, Alan Bleasby and the EMBOSS team.


From maoj at helix.nih.gov  Mon Jul 25 09:58:06 2005
From: maoj at helix.nih.gov (Jean Mao)
Date: Mon, 25 Jul 2005 09:58:06 -0400
Subject: [EMBOSS] (no subject)
Message-ID: <200507251358.j6PDw5N94183035@helix.nih.gov>

Hello all,

I am building the EMBOSS package on our Linux cluster. Since it will be used
only for batch runs, there is no need for us to include X11. I got
the following error during 'make install'. Can someone tell me which
programs use X11 and how to turn it off before running 'make
install'? Many thanks!!!
----------------------------------------------------------------------------
---------------------------------------
make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss'
/bin/sh ../libtool --tag=CC --mode=link gcc  -O2   -o aaindexextract
aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la
../ajax/libajax.la ../plplot/libplplot.la -lX11  -lm 
gcc -O2 -o .libs/aaindexextract aaindexextract.o
../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so
../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm -Wl,--rpath
-Wl,/usr/local/EMBOSS-3.0.0/lib
/usr/bin/ld: cannot find -lX11
collect2: ld returned 1 exit status
make[2]: *** [aaindexextract] Error 1
make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
make: *** [install-recursive] Error 1
----------------------------------------------------------------------------
---------------------------------------
Jean


From msarachu at biol.unlp.edu.ar  Mon Jul 25 10:44:12 2005
From: msarachu at biol.unlp.edu.ar (Martin Sarachu)
Date: Mon, 25 Jul 2005 11:44:12 -0300
Subject: [EMBOSS] wEMBOSS-1.5 & wrappers4EMBOSS-1.3
Message-ID: <42E4FABC.20302@biol.unlp.edu.ar>

This message is to announce the release of wEMBOSS-1.5 and 
wrappers4EMBOSS-1.3

wEMBOSS-1.5 includes:
* a session indicator to identify which user is running wEMBOSS
* the possibility to add notes to project results

wrappers4EMBOSS-1.3 includes:
* codehop wrapper for selecting degenerate primers
* muscle wrapper for multiple alignments

Both are available at http://www.wemboss.org


-- 
Martin Sarachu
msarachu at biol.unlp.edu.ar
AR.EMBnet
http://www.ar.embnet.org


From maoj at helix.nih.gov  Mon Jul 25 11:20:39 2005
From: maoj at helix.nih.gov (Jean Mao)
Date: Mon, 25 Jul 2005 11:20:39 -0400
Subject: [EMBOSS] How to exclude X11 when Compile Emboss
In-Reply-To: <71B0C9CB1FF4EA43BB48C08DCFF1A1FF01364AC3@NIHCESMLBX.nih.gov>
Message-ID: <200507251520.j6PFKdN93765833@helix.nih.gov>


> Hello all,
> 
> I am building emboss package on our linux cluster. Since it will be for
> multiple batch run purpose, there is no need for us to include X11. I got
> the following error during 'make install'. Can someone tell me which
> programs use X11 and how to turn it off in them before running 'make
> install'? Many thanks!!!
> -------------------------------------------------------
> make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss'
> /bin/sh ../libtool --tag=CC --mode=link gcc  -O2   -o aaindexextract
> aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la
> ../ajax/libajax.la ../plplot/libplplot.la -lX11  -lm 
> gcc -O2 -o .libs/aaindexextract aaindexextract.o
> ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so
> ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm
> -Wl,--rpath -Wl,/usr/local/EMBOSS-3.0.0/lib
> /usr/bin/ld: cannot find -lX11
> collect2: ld returned 1 exit status
> make[2]: *** [aaindexextract] Error 1
> make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
> make[1]: *** [install-recursive] Error 1
> make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
> make: *** [install-recursive] Error 1
> ----------------------------------------------------------
> Jean
> 


From maoj at mail.nih.gov  Mon Jul 25 09:56:20 2005
From: maoj at mail.nih.gov (Mao, Jean (NIH/CIT))
Date: Mon, 25 Jul 2005 09:56:20 -0400
Subject: [EMBOSS] How to Turn X11 off during Make?
Message-ID: <71B0C9CB1FF4EA43BB48C08DCFF1A1FF01730B6E@NIHCESMLBX.nih.gov>

Hello all,

I am building the EMBOSS package on our Linux cluster. Since it will be used
only for batch runs, there is no need for us to include X11. I got
the following error during 'make install'. Can someone tell me which
programs use X11 and how to turn it off before running 'make
install'? Many thanks!!!
----------------------------------------------------------------------------
---------------------------------------
make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss'
/bin/sh ../libtool --tag=CC --mode=link gcc  -O2   -o aaindexextract
aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la
../ajax/libajax.la ../plplot/libplplot.la -lX11  -lm 
gcc -O2 -o .libs/aaindexextract aaindexextract.o
../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so
../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm -Wl,--rpath
-Wl,/usr/local/EMBOSS-3.0.0/lib
/usr/bin/ld: cannot find -lX11
collect2: ld returned 1 exit status
make[2]: *** [aaindexextract] Error 1
make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
make: *** [install-recursive] Error 1
----------------------------------------------------------------------------
---------------------------------------
Jean


From idrummon at receptor.mgh.harvard.edu  Mon Jul 25 12:25:28 2005
From: idrummon at receptor.mgh.harvard.edu (Iain Drummond)
Date: Mon, 25 Jul 2005 12:25:28 -0400
Subject: [EMBOSS] How to exclude X11 when Compile Emboss
In-Reply-To: <200507251520.j6PFKdN93765833@helix.nih.gov>
Message-ID: <BF0A8AB8.C02F%idrummon@receptor.mgh.harvard.edu>

Jean,

Either tell emboss where to find the X11 libraries during the ./configure
step:

X features:
  --x-includes=DIR    X include files are in DIR
  --x-libraries=DIR   X library files are in DIR

for example

 ./configure  --x-includes=/usr/local/includes  --x-libraries=/usr/local/lib


or

decide not to use X11 at all

./configure --without-x

You can get this info by typing

./configure --help

Iain Drummond
-- 

Iain Drummond, Ph.D.
Assistant Professor
Department of Medicine, Harvard Medical School and
Renal Unit, Massachusetts General Hospital

Mailing address:
Renal Unit / MGH 149-8000
149 13th St. 
Charlestown, MA 02129

Tel: 617 726 5647
Fax: 617 726 5669

idrummond at partners.org
idrummon at receptor.mgh.harvard.edu

Lab Home Page:
http://danio.mgh.harvard.edu

> From: "Jean Mao" <maoj at helix.nih.gov>
> Organization: CIT
> Reply-To: maoj at helix.nih.gov
> Date: Mon, 25 Jul 2005 11:20:39 -0400
> To: <emboss at emboss.open-bio.org>
> Subject: [EMBOSS] How to exclude X11 when Compile Emboss
> 
> 
>> Hello all,
>> 
>> I am building emboss package on our linux cluster. Since it will be for
>> multiple batch run purpose, there is no need for us to include X11. I got
>> the following error during 'make install'. Can someone tell me which
>> programs use X11 and how to turn it off in them before running 'make
>> install'? Many thanks!!!
>> -------------------------------------------------------
>> make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss'
>> /bin/sh ../libtool --tag=CC --mode=link gcc  -O2   -o aaindexextract
>> aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la
>> ../ajax/libajax.la ../plplot/libplplot.la -lX11  -lm
>> gcc -O2 -o .libs/aaindexextract aaindexextract.o
>> ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so
>> ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm
>> -Wl,--rpath -Wl,/usr/local/EMBOSS-3.0.0/lib
>> /usr/bin/ld: cannot find -lX11
>> collect2: ld returned 1 exit status
>> make[2]: *** [aaindexextract] Error 1
>> make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
>> make[1]: *** [install-recursive] Error 1
>> make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
>> make: *** [install-recursive] Error 1
>> ----------------------------------------------------------
>> Jean
>> 
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at emboss.open-bio.org
> http://newportal.open-bio.org/mailman/listinfo/emboss


From david at compbio.dundee.ac.uk  Tue Jul 26 11:02:51 2005
From: david at compbio.dundee.ac.uk (David Martin)
Date: Tue, 26 Jul 2005 16:02:51 +0100
Subject: [EMBOSS] dbxflat woes
Message-ID: <BF0C0F2B.13198%david@compbio.dundee.ac.uk>

I am trying to run dbxflat on uniprot (sprot/trembl/tremblnew) and it gets
most of the way through the second file then repeatably fails with the
error:

Processing file ./sprot.dat
Processing file ./trembl.dat

   EMBOSS An error in ajindex.c at line 811:
Something has unlocked the PRI root cache page


Any hints on what I can do to avoid this? I am running as an unprivileged
user.

..d


From ableasby at hgmp.mrc.ac.uk  Tue Jul 26 11:55:19 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Tue, 26 Jul 2005 16:55:19 +0100 (BST)
Subject: [EMBOSS] dbxflat woes
Message-ID: <200507261555.j6QFtJdq005430@bromine.hgmp.mrc.ac.uk>

>Something has unlocked the PRI root cache page

With an error like that the first thing to check is if
you've set CACHESIZE too small. The docs recommend that
it's set to 200. If that isn't the problem then
email me with your settings for:

a) PAGESIZE
b) CACHESIZE
c) Resource definition

from emboss.default and also email me with the command line
you are using.

Rgds

Alan


From pmr at ebi.ac.uk  Wed Jul 27 06:04:10 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 27 Jul 2005 11:04:10 +0100
Subject: [EMBOSS] Database indexing logfiles
Message-ID: <42E75C1A.5010606@ebi.ac.uk>

Some questions for those who index their own databases in EMBOSS...

I am adding an output file to the programs to log information from the 
indexing run. A sample for indexing the "tembl" test database is included 
below (data files are in the test/embl directory).

Is this useful?

What other information would you like to see?

Can we improve the format of the report?

regards,

Peter Rice

%cat outfile.dbiflat
########################################
# Program: dbiflat
# Rundate: Wed Jul 27 2005 11:02:22
# Dbname: EMBL
# Release: 0.0
# Date: 00/00/00
# IndexDirectory: ./
# Maxindex: 0
# Fields: 6
#   Field 1: id
#   Field 2: acnum
#   Field 3: seqvn
#   Field 4: des
#   Field 5: keyword
#   Field 6: taxon
# Directory: ./
# Filenames: *.dat
# Exclude:
# Files: 10
#   File 1: ./est.dat
#   File 2: ./fun.dat
#   File 3: ./hum1.dat
#   File 4: ./inv.dat
#   File 5: ./pln.dat
#   File 6: ./pro.dat
#   File 7: ./rod.dat
#   File 8: ./sts.dat
#   File 9: ./vrl.dat
#   File 10: ./vrt.dat
########################################

processing filename 'est.dat' ... 1 entries
processing filename 'fun.dat' ... 1 entries
processing filename 'hum1.dat' ... 18 entries
processing filename 'inv.dat' ... 3 entries
processing filename 'pln.dat' ... 3 entries
processing filename 'pro.dat' ... 9 entries
processing filename 'rod.dat' ... 3 entries
processing filename 'sts.dat' ... 1 entries
processing filename 'vrl.dat' ... 1 entries
processing filename 'vrt.dat' ... 4 entries
Index acnum maxlen 8 items 88
Index seqvn maxlen 10 items 132
Index des maxlen 19 items 422
Index keyword maxlen 44 items 96
Index taxon maxlen 27 items 535
Total 10 files 44 entries
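
One use for such a logfile is an automatic sanity check after an indexing
run. The sketch below is in Python and assumes the line formats stay
exactly as in the sample above ("processing filename '...' ... N entries"
and "Total N files N entries"); check_dbiflat_log is a hypothetical helper,
not something shipped with EMBOSS.

```python
import re

SAMPLE = """\
processing filename 'est.dat' ... 1 entries
processing filename 'fun.dat' ... 1 entries
processing filename 'hum1.dat' ... 18 entries
processing filename 'inv.dat' ... 3 entries
processing filename 'pln.dat' ... 3 entries
processing filename 'pro.dat' ... 9 entries
processing filename 'rod.dat' ... 3 entries
processing filename 'sts.dat' ... 1 entries
processing filename 'vrl.dat' ... 1 entries
processing filename 'vrt.dat' ... 4 entries
Total 10 files 44 entries
"""

FILE_LINE = re.compile(r"^processing filename '([^']+)' \.\.\. (\d+) entries\s*$", re.M)
TOTAL_LINE = re.compile(r"^Total (\d+) files (\d+) entries\s*$", re.M)


def check_dbiflat_log(text):
    """Return (per-file entry counts, True if they agree with the Total line)."""
    per_file = {name: int(count) for name, count in FILE_LINE.findall(text)}
    total = TOTAL_LINE.search(text)
    if total is None:
        return per_file, False
    files, entries = int(total.group(1)), int(total.group(2))
    consistent = files == len(per_file) and entries == sum(per_file.values())
    return per_file, consistent


counts, ok = check_dbiflat_log(SAMPLE)
print(len(counts), sum(counts.values()), ok)
```

A truncated log (no Total line) or a count that does not add up is
reported as inconsistent rather than raising an error.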


From smiddha at indiana.edu  Wed Jul 27 16:28:56 2005
From: smiddha at indiana.edu (Sumit Middha)
Date: Wed, 27 Jul 2005 15:28:56 -0500
Subject: [EMBOSS] EMBOSS explorer
In-Reply-To: <1121980112.5376.11.camel@incognito.invalid>
References: <1121980112.5376.11.camel@incognito.invalid>
Message-ID: <1122496136.42e7ee8815bd3@webmail.iu.edu>


Hi,
It's great to hear of the interface. I want to install it to my own directories
(possibly the same where I untar everything) and then I will manage to point my
web-pages or cgi etc to these. But I am not sure how to achieve that.

This is my attempt at installation. Can someone help me with this? Thanks.


> ./install
installing EMBOSS Explorer perl modules...

Checking if your kit is complete...
Looks good
Writing Makefile for EMBOSS::GUI
cp lib/EMBOSS/ACD.pm blib/lib/EMBOSS/ACD.pm
cp lib/EMBOSS/GUI.pm blib/lib/EMBOSS/GUI.pm
cp lib/EMBOSS/GUI/Conf.pm blib/lib/EMBOSS/GUI/Conf.pm
cp lib/EMBOSS/GUI/XHTML.pm blib/lib/EMBOSS/GUI/XHTML.pm
Manifying blib/man3/EMBOSS::GUI.3
Manifying blib/man3/EMBOSS::ACD.3
Manifying blib/man3/EMBOSS::GUI::Conf.3
Manifying blib/man3/EMBOSS::GUI::XHTML.3
Warning: You do not have permissions to install into
/usr/local/lib/perl5/site_perl/5.8.5/sun4-solaris at
/usr/local/lib/perl5/5.8.5/ExtUtils/Install.pm line 114.
mkdir /usr/local/lib/perl5/site_perl/5.8.5/EMBOSS: Permission denied at
/usr/local/lib/perl5/5.8.5/ExtUtils/Install.pm line 176
*** Error code 255
make: Fatal error: Command failed for target `pure_site_install'


Quoting Luke McCarthy <lukem at gene.pbi.nrc.ca>:

> Hi everybody,
> 
> I'm pleased to finally announce a new release of the EMBOSS interface
> formerly known as EMBOSS::GUI, now known as EMBOSS explorer.
> 
> Development has moved to SourceForge.net and the new home page for the
> interface is http://embossgui.sourceforge.net/  It's quite spartan at
> the moment, but I'll be adding a FAQ as questions are frequently asked
> (and answered...)
> 
> You can download EMBOSS explorer at
> http://prdownloads.sourceforge.net/embossgui/emboss-explorer-2.0.0.tar.gz?download
> 
> The new release has been tested against EMBOSS-3.0.0, but not
> thoroughly.  Please report bugs using the bug tracker at
> http://sourceforge.net/tracker/?atid=699414&group_id=124389&func=browse
> (as a last resort, email them to mccarthy at users.sourceforge.net, but I'm
> hoping that use of the bug tracker will help with duplicate reports and
> other organizational issues...)
> 
> Cheers,
> 
> Luke


From lukem at gene.pbi.nrc.ca  Wed Jul 27 17:05:46 2005
From: lukem at gene.pbi.nrc.ca (Luke McCarthy)
Date: Wed, 27 Jul 2005 15:05:46 -0600
Subject: [EMBOSS] EMBOSS explorer
In-Reply-To: <1122496136.42e7ee8815bd3@webmail.iu.edu>
References: <1121980112.5376.11.camel@incognito.invalid>
	<1122496136.42e7ee8815bd3@webmail.iu.edu>
Message-ID: <1122498346.25556.7.camel@incognito.invalid>

On Wed, 2005-07-27 at 14:28, Sumit Middha wrote:
> Hi,
> It's great to hear of the interface. I want to install it to my own directories
> (possibly the same where I untar everything) and then I will manage to point my
> web-pages or cgi etc to these. But I am not sure how to achieve that.
> 
> This is my attempt at installation. Can someone help me with this? Thanks.

At the moment, you can't use the install script to install to your local
directories.  You'd have to do quite a bit of extra setup anyway, to
make sure the web server could find (and had permission to access) the
library files in your own directory.

That being said, you can install the Perl modules like you would any
others:

	perl Makefile.PL
	make
	make install

You'll have to pass the appropriate options to Makefile.PL in order to
install to your own directory.

Alternatively, you can just run everything out of the untarred
directory.  You'll have to make sure that the web server is looking for
perl modules in the emboss-explorer/lib directory, and you'll have to
link appropriately to the html and cgi directories.  The webserver user
needs to be able to read everything in the lib, html and cgi
directories, and to be able to execute the script in the cgi directory,
and to be able to write to the html/output directory.  I assume that you know
how to set up your webserver accordingly (or you wouldn't be asking...) 
You'll also have to edit emboss-explorer/lib/EMBOSS/GUI/Conf.pm and fill
in the correct locations.  Good luck.

Cheers,

Luke


From pmr at ebi.ac.uk  Thu Jul 28 12:14:06 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 28 Jul 2005 17:14:06 +0100
Subject: [EMBOSS] Database indexing logfiles
In-Reply-To: <42E75C1A.5010606@ebi.ac.uk>
References: <42E75C1A.5010606@ebi.ac.uk>
Message-ID: <42E9044E.4090707@ebi.ac.uk>

After comments on this list, I have updated the dbiflat logfile.

It now includes:

Field names are the short names used by the USA (the index file names still 
work on the dbiflat commandline). These are the same names as SRS uses in its 
commandline queries.

Numbers of tokens for each index field in each file, and total unique values 
in each field index.

Full paths for all directories (including the current working directory)

Today's date (also written to the index file headers) if no date is given.

The full commandline - if there were prompts with non-default replies these 
will be included in the commandline reported. This uses new ACD functions that 
can be used to report in other programs. Any special requests for this 
information in other outputs?

regards,

Peter

> %cat outfile.dbiflat
> ########################################
> # Program: dbiflat
> # Rundate: Thu Jul 28 2005 17:04:58
> # Dbname: EMBL
> # Release: 0.0
> # Date: 28/07/05
> # CurrentDirectory: /homes/pmr/hgmp/test/embl/
> # IndexDirectory: ./
> # IndexDirectoryPath: /homes/pmr/hgmp/test/embl/
> # Maxindex: 0
> # Fields: 6
> #   Field 1: id
> #   Field 2: acc
> #   Field 3: sv
> #   Field 4: des
> #   Field 5: key
> #   Field 6: org
> # Directory: ./
> # DirectoryPath: /homes/pmr/hgmp/test/embl/
> # Filenames: *.dat
> # Exclude: 
> # Files: 10
> #   File 1: ./est.dat
> #   File 2: ./fun.dat
> #   File 3: ./hum1.dat
> #   File 4: ./inv.dat
> #   File 5: ./pln.dat
> #   File 6: ./pro.dat
> #   File 7: ./rod.dat
> #   File 8: ./sts.dat
> #   File 9: ./vrl.dat
> #   File 10: ./vrt.dat
> ########################################
> # Commandline: dbiflat
> #    -fields acnum,seqvn,des,keyword,taxon
> #    -dbname EMBL
> #    -idformat embl
> #    -auto
> ########################################
> 
> processing filename 'est.dat' ... 1 entries
>    acc 1
>     sv 3
>    des 15
>    key 1
>    org 14
> processing filename 'fun.dat' ... 1 entries
>    acc 1
>     sv 3
>    des 8
>    key 1
>    org 9
> processing filename 'hum1.dat' ... 18 entries
>    acc 53
>     sv 54
>    des 200
>    key 43
>    org 252
> processing filename 'inv.dat' ... 3 entries
>    acc 3
>     sv 9
>    des 20
>    key 3
>    org 33
> processing filename 'pln.dat' ... 3 entries
>    acc 7
>     sv 9
>    des 19
>    key 6
>    org 54
> processing filename 'pro.dat' ... 9 entries
>    acc 13
>     sv 27
>    des 77
>    key 28
>    org 54
> processing filename 'rod.dat' ... 3 entries
>    acc 3
>     sv 9
>    des 28
>    key 1
>    org 45
> processing filename 'sts.dat' ... 1 entries
>    acc 1
>     sv 3
>    des 12
>    key 7
>    org 14
> processing filename 'vrl.dat' ... 1 entries
>    acc 2
>     sv 3
>    des 10
>    key 1
>    org 5
> processing filename 'vrt.dat' ... 4 entries
>    acc 4
>     sv 12
>    des 33
>    key 5
>    org 55
> 
> Index acc maxlen 8 items 84
> Index sv maxlen 10 items 90
> Index des maxlen 19 items 215
> Index key maxlen 44 items 81
> Index org maxlen 27 items 116
> 
> Total 10 files 44 entries


From john8376 at uidaho.edu  Fri Jul 29 15:08:50 2005
From: john8376 at uidaho.edu (Audra Johnson)
Date: Fri, 29 Jul 2005 12:08:50 -0700
Subject: [EMBOSS] Using seqret to fetch from .nal index databases
Message-ID: <5C75DDA3-04A4-4A58-B925-31F9F017D8C4@uidaho.edu>

Apologies for the length, but I want to be thorough.  I'm doing blast  
searches and then trying to fetch the sequences from our genembl  
database using seqret.  For example:

blastall -p tblastn -d /gcgdata_10.3/gcgblast/genembl -i  
dp00061_disordered_115_168.fasta

Gives me results of:

GB_PR:HUMRPA70KD        2e-08   412     573     1       54      54
GB_PR:BC018126  2e-08   386     547     1       54      54
GB_PAT:AX335048 2e-08   412     573     1       54      54
GB_PAT:AR175924 2e-08   412     573     1       54      54
GB_RO:BC019119  0.003   399     584     1       53      62

I've tried giving seqret the same database name I give blastall, and  
also naming the genembl.nal file explicitly:

$ seqret
Reads and writes (returns) sequences
Input sequence(s): /gcgdata_10.3/gcgblast/genembl.nal:HUMRPA70KD
Error: Unable to read sequence '/gcgdata_10.3/gcgblast/ 
genembl.nal:HUMRPA70KD'
Input sequence(s): /gcgdata_10.3/gcgblast/genembl:HUMRPA70KD
Error: failed to open filename '/gcgdata_10.3/gcgblast/genembl'
Error: Unable to read sequence '/gcgdata_10.3/gcgblast/ 
genembl:HUMRPA70KD'
Died: seqret terminated: Bad value for '-sequence' and no more retries

But neither works.  (I've omitted the beginning prefix GB_PR: and  
similar prefixes, but I've tried that way and it doesn't work,  
either.)  Is there any way to get seqret functioning with these  
databases?

-- Audra Johnson, University of Idaho


From golharam at umdnj.edu  Fri Jul 29 15:27:51 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 29 Jul 2005 15:27:51 -0400
Subject: [EMBOSS] Using seqret to fetch from .nal index databases
In-Reply-To: <5C75DDA3-04A4-4A58-B925-31F9F017D8C4@uidaho.edu>
Message-ID: <008e01c59473$9cf21d70$2f01a8c0@GOLHARMOBILE1>

If you are using an NCBI-formatted database, why not just use fastacmd
from the NCBI toolkit to extract the sequence?


-----Original Message-----
From: emboss-bounces at emboss.open-bio.org
[mailto:emboss-bounces at emboss.open-bio.org] On Behalf Of Audra Johnson
Sent: Friday, July 29, 2005 3:09 PM
To: emboss at emboss.open-bio.org
Subject: [EMBOSS] Using seqret to fetch from .nal index databases


Apologies for the length, but I want to be thorough.  I'm doing blast  
searches and then trying to fetch the sequences from our genembl  
database using seqret.  For example:

blastall -p tblastn -d /gcgdata_10.3/gcgblast/genembl -i  
dp00061_disordered_115_168.fasta

Gives me results of:

GB_PR:HUMRPA70KD        2e-08   412     573     1       54      54
GB_PR:BC018126  2e-08   386     547     1       54      54
GB_PAT:AX335048 2e-08   412     573     1       54      54
GB_PAT:AR175924 2e-08   412     573     1       54      54
GB_RO:BC019119  0.003   399     584     1       53      62

I've tried giving seqret just the database name I pass to blastall,
and also naming the genembl.nal file explicitly:

$ seqret
Reads and writes (returns) sequences
Input sequence(s): /gcgdata_10.3/gcgblast/genembl.nal:HUMRPA70KD
Error: Unable to read sequence '/gcgdata_10.3/gcgblast/genembl.nal:HUMRPA70KD'
Input sequence(s): /gcgdata_10.3/gcgblast/genembl:HUMRPA70KD
Error: failed to open filename '/gcgdata_10.3/gcgblast/genembl'
Error: Unable to read sequence '/gcgdata_10.3/gcgblast/genembl:HUMRPA70KD'
Died: seqret terminated: Bad value for '-sequence' and no more retries

But neither works.  (I've omitted the beginning prefix GB_PR: and  
similar prefixes, but I've tried that way and it doesn't work,  
either.)  Is there any way to get seqret functioning with these  
databases?

-- Audra Johnson, University of Idaho
_______________________________________________
EMBOSS mailing list
EMBOSS at emboss.open-bio.org
http://newportal.open-bio.org/mailman/listinfo/emboss


From Andrew.Mather at dpi.vic.gov.au  Sat Jul 30 07:30:47 2005
From: Andrew.Mather at dpi.vic.gov.au (Andrew.Mather at dpi.vic.gov.au)
Date: Sat, 30 Jul 2005 21:30:47 +1000
Subject: [EMBOSS] EMBOSS GUI problems
Message-ID: <OFB7158ECE.3815DC06-ONCA25704E.003F1E39-CA25704E.003F3EA2@nre.vic.gov.au>

Hi Luke and EMBOSS list 

I've installed the EMBOSS GUI and for the most part, it's working pretty 
well. 

However, for some apps (mainly alignment-type ones such as water,
needle and emma, though that may just be because I've tried more of
those than any others), it always fails:

Error: Unable to read sequence &#39;&#39 
Died: water terminated: Bad value for &#39-asequence&#39 with -auto 
defined 
water exited with status 1...

or in the /var/www/html/EMBOSS/runs/ error log, 

Error: Unable to read sequence '' 
Died: water terminated: Bad value for '-asequence' with -auto defined 
water exited with status 1... 

It doesn't seem to matter if it's sequence data pasted in, or uploaded 
from a file. 

Some apps work fine, so  I'm guessing it's not a fundamental problem like 
permissions on a temp directory or something. 

Are you able to point me at where to start looking?

Thanks,
Andrew

 
Animal Genetics and Genomics, PIRVic Attwood
475 Mickleham Road, Attwood, 3049
ph +61 3 92174342
mob  0413 009 761


----------------
There are 10 kinds of people...those who understand binary and those who 
don't.


From ableasby at hgmp.mrc.ac.uk  Thu Jul 14 23:43:30 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Fri, 15 Jul 2005 00:43:30 +0100 (BST)
Subject: [EMBOSS] EMBOSS 3.0.0 released
Message-ID: <200507142343.j6ENhUn2002328@bromine.hgmp.mrc.ac.uk>

EMBOSS 3.0.0 is now available for download from:

   ftp://emboss.open-bio.org/pub/EMBOSS/

   and, until the 27th July, from:
   ftp://ftp.rfcgr.mrc.ac.uk/pub/EMBOSS/

The following text details some of the changes from the previous
release.

Alan


EMBOSS main package:

New database indexing programs dbxflat, dbxfasta and dbxgcg. A
dbxblast program will be added if we can extract data from the new
BLAST formatdb output. These programs allow indexing of files
larger than 2Gb.
N.B.: Indexes will be created faster if they are written through a
      different disc controller than that used to read the database
      being indexed. If that is not possible then reading from and
      writing to different hard drives on the same controller is
      recommended. Note that each index can be created independently
      of the others e.g. you can create keyword and description
      indexes after you've created the ID and ACC indexes.

To support these programs, the emboss.default and .embossrc files can
include "resource" definitions. See the documentation of these
programs for more information. "resource" definitions are intended to
define anything other than environment variables and databases.

In the emboss.default and .embossrc files the same name can be used
for variables, databases, and resources (we now store them in separate
tables). In previous versions a single table was used and name clashes
could occur. This becomes an issue with the increasing use of resource
definitions.

Sequence sets in ACD have a new attribute "aligned" that reports
whether the sequences are aligned (reading a multiple alignment in for
visualisation) or not (reading a set of sequences into memory for
further processing - perhaps for alignment).

Sequence formats have been reviewed. "experiment" format is that used
by the Staden package. "staden" and "gcg" formats now parse out
comments from anywhere in the sequence. "nexus" and "nexusnon" formats
now correctly report protein sequence datatypes. "nbrf" or "pir"
format data can now be read from an SRSWWW server (for technical
reasons, SRS servers are unable to exactly reproduce NBRF/PIR
format). "clustal" output no longer writes in blocks of 10.  "Phylip3"
output is now renamed "phylipnon" for compatibility with other
non-interleaved output format names. The "phylip3" name remains valid
for back-compatibility. The header record for phylipnon format has
been changed to that accepted by phylip 3.6 (no YF on the header line,
number of sequences specified). Sequence format information on the web
has been updated to reflect these changes.

Codon usage tables can be read in these formats (-format qualifier):
  "emboss",    "EMBOSS codon usage file",
        "All numbers read, #comments for extras"
  "cut",       "EMBOSS codon usage file",
        "Same as EMBOSS, output default format is 'cut'"
  "gcg",       "GCG codon usage file",
        "All numbers read, #comments for extras"
  "cutg",      "CUTG codon usage file",
        "All numbers (cutgaa) read or fraction calculated, extras added"
  "cutgaa",    "CUTG codon usage file with aminoacids",
        "Cutg with all numbers"
  "spsum",     "CUTG species summary file",
        "Number only, species and CDSs in header"
  "cherry",    "Mike Cherry codonusage database file",
        "GCG format with species and CDSs in header"
  "transterm", "TransTerm database file",
        "GCG format with no extras"
  "codehop",   "FHCRC codehop program codon usage file",
        "Freq only, extras at end"
  "staden",    "Staden package codon usage file with percentages",
        "Freq or number only, no extras"
  "numstaden", "Staden package codon usage file with numbers",
        "Number only, no extras. Can be read as 'staden'"

Any of these formats should be readable by default. Some files are
"readable" in more than one format (staden and numstaden for example
can both be read as "staden"). The extra names are used so we can
reuse them as output format names.

For output of codon usage tables, the same formats are available
(-oformat qualifier).

A new application codcopy (not codret because coderet is already an
EMBOSS program name) will convert from one format to another in the
same way as seqret converts sequence formats.

Coderet reports the number of CDS, mRNA and translation sequences.

Correction to sequence numbering for reversed nucleotide sequences in
alignments. Correction to sequence alignment functions returning
slightly suboptimal alignments.

The entrails program reports codon usage formats. Description of
report format entrails output improved. Entrails is built by "make
check" and is provided so that developers of wrappers can obtain all
EMBOSS internal details needed, for example all ACD datatypes and
input/output format names and descriptions.

Sequence types are explicitly set in cons, sixpack and backtranseq as
some output formats failed to recognise them as protein.

EMBASSY packages:

MYEMBOSS is a new EMBASSY package for developing your own code.

Installation requires recent versions of GNU packages autoconf,
automake and libtool.

To install, you must first build the configure and make files with
these commands:

aclocal -I m4

autoconf

automake -a

When you add your own programs, do so by adding source files in
myemboss/source and ACD files in myemboss/emboss_acd and add these
filenames to the Makefile.am files in each directory. There are
"myseq" and "mytest" examples provided to guide you.

There is no need to modify configure or Makefile files - these will be
automatically updated.

To allow MYEMBOSS to be installed by one user, and linked to an EMBOSS
installation maintained for the site by someone else, new variables
are added to locate the ACD files for any EMBASSY package. If myemboss
is not installed in the same place as EMBOSS, define
EMBOSS_MYEMBOSSROOT as the location of the myemboss installed ACD
files or the myemboss/emboss_acd source directory. This requires that
EMBASSY programs call the embInitP function with the name of the
package ("myemboss"). For ACD utilities such as acdvalid or acdc to
work, as these use the EMBOSS embInit call, another variable
EMBOSS_ACDUTILROOT must be defined, pointing to the same directory.

PHYLIP is a beta release port of PHYLIP 3.6b. We welcome comments on
the EMBOSS interface to the programs. Program names are prefixed by
'f' to avoid clashes with the old PHYLIP EMBASSY package. We still
need to work on adding new tree input and output formats, and updating
the code to PHYLIP 3.63 (December 2004). We are also considering
splitting more of the programs to simplify the ACD interface. In this
release seqboot and treedist are already split. seqboot is split by
input type into seqboot, restboot, discboot and freqboot. Treedist is
split by the number of input files into treedist and
treedistpair. Acdvalid objects to the dependencies in other programs,
for example the method used by fdnadist.

The DOMAINATRIX package of earlier releases has been extended and
replaced by 5 EMBASSY packages described below (32 applications in
total).  These tools were developed as part of a research project and
are distinct from other EMBOSS apps in being intended mostly for
computational biologists rather than biologist end-users.

STRUCTURE

The STRUCTURE package is used for parsing the PDB database and
generating secondary databases of coordinate and derived data.  The
tools have the following scope: (i) For parsing PDB files and writing
clean coordinate files (CCF files) that "clean-up" many PDB
inconsistencies.  For example, residue numbers give the correct index
into the biological sequence.  (ii) To generate CCF files for whole
PDB files or individual domains from the SCOP and CATH databases.
(iii) To augment CCF files with residue solvent accessibility and
secondary structure data.  (iv) To generate contact files (CON files)
of intra-chain and inter-chain residue-residue contact data. (v) To
generate CON files of residue-ligand contact data. (vi) Miscellaneous
file handling, e.g. dictionary of heterogen groups.

DOMAINATRIX

The DOMAINATRIX package is used for handling the SCOP and CATH
databases of protein domain classification, the parsable files of
which can be inconvenient, e.g. for comparative studies, extending and
processing.  The tools have the following scope: (i) For parsing raw
SCOP and CATH parsable files and writing domain classification files
(DCF files) with a single, simple and extensible format. (ii) To add
sequence records to a DCF file. (iii) To remove low resolution
domains.  (iv) To flexibly calculate and remove redundancy.  (v)
Primitive tools for secondary structure element mapping to domains in
a DCF file.

DOMALIGN

The DOMALIGN package is used for generating alignments for families of
domains, especially across large datasets, e.g. the whole of SCOP.
The tools have the following scope: (i) For identifying representative
structures for different nodes in the SCOP and CATH hierarchies.  (ii)
For generating annotated, structure-based sequence alignments for
these nodes.  (iii) For extending these domain alignment files (DAF
files) with sequences of unknown structure. (iv) All-versus-all global
sequence alignment.

DOMSEARCH 

The DOMSEARCH package is used for deriving extended sequence families,
especially from large structural datasets such as the whole of SCOP.
The tools have the following scope: (i) To generate domain hits files
(DHF files) of sequence relatives to an alignment or other
sequences. (ii) To remove fragmentary sequences from a DHF file.
(iii) To flexibly calculate and remove redundancy.  (iv) To remove
hits of ambiguous classification and collate sequences into
families.

SIGNATURE

The SIGNATURE package is used for generating, scanning and evaluating
sparse signatures and other predictive elements for protein sequence
characterisation.  The tools have the following scope: (i) To generate
sparse signatures for protein families from alignments and residue
contact data.  (ii) Generate other types of discriminator (e.g. HMMs)
from alignments. (iii) Generate ligand-binding signatures from
residue-ligand contacts.  (iv) Generate domain hits files (DHF files)
and ligand hits files (LHF files) of hits (sequences) from signature
scans. (v) Interpretation and display of signature performance by
using ROC analysis.


Where data, files etc are mentioned above or in the application
documentation, data structures and functions for manipulating such are
usually provided in the AJAX and NUCLEUS C programming libraries.  For
example, there are objects for handling protein atoms, residues,
chains, for SCOP and CATH domains and so on.


From thiago.venancio at gmail.com  Mon Jul 18 12:09:33 2005
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Mon, 18 Jul 2005 09:09:33 -0300
Subject: [EMBOSS] error msg
Message-ID: <44255ea80507180509386875bd@mail.gmail.com>

Hi all.
I am new to EMBOSS. I have installed it and get the following problem:

"wossname: error while loading shared libraries: libnucleus.so.3: cannot open shared object file: No such file or directory"

All the EMBOSS programs give the same error.

The installation process went OK and I have set the environment variables.

Thanks in advance.

Thiago


From golharam at umdnj.edu  Tue Jul 19 16:51:30 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 19 Jul 2005 12:51:30 -0400
Subject: [EMBOSS] EMBOSS::GUI Web Interface
Message-ID: <009401c58c82$1d2a3670$2f01a8c0@GOLHARMOBILE1>

Hi Luke,

Any word on when EMBOSS-GUI will be available for EMBOSS 3.0.0?

Thanks,

Ryan


From jacob at biochemistry.ucl.ac.uk  Wed Jul 20 15:36:24 2005
From: jacob at biochemistry.ucl.ac.uk (Jacob Hurst)
Date: Wed, 20 Jul 2005 16:36:24 +0100 (BST)
Subject: [EMBOSS] problem with using accession number....
Message-ID: <Pine.LNX.4.44.0507201629480.21438-100000@localhost.localdomain>

Hello,

If I enter the following ID, seqret correctly returns the sequence.

acrm3<113>% seqret embl:hsgstpig
Reads and writes (returns) sequences
Output sequence [hsgstpig.fasta]:

However, if I enter the corresponding accession number, it fails:

acrm3<114>% seqret embl:X08058
Reads and writes (returns) sequences
Error: Unable to read sequence 'embl:X08058'
Died: seqret terminated: Bad value for '-sequence' and no prompt

I was under the impression that EMBOSS was set up to deal with both
accession numbers and IDs.

regards Jake


-- 
Jacob Hurst Phd
Department of Biochemistry and Molecular Biology,
University College London


From pmr at ebi.ac.uk  Wed Jul 20 15:59:55 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 20 Jul 2005 16:59:55 +0100
Subject: [EMBOSS] problem with using accession number....
In-Reply-To: <Pine.LNX.4.44.0507201629480.21438-100000@localhost.localdomain>
References: <Pine.LNX.4.44.0507201629480.21438-100000@localhost.localdomain>
Message-ID: <42DE74FB.80604@ebi.ac.uk>

Jacob Hurst wrote:
> I was under the impression that emboss was setup to deal with both 
> accession and id. 

Yes, but ... this depends on how the embl database is defined at your site.

Some sites have databases defined to access entries through, for example, a 
URL or an external application (or script) that can only search for entry names.

Hmmmm .... we could add a little more information on this in showdb .... for a 
future release.

If you have difficulty finding out how the database is defined, mail us at 
emboss-bug at emboss.open-bio.org and we can help you track it down.

regards,

Peter Rice


From golharam at umdnj.edu  Thu Jul 21 04:00:03 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 21 Jul 2005 00:00:03 -0400
Subject: [EMBOSS] EMBOSS 3.0.0 RPMs available
Message-ID: <013801c58da8$acc10440$2f01a8c0@GOLHARMOBILE1>

I'm eager to upgrade our installation of EMBOSS on all our linux
workstations, so I've gone ahead and built RPMs for EMBOSS (based on
biolinux version) and MYEMBOSS applications.

You can download the RPMs and source RPMs from
http://serine.umdnj.edu/~golharam/biorpms.

They include (sorry for the capitalization):

DOMAINATRIX
DOMALIGN
DOMSEARCH
EMBOSS
EMBOSS-data
EMBOSS-devel
EMBOSS-Jemboss
EMNU
ESIM4
HMMER
MEME
MSE
MYEMBOSS
PHYLIP
SIGNATURE
STRUCTURE
TOPO

--
Ryan Golhar  -  golharam at umdnj.edu
The Informatics Institute of UMDNJ


From james_tan79 at hotmail.com  Thu Jul 21 09:41:24 2005
From: james_tan79 at hotmail.com (JT)
Date: Thu, 21 Jul 2005 17:41:24 +0800
Subject: [EMBOSS] any DNA or RNA program similar to pepstat ?
Message-ID: <BAY14-DAV2D38F07B066414E7C775995D60@phx.gbl>

Hi,

Is there any program that can output a report of simple DNA/RNA sequence 
information including e.g.
a) Molecular weight
b) Number of residues
c) Average residue weight
d) %G, %C, %A, %T, %GC
e) Melting temp
f) charge etc.

Thanks
James 


From jison at hgmp.mrc.ac.uk  Thu Jul 21 10:49:58 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Thu, 21 Jul 2005 11:49:58 +0100
Subject: [EMBOSS] any DNA or RNA program similar to pepstat ?
References: <BAY14-DAV2D38F07B066414E7C775995D60@phx.gbl>
Message-ID: <42DF7DD6.CD81DF9B@hgmp.mrc.ac.uk>

Hi James

There's no single app that covers all of your requests, but some of the
following might help (see http://emboss.sourceforge.net/apps/):

dan         Plot melting temperatures for DNA. 
freak       Residue/base frequency table or plot 
extractfeat Extract features from a sequence 
geecee      Calculates the fractional GC content of nucleic acid sequences 
infoseq     Displays some simple information about sequences 
isochore    Plots isochores in large DNA sequences 
newcpgseek  Reports CpG rich regions 
remap       Display a sequence with restriction cut sites, translation etc.. 
showfeat    Show features of a sequence. 

Please have a look at what's available and if you require something 
else / new functionality etc please get back in touch.
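
For the simplest items on the list (number of residues and base
composition), a few lines of script can fill the gap in the meantime.
The following is only a minimal Python sketch and is not part of
EMBOSS; the function name base_stats is made up for illustration, and
molecular weight, melting temperature and charge are deliberately left
out.

```python
from collections import Counter


def base_stats(seq):
    """Return the length and percentage composition of a DNA/RNA sequence.

    Hypothetical helper, not an EMBOSS program. RNA input is handled by
    treating U as T. Ambiguity codes and gaps count towards the length
    but are not reported separately.
    """
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq)
    length = len(seq)
    stats = {"length": length}
    for base in "GCAT":
        # Guard against division by zero for an empty sequence.
        stats["%" + base] = 100.0 * counts[base] / length if length else 0.0
    stats["%GC"] = stats["%G"] + stats["%C"]
    return stats


print(base_stats("ATGCGCTA"))
```

For anything beyond simple counts, the applications listed above are
the better starting point.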

Cheers

Jon


JT wrote:
> 
> Hi,
> 
> Is there any program that can output a report of simple DNA/RNA sequence
> information including e.g.
> a) Molecular weight
> b) Number of residues
> c) Average residue weight
> d) %G, %C, %A, %T, %GC
> e) Melting temp
> f) charge etc.
> 
> Thanks
> James

-- 
Jon C. Ison, PhD
Proteomics Applications Group
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494500  Fax: +44 1223 494512
E-mail: jison at rfcgr.mrc.ac.uk  Web: http://www.rfcgr.mrc.ac.uk


From kertib at linuxlap.hu  Thu Jul 21 11:38:34 2005
From: kertib at linuxlap.hu (Kerti Balázs Gábor)
Date: Thu, 21 Jul 2005 13:38:34 +0200
Subject: [EMBOSS] Some question
Message-ID: <200507211338.35035.kertib@linuxlap.hu>

Hello!

I have some (elementary) questions, because I cannot find the solution
(maybe I am doing something wrong).

- How do I backtranslate a CDS mRNA fragment to a (c)DNA fragment?
- How do I generate an antisense DNA fragment from a sense one?

Thank you.

Balazs


From pmr at ebi.ac.uk  Thu Jul 21 11:58:58 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Jul 2005 12:58:58 +0100
Subject: [EMBOSS] Some question
In-Reply-To: <200507211338.35035.kertib@linuxlap.hu>
References: <200507211338.35035.kertib@linuxlap.hu>
Message-ID: <42DF8E02.40909@ebi.ac.uk>

Kerti Balázs Gábor wrote:

> There is some (elementary) question, because I do not find - maybe I do wrong 
> - the solution.
> 
> - how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ?

The cDNA will be identical to the mRNA. No backtranslation needed. 
Backtranslation (as in backtranseq) converts a protein sequence into a 
nucleotide sequence that will translate to the same protein sequence (using 
the most frequent codon for each amino acid).
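
To make the idea concrete, here is a rough Python sketch of "pick the
most frequent codon for each amino acid". It is not the backtranseq
source. The codon counts are invented purely for illustration and
cover only three amino acids; a real run would use a complete codon
usage table.

```python
# Hypothetical codon counts for three amino acids only. The numbers are
# invented for illustration and do not come from any real usage table.
CODON_COUNTS = {
    "M": {"ATG": 100},
    "K": {"AAA": 40, "AAG": 60},
    "F": {"TTT": 45, "TTC": 55},
}


def backtranslate(protein, codon_counts=CODON_COUNTS):
    """Replace each residue with its most frequent codon."""
    dna = []
    for residue in protein.upper():
        counts = codon_counts[residue]  # KeyError if residue is not in the table
        dna.append(max(counts, key=counts.get))
    return "".join(dna)


print(backtranslate("MKF"))
```

Translating the result gives back the original protein, but the
nucleotide sequence is only one of many that would do so.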

If you only want to convert U (Uracil) to T (thymine) to convert an RNA 
sequence to DNA (all EMBOSS programs will accept both as nucleotide input) you 
can modify the program seqret to specify a nucleotide sequence as input, and 
generate a DNA sequence as output. An easy way to start writing EMBOSS 
programs - copy one program and one ACD file and make 4 small edits.

> - how to generate antisense DNA fragm. from a sens.

In EMBOSS, revseq does this. The antisense strand is simply the reverse 
complement of the original.
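
Neither operation needs much code. As a rough Python sketch (not
EMBOSS source; the function names are made up here, and only the
unambiguous bases A, C, G and T are complemented), the U to T
conversion and the reverse complement look like this:

```python
# Complement table for the unambiguous DNA bases, in both cases.
# IUPAC ambiguity codes are left unchanged by str.translate.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def rna_to_dna(seq):
    """Convert an RNA sequence to DNA by replacing U with T."""
    return seq.replace("U", "T").replace("u", "t")


def reverse_complement(seq):
    """Return the antisense strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


dna = rna_to_dna("AUGGCU")
print(dna, reverse_complement(dna))
```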

Hope this helps,

Peter Rice


From jison at hgmp.mrc.ac.uk  Thu Jul 21 12:10:50 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Thu, 21 Jul 2005 13:10:50 +0100
Subject: [EMBOSS] Some question
References: <200507211338.35035.kertib@linuxlap.hu>
Message-ID: <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>

Dear Balazs

See http://emboss.sourceforge.net/apps/ for application documentation.

transeq       Translates nucleic acid sequences.   (i.e. DNA -> protein)
backtranseq   Back translate a protein sequence    (i.e. protein -> DNA)
coderet       Extract CDS, mRNA and translations from feature tables 

I don't think there is anything to interchange sense/antisense or mRNA / DNA
sequences but something could be written if you let us know exactly what you
need / why you need it.

Cheers

Jon


Kerti Balázs Gábor wrote:
> 
> Hello!
> 
> There is some (elementary) question, because I do not find - maybe I do wrong
> - the solution.
> 
> - how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ?
> - how to generate antisense DNA fragm. from a sens.
> 
> Thank you.
> 
> Balazs

-- 
Jon C. Ison, PhD
Proteomics Applications Group
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494500  Fax: +44 1223 494512
E-mail: jison at rfcgr.mrc.ac.uk  Web: http://www.rfcgr.mrc.ac.uk


From faruque at ebi.ac.uk  Thu Jul 21 13:08:30 2005
From: faruque at ebi.ac.uk (Nadeem Faruque)
Date: Thu, 21 Jul 2005 14:08:30 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu>
	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
Message-ID: <42DF9E4E.6060603@ebi.ac.uk>


> See http://emboss.sourceforge.net/apps/ for application documentation.
> 
> transeq       Translates nucleic acid sequences.   (i.e. DNA -> protein)
> backtranseq   Back translate a protein sequence    (i.e. protein -> DNA)
...

While backtranseq is very clever in predicting the cDNA sequence based on peptide sequence by choosing codons according 
to usage, would it not be very useful to have the option for it to return an answer in degenerate bases?

eg in human, the 'peptide' is simply 'M'
backtranseq returns the most likely codon used, ie 'ATG'
but since it could be TTG, CTG or ATG, it may be more useful for some people to return 'HTG'

Returning a degenerate sequence would have the advantage (for some uses) of being usable by normal DNA-savvy 
string-based search methods when finding the peptide coding location in nucleic acid sequences rather than having to use 
similarity searches.  I could also see it being useful for designing PCR primers within coding regions.
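
To show what I mean, here is a minimal Python sketch of such a
degenerate backtranslation. It is not an existing EMBOSS program and
the function names are made up. It assumes the standard genetic code
(NCBI translation table 1) and ignores alternative start codons, so
'M' comes out as plain 'ATG' rather than the 'HTG' of my example
above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguity symbols, keyed by the set of bases each one stands for.
IUPAC = {
    frozenset(bases): symbol
    for symbol, bases in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
        "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    }.items()
}


def degenerate_codon(amino_acid):
    """Merge all codons for one residue into a single IUPAC triplet."""
    codons = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    if not codons:
        raise ValueError("unknown residue: %r" % amino_acid)
    # For each codon position, look up the symbol covering every base seen.
    return "".join(
        IUPAC[frozenset(codon[i] for codon in codons)] for i in range(3)
    )


def degenerate_backtranslate(protein):
    """Backtranslate a peptide into a degenerate nucleotide sequence."""
    return "".join(degenerate_codon(aa) for aa in protein.upper())


print(degenerate_backtranslate("MSW"))
```

Merging position by position keeps the output a plain string, at the
price of some triplets matching more codons than the residue really
has.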

Nadeem

-- 
S.M. Nadeem N. Faruque
EMBL Nucleotide Database Curation Team
EMBL Outstation
Tel: +44 1223 494611                     Fax: +44 1223 494472
The European Bioinformatics Institute    URL: http://www.ebi.ac.uk/
Email for data submissions: datasubs at ebi.ac.uk
Email for updates: update at ebi.ac.uk
=============================================================================


From pmr at ebi.ac.uk  Thu Jul 21 14:00:30 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Jul 2005 15:00:30 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DF9E4E.6060603@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu>	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
	<42DF9E4E.6060603@ebi.ac.uk>
Message-ID: <42DFAA7E.2070107@ebi.ac.uk>

Nadeem Faruque wrote:

> While backtranseq is very clever in predicting the cDNA sequence based on peptide sequence by choosing codons according 
> to useage, would it not be very useful to have the option for it to return an answer in degenerate bases?
> 
> eg in human, the 'peptide' is simply 'M'
> backtranseq returns the most likely codon used, ie 'ATG'
> but since it could be TTG, CTG or ATG, it may be more useful for some people to return 'HTG'

Ummmm .... depends on the genetic code. In human I would expect ATG; in 
bacteria GTG is the second choice and NTG would be the possible result - but only 
for a start codon of course (just one of the complexities of backtranslating - 
I think we must avoid inventing a start codon if the protein doesn't start 
with 'M' because the numbering gets complicated).

As this would need a different input (a genetic code, rather than a codon 
usage file) I would make this a different program - not difficult to write.

Any good suggestions for a program name?

> Returning a degenerate sequence would have the advantage (for some uses) of being usable by normal DNA-savvy 
> string-based search methods when finding the peptide coding location in nucleic acid sequences rather than having to use 
> similarity searches.  I could also see it being useful for designing PCR primers within coding regions.

... which leads on to whether EMBOSS should include such programs :-)

regards,

Peter Rice


From jcherry at ncbi.nlm.nih.gov  Thu Jul 21 14:58:14 2005
From: jcherry at ncbi.nlm.nih.gov (Josh Cherry)
Date: Thu, 21 Jul 2005 10:58:14 -0400 (EDT)
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DFAA7E.2070107@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu>
	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
	<42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk>
Message-ID: <Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>


Nadeem Faruque wrote:

> Returning a degenerate sequence would have the advantage (for some uses)
> of being usable by normal DNA-savvy string-based search methods when
> finding the peptide coding location in nucleic acid sequences rather
> than having to use similarity searches.

But this won't work the way some might hope due to the nature of the
genetic code, specifically (in the standard code) the three amino acids
that have six codons each (S, L, and R).  Consider serine, encoded by UCN
and AGY.  Would you like this to be back-translated to WSN?  That matches
all six serine codons but also ten non-serine codons.  Some people may
still want to use it in a probe or primer though.
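
The arithmetic is easy to check by brute force. The following Python
sketch (standard genetic code assumed; the names are made up for this
example) expands WSN into every codon it matches and translates each
one:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Only the IUPAC symbols needed for this example.
EXPAND = {"W": "AT", "S": "CG", "N": "ACGT"}


def expand(degenerate):
    """List every concrete codon matched by a degenerate triplet."""
    return ["".join(c) for c in product(*(EXPAND[s] for s in degenerate))]


codons = expand("WSN")
serine = [c for c in codons if CODON_TABLE[c] == "S"]
others = {c: CODON_TABLE[c] for c in codons if CODON_TABLE[c] != "S"}

# Prints 16 matched codons, 6 of them serine, 10 not.
print(len(codons), len(serine), len(others))
print(sorted(others.items()))
```

The ten extras are the four threonine codons (ACN), two arginine
(AGR), two cysteine (TGY), tryptophan (TGG) and the TGA stop.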

Josh

--
Joshua L. Cherry, Ph.D.
NCBI/NLM/NIH (Contractor)
jcherry at ncbi.nlm.nih.gov


From faruque at ebi.ac.uk  Thu Jul 21 15:21:35 2005
From: faruque at ebi.ac.uk (Nadeem Faruque)
Date: Thu, 21 Jul 2005 16:21:35 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>
References: <200507211338.35035.kertib@linuxlap.hu>
	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk>
	<42DFAA7E.2070107@ebi.ac.uk>
	<Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>
Message-ID: <42DFBD7F.7060306@ebi.ac.uk>

Josh Cherry wrote:
> Nadeem Faruque wrote:
> 
> 
>>Returning a degenerate sequence would have the advantage (for some uses)
>>of being usable by normal DNA-savvy string-based search methods when
>>finding the peptide coding location in nucleic acid sequences rather
>>than having to use similarity searches.
> 
> 
> But this won't work the way some might hope due to the nature of the
> genetic code, specifically (in the standard code) the three amino acids
> that have six codons each (S, L, and R).  Consider serine, encoded by UCN
> and AGY.  Would you like this to be back-translated to WSN?  That matches
> all six serine codons but also ten non-serine codons.  Some people may
> still want to use it in a probe or primer though.

I was going to use Serine in my example but realised 'WSN' was a bit too degenerate to be a useful example.
I understand you could not roundtrip peptide->DNA->peptide with my suggested behaviour (as you can currently do with 
backtranseq), but you can do DNA->peptide->DNA in a usable form.
I'm sketchy about its potential use in oligo design, but given a degenerate backtranslation someone could possibly 
design oligos so as to avoid the more degenerate areas (esp for the 3' end of primers).  If they were to use backtranseq 
they would be ignorant of these regions.

Nadeem

-- 
S.M. Nadeem N. Faruque
EMBL Nucleotide Database Curation Team
EMBL Outstation
Tel: +44 1223 494611                     Fax: +44 1223 494472
The European Bioinformatics Institute    URL: http://www.ebi.ac.uk/
Email for data submissions: datasubs at ebi.ac.uk
Email for updates: update at ebi.ac.uk
=============================================================================


From pmr at ebi.ac.uk  Thu Jul 21 15:55:15 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Jul 2005 16:55:15 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DFBD7F.7060306@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu>	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
	<42DF9E4E.6060603@ebi.ac.uk>	<42DFAA7E.2070107@ebi.ac.uk>	<Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>
	<42DFBD7F.7060306@ebi.ac.uk>
Message-ID: <42DFC563.4010600@ebi.ac.uk>

Nadeem Faruque wrote:
> Josh Cherry wrote:
>>But this won't work the way some might hope due to the nature of the
>>genetic code, specifically (in the standard code) the three amino acids
>>that have six codons each (S, L, and R).  Consider serine, encoded by UCN
>>and AGY.  Would you like this to be back-translated to WSN?  That matches
>>all six serine codons but also ten non-serine codons.  Some people may
>>still want to use it in a probe or primer though.
> 
> I was going to use Serine in my example but realised 'WSN' was a bit too degenerate to be a useful example.
> I understand you could not roundtrip peptide->DNA->peptide with my suggested behaviour

... I bet you can!!!  Assuming you have a backtranslated sequence, WSN would 
surely be Serine (as would UCN or AGY). If any of the 3 positions is more 
specific, that could indicate one of the other possibilities.

I would be happy to accept a lower case residue if the result is uncertain (if 
the ambiguity codes do not match what one would expect from the genetic code 
in a backtranslation). For ASN the answer could be T (ACN) S (AGY) or R (AGR) 
with T ('t') the favourite by a majority vote (4/4 codons match, 2/6 for the 
others).

X can be used if all else fails. After all, we could be translating a sequence 
with a SNP. A command line option can give the user a choice of trying to 
resolve unclear positions or using X.
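
One possible reading of the rules above, as a Python sketch. This is an
assumption about how such a program might behave, not existing EMBOSS code:
the function names and the "resolve" switch are invented for illustration,
and ties in the majority vote are simply reported as X.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# IUPAC nucleotide ambiguity codes, DNA alphabet (T stands in for U).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
REVERSE = {frozenset(bases): code for code, bases in IUPAC.items()}

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
FAMILY_SIZE = Counter(CODE.values())  # number of codons per residue


def degenerate_codon(aa):
    """Most specific IUPAC codon covering every codon of one residue."""
    codons = [c for c, residue in CODE.items() if residue == aa]
    return "".join(REVERSE[frozenset(c[i] for c in codons)] for i in range(3))


# Codons that look exactly like a degenerate backtranslation (WSN, YTN, ...).
BACKTRANSLATED = {degenerate_codon(aa): aa for aa in FAMILY_SIZE}


def translate_ambiguous(codon, resolve=True):
    """Upper case if certain, lower case if won by majority vote, else X."""
    codon = codon.upper().replace("U", "T")
    if codon in BACKTRANSLATED:
        return BACKTRANSLATED[codon]
    hits = Counter(CODE["".join(p)] for p in product(*(IUPAC[b] for b in codon)))
    if len(hits) == 1:
        return next(iter(hits))
    if resolve:
        # Score = fraction of each residue's codons that the input matches.
        scores = {aa: Fraction(n, FAMILY_SIZE[aa]) for aa, n in hits.items()}
        best = max(scores.values())
        winners = [aa for aa, score in scores.items() if score == best]
        if len(winners) == 1:
            return winners[0].lower()
    return "X"


print(translate_ambiguous("WSN"))                 # S
print(translate_ambiguous("ASN"))                 # t (4/4 against 2/6 and 2/6)
print(translate_ambiguous("ASN", resolve=False))  # X
```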

Degenerate codons would be:

A GCN
C UGY
D GAY
E GAR
F UUY
G GGN
H CAY
I AUH
K AAR
L YUN (CUN/UUR) - also matches F (UUY)
M AUG
N AAY
P CCN
Q CAR
R MGN (CGN/AGR) - also matches S (AGY)
S WSN (UCN/AGY) - also matches T (ACN)
                   also matches R (AGR)
                   also matches C and W and * (UGN)
T ACN
V GUN
W UGG
Y UAY
* URR - also matches W (UGG)
m NUG (start codon)
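
The table above (apart from the start codon line) can be derived
mechanically from the standard genetic code. The following Python sketch
shows one way to do it; the function names are hypothetical and nothing
here is taken from EMBOSS source code.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes, DNA alphabet (T stands in for U).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
REVERSE = {frozenset(bases): code for code, bases in IUPAC.items()}

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def degenerate_codon(aa):
    """Most specific IUPAC codon covering every codon of one residue."""
    codons = [c for c, residue in CODE.items() if residue == aa]
    return "".join(REVERSE[frozenset(c[i] for c in codons)] for i in range(3))


def also_matches(aa):
    """Other residues hit by the degenerate codon of this residue."""
    expanded = product(*(IUPAC[b] for b in degenerate_codon(aa)))
    hits = {CODE["".join(p)] for p in expanded}
    return "".join(sorted(hits - {aa}))


for aa in sorted(set(AMINO)):
    codon = degenerate_codon(aa).replace("T", "U")  # show as RNA
    extra = also_matches(aa)
    print(aa, codon, f"- also matches {extra}" if extra else "")
```

Only L, R, S and the stop codon report extra matches, in agreement with the
notes in the table.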


From lukem at gene.pbi.nrc.ca  Thu Jul 21 21:08:32 2005
From: lukem at gene.pbi.nrc.ca (Luke McCarthy)
Date: Thu, 21 Jul 2005 15:08:32 -0600
Subject: [EMBOSS] EMBOSS explorer
Message-ID: <1121980112.5376.11.camel@incognito.invalid>

Hi everybody,

I'm pleased to finally announce a new release of the EMBOSS interface
formerly known as EMBOSS::GUI, now known as EMBOSS explorer.

Development has moved to SourceForge.net and the new home page for the
interface is http://embossgui.sourceforge.net/  It's quite spartan at
the moment, but I'll be adding a FAQ as questions are frequently asked
(and answered...)

You can download EMBOSS explorer at
http://prdownloads.sourceforge.net/embossgui/emboss-explorer-2.0.0.tar.gz?download

The new release has been tested against EMBOSS-3.0.0, but not
thoroughly.  Please report bugs using the bug tracker at
http://sourceforge.net/tracker/?atid=699414&group_id=124389&func=browse
(as a last resort, email them to mccarthy at users.sourceforge.net, but I'm
hoping that use of the bug tracker will help with duplicate reports and
other organizational issues...)

Cheers,

Luke


From gwilliam at hgmp.mrc.ac.uk  Fri Jul 22 08:21:40 2005
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 22 Jul 2005 09:21:40 +0100
Subject: [EMBOSS] backtranseq
References: <200507211338.35035.kertib@linuxlap.hu>	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
	<42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk>
Message-ID: <42E0AC94.63F132A7@hgmp.mrc.ac.uk>

Peter Rice wrote:
> 
> Nadeem Faruque wrote:
> 
> > While backtranseq is very clever in predicting the cDNA sequence based on peptide sequence by choosing codons according
> > to usage, would it not be very useful to have the option for it to return an answer in degenerate bases?
> >
> > eg in human, the 'peptide' is simply 'M'
> > backtranseq returns the most likely codon used, ie 'ATG'
> > but since it could be TTG, CTG or ATG, it may be more useful for some people to return 'HTG'
> 
> Ummmm .... depends on the genetic code. In human I would expect ATG, in
> bacteria GTG is second choice and NTG would be the possible result - but only
> for a start codon of course (just one of the complexities of backtranslating -
> I think we must avoid inventing a start codon if the protein doesn't start
> with 'M' because the numbering gets complicated).
> 
> As this would need a different input (a genetic code, rather than a codon
> usage file) I would make this a different program - not difficult to write,
> 
> Any good suggestions for a program name?

barebackseq

-- 
Gary Williams
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494522 
(UNTIL END OF JULY 2005)

E-mail: gareth.williams57 at ntlworld.com


From gbottu at ben.vub.ac.be  Fri Jul 22 09:10:17 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 22 Jul 2005 11:10:17 +0200
Subject: [EMBOSS] Some question
In-Reply-To: <42DF8E02.40909@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu> <42DF8E02.40909@ebi.ac.uk>
Message-ID: <20050722091017.GA27340@bigben.ulb.ac.be>

On Thu, Jul 21, 2005 at 12:58:58PM +0100, Peter Rice wrote:
> Kerti Balázs Gábor wrote:
> 
> > There is some (elementary) question, because I do not find - maybe I do wrong 
> > - the solution.
> > 
> > - how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ?
> 
> The cDNA will be identical to the mRNA. No backtranslation needed. 
> Backtranslation (as in backtranseq) converts a protein sequence into a 
> nucleotide sequence that will translate to the same protein sequence (using 
> the most frequent codon for each amino acid).
> 
> If you only want to convert U (Uracil) to T (thymine) to convert an RNA 
> sequence to DNA (all EMBOSS programs will accept both as nucleotide input) you 
> can modify the program seqret to specify a nucleotide sequence as input, and 
> generate a DNA sequence as output. An easy way to start writing EMBOSS 
> programs - copy one program and one ACD file and make 4 small edits.

No need to modify seqret, the EMBOSS program biosed can be used to replace 
U by T in a sequence.
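
Outside EMBOSS the same substitution is a few lines of script. The Python
sketch below is just an illustration (the function name is made up); it
assumes FASTA-style input and leaves header lines untouched so that a U in
a sequence name is not altered.

```python
# Translation table that turns uracil into thymine and preserves case.
RNA_TO_DNA = str.maketrans("Uu", "Tt")


def fasta_rna_to_dna(text):
    """Replace U by T on sequence lines, leaving '>' header lines alone."""
    lines = []
    for line in text.splitlines():
        lines.append(line if line.startswith(">") else line.translate(RNA_TO_DNA))
    return "\n".join(lines)


print(fasta_rna_to_dna(">seqU example\nAUGGCUuaa"))
```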

	Guy Bottu,
	Belgian EMBnet Node


From gbottu at ben.vub.ac.be  Fri Jul 22 09:26:38 2005
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Fri, 22 Jul 2005 11:26:38 +0200
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DFC563.4010600@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu>
	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk>
	<42DFAA7E.2070107@ebi.ac.uk>
	<Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>
	<42DFBD7F.7060306@ebi.ac.uk> <42DFC563.4010600@ebi.ac.uk>
Message-ID: <20050722092638.GB27340@bigben.ulb.ac.be>

I remember that the GCG program backtranslate let the user choose between 
the most likely backtranslation (as backtranseq does) and the most 
ambiguous backtranslation. So, adding to EMBOSS a program that makes the
most ambiguous backtranslation would bring back this lost functionality.

As for the problem cases like Serine, maybe there could be an option to
produce, instead of a sequence with ambiguity symbols, a regular expression
that exactly matches the allowed codons? The utility of this may be limited,
but if you have a peptide you could, for example, use the backtranslation
with the program dreg to search for the corresponding CDS in a piece of DNA.
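
To make the idea concrete, here is a Python sketch of such a regular
expression backtranslation. It is an illustration under my own assumptions:
the function names are invented, the standard genetic code is used, only
the forward strand is searched, and I have not checked whether dreg accepts
this exact pattern syntax.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_regex(aa):
    """Pattern matching exactly the codons of one residue, nothing more."""
    by_prefix = {}
    for codon, residue in CODE.items():
        if residue == aa:
            by_prefix.setdefault(codon[:2], []).append(codon[2])
    if not by_prefix:
        raise ValueError(f"no codons for residue {aa!r}")
    parts = []
    for prefix, thirds in by_prefix.items():
        thirds = "".join(sorted(thirds))
        parts.append(prefix + (thirds if len(thirds) == 1 else f"[{thirds}]"))
    return "(?:" + "|".join(parts) + ")"


def peptide_regex(peptide):
    """Concatenate the per-residue patterns for a whole peptide."""
    return "".join(codon_regex(aa) for aa in peptide.upper())


pattern = re.compile(peptide_regex("MSL"))
match = pattern.search("GGATGAGCTTGACC")

print(codon_regex("S"))                    # (?:TC[ACGT]|AG[CT])
print(match.start(), match.group())        # 2 ATGAGCTTG
print(re.fullmatch(codon_regex("S"), "ACT"))  # None: ACT (Thr) matches WSN but not this
```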

	Regards,
	Guy Bottu,
	Belgian EMBnet Node


From faruque at ebi.ac.uk  Fri Jul 22 10:22:27 2005
From: faruque at ebi.ac.uk (Nadeem Faruque)
Date: Fri, 22 Jul 2005 11:22:27 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <20050722092638.GB27340@bigben.ulb.ac.be>
References: <200507211338.35035.kertib@linuxlap.hu>	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
	<42DF9E4E.6060603@ebi.ac.uk>	<42DFAA7E.2070107@ebi.ac.uk>	<Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>	<42DFBD7F.7060306@ebi.ac.uk>
	<42DFC563.4010600@ebi.ac.uk>
	<20050722092638.GB27340@bigben.ulb.ac.be>
Message-ID: <42E0C8E3.8060900@ebi.ac.uk>

> As for the problem cases like Serine, maybe an option to make instead of a 
> sequence with ambiguity symbols a regular expression that exactly matches 
> the allowed codons ? The utility of this may be limited, but you could 
> e.g. if you have a peptide use the backtranslation with the program dreg to
> search the corresponding CDS in a piece of DNA.

I think we'd be better off with plain old IUPAC rather than venturing into more complex systems or we'll end up with 
weighted matrices or even HMMs.
The advantage of IUPAC is of course that you can plug it into most other programs.

Nadeem

-- 
S.M. Nadeem N. Faruque
EMBL Nucleotide Database Curation Team
EMBL Outstation
Tel: +44 1223 494611                     Fax: +44 1223 494472
The European Bioinformatics Institute    URL: http://www.ebi.ac.uk/
Email for data submissions: datasubs at ebi.ac.uk
Email for updates: update at ebi.ac.uk
=============================================================================


From pmr at ebi.ac.uk  Fri Jul 22 12:52:49 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 22 Jul 2005 13:52:49 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <42E0C8E3.8060900@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu>	<42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>	<42DF9E4E.6060603@ebi.ac.uk>	<42DFAA7E.2070107@ebi.ac.uk>	<Pine.LNX.4.58.0507211047070.25800@widget0.ncbi.nlm.nih.gov>	<42DFBD7F.7060306@ebi.ac.uk>	<42DFC563.4010600@ebi.ac.uk>	<20050722092638.GB27340@bigben.ulb.ac.be>
	<42E0C8E3.8060900@ebi.ac.uk>
Message-ID: <42E0EC21.4030607@ebi.ac.uk>

Nadeem Faruque wrote:

> I think we'd be better off with plain old IUPAC rather than venturing into more complex systems or we'll end up with 
> weighted matrices or even HMMs.
> The advantage of IUPAC is of course that you can plug it into most other programs.

Well .... how about this part of IUPAC:

IUBMB recommends marking unclear codons, for example in
http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html

"To avoid ambiguity, therefore, it is important to make it clear whenever the 
triplet YTN, for example, occurs in a sequence deduced from the occurrence of 
a leucine residue in the corresponding amino acid sequence that it does not 
include TTT or TTC as possibilities, etc. To emphasise this, it may be helpful 
to print such triplets in italics."

... we could use lowercase, rather than italics, to make this clear.

IUPAC also allows uncertain positions with (A,C,D) or (H.I.K.L). EMBOSS allows 
these, but after checking all occurrences in PIR it simply ignores the extra 
characters and assumes the amino acids are in the correct sequence. These are 
needed because Sanger protein sequencing determined composition but usually 
not the order of residues.

I see no codes for a choice of amino acids, other than B (D or N) and Z (E or 
Q), both from amino acid sequence composition, where hydrolyzing all amide 
bonds converted N to D (Asparagine to Aspartate) and Q to E (glutamine to 
glutamate). Also, one IUPAC report notes that NMR data can include J for "I or 
L" as Leucine and Isoleucine are indistinguishable by NMR. EMBOSS so far 
ignores this code (I only discovered it today :-).

U is now officially used for selenocysteine, although many EMBOSS programs 
cannot handle U and have to use X. The only character not used in amino acid 
sequence is O. I have seen it used in DNA sequence (CpG islands represented as 
OJ for specialised alignment scoring in one publication).


From pmr at ebi.ac.uk  Fri Jul 22 15:00:01 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 22 Jul 2005 16:00:01 +0100
Subject: [EMBOSS] EMBOSS in August
Message-ID: <42E109F1.9070604@ebi.ac.uk>

We know it is close to the end of July, and we have not said what is happening 
to the EMBOSS team. We do have a solution, but it is not yet officially confirmed.

The Rosalind Franklin Centre for Genomics Research will close at the end of 
next week. The EMBOSS project will move to the European Bioinformatics 
Institute from August 1st. Development and support will continue as before.

The EMBOSS homepage will remain at http://emboss.sourceforge.net/

The FTP server (to download EMBOSS releases and updates) has moved to 
ftp://emboss.open-bio.org/pub/EMBOSS/

The EMBOSS anonymous CVS server will remain at cvs.open-bio.org hosted by the 
Open Bio Foundation, who will also continue to host the developers' CVS server.

The EMBOSS mailing lists have been moved to the Open Bio Foundation, so the 
addresses are now:

To contact the EMBOSS team:

emboss-bug at emboss.open-bio.org Bug reports and support requests
emboss-submit at emboss.open-bio.org Code submissions

Lists users/developers can subscribe to:

emboss at emboss.open-bio.org Users mailing list
emboss-dev at emboss.open-bio.org Developers mailing list
emboss-announce at emboss.open-bio.org New release announcements list

There are obvious gaps in these details ... more news as soon as we have 
confirmation.

regards,

Peter Rice, Alan Bleasby and the EMBOSS team.


From maoj at helix.nih.gov  Mon Jul 25 13:58:06 2005
From: maoj at helix.nih.gov (Jean Mao)
Date: Mon, 25 Jul 2005 09:58:06 -0400
Subject: [EMBOSS] (no subject)
Message-ID: <200507251358.j6PDw5N94183035@helix.nih.gov>

Hello all,

I am building emboss package on our linux cluster. Since it will be for
multiple batch run purpose, there is no need for us to include X11. I got
the following error during 'make install'. Can someone tell me which
programs use X11 and how to turn it off in them before running 'make
install'? Many thanks!!!
----------------------------------------------------------------------------
---------------------------------------
make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss'
/bin/sh ../libtool --tag=CC --mode=link gcc  -O2   -o aaindexextract
aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la
../ajax/libajax.la ../plplot/libplplot.la -lX11  -lm 
gcc -O2 -o .libs/aaindexextract aaindexextract.o
../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so
../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm -Wl,--rpath
-Wl,/usr/local/EMBOSS-3.0.0/lib
/usr/bin/ld: cannot find -lX11
collect2: ld returned 1 exit status
make[2]: *** [aaindexextract] Error 1
make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
make: *** [install-recursive] Error 1
----------------------------------------------------------------------------
---------------------------------------
Jean


From msarachu at biol.unlp.edu.ar  Mon Jul 25 14:44:12 2005
From: msarachu at biol.unlp.edu.ar (Martin Sarachu)
Date: Mon, 25 Jul 2005 11:44:12 -0300
Subject: [EMBOSS] wEMBOSS-1.5 & wrappers4EMBOSS-1.3
Message-ID: <42E4FABC.20302@biol.unlp.edu.ar>

This message is to announce the release of wEMBOSS-1.5 and 
wrappers4EMBOSS-1.3

wEMBOSS-1.5 includes:
* a session indicator to identify which user is running wEMBOSS
* the possibility to add notes to project results

wrappers4EMBOSS-1.3 includes:
* codehop wrapper for selecting degenerate primers
* muscle wrapper for multiple alignments

Both are available at http://www.wemboss.org


-- 
Martin Sarachu
msarachu at biol.unlp.edu.ar
AR.EMBnet
http://www.ar.embnet.org


From maoj at helix.nih.gov  Mon Jul 25 15:20:39 2005
From: maoj at helix.nih.gov (Jean Mao)
Date: Mon, 25 Jul 2005 11:20:39 -0400
Subject: [EMBOSS] How to exclude X11 when Compile Emboss
In-Reply-To: <71B0C9CB1FF4EA43BB48C08DCFF1A1FF01364AC3@NIHCESMLBX.nih.gov>
Message-ID: <200507251520.j6PFKdN93765833@helix.nih.gov>


> Hello all,
> 
> I am building emboss package on our linux cluster. Since it will be for
> multiple batch run purpose, there is no need for us to include X11. I got
> the following error during 'make install'. Can someone tell me which
> programs use X11 and how to turn it off in them before running 'make
> install'? Many thanks!!!
> -------------------------------------------------------
> make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss'
> /bin/sh ../libtool --tag=CC --mode=link gcc  -O2   -o aaindexextract
> aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la
> ../ajax/libajax.la ../plplot/libplplot.la -lX11  -lm 
> gcc -O2 -o .libs/aaindexextract aaindexextract.o
> ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so
> ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm
> -Wl,--rpath -Wl,/usr/local/EMBOSS-3.0.0/lib
> /usr/bin/ld: cannot find -lX11
> collect2: ld returned 1 exit status
> make[2]: *** [aaindexextract] Error 1
> make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
> make[1]: *** [install-recursive] Error 1
> make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
> make: *** [install-recursive] Error 1
> ----------------------------------------------------------
> Jean
> 


From maoj at mail.nih.gov  Mon Jul 25 13:56:20 2005
From: maoj at mail.nih.gov (Mao, Jean (NIH/CIT))
Date: Mon, 25 Jul 2005 09:56:20 -0400
Subject: [EMBOSS] How to Turn X11 off during Make?
Message-ID: <71B0C9CB1FF4EA43BB48C08DCFF1A1FF01730B6E@NIHCESMLBX.nih.gov>

Hello all,

I am building emboss package on our linux cluster. Since it will be for
multiple batch run purpose, there is no need for us to include X11. I got
the following error during 'make install'. Can someone tell me which
programs use X11 and how to turn it off in them before running 'make
install'? Many thanks!!!
----------------------------------------------------------------------------
---------------------------------------
make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss'
/bin/sh ../libtool --tag=CC --mode=link gcc  -O2   -o aaindexextract
aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la
../ajax/libajax.la ../plplot/libplplot.la -lX11  -lm 
gcc -O2 -o .libs/aaindexextract aaindexextract.o
../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so
../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm -Wl,--rpath
-Wl,/usr/local/EMBOSS-3.0.0/lib
/usr/bin/ld: cannot find -lX11
collect2: ld returned 1 exit status
make[2]: *** [aaindexextract] Error 1
make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
make: *** [install-recursive] Error 1
----------------------------------------------------------------------------
---------------------------------------
Jean


From idrummon at receptor.mgh.harvard.edu  Mon Jul 25 16:25:28 2005
From: idrummon at receptor.mgh.harvard.edu (Iain Drummond)
Date: Mon, 25 Jul 2005 12:25:28 -0400
Subject: [EMBOSS] How to exclude X11 when Compile Emboss
In-Reply-To: <200507251520.j6PFKdN93765833@helix.nih.gov>
Message-ID: <BF0A8AB8.C02F%idrummon@receptor.mgh.harvard.edu>

Jean,

Either tell emboss where to find the X11 libraries during the ./configure
step:

X features:
  --x-includes=DIR    X include files are in DIR
  --x-libraries=DIR   X library files are in DIR

for example

 ./configure  --x-includes=/usr/local/includes  --x-libraries=/usr/local/lib


or

decide not to use X11 at all

./configure --without-x

you can get this info by typing

./configure -help

Iain Drummond
-- 

Iain Drummond, Ph.D.
Assistant Professor
Department of Medicine, Harvard Medical School and
Renal Unit, Massachusetts General Hospital

Mailing address:
Renal Unit / MGH 149-8000
149 13th St. 
Charlestown, MA 02129

Tel: 617 726 5647
Fax: 617 726 5669

idrummond at partners.org
idrummon at receptor.mgh.harvard.edu

Lab Home Page:
http://danio.mgh.harvard.edu

> From: "Jean Mao" <maoj at helix.nih.gov>
> Organization: CIT
> Reply-To: maoj at helix.nih.gov
> Date: Mon, 25 Jul 2005 11:20:39 -0400
> To: <emboss at emboss.open-bio.org>
> Subject: [EMBOSS] How to exclude X11 when Compile Emboss
> 
> 
>> Hello all,
>> 
>> I am building emboss package on our linux cluster. Since it will be for
>> multiple batch run purpose, there is no need for us to include X11. I got
>> the following error during 'make install'. Can someone tell me which
>> programs use X11 and how to turn it off in them before running 'make
>> install'? Many thanks!!!
>> -------------------------------------------------------
>> make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss'
>> /bin/sh ../libtool --tag=CC --mode=link gcc  -O2   -o aaindexextract
>> aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la
>> ../ajax/libajax.la ../plplot/libplplot.la -lX11  -lm
>> gcc -O2 -o .libs/aaindexextract aaindexextract.o
>> ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so
>> ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm
>> -Wl,--rpath -Wl,/usr/local/EMBOSS-3.0.0/lib
>> /usr/bin/ld: cannot find -lX11
>> collect2: ld returned 1 exit status
>> make[2]: *** [aaindexextract] Error 1
>> make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
>> make[1]: *** [install-recursive] Error 1
>> make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
>> make: *** [install-recursive] Error 1
>> ----------------------------------------------------------
>> Jean
>> 


From david at compbio.dundee.ac.uk  Tue Jul 26 15:02:51 2005
From: david at compbio.dundee.ac.uk (David Martin)
Date: Tue, 26 Jul 2005 16:02:51 +0100
Subject: [EMBOSS] dbxflat woes
Message-ID: <BF0C0F2B.13198%david@compbio.dundee.ac.uk>

I am trying to run dbxflat on uniprot (sprot/trembl/tremblnew) and it gets
most of the way through the second file then repeatably fails with the
error:

Processing file ./sprot.dat
Processing file ./trembl.dat

   EMBOSS An error in ajindex.c at line 811:
Something has unlocked the PRI root cache page


Any hints on what I can do to avoid this? I am running as an unprivileged
user.

..d


From ableasby at hgmp.mrc.ac.uk  Tue Jul 26 15:55:19 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Tue, 26 Jul 2005 16:55:19 +0100 (BST)
Subject: [EMBOSS] dbxflat woes
Message-ID: <200507261555.j6QFtJdq005430@bromine.hgmp.mrc.ac.uk>

>Something has unlocked the PRI root cache page

With an error like that the first thing to check is if
you've set CACHESIZE too small. The docs recommend that
it's set to 200. If that isn't the problem then
email me with your settings for:

a) PAGESIZE
b) CACHESIZE
c) Resource definition

from emboss.default and also email me with the command line
you are using.

Rgds

Alan


From pmr at ebi.ac.uk  Wed Jul 27 10:04:10 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 27 Jul 2005 11:04:10 +0100
Subject: [EMBOSS] Database indexing logfiles
Message-ID: <42E75C1A.5010606@ebi.ac.uk>

Some questions for those who index their own databases in EMBOSS...

I am adding an output file to the programs to log information from the 
indexing run. A sample for indexing the "tembl" test database is included 
below (data files are in the test/embl directory).

Is this useful?

What other information would you like to see?

Can we improve the format of the report?

regards,

Peter Rice

%cat outfile.dbiflat
########################################
# Program: dbiflat
# Rundate: Wed Jul 27 2005 11:02:22
# Dbname: EMBL
# Release: 0.0
# Date: 00/00/00
# IndexDirectory: ./
# Maxindex: 0
# Fields: 6
#   Field 1: id
#   Field 2: acnum
#   Field 3: seqvn
#   Field 4: des
#   Field 5: keyword
#   Field 6: taxon
# Directory: ./
# Filenames: *.dat
# Exclude:
# Files: 10
#   File 1: ./est.dat
#   File 2: ./fun.dat
#   File 3: ./hum1.dat
#   File 4: ./inv.dat
#   File 5: ./pln.dat
#   File 6: ./pro.dat
#   File 7: ./rod.dat
#   File 8: ./sts.dat
#   File 9: ./vrl.dat
#   File 10: ./vrt.dat
########################################

processing filename 'est.dat' ... 1 entries
processing filename 'fun.dat' ... 1 entries
processing filename 'hum1.dat' ... 18 entries
processing filename 'inv.dat' ... 3 entries
processing filename 'pln.dat' ... 3 entries
processing filename 'pro.dat' ... 9 entries
processing filename 'rod.dat' ... 3 entries
processing filename 'sts.dat' ... 1 entries
processing filename 'vrl.dat' ... 1 entries
processing filename 'vrt.dat' ... 4 entries
Index acnum maxlen 8 items 88
Index seqvn maxlen 10 items 132
Index des maxlen 19 items 422
Index keyword maxlen 44 items 96
Index taxon maxlen 27 items 535
Total 10 files 44 entries
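
For anyone who wants to post-process such a log, the summary lines are
regular enough to parse with a short script. The Python sketch below is only
an illustration written against the sample above (the final format may
differ, and parse_log is a made-up name); it also checks that the per-file
counts add up to the reported total.

```python
import re

SAMPLE = """\
processing filename 'est.dat' ... 1 entries
processing filename 'fun.dat' ... 1 entries
processing filename 'hum1.dat' ... 18 entries
processing filename 'inv.dat' ... 3 entries
processing filename 'pln.dat' ... 3 entries
processing filename 'pro.dat' ... 9 entries
processing filename 'rod.dat' ... 3 entries
processing filename 'sts.dat' ... 1 entries
processing filename 'vrl.dat' ... 1 entries
processing filename 'vrt.dat' ... 4 entries
Index acnum maxlen 8 items 88
Index seqvn maxlen 10 items 132
Index des maxlen 19 items 422
Index keyword maxlen 44 items 96
Index taxon maxlen 27 items 535
Total 10 files 44 entries
"""

FILE_RE = re.compile(r"processing filename '([^']+)' \.\.\. (\d+) entries")
INDEX_RE = re.compile(r"Index (\w+) maxlen (\d+) items (\d+)")
TOTAL_RE = re.compile(r"Total (\d+) files (\d+) entries")


def parse_log(text):
    """Collect per-file entry counts, per-index sizes and the total line."""
    report = {"files": {}, "indexes": {}, "total": None}
    for line in text.splitlines():
        found = FILE_RE.search(line)
        if found:
            report["files"][found.group(1)] = int(found.group(2))
            continue
        found = INDEX_RE.search(line)
        if found:
            # Store (maxlen, items) for each index field.
            report["indexes"][found.group(1)] = (int(found.group(2)), int(found.group(3)))
            continue
        found = TOTAL_RE.search(line)
        if found:
            report["total"] = (int(found.group(1)), int(found.group(2)))
    return report


report = parse_log(SAMPLE)
n_files, n_entries = report["total"]
print(len(report["files"]) == n_files)              # True
print(sum(report["files"].values()) == n_entries)   # True
```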


From smiddha at indiana.edu  Wed Jul 27 20:28:56 2005
From: smiddha at indiana.edu (Sumit Middha)
Date: Wed, 27 Jul 2005 15:28:56 -0500
Subject: [EMBOSS] EMBOSS explorer
In-Reply-To: <1121980112.5376.11.camel@incognito.invalid>
References: <1121980112.5376.11.camel@incognito.invalid>
Message-ID: <1122496136.42e7ee8815bd3@webmail.iu.edu>


Hi,
It's great to hear of the interface. I want to install it to my own directories
(possibly the same where I untar everything) and then I will manage to point my
web-pages or cgi etc to these. But I am not sure how to achieve that.

This is my attempt at installation. Can someone help me with this? Thanks.


> ./install
installing EMBOSS Explorer perl modules...

Checking if your kit is complete...
Looks good
Writing Makefile for EMBOSS::GUI
cp lib/EMBOSS/ACD.pm blib/lib/EMBOSS/ACD.pm
cp lib/EMBOSS/GUI.pm blib/lib/EMBOSS/GUI.pm
cp lib/EMBOSS/GUI/Conf.pm blib/lib/EMBOSS/GUI/Conf.pm
cp lib/EMBOSS/GUI/XHTML.pm blib/lib/EMBOSS/GUI/XHTML.pm
Manifying blib/man3/EMBOSS::GUI.3
Manifying blib/man3/EMBOSS::ACD.3
Manifying blib/man3/EMBOSS::GUI::Conf.3
Manifying blib/man3/EMBOSS::GUI::XHTML.3
Warning: You do not have permissions to install into
/usr/local/lib/perl5/site_perl/5.8.5/sun4-solaris at
/usr/local/lib/perl5/5.8.5/ExtUtils/Install.pm line 114.
mkdir /usr/local/lib/perl5/site_perl/5.8.5/EMBOSS: Permission denied at
/usr/local/lib/perl5/5.8.5/ExtUtils/Install.pm line 176
*** Error code 255
make: Fatal error: Command failed for target `pure_site_install'


Quoting Luke McCarthy <lukem at gene.pbi.nrc.ca>:

> Hi everybody,
> 
> I'm pleased to finally announce a new release of the EMBOSS interface
> formerly known as EMBOSS::GUI, now known as EMBOSS explorer.
> 
> Development has moved to SourceForge.net and the new home page for the
> interface is http://embossgui.sourceforge.net/  It's quite spartan at
> the moment, but I'll be adding a FAQ as questions are frequently asked
> (and answered...)
> 
> You can download EMBOSS explorer at
> http://prdownloads.sourceforge.net/embossgui/emboss-explorer-2.0.0.tar.gz?download
> 
> The new release has been tested against EMBOSS-3.0.0, but not
> thoroughly.  Please report bugs using the bug tracker at
> http://sourceforge.net/tracker/?atid=699414&group_id=124389&func=browse
> (as a last resort, email them to mccarthy at users.sourceforge.net, but I'm
> hoping that use of the bug tracker will help with duplicate reports and
> other organizational issues...)
> 
> Cheers,
> 
> Luke


From lukem at gene.pbi.nrc.ca  Wed Jul 27 21:05:46 2005
From: lukem at gene.pbi.nrc.ca (Luke McCarthy)
Date: Wed, 27 Jul 2005 15:05:46 -0600
Subject: [EMBOSS] EMBOSS explorer
In-Reply-To: <1122496136.42e7ee8815bd3@webmail.iu.edu>
References: <1121980112.5376.11.camel@incognito.invalid>
	<1122496136.42e7ee8815bd3@webmail.iu.edu>
Message-ID: <1122498346.25556.7.camel@incognito.invalid>

On Wed, 2005-07-27 at 14:28, Sumit Middha wrote:
> Hi,
> It's great to hear of the interface. I want to install it to my own directories
> (possibly the same where I untar everything) and then I will manage to point my
> web-pages or cgi etc to these. But I am not sure how to achieve that.
> 
> This is my attempt at installation. Can someone help me with this? Thanks.

At the moment, you can't use the install script to install to your local
directories.  You'd have to do quite a bit of extra setup anyway, to
make sure the web server could find (and had permission to access) the
library files in your own directory.

That being said, you can install the Perl modules like you would any
others:

	perl Makefile.PL
	make
	make install

You'll have to pass the appropriate options to Makefile.PL in order to
install to your own directory.

Alternatively, you can just run everything out of the untarred
directory.  You'll have to make sure that the web server is looking for
perl modules in the emboss-explorer/lib directory, and you'll have to
link appropriately to the html and cgi directories.  The webserver user
needs to be able to read everything in the lib, html and cgi
directories, to be able to execute the script in the cgi directory,
and to be able to write to the html/output directory.  I assume that you know
how to set up your webserver accordingly (or you wouldn't be asking...) 
You'll also have to edit emboss-explorer/lib/EMBOSS/GUI/Conf.pm and fill
in the correct locations.  Good luck.

Cheers,

Luke


From pmr at ebi.ac.uk  Thu Jul 28 16:14:06 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 28 Jul 2005 17:14:06 +0100
Subject: [EMBOSS] Database indexing logfiles
In-Reply-To: <42E75C1A.5010606@ebi.ac.uk>
References: <42E75C1A.5010606@ebi.ac.uk>
Message-ID: <42E9044E.4090707@ebi.ac.uk>

After comments on this list, I have updated the dbiflat logfile.

It now includes:

Field names are the short names used by the USA (the index file names still 
work on the dbiflat commandline). These are the same names as SRS uses in its 
commandline queries.

Numbers of tokens for each index field in each file, and total unique values 
in each field index.

Full paths for all directories (including the current working directory)

Today's date (also written to the index file headers) if no date is given.

The full commandline - if there were prompts with non-default replies these 
will be included in the commandline reported. This uses new ACD functions that 
can be used to report in other programs. Any special requests for this 
information in other outputs?

regards,

Peter

> %cat outfile.dbiflat
> ########################################
> # Program: dbiflat
> # Rundate: Thu Jul 28 2005 17:04:58
> # Dbname: EMBL
> # Release: 0.0
> # Date: 28/07/05
> # CurrentDirectory: /homes/pmr/hgmp/test/embl/
> # IndexDirectory: ./
> # IndexDirectoryPath: /homes/pmr/hgmp/test/embl/
> # Maxindex: 0
> # Fields: 6
> #   Field 1: id
> #   Field 2: acc
> #   Field 3: sv
> #   Field 4: des
> #   Field 5: key
> #   Field 6: org
> # Directory: ./
> # DirectoryPath: /homes/pmr/hgmp/test/embl/
> # Filenames: *.dat
> # Exclude: 
> # Files: 10
> #   File 1: ./est.dat
> #   File 2: ./fun.dat
> #   File 3: ./hum1.dat
> #   File 4: ./inv.dat
> #   File 5: ./pln.dat
> #   File 6: ./pro.dat
> #   File 7: ./rod.dat
> #   File 8: ./sts.dat
> #   File 9: ./vrl.dat
> #   File 10: ./vrt.dat
> ########################################
> # Commandline: dbiflat
> #    -fields acnum,seqvn,des,keyword,taxon
> #    -dbname EMBL
> #    -idformat embl
> #    -auto
> ########################################
> 
> processing filename 'est.dat' ... 1 entries
>    acc 1
>     sv 3
>    des 15
>    key 1
>    org 14
> processing filename 'fun.dat' ... 1 entries
>    acc 1
>     sv 3
>    des 8
>    key 1
>    org 9
> processing filename 'hum1.dat' ... 18 entries
>    acc 53
>     sv 54
>    des 200
>    key 43
>    org 252
> processing filename 'inv.dat' ... 3 entries
>    acc 3
>     sv 9
>    des 20
>    key 3
>    org 33
> processing filename 'pln.dat' ... 3 entries
>    acc 7
>     sv 9
>    des 19
>    key 6
>    org 54
> processing filename 'pro.dat' ... 9 entries
>    acc 13
>     sv 27
>    des 77
>    key 28
>    org 54
> processing filename 'rod.dat' ... 3 entries
>    acc 3
>     sv 9
>    des 28
>    key 1
>    org 45
> processing filename 'sts.dat' ... 1 entries
>    acc 1
>     sv 3
>    des 12
>    key 7
>    org 14
> processing filename 'vrl.dat' ... 1 entries
>    acc 2
>     sv 3
>    des 10
>    key 1
>    org 5
> processing filename 'vrt.dat' ... 4 entries
>    acc 4
>     sv 12
>    des 33
>    key 5
>    org 55
> 
> Index acc maxlen 8 items 84
> Index sv maxlen 10 items 90
> Index des maxlen 19 items 215
> Index key maxlen 44 items 81
> Index org maxlen 27 items 116
> 
> Total 10 files 44 entries
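
For anyone who wants to post-process a logfile like the one above, here is a
minimal Python sketch that collects the per-file token counts and the
per-index totals. It assumes the layout shown in this message stays as it is;
the function name parse_dbiflat_log and the SAMPLE excerpt are illustrative
only and are not part of EMBOSS.

```python
import re

# Excerpt of the processing section shown above (a real log has one block per file).
SAMPLE = """\
processing filename 'vrl.dat' ... 1 entries
   acc 2
    sv 3
   des 10
   key 1
   org 5
processing filename 'vrt.dat' ... 4 entries
   acc 4
    sv 12
   des 33
   key 5
   org 55

Index acc maxlen 8 items 84
Total 10 files 44 entries
"""

FILE_RE = re.compile(r"processing filename '(.+)' \.\.\. (\d+) entries$")
INDEX_RE = re.compile(r"Index (\w+) maxlen (\d+) items (\d+)$")
COUNT_RE = re.compile(r"(\w+) (\d+)$")


def parse_dbiflat_log(text):
    """Return (files, indexes) parsed from the body of a dbiflat logfile.

    files maps filename -> {"entries": n, "<field>": token count, ...}
    indexes maps field -> {"maxlen": m, "items": unique values}
    """
    files, indexes, current = {}, {}, None
    for raw in text.splitlines():
        # Tolerate mail-quoted copies ("> " prefix) as well as the raw file.
        line = raw.lstrip("> ").strip()
        m = FILE_RE.match(line)
        if m:
            current = files.setdefault(m.group(1), {"entries": int(m.group(2))})
            continue
        m = INDEX_RE.match(line)
        if m:
            indexes[m.group(1)] = {"maxlen": int(m.group(2)),
                                   "items": int(m.group(3))}
            current = None  # the per-file blocks are finished
            continue
        m = COUNT_RE.match(line)
        if m and current is not None:
            current[m.group(1)] = int(m.group(2))
    return files, indexes


if __name__ == "__main__":
    files, indexes = parse_dbiflat_log(SAMPLE)
    for name, counts in files.items():
        print(name, counts)
    print(indexes)
```

Summing the "entries" values over all ten files in the full log above gives
44, which agrees with the "Total" line. The per-file acc counts add up to 88
while the acc index reports 84 items, as expected when the index counts
unique values.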


From john8376 at uidaho.edu  Fri Jul 29 19:08:50 2005
From: john8376 at uidaho.edu (Audra Johnson)
Date: Fri, 29 Jul 2005 12:08:50 -0700
Subject: [EMBOSS] Using seqret to fetch from .nal index databases
Message-ID: <5C75DDA3-04A4-4A58-B925-31F9F017D8C4@uidaho.edu>

Apologies for the length, but I want to be thorough.  I'm doing blast
searches and then trying to fetch the sequences from our genembl
database using seqret.  For example:

blastall -p tblastn -d /gcgdata_10.3/gcgblast/genembl -i
dp00061_disordered_115_168.fasta

Gives me results of:

GB_PR:HUMRPA70KD        2e-08   412     573     1       54      54
GB_PR:BC018126  2e-08   386     547     1       54      54
GB_PAT:AX335048 2e-08   412     573     1       54      54
GB_PAT:AR175924 2e-08   412     573     1       54      54
GB_RO:BC019119  0.003   399     584     1       53      62

I've tried giving seqret the database name I pass to blastall, and
also naming the genembl.nal file explicitly:

$ seqret
Reads and writes (returns) sequences
Input sequence(s): /gcgdata_10.3/gcgblast/genembl.nal:HUMRPA70KD
Error: Unable to read sequence '/gcgdata_10.3/gcgblast/genembl.nal:HUMRPA70KD'
Input sequence(s): /gcgdata_10.3/gcgblast/genembl:HUMRPA70KD
Error: failed to open filename '/gcgdata_10.3/gcgblast/genembl'
Error: Unable to read sequence '/gcgdata_10.3/gcgblast/genembl:HUMRPA70KD'
Died: seqret terminated: Bad value for '-sequence' and no more retries

Neither works.  (I've omitted the GB_PR: and similar prefixes here,
but including them doesn't work either.)  Is there any way to get
seqret working with these databases?

-- Audra Johnson, University of Idaho


From golharam at umdnj.edu  Fri Jul 29 19:27:51 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 29 Jul 2005 15:27:51 -0400
Subject: [EMBOSS] Using seqret to fetch from .nal index databases
In-Reply-To: <5C75DDA3-04A4-4A58-B925-31F9F017D8C4@uidaho.edu>
Message-ID: <008e01c59473$9cf21d70$2f01a8c0@GOLHARMOBILE1>

If you are using an NCBI-formatted database, why not just use fastacmd
from the NCBI toolkit to extract the sequence?
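
A rough Python sketch of how that could be scripted, assuming the extraction
tool in the NCBI toolkit is fastacmd (with its -d database and -s search
string options) and that the hit table looks like the one in the original
question. The helper names hit_ids and fastacmd_command are made up for
illustration. Whether the database expects the bare entry name or the
prefixed form (e.g. GB_PR:HUMRPA70KD) depends on how it was formatted, so
both may need trying.

```python
import shlex

# Hit lines copied from the question; only the first column is used.
HITS = """\
GB_PR:HUMRPA70KD        2e-08   412     573     1       54      54
GB_PR:BC018126  2e-08   386     547     1       54      54
GB_RO:BC019119  0.003   399     584     1       53      62
"""


def hit_ids(blast_table):
    """Yield (prefix, entry_id) for each non-empty hit line."""
    for line in blast_table.splitlines():
        fields = line.split()
        if not fields:
            continue
        # "GB_PR:HUMRPA70KD" -> ("GB_PR", "HUMRPA70KD"); no colon -> ("", id)
        prefix, _, entry = fields[0].rpartition(":")
        yield prefix, entry


def fastacmd_command(database, entry_id):
    """Build a shell-safe fastacmd command line for one entry."""
    return "fastacmd -d {} -s {}".format(shlex.quote(database),
                                         shlex.quote(entry_id))


if __name__ == "__main__":
    db = "/gcgdata_10.3/gcgblast/genembl"
    for prefix, entry in hit_ids(HITS):
        print(fastacmd_command(db, entry))
```

The script only prints the commands, so they can be inspected before running
any of them.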


From Andrew.Mather at dpi.vic.gov.au  Sat Jul 30 11:30:47 2005
From: Andrew.Mather at dpi.vic.gov.au (Andrew.Mather at dpi.vic.gov.au)
Date: Sat, 30 Jul 2005 21:30:47 +1000
Subject: [EMBOSS] EMBOSS GUI problems
Message-ID: <OFB7158ECE.3815DC06-ONCA25704E.003F1E39-CA25704E.003F3EA2@nre.vic.gov.au>

Hi Luke and EMBOSS list 

I've installed the EMBOSS GUI and for the most part, it's working pretty 
well. 

However, for some apps (mainly alignment-type ones such as water, needle
and emma, though that may just be because I've tried more of those than
any others), it always fails with:

Error: Unable to read sequence &#39;&#39 
Died: water terminated: Bad value for &#39-asequence&#39 with -auto 
defined 
water exited with status 1...

or in the /var/www/html/EMBOSS/runs/ error log, 

Error: Unable to read sequence '' 
Died: water terminated: Bad value for '-asequence' with -auto defined 
water exited with status 1... 

It doesn't seem to matter if it's sequence data pasted in, or uploaded 
from a file. 

Some apps work fine, so I'm guessing it's not a fundamental problem like
permissions on a temp directory or something.

Can you point me to where to start looking?

Thanks,
Andrew

 
Animal Genetics and Genomics, PIRVic Attwood
475 Mickleham Road, Attwood, 3049
ph +61 3 92174342
mob  0413 009 761


----------------
There are 10 kinds of people...those who understand binary and those who 
don't.
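
The two error reports above appear to be the same message: &#39; is the HTML
character reference for an apostrophe, so the browser text looks like an
escaped copy of the log text. A minimal Python sketch to check this (the
names BROWSER_LINES and decode are illustrative only). In both forms the
sequence name between the quotes is empty, which suggests water was given no
value for -asequence.

```python
import html

# Lines as displayed by the GUI in the browser.
BROWSER_LINES = [
    "Error: Unable to read sequence &#39;&#39",
    "Died: water terminated: Bad value for &#39-asequence&#39 with -auto defined",
]


def decode(lines):
    """Turn HTML character references back into plain characters."""
    return [html.unescape(line) for line in lines]


if __name__ == "__main__":
    for line in decode(BROWSER_LINES):
        print(line)
```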

