From ableasby at hgmp.mrc.ac.uk Wed Jul 13 10:36:28 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Wed, 13 Jul 2005 15:36:28 +0100 (BST)
Subject: [EMBOSS] New email lists ready
Message-ID: <200507131436.j6DEaSF7027543@bromine.hgmp.mrc.ac.uk>

The new email addresses for the EMBOSS lists are now set up and ready (excluding any teething problems). They are:

emboss at emboss.open-bio.org
emboss-dev at emboss.open-bio.org
emboss-bug at emboss.open-bio.org
emboss-submit at emboss.open-bio.org

You can access the archives, subscribe/unsubscribe and alter the way email is sent to you (e.g. digests) by visiting:

http://emboss.open-bio.org/mailman/listinfo/emboss
http://emboss.open-bio.org/mailman/listinfo/emboss-dev
http://emboss.open-bio.org/mailman/listinfo/emboss-announce
http://emboss.open-bio.org/mailman/listinfo/emboss-bug

The new FTP server is at:

ftp://emboss.open-bio.org/pub/EMBOSS

Alan

From tjc at sanger.ac.uk Wed Jul 13 11:11:40 2005
From: tjc at sanger.ac.uk (Tim Carver)
Date: Wed, 13 Jul 2005 16:11:40 +0100
Subject: [EMBOSS] Jemboss Announcement
Message-ID:

With the imminent closure of the RFCGR, there will be no publicly available Jemboss server. Jemboss will remain available for download and installation as part of the EMBOSS distribution. You may find there is a local Jemboss server already available at your own institution.
If you would like to have your server listed on the Jemboss web page, please contact the EMBOSS group (emboss-dev at emboss.open-bio.org).

Tim Carver
The Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK

From ableasby at hgmp.mrc.ac.uk Thu Jul 14 19:43:30 2005
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Fri, 15 Jul 2005 00:43:30 +0100 (BST)
Subject: [EMBOSS] EMBOSS 3.0.0 released
Message-ID: <200507142343.j6ENhUn2002328@bromine.hgmp.mrc.ac.uk>

EMBOSS 3.0.0 is now available for download from:

ftp://emboss.open-bio.org/pub/EMBOSS/

and, until the 27th July, from:

ftp://ftp.rfcgr.mrc.ac.uk/pub/EMBOSS/

The following text details some of the changes from the previous release.

Alan

EMBOSS main package:

New database indexing programs dbxflat, dbxfasta and dbxgcg. A dbxblast program will be added if we can extract data from the new BLAST formatdb output. These programs allow indexing of files larger than 2Gb.

N.B.: Indexes will be created faster if they are written through a different disc controller than that used to read the database being indexed. If that is not possible, then reading from and writing to different hard drives on the same controller is recommended. Note that each index can be created independently of the others, e.g. you can create keyword and description indexes after you've created the ID and ACC indexes.

To support these programs, the emboss.default and .embossrc files can include "resource" definitions. See the documentation of these programs for more information. "resource" definitions are intended to define anything other than environment variables and databases.

In the emboss.default and .embossrc files the same name can be used for variables, databases, and resources (we now store them in separate tables). In previous versions a single table was used and name clashes could occur. This becomes an issue with the increasing use of resource definitions.
Sequence sets in ACD have a new attribute "aligned" that reports whether the sequences are aligned (reading a multiple alignment in for visualisation) or not (reading a set of sequences into memory for further processing - perhaps for alignment).

Sequence formats have been reviewed.
"experiment" format is that used by the Staden package.
"staden" and "gcg" formats now parse out comments from anywhere in the sequence.
"nexus" and "nexusnon" formats now correctly report protein sequence datatypes.
"nbrf" or "pir" format data can now be read from an SRSWWW server (for technical reasons, SRS servers are unable to exactly reproduce NBRF/PIR format).
"clustal" output no longer writes in blocks of 10.
"Phylip3" output is now renamed "phylipnon" for compatibility with other non-interleaved output format names. The "phylip3" name remains valid for back-compatibility. The header record for phylipnon format has been changed to that accepted by phylip 3.6 (no YF on the header line, number of sequences specified).

Sequence format information on the web has been updated to reflect these changes.
Codon usage tables can be in these formats (-format qualifier):

  "emboss", "EMBOSS codon usage file", "All numbers read, #comments for extras"
  "cut", "EMBOSS codon usage file", "Same as EMBOSS, output default format is 'cut'"
  "gcg", "GCG codon usage file", "All numbers read, #comments for extras"
  "cutg", "CUTG codon usage file", "All numbers (cutgaa) read or fraction calculated, extras added"
  "cutgaa", "CUTG codon usage file with aminoacids", "Cutg with all numbers"
  "spsum", "CUTG species summary file", "Number only, species and CDSs in header"
  "cherry", "Mike Cherry codonusage database file", "GCG format with species and CDSs in header"
  "transterm", "TransTerm database file", "GCG format with no extras"
  "codehop", "FHCRC codehop program codon usage file", "Freq only, extras at end"
  "staden", "Staden package codon usage file with percentages", "Freq or number only, no extras"
  "numstaden", "Staden package codon usage file with numbers", "Number only, no extras. Can be read as 'staden'"

Any of these formats should be readable by default. Some files are "readable" in more than one format (staden and numstaden, for example, can both be read as "staden"). The extra names are used so we can reuse them as output format names.

For output of codon usage tables, the same formats are available (-oformat qualifier).

A new application codcopy (not codret, because coderet is already an EMBOSS program name) will convert from one format to another in the same way as seqret converts sequence formats.

Coderet reports the number of CDS, mRNA and translation sequences.

Correction to sequence numbering for reversed nucleotide sequences in alignments.

Correction to sequence alignment functions returning slightly suboptimal alignments.

The entrails program reports codon usage formats. Description of report format entrails output improved.
Entrails is built by "make check" and is provided so that developers of wrappers can obtain all EMBOSS internal details needed, for example all ACD datatypes and input/output format names and descriptions.

Sequence types are explicitly set in cons, sixpack and backtranseq as some output formats failed to recognise them as protein.

EMBASSY packages:

MYEMBOSS is a new EMBASSY package for developing your own code. Installation requires recent versions of GNU packages autoconf, automake and libtool. To install, you must first build the configure and make files with these commands:

  aclocal -I m4
  autoconf
  automake -a

When you add your own programs, do so by adding source files in myemboss/source and ACD files in myemboss/emboss_acd, and add these filenames to the Makefile.am files in each directory. There are "myseq" and "mytest" examples provided to guide you. There is no need to modify configure or Makefile files - these will be automatically updated.

To allow MYEMBOSS to be installed by one user, and linked to an EMBOSS installation maintained for the site by someone else, new variables are added to locate the ACD files for any EMBASSY package. If myemboss is not installed in the same place as EMBOSS, define EMBOSS_MYEMBOSSROOT as the location of the myemboss installed ACD files or the myemboss/emboss_acd source directory. This requires that EMBASSY programs call the embInitP function with the name of the package ("myemboss"). For ACD utilities such as acdvalid or acdc to work, as these use the EMBOSS embInit call, another variable EMBOSS_ACDUTILROOT must be defined, pointing to the same directory.

PHYLIP is a beta release port of PHYLIP 3.6b. We welcome comments on the EMBOSS interface to the programs. Program names are prefixed by 'f' to avoid clashes with the old PHYLIP EMBASSY package. We still need to work on adding new tree input and output formats, and updating the code to PHYLIP 3.63 (December 2004).
We are also considering splitting more of the programs to simplify the ACD interface. In this release seqboot and treedist are already split. seqboot is split by input type into seqboot, restboot, discboot and freqboot. Treedist is split by the number of input files into treedist and treedistpair. Acdvalid objects to the dependencies in other programs, for example the method used by fdnadist.

The DOMAINATRIX package of earlier releases has been extended and replaced by 5 EMBASSY packages described below (32 applications in total). These tools were developed as part of a research project and are distinct from other EMBOSS apps in being intended mostly for computational biologists rather than biologist end-users.

STRUCTURE
The STRUCTURE package is used for parsing the PDB database and generating secondary databases of coordinate and derived data. The tools have the following scope:
(i) For parsing PDB files and writing clean coordinate files (CCF files) that "clean-up" many PDB inconsistencies. For example, residue numbers give the correct index into the biological sequence.
(ii) To generate CCF files for whole PDB files or individual domains from the SCOP and CATH databases.
(iii) To augment CCF files with residue solvent accessibility and secondary structure data.
(iv) To generate contact files (CON files) of intra-chain and inter-chain residue-residue contact data.
(v) To generate CON files of residue-ligand contact data.
(vi) Miscellaneous file handling, e.g. dictionary of heterogen groups.

DOMAINATRIX
The DOMAINATRIX package is used for handling the SCOP and CATH databases of protein domain classification, the parsable files of which can be inconvenient, e.g. for comparative studies, extending and processing. The tools have the following scope:
(i) For parsing raw SCOP and CATH parsable files and writing domain classification files (DCF files) with a single, simple and extensible format.
(ii) To add sequence records to a DCF file.
(iii) To remove low resolution domains.
(iv) To flexibly calculate and remove redundancy.
(v) Primitive tools for secondary structure element mapping to domains in a DCF file.

DOMALIGN
The DOMALIGN package is used for generating alignments for families of domains, especially across large datasets, e.g. the whole of SCOP. The tools have the following scope:
(i) For identifying representative structures for different nodes in the SCOP and CATH hierarchies.
(ii) For generating annotated, structure-based sequence alignments for these nodes.
(iii) For extending these domain alignment files (DAF files) with sequences of unknown structure.
(iv) All-versus-all global sequence alignment.

DOMSEARCH
The DOMSEARCH package is used for deriving extended sequence families, especially from large structural datasets such as the whole of SCOP. The tools have the following scope:
(i) To generate domain hits files (DHF files) of sequence relatives to an alignment or other sequences.
(ii) To remove fragmentary sequences from a DHF file.
(iii) To flexibly calculate and remove redundancy.
(iv) To remove hits of ambiguous classification and collate sequences into families.

SIGNATURE
The SIGNATURE package is used for generating, scanning and evaluating sparse signatures and other predictive elements for protein sequence characterisation. The tools have the following scope:
(i) To generate sparse signatures for protein families from alignments and residue contact data.
(ii) Generate other types of discriminator (e.g. HMMs) from alignments.
(iii) Generate ligand-binding signatures from residue-ligand contacts.
(iv) Generate domain hits files (DHF files) and ligand hits files (LHF files) of hits (sequences) from signature scans.
(v) Interpretation and display of signature performance by using ROC analysis.
Where data, files etc. are mentioned above or in the application documentation, data structures and functions for manipulating such are usually provided in the AJAX and NUCLEUS C programming libraries. For example, there are objects for handling protein atoms, residues, chains, for SCOP and CATH domains and so on.

From thiago.venancio at gmail.com Mon Jul 18 08:09:33 2005
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Mon, 18 Jul 2005 09:09:33 -0300
Subject: [EMBOSS] error msg
Message-ID: <44255ea80507180509386875bd@mail.gmail.com>

Hi all.

I am new to EMBOSS. I have installed it and got the problem:

"wossname: error while loading shared libraries: libnucleus.so.3: cannot open shared object file: No such file or directory"

All the EMBOSS programs give the same error. The installation process went OK and I have set the envs.

Thanks in advance.

Thiago

From golharam at umdnj.edu Tue Jul 19 12:51:30 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 19 Jul 2005 12:51:30 -0400
Subject: [EMBOSS] EMBOSS::GUI Web Interface
Message-ID: <009401c58c82$1d2a3670$2f01a8c0@GOLHARMOBILE1>

Hi Luke,

Any word on when EMBOSS-GUI will be available for EMBOSS 3.0.0?

Thanks,
Ryan

From jacob at biochemistry.ucl.ac.uk Wed Jul 20 11:36:24 2005
From: jacob at biochemistry.ucl.ac.uk (Jacob Hurst)
Date: Wed, 20 Jul 2005 16:36:24 +0100 (BST)
Subject: [EMBOSS] problem with using accession number....
Message-ID:

Hello,

If I enter the following id, seqret correctly returns the sequence.

  acrm3<113>% seqret embl:hsgstpig
  Reads and writes (returns) sequences
  Output sequence [hsgstpig.fasta]:

However, if I enter the corresponding accession number it fails.....

  acrm3<114>% seqret embl:X08058
  Reads and writes (returns) sequences
  Error: Unable to read sequence 'embl:X08058'
  Died: seqret terminated: Bad value for '-sequence' and no prompt

I was under the impression that emboss was setup to deal with both accession and id.
regards
Jake
--
Jacob Hurst Phd
Department of Biochemistry and Molecular Biology, University College London

From pmr at ebi.ac.uk Wed Jul 20 11:59:55 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 20 Jul 2005 16:59:55 +0100
Subject: [EMBOSS] problem with using accession number....
In-Reply-To:
References:
Message-ID: <42DE74FB.80604@ebi.ac.uk>

Jacob Hurst wrote:
> I was under the impression that emboss was setup to deal with both
> accession and id.

Yes, but ... this depends on how the embl database is defined at your site. Some sites have databases defined to access entries through, for example, a URL or an external application (or script) that can only search for entry names.

Hmmmm .... we could add a little more information on this in showdb .... for a future release.

If you have difficulty finding out how the database is defined, mail us at emboss-bug at emboss.open-bio.org and we can help you track it down.

regards,

Peter Rice

From golharam at umdnj.edu Thu Jul 21 00:00:03 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 21 Jul 2005 00:00:03 -0400
Subject: [EMBOSS] EMBOSS 3.0.0 RPMs available
Message-ID: <013801c58da8$acc10440$2f01a8c0@GOLHARMOBILE1>

I'm eager to upgrade our installation of EMBOSS on all our linux workstations, so I've gone ahead and built RPMs for EMBOSS (based on biolinux version) and MYEMBOSS applications. You can download the RPMs and source RPMs from http://serine.umdnj.edu/~golharam/biorpms. They include (sorry for the capitalization):

DOMAINATRIX
DOMALIGN
DOMSEARCH
EMBOSS
EMBOSS-data
EMBOSS-devel
EMBOSS-Jemboss
EMNU
ESIM4
HMMER
MEME
MSE
MYEMBOSS
PHYLIP
SIGNATURE
STRUCTURE
TOPO

--
Ryan Golhar - golharam at umdnj.edu
The Informatics Institute of UMDNJ

From james_tan79 at hotmail.com Thu Jul 21 05:41:24 2005
From: james_tan79 at hotmail.com (JT)
Date: Thu, 21 Jul 2005 17:41:24 +0800
Subject: [EMBOSS] any DNA or RNA program similar to pepstat ?
Message-ID:

Hi,

Is there any program that can output a report of simple DNA/RNA sequence information including e.g.
a) Molecular weight
b) Number of residues
c) Average residue weight
d) %G, %C, %A, %T, %GC
e) Melting temp
f) charge etc.

Thanks
James

From jison at hgmp.mrc.ac.uk Thu Jul 21 06:49:58 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Thu, 21 Jul 2005 11:49:58 +0100
Subject: [EMBOSS] any DNA or RNA program similar to pepstat ?
References:
Message-ID: <42DF7DD6.CD81DF9B@hgmp.mrc.ac.uk>

Hi James

There's no single app to cover all your request, but some of the following might help (see http://emboss.sourceforge.net/apps/)

dan          Plot melting temperatures for DNA.
freak        Residue/base frequency table or plot
extractfeat  Extract features from a sequence
geecee       Calculates the fractional GC content of nucleic acid sequences
infoseq      Displays some simple information about sequences
isochore     Plots isochores in large DNA sequences
newcpgseek   Reports CpG rich regions
remap        Display a sequence with restriction cut sites, translation etc..
showfeat     Show features of a sequence.

Please have a look at what's available and if you require something else / new functionality etc please get back in touch.

Cheers
Jon

JT wrote:
>
> Hi,
>
> Is there any program that can output a report of simple DNA/RNA sequence
> information including e.g.
> a) Molecular weight
> b) Number of residues
> c) Average residue weight
> d) %G, %C, %A, %T, %GC
> e) Melting temp
> f) charge etc.
>
> Thanks
> James
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at emboss.open-bio.org
> http://newportal.open-bio.org/mailman/listinfo/emboss

--
Jon C. Ison, PhD
Proteomics Applications Group
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494500 Fax: +44 1223 494512
E-mail: jison at rfcgr.mrc.ac.uk
Web: http://www.rfcgr.mrc.ac.uk

From kertib at linuxlap.hu Thu Jul 21 07:38:34 2005
From: kertib at linuxlap.hu (Kerti Balázs Gábor)
Date: Thu, 21 Jul 2005 13:38:34 +0200
Subject: [EMBOSS] Some question
Message-ID: <200507211338.35035.kertib@linuxlap.hu>

Hello!

There is some (elementary) question, because I do not find - maybe I do wrong - the solution.

- how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ?
- how to generate antisense DNA fragm. from a sens.

Thank you.

Balazs

From pmr at ebi.ac.uk Thu Jul 21 07:58:58 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Jul 2005 12:58:58 +0100
Subject: [EMBOSS] Some question
In-Reply-To: <200507211338.35035.kertib@linuxlap.hu>
References: <200507211338.35035.kertib@linuxlap.hu>
Message-ID: <42DF8E02.40909@ebi.ac.uk>

Kerti Balázs Gábor wrote:

> There is some (elementary) question, because I do not find - maybe I do wrong
> - the solution.
>
> - how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ?

The cDNA will be identical to the mRNA. No backtranslation needed. Backtranslation (as in backtranseq) converts a protein sequence into a nucleotide sequence that will translate to the same protein sequence (using the most frequent codon for each amino acid).

If you only want to convert U (Uracil) to T (thymine) to convert an RNA sequence to DNA (all EMBOSS programs will accept both as nucleotide input) you can modify the program seqret to specify a nucleotide sequence as input, and generate a DNA sequence as output. An easy way to start writing EMBOSS programs - copy one program and one ACD file and make 4 small edits.

> - how to generate antisense DNA fragm. from a sens.

In EMBOSS, revseq does this.
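For a quick check outside EMBOSS, the two string operations discussed here (replacing U with T, and taking the reverse complement to get the antisense strand) can be sketched with standard POSIX tools. This is only an illustration on a bare sequence string, assuming plain IUPAC letters with no header line; biosed and revseq are the EMBOSS programs that handle real sequence files, and the function names rna_to_dna and revcomp are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers, not EMBOSS programs: they work on a bare
# sequence string passed as $1, not on FASTA or other sequence files.

# rna_to_dna: replace U with T, keeping upper/lower case.
rna_to_dna() {
  printf '%s\n' "$1" | tr 'Uu' 'Tt'
}

# revcomp: complement each base (IUPAC ambiguity codes included,
# U treated like T), then reverse the string with awk.
revcomp() {
  printf '%s\n' "$1" |
    tr 'ACGTUMRWSYKVHDBNacgtumrwsykvhdbn' 'TGCAAKYWSRMBDHVNtgcaakywsrmbdhvn' |
    awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }'
}

rna_to_dna AUGGCU    # prints ATGGCT
revcomp ATGGCC       # prints GGCCAT
```

Complementing before reversing keeps each step a simple stream filter; the order does not matter for the result.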
The antisense strand is simply the reverse complement of the original.

Hope this helps,

Peter Rice

From jison at hgmp.mrc.ac.uk Thu Jul 21 08:10:50 2005
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Thu, 21 Jul 2005 13:10:50 +0100
Subject: [EMBOSS] Some question
References: <200507211338.35035.kertib@linuxlap.hu>
Message-ID: <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>

Dear Balazs

See http://emboss.sourceforge.net/apps/ for application documentation.

transeq      Translates nucleic acid sequences. (i.e. DNA -> protein)
backtranseq  Back translate a protein sequence (i.e. protein -> DNA)
coderet      Extract CDS, mRNA and translations from feature tables

I don't think there is anything to interchange sense/antisense or mRNA / DNA sequences, but something could be written if you let us know exactly what you need / why you need it.

Cheers
Jon

Kerti Balázs Gábor wrote:
>
> Hello!
>
> There is some (elementary) question, because I do not find - maybe I do wrong
> - the solution.
>
> - how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ?
> - how to generate antisense DNA fragm. from a sens.
>
> Thank you.
>
> Balazs
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at emboss.open-bio.org
> http://newportal.open-bio.org/mailman/listinfo/emboss

--
Jon C. Ison, PhD
Proteomics Applications Group
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494500 Fax: +44 1223 494512
E-mail: jison at rfcgr.mrc.ac.uk
Web: http://www.rfcgr.mrc.ac.uk

From faruque at ebi.ac.uk Thu Jul 21 09:08:30 2005
From: faruque at ebi.ac.uk (Nadeem Faruque)
Date: Thu, 21 Jul 2005 14:08:30 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk>
Message-ID: <42DF9E4E.6060603@ebi.ac.uk>

> See http://emboss.sourceforge.net/apps/ for application documentation.
>
> transeq Translates nucleic acid sequences. (i.e. DNA -> protein)
> backtranseq Back translate a protein sequence (i.e. protein -> DNA)
...

While backtranseq is very clever in predicting the cDNA sequence based on peptide sequence by choosing codons according to usage, would it not be very useful to have the option for it to return an answer in degenerate bases?

eg in human, the 'peptide' is simply 'M'
backtranseq returns the most likely codon used, ie 'ATG'
but since it could be TTG, CTG or ATG, it may be more useful for some people to return 'HTG'

Returning a degenerate sequence would have the advantage (for some uses) of being usable by normal DNA-savvy string-based search methods when finding the peptide coding location in nucleic acid sequences rather than having to use similarity searches. I could also see it being useful for designing PCR primers within coding regions.

Nadeem
--
S.M. Nadeem N. Faruque
EMBL Nucleotide Database Curation Team
EMBL Outstation
Tel: +44 1223 494611
Fax: +44 1223 494472
The European Bioinformatics Institute
URL: http://www.ebi.ac.uk/
Email for data submissions: datasubs at ebi.ac.uk
Email for updates: update at ebi.ac.uk
=============================================================================

From pmr at ebi.ac.uk Thu Jul 21 10:00:30 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Jul 2005 15:00:30 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DF9E4E.6060603@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk>
Message-ID: <42DFAA7E.2070107@ebi.ac.uk>

Nadeem Faruque wrote:

> While backtranseq is very clever in predicting the cDNA sequence based on peptide sequence by choosing codons according
> to usage, would it not be very useful to have the option for it to return an answer in degenerate bases?
>
> eg in human, the 'peptide' is simply 'M'
> backtranseq returns the most likely codon used, ie 'ATG'
> but since it could be TTG, CTG or ATG, it may be more useful for some people to return 'HTG'

Ummmm .... depends on the genetic code. In human I would expect ATG, in bacteria GTG is second choice and NTG would be the possible result - but only for a start codon of course (just one of the complexities of backtranslating - I think we must avoid inventing a start codon if the protein doesn't start with 'M' because the numbering gets complicated).

As this would need a different input (a genetic code, rather than a codon usage file) I would make this a different program - not difficult to write.

Any good suggestions for a program name?

> Returning a degenerate sequence would have the advantage (for some uses) of being usable by normal DNA-savvy
> string-based search methods when finding the peptide coding location in nucleic acid sequences rather than having to use
> similarity searches. I could also see it being useful for designing PCR primers within coding regions.

... which leads on to whether EMBOSS should include such programs :-)

regards,

Peter Rice

From jcherry at ncbi.nlm.nih.gov Thu Jul 21 10:58:14 2005
From: jcherry at ncbi.nlm.nih.gov (Josh Cherry)
Date: Thu, 21 Jul 2005 10:58:14 -0400 (EDT)
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DFAA7E.2070107@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk>
Message-ID:

Nadeem Faruque wrote:

> Returning a degenerate sequence would have the advantage (for some uses)
> of being usable by normal DNA-savvy string-based search methods when
> finding the peptide coding location in nucleic acid sequences rather
> than having to use similarity searches.
But this won't work the way some might hope due to the nature of the genetic code, specifically (in the standard code) the three amino acids that have six codons each (S, L, and R). Consider serine, encoded by UCN and AGY. Would you like this to be back-translated to WSN? That matches all six serine codons but also ten non-serine codons. Some people may still want to use it in a probe or primer though.

Josh
--
Joshua L. Cherry, Ph.D.
NCBI/NLM/NIH (Contractor)
jcherry at ncbi.nlm.nih.gov

From faruque at ebi.ac.uk Thu Jul 21 11:21:35 2005
From: faruque at ebi.ac.uk (Nadeem Faruque)
Date: Thu, 21 Jul 2005 16:21:35 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To:
References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk>
Message-ID: <42DFBD7F.7060306@ebi.ac.uk>

Josh Cherry wrote:
> Nadeem Faruque wrote:
>
>
>>Returning a degenerate sequence would have the advantage (for some uses)
>>of being usable by normal DNA-savvy string-based search methods when
>>finding the peptide coding location in nucleic acid sequences rather
>>than having to use similarity searches.
>
>
> But this won't work the way some might hope due to the nature of the
> genetic code, specifically (in the standard code) the three amino acids
> that have six codons each (S, L, and R). Consider serine, encoded by UCN
> and AGY. Would you like this to be back-translated to WSN? That matches
> all six serine codons but also ten non-serine codons. Some people may
> still want to use it in a probe or primer though.

I was going to use Serine in my example but realised 'WSN' was a bit too degenerate to be a useful example.

I understand you could not roundtrip peptide->DNA->peptide with my suggested behaviour (as you can currently do with backtranseq), but you can do DNA->peptide->DNA in a usable form.
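The "most ambiguous" backtranslation Nadeem describes can be sketched as a per-residue lookup. This sketch assumes the standard genetic code, writes DNA letters (T rather than U), and uses the degenerate codons Peter Rice lists further on in this thread (for example WSN for serine and YTN for leucine), so, as Josh points out, the codes for S, L and R also match some codons of other amino acids. The function name degen_backtranslate is hypothetical, and unknown residues become NNN.

```sh
#!/bin/sh
# degen_backtranslate: hypothetical helper, not an EMBOSS program.
# Prints the most ambiguous DNA codon (IUPAC codes, standard genetic
# code) for each residue of the protein string given as $1.
degen_backtranslate() {
  printf '%s\n' "$1" | fold -w 1 | while IFS= read -r aa; do
    case "$aa" in
      A|a) printf 'GCN' ;;
      C|c) printf 'TGY' ;;
      D|d) printf 'GAY' ;;
      E|e) printf 'GAR' ;;
      F|f) printf 'TTY' ;;
      G|g) printf 'GGN' ;;
      H|h) printf 'CAY' ;;
      I|i) printf 'ATH' ;;
      K|k) printf 'AAR' ;;
      L|l) printf 'YTN' ;;   # CTN/TTR, also matches F (TTY)
      M|m) printf 'ATG' ;;
      N|n) printf 'AAY' ;;
      P|p) printf 'CCN' ;;
      Q|q) printf 'CAR' ;;
      R|r) printf 'MGN' ;;   # CGN/AGR, also matches S (AGY)
      S|s) printf 'WSN' ;;   # TCN/AGY, also matches 10 non-serine codons
      T|t) printf 'ACN' ;;
      V|v) printf 'GTN' ;;
      W|w) printf 'TGG' ;;
      Y|y) printf 'TAY' ;;
      '*') printf 'TRR' ;;   # stop, also matches W (TGG)
      *)   printf 'NNN' ;;   # unknown residue
    esac
  done
  printf '\n'
}

degen_backtranslate MSW   # prints ATGWSNTGG
```

Unlike backtranseq, this needs no codon usage table: each codon depends only on the genetic code, which is why Peter suggests it would be a separate program.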
I'm sketchy about its potential use in oligo design, but given a degenerate backtranslation someone could possibly design oligos so as to avoid the more degenerate areas (esp. for the 3' end of primers). If they were to use backtranseq they would be ignorant of these regions.

Nadeem
--
S.M. Nadeem N. Faruque
EMBL Nucleotide Database Curation Team
EMBL Outstation
Tel: +44 1223 494611
Fax: +44 1223 494472
The European Bioinformatics Institute
URL: http://www.ebi.ac.uk/
Email for data submissions: datasubs at ebi.ac.uk
Email for updates: update at ebi.ac.uk
=============================================================================

From pmr at ebi.ac.uk Thu Jul 21 11:55:15 2005
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Jul 2005 16:55:15 +0100
Subject: [EMBOSS] backtranseq
In-Reply-To: <42DFBD7F.7060306@ebi.ac.uk>
References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk> <42DFBD7F.7060306@ebi.ac.uk>
Message-ID: <42DFC563.4010600@ebi.ac.uk>

Nadeem Faruque wrote:
> Josh Cherry wrote:
>>But this won't work the way some might hope due to the nature of the
>>genetic code, specifically (in the standard code) the three amino acids
>>that have six codons each (S, L, and R). Consider serine, encoded by UCN
>>and AGY. Would you like this to be back-translated to WSN? That matches
>>all six serine codons but also ten non-serine codons. Some people may
>>still want to use it in a probe or primer though.
>
> I was going to use Serine in my example but realised 'WSN' was a bit too degenerate to be a useful example.
> I understand you could not roundtrip peptide->DNA->peptide with my suggested behaviour ...

I bet you can!!!

Assuming you have a backtranslated sequence, WSN would surely be Serine (as would UCN or AGY). If any of the 3 positions is more specific, that could indicate one of the other possibilities.
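Peter's claim that a degenerate codon can still be translated back, using a majority vote when it is ambiguous, can be sketched by expanding the IUPAC codes into every concrete codon and counting the amino acids those codons encode. This assumes the standard genetic code and accepts U as T; the helper names iupac_bases, translate_codon and degen_votes are hypothetical, and the output is a plain "count residue" list rather than any EMBOSS format. For ASN it counts T from 4 codons and R and S from 2 each, which is the vote Peter describes for that codon.

```sh
#!/bin/sh
# Hypothetical helpers, not EMBOSS programs; standard genetic code only.

# iupac_bases: print the concrete bases an IUPAC nucleotide code stands for.
iupac_bases() {
  case "$1" in
    A) echo A ;;       C) echo C ;;       G) echo G ;;       T|U) echo T ;;
    R) echo 'A G' ;;   Y) echo 'C T' ;;   S) echo 'C G' ;;   W) echo 'A T' ;;
    K) echo 'G T' ;;   M) echo 'A C' ;;   B) echo 'C G T' ;;
    D) echo 'A G T' ;; H) echo 'A C T' ;; V) echo 'A C G' ;;
    *) echo 'A C G T' ;;   # N, or anything unrecognised
  esac
}

# translate_codon: one concrete DNA codon -> one-letter amino acid (* = stop).
translate_codon() {
  case "$1" in
    TT[CT]) echo F ;;        TT[AG]|CT?) echo L ;;
    AT[ACT]) echo I ;;       ATG) echo M ;;
    GT?) echo V ;;           TC?|AG[CT]) echo S ;;
    CC?) echo P ;;           AC?) echo T ;;
    GC?) echo A ;;           TA[CT]) echo Y ;;
    TA[AG]|TGA) echo '*' ;;  CA[CT]) echo H ;;
    CA[AG]) echo Q ;;        AA[CT]) echo N ;;
    AA[AG]) echo K ;;        GA[CT]) echo D ;;
    GA[AG]) echo E ;;        TG[CT]) echo C ;;
    TGG) echo W ;;           CG?|AG[AG]) echo R ;;
    GG?) echo G ;;
    *) echo X ;;
  esac
}

# degen_votes: expand one degenerate codon into all concrete codons,
# translate each, and print "count residue" lines, most votes first.
degen_votes() {
  codon=$(printf '%s\n' "$1" | tr 'acgtunrykmswbdhv' 'ACGTUNRYKMSWBDHV')
  b1=$(printf '%s\n' "$codon" | cut -c1)
  b2=$(printf '%s\n' "$codon" | cut -c2)
  b3=$(printf '%s\n' "$codon" | cut -c3)
  for x in $(iupac_bases "$b1"); do
    for y in $(iupac_bases "$b2"); do
      for z in $(iupac_bases "$b3"); do
        translate_codon "$x$y$z"
      done
    done
  done | sort | uniq -c | sort -k1,1nr -k2,2
}

degen_votes ASN   # T from 4 codons, then R and S from 2 each
degen_votes WSN   # S from 6 codons, then T (4), C and R (2), * and W (1)
```

Brute-force expansion is cheap here because a codon has at most 4 x 4 x 4 = 64 concrete forms; a lower-case or X result for close votes, as Peter suggests, could be layered on top of these counts.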
I would be happy to accept a lower case residue if the result is uncertain (if the ambiguity codes do not match what one would expect from the genetic code in a backtranslation). For ASN the answer could be T (ACN), S (AGY) or R (AGR), with T ('t') the favourite by a majority vote (4/4 codons match, 2/6 for the others). X can be used if all else fails. After all, we could be translating a sequence with a SNP.

A command line option can give the user a choice of trying to resolve unclear positions or using X.

Degenerate codons would be:

  A  GCN
  C  UGY
  D  GAY
  E  GAR
  F  UUY
  G  GGN
  H  CAY
  I  AUH
  K  AAR
  L  YUN (CUN/UUR) - also matches F (UUY)
  M  AUG
  N  AAY
  P  CCN
  Q  CAR
  R  MGN (CGN/AGR) - also matches S (AGY)
  S  WSN (UCN/AGY) - also matches T (ACN), R (AGR), and C, W and * (UGN)
  T  ACN
  V  GUN
  W  UGG
  Y  UAY
  *  URR - also matches W (UGG)
  m  NUG (start codon)

From lukem at gene.pbi.nrc.ca Thu Jul 21 17:08:32 2005
From: lukem at gene.pbi.nrc.ca (Luke McCarthy)
Date: Thu, 21 Jul 2005 15:08:32 -0600
Subject: [EMBOSS] EMBOSS explorer
Message-ID: <1121980112.5376.11.camel@incognito.invalid>

Hi everybody,

I'm pleased to finally announce a new release of the EMBOSS interface formerly known as EMBOSS::GUI, now known as EMBOSS explorer. Development has moved to SourceForge.net and the new home page for the interface is

http://embossgui.sourceforge.net/

It's quite spartan at the moment, but I'll be adding a FAQ as questions are frequently asked (and answered...)

You can download EMBOSS explorer at

http://prdownloads.sourceforge.net/embossgui/emboss-explorer-2.0.0.tar.gz?download

The new release has been tested against EMBOSS-3.0.0, but not thoroughly. Please report bugs using the bug tracker at

http://sourceforge.net/tracker/?atid=699414&group_id=124389&func=browse

(as a last resort, email them to mccarthy at users.sourceforge.net, but I'm hoping that use of the bug tracker will help with duplicate reports and other organizational issues...)
Cheers,
Luke

From gwilliam at hgmp.mrc.ac.uk Fri Jul 22 04:21:40 2005
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 22 Jul 2005 09:21:40 +0100
Subject: [EMBOSS] backtranseq
References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk>
Message-ID: <42E0AC94.63F132A7@hgmp.mrc.ac.uk>

Peter Rice wrote:
>
> Nadeem Faruque wrote:
>
> > While backtranseq is very clever in predicting the cDNA sequence based on peptide sequence by choosing codons according
> > to usage, would it not be very useful to have the option for it to return an answer in degenerate bases?
> >
> > eg in human, the 'peptide' is simply 'M'
> > backtranseq returns the most likely codon used, ie 'ATG'
> > but since it could be TTG, CTG or ATG, it may be more useful for some people to return 'HTG'
>
> Ummmm .... depends on the genetic code. In human I would expect ATG, in
> bacteria GTG is second choice and NTG would be the possible result - but only
> for a start codon of course (just one of the complexities of backtranslating -
> I think we must avoid inventing a start codon if the protein doesn't start
> with 'M' because the numbering gets complicated).
>
> As this would need a different input (a genetic code, rather than a codon
> usage file) I would make this a different program - not difficult to write.
>
> Any good suggestions for a program name?
barebackseq -- Gary Williams MRC Rosalind Franklin Centre for Genomics Research Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK Tel: +44 1223 494522 (UNTIL END OF JULY 2005) E-mail: gareth.williams57 at ntlworld.com From gbottu at ben.vub.ac.be Fri Jul 22 05:10:17 2005 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Fri, 22 Jul 2005 11:10:17 +0200 Subject: [EMBOSS] Some question In-Reply-To: <42DF8E02.40909@ebi.ac.uk> References: <200507211338.35035.kertib@linuxlap.hu> <42DF8E02.40909@ebi.ac.uk> Message-ID: <20050722091017.GA27340@bigben.ulb.ac.be> On Thu, Jul 21, 2005 at 12:58:58PM +0100, Peter Rice wrote: > Kerti Balázs Gábor wrote: > > > There is some (elementary) question, because I do not find - maybe I do wrong > > - the solution. > > > > - how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ? > > The cDNA will be identical to the mRNA. No backtranslation needed. > Backtranslation (as in backtranseq) converts a protein sequence into a > nucleotide sequence that will translate to the same protein sequence (using > the most frequent codon for each amino acid). > > If you only want to convert U (Uracil) to T (thymine) to convert an RNA > sequence to DNA (all EMBOSS programs will accept both as nucleotide input) you > can modify the program seqret to specify a nucleotide sequence as input, and > generate a DNA sequence as output. An easy way to start writing EMBOSS > programs - copy one program and one ACD file and make 4 small edits. There is no need to modify seqret: the EMBOSS program biosed can be used to replace U by T in a sequence.
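For anyone scripting this step outside EMBOSS, the substitution biosed is used for here amounts to a single character replacement. A minimal Python sketch, assuming a plain sequence string rather than a formatted sequence file; `rna_to_dna` is a hypothetical name and this is not how biosed itself is implemented:

```python
def rna_to_dna(seq: str) -> str:
    """Return seq with every U/u replaced by T/t; all other characters are unchanged."""
    return seq.translate(str.maketrans("Uu", "Tt"))


print(rna_to_dna("AUGgcuUAA"))  # ATGgctTAA
```

Preserving case keeps any lowercase annotation (for example masked regions) intact through the conversion.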
Guy Bottu, Belgian EMBnet Node From gbottu at ben.vub.ac.be Fri Jul 22 05:26:38 2005 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Fri, 22 Jul 2005 11:26:38 +0200 Subject: [EMBOSS] backtranseq In-Reply-To: <42DFC563.4010600@ebi.ac.uk> References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk> <42DFBD7F.7060306@ebi.ac.uk> <42DFC563.4010600@ebi.ac.uk> Message-ID: <20050722092638.GB27340@bigben.ulb.ac.be> I remember that the GCG program backtranslate let the user choose between the most likely backtranslation (as backtranseq does) and the most ambiguous backtranslation. So, adding to EMBOSS a program that makes the most ambiguous backtranslation would bring back this lost functionality. As for the problem cases like Serine, maybe an option to make instead of a sequence with ambiguity symbols a regular expression that exactly matches the allowed codons ? The utility of this may be limited, but you could e.g. if you have a peptide use the backtranslation with the program dreg to search the corresponding CDS in a piece of DNA. Regards, Guy Bottu, Belgian EMBnet Node From faruque at ebi.ac.uk Fri Jul 22 06:22:27 2005 From: faruque at ebi.ac.uk (Nadeem Faruque) Date: Fri, 22 Jul 2005 11:22:27 +0100 Subject: [EMBOSS] backtranseq In-Reply-To: <20050722092638.GB27340@bigben.ulb.ac.be> References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk> <42DFBD7F.7060306@ebi.ac.uk> <42DFC563.4010600@ebi.ac.uk> <20050722092638.GB27340@bigben.ulb.ac.be> Message-ID: <42E0C8E3.8060900@ebi.ac.uk> > As for the problem cases like Serine, maybe an option to make instead of a > sequence with ambiguity symbols a regular expression that exactly matches > the allowed codons ? The utility of this may be limited, but you could > e.g.
if you have a peptide use the backtranslation with the program dreg to > search the corresponding CDS in a piece of DNA. I think we'd be better off with plain old IUPAC rather than venturing into more complex systems or we'll end up with weighted matrices or even HMMs. The advantage of IUPAC is of course that you can plug it into most other programs. Nadeem -- S.M. Nadeem N. Faruque EMBL Nucleotide Database Curation Team EMBL Outstation Tel: +44 1223 494611 Fax: +44 1223 494472 The European Bioinformatics Institute URL: http://www.ebi.ac.uk/ Email for data submissions: datasubs at ebi.ac.uk Email for updates: update at ebi.ac.uk ============================================================================= From pmr at ebi.ac.uk Fri Jul 22 08:52:49 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 22 Jul 2005 13:52:49 +0100 Subject: [EMBOSS] backtranseq In-Reply-To: <42E0C8E3.8060900@ebi.ac.uk> References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk> <42DFBD7F.7060306@ebi.ac.uk> <42DFC563.4010600@ebi.ac.uk> <20050722092638.GB27340@bigben.ulb.ac.be> <42E0C8E3.8060900@ebi.ac.uk> Message-ID: <42E0EC21.4030607@ebi.ac.uk> Nadeem Faruque wrote: > I think we'd be better off with plain old IUPAC rather than venturing into more complex systems or we'll end up with > weighted matrices or even HMMs. > The advantage of IUPAC is of course that you can plug it into most other programs. Well .... how about this part of IUPAC: IUBMB recommends marking unclear codons, for example in http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html "To avoid ambiguity, therefore, it is important to make it clear whenever the triplet YTN, for example, occurs in a sequence deduced from the occurrence of a leucine residue in the corresponding amino acid sequence that it does not include TTT or TTC as possibilities, etc. To emphasise this, it may be helpful to print such triplets in italics." ...
we could use lowercase, rather than italics, to make this clear. IUPAC also allows uncertain positions with (A,C,D) or (H.I.K.L). EMBOSS allows these, but after checking all occurrences in PIR it simply ignores the extra characters and assumes the amino acids are in the correct sequence. These are needed because Sanger protein sequencing determined composition but usually not the order of residues. I see no codes for a choice of amino acids, other than B (D or N) and Z (E or Q), both from amino acid sequence composition, where hydrolyzing all amide bonds converted N to D (Asparagine to Aspartate) and Q to E (glutamine to glutamate). Also, one IUPAC report notes that NMR data can include J for "I or L" as Leucine and Isoleucine are indistinguishable by NMR. EMBOSS so far ignores this code (I only discovered it today :-). U is now officially used for selenocysteine, although many EMBOSS programs cannot handle U and have to use X. The only character not used in amino acid sequence is O. I have seen it used in DNA sequence (CpG islands represented as OJ for specialised alignment scoring in one publication). From pmr at ebi.ac.uk Fri Jul 22 11:00:01 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 22 Jul 2005 16:00:01 +0100 Subject: [EMBOSS] EMBOSS in August Message-ID: <42E109F1.9070604@ebi.ac.uk> We know it is close to the end of July, and we have not said what is happening to the EMBOSS team. We do have a solution, but it is not yet officially confirmed. The Rosalind Franklin Centre for Genomic Research will close at the end of next week. The EMBOSS project will move to the European Bioinformatics Institute from August 1st. Development and support will continue as before.
The EMBOSS homepage will remain at http://emboss.sourceforge.net/

The FTP server (to download EMBOSS releases and updates) has moved to ftp://emboss.open-bio.org/pub/EMBOSS/

The EMBOSS anonymous CVS server will remain at cvs.open-bio.org hosted by the Open Bio Foundation, who will also continue to host the developers' CVS server.

The EMBOSS mailing lists have been moved to the Open Bio Foundation, so the addresses are now:

To contact the EMBOSS team:
  emboss-bug at emboss.open-bio.org        Bug reports and support requests
  emboss-submit at emboss.open-bio.org     Code submissions

Lists users/developers can subscribe to:
  emboss at emboss.open-bio.org            Users mailing list
  emboss-dev at emboss.open-bio.org        Developers mailing list
  emboss-announce at emboss.open-bio.org   New release announcements list

There are obvious gaps in these details ... more news as soon as we have confirmation.

regards, Peter Rice, Alan Bleasby and the EMBOSS team.

From maoj at helix.nih.gov Mon Jul 25 09:58:06 2005 From: maoj at helix.nih.gov (Jean Mao) Date: Mon, 25 Jul 2005 09:58:06 -0400 Subject: [EMBOSS] (no subject) Message-ID: <200507251358.j6PDw5N94183035@helix.nih.gov> Hello all, I am building emboss package on our linux cluster. Since it will be for multiple batch run purpose, there is no need for us to include X11. I got the following error during 'make install'. Can someone tell me which programs use X11 and how to turn it off in them before running 'make install'? Many thanks!!!
----------------------------------------------------------------------------
make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss'
/bin/sh ../libtool --tag=CC --mode=link gcc -O2 -o aaindexextract aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la ../ajax/libajax.la ../plplot/libplplot.la -lX11 -lm
gcc -O2 -o .libs/aaindexextract aaindexextract.o ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm -Wl,--rpath -Wl,/usr/local/EMBOSS-3.0.0/lib
/usr/bin/ld: cannot find -lX11
collect2: ld returned 1 exit status
make[2]: *** [aaindexextract] Error 1
make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
make: *** [install-recursive] Error 1
----------------------------------------------------------------------------
Jean

From msarachu at biol.unlp.edu.ar Mon Jul 25 10:44:12 2005 From: msarachu at biol.unlp.edu.ar (Martin Sarachu) Date: Mon, 25 Jul 2005 11:44:12 -0300 Subject: [EMBOSS] wEMBOSS-1.5 & wrappers4EMBOSS-1.3 Message-ID: <42E4FABC.20302@biol.unlp.edu.ar> This message is to announce the release of wEMBOSS-1.5 and wrappers4EMBOSS-1.3

wEMBOSS-1.5 includes:
* a session indicator to identify which user is running wEMBOSS
* the possibility to add notes to project results

wrappers4EMBOSS-1.3 includes:
* codehop wrapper for selecting degenerate primers
* muscle wrapper for multiple alignments

Both are available at http://www.wemboss.org -- Martin Sarachu msarachu at biol.unlp.edu.ar AR.EMBnet http://www.ar.embnet.org

From maoj at helix.nih.gov Mon Jul 25 11:20:39 2005 From: maoj at helix.nih.gov (Jean Mao) Date: Mon, 25 Jul 2005 11:20:39 -0400 Subject: [EMBOSS] How to exclude X11 when Compile Emboss In-Reply-To: <71B0C9CB1FF4EA43BB48C08DCFF1A1FF01364AC3@NIHCESMLBX.nih.gov> Message-ID:
<200507251520.j6PFKdN93765833@helix.nih.gov> > Hello all, > > I am building emboss package on our linux cluster. Since it will be for > multiple batch run purpose, there is no need for us to include X11. I got > the following error during 'make install'. Can someone tell me which > programs use X11 and how to turn it off in them before running 'make > install'? Many thanks!!! > ------------------------------------------------------- > make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss' > /bin/sh ../libtool --tag=CC --mode=link gcc -O2 -o aaindexextract > aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la > ../ajax/libajax.la ../plplot/libplplot.la -lX11 -lm > gcc -O2 -o .libs/aaindexextract aaindexextract.o > ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so > ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm > -Wl,--rpath -Wl,/usr/local/EMBOSS-3.0.0/lib > /usr/bin/ld: cannot find -lX11 > collect2: ld returned 1 exit status > make[2]: *** [aaindexextract] Error 1 > make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss' > make[1]: *** [install-recursive] Error 1 > make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss' > make: *** [install-recursive] Error 1 > ---------------------------------------------------------- > Jean > From maoj at mail.nih.gov Mon Jul 25 09:56:20 2005 From: maoj at mail.nih.gov (Mao, Jean (NIH/CIT)) Date: Mon, 25 Jul 2005 09:56:20 -0400 Subject: [EMBOSS] How to Turn X11 off during Make? Message-ID: <71B0C9CB1FF4EA43BB48C08DCFF1A1FF01730B6E@NIHCESMLBX.nih.gov> Hello all, I am building emboss package on our linux cluster. Since it will be for multiple batch run purpose, there is no need for us to include X11. I got the following error during 'make install'. Can someone tell me which programs use X11 and how to turn it off in them before running 'make install'? Many thanks!!! 
----------------------------------------------------------------------------
make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss'
/bin/sh ../libtool --tag=CC --mode=link gcc -O2 -o aaindexextract aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la ../ajax/libajax.la ../plplot/libplplot.la -lX11 -lm
gcc -O2 -o .libs/aaindexextract aaindexextract.o ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm -Wl,--rpath -Wl,/usr/local/EMBOSS-3.0.0/lib
/usr/bin/ld: cannot find -lX11
collect2: ld returned 1 exit status
make[2]: *** [aaindexextract] Error 1
make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss'
make: *** [install-recursive] Error 1
----------------------------------------------------------------------------
Jean

From idrummon at receptor.mgh.harvard.edu Mon Jul 25 12:25:28 2005 From: idrummon at receptor.mgh.harvard.edu (Iain Drummond) Date: Mon, 25 Jul 2005 12:25:28 -0400 Subject: [EMBOSS] How to exclude X11 when Compile Emboss In-Reply-To: <200507251520.j6PFKdN93765833@helix.nih.gov> Message-ID: Jean,

Either tell emboss where to find the X11 libraries during the ./configure step:

X features:
  --x-includes=DIR    X include files are in DIR
  --x-libraries=DIR   X library files are in DIR

for example

./configure --x-includes=/usr/local/includes --x-libraries=/usr/local/lib

or decide not to use X11 at all

./configure --without-x

you can get this info by typing ./configure -help

Iain Drummond -- Iain Drummond, Ph.D. Assistant Professor Department of Medicine, Harvard Medical School and Renal Unit, Massachusetts General Hospital Mailing address: Renal Unit / MGH 149-8000 149 13th St.
Charlestown, MA 02129 Tel: 617 726 5647 Fax: 617 726 5669 idrummond at partners.org idrummon at receptor.mgh.harvard.edu Lab Home Page: http://danio.mgh.harvard.edu > From: "Jean Mao" > Organization: CIT > Reply-To: maoj at helix.nih.gov > Date: Mon, 25 Jul 2005 11:20:39 -0400 > To: > Subject: [EMBOSS] How to exclude X11 when Compile Emboss > > >> Hello all, >> >> I am building emboss package on our linux cluster. Since it will be for >> multiple batch run purpose, there is no need for us to include X11. I got >> the following error during 'make install'. Can someone tell me which >> programs use X11 and how to turn it off in them before running 'make >> install'? Many thanks!!! >> ------------------------------------------------------- >> make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss' >> /bin/sh ../libtool --tag=CC --mode=link gcc -O2 -o aaindexextract >> aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la >> ../ajax/libajax.la ../plplot/libplplot.la -lX11 -lm >> gcc -O2 -o .libs/aaindexextract aaindexextract.o >> ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so >> ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm >> -Wl,--rpath -Wl,/usr/local/EMBOSS-3.0.0/lib >> /usr/bin/ld: cannot find -lX11 >> collect2: ld returned 1 exit status >> make[2]: *** [aaindexextract] Error 1 >> make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss' >> make[1]: *** [install-recursive] Error 1 >> make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss' >> make: *** [install-recursive] Error 1 >> ---------------------------------------------------------- >> Jean >> > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss From david at compbio.dundee.ac.uk Tue Jul 26 11:02:51 2005 From: david at compbio.dundee.ac.uk (David Martin) Date: Tue, 26 Jul 2005 16:02:51 +0100 Subject: [EMBOSS] dbxflat woes Message-ID: I am trying to run 
dbxflat on uniprot (sprot/trembl/tremblnew) and it gets most of the way through the second file then repeatably fails with the error:

Processing file ./sprot.dat
Processing file ./trembl.dat
EMBOSS An error in ajindex.c at line 811:
Something has unlocked the PRI root cache page

Any hints on what I can do to avoid this? I am running as an unprivileged user. ..d

From ableasby at hgmp.mrc.ac.uk Tue Jul 26 11:55:19 2005 From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby) Date: Tue, 26 Jul 2005 16:55:19 +0100 (BST) Subject: [EMBOSS] dbxflat woes Message-ID: <200507261555.j6QFtJdq005430@bromine.hgmp.mrc.ac.uk> >Something has unlocked the PRI root cache page With an error like that the first thing to check is whether you've set CACHESIZE too small. The docs recommend that it's set to 200. If that isn't the problem then email me with your settings for:
a) PAGESIZE
b) CACHESIZE
c) Resource definition from emboss.default
and also email me with the command line you are using. Rgds Alan

From pmr at ebi.ac.uk Wed Jul 27 06:04:10 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 27 Jul 2005 11:04:10 +0100 Subject: [EMBOSS] Database indexing logfiles Message-ID: <42E75C1A.5010606@ebi.ac.uk> Some questions for those who index their own databases in EMBOSS... I am adding an output file to the programs to log information from the indexing run. A sample for indexing the "tembl" test database is included below (data files are in the test/embl directory). Is this useful? What other information would you like to see? Can we improve the format of the report?
regards, Peter Rice

%cat outfile.dbiflat
########################################
# Program: dbiflat
# Rundate: Wed Jul 27 2005 11:02:22
# Dbname: EMBL
# Release: 0.0
# Date: 00/00/00
# IndexDirectory: ./
# Maxindex: 0
# Fields: 6
# Field 1: id
# Field 2: acnum
# Field 3: seqvn
# Field 4: des
# Field 5: keyword
# Field 6: taxon
# Directory: ./
# Filenames: *.dat
# Exclude:
# Files: 10
# File 1: ./est.dat
# File 2: ./fun.dat
# File 3: ./hum1.dat
# File 4: ./inv.dat
# File 5: ./pln.dat
# File 6: ./pro.dat
# File 7: ./rod.dat
# File 8: ./sts.dat
# File 9: ./vrl.dat
# File 10: ./vrt.dat
########################################
processing filename 'est.dat' ... 1 entries
processing filename 'fun.dat' ... 1 entries
processing filename 'hum1.dat' ... 18 entries
processing filename 'inv.dat' ... 3 entries
processing filename 'pln.dat' ... 3 entries
processing filename 'pro.dat' ... 9 entries
processing filename 'rod.dat' ... 3 entries
processing filename 'sts.dat' ... 1 entries
processing filename 'vrl.dat' ... 1 entries
processing filename 'vrt.dat' ... 4 entries
Index acnum maxlen 8 items 88
Index seqvn maxlen 10 items 132
Index des maxlen 19 items 422
Index keyword maxlen 44 items 96
Index taxon maxlen 27 items 535
Total 10 files 44 entries

From smiddha at indiana.edu Wed Jul 27 16:28:56 2005 From: smiddha at indiana.edu (Sumit Middha) Date: Wed, 27 Jul 2005 15:28:56 -0500 Subject: [EMBOSS] EMBOSS explorer In-Reply-To: <1121980112.5376.11.camel@incognito.invalid> References: <1121980112.5376.11.camel@incognito.invalid> Message-ID: <1122496136.42e7ee8815bd3@webmail.iu.edu> Hi, It's great to hear of the interface. I want to install it to my own directories (possibly the same where I untar everything) and then I will manage to point my web-pages or cgi etc to these. But I am not sure how to achieve that. This is my attempt at installation. Can someone help me with this? Thanks.

> ./install
installing EMBOSS Explorer perl modules...
Checking if your kit is complete...
Looks good
Writing Makefile for EMBOSS::GUI
cp lib/EMBOSS/ACD.pm blib/lib/EMBOSS/ACD.pm
cp lib/EMBOSS/GUI.pm blib/lib/EMBOSS/GUI.pm
cp lib/EMBOSS/GUI/Conf.pm blib/lib/EMBOSS/GUI/Conf.pm
cp lib/EMBOSS/GUI/XHTML.pm blib/lib/EMBOSS/GUI/XHTML.pm
Manifying blib/man3/EMBOSS::GUI.3
Manifying blib/man3/EMBOSS::ACD.3
Manifying blib/man3/EMBOSS::GUI::Conf.3
Manifying blib/man3/EMBOSS::GUI::XHTML.3
Warning: You do not have permissions to install into /usr/local/lib/perl5/site_perl/5.8.5/sun4-solaris at /usr/local/lib/perl5/5.8.5/ExtUtils/Install.pm line 114.
mkdir /usr/local/lib/perl5/site_perl/5.8.5/EMBOSS: Permission denied at /usr/local/lib/perl5/5.8.5/ExtUtils/Install.pm line 176
*** Error code 255
make: Fatal error: Command failed for target `pure_site_install'

Quoting Luke McCarthy : > Hi everybody, > > I'm pleased to finally announce a new release of the EMBOSS interface > formerly known as EMBOSS::GUI, now known as EMBOSS explorer. > > Development has moved to SourceForge.net and the new home page for the > interface is http://embossgui.sourceforge.net/ It's quite spartan at > the moment, but I'll be adding a FAQ as questions are frequently asked > (and answered...) > > You can download EMBOSS explorer at > http://prdownloads.sourceforge.net/embossgui/emboss-explorer-2.0.0.tar.gz?download > > The new release has been tested against EMBOSS-3.0.0, but not > thoroughly. Please report bugs using the bug tracker at > http://sourceforge.net/tracker/?atid=699414&group_id=124389&func=browse > (as a last resort, email them to mccarthy at users.sourceforge.net, but I'm > hoping that use of the bug tracker will help with duplicate reports and > other organizational issues...)
> > Cheers, > > Luke > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss > From lukem at gene.pbi.nrc.ca Wed Jul 27 17:05:46 2005 From: lukem at gene.pbi.nrc.ca (Luke McCarthy) Date: Wed, 27 Jul 2005 15:05:46 -0600 Subject: [EMBOSS] EMBOSS explorer In-Reply-To: <1122496136.42e7ee8815bd3@webmail.iu.edu> References: <1121980112.5376.11.camel@incognito.invalid> <1122496136.42e7ee8815bd3@webmail.iu.edu> Message-ID: <1122498346.25556.7.camel@incognito.invalid> On Wed, 2005-07-27 at 14:28, Sumit Middha wrote: > Hi, > It's great to hear of the interface. I want to install it to my own directories > (possibly the same where I untar everything) and then I will manage to point my > web-pages or cgi etc to these. But I am not sure how to achieve that. > > This is my attempt at installation. Can someone help me with this? Thanks.

At the moment, you can't use the install script to install to your local directories. You'd have to do quite a bit of extra setup anyway, to make sure the web server could find (and had permission to access) the library files in your own directory. That being said, you can install the Perl modules like you would any others:

perl Makefile.PL
make
make install

You'll have to pass the appropriate options to Makefile.PL in order to install to your own directory. Alternatively, you can just run everything out of the untarred directory. You'll have to make sure that the web server is looking for perl modules in the emboss-explorer/lib directory, and you'll have to link appropriately to the html and cgi directories. The webserver user needs to be able to read everything in the lib, html and cgi directories, to execute the script in the cgi directory, and to write to the html/output directory. I assume that you know how to set up your webserver accordingly (or you wouldn't be asking...)
You'll also have to edit emboss-explorer/lib/EMBOSS/GUI/Conf.pm and fill in the correct locations. Good luck. Cheers, Luke From pmr at ebi.ac.uk Thu Jul 28 12:14:06 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 28 Jul 2005 17:14:06 +0100 Subject: [EMBOSS] Database indexing logfiles In-Reply-To: <42E75C1A.5010606@ebi.ac.uk> References: <42E75C1A.5010606@ebi.ac.uk> Message-ID: <42E9044E.4090707@ebi.ac.uk> After comments on this list, I have updated the dbiflat logfile. It now includes: Field names are the short names used by the USA (the index file names still work on the dbiflat commandline). These are the same names as SRS uses in its commandline queries. Numbers of tokens for each index field in each file, and total unique values in each field index. Full paths for all directories (including the current working directory) Today's date (also written to the index file headers) if no date is given. The full commandline - if there were prompts with non-default replies these will be included in the commandline reported. This uses new ACD functions that can be used to report in other programs. Any special requests for this information in other outputs? 
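Because each summary line in the logfile has a fixed layout, the report is also easy to post-process. A minimal Python sketch, assuming the line layouts of the sample logfile in this thread ("processing filename ... N entries", per-field token lines such as "acc 53", "Index ... maxlen ... items ...", and "Total ... files ... entries"); the function name `parse_dbiflat_log` is hypothetical and not part of EMBOSS, and the embedded excerpt is copied from the sample log:

```python
import re

# Excerpt of the updated dbiflat logfile (values copied from the sample in this thread).
SAMPLE = """\
processing filename 'est.dat' ... 1 entries
acc 1
sv 3
processing filename 'hum1.dat' ... 18 entries
acc 53
sv 54
Index acc maxlen 8 items 84
Index sv maxlen 10 items 90
Total 10 files 44 entries
"""

FILE_RE = re.compile(r"^processing filename '([^']+)' \.\.\. (\d+) entries$")
FIELD_RE = re.compile(r"^\s*(\w+) (\d+)$")
INDEX_RE = re.compile(r"^Index (\w+) maxlen (\d+) items (\d+)$")
TOTAL_RE = re.compile(r"^Total (\d+) files (\d+) entries$")


def parse_dbiflat_log(text: str) -> dict:
    """Collect per-file entry/token counts and the per-index summary lines."""
    files, indexes, total = {}, {}, None
    current = None  # file whose per-field token counts are being read
    for line in text.splitlines():
        line = line.rstrip()
        if m := FILE_RE.match(line):
            current = m.group(1)
            files[current] = {"entries": int(m.group(2)), "tokens": {}}
        elif m := INDEX_RE.match(line):
            indexes[m.group(1)] = {"maxlen": int(m.group(2)), "items": int(m.group(3))}
        elif m := TOTAL_RE.match(line):
            total = {"files": int(m.group(1)), "entries": int(m.group(2))}
        elif current and (m := FIELD_RE.match(line)):
            files[current]["tokens"][m.group(1)] = int(m.group(2))
    return {"files": files, "indexes": indexes, "total": total}


log = parse_dbiflat_log(SAMPLE)
print(log["indexes"]["acc"])  # {'maxlen': 8, 'items': 84}
```

Header lines starting with "#" match none of the patterns and are skipped; the sketch requires Python 3.8 or later for the assignment expressions.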
regards, Peter

> %cat outfile.dbiflat
> ########################################
> # Program: dbiflat
> # Rundate: Thu Jul 28 2005 17:04:58
> # Dbname: EMBL
> # Release: 0.0
> # Date: 28/07/05
> # CurrentDirectory: /homes/pmr/hgmp/test/embl/
> # IndexDirectory: ./
> # IndexDirectoryPath: /homes/pmr/hgmp/test/embl/
> # Maxindex: 0
> # Fields: 6
> # Field 1: id
> # Field 2: acc
> # Field 3: sv
> # Field 4: des
> # Field 5: key
> # Field 6: org
> # Directory: ./
> # DirectoryPath: /homes/pmr/hgmp/test/embl/
> # Filenames: *.dat
> # Exclude:
> # Files: 10
> # File 1: ./est.dat
> # File 2: ./fun.dat
> # File 3: ./hum1.dat
> # File 4: ./inv.dat
> # File 5: ./pln.dat
> # File 6: ./pro.dat
> # File 7: ./rod.dat
> # File 8: ./sts.dat
> # File 9: ./vrl.dat
> # File 10: ./vrt.dat
> ########################################
> # Commandline: dbiflat
> # -fields acnum,seqvn,des,keyword,taxon
> # -dbname EMBL
> # -idformat embl
> # -auto
> ########################################
>
> processing filename 'est.dat' ... 1 entries
> acc 1
> sv 3
> des 15
> key 1
> org 14
> processing filename 'fun.dat' ... 1 entries
> acc 1
> sv 3
> des 8
> key 1
> org 9
> processing filename 'hum1.dat' ... 18 entries
> acc 53
> sv 54
> des 200
> key 43
> org 252
> processing filename 'inv.dat' ... 3 entries
> acc 3
> sv 9
> des 20
> key 3
> org 33
> processing filename 'pln.dat' ... 3 entries
> acc 7
> sv 9
> des 19
> key 6
> org 54
> processing filename 'pro.dat' ... 9 entries
> acc 13
> sv 27
> des 77
> key 28
> org 54
> processing filename 'rod.dat' ... 3 entries
> acc 3
> sv 9
> des 28
> key 1
> org 45
> processing filename 'sts.dat' ... 1 entries
> acc 1
> sv 3
> des 12
> key 7
> org 14
> processing filename 'vrl.dat' ... 1 entries
> acc 2
> sv 3
> des 10
> key 1
> org 5
> processing filename 'vrt.dat' ... 4 entries
> acc 4
> sv 12
> des 33
> key 5
> org 55
>
> Index acc maxlen 8 items 84
> Index sv maxlen 10 items 90
> Index des maxlen 19 items 215
> Index key maxlen 44 items 81
> Index org maxlen 27 items 116
>
> Total 10 files 44 entries

From john8376 at uidaho.edu Fri Jul 29 15:08:50 2005 From: john8376 at uidaho.edu (Audra Johnson) Date: Fri, 29 Jul 2005 12:08:50 -0700 Subject: [EMBOSS] Using seqret to fetch from .nal index databases Message-ID: <5C75DDA3-04A4-4A58-B925-31F9F017D8C4@uidaho.edu> Apologies for the length, but I want to be thorough. I'm doing blast searches and then trying to fetch the sequences from our genembl database using seqret. For example:

blastall -p tblastn -d /gcgdata_10.3/gcgblast/genembl -i dp00061_disordered_115_168.fasta

Gives me results of:

GB_PR:HUMRPA70KD 2e-08 412 573 1 54 54
GB_PR:BC018126 2e-08 386 547 1 54 54
GB_PAT:AX335048 2e-08 412 573 1 54 54
GB_PAT:AR175924 2e-08 412 573 1 54 54
GB_RO:BC019119 0.003 399 584 1 53 62

I've tried using seqret with just the database name I'm giving blastall, and specifically with the genembl.nal file:

$ seqret
Reads and writes (returns) sequences
Input sequence(s): /gcgdata_10.3/gcgblast/genembl.nal:HUMRPA70KD
Error: Unable to read sequence '/gcgdata_10.3/gcgblast/genembl.nal:HUMRPA70KD'
Input sequence(s): /gcgdata_10.3/gcgblast/genembl:HUMRPA70KD
Error: failed to open filename '/gcgdata_10.3/gcgblast/genembl'
Error: Unable to read sequence '/gcgdata_10.3/gcgblast/genembl:HUMRPA70KD'
Died: seqret terminated: Bad value for '-sequence' and no more retries

But neither works. (I've omitted the beginning prefix GB_PR: and similar prefixes, but I've tried that way and it doesn't work, either.) Is there any way to get seqret functioning with these databases?
-- Audra Johnson, University of Idaho

From golharam at umdnj.edu Fri Jul 29 15:27:51 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 29 Jul 2005 15:27:51 -0400 Subject: [EMBOSS] Using seqret to fetch from .nal index databases In-Reply-To: <5C75DDA3-04A4-4A58-B925-31F9F017D8C4@uidaho.edu> Message-ID: <008e01c59473$9cf21d70$2f01a8c0@GOLHARMOBILE1> If you are using an NCBI-formatted database, why not just use fastacmd from the NCBI toolkit to extract the sequence? -----Original Message----- From: emboss-bounces at emboss.open-bio.org [mailto:emboss-bounces at emboss.open-bio.org] On Behalf Of Audra Johnson Sent: Friday, July 29, 2005 3:09 PM To: emboss at emboss.open-bio.org Subject: [EMBOSS] Using seqret to fetch from .nal index databases Apologies for the length, but I want to be thorough. I'm doing blast searches and then trying to fetch the sequences from our genembl database using seqret. For example: blastall -p tblastn -d /gcgdata_10.3/gcgblast/genembl -i dp00061_disordered_115_168.fasta Gives me results of: GB_PR:HUMRPA70KD 2e-08 412 573 1 54 54 GB_PR:BC018126 2e-08 386 547 1 54 54 GB_PAT:AX335048 2e-08 412 573 1 54 54 GB_PAT:AR175924 2e-08 412 573 1 54 54 GB_RO:BC019119 0.003 399 584 1 53 62 I've tried using seqret with just the database name I'm giving blastall, and specifically with the genembl.nal file: $ seqret Reads and writes (returns) sequences Input sequence(s): /gcgdata_10.3/gcgblast/genembl.nal:HUMRPA70KD Error: Unable to read sequence '/gcgdata_10.3/gcgblast/genembl.nal:HUMRPA70KD' Input sequence(s): /gcgdata_10.3/gcgblast/genembl:HUMRPA70KD Error: failed to open filename '/gcgdata_10.3/gcgblast/genembl' Error: Unable to read sequence '/gcgdata_10.3/gcgblast/genembl:HUMRPA70KD' Died: seqret terminated: Bad value for '-sequence' and no more retries But neither works. (I've omitted the beginning prefix GB_PR: and similar prefixes, but I've tried that way and it doesn't work, either.)
Is there any way to get seqret functioning with these databases? -- Audra Johnson, University of Idaho _______________________________________________ EMBOSS mailing list EMBOSS at emboss.open-bio.org http://newportal.open-bio.org/mailman/listinfo/emboss From Andrew.Mather at dpi.vic.gov.au Sat Jul 30 07:30:47 2005 From: Andrew.Mather at dpi.vic.gov.au (Andrew.Mather at dpi.vic.gov.au) Date: Sat, 30 Jul 2005 21:30:47 +1000 Subject: [EMBOSS] EMBOSS GUI problems Message-ID: Hi Luke and EMBOSS list I've installed the EMBOSS GUI and for the most part, it's working pretty well. However for some apps (mainly seems to be alignment type ones like water, needle, emma, but that may just be because I've tried more of them than any others), it always fails Error: Unable to read sequence '' Died: water terminated: Bad value for '-asequence' with -auto defined water exited with status 1... or in the /var/www/html/EMBOSS/runs/ error log, Error: Unable to read sequence '' Died: water terminated: Bad value for '-asequence' with -auto defined water exited with status 1... It doesn't seem to matter if it's sequence data pasted in, or uploaded from a file. Some apps work fine, so I'm guessing it's not a fundamental problem like permissions on a temp directory or something. Are you able to point me at where to start lookng ? Thanks, Andrew Animal Genetics and Genomics, PIRVic Attwood 475 Mickleham Road, Attwood, 3049 ph +61 3 92174342 mob 0413 009 761 ---------------- There are 10 kinds of people...those who understand binary and those who don't. From thiago.venancio at gmail.com Mon Jul 18 12:09:33 2005 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Mon, 18 Jul 2005 09:09:33 -0300 Subject: [EMBOSS] error msg Message-ID: <44255ea80507180509386875bd@mail.gmail.com> Hi all. I am new to EMBOSS. I have installed it and got the problem: "wossname: error while loading shared libraries: libnucleus.so.3: cannot open shared object file: No such file or directory" All the EMBOSS programs give the same error. The instalation process have been ok and i have set the envs. Thanks in advance. Thiago From golharam at umdnj.edu Tue Jul 19 16:51:30 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Tue, 19 Jul 2005 12:51:30 -0400 Subject: [EMBOSS] EMBOSS::GUI Web Interface Message-ID: <009401c58c82$1d2a3670$2f01a8c0@GOLHARMOBILE1> Hi Luke, Any word on when EMBOSS-GUI will be available for EMBOSS 3.0.0? Thanks, Ryan From jacob at biochemistry.ucl.ac.uk Wed Jul 20 15:36:24 2005 From: jacob at biochemistry.ucl.ac.uk (Jacob Hurst) Date: Wed, 20 Jul 2005 16:36:24 +0100 (BST) Subject: [EMBOSS] problem with using accession number.... Message-ID: Hello, If I enter the following id seqret correctly returns the sequence. acrm3<113>% seqret embl:hsgstpig Reads and writes (returns) sequences Output sequence [hsgstpig.fasta]: however if i enter the corresponding accession number it fails..... acrm3<114>% seqret embl:X08058 Reads and writes (returns) sequences Error: Unable to read sequence 'embl:X08058' Died: seqret terminated: Bad value for '-sequence' and no prompt I was under the impression that emboss was setup to deal with both accession and id.
regards Jake -- Jacob Hurst Phd Department of Biochemistry and Molecular Biology, University College London From pmr at ebi.ac.uk Wed Jul 20 15:59:55 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 20 Jul 2005 16:59:55 +0100 Subject: [EMBOSS] problem with using accession number.... In-Reply-To: References: Message-ID: <42DE74FB.80604@ebi.ac.uk> Jacob Hurst wrote: > I was under the impression that emboss was setup to deal with both > accession and id. Yes, but ... this depends on how the embl database is defined at your site. Some sites have databases defined to access entries through, for example, a URL or an external application (or script) that can only search for entry names. Hmmmm .... we could add a little more information on this in showdb .... for a future release. If you have difficulty finding out how the database is defined, mail us at emboss-bug at emboss.open-bio.org and we can help you track it down. regards, Peter Rice From golharam at umdnj.edu Thu Jul 21 04:00:03 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 21 Jul 2005 00:00:03 -0400 Subject: [EMBOSS] EMBOSS 3.0.0 RPMs available Message-ID: <013801c58da8$acc10440$2f01a8c0@GOLHARMOBILE1> I'm eager to upgrade our installation of EMBOSS on all our linux workstations, so I've gone ahead and built RPMs for EMBOSS (based on biolinux version) and MYEMBOSS applications. You can download the RPMs and source RPMs from http://serine.umdnj.edu/~golharam/biorpms. They include (sorry for the capitalization): DOMAINATRIX DOMALIGN DOMSEARCH EMBOSS EMBOSS-data EMBOSS-devel EMBOSS-Jemboss EMNU ESIM4 HMMER MEME MSE MYEMBOSS PHYLIP SIGNATURE STRUCTURE TOPO -- Ryan Golhar - golharam at umdnj.edu The Informatics Institute of UMDNJ From james_tan79 at hotmail.com Thu Jul 21 09:41:24 2005 From: james_tan79 at hotmail.com (JT) Date: Thu, 21 Jul 2005 17:41:24 +0800 Subject: [EMBOSS] any DNA or RNA program similar to pepstat ? 
Message-ID: Hi, Is there any program that can output a report of simple DNA/RNA sequence information including e.g.
a) Molecular weight
b) Number of residues
c) Average residue weight
d) %G, %C, %A, %T, %GC
e) Melting temp
f) charge etc.
Thanks James From jison at hgmp.mrc.ac.uk Thu Jul 21 10:49:58 2005 From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison) Date: Thu, 21 Jul 2005 11:49:58 +0100 Subject: [EMBOSS] any DNA or RNA program similar to pepstat ? References: Message-ID: <42DF7DD6.CD81DF9B@hgmp.mrc.ac.uk> Hi James There's no single app to cover all your requests, but some of the following might help (see http://emboss.sourceforge.net/apps/):
dan          Plot melting temperatures for DNA.
freak        Residue/base frequency table or plot
extractfeat  Extract features from a sequence
geecee       Calculates the fractional GC content of nucleic acid sequences
infoseq      Displays some simple information about sequences
isochore     Plots isochores in large DNA sequences
newcpgseek   Reports CpG rich regions
remap        Display a sequence with restriction cut sites, translation etc.
showfeat     Show features of a sequence.
Please have a look at what's available and if you require something else / new functionality etc please get back in touch. Cheers Jon JT wrote: > > Hi, > > Is there any program that can output a report of simple DNA/RNA sequence > information including e.g. > a) Molecular weight > b) Number of residues > c) Average residue weight > d) %G, %C, %A, %T, %GC > e) Melting temp > f) charge etc. > > Thanks > James -- Jon C.
Ison, PhD Proteomics Applications Group MRC Rosalind Franklin Centre for Genomics Research Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK Tel: +44 1223 494500 Fax: +44 1223 494512 E-mail: jison at rfcgr.mrc.ac.uk Web: http://www.rfcgr.mrc.ac.uk From kertib at linuxlap.hu Thu Jul 21 11:38:34 2005 From: kertib at linuxlap.hu (Kerti Balázs Gábor) Date: Thu, 21 Jul 2005 13:38:34 +0200 Subject: [EMBOSS] Some question Message-ID: <200507211338.35035.kertib@linuxlap.hu> Hello! There is some (elementary) question, because I do not find - maybe I do wrong - the solution. - how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ? - how to generate antisense DNA fragm. from a sens. Thank you. Balazs From pmr at ebi.ac.uk Thu Jul 21 11:58:58 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 21 Jul 2005 12:58:58 +0100 Subject: [EMBOSS] Some question In-Reply-To: <200507211338.35035.kertib@linuxlap.hu> References: <200507211338.35035.kertib@linuxlap.hu> Message-ID: <42DF8E02.40909@ebi.ac.uk> Kerti Balázs Gábor wrote: > There is some (elementary) question, because I do not find - maybe I do wrong > - the solution. > > - how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ? The cDNA will be identical to the mRNA. No backtranslation needed. Backtranslation (as in backtranseq) converts a protein sequence into a nucleotide sequence that will translate to the same protein sequence (using the most frequent codon for each amino acid). If you only want to convert U (Uracil) to T (thymine) to convert an RNA sequence to DNA (all EMBOSS programs will accept both as nucleotide input) you can modify the program seqret to specify a nucleotide sequence as input, and generate a DNA sequence as output. An easy way to start writing EMBOSS programs - copy one program and one ACD file and make 4 small edits. > - how to generate antisense DNA fragm. from a sens. In EMBOSS, revseq does this.
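Both conversions discussed in this reply are simple string operations. As an illustration only (Python rather than EMBOSS C code; the names `rna_to_dna` and `reverse_complement` are hypothetical), replacing U with T gives the DNA form of an RNA sequence, and complementing each base and then reversing the order gives the antisense strand that revseq reports. The sketch assumes IUPAC ambiguity codes should be complemented as well.

```python
# Illustration only, not EMBOSS code; function names are hypothetical.
# Complement table for DNA/RNA letters, including IUPAC ambiguity codes
# (U is complemented to A, so RNA input yields a DNA reverse complement).
COMPLEMENT = str.maketrans(
    "ACGTUMRWSYKVHDBNacgtumrwsykvhdbn",
    "TGCAAKYWSRMBDHVNtgcaakywsrmbdhvn",
)


def rna_to_dna(seq: str) -> str:
    """Replace uracil with thymine, preserving case; other letters unchanged."""
    return seq.replace("U", "T").replace("u", "t")


def reverse_complement(seq: str) -> str:
    """Antisense strand: complement each base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement(rna_to_dna("AUGGCU")))  # AGCCAT
```

For example, `rna_to_dna("AUGGCU")` gives `ATGGCT`, whose reverse complement is `AGCCAT`.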
The antisense strand is simply the reverse complement of the original. Hope this helps, Peter Rice From jison at hgmp.mrc.ac.uk Thu Jul 21 12:10:50 2005 From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison) Date: Thu, 21 Jul 2005 13:10:50 +0100 Subject: [EMBOSS] Some question References: <200507211338.35035.kertib@linuxlap.hu> Message-ID: <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> Dear Balazs See http://emboss.sourceforge.net/apps/ for application documentation.
transeq      Translates nucleic acid sequences. (i.e. DNA -> protein)
backtranseq  Back translate a protein sequence (i.e. protein -> DNA)
coderet      Extract CDS, mRNA and translations from feature tables
I don't think there is anything to interchange sense/antisense or mRNA / DNA sequences but something could be written if you let us know exactly what you need / why you need it. Cheers Jon Kerti Balázs Gábor wrote: > > Hello! > > There is some (elementary) question, because I do not find - maybe I do wrong > - the solution. > > - how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ? > - how to generate antisense DNA fragm. from a sens. > > Thank you. > > Balazs -- Jon C. Ison, PhD Proteomics Applications Group MRC Rosalind Franklin Centre for Genomics Research Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK Tel: +44 1223 494500 Fax: +44 1223 494512 E-mail: jison at rfcgr.mrc.ac.uk Web: http://www.rfcgr.mrc.ac.uk From faruque at ebi.ac.uk Thu Jul 21 13:08:30 2005 From: faruque at ebi.ac.uk (Nadeem Faruque) Date: Thu, 21 Jul 2005 14:08:30 +0100 Subject: [EMBOSS] backtranseq In-Reply-To: <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> Message-ID: <42DF9E4E.6060603@ebi.ac.uk> > See http://emboss.sourceforge.net/apps/ for application documentation.
> > transeq Translates nucleic acid sequences. (i.e. DNA -> protein) > backtranseq Back translate a protein sequence (i.e. protein -> DNA) ... While backtranseq is very clever in predicting the cDNA sequence based on peptide sequence by choosing codons according to usage, would it not be very useful to have the option for it to return an answer in degenerate bases? eg in human, the 'peptide' is simply 'M' backtranseq returns the most likely codon used, ie 'ATG' but since it could be TTG, CTG or ATG, it may be more useful for some people to return 'HTG' Returning a degenerate sequence would have the advantage (for some uses) of being usable by normal DNA-savvy string-based search methods when finding the peptide coding location in nucleic acid sequences rather than having to use similarity searches. I could also see it being useful for designing PCR primers within coding regions. Nadeem -- S.M. Nadeem N. Faruque EMBL Nucleotide Database Curation Team EMBL Outstation Tel: +44 1223 494611 Fax: +44 1223 494472 The European Bioinformatics Institute URL: http://www.ebi.ac.uk/ Email for data submissions: datasubs at ebi.ac.uk Email for updates: update at ebi.ac.uk ============================================================================= From pmr at ebi.ac.uk Thu Jul 21 14:00:30 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 21 Jul 2005 15:00:30 +0100 Subject: [EMBOSS] backtranseq In-Reply-To: <42DF9E4E.6060603@ebi.ac.uk> References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> Message-ID: <42DFAA7E.2070107@ebi.ac.uk> Nadeem Faruque wrote: > While backtranseq is very clever in predicting the cDNA sequence based on peptide sequence by choosing codons according > to usage, would it not be very useful to have the option for it to return an answer in degenerate bases?
> > eg in human, the 'peptide' is simply 'M' > backtranseq returns the most likely codon used, ie 'ATG' > but since it could be TTG, CTG or ATG, it may be more useful for some people to return 'HTG' Ummmm .... depends on the genetic code. In human I would expect ATG, in bacteria GTG is the second choice and NTG would be the possible result - but only for a start codon of course (just one of the complexities of backtranslating - I think we must avoid inventing a start codon if the protein doesn't start with 'M' because the numbering gets complicated). As this would need a different input (a genetic code, rather than a codon usage file) I would make this a different program - not difficult to write. Any good suggestions for a program name? > Returning a degenerate sequence would have the advantage (for some uses) of being usable by normal DNA-savvy > string-based search methods when finding the peptide coding location in nucleic acid sequences rather than having to use > similarity searches. I could also see it being useful for designing PCR primers within coding regions. ... which leads on to whether EMBOSS should include such programs :-) regards, Peter Rice From jcherry at ncbi.nlm.nih.gov Thu Jul 21 14:58:14 2005 From: jcherry at ncbi.nlm.nih.gov (Josh Cherry) Date: Thu, 21 Jul 2005 10:58:14 -0400 (EDT) Subject: [EMBOSS] backtranseq In-Reply-To: <42DFAA7E.2070107@ebi.ac.uk> References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk> Message-ID: Nadeem Faruque wrote: > Returning a degenerate sequence would have the advantage (for some uses) > of being usable by normal DNA-savvy string-based search methods when > finding the peptide coding location in nucleic acid sequences rather > than having to use similarity searches.
But this won't work the way some might hope due to the nature of the genetic code, specifically (in the standard code) the three amino acids that have six codons each (S, L, and R). Consider serine, encoded by UCN and AGY. Would you like this to be back-translated to WSN? That matches all six serine codons but also ten non-serine codons. Some people may still want to use it in a probe or primer though. Josh -- Joshua L. Cherry, Ph.D. NCBI/NLM/NIH (Contractor) jcherry at ncbi.nlm.nih.gov From faruque at ebi.ac.uk Thu Jul 21 15:21:35 2005 From: faruque at ebi.ac.uk (Nadeem Faruque) Date: Thu, 21 Jul 2005 16:21:35 +0100 Subject: [EMBOSS] backtranseq In-Reply-To: References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk> Message-ID: <42DFBD7F.7060306@ebi.ac.uk> Josh Cherry wrote: > Nadeem Faruque wrote: > > >>Returning a degenerate sequence would have the advantage (for some uses) >>of being usable by normal DNA-savvy string-based search methods when >>finding the peptide coding location in nucleic acid sequences rather >>than having to use similarity searches. > > > But this won't work the way some might hope due to the nature of the > genetic code, specifically (in the standard code) the three amino acids > that have six codons each (S, L, and R). Consider serine, encoded by UCN > and AGY. Would you like this to be back-translated to WSN? That matches > all six serine codons but also ten non-serine codons. Some people may > still want to use it in a probe or primer though. I was going to use Serine in my example but realised 'WSN' was a bit too degenerate to be a useful example. I understand you could not roundtrip peptide->DNA->peptide with my suggested behaviour (as you can currently do with backtranseq), but you can do DNA->peptide->DNA in a usable form. 
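The degenerate backtranslation proposed in this thread, and the serine problem Josh describes, can be checked with a short Python sketch. It is not EMBOSS code and every function name is hypothetical. It assumes the standard genetic code (NCBI translation table 1), writes DNA letters (T rather than U), and gives start codons no special treatment. Each residue's codons are collapsed position by position into an IUPAC ambiguity code, so M gives ATG and S gives WSN. Expanding WSN back out confirms that only 6 of its 16 codons encode serine, and `exact_pattern` builds the alternative: a regular expression that matches only the allowed codons.

```python
import re
from itertools import product

# Standard genetic code (NCBI translation table 1), DNA letters.
# Codons are enumerated in TCAG order, matching the order of AMINO.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

# IUPAC ambiguity code for each set of bases (keys are sorted base strings).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}
EXPAND = {code: bases for bases, code in IUPAC.items()}


def codons_for(aa):
    """All codons for residue `aa` ('*' = stop), in sorted order."""
    return sorted(c for c, a in CODE.items() if a == aa)


def degenerate_codon(aa):
    """Collapse every codon for `aa` into one ambiguity-coded codon."""
    codons = codons_for(aa)
    if not codons:
        raise ValueError(f"unknown residue {aa!r}")
    return "".join(
        IUPAC["".join(sorted({c[i] for c in codons}))] for i in range(3)
    )


def degenerate_backtranslate(protein):
    """Most-ambiguous backtranslation: one degenerate codon per residue."""
    return "".join(degenerate_codon(aa) for aa in protein.upper())


def expand(dcodon):
    """List every concrete codon matched by an ambiguity-coded codon."""
    return ["".join(b) for b in product(*(EXPAND[x] for x in dcodon.upper()))]


def exact_pattern(protein):
    """Regular expression matching exactly the allowed codons per residue."""
    return "".join("(" + "|".join(codons_for(aa)) + ")" for aa in protein.upper())


if __name__ == "__main__":
    print(degenerate_backtranslate("MKS"))              # ATGAARWSN
    wsn = expand("WSN")
    print(len(wsn), sum(CODE[c] == "S" for c in wsn))   # 16 6
    print(exact_pattern("S"))                           # (AGC|AGT|TCA|TCC|TCG|TCT)
```

Collapsing each position independently is what makes the six-codon residues (L, R, S) over-match. The regular-expression form avoids this, at the cost of no longer being a plain sequence.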
I'm sketchy about its potential use in oligo design, but given a degenerate backtranslation someone could possibly design oligos so as to avoid the more degenerate areas (esp for the 3' end of primers). If they were to use backtranseq they would be ignorant of these regions. Nadeem -- S.M. Nadeem N. Faruque EMBL Nucleotide Database Curation Team EMBL Outstation Tel: +44 1223 494611 Fax: +44 1223 494472 The European Bioinformatics Institute URL: http://www.ebi.ac.uk/ Email for data submissions: datasubs at ebi.ac.uk Email for updates: update at ebi.ac.uk ============================================================================= From pmr at ebi.ac.uk Thu Jul 21 15:55:15 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 21 Jul 2005 16:55:15 +0100 Subject: [EMBOSS] backtranseq In-Reply-To: <42DFBD7F.7060306@ebi.ac.uk> References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk> <42DFBD7F.7060306@ebi.ac.uk> Message-ID: <42DFC563.4010600@ebi.ac.uk> Nadeem Faruque wrote: > Josh Cherry wrote: >>But this won't work the way some might hope due to the nature of the >>genetic code, specifically (in the standard code) the three amino acids >>that have six codons each (S, L, and R). Consider serine, encoded by UCN >>and AGY. Would you like this to be back-translated to WSN? That matches >>all six serine codons but also ten non-serine codons. Some people may >>still want to use it in a probe or primer though. > > I was going to use Serine in my example but realised 'WSN' was a bit too degenerate to be a useful example. > I understand you could not roundtrip peptide->DNA->peptide with my suggested behaviour ... I bet you can!!! Assuming you have a backtranslated sequence, WSN would surely be Serine (as would UCN or AGY). If any of the 3 positions is more specific, that could indicate one of the other possibilities.
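One way to implement the read-back rule discussed in this reply is sketched below in Python. The rule: return the residue when a degenerate codon is unambiguous or is exactly the backtranslation of one residue; otherwise return a lower-case guess by majority vote, scored as the fraction of each residue's own codons matched; return X when there is no single winner. This is an illustration under stated assumptions, not EMBOSS code. `read_back` is a hypothetical name, the standard genetic code (NCBI table 1) is assumed, and returning X on a tie is one possible choice among several.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), DNA letters, TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}
EXPAND = {code: bases for bases, code in IUPAC.items()}


def _collapse(aa):
    """Degenerate codon that a most-ambiguous backtranslation gives for `aa`."""
    codons = [c for c, a in CODE.items() if a == aa]
    return "".join(
        IUPAC["".join(sorted({c[i] for c in codons}))] for i in range(3)
    )


# Exact degenerate codon -> residue, e.g. "WSN" -> "S".
BACK = {_collapse(aa): aa for aa in set(AMINO)}


def read_back(dcodon):
    """Translate one ambiguity-coded codon (U is accepted as T).

    Upper case: all matched codons give one residue, or the codon is exactly
    the degenerate backtranslation of a residue. Lower case: the residue with
    the highest fraction of its own codons matched (majority vote). 'X': tie.
    """
    dcodon = dcodon.upper().replace("U", "T")
    if len(dcodon) != 3:
        raise ValueError(f"expected 3 bases, got {dcodon!r}")
    hits = {}
    for bases in product(*(EXPAND[b] for b in dcodon)):
        aa = CODE["".join(bases)]
        hits[aa] = hits.get(aa, 0) + 1
    if len(hits) == 1:
        return next(iter(hits))
    if dcodon in BACK:
        return BACK[dcodon]
    scores = {aa: n / AMINO.count(aa) for aa, n in hits.items()}
    best = max(scores.values())
    winners = [aa for aa, s in scores.items() if s == best]
    return winners[0].lower() if len(winners) == 1 else "X"


if __name__ == "__main__":
    for codon in ("WSN", "ASN", "UCN", "NNN"):
        print(codon, read_back(codon))
```

With these assumptions, `read_back("ASN")` returns `'t'` (4 of 4 threonine codons matched, against 2 of 6 for serine and 2 of 6 for arginine), `read_back("WSN")` returns `'S'`, and `read_back("NNN")` returns `'X'`.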
I would be happy to accept a lower case residue if the result is uncertain (if the ambiguity codes do not match what one would expect from the genetic code in a backtranslation). For ASN the answer could be T (ACN), S (AGY) or R (AGR) with T ('t') the favourite by a majority vote (4/4 codons match, 2/6 for the others). X can be used if all else fails. After all, we could be translating a sequence with a SNP. A command line option can give the user a choice of trying to resolve unclear positions or using X. Degenerate codons would be:
A GCN
C UGY
D GAY
E GAR
F UUY
G GGN
H CAY
I AUH
K AAR
L YUN (CUN/UUR) - also matches F (UUY)
M AUG
N AAY
P CCN
Q CAR
R MGN (CGN/AGR) - also matches S (AGY)
S WSN (UCN/AGY) - also matches T (ACN), R (AGR), and C, W and * (UGN)
T ACN
V GUN
W UGG
Y UAY
* URR - also matches W (UGG)
m NUG (start codon)
From lukem at gene.pbi.nrc.ca Thu Jul 21 21:08:32 2005 From: lukem at gene.pbi.nrc.ca (Luke McCarthy) Date: Thu, 21 Jul 2005 15:08:32 -0600 Subject: [EMBOSS] EMBOSS explorer Message-ID: <1121980112.5376.11.camel@incognito.invalid> Hi everybody, I'm pleased to finally announce a new release of the EMBOSS interface formerly known as EMBOSS::GUI, now known as EMBOSS explorer. Development has moved to SourceForge.net and the new home page for the interface is http://embossgui.sourceforge.net/ It's quite spartan at the moment, but I'll be adding a FAQ as questions are frequently asked (and answered...) You can download EMBOSS explorer at http://prdownloads.sourceforge.net/embossgui/emboss-explorer-2.0.0.tar.gz?download The new release has been tested against EMBOSS-3.0.0, but not thoroughly. Please report bugs using the bug tracker at http://sourceforge.net/tracker/?atid=699414&group_id=124389&func=browse (as a last resort, email them to mccarthy at users.sourceforge.net, but I'm hoping that use of the bug tracker will help with duplicate reports and other organizational issues...)
Cheers, Luke From gwilliam at hgmp.mrc.ac.uk Fri Jul 22 08:21:40 2005 From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522) Date: Fri, 22 Jul 2005 09:21:40 +0100 Subject: [EMBOSS] backtranseq References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk> Message-ID: <42E0AC94.63F132A7@hgmp.mrc.ac.uk> Peter Rice wrote: > > Nadeem Faruque wrote: > > > While backtranseq is very clever in predicting the cDNA sequence based on peptide sequence by choosing codons according > > to usage, would it not be very useful to have the option for it to return an answer in degenerate bases? > > > > eg in human, the 'peptide' is simply 'M' > > backtranseq returns the most likely codon used, ie 'ATG' > > but since it could be TTG, CTG or ATG, it may be more useful for some people to return 'HTG' > > Ummmm .... depends on the genetic code. In human I would expect ATG, in > bacteria GTG is the second choice and NTG would be the possible result - but only > for a start codon of course (just one of the complexities of backtranslating - > I think we must avoid inventing a start codon if the protein doesn't start > with 'M' because the numbering gets complicated). > > As this would need a different input (a genetic code, rather than a codon > usage file) I would make this a different program - not difficult to write. > > Any good suggestions for a program name?
barebackseq -- Gary Williams MRC Rosalind Franklin Centre for Genomics Research Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK Tel: +44 1223 494522 (UNTIL END OF JULY 2005) E-mail: gareth.williams57 at ntlworld.com From gbottu at ben.vub.ac.be Fri Jul 22 09:10:17 2005 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Fri, 22 Jul 2005 11:10:17 +0200 Subject: [EMBOSS] Some question In-Reply-To: <42DF8E02.40909@ebi.ac.uk> References: <200507211338.35035.kertib@linuxlap.hu> <42DF8E02.40909@ebi.ac.uk> Message-ID: <20050722091017.GA27340@bigben.ulb.ac.be> On Thu, Jul 21, 2005 at 12:58:58PM +0100, Peter Rice wrote: > Kerti Balázs Gábor wrote: > > > There is some (elementary) question, because I do not find - maybe I do wrong > > - the solution. > > > > - how to backtranslate a cds mRNA fragm. to (c)DNA fragm. ? > > The cDNA will be identical to the mRNA. No backtranslation needed. > Backtranslation (as in backtranseq) converts a protein sequence into a > nucleotide sequence that will translate to the same protein sequence (using > the most frequent codon for each amino acid). > > If you only want to convert U (Uracil) to T (thymine) to convert an RNA > sequence to DNA (all EMBOSS programs will accept both as nucleotide input) you > can modify the program seqret to specify a nucleotide sequence as input, and > generate a DNA sequence as output. An easy way to start writing EMBOSS > programs - copy one program and one ACD file and make 4 small edits. No need to modify seqret, the EMBOSS program biosed can be used to replace U by T in a sequence.
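For readers who want to see the substitution itself, here is a minimal Python sketch of the same U-to-T replacement done outside EMBOSS. The function name rna_to_dna is hypothetical and is not part of EMBOSS or biosed; the sketch assumes the input is a bare sequence string (no FASTA header or other format) and maps only U to T and u to t, leaving every other character, including ambiguity codes, untouched.

```python
# Hypothetical helper, not part of EMBOSS: converts an RNA sequence string
# to DNA by replacing U with T (and u with t). All other characters,
# including IUPAC ambiguity codes and lowercase residues, pass through as-is.
_U_TO_T = str.maketrans("Uu", "Tt")


def rna_to_dna(seq: str) -> str:
    return seq.translate(_U_TO_T)


if __name__ == "__main__":
    print(rna_to_dna("AUGgcuUAA"))  # ATGgctTAA
```

Because only U and u are mapped, running it on a sequence that is already DNA returns it unchanged.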
Guy Bottu, Belgian EMBnet Node From gbottu at ben.vub.ac.be Fri Jul 22 09:26:38 2005 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Fri, 22 Jul 2005 11:26:38 +0200 Subject: [EMBOSS] backtranseq In-Reply-To: <42DFC563.4010600@ebi.ac.uk> References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk> <42DFBD7F.7060306@ebi.ac.uk> <42DFC563.4010600@ebi.ac.uk> Message-ID: <20050722092638.GB27340@bigben.ulb.ac.be> I remember that the GCG program backtranslate let the user choose between the most likely backtranslation (as backtranseq does) and the most ambiguous backtranslation. So, adding to EMBOSS a program that makes the most ambiguous backtranslation would bring back this lost functionality. As for the problem cases like Serine, maybe an option to make instead of a sequence with ambiguity symbols a regular expression that exactly matches the allowed codons ? The utility of this may be limited, but you could e.g. if you have a peptide use the backtranslation with the program dreg to search the corresponding CDS in a piece of DNA. Regards, Guy Bottu, Belgian EMBnet Node From faruque at ebi.ac.uk Fri Jul 22 10:22:27 2005 From: faruque at ebi.ac.uk (Nadeem Faruque) Date: Fri, 22 Jul 2005 11:22:27 +0100 Subject: [EMBOSS] backtranseq In-Reply-To: <20050722092638.GB27340@bigben.ulb.ac.be> References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk> <42DFBD7F.7060306@ebi.ac.uk> <42DFC563.4010600@ebi.ac.uk> <20050722092638.GB27340@bigben.ulb.ac.be> Message-ID: <42E0C8E3.8060900@ebi.ac.uk> > As for the problem cases like Serine, maybe an option to make instead of a > sequence with ambiguity symbols a regular expression that exactly matches > the allowed codons ? The utility of this may be limited, but you could > e.g.
if you have a peptide use the backtranslation with the program dreg to > search the corresponding CDS in a piece of DNA. I think we'd be better off with plain old IUPAC rather than venturing into more complex systems or we'll end up with weighted matrices or even HMMs. The advantage of IUPAC is of course that you can plug it into most other programs. Nadeem -- S.M. Nadeem N. Faruque EMBL Nucleotide Database Curation Team EMBL Outstation Tel: +44 1223 494611 Fax: +44 1223 494472 The European Bioinformatics Institute URL: http://www.ebi.ac.uk/ Email for data submissions: datasubs at ebi.ac.uk Email for updates: update at ebi.ac.uk ============================================================================= From pmr at ebi.ac.uk Fri Jul 22 12:52:49 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 22 Jul 2005 13:52:49 +0100 Subject: [EMBOSS] backtranseq In-Reply-To: <42E0C8E3.8060900@ebi.ac.uk> References: <200507211338.35035.kertib@linuxlap.hu> <42DF90CA.9FB0FD34@hgmp.mrc.ac.uk> <42DF9E4E.6060603@ebi.ac.uk> <42DFAA7E.2070107@ebi.ac.uk> <42DFBD7F.7060306@ebi.ac.uk> <42DFC563.4010600@ebi.ac.uk> <20050722092638.GB27340@bigben.ulb.ac.be> <42E0C8E3.8060900@ebi.ac.uk> Message-ID: <42E0EC21.4030607@ebi.ac.uk> Nadeem Faruque wrote: > I think we'd be better off with plain old IUPAC rather than venturing into more complex systems or we'll end up with > weighted matrices or even HMMs. > The advantage of IUPAC is of course that you can plug it into most other programs. Well .... how about this part of IUPAC: IUBMB recommends marking unclear codons, for example in http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html "To avoid ambiguity, therefore, it is important to make it clear whenever the triplet YTN, for example, occurs in a sequence deduced from the occurrence of a leucine residue in the corresponding amino acid sequence that it does not include TTT or TTC as possibilities, etc. To emphasise this, it may be helpful to print such triplets in italics." ...
we could use lowercase, rather than italics, to make this clear. IUPAC also allows uncertain positions with (A,C,D) or (H.I.K.L). EMBOSS allows these, but after checking all occurrences in PIR it simply ignores the extra characters and assumes the amino acids are in the correct sequence. These are needed because Sanger protein sequencing determined composition but usually not the order of residues. I see no codes for a choice of amino acids, other than B (D or N) and Z (E or Q), both from amino acid sequence composition, where hydrolyzing all amide bonds converted N to D (Asparagine to Aspartate) and Q to E (glutamine to glutamate). Also, one IUPAC report notes that NMR data can include J for "I or L" as Leucine and Isoleucine are indistinguishable by NMR. EMBOSS so far ignores this code (I only discovered it today :-). U is now officially used for selenocysteine, although many EMBOSS programs cannot handle U and have to use X. The only character not used in amino acid sequence is O. I have seen it used in DNA sequence (CpG islands represented as OJ for specialised alignment scoring in one publication). From pmr at ebi.ac.uk Fri Jul 22 15:00:01 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 22 Jul 2005 16:00:01 +0100 Subject: [EMBOSS] EMBOSS in August Message-ID: <42E109F1.9070604@ebi.ac.uk> We know it is close to the end of July, and we have not said what is happening to the EMBOSS team. We do have a solution, but it is not yet officially confirmed. The Rosalind Franklin Centre for Genomics Research will close at the end of next week. The EMBOSS project will move to the European Bioinformatics Institute from August 1st. Development and support will continue as before.
The EMBOSS homepage will remain at http://emboss.sourceforge.net/ The FTP server (to download EMBOSS releases and updates) has moved to ftp://emboss.open-bio.org/pub/EMBOSS/ The EMBOSS anonymous CVS server will remain at cvs.open-bio.org hosted by the Open Bio Foundation, who will also continue to host the developers' CVS server. The EMBOSS mailing lists have been moved to the Open Bio Foundation, so the addresses are now: To contact the EMBOSS team: emboss-bug at emboss.open-bio.org Bug reports and support requests emboss-submit at emboss.open-bio.org Code submissions Lists users/developers can subscribe to: emboss at emboss.open-bio.org Users mailing list emboss-dev at emboss.open-bio.org Developers mailing list emboss-announce at emboss.open-bio.org New release announcements list There are obvious gaps in these details ... more news as soon as we have confirmation. regards, Peter Rice, Alan Bleasby and the EMBOSS team. From maoj at helix.nih.gov Mon Jul 25 13:58:06 2005 From: maoj at helix.nih.gov (Jean Mao) Date: Mon, 25 Jul 2005 09:58:06 -0400 Subject: [EMBOSS] (no subject) Message-ID: <200507251358.j6PDw5N94183035@helix.nih.gov> Hello all, I am building emboss package on our linux cluster. Since it will be for multiple batch run purpose, there is no need for us to include X11. I got the following error during 'make install'. Can someone tell me which programs use X11 and how to turn it off in them before running 'make install'? Many thanks!!! 
---------------------------------------------------------------------------- --------------------------------------- make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss' /bin/sh ../libtool --tag=CC --mode=link gcc -O2 -o aaindexextract aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la ../ajax/libajax.la ../plplot/libplplot.la -lX11 -lm gcc -O2 -o .libs/aaindexextract aaindexextract.o ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm -Wl,--rpath -Wl,/usr/local/EMBOSS-3.0.0/lib /usr/bin/ld: cannot find -lX11 collect2: ld returned 1 exit status make[2]: *** [aaindexextract] Error 1 make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss' make[1]: *** [install-recursive] Error 1 make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss' make: *** [install-recursive] Error 1 ---------------------------------------------------------------------------- --------------------------------------- Jean From msarachu at biol.unlp.edu.ar Mon Jul 25 14:44:12 2005 From: msarachu at biol.unlp.edu.ar (Martin Sarachu) Date: Mon, 25 Jul 2005 11:44:12 -0300 Subject: [EMBOSS] wEMBOSS-1.5 & wrappers4EMBOSS-1.3 Message-ID: <42E4FABC.20302@biol.unlp.edu.ar> This message is to announce the release of wEMBOSS-1.5 and wrappers4EMBOSS-1.3.

wEMBOSS-1.5 includes:
* a session indicator to identify which user is running wEMBOSS
* the possibility to add notes to project results

wrappers4EMBOSS-1.3 includes:
* codehop wrapper for selecting degenerate primers
* muscle wrapper for multiple alignments

Both are available at http://www.wemboss.org -- Martin Sarachu msarachu at biol.unlp.edu.ar AR.EMBnet http://www.ar.embnet.org From maoj at helix.nih.gov Mon Jul 25 15:20:39 2005 From: maoj at helix.nih.gov (Jean Mao) Date: Mon, 25 Jul 2005 11:20:39 -0400 Subject: [EMBOSS] How to exclude X11 when Compile Emboss In-Reply-To: <71B0C9CB1FF4EA43BB48C08DCFF1A1FF01364AC3@NIHCESMLBX.nih.gov> Message-ID:
<200507251520.j6PFKdN93765833@helix.nih.gov> > Hello all, > > I am building emboss package on our linux cluster. Since it will be for > multiple batch run purpose, there is no need for us to include X11. I got > the following error during 'make install'. Can someone tell me which > programs use X11 and how to turn it off in them before running 'make > install'? Many thanks!!! > ------------------------------------------------------- > make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss' > /bin/sh ../libtool --tag=CC --mode=link gcc -O2 -o aaindexextract > aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la > ../ajax/libajax.la ../plplot/libplplot.la -lX11 -lm > gcc -O2 -o .libs/aaindexextract aaindexextract.o > ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so > ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm > -Wl,--rpath -Wl,/usr/local/EMBOSS-3.0.0/lib > /usr/bin/ld: cannot find -lX11 > collect2: ld returned 1 exit status > make[2]: *** [aaindexextract] Error 1 > make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss' > make[1]: *** [install-recursive] Error 1 > make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss' > make: *** [install-recursive] Error 1 > ---------------------------------------------------------- > Jean > From maoj at mail.nih.gov Mon Jul 25 13:56:20 2005 From: maoj at mail.nih.gov (Mao, Jean (NIH/CIT)) Date: Mon, 25 Jul 2005 09:56:20 -0400 Subject: [EMBOSS] How to Turn X11 off during Make? Message-ID: <71B0C9CB1FF4EA43BB48C08DCFF1A1FF01730B6E@NIHCESMLBX.nih.gov> Hello all, I am building emboss package on our linux cluster. Since it will be for multiple batch run purpose, there is no need for us to include X11. I got the following error during 'make install'. Can someone tell me which programs use X11 and how to turn it off in them before running 'make install'? Many thanks!!! 
---------------------------------------------------------------------------- --------------------------------------- make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss' /bin/sh ../libtool --tag=CC --mode=link gcc -O2 -o aaindexextract aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la ../ajax/libajax.la ../plplot/libplplot.la -lX11 -lm gcc -O2 -o .libs/aaindexextract aaindexextract.o ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm -Wl,--rpath -Wl,/usr/local/EMBOSS-3.0.0/lib /usr/bin/ld: cannot find -lX11 collect2: ld returned 1 exit status make[2]: *** [aaindexextract] Error 1 make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss' make[1]: *** [install-recursive] Error 1 make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss' make: *** [install-recursive] Error 1 ---------------------------------------------------------------------------- --------------------------------------- Jean From idrummon at receptor.mgh.harvard.edu Mon Jul 25 16:25:28 2005 From: idrummon at receptor.mgh.harvard.edu (Iain Drummond) Date: Mon, 25 Jul 2005 12:25:28 -0400 Subject: [EMBOSS] How to exclude X11 when Compile Emboss In-Reply-To: <200507251520.j6PFKdN93765833@helix.nih.gov> Message-ID: Jean, Either tell emboss where to find the X11 libraries during the ./configure step: X features: --x-includes=DIR X include files are in DIR --x-libraries=DIR X library files are in DIR for example ./configure --x-includes=/usr/local/includes --x-libraries=/usr/local/lib or decide not to use X11 at all ./configure --without-x you can get this info by typing ./configure -help Iain Drummond -- Iain Drummond, Ph.D. Assistant Professor Department of Medicine, Harvard Medical School and Renal Unit, Massachusetts General Hospital Mailing address: Renal Unit / MGH 149-8000 149 13th St. 
Charlestown, MA 02129 Tel: 617 726 5647 Fax: 617 726 5669 idrummond at partners.org idrummon at receptor.mgh.harvard.edu Lab Home Page: http://danio.mgh.harvard.edu > From: "Jean Mao" > Organization: CIT > Reply-To: maoj at helix.nih.gov > Date: Mon, 25 Jul 2005 11:20:39 -0400 > To: > Subject: [EMBOSS] How to exclude X11 when Compile Emboss > > >> Hello all, >> >> I am building emboss package on our linux cluster. Since it will be for >> multiple batch run purpose, there is no need for us to include X11. I got >> the following error during 'make install'. Can someone tell me which >> programs use X11 and how to turn it off in them before running 'make >> install'? Many thanks!!! >> ------------------------------------------------------- >> make[2]: Entering directory `/usr/local/EMBOSS-3.0.0/emboss' >> /bin/sh ../libtool --tag=CC --mode=link gcc -O2 -o aaindexextract >> aaindexextract.o ../nucleus/libnucleus.la ../ajax/libajaxg.la >> ../ajax/libajax.la ../plplot/libplplot.la -lX11 -lm >> gcc -O2 -o .libs/aaindexextract aaindexextract.o >> ../nucleus/.libs/libnucleus.so ../ajax/.libs/libajaxg.so >> ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -lX11 -lm >> -Wl,--rpath -Wl,/usr/local/EMBOSS-3.0.0/lib >> /usr/bin/ld: cannot find -lX11 >> collect2: ld returned 1 exit status >> make[2]: *** [aaindexextract] Error 1 >> make[2]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss' >> make[1]: *** [install-recursive] Error 1 >> make[1]: Leaving directory `/usr/local/EMBOSS-3.0.0/emboss' >> make: *** [install-recursive] Error 1 >> ---------------------------------------------------------- >> Jean >> > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss From david at compbio.dundee.ac.uk Tue Jul 26 15:02:51 2005 From: david at compbio.dundee.ac.uk (David Martin) Date: Tue, 26 Jul 2005 16:02:51 +0100 Subject: [EMBOSS] dbxflat woes Message-ID: I am trying to run 
dbxflat on uniprot (sprot/trembl/tremblnew) and it gets most of the way through the second file then repeatably fails with the error: Processing file ./sprot.dat Processing file ./trembl.dat EMBOSS An error in ajindex.c at line 811: Something has unlocked the PRI root cache page Any hints on what I can do to avoid this? I am running as an unprivileged user. ..d From ableasby at hgmp.mrc.ac.uk Tue Jul 26 15:55:19 2005 From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby) Date: Tue, 26 Jul 2005 16:55:19 +0100 (BST) Subject: [EMBOSS] dbxflat woes Message-ID: <200507261555.j6QFtJdq005430@bromine.hgmp.mrc.ac.uk> >Something has unlocked the PRI root cache page With an error like that the first thing to check is if you've set CACHESIZE too small. The docs recommend that it's set to 200. If that isn't the problem then email me with your settings for: a) PAGESIZE b) CACHESIZE c) Resource definition from emboss.default and also email me with the command line you are using. Rgds Alan From pmr at ebi.ac.uk Wed Jul 27 10:04:10 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 27 Jul 2005 11:04:10 +0100 Subject: [EMBOSS] Database indexing logfiles Message-ID: <42E75C1A.5010606@ebi.ac.uk> Some questions for those who index their own databases in EMBOSS... I am adding an output file to the programs to log information from the indexing run. A sample for indexing the "tembl" test database is included below (data files are in the test/embl directory). Is this useful? What other information would you like to see? Can we improve the format of the report?
regards,
Peter Rice

%cat outfile.dbiflat
########################################
# Program: dbiflat
# Rundate: Wed Jul 27 2005 11:02:22
# Dbname: EMBL
# Release: 0.0
# Date: 00/00/00
# IndexDirectory: ./
# Maxindex: 0
# Fields: 6
# Field 1: id
# Field 2: acnum
# Field 3: seqvn
# Field 4: des
# Field 5: keyword
# Field 6: taxon
# Directory: ./
# Filenames: *.dat
# Exclude:
# Files: 10
# File 1: ./est.dat
# File 2: ./fun.dat
# File 3: ./hum1.dat
# File 4: ./inv.dat
# File 5: ./pln.dat
# File 6: ./pro.dat
# File 7: ./rod.dat
# File 8: ./sts.dat
# File 9: ./vrl.dat
# File 10: ./vrt.dat
########################################
processing filename 'est.dat' ... 1 entries
processing filename 'fun.dat' ... 1 entries
processing filename 'hum1.dat' ... 18 entries
processing filename 'inv.dat' ... 3 entries
processing filename 'pln.dat' ... 3 entries
processing filename 'pro.dat' ... 9 entries
processing filename 'rod.dat' ... 3 entries
processing filename 'sts.dat' ... 1 entries
processing filename 'vrl.dat' ... 1 entries
processing filename 'vrt.dat' ... 4 entries
Index acnum maxlen 8 items 88
Index seqvn maxlen 10 items 132
Index des maxlen 19 items 422
Index keyword maxlen 44 items 96
Index taxon maxlen 27 items 535
Total 10 files 44 entries

From smiddha at indiana.edu Wed Jul 27 20:28:56 2005 From: smiddha at indiana.edu (Sumit Middha) Date: Wed, 27 Jul 2005 15:28:56 -0500 Subject: [EMBOSS] EMBOSS explorer In-Reply-To: <1121980112.5376.11.camel@incognito.invalid> References: <1121980112.5376.11.camel@incognito.invalid> Message-ID: <1122496136.42e7ee8815bd3@webmail.iu.edu> Hi, It's great to hear of the interface. I want to install it to my own directories (possibly the same where I untar everything) and then I will manage to point my web-pages or cgi etc to these. But I am not sure how to achieve that. This is my attempt at installation. Can someone help me with this? Thanks.

> ./install
installing EMBOSS Explorer perl modules...
Checking if your kit is complete...
Looks good
Writing Makefile for EMBOSS::GUI
cp lib/EMBOSS/ACD.pm blib/lib/EMBOSS/ACD.pm
cp lib/EMBOSS/GUI.pm blib/lib/EMBOSS/GUI.pm
cp lib/EMBOSS/GUI/Conf.pm blib/lib/EMBOSS/GUI/Conf.pm
cp lib/EMBOSS/GUI/XHTML.pm blib/lib/EMBOSS/GUI/XHTML.pm
Manifying blib/man3/EMBOSS::GUI.3
Manifying blib/man3/EMBOSS::ACD.3
Manifying blib/man3/EMBOSS::GUI::Conf.3
Manifying blib/man3/EMBOSS::GUI::XHTML.3
Warning: You do not have permissions to install into /usr/local/lib/perl5/site_perl/5.8.5/sun4-solaris at /usr/local/lib/perl5/5.8.5/ExtUtils/Install.pm line 114.
mkdir /usr/local/lib/perl5/site_perl/5.8.5/EMBOSS: Permission denied at /usr/local/lib/perl5/5.8.5/ExtUtils/Install.pm line 176
*** Error code 255
make: Fatal error: Command failed for target `pure_site_install'

Quoting Luke McCarthy : > Hi everybody, > > I'm pleased to finally announce a new release of the EMBOSS interface > formerly known as EMBOSS::GUI, now known as EMBOSS explorer. > > Development has moved to SourceForge.net and the new home page for the > interface is http://embossgui.sourceforge.net/ It's quite spartan at > the moment, but I'll be adding a FAQ as questions are frequently asked > (and answered...) > > You can download EMBOSS explorer at > http://prdownloads.sourceforge.net/embossgui/emboss-explorer-2.0.0.tar.gz?download > > The new release has been tested against EMBOSS-3.0.0, but not > thoroughly. Please report bugs using the bug tracker at > http://sourceforge.net/tracker/?atid=699414&group_id=124389&func=browse > (as a last resort, email them to mccarthy at users.sourceforge.net, but I'm > hoping that use of the bug tracker will help with duplicate reports and > other organizational issues...)
> > Cheers, > > Luke > From lukem at gene.pbi.nrc.ca Wed Jul 27 21:05:46 2005 From: lukem at gene.pbi.nrc.ca (Luke McCarthy) Date: Wed, 27 Jul 2005 15:05:46 -0600 Subject: [EMBOSS] EMBOSS explorer In-Reply-To: <1122496136.42e7ee8815bd3@webmail.iu.edu> References: <1121980112.5376.11.camel@incognito.invalid> <1122496136.42e7ee8815bd3@webmail.iu.edu> Message-ID: <1122498346.25556.7.camel@incognito.invalid> On Wed, 2005-07-27 at 14:28, Sumit Middha wrote: > Hi, > It's great to hear of the interface. I want to install it to my own directories > (possibly the same where I untar everything) and then I will manage to point my > web-pages or cgi etc to these. But I am not sure how to achieve that. > > This is my attempt at installation. Can someone help me with this? Thanks.

At the moment, you can't use the install script to install to your local directories. You'd have to do quite a bit of extra setup anyway, to make sure the web server could find (and had permission to access) the library files in your own directory. That being said, you can install the Perl modules like you would any others:

perl Makefile.PL
make
make install

You'll have to pass the appropriate options to Makefile.PL in order to install to your own directory. Alternatively, you can just run everything out of the untarred directory. You'll have to make sure that the web server is looking for Perl modules in the emboss-explorer/lib directory, and you'll have to link appropriately to the html and cgi directories. The webserver user needs to be able to read everything in the lib, html and cgi directories, to execute the script in the cgi directory, and to write to the html/output directory. I assume that you know how to set up your webserver accordingly (or you wouldn't be asking...)
You'll also have to edit emboss-explorer/lib/EMBOSS/GUI/Conf.pm and fill in the correct locations. Good luck. Cheers, Luke From pmr at ebi.ac.uk Thu Jul 28 16:14:06 2005 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 28 Jul 2005 17:14:06 +0100 Subject: [EMBOSS] Database indexing logfiles In-Reply-To: <42E75C1A.5010606@ebi.ac.uk> References: <42E75C1A.5010606@ebi.ac.uk> Message-ID: <42E9044E.4090707@ebi.ac.uk> After comments on this list, I have updated the dbiflat logfile. It now includes: Field names are the short names used by the USA (the index file names still work on the dbiflat commandline). These are the same names as SRS uses in its commandline queries. Numbers of tokens for each index field in each file, and total unique values in each field index. Full paths for all directories (including the current working directory) Today's date (also written to the index file headers) if no date is given. The full commandline - if there were prompts with non-default replies these will be included in the commandline reported. This uses new ACD functions that can be used to report in other programs. Any special requests for this information in other outputs? 
regards,
Peter

> %cat outfile.dbiflat
> ########################################
> # Program: dbiflat
> # Rundate: Thu Jul 28 2005 17:04:58
> # Dbname: EMBL
> # Release: 0.0
> # Date: 28/07/05
> # CurrentDirectory: /homes/pmr/hgmp/test/embl/
> # IndexDirectory: ./
> # IndexDirectoryPath: /homes/pmr/hgmp/test/embl/
> # Maxindex: 0
> # Fields: 6
> # Field 1: id
> # Field 2: acc
> # Field 3: sv
> # Field 4: des
> # Field 5: key
> # Field 6: org
> # Directory: ./
> # DirectoryPath: /homes/pmr/hgmp/test/embl/
> # Filenames: *.dat
> # Exclude:
> # Files: 10
> # File 1: ./est.dat
> # File 2: ./fun.dat
> # File 3: ./hum1.dat
> # File 4: ./inv.dat
> # File 5: ./pln.dat
> # File 6: ./pro.dat
> # File 7: ./rod.dat
> # File 8: ./sts.dat
> # File 9: ./vrl.dat
> # File 10: ./vrt.dat
> ########################################
> # Commandline: dbiflat
> # -fields acnum,seqvn,des,keyword,taxon
> # -dbname EMBL
> # -idformat embl
> # -auto
> ########################################
>
> processing filename 'est.dat' ... 1 entries
> acc 1
> sv 3
> des 15
> key 1
> org 14
> processing filename 'fun.dat' ... 1 entries
> acc 1
> sv 3
> des 8
> key 1
> org 9
> processing filename 'hum1.dat' ... 18 entries
> acc 53
> sv 54
> des 200
> key 43
> org 252
> processing filename 'inv.dat' ... 3 entries
> acc 3
> sv 9
> des 20
> key 3
> org 33
> processing filename 'pln.dat' ... 3 entries
> acc 7
> sv 9
> des 19
> key 6
> org 54
> processing filename 'pro.dat' ... 9 entries
> acc 13
> sv 27
> des 77
> key 28
> org 54
> processing filename 'rod.dat' ... 3 entries
> acc 3
> sv 9
> des 28
> key 1
> org 45
> processing filename 'sts.dat' ... 1 entries
> acc 1
> sv 3
> des 12
> key 7
> org 14
> processing filename 'vrl.dat' ... 1 entries
> acc 2
> sv 3
> des 10
> key 1
> org 5
> processing filename 'vrt.dat' ... 4 entries
> acc 4
> sv 12
> des 33
> key 5
> org 55
>
> Index acc maxlen 8 items 84
> Index sv maxlen 10 items 90
> Index des maxlen 19 items 215
> Index key maxlen 44 items 81
> Index org maxlen 27 items 116
>
> Total 10 files 44 entries

From john8376 at uidaho.edu Fri Jul 29 19:08:50 2005 From: john8376 at uidaho.edu (Audra Johnson) Date: Fri, 29 Jul 2005 12:08:50 -0700 Subject: [EMBOSS] Using seqret to fetch from .nal index databases Message-ID: <5C75DDA3-04A4-4A58-B925-31F9F017D8C4@uidaho.edu> Apologies for the length, but I want to be thorough. I'm doing blast searches and then trying to fetch the sequences from our genembl database using seqret. For example:

blastall -p tblastn /gcgdata_10.3/gcgblast/genembl -i dp00061_disordered_115_168.fasta

Gives me results of:

GB_PR:HUMRPA70KD 2e-08 412 573 1 54 54
GB_PR:BC018126 2e-08 386 547 1 54 54
GB_PAT:AX335048 2e-08 412 573 1 54 54
GB_PAT:AR175924 2e-08 412 573 1 54 54
GB_RO:BC019119 0.003 399 584 1 53 62

I've tried using seqret with just the database name I'm giving blastall, and also specifying the genembl.nal file:

$ seqret
Reads and writes (returns) sequences
Input sequence(s): /gcgdata_10.3/gcgblast/genembl.nal:HUMRPA70KD
Error: Unable to read sequence '/gcgdata_10.3/gcgblast/genembl.nal:HUMRPA70KD'
Input sequence(s): /gcgdata_10.3/gcgblast/genembl:HUMRPA70KD
Error: failed to open filename '/gcgdata_10.3/gcgblast/genembl'
Error: Unable to read sequence '/gcgdata_10.3/gcgblast/genembl:HUMRPA70KD'
Died: seqret terminated: Bad value for '-sequence' and no more retries

But neither works. (I've omitted the beginning prefix GB_PR: and similar prefixes, but I've tried that way and it doesn't work, either.) Is there any way to get seqret functioning with these databases?
-- Audra Johnson, University of Idaho From golharam at umdnj.edu Fri Jul 29 19:27:51 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 29 Jul 2005 15:27:51 -0400 Subject: [EMBOSS] Using seqret to fetch from .nal index databases In-Reply-To: <5C75DDA3-04A4-4A58-B925-31F9F017D8C4@uidaho.edu> Message-ID: <008e01c59473$9cf21d70$2f01a8c0@GOLHARMOBILE1> If you are using an NCBI formatted database, why not just use fastacmd from the NCBI toolkit to extract the sequence? -----Original Message----- From: emboss-bounces at emboss.open-bio.org [mailto:emboss-bounces at emboss.open-bio.org] On Behalf Of Audra Johnson Sent: Friday, July 29, 2005 3:09 PM To: emboss at emboss.open-bio.org Subject: [EMBOSS] Using seqret to fetch from .nal index databases Apologies for the length, but I want to be thorough. I'm doing blast searches and then trying to fetch the sequences from our genembl database using seqret. For example: blastall -p tblastn /gcgdata_10.3/gcgblast/genembl -i dp00061_disordered_115_168.fasta Gives me results of: GB_PR:HUMRPA70KD 2e-08 412 573 1 54 54 GB_PR:BC018126 2e-08 386 547 1 54 54 GB_PAT:AX335048 2e-08 412 573 1 54 54 GB_PAT:AR175924 2e-08 412 573 1 54 54 GB_RO:BC019119 0.003 399 584 1 53 62 I've tried using seqret with just the database name I'm giving blastall, and also specifying the genembl.nal file: $ seqret Reads and writes (returns) sequences Input sequence(s): /gcgdata_10.3/gcgblast/genembl.nal:HUMRPA70KD Error: Unable to read sequence '/gcgdata_10.3/gcgblast/genembl.nal:HUMRPA70KD' Input sequence(s): /gcgdata_10.3/gcgblast/genembl:HUMRPA70KD Error: failed to open filename '/gcgdata_10.3/gcgblast/genembl' Error: Unable to read sequence '/gcgdata_10.3/gcgblast/genembl:HUMRPA70KD' Died: seqret terminated: Bad value for '-sequence' and no more retries But neither works. (I've omitted the beginning prefix GB_PR: and similar prefixes, but I've tried that way and it doesn't work, either.)
Is there any way to get seqret functioning with these databases? -- Audra Johnson, University of Idaho From Andrew.Mather at dpi.vic.gov.au Sat Jul 30 11:30:47 2005 From: Andrew.Mather at dpi.vic.gov.au (Andrew.Mather at dpi.vic.gov.au) Date: Sat, 30 Jul 2005 21:30:47 +1000 Subject: [EMBOSS] EMBOSS GUI problems Message-ID: Hi Luke and EMBOSS list I've installed the EMBOSS GUI and for the most part, it's working pretty well. However for some apps (mainly seems to be alignment type ones like water, needle, emma, but that may just be because I've tried more of them than any others), it always fails: Error: Unable to read sequence '' Died: water terminated: Bad value for '-asequence' with -auto defined water exited with status 1... or in the /var/www/html/EMBOSS/runs/ error log, Error: Unable to read sequence '' Died: water terminated: Bad value for '-asequence' with -auto defined water exited with status 1... It doesn't seem to matter if it's sequence data pasted in, or uploaded from a file. Some apps work fine, so I'm guessing it's not a fundamental problem like permissions on a temp directory or something. Are you able to point me at where to start looking? Thanks, Andrew Animal Genetics and Genomics, PIRVic Attwood 475 Mickleham Road, Attwood, 3049 ph +61 3 92174342 mob 0413 009 761 ---------------- There are 10 kinds of people...those who understand binary and those who don't. From Andrew.Mather at dpi.vic.gov.au Sat Jul 30 10:40:45 2005 From: Andrew.Mather at dpi.vic.gov.au (Andrew.Mather at dpi.vic.gov.au) Date: Sat, 30 Jul 2005 20:40:45 +1000 Subject: [EMBOSS] EMBOSS GUI problems Message-ID: Hi Luke and EMBOSS list I've installed the EMBOSS GUI and for the most part, it's working pretty well.
However for some apps (mainly seems to be alignment type ones like water, needle, emma, but that may just be because I've tried more of them than any others), it always fails: Error: Unable to read sequence '' Died: water terminated: Bad value for '-asequence' with -auto defined water exited with status 1... or in the /var/www/html/EMBOSS/runs/ error log, Error: Unable to read sequence '' Died: water terminated: Bad value for '-asequence' with -auto defined water exited with status 1... It doesn't seem to matter if it's sequence data pasted in, or uploaded from a file. Some apps work fine, so I'm guessing it's not a fundamental problem like permissions on a temp directory or something. Are you able to point me at where to start looking? Thanks, Andrew Animal Genetics and Genomics, PIRVic Attwood 475 Mickleham Road, Attwood, 3049 ph +61 3 92174342 mob 0413 009 761 ---------------- There are 10 kinds of people...those who understand binary and those who don't.