From mathog at mendel.bio.caltech.edu Tue Apr 2 11:39:48 2002
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Tue, 02 Apr 2002 08:39:48 -0800
Subject: primer3 and qual scores
Message-ID:

The 0.9 version of primer3 from

http://www-genome.wi.mit.edu/genome_software/other/primer3.html

comes with a CGI script that puts up a web interface which drives primer3_core. That web interface provides a slew of options for including qual values in a section labeled "Sequence Quality". The primer3 program in EMBOSS seems not to have these options. Is there some technical reason why this functionality wasn't included, or is it just one of those (many) things that have yet to percolate to the top of the "to do" list?

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From vvrajarao at yahoo.com Wed Apr 3 09:35:44 2002
From: vvrajarao at yahoo.com (V V Raja Rao)
Date: Wed, 3 Apr 2002 06:35:44 -0800 (PST)
Subject: tandem repeats
Message-ID: <20020403143544.79253.qmail@web11107.mail.yahoo.com>

Hi,

I would like to know the algorithm used by the tandem repeat finder program in the EMBOSS package. Can someone mail me the details?

Thanks in advance,
Raja.

From gwilliam at hgmp.mrc.ac.uk Wed Apr 3 10:56:04 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Wed, 03 Apr 2002 16:56:04 +0100
Subject: primer3 and qual scores
References:
Message-ID: <3CAB2614.9A8B28A@hgmp.mrc.ac.uk>

As you say, "a slew of options"!

I didn't include the quality values as there was pressure from the GUI community to minimise the number of options. I, myself, have never used the quality options.

They could be added in, but there are already rather a lot of options for this program; do we need more? Which ones do you consider the most useful, and why?
Gary

David Mathog wrote:
>
> The 0.9 version of primer3 from
>
> http://www-genome.wi.mit.edu/genome_software/other/primer3.html
>
> comes with a cgi script that puts up a web interface which
> drives primer3_core. That web interface provides a slew
> of options for including qual values in a section labeled
> "Sequence Quality". The primer3 program in EMBOSS seems not to
> have these options. Is there some technical reason why this
> functionality wasn't included or is it just one of those (many) things
> that have yet to percolate to the top of the "to do" list?
>
> Thanks,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech

--
Gary Williams  Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk  http://www.hgmp.mrc.ac.uk/
Bioinformatics, MRC HGMP Resource Centre, Hinxton, Cambridge, CB10 1SB, UK

From foisys at mac.com Thu Apr 4 14:02:47 2002
From: foisys at mac.com (Sylvain Foisy)
Date: Thu, 4 Apr 2002 14:02:47 -0500
Subject: Compiling EMBOSS for Jemboss use in MacOS X
Message-ID: <8DA5625E-47FE-11D6-ABCB-0003936297DA@mac.com>

Hi,

Thanks for supporting OS X in EMBOSS. I got a clean compile and make, and it is working OK. I would like to try Jemboss and I have one (I guess, pretty stupid) question: what Java location should I specify in the configure command on Mac OS X? There are a lot of different places in OS X that might be good... Any hints?

Sylvain

++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Sylvain Foisy, Ph. D.
Manager
BIONEQ - Le Reseau quebecois de bioinformatique
Genome-Quebec
Tel.: (514) 343-6111 ext. 5188
E-mail: foisys at medcn.umontreal.ca
++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From ame at esbs.u-strasbg.fr Mon Apr 8 05:13:07 2002
From: ame at esbs.u-strasbg.fr (Jean-Christophe Ame)
Date: Mon, 8 Apr 2002 11:13:07 +0200
Subject: Blast database
Message-ID:

Hello,

I have a few BLAST-formatted databases that I use with BLAST, and I would like to share them with EMBOSS. How can I do that? How should I set up my .embossrc file?

Any answer would be of great help.

Thank you.

Jean-Christophe
________________________
Jean-Christophe Amé, PhD
U.P.R. 9003 du CNRS - Cancérogénèse et Mutagénèse Moléculaire et Structurale
École Supérieure de Biotechnologie de Strasbourg
Pôle API
Boulevard Sébastien-Brant
67400 Illkirch
France
tel.: 03 90 24 47 05
Fax.: 03 90 24 46 86

From cquijano at iib.uam.es Tue Apr 9 04:51:30 2002
From: cquijano at iib.uam.es (Carlos Quijano)
Date: Tue, 09 Apr 2002 10:51:30 +0200
Subject: Where is protml (Phylip) ?
Message-ID: <3CB2AB92.5040308@iib.uam.es>

Hello,

Some people asked me about "activating" the protml application from the PHYLIP package. ;-)

We use EMBOSS with EMBASSY, and it seems that protml is documented but not compiled or installed. Even looking for it under ./src, it is not present for some reason.

I don't know if the cause is that protml (something like dnaml but for protein trees) was written in Pascal instead of C. In the PHYLIP package (not EMBASSY's), protml.pas is present among the other source files. And I have seen that it comes with the MOLPHY package too, but in C, I guess.

Does anyone have an idea how to put a compiled protml.pas (or MOLPHY's protml) into EMBOSS and make it accessible from the front-end application in use (for me, w2h)?

I know this solution is perhaps not the best one, or perhaps a little paranoid, because it's possible to use PIE as a web interface for PHYLIP, MOLPHY's protml and puzzle.
I am only looking for an easy way to compile protml and make it part of the EMBOSS or EMBOSS/w2h application set.

Thank you for your time.

From frank at bioss.ac.uk Tue Apr 9 06:04:18 2002
From: frank at bioss.ac.uk (Frank Wright)
Date: Tue, 09 Apr 2002 11:04:18 +0100
Subject: Where is protml (Phylip) ?
References: <3CB2AB92.5040308@iib.uam.es>
Message-ID: <3CB2BCA2.F34D3DEC@bioss.ac.uk>

PHYLIP 3.5 includes PROTML (from the MOLPHY package) because v3.5 does not have its own protein maximum likelihood program. However, PHYLIP 3.6 (about to go to a beta release) has a new program, PROML, which is a PHYLIP protein maximum likelihood program. PROML has additional features compared with PROTML.

I suggest waiting for PHYLIP 3.6 to be released and for EMBOSS/EMBASSY to be adapted to access the v3.6 programs. In the meantime, PHYLIP 3.6 (alpha version, but pretty stable) is available from http://evolution.genetics.washington.edu/phylip.html.

Best Wishes,

Frank
--
Frank Wright
Biomathematics and Statistics Scotland, SCRI, DUNDEE DD2 5DA, Scotland
frank at bioss.sari.ac.uk

From Guoneng.Zhong at med.nyu.edu Tue Apr 9 13:37:42 2002
From: Guoneng.Zhong at med.nyu.edu (Guoneng Zhong)
Date: Tue, 9 Apr 2002 13:37:42 -0400
Subject: problem running
Message-ID: <7EDDC060-4BE0-11D6-A32B-0050E41E5C1B@med.nyu.edu>

Hi,

I followed the instructions and installed EMBOSS on Tru64 UNIX. I ran the test:

wossname -auto | more

and it worked (at least no weird errors). But here are two problems:

1. Running jemboss gave me this:

Error: failed /usr/opt/java131/jre/lib/alpha/fast/libjvm.so, because dlopen: cannot load /usr/opt/java131/jre/lib/alpha/fast/libjvm.so

2. Running abiviewer gave me this:

Reads ABI file and display the trace
Output sequence [outfile.fasta]:
Graph type [x11]:
PLPLOT_LIB="/usr/local/emboss/lib"
Cannot open library file: plstnd5.fnt
Please set PLPLOT_LIB to the plplot/lib directory under emboss
*** PLPLOT ERROR ***
Unable to open font file
Program aborted

Any hint would help. Thanks!
Guoneng

From David.Bauer at SCHERING.DE Wed Apr 10 01:47:02 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Wed, 10 Apr 2002 07:47:02 +0200
Subject: Antwort: problem running
Message-ID:

Hi,

Answer for question 2: PLPLOT_LIB must point to a directory containing the .fnt files. If you do a standard installation, this is "/usr/local/share/EMBOSS". Alternatively, you can use the location where you unpacked the EMBOSS tar file; in that case it is /.../EMBOSS-..../plplot/lib.

So if you use csh or tcsh you should have

setenv PLPLOT_LIB /usr/local/share/EMBOSS

in your .cshrc.

Hope this helps.

Ciao,
David.

From john.walshaw at bbsrc.ac.uk Wed Apr 10 06:44:13 2002
From: john.walshaw at bbsrc.ac.uk (john walshaw (JIC))
Date: Wed, 10 Apr 2002 11:44:13 +0100
Subject: dbifasta/seqret and ncbi-format fasta headers
Message-ID:

I have a question about NCBI-type sequence headers in FASTA-format files. I'm using EMBOSS 2.3.1.

The ncbi format for the dbifasta program is described variously as:

ncbi : >blah|...[|ACC]|ID

and

>...[|accno]|id ...

in the EMBOSS admin guide and by 'tfm dbifasta'.
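Read literally, that pattern says: take the first whitespace-delimited field, split it on '|', and use the last subfield as the ID and the one before it (if present) as the accession. A minimal Python sketch of that literal reading follows. `parse_ncbi_header` is a hypothetical name, and this is an interpretation of the documented pattern, not dbifasta's actual parsing code; the results reported below suggest that EMBOSS 2.3.1 behaves differently.

```python
def parse_ncbi_header(header):
    """Return (accession, id) from an NCBI-style FASTA header line,
    following the literal reading of '>...[|accno]|id'."""
    if not header.startswith(">"):
        raise ValueError("not a FASTA header line")
    fields = header[1:].split()          # whitespace-delimited fields
    if not fields:
        raise ValueError("empty FASTA header")
    subfields = fields[0].split("|")     # '|'-delimited subfields of field 1
    entry_id = subfields[-1]             # last subfield -> ID
    # The accession is optional in the pattern ('[|accno]').
    accession = subfields[-2] if len(subfields) >= 2 else ""
    return accession, entry_id


print(parse_ncbi_header(">gi|15375403|dbj|AB039926.1|AB039926 Arabidopsis"))
# ('AB039926.1', 'AB039926')
```

Under this reading, the third example header below, >gi|15383574|gb|AV540904.2|XXXXXXX YYYYYYY, would give accession AV540904.2 and ID XXXXXXX.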
From these I assumed that, within the first of the whitespace-delimited 'fields', the last two '|'-delimited subfields would be treated by dbifasta as the accession number and ID respectively:

>gi|15375403|dbj|AB039926.1|AB039926 Arabidopsis ...blah...
                 ^^^^^^^^^^ ^^^^^^^^
                 accno      id

- but this doesn't work, because seqret reports in this case that AB039926 is not in my database (which I indexed with dbifasta using idformat 'ncbi', and specified with method: emblcd, format: fasta and the necessary dir: and indexdir: fields).

But this sequence works (I can get it with seqret):

>gi|15383574|gb|AV540904.2|AV540904 AV540904 Arabidopsis thaliana roots ...blah
                ^^^^^^^^^^ ^^^^^^^^

- because the second whitespace-delimited field is present AND identical to the previous subfield. The second field is not simply being used as the accno, because, for example, this entry:

>gi|15383574|gb|AV540904.2|XXXXXXX YYYYYYY

cannot be returned by seqret either as XXXXXXX or as YYYYYYY (or by any means other than requesting all sequences in the DB).

Am I doing something stupid? I've looked into this problem a lot, I can provide debug files for seqret and dbifasta, and I'm sure my db specification in emboss.default is correct. For the sequences which fail, seqret reads the correct header line, but then thinks that accno=''. And seqret always returns the ID as 'gi' (even for sequences which can be fetched normally). All of the correct accnos (e.g. AV540904.2) appear in the acnum.trg file.

Regards,

John Walshaw
John Innes Centre, Norwich Research Park, Colney, Norwich NR4 7UH, UK.
+44(0)1603 450827

From valenzi at iigb.na.cnr.it Wed Apr 10 12:49:07 2002
From: valenzi at iigb.na.cnr.it (Marco Valenzi)
Date: Wed, 10 Apr 2002 18:49:07 +0200
Subject: About prima
Message-ID:

Hi,

I'm Marco Valenzi from Naples. Why has prima been removed from the current EMBOSS-2.3.1 package?
Many thanks
--
Marco Valenzi
Institute of Genetics and Biophysics "Adriano Buzzati Traverso"
via Guglielmo Marconi, 10
80125 Naples
ITALY
E-mail valenzi at iigbna.iigb.na.cnr.it
tel. +39 081 7257303

From cox at mshri.on.ca Sat Apr 13 20:46:07 2002
From: cox at mshri.on.ca (Brian Cox)
Date: Sat, 13 Apr 2002 17:46:07 -0700
Subject: jemboss
Message-ID: <000801c1e34d$c79a9c30$bc66f8ce@rossdell>

Hello,

I noticed that there is a Jemboss for Windows on your FTP site. I downloaded it, but it requires a login and password. How do I obtain these? Is this a standalone version, like the one for Unix?

Thank you

Brian Cox

From quenzer at informatik.uni-tuebingen.de Tue Apr 16 08:08:56 2002
From: quenzer at informatik.uni-tuebingen.de (Muriel Quenzer)
Date: Tue, 16 Apr 2002 14:08:56 +0200
Subject: Size of EMBOSS 2.3.1 for Solaris 2.8
Message-ID: <200204161208.g3GC8u721130@tauri.informatik.uni-tuebingen.de>

Hi,

I am new to EMBOSS and have to install the latest EMBOSS version, 2.3.1, on Sun Solaris 2.8. I have been told that EMBOSS version 2.0.1 needed approximately 15 MB of disk space, whereas the EMBOSS 2.3.1 that I compiled needs approximately 520 (!) MB. Is this correct?

Thanks for any suggestions.
Muriel
--
Kind regards,
Muriel Quenzer
----------
Universität Tübingen
Wilhelm-Schickard-Institut für Informatik
Zentrum für Bioinformatik
Sand 13, 72076 Tübingen
Germany
Tel.: +49 (0)7071/29-70464
E-mail: quenzer at informatik.uni-tuebingen.de
GnuPG PUBLIC KEY on request
Key fingerprint = ADDF 1E38 773F 3D51 682E 1F50 D7CC 47E1 3AE8 E047

From charles at moulinette.dyndns.org Tue Apr 16 08:55:32 2002
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Tue, 16 Apr 2002 14:55:32 +0200
Subject: Pentium optimisation
Message-ID: <20020416125532.GA22253@moulinette.dyndns.org>

Hi,

I'm running EMBOSS on Debian GNU/Linux with a Pentium IV. Would I increase the speed of computations if I compiled it for i686 rather than i386 processors (or is that only useful for multimedia apps)?

Charles

From David.Bauer at SCHERING.DE Tue Apr 16 10:11:39 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Tue, 16 Apr 2002 16:11:39 +0200
Subject: Antwort: Size of EMBOSS 2.3.1 for Solaris 2.8
Message-ID:

Hi,

this is a little bit overestimated... I have EMBOSS on Solaris 2.7. The build tree is ~100 MB including the EMBASSY apps (they are about 15 MB including tar files). The installed version needs 21.4 MB for the binaries and 94 MB in share/EMBOSS (of which 75 MB are for PRINTS, PROSITE and REBASE).

Kind regards,
David.
From peacfrog at ptd.net Tue Apr 16 13:18:07 2002
From: peacfrog at ptd.net (Cynthia Martino)
Date: Tue, 16 Apr 2002 13:18:07 -0400
Subject: Prima
Message-ID: <001301c1e56a$ad3c7880$2d7ce518@msns.str.ptd.net>

Hi there!

In the past I was able to access a number of programs, including prima, via the EMBnet Norway site. However, now when I click on the program name within the program list, all I get is a help page describing the qualifiers.

Do you know if this and the other programs formerly available (via www.no.embnet.org/Programs/) can still be accessed online?

Any feedback is greatly appreciated.

From letondal at pasteur.fr Tue Apr 16 17:41:15 2002
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 16 Apr 2002 23:41:15 +0200
Subject: Prima
In-Reply-To: Your message of "Tue, 16 Apr 2002 13:18:07 EDT." <001301c1e56a$ad3c7880$2d7ce518@msns.str.ptd.net>
Message-ID: <200204162141.g3GLfF6O453283@electre.pasteur.fr>

"Cynthia Martino" wrote:
> Hi there!
>
> In the past I was able to access a number of programs, including prima,
> via the EMBnet Norway site. However, now when I click on the program
> name within the program list all I get is a help page describing the
> qualifiers.
>
> Do you know if this and the other programs formerly available (via
> www.no.embnet.org/Programs/) can still be accessed online?
>
> Any feedback is greatly appreciated.
>

Hi,

If you want to use a similar interface, you can go to:

http://bioweb.pasteur.fr/seqanal/interfaces/prima.html

(see http://bioweb.pasteur.fr/intro-uk.html for all EMBOSS programs)

There are, however, other EMBOSS interfaces that you can use; people from EMBOSS will tell you about them more accurately than I can.

I don't know what is happening on the no.embnet.org server. We are late in distributing the Pise XML files for the latest EMBOSS version, so that could explain it.

--
Catherine Letondal -- Pasteur Institute Computing Center

From David.Bauer at SCHERING.DE Wed Apr 17 01:34:38 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Wed, 17 Apr 2002 07:34:38 +0200
Subject: Antwort: Prima
Message-ID:

Hi,

the EMBOSS programs are also available at http://ubigcg.mdh4.mdc-berlin.de:8080/

By the way, I have updated the system to EMBOSS version 2.3.1.

Ciao,
David.

From mathog at mendel.bio.caltech.edu Wed Apr 17 15:05:47 2002
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Wed, 17 Apr 2002 12:05:47 -0700
Subject: network USA
Message-ID:

Today I finally realized that NCBI's PmFetch CGI

http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html

can be used to retrieve data via gi using a "simple" URL like this:

wget -O dmwhite.genbank \
'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'

Unfortunately it seems not to be able to retrieve by either accession number or locus name - I'm still waiting to hear if there is some other NCBI interface for that.
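The PmFetch URL above is an ordinary CGI query string, so it can be assembled from its four parameters. A minimal Python sketch follows; `pmfetch_url` is a hypothetical helper, the endpoint and parameter names are copied from the URL above, and this 2002-era NCBI service may no longer respond.

```python
from urllib.parse import urlencode

# Endpoint copied from the wget example above.
PMFETCH = "http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi"


def pmfetch_url(gi, db="Nucleotide", report="gen", mode="text"):
    """Build a PmFetch retrieval URL for one gi number."""
    # urlencode keeps the dict's insertion order and percent-encodes values.
    query = urlencode({"db": db, "id": gi, "report": report, "mode": mode})
    return PMFETCH + "?" + query


print(pmfetch_url(10873))
# http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text
```

Passing report="fasta" gives the FASTA variant of the same query.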
Which is a long way of coming around to considering how a USA could be used to retrieve remote sequences without exposing end users to truly hideous constructs. The semantics of accessing arbitrary network databases are probably much too complex to include in the USA, but one can imagine burying these details under new types of "database" entries in the defaults file. Something like this:

DB gigenbank [
  method: remoteurlbyid
  comment: "GENBANK at NCBI by gi number"
  format: -
  dir: -
  file: -
  type: N
  #optional
  target: 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=$ID&report=gen&mode=text'
  filter: 'wget -O - $target'
]

This would then allow something like this to work transparently:

% seqret gigenbank:10873

The USA already has the "program" option, but I think in a situation like this it's much too complex to actually use. How many users are going to be able to successfully negotiate this:

% seqret -sequence=fasta::"wget -O - 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=fasta&mode=text' |" -filter

Anyway, what I'm proposing is that the database definition be extended slightly to allow remote access methods. This would be particularly helpful for people running EMBOSS on their own PCs or Macs, who tend not to have large local databases installed.
Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From dmartin at bioinformatics.msiwtb.dundee.ac.uk Wed Apr 17 15:32:40 2002
From: dmartin at bioinformatics.msiwtb.dundee.ac.uk (David Martin)
Date: Wed, 17 Apr 2002 20:32:40 +0100 (BST)
Subject: network USA
In-Reply-To:
Message-ID:

On Wed, 17 Apr 2002, David Mathog wrote:

> Today I finally realized that NCBI's PmFetch cgi
>
> http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html
>
> can be used to retrieve data via gi using a "simple" URL like this:
>
> wget -O dmwhite.genbank \
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'
>
> Unfortunately it seems not to be able to retrieve by either accession
> number or locus name - I'm still waiting to hear if there is some other
> NCBI interface for that.
>
> Which is a long way of coming around to considering how a USA could be
> used to retrieve remote sequences without exposing end users to truly
> hideous constructs. The semantics of accessing arbitrary network
> databases are probably much too complex to include in the USA but one
> can imagine burying these details under new types of "database" entries
> in the defaults file. Something like this:

Try 'method: url' and use %s instead of $ID. It has been there since EMBOSS 0.0.4 to allow retrieval from remote SRS servers (or indeed any arbitrary web address where the ID can be passed in the URL).

It is described around pages 19-20 of the admin guide.

If it doesn't work, then let the guilty parties know.
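In other words, the remote URL is written once in the database definition, with %s standing for the requested entry name. The substitution step alone can be sketched in Python as follows. `expand_url_template` is a hypothetical name, this is not how EMBOSS implements `method: url`, and percent-encoding the ID is an assumption of the sketch rather than documented EMBOSS behaviour.

```python
from urllib.parse import quote


def expand_url_template(template, entry_id):
    """Substitute an entry identifier into a URL template containing '%s'."""
    if "%s" not in template:
        raise ValueError("template has no %s placeholder")
    # Percent-encode the ID so characters such as spaces or '&' cannot
    # break the query string (an assumption of this sketch).
    return template.replace("%s", quote(entry_id, safe=""))


# Template built from the PmFetch URL quoted above, with %s in place of $ID.
GI_TEMPLATE = ("http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi"
               "?db=Nucleotide&id=%s&report=gen&mode=text")
print(expand_url_template(GI_TEMPLATE, "10873"))
# http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text
```

Using a plain string replacement rather than Python's `%` operator means any other literal `%` in a URL template is left untouched.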
..d

>
> DB gigenbank [
> method: remoteurlbyid
> comment: "GENBANK at NCBI by gi number"
> format: -
> dir: -
> file: -
> type: N
> #optional
> target:
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=$ID&report=gen&mode=text'
> filter: 'wget -O - $target'
> ]
>
> Which would then allow something like this to work transparently:
>
> % seqret gigenbank:10873
>
> The USA already has the "program" option but I think in a situation like
> this it's much too complex to actually use. How many users are going to
> be able to successfully negotiate this:
>
> % seqret -sequence=fasta::"wget -O -
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=fasta&mode=text'
> |" -filter
>
> Anyway, what I'm proposing is that the database definition be extended
> slightly to allow remote access methods. This would be particularly
> helpful for people running EMBOSS on their own PCs or Macs, who tend not
> to have large local databases installed.
>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech

----------------------------------
David Martin PhD
Bioinformatics Scientific Officer
Wellcome Trust Biocentre, Dundee
----------------------------------

From David.Bauer at SCHERING.DE Thu Apr 18 01:50:47 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Thu, 18 Apr 2002 07:50:47 +0200
Subject: Antwort: network USA
Message-ID:

Hi,

For this I use a workaround: database entries with method app, which call scripts that use two URLs at NCBI. In emboss.default I have two entries, for nucleotide and protein, which call an external script.
############
DB ncbin [
  type: N
  method: app
  format: genbank
  app: "/bips/bin/emboss/ncbi_fetchn %s"
  comment: "NCBI GenBank Nucleotide"
]

DB ncbip [
  type: P
  method: app
  format: genbank
  app: "/bips/bin/emboss/ncbi_fetchp %s"
  comment: "NCBI GenBank Protein"
]
##################

The script is unfortunately not very portable, as it uses a modified Perl LWP module to work with our firewall. The basic idea is to use

"http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi?db=nucleotide&term=$id&dopt=genbank"

or, for protein,

"http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi?db=protein&term=$id&dopt=genpept"

to get the gid, where $id is the locus or accession. If there are different gids for one accession, then all of them are returned. What I have observed is that the gid of the most recent version is returned first (but I'm not sure if this is always true). So I just grab the first gid returned and then use the same URL you already mentioned:

"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=text&db=nucleotide&uid=$gid&dopt=GenBank"

("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=text&db=protein&uid=$gid&dopt=genpept")

to return the whole entry.

So the user can just use ncbin: with seqret or entret to get the sequence or the GenBank entry.

Hope this helps,

Ciao,
David.

From inagy at abc.hu Thu Apr 18 02:50:57 2002
From: inagy at abc.hu (inagy at abc.hu)
Date: Thu, 18 Apr 2002 08:50:57 +0200 (CEST)
Subject: primer3 and -format_output
In-Reply-To: <3CAB2614.9A8B28A@hgmp.mrc.ac.uk>
Message-ID:

The Whitehead primer3 program has a "-format_output" option that writes formatted output of the input sequences and highlights the primer binding sites, etc.

Would it be possible to include this option in the EMBOSS version too? It is sometimes very useful.
Istvan

From gwilliam at hgmp.mrc.ac.uk Fri Apr 19 04:15:14 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 19 Apr 2002 09:15:14 +0100
Subject: primer3 and -format_output
References:
Message-ID: <3CBFD212.C29B713@hgmp.mrc.ac.uk>

I'll add this to the list of suggestions for primer3.

Gary

inagy at abc.hu wrote:
>
> The Whitehead primer3 program has a "-format_output" option that writes a
> formatted output of the input sequences and highlights the primer binding
> sites, etc.
>
> Would it be possible to include this option into the EMBOSS version too ?
> It is sometimes very useful.
>
> Istvan

--
Gary Williams  Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk  http://www.hgmp.mrc.ac.uk/
Bioinformatics, MRC HGMP Resource Centre, Hinxton, Cambridge, CB10 1SB, UK

From grimplet at ensam.inra.fr Fri Apr 19 05:27:58 2002
From: grimplet at ensam.inra.fr (jérôme Grimplet)
Date: Fri, 19 Apr 2002 11:27:58 +0200
Subject: primer3_core
Message-ID: <200204191129.g3JBTOe14633@ensam.inra.fr>

I believe that somebody already asked this question a few weeks ago, but how can I get the primer3_core program? I can't find it in the Whitehead Institute package.
Thanks,

Jerome
--
Jérôme Grimplet
Laboratoire de Biochimie Métabolique et Technologie
UMR Sciences Pour l'Oenologie
2, Place Viala
34060 Montpellier Cedex 01
Tel: 33(0)4.99.61.27.56
Fax: 33(0)4.99.61.28.57
grimplet at ensam.inra.fr

From gwilliam at hgmp.mrc.ac.uk Fri Apr 19 06:51:46 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 19 Apr 2002 11:51:46 +0100
Subject: primer3_core
References: <200204191129.g3JBTOe14633@ensam.inra.fr>
Message-ID: <3CBFF6C2.DCAF6007@hgmp.mrc.ac.uk>

From the eprimer3 documentation:

Notes

The Whitehead Institute program that is run by this program is available from:
http://www-genome.wi.mit.edu/genome_software/other/primer3.html
(Then see the link 'Get release 0.9')

The version that is run by this program is 3.0.9, currently available from:
http://www-genome.wi.mit.edu/ftp/distribution/software/primer3_0_9_test.tar.gz

jérôme Grimplet wrote:
>
> I believe that somebody already asked this question a few weeks ago, but how
> can I get the primer3_core program. I don't find it in the Whitehead Institute
> package.
>
> Thanks,
>
> Jerome
> --
> Jérôme Grimplet
> Laboratoire de Biochimie Métabolique et Technologie
> UMR Sciences Pour l'Oenologie
> 2, Place Viala
> 34060 Montpellier Cedex 01
> Tel: 33(0)4.99.61.27.56
> Fax: 33(0)4.99.61.28.57
> grimplet at ensam.inra.fr

--
Gary Williams  Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk  http://www.hgmp.mrc.ac.uk/
Bioinformatics, MRC HGMP Resource Centre, Hinxton, Cambridge, CB10 1SB, UK

From peter.rice at uk.lionbioscience.com Fri Apr 19 08:49:07 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 19 Apr 2002 13:49:07 +0100
Subject: network USA
References:
Message-ID: <3CC01243.9190F3FB@uk.lionbioscience.com>

David Mathog wrote:
>
> Today I finally realized that NCBI's PmFetch cgi
>
> http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html
>
> can be used to retrieve data via gi using a "simple" URL like this:
>
> wget -O dmwhite.genbank \
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'
>
> Unfortunately it seems not to be able to retrieve by either accession
> number or locus name - I'm still waiting to hear if there is some other
> NCBI interface for that.

Oops. This will be fixed in 2.4.0 (Alan and I thought it already was, but it needed one extra line of code in the latest CVS version).

The problem is that, for the URL access method, EMBOSS checks the ID and accession of the returned entry, and of course neither matches '10873'.

Which leads on to a new access method for 2.4.0. We are adding an "srswww" access method that generates the SRS URLs and can query by ID, accession, seqversion (or GI), keyword, organism or description.

We can add at least some of these for Entrez (new access method "entrez") if we can gather up enough URLs. Are there any Entrez experts who can suggest URLs to retrieve entries from Entrez (preferably as plain text, but HTML will do) with queries on each of these fields?
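The failure described here can be illustrated with a minimal Python sketch of an ID/accession check. The entry values are hypothetical placeholders, the case-insensitive comparison is an assumption, and the optional `gi` argument only shows one conceivable way such a check could also accept a GI; it is not the actual 2.4.0 change.

```python
def entry_matches_query(query, entry_id, accessions, gi=None):
    """Return True if the query names this entry by ID or accession
    (compared case-insensitively, an assumption of this sketch)."""
    candidates = {entry_id.upper()}
    candidates.update(acc.upper() for acc in accessions)
    # Hypothetical extension: also accept the entry's GI number.
    if gi is not None:
        candidates.add(str(gi))
    return query.upper() in candidates


# Hypothetical entry: the gi query matches neither its ID nor its accession.
print(entry_matches_query("10873", "HYPO_ID", ["HYPO0001"]))            # False
print(entry_matches_query("10873", "HYPO_ID", ["HYPO0001"], gi=10873))  # True
```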
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com  +44 1223 224723

From sghk100 at sghms.ac.uk Fri Apr 19 11:19:15 2002
From: sghk100 at sghms.ac.uk (David Winterbourne)
Date: Fri, 19 Apr 2002 16:19:15 +0100
Subject: network USA
Message-ID: <3CC03573.4A76C643@sghms.ac.uk>

David Martin wrote:
> ...
>
> Try 'method: url' and using %s instead of $ID. It has been there from
> EMBOSS 0.0.4 to allow retrieval from remote srs servers (or indeed any
> arbitrary web address where the id can be passed in the url).

I have been having a problem accessing the SWISS-PROT database using this method. I set up URL-based access to the SWISSPROT and EMBL databases at EBI as follows:

DB sw [
  type: P
  method: url
  format: swiss
  url: "http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[SWISSPROT:%s]"

DB embl [
  type: N
  method: url
  format: embl
  url: "http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[EMBL-id:%s]"

For an EMBL entry, both using the URL in a browser and specifying the database in EMBOSS retrieve the data. However, the equivalent for SWISS-PROT works in a browser but not in EMBOSS: it just causes the system to hang. Is there a simple solution?

Regards

David
--
David Winterbourne
Department of Surgery
St. George's Hospital Medical School, London SW17 0RE, England
Tel: 020 8725 5581  Fax: 020 8725 3594

From jfreeman at variagenics.com Mon Apr 22 17:21:51 2002
From: jfreeman at variagenics.com (James Freeman)
Date: Mon, 22 Apr 2002 17:21:51 -0400
Subject: Jemboss and Resin
Message-ID: <3CC47EEF.635B531E@variagenics.com>

To whom it may concern,

Does anyone know of any problems when using Resin (http://www.caucho.com/) as a substitute for Tomcat when running Jemboss?

Thanks for your assistance,

Jim Freeman
Senior Scientist
Variagenics, Inc.
From tchiang at bioinfo.sickkids.on.ca Tue Apr 23 10:01:40 2002
From: tchiang at bioinfo.sickkids.on.ca (Ted Chiang)
Date: Tue, 23 Apr 2002 10:01:40 -0400 (EDT)
Subject: EMBOSS:complex
Message-ID:

Just a quick question. In the 2.3.1 release, the EMBOSS program 'complex' is not fully implemented. Will this program be in the next release, or have we missed something in the installation?

-Ted

=====================================
Ted Chiang
Bioinformatics Supercomputing Centre
Hospital for Sick Children, Toronto
ext. 7028
tchiang at bioinfo.sickkids.on.ca

From peter.rice at uk.lionbioscience.com Tue Apr 23 10:30:23 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 23 Apr 2002 15:30:23 +0100
Subject: EMBOSS:complex
References:
Message-ID: <3CC56FFF.3C3AFB6A@uk.lionbioscience.com>

Ted Chiang wrote:
>
> Just a quick question. In the 2.3.1 release, the EMBOSS program 'complex'
> is not fully implemented. Will this program be in the next release or
> have we missed something in the installation?

complex is a strange application (with Italian command-line options) that the authors have not been maintaining. We have moved it into the "make check" set of obsolete/testing applications.

If you do need it, the "make check" command will build it, but you then need to copy the binary and other files to the install directories by hand.

One unfortunate side effect of moving applications to "make check" (or removing them) is that the old binaries stay in the install directory. Perhaps we can find a way to clean them up ... need to think about that a little.
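Until something like that exists, leftovers can at least be detected by comparing the install directory with the applications shipped by the new release. A minimal Python sketch, assuming you already have that list of application names (how to produce it is left open); both function names are hypothetical, and the sketch only reports candidates, since a bin directory may also contain files that are not EMBOSS applications.

```python
from pathlib import Path


def stale_names(installed, current_apps):
    """Names present in the install directory but not in the current release."""
    return sorted(set(installed) - set(current_apps))


def stale_binaries(bin_dir, current_apps):
    """Report (never delete) leftover regular files in bin_dir."""
    installed = (p.name for p in Path(bin_dir).iterdir() if p.is_file())
    return stale_names(installed, current_apps)


print(stale_names(["seqret", "complex", "prima"], ["seqret", "wossname"]))
# ['complex', 'prima']
```

Keeping the set comparison separate from the directory scan makes the logic easy to check without touching a real installation.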
regards, Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From letondal at pasteur.fr Tue Apr 23 11:06:50 2002 From: letondal at pasteur.fr (Catherine Letondal) Date: Tue, 23 Apr 2002 17:06:50 +0200 Subject: Pise/EMBOSS 2.3.1 Message-ID: <200204231506.g3NF6oop249416@electre.pasteur.fr> Hi, I have more or less adapted the new ACD types and attributes to Pise. (ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/emboss_xml_files-2.3.1.tar.gz) The main changes were for align types, where I could associate a "pipetype" to chain to other programs taking an alignment as input. BTW, I found "MSF" and "fasta" for the -aformat parameter; are there others? The main problem I had was with string parameters for specifying a path, with a ./ default value and a corresponding extn parameter. On a Web interface you cannot really allow path and filename manipulation, and you must give the user a means to upload or input data (unless users log in to their home directory, which I'm aware is the choice made by other Web interfaces for EMBOSS). That's why I had to discard the following programs: alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign. I have tried to "guess" that such a parameter is a path from its name, with the next parameter being the extension; if it is in the input or output section, I can then decide whether it is an InFile, Sequence, or Results Pise parameter. But some parameters are in neither the input nor the output section, and it's not safe to associate two parameters just because they follow each other.
A solution could be to have an explicit type and the extension as an attribute: path: algpath [ parameter: "Y" prompt: "Location and extension of alignment files for input" default: "./" extn: ".align" ] instead of (siggen.acd): section: input [ info: "input Section" type: page ] string: algpath [ parameter: "Y" prompt: "Location of alignment files for input" default: "./" ] string: algextn [ parameter: "Y" prompt: "Extension of alignment files for input" default: ".align" ] What do you think? Thanks a lot in advance, -- Catherine Letondal -- Pasteur Institute Computing Center From peter.rice at uk.lionbioscience.com Tue Apr 23 12:03:27 2002 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Tue, 23 Apr 2002 17:03:27 +0100 Subject: Pise/EMBOSS 2.3.1 References: <200204231506.g3NF6oop249416@electre.pasteur.fr> Message-ID: <3CC585CF.D664FB68@uk.lionbioscience.com> Hi Catherine, > Main changes were for align types, where I could associate a "pipetype" to > chain to other programs taking alignment as input. BTW, I found "MSF" and "fasta" for the > -aformat parameter - are there others? There are more, but not sequence formats. We should add them to "entrails" output. We can easily add more sequence formats. Can you suggest some? The full list is (from ajax/ajalign.c) : markx0*, markx1*, markx2*, markx3*, markx10* (from the FASTA package) multiple pair* simple score srs, srspair* (for simple parsing in SRS in case the others change) trace (for debugging only) Those with '*' are for pairwise alignments only. > The main problem I had was with string parameters for specifying a path, with ./ default > value and having corresponding extn parameters. > > That's why I had to discard the following programs: > alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign. 
> > A solution could be to have an explicit type and the extension as an attribute: > > path: algpath [ > parameter: "Y" > prompt: "Location and extension of alignment files for input" > default: "./" > extn: ".align" > ] > > What do you think? The path and extension options are a terrible 'hack' to avoid having "*" on the command line for those programs. This is really just infile with a wildcard filename (which works already). We can make a new ACD type "inwild" which works like infile but with some small differences. The prompt would be "Input file(s)". The ajAcdGetInwild function will return an AjPFile. We can add functions to report the filenames as a string list (the first file is already open, the others are in a list so it is a little tricky to make the list in an application). There should be an attribute "inextension:align" (for example) and a default value of "*". If the user specifies "*.align" the inextension will be ignored. Associated qualifiers: -inextension align -indirectory /home/user/somewhere (defaults to current directory) For consistency, we can add the same qualifiers for infile. With "out" instead of "in" we can use the same qualifiers for outfile and a new ACD type "outwild" (outwild can open a new output file, using a new ajFileNextOut call, but the application needs to give the base name each time). All easy to implement. One problem: inwild does not work well as a parameter because it has to be given as "*" on the command line. Same problem for "outwild". I am sure users can be educated. The programs that use the path/extension options do not define them as parameters anyway. Their ACD files need some corrections. Comments?
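A minimal sketch of how the proposed "inwild" value might be resolved, written in Python rather than the C of the EMBOSS library. It assumes the semantics described in the proposal: default value "*", -inextension applied only to that bare default, an explicit pattern such as "*.align" used as given, and -indirectory defaulting to the current directory. `resolve_inwild` and `open_first` are hypothetical names and are not the ajAcdGetInwild API.

```python
import glob
import os


def resolve_inwild(pattern="*", inextension=None, indirectory="."):
    """Expand an "inwild"-style value into a sorted list of file paths.

    Hypothetical sketch of the proposed rules:
    - the default value is "*";
    - -inextension is applied only when the value is the bare default "*";
    - an explicit pattern such as "*.align" is used as given, so
      -inextension is ignored;
    - -indirectory defaults to the current directory.
    """
    if pattern == "*" and inextension:
        # "align" and ".align" both become "*.align"
        pattern = "*." + inextension.lstrip(".")
    return sorted(glob.glob(os.path.join(indirectory, pattern)))


def open_first(paths):
    """Open the first matching file and return it with the remaining names,
    mirroring "the first file is already open, the others are in a list"."""
    if not paths:
        raise FileNotFoundError("no input files matched")
    return open(paths[0]), paths[1:]
```

Sorting the matches makes "the first file" deterministic, since glob itself returns names in arbitrary directory order.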
regards, Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From letondal at pasteur.fr Tue Apr 23 13:38:43 2002 From: letondal at pasteur.fr (Catherine Letondal) Date: Tue, 23 Apr 2002 19:38:43 +0200 Subject: Pise/EMBOSS 2.3.1 In-Reply-To: Your message of "Tue, 23 Apr 2002 17:03:27 BST." <3CC585CF.D664FB68@uk.lionbioscience.com> Message-ID: <200204231738.g3NHchop186093@electre.pasteur.fr> Peter Rice wrote: > Hi Catherine, Hi Peter, > > > Main changes were for align types, where I could associate a "pipetype" to > > chain to other programs taking alignment as input. BTW, I found "MSF" and "fasta" for the > > -aformat parameter - are there others? > > There are more, but not sequence formats. We should add them to "entrails" > output. > > We can easily add more sequence formats. Can you suggest some? I only asked so that I would know which ones to put on the Web interface. (There are also clustalw and Phylip, but they are not needed in Pise, since there are format converters.) > Those with '*' are for pairwise alignments only. > > > The main problem I had was with string parameters for specifying a path, with ./ default > > value and having corresponding extn parameters. > > > > That's why I had to discard the following programs: > > alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign. > > > > A solution could be to have an explicit type and the extension as an attribute: > > > > path: algpath [ > > parameter: "Y" > > prompt: "Location and extension of alignment files for input" > > default: "./" > > extn: ".align" > > ] > > > > What do you think? > > The path and extension options are a terrible 'hack' to avoid having "*" > on the command line for those programs. I have the same problem even with just ./, since '/' cannot be allowed in a string parameter on a Web server.
Another problem I have worked around is the '*' in programs such as extractseqfeat: in the Web form it is replaced by 'all', which the CGI then converts back to '*'. > > This is really just infile with a wild card filename (which works already). > > We can make a new ACD type "inwild" which works like infile but with some > small differences. The prompt would be "Input file(s)". The ajAcdGetInwild > function will return an AjPFile. We can add functions to report the > filenames as a string list (the first file is already open, the others are > in a list so it is a little tricky to make the list in an application). > > There should be an attribute "inextension:align" (for example) and a > default value of "*". If the user specifies "*.align" the inextension will > be ignored. > > Associated qualifiers: > > -inextension align > -indirectory /home/user/somewhere (defaults to current directory) > > For consistency, we can add the same qualifiers for infile. > > With "out" instead of "in" we can use the same qualifiers for outfile and a > new ACD type "outwild" (outwild can open a new output file, using a new > ajFileNextOut call, but the application needs to give the base name each > time). > > All easy to implement. > > One problem ... inwild does not work well as a parameter because it has to > be given as "*" on the command line. Same problem for "outwild". I am sure > users can be educated. > > The programs that use the path/extension options do not define them as > parameters anyway. Their ACD files need some corrections. > > Comments? As long as there is a way to detect this kind of parameter (in order to replace it with a simple textarea or file upload on a Web interface), I think it's very useful! So the type would be inwild or outwild? PS: Regarding Pise/EMBOSS, I forgot to mention that not only output alignments are "connected" by Pise menus. I have also added this feature for sequence, seqall, seqout, etc. Thanks for the quick answer!
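The two server-side rules Catherine describes (no '/' in a string parameter, and the form value 'all' standing in for the '*' wildcard) can be sketched as a small translation step in a CGI. This is a minimal sketch only; `form_value_to_arg` is a hypothetical name and is not taken from Pise.

```python
def form_value_to_arg(value: str, wildcard_alias: str = "all") -> str:
    """Translate one submitted web-form value into a command-line value.

    Hypothetical sketch of the two rules described in the thread:
    - a value containing '/' is rejected, so users cannot name
      arbitrary paths on the server;
    - the friendly alias 'all' is turned back into the '*' wildcard.
    """
    value = value.strip()
    if "/" in value:
        raise ValueError(f"'/' is not allowed in this parameter: {value!r}")
    if value.lower() == wildcard_alias:
        return "*"
    return value
```

Doing the substitution on the server keeps '*' out of the form entirely, so the user never needs to know the wildcard syntax.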
-- Catherine Letondal -- Pasteur Institute Computing Center From mathog at mendel.bio.caltech.edu Tue Apr 23 14:25:34 2002 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Tue, 23 Apr 2002 11:25:34 -0700 Subject: Pise/EMBOSS 2.3.1 Message-ID: > One problem ... inwild does not work well as a parameter because it has to > be given as "*" on the command line. Same problem for "outwild". I am sure > users can be educated. Sure they can. That's why thousands of hours are being spent wrapping GUIs around programs so that users don't have to (horrors) log on or (gasp) type a command line. Back to the subject at hand. (And this is stream of consciousness, so please bear with me.) I think that maybe for purposes of interface design there should be predefined methods to break out (all) the pieces/options of a USA. (Perhaps even reduced to Perl and C modules in the EMBOSS distribution so that W2H/Pise/etc. don't need to be rewritten for each EMBOSS release.) Consider something like this: program -sequence=genbank:\* That never translates well directly into a GUI because the end user has to know the full USA syntax and, especially, that "*" is a wildcard. And often enough, they don't understand these concepts. And even if they do, they may not be able to use certain aspects of that syntax on a given server (for instance, files and paths, or particular databases). So it falls to the GUI to put some glue in between the USA and the user. The two main web interfaces for EMBOSS take opposite paths in this regard. Pise hides the USA completely and W2H allows the user to manipulate USAs through a tool. In W2H you generally have to build the USAs ahead of time through a separate window and store them in a list, then you select one or more USAs from the list when you run the program. (USAs can also generally be typed into the slots within the program, if the user knows what he/she is doing.)
In Pise you can enter a database USA like "genbank:dmwhite" (though it isn't called a USA), but entering "genbank:*" doesn't work (for instance, with compseq). Pise isn't really designed to handle wildcards because it will try to extract everything the USA refers to from the database, save it in a file and then run the program on that file. This is consistent with its typical "upload data for each program" design. Pise only ever runs programs with the "simple file" sort of USA. So perhaps it's just as well that "genbank:*" doesn't work at the moment!!! To get around this wildcard limitation Pise would have to be reworked enough to recognize wildcards (and USAs in general) and slot them onto the command line without first extracting the sequences they refer to. Anyway, what's really going on with -sequence is that all of the components of the USA are encoded into a single string for use on the command line and then are broken out again into separate pieces later within the program. For a GUI, _all_ these pieces need to be broken out explicitly and displayed to the user (who isn't expected to know anything about USAs or to have to learn anything about them or the interface). Something like this: format: default database:genbank x ALL_ENTRIES o BY_STRING entrystring: (blank) From that the GUI/CGI can easily enough format a USA for the final command line. But imagine using such an interface. It's great if you just run an occasional program but not so wonderful when you're doing something complex. How do you cut and paste the state of 4 (or more) USA variables from one page (=program) to another? That suggests to me that a GUI which always has fully broken-out USA options will probably end up being pretty awkward to use. However, since the purpose of the GUI is essentially to reformat (implicit) information in the USA, why not make that an explicit option, and let it reformat in both directions?
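The two-way conversion between a USA string and broken-out GUI fields can be sketched as a pair of inverse functions. This covers only a simplified subset of the USA syntax (an optional `format::` prefix, a database name, and either an entry string or `*` for all entries); list files, ranges and multiple-entry forms are deliberately left out, and `UsaFields`, `build_usa` and `parse_usa` are hypothetical names, not part of EMBOSS, W2H or Pise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsaFields:
    """Broken-out pieces of a simplified database USA."""
    database: str
    entry: str = ""                   # used when all_entries is False
    all_entries: bool = False         # True means the "*" wildcard
    seq_format: Optional[str] = None  # None means "default"


def build_usa(fields: UsaFields) -> str:
    """Fields -> USA string, e.g. 'genbank:*' or 'genbank:dmwhite'."""
    entry = "*" if fields.all_entries else fields.entry
    if not fields.database or not entry:
        raise ValueError("need a database and an entry (or ALL_ENTRIES)")
    prefix = f"{fields.seq_format}::" if fields.seq_format else ""
    return f"{prefix}{fields.database}:{entry}"


def parse_usa(usa: str) -> UsaFields:
    """USA string -> fields; the reverse of build_usa for this subset."""
    seq_format = None
    if "::" in usa:
        seq_format, usa = usa.split("::", 1)
    database, sep, entry = usa.partition(":")
    if not sep or not database or not entry:
        raise ValueError(f"not a database:entry USA: {usa!r}")
    if entry == "*":
        return UsaFields(database, all_entries=True, seq_format=seq_format or None)
    return UsaFields(database, entry=entry, seq_format=seq_format or None)
```

Because the two functions invert each other for this subset, a formatter window could accept a pasted USA, display its fields for tweaking, and hand a rebuilt string back to the program's USA slot.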
Then the "standard" USA GUI interface starts to look something like this: [test usa] [from USA] [to USA] [use this] [abort] <------(buttons) USA:[ genbank:* ] format: default database:genbank <-------- (pull down list) x ALL_ENTRIES o BY_STRING entrystring: (blank) Actually it's a LOT more complicated than that, considering that it also encompasses listfiles, multiple entries (foo.msf{one,two, three}) etc. If the user has a USA, he/she can just plug it into the GUI. Or they can plug it in, translate it, and tweak it. Or if they don't have a USA to start with, they can use this page to build one. And this USA constructor page can enable/disable the USA fields as appropriate for each site and/or program. (No file access? Can't accept list files or wildcards? Then don't show those USA options. Make the database list from the output of showdb.) The final problem is that exposing the guts of the USA will take up a lot of screen space and complicate the program interfaces. That's less of a problem, though, if the GUI for any given EMBOSS program just provides a slot to plug in a USA and some way to pop up the USA formatter window to fill in that slot (through JavaScript or whatever). The popped-up formatter could then drop the final USA back into the program's USA slot. (Sort of like what W2H does, but into the program's slot rather than the working list.) Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From jison at hgmp.mrc.ac.uk Wed Apr 24 04:40:25 2002 From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison) Date: Wed, 24 Apr 2002 09:40:25 +0100 Subject: Pise/EMBOSS 2.3.1 References: <200204231506.g3NF6oop249416@electre.pasteur.fr> <3CC585CF.D664FB68@uk.lionbioscience.com> Message-ID: <3CC66F79.D7AB51BA@hgmp.mrc.ac.uk> > The programs that use the path/extension options do not define them as > parameters anyway. Their ACD files need some corrections. > > Comments?
They are parameters in new versions of the protein structure apps (alignwrap, contacts, seqnr, seqsort, siggen, dichet, scopalign etc.) but I haven't committed them yet; hopefully within a month. J. From charles at moulinette.dyndns.org Fri Apr 26 17:30:56 2002 From: charles at moulinette.dyndns.org (Charles Plessy) Date: Fri, 26 Apr 2002 23:30:56 +0200 Subject: seqret doesn't count more than 99? Message-ID: <20020426213056.GA26616@moulinette.dyndns.org> Hello, I downloaded the draft of the fugu genome (fasta format, 300Mb) and renamed the headers using the following command line: sed < fugu_02_04_28.fasta 's/>/>gnl|fugu|/' > fugu_newheaders_02_04_28.fasta I'm not able to index a blast database correctly if the header doesn't look "ncbi compliant" and formatdb hasn't been run with the -o flag. I created the blast database and indexed it with dbiblast. The reason for not formatting the fasta file itself is to save space. This also keeps the BLAST hit names in sync with the names that I can give to seqret. Here is the problem: charles at pc-1035-a:~$ seqret fugu:Scaffold_7 Reads and writes (returns) sequences Output sequence [scaffold_7.fasta]: ==> OK! charles at pc-1035-a:~$ seqret fugu:Scaffold_99 Reads and writes (returns) sequences Output sequence [scaffold_99.fasta]: ==> OK! charles at pc-1035-a:~$ seqret fugu:Scaffold_100 Reads and writes (returns) sequences Error: Unable to read sequence 'fugu:Scaffold_100' ==> KO :(( seqret can't fetch sequences named like Scaffold_xyz, where xyz >= 100. Is it due to the length of the name? I am puzzled by that problem... I can send you more info if you like. Charles From simon.andrews at bbsrc.ac.uk Mon Apr 29 05:49:29 2002 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Mon, 29 Apr 2002 10:49:29 +0100 Subject: seqret doesn't count more than 99?
Message-ID: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk> > -----Original Message----- > From: Charles Plessy [mailto:charles at moulinette.dyndns.org] > Sent: 26 April 2002 22:31 > To: emboss at hgmp.mrc.ac.uk > Subject: seqret doesn't count more than 99? > > > Hello, > > I downloaded the draft of the fugu genome [snip] > I'm not able to index a blast database correctly if the header doesn't > look "ncbi compliant" and formatdb hasn't been run with the -o flag. I'd not tried this before, but we see the same thing here. Running dbiblast on the indexed raw fugu data seems to work, but seqret fails on the subsequent retrieval. The problem seems to be in the accession numbers entered into the .trg file created by dbiblast. Running seqret with debug on shows the following (edited) entries: ------------------------------------ USA to test: 'fugu_blasttest:Scaffold_1' [snip] found dbname fugu_blasttest wild query 'Scaffold_1' 'Scaffold_1' '' database type: 'N' format 'ncbi' use access method 'blast' Matched seqAccess[12] 'blast' seqAccessBlast type 1 [snip] seqCdIdxSearch (entry 'Scaffold_1') [several more of these] idx test 59 'Scaffold_100' -1 (+/- 39) idx test 49 'Contig_83248' 1 (+/- 18) idx test 54 'Contig_9376' 1 (+/- 8) idx test 56 'Scaffold_10' -1 (+/- 3) idx test 55 'Scaffold_1' -1 (+/- 0) ajFileNewIn '/data/Fugu/EMBOSS/TEST/acnum.trg' ajNamResolve of '/data/Fugu/EMBOSS/TEST/acnum.trg' seqCdReadHeader file /data/Fugu/EMBOSS/TEST/acnum.trg FileSize: 416800 NRecords: 20825 recsize: 20 idsize: 10 seqCdFileOpen '/data/Fugu/EMBOSS/TEST/acnum.trg' NRecords: 20825 RecSize: 20 ajFileNewIn '/data/Fugu/EMBOSS/TEST/acnum.hit' ajNamResolve of '/data/Fugu/EMBOSS/TEST/acnum.hit' seqCdReadHeader file /data/Fugu/EMBOSS/TEST/acnum.hit FileSize: 83600 NRecords: 20825 recsize: 4 idsize: -6 seqCdFileOpen '/data/Fugu/EMBOSS/TEST/acnum.hit' NRecords: 20825 RecSize: 4 seqCdTrgSearch 'Scaffold_1' recSize: 20 trg test 10412 'ZZ0010413' -1 (+/- 20825) trg test
5206 'ZZ0005207' -1 (+/- 10412) trg test 2603 'ZZ0002604' -1 (+/- 5206) trg test 1301 'ZZ0001302' -1 (+/- 2603) trg test 650 'ZZ0000651' -1 (+/- 1301) trg test 325 'ZZ0000326' -1 (+/- 650) trg test 162 'ZZ0000163' -1 (+/- 325) trg test 81 'ZZ0000082' -1 (+/- 162) trg test 40 'ZZ0000041' -1 (+/- 81) trg test 20 'ZZ0000021' -1 (+/- 40) trg test 10 'ZZ0000011' -1 (+/- 20) trg test 5 'ZZ0000006' -1 (+/- 10) trg test 2 'ZZ0000003' -1 (+/- 5) trg test 1 'ZZ0000002' -1 (+/- 2) trg test 0 'ZZ0000001' -1 (+/- 1) 'SCAFFOLD_1' not found found in .trg ------------------------------------------------ After this it cleans up after itself and exits. Looking through the .trg file, all the accessions are of the form ZZ0000XXX. This format of accession doesn't appear anywhere in my original data, so I don't know where it's coming from (presumably either dbiblast or formatdb?). The inability to reconcile Scaffold_1 with the ZZ00... accessions seems to be what causes seqret to fail. > I created the blast database and indexed it with dbiblast. The reason > for not formatting the fasta file itself is to save space. This also > enforces a synchronicity between the blast hits names and the names > that I can give to seqret. The way we did this was to use the fasta files for both. I take the point about the space saving, but the assembled data wasn't all that big. If you use the raw fasta files for both formatdb (without header parsing) and dbifasta, then you can still use the same accession codes as reference in both. > Here is now the prbolem : > > charles at pc-1035-a:~$ seqret fugu:Scaffold_100 > Reads and writes (returns) sequences > Error: Unable to read sequence 'fugu:Scaffold_100' > > ==> KO :(( > > seqret can't fetch sequences names like Scaffold_xzy, where > xyz >= 100. > > Is it due to the length of the name? It might be worth running seqret with the -debug flag on and looking at the messages at the end of seqret.dbg.
This usually gives some more useful information about what is going wrong in these cases. I'd be interested in seeing a resolution to this as well... TTFN Simon. From peter.rice at uk.lionbioscience.com Mon Apr 29 06:41:30 2002 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Mon, 29 Apr 2002 11:41:30 +0100 Subject: seqret doesn't count more than 99? References: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk> Message-ID: <3CCD235A.D418202E@uk.lionbioscience.com> "simon andrews (BI)" wrote: > The problem seems to be in the accession numbers entered into the .trg file > created by dbiblast. Running seqret with debug on, shows the following > (edited) entries: The command line: seqret fugu_blasttest:Scaffold_1 searches both the entryname and acnum indices. The ZZ accession numbers are invented by dbiblast so that there is something in the acnum index (they should disappear in 2.4.0, where we handle empty indices gracefully). The problem will be in the entryname index, where it seems Scaffold_1 was found, but not accepted. I am waiting for the example file from Charles, but I suspect this is a problem already fixed in the code for 2.4.0. regards, Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From charles at moulinette.dyndns.org Mon Apr 29 09:22:07 2002 From: charles at moulinette.dyndns.org (Charles Plessy) Date: Mon, 29 Apr 2002 15:22:07 +0200 Subject: seqret doesn't count more than 99? In-Reply-To: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk> References: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk> Message-ID: <20020429132207.GD1818@moulinette.dyndns.org> > I'd not tried this before, but we see the same thing here. Running dbiblast > on the indexed raw fugu data seems to work, but seqret fails on the > subsequent retrieval.
I have to NCBIze the headers in order to make it work: I use either lcl|entryname or gnl|dbname|entryname. > > I created the blast database and indexed it with dbiblast. The reason > > for not formatting the fasta file itself is to save space. This also > > enforces a synchronicity between the blast hits names and the names > > that I can give to seqret. > > The way we did this was to use the fasta files for both. I take the point > about the space saving, but the assembled data wasn't all that big. If you > use the raw fasta files for both formatdb (without header parsing) and > dbifasta, then you can still use the same accession codes as reference in > both. You are right; I was also motivated to do something 'aesthetic' ;) > It might be worth running seqret with the -debug flag on and looking at the > messages at the end of seqret.dbg. This usually gives some more useful > information about what is going wrong in these cases. I can send the debug info on request; the files (one success, one failure) are not that big (70k), but I think netiquette advises against sending them to the whole list. Charles From ame at esbs.u-strasbg.fr Mon Apr 8 09:13:07 2002 From: ame at esbs.u-strasbg.fr (Jean-Christophe Ame) Date: Mon, 8 Apr 2002 11:13:07 +0200 Subject: Blast database Message-ID: Hello, I have a few BLAST-formatted databases that I use with BLAST, and I would like to share them with EMBOSS. How can I do that? How should I set up my .embossrc file? Any answer would be of great help. Thank you. Jean-Christophe ________________________ Jean-Christophe Amé, PhD U.P.R.
9003 du CNRS - Cancérogénèse et Mutagénèse Moléculaire et Structurale École Supérieure de Biotechnologie de Strasbourg Pôle API Boulevard Sébastien-Brant 67400 Illkirch France tel.: 03 90 24 47 05 Fax.: 03 90 24 46 86 From cquijano at iib.uam.es Tue Apr 9 08:51:30 2002 From: cquijano at iib.uam.es (Carlos Quijano) Date: Tue, 09 Apr 2002 10:51:30 +0200 Subject: Where is protml (Phylip) ? Message-ID: <3CB2AB92.5040308@iib.uam.es> Hello, Some people asked me about "activating" the protml application from the PHYLIP package. ;-) We use EMBOSS with EMBASSY, and it seems that protml is documented but not compiled or installed. Even under ./src it is missing for some reason. I don't know whether the cause is that protml (something like dnaml but for protein trees) was written in Pascal instead of C. In the PHYLIP package itself (not EMBASSY's), protml.pas is present among the other source files. I have also seen that it comes with the MOLPHY package, but in C, I guess. Does anyone have an idea how to put a compiled protml.pas (or MOLPHY's protml) into EMBOSS and make it accessible from the front-end application in use (for me, W2H)? I know this solution is perhaps not the best one, or perhaps a little paranoid, because it's possible to use a PIE-like web interface for PHYLIP, MOLPHY's protml and puzzle. I am only looking for an easy way to compile protml and make it part of the EMBOSS or EMBOSS/W2H application set. Thank you for your time. From frank at bioss.ac.uk Tue Apr 9 10:04:18 2002 From: frank at bioss.ac.uk (Frank Wright) Date: Tue, 09 Apr 2002 11:04:18 +0100 Subject: Where is protml (Phylip) ? References: <3CB2AB92.5040308@iib.uam.es> Message-ID: <3CB2BCA2.F34D3DEC@bioss.ac.uk> PHYLIP 3.5 includes PROTML (from the MOLPHY package) because v 3.5 does not have a protein maximum likelihood program. However, PHYLIP 3.6 (about to go to a beta release) has a new program, PROML, which is a PHYLIP protein maximum likelihood program. PROML has additional features compared with PROTML. I suggest waiting for PHYLIP 3.6 to be released and for EMBOSS/EMBASSY to be adapted to access the v 3.6 programs.
PROML has additional features to PROTML. I suggest waiting for PHYLIP 3.6 to be released and EMBOSS/EMBASSY is adapted to access v 3.6 programs. In the meantime, PHYLIP 3.6 (alpha version, but pretty stable) is available from http://evolution.genetics.washington.edu/phylip.html. Best Wishes, Frank -- Frank Wright Biomathematics and Statistics Scotland, SCRI, DUNDEE DD2 5DA, Scotland frank at bioss.sari.ac.uk From Guoneng.Zhong at med.nyu.edu Tue Apr 9 17:37:42 2002 From: Guoneng.Zhong at med.nyu.edu (Guoneng Zhong) Date: Tue, 9 Apr 2002 13:37:42 -0400 Subject: problem running Message-ID: <7EDDC060-4BE0-11D6-A32B-0050E41E5C1B@med.nyu.edu> Hi, I followed the instructions and installed emboss on a Tru64 unix. I ran the test: wossname -auto | more and it worked (at least no weird errors). But here are two problems: 1. Running jemboss gave me this: Error: failed /usr/opt/java131/jre/lib/alpha/fast/libjvm.so, because dlopen: cannot load /usr/opt/java131/jre/lib/alpha/fast/libjvm.so 2. Running abiviewer gave me this: Reads ABI file and display the trace Output sequence [outfile.fasta]: Graph type [x11]: PLPLOT_LIB="/usr/local/emboss/lib" Cannot open library file: plstnd5.fnt Please set PLPLOT_LIB to the plplot/lib directory under emboss *** PLPLOT ERROR *** Unable to open font file Program aborted Any hint would help. Thanks! Guoneng From David.Bauer at SCHERING.DE Wed Apr 10 05:47:02 2002 From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE) Date: Wed, 10 Apr 2002 07:47:02 +0200 Subject: Antwort: problem running Message-ID: Hi, Answer for question 2: The PLPLOT_LIB must point to a directory with the .fnt files. If you do a standard installation, then this is "/usr/local/share/EMBOSS". Alternatively you can use the location where you unpacked the emboss tar file. In that case it is the /.../EMBOSS-..../plplot/lib. So if you use csh or tcsh you should have a setenv PLPLOT_LIB /usr/local/share/EMBOSS in your .cshrc. Hope this helps. Ciao, David. 
Hi, I followed the instructions and installed emboss on a Tru64 unix. I ran the test: wossname -auto | more and it worked (at least no weird errors). But here are two problems: 1. Running jemboss gave me this: Error: failed /usr/opt/java131/jre/lib/alpha/fast/libjvm.so, because dlopen: cannot load /usr/opt/java131/jre/lib/alpha/fast/libjvm.so 2. Running abiviewer gave me this: Reads ABI file and display the trace Output sequence [outfile.fasta]: Graph type [x11]: PLPLOT_LIB="/usr/local/emboss/lib" Cannot open library file: plstnd5.fnt Please set PLPLOT_LIB to the plplot/lib directory under emboss *** PLPLOT ERROR *** Unable to open font file Program aborted Any hint would help. Thanks! Guoneng From john.walshaw at bbsrc.ac.uk Wed Apr 10 10:44:13 2002 From: john.walshaw at bbsrc.ac.uk (john walshaw (JIC)) Date: Wed, 10 Apr 2002 11:44:13 +0100 Subject: dbifasta/seqret and ncbi-format fasta headers Message-ID: I have a question about ncbi-type sequence headers in fasta-format files. I'm using EMBOSS 2.3.1. The ncbi format for the dbifasta program is described variously as: ncbi : >blah|...[|ACC]|ID and >...[|accno]|id ... in the EMBOSS admin guide and by 'tfm dbifasta'. From these I assumed that within the first of the whitespace-delimited 'fields', the last two '|'-delimited subfields will be treated by dbifasta as the accession number and ID respectively: >gi|15375403|dbj|AB039926.1|AB039926 Arabidopsis ...blah... ^^^^^^^^^^ ^^^^^^^^ accno id - but this doesn't work, as seqret reports in this case that AB039926 is not in my database (which I indexed with dbifasta using idformat 'ncbi', and specified with method: emblcd format:fasta & the necessary dir: and indexdir: fields). But this sequence works (I can get it with seqret) - >gi|15383574|gb|AV540904.2|AV540904 AV540904 Arabidopsis thaliana roots ...blah ^^^^^^^^ ^^^^^^^^ -because the second whitespace-delimited field is present AND identical to the previous subfield.
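To make John's reading of the documented `>...[|accno]|id` format concrete, the sketch below splits a header that way: only the first whitespace-delimited field is used, its last '|'-delimited subfield is taken as the ID, and the subfield before it as the accession number. It illustrates the expected parse only; as his report shows, dbifasta/seqret 2.3.1 did not behave this way, and `parse_ncbi_header` and `HeaderIds` are hypothetical names.

```python
from typing import NamedTuple, Optional


class HeaderIds(NamedTuple):
    entry_id: str
    accession: Optional[str]


def parse_ncbi_header(line: str) -> HeaderIds:
    """Split a FASTA header following the reading '>...[|accno]|id'.

    Only the first whitespace-delimited field is used. Its last
    '|'-delimited subfield is the ID; the subfield before it, if there
    is one, is the accession number.
    """
    if not line.startswith(">"):
        raise ValueError("FASTA header lines start with '>'")
    fields = line[1:].split(maxsplit=1)
    if not fields:
        raise ValueError("empty FASTA header")
    subfields = fields[0].split("|")
    accession = subfields[-2] if len(subfields) >= 2 else None
    return HeaderIds(entry_id=subfields[-1], accession=accession)
```

Running each header of a file through a helper like this gives the accession/ID pairs one would expect to retrieve, which can then be compared with what seqret actually returns.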
The second field is not simply being used as the accno because, for example, this entry:

  >gi|15383574|gb|AV540904.2|XXXXXXX YYYYYYY

cannot be returned by seqret either as XXXXXXX or as YYYYYYY (or by any means other than requesting all sequences in the database).

Am I doing something stupid? I've looked into this problem a lot and can provide debug files for seqret and dbifasta, and I'm sure my database specification in emboss.default is correct. For the sequences which fail, seqret reads the correct header line but then thinks that accno=''. And seqret always returns the ID as 'gi' (even for sequences which can be fetched normally). All of the correct accnos (e.g. AV540904.2) appear in the acnum.trg file.

Regards,

John Walshaw
John Innes Centre, Norwich Research Park, Colney, Norwich NR4 7UH, UK.
+44(0)1603 450827

From valenzi at iigb.na.cnr.it Wed Apr 10 16:49:07 2002
From: valenzi at iigb.na.cnr.it (Marco Valenzi)
Date: Wed, 10 Apr 2002 18:49:07 +0200
Subject: About prima
Message-ID:

Hi, I'm Marco Valenzi from Naples.

Why has prima been removed from the current EMBOSS-2.3.1 package?

Many thanks
--
Marco Valenzi
Institute of Genetics and Biophysics "Adriano Buzzati Traverso"
via Guglielmo Marconi, 10
80125 Naples ITALY
E-mail valenzi at iigbna.iigb.na.cnr.it
tel. +39 081 7257303

From cox at mshri.on.ca Sun Apr 14 00:46:07 2002
From: cox at mshri.on.ca (Brian Cox)
Date: Sat, 13 Apr 2002 17:46:07 -0700
Subject: jemboss
Message-ID: <000801c1e34d$c79a9c30$bc66f8ce@rossdell>

Hello,

I noticed that there is a Jemboss for Windows on your FTP site. I downloaded it, but it requires a login and password. How do I obtain these? Is this a standalone version like the one for Unix?

Thank you
Brian Cox
From quenzer at informatik.uni-tuebingen.de Tue Apr 16 12:08:56 2002
From: quenzer at informatik.uni-tuebingen.de (Muriel Quenzer)
Date: Tue, 16 Apr 2002 14:08:56 +0200
Subject: Size of EMBOSS 2.3.1 for Solaris 2.8
Message-ID: <200204161208.g3GC8u721130@tauri.informatik.uni-tuebingen.de>

Hi,

I am new to EMBOSS and have to install the latest EMBOSS version, 2.3.1, for Sun Solaris 2.8. I have been told that EMBOSS version 2.0.1 needed approximately 15 MB of disk space, whereas the EMBOSS 2.3.1 that I compiled needs approximately 520 (!) MB. Is this correct?

Thanks for any suggestions.

Muriel
--
Kind regards,
Muriel Quenzer
----------
Universität Tübingen
Wilhelm-Schickard-Institut für Informatik
Zentrum für Bioinformatik
Sand 13, 72076 Tübingen
Germany
Tel.: +49 (0)7071/29-70464
E-mail: quenzer at informatik.uni-tuebingen.de
GnuPG PUBLIC KEY on request
Key fingerprint = ADDF 1E38 773F 3D51 682E 1F50 D7CC 47E1 3AE8 E047

From charles at moulinette.dyndns.org Tue Apr 16 12:55:32 2002
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Tue, 16 Apr 2002 14:55:32 +0200
Subject: Pentium optimisation
Message-ID: <20020416125532.GA22253@moulinette.dyndns.org>

Hi,

I'm running emboss on Debian GNU/Linux with a Pentium IV. Would computations be faster if I compiled it for i686 rather than i386 processors (or is that only useful for multimedia apps)?

Charles

From David.Bauer at SCHERING.DE Tue Apr 16 14:11:39 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Tue, 16 Apr 2002 16:11:39 +0200
Subject: Antwort: Size of EMBOSS 2.3.1 for Solaris 2.8
Message-ID:

Hi,

that is a bit of an overestimate. I have EMBOSS on Solaris 2.7. The build tree is ~100 MB including the EMBASSY apps (which are about 15 MB, including the tar files). The installed version needs 21.4 MB for the binaries and 94 MB in share/EMBOSS (of which 75 MB are for PRINTS, PROSITE and REBASE).

Kind regards,
David.
From peacfrog at ptd.net Tue Apr 16 17:18:07 2002
From: peacfrog at ptd.net (Cynthia Martino)
Date: Tue, 16 Apr 2002 13:18:07 -0400
Subject: Prima
Message-ID: <001301c1e56a$ad3c7880$2d7ce518@msns.str.ptd.net>

Hi there!

In the past I was able to access a number of programs, including prima, via the EMBnet Norway site. However, now when I click on the program name within the program list, all I get is a help page describing the qualifiers.

Do you know if this and the other programs formerly available (via www.no.embnet.org/Programs/) can still be accessed online?

Any feedback is greatly appreciated.

From letondal at pasteur.fr Tue Apr 16 21:41:15 2002
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 16 Apr 2002 23:41:15 +0200
Subject: Prima
In-Reply-To: Your message of "Tue, 16 Apr 2002 13:18:07 EDT." <001301c1e56a$ad3c7880$2d7ce518@msns.str.ptd.net>
Message-ID: <200204162141.g3GLfF6O453283@electre.pasteur.fr>

"Cynthia Martino" wrote:
> Hi there!
>
> In the past I was able to access a number of programs, including prima,
> via the EMBnet Norway site. However, now when I click on the program
> name within the program list all I get is a help page describing the
> qualifiers.
>
> Do you know if this and the other programs formerly available (via
> www.no.embnet.org/Programs/) can still be accessed online?
>
> Any feedback is greatly appreciated.

Hi,

If you want to use a similar interface, you can go to:
http://bioweb.pasteur.fr/seqanal/interfaces/prima.html
(see http://bioweb.pasteur.fr/intro-uk.html for all EMBOSS programs)

There are, however, other EMBOSS interfaces that you can use; the EMBOSS people can tell you about them more accurately than I can.

I don't know what is happening on the no.embnet.org server. We are late in distributing the XML Pise programs for the latest EMBOSS version, so that could explain it.

--
Catherine Letondal -- Pasteur Institute Computing Center

From David.Bauer at SCHERING.DE Wed Apr 17 05:34:38 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Wed, 17 Apr 2002 07:34:38 +0200
Subject: Antwort: Prima
Message-ID:

Hi,

the EMBOSS programs are also available at http://ubigcg.mdh4.mdc-berlin.de:8080/

By the way, I have updated the system to EMBOSS version 2.3.1.

Ciao,
David.
From mathog at mendel.bio.caltech.edu Wed Apr 17 19:05:47 2002
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Wed, 17 Apr 2002 12:05:47 -0700
Subject: network USA
Message-ID:

Today I finally realized that NCBI's PmFetch CGI

  http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html

can be used to retrieve data by gi using a "simple" URL like this:

  wget -O dmwhite.genbank \
    'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'

Unfortunately it seems unable to retrieve by either accession number or locus name; I'm still waiting to hear whether there is some other NCBI interface for that.

Which is a long way of coming around to considering how a USA could be used to retrieve remote sequences without exposing end users to truly hideous constructs. The semantics of accessing arbitrary network databases are probably much too complex to include in the USA, but one can imagine burying these details under new types of "database" entries in the defaults file. Something like this:

  DB gigenbank [
    method: remoteurlbyid
    comment: "GENBANK at NCBI by gi number"
    format: -
    dir: -
    file: -
    type: N
    #optional
    target: 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=$ID&report=gen&mode=text'
    filter: 'wget -O - $target'
  ]

Which would then allow something like this to work transparently:

  % seqret gigenbank:10873

The USA already has the "program" option, but I think in a situation like this it's much too complex to actually use. How many users are going to be able to successfully negotiate this:

  % seqret -sequence=fasta::"wget -O - 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=fasta&mode=text' |" -filter

Anyway, what I'm proposing is that the database definition be extended slightly to allow remote access methods. This would be particularly helpful for people running EMBOSS on their own PCs or Macs, who tend not to have large local databases installed.
Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From dmartin at bioinformatics.msiwtb.dundee.ac.uk Wed Apr 17 19:32:40 2002
From: dmartin at bioinformatics.msiwtb.dundee.ac.uk (David Martin)
Date: Wed, 17 Apr 2002 20:32:40 +0100 (BST)
Subject: network USA
In-Reply-To:
Message-ID:

On Wed, 17 Apr 2002, David Mathog wrote:

> Today I finally realized that the NCBi's PmFetch cgi
>
> http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html
>
> can be used to retrieve data via gi using a "simple" URL like this:
>
> wget -O dmwhite.genbank \
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'
>
> Unfortunately it seems not to be able to retrieve by either accession
> number or
> locus name - I'm still waiting to hear if there is some other NCBI
> interface for that.
>
> Which is a long way of coming around to considering how a USA could be
> used to retrieve remote sequences without exposing end users to truly
> hideous
> constructs. The semantics of accessing arbitrary network databases are
> probably much too complex to include in the USA but one can imagine
> burying
> these details under new types of "database" entries in the defaults
> file. Something like this:

Try 'method: url' and use %s instead of $ID. It has been there since EMBOSS 0.0.4 to allow retrieval from remote SRS servers (or indeed any arbitrary web address where the ID can be passed in the URL). See around pages 19-20 of the admin guide.

If it doesn't work, let the guilty parties know.
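David Martin's suggestion replaces the proposed $ID placeholder with %s in a 'method: url' database definition. As a rough illustration of what that substitution amounts to, the Python sketch below expands such a template for one entry name. It assumes a single %s that is replaced by the URL-escaped entry name; expand_url is a hypothetical helper, not EMBOSS's own code. Note also Peter Rice's later point in this thread that the url method in 2.3.1 checks the ID and accession of the returned entry, so a gi number such as 10873 will not match.

```python
from urllib.parse import quote

# URL template in the style suggested for 'method: url': %s marks where the
# requested entry name goes. The pmfetch URL is the one quoted in this thread.
PMFETCH_TEMPLATE = (
    "http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi"
    "?db=Nucleotide&id=%s&report=gen&mode=text"
)


def expand_url(template, entry):
    """Substitute a URL-escaped entry name for the single %s in template."""
    if template.count("%s") != 1:
        raise ValueError("template must contain exactly one %s")
    return template.replace("%s", quote(entry, safe=""))


print(expand_url(PMFETCH_TEMPLATE, "10873"))
```

str.replace is used rather than Python's % operator so that any other % characters already present in a URL (for example an escaped space, %20) are left untouched.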
..d

>
> DB gigenbank [
> method: remoteurlbyid
> comment: "GENBANK at NCBI by gi number"
> format: -
> dir: -
> file: -
> type: N
> #optional
> target:
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=$ID&report=gen&mode=text'
> filter: 'wget -O - $target'
> ]
>
> Which would then allow something like this to work transparently:
>
> % seqret gigenbank:10873
>
> The USA already has the "program" option but I think in a situation like
> this it's
> much too complex to actually use. How many users are going to be able
> to successfully negotiate this:
>
> % seqret -sequence=fasta::"wget -O -
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=fasta&mode=text'
> |" -filter
>
> Anyway, what I'm proposing is that the database definition be extended
> slightly
> to allow remote accesss methods. This would be particularly helpful for
> people
> running EMBOSS on their own PCs or Macs, who tend not to have large
> local databases installed.
>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
>

----------------------------------
David Martin PhD
Bioinformatics Scientific Officer
Wellcome Trust Biocentre, Dundee
----------------------------------

From David.Bauer at SCHERING.DE Thu Apr 18 05:50:47 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Thu, 18 Apr 2002 07:50:47 +0200
Subject: Antwort: network USA
Message-ID:

Hi,

For this I use a workaround: the 'app' method calls scripts that use two URLs at NCBI. In emboss.default I have two entries, for nucleotide and protein, each of which calls an external script:
############
DB ncbin [
  type: N
  method: app
  format: genbank
  app: "/bips/bin/emboss/ncbi_fetchn %s"
  comment: "NCBI GenBank Nucleotide"
]

DB ncbip [
  type: P
  method: app
  format: genbank
  app: "/bips/bin/emboss/ncbi_fetchp %s"
  comment: "NCBI GenBank Protein"
]
##################

Unfortunately, the script is not very portable, as it uses a modified Perl LWP module to work with our firewall. The basic idea is to use

  "http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi?db=nucleotide&term=$id&dopt=genbank"

or, for protein,

  "http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi?db=protein&term=$id&dopt=genpept"

to get the gid, where $id is a locus name or accession. If there are different gids for one accession, all of them are returned. I have observed that the gid of the most recent version is returned first (but I'm not sure this is always true). So I just grab the first gid returned and then use the same URL you already mentioned:

  "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=text&db=nucleotide&uid=$gid&dopt=GenBank"
  ("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=text&db=protein&uid=$gid&dopt=genpept")

to return the whole entry. The user can then just use ncbin: with seqret or entret to get the sequence or the GenBank entry.

Hope this helps,

Ciao,
David.

From inagy at abc.hu Thu Apr 18 06:50:57 2002
From: inagy at abc.hu (inagy at abc.hu)
Date: Thu, 18 Apr 2002 08:50:57 +0200 (CEST)
Subject: primer3 and -format_output
In-Reply-To: <3CAB2614.9A8B28A@hgmp.mrc.ac.uk>
Message-ID:

The Whitehead primer3 program has a "-format_output" option that writes formatted output of the input sequences and highlights the primer binding sites, etc.

Would it be possible to include this option in the EMBOSS version too? It is sometimes very useful.
Istvan

From gwilliam at hgmp.mrc.ac.uk Fri Apr 19 08:15:14 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 19 Apr 2002 09:15:14 +0100
Subject: primer3 and -format_output
References:
Message-ID: <3CBFD212.C29B713@hgmp.mrc.ac.uk>

I'll add this to the list of suggestions for primer3.

Gary

inagy at abc.hu wrote:
>
> The Whitehead primer3 program has a "-format_output" option that writes a
> formatted output of the input seqences and highlightes the primer binding sites,
> etc.
>
> Would it be possible to include this option into the EMBOSS version too ?
> It is sometimes very useful.
>
> Istvan

--
Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/
Bioinformatics, MRC HGMP Resource Centre, Hinxton, Cambridge, CB10 1SB, UK

From grimplet at ensam.inra.fr Fri Apr 19 09:27:58 2002
From: grimplet at ensam.inra.fr (jérôme Grimplet)
Date: Fri, 19 Apr 2002 11:27:58 +0200
Subject: primer3_core
Message-ID: <200204191129.g3JBTOe14633@ensam.inra.fr>

I believe somebody already asked this question a few weeks ago, but how can I get the primer3_core program? I can't find it in the Whitehead Institute package.
Thanks,

Jerome
--
Jérôme Grimplet
Laboratoire de Biochimie Métabolique et Technologie
UMR Sciences Pour l'Oenologie
2, Place Viala
34060 Montpellier Cedex 01
Tel: 33(0)4.99.61.27.56
Fax: 33(0)4.99.61.28.57
grimplet at ensam.inra.fr

From gwilliam at hgmp.mrc.ac.uk Fri Apr 19 10:51:46 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 19 Apr 2002 11:51:46 +0100
Subject: primer3_core
References: <200204191129.g3JBTOe14633@ensam.inra.fr>
Message-ID: <3CBFF6C2.DCAF6007@hgmp.mrc.ac.uk>

From the eprimer3 documentation:

  Notes

  The Whitehead Institute program that is run by this program is available from:
  http://www-genome.wi.mit.edu/genome_software/other/primer3.html
  (Then see the link 'Get release 0.9')

  The version that is run by this program is 3.0.9, currently available from:
  http://www-genome.wi.mit.edu/ftp/distribution/software/primer3_0_9_test.tar.gz

jérôme Grimplet wrote:
>
> I believe that somebody already put this question a few week ago, but how can
> I get the primer3_core programm. I don't find it in the Whitehead Institute
> package.
>
> Thanks,
>
> Jerome
> --
> Jérôme Grimplet
> Laboratoire de Biochimie Métabolique et Technologie
> UMR Sciences Pour l'Oenologie
> 2, Place Viala
> 34060 Montpellier Cedex 01
> Tel: 33(0)4.99.61.27.56
> Fax: 33(0)4.99.61.28.57
> grimplet at ensam.inra.fr

--
Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/
Bioinformatics, MRC HGMP Resource Centre, Hinxton, Cambridge, CB10 1SB, UK

From peter.rice at uk.lionbioscience.com Fri Apr 19 12:49:07 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 19 Apr 2002 13:49:07 +0100
Subject: network USA
References:
Message-ID: <3CC01243.9190F3FB@uk.lionbioscience.com>

David Mathog wrote:
>
> Today I finally realized that the NCBi's PmFetch cgi
>
> http://www.ncbi.nlm.nih.gov:80/entrez/utils/pmfetch_help.html
>
> can be used to retrieve data via gi using a "simple" URL like this:
>
> wget -O dmwhite.genbank \
> 'http://www.ncbi.nlm.nih.gov/entrez/utils/pmfetch.fcgi?db=Nucleotide&id=10873&report=gen&mode=text'
>
> Unfortunately it seems not to be able to retrieve by either accession
> number or
> locus name - I'm still waiting to hear if there is some other NCBI
> interface for that.

Oops. This will be fixed in 2.4.0 (Alan and I thought it already was, but it needed one extra line of code in the latest CVS version). The problem is that, for the URL access method, EMBOSS checks the ID and accession of the returned entry, and of course neither matches '10873'.

Which leads on to a new access method for 2.4.0. We are adding an "srswww" access method that generates the SRS URLs and can query by ID, accession, seqversion (or GI), keyword, organism or description. We can add at least some of these for Entrez (new access method "entrez") if we can gather up enough URLs. Are there any Entrez experts who can help with suggested URLs for retrieving entries from Entrez (preferably plain text, but HTML will do) with queries on each of these fields?
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From sghk100 at sghms.ac.uk Fri Apr 19 15:19:15 2002
From: sghk100 at sghms.ac.uk (David Winterbourne)
Date: Fri, 19 Apr 2002 16:19:15 +0100
Subject: network USA
Message-ID: <3CC03573.4A76C643@sghms.ac.uk>

David Martin wrote:
> ...
>
> Try 'method: url' and using %s instead of $ID. It has been there from
> EMBOSS 0.0.4 to
> allow retrieval from remote srs servers (or indeed any arbitrary web
> address where the id can be passed in the url).

I have been having a problem accessing the Swiss-Prot database using this method. I set up URL-based access to the SWISSPROT and EMBL databases at EBI as follows:

  DB sw [
    type: P
    method: url
    format: swiss
    url: "http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[SWISSPROT:%s]"

  DB embl [
    type: N
    method: url
    format: embl
    url: "http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[EMBL-id:%s]"

For an EMBL entry, the data is retrieved both when I use the URL in a browser and when I specify it in EMBOSS. However, the equivalent for Swiss-Prot works in a browser but not in EMBOSS: it just causes the system to hang. Is there a simple solution?

Regards
David
--
David Winterbourne
Department of Surgery
St. George's Hospital Medical School, London SW17 0RE, England
Tel: 020 8725 5581 Fax: 020 8725 3594

From jfreeman at variagenics.com Mon Apr 22 21:21:51 2002
From: jfreeman at variagenics.com (James Freeman)
Date: Mon, 22 Apr 2002 17:21:51 -0400
Subject: Jemboss and Resin
Message-ID: <3CC47EEF.635B531E@variagenics.com>

To whom it may concern,

Does anyone know of any problems when using Resin (http://www.caucho.com/) as a substitute for Tomcat when running Jemboss?

Thanks for your assistance,

Jim Freeman
Senior Scientist
Variagenics, Inc.
From tchiang at bioinfo.sickkids.on.ca Tue Apr 23 14:01:40 2002
From: tchiang at bioinfo.sickkids.on.ca (Ted Chiang)
Date: Tue, 23 Apr 2002 10:01:40 -0400 (EDT)
Subject: EMBOSS:complex
Message-ID:

Just a quick question. In the 2.3.1 release, the EMBOSS program 'complex' is not fully implemented. Will this program be in the next release, or have we missed something in the installation?

-Ted
=====================================
Ted Chiang
Bioinformatics Supercomputing Centre
Hospital for Sick Children, Toronto
ext. 7028
tchiang at bioinfo.sickkids.on.ca

From peter.rice at uk.lionbioscience.com Tue Apr 23 14:30:23 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 23 Apr 2002 15:30:23 +0100
Subject: EMBOSS:complex
References:
Message-ID: <3CC56FFF.3C3AFB6A@uk.lionbioscience.com>

Ted Chiang wrote:
>
> Just a quick question. In the 2.3.1 release, the EMBOSS program 'complex'
> is not fully implemented. Will this program be in the next release or
> have we missed something in the installation?

complex is a strange application (with Italian command-line options) that its authors have not been maintaining. We have moved it into the "make check" set of obsolete/testing applications. If you do need it, the "make check" command will build it, but you then need to copy the binary and other files to the install directories by hand.

One unfortunate side effect of moving applications to "make check" (or removing them) is that the old binaries will stay in the install directory. Perhaps we can find a way to clean them up ... we need to think about that a little.
regards,
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From letondal at pasteur.fr Tue Apr 23 15:06:50 2002
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 23 Apr 2002 17:06:50 +0200
Subject: Pise/EMBOSS 2.3.1
Message-ID: <200204231506.g3NF6oop249416@electre.pasteur.fr>

Hi,

I have more or less adapted the new ACD types and attributes to Pise.
(ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/emboss_xml_files-2.3.1.tar.gz)

The main changes were for align types, where I could associate a "pipetype" to chain to other programs taking an alignment as input. BTW, I found "MSF" and "fasta" for the -aformat parameter - are there others?

The main problem I had was with string parameters for specifying a path, with a ./ default value and corresponding extn parameters. On a Web interface you cannot really allow path and filename manipulation, and you must give the user a means to upload or input data (unless users have a login to their home directory, which I'm aware is the choice made by other Web interfaces for EMBOSS).

That's why I had to discard the following programs: alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign.

I have tried to "guess" that such a parameter is a path from its name, with the next parameter being the extension; if it is in the input or output section, I can then decide whether it's an InFile, Sequence or Results Pise parameter. But some parameters are in neither the input nor the output section, and it's not safe to associate two parameters just because they follow each other.
A solution could be to have an explicit type, with the extension as an attribute:

  path: algpath [
    parameter: "Y"
    prompt: "Location and extension of alignment files for input"
    default: "./"
    extn: ".align"
  ]

instead of (siggen.acd):

  section: input [
    info: "input Section"
    type: page
  ]

  string: algpath [
    parameter: "Y"
    prompt: "Location of alignment files for input"
    default: "./"
  ]

  string: algextn [
    parameter: "Y"
    prompt: "Extension of alignment files for input"
    default: ".align"
  ]

What do you think?

Thanks a lot in advance,

--
Catherine Letondal -- Pasteur Institute Computing Center

From peter.rice at uk.lionbioscience.com Tue Apr 23 16:03:27 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 23 Apr 2002 17:03:27 +0100
Subject: Pise/EMBOSS 2.3.1
References: <200204231506.g3NF6oop249416@electre.pasteur.fr>
Message-ID: <3CC585CF.D664FB68@uk.lionbioscience.com>

Hi Catherine,

> Main changes were for align types, where I could associate a "pipetype" to
> chain to other programs taking alignment as input. BTW, I found "MSF" and "fasta" for the
> -aformat parameter - are there others?

There are more, but they are not sequence formats. We should add them to the "entrails" output.

We can easily add more sequence formats. Can you suggest some?

The full list is (from ajax/ajalign.c):

  markx0*, markx1*, markx2*, markx3*, markx10* (from the FASTA package)
  multiple
  pair*
  simple
  score
  srs, srspair* (for simple parsing in SRS in case the others change)
  trace (for debugging only)

Those with '*' are for pairwise alignments only.

> The main problem I had was with string parameters for specifying a path, with ./ default
> value and having corresponding extn parameters.
>
> That's why I had to discard the following programs:
> alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign.
>
> A solution could be to have an explicit type and the extension as an attribute:
>
> path: algpath [
> parameter: "Y"
> prompt: "Location and extension of alignment files for input"
> default: "./"
> extn: ".align"
> ]
>
> What do you think?

The path and extension options are a terrible 'hack' to avoid having "*" on the command line for those programs.

This is really just infile with a wildcard filename (which works already).

We can make a new ACD type "inwild" which works like infile but with some small differences. The prompt would be "Input file(s)". The ajAcdGetInwild function will return an AjPFile. We can add functions to report the filenames as a string list (the first file is already open and the others are in a list, so it is a little tricky to make the list in an application).

There should be an attribute "inextension:align" (for example) and a default value of "*". If the user specifies "*.align", the inextension will be ignored.

Associated qualifiers:

  -inextension align
  -indirectory /home/user/somewhere (defaults to current directory)

For consistency, we can add the same qualifiers for infile.

With "out" instead of "in", we can use the same qualifiers for outfile and a new ACD type "outwild" (outwild can open a new output file, using a new ajFileNextOut call, but the application needs to give the base name each time).

All easy to implement.

One problem ... inwild does not work well as a parameter because it has to be given as "*" on the command line. Same problem for "outwild". I am sure users can be educated.

The programs that use the path/extension options do not define them as parameters anyway. Their ACD files need some corrections.

Comments?
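The filename rules proposed for inwild can be prototyped outside ACD to check how they interact. This Python sketch is hypothetical: resolve_inwild is not an EMBOSS function (the proposed ajAcdGetInwild would be C and return an AjPFile), and it simply returns the sorted list of matching file names. It follows the rules as stated: the pattern defaults to "*", -inextension is appended only when the pattern has no extension of its own (so "*.align" ignores it), and -indirectory defaults to the current directory.

```python
import glob
import os


def resolve_inwild(pattern="*", inextension=None, indirectory="."):
    """Expand an inwild-style input specification into a sorted file list.

    Hypothetical sketch of the proposed semantics:
    - pattern defaults to "*";
    - inextension is appended only if the pattern has no extension itself;
    - indirectory defaults to the current directory.
    """
    if inextension and not os.path.splitext(pattern)[1]:
        pattern = f"{pattern}.{inextension.lstrip('.')}"
    return sorted(glob.glob(os.path.join(indirectory, pattern)))
```

For example, resolve_inwild(inextension="align", indirectory="/home/user/somewhere") would list every *.align file there, while resolve_inwild("*.txt", inextension="align") would ignore the extension qualifier and list *.txt files in the current directory.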
regards,
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From letondal at pasteur.fr Tue Apr 23 17:38:43 2002
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 23 Apr 2002 19:38:43 +0200
Subject: Pise/EMBOSS 2.3.1
In-Reply-To: Your message of "Tue, 23 Apr 2002 17:03:27 BST." <3CC585CF.D664FB68@uk.lionbioscience.com>
Message-ID: <200204231738.g3NHchop186093@electre.pasteur.fr>

Peter Rice wrote:
> Hi Catherine,

Hi Peter,

>
> > Main changes were for align types, where I could associate a "pipetype" to
> > chain to other programs taking alignment as input. BTW, I found "MSF" and "fasta" for the
> > -aformat parameter - are there others?
>
> There are more, but not sequence formats. We should add them to "entrails"
> output.
>
> We can easily add more sequence formats. Can you suggest some?

I only asked so that I would know which ones to put on the Web interface. (There are also ClustalW and PHYLIP, but they are not necessary in Pise, since there are format converters.)

> Those with '*' are for pairwise alignments only.
>
> > The main problem I had was with string parameters for specifying a path, with ./ default
> > value and having corresponding extn parameters.
> >
> > That's why I had to discard the following programs:
> > alignwrap, contacts, seqnr, seqsort, siggen, dichet and scopalign.
> >
> > A solution could be to have an explicit type and the extension as an attribute:
> >
> > path: algpath [
> > parameter: "Y"
> > prompt: "Location and extension of alignment files for input"
> > default: "./"
> > extn: ".align"
> > ]
> >
> > What do you think?
>
> The path and extension options are a terrible 'hack' to avoid having "*"
> on the command line for those programs.

I have the same problem even with just ./, since '/' cannot be allowed in a string parameter on a Web server.
Another problem I have made a workaround for is programs that take '*', such as extractseqfeat: '*' is replaced in the Web form by 'all', which the CGI then replaces with '*'.

>
> This is really just infile with a wild card filename (which works already).
>
> We can make a new ACD type "inwild" which works like infile but with some
> small differences. The prompt would be "Input file(s)". The ajAcdGetInwild
> function will return an AjPFile. We can add functions to report the
> filenames as a string list (the first file is already open, the others are
> in a list so it is a little tricky to make the list in an application).
>
> There should be an attribute "inextension:align" (for example) and a
> default value of "*". If the user specifies "*.align" the inextension will
> be ignored.
>
> Associated qualifiers:
>
> -inextension align
> -indirectory /home/user/somewhere (defaults to current directory)
>
> For consistency, we can add the same qualifiers for infile.
>
> With "out" instead of "in" we can sue the same qualifiers for outfile and a
> new ACD type "outwild" (outwild can open a new output file, using a new
> ajFileNextOut call, but the application needs to give the base name each
> time).
>
> All easy to implement.
>
> One problem ... inwild does not work well as a parameter because it has to
> be given as "*" on the command line. Same problem for "outwild". I am sure
> users can be educated.
>
> The programs that use the path/extension options do not define them as
> parameters anyway. Their ACD files need some corrections.
>
> Comments?

As long as there is a way to detect this kind of parameter (in order to replace it with a simple textarea or file upload on a Web interface), I think it's very useful! So the type would be inwild or outwild?

PS: Regarding Pise/EMBOSS, I forgot to mention that it is not only output alignments that are "connected" by Pise menus. I have also added this feature for sequence, seqall, seqout, etc.

Thanks for the quick answer!
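The workaround described here can be pictured as a small translation step in the CGI. The Python sketch below is hypothetical (the function name is invented, and only the two rules mentioned in this message are implemented): it maps the form value 'all' back to the '*' wildcard and refuses any value containing '/', since '/' cannot be allowed in a string parameter on a Web server.

```python
def form_value_to_argument(value):
    """Translate a web-form string parameter into a command-line argument.

    Hypothetical sketch of two rules:
    - the form offers 'all', which the CGI maps back to the wildcard '*';
    - any value containing '/' is rejected, so users cannot name server paths.
    """
    value = value.strip()
    if value == "all":
        return "*"
    if "/" in value:
        raise ValueError("path separators are not allowed in this parameter")
    return value
```

Keeping the user-visible word 'all' in the form means the literal '*' never has to be typed or passed through the browser; it only appears on the command line the CGI builds.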
--
Catherine Letondal -- Pasteur Institute Computing Center

From mathog at mendel.bio.caltech.edu Tue Apr 23 18:25:34 2002
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Tue, 23 Apr 2002 11:25:34 -0700
Subject: Pise/EMBOSS 2.3.1
Message-ID:

> One problem ... inwild does not work well as a parameter because it has to
> be given as "*" on the command line. Same problem for "outwild". I am sure
> users can be educated.

Sure they can. That's why thousands of hours are being spent wrapping GUIs around programs so that users don't have to (horrors) log on or (gasp) type a command line.

Back to the subject at hand. (And this is stream of consciousness, so please bear with me.)

I think that, for purposes of interface design, there should maybe be predefined methods to break out (all) the pieces/options of a USA. (Perhaps even reduced to Perl and C modules in the EMBOSS distribution, so that W2H/Pise/etc. don't need to be rewritten for each EMBOSS release.)

Consider something like this:

  program -sequence=genbank:\*

That never translates well directly into a GUI, because the end user has to know the full USA syntax and, especially, that "*" is a wildcard. Often enough, they don't understand these concepts. And even if they do, they may not be able to use certain aspects of that syntax on a given server (for instance, files and paths, or particular databases). So it falls to the GUI to put some glue between the USA and the user.

The two main web interfaces for EMBOSS take opposite paths in this regard. Pise hides the USA completely, and W2H allows the user to manipulate USAs through a tool. In W2H you generally have to build the USAs ahead of time in a separate window and store them in a list; then you select one or more USAs from the list when you run the program. (USAs can also generally be typed into the slots within the program - if the user knows what he/she is doing.)
In Pise you can enter a database USA like "genbank:dmwhite" (though it isn't called a USA there), but entering "genbank:*" doesn't work (for instance, with compseq). Pise isn't really designed to handle wild cards, because it tries to extract the whole sequence from the database, save it in a file, and then run the program on that file. This is consistent with its typical "upload data for each program" design: Pise only ever runs programs with the "simple file" sort of USA. So perhaps it's just as well that "genbank:*" doesn't work at the moment!!! To get around this wildcard limitation, Pise would have to be reworked enough to recognize wildcards (and USAs in general) and slot them onto the command line without first extracting the sequences they refer to.

Anyway, what's really going on with -sequence is that all of the components of the USA are encoded into a single string for use on the command line, and then broken out again into separate pieces later within the program. For a GUI, _all_ these pieces need to be broken out explicitly and displayed to the user (who isn't expected to know anything about USAs, or to have to learn anything about them or the interface). Something like this:

format:   default
database: genbank
  x ALL_ENTRIES
  o BY_STRING
    entrystring: (blank)

From that, the GUI/CGI can easily enough format a USA for the final command line.

But imagine using such an interface. It's great if you just run an occasional program, but not so wonderful when you're doing something complex. How do you cut and paste the state of 4 (or more) USA variables from one page (=program) to another? That suggests to me that a GUI which always has fully broken-out USA options will probably end up being pretty awkward to use. However, since the purpose of the GUI is essentially to reformat (implicit) information in the USA, why not make that an explicit option, and let it reformat in both directions?
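The two-way formatter can be sketched for the simplest case in the message: a database name plus either ALL_ENTRIES (written as the wild card "*") or an entry string. This is a minimal Python illustration with hypothetical names (UsaFields, to_usa, from_usa, command_line). It deliberately ignores the format field, listfiles, multiple entries and the other USA forms, and it rejects anything it cannot parse rather than guessing. shlex.quote does the job of the backslash in -sequence=genbank:\* by keeping the shell from expanding "*".

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsaFields:
    """Broken-out fields of a simple database USA (hypothetical helper)."""
    database: str
    entry: Optional[str] = None  # None = ALL_ENTRIES, a string = BY_STRING


def to_usa(fields: UsaFields) -> str:
    """Fields -> USA string; ALL_ENTRIES is written as the '*' wild card."""
    entry = "*" if fields.entry is None else fields.entry
    return f"{fields.database}:{entry}"


def from_usa(usa: str) -> UsaFields:
    """USA string -> fields, for the plain 'database:entry' form only."""
    database, sep, entry = usa.partition(":")
    # "format::file" and other USA forms are out of scope for this sketch
    if not sep or not database or not entry or "::" in usa:
        raise ValueError(f"not a simple database:entry USA: {usa!r}")
    return UsaFields(database, None if entry == "*" else entry)


def command_line(program: str, qualifier: str, fields: UsaFields) -> str:
    # Quote the USA so a shell does not expand '*' (what genbank:\* does by hand)
    return f"{program} -{qualifier} {shlex.quote(to_usa(fields))}"


print(command_line("seqret", "sequence", UsaFields("genbank")))
# seqret -sequence 'genbank:*'
print(from_usa("genbank:dmwhite"))
# UsaFields(database='genbank', entry='dmwhite')
```

Because to_usa and from_usa are inverses on this subset, a USA pasted in from another page can be broken out into fields, tweaked, and written back out again.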
Then the "standard" USA GUI interface starts to look something like this:

[test usa] [from USA] [to USA] [use this] [abort]   <------ (buttons)
USA: [ genbank:* ]
format:   default
database: genbank                                   <------ (pull down list)
  x ALL_ENTRIES
  o BY_STRING
    entrystring: (blank)

Actually it's a LOT more complicated than that, considering that it also encompasses listfiles, multiple entries (foo.msf{one,two,three}), etc. If the user has a USA, he/she can plug it into the GUI, and fine. Or they can plug it in, translate it, and tweak it. Or, if they don't have a USA to start with, they can use this page to build one. And this USA constructor page can enable/disable the USA fields as appropriate for each site and/or program. (No file access? Can't accept list files or wild cards? Then don't show those USA options. Make the database list from the output of showdb.)

The final problem is that exposing the guts of the USA will take up a lot of screen space and complicate the program interfaces. That's less of a problem, though, if the GUI for any given EMBOSS program just provides a slot to plug in a USA and some way to pop up the USA formatter window to fill in that slot (through JavaScript or whatever). The popped-up formatter could then drop the final USA back into the program's USA slot. (Sort of like what W2H does, but into the program's slot rather than the working list.)

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From jison at hgmp.mrc.ac.uk Wed Apr 24 08:40:25 2002
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Wed, 24 Apr 2002 09:40:25 +0100
Subject: Pise/EMBOSS 2.3.1
References: <200204231506.g3NF6oop249416@electre.pasteur.fr> <3CC585CF.D664FB68@uk.lionbioscience.com>
Message-ID: <3CC66F79.D7AB51BA@hgmp.mrc.ac.uk>

> The programs that use the path/extension options do not define them as
> parameters anyway. Their ACD files need some corrections.
>
> Comments?
They are parameters in new versions of the protein structure apps (alignwrap, contacts, seqnr, seqsort, siggen, dichet, scopalign, etc.), but I haven't committed them yet; within a month, hopefully.

J.

From charles at moulinette.dyndns.org Fri Apr 26 21:30:56 2002
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Fri, 26 Apr 2002 23:30:56 +0200
Subject: seqret doesn't count more than 99?
Message-ID: <20020426213056.GA26616@moulinette.dyndns.org>

Hello,

I downloaded the draft of the fugu genome (fasta format, 300Mb) and renamed the headers using the following command line:

sed < fugu_02_04_28.fasta 's/>/>gnl|fugu|/' > fugu_newheaders_02_04_28.fasta

I'm not able to index a blast database correctly if the header doesn't look "ncbi compliant" and formatdb hasn't been run with the -o flag.

I created the blast database and indexed it with dbiblast. The reason for not formatting the fasta file itself is to save space. This also enforces a synchronicity between the blast hit names and the names that I can give to seqret.

Here is the problem:

charles at pc-1035-a:~$ seqret fugu:Scaffold_7
Reads and writes (returns) sequences
Output sequence [scaffold_7.fasta]:

==> OK!

charles at pc-1035-a:~$ seqret fugu:Scaffold_99
Reads and writes (returns) sequences
Output sequence [scaffold_99.fasta]:

==> OK!

charles at pc-1035-a:~$ seqret fugu:Scaffold_100
Reads and writes (returns) sequences
Error: Unable to read sequence 'fugu:Scaffold_100'

==> KO :((

seqret can't fetch sequences with names like Scaffold_xyz, where xyz >= 100.

Is it due to the length of the name? I am puzzled by that problem... I can send you more info if you like.

Charles

From simon.andrews at bbsrc.ac.uk Mon Apr 29 09:49:29 2002
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Mon, 29 Apr 2002 10:49:29 +0100
Subject: seqret doesn't count more than 99?
Message-ID: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk>

> -----Original Message-----
> From: Charles Plessy [mailto:charles at moulinette.dyndns.org]
> Sent: 26 April 2002 22:31
> To: emboss at hgmp.mrc.ac.uk
> Subject: seqret doesn't count more than 99?
>
>
> Hello,
>
> I downloaded the draft of the fugu genome
[snip]
> I'm not able to index a blast database correctly if the header doesn't
> look "ncbi compliant" and formatdb hasn't been run with the -o flag.

I'd not tried this before, but we see the same thing here. Running dbiblast on the indexed raw fugu data seems to work, but seqret fails on the subsequent retrieval.

The problem seems to be in the accession numbers entered into the .trg file created by dbiblast. Running seqret with debug on shows the following (edited) entries:

------------------------------------
USA to test: 'fugu_blasttest:Scaffold_1'
[snip]
found dbname fugu_blasttest
wild query 'Scaffold_1' 'Scaffold_1' ''
database type: 'N' format 'ncbi'
use access method 'blast'
Matched seqAccess[12] 'blast'
seqAccessBlast type 1
[snip]
seqCdIdxSearch (entry 'Scaffold_1')
[several more of these]
idx test 59 'Scaffold_100' -1 (+/- 39)
idx test 49 'Contig_83248' 1 (+/- 18)
idx test 54 'Contig_9376' 1 (+/- 8)
idx test 56 'Scaffold_10' -1 (+/- 3)
idx test 55 'Scaffold_1' -1 (+/- 0)
ajFileNewIn '/data/Fugu/EMBOSS/TEST/acnum.trg'
ajNamResolve of '/data/Fugu/EMBOSS/TEST/acnum.trg'
seqCdReadHeader file /data/Fugu/EMBOSS/TEST/acnum.trg
FileSize: 416800 NRecords: 20825 recsize: 20 idsize: 10
seqCdFileOpen '/data/Fugu/EMBOSS/TEST/acnum.trg' NRecords: 20825 RecSize: 20
ajFileNewIn '/data/Fugu/EMBOSS/TEST/acnum.hit'
ajNamResolve of '/data/Fugu/EMBOSS/TEST/acnum.hit'
seqCdReadHeader file /data/Fugu/EMBOSS/TEST/acnum.hit
FileSize: 83600 NRecords: 20825 recsize: 4 idsize: -6
seqCdFileOpen '/data/Fugu/EMBOSS/TEST/acnum.hit' NRecords: 20825 RecSize: 4
seqCdTrgSearch 'Scaffold_1' recSize: 20
trg test 10412 'ZZ0010413' -1 (+/- 20825)
trg test 5206 'ZZ0005207' -1 (+/- 10412)
trg test 2603 'ZZ0002604' -1 (+/- 5206)
trg test 1301 'ZZ0001302' -1 (+/- 2603)
trg test 650 'ZZ0000651' -1 (+/- 1301)
trg test 325 'ZZ0000326' -1 (+/- 650)
trg test 162 'ZZ0000163' -1 (+/- 325)
trg test 81 'ZZ0000082' -1 (+/- 162)
trg test 40 'ZZ0000041' -1 (+/- 81)
trg test 20 'ZZ0000021' -1 (+/- 40)
trg test 10 'ZZ0000011' -1 (+/- 20)
trg test 5 'ZZ0000006' -1 (+/- 10)
trg test 2 'ZZ0000003' -1 (+/- 5)
trg test 1 'ZZ0000002' -1 (+/- 2)
trg test 0 'ZZ0000001' -1 (+/- 1)
'SCAFFOLD_1' not found found in .trg
------------------------------------------------

After this, it cleans up after itself and exits.

Looking through the .trg file, all the accessions are of the form ZZ0000XXX. This format of accession doesn't appear anywhere in my original data, so I don't know where it's coming from (presumably either dbiblast or formatdb?). The inability to reconcile Scaffold_1 with the ZZ00... accessions seems to be what causes seqret to fail.

> I created the blast database and indexed it with dbiblast. The reason
> for not formatting the fasta file itself is to save space. This also
> enforces a synchronicity between the blast hit names and the names
> that I can give to seqret.

The way we did this was to use the fasta files for both. I take the point about the space saving, but the assembled data wasn't all that big. If you use the raw fasta files for both formatdb (without header parsing) and dbifasta, then you can still use the same accession codes as reference in both.

> Here is the problem:
>
> charles at pc-1035-a:~$ seqret fugu:Scaffold_100
> Reads and writes (returns) sequences
> Error: Unable to read sequence 'fugu:Scaffold_100'
>
> ==> KO :((
>
> seqret can't fetch sequences with names like Scaffold_xyz, where
> xyz >= 100.
>
> Is it due to the length of the name?

It might be worth running seqret with the -debug flag on and looking at the messages at the end of seqret.dbg.
This usually gives some more useful information about what is going wrong in these cases.

I'd be interested in seeing a resolution to this as well...

TTFN

Simon.

From peter.rice at uk.lionbioscience.com Mon Apr 29 10:41:30 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Mon, 29 Apr 2002 11:41:30 +0100
Subject: seqret doesn't count more than 99?
References: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk>
Message-ID: <3CCD235A.D418202E@uk.lionbioscience.com>

"simon andrews (BI)" wrote:

> The problem seems to be in the accession numbers entered into the .trg file
> created by dbiblast. Running seqret with debug on shows the following
> (edited) entries:

The command line:

seqret fugu_blasttest:Scaffold_1

searches both the entryname and acnum indices. The ZZ accession numbers are invented by dbiblast so there is something in the acnum index (they should disappear in 2.4.0, where we handle empty indices gracefully).

The problem will be in the entryname index, where it seems Scaffold_1 was found, but not accepted.

I am waiting for the example file from Charles, but I suspect this is a problem already fixed in the code for 2.4.0.

regards,

Peter

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From charles at moulinette.dyndns.org Mon Apr 29 13:22:07 2002
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Mon, 29 Apr 2002 15:22:07 +0200
Subject: seqret doesn't count more than 99?
In-Reply-To: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk>
References: <2DC41140A89ED411989D00508BDCD9ED01E28535@bi-exsrv1.iapc.bbsrc.ac.uk>
Message-ID: <20020429132207.GD1818@moulinette.dyndns.org>

> I'd not tried this before, but we see the same thing here. Running dbiblast
> on the indexed raw fugu data seems to work, but seqret fails on the
> subsequent retrieval.
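The "trg test" lines in Simon's debug trace step through acnum.trg the way an ordinary binary search over sorted records does: each probe index is half the previous one, because the uppercased key 'SCAFFOLD_1' sorts before every 'ZZ' accession. The sketch below is a minimal Python illustration of that technique, not EMBOSS's seqCdTrgSearch code. The record list is rebuilt from the pattern visible in the trace (record i holds ZZ followed by i+1 as seven digits, an inference from the log), and a plain case-sensitive string comparison is assumed.

```python
def trg_search(records, target):
    """Binary search over a sorted list of IDs.

    Returns (index, probes): index is None when target is absent, and
    probes lists every record index that was compared, in order.
    """
    probes = []
    lo, hi = 0, len(records)
    while lo < hi:
        mid = (lo + hi) // 2
        probes.append(mid)
        if target < records[mid]:
            hi = mid          # target sorts before this record
        elif target > records[mid]:
            lo = mid + 1      # target sorts after this record
        else:
            return mid, probes
    return None, probes


# Placeholder accessions rebuilt from the trace: ZZ0000001 ... ZZ0020825
records = [f"ZZ{i + 1:07d}" for i in range(20825)]
index, probes = trg_search(records, "SCAFFOLD_1")
print(index)   # None
print(probes)  # [10412, 5206, 2603, 1301, 650, 325, 162, 81, 40, 20, 10, 5, 2, 1, 0]
```

Under these assumptions the sketch probes the same 15 indices as the trace and ends without a match. A miss in a placeholder accession index is therefore expected, which fits Peter's point that the real failure lies in the entryname index.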
I have to NCBIze the headers in order to make it work: I use either lcl|entryname or gnl|dbname|entryname.

> > I created the blast database and indexed it with dbiblast. The reason
> > for not formatting the fasta file itself is to save space. This also
> > enforces a synchronicity between the blast hit names and the names
> > that I can give to seqret.
>
> The way we did this was to use the fasta files for both. I take the point
> about the space saving, but the assembled data wasn't all that big. If you
> use the raw fasta files for both formatdb (without header parsing) and
> dbifasta, then you can still use the same accession codes as reference in
> both.

You are right; I was also motivated to do something 'aesthetic' ;)

> It might be worth running seqret with the -debug flag on and looking at the
> messages at the end of seqret.dbg. This usually gives some more useful
> information about what is going wrong in these cases.

I can send the debug info upon request. The files (one success, one failure) are not that big (70k), but I think netiquette doesn't recommend sending them to the whole list.

Charles
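Charles's workaround, giving every FASTA header an NCBI-style lcl| or gnl|dbname| prefix, can be sketched as a small line filter. This is a minimal Python illustration with hypothetical names (ncbi_header, ncbi_headers), not part of EMBOSS or the NCBI tools. It anchors the match at the start of the line, the equivalent of sed 's/^>/>gnl|fugu|/', so a '>' appearing later in a line is never touched. When no database name is given, it falls back to lcl|.

```python
def ncbi_header(line, dbname=None):
    """Rewrite one FASTA line; only lines starting with '>' are changed.

    With dbname the header becomes >gnl|dbname|rest, otherwise >lcl|rest.
    Any trailing newline in `line` is preserved.
    """
    if not line.startswith(">"):
        return line
    prefix = f"gnl|{dbname}|" if dbname else "lcl|"
    return ">" + prefix + line[1:]


def ncbi_headers(lines, dbname=None):
    """Lazily rewrite an iterable of lines (e.g. an open file object)."""
    return (ncbi_header(line, dbname) for line in lines)


fasta = [">Scaffold_100 draft\n", "ACGTACGT\n", ">Contig_9376\n", "TTGA\n"]
print("".join(ncbi_headers(fasta, "fugu")), end="")
```

Because ncbi_headers is a generator, a file the size of the fugu draft can be streamed rather than loaded whole, e.g. with sys.stdout.writelines(ncbi_headers(sys.stdin, "fugu")) in a small script.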