From fchetou at infobiogen.fr Tue Jul 3 04:55:58 2001
From: fchetou at infobiogen.fr (Farid Chetouani)
Date: Tue, 03 Jul 2001 10:55:58 +0200
Subject: Protein Clustering tool
Message-ID: <3B41889E.3EDC8D1E@infobiogen.fr>

Hello,

I would like to know if there are plans in EMBOSS to develop software to cluster proteins into families (of paralogues/orthologues) according to sequence similarity.

Thank you for your help,

F

PS: please reply to my email, fchetou at infobiogen.fr

From frank at bioss.sari.ac.uk Tue Jul 3 05:18:20 2001
From: frank at bioss.sari.ac.uk (Frank Wright)
Date: Tue, 03 Jul 2001 10:18:20 +0100
Subject: Protein Clustering tool
References: <3B41889E.3EDC8D1E@infobiogen.fr>
Message-ID: <3B418DDC.F2004E00@bioss.sari.ac.uk>

Hi All,

If you wish to construct phylogenetic trees (specifically gene trees) from protein sequences so as to infer duplication and paralogous/orthologous relationships, then you can use the PHYLIP package (available as an EMBASSY application). Genetic distances can be calculated using EPROTDIST, and the resulting distance matrix can be input into either EFITCH (slower, more accurate tree) or ENEIGHBOR (faster, more approximate clustering method, offering either the Neighbor-Joining algorithm or the UPGMA algorithm; use the latter only if you have previously tested that the "molecular clock" assumption is valid for your dataset).

ePROTDIST, eFITCH and eNEIGHBOR come from version 3.5 of the PHYLIP package (http://evolution.genetics.washington.edu). PHYLIP 3.6 has recently been released (alpha version). However, PROTDIST 3.6 has improved distances (it copes with among-site rate heterogeneity to give more accurate genetic distances), and there are also improvements to NEIGHBOR 3.6 (faster) and to FITCH 3.6.
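To make the UPGMA option above concrete, here is a minimal Python sketch of UPGMA clustering on a precomputed distance matrix, such as one produced by EPROTDIST. It is an illustration only, not PHYLIP's implementation: the function name `upgma`, the list-of-lists input and the toy distances are assumptions, and reading PHYLIP's output file is not shown. Because UPGMA builds an ultrametric tree (every leaf the same distance from the root), it carries the molecular-clock caveat mentioned above; Neighbor-Joining does not require it.

```python
from typing import List


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Cluster taxa with UPGMA and return a Newick tree with branch lengths.

    `dist` is assumed to be a symmetric matrix with zeros on the diagonal.
    """
    # Active clusters: id -> (Newick subtree, number of leaves, height of its root)
    clusters = {i: (names[i], 1, 0.0) for i in range(len(names))}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        height = dab / 2.0  # ultrametric: the new node sits halfway
        merged = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        # Size-weighted average distance from the merged cluster to the rest
        new_d = {}
        for k in clusters:
            dak = d[(min(a, k), max(a, k))]
            dbk = d[(min(b, k), max(b, k))]
            new_d[(k, next_id)] = (size_a * dak + size_b * dbk) / (size_a + size_b)
        # Drop every distance involving a or b, then add the new ones
        d = {key: v for key, v in d.items() if a not in key and b not in key}
        d.update(new_d)
        clusters[next_id] = (merged, size_a + size_b, height)
        next_id += 1
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    dist = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]
    print(upgma(names, dist))  # (D:5,(C:3,(A:1,B:1):2):2);
```

Each merge averages distances weighted by cluster size, which is what distinguishes UPGMA from simple average-of-two (WPGMA) clustering.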
I presume that PHYLIP 3.6 will be made available as an EMBASSY application once it is certain that there are no serious bugs :-)

I hope that helps,
Best Wishes,
Frank
--
Frank Wright
Biomathematics and Statistics Scotland,
SCRI, DUNDEE DD2 5DA, Scotland
frank at bioss.sari.ac.uk

From fchetou at pasteur.fr Tue Jul 3 05:38:29 2001
From: fchetou at pasteur.fr (Farid Chetouani)
Date: Tue, 3 Jul 2001 11:38:29 +0200
Subject: Protein Clustering tool
In-Reply-To: <3B418DDC.F2004E00@bioss.sari.ac.uk>; from frank@bioss.sari.ac.uk on Tue, Jul 03, 2001 at 10:18:20AM +0100
References: <3B41889E.3EDC8D1E@infobiogen.fr> <3B418DDC.F2004E00@bioss.sari.ac.uk>
Message-ID: <20010703113829.A38883@pasteur.fr>

Hello,

Firstly, Frank, thank you for your reply. I am sorry my first email was not precise enough. In fact, I was wondering whether EMBOSS plans to provide a free clustering tool that would take a protein FASTA sequence file and produce a list of protein families. For instance, thanks to A. Enright & C. Ouzounis, the GeneRage software is free for academic research (http://www.ebi.ac.uk/research/cgg/services/rage/), but the sources are not yet available.

Best regards, and thank you for your help,

F

PS: please reply to my email, fchetou at infobiogen.fr

>
> If you wish to construct phylogenetic trees (specifically gene trees)
> from protein sequences so as to infer duplication and
> paralogous/orthologous relationships, then you can use the PHYLIP
> package (available as an EMBASSY application). Genetic distances can be
> calculated using EPROTDIST, and the resulting distance matrix can be input
> into either EFITCH (slower, more accurate tree) or ENEIGHBOR (faster,
> more approximate clustering method, offering either the
> Neighbor-Joining algorithm or the UPGMA algorithm; use the latter only
> if you have previously tested that the "molecular clock" assumption is
> valid for your dataset).
>
> ePROTDIST, eFITCH and eNEIGHBOR come from version 3.5 of the PHYLIP
> package (http://evolution.genetics.washington.edu). PHYLIP 3.6 has
> recently been released (alpha version). However, PROTDIST 3.6 has
> improved distances (it copes with among-site rate heterogeneity to give
> more accurate genetic distances), and there are also improvements to
> NEIGHBOR 3.6 (faster) and to FITCH 3.6. I presume that PHYLIP 3.6 will
> be made available as an EMBASSY application once it is certain that there
> are no serious bugs :-)
>
> I hope that helps,
> Best Wishes,
> Frank
> --
> Frank Wright
> Biomathematics and Statistics Scotland,
> SCRI, DUNDEE DD2 5DA, Scotland
> frank at bioss.sari.ac.uk

From jison at hgmp.mrc.ac.uk Tue Jul 3 05:48:04 2001
From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison)
Date: Tue, 03 Jul 2001 10:48:04 +0100
Subject: Protein Clustering tool
References: <3B41889E.3EDC8D1E@infobiogen.fr>
Message-ID: <3B4194D4.929C7A3D@hgmp.mrc.ac.uk>

Software to cluster protein sequences into families on the basis of sequence relatedness is on my list of jobs to do; it will happen within the next 3 months. I personally need something quite simple-minded, so if you have any specific requirements, let me know and I can try to incorporate them into my design.

Cheers
J.

Farid Chetouani wrote:
> Hello
>
> I would like to know
> if there are plans in EMBOSS to develop
> software to cluster proteins into families (of paralogues/orthologues)
> according to sequence similarity
>
> thank you for your help
>
> F
>
> PS: please reply to my email fchetou at infobiogen.fr

--
Jon C. Ison, PhD
Bioinformatics Applications Group
UK MRC Human Genome Mapping Project Resource Centre
Hinxton, Cambridge, CB10 1SB, UK
E-mail : jison at hgmp.mrc.ac.uk
Tel : 01223 49-4548
HGMP-RC: http://www.hgmp.mrc.ac.uk/
EMBOSS : http://www.hgmp.mrc.ac.uk/Software/EMBOSS/
CCP11 : http://www.hgmp.mrc.ac.uk/CCP11/

From gbottu at ben.vub.ac.be Mon Jul 9 05:35:09 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 9 Jul 2001 11:35:09 +0200 (MET DST)
Subject: No subject
Message-ID: <200107090935.LAA09786@bigben.vub.ac.be>

Dear friends,

I am puzzled by pscan output. I do not see the difference between "Not all elements match but those that do are in order" and "Remaining partial matches", since in both cases there are two matches with the same element. And, in general, how does pscan handle cases where the protein really contains the same motif several times (e.g. proteins with kringles)? Can Alan or someone else answer this?

Regards,
Guy Bottu
-------------- next part --------------
CLASS 1
Fingerprints with all elements in order

CLASS 2
All elements match but not all in the correct order

Fingerprint HTHREPRESSR Elements 2
Accession number PR00031
Lambda and other repressor helix-turn-helix signature
Element 1 Threshold 50% Score 73%
Start position 135 Length 10
Element 2 Threshold 32% Score 32%
Start position 74 Length 17

CLASS 3
Not all elements match but those that do are in order

Fingerprint GEMCOATBR1 Elements 7
Accession number PR00225
Geminivirus BR1 coat protein signature
Element 3 Threshold 30% Score 37%
Start position 281 Length 15
Element 3 Threshold 30% Score 31%
Start position 196 Length 15

CLASS 4
Remaining partial matches

Fingerprint GABAARBETA Elements 4
Accession number PR01160
Gamma-aminobutyric-acid A receptor beta subunit signature
Element 1 Threshold 33% Score 34%
Start position 275 Length 15
Element 1 Threshold 33% Score 33%
Start position 187 Length 15

From sgmd at genetik.fu-berlin.de Tue Jul 10 04:36:25 2001
From: sgmd at genetik.fu-berlin.de (Thomas Siegmund)
Date: Tue, 10 Jul 2001 10:36:25 +0200
Subject: Announce: X GUI for EMBOSS V0.5
Message-ID: <20010710083627.D881617AD6@mercury.hgmp.mrc.ac.uk>

Dear all,

a few months ago I announced my plan to build an X Window GUI for EMBOSS based on Kaptain and QT/KDE. Today I'd like to inform you that I have made some progress with it. Version 0.5 of EMBOSS.kaptn is available at http://userpage.fu-berlin.de/~sgmd .

ChangeLog:
==========
Version 0.5
- Covering 50 EMBOSS applications with (almost) all options
- Integrated EMBOSS help system
- Use the new regular expression features of Kaptain 0.6. This allows fallback to EMBOSS defaults if text input fields for parameters like "-outfile" are empty.
- Files can be selected by drag & drop
- Addition of embosslauncher, a tool to set the working directory and to run different EMBOSS applications with the same sequence file
- Simple install script

Version 0.1
- First simple GUIs for 12 EMBOSS applications
- First public announcement at emboss at embnet.org

Please give it a try and let me know what you think.

With best regards
Thomas
--
Thomas Siegmund
Freie Universität Berlin
Institut für Genetik
Arnimallee 7
14195 Berlin
Germany
Tel: +49 30 838 54868
Fax: +49 30 838 54395
http://userpage.fu-berlin.de/~sgmd

From ableasby at hgmp.mrc.ac.uk Sun Jul 15 08:51:31 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Sun, 15 Jul 2001 13:51:31 +0100 (BST)
Subject: Announcing EMBOSS 2.0.0
Message-ID: <200107151251.NAA11553@bromine.hgmp.mrc.ac.uk>

EMBOSS 2.0.0 includes:

1. Feature table reading: EMBL, Swissprot and PIR feature tables are handled by rewritten library routines. Tables can currently be read and written (or interconverted) in native or GFF formats. For the applications programmer, an internal key/value pair structure greatly simplifies use.

2. Report handling: stub code has been included to allow application output to be written in a range of standard report formats. The feature tables above use one of these formats. More report formats will be added during the lifetime of the 2.x.x series; release 3.0.0 of EMBOSS will mark the completion of this phase.

3. Code purification: all library code and applications are tested for memory handling before release. To our knowledge the code does not leak a single byte in normal use. A "purify" script is provided (mainly for developers).

4. Quality control: code has been written, supplied and used for testing prior to release. This ensures that applications produce the same output (where appropriate) after changes to the library etc. A QA test script is provided.

5. Code modification: almost all the source code has been revamped since the 1.x.x series. All functions, including those in applications, have unique names. This now allows you to navigate the entire source code using SRS.

6. Protein structure code has been added and, although not yet complete, this marks one of many new directions for applications.

Not entirely by coincidence, the release of 2.0.0, like 1.0.0, has happened on St Swithin's Day (15th July), just prior to the ISMB conference. So, if it works on that day it should work for 40 days thereafter! We look forward to making the same joke again next year (1).

Alan

On behalf of the development team (apologies if your name has been omitted by accident), who are:

HGMP: Alan Bleasby, Tim Carver, Jon Ison, Ranjeeva Ranasinghe, Gary Williams
Lion Bioscience: Peter Rice

Special thanks to David Martin (University of Dundee) for the administration guide.
Thanks also to Lisa Mullan (HGMP training courses) for providing feedback and suggestions from course attendees.

Thanks to all who have made suggestions, provided bug reports or contributed code. If we've failed to acknowledge you here, you should be there in the source code. If not, tell us and we'll fix it!

Footnote:
1. St. Swithin's Day if thou dost rain,
For forty days it will remain;
St. Swithin's Day if thou be fair,
For forty days 'twill rain na mair.

From ableasby at hgmp.mrc.ac.uk Mon Jul 16 06:55:04 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Mon, 16 Jul 2001 11:55:04 +0100 (BST)
Subject: EMBOSS 2.0.0 amendment
Message-ID: <200107161055.LAA13298@bromine.hgmp.mrc.ac.uk>

There was an omission in the original EMBOSS-2.0.0.tar.gz file which would have resulted in 3 of the protein structure ACD files not being copied after a "make install". This has now been corrected and a replacement file put on the server.

Alan

From dessen at infobiogen.fr Mon Jul 16 08:25:38 2001
From: dessen at infobiogen.fr (Philippe Dessen)
Date: Mon, 16 Jul 2001 14:25:38 +0200
Subject: fuzznuc
Message-ID: <3B52DC18.3141B98D@infobiogen.fr>

Just a question about fuzznuc: is it possible to define a pattern with repetition of a motif (of n letters, with n>1)? That is not mentioned in the documentation.

The following pattern (a stop codon in a coding frame) seems to be illegal!
<(NNN)(0,)TGA(NNN)(1,)>

$ fuzznuc seqfile
Nucleic acid pattern search
Search pattern: <(NNN)(0,)TGA(NNN)(1,)>
Number of mismatches [0]:
Output file [rptufrpx.fuzznuc]:
This is a warning: Illegal character [(]

EMBOSS An error in fuzznuc.c at line 96:
Illegal pattern
--------
In GCG syntax you can use (NNN){1,}

Regards
Philippe Dessen

From gwilliam at hgmp.mrc.ac.uk Mon Jul 16 08:33:33 2001
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Mon, 16 Jul 2001 13:33:33 +0100
Subject: fuzznuc
References: <3B52DC18.3141B98D@infobiogen.fr>
Message-ID: <3B52DF1D.4B3F2FCF@hgmp.mrc.ac.uk>

Philippe Dessen wrote:
>
> Just a question about fuzznuc:
> is it possible to define a pattern with repetition of a motif (of n
> letters, with n>1)?
> That is not mentioned in the documentation.
>
> The following pattern (a stop codon in a coding frame) seems to be
> illegal!
> <(NNN)(0,)TGA(NNN)(1,)>
>
> $ fuzznuc seqfile
> Nucleic acid pattern search
> Search pattern: <(NNN)(0,)TGA(NNN)(1,)>
> Number of mismatches [0]:
> Output file [rptufrpx.fuzznuc]:
> This is a warning: Illegal character [(]
>
> EMBOSS An error in fuzznuc.c at line 96:
> Illegal pattern

I think this is illegal in fuzznuc's PROSITE-style patterns. You might like to try 'dreg' instead, with a regular expression like:

^(...)*TGA(...)+$

Note that these regular expressions are case-sensitive, so put '-supper' on your command line to force the sequence into the required upper case.

Gary
--
Gary Williams Tel: +44 1223 494522
Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk
http://www.hgmp.mrc.ac.uk/
Bioinformatics, MRC HGMP Resource Centre, Hinxton, Cambridge, CB10 1SB, UK

From kala at avesthagen.com Mon Jul 23 02:43:47 2001
From: kala at avesthagen.com (Kala)
Date: Mon, 23 Jul 2001 12:13:47 +0530 (IST)
Subject: help pl.
Message-ID:

Hi all,
Could you please tell me whether I can install EMBOSS on Tru64 UNIX?
I'm unable to untar it. It says "not look like a tar archive".
It would be very useful if I got a reply soon.

Thanks in advance,
Kala

From bauer at genprofile.com Mon Jul 23 03:06:57 2001
From: bauer at genprofile.com (David Bauer)
Date: Mon, 23 Jul 2001 09:06:57 +0200
Subject: help pl.
References:
Message-ID: <3B5BCD11.C2E54D9B@genprofile.com>

Kala wrote:
>
> Hi all,
> Could you please tell me whether I can install EMBOSS on Tru64 UNIX?
> I'm unable to untar it. It says "not look like a tar archive".

The download file is a tar file which is compressed with gzip. If you have GNU tar, use 'tar -xvzf '. The z option says it's a compressed tar file. If your system's tar does not know how to handle compressed archives (like e.g. the Solaris tar), you must first run 'gunzip '.

I hope this helps,
Ciao,
David.

From dmartin at bioinformatics.msiwtb.dundee.ac.uk Mon Jul 23 04:52:14 2001
From: dmartin at bioinformatics.msiwtb.dundee.ac.uk (David Martin)
Date: Mon, 23 Jul 2001 09:52:14 +0100 (BST)
Subject: help pl.
In-Reply-To:
Message-ID:

On Mon, 23 Jul 2001, Kala wrote:

> Hi all,
> Could you please tell me whether I can install EMBOSS on Tru64 UNIX?
> I'm unable to untar it. It says "not look like a tar archive".

First, ensure that it was transferred in binary mode. Secondly, you will need to gunzip the archive before untarring it. GNU tar includes gunzip (tar zxf filename), whereas many vendor-supplied versions don't. The following command line may help:

zcat filename | tar xf -

..d

>
> It would be very useful if I got a reply soon.
>
> Thanks in advance,
> kala
>
>
>

----------------------------------
David Martin PhD
Bioinformatics Scientific Officer
Wellcome Trust Biocentre, Dundee
----------------------------------

From gbottu at ben.vub.ac.be Wed Jul 25 14:31:27 2001
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 25 Jul 2001 20:31:27 +0200 (MET DST)
Subject: tfscan blues
Message-ID: <200107251831.UAA17816@bigben.vub.ac.be>

from: BEN

Dear colleagues,

I have already posted this question, but nobody has replied.
The problem is that tfscan is becoming less useful, since we can no longer get TRANSFAC updates unless we pay for a licence, and I wonder whether EMBnet nodes have the right to give their users access at all. For info, see http://www.biobase.de/academia.html

Any comments?

Guy Bottu

From c.plessy at mangoosta.net Wed Jul 25 17:18:47 2001
From: c.plessy at mangoosta.net (Charles Plessy)
Date: Wed, 25 Jul 2001 23:18:47 +0200
Subject: tfscan blues
In-Reply-To: <200107251831.UAA17816@bigben.vub.ac.be>
References: <200107251831.UAA17816@bigben.vub.ac.be>
Message-ID: <01072523184702.02531@moulinette>

On Wednesday 25 July 2001 20:31, Guy Bottu wrote:
> from: BEN
>
> Dear colleagues,
>
> I have already posted this question, but nobody has replied. The
> problem is that tfscan is becoming less useful, since we
> can no longer get TRANSFAC updates unless we pay for a licence, and I
> wonder whether EMBnet nodes have the right to give their users access at all.
> For info, see http://www.biobase.de/academia.html
> Any comments?

I have a related question: do you think it would be possible to build fake TRANSFAC databases from a simple file?
Currently I'm adding to an array (in the GCG FindPatterns format) any binding site of interest to me that I find in the literature (with a name and a reference).
The goal would be to use existing programs to search within a set of home-selected transcription factors.

Charles PLESSY

From dmartin at bioinformatics.msiwtb.dundee.ac.uk Thu Jul 26 03:55:54 2001
From: dmartin at bioinformatics.msiwtb.dundee.ac.uk (David Martin)
Date: Thu, 26 Jul 2001 08:55:54 +0100 (BST)
Subject: tfscan blues
In-Reply-To: <01072523184702.02531@moulinette>
Message-ID:

On Wed, 25 Jul 2001, Charles Plessy wrote:

> On Wednesday 25 July 2001 20:31, Guy Bottu wrote:
> > from: BEN
> >
> > Dear colleagues,
> >
> > I have already posted this question, but nobody has replied.
> > The problem is that tfscan is becoming less useful, since we
> > can no longer get TRANSFAC updates unless we pay for a licence, and I
> > wonder whether EMBnet nodes have the right to give their users access at all.
> > For info, see http://www.biobase.de/academia.html
> > Any comments?
>
> I have a related question: do you think it would be possible to build
> fake TRANSFAC databases from a simple file?
> Currently I'm adding to an array (in the GCG FindPatterns format) any
> binding site of interest to me that I find in the literature (with a name and
> a reference).
> The goal would be to use existing programs to search within a set of
> home-selected transcription factors.

It would be nice to have a public front end for such a database so that submissions could be sent to a curator. Then we can return the information to the public domain (all literature referenced, of course, so we cannot be accused of stealing TRANSFAC).

..d

----------------------------------
David Martin PhD
Bioinformatics Scientific Officer
Wellcome Trust Biocentre, Dundee
----------------------------------

From charles at moulinette.dyndns.org Thu Jul 26 18:41:20 2001
From: charles at moulinette.dyndns.org (Charles)
Date: Fri, 27 Jul 2001 00:41:20 +0200 (CEST)
Subject: tfscan blues
In-Reply-To:
Message-ID:

> > I have a related question: do you think it would be possible to build
> > fake TRANSFAC databases from a simple file?
> > Currently I'm adding to an array (in the GCG FindPatterns format) any
> > binding site of interest to me that I find in the literature (with a name and
> > a reference).
> > The goal would be to use existing programs to search within a set of
> > home-selected transcription factors.
>
> It would be nice to have a public front end for such a database so that
> submissions could be sent to a curator.
> Then we can return the information to the public domain (all literature
> referenced, of course, so we cannot be accused of stealing TRANSFAC).

Well, I guess a tool allowing public contributions could be a CGI form or a CVS archive, but setting up those interfaces is far beyond my capabilities.

Currently I have between 40 and 50 entries, very focused on the molecular biology of development in early vertebrate embryos. Some are complex and others degenerate. I have not yet got anything interesting out of them using GCG FindPatterns: I get either almost no sites, or plenty if I allow mismatches.

I can easily imagine a way to store more information in a separate array and then build a FindPatterns data file using a Perl script. But I'd like to try some programs that give a score to the matches, in order to search for high-complexity binding sites more efficiently.

Charles

From s.roehrig at xantos.de Fri Jul 27 08:38:03 2001
From: s.roehrig at xantos.de (Roehrig, Sascha)
Date: Fri, 27 Jul 2001 14:38:03 +0200
Subject: coderet error with embl format file
Message-ID:

Dear all,

I encountered an error while retrieving the feature table from the sample EMBL database entry: http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[EMBL-ID:'TRBG361' ].

The CDS and mRNA were shown correctly. However, the translation was missing the first line of amino acids and ended with double quotes.

Has anybody else noticed the same?

Best wishes,
Sascha
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/emboss/attachments/20010727/25d0545e/attachment.html

From peter.rice at uk.lionbioscience.com Fri Jul 27 08:50:28 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 27 Jul 2001 13:50:28 +0100
Subject: coderet error with embl format file
References:
Message-ID: <3B616394.54F03A92@uk.lionbioscience.com>

"Roehrig, Sascha" wrote:
> I encountered an error while retrieving the feature table from the
> sample EMBL database entry:
> http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[EMBL-ID:'TRBG361'
>
> The CDS and mRNA were shown correctly. However, the translation was
> missing the first line of amino acids and ended with double quotes.

Works for me in 2.0.0.

There were some feature-handling code changes in 2.0.0; perhaps you can simply install the new version.

regards,

Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From ableasby at hgmp.mrc.ac.uk Sun Jul 29 19:54:37 2001
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Mon, 30 Jul 2001 00:54:37 +0100 (BST)
Subject: EMBOSS 2.0.1 (& HMMER)
Message-ID: <200107292354.AAA28489@bromine.hgmp.mrc.ac.uk>

EMBOSS 2.0.1 fixes an indexing problem with DBIGCG and split entries. It also incorporates handling of the Selex format as used in the HMMER package.

HMMER 2.1.1 has been converted for EMBOSS and appears in the download directory (ftp://ftp.uk.embnet.org/pub/EMBOSS/) as the 'embassy' package HMMER-2.1.1.tar.gz

Alan

From lukem at bioinfo.pbi.nrc.ca Mon Jul 30 00:09:24 2001
From: lukem at bioinfo.pbi.nrc.ca (Luke McCarthy)
Date: Sun, 29 Jul 2001 22:09:24 -0600
Subject: EMBOSS GUI
Message-ID: <5.1.0.14.0.20010729190020.00a13a10@bioinfo.pbi.nrc.ca>

Hi everybody,

On and off over the past year or so, I've been developing a GUI for the EMBOSS tools, designed to operate over the web.
It's been listed at the EMBOSS web site for most of that time, but in the last month I've significantly improved it, to the point where I think it could be very useful to the entire EMBOSS user community. But I'd like a little help to that end.

Before I release this interface out into the wild, I'd like it to be as polished as possible. I don't have a lot of time right now to do extensive testing of any kind ('testing' to this date has only involved one trial run of each application), plus I don't actually use many of these tools in practice, so I don't know if they're actually useful the way they're currently presented. So I'd like to solicit your assistance as EMBOSS users: if you could find the time to drop by http://bioinfo.pbi.nrc.ca:8090/EMBOSS/index.html , try out your favourite EMBOSS tool, offer any suggestion or criticism that comes to mind, and definitely let me know if something doesn't work, I would be very grateful.

Criticism of the look and feel (colours, font size, general appearance) is appreciated, but all of that is eminently configurable through the use of style sheets (which means that if you're using an older browser, this probably won't work very well for you), so it's not really too helpful. What I'm really looking for are places where the interface is awkward or difficult to understand. And support for the frame/page groupings that EMBOSS 2.0.0 allows will be coming in the next couple of days, before anyone suggests that.

About the interface itself: the scripts build the input collection pages on the fly, reading relevant information from the ACD files (incidentally, in the process of building this interface I've written an ACD->XML converter, if anyone would find that useful ;) Because of this, it's remarkably robust to changes in the tools themselves. Even the menu is generated dynamically, so only those tools which are available on your system will be listed (for example, if you haven't installed the EMBASSY stuff, it won't show up...)
You can also have the script dump all of the input collection pages and the menus to static HTML files if you're expecting heavy traffic and don't want to waste system resources...

Anyway, that's my story. I strongly urge anyone who thinks a GUI for the EMBOSS tools would be a useful thing to drop by and help me make this one all that it can be. Any questions can be directed to me personally if you don't want to clutter up the list.

Cheers,
Luke McCarthy
Bioinformatics Group, Plant Biotechnology Institute,
National Research Council of Canada
lukem at bioinfo.pbi.nrc.ca

From bauer at genprofile.com Tue Jul 31 04:03:02 2001
From: bauer at genprofile.com (David Bauer)
Date: Tue, 31 Jul 2001 10:03:02 +0200
Subject: showfeat overlaping CDS
Message-ID: <3B666636.EEFD11CF@genprofile.com>

Hi,

I have an EMBL file with 2 CDS entries which stand for alternatively spliced products. I would like to display only one of them at a time with showfeat.
Both have a /gene and a /label qualifier with the gene name (e.g. gene1, gene2).
So what I thought was to use:

-matchtype=cds -matchtag=label -matchvalue=gene1

The matchtype works as I expect, but with any kind of matchtag or matchvalue I'm getting core dumps. So what's wrong with the above example?

Also, if I use -tags with a spliced CDS, the tags are displayed only with the first exon; all other exons are labelled just CDS, so it is not visible which of the remaining exons belongs to which gene.

Thanks,
David.
--
Dr. David Bauer
GenProfile AG, Max-Delbrueck-Center, Erwin-Negelein-Haus
Robert-Roessle-Str. 10, D-13125 Berlin, Germany
bauer at genprofile.com, Tel:49-30-94892165, FAX:49-30-94892151

From peter.rice at uk.lionbioscience.com Tue Jul 31 04:48:14 2001
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 31 Jul 2001 09:48:14 +0100
Subject: showfeat overlaping CDS
References: <3B666636.EEFD11CF@genprofile.com>
Message-ID: <3B6670CE.A7BBDEFA@uk.lionbioscience.com>

David Bauer wrote:
> I have an EMBL file with 2 CDS entries which stand for alternatively
> spliced products. I would like to display only one of them at a time
> with showfeat.
> Both have a /gene and a /label qualifier with the gene name (e.g. gene1, gene2).
> So what I thought was to use:
> -matchtype=cds -matchtag=label -matchvalue=gene1
> The matchtype works as I expect, but with any kind of matchtag or
> matchvalue I'm getting core dumps.

Works for me with 2.0.1, but purify complains horribly; most likely it is the same problem. We will fix it and add these command line options to the new test set.

> Also, if I use -tags with a spliced CDS, the tags are displayed only with
> the first exon; all other exons are labelled just CDS, so it is not visible
> which of the remaining exons belongs to which gene.

Internally the tags are stored with the first exon. They include an implicit group tag that can be displayed with the other exons. Is that what you need?

If you print out a feature table in GFF format (with seqretallfeat), what you see is pretty much what is stored internally. The Sequence and FeatFlags information is part of the feature data, rather than part of the tag-value list, and is used for keeping multiple exons together. For example:

seqretallfeat tembl:hsegl1 gff::hsegl1.gff

We could probably add the Sequence tag to the showfeat output (although it is not part of the EMBL feature table), or we could duplicate all the tags if that's what users would prefer.
A short example from the test data set would be:

showfeat tembl:hsegl1 -tags

--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723

From bauer at genprofile.com Tue Jul 31 06:54:33 2001
From: bauer at genprofile.com (David Bauer)
Date: Tue, 31 Jul 2001 12:54:33 +0200
Subject: showfeat overlaping CDS
References: <3B666636.EEFD11CF@genprofile.com> <3B6670CE.A7BBDEFA@uk.lionbioscience.com>
Message-ID: <3B668E69.C7FD3FF7@genprofile.com>

Peter Rice wrote:
> Internally the tags are stored with the first exon. They include an
> implicit group tag that can be displayed with the other exons. Is that what
> you need?

Yes, this would be nice. It would make clear which exon belongs to which splice variant. The feature display in showseq is similar.

> If you print out a feature table in GFF format (with seqretallfeat), what
> you see is pretty much what is stored internally. The Sequence and
> FeatFlags information is part of the feature data, rather than part of the
> tag-value list, and is used for keeping multiple exons together. For

What I get is FeatFlags "0x100" for the first exon and FeatFlags "0x104" for the subsequent exons. The flags are the same for both CDS, but the Sequence value has a .## suffix which differs between the two CDS. I think it would help if showfeat (and showseq), with the -tags option, could show the complete tags with the first exon and just the Sequence tag with the remaining exons.

> We could probably add the Sequence tag to the showfeat output (although it
> is not part of the EMBL feature table), or we could duplicate all the tags
> if that's what users would prefer.

I think duplicating all the tags is not necessary; the Sequence tag is sufficient.

David.
--
Dr. David Bauer
GenProfile AG, Max-Delbrueck-Center, Erwin-Negelein-Haus
Robert-Roessle-Str. 10, D-13125 Berlin, Germany
bauer at genprofile.com, Tel:49-30-94892165, FAX:49-30-94892151
I presume that PHYLIP 3.6 will be available as an EMBASSY application once there is confidence that there are no serious bugs :-) I hope that helps, Best Wishes, Frank -- Frank Wright Biomathematics and Statistics Scotland, SCRI, DUNDEE DD2 5DA, Scotland frank at bioss.sari.ac.uk From fchetou at pasteur.fr Tue Jul 3 09:38:29 2001 From: fchetou at pasteur.fr (Farid Chetouani) Date: Tue, 3 Jul 2001 11:38:29 +0200 Subject: Protein Clustering tool In-Reply-To: <3B418DDC.F2004E00@bioss.sari.ac.uk>; from frank@bioss.sari.ac.uk on Tue, Jul 03, 2001 at 10:18:20AM +0100 References: <3B41889E.3EDC8D1E@infobiogen.fr> <3B418DDC.F2004E00@bioss.sari.ac.uk> Message-ID: <20010703113829.A38883@pasteur.fr> Hello. Firstly, Frank, thank you for your reply. I am sorry my first email was not precise enough. In fact, I was wondering whether EMBOSS plans to provide a free clustering tool that would take a protein FASTA sequence file and return a list of protein families. For instance, thanks to A. Enright & C. Ouzounis, the GeneRage software is free for academic research (http://www.ebi.ac.uk/research/cgg/services/rage/), but the sources are not yet available. Best regards, and thank you for your help, F PS: please reply to my email, fchetou at infobiogen.fr > > If you wish to construct phylogenetic trees (specifically gene trees) > from protein sequences so as to infer duplication and > paralogous/orthologous relationships, then you can use the PHYLIP > package (available as an EMBASSY application). Genetic distances can be > calculated using EPROTDIST and the distance matrix created can be input > into either EFITCH (slower, more accurate tree) or ENEIGHBOR (faster, > more approximate clustering method, allowing the use of the > Neighbor-Joining algorithm, or the UPGMA algorithm - use the latter only > if you have previously tested that the "molecular clock" assumption is > valid for your dataset).
> > ePROTDIST, eFITCH and eNEIGHBOR come from version 3.5 of the PHYLIP > package (http://evolution.genetics.washington.edu). PHYLIP 3.6 has > recently been released (alpha version). However, PROTDIST 3.6 has > improved distances (copes with among-site rate heterogeneity to give > more accurate genetic distances) and there are also improvements to > NEIGHBOR 3.6 (faster) and to FITCH 3.6. I presume that PHYLIP 3.6 will > be available as an EMBASSY application once it is confident that there > are no serious bugs :-) > > I hope that helps, > Best Wishes, > Frank > -- > Frank Wright > Biomathematics and Statistics Scotland, > SCRI, DUNDEE DD2 5DA, Scotland > frank at bioss.sari.ac.uk From jison at hgmp.mrc.ac.uk Tue Jul 3 09:48:04 2001 From: jison at hgmp.mrc.ac.uk (Dr J.C. Ison) Date: Tue, 03 Jul 2001 10:48:04 +0100 Subject: Protein Clustering tool References: <3B41889E.3EDC8D1E@infobiogen.fr> Message-ID: <3B4194D4.929C7A3D@hgmp.mrc.ac.uk> Software to cluster protein sequences into families on the basis of sequence relatedness is on my list of jobs to do; it will happen within the next 3 months. I personally need something quite simple-minded, so if you have any specific requirements, let me know and I can try to incorporate them into my design. Cheers J. Farid Chetouani wrote: > Hello > > I would like to know, > if there is plan in Emboss to develop > a software to cluster protein into families (of paralogues/orthologues) > according to the sequence similarity > > thank you for your help > > F > > PS: please reply to my email fchetou at infobiogen.fr -- Jon C.
Ison, PhD Bioinformatics Applications Group UK MRC Human Genome Mapping Project Resource Centre Hinxton, Cambridge, CB10 1SB, UK E-mail : jison at hgmp.mrc.ac.uk Tel : 01223 49-4548 HGMP-RC: http://www.hgmp.mrc.ac.uk/ EMBOSS : http://www.hgmp.mrc.ac.uk/Software/EMBOSS/ CCP11 : http://www.hgmp.mrc.ac.uk/CCP11/ From gbottu at ben.vub.ac.be Mon Jul 9 09:35:09 2001 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Mon, 9 Jul 2001 11:35:09 +0200 (MET DST) Subject: No subject Message-ID: <200107090935.LAA09786@bigben.vub.ac.be> Dear friends, I am puzzled by pscan output. I do not see the difference between "Not all elements match but those that do are in order" and "Remaining partial matches", since in both cases there are two matches with the same element. And, in general, how does pscan handle cases where the protein really contains the same motif several times (e.g. proteins with kringles)? Can Alan or someone else answer this? Regards, Guy Bottu -------------- next part -------------- CLASS 1 Fingerprints with all elements in order CLASS 2 All elements match but not all in the correct order Fingerprint HTHREPRESSR Elements 2 Accession number PR00031 Lambda and other repressor helix-turn-helix signature Element 1 Threshold 50% Score 73% Start position 135 Length 10 Element 2 Threshold 32% Score 32% Start position 74 Length 17 CLASS 3 Not all elements match but those that do are in order Fingerprint GEMCOATBR1 Elements 7 Accession number PR00225 Geminivirus BR1 coat protein signature Element 3 Threshold 30% Score 37% Start position 281 Length 15 Element 3 Threshold 30% Score 31% Start position 196 Length 15 CLASS 4 Remaining partial matches Fingerprint GABAARBETA Elements 4 Accession number PR01160 Gamma-aminobutyric-acid A receptor beta subunit signature Element 1 Threshold 33% Score 34% Start position 275 Length 15 Element 1 Threshold 33% Score 33% Start position 187 Length 15 From sgmd at
genetik.fu-berlin.de (Thomas Siegmund) Date: Tue, 10 Jul 2001 10:36:25 +0200 Subject: Announce: X GUI for EMBOSS V0.5 Message-ID: <20010710083627.D881617AD6@mercury.hgmp.mrc.ac.uk> Dear all, a few months ago I announced my plan to build an X Window GUI for EMBOSS based on Kaptain and QT/KDE. Today I'd like to inform you that I have made some progress with it. Version 0.5 of EMBOSS.kaptn is available at http://userpage.fu-berlin.de/~sgmd . ChangeLog: ========== Version 0.5 - Covering 50 EMBOSS applications with (almost) all options - Integrated EMBOSS help system - Use new regular expression features of Kaptain 0.6. This allows fallback to EMBOSS defaults if text input fields for parameters like "-outfile" are empty. - Files can be selected by drag & drop - Addition of embosslauncher, a tool to set the working directory and to run different EMBOSS applications with the same sequence file - Simple install script Version 0.1 - First simple GUIs for 12 EMBOSS applications - First public announcement at emboss at embnet.org Please give it a try and let me know what you think. With best regards Thomas -- Thomas Siegmund Freie Universität Berlin Institut für Genetik Arnimallee 7 14195 Berlin Germany Tel: +49 30 838 54868 Fax: +49 30 838 54395 http://userpage.fu-berlin.de/~sgmd From ableasby at hgmp.mrc.ac.uk Sun Jul 15 12:51:31 2001 From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk) Date: Sun, 15 Jul 2001 13:51:31 +0100 (BST) Subject: Announcing EMBOSS 2.0.0 Message-ID: <200107151251.NAA11553@bromine.hgmp.mrc.ac.uk> EMBOSS 2.0.0 includes: 1.
Feature table reading: EMBL, Swissprot and PIR feature tables are handled by rewritten library routines. Tables can currently be read and written (or interconverted) in native or GFF formats. For the applications programmer, an internal key/value pair structure greatly simplifies use. 2. Report Handling: Stub code to enable application output to be selected in (a range of) standard output report formats has been included. The feature tables above use one of these formats. More report formats will be added during the lifetime of the 2.x.x series. Release 3.0.0 of EMBOSS will mark the completion of this phase. 3. Code purification: All library code and applications are tested for memory handling before release. To our knowledge the code does not leak a single byte in normal use. A "purify" script is provided (mainly for developers). 4. Quality control: test code has been written, supplied and used to test the applications prior to release. This ensures that applications produce the same output (where appropriate) after changes to the library etc. A QA test script is provided. 5. Code modification: almost all the source code has been revamped since the 1.x.x series. All functions, including those in applications, have unique names. This now allows you to navigate the entire source code using SRS. 6. Protein structure code has been added and, although not yet complete, this marks one of many new directions for applications. Not entirely by coincidence the release of 2.0.0, like 1.0.0, has happened on St Swithin's Day (15th July) just prior to the ISMB conference. So, if it works on that day it should work for 40 days thereafter! We look forward to making the same joke again next year (1). Alan On behalf of the development team (apologies if your name has been omitted by accident) who are: HGMP: Alan Bleasby, Tim Carver, Jon Ison, Ranjeeva Ranasinghe, Gary Williams Lion Bioscience: Peter Rice Special thanks to David Martin (University of Dundee) for the administration guide.
To Lisa Mullan (HGMP training courses) for providing feedback and suggestions from course attendees. Thanks to all who have made suggestions, provided bug reports or contributed code. If we've failed to acknowledge you here you should be there in the source code. If not, tell us and we'll fix it! Footnote: 1. St. Swithin's Day if thou dost rain, For forty days it will remain; St. Swithin's Day if thou be fair, For forty days 'twill rain na mair. From ableasby at hgmp.mrc.ac.uk Mon Jul 16 10:55:04 2001 From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk) Date: Mon, 16 Jul 2001 11:55:04 +0100 (BST) Subject: EMBOSS 2.0.0 amendment Message-ID: <200107161055.LAA13298@bromine.hgmp.mrc.ac.uk> There was an omission in the original EMBOSS-2.0.0.tar.gz file which would have resulted in 3 of the protein structure acd files not being copied after a "make install". This has now been corrected and a replacement file put on the server. Alan From dessen at infobiogen.fr Mon Jul 16 12:25:38 2001 From: dessen at infobiogen.fr (Philippe Dessen) Date: Mon, 16 Jul 2001 14:25:38 +0200 Subject: fuzznuc Message-ID: <3B52DC18.3141B98D@infobiogen.fr> Just a question about fuzznuc: is it possible to define a pattern with repetition of a motif (i.e. a motif of n letters with n>1)? That is not mentioned in the documentation. The following pattern (a stop codon in a coding frame) seems to be illegal!
<(NNN)(0,)TGA(NNN)(1,)> $ fuzznuc seqfile Nucleic acid pattern search Search pattern: <(NNN)(0,)TGA(NNN)(1,)> Number of mismatches [0]: Output file [rptufrpx.fuzznuc]: This is a warning: Illegal character [(] EMBOSS An error in fuzznuc.c at line 96: Illegal pattern -------- in GCG syntax you can use (NNN){1,} Regards Philippe Dessen From gwilliam at hgmp.mrc.ac.uk Mon Jul 16 12:33:33 2001 From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522) Date: Mon, 16 Jul 2001 13:33:33 +0100 Subject: fuzznuc References: <3B52DC18.3141B98D@infobiogen.fr> Message-ID: <3B52DF1D.4B3F2FCF@hgmp.mrc.ac.uk> Philippe Dessen wrote: > > Just a question about fuzznuc : > Is it possible to define a pattern with repetition of a motif (as n > letters with n>1) ? > That is not mentionned in documentation . > > The following pattern (a stop codon in a coding frame) seems to be > illegal ! > <(NNN)(0,)TGA(NNN)(1,)> > > $ fuzznuc seqfile > Nucleic acid pattern search > Search pattern: <(NNN)(0,)TGA(NNN)(1,)> > Number of mismatches [0]: > Output file [rptufrpx.fuzznuc]: > This is a warning: Illegal character [(] > > EMBOSS An error in fuzznuc.c at line 96: > Illegal pattern I think this is illegal in fuzznuc's PROSITE-style pattern syntax. You might like to try 'dreg' instead with a regular expression like: ^(...)*TGA(...)+$ Note that these regular expressions are case-sensitive, so put '-supper' on your command line to force the sequence into the required upper case. Gary -- Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512 mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/ Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK From kala at avesthagen.com Mon Jul 23 06:43:47 2001 From: kala at avesthagen.com (Kala) Date: Mon, 23 Jul 2001 12:13:47 +0530 (IST) Subject: help pl. Message-ID: Hi all, Could you please tell me whether I can install EMBOSS on Tru64 UNIX? I'm unable to untar it; it says "not look like a tar archive"...
It would be very useful if I could get a reply soon. Thanks in advance, Kala From bauer at genprofile.com Mon Jul 23 07:06:57 2001 From: bauer at genprofile.com (David Bauer) Date: Mon, 23 Jul 2001 09:06:57 +0200 Subject: help pl. References: Message-ID: <3B5BCD11.C2E54D9B@genprofile.com> Kala wrote: > > Hi all, > Cud u pl.tell me whether i can install Emboss on True64Unix... > I'm unable to untar it...It says "not look like a tar archive"... The download file is a tar file which is compressed with gzip. If you have GNU tar, use 'tar -xvzf '. The z option says it is a compressed tar file. If your system's tar does not know how to handle compressed archives (e.g. the Solaris tar), you must first run 'gunzip '. I hope this helps, Ciao, David. From dmartin at bioinformatics.msiwtb.dundee.ac.uk Mon Jul 23 08:52:14 2001 From: dmartin at bioinformatics.msiwtb.dundee.ac.uk (David Martin) Date: Mon, 23 Jul 2001 09:52:14 +0100 (BST) Subject: help pl. In-Reply-To: Message-ID: On Mon, 23 Jul 2001, Kala wrote: > Hi all, > Cud u pl.tell me whether i can install Emboss on True64Unix... > I'm unable to untar it...It says "not look like a tar archive"... First ensure that it was transferred in binary mode. Secondly, you will need to gunzip the archive before untarring. GNU tar includes gunzip (tar zxf filename), whereas many vendor-supplied versions don't. The following command line may help: zcat filename | tar xf - ..d > > It'll b very useful if i get a reply soon. > > thanx in adv. > kala > > > ---------------------------------- David Martin PhD Bioinformatics Scientific Officer Wellcome Trust Biocentre, Dundee ---------------------------------- From gbottu at ben.vub.ac.be Wed Jul 25 18:31:27 2001 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Wed, 25 Jul 2001 20:31:27 +0200 (MET DST) Subject: tfscan blues Message-ID: <200107251831.UAA17816@bigben.vub.ac.be> from : BEN Dear colleagues, I posted this question before, but nobody replied.
The problem is that the usefulness of the program tfscan is decreasing, since we can no longer get updates of TRANSFAC unless we pay for a licence, and I wonder whether EMBnet nodes have the right to give their users access at all. For info, see http://www.biobase.de/academia.html Does anybody have a comment? Guy Bottu From c.plessy at mangoosta.net Wed Jul 25 21:18:47 2001 From: c.plessy at mangoosta.net (Charles Plessy) Date: Wed, 25 Jul 2001 23:18:47 +0200 Subject: tfscan blues In-Reply-To: <200107251831.UAA17816@bigben.vub.ac.be> References: <200107251831.UAA17816@bigben.vub.ac.be> Message-ID: <01072523184702.02531@moulinette> On Wednesday 25 July 2001 20:31, Guy Bottu wrote: > from : BEN > > Dear colleagues, > > I had already posted this question before, but nobody had replied. The > problem is that the value of the program tfscan is decreasing, since we > cannot get updates of TRANSFAC anymore, unless we pay a licence, and I > wonder whether at all EMBnet Nodes have the right to give access to their > users. > For info, see http://www.biobase.de/academia.html > Anybody a comment ? I have a related question: do you think it would be possible to build fake TRANSFAC databases from a simple file? Currently I'm adding into an array (in the GCG findpattern format) any binding site of interest to me that I find in the literature (with a name and a reference). The goal would be to use existing programs to search within a set of home-selected transcription factors. Charles PLESSY From dmartin at bioinformatics.msiwtb.dundee.ac.uk Thu Jul 26 07:55:54 2001 From: dmartin at bioinformatics.msiwtb.dundee.ac.uk (David Martin) Date: Thu, 26 Jul 2001 08:55:54 +0100 (BST) Subject: tfscan blues In-Reply-To: <01072523184702.02531@moulinette> Message-ID: On Wed, 25 Jul 2001, Charles Plessy wrote: > On Wednesday 25 July 2001 20:31, Guy Bottu wrote: > > from : BEN > > > > Dear colleagues, > > > > I had already posted this question before, but nobody had replied.
The > > problem is that the value of the program tfscan is decreasing, since we > > cannot get updates of TRANSFAC anymore, unless we pay a licence, and I > > wonder whether at all EMBnet Nodes have the right to give access to their > > users. > > For info, see http://www.biobase.de/academia.html > > Anybody a comment ? > > I have a related question : do you think that it would be possible to build > fake transfac databases from a simple file? > Currently I'm adding into an array (in the GCG findpattern format) any > binding site of my interest that i find in the litterature. (with a name and > a reference) > The goal would be to use existing programs to do searches within a set of > home-selected transcription factors. It would be nice to have a public front end for such a database so that submissions could be sent to a curator. Then we can return the information
Then we can return the information > to the public domain (all literature referenced of course so we cannot be > accused of stealing TRANSFAC). Well, i bet that tools allowing public contribution could be a CGI form, or a CVS archive, but setting up those interfaces is far beyond my capacities. Currently i have betveen 40 and 50 entries, very focused on molecular biology of development in early vertebrate embryos. Some are complex and other degenerate. I could not get something interesting of it for the moment, using GCG findpatterns : i get either close to no sites or plenty if i allow mismatches. I can easily imagine a way to store more information in a separate array, and then build a findpattern data file using a perl script. But i'd like to try some programs that give a score to the matches, in order to search for high complexity binding sites more efficiently. Charles From s.roehrig at xantos.de Fri Jul 27 12:38:03 2001 From: s.roehrig at xantos.de (Roehrig, Sascha) Date: Fri, 27 Jul 2001 14:38:03 +0200 Subject: coderet error with embl format file Message-ID: Dear all, I encountered an error while retrieving the feature table from the sample embl database entry: http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[EMBL-ID:'TRBG361' ]. The cds and mRNA were shown correctly. However, the translation was missing the first line of amino acids and ended with double quotes. Has anybody else noticed the same? Best wishes, Sascha -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peter.rice at uk.lionbioscience.com Fri Jul 27 12:50:28 2001 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Fri, 27 Jul 2001 13:50:28 +0100 Subject: coderet error with embl format file References: Message-ID: <3B616394.54F03A92@uk.lionbioscience.com> "Roehrig, Sascha" wrote: > I encountered an error while retrieving the feature table from the >sample embl database entry: >http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[EMBL-ID:'TRBG361' > >The cds and mRNA were shown correctly. However, the translation was >missing the first line of amino acids and ended with double quotes. Works for me in 2.0.0 There were some feature handling code changes in 2.0.0 - perhaps you can simply install the new version. regards, Peter -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From ableasby at hgmp.mrc.ac.uk Sun Jul 29 23:54:37 2001 From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk) Date: Mon, 30 Jul 2001 00:54:37 +0100 (BST) Subject: EMBOSS 2.0.1 (& HMMER) Message-ID: <200107292354.AAA28489@bromine.hgmp.mrc.ac.uk> EMBOSS 2.0.1 fixes an indexing problem with DBIGCG and split entries. It also incorporates handling of the Selex format as used in the HMMER package. HMMER 2.1.1 has been converted for EMBOSS and appears in the download directory (ftp://ftp.uk.embnet.org/pub/EMBOSS/) as the 'embassy' package HMMER-2.1.1.tar.gz Alan From lukem at bioinfo.pbi.nrc.ca Mon Jul 30 04:09:24 2001 From: lukem at bioinfo.pbi.nrc.ca (Luke McCarthy) Date: Sun, 29 Jul 2001 22:09:24 -0600 Subject: EMBOSS GUI Message-ID: <5.1.0.14.0.20010729190020.00a13a10@bioinfo.pbi.nrc.ca> Hi everybody, On and off over the past year or so, I've been developing a GUI for the EMBOSS tools designed to operate over the web. 
It's been listed at the EMBOSS web site for most of that time, but in the last month I've significantly improved it to the point where I think it could be very useful to the entire EMBOSS user community. But I'd like a little help to that end. Before I release this interface out into the wild, I'd like it to be as polished as possible. I don't have a lot of time right now to do extensive testing of any kind ('testing' to this date has only involved one trial run of each application), plus I don't actually use many of these tools in practice, so I don't know if they're actually useful the way they're currently presented. So I'd like to solicit your assistance as EMBOSS users: If you could find the time to drop by http://bioinfo.pbi.nrc.ca:8090/EMBOSS/index.html , try out your favourite EMBOSS tool, offer any suggestion or criticism that comes to mind, and definitely let me know if something doesn't work, I would be very grateful. Criticism of the look and feel (colours, font size, general appearance) is appreciated, but all of that is eminently configurable through the use of style sheets (which means if you're using an older browser, this probably won't work very well for you), so it's not really too helpful. What I'm really looking for are places where the interface is awkward or difficult to understand. And support for the frame/page groupings that EMBOSS 2.0.0 allows will be coming in the next couple of days, before anyone suggests that. About the interface itself: the scripts build the input collection pages on the fly, reading relevant information from the ACD files (incidentally, in the process of building this interface I've written an ACD->XML converter if anyone would find that useful ;) Because of this, it's remarkably robust to changes in the tools themselves. Even the menu is generated dynamically, so only those tools which are available on your system will be listed (for example, if you haven't installed the EMBASSY stuff it won't show up...) 
You can also have the script dump all of the input collection pages and the menus to static HTML files if you're expecting heavy traffic and don't want to waste system resources... Anyway, that's my story. I strongly urge anyone who thinks a GUI for the EMBOSS tools would be a useful thing to drop by and help me make this one all that it can be. Any questions can be directed to me personally if you don't want to clutter up the list. Cheers, Luke McCarthy Bioinformatics Group, Plant Biotechnology Institute, National Research Council of Canada lukem at bioinfo.pbi.nrc.ca From bauer at genprofile.com Tue Jul 31 08:03:02 2001 From: bauer at genprofile.com (David Bauer) Date: Tue, 31 Jul 2001 10:03:02 +0200 Subject: showfeat overlaping CDS Message-ID: <3B666636.EEFD11CF@genprofile.com> Hi, I have an EMBL file with 2 CDS entries which represent alternatively spliced products. I would like to display only one of them at a time with showfeat. Both have a /gene and a /label with the gene name (e.g. gene1, gene2). So what I thought was to use: -matchtype=cds -matchtag=label -matchvalue=gene1 The matchtype option works as I expect, but with any kind of matchtag or matchvalue I'm getting core dumps. So what's wrong with the above example? Also, if I use -tags with a spliced CDS, the tags are displayed only with the first exon; all other exons are labelled just CDS, so it is not visible which of the remaining exons belongs to which of the genes. Thanks, David. -- Dr. David Bauer GenProfile AG, Max-Delbrueck-Center, Erwin-Negelein-Haus Robert-Roessle-Str.
10, D-13125 Berlin, Germany bauer at genprofile.com, Tel:49-30-94892165, FAX:49-30-94892151 From peter.rice at uk.lionbioscience.com Tue Jul 31 08:48:14 2001 From: peter.rice at uk.lionbioscience.com (Peter Rice) Date: Tue, 31 Jul 2001 09:48:14 +0100 Subject: showfeat overlaping CDS References: <3B666636.EEFD11CF@genprofile.com> Message-ID: <3B6670CE.A7BBDEFA@uk.lionbioscience.com> David Bauer wrote: > I have a EMBL file with 2 CDS entries which stand for alternatively > spliced products. I would like to display only one of them at a time > with showfeat. > Both have a /gene and /label with the gene name (e.g. gene1 gene2). > So what I thought was to use: > -matchtype=cds -matchtag=label -matchvalue=gene1 > The matchtype works as I expect but with any kind of matchtag or > matchvalue I'm getting core dumps. Works for me with 2.0.1, but purify complains horribly - most likely the same problem. We will fix it and add these command line options to the new test set. > Also if I use -tags with a spliced CDS the tags are displayed only with > the first exon, all other exons get just a CDS so it is not visible > which of the remaining exons belongs to which of the genes. Internally the tags are stored with the first exon. They include an implicit group tag that can be displayed with the other exons. Is that what you need? If you print out a feature table in GFF format (with seqretallfeat) what you see is pretty much what is stored internally. The Sequence and FeatFlags information is part of the feature data, rather than part of the tag-value list, and is used for keeping multiple exons together. For example: seqretallfeat tembl:hsegl1 gff::hsegl1.gff We could probably add the Sequence tag to the showfeat output (although it is not part of the EMBL feature table) or we could duplicate all the tags if that's what users would prefer. 
A short example from the test data set would be: showfeat tembl:hsegl1 -tags -- ------------------------------------------------ Peter Rice, LION Bioscience Ltd, Cambridge, UK peter.rice at uk.lionbioscience.com +44 1223 224723 From bauer at genprofile.com Tue Jul 31 10:54:33 2001 From: bauer at genprofile.com (David Bauer) Date: Tue, 31 Jul 2001 12:54:33 +0200 Subject: showfeat overlaping CDS References: <3B666636.EEFD11CF@genprofile.com> <3B6670CE.A7BBDEFA@uk.lionbioscience.com> Message-ID: <3B668E69.C7FD3FF7@genprofile.com> Peter Rice wrote: > Internally the tags are stored with the first exon. They include an > implicit group tag that can be displayed with the other exons. Is that what > you need? Yes, this would be nice. It would make clear which exon belongs to which splice variant. The feature display in showseq is similar. > If you print out a feature table in GFF format (with seqretallfeat) what > you see is pretty much what is stored internally. The Sequence and > FeatFlags information is part of the feature data, rather than part of the > tag-value list, and is used for keeping multiple exons together. For What I get is FeatFlags "0x100" for the first exon and FeatFlags "0x104" for the subsequent exons. The flags are the same for both CDS, but the Sequence value has a .## suffix which differs between the two CDS. I think it would help if showfeat (and showseq) showed the complete tags with the first exon, and just the Sequence tag with the remaining exons, when the -tags option is used. > We could probably add the Sequence tag to the showfeat output (although it > is not part of the EMBL feature table) or we could duplicate all the tags > if that's what users would prefer. I think duplicating all the tags is not necessary; the Sequence tag is sufficient. David. -- Dr. David Bauer GenProfile AG, Max-Delbrueck-Center, Erwin-Negelein-Haus Robert-Roessle-Str. 10, D-13125 Berlin, Germany bauer at genprofile.com, Tel:49-30-94892165, FAX:49-30-94892151
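Until showfeat can label the remaining exons, the grouping David describes can be recovered from the GFF file that Peter suggests writing (for example with seqretallfeat tembl:hsegl1 gff::hsegl1.gff). Below is a minimal Python sketch that collects the start and end of every CDS line under its Sequence value, so that exons sharing a value belong to the same splice variant. It assumes tab-separated GFF lines whose ninth column contains the tag as Sequence "NAME.N". That attribute layout is an assumption, not something confirmed in the thread, and the function name is hypothetical:

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed column-9 layout, e.g.: Sequence "HSEGL1.1" ; FeatFlags "0x104"
SEQUENCE_TAG = re.compile(r'Sequence\s+"([^"]+)"')


def exons_by_sequence(gff_text: str, feature: str = "CDS") -> Dict[str, List[Tuple[int, int]]]:
    """Group the (start, end) ranges of one feature type by their Sequence value."""
    groups: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in gff_text.splitlines():
        # Skip blank lines and GFF header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # GFF has 9 tab-separated columns; column 3 is the feature type.
        if len(cols) < 9 or cols[2] != feature:
            continue
        match = SEQUENCE_TAG.search(cols[8])
        if match:
            # Columns 4 and 5 hold the 1-based start and end positions.
            groups[match.group(1)].append((int(cols[3]), int(cols[4])))
    return dict(groups)
```

Consider two made-up CDS entries: one tagged HSEGL1.1 with exons 100..200 and 300..400, and one tagged HSEGL1.2 with a single exon 100..250. For input containing these, the function returns {"HSEGL1.1": [(100, 200), (300, 400)], "HSEGL1.2": [(100, 250)]}. If the real seqretallfeat output spells the attribute differently, only the SEQUENCE_TAG pattern needs to change.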