From gbottu at ben.vub.ac.be Mon Oct 2 03:58:24 2006
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 2 Oct 2006 09:58:24 +0200
Subject: [EMBOSS] case sensitive identifiers
In-Reply-To: <2023.86.132.219.183.1159518502.squirrel@webmail.ebi.ac.uk>
References: <20060928135740.GA14320@bigben.ulb.ac.be> <451BDD04.9040806@ebi.ac.uk> <20060929081508.GA25906@bigben.ulb.ac.be> <2023.86.132.219.183.1159518502.squirrel@webmail.ebi.ac.uk>
Message-ID: <20061002075824.GA5571@bigben.ulb.ac.be>

On Fri, Sep 29, 2006 at 09:28:22AM +0100, pmr at ebi.ac.uk wrote:
> For the PDB case, really only the end of the ID is case-sensitive. Do you
> think the database should be case-sensitive for the whole ID, or does it
> make sense to check for a pattern as the case-sensitive part?

I think that trying to define which part of the ID is case-sensitive is
making it just too complicated. Let's have it completely case-sensitive
or not at all.

> EMBOSS will initially read only one sequence for a seqall ... it does not
> read in all the sequences and look for duplicates so we have to decide in
> the emboss.defaults DB definition how to check a single ID (no way to read
> them all and check for duplicates).

Trying to check for duplicates is again too complicated. I understand
that if a databank or a multiple sequence file has duplicates, a
"sequence" will retrieve the first and a "seqset" or "seqall" will
retrieve them all. Well, let it be that way. It is the responsibility of
the database manager/user to make sure there are no duplicates.
Guy

From gbottu at ben.vub.ac.be Mon Oct 2 04:11:46 2006
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 2 Oct 2006 10:11:46 +0200
Subject: [EMBOSS] case sensitive identifiers
In-Reply-To: <451CF527.8040506@ebi.ac.uk>
References: <20060928135740.GA14320@bigben.ulb.ac.be> <451BDD04.9040806@ebi.ac.uk> <20060929081508.GA25906@bigben.ulb.ac.be> <451CF527.8040506@ebi.ac.uk>
Message-ID: <20061002081146.GB5571@bigben.ulb.ac.be>

On Fri, Sep 29, 2006 at 11:27:51AM +0100, Peter Rice wrote:
> So, there will be 2 new (and for the first time boolean) attributes for
> databases. To use them, you will need:
>
> caseidmatch: "Y"
> hasaccession: "N"

The "hasaccession" attribute is certainly useful for search methods like
SRS and MRS, which have the notion of searching in separate indexes. By
default, searching both "id" and "ac" is the thing to do, but there are
databanks where no "ac" is indexed, and there are databanks, like EMBL or
IMGTHLA, where the "id" and the "ac" are always identical, so that
searching only the "id" gains time without losing functionality.

As for the case problem, I think we agree that the best is to always hand
the sequence name as such (case as typed by the user) to the search
method and, in case the search method itself is not case-sensitive but
the databank is, let EMBOSS if 'hasaccession: "Y"' parse the retrieved
sequences and accept only those that match. This will work fine for SRS
(and of course for the method "direct", where EMBOSS does all the work),
but it will not work for MRS, since the current version of MRS does not
allow case-different index words.

Guy

From jbreu at mpipsykl.mpg.de Fri Oct 6 12:53:00 2006
From: jbreu at mpipsykl.mpg.de (Johannes Breu)
Date: Fri, 06 Oct 2006 18:53:00 +0200
Subject: [EMBOSS] question
Message-ID: <3.0.6.32.20061006185300.00ab68e0@komserv.mpipsykl.mpg.de>

To whom it may concern.

I tried to install emboss on MS Windows 2000 in a cygwin environment.
I typed ./configure (following INSTALL). It took a long time but there
was no error message. After typing make I got the message
bash:command:not found.
Does anybody have any idea to solve this problem. Thanks.

From shaun at ebi.ac.uk Fri Oct 6 14:30:56 2006
From: shaun at ebi.ac.uk (shaun at ebi.ac.uk)
Date: Fri, 6 Oct 2006 19:30:56 +0100 (BST)
Subject: [EMBOSS] question
In-Reply-To: <3.0.6.32.20061006185300.00ab68e0@komserv.mpipsykl.mpg.de>
References: <3.0.6.32.20061006185300.00ab68e0@komserv.mpipsykl.mpg.de>
Message-ID: <50608.82.21.106.225.1160159456.squirrel@webmail.ebi.ac.uk>

> To whom it may concern.
>
> I tried to install emboss on MS Windows 2000 in a cygwin environment. I
> typed ./configure (following INSTALL). It took a long time but there was
> no error message. After typing make I got the message bash:command:not
> found.
> Does anybody have any idea to solve this problem. Thanks.
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>

Hi Johannes,

I believe the installation under cygwin requires a couple of additional
switches:

./configure --without-x CFLAGS=-s

See the following URL for a short guide to installing EMBOSS under a
cygwin environment (it actually describes the scenario that you are
encountering):

http://emboss.sourceforge.net/download/cygwin.html

HTH

Shaun

From mukherje at nsm.umass.edu Fri Oct 13 15:34:11 2006
From: mukherje at nsm.umass.edu (mukherje at nsm.umass.edu)
Date: Fri, 13 Oct 2006 15:34:11 -0400
Subject: [EMBOSS] (no subject)
Message-ID: <1160768051.452fea3357af3@mail-www.oit.umass.edu>

Hi,

I tried to build the EMBOSS package in a cygwin environment but had the
following error after running "make".
Creating library file: .libs/libajax.dll.a
collect2: ld returned 1 exit status
make[1]: *** [libajax.la] Error 1
make[1]: Leaving directory '/home/supratim/Emboss-4.0.0/ajax'
make: *** [install-recursive] Error 1

I tried to run the application both before & after applying the fix as
mentioned, and also tried with the regular configure option as well as

./configure --without-x CFLAGS=-s
make
make install

I also tried with the older version (EMBOSS-3.0.0) but did not have much
luck. It works fine on a Mac but I would like to get it to work on a
Windows platform with cygwin.

Thank you for your assistance in advance

Supratim Mukherjee
Graduate student
Department of Microbiology
UMass, Amherst

From kertib at linuxlap.hu Mon Oct 16 04:50:03 2006
From: kertib at linuxlap.hu (Kerti Balázs Gábor)
Date: Mon, 16 Oct 2006 10:50:03 +0200
Subject: [EMBOSS] make error
Message-ID: <1160988603.3981.12.camel@balazska.site>

Hi,

The ./configure run went well, but make gave the error below:

config.status: creating jemboss/resources/Makefile
config.status: creating jemboss/utils/Makefile
config.status: creating Makefile
config.status: executing depfiles commands
server:/usr/src/EMBOSS-4.0.0 # make
Making all in plplot
make[1]: Entering directory `/usr/src/EMBOSS-4.0.0/plplot'
Making all in lib
make[2]: Entering directory `/usr/src/EMBOSS-4.0.0/plplot/lib'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/usr/src/EMBOSS-4.0.0/plplot/lib'
make[2]: Entering directory `/usr/src/EMBOSS-4.0.0/plplot'
make[2]: Nothing to be done for `all-am'.
make[2]: Leaving directory `/usr/src/EMBOSS-4.0.0/plplot'
make[1]: Leaving directory `/usr/src/EMBOSS-4.0.0/plplot'
Making all in ajax
make[1]: Entering directory `/usr/src/EMBOSS-4.0.0/ajax'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/usr/src/EMBOSS-4.0.0/ajax'
Making all in nucleus
make[1]: Entering directory `/usr/src/EMBOSS-4.0.0/nucleus'
/bin/sh ../libtool --tag=CC --mode=link gcc -O2 -o libnucleus.la -rpath /usr/local/lib -version-info 4:0:0 embaln.lo embcom.lo embcons.lo embdata.lo embdbi.lo embdmx.lo embdomain.lo embest.lo embexit.lo embgroup.lo embiep.lo embindex.lo embinit.lo embmat.lo embmisc.lo embmol.lo embnmer.lo embpat.lo embpatlist.lo embprop.lo embpdb.lo embread.lo embsig.lo embshow.lo embword.lo
libtool: link: `embmat.lo' is not a valid libtool object
make[1]: *** [libnucleus.la] Error 1
make[1]: Leaving directory `/usr/src/EMBOSS-4.0.0/nucleus'
make: *** [all-recursive] Error 1
server:/usr/src/EMBOSS-4.0.0 #

The OS: SuSE OpenEnterprise Server 10.0 x86
Kernel: server: Linux server 2.6.16.21-0.25-default #1 Tue Sep 19 07:26:15 UTC 2006 i686 i686 i386 GNU/Linux

What is the solution?

Thank you for your assistance in advance

Kerti Balazs Gabor
Genetics and Plant Breeding, Szent Istvan University,
Pater K. U. 1., Godollo 2103, Hungary

From maoj at helix.nih.gov Tue Oct 17 11:36:22 2006
From: maoj at helix.nih.gov (Jean Mao)
Date: Tue, 17 Oct 2006 11:36:22 -0400
Subject: [EMBOSS] Question regarding dbxflat
Message-ID: <000001c6f202$000323e0$be4de780@CIT.NIH.GOV>

Hello,

Could someone help me determine which fields I need to include while
running dbxflat?

I am going to index the genbank and est gb*.seq files and gbest*.seq
files from ftp://ftp.ncbi.nih.gov/genbank/. These files have sequence
entries composed of: Locus, Definition, Accession, Version, Keywords,
Source, Organism??

If I specify 'acc, id' while indexing, will the 'Definition' line be
indexed or not? What about 'acc, id, des'? In other words, I would like
to know which programs in EMBOSS will not work if I don't specify 'des'
while indexing.

Some programs in EMBOSS such as 'coderet' require a feature table. If I
only index 'acc, id', will coderet work when a user specifies
'genbank:xxxxx'?
I guess all I am trying to ask is what programs will stop working if I
only accept default 'acc, id' fields.

Thank you in advance.

From pmr at ebi.ac.uk Tue Oct 17 12:23:03 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Tue, 17 Oct 2006 17:23:03 +0100 (BST)
Subject: [EMBOSS] Question regarding dbxflat
In-Reply-To: <000001c6f202$000323e0$be4de780@CIT.NIH.GOV>
References: <000001c6f202$000323e0$be4de780@CIT.NIH.GOV>
Message-ID: <2320.210.150.186.27.1161102183.squirrel@webmail.ebi.ac.uk>

Hi Jean,

> If I specify 'acc, id' while indexing, will the 'Definition' line be
> indexed or not? What about 'acc, id, des' ? In other words, I would like
> to know which programs in EMBOSS will not work if I don't specify 'des'
> while indexing.
> Some programs in EMBOSS such as 'coderet' require feature table. If I only
> index 'acc, id', will coderet work when user specify 'genbank:xxxxx' ?
>
> I guess all I am trying to ask is what programs will stop working if I
> only accept default 'acc, id' fields.

The dbxflat fields only affect queries (do you want to search by
dbname-des: or dbname-gi: when you look for sequences).

Retrieval is the same once an entry has been found - you can return all
text for entret, features for coderet, and so on as usual.

By default only the id and acc lines will be indexed. We found a problem
with one database that had no accessions (pdb as a fasta file indexed
with dbxfasta), so the next release will have an option to turn off
accession searches in the database definition and we may add an option
to skip accession indexing.

regards,

Peter Rice

From smiddha at indiana.edu Thu Oct 19 10:59:57 2006
From: smiddha at indiana.edu (Sumit Middha)
Date: Thu, 19 Oct 2006 10:59:57 -0400
Subject: [EMBOSS] distmat Uncorrected distance > 100
Message-ID: <453792ED.6040601@indiana.edu>

Hi,

I tried using distmat from emboss on some alignments and am getting
scores in excess of 100 (using all default options). I am not sure how
scores can exceed 100.
D = uncorrected distance = p-distance = 1-S
where
S = m/(npos + gaps*gap_penalty)

So D is like a percentage and equals number of substitutions per 100
bases or amino acids.

Please correct me or point me to an explanation which will help clarify
my doubt.

Thanks,
Sumit

From aengus.stewart at cancer.org.uk Mon Oct 23 14:00:11 2006
From: aengus.stewart at cancer.org.uk (Aengus Stewart)
Date: Mon, 23 Oct 2006 19:00:11 +0100
Subject: [EMBOSS] Fuzznuc ignoring start and end
Message-ID: <453D032B.7010104@cancer.org.uk>

Hi folks,

fuzznuc and also fuzzpro are ignoring the start and end params I am
giving it.

fuzznuc -pattern rccatgg -sbegin1 75834 -send1 96013 -sequence ac087388.fasta

Cheers
Aengus

########################################
# Program: fuzznuc
# Rundate: Mon Oct 23 2006 18:54:54
# Commandline: fuzznuc
#    -pattern rccatgg
#    -sbegin 75834
#    -send 96013
#    -sequence ac087388.fasta
# Report_format: seqtable
# Report_file: ac087388.fuzznuc
########################################

#=======================================
#
# Sequence: AC087388     from: 75834   to: 96013
# HitCount: 15
#
# Pattern_name Mismatch Pattern
# pattern1     0        rccatgg
#
# Complement: No
#
#=======================================

  Start     End Pattern_name Mismatch Sequence
  38702   38708 pattern1            . gccatgg
  43834   43840 pattern1            . accatgg
  47457   47463 pattern1            . gccatgg
  48659   48665 pattern1            . gccatgg
  56718   56724 pattern1            . accatgg
  61200   61206 pattern1            . accatgg
  62151   62157 pattern1            . accatgg
  68706   68712 pattern1            . accatgg
  78513   78519 pattern1            . gccatgg
  79973   79979 pattern1            . gccatgg
  86415   86421 pattern1            . accatgg
  97451   97457 pattern1            . accatgg
 102803  102809 pattern1            . gccatgg
 113924  113930 pattern1            . gccatgg
 115436  115442 pattern1            . gccatgg

#---------------------------------------
#---------------------------------------

#---------------------------------------
# Total_sequences: 1
# Total_hitcount: 15
#---------------------------------------

--
-----------------------------------------------------------------------
Aengus Stewart
Group Leader Bioinformatics and BioStatistics
Tel: +44 (0)20 7269 3679
Cancer Research UK,
Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK
-----------------------------------------------------------------------

From mrln at o2.pl Mon Oct 23 17:49:07 2006
From: mrln at o2.pl (Marlena Roszczyk)
Date: Mon, 23 Oct 2006 23:49:07 +0200
Subject: [EMBOSS] 30 entries only
Message-ID: <1161640147.4367.37.camel@localhost.localdomain>

Does anybody know how to solve this problem:

I use Emboss via srswww method and everything seems to work fine until I
ask seqret or infoseq (or any other application that searches the
database) for many sequences (for example typing: "seqret
database-des:kinase"). The output consists only of 30 entries even
though the same query on srs.ebi.ac.uk results in a 6-digit number of
entries. What shall I do to get all the entries I want? Is it a problem
with Emboss or rather srs policy of sending data?
Just in case you would like to see my emboss.default:

DB zuniprot [
    methodquery: srswww
    format: swiss
    type: P
    fields: "id acc sv des key org"
    dbalias: uniprot
    url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
    comment: "uniprot/swiss via srswww"
]

DB zswiss [
    methodquery: srswww
    format: swiss
    type: P
    fields: "id acc sv des key org"
    dbalias: swissprot
    url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
    comment: "swissprot via srswww"
]

DB zembl [
    type: N
    methodquery: srswww
    format: embl
    fields: "id acc key sv des org"
    dbalias: embl
    url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
    comment: "embl via srswww"
]

Thanks in advance,
Marlena Roszczyk

From David.Bauer at SCHERING.DE Tue Oct 24 02:48:45 2006
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Tue, 24 Oct 2006 08:48:45 +0200
Subject: [EMBOSS] Antwort: 30 entries only
In-Reply-To: <1161640147.4367.37.camel@localhost.localdomain>
Message-ID:

Hi Marlena,

SRS has a default limit of 30 entries/page. So it seems that you are
getting only the first page of results from the server.

If you want to run queries with such results, it may be a good idea to
download the uniprot flat file from the ebi ftp server, index it with
the EMBOSS dbxflat and then run the queries locally.

But if this is not an option due to limited resources, I guess Peter
will have an idea how to get the other result pages out of SRS with the
srswww method. ;-)

Cheers,
David.

emboss-bounces at lists.open-bio.org wrote on 23/10/2006 23:49:07:

> Does anybody know how to solve this problem:
>
> I use Emboss via srswww method and everything seems to work fine until I
> ask seqret or infoseq (or any other application that searches the
> database) for many sequences (for example typing: "seqret
> database-des:kinase"). The output consists only of 30 entries even
> though the same query on srs.ebi.ac.uk results in a 6-digit number of
> entries. What shall I do to get all the entries I want? Is it a problem
> with Emboss or rather srs policy of sending data?
>
>
> Just in case you would like to see my emboss.default:
>
> DB zuniprot [
>     methodquery: srswww
>     format: swiss
>     type: P
>     fields: "id acc sv des key org"
>     dbalias: uniprot
>     url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
>     comment: "uniprot/swiss via srswww"
> ]
>
> DB zswiss [
>     methodquery: srswww
>     format: swiss
>     type: P
>     fields: "id acc sv des key org"
>     dbalias: swissprot
>     url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
>     comment: "swissprot via srswww"
> ]
>
> DB zembl [
>     type: N
>     methodquery: srswww
>     format: embl
>     fields: "id acc key sv des org"
>     dbalias: embl
>     url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
>     comment: "embl via srswww"
> ]
>
>
> Thanks in advance,
> Marlena Roszczyk

From rls at ebi.ac.uk Tue Oct 24 03:59:46 2006
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Tue, 24 Oct 2006 08:59:46 +0100
Subject: [EMBOSS] 30 entries only
In-Reply-To: <1161640147.4367.37.camel@localhost.localdomain>
References: <1161640147.4367.37.camel@localhost.localdomain>
Message-ID: <453DC7F2.6030908@ebi.ac.uk>

Hi,

I suspect this is related to the default view used in SRS. It is
returning the first page of results that contains 30 sequences (the
default). To overcome this problem, the call needs to have the following
parameters:

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniprot-des:kinase]+-vn+1+-lv+100

-vn is the view to use (1=names, 2=complete entries)
-lv is the number of entries to be used in one go

It is important to realize that downloading a lot of entries, although
possible, takes a while and results in high loads for the servers.
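
As a minimal sketch of how the call described above could be put
together, here is a small Python helper. The function name
build_wgetz_url is hypothetical (it is not part of EMBOSS or SRS); only
the base URL, the [uniprot-des:kinase] query form and the -vn / -lv
switches are taken from this thread.

```python
# Base URL as quoted in the thread.
BASE_URL = "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"


def build_wgetz_url(databank, field, term, view=1, entries=100,
                    base_url=BASE_URL):
    """Assemble a wgetz query URL (hypothetical helper, illustration only).

    view    -> value for -vn (1 = names, 2 = complete entries)
    entries -> value for -lv (number of entries returned in one go)
    """
    query = "[{0}-{1}:{2}]".format(databank, field, term)
    switches = ["-vn", str(view), "-lv", str(entries)]
    # wgetz arguments are joined with '+' in the examples from the thread.
    return base_url + "?" + "+".join([query] + switches)


if __name__ == "__main__":
    print(build_wgetz_url("uniprot", "des", "kinase"))
```

With the defaults this reproduces the URL shown above; raising entries
asks for more results per call, at the cost of more load on the server.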

The way I download a large set of entries is by generating a list (using
-vn 1) and then using the list with seqret in the following way:

% seqret @listname -out mykinases

and making sure this time that -vn 2 is used to retrieve complete
entries. This requires the addition of other EMBOSS database definitions
(one for lists and another for complete entry retrieval).

Hope this helps,

R:)

Marlena Roszczyk wrote:
> Does anybody know how to solve this problem:
>
> I use Emboss via srswww method and everything seems to work fine until I
> ask seqret or infoseq (or any other application that searches the
> database) for many sequences (for example typing: "seqret
> database-des:kinase"). The output consists only of 30 entries even
> though the same query on srs.ebi.ac.uk results in a 6-digit number of
> entries. What shall I do to get all the entries I want? Is it a problem
> with Emboss or rather srs policy of sending data?
>
>
> Just in case you would like to see my emboss.default:
>
> DB zuniprot [
>     methodquery: srswww
>     format: swiss
>     type: P
>     fields: "id acc sv des key org"
>     dbalias: uniprot
>     url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
>     comment: "uniprot/swiss via srswww"
> ]
>
> DB zswiss [
>     methodquery: srswww
>     format: swiss
>     type: P
>     fields: "id acc sv des key org"
>     dbalias: swissprot
>     url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
>     comment: "swissprot via srswww"
> ]
>
> DB zembl [
>     type: N
>     methodquery: srswww
>     format: embl
>     fields: "id acc key sv des org"
>     dbalias: embl
>     url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
>     comment: "embl via srswww"
> ]
>
>
> Thanks in advance,
> Marlena Roszczyk

From pmr at ebi.ac.uk Tue Oct 24 04:55:17 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Tue, 24 Oct 2006 09:55:17 +0100 (BST)
Subject: [EMBOSS] 30 entries only
In-Reply-To: <453DC7F2.6030908@ebi.ac.uk>
References: <1161640147.4367.37.camel@localhost.localdomain> <453DC7F2.6030908@ebi.ac.uk>
Message-ID: <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>

Rodrigo Lopez writes:

> I suspect this is related to the default view used in SRS. It is
> returning the first page of results that contains 30 sequences (the
> default). To overcome this problem, the call needs to have the following
> parameters:
>
> http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniprot-des:kinase]+-vn+1+-lv+100

But this is using the EMBOSS "srswww" access method, which uses

+-e+-ascii

That should return complete entries for all as ascii text.

Perhaps something has changed on the EBI's SRS server because this now
only gives me 30 entries.

+-lv+100 does give 100 entries ... but it will take some reworking of
the code to loop through entries that way. Hmmmm.....

regards,

Peter

From rls at ebi.ac.uk Tue Oct 24 05:01:50 2006
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Tue, 24 Oct 2006 10:01:50 +0100
Subject: [EMBOSS] 30 entries only
In-Reply-To: <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
References: <1161640147.4367.37.camel@localhost.localdomain> <453DC7F2.6030908@ebi.ac.uk> <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
Message-ID: <453DD67E.8000600@ebi.ac.uk>

I'll have to wait for our SRS admin to come back to find out if a change
has in fact taken place or not. hmm....

R:/

pmr at ebi.ac.uk wrote:
> Rodrigo Lopez writes:
>
>> I suspect this is related to the default view used in SRS. It is
>> returning the first page of results that contains 30 sequences (the
>> default). To overcome this problem, the call needs to have the following
>> parameters:
>>
>> http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniprot-des:kinase]+-vn+1+-lv+100
>
> But this is using the EMBOSS "srswww" access method, which uses
>
> +-e+-ascii
>
> That should return complete entries for all as ascii text.
>
> Perhaps something has changed on the EBI's SRS server because this now
> only gives me 30 entries.
>
> +-lv+100 does give 100 entries ... but it will take some reworking of the
> code to loop through entries that way. Hmmmm.....
>
>
> regards,
>
> Peter

From maoj at helix.nih.gov Tue Oct 24 14:15:17 2006
From: maoj at helix.nih.gov (Jean Mao)
Date: Tue, 24 Oct 2006 14:15:17 -0400
Subject: [EMBOSS] How to include Prosite and Rebase and Print into 'showdb' program
Message-ID: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>

Hi, for EMBOSS 4.0.0, is there a way to show both prosite and rebase
databases when I type 'showdb' at the prompt? I asked the same question
back in 2003. I was hoping the answer will be different this time :-)

Thanks.

Jean

From sovani at rohan.sdsu.edu Tue Oct 24 14:09:07 2006
From: sovani at rohan.sdsu.edu (Sujata Sovani)
Date: Tue, 24 Oct 2006 11:09:07 -0700 (PDT)
Subject: [EMBOSS] Antigenic - input file format?
Message-ID: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>

Hi,

I want to use the package called 'Antigenic' in EMBOSS.
I am not quite clear about the input file format to be used.

How can I input a fasta file to the program? - is it possible to use a
text file that has the amino acid sequence in a fasta format? In which
folder should the file be?

Please let me know.
Thank you.

Regards,
Sujata

From km at mrna.tn.nic.in Tue Oct 24 17:28:57 2006
From: km at mrna.tn.nic.in (km)
Date: Wed, 25 Oct 2006 02:58:57 +0530
Subject: [EMBOSS] Antigenic - input file format?
In-Reply-To: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
References: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
Message-ID: <20061024212857.GA31781@mrna.tn.nic.in>

Hi,

> I want to use the package called 'Antigenic' in EMBOSS.
> I am not quite clear about the input file format to be used.

Please consult the EMBOSS documentation on the system by typing
$tfm antigenic

> How can I input a fasta file to the program?
first check:
tfm antigenic
then, assuming that your set of sequence(s) is in a text file
(myseqs.fa) in fasta format, run:
$antigenic -sequence myseqs.fa

> is it possible to use a text file that has the amino acid sequence in a fasta format?

yes

> In which folder should the file be?

The simplest solution is to keep the sequence file in the current
folder.

regards,
KM

From golharam at umdnj.edu Wed Oct 25 00:28:30 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 25 Oct 2006 00:28:30 -0400
Subject: [EMBOSS] How to include Prosite and Rebase and Print into 'showdb'program
In-Reply-To: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
Message-ID: <002f01c6f7ee$07005fe0$2f01a8c0@GOLHARMOBILE1>

Have you gotten an answer to this yet?

> -----Original Message-----
> From: emboss-bounces at lists.open-bio.org
> [mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Jean Mao
> Sent: Tuesday, October 24, 2006 2:15 PM
> To: emboss at emboss.open-bio.org
> Subject: [EMBOSS] How to include Prosite and Rebase and Print
> into 'showdb'program
>
>
> Hi, for EMBOSS 4.0.0, is there a way to show both prosite and
> rebase databases when I type 'showdb' at the prompt? I asked
> the same question back in 2003. I was hoping the answer will
> be different this time :-)
>
> Thanks.
>
> Jean

From pmr at ebi.ac.uk Wed Oct 25 03:34:19 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 25 Oct 2006 08:34:19 +0100 (BST)
Subject: [EMBOSS] How to include Prosite and Rebase and Print into 'showdb' program
In-Reply-To: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
References: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
Message-ID: <1836.217.44.133.216.1161761659.squirrel@webmail.ebi.ac.uk>

Hi Jean,

> Hi, for EMBOSS 4.0.0, is there a way to show both prosite and rebase
> databases when I type 'showdb' at the prompt? I asked the same question
> back in 2003. I was hoping the answer will be different this time :-)

Well .... EMBOSS 4.0.0 does have extended showdb output so now we can
add this.

The main issue is that there is currently nothing in EMBOSS that uses
the definition, but we would like to add a report of the database
release to the output of programs that use them.

The definitions would be expected to go in RESOURCE definitions in the
emboss.default file but we could perhaps put something in the output of
the *extract programs.

I will take another look.

regards,

Peter

From pmr at ebi.ac.uk Wed Oct 25 03:37:56 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 25 Oct 2006 08:37:56 +0100 (BST)
Subject: [EMBOSS] Antigenic - input file format?
In-Reply-To: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
References: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
Message-ID: <1840.217.44.133.216.1161761876.squirrel@webmail.ebi.ac.uk>

Hi Sujata,

> I want to use the package called 'Antigenic' in EMBOSS.
> I am not quite clear about the input file format to be used.
>
> How can I input a fasta file to the program? - is it possible to use a
> text file that has the amino acid sequence in a fasta format? In which
> folder should the file be?

All EMBOSS programs read sequences from files, or from databases (local
or remote).

You can put the sequence in a file, in fasta format, and give the
filename to any EMBOSS program as input.

Sequences are "input parameters" so simply putting the filename on the
command line is enough.

EMBOSS will look in the current directory, but you can give the full or
relative file path just like any Unix command.

This assumes of course that you are running EMBOSS locally, not through
a web interface (in that case, simply paste a FASTA format sequence into
the text box).
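
To make the file-based route concrete, here is a rough Python sketch
that writes a sequence to a FASTA file and builds the antigenic command
line mentioned earlier in the thread (antigenic -sequence myseqs.fa).
The helper names and the example sequence are made up for illustration;
the sketch only prints the command and does not run EMBOSS.

```python
import os
import tempfile


def write_fasta(path, name, sequence, width=60):
    """Write one sequence to `path` in FASTA format, wrapped at `width` columns."""
    with open(path, "w") as handle:
        handle.write(">" + name + "\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


def antigenic_command(fasta_path):
    """Command line as given in the thread: antigenic -sequence <file>."""
    return ["antigenic", "-sequence", fasta_path]


if __name__ == "__main__":
    # A scratch directory stands in for "the current directory" here.
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "myseqs.fa")
    # Made-up amino acid sequence, for illustration only.
    write_fasta(path, "example_protein", "MKTAYIAKQRQISFVKSHFSRQ")
    print(" ".join(antigenic_command(path)))
```

The printed command can be pasted into a shell; a relative or absolute
path works equally well, as described above.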

Hope that helps

Peter

From pmr at ebi.ac.uk Wed Oct 25 03:42:50 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 25 Oct 2006 08:42:50 +0100 (BST)
Subject: [EMBOSS] How to include Prosite and Rebase and Print into 'showdb'program
In-Reply-To: <002f01c6f7ee$07005fe0$2f01a8c0@GOLHARMOBILE1>
References: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV> <002f01c6f7ee$07005fe0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <1844.217.44.133.216.1161762170.squirrel@webmail.ebi.ac.uk>

Ryan Golhar writes:
> Have you gotten an answer to this yet?

A bit quick off the mark there, Ryan! :-) :-) :-)

Jean asked in the USA at 7pm our time. You posted this in India at 5am
our time. I answered over breakfast (well, not quite a positive answer,
but I did answer :-)

If only Jean had asked last week ... I was in Japan and I'd have snuck
in a reply already... and Alan, Jon and I do quite often post replies at
very strange hours even when we are home.

regards,

Peter

From mrln at o2.pl Wed Oct 25 09:13:36 2006
From: mrln at o2.pl (Marlena Roszczyk)
Date: Wed, 25 Oct 2006 15:13:36 +0200
Subject: [EMBOSS] 30 entries only
In-Reply-To: <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
References: <1161640147.4367.37.camel@localhost.localdomain> <453DC7F2.6030908@ebi.ac.uk> <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
Message-ID: <1161782016.4396.52.camel@localhost.localdomain>

Adding the lv parameter helped and is good enough. It required a few
more lines in emboss.default:

DB blahblah [
    method: url
    format: myfavouriteformat
    type: P
    url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+-ascii+[uniprot-des:%s]+-lv+"
]

Thank you.

Still, option -vn 1 refuses to cooperate, although -vn 2 works fine.
Adding +-vn+1 to the url-line above makes seqret return "Bad value for
-sequence". Hmmm...

Regards,
Marlena Roszczyk

> Rodrigo Lopez writes:
>
> > I suspect this is related to the default view used in SRS. It is
> > returning the first page of results that contains 30 sequences (the
> > default).

Yes, the number 30 here and there doesn't seem a coincidence.

From pmr at ebi.ac.uk Wed Oct 25 09:27:49 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 25 Oct 2006 14:27:49 +0100
Subject: [EMBOSS] Question regarding seqret
In-Reply-To: <000001c6d11a$0db13530$be4de780@CIT.NIH.GOV>
References: <000001c6d11a$0db13530$be4de780@CIT.NIH.GOV>
Message-ID: <453F6655.2050900@ebi.ac.uk>

Jean Mao wrote:
> Hi,
> I have a question hopefully someone can help me about it.
>
> I downloaded the gbrvt1.seq file from ftp://ftp.ncbi.nih.gov/genbank/ as a test, gunzip and index it with dbxflat (I know it's not > than 2gb):
>
> % dbxflat -dbname=testdb -dbresource=embl -idformat=gb -directory=. -fields='id,acc,sv,des' -filenames='gbvrt*.seq' -indexoutdir=. -release=0.0 -date='00/00/00'
>
> Then I run 'seqret' but failed to retrieve entries using 'sv' or 'des' fields:

I didn't see an answer to this one, but I suspect you have already
figured it out.

dbxflat and dbiflat will have created the sv and des indices. You have
to edit the database definition in emboss.default to say the fields
exist.

fields: "sv des"

then seqret and other programs will know they can use them.

Yes, in theory seqret could work out what indices are available for a
dbxflat or dbiflat indexed database - but it would be more difficult for
an SRS or SRSWWW database (for example) so we depend on the database
definitions.
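
To illustrate the point about database definitions, here is a hedged
Python sketch that reads the fields: attribute out of a DB block and
checks whether a query such as testdb-des:kinase names a declared field.
The function names are hypothetical and this is not how EMBOSS itself is
implemented. Treating id and acc as always searchable is an assumption
based on the remark earlier in the thread that id and acc are indexed by
default.

```python
import re

# Illustrative fragment only: a real definition needs further attributes
# (access method, format, location of the index files).
DEFINITION = 'DB testdb [ type: N fields: "sv des" ]'


def declared_fields(definition):
    """Return the set of field names a definition makes searchable."""
    match = re.search(r'fields:\s*"([^"]*)"', definition)
    listed = set(match.group(1).split()) if match else set()
    # Assumption: id and acc can be searched without being listed.
    return listed | {"id", "acc"}


def query_field(query):
    """Extract 'des' from 'testdb-des:kinase'; None if no field is named."""
    match = re.match(r"^[^-:]+-([A-Za-z]+):", query)
    return match.group(1) if match else None


def field_is_declared(definition, query):
    """True if the query names no field, or names one the definition declares."""
    field = query_field(query)
    return field is None or field in declared_fields(definition)


if __name__ == "__main__":
    for query in ("testdb-des:kinase", "testdb-key:kinase", "testdb:X12345"):
        print(query, field_is_declared(DEFINITION, query))
```

In this toy model a des query passes because "des" is listed, while a
key query is rejected until "key" is added to the fields: line.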

Hope that helps,

Peter

From golharam at umdnj.edu Wed Oct 25 14:50:12 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 25 Oct 2006 14:50:12 -0400
Subject: [EMBOSS] How to include Prosite and Rebase and Print into 'showdb'program
In-Reply-To: <1844.217.44.133.216.1161762170.squirrel@webmail.ebi.ac.uk>
Message-ID: <006d01c6f866$66d9cbe0$2f01a8c0@GOLHARMOBILE1>

> -----Original Message-----
> From: pmr at ebi.ac.uk [mailto:pmr at ebi.ac.uk]
> Sent: Wednesday, October 25, 2006 3:43 AM
> To: golharam at umdnj.edu
> Cc: 'Jean Mao'; emboss at emboss.open-bio.org
> Subject: Re: [EMBOSS] How to include Prosite and Rebase and
> Print into 'showdb'program
>
>
> Ryan Golhar writes:
> > Have you gotten an answer to this yet?
>
> A bit quick off the mark there, Ryan! :-) :-) :-)
>
> Jean asked in the USA at 7pm our time. You posted this in
> India at 5am our time. I answered over breakfast (well, not
> quite a positive answer, but I did answer :-)
>
> If only Jean had asked last week ... I was in Japan and I'd
> have snuck in a reply already... and Alan, Jon and I do quite
> often post replies at very strange hours even when we are home.
>
> regards,
>
> Peter
>
>
>

Sorry, I was cleaning out my mail folder. I had deleted the message
already and noticed it in my deleted box. The subject caught my
attention. I thought the message was older...my bad.

From mkitagaw73 at yahoo.co.jp Fri Oct 27 09:09:40 2006
From: mkitagaw73 at yahoo.co.jp (mkitagaw73 at yahoo.co.jp)
Date: Fri, 27 Oct 2006 22:09:40 +0900
Subject: [EMBOSS] ARACHNE3
Message-ID:

I cannot find "Arachne 3", the new version of the "Arachne 2" assembler.
Do you know where it is?

--
Nari

From mincloud at gmail.com Sun Oct 29 12:39:35 2006
From: mincloud at gmail.com (yun zheng)
Date: Sun, 29 Oct 2006 11:39:35 -0600
Subject: [EMBOSS] How to apply the einverted and etandom to a fasta file
Message-ID: <8f6eb9540610290939i48adf473g2f81c36a14b198ad@mail.gmail.com>

Hi,

I am a new user of emboss.
I am trying to find repeat sequences in a nucleotide sequence file that have many sequences. Can anybody tell me how to use einverted and etandem to analyze all the sequences in a fasta file? Many Thanks. Sincerely Zheng, yun Dept of Computer Science and Engineering Washington Univ in St Louis Campus Box 1045 1 Brookings Drive Jolley Hall 505 St Louis, MO 63130 Details: I install a version on the linux platform. And the command is like follows, where the default value is used. >einverted -sequence test.fasta -outfile test.outfile -outseq >test-i.fasta Finds DNA inverted repeats Gap penalty [12]: Minimum score threshold [50]: Match score [3]: Mismatch score [-4]: But the output file seems always to be empty. When I try etandom >etandem -sequence test.fasta -outfile test-t.out -origfile test.etandem Looks for tandem repeats in a nucleotide sequence Minimum repeat size [10]: Maximum repeat size [10]: 18 However, it seems that only the first sequence is analyzed by the einverted and etandom. The test-t.out file is as follows. ######################################## # Program: etandem # Rundate: Sat Oct 28 2006 17:24:30 # Commandline: etandem # -sequence test.fasta # -outfile test-t.out # -origfile test.etandem # -maxrepeat 18 # Report_format: table # Report_file: test-t.out ######################################## #======================================= # # Sequence: D9X6RJV01EER0J from: 1 to: 55 # HitCount: 0 # # Threshold: 20 # Minrepeat: 10 # Maxrepeat: 18 # Mismatch: No # Uniform: No # #======================================= Start End Score Size Count Identity Consensus #--------------------------------------- #--------------------------------------- Many thanks. 
From gbottu at ben.vub.ac.be Mon Oct 30 10:33:13 2006 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Mon, 30 Oct 2006 16:33:13 +0100 Subject: [EMBOSS] How to apply the einverted and etandom to a fasta file - C In-Reply-To: <8f6eb9540610290939i48adf473g2f81c36a14b198ad@mail.gmail.com> References: <8f6eb9540610290939i48adf473g2f81c36a14b198ad@mail.gmail.com> Message-ID: <20061030153313.GA14597@bigben.ulb.ac.be> On Sun, Oct 29, 2006 at 11:39:35AM -0600, yun zheng wrote: > I am a new user of emboss. I am trying to find repeat sequences in a > nucleotide sequence file that have many sequences. > > Can anybody tell me how to use einverted and etandem to analyze all the > sequences in a fasta file? einverted is searching for palindromes rather than repeats. It operates without problem on a fastA multiple sequence file. The reason that the output file is empty is probably because it did not find any good palindrome. Maybe you can try experiment with the parameters. etandem operates only on one sequence at a time. You can see this because if you do etandem -help you see that it takes as input an object of type "sequence" rather than "seqall". If you want to treat many sequences at once, you will need to put them in separate files. If necessary you can run seqret -ossingle on your file. You can under the Tc shell (tcsh) (provided your files are all called something.fasta) do : foreach FASTAFILE (`ls *.fasta`) etandem $FASTAFILE -minrepeat=10 -maxrepeat=10 -threshold=20 -auto end Problem is that etandem works only well if you provide an appropriate value for minrepeat/maxrepeat/threshold. You can use equicktandem to get an idea (look in the 4th column of the output for a repeat size). Working on all sequences in one run will of course only go well if they all contain repeats of similar size and quality. I hope this helps. 
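For anyone who would rather script the splitting step than run seqret -ossingle, here is a minimal sketch in Python. The function name split_fasta and the naming of one file per sequence ID are assumptions made for illustration only; this is not how EMBOSS itself names or writes the files. etandem would then be run on each resulting file, as in the loop above.

```python
import os
import re


def split_fasta(text, outdir):
    """Write each record of a multi-sequence FASTA string to its own file.

    Returns the list of file paths written, in input order. File names are
    taken from the first word of each header line (a choice made for this
    sketch); records sharing an ID would overwrite each other.
    """
    os.makedirs(outdir, exist_ok=True)
    paths = []
    record = []  # lines of the record currently being read

    def flush():
        # Write the record collected so far, if there is one.
        if not record:
            return
        words = record[0][1:].split()
        seq_id = words[0] if words else "unnamed"
        # Replace characters that are awkward in file names.
        safe = re.sub(r"[^A-Za-z0-9_.-]", "_", seq_id)
        path = os.path.join(outdir, safe + ".fasta")
        with open(path, "w") as handle:
            handle.write("\n".join(record) + "\n")
        paths.append(path)
        record.clear()

    for line in text.splitlines():
        if line.startswith(">"):
            flush()  # a new header closes the previous record
            record.append(line)
        elif record and line.strip():
            record.append(line.strip())
    flush()
    return paths
```

Calling split_fasta(open("test.fasta").read(), "split") would leave one .fasta file per sequence in the directory split, ready for a per-file etandem run.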
Guy Bottu, Belgian EMBnet Node From jbreu at mpipsykl.mpg.de Mon Oct 30 14:38:10 2006 From: jbreu at mpipsykl.mpg.de (Johannes Breu) Date: Mon, 30 Oct 2006 20:38:10 +0100 Subject: [EMBOSS] dbifasta Message-ID: <3.0.6.32.20061030203810.00acc5e8@komserv.mpipsykl.mpg.de> Hello, while trying to index my database (its mouse_ensembl_cdna and so is the name) I always get the following error message: $ dbifasta Database indexing for fasta file databases Database name: cdna simple : >ID idacc : >ID ACC gcgid : >db:ID gcgidacc : >db:ID ACC dbid : >db ID ncbi : | formats ID line format [idacc]: simple Database directory [.]: /data/cdna Wildcard database filename [*.dat]: cdna Release number [0.0]: Index date [00/00/00]: General log output file [outfile.dbifasta]: outfile.cdnafasta EMBOSS An error in dbifasta.c at line 210: No files selected For the case it's relevant - I am using cygwin. Thank you, Johannes From ajb at ebi.ac.uk Mon Oct 30 16:30:03 2006 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Mon, 30 Oct 2006 21:30:03 -0000 (GMT) Subject: [EMBOSS] dbifasta In-Reply-To: <3.0.6.32.20061030203810.00acc5e8@komserv.mpipsykl.mpg.de> References: <3.0.6.32.20061030203810.00acc5e8@komserv.mpipsykl.mpg.de> Message-ID: <40898.81.98.244.247.1162243803.squirrel@webmail.ebi.ac.uk> Hi, If the filename is mouse_ensembl_cdna then that's the filename you should use at the Wildcard database filename [*.dat]: prompt. From your email you were using "cdna" instead. 
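Why the pattern matters can be illustrated with Python's fnmatch module, which implements shell-style wildcards. That dbifasta matches filenames in the same way is an assumption of this sketch, and the helper name selects is made up for the example.

```python
from fnmatch import fnmatchcase

FILENAME = "mouse_ensembl_cdna"


def selects(pattern, name=FILENAME):
    """Return True if a shell-style wildcard pattern matches the whole name."""
    return fnmatchcase(name, pattern)


# Without a wildcard character the pattern has to equal the full filename,
# so "cdna" on its own selects nothing, and neither does the default "*.dat".
print(selects("cdna"))
print(selects("*.dat"))
print(selects("mouse_ensembl_cdna"))
```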
Since a wildcard can be specified, perhaps you intended to type "*cdna", which would have picked up the filename mouse_ensembl_cdna. HTH Alan EBI > Hello, > while trying to index my database (its mouse_ensembl_cdna and so is the > name) I always get the following error message: > > $ dbifasta > Database indexing for fasta file databases > Database name: cdna > simple : >ID > idacc : >ID ACC > gcgid : >db:ID > gcgidacc : >db:ID ACC > dbid : >db ID > ncbi : | formats > ID line format [idacc]: simple > Database directory [.]: /data/cdna > Wildcard database filename [*.dat]: cdna > Release number [0.0]: > Index date [00/00/00]: > General log output file [outfile.dbifasta]: outfile.cdnafasta > > EMBOSS An error in dbifasta.c at line 210: > No files selected > > > For the case it's relevant - I am using cygwin. > > Thank you, Johannes > > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > > From shrish at ccmb.res.in Tue Oct 31 07:41:19 2006 From: shrish at ccmb.res.in (Shrish Tiwari) Date: Tue, 31 Oct 2006 18:11:19 +0530 (IST) Subject: [EMBOSS] extracting noncoding regions Message-ID: <2187871.1162298479934.JavaMail.root@mailserver> An embedded and charset-unspecified text was scrubbed... Name: not available Url: http://lists.open-bio.org/pipermail/emboss/attachments/20061031/be440637/attachment.pl From shrish at ccmb.res.in Tue Oct 31 07:18:36 2006 From: shrish at ccmb.res.in (Shrish Tiwari) Date: Tue, 31 Oct 2006 17:48:36 +0530 (IST) Subject: [EMBOSS] showfeat troubles Message-ID: <24303384.1162297116701.JavaMail.root@mailserver> An embedded and charset-unspecified text was scrubbed... 
Name: not available Url: http://lists.open-bio.org/pipermail/emboss/attachments/20061031/9aa4724d/attachment.pl From David.Bauer at schering.de Tue Oct 31 08:54:03 2006 From: David.Bauer at schering.de (David.Bauer at schering.de) Date: Tue, 31 Oct 2006 14:54:03 +0100 Subject: [EMBOSS] Antwort: showfeat troubles In-Reply-To: <24303384.1162297116701.JavaMail.root@mailserver> Message-ID: Hi, I don't get this problem. Showfeat displays CDS from both strands with EMBL and GenBank files. What is the source of your Genbankf file ? Maybe the format is not perfectly correct ? David. emboss-bounces at lists.open-bio.org schrieb am 31/10/2006 13:18:36: > Hi! > I used the following command to extract only positions of CDS from gbk files: > showfeat -pos -matchtype CDS -width 0 > But I noticed that the program does not extract positions of CDS > that lie on the complementary strand, e.g. CDS > complement(5683..6459) did not show up in the resultant file. Any > ideas on how I can get showfeat to extract these positions too. > Shrish > Dr. Shrish Tiwari > E503, Centre for Cellular and Molecular Biology > Uppal Road, Hyderabad - 500 007, INDIA > Phone: 91-40-27192777 > Alternate email: shrish.geo at yahoo.com > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss From pmr at ebi.ac.uk Tue Oct 31 10:34:00 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 31 Oct 2006 15:34:00 +0000 Subject: [EMBOSS] extracting noncoding regions In-Reply-To: <2187871.1162298479934.JavaMail.root@mailserver> References: <2187871.1162298479934.JavaMail.root@mailserver> Message-ID: <45476CE8.8080409@ebi.ac.uk> Hi Shrish, Shrish Tiwari wrote: > Hi! > Is there a way of extracting the noncoding regions of a genome using an EMBOSS program? That is a simple change to coderet to return non-coding sequence (exclude the CDS and mRNA features). Does anyone else want this? We can do it for the next release. 
regards, Peter From pmr at ebi.ac.uk Tue Oct 31 10:55:49 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 31 Oct 2006 15:55:49 +0000 Subject: [EMBOSS] showfeat troubles In-Reply-To: <24303384.1162297116701.JavaMail.root@mailserver> References: <24303384.1162297116701.JavaMail.root@mailserver> Message-ID: <45477205.50002@ebi.ac.uk> Hi Shrish, Shrish Tiwari wrote: > Hi! > I used the following command to extract only positions of CDS from gbk files: > showfeat -pos -matchtype CDS -width 0 > But I noticed that the program does not extract positions of CDS that lie on the complementary strand, e.g. CDS complement(5683..6459) did not show up in the resultant file. Any ideas on how I can get showfeat to extract these positions too. It worked for me, but reports these as 5683..6469 (without -width 0 it will show the arrow in the reverse direction) Can you try running entret on the same genbank entry, and sending the output file to emboss-bug at emboss.open-bio.org so we can take a look at it. regards, Peter Rice From David.Bauer at schering.de Tue Oct 31 09:01:54 2006 From: David.Bauer at schering.de (David.Bauer at schering.de) Date: Tue, 31 Oct 2006 15:01:54 +0100 Subject: [EMBOSS] Antwort: extracting noncoding regions In-Reply-To: <2187871.1162298479934.JavaMail.root@mailserver> Message-ID: Hm, if the genome is annotated, you could use maskfeat -type mRNA (or -type CDS) to mask all transcribed or translated regions with N. HTH, David. emboss-bounces at lists.open-bio.org schrieb am 31/10/2006 13:41:19: > Hi! > Is there a way of extracting the noncoding regions of a genome using > an EMBOSS program? > Shrish > Dr. 
Shrish Tiwari > E503, Centre for Cellular and Molecular Biology > Uppal Road, Hyderabad - 500 007, INDIA > Phone: 91-40-27192777 > Alternate email: shrish.geo at yahoo.com > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss From golharam at umdnj.edu Tue Oct 31 12:02:38 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Tue, 31 Oct 2006 12:02:38 -0500 Subject: [EMBOSS] extracting noncoding regions In-Reply-To: <45476CE8.8080409@ebi.ac.uk> Message-ID: <000a01c6fd0e$5f5b5b70$b23d140a@GOLHARMOBILE1> I think that would be a useful feature...I have a need for it now and currently use a Bioperl script to parse out noncoding regions from a GenBank entry... > -----Original Message----- > From: emboss-bounces at lists.open-bio.org > [mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Peter Rice > Sent: Tuesday, October 31, 2006 10:34 AM > To: Shrish Tiwari > Cc: emboss at emboss.open-bio.org > Subject: Re: [EMBOSS] extracting noncoding regions > > > Hi Shrish, > > Shrish Tiwari wrote: > > Hi! > > Is there a way of extracting the noncoding regions of a > genome using > > an EMBOSS program? > > That is a simple change to coderet to return non-coding > sequence (exclude the > CDS and mRNA features). > > Does anyone else want this? We can do it for the next release. > > regards, > > Peter > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-> bio.org/mailman/listinfo/emboss > From Richard.Rothery at ualberta.ca Tue Oct 31 12:02:22 2006 From: Richard.Rothery at ualberta.ca (Richard Rothery) Date: Tue, 31 Oct 2006 10:02:22 -0700 Subject: [EMBOSS] Batch retrieval of taxonomy/species names using entret..... Message-ID: <000001c6fd0e$5520f2f0$5e068081@Nordegg> Hi, I am interested in using entret to retrieve single field entries from swissprot or sptrembl. 
Specifically, I would like to feed entret a list of accessions and have it return a file with the species names and/or taxonomies. I intend to use this information to compare with my phylogeny analyses of clustalw alignments. Thanks, Richard ############################################### CIHR Membrane Protein Research Group, Department of Biochemistry, University of Alberta, Edmonton T6G 2H7 Ph. (780) 492-2229 Fax. (780) 492-0886 ############################################### From Suraj.Mukatira at STJUDE.ORG Tue Oct 31 13:30:00 2006 From: Suraj.Mukatira at STJUDE.ORG (Mukatira, Suraj) Date: Tue, 31 Oct 2006 12:30:00 -0600 Subject: [EMBOSS] extracting noncoding regions Message-ID: I use BioPerl as well. Extraction of non-coding regions and features like intron, UTR etc. would certainly be useful from within EMBOSS. Suraj Mukatira -----Original Message----- From: emboss-bounces at lists.open-bio.org [mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Ryan Golhar Sent: Tuesday, October 31, 2006 11:03 AM To: 'Peter Rice'; 'Shrish Tiwari' Cc: emboss at emboss.open-bio.org Subject: Re: [EMBOSS] extracting noncoding regions I think that would be a useful feature...I have a need for it now and currently use a Bioperl script to parse out noncoding regions from a GenBank entry... > -----Original Message----- > From: emboss-bounces at lists.open-bio.org > [mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Peter Rice > Sent: Tuesday, October 31, 2006 10:34 AM > To: Shrish Tiwari > Cc: emboss at emboss.open-bio.org > Subject: Re: [EMBOSS] extracting noncoding regions > > > Hi Shrish, > > Shrish Tiwari wrote: > > Hi! > > Is there a way of extracting the noncoding regions of a > genome using > > an EMBOSS program? > > That is a simple change to coderet to return non-coding > sequence (exclude the > CDS and mRNA features). > > Does anyone else want this? We can do it for the next release. 
> > regards, > > Peter > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > _______________________________________________ EMBOSS mailing list EMBOSS at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/emboss From pmr at ebi.ac.uk Tue Oct 31 13:53:00 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 31 Oct 2006 18:53:00 +0000 Subject: [EMBOSS] Batch retrieval of taxonomy/species names using entret..... In-Reply-To: <000001c6fd0e$5520f2f0$5e068081@Nordegg> References: <000001c6fd0e$5520f2f0$5e068081@Nordegg> Message-ID: <45479B8C.5080800@ebi.ac.uk> Hi Richard, Richard Rothery wrote: > I am interested in using entret to retrieve single field entries from > swissprot or sptrembl. Specifically, I would like to feed entret a list > of accessions and have it return a file with the species names and/or > taxonomies. I intend to use this information to compare with my > phylogeny analyses of clustalw alignments. EMBOSS stores the full text in entret without parsing. We could try to extract specific fields but it is not easy to define them for all formats. You can do this with SRS. Try the EBI server for example: Go to the library page. Select UniProtKB/SwissProt (or UniProtKB/TrEMBL). Select "standard query form". Enter your query in the top part (e.g. accession number). In the "create a view" section click the "list" button to get the original lines. Select anything taxonomic from the pull down list (control-click to select more than one). Press "search". Refine your query. You will see the URL at the top that can be used to retrieve data when you are happy. Failing that, you could just parse out the ID and O* lines from entret using a simple perl script. 
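Such a script might look roughly like the sketch below. It is written in Python rather than perl purely as an illustration, and it assumes Swiss-Prot style flat-file text as entret would write it: a two-letter line code, the value starting in column 6, and "//" closing each entry. The function name taxonomy_lines is made up for the example.

```python
def taxonomy_lines(entry_text):
    """Collect the ID and O* lines (OS, OC, OX, ...) of each entry.

    Returns a list of (entry_id, fields) tuples, where fields maps a line
    code such as "OS" to the joined text of all lines carrying that code.
    """
    results = []
    entry_id = None
    fields = {}
    for line in entry_text.splitlines():
        code = line[:2]
        value = line[5:].strip()
        if code == "ID":
            # The entry name is the first word on the ID line.
            entry_id = value.split()[0] if value else None
            fields = {}
        elif code == "//":
            # End of entry: store what was collected and reset.
            if entry_id is not None:
                results.append((entry_id, fields))
            entry_id = None
            fields = {}
        elif len(code) == 2 and code[0] == "O" and code[1].isalpha():
            if entry_id is not None:
                # Continuation lines with the same code are joined by a space.
                fields[code] = (fields.get(code, "") + " " + value).strip()
    if entry_id is not None:
        # Last entry had no "//" terminator.
        results.append((entry_id, fields))
    return results
```

Running it over the file written by entret for a list of accessions gives one (ID, taxonomy) pair per entry, which can then be printed in whatever table layout the phylogeny comparison needs.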
Hope that helps, Peter From gbottu at ben.vub.ac.be Mon Oct 2 07:58:24 2006 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Mon, 2 Oct 2006 09:58:24 +0200 Subject: [EMBOSS] case sensitive identifiers - Checked by AntiVir DEMO version - In-Reply-To: <2023.86.132.219.183.1159518502.squirrel@webmail.ebi.ac.uk> References: <20060928135740.GA14320@bigben.ulb.ac.be> <451BDD04.9040806@ebi.ac.uk> <20060929081508.GA25906@bigben.ulb.ac.be> <2023.86.132.219.183.1159518502.squirrel@webmail.ebi.ac.uk> Message-ID: <20061002075824.GA5571@bigben.ulb.ac.be> On Fri, Sep 29, 2006 at 09:28:22AM +0100, pmr at ebi.ac.uk wrote: > For the PDB case, really only the end of the ID is case-sensitive. Do you > think the database should be case-sensitive for the whole ID, or does it > make sense to check for a pattern as the case-sensitive part? I think that trying to define which part of the ID is case-sensitive is making it just too complicated. Let's have it completely case-sensitive or not at all. > EMBOSS will initially read only one sequence for a seqall ... it does not > read in all the sequences and look for duplicates so we have to decide in > the emboss.defaults DB definition how to check a single ID (no way to read > them all and check for duplicates). Trying to check for duplicates is again too complicated. I understand that if a databank or a multiple sequence file has duplicates a "sequence" will retrieve the first and a "seqset" or "seqall" will retrieve them all. Well, let it be that way. It is the responsability of the database manager/user to make sure there are no duplicates. 
Guy From gbottu at ben.vub.ac.be Mon Oct 2 08:11:46 2006 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Mon, 2 Oct 2006 10:11:46 +0200 Subject: [EMBOSS] case sensitive identifiers - Checked by AntiVir DEMO version - In-Reply-To: <451CF527.8040506@ebi.ac.uk> References: <20060928135740.GA14320@bigben.ulb.ac.be> <451BDD04.9040806@ebi.ac.uk> <20060929081508.GA25906@bigben.ulb.ac.be> <451CF527.8040506@ebi.ac.uk> Message-ID: <20061002081146.GB5571@bigben.ulb.ac.be> On Fri, Sep 29, 2006 at 11:27:51AM +0100, Peter Rice wrote: > So, there will be 2 new (and for the first time boolean) attributes for > databases. To use them, you will need: > > caseidmatch: "Y" > hasaccession: "N" The "hasaccession" attribute is certainly useful for search methods like SRS and MRS who have the notion of searching in separate indexes. By default searching both "id" and "ac" is the thing to do, but there are databanks where there is no "ac" indexed or there are databanks, like EMBL or IMGTHLA, where the "id" and the "ac" are always identical, so that searching only the "id" gains time without loosing functionality. As for the case problem, I think we agree that the best is to always handle the sequence name as such (case as typed by the user) to the search method and in case the search method itself is not case senstive but the databank is, let EMBOSS if 'hasaccession: "Y"' parse the retrieved sequences and accept only those who match. This will work fine for SRS (and of course for the method "direct", where EMBOSS does all the work), but it will not work for MRS, since the current version of MRS does not allow case-different index words. Guy From jbreu at mpipsykl.mpg.de Fri Oct 6 16:53:00 2006 From: jbreu at mpipsykl.mpg.de (Johannes Breu) Date: Fri, 06 Oct 2006 18:53:00 +0200 Subject: [EMBOSS] question Message-ID: <3.0.6.32.20061006185300.00ab68e0@komserv.mpipsykl.mpg.de> To whom it may concern. I tried to install emboss on MS Windows 2000 in a cygwin environment. 
I typed ./configure (following INSTALL). It took a long time but there was no error message. After typing make I got the message bash:command:not found. Does anybody have any idea to solve this problem. Thanks. From shaun at ebi.ac.uk Fri Oct 6 18:30:56 2006 From: shaun at ebi.ac.uk (shaun at ebi.ac.uk) Date: Fri, 6 Oct 2006 19:30:56 +0100 (BST) Subject: [EMBOSS] question In-Reply-To: <3.0.6.32.20061006185300.00ab68e0@komserv.mpipsykl.mpg.de> References: <3.0.6.32.20061006185300.00ab68e0@komserv.mpipsykl.mpg.de> Message-ID: <50608.82.21.106.225.1160159456.squirrel@webmail.ebi.ac.uk> > To whom it may concern. > > I tried to install emboss on MS Windows 2000 in a cygwin environment. I > typed ./configure (following INSTALL). It took a long time but there was > no > error message. After typing make I got the message bash:command:not > found. > Does anybody have any idea to solve this problem. Thanks. > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > Hi Johannes, I believe the installation under cygwin requires a couple of additional switches: ./configure --without-x CFLAGS=-s See the following URL for a more short guide to installing EMBOSS under a cygwin environment (it actually describes the scenario that you are encountering): http://emboss.sourceforge.net/download/cygwin.html HTH Shaun From mukherje at nsm.umass.edu Fri Oct 13 19:34:11 2006 From: mukherje at nsm.umass.edu (mukherje at nsm.umass.edu) Date: Fri, 13 Oct 2006 15:34:11 -0400 Subject: [EMBOSS] (no subject) Message-ID: <1160768051.452fea3357af3@mail-www.oit.umass.edu> Hi, I tried to build Emboss package in a cygwin environment but had thhe following error after running the "make" function. 
Creating library file: .libs/libajax.dll.a collect2: ld returned 1 exit status make[1]: *** [libajax.la] Error 1 make[1]: Leaving directory '/home/supratim/Emboss-4.0.0/ajax' make: *** [install-recursive] Error 1 I tried to run the application both before & after applying the fix as mentioned and also tried with the regular configure option as well as ./configure --without-x CFLAGS=-s make make install I also tried with the older version (EMBOSS-3.0.0) but did not have much luck. It works fine on a Mac but I would like to get it to work on a Windows platform with cygwin. Thank you for your assistance in advance Supratim Mukherjee Graduate student Department of Microbiology UMass, Amherst From kertib at linuxlap.hu Mon Oct 16 08:50:03 2006 From: kertib at linuxlap.hu (Kerti Balázs Gábor) Date: Mon, 16 Oct 2006 10:50:03 +0200 Subject: [EMBOSS] make error Message-ID: <1160988603.3981.12.camel@balazska.site> Hi, ./configure ran well, but make failed with the error below: config.status: creating jemboss/resources/Makefile config.status: creating jemboss/utils/Makefile config.status: creating Makefile config.status: executing depfiles commands server:/usr/src/EMBOSS-4.0.0 # make Making all in plplot make[1]: Entering directory `/usr/src/EMBOSS-4.0.0/plplot' Making all in lib make[2]: Entering directory `/usr/src/EMBOSS-4.0.0/plplot/lib' make[2]: Nothing to be done for `all'. make[2]: Leaving directory `/usr/src/EMBOSS-4.0.0/plplot/lib' make[2]: Entering directory `/usr/src/EMBOSS-4.0.0/plplot' make[2]: Nothing to be done for `all-am'. make[2]: Leaving directory `/usr/src/EMBOSS-4.0.0/plplot' make[1]: Leaving directory `/usr/src/EMBOSS-4.0.0/plplot' Making all in ajax make[1]: Entering directory `/usr/src/EMBOSS-4.0.0/ajax' make[1]: Nothing to be done for `all'. 
make[1]: Leaving directory `/usr/src/EMBOSS-4.0.0/ajax' Making all in nucleus make[1]: Entering directory `/usr/src/EMBOSS-4.0.0/nucleus' /bin/sh ../libtool --tag=CC --mode=link gcc -O2 -o libnucleus.la -rpath /usr/local/lib -version-info 4:0:0 embaln.lo embcom.lo embcons.lo embdata.lo embdbi.lo embdmx.lo embdomain.lo embest.lo embexit.lo embgroup.lo embiep.lo embindex.lo embinit.lo embmat.lo embmisc.lo embmol.lo embnmer.lo embpat.lo embpatlist.lo embprop.lo embpdb.lo embread.lo embsig.lo embshow.lo embword.lo libtool: link: `embmat.lo' is not a valid libtool object make[1]: *** [libnucleus.la] Error 1 make[1]: Leaving directory `/usr/src/EMBOSS-4.0.0/nucleus' make: *** [all-recursive] Error 1 server:/usr/src/EMBOSS-4.0.0 # The OS: SuSE OpenEnterprise Server 10.0 x86 Kernel: server: Linux server 2.6.16.21-0.25-default #1 Tue Sep 19 07:26:15 UTC 2006 i686 i686 i386 GNU/Linux What is the solution? Thank you for your assistance in advance Kerti Balazs Gabor Genetics and Plant Breeding, Szent Istvan University, Pater K. U. 1., Godollo 2103, Hungary From maoj at helix.nih.gov Tue Oct 17 15:36:22 2006 From: maoj at helix.nih.gov (Jean Mao) Date: Tue, 17 Oct 2006 11:36:22 -0400 Subject: [EMBOSS] Question regarding dbxflat Message-ID: <000001c6f202$000323e0$be4de780@CIT.NIH.GOV> Hello, Could someone help me determine which fields I need to include while running dbxflat? I am going to index the genbank and est gb*.seq files and gbest*.seq files from ftp://ftp.ncbi.nih.gov/genbank/. These files have sequence entries composed of : Locus, Definition, Accession, Version, Keywords, Source, Organism... If I specify 'acc, id' while indexing, will the 'Definition' line be indexed or not? What about 'acc, id, des' ? In other words, I would like to know which programs in EMBOSS will not work if I don't specify 'des' while indexing. Some programs in EMBOSS such as 'coderet' require a feature table. If I only index 'acc, id', will coderet work when a user specifies 'genbank:xxxxx' ? 
I guess all I am trying to ask is what programs will stop working if I only accept default 'acc, id' fields. Thank you in advance. From pmr at ebi.ac.uk Tue Oct 17 16:23:03 2006 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Tue, 17 Oct 2006 17:23:03 +0100 (BST) Subject: [EMBOSS] Question regarding dbxflat In-Reply-To: <000001c6f202$000323e0$be4de780@CIT.NIH.GOV> References: <000001c6f202$000323e0$be4de780@CIT.NIH.GOV> Message-ID: <2320.210.150.186.27.1161102183.squirrel@webmail.ebi.ac.uk> Hi Jean, > If I specify 'acc, id' while indexing, will the 'Definition' line be > indexed or not? What about 'acc, id, des' ? In other words, I would like > to know which programs in EMBOSS will not work if I don't specify 'des' > while indexing. > Some programs in EMBOSS such as 'coderet' require feature table. If I only > index 'acc, id', will coderet work when user specify 'genbank:xxxxx' ? > > I guess all I am trying to ask is what programs will stop working if I > only accept default 'acc, id' fields. The dbxflat fields only affect queries (do you want to search by dbname-des: or dbname-gi: when you look for sequences). Retrieval is the same once an entry has been found - you can return all text for entret, features for coderet, and so on as usual. By default only the id and acc lines will be indexed. We found a problem with one database that had no accessions (pdb as a fasta file indexed with dbxfasta) so the next release will have an option to turn off accession searches in the database definition and we may add an option to skip accession indexing. regards, Peter Rice From smiddha at indiana.edu Thu Oct 19 14:59:57 2006 From: smiddha at indiana.edu (Sumit Middha) Date: Thu, 19 Oct 2006 10:59:57 -0400 Subject: [EMBOSS] distmat Uncorrected distance > 100 Message-ID: <453792ED.6040601@indiana.edu> Hi, I tried using distmat from emboss on some alignments and am getting scores in excess of 100 (using all default options). I am not sure how scores can exceed 100. 
D = uncorrected distance = p-distance = 1-S where S = m/(npos + gaps*gap_penalty) So D is like a percentage and equals number of substitutions per 100 bases or amino acids. Please correct me or point me to an explanation which will help clarify my doubt. Thanks, Sumit From aengus.stewart at cancer.org.uk Mon Oct 23 18:00:11 2006 From: aengus.stewart at cancer.org.uk (Aengus Stewart) Date: Mon, 23 Oct 2006 19:00:11 +0100 Subject: [EMBOSS] Fuzznuc ignoring start and end Message-ID: <453D032B.7010104@cancer.org.uk> Hi folks, fuzznuc and also fuzzpro are ignoring the start and end params I am giving it. fuzznuc -pattern rccatgg -sbegin1 75834 -send1 96013 -sequence ac087388.fasta Cheers Aengus ######################################## # Program: fuzznuc # Rundate: Mon Oct 23 2006 18:54:54 # Commandline: fuzznuc # -pattern rccatgg # -sbegin 75834 # -send 96013 # -sequence ac087388.fasta # Report_format: seqtable # Report_file: ac087388.fuzznuc ######################################## #======================================= # # Sequence: AC087388 from: 75834 to: 96013 # HitCount: 15 # # Pattern_name Mismatch Pattern # pattern1 0 rccatgg # # Complement: No # #======================================= Start End Pattern_name Mismatch Sequence 38702 38708 pattern1 . gccatgg 43834 43840 pattern1 . accatgg 47457 47463 pattern1 . gccatgg 48659 48665 pattern1 . gccatgg 56718 56724 pattern1 . accatgg 61200 61206 pattern1 . accatgg 62151 62157 pattern1 . accatgg 68706 68712 pattern1 . accatgg 78513 78519 pattern1 . gccatgg 79973 79979 pattern1 . gccatgg 86415 86421 pattern1 . accatgg 97451 97457 pattern1 . accatgg 102803 102809 pattern1 . gccatgg 113924 113930 pattern1 . gccatgg 115436 115442 pattern1 . 
gccatgg #--------------------------------------- #--------------------------------------- #--------------------------------------- # Total_sequences: 1 # Total_hitcount: 15 #--------------------------------------- -- ----------------------------------------------------------------------- Aengus Stewart Group Leader Bioinformatics and BioStatistics Tel: +44 (0)20 7269 3679 Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK ----------------------------------------------------------------------- This electronic message contains information which may be privileged and confidential. The information is intended to be for the use of the individual(s) or entity named above. Be aware that any third party disclosure, distribution, copying or use of this communication, without prior permission, is strictly prohibited. From mrln at o2.pl Mon Oct 23 21:49:07 2006 From: mrln at o2.pl (Marlena Roszczyk) Date: Mon, 23 Oct 2006 23:49:07 +0200 Subject: [EMBOSS] 30 entries only Message-ID: <1161640147.4367.37.camel@localhost.localdomain> Does anybody know how to solve this problem: I use Emboss via srswww method and everything seems to work fine until I ask seqret or infoseq (or any other application that searches the database) for many sequences (for example typing: "seqret database-des:kinase"). The output consists only of 30 entries even though the same query on srs.ebi.ac.uk results in a 6-digit number of entries. What shall I do to get all the entries I want? Is it a problem with Emboss or rather srs policy of sending data? 
Just in case you would like to see my emboss.default: DB zuniprot [ methodquery: srswww format: swiss type: P fields: "id acc sv des key org" dbalias: uniprot url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz" comment: "uniprot/swiss via srswww" ] DB zswiss [ methodquery: srswww format: swiss type: P fields: "id acc sv des key org" dbalias: swissprot url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz" comment: "swissprot via srswww" ] DB zembl [ type: N methodquery: srswww format: embl fields: "id acc key sv des org" dbalias: embl url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz" comment: "embl via srswww" ] Thanks in advance, Marlena Roszczyk From David.Bauer at SCHERING.DE Tue Oct 24 06:48:45 2006 From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE) Date: Tue, 24 Oct 2006 08:48:45 +0200 Subject: [EMBOSS] Antwort: 30 entries only In-Reply-To: <1161640147.4367.37.camel@localhost.localdomain> Message-ID: Hi Marlena, SRS has a default limit of 30 entries/page. So it seems that you are getting only the first page of results from the server. If you want to run queries that return this many results, it may be a good idea to download the uniprot flat file from the ebi ftp server, index it with the EMBOSS dbxflat and then run the queries locally. But if this is not an option due to limited resources, I guess Peter will have an idea how to get the other result pages out of SRS with the srswww method. ;-) Cheers, David. emboss-bounces at lists.open-bio.org schrieb am 23/10/2006 23:49:07: > Does anybody know how to solve this problem: > > I use Emboss via srswww method and everything seems to work fine until I > ask seqret or infoseq (or any other application that searches the > database) for many sequences (for example typing: "seqret > database-des:kinase"). The output consists only of 30 entries even > though the same query on srs.ebi.ac.uk results in a 6-digit number of > entries. What shall I do to get all the entries I want? 
Is it a problem > with Emboss or rather srs policy of sending data? > > > Just in case you would like to see my emboss.default: > > DB zuniprot [ > methodquery: srswww > format: swiss > type: P > fields: "id acc sv des key org" > dbalias: uniprot > url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz" > comment: "uniprot/swiss via srswww" > ] > > DB zswiss [ > methodquery: srswww > format: swiss > type: P > fields: "id acc sv des key org" > dbalias: swissprot > url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz" > comment: "swissprot via srswww" > ] > > DB zembl [ > type: N > methodquery: srswww > format: embl > fields: "id acc key sv des org" > dbalias: embl > url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz" > comment: "embl via srswww" > ] > > > Thanks in advance, > Marlena Roszczyk > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss From rls at ebi.ac.uk Tue Oct 24 07:59:46 2006 From: rls at ebi.ac.uk (Rodrigo Lopez) Date: Tue, 24 Oct 2006 08:59:46 +0100 Subject: [EMBOSS] 30 entries only In-Reply-To: <1161640147.4367.37.camel@localhost.localdomain> References: <1161640147.4367.37.camel@localhost.localdomain> Message-ID: <453DC7F2.6030908@ebi.ac.uk> Hi, I suspect this is related to the default view used in SRS. It is returning the first page of results that contains 30 sequences (the default). To overcome this problem, the call need to have the following parameters: http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniprot-des:kinase]+-vn+1+-lv+100 -vn is the view to use (1=names, 2=complete entries) -lv is the number of entries to be used in one go It is important to realize that downloading a lot of entries, although possible, take a while and results in high loads for the servers. 
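As a rough illustration of how those parameters fit into the call, here is a minimal Python sketch that only assembles the URL string in the shape shown above. The function name and its arguments are made up for this example; they are not part of SRS or EMBOSS, and the sketch does not contact any server.

```python
def build_wgetz_url(base_url, databank, field, term, view=1, page_size=100):
    """Assemble a wgetz query URL in the shape quoted in the message.

    view      -> value placed after -vn (1=names, 2=complete entries)
    page_size -> value placed after -lv (entries per request)
    All names here are illustrative assumptions, not an SRS API.
    """
    query = f"[{databank}-{field}:{term}]"
    return f"{base_url}?{query}+-vn+{view}+-lv+{page_size}"


if __name__ == "__main__":
    # Rebuilds the example URL from the message.
    print(build_wgetz_url(
        "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz",
        "uniprot", "des", "kinase",
    ))
```

Changing `view` to 2 or raising `page_size` only changes the two trailing numbers, which is all the message describes.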
The way I download a large set of entries is by generating a list
(using -vn 1) and then using the list with seqret in the following way:

% seqret @listname -out mykinases

and making sure this time that -vn 2 is used to retrieve complete
entries. This requires the addition of other EMBOSS database definitions
(one for lists and another for complete entry retrieval).

Hope this helps,

R:)

From pmr at ebi.ac.uk  Tue Oct 24 08:55:17 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Tue, 24 Oct 2006 09:55:17 +0100 (BST)
Subject: [EMBOSS] 30 entries only
In-Reply-To: <453DC7F2.6030908@ebi.ac.uk>
References: <1161640147.4367.37.camel@localhost.localdomain>
    <453DC7F2.6030908@ebi.ac.uk>
Message-ID: <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>

Rodrigo Lopez writes:

> I suspect this is related to the default view used in SRS. It is
> returning the first page of results that contains 30 sequences (the
> default). To overcome this problem, the call needs to have the following
> parameters:
>
> http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniprot-des:kinase]+-vn+1+-lv+100

But this is using the EMBOSS "srswww" access method, which uses

+-e+-ascii

That should return complete entries for all as ascii text.

Perhaps something has changed on the EBI's SRS server because this now
only gives me 30 entries.

+-lv+100 does give 100 entries ... but it will take some reworking of the
code to loop through entries that way. Hmmmm.....

regards,

Peter

From rls at ebi.ac.uk  Tue Oct 24 09:01:50 2006
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Tue, 24 Oct 2006 10:01:50 +0100
Subject: [EMBOSS] 30 entries only
In-Reply-To: <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
References: <1161640147.4367.37.camel@localhost.localdomain>
    <453DC7F2.6030908@ebi.ac.uk>
    <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
Message-ID: <453DD67E.8000600@ebi.ac.uk>

I'll have to wait for our SRS admin to come back to find out if a change
has in fact taken place or not.

hmm....

R:/

From maoj at helix.nih.gov  Tue Oct 24 18:15:17 2006
From: maoj at helix.nih.gov (Jean Mao)
Date: Tue, 24 Oct 2006 14:15:17 -0400
Subject: [EMBOSS] How to include Prosite and Rebase and Print into 'showdb' program
Message-ID: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>

Hi, for EMBOSS 4.0.0, is there a way to show both prosite and rebase
databases when I type 'showdb' at the prompt? I asked the same question
back in 2003. I was hoping the answer will be different this time :-)

Thanks.

Jean

From sovani at rohan.sdsu.edu  Tue Oct 24 18:09:07 2006
From: sovani at rohan.sdsu.edu (Sujata Sovani)
Date: Tue, 24 Oct 2006 11:09:07 -0700 (PDT)
Subject: [EMBOSS] Antigenic - input file format?
Message-ID: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>

Hi,

I want to use the package called 'Antigenic' in EMBOSS.
I am not quite clear about the input file format to be used.

How can I input a fasta file to the program? - is it possible to use a
text file that has the amino acid sequence in a fasta format? In which
folder should the file be?

Please let me know.
Thank you.

Regards,
Sujata

From km at mrna.tn.nic.in  Tue Oct 24 21:28:57 2006
From: km at mrna.tn.nic.in (km)
Date: Wed, 25 Oct 2006 02:58:57 +0530
Subject: [EMBOSS] Antigenic - input file format?
In-Reply-To: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
References: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
Message-ID: <20061024212857.GA31781@mrna.tn.nic.in>

Hi,

> I want to use the package called 'Antigenic' in EMBOSS.
> I am not quite clear about the input file format to be used.

Please consult the EMBOSS documentation on the system by typing

$ tfm antigenic

> How can I input a fasta file to the program?
First check:

tfm antigenic

Then, assuming that your set of sequence(s) is in a text file (myseqs.fa)
in fasta format, run:

$ antigenic -sequence myseqs.fa

> is it possible to use a text file that has the amino acid sequence in a fasta format?

Yes.

> In which folder should the file be?

The simple solution would be to keep the sequence file in the current folder.

regards,
KM

From golharam at umdnj.edu  Wed Oct 25 04:28:30 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 25 Oct 2006 00:28:30 -0400
Subject: [EMBOSS] How to include Prosite and Rebase and Print into 'showdb'program
In-Reply-To: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
Message-ID: <002f01c6f7ee$07005fe0$2f01a8c0@GOLHARMOBILE1>

Have you gotten an answer to this yet?

From pmr at ebi.ac.uk  Wed Oct 25 07:34:19 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 25 Oct 2006 08:34:19 +0100 (BST)
Subject: [EMBOSS] How to include Prosite and Rebase and Print into 'showdb' program
In-Reply-To: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
References: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
Message-ID: <1836.217.44.133.216.1161761659.squirrel@webmail.ebi.ac.uk>

Hi Jean,

> Hi, for EMBOSS 4.0.0, is there a way to show both prosite and rebase
> databases when I type 'showdb' at the prompt?
> I asked the same question
> back in 2003. I was hoping the answer will be different this time :-)

Well .... EMBOSS 4.0.0 does have extended showdb output so now we can add
this.

The main issue is that there is currently nothing in EMBOSS that uses the
definition, but we would like to add a report of the database release to
the output of programs that use them.

The definitions would be expected to go in RESOURCE definitions in the
emboss.default file but we could perhaps put something in the output of
the *extract programs.

I will take another look.

regards,

Peter

From pmr at ebi.ac.uk  Wed Oct 25 07:37:56 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 25 Oct 2006 08:37:56 +0100 (BST)
Subject: [EMBOSS] Antigenic - input file format?
In-Reply-To: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
References: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
Message-ID: <1840.217.44.133.216.1161761876.squirrel@webmail.ebi.ac.uk>

Hi Sujata,

> I want to use the package called 'Antigenic' in EMBOSS.
> I am not quite clear about the input file format to be used.
>
> How can I input a fasta file to the program? - is it possible to use a
> text file that has the amino acid sequence in a fasta format? In which
> folder should the file be?

All EMBOSS programs read sequences from files, or from databases (local or
remote).

You can put the sequence in a file, in fasta format, and give the filename
to any EMBOSS program as input.

Sequences are "input parameters" so simply putting the filename on the
command line is enough.

EMBOSS will look in the current directory, but you can give the full or
relative file path just like any Unix command.

This assumes of course that you are running EMBOSS locally, not through a
web interface (in that case, simply paste a FASTA format sequence into the
text box).
Hope that helps

Peter

From pmr at ebi.ac.uk  Wed Oct 25 07:42:50 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 25 Oct 2006 08:42:50 +0100 (BST)
Subject: [EMBOSS] How to include Prosite and Rebase and Print into 'showdb'program
In-Reply-To: <002f01c6f7ee$07005fe0$2f01a8c0@GOLHARMOBILE1>
References: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
    <002f01c6f7ee$07005fe0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <1844.217.44.133.216.1161762170.squirrel@webmail.ebi.ac.uk>

Ryan Golhar writes:
> Have you gotten an answer to this yet?

A bit quick off the mark there, Ryan! :-) :-) :-)

Jean asked in the USA at 7pm our time. You posted this in India at 5am our
time. I answered over breakfast (well, not quite a positive answer, but I
did answer :-)

If only Jean had asked last week ... I was in Japan and I'd have snuck in
a reply already... and Alan, Jon and I do quite often post replies at very
strange hours even when we are home.

regards,

Peter

From mrln at o2.pl  Wed Oct 25 13:13:36 2006
From: mrln at o2.pl (Marlena Roszczyk)
Date: Wed, 25 Oct 2006 15:13:36 +0200
Subject: [EMBOSS] 30 entries only
In-Reply-To: <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
References: <1161640147.4367.37.camel@localhost.localdomain>
    <453DC7F2.6030908@ebi.ac.uk>
    <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
Message-ID: <1161782016.4396.52.camel@localhost.localdomain>

Adding the lv parameter helped and is good enough. It required a few more
lines in emboss.default:

DB blahblah [
    method: url
    format: myfavouriteformat
    type: P
    url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+-ascii+[uniprot-des:%s]+-lv+"
]

Thank you.

Still, option -vn 1 refuses to cooperate, although -vn 2 works fine.
Adding +-vn+1 to the url-line above makes seqret return "Bad value for
-sequence". Hmmm...

Regards,
Marlena Roszczyk

> Rodrigo Lopez writes:
>
> > I suspect this is related to the default view used in SRS.
> > It is returning the first page of results that contains 30 sequences
> > (the default).

Yes, the number 30 here and there doesn't seem a coincidence.

From pmr at ebi.ac.uk  Wed Oct 25 13:27:49 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 25 Oct 2006 14:27:49 +0100
Subject: [EMBOSS] Question regarding seqret
In-Reply-To: <000001c6d11a$0db13530$be4de780@CIT.NIH.GOV>
References: <000001c6d11a$0db13530$be4de780@CIT.NIH.GOV>
Message-ID: <453F6655.2050900@ebi.ac.uk>

Jean Mao wrote:
> Hi,
> I have a question hopefully someone can help me about it.
>
> I downloaded the gbvrt1.seq file from ftp://ftp.ncbi.nih.gov/genbank/
> as a test, gunzip and index it with dbxflat (I know it's not > than 2gb):
>
> % dbxflat -dbname=testdb -dbresource=embl -idformat=gb -directory=. -fields='id,acc,sv,des' -filenames='gbvrt*.seq' -indexoutdir=. -release=0.0 -date='00/00/00'
>
> Then I run 'seqret' but failed to retrieve entries using 'sv' or 'des' fields:

I didn't see an answer to this one, but I suspect you have already figured
it out.

dbxflat and dbiflat will have created the sv and des indices.

You have to edit the database definition in emboss.default to say the
fields exist.

fields: "sv des"

then seqret and other programs will know they can use them.

Yes, in theory seqret could work out what indices are available for a
dbxflat or dbiflat indexed database - but it would be more difficult for
an SRS or SRSWWW database (for example) so we depend on the database
definitions.
Hope that helps,

Peter

From golharam at umdnj.edu  Wed Oct 25 18:50:12 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 25 Oct 2006 14:50:12 -0400
Subject: [EMBOSS] How to include Prosite and Rebase and Print into 'showdb'program
In-Reply-To: <1844.217.44.133.216.1161762170.squirrel@webmail.ebi.ac.uk>
Message-ID: <006d01c6f866$66d9cbe0$2f01a8c0@GOLHARMOBILE1>

Sorry, I was cleaning out my mail folder. I had deleted the message
already and noticed it in my deleted box. The subject caught my
attention. I thought the message was older...my bad.

From mkitagaw73 at yahoo.co.jp  Fri Oct 27 13:09:40 2006
From: mkitagaw73 at yahoo.co.jp (mkitagaw73 at yahoo.co.jp)
Date: Fri, 27 Oct 2006 22:09:40 +0900
Subject: [EMBOSS] ARACHNE3
Message-ID:

I cannot find "Arachne 3", the new version of the "Arachne 2" assembler.
Do you know where it is?

--
Nari

From mincloud at gmail.com  Sun Oct 29 17:39:35 2006
From: mincloud at gmail.com (yun zheng)
Date: Sun, 29 Oct 2006 11:39:35 -0600
Subject: [EMBOSS] How to apply the einverted and etandom to a fasta file
Message-ID: <8f6eb9540610290939i48adf473g2f81c36a14b198ad@mail.gmail.com>

Hi,

I am a new user of emboss.
I am trying to find repeat sequences in a nucleotide sequence file that
has many sequences.

Can anybody tell me how to use einverted and etandem to analyze all the
sequences in a fasta file?

Many Thanks.

Sincerely
Zheng, yun
Dept of Computer Science and Engineering
Washington Univ in St Louis
Campus Box 1045
1 Brookings Drive
Jolley Hall 505
St Louis, MO 63130

Details:
I installed a version on the Linux platform. The command is as follows,
where the default values are used.

>einverted -sequence test.fasta -outfile test.outfile -outseq >test-i.fasta
Finds DNA inverted repeats
Gap penalty [12]:
Minimum score threshold [50]:
Match score [3]:
Mismatch score [-4]:

But the output file always seems to be empty.

When I try etandem

>etandem -sequence test.fasta -outfile test-t.out -origfile test.etandem
Looks for tandem repeats in a nucleotide sequence
Minimum repeat size [10]:
Maximum repeat size [10]: 18

However, it seems that only the first sequence is analyzed by
einverted and etandem. The test-t.out file is as follows.

########################################
# Program: etandem
# Rundate: Sat Oct 28 2006 17:24:30
# Commandline: etandem
#    -sequence test.fasta
#    -outfile test-t.out
#    -origfile test.etandem
#    -maxrepeat 18
# Report_format: table
# Report_file: test-t.out
########################################

#=======================================
#
# Sequence: D9X6RJV01EER0J     from: 1   to: 55
# HitCount: 0
#
# Threshold: 20
# Minrepeat: 10
# Maxrepeat: 18
# Mismatch: No
# Uniform: No
#
#=======================================

  Start     End   Score   Size  Count Identity Consensus
#---------------------------------------
#---------------------------------------

Many thanks.
From gbottu at ben.vub.ac.be  Mon Oct 30 15:33:13 2006
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 30 Oct 2006 16:33:13 +0100
Subject: [EMBOSS] How to apply the einverted and etandom to a fasta file - C
In-Reply-To: <8f6eb9540610290939i48adf473g2f81c36a14b198ad@mail.gmail.com>
References: <8f6eb9540610290939i48adf473g2f81c36a14b198ad@mail.gmail.com>
Message-ID: <20061030153313.GA14597@bigben.ulb.ac.be>

On Sun, Oct 29, 2006 at 11:39:35AM -0600, yun zheng wrote:
> I am a new user of emboss. I am trying to find repeat sequences in a
> nucleotide sequence file that have many sequences.
>
> Can anybody tell me how to use einverted and etandem to analyze all the
> sequences in a fasta file?

einverted searches for palindromes rather than repeats. It operates
without problem on a fastA multiple sequence file. The reason that the
output file is empty is probably that it did not find any good
palindrome. Maybe you can try experimenting with the parameters.

etandem operates on only one sequence at a time. You can see this because
if you do etandem -help you see that it takes as input an object of type
"sequence" rather than "seqall". If you want to treat many sequences at
once, you will need to put them in separate files. If necessary you can
run seqret -ossingle on your file. Under the Tc shell (tcsh) (provided
your files are all called something.fasta) you can do:

foreach FASTAFILE (`ls *.fasta`)
etandem $FASTAFILE -minrepeat=10 -maxrepeat=10 -threshold=20 -auto
end

The problem is that etandem only works well if you provide appropriate
values for minrepeat/maxrepeat/threshold. You can use equicktandem to get
an idea (look in the 4th column of the output for a repeat size). Working
on all sequences in one run will of course only go well if they all
contain repeats of similar size and quality.

I hope this helps.
Guy Bottu, Belgian EMBnet Node

From jbreu at mpipsykl.mpg.de  Mon Oct 30 19:38:10 2006
From: jbreu at mpipsykl.mpg.de (Johannes Breu)
Date: Mon, 30 Oct 2006 20:38:10 +0100
Subject: [EMBOSS] dbifasta
Message-ID: <3.0.6.32.20061030203810.00acc5e8@komserv.mpipsykl.mpg.de>

Hello,
while trying to index my database (it's mouse_ensembl_cdna and so is the
name) I always get the following error message:

$ dbifasta
Database indexing for fasta file databases
Database name: cdna
    simple : >ID
     idacc : >ID ACC
     gcgid : >db:ID
  gcgidacc : >db:ID ACC
      dbid : >db ID
      ncbi : | formats
ID line format [idacc]: simple
Database directory [.]: /data/cdna
Wildcard database filename [*.dat]: cdna
Release number [0.0]:
Index date [00/00/00]:
General log output file [outfile.dbifasta]: outfile.cdnafasta

EMBOSS An error in dbifasta.c at line 210:
No files selected

In case it's relevant - I am using cygwin.

Thank you, Johannes

From ajb at ebi.ac.uk  Mon Oct 30 21:30:03 2006
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Mon, 30 Oct 2006 21:30:03 -0000 (GMT)
Subject: [EMBOSS] dbifasta
In-Reply-To: <3.0.6.32.20061030203810.00acc5e8@komserv.mpipsykl.mpg.de>
References: <3.0.6.32.20061030203810.00acc5e8@komserv.mpipsykl.mpg.de>
Message-ID: <40898.81.98.244.247.1162243803.squirrel@webmail.ebi.ac.uk>

Hi,

If the filename is mouse_ensembl_cdna then that's the filename you
should use at the

Wildcard database filename [*.dat]:

prompt. From your email you were using "cdna" instead.
As a wildcard can be specified, perhaps you intended to type "*cdna",
which would have picked up the filename mouse_ensembl_cdna.

HTH

Alan
EBI

From shrish at ccmb.res.in  Tue Oct 31 12:41:19 2006
From: shrish at ccmb.res.in (Shrish Tiwari)
Date: Tue, 31 Oct 2006 18:11:19 +0530 (IST)
Subject: [EMBOSS] extracting noncoding regions
Message-ID: <2187871.1162298479934.JavaMail.root@mailserver>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From shrish at ccmb.res.in  Tue Oct 31 12:18:36 2006
From: shrish at ccmb.res.in (Shrish Tiwari)
Date: Tue, 31 Oct 2006 17:48:36 +0530 (IST)
Subject: [EMBOSS] showfeat troubles
Message-ID: <24303384.1162297116701.JavaMail.root@mailserver>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From David.Bauer at schering.de  Tue Oct 31 13:54:03 2006
From: David.Bauer at schering.de (David.Bauer at schering.de)
Date: Tue, 31 Oct 2006 14:54:03 +0100
Subject: [EMBOSS] Antwort: showfeat troubles
In-Reply-To: <24303384.1162297116701.JavaMail.root@mailserver>
Message-ID:

Hi,

I don't get this problem.
Showfeat displays CDS from both strands with EMBL and GenBank files.
What is the source of your GenBank file? Maybe the format is not
perfectly correct?

David.

emboss-bounces at lists.open-bio.org wrote on 31/10/2006 13:18:36:

> Hi!
> I used the following command to extract only positions of CDS from gbk files:
> showfeat -pos -matchtype CDS -width 0
> But I noticed that the program does not extract positions of CDS
> that lie on the complementary strand, e.g. CDS
> complement(5683..6459) did not show up in the resultant file. Any
> ideas on how I can get showfeat to extract these positions too.
> Shrish
> Dr. Shrish Tiwari
> E503, Centre for Cellular and Molecular Biology
> Uppal Road, Hyderabad - 500 007, INDIA
> Phone: 91-40-27192777
> Alternate email: shrish.geo at yahoo.com

From pmr at ebi.ac.uk  Tue Oct 31 15:34:00 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 31 Oct 2006 15:34:00 +0000
Subject: [EMBOSS] extracting noncoding regions
In-Reply-To: <2187871.1162298479934.JavaMail.root@mailserver>
References: <2187871.1162298479934.JavaMail.root@mailserver>
Message-ID: <45476CE8.8080409@ebi.ac.uk>

Hi Shrish,

Shrish Tiwari wrote:
> Hi!
> Is there a way of extracting the noncoding regions of a genome using an EMBOSS program?

That is a simple change to coderet to return non-coding sequence (exclude
the CDS and mRNA features).

Does anyone else want this? We can do it for the next release.

regards,

Peter

From pmr at ebi.ac.uk  Tue Oct 31 15:55:49 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 31 Oct 2006 15:55:49 +0000
Subject: [EMBOSS] showfeat troubles
In-Reply-To: <24303384.1162297116701.JavaMail.root@mailserver>
References: <24303384.1162297116701.JavaMail.root@mailserver>
Message-ID: <45477205.50002@ebi.ac.uk>

Hi Shrish,

Shrish Tiwari wrote:
> Hi!
> I used the following command to extract only positions of CDS from gbk files:
> showfeat -pos -matchtype CDS -width 0
> But I noticed that the program does not extract positions of CDS that
> lie on the complementary strand, e.g. CDS complement(5683..6459) did not
> show up in the resultant file. Any ideas on how I can get showfeat to
> extract these positions too.

It worked for me, but reports these as 5683..6469 (without -width 0 it
will show the arrow in the reverse direction).

Can you try running entret on the same genbank entry, and sending the
output file to emboss-bug at emboss.open-bio.org so we can take a look at
it?

regards,

Peter Rice

From David.Bauer at schering.de  Tue Oct 31 14:01:54 2006
From: David.Bauer at schering.de (David.Bauer at schering.de)
Date: Tue, 31 Oct 2006 15:01:54 +0100
Subject: [EMBOSS] Antwort: extracting noncoding regions
In-Reply-To: <2187871.1162298479934.JavaMail.root@mailserver>
Message-ID:

Hm,

if the genome is annotated, you could use maskfeat -type mRNA (or -type
CDS) to mask all transcribed or translated regions with N.

HTH,
David.

emboss-bounces at lists.open-bio.org wrote on 31/10/2006 13:41:19:

> Hi!
> Is there a way of extracting the noncoding regions of a genome using
> an EMBOSS program?
> Shrish
> Dr. Shrish Tiwari
> E503, Centre for Cellular and Molecular Biology
> Uppal Road, Hyderabad - 500 007, INDIA
> Phone: 91-40-27192777
> Alternate email: shrish.geo at yahoo.com

From golharam at umdnj.edu  Tue Oct 31 17:02:38 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 31 Oct 2006 12:02:38 -0500
Subject: [EMBOSS] extracting noncoding regions
In-Reply-To: <45476CE8.8080409@ebi.ac.uk>
Message-ID: <000a01c6fd0e$5f5b5b70$b23d140a@GOLHARMOBILE1>

I think that would be a useful feature...I have a need for it now and
currently use a Bioperl script to parse out noncoding regions from a
GenBank entry...

From Richard.Rothery at ualberta.ca  Tue Oct 31 17:02:22 2006
From: Richard.Rothery at ualberta.ca (Richard Rothery)
Date: Tue, 31 Oct 2006 10:02:22 -0700
Subject: [EMBOSS] Batch retrieval of taxonomy/species names using entret.....
Message-ID: <000001c6fd0e$5520f2f0$5e068081@Nordegg>

Hi,

I am interested in using entret to retrieve single field entries from
swissprot or sptrembl.
Specifically, I would like to feed entret a list of accessions and have
it return a file with the species names and/or taxonomies. I intend to
use this information to compare with my phylogeny analyses of clustalw
alignments.

Thanks,

Richard

###############################################
CIHR Membrane Protein Research Group,
Department of Biochemistry,
University of Alberta,
Edmonton T6G 2H7
Ph. (780) 492-2229
Fax. (780) 492-0886
###############################################

From Suraj.Mukatira at STJUDE.ORG  Tue Oct 31 18:30:00 2006
From: Suraj.Mukatira at STJUDE.ORG (Mukatira, Suraj)
Date: Tue, 31 Oct 2006 12:30:00 -0600
Subject: [EMBOSS] extracting noncoding regions
Message-ID:

I use BioPerl as well. Extraction of non-coding regions and features like
intron, UTR etc. would certainly be useful from within EMBOSS.

Suraj Mukatira

From pmr at ebi.ac.uk  Tue Oct 31 18:53:00 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 31 Oct 2006 18:53:00 +0000
Subject: [EMBOSS] Batch retrieval of taxonomy/species names using entret.....
In-Reply-To: <000001c6fd0e$5520f2f0$5e068081@Nordegg>
References: <000001c6fd0e$5520f2f0$5e068081@Nordegg>
Message-ID: <45479B8C.5080800@ebi.ac.uk>

Hi Richard,

Richard Rothery wrote:
> I am interested in using entret to retrieve single field entries from
> swissprot or sptrembl. Specifically, I would like to feed entret a list
> of accessions and have it return a file with the species names and/or
> taxonomies. I intend to use this information to compare with my
> phylogeny analyses of clustalw alignments.

EMBOSS stores the full text in entret without parsing. We could try to
extract specific fields but it is not easy to define them for all formats.

You can do this with SRS. Try the EBI server for example:

Go to the library page
Select UniProtKB/SwissProt (or UniProtKB/TrEMBL)
Select "standard query form"
Enter your query in the top part (e.g. accession number)
In the "create a view" section click the "list" button to get the
original lines.
Select anything taxonomic from the pull down list (control-click to select
more than one)
Press "search".
Refine your query.
You will see the URL at the top that can be used to retrieve data when you
are happy.

Failing that, you could just parse out the ID and O* lines from entret
using a simple perl script.

Hope that helps,

Peter
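
The last suggestion (parsing out the ID and O* lines) could look roughly like the sketch below. It is written in Python rather than the Perl mentioned in the message, and it assumes the entret output is in Swiss-Prot style flat-file format, where each line starts with a two-letter code and entries end with "//". The function name and the sample entry are invented for illustration; the sample is not a real database record.

```python
def extract_id_and_organism(flatfile_text):
    """Collect the ID and O* (e.g. OS, OC, OX) lines of each entry.

    Returns a list of (entry_name, {line_code: [text, ...]}) tuples.
    Assumes Swiss-Prot style text: two-letter line codes, "//" terminator.
    """
    entries = []
    entry_name = None
    o_lines = {}
    for line in flatfile_text.splitlines():
        code, _, rest = line.partition(" ")
        rest = rest.strip()
        if code == "ID":
            fields = rest.split()
            if fields:
                # The entry name is the first word after the ID line code.
                entry_name = fields[0]
                o_lines = {}
        elif entry_name is not None and len(code) == 2 and code.startswith("O"):
            o_lines.setdefault(code, []).append(rest)
        elif code == "//" and entry_name is not None:
            entries.append((entry_name, o_lines))
            entry_name = None
    return entries


# Made-up entry for illustration only; not a real database record.
SAMPLE = """\
ID   TEST1_HUMAN   Reviewed;   10 AA.
AC   P00000;
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata.
OX   NCBI_TaxID=9606;
SQ   SEQUENCE   10 AA;
     MKTAYIAKQR
//
"""

if __name__ == "__main__":
    for name, organism in extract_id_and_organism(SAMPLE):
        print(name, organism.get("OS"), organism.get("OC"))
```

To use it on real data, one would save the entret output for the accessions of interest to a file and pass the file's text to the function; other entry formats would need different line codes.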