From hpm at bioinfo-user.org.uk Sun Dec 3 10:06:25 2006
From: hpm at bioinfo-user.org.uk (Hamish McWilliam)
Date: Sun, 03 Dec 2006 15:06:25 +0000
Subject: [EMBOSS] EMBOSS database setup
In-Reply-To: <43964.81.98.244.247.1164904060.squirrel@webmail.ebi.ac.uk>
References: <639b80db0611281217i6c1cc927v50ac9b8e6a71717c@mail.gmail.com>
	<1164902235.14146.57.camel@emboss2.ebi.ac.uk>
	<43964.81.98.244.247.1164904060.squirrel@webmail.ebi.ac.uk>
Message-ID: <4572E7F1.5080709@bioinfo-user.org.uk>

Hi Alan,

> The appended definitions are simple ones that may be
> useful if you only want a few sequences at a time.
> If sites upgrade to SRS8 then alter accordingly.
>
> Alan
>
> DB embl [ type: N method: srswww format: embl release: "EBI"
>    url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz"
>    comment: "EMBL from the EBI" ]
>
> DB em [ type: N method: srswww format: embl release: "EBI"
>    url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz"
>    dbalias: "EMBL"
>    comment: "EMBL from the EBI" ]
>
> DB swissprot [ type: P method: srswww format: swiss release: "EBI"
>    url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz"
>    comment: "SWISSPROT from the EBI" ]
>
> DB sw [ type: P method: srswww format: swiss release: "EBI"
>    url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz"
>    dbalias: "SWISSPROT"
>    comment: "SWISSPROT from the EBI" ]
>
> DB uniprot [ type: P method: srswww format: swiss release: "EBI"
>    url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz"
>    comment: "UNIPROT from the EBI" ]
>
> DB uni [ type: P method: srswww format: swiss release: EBI
>    url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz"
>    dbalias: "UNIPROT"
>    comment: "UNIPROT from the EBI" ]
>
> DB pir [ type: P method: srswww format: nbrf release: "EBI"
>    url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz"
>    comment: "PIR from the EBI" ]
>
> DB genbank [ type: N method: srswww format: genbank release: "NCBI"
>    url: "http://www.infobiogen.fr/srs7bin/cgi-bin/wgetz"
>    comment: "GenBank from Infobiogen" ]
>
> DB gb [ type: N method: srswww format: genbank release: "NCBI"
>    url: "http://www.infobiogen.fr/srs7bin/cgi-bin/wgetz"
>    dbalias: "GENBANK"
>    comment: "GenBank from Infobiogen" ]
>
> DB refseq [ type: N method: srswww format: genbank release: "NCBI"
>    url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz"
>    comment: "REFSEQ from EBI" ]

For the EBI's SRS server please use:
http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz as the URL. This should allow
for continued support when the server is upgraded.

Also note that the Infobiogen SRS service is no longer available. For
other SRS sites carrying GenBank please see the Public SRS Server List
(http://downloads.biowisdomsrs.com/publicsrs.html).

Hamish

From maoj at helix.nih.gov Mon Dec 4 10:49:26 2006
From: maoj at helix.nih.gov (Jean Mao)
Date: Mon, 4 Dec 2006 10:49:26 -0500
Subject: [EMBOSS] Application for PFAM?
Message-ID: <000501c717bb$c6d353d0$be4de780@CIT.NIH.GOV>

Hi,
Just wondering if EMBOSS has any program that will search a pfam database?

From David.Bauer at SCHERING.DE Tue Dec 5 01:40:54 2006
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Tue, 5 Dec 2006 07:40:54 +0100
Subject: [EMBOSS] Antwort: Application for PFAM?
In-Reply-To: <000501c717bb$c6d353d0$be4de780@CIT.NIH.GOV>
Message-ID:

Hi Jean,

not in the core EMBOSS, but there is the HMMER-2.3.2 embassy application.
It also contains the program ehmmpfam to search a sequence against the
Pfam HMM database.

HTH, David.

emboss-bounces at lists.open-bio.org wrote on 04/12/2006 16:49:26:

> Hi,
> Just wondering if EMBOSS has any program that will search a pfam database?
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss

From JK at novozymes.com Wed Dec 6 07:34:38 2006
From: JK at novozymes.com (JK (Jesper Agerbo Krogh))
Date: Wed, 6 Dec 2006 13:34:38 +0100
Subject: [EMBOSS] Output from seqret in fastaformat.
Message-ID: <934F95E71B6C9347A873C42AE3C196190B84C672@NZT0004E.dknz.nzcorp.net>

Hi..

I've got dbxflat to index the swissprot database.. but I'd like to have
the output formatted with the USA as the fasta ID.

Current..:

seqret UNIPROT:Q12345
Reads and writes (returns) sequences
output sequence(s) [ies3_yeast.fasta]:
>IES3_YEAST Q12345 Ino eighty subunit 3.
MKFEDLLATNKQVQFAHAATQHYKSVKTPDFLEKDPHHKKFHNADGLNQQGSSTPSTATD
ANAASTASTHTNTTTFKRHIVAVDDISKMNYEMIKNSPGNVITNANQDEIDISTLKTRLY
KDNLYAMNDNFLQAVNDQIVTLNAAEQDQETEDPDLSDDEKIDILTKIQENLLEEYQKLS
QKERKWFILKELLLDANVELDLFSNRGRKASHPIAFGAVAIPTNVNANSLAFNRTKRRKI
NKNGLLENIL

.. but I'd like..

>UNIPROT:Q12345 Ino eighty subunit 3.
MKFEDLLATNKQVQFAHAATQHYKSVKTPDFLEKDPHHKKFHNADGLNQQGSSTPSTATD
ANAASTASTHTNTTTFKRHIVAVDDISKMNYEMIKNSPGNVITNANQDEIDISTLKTRLY
KDNLYAMNDNFLQAVNDQIVTLNAAEQDQETEDPDLSDDEKIDILTKIQENLLEEYQKLS
QKERKWFILKELLLDANVELDLFSNRGRKASHPIAFGAVAIPTNVNANSLAFNRTKRRKI
NKNGLLENIL

Is that possible?

--
Jesper Krogh

From maoj at helix.nih.gov Wed Dec 6 09:55:33 2006
From: maoj at helix.nih.gov (Jean Mao)
Date: Wed, 6 Dec 2006 09:55:33 -0500
Subject: [EMBOSS] Question regarding dbxflat entry number processed
Message-ID: <000a01c71946$94beb600$be4de780@CIT.NIH.GOV>

Hi,

I am using dbxflat to index a database. I would like to find out how many
entries were processed. In the index file database.pxid, there is a line:

Count 123456

which is very close to the number of entries in the database file but not
exactly the same. Is there a way to find out?

Thank you very much.
Jean Mao

From ajb at ebi.ac.uk Wed Dec 6 10:23:44 2006
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Wed, 6 Dec 2006 15:23:44 -0000 (GMT)
Subject: [EMBOSS] Question regarding dbxflat entry number processed
In-Reply-To: <000a01c71946$94beb600$be4de780@CIT.NIH.GOV>
References: <000a01c71946$94beb600$be4de780@CIT.NIH.GOV>
Message-ID: <48753.81.98.244.247.1165418624.squirrel@webmail.ebi.ac.uk>

Hi Jean,

Usually you just need to add 1 to the 'Count' value as the counting works
from 0 to n-1 rather than 1 to n. However, if there are duplicate keys in
the database then that cannot be relied upon: the Count is representative
of the number of unique keys at the top level of the tree and does not
include any duplicates indexed in a subtree.

HTH

Alan

From JK at novozymes.com Wed Dec 6 10:33:11 2006
From: JK at novozymes.com (JK (Jesper Agerbo Krogh))
Date: Wed, 6 Dec 2006 16:33:11 +0100
Subject: [EMBOSS] Output from seqret in fastaformat.
In-Reply-To: <4576C594.3080609@ebi.ac.uk>
Message-ID: <934F95E71B6C9347A873C42AE3C196191386B8DD@NZT0004E.dknz.nzcorp.net>

Hi,

> > Use -osdbname UNIPROT in the command line.

That sort of works... but that gives me the DATABASE:ID not the
DATABASE:AC in the fasta-header.

What's the actual difference between the id and the accession numbers?

--
Jesper Krogh

From pmr at ebi.ac.uk Wed Dec 6 12:30:58 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 6 Dec 2006 17:30:58 -0000 (GMT)
Subject: [EMBOSS] Output from seqret in fastaformat.
In-Reply-To: <934F95E71B6C9347A873C42AE3C196191386B8DD@NZT0004E.dknz.nzcorp.net>
References: <4576C594.3080609@ebi.ac.uk>
	<934F95E71B6C9347A873C42AE3C196191386B8DD@NZT0004E.dknz.nzcorp.net>
Message-ID: <14912.193.173.109.1.1165426258.squirrel@webmail.ebi.ac.uk>

Hi Jesper,

>> Use -osdbname UNIPROT in the command line.
>
> That sort of works... but that gives me the DATABASE:ID not the
> DATABASE:AC in the fasta-header.
Yup, you need to redefine the ID as well with -sid Q12345

> What's the actual difference between the id and the accession numbers?

The id is the identifier on the ID line of the entry. The accession
number is from the AC line - also a unique identifier but completely
unmemorable. Given the choice, we prefer the real ID.

Entries can also have more than one accession number (more common for
EMBL entries than for UniProt) where entries are merged or changed.

entret will show you the full entry so you can see where the identifiers
come from.

Hope that helps,

Peter

From mincloud at gmail.com Thu Dec 7 15:36:03 2006
From: mincloud at gmail.com (yun zheng)
Date: Thu, 7 Dec 2006 14:36:03 -0600
Subject: [EMBOSS] how to find unique DNA sequences from a large database
Message-ID: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com>

Hi,

Are there any tools for finding unique sequences in a large database?
Many thanks.

I need to find unique DNA sequences from a large database. A short piece
is given as follows.

>001
aaaagttgtgtgtgtatgacaggtt
>013
aacctgtcatacacacacaactttt
>289
gttgtgtgtgtatgacaggtt
>375
tgtgtgtatgacaggttgat
>319
tcaacctgtcatacacaca
>177
cgcagtgtgtgtatgacagg
>271
gtcctacctgtcatacacac
>020
aagacataatgtgtgtatgacag

All these seem to be the same sequence, since BLASTN gives very small
e-values for their alignments.

BLASTN 2.2.8 [Jan-05-2004]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 001
         (25 letters)

Database: drought-clustered.fa
           410 sequences; 8877 total letters

Searching.done

                                              Score    E
Sequences producing significant alignments:  (bits) Value

013                                             50   8e-11
001                                             50   8e-11
289                                             42   2e-08
375                                             34   5e-06
319                                             34   5e-06
177                                             32   2e-05
271                                             30   8e-05
020                                             28   3e-04

Best regards.
sincerely

Zheng, Yun
Department of Computer Science
Washington University in St Louis
Campus Box 1045
1 Brookings Drive, St Louis, MO 63130

From mthon at tamu.edu Thu Dec 7 18:55:00 2006
From: mthon at tamu.edu (Michael Thon)
Date: Thu, 7 Dec 2006 17:55:00 -0600
Subject: [EMBOSS] how to find unique DNA sequences from a large database
In-Reply-To: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com>
References: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com>
Message-ID: <7F1F24A9-10FD-462D-BD63-349AD4538EB9@tamu.edu>

Hi Yun, you might try a clustering algorithm like blastclust (single
linkage clustering) or mcl (a.k.a tribe-mcl) or one of the others
that exist. I can't think of any EMBOSS apps that would solve this
problem, but maybe someone else has a better answer.

Mike

On Dec 7, 2006, at 2:36 PM, yun zheng wrote:

> Hi,
>
> Are there any tools for find unique sequences from a large
> database? Many thanks.
>
> I need to find unique DNA sequences from a large database. A short
> piece is given as follows.
>
> >001
> aaaagttgtgtgtgtatgacaggtt
> >013
> aacctgtcatacacacacaactttt
> >289
> gttgtgtgtgtatgacaggtt
> >375
> tgtgtgtatgacaggttgat
> >319
> tcaacctgtcatacacaca
> >177
> cgcagtgtgtgtatgacagg
> >271
> gtcctacctgtcatacacac
> >020
> aagacataatgtgtgtatgacag
>
> All these seem to be the same sequence, since BLASTN gives very small
> e-values for their alignments.
>
> BLASTN 2.2.8 [Jan-05-2004]
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
> Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
> protein database search programs", Nucleic Acids Res. 25:3389-3402.
>
> Query= 001
>          (25 letters)
>
> Database: drought-clustered.fa
>            410 sequences; 8877 total letters
>
> Searching.done
>
>                                               Score    E
> Sequences producing significant alignments:  (bits) Value
>
> 013                                             50   8e-11
> 001                                             50   8e-11
> 289                                             42   2e-08
> 375                                             34   5e-06
> 319                                             34   5e-06
> 177                                             32   2e-05
> 271                                             30   8e-05
> 020                                             28   3e-04
>
> Best regards.
>
> sincerely
>
> Zheng, Yun
> Department of Computer Science
> Washington University in St Louis
> Campus Box 1045
> 1 Brookings Drive, St Louis, MO 63130
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss

From ztu at msi.umn.edu Thu Dec 7 20:30:38 2006
From: ztu at msi.umn.edu (Zheng Jin Tu)
Date: Thu, 7 Dec 2006 19:30:38 -0600 (CST)
Subject: [EMBOSS] how to find unique DNA sequences from a large database
In-Reply-To: <7F1F24A9-10FD-462D-BD63-349AD4538EB9@tamu.edu>
References: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com>
	<7F1F24A9-10FD-462D-BD63-349AD4538EB9@tamu.edu>
Message-ID:

These are not elegant approaches, but they are workable:

First, for each sequence in your database, build one long sequence
string. Then use a for loop to scan over that string with a window the
size of your search sequence. Do this for every sequence in the
database. It may take a few days if you need to scan a big database such
as the human genome.

The other way is to elongate your short query to 17 or 21 nt (I am not
sure which is the shortest length that blast works with) so that blast
can search with it. That means, if you have a 15 nt oligo, you can create
four x four possible 17 nt sequences, such as:

AAACCCGGGC CCTTTAAaa
AAACCCGGGC CCTTTAAag
AAACCCGGGC CCTTTAAac
AAACCCGGGC CCTTTAAat
AAACCCGGGC CCTTTAAga
AAACCCGGGC CCTTTAAgg
AAACCCGGGC CCTTTAAgc
AAACCCGGGC CCTTTAAgt
AAACCCGGGC CCTTTAAca
AAACCCGGGC CCTTTAAct
AAACCCGGGC CCTTTAAcg
AAACCCGGGC CCTTTAAcc
.....
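
The padding scheme described above can be sketched in Python. This is
only a minimal illustration, not code from the thread: the function name
extend_query and the 15-nt oligo are hypothetical, and it assumes the
extra bases are appended at the 3' end only.

```python
from itertools import product


def extend_query(query, pad=2, alphabet="acgt"):
    """Return every sequence made by appending `pad` bases to `query`.

    With pad=2 this gives the four x four = 16 candidates; the appended
    bases are lower case so they stand out from the original oligo.
    """
    return [query + "".join(tail) for tail in product(alphabet, repeat=pad)]


if __name__ == "__main__":
    oligo = "AAACCCGGGCCCTTT"  # hypothetical 15-nt query, for illustration only
    for candidate in extend_query(oligo):
        print(candidate)
```

Each candidate would then be searched separately and the hit lists merged
by hand or by script; that merging step is not shown here.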
Then you run blast and combine the results from all 16 17-nt sequences
as the hits for your 15 nt query sequence.

Hope this is useful.

Thanks,

TU
==================================

On Thu, 7 Dec 2006, Michael Thon wrote:

> Hi Yun, you might try a clustering algorithm like blastclust (single
> linkage clustering) or mcl (a.k.a tribe-mcl) or one of the others
> that exist. I can't think of any EMBOSS apps that would solve this
> problem, but maybe someone else has a better answer.
> Mike
>
>
> On Dec 7, 2006, at 2:36 PM, yun zheng wrote:
>
>> Hi,
>>
>> Are there any tools for find unique sequences from a large
>> database? Many thanks.
>>
>> I need to find unique DNA sequences from a large database. A short
>> piece is given as follows.
>>
>> >001
>> aaaagttgtgtgtgtatgacaggtt
>> >013
>> aacctgtcatacacacacaactttt
>> >289
>> gttgtgtgtgtatgacaggtt
>> >375
>> tgtgtgtatgacaggttgat
>> >319
>> tcaacctgtcatacacaca
>> >177
>> cgcagtgtgtgtatgacagg
>> >271
>> gtcctacctgtcatacacac
>> >020
>> aagacataatgtgtgtatgacag
>>
>> All these seem to be the same sequence, since BLASTN gives very small
>> e-values for their alignments.
>>
>> BLASTN 2.2.8 [Jan-05-2004]
>>
>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>> Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
>> protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>
>> Query= 001
>>          (25 letters)
>>
>> Database: drought-clustered.fa
>>            410 sequences; 8877 total letters
>>
>> Searching.done
>>
>>                                               Score    E
>> Sequences producing significant alignments:  (bits) Value
>>
>> 013                                             50   8e-11
>> 001                                             50   8e-11
>> 289                                             42   2e-08
>> 375                                             34   5e-06
>> 319                                             34   5e-06
>> 177                                             32   2e-05
>> 271                                             30   8e-05
>> 020                                             28   3e-04
>>
>> Best regards.
>>
>> sincerely
>>
>> Zheng, Yun
>> Department of Computer Science
>> Washington University in St Louis
>> Campus Box 1045
>> 1 Brookings Drive, St Louis, MO 63130
>> _______________________________________________
>> EMBOSS mailing list
>> EMBOSS at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/emboss
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>

From pmr at ebi.ac.uk Fri Dec 8 03:37:32 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 8 Dec 2006 08:37:32 -0000 (GMT)
Subject: [EMBOSS] how to find unique DNA sequences from a large database
In-Reply-To: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com>
References: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com>
Message-ID: <1636.217.44.134.240.1165567052.squirrel@webmail.ebi.ac.uk>

Dear Yun Zheng,

> Are there any tools for find unique sequences from a large database?
> Many thanks.
>
> I need to find unique DNA sequences from a large database. A short
> piece is given as follows.
>
> All these seem to be the same sequence, since BLASTN gives very small
> e-values for their alignments.

Remember that BLASTN is a local alignment tool. The small e-values
indicate that some part of your 001 query sequence is similar to some
part of a sequence in the database.

You need to check what is matching in the alignments reported by BLASTN.
One useful test is whether the whole length of your query matches any of
the sequences in the database and, for DNA, whether it matches in one or
both directions (as sequences can have biologically significant inverted
repeats).

There are tools (not in EMBOSS) available for building non-redundant
databases - excluding sequences which are subsequences of others in the
database, or selecting one of a set of sequences that match closely over
their whole length.
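
As a rough illustration of the first kind of redundancy mentioned above
(a sequence that is an exact subsequence of another, on either strand),
a small set can be filtered with a few lines of Python. This is an
assumption-laden toy, not one of the tools referred to: the names
reverse_complement and drop_contained are hypothetical, only exact
containment is tested (no mismatches or gaps), and the pairwise scan is
quadratic, so it suits a few hundred short sequences rather than a large
database.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def drop_contained(records):
    """Keep only sequences not exactly contained in an already kept one.

    `records` maps an identifier to a DNA string. Sequences are visited
    longest first (ties broken by identifier), so a sequence can only be
    contained in one that has already been kept. Both strands are checked.
    """
    kept = {}
    ordered = sorted(records.items(), key=lambda item: (-len(item[1]), item[0]))
    for name, seq in ordered:
        forward = seq.lower()
        reverse = reverse_complement(forward)
        if any(forward in other or reverse in other for other in kept.values()):
            continue  # redundant under this (strict) definition
        kept[name] = forward
    return kept


if __name__ == "__main__":
    # Four of the sequences from the original question.
    example = {
        "001": "aaaagttgtgtgtgtatgacaggtt",
        "013": "aacctgtcatacacacacaactttt",
        "289": "gttgtgtgtgtatgacaggtt",
        "375": "tgtgtgtatgacaggttgat",
    }
    for name, seq in drop_contained(example).items():
        print(name, seq)
```

Overlapping but non-contained sequences are kept by this sketch;
deciding whether those count as redundant is a separate choice.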
But you do have to decide what you mean by redundancy and make sure that
the methods you apply are appropriate.

Hope that helps,

Peter Rice

From mincloud at gmail.com Fri Dec 8 13:50:40 2006
From: mincloud at gmail.com (yun zheng)
Date: Fri, 8 Dec 2006 12:50:40 -0600
Subject: [EMBOSS] how to find unique DNA sequences from a large database
In-Reply-To: <1636.217.44.134.240.1165567052.squirrel@webmail.ebi.ac.uk>
References: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com>
	<1636.217.44.134.240.1165567052.squirrel@webmail.ebi.ac.uk>
Message-ID: <8f6eb9540612081050n2e9b745lb28b79eb9dffb82f@mail.gmail.com>

Dear All,

Many thanks for your reply.

Best regards.

sincerely

zheng, yun

On 12/8/06, pmr at ebi.ac.uk wrote:
>
> Dear Yun Zheng,
>
> > Are there any tools for find unique sequences from a large database?
> > Many thanks.
> >
> > I need to find unique DNA sequences from a large database. A short
> > piece is given as follows.
> >
> > All these seem to be the same sequence, since BLASTN gives very small
> > e-values for their alignments.
>
> Remember that BLASTN is a local alignment tool. The small e-values
> indicate that some part of your 001 query sequence is similar to some part
> of a sequence in the database.
>
> You need to check what is matching in the alignments reported by BLASTN.
> One useful test is whether the whole length of your query is matching to
> any of the sequences in the database, also for DNA whether it is matching
> in one or both directions (as sequences can have biologically significant
> inverted repeats).
>
> There are tools (not in EMBOSS) available for building non-redundant
> databases - excluding sequences which are subsequences of others in the
> database, or selecting one of a set of sequences that match closely over
> their whole length. But you do have to decide what you mean by redundancy
> and make sure that the methods you apply are appropriate.
>
> Hope that helps,
>
> Peter Rice
>
>

From bobwohlhueter at earthlink.net Tue Dec 12 17:32:13 2006
From: bobwohlhueter at earthlink.net (Robert Wohlhueter)
Date: Tue, 12 Dec 2006 17:32:13 -0500
Subject: [EMBOSS] jemboss standalone installation problems on MacBook/Intel
Message-ID: <457F2DED.7040903@earthlink.net>

Trying to run emboss/jemboss as a standalone on MacBookPro/intel under
OS 10.4. Downloaded and installed from fink.sourceforge.net (in fink's
preferred /sw/share/EMBOSS tree). [In trying to circumvent the problems
described below, I also built, in the /usr/local tree, from source code
at emboss.sourceforge.net, an entirely separate installation. I get
exactly the same set of errors with that installation.]

Facts as best I comprehend them:
1) the envvars, including CLASSPATH, JEMBOSS_HOME, EMBOSS_INSTALL, etc.,
are set as specified in $EMBOSS_INSTALL/jemboss/runJemboss.sh, and seem
correct to me. jars, *.class, *.java, and the several executables that
comprise the emboss suite of applications, as near as I can tell are all
present.

2) java runtime is that bundled by Apple with OS 10.4, namely
{summer:~}66 bobw$ java -version
java version "1.5.0_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-112)
Java HotSpot(TM) Client VM (build 1.5.0_06-64, mixed mode, sharing)

3) As suggested in Emboss Administrator's Guide, testing the
installation with
$ wossname -auto
works as expected, listing countless applications in various categories.

4) When I try to run `java $JEMBOSS_HOME/org/emboss/jemboss/Jemboss local &`
{summer:~}68 bobw$ jemboss
[1] 736
{summer:~}69 bobw$ Exception in thread "main"
java.lang.NoClassDefFoundError:
/sw/share/EMBOSS/jemboss/org/emboss/jemboss/Jemboss
I'll worry about this later.

5) The nub of the problem is when I run `java org.emboss.jemboss.Jemboss
local &` from JEMBOSS_HOME, I get the following messages:

{summer:/sw/share/EMBOSS/jemboss}60 bobw$ Exception in thread "Thread-2"
java.lang.NullPointerException
        at org.emboss.jemboss.gui.BuildProgramMenu$1.construct(BuildProgramMenu.java:278)
        at org.emboss.jemboss.gui.SwingWorker$2.run(SwingWorker.java:127)
        at java.lang.Thread.run(Thread.java:613)

I'm not a java programmer, but when I look at source code in
BuildProgramMenu.java, it looks like the error arises in a routine which
is trying to construct a "dataFile" file specification from data in an
object called "mysettings". I don't see how/where "mysettings" is
defined, but I'm suspicious that it is intended to read data from my
local settings (envvars, emboss.defaults ??), is not able to, and thus
passes null information to the new datafile specification.

Can anybody elucidate the source of data in "mysettings" and give me a
hint what I need to do to supply it?

Thanks for any and all pointers,

Bob Wohlhueter

From tjc at sanger.ac.uk Wed Dec 13 02:11:18 2006
From: tjc at sanger.ac.uk (Tim Carver)
Date: Wed, 13 Dec 2006 07:11:18 +0000
Subject: [EMBOSS] jemboss standalone installation problems on MacBook/Intel
In-Reply-To: <457F2DED.7040903@earthlink.net>
Message-ID:

Hi Robert

Jemboss has not been set up to work with fink. You do need to use the
EMBOSS download (including any patches) and install using the script as
described at:

http://emboss.sourceforge.net/Jemboss/install/standalone.html

Also make sure you have the latest java from:

http://www.apple.com/downloads/macosx/apple/j2se50release4intel.html

Regards
Tim Carver

On 12/12/06 22:32, "Robert Wohlhueter" wrote:

> Trying to run emboss/jemboss as a standalone on MacBookPro/intel under
> OS 10.4. Downloaded and installed from fink.sourceforge.net (in fink's
> preferred /sw/share/EMBOSS tree).
> [In trying to circumvent the problems
> described below, I also built, in the /usr/local tree, from source code
> at emboss.sourceforge.net, an entirely separate installation. I get
> exactly the same set of errors with that installation.]
>
> Facts as best I comprehend them:
> 1) the envvars, including CLASSPATH, JEMBOSS_HOME, EMBOSS_INSTALL, etc.,
> are set as specified in $EMBOSS_INSTALL/jemboss/runJemboss.sh, and seem
> correct to me. jars, *.class, *.java, and the several executables that
> comprise the emboss suite of applications, as near as I can tell are all
> present.
>
> 2) java runtime is that bundled by Apple with OS 10.4, namely
> {summer:~}66 bobw$ java -version
> java version "1.5.0_06"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-112)
> Java HotSpot(TM) Client VM (build 1.5.0_06-64, mixed mode, sharing)
>
> 3) As suggested in Emboss Administrator's Guide, testing the
> installation with
> $ wossname -auto
> works as expected, listing countless applications in various categories.
>
> 4) When I try to run `java $JEMBOSS_HOME/org/emboss/jemboss/Jemboss local &`
> {summer:~}68 bobw$ jemboss
> [1] 736
> {summer:~}69 bobw$ Exception in thread "main"
> java.lang.NoClassDefFoundError:
> /sw/share/EMBOSS/jemboss/org/emboss/jemboss/Jemboss
> I'll worry about this later.
>
> 5) The nub of the problem is when I run `java org.emboss.jemboss.Jemboss
> local &` from JEMBOSS_HOME, I get the following messages:
>
> {summer:/sw/share/EMBOSS/jemboss}60 bobw$ Exception in thread "Thread-2"
> java.lang.NullPointerException
> at
> org.emboss.jemboss.gui.BuildProgramMenu$1.construct(BuildProgramMenu.java:278)
> at org.emboss.jemboss.gui.SwingWorker$2.run(SwingWorker.java:127)
> at java.lang.Thread.run(Thread.java:613)
>
> I'm not a java programmer, but when I look at source code in
> BuildProgramMenu.java, it looks like the error arises in a routine which
> is trying to construct a "dataFile" file specification from data in a
> object called "mysettings". I don't see how/where "mysettings" is
> defined, but I'm suspicious that it is intended to read data from my
> local settings (envvars, emboss.defaults ??), is not able to, and thus
> passes null information to the new datafile specification.
>
> Can anybody elucidate the source of data in "mysettings" and give me a
> hint what I need to do to supply it?
>
> Thanks for any and all pointers,
>
> Bob Wohlhueter
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss

From pandey.gaurav at gmail.com Wed Dec 13 23:43:06 2006
From: pandey.gaurav at gmail.com (Gaurav Pandey)
Date: Wed, 13 Dec 2006 22:43:06 -0600
Subject: [EMBOSS] New Review: Computational Approaches for Protein Function
	Prediction
Message-ID: <627ca1900612132043l48a54760ibda9de67fe35a8f5@mail.gmail.com>

[Apologies if you receive this more than once]

Dear Colleague,

We are pleased to share with you a recent review of several hundred
papers in the field of computational protein function prediction:

Title: Computational Approaches for Protein Function Prediction: A Survey
Authors: Gaurav Pandey, Vipin Kumar and Michael Steinbach
Available at: http://www.cs.umn.edu/~kumar/papers/survey.php

Abstract

Proteins are the most essential and versatile macromolecules of life,
and the knowledge of their functions is a crucial link in the
development of new drugs, better crops, and even the development of
synthetic biochemicals such as biofuels. Experimental procedures for
protein function prediction are inherently low throughput and are thus
unable to annotate a non-trivial fraction of proteins that are becoming
available due to rapid advances in genome sequencing technology. This
has motivated the development of computational techniques that utilize
a variety of high-throughput experimental data for protein function
prediction, such as protein and genome sequences, gene expression data,
protein interaction networks and phylogenetic profiles. Indeed, in a
short period of a decade, several hundred articles have been published
on this topic. This review aims to discuss this wide spectrum of
approaches by categorizing them in terms of the data type they use for
predicting function, and thus identify the trends and needs of this very
important field.
The survey is expected to be useful for computational biologists and bioinformaticians aiming to get an overview of the field of computational function prediction, and identify areas that can benefit from further research. Your comments on the article, or any part thereof, are welcome. Thanks and best regards Gaurav Pandey (gaurav at cs.umn.edu) Vipin Kumar (kumar at cs.umn.edu) Michael Steinbach (steinbac at cs.umn.edu) From msarachu at biol.unlp.edu.ar Wed Dec 27 09:01:37 2006 From: msarachu at biol.unlp.edu.ar (=?iso-8859-1?b?TWFydO1u?= Sarachu) Date: Wed, 27 Dec 2006 11:01:37 -0300 Subject: [EMBOSS] wEMBOSS-1.7.1 released Message-ID: <1167228097.45927cc185874@webmail.biol.unlp.edu.ar> This release is mainly to fix a problem with editing nucList and protList and also includes some minor changes. wrappers4EMBOSS-1.5.1 is included in this wEMBOSS release. wEMBOSS can be downloaded from http://www.wemboss.org Best regards, The wEMBOSS team -- Mart?n Sarachu msarachu at biol.unlp.edu.ar EMBnet Argentina http://www.ar.embnet.org From hpm at bioinfo-user.org.uk Sun Dec 3 15:06:25 2006 From: hpm at bioinfo-user.org.uk (Hamish McWilliam) Date: Sun, 03 Dec 2006 15:06:25 +0000 Subject: [EMBOSS] EMBOSS database setup In-Reply-To: <43964.81.98.244.247.1164904060.squirrel@webmail.ebi.ac.uk> References: <639b80db0611281217i6c1cc927v50ac9b8e6a71717c@mail.gmail.com> <1164902235.14146.57.camel@emboss2.ebi.ac.uk> <43964.81.98.244.247.1164904060.squirrel@webmail.ebi.ac.uk> Message-ID: <4572E7F1.5080709@bioinfo-user.org.uk> Hi Alan, > The appended definitions are simple ones that may be > useful if you only want a few sequences at a time. > If sites upgrade to SRS8 then alter accordingly. 
> > Alan > > DB embl [ type: N method: srswww format: embl release: "EBI" > url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz" > comment: "EMBL from the EBI" ] > > DB em [ type: N method: srswww format: embl release: "EBI" > url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz" > dbalias: "EMBL" > comment: "EMBL from the EBI" ] > > DB swissprot [ type: P method: srswww format: swiss release: "EBI" > url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz" > comment: "SWISSPROT from the EBI" ] > > DB sw [ type: P method: srswww format: swiss release: "EBI" > url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz" > dbalias: "SWISSPROT" > comment: "SWISSPROT from the EBI" ] > > DB uniprot [ type: P method: srswww format: swiss release: "EBI" > url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz" > comment: "UNIPROT from the EBI" ] > > DB uni [ type: P method: srswww format: swiss release: EBI > url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz" > dbalias: "UNIPROT" > comment: "UNIPROT from the EBI" ] > > DB pir [ type: P method: srswww format: nbrf release: "EBI" > url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz" > comment: "PIR from the EBI" ] > > DB genbank [ type: N method: srswww format: genbank release: "NCBI" > url: "http://www.infobiogen.fr/srs7bin/cgi-bin/wgetz" > comment: "GenBank from Infobiogen" ] > > DB gb [ type: N method: srswww format: genbank release: "NCBI" > url: "http://www.infobiogen.fr/srs7bin/cgi-bin/wgetz" > dbalias: "GENBANK" > comment: "GenBank from Infobiogen" ] > > DB refseq [ type: N method: srswww format: genbank release: "NCBI" > url: "http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz" > comment: "REFSEQ from EBI" ] For the EBI's SRS server please use: http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz as the URL. This should allow for continued support when the server is upgraded. Also note that the Infobiogen SRS service is no longer available. For other SRS sites carrying GenBank please see the Public SRS Server List (http://downloads.biowisdomsrs.com/publicsrs.html). 
Hamish From maoj at helix.nih.gov Mon Dec 4 15:49:26 2006 From: maoj at helix.nih.gov (Jean Mao) Date: Mon, 4 Dec 2006 10:49:26 -0500 Subject: [EMBOSS] Application for PFAM? Message-ID: <000501c717bb$c6d353d0$be4de780@CIT.NIH.GOV> Hi, Just wondering if EMBOSS has any program that will search a pfam database? From David.Bauer at SCHERING.DE Tue Dec 5 06:40:54 2006 From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE) Date: Tue, 5 Dec 2006 07:40:54 +0100 Subject: [EMBOSS] Antwort: Application for PFAM? In-Reply-To: <000501c717bb$c6d353d0$be4de780@CIT.NIH.GOV> Message-ID: Hi Jean, not in the core EMBOSS but there is the HMMER-2.3.2 embassy application. This contains also the program ehmmpfam to search a sequence against the Pfam HMM database. HTH, David. emboss-bounces at lists.open-bio.org schrieb am 04/12/2006 16:49:26: > Hi, > Just wondering if EMBOSS has any program that will search a pfam database? > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss From JK at novozymes.com Wed Dec 6 12:34:38 2006 From: JK at novozymes.com (JK (Jesper Agerbo Krogh)) Date: Wed, 6 Dec 2006 13:34:38 +0100 Subject: [EMBOSS] Output from seqret in fastaformat. Message-ID: <934F95E71B6C9347A873C42AE3C196190B84C672@NZT0004E.dknz.nzcorp.net> Hi.. I've godt dbxflat to index the swissprot database.. but I'd like to have the output formatted with the USA as the fasta ID. Current..: seqret UNIPROT:Q12345 Reads and writes (returns) sequences output sequence(s) [ies3_yeast.fasta]: >IES3_YEAST Q12345 Ino eighty subunit 3. MKFEDLLATNKQVQFAHAATQHYKSVKTPDFLEKDPHHKKFHNADGLNQQGSSTPSTATD ANAASTASTHTNTTTFKRHIVAVDDISKMNYEMIKNSPGNVITNANQDEIDISTLKTRLY KDNLYAMNDNFLQAVNDQIVTLNAAEQDQETEDPDLSDDEKIDILTKIQENLLEEYQKLS QKERKWFILKELLLDANVELDLFSNRGRKASHPIAFGAVAIPTNVNANSLAFNRTKRRKI NKNGLLENIL .. but I'd like.. >UNIPROT:Q12345 Ino eighty subunit 3. 
MKFEDLLATNKQVQFAHAATQHYKSVKTPDFLEKDPHHKKFHNADGLNQQGSSTPSTATD
ANAASTASTHTNTTTFKRHIVAVDDISKMNYEMIKNSPGNVITNANQDEIDISTLKTRLY
KDNLYAMNDNFLQAVNDQIVTLNAAEQDQETEDPDLSDDEKIDILTKIQENLLEEYQKLS
QKERKWFILKELLLDANVELDLFSNRGRKASHPIAFGAVAIPTNVNANSLAFNRTKRRKI
NKNGLLENIL

Is that possible?

--
Jesper Krogh

From maoj at helix.nih.gov Wed Dec 6 14:55:33 2006
From: maoj at helix.nih.gov (Jean Mao)
Date: Wed, 6 Dec 2006 09:55:33 -0500
Subject: [EMBOSS] Question regarding dbxflat entry number processed
Message-ID: <000a01c71946$94beb600$be4de780@CIT.NIH.GOV>

Hi,

I am using dbxflat to index a database. I would like to find out how many entries were processed. In the index file database.pxid, there is a line:

Count 123456

which is very close to the number of entries in the database file but not exactly the same. Is there a way to find out?

Thank you very much.

Jean Mao

From ajb at ebi.ac.uk Wed Dec 6 15:23:44 2006
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Wed, 6 Dec 2006 15:23:44 -0000 (GMT)
Subject: [EMBOSS] Question regarding dbxflat entry number processed
In-Reply-To: <000a01c71946$94beb600$be4de780@CIT.NIH.GOV>
References: <000a01c71946$94beb600$be4de780@CIT.NIH.GOV>
Message-ID: <48753.81.98.244.247.1165418624.squirrel@webmail.ebi.ac.uk>

Hi Jean,

Usually you just need to add 1 to the 'Count' value, as the counting works from 0 to n-1 rather than 1 to n. However, if there are duplicate keys in the database then that cannot be relied upon: the Count is representative of the number of unique keys at the top level of the tree and does not include any duplicates indexed in a subtree.

HTH

Alan

From JK at novozymes.com Wed Dec 6 15:33:11 2006
From: JK at novozymes.com (JK (Jesper Agerbo Krogh))
Date: Wed, 6 Dec 2006 16:33:11 +0100
Subject: [EMBOSS] Output from seqret in fastaformat.
In-Reply-To: <4576C594.3080609@ebi.ac.uk>
Message-ID: <934F95E71B6C9347A873C42AE3C196191386B8DD@NZT0004E.dknz.nzcorp.net>

Hi,

> > Use -osdbname UNIPROT in the command line.

That sort of works...
but that gives me the DATABASE:ID, not the DATABASE:AC, in the FASTA header.

What's the actual difference between the ID and the accession numbers?

--
Jesper Krogh

From pmr at ebi.ac.uk Wed Dec 6 17:30:58 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 6 Dec 2006 17:30:58 -0000 (GMT)
Subject: [EMBOSS] Output from seqret in fastaformat.
In-Reply-To: <934F95E71B6C9347A873C42AE3C196191386B8DD@NZT0004E.dknz.nzcorp.net>
References: <4576C594.3080609@ebi.ac.uk> <934F95E71B6C9347A873C42AE3C196191386B8DD@NZT0004E.dknz.nzcorp.net>
Message-ID: <14912.193.173.109.1.1165426258.squirrel@webmail.ebi.ac.uk>

Hi Jesper,

>> Use -osdbname UNIPROT in the command line.
>
> That sort of works... but that gives me the DATABASE:ID not the
> DATABASE:AC in the fasta-header.

Yup, you need to redefine the ID as well with -sid Q12345

> What's the actual difference between the ID and the accession numbers?

The ID is the identifier on the ID line of the entry. The accession number is from the AC line: also a unique identifier, but completely unmemorable. Given the choice, we prefer the real ID.

Entries can also have more than one accession number (more common for EMBL entries than for UniProt) where entries are merged or changed.

entret will show you the full entry so you can see where the identifiers come from.

Hope that helps,

Peter

From mincloud at gmail.com Thu Dec 7 20:36:03 2006
From: mincloud at gmail.com (yun zheng)
Date: Thu, 7 Dec 2006 14:36:03 -0600
Subject: [EMBOSS] how to find unique DNA sequences from a large database
Message-ID: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com>

Hi,

Are there any tools for finding unique sequences in a large database? Many thanks.

I need to find unique DNA sequences from a large database. A short piece is given as follows.

>001
aaaagttgtgtgtgtatgacaggtt
>013
aacctgtcatacacacacaactttt
>289
gttgtgtgtgtatgacaggtt
>375
tgtgtgtatgacaggttgat
>319
tcaacctgtcatacacaca
>177
cgcagtgtgtgtatgacagg
>271
gtcctacctgtcatacacac
>020
aagacataatgtgtgtatgacag

All these seem to be the same sequence, since BLASTN gives very small e-values for their alignments.

BLASTN 2.2.8 [Jan-05-2004]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Query= 001
         (25 letters)

Database: drought-clustered.fa
           410 sequences; 8877 total letters

Searching.done

                                               Score    E
Sequences producing significant alignments:    (bits)  Value

013                                              50    8e-11
001                                              50    8e-11
289                                              42    2e-08
375                                              34    5e-06
319                                              34    5e-06
177                                              32    2e-05
271                                              30    8e-05
020                                              28    3e-04

Best regards.

sincerely

Zheng, Yun
Department of Computer Science
Washington University in St Louis
Campus Box 1045
1 Brookings Drive, St Louis, MO 63130

From mthon at tamu.edu Thu Dec 7 23:55:00 2006
From: mthon at tamu.edu (Michael Thon)
Date: Thu, 7 Dec 2006 17:55:00 -0600
Subject: [EMBOSS] how to find unique DNA sequences from a large database
In-Reply-To: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com>
References: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com>
Message-ID: <7F1F24A9-10FD-462D-BD63-349AD4538EB9@tamu.edu>

Hi Yun,

You might try a clustering algorithm like blastclust (single-linkage clustering) or mcl (a.k.a. tribe-mcl) or one of the others that exist. I can't think of any EMBOSS apps that would solve this problem, but maybe someone else has a better answer.

Mike

On Dec 7, 2006, at 2:36 PM, yun zheng wrote:

> Hi,
>
> Are there any tools for finding unique sequences in a large database? Many thanks.
>
> I need to find unique DNA sequences from a large database. A short piece is given as follows.
>
> >001
> aaaagttgtgtgtgtatgacaggtt
> >013
> aacctgtcatacacacacaactttt
> >289
> gttgtgtgtgtatgacaggtt
> >375
> tgtgtgtatgacaggttgat
> >319
> tcaacctgtcatacacaca
> >177
> cgcagtgtgtgtatgacagg
> >271
> gtcctacctgtcatacacac
> >020
> aagacataatgtgtgtatgacag
>
> All these seem to be the same sequence, since BLASTN gives very small e-values for their alignments.
>
> BLASTN 2.2.8 [Jan-05-2004]
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
>
> Query= 001
>          (25 letters)
>
> Database: drought-clustered.fa
>            410 sequences; 8877 total letters
>
> Searching.done
>
>                                                Score    E
> Sequences producing significant alignments:    (bits)  Value
>
> 013                                              50    8e-11
> 001                                              50    8e-11
> 289                                              42    2e-08
> 375                                              34    5e-06
> 319                                              34    5e-06
> 177                                              32    2e-05
> 271                                              30    8e-05
> 020                                              28    3e-04
>
> Best regards.
>
> sincerely
>
> Zheng, Yun
> Department of Computer Science
> Washington University in St Louis
> Campus Box 1045
> 1 Brookings Drive, St Louis, MO 63130

From ztu at msi.umn.edu Fri Dec 8 01:30:38 2006
From: ztu at msi.umn.edu (Zheng Jin Tu)
Date: Thu, 7 Dec 2006 19:30:38 -0600 (CST)
Subject: [EMBOSS] how to find unique DNA sequences from a large database
In-Reply-To: <7F1F24A9-10FD-462D-BD63-349AD4538EB9@tamu.edu>
References: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com> <7F1F24A9-10FD-462D-BD63-349AD4538EB9@tamu.edu>
Message-ID:

These are not the best ways to do it, but they are workable solutions:

First, for each sequence in your database, make one long sequence string.
Then use a for loop to scan over the long sequence string with a window the size of your search sequence. Do this for every sequence in the database. It may take a few days if you need to scan big databases such as the human genome.

The other way is to elongate your short query to 17 or 21 nt (I am not sure which is the shortest length that BLAST works with) so that BLAST can search with it. That means that if you have a 15 nt oligo, you can create four x four possible 17 nt sequences, such as:

AAACCCGGGC CCTTTAAaa
AAACCCGGGC CCTTTAAag
AAACCCGGGC CCTTTAAac
AAACCCGGGC CCTTTAAat
AAACCCGGGC CCTTTAAga
AAACCCGGGC CCTTTAAgg
AAACCCGGGC CCTTTAAgc
AAACCCGGGC CCTTTAAgt
AAACCCGGGC CCTTTAAca
AAACCCGGGC CCTTTAAct
AAACCCGGGC CCTTTAAcg
AAACCCGGGC CCTTTAAcc
.....

Then run BLAST and combine the results from all 16 of the 17-nt sequences as the hits for your 15 nt query sequence.

Hope this is useful.

Thanks,

TU

==================================

On Thu, 7 Dec 2006, Michael Thon wrote:

> Hi Yun, you might try a clustering algorithm like blastclust (single-linkage clustering) or mcl (a.k.a. tribe-mcl) or one of the others that exist. I can't think of any EMBOSS apps that would solve this problem, but maybe someone else has a better answer.
> Mike
>
> On Dec 7, 2006, at 2:36 PM, yun zheng wrote:
>
>> Hi,
>>
>> Are there any tools for finding unique sequences in a large database? Many thanks.
>>
>> I need to find unique DNA sequences from a large database. A short piece is given as follows.
>>
>> >001
>> aaaagttgtgtgtgtatgacaggtt
>> >013
>> aacctgtcatacacacacaactttt
>> >289
>> gttgtgtgtgtatgacaggtt
>> >375
>> tgtgtgtatgacaggttgat
>> >319
>> tcaacctgtcatacacaca
>> >177
>> cgcagtgtgtgtatgacagg
>> >271
>> gtcctacctgtcatacacac
>> >020
>> aagacataatgtgtgtatgacag
>>
>> All these seem to be the same sequence, since BLASTN gives very small e-values for their alignments.
>>
>> BLASTN 2.2.8 [Jan-05-2004]
>>
>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>> Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>
>> Query= 001
>>          (25 letters)
>>
>> Database: drought-clustered.fa
>>            410 sequences; 8877 total letters
>>
>> Searching.done
>>
>>                                                Score    E
>> Sequences producing significant alignments:    (bits)  Value
>>
>> 013                                              50    8e-11
>> 001                                              50    8e-11
>> 289                                              42    2e-08
>> 375                                              34    5e-06
>> 319                                              34    5e-06
>> 177                                              32    2e-05
>> 271                                              30    8e-05
>> 020                                              28    3e-04
>>
>> Best regards.
>>
>> sincerely
>>
>> Zheng, Yun
>> Department of Computer Science
>> Washington University in St Louis
>> Campus Box 1045
>> 1 Brookings Drive, St Louis, MO 63130

From pmr at ebi.ac.uk Fri Dec 8 08:37:32 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 8 Dec 2006 08:37:32 -0000 (GMT)
Subject: [EMBOSS] how to find unique DNA sequences from a large database
In-Reply-To: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com>
References: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com>
Message-ID: <1636.217.44.134.240.1165567052.squirrel@webmail.ebi.ac.uk>

Dear Yun Zheng,

> Are there any tools for finding unique sequences in a large database? Many thanks.
>
> I need to find unique DNA sequences from a large database. A short piece is given as follows.
>
> All these seem to be the same sequence, since BLASTN gives very small e-values for their alignments.

Remember that BLASTN is a local alignment tool.
The small e-values indicate that some part of your 001 query sequence is similar to some part of a sequence in the database.

You need to check what is matching in the alignments reported by BLASTN. One useful test is whether the whole length of your query matches any of the sequences in the database, and, for DNA, whether it matches in one or both directions (as sequences can have biologically significant inverted repeats).

There are tools (not in EMBOSS) available for building non-redundant databases: excluding sequences which are subsequences of others in the database, or selecting one of a set of sequences that match closely over their whole length. But you do have to decide what you mean by redundancy and make sure that the methods you apply are appropriate.

Hope that helps,

Peter Rice

From mincloud at gmail.com Fri Dec 8 18:50:40 2006
From: mincloud at gmail.com (yun zheng)
Date: Fri, 8 Dec 2006 12:50:40 -0600
Subject: [EMBOSS] how to find unique DNA sequences from a large database
In-Reply-To: <1636.217.44.134.240.1165567052.squirrel@webmail.ebi.ac.uk>
References: <8f6eb9540612071236i27bf5d28k1e921d220ea0d9b5@mail.gmail.com> <1636.217.44.134.240.1165567052.squirrel@webmail.ebi.ac.uk>
Message-ID: <8f6eb9540612081050n2e9b745lb28b79eb9dffb82f@mail.gmail.com>

Dear All,

Many thanks for your replies.

Best regards.

sincerely

zheng, yun

On 12/8/06, pmr at ebi.ac.uk wrote:
>
> Dear Yun Zheng,
>
> > Are there any tools for finding unique sequences in a large database? Many thanks.
> >
> > I need to find unique DNA sequences from a large database. A short piece is given as follows.
> >
> > All these seem to be the same sequence, since BLASTN gives very small e-values for their alignments.
>
> Remember that BLASTN is a local alignment tool. The small e-values indicate that some part of your 001 query sequence is similar to some part of a sequence in the database.
>
> You need to check what is matching in the alignments reported by BLASTN.
> One useful test is whether the whole length of your query matches any of the sequences in the database, and, for DNA, whether it matches in one or both directions (as sequences can have biologically significant inverted repeats).
>
> There are tools (not in EMBOSS) available for building non-redundant databases: excluding sequences which are subsequences of others in the database, or selecting one of a set of sequences that match closely over their whole length. But you do have to decide what you mean by redundancy and make sure that the methods you apply are appropriate.
>
> Hope that helps,
>
> Peter Rice

From bobwohlhueter at earthlink.net Tue Dec 12 22:32:13 2006
From: bobwohlhueter at earthlink.net (Robert Wohlhueter)
Date: Tue, 12 Dec 2006 17:32:13 -0500
Subject: [EMBOSS] jemboss standalone installation problems on MacBook/Intel
Message-ID: <457F2DED.7040903@earthlink.net>

I am trying to run emboss/jemboss as a standalone on a MacBookPro/Intel under OS 10.4. I downloaded and installed it from fink.sourceforge.net (in fink's preferred /sw/share/EMBOSS tree). [In trying to circumvent the problems described below, I also built an entirely separate installation, in the /usr/local tree, from source code at emboss.sourceforge.net. I get exactly the same set of errors with that installation.]

Facts as best I comprehend them:

1) The envvars, including CLASSPATH, JEMBOSS_HOME, EMBOSS_INSTALL, etc., are set as specified in $EMBOSS_INSTALL/jemboss/runJemboss.sh, and seem correct to me. The jars, *.class, *.java, and the several executables that comprise the emboss suite of applications are, as near as I can tell, all present.

2) The Java runtime is the one bundled by Apple with OS 10.4, namely:

{summer:~}66 bobw$ java -version
java version "1.5.0_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-112)
Java HotSpot(TM) Client VM (build 1.5.0_06-64, mixed mode, sharing)

3) As suggested in the EMBOSS Administrator's Guide, testing the installation with

$ wossname -auto

works as expected, listing countless applications in various categories.

4) When I try to run `java $JEMBOSS_HOME/org/emboss/jemboss/Jemboss local &`:

{summer:~}68 bobw$ jemboss
[1] 736
{summer:~}69 bobw$ Exception in thread "main" java.lang.NoClassDefFoundError: /sw/share/EMBOSS/jemboss/org/emboss/jemboss/Jemboss

I'll worry about this later.

5) The nub of the problem is that when I run `java org.emboss.jemboss.Jemboss local &` from JEMBOSS_HOME, I get the following messages:

{summer:/sw/share/EMBOSS/jemboss}60 bobw$ Exception in thread "Thread-2" java.lang.NullPointerException
        at org.emboss.jemboss.gui.BuildProgramMenu$1.construct(BuildProgramMenu.java:278)
        at org.emboss.jemboss.gui.SwingWorker$2.run(SwingWorker.java:127)
        at java.lang.Thread.run(Thread.java:613)

I'm not a Java programmer, but when I look at the source code in BuildProgramMenu.java, it looks like the error arises in a routine which is trying to construct a "dataFile" file specification from data in an object called "mysettings". I don't see how or where "mysettings" is defined, but I suspect that it is intended to read data from my local settings (envvars, emboss.defaults ??), is not able to, and thus passes null information to the new datafile specification.

Can anybody elucidate the source of data in "mysettings" and give me a hint what I need to do to supply it?

Thanks for any and all pointers,

Bob Wohlhueter

From tjc at sanger.ac.uk Wed Dec 13 07:11:18 2006
From: tjc at sanger.ac.uk (Tim Carver)
Date: Wed, 13 Dec 2006 07:11:18 +0000
Subject: [EMBOSS] jemboss standalone installation problems on MacBook/Intel
In-Reply-To: <457F2DED.7040903@earthlink.net>
Message-ID:

Hi Robert,

Jemboss has not been set up to work with fink. You do need to use the EMBOSS download (including any patches) and install using the script as described at:

http://emboss.sourceforge.net/Jemboss/install/standalone.html

Also make sure you have the latest Java from:

http://www.apple.com/downloads/macosx/apple/j2se50release4intel.html

Regards

Tim Carver

On 12/12/06 22:32, "Robert Wohlhueter" wrote:

> Trying to run emboss/jemboss as a standalone on MacBookPro/intel under OS 10.4. Downloaded and installed from fink.sourceforge.net (in fink's preferred /sw/share/EMBOSS tree). [In trying to circumvent the problems described below, I also built, in the /usr/local tree, from source code at emboss.sourceforge.net, an entirely separate installation. I get exactly the same set of errors with that installation.]
>
> Facts as best I comprehend them:
> 1) the envvars, including CLASSPATH, JEMBOSS_HOME, EMBOSS_INSTALL, etc., are set as specified in $EMBOSS_INSTALL/jemboss/runJemboss.sh, and seem correct to me. jars, *.class, *.java, and the several executables that comprise the emboss suite of applications, as near as I can tell are all present.
>
> 2) java runtime is that bundled by Apple with OS 10.4, namely
> {summer:~}66 bobw$ java -version
> java version "1.5.0_06"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-112)
> Java HotSpot(TM) Client VM (build 1.5.0_06-64, mixed mode, sharing)
>
> 3) As suggested in Emboss Administrator's Guide, testing the installation with
> $ wossname -auto
> works as expected, listing countless applications in various categories.
>
> 4) When I try to run `java $JEMBOSS_HOME/org/emboss/jemboss/Jemboss local &`
> {summer:~}68 bobw$ jemboss
> [1] 736
> {summer:~}69 bobw$ Exception in thread "main" java.lang.NoClassDefFoundError: /sw/share/EMBOSS/jemboss/org/emboss/jemboss/Jemboss
> I'll worry about this later.
>
> 5) The nub of the problem is when I run `java org.emboss.jemboss.Jemboss local &` from JEMBOSS_HOME, I get the following messages:
>
> {summer:/sw/share/EMBOSS/jemboss}60 bobw$ Exception in thread "Thread-2" java.lang.NullPointerException
>         at org.emboss.jemboss.gui.BuildProgramMenu$1.construct(BuildProgramMenu.java:278)
>         at org.emboss.jemboss.gui.SwingWorker$2.run(SwingWorker.java:127)
>         at java.lang.Thread.run(Thread.java:613)
>
> I'm not a java programmer, but when I look at source code in BuildProgramMenu.java, it looks like the error arises in a routine which is trying to construct a "dataFile" file specification from data in a object called "mysettings". I don't see how/where "mysettings" is defined, but I'm suspicious that it is intended to read data from my local settings (envvars, emboss.defaults ??), is not able to, and thus passes null information to the new datafile specification.
>
> Can anybody elucidate the source of data in "mysettings" and give me a hint what I need to do to supply it?
>
> Thanks for any and all pointers,
>
> Bob Wohlhueter

From pandey.gaurav at gmail.com Thu Dec 14 04:43:06 2006
From: pandey.gaurav at gmail.com (Gaurav Pandey)
Date: Wed, 13 Dec 2006 22:43:06 -0600
Subject: [EMBOSS] New Review: Computational Approaches for Protein Function Prediction
Message-ID: <627ca1900612132043l48a54760ibda9de67fe35a8f5@mail.gmail.com>

[Apologies if you receive this more than once]

Dear Colleague,

We are pleased to share with you a recent review of several hundred papers in the field of computational protein function prediction:

Title: Computational Approaches for Protein Function Prediction: A Survey
Authors: Gaurav Pandey, Vipin Kumar and Michael Steinbach
Available at: http://www.cs.umn.edu/~kumar/papers/survey.php

Abstract

Proteins are the most essential and versatile macromolecules of life, and the knowledge of their functions is a crucial link in the development of new drugs, better crops, and even the development of synthetic biochemicals such as biofuels. Experimental procedures for protein function prediction are inherently low throughput and are thus unable to annotate a non-trivial fraction of proteins that are becoming available due to rapid advances in genome sequencing technology. This has motivated the development of computational techniques that utilize a variety of high-throughput experimental data for protein function prediction, such as protein and genome sequences, gene expression data, protein interaction networks and phylogenetic profiles. Indeed, in a short period of a decade, several hundred articles have been published on this topic. This review aims to discuss this wide spectrum of approaches by categorizing them in terms of the data type they use for predicting function, and thus identify the trends and needs of this very important field.
The survey is expected to be useful for computational biologists and bioinformaticians aiming to get an overview of the field of computational function prediction, and identify areas that can benefit from further research.

Your comments on the article, or any part thereof, are welcome.

Thanks and best regards

Gaurav Pandey (gaurav at cs.umn.edu)
Vipin Kumar (kumar at cs.umn.edu)
Michael Steinbach (steinbac at cs.umn.edu)

From msarachu at biol.unlp.edu.ar Wed Dec 27 14:01:37 2006
From: msarachu at biol.unlp.edu.ar (Martín Sarachu)
Date: Wed, 27 Dec 2006 11:01:37 -0300
Subject: [EMBOSS] wEMBOSS-1.7.1 released
Message-ID: <1167228097.45927cc185874@webmail.biol.unlp.edu.ar>

This release mainly fixes a problem with editing nucList and protList, and also includes some minor changes.

wrappers4EMBOSS-1.5.1 is included in this wEMBOSS release.

wEMBOSS can be downloaded from http://www.wemboss.org

Best regards,

The wEMBOSS team

--
Martín Sarachu
msarachu at biol.unlp.edu.ar
EMBnet Argentina
http://www.ar.embnet.org