From fernan at iib.unsam.edu.ar Tue Oct 2 13:54:05 2007
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Tue, 2 Oct 2007 14:54:05 -0300
Subject: [EMBOSS] problems installing/using TrEMBL
Message-ID: <20071002175405.GA62945@iib.unsam.edu.ar>

Hi,

I've installed TrEMBL in EMBOSS and it seems like I'm having some
problems ...

I've run dbiflat as follows:

dbiflat -dbname trembl -idformat EMBL -directory . -filenames uniprot_trembl.dat -release '37.0' -date '24/07/07' -fields sv,acc,des,key,org

I've put an entry in my emboss.default configuration file and the db
is listed by showdb. Also the db seems to work fine with, for example,
'textsearch':

[fernan at alfa ~]$ textsearch trembl:* 'cyclase'
Search sequence documentation. Slow, use SRS and Entrez!
Output file [a0b532_mettp.textsearch]: stdout
# Search for: cyclase
trembl-id:A0B532_METTP A0B532_METTP A0B532 RNA-3'-phosphate cyclase (EC 6.5.1.4).
trembl-id:A1RWP7_THEPD A1RWP7_THEPD A1RWP7 RNA-3'-phosphate cyclase (EC 6.5.1.4).
trembl-id:A2SR85_METLZ A2SR85_METLZ A2SR85 Cyclase family protein.
trembl-id:A3H5Q9_9CREN A3H5Q9_9CREN A3H5Q9 Magnesium-protoporphyrin IX monomethyl ester (Oxidative) cyclase (EC 1.14.13.81).
trembl-id:A3H7Y6_9CREN A3H7Y6_9CREN A3H7Y6 RNA-3'-phosphate cyclase (EC 6.5.1.4).
trembl-id:A6URB1_METVA A6URB1_METVA A6URB1 Cyclase family protein.
...

First, I've got a number of warnings when running dbiflat. Because all
of them were about null IDs ('') I've just ignored them ... I mention
it just in case,

Warning: Duplicate ID skipped: '' All hits will point to first ID found

Now, when using seqret, it seems like I'm not getting the records I
expect. For example, if I search for the first ID in the example above
(A0B532), I get A0BDZ0 instead:

[fernan at alfa ~]$ seqret trembl:A0B532
Reads and writes (returns) sequences
output sequence(s) [a0bdz0_parte.fasta]: stdout
>A0BDZ0_PARTE A0BDZ0 Chromosome undetermined scaffold_101, whole genome shotgun sequence.
MLNFPQNARDHFSCDCDPCEFAITHGEEIMPKRVPPQKPIQQIQDKDLGLLLRKLQAPNK
LTRSVRIRIPETCVCNEGEIKFIAYYDESEGFIKFIQKPTFQQTKQFLNERRPPDSLAVI
IKYIDNNMQVMTDMEFTILMMKRKIDPIWSQILYIQNFNSNKNYELQHYEFKHSFDSKYP
EFDLARIEILILNGEIARASSDFVPMVREEAYENSLSQDQYCRYMVYKMVHYADVFGGIQ
ITEGKFSFHKKTFISMEKMEYTDLDRKALFDSEILLRKKKMIDEDMFQFQKLIDQNVKKE
REYALKVYREILDMDNGLDQQSHLLKNKLSVIGYDLKKYSQSIQSNFQQVMVSKDPASTL
KELVIEQKVNEEKLTSILKPKKGEKTKKKM

But if I search for A0B532_METTP I get nothing:

[fernan at alfa ~]$ seqret trembl:A0B532_METTP
Reads and writes (returns) sequences
Error: Unable to read sequence 'trembl:A0B532_METTP'
Died: seqret terminated: Bad value for '-sequence' and no prompt

Now, if I search for A0BDZ0, I get A0BL81 instead:

[fernan at alfa ~]$ seqret trembl:A0BDZ0
Reads and writes (returns) sequences
output sequence(s) [a0bl81_parte.fasta]: stdout
>A0BL81_PARTE A0BL81 Chromosome undetermined scaffold_113, whole genome shotgun sequence.
MKQISESAHILQKVYNPNRMNKLFMTTHYQLQNETDLIFDKYMLMPLFGLSVANGISSNC
IKPKYLCSEYKKQELYDCNLILILSAYSDQAVYRSKTMYEKRNGLEQIFKYLASPNYTYN
IHISLLSYFVPQRVFYKQVLQALNIFELIDQKQIEELTKSSSIINQSVGEDNLDSILFKN
QEFIDYQKWRRMLKNNTIINLKTLHQHQLSQQIFCQYFLRYHYYQGCEEEINKLNKFLVD
DFDMFKFRSRLEHNEKKMKFYFLRMLKYFKLNEKLEIFLKFSFKSYSLDWNKELLREMKN
SLNQYKKQ

Any idea about what is wrong?

I also have swissprot installed (pretty much in the same way) and it
works OK with seqret, both using ACs (Q4U9M9) or IDs (104K_THEAN).

This is on a Linux cluster (Rocks 4.2, with EMBOSS installed from the
Bio roll)

[fernan at alfa ~]$ embossversion
Writes the current EMBOSS version number
4.0.0

Thanks in advance for any pointer,

Fernan

From simon.andrews at bbsrc.ac.uk Wed Oct 3 03:37:53 2007
From: simon.andrews at bbsrc.ac.uk (Simon Andrews)
Date: Wed, 3 Oct 2007 08:37:53 +0100
Subject: [EMBOSS] problems installing/using TrEMBL
In-Reply-To: <20071002175405.GA62945@iib.unsam.edu.ar>
References: <20071002175405.GA62945@iib.unsam.edu.ar>
Message-ID:

On 2 Oct 2007, at 18:54, Fernan Aguero wrote:

> Hi,
>
> I've installed TrEMBL in EMBOSS and it seems like I'm having some
> problems ...
>
> I've run dbiflat as follows:
[snip]
>
> Now, when using seqret, it seems like I'm not getting the
> records I expect, for example if I search for the first ID
> in the example above (A0B532), I get A0BDZ0 instead:

I suspect your problem is that your trembl file is >2Gb in size.
Above this size dbiflat won't work properly and will give wacky
results such as the ones you've shown. This won't be a problem with
uniprot_sprot.dat as this is still only about 1.1Gb.

Your choices are therefore:

1) You could split your trembl file into multiple files, each smaller
than 2Gb. This ends up being a complete pain, and you probably don't
want to do it this way.

2) Use the newer dbx* family of indexing programs which can cope with
larger file sizes. In your case you'd use dbxflat instead of
dbiflat. There are some configuration differences between the two so
you should read 'tfm dbxflat' first, but they work pretty much the
same as the old versions. We use the dbx programs for all of our
databases and they work fine.

Hope this helps

Simon.
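
For anyone who does want to try option 1, splitting the flat file into
parts that each stay under the size limit, the idea could be sketched
roughly as below. This is only an illustration, not an EMBOSS tool: the
function names, the part naming scheme and the exact byte ceiling are
assumptions (the thread only says ">2Gb"). The one thing it relies on
from the data is that every entry ends with a line containing just "//".

```python
from pathlib import Path
from typing import Iterable, Iterator, List

# Assumed ceiling: the thread only says ">2Gb"; 2**31 - 1 bytes is a guess
# at the exact figure (the largest signed 32-bit offset).
MAX_PART_BYTES = 2**31 - 1


def iter_entries(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield whole entries, each ending with its '//' terminator line."""
    entry: List[bytes] = []
    for line in lines:
        entry.append(line)
        if line.rstrip(b"\r\n") == b"//":
            yield b"".join(entry)
            entry = []
    if entry:  # keep any trailing lines that lack a terminator
        yield b"".join(entry)


def split_flatfile(src: Path, out_dir: Path,
                   max_bytes: int = MAX_PART_BYTES) -> List[Path]:
    """Copy src into numbered parts of at most max_bytes, never cutting an entry."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    out = None
    written = 0
    try:
        with src.open("rb") as handle:
            for entry in iter_entries(handle):
                if len(entry) > max_bytes:
                    raise ValueError("a single entry exceeds max_bytes")
                # Start a new part when the next entry would not fit.
                if out is None or written + len(entry) > max_bytes:
                    if out is not None:
                        out.close()
                    part = out_dir / f"{src.stem}_{len(parts) + 1:03d}{src.suffix}"
                    parts.append(part)
                    out = part.open("wb")
                    written = 0
                out.write(entry)
                written += len(entry)
    finally:
        if out is not None:
            out.close()
    return parts
```

With uniprot_trembl.dat as input this would write uniprot_trembl_001.dat,
uniprot_trembl_002.dat and so on; the -filenames argument used for
indexing would then have to match all of the parts.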
From gbottu at vub.ac.be Thu Oct 4 06:01:45 2007
From: gbottu at vub.ac.be (Guy Bottu)
Date: Thu, 04 Oct 2007 12:01:45 +0200
Subject: [EMBOSS] Question about acidify
Message-ID: <4704BA09.1000905@vub.ac.be>

Dear Peter, dear Alan, dear all,

I remember that there had been talk of implementing a tool called
acidify that would allow for the easy integration of software under
EMBOSS (with the help of an ACD file but without an elaborate EMBOSS
"wrapper" program). Can someone tell me how far this has gone? I ask
this question because my colleagues of the SIMDAT project have
expressed their interest.

Guy Bottu, BEN

From pmr at ebi.ac.uk Thu Oct 4 06:40:48 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 04 Oct 2007 11:40:48 +0100
Subject: [EMBOSS] Question about acidify
In-Reply-To: <4704BA09.1000905@vub.ac.be>
References: <4704BA09.1000905@vub.ac.be>
Message-ID: <4704C330.6070102@ebi.ac.uk>

Guy Bottu wrote:

> I remember that there had been question of implementing a tool called
> acidify that would allow for the easy integration of software under
> EMBOSS (with the help of an ACD file but without elaborate EMBOSS
> "wrapper" program). Can someone tell me how far this has gone. I ask this
> question because my colleagues of the SIMDAT project have expressed
> their interest.

We are working on making this easier in ACD. I added some functions
when Alan was writing wrappers for MIRA.

We already have ACD extensions for SoapLab to provide additional
definitions for external applications. These are used to generate the
XML definitions used by SoapLab for non-EMBOSS applications, but can
be generally useful.

Do you have examples of the ACD files that would be useful for SIMDAT?
Are any new datatypes involved?

regards,

Peter

From fernan at iib.unsam.edu.ar Thu Oct 4 10:08:22 2007
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Thu, 4 Oct 2007 11:08:22 -0300
Subject: [EMBOSS] problems installing/using TrEMBL
In-Reply-To:
References: <20071002175405.GA62945@iib.unsam.edu.ar>
Message-ID: <20071004140822.GA96432@iib.unsam.edu.ar>

Simon,

thanks for your suggestions. I've been waiting for dbxflat
to finish before replying ... thus the delay.

You mention that there are some configuration
differences between db(x|i)flat ... I guess I've got into those
now ...
even after reading tfm for dbxflat, it seems I can't
just set it up right

===> Configuration

DB trembl [
   type: P
   comment: "TrEMBL 37.0"
   method: emblcd
   format: embl
   dbalias: trembl
   dir: /share/bio/emboss/trembl/
   file: uniprot_trembl.dat
   indexdirectory: /share/bio/emboss/trembl
]

With this configuration, I get this error:

[fernan at alfa ~]$ seqret trembl:A0B532
Reads and writes (returns) sequences
Warning: Cannot open division file '' for database 'trembl'
Warning: seqCdQry failed
Error: Unable to read sequence 'trembl:A0B532'
Died: seqret terminated: Bad value for '-sequence' and no prompt

If I change the 'method' to 'method: emboss'
as per the example in the dbxflat docs, I get this error:

[fernan at alfa ~]$ seqret trembl:A0B532
Reads and writes (returns) sequences

   EMBOSS An error in ajindex.c at line 3028:
Cannot open param file /share/bio/emboss/trembl/trembl.pxid

This file does not exist (see result of indexing below):

===> Indexing

[root at alfa trembl]# dbxflat -dbname trembl -idformat EMBL -directory . -filenames uniprot_trembl.dat -release "37.0" -date "24/07/07" -fields sv,acc,des,key,org
Database b+tree indexing for flat file databases
Resource name: embl
Processing file ./uniprot_trembl.dat

[root at alfa trembl]# du -hc *
4.0K dbxflat.command
4.0K trembl.ent
4.0K trembl.pxac
4.0K trembl.pxde
4.0K trembl.pxkw
4.0K trembl.pxsv
4.0K trembl.pxtx
572M trembl.xac
4.2G trembl.xde
381M trembl.xkw
4.0K trembl.xsv
3.0G trembl.xtx
11G uniprot_trembl.dat
19G total

I've also tried other combinations of 'method' (emboss,
emblcd) and 'format' (swiss, embl) without success ...

Am I indexing the db with the right incantation for dbxflat?
If so, what am I missing in my configuration?
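
A quick way to see which index files are absent, as with the
trembl.pxid file named in the error above, might look like the sketch
below. It is only a guess at the naming scheme, based on the files in
the listing (trembl.xac with trembl.pxac, trembl.xde with trembl.pxde,
and so on); pairing a trembl.xid with the missing trembl.pxid is an
assumption by analogy, and the function name is made up for this
example.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Suffixes taken from the listing in the thread (ac, de, kw, sv, tx) plus
# 'id', which is inferred from the 'trembl.pxid' error message. That every
# field has both an 'x' and a 'px' file is an assumption.
FIELD_SUFFIXES = ("id", "ac", "de", "kw", "sv", "tx")


def missing_index_files(index_dir: Path, dbname: str,
                        fields: Iterable[str] = FIELD_SUFFIXES) -> Dict[str, List[str]]:
    """Map each field suffix to the expected index files that are not on disk."""
    missing: Dict[str, List[str]] = {}
    for field in fields:
        expected = (f"{dbname}.x{field}", f"{dbname}.px{field}")
        absent = [name for name in expected if not (index_dir / name).is_file()]
        if absent:
            missing[field] = absent
    return missing
```

Calling missing_index_files(Path("/share/bio/emboss/trembl"), "trembl")
on the directory listed above would be expected to report the 'id'
files, which is consistent with the error from seqret.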
Thanks again for any pointer,

Fernan

PS: this is on emboss-4.0.0 running on a Rocks Cluster (4.2, CentOS)

From georgios at biotek.uio.no Thu Oct 4 10:53:38 2007
From: georgios at biotek.uio.no (George Magklaras)
Date: Thu, 04 Oct 2007 16:53:38 +0200
Subject: [EMBOSS] problems installing/using TrEMBL
In-Reply-To: <20071004140822.GA96432@iib.unsam.edu.ar>
References: <20071002175405.GA62945@iib.unsam.edu.ar> <20071004140822.GA96432@iib.unsam.edu.ar>
Message-ID: <4704FE72.1090206@biotek.uio.no>

Maybe you are missing the resource record in the emboss.default file
for the trembl databank and you have passed the wrong arguments to
dbxflat.

You should choose the emboss method in the DB entry. Then, the
emboss.default file should also contain a resource entry for trembl:

RES trembl [
   type: Index
   idlen: 15
   acclen: 15
   svlen: 20
   keylen: 30
   deslen: 25
   orglen: 25
]

From the dbxflat output you quote, I can see that the command points
to the embl resource:

[root at alfa trembl]# dbxflat -dbname trembl -idformat EMBL   <--- Why EMBL?
-directory . -filenames uniprot_trembl.dat -release "37.0"
-date "24/07/07" -fields sv,acc,des,key,org
Database b+tree indexing for flat file databases
Resource name: embl   <--- That should say trembl. Why did you choose embl here?

When the dbxflat command asked you for a resource name, you really
should have a trembl RES entry and I am not sure that your idformat
(EMBL) is correct.

GM

--
George Magklaras
Senior Computer Systems Engineer/UNIX Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo, University of Oslo
http://www.biotek.uio.no/
EMBnet Norway: http://www.no.embnet.org/

From fernan at iib.unsam.edu.ar Thu Oct 4 18:41:44 2007
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Thu, 4 Oct 2007 19:41:44 -0300
Subject: [EMBOSS] problems installing/using TrEMBL
In-Reply-To: <4704FE72.1090206@biotek.uio.no>
References: <20071002175405.GA62945@iib.unsam.edu.ar> <4704FE72.1090206@biotek.uio.no>
Message-ID: <20071004224144.GA98760@iib.unsam.edu.ar>

George,

thanks for your points.

| Maybe you are missing the resource record in the emboss.default file for
| the trembl databank and you have passed the wrong arguments to dbxflat.

I have this resource record in my emboss.default conf

RES embl [
   type: Index
   idlen: 15
   acclen: 15
   svlen: 15
   keylen: 25
   deslen: 25
   orglen: 25
]

| You should choose the emboss method in the DB entry.

OK

| Then, the
| emboss.default file should contain also a resource entry for trembl:
|
| RES trembl [
| type: Index
| idlen: 15
| acclen: 15
| svlen: 20
| keylen: 30
| deslen: 25
| orglen: 25
| ]

Does the name of the resource matter? Mine is named 'embl' ...

| From your dbxflat output you quote I can see that the command points to
| the embl resource:
|
| [root at alfa trembl]# dbxflat -dbname trembl -idformat EMBL <--- Why EMBL?

What other options are there? SWISS? GCG? GENBANK? This is AFAIK an
EMBL formatted file. But maybe I'm wrong ...

| -directory . -filenames uniprot_trembl.dat -release "37.0"
| -date "24/07/07" -fields sv,acc,des,key,org
| Database b+tree indexing for flat file databases
| Resource name: embl <--- That should say trembl, Why did you choose
| embl here?

Because the resource in my emboss.default file is named 'embl'.
|
| When the dbxflat command asked you for a resource name, you really
| should have a trembl RES entry and I am not sure that your idformat
| (EMBL) is correct.
|
| GM
| --
| George Magklaras

Mmm ... maybe it's SWISS then?

From the dbxflat docs:

EMBL   : EMBL
SWISS  : Swiss-Prot, SpTrEMBL, TrEMBLnew
GB     : Genbank, DDBJ
REFSEQ : Refseq
Entry format [SWISS]:

Thanks for your questions and pointers. I'm running dbxflat overnight
again to see if this makes any difference (-idformat SWISS -resource
trembl, with a new trembl RES line added to emboss.default).

But so far, only 6 trembl.* files are being produced and none of them
is called trembl.pxid (as per the error in my previous message).

[root at alfa trembl]# ls trembl.*
trembl.ent trembl.xac trembl.xde trembl.xkw trembl.xsv trembl.xtx

Fernan

PS: this is the first entry in my uniprot_trembl.dat file

[fernan at alfa trembl]$ head -45 uniprot_trembl.dat
ID   A0B532_METTP Unreviewed; 337 AA.
AC   A0B532;
DT   28-NOV-2006, integrated into UniProtKB/TrEMBL.
DT   28-NOV-2006, sequence version 1.
DT   24-JUL-2007, entry version 6.
DE   RNA-3'-phosphate cyclase (EC 6.5.1.4).
GN   OrderedLocusNames=Mthe_0003;
OS   Methanosaeta thermophila (strain DSM 6194 / PT) (Methanothrix
OS   thermophila (strain DSM 6194 / PT)).
OC   Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales;
OC   Methanosaetaceae; Methanosaeta.
OX   NCBI_TaxID=349307;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RG   US DOE Joint Genome Institute;
RA   Copeland A., Lucas S., Lapidus A., Barry K., Detter J.C.,
RA   Glavina del Rio T., Hammon N., Israni S., Pitluck S., Chain P.,
RA   Malfatti S., Shin M., Vergez L., Schmutz J., Larimer F., Land M.,
RA   Hauser L., Kyrpides N., Kim E., Smith K.S., Ingram-Smith C.,
RA   Richardson P.;
RT   "Complete sequence of Methanosaeta thermophila PT.";
RL   Submitted (OCT-2006) to the EMBL/GenBank/DDBJ databases.
CC   -----------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution-NoDerivs License
CC   -----------------------------------------------------------------------
DR   EMBL; CP000477; ABK13806.1; -; Genomic_DNA.
DR   GenomeReviews; CP000477_GR; Mthe_0003.
DR   GO; GO:0003963; F:RNA-3'-phosphate cyclase activity; IEA:InterPro.
DR   InterPro; IPR000228; RNA3'_term_phos_cycl.
DR   InterPro; IPR013796; RNA3'_term_phos_cycl_insert.
DR   PANTHER; PTHR11096; RNA3'_term_phos_cycl; 1.
DR   Pfam; PF01137; RTC; 1.
DR   Pfam; PF05189; RTC_insert; 1.
DR   PROSITE; PS01287; RTC; 1.
PE   4: Predicted;
KW   Complete proteome; Ligase.
SQ   SEQUENCE 337 AA; 36340 MW; 69F26755A1B8DA03 CRC64;
     MNKPQMIEID GSYGEGGGQI VRTSVALSTL TGIPVRIKNI RRNRPRPGLA AQHVRAIEAL
     AQISRAETRG VHLGSEEIEF IPGRISAGSY DVDIGTAGSV TLLIQCLLPA LTAAEGPVTV
     TVRGGTDVRW SPTVDYLEHV ALPAMHLFGV TATFRCERRG YYPRGGGVVV LSTRPSRLRP
     ARLELIEEGI CGISHCGSLP EHVARRQADA ALELLKEKGY DARIDIQTMS SSSPGSGITL
     WSGFRGSSAL GERGVRAEDV GREAAKALID ELKSKASVDV HLADQLIPYI ALAGGEYTTR
     EISSHTRTNI WTAQRILRCR IDIDEGEVFR IHSTGSG
//

From sum732 at mail.usask.ca Fri Oct 5 19:38:01 2007
From: sum732 at mail.usask.ca (Sudeep Mehrotra)
Date: Fri, 05 Oct 2007 17:38:01 -0600
Subject: [EMBOSS] Seqret and searching a database with entries in a file
Message-ID: <986A5EE0-8709-4657-B7CB-84A43513D308@mail.usask.ca>

Hello,

I am wondering if I can use "seqret" from EMBOSS to perform the
following action.

I have a database and I have a file which consists of a list of
protein IDs. I want to use seqret to search for each entry (in the
given file) in the given database and output the results into another
file. For example:

seqret "path to the database":AAT37944.1.
If I use the above-mentioned command on the command line, I get the
output (protein name, protein sequence etc.) in fasta format for that
entry. What I want to do, instead of giving one entry, is to give the
whole file, which consists of similar entries.

Can someone help me here?

Thanks
Sudeep

From david.bauer at bayerhealthcare.com Sat Oct 6 15:13:34 2007
From: david.bauer at bayerhealthcare.com (david.bauer at bayerhealthcare.com)
Date: Sat, 6 Oct 2007 21:13:34 +0200
Subject: [EMBOSS] Seqret and searching a database with entries in a file
In-Reply-To: <986A5EE0-8709-4657-B7CB-84A43513D308@mail.usask.ca>
Message-ID:

Hi Sudeep,

if you add a "@" character in front of a filename, EMBOSS interprets
this as a "file of filenames". So you can put all your IDs, including
the database name, into a file (e.g. myseqs.fof). Then you run
"seqret @myseqs.fof".

Cheers,
David.
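
The list-file approach described in this thread could be scripted
roughly as below. This is a hedged sketch, not part of EMBOSS: the
function names and the database name "mydb" are made up for the
example. It prefixes each ID with the database name and, optionally,
drops a trailing ".N" version so that bare accession numbers are used.

```python
from pathlib import Path
from typing import Iterable, List


def to_list_lines(dbname: str, ids: Iterable[str],
                  strip_version: bool = True) -> List[str]:
    """Turn raw IDs into 'dbname:ID' lines, skipping blank input lines."""
    lines: List[str] = []
    for raw in ids:
        entry_id = raw.strip()
        if not entry_id:
            continue
        if strip_version:
            # 'AAT37944.1' -> 'AAT37944'; IDs without a numeric suffix are kept.
            head, sep, tail = entry_id.rpartition(".")
            if sep and tail.isdigit():
                entry_id = head
        lines.append(f"{dbname}:{entry_id}")
    return lines


def write_list_file(dbname: str, id_file: Path, list_file: Path) -> int:
    """Read one ID per line from id_file and write the list file; return the count."""
    lines = to_list_lines(dbname, id_file.read_text().splitlines())
    list_file.write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

For example, write_list_file("mydb", Path("ids.txt"), Path("myseqs.fof"))
would produce a file that can then be passed to seqret as @myseqs.fof,
as described above.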

From gbottu at vub.ac.be Mon Oct 8 03:12:18 2007
From: gbottu at vub.ac.be (Guy Bottu)
Date: Mon, 08 Oct 2007 09:12:18 +0200
Subject: [EMBOSS] Seqret and searching a database with entries in a file
In-Reply-To: <986A5EE0-8709-4657-B7CB-84A43513D308@mail.usask.ca>
References: <986A5EE0-8709-4657-B7CB-84A43513D308@mail.usask.ca>
Message-ID: <4709D852.20007@vub.ac.be>

Sudeep Mehrotra wrote:

> I have a database and I have a file which consists of list of protein
> IDs. I want use seqret to search each entry (in the given file) in
> the given database and output the search into another file.

Dear Sudeep,

If you can, using some script, transform your file into the format:

xxx:AC3355
xxx:CG6754
xxx:AV6754

with xxx the name of the databank (you might have to use bare
accession numbers rather than version numbers), then it is easy, just
run

seqret list::File

If you want the original entries rather than the entries in fastA
format, use entret instead of seqret.

Guy Bottu, Belgian EMBnet Node

From charles-listes-emboss at plessy.org Mon Oct 8 02:30:50 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Mon, 8 Oct 2007 15:30:50 +0900
Subject: [EMBOSS] About the EMBOSS quick guide.
Message-ID: <20071008063047.GB9819@kunpuu.plessy.org>

Dear EMBOSS developers,

I am a member of a packaging team that takes care of integrating
EMBOSS in Debian. I just realised today that the Quick Guide to EMBOSS
is released under a "noncommercial" licence.

file:///usr/share/EMBOSS/doc/manuals/emboss_qg.pdf

Debian puts a strong emphasis on not mixing programs which do not meet
the "Debian Free Software Guidelines" (DFSG) with the ones which do.
In our case, EMBOSS is free according to the DFSG, but not the Quick
Guide, as restrictions on commercial use do not comply with guideline
number 6:

  No Discrimination Against Fields of Endeavor

  The license must not restrict anyone from making use of the program
  in a specific field of endeavor. For example, it may not restrict
  the program from being used in a business, or from being used for
  genetic research.

From my packager's point of view, the simplest way to solve this
problem would be that you relicence the Quick Guide under a free
licence according to the DFSG, such as BSD or GPL for instance.
Unfortunately, the guide's author, David Martin, left EMBnet and I do
not know how to contact him.

Importantly, the DFSG also require the sources of works distributed in
Debian to be available. If it is possible to relicence the Quick
Guide, could somebody send me its sources? Debian integrates a bug
reporting and tracking system, and having the sources available in
Debian could bring opportunities to receive patches.

Have a nice day,

--
Charles Plessy
Debian-Med packaging team.
http://www.debian.org/devel/debian-med
Wako, Saitama, Japan

From georgios at biotek.uio.no Mon Oct 8 04:59:56 2007
From: georgios at biotek.uio.no (George Magklaras)
Date: Mon, 08 Oct 2007 10:59:56 +0200
Subject: [EMBOSS] problems installing/using TrEMBL
In-Reply-To: <20071004224144.GA98760@iib.unsam.edu.ar>
References: <20071002175405.GA62945@iib.unsam.edu.ar> <4704FE72.1090206@biotek.uio.no> <20071004224144.GA98760@iib.unsam.edu.ar>
Message-ID: <4709F18C.2070304@biotek.uio.no>

Hi Fernan,

Fernan Aguero wrote:
> George,
>
> Does the name of the resource matter? Mine is named 'embl' ...
>

If you plan to have the same values for all databases, no. But I tend
to choose different length values for different databanks, so in that
case, I have a different RES entry for each databank.

> What other options are there SWISS? GCG? GENBANK? This is AFAIK an
> EMBL formatted file. But maybe I'm wrong ...
>

I believe that TrEMBL should be formatted with the SWISS entry format
in dbxflat (-idformat SWISS).

--
George Magklaras
Senior Computer Systems Engineer/UNIX Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo, University of Oslo
http://www.biotek.uio.no/
EMBnet Norway: http://www.no.embnet.org/

From charles-listes-emboss at plessy.org Mon Oct 8 19:38:28 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Tue, 9 Oct 2007 08:38:28 +0900
Subject: [EMBOSS] Bug in degapseq ?
Message-ID: <20071008233828.GA32069@kunpuu.plessy.org>

Dear developers,

If I use degapseq on a file, it prompts me for the name of the output,
but if the data comes from stdin, degapseq crashes. It does not happen
if the name of the output is given.

chouca?~?$ cat toto
>Xenopus-1a
-----MVLLKCEYRDEEEDLTS---ASPCSV--TSSFRSPAT----QTCSSDDEQLLSPT
SP--------------GQHQGEE---NS----------------------------PRCR
RSRGRA-QGKSGETVLKIKKTRRVKANNRERNRMHNLNSALDSLREVLPSLPEDAKLTKI
ETLRFAYNYIWALSETLRLGD-----P-VHRS--AS-----TPAAAI---LV---QDSSS
SQSP-----SWS--CSSSPSS-----S-------CCSFS--PASP----ASST--SDSIE
SWQ---PSELHLNPFMSASSA---FI----
>Xenopus-1b
-----MVLLKCEYRDEVSELTS---VSPCSVSSSSSHPSPAM----QTCSSDDEQLHSPT
SPTL-------THLQQGRDQGEE---NS----------------------------PRCR
RSRAR------GDTVLKIKKTRRVKANNRERNRMHHLNYALDSLREVLPSLPEDAKLTKI
ETLRFAHNYIWALSETLRLAD-----Q-LHGS--TS-----TPAAAI---LV---QDSYP
SLSP-----SWS--CSSSPSS----NS-------CDSFS--PTSP----ASST--SDSIE
YWQ---PSELRLNPFMSAL-----------
>Gallus-2
------MPVKAESPAPAAEDE--L-LLLRLASPAPSASLP-------SSAGEEDEDEEDG
RP-------------RRLQEGA----------------------------------RRAG
RQRGPPRAARTAETAQRIKRSRRLKANNRERNRMHNLNAALDALRDVLPTFPEDAKLTKI
ETLRFAHNYIWALTETLRL----AGAARLGGA--AD-A---APGAA-----A---EG-SP
SPAS-----SWS--GGASPAP-----SA---SPYACTLS--PGSP----AGSA--SD-AE
HW---PPPRGRFAPPPPPHR----CL----

chouca?~?$ cat toto | degapseq stdin
Removes gap characters from sequences
output sequence(s) [xenopus-1a.fasta]:
   EMBOSS An error in ajmess.c at line 1662:
END-OF-FILE reading from user

chouca?~?$ cat toto | degapseq stdin stdout
Removes gap characters from sequences
>Xenopus-1a
MVLLKCEYRDEEEDLTSASPCSVTSSFRSPATQTCSSDDEQLLSPTSPGQHQGEENSPRC
RRSRGRAQGKSGETVLKIKKTRRVKANNRERNRMHNLNSALDSLREVLPSLPEDAKLTKI
ETLRFAYNYIWALSETLRLGDPVHRSASTPAAAILVQDSSSSQSPSWSCSSSPSSSCCSF
SPASPASSTSDSIESWQPSELHLNPFMSASSAFI
>Xenopus-1b
MVLLKCEYRDEVSELTSVSPCSVSSSSSHPSPAMQTCSSDDEQLHSPTSPTLTHLQQGRD
QGEENSPRCRRSRARGDTVLKIKKTRRVKANNRERNRMHHLNYALDSLREVLPSLPEDAK
LTKIETLRFAHNYIWALSETLRLADQLHGSTSTPAAAILVQDSYPSLSPSWSCSSSPSSN
SCDSFSPTSPASSTSDSIEYWQPSELRLNPFMSAL
>Gallus-2
MPVKAESPAPAAEDELLLLRLASPAPSASLPSSAGEEDEDEEDGRPRRLQEGARRAGRQR
GPPRAARTAETAQRIKRSRRLKANNRERNRMHNLNAALDALRDVLPTFPEDAKLTKIETL
RFAHNYIWALTETLRLAGAARLGGAADAAPGAAAEGSPSPASSWSGGASPAPSASPYACT
LSPGSPAGSASDAEHWPPPRGRFAPPPPPHRCL

chouca?~?$ degapseq toto
Removes gap characters from sequences
output sequence(s) [xenopus-1a.fasta]: stdout
>Xenopus-1a
MVLLKCEYRDEEEDLTSASPCSVTSSFRSPATQTCSSDDEQLLSPTSPGQHQGEENSPRC
RRSRGRAQGKSGETVLKIKKTRRVKANNRERNRMHNLNSALDSLREVLPSLPEDAKLTKI
ETLRFAYNYIWALSETLRLGDPVHRSASTPAAAILVQDSSSSQSPSWSCSSSPSSSCCSF
SPASPASSTSDSIESWQPSELHLNPFMSASSAFI
>Xenopus-1b
MVLLKCEYRDEVSELTSVSPCSVSSSSSHPSPAMQTCSSDDEQLHSPTSPTLTHLQQGRD
QGEENSPRCRRSRARGDTVLKIKKTRRVKANNRERNRMHHLNYALDSLREVLPSLPEDAK
LTKIETLRFAHNYIWALSETLRLADQLHGSTSTPAAAILVQDSYPSLSPSWSCSSSPSSN
SCDSFSPTSPASSTSDSIEYWQPSELRLNPFMSAL
>Gallus-2
MPVKAESPAPAAEDELLLLRLASPAPSASLPSSAGEEDEDEEDGRPRRLQEGARRAGRQR
GPPRAARTAETAQRIKRSRRLKANNRERNRMHNLNAALDALRDVLPTFPEDAKLTKIETL
RFAHNYIWALTETLRLAGAARLGGAADAAPGAAAEGSPSPASSWSGGASPAPSASPYACT
LSPGSPAGSASDAEHWPPPRGRFAPPPPPHRCL

Have a nice day,

--
Charles Plessy
http://charles.plessy.org
Wako, Saitama, Japan

From david at compbio.dundee.ac.uk Tue Oct 9 11:56:57 2007
From: david at compbio.dundee.ac.uk (David Martin)
Date: Tue, 09 Oct 2007 16:56:57 +0100
Subject: [EMBOSS] Updating the Quick Guide
Message-ID:

Prompted by Charles' request yesterday I am in the process of updating
the EMBOSS quick guide.
It was last touched about 8 years ago, so comments and suggestions on what is new and what should be dropped would be much appreciated.

..d

From andrespinzon at gmail.com Tue Oct 9 12:32:09 2007
From: andrespinzon at gmail.com (Andres Pinzon)
Date: Tue, 9 Oct 2007 11:32:09 -0500
Subject: [EMBOSS] Updating the Quick Guide
In-Reply-To:
References:
Message-ID: <8968fc7e0710090932g63b77a9k7d83bea25c176349@mail.gmail.com>

David,

I am in the process of writing an EMBOSS book called "Análisis de secuencias usando EMBOSS" ("Molecular sequence analysis using EMBOSS" in English). It will be released under a CC license (and of course Open Source), so maybe some of the book content can be used. If you need help updating the "old" quick guide, please let me know; I'll be more than glad to help.

Regards,

On 10/9/07, David Martin wrote:
> Prompted by charles' request yesterday I am in the process of updating the
> EMBOSS quick guide. it was last touched about 8 years ago so comments and
> suggestions on what is new, and what should be dropped would be much
> appreciated.
>
> ..d
>
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>

--
Andrés Pinzón
http://bioinf.ibun.unal.edu.co/~apinzon/
Bioinformatics Center, Colombia EMBnet node
http://bioinf.ibun.unal.edu.co
Tel +57 3165000 ext 16961
Fax +571 3165415
Micology and Phytopathology Laboratory - Los Andes University.
http://bioinf.uniandes.edu.co
Tel +571 3394949 ext. 2768

From michael.watson at bbsrc.ac.uk Wed Oct 10 09:02:49 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 10 Oct 2007 14:02:49 +0100
Subject: [EMBOSS] XFree86 vs xorg
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F2EA@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

My EMBOSS 5.0 install failed as it couldn't find Xlib.h. On googling, I see this is part of XFree86-devel.
However, as a red hat enterprise linux 4 user, my X windows seems to be the x.org branch rather than XFree86.... So, is there a workaround, or should I overwrite my xorg libraries with XFree86 ones? Thanks Mick The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email and associated attachments has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. From dalloliogm at gmail.com Wed Oct 10 09:23:01 2007 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 10 Oct 2007 15:23:01 +0200 Subject: [EMBOSS] Updating the Quick Guide In-Reply-To: References: Message-ID: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com> You should update the guide on how to install emboss. In particular, explain how to use the .deb and .rpm packages, since a lot of people still try to install emboss by compiling it, and it is a pain. 2007/10/9, David Martin : > Prompted by charles' request yesterday I am in the process of updating the > EMBOSS quick guide. it was last touched about 8 years ago so comments and > suggestions on what is new, and what should be dropped would be much > appreciated. 
> > ..d > > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From ajb at ebi.ac.uk Wed Oct 10 09:30:09 2007 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Wed, 10 Oct 2007 14:30:09 +0100 (BST) Subject: [EMBOSS] XFree86 vs xorg In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F2EA@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9505A4F2EA@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <50101.81.98.241.17.1192023009.squirrel@webmail.ebi.ac.uk> Hello Mick, For xorg all you need to do is to install the xorg-x11-proto-devel RPM and then, in EMBOSS-5.0.0, do a 'make clean' and configure again. You might want to install the gd-devel RPM at the same time (to get PNG support). If you install them both using 'yum' then all the dependencies will be pulled-in. HTH Alan > Hi > > My EMBOSS 5.0 install failed as it couldn't find Xlib.h. On googling, I > see this is part of XFree86-devel. However, as a red hat enterprise > linux 4 user, my X windows seems to be the x.org branch rather than > XFree86.... > > So, is there a workaround, or should I overwrite my xorg libraries with > XFree86 ones? > > Thanks > Mick > > The information contained in this message may be confidential or legally > privileged and is intended solely for the addressee. If you have > received this message in error please delete it & notify the originator > immediately. > Unauthorised use, disclosure, copying or alteration of this message is > forbidden & may be unlawful. > The contents of this e-mail are the views of the sender and do not > necessarily represent the views of the Institute. > This email and associated attachments has been checked locally for > viruses but we can accept no responsibility once it has left our > systems. 
> Communications on Institute computers are monitored to secure the > effective operation of the systems and for other lawful purposes. > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > From ajb at ebi.ac.uk Wed Oct 10 09:39:26 2007 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Wed, 10 Oct 2007 14:39:26 +0100 (BST) Subject: [EMBOSS] Updating the Quick Guide In-Reply-To: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com> References: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com> Message-ID: <42689.81.98.241.17.1192023566.squirrel@webmail.ebi.ac.uk> > You should update the guide on how to install emboss. > > In particular, explain how to use the .deb and .rpm packages, since a > lot of people still try to install emboss by compiling it, and it is a > pain. I'll leave that up to David to decide but the information is in the new FAQ which, yesterday, I submitted to my colleagues for approval and will then appear in CVS and later online. There was already some RPM info around but no .deb stuff. The info will also be in the books which Jon mentioned recently. Alan From charles-listes-emboss at plessy.org Wed Oct 10 09:24:08 2007 From: charles-listes-emboss at plessy.org (Charles Plessy) Date: Wed, 10 Oct 2007 22:24:08 +0900 Subject: [EMBOSS] XFree86 vs xorg In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F2EA@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9505A4F2EA@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <20071010132408.GJ990@kunpuu.plessy.org> Le Wed, Oct 10, 2007 at 02:02:49PM +0100, michael watson (IAH-C) a ?crit : > Hi > > My EMBOSS 5.0 install failed as it couldn't find Xlib.h. On googling, I > see this is part of XFree86-devel. However, as a red hat enterprise > linux 4 user, my X windows seems to be the x.org branch rather than > XFree86.... 
In Xorg, the libraries have been split into individual packages. I think that you can find Xlib.h in a package named libx11-devel, or something like this.

Have a nice day,

--
Charles Plessy
http://charles.plessy.org
Wako, Saitama, Japan

From dalloliogm at gmail.com Wed Oct 10 10:06:16 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 10 Oct 2007 16:06:16 +0200
Subject: [EMBOSS] Updating the Quick Guide
In-Reply-To: <42689.81.98.241.17.1192023566.squirrel@webmail.ebi.ac.uk>
References: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com> <42689.81.98.241.17.1192023566.squirrel@webmail.ebi.ac.uk>
Message-ID: <5aa3b3570710100706s7ab0a28tebbd30523733826@mail.gmail.com>

2007/10/10, ajb at ebi.ac.uk :
> There was already some RPM
> info around but no .deb stuff. The info will also be in the books
> which Jon mentioned recently.
>

Hi,
there is an emboss 5.0 package in Debian Sid.
You just have to add something like this:

"""
If you are a debian/ubuntu user, you can install emboss by giving the command:
>>> sudo aptitude install emboss
to install the package.
"""

Actually, this would work only for Debian Sid, but I believe the package will also be included in Ubuntu 7.10 and in Debian Etch shortly.

--
-----------------------------------------------------------
My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com

From charles-listes-emboss at plessy.org Wed Oct 10 10:55:35 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Wed, 10 Oct 2007 23:55:35 +0900
Subject: [EMBOSS] possibility of packages for Debian Etch.
In-Reply-To: <5aa3b3570710100706s7ab0a28tebbd30523733826@mail.gmail.com> References: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com> <42689.81.98.241.17.1192023566.squirrel@webmail.ebi.ac.uk> <5aa3b3570710100706s7ab0a28tebbd30523733826@mail.gmail.com> Message-ID: <20071010145535.GK990@kunpuu.plessy.org> Le Wed, Oct 10, 2007 at 04:06:16PM +0200, Giovanni Marco Dall'Olio a ?crit : > hi, > there is an emboss 5.0 package in debian sid. > > Actually, this would work only for Debian Sid, but I believe the > package will be included also in Ubuntu 7/10 and in debian etch in the > short time. Dear Giovanni, Because Debian Etch is the stable version, it does not receive new packages unless they fix security issues or grave bugs. The emboss package for Debian will never be part of Etch nor its updates. However, some Debian developpers provides a separate repository in which only official developers upload recent packages recompiled for Etch. The site is called backports.org. If you or another reader is interested, we can prepare such a backport for Etch. Have a nice day, -- Charles Plessy Debian-Med packaging team Wako, Saitama, Japan From david at compbio.dundee.ac.uk Wed Oct 10 10:30:24 2007 From: david at compbio.dundee.ac.uk (David Martin) Date: Wed, 10 Oct 2007 15:30:24 +0100 Subject: [EMBOSS] Updating the Quick Guide In-Reply-To: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com> Message-ID: On 10/10/07 14:23, "Giovanni Marco Dall'Olio" wrote: > You should update the guide on how to install emboss. > > In particular, explain how to use the .deb and .rpm packages, since a > lot of people still try to install emboss by compiling it, and it is a > pain. > > > 2007/10/9, David Martin : >> Prompted by charles' request yesterday I am in the process of updating the >> EMBOSS quick guide. it was last touched about 8 years ago so comments and >> suggestions on what is new, and what should be dropped would be much >> appreciated. 
>>
>> ..d

The aim of the Quick Guide is to provide a quick reference guide, on one sheet of A4 (two sides), to the common programs and command line arguments that are used with EMBOSS. I found it very useful when teaching as an aide memoire for myself and the students.

Explaining how to install EMBOSS on each architecture is NOT the aim - for that read the admin guide, the maintenance of which Alan and others have taken off my hands. I will however reference the admin guide for installation info.

If you haven't seen the quick guide, a somewhat dated pdf is available in emboss/docs/manuals/emboss_qg.pdf

regards

..d

From Veronique.Martin at jouy.inra.fr Thu Oct 11 03:39:44 2007
From: Veronique.Martin at jouy.inra.fr (Veronique.Martin at jouy.inra.fr)
Date: Thu, 11 Oct 2007 09:39:44 +0200 (CEST)
Subject: [EMBOSS] prosextract option?
Message-ID:

Hi,

I want to run prosextract, but I would like to build the PROSITE motif files in a directory of my choice. At the moment the only possible location is this path:
emboss/share/EMBOSS/data/PROSITE
Is it possible to have an option for choosing the output directory?

I tried using the .embossrc file, but for this database (prosite) only, that file is not taken into account: prosextract used the emboss/share/EMBOSS/emboss.default file.

Regards,

VM

-------------------------------------------------
Véronique MARTIN
INRA - Unité Mathématique, Informatique et Génome
78352 Jouy-en Josas cedex
tel.: 01 34 65 29 74
-------------------------------------------------

From dalloliogm at gmail.com Thu Oct 11 04:36:04 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Thu, 11 Oct 2007 10:36:04 +0200
Subject: [EMBOSS] possibility of packages for Debian Etch.
In-Reply-To: <20071010145535.GK990@kunpuu.plessy.org> References: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com> <42689.81.98.241.17.1192023566.squirrel@webmail.ebi.ac.uk> <5aa3b3570710100706s7ab0a28tebbd30523733826@mail.gmail.com> <20071010145535.GK990@kunpuu.plessy.org> Message-ID: <5aa3b3570710110136y2c32b6e8v614e13cbfd12de44@mail.gmail.com> 2007/10/10, Charles Plessy : > > Because Debian Etch is the stable version, it does not receive new > packages unless they fix security issues or grave bugs. The emboss > package for Debian will never be part of Etch nor its updates. > Really? I didn't know emboss had grave bugs. Are you saying they can't be fixed? I can't find many references to bugs in emboss, but maybe you are referring to bugs like this: - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=427439 ? > However, some Debian developpers provides a separate repository in which > only official developers upload recent packages recompiled for Etch. The > site is called backports.org. > > If you or another reader is interested, we can prepare such a backport > for Etch. Thank you very much: I think many people are interested, expecially from the Ubuntu users community. Emboss is seen as a educational package to learn bioinformatics: so, it would be better if people can install it easily by themselves, instead of asking to a system manager. Maybe you can just add the link to debian backports in the help page. > Have a nice day, > and to you, too! 
> -- > Charles Plessy > Debian-Med packaging team > Wako, Saitama, Japan > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From charles-listes-emboss at plessy.org Thu Oct 11 05:06:13 2007 From: charles-listes-emboss at plessy.org (charles-listes-emboss at plessy.org) Date: Thu, 11 Oct 2007 18:06:13 +0900 Subject: [EMBOSS] possibility of packages for Debian Etch. In-Reply-To: <5aa3b3570710110136y2c32b6e8v614e13cbfd12de44@mail.gmail.com> References: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com> <42689.81.98.241.17.1192023566.squirrel@webmail.ebi.ac.uk> <5aa3b3570710100706s7ab0a28tebbd30523733826@mail.gmail.com> <20071010145535.GK990@kunpuu.plessy.org> <5aa3b3570710110136y2c32b6e8v614e13cbfd12de44@mail.gmail.com> Message-ID: <20071011090613.GA31072@kunpuu.plessy.org> Le Thu, Oct 11, 2007 at 10:36:04AM +0200, Giovanni Marco Dall'Olio a ?crit : > 2007/10/10, Charles Plessy : > > > > Because Debian Etch is the stable version, it does not receive new > > packages unless they fix security issues or grave bugs. The emboss > > package for Debian will never be part of Etch nor its updates. > > > > Really? > I didn't know emboss had grave bugs. Dear Giovanni, I have been unclear. The reason why EMBOSS is not in Debian Etch is because its Debian package was not ready when Etch has been released. Furthermore, it is the policy of Debian to only accept changes related to security or grave bugs. Therefore, Debian Etch will never contain the Debian packages we prepared for EMBOSS. I will announce on this list when the package will be available through backports.org. > Are you saying they can't be fixed? 
> I can't find many references to bugs in emboss, but maybe you are
> referring to bugs like this:
> - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=427439 ?

Yes, the current package still has some quality issues. However, all the ones reported so far are solved in our SVN repository. I hope that I can update the Debian package of EMBOSS in Debian Sid soon.

http://svn.debian.org/wsvn/pkg-emboss/emboss/trunk/debian/changelog?op=file&rev=0&sc=0

(If one explores this repository a bit, one can get a glimpse of what we have in the pipeline...).

> Thank you very much: I think many people are interested, expecially
> from the Ubuntu users community.

By the way, if you ask a member of MOTU Science, I think that it is possible to fast-track the emboss packages into Ubuntu...

> Maybe you can just add the link to debian backports in the help page.

The new packages.debian.org website advertises the backports. See for example the page for OpenOffice.org:

http://packages.debian.org/openoffice.org

Have a nice day,

--
Charles Plessy
Debian-Med packaging team.
Wako, Saitama, Japan

From Laurence.Amilhat at toulouse.inra.fr Thu Oct 11 05:44:40 2007
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Thu, 11 Oct 2007 11:44:40 +0200
Subject: [EMBOSS] plcore.c error when compiling
Message-ID: <470DF088.4020303@toulouse.inra.fr>

Dear Emboss users,

I am trying to install emboss on Linux Ubuntu 7.04 Feisty Fawn.
I downloaded the following tar.gz: EMBOSS-5.0.0.tar.gz
I ran ./configure (I have the graphics libs z, png and gd).

But when I launch make, I get the following message.
Does anyone have an idea why? Did I miss a lib or something?
Thank you for your help, Best regards, Laurence plcore.c: In function 'int text2fci(const char*, unsigned char*, unsigned char*)': plcore.c:459: error: initializer-string for array of chars is too long plcore.c:459: error: initializer-string for array of chars is too long plcore.c:459: error: initializer-string for array of chars is too long plcore.c:459: error: initializer-string for array of chars is too long plcore.c:459: error: initializer-string for array of chars is too long plcore.c:459: error: initializer-string for array of chars is too long plcore.c:459: error: initializer-string for array of chars is too long plcore.c:459: error: initializer-string for array of chars is too long plcore.c:459: error: initializer-string for array of chars is too long plcore.c:459: error: initializer-string for array of chars is too long plcore.c: In function 'void difilt(PLINT*, PLINT*, PLINT, PLINT*, PLINT*, PLINT*, PLINT*)': plcore.c:887: warning: converting to 'int' from 'PLFLT' plcore.c:888: warning: converting to 'int' from 'PLFLT' plcore.c:897: warning: converting to 'PLINT' from 'PLFLT' plcore.c:899: warning: converting to 'PLINT' from 'PLFLT' plcore.c:909: warning: converting to 'int' from 'PLFLT' plcore.c:910: warning: converting to 'int' from 'PLFLT' plcore.c:919: warning: converting to 'int' from 'PLFLT' plcore.c:920: warning: converting to 'int' from 'PLFLT' plcore.c: In function 'void sdifilt(short int*, short int*, PLINT, PLINT*, PLINT*, PLINT*, PLINT*)': plcore.c:946: warning: converting to 'short int' from 'PLFLT' plcore.c:947: warning: converting to 'short int' from 'PLFLT' plcore.c:955: warning: converting to 'short int' from 'PLFLT' plcore.c:956: warning: converting to 'short int' from 'PLFLT' plcore.c:966: warning: converting to 'short int' from 'PLFLT' plcore.c:967: warning: converting to 'short int' from 'PLFLT' plcore.c:976: warning: converting to 'short int' from 'PLFLT' plcore.c:977: warning: converting to 'short int' from 'PLFLT' plcore.c: In 
function 'void pldid2pc(PLFLT*, PLFLT*, PLFLT*, PLFLT*)': plcore.c:1079: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcx(PLINT)' plcore.c:1080: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcy(PLINT)' plcore.c:1081: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcx(PLINT)' plcore.c:1082: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcy(PLINT)' plcore.c: In function 'void pldip2dc(PLFLT*, PLFLT*, PLFLT*, PLFLT*)': plcore.c:1125: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcx(PLINT)' plcore.c:1126: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcy(PLINT)' plcore.c:1127: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcx(PLINT)' plcore.c:1128: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcy(PLINT)' plcore.c: In function 'void calc_didev()': plcore.c:1345: warning: converting to 'PLINT' from 'PLFLT' plcore.c:1346: warning: converting to 'PLINT' from 'PLFLT' plcore.c:1347: warning: converting to 'PLINT' from 'PLFLT' plcore.c:1348: warning: converting to 'PLINT' from 'PLFLT' plcore.c: In function 'void plP_setpxl(PLFLT, PLFLT)': plcore.c:3264: warning: converting to 'PLINT' from 'double' plcore.c:3265: warning: converting to 'PLINT' from 'double' make[2]: *** [plcore.lo] Erreur 1 make[2]: quittant le r?pertoire ? /tmp/EMBOSS-5.0.0/plplot ? make[1]: *** [all-recursive] Erreur 1 make[1]: quittant le r?pertoire ? /tmp/EMBOSS-5.0.0/plplot ? make: *** [all-recursive] Erreur 1 Exit 2 -- ==================================================================== = Laurence Amilhat INRA Toulouse 31326 Castanet-Tolosan = = Tel: 33 5 61 28 53 34 Email: laurence.amilhat at toulouse.inra.fr = ==================================================================== From jison at ebi.ac.uk Thu Oct 11 08:16:19 2007 From: jison at ebi.ac.uk (Jon Ison) Date: Thu, 11 Oct 2007 13:16:19 +0100 (BST) Subject: [EMBOSS] prosextract option? 
In-Reply-To:
References:
Message-ID: <48865.84.92.187.247.1192104979.squirrel@webmail.ebi.ac.uk>

Hi Veronique

prosextract is indeed hard-coded to write to the EMBOSS data directory (defined by the EMBOSS environment variable EMBOSS_DATA).

You could always copy the file to your current working directory or into a directory called ".embossdata" in either your home or current working directory and the file could still be read by EMBOSS.

If that doesn't help an option to write to any specified directory could easily be added - please advise.

Cheers

Jon

>
> Hi,
>
> I want to run prosextract, but I would like build prosite motif in
> directory of my choice. Now the only possibility is in this path :
> emboss/share/EMBOSS/data/PROSITE
> Is it possbile to have got an option for choosing the output directory?
>
> I had tried by using the .embossrc file but only for this database
> (prosite) this file is not considered, prosextract used the
> emboss/share/EMBOSS/emboss.default file.
>
> Regards,
>
> VM
>
> -------------------------------------------------
> Véronique MARTIN
> INRA - Unité Mathématique, Informatique et Génome
> 78352 Jouy-en Josas cedex
> tel.: 01 34 65 29 74
> -------------------------------------------------
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>

From pmr at ebi.ac.uk Wed Oct 24 04:07:21 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 24 Oct 2007 09:07:21 +0100
Subject: [EMBOSS] Bug in degapseq ?
In-Reply-To: <20071008233828.GA32069@kunpuu.plessy.org>
References: <20071008233828.GA32069@kunpuu.plessy.org>
Message-ID: <471EFD39.5060202@ebi.ac.uk>

Charles Plessy wrote:
> If I use degaspeq on a file, it prompts me for the name of the output, but if
> the data comes from stdin, degaspeq crashes. It does not happen if the name of
> the output is given.
> chouca?~?$ cat toto | degapseq stdin
> Removes gap characters from sequences
> output sequence(s) [xenopus-1a.fasta]:
> EMBOSS An error in ajmess.c at line 1662:
> END-OF-FILE reading from user

This is because you are reading from stdin, but then degapseq tries to read the output filename from stdin.

You do need to specify the output filename, or use -auto to accept the default (or -filter to use stdout and to read from stdin).

With -auto and -filter the program will no longer be using stdin for user replies.

Hmmm ... maybe we could catch these cases ... tricky though as really it is an explicit search for "stdin" as an input file/sequence. I could invent examples where we would guess wrongly.

Hope that helps,

Peter

From charles-listes-emboss at plessy.org Wed Oct 24 10:37:06 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Wed, 24 Oct 2007 23:37:06 +0900
Subject: [EMBOSS] Bug in degapseq ?
In-Reply-To: <471EFD39.5060202@ebi.ac.uk>
References: <20071008233828.GA32069@kunpuu.plessy.org> <471EFD39.5060202@ebi.ac.uk>
Message-ID: <20071024143706.GB24491@kunpuu.plessy.org>

On Wed, Oct 24, 2007 at 09:07:21AM +0100, Peter Rice wrote:
>
> You do need to specify the output filename, or use -auto to accept the
> default (or -filter to use stdout and to read from stdin).
>
> With -auto and -filter the program will no longer be using stdin for
> user replies.

Oh, I completely overlooked the fact that the emboss programs can take their user replies from stdin. Maybe then the most straightforward way to inform users of this mistake would be to change the error message to something like: "Error: could not open file '...............'", in which the name of the file would be truncated to the end of the line. The user would quickly understand if the file name is something like AGTCCAGGTA...
Have a nice day,

--
Charles Plessy
Wako, Saitama, Japan

From pmr at ebi.ac.uk Wed Oct 24 12:53:27 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 24 Oct 2007 17:53:27 +0100
Subject: [EMBOSS] Bug in degapseq ?
In-Reply-To: <20071024143706.GB24491@kunpuu.plessy.org>
References: <20071008233828.GA32069@kunpuu.plessy.org> <471EFD39.5060202@ebi.ac.uk> <20071024143706.GB24491@kunpuu.plessy.org>
Message-ID: <471F7887.5050004@ebi.ac.uk>

Charles Plessy wrote:
> Oh, I completely overlooked the fact that the emboss programs can take
> their user replies from stdin. Maybe then the most straightforward to
> inform users from this mistake would be to change the error message to
> something like : "Error: could not open file '...............', in which
> the name of the file would be truncated to the end of the line. The user
> would quickly understand if the file name is someting like AGTCCAGGTA...

Or perhaps they would not quickly understand ... because it took me a few runs before I realised that was the problem :-)

I think we can keep track of stdin being opened in EMBOSS and refuse to prompt for input.

regards,

Peter

From staffa at niehs.nih.gov Wed Oct 24 13:21:37 2007
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Wed, 24 Oct 2007 13:21:37 -0400
Subject: [EMBOSS] GUI interfaces
Message-ID:

Friends
We are preparing for the day GCG goes away by seriously pushing EMBOSS with our users.
This page
http://emboss.sourceforge.net/interfaces/
lists 15 GUIs.
Apparently ColiMate is not an existing GUI to EMBOSS, but a development tool.
Please tell me:
Which of the 15 GUIs listed are complete and available?
Which do you think is best?

Thank you

Nick Staffa
Telephone: 919-316-4569 (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W.
Reter (reter at niehs.nih.gov) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From andrespinzon at gmail.com Wed Oct 24 14:11:18 2007 From: andrespinzon at gmail.com (Andres Pinzon) Date: Wed, 24 Oct 2007 13:11:18 -0500 Subject: [EMBOSS] GUI interfaces In-Reply-To: References: Message-ID: <8968fc7e0710241111odff847dge2d0d16889c16e32@mail.gmail.com> In my experience: [1] wEMBOSS and EMBOSS-Explorer are really easy to configure and provide different user experience that complement each other. [1] http://bioinf.ibun.unal.edu.co/wEMBOSS/ Regards, On 10/24/07, Staffa, Nick (NIH/NIEHS) wrote: > Friends > We are preparing for if ever GCG goes away by seriously pushing EMBOSS > with our users. > This page > http://emboss.sourceforge.net/interfaces/ > lists 15 GUIs. > apparently ColiMate is not an existing GUI to EMBOSS, > but a developement tool. > Please tell me: > Which of the 15 GUIs listed are complete and available? > Which do you think is best? > > Thank you > > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > -- Andr?s Pinz?n http://bioinf.ibun.unal.edu.co/~apinzon/ Bioinformatics Center, Colombia EMBnet node http://bioinf.ibun.unal.edu.co Tel +57 3165000 ext 16961 Fax +571 3165415 Micology and Phytopathology Laboratory - Los Andes University. http://bioinf.uniandes.edu.co Tel +571 3394949 ext. 
2768

From golharam at umdnj.edu Wed Oct 24 13:58:08 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 24 Oct 2007 13:58:08 -0400
Subject: [EMBOSS] GUI interfaces
In-Reply-To:
References:
Message-ID: <471F87B0.8030308@umdnj.edu>

Hi Nick,

We (UMDNJ) migrated off of GCG several years ago. We found most of our users prefer the command-line interface for shell scripting or a web interface for GUI access from their own computers.

We use EMBOSS-Explorer for the web interface. It's (much) cleaner and faster than SeqWeb ever was and doesn't rely on the server storing user data. We removed our responsibility for backing up user data by moving storage from the server to the users themselves. There are no issues with user account management (usernames/passwords) with this system either.

With GCG, we would have at least 1 or 2 user issues per month. Since the switch, I can honestly say our user issues are maybe 1 or 2 per year.

If you have any questions about this, feel free to email me,

Ryan

----------------
Ryan Golhar, PhD
golharam at umdnj.edu
Computational Biologist
Informatics Institute at UMDNJ

Staffa, Nick (NIH/NIEHS) wrote:
> Friends
> We are preparing for if ever GCG goes away by seriously pushing EMBOSS
> with our users.
> This page
> http://emboss.sourceforge.net/interfaces/
> lists 15 GUIs.
> apparently ColiMate is not an existing GUI to EMBOSS,
> but a developement tool.
> Please tell me:
> Which of the 15 GUIs listed are complete and available?
> Which do you think is best?
>
> Thank you
>
> Nick Staffa
> Telephone: 919-316-4569 (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Roy W.
Reter (reter at niehs.nih.gov) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > > From kann.vearasilp at mu.edu Thu Oct 25 15:07:37 2007 From: kann.vearasilp at mu.edu (Kann Vearasilp) Date: Thu, 25 Oct 2007 14:07:37 -0500 Subject: [EMBOSS] Cannot open division file Message-ID: <80455327-9F8B-49EC-801F-3A5DFDE09DD3@mu.edu> Hello everyone, I just finish indexing a genbank database for my lab using dbiflat command. I set up an emboss.default file referenced from emboss.default.template as it was provided. "seqret" is a command that is used to test the system, and it seems that EMBOSS could not find the division file. I can see from the archive that there was this kind of problem with test database provided from emboss as well. (http://emboss.open- bio.org/pipermail/emboss/2005-November/002323.html). However, I am pretty sure that I correctly pointed the path to my database. However, here is my configuration. The system is Mac OS 10.4 1. Emboss was installed from fink at /sw/share/EMBOSS 2. All database was installed in /lab/data/databases/genbank/*.seq 3. Index files are in /lab/data/indices/genbank/??? Here is an example of one of the index directory from my lab. xxx at yyy/lab/data/indices/genbank/mam: acnum.hit des.trg keyword.hit seqvn.hit taxon.trg acnum.trg division.lkp keyword.trg seqvn.trg des.hit entrynam.idx mam.dbiflat taxon.hit 4. 
Here is a fraction from my emboss.default file: # Set location of acd files that describe each program SET emboss_acdroot /sw/share/EMBOSS/acd # Set location of Genbank flatfiles in protein SET emboss_database_dir /lab/data/databases # Set location of Genbank flatfiles indices in protein set emboss_index_dir /lab/data/indices # Set a log file that user can append their records and EMBOSS automatically write log information SET emboss_logfile /sw/share/EMBOSS/log/log # Set Paper size of disc page and is required by the 'dbx' indexing program and 'method: "emblcd" emboss' # Recommended value is 2048 SET PAGESIZE 2048 # Set Caches size required for 'dbx' indexing and 'method emboss'. # It is a page size number to cache. Recommended value is 200 SET CACHESIZE 200 # Set parameter for flat file indices that we have created in # /lab/data/indices/genbank . . . . . DB gbmam [ # required parameters method: "emblcd" format: "GB" type: "N" dir: "\$emboss_database_dir/genbank" file: "gbmam*.seq" # optional parameters fields: "sv des key org" release: "161.0" comment: "Genbank database for mam sequences" indexdir: "\$emboss_index_dir/genbank/mam" ] 5. I run this seqret command to test the system, but it throw error and you can see: xxx at yyy~:seqret gbmam:BC102801 Reads and writes (returns) sequences Warning: Cannot open division file '' for database 'gbmam' Warning: seqCdQry failed Error: Unable to read sequence 'gbmam:BC102801' Died: seqret terminated: Bad value for '-sequence' and no prompt 6. I also run the seqret command in debug mode and this is its log from the command. 
Debug file seqret.dbg buffered:No ajAcdInitP pgm 'seqret' package '' ajFileNewIn '/sw/share/EMBOSS/acd/seqret.acd' EOF ajFileGetsL file /sw/share/EMBOSS/acd/seqret.acd closing file '/sw/share/EMBOSS/acd/seqret.acd' ajFileNewIn '/sw/share/EMBOSS/acd/codes.english' EOF ajFileGetsL file /sw/share/EMBOSS/acd/codes.english closing file '/sw/share/EMBOSS/acd/codes.english' ajTableNewFunctionLen hint 25 size 251 ajTableNewFunctionLen hint 25 size 251 ajTableNewFunctionLen hint 25 size 251 ajFileNewIn '/sw/share/EMBOSS/acd/knowntypes.standard' EOF ajFileGetsL file /sw/share/EMBOSS/acd/knowntypes.standard closing file '/sw/share/EMBOSS/acd/knowntypes.standard' Set acdprotein value '$(sequence.protein)' ajSeqinClear called ++seqUsaProcess 'gbmam:BC102801' 0..0(N) '' 0 USA to test: 'gbmam:BC102801' format regexp: No list:No no format specified in USA ...input format not set dbname dbexp: Yes found dbname 'gbmam' level: '' qry->QryString: 'BC102801' seqQueryFieldC usa 'sv' fields 'sv des key org' seqQueryField test 'sv' seqQueryField match 'sv' ajSeqQueryWild id 'BC102801' acc 'BC102801' sv 'BC102801' gi '' des '' org '' key '' wild (has) query Sv 'BC102801' database type: 'N' format 'GB' use access method 'emblcd' Matched seqAccess[2] 'emblcd' seqAccessEmblcd type 2 directory '\$emboss_index_dir/genbank/mam' entry 'BC102801' acc 'BC102801' hasacc:Yes ajFileNewIn '\$emboss_index_dir/genbank/mam/division.lkp' Database 'gbmam' : access method 'emblcd' failed ajSeqinClear called ++seqUsaProcess 'gbmam:BC102801' 0..0(N) '' 0 USA to test: 'gbmam:BC102801' format regexp: No list:No no format specified in USA ...input format not set dbname dbexp: Yes found dbname 'gbmam' level: '' qry->QryString: 'BC102801' seqQueryFieldC usa 'sv' fields 'sv des key org' seqQueryField test 'sv' seqQueryField match 'sv' ajSeqQueryWild id 'BC102801' acc 'BC102801' sv 'BC102801' gi '' des '' org '' key '' wild (has) query Sv 'BC102801' database type: 'N' format 'GB' use access method 'emblcd' Matched 
seqAccess[2] 'emblcd'
seqAccessEmblcd type 2
directory '\$emboss_index_dir/genbank/mam' entry 'BC102801' acc 'BC102801' hasacc:Yes
ajFileNewIn '\$emboss_index_dir/genbank/mam/division.lkp'
Database 'gbmam' : access method 'emblcd' failed

It seems that EMBOSS could not find the division file. I still don't know what the problem is. Do you have any recommendations?

Thank you so much in advance for any help!

Kann

From ajb at ebi.ac.uk Thu Oct 25 16:22:18 2007
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Thu, 25 Oct 2007 21:22:18 +0100 (BST)
Subject: [EMBOSS] Cannot open division file
In-Reply-To: <80455327-9F8B-49EC-801F-3A5DFDE09DD3@mu.edu>
References: <80455327-9F8B-49EC-801F-3A5DFDE09DD3@mu.edu>
Message-ID: <33572.81.98.241.17.1193343738.squirrel@webmail.ebi.ac.uk>

Dear Kann,

One major problem is your DB entry:

DB gbmam [
# required parameters
method: "emblcd"
format: "GB"
type: "N"
dir: "\$emboss_database_dir/genbank"
file: "gbmam*.seq"
# optional parameters
fields: "sv des key org"
release: "161.0"
comment: "Genbank database for mam sequences"
indexdir: "\$emboss_index_dir/genbank/mam"
]

You should remove the two backslash characters before the '$' characters. I believe they mistakenly appeared in some documentation in the past (possibly as a result of some automatic formatting). It'd be useful if you'd email me off-list and tell me which documentation contained the error (if my guess is correct).

Alan

> Hello everyone,
>
> I just finish indexing a genbank database for my lab using dbiflat
> command.
[snip]

From kann.vearasilp at mu.edu Thu Oct 25 18:06:01 2007
From: kann.vearasilp at mu.edu (Kann Vearasilp)
Date: Thu, 25 Oct 2007 17:06:01 -0500
Subject: [EMBOSS] Cannot open division file
In-Reply-To: <33572.81.98.241.17.1193343738.squirrel@webmail.ebi.ac.uk>
References: <80455327-9F8B-49EC-801F-3A5DFDE09DD3@mu.edu> <33572.81.98.241.17.1193343738.squirrel@webmail.ebi.ac.uk>
Message-ID:

Hello Alan,

Thank you so much for the fast response! It seems that these backslashes caused all my problems. Once I removed them, the program worked flawlessly. :)

Kann

PS. I can find the document and will mail you once I know the version of this emboss tutorial.
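
As an illustration of the fix Alan describes, here is a minimal Python sketch that scans emboss.default-style text for variable references written as `\$name` and rewrites them as `$name`. The helper names (`find_escaped_variables`, `strip_escapes`) and the sample text are hypothetical; nothing here is part of EMBOSS itself, and it only assumes that `#` starts a comment, as in the configuration quoted in this thread.

```python
import re

# Hypothetical helpers, not part of EMBOSS: they look for a backslash
# placed directly in front of a $variable reference.
ESCAPED_VAR = re.compile(r"\\\$\w+")


def find_escaped_variables(config_text):
    """Return (line_number, line) pairs where a backslash precedes '$'."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # ignore anything after a comment marker
        if ESCAPED_VAR.search(code):
            hits.append((number, line.strip()))
    return hits


def strip_escapes(config_text):
    """Return the text with the backslash removed from each escaped variable."""
    return re.sub(r"\\(\$\w+)", r"\1", config_text)


# Sample text modelled on the DB entry quoted in this thread.
SAMPLE = r'''DB gbmam [
method: "emblcd"
dir: "\$emboss_database_dir/genbank"
indexdir: "\$emboss_index_dir/genbank/mam"
]'''

if __name__ == "__main__":
    for number, line in find_escaped_variables(SAMPLE):
        print(number, line)
    print(strip_escapes(SAMPLE))
```

Run against a copy of the real file first; the rewrite is a plain text substitution and does not check whether the variables are actually defined with SET.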
On Oct 25, 2007, at 3:22 PM, ajb at ebi.ac.uk wrote:

> Dear Kann,
>
> One major problem is your DB entry:
[snip]

From kertib at linuxlap.hu Tue Oct 30 06:25:36 2007
From: kertib at linuxlap.hu (kerti Balázs Gábor)
Date: Tue, 30 Oct 2007 11:25:36 +0100
Subject: [EMBOSS] make error
Message-ID: <1193739936.5962.28.camel@genotech>

Hello,

I have a problem building EMBOSS. "configure" ran well, with no errors or missing components, but "make" exits with the messages in the attached make.err file. How can I solve the problem?

Thank you!

Balazs Kerti
Szent Istvan University, Institute of Genetics and Biotechnology
HUN-2103 Godollo, Pater Karoly u. 1.

-------------- next part --------------
Making all in plplot
make[1]: Entering directory `/usr/src/EMBOSS-5.0.0/plplot'
Making all in lib
make[2]: Entering directory `/usr/src/EMBOSS-5.0.0/plplot/lib'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/usr/src/EMBOSS-5.0.0/plplot/lib' make[2]: Entering directory `/usr/src/EMBOSS-5.0.0/plplot' /bin/bash ../libtool --tag=CC --mode=compile gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"EMBOSS\" -DVERSION=\"5.0.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DX_DISPLAY_MISSING=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1 -I. -I./ -I/usr/include/gd -DPREFIX=\"/usr/local\" -DBUILD_DIR=\".\" -DDRV_DIR=\".\" -DEMBOSS_TOP=\"/usr/src/EMBOSS-5.0.0\" -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2 -MT xwin.lo -MD -MP -MF .deps/xwin.Tpo -c -o xwin.lo xwin.c gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"EMBOSS\" -DVERSION=\"5.0.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DX_DISPLAY_MISSING=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1 -I. 
-I./ -I/usr/include/gd -DPREFIX=\"/usr/local\" -DBUILD_DIR=\".\" -DDRV_DIR=\".\" -DEMBOSS_TOP=\"/usr/src/EMBOSS-5.0.0\" -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2 -MT xwin.lo -MD -MP -MF .deps/xwin.Tpo -c xwin.c -fPIC -DPIC -o .libs/xwin.o
make[2]: Leaving directory `/usr/src/EMBOSS-5.0.0/plplot'
make[1]: Leaving directory `/usr/src/EMBOSS-5.0.0/plplot'

From jerome.laroche at bioinfo.ulaval.ca Wed Oct 31 16:46:50 2007
From: jerome.laroche at bioinfo.ulaval.ca (Jérôme Laroche)
Date: Wed, 31 Oct 2007 16:46:50 -0400
Subject: [EMBOSS] dbxflat and size of index files
Message-ID:

Hello,

I use dbxflat to index uniprot (sprot and trembl) flat files for which the size is 1.2 G for sprot and 11 G for trembl. The resulting files are amazingly huge: 11 G. Is it normal?

Another example with Genbank flat files: the division gbsts has a size of 3.3 G. Indexing with dbxflat gives 6.8 G of index files but with dbiflat gives only 199 M of index files. I know it's not necessary to index genbank flat files with dbxflat because each individual file is not bigger than 300 M. I did this just for the demonstration.

Apart from this, all is working very well.

Thank you in advance.

Jérôme Laroche

Centre de bioinformatique et de biologie computationnelle
Université Laval

From ajb at ebi.ac.uk Wed Oct 31 18:07:24 2007
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Wed, 31 Oct 2007 22:07:24 -0000 (GMT)
Subject: [EMBOSS] dbxflat and size of index files
In-Reply-To:
References:
Message-ID: <33217.81.98.241.17.1193868444.squirrel@webmail.ebi.ac.uk>

Hello Jérôme,

Yes, it is normal. It is a combination of three things. First, it is a tree structure; secondly, the tree isn't tightly packed; and thirdly, 64-bit pointers are used throughout. The first will allow on-the-fly updating of the index, the second is for speed of construction/updating and the third is obvious.
Another consideration is that, in some cases, the indexes are trees-of-trees to allow duplicate codes to be indexed (e.g. keywords).

Coincidentally I'm on the lookout for new indexing algorithms at the moment so, if you have a favourite one then we're always open for suggestions.

Alan

> Hello,
>
> I use dbxflat to index uniprot (sprot and trembl) flat files for
> which the size is 1.2 G for sprot and 11 G for trembl. The resulting
> files are amazingly huge: 11 G. Is it normal?
>
> Another example with Genbank flat files: the division gbsts has a
> size of 3.3 G. Indexing with dbxflat give 6.8 G of index files but
> with dbiflat give only 199 M of index files. I know its not necessary
> to index genbank flat files with dbxflat because each individual file
> is not bigger than 300 M. I did this just for the demonstration.
>
> Apart of this, all is working very well.
>
> Thank you in advance.
>
>
> Jérôme Laroche
>
> Centre de bioinformatique et de biologie computationnelle
> Université Laval
>
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>

From fernan at iib.unsam.edu.ar Tue Oct 2 17:54:05 2007
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Tue, 2 Oct 2007 14:54:05 -0300
Subject: [EMBOSS] problems installing/using TrEMBL
Message-ID: <20071002175405.GA62945@iib.unsam.edu.ar>

Hi,

I've installed TrEMBL in EMBOSS and it seems like I'm having some problems ...

I've run dbiflat as follows:

dbiflat -dbname trembl -idformat EMBL -directory . -filenames uniprot_trembl.dat -release '37.0' -date '24/07/07' -fields sv,acc,des,key,org

I've put an entry in my emboss.default configuration file and the db is listed by showdb. Also the db seems to works fine with, for example 'textsearch':

[fernan at alfa ~]$ textsearch trembl:* 'cyclase'
Search sequence documentation. Slow, use SRS and Entrez!
Output file [a0b532_mettp.textsearch]: stdout # Search for: cyclase trembl-id:A0B532_METTP A0B532_METTP A0B532 RNA-3'-phosphate cyclase (EC 6.5.1.4). trembl-id:A1RWP7_THEPD A1RWP7_THEPD A1RWP7 RNA-3'-phosphate cyclase (EC 6.5.1.4). trembl-id:A2SR85_METLZ A2SR85_METLZ A2SR85 Cyclase family protein. trembl-id:A3H5Q9_9CREN A3H5Q9_9CREN A3H5Q9 Magnesium-protoporphyrin IX monomethyl ester (Oxidative) cyclase (EC 1.14.13.81). trembl-id:A3H7Y6_9CREN A3H7Y6_9CREN A3H7Y6 RNA-3'-phosphate cyclase (EC 6.5.1.4). trembl-id:A6URB1_METVA A6URB1_METVA A6URB1 Cyclase family protein. ... First, I've got a number of warnings when running dbiflat. Because all of them were about null IDs ('') I've just ignored them ... I mention it just in case, Warning: Duplicate ID skipped: '' All hits will point to first ID found Now, when using seqret, it seems like I'm not getting the records I expect, for example if I search for the first ID in the example above (A0B532), I get A0BDZ0 instead: [fernan at alfa ~]$ seqret trembl:A0B532 Reads and writes (returns) sequences output sequence(s) [a0bdz0_parte.fasta]: stdout >A0BDZ0_PARTE A0BDZ0 Chromosome undetermined scaffold_101, whole genome shotgun sequence. 
MLNFPQNARDHFSCDCDPCEFAITHGEEIMPKRVPPQKPIQQIQDKDLGLLLRKLQAPNK LTRSVRIRIPETCVCNEGEIKFIAYYDESEGFIKFIQKPTFQQTKQFLNERRPPDSLAVI IKYIDNNMQVMTDMEFTILMMKRKIDPIWSQILYIQNFNSNKNYELQHYEFKHSFDSKYP EFDLARIEILILNGEIARASSDFVPMVREEAYENSLSQDQYCRYMVYKMVHYADVFGGIQ ITEGKFSFHKKTFISMEKMEYTDLDRKALFDSEILLRKKKMIDEDMFQFQKLIDQNVKKE REYALKVYREILDMDNGLDQQSHLLKNKLSVIGYDLKKYSQSIQSNFQQVMVSKDPASTL KELVIEQKVNEEKLTSILKPKKGEKTKKKM But if I search for A0B532_METTP I get nothing: [fernan at alfa ~]$ seqret trembl:A0B532_METTP Reads and writes (returns) sequences Error: Unable to read sequence 'trembl:A0B532_METTP' Died: seqret terminated: Bad value for '-sequence' and no prompt Now, if I search for A0BDZ0, I get A0BL81 instead: [fernan at alfa ~]$ seqret trembl:A0BDZ0 Reads and writes (returns) sequences output sequence(s) [a0bl81_parte.fasta]: stdout >A0BL81_PARTE A0BL81 Chromosome undetermined scaffold_113, whole genome shotgun sequence. MKQISESAHILQKVYNPNRMNKLFMTTHYQLQNETDLIFDKYMLMPLFGLSVANGISSNC IKPKYLCSEYKKQELYDCNLILILSAYSDQAVYRSKTMYEKRNGLEQIFKYLASPNYTYN IHISLLSYFVPQRVFYKQVLQALNIFELIDQKQIEELTKSSSIINQSVGEDNLDSILFKN QEFIDYQKWRRMLKNNTIINLKTLHQHQLSQQIFCQYFLRYHYYQGCEEEINKLNKFLVD DFDMFKFRSRLEHNEKKMKFYFLRMLKYFKLNEKLEIFLKFSFKSYSLDWNKELLREMKN SLNQYKKQ Any idea about what is wrong? I also have swissprot installed (pretty much in the same way) and it works OK with seqret, both using ACs (Q4U9M9) or IDs (104K_THEAN). 
This is on a Linux cluster (Rocks 4.2, with EMBOSS installed from the Bio roll) [fernan at alfa ~]$ embossversion Writes the current EMBOSS version number 4.0.0 Thanks in advance for any pointer, Fernan From simon.andrews at bbsrc.ac.uk Wed Oct 3 07:37:53 2007 From: simon.andrews at bbsrc.ac.uk (Simon Andrews) Date: Wed, 3 Oct 2007 08:37:53 +0100 Subject: [EMBOSS] problems installing/using TrEMBL In-Reply-To: <20071002175405.GA62945@iib.unsam.edu.ar> References: <20071002175405.GA62945@iib.unsam.edu.ar> Message-ID: On 2 Oct 2007, at 18:54, Fernan Aguero wrote: > Hi, > > I've installed TrEMBL in EMBOSS and it seems like I'm having some > problems ... > > I've run dbiflat as follows: [snip] > > Now, when using seqret, it seems like I'm not getting the > records I expect, for example if I search for the first ID > in the example above (A0B532), I get A0BDZ0 instead: I suspect your problem is that your trembl file is >2Gb in size. Above this size dbiflat won't work properly and will give wacky results such as the ones you've shown. This won't be a problem with uniprot_sprot.dat as this is still only about 1.1Gb. Your choices are therefore: 1) You could split your trembl file into multiple files, each smaller than 2Gb. This ends up being a complete pain, and you probably don't want to do it this way. 2) Use the newer dbx* family of indexing programs which can cope with larger file sizes. In your case you'd use dbxflat instead of dbiflat. There are some configuration differences between the two so you should read 'tfm dbxflat' first, but they work pretty much the same as the old versions. We use the dbx programs for all of our databases and they work fine. Hope this helps Simon. 
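
A rough Python sketch of option 1 above, for anyone who does want to split the file. It assumes the flat file follows the EMBL/UniProt convention of ending each entry with a line containing only `//`; the function names and the output naming scheme are made up for illustration, and the 2 GB figure is simply the threshold mentioned above, not a value read from EMBOSS.

```python
import os

TWO_GB = 2 * 1024**3  # threshold mentioned above; an assumption, not an EMBOSS constant


def needs_dbx(path, limit=TWO_GB):
    """Return True if the file is at or above the size limit."""
    return os.path.getsize(path) >= limit


def split_flatfile(path, out_prefix, limit=TWO_GB):
    """Split a flat file into parts smaller than `limit`, keeping entries whole.

    A new part is only started after a '//' terminator line. A single entry
    that is itself larger than `limit` still ends up in one (oversized) part.
    Returns the list of part file names written.
    """
    parts = []
    out = None
    written = 0
    entry = []
    entry_size = 0

    def flush():
        nonlocal out, written, entry, entry_size
        if not entry:
            return
        # Open a new part if adding this entry would reach the limit.
        if out is None or written + entry_size >= limit:
            if out is not None:
                out.close()
            name = "%s.%03d.dat" % (out_prefix, len(parts) + 1)
            out = open(name, "wb")
            parts.append(name)
            written = 0
        out.writelines(entry)
        written += entry_size
        entry = []
        entry_size = 0

    with open(path, "rb") as handle:
        for line in handle:
            entry.append(line)
            entry_size += len(line)
            if line.rstrip() == b"//":
                flush()
    flush()  # trailing lines without a terminator, if any
    if out is not None:
        out.close()
    return parts
```

Each part would then have to be listed in the -filenames pattern when indexing, which is part of why option 2 is usually less trouble.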
From gbottu at vub.ac.be Thu Oct 4 10:01:45 2007 From: gbottu at vub.ac.be (Guy Bottu) Date: Thu, 04 Oct 2007 12:01:45 +0200 Subject: [EMBOSS] Question about acidify Message-ID: <4704BA09.1000905@vub.ac.be> Dear Peter, dear Alan, dear all, I remember that there had been question of implementing a tool called acidify that would allow for the easy integration of software under EMBOSS (with the help of an ACD file but without elaborate EMBOSS "wrapper" progrm). Can someone tell me how far this has gone. I ask this question because my colleagues of the SIMDAT project have expressed their interest. Guy Bottu, BEN From pmr at ebi.ac.uk Thu Oct 4 10:40:48 2007 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 04 Oct 2007 11:40:48 +0100 Subject: [EMBOSS] Question about acidify In-Reply-To: <4704BA09.1000905@vub.ac.be> References: <4704BA09.1000905@vub.ac.be> Message-ID: <4704C330.6070102@ebi.ac.uk> Guy Bottu wrote: > I remember that there had been question of implementing a tool called > acidify that would allow for the easy integration of software under > EMBOSS (with the help of an ACD file but without elaborate EMBOSS > "wrapper" progrm). Can someone tell me how far this has gone. I ask this > question because my colleagues of the SIMDAT project have expressed > their interest. We are working on making this easier in ACD. I added some functions when Alan was writing wrappers for MIRA. We already have ACD extensions for SoapLab to provide additional definitions for external applications. These are used to generate the XML definitions used by SoapLab for non-EMBOSS applications, but can be generally useful. Do you have examples of the ACD files that would be useful for SIMDAT? Are any new datatypes involved? 
regards,

Peter

From fernan at iib.unsam.edu.ar Thu Oct 4 14:08:22 2007
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Thu, 4 Oct 2007 11:08:22 -0300
Subject: [EMBOSS] problems installing/using TrEMBL
In-Reply-To:
References: <20071002175405.GA62945@iib.unsam.edu.ar>
Message-ID: <20071004140822.GA96432@iib.unsam.edu.ar>

| On 2 Oct 2007, at 18:54, Fernan Aguero wrote:
|
| > Hi,
| >
| > I've installed TrEMBL in EMBOSS and it seems like I'm having some
| > problems ...
| >
| > I've run dbiflat as follows:
| [snip]
| >
| > Now, when using seqret, it seems like I'm not getting the
| > records I expect, for example if I search for the first ID
| > in the example above (A0B532), I get A0BDZ0 instead:
|
| I suspect your problem is that your trembl file is >2Gb in size.
| Above this size dbiflat won't work properly and will give wacky
| results such as the ones you've shown. This won't be a problem with
| uniprot_sprot.dat as this is still only about 1.1Gb.
|
| Your choices are therefore:
|
| 1) You could split your trembl file into multiple files, each smaller
| than 2Gb. This ends up being a complete pain, and you probably don't
| want to do it this way.
|
| 2) Use the newer dbx* family of indexing programs which can cope with
| larger file sizes. In your case you'd use dbxflat instead of
| dbiflat. There are some configuration differences between the two so
| you should read 'tfm dbxflat' first, but they work pretty much the
| same as the old versions. We use the dbx programs for all of our
| databases and they work fine.
|
| Hope this helps
|
| Simon.

Simon,

thanks for your suggestions. I've been waiting for dbxflat
to finish before replying ... thus the delay.

You mention that there are some configuration
differences between db(x|i)flat ... I guess I've got into those
now ... even after reading tfm for dbxflat, it seems I can't
just set it up right

===> Configuration
DB trembl [
  type: P
  comment: "TrEMBL 37.0"
  method: emblcd
  format: embl
  dbalias: trembl
  dir: /share/bio/emboss/trembl/
  file: uniprot_trembl.dat
  indexdirectory: /share/bio/emboss/trembl
]

With this configuration, I get this error:
[fernan at alfa ~]$ seqret trembl:A0B532
Reads and writes (returns) sequences
Warning: Cannot open division file '' for database 'trembl'
Warning: seqCdQry failed
Error: Unable to read sequence 'trembl:A0B532'
Died: seqret terminated: Bad value for '-sequence' and no prompt

If I change the 'method' to 'method: emboss'
as per the example in the dbxflat docs, I get this error:

[fernan at alfa ~]$ seqret trembl:A0B532
Reads and writes (returns) sequences

   EMBOSS An error in ajindex.c at line 3028:
Cannot open param file /share/bio/emboss/trembl/trembl.pxid

This file does not exist (see result of indexing below):

===> Indexing
[root at alfa trembl]# dbxflat -dbname trembl -idformat EMBL
-directory . -filenames uniprot_trembl.dat -release "37.0"
-date "24/07/07" -fields sv,acc,des,key,org
Database b+tree indexing for flat file databases
Resource name: embl
Processing file ./uniprot_trembl.dat
[root at alfa trembl]# du -hc *
4.0K    dbxflat.command
4.0K    trembl.ent
4.0K    trembl.pxac
4.0K    trembl.pxde
4.0K    trembl.pxkw
4.0K    trembl.pxsv
4.0K    trembl.pxtx
572M    trembl.xac
4.2G    trembl.xde
381M    trembl.xkw
4.0K    trembl.xsv
3.0G    trembl.xtx
11G     uniprot_trembl.dat
19G     total

I've also tried other combinations of 'method' (emboss,
emblcd) and 'format' (swiss, embl) without success ...

Am I indexing the db with the right incantation for dbxflat?
If so, what am I missing in my configuration?
Thanks again for any pointer,

Fernan

PS: this is on emboss-4.0.0 running on a Rocks Cluster (4.2,
CentOS)

From georgios at biotek.uio.no Thu Oct 4 14:53:38 2007
From: georgios at biotek.uio.no (George Magklaras)
Date: Thu, 04 Oct 2007 16:53:38 +0200
Subject: [EMBOSS] problems installing/using TrEMBL
In-Reply-To: <20071004140822.GA96432@iib.unsam.edu.ar>
References: <20071002175405.GA62945@iib.unsam.edu.ar>
	<20071004140822.GA96432@iib.unsam.edu.ar>
Message-ID: <4704FE72.1090206@biotek.uio.no>

Maybe you are missing the resource record in the emboss.default file for
the trembl databank and you have passed the wrong arguments to dbxflat.

You should choose the emboss method in the DB entry. Then, the
emboss.default file should contain also a resource entry for trembl:

RES trembl [
   type: Index
   idlen:  15
   acclen: 15
   svlen:  20
   keylen: 30
   deslen: 25
   orglen: 25
]

From your dbxflat output you quote I can see that the command points to
the embl resource:

[root at alfa trembl]# dbxflat -dbname trembl -idformat EMBL   <--- Why EMBL?
-directory . -filenames uniprot_trembl.dat -release "37.0"
-date "24/07/07" -fields sv,acc,des,key,org
Database b+tree indexing for flat file databases
Resource name: embl   <--- That should say trembl, Why did you choose
embl here?

When the dbxflat command asked you for a resource name, you really
should have a trembl RES entry and I am not sure that your idformat
(EMBL) is correct.

GM

--
--
George Magklaras

Senior Computer Systems Engineer/UNIX Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo,
University of Oslo
http://www.biotek.uio.no/

EMBnet Norway:
http://www.no.embnet.org/

Fernan Aguero wrote:
> [snip]
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>

From fernan at iib.unsam.edu.ar Thu Oct 4 22:41:44 2007
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Thu, 4 Oct 2007 19:41:44 -0300
Subject: [EMBOSS] problems installing/using TrEMBL
In-Reply-To: <4704FE72.1090206@biotek.uio.no>
References: <20071002175405.GA62945@iib.unsam.edu.ar>
	<4704FE72.1090206@biotek.uio.no>
Message-ID: <20071004224144.GA98760@iib.unsam.edu.ar>

George,

thanks for your points.

| Maybe you are missing the resource record in the emboss.default file for
| the trembl databank and you have passed the wrong arguments to dbxflat.

I have this resource record in my emboss.default conf

RES embl [
   type: Index
   idlen:  15
   acclen: 15
   svlen:  15
   keylen: 25
   deslen: 25
   orglen: 25
]

| You should choose the emboss method in the DB entry.

OK

| Then, the
| emboss.default file should contain also a resource entry for trembl:
|
| RES trembl [
|    type: Index
|    idlen:  15
|    acclen: 15
|    svlen:  20
|    keylen: 30
|    deslen: 25
|    orglen: 25
| ]

Does the name of the resource matter? Mine is named 'embl' ...

| From your dbxflat output you quote I can see that the command points to
| the embl resource:
|
| [root at alfa trembl]# dbxflat -dbname trembl -idformat EMBL   <--- Why EMBL?

What other options are there SWISS? GCG? GENBANK? This is AFAIK an
EMBL formatted file. But maybe I'm wrong ...

| -directory . -filenames uniprot_trembl.dat -release "37.0"
| -date "24/07/07" -fields sv,acc,des,key,org
| Database b+tree indexing for flat file databases
| Resource name: embl   <--- That should say trembl, Why did you choose
| embl here?

Because the resource in my emboss.default file is named 'embl'.
|
| When the dbxflat command asked you for a resource name, you really
| should have a trembl RES entry and I am not sure that your idformat
| (EMBL) is correct.
|
| GM
| --
| George Magklaras

Mmm ... maybe it's SWISS then?

From the dbxflat docs:
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
        GB : Genbank, DDBJ
    REFSEQ : Refseq
Entry format [SWISS]:

Thanks for your questions and pointers. I'm running dbxflat
overnight again to see if this makes any difference (-idformat SWISS
-resource trembl, with a new trembl RES line added to
emboss.default).

But so far, only 6 trembl.* files are being produced and
none of them is called trembl.pxid (as per the error in my
original message, see below).

[root at alfa trembl]# ls trembl.*
trembl.ent  trembl.xac  trembl.xde  trembl.xkw  trembl.xsv  trembl.xtx

Fernan

PS: this is the first entry in my uniprot_trembl.dat file
[fernan at alfa trembl]$ head -45 uniprot_trembl.dat
ID   A0B532_METTP   Unreviewed;   337 AA.
AC   A0B532;
DT   28-NOV-2006, integrated into UniProtKB/TrEMBL.
DT   28-NOV-2006, sequence version 1.
DT   24-JUL-2007, entry version 6.
DE   RNA-3'-phosphate cyclase (EC 6.5.1.4).
GN   OrderedLocusNames=Mthe_0003;
OS   Methanosaeta thermophila (strain DSM 6194 / PT) (Methanothrix
OS   thermophila (strain DSM 6194 / PT)).
OC   Archaea; Euryarchaeota; Methanomicrobia; Methanosarcinales;
OC   Methanosaetaceae; Methanosaeta.
OX   NCBI_TaxID=349307;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RG   US DOE Joint Genome Institute;
RA   Copeland A., Lucas S., Lapidus A., Barry K., Detter J.C.,
RA   Glavina del Rio T., Hammon N., Israni S., Pitluck S., Chain P.,
RA   Malfatti S., Shin M., Vergez L., Schmutz J., Larimer F., Land M.,
RA   Hauser L., Kyrpides N., Kim E., Smith K.S., Ingram-Smith C.,
RA   Richardson P.;
RT   "Complete sequence of Methanosaeta thermophila PT.";
RL   Submitted (OCT-2006) to the EMBL/GenBank/DDBJ databases.
CC   -----------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution-NoDerivs License
CC   -----------------------------------------------------------------------
DR   EMBL; CP000477; ABK13806.1; -; Genomic_DNA.
DR   GenomeReviews; CP000477_GR; Mthe_0003.
DR   GO; GO:0003963; F:RNA-3'-phosphate cyclase activity; IEA:InterPro.
DR   InterPro; IPR000228; RNA3'_term_phos_cycl.
DR   InterPro; IPR013796; RNA3'_term_phos_cycl_insert.
DR   PANTHER; PTHR11096; RNA3'_term_phos_cycl; 1.
DR   Pfam; PF01137; RTC; 1.
DR   Pfam; PF05189; RTC_insert; 1.
DR   PROSITE; PS01287; RTC; 1.
PE   4: Predicted;
KW   Complete proteome; Ligase.
SQ   SEQUENCE   337 AA;  36340 MW;  69F26755A1B8DA03 CRC64;
     MNKPQMIEID GSYGEGGGQI VRTSVALSTL TGIPVRIKNI RRNRPRPGLA AQHVRAIEAL
     AQISRAETRG VHLGSEEIEF IPGRISAGSY DVDIGTAGSV TLLIQCLLPA LTAAEGPVTV
     TVRGGTDVRW SPTVDYLEHV ALPAMHLFGV TATFRCERRG YYPRGGGVVV LSTRPSRLRP
     ARLELIEEGI CGISHCGSLP EHVARRQADA ALELLKEKGY DARIDIQTMS SSSPGSGITL
     WSGFRGSSAL GERGVRAEDV GREAAKALID ELKSKASVDV HLADQLIPYI ALAGGEYTTR
     EISSHTRTNI WTAQRILRCR IDIDEGEVFR IHSTGSG
//

| Fernan Aguero wrote:
| > [snip]
| > If I change the 'method' to 'method: emboss'
| > as per the example in the dbxflat docs, I get this error:
| >
| > [fernan at alfa ~]$ seqret trembl:A0B532
| > Reads and writes (returns) sequences
| >
| >    EMBOSS An error in ajindex.c at line 3028:
| > Cannot open param file /share/bio/emboss/trembl/trembl.pxid
| >
| > This file does not exist (see result of indexing below):
| > [snip]

From sum732 at mail.usask.ca Fri Oct 5 23:38:01 2007
From: sum732 at mail.usask.ca (Sudeep Mehrotra)
Date: Fri, 05 Oct 2007 17:38:01 -0600
Subject: [EMBOSS] Seqret and searching a database with entries in a file
Message-ID: <986A5EE0-8709-4657-B7CB-84A43513D308@mail.usask.ca>

Hello,
I am wondering if I can use "seqret" from EMBOSS to perform the
following action.

I have a database and I have a file which consists of a list of protein
IDs. I want to use seqret to search for each entry (in the given file) in
the given database and output the results into another file.
For example:
seqret "path to the database":AAT37944.1.
If I use the above-mentioned command on the command line, I get the
output (protein name, protein sequence etc.) in fasta format
containing the entry. What I want to do is, instead of giving one
entry, give the whole file, which consists of similar entries.

Can someone help me here?

Thanks
Sudeep

From david.bauer at bayerhealthcare.com Sat Oct 6 19:13:34 2007
From: david.bauer at bayerhealthcare.com (david.bauer at bayerhealthcare.com)
Date: Sat, 6 Oct 2007 21:13:34 +0200
Subject: [EMBOSS] Seqret and searching a database with entries in a file
In-Reply-To: <986A5EE0-8709-4657-B7CB-84A43513D308@mail.usask.ca>
Message-ID:

Hi Sudeep,

if you add a "@" character in front of a filename, EMBOSS interprets
this as a "file of filenames". So you can put all your IDs, including the
database name, into a file (e.g. myseqs.fof). Then you run
"seqret @myseqs.fof".

Cheers,
David.

emboss-bounces at lists.open-bio.org wrote on 06/10/2007 01:38:01:

> Hello,
> I am wondering if I can use "seqret" from EMBOSS to perform
> following action.
>
> I have a database and I have a file which consists of list of protein
> IDs. I want use seqret to search each entry (in the given file) in
> the given database and output the search into another file.
> for example:
> seqret "path to the database":AAT37944.1.
> If I use the above mentioned command on command line, I get the
> output (protein name, protein sequence etc) in fasta format
> consisting the entry. What I want to do is instead of giving one
> entry I want to give the whole file, which consists of similar entries.
>
> Can some one help me here.
> Thanks
> Sudeep

From gbottu at vub.ac.be Mon Oct 8 07:12:18 2007
From: gbottu at vub.ac.be (Guy Bottu)
Date: Mon, 08 Oct 2007 09:12:18 +0200
Subject: [EMBOSS] Seqret and searching a database with entries in a file
In-Reply-To: <986A5EE0-8709-4657-B7CB-84A43513D308@mail.usask.ca>
References: <986A5EE0-8709-4657-B7CB-84A43513D308@mail.usask.ca>
Message-ID: <4709D852.20007@vub.ac.be>

Sudeep Mehrotra wrote:
> I have a database and I have a file which consists of list of protein
> IDs. I want use seqret to search each entry (in the given file) in
> the given database and output the search into another file.

Dear Sudeep,

If you can, using some script, transform your file into the format:

xxx:AC3355
xxx:CG6754
xxx:AV6754

with xxx the name of the databank (you might have to use bare accession
numbers rather than version numbers), then it is easy, just run

seqret list::File

If you want the original entries rather than the entries in fastA
format, use entret instead of seqret.

Guy Bottu, Belgian EMBnet Node

From charles-listes-emboss at plessy.org Mon Oct 8 06:30:50 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Mon, 8 Oct 2007 15:30:50 +0900
Subject: [EMBOSS] About the EMBOSS quick guide.
Message-ID: <20071008063047.GB9819@kunpuu.plessy.org>

Dear EMBOSS developers,

I am a member of a packaging team that takes care of integrating EMBOSS in
Debian. I just realised today that the Quick Guide to EMBOSS is released
under a "noncommercial" licence.

file:///usr/share/EMBOSS/doc/manuals/emboss_qg.pdf

Debian puts a strong emphasis on not mixing programs which do not meet
the "Debian Free Software Guidelines" (DFSG) with the ones which do.
In our case, EMBOSS is free according to the DFSG, but not the Quick
Guide, as restrictions on commercial use do not comply with
guideline number 6:

  No Discrimination Against Fields of Endeavor

  The license must not restrict anyone from making use of the program
  in a specific field of endeavor. For example, it may not restrict the
  program from being used in a business, or from being used for genetic
  research.

From my packager's point of view, the simplest way to solve this problem
would be for you to relicence the Quick Guide under a free licence
according to the DFSG, such as BSD or GPL for instance. Unfortunately,
the guide's author, David Martin, left EMBnet and I do not know how to
contact him.

Importantly, the DFSG also require the sources of works distributed in
Debian to be available. If it is possible to relicence the Quick Guide,
could somebody send me its sources? Debian integrates a bug reporting
and tracking system, and having the sources available in Debian could
bring opportunities to receive patches.

Have a nice day,

--
Charles Plessy
Debian-Med packaging team.
http://www.debian.org/devel/debian-med
Wako, Saitama, Japan

From georgios at biotek.uio.no Mon Oct 8 08:59:56 2007
From: georgios at biotek.uio.no (George Magklaras)
Date: Mon, 08 Oct 2007 10:59:56 +0200
Subject: [EMBOSS] problems installing/using TrEMBL
In-Reply-To: <20071004224144.GA98760@iib.unsam.edu.ar>
References: <20071002175405.GA62945@iib.unsam.edu.ar>
	<4704FE72.1090206@biotek.uio.no>
	<20071004224144.GA98760@iib.unsam.edu.ar>
Message-ID: <4709F18C.2070304@biotek.uio.no>

Hi Fernan,

Fernan Aguero wrote:
> George,
>
> Does the name of the resource matter? Mine is named 'embl' ...
>

If you plan to have the same values for all databases, no. But I tend to
choose different length values for different databanks, so in that case,
I have a different RES entry for each databank.

> What other options are there SWISS? GCG? GENBANK? This is AFAIK an
> EMBL formatted file. But maybe I'm wrong ...
>

I believe that TrEMBL should be formatted with the SWISS entry format in
dbxflat (-idformat SWISS).

--
--
George Magklaras

Senior Computer Systems Engineer/UNIX Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo,
University of Oslo
http://www.biotek.uio.no/

EMBnet Norway:
http://www.no.embnet.org/

From charles-listes-emboss at plessy.org Mon Oct 8 23:38:28 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Tue, 9 Oct 2007 08:38:28 +0900
Subject: [EMBOSS] Bug in degapseq ?
Message-ID: <20071008233828.GA32069@kunpuu.plessy.org>

Dear developers,

If I use degapseq on a file, it prompts me for the name of the output,
but if the data comes from stdin, degapseq crashes. It does not happen
if the name of the output is given.

chouca?~?$ cat toto
>Xenopus-1a
-----MVLLKCEYRDEEEDLTS---ASPCSV--TSSFRSPAT----QTCSSDDEQLLSPT
SP--------------GQHQGEE---NS----------------------------PRCR
RSRGRA-QGKSGETVLKIKKTRRVKANNRERNRMHNLNSALDSLREVLPSLPEDAKLTKI
ETLRFAYNYIWALSETLRLGD-----P-VHRS--AS-----TPAAAI---LV---QDSSS
SQSP-----SWS--CSSSPSS-----S-------CCSFS--PASP----ASST--SDSIE
SWQ---PSELHLNPFMSASSA---FI----
>Xenopus-1b
-----MVLLKCEYRDEVSELTS---VSPCSVSSSSSHPSPAM----QTCSSDDEQLHSPT
SPTL-------THLQQGRDQGEE---NS----------------------------PRCR
RSRAR------GDTVLKIKKTRRVKANNRERNRMHHLNYALDSLREVLPSLPEDAKLTKI
ETLRFAHNYIWALSETLRLAD-----Q-LHGS--TS-----TPAAAI---LV---QDSYP
SLSP-----SWS--CSSSPSS----NS-------CDSFS--PTSP----ASST--SDSIE
YWQ---PSELRLNPFMSAL-----------
>Gallus-2
------MPVKAESPAPAAEDE--L-LLLRLASPAPSASLP-------SSAGEEDEDEEDG
RP-------------RRLQEGA----------------------------------RRAG
RQRGPPRAARTAETAQRIKRSRRLKANNRERNRMHNLNAALDALRDVLPTFPEDAKLTKI
ETLRFAHNYIWALTETLRL----AGAARLGGA--AD-A---APGAA-----A---EG-SP
SPAS-----SWS--GGASPAP-----SA---SPYACTLS--PGSP----AGSA--SD-AE
HW---PPPRGRFAPPPPPHR----CL----

chouca?~?$ cat toto | degapseq stdin
Removes gap characters from sequences
output sequence(s) [xenopus-1a.fasta]:
   EMBOSS An error in ajmess.c at line 1662:
END-OF-FILE reading from user

chouca?~?$ cat toto | degapseq stdin stdout
Removes gap characters from sequences
>Xenopus-1a
MVLLKCEYRDEEEDLTSASPCSVTSSFRSPATQTCSSDDEQLLSPTSPGQHQGEENSPRC
RRSRGRAQGKSGETVLKIKKTRRVKANNRERNRMHNLNSALDSLREVLPSLPEDAKLTKI
ETLRFAYNYIWALSETLRLGDPVHRSASTPAAAILVQDSSSSQSPSWSCSSSPSSSCCSF
SPASPASSTSDSIESWQPSELHLNPFMSASSAFI
>Xenopus-1b
MVLLKCEYRDEVSELTSVSPCSVSSSSSHPSPAMQTCSSDDEQLHSPTSPTLTHLQQGRD
QGEENSPRCRRSRARGDTVLKIKKTRRVKANNRERNRMHHLNYALDSLREVLPSLPEDAK
LTKIETLRFAHNYIWALSETLRLADQLHGSTSTPAAAILVQDSYPSLSPSWSCSSSPSSN
SCDSFSPTSPASSTSDSIEYWQPSELRLNPFMSAL
>Gallus-2
MPVKAESPAPAAEDELLLLRLASPAPSASLPSSAGEEDEDEEDGRPRRLQEGARRAGRQR
GPPRAARTAETAQRIKRSRRLKANNRERNRMHNLNAALDALRDVLPTFPEDAKLTKIETL
RFAHNYIWALTETLRLAGAARLGGAADAAPGAAAEGSPSPASSWSGGASPAPSASPYACT
LSPGSPAGSASDAEHWPPPRGRFAPPPPPHRCL

chouca?~?$ degapseq toto
Removes gap characters from sequences
output sequence(s) [xenopus-1a.fasta]: stdout
>Xenopus-1a
MVLLKCEYRDEEEDLTSASPCSVTSSFRSPATQTCSSDDEQLLSPTSPGQHQGEENSPRC
RRSRGRAQGKSGETVLKIKKTRRVKANNRERNRMHNLNSALDSLREVLPSLPEDAKLTKI
ETLRFAYNYIWALSETLRLGDPVHRSASTPAAAILVQDSSSSQSPSWSCSSSPSSSCCSF
SPASPASSTSDSIESWQPSELHLNPFMSASSAFI
>Xenopus-1b
MVLLKCEYRDEVSELTSVSPCSVSSSSSHPSPAMQTCSSDDEQLHSPTSPTLTHLQQGRD
QGEENSPRCRRSRARGDTVLKIKKTRRVKANNRERNRMHHLNYALDSLREVLPSLPEDAK
LTKIETLRFAHNYIWALSETLRLADQLHGSTSTPAAAILVQDSYPSLSPSWSCSSSPSSN
SCDSFSPTSPASSTSDSIEYWQPSELRLNPFMSAL
>Gallus-2
MPVKAESPAPAAEDELLLLRLASPAPSASLPSSAGEEDEDEEDGRPRRLQEGARRAGRQR
GPPRAARTAETAQRIKRSRRLKANNRERNRMHNLNAALDALRDVLPTFPEDAKLTKIETL
RFAHNYIWALTETLRLAGAARLGGAADAAPGAAAEGSPSPASSWSGGASPAPSASPYACT
LSPGSPAGSASDAEHWPPPRGRFAPPPPPHRCL

Have a nice day,

--
Charles Plessy
http://charles.plessy.org
Wako, Saitama, Japan

From david at compbio.dundee.ac.uk Tue Oct 9 15:56:57 2007
From: david at compbio.dundee.ac.uk (David Martin)
Date: Tue, 09 Oct 2007 16:56:57 +0100
Subject: [EMBOSS] Updating the Quick Guide
Message-ID:

Prompted by Charles' request yesterday, I am in the process of updating the
EMBOSS quick guide.
It was last touched about 8 years ago, so comments and
suggestions on what is new, and what should be dropped, would be much
appreciated.

..d

From andrespinzon at gmail.com Tue Oct 9 16:32:09 2007
From: andrespinzon at gmail.com (Andres Pinzon)
Date: Tue, 9 Oct 2007 11:32:09 -0500
Subject: [EMBOSS] Updating the Quick Guide
In-Reply-To:
References:
Message-ID: <8968fc7e0710090932g63b77a9k7d83bea25c176349@mail.gmail.com>

David,
I am in the process of writing an EMBOSS book, called "Análisis de
secuencias usando EMBOSS" ("Molecular sequence analysis using
EMBOSS", in English). It will be released under a CC license (and of
course Open Source), so maybe some of the book content can be used.

If you need help with the "old" quick guide update, please let
me know; I'll be more than glad to help.

Regards,

On 10/9/07, David Martin wrote:
> Prompted by charles' request yesterday I am in the process of updating the
> EMBOSS quick guide. it was last touched about 8 years ago so comments and
> suggestions on what is new, and what should be dropped would be much
> appreciated.
>
> ..d

--
Andrés Pinzón
http://bioinf.ibun.unal.edu.co/~apinzon/
Bioinformatics Center, Colombia EMBnet node
http://bioinf.ibun.unal.edu.co
Tel +57 3165000 ext 16961 Fax +571 3165415

Micology and Phytopathology Laboratory - Los Andes University.
http://bioinf.uniandes.edu.co
Tel +571 3394949 ext. 2768

From michael.watson at bbsrc.ac.uk Wed Oct 10 13:02:49 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 10 Oct 2007 14:02:49 +0100
Subject: [EMBOSS] XFree86 vs xorg
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F2EA@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

My EMBOSS 5.0 install failed as it couldn't find Xlib.h. On googling, I
see this is part of XFree86-devel.
However, as a Red Hat Enterprise
Linux 4 user, my X windows seems to be the x.org branch rather than
XFree86....

So, is there a workaround, or should I overwrite my xorg libraries with
XFree86 ones?

Thanks
Mick

The information contained in this message may be confidential or legally
privileged and is intended solely for the addressee. If you have
received this message in error please delete it & notify the originator
immediately.
Unauthorised use, disclosure, copying or alteration of this message is
forbidden & may be unlawful.
The contents of this e-mail are the views of the sender and do not
necessarily represent the views of the Institute.
This email and associated attachments has been checked locally for
viruses but we can accept no responsibility once it has left our
systems.
Communications on Institute computers are monitored to secure the
effective operation of the systems and for other lawful purposes.

From dalloliogm at gmail.com Wed Oct 10 13:23:01 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 10 Oct 2007 15:23:01 +0200
Subject: [EMBOSS] Updating the Quick Guide
In-Reply-To:
References:
Message-ID: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com>

You should update the guide on how to install emboss.

In particular, explain how to use the .deb and .rpm packages, since a
lot of people still try to install emboss by compiling it, and it is a
pain.

2007/10/9, David Martin :
> Prompted by charles' request yesterday I am in the process of updating the
> EMBOSS quick guide. it was last touched about 8 years ago so comments and
> suggestions on what is new, and what should be dropped would be much
> appreciated.
>
> ..d

--
-----------------------------------------------------------
My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com

From ajb at ebi.ac.uk Wed Oct 10 13:30:09 2007
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Wed, 10 Oct 2007 14:30:09 +0100 (BST)
Subject: [EMBOSS] XFree86 vs xorg
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F2EA@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9505A4F2EA@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <50101.81.98.241.17.1192023009.squirrel@webmail.ebi.ac.uk>

Hello Mick,

For xorg all you need to do is to install the xorg-x11-proto-devel
RPM and then, in EMBOSS-5.0.0, do a 'make clean' and configure again.
You might want to install the gd-devel RPM at the same time (to
get PNG support). If you install them both using 'yum' then all
the dependencies will be pulled in.

HTH

Alan

> Hi
>
> My EMBOSS 5.0 install failed as it couldn't find Xlib.h. On googling, I
> see this is part of XFree86-devel. However, as a red hat enterprise
> linux 4 user, my X windows seems to be the x.org branch rather than
> XFree86....
>
> So, is there a workaround, or should I overwrite my xorg libraries with
> XFree86 ones?
>
> Thanks
> Mick

From ajb at ebi.ac.uk Wed Oct 10 13:39:26 2007
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Wed, 10 Oct 2007 14:39:26 +0100 (BST)
Subject: [EMBOSS] Updating the Quick Guide
In-Reply-To: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com>
References: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com>
Message-ID: <42689.81.98.241.17.1192023566.squirrel@webmail.ebi.ac.uk>

> You should update the guide on how to install emboss.
>
> In particular, explain how to use the .deb and .rpm packages, since a
> lot of people still try to install emboss by compiling it, and it is a
> pain.

I'll leave that up to David to decide, but the information is
in the new FAQ which, yesterday, I submitted to my colleagues
for approval; it will then appear in CVS and later online.

There was already some RPM
info around but no .deb stuff. The info will also be in the books
which Jon mentioned recently.

Alan

From charles-listes-emboss at plessy.org Wed Oct 10 13:24:08 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Wed, 10 Oct 2007 22:24:08 +0900
Subject: [EMBOSS] XFree86 vs xorg
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F2EA@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9505A4F2EA@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <20071010132408.GJ990@kunpuu.plessy.org>

On Wed, Oct 10, 2007 at 02:02:49PM +0100, michael watson (IAH-C) wrote:
> Hi
>
> My EMBOSS 5.0 install failed as it couldn't find Xlib.h. On googling, I
> see this is part of XFree86-devel. However, as a red hat enterprise
> linux 4 user, my X windows seems to be the x.org branch rather than
> XFree86....
In Xorg, the libraries have been split into individual packages. I think
that you can find Xlib.h in a package named libx11-devel, or something
like this.

Have a nice day,

--
Charles Plessy
http://charles.plessy.org
Wako, Saitama, Japan

From dalloliogm at gmail.com Wed Oct 10 14:06:16 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 10 Oct 2007 16:06:16 +0200
Subject: [EMBOSS] Updating the Quick Guide
In-Reply-To: <42689.81.98.241.17.1192023566.squirrel@webmail.ebi.ac.uk>
References: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com>
 <42689.81.98.241.17.1192023566.squirrel@webmail.ebi.ac.uk>
Message-ID: <5aa3b3570710100706s7ab0a28tebbd30523733826@mail.gmail.com>

2007/10/10, ajb at ebi.ac.uk :
> There was already some RPM
> info around but no .deb stuff. The info will also be in the books
> which Jon mentioned recently.
>

Hi,
there is an emboss 5.0 package in Debian Sid.

You just have to add something like this:
"""
If you are a debian/ubuntu user, you can install emboss by giving the command:
>>> sudo aptitude install emboss
to install the package.
"""

Actually, this would work only for Debian Sid, but I believe the
package will also be included in Ubuntu 7.10 and in Debian Etch
shortly.

--
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com

From charles-listes-emboss at plessy.org Wed Oct 10 14:55:35 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Wed, 10 Oct 2007 23:55:35 +0900
Subject: [EMBOSS] possibility of packages for Debian Etch.
In-Reply-To: <5aa3b3570710100706s7ab0a28tebbd30523733826@mail.gmail.com>
References: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com>
 <42689.81.98.241.17.1192023566.squirrel@webmail.ebi.ac.uk>
 <5aa3b3570710100706s7ab0a28tebbd30523733826@mail.gmail.com>
Message-ID: <20071010145535.GK990@kunpuu.plessy.org>

On Wed, Oct 10, 2007 at 04:06:16PM +0200, Giovanni Marco Dall'Olio wrote:
> hi,
> there is an emboss 5.0 package in debian sid.
>
> Actually, this would work only for Debian Sid, but I believe the
> package will be included also in Ubuntu 7/10 and in debian etch in the
> short time.

Dear Giovanni,

Because Debian Etch is the stable version, it does not receive new
packages unless they fix security issues or grave bugs. The emboss
package for Debian will never be part of Etch nor its updates.

However, some Debian developers provide a separate repository to which
only official developers upload recent packages recompiled for Etch. The
site is called backports.org.

If you or another reader is interested, we can prepare such a backport
for Etch.

Have a nice day,

--
Charles Plessy
Debian-Med packaging team
Wako, Saitama, Japan

From david at compbio.dundee.ac.uk Wed Oct 10 14:30:24 2007
From: david at compbio.dundee.ac.uk (David Martin)
Date: Wed, 10 Oct 2007 15:30:24 +0100
Subject: [EMBOSS] Updating the Quick Guide
In-Reply-To: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com>
Message-ID:

On 10/10/07 14:23, "Giovanni Marco Dall'Olio" wrote:

> You should update the guide on how to install emboss.
>
> In particular, explain how to use the .deb and .rpm packages, since a
> lot of people still try to install emboss by compiling it, and it is a
> pain.
>
>
> 2007/10/9, David Martin :
>> Prompted by charles' request yesterday I am in the process of updating the
>> EMBOSS quick guide. it was last touched about 8 years ago so comments and
>> suggestions on what is new, and what should be dropped would be much
>> appreciated.
>>
>> ..d

The aim of the Quick Guide is to provide a single sheet of A4 (two
sides) as a quick reference guide to the common programs and command
line arguments that are used with EMBOSS. I found it very useful when
teaching, as an aide-memoire for myself and the students. Explaining how
to install EMBOSS on each architecture is NOT the aim - for that read
the admin guide, the maintenance of which Alan and others have taken off
my hands. I will however reference the admin guide for installation info.

If you haven't seen the quick guide, a somewhat dated pdf is available in
emboss/docs/manuals/emboss_qg.pdf

regards

..d

From Veronique.Martin at jouy.inra.fr Thu Oct 11 07:39:44 2007
From: Veronique.Martin at jouy.inra.fr (Veronique.Martin at jouy.inra.fr)
Date: Thu, 11 Oct 2007 09:39:44 +0200 (CEST)
Subject: [EMBOSS] prosextract option?
Message-ID:

Hi,

I want to run prosextract, but I would like to build the PROSITE motifs
in a directory of my choice. Currently the only possibility is this path:
emboss/share/EMBOSS/data/PROSITE
Is it possible to have an option for choosing the output directory?

I tried using the .embossrc file, but for this database (prosite) the
file is not taken into account: prosextract used the
emboss/share/EMBOSS/emboss.default file instead.

Regards,

VM

-------------------------------------------------
Véronique MARTIN
INRA - Unité Mathématique, Informatique et Génome
78352 Jouy-en Josas cedex
tel.: 01 34 65 29 74
-------------------------------------------------

From dalloliogm at gmail.com Thu Oct 11 08:36:04 2007
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Thu, 11 Oct 2007 10:36:04 +0200
Subject: [EMBOSS] possibility of packages for Debian Etch.
In-Reply-To: <20071010145535.GK990@kunpuu.plessy.org>
References: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com>
 <42689.81.98.241.17.1192023566.squirrel@webmail.ebi.ac.uk>
 <5aa3b3570710100706s7ab0a28tebbd30523733826@mail.gmail.com>
 <20071010145535.GK990@kunpuu.plessy.org>
Message-ID: <5aa3b3570710110136y2c32b6e8v614e13cbfd12de44@mail.gmail.com>

2007/10/10, Charles Plessy :
>
> Because Debian Etch is the stable version, it does not receive new
> packages unless they fix security issues or grave bugs. The emboss
> package for Debian will never be part of Etch nor its updates.
>

Really?
I didn't know emboss had grave bugs.
Are you saying they can't be fixed?
I can't find many references to bugs in emboss, but maybe you are
referring to bugs like this:
- http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=427439 ?

> However, some Debian developers provide a separate repository in which
> only official developers upload recent packages recompiled for Etch. The
> site is called backports.org.
>
> If you or another reader is interested, we can prepare such a backport
> for Etch.

Thank you very much: I think many people are interested, especially
in the Ubuntu user community.
Emboss is seen as an educational package for learning bioinformatics, so
it would be better if people could install it easily by themselves,
instead of asking a system manager.

Maybe you can just add the link to Debian backports in the help page.

> Have a nice day,
>

and to you, too!
> --
> Charles Plessy
> Debian-Med packaging team
> Wako, Saitama, Japan
>

--
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com

From charles-listes-emboss at plessy.org Thu Oct 11 09:06:13 2007
From: charles-listes-emboss at plessy.org (charles-listes-emboss at plessy.org)
Date: Thu, 11 Oct 2007 18:06:13 +0900
Subject: [EMBOSS] possibility of packages for Debian Etch.
In-Reply-To: <5aa3b3570710110136y2c32b6e8v614e13cbfd12de44@mail.gmail.com>
References: <5aa3b3570710100623v42107a31we6af4cab8d1bdb80@mail.gmail.com>
 <42689.81.98.241.17.1192023566.squirrel@webmail.ebi.ac.uk>
 <5aa3b3570710100706s7ab0a28tebbd30523733826@mail.gmail.com>
 <20071010145535.GK990@kunpuu.plessy.org>
 <5aa3b3570710110136y2c32b6e8v614e13cbfd12de44@mail.gmail.com>
Message-ID: <20071011090613.GA31072@kunpuu.plessy.org>

On Thu, Oct 11, 2007 at 10:36:04AM +0200, Giovanni Marco Dall'Olio wrote:
> 2007/10/10, Charles Plessy :
> >
> > Because Debian Etch is the stable version, it does not receive new
> > packages unless they fix security issues or grave bugs. The emboss
> > package for Debian will never be part of Etch nor its updates.
> >
>
> Really?
> I didn't know emboss had grave bugs.

Dear Giovanni,

I was unclear. The reason why EMBOSS is not in Debian Etch is that its
Debian package was not ready when Etch was released. Furthermore, it is
the policy of Debian to only accept changes related to security or grave
bugs. Therefore, Debian Etch will never contain the Debian packages we
prepared for EMBOSS.

I will announce on this list when the package is available through
backports.org.

> Are you saying they can't be fixed?
> I can't find many references to bugs in emboss, but maybe you are
> referring to bugs like this:
> - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=427439 ?

Yes, the current package still has some quality issues. However, all the
ones reported so far are solved in our SVN repository. I hope that I can
update the Debian package of EMBOSS in Debian Sid soon.

http://svn.debian.org/wsvn/pkg-emboss/emboss/trunk/debian/changelog?op=file&rev=0&sc=0

(If one explores this repository a bit, one can get a glimpse of what we
have in the pipeline...).

> Thank you very much: I think many people are interested, especially
> from the Ubuntu users community.

By the way, if you ask a MOTU Science member, I think that it is
possible to fast-track the emboss packages into Ubuntu...

> Maybe you can just add the link to debian backports in the help page.

The new packages.debian.org website advertises the backports. See for
example the page for OpenOffice.org:

http://packages.debian.org/openoffice.org

Have a nice day,

--
Charles Plessy
Debian-Med packaging team.
Wako, Saitama, Japan

From Laurence.Amilhat at toulouse.inra.fr Thu Oct 11 09:44:40 2007
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Thu, 11 Oct 2007 11:44:40 +0200
Subject: [EMBOSS] plcore.c error when compiling
Message-ID: <470DF088.4020303@toulouse.inra.fr>

Dear Emboss users,

I am trying to install emboss on Linux Ubuntu 7.04 Feisty Fawn.
I downloaded the following tar.gz: EMBOSS-5.0.0.tar.gz
I ran ./configure (I have the graphics libs z, png and gd).
But when I launch make, I get the following message.

Does anyone have an idea why?
Did I miss a lib or something?
Thank you for your help,

Best regards,

Laurence

plcore.c: In function 'int text2fci(const char*, unsigned char*, unsigned char*)':
plcore.c:459: error: initializer-string for array of chars is too long
plcore.c:459: error: initializer-string for array of chars is too long
plcore.c:459: error: initializer-string for array of chars is too long
plcore.c:459: error: initializer-string for array of chars is too long
plcore.c:459: error: initializer-string for array of chars is too long
plcore.c:459: error: initializer-string for array of chars is too long
plcore.c:459: error: initializer-string for array of chars is too long
plcore.c:459: error: initializer-string for array of chars is too long
plcore.c:459: error: initializer-string for array of chars is too long
plcore.c:459: error: initializer-string for array of chars is too long
plcore.c: In function 'void difilt(PLINT*, PLINT*, PLINT, PLINT*, PLINT*, PLINT*, PLINT*)':
plcore.c:887: warning: converting to 'int' from 'PLFLT'
plcore.c:888: warning: converting to 'int' from 'PLFLT'
plcore.c:897: warning: converting to 'PLINT' from 'PLFLT'
plcore.c:899: warning: converting to 'PLINT' from 'PLFLT'
plcore.c:909: warning: converting to 'int' from 'PLFLT'
plcore.c:910: warning: converting to 'int' from 'PLFLT'
plcore.c:919: warning: converting to 'int' from 'PLFLT'
plcore.c:920: warning: converting to 'int' from 'PLFLT'
plcore.c: In function 'void sdifilt(short int*, short int*, PLINT, PLINT*, PLINT*, PLINT*, PLINT*)':
plcore.c:946: warning: converting to 'short int' from 'PLFLT'
plcore.c:947: warning: converting to 'short int' from 'PLFLT'
plcore.c:955: warning: converting to 'short int' from 'PLFLT'
plcore.c:956: warning: converting to 'short int' from 'PLFLT'
plcore.c:966: warning: converting to 'short int' from 'PLFLT'
plcore.c:967: warning: converting to 'short int' from 'PLFLT'
plcore.c:976: warning: converting to 'short int' from 'PLFLT'
plcore.c:977: warning: converting to 'short int' from 'PLFLT'
plcore.c: In
function 'void pldid2pc(PLFLT*, PLFLT*, PLFLT*, PLFLT*)':
plcore.c:1079: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcx(PLINT)'
plcore.c:1080: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcy(PLINT)'
plcore.c:1081: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcx(PLINT)'
plcore.c:1082: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcy(PLINT)'
plcore.c: In function 'void pldip2dc(PLFLT*, PLFLT*, PLFLT*, PLFLT*)':
plcore.c:1125: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcx(PLINT)'
plcore.c:1126: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcy(PLINT)'
plcore.c:1127: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcx(PLINT)'
plcore.c:1128: warning: passing 'PLFLT' for argument 1 to 'PLFLT plP_pcdcy(PLINT)'
plcore.c: In function 'void calc_didev()':
plcore.c:1345: warning: converting to 'PLINT' from 'PLFLT'
plcore.c:1346: warning: converting to 'PLINT' from 'PLFLT'
plcore.c:1347: warning: converting to 'PLINT' from 'PLFLT'
plcore.c:1348: warning: converting to 'PLINT' from 'PLFLT'
plcore.c: In function 'void plP_setpxl(PLFLT, PLFLT)':
plcore.c:3264: warning: converting to 'PLINT' from 'double'
plcore.c:3265: warning: converting to 'PLINT' from 'double'
make[2]: *** [plcore.lo] Erreur 1
make[2]: quittant le répertoire « /tmp/EMBOSS-5.0.0/plplot »
make[1]: *** [all-recursive] Erreur 1
make[1]: quittant le répertoire « /tmp/EMBOSS-5.0.0/plplot »
make: *** [all-recursive] Erreur 1
Exit 2

--
====================================================================
= Laurence Amilhat INRA Toulouse 31326 Castanet-Tolosan =
= Tel: 33 5 61 28 53 34 Email: laurence.amilhat at toulouse.inra.fr =
====================================================================

From jison at ebi.ac.uk Thu Oct 11 12:16:19 2007
From: jison at ebi.ac.uk (Jon Ison)
Date: Thu, 11 Oct 2007 13:16:19 +0100 (BST)
Subject: [EMBOSS] prosextract option?
In-Reply-To:
References:
Message-ID: <48865.84.92.187.247.1192104979.squirrel@webmail.ebi.ac.uk>

Hi Veronique

prosextract is indeed hard-coded to write to the EMBOSS data directory
(defined by the EMBOSS environment variable EMBOSS_DATA). You could
always copy the file to your current working directory or into a
directory called ".embossdata" in either your home or current working
directory, and the file could still be read by EMBOSS.

If that doesn't help, an option to write to any specified directory could
easily be added - please advise.

Cheers

Jon

>
> Hi,
>
> I want to run prosextract, but I would like build prosite motif in
> directory of my choice. Now the only possibility is in this path :
> emboss/share/EMBOSS/data/PROSITE
> Is it possible to have got an option for choosing the output directory?
>
> I had tried by using the .embossrc file but only for this database
> (prosite) this file is not considered, prosextract used the
> emboss/share/EMBOSS/emboss.default file.
>
> Regards,
>
> VM
>
> -------------------------------------------------
> Véronique MARTIN
> INRA - Unité Mathématique, Informatique et Génome
> 78352 Jouy-en Josas cedex
> tel.: 01 34 65 29 74
> -------------------------------------------------
>

From pmr at ebi.ac.uk Wed Oct 24 08:07:21 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 24 Oct 2007 09:07:21 +0100
Subject: [EMBOSS] Bug in degapseq ?
In-Reply-To: <20071008233828.GA32069@kunpuu.plessy.org>
References: <20071008233828.GA32069@kunpuu.plessy.org>
Message-ID: <471EFD39.5060202@ebi.ac.uk>

Charles Plessy wrote:
> If I use degapseq on a file, it prompts me for the name of the output, but if
> the data comes from stdin, degapseq crashes. It does not happen if the name of
> the output is given.
> chouca?~?$ cat toto | degapseq stdin
> Removes gap characters from sequences
> output sequence(s) [xenopus-1a.fasta]:
> EMBOSS An error in ajmess.c at line 1662:
> END-OF-FILE reading from user

This is because you are reading from stdin, but then degapseq tries to
read the output filename from stdin.

You do need to specify the output filename, or use -auto to accept the
default (or -filter to use stdout and to read from stdin).

With -auto and -filter the program will no longer be using stdin for
user replies.

Hmmm ... maybe we could catch these cases ... tricky though, as really it
is an explicit search for "stdin" as an input file/sequence. I could
invent examples where we would guess wrongly.

Hope that helps,

Peter

From charles-listes-emboss at plessy.org Wed Oct 24 14:37:06 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Wed, 24 Oct 2007 23:37:06 +0900
Subject: [EMBOSS] Bug in degapseq ?
In-Reply-To: <471EFD39.5060202@ebi.ac.uk>
References: <20071008233828.GA32069@kunpuu.plessy.org>
 <471EFD39.5060202@ebi.ac.uk>
Message-ID: <20071024143706.GB24491@kunpuu.plessy.org>

On Wed, Oct 24, 2007 at 09:07:21AM +0100, Peter Rice wrote:
>
> You do need to specify the output filename, or use -auto to accept the
> default (or -filter to use stdout and to read from stdin).
>
> With -auto and -filter the program will no longer be using stdin for
> user replies.

Oh, I completely overlooked the fact that the emboss programs can take
their user replies from stdin. Maybe the most straightforward way to
inform users of this mistake would be to change the error message to
something like "Error: could not open file '...............'", in which
the name of the file would be truncated to the end of the line. The user
would quickly understand if the file name is something like AGTCCAGGTA...
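
Peter's explanation above can be illustrated with a toy model. What
follows is only a sketch in Python, not EMBOSS code: `run_tool` is a
hypothetical stand-in for a program that reads its data from stdin and
then, lacking an output name, tries to read the user's reply from the
same stream. The error text is copied from the report above; everything
else is an assumption made for illustration.

```python
import io


def run_tool(stdin, output_name=None):
    """Toy stand-in for a command-line tool (hypothetical, not EMBOSS).

    The sequence data is read from `stdin` until end-of-file. If no
    output name was supplied, the tool then tries to read the reply to
    its prompt from the same, already exhausted, stream.
    """
    data = stdin.read()  # consumes the piped input up to end-of-file
    if output_name is None:
        reply = stdin.readline()  # nothing is left to answer the prompt
        if reply == "":
            raise EOFError("END-OF-FILE reading from user")
        output_name = reply.strip()
    # Removing gap characters is all this toy "tool" does.
    return output_name, data.replace("-", "")


# Piped data and no output name: the prompt hits end-of-file.
try:
    run_tool(io.StringIO("AC--GT\n"))
    outcome = "ok"
except EOFError as err:
    outcome = f"failed: {err}"

# Output name known in advance: no prompt is needed, so nothing fails.
named = run_tool(io.StringIO("AC--GT\n"), output_name="out.fasta")
```

Knowing the output name in advance is, per Peter's message, what -auto
(accept the default) or -filter (write to stdout) arrange, so the
program no longer needs stdin for user replies.
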
Have a nice day,

--
Charles Plessy
Wako, Saitama, Japan

From pmr at ebi.ac.uk Wed Oct 24 16:53:27 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 24 Oct 2007 17:53:27 +0100
Subject: [EMBOSS] Bug in degapseq ?
In-Reply-To: <20071024143706.GB24491@kunpuu.plessy.org>
References: <20071008233828.GA32069@kunpuu.plessy.org>
 <471EFD39.5060202@ebi.ac.uk>
 <20071024143706.GB24491@kunpuu.plessy.org>
Message-ID: <471F7887.5050004@ebi.ac.uk>

Charles Plessy wrote:
> Oh, I completely overlooked the fact that the emboss programs can take
> their user replies from stdin. Maybe then the most straightforward to
> inform users from this mistake would be to change the error message to
> something like : "Error: could not open file '...............', in which
> the name of the file would be truncated to the end of the line. The user
> would quickly understand if the file name is something like AGTCCAGGTA...

Or perhaps they would not quickly understand ... because it took me a
few runs before I realised that was the problem :-)

I think we can keep track of stdin being opened in EMBOSS and refuse to
prompt for input.

regards,

Peter

From staffa at niehs.nih.gov Wed Oct 24 17:21:37 2007
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Wed, 24 Oct 2007 13:21:37 -0400
Subject: [EMBOSS] GUI interfaces
Message-ID:

Friends
We are preparing for the day GCG goes away by seriously pushing EMBOSS
with our users.
This page
http://emboss.sourceforge.net/interfaces/
lists 15 GUIs.
Apparently ColiMate is not an existing GUI to EMBOSS,
but a development tool.
Please tell me:
Which of the 15 GUIs listed are complete and available?
Which do you think is best?

Thank you

Nick Staffa
Telephone: 919-316-4569 (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W.
Reter (reter at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina

From andrespinzon at gmail.com Wed Oct 24 18:11:18 2007
From: andrespinzon at gmail.com (Andres Pinzon)
Date: Wed, 24 Oct 2007 13:11:18 -0500
Subject: [EMBOSS] GUI interfaces
In-Reply-To:
References:
Message-ID: <8968fc7e0710241111odff847dge2d0d16889c16e32@mail.gmail.com>

In my experience:
wEMBOSS [1] and EMBOSS-Explorer are really easy to configure and provide
different user experiences that complement each other.

[1] http://bioinf.ibun.unal.edu.co/wEMBOSS/

Regards,

On 10/24/07, Staffa, Nick (NIH/NIEHS) wrote:
> Friends
> We are preparing for if ever GCG goes away by seriously pushing EMBOSS
> with our users.
> This page
> http://emboss.sourceforge.net/interfaces/
> lists 15 GUIs.
> apparently ColiMate is not an existing GUI to EMBOSS,
> but a development tool.
> Please tell me:
> Which of the 15 GUIs listed are complete and available?
> Which do you think is best?
>
> Thank you
>
> Nick Staffa
> Telephone: 919-316-4569 (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
>

--
Andrés Pinzón
http://bioinf.ibun.unal.edu.co/~apinzon/
Bioinformatics Center, Colombia EMBnet node
http://bioinf.ibun.unal.edu.co
Tel +57 3165000 ext 16961 Fax +571 3165415
Micology and Phytopathology Laboratory - Los Andes University.
http://bioinf.uniandes.edu.co
Tel +571 3394949 ext.
2768

From golharam at umdnj.edu Wed Oct 24 17:58:08 2007
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 24 Oct 2007 13:58:08 -0400
Subject: [EMBOSS] GUI interfaces
In-Reply-To:
References:
Message-ID: <471F87B0.8030308@umdnj.edu>

Hi Nick,

We (UMDNJ) migrated off of GCG several years ago. We found most of our
users prefer the command-line interface for shell scripting or a web
interface for GUI access from their own computers.

We use EMBOSS-Explorer for the web interface. It's (much) cleaner and
faster than SeqWeb ever was and doesn't rely on the server storing user
data. We removed our responsibility for backing up user data by moving
storage off the server and onto the users' own machines. There are no
issues with user account management (usernames/passwords) with this
system either.

With GCG, we would have at least 1 or 2 user issues per month. Since
the switch, I can honestly say our user issues are maybe 1 or 2 per year.

If you have any questions about this, feel free to email me,

Ryan

----------------
Ryan Golhar, PhD
golharam at umdnj.edu
Computational Biologist
Informatics Institute at UMDNJ

Staffa, Nick (NIH/NIEHS) wrote:
> Friends
> We are preparing for if ever GCG goes away by seriously pushing EMBOSS
> with our users.
> This page
> http://emboss.sourceforge.net/interfaces/
> lists 15 GUIs.
> apparently ColiMate is not an existing GUI to EMBOSS,
> but a development tool.
> Please tell me:
> Which of the 15 GUIs listed are complete and available?
> Which do you think is best?
>
> Thank you
>
> Nick Staffa
> Telephone: 919-316-4569 (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Roy W.
Reter (reter at niehs.nih.gov)
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
>
>

From kann.vearasilp at mu.edu Thu Oct 25 19:07:37 2007
From: kann.vearasilp at mu.edu (Kann Vearasilp)
Date: Thu, 25 Oct 2007 14:07:37 -0500
Subject: [EMBOSS] Cannot open division file
Message-ID: <80455327-9F8B-49EC-801F-3A5DFDE09DD3@mu.edu>

Hello everyone,

I have just finished indexing a GenBank database for my lab using the
dbiflat command. I set up an emboss.default file based on the
emboss.default.template that was provided. "seqret" is the command I
used to test the system, and it seems that EMBOSS could not find the
division file.

I can see from the archive that there was this kind of problem with the
test database provided with EMBOSS as well
(http://emboss.open-bio.org/pipermail/emboss/2005-November/002323.html).
However, I am pretty sure that I correctly pointed the path to my
database. Here is my configuration.

The system is Mac OS 10.4

1. Emboss was installed from fink at /sw/share/EMBOSS

2. All database files were installed in /lab/data/databases/genbank/*.seq

3. Index files are in /lab/data/indices/genbank/??? Here is an
example of one of the index directories from my lab.

xxx at yyy/lab/data/indices/genbank/mam:
acnum.hit     des.trg        keyword.hit   seqvn.hit   taxon.trg
acnum.trg     division.lkp   keyword.trg   seqvn.trg
des.hit       entrynam.idx   mam.dbiflat   taxon.hit

4.
Here is a fragment of my emboss.default file:

# Set location of acd files that describe each program
SET emboss_acdroot /sw/share/EMBOSS/acd


# Set location of Genbank flatfiles in protein
SET emboss_database_dir /lab/data/databases

# Set location of Genbank flatfiles indices in protein
set emboss_index_dir /lab/data/indices

# Set a log file that user can append their records and EMBOSS automatically write log information
SET emboss_logfile /sw/share/EMBOSS/log/log

# Set Paper size of disc page and is required by the 'dbx' indexing program and 'method: "emblcd" emboss'
# Recommended value is 2048
SET PAGESIZE 2048

# Set Caches size required for 'dbx' indexing and 'method emboss'.
# It is a page size number to cache. Recommended value is 200
SET CACHESIZE 200

# Set parameter for flat file indices that we have created in
# /lab/data/indices/genbank
.
.
.
.
.
DB gbmam [
# required parameters
method: "emblcd"
format: "GB"
type: "N"
dir: "\$emboss_database_dir/genbank"
file: "gbmam*.seq"
# optional parameters
fields: "sv des key org"
release: "161.0"
comment: "Genbank database for mam sequences"
indexdir: "\$emboss_index_dir/genbank/mam"
]

5. I ran this seqret command to test the system, but it throws an error,
as you can see:

xxx at yyy~:seqret gbmam:BC102801
Reads and writes (returns) sequences
Warning: Cannot open division file '' for database 'gbmam'
Warning: seqCdQry failed
Error: Unable to read sequence 'gbmam:BC102801'
Died: seqret terminated: Bad value for '-sequence' and no prompt

6. I also ran the seqret command in debug mode, and this is the log it
produced.
Debug file seqret.dbg buffered:No
ajAcdInitP pgm 'seqret' package ''
ajFileNewIn '/sw/share/EMBOSS/acd/seqret.acd'
EOF ajFileGetsL file /sw/share/EMBOSS/acd/seqret.acd
closing file '/sw/share/EMBOSS/acd/seqret.acd'
ajFileNewIn '/sw/share/EMBOSS/acd/codes.english'
EOF ajFileGetsL file /sw/share/EMBOSS/acd/codes.english
closing file '/sw/share/EMBOSS/acd/codes.english'
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajFileNewIn '/sw/share/EMBOSS/acd/knowntypes.standard'
EOF ajFileGetsL file /sw/share/EMBOSS/acd/knowntypes.standard
closing file '/sw/share/EMBOSS/acd/knowntypes.standard'
Set acdprotein value '$(sequence.protein)'
ajSeqinClear called
++seqUsaProcess 'gbmam:BC102801' 0..0(N) '' 0
USA to test: 'gbmam:BC102801'

format regexp: No list:No
no format specified in USA

...input format not set
dbname dbexp: Yes
found dbname 'gbmam' level: '' qry->QryString: 'BC102801'
seqQueryFieldC usa 'sv' fields 'sv des key org'
seqQueryField test 'sv'
seqQueryField match 'sv'
ajSeqQueryWild id 'BC102801' acc 'BC102801' sv 'BC102801' gi '' des '' org '' key ''
wild (has) query Sv 'BC102801'
database type: 'N' format 'GB'
use access method 'emblcd'
Matched seqAccess[2] 'emblcd'
seqAccessEmblcd type 2
directory '\$emboss_index_dir/genbank/mam' entry 'BC102801' acc 'BC102801' hasacc:Yes
ajFileNewIn '\$emboss_index_dir/genbank/mam/division.lkp'
Database 'gbmam' : access method 'emblcd' failed
ajSeqinClear called
++seqUsaProcess 'gbmam:BC102801' 0..0(N) '' 0
USA to test: 'gbmam:BC102801'

format regexp: No list:No
no format specified in USA

...input format not set
dbname dbexp: Yes
found dbname 'gbmam' level: '' qry->QryString: 'BC102801'
seqQueryFieldC usa 'sv' fields 'sv des key org'
seqQueryField test 'sv'
seqQueryField match 'sv'
ajSeqQueryWild id 'BC102801' acc 'BC102801' sv 'BC102801' gi '' des '' org '' key ''
wild (has) query Sv 'BC102801'
database type: 'N' format 'GB'
use access method 'emblcd'
Matched
seqAccess[2] 'emblcd'
seqAccessEmblcd type 2
directory '\$emboss_index_dir/genbank/mam' entry 'BC102801' acc 'BC102801' hasacc:Yes
ajFileNewIn '\$emboss_index_dir/genbank/mam/division.lkp'
Database 'gbmam' : access method 'emblcd' failed

It seems that EMBOSS could not find the division file. I still don't
know what the problem is. Do you have any recommendations?

Thank you so much in advance for any help!

Kann

From ajb at ebi.ac.uk Thu Oct 25 20:22:18 2007
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Thu, 25 Oct 2007 21:22:18 +0100 (BST)
Subject: [EMBOSS] Cannot open division file
In-Reply-To: <80455327-9F8B-49EC-801F-3A5DFDE09DD3@mu.edu>
References: <80455327-9F8B-49EC-801F-3A5DFDE09DD3@mu.edu>
Message-ID: <33572.81.98.241.17.1193343738.squirrel@webmail.ebi.ac.uk>

Dear Kann,

One major problem is your DB entry:

DB gbmam [
# required parameters
method: "emblcd"
format: "GB"
type: "N"
dir: "\$emboss_database_dir/genbank"
file: "gbmam*.seq"
# optional parameters
fields: "sv des key org"
release: "161.0"
comment: "Genbank database for mam sequences"
indexdir: "\$emboss_index_dir/genbank/mam"
]

You should remove the two backslash characters before the '$'
characters. I believe they mistakenly appeared in some documentation
in the past (possibly as a result of some automatic formatting).
It'd be useful if you'd email me off-list and tell me which
documentation contained the error (if my guess is correct).

Alan

> Hello everyone,
>
> I just finish indexing a genbank database for my lab using dbiflat
> command. I set up an emboss.default file referenced from
> emboss.default.template as it was provided. "seqret" is a command
> that is used to test the system, and it seems that EMBOSS could not
> find the division file.
>
> I can see from the archive that there was this kind of problem with
> test database provided from emboss as well.
> (http://emboss.open-bio.org/pipermail/emboss/2005-November/002323.html).
However, I am
> pretty sure that I correctly pointed the path to my database.
> However, here is my configuration.
>
> The system is Mac OS 10.4
>
> 1. Emboss was installed from fink at /sw/share/EMBOSS
>
> 2. All database was installed in /lab/data/databases/genbank/*.seq
>
> 3. Index files are in /lab/data/indices/genbank/??? Here is an
> example of one of the index directory from my lab.
>
> xxx at yyy/lab/data/indices/genbank/mam:
> acnum.hit     des.trg        keyword.hit   seqvn.hit   taxon.trg
> acnum.trg     division.lkp   keyword.trg   seqvn.trg
> des.hit       entrynam.idx   mam.dbiflat   taxon.hit
>
> 4. Here is a fraction from my emboss.default file:
>
> # Set location of acd files that describe each program
> SET emboss_acdroot /sw/share/EMBOSS/acd
>
>
> # Set location of Genbank flatfiles in protein
> SET emboss_database_dir /lab/data/databases
>
> # Set location of Genbank flatfiles indices in protein
> set emboss_index_dir /lab/data/indices
>
> # Set a log file that user can append their records and EMBOSS
> automatically write log information
> SET emboss_logfile /sw/share/EMBOSS/log/log
>
> # Set Paper size of disc page and is required by the 'dbx' indexing
> program and 'method: "emblcd" emboss'
> # Recommended value is 2048
> SET PAGESIZE 2048
>
> # Set Caches size required for 'dbx' indexing and 'method emboss'.
> # It is a page size number to cache. Recommended value is 200
> SET CACHESIZE 200
>
> # Set parameter for flat file indices that we have created in
> # /lab/data/indices/genbank
> .
> .
> .
> .
> .
> DB gbmam [
> # required parameters
> method: "emblcd"
> format: "GB"
> type: "N"
> dir: "\$emboss_database_dir/genbank"
> file: "gbmam*.seq"
> # optional parameters
> fields: "sv des key org"
> release: "161.0"
> comment: "Genbank database for mam sequences"
> indexdir: "\$emboss_index_dir/genbank/mam"
> ]
>
> 5.
I run this seqret command to test the system, but it throw error > and you can see: > > xxx at yyy~:seqret gbmam:BC102801 > Reads and writes (returns) sequences > Warning: Cannot open division file '' for database 'gbmam' > Warning: seqCdQry failed > Error: Unable to read sequence 'gbmam:BC102801' > Died: seqret terminated: Bad value for '-sequence' and no prompt > > 6. I also run the seqret command in debug mode and this is its log > from the command. > > Debug file seqret.dbg buffered:No > ajAcdInitP pgm 'seqret' package '' > ajFileNewIn '/sw/share/EMBOSS/acd/seqret.acd' > EOF ajFileGetsL file /sw/share/EMBOSS/acd/seqret.acd > closing file '/sw/share/EMBOSS/acd/seqret.acd' > ajFileNewIn '/sw/share/EMBOSS/acd/codes.english' > EOF ajFileGetsL file /sw/share/EMBOSS/acd/codes.english > closing file '/sw/share/EMBOSS/acd/codes.english' > ajTableNewFunctionLen hint 25 size 251 > ajTableNewFunctionLen hint 25 size 251 > ajTableNewFunctionLen hint 25 size 251 > ajFileNewIn '/sw/share/EMBOSS/acd/knowntypes.standard' > EOF ajFileGetsL file /sw/share/EMBOSS/acd/knowntypes.standard > closing file '/sw/share/EMBOSS/acd/knowntypes.standard' > Set acdprotein value '$(sequence.protein)' > ajSeqinClear called > ++seqUsaProcess 'gbmam:BC102801' 0..0(N) '' 0 > USA to test: 'gbmam:BC102801' > > format regexp: No list:No > no format specified in USA > > ...input format not set > dbname dbexp: Yes > found dbname 'gbmam' level: '' qry->QryString: 'BC102801' > seqQueryFieldC usa 'sv' fields 'sv des key org' > seqQueryField test 'sv' > seqQueryField match 'sv' > ajSeqQueryWild id 'BC102801' acc 'BC102801' sv 'BC102801' gi '' des > '' org '' key '' > wild (has) query Sv 'BC102801' > database type: 'N' format 'GB' > use access method 'emblcd' > Matched seqAccess[2] 'emblcd' > seqAccessEmblcd type 2 > directory '\$emboss_index_dir/genbank/mam' entry 'BC102801' acc > 'BC102801' hasacc:Yes > ajFileNewIn '\$emboss_index_dir/genbank/mam/division.lkp' > Database 'gbmam' : access method 'emblcd' 
failed > ajSeqinClear called > ++seqUsaProcess 'gbmam:BC102801' 0..0(N) '' 0 > USA to test: 'gbmam:BC102801' > > format regexp: No list:No > no format specified in USA > > ...input format not set > dbname dbexp: Yes > found dbname 'gbmam' level: '' qry->QryString: 'BC102801' > seqQueryFieldC usa 'sv' fields 'sv des key org' > seqQueryField test 'sv' > seqQueryField match 'sv' > ajSeqQueryWild id 'BC102801' acc 'BC102801' sv 'BC102801' gi '' des > '' org '' key '' > wild (has) query Sv 'BC102801' > database type: 'N' format 'GB' > use access method 'emblcd' > Matched seqAccess[2] 'emblcd' > seqAccessEmblcd type 2 > directory '\$emboss_index_dir/genbank/mam' entry 'BC102801' acc > 'BC102801' hasacc:Yes > ajFileNewIn '\$emboss_index_dir/genbank/mam/division.lkp' > Database 'gbmam' : access method 'emblcd' failed > > It seems that the emboss could not find the division file. I still > don't know what the problem is. Do you have any recommendation? > > Thank you so much in advance for any help! > > Kann > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > From kann.vearasilp at mu.edu Thu Oct 25 22:06:01 2007 From: kann.vearasilp at mu.edu (Kann Vearasilp) Date: Thu, 25 Oct 2007 17:06:01 -0500 Subject: [EMBOSS] Cannot open division file In-Reply-To: <33572.81.98.241.17.1193343738.squirrel@webmail.ebi.ac.uk> References: <80455327-9F8B-49EC-801F-3A5DFDE09DD3@mu.edu> <33572.81.98.241.17.1193343738.squirrel@webmail.ebi.ac.uk> Message-ID: Hello Alan, Thank you so much for fast response! It seems that this backslash cause me all the problems. Once I removed them, the program works flawlessly. :) Kann PS. I can find the document and will mail you once I know the version of this emboss tutorial. 
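The problem in this thread (a backslash left in front of a '$' variable reference, so that the debug log shows the literal path '\$emboss_index_dir/genbank/mam/division.lkp' being opened) can be checked for mechanically. The following is only a minimal sketch in Python, not part of EMBOSS: the function names find_escaped_variables and strip_escapes and the SAMPLE text are hypothetical, and the sketch assumes the configuration is available as plain text.

```python
import re

# Matches a backslash immediately followed by a '$' variable reference,
# e.g. "\$emboss_index_dir" (hypothetical helper, not an EMBOSS tool).
ESCAPED_VAR = re.compile(r"\\\$\w+")


def find_escaped_variables(config_text):
    """Return (line_number, line) pairs for non-comment lines containing '\\$name'."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # comment lines are ignored
        if ESCAPED_VAR.search(stripped):
            hits.append((number, stripped))
    return hits


def strip_escapes(config_text):
    """Remove the backslash in front of every '$name' reference (comments included)."""
    return re.sub(r"\\(\$\w+)", r"\1", config_text)


# Sample text modelled on the DB entry quoted in the thread.
SAMPLE = r'''DB gbmam [
  method: "emblcd"
  dir: "\$emboss_database_dir/genbank"
  indexdir: "\$emboss_index_dir/genbank/mam"
]'''

if __name__ == "__main__":
    for number, line in find_escaped_variables(SAMPLE):
        print(f"line {number}: {line}")
    print(strip_escapes(SAMPLE))
```

Running the check against a copy of emboss.default before testing with seqret would flag the two lines that Alan points out above.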
From kertib at linuxlap.hu Tue Oct 30 10:25:36 2007
From: kertib at linuxlap.hu (kerti Balázs Gábor)
Date: Tue, 30 Oct 2007 11:25:36 +0100
Subject: [EMBOSS] make error
Message-ID: <1193739936.5962.28.camel@genotech>

Hello,

I have a problem building EMBOSS. "configure" ran well, with no errors or missing components, but "make" exits with the messages in the attached make.err file. How can I solve the problem?

Thank you!

Balazs Kerti
Szent Istvan University, Institute of Genetics and Biotechnology
HUN-2103 Godollo, Pater Karoly u. 1.

-------------- next part --------------
Making all in plplot
make[1]: Entering directory `/usr/src/EMBOSS-5.0.0/plplot'
Making all in lib
make[2]: Entering directory `/usr/src/EMBOSS-5.0.0/plplot/lib'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/usr/src/EMBOSS-5.0.0/plplot/lib'
make[2]: Entering directory `/usr/src/EMBOSS-5.0.0/plplot'
/bin/bash ../libtool --tag=CC --mode=compile gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"EMBOSS\" -DVERSION=\"5.0.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DX_DISPLAY_MISSING=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1 -I. -I./ -I/usr/include/gd -DPREFIX=\"/usr/local\" -DBUILD_DIR=\".\" -DDRV_DIR=\".\" -DEMBOSS_TOP=\"/usr/src/EMBOSS-5.0.0\" -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2 -MT xwin.lo -MD -MP -MF .deps/xwin.Tpo -c -o xwin.lo xwin.c
gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"EMBOSS\" -DVERSION=\"5.0.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DX_DISPLAY_MISSING=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1 -I.
-I./ -I/usr/include/gd -DPREFIX=\"/usr/local\" -DBUILD_DIR=\".\" -DDRV_DIR=\".\" -DEMBOSS_TOP=\"/usr/src/EMBOSS-5.0.0\" -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2 -MT xwin.lo -MD -MP -MF .deps/xwin.Tpo -c xwin.c -fPIC -DPIC -o .libs/xwin.o
make[2]: Leaving directory `/usr/src/EMBOSS-5.0.0/plplot'
make[1]: Leaving directory `/usr/src/EMBOSS-5.0.0/plplot'

From jerome.laroche at bioinfo.ulaval.ca Wed Oct 31 20:46:50 2007
From: jerome.laroche at bioinfo.ulaval.ca (Jérôme Laroche)
Date: Wed, 31 Oct 2007 16:46:50 -0400
Subject: [EMBOSS] dbxflat and size of index files
Message-ID:

Hello,

I use dbxflat to index uniprot (sprot and trembl) flat files, whose sizes are 1.2 G for sprot and 11 G for trembl. The resulting files are amazingly huge: 11 G. Is this normal?

Another example with Genbank flat files: the division gbsts has a size of 3.3 G. Indexing with dbxflat gives 6.8 G of index files, but dbiflat gives only 199 M of index files. I know it's not necessary to index genbank flat files with dbxflat because each individual file is not bigger than 300 M. I did this just for the demonstration.

Apart from this, all is working very well.

Thank you in advance.

Jérôme Laroche

Centre de bioinformatique et de biologie computationnelle
Université Laval

From ajb at ebi.ac.uk Wed Oct 31 22:07:24 2007
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Wed, 31 Oct 2007 22:07:24 -0000 (GMT)
Subject: [EMBOSS] dbxflat and size of index files
In-Reply-To:
References:
Message-ID: <33217.81.98.241.17.1193868444.squirrel@webmail.ebi.ac.uk>

Hello Jérôme,

Yes, it is normal. It is a combination of three things. First, it is a tree structure; secondly, the tree isn't tightly packed; and thirdly, 64-bit pointers are used throughout. The first will allow on-the-fly updating of the index, the second is for speed of construction/updating and the third is obvious. Another consideration is that, in some cases, the indexes are trees-of-trees to allow duplicate codes to be indexed (e.g. keywords).

Coincidentally, I'm on the lookout for new indexing algorithms at the moment so, if you have a favourite one, we're always open to suggestions.

Alan
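To make the three factors in the reply above concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustrative model under loudly stated assumptions, not the actual dbxflat on-disk layout: the function name leaf_level_bytes, the 2048-byte page, the 12-byte key and the 50% fill factor are all hypothetical values chosen for the illustration.

```python
import math


def leaf_level_bytes(n_keys, key_bytes, pointer_bytes, page_bytes, fill):
    """Estimate the leaf-level size of a paged tree index (hypothetical model).

    Each entry is assumed to hold one key plus one pointer; each page is
    assumed to be filled only up to the fraction 'fill' of its capacity.
    """
    entry_bytes = key_bytes + pointer_bytes
    capacity = page_bytes // entry_bytes          # entries in a full page
    per_page = max(1, int(capacity * fill))       # entries actually stored
    pages = math.ceil(n_keys / per_page)
    return pages * page_bytes


if __name__ == "__main__":
    n = 1_000_000  # hypothetical number of indexed identifiers
    # Tightly packed pages with 32-bit pointers.
    packed = leaf_level_bytes(n, key_bytes=12, pointer_bytes=4,
                              page_bytes=2048, fill=1.0)
    # Half-filled pages with 64-bit pointers.
    loose = leaf_level_bytes(n, key_bytes=12, pointer_bytes=8,
                             page_bytes=2048, fill=0.5)
    print(packed, loose, round(loose / packed, 2))
```

Under these assumptions the loosely packed 64-bit variant is a few times larger than the tightly packed one for a single index; internal tree pages, extra fields such as keywords, and the trees-of-trees used for duplicate codes would add to that. The model says nothing about how far this accounts for the 6.8 G versus 199 M reported in the thread.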