From kvddrift at earthlink.net Sun Apr 2 18:51:23 2006
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 2 Apr 2006 18:51:23 -0400
Subject: [EMBOSS] crash on intel-Mac
In-Reply-To: <51078.81.98.244.247.1143807173.squirrel@webmail.ebi.ac.uk>
References: <51078.81.98.244.247.1143807173.squirrel@webmail.ebi.ac.uk>
Message-ID:

On Mar 31, 2006, at 7:12 AM, ajb at ebi.ac.uk wrote:

> This should now be fixed as long as you apply all the fixes to
> EMBOSS-3.0.0
> from the directory:

Thanks. Another fink user suggested going further and testing for ppc and intel separately in the new config file, so that it looks like:

    if test "`uname -a | grep Darwin`"; then
      if test "`uname -a | grep i386`"; then
        CFLAGS="$CFLAGS -O1"
      else
        # is this the correct setting on darwin-powerpc?
        CFLAGS="$CFLAGS -O2"
      fi
    else
      CFLAGS="$CFLAGS -O2"
    fi
    fi

Would that cause any problems with emboss?

thanks,

- Koen.

From h-weber at users.sourceforge.net Mon Apr 3 13:49:06 2006
From: h-weber at users.sourceforge.net (harald weber)
Date: Mon, 03 Apr 2006 10:49:06 -0700
Subject: [EMBOSS] SeqFreed - a new interface to EMBOSS
Message-ID:

Dear friends,

I'd like to tell you about SeqFreed, a bioinformatics desktop. Among other things, SeqFreed can serve as a GUI to EMBOSS applications. Please download it from 'seqfreed.sourceforge.net', run it and let me know what you think about it. Although many details still have to be improved, I'd like to know whether this kind of app could be useful to you at all.

All the best,
Harald

From dwaner at scitegic.com Tue Apr 4 12:57:45 2006
From: dwaner at scitegic.com (David Waner)
Date: Tue, 04 Apr 2006 09:57:45 -0700
Subject: [EMBOSS] Digest and Pepstats crash using cygwin
Message-ID: <4432A589.6050809@scitegic.com>

I have compiled the 3.0.0 release of Emboss (including all current fixes from the ftp site) for Windows XP using Cygwin version 1.88.
Most of the Emboss programs that I have tested work, but both Digest and Pepstats fail every time with a "Bad float conversion" error. The problem does not seem to depend on the sequence data, and occurs on every file I've tried.

Has anyone else experienced this problem? Any solutions or suggestions would be appreciated.

Thanks.

- David

Example:

    C:> digest -sequence O43291.fa -menu 2 -auto
    Protein proteolytic enzyme or reagent cleavage digest
    Output report [spt2_human.digest]: stdout

    EMBOSS An error in ajarr.c at line 1701:
    Bad float conversion

Test data (O43291.fa):

>swall|O43291|SPT2_HUMAN Kunitz-type protease inhibitor 2 precursor (Hepatocyte growth factor activator inhibitor type 2) (HAI-2) (Placental bikunin).
MAQLCGLRRSRAFLALLGSLLLSGVLAADRERSIHDFCLVSKVVGRCRASMPRWWYNVTD
GSCQLFVYGGCDGNSNNYLTKEECLKKCATVTENATGDLATSRNAADSSVPSAPRRQDSE
DHSSDMFNYEEYCTANAVTGPCRASFPRWYFDVERNSCNNFIYGGCRGNKNSYRSEEACM
LRCFRQQENPPLPLGSKVVVLAGLFVMVLILFLGASMVYLIRVARRNQERALRTVWSSGD
DKEQLVKNTYVL

From simon.andrews at bbsrc.ac.uk Wed Apr 5 05:04:20 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Wed, 5 Apr 2006 10:04:20 +0100
Subject: [EMBOSS] Download server problems?
Message-ID: <324960494b49bb2c0f2679add8452bf9@bbsrc.ac.uk>

Does anyone know what's up with the emboss.open-bio.org FTP server? I can connect, but never get as far as a login prompt.

Simon.
--
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463

From dag at sonsorol.org Wed Apr 5 23:07:33 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Wed, 5 Apr 2006 23:07:33 -0400
Subject: [EMBOSS] Download server problems?
In-Reply-To: <324960494b49bb2c0f2679add8452bf9@bbsrc.ac.uk>
References: <324960494b49bb2c0f2679add8452bf9@bbsrc.ac.uk>
Message-ID:

{forgot to CC the list on this reply ... }

Our fault (open-bio.org hosting) -- the server has some sort of running process with a memory leak we thought we had found.
Turns out we didn't and the box ground itself slowly to a halt this evening.

Thanks to the wonders of remote power control all it takes to reset and power cycle the system is an SSH connection.

We've got another 4GB of memory on order for this system.

Regards.
Chris

On Apr 5, 2006, at 5:04 AM, simon andrews (BI) wrote:

> Does anyone know what's up with the emboss.open-bio.org FTP server? I
> can connect, but never get as far as a login prompt.
>
> Simon.
> --
> Simon Andrews PhD
> Bioinformatics Dept.
> The Babraham Institute
>
> simon.andrews at bbsrc.ac.uk
> +44 (0) 1223 496463
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss

From natalia.jimenez at pcm.uam.es Thu Apr 6 03:56:06 2006
From: natalia.jimenez at pcm.uam.es (Natalia Jimenez Lozano)
Date: Thu, 06 Apr 2006 09:56:06 +0200
Subject: [EMBOSS] Problems with GenBank indexing
Message-ID: <4434C996.7050606@pcm.uam.es>

Hi everybody,

I was trying to retrieve fasta protein sequences from GenBank by id using seqret but it was not possible for every id. However, retrieval by GI is allowed.

Additionally, during the indexing process (dbifasta) I've obtained some errors like this one:

    Warning: Duplicate ID skipped: 'AC000348_16' All hits will point to first ID found

I was looking for an explanation to this behaviour and I've found that skipped IDs correspond to CDS from genomic sequences and have this format:

>gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]
MELPDVPVWRRVIVSAFFEALTFNIDIEEERSEIMMKTGAVVSNPRSRVKWDAFLSFQRDTSHNFTDRLY...
>gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana]
MSVVLQITKDWVQALLGFLLLSFANISTRTNHKHFPHGSCSSIMAGFWIYMYIYSYLFITLKIIDLTS...

In the previous entries, when I try to retrieve one of them by the first identifier (gi), I can get both of them. When I try to do retrievals using the last identifier (AC000348_16), I only get the first one.
But it's impossible to do retrievals by second identifier (AAG13419.1 and AAF79863.1).

However, sequences with the following format can be well indexed:

>gi|64029|emb|CAA23986.1| reading frame [Lophius americanus]
MKMVSSSRLRCLLVLLLSLTASISCSFAGQRDSKLRLLLHRYPLQGSKQDMTRSALAELLLSDLLQGENE ...

and these sequences can be well retrieved by first and second identifiers (64029 and CAA23986.1).

Does anybody know how to solve these problems?
Thanks in advance,
Natalia

From jison at ebi.ac.uk Fri Apr 7 08:02:50 2006
From: jison at ebi.ac.uk (Jon Ison)
Date: Fri, 7 Apr 2006 13:02:50 +0100 (BST)
Subject: [EMBOSS] Problems with GenBank indexing
In-Reply-To: <4434C996.7050606@pcm.uam.es>
References: <4434C996.7050606@pcm.uam.es>
Message-ID: <59463.172.31.100.168.1144411370.squirrel@webmail.ebi.ac.uk>

Dear Natalia

By default, dbifasta will index the ID name and the accession number (if present).

To index the Sequence Version, GI number and words in the description, you must run dbifasta with the '-fields' qualifier, e.g. "-fields acc", "-fields sv acc" etc. If you don't, you will not be able to retrieve by those fields. Please see http://emboss.sourceforge.net/apps/cvs/dbifasta.html.

dbifasta only retrieves the first of any duplicate entries. So far as I'm aware dbxfasta can retrieve duplicate entries.

Does that help? Feel free to get back in touch.

Cheers

Jon

> Hi everybody,
>
> I was trying to retrieve fasta protein sequences from GenBank by id
> using seqret but it was not possible for every id. However, retrieval by
> GI is allowed.
> Thanks in advance,
> Natalia
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>

From natalia.jimenez at pcm.uam.es Fri Apr 7 08:50:16 2006
From: natalia.jimenez at pcm.uam.es (Natalia Jimenez Lozano)
Date: Fri, 07 Apr 2006 14:50:16 +0200
Subject: [EMBOSS] Problems with GenBank indexing
In-Reply-To: <59463.172.31.100.168.1144411370.squirrel@webmail.ebi.ac.uk>
References: <4434C996.7050606@pcm.uam.es> <59463.172.31.100.168.1144411370.squirrel@webmail.ebi.ac.uk>
Message-ID: <44366008.6080106@pcm.uam.es>

Dear Jon,

> Dear Natalia
>
> By default, dbifasta will index the ID name and the accession number (if present).
>
> To index the Sequence Version, GI number and words in the description, you must
> run dbifasta with the '-fields' qualifier, e.g. "-fields acc", "-fields sv acc"
> etc. If you don't, you will not be able to retrieve by those fields. Please
> see http://emboss.sourceforge.net/apps/cvs/dbifasta.html.
>

Yes, the indexing was done taking the -fields parameter into account :-(

> dbifasta only retrieves the first of any duplicate entries. So far as I'm aware
> dbxfasta can retrieve duplicate entries.
>

We'll try with dbxfasta!

> Does that help? Feel free to get back in touch.
>

Yes, a lot. Thank you very much

Regards,
Natalia

> Cheers
>
> Jon
>
>> Hi everybody,
>>
>> I was trying to retrieve fasta protein sequences from GenBank by id
>> using seqret but it was not possible for every id. However, retrieval by
>> GI is allowed.
>>
>> Additionally, during the indexing process (dbifasta) I've obtained some
>> errors like this one:
>>
>> Warning: Duplicate ID skipped: 'AC000348_16' All hits will point to
>> first ID found
>>
>> I was looking for an explanation to this behaviour and I've found that
>> skipped IDs correspond to CDS from genomic sequences and have this format:
>>
>> >gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]
>> MELPDVPVWRRVIVSAFFEALTFNIDIEEERSEIMMKTGAVVSNPRSRVKWDAFLSFQRDTSHNFTDRLY...
>> >gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana]
>> MSVVLQITKDWVQALLGFLLLSFANISTRTNHKHFPHGSCSSIMAGFWIYMYIYSYLFITLKIIDLTS...
>>
>> In the previous entries, when I try to retrieve one of them by the first
>> identifier (gi), I can get both of them. When I try to do retrievals
>> using the last identifier (AC000348_16), I only get the first one. But
>> it's impossible to do retrievals by second identifier (AAG13419.1 and
>> AAF79863.1).
>>
>> However, sequences with the following format can be well indexed:
>>
>> >gi|64029|emb|CAA23986.1| reading frame [Lophius americanus]
>> MKMVSSSRLRCLLVLLLSLTASISCSFAGQRDSKLRLLLHRYPLQGSKQDMTRSALAELLLSDLLQGENE ...
>>
>> and these sequences can be well retrieved by first and second
>> identifiers (64029 and CAA23986.1).
>>
>> Does anybody know how to solve these problems?
>> Thanks in advance,
>> Natalia
>> _______________________________________________
>> EMBOSS mailing list
>> EMBOSS at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/emboss
>>
>

From jison at ebi.ac.uk Fri Apr 7 11:34:24 2006
From: jison at ebi.ac.uk (Jon Ison)
Date: Fri, 7 Apr 2006 16:34:24 +0100 (BST)
Subject: [EMBOSS] Problem indexing PDB fasta file
In-Reply-To: <442BFD56.9010908@pcm.uam.es>
References: <442BFD56.9010908@pcm.uam.es>
Message-ID: <34100.172.31.100.168.1144424064.squirrel@webmail.ebi.ac.uk>

Hi Enrique

dbifasta will return just the first entry with a duplicated id.
The new dbxfasta will return all entries with the duplicated id.

dbifasta is indeed case-insensitive. To make it case-sensitive, you could change the 3 instances of "ajStrMatchCaseC" in dbifasta.c to "ajStrMatchC", recompile and try again. I don't think we'd want to make that change in the distribution though.

Hope that helps.

Cheers

Jon

> Hello,
>
> I'm trying to index the fasta file of the PDB database with dbifasta
> command and I get a lot of warnings as:
>
> Warning: Duplicate ID skipped: '1FNT_A' All hits will point to first ID
> found
>
> I have been looking the PDB fasta file and I see that, for the previous
> warning, there are an entry whose id is '1FNT_A' and another one whose
> id is '1FNT_a'. Then, this make me think that EMBOSS is
> case-insensitive. Is this true? Are there any way to distinguish between
> the two id's?
>
> Thanks in advance,
>
> Enrique.
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>

From pmr at ebi.ac.uk Mon Apr 10 05:12:00 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 10:12:00 +0100
Subject: [EMBOSS] Problem indexing PDB fasta file
In-Reply-To: <442BFD56.9010908@pcm.uam.es>
References: <442BFD56.9010908@pcm.uam.es>
Message-ID: <443A2160.8090102@ebi.ac.uk>

Enrique de Andres Saiz wrote:
> I have been looking the PDB fasta file and I see that, for the previous
> warning, there are an entry whose id is '1FNT_A' and another one whose
> id is '1FNT_a'. Then, this make me think that EMBOSS is
> case-insensitive. Is this true? Are there any way to distinguish between
> the two id's?

Yes, EMBOSS is case-insensitive. So is the Staden/EMBLCD indexing standard that dbifasta uses.

The standard also only allows one entry with each ID.

dbxfasta uses a new indexing format and can index both entries, but will still assume the names are the same (a search for 1FNT_A or 1FNT_a will return both entries).
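
To see in advance which IDs a case-insensitive, one-entry-per-ID index would treat as duplicates, the FASTA headers can be scanned before indexing. The following is only a sketch in Python and is not part of EMBOSS: the function name find_case_duplicates is made up for illustration, the example headers and sequences are invented, and it assumes the ID is the first whitespace-delimited word after '>'.

```python
from collections import defaultdict


def find_case_duplicates(lines):
    """Group FASTA IDs that differ only by letter case.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    Returns {upper-cased ID: [IDs as written]} for groups with more
    than one entry.
    """
    seen = defaultdict(list)
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare '>' header that carries no ID
                seen[fields[0].upper()].append(fields[0])
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    # Invented records; only the two IDs come from the thread.
    example = [
        ">1FNT_A example chain",
        "MTDRYSFSLT",
        ">1FNT_a example chain",
        "MSGAAAASAA",
        ">1ABC_A example chain",
        "MKV",
    ]
    for key, ids in find_case_duplicates(example).items():
        print(key, "->", ", ".join(ids))
```

Each group reported this way corresponds to one "Duplicate ID skipped" warning from dbifasta, so the list shows which entries would need renaming if they must stay distinguishable.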
Allowing indexing to be case-sensitive is possible in future, but can slow down searches. We will investigate.

Hope that helps,

Peter

From pmr at ebi.ac.uk Mon Apr 10 05:05:36 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 10:05:36 +0100
Subject: [EMBOSS] dbifasta index file format
In-Reply-To: <20060330083142.4237.qmail@web26207.mail.ukl.yahoo.com>
References: <20060330083142.4237.qmail@web26207.mail.ukl.yahoo.com>
Message-ID: <443A1FE0.1060707@ebi.ac.uk>

Graziano P. wrote:
> hello EMBOSS users,
> I have some databases in fasta format (ncbi | format)
> and I want to index them using dbifasta, then I want
> to access the index files using a program that will be
> developed by a computer scientist of my group.
> I need to index the databases by accession number,
> ginumber and description. I have read in the dbifasta
> help info about the structure of the index files when
> the databases were indexed by accession number, but I
> have not found info about the structure of the index
> files when the databases are indexed by description.
> Anyone knows where I can find detailed information
> about the structure of the index files?

Ciao Graziano,

The dbifasta index files use the same format as the Staden package, the old EMBL CD-ROM distribution, and Erik Sonnhammer's "efetch" utility.

They were documented in some old Staden documentation and papers.

They are also documented in the EMBOSS distribution under doc/manuals/ in file internals-indexing.txt (see attached).

I see that this document was written before we indexed the descriptions!!!

The description (title) indexing is the same as the accession number indexing. The files are called des.hit and des.trg.

dbifasta has a -maxindex option to limit the size of the longest words indexed (the index files have a value for the maximum record length).
We also have a script in the distribution, scripts/dbilist.pl, which can list the contents of the description index (in the database index directory, run it as dbilist.pl des).

The new dbxfasta index files are very different. For very large databases we recommend dbxfasta. For smaller databases dbifasta is fine and we will continue to support it.

Hope that helps. If you need more details, just ask.

regards,

Peter

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: internals-indexing.txt
Url: http://lists.open-bio.org/pipermail/emboss/attachments/20060410/be632ef4/attachment.txt

From simon.andrews at bbsrc.ac.uk Mon Apr 10 05:40:30 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Mon, 10 Apr 2006 10:40:30 +0100
Subject: [EMBOSS] Problem indexing PDB fasta file
In-Reply-To: <443A2160.8090102@ebi.ac.uk>
References: <442BFD56.9010908@pcm.uam.es> <443A2160.8090102@ebi.ac.uk>
Message-ID: <69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk>

On 10 Apr 2006, at 10:12, Peter Rice wrote:

> Enrique de Andres Saiz wrote:
>> I have been looking the PDB fasta file and I see that, for the previous
>> warning, there are an entry whose id is '1FNT_A' and another one whose
>> id is '1FNT_a'. Then, this make me think that EMBOSS is
>> case-insensitive. Is this true? Are there any way to distinguish between
>> the two id's?
>
> Yes, EMBOSS is case-insensitive. So is the Staden/EMBLCD indexing standard
> that dbifasta uses.
>
> The standard also only allows one entry with each ID.

If anyone's interested I've got a small perl script which reformats the PDB database into a more sensible format and sorts out the problems with case sensitive ids and a number of other odd conventions used in PDB.

I'm happy to supply a copy to anyone who wants it.

TTFN

Simon.
--
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463

From pmr at ebi.ac.uk Mon Apr 10 06:44:47 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 11:44:47 +0100
Subject: [EMBOSS] Problems with GenBank indexing
In-Reply-To: <4434C996.7050606@pcm.uam.es>
References: <4434C996.7050606@pcm.uam.es>
Message-ID: <443A371F.1010100@ebi.ac.uk>

Natalia Jimenez Lozano wrote:
> I was looking for an explanation to this behaviour and I've found that
> skipped IDs correspond to CDS from genomic sequences and have this format:
>
> >gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]
> MELPDVPVWRRVIVSAFFEALTFNIDIEEERSEIMMKTGAVVSNPRSRVKWDAFLSFQRDTSHNFTDRLY...
> >gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana]
> MSVVLQITKDWVQALLGFLLLSFANISTRTNHKHFPHGSCSSIMAGFWIYMYIYSYLFITLKIIDLTS...

As Jon says, dbxfasta is a solution.

However, that is only a partial solution. The real problem is that these FASTA format sequences do indeed have duplicate IDs.

This is protein sequence data, so it is not GenBank - was this GenPept or some other database? GenPept and other databases have been known to report "gb" or "emb" as the database for protein sequences!!!

A possible solution is to add a new ID format to dbifasta and dbxfasta that uses AAG13419 and AAF79863 as the ID and ignores the AC000348_16 part.
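
The proposed ID format can be illustrated outside EMBOSS. The sketch below, in Python, splits an NCBI-style '|' header and takes the accession without its version as the ID, ignoring the trailing locus name. It is not how dbifasta or dbxfasta parse headers; the function name parse_ncbi_header and the returned field names are hypothetical, and only the gi|number|db|accession.version|locus layout seen in this thread is handled.

```python
def parse_ncbi_header(header):
    """Split a header such as
    '>gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]'.

    Assumption: fields are gi|<number>|<db>|<accession.version>|<locus>;
    other NCBI header layouts are not handled by this sketch.
    """
    word, _, description = header.lstrip(">").partition(" ")
    parts = word.split("|")
    if len(parts) < 4 or parts[0] != "gi":
        raise ValueError("not a gi|...|db|acc| style header: " + header)
    return {
        "gi": parts[1],
        "db": parts[2],
        "id": parts[3].split(".")[0],  # proposed ID: accession, no version
        "sv": parts[3],                # accession.version as written
        "locus": parts[4] if len(parts) > 4 else "",  # ignored as an ID
        "description": description.strip(),
    }


if __name__ == "__main__":
    headers = [
        ">gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]",
        ">gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana]",
    ]
    for h in headers:
        rec = parse_ncbi_header(h)
        print(rec["id"], rec["sv"], rec["gi"], rec["locus"])
```

With this choice the two Arabidopsis entries get different IDs even though they share the AC000348_16 locus name, which is the point of the suggestion.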
Hope this helps,

Peter

From pmr at ebi.ac.uk Mon Apr 10 07:04:49 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 12:04:49 +0100
Subject: [EMBOSS] Fwd: EMBOSS for Windows without Cygwin
In-Reply-To:
References: <442CCD71.60202@gmail.com>
Message-ID: <443A3BD1.2040709@ebi.ac.uk>

Duleep Samuel wrote:
> Is the latest EMBOSS version 3.0.0.0 available anywhere as a precompiled
> binary for Windows XP, I have tried compiling using cygwin and it
> crashed, I loaded EMBOSS for windows which is a port of version 2.10.0,
> loaded Staden Package and made Spin aware of EMBOSS and am working, but
> feel bad that I am _One_ whole release behind, If anyone has a compiled
> binary I can download for testing and report back on useability,
> regards, Samuel, Virologist, India

Staden has support for older versions of EMBOSS. We are trying to update Staden to work with EMBOSS 3.0.0 and future releases.

If anyone is using EMBOSS and Staden (especially EMBOSS under the Staden SPIN interface) please contact the EMBOSS developers (emboss-bug at emboss.open-bio.org) so we know how many EMBOSS SPIN users there are. It helps to set priorities for the work.

regards,

Peter

From janenerz at web.de Wed Apr 12 05:09:58 2006
From: janenerz at web.de (Christiane Nerz)
Date: Wed, 12 Apr 2006 11:09:58 +0200
Subject: [EMBOSS] nt-multi-fastA-file
Message-ID: <443CC3E6.4040108@web.de>

Hi all,

I put the gb-file of a whole genome in Artemis.
Is there a possibility to export a multi-FastA-file with the bases of all ORFs? Example:

>ORF_1
ATGTGTTCGTT....
>ORF_2
ATGTTCCCGACCA...
>ORF_3
ATGCCGCAT...

I know how to get all bases, but only as one complete sequence.
(That genome is not published yet, so there is no multi-Fasta-file at ncbi or EMBL available)

Thanks for help!
Jane Nerz

From simon.andrews at bbsrc.ac.uk Wed Apr 12 06:05:49 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Wed, 12 Apr 2006 11:05:49 +0100
Subject: [EMBOSS] nt-multi-fastA-file
In-Reply-To: <443CC3E6.4040108@web.de>
References: <443CC3E6.4040108@web.de>
Message-ID: <902608901e58c68600b4dc52c7e8a966@bbsrc.ac.uk>

On 12 Apr 2006, at 10:09, Christiane Nerz wrote:

> Hi all,
>
> I put the gb-file of an whole genome in Artemis.
> Is there a possibility to export a multi-FastA-file with the bases of
> all ORFs?

If you can save the file out of Artemis with the ORFs shown in the feature table then you can use coderet in EMBOSS to extract out all of the subsequences covering those features, either as protein or DNA.

Hope this helps

Simon.
--
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463

From pmr at ebi.ac.uk Wed Apr 12 06:20:46 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 12 Apr 2006 11:20:46 +0100
Subject: [EMBOSS] nt-multi-fastA-file
In-Reply-To: <443CC3E6.4040108@web.de>
References: <443CC3E6.4040108@web.de>
Message-ID: <443CD47E.6060607@ebi.ac.uk>

Christiane Nerz wrote:
> Hi all,
>
> I put the gb-file of an whole genome in Artemis.
> Is there a possibility to export a multi-FastA-file with the bases of
> all ORFs? Example:
>
> >ORF_1
> ATGTGTTCGTT....
> >ORF_2
> ATGTTCCCGACCA...
> >ORF_3
> ATGCCGCAT...
>
> I know how to get all bases, but only as one complete sequence.
> (That genome is not published yet, so there is no multi-Fasta-file at
> ncbi or EMBL available)

Yes, the coderet program will do this.

Unfortunately coderet tries to return CDS, mRNA and translations all in one file (to be fixed for the next release).

You can ask just for the CDS with a couple of extra command line options:

    coderet -nomrna -notranslation

Give it the filename as input. The output will be the coding sequences.

With -nocds instead of -notranslation you will get the protein sequences.
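
For scripted use, the command above can be wrapped as follows. This is a sketch in Python under stated assumptions: the file names are placeholders, the helper name coderet_command is made up, and passing the input and output files as the two positional arguments is assumed to match the coderet in your installation (check with coderet -help). Only -nomrna, -notranslation and -nocds are taken from the description above.

```python
def coderet_command(genbank_file, output_file, protein=False):
    """Build a coderet command line that keeps one kind of sequence.

    -nomrna with -notranslation leaves the CDS (nucleotide) sequences;
    -nomrna with -nocds leaves the protein translations.
    Assumption: input and output files are accepted positionally.
    """
    drop = "-nocds" if protein else "-notranslation"
    return ["coderet", "-nomrna", drop, genbank_file, output_file]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    cmd = coderet_command("genome.gb", "orfs.fasta")
    print(" ".join(cmd))
    # To run it (requires EMBOSS on the PATH):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to log or inspect the exact command before anything is executed.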
If you have any problems parsing the GenBank file let me know.

regards,

Peter Rice

From Marc.Logghe at DEVGEN.com Wed Apr 12 08:39:00 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 12 Apr 2006 14:39:00 +0200
Subject: [EMBOSS] Embossdata -reject option
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>

Hi,

I am intrigued by the -reject option of embossdata.
According to the doc:
"This specifies the names of the sub-directories of the EMBOSS data directory that should be ignored when displaying data directories. Choose from selection list of values 3, 5, 6".
I was not able to find out what this list of values corresponds to. I hoped to get a list to select from when embossdata was run with the -options parameter, but this did not happen.
Any clues ?
Actually I was trying to find a way to obtain more or less the opposite of '-reject', e.g. what if you only want the content of the CODONS directory ?

Regards,
Marc

From gbottu at ben.vub.ac.be Wed Apr 12 09:30:00 2006
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 12 Apr 2006 15:30:00 +0200
Subject: [EMBOSS] Embossdata -reject option - Checked by AntiVir DEMO versio
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
Message-ID: <20060412133000.GD15725@bigben.ulb.ac.be>

On Wed, Apr 12, 2006 at 02:39:00PM +0200, Marc Logghe wrote:
> I am intrigued by the -reject option of embossdata.
> According to the doc:
> "This specifies the names of the sub-directories of the EMBOSS data
> directory that should be ignored when displaying data directories.
> Choose from selection list of values 3, 5, 6".
> I was not able to find out what this list of values corresponds to.

Indeed tricky to find out what this means :-;
You can look in the file .../share/EMBOSS/acd/embossdata.acd :

    selection: reject [
      default: "3, 5, 6"
      minimum: "1"
      maximum: "6"
      values: "None, AAINDEX, CVS, CODONS, PRINTS, PROSITE, REBASE"
      delimiter: ","
      header: "Directories to ignore"
      information: "Select directories"
      help: "This specifies the names of the sub-directories of the
             EMBOSS data directory that should be ignored when
             displaying data directories."
      button: "Y"
    ]

So, by default CVS, PRINTS and PROSITE are rejected.

> I hoped to get a list to select from when embossdata was run with the
> -options parameter, but this did not happen.

That is because -reject is an "advanced", not an "optional"/"additional" parameter. It is indeed impossible to get a selection list displayed at the command line, although many GUI's like wEMBOSS will show it.

> Actually I was trying to find a way to obtain more or less the opposite
> of '-reject', e.g. what if you only want the content of the CODONS
> directory ?

This does not work, there is no way to reject the files in the base data directory. The best you can do is to add on the command line

    -reject=2,3,5,6,7

or

    -reject=AAINDEX,CVS,PRINTS,PROSITE,REBASE

What you can do however is :

    ls $EMBOSS_DATA/CODONS

Hope this helps,
Guy Bottu, Belgian EMBnet Node

From Marc.Logghe at DEVGEN.com Wed Apr 12 10:02:09 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 12 Apr 2006 16:02:09 +0200
Subject: [EMBOSS] Embossdata -reject option - Checked by AntiVir DEMO versio
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CD9@ANTARESIA.be.devgen.com>

Hi Guy !

> You can look in the file .../share/EMBOSS/acd/embossdata.acd :
>
> selection: reject [
>   default: "3, 5, 6"
>   minimum: "1"
>   maximum: "6"
>   values: "None, AAINDEX, CVS, CODONS, PRINTS, PROSITE, REBASE"
>   delimiter: ","
>   header: "Directories to ignore"
>   information: "Select directories"
>   help: "This specifies the names of the sub-directories of the
>          EMBOSS data directory that should be ignored when
>          displaying data directories."
>   button: "Y"
> ]
>
> So, by default CVS, PRINTS and PROSITE are rejected.

Yes, that makes sense now !

> This does not work, there is no way to reject the files in
> the base data directory. The best you can do is to add on the
> command line
> -reject=2,3,5,6,7 or -reject=AAINDEX,CVS,PRINTS,PROSITE,REBASE
> What you can do however is :
> ls $EMBOSS_DATA/CODONS

Yeah, that is of course the most obvious ;-) Thing is that I wanted to do it in an emboss-only way so that it would be possible to run the emboss command via a soaplab service. The latter should provide a means to dynamically fetch a list of codon usage tables. More or less like showdb is doing.

>
> Hope this helps,

Yes it did. Thanks !
Regards,
Marc

From pmr at ebi.ac.uk Wed Apr 12 12:04:21 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 12 Apr 2006 17:04:21 +0100 (BST)
Subject: [EMBOSS] Embossdata -reject option
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
Message-ID: <3057.86.137.128.238.1144857861.squirrel@webmail.ebi.ac.uk>

Marc Logghe wrote:
> I am intrigued by the -reject option of embossdata.
>
> I was not able to find out what this list of values corresponds to. I
> hoped to get a list to select from when embossdata was run with the
> -options parameter, but this did not happen.
> Any clues ?

Hmmmm ....
yes, -help and the acdtable output (the table in the webpage application documentation) really need to report the list of menu items for values that are not prompted (list and selection datatypes).

We will do that for the next release!

Otherwise, you do need to look in the ACD file.

I propose:

-help to report documentation on the options
-help -verbose to report the list of options

acdtable to report the full menu formatted in the "Allowed values" box.

When this is implemented, it will appear in the apps/cvs/embossdata.html documentation at emboss.sf.net :-)

>Yeah, that is of course the most obvious ;-) Thing is that I wanted to
>do it in an emboss-only way so that it would be possible to run the
>emboss command via a soaplab service. The latter should provide a means
>to dynamically fetch a list of codon usage tables. More or less like
>showdb is doing.

We are looking at ways to do that ... can be tricky if cutgextract has been run. Any suggestions? A showdata application perhaps?

Hope that helps,

Peter

From Marc.Logghe at DEVGEN.com Wed Apr 12 12:21:17 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 12 Apr 2006 18:21:17 +0200
Subject: [EMBOSS] Embossdata -reject option
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CDB@ANTARESIA.be.devgen.com>

Hi Peter,

> Hmmmm .... yes, -help and the acdtable output (the table in
> the webpage application documentation) really need to report
> the list of menu items for values that are not prompted (list
> and selection datatypes).
>
> We will do that for the next release!
>
> Otherwise, you do need to look in the ACD file.
>
> I propose:
>
> -help to report documentation on the options -help -verbose
> to report the list of options
>
> acdtable to report the full menu formatted in the "Allowed
> values" box.

OK, great !

> We are looking at ways to do that ... can be tricky if
> cutgextract has been run. Any suggestions? A showdata
> application perhaps?

Yes that could be a start.
You could give the directory name as a parameter, the opposite of the -reject parameter (-include ?). In its basic form it can just list the file content like embossdata -showall is doing. An example command that lists all the codon tables could be: 'showdata -include CODONS'.

Something else. In order not to contaminate the CODONS folder I created a CUTG folder in the directory containing the codon tables extracted from the most recent CUTG. Problem now is a user has to add the relative filename as a cfile option (backtranseq) in order for EMBOSS to find the new codon tables. Would it be an idea that you can set $EMBOSS_DATA to a list of values instead of only 1 directory name ? In that way, EMBOSS can access custom data directories.

Suppose the following:

    EMBOSS_DATA=/usr/local/share/EMBOSS/data:/my/other/emboss_data_dir/CUTG

If a codon table is not found in the usual place (/usr/local/share/EMBOSS/data/CODONS) EMBOSS will look for them in other places defined in EMBOSS_DATA (/my/other/emboss_data_dir/CUTG). Or something alike.
Does that make sense ?

Cheers,
Marc

From simon.andrews at bbsrc.ac.uk Thu Apr 13 04:43:53 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 13 Apr 2006 09:43:53 +0100
Subject: [EMBOSS] Problems indexing with dbxfasta
Message-ID:

I'm trying to use dbxfasta to index one of the WGS trace databases. Unfortunately dbxfasta is falling over on me.
The session looks like this:

    $ dbxfasta
    Database b+tree indexing for fasta file databases
    Basename for index files: traces_oanatinus
    Resource name: all
        simple : >ID
         idacc : >ID ACC
         gcgid : >db:ID
      gcgidacc : >db:ID ACC
          dbid : >db ID
          ncbi : | formats
    ID line format [idacc]: simple
    Database directory [.]:
    Wildcard database filename [*.dat]: *.fasta
    Release number [0.0]:
    Index date [00/00/00]:
    Processing file ./nisc-platypus-shotgun-1048960391.fasta
    Processing file ./nisc-platypus-shotgun-1071756042.fasta
    Processing file ./nisc-platypus-shotgun-1080815515.fasta
    Processing file ./nisc-platypus-shotgun-1102160893.fasta
    Processing file ./nisc-platypus-shotgun-1104879084.fasta
    Processing file ./nisc-platypus-shotgun-1109000445.fasta
    Processing file ./nisc-platypus-shotgun-1110804272.fasta
    Processing file ./nisc-platypus-shotgun-1116844699.fasta
    Processing file ./nisc-platypus-shotgun-1142973027.fasta
    Processing file ./wugsc-ornithorhynchus_anatinus-cloneEnd-1113828608.fasta
    Processing file ./wugsc-ornithorhynchus_anatinus-cloneEnd-1115655383.fasta
    Processing file ./wugsc-ornithorhynchus_anatinus-cloneEnd-1119433885.fasta

    EMBOSS An error in ajindex.c at line 615:
    Maximum retries (100) reached in btreeCacheFetch for page 14240710656

The same files have indexed OK with formatdb. I haven't tried with dbifasta as I'm trying to move everything over to the new dbx system (and the rest of our databases have processed OK with dbx(fasta|flat)).

Anyone have any ideas about how to debug this?

Cheers

Simon.
--
Simon Andrews PhD
Bioinformatics Group
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463

From Marc.Logghe at DEVGEN.com Thu Apr 13 05:00:56 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Thu, 13 Apr 2006 11:00:56 +0200
Subject: [EMBOSS] Problems indexing with dbxfasta
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CE0@ANTARESIA.be.devgen.com>

Hi Simon,

> The same files have indexed OK with formatdb.
> I haven't tried with dbifasta as I'm trying to move everything over to
> the new dbx system (and the rest of our databases have
> processed OK with dbx(fasta|flat)).
>
> Anyone have any ideas about how to debug this?

You can run the command with the -debug option (any EMBOSS application
accepts this option). In that case a dbxfasta.dbg file will be created.
Hopefully this file will give you some clues.

Cheers,
Marc

From ajb at ebi.ac.uk Thu Apr 13 05:19:44 2006
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Thu, 13 Apr 2006 10:19:44 +0100 (BST)
Subject: [EMBOSS] Problems indexing with dbxfasta
In-Reply-To:
References:
Message-ID: <52959.81.98.244.247.1144919984.squirrel@webmail.ebi.ac.uk>

Hello Simon,

Did you pick up the latest set of patches from:
ftp://emboss.open-bio.org/pub/EMBOSS/fixes/ ?

The indexing system was rewritten a few months ago to fix this. See the
README in that directory.

If you are using the latest fixes (check file sizes) and it is still
failing then let me know.

HTH

Alan

From simon.andrews at bbsrc.ac.uk Thu Apr 13 05:30:41 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 13 Apr 2006 10:30:41 +0100
Subject: [EMBOSS] Problems indexing with dbxfasta
In-Reply-To: <52959.81.98.244.247.1144919984.squirrel@webmail.ebi.ac.uk>
References: <52959.81.98.244.247.1144919984.squirrel@webmail.ebi.ac.uk>
Message-ID: <25bf10458f1cd7e0cc1c64de70f6bdef@bbsrc.ac.uk>

On 13 Apr 2006, at 10:19, ajb at ebi.ac.uk wrote:

> Hello Simon,
>
> Did you pick up the latest set of patches from:
> ftp://emboss.open-bio.org/pub/EMBOSS/fixes/

Yes. All patched with the latest fixes as of last week.

> If you are using the latest fixes (check file sizes) and it is still
> failing then let me know.

It is still failing. I'll have a go at generating a .dbg file if you
think it'll help, but given how verbose those tend to be, and how long
it takes to fail, I was a bit concerned about the size of the file it
was likely to generate.

Simon.

--
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463

From simon.andrews at bbsrc.ac.uk Thu Apr 13 05:41:13 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 13 Apr 2006 10:41:13 +0100
Subject: [EMBOSS] Problems indexing with dbxfasta
Message-ID:

I managed to get hold of a debug file from the failing dbxfasta. The
edited highlights are:

Debug file dbxfasta.dbg buffered:No
ajFileNewIn '/usr/local/share/EMBOSS/acd/dbxfasta.acd'
EOF ajFileGetsL file /usr/local/share/EMBOSS/acd/dbxfasta.acd
closing file '/usr/local/share/EMBOSS/acd/dbxfasta.acd'
ajUserGet buffer len: 0 res: 2048 ptr: 8d8eb18
ajUserGet buffer len: 0 res: 2048 ptr: 8d8eb18
ajUserGet buffer len: 5 res: 2048 ptr: 8d8f320
ajUserGet buffer len: 1 res: 2048 ptr: 8d8fb28
ajUserGet buffer len: 5 res: 2048 ptr: 8d8eb18
ajUserGet buffer len: 3 res: 2048 ptr: 8d8eb18
ajUserGet buffer len: 8 res: 2048 ptr: 8d8eb18
ajFileScan directory: './'
nisc-platypus-shotgun-1071756042.fasta
nisc-platypus-shotgun-1080815515.fasta
nisc-platypus-shotgun-1102160893.fasta

[snip big list of files]

closing file './/traces_oanatinus.ent'
ajFileNewIn './nisc-platypus-shotgun-1048960391.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1048960391.fasta
closing file './nisc-platypus-shotgun-1048960391.fasta'
ajFileNewIn './nisc-platypus-shotgun-1071756042.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1071756042.fasta
closing file './nisc-platypus-shotgun-1071756042.fasta'
ajFileNewIn './nisc-platypus-shotgun-1080815515.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1080815515.fasta
closing file './nisc-platypus-shotgun-1080815515.fasta'
ajFileNewIn './nisc-platypus-shotgun-1102160893.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1102160893.fasta
closing file './nisc-platypus-shotgun-1102160893.fasta'
ajFileNewIn './nisc-platypus-shotgun-1104879084.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1104879084.fasta
closing file './nisc-platypus-shotgun-1104879084.fasta'
ajFileNewIn './nisc-platypus-shotgun-1109000445.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1109000445.fasta
closing file './nisc-platypus-shotgun-1109000445.fasta'
ajFileNewIn './nisc-platypus-shotgun-1110804272.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1110804272.fasta
closing file './nisc-platypus-shotgun-1110804272.fasta'
ajFileNewIn './nisc-platypus-shotgun-1116844699.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1116844699.fasta
closing file './nisc-platypus-shotgun-1116844699.fasta'
ajFileNewIn './nisc-platypus-shotgun-1142973027.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1142973027.fasta
closing file './nisc-platypus-shotgun-1142973027.fasta'
ajFileNewIn './wugsc-ornithorhynchus_anatinus-cloneEnd-1113828608.fasta'
WriteBucket: Overflow
WriteBucket: Overflow
ReadBucket: Overflow
ReadBucket: Overflow
ReadBucket: Overflow
ReadBucket: Overflow
WriteBucket: Overflow

[Loads more of these]

GetKeys: Overflow
ReadBucket: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow
WriteNode: Overflow
WriteBucket: Overflow
WriteBucket: Overflow

[Loads of these]

WriteNode: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow

[Killed at this point as the .dbg file getting enormous]

From ajb at ebi.ac.uk Thu Apr 13 06:22:49 2006
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Thu, 13 Apr 2006 11:22:49 +0100 (BST)
Subject: [EMBOSS] Problems indexing with dbxfasta
In-Reply-To:
References:
Message-ID: <36857.81.98.244.247.1144923769.squirrel@webmail.ebi.ac.uk>

Hi Simon,

The overflow code isn't fully implemented yet and it shouldn't need
to use it if your resource definition is OK. You'll get
overflows if the length values are too short for the
ID/ACC/SV/etc. Take a look and get back to me off-list
if adjusting any appropriate length resource definitions
doesn't help.

HTH

Alan

From simon.andrews at bbsrc.ac.uk Thu Apr 13 09:36:11 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 13 Apr 2006 14:36:11 +0100
Subject: [EMBOSS] Problems indexing with dbxfasta
In-Reply-To: <36857.81.98.244.247.1144923769.squirrel@webmail.ebi.ac.uk>
References: <36857.81.98.244.247.1144923769.squirrel@webmail.ebi.ac.uk>
Message-ID:

Alan,

I increased all of the values in the resource definition and did the
index again and it all worked fine this time. Looks like there must be
some very long ids somewhere in this data.

Thanks for the help

Simon.

--
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463

From msarachu at biol.unlp.edu.ar Mon Apr 17 16:55:47 2006
From: msarachu at biol.unlp.edu.ar (Martin Sarachu)
Date: Mon, 17 Apr 2006 17:55:47 -0300
Subject: [EMBOSS] wEMBOSS-1.6.0 & wrappers4EMBOSS-1.4.0 release
Message-ID: <444400D3.6080705@biol.unlp.edu.ar>

This is to announce the release of both wEMBOSS-1.6.0 and
wrappers4EMBOSS-1.4.0

Changes in wEMBOSS-1.6.0 include:
- compatibility with new datatypes in EMBOSS-3.0.0
- better conversion of ACD expressions to Perl to maintain the same
  order of priority as in EMBOSS
- increased speed by preprocessing EMBOSS datafiles

Changes in wrappers4EMBOSS include:
- all programs that compute a gap penalty of type a*n+b now have
  parameters -gappenalty and -gaplength instead of -gapopen and
  -gapextend
- muscle version updated for MUSCLE-3.6
- support for EMBOSS v2.9, 2.10 and 3. EMBOSS-2.8 is no longer supported
- fastapid uses matrices coded in the software rather than read from
  files
- indexsearch can also run with SRS 8 and it also fully runs on the
  command line

We are experiencing some difficulties at the wEMBOSS site so you can
download both files at http://www.ar.embnet.org/downloads
Shortly you will be able to download both at http://www.wemboss.org as
usual.

wEMBOSS includes wrappers4EMBOSS but if you want to use just
wrappers4EMBOSS on the command line just like any EMBOSS program you
can download it separately.

Regards, the wEMBOSS & wrappers4EMBOSS dev team.
--
Martin Sarachu
msarachu at biol.unlp.edu.ar
AR.EMBnet
http://www.ar.embnet.org

From dalesan at lamar.colostate.edu Tue Apr 18 19:53:03 2006
From: dalesan at lamar.colostate.edu (Dale Richardson)
Date: Tue, 18 Apr 2006 17:53:03 -0600
Subject: [EMBOSS] Compilation errors on Intel Mac in ajindex.c
Message-ID: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>

Hello All,

I am trying to install EMBOSS 3.0 on my MacBook Pro. Interestingly, I
have come across an error that I haven't been able to resolve via
googling. When running make, the following error is encountered:

ajindex.c: In function 'ajBtreeCacheNewC':
ajindex.c:200: error: storage size of 'buf' isn't known
ajindex.c: In function 'ajBtreeSecCacheNewC':
ajindex.c:8234: error: storage size of 'buf' isn't known
make[1]: *** [ajindex.lo] Error 1
make: *** [all-recursive] Error 1

Is there a way around this? I've applied the fixes available from the
fixes directory at ftp://emboss.open-bio.org/pub/EMBOSS/ and tried to
reconfigure and recompile but to no avail.

Insights and suggestions would be much appreciated.

Thanks,

Dale Richardson
Colorado State University
dalesan at lamar.colostate.edu

From kvddrift at earthlink.net Tue Apr 18 21:27:30 2006
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 18 Apr 2006 21:27:30 -0400
Subject: [EMBOSS] Compilation errors on Intel Mac in ajindex.c
In-Reply-To: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>
References: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>
Message-ID:

On Apr 18, 2006, at 7:53 PM, Dale Richardson wrote:

> Insights and suggestions would be much appreciated.

You could try to install emboss using fink, which is reportedly working
on an Intel Mac (not tested by myself though).

- Koen.
From kvddrift at earthlink.net Tue Apr 18 21:30:43 2006
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 18 Apr 2006 21:30:43 -0400
Subject: [EMBOSS] Compilation errors on Intel Mac in ajindex.c
In-Reply-To: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>
References: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>
Message-ID: <5195A9EE-A20F-4804-9AD5-FA08662D8912@earthlink.net>

On Apr 18, 2006, at 7:53 PM, Dale Richardson wrote:

> Is there a way around this? I've applied the fixes available from
> the fixes directory at ftp://emboss.open-bio.org/pub/EMBOSS/ and
> tried to reconfigure and recompile but to no avail.
>
> Insights and suggestions would be much appreciated.

Just another thought: did you also replace the configure file with the
one from the fixes directory, and then run the ./configure command?

- Koen.

From olivier.friard at unito.it Fri Apr 21 11:00:20 2006
From: olivier.friard at unito.it (Olivier Friard)
Date: Fri, 21 Apr 2006 17:00:20 +0200
Subject: [EMBOSS] index RefSeq for EMBOSS
Message-ID: <4448F384.7020900@unito.it>

Hi,

I tried to index the RefSeq database:

1) I downloaded all
ftp://ftp.ncbi.nih.gov/refseq/release/complete/complete*.genomic.gbff.gz
files (GB format)

2) gunzipped them

3) Added the rs_dna entry to my .embossrc file

DB rs_dna [
  type: "N"
  method: "emblcd"
  format: "GB"
  dir: "/home/users/friard/data/refseq_genomic/"
  file: "*.gbff"
  release: ""
  comment: "RefSeq Genomic (upd)"
  indexdir: "/home/users/friard/data/refseq_genomic/"
]

4) used dbiflat with the following arguments (from the directory where
the files are stored)

dbiflat
Index a flat file database
Database name: rs_dna
EMBL : EMBL
SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
GB : Genbank, DDBJ
REFSEQ : Refseq
Entry format [SWISS]: REFSEQ
Database directory [.]:
Wildcard database filename [*.dat]: *.gbff
Release number [0.0]:
Index date [00/00/00]:

The indexes were created, but when I try to access a sequence (e.g.
seqret rs_rna:NC_000004) the result is not the correct sequence but
another one with the NC_000004 ID!

I also downloaded the files in FASTA format and tried to index them
with the dbifasta command (format: ncbi), without success:

seqret rs_dna:nc_000004
Reads and writes (returns) sequences
Error: Unable to read sequence 'rs_dna:nc_000004'
Died: seqret terminated: Bad value for '-sequence' and no prompt

Does anyone index the RefSeq successfully?
Thank you in advance

--
Olivier Friard
Laboratorio di Biologia Computazionale
Facoltà di Scienze MFN
Università di Torino
via Accademia Albertina 13, 10124 TORINO (Italy)

tel. +39 011 6704689

From simon.andrews at bbsrc.ac.uk Fri Apr 21 11:35:29 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 21 Apr 2006 16:35:29 +0100
Subject: [EMBOSS] index RefSeq for EMBOSS
In-Reply-To: <4448F384.7020900@unito.it>
References: <4448F384.7020900@unito.it>
Message-ID:

On 21 Apr 2006, at 16:00, Olivier Friard wrote:

> The indexes were created, but when I try to access a sequence (e.g.
> seqret rs_rna:NC_000004) the result is not the correct sequence but
> another one with the NC_000004 ID!

Is it just finding the wrong sequence or could you have duplicate
entries in the data? Use entret to see if the entry really has that ID.

We found that we got problems with incorrect or no sequences being
returned by seqret when some of the individual sequence files were >2Gb
in size. In these cases you can use the new dbx* indexing programs,
which handle large files properly.

> Does anyone index the RefSeq successfully?

Yes. We use it here without problems, but indexed with dbxflat.
It gets indexed with:

dbxflat -dbresource all -auto -idformat refseq -dbname refseq_all -filenames \*.gbff

..and the emboss.default entry looks like:

DB refseq_all [
  type: N
  comment: "Refseq"
  method: emboss
  format: genbank
  dbalias: refseq_all
  directory: /data/public/DNA/Refseq/Current/all
  file: *.gbff
]

with the resource section being:

RES all [
  type: Index
  idlen: 15
  acclen: 15
  svlen: 15
  keylen: 15
  deslen: 15
  orglen: 15
]

Simon.

--
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463

From isabelle.wells at roche.com Fri Apr 21 11:43:27 2006
From: isabelle.wells at roche.com (Wells, Isabelle)
Date: Fri, 21 Apr 2006 17:43:27 +0200
Subject: [EMBOSS] index RefSeq for EMBOSS
Message-ID:

Hi,

Yes I also index refseq. I think the problem here is that dbiflat can
only handle files which are less than 2GB. So try splitting the files
first.

Best,
Isabelle

From David.Bauer at schering.de Mon Apr 24 01:52:50 2006
From: David.Bauer at schering.de (David.Bauer at schering.de)
Date: Mon, 24 Apr 2006 07:52:50 +0200
Subject: [EMBOSS] index RefSeq for EMBOSS
In-Reply-To:
Message-ID:

You can also try the new indexing programs dbxflat and dbxfasta, which
can handle files larger than 2 GB.

Regards,
David.

From olivier.friard at unito.it Wed Apr 26 06:29:51 2006
From: olivier.friard at unito.it (Olivier Friard)
Date: Wed, 26 Apr 2006 12:29:51 +0200
Subject: [EMBOSS] index RefSeq with dbxflat
Message-ID: <444F4B9F.1020209@unito.it>

Hello,

Thank you for your kind help with indexing refseq.

I tried to index the RefSeq DNA db using the dbxflat program with the
following arguments:

dbxflat
Database b+tree indexing for flat file databases
Basename for index files: rs_dna
Resource name: rs_dna
EMBL : EMBL
SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
GB : Genbank, DDBJ
REFSEQ : Refseq
Entry format [SWISS]: REFSEQ
Wildcard database filename [*.dat]: *.gbff
Database directory [.]: /home/users/friard/data/refseq_genomic
id : ID
acc : Accession number
sv : Sequence Version and GI
des : Description
key : Keywords
org : Taxonomy
Index fields [id,acc]:

I included these records in my .embossrc file:

DB rs_dna [
  type: "N"
  method: "emboss"
  dbalias: "rs_dna"
  format: "genbank"
  directory: "/home/users/friard/data/refseq_genomic/"
  file: "*.gbff"
  comment: "RefSeq DNA (dbxflat)"
]

RES rs_dna [
  type: Index
  idlen: 15
  acclen: 15
  svlen: 15
  keylen: 15
  deslen: 15
  orglen: 15
]

but when I try to retrieve a single sequence with its AC (seqret
rs_dna:NC_001911) the program fails with this error message:

seqret rs_dna:NC_001191
Reads and writes (returns) sequences
Error: Unable to read sequence 'rs_dna:NC_001191'
Died: seqret terminated: Bad value for '-sequence' and no prompt

When I try to retrieve all sequences with
"seqret rs_dna:* -out fasta::refseq.fasta" everything works well.

I tried to use dbxfasta with the *.fna files
(modifying the .embossrc file with the "fasta" value) but I
obtained the same error. Any idea about the problem? Thank you in advance Olivier Friard From xiaozhendong at gmail.com Wed Apr 26 09:50:01 2006 From: xiaozhendong at gmail.com (zhendong shaw) Date: Wed, 26 Apr 2006 21:50:01 +0800 Subject: [EMBOSS] how to using Einverted to process a file contain multiple sequences Message-ID: Since the Einverted program is designed to process only one sequences a time. Are there any ways to handle a file in fasta format containing multiple sequences? The input file just like follow: >seq1 ATTTTTTTTTTTTTTTTTTTT >seq2 TTTAAAAAAAAAAAAAAA ....... sth like that.... From rls at ebi.ac.uk Wed Apr 26 11:46:51 2006 From: rls at ebi.ac.uk (Rodrigo Lopez) Date: Wed, 26 Apr 2006 16:46:51 +0100 Subject: [EMBOSS] FW: Forthcoming change in the EMBL flatfile format Message-ID: <00c401c66948$a2e29d40$0132a8c0@windows.ebi.ac.uk> > -----Original Message----- > From: owner-seq-dbg at ebi.ac.uk > [mailto:owner-seq-dbg at ebi.ac.uk] On Behalf Of Carola Kanz > Sent: 26 April 2006 16:29 > To: seq-dbg at ebi.ac.uk > Subject: Forthcoming change in the EMBL flatfile format > > > Dear all, > > if you are working with the EMBL flatfile format and you are > not yet aware of the format change we are going to introduce > with the next release, please have a look at the following > announcement. > Carola > > > -------------------------------------------------------------- > ----------- > > Dear colleagues, > > We would like to announce the following important change in > the EMBL database in June this year. > > At the time of release 87 (available from JUN-2006) the > format of the EMBL flat file will undergo a change: the ID > line will have a different structure (see below) and the SV > line will be removed. > > The changes affecting the ID line structure are: > > * All tokens will be separated by a semicolon. > * The entry name will not be displayed, in its place > there will be > the primary accession number. > * The sequence version will be indicated. 
> * The topology will be a separate token and will be indicated for
>   both circular and linear molecules.
> * Both the data class and the taxonomic divisions will be displayed.
>
> This is an example of the new ID line:
>
> ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP.
>      (1)       (2)   (3)     (4)          (5)  (6)  (7)
>
>
> The tokens represent:
>
> 1. Primary accession number.
> 2. 'SV' + sequence version number.
> 3. Topology: 'circular' or 'linear'.
> 4. Molecule type.
> 5. Data class (ANN, CON, PAT, EST, GSS, HTC, HTG, MGA, WGS, TPA,
>    STS, STD; "normal" entries will have STD for standard).
> 6. Taxonomic division (HUM, MUS, ROD, PRO, MAM, VRT, FUN, PLN, ENV,
>    INV, SYN, UNC, VRL, PHG).
> 7. Sequence length + 'BP.'.
>
> The entry name will not be displayed any more in the ID line.
> Since EMBL release 3 (Dec 1983) the stable identifier of an
> entry has been the primary accession number.
>
> A mapping file (entryname to accession number) will be
> provided with the next release for those entries where the
> entryname doesn't coincide with the accession number.
>
> To give users a test dataset, one file with new-style ID
> lines called new_id_line.test.gz was provided together with
> the March release of the EMBL database:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/new_id_line.test.gz
>
> Feedback from users is sought; please use the "Contact us"
> link at the bottom of the EBI home page and specify "EMBL" in
> the feedback form.
>
> Note: this information was first made available on our
> "Forthcoming changes" page (
> http://www.ebi.ac.uk/embl/Documentation/forthcomingchanges.html#0606
> ) and in the EMBL database release notes.
>
>
>
>
>

From pmr at ebi.ac.uk Fri Apr 28 05:04:31 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 28 Apr 2006 10:04:31 +0100
Subject: [EMBOSS] EMBOSS Funding News
Message-ID: <4451DA9F.5030906@ebi.ac.uk>

EMBOSS will be funded by the UK Biotechnology and Biological Sciences
Research Council (BBSRC) for the next 3 years.
EBI has issued the following press release, also available from:

http://www.ebi.ac.uk/Information/News/pdf/Press25Apr06-small.pdf

The EMBOSS team would like to thank all our users and developers for
their patience over the past two years.

regards,

Peter Rice
Alan Bleasby
Jon Ison


A brighter future for Europe's favourite molecular biology software
package

New funding for EMBOSS - Europe's leading suite of molecular biology
analysis tools - guarantees open access for researchers and software
developers

Hinxton, 25 April, 2006 - EMBOSS, the European Molecular Biology Open
Software Suite, has received a vital funding boost from the UK
Biotechnology and Biological Sciences Research Council (BBSRC) that will
guarantee its continued maintenance under an open source license for the
next three years. This ends two years of uncertainty over the future of
the project.

Until recently, EMBOSS was hosted by the Medical Research Council's
Rosalind Franklin Centre for Genomics Research (RFCGR), where it was
funded jointly by the BBSRC and the Medical Research Council (see "notes
for editors" for more information on the history of EMBOSS). With the
announcement in April 2004 of the RFCGR's closure, the future of EMBOSS
hung in the balance. The new funding from the BBSRC means that EMBOSS
co-founders Peter Rice and Alan Bleasby will be able to continue the
EMBOSS project at the EMBL-EBI for the next three years. EMBOSS will
remain freely available from emboss.sourceforge.net and anyone who wants
to develop it further will have access to its source code.

"We're delighted that the BBSRC has recognized EMBOSS as an important
tool for molecular biology" says project leader Peter Rice. "The EMBOSS
user community has been very patient, and it highlights a great benefit
of open source software that even users in industry have continued to
rely on EMBOSS despite the uncertainty about its future. This simply
could not have happened if EMBOSS had been a commercial package under
threat."
EMBOSS provides a powerful package of around 300 applications for
molecular biology and bioinformatics analysis. Molecular biologists use
EMBOSS at all stages of their research, from planning experiments to
analysing results. It also has an application-programming interface
(API) that enables software developers to write their own EMBOSS
applications. These can readily be strung together, allowing users to
create "workflows" that automate complex and time-consuming tasks.
EMBOSS has also been used in many commercial software developments and
is included in commercial bioinformatics systems. Its flexibility has
made it an obvious core component of several data integration and
bioinformatics infrastructure projects, including myGrid and EMBRACE.

The new funding also provides helpdesk support for EMBOSS's users. "As
well as helping researchers with limited bioinformatics expertise to
make the most of EMBOSS, we will be able to provide better support and
documentation to the estimated 20% of our users who are also software
developers", explains Alan Bleasby. "We will encourage these experts to
contribute their code to the project. In return, we will make their
software widely available through the EMBOSS website and provide ongoing
user support for it. This mechanism will help to ensure that EMBOSS
evolves according to the needs of its users."

Contact:

Cath Brooksbank PhD, EMBL-EBI Scientific Outreach Officer, Hinxton, UK,
Tel: +44 1223 492 552, www.ebi.ac.uk, cath at ebi.ac.uk

Anna-Lynn Wegener, EMBL Press Officer, Heidelberg, Germany,
Tel: +49 6221 387 452, www.embl.org, wegener at embl.de

Notes for editors - a brief history of EMBOSS

EMBOSS, an open source suite of tools for the analysis of biological
data, has its origins in the late 1980s when Peter Rice, a co-founder of
EMBOSS, was working at EMBL. Encouraged by his colleagues in the lab, he
began to write extensions to the GCG package, which at that time
provided its source code to users.
His efforts evolved into EGCG (extended GCG) and Rice moved to the
Sanger Centre (now the Wellcome Trust Sanger Institute) to continue its
development. However, the changes to the source code licensing of GCG in
1996 put an end to further development of EGCG. Recognizing the
importance of free source code to the rapid and cost-effective
development of bioinformatics tools, Rice, in collaboration with Alan
Bleasby (then at SEQNET, Daresbury, UK) began working on a new suite of
open-source bioinformatics tools - the EMBOSS project - in 1996.

EMBOSS has been funded by: the Wellcome Trust (1997-2000); the BBSRC and
MRC (2001-2004); and through two posts at the MRC Rosalind Franklin
Centre for Genomic Research following a merger with BBSRC's SEQNET
facility in 1998. After the closure of RFCGR in July 2005, EMBOSS moved
to the EMBL-EBI where it is coordinated by Rice and Bleasby.

About EMBL:

The European Molecular Biology Laboratory is a basic research institute
funded by public research monies from 19 member states (Austria,
Belgium, Croatia, Denmark, Finland, France, Germany, Greece, Iceland,
Ireland, Israel, Italy, the Netherlands, Norway, Portugal, Spain,
Sweden, Switzerland and the United Kingdom). Research at EMBL is
conducted by approximately 80 independent groups covering the spectrum
of molecular biology. The Laboratory has five units: the main Laboratory
in Heidelberg, and Outstations in Hinxton (the European Bioinformatics
Institute), Grenoble, Hamburg, and Monterotondo near Rome. The
cornerstones of EMBL's mission are: to perform basic research in
molecular biology; to train scientists, students and visitors at all
levels; to offer vital services to scientists in the member states; to
develop new instruments and methods in the life sciences and to actively
engage in technology transfer activities. EMBL's International PhD
Programme has a student body of about 170.
The Laboratory also sponsors an active Science and Society programme.
Visitors from the press and public are welcome.

About EBI:

The European Bioinformatics Institute (EBI) is part of the European
Molecular Biology Laboratory (EMBL) and is located on the Wellcome Trust
Genome Campus in Hinxton near Cambridge (UK). The EBI grew out of EMBL's
pioneering work in providing public biological databases to the research
community. It hosts some of the world's most important collections of
biological data, including DNA sequences (EMBL-Bank), protein sequences
(UniProt), animal genomes (Ensembl), three-dimensional structures (the
Macromolecular Structure Database), data from microarray experiments
(ArrayExpress), protein-protein interactions (IntAct) and pathway
information (Reactome). The EBI hosts several research groups and its
scientists continually develop new tools for the biocomputing community.

Policy regarding use:

EMBL press releases may be freely reprinted and distributed via print
and electronic media. Text, photographs & graphics are copyrighted by
EMBL. They may be freely reprinted and distributed in conjunction with
this news story, provided that proper attribution to authors,
photographers and designers is made. High-resolution copies of the
images can be downloaded from the EMBL web site: www.embl.org

From rsucgang at bcm.tmc.edu Fri Apr 28 17:33:59 2006
From: rsucgang at bcm.tmc.edu (richard sucgang phd)
Date: Fri, 28 Apr 2006 16:33:59 -0500
Subject: [EMBOSS] backtranambig missing?
In-Reply-To: <69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk>
References: <442BFD56.9010908@pcm.uam.es> <443A2160.8090102@ebi.ac.uk>
 <69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk>
Message-ID:

I am using EMBOSS on OSX (installed using fink). Is it my
imagination, or is the application backtranambig missing? The
documentation on sf.net points to this application existing, yet, I
cannot find the binary in the install. Any ideas?
-- Richard Sucgang, PhD (713) 798 7657 http://www.dictygenome.org/ From francis.tang at chukhang.com Fri Apr 28 18:39:12 2006 From: francis.tang at chukhang.com (Francis Tang) Date: Fri, 28 Apr 2006 23:39:12 +0100 Subject: [EMBOSS] how to using Einverted to process a file contain multiple sequences In-Reply-To: References: Message-ID: <44529990.90600@chukhang.com> Hi Zhendong, I've had to run einverted on a file with many sequences before. If I remember correctly, I used seqret to create a new file for each sequence, and then used bash's for+glob expansion to run einverted many times. Sorry this mail is so vague - it's been a long while since I've used emboss. If you haven't solved the problem already and the clues above don't make it obvious, write back and I'll work it out again. Cheers. Francis. zhendong shaw wrote: > Since the Einverted program is designed to process only one sequences a > time. Are there any ways to handle a file in fasta format containing > multiple sequences? > The input file just like follow: >> seq1 > ATTTTTTTTTTTTTTTTTTTT >> seq2 > TTTAAAAAAAAAAAAAAA > ....... > > sth like that.... -- www.chukhang.com/francis From pmr at ebi.ac.uk Sat Apr 29 06:23:42 2006 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Sat, 29 Apr 2006 11:23:42 +0100 (BST) Subject: [EMBOSS] backtranambig missing? In-Reply-To: References: <442BFD56.9010908@pcm.uam.es> <443A2160.8090102@ebi.ac.uk> <69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk> Message-ID: <2033.86.137.135.19.1146306222.squirrel@webmail.ebi.ac.uk> Richard Sucgang writes: > I am using EMBOSS on OSX (installed using fink). Is it my > imagination, or is the application backtranambig missing? The > documentation on sf.net points to this application existing, yet, I > cannot find the binary in the install. Any ideas? backtranambig will be in EMBOSS 4.0.0 The emboss.sf.net documentation is for the current developers code, and includes new programs and changes to the documentation for some of the current programs. 
EMBOSS 3.0.0 documentation is included in the distribution and installed when EMBOSS is installed. This often causes confusion - we are working on adding the 3.0.0 documentation to the website but we have not yet had time to finish that work. (We did move the current documentation to make it clearer that it was for the CVS code - but that caused more confusion). More news on 4.0.0 soon - we are busy now planning what will be in the release. Hope that helps, Peter From dksamuel at gmail.com Sat Apr 1 04:12:14 2006 From: dksamuel at gmail.com (Duleep Samuel) Date: Sat, 1 Apr 2006 09:42:14 +0530 Subject: [EMBOSS] Fwd: EMBOSS for Windows without Cygwin In-Reply-To: <442CCD71.60202@gmail.com> References: <442CCD71.60202@gmail.com> Message-ID: Is the latest EMBOSS version 3.0.0.0 available anywhere as a precompiled binary for Windows XP, I have tried compiling using cygwin and it crashed, I loaded EMBOSS for windows which is a port of version 2.10.0, loaded Staden Package and made Spin aware of EMBOSS and am working, but feel bad that I am _One_ whole release behind, If anyone has a complied binary I can download for testing and report back on useability, regards, Samuel, Virologist, India From kvddrift at earthlink.net Sun Apr 2 22:51:23 2006 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 2 Apr 2006 18:51:23 -0400 Subject: [EMBOSS] crash on intel-Mac In-Reply-To: <51078.81.98.244.247.1143807173.squirrel@webmail.ebi.ac.uk> References: <51078.81.98.244.247.1143807173.squirrel@webmail.ebi.ac.uk> Message-ID: On Mar 31, 2006, at 7:12 AM, ajb at ebi.ac.uk wrote: > This should now be fixed as long as you apply all the fixes to > EMBOSS-3.0.0 > from the directory: Thanks. Another fink user suggested to even extend the testing for ppc and intel in new config file, so it looks like: if test "`uname -a | grep Darwin`"; then if test "`uname -a | grep i386`"; then CFLAGS="$CFLAGS -O1" else # is this the correct setting on darwin-powerpc? 
CFLAGS="$CLFAGS -O2" fi else CFLAGS="$CFLAGS -O2" fi fi Would that cause any problems with emboss? thanks, - Koen. From h-weber at users.sourceforge.net Mon Apr 3 17:49:06 2006 From: h-weber at users.sourceforge.net (harald weber) Date: Mon, 03 Apr 2006 10:49:06 -0700 Subject: [EMBOSS] SeqFreed - a new interface to EMBOSS Message-ID: Dear friends, herewith I'd like to inform you about SeqFreed, a bioinformatics desktop. Amongst others, SeqFreed can also serve as a GUI-interface to EMBOSS applications. Please download it via 'seqfreed.sourceforge.net', run it and let me know, what you think about it. Besides that many details have to be improved, I'd like to know if this kind of app could be useful for you at all. All the best, Harald From dwaner at scitegic.com Tue Apr 4 16:57:45 2006 From: dwaner at scitegic.com (David Waner) Date: Tue, 04 Apr 2006 09:57:45 -0700 Subject: [EMBOSS] Digest and Pepstats crash using cygwin Message-ID: <4432A589.6050809@scitegic.com> I have compiled the 3.0.0 release of Emboss (including all current fixes from the ftp site) for Windows XP using Cygwin version 1.88. Most of the Emboss programs that I have tested work, but both Digest and Pepstats fail every time with a "Bad float conversion" error. The problem does not seem to depend on the sequence data, and occurs on every file I've tried. Has anyone else experienced this problem? Any solutions or suggestions would be appreciated. Thanks. - David Example: C:> digest -sequence O43291.fa -menu 2 -auto Protein proteolytic enzyme or reagent cleavage digest Output report [spt2_human.digest]: stdout EMBOSS An error in ajarr.c at line 1701: Bad float conversion Test data (O43291.fa): >swall|O43291|SPT2_HUMAN Kunitz-type protease inhibitor 2 precursor (Hepatocyte growth factor activator inhibitor type 2) (HAI-2) (Placental bikunin). 
MAQLCGLRRSRAFLALLGSLLLSGVLAADRERSIHDFCLVSKVVGRCRASMPRWWYNVTD GSCQLFVYGGCDGNSNNYLTKEECLKKCATVTENATGDLATSRNAADSSVPSAPRRQDSE DHSSDMFNYEEYCTANAVTGPCRASFPRWYFDVERNSCNNFIYGGCRGNKNSYRSEEACM LRCFRQQENPPLPLGSKVVVLAGLFVMVLILFLGASMVYLIRVARRNQERALRTVWSSGD DKEQLVKNTYVL From simon.andrews at bbsrc.ac.uk Wed Apr 5 09:04:20 2006 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Wed, 5 Apr 2006 10:04:20 +0100 Subject: [EMBOSS] Download server problems? Message-ID: <324960494b49bb2c0f2679add8452bf9@bbsrc.ac.uk> Does anyone know what's up with the emboss.open-bio.org FTP server? I can connect, but never get as far as a login prompt. Simon. -- Simon Andrews PhD Bioinformatics Dept. The Babraham Institute simon.andrews at bbsrc.ac.uk +44 (0) 1223 496463 From dag at sonsorol.org Thu Apr 6 03:07:33 2006 From: dag at sonsorol.org (Chris Dagdigian) Date: Wed, 5 Apr 2006 23:07:33 -0400 Subject: [EMBOSS] Download server problems? In-Reply-To: <324960494b49bb2c0f2679add8452bf9@bbsrc.ac.uk> References: <324960494b49bb2c0f2679add8452bf9@bbsrc.ac.uk> Message-ID: {forgot to CC the list on this reply ... } Our fault (open-bio.org hosting) -- the server has some sort of running process with a memory leak we thought we had found. Turns out we didn't and the box ground itself slowly to a halt this evening. Thanks to the wonders of remote power control all it takes to reset and power cycle the system is an SSH connection. We've got another 4GB of memory on order for this system. Regards. Chris On Apr 5, 2006, at 5:04 AM, simon andrews (BI) wrote: > Does anyone know what's up with the emboss.open-bio.org FTP server? I > can connect, but never get as far as a login prompt. > > Simon. > -- > Simon Andrews PhD > Bioinformatics Dept. 
> The Babraham Institute > > simon.andrews at bbsrc.ac.uk > +44 (0) 1223 496463 > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss From natalia.jimenez at pcm.uam.es Thu Apr 6 07:56:06 2006 From: natalia.jimenez at pcm.uam.es (Natalia Jimenez Lozano) Date: Thu, 06 Apr 2006 09:56:06 +0200 Subject: [EMBOSS] Problems with GenBank indexing Message-ID: <4434C996.7050606@pcm.uam.es> Hi everybody, I was trying to retrieve fasta protein sequences from GenBank by id using seqret but it was not possible for every id. However, retrieval by GI is allowed. Additionally, during the indexing process (dbifasta) I've obtained some errors like this one: Warning: Duplicate ID skipped: 'AC000348_16' All hits will point to first ID found I was looking for an explanation to this behaviour and I've found that skipped IDs correspond to CDS from genomic sequences and have this format: >gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana] MELPDVPVWRRVIVSAFFEALTFNIDIEEERSEIMMKTGAVVSNPRSRVKWDAFLSFQRDTSHNFTDRLY... >gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana] MSVVLQITKDWVQALLGFLLLSFANISTRTNHKHFPHGSCSSIMAGFWIYMYIYSYLFITLKIIDLTS... In the previous entries, when I try to retrieve one of them by the first identifier (gi), I can get both of them. When I try to do retrievals using the last identifier (AC000348_16), I only get the first one. But it's impossible to do retrievals by second identifier (AAG13419.1 and AAF79863.1). However, sequences with the following format can be well indexed: >gi|64029|emb|CAA23986.1| reading frame [Lophius americanus] MKMVSSSRLRCLLVLLLSLTASISCSFAGQRDSKLRLLLHRYPLQGSKQDMTRSALAELLLSDLLQGENE ... and these sequences can be well retrieved by first and second identifiers (64029 and CAA23986.1). Does anybody know how to solve these problems? 
Thanks in advance, Natalia From jison at ebi.ac.uk Fri Apr 7 12:02:50 2006 From: jison at ebi.ac.uk (Jon Ison) Date: Fri, 7 Apr 2006 13:02:50 +0100 (BST) Subject: [EMBOSS] Problems with GenBank indexing In-Reply-To: <4434C996.7050606@pcm.uam.es> References: <4434C996.7050606@pcm.uam.es> Message-ID: <59463.172.31.100.168.1144411370.squirrel@webmail.ebi.ac.uk> Dear Natalia By default, dbifasta will index the ID name and the accession number (if present). To index the Sequence Version, GI number and words in the description, you must run dbifasta with the '-fields' qualifier, e.g. "-fields acc", "-fields sv acc" etc. If you don't, you will not be able to retrieve by those fields. Please see http://emboss.sourceforge.net/apps/cvs/dbifasta.html. dbifasta only retrieves the first of any duplicate entries. So far as I'm aware dbxfasta can retrieve duplicate entries. Does that help? Feel free to get back in touch. Cheers Jon > Hi everybody, > > I was trying to retrieve fasta protein sequences from GenBank by id > using seqret but it was not possible for every id. However, retrieval by > GI is allowed. > > Additionally, during the indexing process (dbifasta) I've obtained some > errors like this one: > > Warning: Duplicate ID skipped: 'AC000348_16' All hits will point to > first ID found > > I was looking for an explanation to this behaviour and I've found that > skipped IDs correspond to CDS from genomic sequences and have this format: > > >gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana] > MELPDVPVWRRVIVSAFFEALTFNIDIEEERSEIMMKTGAVVSNPRSRVKWDAFLSFQRDTSHNFTDRLY... > >gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana] > MSVVLQITKDWVQALLGFLLLSFANISTRTNHKHFPHGSCSSIMAGFWIYMYIYSYLFITLKIIDLTS... > > In the previous entries, when I try to retrieve one of them by the first > identifier (gi), I can get both of them. When I try to do retrievals > using the last identifier (AC000348_16), I only get the first one. 
But > it's impossible to do retrievals by second identifier (AAG13419.1 and > AAF79863.1). > > However, sequences with the following format can be well indexed: > > >gi|64029|emb|CAA23986.1| reading frame [Lophius americanus] > MKMVSSSRLRCLLVLLLSLTASISCSFAGQRDSKLRLLLHRYPLQGSKQDMTRSALAELLLSDLLQGENE ... > > and these sequences can be well retrieved by first and second > identifiers (64029 and CAA23986.1). > > Does anybody know how to solve these problems? > Thanks in advance, > Natalia > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > From natalia.jimenez at pcm.uam.es Fri Apr 7 12:50:16 2006 From: natalia.jimenez at pcm.uam.es (Natalia Jimenez Lozano) Date: Fri, 07 Apr 2006 14:50:16 +0200 Subject: [EMBOSS] Problems with GenBank indexing In-Reply-To: <59463.172.31.100.168.1144411370.squirrel@webmail.ebi.ac.uk> References: <4434C996.7050606@pcm.uam.es> <59463.172.31.100.168.1144411370.squirrel@webmail.ebi.ac.uk> Message-ID: <44366008.6080106@pcm.uam.es> Dear Jon, > Dear Natalia > > By default, dbifasta will index the ID name and the accession number (if present). > > To index the Sequence Version, GI number and words in the description, you must > run dbifasta with the '-fields' qualifier, e.g. "-fields acc", "-fields sv acc" > etc. If you don't, you will not be able to retrieve by those fields. Please > see http://emboss.sourceforge.net/apps/cvs/dbifasta.html. > Yes indexation was done taking into account the -field parameter :-( > dbifasta only retrieves the first of any duplicate entries. So far as I'm aware > dbxfasta can retrieve duplicate entries. > We'll try with dbxfasta! > Does that help? Feel free to get back in touch. > Yes, a lot. Thank you very much Regards, Natalia > Cheers > > Jon > > > > > >> Hi everybody, >> >> I was trying to retrieve fasta protein sequences from GenBank by id >> using seqret but it was not possible for every id. 
However, retrieval by >> GI is allowed. >> >> Additionally, during the indexing process (dbifasta) I've obtained some >> errors like this one: >> >> Warning: Duplicate ID skipped: 'AC000348_16' All hits will point to >> first ID found >> >> I was looking for an explanation to this behaviour and I've found that >> skipped IDs correspond to CDS from genomic sequences and have this format: >> >> >gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana] >> MELPDVPVWRRVIVSAFFEALTFNIDIEEERSEIMMKTGAVVSNPRSRVKWDAFLSFQRDTSHNFTDRLY... >> >gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana] >> MSVVLQITKDWVQALLGFLLLSFANISTRTNHKHFPHGSCSSIMAGFWIYMYIYSYLFITLKIIDLTS... >> >> In the previous entries, when I try to retrieve one of them by the first >> identifier (gi), I can get both of them. When I try to do retrievals >> using the last identifier (AC000348_16), I only get the first one. But >> it's impossible to do retrievals by second identifier (AAG13419.1 and >> AAF79863.1). >> >> However, sequences with the following format can be well indexed: >> >> >gi|64029|emb|CAA23986.1| reading frame [Lophius americanus] >> MKMVSSSRLRCLLVLLLSLTASISCSFAGQRDSKLRLLLHRYPLQGSKQDMTRSALAELLLSDLLQGENE ... >> >> and these sequences can be well retrieved by first and second >> identifiers (64029 and CAA23986.1). >> >> Does anybody know how to solve these problems? >> Thanks in advance, >> Natalia >> _______________________________________________ >> EMBOSS mailing list >> EMBOSS at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/emboss >> >> > > > > > From jison at ebi.ac.uk Fri Apr 7 15:34:24 2006 From: jison at ebi.ac.uk (Jon Ison) Date: Fri, 7 Apr 2006 16:34:24 +0100 (BST) Subject: [EMBOSS] Problem indexing PDB fasta file In-Reply-To: <442BFD56.9010908@pcm.uam.es> References: <442BFD56.9010908@pcm.uam.es> Message-ID: <34100.172.31.100.168.1144424064.squirrel@webmail.ebi.ac.uk> Hi Enrique dbifasta will return just the first entry with a duplicated id. 
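
To make the difference described above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: plain dictionaries stand in for the real index files, the function names are hypothetical, and nothing here reflects the actual EMBLCD or dbxfasta on-disk formats. It shows a case-insensitive index that keeps one entry per ID (the behaviour described for dbifasta) next to one that keeps every entry under the same folded name (the behaviour described for dbxfasta).

```python
def build_single_index(ids):
    """One entry per ID, compared case-insensitively.

    Hypothetical stand-in for a one-entry-per-ID index: the first
    occurrence wins and later duplicates are reported as skipped.
    """
    index = {}
    skipped = []
    for position, seq_id in enumerate(ids):
        key = seq_id.upper()  # fold case, so 1FNT_a == 1FNT_A
        if key in index:
            skipped.append(seq_id)
        else:
            index[key] = position
    return index, skipped


def build_multi_index(ids):
    """Every entry is kept; IDs that differ only in case share a key."""
    index = {}
    for position, seq_id in enumerate(ids):
        index.setdefault(seq_id.upper(), []).append(position)
    return index


if __name__ == "__main__":
    entry_ids = ["1FNT_A", "1FNT_a", "1FNT_B"]
    single, skipped = build_single_index(entry_ids)
    print("single-entry index:", single, "skipped:", skipped)
    print("multi-entry index:", build_multi_index(entry_ids))
```

In the multi-entry case a lookup for either spelling returns both positions, which matches the statement that the names are still treated as the same.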
The new dbxfasta will return all entries with the duplicated id. dbifasta is indeed case-insensitive. To make it case-sensitive, you could change the 3 instances of "ajStrMatchCaseC" in dbifasta.c to "ajStrMatchC", recompile and try again. I don't think we'd want to make that change in the distribution though. Hope that helps. Cheers Jon > Hello, > > I'm trying to index the fasta file of the PDB database with dbifasta > command and I get a lot of warnings as: > > Warning: Duplicate ID skipped: '1FNT_A' All hits will point to first ID > found > > I have been looking the PDB fasta file and I see that, for the previous > warning, there are an entry whoose id is '1FNT_A' and another one whoose > id is '1FNT_a'. Then, this make me think that EMBOSS is > case-insensitive. Is this true? Are there any way to distinguish between > the two id's? > > Thanks in advance, > > Enrique. > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > From pmr at ebi.ac.uk Mon Apr 10 09:12:00 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 10 Apr 2006 10:12:00 +0100 Subject: [EMBOSS] Problem indexing PDB fasta file In-Reply-To: <442BFD56.9010908@pcm.uam.es> References: <442BFD56.9010908@pcm.uam.es> Message-ID: <443A2160.8090102@ebi.ac.uk> Enrique de Andres Saiz wrote: > I have been looking the PDB fasta file and I see that, for the previous > warning, there are an entry whoose id is '1FNT_A' and another one whoose > id is '1FNT_a'. Then, this make me think that EMBOSS is > case-insensitive. Is this true? Are there any way to distinguish between > the two id's? Yes, EMBOSS is case-insensitive. So is the Staden/EMBLCD indexing standard that dbifasta uses. The standard also only allows one entry with each ID. dbxfasta uses a new indexing format and can index both entries, but will still assume the names are the same (a search for 1FNT_A or 1FNT_a wil return both entries). 
Allowing indexing to be case-sensitive is possible in future, but can slow down searches. We will investigate. Hope that helps, Peter From pmr at ebi.ac.uk Mon Apr 10 09:05:36 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 10 Apr 2006 10:05:36 +0100 Subject: [EMBOSS] dbifasta index file format In-Reply-To: <20060330083142.4237.qmail@web26207.mail.ukl.yahoo.com> References: <20060330083142.4237.qmail@web26207.mail.ukl.yahoo.com> Message-ID: <443A1FE0.1060707@ebi.ac.uk> Graziano P. wrote: > hello EMBOSS users, > I have some databases in fasta format (ncbi | format) > and I want to index them using dbifasta, then I want > to access the index files using a program that will be > developed by a computer scientist of my group. > I need to index the databases by accession number, > ginumber and description. I have read in the dbifasta > help info about the structure of the index files when > the databases were indexed by accession number, but I > have not found info about the structure of the index > files when the databases are indexed by description. > Anyone knows where I can find detailed information > about the structure of the index files? Ciao Graziano, The dbifasta index files use the same format as the Staden package, the old EMBL CD-ROM distribution, and Erik Sonnhammer's "efetch" utility. They were documented in some old Staden documentation and papers. They are also documented in the EMBOSS distribution under doc/manuals/ in file internals-indexing.txt (see attached). I see that this document was written before we indexed the descriptions!!! The description (title) indexing is the same as the accession number indexing. The files are called des.hit and des.trg. dbifasta has a -maxindex option to limit the size of the longest words indexed (the index files have a value for the maximum record length). 
We also have a script in the distribution scripts/dbilist.pl which can list the contents of the description index (in the database index directory, run it as dbilist.pl des) The new dbxfasta index files are very different. For very large databases we recommend dbxfasta. For smaller databases dbifasta is fine and we will continue to support it. Hope that helps. If you need more details, just ask. regards, Peter -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: internals-indexing.txt URL: From simon.andrews at bbsrc.ac.uk Mon Apr 10 09:40:30 2006 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Mon, 10 Apr 2006 10:40:30 +0100 Subject: [EMBOSS] Problem indexing PDB fasta file In-Reply-To: <443A2160.8090102@ebi.ac.uk> References: <442BFD56.9010908@pcm.uam.es> <443A2160.8090102@ebi.ac.uk> Message-ID: <69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk> On 10 Apr 2006, at 10:12, Peter Rice wrote: > Enrique de Andres Saiz wrote: >> I have been looking the PDB fasta file and I see that, for the >> previous >> warning, there are an entry whoose id is '1FNT_A' and another one >> whoose >> id is '1FNT_a'. Then, this make me think that EMBOSS is >> case-insensitive. Is this true? Are there any way to distinguish >> between >> the two id's? > > Yes, EMBOSS is case-insensitive. So is the Staden/EMBLCD indexing > standard > that dbifasta uses. > > The standard also only allows one entry with each ID. If anyone's interested I've got a small perl script which reformats the PDB database into a more sensible format and sorts out the problems with case sensitive ids and a number of other odd conventions used in PDB. I'm happy to supply a copy to anyone who wants it. TTFN Simon. -- Simon Andrews PhD Bioinformatics Dept. 
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463

From pmr at ebi.ac.uk Mon Apr 10 10:44:47 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 11:44:47 +0100
Subject: [EMBOSS] Problems with GenBank indexing
In-Reply-To: <4434C996.7050606@pcm.uam.es>
References: <4434C996.7050606@pcm.uam.es>
Message-ID: <443A371F.1010100@ebi.ac.uk>

Natalia Jimenez Lozano wrote:
> I was looking for an explanation to this behaviour and I've found that
> skipped IDs correspond to CDS from genomic sequences and have this format:
>
> >gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]
> MELPDVPVWRRVIVSAFFEALTFNIDIEEERSEIMMKTGAVVSNPRSRVKWDAFLSFQRDTSHNFTDRLY...
> >gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana]
> MSVVLQITKDWVQALLGFLLLSFANISTRTNHKHFPHGSCSSIMAGFWIYMYIYSYLFITLKIIDLTS...

As Jon says, dbxfasta is a solution. However, that is only a partial solution.

The real problem is that these FASTA format sequences do indeed have duplicate IDs: both entries above end in AC000348_16.

This is protein sequence data, so it is not GenBank - was this GenPept or some other database? GenPept and other databases have been known to report "gb" or "emb" as the database for protein sequences!!!

A possible solution is to add a new ID format to dbifasta and dbxfasta that uses AAG13419 and AAF79863 as the ID and ignores the AC000348_16 part.
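To make the proposed ID format concrete, here is a minimal Python sketch of the parsing it implies. It is only an illustration under stated assumptions, not dbifasta or dbxfasta code: the function names are hypothetical, and it assumes headers of the form gi|number|db|accession.version|locus as in the two entries quoted above.

```python
def ncbi_fields(header):
    """Split the first token of an NCBI-style FASTA header on '|'."""
    # Drop the leading '>' and any description after the first space.
    token = header.lstrip(">").split()[0]
    fields = token.split("|")
    # Assumed layout: gi | number | db | accession.version | locus
    if len(fields) < 5 or fields[0] != "gi":
        raise ValueError("unexpected header layout: %r" % header)
    return fields


def locus_id(header):
    """The trailing locus field, which is duplicated in the quoted data."""
    return ncbi_fields(header)[4]


def accession_id(header):
    """Hypothetical new ID: the accession with its version suffix removed."""
    return ncbi_fields(header)[3].split(".")[0]


headers = [
    ">gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]",
    ">gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana]",
]

for h in headers:
    print(locus_id(h), accession_id(h))
```

For the two quoted headers the locus field is identical while the accessions differ, which is why keying on the accession would avoid the duplicate-ID problem.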
Hope this helps,

Peter

From pmr at ebi.ac.uk Mon Apr 10 11:04:49 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 12:04:49 +0100
Subject: [EMBOSS] Fwd: EMBOSS for Windows without Cygwin
In-Reply-To:
References: <442CCD71.60202@gmail.com>
Message-ID: <443A3BD1.2040709@ebi.ac.uk>

Duleep Samuel wrote:
> Is the latest EMBOSS version 3.0.0.0 available anywhere as a precompiled
> binary for Windows XP, I have tried compiling using cygwin and it
> crashed, I loaded EMBOSS for windows which is a port of version 2.10.0,
> loaded Staden Package and made Spin aware of EMBOSS and am working, but
> feel bad that I am _One_ whole release behind, If anyone has a compiled
> binary I can download for testing and report back on usability,
> regards, Samuel, Virologist, India

Staden has support for older versions of EMBOSS. We are trying to update Staden to work with EMBOSS 3.0.0 and future releases.

If anyone is using EMBOSS and Staden (especially EMBOSS under the Staden SPIN interface) please contact the EMBOSS developers (emboss-bug at emboss.open-bio.org) so we know how many EMBOSS SPIN users there are. It helps to set priorities for the work.

regards,

Peter

From janenerz at web.de Wed Apr 12 09:09:58 2006
From: janenerz at web.de (Christiane Nerz)
Date: Wed, 12 Apr 2006 11:09:58 +0200
Subject: [EMBOSS] nt-multi-fastA-file
Message-ID: <443CC3E6.4040108@web.de>

Hi all,

I put the gb-file of a whole genome in Artemis.
Is there a possibility to export a multi-FastA-file with the bases of all ORFs? Example:

>ORF_1
ATGTGTTCGTT....
>ORF_2
ATGTTCCCGACCA...
>ORF_3
ATGCCGCAT...

I know how to get all bases, but only as one complete sequence.
(That genome is not published yet, so there is no multi-Fasta-file at ncbi or EMBL available)

Thanks for help!
Jane Nerz From simon.andrews at bbsrc.ac.uk Wed Apr 12 10:05:49 2006 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Wed, 12 Apr 2006 11:05:49 +0100 Subject: [EMBOSS] nt-multi-fastA-file In-Reply-To: <443CC3E6.4040108@web.de> References: <443CC3E6.4040108@web.de> Message-ID: <902608901e58c68600b4dc52c7e8a966@bbsrc.ac.uk> On 12 Apr 2006, at 10:09, Christiane Nerz wrote: > Hi all, > > I put the gb-file of an whole genome in Artemis. > Is there a possibility to export a multi-FastA-file with the bases of > all ORFs? If you can save the file out of Artemis with the ORFs shown in the feature table then you can use coderet in EMBOSS to extract out all of the subsequences covering those features, either as protein or DNA. Hope this helps Simon. -- Simon Andrews PhD Bioinformatics Dept. The Babraham Institute simon.andrews at bbsrc.ac.uk +44 (0) 1223 496463 From pmr at ebi.ac.uk Wed Apr 12 10:20:46 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 12 Apr 2006 11:20:46 +0100 Subject: [EMBOSS] nt-multi-fastA-file In-Reply-To: <443CC3E6.4040108@web.de> References: <443CC3E6.4040108@web.de> Message-ID: <443CD47E.6060607@ebi.ac.uk> Christiane Nerz wrote: > Hi all, > > I put the gb-file of an whole genome in Artemis. > Is there a possibility to export a multi-FastA-file with the bases of > all ORFs? Example: > > >ORF_1 > ATGTGTTCGTT.... > >ORF_2 > ATGTTCCCGACCA... > >ORF_3 > ATGCCGCAT... > > I know how to get all bases, but only as one complete sequence. > (That genome is not published yet, so there is no multi-Fasta-file at > ncbi or EMBL available) Yes, the coderet program will do this. Unfortunately coderet tries to return CDS, mRNA and translations all in one file (to be fixed for the next release). You can ask just for the CDS with a couple of extra command line options: coderet -nomrna -notranslation Give it the filename as input. The output will be the coding sequences. With -nocds instead of -notranslation you will get the protein sequences. 
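The two switch combinations above can be spelled out as a small Python helper. This is only a sketch: it builds the command line and does not run coderet, the file name genome.gb is a placeholder, and passing the input file as the first positional argument is an assumption rather than something confirmed here.

```python
def coderet_args(input_file, want="cds"):
    """Build a coderet argument list that asks for one kind of sequence."""
    if want == "cds":
        # Coding sequences only: switch off mRNA and translations.
        switches = ["-nomrna", "-notranslation"]
    elif want == "protein":
        # Protein translations only: switch off mRNA and CDS.
        switches = ["-nomrna", "-nocds"]
    else:
        raise ValueError("want must be 'cds' or 'protein'")
    # Assumption: the input file is given as the first positional argument.
    return ["coderet", input_file] + switches


# genome.gb is a hypothetical file name used only for illustration.
print(" ".join(coderet_args("genome.gb")))
print(" ".join(coderet_args("genome.gb", want="protein")))
```

The resulting list could be handed to subprocess.run on a machine where EMBOSS is installed; coderet would then prompt for any output names that were not given.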
If you have any problems parsing the GenBank file let me know.

regards,

Peter Rice

From Marc.Logghe at DEVGEN.com Wed Apr 12 12:39:00 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 12 Apr 2006 14:39:00 +0200
Subject: [EMBOSS] Embossdata -reject option
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>

Hi,

I am intrigued by the -reject option of embossdata.
According to the doc:
"This specifies the names of the sub-directories of the EMBOSS data directory that should be ignored when displaying data directories. Choose from selection list of values 3, 5, 6".
I was not able to find out what this list of values corresponds to. I hoped to get a list to select from when embossdata was run with the -options parameter, but this did not happen.
Any clues ?
Actually I was trying to find a way to obtain more or less the opposite of '-reject', e.g. what if you only want the content of the CODONS directory ?

Regards,
Marc

From gbottu at ben.vub.ac.be Wed Apr 12 13:30:00 2006
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 12 Apr 2006 15:30:00 +0200
Subject: [EMBOSS] Embossdata -reject option - Checked by AntiVir DEMO versio
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
Message-ID: <20060412133000.GD15725@bigben.ulb.ac.be>

On Wed, Apr 12, 2006 at 02:39:00PM +0200, Marc Logghe wrote:
> I am intrigued by the -reject option of embossdata.
> According to the doc:
> "This specifies the names of the sub-directories of the EMBOSS data
> directory that should be ignored when displaying data directories.
> Choose from selection list of values 3, 5, 6".
> I was not able to find out what this list of values corresponds to.
Indeed tricky to find out what this means :-;
You can look in the file .../share/EMBOSS/acd/embossdata.acd :

selection: reject [
  default: "3, 5, 6"
  minimum: "1"
  maximum: "6"
  values: "None, AAINDEX, CVS, CODONS, PRINTS, PROSITE, REBASE"
  delimiter: ","
  header: "Directories to ignore"
  information: "Select directories"
  help: "This specifies the names of the sub-directories of the EMBOSS data directory that should be ignored when displaying data directories."
  button: "Y"
]

So, by default CVS, PRINTS and PROSITE are rejected.

> I hoped to get a list to select from when embossdata was run with the
> -options parameter, but this did not happen.

That is because -reject is an "advanced", not an "optional"/"additional" parameter. It is indeed impossible to get a selection list displayed at the command line, although many GUIs like wEMBOSS will show it.

> Actually I was trying to find a way to obtain more or less the opposite
> of '-reject', e.g. what if you only want the content of the CODONS
> directory ?

This does not work, there is no way to reject the files in the base data directory. The best you can do is to add on the command line
-reject=2,3,5,6,7 or -reject=AAINDEX,CVS,PRINTS,PROSITE,REBASE
What you can do however is :
ls $EMBOSS_DATA/CODONS

Hope this helps,
Guy Bottu, Belgian EMBnet Node

From Marc.Logghe at DEVGEN.com Wed Apr 12 14:02:09 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 12 Apr 2006 16:02:09 +0200
Subject: [EMBOSS] Embossdata -reject option - Checked by AntiVir DEMO versio
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CD9@ANTARESIA.be.devgen.com>

Hi Guy !
> You can look in the file .../share/EMBOSS/acd/embossdata.acd : > > selection: reject [ > default: "3, 5, 6" > minimum: "1" > maximum: "6" > values: "None, AAINDEX, CVS, CODONS, PRINTS, PROSITE, REBASE" > delimiter: "," > header: "Directories to ignore" > information: "Select directories" > help: "This specifies the names of the sub-directories of the > EMBOSS data directory that should be ignored when > displaying data > directories." > button: "Y" > ] > > So, by default CVS, PRINTS and PROSITE are rejected. Yes, that makes sense now ! > This does not work, there is no way to reject the files in > the base data directory. The best you can do is to add on the > command line > -reject=2,3,5,6,7 or -reject= > AAINDEX,CVS,PRINTS,PROSITE,REBASE What you can do however is : > ls $EMBOSS_DATA/CODONS Yeah, that is of course the most obvious ;-) Thing is that I wanted to do it in an emboss-only way so that it would be possible to run the emboss command via a soaplab service. The latter should provide a means to dynamically fetch a list of codon usage tables. More or less like showdb is doing. > > Hope this helps, Yes it did. Thanks ! Regards, Marc From pmr at ebi.ac.uk Wed Apr 12 16:04:21 2006 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Wed, 12 Apr 2006 17:04:21 +0100 (BST) Subject: [EMBOSS] Embossdata -reject option In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com> References: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com> Message-ID: <3057.86.137.128.238.1144857861.squirrel@webmail.ebi.ac.uk> Mark Logghe wrote: > I am intrigued by the -reject option of embossdata. > > I was not able to find out what this list of values corresponds to. I > hoped to get a list to select from when embossdata was run with the > -options parameter, but this did not happen. > Any clues ? Hmmmm .... 
yes, -help and the acdtable output (the table in the webpage application documentation) really need to report the list of menu items for values that are not prompted (list and selection datatypes). We will do that for the next release! Otherwise, you do need to look in the ACD file. I propose: -help to report documentation on the options -help -verbose to report the list of options acdtable to report the full menu formatted in the "Allowed values" box. When this is implemented, it will appear in the apps/cvs/embossdata.html documentation at emboss.sf.net :-) >Yeah, that is of course the most obvious ;-) Thing is that I wanted to >do it in an emboss-only way so that it would be possible to run the >emboss command via a soaplab service. The latter should provide a means >to dynamically fetch a list of codon usage tables. More or less like >showdb is doing. We are looking at ways to do that ... can be tricky if cutgextract has been run. Any suggestions? A showdata application perhaps? Hope that helps, Peter From Marc.Logghe at DEVGEN.com Wed Apr 12 16:21:17 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Wed, 12 Apr 2006 18:21:17 +0200 Subject: [EMBOSS] Embossdata -reject option Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CDB@ANTARESIA.be.devgen.com> Hi Peter, > Hmmmm .... yes, -help and the acdtable output (the table in > the webpage application documentation) really need to report > the list of menu items for values that are not prompted (list > and selection datatypes). > > We will do that for the next release! > > Otherwise, you do need to look in the ACD file. > > I propose: > > -help to report documentation on the options -help -verbose > to report the list of options > > acdtable to report the full menu formatted in the "Allowed > values" box. OK, great ! > We are looking at ways to do that ... can be tricky if > cutgextract has been run. Any suggestions? A showdata > application perhaps? Yes that could be a start. 
You could give the directory name as a parameter, the opposite of the -reject parameter (-include ?). In its basic form it can just list the file content like embossdata -showall is doing.
An example command that lists all the codon tables could be: 'showdata -include CODONS'.

Something else. In order not to contaminate the CODONS folder I created a CUTG folder in the directory containing the codon tables extracted from the most recent CUTG. Problem now is a user has to add the relative filename as a cfile option (backtranseq) in order for EMBOSS to find the new codon tables.
Would it be an idea that you can set $EMBOSS_DATA to a list of values instead of only 1 directory name ? In that way, EMBOSS can access custom data directories. Suppose the following:
EMBOSS_DATA=/usr/local/share/EMBOSS/data:/my/other/emboss_data_dir/CUTG

If a codon table is not found in the usual place (/usr/local/share/EMBOSS/data/CODONS) EMBOSS will look for it in the other places defined in EMBOSS_DATA (/my/other/emboss_data_dir/CUTG). Or something like that.
Does that make sense ?

Cheers,
Marc

From simon.andrews at bbsrc.ac.uk Thu Apr 13 08:43:53 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 13 Apr 2006 09:43:53 +0100
Subject: [EMBOSS] Problems indexing with dbxfasta
Message-ID:

I'm trying to use dbxfasta to index one of the WGS trace databases. Unfortunately dbxfasta is falling over on me.
The session looks like this:

$ dbxfasta
Database b+tree indexing for fasta file databases
Basename for index files: traces_oanatinus
Resource name: all
simple : >ID
idacc : >ID ACC
gcgid : >db:ID
gcgidacc : >db:ID ACC
dbid : >db ID
ncbi : | formats
ID line format [idacc]: simple
Database directory [.]:
Wildcard database filename [*.dat]: *.fasta
Release number [0.0]:
Index date [00/00/00]:
Processing file ./nisc-platypus-shotgun-1048960391.fasta
Processing file ./nisc-platypus-shotgun-1071756042.fasta
Processing file ./nisc-platypus-shotgun-1080815515.fasta
Processing file ./nisc-platypus-shotgun-1102160893.fasta
Processing file ./nisc-platypus-shotgun-1104879084.fasta
Processing file ./nisc-platypus-shotgun-1109000445.fasta
Processing file ./nisc-platypus-shotgun-1110804272.fasta
Processing file ./nisc-platypus-shotgun-1116844699.fasta
Processing file ./nisc-platypus-shotgun-1142973027.fasta
Processing file ./wugsc-ornithorhynchus_anatinus-cloneEnd-1113828608.fasta
Processing file ./wugsc-ornithorhynchus_anatinus-cloneEnd-1115655383.fasta
Processing file ./wugsc-ornithorhynchus_anatinus-cloneEnd-1119433885.fasta

EMBOSS An error in ajindex.c at line 615:
Maximum retries (100) reached in btreeCacheFetch for page 14240710656

The same files have indexed OK with formatdb. I haven't tried with dbifasta as I'm trying to move everything over to the new dbx system (and the rest of our databases have processed OK with dbx(fasta|flat)).

Anyone have any ideas about how to debug this?

Cheers

Simon.

-- 
Simon Andrews PhD
Bioinformatics Group
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463

From Marc.Logghe at DEVGEN.com Thu Apr 13 09:00:56 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Thu, 13 Apr 2006 11:00:56 +0200
Subject: [EMBOSS] Problems indexing with dbxfasta
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CE0@ANTARESIA.be.devgen.com>

Hi Simon,

> The same files have indexed OK with formatdb.
I havent' > tried with dbifasta as I'm trying to move everything over to > the new dbx system (and the rest of our databases have > processed OK with dbx(fasta|flat)). > > Anyone have any ideas about how to debug this? You can run the command with the -debug option (any EMBOSS application accepts this option). In that case a dbxfasta.dbg file will be created. Hope this file will give you the clues. Cheers, Marc From ajb at ebi.ac.uk Thu Apr 13 09:19:44 2006 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Thu, 13 Apr 2006 10:19:44 +0100 (BST) Subject: [EMBOSS] Problems indexing with dbxfasta In-Reply-To: References: Message-ID: <52959.81.98.244.247.1144919984.squirrel@webmail.ebi.ac.uk> Hello Simon, Did you pick up the latest set of patches from: ftp://emboss.open-bio.org/pub/EMBOSS/fixes/ ? The indexing system was rewritten a few months ago to fix this. See the README in that directory. If you are using the latest fixes (check file sizes) and it is still failing then let me know. HTH Alan > I'm trying to use dbxfasta to index one of the WGS trace databases. > Unfortunately dbxfasta is falling over on me. 
The session looks like > this: > > $ dbxfasta > Database b+tree indexing for fasta file databases > Basename for index files: traces_oanatinus > Resource name: all > simple : >ID > idacc : >ID ACC > gcgid : >db:ID > gcgidacc : >db:ID ACC > dbid : >db ID > ncbi : | formats > ID line format [idacc]: simple > Database directory [.]: > Wildcard database filename [*.dat]: *.fasta > Release number [0.0]: > Index date [00/00/00]: > Processing file ./nisc-platypus-shotgun-1048960391.fasta > Processing file ./nisc-platypus-shotgun-1071756042.fasta > Processing file ./nisc-platypus-shotgun-1080815515.fasta > Processing file ./nisc-platypus-shotgun-1102160893.fasta > Processing file ./nisc-platypus-shotgun-1104879084.fasta > Processing file ./nisc-platypus-shotgun-1109000445.fasta > Processing file ./nisc-platypus-shotgun-1110804272.fasta > Processing file ./nisc-platypus-shotgun-1116844699.fasta > Processing file ./nisc-platypus-shotgun-1142973027.fasta > Processing file > ./wugsc-ornithorhynchus_anatinus-cloneEnd-1113828608.fasta > Processing file > ./wugsc-ornithorhynchus_anatinus-cloneEnd-1115655383.fasta > Processing file > ./wugsc-ornithorhynchus_anatinus-cloneEnd-1119433885.fasta > > EMBOSS An error in ajindex.c at line 615: > Maximum retries (100) reached in btreeCacheFetch for page 14240710656 > > The same files have indexed OK with formatdb. I havent' tried with > dbifasta as I'm trying to move everything over to the new dbx system > (and the rest of our databases have processed OK with dbx(fasta|flat)). > > Anyone have any ideas about how to debug this? > > Cheers > > Simon. 
> > -- > Simon Andrews PhD > Bioinformatics Group > The Babraham Institute > > simon.andrews at bbsrc.ac.uk > +44 (0) 1223 496463 > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > From simon.andrews at bbsrc.ac.uk Thu Apr 13 09:30:41 2006 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Thu, 13 Apr 2006 10:30:41 +0100 Subject: [EMBOSS] Problems indexing with dbxfasta In-Reply-To: <52959.81.98.244.247.1144919984.squirrel@webmail.ebi.ac.uk> References: <52959.81.98.244.247.1144919984.squirrel@webmail.ebi.ac.uk> Message-ID: <25bf10458f1cd7e0cc1c64de70f6bdef@bbsrc.ac.uk> On 13 Apr 2006, at 10:19, ajb at ebi.ac.uk wrote: > Hello Simon, > > Did you pick up the latest set of patches from: > ftp://emboss.open-bio.org/pub/EMBOSS/fixes/ Yes. All patched with the latest fixes as of last week. > If you are using the latest fixes (check file sizes) and it is still > failing then let me know. It is still failing. I'll have a go at generating a .dbg file if you think it'll help, but given how verbose those tend to be, and how long it takes to fail I was a bit concerned at the size of file it was likely to generate. Simon. > > > HTH > > Alan > >> I'm trying to use dbxfasta to index one of the WGS trace databases. >> Unfortunately dbxfasta is falling over on me. 
The session looks like >> this: >> >> $ dbxfasta >> Database b+tree indexing for fasta file databases >> Basename for index files: traces_oanatinus >> Resource name: all >> simple : >ID >> idacc : >ID ACC >> gcgid : >db:ID >> gcgidacc : >db:ID ACC >> dbid : >db ID >> ncbi : | formats >> ID line format [idacc]: simple >> Database directory [.]: >> Wildcard database filename [*.dat]: *.fasta >> Release number [0.0]: >> Index date [00/00/00]: >> Processing file ./nisc-platypus-shotgun-1048960391.fasta >> Processing file ./nisc-platypus-shotgun-1071756042.fasta >> Processing file ./nisc-platypus-shotgun-1080815515.fasta >> Processing file ./nisc-platypus-shotgun-1102160893.fasta >> Processing file ./nisc-platypus-shotgun-1104879084.fasta >> Processing file ./nisc-platypus-shotgun-1109000445.fasta >> Processing file ./nisc-platypus-shotgun-1110804272.fasta >> Processing file ./nisc-platypus-shotgun-1116844699.fasta >> Processing file ./nisc-platypus-shotgun-1142973027.fasta >> Processing file >> ./wugsc-ornithorhynchus_anatinus-cloneEnd-1113828608.fasta >> Processing file >> ./wugsc-ornithorhynchus_anatinus-cloneEnd-1115655383.fasta >> Processing file >> ./wugsc-ornithorhynchus_anatinus-cloneEnd-1119433885.fasta >> >> EMBOSS An error in ajindex.c at line 615: >> Maximum retries (100) reached in btreeCacheFetch for page 14240710656 >> >> The same files have indexed OK with formatdb. I havent' tried with >> dbifasta as I'm trying to move everything over to the new dbx system >> (and the rest of our databases have processed OK with >> dbx(fasta|flat)). >> >> Anyone have any ideas about how to debug this? >> >> Cheers >> >> Simon. 
>> >> -- >> Simon Andrews PhD >> Bioinformatics Group >> The Babraham Institute >> >> simon.andrews at bbsrc.ac.uk >> +44 (0) 1223 496463 >> >> _______________________________________________ >> EMBOSS mailing list >> EMBOSS at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/emboss >> > > > -- Simon Andrews PhD Bioinformatics Dept. The Babraham Institute simon.andrews at bbsrc.ac.uk +44 (0) 1223 496463 From simon.andrews at bbsrc.ac.uk Thu Apr 13 09:41:13 2006 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Thu, 13 Apr 2006 10:41:13 +0100 Subject: [EMBOSS] Problems indexing with dbxfasta Message-ID: I managed to get hold of a debug file from the failing dbxfasta. The edited highlights are: Debug file dbxfasta.dbg buffered:No ajFileNewIn '/usr/local/share/EMBOSS/acd/dbxfasta.acd' EOF ajFileGetsL file /usr/local/share/EMBOSS/acd/dbxfasta.acd closing file '/usr/local/share/EMBOSS/acd/dbxfasta.acd' ajUserGet buffer len: 0 res: 2048 ptr: 8d8eb18 ajUserGet buffer len: 0 res: 2048 ptr: 8d8eb18 ajUserGet buffer len: 5 res: 2048 ptr: 8d8f320 ajUserGet buffer len: 1 res: 2048 ptr: 8d8fb28 ajUserGet buffer len: 5 res: 2048 ptr: 8d8eb18 ajUserGet buffer len: 3 res: 2048 ptr: 8d8eb18 ajUserGet buffer len: 8 res: 2048 ptr: 8d8eb18 ajFileScan directory: './' nisc-platypus-shotgun-1071756042.fasta nisc-platypus-shotgun-1080815515.fasta nisc-platypus-shotgun-1102160893.fasta [snip big list of files] closing file './/traces_oanatinus.ent' ajFileNewIn './nisc-platypus-shotgun-1048960391.fasta' EOF ajFileGetsL file ./nisc-platypus-shotgun-1048960391.fasta closing file './nisc-platypus-shotgun-1048960391.fasta' ajFileNewIn './nisc-platypus-shotgun-1071756042.fasta' EOF ajFileGetsL file ./nisc-platypus-shotgun-1071756042.fasta closing file './nisc-platypus-shotgun-1071756042.fasta' ajFileNewIn './nisc-platypus-shotgun-1080815515.fasta' EOF ajFileGetsL file ./nisc-platypus-shotgun-1080815515.fasta closing file './nisc-platypus-shotgun-1080815515.fasta' 
ajFileNewIn './nisc-platypus-shotgun-1102160893.fasta' EOF ajFileGetsL file ./nisc-platypus-shotgun-1102160893.fasta closing file './nisc-platypus-shotgun-1102160893.fasta' ajFileNewIn './nisc-platypus-shotgun-1104879084.fasta' EOF ajFileGetsL file ./nisc-platypus-shotgun-1104879084.fasta closing file './nisc-platypus-shotgun-1104879084.fasta' ajFileNewIn './nisc-platypus-shotgun-1109000445.fasta' EOF ajFileGetsL file ./nisc-platypus-shotgun-1109000445.fasta closing file './nisc-platypus-shotgun-1109000445.fasta' ajFileNewIn './nisc-platypus-shotgun-1110804272.fasta' EOF ajFileGetsL file ./nisc-platypus-shotgun-1110804272.fasta closing file './nisc-platypus-shotgun-1110804272.fasta' ajFileNewIn './nisc-platypus-shotgun-1116844699.fasta' EOF ajFileGetsL file ./nisc-platypus-shotgun-1116844699.fasta closing file './nisc-platypus-shotgun-1116844699.fasta' ajFileNewIn './nisc-platypus-shotgun-1142973027.fasta' EOF ajFileGetsL file ./nisc-platypus-shotgun-1142973027.fasta closing file './nisc-platypus-shotgun-1142973027.fasta' ajFileNewIn './wugsc-ornithorhynchus_anatinus-cloneEnd-1113828608.fasta' WriteBucket: Overflow WriteBucket: Overflow ReadBucket: Overflow ReadBucket: Overflow ReadBucket: Overflow ReadBucket: Overflow WriteBucket: Overflow [Loads more of these] GetKeys: Overflow ReadBucket: Overflow GetKeys: Overflow WriteNode: Overflow GetKeys: Overflow WriteNode: Overflow WriteBucket: Overflow WriteBucket: Overflow [Loads of these] WriteNode: Overflow GetKeys: Overflow WriteNode: Overflow GetKeys: Overflow GetKeys: Overflow WriteNode: Overflow GetKeys: Overflow GetKeys: Overflow WriteNode: Overflow GetKeys: Overflow WriteNode: Overflow GetKeys: Overflow [Killed at this point as the .dbg file getting enormous] From ajb at ebi.ac.uk Thu Apr 13 10:22:49 2006 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Thu, 13 Apr 2006 11:22:49 +0100 (BST) Subject: [EMBOSS] Problems indexing with dbxfasta In-Reply-To: References: Message-ID: 
<36857.81.98.244.247.1144923769.squirrel@webmail.ebi.ac.uk> Hi Simon, The overflow code isn't fully implemented yet and it shouldn't need to use it if your resource definition is OK. You'll get overflows if the length values are too short for the ID/ACC/SV/etc. Take a look and get back to me off-list if adjusting any appropriate length resource definitions doesn't help. HTH Alan > I managed to get hold of a debug file from the failing dbxfasta. The > edited highlights are: > > Debug file dbxfasta.dbg buffered:No > ajFileNewIn '/usr/local/share/EMBOSS/acd/dbxfasta.acd' > EOF ajFileGetsL file /usr/local/share/EMBOSS/acd/dbxfasta.acd > closing file '/usr/local/share/EMBOSS/acd/dbxfasta.acd' > ajUserGet buffer len: 0 res: 2048 ptr: 8d8eb18 > ajUserGet buffer len: 0 res: 2048 ptr: 8d8eb18 > ajUserGet buffer len: 5 res: 2048 ptr: 8d8f320 > ajUserGet buffer len: 1 res: 2048 ptr: 8d8fb28 > ajUserGet buffer len: 5 res: 2048 ptr: 8d8eb18 > ajUserGet buffer len: 3 res: 2048 ptr: 8d8eb18 > ajUserGet buffer len: 8 res: 2048 ptr: 8d8eb18 > ajFileScan directory: './' > nisc-platypus-shotgun-1071756042.fasta > nisc-platypus-shotgun-1080815515.fasta > nisc-platypus-shotgun-1102160893.fasta > > > [snip big list of files] > > closing file './/traces_oanatinus.ent' > ajFileNewIn './nisc-platypus-shotgun-1048960391.fasta' > EOF ajFileGetsL file ./nisc-platypus-shotgun-1048960391.fasta > closing file './nisc-platypus-shotgun-1048960391.fasta' > ajFileNewIn './nisc-platypus-shotgun-1071756042.fasta' > EOF ajFileGetsL file ./nisc-platypus-shotgun-1071756042.fasta > closing file './nisc-platypus-shotgun-1071756042.fasta' > ajFileNewIn './nisc-platypus-shotgun-1080815515.fasta' > EOF ajFileGetsL file ./nisc-platypus-shotgun-1080815515.fasta > closing file './nisc-platypus-shotgun-1080815515.fasta' > ajFileNewIn './nisc-platypus-shotgun-1102160893.fasta' > EOF ajFileGetsL file ./nisc-platypus-shotgun-1102160893.fasta > closing file './nisc-platypus-shotgun-1102160893.fasta' > ajFileNewIn 
'./nisc-platypus-shotgun-1104879084.fasta' > EOF ajFileGetsL file ./nisc-platypus-shotgun-1104879084.fasta > closing file './nisc-platypus-shotgun-1104879084.fasta' > ajFileNewIn './nisc-platypus-shotgun-1109000445.fasta' > EOF ajFileGetsL file ./nisc-platypus-shotgun-1109000445.fasta > closing file './nisc-platypus-shotgun-1109000445.fasta' > ajFileNewIn './nisc-platypus-shotgun-1110804272.fasta' > EOF ajFileGetsL file ./nisc-platypus-shotgun-1110804272.fasta > closing file './nisc-platypus-shotgun-1110804272.fasta' > ajFileNewIn './nisc-platypus-shotgun-1116844699.fasta' > EOF ajFileGetsL file ./nisc-platypus-shotgun-1116844699.fasta > closing file './nisc-platypus-shotgun-1116844699.fasta' > ajFileNewIn './nisc-platypus-shotgun-1142973027.fasta' > EOF ajFileGetsL file ./nisc-platypus-shotgun-1142973027.fasta > closing file './nisc-platypus-shotgun-1142973027.fasta' > ajFileNewIn './wugsc-ornithorhynchus_anatinus-cloneEnd-1113828608.fasta' > WriteBucket: Overflow > WriteBucket: Overflow > ReadBucket: Overflow > ReadBucket: Overflow > ReadBucket: Overflow > ReadBucket: Overflow > WriteBucket: Overflow > > [Loads more of these] > > GetKeys: Overflow > ReadBucket: Overflow > GetKeys: Overflow > WriteNode: Overflow > GetKeys: Overflow > WriteNode: Overflow > WriteBucket: Overflow > WriteBucket: Overflow > > [Loads of these] > > WriteNode: Overflow > GetKeys: Overflow > WriteNode: Overflow > GetKeys: Overflow > GetKeys: Overflow > WriteNode: Overflow > GetKeys: Overflow > GetKeys: Overflow > WriteNode: Overflow > GetKeys: Overflow > WriteNode: Overflow > GetKeys: Overflow > > [Killed at this point as the .dbg file getting enormous] > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > From simon.andrews at bbsrc.ac.uk Thu Apr 13 13:36:11 2006 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Thu, 13 Apr 2006 14:36:11 +0100 Subject: [EMBOSS] Problems 
indexing with dbxfasta In-Reply-To: <36857.81.98.244.247.1144923769.squirrel@webmail.ebi.ac.uk> References: <36857.81.98.244.247.1144923769.squirrel@webmail.ebi.ac.uk> Message-ID: Alan, I increased all of the values in the resource definition and did the index again and it all worked fine this time. Looks like there must be some very long ids somewhere in this data. Thanks for the help Simon. On 13 Apr 2006, at 11:22, ajb at ebi.ac.uk wrote: > Hi Simon, > > The overflow code isn't fully implemented yet and it shouldn't need > to use it if your resource definition is OK. You'll get > overflows if the length values are too short for the > ID/ACC/SV/etc. Take a look and get back to me off-list > if adjusting any appropriate length resource definitions > doesn't help. > > HTH > > Alan > > >> I managed to get hold of a debug file from the failing dbxfasta. The >> edited highlights are: >> >> Debug file dbxfasta.dbg buffered:No >> ajFileNewIn '/usr/local/share/EMBOSS/acd/dbxfasta.acd' >> EOF ajFileGetsL file /usr/local/share/EMBOSS/acd/dbxfasta.acd >> closing file '/usr/local/share/EMBOSS/acd/dbxfasta.acd' >> ajUserGet buffer len: 0 res: 2048 ptr: 8d8eb18 >> ajUserGet buffer len: 0 res: 2048 ptr: 8d8eb18 >> ajUserGet buffer len: 5 res: 2048 ptr: 8d8f320 >> ajUserGet buffer len: 1 res: 2048 ptr: 8d8fb28 >> ajUserGet buffer len: 5 res: 2048 ptr: 8d8eb18 >> ajUserGet buffer len: 3 res: 2048 ptr: 8d8eb18 >> ajUserGet buffer len: 8 res: 2048 ptr: 8d8eb18 >> ajFileScan directory: './' >> nisc-platypus-shotgun-1071756042.fasta >> nisc-platypus-shotgun-1080815515.fasta >> nisc-platypus-shotgun-1102160893.fasta >> >> >> [snip big list of files] >> >> closing file './/traces_oanatinus.ent' >> ajFileNewIn './nisc-platypus-shotgun-1048960391.fasta' >> EOF ajFileGetsL file ./nisc-platypus-shotgun-1048960391.fasta >> closing file './nisc-platypus-shotgun-1048960391.fasta' >> ajFileNewIn './nisc-platypus-shotgun-1071756042.fasta' >> EOF ajFileGetsL file 
./nisc-platypus-shotgun-1071756042.fasta >> closing file './nisc-platypus-shotgun-1071756042.fasta' >> ajFileNewIn './nisc-platypus-shotgun-1080815515.fasta' >> EOF ajFileGetsL file ./nisc-platypus-shotgun-1080815515.fasta >> closing file './nisc-platypus-shotgun-1080815515.fasta' >> ajFileNewIn './nisc-platypus-shotgun-1102160893.fasta' >> EOF ajFileGetsL file ./nisc-platypus-shotgun-1102160893.fasta >> closing file './nisc-platypus-shotgun-1102160893.fasta' >> ajFileNewIn './nisc-platypus-shotgun-1104879084.fasta' >> EOF ajFileGetsL file ./nisc-platypus-shotgun-1104879084.fasta >> closing file './nisc-platypus-shotgun-1104879084.fasta' >> ajFileNewIn './nisc-platypus-shotgun-1109000445.fasta' >> EOF ajFileGetsL file ./nisc-platypus-shotgun-1109000445.fasta >> closing file './nisc-platypus-shotgun-1109000445.fasta' >> ajFileNewIn './nisc-platypus-shotgun-1110804272.fasta' >> EOF ajFileGetsL file ./nisc-platypus-shotgun-1110804272.fasta >> closing file './nisc-platypus-shotgun-1110804272.fasta' >> ajFileNewIn './nisc-platypus-shotgun-1116844699.fasta' >> EOF ajFileGetsL file ./nisc-platypus-shotgun-1116844699.fasta >> closing file './nisc-platypus-shotgun-1116844699.fasta' >> ajFileNewIn './nisc-platypus-shotgun-1142973027.fasta' >> EOF ajFileGetsL file ./nisc-platypus-shotgun-1142973027.fasta >> closing file './nisc-platypus-shotgun-1142973027.fasta' >> ajFileNewIn >> './wugsc-ornithorhynchus_anatinus-cloneEnd-1113828608.fasta' >> WriteBucket: Overflow >> WriteBucket: Overflow >> ReadBucket: Overflow >> ReadBucket: Overflow >> ReadBucket: Overflow >> ReadBucket: Overflow >> WriteBucket: Overflow >> >> [Loads more of these] >> >> GetKeys: Overflow >> ReadBucket: Overflow >> GetKeys: Overflow >> WriteNode: Overflow >> GetKeys: Overflow >> WriteNode: Overflow >> WriteBucket: Overflow >> WriteBucket: Overflow >> >> [Loads of these] >> >> WriteNode: Overflow >> GetKeys: Overflow >> WriteNode: Overflow >> GetKeys: Overflow >> GetKeys: Overflow >> WriteNode: Overflow >> 
GetKeys: Overflow
>> GetKeys: Overflow
>> WriteNode: Overflow
>> GetKeys: Overflow
>> WriteNode: Overflow
>> GetKeys: Overflow
>>
>> [Killed at this point as the .dbg file getting enormous]
>>
>> _______________________________________________
>> EMBOSS mailing list
>> EMBOSS at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/emboss
>>
>
>
>

--
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463

From msarachu at biol.unlp.edu.ar Mon Apr 17 20:55:47 2006
From: msarachu at biol.unlp.edu.ar (Martin Sarachu)
Date: Mon, 17 Apr 2006 17:55:47 -0300
Subject: [EMBOSS] wEMBOSS-1.6.0 & wrappers4EMBOSS-1.4.0 release
Message-ID: <444400D3.6080705@biol.unlp.edu.ar>

This is to announce the release of both wEMBOSS-1.6.0 and
wrappers4EMBOSS-1.4.0

Changes in wEMBOSS-1.6.0 include:
- compatibility with new datatypes in EMBOSS-3.0.0
- better conversion of ACD expressions to Perl to maintain the same
  order of priority as in EMBOSS
- increased speed by preprocessing EMBOSS datafiles

Changes in wrappers4EMBOSS include:
- all programs that compute a gap penalty of type a*n+b now have
  parameters -gappenalty and -gaplength instead of -gapopen and
  -gapextend
- muscle version updated for MUSCLE-3.6
- support for EMBOSS v2.9, 2.10 and 3. EMBOSS-2.8 is no longer supported
- fastapid uses matrices coded in the software rather than read from
  files
- indexsearch can also run with SRS 8 and it also fully runs on command
  line

We are experiencing some difficulties at the wEMBOSS site so you can
download both files at http://www.ar.embnet.org/downloads
Shortly you will be able to download both at http://www.wemboss.org as
usual.

wEMBOSS includes wrappers4EMBOSS but if you want to use just
wrappers4EMBOSS on the command line just like any EMBOSS program you can
download it separately.

Regards,
the wEMBOSS & wrappers4EMBOSS dev team.
-- Martin Sarachu msarachu at biol.unlp.edu.ar AR.EMBnet http://www.ar.embnet.org From dalesan at lamar.colostate.edu Tue Apr 18 23:53:03 2006 From: dalesan at lamar.colostate.edu (Dale Richardson) Date: Tue, 18 Apr 2006 17:53:03 -0600 Subject: [EMBOSS] Compilation errors on Intel Mac in ajindex.c Message-ID: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu> Hello All, I am trying to install EMBOSS 3.0 on my MacBook Pro. Interestingly, I have come across an error that I haven't been able to resolve via googling. When running make, the following error is encountered: ajindex.c: In function 'ajBtreeCacheNewC': ajindex.c:200: error: storage size of 'buf' isn't known ajindex.c: In function 'ajBtreeSecCacheNewC': ajindex.c:8234: error: storage size of 'buf' isn't known make[1]: *** [ajindex.lo] Error 1 make: *** [all-recursive] Error 1 Is there a way around this? I've applied the fixes available from the fixes directory at ftp://emboss.open-bio.org/pub/EMBOSS/ and tried to reconfigure and recompile but to no avail. Insights and suggestions would be much appreciated. Thanks, Dale Richardson Colorado State University dalesan at lamar.colostate.edu From kvddrift at earthlink.net Wed Apr 19 01:27:30 2006 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 18 Apr 2006 21:27:30 -0400 Subject: [EMBOSS] Compilation errors on Intel Mac in ajindex.c In-Reply-To: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu> References: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu> Message-ID: On Apr 18, 2006, at 7:53 PM, Dale Richardson wrote: > Insights and suggestions would be much appreciated. You could try to install emboss using fink, which is reportedly working on an Intel Mac (not tested by myself though). - Koen. 
From kvddrift at earthlink.net Wed Apr 19 01:30:43 2006 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 18 Apr 2006 21:30:43 -0400 Subject: [EMBOSS] Compilation errors on Intel Mac in ajindex.c In-Reply-To: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu> References: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu> Message-ID: <5195A9EE-A20F-4804-9AD5-FA08662D8912@earthlink.net> On Apr 18, 2006, at 7:53 PM, Dale Richardson wrote: > Is there a way around this? I've applied the fixes available from > the fixes directory at ftp://emboss.open-bio.org/pub/EMBOSS/ and > tried to reconfigure and recompile but to no avail. > > Insights and suggestions would be much appreciated. Just another thought, did you also replace the configure file from the fixes directory, followed by the ./configure command? - Koen. From olivier.friard at unito.it Fri Apr 21 15:00:20 2006 From: olivier.friard at unito.it (Olivier Friard) Date: Fri, 21 Apr 2006 17:00:20 +0200 Subject: [EMBOSS] index RefSeq for EMBOSS Message-ID: <4448F384.7020900@unito.it> Hi, I tried to index the RefSeq database: 1) I downloaded all ftp://ftp.ncbi.nih.gov/refseq/release/complete/complete*.genomic.gbff.gz file (GB format) 2) gunziped 3) Added the rs_dna entry to my .embossrc file DB rs_dna [ type: "N" method: "emblcd" format: "GB" dir: "/home/users/friard/data/refseq_genomic/" file: "*.gbff" release: "" comment: "RefSeq Genomic (upd)" indexdir: "/home/users/friard/data/refseq_genomic/" ] 4) used dbiflat with following arguments (from the directory where files are stored) dbiflat Index a flat file database Database name: rs_dna EMBL : EMBL SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew GB : Genbank, DDBJ REFSEQ : Refseq Entry format [SWISS]: REFSEQ Database directory [.]: Wildcard database filename [*.dat]: *.gbff Release number [0.0]: Index date [00/00/00]: The indexes were created but when I try to access to a sequence (i.e seqret rs_rna:NC_000004) then results is not the correct 
sequence but an other one with the NC_000004 ID!

I also downloaded the file in FASTA format and tried to index them with
the dbifasta command (format: ncbi) without positive results:

seqret rs_dna:nc_000004
Reads and writes (returns) sequences
Error: Unable to read sequence 'rs_dna:nc_000004'
Died: seqret terminated: Bad value for '-sequence' and no prompt

Does anyone index the RefSeq successfully?
Thank you in advance

--
Olivier Friard
Laboratorio di Biologia Computazionale
Facoltà di Scienze MFN
Università di Torino
via Accademia Albertina 13, 10124 TORINO (Italy)

tel. +39 011 6704689

From simon.andrews at bbsrc.ac.uk Fri Apr 21 15:35:29 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 21 Apr 2006 16:35:29 +0100
Subject: [EMBOSS] index RefSeq for EMBOSS
In-Reply-To: <4448F384.7020900@unito.it>
References: <4448F384.7020900@unito.it>
Message-ID:

On 21 Apr 2006, at 16:00, Olivier Friard wrote:

> The indexes were created but when I try to access to a sequence (i.e
> seqret rs_rna:NC_000004) then results is not the correct sequence but
> an other one with the NC_000004 ID!

Is it just finding the wrong sequence or could you have duplicate
entries in the data? Use entret to see if the entry really has that ID.

We found that we got problems with incorrect or no sequences being
returned by seqret when some of the individual sequence files were >2Gb
in size. In these cases you can use the new dbx* indexing programs which
handle large files properly.

> Does anyone index the RefSeq successfully?

Yes. We use it here without problems, but indexed with dbxflat.
It gets indexed with: dbxflat -dbresource all -auto -idformat refseq -dbname refseq_all -filenames \*.gbff ..and the emboss.default entry looks like: DB refseq_all [ type: N comment: "Refseq" method: emboss format: genbank dbalias: refseq_all directory: /data/public/DNA/Refseq/Current/all file: *.gbff ] with the resource section being: RES all [ type: Index idlen: 15 acclen: 15 svlen: 15 keylen: 15 deslen: 15 orglen: 15 ] Simon. -- Simon Andrews PhD Bioinformatics Dept. The Babraham Institute simon.andrews at bbsrc.ac.uk +44 (0) 1223 496463 From isabelle.wells at roche.com Fri Apr 21 15:43:27 2006 From: isabelle.wells at roche.com (Wells, Isabelle) Date: Fri, 21 Apr 2006 17:43:27 +0200 Subject: [EMBOSS] index RefSeq for EMBOSS Message-ID: Hi, Yes I also index refseq. I think the problem here is that dbiflat can only handle files which are less than 2GB. So try splitting the files first. Best, Isabelle -----Original Message----- From: emboss-bounces at lists.open-bio.org [mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Olivier Friard Sent: Friday, April 21, 2006 17:00 To: emboss at emboss.open-bio.org Subject: [EMBOSS] index RefSeq for EMBOSS Hi, I tried to index the RefSeq database: 1) I downloaded all ftp://ftp.ncbi.nih.gov/refseq/release/complete/complete*.genomic.gbff.gz file (GB format) 2) gunziped 3) Added the rs_dna entry to my .embossrc file DB rs_dna [ type: "N" method: "emblcd" format: "GB" dir: "/home/users/friard/data/refseq_genomic/" file: "*.gbff" release: "" comment: "RefSeq Genomic (upd)" indexdir: "/home/users/friard/data/refseq_genomic/" ] 4) used dbiflat with following arguments (from the directory where files are stored) dbiflat Index a flat file database Database name: rs_dna EMBL : EMBL SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew GB : Genbank, DDBJ REFSEQ : Refseq Entry format [SWISS]: REFSEQ Database directory [.]: Wildcard database filename [*.dat]: *.gbff Release number [0.0]: Index date [00/00/00]: The indexes were created but when I 
try to access to a sequence (i.e seqret rs_rna:NC_000004) then results
is not the correct sequence but an other one with the NC_000004 ID!

I also downloaded the file in FASTA format and tried to index them with
the dbifasta command (format: ncbi) without positive results:

seqret rs_dna:nc_000004
Reads and writes (returns) sequences
Error: Unable to read sequence 'rs_dna:nc_000004'
Died: seqret terminated: Bad value for '-sequence' and no prompt

Does anyone index the RefSeq successfully?
Thank you in advance

--
Olivier Friard
Laboratorio di Biologia Computazionale
Facoltà di Scienze MFN
Università di Torino
via Accademia Albertina 13, 10124 TORINO (Italy)

tel. +39 011 6704689

_______________________________________________
EMBOSS mailing list
EMBOSS at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss

From David.Bauer at schering.de Mon Apr 24 05:52:50 2006
From: David.Bauer at schering.de (David.Bauer at schering.de)
Date: Mon, 24 Apr 2006 07:52:50 +0200
Subject: [EMBOSS] index RefSeq for EMBOSS
In-Reply-To:
Message-ID:

You can also try the new indexing programs dbxflat and dbxfasta, which
can handle files larger than 2 GB.

Regards,
David.

emboss-bounces at lists.open-bio.org wrote on 21/04/2006 17:43:27:

> Hi,
>
> Yes I also index refseq. I think the problem here is that dbiflat
> can only handle files which are less than 2GB. So try splitting the
> files first.
>
> Best,
> Isabelle
>
> -----Original Message-----
> From: emboss-bounces at lists.open-bio.org
> [mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Olivier Friard
> Sent: Friday, April 21, 2006 17:00
> To: emboss at emboss.open-bio.org
> Subject: [EMBOSS] index RefSeq for EMBOSS
>
>
> Hi,
>
> I tried to index the RefSeq database:
>
> 1) I downloaded all
> ftp://ftp.ncbi.nih.gov/refseq/release/complete/complete*.genomic.gbff.gz
> file (GB format)
>
> 2) gunziped
>
> 3) Added the rs_dna entry to my .embossrc file
>
>
> DB rs_dna [
> type: "N"
> method: "emblcd"
> format: "GB"
> dir: "/home/users/friard/data/refseq_genomic/"
> file: "*.gbff"
> release: ""
> comment: "RefSeq Genomic (upd)"
> indexdir: "/home/users/friard/data/refseq_genomic/"
> ]
>
>
> 4) used dbiflat with following arguments (from the directory where files
> are stored)
>
> dbiflat
> Index a flat file database
> Database name: rs_dna
> EMBL : EMBL
> SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
> GB : Genbank, DDBJ
> REFSEQ : Refseq
> Entry format [SWISS]: REFSEQ
> Database directory [.]:
> Wildcard database filename [*.dat]: *.gbff
> Release number [0.0]:
> Index date [00/00/00]:
>
> The indexes were created but when I try to access to a sequence (i.e
> seqret rs_rna:NC_000004) then results is not the correct sequence but an
> other one with the NC_000004 ID!
>
>
>
> I also downloaded the file in FASTA format and tried to index them with
> the dbifasta command (format: ncbi) without positive results:
>
> seqret rs_dna:nc_000004
> Reads and writes (returns) sequences
> Error: Unable to read sequence 'rs_dna:nc_000004'
> Died: seqret terminated: Bad value for '-sequence' and no prompt
>
>
> Does anyone index the RefSeq successfully?
> Thank you in advance
>
>
> --
>
> Olivier Friard
> Laboratorio di Biologia Computazionale
> Facoltà di Scienze MFN
> Università di Torino
> via Accademia Albertina 13, 10124 TORINO (Italy)
>
> tel.
+39 011 6704689 > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/emboss > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss From olivier.friard at unito.it Wed Apr 26 10:29:51 2006 From: olivier.friard at unito.it (Olivier Friard) Date: Wed, 26 Apr 2006 12:29:51 +0200 Subject: [EMBOSS] index RefSeq with dbxflat Message-ID: <444F4B9F.1020209@unito.it> Hello, Thank you for your kindly help for indexing refseq. I try to index RefSeq DNA db using the dbxflat program with the following arguments: dbxflat Database b+tree indexing for flat file databases Basename for index files: rs_dna Resource name: rs_dna EMBL : EMBL SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew GB : Genbank, DDBJ REFSEQ : Refseq Entry format [SWISS]: REFSEQ Wildcard database filename [*.dat]: *.gbff Database directory [.]: /home/users/friard/data/refseq_genomic id : ID acc : Accession number sv : Sequence Version and GI des : Description key : Keywords org : Taxonomy Index fields [id,acc]: I included these records in my .embossrc file: DB rs_dna [ type: "N" method: "emboss" dbalias: "rs_dna" format: "genbank" directory: "/home/users/friard/data/refseq_genomic/" file: "*.gbff" comment: "RefSeq DNA (dbxflat)" ] RES rs_dna [ type: Index idlen: 15 acclen: 15 svlen: 15 keylen: 15 deslen: 15 orglen: 15 ] but when I try to retrieve a single sequence with its AC (seqret rs_dna:NC_001911) the program fails with this error message: seqret rs_dna:NC_001191 Reads and writes (returns) sequences Error: Unable to read sequence 'rs_dna:NC_001191' Died: seqret terminated: Bad value for '-sequence' and no prompt when I try to retrieve all sequences with "seqret rs_dna:* -out fasta::refseq.fasta" and everything works well I try to use dbxfasta with the *.fna files (modifying the .embossrc file with "fasta" value) but I 
obtained the same error.

Any idea about the problem?

Thank you in advance

Olivier Friard

From xiaozhendong at gmail.com Wed Apr 26 13:50:01 2006
From: xiaozhendong at gmail.com (zhendong shaw)
Date: Wed, 26 Apr 2006 21:50:01 +0800
Subject: [EMBOSS] how to using Einverted to process a file contain multiple sequences
Message-ID:

The einverted program is designed to process only one sequence at a
time. Is there any way to handle a file in FASTA format containing
multiple sequences?
The input file looks like this:

>seq1
ATTTTTTTTTTTTTTTTTTTT
>seq2
TTTAAAAAAAAAAAAAAA
.......

Something like that.

From rls at ebi.ac.uk Wed Apr 26 15:46:51 2006
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Wed, 26 Apr 2006 16:46:51 +0100
Subject: [EMBOSS] FW: Forthcoming change in the EMBL flatfile format
Message-ID: <00c401c66948$a2e29d40$0132a8c0@windows.ebi.ac.uk>

> -----Original Message-----
> From: owner-seq-dbg at ebi.ac.uk [mailto:owner-seq-dbg at ebi.ac.uk]
> On Behalf Of Carola Kanz
> Sent: 26 April 2006 16:29
> To: seq-dbg at ebi.ac.uk
> Subject: Forthcoming change in the EMBL flatfile format
>
>
> Dear all,
>
> if you are working with the EMBL flatfile format and you are not yet
> aware of the format change we are going to introduce with the next
> release, please have a look at the following announcement.
> Carola
>
>
> -------------------------------------------------------------------------
>
> Dear colleagues,
>
> We would like to announce the following important change in the EMBL
> database in June this year.
>
> At the time of release 87 (available from JUN-2006) the format of the
> EMBL flat file will undergo a change: the ID line will have a different
> structure (see below) and the SV line will be removed.
>
> The changes affecting the ID line structure are:
>
> * All tokens will be separated by a semicolon.
> * The entry name will not be displayed, in its place there will be
>   the primary accession number.
> * The sequence version will be indicated.
> * The topology will be a separate token and will be indicated for
>   both circular and linear molecules.
> * Both the data class and the taxonomic divisions will be displayed.
>
> This is an example of the new ID line:
>
> ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP.
>      (1)       (2)   (3)     (4)          (5)  (6)  (7)
>
>
> The tokens represent:
>
> 1. Primary accession number.
> 2. 'SV' + sequence version number.
> 3. Topology: 'circular' or 'linear'.
> 4. Molecule type.
> 5. Data class (ANN, CON, PAT, EST, GSS, HTC, HTG, MGA, WGS, TPA,
>    STS, STD, "normal" entries will have STD for standard).
> 6. Taxonomic division (HUM, MUS, ROD, PRO, MAM, VRT, FUN, PLN, ENV,
>    INV, SYN, UNC, VRL, PHG).
> 7. Sequence length + 'BP.'.
>
> The entry name will not be displayed any more in the ID line.
> Since EMBL release 3 (Dec 1983) the stable identifier of an entry has
> been the primary accession number.
>
> A mapping file (entryname to accession number) will be provided with
> the next release for those entries where the entryname doesn't coincide
> with the accession number.
>
> To give users a test dataset, one file with new-style ID lines called
> new_id_line.test.gz was provided together with the March release of the
> EMBL database:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/new_id_line.test.gz
>
> Feedback from users is sought; please use the "Contact us" link at the
> bottom of the EBI home page and specify "EMBL" in the feedback form.
>
> Note: this information was first made available on our "Forthcoming
> changes" page
> ( http://www.ebi.ac.uk/embl/Documentation/forthcomingchanges.html#0606 )
> and in the EMBL database release notes.
>
>

From pmr at ebi.ac.uk Fri Apr 28 09:04:31 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 28 Apr 2006 10:04:31 +0100
Subject: [EMBOSS] EMBOSS Funding News
Message-ID: <4451DA9F.5030906@ebi.ac.uk>

EMBOSS will be funded by the UK Biotechnology and Biological Sciences
Research Council (BBSRC) for the next 3 years.
EBI has issued the following press release, also available from:
http://www.ebi.ac.uk/Information/News/pdf/Press25Apr06-small.pdf

The EMBOSS team would like to thank all our users and developers for
their patience over the past two years.

regards,

Peter Rice
Alan Bleasby
Jon Ison

A brighter future for Europe's favourite molecular biology software
package

New funding for EMBOSS - Europe's leading suite of molecular biology
analysis tools - guarantees open access for researchers and software
developers

Hinxton, 25 April, 2006 - EMBOSS, the European Molecular Biology Open
Software Suite, has received a vital funding boost from the UK
Biotechnology and Biological Sciences Research Council (BBSRC) that will
guarantee its continued maintenance under an open source license for the
next three years. This ends two years of uncertainty over the future of
the project.

Until recently, EMBOSS was hosted by the Medical Research Council's
Rosalind Franklin Centre for Genomics Research (RFCGR), where it was
funded jointly by the BBSRC and the Medical Research Council (see "notes
for editors" for more information on the history of EMBOSS). With the
announcement in April 2004 of the RFCGR's closure, the future of EMBOSS
hung in the balance. The new funding from the BBSRC means that EMBOSS
co-founders Peter Rice and Alan Bleasby will be able to continue the
EMBOSS project at the EMBL-EBI for the next three years. EMBOSS will
remain freely available from emboss.sourceforge.net and anyone who wants
to develop it further will have access to its source code.

"We're delighted that the BBSRC has recognized EMBOSS as an important
tool for molecular biology" says project leader Peter Rice. "The EMBOSS
user community has been very patient, and it highlights a great benefit
of open source software that even users in industry have continued to
rely on EMBOSS despite the uncertainty about its future. This simply
could not have happened if EMBOSS had been a commercial package under
threat."

EMBOSS provides a powerful package of around 300 applications for
molecular biology and bioinformatics analysis. Molecular biologists use
EMBOSS at all stages of their research, from planning experiments to
analysing results. It also has an application-programming interface
(API) that enables software developers to write their own EMBOSS
applications. These can readily be strung together, allowing users to
create "workflows" that automate complex and time-consuming tasks.
EMBOSS has also been used in many commercial software developments and
is included in commercial bioinformatics systems. Its flexibility has
made it an obvious core component of several data integration and
bioinformatics infrastructure projects, including myGrid and EMBRACE.

The new funding also provides helpdesk support for EMBOSS's users. "As
well as helping researchers with limited bioinformatics expertise to
make the most of EMBOSS, we will be able to provide better support and
documentation to the estimated 20% of our users who are also software
developers", explains Alan Bleasby. "We will encourage these experts to
contribute their code to the project. In return, we will make their
software widely available through the EMBOSS website and provide ongoing
user support for it. This mechanism will help to ensure that EMBOSS
evolves according to the needs of its users."

Contact:
Cath Brooksbank PhD, EMBL-EBI Scientific Outreach Officer, Hinxton, UK,
Tel: +44 1223 492 552, www.ebi.ac.uk, cath at ebi.ac.uk
Anna-Lynn Wegener, EMBL Press Officer, Heidelberg, Germany,
Tel: +49 6221 387 452, www.embl.org, wegener at embl.de

Notes for editors - a brief history of EMBOSS

EMBOSS, an open source suite of tools for the analysis of biological
data, has its origins in the late 1980s when Peter Rice, a co-founder of
EMBOSS, was working at EMBL. Encouraged by his colleagues in the lab, he
began to write extensions to the GCG package, which at that time
provided its source code to users.
His efforts evolved into EGCG (extended GCG) and Rice moved to the
Sanger Centre (now the Wellcome Trust Sanger Institute) to continue its
development. However, the changes to the source code licensing of GCG in
1996 put an end to further development of EGCG. Recognizing the
importance of free source code to the rapid and cost-effective
development of bioinformatics tools, Rice, in collaboration with Alan
Bleasby (then at SEQNET, Daresbury, UK) began working on a new suite of
open-source bioinformatics tools - the EMBOSS project - in 1996.

EMBOSS has been funded by: the Wellcome Trust (1997-2000); the BBSRC and
MRC (2001-2004); and through two posts at the MRC Rosalind Franklin
Centre for Genomic Research following a merger with BBSRC's SEQNET
facility in 1998. After the closure of RFCGR in July 2005, EMBOSS moved
to the EMBL-EBI where it is coordinated by Rice and Bleasby.

About EMBL:
The European Molecular Biology Laboratory is a basic research institute
funded by public research monies from 19 member states (Austria,
Belgium, Croatia, Denmark, Finland, France, Germany, Greece, Iceland,
Ireland, Israel, Italy, the Netherlands, Norway, Portugal, Spain,
Sweden, Switzerland and the United Kingdom). Research at EMBL is
conducted by approximately 80 independent groups covering the spectrum
of molecular biology. The Laboratory has five units: the main Laboratory
in Heidelberg, and Outstations in Hinxton (the European Bioinformatics
Institute), Grenoble, Hamburg, and Monterotondo near Rome. The
cornerstones of EMBL's mission are: to perform basic research in
molecular biology; to train scientists, students and visitors at all
levels; to offer vital services to scientists in the member states; to
develop new instruments and methods in the life sciences and to actively
engage in technology transfer activities. EMBL's International PhD
Programme has a student body of about 170.
The Laboratory also sponsors an active Science and Society programme.
Visitors from the press and public are welcome.

About EBI:
The European Bioinformatics Institute (EBI) is part of the European
Molecular Biology Laboratory (EMBL) and is located on the Wellcome Trust
Genome Campus in Hinxton near Cambridge (UK). The EBI grew out of EMBL's
pioneering work in providing public biological databases to the research
community. It hosts some of the world's most important collections of
biological data, including DNA sequences (EMBL-Bank), protein sequences
(UniProt), animal genomes (Ensembl), three-dimensional structures (the
Macromolecular Structure Database), data from microarray experiments
(ArrayExpress), protein-protein interactions (IntAct) and pathway
information (Reactome). The EBI hosts several research groups and its
scientists continually develop new tools for the biocomputing community.

Policy regarding use:
EMBL press releases may be freely reprinted and distributed via print
and electronic media. Text, photographs & graphics are copyrighted by
EMBL. They may be freely reprinted and distributed in conjunction with
this news story, provided that proper attribution to authors,
photographers and designers is made. High-resolution copies of the
images can be downloaded from the EMBL web site: www.embl.org

From rsucgang at bcm.tmc.edu Fri Apr 28 21:33:59 2006
From: rsucgang at bcm.tmc.edu (richard sucgang phd)
Date: Fri, 28 Apr 2006 16:33:59 -0500
Subject: [EMBOSS] backtranambig missing?
In-Reply-To: <69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk>
References: <442BFD56.9010908@pcm.uam.es> <443A2160.8090102@ebi.ac.uk> <69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk>
Message-ID:

I am using EMBOSS on OSX (installed using fink). Is it my imagination,
or is the application backtranambig missing? The documentation on sf.net
points to this application existing, yet, I cannot find the binary in
the install. Any ideas?
-- Richard Sucgang, PhD (713) 798 7657 http://www.dictygenome.org/ From francis.tang at chukhang.com Fri Apr 28 22:39:12 2006 From: francis.tang at chukhang.com (Francis Tang) Date: Fri, 28 Apr 2006 23:39:12 +0100 Subject: [EMBOSS] how to using Einverted to process a file contain multiple sequences In-Reply-To: References: Message-ID: <44529990.90600@chukhang.com> Hi Zhendong, I've had to run einverted on a file with many sequences before. If I remember correctly, I used seqret to create a new file for each sequence, and then used bash's for+glob expansion to run einverted many times. Sorry this mail is so vague - it's been a long while since I've used emboss. If you haven't solved the problem already and the clues above don't make it obvious, write back and I'll work it out again. Cheers. Francis. zhendong shaw wrote: > Since the Einverted program is designed to process only one sequences a > time. Are there any ways to handle a file in fasta format containing > multiple sequences? > The input file just like follow: >> seq1 > ATTTTTTTTTTTTTTTTTTTT >> seq2 > TTTAAAAAAAAAAAAAAA > ....... > > sth like that.... -- www.chukhang.com/francis From pmr at ebi.ac.uk Sat Apr 29 10:23:42 2006 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Sat, 29 Apr 2006 11:23:42 +0100 (BST) Subject: [EMBOSS] backtranambig missing? In-Reply-To: References: <442BFD56.9010908@pcm.uam.es> <443A2160.8090102@ebi.ac.uk> <69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk> Message-ID: <2033.86.137.135.19.1146306222.squirrel@webmail.ebi.ac.uk> Richard Sucgang writes: > I am using EMBOSS on OSX (installed using fink). Is it my > imagination, or is the application backtranambig missing? The > documentation on sf.net points to this application existing, yet, I > cannot find the binary in the install. Any ideas? backtranambig will be in EMBOSS 4.0.0 The emboss.sf.net documentation is for the current developers code, and includes new programs and changes to the documentation for some of the current programs. 
EMBOSS 3.0.0 documentation is included in the distribution and installed when EMBOSS is installed. This often causes confusion - we are working on adding the 3.0.0 documentation to the website but we have not yet had time to finish that work. (We did move the current documentation to make it clearer that it was for the CVS code - but that caused more confusion). More news on 4.0.0 soon - we are busy now planning what will be in the release. Hope that helps, Peter
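
[Editorial sketch, not part of the original thread.] The EMBL format
announcement forwarded earlier in this archive describes the new ID line
as seven semicolon-separated tokens. For anyone adapting their own
parsing scripts, the layout can be read roughly as below. This is a
minimal Python sketch that assumes exactly the seven tokens shown in the
announcement's example; the names EmblId and parse_new_id_line are
made up for illustration and are not EMBOSS or EBI code.

```python
from typing import NamedTuple


class EmblId(NamedTuple):
    """The seven tokens of the new-style ID line (hypothetical container)."""
    accession: str      # (1) primary accession number
    version: int        # (2) number following 'SV'
    topology: str       # (3) 'circular' or 'linear'
    molecule_type: str  # (4) e.g. 'genomic DNA'
    data_class: str     # (5) e.g. 'HTG', 'STD'
    division: str       # (6) e.g. 'MAM', 'HUM'
    length: int         # (7) number preceding 'BP.'


def parse_new_id_line(line: str) -> EmblId:
    """Parse a new-style EMBL ID line; raise ValueError if it does not fit."""
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    # Drop the 'ID' line code, then split on the semicolon separators.
    tokens = [t.strip() for t in line[2:].split(";")]
    if len(tokens) != 7:
        raise ValueError(f"expected 7 tokens, got {len(tokens)}")
    accession, sv, topology, molecule, data_class, division, length = tokens

    sv_parts = sv.split()
    if len(sv_parts) != 2 or sv_parts[0] != "SV":
        raise ValueError(f"bad sequence version token: {sv!r}")
    if topology not in ("circular", "linear"):
        raise ValueError(f"bad topology token: {topology!r}")
    length_parts = length.split()
    if len(length_parts) != 2 or length_parts[1] != "BP.":
        raise ValueError(f"bad length token: {length!r}")

    return EmblId(accession, int(sv_parts[1]), topology, molecule,
                  data_class, division, int(length_parts[0]))


if __name__ == "__main__":
    # Example line taken from the announcement.
    print(parse_new_id_line(
        "ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP."))
```

An old-style ID line has fewer semicolon-separated fields, so this
sketch rejects it with ValueError rather than guessing; a real script
would probably want to handle both layouts during the transition.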