From kvddrift at earthlink.net  Sun Apr  2 18:51:23 2006
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 2 Apr 2006 18:51:23 -0400
Subject: [EMBOSS] crash on intel-Mac
In-Reply-To: <51078.81.98.244.247.1143807173.squirrel@webmail.ebi.ac.uk>
References: <E24BA334-87A3-4EE1-91D7-C63B1A02BA63@earthlink.net>
	<51078.81.98.244.247.1143807173.squirrel@webmail.ebi.ac.uk>
Message-ID: <D7211261-0F49-4FB6-BA19-12082674BC5E@earthlink.net>


On Mar 31, 2006, at 7:12 AM, ajb at ebi.ac.uk wrote:

> This should now be fixed as long as you apply all the fixes to  
> EMBOSS-3.0.0
> from the directory:

Thanks.

Another Fink user suggested extending the test in the new config file to  
cover both ppc and intel, so that it looks like:

if test "`uname -a | grep Darwin`"; then
  if test "`uname -a | grep i386`"; then
    CFLAGS="$CFLAGS -O1"
  else
    # is this the correct setting on darwin-powerpc?
    CFLAGS="$CFLAGS -O2"
  fi
else
  CFLAGS="$CFLAGS -O2"
fi
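
The same decision can be sketched as a small shell function, which makes
it easy to try out without touching a real configure run. This is only an
illustration: pick_opt_flag is a made-up name, not part of the EMBOSS
configure script, and it assumes that "uname -m" reports i386 on an Intel
Mac, in the same way the grep above does.

```shell
# Hypothetical helper (not part of the EMBOSS configure script):
# prints the optimisation flag for a given kernel name and machine type.
pick_opt_flag() {
    os=$1
    machine=$2
    case "$os" in
        Darwin)
            case "$machine" in
                i386) echo "-O1" ;;   # Intel Mac: lower optimisation level
                *)    echo "-O2" ;;   # any other Darwin machine, e.g. PowerPC
            esac
            ;;
        *)
            echo "-O2" ;;             # every other platform
    esac
}

# Append the chosen flag for the machine this script is running on.
CFLAGS="${CFLAGS:-} $(pick_opt_flag "$(uname -s)" "$(uname -m)")"
```

Matching on "uname -s" and "uname -m" separately avoids grepping the whole
"uname -a" line, where the host name could happen to contain one of the
strings being searched for.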

Would that cause any problems with emboss?

thanks,

- Koen.


From h-weber at users.sourceforge.net  Mon Apr  3 13:49:06 2006
From: h-weber at users.sourceforge.net (harald weber)
Date: Mon, 03 Apr 2006 10:49:06 -0700
Subject: [EMBOSS] SeqFreed - a new interface to EMBOSS
Message-ID: <E1FQTAo-0001uF-A6@sc8-pr-shell1.sourceforge.net>

Dear friends,

I would like to tell you about SeqFreed, a bioinformatics desktop.
Among other things, SeqFreed can serve as a GUI front end to EMBOSS applications.
Please download it from 'seqfreed.sourceforge.net', run it, and let me know
what you think of it. Although many details still need improving,
I would like to know whether this kind of application could be useful to you at all.

All the best, Harald


From dwaner at scitegic.com  Tue Apr  4 12:57:45 2006
From: dwaner at scitegic.com (David Waner)
Date: Tue, 04 Apr 2006 09:57:45 -0700
Subject: [EMBOSS] Digest and Pepstats crash using cygwin
Message-ID: <4432A589.6050809@scitegic.com>

I have compiled the 3.0.0 release of Emboss (including all current fixes 
from the ftp site) for Windows XP using Cygwin version 1.88.  Most of 
the Emboss programs that I have tested work, but both Digest and 
Pepstats fail every time with a "Bad float conversion" error.  The 
problem does not seem to depend on the sequence data, and occurs on 
every file I've tried.

Has anyone else experienced this problem? Any solutions or suggestions 
would be appreciated.

Thanks.
    - David

Example: 

    C:> digest -sequence O43291.fa -menu 2 -auto
    Protein proteolytic enzyme or reagent cleavage digest
    Output report [spt2_human.digest]: stdout

       EMBOSS An error in ajarr.c at line 1701:
    Bad float conversion

Test data (O43291.fa):

 >swall|O43291|SPT2_HUMAN Kunitz-type protease inhibitor 2 precursor 
(Hepatocyte growth factor activator inhibitor type 2) (HAI-2) (Placental 
bikunin).
MAQLCGLRRSRAFLALLGSLLLSGVLAADRERSIHDFCLVSKVVGRCRASMPRWWYNVTD
GSCQLFVYGGCDGNSNNYLTKEECLKKCATVTENATGDLATSRNAADSSVPSAPRRQDSE
DHSSDMFNYEEYCTANAVTGPCRASFPRWYFDVERNSCNNFIYGGCRGNKNSYRSEEACM
LRCFRQQENPPLPLGSKVVVLAGLFVMVLILFLGASMVYLIRVARRNQERALRTVWSSGD
DKEQLVKNTYVL


From simon.andrews at bbsrc.ac.uk  Wed Apr  5 05:04:20 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Wed, 5 Apr 2006 10:04:20 +0100
Subject: [EMBOSS] Download server problems?
Message-ID: <324960494b49bb2c0f2679add8452bf9@bbsrc.ac.uk>

Does anyone know what's up with the emboss.open-bio.org FTP server?  I 
can connect, but never get as far as a login prompt.

Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From dag at sonsorol.org  Wed Apr  5 23:07:33 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Wed, 5 Apr 2006 23:07:33 -0400
Subject: [EMBOSS] Download server problems?
In-Reply-To: <324960494b49bb2c0f2679add8452bf9@bbsrc.ac.uk>
References: <324960494b49bb2c0f2679add8452bf9@bbsrc.ac.uk>
Message-ID: <EB2D57E0-7C4A-40FD-B905-A83775F21552@sonsorol.org>


{forgot to CC the list on this reply ... }

Our fault (open-bio.org hosting) -- the server has some sort of  
running process with a memory leak we thought we had found. Turns out  
we didn't and the box ground itself slowly to a halt this evening.  
Thanks to the wonders of remote power control all it takes to reset  
and power cycle the system is an SSH connection.  We've got another  
4GB of memory on order for this system.

Regards.
Chris


On Apr 5, 2006, at 5:04 AM, simon andrews (BI) wrote:

> Does anyone know what's up with the emboss.open-bio.org FTP server?  I
> can connect, but never get as far as a login prompt.
>
> Simon.
> -- 
> Simon Andrews PhD
> Bioinformatics Dept.
> The Babraham Institute
>
> simon.andrews at bbsrc.ac.uk
> +44 (0) 1223 496463
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss


From natalia.jimenez at pcm.uam.es  Thu Apr  6 03:56:06 2006
From: natalia.jimenez at pcm.uam.es (Natalia Jimenez Lozano)
Date: Thu, 06 Apr 2006 09:56:06 +0200
Subject: [EMBOSS] Problems with GenBank indexing
Message-ID: <4434C996.7050606@pcm.uam.es>

Hi everybody,

I was trying to retrieve FASTA protein sequences from GenBank by ID 
using seqret, but this did not work for every ID. Retrieval by GI, 
however, does work.

Additionally, during indexing (dbifasta) I got warnings like this one:

Warning: Duplicate ID skipped: 'AC000348_16' All hits will point to 
first ID found

Looking for an explanation of this behaviour, I found that the skipped 
IDs correspond to CDSs from genomic sequences and have this format:

 >gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]
MELPDVPVWRRVIVSAFFEALTFNIDIEEERSEIMMKTGAVVSNPRSRVKWDAFLSFQRDTSHNFTDRLY...
 >gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana]
MSVVLQITKDWVQALLGFLLLSFANISTRTNHKHFPHGSCSSIMAGFWIYMYIYSYLFITLKIIDLTS...

With the entries above, retrieving by the first identifier (the GI) 
works for both of them. Retrieving by the last identifier (AC000348_16) 
returns only the first entry. Retrieving by the second identifier 
(AAG13419.1 or AAF79863.1) does not work at all.

However, sequences with the following format are indexed correctly:

 >gi|64029|emb|CAA23986.1| reading frame [Lophius americanus]
MKMVSSSRLRCLLVLLLSLTASISCSFAGQRDSKLRLLLHRYPLQGSKQDMTRSALAELLLSDLLQGENE ...

and these sequences can be retrieved by both the first and second 
identifiers (64029 and CAA23986.1).

Does anybody know how to solve these problems?
Thanks in advance,
Natalia


From jison at ebi.ac.uk  Fri Apr  7 08:02:50 2006
From: jison at ebi.ac.uk (Jon Ison)
Date: Fri, 7 Apr 2006 13:02:50 +0100 (BST)
Subject: [EMBOSS] Problems with GenBank indexing
In-Reply-To: <4434C996.7050606@pcm.uam.es>
References: <4434C996.7050606@pcm.uam.es>
Message-ID: <59463.172.31.100.168.1144411370.squirrel@webmail.ebi.ac.uk>


Dear Natalia

By default, dbifasta will index the ID name and the accession number (if present).

To index the Sequence Version, GI number and words in the description, you must
run dbifasta with the '-fields' qualifier, e.g. "-fields acc", "-fields sv acc"
etc.   If you don't, you will not be able to retrieve by those fields. Please
see http://emboss.sourceforge.net/apps/cvs/dbifasta.html.

dbifasta only retrieves the first of any duplicate entries.  So far as I'm aware
dbxfasta can retrieve duplicate entries.
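
To see which part of an NCBI-style header ends up in which field, it can
help to split the header by hand. The sketch below is only an illustration
of the '|' layout seen in the entries quoted in this thread; it is not
dbifasta's own parser, and split_ncbi_header is a made-up name.

```shell
# Split an NCBI-style FASTA header into its '|'-separated parts.
# Illustration only: this is not how dbifasta itself parses headers.
split_ncbi_header() {
    header=${1#\>}           # drop the leading '>'
    idpart=${header%% *}     # keep everything up to the first space
    old_ifs=$IFS
    IFS='|'
    set -- $idpart           # $1="gi" $2=GI number $3=db $4=acc.version $5=last id
    IFS=$old_ifs
    echo "gi=$2 db=$3 sv=$4 acc=${4%%.*} id=$5"
}

split_ncbi_header '>gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]'
split_ncbi_header '>gi|64029|emb|CAA23986.1| reading frame [Lophius americanus]'
```

For the first header the last part is AC000348_16, which is the ID that
clashes between the two Arabidopsis entries; for the second header that
part is empty.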

Does that help?  Feel free to get back in touch.

Cheers

Jon




From natalia.jimenez at pcm.uam.es  Fri Apr  7 08:50:16 2006
From: natalia.jimenez at pcm.uam.es (Natalia Jimenez Lozano)
Date: Fri, 07 Apr 2006 14:50:16 +0200
Subject: [EMBOSS] Problems with GenBank indexing
In-Reply-To: <59463.172.31.100.168.1144411370.squirrel@webmail.ebi.ac.uk>
References: <4434C996.7050606@pcm.uam.es> 
	<59463.172.31.100.168.1144411370.squirrel@webmail.ebi.ac.uk>
Message-ID: <44366008.6080106@pcm.uam.es>

Dear Jon,

> Dear Natalia
>
> By default, dbifasta will index the ID name and the accession number (if present).
>
> To index the Sequence Version, GI number and words in the description, you must
> run dbifasta with the '-fields' qualifier, e.g. "-fields acc", "-fields sv acc"
> etc.   If you don't, you will not be able to retrieve by those fields. Please
> see http://emboss.sourceforge.net/apps/cvs/dbifasta.html.
>   
Yes, the indexing was done with the -fields qualifier :-(
> dbifasta only retrieves the first of any duplicate entries.  So far as I'm aware
> dbxfasta can retrieve duplicate entries.
>   
We'll try with dbxfasta!
> Does that help?  Feel free to get back in touch.
>   
Yes, a lot.
Thank you very much
Regards,
Natalia


From jison at ebi.ac.uk  Fri Apr  7 11:34:24 2006
From: jison at ebi.ac.uk (Jon Ison)
Date: Fri, 7 Apr 2006 16:34:24 +0100 (BST)
Subject: [EMBOSS] Problem indexing PDB fasta file
In-Reply-To: <442BFD56.9010908@pcm.uam.es>
References: <442BFD56.9010908@pcm.uam.es>
Message-ID: <34100.172.31.100.168.1144424064.squirrel@webmail.ebi.ac.uk>

Hi Enrique

dbifasta will return just the first entry with a duplicated id.
The new dbxfasta will return all entries with the duplicated id.

dbifasta is indeed case-insensitive.   To make it case-sensitive,
you could change the 3 instances of "ajStrMatchCaseC" in dbifasta.c
to "ajStrMatchC", recompile and try again.  I don't think we'd want
to make that change in the distribution though.
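
Before indexing, it is possible to check which IDs in a FASTA file would
clash once case is ignored. The following is a rough sketch rather than
anything shipped with EMBOSS; find_case_clashes is a made-up name, and it
assumes the ID is the first word on each header line.

```shell
# List FASTA IDs that collide once case is ignored (e.g. 1FNT_A vs 1FNT_a).
# Reads FASTA on stdin; prints each clashing ID once, in lower case.
find_case_clashes() {
    awk '/^>/ {
             id = substr($1, 2)          # first word of the header, minus ">"
             key = tolower(id)
             # report only IDs that differ in case, not exact repeats
             if ((key in seen) && seen[key] != id && !(key in reported)) {
                 print key
                 reported[key] = 1
             }
             if (!(key in seen)) seen[key] = id
         }'
}
```

It would be run as "find_case_clashes < yourfile.fasta", where the file
name is a placeholder for the PDB FASTA file being indexed.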

Hope that helps.

Cheers

Jon


> Hello,
>
> I'm trying to index the fasta file of the PDB database with the dbifasta
> command and I get a lot of warnings such as:
>
> Warning: Duplicate ID skipped: '1FNT_A' All hits will point to first ID
> found
>
> Looking at the PDB fasta file, I see that, for the warning above, there
> is one entry whose id is '1FNT_A' and another whose id is '1FNT_a'.
> This makes me think that EMBOSS is case-insensitive. Is this true? Is
> there any way to distinguish between the two ids?
>
> Thanks in advance,
>
> Enrique.


From pmr at ebi.ac.uk  Mon Apr 10 05:12:00 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 10:12:00 +0100
Subject: [EMBOSS] Problem indexing PDB fasta file
In-Reply-To: <442BFD56.9010908@pcm.uam.es>
References: <442BFD56.9010908@pcm.uam.es>
Message-ID: <443A2160.8090102@ebi.ac.uk>

Enrique de Andres Saiz wrote:
> Looking at the PDB fasta file, I see that, for the previous warning, there
> is one entry whose id is '1FNT_A' and another whose id is '1FNT_a'.
> This makes me think that EMBOSS is case-insensitive. Is this true? Is
> there any way to distinguish between the two ids?

Yes, EMBOSS is case-insensitive. So is the Staden/EMBLCD indexing standard 
that dbifasta uses.

The standard also only allows one entry with each ID.

dbxfasta uses a new indexing format and can index both entries, but will still 
assume the names are the same (a search for 1FNT_A or 1FNT_a will return both 
entries). Allowing indexing to be case-sensitive is possible in future, but 
can slow down searches. We will investigate.

Hope that helps,

Peter


From pmr at ebi.ac.uk  Mon Apr 10 05:05:36 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 10:05:36 +0100
Subject: [EMBOSS] dbifasta index file format
In-Reply-To: <20060330083142.4237.qmail@web26207.mail.ukl.yahoo.com>
References: <20060330083142.4237.qmail@web26207.mail.ukl.yahoo.com>
Message-ID: <443A1FE0.1060707@ebi.ac.uk>

Graziano P. wrote:
> hello EMBOSS users,
> I have some databases in fasta format (ncbi | format)
> and I want to index them using dbifasta, then I want
> to access the index files using a program that will be
> developed by a computer scientist of my group.
> I need to index the databases by accession number,
> ginumber and description. I have read in the dbifasta
> help info about the structure of the index files when
> the databases were indexed by accession number, but I
> have not found info about the structure of the index
> files when the databases are indexed by description.
> Anyone knows where I can find detailed information
> about the structure of the index files?

Ciao Graziano,

The dbifasta index files use the same format as the Staden package, the old 
EMBL CD-ROM distribution, and Erik Sonnhammer's "efetch" utility.

They were documented in some old Staden documentation and papers.

They are also documented in the EMBOSS distribution under doc/manuals/ in file 
internals-indexing.txt (see attached). I see that this document was written 
before we indexed the descriptions!!!

The description (title) indexing is the same as the accession number indexing. 
The files are called des.hit and des.trg. dbifasta has a -maxindex option to 
limit the size of the longest words indexed (the index files have a value for 
the maximum record length).

We also have a script in the distribution scripts/dbilist.pl which can list 
the contents of the description index (in the database index directory, run it 
as dbilist.pl des)

The new dbxfasta index files are very different. For very large databases we 
recommend dbxfasta. For smaller databases dbifasta is fine and we will 
continue to support it.

Hope that helps. If you need more details, just ask.

regards,

Peter


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: internals-indexing.txt
Url: http://lists.open-bio.org/pipermail/emboss/attachments/20060410/be632ef4/attachment.txt 

From simon.andrews at bbsrc.ac.uk  Mon Apr 10 05:40:30 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Mon, 10 Apr 2006 10:40:30 +0100
Subject: [EMBOSS] Problem indexing PDB fasta file
In-Reply-To: <443A2160.8090102@ebi.ac.uk>
References: <442BFD56.9010908@pcm.uam.es> <443A2160.8090102@ebi.ac.uk>
Message-ID: <69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk>


On 10 Apr 2006, at 10:12, Peter Rice wrote:

> Enrique de Andres Saiz wrote:
>> Looking at the PDB fasta file, I see that, for the previous warning, there
>> is one entry whose id is '1FNT_A' and another whose id is '1FNT_a'.
>> This makes me think that EMBOSS is case-insensitive. Is this true? Is
>> there any way to distinguish between the two ids?
>
> Yes, EMBOSS is case-insensitive. So is the Staden/EMBLCD indexing 
> standard
> that dbifasta uses.
>
> The standard also only allows one entry with each ID.

If anyone's interested I've got a small perl script which reformats the 
PDB database into a more sensible format and sorts out the problems 
with case sensitive ids and a number of other odd conventions used in 
PDB.

I'm happy to supply a copy to anyone who wants it.

TTFN

Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From pmr at ebi.ac.uk  Mon Apr 10 06:44:47 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 11:44:47 +0100
Subject: [EMBOSS] Problems with GenBank indexing
In-Reply-To: <4434C996.7050606@pcm.uam.es>
References: <4434C996.7050606@pcm.uam.es>
Message-ID: <443A371F.1010100@ebi.ac.uk>

Natalia Jimenez Lozano wrote:

> I was looking for an explanation to this behaviour and I've found that 
> skipped IDs correspond to CDS from genomic sequences and have this format:
> 
>  >gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]
> MELPDVPVWRRVIVSAFFEALTFNIDIEEERSEIMMKTGAVVSNPRSRVKWDAFLSFQRDTSHNFTDRLY...
>  >gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana]
> MSVVLQITKDWVQALLGFLLLSFANISTRTNHKHFPHGSCSSIMAGFWIYMYIYSYLFITLKIIDLTS...

As Jon says, dbxfasta is a solution.

However, that is only a partial solution. The real problem is that these FASTA 
format sequences do indeed have duplicate IDs.

This is protein sequence data, so it is not GenBank - was this GenPept or some 
other database?

GenPept and other databases have been known to report "gb" or "emb" as the 
database for protein sequences!!!

A possible solution is to add a new ID format to dbifasta and dbxfasta that 
uses AAG13419 and AAF79863 as the ID and ignores the AC000348_16 part.

Hope this helps,

Peter


From pmr at ebi.ac.uk  Mon Apr 10 07:04:49 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 12:04:49 +0100
Subject: [EMBOSS] Fwd: EMBOSS for Windows without Cygwin
In-Reply-To: <a0bf33d50603312012yd77e73ex9e5f88b3acc10e97@mail.gmail.com>
References: <442CCD71.60202@gmail.com>
	<a0bf33d50603312012yd77e73ex9e5f88b3acc10e97@mail.gmail.com>
Message-ID: <443A3BD1.2040709@ebi.ac.uk>

Duleep Samuel wrote:

> Is the latest EMBOSS version, 3.0.0, available anywhere as a precompiled
> binary for Windows XP? I tried compiling it using Cygwin and it
> crashed. I installed EMBOSS for Windows, which is a port of version 2.10.0,
> installed the Staden Package and made Spin aware of EMBOSS, and that is
> working, but I feel bad that I am _one_ whole release behind. If anyone has
> a compiled binary, I can download it for testing and report back on
> usability.
> Regards, Samuel, Virologist, India

Staden has support for older versions of EMBOSS. We are trying to update 
Staden to work with EMBOSS 3.0.0 and future releases.

If anyone is using EMBOSS and Staden (especially EMBOSS under the Staden SPIN 
interface) please contact the EMBOSS developers 
(emboss-bug at emboss.open-bio.org) so we know how many EMBOSS SPIN users there 
are. It helps to set priorities for the work.

regards,

Peter


From janenerz at web.de  Wed Apr 12 05:09:58 2006
From: janenerz at web.de (Christiane Nerz)
Date: Wed, 12 Apr 2006 11:09:58 +0200
Subject: [EMBOSS] nt-multi-fastA-file
Message-ID: <443CC3E6.4040108@web.de>

Hi all,

I put the gb-file of a whole genome in Artemis.
Is there a possibility to export a multi-FastA-file with the bases of 
all ORFs? Example:

 >ORF_1
ATGTGTTCGTT....
 >ORF_2
ATGTTCCCGACCA...
 >ORF_3
ATGCCGCAT...

I know how to get all bases, but only as one complete sequence.
(That genome is not published yet, so there is no multi-Fasta-file at 
ncbi or EMBL available)

Thanks for help!

Jane Nerz


From simon.andrews at bbsrc.ac.uk  Wed Apr 12 06:05:49 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Wed, 12 Apr 2006 11:05:49 +0100
Subject: [EMBOSS] nt-multi-fastA-file
In-Reply-To: <443CC3E6.4040108@web.de>
References: <443CC3E6.4040108@web.de>
Message-ID: <902608901e58c68600b4dc52c7e8a966@bbsrc.ac.uk>


On 12 Apr 2006, at 10:09, Christiane Nerz wrote:

> Hi all,
>
> I put the gb-file of a whole genome in Artemis.
> Is there a possibility to export a multi-FastA-file with the bases of
> all ORFs?

If you can save the file out of Artemis with the ORFs shown in the 
feature table then you can use coderet in EMBOSS to extract out all of 
the subsequences covering those features, either as protein or DNA.

Hope this helps

Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From pmr at ebi.ac.uk  Wed Apr 12 06:20:46 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 12 Apr 2006 11:20:46 +0100
Subject: [EMBOSS] nt-multi-fastA-file
In-Reply-To: <443CC3E6.4040108@web.de>
References: <443CC3E6.4040108@web.de>
Message-ID: <443CD47E.6060607@ebi.ac.uk>

Christiane Nerz wrote:
> Hi all,
> 
> I put the gb-file of a whole genome in Artemis.
> Is there a possibility to export a multi-FastA-file with the bases of 
> all ORFs? Example:
> 
>  >ORF_1
> ATGTGTTCGTT....
>  >ORF_2
> ATGTTCCCGACCA...
>  >ORF_3
> ATGCCGCAT...
> 
> I know how to get all bases, but only as one complete sequence.
> (That genome is not published yet, so there is no multi-Fasta-file at 
> ncbi or EMBL available)

Yes, the coderet program will do this.

Unfortunately coderet tries to return CDS, mRNA and translations all in 
one file (to be fixed for the next release). You can ask just for the 
CDS with a couple of extra command line options:

coderet -nomrna -notranslation

Give it the filename as input.
The output will be the coding sequences.

With -nocds instead of -notranslation you will get the protein sequences.

If you have any problems parsing the GenBank file let me know.

regards,

Peter Rice


From Marc.Logghe at DEVGEN.com  Wed Apr 12 08:39:00 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 12 Apr 2006 14:39:00 +0200
Subject: [EMBOSS] Embossdata -reject option
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>

Hi,
I am intrigued by the -reject option of embossdata.
According to the doc:
"This specifies the names of the sub-directories of the EMBOSS data
directory that should be ignored when displaying data directories.
Choose from selection list of values 	3, 5, 6".

I was not able to find out what this list of values corresponds to. I
hoped to get a list to select from when embossdata was run with the
-options parameter, but this did not happen.
Any clues ?
Actually I was trying to find a way to obtain more or less the opposite
of '-reject', e.g. what if you only want the content of the CODONS
directory ?

Regards,
Marc


From gbottu at ben.vub.ac.be  Wed Apr 12 09:30:00 2006
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 12 Apr 2006 15:30:00 +0200
Subject: [EMBOSS] Embossdata -reject option - Checked by AntiVir DEMO
	versio
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
Message-ID: <20060412133000.GD15725@bigben.ulb.ac.be>

On Wed, Apr 12, 2006 at 02:39:00PM +0200, Marc Logghe wrote:
> I am intrigued by the -reject option of embossdata.
> According to the doc:
> "This specifies the names of the sub-directories of the EMBOSS data
> directory that should be ignored when displaying data directories.
> Choose from selection list of values 	3, 5, 6".
> I was not able to find out what this list of values corresponds to.

Indeed tricky to find out what this means   :-;
You can look in the file  .../share/EMBOSS/acd/embossdata.acd :

  selection: reject  [
    default: "3, 5, 6"
    minimum: "1"
    maximum: "6"
    values: "None, AAINDEX, CVS, CODONS, PRINTS, PROSITE, REBASE"
    delimiter: ","
    header: "Directories to ignore"
    information: "Select directories"
    help: "This specifies the names of the sub-directories of the
           EMBOSS data directory that should be ignored when displaying data
           directories."
    button: "Y"
  ]

So, by default CVS, PRINTS and PROSITE are rejected.
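
The mapping from numbers to directory names can be checked with a short
sketch. It assumes that positions are counted from 1 with "None" in first
place, which is what the default above implies; selection_names is a
made-up helper, not an EMBOSS command.

```shell
# Values copied from the "values:" line of embossdata.acd quoted above.
values="None, AAINDEX, CVS, CODONS, PRINTS, PROSITE, REBASE"

# Translate a comma-separated list of 1-based positions into names.
selection_names() {
    awk -v vals="$values" -v sel="$1" 'BEGIN {
        split(vals, name, /, */)        # name[1]="None", name[2]="AAINDEX", ...
        n = split(sel, pick, /, */)     # positions requested by the caller
        for (i = 1; i <= n; i++)
            printf "%s%s", name[pick[i] + 0], (i < n ? "," : "\n")
    }'
}

selection_names "3, 5, 6"
```

Run with the default "3, 5, 6", this names the three directories listed
above as rejected.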

> I hoped to get a list to select from when embossdata was run with the
> -options parameter, but this did not happen.

That is because -reject is an "advanced", not an "optional"/"additional" 
parameter. It is indeed impossible to get a selection list displayed at 
the command line, although many GUIs like wEMBOSS will show it.

> Actually I was trying to find a way to obtain more or less the opposite
> of '-reject', e.g. what if you only want the content of the CODONS
> directory ?

This does not work; there is no way to reject the files in the base data 
directory. The best you can do is to add on the command line 
-reject=2,3,5,6,7 or -reject=AAINDEX,CVS,PRINTS,PROSITE,REBASE
What you can do, however, is:
ls $EMBOSS_DATA/CODONS

	Hope this helps,
	Guy Bottu,
	Belgian EMBnet Node


From Marc.Logghe at DEVGEN.com  Wed Apr 12 10:02:09 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 12 Apr 2006 16:02:09 +0200
Subject: [EMBOSS] Embossdata -reject option - Checked by AntiVir DEMO
	versio
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CD9@ANTARESIA.be.devgen.com>

Hi Guy !

> You can look in the file  .../share/EMBOSS/acd/embossdata.acd :
> 
>   selection: reject  [
>     default: "3, 5, 6"
>     minimum: "1"
>     maximum: "6"
>     values: "None, AAINDEX, CVS, CODONS, PRINTS, PROSITE, REBASE"
>     delimiter: ","
>     header: "Directories to ignore"
>     information: "Select directories"
>     help: "This specifies the names of the sub-directories of the
>            EMBOSS data directory that should be ignored when displaying data
>            directories."
>     button: "Y"
>   ]
> 
> So, by default CVS, PRINTS and PROSITE are rejected.
 
Yes, that makes sense now !

> This does not work, there is no way to reject the files in 
> the base data directory. The best you can do is to add on the 
> command line
> -reject=2,3,5,6,7 or -reject=AAINDEX,CVS,PRINTS,PROSITE,REBASE
> What you can do however is :
> ls $EMBOSS_DATA/CODONS

Yeah, that is of course the most obvious ;-) Thing is that I wanted to
do it in an emboss-only way so that it would be possible to run the
emboss command via a soaplab service. The latter should provide a means
to dynamically fetch a list of codon usage tables. More or less like
showdb is doing.
> 
> 	Hope this helps,
Yes it did. Thanks !
Regards,
Marc 


From pmr at ebi.ac.uk  Wed Apr 12 12:04:21 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 12 Apr 2006 17:04:21 +0100 (BST)
Subject: [EMBOSS] Embossdata -reject option
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
Message-ID: <3057.86.137.128.238.1144857861.squirrel@webmail.ebi.ac.uk>

Marc Logghe wrote:

> I am intrigued by the -reject option of embossdata.
>
> I was not able to find out what this list of values corresponds to. I
> hoped to get a list to select from when embossdata was run with the
> -options parameter, but this did not happen.
> Any clues ?

Hmmmm .... yes, -help and the acdtable output (the table in the webpage
application documentation) really need to report the list of menu items
for values that are not prompted (list and selection datatypes).

We will do that for the next release!

Otherwise, you do need to look in the ACD file.

I propose:

-help to report documentation on the options
-help -verbose to report the list of options

acdtable to report the full menu formatted in the "Allowed values" box.

When this is implemented, it will appear in the apps/cvs/embossdata.html
documentation at emboss.sf.net :-)

>Yeah, that is of course the most obvious ;-) Thing is that I wanted to
>do it in an emboss-only way so that it would be possible to run the
>emboss command via a soaplab service. The latter should provide a means
>to dynamically fetch a list of codon usage tables. More or less like
>showdb is doing.

We are looking at ways to do that ... can be tricky if cutgextract has
been run. Any suggestions? A showdata application perhaps?

Hope that helps,

Peter


From Marc.Logghe at DEVGEN.com  Wed Apr 12 12:21:17 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 12 Apr 2006 18:21:17 +0200
Subject: [EMBOSS] Embossdata -reject option
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CDB@ANTARESIA.be.devgen.com>

Hi Peter,

> Hmmmm .... yes, -help and the acdtable output (the table in 
> the webpage application documentation) really need to report 
> the list of menu items for values that are not prompted (list 
> and selection datatypes).
> 
> We will do that for the next release!
> 
> Otherwise, you do need to look in the ACD file.
> 
> I propose:
> 
> -help to report documentation on the options
> -help -verbose to report the list of options
> 
> acdtable to report the full menu formatted in the "Allowed 
> values" box.

OK, great !

> We are looking at ways to do that ... can be tricky if 
> cutgextract has been run. Any suggestions? A showdata 
> application perhaps?

Yes, that could be a start. You could give the directory name as a
parameter, the opposite of the -reject parameter (-include ?).
In its basic form it could just list the files, as embossdata
-showall does.
An example command that lists all the codon tables could be: 'showdata
-include CODONS'.

Something else. In order not to contaminate the CODONS folder, I created
a CUTG folder in the <emboss_data> directory containing the codon tables
extracted from the most recent CUTG. The problem now is that a user has to
give the relative filename in the cfile option (backtranseq) in order for
EMBOSS to find the new codon tables. Would it be an idea to allow
$EMBOSS_DATA to be set to a list of values instead of only one directory
name? That way, EMBOSS could access custom data directories.
Suppose the following:
EMBOSS_DATA=/usr/local/share/EMBOSS/data:/my/other/emboss_data_dir/CUTG

If a codon table is not found in the usual place
(/usr/local/share/EMBOSS/data/CODONS), EMBOSS will look for it in the
other places defined in EMBOSS_DATA (/my/other/emboss_data_dir/CUTG). Or
something along those lines.
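
To make the idea concrete, here is a minimal sketch of such a lookup in
Python. It only illustrates the proposal and is not how EMBOSS resolves
data files today. The function name find_data_file and the rule of trying
both <dir>/CODONS/<name> and <dir>/<name> are assumptions made for the
example.

```python
import os


def find_data_file(name, search_path, subdir="CODONS"):
    """Return the first existing path for `name`, or None if not found.

    `search_path` is a list of directories joined by os.pathsep (':' on
    Unix), in the style of $PATH. Each directory is tried twice: with
    `subdir` appended (the usual data/CODONS layout) and directly (a flat
    custom directory such as .../CUTG). Both rules are hypothetical.
    """
    for base in search_path.split(os.pathsep):
        if not base:
            continue  # ignore empty entries such as a trailing ':'
        for candidate in (os.path.join(base, subdir, name),
                          os.path.join(base, name)):
            if os.path.isfile(candidate):
                return candidate
    return None
```

Directories are searched in the order given, so a table in the standard
location wins over a custom one with the same name.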

Does that make sense ?
Cheers,
Marc


From simon.andrews at bbsrc.ac.uk  Thu Apr 13 04:43:53 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 13 Apr 2006 09:43:53 +0100
Subject: [EMBOSS] Problems indexing with dbxfasta
Message-ID: <F02984326C1F2C428930E8A24561D268012D0487@bie2ksrv1.babraham.bbsrc.ac.uk>

I'm trying to use dbxfasta to index one of the WGS trace databases.
Unfortunately dbxfasta is falling over on me.  The session looks like
this:

$ dbxfasta
Database b+tree indexing for fasta file databases
Basename for index files: traces_oanatinus
Resource name: all
    simple : >ID
     idacc : >ID ACC
     gcgid : >db:ID
  gcgidacc : >db:ID ACC
      dbid : >db ID
      ncbi : | formats
ID line format [idacc]: simple
Database directory [.]:
Wildcard database filename [*.dat]: *.fasta
Release number [0.0]:
Index date [00/00/00]:
Processing file ./nisc-platypus-shotgun-1048960391.fasta
Processing file ./nisc-platypus-shotgun-1071756042.fasta
Processing file ./nisc-platypus-shotgun-1080815515.fasta
Processing file ./nisc-platypus-shotgun-1102160893.fasta
Processing file ./nisc-platypus-shotgun-1104879084.fasta
Processing file ./nisc-platypus-shotgun-1109000445.fasta
Processing file ./nisc-platypus-shotgun-1110804272.fasta
Processing file ./nisc-platypus-shotgun-1116844699.fasta
Processing file ./nisc-platypus-shotgun-1142973027.fasta
Processing file
./wugsc-ornithorhynchus_anatinus-cloneEnd-1113828608.fasta
Processing file
./wugsc-ornithorhynchus_anatinus-cloneEnd-1115655383.fasta
Processing file
./wugsc-ornithorhynchus_anatinus-cloneEnd-1119433885.fasta

   EMBOSS An error in ajindex.c at line 615:
Maximum retries (100) reached in btreeCacheFetch for page 14240710656

The same files have indexed OK with formatdb.  I haven't tried with
dbifasta as I'm trying to move everything over to the new dbx system
(and the rest of our databases have processed OK with dbx(fasta|flat)).

Anyone have any ideas about how to debug this?

Cheers

Simon.

-- 
Simon Andrews PhD
Bioinformatics Group
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463 


From Marc.Logghe at DEVGEN.com  Thu Apr 13 05:00:56 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Thu, 13 Apr 2006 11:00:56 +0200
Subject: [EMBOSS] Problems indexing with dbxfasta
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CE0@ANTARESIA.be.devgen.com>

Hi Simon,
> The same files have indexed OK with formatdb.  I haven't 
> tried with dbifasta as I'm trying to move everything over to 
> the new dbx system (and the rest of our databases have 
> processed OK with dbx(fasta|flat)).
> 
> Anyone have any ideas about how to debug this?
You can run the command with the -debug option (any EMBOSS application
accepts this option). In that case a dbxfasta.dbg file will be created.
Hope this file will give you the clues.
Cheers,
Marc


From ajb at ebi.ac.uk  Thu Apr 13 05:19:44 2006
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Thu, 13 Apr 2006 10:19:44 +0100 (BST)
Subject: [EMBOSS] Problems indexing with dbxfasta
In-Reply-To: <F02984326C1F2C428930E8A24561D268012D0487@bie2ksrv1.babraham.bbsrc.ac.uk>
References: <F02984326C1F2C428930E8A24561D268012D0487@bie2ksrv1.babraham.bbsrc.ac.uk>
Message-ID: <52959.81.98.244.247.1144919984.squirrel@webmail.ebi.ac.uk>

Hello Simon,

Did you pick up the latest set of patches from:
  ftp://emboss.open-bio.org/pub/EMBOSS/fixes/
?

The indexing system was rewritten a few months ago to fix this. See
the README in that directory.
If you are using the latest fixes (check file sizes) and it is still
failing then let me know.


HTH

Alan


From simon.andrews at bbsrc.ac.uk  Thu Apr 13 05:30:41 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 13 Apr 2006 10:30:41 +0100
Subject: [EMBOSS] Problems indexing with dbxfasta
In-Reply-To: <52959.81.98.244.247.1144919984.squirrel@webmail.ebi.ac.uk>
References: <F02984326C1F2C428930E8A24561D268012D0487@bie2ksrv1.babraham.bbsrc.ac.uk>
	<52959.81.98.244.247.1144919984.squirrel@webmail.ebi.ac.uk>
Message-ID: <25bf10458f1cd7e0cc1c64de70f6bdef@bbsrc.ac.uk>


On 13 Apr 2006, at 10:19, ajb at ebi.ac.uk wrote:

> Hello Simon,
>
> Did you pick up the latest set of patches from:
>   ftp://emboss.open-bio.org/pub/EMBOSS/fixes/

Yes.  All patched with the latest fixes as of last week.

> If you are using the latest fixes (check file sizes) and it is still
> failing then let me know.

It is still failing.  I'll have a go at generating a .dbg file if you 
think it'll help, but given how verbose those tend to be, and how long 
it takes to fail I was a bit concerned at the size of file it was 
likely to generate.

Simon.

-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From simon.andrews at bbsrc.ac.uk  Thu Apr 13 05:41:13 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 13 Apr 2006 10:41:13 +0100
Subject: [EMBOSS] Problems indexing with dbxfasta
Message-ID: <F02984326C1F2C428930E8A24561D268012D0488@bie2ksrv1.babraham.bbsrc.ac.uk>

I managed to get hold of a debug file from the failing dbxfasta.  The
edited highlights are:

Debug file dbxfasta.dbg buffered:No
ajFileNewIn '/usr/local/share/EMBOSS/acd/dbxfasta.acd'
EOF ajFileGetsL file /usr/local/share/EMBOSS/acd/dbxfasta.acd
closing file '/usr/local/share/EMBOSS/acd/dbxfasta.acd'
ajUserGet buffer len: 0 res: 2048 ptr: 8d8eb18
ajUserGet buffer len: 0 res: 2048 ptr: 8d8eb18
ajUserGet buffer len: 5 res: 2048 ptr: 8d8f320
ajUserGet buffer len: 1 res: 2048 ptr: 8d8fb28
ajUserGet buffer len: 5 res: 2048 ptr: 8d8eb18
ajUserGet buffer len: 3 res: 2048 ptr: 8d8eb18
ajUserGet buffer len: 8 res: 2048 ptr: 8d8eb18
ajFileScan directory: './'
  nisc-platypus-shotgun-1071756042.fasta
  nisc-platypus-shotgun-1080815515.fasta
  nisc-platypus-shotgun-1102160893.fasta


[snip big list of files]

closing file './/traces_oanatinus.ent'
ajFileNewIn './nisc-platypus-shotgun-1048960391.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1048960391.fasta
closing file './nisc-platypus-shotgun-1048960391.fasta'
ajFileNewIn './nisc-platypus-shotgun-1071756042.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1071756042.fasta
closing file './nisc-platypus-shotgun-1071756042.fasta'
ajFileNewIn './nisc-platypus-shotgun-1080815515.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1080815515.fasta
closing file './nisc-platypus-shotgun-1080815515.fasta'
ajFileNewIn './nisc-platypus-shotgun-1102160893.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1102160893.fasta
closing file './nisc-platypus-shotgun-1102160893.fasta'
ajFileNewIn './nisc-platypus-shotgun-1104879084.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1104879084.fasta
closing file './nisc-platypus-shotgun-1104879084.fasta'
ajFileNewIn './nisc-platypus-shotgun-1109000445.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1109000445.fasta
closing file './nisc-platypus-shotgun-1109000445.fasta'
ajFileNewIn './nisc-platypus-shotgun-1110804272.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1110804272.fasta
closing file './nisc-platypus-shotgun-1110804272.fasta'
ajFileNewIn './nisc-platypus-shotgun-1116844699.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1116844699.fasta
closing file './nisc-platypus-shotgun-1116844699.fasta'
ajFileNewIn './nisc-platypus-shotgun-1142973027.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1142973027.fasta
closing file './nisc-platypus-shotgun-1142973027.fasta'
ajFileNewIn './wugsc-ornithorhynchus_anatinus-cloneEnd-1113828608.fasta'
WriteBucket: Overflow
WriteBucket: Overflow
ReadBucket: Overflow
ReadBucket: Overflow
ReadBucket: Overflow
ReadBucket: Overflow
WriteBucket: Overflow

[Loads more of these]

GetKeys: Overflow
ReadBucket: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow
WriteNode: Overflow
WriteBucket: Overflow
WriteBucket: Overflow

[Loads of these]

WriteNode: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow

[Killed at this point as the .dbg file was getting enormous]


From ajb at ebi.ac.uk  Thu Apr 13 06:22:49 2006
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Thu, 13 Apr 2006 11:22:49 +0100 (BST)
Subject: [EMBOSS] Problems indexing with dbxfasta
In-Reply-To: <F02984326C1F2C428930E8A24561D268012D0488@bie2ksrv1.babraham.bbsrc.ac.uk>
References: <F02984326C1F2C428930E8A24561D268012D0488@bie2ksrv1.babraham.bbsrc.ac.uk>
Message-ID: <36857.81.98.244.247.1144923769.squirrel@webmail.ebi.ac.uk>

Hi Simon,

The overflow code isn't fully implemented yet and it shouldn't need
to use it if your resource definition is OK. You'll get
overflows if the length values are too short for the
ID/ACC/SV/etc. Take a look and get back to me off-list
if adjusting any appropriate length resource definitions
doesn't help.
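
A quick way to see whether the length values are the likely culprit is
to look for the longest ID in the input files. The following is only a
rough Python sketch, assuming the "simple" ID line format (the ID is the
first token after '>'); the function name longest_fasta_id is made up
for the example and is not part of EMBOSS.

```python
import glob


def longest_fasta_id(pattern):
    """Return (length, id, filename) for the longest ID on any '>' line.

    Assumes the 'simple' header layout, where the ID is the first
    whitespace-delimited token after '>'.
    """
    longest = (0, "", "")
    for filename in sorted(glob.glob(pattern)):
        with open(filename, "r", errors="replace") as handle:
            for line in handle:
                if line.startswith(">"):
                    fields = line[1:].split()
                    seq_id = fields[0] if fields else ""
                    if len(seq_id) > longest[0]:
                        longest = (len(seq_id), seq_id, filename)
    return longest
```

Calling longest_fasta_id("*.fasta") in the database directory gives a
length to compare against the idlen value in the resource definition.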

HTH

Alan


From simon.andrews at bbsrc.ac.uk  Thu Apr 13 09:36:11 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 13 Apr 2006 14:36:11 +0100
Subject: [EMBOSS] Problems indexing with dbxfasta
In-Reply-To: <36857.81.98.244.247.1144923769.squirrel@webmail.ebi.ac.uk>
References: <F02984326C1F2C428930E8A24561D268012D0488@bie2ksrv1.babraham.bbsrc.ac.uk>
	<36857.81.98.244.247.1144923769.squirrel@webmail.ebi.ac.uk>
Message-ID: <b4b09803fdaa7ffb1e8f7d4d21a06a6d@bbsrc.ac.uk>

Alan,

I increased all of the values in the resource definition and did the 
index again and it all worked fine this time.  Looks like there must be 
some very long ids somewhere in this data.

Thanks for the help

Simon.

On 13 Apr 2006, at 11:22, ajb at ebi.ac.uk wrote:

> Hi Simon,
>
> The overflow code isn't fully implemented yet and it shouldn't need
> to use it if your resource definition is OK. You'll get
> overflows if the length values are too short for the
> ID/ACC/SV/etc. Take a look and get back to me off-list
> if adjusting any appropriate length resource definitions
> doesn't help.
>
> HTH
>
> Alan
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From msarachu at biol.unlp.edu.ar  Mon Apr 17 16:55:47 2006
From: msarachu at biol.unlp.edu.ar (Martin Sarachu)
Date: Mon, 17 Apr 2006 17:55:47 -0300
Subject: [EMBOSS] wEMBOSS-1.6.0 & wrappers4EMBOSS-1.4.0 release
Message-ID: <444400D3.6080705@biol.unlp.edu.ar>

This is to announce the release of both wEMBOSS-1.6.0 and
wrappers4EMBOSS-1.4.0

Changes in wEMBOSS-1.6.0 include:
  - compatibility with new datatypes in EMBOSS-3.0.0
  - better conversion of ACD expressions to Perl to maintain
the same order of priority as in EMBOSS
  - increased speed by preprocessing EMBOSS datafiles

Changes in wrappers4EMBOSS include:
  - all programs that compute a gap penalty of type a*n+b now have
parameters -gappenalty and -gaplength instead of -gapopen and -gapextend
  - muscle version updated for MUSCLE-3.6
  - support for EMBOSS v2.9, 2.10 and 3. EMBOSS-2.8 is no longer supported
  - fastapid uses matrices coded in the software rather than read from files
  - indexsearch can also run with SRS 8 and it also runs fully on the
command line

We are experiencing some difficulties at the wEMBOSS site, so for now you
can download both files at http://www.ar.embnet.org/downloads
Shortly you will be able to download both at http://www.wemboss.org as 
usual.
wEMBOSS includes wrappers4EMBOSS but if you want to use just
wrappers4EMBOSS on the command line just like any EMBOSS program you can
download it separately.


Regards,

the wEMBOSS & wrappers4EMBOSS dev team.

-- 
Martin Sarachu
msarachu at biol.unlp.edu.ar
AR.EMBnet
http://www.ar.embnet.org


From dalesan at lamar.colostate.edu  Tue Apr 18 19:53:03 2006
From: dalesan at lamar.colostate.edu (Dale Richardson)
Date: Tue, 18 Apr 2006 17:53:03 -0600
Subject: [EMBOSS] Compilation errors on Intel Mac in ajindex.c
Message-ID: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>

Hello All,

I am trying to install EMBOSS 3.0 on my MacBook Pro.  Interestingly,  
I have come across an error that I haven't been able to resolve via  
googling.

When running make, the following error is encountered:

ajindex.c: In function 'ajBtreeCacheNewC':
ajindex.c:200: error: storage size of 'buf' isn't known
ajindex.c: In function 'ajBtreeSecCacheNewC':
ajindex.c:8234: error: storage size of 'buf' isn't known
make[1]: *** [ajindex.lo] Error 1
make: *** [all-recursive] Error 1

Is there a way around this?  I've applied the fixes available from  
the fixes directory at ftp://emboss.open-bio.org/pub/EMBOSS/ and  
tried to reconfigure and recompile but to no avail.

Insights and suggestions would be much appreciated.

Thanks,

Dale Richardson
Colorado State University
dalesan at lamar.colostate.edu


From kvddrift at earthlink.net  Tue Apr 18 21:27:30 2006
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 18 Apr 2006 21:27:30 -0400
Subject: [EMBOSS] Compilation errors on Intel Mac in ajindex.c
In-Reply-To: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>
References: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>
Message-ID: <E5391EE0-46D3-400C-9A52-27C4585C4A98@earthlink.net>


On Apr 18, 2006, at 7:53 PM, Dale Richardson wrote:

> Insights and suggestions would be much appreciated.

You could try to install emboss using fink, which is reportedly  
working on an Intel Mac (not tested by myself though).

- Koen.


From kvddrift at earthlink.net  Tue Apr 18 21:30:43 2006
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 18 Apr 2006 21:30:43 -0400
Subject: [EMBOSS] Compilation errors on Intel Mac in ajindex.c
In-Reply-To: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>
References: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>
Message-ID: <5195A9EE-A20F-4804-9AD5-FA08662D8912@earthlink.net>


On Apr 18, 2006, at 7:53 PM, Dale Richardson wrote:

> Is there a way around this?  I've applied the fixes available from
> the fixes directory at ftp://emboss.open-bio.org/pub/EMBOSS/ and
> tried to reconfigure and recompile but to no avail.
>
> Insights and suggestions would be much appreciated.


Just another thought: did you also replace the configure file with the  
one from the fixes directory, and then rerun ./configure?

- Koen.


From olivier.friard at unito.it  Fri Apr 21 11:00:20 2006
From: olivier.friard at unito.it (Olivier Friard)
Date: Fri, 21 Apr 2006 17:00:20 +0200
Subject: [EMBOSS] index RefSeq for EMBOSS
Message-ID: <4448F384.7020900@unito.it>

Hi,

I tried to index the RefSeq database:

1) I downloaded all 
ftp://ftp.ncbi.nih.gov/refseq/release/complete/complete*.genomic.gbff.gz 
files (GB format)

2) gunzipped them

3) Added the rs_dna entry to my .embossrc file


DB rs_dna [
    type: "N"
    method: "emblcd"
    format: "GB"
    dir: "/home/users/friard/data/refseq_genomic/"
    file: "*.gbff"
    release: ""
    comment: "RefSeq Genomic  (upd)"
    indexdir: "/home/users/friard/data/refseq_genomic/"
]


4) used dbiflat with the following arguments (from the directory where
the files are stored)

dbiflat
Index a flat file database
Database name: rs_dna
       EMBL : EMBL
      SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
         GB : Genbank, DDBJ
     REFSEQ : Refseq
Entry format [SWISS]: REFSEQ
Database directory [.]:
Wildcard database filename [*.dat]: *.gbff
Release number [0.0]:
Index date [00/00/00]:

The indexes were created, but when I try to access a sequence (e.g. 
seqret rs_rna:NC_000004) the result is not the correct sequence but 
another one with the NC_000004 ID!


I also downloaded the files in FASTA format and tried to index them with 
the dbifasta command (format: ncbi), without success:

seqret rs_dna:nc_000004
Reads and writes (returns) sequences
Error: Unable to read sequence 'rs_dna:nc_000004'
Died: seqret terminated: Bad value for '-sequence' and no prompt


Has anyone indexed RefSeq successfully?
Thank you in advance


-- 

Olivier Friard
Laboratorio di Biologia Computazionale
Facoltà di Scienze MFN
Università di Torino
via Accademia Albertina 13, 10124 TORINO (Italy)

tel. +39 011 6704689


From simon.andrews at bbsrc.ac.uk  Fri Apr 21 11:35:29 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 21 Apr 2006 16:35:29 +0100
Subject: [EMBOSS] index RefSeq for EMBOSS
In-Reply-To: <4448F384.7020900@unito.it>
References: <4448F384.7020900@unito.it>
Message-ID: <ae34eb8837560b8610df39877f5ad928@bbsrc.ac.uk>


On 21 Apr 2006, at 16:00, Olivier Friard wrote:

> The indexes were created but when I try to access to a sequence (i.e
> seqret rs_rna:NC_000004) then results is not the correct sequence but 
> an
> other one with the NC_000004 ID!

Is it just finding the wrong sequence or could you have duplicate 
entries in the data?  Use entret to see if the entry really has that 
ID.
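
If you want to check for duplicates across all the files at once, a
small script can do it. This is just a sketch, assuming GenBank-style
flat files where the entry name is the second field of each LOCUS line;
the function name duplicate_locus_names is invented for the example.

```python
import glob
from collections import Counter


def duplicate_locus_names(pattern):
    """Return {name: count} for LOCUS names that occur more than once."""
    counts = Counter()
    for filename in glob.glob(pattern):
        with open(filename, "r", errors="replace") as handle:
            for line in handle:
                if line.startswith("LOCUS"):
                    fields = line.split()
                    if len(fields) > 1:
                        counts[fields[1]] += 1
    return {name: n for name, n in counts.items() if n > 1}
```

An empty result from duplicate_locus_names("*.gbff") would suggest the
problem is in the index rather than in the data.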

We found that seqret returned incorrect sequences, or none at all, when 
some of the individual sequence files were >2 GB in size.  In these 
cases you can use the new dbx* indexing programs, which handle large 
files properly.
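
To find out whether any of your files are over that size, something like
the following Python sketch would do. The limit of 2**31 bytes is an
assumption (it corresponds to offsets held in a signed 32-bit integer),
and the function name oversized_files is made up for the example.

```python
import glob
import os

TWO_GB = 2 ** 31  # assumed limit: largest offset in a signed 32-bit integer, plus one


def oversized_files(pattern, limit=TWO_GB):
    """Return [(filename, size_in_bytes)] for files of at least `limit` bytes."""
    result = []
    for filename in sorted(glob.glob(pattern)):
        size = os.path.getsize(filename)
        if size >= limit:
            result.append((filename, size))
    return result
```

Running oversized_files("*.gbff") in the database directory lists the
files that would need dbxflat or splitting.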

> Does anyone index the RefSeq successfully?

Yes.  We use it here without problems, but indexed with dbxflat.

It gets indexed with:

dbxflat -dbresource all -auto -idformat refseq -dbname refseq_all 
-filenames \*.gbff

..and the emboss.default entry looks like:

DB refseq_all
  [
     type: N
     comment: "Refseq"
     method: emboss
     format: genbank
     dbalias: refseq_all
     directory: /data/public/DNA/Refseq/Current/all
     file: *.gbff
  ]

with the resource section being:

RES all [ type: Index
   idlen:  15
   acclen: 15
   svlen:  15
   keylen: 15
   deslen: 15
   orglen: 15
]


Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From isabelle.wells at roche.com  Fri Apr 21 11:43:27 2006
From: isabelle.wells at roche.com (Wells, Isabelle)
Date: Fri, 21 Apr 2006 17:43:27 +0200
Subject: [EMBOSS] index RefSeq for EMBOSS
Message-ID: <B247DF4AC17BBF40B119308717C33702029FA495@rkamsem1.emea.roche.com>

Hi,

Yes I also index refseq. I think the problem here is that dbiflat can only handle files which are less than 2GB. So try splitting the files first.
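
For the splitting, the only thing to watch is that each piece ends on an
entry boundary. Here is a minimal Python sketch, assuming GenBank-style
entries terminated by a line containing only '//'. The function name
split_flatfile and the output naming scheme are invented for the example.

```python
def split_flatfile(path, target_bytes, prefix):
    """Split a flat file into parts, cutting only after a '//' line.

    A part is closed at the first entry boundary once `target_bytes` has
    been written, so a part can exceed the target by up to one entry;
    choose a target with some headroom below the real limit.
    Returns the list of part filenames.
    """
    parts = []
    out = None
    written = 0
    with open(path, "rb") as src:
        for line in src:
            if out is None:
                # hypothetical naming scheme: <prefix>.001.gbff, <prefix>.002.gbff, ...
                name = "%s.%03d.gbff" % (prefix, len(parts) + 1)
                out = open(name, "wb")
                parts.append(name)
                written = 0
            out.write(line)
            written += len(line)
            # '//' on a line by itself ends a GenBank/RefSeq entry
            if line.rstrip() == b"//" and written >= target_bytes:
                out.close()
                out = None
    if out is not None:
        out.close()
    return parts
```

The file is streamed line by line, so very large entries do not have to
fit in memory.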

Best,
Isabelle


From David.Bauer at schering.de  Mon Apr 24 01:52:50 2006
From: David.Bauer at schering.de (David.Bauer at schering.de)
Date: Mon, 24 Apr 2006 07:52:50 +0200
Subject: [EMBOSS] index RefSeq for EMBOSS
In-Reply-To: <B247DF4AC17BBF40B119308717C33702029FA495@rkamsem1.emea.roche.com>
Message-ID: <OF5F484A92.0F56D33A-ONC125715A.001FFF89-C125715A.00204D9D@schering.de>


You can also try the new indexing programs dbxflat and dbxfasta, which can
handle files larger than 2 GB.
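
Before re-indexing, it may be worth checking which flat files are actually
over the limit. A minimal sketch, assuming the ceiling is 2^31 - 1 bytes
(the thread only says "2 GB"); list_oversized is a hypothetical helper name,
not an EMBOSS program:

```shell
# List files under a directory whose size exceeds a byte limit.
# $1 = directory, $2 = filename pattern, $3 = limit in bytes
# (default 2147483647, i.e. 2^31 - 1; the exact dbiflat ceiling is assumed).
list_oversized() {
    # "-size +Nc" matches files larger than N bytes
    find "$1" -type f -name "$2" -size +"${3:-2147483647}c"
}

# Example (path taken from the original post):
# list_oversized /home/users/friard/data/refseq_genomic '*.gbff'
```

Any file this prints would need to be split before dbiflat, or indexed with
dbxflat instead.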

Regards,
David.

emboss-bounces at lists.open-bio.org wrote on 21/04/2006 17:43:27:

> Hi,
>
> Yes I also index refseq. I think the problem here is that dbiflat
> can only handle files which are less than 2GB. So try splitting the
> files first.
>
> Best,
> Isabelle


From olivier.friard at unito.it  Wed Apr 26 06:29:51 2006
From: olivier.friard at unito.it (Olivier Friard)
Date: Wed, 26 Apr 2006 12:29:51 +0200
Subject: [EMBOSS] index RefSeq with dbxflat
Message-ID: <444F4B9F.1020209@unito.it>

Hello,

Thank you for your kind help with indexing RefSeq.


I tried to index the RefSeq DNA db using the dbxflat program with the 
following arguments:

dbxflat
Database b+tree indexing for flat file databases
Basename for index files: rs_dna
Resource name: rs_dna
       EMBL : EMBL
      SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
         GB : Genbank, DDBJ
     REFSEQ : Refseq
Entry format [SWISS]: REFSEQ
Wildcard database filename [*.dat]: *.gbff
Database directory [.]: /home/users/friard/data/refseq_genomic
         id : ID
        acc : Accession number
         sv : Sequence Version and GI
        des : Description
        key : Keywords
        org : Taxonomy
Index fields [id,acc]:

I included these records in my .embossrc file:

DB rs_dna [
     type: "N"
     method: "emboss"
     dbalias: "rs_dna"
     format: "genbank"
     directory: "/home/users/friard/data/refseq_genomic/"
     file: "*.gbff"
     comment: "RefSeq DNA (dbxflat)"
]

RES rs_dna [
    type: Index
    idlen:  15
    acclen: 15
    svlen:  15
    keylen: 15
    deslen: 15
    orglen: 15
]

But when I try to retrieve a single sequence by its accession number
(seqret rs_dna:NC_001911), the program fails with this error message:

seqret rs_dna:NC_001191
Reads and writes (returns) sequences
Error: Unable to read sequence 'rs_dna:NC_001191'
Died: seqret terminated: Bad value for '-sequence' and no prompt

When I retrieve all sequences with "seqret rs_dna:* -out 
fasta::refseq.fasta", everything works well.

I also tried dbxfasta with the *.fna files (changing the format in the 
.embossrc file to "fasta"), but I got the same error.

Any idea about the problem?

Thank you in advance

Olivier Friard


From xiaozhendong at gmail.com  Wed Apr 26 09:50:01 2006
From: xiaozhendong at gmail.com (zhendong shaw)
Date: Wed, 26 Apr 2006 21:50:01 +0800
Subject: [EMBOSS] how to using Einverted to process a file contain multiple
	sequences
Message-ID: <ccfd29ab0604260650q5b7d4d5fjc2c7562668699b36@mail.gmail.com>

The Einverted program is designed to process only one sequence at a
time. Is there any way to handle a file in fasta format containing
multiple sequences?
The input file looks like this:
>seq1
ATTTTTTTTTTTTTTTTTTTT
>seq2
TTTAAAAAAAAAAAAAAA
.......

Something like that.


From rls at ebi.ac.uk  Wed Apr 26 11:46:51 2006
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Wed, 26 Apr 2006 16:46:51 +0100
Subject: [EMBOSS] FW: Forthcoming change in the EMBL flatfile format
Message-ID: <00c401c66948$a2e29d40$0132a8c0@windows.ebi.ac.uk>

 
> -----Original Message-----
> From: owner-seq-dbg at ebi.ac.uk 
> [mailto:owner-seq-dbg at ebi.ac.uk] On Behalf Of Carola Kanz
> Sent: 26 April 2006 16:29
> To: seq-dbg at ebi.ac.uk
> Subject: Forthcoming change in the EMBL flatfile format
> 
> 
> Dear all,
> 
> if you are working with the EMBL flatfile format and you are 
> not yet aware of the format change we are going to introduce 
> with the next release, please have a look at the following 
> announcement.
> Carola
> 
> 
> -------------------------------------------------------------------------
> 
> Dear colleagues,
> 
> We would like to announce the following important change in 
> the EMBL database in June this year.
> 
> At the time of release 87 (available from JUN-2006) the 
> format of the EMBL flat file will undergo a change: the ID 
> line will have a different structure (see below) and the SV 
> line will be removed.
> 
> The changes affecting the ID line structure are:
> 
>      * All tokens will be separated by a semicolon.
>      * The entry name will not be displayed; in its place there will be
>        the primary accession number.
>      * The sequence version will be indicated.
>      * The topology will be a separate token and will be indicated for
>        both circular and linear molecules.
>      * Both the data class and the taxonomic divisions will be displayed.
> 
> This is an example of the new ID line:
> 
> ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP.
>         (1)     (2)     (3)      (4)       (5)  (6)   (7)
> 
> 
> The tokens represent:
> 
>     1. Primary accession number.
>     2. 'SV' + sequence version number.
>     3. Topology: 'circular' or 'linear'.
>     4. Molecule type.
>     5. Data class (ANN, CON, PAT, EST, GSS, HTC, HTG, MGA, WGS, TPA,
>        STS, STD; "normal" entries will have STD for standard).
>     6. Taxonomic division (HUM, MUS, ROD, PRO, MAM, VRT, FUN, PLN, ENV,
>        INV, SYN, UNC, VRL, PHG).
>     7. Sequence length + 'BP.'.
> 
> The entry name will not be displayed any more in the ID line. 
> Since EMBL release 3 (Dec 1983) the stable identifier of an 
> entry has been the primary accession number.
> 
> A mapping file (entryname to accession number) will be 
> provided with the next release for those entries where the 
> entryname doesn't coincide with the accession number.
> 
> To give users a test dataset, one file with new-style ID 
> lines called new_id_line.test.gz was provided together with 
> the March release of the EMBL database: 
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/new_id_line.test.gz 
> 
> Feedback from users is sought; please use the "Contact us" 
> link at the bottom of the EBI home page and specify "EMBL" in 
> the feedback form.
> 
> Note: this information was first made available on our
> "Forthcoming changes" page
> ( http://www.ebi.ac.uk/embl/Documentation/forthcomingchanges.html#0606 )
> and in the EMBL database release notes.
> 
> 
> 
> 
> 
> 
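
For anyone who needs to adapt scripts to the new ID line, the seven tokens
in the announcement can be pulled apart on the semicolons. This is only a
sketch based on the single example line quoted above; parse_embl_id and the
output key names are made up for illustration and are not part of EMBOSS or
of any EBI tool:

```shell
# Split a new-style EMBL ID line into its seven tokens.
# $1 = the full ID line, e.g.
#   "ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP."
parse_embl_id() {
    printf '%s\n' "$1" | awk '
    {
        sub(/^ID[ \t]+/, "")                 # drop the line tag
        n = split($0, tok, ";")              # tokens are semicolon-separated
        for (i = 1; i <= n; i++)
            gsub(/^[ \t]+|[ \t]+$/, "", tok[i])   # trim surrounding blanks
        sub(/^SV[ \t]+/, "", tok[2])         # keep only the version number
        sub(/[ \t]*BP\.$/, "", tok[7])       # keep only the length
        printf "accession=%s\nversion=%s\ntopology=%s\n", tok[1], tok[2], tok[3]
        printf "moltype=%s\nclass=%s\ndivision=%s\nlength=%s\n", tok[4], tok[5], tok[6], tok[7]
    }'
}

parse_embl_id 'ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP.'
```

Note that the molecule type ("genomic DNA") contains a space, so splitting
on whitespace rather than on semicolons would not work.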


From pmr at ebi.ac.uk  Fri Apr 28 05:04:31 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 28 Apr 2006 10:04:31 +0100
Subject: [EMBOSS] EMBOSS Funding News
Message-ID: <4451DA9F.5030906@ebi.ac.uk>

EMBOSS will be funded by the UK Biotechnology and Biological Sciences 
Research Council (BBSRC) for the next 3 years. EBI has issued the 
following press release, also available from:

http://www.ebi.ac.uk/Information/News/pdf/Press25Apr06-small.pdf

The EMBOSS team would like to thank all our users and developers for 
their patience over the past two years.

regards,

Peter Rice
Alan Bleasby
Jon Ison

A brighter future for Europe's favourite molecular biology software package

New funding for EMBOSS - Europe's leading suite of molecular biology 
analysis tools - guarantees open access for researchers and software 
developers

Hinxton, 25 April, 2006 - EMBOSS, the European Molecular Biology Open 
Software Suite, has received a vital funding boost from the UK 
Biotechnology and Biological Sciences Research Council (BBSRC) that will 
guarantee its continued maintenance under an open source license for the 
next three years. This ends two years of uncertainty over the future of 
the project.

Until recently, EMBOSS was hosted by the Medical Research Council's 
Rosalind Franklin Centre for Genomics Research (RFCGR), where it was 
funded jointly by the BBSRC and the Medical Research Council (see "notes 
for editors" for more information on the history of EMBOSS). With the 
announcement in April 2004 of the RFCGR's closure, the future of EMBOSS 
hung in the balance. The new funding from the BBSRC means that EMBOSS 
co-founders Peter Rice and Alan Bleasby will be able to continue the 
EMBOSS project at the EMBL-EBI for the next three years. EMBOSS will 
remain freely available from emboss.sourceforge.net and anyone who wants 
to develop it further will have access to its source code. "We're 
delighted that the BBSRC has recognized EMBOSS as an important tool for 
molecular biology," says project leader Peter Rice. "The EMBOSS user 
community has been very patient, and it highlights a great benefit of 
open source software that even users in industry have continued to rely 
on EMBOSS despite the uncertainty about its future. This simply could 
not have happened if EMBOSS had been a commercial package under threat."

EMBOSS provides a powerful package of around 300 applications for 
molecular biology and bioinformatics analysis. Molecular biologists use 
EMBOSS at all stages of their research, from planning experiments to 
analysing results. It also has an application-programming interface 
(API) that enables software developers to write their own EMBOSS 
applications. These can readily be strung together, allowing users to 
create "workflows" that automate complex and time-consuming tasks. 
EMBOSS has also been used in many commercial software developments and 
is included in commercial bioinformatics systems. Its flexibility has 
made it an obvious core component of several data integration and 
bioinformatics infrastructure projects, including myGrid and EMBRACE.

The new funding also provides helpdesk support for EMBOSS's users. "As 
well as helping researchers with limited bioinformatics expertise to 
make the most of EMBOSS, we will be able to provide better support and 
documentation to the estimated 20% of our users who are also software 
developers", explains Alan Bleasby. "We will encourage these experts to 
contribute their code to the project. In return, we will make their 
software widely available through the EMBOSS website and provide ongoing 
user support for it. This mechanism will help to ensure that EMBOSS 
evolves according to the needs of its users."

Contact:

Cath Brooksbank PhD, EMBL-EBI Scientific Outreach Officer, Hinxton, UK, 
Tel: +44 1223 492 552, www.ebi.ac.uk, cath at ebi.ac.uk
Anna-Lynn Wegener, EMBL Press Officer, Heidelberg, Germany, Tel: +49 
6221 387 452, www.embl.org, wegener at embl.de


Notes for editors - a brief history of EMBOSS

EMBOSS, an open source suite of tools for the analysis of biological 
data, has its origins in the late 1980s when Peter Rice, a co-founder of 
EMBOSS, was working at EMBL. Encouraged by his colleagues in the lab, he 
began to write extensions to the GCG package, which at that time 
provided its source code to users. His efforts evolved into EGCG 
(extended GCG) and Rice moved to the Sanger Centre (now the Wellcome 
Trust Sanger Institute) to continue its development. However, the 
changes to the source code licensing of GCG in 1996 put an end to 
further development of EGCG. Recognizing the importance of free source 
code to the rapid and cost-effective development of bioinformatics 
tools, Rice, in collaboration with Alan Bleasby (then at SEQNET, 
Daresbury, UK) began working on a new suite of open-source 
bioinformatics tools - the EMBOSS project - in 1996. EMBOSS has been 
funded by: the Wellcome Trust (1997-2000); the BBSRC and MRC 
(2001-2004); and through two posts at the MRC Rosalind Franklin Centre 
for Genomic Research following a merger with BBSRC's SEQNET facility in 
1998. After the closure of RFCGR in July 2005, EMBOSS moved to the 
EMBL-EBI where it is coordinated by Rice and Bleasby.


About EMBL:

The European Molecular Biology Laboratory is a basic research institute 
funded by public research monies from 19 member states (Austria, 
Belgium, Croatia, Denmark, Finland, France, Germany, Greece, Iceland, 
Ireland, Israel, Italy, the Netherlands, Norway, Portugal, Spain, Sweden, 
Switzerland and the United Kingdom). Research at EMBL is conducted by 
approximately 80 independent groups covering the spectrum of molecular 
biology. The Laboratory has five units: the main Laboratory in 
Heidelberg, and Outstations in Hinxton (the European Bioinformatics 
Institute), Grenoble, Hamburg, and Monterotondo near Rome. The 
cornerstones of EMBL's mission are: to perform basic research in 
molecular biology; to train scientists, students and visitors at all 
levels; to offer vital services to scientists in the member states; to 
develop new instruments and methods in the life sciences and to actively 
engage in technology transfer activities. EMBL's International PhD 
Programme has a student body of about 170. The Laboratory also sponsors 
an active Science and Society programme. Visitors from the press and 
public are welcome.

About EBI:

The European Bioinformatics Institute (EBI) is part of the European 
Molecular Biology Laboratory (EMBL) and is located on the Wellcome Trust 
Genome Campus in Hinxton near Cambridge (UK). The EBI grew out of EMBL's 
pioneering work in providing public biological databases to the research 
community. It hosts some of the world's most important collections of 
biological data, including DNA sequences (EMBL-Bank), protein sequences 
(UniProt), animal genomes (Ensembl), three-dimensional structures (the 
Macromolecular Structure Database), data from microarray experiments 
(ArrayExpress), protein-protein interactions (IntAct) and pathway 
information (Reactome). The EBI hosts several research groups and its 
scientists continually develop new tools for the biocomputing community.

Policy regarding use:

EMBL press releases may be freely reprinted and distributed via print 
and electronic media. Text, photographs & graphics are copyrighted by 
EMBL. They may be freely reprinted and distributed in conjunction with 
this news story, provided that proper attribution to authors, 
photographers and designers is made. High-resolution copies of the 
images can be downloaded from the EMBL web site: www.embl.org


From rsucgang at bcm.tmc.edu  Fri Apr 28 17:33:59 2006
From: rsucgang at bcm.tmc.edu (richard sucgang phd)
Date: Fri, 28 Apr 2006 16:33:59 -0500
Subject: [EMBOSS] backtranambig missing?
In-Reply-To: <69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk>
References: <442BFD56.9010908@pcm.uam.es> <443A2160.8090102@ebi.ac.uk>
	<69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk>
Message-ID: <f06230904c0783a745fe8@[128.249.209.78]>

I am using EMBOSS on OSX (installed using fink). Is it my 
imagination, or is the application backtranambig missing? The 
documentation on sf.net points to this application existing, yet, I 
cannot find the binary in the install. Any ideas?
-- 
Richard Sucgang, PhD
(713) 798 7657
http://www.dictygenome.org/


From francis.tang at chukhang.com  Fri Apr 28 18:39:12 2006
From: francis.tang at chukhang.com (Francis Tang)
Date: Fri, 28 Apr 2006 23:39:12 +0100
Subject: [EMBOSS] how to using Einverted to process a file contain
 multiple sequences
In-Reply-To: <ccfd29ab0604260650q5b7d4d5fjc2c7562668699b36@mail.gmail.com>
References: <ccfd29ab0604260650q5b7d4d5fjc2c7562668699b36@mail.gmail.com>
Message-ID: <44529990.90600@chukhang.com>

Hi Zhendong,

I've had to run einverted on a file with many sequences before.

If I remember correctly, I used seqret to create a new file for each 
sequence, and then used bash's for+glob expansion to run einverted many 
times.

Sorry this mail is so vague - it's been a long while since I've used 
emboss.  If you haven't solved the problem already and the clues above 
don't make it obvious, write back and I'll work it out again.
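
A rough sketch of that approach follows. It is not what I actually ran back
then: the splitting is done with plain awk rather than seqret so that it has
no EMBOSS dependency, split_fasta and run_einverted_each are made-up helper
names, and the einverted qualifiers (-sequence, -outfile, -auto) are given
from memory and should be checked against your installed version.

```shell
# Write each entry of a multi-sequence FASTA file to its own file.
# $1 = input FASTA, $2 = output directory
split_fasta() {
    mkdir -p "$2"
    awk -v dir="$2" '
        /^>/ {
            if (out != "") close(out)
            name = substr($1, 2)                  # ID = first word after ">"
            gsub(/[^A-Za-z0-9._-]/, "_", name)    # keep file names shell-safe
            if (name == "") name = "entry" NR     # header with no usable ID
            out = dir "/" name ".fasta"
        }
        out != "" { print > out }
    ' "$1"
}

# Run einverted once per split file; does nothing if EMBOSS is not installed.
# $1 = directory holding the split files
run_einverted_each() {
    command -v einverted >/dev/null 2>&1 || { echo "einverted not found" >&2; return 0; }
    for f in "$1"/*.fasta; do
        [ -e "$f" ] || continue
        einverted -sequence "$f" -outfile "${f%.fasta}.inv" -auto
    done
}

# Usage:
# split_fasta input.fasta split && run_einverted_each split
```

Entries that share an ID would overwrite each other's file, so check for
duplicates first if that is a possibility in your data.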

Cheers.

Francis.

zhendong shaw wrote:
> Since the Einverted program is designed to process only one sequences a
> time. Are there any ways to handle a file in fasta format containing
> multiple sequences?
> The input file just like follow:
> >seq1
> ATTTTTTTTTTTTTTTTTTTT
> >seq2
> TTTAAAAAAAAAAAAAAA
> .......
> 
> sth like that....


-- 
www.chukhang.com/francis


From pmr at ebi.ac.uk  Sat Apr 29 06:23:42 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Sat, 29 Apr 2006 11:23:42 +0100 (BST)
Subject: [EMBOSS] backtranambig missing?
In-Reply-To: <f06230904c0783a745fe8@[128.249.209.78]>
References: <442BFD56.9010908@pcm.uam.es> <443A2160.8090102@ebi.ac.uk>
	<69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk>
	<f06230904c0783a745fe8@[128.249.209.78]>
Message-ID: <2033.86.137.135.19.1146306222.squirrel@webmail.ebi.ac.uk>

Richard Sucgang writes:

> I am using EMBOSS on OSX (installed using fink). Is it my
> imagination, or is the application backtranambig missing? The
> documentation on sf.net points to this application existing, yet, I
> cannot find the binary in the install. Any ideas?

backtranambig will be in EMBOSS 4.0.0

The emboss.sf.net documentation is for the current developers' code, and
includes new programs and changes to the documentation for some of the
current programs.

EMBOSS 3.0.0 documentation is included in the distribution and installed
when EMBOSS is installed.

This often causes confusion - we are working on adding the 3.0.0
documentation to the website but we have not yet had time to finish that
work. (We did move the current documentation to make it clearer that it
was for the CVS code - but that caused more confusion).

More news on 4.0.0 soon - we are busy now planning what will be in the
release.

Hope that helps,

Peter


From dksamuel at gmail.com  Sat Apr  1 04:12:14 2006
From: dksamuel at gmail.com (Duleep Samuel)
Date: Sat, 1 Apr 2006 09:42:14 +0530
Subject: [EMBOSS] Fwd: EMBOSS for Windows without Cygwin
In-Reply-To: <442CCD71.60202@gmail.com>
References: <442CCD71.60202@gmail.com>
Message-ID: <a0bf33d50603312012yd77e73ex9e5f88b3acc10e97@mail.gmail.com>

Is the latest EMBOSS version, 3.0.0, available anywhere as a precompiled
binary for Windows XP? I tried compiling it using Cygwin and it crashed.
I installed EMBOSS for Windows, which is a port of version 2.10.0,
installed the Staden Package and made Spin aware of EMBOSS, and I am
working with that, but I feel bad that I am _one_ whole release behind.
If anyone has a compiled binary, I can download it for testing and
report back on usability.

Regards, Samuel, Virologist, India


From kvddrift at earthlink.net  Sun Apr  2 22:51:23 2006
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 2 Apr 2006 18:51:23 -0400
Subject: [EMBOSS] crash on intel-Mac
In-Reply-To: <51078.81.98.244.247.1143807173.squirrel@webmail.ebi.ac.uk>
References: <E24BA334-87A3-4EE1-91D7-C63B1A02BA63@earthlink.net>
	<51078.81.98.244.247.1143807173.squirrel@webmail.ebi.ac.uk>
Message-ID: <D7211261-0F49-4FB6-BA19-12082674BC5E@earthlink.net>


On Mar 31, 2006, at 7:12 AM, ajb at ebi.ac.uk wrote:

> This should now be fixed as long as you apply all the fixes to  
> EMBOSS-3.0.0
> from the directory:

Thanks.

Another fink user suggested extending the test to distinguish ppc and  
intel in the new config file, so that it looks like:

if test "`uname -a | grep Darwin`"; then
   if test "`uname -a | grep i386`"; then
     CFLAGS="$CFLAGS -O1"
   else
     # is this the correct setting on darwin-powerpc?
     CFLAGS="$CFLAGS -O2"
   fi
else
  CFLAGS="$CFLAGS -O2"
fi

Would that cause any problems with emboss?
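
The same test could also be written with a case statement on the kernel name
and machine type instead of grepping "uname -a". This is just a sketch:
pick_opt_flag is a made-up function name, and the -O1/-O2 choices are simply
carried over from the snippet above, not verified settings.

```shell
# Choose an optimisation flag from the kernel name and machine type.
# $1 = kernel name (uname -s), $2 = machine (uname -m)
pick_opt_flag() {
    case "$1:$2" in
        Darwin:i386) echo "-O1" ;;   # Intel Mac: lower optimisation level
        *)           echo "-O2" ;;   # Darwin/PowerPC and everything else
    esac
}

# Append the chosen flag to any CFLAGS already set by the user.
CFLAGS="${CFLAGS:+$CFLAGS }$(pick_opt_flag "$(uname -s)" "$(uname -m)")"
```

Matching the two fields separately avoids a false hit if "i386" or "Darwin"
happens to appear elsewhere in the "uname -a" output, for example in the
host name.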

thanks,

- Koen.


From h-weber at users.sourceforge.net  Mon Apr  3 17:49:06 2006
From: h-weber at users.sourceforge.net (harald weber)
Date: Mon, 03 Apr 2006 10:49:06 -0700
Subject: [EMBOSS] SeqFreed - a new interface to EMBOSS
Message-ID: <E1FQTAo-0001uF-A6@sc8-pr-shell1.sourceforge.net>

Dear friends,

I'd like to tell you about SeqFreed, a bioinformatics desktop.
Among other things, SeqFreed can serve as a GUI to EMBOSS applications.
Please download it from 'seqfreed.sourceforge.net', run it and let me know
what you think about it. Although many details still have to be improved,
I'd like to know whether this kind of app could be useful to you at all.

All the best, Harald


From dwaner at scitegic.com  Tue Apr  4 16:57:45 2006
From: dwaner at scitegic.com (David Waner)
Date: Tue, 04 Apr 2006 09:57:45 -0700
Subject: [EMBOSS] Digest and Pepstats crash using cygwin
Message-ID: <4432A589.6050809@scitegic.com>

I have compiled the 3.0.0 release of Emboss (including all current fixes 
from the ftp site) for Windows XP using Cygwin version 1.88.  Most of 
the Emboss programs that I have tested work, but both Digest and 
Pepstats fail every time with a "Bad float conversion" error.  The 
problem does not seem to depend on the sequence data, and occurs on 
every file I've tried.

Has anyone else experienced this problem? Any solutions or suggestions 
would be appreciated.

Thanks.
    - David

Example: 

    C:> digest -sequence O43291.fa -menu 2 -auto
    Protein proteolytic enzyme or reagent cleavage digest
    Output report [spt2_human.digest]: stdout

       EMBOSS An error in ajarr.c at line 1701:
    Bad float conversion

Test data (O43291.fa):

 >swall|O43291|SPT2_HUMAN Kunitz-type protease inhibitor 2 precursor (Hepatocyte growth factor activator inhibitor type 2) (HAI-2) (Placental bikunin).
MAQLCGLRRSRAFLALLGSLLLSGVLAADRERSIHDFCLVSKVVGRCRASMPRWWYNVTD
GSCQLFVYGGCDGNSNNYLTKEECLKKCATVTENATGDLATSRNAADSSVPSAPRRQDSE
DHSSDMFNYEEYCTANAVTGPCRASFPRWYFDVERNSCNNFIYGGCRGNKNSYRSEEACM
LRCFRQQENPPLPLGSKVVVLAGLFVMVLILFLGASMVYLIRVARRNQERALRTVWSSGD
DKEQLVKNTYVL


From simon.andrews at bbsrc.ac.uk  Wed Apr  5 09:04:20 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Wed, 5 Apr 2006 10:04:20 +0100
Subject: [EMBOSS] Download server problems?
Message-ID: <324960494b49bb2c0f2679add8452bf9@bbsrc.ac.uk>

Does anyone know what's up with the emboss.open-bio.org FTP server?  I 
can connect, but never get as far as a login prompt.

Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From dag at sonsorol.org  Thu Apr  6 03:07:33 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Wed, 5 Apr 2006 23:07:33 -0400
Subject: [EMBOSS] Download server problems?
In-Reply-To: <324960494b49bb2c0f2679add8452bf9@bbsrc.ac.uk>
References: <324960494b49bb2c0f2679add8452bf9@bbsrc.ac.uk>
Message-ID: <EB2D57E0-7C4A-40FD-B905-A83775F21552@sonsorol.org>


{forgot to CC the list on this reply ... }

Our fault (open-bio.org hosting) -- the server has some sort of  
running process with a memory leak we thought we had found. Turns out  
we didn't and the box ground itself slowly to a halt this evening.  
Thanks to the wonders of remote power control all it takes to reset  
and power cycle the system is an SSH connection.  We've got another  
4GB of memory on order for this system.

Regards.
Chris


On Apr 5, 2006, at 5:04 AM, simon andrews (BI) wrote:

> Does anyone know what's up with the emboss.open-bio.org FTP server?  I
> can connect, but never get as far as a login prompt.
>
> Simon.
> -- 
> Simon Andrews PhD
> Bioinformatics Dept.
> The Babraham Institute
>
> simon.andrews at bbsrc.ac.uk
> +44 (0) 1223 496463


From natalia.jimenez at pcm.uam.es  Thu Apr  6 07:56:06 2006
From: natalia.jimenez at pcm.uam.es (Natalia Jimenez Lozano)
Date: Thu, 06 Apr 2006 09:56:06 +0200
Subject: [EMBOSS] Problems with GenBank indexing
Message-ID: <4434C996.7050606@pcm.uam.es>

Hi everybody,

I was trying to retrieve fasta protein sequences from GenBank by id 
using seqret but it was not possible for every id. However, retrieval by 
GI is allowed.

Additionally, during the indexing process (dbifasta) I've obtained some 
errors like this one:

Warning: Duplicate ID skipped: 'AC000348_16' All hits will point to 
first ID found

I was looking for an explanation for this behaviour and found that the 
skipped IDs correspond to CDS from genomic sequences and have this format:

 >gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]
MELPDVPVWRRVIVSAFFEALTFNIDIEEERSEIMMKTGAVVSNPRSRVKWDAFLSFQRDTSHNFTDRLY...
 >gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana]
MSVVLQITKDWVQALLGFLLLSFANISTRTNHKHFPHGSCSSIMAGFWIYMYIYSYLFITLKIIDLTS...

For the entries above, retrieval by the first identifier (gi) works for 
both of them. When I retrieve by the last identifier (AC000348_16), I 
only get the first one. Retrieval by the second identifier (AAG13419.1 
and AAF79863.1) is impossible.

However, sequences with the following format are indexed correctly:

 >gi|64029|emb|CAA23986.1| reading frame [Lophius americanus]
MKMVSSSRLRCLLVLLLSLTASISCSFAGQRDSKLRLLLHRYPLQGSKQDMTRSALAELLLSDLLQGENE ...

and these sequences can be retrieved correctly by both the first and 
second identifiers (64029 and CAA23986.1).

Does anybody know how to solve these problems?
Thanks in advance,
Natalia
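
To make the difference between the two header styles explicit, the
pipe-separated fields can be listed side by side. A small sketch only:
ncbi_header_ids and the field labels are my own names, not anything dbifasta
uses internally.

```shell
# Print the pipe-separated identifiers of an NCBI-style FASTA header.
# $1 = header line such as ">gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 ..."
ncbi_header_ids() {
    printf '%s\n' "$1" | awk '{
        sub(/^[ \t]*>/, "")          # drop leading blanks and the ">"
        split($1, f, "|")            # first word holds all identifiers
        printf "gi=%s\ndb=%s\naccver=%s\nlocus=%s\n", f[2], f[3], f[4], f[5]
    }'
}

ncbi_header_ids '>gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]'
ncbi_header_ids '>gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana]'
ncbi_header_ids '>gi|64029|emb|CAA23986.1| reading frame [Lophius americanus]'
```

The two Arabidopsis entries share the same fifth field (AC000348_16), which
is the ID named in the "Duplicate ID skipped" warning, while the Lophius
entry leaves that field empty.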


From jison at ebi.ac.uk  Fri Apr  7 12:02:50 2006
From: jison at ebi.ac.uk (Jon Ison)
Date: Fri, 7 Apr 2006 13:02:50 +0100 (BST)
Subject: [EMBOSS] Problems with GenBank indexing
In-Reply-To: <4434C996.7050606@pcm.uam.es>
References: <4434C996.7050606@pcm.uam.es>
Message-ID: <59463.172.31.100.168.1144411370.squirrel@webmail.ebi.ac.uk>


Dear Natalia

By default, dbifasta will index the ID name and the accession number (if present).

To index the Sequence Version, GI number and words in the description, you must
run dbifasta with the '-fields' qualifier, e.g. "-fields acc", "-fields sv acc"
etc.   If you don't, you will not be able to retrieve by those fields. Please
see http://emboss.sourceforge.net/apps/cvs/dbifasta.html.

dbifasta only retrieves the first of any duplicate entries.  So far as I'm aware
dbxfasta can retrieve duplicate entries.

Does that help?  Feel free to get back in touch.

Cheers

Jon


> Hi everybody,
>
> I was trying to retrieve fasta protein sequences from GenBank by id
> using seqret but it was not possible for every id. However, retrieval by
> GI is allowed.
>
> Additionally, during the indexing process (dbifasta) I've obtained some
> errors like this one:
>
> Warning: Duplicate ID skipped: 'AC000348_16' All hits will point to
> first ID found
>
> I was looking for an explanation to this behaviour and I've found that
> skipped IDs correspond to CDS from genomic sequences and have this format:
>
>  >gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]
> MELPDVPVWRRVIVSAFFEALTFNIDIEEERSEIMMKTGAVVSNPRSRVKWDAFLSFQRDTSHNFTDRLY...
>  >gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana]
> MSVVLQITKDWVQALLGFLLLSFANISTRTNHKHFPHGSCSSIMAGFWIYMYIYSYLFITLKIIDLTS...
>
> In the previous entries, when I try to retrieve one of them by the first
> identifier (gi), I can get both of them. When I try to do retrievals
> using the last identifier (AC000348_16), I only get the first one. But
> it's impossible to do retrievals by second identifier (AAG13419.1 and
> AAF79863.1).
>
> However, sequences with the following format can be well indexed:
>
>  >gi|64029|emb|CAA23986.1| reading frame [Lophius americanus]
> MKMVSSSRLRCLLVLLLSLTASISCSFAGQRDSKLRLLLHRYPLQGSKQDMTRSALAELLLSDLLQGENE ...
>
> and these sequences can be well retrieved by first and second
> identifiers (64029 and CAA23986.1).
>
> Does anybody know how to solve these problems?
> Thanks in advance,
> Natalia


From natalia.jimenez at pcm.uam.es  Fri Apr  7 12:50:16 2006
From: natalia.jimenez at pcm.uam.es (Natalia Jimenez Lozano)
Date: Fri, 07 Apr 2006 14:50:16 +0200
Subject: [EMBOSS] Problems with GenBank indexing
In-Reply-To: <59463.172.31.100.168.1144411370.squirrel@webmail.ebi.ac.uk>
References: <4434C996.7050606@pcm.uam.es> 
	<59463.172.31.100.168.1144411370.squirrel@webmail.ebi.ac.uk>
Message-ID: <44366008.6080106@pcm.uam.es>

Dear Jon,

> Dear Natalia
>
> By default, dbifasta will index the ID name and the accession number (if present).
>
> To index the Sequence Version, GI number and words in the description, you must
> run dbifasta with the '-fields' qualifier, e.g. "-fields acc", "-fields sv acc"
> etc.   If you don't, you will not be able to retrieve by those fields. Please
> see http://emboss.sourceforge.net/apps/cvs/dbifasta.html.
>
Yes, the indexing was done using the -fields parameter :-(

> dbifasta only retrieves the first of any duplicate entries.  So far as I'm aware
> dbxfasta can retrieve duplicate entries.

We'll try dbxfasta!

> Does that help?  Feel free to get back in touch.

Yes, a lot.
Thank you very much.
Regards,
Natalia
> Cheers
>
> Jon


From jison at ebi.ac.uk  Fri Apr  7 15:34:24 2006
From: jison at ebi.ac.uk (Jon Ison)
Date: Fri, 7 Apr 2006 16:34:24 +0100 (BST)
Subject: [EMBOSS] Problem indexing PDB fasta file
In-Reply-To: <442BFD56.9010908@pcm.uam.es>
References: <442BFD56.9010908@pcm.uam.es>
Message-ID: <34100.172.31.100.168.1144424064.squirrel@webmail.ebi.ac.uk>

Hi Enrique

dbifasta will return just the first entry with a duplicated id.
The new dbxfasta will return all entries with the duplicated id.

dbifasta is indeed case-insensitive.   To make it case-sensitive,
you could change the 3 instances of "ajStrMatchCaseC" in dbifasta.c
to "ajStrMatchC", recompile and try again.  I don't think we'd want
to make that change in the distribution though.
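
Before re-indexing it can help to see which IDs will collide. The following is a minimal Python sketch, not part of EMBOSS, that groups FASTA IDs differing only by letter case. It assumes the ID is the first whitespace-delimited token after '>'; the sample records and the function name are made up for illustration.

```python
from collections import defaultdict
import io

def case_colliding_ids(fasta_lines):
    """Group FASTA IDs that differ only by letter case."""
    groups = defaultdict(list)
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the ID is the first token after '>'.
            fields = line[1:].split()
            if fields:
                groups[fields[0].upper()].append(fields[0])
    # Keep only keys for which more than one distinct spelling occurs.
    return {k: v for k, v in groups.items() if len(set(v)) > 1}

# Made-up sample records; a real run would pass an open file instead.
sample = io.StringIO(
    ">1FNT_A first chain\nMKV\n>1FNT_a second chain\nMKL\n>2ABC_B\nMAA\n"
)
collisions = case_colliding_ids(sample)
print(collisions)
```

Any ID reported here would trigger the "Duplicate ID skipped" warning under a case-insensitive index.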

Hope that helps.

Cheers

Jon


> Hello,
>
> I'm trying to index the fasta file of the PDB database with the dbifasta
> command and I get a lot of warnings such as:
>
> Warning: Duplicate ID skipped: '1FNT_A' All hits will point to first ID
> found
>
> I have been looking at the PDB fasta file and I see that, for the previous
> warning, there is an entry whose id is '1FNT_A' and another one whose
> id is '1FNT_a'. This makes me think that EMBOSS is
> case-insensitive. Is this true? Is there any way to distinguish between
> the two ids?
>
> Thanks in advance,
>
> Enrique.


From pmr at ebi.ac.uk  Mon Apr 10 09:12:00 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 10:12:00 +0100
Subject: [EMBOSS] Problem indexing PDB fasta file
In-Reply-To: <442BFD56.9010908@pcm.uam.es>
References: <442BFD56.9010908@pcm.uam.es>
Message-ID: <443A2160.8090102@ebi.ac.uk>

Enrique de Andres Saiz wrote:
> I have been looking at the PDB fasta file and I see that, for the previous
> warning, there is an entry whose id is '1FNT_A' and another one whose
> id is '1FNT_a'. This makes me think that EMBOSS is
> case-insensitive. Is this true? Is there any way to distinguish between
> the two ids?

Yes, EMBOSS is case-insensitive. So is the Staden/EMBLCD indexing standard 
that dbifasta uses.

The standard also only allows one entry with each ID.

dbxfasta uses a new indexing format and can index both entries, but will still 
assume the names are the same (a search for 1FNT_A or 1FNT_a will return both 
entries). Allowing indexing to be case-sensitive is possible in future, but 
can slow down searches. We will investigate.

Hope that helps,

Peter


From pmr at ebi.ac.uk  Mon Apr 10 09:05:36 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 10:05:36 +0100
Subject: [EMBOSS] dbifasta index file format
In-Reply-To: <20060330083142.4237.qmail@web26207.mail.ukl.yahoo.com>
References: <20060330083142.4237.qmail@web26207.mail.ukl.yahoo.com>
Message-ID: <443A1FE0.1060707@ebi.ac.uk>

Graziano P. wrote:
> hello EMBOSS users,
> I have some databases in fasta format (ncbi | format)
> and I want to index them using dbifasta, then I want
> to access the index files using a program that will be
> developed by a computer scientist of my group.
> I need to index the databases by accession number,
> ginumber and description. I have read in the dbifasta
> help info about the structure of the index files when
> the databases were indexed by accession number, but I
> have not found info about the structure of the index
> files when the databases are indexed by description.
> Anyone knows where I can find detailed information
> about the structure of the index files?

Ciao Graziano,

The dbifasta index files use the same format as the Staden package, the old 
EMBL CD-ROM distribution, and Erik Sonnhammer's "efetch" utility.

They were documented in some old Staden documentation and papers.

They are also documented in the EMBOSS distribution under doc/manuals/ in file 
internals-indexing.txt (see attached). I see that this document was written 
before we indexed the descriptions!!!

The description (title) indexing is the same as the accession number indexing. 
The files are called des.hit and des.trg. dbifasta has a -maxindex option to 
limit the size of the longest words indexed (the index files have a value for 
the maximum record length).

We also have a script in the distribution scripts/dbilist.pl which can list 
the contents of the description index (in the database index directory, run it 
as dbilist.pl des)

The new dbxfasta index files are very different. For very large databases we 
recommend dbxfasta. For smaller databases dbifasta is fine and we will 
continue to support it.

Hope that helps. If you need more details, just ask.

regards,

Peter


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: internals-indexing.txt
URL: <http://lists.open-bio.org/pipermail/emboss/attachments/20060410/be632ef4/attachment-0001.txt>

From simon.andrews at bbsrc.ac.uk  Mon Apr 10 09:40:30 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Mon, 10 Apr 2006 10:40:30 +0100
Subject: [EMBOSS] Problem indexing PDB fasta file
In-Reply-To: <443A2160.8090102@ebi.ac.uk>
References: <442BFD56.9010908@pcm.uam.es> <443A2160.8090102@ebi.ac.uk>
Message-ID: <69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk>


On 10 Apr 2006, at 10:12, Peter Rice wrote:

> Enrique de Andres Saiz wrote:
>> I have been looking at the PDB fasta file and I see that, for the
>> previous warning, there is an entry whose id is '1FNT_A' and another
>> one whose id is '1FNT_a'. This makes me think that EMBOSS is
>> case-insensitive. Is this true? Is there any way to distinguish
>> between the two ids?
>
> Yes, EMBOSS is case-insensitive. So is the Staden/EMBLCD indexing 
> standard
> that dbifasta uses.
>
> The standard also only allows one entry with each ID.

If anyone's interested I've got a small perl script which reformats the 
PDB database into a more sensible format and sorts out the problems 
with case sensitive ids and a number of other odd conventions used in 
PDB.

I'm happy to supply a copy to anyone who wants it.

TTFN

Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From pmr at ebi.ac.uk  Mon Apr 10 10:44:47 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 11:44:47 +0100
Subject: [EMBOSS] Problems with GenBank indexing
In-Reply-To: <4434C996.7050606@pcm.uam.es>
References: <4434C996.7050606@pcm.uam.es>
Message-ID: <443A371F.1010100@ebi.ac.uk>

Natalia Jimenez Lozano wrote:

> I was looking for an explanation to this behaviour and I've found that 
> skipped IDs correspond to CDS from genomic sequences and have this format:
> 
>  >gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]
> MELPDVPVWRRVIVSAFFEALTFNIDIEEERSEIMMKTGAVVSNPRSRVKWDAFLSFQRDTSHNFTDRLY...
>  >gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana]
> MSVVLQITKDWVQALLGFLLLSFANISTRTNHKHFPHGSCSSIMAGFWIYMYIYSYLFITLKIIDLTS...

As Jon says, dbxfasta is a solution.

However, that is only a partial solution. The real problem is that these FASTA 
format sequences do indeed have duplicate IDs.

This is protein sequence data, so it is not GenBank - was this GenPept or some 
other database?

GenPept and other databases have been known to report "gb" or "emb" as the 
database for protein sequences!!!

A possible solution is to add a new ID format to dbifasta and dbxfasta that 
uses AAG13419 and AAF79863 as the ID and ignores the AC000348_16 part.
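
To make the idea concrete, here is a rough Python sketch of how such a header could be split so that the accession, rather than the shared locus name, serves as the ID. It is only an illustration of the proposal, not the dbifasta or dbxfasta parser; the function name and the dictionary keys are assumptions chosen for this example.

```python
def parse_ncbi_header(line):
    """Split a '>gi|N|db|ACC.VER|LOCUS description' line into named fields."""
    token, _, description = line.strip().lstrip(">").partition(" ")
    parts = token.split("|")
    if len(parts) < 4 or parts[0] != "gi":
        raise ValueError("not a gi|N|db|ACC style header: " + line)
    return {
        "gi": parts[1],
        "db": parts[2],
        "acc": parts[3].split(".")[0],   # accession without the version
        "sv": parts[3],                  # accession.version
        "locus": parts[4] if len(parts) > 4 else "",
        "des": description.strip(),
    }

# The two headers quoted above share a locus but have distinct accessions.
rec = parse_ncbi_header(
    ">gi|10121909|gb|AAG13419.1|AC000348_16 T7N9.24 [Arabidopsis thaliana]")
other = parse_ncbi_header(
    ">gi|8778864|gb|AAF79863.1|AC000348_16 T7N9.28 [Arabidopsis thaliana]")
print(rec["acc"], other["acc"], rec["locus"] == other["locus"])
```

Keyed on the accession the two entries no longer clash, while keyed on the locus field they do.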

Hope this helps,

Peter


From pmr at ebi.ac.uk  Mon Apr 10 11:04:49 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 10 Apr 2006 12:04:49 +0100
Subject: [EMBOSS] Fwd: EMBOSS for Windows without Cygwin
In-Reply-To: <a0bf33d50603312012yd77e73ex9e5f88b3acc10e97@mail.gmail.com>
References: <442CCD71.60202@gmail.com>
	<a0bf33d50603312012yd77e73ex9e5f88b3acc10e97@mail.gmail.com>
Message-ID: <443A3BD1.2040709@ebi.ac.uk>

Duleep Samuel wrote:

> Is the latest EMBOSS version 3.0.0.0 available anywhere as a precompiled
> binary for Windows  XP,  I have tried  compiling  using cygwin and it
> crashed, I loaded EMBOSS for windows which is a port of version 2.10.0,
> loaded Staden Package and made Spin aware of EMBOSS and am working, but
> feel bad that I am _One_ whole release behind. If anyone has a compiled
> binary I can download it for testing and report back on usability,
> regards, Samuel, Virologist, India

Staden has support for older versions of EMBOSS. We are trying to update 
Staden to work with EMBOSS 3.0.0 and future releases.

If anyone is using EMBOSS and Staden (especially EMBOSS under the Staden SPIN 
interface) please contact the EMBOSS developers 
(emboss-bug at emboss.open-bio.org) so we know how many EMBOSS SPIN users there 
are. It helps to set priorities for the work.

regards,

Peter


From janenerz at web.de  Wed Apr 12 09:09:58 2006
From: janenerz at web.de (Christiane Nerz)
Date: Wed, 12 Apr 2006 11:09:58 +0200
Subject: [EMBOSS] nt-multi-fastA-file
Message-ID: <443CC3E6.4040108@web.de>

Hi all,

I put the gb-file of a whole genome in Artemis.
Is there a possibility to export a multi-FastA-file with the bases of 
all ORFs? Example:

 >ORF_1
ATGTGTTCGTT....
 >ORF_2
ATGTTCCCGACCA...
 >ORF_3
ATGCCGCAT...

I know how to get all bases, but only as one complete sequence.
(That genome is not published yet, so there is no multi-Fasta-file at 
ncbi or EMBL available)

Thanks for help!

Jane Nerz


From simon.andrews at bbsrc.ac.uk  Wed Apr 12 10:05:49 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Wed, 12 Apr 2006 11:05:49 +0100
Subject: [EMBOSS] nt-multi-fastA-file
In-Reply-To: <443CC3E6.4040108@web.de>
References: <443CC3E6.4040108@web.de>
Message-ID: <902608901e58c68600b4dc52c7e8a966@bbsrc.ac.uk>


On 12 Apr 2006, at 10:09, Christiane Nerz wrote:

> Hi all,
>
> I put the gb-file of an whole genome in Artemis.
> Is there a possibility to export a multi-FastA-file with the bases of
> all ORFs?

If you can save the file out of Artemis with the ORFs shown in the 
feature table then you can use coderet in EMBOSS to extract out all of 
the subsequences covering those features, either as protein or DNA.

Hope this helps

Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From pmr at ebi.ac.uk  Wed Apr 12 10:20:46 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 12 Apr 2006 11:20:46 +0100
Subject: [EMBOSS] nt-multi-fastA-file
In-Reply-To: <443CC3E6.4040108@web.de>
References: <443CC3E6.4040108@web.de>
Message-ID: <443CD47E.6060607@ebi.ac.uk>

Christiane Nerz wrote:
> Hi all,
> 
> I put the gb-file of an whole genome in Artemis.
> Is there a possibility to export a multi-FastA-file with the bases of 
> all ORFs? Example:
> 
>  >ORF_1
> ATGTGTTCGTT....
>  >ORF_2
> ATGTTCCCGACCA...
>  >ORF_3
> ATGCCGCAT...
> 
> I know how to get all bases, but only as one complete sequence.
> (That genome is not published yet, so there is no multi-Fasta-file at 
> ncbi or EMBL available)

Yes, the coderet program will do this.

Unfortunately coderet tries to return CDS, mRNA and translations all in 
one file (to be fixed for the next release). You can ask just for the 
CDS with a couple of extra command line options:

coderet -nomrna -notranslation

Give it the filename as input.
The output will be the coding sequences.

With -nocds instead of -notranslation you will get the protein sequences.

If you have any problems parsing the GenBank file let me know.

regards,

Peter Rice


From Marc.Logghe at DEVGEN.com  Wed Apr 12 12:39:00 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 12 Apr 2006 14:39:00 +0200
Subject: [EMBOSS] Embossdata -reject option
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>

Hi,
I am intrigued by the -reject option of embossdata.
According to the doc:
"This specifies the names of the sub-directories of the EMBOSS data
directory that should be ignored when displaying data directories.
Choose from selection list of values 	3, 5, 6".

I was not able to find out what this list of values corresponds to. I
hoped to get a list to select from when embossdata was run with the
-options parameter, but this did not happen.
Any clues ?
Actually I was trying to find a way to obtain more or less the opposite
of '-reject', e.g. what if you only want the content of the CODONS
directory ?

Regards,
Marc


From gbottu at ben.vub.ac.be  Wed Apr 12 13:30:00 2006
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 12 Apr 2006 15:30:00 +0200
Subject: [EMBOSS] Embossdata -reject option
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
Message-ID: <20060412133000.GD15725@bigben.ulb.ac.be>

On Wed, Apr 12, 2006 at 02:39:00PM +0200, Marc Logghe wrote:
> I am intrigued by the -reject option of embossdata.
> According to the doc:
> "This specifies the names of the sub-directories of the EMBOSS data
> directory that should be ignored when displaying data directories.
> Choose from selection list of values 	3, 5, 6".
> I was not able to find out what this list of values corresponds to.

Indeed tricky to find out what this means   :-;
You can look in the file  .../share/EMBOSS/acd/embossdata.acd :

  selection: reject  [
    default: "3, 5, 6"
    minimum: "1"
    maximum: "6"
    values: "None, AAINDEX, CVS, CODONS, PRINTS, PROSITE, REBASE"
    delimiter: ","
    header: "Directories to ignore"
    information: "Select directories"
    help: "This specifies the names of the sub-directories of the
           EMBOSS data directory that should be ignored when displaying data
           directories."
    button: "Y"
  ]

So, by default CVS, PRINTS and PROSITE are rejected.
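
The numbering can be checked with a few lines of Python. This sketch simply treats the numbers as 1-based positions in the "values" string, which is the reading that gives CVS, PRINTS and PROSITE for the default; the resolve helper is made up for illustration and is not an EMBOSS function.

```python
# Strings copied from the embossdata.acd excerpt above.
values = "None, AAINDEX, CVS, CODONS, PRINTS, PROSITE, REBASE"
default = "3, 5, 6"

names = [v.strip() for v in values.split(",")]

def resolve(selection):
    """Translate 1-based menu numbers (or literal names) into directory names."""
    chosen = []
    for item in selection.split(","):
        item = item.strip()
        chosen.append(names[int(item) - 1] if item.isdigit() else item)
    return chosen

print(resolve(default))
print(resolve("2,3,5,6,7"))
```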

> I hoped to get a list to select from when embossdata was run with the
> -options parameter, but this did not happen.

That is because -reject is an "advanced", not an "optional"/"additional" 
parameter. It is indeed impossible to get a selection list displayed at 
the command line, although many GUI's like wEMBOSS will show it.

> Actually I was trying to find a way to obtain more or less the oposite
> of '-reject', e.g. what if you only want the content of the CODONS
> directory ?

This does not work, there is no way to reject the files in the base data 
directory. The best you can do is to add on the command line 
-reject=2,3,5,6,7 or -reject=AAINDEX,CVS,PRINTS,PROSITE,REBASE
What you can do however is :
ls $EMBOSS_DATA/CODONS

	Hope this helps,
	Guy Bottu,
	Belgian EMBnet Node


From Marc.Logghe at DEVGEN.com  Wed Apr 12 14:02:09 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 12 Apr 2006 16:02:09 +0200
Subject: [EMBOSS] Embossdata -reject option
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CD9@ANTARESIA.be.devgen.com>

Hi Guy !

> You can look in the file  .../share/EMBOSS/acd/embossdata.acd :
> 
>   selection: reject  [
>     default: "3, 5, 6"
>     minimum: "1"
>     maximum: "6"
>     values: "None, AAINDEX, CVS, CODONS, PRINTS, PROSITE, REBASE"
>     delimiter: ","
>     header: "Directories to ignore"
>     information: "Select directories"
>     help: "This specifies the names of the sub-directories of the
>            EMBOSS data directory that should be ignored when 
> displaying data
>            directories."
>     button: "Y"
>   ]
> 
> So, by default CVS, PRINTS and PROSITE are rejected.
 
Yes, that makes sense now !

> This does not work, there is no way to reject the files in 
> the base data directory. The best you can do is to add on the 
> command line
> -reject=2,3,5,6,7 or -reject= 
> AAINDEX,CVS,PRINTS,PROSITE,REBASE What you can do however is :
> ls $EMBOSS_DATA/CODONS

Yeah, that is of course the most obvious ;-) Thing is that I wanted to
do it in an emboss-only way so that it would be possible to run the
emboss command via a soaplab service. The latter should provide a means
to dynamically fetch a list of codon usage tables. More or less like
showdb is doing.
> 
> 	Hope this helps,
Yes it did. Thanks !
Regards,
Marc 


From pmr at ebi.ac.uk  Wed Apr 12 16:04:21 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 12 Apr 2006 17:04:21 +0100 (BST)
Subject: [EMBOSS] Embossdata -reject option
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA6746CD7@ANTARESIA.be.devgen.com>
Message-ID: <3057.86.137.128.238.1144857861.squirrel@webmail.ebi.ac.uk>

Marc Logghe wrote:

> I am intrigued by the -reject option of embossdata.
>
> I was not able to find out what this list of values corresponds to. I
> hoped to get a list to select from when embossdata was run with the
> -options parameter, but this did not happen.
> Any clues ?

Hmmmm .... yes, -help and the acdtable output (the table in the webpage
application documentation) really need to report the list of menu items
for values that are not prompted (list and selection datatypes).

We will do that for the next release!

Otherwise, you do need to look in the ACD file.

I propose:

-help to report documentation on the options
-help -verbose to report the list of options

acdtable to report the full menu formatted in the "Allowed values" box.

When this is implemented, it will appear in the apps/cvs/embossdata.html
documentation at emboss.sf.net :-)

>Yeah, that is of course the most obvious ;-) Thing is that I wanted to
>do it in an emboss-only way so that it would be possible to run the
>emboss command via a soaplab service. The latter should provide a means
>to dynamically fetch a list of codon usage tables. More or less like
>showdb is doing.

We are looking at ways to do that ... can be tricky if cutgextract has
been run. Any suggestions? A showdata application perhaps?

Hope that helps,

Peter


From Marc.Logghe at DEVGEN.com  Wed Apr 12 16:21:17 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 12 Apr 2006 18:21:17 +0200
Subject: [EMBOSS] Embossdata -reject option
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CDB@ANTARESIA.be.devgen.com>

Hi Peter,

> Hmmmm .... yes, -help and the acdtable output (the table in 
> the webpage application documentation) really need to report 
> the list of menu items for values that are not prompted (list 
> and selection datatypes).
> 
> We will do that for the next release!
> 
> Otherwise, you do need to look in the ACD file.
> 
> I propose:
> 
> -help to report documentation on the options -help -verbose 
> to report the list of options
> 
> acdtable to report the full menu formatted in the "Allowed 
> values" box.

OK, great !

> We are looking at ways to do that ... can be tricky if 
> cutgextract has been run. Any suggestions? A showdata 
> application perhaps?

Yes that could be a start. You could give the directory name as a
parameter, the opposite of the -reject parameter (-include ?).
In its basic form it can just list the file content like embossdata
-showall is doing.
An example command that lists all the codon tables could be: 'showdata
-include CODONS'.

Something else. In order not to contaminate the CODONS folder I created
a CUTG folder in the <emboss_data> directory containing the codon tables
extracted from the most recent CUTG. Problem now is a user has to add
the relative filename as a cfile option (backtranseq) in order for EMBOSS to
find the new codon tables. Would it be an idea that you can set
$EMBOSS_DATA to a list of values instead of only 1 directory name ? In
that way, EMBOSS can access custom data directories.
Suppose the following:
EMBOSS_DATA=/usr/local/share/EMBOSS/data:/my/other/emboss_data_dir/CUTG

If a codon table is not found in the usual place
(/usr/local/share/EMBOSS/data/CODONS) EMBOSS will look for them in other
places defined in EMBOSS_DATA (/my/other/emboss_data_dir/CUTG). Or
something alike.
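
The lookup I have in mind could be sketched in Python as below. This is purely hypothetical behaviour and not how EMBOSS resolves data files today; the function name, the fallback order and the file name Eexample.cut are assumptions made up for the example.

```python
import os
import tempfile

def find_data_file(name, subdir, emboss_data):
    """Return the first match for `name` along a list of data directories."""
    for base in emboss_data.split(os.pathsep):
        if not base:
            continue
        # Try the conventional sub-directory first, then the directory itself.
        for candidate in (os.path.join(base, subdir, name),
                          os.path.join(base, name)):
            if os.path.isfile(candidate):
                return candidate
    return None

# Build a throwaway layout: a standard data dir plus a custom CUTG dir.
with tempfile.TemporaryDirectory() as root:
    standard = os.path.join(root, "data")
    custom = os.path.join(root, "CUTG")
    os.makedirs(os.path.join(standard, "CODONS"))
    os.makedirs(custom)
    table = os.path.join(custom, "Eexample.cut")   # hypothetical table name
    with open(table, "w"):
        pass
    search_path = os.pathsep.join([standard, custom])
    found = find_data_file("Eexample.cut", "CODONS", search_path)
    missing = find_data_file("Enothere.cut", "CODONS", search_path)

print(found == table, missing)
```

The separator is taken from os.pathsep, so on Unix the list looks like the colon-separated EMBOSS_DATA value above.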

Does that make sense ?
Cheers,
Marc


From simon.andrews at bbsrc.ac.uk  Thu Apr 13 08:43:53 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 13 Apr 2006 09:43:53 +0100
Subject: [EMBOSS] Problems indexing with dbxfasta
Message-ID: <F02984326C1F2C428930E8A24561D268012D0487@bie2ksrv1.babraham.bbsrc.ac.uk>

I'm trying to use dbxfasta to index one of the WGS trace databases.
Unfortunately dbxfasta is falling over on me.  The session looks like
this:

$ dbxfasta
Database b+tree indexing for fasta file databases
Basename for index files: traces_oanatinus
Resource name: all
    simple : >ID
     idacc : >ID ACC
     gcgid : >db:ID
  gcgidacc : >db:ID ACC
      dbid : >db ID
      ncbi : | formats
ID line format [idacc]: simple
Database directory [.]:
Wildcard database filename [*.dat]: *.fasta
Release number [0.0]:
Index date [00/00/00]:
Processing file ./nisc-platypus-shotgun-1048960391.fasta
Processing file ./nisc-platypus-shotgun-1071756042.fasta
Processing file ./nisc-platypus-shotgun-1080815515.fasta
Processing file ./nisc-platypus-shotgun-1102160893.fasta
Processing file ./nisc-platypus-shotgun-1104879084.fasta
Processing file ./nisc-platypus-shotgun-1109000445.fasta
Processing file ./nisc-platypus-shotgun-1110804272.fasta
Processing file ./nisc-platypus-shotgun-1116844699.fasta
Processing file ./nisc-platypus-shotgun-1142973027.fasta
Processing file
./wugsc-ornithorhynchus_anatinus-cloneEnd-1113828608.fasta
Processing file
./wugsc-ornithorhynchus_anatinus-cloneEnd-1115655383.fasta
Processing file
./wugsc-ornithorhynchus_anatinus-cloneEnd-1119433885.fasta

   EMBOSS An error in ajindex.c at line 615:
Maximum retries (100) reached in btreeCacheFetch for page 14240710656

The same files have indexed OK with formatdb.  I haven't tried with
dbifasta as I'm trying to move everything over to the new dbx system
(and the rest of our databases have processed OK with dbx(fasta|flat)).

Anyone have any ideas about how to debug this?

Cheers

Simon.

-- 
Simon Andrews PhD
Bioinformatics Group
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463 


From Marc.Logghe at DEVGEN.com  Thu Apr 13 09:00:56 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Thu, 13 Apr 2006 11:00:56 +0200
Subject: [EMBOSS] Problems indexing with dbxfasta
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746CE0@ANTARESIA.be.devgen.com>

Hi Simon,
> The same files have indexed OK with formatdb.  I havent' 
> tried with dbifasta as I'm trying to move everything over to 
> the new dbx system (and the rest of our databases have 
> processed OK with dbx(fasta|flat)).
> 
> Anyone have any ideas about how to debug this?
You can run the command with the -debug option (any EMBOSS application
accepts this option). In that case a dbxfasta.dbg file will be created.
Hope this file will give you the clues.
Cheers,
Marc


From ajb at ebi.ac.uk  Thu Apr 13 09:19:44 2006
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Thu, 13 Apr 2006 10:19:44 +0100 (BST)
Subject: [EMBOSS] Problems indexing with dbxfasta
In-Reply-To: <F02984326C1F2C428930E8A24561D268012D0487@bie2ksrv1.babraham.bbsrc.ac.uk>
References: <F02984326C1F2C428930E8A24561D268012D0487@bie2ksrv1.babraham.bbsrc.ac.uk>
Message-ID: <52959.81.98.244.247.1144919984.squirrel@webmail.ebi.ac.uk>

Hello Simon,

Did you pick up the latest set of patches from:
  ftp://emboss.open-bio.org/pub/EMBOSS/fixes/
?

The indexing system was rewritten a few months ago to fix this. See
the README in that directory.
If you are using the latest fixes (check file sizes) and it is still
failing then let me know.


HTH

Alan



From simon.andrews at bbsrc.ac.uk  Thu Apr 13 09:30:41 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 13 Apr 2006 10:30:41 +0100
Subject: [EMBOSS] Problems indexing with dbxfasta
In-Reply-To: <52959.81.98.244.247.1144919984.squirrel@webmail.ebi.ac.uk>
References: <F02984326C1F2C428930E8A24561D268012D0487@bie2ksrv1.babraham.bbsrc.ac.uk>
	<52959.81.98.244.247.1144919984.squirrel@webmail.ebi.ac.uk>
Message-ID: <25bf10458f1cd7e0cc1c64de70f6bdef@bbsrc.ac.uk>


On 13 Apr 2006, at 10:19, ajb at ebi.ac.uk wrote:

> Hello Simon,
>
> Did you pick up the latest set of patches from:
>   ftp://emboss.open-bio.org/pub/EMBOSS/fixes/

Yes.  All patched with the latest fixes as of last week.

> If you are using the latest fixes (check file sizes) and it is still
> failing then let me know.

It is still failing.  I'll have a go at generating a .dbg file if you 
think it'll help, but given how verbose those tend to be, and how long 
it takes to fail I was a bit concerned at the size of file it was 
likely to generate.

Simon.

-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From simon.andrews at bbsrc.ac.uk  Thu Apr 13 09:41:13 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 13 Apr 2006 10:41:13 +0100
Subject: [EMBOSS] Problems indexing with dbxfasta
Message-ID: <F02984326C1F2C428930E8A24561D268012D0488@bie2ksrv1.babraham.bbsrc.ac.uk>

I managed to get hold of a debug file from the failing dbxfasta.  The
edited highlights are:

Debug file dbxfasta.dbg buffered:No
ajFileNewIn '/usr/local/share/EMBOSS/acd/dbxfasta.acd'
EOF ajFileGetsL file /usr/local/share/EMBOSS/acd/dbxfasta.acd
closing file '/usr/local/share/EMBOSS/acd/dbxfasta.acd'
ajUserGet buffer len: 0 res: 2048 ptr: 8d8eb18
ajUserGet buffer len: 0 res: 2048 ptr: 8d8eb18
ajUserGet buffer len: 5 res: 2048 ptr: 8d8f320
ajUserGet buffer len: 1 res: 2048 ptr: 8d8fb28
ajUserGet buffer len: 5 res: 2048 ptr: 8d8eb18
ajUserGet buffer len: 3 res: 2048 ptr: 8d8eb18
ajUserGet buffer len: 8 res: 2048 ptr: 8d8eb18
ajFileScan directory: './'
  nisc-platypus-shotgun-1071756042.fasta
  nisc-platypus-shotgun-1080815515.fasta
  nisc-platypus-shotgun-1102160893.fasta


[snip big list of files]

closing file './/traces_oanatinus.ent'
ajFileNewIn './nisc-platypus-shotgun-1048960391.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1048960391.fasta
closing file './nisc-platypus-shotgun-1048960391.fasta'
ajFileNewIn './nisc-platypus-shotgun-1071756042.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1071756042.fasta
closing file './nisc-platypus-shotgun-1071756042.fasta'
ajFileNewIn './nisc-platypus-shotgun-1080815515.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1080815515.fasta
closing file './nisc-platypus-shotgun-1080815515.fasta'
ajFileNewIn './nisc-platypus-shotgun-1102160893.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1102160893.fasta
closing file './nisc-platypus-shotgun-1102160893.fasta'
ajFileNewIn './nisc-platypus-shotgun-1104879084.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1104879084.fasta
closing file './nisc-platypus-shotgun-1104879084.fasta'
ajFileNewIn './nisc-platypus-shotgun-1109000445.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1109000445.fasta
closing file './nisc-platypus-shotgun-1109000445.fasta'
ajFileNewIn './nisc-platypus-shotgun-1110804272.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1110804272.fasta
closing file './nisc-platypus-shotgun-1110804272.fasta'
ajFileNewIn './nisc-platypus-shotgun-1116844699.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1116844699.fasta
closing file './nisc-platypus-shotgun-1116844699.fasta'
ajFileNewIn './nisc-platypus-shotgun-1142973027.fasta'
EOF ajFileGetsL file ./nisc-platypus-shotgun-1142973027.fasta
closing file './nisc-platypus-shotgun-1142973027.fasta'
ajFileNewIn './wugsc-ornithorhynchus_anatinus-cloneEnd-1113828608.fasta'
WriteBucket: Overflow
WriteBucket: Overflow
ReadBucket: Overflow
ReadBucket: Overflow
ReadBucket: Overflow
ReadBucket: Overflow
WriteBucket: Overflow

[Loads more of these]

GetKeys: Overflow
ReadBucket: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow
WriteNode: Overflow
WriteBucket: Overflow
WriteBucket: Overflow

[Loads of these]

WriteNode: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow
WriteNode: Overflow
GetKeys: Overflow

[Killed at this point as the .dbg file getting enormous] 


From ajb at ebi.ac.uk  Thu Apr 13 10:22:49 2006
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Thu, 13 Apr 2006 11:22:49 +0100 (BST)
Subject: [EMBOSS] Problems indexing with dbxfasta
In-Reply-To: <F02984326C1F2C428930E8A24561D268012D0488@bie2ksrv1.babraham.bbsrc.ac.uk>
References: <F02984326C1F2C428930E8A24561D268012D0488@bie2ksrv1.babraham.bbsrc.ac.uk>
Message-ID: <36857.81.98.244.247.1144923769.squirrel@webmail.ebi.ac.uk>

Hi Simon,

The overflow code isn't fully implemented yet and it shouldn't need
to use it if your resource definition is OK. You'll get
overflows if the length values are too short for the
ID/ACC/SV/etc. Take a look and get back to me off-list
if adjusting any appropriate length resource definitions
doesn't help.

HTH

Alan




From simon.andrews at bbsrc.ac.uk  Thu Apr 13 13:36:11 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Thu, 13 Apr 2006 14:36:11 +0100
Subject: [EMBOSS] Problems indexing with dbxfasta
In-Reply-To: <36857.81.98.244.247.1144923769.squirrel@webmail.ebi.ac.uk>
References: <F02984326C1F2C428930E8A24561D268012D0488@bie2ksrv1.babraham.bbsrc.ac.uk>
	<36857.81.98.244.247.1144923769.squirrel@webmail.ebi.ac.uk>
Message-ID: <b4b09803fdaa7ffb1e8f7d4d21a06a6d@bbsrc.ac.uk>

Alan,

I increased all of the values in the resource definition and did the 
index again and it all worked fine this time.  Looks like there must be 
some very long ids somewhere in this data.
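
A minimal sketch of how the longest IDs could be located before choosing
the length values in the resource definition. It assumes the ID is the
first whitespace-delimited token of each FASTA header line; dbxfasta may
parse the header differently depending on the ID format chosen, and the
function names here are hypothetical.

```python
import glob
import heapq


def fasta_ids(paths):
    """Yield (id_length, id, path) for every header line in the given files."""
    for path in paths:
        with open(path, "r", errors="replace") as handle:
            for line in handle:
                if line.startswith(">"):
                    fields = line[1:].split()
                    # Assumption: the first token after ">" is the ID.
                    if fields:
                        yield len(fields[0]), fields[0], path


def longest_fasta_ids(paths, top=5):
    """Return the `top` longest IDs without keeping every ID in memory."""
    return heapq.nlargest(top, fasta_ids(paths))


if __name__ == "__main__":
    for length, seq_id, path in longest_fasta_ids(glob.glob("*.fasta")):
        print(length, seq_id, path)
```

The largest length reported gives a lower bound for idlen; the other
length values would need a similar check against the fields they index.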

Thanks for the help

Simon.

On 13 Apr 2006, at 11:22, ajb at ebi.ac.uk wrote:

> Hi Simon,
>
> The overflow code isn't fully implemented yet and it shouldn't need
> to use it if your resource definition is OK. You'll get
> overflows if the length values are too short for the
> ID/ACC/SV/etc. Take a look and get back to me off-list
> if adjusting any appropriate length resource definitions
> doesn't help.
>
> HTH
>
> Alan
>
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From msarachu at biol.unlp.edu.ar  Mon Apr 17 20:55:47 2006
From: msarachu at biol.unlp.edu.ar (Martin Sarachu)
Date: Mon, 17 Apr 2006 17:55:47 -0300
Subject: [EMBOSS] wEMBOSS-1.6.0 & wrappers4EMBOSS-1.4.0 release
Message-ID: <444400D3.6080705@biol.unlp.edu.ar>

This is to announce the release of both wEMBOSS-1.6.0 and
wrappers4EMBOSS-1.4.0

Changes in wEMBOSS-1.6.0 include:
  - compatibility with new datatypes in EMBOSS-3.0.0
  - better conversion of ACD expressions to Perl to maintain
the same order of priority as in EMBOSS
  - increased speed by preprocessing EMBOSS datafiles

Changes in wrappers4EMBOSS include:
  - all programs that compute a gap penalty of type a*n+b now have
parameters -gappenalty and -gaplength instead of -gapopen and -gapextend
  - muscle version updated for MUSCLE-3.6
  - support for EMBOSS v2.9, 2.10 and 3. EMBOSS-2.8 is no longer supported
  - fastapid uses matrices coded in the software rather than read from files
  - indexsearch can also run with SRS 8 and it also fully runs on
command line

We are experiencing some difficulties at the wEMBOSS site so you can 
download both files at http://www.ar.embnet.org/downloads
Shortly you will be able to download both at http://www.wemboss.org as 
usual.
wEMBOSS includes wrappers4EMBOSS but if you want to use just
wrappers4EMBOSS on the command line just like any EMBOSS program you can
download it separately.


Regards,

the wEMBOSS & wrappers4EMBOSS dev team.

-- 
Martin Sarachu
msarachu at biol.unlp.edu.ar
AR.EMBnet
http://www.ar.embnet.org


From dalesan at lamar.colostate.edu  Tue Apr 18 23:53:03 2006
From: dalesan at lamar.colostate.edu (Dale Richardson)
Date: Tue, 18 Apr 2006 17:53:03 -0600
Subject: [EMBOSS] Compilation errors on Intel Mac in ajindex.c
Message-ID: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>

Hello All,

I am trying to install EMBOSS 3.0 on my MacBook Pro.  Interestingly,  
I have come across an error that I haven't been able to resolve via  
googling.

When running make, the following error is encountered:

ajindex.c: In function 'ajBtreeCacheNewC':
ajindex.c:200: error: storage size of 'buf' isn't known
ajindex.c: In function 'ajBtreeSecCacheNewC':
ajindex.c:8234: error: storage size of 'buf' isn't known
make[1]: *** [ajindex.lo] Error 1
make: *** [all-recursive] Error 1

Is there a way around this?  I've applied the fixes available from  
the fixes directory at ftp://emboss.open-bio.org/pub/EMBOSS/ and  
tried to reconfigure and recompile but to no avail.

Insights and suggestions would be much appreciated.

Thanks,

Dale Richardson
Colorado State University
dalesan at lamar.colostate.edu


From kvddrift at earthlink.net  Wed Apr 19 01:27:30 2006
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 18 Apr 2006 21:27:30 -0400
Subject: [EMBOSS] Compilation errors on Intel Mac in ajindex.c
In-Reply-To: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>
References: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>
Message-ID: <E5391EE0-46D3-400C-9A52-27C4585C4A98@earthlink.net>


On Apr 18, 2006, at 7:53 PM, Dale Richardson wrote:

> Insights and suggestions would be much appreciated.

You could try to install emboss using fink, which is reportedly  
working on an Intel Mac (not tested by myself though).

- Koen.


From kvddrift at earthlink.net  Wed Apr 19 01:30:43 2006
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 18 Apr 2006 21:30:43 -0400
Subject: [EMBOSS] Compilation errors on Intel Mac in ajindex.c
In-Reply-To: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>
References: <1D9B80E3-1F2B-4171-A346-BB63EF775585@lamar.colostate.edu>
Message-ID: <5195A9EE-A20F-4804-9AD5-FA08662D8912@earthlink.net>


On Apr 18, 2006, at 7:53 PM, Dale Richardson wrote:

> Is there a way around this?  I've applied the fixes available from
> the fixes directory at ftp://emboss.open-bio.org/pub/EMBOSS/ and
> tried to reconfigure and recompile but to no avail.
>
> Insights and suggestions would be much appreciated.


Just another thought: did you also replace the configure file with the
one from the fixes directory, and then rerun ./configure?

- Koen.


From olivier.friard at unito.it  Fri Apr 21 15:00:20 2006
From: olivier.friard at unito.it (Olivier Friard)
Date: Fri, 21 Apr 2006 17:00:20 +0200
Subject: [EMBOSS] index RefSeq for EMBOSS
Message-ID: <4448F384.7020900@unito.it>

Hi,

I tried to index the RefSeq database:

1) I downloaded all 
ftp://ftp.ncbi.nih.gov/refseq/release/complete/complete*.genomic.gbff.gz 
files (GB format)

2) gunzipped them

3) Added the rs_dna entry to my .embossrc file


DB rs_dna [
    type: "N"
    method: "emblcd"
    format: "GB"
    dir: "/home/users/friard/data/refseq_genomic/"
    file: "*.gbff"
    release: ""
    comment: "RefSeq Genomic  (upd)"
    indexdir: "/home/users/friard/data/refseq_genomic/"
]


4) used dbiflat with the following arguments (from the directory where
the files are stored)

dbiflat
Index a flat file database
Database name: rs_dna
       EMBL : EMBL
      SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
         GB : Genbank, DDBJ
     REFSEQ : Refseq
Entry format [SWISS]: REFSEQ
Database directory [.]:
Wildcard database filename [*.dat]: *.gbff
Release number [0.0]:
Index date [00/00/00]:

The indexes were created, but when I try to access a sequence (e.g.
seqret rs_rna:NC_000004) the result is not the correct sequence but
another one with the NC_000004 ID!


I also downloaded the files in FASTA format and tried to index them with 
the dbifasta command (format: ncbi) without success:

seqret rs_dna:nc_000004
Reads and writes (returns) sequences
Error: Unable to read sequence 'rs_dna:nc_000004'
Died: seqret terminated: Bad value for '-sequence' and no prompt


Has anyone indexed RefSeq successfully?
Thank you in advance


-- 

Olivier Friard
Laboratorio di Biologia Computazionale
Facoltà di Scienze MFN
Università di Torino
via Accademia Albertina 13, 10124 TORINO (Italy)

tel. +39 011 6704689


From simon.andrews at bbsrc.ac.uk  Fri Apr 21 15:35:29 2006
From: simon.andrews at bbsrc.ac.uk (simon andrews (BI))
Date: Fri, 21 Apr 2006 16:35:29 +0100
Subject: [EMBOSS] index RefSeq for EMBOSS
In-Reply-To: <4448F384.7020900@unito.it>
References: <4448F384.7020900@unito.it>
Message-ID: <ae34eb8837560b8610df39877f5ad928@bbsrc.ac.uk>


On 21 Apr 2006, at 16:00, Olivier Friard wrote:

> The indexes were created, but when I try to access a sequence (e.g.
> seqret rs_rna:NC_000004) the result is not the correct sequence but
> another one with the NC_000004 ID!

Is it just finding the wrong sequence or could you have duplicate 
entries in the data?  Use entret to see if the entry really has that 
ID.

We found that seqret returned incorrect sequences, or none at all, when
some of the individual sequence files were >2 GB in size.  In these
cases you can use the new dbx* indexing programs, which handle large
files properly.
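
A minimal sketch for checking whether any of the data files is over that
size before deciding between the dbi* and dbx* programs. It assumes the
limit is that of a signed 32-bit file offset, which the thread does not
state explicitly; the function name and the *.gbff pattern are only
illustrative.

```python
import glob
import os

# Assumption: the ">2 GB" limit is the largest signed 32-bit offset.
LIMIT = 2**31 - 1


def oversized(paths, limit=LIMIT):
    """Return (path, size_in_bytes) for every file larger than `limit`."""
    result = []
    for path in paths:
        size = os.path.getsize(path)
        if size > limit:
            result.append((path, size))
    return result


if __name__ == "__main__":
    for path, size in oversized(glob.glob("*.gbff")):
        print(f"{path}: {size} bytes")
```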

> Has anyone indexed RefSeq successfully?

Yes.  We use it here without problems, but indexed with dbxflat.

It gets indexed with:

dbxflat -dbresource all -auto -idformat refseq -dbname refseq_all -filenames \*.gbff

..and the emboss.default entry looks like:

DB refseq_all
  [
     type: N
     comment: "Refseq"
     method: emboss
     format: genbank
     dbalias: refseq_all
     directory: /data/public/DNA/Refseq/Current/all
     file: *.gbff
  ]

with the resource section being:

RES all [ type: Index
   idlen:  15
   acclen: 15
   svlen:  15
   keylen: 15
   deslen: 15
   orglen: 15
]


Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463


From isabelle.wells at roche.com  Fri Apr 21 15:43:27 2006
From: isabelle.wells at roche.com (Wells, Isabelle)
Date: Fri, 21 Apr 2006 17:43:27 +0200
Subject: [EMBOSS] index RefSeq for EMBOSS
Message-ID: <B247DF4AC17BBF40B119308717C33702029FA495@rkamsem1.emea.roche.com>

Hi,

Yes I also index refseq. I think the problem here is that dbiflat can only handle files which are less than 2GB. So try splitting the files first.
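
A minimal sketch of one way to split a flat file without cutting an
entry in half, assuming every entry ends with a line containing only
"//" (as in GenBank and EMBL flat files). The function name, the part
naming scheme and the soft limit are hypothetical choices, not something
the EMBOSS tools prescribe.

```python
import os


def split_flatfile(path, soft_limit=1_500_000_000):
    """Split `path` into numbered parts, breaking only after a "//" line.

    A part is closed at the first entry boundary at or beyond `soft_limit`
    bytes, so it can exceed the limit by at most one entry. The default is
    therefore kept well below 2 GB.
    """
    root, ext = os.path.splitext(path)
    part_paths = []
    out = None
    written = 0
    with open(path, "rb") as src:
        for line in src:
            if out is None:
                part_path = f"{root}.part{len(part_paths) + 1}{ext}"
                out = open(part_path, "wb")
                part_paths.append(part_path)
                written = 0
            out.write(line)
            written += len(line)
            # "//" on a line of its own terminates an entry.
            if line.rstrip() == b"//" and written >= soft_limit:
                out.close()
                out = None
    if out is not None:
        out.close()
    return part_paths
```

The parts keep the original extension, so a wildcard such as *.gbff
would match both the original file and its parts; the original should be
moved out of the database directory before indexing.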

Best,
Isabelle



From David.Bauer at schering.de  Mon Apr 24 05:52:50 2006
From: David.Bauer at schering.de (David.Bauer at schering.de)
Date: Mon, 24 Apr 2006 07:52:50 +0200
Subject: [EMBOSS] index RefSeq for EMBOSS
In-Reply-To: <B247DF4AC17BBF40B119308717C33702029FA495@rkamsem1.emea.roche.com>
Message-ID: <OF5F484A92.0F56D33A-ONC125715A.001FFF89-C125715A.00204D9D@schering.de>


You can also try the new indexing programs dbxflat and dbxfasta, which can
handle files larger than 2 GB.

Regards,
David.

emboss-bounces at lists.open-bio.org wrote on 21/04/2006 17:43:27:

> Hi,
>
> Yes I also index refseq. I think the problem here is that dbiflat
> can only handle files which are less than 2GB. So try splitting the
> files first.
>
> Best,
> Isabelle
>


From olivier.friard at unito.it  Wed Apr 26 10:29:51 2006
From: olivier.friard at unito.it (Olivier Friard)
Date: Wed, 26 Apr 2006 12:29:51 +0200
Subject: [EMBOSS] index RefSeq with dbxflat
Message-ID: <444F4B9F.1020209@unito.it>

Hello,

Thank you for your kind help with indexing RefSeq.


I tried to index the RefSeq DNA db using the dbxflat program with the 
following arguments:

dbxflat
Database b+tree indexing for flat file databases
Basename for index files: rs_dna
Resource name: rs_dna
       EMBL : EMBL
      SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
         GB : Genbank, DDBJ
     REFSEQ : Refseq
Entry format [SWISS]: REFSEQ
Wildcard database filename [*.dat]: *.gbff
Database directory [.]: /home/users/friard/data/refseq_genomic
         id : ID
        acc : Accession number
         sv : Sequence Version and GI
        des : Description
        key : Keywords
        org : Taxonomy
Index fields [id,acc]:

I included these records in my .embossrc file:

DB rs_dna [
     type: "N"
     method: "emboss"
     dbalias: "rs_dna"
     format: "genbank"
     directory: "/home/users/friard/data/refseq_genomic/"
     file: "*.gbff"
     comment: "RefSeq DNA (dbxflat)"
]

RES rs_dna [
    type: Index
    idlen:  15
    acclen: 15
    svlen:  15
    keylen: 15
    deslen: 15
    orglen: 15
]

but when I try to retrieve a single sequence by its AC (seqret 
rs_dna:NC_001191) the program fails with this error message:

seqret rs_dna:NC_001191
Reads and writes (returns) sequences
Error: Unable to read sequence 'rs_dna:NC_001191'
Died: seqret terminated: Bad value for '-sequence' and no prompt

When I try to retrieve all sequences with "seqret rs_dna:* -out 
fasta::refseq.fasta", everything works well.

I also tried dbxfasta with the *.fna files (changing the format in the
.embossrc file to "fasta") but got the same error.

Any idea about the problem?

Thank you in advance

Olivier Friard


From xiaozhendong at gmail.com  Wed Apr 26 13:50:01 2006
From: xiaozhendong at gmail.com (zhendong shaw)
Date: Wed, 26 Apr 2006 21:50:01 +0800
Subject: [EMBOSS] how to using Einverted to process a file contain multiple
	sequences
Message-ID: <ccfd29ab0604260650q5b7d4d5fjc2c7562668699b36@mail.gmail.com>

The einverted program is designed to process only one sequence at a
time. Is there any way to handle a file in FASTA format containing
multiple sequences?
The input file looks like this:
>seq1
ATTTTTTTTTTTTTTTTTTTT
>seq2
TTTAAAAAAAAAAAAAAA
.......

Something like that.
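
A minimal sketch of one workaround, assuming (as the question states)
that einverted reads a single sequence per run: write each record of the
multi-sequence file to its own file and run the program once per file.
The function name and the output file naming are hypothetical.

```python
import os


def split_fasta(path, out_dir):
    """Write each record of a multi-FASTA file to its own file.

    Returns the list of files written, in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    with open(path) as src:
        for line in src:
            if line.startswith(">"):
                if out is not None:
                    out.close()
                # Number the files rather than reusing the IDs, which may
                # contain characters that are not safe in file names.
                out_path = os.path.join(out_dir, f"seq{len(paths) + 1:06d}.fasta")
                out = open(out_path, "w")
                paths.append(out_path)
            # Lines before the first header are ignored.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return paths
```

Each file returned by split_fasta() holds exactly one sequence and can
then be passed to einverted in a shell or Python loop.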


From rls at ebi.ac.uk  Wed Apr 26 15:46:51 2006
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Wed, 26 Apr 2006 16:46:51 +0100
Subject: [EMBOSS] FW: Forthcoming change in the EMBL flatfile format
Message-ID: <00c401c66948$a2e29d40$0132a8c0@windows.ebi.ac.uk>

 
> -----Original Message-----
> From: owner-seq-dbg at ebi.ac.uk 
> [mailto:owner-seq-dbg at ebi.ac.uk] On Behalf Of Carola Kanz
> Sent: 26 April 2006 16:29
> To: seq-dbg at ebi.ac.uk
> Subject: Forthcoming change in the EMBL flatfile format
> 
> 
> Dear all,
> 
> if you are working with the EMBL flatfile format and you are 
> not yet aware of the format change we are going to introduce 
> with the next release, please have a look at the following 
> announcement.
> Carola
> 
> 
> -------------------------------------------------------------------------
> 
> Dear colleagues,
> 
> We would like to announce the following important change in 
> the EMBL database in June this year.
> 
> At the time of release 87 (available from JUN-2006) the 
> format of the EMBL flat file will undergo a change: the ID 
> line will have a different structure (see below) and the SV 
> line will be removed.
> 
> The changes affecting the ID line structure are:
> 
>      * All tokens will be separated by a semicolon.
>      * The entry name will not be displayed; in its place there will be
>        the primary accession number.
>      * The sequence version will be indicated.
>      * The topology will be a separate token and will be indicated for
>        both circular and linear molecules.
>      * Both the data class and the taxonomic divisions will be displayed.
> 
> This is an example of the new ID line:
> 
> ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP.
>         (1)     (2)     (3)      (4)       (5)  (6)   (7)
> 
> 
> The tokens represent:
> 
>     1. Primary accession number.
>     2. 'SV' + sequence version number.
>     3. Topology: 'circular' or 'linear'.
>     4. Molecule type.
>     5. Data class (ANN, CON, PAT, EST, GSS, HTC, HTG, MGA, WGS, TPA,
>        STS, STD; "normal" entries will have STD for standard).
>     6. Taxonomic division (HUM, MUS, ROD, PRO, MAM, VRT, FUN, PLN, ENV,
>        INV, SYN, UNC, VRL, PHG).
>     7. Sequence length + 'BP.'.
> 
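
A minimal sketch of how the new-style ID line described above could be
parsed, based only on the seven tokens and the example in this
announcement. The type and function names are hypothetical, and real
entries may contain variations this sketch does not handle.

```python
from typing import NamedTuple


class EmblId(NamedTuple):
    accession: str
    version: int
    topology: str
    molecule_type: str
    data_class: str
    division: str
    length: int


def parse_id_line(line):
    """Parse a new-style EMBL ID line into its seven tokens."""
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    # All tokens are separated by a semicolon.
    tokens = [token.strip() for token in line[2:].split(";")]
    if len(tokens) != 7:
        raise ValueError(f"expected 7 tokens, found {len(tokens)}")
    accession, sv, topology, molecule_type, data_class, division, length = tokens
    if not sv.startswith("SV "):
        raise ValueError("second token should be 'SV' + version number")
    return EmblId(
        accession=accession,
        version=int(sv.split()[1]),
        topology=topology,
        molecule_type=molecule_type,
        data_class=data_class,
        division=division,
        length=int(length.split()[0]),  # "500 BP." -> 500
    )


if __name__ == "__main__":
    example = "ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP."
    print(parse_id_line(example))
```

An old-style ID line does not have seven semicolon-separated tokens, so
this function raises ValueError for it; a parser that needs to read both
styles could use that as the signal to fall back to the old layout.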
> The entry name will not be displayed any more in the ID line. 
> Since EMBL release 3 (Dec 1983) the stable identifier of an 
> entry has been the primary accession number.
> 
> A mapping file (entryname to accession number) will be 
> provided with the next release for those entries where the 
> entryname doesn't coincide with the accession number.
> 
> To give users a test dataset, one file with new-style ID 
> lines called new_id_line.test.gz was provided together with 
> the March release of the EMBL database: 
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/new_id_line.test.gz 
> 
> Feedback from users is sought; please use the "Contact us" 
> link at the bottom of the EBI home page and specify "EMBL" in 
> the feedback form.
> 
> Note: this information was first made available on our 
> "Forthcoming changes" page
> ( http://www.ebi.ac.uk/embl/Documentation/forthcomingchanges.html#0606 )
> and in the EMBL database release notes.


From pmr at ebi.ac.uk  Fri Apr 28 09:04:31 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 28 Apr 2006 10:04:31 +0100
Subject: [EMBOSS] EMBOSS Funding News
Message-ID: <4451DA9F.5030906@ebi.ac.uk>

EMBOSS will be funded by the UK Biotechnology and Biological Sciences 
Research Council (BBSRC) for the next 3 years. EBI has issued the 
following press release, also available from:

http://www.ebi.ac.uk/Information/News/pdf/Press25Apr06-small.pdf

The EMBOSS team would like to thank all our users and developers for 
their patience over the past two years.

regards,

Peter Rice
Alan Bleasby
Jon Ison

A brighter future for Europe's favourite molecular biology software package

New funding for EMBOSS - Europe's leading suite of molecular biology 
analysis tools - guarantees open access for researchers and software 
developers

Hinxton, 25 April, 2006 - EMBOSS, the European Molecular Biology Open 
Software Suite, has received a vital funding boost from the UK 
Biotechnology and Biological Sciences Research Council (BBSRC) that will 
guarantee its continued maintenance under an open source license for the 
next three years. This ends two years of uncertainty over the future of 
the project.

Until recently, EMBOSS was hosted by the Medical Research Council's 
Rosalind Franklin Centre for Genomics Research (RFCGR), where it was 
funded jointly by the BBSRC and the Medical Research Council (see "notes 
for editors" for more information on the history of EMBOSS). With the 
announcement in April 2004 of the RFCGR's closure, the future of EMBOSS 
hung in the balance. The new funding from the BBSRC means that EMBOSS 
co-founders Peter Rice and Alan Bleasby will be able to continue the 
EMBOSS project at the EMBL-EBI for the next three years. EMBOSS will 
remain freely available from emboss.sourceforge.net and anyone who wants 
to develop it further will have access to its source code. "We're 
delighted that the BBSRC has recognized EMBOSS as an important tool for 
molecular biology" says project leader Peter Rice. "The EMBOSS user 
community has been very patient, and it highlights a great benefit of 
open source software that even users in industry have continued to rely 
on EMBOSS despite the uncertainty about its future. This simply could 
not have happened if EMBOSS had been a commercial package under threat."

EMBOSS provides a powerful package of around 300 applications for 
molecular biology and bioinformatics analysis. Molecular biologists use 
EMBOSS at all stages of their research, from planning experiments to 
analysing results. It also has an application-programming interface 
(API) that enables software developers to write their own EMBOSS 
applications. These can readily be strung together, allowing users to 
create "workflows" that automate complex and time-consuming tasks. 
EMBOSS has also been used in many commercial software developments and 
is included in commercial bioinformatics systems. Its flexibility has 
made it an obvious core component of several data integration and 
bioinformatics infrastructure projects, including myGrid and EMBRACE.

The new funding also provides helpdesk support for EMBOSS's users. "As 
well as helping researchers with limited bioinformatics expertise to 
make the most of EMBOSS, we will be able to provide better support and 
documentation to the estimated 20% of our users who are also software 
developers", explains Alan Bleasby. "We will encourage these experts to 
contribute their code to the project. In return, we will make their 
software widely available through the EMBOSS website and provide ongoing 
user support for it. This mechanism will help to ensure that EMBOSS 
evolves according to the needs of its users."

Contact:

Cath Brooksbank PhD, EMBL-EBI Scientific Outreach Officer, Hinxton, UK, 
Tel: +44 1223 492 552, www.ebi.ac.uk, cath at ebi.ac.uk
Anna-Lynn Wegener, EMBL Press Officer, Heidelberg, Germany, Tel: +49 
6221 387 452, www.embl.org, wegener at embl.de


Notes for editors - a brief history of EMBOSS

EMBOSS, an open source suite of tools for the analysis of biological 
data, has its origins in the late 1980s when Peter Rice, a co-founder of 
EMBOSS, was working at EMBL. Encouraged by his colleagues in the lab, he 
began to write extensions to the GCG package, which at that time 
provided its source code to users. His efforts evolved into EGCG 
(extended GCG) and Rice moved to the Sanger Centre (now the Wellcome 
Trust Sanger Institute) to continue its development. However, the 
changes to the source code licensing of GCG in 1996 put an end to 
further development of EGCG. Recognizing the importance of free source 
code to the rapid and cost-effective development of bioinformatics 
tools, Rice, in collaboration with Alan Bleasby (then at SEQNET, 
Daresbury, UK) began working on a new suite of open-source 
bioinformatics tools - the EMBOSS project - in 1996. EMBOSS has been 
funded by: the Wellcome Trust (1997-2000); the BBSRC and MRC 
(2001-2004); and through two posts at the MRC Rosalind Franklin Centre 
for Genomic Research following a merger with BBSRC's SEQNET facility in 
1998. After the closure of RFCGR in July 2005, EMBOSS moved to the 
EMBL-EBI where it is coordinated by Rice and Bleasby.


About EMBL:

The European Molecular Biology Laboratory is a basic research institute 
funded by public research monies from 19 member states (Austria, 
Belgium, Croatia, Denmark, Finland, France, Germany, Greece, Iceland, 
Ireland, Israel, Italy, the Netherlands, Norway, Portugal, Spain, Sweden, 
Switzerland and the United Kingdom). Research at EMBL is conducted by 
approximately 80 independent groups covering the spectrum of molecular 
biology. The Laboratory has five units: the main Laboratory in 
Heidelberg, and Outstations in Hinxton (the European Bioinformatics 
Institute), Grenoble, Hamburg, and Monterotondo near Rome. The 
cornerstones of EMBL's mission are: to perform basic research in 
molecular biology; to train scientists, students and visitors at all 
levels; to offer vital services to scientists in the member states; to 
develop new instruments and methods in the life sciences and to actively 
engage in technology transfer activities. EMBL's International PhD 
Programme has a student body of about 170. The Laboratory also sponsors 
an active Science and Society programme. Visitors from the press and 
public are welcome.

About EBI:

The European Bioinformatics Institute (EBI) is part of the European 
Molecular Biology Laboratory (EMBL) and is located on the Wellcome Trust 
Genome Campus in Hinxton near Cambridge (UK). The EBI grew out of EMBL's 
pioneering work in providing public biological databases to the research 
community. It hosts some of the world's most important collections of 
biological data, including DNA sequences (EMBL-Bank), protein sequences 
(UniProt), animal genomes (Ensembl), three-dimensional structures (the 
Macromolecular Structure Database), data from microarray experiments 
(ArrayExpress), protein-protein interactions (IntAct) and pathway 
information (Reactome). The EBI hosts several research groups and its 
scientists continually develop new tools for the biocomputing community.

Policy regarding use:

EMBL press releases may be freely reprinted and distributed via print 
and electronic media. Text, photographs & graphics are copyrighted by 
EMBL. They may be freely reprinted and distributed in conjunction with 
this news story, provided that proper attribution to authors, 
photographers and designers is made. High-resolution copies of the 
images can be downloaded from the EMBL web site: www.embl.org


From rsucgang at bcm.tmc.edu  Fri Apr 28 21:33:59 2006
From: rsucgang at bcm.tmc.edu (richard sucgang phd)
Date: Fri, 28 Apr 2006 16:33:59 -0500
Subject: [EMBOSS] backtranambig missing?
In-Reply-To: <69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk>
References: <442BFD56.9010908@pcm.uam.es> <443A2160.8090102@ebi.ac.uk>
	<69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk>
Message-ID: <f06230904c0783a745fe8@[128.249.209.78]>

I am using EMBOSS on OSX (installed using fink). Is it my 
imagination, or is the application backtranambig missing? The 
documentation on sf.net points to this application existing, yet, I 
cannot find the binary in the install. Any ideas?
-- 
Richard Sucgang, PhD
(713) 798 7657
http://www.dictygenome.org/


From francis.tang at chukhang.com  Fri Apr 28 22:39:12 2006
From: francis.tang at chukhang.com (Francis Tang)
Date: Fri, 28 Apr 2006 23:39:12 +0100
Subject: [EMBOSS] how to using Einverted to process a file contain
 multiple sequences
In-Reply-To: <ccfd29ab0604260650q5b7d4d5fjc2c7562668699b36@mail.gmail.com>
References: <ccfd29ab0604260650q5b7d4d5fjc2c7562668699b36@mail.gmail.com>
Message-ID: <44529990.90600@chukhang.com>

Hi Zhendong,

I've had to run einverted on a file with many sequences before.

If I remember correctly, I used seqret to create a new file for each 
sequence, and then used bash's for+glob expansion to run einverted many 
times.
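
A minimal sketch of that approach, not a tested recipe. It assumes that
seqret's -ossingle qualifier writes one file per sequence, named after
the sequence ID with a .fasta extension, and that einverted's default
parameters are acceptable (-auto suppresses the prompts). The function
name einverted_each and the .inv output suffix are made up for
illustration.

```shell
# Hypothetical helper: run einverted once per sequence of a multi-FASTA file.
# $1 = multi-sequence FASTA file (absolute path, because we cd below)
# $2 = working directory that receives the split files and the results
# The body is a subshell, so the caller's working directory is unchanged.
einverted_each() (
    input=$1
    workdir=$2
    mkdir -p "$workdir" && cd "$workdir" || exit 1

    # Step 1: split the input into one FASTA file per sequence.
    # Assumption: -ossingle names each file <sequence id>.fasta.
    seqret -sequence "$input" -ossingle -osformat fasta -auto || exit 1

    # Step 2: for + glob expansion, one einverted run per split file.
    for f in *.fasta; do
        [ -e "$f" ] || continue        # glob matched nothing
        einverted -sequence "$f" -outfile "${f%.fasta}.inv" -auto
    done
)
```

Called as, for example, "einverted_each /full/path/to/all.fasta work",
this should leave one .inv report per sequence in the work directory.
Using a separate working directory keeps the glob from picking up the
original multi-sequence file.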

Sorry this mail is so vague - it's been a long while since I've used 
emboss.  If you haven't solved the problem already and the clues above 
don't make it obvious, write back and I'll work it out again.

Cheers.

Francis.

zhendong shaw wrote:
> Since the Einverted program is designed to process only one sequences a
> time. Are there any ways to handle a file in fasta format containing
> multiple sequences?
> The input file just like follow:
>> seq1
> ATTTTTTTTTTTTTTTTTTTT
>> seq2
> TTTAAAAAAAAAAAAAAA
> .......
> 
> sth like that....


-- 
www.chukhang.com/francis


From pmr at ebi.ac.uk  Sat Apr 29 10:23:42 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Sat, 29 Apr 2006 11:23:42 +0100 (BST)
Subject: [EMBOSS] backtranambig missing?
In-Reply-To: <f06230904c0783a745fe8@[128.249.209.78]>
References: <442BFD56.9010908@pcm.uam.es> <443A2160.8090102@ebi.ac.uk>
	<69289db821f271dd3cf6e131ffa02013@bbsrc.ac.uk>
	<f06230904c0783a745fe8@[128.249.209.78]>
Message-ID: <2033.86.137.135.19.1146306222.squirrel@webmail.ebi.ac.uk>

Richard Sucgang writes:

> I am using EMBOSS on OSX (installed using fink). Is it my
> imagination, or is the application backtranambig missing? The
> documentation on sf.net points to this application existing, yet, I
> cannot find the binary in the install. Any ideas?

backtranambig will be in EMBOSS 4.0.0

The emboss.sf.net documentation is for the current developers code, and
includes new programs and changes to the documentation for some of the
current programs.

EMBOSS 3.0.0 documentation is included in the distribution and installed
when EMBOSS is installed.

This often causes confusion - we are working on adding the 3.0.0
documentation to the website but we have not yet had time to finish that
work. (We did move the current documentation to make it clearer that it
was for the CVS code - but that caused more confusion).

More news on 4.0.0 soon - we are busy now planning what will be in the
release.

Hope that helps,

Peter