From gbottu at black.vub.ac.be  Thu May  1 14:43:58 2003
From: gbottu at black.vub.ac.be (Guy Bottu)
Date: Thu, 1 May 2003 20:43:58 +0200
Subject: Preferred isoschizomer ?
In-Reply-To: <200304301829.h3UITwG29534@sulphur.hgmp.mrc.ac.uk>; from ableasby@hgmp.mrc.ac.uk on Wed, Apr 30, 2003 at 07:29:58PM +0100
References: <200304301829.h3UITwG29534@sulphur.hgmp.mrc.ac.uk>
Message-ID: <20030501204358.A1336237@black.vub.ac.be>

from : BEN

On Wed, Apr 30, 2003 at 07:29:58PM +0100, ableasby at hgmp.mrc.ac.uk wrote:
> There are replacement files for rebaseextract.c and rebaseextract.acd
> in the ftp://ftp.uk.embnet.org/pub/EMBOSS/patchfiles/ 
> directory. By default this program will now produce an
> embossre.equ file. Re-extract the withrefm file using the new
> program. If you then use the -preferred option to 'restrict'
> it should behave as you wish.

	Fine !

There is however a problem: the programs restrict and restover now
behave as they should, but the programs remap and showseq seem to
ignore the -preferred parameter, or am I making a mistake?

	Regards,
	Guy Bottu


From eija.korpelainen at csc.fi  Fri May  2 01:41:04 2003
From: eija.korpelainen at csc.fi (Eija Korpelainen)
Date: Fri, 2 May 2003 08:41:04 +0300
Subject: Preferred isoschizomer ?
References: <200304141817.h3EIHs410930@bromine.hgmp.mrc.ac.uk> <20030430181945.GD3138@iib.unsam.edu.ar>
Message-ID: <002b01c3106d$6ea5b8a0$0402a6c1@windows.csc.fi>

Dear Fernan, Guy and others,

We have been looking into this problem with Alan and, as he told you, the
embossre.equ file is now made automatically. -preferred works (gives you
PstI instead of BspMAI) because the default value of -limit is true (this is
defined in the restrict.acd file). So if one is using a graphical interface
one has to tick both -preferred and -limit to get the right thing. This is
because in the code of restrict.c -preferred (called "equiv" in the code) is
considered only when -limit has been chosen. What the program actually does
is that it first limits to one isoschizomer and picks the alphabetically
first one (!), and then converts this to the prototype enzyme using the
embossre.equ file. The limiting step is performed by the function
embPatRestrictRestrict in embpat.c (in the nucleus directory).
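
The two-step behaviour described above can be sketched in a few lines of
Python. This is only an illustration of the logic as described in this
message, not the code in restrict.c or embpat.c: the function name, the
list-of-groups input and the dictionary standing in for embossre.equ are all
assumptions made for the example, and "alphabetically first" is approximated
with plain string comparison.

```python
def choose_enzymes(isoschizomer_groups, equivalents, limit=True, preferred=False):
    """Return the enzyme names that would be reported.

    isoschizomer_groups: list of lists, each holding enzymes that cut alike
                         (hypothetical input shape for this sketch).
    equivalents: dict mapping an enzyme name to its prototype, standing in
                 for the contents of embossre.equ (assumed representation).
    """
    chosen = []
    for group in isoschizomer_groups:
        if not limit:
            # -nolimit: every isoschizomer is reported; -preferred is ignored
            chosen.extend(group)
            continue
        # -limit: keep only the alphabetically first name in the group
        name = min(group)
        if preferred:
            # -preferred only has an effect here, after limiting
            name = equivalents.get(name, name)
        chosen.append(name)
    return chosen


groups = [["PstI", "BspMAI"]]
equ = {"BspMAI": "PstI"}
print(choose_enzymes(groups, equ, limit=True, preferred=True))    # ['PstI']
print(choose_enzymes(groups, equ, limit=True, preferred=False))   # ['BspMAI']
print(choose_enzymes(groups, equ, limit=False, preferred=True))   # ['PstI', 'BspMAI']
```

The last call shows the interconnection: with limiting switched off, asking
for the preferred enzyme changes nothing.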

The problem with the current setup is that the user doesn't know
that -limit and -preferred are interconnected. This could of course be
documented, but the easy fix would be to set the equiv boolean true in the
code and abolish the -preferred qualifier altogether. This way -limit would
give you automatically PstI, and -nolimit all isoschizomers.

As Guy pointed out, the problem with remap is that it does not take any
notice of the -preferred. This is simply because the code reads the value of
preferred (or equiv) but doesn't use it for anything. In other words, most
of remap.c code comes from Alan's restrict.c code, but the following
critical bit was accidentally left out.
    if(equiv && limit)
    {
        /* look up the preferred (prototype) name for this enzyme */
        value = ajTableGet(table,m->cod);
        if (value)
            ajStrAss(&m->cod,value);
    }

I think it would be important to fix these problems because these are quite
central programs for molecular biologists and expensive projects like
transgenic design depend heavily on proper restriction maps.

Cheers,
Eija

_____________________________________________

Eija Korpelainen, Ph.D
Science Support/Biosciences
CSC - Center for Scientific Computing
P.O.Box 405, FIN-02101 Espoo, Finland
Phone    +358 9 457 2030
Mobile   +358 50 381 9726
Fax        +358 9 457 2302
E-Mail    Eija.Korpelainen at csc.fi
________________________________________________


From ableasby at hgmp.mrc.ac.uk  Fri May  2 02:50:57 2003
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Fri, 2 May 2003 07:50:57 +0100 (BST)
Subject: Preferred isoschizomer ?
Message-ID: <200305020650.h426ovQ24261@bromine.hgmp.mrc.ac.uk>

Eija's analysis is quite correct. In fact the modifications to
remap/showseq (or their equivalent) were made yesterday and
passed on to the original author so they can be tested for
any knock-on effects.

It is true that, when the program was written, there were no GUIs
for EMBOSS so the '-limit' confusion didn't arise. Eija's
suggestion is a good one and will be tested.

Alan


From Marc.Logghe at devgen.com  Wed May  7 06:18:13 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed, 7 May 2003 12:18:13 +0200 
Subject: dbiflat question
Message-ID: <BEE28BF86078B6429D6C780635718E212E71B4@morelia.be.devgen.com>

Hi all,
I feel a little dumb but I'll ask it anyhow. I seem not to succeed in
creating indices for a database using dbiflat.
As a test I just wanted to index the genbank file /data/genbank/gbest226.seq
Ok, I wanted my indices to be in /data/emboss/est
so I have run dbiflat in that folder.
dbiflat -idformat genbank -directory /data/genbank -filenames gbest226.seq
-dbname est
I added this entry to emboss.default
DB est [
   type: N
   format: genbank
   method: emblcd
   directory: /data/emboss/est
]

But, you guessed it, this did not work.
What am I doing wrong here ? What happens with the passed dbname (could not
find any file with that name after running dbiflat) ?
TIA,
marc

***********************************************************
Marc Logghe, Ph.D.
Senior Scientist
Scientific Computing Group
deVGen
Technologiepark 9
9052 Zwijnaarde
Belgium
tel: +32 (0) 9 324 24 83
fax: +32 (0) 9 324 24 25
***********************************************************


From pmr at ebi.ac.uk  Wed May  7 06:34:58 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 07 May 2003 11:34:58 +0100
Subject: dbiflat question
References: <BEE28BF86078B6429D6C780635718E212E71B4@morelia.be.devgen.com>
Message-ID: <3EB8E152.7090706@ebi.ac.uk>

Marc Logghe wrote:
> I added this entry to emboss.default
> DB est [
>    type: N
>    format: genbank
>    method: emblcd
>    directory: /data/emboss/est
> ]

You need:

directory: /data/genbank
indexdirectory: /data/emboss/est

EMBOSS needs to find the index files and the data files.

Just specifying "directory" works if both files are there (it becomes
the default for indexdirectory), so your confusion is quite understandable.
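
The fallback rule can be written out as a small Python sketch. It only
models the rule as stated in this message; the function name and the use of
a plain dictionary for a parsed DB entry are assumptions for illustration,
not how EMBOSS reads emboss.default internally.

```python
def resolve_db_paths(entry):
    """Return (data_dir, index_dir) for a DB entry given as a dict.

    The dict form of the entry is hypothetical; the keys mirror the
    emboss.default attribute names used in this thread.
    """
    data_dir = entry["directory"]
    # indexdirectory defaults to directory when it is not given
    index_dir = entry.get("indexdirectory", data_dir)
    return data_dir, index_dir


# Entry as first posted: both paths resolve to the index directory,
# so the sequence file in /data/genbank is never looked for there.
print(resolve_db_paths({"directory": "/data/emboss/est"}))

# Entry with both attributes: data and index are found separately.
print(resolve_db_paths({"directory": "/data/genbank",
                        "indexdirectory": "/data/emboss/est"}))
```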

Hope this helps,

Peter Rice


From pemberaj at pugh.bip.bham.ac.uk  Wed May  7 07:17:31 2003
From: pemberaj at pugh.bip.bham.ac.uk (Tony Pemberton)
Date: Wed, 7 May 2003 12:17:31 +0100
Subject: dbiflat question
In-Reply-To: <BEE28BF86078B6429D6C780635718E212E71B4@morelia.be.devgen.com>
References: <BEE28BF86078B6429D6C780635718E212E71B4@morelia.be.devgen.com>
Message-ID: <Pine.SGI.4.51.0305071212370.189268@pugh.bip.bham.ac.uk>

On Wed, 7 May 2003, Marc Logghe wrote:

> Hi all,
> I feel a little dumb but I'll ask it anyhow. I seem not to succeed in
> creating indices for a database using dbiflat.
> As a test I just wanted to index the genbank file /data/genbank/gbest226.seq
> Ok, I wanted my indices to be in /data/emboss/est
> so I have run dbiflat in that folder.
> dbiflat -idformat genbank -directory /data/genbank -filenames gbest226.seq
> -dbname est
> I added this entry to emboss.default
> DB est [
>    type: N
>    format: genbank
>    method: emblcd
>    directory: /data/emboss/est
> ]
>
> But, you guessed it, this did not work.
> What am I doing wrong here ? What happens with the passed dbname (could not
> find any file with that name after running dbiflat) ?
> TIA,
> marc

Marc,

You need the .seq file also to be in the directory where you run
dbiflat. Or make symbolic links!

You will note that the dbiflat dialogue asks about the files to
process (*.seq). At this stage, I think I am correct in saying that
the database directory file emboss.default is not yet in use. It
merely directs user programs, e.g. seqret, to the formatted
database (indices) as shown by showdb.

Regards,

Tony


*********************************************************************
Mr. A.J.Pemberton              Tel:  +121-414-3388
c/o Dept. Rheumatology,        Fax:  +121-414-6794
Medical School,                E-mail: A.J.Pemberton at bham.ac.uk
The University of Birmingham,
Birmingham B15 2TT.
U.K.
*********************************************************************


From Marc.Logghe at devgen.com  Wed May  7 08:05:25 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed, 7 May 2003 14:05:25 +0200 
Subject: dbiflat question
Message-ID: <BEE28BF86078B6429D6C780635718E212E71B6@morelia.be.devgen.com>

Thanks for the reply !
This is what I have figured out:
running
dbiflat -idformat genbank -directory /data/genbank -filenames gbest226.seq
in the index directory (e.g. /data/emboss/est) has the same effect as
running
dbiflat -idformat genbank -directory /data/genbank -indexdirectory
/data/emboss/est -filenames gbest226.seq
That is, the index files are created in the desired place. But the
sequences themselves are still not accessible using the entry in
emboss.default given earlier ('seqret est -firstonly' gives a
segmentation fault). I suppose the 'directory' key should point to the
index directory, right? After all, the index itself should point to the
correct sequence path. At least that is what I expect.
And indeed, as suggested by Tony, everything worked fine when putting index
and sequence files in the same directory (indexdirectory and directory are
the same).
OK, just tried something which appears to work now. Switch to the first
scenario again: separate paths for index and sequence files. When I changed
the emboss.default to the following, everything worked fine:

DB est [
   type: N
   format: genbank
   method: emblcd
   indexdirectory: /data/emboss/est
   directory: /data/genbank
]

Apparently you have to set the indexdirectory and directory explicitly in
the configuration file as well; pointing to the indexdirectory alone is not
sufficient !
Regards,
Marc
 


From Stephan.Hurling at evotecoai.com  Thu May  8 08:41:50 2003
From: Stephan.Hurling at evotecoai.com (Stephan.Hurling at evotecoai.com)
Date: Thu, 8 May 2003 14:41:50 +0200
Subject: Problems with dbigcg...
Message-ID: <OFF5BDE2CA.9C1FDB2B-ONC1256D20.00438AAD@evotecoai.com>

Hello Everyone,

I would like to use EMBOSS version 2.6.0 together with the GCG Wisconsin
package version 10.3 on a Red Hat 7.2 Linux server. Following the
installation instructions from the administrators guide and doing the usual

./configure
make 
make install

I compiled and installed EMBOSS on my system without any error messages.
But when I try to build indexes from a GCG database I run into trouble.
See the following output of an interactive session:

14:18 [root at kepler] ~/Temp # dbigcg
Index a GCG formatted database
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
   GENBANK : Genbank, DDBJ
       PIR : NBRF
Entry format [EMBL]:
Database directory [.]: /usr/local/share/EMBOSS/data/GCG_DATABASES/gcgembl
Wildcard database filename [*.seq]:
Database name: embl
Release number [0.0]: 73.0
Index date [00/00/00]: 01/12/02

   EMBOSS An error in embdbi.c at line 590:
Cannot open embl.idsrt for reading

14:21 [root at kepler] ~/Temp # ll
total 252k
drwxr-xr-x    2 root     root         4.0k May  8 14:18 ./
drwxr-x---   24 root     root         4.0k May  8 14:15 ../
-rw-------    1 root     root         1.4M May  8 14:18 core
-rw-r--r--    1 root     root         1.5k May  8 14:18 division.lkp
-rw-r--r--    1 root     root          675 May  8 14:18 embl001.acnum
-rw-r--r--    1 root     root          104 May  8 14:18 embl002.acnum
-rw-r--r--    1 root     root          504 May  8 14:18 embl003.acnum
-rw-r--r--    1 root     root           51 May  8 14:18 embl004.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl005.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl006.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl007.acnum
-rw-r--r--    1 root     root          126 May  8 14:18 embl008.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl009.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl010.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl011.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl012.acnum
-rw-r--r--    1 root     root           36 May  8 14:18 embl013.acnum
-rw-r--r--    1 root     root          25k May  8 14:18 embl014.acnum
-rw-r--r--    1 root     root          161 May  8 14:18 embl015.acnum
-rw-r--r--    1 root     root          850 May  8 14:18 embl016.acnum
-rw-r--r--    1 root     root         1.3k May  8 14:18 embl017.acnum
-rw-r--r--    1 root     root         2.6k May  8 14:18 embl018.acnum
-rw-r--r--    1 root     root          121 May  8 14:18 embl019.acnum
-rw-r--r--    1 root     root          290 May  8 14:18 embl020.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl021.acnum
-rw-r--r--    1 root     root          104 May  8 14:18 embl022.acnum
-rw-r--r--    1 root     root          188 May  8 14:18 embl023.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl024.acnum
-rw-r--r--    1 root     root           34 May  8 14:18 embl025.acnum
-rw-r--r--    1 root     root           14 May  8 14:18 embl026.acnum
-rw-r--r--    1 root     root          490 May  8 14:18 embl027.acnum
-rw-r--r--    1 root     root          300 May  8 14:18 entrynam.idx
-rw-------    1 root     root            0 May  8 14:18 sort9YCQfK

Can somebody help me? Have I done something wrong during the compilation 
step of emboss?
Any hint would help me.

Thanks in advance...

All the best,


Stephan

From ablavier at wanadoo.fr  Sun May 11 14:36:24 2003
From: ablavier at wanadoo.fr (=?iso-8859-1?Q?Andr=E9_Blavier?=)
Date: Sun, 11 May 2003 20:36:24 +0200
Subject: EMBOSS for Windows: DLL  build
Message-ID: <001e01c317ec$3c9feb60$5ca03551@bach>

EMBOSS for Windows is now built with ajax and nucleus compiled as DLLs, so
the EMBOSS programs are now much smaller, and the distribution as well.
dbiblast is now in the package. See
http://perso.wanadoo.fr/ablavier/embosswin/embosswin.html.

    -- André Blavier


From arunanirudhan at yahoo.co.in  Mon May 12 04:27:44 2003
From: arunanirudhan at yahoo.co.in (=?iso-8859-1?q?arun=20anirudhan?=)
Date: Mon, 12 May 2003 09:27:44 +0100 (BST)
Subject: seqret
Message-ID: <20030512082744.65129.qmail@web8203.mail.in.yahoo.com>

Hello all,

How can I use seqret to retrieve sequences from a database, as we do in
Entrez? For example, I want to get the sequences of all insulins from
GenBank. What should I give as the command?

seqret embl:insulin         ?

Arun

From maoj at mail.nih.gov  Mon May 12 10:16:05 2003
From: maoj at mail.nih.gov (Jean Mao)
Date: Mon, 12 May 2003 10:16:05 -0400
Subject: about ftp site of EMBOSS Administrators Guide
Message-ID: <00e801c31891$0c341410$618a70a5@citjmao>

Hi, where can I find the PDF version of the EMBOSS Administrators Guide? The link on the website doesn't work. Thanks.

Jean

From gwilliam at hgmp.mrc.ac.uk  Mon May 12 10:40:35 2003
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Mon, 12 May 2003 15:40:35 +0100
Subject: about ftp site of EMBOSS Administrators Guide
References: <00e801c31891$0c341410$618a70a5@citjmao>
Message-ID: <3EBFB263.95B0DDAB@hgmp.mrc.ac.uk>

There is no PDF version of the current guide.
The link on the web was left there by accident and has now been tidied
away - sorry.

Gary

> Jean Mao wrote:
> 
> Hi, where can I find the pdf version of emboss administrators guide?
> the link on the website doesn't work. thanks.
> 
> Jean

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at rfcgr.mrc.ac.uk          http://www.rfcgr.mrc.ac.uk/
Bioinformatics, MRC RFCGR, Hinxton, Cambridge, CB10 1SB, UK


From gwilliam at hgmp.mrc.ac.uk  Mon May 12 11:40:10 2003
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Mon, 12 May 2003 16:40:10 +0100
Subject: about ftp site of EMBOSS Administrators Guide
References: <00e801c31891$0c341410$618a70a5@citjmao> <3EBFB263.95B0DDAB@hgmp.mrc.ac.uk>
Message-ID: <3EBFC05A.16A96393@hgmp.mrc.ac.uk>

The .ps and .pdf versions of the current guide are now on the web page:
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/admin.html

See:
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Doc/Admin_guide/admin.ps
and
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Doc/Admin_guide/admin.pdf

Gary


> > Jean Mao wrote:
> >
> > Hi, where can I find the pdf version of emboss administrators guide?
> > the link on the website doesn't work. thanks.


-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at rfcgr.mrc.ac.uk          http://www.rfcgr.mrc.ac.uk/
Bioinformatics, MRC RFCGR, Hinxton, Cambridge, CB10 1SB, UK


From arunanirudhan at yahoo.co.in  Tue May 13 03:25:13 2003
From: arunanirudhan at yahoo.co.in (=?iso-8859-1?q?arun=20anirudhan?=)
Date: Tue, 13 May 2003 08:25:13 +0100 (BST)
Subject: Fwd: seqret
Message-ID: <20030513072513.53308.qmail@web8204.mail.in.yahoo.com>


Note: forwarded message attached.

From yann-francois.bizouerne at bayercropscience.com  Thu May 15 11:29:46 2003
From: yann-francois.bizouerne at bayercropscience.com (yann-francois.bizouerne at bayercropscience.com)
Date: Thu, 15 May 2003 17:29:46 +0200
Subject: Search for organism of entry
Message-ID: <OFC9485941.7B3AAB7B-ONC1256D27.0050FA81-C1256D27.00551F6E@bayer-ag.com>

Hello,

I recently installed EMBOSS on our server. I really enjoy the
different tools, but I have a small problem and I can't find the solution
anywhere.
I have indexed the SwissProt database with the command line:

      dbiflat -idformat SWISS -directory . -filenames sprot.dat -dbname sprot
-fields acnum,seqvn,des,keyword,taxon

So after that I can find sequence information when searching for a
particular organism or keyword.

For the moment, what I can retrieve with the accession number is the following:

   >infoseq sprot:P15711
   Displays some simple information about sequences
   # USA             Name        Accession Type Length     Description
   ian-id:104K_THEPA 104K_THEPA    P15711  P    924        104 kDa
   microneme-rhoptry antigen.

And when I search by organism I obtain:
   >infoseq sprot-org:"*Theileria*" -outfile stdout
   Displays some simple information about sequences
   # USA             Name        Accession Type Length     Description
   sprot-id:104K_THEPA 104K_THEPA    P15711  P    924        104 kDa
   microneme-rhoptry antigen.


So now I want to know whether, for one particular entry (sprot:P15711), I can
find the organism (Theileria parva) or not.

Thanks in advance for your answer.


Yann-François BIZOUERNE
BioInformatic Team
BAYER CropScience
1, rue Pierre Fontaine
91058 Evry Cedex
FRANCE
Phone:      33-(0) 1-69-47-61-56
FAX:        33-(0) 1-69-47-61-42
E-mail:     yann-francois.bizouerne at bayercropscience.com
Intranet: http://bioinfo.evry.fr.bayercropscience/


From pmr at ebi.ac.uk  Thu May 15 12:26:31 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 15 May 2003 17:26:31 +0100
Subject: Search for organism of entry
References: <OFC9485941.7B3AAB7B-ONC1256D27.0050FA81-C1256D27.00551F6E@bayer-ag.com>
Message-ID: <3EC3BFB7.5000605@ebi.ac.uk>

yann-francois.bizouerne at bayercropscience.com wrote:

> And when I am looking with the organism I obtain :
>    >infoseq sprot-org:"*Theileria*" -outfile stdout
>    Displays some simple information about sequences
>    # USA             Name        Accession Type Length     Description
>    sprot-id:104K_THEPA 104K_THEPA    P15711  P    924        104 kDa
>    microneme-rhoptry antigen.
> 
> 
> So now I want to know whether, for one particular entry (sprot:P15711), I can
> find the organism (Theileria parva) or not.

EMBOSS can search a database by organism, but reads the sequence (in 
most programs) or the whole entry (entret)

... but I am looking into ways to parse out more detail, including 
organism, citation, and features. The database definition would have a 
list of fields that can be retrieved, and a program like (for example) 
entret could check the fields and let you choose the ones you need.

For now, you can run entret and look for the organism in the text.
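
As a rough illustration of "look for the organism in the text", the Python
sketch below pulls the OS (organism species) lines out of a Swiss-Prot style
flat-file entry. It assumes the entry text has already been obtained, for
example by saving the output of entret to a file. The function name is
hypothetical, and the sample entry is a hand-abbreviated stand-in built from
the details in this thread, not real database output.

```python
def organism_from_entry(entry_text):
    """Collect the OS lines of a Swiss-Prot style entry into one string."""
    parts = []
    for line in entry_text.splitlines():
        # Flat-file lines start with a two-letter code and three spaces
        if line.startswith("OS   "):
            parts.append(line[5:].strip())
    # OS lines end with a full stop; drop it for display
    return " ".join(parts).rstrip(".")


# Hand-abbreviated sample entry, for illustration only
sample = """ID   104K_THEPA     STANDARD;      PRT;   924 AA.
AC   P15711;
DE   104 kDa microneme-rhoptry antigen.
OS   Theileria parva.
"""

print(organism_from_entry(sample))   # Theileria parva
```

An entry with no OS line gives an empty string, so a caller can tell that
nothing was found.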

Hope this helps,

Peter Rice


From henrikki.almusa at helsinki.fi  Tue May 20 02:42:36 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Tue, 20 May 2003 09:42:36 +0300
Subject: Support for nexus in alignment format
Message-ID: <200305200942.36206.henrikki.almusa@helsinki.fi>

Hello

I read through the alignment formats that EMBOSS supports. I was wondering if
nexus is supported as an alignment format (-aformat nexus). It seems to be
supported as a sequence format, but it wasn't mentioned as an alignment format.

-- 
Henrikki Almusa


From pmr at ebi.ac.uk  Tue May 20 05:29:57 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 20 May 2003 10:29:57 +0100
Subject: Support for nexus in alignment format
References: <200305200942.36206.henrikki.almusa@helsinki.fi>
Message-ID: <3EC9F595.5000008@ebi.ac.uk>

Henrikki Almusa wrote:
> I read through the alignment formats that EMBOSS supports. I was wondering if
> nexus is supported as an alignment format (-aformat nexus). It seems to be
> supported as a sequence format, but it wasn't mentioned as an alignment format.

The sequence formats are easy to add as alignment formats. Not sure 
quite how useful that is.

You can do this:

1. Create your alignment in a sequence format (FASTA, MSF)

2. Use seqret to convert to nexus format

... or does NEXUS format hold some extra information that would make it 
a useful alignment format, and that we lose by going through FASTA?
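
Step 2 could look something like the command built by this small Python
sketch. It only assembles and prints the command line; it does not run
seqret. The file names are placeholders, and the qualifier names
(-sequence, -outseq, -osformat) and the format name "nexus" are assumptions
that should be checked against the seqret documentation for your EMBOSS
version.

```python
import shlex


def seqret_to_nexus_command(infile, outfile):
    """Build a seqret command that rewrites an alignment file as nexus.

    Qualifier names and the "nexus" format name are assumptions for this
    sketch; verify them with the seqret help on your installation.
    """
    return ["seqret", "-sequence", infile, "-outseq", outfile,
            "-osformat", "nexus"]


# Placeholder file names, for illustration only
print(shlex.join(seqret_to_nexus_command("align.fasta", "align.nexus")))
```

Building the command as a list keeps file names with spaces intact if the
list is later handed to subprocess.run.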

Hope this helps,

Peter


From yann-francois.bizouerne at bayercropscience.com  Wed May 21 04:53:19 2003
From: yann-francois.bizouerne at bayercropscience.com (yann-francois.bizouerne at bayercropscience.com)
Date: Wed, 21 May 2003 10:53:19 +0200
Subject: Use two Emboss package with one database
Message-ID: <OF12D15AFD.92630898-ONC1256D2D.003032C5-C1256D2D.0030D3F0@bayer-ag.com>

Hello,

I am working with two different servers at different locations. An EMBOSS
installation is present on each of them.
I need to know whether I can configure these two EMBOSS installations to work
with the same database (which is located on one of the two servers).
Can EMBOSS work this way, or do I need a single EMBOSS package
(tools + database) installed on one server?

I hope that my question is clear enough.

Best Regards


Yann-François BIZOUERNE
BioInformatic Team
BAYER CropScience
1, rue Pierre Fontaine
91058 Evry Cedex
FRANCE
Phone:      33-(0) 1-69-47-61-56
FAX:        33-(0) 1-69-47-61-42
E-mail:     yann-francois.bizouerne at bayercropscience.com
Intranet: http://bioinfo.evry.fr.bayercropscience/


From pmr at ebi.ac.uk  Wed May 21 04:59:18 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 21 May 2003 09:59:18 +0100
Subject: Use two Emboss package with one database
References: <OF12D15AFD.92630898-ONC1256D2D.003032C5-C1256D2D.0030D3F0@bayer-ag.com>
Message-ID: <3ECB3FE6.1020906@ebi.ac.uk>

yann-francois.bizouerne at bayercropscience.com wrote:
> Hello,
> 
> I am working with two different servers at different locations. An EMBOSS
> installation is present on each of them.
> I need to know whether I can configure these two EMBOSS installations to work
> with the same database (which is located on one of the two servers).
> Can EMBOSS work this way, or do I need a single EMBOSS package
> (tools + database) installed on one server?

Yes ... but you need to do some work.

The EMBOSS package on the same server as the databases is easy.

The second EMBOSS package needs to read from remote databases. I assume
you indexed them with dbiflat (and the other dbi programs).

You can access a remote database by:

SRSWWW if it is on an SRS server
URL if you have a web page to query the database
APP (EXTERNAL) if you have a script that can return an entry

Assuming you don't have them under SRS ...

You can provide a simple web CGI script that runs entret (for the whole
entry) or seqret (for the sequence only; you can put -osformat on the
command line to get the format of your choice).

You can write a script that will access the databases somehow (possibly 
also by talking to a web page - your choice).

Meanwhile, I am working on ways to define EMBOSS web services and data 
services that would give an alternative access method, but that is for 
later in the year.

Hope this helps,

Peter Rice


From Marc.Logghe at devgen.com  Wed May 21 05:03:00 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed, 21 May 2003 11:03:00 +0200
Subject: Use two Emboss package with one database
Message-ID: <BEE28BF86078B6429D6C780635718E212E71E6@morelia.be.devgen.com>

Hi, 
At our site emboss is installed on every node of a cluster while the
databases are installed on only one.
The only thing you have to do is mount the database directory/directories in
one way or another on every node and adapt the emboss.default files
appropriately (if necessary) so that the DB entries are pointing to the
correct directories.
HTH,
Marc



From d.m.a.martin at dundee.ac.uk  Wed May 21 05:04:24 2003
From: d.m.a.martin at dundee.ac.uk (David Martin)
Date: Wed, 21 May 2003 10:04:24 +0100
Subject: Use two Emboss package with one database
In-Reply-To: <OF12D15AFD.92630898-ONC1256D2D.003032C5-C1256D2D.0030D3F0@bayer-ag.com>
Message-ID: <BAF0FFA8.28E5%d.m.a.martin@dundee.ac.uk>

On 21/5/03 9:53 am, "yann-francois.bizouerne at bayercropscience.com"
<yann-francois.bizouerne at bayercropscience.com> wrote:

> Hello,
> 
> I am working with two different servers at different locations. An EMBOSS
> installation is present on each of them.
> I need to know whether I can configure these two EMBOSS installations to work
> with the same database (which is located on one of the two servers).
> Can EMBOSS work this way, or do I need a single EMBOSS package
> (tools + database) installed on one server?

There are two options here:

1. Different platforms accessing the same database
2. same platform (different machines) accessing the same database.

1. Easy. When you run configure, set the prefix (or prefixes) appropriately so
that the executables go to an appropriate place and the databases point to a
shared (NFS or similar) drive containing the config files. Obviously you
have to use the same mountpoint on all your machines for this to work. I use
/site/share/EMBOSS for the config and /site/databases as a root for the
databases. In this case /site/bin is local, not shared, and contains the
appropriate binaries (or it can be a symlink to /site/Linux/bin,
/site/IRIX/bin, /site/Solaris/bin or /site/Darwin/bin as appropriate if you
are supporting more than one machine on a particular platform).

2. Can be done in the same way using a shared drive for the executables. One
gotcha is that EMBOSS, despite all efforts, does not compile statically so
you have to ensure that the library versions are the same across the various
platforms or you will get runtime errors.

Either of these methods will reduce the maintenance load considerably.

In my case I use NFS for the data directories and use a nightly scheduled
rsync to synchronise the executables and config files with the master
machine as these don't take much space and it reduces the network overhead.

Hope this helps.

..d
> 
> I hope that my question is clear enough.
> 
> Best Regards
> 
> 
> 
> Yann-François BIZOUERNE
> BioInformatic Team
> BAYER CropScience
> 1, rue Pierre Fontaine
> 91058 Evry Cedex
> FRANCE
> Phone:      33-(0) 1-69-47-61-56
> FAX:        33-(0) 1-69-47-61-42
> E-mail:     yann-francois.bizouerne at bayercropscience.com
> Intranet: http://bioinfo.evry.fr.bayercropscience/
> 
> 
> 

-- 
David Martin PhD
Bioinformatics Scientific Officer
Post-Genomics and Molecular Interactions Centre
University of Dundee


From d.m.a.martin at dundee.ac.uk  Wed May 21 05:13:25 2003
From: d.m.a.martin at dundee.ac.uk (David Martin)
Date: Wed, 21 May 2003 10:13:25 +0100
Subject: Use two Emboss package with one database
In-Reply-To: <3ECB3FE6.1020906@ebi.ac.uk>
Message-ID: <BAF101C5.28EA%d.m.a.martin@dundee.ac.uk>

On 21/5/03 9:59 am, "Peter Rice" <pmr at ebi.ac.uk> wrote:

> yann-francois.bizouerne at bayercropscience.com wrote:
>> Hello,
>> 
>> I am working with 2 different servers at different locations. On each of them
>> an EMBOSS package is installed.
>> I need to know if I could configure these 2 EMBOSS installations to work with
>> the same database (which is located on one of the two servers).
>> Can EMBOSS work this way, or do I need to have only one EMBOSS package
>> (tools + database) installed on one server?
> 
> Yes ... but you need to do some work.
> 
> The EMBOSS package on the same server as the databases is easy.
> 
> The second EMBOSS package needs to read from remote databases. I assume
> you indexed them with dbiflat (and the other dbi programs).
> 
> You can access a remote database by:
> 
> SRSWWW if it is on an SRS server
> URL if you have a web page to query the database
> APP (EXTERNAL) if you have a script that can return an entry
> 
> Assuming you don't have them under SRS ...
> 
> You can provide a simple web CGI script that runs entret (for whole
> entry) or seqret (for sequence only - you can put -osformat on the
> command line to get the format of your choice)
> 
> You can write a script that will access the databases somehow (possibly
> also by talking to a web page - your choice).
> 
> Meanwhile, I am working on ways to define EMBOSS web services and data
> services that would give an alternative access method, but that is for
> later in the year.
> 
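
The CGI route described above could be sketched roughly as follows. This is only an illustration of how such a script might assemble the seqret command line; the function name, the character whitelist and the choice of the general -stdout and -auto qualifiers are assumptions, not something taken from an existing EMBOSS script.

```python
import re

# Hypothetical whitelist: only plain database names, entry IDs and format
# names are accepted, so nothing from a web request reaches the shell.
_SAFE = re.compile(r"^[A-Za-z0-9_.]+$")


def build_seqret_command(db, entry, osformat="fasta"):
    """Return the argument list for fetching one entry with seqret.

    The list is meant to be passed to subprocess.run() without a shell.
    """
    for value in (db, entry, osformat):
        if not _SAFE.match(value):
            raise ValueError(f"unsafe value: {value!r}")
    # "db:entry" is the query; -osformat selects the output format,
    # -stdout and -auto are assumed here to suppress prompting.
    return ["seqret", f"{db}:{entry}", "-osformat", osformat,
            "-stdout", "-auto"]
```

A CGI wrapper would run the returned list on the machine that holds the indexed databases and send the output back to the caller.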

What about just using Jemboss (or a variant thereof) to talk to the master
machine? The alternative is shunting lots of data around which isn't really
feasible unless you have a fast network between the machines.

Do you move the data to the problem or the problem to the data? The trade
off is between transport time and execution time.

Does Jemboss make use of a SOAP server? If not it would be really nice to
have a script that could generate a WSDL definition from the ACD files.

It's then one step away from being a Grid service..

..d 

 
-- 
David Martin PhD
Bioinformatics Scientific Officer
Post-Genomics and Molecular Interactions Centre
University of Dundee


From maoj at mail.nih.gov  Wed May 21 12:21:43 2003
From: maoj at mail.nih.gov (Jean Mao)
Date: Wed, 21 May 2003 12:21:43 -0400
Subject: question about databases setup
Message-ID: <038001c31fb5$1171acf0$618a70a5@citjmao>

Hi, I am new to EMBOSS and have a question about database setup. 

I have a file called 'test.dat' in the directory /data/maoj/emboss/db/mouse/. This file has 9 entries in EMBL format. I ran dbiflat, and acnum.hit, acnum.trg, division.lkp and entrynam.idx were generated. 

Then I set up a .embossrc file in my home directory as follows:
-------------------------------------------------------------------------------------------
# Logfile - set this to a file that any user can append to
# and EMBOSS applications will automatically write log information

# SET emboss_logfile /home/db/emboss/tmp/log

DB test [
        type: N
        method: emblcd
        format: embl
        dir: /data/maoj/emboss/db/mouse
        file: "*.dat"
        release: "1"
        comment: "Test DB"
        ]
-------------------------------------------------

When I run seqret and try to retrieve one of the 9 entries, I get the following error:

% seqret
Reads and writes (returns) sequences
Input sequence(s): test:AB001363
Warning: Cannot open division file '<null>' for database 'test'
Warning: seqCdQry failed
Error: Unable to read sequence 'test:AB001363'

Please help. Thank you in advance.

Jean

From pmr at ebi.ac.uk  Wed May 21 12:30:43 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 21 May 2003 17:30:43 +0100
Subject: question about databases setup
References: <038001c31fb5$1171acf0$618a70a5@citjmao>
Message-ID: <3ECBA9B3.3070802@ebi.ac.uk>

Jean Mao wrote:
> Hi, I am new to EMBOSS and have a question about database setup.
>  
> I have a file called 'test.dat' in the directory 
> /data/maoj/emboss/db/mouse/. This file has 9 entries in EMBL format. I ran 
> dbiflat, and acnum.hit, acnum.trg, division.lkp and entrynam.idx were generated.

You need to specify where the index files are (indexdir) in the database 
definition.
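
As a rough sketch of what the completed definition might look like, the small helper below renders a DB block with an indexdir line added. The helper name is hypothetical, and the index path is an assumption: the original message does not say where dbiflat wrote the index files, so substitute the real location.

```python
def render_db_entry(name, attributes):
    """Render an EMBOSS-style 'DB name [ ... ]' block from a dict."""
    lines = [f"DB {name} ["]
    for key, value in attributes.items():
        lines.append(f"        {key}: {value}")
    lines.append("]")
    return "\n".join(lines)


# Assumption: the index files sit next to test.dat. Change this path if
# dbiflat was run somewhere else.
index_dir = "/data/maoj/emboss/db/mouse"

entry = render_db_entry("test", {
    "type": "N",
    "method": "emblcd",
    "format": "embl",
    "dir": "/data/maoj/emboss/db/mouse",
    "indexdir": index_dir,
    "file": '"*.dat"',
    "release": '"1"',
    "comment": '"Test DB"',
})
print(entry)
```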

Hope this helps,

Peter Rice


From mathog at mendel.bio.caltech.edu  Wed May 21 18:04:02 2003
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Wed, 21 May 2003 15:04:02 -0700
Subject: extractfeat with gff files?
Message-ID: <E19Ibgo-0002ya-00@mendel.bio.caltech.edu>

EMBOSS 2.6.0

I cannot seem to locate the magic incantation that will make gff
files work as desired with fasta files.  HELP!

Here's the sort of line I want to extract (sorry about the wrap):

X       gadfly  translation     1880    3119    .       -       .      
genegrp=CG3038; transgrp=CG3038-RB

There are many other lines in the gff for transcription,exon,
gene, etc. which should not be extracted.  The fasta
input file currently has entries with names like X,2L,etc.
which correspond to the first column of the gff file.
Ideally I'd like to be able to use one gff file
(with X->3R in the first column) to extract
from one fasta file (again, with X->3R for the fasta name),
and have the descriptions of X act only on the sequence X,
and so forth. The idea being to be able to extract features
on a genomic level using only one fasta/gff pair, rather than
N (=#of scaffolds) pairs.


First though I tried an input fasta file containing 
(11 X 10kb entries, the first being X) and a gff file also starting
only with X (but with references for the whole chromosome, 22101
lines).  The following command sat for about two minutes, burned
a lot of CPU time, but emitted nothing:

extractfeat -sequence=dmel_genome_frag.nfa\
  -ufo=x.gff -type=translation -outseq=x.nfa

When the -type qualifier was removed it went nuts and emitted 
over 40000 entries (more than there were lines in the gff file!)
before I killed it.  Clearly there was no error checking for
size of gff entry versus size of sequence.  The input fasta
file had 11 entries of 10000 bp each.  The first was X.  Yet
a bunch of lines like:

>X_12390_12854 [exon] X release:3 length:21780003bp Assembled X
chromosome arm sequence md5:f3fbbb4c44f0d30d1effeecc87b5bd18
T

were emitted.  So the fasta file was reduced to just one entry
(X, 10kb) and this time the output fasta file held 22101 entries.
As before, those beyond 10kb were emitted with a single base.
So apparently the entire gff description is applied to each fasta
sequence and there's no checking of the first column against
the sequence name. That's ok - we can live with that for now,
but it would be better if the descriptions could automatically
be matched to the sequence names.

I'm not sure though that we can live with it emitting single
bp sequences when the description is outside of the sequence.
If the feature is beyond the end of the input sequence it just
isn't there, right?

Just to spite me, "translation" was never emitted.
There were only lines for gene, exon, misc_feature, tRNA and snoRNA.

So I tried:

 extractfeat -sequence=dmel_x.nfa \
  -ufo=x.gff -outseq=x.nfa -type=gene

And it emitted a single whole gene match at (1488,3280,-) correctly.
The next one at (3445,11463,+) partially (and correctly, ending
at the end of the sequence - a warning would have been nice)
and then a slew of (>2000) single base pair "empty" entries
outside of the input sequence.  Note also that there's no indication
on the fasta header line in the output of the strand which
was selected. 

So, how does one get extractfeat to emit only matches
to "translation"? Please tell me there's some way other
than by extracting those lines into a separate
gff file and renaming them all "gene"!

Extractfeat seems to have a predefined set of "features" that it's
willing to work with and doesn't handle others well.  To
narrow this down a bit more I made a small gff file containing
"fred" where "gene" had been and specifying positions <10kb.
The features were emitted but all were labeled "misc_feature".
Is this documented somewhere?  It isn't in an obvious
place in the on line help, as both of these searches come up
empty.

 extractfeat -h 2>&1 | grep -i misc
 tfm extractfeat 2>&1 | grep -i misc


It would also be nice if there was some way to get column 9 from
the gff file onto the fasta header line somewhere.  (It can
then be rearranged to suit later.)  Currently even if one
has the gene names lined up with the gene entries in the gff
file the resulting fasta file just says "X_100_123 [gene]..."
without any of the comment info.  You've got the sequence
but not the names of the genes.  Very painful to work with if
the output is the coding sequences for an entire genome.


Is there a switch (or bug fix) that stops extractfeat
from emitting garbage single bp entries for descriptions
outside the sequence?
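
For reference, the pre-filtering workaround mentioned above (cutting the gff down before extractfeat sees it) could be sketched like this. It is a standalone script, not an extractfeat option; the function name is hypothetical and it assumes tab-separated gff columns and a dictionary of sequence lengths supplied by the caller.

```python
def filter_gff(lines, seq_lengths, feature_type):
    """Keep gff lines of one feature type that start inside a known sequence.

    lines        -- iterable of gff lines (tab-separated columns assumed)
    seq_lengths  -- dict mapping sequence name (column 1) to its length
    feature_type -- value required in column 3, e.g. "translation"
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and comment lines
        cols = line.split("\t")
        if len(cols) < 8:
            continue                      # not a complete feature line
        seqname, feature, start = cols[0], cols[2], int(cols[3])
        length = seq_lengths.get(seqname)
        if length is None or feature != feature_type:
            continue                      # other sequence or other feature
        if start < 1 or start > length:
            continue                      # feature lies outside the sequence
        kept.append(line)
    return kept
```

Features that start inside the sequence but run past its end are kept, so they would still be truncated downstream; features that start beyond the end are dropped.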

Thanks,


David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From yann-francois.bizouerne at bayercropscience.com  Thu May 22 10:26:36 2003
From: yann-francois.bizouerne at bayercropscience.com (yann-francois.bizouerne at bayercropscience.com)
Date: Thu, 22 May 2003 16:26:36 +0200
Subject: creation of new output fasta format
Message-ID: <OFD4207DB1.0DC2618F-ONC1256D2E.004D554E-C1256D2E.004F56FC@bayer-ag.com>

Hello,

First, thanks a lot for your quick response to my last mail.

Now I am trying to create a new fasta format. The format I want to obtain is:
  >dbname:id |accession|organism|description
To do this I created a new function in ajseqwrite.c (seqWriteNewFasta). I
selected the different pieces of information I want to retrieve by following
the examples of other functions.
It is working quite well, except for the PIR and NRL_3D databases.
For these databases I get no database name and no organism (taxon):

  /** Database name **/
  if (ajStrLen(outseq->Db))
    (void) ajFmtPrintF (outseq->File, ">%S:", outseq->Db);
  else if (ajStrLen(outseq->Setdb))
    (void) ajFmtPrintF (outseq->File, ">%S:", outseq->Setdb);
  else
    (void) ajFmtPrintF (outseq->File, ">unk:");


  /** Organism **/
  if (ajStrLen(outseq->Tax))
    (void) ajFmtPrintF (outseq->File, "%S|", outseq->Tax);


I tried to find some information about the NBRF format in EMBOSS and how it is
handled, but I could find nothing.
Do you have a clue for me?
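
To make the intended header layout concrete, here is a small standalone sketch of the fallback logic, written in Python rather than against the AJAX library. The function and parameter names are hypothetical. One deliberate difference from the C fragment above: an empty accession or organism is written as "unk" instead of being skipped, so the number of fields stays constant (that choice is an assumption, not the author's code).

```python
def new_fasta_header(seq_id, db="", setdb="", accession="",
                     organism="", description=""):
    """Build '>dbname:id |accession|organism|description'."""
    # Database name: entry database first, then the set database, then "unk".
    dbname = db or setdb or "unk"
    fields = [accession or "unk", organism or "unk", description]
    return f">{dbname}:{seq_id} |" + "|".join(fields)
```

For example, new_fasta_header("AB001363", db="embl", accession="AB001363", organism="Mus musculus", description="test") gives ">embl:AB001363 |AB001363|Mus musculus|test".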

Best regards


Yann-François BIZOUERNE
BioInformatic Team
BAYER CropScience
1, rue Pierre Fontaine
91058 Evry Cedex
FRANCE
Phone:      33-(0) 1-69-47-61-56
FAX:        33-(0) 1-69-47-61-42
E-mail:     yann-francois.bizouerne at bayercropscience.com
Intranet: http://bioinfo.evry.fr.bayercropscience/


From henrikki.almusa at helsinki.fi  Tue May 27 06:46:13 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Tue, 27 May 2003 13:46:13 +0300
Subject: Graph data handling
Message-ID: <200305271339.51333.henrikki.almusa@helsinki.fi>

Hello,

I'm trying to use graphs in scripts outside of EMBOSS. However, I ran into 
problems with the options concerning graph handling. I found the following 
options on the "banana" tool's webpage:

  "-graph" related qualifiers
   -gprompt             boolean    Graph prompting
   -gtitle              string     Graph title
   -gsubtitle           string     Graph subtitle
   -gxtitle             string     Graph x axis title
   -gytitle             string     Graph y axis title
   -goutfile            string     Output file for non interactive displays
   -gdirectory          string     Output directory

I tried to use -goutfile and -gdirectory with banana, but I seem to be unable 
to affect the data file(s) or their directories. 

If I understand correctly, this should work: "banana mRNA.seq -graph data 
-goutfile /home/hena/banana_data_file -auto", or else "banana mRNA.seq -graph 
data -goutfile banana_data_file -gdirectory /home/hena". For the second I get 
the error "Died: unknown qualifier -gdirectory", and with the first I get 
"Created banana_data_file.dat", but no such file is created and no data file 
is there. Also, if I use the "-data" option there, I get multiple bananaX.dat 
files (where X is a running number).

So, my questions are: How exactly is this supposed to work? Could it 
be added to the "User documentation" section of the webpages, with the other 
formats? And thirdly, is there a reason why some programs expect the "-data" 
option and others do not?

TIA
--
Henrikki Almusa


From sebastien.frade at bayercropscience.com  Wed May 28 08:38:49 2003
From: sebastien.frade at bayercropscience.com (sebastien.frade at bayercropscience.com)
Date: Wed, 28 May 2003 14:38:49 +0200
Subject: No subject
Message-ID: <OF4617347F.32BA0711-ONC1256D34.0036ABA5-C1256D34.00457920@bayer-ag.com>

Hi,

I'm a new user of EMBOSS and I would like to extract some information from EMBL
flat files, such as clone, strain, tissue ..., that is stored in the FT section.
But I don't know how to do that.

I've looked for a tool that can extract features, but none of them extracts these
fields.

If such a tool doesn't exist, how can I develop it?

Please help me!

Thanks


Sébastien Frade
BioInformatic Team
BAYER CropScience
1 Rue Pierre FONTAINE
91058 EVRY - France
tel:  33 (0) 1 69 47 61 52
fax:  33 (0) 1 69 47 61 42
mail: sebastien.frade at bayercropscience.com
http://bioinfo.evry.fr.bayercropscience

From pmr at ebi.ac.uk  Wed May 28 10:30:43 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 28 May 2003 15:30:43 +0100
Subject: 
References: <OF4617347F.32BA0711-ONC1256D34.0036ABA5-C1256D34.00457920@bayer-ag.com>
Message-ID: <3ED4C813.4010900@ebi.ac.uk>

sebastien.frade at bayercropscience.com wrote:
> Hi,
> 
> I'm a new user of EMBOSS and I would like to extract some information from EMBL
> flat files, such as clone, strain, tissue ..., that is stored in the FT section.
> But I don't know how to do that.
> 
> I've looked for a tool that can extract features, but none of them extracts these
> fields.

This sounds like a task for SRS :-)

http://srs.ebi.ac.uk/

EMBOSS really works with the sequence data. We can try to extract more 
of the other data but it is a non-trivial task.

But ... you could write your own EMBOSS tool, and we can help you to do 
that!!!
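
Until such a tool exists, a small script outside EMBOSS can pull simple qualifiers out of the FT lines. The following is only a minimal sketch: the function name is hypothetical, the qualifier names (clone, strain, tissue_type) are assumed to be the ones wanted, and values that continue over several FT lines are not handled.

```python
import re

# A qualifier line looks like:  FT                   /strain="C57BL/6"
QUALIFIER = re.compile(r'^FT\s+/(\w+)=(.*)$')


def extract_ft_qualifiers(entry_lines,
                          wanted=("clone", "strain", "tissue_type")):
    """Collect the values of selected FT qualifiers from one EMBL entry."""
    found = {}
    for line in entry_lines:
        match = QUALIFIER.match(line.rstrip("\n"))
        if match and match.group(1) in wanted:
            value = match.group(2).strip().strip('"')
            found.setdefault(match.group(1), []).append(value)
    return found
```

The result maps each qualifier name to the list of values seen in the entry, so an entry with two source features yields two values per qualifier.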

Hope this helps

Peter Rice


From maoj at mail.nih.gov  Fri May 30 10:49:17 2003
From: maoj at mail.nih.gov (Jean Mao)
Date: Fri, 30 May 2003 10:49:17 -0400
Subject: question about dbiblast
Message-ID: <0d4401c326ba$a5847600$618a70a5@citjmao>

Hi, I am new to EMBOSS database configuration and need some help indexing a BLAST database.

I have the following files in a directory:
-rw-rw-r--    1 maoj     Seqdb       190733 May 30 08:19 drosoph.nt.nhr
-rw-rw-r--    1 maoj     Seqdb        14108 May 30 08:19 drosoph.nt.nin
-rw-rw-r--    1 maoj     Seqdb         9360 May 30 08:19 drosoph.nt.nnd
-rw-rw-r--    1 maoj     Seqdb           84 May 30 08:19 drosoph.nt.nni
-rw-rw-r--    1 maoj     Seqdb       174584 May 30 08:19 drosoph.nt.nsd
-rw-rw-r--    1 maoj     Seqdb         3699 May 30 08:19 drosoph.nt.nsi
-rw-rw-r--    1 maoj     Seqdb     31368306 May 30 08:19 drosoph.nt.nsq

I believe these are files from NCBI, generated using formatdb version 2.2.5. I ran dbiblast in this directory:
Index a BLAST database
Database name: drosoph
Database directory [.]: 
Wildcard database filename [drosoph]: drosoph.nt.*
Release number [0.0]: 
Index date [00/00/00]: 
         N : nucleic
         P : protein
         ? : unknown
Sequence type [unknown]: N
         1 : wublast and setdb/pressdb
         2 : formatdb
         0 : unknown
Blast index version [unknown]: 2

then I got many lines of the following message:
Warning: Duplicate ID skipped: '0?0? ?^DROSOPHILA' All hits will point to first ID found

The following new files were generated:
-rw-rw-r--    1 maoj     Seqdb          322 May 30 10:44 division.lkp
-rw-rw-r--    1 maoj     Seqdb          496 May 30 10:44 entrynam.idx
-rw-rw-r--    1 maoj     Seqdb          300 May 30 10:44 acnum.trg
-rw-rw-r--    1 maoj     Seqdb          300 May 30 10:44 acnum.hit

I then edited my ~/.embossrc file by adding the following lines:
DB drosoph [
        type: N
        method: blast
        format: ncbi
        dir: /data/maoj/emboss/db/blast/drosoph
        indexdir: /data/maoj/emboss/db/blast/drosoph
        file: "drosoph.nt.*"
        release: "0.0"
        comment: "blast drosoph"
        ]

Then I run showdb:
% showdb
Displays information on the currently available databases
# Name        Type ID  Qry All Comment
# ====        ==== ==  === === =======
drosoph       N    OK  OK  OK  blast drosoph
test          N    OK  OK  OK  Test DB

The test db is genbank format and was running fine.

Then I run seqret:
% seqret
Reads and writes (returns) sequences
Input sequence(s): drosoph:A*
Error: BLAST Query failed
Error: Unable to read sequence 'drosoph:A*'
Input sequence(s): drosoph:*

   EMBOSS An error in ajseqdb.c at line 4006:
error reading file /data/maoj/emboss/db/blast/drosoph/drosoph.nt.nhr

Please advise what I might have done wrong. 

Thank you very much!!!


From pmr at ebi.ac.uk  Fri May 30 14:24:58 2003
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 30 May 2003 19:24:58 +0100 (BST)
Subject: question about dbiblast
In-Reply-To: <0d4401c326ba$a5847600$618a70a5@citjmao>
References: <0d4401c326ba$a5847600$618a70a5@citjmao>
Message-ID: <1189.217.134.86.144.1054319098.squirrel@webmail.ebi.ac.uk>

> Hi, I am new to EMBOSS database configuration and need some help indexing a BLAST database.

This is the long-standing problem of the "new ASN.1 format blast database"

NCBI changed formatdb to create a new index file format, but we have no
documentation on the new format, so we cannot update dbiblast to index it.

We hope to provide the ability to index these blast databases in a future
release, once NCBI release the format specification. I suspect EMBOSS and
FASTA are the only other applications using blast index formats so it is
not an urgent task for them.

Meanwhile, you need to use the 'old' format:

First, you need the original FASTA format file (drosophila.nt)

Then, index it with formatdb but add "-A F" to the command line (to turn
off ASN.1 format).

Hope this helps,

Peter Rice


From calvinwangxi at yahoo.com  Sat May 31 07:43:11 2003
From: calvinwangxi at yahoo.com (calvin wang)
Date: Sat, 31 May 2003 04:43:11 -0700 (PDT)
Subject: new moethods
In-Reply-To: <1189.217.134.86.144.1054319098.squirrel@webmail.ebi.ac.uk>
Message-ID: <20030531114311.68280.qmail@web41115.mail.yahoo.com>

I need to use T-Coffee, which I understand is not part of EMBOSS. Is it possible to add new methods to EMBOSS? If so, how? Is there a guide? Thanks. 


From gbottu at black.vub.ac.be  Thu May  1 18:43:58 2003
From: gbottu at black.vub.ac.be (Guy Bottu)
Date: Thu, 1 May 2003 20:43:58 +0200
Subject: Preferred isoschizomer ?
In-Reply-To: <200304301829.h3UITwG29534@sulphur.hgmp.mrc.ac.uk>; from ableasby@hgmp.mrc.ac.uk on Wed, Apr 30, 2003 at 07:29:58PM +0100
References: <200304301829.h3UITwG29534@sulphur.hgmp.mrc.ac.uk>
Message-ID: <20030501204358.A1336237@black.vub.ac.be>

from : BEN

On Wed, Apr 30, 2003 at 07:29:58PM +0100, ableasby at hgmp.mrc.ac.uk wrote:
> There are replacement files for rebaseextract.c and rebaseextract.acd
> in the ftp://ftp.uk.embnet.org/pub/EMBOSS/patchfiles/ 
> directory. By default this program will now produce an
> embossre.equ file. Re-extract the withrefm file using the new
> program. If you then use the -preferred option to 'restrict'
> it should behave as you wish.

	Fine !

There is however a problem : the programs restrict and restover now
behave as they should, but, the programs remap and showseq seem to
ignore the parameter -preferred, or do I make a mistake ?

	Regards,
	Guy Bottu


From eija.korpelainen at csc.fi  Fri May  2 05:41:04 2003
From: eija.korpelainen at csc.fi (Eija Korpelainen)
Date: Fri, 2 May 2003 08:41:04 +0300
Subject: Preferred isoschizomer ?
References: <200304141817.h3EIHs410930@bromine.hgmp.mrc.ac.uk> <20030430181945.GD3138@iib.unsam.edu.ar>
Message-ID: <002b01c3106d$6ea5b8a0$0402a6c1@windows.csc.fi>

Dear Fernan, Guy and others,

We have been looking into this problem with Alan and, as he told you, the
embossre.equ file is now made automatically. -preferred works (gives you
PstI instead of BspMAI) because the default value of -limit is true (this is
defined in the restrict.acd file). So if one is using a graphical interface
one has to tick both -preferred and -limit to get the right thing. This is
because in the code of restrict.c -preferred (called "equiv" in the code) is
considered only when -limit has been chosen. What the program actually does
is that it first limits to one isoschizomer and picks the alphabetically
first one (!), and then converts this to the prototype enzyme using the
embossre.equ file. The limiting step is performed by the function
embPatRestrictRestrict in embpat.c (in the nucleus directory).

The problem with the current set up is that the user doesn't know
that -limit and -preferred are interconnected. This could of course be
documented, but the easy fix would be to set the equiv boolean true in the
code and abolish the -preferred qualifier altogether. This way -limit would
give you automatically PstI, and -nolimit all isoschizomers.

As Guy pointed out, the problem with remap is that it does not take any
notice of the -preferred. This is simply because the code reads the value of
preferred (or equiv) but doesn't use it for anything. In other words, most
of remap.c code comes from Alan's restrict.c code, but the following
critical bit was accidentally left out.
if(equiv && limit)
{
    value = ajTableGet(table,m->cod);
    if (value)
        ajStrAss(&m->cod,value);
}
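
The interaction between -limit and -preferred described above can be sketched as follows. This is an illustration of the described behaviour in Python, not the restrict.c code; the function name and the dictionary standing in for the embossre.equ table are hypothetical.

```python
def choose_enzymes(isoschizomers, equivalents, limit=True, preferred=False):
    """Mimic the described selection for one group of isoschizomers.

    isoschizomers -- enzyme names that cut at the same site
    equivalents   -- dict mapping an enzyme to its prototype
                     (stand-in for the embossre.equ file)
    """
    if not limit:
        # Without -limit every isoschizomer is reported and
        # -preferred has no effect at all.
        return sorted(isoschizomers)
    # Limiting step: keep only the alphabetically first name.
    chosen = min(isoschizomers)
    # Only now is -preferred looked at ("equiv && limit" in the C code).
    if preferred:
        chosen = equivalents.get(chosen, chosen)
    return [chosen]
```

With the group ["PstI", "BspMAI"] and the table {"BspMAI": "PstI"}, the defaults give ["BspMAI"], preferred=True gives ["PstI"], and limit=False gives both names whatever the value of preferred.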

I think it would be important to fix these problems because these are quite
central programs for molecular biologists and expensive projects like
transgenic design depend heavily on proper restriction maps.

Cheers,
Eija

_____________________________________________

Eija Korpelainen, Ph.D
Science Support/Biosciences
CSC - Center for Scientific Computing
P.O.Box 405, FIN-02101 Espoo, Finland
Phone    +358 9 457 2030
Mobile   +358 50 381 9726
Fax        +358 9 457 2302
E-Mail    Eija.Korpelainen at csc.fi
________________________________________________


From ableasby at hgmp.mrc.ac.uk  Fri May  2 06:50:57 2003
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Fri, 2 May 2003 07:50:57 +0100 (BST)
Subject: Preferred isoschizomer ?
Message-ID: <200305020650.h426ovQ24261@bromine.hgmp.mrc.ac.uk>

Eija's analysis is quite correct. In fact the modifications to
remap/showseq (or their equivalent) were made yesterday and
passed on to the original author so they can be tested for
any knock-on effects.

It is true that, when the program was written, there were no GUIs
for EMBOSS so the '-limit' confusion didn't arise. Eija's
suggestion is a good one and will be tested

Alan


From Marc.Logghe at devgen.com  Wed May  7 10:18:13 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed, 7 May 2003 12:18:13 +0200 
Subject: dbiflat question
Message-ID: <BEE28BF86078B6429D6C780635718E212E71B4@morelia.be.devgen.com>

Hi all,
I feel a little dumb but I'll ask it anyhow. I seem not to succeed in
creating indices for a database using dbiflat.
As a test I just wanted to index the genbank file /data/genbank/gbest226.seq
Ok, I wanted my indices to be in /data/emboss/est
so I have run dbiflat in that folder.
dbiflat -idformat genbank -directory /data/genbank -filenames gbest226.seq
-dbname est
I added this entry to emboss.default
DB est [
   type: N
   format: genbank
   method: emblcd
   directory: /data/emboss/est
]

But, you guessed it, this did not work.
What am I doing wrong here ? What happens with the passed dbname (could not
find any file with that name after running dbiflat) ?
TIA,
marc

***********************************************************
Marc Logghe, Ph.D.
Senior Scientist
Scientific Computing Group
deVGen
Technologiepark 9
9052 Zwijnaarde
Belgium
tel: +32 (0) 9 324 24 83
fax: +32 (0) 9 324 24 25
***********************************************************


From pmr at ebi.ac.uk  Wed May  7 10:34:58 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 07 May 2003 11:34:58 +0100
Subject: dbiflat question
References: <BEE28BF86078B6429D6C780635718E212E71B4@morelia.be.devgen.com>
Message-ID: <3EB8E152.7090706@ebi.ac.uk>

Marc Logghe wrote:
> I added this entry to emboss.default
> DB est [
>    type: N
>    format: genbank
>    method: emblcd
>    directory: /data/emboss/est
> ]

You need:

directory: /data/genbank
indexdirectory: /data/emboss/est

EMBOSS needs to find the index files and the data files.

Just specifying "directory" works if both files are there (it becomes 
the default for indexdirectory), so your confusion is quite understandable.
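
The defaulting rule can be sketched in a few lines. This is only a model of the behaviour described in this message, with a hypothetical function name; it is not how EMBOSS itself parses emboss.default.

```python
def resolve_directories(attributes):
    """Return (data_directory, index_directory) for a DB definition.

    'indexdirectory' falls back to 'directory' when it is not given.
    """
    data_dir = attributes.get("directory")
    if data_dir is None:
        raise ValueError("a DB definition needs a 'directory' attribute")
    index_dir = attributes.get("indexdirectory", data_dir)
    return data_dir, index_dir
```

With only directory: /data/emboss/est, both values come out as /data/emboss/est, so the indexes are found but the sequence file is looked for in the wrong place. With directory: /data/genbank and indexdirectory: /data/emboss/est, the two paths resolve separately.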

Hope this helps,

Peter Rice


From pemberaj at pugh.bip.bham.ac.uk  Wed May  7 11:17:31 2003
From: pemberaj at pugh.bip.bham.ac.uk (Tony Pemberton)
Date: Wed, 7 May 2003 12:17:31 +0100
Subject: dbiflat question
In-Reply-To: <BEE28BF86078B6429D6C780635718E212E71B4@morelia.be.devgen.com>
References: <BEE28BF86078B6429D6C780635718E212E71B4@morelia.be.devgen.com>
Message-ID: <Pine.SGI.4.51.0305071212370.189268@pugh.bip.bham.ac.uk>

On Wed, 7 May 2003, Marc Logghe wrote:

> Hi all,
> I feel a little dumb but I'll ask it anyhow. I seem not to succeed in
> creating indices for a database using dbiflat.
> As a test I just wanted to index the genbank file /data/genbank/gbest226.seq
> Ok, I wanted my indices to be in /data/emboss/est
> so I have run dbiflat in that folder.
> dbiflat -idformat genbank -directory /data/genbank -filenames gbest226.seq
> -dbname est
> I added this entry to emboss.default
> DB est [
>    type: N
>    format: genbank
>    method: emblcd
>    directory: /data/emboss/est
> ]
>
> But, you guessed it, this did not work.
> What am I doing wrong here ? What happens with the passed dbname (could not
> find any file with that name after running dbiflat) ?
> TIA,
> marc
>
> ***********************************************************
> Marc Logghe, Ph.D.
> Senior Scientist
> Scientific Computing Group
> deVGen
> Technologiepark 9
> 9052 Zwijnaarde
> Belgium
> tel: +32 (0) 9 324 24 83
> fax: +32 (0) 9 324 24 25
> ***********************************************************
>
>
>

Marc,

You need the .seq file also to be in the directory where you run
dbiflat. Or make symbolic links!

You will note that the dialogue of dbiflat asks about the files to
process (*.seq). At this stage, I think I am correct in saying that
the database definition file emboss.default is not used. It
merely directs the user programs, e.g. seqret, to the formatted
database (indices) as shown by showdb.

Regards,

Tony


*********************************************************************
Mr. A.J.Pemberton              Tel:  +121-414-3388
c/o Dept. Rheumatology,        Fax:  +121-414-6794
Medical School,                E-mail: A.J.Pemberton at bham.ac.uk
The University of Birmingham,
Birmingham B15 2TT.
U.K.
*********************************************************************


From Marc.Logghe at devgen.com  Wed May  7 12:05:25 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed, 7 May 2003 14:05:25 +0200 
Subject: dbiflat question
Message-ID: <BEE28BF86078B6429D6C780635718E212E71B6@morelia.be.devgen.com>

Thanks for the reply!
This is what I have figured out: running
dbiflat -idformat genbank -directory /data/genbank -filenames gbest226.seq
in the index directory (e.g. /data/emboss/est) has the same effect as
running
dbiflat -idformat genbank -directory /data/genbank -indexdirectory
/data/emboss/est -filenames gbest226.seq
that is, the index files are created in the desired place. But the
sequences themselves are still not accessible using the entry mentioned in
emboss.default ('seqret est -firstonly' gives a segmentation fault). I
suppose the 'directory' key should point to the index directory, right?
Because the index itself should point to the correct sequence path.
At least that is what I expected.
And indeed, as suggested by Tony, everything worked fine when the index
and sequence files were put in the same directory (indexdirectory and
directory are the same).
OK, I just tried something that appears to work. Going back to the first
scenario (separate paths for index and sequence files): when I changed
the emboss.default entry to the following, everything worked fine:

DB est [
   type: N
   format: genbank
   method: emblcd
   indexdirectory: /data/emboss/est
   directory: /data/genbank
]

Apparently you have to set both indexdirectory and directory explicitly in
the configuration file as well; pointing to the index directory alone is not
sufficient!
Regards,
Marc
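
The difference between the two emboss.default entries above can be checked
mechanically. What follows is a minimal Python sketch, not part of EMBOSS. It
assumes the simple "DB name [ key: value ... ]" layout shown in this thread,
and the helper names parse_db_entries and lacks_index_directory are made up
for illustration. It flags emblcd entries that name no index directory, which
is only a hint: a single directory is enough when index and sequence files
live together.

```python
import re

# Matches 'DB name [ ... ]' blocks; assumes no ']' inside attribute values.
DB_BLOCK = re.compile(r"^\s*DB\s+(\S+)\s*\[(.*?)\]", re.DOTALL | re.MULTILINE)


def parse_db_entries(text):
    """Return {db_name: {attribute: value}} for every DB block in the text."""
    entries = {}
    for name, body in DB_BLOCK.findall(text):
        attrs = {}
        for line in body.splitlines():
            line = line.strip()
            if not line or line.startswith("#") or ":" not in line:
                continue
            key, value = line.split(":", 1)
            attrs[key.strip().lower()] = value.strip().strip('"')
        entries[name] = attrs
    return entries


def lacks_index_directory(attrs):
    """True for an emblcd entry that names no separate index directory."""
    if attrs.get("method") != "emblcd":
        return False
    # Both 'indexdir' and 'indexdirectory' appear in this thread,
    # so match on the common prefix.
    return not any(key.startswith("indexdir") for key in attrs)


# The working entry from the message above.
WORKING = """\
DB est [
   type: N
   format: genbank
   method: emblcd
   indexdirectory: /data/emboss/est
   directory: /data/genbank
]
"""

for db_name, attributes in parse_db_entries(WORKING).items():
    print(db_name, "needs a closer look" if lacks_index_directory(attributes) else "ok")
```

Run against the first version of the entry (directory only), the same loop
would flag est for a closer look.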
 


From Stephan.Hurling at evotecoai.com  Thu May  8 12:41:50 2003
From: Stephan.Hurling at evotecoai.com (Stephan.Hurling at evotecoai.com)
Date: Thu, 8 May 2003 14:41:50 +0200
Subject: Problems with dbigcg...
Message-ID: <OFF5BDE2CA.9C1FDB2B-ONC1256D20.00438AAD@evotecoai.com>

Hello Everyone,

I would like to use EMBOSS version 2.6.0 together with the GCG Wisconsin
package version 10.3 on a Red Hat 7.2 Linux server. I followed the
installation instructions from the administrators guide, doing the usual

./configure
make 
make install

EMBOSS compiled and installed on my system without any error messages.
But when I try to make indexes from a GCG database I run into trouble. See
the following output of an interactive session:

14:18 [root at kepler] ~/Temp # dbigcg
Index a GCG formatted database
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
   GENBANK : Genbank, DDBJ
       PIR : NBRF
Entry format [EMBL]:
Database directory [.]: /usr/local/share/EMBOSS/data/GCG_DATABASES/gcgembl
Wildcard database filename [*.seq]:
Database name: embl
Release number [0.0]: 73.0
Index date [00/00/00]: 01/12/02

   EMBOSS An error in embdbi.c at line 590:
Cannot open embl.idsrt for reading

14:21 [root at kepler] ~/Temp # ll
total 252k
drwxr-xr-x    2 root     root         4.0k May  8 14:18 ./
drwxr-x---   24 root     root         4.0k May  8 14:15 ../
-rw-------    1 root     root         1.4M May  8 14:18 core
-rw-r--r--    1 root     root         1.5k May  8 14:18 division.lkp
-rw-r--r--    1 root     root          675 May  8 14:18 embl001.acnum
-rw-r--r--    1 root     root          104 May  8 14:18 embl002.acnum
-rw-r--r--    1 root     root          504 May  8 14:18 embl003.acnum
-rw-r--r--    1 root     root           51 May  8 14:18 embl004.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl005.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl006.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl007.acnum
-rw-r--r--    1 root     root          126 May  8 14:18 embl008.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl009.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl010.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl011.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl012.acnum
-rw-r--r--    1 root     root           36 May  8 14:18 embl013.acnum
-rw-r--r--    1 root     root          25k May  8 14:18 embl014.acnum
-rw-r--r--    1 root     root          161 May  8 14:18 embl015.acnum
-rw-r--r--    1 root     root          850 May  8 14:18 embl016.acnum
-rw-r--r--    1 root     root         1.3k May  8 14:18 embl017.acnum
-rw-r--r--    1 root     root         2.6k May  8 14:18 embl018.acnum
-rw-r--r--    1 root     root          121 May  8 14:18 embl019.acnum
-rw-r--r--    1 root     root          290 May  8 14:18 embl020.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl021.acnum
-rw-r--r--    1 root     root          104 May  8 14:18 embl022.acnum
-rw-r--r--    1 root     root          188 May  8 14:18 embl023.acnum
-rw-r--r--    1 root     root            0 May  8 14:18 embl024.acnum
-rw-r--r--    1 root     root           34 May  8 14:18 embl025.acnum
-rw-r--r--    1 root     root           14 May  8 14:18 embl026.acnum
-rw-r--r--    1 root     root          490 May  8 14:18 embl027.acnum
-rw-r--r--    1 root     root          300 May  8 14:18 entrynam.idx
-rw-------    1 root     root            0 May  8 14:18 sort9YCQfK

Can somebody help me? Have I done something wrong during the compilation
step of EMBOSS? Any hint would help me.

Thanks in advance...

All the best,


Stephan

From ablavier at wanadoo.fr  Sun May 11 18:36:24 2003
From: ablavier at wanadoo.fr (=?iso-8859-1?Q?Andr=E9_Blavier?=)
Date: Sun, 11 May 2003 20:36:24 +0200
Subject: EMBOSS for Windows: DLL  build
Message-ID: <001e01c317ec$3c9feb60$5ca03551@bach>

EMBOSS for Windows is now built with ajax and nucleus compiled as DLLs, so
the EMBOSS programs are now much smaller, and the distribution as well.
dbiblast is now in the package. See
http://perso.wanadoo.fr/ablavier/embosswin/embosswin.html.

    -- André Blavier


From arunanirudhan at yahoo.co.in  Mon May 12 08:27:44 2003
From: arunanirudhan at yahoo.co.in (=?iso-8859-1?q?arun=20anirudhan?=)
Date: Mon, 12 May 2003 09:27:44 +0100 (BST)
Subject: seqret
Message-ID: <20030512082744.65129.qmail@web8203.mail.in.yahoo.com>

Hello all,

How can I use seqret to retrieve sequences from a database, as we do in
Entrez? For example, I want to get the sequences of all insulins from
GenBank. What should I give as the command?

seqret embl:insulin ?

Arun

From maoj at mail.nih.gov  Mon May 12 14:16:05 2003
From: maoj at mail.nih.gov (Jean Mao)
Date: Mon, 12 May 2003 10:16:05 -0400
Subject: about ftp site of EMBOSS Administrators Guide
Message-ID: <00e801c31891$0c341410$618a70a5@citjmao>

Hi, where can I find the PDF version of the EMBOSS Administrators Guide? The link on the website doesn't work. Thanks.

Jean

From gwilliam at hgmp.mrc.ac.uk  Mon May 12 14:40:35 2003
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Mon, 12 May 2003 15:40:35 +0100
Subject: about ftp site of EMBOSS Administrators Guide
References: <00e801c31891$0c341410$618a70a5@citjmao>
Message-ID: <3EBFB263.95B0DDAB@hgmp.mrc.ac.uk>

There is no PDF version of the current guide.
The link on the web was left there by accident and has now been tidied
away - sorry.

Gary

> Jean Mao wrote:
> 
> Hi, where can I find the pdf version of emboss administrators guide?
> the link on the website doesn't work. thanks.
> 
> Jean

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at rfcgr.mrc.ac.uk          http://www.rfcgr.mrc.ac.uk/
Bioinformatics, MRC RFCGR, Hinxton, Cambridge, CB10 1SB, UK


From gwilliam at hgmp.mrc.ac.uk  Mon May 12 15:40:10 2003
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Mon, 12 May 2003 16:40:10 +0100
Subject: about ftp site of EMBOSS Administrators Guide
References: <00e801c31891$0c341410$618a70a5@citjmao> <3EBFB263.95B0DDAB@hgmp.mrc.ac.uk>
Message-ID: <3EBFC05A.16A96393@hgmp.mrc.ac.uk>

The .ps and .pdf versions of the current guide are now on the web page:
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/admin.html

See:
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Doc/Admin_guide/admin.ps
and
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Doc/Admin_guide/admin.pdf

Gary


> > Jean Mao wrote:
> >
> > Hi, where can I find the pdf version of emboss administrators guide?
> > the link on the website doesn't work. thanks.


-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at rfcgr.mrc.ac.uk          http://www.rfcgr.mrc.ac.uk/
Bioinformatics, MRC RFCGR, Hinxton, Cambridge, CB10 1SB, UK


From arunanirudhan at yahoo.co.in  Tue May 13 07:25:13 2003
From: arunanirudhan at yahoo.co.in (=?iso-8859-1?q?arun=20anirudhan?=)
Date: Tue, 13 May 2003 08:25:13 +0100 (BST)
Subject: Fwd: seqret
Message-ID: <20030513072513.53308.qmail@web8204.mail.in.yahoo.com>


Note: forwarded message attached.

From yann-francois.bizouerne at bayercropscience.com  Thu May 15 15:29:46 2003
From: yann-francois.bizouerne at bayercropscience.com (yann-francois.bizouerne at bayercropscience.com)
Date: Thu, 15 May 2003 17:29:46 +0200
Subject: Search for organism of entry
Message-ID: <OFC9485941.7B3AAB7B-ONC1256D27.0050FA81-C1256D27.00551F6E@bayer-ag.com>

Hello,

I recently installed EMBOSS on our server. I really enjoy the different
tools, but I have a small problem and I can't find the solution anywhere.
I have indexed the SwissProt database with the command line:

      dbiflat -idformat SWISS -directory . -filenames sprot.dat -dbname sprot
-fields acnum,seqvn,des,keyword,taxon

After that I can find sequence information when searching for a
particular organism or keyword.

For the moment, what I can retrieve with the accession number is the
following:

   >infoseq sprot:P15711
   Displays some simple information about sequences
   # USA             Name        Accession Type Length     Description
   ian-id:104K_THEPA 104K_THEPA    P15711  P    924        104 kDa
   microneme-rhoptry antigen.

And when I search by organism I obtain:
   >infoseq sprot-org:"*Theileria*" -outfile stdout
   Displays some simple information about sequences
   # USA             Name        Accession Type Length     Description
   sprot-id:104K_THEPA 104K_THEPA    P15711  P    924        104 kDa
   microneme-rhoptry antigen.


So now I want to know whether, for one particular entry (sprot:P15711), I can
find the organism (Theileria parva) or not?

Thanks in advance for your answer.


Yann-François BIZOUERNE
BioInformatic Team
BAYER CropScience
1, rue Pierre Fontaine
91058 Evry Cedex
FRANCE
Phone:      33-(0) 1-69-47-61-56
FAX:        33-(0) 1-69-47-61-42
E-mail:     yann-francois.bizouerne at bayercropscience.com
Intranet: http://bioinfo.evry.fr.bayercropscience/


From pmr at ebi.ac.uk  Thu May 15 16:26:31 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 15 May 2003 17:26:31 +0100
Subject: Search for organism of entry
References: <OFC9485941.7B3AAB7B-ONC1256D27.0050FA81-C1256D27.00551F6E@bayer-ag.com>
Message-ID: <3EC3BFB7.5000605@ebi.ac.uk>

yann-francois.bizouerne at bayercropscience.com wrote:

> And when I am looking with the organism I obtain :
>    >infoseq sprot-org:"*Theileria*" -outfile stdout
>    Displays some simple information about sequences
>    # USA             Name        Accession Type Length     Description
>    sprot-id:104K_THEPA 104K_THEPA    P15711  P    924        104 kDa
>    microneme-rhoptry antigen.
> 
> 
> So now I want to know if I could  for one particular entry (sprot:P15711) find
> the Organism (Theileria parva) or not ?

EMBOSS can search a database by organism, but it reads only the
sequence (in most programs) or the whole entry (entret), not individual
fields such as the organism

... but I am looking into ways to parse out more detail, including 
organism, citation, and features. The database definition would have a 
list of fields that can be retrieved, and a program like (for example) 
entret could check the fields and let you choose the ones you need.

For now, you can run entret and look for the organism in the text.

Hope this helps,

Peter Rice
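
As a stopgap for picking the organism out of the entry text, a small filter
over the output of entret may be enough. This Python sketch assumes a
SwissProt-style flat-file entry in which the organism is given on lines
starting with "OS"; the function name organism_from_entry and the abbreviated
sample entry are illustrative only.

```python
def organism_from_entry(entry_text):
    """Return the organism named on the OS line(s) of a flat-file entry, or None."""
    # OS lines may wrap, so collect all of them and join the pieces.
    parts = [line[2:].strip() for line in entry_text.splitlines()
             if line.startswith("OS ")]
    organism = " ".join(parts).rstrip(".")
    return organism or None


# Hand-abbreviated sample, not a complete database entry.
SAMPLE = """\
ID   104K_THEPA
AC   P15711;
DE   104 kDa microneme-rhoptry antigen.
OS   Theileria parva.
"""

print(organism_from_entry(SAMPLE))
```

Given the text that entret writes for sprot:P15711, the same function should
return the organism name; a grep for lines beginning with OS would do much the
same job from the shell.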


From henrikki.almusa at helsinki.fi  Tue May 20 06:42:36 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Tue, 20 May 2003 09:42:36 +0300
Subject: Support for nexus in alignment format
Message-ID: <200305200942.36206.henrikki.almusa@helsinki.fi>

Hello

I read through the alignment formats that EMBOSS supports. I was wondering if
nexus is supported as an alignment format (-aformat nexus)? It seems to be
supported as a sequence format, but it wasn't mentioned as an alignment format.

-- 
Henrikki Almusa


From pmr at ebi.ac.uk  Tue May 20 09:29:57 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 20 May 2003 10:29:57 +0100
Subject: Support for nexus in alignment format
References: <200305200942.36206.henrikki.almusa@helsinki.fi>
Message-ID: <3EC9F595.5000008@ebi.ac.uk>

Henrikki Almusa wrote:
> I read through the alignment formats that emboss supports. I was wondering if 
> nexus is supported as alignment format (-aformat nexus)? I seems to be 
> supported as sequence format but it wasnt mentioned as alignment format.

The sequence formats are easy to add as alignment formats. Not sure 
quite how useful that is.

You can do this:

1. Create your alignment in a sequence format (FASTA, MSF)

2. Use seqret to convert to nexus format

... or does NEXUS format hold some extra information that would make it 
a useful alignment format, and that we lose by going through FASTA?

Hope this helps,

Peter


From yann-francois.bizouerne at bayercropscience.com  Wed May 21 08:53:19 2003
From: yann-francois.bizouerne at bayercropscience.com (yann-francois.bizouerne at bayercropscience.com)
Date: Wed, 21 May 2003 10:53:19 +0200
Subject: Use two Emboss package with one database
Message-ID: <OF12D15AFD.92630898-ONC1256D2D.003032C5-C1256D2D.0030D3F0@bayer-ag.com>

Hello,

I am working with two different servers at different locations. An EMBOSS
package is installed on each of them.
I need to know whether I can configure these two EMBOSS installations to work
with the same database (which is located on one of the two servers).
Can EMBOSS work this way, or do I need to have only one EMBOSS package
(tools + database) installed on one server?

I hope that my question is clear enough.

Best Regards


Yann-François BIZOUERNE
BioInformatic Team
BAYER CropScience
1, rue Pierre Fontaine
91058 Evry Cedex
FRANCE
Phone:      33-(0) 1-69-47-61-56
FAX:        33-(0) 1-69-47-61-42
E-mail:     yann-francois.bizouerne at bayercropscience.com
Intranet: http://bioinfo.evry.fr.bayercropscience/


From pmr at ebi.ac.uk  Wed May 21 08:59:18 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 21 May 2003 09:59:18 +0100
Subject: Use two Emboss package with one database
References: <OF12D15AFD.92630898-ONC1256D2D.003032C5-C1256D2D.0030D3F0@bayer-ag.com>
Message-ID: <3ECB3FE6.1020906@ebi.ac.uk>

yann-francois.bizouerne at bayercropscience.com wrote:
> Hello,
> 
> I am working with 2 diffretns servers on different locations. On each of them a
> EMBOSS package tools is installed.
> I need to know if I could configure these 2 EMBOSS in order to work with the
> same database (which is located on one of the two servers).
> Is EMBOSS could working this way or does I need to have only one EMBOSS package
> (tools + databse) installed on one server ?

Yes ... but you need to do some work.

The EMBOSS package on the same server as the databases is easy.

The second EMBOSS package needs to read from remote databases. I assume
you indexed them with dbiflat (and the other dbi programs).

You can access a remote database by:

SRSWWW if it is on an SRS server
URL if you have a web page to query the database
APP (EXTERNAL) if you have a script that can return an entry

Assuming you don't have them under SRS ...

You can provide a simple web CGI script that runs entret (for the whole
entry) or seqret (for the sequence only - you can put -osformat on the
command line to get the format of your choice).

You can write a script that will access the databases somehow (possibly 
also by talking to a web page - your choice).

Meanwhile, I am working on ways to define EMBOSS web services and data 
services that would give an alternative access method, but that is for 
later in the year.

Hope this helps,

Peter Rice
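
The core of such a script could look roughly like the Python sketch below. It
is an illustration under assumptions, not a tested CGI program: the function
names are invented, the check on the database and entry names is deliberately
strict because the values would come from a web request, and the entret
qualifiers used (-outfile stdout, -auto) should be confirmed against
"entret -help" on the server.

```python
import re
import subprocess

# Only plain database and entry names are accepted from the web side.
SAFE_TOKEN = re.compile(r"[A-Za-z0-9_.]+")


def entret_command(db, entry_id):
    """Build the argument list for fetching one whole entry with entret."""
    if not (SAFE_TOKEN.fullmatch(db) and SAFE_TOKEN.fullmatch(entry_id)):
        raise ValueError("unexpected characters in database or entry name")
    # Passing a list (no shell) keeps the request text out of any shell parsing.
    return ["entret", f"{db}:{entry_id}", "-outfile", "stdout", "-auto"]


def fetch_entry(db, entry_id, run=subprocess.run):
    """Run entret on the database server and return the entry text."""
    result = run(entret_command(db, entry_id),
                 capture_output=True, text=True, check=True)
    return result.stdout
```

A CGI or other web wrapper would call fetch_entry with the two request
parameters and return the text as plain text. The run argument exists only so
the function can be exercised without EMBOSS installed.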


From Marc.Logghe at devgen.com  Wed May 21 09:03:00 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed, 21 May 2003 11:03:00 +0200
Subject: Use two Emboss package with one database
Message-ID: <BEE28BF86078B6429D6C780635718E212E71E6@morelia.be.devgen.com>

Hi, 
At our site emboss is installed on every node of a cluster while the
databases are installed on only one.
The only thing you have to do is mount the database directory/directories in
one way or another on every node and adapt the emboss.default files
appropriately (if necessary) so that the DB entries are pointing to the
correct directories.
HTH,
Marc



From d.m.a.martin at dundee.ac.uk  Wed May 21 09:04:24 2003
From: d.m.a.martin at dundee.ac.uk (David Martin)
Date: Wed, 21 May 2003 10:04:24 +0100
Subject: Use two Emboss package with one database
In-Reply-To: <OF12D15AFD.92630898-ONC1256D2D.003032C5-C1256D2D.0030D3F0@bayer-ag.com>
Message-ID: <BAF0FFA8.28E5%d.m.a.martin@dundee.ac.uk>

On 21/5/03 9:53 am, "yann-francois.bizouerne at bayercropscience.com"
<yann-francois.bizouerne at bayercropscience.com> wrote:

> Hello,
> 
> I am working with 2 diffretns servers on different locations. On each of them
> a
> EMBOSS package tools is installed.
> I need to know if I could configure these 2 EMBOSS in order to work with the
> same database (which is located on one of the two servers).
> Is EMBOSS could working this way or does I need to have only one EMBOSS
> package
> (tools + databse) installed on one server ?

There are two options here:

1. Different platforms accessing the same database
2. same platform (different machines) accessing the same database.

1. Easy. When you run configure, set the prefix (prefixes) appropriately so
that the executables go to an appropriate place and the databases point to a
shared (NFS or similar) drive containing the config files. Obviously you
have to use the same mountpoint on all your machines for this to work. (I use
/site/share/EMBOSS for the config and /site/databases as a root for the
databases. In this case /site/bin is local, not shared, and contains the
appropriate binaries [or can be a symlink to /site/Linux/bin,
/site/IRIX/bin, /site/Solaris/bin, /site/Darwin/bin as appropriate if you
are supporting more than one machine on a particular platform].)

2. Can be done in the same way using a shared drive for the executables. One
gotcha is that EMBOSS, despite all efforts, does not compile statically so
you have to ensure that the library versions are the same across the various
platforms or you will get runtime errors.

Either of these methods will reduce the maintenance load considerably.

In my case I use NFS for the data directories and use a nightly scheduled
rsync to synchronise the executables and config files with the master
machine as these don't take much space and it reduces the network overhead.

Hope this helps.

..d

-- 
David Martin PhD
Bioinformatics Scientific Officer
Post-Genomics and Molecular Interactions Centre
University of Dundee


From d.m.a.martin at dundee.ac.uk  Wed May 21 09:13:25 2003
From: d.m.a.martin at dundee.ac.uk (David Martin)
Date: Wed, 21 May 2003 10:13:25 +0100
Subject: Use two Emboss package with one database
In-Reply-To: <3ECB3FE6.1020906@ebi.ac.uk>
Message-ID: <BAF101C5.28EA%d.m.a.martin@dundee.ac.uk>

On 21/5/03 9:59 am, "Peter Rice" <pmr at ebi.ac.uk> wrote:

> yann-francois.bizouerne at bayercropscience.com wrote:
>> Hello,
>> 
>> I am working with 2 diffretns servers on different locations. On each of them
>> a
>> EMBOSS package tools is installed.
>> I need to know if I could configure these 2 EMBOSS in order to work with the
>> same database (which is located on one of the two servers).
>> Is EMBOSS could working this way or does I need to have only one EMBOSS
>> package
>> (tools + databse) installed on one server ?
> 
> Yes ... but you need to do some work.
> 
> The EMBOSS package on the same server as the databases is easy.
> 
> The second EMBOSS package needs to read from remote databases. I assume
> you indexed them with dbiflat (an the other dbi programs).
> 
> You can access a remote database by:
> 
> SRSWWW if it on an SRS server
> URL if you have a web page to query the database
> APP (EXTERNAL) if yuo have a script that can return an entry
> 
> Assuming you don't have them under SRS ...
> 
> You can provide a simple web CGI script that runs entret (for whole
> entry) or seqret (for sequence only - you can put -osformat on the
> command line to get the format of your choice))
> 
> You can write a script that will access the databases somehow (possibly
> also by talking to a web page - your choice).
> 
> Meanwhile, I am working on ways to define EMBOSS web services and data
> services that would give an alternative access method, but that is for
> later in the year.
> 

What about just using Jemboss (or a variant thereof) to talk to the master
machine? The alternative is shunting lots of data around which isn't really
feasible unless you have a fast network between the machines.

Do you move the data to the problem or the problem to the data? The trade
off is between transport time and execution time.

Does Jemboss make use of a SOAP server? If not it would be really nice to
have a script that could generate a WSDL definition from the ACD files.

It's then one step away from being a Grid service..

..d 

 
-- 
David Martin PhD
Bioinformatics Scientific Officer
Post-Genomics and Molecular Interactions Centre
University of Dundee


From maoj at mail.nih.gov  Wed May 21 16:21:43 2003
From: maoj at mail.nih.gov (Jean Mao)
Date: Wed, 21 May 2003 12:21:43 -0400
Subject: question about databases setup
Message-ID: <038001c31fb5$1171acf0$618a70a5@citjmao>

Hi, I am new to EMBOSS and have a question about database setup.

I have a file called 'test.dat' in the directory /data/maoj/emboss/db/mouse/.
This file has 9 entries in EMBL format. I ran dbiflat, and acnum.hit,
acnum.trg, division.lkp and entrynam.idx were generated.

Then I set up a .embossrc file in my home directory as follows:
-------------------------------------------------------------------------------------------
# Logfile - set this to a file that any user can append to
# and EMBOSS applications will automatically write log information

# SET emboss_logfile /home/db/emboss/tmp/log

DB test [
        type: N
        method: emblcd
        format: embl
        dir: /data/maoj/emboss/db/mouse
        file: "*.dat"
        release: "1"
        comment: "Test DB"
        ]
-------------------------------------------------

When I run seqret and try to retrieve one of the 9 entries, I get the following error:

% seqret
Reads and writes (returns) sequences
Input sequence(s): test:AB001363
Warning: Cannot open division file '<null>' for database 'test'
Warning: seqCdQry failed
Error: Unable to read sequence 'test:AB001363'

Please help. Thank you in advance.

Jean

From pmr at ebi.ac.uk  Wed May 21 16:30:43 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 21 May 2003 17:30:43 +0100
Subject: question about databases setup
References: <038001c31fb5$1171acf0$618a70a5@citjmao>
Message-ID: <3ECBA9B3.3070802@ebi.ac.uk>

Jean Mao wrote:
> Hi, I am new to emboss. have question in database setup.
>  
> I have a file in the directory /data/maoj/emboss/db/mouse/ called 
> 'test.dat'. this file has 9 entries in embl format. i ran dbiflat, 
> acnum.hit, acnum.trg, division.lkp, entrynam.idx were generated.

You need to specify where the index files are (indexdir) in the database 
definition.

Hope this helps,

Peter Rice
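
Before adding the index directory to the definition, it can help to confirm
which directory actually holds the index files. The Python sketch below is a
hypothetical helper, not an EMBOSS tool. It checks only for division.lkp and
entrynam.idx, the two files named in this thread, on the assumption that both
are needed for an emblcd lookup.

```python
from pathlib import Path

# File names taken from the dbiflat output listed in this thread. Extra
# .hit/.trg files for additional index fields are not checked here.
REQUIRED_INDEX_FILES = ("division.lkp", "entrynam.idx")


def missing_index_files(index_dir):
    """Return the required index files that are absent from index_dir."""
    directory = Path(index_dir)
    return [name for name in REQUIRED_INDEX_FILES
            if not (directory / name).is_file()]
```

An empty list for the directory where dbiflat was run suggests that this is
the path to give as the index directory in the DB definition.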


From mathog at mendel.bio.caltech.edu  Wed May 21 22:04:02 2003
From: mathog at mendel.bio.caltech.edu (David Mathog)
Date: Wed, 21 May 2003 15:04:02 -0700
Subject: extractfeat with gff files?
Message-ID: <E19Ibgo-0002ya-00@mendel.bio.caltech.edu>

EMBOSS 2.6.0

I cannot seem to locate the magic incantation that will make gff
files work as desired with fasta files.  HELP!

Here's the sort of line I want to extract (sorry about the wrap):

X       gadfly  translation     1880    3119    .       -       .      
genegrp=CG3038; transgrp=CG3038-RB

There are many other lines in the gff for transcription,exon,
gene, etc. which should not be extracted.  The fasta
input file currently has entries with names like X,2L,etc.
which correspond to the first column of the gff file.
Ideally I'd like to be able to use one gff file
(with X->3R in the first column) to extract
from one fasta file (again, with X->3R for the fasta name),
and have the descriptions of X act only on the sequence X,
and so forth. The idea being to be able to extract features
on a genomic level using only one fasta/gff pair, rather than
N (=#of scaffolds) pairs.


First though I tried an input fasta file containing 
(11 X 10kb entries, the first being X) and a gff file also starting
only with X (but with references for the whole chromosome, 22101
lines).  The following command sat for about two minutes, burned
a lot of CPU time, but emitted nothing:

extractfeat -sequence=dmel_genome_frag.nfa\
  -ufo=x.gff -type=translation -outseq=x.nfa

When the -type qualifier was removed it went nuts and emitted 
over 40000 entries (more than there were lines in the gff file!)
before I killed it.  Clearly there was no error checking for
size of gff entry versus size of sequence.  The input fasta
file had 11 entries of 10000 bp each.  The first was X.  Yet
a bunch of lines like:

>X_12390_12854 [exon] X release:3 length:21780003bp Assembled X
chromosome arm sequence md5:f3fbbb4c44f0d30d1effeecc87b5bd18
T

were emitted.  So the fasta file was reduced to just one entry
(X, 10kb) and this time the output fasta file held 22101 entries.
As before, those beyond 10kb were emitted with a single base.
So apparently the entire gff description is applied to each fasta
sequence and there's no checking of the first column against
the sequence name. That's ok - we can live with that for now,
but it would be better if the descriptions could automatically
be matched to the sequence names.

I'm not sure though that we can live with it emitting single
bp sequences when the description is outside of the sequence.
If the feature is beyond the end of the input sequence it just
isn't there, right?
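
A possible pre-filter for the gff file, as a minimal sketch only: it
assumes tab-separated GFF with the sequence name in column 1 and the
start/end in columns 4 and 5. The function name filter_gff and the
keep_partial switch are made up for illustration and are not part of
EMBOSS.

```python
def filter_gff(gff_lines, seq_lengths, keep_partial=True):
    """Yield only the GFF lines that can apply to a known sequence.

    seq_lengths maps a sequence name (GFF column 1) to its length in bp.
    Lines for unknown sequences, or starting beyond the sequence end,
    are dropped. Features that start inside the sequence but run past
    its end are kept only when keep_partial is true.
    """
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # not a feature line
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        length = seq_lengths.get(seqid)
        if length is None or start < 1 or start > length:
            continue  # wrong sequence, or entirely outside it
        if end > length and not keep_partial:
            continue  # overhanging feature, not wanted
        yield line
```

Writing the surviving lines to a new gff file before running
extractfeat should avoid the single-base entries, at the cost of an
extra step.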

Just to spite me "translation" was never emitted.
There were only lines for gene, exon, misc_feature, tRNA, snoRNA.

So I tried:

 extractfeat -sequence=dmel_x.nfa \
  -ufo=x.gff -outseq=x.nfa -type=gene

And it emitted a single whole gene match at (1488,3280,-) correctly.
The next one at (3445,11463,+) partially (and correctly, ending
at the end of the sequence - a warning would have been nice)
and then a slew of (>2000) single base pair "empty" entries
outside of the input sequence.  Note also that there's no indication
on the fasta header line in the output of the strand which
was selected. 

So, how does one get extractfeat to emit only matches
to "translation"? Please tell me there's some way other
than by extracting those lines into a separate
gff file and renaming them all "gene"!

Extractfeat seems to have a predefined set of "features" that it's
willing to work with and doesn't handle others well.  To
narrow this down a bit more I made a small gff file containing
"fred" where "gene" had been and specifying positions <10kb.
The features were emitted but all were labeled "misc_feature".
Is this documented somewhere?  It isn't in an obvious
place in the on line help, as both of these searches come up
empty.

 extractfeat -h 2>&1 | grep -i misc
 tfm extractfeat 2>&1 | grep -i misc


It would also be nice if there was some way to get column 9 from
the gff file onto the fasta header line somewhere.  (It can
then be rearranged to suit later.)  Currently even if one
has the gene names lined up with the gene entries in the gff
file the resulting fasta file just says "X_100_123 [gene]..."
without any of the comment info.  You've got the sequence
but not the names of the genes.  Very painful to work with if
the output is the coding sequences for an entire genome.
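
What is being asked for could look roughly like the sketch below. It
is not how extractfeat works; it is a stand-alone illustration that
assumes tab-separated GFF with the feature type in column 3, the
strand in column 7 and the free-text attributes in column 9. The
function names are hypothetical.

```python
# Complement table for plain DNA; other symbols are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_features(sequences, gff_lines, feature_type):
    """Yield (header, sequence) pairs for one feature type.

    sequences maps a sequence name to its sequence string. Features on
    an unknown sequence, or starting beyond its end, are skipped. The
    header carries the strand and GFF column 9 after the usual
    name_start_end [type] prefix.
    """
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue  # comment, short line, or another feature type
        seqid, strand = cols[0], cols[6]
        start, end = int(cols[3]), int(cols[4])
        seq = sequences.get(seqid)
        if seq is None or start < 1 or start > len(seq):
            continue  # nothing to extract
        sub = seq[start - 1:end]  # slicing truncates at the sequence end
        if strand == "-":
            sub = reverse_complement(sub)
        header = f">{seqid}_{start}_{end} [{feature_type}] strand={strand} {cols[8]}"
        yield header, sub
```

Called with feature_type="translation", this emits only those
features, each with its strand and comment text on the header line.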


Is there a switch (or bug fix) that stops extractfeat
from emitting garbage single bp entries for descriptions
outside the sequence?

Thanks,


David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From yann-francois.bizouerne at bayercropscience.com  Thu May 22 14:26:36 2003
From: yann-francois.bizouerne at bayercropscience.com (yann-francois.bizouerne at bayercropscience.com)
Date: Thu, 22 May 2003 16:26:36 +0200
Subject: creation of new output fasta format
Message-ID: <OFD4207DB1.0DC2618F-ONC1256D2E.004D554E-C1256D2E.004F56FC@bayer-ag.com>

Hello,

First, thanks a lot for your quick response to my last mail.

Now I am trying to create a new fasta output format. The format I want to obtain is:
>dbname:id |accession|organism|description
To do this I created a new function in ajseqwrite.c (seqWriteNewFasta). I
selected the different pieces of information I want to retrieve by following
the examples of other functions.
It is working quite well, except for the Pir and Nrl_3D databases.
For these databases I get no database name and no organism (taxon):

  /** Database name **/
  if (ajStrLen(outseq->Db))
    (void) ajFmtPrintF (outseq->File, ">%S:", outseq->Db);
  else if (ajStrLen(outseq->Setdb))
    (void) ajFmtPrintF (outseq->File, ">%S:", outseq->Setdb);
  else
    (void) ajFmtPrintF (outseq->File, ">unk:");


  /** Organism **/
  if (ajStrLen(outseq->Tax))
    (void) ajFmtPrintF (outseq->File, "%S|", outseq->Tax);


I tried to find some information about the NBRF format in EMBOSS and how to
use it, but I could not find anything.
Do you have a clue for me?
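
For reference, the intended header layout can be written down as a
small sketch. This is only an illustration of the format, not EMBOSS
code; the function name and the choice to leave a missing field empty
(so the number of "|" separators stays constant) are assumptions.

```python
def new_fasta_header(seq_id, db="", accession="", organism="", description=""):
    """Build a header of the form >dbname:id |accession|organism|description.

    A missing database name becomes "unk", as in the C snippet above.
    Other missing fields are left empty so every header has the same
    number of '|' separators.
    """
    db = db or "unk"
    return f">{db}:{seq_id} |{accession}|{organism}|{description}"
```

With this layout an entry that has no taxon still gives three "|"
characters, which keeps the header easy to split later.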

Best regards


Yann-François BIZOUERNE
BioInformatic Team
BAYER CropScience
1, rue Pierre Fontaine
91058 Evry Cedex
FRANCE
Phone:      33-(0) 1-69-47-61-56
FAX:        33-(0) 1-69-47-61-42
E-mail:     yann-francois.bizouerne at bayercropscience.com
Intranet: http://bioinfo.evry.fr.bayercropscience/


From peptides at earthlink.net  Sat May 24 22:55:53 2003
From: peptides at earthlink.net (David Stephens)
Date: Sat, 24 May 2003 15:55:53 -0700
Subject: Happy Memorial Day
Message-ID: <20030524225553.29CB27D181@mercury.hgmp.mrc.ac.uk>


From henrikki.almusa at helsinki.fi  Tue May 27 10:46:13 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Tue, 27 May 2003 13:46:13 +0300
Subject: Graph data handling
Message-ID: <200305271339.51333.henrikki.almusa@helsinki.fi>

Hello,

I'm trying to use graphs in scripts outside of EMBOSS. However, I ran into 
problems with the options concerning graph handling. I found the following 
options on the web page of the "banana" tool:

  "-graph" related qualifiers
   -gprompt             boolean    Graph prompting
   -gtitle              string     Graph title
   -gsubtitle           string     Graph subtitle
   -gxtitle             string     Graph x axis title
   -gytitle             string     Graph y axis title
   -goutfile            string     Output file for non interactive displays
   -gdirectory          string     Output directory

I tried to use -goutfile and -gdirectory with banana, but I seem to be unable 
to affect the data file(s) or their directories. 

If I understand correctly, this should work: "banana mRNA.seq -graph data 
-goutfile /home/hena/banana_data_file -auto", or else "banana mRNA.seq -graph 
data -goutfile banana_data_file -gdirectory /home/hena". For the second I get 
the error "Died: unknown qualifier -gdirectory", and with the first I get 
"Created banana_data_file.dat", but no such file is created and no data file 
is there. Also, if I use the "-data" option there, I get multiple bananaX.dat 
files (where X is a running number).

So, my questions are: how exactly is this supposed to work? Could it 
be added to the "User documentation" section of the web pages, with the other 
formats? And thirdly, is there a reason why some programs expect the "-data" 
option and others do not?

TIA
--
Henrikki Almusa


From sebastien.frade at bayercropscience.com  Wed May 28 12:38:49 2003
From: sebastien.frade at bayercropscience.com (sebastien.frade at bayercropscience.com)
Date: Wed, 28 May 2003 14:38:49 +0200
Subject: No subject
Message-ID: <OF4617347F.32BA0711-ONC1256D34.0036ABA5-C1256D34.00457920@bayer-ag.com>

Hi,

I'm a new user of EMBOSS and I would like to extract some information from EMBL
flat files, such as clone, strain, tissue ..., that is stored in the FT section.
But I don't know how to do that.

I have looked for a tool that can extract features, but none of them extracts
these fields.

If such a tool doesn't exist, how can I develop it?

Please help me !!

Thanks


Sébastien Frade
BioInformatic Team
BAYER CropScience
1 Rue Pierre FONTAINE
91058 EVRY - France
tel :  33 (0) 1 69 47 61 52
fax :  33 (0) 1 69 47 61 42
mail : sebastien.frade at bayercropscience.com
http://bioinfo.evry.fr.bayercropscience

From pmr at ebi.ac.uk  Wed May 28 14:30:43 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 28 May 2003 15:30:43 +0100
Subject: 
References: <OF4617347F.32BA0711-ONC1256D34.0036ABA5-C1256D34.00457920@bayer-ag.com>
Message-ID: <3ED4C813.4010900@ebi.ac.uk>

sebastien.frade at bayercropscience.com wrote:
> Hi,
> 
> I'm a new user of EMBOSS and i like to extract some information of EMBL flat
> file like clone, strain, tissue ... that are stored in the FT section.
> But i don't know how to do that.
> 
> I've look for a tool that can extract features, but no one of them extract these
> fields.

This sounds like a task for SRS :-)

http://srs.ebi.ac.uk/

EMBOSS really works with the sequence data. We can try to extract more 
of the other data but it is a non-trivial task.

But ... you could write your own EMBOSS tool, and we can help you to do 
that!!!
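
Outside EMBOSS, a quick script can also pull these qualifiers straight
from the flat file. The following is a rough sketch, not an EMBOSS
tool: it assumes each qualifier sits on a single FT line in the form
/name="value" (values that wrap onto continuation lines are not
joined), and the function name and default qualifier list are chosen
here for illustration.

```python
import re

# An FT line holding a qualifier, e.g.:  FT          /clone="pXY1"
_QUALIFIER = re.compile(r'^FT\s+/(\w+)=(.*)$')


def ft_qualifiers(lines, wanted=("clone", "strain", "tissue_type")):
    """Yield (entry_id, qualifier, value) for the wanted FT qualifiers.

    entry_id is the first word after "ID" on the most recent ID line.
    Only single-line qualifier values are handled.
    """
    entry_id = None
    for line in lines:
        if line.startswith("ID "):
            entry_id = line.split()[1].rstrip(";")
            continue
        match = _QUALIFIER.match(line.rstrip("\n"))
        if match and match.group(1) in wanted:
            yield entry_id, match.group(1), match.group(2).strip('"')
```

Passing an open file object as lines walks through a whole flat file
one entry after another.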

Hope this helps

Peter Rice


From maoj at mail.nih.gov  Fri May 30 14:49:17 2003
From: maoj at mail.nih.gov (Jean Mao)
Date: Fri, 30 May 2003 10:49:17 -0400
Subject: question about dbiblast
Message-ID: <0d4401c326ba$a5847600$618a70a5@citjmao>

Hi, I am new to EMBOSS db configuration and need some help with indexing a blast db.

I have the following files in the directory:
-rw-rw-r--    1 maoj     Seqdb       190733 May 30 08:19 drosoph.nt.nhr
-rw-rw-r--    1 maoj     Seqdb        14108 May 30 08:19 drosoph.nt.nin
-rw-rw-r--    1 maoj     Seqdb         9360 May 30 08:19 drosoph.nt.nnd
-rw-rw-r--    1 maoj     Seqdb           84 May 30 08:19 drosoph.nt.nni
-rw-rw-r--    1 maoj     Seqdb       174584 May 30 08:19 drosoph.nt.nsd
-rw-rw-r--    1 maoj     Seqdb         3699 May 30 08:19 drosoph.nt.nsi
-rw-rw-r--    1 maoj     Seqdb     31368306 May 30 08:19 drosoph.nt.nsq

I believe these files are from NCBI and were generated using formatdb version 2.2.5. I ran dbiblast in this directory:
Index a BLAST database
Database name: drosoph
Database directory [.]: 
Wildcard database filename [drosoph]: drosoph.nt.*
Release number [0.0]: 
Index date [00/00/00]: 
         N : nucleic
         P : protein
         ? : unknown
Sequence type [unknown]: N
         1 : wublast and setdb/pressdb
         2 : formatdb
         0 : unknown
Blast index version [unknown]: 2

Then I got many lines of the following message:
Warning: Duplicate ID skipped: '0?0? ?^DROSOPHILA' All hits will point to first ID found

The following new files were generated:
-rw-rw-r--    1 maoj     Seqdb          322 May 30 10:44 division.lkp
-rw-rw-r--    1 maoj     Seqdb          496 May 30 10:44 entrynam.idx
-rw-rw-r--    1 maoj     Seqdb          300 May 30 10:44 acnum.trg
-rw-rw-r--    1 maoj     Seqdb          300 May 30 10:44 acnum.hit

I then edited my ~/.embossrc file by adding the following lines:
DB drosoph [
        type: N
        method: blast
        format: ncbi
        dir: /data/maoj/emboss/db/blast/drosoph
        indexdir: /data/maoj/emboss/db/blast/drosoph
        file: "drosoph.nt.*"
        release: "0.0"
        comment: "blast drosoph"
        ]

Then I ran showdb:
% showdb
Displays information on the currently available databases
# Name        Type ID  Qry All Comment
# ====        ==== ==  === === =======
drosoph       N    OK  OK  OK  blast drosoph
test          N    OK  OK  OK  Test DB

The test db is in genbank format and works fine.

Then I ran seqret:
% seqret
Reads and writes (returns) sequences
Input sequence(s): drosoph:A*
Error: BLAST Query failed
Error: Unable to read sequence 'drosoph:A*'
Input sequence(s): drosoph:*

   EMBOSS An error in ajseqdb.c at line 4006:
error reading file /data/maoj/emboss/db/blast/drosoph/drosoph.nt.nhr

Please advise what I might have done wrong. 

Thank you very much!!!


From pmr at ebi.ac.uk  Fri May 30 18:24:58 2003
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 30 May 2003 19:24:58 +0100 (BST)
Subject: question about dbiblast
In-Reply-To: <0d4401c326ba$a5847600$618a70a5@citjmao>
References: <0d4401c326ba$a5847600$618a70a5@citjmao>
Message-ID: <1189.217.134.86.144.1054319098.squirrel@webmail.ebi.ac.uk>

> Hi, I am new in emboss db config. Need some help in indexing blast db.

This is the long-standing problem of the "new ASN.1 format blast database".

NCBI changed formatdb to create a new index file format, but we have no
documentation on the new format, so we cannot update dbiblast to index it.

We hope to provide the ability to index these blast databases in a future
release, once NCBI release the format specification. I suspect EMBOSS and
FASTA are the only other applications using blast index formats so it is
not an urgent task for them.

Meanwhile, you need to use the 'old' format:

First, you need the original FASTA format file (drosophila.nt)

Then, index it with formatdb but add "-A F" to the command line (to turn
off ASN.1 format).
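
As a sketch, the re-indexing step could be scripted like this. Only
"-A F" comes from the advice above; "-i" (input file) and "-p"
(T for protein, F for nucleotide) are the usual formatdb switches, and
the wrapper function names are made up for illustration.

```python
import subprocess


def formatdb_command(fasta_path, protein=False):
    """Return the formatdb argument list for an old-style (non-ASN.1) index."""
    return [
        "formatdb",
        "-i", fasta_path,               # original FASTA format file
        "-p", "T" if protein else "F",  # F = nucleotide database
        "-A", "F",                      # turn off ASN.1 format
    ]


def run_formatdb(fasta_path, protein=False):
    """Run formatdb; raises CalledProcessError if it exits non-zero."""
    subprocess.run(formatdb_command(fasta_path, protein), check=True)
```

After run_formatdb("drosophila.nt") has rebuilt the blast files,
dbiblast can be run again on the result.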

Hope this helps,

Peter Rice


From calvinwangxi at yahoo.com  Sat May 31 11:43:11 2003
From: calvinwangxi at yahoo.com (calvin wang)
Date: Sat, 31 May 2003 04:43:11 -0700 (PDT)
Subject: new moethods
In-Reply-To: <1189.217.134.86.144.1054319098.squirrel@webmail.ebi.ac.uk>
Message-ID: <20030531114311.68280.qmail@web41115.mail.yahoo.com>

I need to use T-Coffee; I understand this is not part of EMBOSS. Is it possible to include new methods in EMBOSS? How? Is there a guide? Thanks.
