From gbottu at ben.vub.ac.be  Mon Oct  2 03:58:24 2006
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 2 Oct 2006 09:58:24 +0200
Subject: [EMBOSS] case sensitive identifiers
In-Reply-To: <2023.86.132.219.183.1159518502.squirrel@webmail.ebi.ac.uk>
References: <20060928135740.GA14320@bigben.ulb.ac.be>
	<451BDD04.9040806@ebi.ac.uk>
	<20060929081508.GA25906@bigben.ulb.ac.be>
	<2023.86.132.219.183.1159518502.squirrel@webmail.ebi.ac.uk>
Message-ID: <20061002075824.GA5571@bigben.ulb.ac.be>

On Fri, Sep 29, 2006 at 09:28:22AM +0100, pmr at ebi.ac.uk wrote:
> For the PDB case, really only the end of the ID is case-sensitive. Do you
> think the database should be case-sensitive for the whole ID, or does it
> make sense to check for a pattern as the case-sensitive part?

I think that trying to define which part of the ID is case-sensitive is 
making it just too complicated. Let's have it completely case-sensitive 
or not at all.

> EMBOSS will initially read only one sequence for a seqall ... it does not
> read in all the sequences and look for duplicates so we have to decide in
> the emboss.defaults DB definition how to check a single ID (no way to read
> them all and check for duplicates).

Trying to check for duplicates is again too complicated. I understand 
that if a databank or a multiple sequence file has duplicates a 
"sequence" will retrieve the first and a "seqset" or "seqall" will 
retrieve them all. Well, let it be that way. It is the responsibility of
the database manager/user to make sure there are no duplicates.

	Guy


From gbottu at ben.vub.ac.be  Mon Oct  2 04:11:46 2006
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 2 Oct 2006 10:11:46 +0200
Subject: [EMBOSS] case sensitive identifiers
In-Reply-To: <451CF527.8040506@ebi.ac.uk>
References: <20060928135740.GA14320@bigben.ulb.ac.be>
	<451BDD04.9040806@ebi.ac.uk>
	<20060929081508.GA25906@bigben.ulb.ac.be>
	<451CF527.8040506@ebi.ac.uk>
Message-ID: <20061002081146.GB5571@bigben.ulb.ac.be>

On Fri, Sep 29, 2006 at 11:27:51AM +0100, Peter Rice wrote:
> So, there will be 2 new (and for the first time boolean) attributes for 
> databases. To use them, you will need:
> 
> caseidmatch: "Y"
> hasaccession: "N"

The "hasaccession" attribute is certainly useful for search methods like 
SRS and MRS, which have the notion of searching in separate indexes. By 
default searching both "id" and "ac" is the thing to do, but there are 
databanks where no "ac" is indexed, and there are databanks, like 
EMBL or IMGTHLA, where the "id" and the "ac" are always identical, so 
that searching only the "id" saves time without losing functionality.

As for the case problem, I think we agree that the best is to always 
pass the sequence name as such (case as typed by the user) to the 
search method and, in case the search method itself is not case sensitive 
but the databank is, let EMBOSS, if 'caseidmatch: "Y"', parse the 
retrieved sequences and accept only those that match. This will work fine
for SRS (and of course for the method "direct", where EMBOSS does all the 
work), but it will not work for MRS, since the current version of MRS 
does not allow case-different index words.
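The check described above, passing the ID through as typed and then letting EMBOSS discard retrieved entries whose ID differs only in case, can be sketched in Python. This is an illustration under stated assumptions, not EMBOSS code: the function name `filter_hits`, the (id, text) pairs and the sample IDs are all hypothetical, and a boolean stands in for the proposed caseidmatch attribute.

```python
def filter_hits(query_id, hits, caseidmatch):
    """Keep retrieved (entry_id, entry_text) pairs whose ID matches query_id.

    caseidmatch=True stands in for the proposed caseidmatch: "Y"
    attribute: only an exact, case-sensitive match is accepted.
    Otherwise any case variant of the query is accepted.
    """
    if caseidmatch:
        return [hit for hit in hits if hit[0] == query_id]
    wanted = query_id.lower()
    return [hit for hit in hits if hit[0].lower() == wanted]


# Made-up IDs that differ only in the case of the final character,
# as both could come back from a case-insensitive search method.
hits = [("1abcA", "entry 1"), ("1abca", "entry 2")]
print(filter_hits("1abcA", hits, caseidmatch=True))  # [('1abcA', 'entry 1')]
```

With caseidmatch off, both entries would be kept, which is the behaviour of a databank that is genuinely case-insensitive.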

	Guy


From jbreu at mpipsykl.mpg.de  Fri Oct  6 12:53:00 2006
From: jbreu at mpipsykl.mpg.de (Johannes Breu)
Date: Fri, 06 Oct 2006 18:53:00 +0200
Subject: [EMBOSS] question
Message-ID: <3.0.6.32.20061006185300.00ab68e0@komserv.mpipsykl.mpg.de>

To whom it may concern.

I tried to install emboss on MS Windows 2000 in a cygwin environment. I
typed ./configure (following INSTALL). It took a long time but there was no
error message. After typing make I got the message bash:command:not found.
Does anybody have any idea how to solve this problem? Thanks.


From shaun at ebi.ac.uk  Fri Oct  6 14:30:56 2006
From: shaun at ebi.ac.uk (shaun at ebi.ac.uk)
Date: Fri, 6 Oct 2006 19:30:56 +0100 (BST)
Subject: [EMBOSS] question
In-Reply-To: <3.0.6.32.20061006185300.00ab68e0@komserv.mpipsykl.mpg.de>
References: <3.0.6.32.20061006185300.00ab68e0@komserv.mpipsykl.mpg.de>
Message-ID: <50608.82.21.106.225.1160159456.squirrel@webmail.ebi.ac.uk>

> To whom it may concern.
>
> I tried to install emboss on MS Windows 2000 in a cygwin environment. I
> typed ./configure (following INSTALL). It took a long time but there was
> no error message. After typing make  I got the message
> bash:command:not found.
> Does anybody have any idea to solve this problem. Thanks.
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>

Hi Johannes,

I believe the installation under cygwin requires a couple of additional
switches:

./configure --without-x CFLAGS=-s

See the following URL for a shorter guide to installing EMBOSS under a
cygwin environment (it actually describes the scenario that you are
encountering):

http://emboss.sourceforge.net/download/cygwin.html

HTH

Shaun


From mukherje at nsm.umass.edu  Fri Oct 13 15:34:11 2006
From: mukherje at nsm.umass.edu (mukherje at nsm.umass.edu)
Date: Fri, 13 Oct 2006 15:34:11 -0400
Subject: [EMBOSS] (no subject)
Message-ID: <1160768051.452fea3357af3@mail-www.oit.umass.edu>

Hi,

I tried to build the EMBOSS package in a cygwin environment but got the following
error after running "make":

Creating library file: .libs/libajax.dll.a
collect2: ld returned 1 exit status
make[1]: *** [libajax.la] Error 1
make[1]: Leaving directory '/home/supratim/Emboss-4.0.0/ajax'
make: *** [install-recursive] Error 1

I tried to run the application both before & after applying the fix as
mentioned and also tried with the regular configure option as well as

./configure --without-x CFLAGS=-s
make
make install

I also tried with the older version (EMBOSS-3.0.0) but did not have much luck.
It works fine in a Mac but I would like to get it to work on a Windows
platform with cygwin.

Thank you for your assistance in advance

Supratim Mukherjee
Graduate student
Department of Microbiology
UMass, Amherst


From kertib at linuxlap.hu  Mon Oct 16 04:50:03 2006
From: kertib at linuxlap.hu (Kerti Balázs Gábor)
Date: Mon, 16 Oct 2006 10:50:03 +0200
Subject: [EMBOSS] make error
Message-ID: <1160988603.3981.12.camel@balazska.site>

Hi,

./configure ran fine, but make produced the error below:

config.status: creating jemboss/resources/Makefile
config.status: creating jemboss/utils/Makefile
config.status: creating Makefile
config.status: executing depfiles commands
server:/usr/src/EMBOSS-4.0.0 # make
Making all in plplot
make[1]: Entering directory `/usr/src/EMBOSS-4.0.0/plplot'
Making all in lib
make[2]: Entering directory `/usr/src/EMBOSS-4.0.0/plplot/lib'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/usr/src/EMBOSS-4.0.0/plplot/lib'
make[2]: Entering directory `/usr/src/EMBOSS-4.0.0/plplot'
make[2]: Nothing to be done for `all-am'.
make[2]: Leaving directory `/usr/src/EMBOSS-4.0.0/plplot'
make[1]: Leaving directory `/usr/src/EMBOSS-4.0.0/plplot'
Making all in ajax
make[1]: Entering directory `/usr/src/EMBOSS-4.0.0/ajax'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/usr/src/EMBOSS-4.0.0/ajax'
Making all in nucleus
make[1]: Entering directory `/usr/src/EMBOSS-4.0.0/nucleus'
/bin/sh ../libtool --tag=CC --mode=link gcc  -O2   -o libnucleus.la
-rpath /usr/local/lib -version-info 4:0:0 embaln.lo embcom.lo embcons.lo
embdata.lo embdbi.lo embdmx.lo embdomain.lo embest.lo embexit.lo
embgroup.lo embiep.lo embindex.lo embinit.lo embmat.lo embmisc.lo
embmol.lo embnmer.lo embpat.lo embpatlist.lo embprop.lo embpdb.lo
embread.lo embsig.lo embshow.lo embword.lo
libtool: link: `embmat.lo' is not a valid libtool object
make[1]: *** [libnucleus.la] Error 1
make[1]: Leaving directory `/usr/src/EMBOSS-4.0.0/nucleus'
make: *** [all-recursive] Error 1
server:/usr/src/EMBOSS-4.0.0 #

The OS: SuSE OpenEnterprise Server 10.0 x86
Kernel: server: Linux server 2.6.16.21-0.25-default #1 Tue Sep 19
07:26:15 UTC 2006 i686 i686 i386 GNU/Linux

What is the solution?

Thank you for your assistance in advance

Kerti Balazs Gabor
Genetics and Plant Breeding, 
Szent Istvan University, Pater K. U. 1., 
Godollo 2103, Hungary


From maoj at helix.nih.gov  Tue Oct 17 11:36:22 2006
From: maoj at helix.nih.gov (Jean Mao)
Date: Tue, 17 Oct 2006 11:36:22 -0400
Subject: [EMBOSS] Question regarding dbxflat
Message-ID: <000001c6f202$000323e0$be4de780@CIT.NIH.GOV>

Hello,

Could someone help me determine which fields I need to include while running dbxflat? I am going to index the GenBank gb*.seq files and the EST gbest*.seq files from ftp://ftp.ncbi.nih.gov/genbank/. These files have sequence entries composed of: Locus, Definition, Accession, Version, Keywords, Source, Organism...

If I specify 'acc, id' while indexing, will the 'Definition' line be indexed or not? What about 'acc, id, des'? In other words, I would like to know which programs in EMBOSS will not work if I don't specify 'des' while indexing.
Some programs in EMBOSS, such as 'coderet', require the feature table. If I only index 'acc, id', will coderet work when a user specifies 'genbank:xxxxx'?

I guess all I am trying to ask is what programs will stop working if I only accept default 'acc, id' fields.

Thank you in advance.


From pmr at ebi.ac.uk  Tue Oct 17 12:23:03 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Tue, 17 Oct 2006 17:23:03 +0100 (BST)
Subject: [EMBOSS] Question regarding dbxflat
In-Reply-To: <000001c6f202$000323e0$be4de780@CIT.NIH.GOV>
References: <000001c6f202$000323e0$be4de780@CIT.NIH.GOV>
Message-ID: <2320.210.150.186.27.1161102183.squirrel@webmail.ebi.ac.uk>

Hi Jean,

> If I specify 'acc, id' while indexing, will the 'Definition' line be
> indexed or not? What about 'acc, id, des' ? In other words, I would like
> to know which programs in EMBOSS will not work if I don't specify 'des'
> while indexing.
> Some programs in EMBOSS such as 'coderet' require feature table. If I only
> index 'acc, id', will coderet work when user specify 'genbank:xxxxx' ?
>
> I guess all I am trying to ask is what programs will stop working if I
> only accept default 'acc, id' fields.

The dbxflat fields only affect queries (do you want to search by
dbname-des: or dbname-gi: when you look for sequences).

Retrieval is the same once an entry has been found - you can return all
text for entret, features for coderet, and so on as usual.

By default only the id and acc lines will be indexed.

We found a problem with one database that had no accessions (pdb as a
fasta file indexed with dbxfasta) so the next release will have an option
to turn off accession searches in the database definition and we may add
an option to skip accession indexing.

regards,

Peter Rice


From smiddha at indiana.edu  Thu Oct 19 10:59:57 2006
From: smiddha at indiana.edu (Sumit Middha)
Date: Thu, 19 Oct 2006 10:59:57 -0400
Subject: [EMBOSS] distmat Uncorrected distance > 100
Message-ID: <453792ED.6040601@indiana.edu>


Hi,

I tried using distmat from emboss on some alignments and am getting 
scores in excess of 100 (using all default options).

I am not sure how scores can exceed 100.

D = uncorrected distance = p-distance = 1-S
where S = m/(npos + gaps*gap_penalty)  

So D is like a percentage and equals number of substitutions per 100 bases or amino acids.
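To make the quoted formula concrete, here is a small Python sketch that evaluates it exactly as written above. The argument names mirror the symbols in the formula; how distmat itself counts m, npos and gaps is an assumption here, and this is not distmat's source. With non-negative inputs and m no larger than the denominator, S lies between 0 and 1, so 100 * D lies between 0 and 100, which is why values above 100 are surprising.

```python
def uncorrected_distance(m, npos, gaps, gap_penalty):
    """Evaluate D = 1 - S with S = m / (npos + gaps * gap_penalty).

    Follows the formula as quoted in the question; the result is a
    fraction, so multiply by 100 for substitutions per 100 positions.
    """
    s = m / (npos + gaps * gap_penalty)
    return 1.0 - s


# Illustrative numbers only: 80 matches over 100 positions, no gaps.
d = uncorrected_distance(m=80, npos=100, gaps=0, gap_penalty=0.0)
print(round(100 * d, 6))  # 20.0
```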

Please correct me or point me to an explanation which will help clarify my doubt.

Thanks,
Sumit


From aengus.stewart at cancer.org.uk  Mon Oct 23 14:00:11 2006
From: aengus.stewart at cancer.org.uk (Aengus Stewart)
Date: Mon, 23 Oct 2006 19:00:11 +0100
Subject: [EMBOSS] Fuzznuc ignoring start and end
Message-ID: <453D032B.7010104@cancer.org.uk>


Hi folks,

fuzznuc and also fuzzpro are ignoring the start and end parameters I am giving them.

fuzznuc -pattern rccatgg -sbegin1 75834 -send1 96013 -sequence ac087388.fasta


Cheers
Aengus


########################################
# Program: fuzznuc
# Rundate: Mon Oct 23 2006 18:54:54
# Commandline: fuzznuc
#    -pattern rccatgg
#    -sbegin 75834
#    -send 96013
#    -sequence ac087388.fasta
# Report_format: seqtable
# Report_file: ac087388.fuzznuc
########################################

#=======================================
#
# Sequence: AC087388     from: 75834   to: 96013
# HitCount: 15
#
# Pattern_name Mismatch Pattern
# pattern1            0 rccatgg
#
# Complement: No
#
#=======================================

  Start     End Pattern_name Mismatch Sequence
  38702   38708 pattern1            . gccatgg
  43834   43840 pattern1            . accatgg
  47457   47463 pattern1            . gccatgg
  48659   48665 pattern1            . gccatgg
  56718   56724 pattern1            . accatgg
  61200   61206 pattern1            . accatgg
  62151   62157 pattern1            . accatgg
  68706   68712 pattern1            . accatgg
  78513   78519 pattern1            . gccatgg
  79973   79979 pattern1            . gccatgg
  86415   86421 pattern1            . accatgg
  97451   97457 pattern1            . accatgg
 102803  102809 pattern1            . gccatgg
 113924  113930 pattern1            . gccatgg
 115436  115442 pattern1            . gccatgg

#---------------------------------------
#---------------------------------------

#---------------------------------------
# Total_sequences: 1
# Total_hitcount: 15
#---------------------------------------
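Until the range options are honoured, one possible workaround is to post-filter the report. Below is a minimal Python sketch, assuming the seqtable layout shown above (integer Start and End in the first two columns, matched sequence in the last); `hits_in_range` is a hypothetical helper, not part of EMBOSS.

```python
def hits_in_range(report_text, begin, end):
    """Return (start, end, sequence) for hits lying wholly in [begin, end]."""
    kept = []
    for line in report_text.splitlines():
        fields = line.split()
        # Data rows start with an integer Start column; this skips blank
        # lines, '#' comment lines and the column-header row.
        if not fields or not fields[0].isdigit():
            continue
        start, stop = int(fields[0]), int(fields[1])
        if begin <= start and stop <= end:
            kept.append((start, stop, fields[-1]))
    return kept


sample = """  Start     End Pattern_name Mismatch Sequence
  68706   68712 pattern1            . accatgg
  78513   78519 pattern1            . gccatgg
  97451   97457 pattern1            . accatgg
"""
print(hits_in_range(sample, 75834, 96013))  # [(78513, 78519, 'gccatgg')]
```

Applied to the full table above with 75834 and 96013, it would keep only the hits starting at 78513, 79973 and 86415.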

-- 
-----------------------------------------------------------------------
Aengus Stewart
Group Leader
Bioinformatics and BioStatistics               Tel: +44 (0)20 7269 3679
Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK
-----------------------------------------------------------------------



From mrln at o2.pl  Mon Oct 23 17:49:07 2006
From: mrln at o2.pl (Marlena Roszczyk)
Date: Mon, 23 Oct 2006 23:49:07 +0200
Subject: [EMBOSS] 30 entries only
Message-ID: <1161640147.4367.37.camel@localhost.localdomain>

Does anybody know how to solve this problem:

I use EMBOSS via the srswww method and everything seems to work fine until I
ask seqret or infoseq (or any other application that searches the
database) for many sequences (for example typing: "seqret
database-des:kinase"). The output consists of only 30 entries even
though the same query on srs.ebi.ac.uk results in a 6-digit number of
entries. What shall I do to get all the entries I want? Is it a problem
with EMBOSS, or rather with the SRS policy for sending data?


Just in case you would like to see my emboss.default:

DB zuniprot [ 
methodquery: srswww
format: swiss
type: P
fields: "id acc sv des key org"
dbalias: uniprot
url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
comment: "uniprot/swiss via srswww"
]

DB zswiss [
methodquery: srswww
format: swiss
type: P
fields: "id acc sv des key org"
dbalias: swissprot
url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
comment: "swissprot via srswww"
]

DB zembl [
type: N
methodquery: srswww
format: embl
fields: "id acc key sv des org"
dbalias: embl
url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
comment: "embl via srswww"
]


Thanks in advance, 
Marlena Roszczyk


From David.Bauer at SCHERING.DE  Tue Oct 24 02:48:45 2006
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Tue, 24 Oct 2006 08:48:45 +0200
Subject: [EMBOSS] Antwort:  30 entries only
In-Reply-To: <1161640147.4367.37.camel@localhost.localdomain>
Message-ID: <OF5A6D45E3.B212A495-ONC1257211.0024B775-C1257211.00256D2C@schering.de>


Hi Marlena,

SRS has a default limit of 30 entries/page.
So it seems that you are getting only the first page of results from the
server.
If you want to run queries with such large results, it may be a good idea to
download the uniprot flat file from the EBI ftp server, index it with the
EMBOSS dbxflat and then run the queries locally.
But if this is not an option due to limited resources, I guess Peter will
have an idea how to get the other result pages out of SRS with the srswww
method. ;-)

Cheers,
David.



From rls at ebi.ac.uk  Tue Oct 24 03:59:46 2006
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Tue, 24 Oct 2006 08:59:46 +0100
Subject: [EMBOSS] 30 entries only
In-Reply-To: <1161640147.4367.37.camel@localhost.localdomain>
References: <1161640147.4367.37.camel@localhost.localdomain>
Message-ID: <453DC7F2.6030908@ebi.ac.uk>

Hi,

I suspect this is related to the default view used in SRS. It is 
returning the first page of results, which contains 30 sequences (the 
default). To overcome this problem, the call needs to have the following 
parameters:

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniprot-des:kinase]+-vn+1+-lv+100

-vn <int> is the view to use (1=names, 2=complete entries)
-lv <int> is the number of entries to be used in one go
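A small Python sketch that assembles such a URL from its parts, reproducing the example above. The helper name `wgetz_url` is hypothetical; it only builds the string and fetches nothing, and it does not handle paging past the first batch because no offset parameter is shown in this thread.

```python
def wgetz_url(base, library, field, value, view=None, entries=None):
    """Build an SRS wgetz query URL; parts are joined with '+'.

    view    -> -vn <int>  (1 = names, 2 = complete entries)
    entries -> -lv <int>  (number of entries returned in one go)
    """
    parts = ["[%s-%s:%s]" % (library, field, value)]
    if view is not None:
        parts += ["-vn", str(view)]
    if entries is not None:
        parts += ["-lv", str(entries)]
    return base + "?" + "+".join(parts)


print(wgetz_url("http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz",
                "uniprot", "des", "kinase", view=1, entries=100))
# http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniprot-des:kinase]+-vn+1+-lv+100
```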

It is important to realize that downloading a lot of entries, although 
possible, takes a while and results in high loads on the servers.

The way I download a large set of entries is to 
generate a list (using -vn 1) and then use the list
with seqret in the following way:

% seqret @listname -out mykinases

and making sure this time that -vn 2 is used to retrieve complete 
entries. This requires the addition of other EMBOSS database definitions 
(one for lists and another for complete entry retrieval).
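The list file itself can be generated with a few lines of Python. This sketch assumes a list file holds one sequence specification per line in dbname:id form; `write_seqret_list`, the database name and the IDs are placeholders, and the database name should be whichever complete-entry definition you add to emboss.default.

```python
def write_seqret_list(path, dbname, ids):
    """Write one 'dbname:id' line per entry, for use as seqret @path."""
    with open(path, "w") as handle:
        for entry_id in ids:
            handle.write("%s:%s\n" % (dbname, entry_id))
```

The resulting file would then be passed to seqret as @listname, as in the command above.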

Hope this helps,

R:)




From pmr at ebi.ac.uk  Tue Oct 24 04:55:17 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Tue, 24 Oct 2006 09:55:17 +0100 (BST)
Subject: [EMBOSS] 30 entries only
In-Reply-To: <453DC7F2.6030908@ebi.ac.uk>
References: <1161640147.4367.37.camel@localhost.localdomain>
	<453DC7F2.6030908@ebi.ac.uk>
Message-ID: <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>

Rodrigo Lopez writes:

> I suspect this is related to the default view used in SRS. It is
> returning the first page of results that contains 30 sequences (the
> default). To overcome this problem, the call need to have the following
> parameters:
>
> http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniprot-des:kinase]+-vn+1+-lv+100

But this is using the EMBOSS "srswww" access method, which uses

+-e+-ascii

That should return complete entries for all as ascii text.

Perhaps something has changed on the EBI's SRS server because this now
only gives me 30 entries.

+-lv+100 does give 100 entries ... but it will take some reworking of the
code to loop through entries that way. Hmmmm.....


regards,

Peter


From rls at ebi.ac.uk  Tue Oct 24 05:01:50 2006
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Tue, 24 Oct 2006 10:01:50 +0100
Subject: [EMBOSS] 30 entries only
In-Reply-To: <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
References: <1161640147.4367.37.camel@localhost.localdomain>
	<453DC7F2.6030908@ebi.ac.uk>
	<3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
Message-ID: <453DD67E.8000600@ebi.ac.uk>

I'll have to wait for our SRS admin to come back to find out if a change 
has in fact taken place or not. hmm....

R:/



From maoj at helix.nih.gov  Tue Oct 24 14:15:17 2006
From: maoj at helix.nih.gov (Jean Mao)
Date: Tue, 24 Oct 2006 14:15:17 -0400
Subject: [EMBOSS] How to include Prosite and Rebase and Print into 'showdb'
	program
Message-ID: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>

Hi, for EMBOSS 4.0.0, is there a way to show both prosite and rebase databases when I type 'showdb' at the prompt? I asked the same question back in 2003. I was hoping the answer would be different this time :-)

Thanks. 

Jean


From sovani at rohan.sdsu.edu  Tue Oct 24 14:09:07 2006
From: sovani at rohan.sdsu.edu (Sujata Sovani)
Date: Tue, 24 Oct 2006 11:09:07 -0700 (PDT)
Subject: [EMBOSS] Antigenic - input file format?
Message-ID: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>

Hi,

I want to use the package called 'Antigenic' in EMBOSS.
I am not quite clear about the input file format to be used.

How can I input a fasta file to the program? - is it possible to use a
text file that has the amino acid sequence in a fasta format? In which
folder should the file be?

Please let me know.

Thank you.

Regards,
Sujata


From km at mrna.tn.nic.in  Tue Oct 24 17:28:57 2006
From: km at mrna.tn.nic.in (km)
Date: Wed, 25 Oct 2006 02:58:57 +0530
Subject: [EMBOSS] Antigenic - input file format?
In-Reply-To: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
References: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
Message-ID: <20061024212857.GA31781@mrna.tn.nic.in>

Hi,

> I want to use the package called 'Antigenic' in EMBOSS.
> I am not quite clear about the input file format to be used.
Please consult the EMBOSS documentation on the system by typing
$tfm antigenic 

> How can I input a fasta file to the program?
first check: 
tfm antigenic 
then, assuming that your sequence(s) are in a text file (myseqs.fa) in fasta format, run:
$antigenic -sequence myseqs.fa
> is it possible to use a text file that has the amino acid sequence in a fasta format?
yes
>In which folder should the file be?

The simple solution is to keep the sequence file in the
current folder.

regards,
KM


From golharam at umdnj.edu  Wed Oct 25 00:28:30 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 25 Oct 2006 00:28:30 -0400
Subject: [EMBOSS] How to include Prosite and Rebase and Print into
 'showdb'program
In-Reply-To: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
Message-ID: <002f01c6f7ee$07005fe0$2f01a8c0@GOLHARMOBILE1>

Have you gotten an answer to this yet?



From pmr at ebi.ac.uk  Wed Oct 25 03:34:19 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 25 Oct 2006 08:34:19 +0100 (BST)
Subject: [EMBOSS] How to include Prosite and Rebase and Print into
 'showdb' program
In-Reply-To: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
References: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
Message-ID: <1836.217.44.133.216.1161761659.squirrel@webmail.ebi.ac.uk>

Hi Jean,

> Hi, for EMBOSS 4.0.0, is there a way to show both prosite and rebase
> databases when I type 'showdb' at the prompt? I asked the same question
> back in 2003. I was hoping the answer will be different this time :-)

Well .... EMBOSS 4.0.0 does have extended showdb output so now we can add
this. The main issue is that there is currently nothing in EMBOSS that
uses the definition, but we would like to add a report of the database
release to the output of programs that use them.

The definitions would be expected to go in RESOURCE definitions in the
emboss.default file but we could perhaps put something in the output of
the *extract programs.

I will take another look.

regards,

Peter


From pmr at ebi.ac.uk  Wed Oct 25 03:37:56 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 25 Oct 2006 08:37:56 +0100 (BST)
Subject: [EMBOSS] Antigenic - input file format?
In-Reply-To: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
References: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
Message-ID: <1840.217.44.133.216.1161761876.squirrel@webmail.ebi.ac.uk>

Hi Sujata,

> I want to use the package called 'Antigenic' in EMBOSS.
> I am not quite clear about the input file format to be used.
>
> How can I input a fasta file to the program? - is it possible to use a
> text file that has the amino acid sequence in a fasta format? In which
> folder should the file be?

All EMBOSS programs read sequences from files, or from databases (local or
remote).

You can put the sequence in a file, in fasta format, and give the filename
to any EMBOSS program as input. Sequences are "input parameters" so simply
putting the filename on the command line is enough.

EMBOSS will look in the current directory, but you can give the full or
relative file path just like any Unix command.
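For anyone preparing the input file programmatically, here is a minimal Python sketch that writes one sequence in FASTA format: a '>' header line followed by the residues, wrapped at 60 characters per line. The function name `write_fasta` and the wrapping width are illustrative choices; the file name myseqs.fa follows the earlier reply.

```python
def write_fasta(path, name, sequence, width=60):
    """Write one sequence in FASTA format.

    The header line is '>' plus the name; the residues follow,
    wrapped at `width` characters per line.
    """
    with open(path, "w") as handle:
        handle.write(">%s\n" % name)
        for i in range(0, len(sequence), width):
            handle.write(sequence[i:i + width] + "\n")
```

The file could then be given on the command line, for example antigenic -sequence myseqs.fa.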

This assumes of course that you are running EMBOSS locally, not through a
web interface (in that case, simply paste a FASTA format sequence into the
text box).

Hope that helps

Peter


From pmr at ebi.ac.uk  Wed Oct 25 03:42:50 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 25 Oct 2006 08:42:50 +0100 (BST)
Subject: [EMBOSS] How to include Prosite and Rebase and Print into
 'showdb'program
In-Reply-To: <002f01c6f7ee$07005fe0$2f01a8c0@GOLHARMOBILE1>
References: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
	<002f01c6f7ee$07005fe0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <1844.217.44.133.216.1161762170.squirrel@webmail.ebi.ac.uk>

Ryan Golhar writes:
> Have you gotten an answer to this yet?

A bit quick off the mark there, Ryan! :-) :-) :-)

Jean asked in the USA at 7pm our time. You posted this in India at 5am our
time. I answered over breakfast (well, not quite a positive answer, but I
did answer :-)

If only Jean had asked last week ... I was in Japan and I'd have snuck in
a reply already... and Alan, Jon and I do quite often post replies at very
strange hours even when we are home.

regards,

Peter


From mrln at o2.pl  Wed Oct 25 09:13:36 2006
From: mrln at o2.pl (Marlena Roszczyk)
Date: Wed, 25 Oct 2006 15:13:36 +0200
Subject: [EMBOSS] 30 entries only
In-Reply-To: <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
References: <1161640147.4367.37.camel@localhost.localdomain>
	<453DC7F2.6030908@ebi.ac.uk>
	<3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
Message-ID: <1161782016.4396.52.camel@localhost.localdomain>

Adding the lv parameter helped and is good enough. It required a few more
lines in emboss.default:

DB blahblah [
method: url
format: myfavouriteformat
type: P
url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+-ascii+[uniprot-des:%s]+-lv+<int>"
]

Thank you.

Still, option -vn 1 refuses to cooperate, although -vn 2 works fine.
Adding +-vn+1 to the url-line above makes seqret return "Bad value for
-sequence". Hmmm... 

Regards,
Marlena Roszczyk


> Rodrigo Lopez writes:
> 
> > I suspect this is related to the default view used in SRS. It is
> > returning the first page of results that contains 30 sequences (the
> > default). 

Yes, the number 30 here and there doesn't seem to be a coincidence.


From pmr at ebi.ac.uk  Wed Oct 25 09:27:49 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 25 Oct 2006 14:27:49 +0100
Subject: [EMBOSS] Question regarding seqret
In-Reply-To: <000001c6d11a$0db13530$be4de780@CIT.NIH.GOV>
References: <000001c6d11a$0db13530$be4de780@CIT.NIH.GOV>
Message-ID: <453F6655.2050900@ebi.ac.uk>

Jean Mao wrote:
> Hi, 
> I have a question hopefully someone can help me about it.
> 
> I downloaded the gbvrt1.seq file from ftp://ftp.ncbi.nih.gov/genbank/ as a test, gunzipped it and indexed it with dbxflat (I know it's not larger than 2gb):
> 
> %  dbxflat -dbname=testdb -dbresource=embl -idformat=gb -directory=. -fields='id,acc,sv,des' -filenames='gbvrt*.seq' -indexoutdir=. -release=0.0 -date='00/00/00'
> 
> Then I run 'seqret' but failed to retrieve entries using 'sv' or 'des' fields:

I didn't see an answer to this one, but I suspect you have already figured it out.

dbxflat and dbiflat will have created the sv and des indices.

You have to edit the database definition in emboss.default to say the fields exist.

    fields: "sv des"

then seqret and other programs will know they can use them.

Yes, in theory seqret could work out what indices are available for a dbxflat or 
dbiflat indexed database - but it would be more difficult for an SRS or SRSWWW 
database (for example) so we depend on the database definitions.
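The dependency on the definition can be illustrated with a hypothetical Python check that tests a dbname-field:value query against the fields: line of a database definition. It is not the EMBOSS query parser. Treating id and acc as always available is an assumption based on the advice above that only sv and des need to be declared, and database names are assumed to contain no hyphen.

```python
def query_field_allowed(query, fields_line):
    """Return True if a 'dbname-field:value' query uses a declared field."""
    # Keep only the 'dbname-field' part before the first ':'.
    db_and_field = query.partition(":")[0]
    field = db_and_field.partition("-")[2]
    if not field:
        # Plain 'dbname:entry' lookups use the default id/acc indices.
        return True
    # Assumption: id and acc are implicit; others must be declared.
    declared = set(fields_line.split()) | {"id", "acc"}
    return field in declared
```

For example, query_field_allowed("testdb-des:kinase", "sv des") is True, while the same query against an empty fields line is False.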

Hope that helps,

Peter


From golharam at umdnj.edu  Wed Oct 25 14:50:12 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 25 Oct 2006 14:50:12 -0400
Subject: [EMBOSS] How to include Prosite and Rebase and Print into
 'showdb'program
In-Reply-To: <1844.217.44.133.216.1161762170.squirrel@webmail.ebi.ac.uk>
Message-ID: <006d01c6f866$66d9cbe0$2f01a8c0@GOLHARMOBILE1>

> -----Original Message-----
> From: pmr at ebi.ac.uk [mailto:pmr at ebi.ac.uk] 
> Sent: Wednesday, October 25, 2006 3:43 AM
> To: golharam at umdnj.edu
> Cc: 'Jean Mao'; emboss at emboss.open-bio.org
> Subject: Re: [EMBOSS] How to include Prosite and Rebase and 
> Print into 'showdb'program
> 
> 
> Ryan Golhar writes:
> > Have you gotten an answer to this yet?
> 
> A bit quick off the mark there, Ryan! :-) :-) :-)
> 
> Jean asked in the USA at 7pm our time. You posted this in 
> India at 5am our time. I answered over breakfast (well, not 
> quite a positive answer, but I did answer :-)
> 
> If only Jean had asked last week ... I was in Japan and I'd 
> have snuck in a reply already... and Alan, Jon and I do quite 
> often post replies at very strange hours even when we are home.
> 
> regards,
> 
> Peter
> 
> 
> 

Sorry, I was cleaning out my mail folder. I had deleted the message
already and then noticed it in my deleted box. The subject caught my
attention. I thought the message was older... my bad.


From mkitagaw73 at yahoo.co.jp  Fri Oct 27 09:09:40 2006
From: mkitagaw73 at yahoo.co.jp (mkitagaw73 at yahoo.co.jp)
Date: Fri, 27 Oct 2006 22:09:40 +0900
Subject: [EMBOSS] ARACHNE3
Message-ID: <OF4E21ABC7.1D5BAC9A-ON49257214.00484BFB@takara.co.jp>

I cannot find "Arachne 3", the new version of the "Arachne 2" assembler.
Do you know where it is?
--
Nari


From mincloud at gmail.com  Sun Oct 29 12:39:35 2006
From: mincloud at gmail.com (yun zheng)
Date: Sun, 29 Oct 2006 11:39:35 -0600
Subject: [EMBOSS] How to apply the einverted and etandom to a fasta file
Message-ID: <8f6eb9540610290939i48adf473g2f81c36a14b198ad@mail.gmail.com>

Hi,

I am a new user of EMBOSS. I am trying to find repeat sequences in a
nucleotide sequence file that contains many sequences.

Can anybody tell me how to use einverted and etandem to analyze all the
sequences in a fasta file?

Many Thanks.

Sincerely

Zheng, yun

Dept of Computer Science and Engineering
Washington Univ in St Louis
Campus Box 1045
1 Brookings Drive
Jolley Hall 505
St Louis, MO 63130


Details:

I installed a version on the Linux platform. The command is as follows,
using the default values:

>einverted -sequence test.fasta -outfile test.outfile -outseq test-i.fasta
Finds DNA inverted repeats
Gap penalty [12]:
Minimum score threshold [50]:
Match score [3]:
Mismatch score [-4]:

But the output file always seems to be empty.


When I try etandem:

>etandem -sequence test.fasta -outfile test-t.out -origfile test.etandem
Looks for tandem repeats in a nucleotide sequence
Minimum repeat size [10]:
Maximum repeat size [10]: 18

However, it seems that only the first sequence is analyzed by einverted
and etandem. The test-t.out file is as follows.

########################################
# Program: etandem
# Rundate: Sat Oct 28 2006 17:24:30
# Commandline: etandem
#    -sequence test.fasta
#    -outfile test-t.out
#    -origfile test.etandem
#    -maxrepeat 18
# Report_format: table
# Report_file: test-t.out
########################################

#=======================================
#
# Sequence: D9X6RJV01EER0J     from: 1   to: 55
# HitCount: 0
#
# Threshold: 20
# Minrepeat: 10
# Maxrepeat: 18
# Mismatch: No
# Uniform: No
#
#=======================================

   Start     End   Score   Size  Count Identity Consensus
#---------------------------------------
#---------------------------------------

 Many thanks.


From gbottu at ben.vub.ac.be  Mon Oct 30 10:33:13 2006
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 30 Oct 2006 16:33:13 +0100
Subject: [EMBOSS] How to apply the einverted and etandom to a fasta file
	- C
In-Reply-To: <8f6eb9540610290939i48adf473g2f81c36a14b198ad@mail.gmail.com>
References: <8f6eb9540610290939i48adf473g2f81c36a14b198ad@mail.gmail.com>
Message-ID: <20061030153313.GA14597@bigben.ulb.ac.be>

On Sun, Oct 29, 2006 at 11:39:35AM -0600, yun zheng wrote:
> I am a new user of emboss. I am trying to find repeat sequences in a
> nucleotide sequence file that have many sequences.
> 
> Can anybody tell me how to use einverted and etandem to analyze all the
> sequences in a fasta file?

einverted searches for inverted repeats (palindromes) rather than tandem
repeats. It works without problems on a multiple-sequence FASTA file. The
output file is probably empty because it did not find any good
palindrome. You could try experimenting with the parameters.

etandem operates on only one sequence at a time. You can see this from
etandem -help: it takes as input an object of type "sequence" rather than
"seqall". To treat many sequences in one go, you will need to put them in
separate files; if necessary, run seqret -ossingle on your file. Then,
under the Tc shell (tcsh), provided your files are all named
something.fasta, you can do:

foreach FASTAFILE (`ls *.fasta`)
etandem $FASTAFILE -minrepeat=10 -maxrepeat=10 -threshold=20 -auto
end
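If seqret -ossingle is not convenient, the splitting step can also be sketched in Python. This is only an illustration under stated assumptions. The helper names read_fasta_records and split_fasta are hypothetical. The <id>.fasta naming (first word of each header line) is this sketch's own choice, not necessarily the naming seqret -ossingle uses.

```python
import os


def read_fasta_records(path):
    """Yield (header_line, sequence_lines) for each record in a FASTA file."""
    header, lines = None, []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if header is not None:
                    yield header, lines
                header, lines = line, []
            elif header is not None:
                # Lines before the first ">" header are ignored
                lines.append(line)
    if header is not None:
        yield header, lines


def split_fasta(path, outdir):
    """Write each record to outdir/<id>.fasta and return the paths written.

    The ID is the first word after ">"; records with duplicate IDs would
    overwrite each other in this simple sketch.
    """
    written = []
    for number, (header, lines) in enumerate(read_fasta_records(path), start=1):
        tokens = header[1:].split()
        seq_id = tokens[0] if tokens else "seq%d" % number
        out_path = os.path.join(outdir, seq_id + ".fasta")
        with open(out_path, "w") as out:
            out.write(header)
            out.writelines(lines)
        written.append(out_path)
    return written
```

Writing into a separate directory keeps the original multi-sequence file out of the *.fasta glob used by the tcsh loop.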

The problem is that etandem only works well if you provide appropriate
values for minrepeat/maxrepeat/threshold. You can use equicktandem to get
an idea (look in the 4th column of its output for the repeat size).
Processing all sequences in one run will of course only go well if they
all contain repeats of similar size and quality.

I hope this helps.

	Guy Bottu,
	Belgian EMBnet Node


From jbreu at mpipsykl.mpg.de  Mon Oct 30 14:38:10 2006
From: jbreu at mpipsykl.mpg.de (Johannes Breu)
Date: Mon, 30 Oct 2006 20:38:10 +0100
Subject: [EMBOSS] dbifasta
Message-ID: <3.0.6.32.20061030203810.00acc5e8@komserv.mpipsykl.mpg.de>

Hello,
while trying to index my database (it is mouse_ensembl_cdna, and so is the
file name) I always get the following error message:

$ dbifasta
Database indexing for fasta file databases
Database name: cdna
    simple : >ID
     idacc : >ID ACC
     gcgid : >db:ID
  gcgidacc : >db:ID ACC
      dbid : >db ID
      ncbi : | formats
ID line format [idacc]: simple
Database directory [.]: /data/cdna
Wildcard database filename [*.dat]: cdna
Release number [0.0]:
Index date [00/00/00]:
General log output file [outfile.dbifasta]: outfile.cdnafasta

   EMBOSS An error in dbifasta.c at line 210:
No files selected


In case it's relevant: I am using Cygwin.

Thank you, Johannes 


From ajb at ebi.ac.uk  Mon Oct 30 16:30:03 2006
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Mon, 30 Oct 2006 21:30:03 -0000 (GMT)
Subject: [EMBOSS] dbifasta
In-Reply-To: <3.0.6.32.20061030203810.00acc5e8@komserv.mpipsykl.mpg.de>
References: <3.0.6.32.20061030203810.00acc5e8@komserv.mpipsykl.mpg.de>
Message-ID: <40898.81.98.244.247.1162243803.squirrel@webmail.ebi.ac.uk>

Hi,

If the filename is mouse_ensembl_cdna then that's the filename you
should use at the

   Wildcard database filename [*.dat]:

prompt. From your email you were using "cdna" instead. As a wildcard
can be specified  then perhaps you intended typing "*cdna" which would
have picked up the filename mouse_ensembl_cdna
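Alan's point can be illustrated with shell-style wildcard matching. The sketch uses Python's fnmatch as a stand-in. It assumes that dbifasta's wildcard matching behaves like fnmatch for simple patterns such as these. The helper select_files is hypothetical.

```python
from fnmatch import fnmatch


def select_files(names, pattern):
    """Return the names matched by a shell-style wildcard pattern."""
    return [name for name in names if fnmatch(name, pattern)]


files = ["mouse_ensembl_cdna"]
print(select_files(files, "cdna"))   # [] : without a wildcard only an exact name matches
print(select_files(files, "*cdna"))  # ['mouse_ensembl_cdna']
print(select_files(files, "*.dat"))  # [] : the default prompt value only matches *.dat
```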

HTH

Alan
EBI


From shrish at ccmb.res.in  Tue Oct 31 07:41:19 2006
From: shrish at ccmb.res.in (Shrish Tiwari)
Date: Tue, 31 Oct 2006 18:11:19 +0530 (IST)
Subject: [EMBOSS] extracting noncoding regions
Message-ID: <2187871.1162298479934.JavaMail.root@mailserver>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/emboss/attachments/20061031/be440637/attachment.pl 

From shrish at ccmb.res.in  Tue Oct 31 07:18:36 2006
From: shrish at ccmb.res.in (Shrish Tiwari)
Date: Tue, 31 Oct 2006 17:48:36 +0530 (IST)
Subject: [EMBOSS] showfeat troubles
Message-ID: <24303384.1162297116701.JavaMail.root@mailserver>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/emboss/attachments/20061031/9aa4724d/attachment.pl 

From David.Bauer at schering.de  Tue Oct 31 08:54:03 2006
From: David.Bauer at schering.de (David.Bauer at schering.de)
Date: Tue, 31 Oct 2006 14:54:03 +0100
Subject: [EMBOSS] Reply: showfeat troubles
In-Reply-To: <24303384.1162297116701.JavaMail.root@mailserver>
Message-ID: <OF769146FF.BC4FD662-ONC1257218.004C2056-C1257218.004C5C38@schering.de>


Hi,

I don't see this problem. showfeat displays CDS features from both
strands with EMBL and GenBank files.
What is the source of your GenBank file? Maybe the format is not
quite correct?

David.

emboss-bounces at lists.open-bio.org wrote on 31/10/2006 13:18:36:

> Hi!
> I used the following command to extract only positions of CDS from gbk
> files:
> showfeat -pos -matchtype CDS -width 0
> But I noticed that the program does not extract positions of CDS
> that lie on the complementary strand, e.g. CDS
> complement(5683..6459) did not show up in the resultant file. Any
> ideas on how I can get showfeat to extract these positions too.
> Shrish
> Dr. Shrish Tiwari
> E503, Centre for Cellular and Molecular Biology
> Uppal Road, Hyderabad - 500 007, INDIA
> Phone: 91-40-27192777
> Alternate email: shrish.geo at yahoo.com
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss


From pmr at ebi.ac.uk  Tue Oct 31 10:34:00 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 31 Oct 2006 15:34:00 +0000
Subject: [EMBOSS] extracting noncoding regions
In-Reply-To: <2187871.1162298479934.JavaMail.root@mailserver>
References: <2187871.1162298479934.JavaMail.root@mailserver>
Message-ID: <45476CE8.8080409@ebi.ac.uk>

Hi Shrish,

Shrish Tiwari wrote:
> Hi!
> Is there a way of extracting the noncoding regions of a genome using an EMBOSS program?

That is a simple change to coderet to return non-coding sequence (exclude the 
CDS and mRNA features).
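Until such an option exists, the interval arithmetic behind "exclude the CDS and mRNA features" is simple: merge the coding intervals and report the gaps between them. This is a minimal sketch, not how coderet works internally. It assumes 1-based inclusive (start, end) pairs already extracted from the feature table. Strand is ignored, on the assumption that a region counts as non-coding only when neither strand codes there. The function name noncoding_regions is hypothetical.

```python
def noncoding_regions(seq_length, coding):
    """Return 1-based inclusive (start, end) gaps not covered by any coding interval."""
    gaps = []
    next_free = 1  # first position not yet known to be covered
    for start, end in sorted(coding):
        if start > next_free:
            gaps.append((next_free, start - 1))
        # Overlapping or nested intervals only ever push next_free forward
        next_free = max(next_free, end + 1)
    if next_free <= seq_length:
        gaps.append((next_free, seq_length))
    return gaps


print(noncoding_regions(10000, [(5683, 6459), (100, 900), (800, 1200)]))
# [(1, 99), (1201, 5682), (6460, 10000)]
```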

Does anyone else want this? We can do it for the next release.

regards,

Peter


From pmr at ebi.ac.uk  Tue Oct 31 10:55:49 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 31 Oct 2006 15:55:49 +0000
Subject: [EMBOSS] showfeat troubles
In-Reply-To: <24303384.1162297116701.JavaMail.root@mailserver>
References: <24303384.1162297116701.JavaMail.root@mailserver>
Message-ID: <45477205.50002@ebi.ac.uk>

Hi Shrish,

Shrish Tiwari wrote:
> Hi!
> I used the following command to extract only positions of CDS from gbk files:
> showfeat -pos -matchtype CDS -width 0
> But I noticed that the program does not extract positions of CDS that lie on the complementary strand, e.g. CDS             complement(5683..6459) did not show up in the resultant file. Any ideas on how I can get showfeat to extract these positions too.

It worked for me, but it reports these as 5683..6459 (without -width 0 it
shows the arrow in the reverse direction).
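Peter's point is that the coordinates stay the same and the strand is carried separately. Parsing the feature-table location shows this. This minimal sketch handles only simple start..end ranges, optionally wrapped in complement(...). join(...), fuzzy ends such as <1 and other location forms are deliberately out of scope. The function name parse_simple_location is hypothetical.

```python
import re

# Matches "5683..6459" or "complement(5683..6459)" and nothing more complex;
# (?(1)\)) requires the closing bracket only when "complement(" was present.
LOCATION = re.compile(r"^(complement\()?(\d+)\.\.(\d+)(?(1)\))$")


def parse_simple_location(text):
    """Return (start, end, strand) for a simple feature-table location."""
    match = LOCATION.match(text.strip())
    if match is None:
        raise ValueError("unsupported location: " + text)
    strand = "-" if match.group(1) else "+"
    return int(match.group(2)), int(match.group(3)), strand


print(parse_simple_location("complement(5683..6459)"))  # (5683, 6459, '-')
```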

Can you try running entret on the same genbank entry, and sending the output 
file to emboss-bug at emboss.open-bio.org so we can take a look at it.

regards,

Peter Rice


From David.Bauer at schering.de  Tue Oct 31 09:01:54 2006
From: David.Bauer at schering.de (David.Bauer at schering.de)
Date: Tue, 31 Oct 2006 15:01:54 +0100
Subject: [EMBOSS] Reply: extracting noncoding regions
In-Reply-To: <2187871.1162298479934.JavaMail.root@mailserver>
Message-ID: <OFA8463ADB.4252913F-ONC1257218.004CD418-C1257218.004D1451@schering.de>


Hm,

if the genome is annotated, you could use
maskfeat -type mRNA (or -type CDS)
to mask all transcribed or translated regions with N.
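The effect of masking can be sketched as replacing each feature range with N. This illustrates the idea, not maskfeat's implementation. It assumes 1-based inclusive feature coordinates. The function name mask_regions and its mask_char parameter are hypothetical. Runs of N already present in the sequence would be indistinguishable from masked regions afterwards.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Return sequence with each 1-based inclusive (start, end) region masked."""
    residues = list(sequence)
    for start, end in regions:
        for i in range(start - 1, end):
            residues[i] = mask_char
    return "".join(residues)


print(mask_regions("ACGTACGTAC", [(3, 5), (9, 10)]))  # ACNNNCGTNN
```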

HTH,
David.

emboss-bounces at lists.open-bio.org wrote on 31/10/2006 13:41:19:

> Hi!
> Is there a way of extracting the noncoding regions of a genome using
> an EMBOSS program?
> Shrish
> Dr. Shrish Tiwari
> E503, Centre for Cellular and Molecular Biology
> Uppal Road, Hyderabad - 500 007, INDIA
> Phone: 91-40-27192777
> Alternate email: shrish.geo at yahoo.com
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss


From golharam at umdnj.edu  Tue Oct 31 12:02:38 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 31 Oct 2006 12:02:38 -0500
Subject: [EMBOSS] extracting noncoding regions
In-Reply-To: <45476CE8.8080409@ebi.ac.uk>
Message-ID: <000a01c6fd0e$5f5b5b70$b23d140a@GOLHARMOBILE1>

I think that would be a useful feature...I have a need for it now and
currently use a Bioperl script to parse out noncoding regions from a
GenBank entry...


> -----Original Message-----
> From: emboss-bounces at lists.open-bio.org 
> [mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Peter Rice
> Sent: Tuesday, October 31, 2006 10:34 AM
> To: Shrish Tiwari
> Cc: emboss at emboss.open-bio.org
> Subject: Re: [EMBOSS] extracting noncoding regions
> 
> 
> Hi Shrish,
> 
> Shrish Tiwari wrote:
> > Hi!
> > Is there a way of extracting the noncoding regions of a 
> genome using 
> > an EMBOSS program?
> 
> That is a simple change to coderet to return non-coding 
> sequence (exclude the 
> CDS and mRNA features).
> 
> Does anyone else want this? We can do it for the next release.
> 
> regards,
> 
> Peter
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/emboss
> 


From Richard.Rothery at ualberta.ca  Tue Oct 31 12:02:22 2006
From: Richard.Rothery at ualberta.ca (Richard Rothery)
Date: Tue, 31 Oct 2006 10:02:22 -0700
Subject: [EMBOSS] Batch retrieval of taxonomy/species names using entret.....
Message-ID: <000001c6fd0e$5520f2f0$5e068081@Nordegg>

Hi,

I am interested in using entret to retrieve single-field entries from
swissprot or sptrembl. Specifically, I would like to feed entret a list
of accessions and have it return a file with the species names and/or
taxonomies. I intend to use this information to compare with my
phylogeny analyses of clustalw alignments.

Thanks,

Richard

###############################################
CIHR Membrane Protein Research Group,
Department of Biochemistry, University of Alberta,
Edmonton T6G 2H7
Ph. (780) 492-2229 Fax. (780) 492-0886
###############################################


From Suraj.Mukatira at STJUDE.ORG  Tue Oct 31 13:30:00 2006
From: Suraj.Mukatira at STJUDE.ORG (Mukatira, Suraj)
Date: Tue, 31 Oct 2006 12:30:00 -0600
Subject: [EMBOSS] extracting noncoding regions
Message-ID: <F2235647AC878D438F09255C39842FBC164EC5AD@SJMEMXMB03.stjude.sjcrh.local>


I use BioPerl as well. Extraction of non-coding regions and features
like intron, UTR etc. would certainly be useful from within EMBOSS.
Suraj Mukatira


From pmr at ebi.ac.uk  Tue Oct 31 13:53:00 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 31 Oct 2006 18:53:00 +0000
Subject: [EMBOSS] Batch retrieval of taxonomy/species names using
	entret.....
In-Reply-To: <000001c6fd0e$5520f2f0$5e068081@Nordegg>
References: <000001c6fd0e$5520f2f0$5e068081@Nordegg>
Message-ID: <45479B8C.5080800@ebi.ac.uk>

Hi Richard,

Richard Rothery wrote:
> I am interested in using entret to retrieve single field entries from
> swissprot or sptrembl. Specifically, I would like to feed entret a list
> of accessions and have it return a file with the species names and/or
> taxonomies. I intend to use this information to compare with my
> phylogeny analyses of clustalw alignments.

EMBOSS stores the full text in entret without parsing.

We could try to extract specific fields but it is not easy to define them for 
all formats.

You can do this with SRS. Try the EBI server for example:

Go to the library page

Select UniProtKB/SwissProt (or UniProtKB/TrEMBL)

Select "standard query form"

Enter your query in the top part (e.g. accession number)

In the "create a view" section click the "list" button to get the original
lines. Select anything taxonomic from the pull-down list (control-click to
select more than one).

Press "search".

Refine your query. You will see the URL at the top, which can be used to
retrieve the data when you are happy.

Failing that, you could just parse out the ID and O* lines from entret using a 
simple perl script.
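Peter suggests a simple Perl script for this. The same idea is shown here in Python to match the other sketches in this digest. It assumes Swiss-Prot/UniProt flat-file text as written by entret: each line starts with a two-letter code (ID, AC, OS, OC, OX, ...), and each entry ends with //. The function name organism_lines is hypothetical.

```python
def organism_lines(flatfile_text):
    """Map each entry's ID to its organism lines (line codes starting with 'O')."""
    result = {}
    current_id = None
    for line in flatfile_text.splitlines():
        code = line[:2]
        if code == "ID":
            # The entry name is the first word after the line code
            current_id = line[2:].split()[0]
            result[current_id] = []
        elif code == "//":
            current_id = None  # end of entry
        elif current_id is not None and code.startswith("O"):
            result[current_id].append(line)
    return result
```

Reading the whole entret output file and passing its text to organism_lines gives, per entry, the OS/OC/OX lines that carry the species name and taxonomy.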

Hope that helps,

Peter


From jbreu at mpipsykl.mpg.de  Fri Oct  6 16:53:00 2006
From: jbreu at mpipsykl.mpg.de (Johannes Breu)
Date: Fri, 06 Oct 2006 18:53:00 +0200
Subject: [EMBOSS] question
Message-ID: <3.0.6.32.20061006185300.00ab68e0@komserv.mpipsykl.mpg.de>

To whom it may concern.

I tried to install EMBOSS on MS Windows 2000 in a Cygwin environment. I
typed ./configure (following INSTALL). It took a long time but there was no
error message. After typing make I got the message bash:command:not found.
Does anybody have an idea how to solve this problem? Thanks.


From shaun at ebi.ac.uk  Fri Oct  6 18:30:56 2006
From: shaun at ebi.ac.uk (shaun at ebi.ac.uk)
Date: Fri, 6 Oct 2006 19:30:56 +0100 (BST)
Subject: [EMBOSS] question
In-Reply-To: <3.0.6.32.20061006185300.00ab68e0@komserv.mpipsykl.mpg.de>
References: <3.0.6.32.20061006185300.00ab68e0@komserv.mpipsykl.mpg.de>
Message-ID: <50608.82.21.106.225.1160159456.squirrel@webmail.ebi.ac.uk>

Hi Johannes,

I believe the installation under cygwin requires a couple of additional
switches:

./configure --without-x CFLAGS=-s

See the following URL for a short guide to installing EMBOSS under a
Cygwin environment (it actually describes the scenario that you are
encountering):

http://emboss.sourceforge.net/download/cygwin.html

HTH

Shaun


From mukherje at nsm.umass.edu  Fri Oct 13 19:34:11 2006
From: mukherje at nsm.umass.edu (mukherje at nsm.umass.edu)
Date: Fri, 13 Oct 2006 15:34:11 -0400
Subject: [EMBOSS] (no subject)
Message-ID: <1160768051.452fea3357af3@mail-www.oit.umass.edu>

Hi,

I tried to build the EMBOSS package in a Cygwin environment but got the
following error after running "make":

Creating library file: .libs/libajax.dll.a
collect2: ld returned 1 exit status
make[1]: *** [libajax.la] Error 1
make[1]: Leaving directory '/home/supratim/Emboss-4.0.0/ajax'
make: *** [install-recursive] Error 1

I tried both before and after applying the fix as mentioned, and tried
the regular configure options as well as

./configure --without-x CFLAGS=-s
make
make install

I also tried the older version (EMBOSS-3.0.0) but did not have much luck.
It works fine on a Mac, but I would like to get it to work on a Windows
platform with Cygwin.

Thank you for your assistance in advance

Supratim Mukherjee
Graduate student
Department of Microbiology
UMass, Amherst


From kertib at linuxlap.hu  Mon Oct 16 08:50:03 2006
From: kertib at linuxlap.hu (Kerti Balázs Gábor)
Date: Mon, 16 Oct 2006 10:50:03 +0200
Subject: [EMBOSS] make error
Message-ID: <1160988603.3981.12.camel@balazska.site>

Hi,

./configure ran fine, but make produced the error below:

config.status: creating jemboss/resources/Makefile
config.status: creating jemboss/utils/Makefile
config.status: creating Makefile
config.status: executing depfiles commands
server:/usr/src/EMBOSS-4.0.0 # make
Making all in plplot
make[1]: Entering directory `/usr/src/EMBOSS-4.0.0/plplot'
Making all in lib
make[2]: Entering directory `/usr/src/EMBOSS-4.0.0/plplot/lib'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/usr/src/EMBOSS-4.0.0/plplot/lib'
make[2]: Entering directory `/usr/src/EMBOSS-4.0.0/plplot'
make[2]: Nothing to be done for `all-am'.
make[2]: Leaving directory `/usr/src/EMBOSS-4.0.0/plplot'
make[1]: Leaving directory `/usr/src/EMBOSS-4.0.0/plplot'
Making all in ajax
make[1]: Entering directory `/usr/src/EMBOSS-4.0.0/ajax'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/usr/src/EMBOSS-4.0.0/ajax'
Making all in nucleus
make[1]: Entering directory `/usr/src/EMBOSS-4.0.0/nucleus'
/bin/sh ../libtool --tag=CC --mode=link gcc  -O2   -o libnucleus.la
-rpath /usr/local/lib -version-info 4:0:0 embaln.lo embcom.lo embcons.lo
embdata.lo embdbi.lo embdmx.lo embdomain.lo embest.lo embexit.lo
embgroup.lo embiep.lo embindex.lo embinit.lo embmat.lo embmisc.lo
embmol.lo embnmer.lo embpat.lo embpatlist.lo embprop.lo embpdb.lo
embread.lo embsig.lo embshow.lo embword.lo
libtool: link: `embmat.lo' is not a valid libtool object
make[1]: *** [libnucleus.la] Error 1
make[1]: Leaving directory `/usr/src/EMBOSS-4.0.0/nucleus'
make: *** [all-recursive] Error 1
server:/usr/src/EMBOSS-4.0.0 #

The OS: SuSE OpenEnterprise Server 10.0 x86
Kernel: server: Linux server 2.6.16.21-0.25-default #1 Tue Sep 19 07:26:15 UTC 2006 i686 i686 i386 GNU/Linux

What is the solution?

Thank you for your assistance in advance

Kerti Balazs Gabor
Genetics and Plant Breeding, 
Szent Istvan University, Pater K. U. 1., 
Godollo 2103, Hungary


From maoj at helix.nih.gov  Tue Oct 17 15:36:22 2006
From: maoj at helix.nih.gov (Jean Mao)
Date: Tue, 17 Oct 2006 11:36:22 -0400
Subject: [EMBOSS] Question regarding dbxflat
Message-ID: <000001c6f202$000323e0$be4de780@CIT.NIH.GOV>

Hello,

Could someone help me determine which fields I need to include while running dbxflat? I am going to index the GenBank and EST gb*.seq and gbest*.seq files from ftp://ftp.ncbi.nih.gov/genbank/. These files have sequence entries composed of: Locus, Definition, Accession, Version, Keywords, Source, Organism...

If I specify 'acc, id' while indexing, will the 'Definition' line be indexed or not? What about 'acc, id, des' ? In other words, I would like to know which programs in EMBOSS will not work if I don't specify 'des' while indexing. 
Some programs in EMBOSS, such as 'coderet', require the feature table. If I only index 'acc, id', will coderet work when users specify 'genbank:xxxxx'?

I guess all I am trying to ask is what programs will stop working if I only accept default 'acc, id' fields.

Thank you in advance.


From pmr at ebi.ac.uk  Tue Oct 17 16:23:03 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Tue, 17 Oct 2006 17:23:03 +0100 (BST)
Subject: [EMBOSS] Question regarding dbxflat
In-Reply-To: <000001c6f202$000323e0$be4de780@CIT.NIH.GOV>
References: <000001c6f202$000323e0$be4de780@CIT.NIH.GOV>
Message-ID: <2320.210.150.186.27.1161102183.squirrel@webmail.ebi.ac.uk>

Hi Jean,

> If I specify 'acc, id' while indexing, will the 'Definition' line be
> indexed or not? What about 'acc, id, des' ? In other words, I would like
> to know which programs in EMBOSS will not work if I don't specify 'des'
> while indexing.
> Some programs in EMBOSS such as 'coderet' require feature table. If I only
> index 'acc, id', will coderet work when user specify 'genbank:xxxxx' ?
>
> I guess all I am trying to ask is what programs will stop working if I
> only accept default 'acc, id' fields.

The dbxflat fields only affect queries (do you want to search by
dbname-des: or dbname-gi: when you look for sequences).

Retrieval is the same once an entry has been found: you can return all
text for entret, features for coderet, and so on as usual.

By default only the id and acc lines will be indexed.

We found a problem with one database that had no accessions (pdb as a
fasta file indexed with dbxfasta) so the next release will have an option
to turn off accession searches in the database definition and we may add
an option to skip accession indexing.

regards,

Peter Rice


From smiddha at indiana.edu  Thu Oct 19 14:59:57 2006
From: smiddha at indiana.edu (Sumit Middha)
Date: Thu, 19 Oct 2006 10:59:57 -0400
Subject: [EMBOSS] distmat Uncorrected distance > 100
Message-ID: <453792ED.6040601@indiana.edu>


Hi,

I tried using distmat from emboss on some alignments and am getting 
scores in excess of 100 (using all default options).

I am not sure how scores can exceed 100.

D = uncorrected distance = p-distance = 1-S
where S = m/(npos + gaps*gap_penalty)  

So D is like a percentage and equals the number of substitutions per 100 bases or amino acids.
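The formula as quoted can be checked numerically. This is a minimal sketch of the poster's formula, not distmat's source code. The names m, npos, gaps and gap_penalty follow the post. With 0 <= m <= npos and a non-negative gap term, S lies in [0, 1], so 100*D cannot exceed 100 under this formula. That is why values above 100 are surprising.

```python
def uncorrected_distance_percent(m, npos, gaps=0, gap_penalty=0.0):
    """Uncorrected (p-)distance as a percentage, using D = 1 - S from the post.

    m: matching positions, npos: compared positions,
    gaps: number of gaps, gap_penalty: weight given to each gap.
    """
    denominator = npos + gaps * gap_penalty
    if denominator <= 0:
        raise ValueError("no positions to compare")
    similarity = m / denominator  # S
    return 100.0 * (1.0 - similarity)


print(uncorrected_distance_percent(m=75, npos=100))  # 25.0
```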

Please correct me or point me to an explanation which will help clarify my doubt.

Thanks,
Sumit


From aengus.stewart at cancer.org.uk  Mon Oct 23 18:00:11 2006
From: aengus.stewart at cancer.org.uk (Aengus Stewart)
Date: Mon, 23 Oct 2006 19:00:11 +0100
Subject: [EMBOSS] Fuzznuc ignoring start and end
Message-ID: <453D032B.7010104@cancer.org.uk>


Hi folks,

fuzznuc and also fuzzpro are ignoring the start and end parameters I am giving them.

fuzznuc -pattern rccatgg -sbegin1 75834 -send1 96013 -sequence ac087388.fasta


Cheers
Aengus


########################################
# Program: fuzznuc
# Rundate: Mon Oct 23 2006 18:54:54
# Commandline: fuzznuc
#    -pattern rccatgg
#    -sbegin 75834
#    -send 96013
#    -sequence ac087388.fasta
# Report_format: seqtable
# Report_file: ac087388.fuzznuc
########################################

#=======================================
#
# Sequence: AC087388     from: 75834   to: 96013
# HitCount: 15
#
# Pattern_name Mismatch Pattern
# pattern1            0 rccatgg
#
# Complement: No
#
#=======================================

  Start     End Pattern_name Mismatch Sequence
  38702   38708 pattern1            . gccatgg
  43834   43840 pattern1            . accatgg
  47457   47463 pattern1            . gccatgg
  48659   48665 pattern1            . gccatgg
  56718   56724 pattern1            . accatgg
  61200   61206 pattern1            . accatgg
  62151   62157 pattern1            . accatgg
  68706   68712 pattern1            . accatgg
  78513   78519 pattern1            . gccatgg
  79973   79979 pattern1            . gccatgg
  86415   86421 pattern1            . accatgg
  97451   97457 pattern1            . accatgg
 102803  102809 pattern1            . gccatgg
 113924  113930 pattern1            . gccatgg
 115436  115442 pattern1            . gccatgg

#---------------------------------------
#---------------------------------------

#---------------------------------------
# Total_sequences: 1
# Total_hitcount: 15
#---------------------------------------
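As a stop-gap, hits outside the requested range can be filtered from the report afterwards. In the output above, only the hits at 78513, 79973 and 86415 lie inside 75834..96013. This minimal sketch assumes the seqtable layout shown: data rows start with the Start and End columns, and comment lines start with #. The function name hits_in_range is hypothetical. This filters the report and does not fix the underlying fuzznuc behaviour.

```python
def hits_in_range(report_text, begin, end):
    """Return (start, stop) for seqtable rows lying wholly within begin..end."""
    hits = []
    for line in report_text.splitlines():
        fields = line.split()
        # Skip blank lines, "#" comment lines and the column-header row
        if not fields or not fields[0].isdigit():
            continue
        start, stop = int(fields[0]), int(fields[1])
        if begin <= start and stop <= end:
            hits.append((start, stop))
    return hits
```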

-- 
-----------------------------------------------------------------------
Aengus Stewart
Group Leader
Bioinformatics and BioStatistics               Tel: +44 (0)20 7269 3679
Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK
-----------------------------------------------------------------------



From mrln at o2.pl  Mon Oct 23 21:49:07 2006
From: mrln at o2.pl (Marlena Roszczyk)
Date: Mon, 23 Oct 2006 23:49:07 +0200
Subject: [EMBOSS] 30 entries only
Message-ID: <1161640147.4367.37.camel@localhost.localdomain>

Does anybody know how to solve this problem:

I use EMBOSS via the srswww method and everything seems to work fine until I
ask seqret or infoseq (or any other application that searches the
database) for many sequences (for example by typing "seqret
database-des:kinase"). The output consists of only 30 entries, even
though the same query on srs.ebi.ac.uk returns a six-digit number of
entries. What should I do to get all the entries I want? Is it a problem
with EMBOSS, or rather with the SRS policy for sending data?


Just in case you would like to see my emboss.default:

DB zuniprot [ 
methodquery: srswww
format: swiss
type: P
fields: "id acc sv des key org"
dbalias: uniprot
url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
comment: "uniprot/swiss via srswww"
]

DB zswiss [
methodquery: srswww
format: swiss
type: P
fields: "id acc sv des key org"
dbalias: swissprot
url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
comment: "swissprot via srswww"
]

DB zembl [
type: N
methodquery: srswww
format: embl
fields: "id acc key sv des org"
dbalias: embl
url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"
comment: "embl via srswww"
]


Thanks in advance, 
Marlena Roszczyk


From David.Bauer at SCHERING.DE  Tue Oct 24 06:48:45 2006
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Tue, 24 Oct 2006 08:48:45 +0200
Subject: [EMBOSS] Antwort:  30 entries only
In-Reply-To: <1161640147.4367.37.camel@localhost.localdomain>
Message-ID: <OF5A6D45E3.B212A495-ONC1257211.0024B775-C1257211.00256D2C@schering.de>


Hi Marleno,

SRS has a default limit of 30 entries per page,
so it seems that you are getting only the first page of results from the
server.
If you want to run queries with such large result sets, it may be a good idea to
download the UniProt flat file from the EBI FTP server, index it with
EMBOSS dbxflat and then run the queries locally.
But if this is not an option due to limited resources, I guess Peter will
have an idea how to get the other result pages out of SRS with the srswww
method. ;-)

Cheers,
David.

> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss


From rls at ebi.ac.uk  Tue Oct 24 07:59:46 2006
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Tue, 24 Oct 2006 08:59:46 +0100
Subject: [EMBOSS] 30 entries only
In-Reply-To: <1161640147.4367.37.camel@localhost.localdomain>
References: <1161640147.4367.37.camel@localhost.localdomain>
Message-ID: <453DC7F2.6030908@ebi.ac.uk>

Hi,

I suspect this is related to the default view used in SRS. It is 
returning the first page of results, which contains 30 sequences (the 
default). To overcome this problem, the call needs to have the following 
parameters:

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniprot-des:kinase]+-vn+1+-lv+100

-vn <int> is the view to use (1=names, 2=complete entries)
-lv <int> is the number of entries to be used in one go
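The query URL with these two parameters can be assembled programmatically. The sketch below is a minimal Python example. It assumes only the '+'-separated argument style of the example URL above. The base URL is the 2006 EBI SRS server, which may no longer be available, and the function name is hypothetical.

```python
from urllib.parse import quote

# Base URL of the EBI SRS server as used in this thread (2006); it may no longer exist.
SRS_WGETZ = "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"


def wgetz_url(library, field, term, view=1, entries=100):
    """Build a wgetz query URL with an explicit view (-vn) and page size (-lv).

    wgetz arguments are separated by '+', as in the example URL above;
    view=1 asks for entry names only, view=2 for complete entries.
    """
    query = f"[{library}-{field}:{quote(term, safe='')}]"
    return f"{SRS_WGETZ}?{query}+-vn+{view}+-lv+{entries}"


print(wgetz_url("uniprot", "des", "kinase"))
# http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniprot-des:kinase]+-vn+1+-lv+100
```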

It is important to realize that downloading a lot of entries, although 
possible, takes a while and results in high loads on the servers.

The way I download a large set of entries is by 
generating a list (using -vn 1) and then using the list
with seqret in the following way:

% seqret @listname -out mykinases

and making sure this time that -vn 2 is used to retrieve complete 
entries. This requires the addition of other EMBOSS database definitions 
(one for lists and another for complete entry retrieval).
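The list step might look like the minimal Python sketch below. It writes a list file for seqret @listname, and it assumes the usual EMBOSS list-file convention of one "dbname:ID" address per line. The database name "zuniprot" comes from the definitions earlier in this thread. The IDs in the usage note are hypothetical, and the function name is illustrative only.

```python
from pathlib import Path


def write_list_file(path, dbname, entry_ids):
    """Write a list file with one 'dbname:ID' address per line.

    Assumes the EMBOSS list-file convention (one address per line), so that
    the file can be given to seqret as @path.
    """
    lines = [f"{dbname}:{entry_id}" for entry_id in entry_ids]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines
```

For example, `write_list_file("listname", "zuniprot", ["ENTRY1_HUMAN", "ENTRY2_MOUSE"])` followed by `seqret @listname -out mykinases`. Here the database definition behind "zuniprot" would need to request complete entries (-vn 2).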

Hope this helps,

R:)




From pmr at ebi.ac.uk  Tue Oct 24 08:55:17 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Tue, 24 Oct 2006 09:55:17 +0100 (BST)
Subject: [EMBOSS] 30 entries only
In-Reply-To: <453DC7F2.6030908@ebi.ac.uk>
References: <1161640147.4367.37.camel@localhost.localdomain>
	<453DC7F2.6030908@ebi.ac.uk>
Message-ID: <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>

Rodrigo Lopez writes:

> I suspect this is related to the default view used in SRS. It is
> returning the first page of results that contains 30 sequences (the
> default). To overcome this problem, the call need to have the following
> parameters:
>
> http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniprot-des:kinase]+-vn+1+-lv+100

But this is using the EMBOSS "srswww" access method, which uses

+-e+-ascii

That should return all the complete entries as ASCII text.

Perhaps something has changed on the EBI's SRS server, because this now
only gives me 30 entries.

+-lv+100 does give 100 entries ... but it will take some reworking of the
code to loop through entries that way. Hmmmm.....


regards,

Peter


From rls at ebi.ac.uk  Tue Oct 24 09:01:50 2006
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Tue, 24 Oct 2006 10:01:50 +0100
Subject: [EMBOSS] 30 entries only
In-Reply-To: <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
References: <1161640147.4367.37.camel@localhost.localdomain>
	<453DC7F2.6030908@ebi.ac.uk>
	<3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
Message-ID: <453DD67E.8000600@ebi.ac.uk>

I'll have to wait for our SRS admin to come back to find out if a change 
has in fact taken place or not. hmm....

R:/



From maoj at helix.nih.gov  Tue Oct 24 18:15:17 2006
From: maoj at helix.nih.gov (Jean Mao)
Date: Tue, 24 Oct 2006 14:15:17 -0400
Subject: [EMBOSS] How to include Prosite and Rebase and Print into 'showdb'
	program
Message-ID: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>

Hi, for EMBOSS 4.0.0, is there a way to show both the prosite and rebase databases when I type 'showdb' at the prompt? I asked the same question back in 2003. I was hoping the answer would be different this time :-)

Thanks. 

Jean


From sovani at rohan.sdsu.edu  Tue Oct 24 18:09:07 2006
From: sovani at rohan.sdsu.edu (Sujata Sovani)
Date: Tue, 24 Oct 2006 11:09:07 -0700 (PDT)
Subject: [EMBOSS] Antigenic - input file format?
Message-ID: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>

Hi,

I want to use the package called 'Antigenic' in EMBOSS.
I am not quite clear about the input file format to be used.

How can I input a fasta file to the program? - is it possible to use a
text file that has the amino acid sequence in a fasta format? In which
folder should the file be?

Please let me know.

Thank you.

Regards,
Sujata


From km at mrna.tn.nic.in  Tue Oct 24 21:28:57 2006
From: km at mrna.tn.nic.in (km)
Date: Wed, 25 Oct 2006 02:58:57 +0530
Subject: [EMBOSS] Antigenic - input file format?
In-Reply-To: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
References: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
Message-ID: <20061024212857.GA31781@mrna.tn.nic.in>

Hi,

> I want to use the package called 'Antigenic' in EMBOSS.
> I am not quite clear about the input file format to be used.
Please consult the EMBOSS documentation on your system by typing:
$ tfm antigenic

> How can I input a fasta file to the program?
First check:
tfm antigenic
then, assuming that your sequence(s) are in a text file (myseqs.fa) in FASTA format, run:
$ antigenic -sequence myseqs.fa
> is it possible to use a text file that has the amino acid sequence in a fasta format?
Yes.
> In which folder should the file be?

The simplest solution is to put the sequence file in the current folder.

regards,
KM


From golharam at umdnj.edu  Wed Oct 25 04:28:30 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 25 Oct 2006 00:28:30 -0400
Subject: [EMBOSS] How to include Prosite and Rebase and Print into
 'showdb'program
In-Reply-To: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
Message-ID: <002f01c6f7ee$07005fe0$2f01a8c0@GOLHARMOBILE1>

Have you gotten an answer to this yet?



From pmr at ebi.ac.uk  Wed Oct 25 07:34:19 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 25 Oct 2006 08:34:19 +0100 (BST)
Subject: [EMBOSS] How to include Prosite and Rebase and Print into
 'showdb' program
In-Reply-To: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
References: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
Message-ID: <1836.217.44.133.216.1161761659.squirrel@webmail.ebi.ac.uk>

Hi Jean,

> Hi, for EMBOSS 4.0.0, is there a way to show both prosite and rebase
> databases when I type 'showdb' at the prompt? I asked the same question
> back in 2003. I was hoping the answer will be different this time :-)

Well... EMBOSS 4.0.0 does have extended showdb output, so now we can add
this. The main issue is that there is currently nothing in EMBOSS that
uses the definition, but we would like to add a report of the database
release to the output of programs that use them.

The definitions would be expected to go in RESOURCE definitions in the
emboss.default file, but we could perhaps put something in the output of
the *extract programs.

I will take another look.

regards,

Peter


From pmr at ebi.ac.uk  Wed Oct 25 07:37:56 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 25 Oct 2006 08:37:56 +0100 (BST)
Subject: [EMBOSS] Antigenic - input file format?
In-Reply-To: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
References: <3251.146.244.226.90.1161713347.squirrel@www-rohan.sdsu.edu>
Message-ID: <1840.217.44.133.216.1161761876.squirrel@webmail.ebi.ac.uk>

Hi Sujata,

> I want to use the package called 'Antigenic' in EMBOSS.
> I am not quite clear about the input file format to be used.
>
> How can I input a fasta file to the program? - is it possible to use a
> text file that has the amino acid sequence in a fasta format? In which
> folder should the file be?

All EMBOSS programs read sequences from files, or from databases (local or
remote).

You can put the sequence in a file, in fasta format, and give the filename
to any EMBOSS program as input. Sequences are "input parameters" so simply
putting the filename on the command line is enough.

EMBOSS will look in the current directory, but you can give the full or
relative file path just like any Unix command.

This assumes of course that you are running EMBOSS locally, not through a
web interface (in that case, simply paste a FASTA format sequence into the
text box).
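If the sequence is not already in a file, you can write a FASTA file yourself. The sketch below is a minimal Python example. It assumes the usual FASTA layout: a '>' header line followed by residues, wrapped here at 60 per line. The function names are hypothetical.

```python
from pathlib import Path


def to_fasta(name, sequence, width=60):
    """Format one sequence as FASTA: a '>' header line, then residues
    wrapped at `width` characters per line."""
    residues = "".join(sequence.split())  # drop any whitespace or newlines
    body = "\n".join(residues[i:i + width] for i in range(0, len(residues), width))
    return f">{name}\n{body}\n"


def write_fasta(path, name, sequence):
    """Write a single-sequence FASTA file."""
    Path(path).write_text(to_fasta(name, sequence))
```

For example, call `write_fasta("myseq.fasta", "myprot", sequence)`, then run `antigenic myseq.fasta` in the same directory.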

Hope that helps

Peter


From pmr at ebi.ac.uk  Wed Oct 25 07:42:50 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Wed, 25 Oct 2006 08:42:50 +0100 (BST)
Subject: [EMBOSS] How to include Prosite and Rebase and Print into
 'showdb'program
In-Reply-To: <002f01c6f7ee$07005fe0$2f01a8c0@GOLHARMOBILE1>
References: <000001c6f798$5c2c13c0$be4de780@CIT.NIH.GOV>
	<002f01c6f7ee$07005fe0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <1844.217.44.133.216.1161762170.squirrel@webmail.ebi.ac.uk>

Ryan Golhar writes:
> Have you gotten an answer to this yet?

A bit quick off the mark there, Ryan! :-) :-) :-)

Jean asked in the USA at 7pm our time. You posted this in India at 5am our
time. I answered over breakfast (well, not quite a positive answer, but I
did answer :-)

If only Jean had asked last week ... I was in Japan and I'd have snuck in
a reply already... and Alan, Jon and I do quite often post replies at very
strange hours even when we are home.

regards,

Peter


From mrln at o2.pl  Wed Oct 25 13:13:36 2006
From: mrln at o2.pl (Marlena Roszczyk)
Date: Wed, 25 Oct 2006 15:13:36 +0200
Subject: [EMBOSS] 30 entries only
In-Reply-To: <3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
References: <1161640147.4367.37.camel@localhost.localdomain>
	<453DC7F2.6030908@ebi.ac.uk>
	<3261.217.44.133.216.1161680117.squirrel@webmail.ebi.ac.uk>
Message-ID: <1161782016.4396.52.camel@localhost.localdomain>

Adding the lv parameter helped and is good enough. It required a few more
lines in emboss.default:

DB blahblah [
method: url
format: myfavouriteformat
type: P
url: "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+-ascii+[uniprot-des:%s]+-lv+<int>"
]

Thank you.

Still, option -vn 1 refuses to cooperate, although -vn 2 works fine.
Adding +-vn+1 to the url-line above makes seqret return "Bad value for
-sequence". Hmmm... 

Regards,
Marlena Roszczyk


> Rodrigo Lopez writes:
> 
> > I suspect this is related to the default view used in SRS. It is
> > returning the first page of results that contains 30 sequences (the
> > default). 

 Yes, the number 30 appearing here and there doesn't seem to be a coincidence.


From pmr at ebi.ac.uk  Wed Oct 25 13:27:49 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 25 Oct 2006 14:27:49 +0100
Subject: [EMBOSS] Question regarding seqret
In-Reply-To: <000001c6d11a$0db13530$be4de780@CIT.NIH.GOV>
References: <000001c6d11a$0db13530$be4de780@CIT.NIH.GOV>
Message-ID: <453F6655.2050900@ebi.ac.uk>

Jean Mao wrote:
> Hi, 
> I have a question hopefully someone can help me about it.
> 
> I downloaded the gbvrt1.seq file from ftp://ftp.ncbi.nih.gov/genbank/ as a test, gunzipped it and indexed it with dbxflat (I know it's not larger than 2 GB):
> 
> %  dbxflat -dbname=testdb -dbresource=embl -idformat=gb -directory=. -fields='id,acc,sv,des' -filenames='gbvrt*.seq' -indexoutdir=. -release=0.0 -date='00/00/00'
> 
> Then I ran 'seqret' but failed to retrieve entries using the 'sv' or 'des' fields:

I didn't see an answer to this one, but I suspect you have already figured it out.

dbxflat and dbiflat will have created the sv and des indices.

You have to edit the database definition in emboss.default to say the fields exist.

    fields: "sv des"

then seqret and other programs will know they can use them.
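Before querying, you can check which fields a definition declares. The sketch below is a minimal Python example. It assumes the `fields: "..."` syntax used in the definitions in this thread. The function name is hypothetical, and the definition shown is abbreviated: the other lines a real dbxflat database needs are omitted.

```python
import re


def declared_fields(db_definition):
    """Return the field names listed on the 'fields:' line of an
    emboss.default DB definition, or [] if there is no such line."""
    match = re.search(r'^\s*fields:\s*"([^"]*)"', db_definition, re.MULTILINE)
    return match.group(1).split() if match else []


# Abbreviated, hypothetical definition (other required lines omitted).
testdb = '''DB testdb [
  type: N
  format: genbank
  fields: "sv des"
]'''
missing = {"sv", "des"} - set(declared_fields(testdb))
```

An empty `missing` set means seqret should be allowed to use both the sv and des indices.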

Yes, in theory seqret could work out what indices are available for a dbxflat or 
dbiflat indexed database - but it would be more difficult for an SRS or SRSWWW 
database (for example) so we depend on the database definitions.

Hope that helps,

Peter


From golharam at umdnj.edu  Wed Oct 25 18:50:12 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 25 Oct 2006 14:50:12 -0400
Subject: [EMBOSS] How to include Prosite and Rebase and Print into
 'showdb'program
In-Reply-To: <1844.217.44.133.216.1161762170.squirrel@webmail.ebi.ac.uk>
Message-ID: <006d01c6f866$66d9cbe0$2f01a8c0@GOLHARMOBILE1>


Sorry, I was cleaning out my mail folder.  I had deleted the message
already and noticed it in my deleted box.  The subject caught my
attention.  I thought the message was older...my bad.


From mkitagaw73 at yahoo.co.jp  Fri Oct 27 13:09:40 2006
From: mkitagaw73 at yahoo.co.jp (mkitagaw73 at yahoo.co.jp)
Date: Fri, 27 Oct 2006 22:09:40 +0900
Subject: [EMBOSS] ARACHNE3
Message-ID: <OF4E21ABC7.1D5BAC9A-ON49257214.00484BFB@takara.co.jp>

I cannot find "Arachne 3", the assembler that is the new version of "Arachne 2".
Do you know where it is?
--
Nari


From mincloud at gmail.com  Sun Oct 29 17:39:35 2006
From: mincloud at gmail.com (yun zheng)
Date: Sun, 29 Oct 2006 11:39:35 -0600
Subject: [EMBOSS] How to apply the einverted and etandom to a fasta file
Message-ID: <8f6eb9540610290939i48adf473g2f81c36a14b198ad@mail.gmail.com>

Hi,

I am a new user of EMBOSS. I am trying to find repeat sequences in a
nucleotide sequence file that contains many sequences.

Can anybody tell me how to use einverted and etandem to analyze all the
sequences in a FASTA file?

Many Thanks.

Sincerely

Zheng, yun
Dept of Computer Science and Engineering
Washington Univ in St Louis
Campus Box 1045
1 Brookings Drive
Jolley Hall 505
St Louis, MO 63130


Details:

I installed a version on the Linux platform. The command is as follows,
using the default values:

>einverted -sequence test.fasta -outfile test.outfile -outseq test-i.fasta
Finds DNA inverted repeats
Gap penalty [12]:
Minimum score threshold [50]:
Match score [3]:
Mismatch score [-4]:

But the output file always seems to be empty.

When I try etandem:

>etandem -sequence test.fasta -outfile test-t.out -origfile test.etandem
Looks for tandem repeats in a nucleotide sequence
Minimum repeat size [10]:
Maximum repeat size [10]: 18

However, it seems that only the first sequence is analyzed by einverted
and etandem. The test-t.out file is as follows:

########################################
# Program: etandem
# Rundate: Sat Oct 28 2006 17:24:30
# Commandline: etandem
#    -sequence test.fasta
#    -outfile test-t.out
#    -origfile test.etandem
#    -maxrepeat 18
# Report_format: table
# Report_file: test-t.out
########################################

#=======================================
#
# Sequence: D9X6RJV01EER0J     from: 1   to: 55
# HitCount: 0
#
# Threshold: 20
# Minrepeat: 10
# Maxrepeat: 18
# Mismatch: No
# Uniform: No
#
#=======================================

   Start     End   Score   Size  Count Identity Consensus

#---------------------------------------
#---------------------------------------

 Many thanks.


From gbottu at ben.vub.ac.be  Mon Oct 30 15:33:13 2006
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 30 Oct 2006 16:33:13 +0100
Subject: [EMBOSS] How to apply the einverted and etandom to a fasta file
	- C
In-Reply-To: <8f6eb9540610290939i48adf473g2f81c36a14b198ad@mail.gmail.com>
References: <8f6eb9540610290939i48adf473g2f81c36a14b198ad@mail.gmail.com>
Message-ID: <20061030153313.GA14597@bigben.ulb.ac.be>

On Sun, Oct 29, 2006 at 11:39:35AM -0600, yun zheng wrote:
> I am a new user of emboss. I am trying to find repeat sequences in a
> nucleotide sequence file that have many sequences.
> 
> Can anybody tell me how to use einverted and etandem to analyze all the
> sequences in a fasta file?

einverted searches for palindromes rather than repeats. It works 
without problems on a FASTA multiple sequence file. The output file 
is probably empty because it did not find any good palindrome. Maybe 
you can try experimenting with the parameters.

etandem operates on only one sequence at a time. You can see this if 
you run etandem -help: it takes as input an object of type "sequence" 
rather than "seqall". If you want to treat many sequences at once, you 
will need to put them in separate files; if necessary you can run 
seqret -ossingle on your file. Under the Tc shell (tcsh), provided your 
files are all called something.fasta, you can then do:

foreach FASTAFILE (`ls *.fasta`)
etandem $FASTAFILE -minrepeat=10 -maxrepeat=10 -threshold=20 -auto
end

The problem is that etandem only works well if you provide appropriate 
values for minrepeat/maxrepeat/threshold. You can use equicktandem to get 
an idea (look in the 4th column of its output for a repeat size). Working 
on all sequences in one run will of course only go well if they all
contain repeats of similar size and quality.
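The same per-sequence loop can be scripted without tcsh. The sketch below is a minimal Python example. It splits a multi-FASTA file into one file per sequence, which is similar in spirit to seqret -ossingle but is not its implementation. It then runs etandem on each file with the same options as the loop above. It assumes etandem is on the PATH and that sequence IDs are safe to use as filenames. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def split_fasta(text):
    """Split multi-FASTA text into (id, record) pairs; the id is the
    first word after '>'. Lines before the first header are ignored."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            seq_id = words[0] if words else f"seq{len(records) + 1}"
            records.append((seq_id, [line]))
        elif records:
            records[-1][1].append(line)
    return [(seq_id, "\n".join(lines) + "\n") for seq_id, lines in records]


def etandem_command(fasta_file, minrepeat=10, maxrepeat=10, threshold=20):
    """Argument list mirroring the tcsh loop above."""
    return ["etandem", str(fasta_file), f"-minrepeat={minrepeat}",
            f"-maxrepeat={maxrepeat}", f"-threshold={threshold}", "-auto"]


def run_all(multi_fasta_path, outdir="."):
    """Write each sequence to <id>.fasta in outdir and run etandem on it."""
    text = Path(multi_fasta_path).read_text()
    for seq_id, record in split_fasta(text):
        single = Path(outdir) / f"{seq_id}.fasta"
        single.write_text(record)
        subprocess.run(etandem_command(single), check=True)
```

As with the shell loop, one set of repeat-size parameters is applied to every sequence, so the caveat above about similar repeat sizes still holds.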

I hope this helps.

	Guy Bottu,
	Belgian EMBnet Node


From jbreu at mpipsykl.mpg.de  Mon Oct 30 19:38:10 2006
From: jbreu at mpipsykl.mpg.de (Johannes Breu)
Date: Mon, 30 Oct 2006 20:38:10 +0100
Subject: [EMBOSS] dbifasta
Message-ID: <3.0.6.32.20061030203810.00acc5e8@komserv.mpipsykl.mpg.de>

Hello,
while trying to index my database (it's mouse_ensembl_cdna, and so is the
file name) I always get the following error message:

$ dbifasta
Database indexing for fasta file databases
Database name: cdna
    simple : >ID
     idacc : >ID ACC
     gcgid : >db:ID
  gcgidacc : >db:ID ACC
      dbid : >db ID
      ncbi : | formats
ID line format [idacc]: simple
Database directory [.]: /data/cdna
Wildcard database filename [*.dat]: cdna
Release number [0.0]:
Index date [00/00/00]:
General log output file [outfile.dbifasta]: outfile.cdnafasta

   EMBOSS An error in dbifasta.c at line 210:
No files selected


In case it's relevant, I am using Cygwin.

Thank you, Johannes 


From ajb at ebi.ac.uk  Mon Oct 30 21:30:03 2006
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Mon, 30 Oct 2006 21:30:03 -0000 (GMT)
Subject: [EMBOSS] dbifasta
In-Reply-To: <3.0.6.32.20061030203810.00acc5e8@komserv.mpipsykl.mpg.de>
References: <3.0.6.32.20061030203810.00acc5e8@komserv.mpipsykl.mpg.de>
Message-ID: <40898.81.98.244.247.1162243803.squirrel@webmail.ebi.ac.uk>

Hi,

If the filename is mouse_ensembl_cdna then that's the filename you
should use at the

   Wildcard database filename [*.dat]:

prompt. From your email you were using "cdna" instead. Since a wildcard
can be specified, perhaps you intended to type "*cdna", which would
have picked up the filename mouse_ensembl_cdna
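The difference between the patterns is easy to check. The sketch below is a minimal Python example. It assumes that dbifasta's wildcard follows ordinary Unix shell rules, which Python's fnmatchcase implements. The helper name is hypothetical.

```python
from fnmatch import fnmatchcase


def matching_files(pattern, filenames):
    """Return the filenames that match a shell-style wildcard (case-sensitive)."""
    return [name for name in filenames if fnmatchcase(name, pattern)]


files = ["mouse_ensembl_cdna"]
for pattern in ("*.dat", "cdna", "*cdna"):
    print(f"{pattern!r} -> {matching_files(pattern, files)}")
```

Only "*cdna" selects the file. The default "*.dat" and the literal "cdna" match nothing, which is consistent with the "No files selected" error.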

HTH

Alan
EBI



From shrish at ccmb.res.in  Tue Oct 31 12:41:19 2006
From: shrish at ccmb.res.in (Shrish Tiwari)
Date: Tue, 31 Oct 2006 18:11:19 +0530 (IST)
Subject: [EMBOSS] extracting noncoding regions
Message-ID: <2187871.1162298479934.JavaMail.root@mailserver>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/emboss/attachments/20061031/be440637/attachment.ksh>

From shrish at ccmb.res.in  Tue Oct 31 12:18:36 2006
From: shrish at ccmb.res.in (Shrish Tiwari)
Date: Tue, 31 Oct 2006 17:48:36 +0530 (IST)
Subject: [EMBOSS] showfeat troubles
Message-ID: <24303384.1162297116701.JavaMail.root@mailserver>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/emboss/attachments/20061031/9aa4724d/attachment.ksh>

From David.Bauer at schering.de  Tue Oct 31 13:54:03 2006
From: David.Bauer at schering.de (David.Bauer at schering.de)
Date: Tue, 31 Oct 2006 14:54:03 +0100
Subject: [EMBOSS] Antwort:  showfeat troubles
In-Reply-To: <24303384.1162297116701.JavaMail.root@mailserver>
Message-ID: <OF769146FF.BC4FD662-ONC1257218.004C2056-C1257218.004C5C38@schering.de>


Hi,

I don't get this problem. Showfeat displays CDS from both strands with
EMBL and GenBank files.
What is the source of your GenBank file? Maybe the format is not
quite correct?

David.

emboss-bounces at lists.open-bio.org wrote on 31/10/2006 13:18:36:

> Hi!
> I used the following command to extract only positions of CDS from gbk files:
> showfeat -pos -matchtype CDS -width 0
> But I noticed that the program does not extract positions of CDS
> that lie on the complementary strand, e.g. CDS
> complement(5683..6459) did not show up in the resultant file. Any
> ideas on how I can get showfeat to extract these positions too.
> Shrish
> Dr. Shrish Tiwari
> E503, Centre for Cellular and Molecular Biology
> Uppal Road, Hyderabad - 500 007, INDIA
> Phone: 91-40-27192777
> Alternate email: shrish.geo at yahoo.com


From pmr at ebi.ac.uk  Tue Oct 31 15:34:00 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 31 Oct 2006 15:34:00 +0000
Subject: [EMBOSS] extracting noncoding regions
In-Reply-To: <2187871.1162298479934.JavaMail.root@mailserver>
References: <2187871.1162298479934.JavaMail.root@mailserver>
Message-ID: <45476CE8.8080409@ebi.ac.uk>

Hi Shrish,

Shrish Tiwari wrote:
> Hi!
> Is there a way of extracting the noncoding regions of a genome using an EMBOSS program?

That would be a simple change to coderet: make it return the non-coding
sequence by excluding the CDS and mRNA features.

Does anyone else want this? We can do it for the next release.

regards,

Peter


From pmr at ebi.ac.uk  Tue Oct 31 15:55:49 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 31 Oct 2006 15:55:49 +0000
Subject: [EMBOSS] showfeat troubles
In-Reply-To: <24303384.1162297116701.JavaMail.root@mailserver>
References: <24303384.1162297116701.JavaMail.root@mailserver>
Message-ID: <45477205.50002@ebi.ac.uk>

Hi Shrish,

Shrish Tiwari wrote:
> Hi!
> I used the following command to extract only positions of CDS from gbk files:
> showfeat -pos -matchtype CDS -width 0
> But I noticed that the program does not extract positions of CDS that lie on the complementary strand, e.g. CDS             complement(5683..6459) did not show up in the resultant file. Any ideas on how I can get showfeat to extract these positions too.

It worked for me, but it reports these as 5683..6459 (without -width 0 it will show 
the arrow in the reverse direction)
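The start and end of a reverse-strand CDS are therefore the same numbers, and the strand is carried separately. The sketch below is a minimal Python example that splits a simple GenBank location into these parts. It handles only the `a..b` and `complement(a..b)` forms; join(), order() and partial (<, >) locations are not covered. The function name is hypothetical.

```python
import re

# Only simple ranges are handled; join(), order() and partial (<, >) locations are not.
_LOCATION = re.compile(r"^(?:complement\((\d+)\.\.(\d+)\)|(\d+)\.\.(\d+))$")


def parse_simple_location(location):
    """Return (start, end, strand) for 'a..b' or 'complement(a..b)'."""
    match = _LOCATION.match(location.strip())
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    if match.group(1) is not None:
        return int(match.group(1)), int(match.group(2)), "-"
    return int(match.group(3)), int(match.group(4)), "+"
```

For example, `parse_simple_location("complement(5683..6459)")` gives `(5683, 6459, "-")`.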

Could you try running entret on the same GenBank entry and send the output 
file to emboss-bug at emboss.open-bio.org, so we can take a look at it?

regards,

Peter Rice


From David.Bauer at schering.de  Tue Oct 31 14:01:54 2006
From: David.Bauer at schering.de (David.Bauer at schering.de)
Date: Tue, 31 Oct 2006 15:01:54 +0100
Subject: [EMBOSS] Antwort:  extracting noncoding regions
In-Reply-To: <2187871.1162298479934.JavaMail.root@mailserver>
Message-ID: <OFA8463ADB.4252913F-ONC1257218.004CD418-C1257218.004D1451@schering.de>


Hm,

if the genome is annotated, you could use
maskfeat -type mRNA (or -type CDS)
to mask all transcribed or translated regions with N.
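Both ideas in this thread reduce to interval arithmetic on the feature locations. You can mask the coding ranges with N, or take the gaps between them as the non-coding regions. The sketch below is a minimal Python example. It assumes 1-based inclusive (start, end) ranges already taken from the CDS or mRNA features, with strand ignored because a complement() feature covers the same positions. It is not how maskfeat or coderet are implemented, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def noncoding_regions(seq_length, coding):
    """Return the 1-based inclusive (start, end) gaps not covered by any coding interval."""
    gaps = []
    position = 1
    for start, end in merge_intervals(coding):
        if start > position:
            gaps.append((position, start - 1))
        position = max(position, end + 1)
    if position <= seq_length:
        gaps.append((position, seq_length))
    return gaps


def mask(seq, coding, char="N"):
    """Replace every coding position with `char`, in the spirit of maskfeat."""
    chars = list(seq)
    for start, end in coding:
        end = min(end, len(chars))  # ignore any overhang past the sequence end
        chars[start - 1:end] = char * (end - start + 1)
    return "".join(chars)
```

The non-coding subsequences are then `seq[start - 1:end]` for each gap.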

HTH,
David.

emboss-bounces at lists.open-bio.org wrote on 31/10/2006 13:41:19:

> Hi!
> Is there a way of extracting the noncoding regions of a genome using
> an EMBOSS program?
> Shrish
> Dr. Shrish Tiwari
> E503, Centre for Cellular and Molecular Biology
> Uppal Road, Hyderabad - 500 007, INDIA
> Phone: 91-40-27192777
> Alternate email: shrish.geo at yahoo.com


From golharam at umdnj.edu  Tue Oct 31 17:02:38 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 31 Oct 2006 12:02:38 -0500
Subject: [EMBOSS] extracting noncoding regions
In-Reply-To: <45476CE8.8080409@ebi.ac.uk>
Message-ID: <000a01c6fd0e$5f5b5b70$b23d140a@GOLHARMOBILE1>

I think that would be a useful feature. I have a need for it now and
currently use a BioPerl script to parse out noncoding regions from a
GenBank entry.
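The core of such a script is small: merge the coding intervals and report the gaps between them. The Python sketch below assumes 1-based, inclusive coordinates written low..high, which is how GenBank writes them even for complement() features, so the strand can be ignored. noncoding_intervals is a hypothetical name, not an EMBOSS or BioPerl function.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def noncoding_intervals(seq_length: int, coding: List[Interval]) -> List[Interval]:
    """Return the 1-based, inclusive gaps not covered by any coding interval."""
    gaps: List[Interval] = []
    next_free = 1  # first position not yet covered by a coding interval
    for start, end in sorted(coding):
        if start > next_free:
            gaps.append((next_free, start - 1))
        # Overlapping or nested intervals just extend the covered region.
        next_free = max(next_free, end + 1)
    if next_free <= seq_length:
        gaps.append((next_free, seq_length))
    return gaps


if __name__ == "__main__":
    cds = [(190, 255), (5683, 6459), (6400, 7000)]
    print(noncoding_intervals(10000, cds))
    # [(1, 189), (256, 5682), (7001, 10000)]
```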


> -----Original Message-----
> From: emboss-bounces at lists.open-bio.org 
> [mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Peter Rice
> Sent: Tuesday, October 31, 2006 10:34 AM
> To: Shrish Tiwari
> Cc: emboss at emboss.open-bio.org
> Subject: Re: [EMBOSS] extracting noncoding regions
> 
> 
> Hi Shrish,
> 
> Shrish Tiwari wrote:
> > Hi!
> > Is there a way of extracting the noncoding regions of a genome using
> > an EMBOSS program?
> 
> That is a simple change to coderet to return non-coding
> sequence (exclude the CDS and mRNA features).
> 
> Does anyone else want this? We can do it for the next release.
> 
> regards,
> 
> Peter
> 


From Richard.Rothery at ualberta.ca  Tue Oct 31 17:02:22 2006
From: Richard.Rothery at ualberta.ca (Richard Rothery)
Date: Tue, 31 Oct 2006 10:02:22 -0700
Subject: [EMBOSS] Batch retrieval of taxonomy/species names using entret.....
Message-ID: <000001c6fd0e$5520f2f0$5e068081@Nordegg>

Hi,

I am interested in using entret to retrieve single fields from
SwissProt or SpTrEMBL entries. Specifically, I would like to feed entret a list
of accessions and have it return a file with the species names and/or
taxonomies. I intend to use this information to compare with my
phylogenetic analyses of ClustalW alignments.

Thanks,

Richard

###############################################
CIHR Membrane Protein Research Group,
Department of Biochemistry, University of Alberta,
Edmonton T6G 2H7
Ph. (780) 492-2229 Fax. (780) 492-0886
###############################################

From Suraj.Mukatira at STJUDE.ORG  Tue Oct 31 18:30:00 2006
From: Suraj.Mukatira at STJUDE.ORG (Mukatira, Suraj)
Date: Tue, 31 Oct 2006 12:30:00 -0600
Subject: [EMBOSS] extracting noncoding regions
Message-ID: <F2235647AC878D438F09255C39842FBC164EC5AD@SJMEMXMB03.stjude.sjcrh.local>


I use BioPerl as well. Extraction of non-coding regions and of features
such as introns, UTRs, etc. would certainly be useful from within EMBOSS.
Suraj Mukatira
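Introns can be derived from the same feature table, because they are the gaps between the segments of a join() location. In the Python sketch below, introns_from_join is a hypothetical helper. It assumes plain numeric segments, with no remote accession:range parts and no < or > partial markers. It also accepts complement(join(...)), because the segment coordinates are still written low..high.

```python
import re
from typing import List, Tuple

# Matches each numeric "start..end" segment inside a location string.
_SEGMENT = re.compile(r"(\d+)\.\.(\d+)")


def introns_from_join(location: str) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive gaps between exon segments of a join()."""
    exons = sorted((int(a), int(b)) for a, b in _SEGMENT.findall(location))
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
        if next_start - prev_end > 1  # skip directly abutting segments
    ]


if __name__ == "__main__":
    print(introns_from_join("join(100..200,300..400,500..650)"))
    # [(201, 299), (401, 499)]
```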


-----Original Message-----
From: emboss-bounces at lists.open-bio.org
[mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
Sent: Tuesday, October 31, 2006 11:03 AM
To: 'Peter Rice'; 'Shrish Tiwari'
Cc: emboss at emboss.open-bio.org
Subject: Re: [EMBOSS] extracting noncoding regions

I think that would be a useful feature...I have a need for it now and
currently use a Bioperl script to parse out noncoding regions from a
GenBank entry...




From pmr at ebi.ac.uk  Tue Oct 31 18:53:00 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 31 Oct 2006 18:53:00 +0000
Subject: [EMBOSS] Batch retrieval of taxonomy/species names using
	entret.....
In-Reply-To: <000001c6fd0e$5520f2f0$5e068081@Nordegg>
References: <000001c6fd0e$5520f2f0$5e068081@Nordegg>
Message-ID: <45479B8C.5080800@ebi.ac.uk>

Hi Richard,

Richard Rothery wrote:
> I am interested in using entret to retrieve single fields from
> SwissProt or SpTrEMBL entries. Specifically, I would like to feed entret a list
> of accessions and have it return a file with the species names and/or
> taxonomies. I intend to use this information to compare with my
> phylogenetic analyses of ClustalW alignments.

The entret program returns the full text of each entry without parsing it.

We could try to extract specific fields, but they are not easy to define for
all formats.

You can do this with SRS. Try the EBI server, for example:

Go to the library page

Select UniProtKB/SwissProt (or UniProtKB/TrEMBL)

Select "standard query form"

Enter your query in the top part (e.g. accession number)

In the "create a view" section, click the "list" button to get the original
lines. Select anything taxonomic from the pull-down list (control-click to
select more than one).

Press "search".

Refine your query. You will see the URL at the top, which can be used to retrieve
the data when you are happy with the results.

Failing that, you could just parse out the ID and O* lines (OS, OC, OX, ...) from
the entret output using a simple Perl script.
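Here is one way that parsing step might look, written in Python instead of Perl. This minimal sketch assumes the entret output is a UniProt flat file. In that format each line starts with a two-letter code, the data begins in column 6, and each entry ends with //. organism_lines is a hypothetical helper, and the sample entry is invented for illustration.

```python
from typing import Dict, List, Optional


def organism_lines(flatfile_text: str) -> Dict[str, List[str]]:
    """Map each entry's ID to its O* lines (OS, OC, OG, OX, OH)."""
    result: Dict[str, List[str]] = {}
    current_id: Optional[str] = None
    for line in flatfile_text.splitlines():
        code = line[:2]
        if code == "ID":
            # Data starts in column 6; the entry name is the first word.
            current_id = line[5:].split()[0]
            result[current_id] = []
        elif code.startswith("O") and current_id is not None:
            result[current_id].append(line)
        elif line.startswith("//"):
            current_id = None  # end of entry
    return result


if __name__ == "__main__":
    # Invented sample entry, for illustration only.
    sample = (
        "ID   EXAMPLE_HUMAN           Reviewed;         100 AA.\n"
        "OS   Homo sapiens (Human).\n"
        "OX   NCBI_TaxID=9606;\n"
        "//\n"
    )
    for entry_id, lines in organism_lines(sample).items():
        print(entry_id, lines)
```

To process a real entret output file, read it with open(...).read() and pass the text to organism_lines. The file name depends on how you ran entret.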

Hope that helps,

Peter