From jboddu at uiuc.edu  Tue Jun 10 10:50:12 2008
From: jboddu at uiuc.edu (Jay)
Date: Tue, 10 Jun 2008 09:50:12 -0500
Subject: [EMBOSS] sequence retrieval
Message-ID: <000c01c8cb09$4969bdf0$dc3d39d0$@edu>

Hi:

I am brand new to EMBOSS and bioinformatics.

I have a large file with sequences in fasta format. They have IDs.

Is there any EMBOSS way to retrieve sequences by inputting a text file with
a short list of IDs?

Thanks

Jay


From pmr at ebi.ac.uk  Tue Jun 10 12:11:50 2008
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 10 Jun 2008 17:11:50 +0100
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <000c01c8cb09$4969bdf0$dc3d39d0$@edu>
References: <000c01c8cb09$4969bdf0$dc3d39d0$@edu>
Message-ID: <484EA7C6.2050904@ebi.ac.uk>

Jay wrote:
> I have a large file with sequences in fasta format. They have IDs.
> 
> Is there any EMBOSS way to retrieve sequences by inputting a text file with
> a short list of IDs?

With EMBOSS you can refer to sequences in the file:

filename:id

You can also put a list of these into a file, and use that with
@listfilename

But this can be slow - it will read the file for each ID. You can also
index the file with dbxfasta (or dbifasta) as a private database then
define a database in your .embossrc file and use the dbname:id syntax
(again you can use a list file, but it will be much faster)

Hope this helps. If you need more help setting up please ask again!

regards,

Peter

From rls at ebi.ac.uk  Tue Jun 10 13:09:42 2008
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Tue, 10 Jun 2008 18:09:42 +0100
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <484EA7C6.2050904@ebi.ac.uk>
References: <000c01c8cb09$4969bdf0$dc3d39d0$@edu> <484EA7C6.2050904@ebi.ac.uk>
Message-ID: <484EB556.4050307@ebi.ac.uk>

Alternatively, look into dbfetch and wsdbfetch Web Services:

http://www.ebi.ac.uk/dbfetch

http://www.ebi.ac.uk/Tools/webservices

All the EMBOSS applications are available under WSDBFetch/SOAPLAB.

R:)


Peter Rice wrote:
> Jay wrote:
>> I have a large file with sequences in fasta format. They have IDs.
>>
>> Is there any EMBOSS way to retrieve sequences by inputting a text file with
>> a short list of IDs?
> 
> With EMBOSS you can refer to sequences in the file:
> 
> filename:id
> 
> You can also put a list of these into a file, and use that with
> @listfilename
> 
> But this can be slow - it will read the file for each ID. You can also
> index the file with dbxfasta (or dbifasta) as a private database then
> define a database in your .embossrc file and use the dbname:id syntax
> (again you can use a list file, but it will be much faster)
> 
> Hope this helps. If you need more help setting up please ask again!
> 
> regards,
> 
> Peter
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss

From db60 at st-andrews.ac.uk  Tue Jun 10 14:49:51 2008
From: db60 at st-andrews.ac.uk (Daniel Barker)
Date: Tue, 10 Jun 2008 19:49:51 +0100
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <484EB556.4050307@ebi.ac.uk>
References: <000c01c8cb09$4969bdf0$dc3d39d0$@edu> <484EA7C6.2050904@ebi.ac.uk>
	<484EB556.4050307@ebi.ac.uk>
Message-ID: <1213123791.484ecccf46b78@webmail.st-andrews.ac.uk>

Dear Jay,

Are you simply trying to extract specific sequences from a Fasta-format
file? The EMBOSS program to do it is seqret, or maybe seqretsplit:

http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/seqret.html

http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/seqretsplit.html

As Peter Rice suggests, you can do stuff to speed the access up, but
it'll work without that.

Best regards,

Daniel

-- 
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland :
No SC013532


------------------------------------------------------------------
University of St Andrews Webmail: https://webmail.st-andrews.ac.uk


From jboddu at uiuc.edu  Tue Jun 10 16:15:35 2008
From: jboddu at uiuc.edu (Jay)
Date: Tue, 10 Jun 2008 15:15:35 -0500
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <1213123791.484ecccf46b78@webmail.st-andrews.ac.uk>
References: <000c01c8cb09$4969bdf0$dc3d39d0$@edu> <484EA7C6.2050904@ebi.ac.uk>
	<484EB556.4050307@ebi.ac.uk>
	<1213123791.484ecccf46b78@webmail.st-andrews.ac.uk>
Message-ID: <002401c8cb36$be5f5390$3b1dfab0$@edu>

Daniel:
I tried seqret in different ways.
My problem is that EMBOSS is not recognizing my master sequence file (which is in
fasta format) as my private database, even after I indexed it using
dbifasta.
When seqret asks me to input sequence(s), I am not able to figure out
what exactly it accepts.
I tried dbname:ID and dbname:@listfile.
I also tried the crude approach of copying my master file and listfile into the
"embl" folder in the EMBOSSwin folder and trying the same syntax (embl:ID,
embl:@listfile, etc.).
These did not work.
I am assuming that my master file is not being recognized as a private DB.
I wanted to define my database in the .embossrc file, but I could not figure this
out either.
Jay


From sean.maceach at gmail.com  Tue Jun 10 17:00:33 2008
From: sean.maceach at gmail.com (Sean MacEachern)
Date: Tue, 10 Jun 2008 17:00:33 -0400
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <002401c8cb36$be5f5390$3b1dfab0$@edu>
Message-ID: <C47463B1.2D79%sean.maceach@gmail.com>

Hi Jay, 

Just wondering if you have considered the tools from NCBI. If you were to
download the blast bundle (I think blast-2.2.17 is the most current release),
you can use formatdb to create a blastable database of your fasta seqs, which
you can then search with one of the blast programs or retrieve from using
fastacmd.

I'm not sure what emboss application you are attempting to use, but you could
probably use a for loop to automate some procedure.

E.g.

for i in `cat seqIDs.txt`; do
  fastacmd -d blastdb -s "$i" > seq.fsa &&
    primer3 -input seq.fsa -output "${i}_out.primers"
done

Depending on what you want to do, something like that might work for you...

Cheers,
Sean


From ztu at msi.umn.edu  Tue Jun 10 17:54:06 2008
From: ztu at msi.umn.edu (Zheng Jin Tu)
Date: Tue, 10 Jun 2008 16:54:06 -0500 (CDT)
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <C47463B1.2D79%sean.maceach@gmail.com>
References: <C47463B1.2D79%sean.maceach@gmail.com>
Message-ID: <Pine.LNX.4.63.0806101633570.4628@l11.msi.umn.edu>


This is a very common requirement in the biological
user community, especially among microarray users.
They have a list of IDs (Affy IDs or accession numbers) from
microarray data analysis, and then they want the sequences
from a fasta file such as an Affymetrix Library xxx.sif
file.

In order to use EMBOSS, the EMBOSS admin needs to
index the database first.

NCBI fastacmd is another option for getting
sequences fast, especially from a large fasta sequence
file such as nt or nr.

A perl script will be useful for batch sequence
retrieval. It will read the input file with the
list of IDs line by line, then do one of:

1): fastacmd -d database -s ID >> outsequence    # ncbi formatdb case

2): seqret .....                                 # EMBOSS case

3): Or just loop over the sequence file, setting a found/not-found flag
by matching each ID against the fasta heading ">id ...", and
output the sequence while the flag is on. This is practical if the
sequence file is relatively small, as in the microarray case.
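
Option 3 can be sketched in Python rather than Perl. This is only an
illustration, assuming the ID is the first whitespace-delimited word after
">" on each header line; the function name filter_fasta is hypothetical
and is not part of EMBOSS or the NCBI tools.

```python
import io


def filter_fasta(fasta_handle, wanted, out_handle):
    """Copy every record whose ID is in `wanted`; return the IDs found."""
    keep = False   # the found/not-found flag described above
    found = set()
    for line in fasta_handle:
        if line.startswith(">"):
            # Assumption: the ID is the first word of the header line.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            keep = seq_id in wanted
            if keep:
                found.add(seq_id)
        if keep:
            out_handle.write(line)
    return found


# Small in-memory demonstration; real use would pass open file handles.
demo_fasta = io.StringIO(">a first\nACGT\n>b second\nGGCC\nTT\n>c third\nAAAA\n")
demo_out = io.StringIO()
demo_found = filter_fasta(demo_fasta, {"b", "missing"}, demo_out)
print(demo_out.getvalue(), end="")
print("not found:", sorted({"b", "missing"} - demo_found))
```

The whole sequence file is read once regardless of how many IDs are
requested, which is why this is only attractive for smaller files.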


Thanks, TU


From david.bauer at bayerhealthcare.com  Wed Jun 11 01:45:57 2008
From: david.bauer at bayerhealthcare.com (david.bauer at bayerhealthcare.com)
Date: Wed, 11 Jun 2008 07:45:57 +0200
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <002401c8cb36$be5f5390$3b1dfab0$@edu>
Message-ID: <OFFE26CC45.6DB65070-ONC1257465.001EC386-C1257465.001FAC56@schering.de>

Hi,

the database section of the adminguide
http://emboss.sourceforge.net/docs/adminguide/node37.html
describes all the emboss database indexing methods.
There is also a specific chapter on fasta files
http://emboss.sourceforge.net/docs/adminguide/node56.html
which describes the different forms of fasta files. It is important to 
specify the correct type corresponding to the structure of the sequence 
header line.
And also use full path names for the "Database directory" because relative 
path names like "." can cause problems on some systems.
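
As a rough illustration of the points above, the Python sketch below
prints a database definition with an absolute directory path. The
attribute names (type, format, method, directory) and the method values
("emblcd" for dbifasta indexes, "emboss" for dbxfasta indexes) are given
from memory of the admin guide and should be checked against it; the
database name "mydb" and the helper embossrc_entry are hypothetical.

```python
import os


def embossrc_entry(dbname, index_dir, seq_type="N", method="emblcd"):
    """Return text for a DB definition, using a full path for the directory."""
    # Relative paths such as "." are resolved to an absolute path here.
    directory = os.path.abspath(index_dir)
    return (
        f"DB {dbname} [\n"
        f"  type: {seq_type}\n"
        f"  format: fasta\n"
        f"  method: {method}\n"
        f"  directory: {directory}\n"
        f"]\n"
    )


# Print a definition for an index held in the current directory.
print(embossrc_entry("mydb", "."), end="")
```

The printed block would be pasted into .embossrc, after which a sequence
would be addressed as mydb:ID.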

If you still have trouble, send me the section you have in .embossrc, so I 
can have a look at it.
Hope this helps,

Cheers,
David.


From db60 at st-andrews.ac.uk  Wed Jun 11 06:49:12 2008
From: db60 at st-andrews.ac.uk (Daniel Barker)
Date: Wed, 11 Jun 2008 11:49:12 +0100
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <002401c8cb36$be5f5390$3b1dfab0$@edu>
References: <000c01c8cb09$4969bdf0$dc3d39d0$@edu> <484EA7C6.2050904@ebi.ac.uk>
	<484EB556.4050307@ebi.ac.uk>
	<1213123791.484ecccf46b78@webmail.st-andrews.ac.uk>
	<002401c8cb36$be5f5390$3b1dfab0$@edu>
Message-ID: <484FADA8.5090503@st-andrews.ac.uk>

Dear Jay,

My simple idea is just something like this:

seqret @id_list.txt

where id_list.txt is something like this:

23214.O_sativa_Nipponbare.fasta:Q9FXT4
23214.O_sativa_Nipponbare.fasta:Q2R8Z5
23214.O_sativa_Nipponbare.fasta:Q10AZ4

(23214.O_sativa_Nipponbare.fasta is a Fasta-format file in the current 
directory.)

This certainly works - however, it may not really match what you're after.
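
If the IDs are already in a plain text file, one per line, a list file in
the filename:id form shown above could be generated with a few lines of
Python. This is a minimal sketch; the helper name build_list_file is
hypothetical, and the file name and IDs are the ones from the example.

```python
def build_list_file(fasta_name, ids):
    """Return list-file text with one 'filename:id' entry per line."""
    lines = []
    for raw in ids:
        seq_id = raw.strip()
        if seq_id:  # skip blank lines in the ID file
            lines.append(f"{fasta_name}:{seq_id}")
    return "".join(line + "\n" for line in lines)


# The IDs would normally come from iterating over an open text file.
id_lines = ["Q9FXT4\n", "Q2R8Z5\n", "\n", "Q10AZ4\n"]
print(build_list_file("23214.O_sativa_Nipponbare.fasta", id_lines), end="")
```

Writing the result to id_list.txt gives a file that can be passed as
seqret @id_list.txt.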

Best wishes,

Daniel

-- 
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland :
No SC013532

From orbitus007 at gmail.com  Wed Jun 11 12:44:32 2008
From: orbitus007 at gmail.com (Rudy Aramayo)
Date: Wed, 11 Jun 2008 09:44:32 -0700
Subject: [EMBOSS] Emboss Wrapper for Mac OS X
Message-ID: <29555DCF-21ED-48A0-99DE-F6862BAD400B@neo.tamu.edu>

Howdy!

My name is Rodolfo Aramayo. I have written an application that wraps
Emboss, as well as any Unix application, for the Mac. It will ONLY cover
the Apple side of the spectrum (Mac OS X Leopard and higher).

We distribute task assignments to a computer with the AppleScript
language, allowing us to manipulate all the beautiful functionality of
the Emboss package, including NCBI Blast, from scripts.

This application is a generic wrapper tool for all Unix applications.
It allows us to control Emboss and/or other bioinformatics Unix
applications. With this tool we have also incorporated the ability to
communicate with an XGrid (distributed computations). This is a way to
send messages to every computer on an "XGrid" network (using
AppleScript scripts, of course) so that you can get a simple cluster of
computers to perform a large task. For example, we distribute a local
Blast search of entire genomes amongst an XGrid and collect the
results on a single machine.

Apple also has a powerful Automator Workflow feature (a wrapper for
AppleScript). This allows users who do not have any AppleScript
experience to script the modular components of the application (like
reading in data, or blasting data) with graphical drag-and-drop
modules.

In this manner we have written iBioCAD to be presented at WWDC 2008,
the Apple Worldwide Developers Conference. Look for us soon.
The product is NOT ready and we are still developing it. We will be
completing most of this project, and I will be displaying a scientific
poster on the structure of the application. Let's build a great
wrapper to graphically display bioinformatics to the world, together.

-Rodolfo Aramayo

From john.walshaw at bbsrc.ac.uk  Wed Jun 11 13:51:00 2008
From: john.walshaw at bbsrc.ac.uk (john walshaw (JIC))
Date: Wed, 11 Jun 2008 18:51:00 +0100
Subject: [EMBOSS] problem with unauthenticated Jemboss server
Message-ID: <E15BDDABACA8AB409BCC1071AC790DCB0187422D@NBIE2KSRV1.nbi.bbsrc.ac.uk>

 
Hello,
 
I am trying to install an un-authenticated Jemboss server on Linux
(RHEL4, on an AMD64 platform). I've managed this before on other RedHat
flavours, and on Tru64.
 
Everything appears to be ok in terms of the Jemboss service being
deployed, which I can see on the Tomcat server via Axis. However, when I
try and connect with my Jemboss client, I immediately get the "Check
Settings" popup, even though the Public/Private server details appear
correct. As expected, at no stage does a login dialogue appear. However,
if I click OK on the Check Settings popup, then try and run an EMBOSS
app, I get the popup: "Authentication failed/ The server wants a
username and password ..."
 
Can anybody help me diagnose the cause? The logs produced by the vanilla
Tomcat installation aren't very helpful. Details are:
 
EMBOSS 5.0.0
Tomcat 5.0.28
Axis 1.4
Sun Java 1.5.0.11.x86_64
kernel 2.6.9-42.ELsmp

The installation is on a node ('node7') of a cluster behind a firewall.
I'm running the client on the same host and another one behind the same
firewall.
 
When running configure, I specified --without-auth  (and
--with-thread=linux and --enable-64).
 
When building Jemboss, I compiled the JembossServer and
JembossFileServer classes (not the ...Auth.. equivalents).
 
The relevant entries in the jemboss.properties file used by both server
& client are:
 
user.auth=false
jemboss.server=true
server.public=http://node7:8080/axis/services
server.private=http://node7:8080/axis/services
service.public=JembossServer
service.private=JembossServer

The above server details appear as expected in the Preferences ->
Settings -> Servers dialogue of the Jemboss client.
 
 
After starting Tomcat and deploying JembossServer, I can go to:
 
http://node7:8080/axis/services/JembossServer
 
using a browser on the same node or a different one on the cluster. I
get the expected page
("JembossServer  Hi there, this is an AXIS service! .... " etc).
 
http://node7:8080/axis/happyaxis.jsp  lists all the Needed Components,
and all are present. All that is missing is one optional component, the
XML Security class.
 
http://node7:8080/axis/servlet/AxisServlet shows that both JembossServer
and EmbreoFile have been added - they and all their methods are listed.

 
If I run the Jemboss client on the same host as the server, it's still
the same problem if I specify the servers as
http://localhost:8080/axis/services

Any help much appreciated,
 
regards,
 
John.
 
 
Dr John Walshaw
Department of Computational & Systems Biology
John Innes Centre
Colney
Norwich NR4 7UH
UK


From maoj at helix.nih.gov  Fri Jun 13 16:27:36 2008
From: maoj at helix.nih.gov (Jean Mao)
Date: Fri, 13 Jun 2008 16:27:36 -0400
Subject: [EMBOSS] Question about seq fragments merge then align
Message-ID: <4852D838.4010406@helix.nih.gov>

Hi all,

I would like to know which program(s) I should use to do the following, 
preferably in as few steps as possible:

- find the overlap regions of multiple sequence fragments
- merge them into one big sequence
- align to a known sequence

I found programs that only merge 2 sequences, not multiple sequences.

Thank you very much.

Jean Mao

From andrespinzon at gmail.com  Tue Jun 17 15:47:17 2008
From: andrespinzon at gmail.com (Andres Pinzon)
Date: Tue, 17 Jun 2008 14:47:17 -0500
Subject: [EMBOSS] notseq and fasta definition headers
Message-ID: <8968fc7e0806171247o40d2f7a7gd64618d567c125fd@mail.gmail.com>

Hi,
I'm using notseq to obtain a subset of fasta seqs from a multiple fasta file:

notseq -junkoutseq 1000-1.fasta -sequence 7135seqs.fasta -exclude
@xaa.list.fasta -outseq leftSeqs.fast

The output is correct, but notseq changes the definition in the fasta
headers, so if the fasta header in "xaa.list.fasta" was:

lcl|29855|ORF26673_6

the corresponding fasta header in 1000-1.fasta is:

29855

Is there a way to tell "notseq" to keep the original fasta headers intact?

Thanks in advance,

-- 
Andrés Pinzón cPhD
http://bioinf.ibun.unal.edu.co/~apinzon/
Bioinformatics Center, Colombia EMBnet node
http://bioinf.ibun.unal.edu.co
Tel +57 3165000 ext 16961 Fax +571 3165415
Micology and Phytopathology Laboratory - Los Andes University.
http://bioinf.uniandes.edu.co
Tel +571 3394949 ext. 2768


From pmr at ebi.ac.uk  Tue Jun 17 16:28:47 2008
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 17 Jun 2008 21:28:47 +0100
Subject: [EMBOSS] notseq and fasta definition headers
In-Reply-To: <8968fc7e0806171247o40d2f7a7gd64618d567c125fd@mail.gmail.com>
References: <8968fc7e0806171247o40d2f7a7gd64618d567c125fd@mail.gmail.com>
Message-ID: <48581E7F.40706@ebi.ac.uk>

Andres Pinzon wrote:
> The output is correct, but notseq changes the definition in the fasta
> headers, so if the fasta header in "xaa.list.fasta" was:
> 
> lcl|29855|ORF26673_6
> 
> the corresponding fasta header in sequence in 1000-1.fasta is:
> 
> 29855
> 
> Is there a way to tell "notseq" to keep the original fasta headers intact?

Yes.

FASTA format is not simple ... we have seen many ways to hide extra 
information in the ID (EMBOSS recognizes NCBI id formats and parses out 
the ID 29855) and also in the description (we try to recognize 
conventions used by GCG and ACEDB)

But you can also specify "pearson" format which reads the ID without 
parsing. Just add to the commandline:

notseq -sf pearson

Now you have another problem. This will not work for notseq!!!

The exclude string in notseq is a pattern. In processing the pattern, 
some pattern characters are removed:

	whitespace
	',' and ';'
	'|'

So your exclude pattern cannot include any '|' characters.

As a workaround, you can exclude "*ORF26673_6" and the IDs will be 
preserved.

For the next release we will allow '|' characters. When notseq was first 
written there was a possibility to use regular expressions, but now we 
only use simple text matching so the pipe characters are not a problem.
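
Until then, a list of NCBI-style IDs can be turned into wildcard
patterns of the "*ORF26673_6" kind with a short script. This is a
minimal Python sketch, not part of EMBOSS; the helper name
to_exclude_pattern is hypothetical, and it assumes the last '|'-separated
field is enough to identify each sequence uniquely.

```python
def to_exclude_pattern(fasta_id):
    """Replace everything up to the last '|' with a '*' wildcard."""
    fasta_id = fasta_id.strip()
    if "|" not in fasta_id:
        return fasta_id  # nothing to work around
    fields = [field for field in fasta_id.split("|") if field]
    if not fields:
        raise ValueError("ID contains only '|' characters")
    # Assumption: the last non-empty field is unique among the sequences.
    return "*" + fields[-1]


for original in ["lcl|29855|ORF26673_6", "29855"]:
    print(original, "->", to_exclude_pattern(original))
```

The converted patterns contain no '|' characters, so they can be written
one per line to the file given to -exclude.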

Hope that helps

Peter


From jcohn at pngg.org  Wed Jun 25 13:51:28 2008
From: jcohn at pngg.org (Josh Cohn)
Date: Wed, 25 Jun 2008 13:51:28 -0400
Subject: [EMBOSS] einverted- file size limits?
Message-ID: <B9E40689C37A11439CDB8B38262228D75735@mtolympus.pngg.org>

Hello,

      I am attempting to use einverted on a relatively large set of
sequences.  I've noticed that when I run just a few sequences, einverted
seems to run just fine.  However, when I use the same parameters on a
large set of sequences, the program quits before it has finished
analyzing all of the data.  Are there known file size limits or sequence
length limits for einverted?  If so, how can I run large sequences
(>300kb) or large numbers of sequences (1000+)? 

 
I'm running einverted from EMBOSS 5.0.0 on a Sun machine running Solaris
9 for SPARC.

 
Thanks,

 
Josh

 
From jison at ebi.ac.uk  Thu Jun 26 03:24:40 2008
From: jison at ebi.ac.uk (Jon Ison)
Date: Thu, 26 Jun 2008 08:24:40 +0100 (BST)
Subject: [EMBOSS] einverted- file size limits?
In-Reply-To: <B9E40689C37A11439CDB8B38262228D75735@mtolympus.pngg.org>
References: <B9E40689C37A11439CDB8B38262228D75735@mtolympus.pngg.org>
Message-ID: <36190.84.92.187.247.1214465080.squirrel@webmail.ebi.ac.uk>

Hi Josh

The short answer is you need more memory and a faster computer.
Check there are no system limits on memory usage (do an "unlimit"
or some such).  EMBOSS has no arbitrary memory limits, it is just
that einverted uses full dynamic programming which is necessarily
very memory and CPU intensive, especially for larger sequences.
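
One way to see which per-process limits are in force is Python's
standard resource module (Unix only). This is just a sketch for
inspecting the limits; it is unrelated to EMBOSS itself, the helper name
memory_limits is hypothetical, and the limit names available vary by
platform, so missing ones are skipped.

```python
import resource


def memory_limits():
    """Return {limit name: (soft, hard)} for the memory-related limits."""
    def show(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    limits = {}
    for name in ("RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_STACK"):
        const = getattr(resource, name, None)
        if const is None:
            continue  # this limit is not defined on the current platform
        soft, hard = resource.getrlimit(const)
        limits[name] = (show(soft), show(hard))
    return limits


for name, (soft, hard) in memory_limits().items():
    print(f"{name}: soft={soft} hard={hard}")
```

A soft limit below "unlimited" on the address space or data segment is a
likely reason for a run stopping early on large inputs.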

You could try running palindrome, which does a similar thing and
is faster and less memory intensive.

Cheers

Jon




From jboddu at uiuc.edu  Tue Jun 10 14:50:12 2008
From: jboddu at uiuc.edu (Jay)
Date: Tue, 10 Jun 2008 09:50:12 -0500
Subject: [EMBOSS] sequence retrieval
Message-ID: <000c01c8cb09$4969bdf0$dc3d39d0$@edu>

Hi:

I am brand new to EMBOSS and bioinformatics.

I have a large file with sequences in fasta format. They have IDs.

Is there any EMBOSS way to retrieve sequences by inputting a text file with
a short listed IDs?

Thanks

Jay


From pmr at ebi.ac.uk  Tue Jun 10 16:11:50 2008
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 10 Jun 2008 17:11:50 +0100
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <000c01c8cb09$4969bdf0$dc3d39d0$@edu>
References: <000c01c8cb09$4969bdf0$dc3d39d0$@edu>
Message-ID: <484EA7C6.2050904@ebi.ac.uk>

Jay wrote:
> I have a large file with sequences in fasta format. They have IDs.
> 
> Is there any EMBOSS way to retrieve sequences by inputting a text file with
> a short listed IDs?

With EMBOSS you can refer to sequences in the file:

filename:id

You can also put a list of these into a file, and use that with
@listfilename

But this can be slow - it will read the file for each ID. You can also
index the file with dbxfasta (or dbifasta) as a private database then
define a database in your .embossrc file and use the dbname:id syntax
(again you can use a list file, but it will be much faster)

Hope this helps. If you need more help setting up please ask again!

regards,

Peter


From rls at ebi.ac.uk  Tue Jun 10 17:09:42 2008
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Tue, 10 Jun 2008 18:09:42 +0100
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <484EA7C6.2050904@ebi.ac.uk>
References: <000c01c8cb09$4969bdf0$dc3d39d0$@edu> <484EA7C6.2050904@ebi.ac.uk>
Message-ID: <484EB556.4050307@ebi.ac.uk>

Alternatively, look into dbfetch and wsdbfetch Web Services:

http://www.ebi.ac.uk/dbfetch

http://www.ebi.ac.uk/Tools/webservices

All the EMBOSS applications are available under WSDBFetch/SOAPLAB.

R:)


Peter Rice wrote:
> Jay wrote:
>> I have a large file with sequences in fasta format. They have IDs.
>>
>> Is there any EMBOSS way to retrieve sequences by inputting a text file 
>> with
>> a short listed IDs?
> 
> With EMBOSS you can refer to sequences in the file:
> 
> filename:id
> 
> You can also put a list of these into a file, and use that with
> @listfilename
> 
> But this can be slow - it will read the file for each ID. You can also
> index the file with dbxfasta (or dbifasta) as a private database then
> define a database in your .embossrc file and use the dbname:id syntax
> (again you can use a list file, but it will be much faster)
> 
> Hope this helps. If you need more help setting up please ask again!
> 
> regards,
> 
> Peter
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss


From db60 at st-andrews.ac.uk  Tue Jun 10 18:49:51 2008
From: db60 at st-andrews.ac.uk (Daniel Barker)
Date: Tue, 10 Jun 2008 19:49:51 +0100
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <484EB556.4050307@ebi.ac.uk>
References: <000c01c8cb09$4969bdf0$dc3d39d0$@edu> <484EA7C6.2050904@ebi.ac.uk>
	<484EB556.4050307@ebi.ac.uk>
Message-ID: <1213123791.484ecccf46b78@webmail.st-andrews.ac.uk>

Dear Jay,

Are you simply trying to extract specific sequences from a Fasta-format
file? The EMBOSS program to do it is seqret, or maybe seqretsplit:

http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/seqret.html

http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/seqretsplit.html

As Peter Rice suggests, you can do stuff to speed the access up, but
it'll work without that.

Best regards,

Daniel

-- 
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland :
No SC013532


------------------------------------------------------------------
University of St Andrews Webmail: https://webmail.st-andrews.ac.uk


From jboddu at uiuc.edu  Tue Jun 10 20:15:35 2008
From: jboddu at uiuc.edu (Jay)
Date: Tue, 10 Jun 2008 15:15:35 -0500
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <1213123791.484ecccf46b78@webmail.st-andrews.ac.uk>
References: <000c01c8cb09$4969bdf0$dc3d39d0$@edu> <484EA7C6.2050904@ebi.ac.uk>
	<484EB556.4050307@ebi.ac.uk>
	<1213123791.484ecccf46b78@webmail.st-andrews.ac.uk>
Message-ID: <002401c8cb36$be5f5390$3b1dfab0$@edu>

Daniel:
I tried seqret in different ways.
My problem is EMBOSS is not recognizing my master sequence file (which is in
fasta form) as my private database. Even after I did the indexing using
dbifasta.
When seqret is asking me to input sequence(s), I am not able to figure out
what exactly it accepts.
I tried dbname:ID, dbname:@listfile.
I also tried a crude way: copying and pasting my master file and listfile
into the "embl" folder in the EMBOSSwin folder and trying the same syntax
(embl:ID, embl:@listfile, etc.).
These did not work.
I am assuming that my master file is not being recognized as a private DB.
I wanted to define my database in .embossrc file. I could not figure this
out either.
Jay

-----Original Message-----
From: Daniel Barker [mailto:db60 at st-andrews.ac.uk] 
Sent: Tuesday, June 10, 2008 1:50 PM
To: rls at ebi.ac.uk
Cc: Peter Rice; Jay; emboss at lists.open-bio.org
Subject: Re: [EMBOSS] sequence retrieval

Dear Jay,

Are you simply trying to extract specific sequences from a Fasta-format
file? The EMBOSS program to do it is seqret, or maybe seqretsplit:

http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/seqret.html

http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/seqretsplit.html

As Peter Rice suggests, you can do stuff to speed the access up, but
it'll work without that.

Best regards,

Daniel

-- 
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland :
No SC013532


------------------------------------------------------------------
University of St Andrews Webmail: https://webmail.st-andrews.ac.uk


From sean.maceach at gmail.com  Tue Jun 10 21:00:33 2008
From: sean.maceach at gmail.com (Sean MacEachern)
Date: Tue, 10 Jun 2008 17:00:33 -0400
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <002401c8cb36$be5f5390$3b1dfab0$@edu>
Message-ID: <C47463B1.2D79%sean.maceach@gmail.com>

Hi Jay, 

Just wondering if you have considered the tools from NCBI. If you download
the BLAST bundle (I think blast-2.2.17 is the most current release), you
can use formatdb to create a BLAST database from your FASTA sequences. You
can then search it with one of the BLAST programs or retrieve sequences
from it with fastacmd.

I'm not sure which EMBOSS application you are attempting to use, but you
could probably use a for loop to automate the procedure.

E.g.

for i in `cat seqIDs.txt`; do
  fastacmd -d blastdb -s $i > seq.fsa &&
  primer3 -input seq.fsa -output ${i}_out.primers
done

Depending on what you want to do, something like that might work for you...

Cheers,
Sean


On 6/10/08 4:15 PM, "Jay" <jboddu at uiuc.edu> wrote:

> Daniel:
> I tried seqret in different ways.
> My problem is EMBOSS is not recognizing my master sequence file (which is in
> fasta form) as my private database. Even after I did the indexing using
> dbifasta.
> When seqret is asking me to input sequence(s), I am not able to figure out
> what exactly it accepts.
> I tried dbname:ID, dbname:@listfile.
> I also tried a crude way of copy pasting my master file and listfile in
> "embl" folder in EMBOSSwin folder and try the same syntax (embl:ID,
> embl:@listfile etc.
> These did not work.
> I am assuming that my master file is not being recognized as a private DB.
> I wanted to define my database in .embossrc file. I could not figure this
> out either.
> Jay
> 
> -----Original Message-----
> From: Daniel Barker [mailto:db60 at st-andrews.ac.uk]
> Sent: Tuesday, June 10, 2008 1:50 PM
> To: rls at ebi.ac.uk
> Cc: Peter Rice; Jay; emboss at lists.open-bio.org
> Subject: Re: [EMBOSS] sequence retrieval
> 
> Dear Jay,
> 
> Are you simply trying to extract specific sequences from a Fasta-format
> file? The EMBOSS program to do it is seqret, or maybe seqretsplit:
> 
> http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/seqret.html
> 
> http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/seqretsplit.html
> 
> As Peter Rice suggests, you can do stuff to speed the access up, but
> it'll work without that.
> 
> Best regards,
> 
> Daniel


From ztu at msi.umn.edu  Tue Jun 10 21:54:06 2008
From: ztu at msi.umn.edu (Zheng Jin Tu)
Date: Tue, 10 Jun 2008 16:54:06 -0500 (CDT)
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <C47463B1.2D79%sean.maceach@gmail.com>
References: <C47463B1.2D79%sean.maceach@gmail.com>
Message-ID: <Pine.LNX.4.63.0806101633570.4628@l11.msi.umn.edu>


This is a very common requirement in the biological user
community, especially among microarray users.
They have a list of IDs (Affy IDs or accession numbers) from
microarray data analysis and want the matching sequences
from a FASTA file, such as an Affymetrix library xxx.sif
file.

To use EMBOSS for this, the EMBOSS admin needs to
index the database first.

NCBI fastacmd is another option for getting
sequences quickly, especially from a large FASTA sequence
file such as nt or nr.

A Perl script is useful for batch sequence
retrieval. It reads the input file of IDs
line by line and then does one of:

1): fastacmd -d database -s ID >> outsequence    # ncbi formatdb case

2): seqret .....                                 # EMBOSS case

3): Or just loops over the sequence file, setting a found/not-found
flag by matching each ID against the FASTA header line ">id ...",
and outputs the sequence while the flag is on. This is practical
if the sequence file is relatively small, as in the microarray case.
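
Option 3 can be sketched roughly as below. This is only a minimal
illustration in Python rather than Perl; it assumes one ID per line in the
ID list and that the ID is the first word after ">" in each FASTA header.
The function names are made up for this example.

```python
def read_ids(lines):
    """Collect one ID per non-blank line into a set."""
    return {line.strip() for line in lines if line.strip()}


def filter_fasta(fasta_lines, wanted_ids):
    """Yield the lines of every record whose header ID is in wanted_ids.

    Assumption: the ID is the first whitespace-delimited word after '>'.
    """
    keep = False  # the found/not-found flag
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted_ids
        if keep:
            yield line


def extract_to_file(id_path, fasta_path, out_path):
    """Read IDs and FASTA from disk and write the matching records."""
    with open(id_path) as id_file:
        wanted = read_ids(id_file)
    with open(fasta_path) as fasta_file, open(out_path, "w") as out_file:
        out_file.writelines(filter_fasta(fasta_file, wanted))
```

Something like extract_to_file("ids.txt", "seqs.fasta", "subset.fasta")
would then write the matching records in a single pass over the file
(the file names here are placeholders).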


Thanks, TU

--------------------------------------------------
On Tue, 10 Jun 2008, Sean MacEachern wrote:

> Hi Jay, 
> 
> Just wondering if you have considered the tools from NCBI. If you were to
> dload the blast bundle, I think blast-2.2.17 is the most current release,
> you can use formatdb to create a blastable database of your fasta seqs that
> you can use for blasting using one of the blast programs or retrieving using
> fastacmd.
> 
> I'm not sure what emboss application you are attempting to use but you could
> probably use a for loop to automate some procedure
> 
> Eg.
> 
> For i in `cat seqIDs.txt`; do fastacmd -d blastdb -s $i > seq.fsa | primer3
> -input seq.fsa -output $i_out.primers
> 
> Depending on what you want to do something like that might work for you...
> 
> Cheers,
> Sean
> 
> 
> On 6/10/08 4:15 PM, "Jay" <jboddu at uiuc.edu> wrote:
> 
> > Daniel:
> > I tried seqret in different ways.
> > My problem is EMBOSS is not recognizing my master sequence file (which is in
> > fasta form) as my private database. Even after I did the indexing using
> > dbifasta.
> > When seqret is asking me to input sequence(s), I am not able to figure out
> > what exactly it accepts.
> > I tried dbname:ID, dbname:@listfile.
> > I also tried a crude way of copy pasting my master file and listfile in
> > "embl" folder in EMBOSSwin folder and try the same syntax (embl:ID,
> > embl:@listfile etc.
> > These did not work.
> > I am assuming that my master file is not being recognized as a private DB.
> > I wanted to define my database in .embossrc file. I could not figure this
> > out either.
> > Jay
> > 
> > -----Original Message-----
> > From: Daniel Barker [mailto:db60 at st-andrews.ac.uk]
> > Sent: Tuesday, June 10, 2008 1:50 PM
> > To: rls at ebi.ac.uk
> > Cc: Peter Rice; Jay; emboss at lists.open-bio.org
> > Subject: Re: [EMBOSS] sequence retrieval
> > 
> > Dear Jay,
> > 
> > Are you simply trying to extract specific sequences from a Fasta-format
> > file? The EMBOSS program to do it is seqret, or maybe seqretsplit:
> > 
> > http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/seqret.html
> > 
> > http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/seqretsplit.html
> > 
> > As Peter Rice suggests, you can do stuff to speed the access up, but
> > it'll work without that.
> > 
> > Best regards,
> > 
> > Daniel
> 
> 

-- 
==========================================================================


From david.bauer at bayerhealthcare.com  Wed Jun 11 05:45:57 2008
From: david.bauer at bayerhealthcare.com (david.bauer at bayerhealthcare.com)
Date: Wed, 11 Jun 2008 07:45:57 +0200
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <002401c8cb36$be5f5390$3b1dfab0$@edu>
Message-ID: <OFFE26CC45.6DB65070-ONC1257465.001EC386-C1257465.001FAC56@schering.de>

Hi,

the database section of the adminguide
http://emboss.sourceforge.net/docs/adminguide/node37.html
describes all the emboss database indexing methods.
There is also a specific chapter on fasta files
http://emboss.sourceforge.net/docs/adminguide/node56.html
which describes the different forms of fasta files. It is important to 
specify the correct type corresponding to the structure of the sequence 
header line.
And also use full path names for the "Database directory" because relative 
path names like "." can cause problems on some systems.

If you still have trouble, send me the section you have in .embossrc, so I 
can have a look at it.
Hope this helps,

Cheers,
David.


emboss-bounces at lists.open-bio.org schrieb am 10/06/2008 22:15:35:

> Daniel:
> I tried seqret in different ways.
> My problem is EMBOSS is not recognizing my master sequence file (which is in
> fasta form) as my private database. Even after I did the indexing using
> dbifasta.
> When seqret is asking me to input sequence(s), I am not able to figure out
> what exactly it accepts.
> I tried dbname:ID, dbname:@listfile.
> I also tried a crude way of copy pasting my master file and listfile in
> "embl" folder in EMBOSSwin folder and try the same syntax (embl:ID,
> embl:@listfile etc.
> These did not work.
> I am assuming that my master file is not being recognized as a private DB.
> I wanted to define my database in .embossrc file. I could not figure this
> out either.
> Jay
> 
> -----Original Message-----
> From: Daniel Barker [mailto:db60 at st-andrews.ac.uk] 
> Sent: Tuesday, June 10, 2008 1:50 PM
> To: rls at ebi.ac.uk
> Cc: Peter Rice; Jay; emboss at lists.open-bio.org
> Subject: Re: [EMBOSS] sequence retrieval
> 
> Dear Jay,
> 
> Are you simply trying to extract specific sequences from a Fasta-format
> file? The EMBOSS program to do it is seqret, or maybe seqretsplit:
> 
> http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/seqret.html
> 
> http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/seqretsplit.html
> 
> As Peter Rice suggests, you can do stuff to speed the access up, but
> it'll work without that.
> 
> Best regards,
> 
> Daniel
> 
> -- 
> Daniel Barker
> http://bio.st-andrews.ac.uk/staff/db60.htm
> The University of St Andrews is a charity registered in Scotland :
> No SC013532
> 
> 
> ------------------------------------------------------------------
> University of St Andrews Webmail: https://webmail.st-andrews.ac.uk


From db60 at st-andrews.ac.uk  Wed Jun 11 10:49:12 2008
From: db60 at st-andrews.ac.uk (Daniel Barker)
Date: Wed, 11 Jun 2008 11:49:12 +0100
Subject: [EMBOSS] sequence retrieval
In-Reply-To: <002401c8cb36$be5f5390$3b1dfab0$@edu>
References: <000c01c8cb09$4969bdf0$dc3d39d0$@edu> <484EA7C6.2050904@ebi.ac.uk>
	<484EB556.4050307@ebi.ac.uk>
	<1213123791.484ecccf46b78@webmail.st-andrews.ac.uk>
	<002401c8cb36$be5f5390$3b1dfab0$@edu>
Message-ID: <484FADA8.5090503@st-andrews.ac.uk>

Dear Jay,

My simple idea is just something like this:

seqret @id_list.txt

where id_list.txt is something like this:

23214.O_sativa_Nipponbare.fasta:Q9FXT4
23214.O_sativa_Nipponbare.fasta:Q2R8Z5
23214.O_sativa_Nipponbare.fasta:Q10AZ4

(23214.O_sativa_Nipponbare.fasta is a Fasta-format file in the current 
directory.)

This certainly works - however, it may not really match what you're after.
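
If the IDs are in a plain text file, one per line, a list file in the
format above could be generated with a few lines of script. The following
is only a sketch in Python; the function names are invented for this
example, and it assumes each ID needs nothing more than the "filename:"
prefix.

```python
def make_list_entries(fasta_name, ids):
    """Return one 'filename:id' entry per non-blank ID."""
    return [f"{fasta_name}:{seq_id.strip()}" for seq_id in ids if seq_id.strip()]


def write_list_file(fasta_name, ids, list_path):
    """Write the entries to list_path, one per line, for use as @list_path."""
    with open(list_path, "w") as out:
        for entry in make_list_entries(fasta_name, ids):
            out.write(entry + "\n")


entries = make_list_entries(
    "23214.O_sativa_Nipponbare.fasta", ["Q9FXT4", "Q2R8Z5", "Q10AZ4"]
)
print("\n".join(entries))
```

The printed lines match the id_list.txt shown above; write_list_file()
saves them so that the result can be passed to seqret with the @ prefix.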

Best wishes,

Daniel

-- 
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland :
No SC013532


From orbitus007 at gmail.com  Wed Jun 11 16:44:32 2008
From: orbitus007 at gmail.com (Rudy Aramayo)
Date: Wed, 11 Jun 2008 09:44:32 -0700
Subject: [EMBOSS] Emboss Wrapper for Mac OS X
Message-ID: <29555DCF-21ED-48A0-99DE-F6862BAD400B@neo.tamu.edu>

Howdy!

My name is Rodolfo Aramayo, I have written an application that wraps  
Emboss as well as any Unix application for the Mac. It will ONLY cover  
the Apple side of the spectrum (Mac OSX Leopard and higher)

We distribute Task assignments to a computer with the AppleScript  
language allowing us to manipulate all the beautiful functionality of  
the Emboss package, including NCBI Blast, from scripts.

This application is a generic wrapper tool for all Unix applications.  
It allows us to control the Emboss and/or other Bioinformatics Unix  
applications. With this tool we have also incorporated the ability to  
communicate with an XGrid (distributed computations), this is a way to  
send messages to every computer on an "XGrid" network (using  
AppleScript scripts of course) so that you can get a simple cluster of  
computers to perform a large task. For example, we distribute a local  
Blast search of entire genomes amongst an XGrid and collect each  
result into a single machine.

Apple also has a powerful Automator Workflow feature (a wrapper for
AppleScript). This allows users who do not have any AppleScript
experience to script the modular components of the application (like
reading in data, or blasting data) with graphical drag-and-drop
modules.

In this manner we have written iBioCAD, to be presented at WWDC 2008,
the Apple Worldwide Developers Conference. Look for us soon.
The product is NOT ready and we are still developing it; we will be
completing most of this project and I will be displaying a scientific
poster regarding the structure of the application. Let's build a great
wrapper to graphically display bioinformatics to the world, together.

-Rodolfo Aramayo


From john.walshaw at bbsrc.ac.uk  Wed Jun 11 17:51:00 2008
From: john.walshaw at bbsrc.ac.uk (john walshaw (JIC))
Date: Wed, 11 Jun 2008 18:51:00 +0100
Subject: [EMBOSS] problem with unauthenticated Jemboss server
Message-ID: <E15BDDABACA8AB409BCC1071AC790DCB0187422D@NBIE2KSRV1.nbi.bbsrc.ac.uk>

 
Hello,
 
I am trying to install an un-authenticated Jemboss server on Linux
(RHEL4, on an AMD64 platform). I've managed this before on other RedHat
flavours, and on Tru64.
 
Everything appears to be ok in terms of the Jemboss service being
deployed, which I can see on the Tomcat server via Axis. However, when I
try and connect with my Jemboss client, I immediately get the "Check
Settings" popup, even though the Public/Private server details appear
correct. As expected, at no stage does a login dialogue appear. However,
if I click OK on the Check Settings popup, then try and run an EMBOSS
app, I get the popup: "Authentication failed/ The server wants a
username and password ..."
 
Can anybody help me diagnose the cause? The logs produced by the vanilla
Tomcat installation aren't very helpful. Details are:
 
EMBOSS 5.0.0
Tomcat 5.0.28
Axis 1.4
Sun Java 1.5.0.11.x86_64
kernel 2.6.9-42.ELsmp

The installation is on a node ('node7') of a cluster behind a firewall.
I'm running the client on the same host and another one behind the same
firewall.
 
When running configure, I specified --without-auth  (and
--with-thread=linux and --enable-64).
 
When building Jemboss, I compiled the JembossServer and
JembossFileServer classes (not the ...Auth.. equivalents).
 
The relevant entries in the jemboss.properties file used by both server
& client are:
 
user.auth=false
jemboss.server=true
server.public=http://node7:8080/axis/services
server.private=http://node7:8080/axis/services
service.public=JembossServer
service.private=JembossServer

The above server details appear as expected in the Preferences ->
Settings -> Servers dialogue of the Jemboss client.
 
 
After starting Tomcat and deploying JembossServer, I can go to:
 
http://node7:8080/axis/services/JembossServer
 
using a browser on the same node or a different one on the cluster. I
get the expected page
("JembossServer  Hi there, this is an AXIS service! .... " etc).
 
http://node7:8080/axis/happyaxis.jsp  lists all the Needed Components,
and all are present. All that is missing is one optional component, the
XML Security class.
 
http://node7:8080/axis/servlet/AxisServlet shows that both JembossServer
and EmbreoFile have been added - they and all their methods are listed.

 
If I run the Jemboss client on the same host as the server, it's still
the same problem if I specify the servers as
http://localhost:8080/axis/services

Any help much appreciated,
 
regards,
 
John.
 
 
Dr John Walshaw
Department of Computational & Systems Biology
John Innes Centre
Colney
Norwich NR4 7UH
UK


From maoj at helix.nih.gov  Fri Jun 13 20:27:36 2008
From: maoj at helix.nih.gov (Jean Mao)
Date: Fri, 13 Jun 2008 16:27:36 -0400
Subject: [EMBOSS] Question about seq fragments merge then align
Message-ID: <4852D838.4010406@helix.nih.gov>

Hi all,

I would like to know which program(s) I should use to do the following, 
preferably in as few steps as possible:

- find the overlap regions of multiple sequence fragments
- merge them into one big sequence
- align to a known sequence

I found programs that only merge 2 sequences, not multiple sequences.

Thank you very much.

Jean Mao


From andrespinzon at gmail.com  Tue Jun 17 19:47:17 2008
From: andrespinzon at gmail.com (Andres Pinzon)
Date: Tue, 17 Jun 2008 14:47:17 -0500
Subject: [EMBOSS] notseq and fasta definition headers
Message-ID: <8968fc7e0806171247o40d2f7a7gd64618d567c125fd@mail.gmail.com>

Hi,
I'm using notseq to obtain a subset of fasta seqs from a multiple fasta file:

notseq -junkoutseq 1000-1.fasta -sequence 7135seqs.fasta -exclude
@xaa.list.fasta -outseq leftSeqs.fast

The output is correct, but notseq changes the definition in the fasta
headers, so if the fasta header in "xaa.list.fasta" was:

lcl|29855|ORF26673_6

the corresponding fasta header in sequence in 1000-1.fasta is:

29855

Is there a way to tell "notseq" to keep the original fasta headers intact?

Thanks in advance,

-- 
Andrés Pinzón cPhD
http://bioinf.ibun.unal.edu.co/~apinzon/
Bioinformatics Center, Colombia EMBnet node
http://bioinf.ibun.unal.edu.co
Tel +57 3165000 ext 16961 Fax +571 3165415
Micology and Phytopathology Laboratory - Los Andes University.
http://bioinf.uniandes.edu.co
Tel +571 3394949 ext. 2768


From pmr at ebi.ac.uk  Tue Jun 17 20:28:47 2008
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 17 Jun 2008 21:28:47 +0100
Subject: [EMBOSS] notseq and fasta definition headers
In-Reply-To: <8968fc7e0806171247o40d2f7a7gd64618d567c125fd@mail.gmail.com>
References: <8968fc7e0806171247o40d2f7a7gd64618d567c125fd@mail.gmail.com>
Message-ID: <48581E7F.40706@ebi.ac.uk>

Andres Pinzon wrote:
> The output is correct, but notseq changes the definition in the fasta
> headers, so if the fasta header in "xaa.list.fasta" was:
> 
> lcl|29855|ORF26673_6
> 
> the corresponding fasta header in sequence in 1000-1.fasta is:
> 
> 29855
> 
> Is there a way to tell "notseq" to keep the original fasta headers intact?

Yes.

FASTA format is not simple ... we have seen many ways to hide extra 
information in the ID (EMBOSS recognizes NCBI id formats and parses out 
the ID 29855) and also in the description (we try to recognize 
conventions used by GCG and ACEDB)

But you can also specify "pearson" format which reads the ID without 
parsing. Just add to the commandline:

notseq -sf pearson

Now you have another problem. This will not work for notseq!!!

The exclude string in notseq is a pattern. In processing the pattern, 
some pattern characters are removed:

	whitespace
	',' and ';'
	'|'

So your exclude pattern cannot include any '|' characters.

As a workaround, you can exclude "*ORF26673_6" and the IDs will be 
preserved.

For the next release we will allow '|' characters. When notseq was first 
written there was a possibility to use regular expressions, but now we 
only use simple text matching so the pipe characters are not a problem.
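
To make the effect concrete, here is a rough Python illustration of the
behaviour described above. It is not the notseq source: the character set
and the wildcard matching (done here with Python's fnmatch) are modelled
only on the description in this message, and the function names are
invented for the example.

```python
from fnmatch import fnmatchcase

# Characters described above as being removed from the exclude pattern.
STRIPPED = set(" \t\n,;|")


def clean_pattern(pattern):
    """Drop the stripped characters from an exclude pattern."""
    return "".join(ch for ch in pattern if ch not in STRIPPED)


def is_excluded(seq_id, pattern):
    """Wildcard-match an unparsed ID against the cleaned pattern."""
    return fnmatchcase(seq_id, clean_pattern(pattern))


full_id = "lcl|29855|ORF26673_6"
print(clean_pattern(full_id))               # lcl29855ORF26673_6
print(is_excluded(full_id, full_id))        # False: the '|' characters were lost
print(is_excluded(full_id, "*ORF26673_6"))  # True: the wildcard workaround
```

With "-sf pearson" the ID is read unparsed as lcl|29855|ORF26673_6, so a
pattern that has lost its '|' characters no longer matches it, while the
"*ORF26673_6" form still does.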

Hope that helps

Peter


From jcohn at pngg.org  Wed Jun 25 17:51:28 2008
From: jcohn at pngg.org (Josh Cohn)
Date: Wed, 25 Jun 2008 13:51:28 -0400
Subject: [EMBOSS] einverted- file size limits?
Message-ID: <B9E40689C37A11439CDB8B38262228D75735@mtolympus.pngg.org>

Hello,

      I am attempting to use einverted on a relatively large set of
sequences.  I've noticed that when I run just a few sequences, einverted
seems to run just fine.  However, when I use the same parameters on a
large set of sequences, the program quits before it has finished
analyzing all of the data.  Are there known file size limits or sequence
length limits for einverted?  If so, how can I run large sequences
(>300kb) or large numbers of sequences (1000+)? 

 
I'm running einverted from EMBOSS 5.0.0 on a Sun machine running Solaris
9 for SPARC.

 
Thanks,

 
Josh

 
From jison at ebi.ac.uk  Thu Jun 26 07:24:40 2008
From: jison at ebi.ac.uk (Jon Ison)
Date: Thu, 26 Jun 2008 08:24:40 +0100 (BST)
Subject: [EMBOSS] einverted- file size limits?
In-Reply-To: <B9E40689C37A11439CDB8B38262228D75735@mtolympus.pngg.org>
References: <B9E40689C37A11439CDB8B38262228D75735@mtolympus.pngg.org>
Message-ID: <36190.84.92.187.247.1214465080.squirrel@webmail.ebi.ac.uk>

Hi Josh

The short answer is you need more memory and a faster computer.
Check there are no system limits on memory usage (do an "unlimit"
or some such).  EMBOSS has no arbitrary memory limits, it is just
that einverted uses full dynamic programming which is necessarily
very memory and CPU intensive, especially for larger sequences.

You could try running palindrome, which does a similar thing and is
faster and less memory intensive.

Cheers

Jon


> Hello,
>
>       I am attempting to use einverted on a relatively large set of
> sequences.  I've noticed that when I run just a few sequences, einverted
> seems to run just fine.  However, when I use the same parameters on a
> large set of sequences, the program quits before it has finished
> analyzing all of the data.  Are there known file size limits or sequence
> length limits for einverted?  If so, how can I run large sequences
> (>300kb) or large numbers of sequences (1000+)?
>
>
>
> I'm running einverted from EMBOSS 5.0.0 on a Sun machine running Solaris
> 9 for SPARC.
>
>
>
> Thanks,
>
>
>
> Josh
>
>
>
>