From starksb at ebi.ac.uk Fri Nov 1 04:45:29 2002
From: starksb at ebi.ac.uk (David Starks-Browning)
Date: Fri, 1 Nov 2002 09:45:29 +0000
Subject: emboss in cygwin
In-Reply-To: <3DC16BAA.1050201@bigfoot.com>
References: <3DC16BAA.1050201@bigfoot.com>
Message-ID: <4429-Fri01Nov2002094530+0000-starksb@ebi.ac.uk>
On Thursday 31 Oct 02, clwu writes:
> Hi, group,
> I am new to the group. I tried to compile EMBOSS under
> win2K/cygwin but I failed. EMBOSS website at HGMP mentioned that
> "Richard Bruskiewich and Simon Kelley at the Sanger Centre have
> succeeded in compiling EMBOSS under Windows NT using the CygWin package.
> The resulting executables have been tested but not thoroughly enough for
> a release. Contact Richard Bruskiewich for more information. ". But I
> cannot follow the link on this page to get help.
> Does anyone have successful experience with this?
I just built EMBOSS-2.5.1 on Win98 using the latest Cygwin release.
There is no libgd.[a|dll], so there is no PNG support, but everything
else appeared to build fine. I have not tested the applications, though.
Note that you will need much more from Cygwin's setup.exe than is
installed by default.
If you provide details about what failed, I may be able to help you.
Feel free to respond off-list, as a Cygwin build may not be
interesting to the rest of the emboss list. We can always summarise
to the emboss list once we get it sorted, if there is interest.
Regards,
David
(Cygwin FAQ maintainer)
-------------------------------------------------------------------
David Starks-Browning | starksb at ebi.ac.uk
EMBL Outstation -- |
The European Bioinformatics Institute |
Wellcome Trust Genome Campus | tel: +44 (1223) 494 616
Hinxton, Cambridge, CB10 1SD, UK | fax: +44 (1223) 494 468
-------------------------------------------------------------------
From peter.rice at uk.lionbioscience.com Fri Nov 1 05:12:58 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 01 Nov 2002 10:12:58 +0000
Subject: emboss in cygwin
References: <3DC16BAA.1050201@bigfoot.com>
Message-ID: <3DC253AA.8080401@uk.lionbioscience.com>
clwu wrote:
> I am new to the group. I tried to compile EMBOSS under
> win2K/cygwin but I failed. EMBOSS website at HGMP mentioned that
> "Richard Bruskiewich and Simon Kelley at the Sanger Centre have
> succeeded in compiling EMBOSS under Windows NT using the CygWin package.
> The resulting executables have been tested but not thoroughly enough for
> a release. Contact Richard Bruskiewich for more information. ". But I
> cannot follow the link on this page to get help.
That is rather old information.
The history is that Richard Bruskiewich made a Windows port of an early
ACEDB version, and he and Simon Kelley then tried porting an early EMBOSS
release using Cygwin, which worked apart from the graphics library and
Windows file naming.
Neither Richard nor Simon has been working on this recently.
David Starks-Browning at EBI has built EMBOSS but not yet tried the
applications. I hear of other groups who have also tried.
You can expect problems with Windows filenames which clash with EMBOSS
"USA" syntax. We can try to fix these - perhaps by requiring all database
names to have more than one letter so Windows drive letters work.
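The drive-letter clash described above can be sketched as follows. This is
only an illustration in Python, not EMBOSS code: the function name
classify_usa and the rule it applies (a single letter before the colon is
treated as a Windows drive, anything longer as a database name) are
assumptions based on the suggestion in this message.

```python
import re
from typing import Tuple


def classify_usa(usa: str) -> Tuple[str, str, str]:
    """Guess whether 'prefix:rest' names a Windows drive or a database.

    Returns (kind, prefix, rest), where kind is 'file', 'drive' or
    'database'. Hypothetical helper, not part of EMBOSS.
    """
    # A leading "format::" (as in fasta::filename) is an explicit format
    # prefix; drop it before looking at the rest.
    if "::" in usa:
        _fmt, usa = usa.split("::", 1)
    match = re.match(r"^([A-Za-z][A-Za-z0-9_]*):(.*)$", usa)
    if match is None:
        return ("file", "", usa)
    prefix, rest = match.groups()
    # Assumed rule: one letter means a drive, more than one a database.
    if len(prefix) == 1:
        return ("drive", prefix, rest)
    return ("database", prefix, rest)
```

Under this rule a one-letter database name could no longer be used, which is
exactly the restriction proposed above.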
Any suggestions on changes needed to make EMBOSS work better (or work at
all) on windows systems?
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From pageauma at ESI.UMontreal.CA Fri Nov 1 15:01:07 2002
From: pageauma at ESI.UMontreal.CA (Marie PAGEAU)
Date: Fri, 1 Nov 2002 15:01:07 -0500
Subject: test sequence
Message-ID:
Dear colleagues,
A lady, professor of biochemistry at
the Universite de Montreal, sent me the following request.
Would you please be nice enough to help us?
Your help would be highly appreciated.
Best regards,
Marie Pageau
-----------------------------------------------------------
From: Muriel Aubry
Sent: 30 October 2002 14:47
Subject: test sequence
Hi,
I am presently using the restrict and showseq programs from EMBOSS. I
have noticed that some very usual enzymes are not detected by the
program such as XhoI and PstI and a few others when the complete list of
enzymes is used. I have here below a test sequence that should contain
XhoI, EcoRI, PstI, EcoRV, HindIII, KpnI, SacII, ApaI, SmaI, BamHI and
XbaI.
XhoI, PstI and EcoRV are not detected by restrict and showseq in the
test sequence shown below. Is there a problem with the restriction
enzyme list?
Test Sequence:
gagcagggggatctcggcgagctctcgagaattctcacgcgtctgcaggatatcaagcttgcggtaccgcgg
gcccggg
From ableasby at hgmp.mrc.ac.uk Fri Nov 1 15:21:29 2002
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Fri, 1 Nov 2002 20:21:29 GMT
Subject: test sequence
Message-ID: <200211012021.UAA12495@bromine.hgmp.mrc.ac.uk>
There is probably not a problem. EMBOSS only reports one
isoschizomer for cases where several REs have the same cut
site. If the -preferred switch is given to these
programs then the more easily available of the isoschizomers
will be reported. This is controlled by the file:
embossre.equ
where, for each RE, you can specify which isoschizomer should be
reported.
So, first try adding -preferred. If you just want to search for
a particular set of enzymes, they can be given as a comma-separated
list using the -enzymes qualifier, e.g. -enzymes "ecori,bamhi"
HTH
Alan Bleasby
HGMP
PS: NEB supply an emboss-format set of files which are just the most
common REs. You can rename them (e.g. to embossre.enz/ref/sup)
and overwrite your current set in the emboss REBASE directory.
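A quick way to confirm that the sites themselves are present, independent of
which isoschizomer name a program reports, is a literal scan of the test
sequence. The sketch below is Python, not an EMBOSS tool; the function name
find_sites is hypothetical, and the recognition sequences are typed in by
hand from the standard definitions rather than read from the REBASE files.

```python
import re

# Test sequence from the original question (the two lines joined).
TEST_SEQ = (
    "gagcagggggatctcggcgagctctcgagaattctcacgcgtctgcaggatatcaagcttgcggtaccgcgg"
    "gcccggg"
)

# Recognition sequences, typed in by hand (all palindromic, so scanning
# one strand is enough).
SITES = {
    "XhoI": "CTCGAG", "EcoRI": "GAATTC", "PstI": "CTGCAG",
    "EcoRV": "GATATC", "HindIII": "AAGCTT", "KpnI": "GGTACC",
    "SacII": "CCGCGG", "ApaI": "GGGCCC", "SmaI": "CCCGGG",
    "BamHI": "GGATCC", "XbaI": "TCTAGA",
}


def find_sites(seq, sites):
    """Return {enzyme: [1-based start positions]} for literal site matches."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        # Lookahead so that overlapping occurrences are all reported.
        positions = [m.start() + 1 for m in re.finditer("(?=" + site + ")", seq)]
        if positions:
            hits[name] = positions
    return hits


found = find_sites(TEST_SEQ, SITES)
for enzyme in sorted(found):
    print(enzyme, found[enzyme])
```

If a site such as CTGCAG shows up here but its usual name is missing from
the restrict output, the cause is most likely the isoschizomer naming
described above rather than a missing pattern.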
From David.Bauer at SCHERING.DE Mon Nov 4 03:35:51 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Mon, 4 Nov 2002 09:35:51 +0100
Subject: test sequence
Message-ID:
Hi Alan,
I wonder how the "preferred" list is created.
Restrict reports the recognition site "CTGCAG" as BstMAI.
This is a rather exotic enzyme that I had not heard of before and that is
available from only one company.
I apologize for my ignorance if this is a common supplier in the UK ;-)
On the other hand, PstI is available from about 20 suppliers, and it is also the
enzyme name used in various catalogue pictures of multiple cloning sites in
vectors (pUC19 polylinker etc.).
So I would suggest adding the BstMAI -> PstI mapping to the distribution version
of embossre.equ.
David.
From gbottu at ben.vub.ac.be Mon Nov 4 05:05:58 2002
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 4 Nov 2002 11:05:58 +0100 (CET)
Subject: Remote getz from emboss
Message-ID: <200211041005.LAA1502643@black.vub.ac.be>
from : BEN
At the BEN site we have a Perl script that more or less reproduces the
functionality of the GCG program lookup. It can access a local SRS server (with
getz) or a remote one (with rsh getz); switching between the two is a matter of
commenting out a line inside the script, a leftover from when our SRS server ran
on a different computer. It can be run interactively at the command line or put
behind an EMBOSS wrapper program, and thus behind Staden or the EMBOSS WWW
interfaces. Is this what you are looking for?
Guy Bottu
From duhaimj at ircm.qc.ca Tue Nov 5 16:34:57 2002
From: duhaimj at ircm.qc.ca (Johanne Duhaime)
Date: Tue, 05 Nov 2002 16:34:57 -0500
Subject: MSE will not save on Exit
Message-ID: <3DC83981.625D3E83@ircm.qc.ca>
Hello
I am trying to use MSE (MSE -0.04.tar.gz just installed) but I cannot
save with the exit command.
After I modified a sequence, when I type Exit on the command line I
have:
Sequences modified do you wish to continue exiting [N]
Saying Y or N will not save anything.
For now I have to use "write".
Any idea of the problem?
--
Johanne Duhaime
IRCM
110 Ave des Pins O
Montreal, Quebec
987-5556 (tel) 987-5644 (fax)
Johanne_Duhaime at ircm.qc.ca
http://www.ircm.qc.ca
From w2hgcg at netscape.net Tue Nov 5 22:01:47 2002
From: w2hgcg at netscape.net (w2hgcg at netscape.net)
Date: Tue, 05 Nov 2002 22:01:47 -0500
Subject: epitope search
Message-ID: <56BE5F2D.194AD14A.000665E2@netscape.net>
I know this is not the place but perhaps...
I am working with LSA-3 and have been able to raise some antibodies in rabbits. When I run an immunoblot, the antibodies recognize specific protein bands. How can I search for epitopes in P. falciparum against my immunogens? Sorry for my English...
Lucia Goncalvez
From ray at leicester.ac.uk Wed Nov 6 08:24:38 2002
From: ray at leicester.ac.uk (Dalgleish, Dr R.)
Date: Wed, 6 Nov 2002 13:24:38 -0000
Subject: Suggestion for new EMBOSS program
Message-ID:
I find GCG framealign very useful to align
a protein with its DNA sequence. Could
somebody find the time to write an EMBOSS
equivalent?
Thanks,
Raymond Dalgleish
Genetics
Leicester
From peter.rice at uk.lionbioscience.com Wed Nov 6 08:32:13 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 06 Nov 2002 13:32:13 +0000
Subject: Suggestion for new EMBOSS program
References:
Message-ID: <3DC919DD.5020306@uk.lionbioscience.com>
Dalgleish, Dr R. wrote:
> I find GCG framealign very useful to align
> a protein with its DNA sequence. Could
> somebody find the time to write an EMBOSS
> equivalent?
Sounds rather like genewise in the (free) Wise2 package
http://www.sanger.ac.uk/Software/Wise2/
You can try it at http://www.sanger.ac.uk/Software/Wise2/genewiseform.shtml
Can you be more specific about whether this is what you need, and what you
want in EMBOSS?
regards,
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From heme at postmark.net Thu Nov 7 07:05:55 2002
From: heme at postmark.net (Per Johansson)
Date: Thu, 07 Nov 2002 12:05:55 +0000
Subject: Suggestion for new use for EMBOSS program
Message-ID: <20021107120555.8757.qmail@venus.postmark.net>
I am having difficulty finding a replacement for the GCG program
FINDPATTERNS. The EMBOSS program fuzznuc cannot use a database of
patterns (primers). Other alignment programs in EMBOSS, like
supermatcher, are useful but, among other things, you can't choose
mismatch settings.
The best replacement I've found is the EMBOSS program tfscan! Tfscan
uses a database of patterns, but you can't reverse the patterns (you
have to put copies of both the forward and reverse primer sequences in
the database). The tfscan algorithm is ideal (and is much faster than
FINDPATTERNS), but a few minor changes to the input and output
would be required if it were used in a replacement program.
I could write a script to wrap tfscan, but I'd like to
avoid this. A new program with this functionality would be beneficial
for the EMBOSS package.
Per
From charles at moulinette.dyndns.org Thu Nov 7 08:20:42 2002
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Thu, 7 Nov 2002 14:20:42 +0100
Subject: Suggestion for new use for EMBOSS program
In-Reply-To: <20021107120555.8757.qmail@venus.postmark.net>
References: <20021107120555.8757.qmail@venus.postmark.net>
Message-ID: <20021107132042.GA9854@moulinette.dyndns.org>
> The best replacement I've found is the EMBOSS program tfscan!
> Obviously, I could write a script to wrap tfscan but I'd like to
> avoid this. A new program with this functionality would be beneficial
> for the EMBOSS package.
An alternative would be to write a script that builds a
transfac-format database from a flatfile containing names and
corresponding consensus sequences. (This would also make it possible to
migrate the pattern.dat file of GCG.)
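Such a script could look roughly like the Python sketch below. It assumes a
flatfile with one "name consensus" pair per line and writes records using
the two-letter line codes seen in TRANSFAC site files (AC, ID, TY, DE, SE,
with XX separators and // terminators). Which of these fields tfextract or
tfscan actually require is not checked here, and the P00001-style accession
numbers and the function names are invented for illustration.

```python
def read_flatfile(text):
    """Parse 'name consensus' lines; skip blank lines and '#' comments."""
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, consensus = line.split()[:2]
        patterns.append((name, consensus))
    return patterns


def to_transfac(patterns):
    """Render (name, consensus) pairs as TRANSFAC-style site records."""
    lines = ["VV  pattern table generated from a name/consensus flatfile",
             "XX", "//"]
    for number, (name, consensus) in enumerate(patterns, start=1):
        lines += [
            "AC  P%05d" % number,  # made-up accession number
            "XX",
            "ID  " + name,
            "XX",
            "TY  DNA",
            "XX",
            "DE  " + name,
            "XX",
            "SE  " + consensus,
            "XX",
            "//",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = "# name consensus\nM13F GTAAAACGACGGCCAGT\n"
    print(to_transfac(read_flatfile(example)), end="")
```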
Charles
From heme at postmark.net Thu Nov 7 10:11:43 2002
From: heme at postmark.net (Per Johansson)
Date: Thu, 07 Nov 2002 15:11:43 +0000
Subject: Fwd: Re: Suggestion for new use for EMBOSS program
Message-ID: <20021107151143.25950.qmail@venus.postmark.net>
It's OK to reformat the GCG pattern file to a transfac-format database
and use tfscan, but I still miss some functions of FINDPATTERNS: I
can't search the reverse primer strand; the output is limited to ONE
format, with no alignment format; and tfscan doesn't accept wobble
(ambiguous) bases in primers (e.g. K = G or T, D = G or A or T, i.e. not C ...).
Otherwise the tfscan algorithm is a very nice and fast word-matching
algorithm, and it COULD be used for other purposes as well!
Per
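The two missing features mentioned above (ambiguity codes and searching the
reverse strand) can be illustrated with a short Python sketch. This is not
how tfscan or FINDPATTERNS work internally; it simply expands the standard
IUPAC nucleotide codes into a regular expression and also searches with the
reverse complement of the primer. The function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

# Complement of every code, e.g. K (G or T) pairs with M (A or C).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(primer):
    """Reverse complement of a primer that may contain IUPAC codes."""
    return primer.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Regex source matching every sequence the ambiguous primer allows."""
    return "".join(IUPAC[base] for base in primer.upper())


def find_primer(primer, seq):
    """Return (strand, 1-based start, matched text) for both strands."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", primer), ("-", reverse_complement(primer))):
        # Lookahead keeps overlapping matches.
        for m in re.finditer("(?=(" + primer_regex(pattern) + "))", seq):
            hits.append((strand, m.start() + 1, m.group(1)))
    return hits
```

For example, find_primer("GAK", "ttGATccATCaa") reports one hit on the
forward strand and two on the reverse strand, without needing separate
forward and reverse entries in the pattern database.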
From heme at postmark.net Fri Nov 8 09:28:53 2002
From: heme at postmark.net (Per Johansson)
Date: Fri, 08 Nov 2002 14:28:53 +0000
Subject: EMBOSS default program settings
Message-ID: <20021108142853.4941.qmail@venus.postmark.net>
I have problems with EMBOSS default program settings in the
emboss.default file.
set emboss_stdout 1 Works fine, output goes to stdout
set emboss_verbose 1 Doesn't work
set emboss_format embl The programs still output fasta format by
default! And the ONLY sequence format the EMBOSS programs accept as
input is embl!
It doesn't work as it should.
Per
From peter.rice at uk.lionbioscience.com Fri Nov 8 10:52:05 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 08 Nov 2002 15:52:05 +0000
Subject: EMBOSS default program settings
References: <20021108142853.4941.qmail@venus.postmark.net>
Message-ID: <3DCBDDA5.3070106@uk.lionbioscience.com>
Per Johansson wrote:
> I have problems with EMBOSS default program settings in the
> emboss.default file.
>
> set emboss_stdout 1 Works fine, output goes to stdout
>
> set emboss_verbose 1 Doesn't work
Because help is generated as soon as the -help option is tested.
Changed in the next release to set -verbose before -help.
> set emboss_format embl The programs still output fasta format by
> default! And the ONLY sequence format the EMBOSS programs accept as
> input is embl!
>
> It doesn't work as it should.
Well .... emboss_format sets the default *input* format. You can still say
fasta::filename to read fasta format
The output format is specified as emboss_outformat
EMBOSS will read all input formats if you only set emboss_outformat
I think you really mean to say:
set emboss_outformat embl
Hope this helps
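The input/output distinction above is easy to check mechanically. The
following Python sketch reads "set name value" lines and flags the case
where emboss_format is set but emboss_outformat is not. The variable names
come from this thread; the treatment of "#" as a comment character and the
function names are assumptions, and this is not how EMBOSS itself parses
the file.

```python
def read_settings(text):
    """Collect 'set name value' lines into a dict (names lower-cased)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumed comment syntax
        parts = line.split()
        if len(parts) >= 3 and parts[0].lower() == "set":
            settings[parts[1].lower()] = " ".join(parts[2:])
    return settings


def format_warnings(settings):
    """Point out the input/output format mix-up discussed in this thread."""
    warnings = []
    if "emboss_format" in settings and "emboss_outformat" not in settings:
        warnings.append(
            "emboss_format only sets the default INPUT format (%s); "
            "use emboss_outformat for output" % settings["emboss_format"]
        )
    return warnings


if __name__ == "__main__":
    example = "set emboss_stdout 1\nset emboss_format embl\n"
    for message in format_warnings(read_settings(example)):
        print(message)
```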
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From heme at postmark.net Mon Nov 11 01:32:50 2002
From: heme at postmark.net (Per Johansson)
Date: Mon, 11 Nov 2002 06:32:50 +0000
Subject: Fwd: Re: EMBOSS default program settings
Message-ID: <20021111063250.19730.qmail@venus.postmark.net>
Thank you Peter,
I DO mean
set emboss_outformat embl (but I can't find emboss_outformat in the
documentation). When I use
set emboss_outformat embl
in the emboss.default file I don't see any effect at all: the default
output format is still fasta. I use emboss-2.4.1, but I assume this
is not version dependent.
Per
Per Johansson
heme at postmark.net
From gwilliam at hgmp.mrc.ac.uk Mon Nov 11 04:20:01 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Mon, 11 Nov 2002 09:20:01 +0000
Subject: Fwd: Re: EMBOSS default program settings
References: <20021111063250.19730.qmail@venus.postmark.net>
Message-ID: <3DCF7641.E2ED9B73@hgmp.mrc.ac.uk>
Per Johansson wrote:
>
> Thank you Peter,
>
> I DO mean
>
> set emboss_outformat embl (but I can't find emboss_outformat in the
> documentation).
It is documented in:
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Usa/databases.html#global
--
Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK
From heme at postmark.net Mon Nov 11 08:12:50 2002
From: heme at postmark.net (Per Johansson)
Date: Mon, 11 Nov 2002 13:12:50 +0000
Subject: Fwd: Re: EMBOSS default program settings
Message-ID: <20021111131250.4618.qmail@www2.postmark.net>
Thank you,
That solves the problem, ALWAYS use the latest version!
Per
From sebastian.bassi at ar.advantaseeds.com Mon Nov 11 07:10:54 2002
From: sebastian.bassi at ar.advantaseeds.com (Sebastian Bassi)
Date: Mon, 11 Nov 2002 13:10:54 +0100
Subject: Problem with EMBOSS GUI
Message-ID:
Hi,
I've just installed the EMBOSS GUI on http://genes.unq.edu.ar/EMBOSS
(this should look like this http://bioinfo.pbi.nrc.ca:8090/EMBOSS/)
The problem, as you can see on the web page, is that the programs are missing
from the left column (all the EMBOSS programs should appear there).
I think this is a path problem. To help you evaluate it, I
attach two files:
embossdir.txt, a capture of ls -Ra from my EMBOSS installation directory
(/opt/emboss).
emboss.pl, the emboss.pl file (so you can check whether the paths are right).
The emboss.zip file contains both files and I made it because sometimes
attached text get corrupted by some mailers.
I hope you can help me.
Note: EMBOSS was compiled using this:
configure --prefix=/opt/emboss --without-x --x-includes="" --x-libraries=""
The "without x" part is because it was compiled on a RH web server without X.
EMBOSS itself works fine; the problem is this GUI.
Sebastian Bassi.
Advanta Seeds. Balcarce Research Station.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: emboss.zip
Type: application/x-zip-compressed
Size: 15092 bytes
Desc: emboss.zip
Url : http://lists.open-bio.org/pipermail/emboss/attachments/20021111/3f4c80e9/attachment.bin
From mad at biol.unlp.edu.ar Mon Nov 11 14:18:00 2002
From: mad at biol.unlp.edu.ar (Martín Sarachu)
Date: Mon, 11 Nov 2002 16:18:00 -0300
Subject: tfextract not indexing?
Message-ID: <3DD00268.6090206@biol.unlp.edu.ar>
Hi,
tfextract is apparently running OK, but its output files are empty.
The command line is
> # tfextract -debug -warning -error -fatal -die -verbose
> Extract data from TRANSFAC
> Full pathname of transfac SITE.DAT: /home/work/dbs/transfac/site.dat
> #
and
> # ls -s /usr/local/emboss/share/EMBOSS/data/tf*
> 0 /usr/local/emboss/share/EMBOSS/data/tffungi
> 0 /usr/local/emboss/share/EMBOSS/data/tfinsect
> 0 /usr/local/emboss/share/EMBOSS/data/tfother
> 0 /usr/local/emboss/share/EMBOSS/data/tfplant
> 0 /usr/local/emboss/share/EMBOSS/data/tfvertebrate
...a sample from site.dat
> VV TRANSFAC SITES TABLE, V.2.4 25-08-1995
> XX
> //
> AC R00001
> XX
> ID HS$6-16_01
> XX
> DT 20.06.90 (created); .
> DT 24.08.95 10:48:05 (updated); EWI.
> XX
> TY DNA
> XX
> DE 6-16
> XX
> SE gGGAAAaTGAAACT
> XX
> EL ISRE
> XX
> SF -127
> ST -89
> XX
> ...
> ...
> SO 0811; B103
> ME gel shift competition
> RN [1]
> RA Suzuki-Yagawa Y., Kawakami K., Nagano K.
> RT Housekeeping Na,K-ATPase alpha1 subunit gene promoter is composed
> RT of multiple cis elements to which common and cell type-specific
> RT factors bind
> RL Mol. Cell. Biol. 12:4046-4055 (1992).
> DR EMBL; X52560; HSNFIL6(37:74).
> //
am I missing something?
Thanks,
martin
--
Martín Sarachu
mad at biol.unlp.edu.ar
EMBNet Argentina
http://www.ar.embnet.org
From Gunnar.Andersson at imbim.uu.se Tue Nov 12 05:28:47 2002
From: Gunnar.Andersson at imbim.uu.se (Gunnar Andersson)
Date: Tue, 12 Nov 2002 11:28:47 +0100
Subject: DAN output Tm
Message-ID:
How should I interpret the Tm calculated by DAN? Is Tm an estimated
melting point of the entire sequence or of the sequence in the window?
How can this Tm (window=100nt) be higher than Tmprod of the full 160
nt sequence?
--
Gunnar Andersson
Institutionen för medicinsk biokemi och mikrobiologi
Uppsala Biomedicinska Centrum (BMC), Husarg. 3
Box 582, 751 23 UPPSALA
E-post : Gunnar.Andersson at imbim.uu.se
Telefon: 018-471 45 87
Fax:018-50 98 76
From r.bowden at vir.gla.ac.uk Thu Nov 14 10:36:23 2002
From: r.bowden at vir.gla.ac.uk (Rory Bowden)
Date: Thu, 14 Nov 2002 15:36:23 -0000
Subject: Fw: Other: EMBOSS versus GCG?
Message-ID: <007e01c28bf3$96beddc0$6886d182@vir.gla.ac.uk>
This appeared on the 'evoldir' list, which is the main international mailing list in
evolutionary biology. Would anyone like to make any comments for me to pass
on?
While I'm definitely not in Canada, I would say that this question is likely
to come up here (in the UK) at the institutional if not research council
level. Does anyone have an opinion they'd like to articulate, e.g. about
whether EMBOSS is ready to supplant GCG for end-users?
Rory Bowden
MRC Virology Unit
Glasgow UK
----- Original Message -----
From: "EvolDir"
To:
Sent: Thursday, November 14, 2002 9:34 AM
Subject: Other: EMBOSS versus GCG?
>
> Since its inception as the "Wisconsin package" in the early 1980s, the GCG
> suite of programs have provided a continuously improving "gold standard"
> for evolutionary bioinformatics software. The GCG suite is featured
> extensively in the latest bioinformatics textbooks (e.g. Mount) and in
> software reviews (e.g. The Scientist, August 19). Although some individual
> GCG programs have been surpassed by others, their range and flexibility,
> permitting linkage of programs together in innovative ways, has no current
> equivalent.
>
> Recently, "open source" advocates have pointed to the EMBOSS suite as
> providing a free alternative to the commercial package (supplied by
> Accelrys, with whom I have no financial connection). It is my impression
> that GCG is in a different league. For example, compare the GCG program
> "Window" with its proposed EMBOSS alternative "Freak":
>
> TASK: Determination of the number of occurrences of a motif in a sequence
> window.
>
> GCG program WINDOW
>
> 1. Allows up to 6 motifs at a time
> 2. Outputs absolute values and has a variety of other output options.
> 3. Extensive input menu
>
> EMBOSS program FREAK
>
> 1. Allows only 1 motif at a time.
> 2. Outputs a calculated fraction.
> 3. Very limited input menu.
>
> However, in Canada the open-source agenda has won out. In April 2002
> the publicly-funded, Halifax-based, Canadian Bioinformatics Resource (CBR)
> abandoned GCG, apparently with the consent of the Canadian evolutionary
> bioinformatics community. In this respect, I would be interested to hear
> from concerned parties in Canada with respect to the following questions:
>
> 1. Does your institution (or do you yourself) support GCG, so that you do
> not need CBR to supply GCG?
>
> 2. If you do not have independent access, do you find EMBOSS a suitable
> substitute for GCG?
>
> That Canada, which has spent hundreds of millions on genome
> projects, cannot give its researchers and their students a choice from
> among the relatively-inexpensive software packages that are available to
> analyze genomics data, seems to me very strange.
>
> Donald Forsdyke, Department of Biochemistry,
> Queen's University, Canada
> http://post.queensu.ca/~forsdyke/bioinfor.htm
>
>
>
>
From peter.rice at uk.lionbioscience.com Fri Nov 15 08:20:13 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 15 Nov 2002 13:20:13 +0000
Subject: Fw: Other: EMBOSS versus GCG?
References: <007e01c28bf3$96beddc0$6886d182@vir.gla.ac.uk>
Message-ID: <3DD4F48D.2020606@uk.lionbioscience.com>
Rory Bowden wrote:
> This on the 'evoldir' list, which is the main international mailing list in
> evolutionary biology. Would anyone like to make any comments for me to pass
> on?
>> It is my impression
>>that GCG is in a different league. For example, compare the GCG program
>>"Window" with its proposed EMBOSS alternative "Freak":
>>
>>TASK: Determination of the number of occurrences of a motif in a sequence
>>window.
>>
>>GCG program WINDOW
>>
>>1. Allows up to 6 motifs at a time
>>2. Outputs absolute values and has a variety of other output options.
>>3. Extensive input menu
>>
>>EMBOSS program FREAK
>>
>>1. Allows only 1 motif at a time.
>>2. Outputs a calculated fraction.
>>3. Very limited input menu.
Window : produces scores over a 'window' (a base range).
StatPlot : Plots Window results
EMBOSS : reports have scores over a base range as a general output format.
Freak: frequency of matches
FuzzNuc/FuzzPro/FuzzTran: Pattern matches with ambiguity codes
Restrict: Pattern matches with a pattern file
etc...
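For reference, the task quoted above (counting occurrences of a motif in a
sequence window) is small enough to sketch in a few lines of Python. This is
not how WINDOW or freak are implemented; the function names are
hypothetical. The sketch handles several motifs at once and returns absolute
counts per window.

```python
def count_overlapping(text, motif):
    """Count occurrences of motif in text, including overlapping ones."""
    count = 0
    pos = text.find(motif)
    while pos != -1:
        count += 1
        pos = text.find(motif, pos + 1)
    return count


def window_counts(seq, motifs, window, step=1):
    """Return [(1-based window start, {motif: count})] along the sequence."""
    seq = seq.upper()
    rows = []
    # At least one window is reported even if the sequence is shorter
    # than the window size.
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        counts = {m: count_overlapping(chunk, m.upper()) for m in motifs}
        rows.append((start + 1, counts))
    return rows


if __name__ == "__main__":
    for start, counts in window_counts("CGCGATCGAT", ["CG", "AT"], window=4, step=2):
        print(start, counts)
```

Dividing each count by the number of possible motif positions in the window
would give a fraction instead of an absolute value.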
This makes it possible to develop some really nice new EMBOSS applications.
So ... how about a program which reads EMBOSS report files and produces a
summary report (think of window), and another that plots them all (think of
statplot). Scores could be plotted if we have a good way to compare them.
Yes, I know freak does not produce a report file ... but that is a very
easy change.
It could also read in EMBL/SwissProt feature tables as annotation.
So, suggestions please for EMBOSS applications to plot reports/features...
For example:
1. xy plot of scores as points at the centre of a feature, with the
sequence position on the x axis and the score on the y axis. Possibly split
into multiple plots by program/feature-type/named-tag-value (e.g. pattern)
(like statplot only much more versatile).
2. xy plot of lines for each feature
3. GANTT (bar) chart of features by position, annotated with feature
type/program/score as appropriate
4. Combine these - xy plot of features with scores, and other features
reported underneath (think of the -mark option in statplot - but with far
more annotation possible below the x axis)
Maybe we can make some mock-ups on the EMBOSS pages to show the possibilities?
regards,
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From sjmiller at u.arizona.edu Fri Nov 15 14:23:21 2002
From: sjmiller at u.arizona.edu (Susan J. Miller)
Date: Fri, 15 Nov 2002 12:23:21 -0700
Subject: newcpgreport vs newcpgseek
Message-ID: <3DD549A9.9170F056@u.arizona.edu>
I could not find an emboss FAQ...is there one?
I'm trying to figure out the differences between cpgreport, newcpgreport
and newcpgseek.
--
Thanks,
-susan
Susan J. Miller
Biotechnology Computing Facility
Arizona Research Laboratories
Bio West 228
University of Arizona
Tucson, AZ 85721
(520) 626-2597
From rls at ebi.ac.uk Fri Nov 15 20:08:02 2002
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Sat, 16 Nov 2002 01:08:02 -0000
Subject: newcpgreport vs newcpgseek
In-Reply-To: <3DD549A9.9170F056@u.arizona.edu>
Message-ID: <000501c28d0c$9cf06780$0a0868d5@castafiore>
Hi Susan,
Yes. I never had the time to document these. Briefly:
newcpgreport and newcpgseek use the same method to find islands but produce
different output. The method is described in:
Larsen,F., Gundersen,G., Lopez,R., Prydz,H.
CpG islands as gene markers in the human genome.
(1992) Genomics 13 (4):1095-107
MedlineID: 92372002 PubMedID: 1505946
Cpgreport uses a scoring method based on sum/frequencies which
overpredicts islands but finds the smaller ones around primary exons.
Cpgseek is deprecated at the moment.
For all practical purposes I use newcpgreport. I actually use it to
produce the human cpgisland database you can find on the EBI's ftp
server as well as on the EBI's SRS server.
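The scoring details are not spelled out in this message, so the following
Python sketch is only a generic illustration of two window statistics that
are widely used when discussing CpG islands: the GC fraction and the
observed/expected CpG ratio. It is not claimed to be the method of Larsen et
al. or what cpgreport, newcpgreport or newcpgseek actually compute; the
function name cpg_stats is hypothetical.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for one window.

    Observed/expected is (number of CG dinucleotides * window length)
    divided by (number of C * number of G).
    """
    seq = window.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length if length else 0.0
    if c_count and g_count:
        obs_exp = (cpg_count * length) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return gc_fraction, obs_exp


if __name__ == "__main__":
    print(cpg_stats("CGCGATCGCGTTACGCG"))
```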
Hope this helps,
R:)
From David.Bauer at SCHERING.DE Wed Nov 20 09:45:24 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Wed, 20 Nov 2002 15:45:24 +0100
Subject: vectorstrip
Message-ID:
Hi,
If I run vectorstrip on a file with many sequences, the output file contains
only the sequences from which vector was stripped.
I would find it more useful if vectorstrip would (maybe optionally) also write
the sequences with no hit to the vector to the output file.
Or have I overlooked something?
David.
From Myrian_Grondin at UQTR.CA Wed Nov 20 11:15:20 2002
From: Myrian_Grondin at UQTR.CA (Myrian_Grondin at UQTR.CA)
Date: Wed, 20 Nov 2002 11:15:20 -0500
Subject: Install Emboss with Windows??
Message-ID: <1037808920.3ddbb518dbbac@courriel.uqtr.ca>
Hi,
We are working on a PC running Windows 98, and we would like to know if it's possible
to install EMBOSS on our machine. If so, which software do we have to install to
be able to run EMBOSS?
Thanks a lot (excuse me, my English is so poor...)
Myrian
From stefanielager at fastmail.ca Thu Nov 21 03:19:32 2002
From: stefanielager at fastmail.ca (Stefanie Lager)
Date: Thu, 21 Nov 2002 03:19:32 -0500 (EST)
Subject: EMBOOS end EMBL entryname
Message-ID: <3DDC9714.000059.00380@ns.interchange.ca>
Hi,
Does EMBL still stick to entrynames (the ID line) of "nine uppercase
alphanumeric characters"
(http://www.ebi.ac.uk/embl/Documentation/User_manual/id_line.html)? I
can't retrieve sequences from the International Protein Index (IPI)
database (11 characters in the ID entryname) in EMBL or SWISS format using
EMBOSS programs. The EMBOSS programs only accept 10 characters for the ID
in EMBL or SWISS format. Is this problem fixed in EMBOSS versions
later than 2.4.1? EMBL can have whatever policy they want, but it
would be nice if the EMBOSS programs would accept an ID of ANY length,
also in EMBL and SWISS format.
Stefanie
From sharmila at ebi.ac.uk Thu Nov 21 06:12:33 2002
From: sharmila at ebi.ac.uk (Sharmila Pillai)
Date: Thu, 21 Nov 2002 11:12:33 +0000
Subject: Install Emboss with Windows??
Message-ID:
Hi,
From what I understand there is a Cygwin-compiled version, but it's not
tested and cannot handle graphics.
You should refer to what Rodrigo Lopez wrote to the EMBOSS list on
1/11/02 in response to the subject "Remote getz from emboss".
Though this is not the solution to your problem today, it could be the
direction for Windows users. I'll try to explain a bit of it here:
At the EBI's External Services group, I am working on a webservice for
EMBOSS using SOAP. Basically, this enables the user to use EMBOSS
applications remotely.
% seqret srsembl:J00231 -lhttp://servername:portnum/axis/services
The above command would use Axis/SOAP to access the 'servername' and the
'portnum', which in turn would retrieve data from srsembl (as defined in
emboss.default) and pass it on to the application (seqret, in this example).
The result is sent to stdout.
All the user (using any OS) needs is a client which understands a
command line like the one above and some libraries for Axis/SOAP. We have an
experimental service using both Java and Perl running on Axis/Tomcat.
I don't think EBI provides remote access to many EMBOSS applications today.
If our experimental service survives our tests and there is enough
demand for such a service, EBI could soon start opening up web service access
to EMBOSS.
//Sharmila.
From Georg.Beckmann at Schering.DE Thu Nov 21 03:25:39 2002
From: Georg.Beckmann at Schering.DE (Georg.Beckmann at Schering.DE)
Date: Thu, 21 Nov 2002 09:25:39 +0100
Subject: OldDistances
Message-ID:
Hi,
Does anybody know if EMBOSS offers a program similar to OldDistances in GCG?
OldDistances (which previously had yet another name that I don't remember)
calculates a matrix of pairwise similarities from a multiple alignment.
As far as I can see, there is no such program. Is somebody working on such
a program for EMBOSS?
Thanks.
Ciao,
Georg Beckmann
From newgene at bigfoot.com Thu Nov 21 09:55:24 2002
From: newgene at bigfoot.com (clwu)
Date: Thu, 21 Nov 2002 08:55:24 -0600
Subject: Install Emboss with Windows??
References: <1037808920.3ddbb518dbbac@courriel.uqtr.ca>
Message-ID: <3DDCF3DC.9020202@bigfoot.com>
I recently compiled EMBOSS successfully under Cygwin/Win2K (thanks to
David Starks-Browning's great help). So far, all the applications I have used
work fine (graphics output is also OK under openbox/Cygwin). I think you
should install Cygwin and give it a try.
good luck.
Chunlei
Myrian_Grondin at UQTR.CA wrote:
>Hi,
>We are working on PC, OS Windows 98, and we would like to know if it's possible
>to install Emboss on our machine. If so, which software have we to install to
>be able to run Emboss?
>Thanks a lot (excuse me, my English is so poor...)
>Myrian
From lukem at gene.pbi.nrc.ca Thu Nov 21 10:28:13 2002
From: lukem at gene.pbi.nrc.ca (Luke McCarthy)
Date: Thu, 21 Nov 2002 09:28:13 -0600 (CST)
Subject: Install Emboss with Windows??
In-Reply-To: <1037808920.3ddbb518dbbac@courriel.uqtr.ca>
Message-ID:
On Wed, 20 Nov 2002 Myrian_Grondin at UQTR.CA wrote:
> Hi,
> We are working on PC, OS Windows 98, and we would like to know if it's
> possible to install Emboss on our machine. If so, which software have we to
> install to be able to run Emboss?
Other posts have addressed the issue of installing EMBOSS locally on a Windows
box, but if you have an immediate pressing need to use the EMBOSS
applications, the Canadian Bioinformatics Resource offers access through a web
interface at
http://www.cbr.nrc.ca/services/emboss_e.php
or in French:
http://www.cbr.nrc.ca/services/emboss_f.php
Unfortunately, the interface itself is English only, but then so are the
EMBOSS applications (at least as far as I know...) Cheers,
Luke
From newgene at bigfoot.com Thu Nov 21 11:52:11 2002
From: newgene at bigfoot.com (clwu)
Date: Thu, 21 Nov 2002 10:52:11 -0600
Subject: mfold
Message-ID: <3DDD0F3B.8020109@bigfoot.com>
Hi, group,
Does anybody know if there is an EMBOSS equivalent of the
"mfold" program in GCG?
Thanks.
Chunlei
From stefanielager at fastmail.ca Fri Nov 22 03:33:29 2002
From: stefanielager at fastmail.ca (Stefanie Lager)
Date: Fri, 22 Nov 2002 03:33:29 -0500 (EST)
Subject: mfold
Message-ID: <3DDDEBD9.000009.03475@ns.interchange.ca>
> Hi, group,
> Does anybody know if there is a EMBOSS equivalence for
> "mfold" program in GCG?
>
> Thanks.
>
> Chunlei
No, but there is plenty of RNA structure software out there, both as
servers and for local installation.
http://www.bioinfo.rpi.edu/~zukerm/rna/node3.html#SECTION00031
From mikep at angis.org.au Sun Nov 24 17:25:41 2002
From: mikep at angis.org.au (Michael Poidinger)
Date: Mon, 25 Nov 2002 09:25:41 +1100
Subject: codon useage tables
In-Reply-To: <3DDDEBD9.000009.03475@ns.interchange.ca>
Message-ID: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au>
Is there a site somewhere which describes which organisms/data sets the
EMBOSS codon usage tables are derived from? Some are obvious from their
name, others are not.
Thanks,
Mike
------------------------------------
Dr Michael Poidinger
PhD(virology) PGDipSci (computer science)
CEO, Australian Genome Information Centre
Head, Australian National Genome Information Service
ph 61-2-93518617
mob 0413146765
fax 61-2-93518618
email head at angis.org.au
------------------------------------------
From areagp61 at yahoo.it Mon Nov 25 04:38:59 2002
From: areagp61 at yahoo.it (Graziano P.)
Date: Mon, 25 Nov 2002 10:38:59 +0100
Subject: codon useage tables
References: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au>
Message-ID: <007701c29466$8127a7f0$18105709@italy.ibm.com>
Not every file but most are described in the README file
from ftp://ftp.ebi.ac.uk/pub/databases/codonusage
Hope this helps
Graziano Pappad?
----- Original Message -----
From: "Michael Poidinger"
To:
Sent: Sunday, November 24, 2002 11:25 PM
Subject: codon useage tables
> Is there a site somewhere which describes which organisms/data sets the
> EMBOSS codon useage tables are derived from? some are obvious from their
> name, others are not.
>
> Thanks,
> Mike
> ------------------------------------
> Dr Michael Poidinger
> PhD(virology) PGDipSci (computer science)
> CEO, Australian Genome Information Centre
> Head, Australian National Genome Information Service
> ph 61-2-93518617
> mob 0413146765
> fax 61-2-93518618
> email head at angis.org.au
> ------------------------------------------
>
From mikep at angis.org.au Mon Nov 25 16:58:02 2002
From: mikep at angis.org.au (Michael Poidinger)
Date: Tue, 26 Nov 2002 08:58:02 +1100
Subject: codon useage tables
In-Reply-To: <007701c29466$8127a7f0$18105709@italy.ibm.com>
References: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au>
Message-ID: <5.2.0.9.2.20021126085548.02f344e8@morgan.angis.org.au>
At 10:38 AM 25/11/2002 +0100, Graziano P. wrote:
>Not every file but most are described in the README file
>from ftp://ftp.ebi.ac.uk/pub/databases/codonusage
>
>Hope this helps
Thanks, it helps with quite a few.
Do you (or anyone else) know the difference between related files?
such as
Ehum and Ehuman
Eeco, Eeco_h and Eecoli
Emus, Emussp
etc.
Thanks,
Mike
------------------------------------
Dr Michael Poidinger
PhD(virology) PGDipSci (computer science)
CEO, Australian Genome Information Centre
Head, Australian National Genome Information Service
ph 61-2-93518617
mob 0413146765
fax 61-2-93518618
email head at angis.org.au
------------------------------------------
From peter.rice at uk.lionbioscience.com Tue Nov 26 05:40:04 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 26 Nov 2002 10:40:04 +0000
Subject: codon useage tables
References: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au> <5.2.0.9.2.20021126085548.02f344e8@morgan.angis.org.au>
Message-ID: <3DE34F84.90108@uk.lionbioscience.com>
Michael Poidinger wrote:
> Do you (or anyone else) know the difference between related files?
>
> such as
> Ehum and Ehuman
> Eeco, Eeco_h and Eecoli
> Emus, Emussp
The codon usage files were set up a long time ago. It was not so easy to
find a good set of tables that were free to use. The first tables (if I
recall correctly) came from the TRANSTERM database.
Short names (Eeco) are reformatted TRANSTERM codon usage tables with an E
(EMBOSS) prefix and a .cut suffix to identify the format.
Names with _h (Eeco_h) are highly expressed genes (high Codon Adaptation
Index values).
sp endings? Help! Ysp is "Yeast S.pombe" of course. I assume the others are
for a genus (e.g. Mus sp. = Mus musculus and Mus domesticus) rather than
a single species. Emussp.cut is a reformat of TRANSTERM's mussp.cod file.
The EBI's FTP copy of TRANSTERM did not document exactly what these names
mean. The original TRANSTERM documentation also leaves you to guess at the
3-letter species codes. The TRANSTERM website seems to be only partly
available.
Longer names (Eecoli) are added from elsewhere (I need to check on their
origin) and only include a few genes (count the stop codons!) so I assume
they are old and probably obsolete.
mt endings are mitochondrial genes
cp endings are chloroplast genes
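The naming rules above can be sketched as a small shell function. This is only an illustration of the conventions as described here: the function name describe_cut and the labels it prints are made up, and the sp case stays as uncertain as it is in the text.

```shell
#!/bin/sh
# Classify a codon usage table name by its ending.
# describe_cut is a hypothetical helper, not part of EMBOSS.
describe_cut() {
    name=${1%.cut}            # drop the .cut suffix if present
    case $name in
        E*_h) echo "highly expressed genes" ;;
        E*mt) echo "mitochondrial genes" ;;
        E*cp) echo "chloroplast genes" ;;
        E*sp) echo "sp ending (possibly a genus rather than a single species)" ;;
        E*)   echo "no special ending" ;;
        *)    echo "not an E-prefixed table name" ;;
    esac
}

for table in Eeco.cut Eeco_h.cut Emussp.cut; do
    printf '%s: %s\n' "$table" "$(describe_cut "$table")"
done
```

The order of the case patterns matters: the specific endings are tested before the catch-all E* pattern.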
Time to review these tables I suspect!!! How about replacing them with
annotated tables from CUTG for selected species? We need to be careful
about default table names in some programs, but they are easy to update.
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From Joerg.Schaber at uv.es Tue Nov 26 07:41:44 2002
From: Joerg.Schaber at uv.es (Joerg Schaber)
Date: Tue, 26 Nov 2002 13:41:44 +0100
Subject: duplicate ID
Message-ID: <3DE36C08.6030603@uv.es>
Hi,
When creating an NCBI database using dbiflat, I get the following message a few times:
"Warning: Duplicate ID skipped: '' All hits will point to first ID
found".
Even though it does not seem to have severe effects, I would like to know
which duplicate IDs are meant.
I checked the genome IDs and acnums and they seem to be OK (all *.gbk
files downloaded from NCBI); they all have entries and none are 'null'.
Any idea what the problem is?
Here is the command I use:
dbiflat -idformat gb -directory "." -filename "*.gbk" -dbname "ncbibac"
-release "1.0" -date "26/11/02" -fields acnum,des,taxon
Greetings,
joerg
From david.vilanova at rdls.nestle.com Tue Nov 26 08:43:40 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 14:43:40 +0100
Subject: Matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECDD@lsmail2.crn.nestrd.ch>
Dear all,
I was wondering if the matcher program accepts a sequence via stdin.
The following example doesn't work for me.
matcher -sequencea 'ATGCGA' -sequenceb 'ATCTAGATATGCGA'
>cannot open ATGCGA file for read.
Is there any way to submit a sequence via stdin?
Thanks,
David
From peter.rice at uk.lionbioscience.com Tue Nov 26 08:53:50 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 26 Nov 2002 13:53:50 +0000
Subject: Matcher
References: <89466355CEFE7244AC3A013E45641C180144ECDD@lsmail2.crn.nestrd.ch>
Message-ID: <3DE37CEE.5090903@uk.lionbioscience.com>
Vilanova,David,LAUSANNE,NRC/BS wrote:
> Dear all,
> I was wondering if matcher program accepts a sequence via stdin.
>
> the following exemple doesn't work for me.
>
> matcher -sequencea 'ATGCGA' -sequenceb 'ATCTAGATATGCGA'
>
>>cannot open ATGCGA file for read.
>
>
> Is there anyway to submit a sequence via stdin ???
You don't mean stdin (that can only read one sequence anyway) ... you mean
"can I specify a sequence on the command line?"
Yes!!!! You need the "asis" special format.
matcher -sequencea 'asis::ATGCGA' -sequenceb 'asis::ATCTAGATATGCGA'
(assuming your shell allows the command line to be long enough for your
sequences :-)
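As a rough sketch of the same idea inside a shell script: the helper names asis_usa and fits_on_command_line are made up for illustration, and the 1024 bytes of slack is an arbitrary assumption. The command is printed rather than executed, so the sketch runs without EMBOSS installed.

```shell
#!/bin/sh
# Build an EMBOSS "asis::" sequence address from a raw sequence string.
asis_usa() {
    printf 'asis::%s' "$1"
}

# Rough check that two sequences fit within the system limit on total
# argument length. Falls back to 4096 (the POSIX minimum) if getconf
# is not available.
fits_on_command_line() {
    limit=$(getconf ARG_MAX 2>/dev/null || echo 4096)
    needed=$(( ${#1} + ${#2} + 1024 ))   # 1024 bytes of slack: an assumption
    [ "$needed" -lt "$limit" ]
}

seqa=ATGCGA
seqb=ATCTAGATATGCGA
if fits_on_command_line "$seqa" "$seqb"; then
    echo matcher -sequencea "$(asis_usa "$seqa")" -sequenceb "$(asis_usa "$seqb")"
else
    echo "sequences too long for the command line; write them to files instead" >&2
fi
```

The length check is only approximate, since the environment also counts towards the limit and some systems cap the size of a single argument separately.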
Hope this helps
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From david.vilanova at rdls.nestle.com Tue Nov 26 09:12:19 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 15:12:19 +0100
Subject: Matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECE0@lsmail2.crn.nestrd.ch>
Thanks Peter,
Sorry for the mistake.
I'm writing a BioPerl script which automatically runs an EMBOSS application.
I could have written a new file for each sequence I read, but it
looks much nicer this way.
Regards,
David
#!/usr/bin/perl -w
use strict;
use Bio::Factory::EMBOSS;
use Bio::SeqIO;
die "Usage: perl script.pl [seqfileA] [seqfileB] [outfile]\n" unless @ARGV == 3;
# Read input files
my ($seqfileA, $seqfileB, $outfile) = @ARGV;
# Initialize the EMBOSS factory object
my $EMBOSS = Bio::Factory::EMBOSS->new;
# Define the EMBOSS program to run
my $application = $EMBOSS->program('matcher');
# Read the sequences in seqfileA
my $seqA = Bio::SeqIO->new(-file   => $seqfileA,
                           -format => 'fasta');
while (my $seqinA = $seqA->next_seq) {
    my $inseqA = "asis::" . $seqinA->seq;
    my $seqidA = $seqinA->id;
    print "####$seqidA\n";
    # Re-open seqfileB at every iteration over seqfileA
    my $seqB = Bio::SeqIO->new(-file   => $seqfileB,
                               -format => 'fasta');
    while (my $seqinB = $seqB->next_seq) {
        # Format like asis::ATGCGA (required for EMBOSS)
        my $inseqB = "asis::" . $seqinB->seq;
        my $seqidB = $seqinB->id;
        print "Processing sequence $seqidA..vs..$seqidB...";
        # Define program parameters and run
        $application->run({
            -sequencea => $inseqA,
            -sequenceb => $inseqB,
            -outfile   => $outfile });
        print "done\n";
        # ... manipulate alignments here ...
    }
}
From jason at cgt.mc.duke.edu Tue Nov 26 09:54:19 2002
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Tue, 26 Nov 2002 09:54:19 -0500 (EST)
Subject: Matcher
In-Reply-To: <89466355CEFE7244AC3A013E45641C180144ECE0@lsmail2.crn.nestrd.ch>
References: <89466355CEFE7244AC3A013E45641C180144ECE0@lsmail2.crn.nestrd.ch>
Message-ID:
Bioperl will also do the behind-the-scenes work of creating the tempfile
and cleaning it up for you if you just pass in a Bio::PrimarySeqI object.
It detects if you pass in an object or a string and proceeds accordingly.
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
From david.vilanova at rdls.nestle.com Tue Nov 26 10:58:32 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 16:58:32 +0100
Subject: Bioperl and matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch>
Hello,
I have problems retrieving the alignments from EMBOSS output.
The program below reads 2 files and runs matcher all against all.
Matcher gives me MSF output and then I try to parse this alignment with
Bio::AlignIO.
However, I get an exception...
Processing sequence 1..vs..3...done
------------- EXCEPTION -------------
MSG: 1 exists as an alignment line but not in the header. Not confident of
what is going on!
STACK Bio::AlignIO::msf::next_aln
/usr/local/lib/perl5/site_perl/5.8.0/Bio/AlignIO/msf.pm:106
STACK toplevel Run_Emboss.pl:50
--------------------------------------
Here is the output from matcher:
!!NA_MULTIPLE_ALIGNMENT 1.0
out MSF: 5 Type: N 26/11/02 CompCheck: 2090 ..
Name: EMBOSS_001 Len: 5 Check: 1045 Weight: 1.00
Name: EMBOSS_002 Len: 5 Check: 1045 Weight: 1.00
//
1 5
EMBOSS_001 CGGCG
EMBOSS_002 CGGCG
###########################################################
It doesn't work for fasta format either in my script (see output below):
Processing sequence 1..vs..3...done
Use of uninitialized value in sprintf at
/usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 257,
line 4.
Use of uninitialized value in hash element at
/usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268,
line 4.
Use of uninitialized value in hash element at
/usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268,
line 4.
Use of uninitialized value in hash element at
/usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 270,
line 4.
#########################
#Script
#!/usr/bin/perl -w
use strict;
use Bio::Factory::EMBOSS;
use Bio::SeqIO;
use Bio::AlignIO;
die "Usage: perl script.pl [seqfileA] [seqfileB] [outfile]\n" unless @ARGV == 3;
# Read input files
my ($seqfileA, $seqfileB, $outfile) = @ARGV;
# Initialize the EMBOSS factory object
my $EMBOSS = Bio::Factory::EMBOSS->new;
# Define the EMBOSS program to run
my $application = $EMBOSS->program('matcher');
# Read the sequences in seqfileA
my $seqA = Bio::SeqIO->new(-file   => $seqfileA,
                           -format => 'fasta');
while (my $seqinA = $seqA->next_seq) {
    my $inseqA = "asis::" . $seqinA->seq;
    my $seqidA = $seqinA->id;
    print "####$seqidA\n";
    # Re-open seqfileB at every iteration over seqfileA
    my $seqB = Bio::SeqIO->new(-file   => $seqfileB,
                               -format => 'fasta');
    while (my $seqinB = $seqB->next_seq) {
        # Format like asis::ATGCGA (required for EMBOSS)
        my $inseqB = "asis::" . $seqinB->seq;
        my $seqidB = $seqinB->id;
        print "Processing sequence $seqidA..vs..$seqidB...";
        # Define program parameters and run
        $application->run({
            -sequencea => $inseqA,
            -sequenceb => $inseqB,
            -aformat   => 'msf',
            -outfile   => $outfile });
        print "done\n";
        # Parse the MSF alignment that matcher just wrote
        my $alnin = Bio::AlignIO->new(-format => 'msf',
                                      -file   => $outfile);
        while (my $aln = $alnin->next_aln) {
            print $aln->no_residues, "\n";
            #print $aln->consensus_string,"\n";
        }
    }
}
From jason at cgt.mc.duke.edu Tue Nov 26 11:05:22 2002
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Tue, 26 Nov 2002 11:05:22 -0500 (EST)
Subject: Bioperl and matcher
In-Reply-To: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch>
References: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch>
Message-ID:
Our msf parser is seeing something it isn't expecting - not sure why -
what happens when you just use the straight 'emboss' parser with standard
emboss alignment output which is the route that has been most heavily
tested?
-jason
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
From peter.rice at uk.lionbioscience.com Tue Nov 26 11:12:46 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 26 Nov 2002 16:12:46 +0000
Subject: Bioperl and matcher
References: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch>
Message-ID: <3DE39D7E.9080403@uk.lionbioscience.com>
Vilanova,David,LAUSANNE,NRC/BS wrote:
>
> Hello,
> I have problems retrieving the alignments from an emboss output.
> The program belows read 2 files and runs a matcher of all against all.
> Matcher gives me an msf output and then I try to parse this alignment with
> Bio::AlignIO.
> However I get an exception...
>
> Processing sequence 1..vs..3...done
>
> ------------- EXCEPTION -------------
> MSG: 1 exists as an alignment line but not in the header. Not confident of
> what is going on!
BioPerl seems to be having trouble with the EMBOSS MSF format output. It
could be something about the naming of the sequences?
EMBOSS is making up names for your sequences. I assume you are using
asis::CGGCG to pass them to matcher. You can put -sid after each sequence
to give them names, for example:
matcher -out x.x -af msf asis::ccggc -sid cg asis::cgggc -sid gg
(-sid, like -aformat, is an associated qualifier. It must follow the asis::
sequence because it is positional: putting it first on the command line, for
example, would make it refer to all sequences, which is fine for -sformat but
not a good idea for -sid :-)
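When the names and sequences come from script variables, the positional rule can be kept explicit by always emitting each -sid straight after its own asis:: sequence. A minimal sketch, assuming a made-up helper called matcher_cmd; it prints the command instead of running it, so it works without EMBOSS installed.

```shell
#!/bin/sh
# Print a matcher command line in which each asis:: sequence is
# immediately followed by its own -sid qualifier.
# matcher_cmd is a hypothetical helper, not part of EMBOSS.
matcher_cmd() {
    # $1/$2: name and residues of the first sequence
    # $3/$4: name and residues of the second sequence
    # $5:    output file
    echo matcher -outfile "$5" -aformat msf \
        "asis::$2" -sid "$1" \
        "asis::$4" -sid "$3"
}

matcher_cmd cg ccggc gg cgggc x.x
```

Keeping each name next to its sequence in the argument list makes it hard to attach a -sid to the wrong sequence by accident.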
Hope this helps
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From david.vilanova at rdls.nestle.com Tue Nov 26 11:14:19 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 17:14:19 +0100
Subject: Bioperl and matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECE8@lsmail2.crn.nestrd.ch>
OK, I use:
$alnin = new Bio::AlignIO(-format =>'emboss',
-file => $outfile );
while ($aln = $alnin->next_aln){
print $aln->no_residues,"\n";
}
I don't specify any format to EMBOSS, so I get the standard alignment.
In this case it doesn't work: it never enters this loop, but the program
doesn't crash. It does all the alignments and stores the alignment in outfile, but
seems not to read it. Bizarre!
David
-----Original Message-----
From: Jason Stajich [mailto:jason at cgt.mc.duke.edu]
Sent: mardi, 26. novembre 2002 17:05
To: Vilanova,David,LAUSANNE,NRC/BS
Cc: 'bioperl-l at bioperl.org'; 'emboss at embnet.org'
Subject: Re: Bioperl and matcher
Our msf parser is seeing something it isn't expecting - not sure why -
what happens when you just use the straight 'emboss' parser with standard
emboss alignment output which is the route that has been most heavily
tested?
-jason
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
From david.vilanova at rdls.nestle.com Tue Nov 26 11:33:56 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 17:33:56 +0100
Subject: Bioperl and matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECEC@lsmail2.crn.nestrd.ch>
I tried that but it still doesn't fix the problem...
-----Original Message-----
From: Peter Rice [mailto:peter.rice at uk.lionbioscience.com]
Sent: mardi, 26. novembre 2002 17:13
To: Vilanova,David,LAUSANNE,NRC/BS
Cc: 'bioperl-l at bioperl.org'; 'emboss at embnet.org'
Subject: Re: Bioperl and matcher
Vilanova,David,LAUSANNE,NRC/BS wrote:
>
> Hello,
> I have problems retrieving the alignments from an emboss output.
> The program below reads two files and runs matcher on all sequences against all.
> Matcher gives me an msf output and then I try to parse this alignment with
> Bio::AlignIO.
> However I get an exception...
>
> Processing sequence 1..vs..3...done
>
> ------------- EXCEPTION -------------
> MSG: 1 exists as an alignment line but not in the header. Not confident of
> what is going on!
BioPerl seems to be having trouble with the EMBOSS MSF format output. It
could be something about the naming of the sequences?
EMBOSS is making up names for your sequences. I assume you are using
asis::CGGCG to pass them to matcher. You can put -sid after each sequence
to give them names, for example:
matcher -out x.x -af msf asis::ccggc -sid cg asis::cgggc -sid gg
(-sid, like -aformat, is an associated qualifier. It must follow the asis::
sequence it applies to because it is positional. Putting it first on the
command line, for example, would apply it to all sequences: fine for
-sformat but not a good idea for -sid :-)
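The positional rule can be sketched as a small Python helper that builds the
matcher argument list with each -sid placed directly after the sequence it
names. This is only an illustration: the function name is hypothetical, the
qualifiers are the ones already used in this thread (-outfile, -aformat,
-sid), and nothing is executed unless EMBOSS is installed.

```python
def matcher_argv(seq_a, id_a, seq_b, id_b, outfile):
    """Build an argument list for matcher with explicitly named asis:: sequences.

    Hypothetical helper: each -sid is placed immediately after the asis::
    sequence it names, because associated qualifiers are positional.
    """
    return [
        "matcher",
        "-outfile", outfile,
        "-aformat", "msf",
        "asis::" + seq_a, "-sid", id_a,
        "asis::" + seq_b, "-sid", id_b,
    ]


if __name__ == "__main__":
    argv = matcher_argv("ccggc", "cg", "cgggc", "gg", "x.x")
    print(" ".join(argv))
    # To run it for real (requires EMBOSS on the PATH):
    # import subprocess; subprocess.run(argv, check=True)
```

Passing the sequence IDs from the FASTA files as id_a and id_b should give
the MSF output names that agree between header and alignment lines.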
Hope this helps
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From vz_silvana at verizon-uweb.com Wed Nov 27 15:30:03 2002
From: vz_silvana at verizon-uweb.com (Silvana Paredes)
Date: Wed, 27 Nov 2002 15:30:03 -0500
Subject: Inquire about login jemboss
Message-ID: <200211272030.PAA18916@www22.ureach.com>
To whom it may concern:
I downloaded the jemboss software, but when I try to use it
it asks me for a login and a password. I can't find a
way to set up an account or to use emboss without logging in.
I would appreciate instructions on how to
start using it or create an account.
Thank you so much,
Best regards,
Silvana Paredes
From starksb at ebi.ac.uk Fri Nov 1 09:45:29 2002
From: starksb at ebi.ac.uk (David Starks-Browning)
Date: Fri, 1 Nov 2002 09:45:29 +0000
Subject: emboss in cygwin
In-Reply-To: <3DC16BAA.1050201@bigfoot.com>
References: <3DC16BAA.1050201@bigfoot.com>
Message-ID: <4429-Fri01Nov2002094530+0000-starksb@ebi.ac.uk>
On Thursday 31 Oct 02, clwu writes:
> Hi, group,
> I am new to group. I tried to compile EMBOSS under
> win2K/cygwin but I failed. EMBOSS website at HGMP mentioned that
> "Richard Bruskiewich and Simon Kelley at the Sanger Centre have
> succeeded in compiling EMBOSS under Windows NT using the CygWin package.
> The resulting executables have been tested but not thoroughly enough for
> a release. Contact Richard Bruskiewich for more information. ". But I
> can not follow the link in this page to get help.
> Does anyone have the successful experience on this?
I just built EMBOSS-2.5.1 on Win98 using the latest Cygwin downloaded
from . There is no libgd.[a|dll] so no PNG
support. But everything else appeared to build fine. I've not tested
the applications though.
Note that you will need much more from Cygwin's setup.exe than is
installed by default.
If you provide details about what failed, I may be able to help you.
Feel free to respond off-list, as a Cygwin build may not be
interesting to the rest of the emboss list. We can always summarise
to the emboss list once we get it sorted, if there is interest.
Regards,
David
(Cygwin FAQ maintainer)
-------------------------------------------------------------------
David Starks-Browning | starksb at ebi.ac.uk
EMBL Outstation -- |
The European Bioinformatics Institute |
Wellcome Trust Genome Campus | tel: +44 (1223) 494 616
Hinxton, Cambridge, CB10 1SD, UK | fax: +44 (1223) 494 468
-------------------------------------------------------------------
From peter.rice at uk.lionbioscience.com Fri Nov 1 10:12:58 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 01 Nov 2002 10:12:58 +0000
Subject: emboss in cygwin
References: <3DC16BAA.1050201@bigfoot.com>
Message-ID: <3DC253AA.8080401@uk.lionbioscience.com>
clwu wrote:
> I am new to group. I tried to compile EMBOSS under
> win2K/cygwin but I failed. EMBOSS website at HGMP mentioned that
> "Richard Bruskiewich and Simon Kelley at the Sanger Centre have
> succeeded in compiling EMBOSS under Windows NT using the CygWin package.
> The resulting executables have been tested but not thoroughly enough for
> a release. Contact Richard Bruskiewich for more information. ". But I
> can not follow the link in this page to get help.
That is rather old information.
The history is that Richard Bruskiewich made a Windows port of an early
ACEDB version, and they both tried porting an early EMBOSS release using
Cygwin - which worked apart from the graphics library and Windows file naming.
Neither Richard nor Simon has been working on this recently.
David Starks-Browning at EBI has built EMBOSS but not yet tried the
applications. I hear of other groups who have also tried.
You can expect problems with Windows filenames which clash with EMBOSS
"USA" syntax. We can try to fix these - perhaps by requiring all database
names to have more than one letter so Windows drive letters work.
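To make the clash concrete, here is a rough Python sketch of the proposed
rule. It is an assumption-laden toy, not the EMBOSS parser: it handles only
"format::", "db:entry" and plain file names, and the function name and the
example entry are hypothetical.

```python
def classify(usa):
    """Classify a simplified USA string as a file or a database reference.

    Sketch of the proposed rule only: a single letter before ':' is taken
    to be a Windows drive, so database names need at least two letters.
    """
    fmt, found, body = usa.partition("::")
    if not found:
        # No explicit "format::" prefix was given.
        fmt, body = None, usa
    name, colon, entry = body.partition(":")
    if not colon or (len(name) == 1 and name.isalpha()):
        return {"kind": "file", "format": fmt, "path": body}
    return {"kind": "db", "format": fmt, "db": name, "entry": entry}


if __name__ == "__main__":
    for usa in (r"c:\data\seq.fa", "embl:hsfau", "fasta::c:/seqs/a.fa"):
        print(usa, "->", classify(usa))
```

Under this rule a one-letter database name could never be addressed, which
is the price of letting drive letters work.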
Any suggestions on changes needed to make EMBOSS work better (or work at
all) on windows systems?
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From pageauma at ESI.UMontreal.CA Fri Nov 1 20:01:07 2002
From: pageauma at ESI.UMontreal.CA (Marie PAGEAU)
Date: Fri, 1 Nov 2002 15:01:07 -0500
Subject: test sequence
Message-ID:
Dear colleagues,
A lady, professor of biochemistry at
the Universite de Montreal, sent me the following request.
Would you please be nice enough to help us?
Your help would be highly appreciated.
Best regards,
Marie Pageau
-----------------------------------------------------------
From: Muriel Aubry
Sent: 30 October 2002 14:47
Subject: test sequence
Hi,
I am presently using the restrict and showseq programs from EMBOSS. I
have noticed that some very usual enzymes are not detected by the
program such as XhoI and PstI and a few others when the complete list of
enzymes is used. I have here below a test sequence that should contain
XhoI, EcoRI, PstI, EcoRV, HindIII, KpnI, SacII, ApaI, SmaI, BamHI and
XbaI.
XhoI, PstI and EcoRV are not detected by restrict and showseq in the
test sequence shown below. Is there a problem with the restriction
enzyme list?
Test Sequence:
gagcagggggatctcggcgagctctcgagaattctcacgcgtctgcaggatatcaagcttgcggtaccgcgg
gcccggg
From ableasby at hgmp.mrc.ac.uk Fri Nov 1 20:21:29 2002
From: ableasby at hgmp.mrc.ac.uk (ableasby at hgmp.mrc.ac.uk)
Date: Fri, 1 Nov 2002 20:21:29 GMT
Subject: test sequence
Message-ID: <200211012021.UAA12495@bromine.hgmp.mrc.ac.uk>
There is probably not a problem. EMBOSS only reports one
isoschizomer for cases where several REs have the same cut
site. If the -preferred switch is given to these
programs then the more easily available of the isoschizomers
will be reported. This is controlled by the file:
embossre.equ
where, for each RE, you can specify which isoschizomer should be
reported.
So, first try adding -preferred. If you just want to search for
a particular set of enzymes they can be given as a comma-separated
list using the -enzymes qualifier e.g. -enzymes "ecori,bamhi"
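Independently of which isoschizomer name is reported, the recognition
sequences themselves can be checked directly. A minimal sketch in Python,
assuming the standard recognition sequences for the enzymes named in the
question (all are palindromic, so scanning one strand is enough). Positions
are starts of the recognition site, not cut positions, and this is not how
restrict itself works.

```python
# Recognition sequences (5'->3') for the enzymes named in the question.
SITES = {
    "XhoI": "CTCGAG", "EcoRI": "GAATTC", "PstI": "CTGCAG", "EcoRV": "GATATC",
    "HindIII": "AAGCTT", "KpnI": "GGTACC", "SacII": "CCGCGG", "ApaI": "GGGCCC",
    "SmaI": "CCCGGG", "BamHI": "GGATCC", "XbaI": "TCTAGA",
}

# The test sequence from the original message, joined into one string.
TEST_SEQ = ("gagcagggggatctcggcgagctctcgagaattctcacgcgtctgcaggatatcaagcttgcggtaccgcgg"
            "gcccggg")


def find_sites(seq, sites=SITES):
    """Map enzyme name -> list of 1-based start positions of its site."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)
            # Step by one so overlapping occurrences are also found.
            start = seq.find(site, start + 1)
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    for name, positions in sorted(find_sites(TEST_SEQ).items()):
        print(name, positions)
```

Any enzyme missing from the printed list has no exact recognition site in
the sequence, whatever name restrict would use for it.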
HTH
Alan Bleasby
HGMP
PS: NEB supply an emboss-format set of files which are just the most
common REs. You can rename them (e.g. to embossre.enz/ref/sup)
and overwrite your current set in the emboss REBASE directory.
From David.Bauer at SCHERING.DE Mon Nov 4 08:35:51 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Mon, 4 Nov 2002 09:35:51 +0100
Subject: test sequence
Message-ID:
Hi Alan,
I wonder how the "preferred" list is created.
Restrict finds the site "CTGCAG" as recognition site of BstMAI.
This is a rather exotic enzyme, available only from one single company, which I
didn't know before.
I apologize for my ignorance if this is a common supplier in the UK ;-)
On the other hand PstI is available from about 20 suppliers and this is also the
enzyme name used in various catalogue pictures of multiple cloning sites in
vectors (pUC19 polylinker etc.)
So I would suggest to add the BstMAI -> PstI mapping to the distribution version
of embossre.equ.
David.
There is probably not a problem. EMBOSS only reports one
isoschizomer for cases where several REs have the same cut
site. If the -preferred switch is given to these
programs then the more easily available of the isoschizomers
will be reported. This is controlled by the file:
embossre.equ
where, for each RE, you can specify which isoschizomer should be
reported.
So, first try adding -preferred. If you just want to search for
a particular set of enzymes they can be given as a comma-separated
list using the -enzymes qualifier e.g. -enzymes "ecori,bamhi"
HTH
Alan Bleasby
HGMP
PS: NEB supply an emboss-format set of files which are just the most
common REs. You can rename them (e.g. to embossre.enz/ref/sup)
and overwrite your current set in the emboss REBASE directory.
From gbottu at ben.vub.ac.be Mon Nov 4 10:05:58 2002
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Mon, 4 Nov 2002 11:05:58 +0100 (CET)
Subject: Remote getz from emboss
Message-ID: <200211041005.LAA1502643@black.vub.ac.be>
from : BEN
At the BEN site we have a Perl script that more or less reproduces the
functionality of the GCG program lookup. It can access a local SRS server (with
getz) or a remote one (with rsh getz); switching is a matter of commenting out a
line inside the script, a leftover from when our SRS server ran on a different
computer. It can be run interactively at the command line or put behind an
EMBOSS wrapper program and thus behind Staden or the EMBOSS WWW interfaces. Is
this what you are looking for?
Guy Bottu
From duhaimj at ircm.qc.ca Tue Nov 5 21:34:57 2002
From: duhaimj at ircm.qc.ca (Johanne Duhaime)
Date: Tue, 05 Nov 2002 16:34:57 -0500
Subject: MSE will not save on Exit
Message-ID: <3DC83981.625D3E83@ircm.qc.ca>
Hello
I am trying to use MSE (MSE -0.04.tar.gz just installed) but I cannot
save with the exit command.
After I modified a sequence, when I type Exit on the command line I
have:
Sequences modified do you wish to continue exiting [N]
Saying Y or N will not save anything.
For now I have to use "write".
Any idea of the problem?
--
Johanne Duhaime
IRCM
110 Ave des Pins O
Montreal, Quebec
987-5556 (tel) 987-5644 (fax)
Johanne_Duhaime at ircm.qc.ca
http://www.ircm.qc.ca
From w2hgcg at netscape.net Wed Nov 6 03:01:47 2002
From: w2hgcg at netscape.net (w2hgcg at netscape.net)
Date: Tue, 05 Nov 2002 22:01:47 -0500
Subject: epitope search
Message-ID: <56BE5F2D.194AD14A.000665E2@netscape.net>
I know this is not the place but perhaps...
I am working with LSA-3 and have been able to raise some antibodies in rabbits. When I run an immunoblot, the antibodies recognize specific protein bands. How can I search for epitopes in P. falciparum against my immunogens? Sorry for my English...
Lucia Goncalvez
From ray at leicester.ac.uk Wed Nov 6 13:24:38 2002
From: ray at leicester.ac.uk (Dalgleish, Dr R.)
Date: Wed, 6 Nov 2002 13:24:38 -0000
Subject: Suggestion for new EMBOSS program
Message-ID:
I find GCG framealign very useful to align
a protein with its DNA sequence. Could
somebody find the time to write an EMBOSS
equivalent?
Thanks,
Raymond Dalgleish
Genetics
Leicester
From peter.rice at uk.lionbioscience.com Wed Nov 6 13:32:13 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Wed, 06 Nov 2002 13:32:13 +0000
Subject: Suggestion for new EMBOSS program
References:
Message-ID: <3DC919DD.5020306@uk.lionbioscience.com>
Dalgleish, Dr R. wrote:
> I find GCG framealign very useful to align
> a protein with its DNA sequence. Could
> somebody find the time to write an EMBOSS
> equivalent?
Sounds rather like genewise in the (free) Wise2 package
http://www.sanger.ac.uk/Software/Wise2/
You can try it at http://www.sanger.ac.uk/Software/Wise2/genewiseform.shtml
Can you be more specific about whether this is what you need, and what you
want in EMBOSS?
regards,
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From heme at postmark.net Thu Nov 7 12:05:55 2002
From: heme at postmark.net (Per Johansson)
Date: Thu, 07 Nov 2002 12:05:55 +0000
Subject: Suggestion for new use for EMBOSS program
Message-ID: <20021107120555.8757.qmail@venus.postmark.net>
I am finding it difficult to find a replacement for the GCG program
FINDPATTERNS. The EMBOSS program fuzznuc cannot use a database of
patterns (primers). Other alignment programs in EMBOSS like
supermatcher are useful but, among other things, you can't choose
mismatch settings.
The best replacement I've found is the EMBOSS program tfscan! Tfscan
uses a database of patterns, but you can't reverse the patterns (you
have to put in copies of forward and reverse primer sequences in the
database). The tfscan algorithm is ideal (and is much faster than
FINDPATTERNS) but obviously a few minor changes to the input and output
would be required if it were used in a replacement program.
Obviously, I could write a script to wrap tfscan but I'd like to
avoid this. A new program with this functionality would be beneficial
for the EMBOSS package.
Per
From charles at moulinette.dyndns.org Thu Nov 7 13:20:42 2002
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Thu, 7 Nov 2002 14:20:42 +0100
Subject: Suggestion for new use for EMBOSS program
In-Reply-To: <20021107120555.8757.qmail@venus.postmark.net>
References: <20021107120555.8757.qmail@venus.postmark.net>
Message-ID: <20021107132042.GA9854@moulinette.dyndns.org>
> The best replacement I've found is the EMBOSS program tfscan!
> Obviously, I could write a script to wrap tfscan but I'd like to
> avoid this. A new program with this functionality would be beneficial
> for the EMBOSS package.
An alternative would be to write a script that builds a
transfac-format database from a flatfile containing names and
corresponding consensus (This would also allow to migrate the
pattern.dat file of GCG).
Charles
From heme at postmark.net Thu Nov 7 15:11:43 2002
From: heme at postmark.net (Per Johansson)
Date: Thu, 07 Nov 2002 15:11:43 +0000
Subject: Fwd: Re: Suggestion for new use for EMBOSS program
Message-ID: <20021107151143.25950.qmail@venus.postmark.net>
It's OK to reformat the GCG pattern file to a transfac-format database
and use tfscan. But I still miss some functions of FINDPATTERNS: I
can't search the reverse primer strand, the output is limited to ONE
format with no alignment format, and tfscan doesn't accept wobble bases
in primers (e.g. K = G or T, V = G or C or A, i.e. not T ...). Otherwise
the tfscan algorithm is a very nice and fast word-matching algorithm,
and it COULD be used for other purposes too!
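As a rough illustration of the two missing features (reverse-strand search
and wobble bases), here is a Python sketch. It uses the standard IUPAC
nucleotide codes (for instance K is G or T, D is A, G or T, and V is A, C
or G) and a plain regular-expression scan. It is not the tfscan or
FINDPATTERNS algorithm, and all function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each code, ambiguity codes included (K <-> M, B <-> V, ...).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(primer):
    """Reverse-complement a primer that may contain ambiguity codes."""
    return primer.upper().translate(COMPLEMENT)[::-1]


def to_regex(primer):
    """Turn a primer with ambiguity codes into a regular expression."""
    parts = []
    for base in primer.upper():
        bases = IUPAC[base]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_primer(primer, seq):
    """Return (strand, 1-based start) for each match on either strand."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", primer), ("-", reverse_complement(primer))):
        # A lookahead lets overlapping matches be reported as well.
        for match in re.finditer("(?=" + to_regex(pattern) + ")", seq):
            hits.append((strand, match.start() + 1))
    return hits


if __name__ == "__main__":
    print(reverse_complement("GAATTK"))
    print(find_primer("GAK", "ttgatcc"))
```

With a helper like this only the forward primers need to be stored; the
reverse-strand copies are generated at search time.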
Per
--- Forwarded Message ---
To: EMBOSS
From: Charles Plessy
Reply-To: c.plessy at mangoosta.net
Subject: Re: Suggestion for new use for EMBOSS program
Date: Thu, 7 Nov 2002 14:20:42 +0100
> The best replacement I've found is the EMBOSS program tfscan!
> Obviously, I could write a script to wrap tfscan but I'd like to
> avoid this. A new program with this functionality would be beneficial
> for the EMBOSS package.
An alternative would be to write a script that builds a
transfac-format database from a flatfile containing names and
corresponding consensus (This would also allow to migrate the
pattern.dat file of GCG).
Charles
From heme at postmark.net Fri Nov 8 14:28:53 2002
From: heme at postmark.net (Per Johansson)
Date: Fri, 08 Nov 2002 14:28:53 +0000
Subject: EMBOSS default program settings
Message-ID: <20021108142853.4941.qmail@venus.postmark.net>
I have problems with EMBOSS default program settings in the
emboss.defaults file.
set emboss_stdout 1 Works fine, output goes to stdout
set emboss_verbose 1 Doesn't work
set emboss_format embl The programs still output fasta format by
default! And the ONLY sequence format the EMBOSS programs accept as
input format is embl!
It doesn't work as it should.
Per
From peter.rice at uk.lionbioscience.com Fri Nov 8 15:52:05 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 08 Nov 2002 15:52:05 +0000
Subject: EMBOSS default program settings
References: <20021108142853.4941.qmail@venus.postmark.net>
Message-ID: <3DCBDDA5.3070106@uk.lionbioscience.com>
Per Johansson wrote:
> I have problems with EMBOSS default program settings in the
> emboss.defaults file.
>
> set emboss_stdout 1 Works fine, output goes to stdout
>
> set emboss_verbose 1 Dosen't work
Because help is generated as soon as the -help option is tested.
Changed in the next release to set -verbose before -help.
> set emboss_format embl The programs still outputs fasta format by
> default! And the ONLY sequence format the EMBOSS programs accepts as
> input format is embl!
>
> It dosen't work as it should.
Well .... emboss_format sets the default *input* format. You can still say
fasta::filename to read fasta format
The output format is specified as emboss_outformat
EMBOSS will read all input formats if you only set emboss_outformat
I think you really mean to say:
set emboss_outformat embl
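The difference between the two variables can be checked with a small Python
sketch that reads "set NAME VALUE" lines and reports which default each one
controls. The parsing is an assumption based only on the lines quoted in
this thread, not on the full emboss.defaults syntax, and the function names
are hypothetical.

```python
def read_defaults(text):
    """Collect 'set NAME VALUE' lines from emboss.defaults-style text."""
    settings = {}
    for line in text.splitlines():
        fields = line.split()
        # Anything that is not a "set NAME VALUE" line is ignored.
        if len(fields) >= 3 and fields[0].lower() == "set":
            settings[fields[1].lower()] = fields[2]
    return settings


def describe(settings):
    """Say which default each format-related variable controls."""
    notes = []
    if "emboss_format" in settings:
        notes.append("default input format: " + settings["emboss_format"])
    if "emboss_outformat" in settings:
        notes.append("default output format: " + settings["emboss_outformat"])
    return notes


if __name__ == "__main__":
    example = "set emboss_stdout 1\nset emboss_outformat embl\n"
    for note in describe(read_defaults(example)):
        print(note)
```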
Hope this helps
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From heme at postmark.net Mon Nov 11 06:32:50 2002
From: heme at postmark.net (Per Johansson)
Date: Mon, 11 Nov 2002 06:32:50 +0000
Subject: Fwd: Re: EMBOSS default program settings
Message-ID: <20021111063250.19730.qmail@venus.postmark.net>
Thank you Peter,
I DO mean
set emboss_outformat embl (but I can't find emboss_outformat in the
documentation). When I use
set emboss_outformat embl
in the emboss.defaults file I don't see any effect at all: the default
output format is still fasta. I use emboss-2.4.1, but I assume this
is not version dependent.
Per
Per Johansson
heme at postmark.net
--- Forwarded Message ---
To: Per Johansson
Cc: EMBOSS
From: Peter Rice
Subject: Re: EMBOSS default program settings
Date: Fri, 08 Nov 2002 15:52:05 +0000
Per Johansson wrote:
> I have problems with EMBOSS default program settings in the
> emboss.defaults file.
>
> set emboss_stdout 1 Works fine, output goes to stdout
>
> set emboss_verbose 1 Dosen't work
Because help is generated as soon as the -help option is tested.
Changed in the next release to set -verbose before -help.
> set emboss_format embl The programs still outputs fasta format by
> default! And the ONLY sequence format the EMBOSS programs accepts as
> input format is embl!
>
> It dosen't work as it should.
Well .... emboss_format sets the default *input* format. You can still say
fasta::filename to read fasta format
The output format is specified as emboss_outformat
EMBOSS will read all input formats if you only set emboss_outformat
I think you really mean to say:
set emboss_outformat embl
Hope this helps
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From gwilliam at hgmp.mrc.ac.uk Mon Nov 11 09:20:01 2002
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Mon, 11 Nov 2002 09:20:01 +0000
Subject: Fwd: Re: EMBOSS default program settings
References: <20021111063250.19730.qmail@venus.postmark.net>
Message-ID: <3DCF7641.E2ED9B73@hgmp.mrc.ac.uk>
Per Johansson wrote:
>
> Thank you Peter,
>
> I DO mean
>
> set emboss_outformat embl (but I can't find emboss_outformat in the
> documentation).
It is documented in:
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Usa/databases.html#global
--
Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK
From heme at postmark.net Mon Nov 11 13:12:50 2002
From: heme at postmark.net (Per Johansson)
Date: Mon, 11 Nov 2002 13:12:50 +0000
Subject: Fwd: Re: EMBOSS default program settings
Message-ID: <20021111131250.4618.qmail@www2.postmark.net>
Thank you,
That solves the problem, ALWAYS use the latest version!
Per
Gary Williams, Tel 01223 494522 wrote:
> Per Johansson wrote:
> >
> > Thank you Peter,
> >
> > I DO mean
> >
> > set emboss_outformat embl (but I can't find emboss_outformat in the
> > documentation).
>
> It is documented in:
> http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Usa/databases.html#global
>
> --
> Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512
> mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/
> Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK
From sebastian.bassi at ar.advantaseeds.com Mon Nov 11 12:10:54 2002
From: sebastian.bassi at ar.advantaseeds.com (Sebastian Bassi)
Date: Mon, 11 Nov 2002 13:10:54 +0100
Subject: Problem with EMBOSS GUI
Message-ID:
Hi,
I've just installed the EMBOSS GUI on http://genes.unq.edu.ar/EMBOSS
(this should look like this http://bioinfo.pbi.nrc.ca:8090/EMBOSS/)
The problem, as you can see on the webpage, is that the programs are missing from the
left column (all the EMBOSS programs should appear there).
I think this is a path problem. To help you evaluate it, I
attach two files:
embossdir.txt, a capture of the ls -Ra from my emboss inst. directory
(/opt/emboss).
emboss.pl, the emboss.pl file (for you to see if the path are right).
The emboss.zip file contains both files and I made it because sometimes
attached text get corrupted by some mailers.
I hope you can help me.
Note: EMBOSS was compiled using this:
configure --prefix=/opt/emboss --without-x --x-includes="" --x-libraries=""
The "without x" part is because it was compiled on a RH web server without X.
EMBOSS itself works fine; the problem is this GUI.
Sebastian Bassi.
Advanta Seeds. Balcarce Research Station.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: emboss.zip
Type: application/x-zip-compressed
Size: 15092 bytes
Desc: emboss.zip
URL:
From mad at biol.unlp.edu.ar Mon Nov 11 19:18:00 2002
From: mad at biol.unlp.edu.ar (Martín Sarachu)
Date: Mon, 11 Nov 2002 16:18:00 -0300
Subject: tfextract not indexing?
Message-ID: <3DD00268.6090206@biol.unlp.edu.ar>
Hi,
tfextract is apparently running OK, but its output files are empty.
The command line is
> # tfextract -debug -warning -error -fatal -die -verbose
> Extract data from TRANSFAC
> Full pathname of transfac SITE.DAT: /home/work/dbs/transfac/site.dat
> #
and
> # ls -s /usr/local/emboss/share/EMBOSS/data/tf*
> 0 /usr/local/emboss/share/EMBOSS/data/tffungi
> 0 /usr/local/emboss/share/EMBOSS/data/tfinsect
> 0 /usr/local/emboss/share/EMBOSS/data/tfother
> 0 /usr/local/emboss/share/EMBOSS/data/tfplant
> 0 /usr/local/emboss/share/EMBOSS/data/tfvertebrate
...a sample from site.dat
> VV TRANSFAC SITES TABLE, V.2.4 25-08-1995
> XX
> //
> AC R00001
> XX
> ID HS$6-16_01
> XX
> DT 20.06.90 (created); .
> DT 24.08.95 10:48:05 (updated); EWI.
> XX
> TY DNA
> XX
> DE 6-16
> XX
> SE gGGAAAaTGAAACT
> XX
> EL ISRE
> XX
> SF -127
> ST -89
> XX
> ...
> ...
> SO 0811; B103
> ME gel shift competition
> RN [1]
> RA Suzuki-Yagawa Y., Kawakami K., Nagano K.
> RT Housekeeping Na,K-ATPase alpha1 subunit gene promoter is composed
> RT of multiple cis elements to which common and cell type-specific
> RT factors bind
> RL Mol. Cell. Biol. 12:4046-4055 (1992).
> DR EMBL; X52560; HSNFIL6(37:74).
> //
am I missing something?
Thanks,
martin
--
Martín Sarachu
mad at biol.unlp.edu.ar
EMBNet Argentina
http://www.ar.embnet.org
From Gunnar.Andersson at imbim.uu.se Tue Nov 12 10:28:47 2002
From: Gunnar.Andersson at imbim.uu.se (Gunnar Andersson)
Date: Tue, 12 Nov 2002 11:28:47 +0100
Subject: DAN output Tm
Message-ID:
How should I interpret the Tm calculated by DAN? Is Tm an estimated
melting point of the entire sequence or of the sequence in the window?
How can this Tm (window=100nt) be higher than Tmprod of the full 160
nt sequence?
--
Gunnar Andersson
Institutionen för medicinsk biokemi och mikrobiologi
Uppsala Biomedicinska Centrum (BMC), Husarg. 3
Box 582, 751 23 UPPSALA
E-post : Gunnar.Andersson at imbim.uu.se
Telefon: 018-471 45 87
Fax:018-50 98 76
From r.bowden at vir.gla.ac.uk Thu Nov 14 15:36:23 2002
From: r.bowden at vir.gla.ac.uk (Rory Bowden)
Date: Thu, 14 Nov 2002 15:36:23 -0000
Subject: Fw: Other: EMBOSS versus GCG?
Message-ID: <007e01c28bf3$96beddc0$6886d182@vir.gla.ac.uk>
This on the 'evoldir' list, which is the main international mailing list in
evolutionary biology. Would anyone like to make any comments for me to pass
on?
While I'm definitely not in Canada I would say that this question is likely
to come up here (in the UK) at the institutional if not research council
level. Does anyone have an opinion they'd like to articulate e.g. about
whether EMBOSS is ready to supplant GCG for end-users.
Rory Bowden
MRC Virology Unit
Glasgow UK
----- Original Message -----
From: "EvolDir"
To:
Sent: Thursday, November 14, 2002 9:34 AM
Subject: Other: EMBOSS versus GCG?
>
> Since its inception as the "Wisconsin package" in the early 1980s, the GCG
> suite of programs have provided a continuously improving "gold standard"
> for evolutionary bioinformatics software. The GCG suite is featured
> extensively in the latest bioinformatics textbooks (e.g. Mount) and in
> software reviews (e.g. The Scientist, August 19). Although some individual
> GCG programs have been surpassed by others, their range and flexibility,
> permitting linkage of programs together in innovative ways, has no current
> equivalent.
>
> Recently, "open source" advocates have pointed to the EMBOSS suite as
> providing a free alternative to the commercial package (supplied by
> Accelrys, with whom I have no financial connection). It is my impression
> that GCG is in a different league. For example, compare the GCG program
> "Window" with its proposed EMBOSS alternative "Freak":
>
> TASK: Determination of the number of occurrences of a motif in a sequence
> window.
>
> GCG program WINDOW
>
> 1. Allows up to 6 motifs at a time
> 2. Outputs absolute values and has a variety of other output options.
> 3. Extensive input menu
>
> EMBOSS program FREAK
>
> 1. Allows only 1 motif at a time.
> 2. Outputs a calculated fraction.
> 3. Very limited input menu.
>
> However, in Canada the open-source agenda has won out. In April 2002
> the publicly-funded, Halifax-based, Canadian Bioinformatics Resource (CBR)
> abandoned GCG, apparently with the consent of the Canadian evolutionary
> bioinformatics community. In this respect, I would be interested to hear
> from concerned parties in Canada with respect to the following questions:
>
> 1. Does your institution (or do you yourself) support GCG, so that you do
> not need CBR to supply GCG?
>
> 2. If you do not have independent access, do you find EMBOSS a suitable
> substitute for GCG?
>
> That Canada, which has spent hundreds of millions on genome
> projects, cannot give its researchers and their students a choice from
> among the relatively-inexpensive software packages that are available to
> analyze genomics data, seems to me very strange.
>
> Donald Forsdyke, Department of Biochemistry,
> Queen's University, Canada
> http://post.queensu.ca/~forsdyke/bioinfor.htm
>
>
>
>
From peter.rice at uk.lionbioscience.com Fri Nov 15 13:20:13 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Fri, 15 Nov 2002 13:20:13 +0000
Subject: Fw: Other: EMBOSS versus GCG?
References: <007e01c28bf3$96beddc0$6886d182@vir.gla.ac.uk>
Message-ID: <3DD4F48D.2020606@uk.lionbioscience.com>
Rory Bowden wrote:
> This on the 'evoldir' list, which is the main international mailing list in
> evolutionary biology. Would anyone like to make any comments for me to pass
> on?
>> It is my impression
>>that GCG is in a different league. For example, compare the GCG program
>>"Window" with its proposed EMBOSS alternative "Freak":
>>
>>TASK: Determination of the number of occurrences of a motif in a sequence
>>window.
>>
>>GCG program WINDOW
>>
>>1. Allows up to 6 motifs at a time
>>2. Outputs absolute values and has a variety of other output options.
>>3. Extensive input menu
>>
>>EMBOSS program FREAK
>>
>>1. Allows only 1 motif at a time.
>>2. Outputs a calculated fraction.
>>3. Very limited input menu.
Window : produces scores over a 'window' (a base range).
StatPlot : Plots Window results
EMBOSS : reports have scores over a base range as a general output format.
Freak: frequency of matches
FuzzNuc/FuzzPro/FuzzTran: Pattern matches with ambiguity codes
Restrict: Pattern matches with a pattern file
etc...
This makes it possible to develop some really nice new EMBOSS applications.
So ... how about a program which reads EMBOSS report files and produces a
summary report (think of window), and another that plots them all (think of
statplot). Scores could be plotted if we have a good way to compare them.
Yes, I know freak does not produce a report file ... but that is a very
easy change.
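The task under discussion, counting motif occurrences per sequence window
for several motifs at once, can be sketched in a few lines of Python. This
is only an illustration of the kind of per-range scores a summary or
plotting program could consume. It is not the algorithm of GCG Window or of
freak, and the function names are hypothetical.

```python
def count_overlapping(text, motif):
    """Count occurrences of motif in text, overlapping ones included."""
    count, start = 0, text.find(motif)
    while start != -1:
        count += 1
        start = text.find(motif, start + 1)
    return count


def window_counts(seq, motifs, window, step=1):
    """Return (1-based window start, {motif: count}) for each window."""
    seq = seq.upper()
    rows = []
    # A sequence shorter than the window yields one (short) window.
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        counts = {m: count_overlapping(chunk, m.upper()) for m in motifs}
        rows.append((start + 1, counts))
    return rows


if __name__ == "__main__":
    for start, counts in window_counts("cgcgaaaa", ["cg", "aa"], window=4, step=4):
        print(start, counts)
```

Each row is a score over a base range, so the same rows could feed either a
table or an xy plot with the window position on the x axis.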
It could also read in EMBL/SwissProt feature tables as annotation.
So, suggestions please for EMBOSS applications to plot reports/features...
For example:
1. xy plot of scores as points at the centre of a feature, with the
sequence position on the x axis and the score on the y axis. Possibly split
into multiple plots by program/feature-type/named-tag-value (e.g. pattern)
(like statplot only much more versatile).
2. xy plot of lines for each feature
3. GANTT (bar) chart of features by position, annotated with feature
type/program/score as appropriate
4. Combine these - xy plot of features with scores, and other features
reported underneath (think of the -mark option in statplot - but with far
more annotation possible below the x axis)
Maybe we can make some mock-ups on the EMBOSS pages to show the possibilities?
regards,
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From sjmiller at u.arizona.edu Fri Nov 15 19:23:21 2002
From: sjmiller at u.arizona.edu (Susan J. Miller)
Date: Fri, 15 Nov 2002 12:23:21 -0700
Subject: newcpgreport vs newcpgseek
Message-ID: <3DD549A9.9170F056@u.arizona.edu>
I could not find an emboss FAQ...is there one?
I'm trying to figure out the differences between cpgreport, newcpgreport
and newcpgseek.
--
Thanks,
-susan
Susan J. Miller
Biotechnology Computing Facility
Arizona Research Laboratories
Bio West 228
University of Arizona
Tucson, AZ 85721
(520) 626-2597
From rls at ebi.ac.uk Sat Nov 16 01:08:02 2002
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Sat, 16 Nov 2002 01:08:02 -0000
Subject: newcpgreport vs newcpgseek
In-Reply-To: <3DD549A9.9170F056@u.arizona.edu>
Message-ID: <000501c28d0c$9cf06780$0a0868d5@castafiore>
Hi Susan,
Yes. I never had the time to document these. Briefly:
newcpgreport and newcpgseek use the same method to find islands but produce
different output. The method is described in:
Larsen,F., Gundersen,G., Lopez,R., Prydz,H.
CpG islands as gene markers in the human genome.
(1992) Genomics 13 (4):1095-107
MedlineID: 92372002 PubMedID: 1505946
Cpgreport uses a scoring method based on sum/frequencies which
overpredicts islands but finds the smaller ones around primary exons.
Cpgseek is deprecated at the moment.
For all practical purposes I use newcpgreport. I actually use it to
produce the human cpgisland database you can find on the EBI's ftp
server as well as on the EBI's SRS server.
Hope this helps,
R:)
From David.Bauer at SCHERING.DE Wed Nov 20 14:45:24 2002
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Wed, 20 Nov 2002 15:45:24 +0100
Subject: vectorstrip
Message-ID:
Hi,
If I run vectorstrip on a file with many sequences, the output file contains
only sequences where the vector was stripped.
I would find it more useful if vectorstrip would (maybe optionally) also write
the sequences with no hit to the vector to the output file.
Or have I overlooked something?
David.
From Myrian_Grondin at UQTR.CA Wed Nov 20 16:15:20 2002
From: Myrian_Grondin at UQTR.CA (Myrian_Grondin at UQTR.CA)
Date: Wed, 20 Nov 2002 11:15:20 -0500
Subject: Install Emboss with Windows??
Message-ID: <1037808920.3ddbb518dbbac@courriel.uqtr.ca>
Hi,
We are working on PC, OS Windows 98, and we would like to know if it's possible
to install Emboss on our machine. If so, which software have we to install to
be able to run Emboss?
Thanks a lot (excuse me, my English is so poor...)
Myrian
-------------------------------------------------
Email sent via https://courriel.uqtr.ca
From stefanielager at fastmail.ca Thu Nov 21 08:19:32 2002
From: stefanielager at fastmail.ca (Stefanie Lager)
Date: Thu, 21 Nov 2002 03:19:32 -0500 (EST)
Subject: EMBOOS end EMBL entryname
Message-ID: <3DDC9714.000059.00380@ns.interchange.ca>
Hi,
Does EMBL still stick to entry names (the ID line) of "nine uppercase
alphanumeric characters"
(http://www.ebi.ac.uk/embl/Documentation/User_manual/id_line.html)? I
can't retrieve sequences from the International Protein Index (IPI)
database (11 characters in the ID entry name) in EMBL or SWISS format using
EMBOSS programs. The EMBOSS programs only accept 10 characters for the ID
in EMBL or SWISS format. Is this problem fixed in EMBOSS versions
later than 2.4.1? EMBL can have whatever policy they want, but it
would be nice if the EMBOSS programs accepted an ID of any length, also
in EMBL and SWISS format.
Stefanie
_________________________________________________________________
http://fastmail.ca/ - Fast Secure Web Email for Canadians
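A quick way to see which entries in a flat file would hit the length limit described above is to scan the ID lines. The following is only a minimal sketch in Python: the function name long_entry_names, the sample lines and the entry names in them are made up for illustration, and the default limit of 10 is taken from the message above rather than from the EMBOSS source.

```python
def long_entry_names(lines, max_len=10):
    """Return entry names from ID lines that are longer than max_len."""
    too_long = []
    for line in lines:
        # EMBL and SWISS-PROT style flat files start each entry with an
        # "ID" line; the entry name is the first token after the line code.
        if line.startswith("ID "):
            fields = line[2:].split()
            if fields:
                name = fields[0].rstrip(";")
                if len(name) > max_len:
                    too_long.append(name)
    return too_long


# Hypothetical sample lines, not copied from any real database release.
sample = [
    "ID   HYPOTHETIC1 standard; PRT; 120 AA.",  # 11-character entry name
    "ID   SHORT_ID   standard; PRT; 95 AA.",    # 8-character entry name
    "DE   Not an ID line.",
]
print(long_entry_names(sample))  # prints ['HYPOTHETIC1']
```

To check a real file, the same function can be given an open file handle instead of the sample list.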
From sharmila at ebi.ac.uk Thu Nov 21 11:12:33 2002
From: sharmila at ebi.ac.uk (Sharmila Pillai)
Date: Thu, 21 Nov 2002 11:12:33 +0000
Subject: Install Emboss with Windows??
Message-ID:
Hi,
From what I understand, there is a Cygwin-compiled version, but it's not
tested and cannot handle graphics.
You should refer to what Rodrigo Lopez wrote to the emboss list on
1/11/02 in response to the subject "Remote getz from emboss".
Though this is not the solution for your problem today, this could be the
direction for Windows users. I'll try to explain a bit of it here:
At the EBI's External Services group, I am working on a webservice for
EMBOSS using SOAP. Basically, this enables the user to use EMBOSS
applications remotely.
% seqret srsembl:J00231 -lhttp://servername:portnum/axis/services
The above command would use AXIS/SOAP to access 'servername' on 'portnum',
which in turn would retrieve data from srsembl (as defined in
emboss.default) and pass it on to the application (seqret, in this example).
The result is sent to stdout.
All the user (using any OS) needs is a client which understands/interprets a
command line as above and some libraries for Axis/SOAP. We have an
experimental service using both Java and Perl running on Axis/Tomcat.
I don't think EBI provides remote access to many EMBOSS applications today.
Hoping our experimental service survives our tests and there is enough
demand for such a service, EBI can soon start opening up webservice access
to EMBOSS.
//Sharmila.
From Georg.Beckmann at Schering.DE Thu Nov 21 08:25:39 2002
From: Georg.Beckmann at Schering.DE (Georg.Beckmann at Schering.DE)
Date: Thu, 21 Nov 2002 09:25:39 +0100
Subject: OldDistances
Message-ID:
Hi,
does anybody know if EMBOSS offers a program similar to OldDistances in GCG ?
OldDistances - which previously had still another name, that I don't remember -
calculates a matrix of pairwise similarities from a multiple alignment.
As far as I can see, there is no such program. Is somebody working on such
a program for EMBOSS?
Thanks.
Ciao,
Georg Beckmann
From newgene at bigfoot.com Thu Nov 21 14:55:24 2002
From: newgene at bigfoot.com (clwu)
Date: Thu, 21 Nov 2002 08:55:24 -0600
Subject: Install Emboss with Windows??
References: <1037808920.3ddbb518dbbac@courriel.uqtr.ca>
Message-ID: <3DDCF3DC.9020202@bigfoot.com>
I recently compiled EMBOSS successfully under cygwin/win2K (thanks to
David Starks-Browning's great help). So far, all the applications I have used
work fine (graphics output is also OK under openbox/cygwin). I think you
should install cygwin and give it a try.
good luck.
Chunlei
Myrian_Grondin at UQTR.CA wrote:
>Hi,
>We are working on PC, OS Windows 98, and we would like to know if it's possible
>to install Emboss on our machine. If so, which software have we to install to
>be able to run Emboss?
>Thanks a lot (excuse me, my English is so poor...)
>Myrian
From lukem at gene.pbi.nrc.ca Thu Nov 21 15:28:13 2002
From: lukem at gene.pbi.nrc.ca (Luke McCarthy)
Date: Thu, 21 Nov 2002 09:28:13 -0600 (CST)
Subject: Install Emboss with Windows??
In-Reply-To: <1037808920.3ddbb518dbbac@courriel.uqtr.ca>
Message-ID:
On Wed, 20 Nov 2002 Myrian_Grondin at UQTR.CA wrote:
> Hi,
> We are working on PC, OS Windows 98, and we would like to know if it's
> possible to install Emboss on our machine. If so, which software have we to
> install to be able to run Emboss?
Other posts have addressed the issue of installing EMBOSS locally on a Windows
box, but if you have an immediate pressing need to use the EMBOSS
applications, the Canadian Bioinformatics Resource offers access through a web
interface at
http://www.cbr.nrc.ca/services/emboss_e.php
or in French:
http://www.cbr.nrc.ca/services/emboss_f.php
Unfortunately, the interface itself is English only, but then so are the
EMBOSS applications (at least as far as I know...) Cheers,
Luke
From newgene at bigfoot.com Thu Nov 21 16:52:11 2002
From: newgene at bigfoot.com (clwu)
Date: Thu, 21 Nov 2002 10:52:11 -0600
Subject: mfold
Message-ID: <3DDD0F3B.8020109@bigfoot.com>
Hi, group,
Does anybody know if there is an EMBOSS equivalent of the
"mfold" program in GCG?
Thanks.
Chunlei
From stefanielager at fastmail.ca Fri Nov 22 08:33:29 2002
From: stefanielager at fastmail.ca (Stefanie Lager)
Date: Fri, 22 Nov 2002 03:33:29 -0500 (EST)
Subject: mfold
Message-ID: <3DDDEBD9.000009.03475@ns.interchange.ca>
> Hi, group,
> Does anybody know if there is a EMBOSS equivalence for
> "mfold" program in GCG?
>
> Thanks.
>
> Chunlei
No, but there is plenty of RNA structure software out there, both as
servers and for local installation.
http://www.bioinfo.rpi.edu/~zukerm/rna/node3.html#SECTION00031
From mikep at angis.org.au Sun Nov 24 22:25:41 2002
From: mikep at angis.org.au (Michael Poidinger)
Date: Mon, 25 Nov 2002 09:25:41 +1100
Subject: codon useage tables
In-Reply-To: <3DDDEBD9.000009.03475@ns.interchange.ca>
Message-ID: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au>
Is there a site somewhere which describes which organisms/data sets the
EMBOSS codon usage tables are derived from? Some are obvious from their
name, others are not.
Thanks,
Mike
------------------------------------
Dr Michael Poidinger
PhD(virology) PGDipSci (computer science)
CEO, Australian Genome Information Centre
Head, Australian National Genome Information Service
ph 61-2-93518617
mob 0413146765
fax 61-2-93518618
email head at angis.org.au
------------------------------------------
From areagp61 at yahoo.it Mon Nov 25 09:38:59 2002
From: areagp61 at yahoo.it (Graziano P.)
Date: Mon, 25 Nov 2002 10:38:59 +0100
Subject: codon useage tables
References: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au>
Message-ID: <007701c29466$8127a7f0$18105709@italy.ibm.com>
Not every file but most are described in the README file
from ftp://ftp.ebi.ac.uk/pub/databases/codonusage
Hope this helps
Graziano Pappad?
From mikep at angis.org.au Mon Nov 25 21:58:02 2002
From: mikep at angis.org.au (Michael Poidinger)
Date: Tue, 26 Nov 2002 08:58:02 +1100
Subject: codon useage tables
In-Reply-To: <007701c29466$8127a7f0$18105709@italy.ibm.com>
References: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au>
Message-ID: <5.2.0.9.2.20021126085548.02f344e8@morgan.angis.org.au>
At 10:38 AM 25/11/2002 +0100, Graziano P. wrote:
>Not every file but most are described in the README file
>from ftp://ftp.ebi.ac.uk/pub/databases/codonusage
>
>Hope this helps
Thanks, it helps with quite a few.
Do you (or anyone else) know the difference between related files?
such as
Ehum and Ehuman
Eeco, Eeco_h and Eecoli
Emus, Emussp
etc.
Thanks,
Mike
------------------------------------
Dr Michael Poidinger
PhD(virology) PGDipSci (computer science)
CEO, Australian Genome Information Centre
Head, Australian National Genome Information Service
ph 61-2-93518617
mob 0413146765
fax 61-2-93518618
email head at angis.org.au
------------------------------------------
From peter.rice at uk.lionbioscience.com Tue Nov 26 10:40:04 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 26 Nov 2002 10:40:04 +0000
Subject: codon useage tables
References: <5.2.0.9.2.20021125092344.02f2ae50@morgan.angis.org.au> <5.2.0.9.2.20021126085548.02f344e8@morgan.angis.org.au>
Message-ID: <3DE34F84.90108@uk.lionbioscience.com>
Michael Poidinger wrote:
> Do you (or anyone else) know the difference between related files?
>
> such as
> Ehum and Ehuman
> Eeco, Eeco_h and Eecoli
> Emus, Emussp
The codon usage files were set up a long time ago. It was not so easy to
find a good set of tables that were free to use. The first tables (if I
recall correctly) came from the TRANSTERM database.
Short names (Eeco) are reformatted TRANSTERM codon usage tables with an E
(EMBOSS) prefix and a .cut suffix to identify the format.
Names with _h (Eeco_h) are highly expressed genes (high Codon Adaptation
Index values).
sp endings? Help! Ysp is "Yeast S.pombe" of course. I assume the others are
for a genus (e.g. Mus sp. = Mus musculus and Mus domesticus) rather than
a single species. Emussp.cut is a reformat of TRANSTERM's mussp.cod file.
The EBI's FTP copy of TRANSTERM did not document exactly what these names
mean. The original TRANSTERM documentation also leaves you to guess at the
3-letter species codes. The TRANSTERM website seems to be only partly
available.
Longer names (Eecoli) are added from elsewhere (I need to check on their
origin) and only include a few genes (count the stop codons!) so I assume
they are old and probably obsolete.
mt endings are mitochondrial genes
cp endings are chloroplast genes
Time to review these tables I suspect!!! How about replacing them with
annotated tables from CUTG for selected species? We need to be careful
about default table names in some programs, but they are easy to update.
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
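The naming conventions described in this message can be written down as a small lookup. This is a sketch in Python, not part of EMBOSS: the function name describe_cut_name is made up, and the rule that an ending only counts when exactly three letters are left in front of it is an assumption (it keeps a three-letter code such as "ysp" in one piece). The meaning given for "sp" is the guess stated above, not a documented fact.

```python
def describe_cut_name(filename):
    """Guess what an EMBOSS codon usage file name means from its spelling."""
    if not (filename.startswith("E") and filename.endswith(".cut")):
        return "not an E*.cut codon usage file name"
    stem = filename[1:-len(".cut")]
    notes = []
    # Endings mentioned in the thread. An ending is only peeled off when a
    # three-letter code remains (assumption), so "ysp" is left untouched.
    endings = {
        "_h": "highly expressed genes",
        "sp": "possibly a whole genus rather than one species",
        "mt": "mitochondrial genes",
        "cp": "chloroplast genes",
    }
    for ending, meaning in endings.items():
        if stem.endswith(ending) and len(stem) - len(ending) == 3:
            stem = stem[:-len(ending)]
            notes.append(meaning)
            break
    if len(stem) == 3:
        origin = "reformatted TRANSTERM table"
    else:
        origin = "longer name, added from elsewhere"
    return ", ".join([origin] + notes)


# File names taken from the question earlier in this thread.
for name in ["Ehum.cut", "Ehuman.cut", "Eeco.cut", "Eeco_h.cut",
             "Eecoli.cut", "Emus.cut", "Emussp.cut"]:
    print(name, "->", describe_cut_name(name))
```

The output is only as reliable as the conventions themselves, which this message already describes as partly guesswork.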
From Joerg.Schaber at uv.es Tue Nov 26 12:41:44 2002
From: Joerg.Schaber at uv.es (Joerg Schaber)
Date: Tue, 26 Nov 2002 13:41:44 +0100
Subject: duplicate ID
Message-ID: <3DE36C08.6030603@uv.es>
Hi,
When creating an NCBI database using dbiflat, I get the following message a few times:
"Warning: Duplicate ID skipped: '' All hits will point to first ID
found".
Even though it does not seem to have severe effects, I would like to know
which duplicate IDs are meant.
I checked the genome IDs and accession numbers and they seem to be OK (all *gbk
files downloaded from NCBI); they all have entries and are not 'null'.
Any idea what the problem is?
Here is the command I use:
dbiflat -idformat gb -directory "." -filename "*.gbk" -dbname "ncbibac"
-release "1.0" -date "26/11/02" -fields acnum,des,taxon
Greetings,
joerg
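Since the warning quotes an empty ID (''), one thing that can be checked independently of dbiflat is whether any entry in the *.gbk files yields an empty or repeated name. The sketch below is in Python and rests on an assumption: that the ID is read from the first token after LOCUS. That is not confirmed anywhere in this thread. The function name check_locus_ids and the sample lines are invented for illustration.

```python
from collections import Counter


def check_locus_ids(lines):
    """Return (number of entries with no ID, sorted list of repeated IDs)."""
    ids = []
    for line in lines:
        if line.startswith("LOCUS"):
            fields = line.split()
            # fields[0] is the word LOCUS; the next token, if any, is the name.
            ids.append(fields[1] if len(fields) > 1 else "")
    counts = Counter(ids)
    empty = counts.pop("", 0)
    repeated = sorted(name for name, n in counts.items() if n > 1)
    return empty, repeated


# Hypothetical sample lines standing in for the contents of the *.gbk files.
sample = [
    "LOCUS       GENOME_A   1000 bp    DNA",
    "ORIGIN",
    "LOCUS",
    "LOCUS       GENOME_B   2000 bp    DNA",
    "LOCUS       GENOME_A   1500 bp    DNA",
]
print(check_locus_ids(sample))  # prints (1, ['GENOME_A'])
```

For real data, the lines of all files matching *.gbk can be chained together and passed in place of the sample list.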
From david.vilanova at rdls.nestle.com Tue Nov 26 13:43:40 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 14:43:40 +0100
Subject: Matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECDD@lsmail2.crn.nestrd.ch>
Dear all,
I was wondering if the matcher program accepts a sequence via stdin.
The following example doesn't work for me.
matcher -sequencea 'ATGCGA' -sequenceb 'ATCTAGATATGCGA'
>cannot open ATGCGA file for read.
Is there any way to submit a sequence via stdin?
Thanks,
David
From peter.rice at uk.lionbioscience.com Tue Nov 26 13:53:50 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 26 Nov 2002 13:53:50 +0000
Subject: Matcher
References: <89466355CEFE7244AC3A013E45641C180144ECDD@lsmail2.crn.nestrd.ch>
Message-ID: <3DE37CEE.5090903@uk.lionbioscience.com>
Vilanova,David,LAUSANNE,NRC/BS wrote:
> Dear all,
> I was wondering if matcher program accepts a sequence via stdin.
>
> the following exemple doesn't work for me.
>
> matcher -sequencea 'ATGCGA' -sequenceb 'ATCTAGATATGCGA'
>
>>cannot open ATGCGA file for read.
>
>
> Is there anyway to submit a sequence via stdin ???
You don't mean stdin (that can only read one sequence anyway) ... you mean
"can I specify a sequence on the command line?"
Yes!!!! You need the "asis" special format.
matcher -sequencea 'asis::ATGCGA' -sequenceb 'asis::ATCTAGATATGCGA'
(assuming your shell allows the command line to be long enough for your
sequences :-)
Hope this helps
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
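When the command is assembled from a script, the "asis::" form above can be built as an argument list, which also sidesteps shell quoting. The following Python sketch only constructs the arguments and does not run matcher. The helper names asis, matcher_argv and argv_bytes, and the output file name out.matcher, are made up for illustration; the qualifiers -sequencea, -sequenceb and -outfile are the ones used elsewhere in this thread.

```python
def asis(sequence):
    """Wrap a literal sequence in the 'asis::' form quoted in the thread."""
    return "asis::" + sequence


def matcher_argv(seq_a, seq_b, outfile):
    """Build the argument list for a matcher run on two literal sequences."""
    return ["matcher",
            "-sequencea", asis(seq_a),
            "-sequenceb", asis(seq_b),
            "-outfile", outfile]


def argv_bytes(argv):
    """Rough size of the argument list: each argument plus a terminator."""
    return sum(len(arg.encode()) + 1 for arg in argv)


argv = matcher_argv("ATGCGA", "ATCTAGATATGCGA", "out.matcher")
print(argv)
print(argv_bytes(argv), "bytes of arguments")
```

If EMBOSS is installed, the list could be handed to subprocess.run(argv). Comparing argv_bytes(argv) with the system's argument size limit gives an early warning for the long-sequence case mentioned above.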
From david.vilanova at rdls.nestle.com Tue Nov 26 14:12:19 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 15:12:19 +0100
Subject: Matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECE0@lsmail2.crn.nestrd.ch>
Thanks Peter,
Sorry for the mistake.
I'm writing a bioperl script which automatically runs an EMBOSS application.
I could have generated a new file for each sequence I read, but it
looks pretty nice like that.
Regards,
David
#!/usr/bin/perl -w
use strict;
use Bio::Factory::EMBOSS;
use Bio::SeqIO;

die "Usage: perl script.pl [seqfileA] [seqfileB] [outfile]\n" unless @ARGV == 3;

# Read input file names
my ($seqfileA, $seqfileB, $outfile) = @ARGV;

# Initialize the EMBOSS factory
my $EMBOSS = Bio::Factory::EMBOSS->new;

# Define the EMBOSS program to run
my $application = $EMBOSS->program('matcher');

# Read sequences from seqfileA
my $seqA = Bio::SeqIO->new(-file => $seqfileA, -format => 'fasta');

while (my $seqinA = $seqA->next_seq) {
    my $inseqA = "asis::" . $seqinA->seq;
    my $seqidA = $seqinA->id;
    print "####$seqidA\n";

    # Reopen seqfileB at every iteration over seqfileA
    my $seqB = Bio::SeqIO->new(-file => $seqfileB, -format => 'fasta');

    while (my $seqinB = $seqB->next_seq) {
        # Format like asis::ATGCGA (required for emboss)
        my $inseqB = "asis::" . $seqinB->seq;
        my $seqidB = $seqinB->id;
        print "Processing sequence $seqidA..vs..$seqidB...";

        # Define program parameters and run
        $application->run({
            -sequencea => $inseqA,
            -sequenceb => $inseqB,
            -outfile   => $outfile });
        print "done\n";

        # ... manipulate alignments ...
    }
}
From jason at cgt.mc.duke.edu Tue Nov 26 14:54:19 2002
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Tue, 26 Nov 2002 09:54:19 -0500 (EST)
Subject: Matcher
In-Reply-To: <89466355CEFE7244AC3A013E45641C180144ECE0@lsmail2.crn.nestrd.ch>
References: <89466355CEFE7244AC3A013E45641C180144ECE0@lsmail2.crn.nestrd.ch>
Message-ID:
Bioperl will also do the behind-the-scenes work of creating the tempfile
and cleaning it up for you if you just pass in a Bio::PrimarySeqI object.
It detects if you pass in an object or a string and proceeds accordingly.
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
From david.vilanova at rdls.nestle.com Tue Nov 26 15:58:32 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 16:58:32 +0100
Subject: Bioperl and matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch>
Hello,
I have problems retrieving the alignments from EMBOSS output.
The program below reads two files and runs matcher on all against all.
Matcher gives me MSF output, and then I try to parse this alignment with
Bio::AlignIO.
However, I get an exception:
Processing sequence 1..vs..3...done
------------- EXCEPTION -------------
MSG: 1 exists as an alignment line but not in the header. Not confident of
what is going on!
STACK Bio::AlignIO::msf::next_aln
/usr/local/lib/perl5/site_perl/5.8.0/Bio/AlignIO/msf.pm:106
STACK toplevel Run_Emboss.pl:50
--------------------------------------
Here is the output from matcher:
!!NA_MULTIPLE_ALIGNMENT 1.0
out MSF: 5 Type: N 26/11/02 CompCheck: 2090 ..
Name: EMBOSS_001 Len: 5 Check: 1045 Weight: 1.00
Name: EMBOSS_002 Len: 5 Check: 1045 Weight: 1.00
//
1 5
EMBOSS_001 CGGCG
EMBOSS_002 CGGCG
###########################################################
It doesn't work for fasta format either in my script (see output below):
Processing sequence 1..vs..3...done
Use of uninitialized value in sprintf at
/usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 257,
line 4.
Use of uninitialized value in hash element at
/usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268,
line 4.
Use of uninitialized value in hash element at
/usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268,
line 4.
Use of uninitialized value in hash element at
/usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 270,
line 4.
#########################
#Script
#!/usr/bin/perl -w
use strict;
use Bio::Factory::EMBOSS;
use Bio::SeqIO;
use Bio::AlignIO;

die "Usage: perl script.pl [seqfileA] [seqfileB] [outfile]\n" unless @ARGV == 3;

# Read input file names
my ($seqfileA, $seqfileB, $outfile) = @ARGV;

# Initialize the EMBOSS factory
my $EMBOSS = Bio::Factory::EMBOSS->new;

# Define the EMBOSS program to run
my $application = $EMBOSS->program('matcher');

# Read sequences from seqfileA
my $seqA = Bio::SeqIO->new(-file => $seqfileA, -format => 'fasta');

while (my $seqinA = $seqA->next_seq) {
    my $inseqA = "asis::" . $seqinA->seq;
    my $seqidA = $seqinA->id;
    print "####$seqidA\n";

    # Reopen seqfileB at every iteration over seqfileA
    my $seqB = Bio::SeqIO->new(-file => $seqfileB, -format => 'fasta');

    while (my $seqinB = $seqB->next_seq) {
        # Format like asis::ATGCGA (required for emboss)
        my $inseqB = "asis::" . $seqinB->seq;
        my $seqidB = $seqinB->id;
        print "Processing sequence $seqidA..vs..$seqidB...";

        # Define program parameters and run
        $application->run({
            -sequencea => $inseqA,
            -sequenceb => $inseqB,
            -aformat   => 'msf',
            -outfile   => $outfile });
        print "done\n";

        # Parse the alignment that matcher just wrote
        my $alnin = Bio::AlignIO->new(-format => 'msf', -file => $outfile);
        while (my $aln = $alnin->next_aln) {
            print $aln->no_residues, "\n";
            #print $aln->consensus_string,"\n";
        }
    }
}
From jason at cgt.mc.duke.edu Tue Nov 26 16:05:22 2002
From: jason at cgt.mc.duke.edu (Jason Stajich)
Date: Tue, 26 Nov 2002 11:05:22 -0500 (EST)
Subject: Bioperl and matcher
In-Reply-To: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch>
References: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch>
Message-ID:
Our msf parser is seeing something it isn't expecting - not sure why -
what happens when you just use the straight 'emboss' parser with standard
emboss alignment output which is the route that has been most heavily
tested?
-jason
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
From peter.rice at uk.lionbioscience.com Tue Nov 26 16:12:46 2002
From: peter.rice at uk.lionbioscience.com (Peter Rice)
Date: Tue, 26 Nov 2002 16:12:46 +0000
Subject: Bioperl and matcher
References: <89466355CEFE7244AC3A013E45641C180144ECE7@lsmail2.crn.nestrd.ch>
Message-ID: <3DE39D7E.9080403@uk.lionbioscience.com>
Vilanova,David,LAUSANNE,NRC/BS wrote:
>
> Hello,
> I have problems retrieving the alignments from an emboss output.
> The program belows read 2 files and runs a matcher of all against all.
> Matcher gives me an msf output and then I try to parse this alignment with
> Bio::AlignIO.
> However I get an exception...
>
> Processing sequence 1..vs..3...done
>
> ------------- EXCEPTION -------------
> MSG: 1 exists as an alignment line but not in the header. Not confident of
> what is going on!
BioPerl seems to be having trouble with the EMBOSS MSF format output. It
could be something about the naming of the sequences?
EMBOSS is making up names for your sequences. I assume you are using
asis::CGGCG to pass them to matcher. You can put -sid after each sequence
to give them names, for example:
matcher -out x.x -af msf asis::ccggc -sid cg asis::cgggc -sid gg
(-sid, like -aformat, is an associated qualifier. It must follow the asis::
sequence because it is positional (putting it first on the command line for
example would refer to all sequences - fine for -sformat but not a good
idea for -sid :-)
Hope this helps
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
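One way to narrow down the "exists as an alignment line but not in the header" message is to compare the names declared in the MSF header with the first token of every line after the "//" separator. The Python sketch below is a deliberately simplified stand-in, not BioPerl's parser: the function name unlisted_names is made up, and the idea that the coordinate line ("1 5") is what gets read as a sequence name is a guess based on the output posted in this thread.

```python
def unlisted_names(msf_text):
    """Return first tokens of body lines that have no Name: header entry."""
    header_names = []
    body_names = []
    in_body = False
    for line in msf_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not in_body:
            if stripped == "//":
                in_body = True          # everything after this is alignment
            elif stripped.startswith("Name:"):
                header_names.append(stripped.split()[1])
            continue
        body_names.append(stripped.split()[0])
    return [name for name in body_names if name not in header_names]


# The matcher output posted earlier in this thread (whitespace as archived).
MATCHER_MSF = """\
!!NA_MULTIPLE_ALIGNMENT 1.0
out MSF: 5 Type: N 26/11/02 CompCheck: 2090 ..
Name: EMBOSS_001 Len: 5 Check: 1045 Weight: 1.00
Name: EMBOSS_002 Len: 5 Check: 1045 Weight: 1.00
//
1 5
EMBOSS_001 CGGCG
EMBOSS_002 CGGCG
"""
print(unlisted_names(MATCHER_MSF))  # prints ['1']
```

If that reading is right, renaming the sequences with -sid would not by itself change the result, because the unmatched token comes from the coordinate line rather than from a sequence name.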
From david.vilanova at rdls.nestle.com Tue Nov 26 16:14:19 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 17:14:19 +0100
Subject: Bioperl and matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECE8@lsmail2.crn.nestrd.ch>
OK, I use:
$alnin = new Bio::AlignIO(-format => 'emboss',
                          -file   => $outfile );
while ($aln = $alnin->next_aln){
    print $aln->no_residues,"\n";
}
I don't specify any format to EMBOSS, so I get the standard alignment.
In this case it doesn't work: it never enters this loop, but the program
doesn't crash. It does all the alignments and stores them in the outfile, but
doesn't seem to read them. Bizarre?
David
-----Original Message-----
From: Jason Stajich [mailto:jason at cgt.mc.duke.edu]
Sent: mardi, 26. novembre 2002 17:05
To: Vilanova,David,LAUSANNE,NRC/BS
Cc: 'bioperl-l at bioperl.org'; 'emboss at embnet.org'
Subject: Re: Bioperl and matcher
Our msf parser is seeing something it isn't expecting - not sure why -
what happens when you just use the straight 'emboss' parser with standard
emboss alignment output which is the route that has been most heavily
tested?
-jason
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
On Tue, 26 Nov 2002, Vilanova,David,LAUSANNE,NRC/BS wrote:
>
> Hello,
> I have problems retrieving the alignments from an emboss output.
> The program belows read 2 files and runs a matcher of all against all.
> Matcher gives me an msf output and then I try to parse this alignment with
> Bio::AlignIO.
> However I get an exception...
>
> Processing sequence 1..vs..3...done
>
> ------------- EXCEPTION -------------
> MSG: 1 exists as an alignment line but not in the header. Not confident of
> what is going on!
> STACK Bio::AlignIO::msf::next_aln
> /usr/local/lib/perl5/site_perl/5.8.0/Bio/AlignIO/msf.pm:106
> STACK toplevel Run_Emboss.pl:50
>
> --------------------------------------
>
> Here is the output from matcher:
> !!NA_MULTIPLE_ALIGNMENT 1.0
>
> out MSF: 5 Type: N 26/11/02 CompCheck: 2090 ..
>
> Name: EMBOSS_001 Len: 5 Check: 1045 Weight: 1.00
> Name: EMBOSS_002 Len: 5 Check: 1045 Weight: 1.00
>
> //
>
> 1 5
> EMBOSS_001 CGGCG
> EMBOSS_002 CGGCG
>
>
> ###########################################################
> It doesn't work for fasta format as well in my script (see output below):
> Processing sequence 1..vs..3...done
> Use of uninitialized value in sprintf at
> /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 257,
> line 4.
> Use of uninitialized value in hash element at
> /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268,
> line 4.
> Use of uninitialized value in hash element at
> /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 268,
> line 4.
> Use of uninitialized value in hash element at
> /usr/local/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm line 270,
> line 4.
>
> #########################
>
>
> #Script
> #! /usr/bin/perl -w
>
> use Bio::Factory::EMBOSS;
> use Bio::SeqIO;
> use Bio::AlignIO;
>
> die "Usage: perl script.pl [seqfileA] [seqfileB] [outfile]\n" unless @ARGV
> eq '3';
>
> #Read input files
> ($seqfileA,$seqfileB,$outfile) = @ARGV;
>
> #Initialize Object
> $EMBOSS = new Bio::Factory::EMBOSS;
>
> #Define emboss program to run
> $application = $EMBOSS->program('matcher');
>
> #Manipulate SeqfileA file
> $seqA = new Bio::SeqIO (-file => $seqfileA,
> -format => 'fasta');
>
>
> while ($seqinA = $seqA->next_seq){
> $inseqA = "asis::".$seqinA->seq;
> $seqidA = $seqinA->id;
>
>
> print "####$seqidA\n";
> #Initialize seqB at every iteration of SeqA
> $seqB = new Bio::SeqIO (-file => $seqfileB,
> -format => 'fasta');
>
> while ($seqinB = $seqB->next_seq){
> $inseqB = "asis::".$seqinB->seq; #Format like asis::ATGCGA (required for emboss)
> $seqidB = $seqinB->id;
>
> print "Processing sequence $seqidA..vs..$seqidB...";
>
> #Define program parameters and run...
> $application->run({
> -sequencea => $inseqA,
> -sequenceb => $inseqB,
> -aformat => 'msf',
> -outfile => $outfile });
> print "done\n";
>
> $alnin = new Bio::AlignIO(-format => 'msf',
> -file => $outfile );
>
> while ($aln = $alnin->next_aln){
> print $aln->no_residues,"\n";
> #print $aln->consensus_string,"\n";
>
> }
> }
> }
>
From david.vilanova at rdls.nestle.com Tue Nov 26 16:33:56 2002
From: david.vilanova at rdls.nestle.com (Vilanova,David,LAUSANNE,NRC/BS)
Date: Tue, 26 Nov 2002 17:33:56 +0100
Subject: Bioperl and matcher
Message-ID: <89466355CEFE7244AC3A013E45641C180144ECEC@lsmail2.crn.nestrd.ch>
I tried that but it still doesn't fix the problem...
-----Original Message-----
From: Peter Rice [mailto:peter.rice at uk.lionbioscience.com]
Sent: mardi, 26. novembre 2002 17:13
To: Vilanova,David,LAUSANNE,NRC/BS
Cc: 'bioperl-l at bioperl.org'; 'emboss at embnet.org'
Subject: Re: Bioperl and matcher
Vilanova,David,LAUSANNE,NRC/BS wrote:
>
> Hello,
> I have problems retrieving the alignments from an emboss output.
> The program below reads 2 files and runs matcher on all against all.
> Matcher gives me an msf output and then I try to parse this alignment with
> Bio::AlignIO.
> However I get an exception...
>
> Processing sequence 1..vs..3...done
>
> ------------- EXCEPTION -------------
> MSG: 1 exists as an alignment line but not in the header. Not confident of
> what is going on!
BioPerl seems to be having trouble with the EMBOSS MSF format output. It
could be something about the naming of the sequences?
EMBOSS is making up names for your sequences. I assume you are using
asis::CGGCG to pass them to matcher. You can put -sid after each sequence
to give them names, for example:
matcher -out x.x -af msf asis::ccggc -sid cg asis::cgggc -sid gg
-sid, like -aformat, is an associated qualifier. It must follow the asis::
sequence because it is positional. Putting it first on the command line, for
example, would refer to all sequences: fine for -sformat but not a good
idea for -sid :-)
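As a rough sketch only, the command line above could be wrapped in a small shell function so that each -sid always lands directly after the sequence it names. The function name run_matcher and the DRY_RUN switch are hypothetical helpers, not part of EMBOSS; -outfile and -aformat are the long forms used in the original script, and matcher is assumed to be on the PATH when the command is actually run.

```shell
#!/bin/sh
# Sketch: name two literal sequences for matcher by placing -sid
# immediately after the asis:: sequence it applies to.
# run_matcher and DRY_RUN are hypothetical, for illustration only.

run_matcher() {
    # Arguments: outfile, sequence A, name A, sequence B, name B
    out=$1 seq_a=$2 id_a=$3 seq_b=$4 id_b=$5

    # Rebuild the argument list in the positional order matcher expects.
    set -- matcher -outfile "$out" -aformat msf \
        "asis::$seq_a" -sid "$id_a" \
        "asis::$seq_b" -sid "$id_b"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"      # only print the command that would be run
    else
        "$@"           # run matcher (requires an EMBOSS installation)
    fi
}

# Print the command instead of running it.
DRY_RUN=1
run_matcher x.x ccggc cg cgggc gg
```

Unsetting DRY_RUN makes the same call execute matcher for real, which is one way to check by hand whether the names given with -sid show up in the MSF header.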
Hope this helps
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723
From vz_silvana at verizon-uweb.com Wed Nov 27 20:30:03 2002
From: vz_silvana at verizon-uweb.com (Silvana Paredes)
Date: Wed, 27 Nov 2002 15:30:03 -0500
Subject: Inquire about login jemboss
Message-ID: <200211272030.PAA18916@www22.ureach.com>
To whom it may concern:
I downloaded the Jemboss software, but when I try to use it, it
asks me for a login and a password. I can't find a way to set up
an account or to use EMBOSS without logging in.
I would appreciate it if you could give me instructions on how to
start using it or how to create an account.
Thank you so much,
Best regards,
Silvana Paredes