From biopython at maubp.freeserve.co.uk  Mon Jul  6 06:12:23 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Jul 2009 11:12:23 +0100
Subject: [EMBOSS] Many-to-many with needle and water
Message-ID: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>

Hi Peter R. et al,

I gather EMBOSS is looking for feedback for new applications (given
the recent funding from the BBSRC - congratulations again). How about
suggestions for extensions to existing EMBOSS applications?

I've used bits of EMBOSS for several years now (thank you!). Something
I have sometimes wanted to do is a many-to-many pairwise sequence
alignment with the EMBOSS tools needle and water.

Right now, needle and water take two files (here referred to as A and
B), file A has just one sequence, and file B can have one or more
sequences. I'd like to be able to supply two files both with multiple
entries, and have needle/water do pairwise alignments between all the
sequences in A against all the sequences in B. This might be useful
for finding reciprocal best hits in comparative genomics (as a slower
but exact alternative to FASTA or BLAST).
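
As a rough illustration of the reciprocal best hit idea, here is a minimal
Python sketch. It assumes the pairwise scores have already been collected
into a dictionary keyed by (A identifier, B identifier); parsing needle or
water output is not shown, and the function name and the example scores are
made up for illustration. Ties are resolved in favour of the first pair seen.

```python
def reciprocal_best_hits(scores):
    """Return sorted (a_id, b_id) pairs that are each other's best hit.

    `scores` maps (a_id, b_id) -> alignment score (hypothetical input,
    e.g. collected from many pairwise alignments).
    """
    best_for_a = {}  # a_id -> (score, b_id)
    best_for_b = {}  # b_id -> (score, a_id)
    for (a_id, b_id), score in scores.items():
        if a_id not in best_for_a or score > best_for_a[a_id][0]:
            best_for_a[a_id] = (score, b_id)
        if b_id not in best_for_b or score > best_for_b[b_id][0]:
            best_for_b[b_id] = (score, a_id)
    # Keep only pairs where the best hit relationship holds both ways.
    return sorted(
        (a_id, b_id)
        for a_id, (_, b_id) in best_for_a.items()
        if best_for_b[b_id][1] == a_id
    )


# Made-up scores, purely for illustration.
example_scores = {
    ("A1", "B1"): 120.0, ("A1", "B2"): 35.5,
    ("A2", "B1"): 90.0, ("A2", "B2"): 40.0,
}
print(reciprocal_best_hits(example_scores))
```

Here A2's best hit is B1, but B1's best hit is A1, so only the A1/B1 pair
is reported.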

From an implementation point of view, I might imagine doing sequence
A1 against all of B, then sequence A2 against all of B, etc. This
would require looping over file B many times (easy if on disk). This
would also work if the A input was stdin, but having the B input on
stdin would require caching the data if A has more than one sequence
:(

It may sometimes also be useful to have an all-against-all pairwise
comparison for a single set of sequences. The above suggested
enhancement would let you do this by comparing file A to file A.
However, here you only really need to do half the possible
combinations (as aligning sequence A1 to sequence A2 should be the
same as A2 to A1). This could be useful for implementing a basic
clustering algorithm, or maybe as part of a worked example in building
a simple NJ tree?
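
To make the "half the combinations" point concrete, a small Python sketch:
for n sequences there are n*(n-1)/2 unordered pairs, compared with n*n
ordered pairs for a naive A-against-A run. The identifiers are invented,
and the sketch relies on the assumption stated above that A1 against A2
gives the same result as A2 against A1.

```python
from itertools import combinations


def unordered_pairs(ids):
    """Return an iterator over each unordered pair of distinct ids, once."""
    return combinations(ids, 2)


ids = ["A1", "A2", "A3", "A4"]  # hypothetical sequence identifiers
pairs = list(unordered_pairs(ids))
print(len(pairs), "alignments instead of", len(ids) ** 2)
```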

So, does supporting many-to-many comparisons sound like a useful
enhancement to needle and water?

I should stress this isn't something I need right now. Also, it can be
worked around with a wrapper script to call needle/water once for each
sequence in file A (against all the sequences in file B), with the
added bonus that these one-to-many comparisons can then be
shared across multiple CPU cores.
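
A minimal sketch of such a wrapper in Python, for illustration only. It
assumes FASTA input, that needle is on the PATH, and that passing the gap
penalties explicitly (10.0 and 0.5 here, intended to match needle's usual
defaults) avoids interactive prompts. The function names, the query_N file
naming and the worker count are all made up.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fasta(text):
    """Split FASTA text into a list of single-record strings."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return records


def needle_command(a_path, b_path, out_path):
    """Build one needle command line (one A sequence against all of B)."""
    return ["needle", "-asequence", str(a_path), "-bsequence", str(b_path),
            "-gapopen", "10.0", "-gapextend", "0.5", "-outfile", str(out_path)]


def run_all(a_fasta, b_fasta, work_dir, workers=4):
    """Run one needle job per record in a_fasta against all of b_fasta."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for i, record in enumerate(split_fasta(Path(a_fasta).read_text())):
        query = work_dir / f"query_{i}.fasta"
        query.write_text(record)
        commands.append(
            needle_command(query, b_fasta, work_dir / f"query_{i}.needle"))
    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
```

Something like run_all("A.fasta", "B.fasta", "needle_out") would then leave
one output file per sequence in A.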

Regards,

Peter C.

From pmr at ebi.ac.uk  Mon Jul  6 06:35:12 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 06 Jul 2009 11:35:12 +0100
Subject: [EMBOSS] Many-to-many with needle and water
In-Reply-To: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>
References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>
Message-ID: <4A51D360.1050107@ebi.ac.uk>

Peter Cock or biopython wrote:
> Hi Peter R. et al,
> 
> I gather EMBOSS is looking for feedback for new applications (given
> the recent funding from the BBSRC - congratulations again). How about
> suggestions for extensions to existing EMBOSS applications?
> 
> I've used bits of EMBOSS for several years now (thank you!). Something
> I have sometimes wanted to do is a many-to-many pairwise sequence
> alignment with the EMBOSS tools needle and water.
> 
> Right now, needle and water take two files (here referred to as A and
> B), file A has just one sequence, and file B can have one or more
> sequences. I'd like to be able to supply two files both with multiple
> entries, and have needle/water do pairwise alignments between all the
> sequences in A against all the sequences in B. This might be useful
> for finding reciprocal best hits in comparative genomics (as an slower
> but exact alternative to FASTA or BLAST).

The application is easy to add (after the release)

The usual problem with all-against-all is that it involves loading one
of the inputs as a sequence set entirely in memory - to avoid reading
one input many times over.

We have an application supermatcher which does this - the first sequence
is streamed through, the second is a sequence set loaded into memory. It
uses word matching to find seed alignments then runs a limited alignment
around the hits.

superwater would be a possible name (or superneedle).

How popular would such a program be?

How large would the smaller input set be?

regards,

Peter


From biopython at maubp.freeserve.co.uk  Mon Jul  6 06:58:06 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Jul 2009 11:58:06 +0100
Subject: [EMBOSS] Many-to-many with needle and water
In-Reply-To: <4A51D360.1050107@ebi.ac.uk>
References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>
	<4A51D360.1050107@ebi.ac.uk>
Message-ID: <320fb6e00907060358v71934109o1d8acb0aef8d3e83@mail.gmail.com>

On Mon, Jul 6, 2009 at 11:35 AM, Peter Rice <pmr at ebi.ac.uk> wrote:
>
> Peter C wrote:
> > Hi Peter R. et al,
> >
> > I gather EMBOSS is looking for feedback for new applications (given
> > the recent funding from the BBSRC - congratulations again). How about
> > suggestions for extensions to existing EMBOSS applications?
> >
> > I've used bits of EMBOSS for several years now (thank you!). Something
> > I have sometimes wanted to do is a many-to-many pairwise sequence
> > alignment with the EMBOSS tools needle and water.
> >
> > Right now, needle and water take two files (here referred to as A and
> > B), file A has just one sequence, and file B can have one or more
> > sequences. I'd like to be able to supply two files both with multiple
> > entries, and have needle/water do pairwise alignments between all the
> > sequences in A against all the sequences in B. This might be useful
> > for finding reciprocal best hits in comparative genomics (as an slower
> > but exact alternative to FASTA or BLAST).
>
> The application is easy to add (after the release)
>
> The usual problem with all-against-all is that it involves loading one
> of the inputs as a sequence set entirely in memory - to avoid reading
> one input many times over.

Right - and it would be difficult to decide if in memory vs reading the
file many times is best in general without some specific use cases.

[I suppose you could do something a bit more cunning like start by
caching the sequences as you read them, ready for re-use, but if the
number of sequences crosses a threshold, stop caching and switch
to re-reading the file for subsequent loops?]
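
A rough Python sketch of that hybrid idea, not a proposal for how EMBOSS
should do it. It assumes the B input is a regular FASTA file that can be
reopened (so it does not cover stdin), and the class name, the max_cached
threshold and the tiny demonstration file are all invented.

```python
import os
import tempfile


def read_records(path):
    """Yield (identifier, sequence) pairs from a FASTA file on disk."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


class ReusableRecords:
    """Iterate over a FASTA file repeatedly, caching only small inputs."""

    def __init__(self, path, max_cached=1000):
        self.path = path
        self.max_cached = max_cached
        self._cache = None      # set after one complete pass, if small enough
        self._too_big = False   # set once the threshold has been crossed

    def __iter__(self):
        if self._cache is not None:
            yield from self._cache          # later passes come from memory
            return
        caching, seen = not self._too_big, []
        for record in read_records(self.path):
            if caching:
                seen.append(record)
                if len(seen) > self.max_cached:
                    # Too many sequences: drop the cache and re-read next time.
                    caching, seen, self._too_big = False, [], True
            yield record
        if caching:
            self._cache = seen


# Small demonstration with a temporary three-record file.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">b1\nACGT\n>b2\nGGCC\nTT\n>b3\nAAAA\n")
small = ReusableRecords(tmp.name, max_cached=10)
large = ReusableRecords(tmp.name, max_cached=2)
first_pass, second_pass = list(small), list(small)   # second pass is cached
streamed = (list(large), list(large))                 # re-read from disk twice
os.remove(tmp.name)
```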

> We have an application supermatcher which does this - the first sequence
> is streamed through, the second is a sequence set loaded into memory. It
> uses work matching to find seed alignments then runs a limited alignment
> around the hits.
>
> superwater would be a possible name (or superneedle).

If you see many-to-many versions of water and needle as separate
applications, then those names sound fine.

> How popular would such a program be?

I don't know - as I said, this is more of a suggestion than a request.
I don't *need* this tool, but there have been occasions in the past
where I would have tried using it if it had existed.

Perhaps others on the list can think of better uses for this tool idea?

> How large would the smaller input set be?

Hard to say without specific examples in mind. For some hand waving
upper limits, for comparative genomics of bacteria using protein
sequences, you might have a few thousand in each file. If I was trying
this as part of an ad-hoc clustering algorithm (all-against-all), again
maybe a few thousand sequences. In practice, a heuristic tool like
supermatcher (or FASTA or BLAST) would probably be more sensible
for large datasets like this due to the computational time.

I see needle and water as most useful on smaller datasets where
the runtime cost of using an exact algorithm isn't too high. Therefore
many-to-many needle/water searches may be best targeted at
smaller sequence files. Things might be different with a multicore
or GPU/OpenCL version of needle and water ;)

Anyway, unless someone else thinks a many-to-many version
of needle and water would be useful, I wouldn't expect you to
implement this. I'm just putting the idea forward for discussion.

Regards,

Peter C.

From pmr at ebi.ac.uk  Mon Jul  6 07:19:30 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 06 Jul 2009 12:19:30 +0100
Subject: [EMBOSS] Many-to-many with needle and water
In-Reply-To: <320fb6e00907060358v71934109o1d8acb0aef8d3e83@mail.gmail.com>
References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>	
	<4A51D360.1050107@ebi.ac.uk>
	<320fb6e00907060358v71934109o1d8acb0aef8d3e83@mail.gmail.com>
Message-ID: <4A51DDC2.9030801@ebi.ac.uk>

Peter C wrote:
> [I suppose you could do something a bit more cunning like start by
> caching the sequences as you read them read for re-use, but if the
> number of sequences crosses a threshold, stop caching and switch
> to re-reading the file for subsequence loops?]

Tricky. Rereading is not always possible - for example streamed standard
input as the data source.

> Perhaps others on the list can think of a better uses for this tool idea?

Let's see what response we get. One never knows until the question is
asked :-)

>> How large would the smaller input set be?
> 
> Hard to say without specific examples in mind. For some hand waving
> upper limits, for comparative genomics of bacteria using protein
> sequences, you might have a few thousand in each file. If I was trying
> this as part of an ad-hoc clustering algorithm (all-against-all), again
> maybe a few thousand sequences. In practice, a heuristic tool like
> supermatcher (or FASTA or BLAST) would probably be more sensible
> for large datasets like this due to the computational time.
> 
> I see needle and water as most useful on smaller datasets where
> the runtime cost of using an exact algorithm isn't too high. Therefore
> many-to-many needle/water searches may be best targeted at
> smaller sequence files. Things might be different with a multicore
> or GPU/OpenCL version of needle and water ;)

Multicore would be a possibility - at least on systems configured for
it. We are looking into picking up methods from the BioManyCores project.

> Anyway, unless someone else thinks a many-to-many version
> of needle and water would be useful, I wouldn't expect you to
> implement this. I'm just putting the idea forward for discussion.

Implementing is easy - we could simply send you the code to install
locally if nobody else needs it :-)

After all, it is only a minor modification to the existing applications.

regards,

Peter

From biopython at maubp.freeserve.co.uk  Mon Jul  6 07:56:06 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Jul 2009 12:56:06 +0100
Subject: [EMBOSS] Probabilistic versions of needle/water?
Message-ID: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com>

Hi all,

I have another suggestion for new or enhanced EMBOSS applications,
again related to the existing pairwise sequence alignment tools needle
and water.

The FASTQ file format (or others) contains quality scores (often PHRED
scores) representing the probability of an error in the associated
nucleotide. Solexa/Illumina machines also provide another file with a
more precise breakdown of the likelihood of each of the four bases.
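
For reference, a PHRED score Q corresponds to an error probability of
10^(-Q/10). The short Python sketch below shows the conversion. It assumes
Sanger-style FASTQ encoding (ASCII offset 33); older Solexa/Illumina files
use a different offset and, for true Solexa scores, a different formula,
neither of which is handled here. The function names are made up.

```python
def phred_to_error_probability(quality):
    """Convert a PHRED quality score Q into an error probability 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def decode_sanger_qualities(quality_string, offset=33):
    """Decode a FASTQ quality line, assuming Sanger-style ASCII offset 33."""
    return [ord(char) - offset for char in quality_string]


# Made-up quality string: 'I' is Q40, '5' is Q20, '+' is Q10 and '!' is Q0.
qualities = decode_sanger_qualities("I5+!")
probabilities = [phred_to_error_probability(q) for q in qualities]
print(list(zip(qualities, probabilities)))
```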

In some cases both sequences could have probability scores (e.g.
trying to align the ends of contigs to each other), but often one
sequence will be taken as fact (e.g. mapping reads onto a reference).

It is possible to take these probabilities into account when
considering the matches in needle (or water) by using a probabilistic
version of the Needleman-Wunsch sequence alignment algorithm (or a
probabilistic Smith-Waterman).
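
One simple way to picture this (not necessarily what GNUMAP or any other
tool does) is to replace the fixed match/mismatch score in the dynamic
programming recurrence with its expected value under the base call
probabilities. The Python sketch below assumes the reference base is taken
as fact, that a miscalled base is equally likely to be any of the other
three, and uses illustrative +5/-4 scores; the function name is made up.

```python
def expected_score(read_base, error_prob, ref_base, match=5.0, mismatch=-4.0):
    """Expected substitution score for one read base against a reference base.

    Assumes the called base is correct with probability 1 - error_prob and
    that an error is equally likely to be any of the other three bases.
    """
    if read_base == ref_base:
        p_same = 1.0 - error_prob
    else:
        p_same = error_prob / 3.0
    # p_same is the probability that the true read base equals ref_base.
    return p_same * match + (1.0 - p_same) * mismatch


confident_match = expected_score("A", 0.0, "A")       # behaves like plain match
confident_mismatch = expected_score("A", 0.0, "C")    # behaves like plain mismatch
uninformative = expected_score("A", 0.75, "A")        # call carries no information
```

With an error probability of 0.75 the call is no better than a random
guess, so the score is the same whether or not the called base agrees with
the reference.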

As an example of this idea, did you (Peter R) see the GNUMAP
talk/poster at ISMB 2009? See http://dna.cs.byu.edu/gnumap/

I am aware of people using EMBOSS tools (I assume water) to identify
(known) adaptor sequences in raw Solexa/Illumina data. I considered
doing something similar myself when trying to remove primer sequences
from 454 data. Such a pipeline using the current EMBOSS water would be
doing this matching at a purely fixed nucleotide level (ignoring the
qualities), which isn't ideal. Upgrading to a probabilistic version of
water should be an improvement.

Peter C.


From pmr at ebi.ac.uk  Mon Jul  6 08:32:18 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 06 Jul 2009 13:32:18 +0100
Subject: [EMBOSS] Probabilistic versions of needle/water?
In-Reply-To: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com>
References: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com>
Message-ID: <4A51EED2.3080901@ebi.ac.uk>

Peter C. wrote:
> I have another suggestion for new or enhanced EMBOSS applications,
> again related to the existing pairwise sequence alignment tools needle
> and water.
> 
> The FASTQ file format (or others) contains quality scores (often PHRED
> scores) representing the probability of an error in the associated
> nucleotide. Solexa/Illumina machines also provide another file with a
> more precise breakdown of the likelihood of each of the four bases.
> 
> In some cases both sequences could have probability scores (e.g.
> trying to align the ends of contigs to each other), but often one
> sequence will be taken as fact (e.g. mapping reads onto a reference).
> 
> It is possible to take these probabilities into account when
> considering the matches in needle (or water) by using a probabilistic
> version of the Needleman-Wunsch sequence alignment algorithm (or a
> probabilistic Smith-Waterman).
> 
> As an example of this idea, did you (Peter R) see the GNUMAP
> talk/poster at ISMB 2009? See http://dna.cs.byu.edu/gnumap/

I saw the talk, and was wondering about their algorithm. They did not
have a separate treatment for gaps in the reads and the consensus, which
seemed like an obvious extension.

> I am aware of people using EMBOSS tools (I assume water) to identify
> (known) adaptor sequences in raw Solexa/Illumina data. I considered
> doing something similar myself when trying to remove primer sequences
> from 454 data. Such a pipeline using the current EMBOSS water would be
> doing this matching at a purely fixed nucleotide level (ignoring the
> qualities), which isn't ideal. Upgrading to a probabilistic version of
> water should be an improvement.

Would be interesting.

Where can I look up adaptor calling methods?

Peter Rice

From biopython at maubp.freeserve.co.uk  Mon Jul  6 10:25:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Jul 2009 15:25:47 +0100
Subject: [EMBOSS] Probabilistic versions of needle/water?
In-Reply-To: <4A51EED2.3080901@ebi.ac.uk>
References: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com>
	<4A51EED2.3080901@ebi.ac.uk>
Message-ID: <320fb6e00907060725g380ce420vdf2f34c290518a16@mail.gmail.com>

On Mon, Jul 6, 2009 at 1:32 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
>> I am aware of people using EMBOSS tools (I assume water) to identify
>> (known) adaptor sequences in raw Solexa/Illumina data. I considered
>> doing something similar myself when trying to remove primer sequences
>> from 454 data. Such a pipeline using the current EMBOSS water would be
>> doing this matching at a purely fixed nucleotide level (ignoring the
>> qualities), which isn't ideal. Upgrading to a probabilistic version of
>> water should be an improvement.
>
> Would be interesting.
>
> Where can I look up adaptor calling methods?

The particular example I had in mind was the thread with Giles Weaver
on the BioPerl mailing list, which I see you have just replied to:

http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030398.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-July/030404.html

I think I made a typo earlier (needle versus water). If you are
comparing a short but complete adaptor sequence to a read
(which you expect may contain the full adaptor) doing a global
alignment is more sensible than a local one. On re-reading,
Giles did actually say he was using needle:
http://lists.open-bio.org/pipermail/bioperl-l/2009-July/030411.html

Peter

From michael.watson at bbsrc.ac.uk  Mon Jul  6 11:45:06 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Mon, 6 Jul 2009 16:45:06 +0100
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>

Hello

 
I did this on a test server a few weeks ago and wrote a script and it
worked.

 
Now I come to the production server - well, I get some errors.  Same OS,
same tar balls.  Very frustrating.  Can anyone help?

 
I have PHYLIPNEW-3.67 unpacked in EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67.
In that directory, I run configure and then make

 
I get a variety of errors, but this time it's:

 
Making all in src

make[1]: Entering directory
`/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src'

gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
-DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\"
-DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
-DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1
-DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1
-DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1
-DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1
-DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1
-DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1
-DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus
-I../../../plplot  -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64
-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  -DLENDIAN -DNO_AUTH  -O2
-D__amd64__ -MT clique.o -MD -MP -MF .deps/clique.Tpo -c -o clique.o
clique.c

mv -f .deps/clique.Tpo .deps/clique.Po

gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
-DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\"
-DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
-DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1
-DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1
-DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1
-DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1
-DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1
-DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1
-DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus
-I../../../plplot  -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64
-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  -DLENDIAN -DNO_AUTH  -O2
-D__amd64__ -MT disc.o -MD -MP -MF .deps/disc.Tpo -c -o disc.o disc.c

mv -f .deps/disc.Tpo .deps/disc.Po

gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
-DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\"
-DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
-DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1
-DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1
-DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1
-DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1
-DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1
-DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1
-DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus
-I../../../plplot  -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64
-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  -DLENDIAN -DNO_AUTH  -O2
-D__amd64__ -MT phylip.o -MD -MP -MF .deps/phylip.Tpo -c -o phylip.o
phylip.c

mv -f .deps/phylip.Tpo .deps/phylip.Po

/bin/sh ../libtool --tag=CC   --mode=link gcc  -O2 -D__amd64__   -o
fclique clique.o disc.o phylip.o ../../../nucleus/libnucleus.la
../../../ajax/libajax.la ../../../plplot/libeplplot.la -L/usr/lib64
-lX11  -lXaw -lXt -lm  -lgd -lpng -lz -lm

mkdir .libs

gcc -O2 -D__amd64__ -o .libs/fclique clique.o disc.o phylip.o
../../../nucleus/.libs/libnucleus.so ../../../ajax/.libs/libajax.so
../../../plplot/.libs/libeplplot.so -L/usr/lib64 -lX11 -lXaw -lXt -lgd
-lpng -lz -lm  -Wl,--rpath -Wl,/usr/local/lib

/usr/bin/ld: cannot find -lXaw

collect2: ld returned 1 exit status

make[1]: *** [fclique] Error 1

make[1]: Leaving directory
`/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src'

make: *** [all-recursive] Error 1

 
Any help would be appreciated.

 
Thanks

Mick

 
Head of Bioinformatics
Institute for Animal Health
Compton
Berks
RG20 7NN
01635 578411 

 
Please consider the environment and don't print this e-mail unless you
really need to.

The information contained in this message may be confidential or legally
privileged and is intended solely for the addressee. If you have
received this message in error please delete it & notify the originator
immediately.  Unauthorised use, disclosure, copying or alteration of
this message is forbidden & may be unlawful.  The contents of this
e-mail are the views of the sender and do not necessarily represent the
views of the Institute.   This email, and associated attachments, has
been checked locally for viruses but we can accept no responsibility
once it has left our systems.  Communications on Institute computers are
monitored to secure the effective operation of the systems and for other
lawful purposes.

 
The Institute for Animal Health is a company limited by guarantee,
registered in England no. 559784.  

The Institute is also a registered charity, Charity Commissioners
Reference No. 228824

 
From jasonzhang2002 at gmail.com  Mon Jul  6 12:35:21 2009
From: jasonzhang2002 at gmail.com (jason zhang)
Date: Mon, 06 Jul 2009 09:35:21 -0700
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <4A5227C9.7040004@gmail.com>

Hi, Michael
On my Fedora server I resolved this by running:
yum install libXaw libXaw-compat libXaw-devel

-jason


From steffen_moeller at gmx.de  Mon Jul  6 12:28:47 2009
From: steffen_moeller at gmx.de (Steffen Moeller)
Date: Mon, 06 Jul 2009 18:28:47 +0200
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <4A52263F.1050806@gmx.de>

Hi Michael,

michael watson (IAH-C) wrote:

> I did this on a test server a few weeks ago and wrote a script and it
> worked.
> 
> Now I come to the production server - well, I get some errors.  Same OS,
> same tar balls.  Very frustrating.  Can anyone help?
...
> 
> gcc -O2 -D__amd64__ -o .libs/fclique clique.o disc.o phylip.o
> ../../../nucleus/.libs/libnucleus.so ../../../ajax/.libs/libajax.so
> ../../../plplot/.libs/libeplplot.so -L/usr/lib64 -lX11 -lXaw -lXt -lgd
> -lpng -lz -lm  -Wl,--rpath -Wl,/usr/local/lib
> 
> /usr/bin/ld: cannot find -lXaw
> 
> collect2: ld returned 1 exit status

Your production machine is probably not prepared for graphical user interfaces :)

Debian has it in the package libxaw7.

Many greetings

Steffen

$ dpkg -S /usr/lib/libXaw.so.7
libxaw7: /usr/lib/libXaw.so.7

From michael.watson at bbsrc.ac.uk  Tue Jul  7 04:18:39 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 7 Jul 2009 09:18:39 +0100
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
In-Reply-To: <4A5227C9.7040004@gmail.com>
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
	<4A5227C9.7040004@gmail.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508FD0DD3@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi Guys

Thanks for all the replies.

One last question: do I need the X11 development libraries in order for
fdrawgram and fdrawtree to produce postscript files?

Thanks
Michael

>
> make[1]: *** [fclique] Error 1
>
> make[1]: Leaving directory
> `/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src'
>
> make: *** [all-recursive] Error 1
>
>  
>
> Any help would be appreciated.
>
>  
>
> Thanks
>
> Mick
>
>  
>
> Head of Bioinformatics
> Institute for Animal Health
> Compton
> Berks
> RG20 7NN
> 01635 578411 
>
>  
>
> Please consider the environment and don't print this e-mail unless you
> really need to.
>
> The information contained in this message may be confidential or legally
> privileged and is intended solely for the addressee. If you have
> received this message in error please delete it & notify the originator
> immediately.  Unauthorised use, disclosure, copying or alteration of
> this message is forbidden & may be unlawful.  The contents of this
> e-mail are the views of the sender and do not necessarily represent the
> views of the Institute.   This email, and associated attachments, has
> been checked locally for viruses but we can accept no responsibility
> once it has left our systems.  Communications on Institute computers are
> monitored to secure the effective operation of the systems and for other
> lawful purposes.
>
>  
>
> The Institute for Animal Health is a company limited by guarantee,
> registered in England no. 559784.  
>
> The Institute is also a registered charity, Charity Commissioners
> Reference No. 228824
>
>  
>
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>
>   


From ellis at biochem.dental.upenn.edu  Tue Jul  7 15:34:27 2009
From: ellis at biochem.dental.upenn.edu (Ellis Golub)
Date: Tue, 7 Jul 2009 15:34:27 -0400
Subject: [EMBOSS] Structure
Message-ID: <1090707153427.ZM647422@biochem.dental.upenn.edu>

Hello --

I've downloaded the EMBASSY STRUCTURE-0.1.0 package and tried to install it. I
have previously installed EMBOSS-6.0.1.  Using the same flags with configure
that I used for the original EMBOSS installation, all seems to go well until I
try to run one of the programs.  The following output is obtained:

> pdbparse
1274873:pdbparse: rld: Error: unresolvable symbol in
/usr/local/emboss/lib/libplplot.so.6: gdImageColorAllocate
1274873:pdbparse: rld: Error: unresolvable symbol in
/usr/local/emboss/lib/libplplot.so.6: gdImageLine
1274873:pdbparse: rld: Fatal Error: this executable has unresolvable symbols

The /usr/local/emboss/lib/libplplot.so.6 file is the one produced with the
EMBOSS install, and all the EMBOSS files work fine, and the graphics outputs are
fine.

Anyone have a clue?

Ellis

--
=============================================================
Ellis Golub                  Phone: (215) 898-4629
Biochemistry Department      FAX:   (215) 898-3695
School of Dental Medicine    ellis at biochem.dental.upenn.edu
University of Pennsylvania
240 South 40th Street
Philadelphia, PA 19104-6030
Home: http://biochem.dental.upenn.edu/~ellis/
PGP: http://biochem.dental.upenn.edu/~ellis/public_key.html
==============================================================

From lueck at ipk-gatersleben.de  Wed Jul  8 05:59:39 2009
From: lueck at ipk-gatersleben.de (lueck at ipk-gatersleben.de)
Date: Wed,  8 Jul 2009 11:59:39 +0200
Subject: [EMBOSS] Many-to-many with needle and water
In-Reply-To: <4A51D360.1050107@ebi.ac.uk>
References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>
	<4A51D360.1050107@ebi.ac.uk>
Message-ID: <20090708115939.8spikcm8zxi8cscw@webmail.ipk-gatersleben.de>

Hi!

I would be interested in "superneedle" if the speed is acceptable. I'm 
still looking for a fast and good enough short sequence finder/aligner 
(see 
http://www.mailinglistarchive.com/biopython at biopython.org/msg00941.html).

At the moment I'm still using BLAST because I have no time to switch 
but later I would like to change to another algorithm.

Kind regards
Stefanie

Zitat von Peter Rice <pmr at ebi.ac.uk>:

> Peter Cock or biopython wrote:
>> Hi Peter R. et al,
>>
>> I gather EMBOSS is looking for feedback for new applications (given
>> the recent funding from the BBSRC - congratulations again). How about
>> suggestions for extensions to existing EMBOSS applications?
>>
>> I've used bits of EMBOSS for several years now (thank you!). Something
>> I have sometimes wanted to do is a many-to-many pairwise sequence
>> alignment with the EMBOSS tools needle and water.
>>
>> Right now, needle and water take two files (here referred to as A and
>> B), file A has just one sequence, and file B can have one or more
>> sequences. I'd like to be able to supply two files both with multiple
>> entries, and have needle/water do pairwise alignments between all the
>> sequences in A against all the sequences in B. This might be useful
>> for finding reciprocal best hits in comparative genomics (as a slower
>> but exact alternative to FASTA or BLAST).
>
> The application is easy to add (after the release)
>
> The usual problem with all-against-all is that it involves loading one
> of the inputs as a sequence set entirely in memory - to avoid reading
> one input many times over.
>
> We have an application supermatcher which does this - the first sequence
> is streamed through, the second is a sequence set loaded into memory. It
> uses word matching to find seed alignments then runs a limited alignment
> around the hits.
>
> superwater would be a possible name (or superneedle).
>
> How popular would such a program be?
>
> How large would the smaller input set be?
>
> regards,
>
> Peter
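
The trade-off described in the quoted reply can be sketched in a few lines
of Python: one input is streamed record by record while the other is held in
memory as a list, so neither input is read more than once. This is only an
illustration. The names read_fasta, all_against_all and count_matches are
invented here and are not part of EMBOSS, and count_matches is a trivial
stand-in for a real needle or water alignment score.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_matches(seq_a, seq_b):
    """Placeholder score: identical positions, no gaps (NOT a real alignment)."""
    return sum(x == y for x, y in zip(seq_a, seq_b))


def all_against_all(stream_a, lines_b, score):
    """Stream input A once; keep input B in memory as the sequence set."""
    set_b = list(read_fasta(lines_b))            # loaded entirely in memory
    for name_a, seq_a in read_fasta(stream_a):   # streamed, never stored
        for name_b, seq_b in set_b:
            yield name_a, name_b, score(seq_a, seq_b)
```

Memory use then grows with the size of the in-memory set only, which is why
the size of the smaller input matters.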
>


From pmr at ebi.ac.uk  Wed Jul  8 07:04:04 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 08 Jul 2009 12:04:04 +0100
Subject: [EMBOSS] Many-to-many with needle and water
In-Reply-To: <20090708115939.8spikcm8zxi8cscw@webmail.ipk-gatersleben.de>
References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>	<4A51D360.1050107@ebi.ac.uk>
	<20090708115939.8spikcm8zxi8cscw@webmail.ipk-gatersleben.de>
Message-ID: <4A547D24.6070703@ebi.ac.uk>

Hi Stefanie,

lueck at ipk-gatersleben.de wrote:
> Hi!
> 
> I would be interested in "superneedle" if the speed is acceptable. I'm
> still looking for a fast and good enough short sequence finder/aligner
> (see
> http://www.mailinglistarchive.com/biopython at biopython.org/msg00941.html).
> 
> At the moment I'm still using BLAST because I have no time to switch but
> later I would like to change to another algorithm.

From the biopython message it seems you mainly want to find exact 21-mers.

Can you give an example of a query and a database so I can test the
speed of current EMBOSS applications like wordfinder?

regards,

Peter Rice

From biopython at maubp.freeserve.co.uk  Wed Jul  8 17:50:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 8 Jul 2009 22:50:19 +0100
Subject: [EMBOSS] transeq and ambiguous codons
Message-ID: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>

Hi all,

Something I mentioned to Peter Rice in passing at BOSC/ISMB 2009 was
I'd found an oddity in transeq with certain ambiguous codons while
testing Biopython's translations. Here is a specific example (but I
suspect there are more). For reference, I am expecting EMBOSS transeq
to be using the NCBI tables:
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi

First consider the following example, the codon TAN, which can be TAA,
TAC, TAG or TAT which translate to stop or Y. Therefore the
translation of TAN should be "* or Y", and EMBOSS transeq opts for
"X". Which is fine:

$ transeq asis:TAATACTAGTATTAN -stdout -auto
>asis_1
*Y*YX

Similarly for the codon TNN, again EMBOSS transeq opts for "X" because
this could be a stop codon, or W, or F, or L, or S, or Y or C! Again,
this is fine:

$ transeq asis:TNN -stdout -auto
>asis_1
X

However, consider the codon TRR. R means A or G, so this can mean TAA,
TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
standard table agree here). Therefore the translation of TRR should be
"* or W", which I would expect based on the above examples to result
in "X". But instead EMBOSS transeq gives "*":

$ transeq asis:TAATGATAGTGGTRRTNN -stdout -auto
>asis_1
***W*X

I think this is a bug.
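
For comparison, here is a minimal Python sketch of the strict rule argued
for above: expand each IUPAC ambiguity code, translate every concrete codon
with the NCBI standard table (table 1), and report a residue only when all
expansions agree, otherwise 'X'. It illustrates the expected behaviour and
is not transeq's actual code. The names IUPAC, STANDARD_TABLE,
translate_codon_strict and translate_strict are made up for the example,
and it handles DNA letters only.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# NCBI standard genetic code (table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon_strict(codon):
    """Return the residue if every expansion of the codon agrees, else 'X'."""
    expansions = product(*(IUPAC[base] for base in codon.upper()))
    residues = {STANDARD_TABLE["".join(c)] for c in expansions}
    return residues.pop() if len(residues) == 1 else "X"


def translate_strict(seq):
    """Translate complete codons only; a trailing partial codon is ignored."""
    return "".join(
        translate_codon_strict(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3)
    )
```

Under this rule translate_strict("TAATGATAGTGGTRRTNN") gives ***WXX rather
than the ***W*X printed by transeq, while TRA still comes out as '*' because
both TAA and TGA are stops.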

However, I am aware that the machine I tried this on is rather old,
and I don't actually know which version of EMBOSS it is. How can I
find out? As far as I know, there is no "-version" or "-v" or
"--version" switch, and the "-help" information doesn't include this
important piece of information. Nor is this in the FAQ:
http://emboss.sourceforge.net/docs/faq.html

So that makes two questions - how should transeq translate "TRR", and
how do I check the version of EMBOSS?

Thanks,

Peter C.

From SMarkel at accelrys.com  Wed Jul  8 19:53:05 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Wed, 8 Jul 2009 19:53:05 -0400
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
Message-ID: <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>

Peter,

Answer to question #2: run the program embossversion.

> embossversion
Writes the current EMBOSS version number to a file
6.0.1
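
If the version is needed inside a script, something along these lines may
work. It is only a sketch: it assumes embossversion is on the PATH, that it
accepts the global -auto qualifier, and that the version appears as a dotted
number on a line of its own, as in the output above. The function names
parse_emboss_version and emboss_version are made up here.

```python
import re
import shutil
import subprocess


def parse_emboss_version(text):
    """Return the first line that looks like a dotted version, or None."""
    for line in text.splitlines():
        candidate = line.strip()
        if re.fullmatch(r"\d+(\.\d+)+", candidate):
            return candidate
    return None


def emboss_version():
    """Run embossversion if it is installed; return None otherwise."""
    exe = shutil.which("embossversion")
    if exe is None:
        return None
    # -auto is assumed to suppress prompts, as in the transeq examples.
    result = subprocess.run([exe, "-auto"], capture_output=True, text=True)
    # Search both streams, since the banner and the number may be split.
    return parse_emboss_version(result.stdout + "\n" + result.stderr)
```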

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics




From pmr at ebi.ac.uk  Thu Jul  9 03:38:34 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 09 Jul 2009 08:38:34 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
Message-ID: <4A559E7A.4020807@ebi.ac.uk>

Peter C. wrote:
> Hi all,
> 
> Something I mentioned to Peter Rice in passing at BOSC/ISMB 2009 was
> I'd found an oddity in transeq with certain ambiguous codons while
> testing Biopython's translations. Here is a specific example (but I
> suspect there are more). For reference, I am expecting EMBOSS transeq
> to be using the NCBI tables:
> http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi

Yes, those are our source.

> However, consider the codon TRR. R means A or G, so this can mean TAA,
> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
> standard table agree here). Therefore the translation of TRR should be
> "* or W", which I would expect based on the above examples to result
> in "X". But instead EMBOSS transeq gives "*":
> 
> $ transeq asis:TAATGATAGTGGTRRTNN -stdout -auto
> >asis_1
> ***W*X

Hmmm... something odd there. TRA translates as 'X'.

I will take a look, but we are bundling the 6.1.0 release so (unless I am 
very quick) any resolution will be a patch after the release.

> So that makes two questions - how should transeq translate "TRR", and
> how do I check the version of EMBOSS?

Scott got there first (different time zone I assume :-) with the suggestion 
of embossversion.

Meanwhile .... are there any translation utilities we could usefully 
include in a future version?

I am already contemplating one to report the translations of ambiguous 
codons, just for information in tracking and debugging translation tables - 
and explaining transeq and other applications to new users.

regards,

Peter Rice

From biopython at maubp.freeserve.co.uk  Thu Jul  9 04:59:52 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Jul 2009 09:59:52 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
	<1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>
Message-ID: <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>

On Thu, Jul 9, 2009 at 12:53 AM, Scott Markel<SMarkel at accelrys.com> wrote:
>
> Peter,
>
> Answer to question #2: run the program embossversion.
>
>> embossversion
> Writes the current EMBOSS version number to a file
> 6.0.1
>
> Scott

Thanks Scott (& Thomas) for pointing out the embossversion program.

I would still question why the EMBOSS tools don't also support the
Unix convention of a version switch. Hypothetically, aren't some
(many?) of the tools standalone and couldn't they be installed
individually (e.g. as part of someone else's software bundle)? i.e.
Can EMBOSS really guarantee that the needle tool and the
embossversion tool are in sync?

Peter

From pmr at ebi.ac.uk  Thu Jul  9 05:08:37 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 09 Jul 2009 10:08:37 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
Message-ID: <4A55B395.4090301@ebi.ac.uk>

Peter C. wrote:
> However, consider the codon TRR. R means A or G, so this can mean TAA,
> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
> standard table agree here). Therefore the translation of TRR should be
> "* or W", which I would expect based on the above examples to result
> in "X". But instead EMBOSS transeq gives "*":

This is a side effect of the way backtranslation works.

EMBOSS calculates the "most ambiguous codon" for each amino acid and
stop, and uses this for back translation. Thus a '*' in a protein
sequence would be rendered as 'TRR' by backtranseq. To provide
consistent translation of the backtranseq results, TRR is assumed to be
a backtranslated stop. Similarly, MGN is 'R' because it could reasonably
result from a backtranslation of 'R'
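
To illustrate the rule described above, here is a minimal Python sketch
(one reading of the description, not the EMBOSS source) that builds the
"most ambiguous codon" for a residue by taking, at each codon position, the
IUPAC code covering every base seen among that residue's codons in the NCBI
standard table. The names below are invented for the example.

```python
# NCBI standard genetic code (table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# IUPAC code for each set of bases.
CODE_FOR = {
    frozenset(bases): code
    for code, bases in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
        "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    }.items()
}


def most_ambiguous_codon(residue):
    """Collapse all codons for one residue (or '*') into a single IUPAC codon."""
    codons = [c for c, aa in zip(CODONS, AMINO) if aa == residue]
    if not codons:
        raise ValueError("no codon for %r in the standard table" % residue)
    return "".join(
        CODE_FOR[frozenset(c[pos] for c in codons)] for pos in range(3)
    )
```

This gives TRR for '*' and MGN for 'R', matching the examples above. It also
shows where the ambiguity comes from: TRR additionally covers TGG (W), and
MGN additionally covers AGT and AGC (S), so a strict forward translation of
either codon would be 'X'.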

I agree that it would also be reasonable to be strict about translation
in transeq and render TRR as 'X'

It depends on your philosophy of where the ambiguity codes came from -
from backtranslation, or the curious mind of a bioinformatician :-)

So .... it's not a bug, it's a feature ... which means I can relax for
now and contemplate some extras in the next release.

In future, we will at least make sure TRA and other 'unambiguous
ambiguous codons' are understood as '*' etc. I would prefer to leave TRR
as it is by default, with an option for rendering it as 'X', or to provide
an alternative to transeq with the strict translation rules enforced.

regards,

Peter Rice


From pmr at ebi.ac.uk  Thu Jul  9 05:16:40 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 09 Jul 2009 10:16:40 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>	<1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>
	<320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>
Message-ID: <4A55B578.1060808@ebi.ac.uk>

Peter C. wrote:

> Thanks Scott (& Thomas) for pointing out the embossversion program.
> 
> I would still question why the EMBOSS tools don't also support the
> Unix convention of a version switch. Hypothetically, aren't some
> (many?) of the tools standalone and couldn't they be installed
> individually (e.g. as part of someone else's software bundle)? i.e.
> Can EMBOSS really guarantee that the needle tool and the
> embossversion tool are in sync?

We could easily add a -version global qualifier ... for the next release.

We can guarantee that embossversion and needle are in sync - assuming
they are built using the same libraries as that is where the version is
recorded. Standalone builds are an issue though, and a version switch
would help debugging in a few cases.

regards,

Peter Rice


From biopython at maubp.freeserve.co.uk  Thu Jul  9 05:21:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Jul 2009 10:21:45 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <4A55B578.1060808@ebi.ac.uk>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
	<1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>
	<320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>
	<4A55B578.1060808@ebi.ac.uk>
Message-ID: <320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com>

On Thu, Jul 9, 2009 at 10:16 AM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>
>> Thanks Scott (& Thomas) for pointing out the embossversion program.
>>
>> I would still question why the EMBOSS tools don't also support the
>> Unix convention of a version switch. Hypothetically, aren't some
>> (many?) of the tools standalone and couldn't they be installed
>> individually (e.g. as part of someone else's software bundle)? i.e.
>> Can EMBOSS really guarantee that the needle tool and the
>> embossversion tool are in sync?
>
> We could easily add a -version global qualifier ... for the next release.
>
> We can guarantee that embossversion and needle are in sync - assuming
> they are built using the same libraries as that is where the version is
> recorded. Standalone build are an issue though and it would help debug
> in a few cases.

That sounds good to me :)

Peter C.

From biopython at maubp.freeserve.co.uk  Thu Jul  9 05:28:20 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Jul 2009 10:28:20 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
Message-ID: <320fb6e00907090228u7fd0df2by1e2ca18fc0a5c43@mail.gmail.com>

On Wed, Jul 8, 2009 at 10:50 PM, Peter<biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> Something I mentioned to Peter Rice in passing at BOSC/ISMB 2009 was
> I'd found an oddity in transeq with certain ambiguous codons while
> testing Biopython's translations. Here is a specific example (but I
> suspect there are more). For reference, I am expecting EMBOSS transeq
> to be using the NCBI tables:
> http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
>
> First consider the following example, the codon TAN, which can be TAA,
> TAC, TAG or TAT which translate to stop or Y. Therefore the
> translation of TAN should be "* or Y", and EMBOSS transeq opts for
> "X". Which is fine:

Using raw output instead of the default FASTA works better in emails:

$ transeq asis:TAATACTAGTATTAN -stdout -auto -osformat raw
*Y*YX

> Similarly for the codon TNN, again EMBOSS transeq opts for "X" because
> this could be a stop codon, or W, or F, or L, or S, or Y or C! Again,
> this is fine:

Again, using raw output works better in emails:

$ transeq asis:TNN -stdout -auto -osformat raw
X

> However, consider the codon TRR. R means A or G, so this can mean TAA,
> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
> standard table agree here). Therefore the translation of TRR should be
> "* or W", which I would expect based on the above examples to result
> in "X". But instead EMBOSS transeq gives "*":

Again, using raw output works better in emails:

$ transeq asis:TAATGATAGTGGTRR -stdout -auto -osformat raw
***W*

> I think this is a bug.
>
> However, I am aware that the machine I tried this on is rather old,
> and I don't actually know which version of EMBOSS it is.

I can check the old machine later, but I just retested on a Mac using
EMBOSS 6.0.1 (the current release), and see the same behaviour.

Peter C.

From michael.watson at bbsrc.ac.uk  Thu Jul  9 07:33:59 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 9 Jul 2009 12:33:59 +0100
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
	<4A5227C9.7040004@gmail.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>

Hmmm.  My sys admin says....

I can find:
libXaw-devel.x86_64 : X.Org X11 libXaw development package
libXaw.x86_64 : X.Org X11 libXaw runtime library
libXaw.i386 : X.Org X11 libXaw runtime library
libXaw-devel.i386 : X.Org X11 libXaw development package

Have I got compatibility issues here?
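
One way to see what the linker will find is to look in each library
directory for the unversioned libXaw.so (or libXaw.a), which is what -lXaw
needs at link time. On Red Hat style systems that file normally comes from
the -devel package, while the runtime package ships only the versioned
libXaw.so.N. A rough Python sketch follows; the directory list and the
function name link_status are assumptions for illustration.

```python
import glob
import os


def link_status(name="Xaw", lib_dirs=("/usr/lib64", "/usr/lib")):
    """For each directory, list lib<name>.* files and say whether -l<name> can link."""
    report = {}
    for lib_dir in lib_dirs:
        pattern = os.path.join(lib_dir, "lib%s.*" % name)
        found = sorted(os.path.basename(p) for p in glob.glob(pattern))
        # ld resolves -lXaw to libXaw.so or libXaw.a; libXaw.so.7 alone is not enough.
        linkable = ("lib%s.so" % name) in found or ("lib%s.a" % name) in found
        report[lib_dir] = {"files": found, "linkable": linkable}
    return report
```

The link line in the build log from the original report uses -L/usr/lib64,
so on a 64-bit install it is the /usr/lib64 entry that matters.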


-----Original Message-----
From: jason zhang [mailto:jasonzhang2002 at gmail.com]
Sent: Mon 06/07/2009 5:35 PM
To: michael watson (IAH-C)
Cc: emboss at lists.open-bio.org
Subject: Re: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
 
Hi, Michael
I resolved this on my Fedora server with: yum install libXaw libXaw-compat
libXaw-devel

-jason



From ajb at ebi.ac.uk  Thu Jul  9 08:08:14 2009
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Thu, 9 Jul 2009 13:08:14 +0100 (BST)
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
	<4A5227C9.7040004@gmail.com>
	<8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk>

> Hmmm.  My sys admin says....
>
> I can find:
> libXaw-devel.x86_64 : X.Org X11 libXaw development package
> libXaw.x86_64 : X.Org X11 libXaw runtime library
> libXaw.i386 : X.Org X11 libXaw runtime library
> libXaw-devel.i386 : X.Org X11 libXaw development package
>
> Have I got compatibility issues here?

What operating system and version are you running?
We're at the bundling stage here for the next release and there
might be an opportunity for us to check.

Alan


From michael.watson at bbsrc.ac.uk  Thu Jul  9 08:22:10 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 9 Jul 2009 13:22:10 +0100
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
	<4A5227C9.7040004@gmail.com>
	<8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>
	<35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B292CE@iahce2ksrv1.iah.bbsrc.ac.uk>

I think it's Red Hat Enterprise Linux 5 64 bit version....



From A.J.Pemberton at bham.ac.uk  Thu Jul  9 09:12:01 2009
From: A.J.Pemberton at bham.ac.uk (Anthony Pemberton)
Date: Thu, 9 Jul 2009 14:12:01 +0100
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
In-Reply-To: <35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk><4A5227C9.7040004@gmail.com><8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>
	<35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk>
Message-ID: <D39A1B623AED9D458065D38B8EA52780698BD7@BIOEX2.adf.bham.ac.uk>

Guys,

The system in question is probably a 64-bit system. Usually (though not
always) there is no compatibility issue with the packages you list
below; they are simply the 32-bit and 64-bit versions of the libraries. On
RHEL 5 they will install in /usr/lib and /usr/lib64 respectively. Under
RHEL 5 just run:

# up2date -i libXaw libXaw-devel

(# represents root prompt)

I am fairly sure a good install of the packages requires these
libraries. You will need a Red Hat subscription for up2date to work (or
you can install from your distro CD/DVD).

Regards,

Tony Pemberton


> -----Original Message-----
> From: emboss-bounces at lists.open-bio.org [mailto:emboss-
> bounces at lists.open-bio.org] On Behalf Of ajb at ebi.ac.uk
> Sent: 09 July 2009 13:08
> To: michael watson (IAH-C)
> Cc: emboss at emboss.open-bio.org
> Subject: Re: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
> 
> > Hmmm.  My sys admin says....
> >
> > I can find:
> > libXaw-devel.x86_64 : X.Org X11 libXaw development package
> > libXaw.x86_64 : X.Org X11 libXaw runtime library
> > libXaw.i386 : X.Org X11 libXaw runtime library
> > libXaw-devel.i386 : X.Org X11 libXaw development package
> >
> > Have I got compatibility issues here?
> 
> What operating system and version are you running?
> We're at the bundling stage here for the next release and there
> might be an opportunity for us to check.
> 
> Alan


From biopython at maubp.freeserve.co.uk  Fri Jul 10 05:14:42 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 10 Jul 2009 10:14:42 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <4A55B395.4090301@ebi.ac.uk>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
	<4A55B395.4090301@ebi.ac.uk>
Message-ID: <320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com>

On Thu, Jul 9, 2009 at 10:08 AM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>> However, consider the codon TRR. R means A or G, so this can mean TAA,
>> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
>> standard table agree here). Therefore the translation of TRR should be
>> "* or W", which I would expect based on the above examples to result
>> in "X". But instead EMBOSS transeq gives "*":
>
> This is a side effect of the way backtranslation works...

OK, leaving TRR aside for the moment (I'm not sure I'd have done it that
way, but I think I follow your logic), I have some more problem cases for
you to consider (all using the default standard NCBI table 1).

Most of these are 'unambiguous ambiguous codons' as you put it, and
I would agree using X when a more specific letter is possible isn't ideal
but isn't actually wrong. The "ATS" and related codons (see below)
however are simply wrong.

--------------------------------------------------------------------------------------

TRA means TAA or TGA, which are both stop codons. Therefore TRA
should translate as a stop, not as an X:

$ transeq asis:TAATGATRA -stdout -auto -osformat raw
**X

--------------------------------------------------------------------------------------

Now look at YTA, which means CTA or TTA which encode L, so
YTA should be L not X:

$ transeq asis:CTATTAYTA -stdout -auto -osformat raw
LLX

Likewise for YTG and YTR, and YTN.

--------------------------------------------------------------------------------------

Another example, ATW means ATA or ATT, which both translate as I,
so ATW should translate as I not X:

$ transeq asis:ATAATTATW -stdout -auto -osformat raw
IIX

--------------------------------------------------------------------------------------

Conversely, ATS means ATC or ATG, which translate as I and M
respectively (remember S means G or C). Therefore ATS should translate
as X, and not I:

$ transeq asis:ATCATGATS -stdout -auto -osformat raw
IMI

Likewise H means A, G or C, so ATH shows the same bug, as do some
other AT* codons:

$ transeq asis:ATAATCATGATH -stdout -auto -osformat raw
IIMI

[*** This one strikes me as a clear bug ***]

--------------------------------------------------------------------------------------

Now for another debatable one, RAT means AAT or GAT which code
for N and D. So, you could use B (Asx) here rather than the broader X.

$ transeq asis:AATGATRAT -stdout -auto -osformat raw
NDX

Again, the same thing for others like RAC -> X not B, and RAY -> X not B.

Similarly, you don't use J to mean leucine (L) or isoleucine (I), and
opt for X (again, this is justifiable), e.g. WTA:

$ transeq asis:ATATTAWTA -stdout -auto -osformat raw
ILX

--------------------------------------------------------------------------------------

This list is only partial, and only for the standard table.
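
The rule argued for above (give a residue only when every expansion of
an ambiguous codon agrees, otherwise X) can be sketched in Python. This
is only an illustration, not how transeq or ajtranslate.c works; the
function names are made up, and it covers only the standard NCBI table 1.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# NCBI standard table 1, codons in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_codon(codon):
    """Return the residue shared by every expansion of the codon, else 'X'."""
    expansions = product(*(IUPAC[base] for base in codon.upper()))
    residues = {STANDARD["".join(c)] for c in expansions}
    return residues.pop() if len(residues) == 1 else "X"


def translate(seq):
    """Translate frame 1 of seq, ignoring any trailing partial codon."""
    return "".join(
        translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3)
    )


print(translate("TAATGATRA"))  # TRA expands to two stop codons
print(translate("ATCATGATS"))  # ATS expands to I and M
```

With this rule TRA gives *, YTA gives L and ATW gives I, while ATS, RAT,
WTA and TRR all give X. Note that in the IUPAC nucleotide code H stands
for A, C or T (A, C or G is V), so this sketch gives I for ATH and X
for ATV.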

Peter C.

From pmr at ebi.ac.uk  Fri Jul 10 05:30:52 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 10 Jul 2009 10:30:52 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>	
	<4A55B395.4090301@ebi.ac.uk>
	<320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com>
Message-ID: <4A570A4C.2000500@ebi.ac.uk>

Peter C. wrote:
> OK, leaving TRR aside for the moment (I'm not sure I'd have done it that
> way, but I think I follow your logic), I have some more problem cases for
> you to consider (all using the default standard NCBI table 1).
> 
> Most of these are 'unambiguous ambiguous codons' as you put it, and
> I would agree using X when a more specific letter is possible isn't ideal
> but isn't actually wrong. The "ATS" and related codons (see below)
> however are simply wrong.

They do look wrong. The "X when it could pick a residue" ones I knew of.

The others need a closer look. The plan is to work through all possible 
codons and all the NCBI genetic codes as soon as the release is out.

It should be a simple patch to ajtranslate.c when I'm done.

> --------------------------------------------------------------------------------------
> 
> Now for another debatable one, RAT means AAT or GAT which code
> for N and D. So, you could use B (Asx) here rather than the broader X.
> 
> Similarly, you don't use J to mean leucine (L) or isoleucine (I), and
> opt for X (again, this is justifiable), e.g. WTA

Hmmm ... B and Z are ambiguity codes for amino acid analysers, where all the 
amide bonds are broken, and that includes N->D and Q->E. We used to have one 
of those in the lab. Similarly, J is for mass spec, where I and L have the 
same molecular weight. I don't consider them appropriate for translation.

So I plan to go for unique amino acids where possible with the ambiguity codes.

What do our users think?

regards,

Peter

From biopython at maubp.freeserve.co.uk  Fri Jul 10 19:10:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 11 Jul 2009 00:10:19 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <4A570A4C.2000500@ebi.ac.uk>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
	<4A55B395.4090301@ebi.ac.uk>
	<320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com>
	<4A570A4C.2000500@ebi.ac.uk>
Message-ID: <320fb6e00907101610o63140f2cp7f0aebd1cc7022d1@mail.gmail.com>

On Fri, Jul 10, 2009 at 10:30 AM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>>
>> OK, leaving TRR aside for the moment (I'm not sure I'd have done it that
>> way, but I think I follow your logic), I have some more problem cases for
>> you to consider (all using the default standard NCBI table 1).
>>
>> Most of these are 'unambiguous ambiguous codons' as you put it, and
>> I would agree using X when a more specific letter is possible isn't ideal
>> but isn't actually wrong. The "ATS" and related codons (see below)
>> however are simply wrong.
>
> They do look wrong. The "X when it could pick a residue" ones I knew of.
>
> The others need a closer look. The plan is to work through all possible
> codons and all the NCBI genetic codes as soon as the release is out.
>
> It should be a simple patch to ajtranslate.c when I'm done.
>

OK - I appreciate this is too last minute for the imminent EMBOSS release.

>> --------------------------------------------------------------------------------------
>>
>> Now for another debatable one, RAT means AAT or GAT which code
>> for N and D. So, you could use B (Asx) here rather than the broader X.
>>
>> Similarly, you don't use J to mean leucine (L) or isoleucine (I), and
>> opt for X (again, this is justifiable), e.g. WTA
>
> Hmmm ... B and Z are ambiguity codes for amino acid analyser where all the
> amide bonds are broken and that includes N->D and Q->E. We used to have one
> of those in the lab. Similarly, J is for mass spec where I and L have the
> same molecular weight. I don't consider them appropriate for translation.

Well, as I said, this is debatable. On the one hand B and Z are IUPAC standards
(although J isn't yet), but amino acids don't have the full ambiguous alphabet
that we have for nucleotides so some might find such a translation surprising.
http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html

> So I plan to go for unique amino acids where possible with the ambiguity
> codes.

Good :)

Peter C.

From aradwen at gmail.com  Sat Jul 11 06:29:40 2009
From: aradwen at gmail.com (Radwen ANIBA)
Date: Sat, 11 Jul 2009 12:29:40 +0200
Subject: [EMBOSS] Problem with protein caracters
Message-ID: <e591b1bd0907110329x7dfdefd1md08c5e99aeb8df2e@mail.gmail.com>

Hi everyone

I'm trying to use some programs that come with the EMBOSS package to analyze
some protein sequences, but I sometimes get this message:

Error: ajSeqTypeCheckIn: Sequence must be protein sequence without BZ U X or
*: found bad character 'X'

Is there any way to force the program to accept these types of residues?

Thank you

radwen

From pmr at ebi.ac.uk  Sat Jul 11 06:54:21 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Sat, 11 Jul 2009 11:54:21 +0100
Subject: [EMBOSS] Problem with protein caracters
In-Reply-To: <e591b1bd0907110329x7dfdefd1md08c5e99aeb8df2e@mail.gmail.com>
References: <e591b1bd0907110329x7dfdefd1md08c5e99aeb8df2e@mail.gmail.com>
Message-ID: <4A586F5D.5060702@ebi.ac.uk>

Radwen ANIBA wrote:
> I'm trying to use some programs that come with the EMBOSS package to analyze
> some protein sequences, but I sometimes get this message:
> 
> Error: ajSeqTypeCheckIn: Sequence must be protein sequence without BZ U X or
> *: found bad character 'X'
> 
> Is there any way to force the program to accept these types of residues?

EMBOSS uses the type attribute of the input sequence (or seqset or 
seqall) to identify the type of the input sequence (nucleotide, protein, 
or any) and the characters that are allowed (gaps, stops, non-standard 
residues and ambiguity characters).

Your application is expecting "pureprotein". This is only used by 
applications unable to handle the ambiguity codes (it can be difficult 
to define what an algorithm should do with them).

The alternatives are:

protein - accepts all characters, converts stops to X
proteinstandard - converts U,O and J to 'X'
stopproteinstandard - converts stops, U, O, J to X

"protein" is probably what you want. You need to be able to do something 
with the ambiguity codes X, B, Z and J and with the non-standard amino 
acids U (selenocysteine) and O (pyrrolysine).
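
To see in advance which residues would trigger the error, a small
pre-check can help. This is a minimal sketch that takes the rejected set
(B, Z, U, X and *) from the error message quoted above; it is an
assumption that this matches the full "pureprotein" definition, and the
function name is hypothetical.

```python
# Characters the quoted error message lists as not allowed (assumption:
# this is the complete set rejected for "pureprotein" input).
REJECTED = set("BZUX*")


def rejected_characters(sequence):
    """Return (position, character) pairs, 1-based, for rejected residues."""
    return [
        (pos, ch)
        for pos, ch in enumerate(sequence.upper(), start=1)
        if ch in REJECTED
    ]


# Toy sequence, not real data.
print(rejected_characters("MKXLLB*"))
```

An empty list means the sequence contains none of the characters named
in the error message.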

Hope this helps

Peter Rice

From liisa.koski at dnalandmarks.ca  Mon Jul 13 09:56:37 2009
From: liisa.koski at dnalandmarks.ca (Liisa Koski)
Date: Mon, 13 Jul 2009 09:56:37 -0400
Subject: [EMBOSS] fuzznuc score
Message-ID: <OF437A9FFD.DAEF22FD-ON852575F2.004C8704-852575F2.004C9866@notes.basf-corp.com>

Hello,

I would like to know how the score is calculated for the fuzznuc results. 
What is taken into consideration (strand, exact matches, etc.)? What does a
high score vs. a lower score mean?

Many thanks,

Liisa


From pmr at ebi.ac.uk  Mon Jul 13 10:43:18 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 13 Jul 2009 15:43:18 +0100
Subject: [EMBOSS] fuzznuc score
In-Reply-To: <OF437A9FFD.DAEF22FD-ON852575F2.004C8704-852575F2.004C9866@notes.basf-corp.com>
References: <OF437A9FFD.DAEF22FD-ON852575F2.004C8704-852575F2.004C9866@notes.basf-corp.com>
Message-ID: <4A5B4806.1080905@ebi.ac.uk>

Liisa Koski wrote:
> Hello,
> 
> I would like to know how the score is calculated for the fuzznuc results. 
> What is taken into consideration(strand, exact matches, etc?). What does a 
> high score vs lower score mean?

The score in EMBOSS report output is a required field for standard
formats such as GFF.

For fuzznuc the score is the length of the sequence that is found, minus
any mismatches.

Patterns that can match very long sequences can therefore give a very
high score.
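
As a worked example of the rule above (length of the match minus
mismatches), here is a minimal Python sketch. The helper names are
hypothetical, and the mismatch count is a simple position-by-position
comparison against a pattern of plain bases; it does not reproduce
fuzznuc's pattern syntax.

```python
def count_mismatches(pattern, hit):
    """Count positions where a plain-base pattern and the hit differ."""
    if len(pattern) != len(hit):
        raise ValueError("pattern and hit must have the same length")
    return sum(p != h for p, h in zip(pattern.upper(), hit.upper()))


def match_score(hit, mismatches):
    """Score as described above: matched length minus mismatches."""
    return len(hit) - mismatches


pattern = "ACGTACGTAC"
hit = "ACGAACGTAC"  # differs from the pattern at one position
print(match_score(hit, count_mismatches(pattern, hit)))
```

Here a 10-base hit with one mismatch scores 9, and an exact 10-base hit
would score 10.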

Hope this helps,

Peter Rice

From ajb at ebi.ac.uk  Wed Jul 15 07:19:33 2009
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Wed, 15 Jul 2009 12:19:33 +0100 (BST)
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
Message-ID: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>

Dear EMBOSS users and developers,

A new version of EMBOSS (6.1.0) is now available for download
from our ftp server:

   ftp://emboss.open-bio.org/pub/EMBOSS/


If you use any of the EMBASSY packages (e.g. PHYLIP, VIENNA etc) then,
as usual, remember to re-download and compile those too.

A new version of mEMBOSS, the Windows port, is also available from:

   ftp://emboss.open-bio.org/pub/EMBOSS/windows/mEMBOSS-6.1.0-setup.exe

Many new capabilities have been added and bugs fixed throughout.


Release highlights for EMBOSS include:

* Full support for the new SwissProt format. In most cases the entry can
  be read and written exactly

* Full support for EMBL and GenBank entries. In most cases the entry can
  be read and written exactly

* Support for FASTQ short read formats for sequence and quality data

* Full support for protein and nucleotide sequence parsing from PDB entries

* Full support for GFF3 feature format as the new default feature output

* Improved summary information at the end of report output

* Alignment output using multiple sequence formats

* Extended support for distance matrix file formats

* Improved support for regular expression and pattern searching

* Improved support for large sequence alignments

* Support for remote locations in feature table processing, for example
  retrieval in coderet.

* Output directory support extended to allow directories to be created

* Normalisation option for hydrophobicity plots (pepwindow and pepwindowall)

* Processing of methylation sites in restriction mapping

* Embossdata reports results alphabetically sorted

* Command line qualifiers should be unique after 5 characters to allow
  safe abbreviation

* Improved configuration procedures for X11 support

* Support for dasgff report format, making it possible to write
  EMBOSS-based DAS annotation servers


Release highlights for EMBASSY include:

* Support for MEME 4.0

* Phylipnew updated to Phylip 3.68

* Support for the HMMERDB environment variable in Hmmernew.

* Bug fixes for the MSE multiple sequence editor


Release highlights for Jemboss include:

* Refactoring of the source code

* Location of the 'Execution mode' menu moved near to the 'Go' button in
  the application forms.  When a user runs a job for the first time in
  'batch' mode an information message is displayed

* Automatic configuration of the standalone Jemboss GUI on
  UNIX systems after typing "make install" for EMBOSS. This
  standalone GUI can be run using the runJemboss.csh script
  in the EMBOSS 'bin' directory. This assumes that you have
  a reasonably up-to-date version of Java installed (1.6
  preferred)


For future extensions, we have added:

* Parsing of cross-reference information from SwissProt and EMBL/GenBank
  formats

* Code to delete and update database indexes


New EMBOSS wiki

EMBOSS now has a Wiki at http://emboss.open-bio.org/wiki where we will
maintain the master copies of documentation for the applications and
libraries, and where we have sections for planning new features and
applications for the next 3 years of funding. Please contribute any
corrections to the documentation and add new ideas to the "Planning"
section. We will, of course, be making the wiki prettier as it
matures.


Important note for Developers

New distributions of operating systems have started to use the series
2 version of libtool. We therefore now use this in our CVS
repository. The latest stable version of libtool is 2.2.6a (reported
by libtool itself as 2.2.6). Developers using systems with older
(1.5.x) libtool versions will have to install a local copy of
libtool. This would typically be done by downloading the source code
from the GNU site:

             ftp://ftp.gnu.org/

After installing libtool it will usually be necessary to then
re-install autoconf (2.63) and automake (1.11) to the same directory
root (they are often tied to the version of libtool they were provided
with). They too are available from the GNU ftp server. Make sure that
your PATH is refreshed between doing the installations of the GNU
tools in order that the previous versions aren't referenced.

We note that one system (cygwin) currently provides an experimental
version of libtool (2.2.7). Developers on these systems (and, in general,
on any system with a higher version of libtool than in our CVS repository)
should type:

             autoreconf -fi

before attempting compilation. We will usually keep up-to-date with libtool
stable releases within a libtool series.


New BBSRC funding and future work

As previously announced, we have recently been re-funded by the BBSRC.
What we said in that announcement bears repeating here.
The core aims of the funding proposal were to continue support,
maintenance and development of EMBOSS, and to provide extensive online
training materials for users, developers and system administrators using
text from a series of books to be published by Cambridge University Press.

We are also explicitly targeting areas where we see EMBOSS can be
expanded:

* Richer data content in EMBOSS outputs leading to major improvements
  in the integration and visualisation of results in browsers.

* Processing many more data fields in EMBOSS inputs (taxonomy,
  genes, GO terms, cross-references, keywords).

* Extending and improving database access: better indexing, query
  language support and combining searches across multiple databases,
  support for non-sequence data resources and new data access methods

* Scaling up the libraries and adding new applications to support the
  data volumes generated by next-generation sequencing runs. We
  anticipate many more users will be working with short read data
  mapped to reference sequences over the next few years.

* We aim to add at least 100 new applications in these 3 years.
  Suggestions for new applications are very welcome.

* Major work on new developments and new library code will start from
  August.


Alan


From db60 at st-andrews.ac.uk  Wed Jul 15 08:47:17 2009
From: db60 at st-andrews.ac.uk (Daniel Barker)
Date: Wed, 15 Jul 2009 13:47:17 +0100
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
Message-ID: <4A5DCFD5.8050101@st-andrews.ac.uk>

Dear Alan,

Very glad to see the project will continue with new funding.

One request. Could EMBOSS please add a full global alignment program or 
option? Needle allows 'overhanging ends' with no penalty. From the 
current Needle documentation: 'There is no penalty for the hanging ends 
of the overlap. In bioinformatics, it is usually reasonable to assume 
that the sequences are incomplete and there should be no penalty for 
failing to align the missing bases.'

For protein alignments, this tends to be very lenient towards alignments 
with these kinds of domain architecture:

protein 1: A-B-C
protein 2:     C-X-Y-Z

As opposed to the following situation, which (correctly for global 
alignment) does tend to be heavily penalized:

protein 1: A-C-D
protein 2:   C

I don't think this makes sense. Or at least, it definitely does not make 
sense for all applications. Generally with pairwise alignment of 
proteins I want either local alignment (e.g. water), or full global 
alignment which isn't available in EMBOSS.
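
The difference between the two behaviours can be illustrated with a toy
dynamic-programming sketch in Python. It is only an illustration under
stated assumptions, not needle's implementation: it uses a linear gap
penalty and made-up match/mismatch scores, treats each domain letter as
a single residue, and the penalise_end_gaps flag is a hypothetical name.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2,
                    penalise_end_gaps=True):
    """Best alignment score of a and b, with or without end-gap penalties."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    if penalise_end_gaps:
        # Full global alignment: leading gaps cost the same as internal ones.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            score[i][j] = max(
                diag, score[i - 1][j] + gap, score[i][j - 1] + gap
            )
    if penalise_end_gaps:
        return score[-1][-1]
    # Free end gaps: the alignment may end anywhere on the last row or column.
    return max(max(score[-1]), max(row[-1] for row in score))


# Domain architectures A-B-C and C-X-Y-Z from the example above.
print(alignment_score("ABC", "CXYZ", penalise_end_gaps=False))
print(alignment_score("ABC", "CXYZ", penalise_end_gaps=True))
```

With these illustrative scores the overlap of the single shared domain
scores 1 when end gaps are free, but the best full global alignment
scores -5, which is the kind of penalty I would like to be able to ask
for.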

Thank you for considering this.

Best regards,

Daniel

-- 
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland :
No SC013532

From pmr at ebi.ac.uk  Wed Jul 15 09:44:26 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 15 Jul 2009 14:44:26 +0100
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A5DCFD5.8050101@st-andrews.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
	<4A5DCFD5.8050101@st-andrews.ac.uk>
Message-ID: <4A5DDD3A.9070605@ebi.ac.uk>

Daniel Barker wrote:
> One request. Could EMBOSS please add a full global alignment program or
> option? Needle allows 'overhanging ends' with no penalty.
> 
> I don't think this makes sense. Or at least, it definitely does not make
> sense for all applications. Generally with pairwise alignment of
> proteins I want either local alignment (e.g. water), or full global
> alignment which isn't available in EMBOSS.

Thanks for the suggestion.

Would you be happy with an option for needle to penalise gaps at the
end, or would you prefer a separate application (if so, we need to
invent a new name)?

regards,

Peter Rice

From db60 at st-andrews.ac.uk  Wed Jul 15 09:59:25 2009
From: db60 at st-andrews.ac.uk (Daniel Barker)
Date: Wed, 15 Jul 2009 14:59:25 +0100
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A5DDD3A.9070605@ebi.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
	<4A5DCFD5.8050101@st-andrews.ac.uk> <4A5DDD3A.9070605@ebi.ac.uk>
Message-ID: <4A5DE0BD.7090103@st-andrews.ac.uk>

Dear Peter,

> Would you be happy with an option for needle to penalise gaps at the
> end, or would you prefer a separate application (if so, we need to
> invent a new name)?

An option for needle would be fine. This may be the best solution.

Thanks a lot and best wishes,

Daniel

-- 
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland :
No SC013532

From charles-listes-emboss at plessy.org  Wed Jul 15 10:59:53 2009
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Wed, 15 Jul 2009 23:59:53 +0900
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
Message-ID: <20090715145953.GA30812@kunpuu.plessy.org>

On Wed, Jul 15, 2009 at 12:19:33PM +0100, ajb at ebi.ac.uk wrote:
> Dear EMBOSS users and developers,
> 
> A new version of EMBOSS (6.1.0) is now available for download
> from our ftp server:
> 
>    ftp://emboss.open-bio.org/pub/EMBOSS/

Dear EMBOSS developers, congratulations for the release!

I have updated our Debian EMBOSS packages, and they should become available for
the Sid (unstable) distribution in our mirrors within a day or two. Please let
me know if these packages prove difficult to install on our current stable
distribution (Lenny), and I will prepare a semi-official backport.

I will update and rebuild the Debian EMBASSY packages against EMBOSS 6.1.0
later this week.

Have a nice day

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

From golharam at umdnj.edu  Thu Jul 16 15:09:03 2009
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 16 Jul 2009 15:09:03 -0400
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
Message-ID: <4A5F7ACF.4060808@umdnj.edu>

Any thought on implementing some of the algorithms using CUDA when 
possible on GPUs?  This could speed up some programs significantly.


ajb at ebi.ac.uk wrote:
> Dear EMBOSS users and developers,
> 
> A new version of EMBOSS (6.1.0) is now available for download
> from our ftp server:
> 
>    ftp://emboss.open-bio.org/pub/EMBOSS/
> 
> 
> If you use any of the EMBASSY packages (e.g. PHYLIP, VIENNA etc) then,
> as usual, remember to re-download and compile those too.
> 
> A new version of the mEMBOSS, the Windows port, is also available from:
> 
>    ftp://emboss.open-bio.org/pub/EMBOSS/windows/mEMBOSS-6.1.0-setup.exe
> 
> Many new capabilities have been added and bugs fixed throughout.
> 
> 
> 
> 
> Release highlights for EMBOSS include:
> 
> * Full support for the new SwissProt format. In most cases the entry can
>   be read and written exactly
> 
> * Full support for EMBL and GenBank entries. In most cases the entry can
>   be read and written exactly
> 
> * Support for FASTQ short read formats for sequence and quality data
> 
> * Full support for protein and nucleotide sequence parsing from PDB entries
> 
> * Full support for GFF3 feature format as the new default feature output
> 
> * Improved summary information at the end of report output
> 
> * Alignment output using multiple sequence formats
> 
> * Extended support for distance matrix file formats
> 
> * Improved support for regular expression and pattern searching
> 
> * Improved support for large sequence alignments
> 
> * Support for remote locations in feature table processing, for example
>   retrieval in coderet.
> 
> * Output directory support extended to allow directories to be created
> 
> * Normalisation option for hydrophobicity plots (pepwindow and pepwindowall)
> 
> * Processing of methylation sites in restriction mapping
> 
> * Embossdata reports results alphabetically sorted
> 
> * Command line qualifiers should be unique after 5 characters to allow
>   safe abbreviation
> 
> * Improved configuration procedures for X11 support
> 
> * Support for dasgff report format, making it possible to write
>   EMBOSS-based DAS annotation servers
> 
> 
> 
> 
> Release highlights for EMBASSY include:
> 
> * Support for MEME 4.0
> 
> * Phylipnew updated to Phylip 3.68
> 
> * Support for the HMMERDB environment variable in Hmmernew.
> 
> * Bug fixes for the MSE multiple sequence editor
> 
> 
> 
> 
> Release highlights for Jemboss include:
> 
> * Refactoring of the source code
> 
> * Location of the 'Execution mode' menu moved near to the 'Go' button in
>   the application forms.  When a user runs a job for the first time in
>   'batch' mode an information message is displayed
> 
> * Automatic configuration of the standalone Jemboss GUI on
>   UNIX systems after typing "make install" for EMBOSS. This
>   standalone GUI can be run using the runJemboss.csh script
>   in the EMBOSS 'bin' directory. This assumes that you have
>   a reasonably up-to-date version of Java installed (1.6
>   preferred)
> 
> 
> 
> 
> For future extensions, we have added:
> 
> * Parsing of cross-reference information from SwissProt and EMBL/GenBank
>   formats
> 
> * Code to delete and update database indexes
> 
> 
> 
> 
> New EMBOSS wiki
> 
> EMBOSS now has a Wiki at http://emboss.open-bio.org/wiki where we will
> maintain the master copies of documentation for the applications and
> libraries, and where we have sections for planning new features and
> applications for the next 3 years of funding. Please contribute any
> corrections to the documentation and add new ideas to the "Planning"
> section. We will, of course, be making the wiki prettier as it
> matures.
> 
> 
> 
> 
> Important note for Developers
> 
> New distributions of operating systems have started to use the series
> 2 version of libtool. We therefore now use this in our CVS
> repository. The latest stable version of libtool is 2.2.6a (reported
> by libtool itself as 2.2.6). Developers using systems with older
> (1.5.x) libtool versions will have to install a local copy of
> libtool. This would typically be done by downloading the source code
> from the GNU site:
> 
>              ftp://ftp.gnu.org/
> 
> After installing libtool it will usually be necessary to then
> re-install autoconf (2.63) and automake (1.11) to the same directory
> root (they are often tied to the version of libtool they were provided
> with). They too are available from the GNU ftp server. Make sure that
> your PATH is refreshed between doing the installations of the GNU
> tools in order that the previous versions aren't referenced.
> 
> We note that one system (cygwin) currently provides an experimental
> version of libtool (2.2.7). Developers on these systems (and, in general,
> on any system with a higher version of libtool than in our CVS repository)
> should type:
> 
>              autoreconf -fi
> 
> before attempting compilation. We will usually keep up-to-date with libtool
> stable releases within a libtool series.
> 
> 
> 
> 
> New BBSRC funding and future work
> 
> As previously announced, we have recently been refunded by the BBSRC.
> What we said in that announcement bears repeating here.
> The core aims of the funding proposal were to continue support,
> maintenance and development of EMBOSS, and to provide extensive online
> training materials for users, developers and system administrators using
> text from a series of books to be published by Cambridge University Press.
> 
> We are also explicitly targeting areas where we see EMBOSS can be
> expanded:
> 
> * Richer data content in EMBOSS outputs leading to major improvements
>   in the integration and visualisation of results in browsers.
> 
> * Processing many more data fields in EMBOSS inputs (taxonomy,
>   genes, GO terms, cross-references, keywords.
> 
> * Extending and improving database access: better indexing, query
>   language support and combining searches across multiple databases,
>   support for non-sequence data resources and new data access methods
> 
> * Scaling up the libraries and adding new applications to support the
>   data volumes generated by next-generation sequencing runs. We
>   anticipate many more users will be working with short read data
>   mapped to reference sequences over the next few years.
> 
> * We aim to add at least 100 new applications in these 3 years.
>   Suggestions for new applications are very welcome.
> 
> * Major work on new developments and new library code will start from
>   August.
> 
> 
> Alan
> 
> 
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
> 

From pmr at ebi.ac.uk  Fri Jul 17 03:39:11 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 17 Jul 2009 08:39:11 +0100
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A5F7ACF.4060808@umdnj.edu>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
	<4A5F7ACF.4060808@umdnj.edu>
Message-ID: <4A602A9F.4040908@ebi.ac.uk>

Ryan Golhar wrote:
> Any thought on implementing some of the algorithms using CUDA when 
> possible on GPUs?  This could speed up some programs significantly.

Yes indeed.

At BOSC/ISMB last month we were discussing closer collaborations with the 
other Open Bio Foundation projects.

One of these is BioManyCores which is aiming at OpenCL programming but is 
currently concentrating on CUDA.

When our new workstations are delivered we will be looking into CUDA.

Which applications would you most like to speed up? (current EMBOSS 
programs, and suggestions for new ones)

regards,

Peter Rice

From hpm at bioinfo-user.org.uk  Fri Jul 17 05:59:42 2009
From: hpm at bioinfo-user.org.uk (Hamish McWilliam)
Date: Fri, 17 Jul 2009 10:59:42 +0100
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A602A9F.4040908@ebi.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>	<4A5F7ACF.4060808@umdnj.edu>
	<4A602A9F.4040908@ebi.ac.uk>
Message-ID: <4A604B8E.1000004@bioinfo-user.org.uk>

Hi Peter,

>> Any thought on implementing some of the algorithms using CUDA when 
>> possible on GPUs?  This could speed up some programs significantly.

Given that our server systems do not have particularly powerful GPUs, 
but do have multiple CPU cores, threading and possibly on-core 
vectorization (see http://en.wikipedia.org/wiki/SIMD) seem like 
more generally applicable methods for improving performance in our case.

One interesting option for Intel platforms is the Intel Compiler (icc), 
which will vectorize some code constructions as a platform specific 
optimization. Unfortunately we are running a mixture of AMD and Intel 
systems of various vintages, so this option is going to require a lot of 
testing to check it works and gives us any benefits.

> Yes indeed.
> 
> At BOSC/ISMB last month we were discussing closer collaborations with 
> the other Open Bio Foundation projects.
> 
> One of these is BioManyCores which is aiming at OpenCL programming but 
> is currently concentrating on CUDA.
> 
> When our new workstations are delivered we will be looking into CUDA.

Given that OpenCL supports both GPU and CPU vectorization, and CUDA is 
Nvidia GPU specific, it may be worth waiting for OpenCL to be adopted. 
MacOS X Snow Leopard is only a couple of months away after all ;-)

> Which applications would you most like to speed up? (current EMBOSS 
> programs, and suggestions for new ones)

At our end the bottlenecks are mainly the indexing (dbi* & dbx*) and 
reformatting tools (seqret).

All the best,

Hamish

From georgios at biotek.uio.no  Fri Jul 17 06:52:38 2009
From: georgios at biotek.uio.no (George Magklaras)
Date: Fri, 17 Jul 2009 12:52:38 +0200
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A604B8E.1000004@bioinfo-user.org.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>	<4A5F7ACF.4060808@umdnj.edu>	<4A602A9F.4040908@ebi.ac.uk>
	<4A604B8E.1000004@bioinfo-user.org.uk>
Message-ID: <4A6057F6.7010704@biotek.uio.no>


Hamish McWilliam wrote:
> 
> At our end the bottlenecks are mainly the indexing (dbi* & dbx*) and 
> reformatting tools (seqret).
>
I second that.


-- 
--
George Magklaras BSc Hons MPhil
RHCE:805008309135525

Senior Computer Systems Engineer/UNIX-Linux Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo,
University of Oslo
http://folk.uio.no/georgios


From golharam at umdnj.edu  Fri Jul 17 11:03:02 2009
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 17 Jul 2009 11:03:02 -0400
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A602A9F.4040908@ebi.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
	<4A5F7ACF.4060808@umdnj.edu> <4A602A9F.4040908@ebi.ac.uk>
Message-ID: <4A6092A6.3070808@umdnj.edu>

I think the sequence alignment programs or anything dealing with 
matrices would be a good start.  As Hamish mentioned, even just 
making use of multi-core systems would be good.

We are interested in doing some CUDA programming here.  We are doing 
some training in about a month.  Based on that, we'll see how well we 
know it and what we want to pursue.  Maybe we can help you out a bit in 
this?


Peter Rice wrote:
> Ryan Golhar wrote:
>> Any thought on implementing some of the algorithms using CUDA when 
>> possible on GPUs?  This could speed up some programs significantly.
> 
> Yes indeed.
> 
> At BOSC/ISMB last month we were discussing closer collaborations with 
> the other Open Bio Foundation projects.
> 
> One of these is BioManyCores which is aiming at OpenCL programming but 
> is currently concentrating on CUDA.
> 
> When our new workstations are delivered we will be looking into CUDA.
> 
> Which applications would you most like to speed up? (current EMBOSS 
> programs, and suggestions for new ones)
> 
> regards,
> 
> Peter Rice
> 

From pmr at ebi.ac.uk  Fri Jul 17 11:34:31 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 17 Jul 2009 16:34:31 +0100
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A6057F6.7010704@biotek.uio.no>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>	<4A5F7ACF.4060808@umdnj.edu>	<4A602A9F.4040908@ebi.ac.uk>	<4A604B8E.1000004@bioinfo-user.org.uk>
	<4A6057F6.7010704@biotek.uio.no>
Message-ID: <4A609A07.2080609@ebi.ac.uk>

George Magklaras wrote:
> Hamish McWilliam wrote:
>> At our end the bottlenecks are mainly the indexing (dbi* & dbx*) and
>> reformatting tools (seqret).
>
> I second that.

dbi and dbx we understand and can speed up.

What large data are you using through seqret? I assume it is not single
sequences that are slow.

regards,

Peter Rice

From belegdol at gmail.com  Sun Jul 19 08:38:19 2009
From: belegdol at gmail.com (Julian Sikorski)
Date: Sun, 19 Jul 2009 14:38:19 +0200
Subject: [EMBOSS] Packaging EMBOSS for Fedora
In-Reply-To: <1244850556.8999.7.camel@login-svr1.ebi.ac.uk>
References: <h0uef8$bdu$1@ger.gmane.org>
	<1244850556.8999.7.camel@login-svr1.ebi.ac.uk>
Message-ID: <h3v43r$tc5$1@ger.gmane.org>

W dniu 13.06.2009 01:49, Mahmut Uludag pisze:
> Hi Julian,
> 
>> I have one crucial question: is it worth packaging jemboss? It seems it
>> hasn't seen development since 2005, and I have experienced some important
>> issues when trying to use it (mainly not working with files with spaces
>> in filenames). What do you think?
> 
> I had a quick look into the files with spaces in filenames problem, we
> can include a fix in the coming release (6.1) for this problem. Previous
> release of EMBOSS (6.0) included some fixes of similar nature, as well
> as some small gui improvements. We recently have updated EMBOSS/Jemboss
> make/install machinery to make standalone Jemboss come out of box when
> users install EMBOSS, I think we were successful and this will be a new
> feature in EMBOSS 6.1.
> 
> Regards,
> Mahmut
Given that I'm finally an MSc and have some more free time, I decided to
revisit the packaging and update the package to 6.1.0. Unfortunately,
there seem to be some problems with make install:

/usr/bin/make  install-exec-hook
make[7]: Entering directory
`/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
mkdir -p --
/builddir/build/BUILDROOT/EMBOSS-6.1.0-1.fc11.x86_64/usr/share/EMBOSS/jemboss/org/emboss/jemboss/editor
/usr/bin/install: cannot stat `*.class': No such file or directory
make[7]: *** [install-exec-hook] Error 1
make[7]: Leaving directory
`/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
make[6]: Leaving directory
`/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
make[6]: *** [install-exec-am] Error 2
make[5]: *** [install-am] Error 2
make[5]: Leaving directory
`/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
make[4]: Leaving directory
`/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss'
make[4]: *** [install-recursive] Error 1
make[3]: Leaving directory
`/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss'
make[2]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org'
make[3]: *** [install-recursive] Error 1
make[2]: *** [install-recursive] Error 1
make[1]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss'
make[1]: *** [install-recursive] Error 1
make: *** [install-recursive] Error 1

6.0.1 including the 1-12 patch built and installed fine. Am I doing
something wrong? Cheers,

Julian


From uludag at ebi.ac.uk  Sun Jul 19 12:18:29 2009
From: uludag at ebi.ac.uk (uludag at ebi.ac.uk)
Date: Sun, 19 Jul 2009 17:18:29 +0100 (BST)
Subject: [EMBOSS] Packaging EMBOSS for Fedora
In-Reply-To: <h3v43r$tc5$1@ger.gmane.org>
References: <h0uef8$bdu$1@ger.gmane.org>
	<1244850556.8999.7.camel@login-svr1.ebi.ac.uk>
	<h3v43r$tc5$1@ger.gmane.org>
Message-ID: <49993.78.105.201.225.1248020309.squirrel@webmail.ebi.ac.uk>


> there seem to be some problems with make install:
>
> /usr/bin/make  install-exec-hook
> make[7]: Entering directory
> `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
> mkdir -p --
> /builddir/build/BUILDROOT/EMBOSS-6.1.0-1.fc11.x86_64/usr/share/EMBOSS/jemboss/org/emboss/jemboss/editor
> /usr/bin/install: cannot stat `*.class': No such file or directory

Looks like we didn't test the --with-java and --with-javaos configure
options well before this release. However, most users will not need these
two options any more as EMBOSS-6.1.0 includes precompiled jemboss class
files collected in a java archive file. You should hopefully not get the
above error if you omit these two options when you configure your emboss
installation.

Regards,
Mahmut


From biopython at maubp.freeserve.co.uk  Mon Jul 20 09:24:02 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Jul 2009 14:24:02 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
	<1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>
	<320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>
	<4A55B578.1060808@ebi.ac.uk>
	<320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com>
Message-ID: <320fb6e00907200624x5513f3b5u8f7fa0c88e23fc8d@mail.gmail.com>

On Thu, Jul 9, 2009 at 10:21 AM, Peter<biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Jul 9, 2009 at 10:16 AM, Peter Rice<pmr at ebi.ac.uk> wrote:
>>
>> Peter C. wrote:
>>
>>> Thanks Scott (& Thomas) for pointing out the embossversion program.
>>>
>>> I would still question why the EMBOSS tools don't also support the
>>> Unix convention of a version switch. Hypothetically, aren't some
>>> (many?) of the tools standalone and couldn't they be installed
>>> individually (e.g. as part of someone else's software bundle)? i.e.
>>> Can EMBOSS really guarantee that the needle tool and the
>>> embossversion tool are in sync?
>>
>> We could easily add a -version global qualifier ... for the next release.
>>
>> We can guarantee that embossversion and needle are in sync - assuming
>> they are built using the same libraries as that is where the version is
>> recorded. Standalone builds are an issue though and it would help debug
>> in a few cases.
>
> That sounds good to me :)
>

Thinking about this again, rather than adding a whole new argument
(-version), why not just include the program version as the first line of
the help output (from -help)? This should also solve the corner case
of standalone builds, and makes it very easy to find the version
(without having to know about the embossversion tool).

Thanks,

Peter C.

From pmr at ebi.ac.uk  Mon Jul 20 09:37:38 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 20 Jul 2009 14:37:38 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907200624x5513f3b5u8f7fa0c88e23fc8d@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>	
	<1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>	
	<320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>	
	<4A55B578.1060808@ebi.ac.uk>	
	<320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com>
	<320fb6e00907200624x5513f3b5u8f7fa0c88e23fc8d@mail.gmail.com>
Message-ID: <4A647322.7080903@ebi.ac.uk>

Peter C. wrote:
> Thinking about this again, rather than adding a whole new argument
> (-version), why not just include the program version as the first line of
> the help output (from -help)? This should also solve the corner case
> of standalone builds, and makes it very easy to find the version
> (without having to know about the embossversion tool).

Too late ... I already have working code for --version :-)

This can report the EMBOSS version number. For EMBASSY applications it
also reports the EMBASSY package version number (something embossversion
is unable to do).

There was also a "version:" ACD application attribute. This has to be
renamed to "versionnumber:" to avoid a clash with the new qualifier
name, and will be reported if defined. The "versionnumber" attribute in
ACD was originally defined only for third party applications in the
SoapLab interface so we do not expect renaming the attribute to cause
any problems (we will make the changes to SoapLab).

However, the suggestion of adding it to the -help output is a good one. I
will add something there too.

regards,

Peter Rice

From biopython at maubp.freeserve.co.uk  Mon Jul 20 11:41:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Jul 2009 16:41:43 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
Message-ID: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>

Hi all,

I've just updated my Mac to EMBOSS 6.1.0, and have found an
issue with seqret conversion of IntelliGenetics files. After some
digging, I think this problem relates to having DOS new lines in
a file on Unix (in my case, Mac OS X).

For illustration, I'm using the example file from the EMBOSS
website, saved to disk (using Unix new lines on a Mac):
http://emboss.sourceforge.net/docs/themes/seqformats/ig

Using EMBOSS 6.0.1, there was a problem:

$ embossversion
Writes the current EMBOSS version number to a file
6.0.1
$  seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter
>HSFAU
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg
cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc
tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc
tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga
agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca
cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc
tctaataaaaaagccacttagttcagtcaaaaaaaaaaH-sapiensfaugenebasesH
SFAUctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcga
aaacggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgatta
acactgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacag
ccgtagcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacat
ggtagctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgc
cccgtcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggag
ctaggactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgt
gacacgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccat
cttcgcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaaggg
cttgtagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgc
tccgtggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgt
gagccgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatc
tcctttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcg
ccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccagg
aaacggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtg
ctcttcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcat
gtagcctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgccc
ctggaggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagta
gcaggccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgt
ctagtgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagta
cttctcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacaca
gacgtccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatccta
gtctggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctata
aattagaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaac
tttgttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagagg
ggttctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacag
gtaaagtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtga
gtgagagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtc
cctgggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatg
ctaggtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaac
aggagaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgct
ttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtct
tttgtaattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttca
tctttacttgcaaggcctcagggagaggtgtgcttctcgg

i.e. The two sequences have been munged into one, with the
name of the second sequence as part of the sequence.

Using EMBOSS 6.1.0, the following now works:

$ embossversion
Reports the current EMBOSS version number
6.1.0
$ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter
>HSFAU H.sapiens fau mRNA, 518 bases
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg
cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc
tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc
tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga
agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca
cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc
tctaataaaaaagccacttagttcagtcaaaaaaaaaa
>HSFAU1 H.sapiens fau 1 gene, 2016 bases
ctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcgaaaac
ggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgattaacac
tgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacagccgt
agcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacatggta
gctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgccccg
tcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggagctag
gactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgtgaca
cgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccatcttc
gcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaagggcttg
tagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgctccg
tggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgtgagc
cgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatctcct
ttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcgccaa
tatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaac
ggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtgctct
tcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcatgtag
cctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctgg
aggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagtagcag
gccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgtctag
tgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagtacttc
tcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacacagacg
tccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatcctagtct
ggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctataaatt
agaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaactttg
ttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagaggggtt
ctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacaggtaa
agtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtgagtga
gagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtccctg
ggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatgctag
gtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaacagga
gaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgt
caacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtcttttg
taattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttcatctt
tacttgcaaggcctcagggagaggtgtgcttctcgg

i.e. There was a problem with this example file in EMBOSS 6.0.1,
but things look fine in EMBOSS 6.1.0. Great :)

However, if we now convert this input file to use DOS/Windows
newlines, and repeat the test (on Mac OS X, so Unix):

$ embossversion
Reports the current EMBOSS version number
6.1.0
$ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter
 H.sapiens fau mRNA, 518 bases
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg
cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc
tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc
tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga
agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca
cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc
tctaataaaaaagccacttagttcagtcaaaaaaaaaa
 H.sapiens fau 1 gene, 2016 bases
ctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcgaaaac
ggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgattaacac
tgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacagccgt
agcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacatggta
gctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgccccg
tcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggagctag
gactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgtgaca
cgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccatcttc
gcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaagggcttg
tagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgctccg
tggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgtgagc
cgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatctcct
ttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcgccaa
tatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaac
ggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtgctct
tcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcatgtag
cctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctgg
aggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagtagcag
gccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgtctag
tgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagtacttc
tcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacacagacg
tccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatcctagtct
ggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctataaatt
agaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaactttg
ttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagaggggtt
ctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacaggtaa
agtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtgagtga
gagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtccctg
ggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatgctag
gtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaacagga
gaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgt
caacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtcttttg
taattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttcatctt
tacttgcaaggcctcagggagaggtgtgcttctcgg

i.e. The ">" is missing on all the FASTA sequences.

So, it looks like EMBOSS 6.1.0 fixed one problem with
IntelliGenetics files, but that there is still an issue here.

Peter C.

P.S. Should I have reported this possible bug via sourceforge?

P.P.S. Back in 2006, I reported a similar data corruption
issue when reading stockholm/pfam with DOS newlines
(Sourceforge Bug #1588956, long since fixed). It seems to
me that EMBOSS would benefit from explicit testing of all
the file formats using DOS/Windows newlines when run on
Unix, and vice versa. Does that sound feasible, or just
hopelessly ambitious?
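
As a rough illustration of how such test inputs could be produced, here is a minimal Python sketch that makes a DOS/Windows newline copy of any example file. The function names (to_dos_newlines, write_dos_copy) are made up for this example and are not part of EMBOSS or its test suite.

```python
from pathlib import Path


def to_dos_newlines(data: bytes) -> bytes:
    """Return data with every line ending normalised to CRLF."""
    # Collapse existing CRLF pairs to LF first so that input which is
    # already in DOS format does not end up with doubled \r characters.
    unix = data.replace(b"\r\n", b"\n")
    return unix.replace(b"\n", b"\r\n")


def write_dos_copy(src: Path, dst: Path) -> None:
    """Write a CRLF copy of src to dst (hypothetical test-fixture helper)."""
    # Work on bytes so that no newline translation happens behind our back.
    dst.write_bytes(to_dos_newlines(src.read_bytes()))
```

Running each format's existing example through something like write_dos_copy and then repeating the normal seqret test on the copy would give one DOS-newline test per format.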

From biopython at maubp.freeserve.co.uk  Mon Jul 20 11:50:32 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Jul 2009 16:50:32 +0100
Subject: [EMBOSS] FASTQ format documentation
Message-ID: <320fb6e00907200850t39582781p115780690955bde6@mail.gmail.com>

Hi all,

I was just trying to double check the names EMBOSS 6.1.0 supports
for the various FASTQ file formats, and none of them are listed here:
http://emboss.sourceforge.net/docs/themes/SequenceFormats.html

Does this need updating, or should I be looking elsewhere?

Thanks

Peter C.

From pmr at ebi.ac.uk  Mon Jul 20 12:16:39 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 20 Jul 2009 17:16:39 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
Message-ID: <4A649867.4090205@ebi.ac.uk>

Peter C. wrote:
> Hi all,
> 
> I've just updated my Mac to EMBOSS 6.1.0, and have found an
> issue with seqret conversion of IntelliGenetics files. After some
> digging, I think this problem relates to having DOS new lines in
> a file on Unix (in my case, Mac OS X).

we have an application "noreturn" to fix things like this.

If you send me your file I will try to take a look at whether we should
be catching the funny newline characters.

> P.S. Should I have reported this possible bug via sourceforge?

The emboss-bug at emboss.open-bio.org list is the best way to get our attention

> P.P.S. Back in 2006, I reported a similar issue with a data
> corruption reading stockholm/pfam with DOS newlines
> (Sourceforge Bug #1588956, long since fixed). It seems to
> me that EMBOSS would benefit from explicit testing of all
> the file formats using DOS/Windows newlines when run on
> Unix, and vice versa. Does that sound feasible, or just
> hopelessly ambitious?

We can try ... how well does Biopython handle these? (i.e. do we need
such examples for perl, python etc or is this an EMBOSS-specific issue?)

regards,

Peter

From pmr at ebi.ac.uk  Mon Jul 20 12:17:46 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 20 Jul 2009 17:17:46 +0100
Subject: [EMBOSS] FASTQ format documentation
In-Reply-To: <320fb6e00907200850t39582781p115780690955bde6@mail.gmail.com>
References: <320fb6e00907200850t39582781p115780690955bde6@mail.gmail.com>
Message-ID: <4A6498AA.8040007@ebi.ac.uk>

Peter C. wrote:
> Hi all,
> 
> I was just trying to double check the names EMBOSS 6.1.0 supports
> for the various FASTQ file formats, and none of them are listed here:
> http://emboss.sourceforge.net/docs/themes/SequenceFormats.html
> 
> Does this need updating, or should I be looking elsewhere?

Yes, and yes.

We are cleaning up these things post-release. I will update that file,
and also the equivalent on the wiki.

regards,

Peter Rice

From biopython at maubp.freeserve.co.uk  Mon Jul 20 12:30:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Jul 2009 17:30:45 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <4A649867.4090205@ebi.ac.uk>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
	<4A649867.4090205@ebi.ac.uk>
Message-ID: <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>

On Mon, Jul 20, 2009 at 5:16 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>> Hi all,
>>
>> I've just updated my Mac to EMBOSS 6.1.0, and have found an
>> issue with seqret conversion of IntelliGenetics files. After some
>> digging, I think this problem relates to having DOS new lines in
>> a file on Unix (in my case, Mac OS X).
>
> we have an application "noreturn" to fix things like this.

That's basically an EMBOSS variant of existing Unix command line
tools such as unix2dos and dos2unix?

I'm more interested in having all the EMBOSS tools handle either
new line format themselves automatically. These days I am mostly
working on Unix (including Mac OS X), but I do have to cope with
Windows style text files quite often.

> If you send me your file I will try to take a look at whether we should
> be catching the funny newline characters.

For this bug report I was using:
http://emboss.sourceforge.net/docs/themes/seqformats/ig

There are another three example files used in the Biopython unit
tests here:
http://biopython.open-bio.org/SRC/biopython/Tests/IntelliGenetics/

>> P.S. Should I have reported this possible bug via sourceforge?
>
> The emboss-bug at emboss.open-bio.org list is the best way to get
> our attention

Great, another mailing list to sign up to... but if that is your
preferred route, that's fine.

>> P.P.S. Back in 2006, I reported a similar issue with a data
>> corruption reading stockholm/pfam with DOS newlines
>> (Sourceforge Bug #1588956, long since fixed). It seems to
>> me that EMBOSS would benefit from explicit testing of all
>> the file formats using DOS/Windows newlines when run on
>> Unix, and vice versa. Does that sound feasible, or just
>> hopelessly ambitious?
>
> We can try ... how well does Biopython handle these? (i.e. do we need
> such examples for perl, python etc or is this an EMBOSS-specific issue?)

I think this is an EMBOSS specific issue. I don't know enough about
how all the different EMBOSS parsers work, but is there a single
place where you could add automatic handling of either new line
convention when reading in text?

For reference, in Python, you can explicitly open text files in "universal
newlines" mode, which takes care of this. I don't know about Perl.
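
A small sketch of what that looks like, assuming Python 3, where text mode with newline=None (the default) gives universal newlines; in Python 2 the equivalent was the "rU" mode. The helper names and the three-line sample data are invented for illustration only.

```python
import os
import tempfile


def read_lines_universal(path):
    """Read a text file, translating \\n, \\r\\n and \\r line endings to \\n."""
    # newline=None is the Python 3 default and enables universal newlines.
    with open(path, "r", newline=None) as handle:
        return handle.readlines()


def demo():
    """Write a tiny file with DOS newlines and read it back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # Binary mode, so the \r\n bytes are written exactly as given.
        with os.fdopen(fd, "wb") as handle:
            handle.write(b";comment\r\nHSFAU\r\nacgt1\r\n")
        return read_lines_universal(path)
    finally:
        os.remove(path)
```

Every line returned by demo() ends in a plain "\n", so the parser never sees the "\r".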

Peter C.

From pmr at ebi.ac.uk  Mon Jul 20 16:04:19 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 20 Jul 2009 21:04:19 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>	
	<4A649867.4090205@ebi.ac.uk>
	<320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>
Message-ID: <4A64CDC3.9020902@ebi.ac.uk>

Peter C. wrote:

Thanks for the example files. I will start with those.

>>> P.S. Should I have reported this possible bug via sourceforge?
>> The emboss-bug at emboss.open-bio.org list is the best way to get
>> our attention
> 
> Great, another mailing list to sign up to... but if that is your
> preferred route, that's fine.

Actually, you can't sign up ... that's our bug reporting list :-)

> I think this is an EMBOSS specific issue. I don't know enough about
> how all the different EMBOSS parsers work, but is there a single
> place where you could add automatic handling of either new line
> convention when reading in text?

Hope so. I think the issue is places where the parsing is checking 
explicitly for \n rather than \n and \r. The solution would be to strip 
both off before parsing. It will need a thorough clean through the 
ajseqread code.
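
The idea of stripping both characters before parsing can be sketched as follows. This is Python rather than the C used in ajseqread, and the function names are hypothetical; it only illustrates the approach, not the actual EMBOSS change.

```python
def strip_line_ending(line: str) -> str:
    """Remove trailing CR and LF characters, leaving other whitespace alone."""
    return line.rstrip("\r\n")


def clean_lines(raw_lines):
    """Yield each input line with its line ending removed.

    A format parser fed from this generator never has to test for
    "\\n" or "\\r" itself, whichever convention the file used.
    """
    for line in raw_lines:
        yield strip_line_ending(line)
```

Doing this once where lines are read, rather than in each format parser, is what would make the handling automatic for every format.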

regards,

Peter


From biopython at maubp.freeserve.co.uk  Mon Jul 20 16:09:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Jul 2009 21:09:43 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <4A64CDC3.9020902@ebi.ac.uk>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
	<4A649867.4090205@ebi.ac.uk>
	<320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>
	<4A64CDC3.9020902@ebi.ac.uk>
Message-ID: <320fb6e00907201309r3ddca32dy3bfc159e201217b5@mail.gmail.com>

Peter Rice wrote:
>
> Thanks for the example files. I will start with those.
>
> Peter C. wrote:
>> I think this is an EMBOSS specific issue. I don't know enough about
>> how all the different EMBOSS parsers work, but is there a single
>> place where you could add automatic handling of either new line
>> convention when reading in text?
>
> Hope so. I think the issue is places where the parsing is checking
> explicitly for \n rather than \n and \r. The solution would be to strip
> both off before parsing. It will need a thorough clean through the
> ajseqread code.

That sounds like a good investment of effort in the long run :)

Peter C.

From pmr at ebi.ac.uk  Tue Jul 21 05:15:59 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 21 Jul 2009 10:15:59 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
Message-ID: <4A65874F.2010106@ebi.ac.uk>

Peter C. wrote:
> However, if we now convert this input file to use DOS/Windows
> newlines, and repeat the test (on Mac OS X, so Unix):
> 
> $ embossversion
> Reports the current EMBOSS version number
> 6.1.0
> $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter
>  H.sapiens fau mRNA, 518 bases
> ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
>
> i.e. The ">" is missing on all the FASTA sequences.

Actually, it's not missing ... it is hiding.

The sequence id has a ^M appended to it, so the '>' and the id get
overwritten by the description when you look at the file.

Fixed by processing the IG format ID rather than simply copying it.
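
A small Python sketch of the effect, using a made-up ID ("SEQ1") and a
hypothetical helper name. Printing the repr shows the carriage return
that a terminal would otherwise act on by jumping back to column one:

```python
def lines_with_hidden_cr(text):
    """Return the 1-based numbers of lines that still contain a carriage return."""
    return [number
            for number, line in enumerate(text.split("\n"), start=1)
            if "\r" in line]


seq_id = "SEQ1\r"  # ID copied along with its DOS carriage return
fasta = ">" + seq_id + " H.sapiens fau mRNA, 518 bases\nttcctc\n"

# repr() makes the hidden character visible instead of obeying it.
print(repr(fasta.split("\n")[0]))
print(lines_with_hidden_cr(fasta))  # [1]
```

On a terminal the text after the carriage return is drawn over the
'>' and the ID, which is why the header looks as if it starts with a
space.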

Thanks for finding that one.

regards,

Peter Rice

From biopython at maubp.freeserve.co.uk  Tue Jul 21 05:23:03 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 21 Jul 2009 10:23:03 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <4A65874F.2010106@ebi.ac.uk>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
	<4A65874F.2010106@ebi.ac.uk>
Message-ID: <320fb6e00907210223x5a145ab5qfa9b29bc42a37815@mail.gmail.com>

Peter Rice wrote:
>
> Peter C. wrote:
>> However, if we now convert this input file to use DOS/Windows
>> newlines, and repeat the test (on Mac OS X, so Unix):
>>
>> $ embossversion
>> Reports the current EMBOSS version number
>> 6.1.0
>> $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter
>>  H.sapiens fau mRNA, 518 bases
>> ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
>>
>> i.e. The ">" is missing on all the FASTA sequences.
>
> Actually, it's not missing ... it is hiding.
>
> The sequence id has a ^M appended to it, so the '>' and the id get
> overwritten by the description when you look at the file.

That makes sense, and I think I can see how it might have happened.

> Fixed by processing the IG format ID rather than simply copying it.
>
> Thanks for finding that one.

Sure,

Peter C.


From db60 at st-andrews.ac.uk  Tue Jul 21 07:24:28 2009
From: db60 at st-andrews.ac.uk (Daniel Barker)
Date: Tue, 21 Jul 2009 12:24:28 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>	<4A649867.4090205@ebi.ac.uk>
	<320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>
Message-ID: <4A65A56C.9010308@st-andrews.ac.uk>

Dear Peters et al.,

EMBOSS claims not to care about whether newlines are DOS or UNIX:

'EMBOSS programs can read in both PC and Unix text file formats, so it 
is not necessary for you to use this utility all of the time' - noreturn 
documentation.

This would certainly be good. 'The newline problem' must be the single 
biggest computational waste of time I've experienced over the years!

It's easy to avoid with tr, u2d, d2u, noreturn, etc. - but it's just one 
other thing that can go wrong, especially when data is shared between 
different places.

I've noticed a small amount of software, in the world in general, still 
uses the Mac OS 9 (and earlier) convention where newline is \015 only. 
E.g. this tab-delimited text saved from Excel 2004 for Mac:

$ od -bc Workbook1.txt
0000000   061 011 062 011 063 015 064 011 065 011 066
            1  \t   2  \t   3  \r   4  \t   5  \t   6
0000013
$

I expect this usage will decline, since it's in conflict with the 
convention of Mac OS X's own command-line tools (\012 only, like Linux). 
Probably the '\015 only' convention hasn't had much impact on 
bioinformatics anyway?
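
For anyone wanting to check a file without reading od output by eye,
here is a rough Python sketch that classifies the line endings in a
block of bytes. The function name and the labels it returns are
invented for this example:

```python
def newline_style(data):
    """Classify the line endings found in a bytes object."""
    crlf = data.count(b"\r\n")
    bare_cr = data.count(b"\r") - crlf  # \015 only (Mac OS 9 and earlier)
    bare_lf = data.count(b"\n") - crlf  # \012 only (Unix, Mac OS X)
    counts = {"dos": crlf, "mac": bare_cr, "unix": bare_lf}
    found = [name for name, count in counts.items() if count]
    if not found:
        return "none"
    if len(found) > 1:
        return "mixed"
    return found[0]


# The eleven bytes shown in the od dump above.
print(newline_style(b"1\t2\t3\r4\t5\t6"))  # mac
```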

Best wishes,

Daniel

-- 
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland :
No SC013532

From pmr at ebi.ac.uk  Tue Jul 21 08:12:00 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 21 Jul 2009 13:12:00 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <4A65A56C.9010308@st-andrews.ac.uk>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>	<4A649867.4090205@ebi.ac.uk>	<320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>
	<4A65A56C.9010308@st-andrews.ac.uk>
Message-ID: <4A65B090.7030507@ebi.ac.uk>

Daniel Barker wrote:
> Dear Peters et al.,
> 
> EMBOSS claims not to care about whether newlines are DOS or UNIX:
> 
> 'EMBOSS programs can read in both PC and Unix text file formats, so it
> is not necessary for you to use this utility all of the time' - noreturn
> documentation.
> 
> This would certainly be good. 'The newline problem' must be the single
> biggest computational waste of time I've experienced over the years!

Indeed. We get a little caught between accepting the extra carriage
returns and the need for efficient parsing as some users run seqret to
reformat large sequence files. Mac format (\015 only) is
horrible because the C library functions are looking for line feed
unless you're running on a Mac.

I will give our examples a run through after converting to PC format and
see if any others fall over.

> I've noticed a small amount of software, in the world in general, still
> uses the Mac OS 9 (and earlier) convention where newline is \015 only.
> E.g. this tab-delimited text saved from Excel 2004 for Mac:
> 
> $ od -bc Workbook1.txt
> 0000000   061 011 062 011 063 015 064 011 065 011 066
>            1  \t   2  \t   3  \r   4  \t   5  \t   6
> 0000013
> $
> 
> I expect this usage will decline, since it's in conflict with the
> convention of Mac OS X's own command-line tools (\012 only, like Linux).
> Probably the '\015 only' convention hasn't had much impact on
> bioinformatics anyway?

In my experience mac users didn't venture out into the real world. On the
other hand, unix users have often copied files from PCs. I used to do
the same myself, which was why I had to write noreturn in the first place.

regards,

Peter Rice

From stephen.taylor at imm.ox.ac.uk  Thu Jul 23 05:43:35 2009
From: stephen.taylor at imm.ox.ac.uk (Stephen Taylor)
Date: Thu, 23 Jul 2009 10:43:35 +0100
Subject: [EMBOSS] Tree building
Message-ID: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk>

Hi,

Are there any phylogeny programs in EMBOSS/EMBASSY that will allow me  
to build a tree from a multiple alignment without truncating the id  
name in the resulting treefile?

Thanks,

Steve


From pmr at ebi.ac.uk  Thu Jul 23 06:51:06 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 23 Jul 2009 11:51:06 +0100
Subject: [EMBOSS] Tree building
In-Reply-To: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk>
References: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk>
Message-ID: <4A68409A.7020004@ebi.ac.uk>

Stephen Taylor wrote:
> Hi,
> 
> Are there any phylogeny programs in EMBOSS/EMBASSY that will allow me to 
> build a tree from a multiple alignment without truncating the id name in 
> the resulting treefile?

This can be done by adding some new output formats to the existing 
phylipnew embassy applications. Phylip truncates id names for its own 
formats. We can extend them, and only truncate for strict phylip format 
outputs.

We are also open to suggestions for new applications in phylogenetics

A good topic to discuss at the GMOD workshop in Oxford.

regards,

Peter

From pmr at ebi.ac.uk  Thu Jul 23 10:06:58 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 23 Jul 2009 15:06:58 +0100
Subject: [EMBOSS] Tree building
In-Reply-To: <F85F706C-9198-4020-A232-8E44E8F43803@imm.ox.ac.uk>
References: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk>
	<4A68409A.7020004@ebi.ac.uk>
	<F85F706C-9198-4020-A232-8E44E8F43803@imm.ox.ac.uk>
Message-ID: <4A686E82.1010803@ebi.ac.uk>

Stephen Taylor wrote:
> Hi Peter,
> 
>> Stephen Taylor wrote:
>>> Hi,
>>> Are there any phylogeny programs in EMBOSS/EMBASSY that will allow me 
>>> to build a tree from a multiple alignment without truncating the id 
>>> name in the resulting treefile?
>>
>> This can be done by adding some new output formats to the existing 
>> phylipnew embassy applications. Phylip truncates id names for its own 
>> formats. We can extend them, and only truncate for strict phylip 
>> format outputs.
>>
>> We are also open to suggestions for new applications in phylogenetics
>>
> 
> Sounds like a good idea.
> 
> However, I kind of need it today:-). I guess I'll have to truncate them 
> to something sensible for the time being...

Can you give some example data and algorithms to make the tree?

Peter


From stephen.taylor at imm.ox.ac.uk  Thu Jul 23 11:07:36 2009
From: stephen.taylor at imm.ox.ac.uk (Stephen Taylor)
Date: Thu, 23 Jul 2009 16:07:36 +0100
Subject: [EMBOSS] Tree building
In-Reply-To: <50E41C3A73F46B4D876BD13F5E80264A02DAD6C6@exchange3.sims.scri.sari.ac.uk>
References: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk><4A68409A.7020004@ebi.ac.uk><F85F706C-9198-4020-A232-8E44E8F43803@imm.ox.ac.uk>
	<4A686E82.1010803@ebi.ac.uk>
	<50E41C3A73F46B4D876BD13F5E80264A02DAD6C6@exchange3.sims.scri.sari.ac.uk>
Message-ID: <DB9966B9-F981-4A1D-8154-DD571ADB83A7@imm.ox.ac.uk>

Hi Frank,

> Hi Stephen,
>
>> However, I kind of need it today:-). I guess I'll have to truncate  
>> them
>> to something sensible for the time being...
>
> If you need a tree *today*... :-)  you could download our TOPALi  
> package (loads all formats of alignment)
>

I have already installed Topali and was waiting for the results to  
run. I have over 1600 sequences and I wasn't sure when it was going to  
finish since I started it this morning (using the Quick NJ option on a  
Mac Mini). I think we should install it on a more powerful machine;-).

However, after I got your email I checked the status and the job had  
finished. Good timing...


> It does basic NJ, but also model-selection, PhyML, RaxML and  
> MrBayes.  RaxML is quite fast.
>
> We internally rename species names when TOPALi submits jobs to the  
> above external programs, but put the names back when we show the  
> alignment and the tree.


Yes, all the labels are there in full. Thanks a lot.

Peter - I can send you some data offline if you wish.

Steve


From stephen.taylor at imm.ox.ac.uk  Thu Jul 23 09:34:26 2009
From: stephen.taylor at imm.ox.ac.uk (Stephen Taylor)
Date: Thu, 23 Jul 2009 14:34:26 +0100
Subject: [EMBOSS] Tree building
In-Reply-To: <4A68409A.7020004@ebi.ac.uk>
References: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk>
	<4A68409A.7020004@ebi.ac.uk>
Message-ID: <F85F706C-9198-4020-A232-8E44E8F43803@imm.ox.ac.uk>

Hi Peter,

> Stephen Taylor wrote:
>> Hi,
>> Are there any phylogeny programs in EMBOSS/EMBASSY that will allow  
>> me to build a tree from a multiple alignment without truncating the  
>> id name in the resulting treefile?
>
> This can be done by adding some new output formats to the existing  
> phylipnew embassy applications. Phylip truncates id names for its  
> own formats. We can extend them, and only truncate for strict phylip  
> format outputs.
>
> We are also open to suggestions for new applications in phylogenetics
>

Sounds like a good idea.

However, I kind of need it today:-). I guess I'll have to truncate  
them to something sensible for the time being...

Thanks.

Steve


From gbottu at vub.ac.be  Thu Jul 23 12:19:11 2009
From: gbottu at vub.ac.be (Guy Bottu)
Date: Thu, 23 Jul 2009 18:19:11 +0200
Subject: [EMBOSS] Tree building
In-Reply-To: <4A68409A.7020004@ebi.ac.uk>
References: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk>
	<4A68409A.7020004@ebi.ac.uk>
Message-ID: <4A688D7F.6080304@vub.ac.be>

Peter Rice wrote:
> This can be done by adding some new output formats to the existing 
> phylipnew embassy applications. Phylip truncates id names for its own 
> formats. We can extend them, and only truncate for strict phylip format 
> outputs.

Interesting to note the following : standard PHYLIP format reserves the 
first 10 characters for the name and has the sequence start at position 
11. But some programmers like the one of PhyML use a format they call 
PHYLIP, which however allows a name of any length but then demands that 
there be a space between the name and the sequence. I already had 
trouble when I used a standard PHYLIP file made by PHYLIP or EMBOSS with 
a name of 10 characters as input to PhyML.
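
A short Python sketch of the two readings of one alignment line, with
made-up function names. It is not taken from PHYLIP, EMBOSS or PhyML;
it only follows the two rules as described above:

```python
def split_strict(line):
    """Strict PHYLIP: the name is the first 10 columns, sequence starts at column 11."""
    return line[:10].strip(), line[10:].replace(" ", "")


def split_relaxed(line):
    """Relaxed variant: the name is everything up to the first space."""
    name, _, rest = line.partition(" ")
    return name, rest.replace(" ", "")


# A full 10 character name, so the sequence follows with no space.
line = "Homo_sapieACGTACGT"
print(split_strict(line))   # ('Homo_sapie', 'ACGTACGT')
print(split_relaxed(line))  # ('Homo_sapieACGTACGT', '')
```

The relaxed reading swallows the sequence into the name, which matches
the kind of trouble described above for a 10 character name.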

	Guy Bottu

From belegdol at gmail.com  Wed Jul 29 07:54:49 2009
From: belegdol at gmail.com (Julian Sikorski)
Date: Wed, 29 Jul 2009 13:54:49 +0200
Subject: [EMBOSS] Packaging EMBOSS for Fedora
In-Reply-To: <49993.78.105.201.225.1248020309.squirrel@webmail.ebi.ac.uk>
References: <h0uef8$bdu$1@ger.gmane.org>	<1244850556.8999.7.camel@login-svr1.ebi.ac.uk>	<h3v43r$tc5$1@ger.gmane.org>
	<49993.78.105.201.225.1248020309.squirrel@webmail.ebi.ac.uk>
Message-ID: <h4pdaa$ad5$1@ger.gmane.org>

W dniu 19.07.2009 18:18, uludag at ebi.ac.uk pisze:
> 
>> there seem to be some problems with make install:
>>
>> /usr/bin/make  install-exec-hook
>> make[7]: Entering directory
>> `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
>> mkdir -p --
>> /builddir/build/BUILDROOT/EMBOSS-6.1.0-1.fc11.x86_64/usr/share/EMBOSS/jemboss/org/emboss/jemboss/editor
>> /usr/bin/install: cannot stat `*.class': No such file or directory
> 
> Looks like we didn't test the --with-java and --with-javaos configure
> options well, before this release. However, most users will not need these
> two options any more as EMBOSS-6.1.0 includes precompiled jemboss class
> files collected in a java archive file. You should hopefully not get the
> above error if you omit these two options when you configure your emboss
> installation.
> 
> Regards,
> Mahmut
Thank you, removing these two seems to have done the trick!

Julian


From biopython at maubp.freeserve.co.uk  Thu Jul 30 11:00:37 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 30 Jul 2009 16:00:37 +0100
Subject: [EMBOSS] FASTQ records with no sequence?
Message-ID: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com>

Hi all,

On the continuing topic of the nebulous FASTQ format, are there
any strong views as to whether a FASTQ file could hold records
without a sequence (and therefore no quality scores)? This could
make sense as output from an (aggressive) quality filter.

This is a corner case, and applies to other file formats too of course
(e.g. FASTA).

I mentioned this to Peter Rice (EMBOSS) off list, and he replied:

On Thu, Jul 30, 2009 at 2:56 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
> EMBOSS rejects zero length sequences - something we put in some years
> ago for misformatted FASTA files that someone ran through a Taverna
> workflow to launch clustalw via EMBOSS's "emma". The user had got his
> carriage control characters mangled so the sequence was appended to the
> FASTA '>' line and appeared as a long description with no sequence.
>
> I can well imagine for filtering paired reads that zero length sequences
> would be useful.
>
> At the point where the test is made we know the sequence format.
> We can therefore define some or all formats as accepting or rejecting
> zero length sequences.
>
> Similarly we can easily extend to define some applications (e.g. emma)
> as requiring a minimum sequence length.
>
> regards,
>
> Peter

Peter Rice is of course correct - in general the meaning and validity
of a zero length sequence is context dependent.

I think Peter Rice makes a good point regarding paired end reads.
What I assume he was getting at is the situation where due to
quality trimming, one of a pair might be trimmed to nothing - leaving
essentially a singleton read. However, paired end reads are normally
stored using a matched pair of FASTQ files, so it could be important
to keep the zero length read present, so that they can be read in
together in sync.

If we do want to allow zero length sequences in FASTQ, would
both of the following be valid? Should there be empty sequence
and quality lines, or no sequence and quality lines?

"@identifier\n+\n" (two lines, just the @ and + lines)
"@identifier\n\n+\n\n" (four lines, including blank seq and qual lines)

or with the repeated identifier on the plus lines:

"@identifier\n+identifier\n" (two lines, just the @ and + lines)
"@identifier\n\n+identifier\n\n" (four lines, including blank lines)

As we are recommending no line wrapping on output this means
typical FASTQ records would be four lines - so doing the same
makes sense here too.
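
To make the options concrete, here is a rough Python sketch of a
reader that accepts both the two line and the four line form of an
empty record, and a writer that always emits four lines. It assumes
unwrapped FASTQ (one sequence line, one quality line), and the
function names are invented for this example:

```python
def parse_fastq(text):
    """Yield (identifier, sequence, quality) tuples from unwrapped FASTQ text."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string after the final newline
    i = 0
    while i < len(lines):
        if not lines[i].startswith("@"):
            raise ValueError("expected '@' at line %d" % (i + 1))
        ident = lines[i][1:]
        i += 1
        if i < len(lines) and lines[i].startswith("+"):
            # Two line form: no sequence line and no quality line at all.
            seq, qual = "", ""
            i += 1
        else:
            # Four line form: sequence, '+' line, quality (possibly blank).
            if i + 2 >= len(lines) or not lines[i + 1].startswith("+"):
                raise ValueError("truncated record for %r" % ident)
            seq, qual = lines[i], lines[i + 2]
            i += 3
        if len(seq) != len(qual):
            raise ValueError("length mismatch for %r" % ident)
        yield ident, seq, qual


def format_fastq(ident, seq, qual):
    """Always write four lines, with blank lines for an empty record."""
    return "@%s\n%s\n+\n%s\n" % (ident, seq, qual)


data = "@a\n+\n@b\n\n+b\n\n@c\nACGT\n+\nIIII\n"
for record in parse_fastq(data):
    print(record, repr(format_fastq(*record)))
```

The reader relies on a sequence line never starting with '+', which is
what lets it tell the two forms apart.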

Peter C.

From pmr at ebi.ac.uk  Thu Jul 30 11:09:41 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 30 Jul 2009 16:09:41 +0100
Subject: [EMBOSS] FASTQ records with no sequence?
In-Reply-To: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com>
References: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com>
Message-ID: <4A71B7B5.40502@ebi.ac.uk>

Peter C. wrote:

> As we are recommending no line wrapping on output this means
> typical FASTQ records would be four lines - so doing the same
> makes sense here too.

I vote for 4 lines on output. It should be possible to allow zero lines
on input depending on where the '+' check is.

regards,

Peter

From biopython at maubp.freeserve.co.uk  Thu Jul 30 11:19:24 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 30 Jul 2009 16:19:24 +0100
Subject: [EMBOSS] FASTQ records with no sequence?
In-Reply-To: <4A71B7B5.40502@ebi.ac.uk>
References: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com>
	<4A71B7B5.40502@ebi.ac.uk>
Message-ID: <320fb6e00907300819x35ae00c3wa20a382376134db7@mail.gmail.com>

On Thu, Jul 30, 2009 at 4:09 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>
>> As we are recommending no line wrapping on output this means
>> typical FASTQ records would be four lines - so doing the same
>> makes sense here too.
>
> I vote for 4 lines on output.

If we want to allow zero length sequences, then yes, I would also
vote for the 4 line output (i.e. blank lines for the sequence and
the quality string).

> It should be possible to allow zero lines on input depending on
> where the '+' check is.

Yes, I'm pretty sure a parser could cope with any of the zero length
sequence FASTQ examples I gave.

Peter

From biopython at maubp.freeserve.co.uk  Thu Jul 30 11:35:25 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 30 Jul 2009 16:35:25 +0100
Subject: [EMBOSS] FASTQ records with no sequence?
In-Reply-To: <320fb6e00907300819x35ae00c3wa20a382376134db7@mail.gmail.com>
References: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com>
	<4A71B7B5.40502@ebi.ac.uk>
	<320fb6e00907300819x35ae00c3wa20a382376134db7@mail.gmail.com>
Message-ID: <320fb6e00907300835v3a9d46d4w77c344bbf6efa08d@mail.gmail.com>

Hi all,

On the continuing topic of the nebulous FASTQ format, are there
any strong views as to whether a FASTQ file could hold records
without a sequence (and therefore no quality scores)? This could
make sense as output from an (aggressive) quality filter.

This was a discussion I meant to start on the OBF list, not the
EMBOSS list - so here is the start of the thread:
http://lists.open-bio.org/pipermail/emboss/2009-July/003707.html

Basically in some contexts an empty FASTQ record makes sense,
so perhaps we should include examples of this for our test suite.
However, there is more than one reasonable way to represent
such a record (either omitting the sequence and quality lines, or
including blank sequence and quality lines).

On Thu, Jul 30, 2009 at 4:09 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>
>> As we are recommending no line wrapping on output this means
>> typical FASTQ records would be four lines - so doing the same
>> makes sense here too.
>
> I vote for 4 lines on output.

If we want to allow zero length sequences, then yes, I would also
vote for the 4 line output (i.e. blank lines for the sequence and
the quality string).

> It should be possible to allow zero lines on input depending on
> where the '+' check is.

Yes, I'm pretty sure a parser could cope with any of the zero length
sequence FASTQ examples I gave.

Peter

From wzhao6898 at gmail.com  Fri Jul 31 12:21:06 2009
From: wzhao6898 at gmail.com (David Zhao)
Date: Fri, 31 Jul 2009 16:21:06 +0000 (UTC)
Subject: [EMBOSS] How to specify the number of partial digestion using DIGEST
Message-ID: <loom.20090731T161701-636@post.gmane.org>

Hi there,

I'm trying to use DIGEST to generate virtual tryptic digest for a list of 
proteins, where I'd like to get 2 or 3 miscleavage digestion. I know -overlap 
returns 1 miscleavage, and -allpartials returns all the combinations 
and result in a huge file, is there a way I can specify how many miscleavage I 
need?
Thanks,

David Zhao
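
As a possible stop-gap outside EMBOSS, here is a rough Python sketch
of a tryptic digest with a chosen maximum number of missed cleavages.
It is not the DIGEST algorithm: it assumes the simple textbook rule
(cut after K or R unless the next residue is P), and the function
names and the max_missed parameter are invented for this example.

```python
import re


def tryptic_sites(protein):
    """Return cut positions: after K or R, except when followed by P."""
    return [match.end() for match in re.finditer(r"[KR](?!P)", protein)]


def digest(protein, max_missed=0):
    """Return all peptides with at most max_missed missed cleavages."""
    inner = [cut for cut in tryptic_sites(protein) if cut < len(protein)]
    cuts = [0] + inner + [len(protein)]
    peptides = []
    for start in range(len(cuts) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(cuts):
                break  # ran off the end of the protein
            peptides.append(protein[cuts[start]:cuts[end]])
    return peptides


print(digest("AAKGGRCC", max_missed=0))  # ['AAK', 'GGR', 'CC']
print(digest("AAKGGRCC", max_missed=2))
```

With max_missed=2 the second call also lists the peptides spanning
one and two uncut sites, without producing every possible partial.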


From biopython at maubp.freeserve.co.uk  Mon Jul  6 10:12:23 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Jul 2009 11:12:23 +0100
Subject: [EMBOSS] Many-to-many with needle and water
Message-ID: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>

Hi Peter R. et al,

I gather EMBOSS is looking for feedback for new applications (given
the recent funding from the BBSRC - congratulations again). How about
suggestions for extensions to existing EMBOSS applications?

I've used bits of EMBOSS for several years now (thank you!). Something
I have sometimes wanted to do is a many-to-many pairwise sequence
alignment with the EMBOSS tools needle and water.

Right now, needle and water take two files (here referred to as A and
B), file A has just one sequence, and file B can have one or more
sequences. I'd like to be able to supply two files both with multiple
entries, and have needle/water do pairwise alignments between all the
sequences in A against all the sequences in B. This might be useful
for finding reciprocal best hits in comparative genomics (as a slower
but exact alternative to FASTA or BLAST).

From an implementation point of view, I might imagine doing sequence
A1 against all of B, then sequence A2 against all of B, etc. This
would require looping over file B many times (easy if on disk). This
would also work if the A input was stdin, but having the B input on
stdin would require caching the data if A has more than one sequence
:(

It may sometimes also be useful to have an all-against-all pairwise
comparison for a single set of sequences. The above suggested
enhancement would let you do this by comparing file A to file A.
However, here you only really need to do half the possible
combinations (as aligning sequence A1 to sequence A2 should be the
same as A2 to A1). This could be useful for implementing a basic
clustering algorithm, or maybe as part of a worked example in building
a simple NJ tree?
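
In Python terms the two cases differ only in how the pairs are
enumerated. A tiny sketch with made-up sequence names:

```python
from itertools import combinations, product


def pairs_between(ids_a, ids_b):
    """Every sequence in A against every sequence in B."""
    return list(product(ids_a, ids_b))


def pairs_within(ids):
    """All-against-all for one set, doing each unordered pair only once."""
    return list(combinations(ids, 2))


ids = ["A1", "A2", "A3", "A4"]
print(len(pairs_between(ids, ids)))  # 16
print(len(pairs_within(ids)))  # 6
```

For n sequences the second form gives n*(n-1)/2 alignments instead of
n*n, since it also skips aligning each sequence to itself.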

So, does supporting many-to-many comparisons sound like a useful
enhancement to needle and water?

I should stress this isn't something I need right now. Also, it can be
worked around with a wrapper script to call needle/water once for each
sequence in file A (against all the sequences in file B), with the
added bonus that these one-to-many comparisons can then be shared
across multiple CPU cores.
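
A rough sketch of such a wrapper in Python, assuming needle is on the
PATH and that file A is plain FASTA. The function names, the file
naming scheme and the default of four workers are all made up for
this example; -auto is used so needle takes its default gap penalties
without prompting.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_fasta(text):
    """Yield each FASTA record (header plus sequence lines) as one string."""
    record = []
    for line in text.splitlines():
        if line.startswith(">") and record:
            yield "\n".join(record) + "\n"
            record = []
        if line.strip():
            record.append(line)
    if record:
        yield "\n".join(record) + "\n"


def needle_command(a_path, b_path, out_path):
    """Build one one-to-many needle call (single sequence A against all of B)."""
    return ["needle", "-asequence", a_path, "-bsequence", b_path,
            "-outfile", out_path, "-auto"]


def run_many_to_many(a_fasta, b_fasta, workdir, workers=4):
    """Run needle once per sequence in a_fasta; return the output file names."""
    with open(a_fasta) as handle:
        records = list(split_fasta(handle.read()))
    commands, outputs = [], []
    for index, record in enumerate(records):
        a_path = os.path.join(workdir, "a_%05d.fasta" % index)
        with open(a_path, "w") as out:
            out.write(record)
        out_path = os.path.join(workdir, "a_%05d.needle" % index)
        commands.append(needle_command(a_path, b_fasta, out_path))
        outputs.append(out_path)
    # Threads are enough: each one only waits on a separate needle process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    return outputs
```

Something like run_many_to_many("a.fasta", "b.fasta", "work") would
then leave one alignment file per sequence of A in the work directory.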

Regards,

Peter C.


From pmr at ebi.ac.uk  Mon Jul  6 10:35:12 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 06 Jul 2009 11:35:12 +0100
Subject: [EMBOSS] Many-to-many with needle and water
In-Reply-To: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>
References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>
Message-ID: <4A51D360.1050107@ebi.ac.uk>

Peter Cock or biopython wrote:
> Hi Peter R. et al,
> 
> I gather EMBOSS is looking for feedback for new applications (given
> the recent funding from the BBSRC - congratulations again). How about
> suggestions for extensions to existing EMBOSS applications?
> 
> I've used bits of EMBOSS for several years now (thank you!). Something
> I have sometimes wanted to do is a many-to-many pairwise sequence
> alignment with the EMBOSS tools needle and water.
> 
> Right now, needle and water take two files (here referred to as A and
> B), file A has just one sequence, and file B can have one or more
> sequences. I'd like to be able to supply two files both with multiple
> entries, and have needle/water do pairwise alignments between all the
> sequences in A against all the sequences in B. This might be useful
> for finding reciprocal best hits in comparative genomics (as a slower
> but exact alternative to FASTA or BLAST).

The application is easy to add (after the release)

The usual problem with all-against-all is that it involves loading one
of the inputs as a sequence set entirely in memory - to avoid reading
one input many times over.

We have an application supermatcher which does this - the first sequence
is streamed through, the second is a sequence set loaded into memory. It
uses word matching to find seed alignments then runs a limited alignment
around the hits.

superwater would be a possible name (or superneedle).

How popular would such a program be?

How large would the smaller input set be?

regards,

Peter


From biopython at maubp.freeserve.co.uk  Mon Jul  6 10:58:06 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Jul 2009 11:58:06 +0100
Subject: [EMBOSS] Many-to-many with needle and water
In-Reply-To: <4A51D360.1050107@ebi.ac.uk>
References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>
	<4A51D360.1050107@ebi.ac.uk>
Message-ID: <320fb6e00907060358v71934109o1d8acb0aef8d3e83@mail.gmail.com>

On Mon, Jul 6, 2009 at 11:35 AM, Peter Rice <pmr at ebi.ac.uk> wrote:
>
> Peter C wrote:
> > Hi Peter R. et al,
> >
> > I gather EMBOSS is looking for feedback for new applications (given
> > the recent funding from the BBSRC - congratulations again). How about
> > suggestions for extensions to existing EMBOSS applications?
> >
> > I've used bits of EMBOSS for several years now (thank you!). Something
> > I have sometimes wanted to do is a many-to-many pairwise sequence
> > alignment with the EMBOSS tools needle and water.
> >
> > Right now, needle and water take two files (here referred to as A and
> > B), file A has just one sequence, and file B can have one or more
> > sequences. I'd like to be able to supply two files both with multiple
> > entries, and have needle/water do pairwise alignments between all the
> > sequences in A against all the sequences in B. This might be useful
> > for finding reciprocal best hits in comparative genomics (as a slower
> > but exact alternative to FASTA or BLAST).
>
> The application is easy to add (after the release)
>
> The usual problem with all-against-all is that it involves loading one
> of the inputs as a sequence set entirely in memory - to avoid reading
> one input many times over.

Right - and it would be difficult to decide if in memory vs reading the
file many times is best in general without some specific use cases.

[I suppose you could do something a bit more cunning like start by
caching the sequences as you read them ready for re-use, but if the
number of sequences crosses a threshold, stop caching and switch
to re-reading the file for subsequent loops?]
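
A rough Python sketch of that hybrid idea. The class name, the
open_records callable and the max_cached threshold are invented for
this example, and it only works when the source can be reopened (so
not for stdin):

```python
class ReplayableRecords:
    """Iterate over records many times, caching them only if they are few."""

    def __init__(self, open_records, max_cached=1000):
        self._open = open_records  # callable returning a fresh iterator
        self._max = max_cached
        self._cache = None  # filled in after one complete pass that fitted
        self._too_big = False

    def __iter__(self):
        if self._cache is not None:
            yield from self._cache  # later loops are served from memory
            return
        cache = None if self._too_big else []
        for record in self._open():
            if cache is not None:
                if len(cache) < self._max:
                    cache.append(record)
                else:
                    # Threshold crossed: give up caching and re-read from now on.
                    cache = None
                    self._too_big = True
            yield record
        if cache is not None:
            self._cache = cache


opened = []


def open_b():
    opened.append(1)  # count how often the source is really read
    return iter(["B1", "B2", "B3"])


small = ReplayableRecords(open_b, max_cached=10)
for _ in range(3):
    list(small)
print(len(opened))  # 1
```

With max_cached=2 the same three loops would read the source three
times instead.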

> We have an application supermatcher which does this - the first sequence
> is streamed through, the second is a sequence set loaded into memory. It
> uses word matching to find seed alignments then runs a limited alignment
> around the hits.
>
> superwater would be a possible name (or superneedle).

If you see many-to-many versions of water and needle as separate
applications, then those names sound fine.

> How popular would such a program be?

I don't know - as I said, this is more of a suggestion than a request.
I don't *need* this tool, but there have been occasions in the past
where I would have tried using it if it had existed.

Perhaps others on the list can think of better uses for this tool idea?

> How large would the smaller input set be?

Hard to say without specific examples in mind. For some hand waving
upper limits, for comparative genomics of bacteria using protein
sequences, you might have a few thousand in each file. If I was trying
this as part of an ad-hoc clustering algorithm (all-against-all), again
maybe a few thousand sequences. In practice, a heuristic tool like
supermatcher (or FASTA or BLAST) would probably be more sensible
for large datasets like this due to the computational time.

I see needle and water as most useful on smaller datasets where
the runtime cost of using an exact algorithm isn't too high. Therefore
many-to-many needle/water searches may be best targeted at
smaller sequence files. Things might be different with a multicore
or GPU/OpenCL version of needle and water ;)

Anyway, unless someone else thinks a many-to-many version
of needle and water would be useful, I wouldn't expect you to
implement this. I'm just putting the idea forward for discussion.

Regards,

Peter C.


From pmr at ebi.ac.uk  Mon Jul  6 11:19:30 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 06 Jul 2009 12:19:30 +0100
Subject: [EMBOSS] Many-to-many with needle and water
In-Reply-To: <320fb6e00907060358v71934109o1d8acb0aef8d3e83@mail.gmail.com>
References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>	
	<4A51D360.1050107@ebi.ac.uk>
	<320fb6e00907060358v71934109o1d8acb0aef8d3e83@mail.gmail.com>
Message-ID: <4A51DDC2.9030801@ebi.ac.uk>

Peter C wrote:
> [I suppose you could do something a bit more cunning like start by
> caching the sequences as you read them ready for re-use, but if the
> number of sequences crosses a threshold, stop caching and switch
> to re-reading the file for subsequent loops?]

Tricky. Rereading is not always possible - for example streamed standard
input as the data source.

> Perhaps others on the list can think of better uses for this tool idea?

Let's see what response we get. One never knows until the question is
asked :-)

>> How large would the smaller input set be?
> 
> Hard to say without specific examples in mind. For some hand waving
> upper limits, for comparative genomics of bacteria using protein
> sequences, you might have a few thousand in each file. If I was trying
> this as part of an ad-hoc clustering algorithm (all-against-all), again
> maybe a few thousand sequences. In practice, a heuristic tool like
> supermatcher (or FASTA or BLAST) would probably be more sensible
> for large datasets like this due to the computational time.
> 
> I see needle and water as most useful on smaller datasets where
> the runtime cost of using an exact algorithm isn't too high. Therefore
> many-to-many needle/water searches may be best targeted at
> smaller sequence files. Things might be different with a multicore
> or GPU/OpenCL version of needle and water ;)

Multicore would be a possibility - at least on systems configured for
it. We are looking into picking up methods from the BioManyCores project.

> Anyway, unless someone else thinks a many-to-many version
> of needle and water would be useful, I wouldn't expect you to
> implement this. I'm just putting the idea forward for discussion.

Implementing is easy - we could simply send you the code to install
locally if nobody else needs it :-)

After all, it is only a minor modification to the existing applications.

regards,

Peter


From biopython at maubp.freeserve.co.uk  Mon Jul  6 11:56:06 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Jul 2009 12:56:06 +0100
Subject: [EMBOSS] Probabilistic versions of needle/water?
Message-ID: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com>

Hi all,

I have another suggestion for new or enhanced EMBOSS applications,
again related to the existing pairwise sequence alignment tools needle
and water.

The FASTQ file format (or others) contains quality scores (often PHRED
scores) representing the probability of an error in the associated
nucleotide. Solexa/Illumina machines also provide another file with a
more precise breakdown of the likelihood of each of the four bases.

In some cases both sequences could have probability scores (e.g.
trying to align the ends of contigs to each other), but often one
sequence will be taken as fact (e.g. mapping reads onto a reference).

It is possible to take these probabilities into account when
considering the matches in needle (or water) by using a probabilistic
version of the Needleman-Wunsch sequence alignment algorithm (or a
probabilistic Smith-Waterman).
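
To make the idea concrete, here is a minimal Python sketch of a
quality-aware global alignment score. It is not GNUMAP's method and not
EMBOSS code. It assumes the reference is taken as fact, that a PHRED
score Q means an error probability of 10^(-Q/10), and that a miscalled
base is equally likely to be any of the other three. The function
names, the match/mismatch values and the linear gap penalty are all
assumptions chosen for illustration.

```python
def phred_to_error(q):
    """Convert a PHRED quality score to an error probability."""
    return 10 ** (-q / 10)


def expected_score(read_base, q, ref_base, match=1.0, mismatch=-1.0):
    """Expected substitution score given the read base's quality.

    With probability 1 - p the called base is correct; otherwise the true
    base is assumed to be one of the other three with probability p / 3 each.
    """
    p = phred_to_error(q)
    if read_base == ref_base:
        return (1 - p) * match + p * mismatch
    # A miscall could hide a true match (p / 3) or another mismatch (2p / 3).
    return (1 - p) * mismatch + (p / 3) * match + (2 * p / 3) * mismatch


def nw_score(read, quals, ref, gap=-2.0):
    """Needleman-Wunsch score (linear gaps) using expected_score per cell."""
    prev = [j * gap for j in range(len(ref) + 1)]
    for i, (base, q) in enumerate(zip(read, quals), start=1):
        cur = [i * gap]
        for j, ref_base in enumerate(ref, start=1):
            cur.append(max(prev[j - 1] + expected_score(base, q, ref_base),
                           prev[j] + gap,
                           cur[j - 1] + gap))
        prev = cur
    return prev[-1]
```

Under these assumptions a mismatch at a low-quality position costs less
than one at a high-quality position, which is the behaviour a
probabilistic needle or water would be expected to show.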

As an example of this idea, did you (Peter R) see the GNUMAP
talk/poster at ISMB 2009? See http://dna.cs.byu.edu/gnumap/

I am aware of people using EMBOSS tools (I assume water) to identify
(known) adaptor sequences in raw Solexa/Illumina data. I considered
doing something similar myself when trying to remove primer sequences
from 454 data. Such a pipeline using the current EMBOSS water would be
doing this matching at a purely fixed nucleotide level (ignoring the
qualities), which isn't ideal. Upgrading to a probabilistic version of
water should be an improvement.

Peter C.


From pmr at ebi.ac.uk  Mon Jul  6 12:32:18 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 06 Jul 2009 13:32:18 +0100
Subject: [EMBOSS] Probabilistic versions of needle/water?
In-Reply-To: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com>
References: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com>
Message-ID: <4A51EED2.3080901@ebi.ac.uk>

Peter C. wrote:
> I have another suggestion for new or enhanced EMBOSS applications,
> again related to the existing pairwise sequence alignment tools needle
> and water.
> 
> The FASTQ file format (or others) contains quality scores (often PHRED
> scores) representing the probability of an error in the associated
> nucleotide. Solexa/Illumina machines also provide another file with a
> more precise breakdown of the likelihood of each of the four bases.
> 
> In some cases both sequences could have probability scores (e.g.
> trying to align the ends of contigs to each other), but often one
> sequence will be taken as fact (e.g. mapping reads onto a reference).
> 
> It is possible to take these probabilities into account when
> considering the matches in needle (or water) by using a probabilistic
> version of the Needleman-Wunsch sequence alignment algorithm (or a
> probabilistic Smith-Waterman).
> 
> As an example of this idea, did you (Peter R) see the GNUMAP
> talk/poster at ISMB 2009? See http://dna.cs.byu.edu/gnumap/

I saw the talk, and was wondering about their algorithm. They did not
have a separate treatment for gaps in the reads and the consensus, which
seemed like an obvious extension.

> I am aware of people using EMBOSS tools (I assume water) to identify
> (known) adaptor sequences in raw Solexa/Illumina data. I considered
> doing something similar myself when trying to remove primer sequences
> from 454 data. Such a pipeline using the current EMBOSS water would be
> doing this matching at a purely fixed nucleotide level (ignoring the
> qualities), which isn't ideal. Upgrading to a probabilistic version of
> water should be an improvement.

Would be interesting.

Where can I look up adaptor calling methods?

Peter Rice


From biopython at maubp.freeserve.co.uk  Mon Jul  6 14:25:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Jul 2009 15:25:47 +0100
Subject: [EMBOSS] Probabilistic versions of needle/water?
In-Reply-To: <4A51EED2.3080901@ebi.ac.uk>
References: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com>
	<4A51EED2.3080901@ebi.ac.uk>
Message-ID: <320fb6e00907060725g380ce420vdf2f34c290518a16@mail.gmail.com>

On Mon, Jul 6, 2009 at 1:32 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
>> I am aware of people using EMBOSS tools (I assume water) to identify
>> (known) adaptor sequences in raw Solexa/Illumina data. I considered
>> doing something similar myself when trying to remove primer sequences
>> from 454 data. Such a pipeline using the current EMBOSS water would be
>> doing this matching at a purely fixed nucleotide level (ignoring the
>> qualities), which isn't ideal. Upgrading to a probabilistic version of
>> water should be an improvement.
>
> Would be interesting.
>
> Where can I look up adaptor calling methods?

The particular example I had in mind was the thread with Giles Weaver
on the BioPerl mailing list, which I see you have just replied to:

http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030398.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-July/030404.html

I think I made a typo earlier (needle versus water). If you are
comparing a short but complete adaptor sequence to a read
(which you expect may contain the full adaptor) doing a global
alignment is more sensible than a local one. On re-reading,
Giles did actually say he was using needle:
http://lists.open-bio.org/pipermail/bioperl-l/2009-July/030411.html

Peter


From michael.watson at bbsrc.ac.uk  Mon Jul  6 15:45:06 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Mon, 6 Jul 2009 16:45:06 +0100
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>

Hello

 
I did this on a test server a few weeks ago and wrote a script and it
worked.

 
Now I come to the production server - well, I get some errors.  Same OS,
same tar balls.  Very frustrating.  Can anyone help?

 
I have PHYLIPNEW-3.67 unpacked in EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67.
In that directory, I run configure and then make

 
I get a variety of errors, but this time it's:

 
Making all in src

make[1]: Entering directory
`/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src'

gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
-DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\"
-DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
-DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1
-DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1
-DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1
-DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1
-DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1
-DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1
-DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus
-I../../../plplot  -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64
-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  -DLENDIAN -DNO_AUTH  -O2
-D__amd64__ -MT clique.o -MD -MP -MF .deps/clique.Tpo -c -o clique.o
clique.c

mv -f .deps/clique.Tpo .deps/clique.Po

gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
-DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\"
-DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
-DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1
-DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1
-DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1
-DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1
-DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1
-DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1
-DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus
-I../../../plplot  -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64
-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  -DLENDIAN -DNO_AUTH  -O2
-D__amd64__ -MT disc.o -MD -MP -MF .deps/disc.Tpo -c -o disc.o disc.c

mv -f .deps/disc.Tpo .deps/disc.Po

gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
-DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\"
-DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
-DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1
-DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1
-DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1
-DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1
-DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1
-DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1
-DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus
-I../../../plplot  -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64
-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  -DLENDIAN -DNO_AUTH  -O2
-D__amd64__ -MT phylip.o -MD -MP -MF .deps/phylip.Tpo -c -o phylip.o
phylip.c

mv -f .deps/phylip.Tpo .deps/phylip.Po

/bin/sh ../libtool --tag=CC   --mode=link gcc  -O2 -D__amd64__   -o
fclique clique.o disc.o phylip.o ../../../nucleus/libnucleus.la
../../../ajax/libajax.la ../../../plplot/libeplplot.la -L/usr/lib64
-lX11  -lXaw -lXt -lm  -lgd -lpng -lz -lm

mkdir .libs

gcc -O2 -D__amd64__ -o .libs/fclique clique.o disc.o phylip.o
../../../nucleus/.libs/libnucleus.so ../../../ajax/.libs/libajax.so
../../../plplot/.libs/libeplplot.so -L/usr/lib64 -lX11 -lXaw -lXt -lgd
-lpng -lz -lm  -Wl,--rpath -Wl,/usr/local/lib

/usr/bin/ld: cannot find -lXaw

collect2: ld returned 1 exit status

make[1]: *** [fclique] Error 1

make[1]: Leaving directory
`/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src'

make: *** [all-recursive] Error 1

 
Any help would be appreciated.

 
Thanks

Mick

 
Head of Bioinformatics
Institute for Animal Health
Compton
Berks
RG20 7NN
01635 578411 

 
Please consider the environment and don't print this e-mail unless you
really need to.

The information contained in this message may be confidential or legally
privileged and is intended solely for the addressee. If you have
received this message in error please delete it & notify the originator
immediately.  Unauthorised use, disclosure, copying or alteration of
this message is forbidden & may be unlawful.  The contents of this
e-mail are the views of the sender and do not necessarily represent the
views of the Institute.   This email, and associated attachments, has
been checked locally for viruses but we can accept no responsibility
once it has left our systems.  Communications on Institute computers are
monitored to secure the effective operation of the systems and for other
lawful purposes.

 
The Institute for Animal Health is a company limited by guarantee,
registered in England no. 559784.  

The Institute is also a registered charity, Charity Commissioners
Reference No. 228824

 
From jasonzhang2002 at gmail.com  Mon Jul  6 16:35:21 2009
From: jasonzhang2002 at gmail.com (jason zhang)
Date: Mon, 06 Jul 2009 09:35:21 -0700
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <4A5227C9.7040004@gmail.com>

Hi, Michael
On my Fedora server I resolved this by running:

yum install libXaw libXaw-compat libXaw-devel

-jason



From steffen_moeller at gmx.de  Mon Jul  6 16:28:47 2009
From: steffen_moeller at gmx.de (Steffen Moeller)
Date: Mon, 06 Jul 2009 18:28:47 +0200
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <4A52263F.1050806@gmx.de>

Hi Michael,

michael watson (IAH-C) wrote:

> I did this on a test server a few weeks ago and wrote a script and it
> worked.
> 
> Now I come to the production server - well, I get some errors.  Same OS,
> same tar balls.  Very frustrating.  Can anyone help?
...
> 
> gcc -O2 -D__amd64__ -o .libs/fclique clique.o disc.o phylip.o
> ../../../nucleus/.libs/libnucleus.so ../../../ajax/.libs/libajax.so
> ../../../plplot/.libs/libeplplot.so -L/usr/lib64 -lX11 -lXaw -lXt -lgd
> -lpng -lz -lm  -Wl,--rpath -Wl,/usr/local/lib
> 
> /usr/bin/ld: cannot find -lXaw
> 
> collect2: ld returned 1 exit status

your production machine is probably not prepared for graphical user interfaces :)

Debian has it in the package libxaw7.

Many greetings

Steffen

$ dpkg -S /usr/lib/libXaw.so.7
libxaw7: /usr/lib/libXaw.so.7


From michael.watson at bbsrc.ac.uk  Tue Jul  7 08:18:39 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 7 Jul 2009 09:18:39 +0100
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
In-Reply-To: <4A5227C9.7040004@gmail.com>
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
	<4A5227C9.7040004@gmail.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508FD0DD3@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi Guys

Thanks for all the replies.

One last question: do I need the X11 development libraries in order for
fdrawgram and fdrawtree to produce postscript files?

Thanks
Michael



From ellis at biochem.dental.upenn.edu  Tue Jul  7 19:34:27 2009
From: ellis at biochem.dental.upenn.edu (Ellis Golub)
Date: Tue, 7 Jul 2009 15:34:27 -0400
Subject: [EMBOSS] Structure
Message-ID: <1090707153427.ZM647422@biochem.dental.upenn.edu>

Hello --

I've downloaded the EMBASSY STRUCTURE-0.1.0 package and tried to install it. I
have previously installed EMBOSS-6.0.1.  Using the same flags with configure
that I used for the original EMBOSS installation, all seems to go well until I
try to run one of the programs.  The following output is obtained:

> pdbparse
1274873:pdbparse: rld: Error: unresolvable symbol in
/usr/local/emboss/lib/libplplot.so.6: gdImageColorAllocate
1274873:pdbparse: rld: Error: unresolvable symbol in
/usr/local/emboss/lib/libplplot.so.6: gdImageLine
1274873:pdbparse: rld: Fatal Error: this executable has unresolvable symbols

The /usr/local/emboss/lib/libplplot.so.6 file is the one produced with the
EMBOSS install, and all the EMBOSS files work fine, and the graphics outputs are
fine.

Anyone have a clue?

Ellis

--
=============================================================
Ellis Golub                  Phone: (215) 898-4629
Biochemistry Department      FAX:   (215) 898-3695
School of Dental Medicine    ellis at biochem.dental.upenn.edu
University of Pennsylvania
240 South 40th Street
Philadelphia, PA 19104-6030
Home: http://biochem.dental.upenn.edu/~ellis/
PGP: http://biochem.dental.upenn.edu/~ellis/public_key.html
==============================================================


From lueck at ipk-gatersleben.de  Wed Jul  8 09:59:39 2009
From: lueck at ipk-gatersleben.de (lueck at ipk-gatersleben.de)
Date: Wed,  8 Jul 2009 11:59:39 +0200
Subject: [EMBOSS] Many-to-many with needle and water
In-Reply-To: <4A51D360.1050107@ebi.ac.uk>
References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>
	<4A51D360.1050107@ebi.ac.uk>
Message-ID: <20090708115939.8spikcm8zxi8cscw@webmail.ipk-gatersleben.de>

Hi!

I would be interested in "superneedle" if the speed is acceptable. I'm 
still looking for a fast and good enough short sequence finder/aligner 
(see 
http://www.mailinglistarchive.com/biopython at biopython.org/msg00941.html).

At the moment I'm still using BLAST because I have no time to switch 
but later I would like to change to another algorithm.

Kind regards
Stefanie

Quoting Peter Rice <pmr at ebi.ac.uk>:

> Peter Cock or biopython wrote:
>> Hi Peter R. et al,
>>
>> I gather EMBOSS is looking for feedback for new applications (given
>> the recent funding from the BBSRC - congratulations again). How about
>> suggestions for extensions to existing EMBOSS applications?
>>
>> I've used bits of EMBOSS for several years now (thank you!). Something
>> I have sometimes wanted to do is a many-to-many pairwise sequence
>> alignment with the EMBOSS tools needle and water.
>>
>> Right now, needle and water take two files (here referred to as A and
>> B), file A has just one sequence, and file B can have one or more
>> sequences. I'd like to be able to supply two files both with multiple
>> entries, and have needle/water do pairwise alignments between all the
>> sequences in A against all the sequences in B. This might be useful
>> for finding reciprocal best hits in comparative genomics (as a slower
>> but exact alternative to FASTA or BLAST).
>
> The application is easy to add (after the release)
>
> The usual problem with all-against-all is that it involves loading one
> of the inputs as a sequence set entirely in memory - to avoid reading
> one input many times over.
>
> We have an application supermatcher which does this - the first sequence
> is streamed through, the second is a sequence set loaded into memory. It
> uses word matching to find seed alignments then runs a limited alignment
> around the hits.
>
> superwater would be a possible name (or superneedle).
>
> How popular would such a program be?
>
> How large would the smaller input set be?
>
> regards,
>
> Peter


From pmr at ebi.ac.uk  Wed Jul  8 11:04:04 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 08 Jul 2009 12:04:04 +0100
Subject: [EMBOSS] Many-to-many with needle and water
In-Reply-To: <20090708115939.8spikcm8zxi8cscw@webmail.ipk-gatersleben.de>
References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com>	<4A51D360.1050107@ebi.ac.uk>
	<20090708115939.8spikcm8zxi8cscw@webmail.ipk-gatersleben.de>
Message-ID: <4A547D24.6070703@ebi.ac.uk>

Hi Stefanie,

lueck at ipk-gatersleben.de wrote:
> Hi!
> 
> I would be interested in "superneedle" if the speed is acceptable. I'm
> still looking for a fast and good enough short sequence finder/aligner
> (see
> http://www.mailinglistarchive.com/biopython at biopython.org/msg00941.html).
> 
> At the moment I'm still using BLAST because I have no time to switch but
> later I would like to change to another algorithm.

From the biopython message it seems you mainly want to find exact 21-mers.

Can you give an example of a query and a database so I can test the
speed of current EMBOSS applications like wordfinder?
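
For comparison, exact k-mer lookup on its own needs no alignment at
all. The Python sketch below is only an illustration and has nothing to
do with how wordfinder works internally. It assumes the task really is
locating exact 21-mers from a query in a set of database sequences; the
names build_index and find_exact, and the dict-of-strings input, are
made up for the example.

```python
from collections import defaultdict


def build_index(database, k=21):
    """Map each k-mer to the (sequence name, offset) pairs where it occurs.

    `database` is assumed to be a dict of name -> sequence string.
    """
    index = defaultdict(list)
    for name, seq in database.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def find_exact(query, index, k=21):
    """Return (query offset, sequence name, database offset) for exact hits."""
    query = query.upper()
    hits = []
    for i in range(len(query) - k + 1):
        for name, offset in index.get(query[i:i + k], ()):
            hits.append((i, name, offset))
    return hits
```

The index holds every k-mer of the database in memory, so this only
suits databases that fit comfortably in RAM.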

regards,

Peter Rice


From biopython at maubp.freeserve.co.uk  Wed Jul  8 21:50:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 8 Jul 2009 22:50:19 +0100
Subject: [EMBOSS] transeq and ambiguous codons
Message-ID: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>

Hi all,

Something I mentioned to Peter Rice in passing at BOSC/ISMB 2009 was
I'd found an oddity in transeq with certain ambiguous codons while
testing Biopython's translations. Here is a specific example (but I
suspect there are more). For reference, I am expecting EMBOSS transeq
to be using the NCBI tables:
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi

First consider the following example, the codon TAN, which can be TAA,
TAC, TAG or TAT which translate to stop or Y. Therefore the
translation of TAN should be "* or Y", and EMBOSS transeq opts for
"X". Which is fine:

$ transeq asis:TAATACTAGTATTAN -stdout -auto
>asis_1
*Y*YX

Similarly for the codon TNN, again EMBOSS transeq opts for "X" because
this could be a stop codon, or W, or F, or L, or S, or Y or C! Again,
this is fine:

$ transeq asis:TNN -stdout -auto
>asis_1
X

However, consider the codon TRR. R means A or G, so this can mean TAA,
TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
standard table agree here). Therefore the translation of TRR should be
"* or W", which I would expect based on the above examples to result
in "X". But instead EMBOSS transeq gives "*":

$ transeq asis:TAATGATAGTGGTRRTNN -stdout -auto
>asis_1
***W*X

I think this is a bug.
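
The "strict" rule argued for above can be sketched in a few lines of
Python. This is only an illustration of the expectation, not EMBOSS or
Biopython code: the function name translate_strict is made up for the
example, and the table is typed in from NCBI table 1. Each IUPAC codon
is expanded to all the plain codons it could stand for, and a residue is
reported only if every expansion agrees.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# NCBI standard table (table 1), codons in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_strict(codon):
    """Return a residue only if every expansion of the codon agrees, else 'X'."""
    expansions = product(*(IUPAC[base] for base in codon.upper()))
    residues = {STANDARD["".join(e)] for e in expansions}
    return residues.pop() if len(residues) == 1 else "X"


for codon in ("TAA", "TAN", "TNN", "TRR"):
    print(codon, translate_strict(codon))
```

Under this rule TAN, TNN and TRR all come out as X, because each covers
at least one stop codon and at least one coding codon.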

However, I am aware that the machine I tried this on is rather old,
and I don't actually know which version of EMBOSS it is. How can I
find out? As far as I know, there is no "-version" or "-v" or
"--version" switch, and the "-help" information doesn't include this
important piece of information. Nor is this in the FAQ:
http://emboss.sourceforge.net/docs/faq.html

So that makes two questions - how should transeq translate "TRR", and
how do I check the version of EMBOSS?

Thanks,

Peter C.


From SMarkel at accelrys.com  Wed Jul  8 23:53:05 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Wed, 8 Jul 2009 19:53:05 -0400
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
Message-ID: <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>

Peter,

Answer to question #2: run the program embossversion.

> embossversion
Writes the current EMBOSS version number to a file
6.0.1
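
For scripts that need the version, the output shown above can be parsed
rather than read by eye. The following Python sketch is an assumption
about how one might do that: the helper names are invented for the
example, the banner text is taken from the output above, and the call to
embossversion is only attempted when the program is found on the PATH.

```python
import re
import shutil
import subprocess

# A line consisting only of a dotted version number, e.g. 6.0.1
VERSION_RE = re.compile(r"^\d+(?:\.\d+)+$")


def parse_emboss_version(output):
    """Return the first line that looks like a dotted version number, or None."""
    for line in output.splitlines():
        line = line.strip()
        if VERSION_RE.match(line):
            return line
    return None


def installed_emboss_version():
    """Run embossversion if it is on the PATH; return None otherwise."""
    if shutil.which("embossversion") is None:
        return None
    result = subprocess.run(
        ["embossversion", "-auto"], capture_output=True, text=True
    )
    # The banner and the version may arrive on different streams, so scan both.
    return parse_emboss_version(result.stdout + "\n" + result.stderr)


sample = "Writes the current EMBOSS version number to a file\n6.0.1\n"
print(parse_emboss_version(sample))
```

Scanning for a version-shaped line avoids depending on whether the
one-line banner is printed.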

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics




From pmr at ebi.ac.uk  Thu Jul  9 07:38:34 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 09 Jul 2009 08:38:34 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
Message-ID: <4A559E7A.4020807@ebi.ac.uk>

Peter C. wrote:
> Hi all,
> 
> Something I mentioned to Peter Rice in passing at BOSC/ISMB 2009 was
> I'd found an oddity in transeq with certain ambiguous codons which
> testing Biopython's translations. Here is a specific example (but I
> suspect there are more). For reference, I am expecting EMBOSS transeq
> to be using the NCBI tables:
> http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi

Yes, those are our source.

> However, consider the codon TRR. R means A or G, so this can mean TAA,
> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
> standard table agree here). Therefore the translation of TRR should be
> "* or W", which I would expect based on the above examples to result
> in "X". But instead EMBOSS transeq gives "*":
> 
> $ transeq asis:TAATGATAGTGGTRRTNN -stdout -auto
> >asis_1
> ***W*X

Hmmm... something odd there. TRA translates as 'X'.

I will take a look, but we are bundling the 6.1.0 release so (unless I am 
very quick) any resolution will be a patch after the release.

> So that makes two questions - how should transeq translate "TRR", and
> how do I check the version of EMBOSS?

Scott got there first (different time zone I assume :-) with the suggestion 
of embossversion.

Meanwhile .... are there any translation utilities we could usefully 
include in a future version?

I am already contemplating one to report the translations of ambiguous 
codons, just for information in tracking and debugging translation tables - 
and explaining transeq and other applications to new users.

regards,

Peter Rice


From biopython at maubp.freeserve.co.uk  Thu Jul  9 08:59:52 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Jul 2009 09:59:52 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
	<1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>
Message-ID: <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>

On Thu, Jul 9, 2009 at 12:53 AM, Scott Markel<SMarkel at accelrys.com> wrote:
>
> Peter,
>
> Answer to question #2: run the program embossversion.
>
>> embossversion
> Writes the current EMBOSS version number to a file
> 6.0.1
>
> Scott

Thanks Scott (& Thomas) for pointing out the embossversion program.

I would still question why the EMBOSS tools don't also support the
Unix convention of a version switch. Hypothetically, aren't some
(many?) of the tools standalone and couldn't they be installed
individually (e.g. as part of someone else's software bundle)? i.e.
Can EMBOSS really guarantee that the needle tool and the
embossversion tool are in sync?

Peter


From pmr at ebi.ac.uk  Thu Jul  9 09:08:37 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 09 Jul 2009 10:08:37 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
Message-ID: <4A55B395.4090301@ebi.ac.uk>

Peter C. wrote:
> However, consider the codon TRR. R means A or G, so this can mean TAA,
> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
> standard table agree here). Therefore the translation of TRR should be
> "* or W", which I would expect based on the above examples to result
> in "X". But instead EMBOSS transeq gives "*":

This is a side effect of the way backtranslation works.

EMBOSS calculates the "most ambiguous codon" for each amino acid and
stop, and uses this for back translation. Thus a '*' in a protein
sequence would be rendered as 'TRR' by backtranseq. To provide
consistent translation of the backtranseq results, TRR is assumed to be
a backtranslated stop. Similarly, MGN is 'R' because it could reasonably
result from a backtranslation of 'R'.
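
One way to read "most ambiguous codon" is: take all codons for a
residue and, at each of the three positions, use the IUPAC code for the
set of bases seen there. The Python sketch below follows that reading.
It is an assumption made for illustration, not code taken from the
EMBOSS source, and the function names are invented. For the standard
table it reproduces TRR for stop and MGN for R, and it also shows why
TRR is not a "pure" stop: the per-position union lets TGG (W) in.

```python
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> IUPAC code.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}

# NCBI standard table (table 1), codons in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def most_ambiguous_codon(residue):
    """Per-position union of all codons for a residue, written in IUPAC codes."""
    codons = [c for c, aa in STANDARD.items() if aa == residue]
    return "".join(CODE_FOR[frozenset(c[i] for c in codons)] for i in range(3))


def residues_covered(codon):
    """All residues that the expansions of an IUPAC codon can encode."""
    return {STANDARD["".join(e)] for e in product(*(IUPAC[b] for b in codon))}


for residue in "*RI":
    codon = most_ambiguous_codon(residue)
    print(residue, codon, sorted(residues_covered(codon)))
```

The second column is the backtranslated codon; the list after it is what
a strict forward translation would have to choose between.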

I agree that it would also be reasonable to be strict about translation
in transeq and render TRR as 'X'.

It depends on your philosophy of where the ambiguity codes came from -
from backtranslation, or the curious mind of a bioinformatician :-)

So .... it's not a bug, it's a feature ... which means I can relax for
now and contemplate some extras in the next release.

In future, we will at least make sure TRA and other 'unambiguous
ambiguous codons' get understood as '*' etc. TRR I would prefer to leave
as it is by default, with an option for rendering it as 'X', or an
alternative to transeq with the strict translation rules enforced.

regards,

Peter Rice


From pmr at ebi.ac.uk  Thu Jul  9 09:16:40 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 09 Jul 2009 10:16:40 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>	<1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>
	<320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>
Message-ID: <4A55B578.1060808@ebi.ac.uk>

Peter C. wrote:

> Thanks Scott (& Thomas) for pointing out the embossversion program.
> 
> I would still question why the EMBOSS tools don't also support the
> Unix convention of a version switch. Hypothetically, aren't some
> (many?) of the tools standalone and couldn't they be installed
> individually (e.g. as part of someone else's software bundle)? i.e.
> Can EMBOSS really guarantee that the needle tool and the
> embossversion tool are in sync?

We could easily add a -version global qualifier ... for the next release.

We can guarantee that embossversion and needle are in sync, assuming
they are built using the same libraries, as that is where the version is
recorded. Standalone builds are an issue though, and a -version
qualifier would help debugging in a few cases.

regards,

Peter Rice


From biopython at maubp.freeserve.co.uk  Thu Jul  9 09:21:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Jul 2009 10:21:45 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <4A55B578.1060808@ebi.ac.uk>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
	<1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>
	<320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>
	<4A55B578.1060808@ebi.ac.uk>
Message-ID: <320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com>

On Thu, Jul 9, 2009 at 10:16 AM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>
>> Thanks Scott (& Thomas) for pointing out the embossversion program.
>>
>> I would still question why the EMBOSS tools don't also support the
>> Unix convention of a version switch. Hypothetically, aren't some
>> (many?) of the tools standalone and couldn't they be installed
>> individually (e.g. as part of someone else's software bundle)? i.e.
>> Can EMBOSS really guarantee that the needle tool and the
>> embossversion tool are in sync?
>
> We could easily add a -version global qualifier ... for the next release.
>
> We can guarantee that embossversion and needle are in sync - assuming
> they are built using the same libraries as that is where the version is
> recorded. Standalone build are an issue though and it would help debug
> in a few cases.

That sounds good to me :)

Peter C.


From biopython at maubp.freeserve.co.uk  Thu Jul  9 09:28:20 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Jul 2009 10:28:20 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
Message-ID: <320fb6e00907090228u7fd0df2by1e2ca18fc0a5c43@mail.gmail.com>

On Wed, Jul 8, 2009 at 10:50 PM, Peter<biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> Something I mentioned to Peter Rice in passing at BOSC/ISMB 2009 was
> I'd found an oddity in transeq with certain ambiguous codons while
> testing Biopython's translations. Here is a specific example (but I
> suspect there are more). For reference, I am expecting EMBOSS transeq
> to be using the NCBI tables:
> http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
>
> First consider the following example, the codon TAN, which can be TAA,
> TAC, TAG or TAT which translate to stop or Y. Therefore the
> translation of TAN should be "* or Y", and EMBOSS transeq opts for
> "X". Which is fine:

Using raw output instead of the default FASTA works better in emails:

$ transeq asis:TAATACTAGTATTAN -stdout -auto -osformat raw
*Y*YX

> Similarly for the codon TNN, again EMBOSS transeq opts for "X" because
> this could be a stop codon, or W, or F, or L, or S, or Y or C! Again,
> this is fine:

Again, using raw output works better in emails:

$ transeq asis:TNN -stdout -auto -osformat raw
X

> However, consider the codon TRR. R means A or G, so this can mean TAA,
> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
> standard table agree here). Therefore the translation of TRR should be
> "* or W", which I would expect based on the above examples to result
> in "X". But instead EMBOSS transeq gives "*":

Again, using raw output works better in emails:

$ transeq asis:TAATGATAGTGGTRR -stdout -auto -osformat raw
***W*

> I think this is a bug.
>
> However, I am aware that the machine I tried this on is rather old,
> and I don't actually know which version of EMBOSS it is.

I can check the old machine later, but I just retested on a Mac using
EMBOSS 6.0.1 (the current release), and see the same behaviour.

Peter C.


From michael.watson at bbsrc.ac.uk  Thu Jul  9 11:33:59 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 9 Jul 2009 12:33:59 +0100
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
	<4A5227C9.7040004@gmail.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>

Hmmm.  My sys admin says....

I can find:
libXaw-devel.x86_64 : X.Org X11 libXaw development package
libXaw.x86_64 : X.Org X11 libXaw runtime library
libXaw.i386 : X.Org X11 libXaw runtime library
libXaw-devel.i386 : X.Org X11 libXaw development package

Have I got compatibility issues here?


-----Original Message-----
From: jason zhang [mailto:jasonzhang2002 at gmail.com]
Sent: Mon 06/07/2009 5:35 PM
To: michael watson (IAH-C)
Cc: emboss at lists.open-bio.org
Subject: Re: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
 
Hi, Michael
I did this in my fedora server : yum install libXaw libXaw-compat 
libXaw-devel to resolve this.

-jason

michael watson (IAH-C) wrote:
> Hello
>
>  
>
> I did this on a test server a few weeks ago and wrote a script and it
> worked.
>
>  
>
> Now I come to the production server - well, I get some errors.  Same OS,
> same tar balls.  Very frustrating.  Can anyone help?
>
>  
>
> I have PHYLIPNEW-3.67 unpacked in EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67.
> In that directory, I run configure and then make
>
>  
>
> I get a variety of errors, but this time it's:
>
>  
>
> Making all in src
>
> make[1]: Entering directory
> `/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src'
>
> gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
> -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\"
> -DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
> -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1
> -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1
> -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1
> -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1
> -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1
> -DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1
> -DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus
> -I../../../plplot  -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64
> -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  -DLENDIAN -DNO_AUTH  -O2
> -D__amd64__ -MT clique.o -MD -MP -MF .deps/clique.Tpo -c -o clique.o
> clique.c
>
> mv -f .deps/clique.Tpo .deps/clique.Po
>
> gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
> -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\"
> -DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
> -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1
> -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1
> -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1
> -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1
> -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1
> -DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1
> -DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus
> -I../../../plplot  -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64
> -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  -DLENDIAN -DNO_AUTH  -O2
> -D__amd64__ -MT disc.o -MD -MP -MF .deps/disc.Tpo -c -o disc.o disc.c
>
> mv -f .deps/disc.Tpo .deps/disc.Po
>
> gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
> -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\"
> -DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
> -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1
> -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1
> -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1
> -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1
> -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1
> -DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1
> -DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus
> -I../../../plplot  -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64
> -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  -DLENDIAN -DNO_AUTH  -O2
> -D__amd64__ -MT phylip.o -MD -MP -MF .deps/phylip.Tpo -c -o phylip.o
> phylip.c
>
> mv -f .deps/phylip.Tpo .deps/phylip.Po
>
> /bin/sh ../libtool --tag=CC   --mode=link gcc  -O2 -D__amd64__   -o
> fclique clique.o disc.o phylip.o ../../../nucleus/libnucleus.la
> ../../../ajax/libajax.la ../../../plplot/libeplplot.la -L/usr/lib64
> -lX11  -lXaw -lXt -lm  -lgd -lpng -lz -lm
>
> mkdir .libs
>
> gcc -O2 -D__amd64__ -o .libs/fclique clique.o disc.o phylip.o
> ../../../nucleus/.libs/libnucleus.so ../../../ajax/.libs/libajax.so
> ../../../plplot/.libs/libeplplot.so -L/usr/lib64 -lX11 -lXaw -lXt -lgd
> -lpng -lz -lm  -Wl,--rpath -Wl,/usr/local/lib
>
> /usr/bin/ld: cannot find -lXaw
>
> collect2: ld returned 1 exit status
>
> make[1]: *** [fclique] Error 1
>
> make[1]: Leaving directory
> `/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src'
>
> make: *** [all-recursive] Error 1
>
>  
>
> Any help would be appreciated.
>
>  
>
> Thanks
>
> Mick
>
>  
>
> Head of Bioinformatics
> Institute for Animal Health
> Compton
> Berks
> RG20 7NN
> 01635 578411 
>
>  
>
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>
>   


From ajb at ebi.ac.uk  Thu Jul  9 12:08:14 2009
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Thu, 9 Jul 2009 13:08:14 +0100 (BST)
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
	<4A5227C9.7040004@gmail.com>
	<8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk>

> Hmmm.  My sys admin says....
>
> I can find:
> libXaw-devel.x86_64 : X.Org X11 libXaw development package
> libXaw.x86_64 : X.Org X11 libXaw runtime library
> libXaw.i386 : X.Org X11 libXaw runtime library
> libXaw-devel.i386 : X.Org X11 libXaw development package
>
> Have I got compatibility issues here?

What operating system and version are you running?
We're at the bundling stage here for the next release and there
might be an opportunity for us to check.

Alan


From michael.watson at bbsrc.ac.uk  Thu Jul  9 12:22:10 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 9 Jul 2009 13:22:10 +0100
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk>
	<4A5227C9.7040004@gmail.com>
	<8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>
	<35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B292CE@iahce2ksrv1.iah.bbsrc.ac.uk>

I think it's Red Hat Enterprise Linux 5 64 bit version....



From A.J.Pemberton at bham.ac.uk  Thu Jul  9 13:12:01 2009
From: A.J.Pemberton at bham.ac.uk (Anthony Pemberton)
Date: Thu, 9 Jul 2009 14:12:01 +0100
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
In-Reply-To: <35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk><4A5227C9.7040004@gmail.com><8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>
	<35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk>
Message-ID: <D39A1B623AED9D458065D38B8EA52780698BD7@BIOEX2.adf.bham.ac.uk>

Guys,

The system in question is probably a 64-bit system. Usually (though not
always) there is no compatibility issue between the packages you list;
they are simply the 32-bit and 64-bit versions of the libraries. On
RHEL 5 they install into /usr/lib and /usr/lib64 respectively. Under
RHEL 5 just run:

# up2date -i libXaw libXaw-devel

(# represents the root prompt)

I am fairly sure a good install of the packages requires these
libraries. You will need a Red Hat subscription for up2date to work (or
install from your distro CD/DVD).
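
The "cannot find -lXaw" error from ld generally means there is no
libXaw.so in the directories the linker searches; on many distributions
that unversioned file is shipped by the -devel package rather than the
runtime one. A quick way to check before re-running make is sketched
below in Python. The function name and the default directory list are
assumptions for this example (the directories are the two mentioned
above), not part of any EMBOSS tooling.

```python
import os


def find_link_library(name, lib_dirs=("/usr/lib64", "/usr/lib")):
    """Return the first lib<name>.so found in lib_dirs, or None if absent."""
    for directory in lib_dirs:
        candidate = os.path.join(directory, "lib" + name + ".so")
        if os.path.exists(candidate):
            return candidate
    return None


# None here would be consistent with the link error in the build log.
print(find_link_library("Xaw"))
```

On a 64-bit build the file needs to be present in the 64-bit directory;
a copy that exists only under /usr/lib would not satisfy -L/usr/lib64.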

Regards,

Tony Pemberton




From biopython at maubp.freeserve.co.uk  Fri Jul 10 09:14:42 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 10 Jul 2009 10:14:42 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <4A55B395.4090301@ebi.ac.uk>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
	<4A55B395.4090301@ebi.ac.uk>
Message-ID: <320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com>

On Thu, Jul 9, 2009 at 10:08 AM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>> However, consider the codon TRR. R means A or G, so this can mean TAA,
>> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
>> standard table agree here). Therefore the translation of TRR should be
>> "* or W", which I would expect based on the above examples to result
>> in "X". But instead EMBOSS transeq gives "*":
>
> This is a side effect of the way backtranslation works...

OK, leaving TRR aside for the moment (I'm not sure I'd have done it that
way, but I think I follow your logic), I have some more problem cases for
you to consider (all using the default standard NCBI table 1).

Most of these are 'unambiguous ambiguous codons' as you put it, and
I would agree using X when a more specific letter is possible isn't ideal
but isn't actually wrong. The "ATS" and related codons (see below)
however are simply wrong.

--------------------------------------------------------------------------------------

TRA means TAA or TGA, which are both stop codons. Therefore TRA
should translate as a stop, not as an X:

$ transeq asis:TAATGATRA -stdout -auto -osformat raw
**X

--------------------------------------------------------------------------------------

Now look at YTA, which means CTA or TTA which encode L, so
YTA should be L not X:

$ transeq asis:CTATTAYTA -stdout -auto -osformat raw
LLX

Likewise for YTG and YTR (but not YTN, which also covers TTC and TTT,
both of which encode F).

--------------------------------------------------------------------------------------

Another example, ATW means ATA or ATT, which both translate as I,
so ATW should translate as I not X:

$ transeq asis:ATAATTATW -stdout -auto -osformat raw
IIX

--------------------------------------------------------------------------------------

Conversely, ATS which means ATC or ATG which translate as I and M.
Remember S means G or C. Therefore ATS should translate as X, and
not I:

$ transeq asis:ATCATGATS -stdout -auto -osformat raw
IMI

Note that H means A, C or T (not G), so ATH covers only ATA, ATC and
ATT, all of which translate as I. The I reported for ATH is therefore
correct, unlike the ATS case above:

$ transeq asis:ATAATCATGATH -stdout -auto -osformat raw
IIMI

[*** The ATS result strikes me as a clear bug ***]

--------------------------------------------------------------------------------------

Now for another debatable one, RAT means AAT or GAT which code
for N and D. So, you could use B (Asx) here rather than the broader X.

$ transeq asis:AATGATRAT -stdout -auto -osformat raw
NDX

Again, the same thing for others like RAC -> X not B, and RAY -> X not B.

Similarly, you don't use J to mean leucine (L) or isoleucine (I), and
opt for X (again, this is justifiable). e.g. WTA

$ transeq asis:ATATTAWTA -stdout -auto -osformat raw
ILX
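
The two policies in this section (always X, or B/Z/J where they apply)
differ only in the last step, once the set of possible residues for a
codon is known. A small Python sketch of that step follows; the function
name and the use_ambiguity_codes switch are invented for the
illustration and do not correspond to any transeq option.

```python
# Protein ambiguity codes: B = D or N, Z = E or Q, J = I or L.
PROTEIN_AMBIGUITY = {
    frozenset("DN"): "B",
    frozenset("EQ"): "Z",
    frozenset("IL"): "J",
}


def collapse(residues, use_ambiguity_codes=False):
    """Reduce a set of candidate residues to a single output letter."""
    residues = frozenset(residues)
    if len(residues) == 1:
        return next(iter(residues))
    if use_ambiguity_codes:
        return PROTEIN_AMBIGUITY.get(residues, "X")
    return "X"


print(collapse("ND"), collapse("ND", use_ambiguity_codes=True))
print(collapse("IL"), collapse("IL", use_ambiguity_codes=True))
```

Any set that is not exactly one of the three pairs still falls back to
X, so a stop-or-W codon such as TRR is X under either policy.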

--------------------------------------------------------------------------------------

This list is only partial, and only for the standard table.

Peter C.
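
For anyone wanting the full list rather than a partial one, the cases
can be generated mechanically. The Python sketch below enumerates every
codon over the 15-letter IUPAC alphabet and keeps those ambiguous codons
whose expansions all agree on a single residue under the standard table.
It is an independent illustration with invented names, typed in from
NCBI table 1, and is not derived from transeq or Biopython.

```python
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# NCBI standard table (table 1), codons in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residues_for(codon):
    """Residues encoded by all plain expansions of an IUPAC codon."""
    return {STANDARD["".join(e)] for e in product(*(IUPAC[b] for b in codon))}


# Every IUPAC codon except the 64 plain ones.
ambiguous = [
    "".join(c) for c in product(IUPAC, repeat=3) if "".join(c) not in STANDARD
]

# Ambiguous codons that nevertheless have exactly one possible translation.
resolvable = {}
for codon in ambiguous:
    residues = residues_for(codon)
    if len(residues) == 1:
        resolvable[codon] = next(iter(residues))

# H = A, C or T; S = C or G; anything not resolvable is reported as X.
for codon in ("TRA", "YTA", "YTR", "ATW", "ATH", "ATS", "TRR"):
    print(codon, resolvable.get(codon, "X"))
```

Comparing the resolvable dictionary against the output of a translation
tool, codon by codon, would separate the "X when a residue was possible"
cases from the ones where a residue is reported but is not the only
possibility.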


From pmr at ebi.ac.uk  Fri Jul 10 09:30:52 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 10 Jul 2009 10:30:52 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>	
	<4A55B395.4090301@ebi.ac.uk>
	<320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com>
Message-ID: <4A570A4C.2000500@ebi.ac.uk>

Peter C. wrote:
> OK, leaving TRR aside for the moment (I'm not sure I'd have done it that
> way, but I think I follow your logic), I have some more problem cases for
> you to consider (all using the default standard NCBI table 1).
> 
> Most of these are 'unambiguous ambiguous codons' as you put it, and
> I would agree using X when a more specific letter is possible isn't ideal
> but isn't actually wrong. The "ATS" and related codons (see below)
> however are simply wrong.

They do look wrong. The "X when it could pick a residue" ones I knew of.

The others need a closer look. The plan is to work through all possible 
codons and all the NCBI genetic codes as soon as the release is out.

It should be a simple patch to ajtranslate.c when I'm done.

> --------------------------------------------------------------------------------------
> 
> Now for another debatable one, RAT means AAT or GAT which code
> for N and D. So, you could use B (Asx) here rather than the broader X.
> 
> Similarly, you don't use J to mean leucine (L) or to isoleucine (I), and
> opt for X (again, this is justifiable). e.g. WTA

Hmmm ... B and Z are ambiguity codes for amino acid analysers, where all the 
amide bonds are broken, and that includes N->D and Q->E. We used to have one 
of those in the lab. Similarly, J is for mass spec, where I and L have the 
same molecular weight. I don't consider them appropriate for translation.

So I plan to go for unique amino acids where possible with the ambiguity codes.

What do our users think?

regards,

Peter


From biopython at maubp.freeserve.co.uk  Fri Jul 10 23:10:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 11 Jul 2009 00:10:19 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <4A570A4C.2000500@ebi.ac.uk>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
	<4A55B395.4090301@ebi.ac.uk>
	<320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com>
	<4A570A4C.2000500@ebi.ac.uk>
Message-ID: <320fb6e00907101610o63140f2cp7f0aebd1cc7022d1@mail.gmail.com>

On Fri, Jul 10, 2009 at 10:30 AM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>>
>> OK, leaving TRR aside for the moment (I'm not sure I'd have done it that
>> way, but I think I follow your logic), I have some more problem cases for
>> you to consider (all using the default standard NCBI table 1).
>>
>> Most of these are 'unambiguous ambiguous codons' as you put it, and
>> I would agree using X when a more specific letter is possible isn't ideal
>> but isn't actually wrong. The "ATS" and related codons (see below)
>> however are simply wrong.
>
> They do look wrong. The "X when it could pick a residue" ones I knew of.
>
> The others need a closer look. The plan is to work through all possible
> codons and all the NCBI genetic codes as soon as the release is out.
>
> It should be a simple patch to ajtranslate.c when I'm done.
>

OK - I appreciate this is too last minute for the imminent EMBOSS release.

>> --------------------------------------------------------------------------------------
>>
>> Now for another debatable one, RAT means AAT or GAT which code
>> for N and D. So, you could use B (Asx) here rather than the broader X.
>>
>> Similarly, you don't use J to mean leucine (L) or isoleucine (I), and
>> opt for X (again, this is justifiable), e.g. WTA
>
> Hmmm ... B and Z are ambiguity codes for amino acid analyser where all the
> amide bonds are broken and that includes N->D and Q->E. We used to have one
> of those in the lab. Similarly, J is for mass spec where I and L have the
> same molecular weight. I don't consider them appropriate for translation.

Well, as I said, this is debatable. On the one hand B and Z are IUPAC standards
(although J isn't yet), but amino acids don't have the full ambiguous alphabet
that we have for nucleotides so some might find such a translation surprising.
http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html

> So I plan to go for unique amino acids where possible with the ambiguity
> codes.

Good :)
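
For anyone following the thread, here is a minimal Python sketch of the
behaviour being discussed. It is not the ajtranslate.c logic, just an
illustration: it expands each IUPAC nucleotide code, translates every
concrete codon with NCBI table 1, and returns the residue if it is unique.
The function name translate_codon and the use_bz switch are hypothetical;
use_bz only exists to show the debatable B/Z option and is off by default.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# NCBI genetic code table 1 (standard), with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon, use_bz=False):
    """Translate one codon that may contain IUPAC ambiguity codes.

    Returns the single residue (or '*') if every expansion agrees,
    optionally B (N or D) or Z (Q or E), and X otherwise.
    """
    codon = codon.upper().replace("U", "T")
    expansions = product(*(IUPAC[base] for base in codon))
    residues = {TABLE["".join(triplet)] for triplet in expansions}
    if len(residues) == 1:
        return residues.pop()
    if use_bz and residues == {"N", "D"}:
        return "B"
    if use_bz and residues == {"Q", "E"}:
        return "Z"
    return "X"
```

With this sketch YTA gives L (both CTA and TTA code for leucine), while
RAT, WTA and ATS all give X unless use_bz is switched on, in which case
RAT gives B.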

Peter C.


From aradwen at gmail.com  Sat Jul 11 10:29:40 2009
From: aradwen at gmail.com (Radwen ANIBA)
Date: Sat, 11 Jul 2009 12:29:40 +0200
Subject: [EMBOSS] Problem with protein caracters
Message-ID: <e591b1bd0907110329x7dfdefd1md08c5e99aeb8df2e@mail.gmail.com>

Hi everyone

I'm trying to use some programs that come with the EMBOSS package to analyze
some protein sequences, but I sometimes get this message:

Error: ajSeqTypeCheckIn: Sequence must be protein sequence without BZ U X or
*: found bad character 'X'

Is there any way to force the program to accept these types of residues?

Thank you

radwen


From pmr at ebi.ac.uk  Sat Jul 11 10:54:21 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Sat, 11 Jul 2009 11:54:21 +0100
Subject: [EMBOSS] Problem with protein caracters
In-Reply-To: <e591b1bd0907110329x7dfdefd1md08c5e99aeb8df2e@mail.gmail.com>
References: <e591b1bd0907110329x7dfdefd1md08c5e99aeb8df2e@mail.gmail.com>
Message-ID: <4A586F5D.5060702@ebi.ac.uk>

Radwen ANIBA wrote:
> I'm trying to use some programs that come with the EMBOSS package to analyze
> some protein sequences, but I sometimes get this message:
> 
> Error: ajSeqTypeCheckIn: Sequence must be protein sequence without BZ U X or
> *: found bad character 'X'
> 
> Is there any way to force the program to accept these types of residues?

EMBOSS uses the type attribute of the input sequence (or seqset or 
seqall) to identify the type of the input sequence (nucleotide, protein, 
or any) and the characters that are allowed (gaps, stops, non-standard 
residues and ambiguity characters).

Your application is expecting "pureprotein". This is only used by 
applications unable to handle the ambiguity codes (it can be difficult 
to define what an algorithm should do with them).

The alternatives are:

protein - accepts all characters, converts stops to X
proteinstandard - converts U,O and J to 'X'
stopproteinstandard - converts stops, U, O, J to X

"protein" is probably what you want. You need to be able to do something 
with the ambiguity codes X, B, Z and J and with the non-standard amino 
acids U (selenocysteine) and O (pyrrolysine).
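
As a rough illustration of the rules listed above, the following Python
sketch mimics them outside EMBOSS. It is not EMBOSS code: the function
name check_sequence and the two lookup tables are invented for this
example, and the conversions simply follow the descriptions in this
message and the characters named in the error text.

```python
# Characters converted to X for each type, as described in the message.
CONVERT_TO_X = {
    "protein": "*",
    "proteinstandard": "UOJ",
    "stopproteinstandard": "*UOJ",
}

# Characters the "pureprotein" error message says are not allowed.
PURE_REJECTED = "BZUX*"


def check_sequence(seq, seqtype):
    """Return seq as the given type would see it, or raise ValueError."""
    seq = seq.upper()
    if seqtype == "pureprotein":
        for ch in seq:
            if ch in PURE_REJECTED:
                raise ValueError(f"found bad character '{ch}'")
        return seq
    targets = CONVERT_TO_X[seqtype]
    return "".join("X" if ch in targets else ch for ch in seq)
```

Calling check_sequence("MKX", "pureprotein") raises an error much like
the one reported, while the same sequence passes unchanged as "protein".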

Hope this helps

Peter Rice


From liisa.koski at dnalandmarks.ca  Mon Jul 13 13:56:37 2009
From: liisa.koski at dnalandmarks.ca (Liisa Koski)
Date: Mon, 13 Jul 2009 09:56:37 -0400
Subject: [EMBOSS] fuzznuc score
Message-ID: <OF437A9FFD.DAEF22FD-ON852575F2.004C8704-852575F2.004C9866@notes.basf-corp.com>

Hello,

I would like to know how the score is calculated for the fuzznuc results. 
What is taken into consideration (strand, exact matches, etc.)? What does a 
high score vs. a lower score mean?

Many thanks,

Liisa


From pmr at ebi.ac.uk  Mon Jul 13 14:43:18 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 13 Jul 2009 15:43:18 +0100
Subject: [EMBOSS] fuzznuc score
In-Reply-To: <OF437A9FFD.DAEF22FD-ON852575F2.004C8704-852575F2.004C9866@notes.basf-corp.com>
References: <OF437A9FFD.DAEF22FD-ON852575F2.004C8704-852575F2.004C9866@notes.basf-corp.com>
Message-ID: <4A5B4806.1080905@ebi.ac.uk>

Liisa Koski wrote:
> Hello,
> 
> I would like to know how the score is calculated for the fuzznuc results. 
> What is taken into consideration (strand, exact matches, etc.)? What does a 
> high score vs. a lower score mean?

The score in EMBOSS report output is a required field for standard
formats such as GFF.

For fuzznuc the score is the length of the sequence that is found, minus
any mismatches.

Patterns that can match very long sequences can therefore give a very
high score.
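
As a worked illustration of that rule, here is a small Python sketch. It
is not the fuzznuc source: the function name match_score is made up, and
it assumes a fixed-length pattern of plain letters with no ambiguity
codes or variable-length ranges.

```python
def match_score(pattern, hit):
    """Score a hit as its length minus the number of mismatches."""
    if len(pattern) != len(hit):
        raise ValueError("sketch only handles fixed-length patterns")
    mismatches = sum(
        1 for p, h in zip(pattern.upper(), hit.upper()) if p != h
    )
    return len(hit) - mismatches
```

An exact 8-base match would score 8 under this rule, and the same hit
with one mismatch would score 7.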

Hope this helps,

Peter Rice


From ajb at ebi.ac.uk  Wed Jul 15 11:19:33 2009
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Wed, 15 Jul 2009 12:19:33 +0100 (BST)
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
Message-ID: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>

Dear EMBOSS users and developers,

A new version of EMBOSS (6.1.0) is now available for download
from our ftp server:

   ftp://emboss.open-bio.org/pub/EMBOSS/


If you use any of the EMBASSY packages (e.g. PHYLIP, VIENNA etc) then,
as usual, remember to re-download and compile those too.

A new version of mEMBOSS, the Windows port, is also available from:

   ftp://emboss.open-bio.org/pub/EMBOSS/windows/mEMBOSS-6.1.0-setup.exe

Many new capabilities have been added and bugs fixed throughout.


Release highlights for EMBOSS include:

* Full support for the new SwissProt format. In most cases the entry can
  be read and written exactly

* Full support for EMBL and GenBank entries. In most cases the entry can
  be read and written exactly

* Support for FASTQ short read formats for sequence and quality data

* Full support for protein and nucleotide sequence parsing from PDB entries

* Full support for GFF3 feature format as the new default feature output

* Improved summary information at the end of report output

* Alignment output using multiple sequence formats

* Extended support for distance matrix file formats

* Improved support for regular expression and pattern searching

* Improved support for large sequence alignments

* Support for remote locations in feature table processing, for example
  retrieval in coderet.

* Output directory support extended to allow directories to be created

* Normalisation option for hydrophobicity plots (pepwindow and pepwindowall)

* Processing of methylation sites in restriction mapping

* Embossdata reports results alphabetically sorted

* Command line qualifiers should be unique after 5 characters to allow
  safe abbreviation

* Improved configuration procedures for X11 support

* Support for dasgff report format, making it possible to write
  EMBOSS-based DAS annotation servers


Release highlights for EMBASSY include:

* Support for MEME 4.0

* Phylipnew updated to Phylip 3.68

* Support for the HMMERDB environment variable in Hmmernew.

* Bug fixes for the MSE multiple sequence editor


Release highlights for Jemboss include:

* Refactoring of the source code

* Location of the 'Execution mode' menu moved near to the 'Go' button in
  the application forms.  When a user runs a job for the first time in
  'batch' mode an information message is displayed

* Automatic configuration of the standalone Jemboss GUI on
  UNIX systems after typing "make install" for EMBOSS. This
  standalone GUI can be run using the runJemboss.csh script
  in the EMBOSS 'bin' directory. This assumes that you have
  a reasonably up-to-date version of Java installed (1.6
  preferred)


For future extensions, we have added:

* Parsing of cross-reference information from SwissProt and EMBL/GenBank
  formats

* Code to delete and update database indexes


New EMBOSS wiki

EMBOSS now has a Wiki at http://emboss.open-bio.org/wiki where we will
maintain the master copies of documentation for the applications and
libraries, and where we have sections for planning new features and
applications for the next 3 years of funding. Please contribute any
corrections to the documentation and add new ideas to the "Planning"
section. We will, of course, be making the wiki prettier as it
matures.


Important note for Developers

New distributions of operating systems have started to use the series
2 version of libtool. We therefore now use this in our CVS
repository. The latest stable version of libtool is 2.2.6a (reported
by libtool itself as 2.2.6). Developers using systems with older
(1.5.x) libtool versions will have to install a local copy of
libtool. This would typically be done by downloading the source code
from the GNU site:

             ftp://ftp.gnu.org/

After installing libtool it will usually be necessary to then
re-install autoconf (2.63) and automake (1.11) to the same directory
root (they are often tied to the version of libtool they were provided
with). They too are available from the GNU ftp server. Make sure that
your PATH is refreshed between doing the installations of the GNU
tools in order that the previous versions aren't referenced.

We note that one system (cygwin) currently provides an experimental
version of libtool (2.2.7). Developers on these systems (and, in general,
on any system with a higher version of libtool than in our CVS repository)
should type:

             autoreconf -fi

before attempting compilation. We will usually keep up-to-date with libtool
stable releases within a libtool series.


New BBSRC funding and future work

As previously announced, we have recently been re-funded by the BBSRC.
What we said in that announcement bears repeating here.
The core aims of the funding proposal were to continue support,
maintenance and development of EMBOSS, and to provide extensive online
training materials for users, developers and system administrators using
text from a series of books to be published by Cambridge University Press.

We are also explicitly targeting areas where we see EMBOSS can be
expanded:

* Richer data content in EMBOSS outputs leading to major improvements
  in the integration and visualisation of results in browsers.

* Processing many more data fields in EMBOSS inputs (taxonomy,
  genes, GO terms, cross-references, keywords).

* Extending and improving database access: better indexing, query
  language support and combining searches across multiple databases,
  support for non-sequence data resources and new data access methods

* Scaling up the libraries and adding new applications to support the
  data volumes generated by next-generation sequencing runs. We
  anticipate many more users will be working with short read data
  mapped to reference sequences over the next few years.

* We aim to add at least 100 new applications in these 3 years.
  Suggestions for new applications are very welcome.

* Major work on new developments and new library code will start from
  August.


Alan


From db60 at st-andrews.ac.uk  Wed Jul 15 12:47:17 2009
From: db60 at st-andrews.ac.uk (Daniel Barker)
Date: Wed, 15 Jul 2009 13:47:17 +0100
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
Message-ID: <4A5DCFD5.8050101@st-andrews.ac.uk>

Dear Alan,

Very glad to see the project will continue with new funding.

One request. Could EMBOSS please add a full global alignment program or 
option? Needle allows 'overhanging ends' with no penalty. From the 
current Needle documentation: 'There is no penalty for the hanging ends 
of the overlap. In bioinformatics, it is usually reasonable to assume 
that the sequences are incomplete and there should be no penalty for 
failing to align the missing bases.'

For protein alignments, this tends to be very lenient towards alignments 
with these kinds of domain architecture:

protein 1: A-B-C
protein 2:     C-X-Y-Z

As opposed to the following situation, which (correctly for global 
alignment) does tend to be heavily penalized:

protein 1: A-C-D
protein 2:   C

I don't think this makes sense. Or at least, it definitely does not make 
sense for all applications. Generally with pairwise alignment of 
proteins I want either local alignment (e.g. water), or full global 
alignment which isn't available in EMBOSS.
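
The difference between the two behaviours can be illustrated with a small
dynamic-programming sketch in Python. This is an illustration under
simplifying assumptions, not needle's implementation: it uses a linear
gap penalty rather than an affine one, plain match/mismatch scores
instead of a substitution matrix, and the function name global_score and
the penalise_end_gaps switch are invented for this example. Each domain
is represented by a single letter.

```python
def global_score(a, b, match=2, mismatch=-1, gap=-2, penalise_end_gaps=True):
    """Best alignment score of a and b, with or without end-gap penalties."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if penalise_end_gaps:
        # Full global alignment: leading gaps cost the same as internal gaps.
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    if penalise_end_gaps:
        return score[n][m]
    # Free end gaps: the alignment may end anywhere in the last row or column.
    return max(max(score[n]), max(row[m] for row in score))
```

With the default scores, global_score("ABC", "CXYZ") gives -5, whereas
global_score("ABC", "CXYZ", penalise_end_gaps=False) gives 2, which is
the leniency towards overlapping domain architectures described above.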

Thank you for considering this.

Best regards,

Daniel

-- 
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland :
No SC013532


From pmr at ebi.ac.uk  Wed Jul 15 13:44:26 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 15 Jul 2009 14:44:26 +0100
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A5DCFD5.8050101@st-andrews.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
	<4A5DCFD5.8050101@st-andrews.ac.uk>
Message-ID: <4A5DDD3A.9070605@ebi.ac.uk>

Daniel Barker wrote:
> One request. Could EMBOSS please add a full global alignment program or
> option? Needle allows 'overhanging ends' with no penalty.
> 
> I don't think this makes sense. Or at least, it definitely does not make
> sense for all applications. Generally with pairwise alignment of
> proteins I want either local alignment (e.g. water), or full global
> alignment which isn't available in EMBOSS.

Thanks for the suggestion.

Would you be happy with an option for needle to penalise gaps at the
end, or a separate application (if so, we need to invent a new name)?

regards,

Peter Rice


From db60 at st-andrews.ac.uk  Wed Jul 15 13:59:25 2009
From: db60 at st-andrews.ac.uk (Daniel Barker)
Date: Wed, 15 Jul 2009 14:59:25 +0100
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A5DDD3A.9070605@ebi.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
	<4A5DCFD5.8050101@st-andrews.ac.uk> <4A5DDD3A.9070605@ebi.ac.uk>
Message-ID: <4A5DE0BD.7090103@st-andrews.ac.uk>

Dear Peter,

> Would you be happy with an option for needle to penalise gaps at the
> end, or a separate application (if so, we need to invent a new name)?

An option for needle would be fine. This may be the best solution.

Thanks a lot and best wishes,

Daniel

-- 
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland :
No SC013532


From charles-listes-emboss at plessy.org  Wed Jul 15 14:59:53 2009
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Wed, 15 Jul 2009 23:59:53 +0900
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
Message-ID: <20090715145953.GA30812@kunpuu.plessy.org>

On Wed, Jul 15, 2009 at 12:19:33PM +0100, ajb at ebi.ac.uk wrote:
> Dear EMBOSS users and developers,
> 
> A new version of EMBOSS (6.1.0) is now available for download
> from our ftp server:
> 
>    ftp://emboss.open-bio.org/pub/EMBOSS/

Dear EMBOSS developers, congratulations on the release!

I have updated our Debian EMBOSS packages, and they should become available for
the Sid (unstable) distribution in our mirrors within a day or two. Please let
me know if these packages prove difficult to install on our current stable
distribution (Lenny), and I will prepare a semi-official backport.

I will update and rebuild the Debian EMBASSY packages against EMBOSS 6.1.0
later this week.

Have a nice day

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan


From golharam at umdnj.edu  Thu Jul 16 19:09:03 2009
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 16 Jul 2009 15:09:03 -0400
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
Message-ID: <4A5F7ACF.4060808@umdnj.edu>

Any thought on implementing some of the algorithms using CUDA when 
possible on GPUs?  This could speed up some programs significantly.



From pmr at ebi.ac.uk  Fri Jul 17 07:39:11 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 17 Jul 2009 08:39:11 +0100
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A5F7ACF.4060808@umdnj.edu>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
	<4A5F7ACF.4060808@umdnj.edu>
Message-ID: <4A602A9F.4040908@ebi.ac.uk>

Ryan Golhar wrote:
> Any thought on implementing some of the algorithms using CUDA when 
> possible on GPUs?  This could speed up some programs significantly.

Yes indeed.

At BOSC/ISMB last month we were discussing closer collaborations with the 
other Open Bio Foundation projects.

One of these is BioManyCores which is aiming at OpenCL programming but is 
currently concentrating on CUDA.

When our new workstations are delivered we will be looking into CUDA.

Which applications would you most like to speed up? (current EMBOSS 
programs, and suggestions for new ones)

regards,

Peter Rice


From hpm at bioinfo-user.org.uk  Fri Jul 17 09:59:42 2009
From: hpm at bioinfo-user.org.uk (Hamish McWilliam)
Date: Fri, 17 Jul 2009 10:59:42 +0100
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A602A9F.4040908@ebi.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>	<4A5F7ACF.4060808@umdnj.edu>
	<4A602A9F.4040908@ebi.ac.uk>
Message-ID: <4A604B8E.1000004@bioinfo-user.org.uk>

Hi Peter,

>> Any thought on implementing some of the algorithms using CUDA when 
>> possible on GPUs?  This could speed up some programs significantly.

Given that our server systems do not have particularly powerful GPUs 
but do have multiple CPU cores, threading, and possibly the use of 
on-core vectorization (see http://en.wikipedia.org/wiki/SIMD), seem like 
more generally applicable methods for improving performance in our case.

One interesting option for Intel platforms is the Intel Compiler (icc), 
which will vectorize some code constructions as a platform specific 
optimization. Unfortunately we are running a mixture of AMD and Intel 
systems of various vintages, so this option is going to require a lot of 
testing to check it works and gives us any benefits.

> Yes indeed.
> 
> At BOSC/ISMB last month we were discussing closer collaborations with 
> the other Open Bio Foundation projects.
> 
> One of these is BioManyCores which is aiming at OpenCL programming but 
> is currently concentrating on CUDA.
> 
> When our new workstations are delivered we will be looking into CUDA.

Given that OpenCL supports both GPU and CPU vectorization, and CUDA is 
Nvidia GPU specific, it may be worth waiting for OpenCL to be adopted. 
MacOS X Snow Leopard is only a couple of months away after all ;-)

> Which applications would you most like to speed up? (current EMBOSS 
> programs, and suggestions for new ones)

At our end the bottlenecks are mainly the indexing (dbi* & dbx*) and 
reformatting tools (seqret).

All the best,

Hamish


From georgios at biotek.uio.no  Fri Jul 17 10:52:38 2009
From: georgios at biotek.uio.no (George Magklaras)
Date: Fri, 17 Jul 2009 12:52:38 +0200
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A604B8E.1000004@bioinfo-user.org.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>	<4A5F7ACF.4060808@umdnj.edu>	<4A602A9F.4040908@ebi.ac.uk>
	<4A604B8E.1000004@bioinfo-user.org.uk>
Message-ID: <4A6057F6.7010704@biotek.uio.no>


Hamish McWilliam wrote:
> 
> At our end the bottlenecks are mainly the indexing (dbi* & dbx*) and 
> reformatting tools (seqret).
>
I second that.


-- 
--
George Magklaras BSc Hons MPhil
RHCE:805008309135525

Senior Computer Systems Engineer/UNIX-Linux Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo,
University of Oslo
http://folk.uio.no/georgios


From golharam at umdnj.edu  Fri Jul 17 15:03:02 2009
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 17 Jul 2009 11:03:02 -0400
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A602A9F.4040908@ebi.ac.uk>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>
	<4A5F7ACF.4060808@umdnj.edu> <4A602A9F.4040908@ebi.ac.uk>
Message-ID: <4A6092A6.3070808@umdnj.edu>

I think the sequence alignment programs or anything dealing with 
matrices would be a good start.  Even, as Hamish mentioned, 
making use of multi-core systems would be good.

We are interested in doing some CUDA programming here.  We are doing 
some training in about a month.  Based on that, we'll see how well we 
know it and what we want to pursue.  Maybe we can help you out a bit in 
this?


Peter Rice wrote:
> Ryan Golhar wrote:
>> Any thought on implementing some of the algorithms using CUDA when 
>> possible on GPUs?  This could speed up some programs significantly.
> 
> Yes indeed.
> 
> At BOSC/ISMB last month we were discussing closer collaborations with 
> the other Open Bio Foundation projects.
> 
> One of these is BioManyCores which is aiming at OpenCL programming but 
> is currently concentrating on CUDA.
> 
> When our new workstations are delivered we will be looking into CUDA.
> 
> Which applications would you most like to speed up? (current EMBOSS 
> programs, and suggestions for new ones)
> 
> regards,
> 
> Peter Rice
> 

From pmr at ebi.ac.uk  Fri Jul 17 15:34:31 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 17 Jul 2009 16:34:31 +0100
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
In-Reply-To: <4A6057F6.7010704@biotek.uio.no>
References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>	<4A5F7ACF.4060808@umdnj.edu>	<4A602A9F.4040908@ebi.ac.uk>	<4A604B8E.1000004@bioinfo-user.org.uk>
	<4A6057F6.7010704@biotek.uio.no>
Message-ID: <4A609A07.2080609@ebi.ac.uk>

George Magklaras wrote:
> Hamish McWilliam wrote:
>> At our end the bottlenecks are mainly the indexing (dbi* & dbx*) and
>> reformatting tools (seqret).
>>
> I second that.

dbi and dbx we understand and can speed up.

What large data are you using through seqret? I assume it is not single
sequences that are slow.

regards,

Peter Rice


From belegdol at gmail.com  Sun Jul 19 12:38:19 2009
From: belegdol at gmail.com (Julian Sikorski)
Date: Sun, 19 Jul 2009 14:38:19 +0200
Subject: [EMBOSS] Packaging EMBOSS for Fedora
In-Reply-To: <1244850556.8999.7.camel@login-svr1.ebi.ac.uk>
References: <h0uef8$bdu$1@ger.gmane.org>
	<1244850556.8999.7.camel@login-svr1.ebi.ac.uk>
Message-ID: <h3v43r$tc5$1@ger.gmane.org>

On 13.06.2009 01:49, Mahmut Uludag wrote:
> Hi Julian,
> 
>> I have one crucial question: is it worth packaging jemboss? It seems it
>> hasn't seen development since 2005, and I have experienced some important
>> issues when trying to use it (mainly not working with files with spaces
>> in filenames). What do you think?
> 
> I had a quick look into the files with spaces in filenames problem, we
> can include a fix in the coming release (6.1) for this problem. Previous
> release of EMBOSS (6.0) included some fixes of similar nature, as well
> as some small gui improvements. We recently have updated EMBOSS/Jemboss
> make/install machinery to make standalone Jemboss come out of box when
> users install EMBOSS; I think we were successful, and this will be a new
> feature in EMBOSS 6.1.
> 
> Regards,
> Mahmut

Given that I'm finally an MSc and have some more free time, I decided to
revisit the packaging and update the package to 6.1.0. Unfortunately,
there seem to be some problems with make install:

/usr/bin/make  install-exec-hook
make[7]: Entering directory
`/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
mkdir -p --
/builddir/build/BUILDROOT/EMBOSS-6.1.0-1.fc11.x86_64/usr/share/EMBOSS/jemboss/org/emboss/jemboss/editor
/usr/bin/install: cannot stat `*.class': No such file or directory
make[7]: *** [install-exec-hook] Error 1
make[7]: Leaving directory
`/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
make[6]: Leaving directory
`/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
make[6]: *** [install-exec-am] Error 2
make[5]: *** [install-am] Error 2
make[5]: Leaving directory
`/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
make[4]: Leaving directory
`/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss'
make[4]: *** [install-recursive] Error 1
make[3]: Leaving directory
`/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss'
make[2]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org'
make[3]: *** [install-recursive] Error 1
make[2]: *** [install-recursive] Error 1
make[1]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss'
make[1]: *** [install-recursive] Error 1
make: *** [install-recursive] Error 1

6.0.1 including the 1-12 patch was building/installing fine. Am I doing
something wrong? Cheers,

Julian


From uludag at ebi.ac.uk  Sun Jul 19 16:18:29 2009
From: uludag at ebi.ac.uk (uludag at ebi.ac.uk)
Date: Sun, 19 Jul 2009 17:18:29 +0100 (BST)
Subject: [EMBOSS] Packaging EMBOSS for Fedora
In-Reply-To: <h3v43r$tc5$1@ger.gmane.org>
References: <h0uef8$bdu$1@ger.gmane.org>
	<1244850556.8999.7.camel@login-svr1.ebi.ac.uk>
	<h3v43r$tc5$1@ger.gmane.org>
Message-ID: <49993.78.105.201.225.1248020309.squirrel@webmail.ebi.ac.uk>


> there seem to be some problems with make install:
>
> /usr/bin/make  install-exec-hook
> make[7]: Entering directory
> `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
> mkdir -p --
> /builddir/build/BUILDROOT/EMBOSS-6.1.0-1.fc11.x86_64/usr/share/EMBOSS/jemboss/org/emboss/jemboss/editor
> /usr/bin/install: cannot stat `*.class': No such file or directory

Looks like we didn't test the --with-java and --with-javaos configure
options well, before this release. However, most users will not need these
two options any more as EMBOSS-6.1.0 includes precompiled jemboss class
files collected in a java archive file. You should hopefully not get the
above error if you omit these two options when you configure your emboss
installation.

Regards,
Mahmut


From biopython at maubp.freeserve.co.uk  Mon Jul 20 13:24:02 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Jul 2009 14:24:02 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
	<1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>
	<320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>
	<4A55B578.1060808@ebi.ac.uk>
	<320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com>
Message-ID: <320fb6e00907200624x5513f3b5u8f7fa0c88e23fc8d@mail.gmail.com>

On Thu, Jul 9, 2009 at 10:21 AM, Peter<biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Jul 9, 2009 at 10:16 AM, Peter Rice<pmr at ebi.ac.uk> wrote:
>>
>> Peter C. wrote:
>>
>>> Thanks Scott (& Thomas) for pointing out the embossversion program.
>>>
>>> I would still question why the EMBOSS tools don't also support the
>>> Unix convention of a version switch. Hypothetically, aren't some
>>> (many?) of the tools standalone and couldn't they be installed
>>> individually (e.g. as part of someone else's software bundle)? i.e.
>>> Can EMBOSS really guarantee that the needle tool and the
>>> embossversion tool are in sync?
>>
>> We could easily add a -version global qualifier ... for the next release.
>>
>> We can guarantee that embossversion and needle are in sync - assuming
>> they are built using the same libraries as that is where the version is
>> recorded. Standalone builds are an issue though and it would help debug
>> in a few cases.
>
> That sounds good to me :)
>

Thinking about this again, rather than adding a whole new argument
(-version), why not just include the program version as the first line of
the help output (from -help)? This should also solve the corner case
of standalone builds, and makes it very easy to find the version
(without having to know about the embossversion tool).

Thanks,

Peter C.


From pmr at ebi.ac.uk  Mon Jul 20 13:37:38 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 20 Jul 2009 14:37:38 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907200624x5513f3b5u8f7fa0c88e23fc8d@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>	
	<1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>	
	<320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>	
	<4A55B578.1060808@ebi.ac.uk>	
	<320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com>
	<320fb6e00907200624x5513f3b5u8f7fa0c88e23fc8d@mail.gmail.com>
Message-ID: <4A647322.7080903@ebi.ac.uk>

Peter C. wrote:
> Thinking about this again, rather than adding a whole new argument
> (-version), why not just include the program version as the first line of
> the help output (from -help)? This should also solve the corner case
> of standalone builds, and makes it very easy to find the version
> (without having to know about the embossversion tool).

Too late ... I already have working code for --version :-)

This can report the EMBOSS version number. For EMBASSY applications it
also reports the EMBASSY package version number (something embossversion
is unable to do).

There was also a "version:" ACD application attribute. This has to be
renamed to "versionnumber:" to avoid a clash with the new qualifier
name, and will be reported if defined. The "versionnumber" attribute in
ACD was originally defined only for third party applications in the
SoapLab interface so we do not expect renaming the attribute to cause
any problems (we will make the changes to SoapLab).

However, the suggestion of adding it to the -help output is a good one. I
will add something there too.

regards,

Peter Rice


From biopython at maubp.freeserve.co.uk  Mon Jul 20 15:41:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Jul 2009 16:41:43 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
Message-ID: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>

Hi all,

I've just updated my Mac to EMBOSS 6.1.0, and have found an
issue with seqret conversion of IntelliGenetics files. After some
digging, I think this problem relates to having DOS new lines in
a file on Unix (in my case, Mac OS X).

For illustration, I'm using the example file from the EMBOSS
website, saved to disk (using Unix new lines on a Mac):
http://emboss.sourceforge.net/docs/themes/seqformats/ig

Using EMBOSS 6.0.1, there was a problem:

$ embossversion
Writes the current EMBOSS version number to a file
6.0.1
$  seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter
>HSFAU
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg
cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc
tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc
tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga
agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca
cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc
tctaataaaaaagccacttagttcagtcaaaaaaaaaaH-sapiensfaugenebasesH
SFAUctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcga
aaacggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgatta
acactgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacag
ccgtagcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacat
ggtagctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgc
cccgtcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggag
ctaggactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgt
gacacgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccat
cttcgcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaaggg
cttgtagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgc
tccgtggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgt
gagccgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatc
tcctttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcg
ccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccagg
aaacggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtg
ctcttcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcat
gtagcctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgccc
ctggaggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagta
gcaggccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgt
ctagtgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagta
cttctcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacaca
gacgtccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatccta
gtctggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctata
aattagaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaac
tttgttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagagg
ggttctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacag
gtaaagtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtga
gtgagagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtc
cctgggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatg
ctaggtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaac
aggagaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgct
ttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtct
tttgtaattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttca
tctttacttgcaaggcctcagggagaggtgtgcttctcgg

i.e. The two sequences have been munged into one, with the
name of the second sequence as part of the sequence.

Using EMBOSS 6.1.0, the following now works:

$ embossversion
Reports the current EMBOSS version number
6.1.0
$ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter
>HSFAU H.sapiens fau mRNA, 518 bases
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg
cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc
tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc
tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga
agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca
cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc
tctaataaaaaagccacttagttcagtcaaaaaaaaaa
>HSFAU1 H.sapiens fau 1 gene, 2016 bases
ctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcgaaaac
ggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgattaacac
tgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacagccgt
agcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacatggta
gctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgccccg
tcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggagctag
gactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgtgaca
cgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccatcttc
gcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaagggcttg
tagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgctccg
tggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgtgagc
cgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatctcct
ttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcgccaa
tatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaac
ggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtgctct
tcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcatgtag
cctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctgg
aggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagtagcag
gccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgtctag
tgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagtacttc
tcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacacagacg
tccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatcctagtct
ggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctataaatt
agaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaactttg
ttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagaggggtt
ctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacaggtaa
agtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtgagtga
gagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtccctg
ggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatgctag
gtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaacagga
gaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgt
caacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtcttttg
taattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttcatctt
tacttgcaaggcctcagggagaggtgtgcttctcgg

i.e. There was a problem with this example file in EMBOSS 6.0.1,
but things look fine in EMBOSS 6.1.0. Great :)

However, if we now convert this input file to use DOS/Windows
newlines, and repeat the test (on Mac OS X, so Unix):

$ embossversion
Reports the current EMBOSS version number
6.1.0
$ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter
 H.sapiens fau mRNA, 518 bases
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg
cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc
tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc
tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga
agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca
cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc
tctaataaaaaagccacttagttcagtcaaaaaaaaaa
 H.sapiens fau 1 gene, 2016 bases
ctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcgaaaac
ggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgattaacac
tgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacagccgt
agcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacatggta
gctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgccccg
tcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggagctag
gactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgtgaca
cgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccatcttc
gcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaagggcttg
tagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgctccg
tggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgtgagc
cgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatctcct
ttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcgccaa
tatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaac
ggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtgctct
tcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcatgtag
cctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctgg
aggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagtagcag
gccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgtctag
tgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagtacttc
tcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacacagacg
tccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatcctagtct
ggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctataaatt
agaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaactttg
ttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagaggggtt
ctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacaggtaa
agtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtgagtga
gagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtccctg
ggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatgctag
gtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaacagga
gaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgt
caacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtcttttg
taattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttcatctt
tacttgcaaggcctcagggagaggtgtgcttctcgg

i.e. The ">" is missing on all the FASTA sequences.

So, it looks like EMBOSS 6.1.0 fixed one problem with
IntelliGenetics files, but that there is still an issue here.

Peter C.

P.S. Should I have reported this possible bug via sourceforge?

P.P.S. Back in 2006, I reported a similar issue with a data
corruption reading stockholm/pfam with DOS newlines
(Sourceforge Bug #1588956, long since fixed). It seems to
me that EMBOSS would benefit from explicit testing of all
the file formats using DOS/Windows newlines when run on
Unix, and vice versa. Does that sound feasible, or just
hopelessly ambitious?
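
A rough sketch of what such a newline test could look like, written in Python rather than against the EMBOSS test suite. The helper names (to_newline_style, parse_ids) are hypothetical, and parse_ids is only a stand-in for whichever format parser is under test; the idea is simply to rewrite one example file in each newline convention and check that the parsed result does not change.

```python
def to_newline_style(data: bytes, style: str) -> bytes:
    """Return data with every line ending rewritten to the given style."""
    endings = {"unix": b"\n", "dos": b"\r\n", "oldmac": b"\r"}
    # Normalise to Unix first so mixed input is handled consistently.
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unix.replace(b"\n", endings[style])


def parse_ids(data: bytes) -> list:
    """Hypothetical stand-in for a parser: collect IDs from FASTA-like headers."""
    ids = []
    # bytes.splitlines() splits on \n, \r\n and \r alike.
    for line in data.splitlines():
        if line.startswith(b">"):
            ids.append(line[1:].split()[0].decode("ascii"))
    return ids


sample = b">HSFAU H.sapiens fau mRNA\nttcc\n>HSFAU1 H.sapiens fau 1 gene\nctac\n"

# The same record set, parsed once per newline convention.
results = {
    style: parse_ids(to_newline_style(sample, style))
    for style in ("unix", "dos", "oldmac")
}
```

If the three entries in results differ, the parser is sensitive to the newline convention of its input.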


From biopython at maubp.freeserve.co.uk  Mon Jul 20 15:50:32 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Jul 2009 16:50:32 +0100
Subject: [EMBOSS] FASTQ format documentation
Message-ID: <320fb6e00907200850t39582781p115780690955bde6@mail.gmail.com>

Hi all,

I was just trying to double check the names EMBOSS 6.1.0 supports
for the various FASTQ file formats, and none of them are listed here:
http://emboss.sourceforge.net/docs/themes/SequenceFormats.html

Does this need updating, or should I be looking elsewhere?

Thanks

Peter C.


From pmr at ebi.ac.uk  Mon Jul 20 16:16:39 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 20 Jul 2009 17:16:39 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
Message-ID: <4A649867.4090205@ebi.ac.uk>

Peter C. wrote:
> Hi all,
> 
> I've just updated my Mac to EMBOSS 6.1.0, and have found an
> issue with seqret conversion of IntelliGenetics files. After some
> digging, I think this problem relates to having DOS new lines in
> a file on Unix (in my case, Mac OS X).

we have an application "noreturn" to fix things like this.

If you send me your file I will try to take a look at whether we should
be catching the funny newline characters.

> P.S. Should I have reported this possible bug via sourceforge?

The emboss-bug at emboss.open-bio.org list is the best way to get our attention

> P.P.S. Back in 2006, I reported a similar issue with a data
> corruption reading stockholm/pfam with DOS newlines
> (Sourceforge Bug #1588956, long since fixed). It seems to
> me that EMBOSS would benefit from explicit testing of all
> the file formats using DOS/Windows newlines when run on
> Unix, and vice versa. Does that sound feasible, or just
> hopelessly ambitious?

We can try ... how well does Biopython handle these? (i.e. do we need
such examples for perl, python etc or is this an EMBOSS-specific issue?)

regards,

Peter


From pmr at ebi.ac.uk  Mon Jul 20 16:17:46 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 20 Jul 2009 17:17:46 +0100
Subject: [EMBOSS] FASTQ format documentation
In-Reply-To: <320fb6e00907200850t39582781p115780690955bde6@mail.gmail.com>
References: <320fb6e00907200850t39582781p115780690955bde6@mail.gmail.com>
Message-ID: <4A6498AA.8040007@ebi.ac.uk>

Peter C. wrote:
> Hi all,
> 
> I was just trying to double check the names EMBOSS 6.1.0 supports
> for the various FASTQ file formats, and none of them are listed here:
> http://emboss.sourceforge.net/docs/themes/SequenceFormats.html
> 
> Does this need updating, or should I be looking elsewhere?

Yes, and yes.

We are cleaning up these things post-release. I will update that file,
and also the equivalent on the wiki

regards,

Peter Rice


From biopython at maubp.freeserve.co.uk  Mon Jul 20 16:30:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Jul 2009 17:30:45 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <4A649867.4090205@ebi.ac.uk>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
	<4A649867.4090205@ebi.ac.uk>
Message-ID: <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>

On Mon, Jul 20, 2009 at 5:16 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>> Hi all,
>>
>> I've just updated my Mac to EMBOSS 6.1.0, and have found an
>> issue with seqret conversion of IntelliGenetics files. After some
>> digging, I think this problem relates to having DOS new lines in
>> a file on Unix (in my case, Mac OS X).
>
> we have an application "noreturn" to fix things like this.

That's basically an EMBOSS variant on unix2dos and dos2unix
(or similar) existing Unix command line tools?

I'm more interested in having all the EMBOSS tools handle either
new line format themselves automatically. These days I am mostly
working on Unix (including Mac OS X), but I do have to cope with
Windows style text files quite often.

> If you send me your file I will try to take a look at whether we should
> be catching the funny newline characters.

For this bug report I was using:
http://emboss.sourceforge.net/docs/themes/seqformats/ig

There are another three example files used in the Biopython unit
tests here:
http://biopython.open-bio.org/SRC/biopython/Tests/IntelliGenetics/

>> P.S. Should I have reported this possible bug via sourceforge?
>
> The emboss-bug at emboss.open-bio.org list is the best way to get
> our attention

Great, another mailing list to sign up to... but if that is your
preferred route, that's fine.

>> P.P.S. Back in 2006, I reported a similar issue with a data
>> corruption reading stockholm/pfam with DOS newlines
>> (Sourceforge Bug #1588956, long since fixed). It seems to
>> me that EMBOSS would benefit from explicit testing of all
>> the file formats using DOS/Windows newlines when run on
>> Unix, and vice versa. Does that sound feasible, or just
>> hopelessly ambitious?
>
> We can try ... how well does Biopython handle these? (i.e. do we need
> such examples for perl, python etc or is this an EMBOSS-specific issue?)

I think this is an EMBOSS specific issue. I don't know enough about
how all the different EMBOSS parsers work, but is there a single
place where you could add automatic handling of either new line
convention when reading in text?

For reference, in Python, you can explicitly open text files in "universal
newlines" mode, which takes care of this. I don't know about Perl.
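
As an illustration of the universal newlines behaviour mentioned above, here is a small sketch using the Python 3 spelling, where it is the default for text mode (newline=None). The record content is a made-up IG-like fragment with DOS line endings, not one of the real example files.

```python
import os
import tempfile

# A tiny IG-like record saved with DOS (\r\n) line endings.
raw = b";comment\r\nHSFAU\r\nttcctc1\r\n"

with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(raw)
    path = handle.name

try:
    # newline=None (the default) translates \r\n, \r and \n to \n on reading.
    with open(path, "r", newline=None) as text:
        universal = text.read().split("\n")
    # newline="" disables translation, so the \r characters stay in the data.
    with open(path, "r", newline="") as text:
        untranslated = text.read().split("\n")
finally:
    os.remove(path)
```

With translation on, the ID line comes back as "HSFAU"; with it off, the ID carries a trailing carriage return, which is the kind of stray character a parser then has to cope with.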

Peter C.


From pmr at ebi.ac.uk  Mon Jul 20 20:04:19 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 20 Jul 2009 21:04:19 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>	
	<4A649867.4090205@ebi.ac.uk>
	<320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>
Message-ID: <4A64CDC3.9020902@ebi.ac.uk>

Peter C. wrote:

Thanks for the example files. I will start with those.

>>> P.S. Should I have reported this possible bug via sourceforge?
>> The emboss-bug at emboss.open-bio.org list is the best way to get
>> our attention
> 
> Great, another mailing list to sign up to... but if that is your
> preferred route, that's fine.

Actually, you can't sign up ... that's our bug reporting list :-)

> I think this is an EMBOSS specific issue. I don't know enough about
> how all the different EMBOSS parsers work, but is there a single
> place where you could add automatic handling of either new line
> convention when reading in text?

Hope so. I think the issue is places where the parsing is checking 
explicitly for \n rather than \n and \r. The solution would be to strip 
both off before parsing. It will need a thorough clean through the 
ajseqread code.
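
The "strip both off before parsing" idea can be sketched as follows. This is Python, not the ajseqread C code, and both function names are hypothetical; the record layout assumed here (comment lines starting with ";" followed by an ID line) is just enough of the IG format to show the point.

```python
def chomp(line: str) -> str:
    """Remove any trailing mix of carriage returns and line feeds."""
    return line.rstrip("\r\n")


def read_ig_id(lines):
    """Hypothetical helper: return the first non-comment line of an IG-style record."""
    for line in lines:
        line = chomp(line)  # strip the line ending before looking at the content
        if line and not line.startswith(";"):
            return line
    return None


dos_record = ["; H.sapiens fau mRNA, 518 bases\r\n", "HSFAU\r\n", "ttcctc\r\n"]
unix_record = ["; H.sapiens fau mRNA, 518 bases\n", "HSFAU\n", "ttcctc\n"]

dos_id = read_ig_id(dos_record)
unix_id = read_ig_id(unix_record)
```

Because the ending is removed before any parsing decision is made, both records give the same ID regardless of the newline convention.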

regards,

Peter


From biopython at maubp.freeserve.co.uk  Mon Jul 20 20:09:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 20 Jul 2009 21:09:43 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <4A64CDC3.9020902@ebi.ac.uk>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
	<4A649867.4090205@ebi.ac.uk>
	<320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>
	<4A64CDC3.9020902@ebi.ac.uk>
Message-ID: <320fb6e00907201309r3ddca32dy3bfc159e201217b5@mail.gmail.com>

Peter Rice wrote:
>
> Thanks for the example files. I will start with those.
>
> Peter C. wrote:
>> I think this is an EMBOSS specific issue. I don't know enough about
>> how all the different EMBOSS parsers work, but is there a single
>> place where you could add automatic handling of either new line
>> convention when reading in text?
>
> Hope so. I think the issue is places where the parsing is checking
> explicitly for \n rather than \n and \r. The solution would be to strip
> both off before parsing. It will need a thorough clean through the
> ajseqread code.

That sounds like a good investment of effort in the long run :)

Peter C.


From pmr at ebi.ac.uk  Tue Jul 21 09:15:59 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 21 Jul 2009 10:15:59 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
Message-ID: <4A65874F.2010106@ebi.ac.uk>

Peter C. wrote:
> However, if we now convert this input file to use DOS/Windows
> newlines, and repeat the test (on Mac OS X, so Unix):
> 
> $ embossversion
> Reports the current EMBOSS version number
> 6.1.0
> $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter
>  H.sapiens fau mRNA, 518 bases
> ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
>
> i.e. The ">" is missing on all the FASTA sequences.

Actually, it's not missing ... it is hiding.

The sequence id has a ^M appended to it, so the '>' and the id get
overwritten by the description when you look at the file.

Fixed by processing the IG format ID rather than simply copying it.
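
To see why the '>' appears to vanish, here is a small Python sketch that approximates what a terminal does with a carriage return in the middle of a line. render_on_terminal is a hypothetical helper written for this illustration and models only the carriage return, nothing else a real terminal does.

```python
def render_on_terminal(text: str) -> str:
    """Approximate how a terminal shows one line that contains carriage returns."""
    screen = []
    column = 0
    for char in text:
        if char == "\r":
            column = 0  # carriage return: cursor goes back to the first column
            continue
        if column < len(screen):
            screen[column] = char  # overwrite what was already printed there
        else:
            screen.append(char)
        column += 1
    return "".join(screen)


# FASTA header as written: '>' + ID (with the stray ^M) + ' ' + description.
header = ">HSFAU\r H.sapiens fau mRNA, 518 bases"
shown = render_on_terminal(header)
```

Here shown is " H.sapiens fau mRNA, 518 bases", matching the output in the report, while repr(header) still shows the '>' and the ID in the data.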

Thanks for finding that one.

regards,

Peter Rice


From biopython at maubp.freeserve.co.uk  Tue Jul 21 09:23:03 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 21 Jul 2009 10:23:03 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <4A65874F.2010106@ebi.ac.uk>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>
	<4A65874F.2010106@ebi.ac.uk>
Message-ID: <320fb6e00907210223x5a145ab5qfa9b29bc42a37815@mail.gmail.com>

Peter Rice wrote:
>
> Peter C. wrote:
>> However, if we now convert this input file to use DOS/Windows
>> newlines, and repeat the test (on Mac OS X, so Unix):
>>
>> $ embossversion
>> Reports the current EMBOSS version number
>> 6.1.0
>> $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter
>>  H.sapiens fau mRNA, 518 bases
>> ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
>>
>> i.e. The ">" is missing on all the FASTA sequences.
>
> Actually, it's not missing ... it is hiding.
>
> The sequence id has a ^M appended to it, so the '>' and the id get
> overwritten by the description when you look at the file.

That makes sense, and I think I can see how it might have happened.

> Fixed by processing the IG format ID rather than simply copying it.
>
> Thanks for finding that one.

Sure,

Peter C.


From db60 at st-andrews.ac.uk  Tue Jul 21 11:24:28 2009
From: db60 at st-andrews.ac.uk (Daniel Barker)
Date: Tue, 21 Jul 2009 12:24:28 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>	<4A649867.4090205@ebi.ac.uk>
	<320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>
Message-ID: <4A65A56C.9010308@st-andrews.ac.uk>

Dear Peters et al.,

EMBOSS claims not to care about whether newlines are DOS or UNIX:

'EMBOSS programs can read in both PC and Unix text file formats, so it 
is not necessary for you to use this utility all of the time' - noreturn 
documentation.

This would certainly be good. 'The newline problem' must be the single 
biggest computational waste of time I've experienced over the years!

It's easy to avoid with tr, u2d, d2u, noreturn, etc. - but it's just one 
other thing that can go wrong, especially when data is shared between 
different places.

I've noticed a small amount of software, in the world in general, still 
uses the Mac OS 9 (and earlier) convention where newline is \015 only. 
E.g. this tab-delimited text saved from Excel 2004 for Mac:

$ od -bc Workbook1.txt
0000000   061 011 062 011 063 015 064 011 065 011 066
            1  \t   2  \t   3  \r   4  \t   5  \t   6
0000013
$

I expect this usage will decline, since it's in conflict with the 
convention of Mac OS X's own command-line tools (\012 only, like Linux). 
Probably the '\015 only' convention hasn't had much impact on 
bioinformatics anyway?
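
For anyone wanting to check which convention a file uses without reading od output by eye, a minimal Python sketch follows. count_line_endings is a hypothetical helper, and the sample bytes are the eleven bytes from the od dump above.

```python
def count_line_endings(data: bytes) -> dict:
    """Count DOS (\\r\\n), Unix (\\n only) and old Mac (\\r only) line endings."""
    crlf = data.count(b"\r\n")
    return {
        "dos": crlf,
        # Every \r\n contains one \n and one \r, so subtract those pairs.
        "unix": data.count(b"\n") - crlf,
        "oldmac": data.count(b"\r") - crlf,
    }


# The Excel 2004 for Mac export shown in the od dump.
excel_export = b"1\t2\t3\r4\t5\t6"
counts = count_line_endings(excel_export)
```

For this export, counts reports one old Mac ending and no DOS or Unix endings, so a reader that only looks for line feeds would treat the whole file as a single line.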

Best wishes,

Daniel

-- 
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland :
No SC013532


From pmr at ebi.ac.uk  Tue Jul 21 12:12:00 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 21 Jul 2009 13:12:00 +0100
Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines
In-Reply-To: <4A65A56C.9010308@st-andrews.ac.uk>
References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com>	<4A649867.4090205@ebi.ac.uk>	<320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com>
	<4A65A56C.9010308@st-andrews.ac.uk>
Message-ID: <4A65B090.7030507@ebi.ac.uk>

Daniel Barker wrote:
> Dear Peters et al.,
> 
> EMBOSS claims not to care about whether newlines are DOS or UNIX:
> 
> 'EMBOSS programs can read in both PC and Unix text file formats, so it
> is not necessary for you to use this utility all of the time' - noreturn
> documentation.
> 
> This would certainly be good. 'The newline problem' must be the single
> biggest computational waste of time I've experienced over the years!

Indeed. We get a little caught between accepting the extra carriage
returns and the need for efficient parsing as some users run seqret to
reformat large sequence files. Mac format (\015 only) is
horrible because the C library functions are looking for line feed
unless you're running on a Mac.

I will give our examples a run through after converting to PC format and
see if any others fall over.

> I've noticed a small amount of software, in the world in general, still
> uses the Mac OS 9 (and earlier) convention where newline is \015 only.
> E.g. this tab-delimited text saved from Excel 2004 for Mac:
> 
> $ od -bc Workbook1.txt
> 0000000   061 011 062 011 063 015 064 011 065 011 066
>            1  \t   2  \t   3  \r   4  \t   5  \t   6
> 0000013
> $
> 
> I expect this usage will decline, since it's in conflict with the
> convention of Mac OS X's own command-line tools (\012 only, like Linux).
> Probably the '\015 only' convention hasn't had much impact on
> bioinformatics anyway?

In my experience mac users didn't venture out into the real world. On the
other hand, unix users have often copied files from PCs. I used to do
the same myself, which was why I had to write noreturn in the first place.

regards,

Peter Rice


From stephen.taylor at imm.ox.ac.uk  Thu Jul 23 09:43:35 2009
From: stephen.taylor at imm.ox.ac.uk (Stephen Taylor)
Date: Thu, 23 Jul 2009 10:43:35 +0100
Subject: [EMBOSS] Tree building
Message-ID: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk>

Hi,

Are there any phylogeny programs in EMBOSS/EMBASSY that will allow me  
to build a tree from a multiple alignment without truncating the id  
name in the resulting treefile?

Thanks,

Steve


From pmr at ebi.ac.uk  Thu Jul 23 10:51:06 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 23 Jul 2009 11:51:06 +0100
Subject: [EMBOSS] Tree building
In-Reply-To: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk>
References: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk>
Message-ID: <4A68409A.7020004@ebi.ac.uk>

Stephen Taylor wrote:
> Hi,
> 
> Are there any phylogeny programs in EMBOSS/EMBASSY that will allow me to 
> build a tree from a multiple alignment without truncating the id name in 
> the resulting treefile?

This can be done by adding some new output formats to the existing 
phylipnew embassy applications. Phylip truncates id names for its own 
formats. We can extend them, and only truncate for strict phylip format 
outputs.

We are also open to suggestions for new applications in phylogenetics

A good topic to discuss at the GMOD workshop in Oxford.

regards,

Peter


From pmr at ebi.ac.uk  Thu Jul 23 14:06:58 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 23 Jul 2009 15:06:58 +0100
Subject: [EMBOSS] Tree building
In-Reply-To: <F85F706C-9198-4020-A232-8E44E8F43803@imm.ox.ac.uk>
References: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk>
	<4A68409A.7020004@ebi.ac.uk>
	<F85F706C-9198-4020-A232-8E44E8F43803@imm.ox.ac.uk>
Message-ID: <4A686E82.1010803@ebi.ac.uk>

Stephen Taylor wrote:
> Hi Peter,
> 
>> Stephen Taylor wrote:
>>> Hi,
>>> Are there any phylogeny programs in EMBOSS/EMBASSY that will allow me 
>>> to build a tree from a multiple alignment without truncating the id 
>>> name in the resulting treefile?
>>
>> This can be done by adding some new output formats to the existing 
>> phylipnew embassy applications. Phylip truncates id names for its own 
>> formats. We can extend them, and only truncate for strict phylip 
>> format outputs.
>>
>> We are also open to suggestions for new applications in phylogenetics
>>
> 
> Sounds like a good idea.
> 
> However, I kind of need it today:-). I guess I'll have to truncate them 
> to something sensible for the time being...

Can you give some example data and algorithms to make the tree?

Peter


From stephen.taylor at imm.ox.ac.uk  Thu Jul 23 15:07:36 2009
From: stephen.taylor at imm.ox.ac.uk (Stephen Taylor)
Date: Thu, 23 Jul 2009 16:07:36 +0100
Subject: [EMBOSS] Tree building
In-Reply-To: <50E41C3A73F46B4D876BD13F5E80264A02DAD6C6@exchange3.sims.scri.sari.ac.uk>
References: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk><4A68409A.7020004@ebi.ac.uk><F85F706C-9198-4020-A232-8E44E8F43803@imm.ox.ac.uk>
	<4A686E82.1010803@ebi.ac.uk>
	<50E41C3A73F46B4D876BD13F5E80264A02DAD6C6@exchange3.sims.scri.sari.ac.uk>
Message-ID: <DB9966B9-F981-4A1D-8154-DD571ADB83A7@imm.ox.ac.uk>

Hi Frank,

> Hi Stephen,
>
>> However, I kind of need it today:-). I guess I'll have to truncate  
>> them
>> to something sensible for the time being...
>
> If you need a tree *today*... :-)  you could download our TOPALi  
> package (loads all formats of alignment)
>

I have already installed TOPALi and was waiting for the job to
finish. I have over 1600 sequences and I wasn't sure when it was going to
be done, since I started it this morning (using the Quick NJ option on a
Mac Mini). I think we should install it on a more powerful machine;-).

However, after I got your email I checked the status and the job had  
finished. Good timing...


> It does basic NJ, but also model-selection, PhyML, RaxML and  
> MrBayes.  RaxML is quite fast.
>
> We internally rename species names when TOPALi submits jobs to the  
> above external programs, but put the names back when we show the  
> alignment and the tree.


Yes, all the labels are there in full. Thanks a lot.

Peter - I can send you some data offline if you wish.

Steve


From stephen.taylor at imm.ox.ac.uk  Thu Jul 23 13:34:26 2009
From: stephen.taylor at imm.ox.ac.uk (Stephen Taylor)
Date: Thu, 23 Jul 2009 14:34:26 +0100
Subject: [EMBOSS] Tree building
In-Reply-To: <4A68409A.7020004@ebi.ac.uk>
References: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk>
	<4A68409A.7020004@ebi.ac.uk>
Message-ID: <F85F706C-9198-4020-A232-8E44E8F43803@imm.ox.ac.uk>

Hi Peter,

> Stephen Taylor wrote:
>> Hi,
>> Are there any phylogeny programs in EMBOSS/EMBASSY that will allow  
>> me to build a tree from a multiple alignment without truncating the  
>> id name in the resulting treefile?
>
> This can be done by adding some new output formats to the existing  
> phylipnew embassy applications. Phylip truncates id names for its  
> own formats. We can extend them, and only truncate for strict phylip  
> format outputs.
>
> We are also open to suggestions for new applications in phylogenetics
>

Sounds like a good idea.

However, I kind of need it today:-). I guess I'll have to truncate  
them to something sensible for the time being...

Thanks.

Steve


From gbottu at vub.ac.be  Thu Jul 23 16:19:11 2009
From: gbottu at vub.ac.be (Guy Bottu)
Date: Thu, 23 Jul 2009 18:19:11 +0200
Subject: [EMBOSS] Tree building
In-Reply-To: <4A68409A.7020004@ebi.ac.uk>
References: <EC064FE0-9C21-40FE-BF69-1179EBC76CC9@imm.ox.ac.uk>
	<4A68409A.7020004@ebi.ac.uk>
Message-ID: <4A688D7F.6080304@vub.ac.be>

Peter Rice wrote:
> This can be done by adding some new output formats to the existing 
> phylipnew embassy applications. Phylip truncates id names for its own 
> formats. We can extend them, and only truncate for strict phylip format 
> outputs.

Interesting to note the following: standard PHYLIP format reserves the
first 10 characters for the name and has the sequence start at position
11. But some programmers, like the author of PhyML, use a format they
call PHYLIP which allows a name of any length but then demands that
there be a space between the name and the sequence. I have already had
trouble when I used a standard PHYLIP file made by PHYLIP or EMBOSS,
with a name of 10 characters, as input to PhyML.
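
The two layouts described above can be sketched as follows. This is a
minimal illustration only, not code from PHYLIP, PhyML or EMBOSS; the
function names phylip_line and write_phylip are made up for the example,
and the header line (number of sequences, then alignment length) follows
the usual PHYLIP convention.

```python
def phylip_line(name, seq, strict=True):
    """Format one sequence line of a PHYLIP-style alignment."""
    if strict:
        # Strict layout: the name occupies exactly 10 columns (truncated
        # or space-padded) and the sequence starts at column 11.
        return name[:10].ljust(10) + seq
    # Relaxed layout: full-length name, one space, then the sequence.
    # The name cannot contain whitespace, or it could not be told apart
    # from the separator.
    if any(c.isspace() for c in name):
        raise ValueError("relaxed names cannot contain whitespace")
    return name + " " + seq


def write_phylip(records, strict=True):
    """records: list of (name, aligned_sequence) pairs of equal length."""
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all aligned sequences must have the same length")
    lines = [" %d %d" % (len(records), length)]
    lines.extend(phylip_line(name, seq, strict) for name, seq in records)
    return "\n".join(lines) + "\n"


# A 10-character name in the strict layout leaves no space before the
# sequence, which is the case a relaxed reader cannot split correctly.
print(phylip_line("ABCDEFGHIJ", "ACGT"))                    # ABCDEFGHIJACGT
print(phylip_line("ABCDEFGHIJKLM", "ACGT", strict=False))   # ABCDEFGHIJKLM ACGT
```

With strict=True a name of exactly 10 characters runs straight into the
sequence, so a reader that expects "name, space, sequence" sees a single
token on that line.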

	Guy Bottu


From belegdol at gmail.com  Wed Jul 29 11:54:49 2009
From: belegdol at gmail.com (Julian Sikorski)
Date: Wed, 29 Jul 2009 13:54:49 +0200
Subject: [EMBOSS] Packaging EMBOSS for Fedora
In-Reply-To: <49993.78.105.201.225.1248020309.squirrel@webmail.ebi.ac.uk>
References: <h0uef8$bdu$1@ger.gmane.org>	<1244850556.8999.7.camel@login-svr1.ebi.ac.uk>	<h3v43r$tc5$1@ger.gmane.org>
	<49993.78.105.201.225.1248020309.squirrel@webmail.ebi.ac.uk>
Message-ID: <h4pdaa$ad5$1@ger.gmane.org>

On 19.07.2009 18:18, uludag at ebi.ac.uk wrote:
> 
>> there seem to be some problems with make install:
>>
>> /usr/bin/make  install-exec-hook
>> make[7]: Entering directory
>> `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
>> mkdir -p --
>> /builddir/build/BUILDROOT/EMBOSS-6.1.0-1.fc11.x86_64/usr/share/EMBOSS/jemboss/org/emboss/jemboss/editor
>> /usr/bin/install: cannot stat `*.class': No such file or directory
> 
> Looks like we didn't test the --with-java and --with-javaos configure
> options well, before this release. However, most users will not need these
> two options any more as EMBOSS-6.1.0 includes precompiled jemboss class
> files collected in a java archive file. You should hopefully not get the
> above error if you omit these two options when you configure your emboss
> installation.
> 
> Regards,
> Mahmut
Thank you, removing these two seems to have done the trick!

Julian


From biopython at maubp.freeserve.co.uk  Thu Jul 30 15:00:37 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 30 Jul 2009 16:00:37 +0100
Subject: [EMBOSS] FASTQ records with no sequence?
Message-ID: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com>

Hi all,

On the continuing topic of the nebulous FASTQ format, are there
any strong views as to whether a FASTQ file could hold records
without a sequence (and therefore no quality scores)? This could
make sense as output from an (aggressive) quality filter.

This is a corner case, and applies to other file formats too of course
(e.g. FASTA).

I mentioned this to Peter Rice (EMBOSS) off list, and he replied:

On Thu, Jul 30, 2009 at 2:56 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
> EMBOSS rejects zero length sequences - something we put in some years
> ago for misformatted FASTA files that someone ran through a Taverna
> workflow to launch clustalw via EMBOSS's "emma". The user had got his
> carriage control characters mangled so the sequence was appended to the
> FASTA '>' line and appeared as a long description with no sequence.
>
> I can well imagine for filtering paired reads that zero length sequences
> would be useful.
>
> At the point where the test is made we know the sequence format.
> We can therefore define some or all formats as accepting or rejecting
> zero length sequences.
>
> Similarly we can easily extend to define some applications (e.g. emma)
> as requiring a minimum sequence length.
>
> regards,
>
> Peter

Peter Rice is of course correct - in general the meaning and validity
of a zero length sequence is context dependent.

I think Peter Rice makes a good point regarding paired end reads.
What I assume he was getting at is the situation where due to
quality trimming, one of a pair might be trimmed to nothing - leaving
essentially a singleton read. However, paired end reads are normally
stored using a matched pair of FASTQ files, so it could be important
to keep the zero length read present, so that they can be read in
together in sync.

If we do want to allow zero length sequences in FASTQ, would
both of the following be valid? Should there be empty sequence
and quality lines, or no sequence and quality lines?

"@identifier\n+\n" (two lines, just the @ and + lines)
"@identifier\n\n+\n\n" (four lines, including blank seq and qual lines)

or with the repeated identifier on the plus lines:

"@identifier\n+identifier\n" (two lines, just the @ and + lines)
"@identifier\n\n+identifier\n\n" (four lines, including blank lines)

As we are recommending no line wrapping on output this means
typical FASTQ records would be four lines - so doing the same
makes sense here too.
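
A four-line writer along these lines might look like the sketch below.
It is only an illustration of the convention being discussed, not code
from EMBOSS or Biopython; format_fastq and its repeat_id argument are
names invented for the example.

```python
def format_fastq(identifier, seq, qual, repeat_id=False):
    """Return one unwrapped FASTQ record as exactly four lines.

    A zero length read is written with blank sequence and quality
    lines, so every record still occupies four lines.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    # Optionally repeat the identifier on the '+' line.
    plus = "+" + identifier if repeat_id else "+"
    return "@%s\n%s\n%s\n%s\n" % (identifier, seq, plus, qual)


print(repr(format_fastq("identifier", "", "")))
print(repr(format_fastq("identifier", "", "", repeat_id=True)))
```

Keeping four lines per record means a pair of FASTQ files can still be
stepped through four lines at a time, even when one read of a pair has
been trimmed to nothing.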

Peter C.


From pmr at ebi.ac.uk  Thu Jul 30 15:09:41 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 30 Jul 2009 16:09:41 +0100
Subject: [EMBOSS] FASTQ records with no sequence?
In-Reply-To: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com>
References: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com>
Message-ID: <4A71B7B5.40502@ebi.ac.uk>

Peter C. wrote:

> As we are recommending no line wrapping on output this means
> typical FASTQ records would be four lines - so doing the same
> makes sense here too.

I vote for 4 lines on output. It should be possible to allow zero lines
on input depending on where the '+' check is.

regards,

Peter


From biopython at maubp.freeserve.co.uk  Thu Jul 30 15:19:24 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 30 Jul 2009 16:19:24 +0100
Subject: [EMBOSS] FASTQ records with no sequence?
In-Reply-To: <4A71B7B5.40502@ebi.ac.uk>
References: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com>
	<4A71B7B5.40502@ebi.ac.uk>
Message-ID: <320fb6e00907300819x35ae00c3wa20a382376134db7@mail.gmail.com>

On Thu, Jul 30, 2009 at 4:09 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>
>> As we are recommending no line wrapping on output this means
>> typical FASTQ records would be four lines - so doing the same
>> makes sense here too.
>
> I vote for 4 lines on output.

If we want to allow zero length sequences, then yes, I would also
vote for the 4 line output (i.e. blank lines for the sequence and
the quality string).

> It should be possible to allow zero lines on input depending on
> where the '+' check is.

Yes, I'm pretty sure a parser could cope with any of the zero length
sequence FASTQ examples I gave.
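
As a rough sketch of how such a parser could work: the code below is not
the EMBOSS or Biopython implementation, parse_fastq is a name made up
for the example, and it assumes unwrapped records (one sequence line and
one quality line at most). The key step is checking for the '+' line
straight after the '@' line, and using the sequence length to decide
whether a quality line is expected at all.

```python
def parse_fastq(text):
    """Yield (identifier, sequence, quality) from unwrapped FASTQ text."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string left after the final newline
    i = 0
    while i < len(lines):
        title = lines[i]
        if not title.startswith("@"):
            raise ValueError("expected '@' title line, got %r" % title)
        identifier = title[1:]
        i += 1
        # The sequence line is optional: a '+' line directly after the
        # title means the sequence line was omitted.
        seq = ""
        if i < len(lines) and not lines[i].startswith("+"):
            seq = lines[i]
            i += 1
        if i >= len(lines) or not lines[i].startswith("+"):
            raise ValueError("missing '+' line for %r" % identifier)
        if lines[i][1:] not in ("", identifier):
            raise ValueError("'+' line does not match %r" % identifier)
        i += 1
        if seq:
            # Non-empty read: the next line is the quality string, even
            # if it happens to start with '@' or '+'.
            if i >= len(lines):
                raise ValueError("missing quality for %r" % identifier)
            qual = lines[i]
            i += 1
        else:
            # Empty read: the quality must be empty too, so only a blank
            # line is consumed here; an '@' line starts the next record.
            qual = ""
            if i < len(lines) and lines[i] == "":
                i += 1
        if len(qual) != len(seq):
            raise ValueError("length mismatch for %r" % identifier)
        yield identifier, seq, qual


examples = [
    "@identifier\n+\n",
    "@identifier\n\n+\n\n",
    "@identifier\n+identifier\n",
    "@identifier\n\n+identifier\n\n",
]
for example in examples:
    print(list(parse_fastq(example)))
```

Because an empty sequence implies an empty quality string, a line
starting with '@' after the '+' line of an empty read can only be the
next record's title, which removes the usual ambiguity.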

Peter


From biopython at maubp.freeserve.co.uk  Thu Jul 30 15:35:25 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 30 Jul 2009 16:35:25 +0100
Subject: [EMBOSS] FASTQ records with no sequence?
In-Reply-To: <320fb6e00907300819x35ae00c3wa20a382376134db7@mail.gmail.com>
References: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com>
	<4A71B7B5.40502@ebi.ac.uk>
	<320fb6e00907300819x35ae00c3wa20a382376134db7@mail.gmail.com>
Message-ID: <320fb6e00907300835v3a9d46d4w77c344bbf6efa08d@mail.gmail.com>

Hi all,

On the continuing topic of the nebulous FASTQ format, are there
any strong views as to whether a FASTQ file could hold records
without a sequence (and therefore no quality scores)? This could
make sense as output from an (aggressive) quality filter.

This was a discussion I meant to start on the OBF list, not the
EMBOSS list - so here is the start of the thread:
http://lists.open-bio.org/pipermail/emboss/2009-July/003707.html

Basically in some contexts an empty FASTQ record makes sense,
so perhaps we should include examples of this for our test suite.
However, there is more than one reasonable way to represent
such a record (either omitting the sequence and quality lines, or
including blank sequence and quality lines).

On Thu, Jul 30, 2009 at 4:09 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>
>> As we are recommending no line wrapping on output this means
>> typical FASTQ records would be four lines - so doing the same
>> makes sense here too.
>
> I vote for 4 lines on output.

If we want to allow zero length sequences, then yes, I would also
vote for the 4 line output (i.e. blank lines for the sequence and
the quality string).

> It should be possible to allow zero lines on input depending on
> where the '+' check is.

Yes, I'm pretty sure a parser could cope with any of the zero length
sequence FASTQ examples I gave.

Peter


From wzhao6898 at gmail.com  Fri Jul 31 16:21:06 2009
From: wzhao6898 at gmail.com (David Zhao)
Date: Fri, 31 Jul 2009 16:21:06 +0000 (UTC)
Subject: [EMBOSS] How to specify the number of partial digestion using DIGEST
Message-ID: <loom.20090731T161701-636@post.gmane.org>

Hi there,

I'm trying to use DIGEST to generate a virtual tryptic digest for a list of
proteins, where I'd like to allow 2 or 3 missed cleavages. I know -overlap
returns 1 missed cleavage, and -allpartials returns all the combinations
and results in a huge file. Is there a way I can specify how many missed
cleavages I need?
Thanks,

David Zhao