From biopython at maubp.freeserve.co.uk Mon Jul 6 06:12:23 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Jul 2009 11:12:23 +0100 Subject: [EMBOSS] Many-to-many with needle and water Message-ID: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> Hi Peter R. et al, I gather EMBOSS is looking for feedback on new applications (given the recent funding from the BBSRC - congratulations again). How about suggestions for extensions to existing EMBOSS applications? I've used bits of EMBOSS for several years now (thank you!). Something I have sometimes wanted to do is a many-to-many pairwise sequence alignment with the EMBOSS tools needle and water. Right now, needle and water take two files (here referred to as A and B), where file A has just one sequence and file B can have one or more sequences. I'd like to be able to supply two files, both with multiple entries, and have needle/water do pairwise alignments of all the sequences in A against all the sequences in B. This might be useful for finding reciprocal best hits in comparative genomics (as a slower but exact alternative to FASTA or BLAST). From an implementation point of view, I might imagine doing sequence A1 against all of B, then sequence A2 against all of B, etc. This would require looping over file B many times (easy if it is on disk). This would also work if the A input was stdin, but having the B input on stdin would require caching the data if A has more than one sequence :( It may sometimes also be useful to have an all-against-all pairwise comparison for a single set of sequences. The above suggested enhancement would let you do this by comparing file A to file A. However, here you only really need to do half the possible combinations (as aligning sequence A1 to sequence A2 should give the same result as aligning A2 to A1). This could be useful for implementing a basic clustering algorithm, or maybe as part of a worked example of building a simple NJ tree?
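The looping order described above can be sketched in Python. This is a minimal illustration under stated assumptions, not EMBOSS code: records are hypothetical (identifier, sequence) tuples, A is consumed once (so it could come from stdin), B is re-read through a callable for each A record (standing in for re-opening file B on disk), and the actual needle/water call is left to the caller. The all-against-all variant uses itertools.combinations, so each unordered pair is produced once, relying on the symmetry noted above.

```python
from itertools import combinations
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical record type for this sketch: (identifier, sequence).
Record = Tuple[str, str]


def many_to_many(a_records: Iterable[Record],
                 open_b: Callable[[], Iterable[Record]]) -> Iterator[Tuple[Record, Record]]:
    """Yield every (A, B) pair: A1 against all of B, then A2, and so on.

    A is read once; open_b() is called again for each A record, which is
    cheap when B is a file on disk but impossible for a one-shot stream.
    """
    for a in a_records:
        for b in open_b():
            yield a, b


def all_against_all(records: Iterable[Record]) -> Iterator[Tuple[Record, Record]]:
    """Yield each unordered pair once, assuming align(X, Y) == align(Y, X)."""
    cached: List[Record] = list(records)  # the whole set has to be held in memory
    return combinations(cached, 2)


def pair_counts(n_a: int, n_b: int) -> Tuple[int, int]:
    """Alignments needed for n_a x n_b inputs, and for all-against-all of n_a."""
    return n_a * n_b, n_a * (n_a - 1) // 2
```

For a sense of scale, `pair_counts(3000, 3000)` returns `(9000000, 4498500)`: nine million alignments for two files of 3,000 sequences each, or roughly 4.5 million for an all-against-all comparison of a single such file (3,000 is just an illustrative size).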
So, does supporting many-to-many comparisons sound like a useful enhancement to needle and water? I should stress this isn't something I need right now. Also, it can be worked around with a wrapper script that calls needle/water once for each sequence in file A (against all the sequences in file B), with the added bonus that these one-to-many comparison jobs can then be shared across multiple CPU cores. Regards, Peter C. From pmr at ebi.ac.uk Mon Jul 6 06:35:12 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 06 Jul 2009 11:35:12 +0100 Subject: [EMBOSS] Many-to-many with needle and water In-Reply-To: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> Message-ID: <4A51D360.1050107@ebi.ac.uk> Peter Cock or biopython wrote: > Hi Peter R. et al, > > I gather EMBOSS is looking for feedback for new applications (given > the recent funding from the BBSRC - congratulations again). How about > suggestions for extensions to existing EMBOSS applications? > > I've used bits of EMBOSS for several years now (thank you!). Something > I have sometimes wanted to do is a many-to-many pairwise sequence > alignment with the EMBOSS tools needle and water. > > Right now, needle and water take two files (here referred to as A and > B), file A has just one sequence, and file B can have one or more > sequences. I'd like to be able to supply two files both with multiple > entries, and have needle/water do pairwise alignments between all the > sequences in A against all the sequences in B. This might be useful > for finding reciprocal best hits in comparative genomics (as an slower > but exact alternative to FASTA or BLAST). The application is easy to add (after the release). The usual problem with all-against-all is that it involves loading one of the inputs as a sequence set entirely in memory - to avoid reading one input many times over.
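The trade-off between holding one input in memory and re-reading it many times can be made concrete with a small Python sketch of the hybrid Peter C floats later in the thread: cache records during the first pass, but abandon the cache once a threshold is exceeded and re-read the file on later passes. Everything here is a hypothetical illustration (the minimal FASTA parser, the `CachingSequenceSource` class and its `max_cached` default of 1000 are assumptions, not EMBOSS behaviour), and, as Peter R points out, the re-reading fallback only works for a real file, not for streamed standard input.

```python
from typing import Iterator, List, Optional, TextIO, Tuple

# Hypothetical record type for this sketch: (title, sequence).
Record = Tuple[str, str]


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Minimal FASTA parser yielding (title, sequence) pairs."""
    title: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


class CachingSequenceSource:
    """Re-iterable view of a FASTA file that keeps small files in memory.

    The first full pass reads the file and caches the records. If more than
    max_cached records turn up, the cache is dropped and every later pass
    re-reads the file from disk instead.
    """

    def __init__(self, path: str, max_cached: int = 1000) -> None:
        self.path = path
        self.max_cached = max_cached
        self._cache: Optional[List[Record]] = None
        self._too_big = False
        self.disk_passes = 0  # number of times the file was actually read

    def __iter__(self) -> Iterator[Record]:
        if self._cache is not None:
            yield from self._cache  # served from memory
            return
        self.disk_passes += 1
        cache: Optional[List[Record]] = None if self._too_big else []
        with open(self.path) as handle:
            for record in read_fasta(handle):
                if cache is not None:
                    cache.append(record)
                    if len(cache) > self.max_cached:
                        cache = None  # over the threshold: stop caching
                        self._too_big = True
                yield record
        if cache is not None:  # only reached after a complete pass
            self._cache = cache
```

In a many-to-many loop, the B input would be wrapped once (for example `source = CachingSequenceSource("b.fasta")`) and iterated inside the loop over A; a small B file is then read from disk once, while a large one is re-read for each A record.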
We have an application, supermatcher, which does this - the first input is streamed through, and the second is loaded into memory as a sequence set. It uses word matching to find seed alignments, then runs a limited alignment around the hits. superwater would be a possible name (or superneedle). How popular would such a program be? How large would the smaller input set be? regards, Peter From biopython at maubp.freeserve.co.uk Mon Jul 6 06:58:06 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Jul 2009 11:58:06 +0100 Subject: [EMBOSS] Many-to-many with needle and water In-Reply-To: <4A51D360.1050107@ebi.ac.uk> References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> <4A51D360.1050107@ebi.ac.uk> Message-ID: <320fb6e00907060358v71934109o1d8acb0aef8d3e83@mail.gmail.com> On Mon, Jul 6, 2009 at 11:35 AM, Peter Rice wrote: > > Peter C wrote: > > Hi Peter R. et al, > > > > I gather EMBOSS is looking for feedback for new applications (given > > the recent funding from the BBSRC - congratulations again). How about > > suggestions for extensions to existing EMBOSS applications? > > > > I've used bits of EMBOSS for several years now (thank you!). Something > > I have sometimes wanted to do is a many-to-many pairwise sequence > > alignment with the EMBOSS tools needle and water. > > > > Right now, needle and water take two files (here referred to as A and > > B), file A has just one sequence, and file B can have one or more > > sequences. I'd like to be able to supply two files both with multiple > > entries, and have needle/water do pairwise alignments between all the > > sequences in A against all the sequences in B. This might be useful > > for finding reciprocal best hits in comparative genomics (as an slower > > but exact alternative to FASTA or BLAST).
> > The application is easy to add (after the release) > > The usual problem with all-against-all is that it involves loading one > of the inputs as a sequence set entirely in memory - to avoid reading > one input many times over. Right - and it would be difficult to decide whether holding one input in memory or reading the file many times is best in general without some specific use cases. [I suppose you could do something a bit more cunning, like starting by caching the sequences as you read them for re-use, but if the number of sequences crosses a threshold, stop caching and switch to re-reading the file for subsequent loops?] > We have an application supermatcher which does this - the first sequence > is streamed through, the second is a sequence set loaded into memory. It > uses work matching to find seed alignments then runs a limited alignment > around the hits. > > superwater would be a possible name (or superneedle). If you see many-to-many versions of water and needle as separate applications, then those names sound fine. > How popular would such a program be? I don't know - as I said, this is more of a suggestion than a request. I don't *need* this tool, but there have been occasions in the past where I would have tried using it if it had existed. Perhaps others on the list can think of better uses for this tool idea? > How large would the smaller input set be? Hard to say without specific examples in mind. For some hand-waving upper limits, for comparative genomics of bacteria using protein sequences, you might have a few thousand in each file. If I was trying this as part of an ad-hoc clustering algorithm (all-against-all), again maybe a few thousand sequences. In practice, a heuristic tool like supermatcher (or FASTA or BLAST) would probably be more sensible for large datasets like this due to the computational time. I see needle and water as most useful on smaller datasets where the runtime cost of using an exact algorithm isn't too high.
Therefore many-to-many needle/water searches may be best targeted at smaller sequence files. Things might be different with a multicore or GPU/OpenCL version of needle and water ;) Anyway, unless someone else thinks a many-to-many version of needle and water would be useful, I wouldn't expect you to implement this. I'm just putting the idea forward for discussion. Regards, Peter C. From pmr at ebi.ac.uk Mon Jul 6 07:19:30 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 06 Jul 2009 12:19:30 +0100 Subject: [EMBOSS] Many-to-many with needle and water In-Reply-To: <320fb6e00907060358v71934109o1d8acb0aef8d3e83@mail.gmail.com> References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> <4A51D360.1050107@ebi.ac.uk> <320fb6e00907060358v71934109o1d8acb0aef8d3e83@mail.gmail.com> Message-ID: <4A51DDC2.9030801@ebi.ac.uk> Peter C wrote: > [I suppose you could do something a bit more cunning like start by > caching the sequences as you read them read for re-use, but if the > number of sequences crosses a threshold, stop caching and switch > to re-reading the file for subsequence loops?] Tricky. Rereading is not always possible - for example, when the data source is streamed standard input. > Perhaps others on the list can think of a better uses for this tool idea? Let's see what response we get. One never knows until the question is asked :-) >> How large would the smaller input set be? > > Hard to say without specific examples in mind. For some hand waving > upper limits, for comparative genomics of bacteria using protein > sequences, you might have a few thousand in each file. If I was trying > this as part of an ad-hoc clustering algorithm (all-against-all), again > maybe a few thousand sequences. In practice, a heuristic tool like > supermatcher (or FASTA or BLAST) would probably be more sensible > for large datasets like this due to the computational time.
> > I see needle and water as most useful on smaller datasets where > the runtime cost of using an exact algorithm isn't too high. Therefore > many-to-many needle/water searches may be best targeted at > smaller sequence files. Things might be different with a multicore > or GPU/OpenCL version of needle and water ;) Multicore would be a possibility - at least on systems configured for it. We are looking into picking up methods from the BioManyCores project. > Anyway, unless someone else thinks a many-to-many version > of needle and water would be useful, I wouldn't expect you to > implement this. I'm just putting the idea forward for discussion. Implementing is easy - we could simply send you the code to install locally if nobody else needs it :-) After all, it is only a minor modification to the existing applications. regards, Peter From biopython at maubp.freeserve.co.uk Mon Jul 6 07:56:06 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Jul 2009 12:56:06 +0100 Subject: [EMBOSS] Probabilistic versions of needle/water? Message-ID: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com> Hi all, I have another suggestion for new or enhanced EMBOSS applications, again related to the existing pairwise sequence alignment tools needle and water. The FASTQ file format (and others) contains quality scores (often PHRED scores) representing the probability of an error in the associated nucleotide. Solexa/Illumina machines also provide another file with a more precise breakdown of the likelihood of each of the four bases. In some cases both sequences could have probability scores (e.g. trying to align the ends of contigs to each other), but often one sequence will be taken as fact (e.g. mapping reads onto a reference). It is possible to take these probabilities into account when scoring the matches in needle (or water) by using a probabilistic version of the Needleman-Wunsch sequence alignment algorithm (or a probabilistic Smith-Waterman).
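To illustrate the idea (this is neither GNUMAP's method nor an existing EMBOSS option), here is a minimal Python sketch of a quality-aware Needleman-Wunsch score in which the reference is taken as fact and each read base carries a PHRED quality. The assumptions are all illustrative: the error probability is 10^(-Q/10); read bases are restricted to A, C, G and T; when a call is wrong, the true base is taken to be one of the other three bases with equal probability; the substitution score is the expected match/mismatch score under that model; and gaps use a simple linear penalty rather than needle's gap open/extend scheme.

```python
from typing import Sequence

BASES = "ACGT"


def phred_to_error(q: float) -> float:
    """PHRED quality Q -> probability that the base call is wrong."""
    return 10 ** (-q / 10)


def expected_score(read_base: str, q: float, ref_base: str,
                   match: float = 1.0, mismatch: float = -1.0) -> float:
    """Expected substitution score of one read base against a trusted reference base.

    Assumption for this sketch: if the call is wrong, the true base is
    equally likely to be any of the other three bases.
    """
    p_err = phred_to_error(q)
    called = match if read_base == ref_base else mismatch
    alternatives = [b for b in BASES if b != read_base]
    alt = sum(match if b == ref_base else mismatch for b in alternatives) / 3
    return (1 - p_err) * called + p_err * alt


def quality_global_score(read: str, quals: Sequence[float], ref: str,
                         gap: float = -2.0) -> float:
    """Needleman-Wunsch score (linear gaps) using quality-weighted substitutions."""
    if len(read) != len(quals):
        raise ValueError("need one quality value per read base")
    prev = [j * gap for j in range(len(ref) + 1)]  # row for an empty read prefix
    for i in range(1, len(read) + 1):
        cur = [i * gap] + [0.0] * len(ref)
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + expected_score(read[i - 1], quals[i - 1], ref[j - 1])
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]
```

With these defaults, a mismatch at Q5 scores about -0.79 instead of about -1.0 at Q40, so low-confidence calls are penalised less. A Smith-Waterman (water-style) variant would additionally clamp each cell at zero and report the best cell rather than the bottom-right corner.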
As an example of this idea, did you (Peter R) see the GNUMAP talk/poster at ISMB 2009? See http://dna.cs.byu.edu/gnumap/ I am aware of people using EMBOSS tools (I assume water) to identify (known) adaptor sequences in raw Solexa/Illumina data. I considered doing something similar myself when trying to remove primer sequences from 454 data. Such a pipeline using the current EMBOSS water would be doing this matching at a purely fixed nucleotide level (ignoring the qualities), which isn't ideal. Upgrading to a probabilistic version of water should be an improvement. Peter C. From pmr at ebi.ac.uk Mon Jul 6 08:32:18 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 06 Jul 2009 13:32:18 +0100 Subject: [EMBOSS] Probabilistic versions of needle/water? In-Reply-To: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com> References: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com> Message-ID: <4A51EED2.3080901@ebi.ac.uk> Peter C. wrote: > I have another suggestion for new or enhanced EMBOSS applications, > again related to the existing pairwise sequence alignment tools needle > and water. > > The FASTQ file format (or others) contains quality scores (often PHRED > scores) representing the probability of an error in the associated > nucleotide. Solexa/Illumina machines also provide another file with a > more precise breakdown of the likelihood of each of the four bases. > > In some cases both sequences could have probability scores (e.g. > trying to align the ends of contigs to each other), but often one > sequence will be taken as fact (e.g. mapping reads onto a reference). > > It is possible to take these probabilities into account when > considering the matches in needle (or water) by using a probabilistic > version of the Needleman-Wunsch sequence alignment algorithm (or a > probabilistic Smith-Waterman). > > As an example of this idea, did you (Peter R) see the GNUMAP > talk/poster at ISMB 2009?
See http://dna.cs.byu.edu/gnumap/ I saw the talk, and was wondering about their algorithm. They did not have a separate treatment for gaps in the reads and in the consensus, which seemed like an obvious extension. > I am aware of people using EMBOSS tools (I assume water) to identify > (known) adaptor sequences in raw Solexa/Illumina data. I considered > doing something similar myself when trying to remove primer sequences > from 454 data. Such a pipeline using the current EMBOSS water would be > doing this matching at a purely fixed nucleotide level (ignoring the > qualities), which isn't ideal. Upgrading to a probabilistic version of > water should be an improvement. Would be interesting. Where can I look up adaptor calling methods? Peter Rice From biopython at maubp.freeserve.co.uk Mon Jul 6 10:25:47 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Jul 2009 15:25:47 +0100 Subject: [EMBOSS] Probabilistic versions of needle/water? In-Reply-To: <4A51EED2.3080901@ebi.ac.uk> References: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com> <4A51EED2.3080901@ebi.ac.uk> Message-ID: <320fb6e00907060725g380ce420vdf2f34c290518a16@mail.gmail.com> On Mon, Jul 6, 2009 at 1:32 PM, Peter Rice wrote: > >> I am aware of people using EMBOSS tools (I assume water) to identify >> (known) adaptor sequences in raw Solexa/Illumina data. I considered >> doing something similar myself when trying to remove primer sequences >> from 454 data. Such a pipeline using the current EMBOSS water would be >> doing this matching at a purely fixed nucleotide level (ignoring the >> qualities), which isn't ideal. Upgrading to a probabilistic version of >> water should be an improvement. > > Would be interesting. > > Where can I look up adaptor calling methods?
The particular example I had in mind was the thread with Giles Weaver on the BioPerl mailing list, which I see you have just replied to: http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030398.html http://lists.open-bio.org/pipermail/bioperl-l/2009-July/030404.html I think I made a typo earlier (needle versus water). If you are comparing a short but complete adaptor sequence to a read (which you expect may contain the full adaptor), a global alignment is more sensible than a local one. On re-reading, Giles did actually say he was using needle: http://lists.open-bio.org/pipermail/bioperl-l/2009-July/030411.html Peter From michael.watson at bbsrc.ac.uk Mon Jul 6 11:45:06 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Mon, 6 Jul 2009 16:45:06 +0100 Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> Hello I did this on a test server a few weeks ago, wrote a script, and it worked. Now I come to the production server - well, I get some errors. Same OS, same tarballs. Very frustrating. Can anyone help? I have PHYLIPNEW-3.67 unpacked in EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67.
In that directory, I run configure and then make. I get a variety of errors, but this time it's: Making all in src make[1]: Entering directory `/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src' gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\" -DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1 -DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1 -DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus -I../../../plplot -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2 -D__amd64__ -MT clique.o -MD -MP -MF .deps/clique.Tpo -c -o clique.o clique.c mv -f .deps/clique.Tpo .deps/clique.Po gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\" -DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1 -DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1 -DPLD_png=1 -I.
-I../include -I../../../ajax -I../../../nucleus -I../../../plplot -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2 -D__amd64__ -MT disc.o -MD -MP -MF .deps/disc.Tpo -c -o disc.o disc.c mv -f .deps/disc.Tpo .deps/disc.Po gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\" -DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1 -DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1 -DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus -I../../../plplot -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2 -D__amd64__ -MT phylip.o -MD -MP -MF .deps/phylip.Tpo -c -o phylip.o phylip.c mv -f .deps/phylip.Tpo .deps/phylip.Po /bin/sh ../libtool --tag=CC --mode=link gcc -O2 -D__amd64__ -o fclique clique.o disc.o phylip.o ../../../nucleus/libnucleus.la ../../../ajax/libajax.la ../../../plplot/libeplplot.la -L/usr/lib64 -lX11 -lXaw -lXt -lm -lgd -lpng -lz -lm mkdir .libs gcc -O2 -D__amd64__ -o .libs/fclique clique.o disc.o phylip.o ../../../nucleus/.libs/libnucleus.so ../../../ajax/.libs/libajax.so ../../../plplot/.libs/libeplplot.so -L/usr/lib64 -lX11 -lXaw -lXt -lgd -lpng -lz -lm -Wl,--rpath -Wl,/usr/local/lib /usr/bin/ld: cannot find -lXaw collect2: ld returned 1 exit status make[1]: *** [fclique] Error 1 make[1]: Leaving directory `/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src' make: *** [all-recursive] Error 1 Any help would be appreciated. 
Thanks Mick Head of Bioinformatics Institute for Animal Health Compton Berks RG20 7NN 01635 578411 Please consider the environment and don't print this e-mail unless you really need to. The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email, and associated attachments, has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. The Institute for Animal Health is a company limited by guarantee, registered in England no. 559784. The Institute is also a registered charity, Charity Commissioners Reference No. 228824 From jasonzhang2002 at gmail.com Mon Jul 6 12:35:21 2009 From: jasonzhang2002 at gmail.com (jason zhang) Date: Mon, 06 Jul 2009 09:35:21 -0700 Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <4A5227C9.7040004@gmail.com> Hi Michael, I did this on my Fedora server: yum install libXaw libXaw-compat libXaw-devel to resolve this. -jason From steffen_moeller at gmx.de Mon Jul 6 12:28:47 2009 From: steffen_moeller at gmx.de (Steffen Moeller) Date: Mon, 06 Jul 2009 18:28:47 +0200 Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <4A52263F.1050806@gmx.de> Hi Michael, michael watson (IAH-C) wrote: > I did this on a test server a few weeks ago and wrote a script and it > worked. > > Now I come to the production server - well, I get some errors.
Same OS, > same tar balls. Very frustrating. Can anyone help? ... > > gcc -O2 -D__amd64__ -o .libs/fclique clique.o disc.o phylip.o > ../../../nucleus/.libs/libnucleus.so ../../../ajax/.libs/libajax.so > ../../../plplot/.libs/libeplplot.so -L/usr/lib64 -lX11 -lXaw -lXt -lgd > -lpng -lz -lm -Wl,--rpath -Wl,/usr/local/lib > > /usr/bin/ld: cannot find -lXaw > > collect2: ld returned 1 exit status Your production machine is probably not prepared for graphical user interfaces :) Debian has it in the package libxaw7. Many greetings Steffen $ dpkg -S /usr/lib/libXaw.so.7 libxaw7: /usr/lib/libXaw.so.7 From michael.watson at bbsrc.ac.uk Tue Jul 7 04:18:39 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 7 Jul 2009 09:18:39 +0100 Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW In-Reply-To: <4A5227C9.7040004@gmail.com> References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> <4A5227C9.7040004@gmail.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508FD0DD3@iahce2ksrv1.iah.bbsrc.ac.uk> Hi guys, Thanks for all the replies. One last question: do I need the X11 development libraries in order for fdrawgram and fdrawtree to produce PostScript files? Thanks Michael From ellis at biochem.dental.upenn.edu Tue Jul 7 15:34:27 2009 From: ellis at biochem.dental.upenn.edu (Ellis Golub) Date: Tue, 7 Jul 2009 15:34:27 -0400 Subject: [EMBOSS] Structure Message-ID: <1090707153427.ZM647422@biochem.dental.upenn.edu> Hello -- I've downloaded the EMBASSY STRUCTURE-0.1.0 package and tried to install it. I have previously installed EMBOSS-6.0.1. Using the same flags with configure that I used for the original EMBOSS installation, all seems to go well until I try to run one of the programs.
The following output is obtained:

> pdbparse
1274873:pdbparse: rld: Error: unresolvable symbol in /usr/local/emboss/lib/libplplot.so.6: gdImageColorAllocate
1274873:pdbparse: rld: Error: unresolvable symbol in /usr/local/emboss/lib/libplplot.so.6: gdImageLine
1274873:pdbparse: rld: Fatal Error: this executable has unresolvable symbols

The /usr/local/emboss/lib/libplplot.so.6 file is the one produced with the EMBOSS install, and all the EMBOSS files work fine, and the graphics outputs are fine. Anyone have a clue? Ellis -- ============================================================= Ellis Golub Phone: (215) 898-4629 Biochemistry Department FAX: (215) 898-3695 School of Dental Medicine ellis at biochem.dental.upenn.edu University of Pennsylvania 240 South 40th Street Philadelphia, PA 19104-6030 Home: http://biochem.dental.upenn.edu/~ellis/ PGP: http://biochem.dental.upenn.edu/~ellis/public_key.html ============================================================== From lueck at ipk-gatersleben.de Wed Jul 8 05:59:39 2009 From: lueck at ipk-gatersleben.de (lueck at ipk-gatersleben.de) Date: Wed, 8 Jul 2009 11:59:39 +0200 Subject: [EMBOSS] Many-to-many with needle and water In-Reply-To: <4A51D360.1050107@ebi.ac.uk> References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> <4A51D360.1050107@ebi.ac.uk> Message-ID: <20090708115939.8spikcm8zxi8cscw@webmail.ipk-gatersleben.de> Hi! I would be interested in "superneedle" if the speed is acceptable. I'm still looking for a fast and good enough short sequence finder/aligner (see http://www.mailinglistarchive.com/biopython at biopython.org/msg00941.html). At the moment I'm still using BLAST because I have no time to switch but later I would like to change to another algorithm. Kind regards Stefanie Quoting Peter Rice: > Peter Cock or biopython wrote: >> Hi Peter R. et al, >> >> I gather EMBOSS is looking for feedback for new applications (given >> the recent funding from the BBSRC - congratulations again).
How about >> suggestions for extensions to existing EMBOSS applications? >> >> I've used bits of EMBOSS for several years now (thank you!). Something >> I have sometimes wanted to do is a many-to-many pairwise sequence >> alignment with the EMBOSS tools needle and water. >> >> Right now, needle and water take two files (here referred to as A and >> B), file A has just one sequence, and file B can have one or more >> sequences. I'd like to be able to supply two files both with multiple >> entries, and have needle/water do pairwise alignments between all the >> sequences in A against all the sequences in B. This might be useful >> for finding reciprocal best hits in comparative genomics (as a slower >> but exact alternative to FASTA or BLAST). > > The application is easy to add (after the release) > > The usual problem with all-against-all is that it involves loading one > of the inputs as a sequence set entirely in memory - to avoid reading > one input many times over. > > We have an application supermatcher which does this - the first sequence > is streamed through, the second is a sequence set loaded into memory. It > uses word matching to find seed alignments then runs a limited alignment > around the hits. > > superwater would be a possible name (or superneedle). > > How popular would such a program be? > > How large would the smaller input set be?
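The streaming arrangement Peter Rice describes for supermatcher (one input read once as a stream, the other held in memory, word matches used as seeds) can be sketched roughly as below. This is an illustrative Python sketch, not supermatcher's actual code: the function names `build_word_index` and `candidate_pairs` and the word length `k` are hypothetical choices. The in-memory set is indexed by exact k-letter words, and each streamed query only yields the targets it shares at least one word with; those pairs would then go to a full needle/water-style alignment. Memory grows with the in-memory set, which is why the size of the smaller input matters.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple


def build_word_index(sequences: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each exact k-letter word to the IDs of the in-memory sequences containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for seq_id, seq in sequences.items():
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            index[seq[start:start + k]].add(seq_id)
    return index


def candidate_pairs(
    streamed: Iterable[Tuple[str, str]],
    in_memory: Dict[str, str],
    k: int,
) -> Iterator[Tuple[str, str]]:
    """Read the streamed (id, sequence) records exactly once and yield
    (query_id, target_id) for every in-memory target sharing at least one
    exact k-letter word with the query (a seed for a full alignment)."""
    index = build_word_index(in_memory, k)
    for query_id, query in streamed:
        query = query.upper()
        hits: Set[str] = set()
        for start in range(len(query) - k + 1):
            # .get() does not add missing keys to the defaultdict
            hits.update(index.get(query[start:start + k], ()))
        for target_id in sorted(hits):
            yield query_id, target_id


if __name__ == "__main__":
    targets = {"b1": "ACGTACGTAA", "b2": "TTTTGGGGCC"}
    queries = iter([("a1", "GGACGTACGG"), ("a2", "CCCCCCCCCC"), ("a3", "TTTGGGGA")])
    print(list(candidate_pairs(queries, targets, k=5)))
```

Because `streamed` is consumed as an iterator, the first input could come from stdin, matching the constraint that only one side needs to be cached.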
> > regards, > > Peter From pmr at ebi.ac.uk Wed Jul 8 07:04:04 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 08 Jul 2009 12:04:04 +0100 Subject: [EMBOSS] Many-to-many with needle and water In-Reply-To: <20090708115939.8spikcm8zxi8cscw@webmail.ipk-gatersleben.de> References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> <4A51D360.1050107@ebi.ac.uk> <20090708115939.8spikcm8zxi8cscw@webmail.ipk-gatersleben.de> Message-ID: <4A547D24.6070703@ebi.ac.uk> Hi Stefanie, lueck at ipk-gatersleben.de wrote: > Hi! > > I would be interested in "superneedle" if the speed is acceptable. I'm > still looking for a fast and good enough short sequence finder/aligner > (see > http://www.mailinglistarchive.com/biopython at biopython.org/msg00941.html). > > At the moment I'm still using BLAST because I have no time to switch but > later I would like to change to another algorithm. From the biopython message it seems you mainly want to find exact 21-mers. Can you give an example of a query and a database so I can test the speed of current EMBOSS applications like wordfinder? regards, Peter Rice From biopython at maubp.freeserve.co.uk Wed Jul 8 17:50:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 8 Jul 2009 22:50:19 +0100 Subject: [EMBOSS] transeq and ambiguous codons Message-ID: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> Hi all, Something I mentioned to Peter Rice in passing at BOSC/ISMB 2009 was I'd found an oddity in transeq with certain ambiguous codons while testing Biopython's translations. Here is a specific example (but I suspect there are more).
For reference, I am expecting EMBOSS transeq to be using the NCBI tables: http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi

First consider the following example, the codon TAN, which can be TAA, TAC, TAG or TAT which translate to stop or Y. Therefore the translation of TAN should be "* or Y", and EMBOSS transeq opts for "X". Which is fine:

$ transeq asis:TAATACTAGTATTAN -stdout -auto
>asis_1
*Y*YX

Similarly for the codon TNN, again EMBOSS transeq opts for "X" because this could be a stop codon, or W, or F, or L, or S, or Y or C! Again, this is fine:

$ transeq asis:TNN -stdout -auto
>asis_1
X

However, consider the codon TRR. R means A or G, so this can mean TAA, TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI standard table agree here). Therefore the translation of TRR should be "* or W", which I would expect based on the above examples to result in "X". But instead EMBOSS transeq gives "*":

$ transeq asis:TAATGATAGTGGTRRTNN -stdout -auto
>asis_1
***W*X

I think this is a bug.

However, I am aware that the machine I tried this on is rather old, and I don't actually know which version of EMBOSS it is. How can I find out? As far as I know, there is no "-version" or "-v" or "--version" switch, and the "-help" information doesn't include this important piece of information. Nor is this in the FAQ: http://emboss.sourceforge.net/docs/faq.html

So that makes two questions - how should transeq translate "TRR", and how do I check the version of EMBOSS?

Thanks,

Peter C.

From SMarkel at accelrys.com Wed Jul 8 19:53:05 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Wed, 8 Jul 2009 19:53:05 -0400 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> Message-ID: <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net>

Peter,

Answer to question #2: run the program embossversion.
> embossversion
Writes the current EMBOSS version number to a file
6.0.1

Scott

Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics

From pmr at ebi.ac.uk Thu Jul 9 03:38:34 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 09 Jul 2009 08:38:34 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> Message-ID: <4A559E7A.4020807@ebi.ac.uk> Peter C. wrote: > Hi all, > > Something I mentioned to Peter Rice in passing at BOSC/ISMB 2009 was > I'd found an oddity in transeq with certain ambiguous codons while > testing Biopython's translations. Here is a specific example (but I > suspect there are more). For reference, I am expecting EMBOSS transeq > to be using the NCBI tables: > http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi Yes, those are our source. > However, consider the codon TRR. R means A or G, so this can mean TAA, > TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI > standard table agree here). Therefore the translation of TRR should be
Therefore the translation of TRR should be > "* or W", which I would expect based on the above examples to result > in "X". But instead EMBOSS transeq gives "*": > > $ transeq asis:TAATGATAGTGGTRRTNN -stdout -auto >> asis_1 > ***W*X Hmmm... something odd there. TRA translates as 'X'. I will take a look, but we are bundling the 6.1.0 release so (unless I am very quick) any resolution will be a patch after the release. > So that makes two questions - how should transeq translate "TRR", and > how do I check the version of EMBOSS? Scott got there first (different time zone I assume :-) with the suggestion of embossversion. Meanwhile .... are there any translation utilities we could usefully include in a future version? I am already contemplating one to report the translations of ambiguous codons, just for information in tracking and debugging translation tables - and explaining transeq and other applications to new users. regards, Peter Rice From biopython at maubp.freeserve.co.uk Thu Jul 9 04:59:52 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Jul 2009 09:59:52 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net> Message-ID: <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com> On Thu, Jul 9, 2009 at 12:53 AM, Scott Markel wrote: > > Peter, > > Answer to question #2: run the program embossversion. > >> embossversion > Writes the current EMBOSS version number to a file > 6.0.1 > > Scott Thanks Scott (& Thomas) for pointing out the embossversion program. I would still question why the EMBOSS tools don't also support the Unix convention of a version switch. Hypothetically, aren't some (many?) of the tools standalone and couldn't they be installed individually (e.g. as part of someone else's software bundle)? i.e. 
Can EMBOSS really guarantee that the needle tool and the embossversion tool are in sync? Peter From pmr at ebi.ac.uk Thu Jul 9 05:08:37 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 09 Jul 2009 10:08:37 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> Message-ID: <4A55B395.4090301@ebi.ac.uk> Peter C. wrote: > However, consider the codon TRR. R means A or G, so this can mean TAA, > TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI > standard table agree here). Therefore the translation of TRR should be > "* or W", which I would expect based on the above examples to result > in "X". But instead EMBOSS transeq gives "*": This is a side effect of the way backtranslation works. EMBOSS calculates the "most ambiguous codon" for each amino acid and stop, and uses this for back translation. Thus a '*' in a protein sequence would be rendered as 'TRR' by backtranseq. To provide consistent translation of the backtranseq results, TRR is assumed to be a backtranslated stop. Similarly, MGN is 'R' because it could reasonably result from a backtranslation of 'R'. I agree that it would also be reasonable to be strict about translation in transeq and render TRR as 'X'. It depends on your philosophy of where the ambiguity codes came from - from backtranslation, or the curious mind of a bioinformatician :-) So .... it's not a bug, it's a feature ... which means I can relax for now and contemplate some extras in the next release. In future, we will at least make sure TRA and other 'unambiguous ambiguous codons' get understood as '*' etc. TRR I would prefer to leave as it is by default, with an option for rendering it as 'X' or an alternative to transeq with the strict translation rules enforced.
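The "most ambiguous codon" idea can be illustrated with a short Python sketch, assuming NCBI translation table 1 and the standard IUPAC nucleotide codes. The helper names are hypothetical and this is not EMBOSS's backtranseq code. For each residue it merges all of that residue's codons position by position into a single IUPAC codon, and a second helper expands an ambiguous codon back out to list every residue it could encode, which is roughly the kind of report mentioned earlier in the thread.

```python
from itertools import product
from typing import Set

# NCBI standard genetic code (translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CODE_FOR_BASES = {frozenset(bases): code for code, bases in IUPAC.items()}


def most_ambiguous_codon(residue: str) -> str:
    """Merge every codon for `residue` ('*' = stop) position by position into one IUPAC codon."""
    codons = [codon for codon, aa in STANDARD_TABLE.items() if aa == residue]
    return "".join(
        CODE_FOR_BASES[frozenset(codon[pos] for codon in codons)] for pos in range(3)
    )


def translations(codon: str) -> Set[str]:
    """Every residue (or '*' for stop) that a possibly ambiguous codon could encode."""
    concrete = ("".join(bases) for bases in product(*(IUPAC[b] for b in codon.upper())))
    return {STANDARD_TABLE[c] for c in concrete}


if __name__ == "__main__":
    for residue in "*RL":
        codon = most_ambiguous_codon(residue)
        print(residue, codon, sorted(translations(codon)))
```

Under these assumptions it derives TRR for stop and MGN for arginine, matching the examples above, and shows why the merged codon loses information: TRR also covers TGG (W), MGN also covers AGT and AGC (S), and leucine's YTN also covers TTT and TTC (F). Mapping such codons back to the residue they were built from keeps backtranslated sequences round-tripping, at the cost of the strict reading.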
regards, Peter Rice From pmr at ebi.ac.uk Thu Jul 9 05:16:40 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 09 Jul 2009 10:16:40 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net> <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com> Message-ID: <4A55B578.1060808@ebi.ac.uk> Peter C. wrote: > Thanks Scott (& Thomas) for pointing out the embossversion program. > > I would still question why the EMBOSS tools don't also support the > Unix convention of a version switch. Hypothetically, aren't some > (many?) of the tools standalone and couldn't they be installed > individually (e.g. as part of someone else's software bundle)? i.e. > Can EMBOSS really guarantee that the needle tool and the > embossversion tool are in sync? We could easily add a -version global qualifier ... for the next release. We can guarantee that embossversion and needle are in sync - assuming they are built using the same libraries as that is where the version is recorded. Standalone builds are an issue, though, and it would help with debugging in a few cases. regards, Peter Rice From biopython at maubp.freeserve.co.uk Thu Jul 9 05:21:45 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Jul 2009 10:21:45 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <4A55B578.1060808@ebi.ac.uk> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net> <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com> <4A55B578.1060808@ebi.ac.uk> Message-ID: <320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com> On Thu, Jul 9, 2009 at 10:16 AM, Peter Rice wrote: > > Peter C. wrote: > >> Thanks Scott (& Thomas) for pointing out the embossversion program.
>> >> I would still question why the EMBOSS tools don't also support the >> Unix convention of a version switch. Hypothetically, aren't some >> (many?) of the tools standalone and couldn't they be installed >> individually (e.g. as part of someone else's software bundle)? i.e. >> Can EMBOSS really guarantee that the needle tool and the >> embossversion tool are in sync? > > We could easily add a -version global qualifier ... for the next release. > > We can guarantee that embossversion and needle are in sync - assuming > they are built using the same libraries as that is where the version is > recorded. Standalone build are an issue though and it would help debug > in a few cases. That sounds good to me :) Peter C. From biopython at maubp.freeserve.co.uk Thu Jul 9 05:28:20 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Jul 2009 10:28:20 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> Message-ID: <320fb6e00907090228u7fd0df2by1e2ca18fc0a5c43@mail.gmail.com> On Wed, Jul 8, 2009 at 10:50 PM, Peter wrote: > Hi all, > > Something I mentioned to Peter Rice in passing at BOSC/ISMB 2009 was > I'd found an oddity in transeq with certain ambiguous codons while > testing Biopython's translations. Here is a specific example (but I > suspect there are more). For reference, I am expecting EMBOSS transeq > to be using the NCBI tables: > http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi > > First consider the following example, the codon TAN, which can be TAA, > TAC, TAG or TAT which translate to stop or Y. Therefore the > translation of TAN should be "* or Y", and EMBOSS transeq opts for > "X". 
Which is fine:

Using raw output instead of the default FASTA works better in emails:

$ transeq asis:TAATACTAGTATTAN -stdout -auto -osformat raw
*Y*YX

> Similarly for the codon TNN, again EMBOSS transeq opts for "X" because > this could be a stop codon, or W, or F, or L, or S, or Y or C! Again, > this is fine:

Again, using raw output works better in emails:

$ transeq asis:TNN -stdout -auto -osformat raw
X

> However, consider the codon TRR. R means A or G, so this can mean TAA, > TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI > standard table agree here). Therefore the translation of TRR should be > "* or W", which I would expect based on the above examples to result > in "X". But instead EMBOSS transeq gives "*":

Again, using raw output works better in emails:

$ transeq asis:TAATGATAGTGGTRR -stdout -auto -osformat raw
***W*

> I think this is a bug. > > However, I am aware that the machine I tried this on is rather old, > and I don't actually know which version of EMBOSS it is.

I can check the old machine later, but I just retested on a Mac using EMBOSS 6.0.1 (the current release), and saw the same behaviour.

Peter C.

From michael.watson at bbsrc.ac.uk Thu Jul 9 07:33:59 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 9 Jul 2009 12:33:59 +0100 Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> <4A5227C9.7040004@gmail.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>

Hmmm. My sys admin says.... I can find:

libXaw-devel.x86_64 : X.Org X11 libXaw development package
libXaw.x86_64 : X.Org X11 libXaw runtime library
libXaw.i386 : X.Org X11 libXaw runtime library
libXaw-devel.i386 : X.Org X11 libXaw development package

Have I got compatibility issues here?
-----Original Message----- From: jason zhang [mailto:jasonzhang2002 at gmail.com] Sent: Mon 06/07/2009 5:35 PM To: michael watson (IAH-C) Cc: emboss at lists.open-bio.org Subject: Re: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW Hi, Michael I did this in my fedora server : yum install libXaw libXaw-compat libXaw-devel to resolve this. -jason michael watson (IAH-C) wrote: > Hello > > > > I did this on a test server a few weeks ago and wrote a script and it > worked. > > > > Now I come to the production server - well, I get some errors. Same OS, > same tar balls. Very frustrating. Can anyone help? > > > > I have PHYLIPNEW-3.67 unpacked in EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67. > In that directory, I run configure and then make > > > > I get a variety of errors, but this time it's: > > > > /usr/bin/ld: cannot find -lXaw > > collect2: ld returned 1 exit status > > make[1]: *** [fclique] Error 1 > > make[1]: Leaving directory > `/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src' > > make: *** [all-recursive] Error 1 > > > > Any help would be appreciated. > > > > Thanks > > Mick From ajb at ebi.ac.uk Thu Jul 9 08:08:14 2009 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Thu, 9 Jul 2009 13:08:14 +0100 (BST) Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> <4A5227C9.7040004@gmail.com> <8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk> > Hmmm. My sys admin says.... > > I can find: > libXaw-devel.x86_64 : X.Org X11 libXaw development package > libXaw.x86_64 : X.Org X11 libXaw runtime library > libXaw.i386 : X.Org X11 libXaw runtime library > libXaw-devel.i386 : X.Org X11 libXaw development package > > Have I got compatibility issues here? What operating system and version are you running? We're at the bundling stage here for the next release and there might be an opportunity for us to check.
Alan From michael.watson at bbsrc.ac.uk Thu Jul 9 08:22:10 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 9 Jul 2009 13:22:10 +0100 Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> <4A5227C9.7040004@gmail.com> <8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk> <35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B292CE@iahce2ksrv1.iah.bbsrc.ac.uk> I think it's Red Hat Enterprise Linux 5 64 bit version.... -----Original Message----- From: ajb at ebi.ac.uk [mailto:ajb at ebi.ac.uk] Sent: Thu 09/07/2009 1:08 PM To: michael watson (IAH-C) Cc: emboss at emboss.open-bio.org Subject: Re: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW > Hmmm. My sys admin says.... > > I can find: > libXaw-devel.x86_64 : X.Org X11 libXaw development package > libXaw.x86_64 : X.Org X11 libXaw runtime library > libXaw.i386 : X.Org X11 libXaw runtime library > libXaw-devel.i386 : X.Org X11 libXaw development package > > Have I got compatibility issues here? What operating system and version are you running? We're at the bundling stage here for the next release and there might be an opportunity for us to check. Alan From A.J.Pemberton at bham.ac.uk Thu Jul 9 09:12:01 2009 From: A.J.Pemberton at bham.ac.uk (Anthony Pemberton) Date: Thu, 9 Jul 2009 14:12:01 +0100 Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW In-Reply-To: <35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk><4A5227C9.7040004@gmail.com><8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk> <35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk> Message-ID: Guys, The system in question is probably a 64bit system. 
Very often (though not always) there is no compatibility issue with the packages you list below; they are simply the 32bit and 64bit versions of the libraries. On RHEL 5 they will install in /usr/lib and /usr/lib64 respectively. Under RHEL 5 just:

# up2date -i libXaw libXaw-devel

(# represents the root prompt)

I am fairly sure a good install of the packages requires these libraries. You will need a Red Hat subscription for up2date to work (or install from your distro CD/DVD).

Regards, Tony Pemberton > -----Original Message----- > From: emboss-bounces at lists.open-bio.org [mailto:emboss- > bounces at lists.open-bio.org] On Behalf Of ajb at ebi.ac.uk > Sent: 09 July 2009 13:08 > To: michael watson (IAH-C) > Cc: emboss at emboss.open-bio.org > Subject: Re: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW > > > Hmmm. My sys admin says.... > > > > I can find: > > libXaw-devel.x86_64 : X.Org X11 libXaw development package > > libXaw.x86_64 : X.Org X11 libXaw runtime library > > libXaw.i386 : X.Org X11 libXaw runtime library > > libXaw-devel.i386 : X.Org X11 libXaw development package > > > > Have I got compatibility issues here? > > What operating system and version are you running? > We're at the bundling stage here for the next release and there > might be an opportunity for us to check. > > Alan From biopython at maubp.freeserve.co.uk Fri Jul 10 05:14:42 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 10 Jul 2009 10:14:42 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <4A55B395.4090301@ebi.ac.uk> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <4A55B395.4090301@ebi.ac.uk> Message-ID: <320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com> On Thu, Jul 9, 2009 at 10:08 AM, Peter Rice wrote: > > Peter C. wrote: >> However, consider the codon TRR.
R means A or G, so this can mean TAA, >> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI >> standard table agree here). Therefore the translation of TRR should be >> "* or W", which I would expect based on the above examples to result >> in "X". But instead EMBOSS transeq gives "*": > > This is a side effect of the way backtranslation works... OK, leaving TRR aside for the moment (I'm not sure I'd have done it that way, but I think I follow your logic), I have some more problem cases for you to consider (all using the default standard NCBI table 1). Most of these are 'unambiguous ambiguous codons' as you put it, and I would agree using X when a more specific letter is possible isn't ideal but isn't actually wrong. The "ATS" and related codons (see below) however are simply wrong. -------------------------------------------------------------------------------------- TRA means TAA or TGA, which are both stop codons. Therefore TRA should translate as a stop, not as an X: $ transeq asis:TAATGATRA -stdout -auto -osformat raw **X -------------------------------------------------------------------------------------- Now look at YTA, which means CTA or TTA which encode L, so YTA should be L not X: $ transeq asis:CTATTAYTA -stdout -auto -osformat raw LLX Likewise for YTG and YTR, and YTN. -------------------------------------------------------------------------------------- Another example, ATW means ATA or ATT, which both translate as I, so ATW should translate as I not X: $ transeq asis:ATAATTATW -stdout -auto -osformat raw IIX -------------------------------------------------------------------------------------- Conversely, ATS which means ATC or ATG which translate as I and M. Remember S means G or C. 
Therefore ATS should translate as X, and not I: $ transeq asis:ATCATGATS -stdout -auto -osformat raw IMI Likewise H means A, G or C, so ATH shows the same bug, as do some other AT* codons: $ transeq asis:ATAATCATGATH -stdout -auto -osformat raw IIMI [*** This one strikes me as a clear bug ***] -------------------------------------------------------------------------------------- Now for another debatable one, RAT means AAT or GAT which code for N and D. So, you could use B (Asx) here rather than the broader X. $ transeq asis:AATGATRAT -stdout -auto -osformat raw NDX Again, the same thing for others like RAC -> X not B, and RAY -> X not B. Similarly, you don't use J to mean leucine (L) or to isoleucine (I), and opt for X (again, this is justifiable). e.g. WTA $ transeq asis:ATATTAWTA -stdout -auto -osformat raw ILX -------------------------------------------------------------------------------------- This list is only partial, and only for the standard table. Peter C. From pmr at ebi.ac.uk Fri Jul 10 05:30:52 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 10 Jul 2009 10:30:52 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <4A55B395.4090301@ebi.ac.uk> <320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com> Message-ID: <4A570A4C.2000500@ebi.ac.uk> Peter C. wrote: > OK, leaving TRR aside for the moment (I'm not sure I'd have done it that > way, but I think I follow your logic), I have some more problem cases for > you to consider (all using the default standard NCBI table 1). > > Most of these are 'unambiguous ambiguous codons' as you put it, and > I would agree using X when a more specific letter is possible isn't ideal > but isn't actually wrong. The "ATS" and related codons (see below) > however are simply wrong. They do look wrong. The "X when it could pick a residue" ones I knew of. 
The others need a closer look. The plan is to work through all possible codons and all the NCBI genetic codes as soon as the release is out. It should be a simple patch to ajtranslate.c when I'm done. > -------------------------------------------------------------------------------------- > > Now for another debatable one, RAT means AAT or GAT which code > for N and D. So, you could use B (Asx) here rather than the broader X. > > Similarly, you don't use J to mean leucine (L) or to isoleucine (I), and > opt for X (again, this is justifiable). e.g. WTA Hmmm ... B and Z are ambiguity codes for amino acid analyser where all the amide bonds are broken and that includes N->D and Q->E. We used to have one of those in the lab. Similarly, J is for mass spec where I and L have the same molecular weight. I don't consider them appropriate for translation. So I plan to go for unique amino acids where possible with the ambiguity codes. What do our users think? regards, Peter From biopython at maubp.freeserve.co.uk Fri Jul 10 19:10:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 11 Jul 2009 00:10:19 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <4A570A4C.2000500@ebi.ac.uk> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <4A55B395.4090301@ebi.ac.uk> <320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com> <4A570A4C.2000500@ebi.ac.uk> Message-ID: <320fb6e00907101610o63140f2cp7f0aebd1cc7022d1@mail.gmail.com> On Fri, Jul 10, 2009 at 10:30 AM, Peter Rice wrote: > > Peter C. wrote: >> >> OK, leaving TRR aside for the moment (I'm not sure I'd have done it that >> way, but I think I follow your logic), I have some more problem cases for >> you to consider (all using the default standard NCBI table 1). >> >> Most of these are 'unambiguous ambiguous codons' as you put it, and >> I would agree using X when a more specific letter is possible isn't ideal >> but isn't actually wrong. 
The "ATS" and related codons (see below) >> however are simply wrong. > > They do look wrong. The "X when it could pick a residue" ones I knew of. > > The others need a closer look. The plan is to work through all possible > codons and all the NCBI genetic codes as soon as the release is out. > > It should be a simple patch to ajtranslate.c when I'm done. > OK - I appreciate this is too last minute for the imminent EMBOSS release. >> -------------------------------------------------------------------------------------- >> >> Now for another debatable one, RAT means AAT or GAT which code >> for N and D. So, you could use B (Asx) here rather than the broader X. >> >> Similarly, you don't use J to mean leucine (L) or to isoleucine (I), and >> opt for X (again, this is justifiable). e.g. WTA > > Hmmm ... B and Z are ambiguity codes for amino acid analyser where all the > amide bonds are broken and that includes N->D and Q->E. We used to have one > of those in the lab. Similarly, J is for mass spec where I and L have the > same molecular weight. I don't consider them appropriate for translation. Well, as I said, this is debatable. On the one hand B and Z are IUPAC standards (although J isn't yet), but amino acids don't have the full ambiguous alphabet that we have for nucleotides so some might find such a translation surprising. http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html > So I plan to go for unique amino acids where possible with the ambiguity > codes. Good :) Peter C. 
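The rule the two Peters converge on in this thread can be sketched in a few lines of Python: expand each IUPAC nucleotide code into its possible bases, translate every resulting codon with the standard code (NCBI table 1), and return a single residue only when all expansions agree, otherwise X. In line with Peter Rice's plan, it never emits B, Z or J. This is an illustrative sketch under those assumptions, not the ajtranslate.c code; `translate_codon` and `translate` are hypothetical names, and only table 1 is covered.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes and the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# NCBI standard genetic code (table 1); codons ordered TCAG x TCAG x TCAG
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE1 = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_codon(codon):
    """Return one residue if every expansion of the codon agrees, else 'X'."""
    choices = [IUPAC[base] for base in codon.upper()]
    residues = {TABLE1["".join(bases)] for bases in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(seq):
    """Translate complete codons in frame 1, ignoring any trailing bases."""
    return "".join(translate_codon(seq[i:i + 3])
                   for i in range(0, len(seq) - 2, 3))
```

On the examples above this gives `***` for TAATGATRA, `LLL` for CTATTAYTA, `III` for ATAATTATW and `IMX` for ATCATGATS, and X for TRR. Because IUPAC H stands for A, C or T (not G), ATH expands only to the isoleucine codons ATA, ATC and ATT and comes out as I under this rule; YTN also covers TTT and TTC (phenylalanine), so it comes out as X.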
From aradwen at gmail.com Sat Jul 11 06:29:40 2009 From: aradwen at gmail.com (Radwen ANIBA) Date: Sat, 11 Jul 2009 12:29:40 +0200 Subject: [EMBOSS] Problem with protein caracters Message-ID: Hi everyone I'm trying to use some programs that come with the EMBOSS package to analyze some protein sequences, but I sometimes get this message: Error: ajSeqTypeCheckIn: Sequence must be protein sequence without BZ U X or *: found bad character 'X' Is there any way to force the program to accept these types of residues? Thank you radwen From pmr at ebi.ac.uk Sat Jul 11 06:54:21 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Sat, 11 Jul 2009 11:54:21 +0100 Subject: [EMBOSS] Problem with protein caracters In-Reply-To: References: Message-ID: <4A586F5D.5060702@ebi.ac.uk> Radwen ANIBA wrote: > I'm trying to use some programs that come with the EMBOSS package to analyze > some protein sequences, but I sometimes get this message: > > Error: ajSeqTypeCheckIn: Sequence must be protein sequence without BZ U X or > *: found bad character 'X' > > Is there any way to force the program to accept these types of residues EMBOSS uses the type attribute of the input sequence (or seqset or seqall) to identify the type of the input sequence (nucleotide, protein, or any) and the characters that are allowed (gaps, stops, non-standard residues and ambiguity characters). Your application is expecting "pureprotein". This is only used by applications unable to handle the ambiguity codes (it can be difficult to define what an algorithm should do with them). The alternatives are: protein - accepts all characters, converts stops to X; proteinstandard - converts U, O and J to 'X'; stopproteinstandard - converts stops, U, O, J to X. "protein" is probably what you want.
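The input types described in this reply can be modelled as a small lookup table. The Python sketch below only illustrates the behaviour stated here (which characters "pureprotein" rejects, going by the error message, and which characters the other types convert to X). It is not EMBOSS's ajSeqTypeCheckIn code: `check_protein` is a hypothetical name, and it assumes that characters the reply does not mention pass through unchanged.

```python
# Characters rewritten to 'X' by each input type, as listed in the reply
CONVERT_TO_X = {
    "protein": set("*"),
    "proteinstandard": set("UOJ"),
    "stopproteinstandard": set("*UOJ"),
}

# Characters the strict "pureprotein" type refuses (from the error message)
PUREPROTEIN_BAD = set("BZUX*")


def check_protein(seq, seqtype):
    """Validate or convert a protein sequence according to its declared type."""
    seq = seq.upper()
    if seqtype == "pureprotein":
        for ch in seq:
            if ch in PUREPROTEIN_BAD:
                raise ValueError(f"found bad character '{ch}'")
        return seq
    convert = CONVERT_TO_X[seqtype]
    # Assumption: anything not listed for this type is kept as-is
    return "".join("X" if ch in convert else ch for ch in seq)
```

With this model, a sequence containing X is rejected as "pureprotein" but accepted unchanged as "protein", which is why switching the type is the suggested fix.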
You need to be able to do something with the ambiguity codes X, B, Z and J and with the non-standard amino acids U (selenocysteine) and O (pyrrolysine) Hope this helps Peter Rice From liisa.koski at dnalandmarks.ca Mon Jul 13 09:56:37 2009 From: liisa.koski at dnalandmarks.ca (Liisa Koski) Date: Mon, 13 Jul 2009 09:56:37 -0400 Subject: [EMBOSS] fuzznuc score Message-ID: Hello, I would like to know how the score is calculated for the fuzznuc results. What is taken into consideration(strand, exact matches, etc?). What does a high score vs lower score mean? Many thanks, Liisa From pmr at ebi.ac.uk Mon Jul 13 10:43:18 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 13 Jul 2009 15:43:18 +0100 Subject: [EMBOSS] fuzznuc score In-Reply-To: References: Message-ID: <4A5B4806.1080905@ebi.ac.uk> Liisa Koski wrote: > Hello, > > I would like to know how the score is calculated for the fuzznuc results. > What is taken into consideration(strand, exact matches, etc?). What does a > high score vs lower score mean? The score in EMBOSS report output is a required field for standard formats such as GFF. For fuzznuc the score is the length of the sequence that is found, minus any mismatches. Patterns that can match very long sequences can therefore give a very high score. Hope this helps, Peter Rice From ajb at ebi.ac.uk Wed Jul 15 07:19:33 2009 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Wed, 15 Jul 2009 12:19:33 +0100 (BST) Subject: [EMBOSS] EMBOSS 6.1.0 release now available Message-ID: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> Dear EMBOSS users and developers, A new version of EMBOSS (6.1.0) is now available for download from our ftp server: ftp://emboss.open-bio.org/pub/EMBOSS/ If you use any of the EMBASSY packages (e.g. PHYLIP, VIENNA etc) then, as usual, remember to re-download and compile those too. 
A new version of the mEMBOSS, the Windows port, is also available from: ftp://emboss.open-bio.org/pub/EMBOSS/windows/mEMBOSS-6.1.0-setup.exe Many new capabilities have been added and bugs fixed throughout. Release highlights for EMBOSS include: * Full support for the new SwissProt format. In most cases the entry can be read and written exactly * Full support for EMBL and GenBank entries. In most cases the entry can be read and written exactly * Support for FASTQ short read formats for sequence and quality data * Full support for protein and nucleotide sequence parsing from PDB entries * Full support for GFF3 feature format as the new default feature output * Improved summary information at the end of report output * Alignment output using multiple sequence formats * Extended support for distance matrix file formats * Improved support for regular expression and pattern searching * Improved support for large sequence alignments * Support for remote locations in feature table processing, for example retrieval in coderet. * Output directory support extended to allow directories to be created * Normalisation option for hydrophobicity plots (pepwindow and pepwindowall) * Processing of methylation sites in restriction mapping * Embossdata reports results alphabetically sorted * Command line qualifiers should be unique after 5 characters to allow safe abbreviation * Improved configuration procedures for X11 support * Support for dasgff report format, making it possible to write EMBOSS-based DAS annotation servers Release highlights for EMBASSY include: * Support for MEME 4.0 * Phylipnew updated to Phylip 3.68 * Support for the HMMERDB environment variable in Hmmernew. * Bug fixes for the MSE multiple sequence editor Release highlights for Jemboss include: * Refactoring of the source code * Location of the 'Execution mode' menu moved near to the 'Go' button in the application forms. 
When a user runs a job for the first time in 'batch' mode an information message is displayed * Automatic configuration of the standalone Jemboss GUI on UNIX systems after typing "make install" for EMBOSS. This standalone GUI can be run using the runJemboss.csh script in the EMBOSS 'bin' directory. This assumes that you have a reasonably up-to-date version of Java installed (1.6 preferred) For future extensions, we have added: * Parsing of cross-reference information from SwissProt and EMBL/GenBank formats * Code to delete and update database indexes New EMBOSS wiki EMBOSS now has a Wiki at http://emboss.open-bio.org/wiki where we will maintain the master copies of documentation for the applications and libraries, and where we have sections for planning new features and applications for the next 3 years of funding. Please contribute any corrections to the documentation and add new ideas to the "Planning" section. We will, of course, be making the wiki prettier as it matures. Important note for Developers New distributions of operating systems have started to use the series 2 version of libtool. We therefore now use this in our CVS repository. The latest stable version of libtool is 2.2.6a (reported by libtool itself as 2.2.6). Developers using systems with older (1.5.x) libtool versions will have to install a local copy of libtool. This would typically be done by downloading the source code from the GNU site: ftp://ftp.gnu.org/ After installing libtool it will usually be necessary to then re-install autoconf (2.63) and automake (1.11) to the same directory root (they are often tied to the version of libtool they were provided with). They too are available from the GNU ftp server. Make sure that your PATH is refreshed between doing the installations of the GNU tools in order that the previous versions aren't referenced. We note that one system (cygwin) currently provides an experimental version of libtool (2.2.7). 
Developers on these systems (and, in general, on any system with a higher version of libtool than in our CVS repository) should type: autoreconf -fi before attempting compilation. We will usually keep up-to-date with libtool stable releases within a libtool series. New BBSRC funding and future work As previously announced, we have recently been re-funded by the BBSRC. What we said in that announcement bears repeating here. The core aims of the funding proposal were to continue support, maintenance and development of EMBOSS, and to provide extensive online training materials for users, developers and system administrators using text from a series of books to be published by Cambridge University Press. We are also explicitly targeting areas where we see EMBOSS can be expanded: * Richer data content in EMBOSS outputs leading to major improvements in the integration and visualisation of results in browsers. * Processing many more data fields in EMBOSS inputs (taxonomy, genes, GO terms, cross-references, keywords). * Extending and improving database access: better indexing, query language support and combining searches across multiple databases, support for non-sequence data resources and new data access methods. * Scaling up the libraries and adding new applications to support the data volumes generated by next-generation sequencing runs. We anticipate many more users will be working with short read data mapped to reference sequences over the next few years. * We aim to add at least 100 new applications in these 3 years. Suggestions for new applications are very welcome. * Major work on new developments and new library code will start from August.
Alan From db60 at st-andrews.ac.uk Wed Jul 15 08:47:17 2009 From: db60 at st-andrews.ac.uk (Daniel Barker) Date: Wed, 15 Jul 2009 13:47:17 +0100 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> Message-ID: <4A5DCFD5.8050101@st-andrews.ac.uk> Dear Alan, Very glad to see the project will continue with new funding. One request. Could EMBOSS please add a full global alignment program or option? Needle allows 'overhanging ends' with no penalty. From the current Needle documentation: 'There is no penalty for the hanging ends of the overlap. In bioinformatics, it is usually reasonable to assume that the sequences are incomplete and there should be no penalty for failing to align the missing bases.' For protein alignments, this tends to be very lenient towards alignments with these kinds of domain architecture: protein 1: A-B-C protein 2: C-X-Y-Z As opposed to the following situation, which (correctly for global alignment) does tend to be heavily penalized: protein 1: A-C-D protein 2: C I don't think this makes sense. Or at least, it definitely does not make sense for all applications. Generally with pairwise alignment of proteins I want either local alignment (e.g. water), or full global alignment which isn't available in EMBOSS. Thank you for considering this. Best regards, Daniel -- Daniel Barker http://bio.st-andrews.ac.uk/staff/db60.htm The University of St Andrews is a charity registered in Scotland : No SC013532 From pmr at ebi.ac.uk Wed Jul 15 09:44:26 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 15 Jul 2009 14:44:26 +0100 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A5DCFD5.8050101@st-andrews.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5DCFD5.8050101@st-andrews.ac.uk> Message-ID: <4A5DDD3A.9070605@ebi.ac.uk> Daniel Barker wrote: > One request. 
Could EMBOSS please add a full global alignment program or > option? Needle allows 'overhanging ends' with no penalty. > > I don't think this makes sense. Or at least, it definitely does not make > sense for all applications. Generally with pairwise alignment of > proteins I want either local alignment (e.g. water), or full global > alignment which isn't available in EMBOSS. Thanks for the suggestion. Would you be happy with an option for needle to penalise gaps at the end, or a separate application (if so, we need to invent a new name)? regards, Peter Rice From db60 at st-andrews.ac.uk Wed Jul 15 09:59:25 2009 From: db60 at st-andrews.ac.uk (Daniel Barker) Date: Wed, 15 Jul 2009 14:59:25 +0100 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A5DDD3A.9070605@ebi.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5DCFD5.8050101@st-andrews.ac.uk> <4A5DDD3A.9070605@ebi.ac.uk> Message-ID: <4A5DE0BD.7090103@st-andrews.ac.uk> Dear Peter, > Would you be happy with an option for needle to penalise gaps at the > end, or a separate application (if so, we need to invent a new name)? An option for needle would be fine. This may be the best solution.
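The difference Daniel describes can be illustrated with a minimal Needleman-Wunsch sketch in Python that has a switch for end gaps. It is a simplification, not needle's implementation: it assumes a linear gap penalty and a plain match/mismatch score instead of needle's gap open/extend penalties and substitution matrix, and `nw_score` and `penalize_end_gaps` are hypothetical names, not needle options.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2, penalize_end_gaps=True):
    """Needleman-Wunsch score with a linear gap penalty (simplified model)."""
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    if penalize_end_gaps:
        # Leading gaps cost the same as internal gaps
        for i in range(1, n + 1):
            F[i][0] = i * gap
        for j in range(1, m + 1):
            F[0][j] = j * gap
    # Otherwise row 0 and column 0 stay at 0, so leading overhangs are free
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    if penalize_end_gaps:
        # Full global alignment: both sequences are used end to end
        return F[n][m]
    # Free trailing overhangs: the alignment may finish at the end of either sequence
    return max(max(F[i][m] for i in range(n + 1)),
               max(F[n][j] for j in range(m + 1)))
```

For GATTACA against TTAC, free end gaps give a score of 4 (the overhanging GA and A are not charged), while penalising end gaps gives 4 minus three end gaps at 2 each, i.e. -2.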
Thanks a lot and best wishes, Daniel -- Daniel Barker http://bio.st-andrews.ac.uk/staff/db60.htm The University of St Andrews is a charity registered in Scotland : No SC013532 From charles-listes-emboss at plessy.org Wed Jul 15 10:59:53 2009 From: charles-listes-emboss at plessy.org (Charles Plessy) Date: Wed, 15 Jul 2009 23:59:53 +0900 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> Message-ID: <20090715145953.GA30812@kunpuu.plessy.org> On Wed, Jul 15, 2009 at 12:19:33PM +0100, ajb at ebi.ac.uk wrote: > Dear EMBOSS users and developers, > > A new version of EMBOSS (6.1.0) is now available for download > from our ftp server: > > ftp://emboss.open-bio.org/pub/EMBOSS/ Dear EMBOSS developers, congratulations on the release! I have updated our Debian EMBOSS packages, and they should become available for the Sid (unstable) distribution in our mirrors within a day or two. Please let me know if these packages prove difficult to install on our current stable distribution (Lenny), and I will prepare a semi-official backport. I will update and rebuild the Debian EMBASSY packages against EMBOSS 6.1.0 later this week. Have a nice day -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From golharam at umdnj.edu Thu Jul 16 15:09:03 2009 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 16 Jul 2009 15:09:03 -0400 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> Message-ID: <4A5F7ACF.4060808@umdnj.edu> Any thought on implementing some of the algorithms using CUDA when possible on GPUs? This could speed up some programs significantly.
From pmr at ebi.ac.uk Fri Jul 17 03:39:11 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 17 Jul 2009 08:39:11 +0100 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A5F7ACF.4060808@umdnj.edu> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5F7ACF.4060808@umdnj.edu> Message-ID: <4A602A9F.4040908@ebi.ac.uk> Ryan Golhar wrote: > Any thought on implementing some of the algorithms using CUDA when > possible on GPUs? This could speed up some programs significantly. Yes indeed. At BOSC/ISMB last month we were discussing closer collaborations with the other Open Bio Foundation projects.
One of these is BioManyCores which is aiming at OpenCL programming but is currently concentrating on CUDA. When our new workstations are delivered we will be looking into CUDA. Which applications would you most like to speed up? (current EMBOSS programs, and suggestions for new ones) regards, Peter Rice From hpm at bioinfo-user.org.uk Fri Jul 17 05:59:42 2009 From: hpm at bioinfo-user.org.uk (Hamish McWilliam) Date: Fri, 17 Jul 2009 10:59:42 +0100 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A602A9F.4040908@ebi.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5F7ACF.4060808@umdnj.edu> <4A602A9F.4040908@ebi.ac.uk> Message-ID: <4A604B8E.1000004@bioinfo-user.org.uk> Hi Peter, >> Any thought on implementing some of the algorithms using CUDA when >> possible on GPUs? This could speed up some programs significantly. Given that our server systems do not have particularly powerful GPUs, but do have multiple CPU cores: threading, and possibly the use of on-core vectorization (see http://en.wikipedia.org/wiki/SIMD), seem like more generally applicable methods for improving performance in our case. One interesting option for Intel platforms is the Intel Compiler (icc), which will vectorize some code constructions as a platform-specific optimization. Unfortunately we are running a mixture of AMD and Intel systems of various vintages, so this option is going to require a lot of testing to check it works and gives us any benefits. > Yes indeed. > > At BOSC/ISMB last month we were discussing closer collaborations with > the other Open Bio Foundation projects. > > One of these is BioManyCores which is aiming at OpenCL programming but > is currently concentrating on CUDA. > > When our new workstations are delivered we will be looking into CUDA. Given that OpenCL supports both GPU and CPU vectorization, and CUDA is Nvidia GPU-specific, it may be worth waiting for OpenCL to be adopted.
MacOS X Snow Leopard is only a couple of months away after all ;-) > Which applications would you most like to speed up? (current EMBOSS > programs, and suggestions for new ones) At our end the bottlenecks are mainly the indexing (dbi* & dbx*) and reformatting tools (seqret). All the best, Hamish From georgios at biotek.uio.no Fri Jul 17 06:52:38 2009 From: georgios at biotek.uio.no (George Magklaras) Date: Fri, 17 Jul 2009 12:52:38 +0200 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A604B8E.1000004@bioinfo-user.org.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5F7ACF.4060808@umdnj.edu> <4A602A9F.4040908@ebi.ac.uk> <4A604B8E.1000004@bioinfo-user.org.uk> Message-ID: <4A6057F6.7010704@biotek.uio.no> Hamish McWilliam wrote: > > At our end the bottlenecks are mainly the indexing (dbi* & dbx*) and > reformatting tools (seqret). > I second that. -- -- George Magklaras BSc Hons MPhil RHCE:805008309135525 Senior Computer Systems Engineer/UNIX-Linux Systems Administrator EMBnet Technical Management Board The Biotechnology Centre of Oslo, University of Oslo http://folk.uio.no/georgios From golharam at umdnj.edu Fri Jul 17 11:03:02 2009 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 17 Jul 2009 11:03:02 -0400 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A602A9F.4040908@ebi.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5F7ACF.4060808@umdnj.edu> <4A602A9F.4040908@ebi.ac.uk> Message-ID: <4A6092A6.3070808@umdnj.edu> I think the sequence alignment programs or anything dealing with matrices would be a good first start. Even (as Hamish mentioned), making use of multi-core systems would be good. We are interested in doing some CUDA programming here. We are doing some training in about a month. Based on that, we'll see how well we know it and what we want to pursue. Maybe we can help you out a bit in this?
Peter Rice wrote: > Ryan Golhar wrote: >> Any thought on implementing some of the algorithms using CUDA when >> possible on GPUs? This could speed up some programs significantly. > > Yes indeed. > > At BOSC/ISMB last month we were discussing closer collaborations with > the other Open Bio Foundation projects. > > One of these is BioManyCores which is aiming at OpenCL programming but > is currently concentrating on CUDA. > > When our new workstations are delivered we will be looking into CUDA. > > Which applications would you most like to speed up? (current EMBOSS > programs, and suggestions for new ones) > > regards, > > Peter Rice > From pmr at ebi.ac.uk Fri Jul 17 11:34:31 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 17 Jul 2009 16:34:31 +0100 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A6057F6.7010704@biotek.uio.no> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5F7ACF.4060808@umdnj.edu> <4A602A9F.4040908@ebi.ac.uk> <4A604B8E.1000004@bioinfo-user.org.uk> <4A6057F6.7010704@biotek.uio.no> Message-ID: <4A609A07.2080609@ebi.ac.uk> George Magklaras wrote: > Hamish McWilliam wrote: > At our end the bottlenecks are mainly the indexing (dbi* & dbx*) and > reformatting tools (seqret). >> > I second that. dbi and dbx we understand and can speed up. What large data are you using through seqret? I assume it is not single sequences that are slow.
regards, Peter Rice From belegdol at gmail.com Sun Jul 19 08:38:19 2009 From: belegdol at gmail.com (Julian Sikorski) Date: Sun, 19 Jul 2009 14:38:19 +0200 Subject: [EMBOSS] Packaging EMBOSS for Fedora In-Reply-To: <1244850556.8999.7.camel@login-svr1.ebi.ac.uk> References: <1244850556.8999.7.camel@login-svr1.ebi.ac.uk> Message-ID: On 13.06.2009 01:49, Mahmut Uludag wrote: > Hi Julian, > >> I have one crucial question: is it worth packaging jemboss? It seems it >> hasn't seen development since 2005, and I have experienced some important >> issues when trying to use it (mainly not working with files with spaces >> in filenames). What do you think? > > I had a quick look into the files with spaces in filenames problem, we > can include a fix in the coming release (6.1) for this problem. Previous > release of EMBOSS (6.0) included some fixes of similar nature, as well > as some small gui improvements. We recently have updated EMBOSS/Jemboss > make/install machinery to make standalone Jemboss come out of box when > users install EMBOSS, I think we were successful and this will be a new > feature in EMBOSS 6.1. > > Regards, > Mahmut Given that I'm finally an MSc and have some more free time, I decided to revisit the packaging and update the package to 6.1.0.
Unfortunately, there seem to be some problems with make install: /usr/bin/make install-exec-hook make[7]: Entering directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor' mkdir -p -- /builddir/build/BUILDROOT/EMBOSS-6.1.0-1.fc11.x86_64/usr/share/EMBOSS/jemboss/org/emboss/jemboss/editor /usr/bin/install: cannot stat `*.class': No such file or directory make[7]: *** [install-exec-hook] Error 1 make[7]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor' make[6]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor' make[6]: *** [install-exec-am] Error 2 make[5]: *** [install-am] Error 2 make[5]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor' make[4]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss' make[4]: *** [install-recursive] Error 1 make[3]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss' make[2]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org' make[3]: *** [install-recursive] Error 1 make[2]: *** [install-recursive] Error 1 make[1]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss' make[1]: *** [install-recursive] Error 1 make: *** [install-recursive] Error 1 6.0.1, including the 1-12 patch, was building and installing fine. Am I doing something wrong?
Cheers, Julian From uludag at ebi.ac.uk Sun Jul 19 12:18:29 2009 From: uludag at ebi.ac.uk (uludag at ebi.ac.uk) Date: Sun, 19 Jul 2009 17:18:29 +0100 (BST) Subject: [EMBOSS] Packaging EMBOSS for Fedora In-Reply-To: References: <1244850556.8999.7.camel@login-svr1.ebi.ac.uk> Message-ID: <49993.78.105.201.225.1248020309.squirrel@webmail.ebi.ac.uk> > there seem to be some problems with make install: > > /usr/bin/make install-exec-hook > make[7]: Entering directory > `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor' > mkdir -p -- > /builddir/build/BUILDROOT/EMBOSS-6.1.0-1.fc11.x86_64/usr/share/EMBOSS/jemboss/org/emboss/jemboss/editor > /usr/bin/install: cannot stat `*.class': No such file or directory Looks like we didn't test the --with-java and --with-javaos configure options well, before this release. However, most users will not need these two options any more as EMBOSS-6.1.0 includes precompiled jemboss class files collected in a java archive file. You should hopefully not get the above error if you omit these two options when you configure your emboss installation. Regards, Mahmut From biopython at maubp.freeserve.co.uk Mon Jul 20 09:24:02 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 20 Jul 2009 14:24:02 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net> <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com> <4A55B578.1060808@ebi.ac.uk> <320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com> Message-ID: <320fb6e00907200624x5513f3b5u8f7fa0c88e23fc8d@mail.gmail.com> On Thu, Jul 9, 2009 at 10:21 AM, Peter wrote: > On Thu, Jul 9, 2009 at 10:16 AM, Peter Rice wrote: >> >> Peter C. wrote: >> >>> Thanks Scott (& Thomas) for pointing out the embossversion program. 
>>> >>> I would still question why the EMBOSS tools don't also support the >>> Unix convention of a version switch. Hypothetically, aren't some >>> (many?) of the tools standalone and couldn't they be installed >>> individually (e.g. as part of someone else's software bundle)? i.e. >>> Can EMBOSS really guarantee that the needle tool and the >>> embossversion tool are in sync? >> >> We could easily add a -version global qualifier ... for the next release. >> >> We can guarantee that embossversion and needle are in sync - assuming >> they are built using the same libraries as that is where the version is >> recorded. Standalone builds are an issue though and it would help debug >> in a few cases. > > That sounds good to me :) > Thinking about this again, rather than adding a whole new argument (-version), why not just include the program version as the first line of the help output (from -help)? This should also solve the corner case of standalone builds, and makes it very easy to find the version (without having to know about the embossversion tool). Thanks, Peter C. From pmr at ebi.ac.uk Mon Jul 20 09:37:38 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 20 Jul 2009 14:37:38 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <320fb6e00907200624x5513f3b5u8f7fa0c88e23fc8d@mail.gmail.com> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net> <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com> <4A55B578.1060808@ebi.ac.uk> <320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com> <320fb6e00907200624x5513f3b5u8f7fa0c88e23fc8d@mail.gmail.com> Message-ID: <4A647322.7080903@ebi.ac.uk> Peter C. wrote: > Thinking about this again, rather than adding a whole new argument > (-version), why not just include the program version as the first line of > the help output (from -help)?
This should also solve the corner case > of standalone builds, and makes it very easy to find the version > (without having to know about the embossversion tool). Too late ... I already have working code for --version :-) This can report the EMBOSS version number. For EMBASSY applications it also reports the EMBASSY package version number (something embossversion is unable to do). There was also a "version:" ACD application attribute. This has to be renamed to "versionnumber:" to avoid a clash with the new qualifier name, and will be reported if defined. The "versionnumber" attribute in ACD was originally defined only for third-party applications in the SoapLab interface so we do not expect renaming the attribute to cause any problems (we will make the changes to SoapLab). However, the suggestion of adding it to the -help output is a good one. I will add something there too. regards, Peter Rice From biopython at maubp.freeserve.co.uk Mon Jul 20 11:41:43 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 20 Jul 2009 16:41:43 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines Message-ID: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> Hi all, I've just updated my Mac to EMBOSS 6.1.0, and have found an issue with seqret conversion of IntelliGenetics files. After some digging, I think this problem relates to having DOS new lines in a file on Unix (in my case, Mac OS X).
For illustration, I'm using the example file from the EMBOSS website, saved to disk (using Unix new lines on a Mac): http://emboss.sourceforge.net/docs/themes/seqformats/ig Using EMBOSS 6.0.1, there was a problem: $ embossversion Writes the current EMBOSS version number to a file 6.0.1 $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter >HSFAU ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaaH-sapiensfaugenebasesH SFAUctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcga aaacggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgatta acactgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacag ccgtagcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacat ggtagctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgc cccgtcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggag ctaggactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgt gacacgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccat cttcgcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaaggg cttgtagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgc tccgtggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgt gagccgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatc tcctttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcg ccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccagg aaacggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtg ctcttcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcat gtagcctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgccc 
ctggaggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagta gcaggccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgt ctagtgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagta cttctcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacaca gacgtccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatccta gtctggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctata aattagaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaac tttgttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagagg ggttctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacag gtaaagtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtga gtgagagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtc cctgggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatg ctaggtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaac aggagaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgct ttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtct tttgtaattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttca tctttacttgcaaggcctcagggagaggtgtgcttctcgg i.e. The two sequences have been munged into one, with the name of the second sequence as part of the sequence. 
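For readers unfamiliar with the format, here is a minimal Python sketch of how the two records in the example IntelliGenetics file are expected to split. It assumes the layout suggested by the example and by the 6.0.1 output above: one or more `;` comment lines (the first is used as the description), a name line, then sequence lines ending with a `1` (linear) or `2` (circular) terminator. The function names `parse_ig` and `to_fasta` are hypothetical and this is not EMBOSS's own parser; the sequences in the usage are truncated stand-ins.

```python
def _finish_record(name, comments, seq_lines):
    """Join sequence lines and drop a trailing '1' or '2' terminator."""
    residues = "".join(seq_lines)
    if residues[-1:] in ("1", "2"):
        residues = residues[:-1]
    description = comments[0] if comments else ""
    return (name, description, residues)


def parse_ig(text):
    """Split IntelliGenetics-style text into (name, description, sequence) tuples.

    Sketch only: a ';' line seen after sequence data starts a new record,
    which is exactly the boundary the 6.0.1 output above failed to spot.
    """
    records = []
    comments, name, seq_lines = [], None, []
    # str.splitlines() treats '\n', '\r\n' and '\r' all as line breaks,
    # so the same records come out whatever the file's newline convention.
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";"):
            if name is not None:
                records.append(_finish_record(name, comments, seq_lines))
                comments, name, seq_lines = [], None, []
            comments.append(line.lstrip(";").strip())
        elif name is None:
            name = line  # first non-comment line of a record is its name
        else:
            seq_lines.append("".join(line.split()))
    if name is not None:
        records.append(_finish_record(name, comments, seq_lines))
    return records


def to_fasta(records, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    out = []
    for name, description, residues in records:
        out.append(">" + name + (" " + description if description else ""))
        for start in range(0, len(residues), width):
            out.append(residues[start:start + width])
    return "\n".join(out) + "\n"


sample = (";H.sapiens fau mRNA, 518 bases\nHSFAU\nttcctctttc\ntcgactcc1\n"
          ";H.sapiens fau 1 gene, 2016 bases\nHSFAU1\nctaccatttt2\n")
print(to_fasta(parse_ig(sample.replace("\n", "\r\n"))), end="")
# >HSFAU H.sapiens fau mRNA, 518 bases
# ttcctctttctcgactcc
# >HSFAU1 H.sapiens fau 1 gene, 2016 bases
# ctaccatttt
```

Stripping each line before testing it means a DOS-style `\r` can never leak into the name or sequence.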
Using EMBOSS 6.1.0, the following now works: $ embossversion Reports the current EMBOSS version number 6.1.0 $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter >HSFAU H.sapiens fau mRNA, 518 bases ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaa >HSFAU1 H.sapiens fau 1 gene, 2016 bases ctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcgaaaac ggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgattaacac tgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacagccgt agcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacatggta gctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgccccg tcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggagctag gactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgtgaca cgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccatcttc gcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaagggcttg tagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgctccg tggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgtgagc cgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatctcct ttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcgccaa tatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaac ggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtgctct tcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcatgtag cctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctgg aggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagtagcag gccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgtctag 
tgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagtacttc tcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacacagacg tccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatcctagtct ggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctataaatt agaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaactttg ttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagaggggtt ctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacaggtaa agtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtgagtga gagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtccctg ggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatgctag gtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaacagga gaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgt caacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtcttttg taattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttcatctt tacttgcaaggcctcagggagaggtgtgcttctcgg i.e. There was a problem with this example file in EMBOSS 6.0.1, but things look fine in EMBOSS 6.1.0. Great :) However, if we now convert this input file to use DOS/Windows newlines, and repeat the test (on Mac OS X, so Unix): $ embossversionReports the current EMBOSS version number 6.1.0 $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter H.sapiens fau mRNA, 518 bases ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaa H.sapiens fau 1 gene, 2016 bases ctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcgaaaac ggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgattaacac 
tgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacagccgt agcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacatggta gctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgccccg tcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggagctag gactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgtgaca cgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccatcttc gcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaagggcttg tagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgctccg tggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgtgagc cgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatctcct ttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcgccaa tatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaac ggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtgctct tcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcatgtag cctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctgg aggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagtagcag gccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgtctag tgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagtacttc tcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacacagacg tccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatcctagtct ggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctataaatt agaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaactttg ttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagaggggtt ctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacaggtaa agtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtgagtga gagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtccctg ggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatgctag gtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaacagga gaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgt caacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtcttttg taattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttcatctt tacttgcaaggcctcagggagaggtgtgcttctcgg i.e. The ">" is missing on all the FASTA sequences. 
So, it looks like EMBOSS 6.1.0 fixed one problem with IntelliGenetics files, but there is still an issue here. Peter C. P.S. Should I have reported this possible bug via sourceforge? P.P.S. Back in 2006, I reported a similar issue with a data corruption reading stockholm/pfam with DOS newlines (Sourceforge Bug #1588956, long since fixed). It seems to me that EMBOSS would benefit from explicit testing of all the file formats using DOS/Windows newlines when run on Unix, and vice versa. Does that sound feasible, or just hopelessly ambitious? From biopython at maubp.freeserve.co.uk Mon Jul 20 11:50:32 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 20 Jul 2009 16:50:32 +0100 Subject: [EMBOSS] FASTQ format documentation Message-ID: <320fb6e00907200850t39582781p115780690955bde6@mail.gmail.com> Hi all, I was just trying to double check the names EMBOSS 6.1.0 supports for the various FASTQ file formats, and none of them are listed here: http://emboss.sourceforge.net/docs/themes/SequenceFormats.html Does this need updating, or should I be looking elsewhere? Thanks Peter C. From pmr at ebi.ac.uk Mon Jul 20 12:16:39 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 20 Jul 2009 17:16:39 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> Message-ID: <4A649867.4090205@ebi.ac.uk> Peter C. wrote: > Hi all, > > I've just updated my Mac to EMBOSS 6.1.0, and have found an > issue with seqret conversion of IntelliGenetics files. After some > digging, I think this problem relates to having DOS new lines in > a file on Unix (in my case, Mac OS X). We have an application "noreturn" to fix things like this. If you send me your file I will try to take a look at whether we should be catching the funny newline characters. > P.S. Should I have reported this possible bug via sourceforge?
The emboss-bug at emboss.open-bio.org list is the best way to get our attention. > P.P.S. Back in 2006, I reported a similar issue with a data > corruption reading stockholm/pfam with DOS newlines > (Sourceforge Bug #1588956, long since fixed). It seems to > me that EMBOSS would benefit from explicit testing of all > the file formats using DOS/Windows newlines when run on > Unix, and vice versa. Does that sound feasible, or just > hopelessly ambitious? We can try ... how well does biopython handle these? (i.e. do we need such examples for Perl, Python, etc. or is this an EMBOSS-specific issue?) regards, Peter From pmr at ebi.ac.uk Mon Jul 20 12:17:46 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 20 Jul 2009 17:17:46 +0100 Subject: [EMBOSS] FASTQ format documentation In-Reply-To: <320fb6e00907200850t39582781p115780690955bde6@mail.gmail.com> References: <320fb6e00907200850t39582781p115780690955bde6@mail.gmail.com> Message-ID: <4A6498AA.8040007@ebi.ac.uk> Peter C. wrote: > Hi all, > > I was just trying to double check the names EMBOSS 6.1.0 supports > for the various FASTQ file formats, and none of them are listed here: > http://emboss.sourceforge.net/docs/themes/SequenceFormats.html > > Does this need updating, or should I be looking elsewhere? Yes, and yes. We are cleaning up these things post-release. I will update that file, and also the equivalent on the wiki. regards, Peter Rice From biopython at maubp.freeserve.co.uk Mon Jul 20 12:30:45 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 20 Jul 2009 17:30:45 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <4A649867.4090205@ebi.ac.uk> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> <4A649867.4090205@ebi.ac.uk> Message-ID: <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> On Mon, Jul 20, 2009 at 5:16 PM, Peter Rice wrote: > > Peter C.
wrote: >> Hi all, >> >> I've just updated my Mac to EMBOSS 6.1.0, and have found an >> issue with seqret conversion of IntelliGenetics files. After some >> digging, I think this problem relates to having DOS new lines in >> a file on Unix (in my case, Mac OS X). > > We have an application "noreturn" to fix things like this. That's basically an EMBOSS variant on unix2dos and dos2unix (or similar) existing Unix command line tools? I'm more interested in having all the EMBOSS tools handle either new line format themselves automatically. These days I am mostly working on Unix (including Mac OS X), but I do have to cope with Windows-style text files quite often. > If you send me your file I will try to take a look at whether we should > be catching the funny newline characters. For this bug report I was using: http://emboss.sourceforge.net/docs/themes/seqformats/ig There are another three example files used in the Biopython unit tests here: http://biopython.open-bio.org/SRC/biopython/Tests/IntelliGenetics/ >> P.S. Should I have reported this possible bug via sourceforge? > > The emboss-bug at emboss.open-bio.org list is the best way to get > our attention Great, another mailing list to sign up to... but if that is your preferred route, that's fine. >> P.P.S. Back in 2006, I reported a similar issue with a data >> corruption reading stockholm/pfam with DOS newlines >> (Sourceforge Bug #1588956, long since fixed). It seems to >> me that EMBOSS would benefit from explicit testing of all >> the file formats using DOS/Windows newlines when run on >> Unix, and vice versa. Does that sound feasible, or just >> hopelessly ambitious? > > We can try ... how well does biopython handle these? (i.e. do we need > such examples for perl, python etc or is this an EMBOSS-specific issue?) I think this is an EMBOSS specific issue.
I don't know enough about how all the different EMBOSS parsers work, but is there a single place where you could add automatic handling of either new line convention when reading in text? For reference, in Python, you can explicitly open text files in "universal newlines" mode, which takes care of this. I don't know about Perl. Peter C. From pmr at ebi.ac.uk Mon Jul 20 16:04:19 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 20 Jul 2009 21:04:19 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> <4A649867.4090205@ebi.ac.uk> <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> Message-ID: <4A64CDC3.9020902@ebi.ac.uk> Peter C. wrote: Thanks for the example files. I will start with those. >>> P.S. Should I have reported this possible bug via sourceforge? >> The emboss-bug at emboss.open-bio.org list is the best way to get >> our attention > > Great, another mailing list to sign up to... but if that is your > preferred route, that's fine. Actually, you can't sign up ... that's our bug reporting list :-) > I think this is an EMBOSS specific issue. I don't know enough about > how all the different EMBOSS parsers work, but is there a single > place where you could add automatic handling of either new line > convention when reading in text? Hope so. I think the issue is places where the parsing is checking explicitly for \n rather than \n and \r. The solution would be to strip both off before parsing. It will need a thorough clean through the ajseqread code.
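To make the two ideas above concrete, here is a small Python illustration (not EMBOSS code, whose parser is in C): a check for `\n` alone leaves a stray `\r` on DOS input, whereas stripping both characters, or reading through Python's universal-newlines mode, gives identical lines for all three conventions. The helper names `chomp` and `read_lines_universal` are hypothetical. In Python 3, `open()` uses universal newlines by default (`newline=None`); the Python 2 spelling was mode `"rU"`.

```python
import io


def chomp(line):
    """Strip any trailing CR and/or LF, so 'HSFAU\\r\\n', 'HSFAU\\n' and
    'HSFAU\\r' all become 'HSFAU'."""
    return line.rstrip("\r\n")


def read_lines_universal(data, encoding="ascii"):
    """Decode raw bytes using Python's universal-newlines mode.

    With newline=None, '\\n', '\\r\\n' and '\\r' are all translated to '\\n'
    while reading; chomp() then removes that final '\\n'.
    """
    stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline=None)
    return [chomp(line) for line in stream]


unix_text = b"HSFAU\nttcctctttctcgactcc\n"
dos_text = unix_text.replace(b"\n", b"\r\n")
mac9_text = unix_text.replace(b"\n", b"\r")

# Checking only for '\n' leaves the carriage return behind on DOS input:
print(repr("HSFAU\r\n".rstrip("\n")))  # 'HSFAU\r'

for data in (unix_text, dos_text, mac9_text):
    print(read_lines_universal(data))  # ['HSFAU', 'ttcctctttctcgactcc']
```

Running the same input through all three conventions, as in the loop above, is also a cheap way to build the kind of per-format newline tests suggested earlier in the thread.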
regards, Peter From biopython at maubp.freeserve.co.uk Mon Jul 20 16:09:43 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 20 Jul 2009 21:09:43 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <4A64CDC3.9020902@ebi.ac.uk> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> <4A649867.4090205@ebi.ac.uk> <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> <4A64CDC3.9020902@ebi.ac.uk> Message-ID: <320fb6e00907201309r3ddca32dy3bfc159e201217b5@mail.gmail.com> Peter Rice wrote: > > Thanks for the example files. I will start with those. > > Peter C. wrote: >> I think this is an EMBOSS specific issue. I don't know enough about >> how all the different EMBOSS parsers work, but is there a single >> place where you could add automatic handling of either new line >> convention when reading in text? > > Hope so. I think the issue is places where the parsing is checking > explicitly for \n rather than \n and \r. The solution would be to strip > both off before parsing. It will need a thorough clean through the > ajseqread code. That sounds like a good investment of effort in the long run :) Peter C. From pmr at ebi.ac.uk Tue Jul 21 05:15:59 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 21 Jul 2009 10:15:59 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> Message-ID: <4A65874F.2010106@ebi.ac.uk> Peter C. wrote: > However, if we now convert this input file to use DOS/Windows > newlines, and repeat the test (on Mac OS X, so Unix): > > $ embossversionReports the current EMBOSS version number > 6.1.0 > $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter > H.sapiens fau mRNA, 518 bases > ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc > > i.e. 
The ">" is missing on all the FASTA sequences. Actually, it's not missing ... it is hiding. The sequence id has a ^M appended to it, so the '>' and the id get overwritten by the description when you look at the file. Fixed by processing the IG format ID rather than simply copying it. Thanks for finding that one. regards, Peter Rice From biopython at maubp.freeserve.co.uk Tue Jul 21 05:23:03 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 21 Jul 2009 10:23:03 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <4A65874F.2010106@ebi.ac.uk> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> <4A65874F.2010106@ebi.ac.uk> Message-ID: <320fb6e00907210223x5a145ab5qfa9b29bc42a37815@mail.gmail.com> Peter Rice wrote: > > Peter C. wrote: >> However, if we now convert this input file to use DOS/Windows >> newlines, and repeat the test (on Mac OS X, so Unix): >> >> $ embossversionReports the current EMBOSS version number >> 6.1.0 >> $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter >> H.sapiens fau mRNA, 518 bases >> ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc >> >> i.e. The ">" is missing on all the FASTA sequences. > > Actually, it's not missing ... it is hiding. > > The sequence id has a ^M appended to it, so the '>' and the id get > overwritten by the description when you look at the file. That makes sense, and I think I can see how it might have happened. > Fixed by processing the IG format ID rather than simply copying it. > > Thanks for finding that one. Sure, Peter C.
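The "hiding" effect can be reproduced with a short Python sketch. It assumes the FASTA header is built naively as `">" + id + " " + description`, and it only roughly simulates a terminal (a `\r` moves the cursor back to column 0 and later characters overwrite earlier ones); `fasta_header` and `render_on_terminal` are illustrative helpers, not EMBOSS functions, and stripping `\r`/`\n` is just one simple way to "process" the ID.

```python
def fasta_header(seq_id, description):
    """Build a FASTA header the naive way: '>' + id + ' ' + description."""
    return ">" + seq_id + " " + description


def render_on_terminal(line):
    """Roughly simulate terminal display of one line: '\\r' returns the
    cursor to column 0 and later characters overwrite earlier ones."""
    cells, col = [], 0
    for ch in line:
        if ch == "\r":
            col = 0
            continue
        if col < len(cells):
            cells[col] = ch
        else:
            cells.append(ch)
        col += 1
    return "".join(cells)


raw_id = "HSFAU\r"  # ID copied verbatim from a file with DOS (CRLF) newlines
desc = "H.sapiens fau mRNA, 518 bases"

bad = fasta_header(raw_id, desc)
good = fasta_header(raw_id.rstrip("\r\n"), desc)

print(repr(render_on_terminal(bad)))   # ' H.sapiens fau mRNA, 518 bases'
print(repr(render_on_terminal(good)))  # '>HSFAU H.sapiens fau mRNA, 518 bases'
```

The "missing" `>` is still in the file; the description is simply drawn over it, which is why the output looked like it began with a space.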
From db60 at st-andrews.ac.uk Tue Jul 21 07:24:28 2009 From: db60 at st-andrews.ac.uk (Daniel Barker) Date: Tue, 21 Jul 2009 12:24:28 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> <4A649867.4090205@ebi.ac.uk> <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> Message-ID: <4A65A56C.9010308@st-andrews.ac.uk> Dear Peters et al., EMBOSS claims not to care about whether newlines are DOS or UNIX: 'EMBOSS programs can read in both PC and Unix text file formats, so it is not necessary for you to use this utility all of the time' - noreturn documentation. This would certainly be good. 'The newline problem' must be the single biggest computational waste of time I've experienced over the years! It's easy to avoid with tr, u2d, d2u, noreturn, etc. - but it's just one other thing that can go wrong, especially when data is shared between different places. I've noticed a small amount of software, in the world in general, still uses the Mac OS 9 (and earlier) convention where newline is \015 only. E.g. this tab-delimited text saved from Excel 2004 for Mac: $ od -bc Workbook1.txt 0000000 061 011 062 011 063 015 064 011 065 011 066 1 \t 2 \t 3 \r 4 \t 5 \t 6 0000013 $ I expect this usage will decline, since it's in conflict with the convention of Mac OS X's own command-line tools (\012 only, like Linux). Probably the '\015 only' convention hasn't had much impact on bioinformatics anyway? 
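The `od` output above is easy to reproduce in code. Here is a minimal Python sketch that classifies raw bytes by newline convention and normalises them to Unix line feeds, in the spirit of dos2unix. The names `newline_convention` and `to_unix` are hypothetical; this is not how `noreturn` is implemented.

```python
def newline_convention(data):
    """Classify raw bytes as 'CRLF' (DOS/Windows), 'LF' (Unix, Mac OS X),
    'CR' (Mac OS 9 and earlier), 'mixed', or 'none'."""
    crlf = data.count(b"\r\n")
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    kinds = [name for name, n in (("CRLF", crlf), ("LF", lone_lf), ("CR", lone_cr)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


def to_unix(data):
    """Turn CRLF and lone CR into LF."""
    # Replace CRLF first; doing '\r' first would turn each CRLF into two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# The Excel 2004 bytes from the od dump above (octal 015 is the lone CR).
excel_mac = bytes([0o61, 0o11, 0o62, 0o11, 0o63, 0o15, 0o64, 0o11, 0o65, 0o11, 0o66])
print(newline_convention(excel_mac))  # CR
print(to_unix(excel_mac))             # b'1\t2\t3\n4\t5\t6'
```

A C reader that relies on `fgets()` looking for `\n` would see this whole file as a single line, which is the problem Peter Rice describes in his reply.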
Best wishes, Daniel -- Daniel Barker http://bio.st-andrews.ac.uk/staff/db60.htm The University of St Andrews is a charity registered in Scotland : No SC013532 From pmr at ebi.ac.uk Tue Jul 21 08:12:00 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 21 Jul 2009 13:12:00 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <4A65A56C.9010308@st-andrews.ac.uk> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> <4A649867.4090205@ebi.ac.uk> <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> <4A65A56C.9010308@st-andrews.ac.uk> Message-ID: <4A65B090.7030507@ebi.ac.uk> Daniel Barker wrote: > Dear Peters et al., > > EMBOSS claims not to care about whether newlines are DOS or UNIX: > > 'EMBOSS programs can read in both PC and Unix text file formats, so it > is not necessary for you to use this utility all of the time' - noreturn > documentation. > > This would certainly be good. 'The newline problem' must be the single > biggest computational waste of time I've experienced over the years! Indeed. We get a little caught between accepting the extra carriage returns and the need for efficient parsing as some users run seqret to reformat large sequence files. Mac format (\015 only) is horrible because the C library functions are looking for line feed unless you're running on a Mac. I will give our examples a run through after converting to PC format and see if any others fall over. > I've noticed a small amount of software, in the world in general, still > uses the Mac OS 9 (and earlier) convention where newline is \015 only. > E.g. this tab-delimited text saved from Excel 2004 for Mac: > > $ od -bc Workbook1.txt > 0000000 061 011 062 011 063 015 064 011 065 011 066 > 1 \t 2 \t 3 \r 4 \t 5 \t 6 > 0000013 > $ > > I expect this usage will decline, since it's in conflict with the > convention of Mac OS X's own command-line tools (\012 only, like Linux). 
> Probably the '\015 only' convention hasn't had much impact on > bioinformatics anyway? In my experience mac users didn't venture out into the real world. On the other hand, unix users have often copied files from PCs. I used to do the same myself, which was why I had to write noreturn in the first place. regards, Peter Rice From stephen.taylor at imm.ox.ac.uk Thu Jul 23 05:43:35 2009 From: stephen.taylor at imm.ox.ac.uk (Stephen Taylor) Date: Thu, 23 Jul 2009 10:43:35 +0100 Subject: [EMBOSS] Tree building Message-ID: Hi, Are there any phylogeny programs in EMBOSS/EMBASSY that will allow me to build a tree from a multiple alignment without truncating the id name in the resulting treefile? Thanks, Steve From pmr at ebi.ac.uk Thu Jul 23 06:51:06 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 23 Jul 2009 11:51:06 +0100 Subject: [EMBOSS] Tree building In-Reply-To: References: Message-ID: <4A68409A.7020004@ebi.ac.uk> Stephen Taylor wrote: > Hi, > > Are there any phylogeny programs in EMBOSS/EMBASSY that will allow me to > build a tree from a multiple alignment without truncating the id name in > the resulting treefile? This can be done by adding some new output formats to the existing phylipnew embassy applications. Phylip truncates id names for its own formats. We can extend them, and only truncate for strict phylip format outputs. We are also open to suggestions for new applications in phylogenetics. A good topic to discuss at the GMOD workshop in Oxford. regards, Peter From pmr at ebi.ac.uk Thu Jul 23 10:06:58 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 23 Jul 2009 15:06:58 +0100 Subject: [EMBOSS] Tree building In-Reply-To: References: <4A68409A.7020004@ebi.ac.uk> Message-ID: <4A686E82.1010803@ebi.ac.uk> Stephen Taylor wrote: > Hi Peter, > >> Stephen Taylor wrote: >>> Hi, >>> Are there any phylogeny programs in EMBOSS/EMBASSY that will allow me >>> to build a tree from a multiple alignment without truncating the id >>> name in the resulting treefile?
>> >> This can be done by adding some new output formats to the existing >> phylipnew embassy applications. Phylip truncates id names for its own >> formats. We can extend them, and only truncate for strict phylip >> format outputs. >> >> We are also open to suggestions for new applications in phylogenetics >> > > Sounds like a good idea. > > However, I kind of need it today:-). I guess I'll have to truncate them > to something sensible for the time being... Can you give some example data and algorithms to make the tree? Peter From stephen.taylor at imm.ox.ac.uk Thu Jul 23 11:07:36 2009 From: stephen.taylor at imm.ox.ac.uk (Stephen Taylor) Date: Thu, 23 Jul 2009 16:07:36 +0100 Subject: [EMBOSS] Tree building In-Reply-To: <50E41C3A73F46B4D876BD13F5E80264A02DAD6C6@exchange3.sims.scri.sari.ac.uk> References: <4A68409A.7020004@ebi.ac.uk> <4A686E82.1010803@ebi.ac.uk> <50E41C3A73F46B4D876BD13F5E80264A02DAD6C6@exchange3.sims.scri.sari.ac.uk> Message-ID: Hi Frank, > Hi Stephen, > >> However, I kind of need it today:-). I guess I'll have to truncate >> them >> to something sensible for the time being... > > If you need a tree *today*... :-) you could download our TOPALi > package (loads all formats of alignment) > I have already installed Topali and was waiting for the results to run. I have over 1600 sequences and I wasn't sure when it was going to finish since I started it this morning (using the Quick NJ option on a Mac Mini). I think we should install it on a more powerful machine;-). However, after I got your email I checked the status and the job had finished. Good timing... > It does basic NJ, but also model-selection, PhyML, RaxML and > MrBayes. RaxML is quite fast. > > We internally rename species names when TOPALi submits jobs to the > above external programs, but put the names back when we show the > alignment and the tree. Yes, all the labels are there in full. Thanks a lot. Peter - I can send you some data offline if you wish.
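Frank's rename-and-restore approach can be sketched in a few lines of Python. Each long identifier is swapped for a short unique placeholder that fits PHYLIP's 10-character name field, the tree program runs on the renamed alignment, and the originals are substituted back into the Newick output. This is not TOPALi's code. The S0000001-style placeholder scheme and the function names are assumptions. Restored names containing spaces or Newick punctuation are wrapped in single quotes, which the Newick format allows.

```python
import re
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (identifier, aligned sequence)

# Characters that force a Newick label to be quoted.
_NEWICK_SPECIAL = set(" ()[]':;,\t")


def shorten_names(records: List[Record], prefix: str = "S") -> Tuple[List[Record], Dict[str, str]]:
    """Replace each identifier with a unique placeholder such as S0000001.

    With a one-letter prefix each placeholder is 8 characters, so it fits
    the 10-character name field of strict PHYLIP format without truncation.
    """
    renamed: List[Record] = []
    mapping: Dict[str, str] = {}
    for i, (name, seq) in enumerate(records, start=1):
        short = f"{prefix}{i:07d}"
        mapping[short] = name
        renamed.append((short, seq))
    return renamed, mapping


def _newick_label(name: str) -> str:
    """Quote a label if needed; embedded single quotes are doubled."""
    if any(c in _NEWICK_SPECIAL for c in name):
        return "'" + name.replace("'", "''") + "'"
    return name


def restore_names(newick: str, mapping: Dict[str, str], prefix: str = "S") -> str:
    """Put the original identifiers back into a Newick tree string."""
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\d{7}\b")
    return pattern.sub(lambda m: _newick_label(mapping.get(m.group(0), m.group(0))), newick)


records = [("Escherichia_coli_K12_MG1655", "ACGT-A"), ("Salmonella enterica (LT2)", "ACGTTA")]
short, names = shorten_names(records)
tree = "(S0000001:0.1,S0000002:0.2);"  # e.g. written by a PHYLIP program
print(restore_names(tree, names))
# (Escherichia_coli_K12_MG1655:0.1,'Salmonella enterica (LT2)':0.2);
```

Keeping the mapping in a separate dictionary means the alignment given to the external program never contains the long names at all, so no format-specific truncation rules come into play.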
Steve From stephen.taylor at imm.ox.ac.uk Thu Jul 23 09:34:26 2009 From: stephen.taylor at imm.ox.ac.uk (Stephen Taylor) Date: Thu, 23 Jul 2009 14:34:26 +0100 Subject: [EMBOSS] Tree building In-Reply-To: <4A68409A.7020004@ebi.ac.uk> References: <4A68409A.7020004@ebi.ac.uk> Message-ID: Hi Peter, > Stephen Taylor wrote: >> Hi, >> Are there any phylogeny programs in EMBOSS/EMBASSY that will allow >> me to build a tree from a multiple alignment without truncating the >> id name in the resulting treefile? > > This can be done by adding some new output formats to the existing > phylipnew embassy applications. Phylip truncates id names for its > own formats. We can extend them, and only truncate for strict phylip > format outputs. > > We are also open to suggestions for new applications in phylogenetics > Sounds like a good idea. However, I kind of need it today:-). I guess I'll have to truncate them to something sensible for the time being... Thanks. Steve From gbottu at vub.ac.be Thu Jul 23 12:19:11 2009 From: gbottu at vub.ac.be (Guy Bottu) Date: Thu, 23 Jul 2009 18:19:11 +0200 Subject: [EMBOSS] Tree building In-Reply-To: <4A68409A.7020004@ebi.ac.uk> References: <4A68409A.7020004@ebi.ac.uk> Message-ID: <4A688D7F.6080304@vub.ac.be> Peter Rice wrote: > This can be done by adding some new output formats to the existing > phylipnew embassy applications. Phylip truncates id names for its own > formats. We can extend them, and only truncate for strict phylip format > outputs. Interesting to note the following: standard PHYLIP format reserves the first 10 characters for the name and has the sequence start at position 11. But some programmers, like the author of PhyML, use a format they call PHYLIP which allows a name of any length but then demands that there be a space between the name and the sequence. I already had trouble when I used a standard PHYLIP file made by PHYLIP or EMBOSS with a name of 10 characters as input to PhyML.
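The difference Guy describes is easiest to see by writing both layouts. In the minimal Python sketch below (the function name is hypothetical, and this is not PHYLIP or EMBOSS code), strict mode pads or truncates each name to 10 characters so the sequence starts at column 11. Relaxed mode writes the full name followed by a space, the layout Guy says PhyML expects. A strict-mode name of exactly 10 characters runs straight into the sequence with no separating space, and a reader that expects the relaxed layout will then see a single token. The sketch also rejects names that collide after truncation, since those would make the tree labels ambiguous.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (name, aligned sequence)


def write_phylip(records: List[Record], relaxed: bool = False) -> str:
    """Format an alignment as sequential PHYLIP text, one line per sequence.

    strict (default): names padded/truncated to 10 characters, sequence
    from column 11. relaxed: full name, one space, then the sequence.
    """
    if not records:
        raise ValueError("no sequences")
    nsites = len(records[0][1])
    if any(len(seq) != nsites for _, seq in records):
        raise ValueError("all sequences must be aligned to the same length")
    lines = [f"{len(records)} {nsites}"]  # number of sequences, number of sites
    seen = set()
    for name, seq in records:
        if relaxed:
            if not name or any(c.isspace() for c in name):
                raise ValueError(f"relaxed names cannot be empty or contain spaces: {name!r}")
            key, label = name, name + " "
        else:
            key = name[:10]
            label = f"{key:<10}"  # pad short names with spaces to 10 columns
        if key in seen:
            raise ValueError(f"duplicate name after formatting: {key!r}")
        seen.add(key)
        lines.append(label + seq)
    return "\n".join(lines) + "\n"


alignment = [("Abcdefghij", "ACGT"), ("short", "AC-T")]
print(write_phylip(alignment), end="")                # strict: AbcdefghijACGT
print(write_phylip(alignment, relaxed=True), end="")  # relaxed: Abcdefghij ACGT
```

Two long names such as Escherichia_coli_K12 and Escherichia_coli_O157 both truncate to Escherichi, so strict mode raises an error for them while relaxed mode keeps them distinct.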
Guy Bottu From belegdol at gmail.com Wed Jul 29 07:54:49 2009 From: belegdol at gmail.com (Julian Sikorski) Date: Wed, 29 Jul 2009 13:54:49 +0200 Subject: [EMBOSS] Packaging EMBOSS for Fedora In-Reply-To: <49993.78.105.201.225.1248020309.squirrel@webmail.ebi.ac.uk> References: <1244850556.8999.7.camel@login-svr1.ebi.ac.uk> <49993.78.105.201.225.1248020309.squirrel@webmail.ebi.ac.uk> Message-ID: On 19.07.2009 18:18, uludag at ebi.ac.uk wrote: > >> there seem to be some problems with make install: >> >> /usr/bin/make install-exec-hook >> make[7]: Entering directory >> `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor' >> mkdir -p -- >> /builddir/build/BUILDROOT/EMBOSS-6.1.0-1.fc11.x86_64/usr/share/EMBOSS/jemboss/org/emboss/jemboss/editor >> /usr/bin/install: cannot stat `*.class': No such file or directory > > Looks like we didn't test the --with-java and --with-javaos configure > options well, before this release. However, most users will not need these > two options any more as EMBOSS-6.1.0 includes precompiled jemboss class > files collected in a java archive file. You should hopefully not get the > above error if you omit these two options when you configure your emboss > installation. > > Regards, > Mahmut Thank you, removing these two seems to have done the trick! Julian From biopython at maubp.freeserve.co.uk Thu Jul 30 11:00:37 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 30 Jul 2009 16:00:37 +0100 Subject: [EMBOSS] FASTQ records with no sequence? Message-ID: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com> Hi all, On the continuing topic of the nebulous FASTQ format, are there any strong views as to whether FASTQ files could hold records without a sequence (and therefore no quality scores)? This could make sense as output from an (aggressive) quality filter. This is a corner case, and applies to other file formats too of course (e.g. FASTA).
I mentioned this to Peter Rice (EMBOSS) off list, and he replied: On Thu, Jul 30, 2009 at 2:56 PM, Peter Rice wrote: > EMBOSS rejects zero length sequences - something we put in some years > ago for misformatted FASTA files that someone ran through a Taverna > workflow to launch clustalw via EMBOSS's "emma". The user had got his > carriage control characters mangled so the sequence was appended to the > FASTA '>' line and appeared as a long description with no sequence. > > I can well imagine for filtering paired reads that zero length sequences > would be useful. > > At the point where the test is made we know the sequence format. > We can therefore define some or all formats as accepting or rejecting > zero length sequences. > > Similarly we can easily extend to define some applications (e.g. emma) > as requiring a minimum sequence length. > > regards, > > Peter Peter Rice is of course correct - in general the meaning and validity of a zero length sequence is context dependent. I think Peter Rice makes a good point regarding paired end reads. What I assume he was getting at is the situation where due to quality trimming, one of a pair might be trimmed to nothing - leaving essentially a singleton read. However, paired end reads are normally stored using a matched pair of FASTQ files, so it could be important to keep the zero length read present, so that the two files can still be read together in sync. If we do want to allow zero length sequences in FASTQ, would both of the following be valid? Should there be empty sequence and quality lines, or no sequence and quality lines?
"@identifier\n+\n" (two lines, just the @ and + lines) "@identifier\n\n+\n\n" (four lines, including blank seq and qual lines) or with the repeated identifier on the plus lines: "@identifier\n+identifier\n" (two lines, just the @ and + lines) "@identifier\n\n+identifier\n\n" (four lines, including blank lines) As we are recommending no line wrapping on output this means typical FASTQ records would be four lines - so doing the same makes sense here too. Peter C. From pmr at ebi.ac.uk Thu Jul 30 11:09:41 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 30 Jul 2009 16:09:41 +0100 Subject: [EMBOSS] FASTQ records with no sequence? In-Reply-To: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com> References: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com> Message-ID: <4A71B7B5.40502@ebi.ac.uk> Peter C. wrote: > As we are recommending no line wrapping on output this means > typical FASTQ records would be four lines - so doing the same > makes sense here too. I vote for 4 lines on output. It should be possible to allow zero lines on input depending on where the '+' check is. regards, Peter From biopython at maubp.freeserve.co.uk Thu Jul 30 11:19:24 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 30 Jul 2009 16:19:24 +0100 Subject: [EMBOSS] FASTQ records with no sequence? In-Reply-To: <4A71B7B5.40502@ebi.ac.uk> References: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com> <4A71B7B5.40502@ebi.ac.uk> Message-ID: <320fb6e00907300819x35ae00c3wa20a382376134db7@mail.gmail.com> On Thu, Jul 30, 2009 at 4:09 PM, Peter Rice wrote: > > Peter C. wrote: > >> As we are recommending no line wrapping on output this means >> typical FASTQ records would be four lines - so doing the same >> makes sense here too. > > I vote for 4 lines on output. If we want to allow zero length sequences, then yes, I would also vote for the 4 line output (i.e. blank lines for the sequence and the quality string). 
> It should be possible to allow zero lines on input depending on > where the '+' check is. Yes, I'm pretty sure a parser could cope with any of the zero length sequence FASTQ examples I gave. Peter From biopython at maubp.freeserve.co.uk Thu Jul 30 11:35:25 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 30 Jul 2009 16:35:25 +0100 Subject: [EMBOSS] FASTQ records with no sequence? In-Reply-To: <320fb6e00907300819x35ae00c3wa20a382376134db7@mail.gmail.com> References: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com> <4A71B7B5.40502@ebi.ac.uk> <320fb6e00907300819x35ae00c3wa20a382376134db7@mail.gmail.com> Message-ID: <320fb6e00907300835v3a9d46d4w77c344bbf6efa08d@mail.gmail.com> Hi all, On the continuing topic of the nebulous FASTQ format, are there any strong views as to whether FASTQ files could hold records without a sequence (and therefore no quality scores)? This could make sense as output from an (aggressive) quality filter. This was a discussion I meant to start on the OBF list, not the EMBOSS list - so here is the start of the thread: http://lists.open-bio.org/pipermail/emboss/2009-July/003707.html Basically in some contexts an empty FASTQ record makes sense, so perhaps we should include examples of this for our test suite. However, there is more than one reasonable way to represent such a record (either omitting the sequence and quality lines, or including blank sequence and quality lines). On Thu, Jul 30, 2009 at 4:09 PM, Peter Rice wrote: > > Peter C. wrote: > >> As we are recommending no line wrapping on output this means >> typical FASTQ records would be four lines - so doing the same >> makes sense here too. > > I vote for 4 lines on output. If we want to allow zero length sequences, then yes, I would also vote for the 4 line output (i.e. blank lines for the sequence and the quality string). > It should be possible to allow zero lines on input depending on > where the '+' check is.
Yes, I'm pretty sure a parser could cope with any of the zero length sequence FASTQ examples I gave. Peter From wzhao6898 at gmail.com Fri Jul 31 12:21:06 2009 From: wzhao6898 at gmail.com (David Zhao) Date: Fri, 31 Jul 2009 16:21:06 +0000 (UTC) Subject: [EMBOSS] How to specify the number of partial digestion using DIGEST Message-ID: Hi there, I'm trying to use DIGEST to generate a virtual tryptic digest for a list of proteins, where I'd like to allow 2 or 3 miscleavages. I know -overlap returns 1 miscleavage, and -allpartials returns all the combinations and results in a huge file. Is there a way I can specify how many miscleavages I need? Thanks, David Zhao From biopython at maubp.freeserve.co.uk Mon Jul 6 10:12:23 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Jul 2009 11:12:23 +0100 Subject: [EMBOSS] Many-to-many with needle and water Message-ID: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> Hi Peter R. et al, I gather EMBOSS is looking for feedback for new applications (given the recent funding from the BBSRC - congratulations again). How about suggestions for extensions to existing EMBOSS applications? I've used bits of EMBOSS for several years now (thank you!). Something I have sometimes wanted to do is a many-to-many pairwise sequence alignment with the EMBOSS tools needle and water. Right now, needle and water take two files (here referred to as A and B), file A has just one sequence, and file B can have one or more sequences. I'd like to be able to supply two files both with multiple entries, and have needle/water do pairwise alignments between all the sequences in A against all the sequences in B. This might be useful for finding reciprocal best hits in comparative genomics (as a slower but exact alternative to FASTA or BLAST). From an implementation point of view, I might imagine doing sequence A1 against all of B, then sequence A2 against all of B, etc.
This would require looping over file B many times (easy if on disk). This would also work if the A input was stdin, but having the B input on stdin would require caching the data if A has more than one sequence :( It may sometimes also be useful to have an all-against-all pairwise comparison for a single set of sequences. The above suggested enhancement would let you do this by comparing file A to file A. However, here you only really need to do half the possible combinations (as aligning sequence A1 to sequence A2 should be the same as A2 to A1). This could be useful for implementing a basic clustering algorithm, or maybe as part of a worked example in building a simple NJ tree? So, does supporting many-to-many comparisons sound like a useful enhancement to needle and water? I should stress this isn't something I need right now. Also, it can be worked around with a wrapper script to call needle/water once for each sequence in file A (against all the sequences in file B), with the added bonus that these one-to-many comparison jobs can then be shared across multiple CPU cores. Regards, Peter C. From pmr at ebi.ac.uk Mon Jul 6 10:35:12 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 06 Jul 2009 11:35:12 +0100 Subject: [EMBOSS] Many-to-many with needle and water In-Reply-To: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> Message-ID: <4A51D360.1050107@ebi.ac.uk> Peter Cock or biopython wrote: > Hi Peter R. et al, > > I gather EMBOSS is looking for feedback for new applications (given > the recent funding from the BBSRC - congratulations again). How about > suggestions for extensions to existing EMBOSS applications? > > I've used bits of EMBOSS for several years now (thank you!). Something > I have sometimes wanted to do is a many-to-many pairwise sequence > alignment with the EMBOSS tools needle and water.
> > Right now, needle and water take two files (here referred to as A and > B), file A has just one sequence, and file B can have one or more > sequences. I'd like to be able to supply two files both with multiple > entries, and have needle/water do pairwise alignments between all the > sequences in A against all the sequences in B. This might be useful > for finding reciprocal best hits in comparative genomics (as a slower > but exact alternative to FASTA or BLAST). The application is easy to add (after the release). The usual problem with all-against-all is that it involves loading one of the inputs as a sequence set entirely in memory - to avoid reading one input many times over. We have an application supermatcher which does this - the first sequence is streamed through, the second is a sequence set loaded into memory. It uses word matching to find seed alignments then runs a limited alignment around the hits. superwater would be a possible name (or superneedle). How popular would such a program be? How large would the smaller input set be? regards, Peter From biopython at maubp.freeserve.co.uk Mon Jul 6 10:58:06 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Jul 2009 11:58:06 +0100 Subject: [EMBOSS] Many-to-many with needle and water In-Reply-To: <4A51D360.1050107@ebi.ac.uk> References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> <4A51D360.1050107@ebi.ac.uk> Message-ID: <320fb6e00907060358v71934109o1d8acb0aef8d3e83@mail.gmail.com> On Mon, Jul 6, 2009 at 11:35 AM, Peter Rice wrote: > > Peter C wrote: > > Hi Peter R. et al, > > > > I gather EMBOSS is looking for feedback for new applications (given > > the recent funding from the BBSRC - congratulations again). How about > > suggestions for extensions to existing EMBOSS applications? > > > > I've used bits of EMBOSS for several years now (thank you!).
Something > > I have sometimes wanted to do is a many-to-many pairwise sequence > > alignment with the EMBOSS tools needle and water. > > > > Right now, needle and water take two files (here referred to as A and > > B), file A has just one sequence, and file B can have one or more > > sequences. I'd like to be able to supply two files both with multiple > > entries, and have needle/water do pairwise alignments between all the > > sequences in A against all the sequences in B. This might be useful > > for finding reciprocal best hits in comparative genomics (as a slower > > but exact alternative to FASTA or BLAST). > > The application is easy to add (after the release). > > The usual problem with all-against-all is that it involves loading one > of the inputs as a sequence set entirely in memory - to avoid reading > one input many times over. Right - and it would be difficult to decide whether holding one input in memory or reading the file many times is best in general without some specific use cases. [I suppose you could do something a bit more cunning like start by caching the sequences as you read them for re-use, but if the number of sequences crosses a threshold, stop caching and switch to re-reading the file for subsequent loops?] > We have an application supermatcher which does this - the first sequence > is streamed through, the second is a sequence set loaded into memory. It > uses word matching to find seed alignments then runs a limited alignment > around the hits. > > superwater would be a possible name (or superneedle). If you see many-to-many versions of water and needle as separate applications, then those names sound fine. > How popular would such a program be? I don't know - as I said, this is more of a suggestion than a request. I don't *need* this tool, but there have been occasions in the past where I would have tried using it if it had existed. Perhaps others on the list can think of better uses for this tool idea? > How large would the smaller input set be?
Hard to say without specific examples in mind. For some hand-waving upper limits, for comparative genomics of bacteria using protein sequences, you might have a few thousand in each file. If I was trying this as part of an ad-hoc clustering algorithm (all-against-all), again maybe a few thousand sequences. In practice, a heuristic tool like supermatcher (or FASTA or BLAST) would probably be more sensible for large datasets like this due to the computational time. I see needle and water as most useful on smaller datasets where the runtime cost of using an exact algorithm isn't too high. Therefore many-to-many needle/water searches may be best targeted at smaller sequence files. Things might be different with a multicore or GPU/OpenCL version of needle and water ;) Anyway, unless someone else thinks a many-to-many version of needle and water would be useful, I wouldn't expect you to implement this. I'm just putting the idea forward for discussion. Regards, Peter C. From pmr at ebi.ac.uk Mon Jul 6 11:19:30 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 06 Jul 2009 12:19:30 +0100 Subject: [EMBOSS] Many-to-many with needle and water In-Reply-To: <320fb6e00907060358v71934109o1d8acb0aef8d3e83@mail.gmail.com> References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> <4A51D360.1050107@ebi.ac.uk> <320fb6e00907060358v71934109o1d8acb0aef8d3e83@mail.gmail.com> Message-ID: <4A51DDC2.9030801@ebi.ac.uk> Peter C wrote: > [I suppose you could do something a bit more cunning like start by > caching the sequences as you read them for re-use, but if the > number of sequences crosses a threshold, stop caching and switch > to re-reading the file for subsequent loops?] Tricky. Rereading is not always possible - for example streamed standard input as the data source. > Perhaps others on the list can think of better uses for this tool idea? Let's see what response we get.
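The loop structure discussed in this thread (A1 against all of B, then A2 against all of B, with B held in memory so that A can be streamed, plus the half-sized all-against-all for a single set) can be sketched in Python. This is an illustration under stated assumptions, not an EMBOSS program: superwater and superneedle are only names floated above, and the function names here are hypothetical. The scorer is a toy Needleman-Wunsch score with a linear gap penalty standing in for needle, which uses a substitution matrix and affine gap open/extend penalties.

```python
from itertools import combinations
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]    # (identifier, sequence)
Hit = Tuple[str, str, int]  # (first id, second id, score)


def toy_global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch score with a linear gap penalty (score only, no traceback).

    A stand-in for needle; swap in a real aligner for actual use.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def many_to_many(stream_a: Iterable[Record], set_b: List[Record],
                 score: Callable[[str, str], int] = toy_global_score) -> Iterator[Hit]:
    """A1 against all of B, then A2 against all of B, and so on.

    A is consumed once, so it may come from standard input; B is held in
    memory because it is re-used for every sequence in A.
    """
    for id_a, seq_a in stream_a:
        for id_b, seq_b in set_b:
            yield id_a, id_b, score(seq_a, seq_b)


def all_against_all(records: List[Record],
                    score: Callable[[str, str], int] = toy_global_score) -> Iterator[Hit]:
    """Self-comparison: each unordered pair once, n * (n - 1) / 2 alignments,
    relying on the score being symmetric (A1 vs A2 == A2 vs A1)."""
    for (id1, s1), (id2, s2) in combinations(records, 2):
        yield id1, id2, score(s1, s2)


if __name__ == "__main__":
    a = [("a1", "ACGT"), ("a2", "AGT")]
    b = [("b1", "ACGT"), ("b2", "AC")]
    for hit in many_to_many(iter(a), b):
        print(hit)
    print(len(list(all_against_all(a + b))))  # 6 pairs from 4 sequences
```

Because set_b must be a list, the sketch works when A arrives on standard input but not when B does, which is the asymmetry both Peters point out. The wrapper-script workaround amounts to running the inner loop of many_to_many as separate jobs, one per sequence in A.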
One never knows until the question is asked :-) >> How large would the smaller input set be? > > Hard to say without specific examples in mind. For some hand waving > upper limits, for comparative genomics of bacteria using protein > sequences, you might have a few thousand in each file. If I was trying > this as part of an ad-hoc clustering algorithm (all-against-all), again > maybe a few thousand sequences. In practice, a heuristic tool like > supermatcher (or FASTA or BLAST) would probably be more sensible > for large datasets like this due to the computational time. > > I see needle and water as most useful on smaller datasets where > the runtime cost of using an exact algorithm isn't too high. Therefore > many-to-many needle/water searches may be best targeted at > smaller sequence files. Things might be different with a multicore > or GPU/OpenCL version of needle and water ;) Multicore would be a possibility - at least on systems configured for it. We are looking into picking up methods from the BioManyCores project. > Anyway, unless someone else thinks a many-to-many version > of needle and water would be useful, I wouldn't expect you to > implement this. I'm just putting the idea forward for discussion. Implementing is easy - we could simply send you the code to install locally if nobody else needs it :-) After all, it is only a minor modification to the existing applications. regards, Peter From biopython at maubp.freeserve.co.uk Mon Jul 6 11:56:06 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Jul 2009 12:56:06 +0100 Subject: [EMBOSS] Probabilistic versions of needle/water? Message-ID: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com> Hi all, I have another suggestion for new or enhanced EMBOSS applications, again related to the existing pairwise sequence alignment tools needle and water. 
The FASTQ file format (or others) contains quality scores (often PHRED scores) representing the probability of an error in the associated nucleotide. Solexa/Illumina machines also provide another file with a more precise breakdown of the likelihood of each of the four bases. In some cases both sequences could have probability scores (e.g. trying to align the ends of contigs to each other), but often one sequence will be taken as fact (e.g. mapping reads onto a reference). It is possible to take these probabilities into account when considering the matches in needle (or water) by using a probabilistic version of the Needleman-Wunsch sequence alignment algorithm (or a probabilistic Smith-Waterman). As an example of this idea, did you (Peter R) see the GNUMAP talk/poster at ISMB 2009? See http://dna.cs.byu.edu/gnumap/ I am aware of people using EMBOSS tools (I assume water) to identify (known) adaptor sequences in raw Solexa/Illumina data. I considered doing something similar myself when trying to remove primer sequences from 454 data. Such a pipeline using the current EMBOSS water would be doing this matching at a purely fixed nucleotide level (ignoring the qualities), which isn't ideal. Upgrading to a probabilistic version of water should be an improvement. Peter C. From pmr at ebi.ac.uk Mon Jul 6 12:32:18 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 06 Jul 2009 13:32:18 +0100 Subject: [EMBOSS] Probabilistic versions of needle/water? In-Reply-To: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com> References: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com> Message-ID: <4A51EED2.3080901@ebi.ac.uk> Peter C. wrote: > I have another suggestion for new or enhanced EMBOSS applications, > again related to the existing pairwise sequence alignment tools needle > and water. > > The FASTQ file format (or others) contains quality scores (often PHRED > scores) representing the probability of an error in the associated > nucleotide.
Solexa/Illumina machines also provide another file with a > more precise breakdown of the likelihood of each of the four bases. > > In some cases both sequences could have probability scores (e.g. > trying to align the ends of contigs to each other), but often one > sequence will be taken as fact (e.g. mapping reads onto a reference). > > It is possible to take these probabilities into account when > considering the matches in needle (or water) by using a probabilistic > version of the Needleman-Wunsch sequence alignment algorithm (or a > probabilistic Smith-Waterman). > > As an example of this idea, did you (Peter R) see the GNUMAP > talk/poster at ISMB 2009? See http://dna.cs.byu.edu/gnumap/ I saw the talk, and was wondering about their algorithm. They did not have a separate treatment for gaps in the reads and the consensus, which seemed like an obvious extension. > I am aware of people using EMBOSS tools (I assume water) to identify > (known) adaptor sequences in raw Solexa/Illumina data. I considered > doing something similar myself when trying to remove primer sequences > from 454 data. Such a pipeline using the current EMBOSS water would be > doing this matching at a purely fixed nucleotide level (ignoring the > qualities), which isn't ideal. Upgrading to a probabilistic version of > water should be an improvement. Would be interesting. Where can I look up adaptor calling methods? Peter Rice From biopython at maubp.freeserve.co.uk Mon Jul 6 14:25:47 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Jul 2009 15:25:47 +0100 Subject: [EMBOSS] Probabilistic versions of needle/water?
In-Reply-To: <4A51EED2.3080901@ebi.ac.uk> References: <320fb6e00907060456t42879f56ta5df97ac6ae8ec8e@mail.gmail.com> <4A51EED2.3080901@ebi.ac.uk> Message-ID: <320fb6e00907060725g380ce420vdf2f34c290518a16@mail.gmail.com> On Mon, Jul 6, 2009 at 1:32 PM, Peter Rice wrote: > >> I am aware of people using EMBOSS tools (I assume water) to identify >> (known) adaptor sequences in raw Solexa/Illumina data. I considered >> doing something similar myself when trying to remove primer sequences >> from 454 data. Such a pipeline using the current EMBOSS water would be >> doing this matching at a purely fixed nucleotide level (ignoring the >> qualities), which isn't ideal. Upgrading to a probabilistic version of >> water should be an improvement. > > Would be interesting. > > Where can I look up adaptor calling methods? The particular example I had in mind was the thread with Giles Weaver on the BioPerl mailing list, which I see you have just replied to: http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030398.html http://lists.open-bio.org/pipermail/bioperl-l/2009-July/030404.html I think I made a typo earlier (needle versus water). If you are comparing a short but complete adaptor sequence to a read (which you expect may contain the full adaptor) doing a global alignment is more sensible than a local one. On re-reading, Giles did actually say he was using needle: http://lists.open-bio.org/pipermail/bioperl-l/2009-July/030411.html Peter From michael.watson at bbsrc.ac.uk Mon Jul 6 15:45:06 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Mon, 6 Jul 2009 16:45:06 +0100 Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> Hello I did this on a test server a few weeks ago and wrote a script and it worked. Now I come to the production server - well, I get some errors. Same OS, same tar balls. Very frustrating. Can anyone help?
I have PHYLIPNEW-3.67 unpacked in EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67. In that directory, I run configure and then make

I get a variety of errors, but this time it's:

Making all in src
make[1]: Entering directory `/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src'
gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\" -DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1 -DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1 -DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus -I../../../plplot -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2 -D__amd64__ -MT clique.o -MD -MP -MF .deps/clique.Tpo -c -o clique.o clique.c
mv -f .deps/clique.Tpo .deps/clique.Po
gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\" -DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1 -DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1 -DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus -I../../../plplot -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2 -D__amd64__ -MT disc.o -MD -MP -MF .deps/disc.Tpo -c -o disc.o disc.c
mv -f .deps/disc.Tpo .deps/disc.Po
gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\" -DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1 -DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1 -DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus -I../../../plplot -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2 -D__amd64__ -MT phylip.o -MD -MP -MF .deps/phylip.Tpo -c -o phylip.o phylip.c
mv -f .deps/phylip.Tpo .deps/phylip.Po
/bin/sh ../libtool --tag=CC --mode=link gcc -O2 -D__amd64__ -o fclique clique.o disc.o phylip.o ../../../nucleus/libnucleus.la ../../../ajax/libajax.la ../../../plplot/libeplplot.la -L/usr/lib64 -lX11 -lXaw -lXt -lm -lgd -lpng -lz -lm
mkdir .libs
gcc -O2 -D__amd64__ -o .libs/fclique clique.o disc.o phylip.o ../../../nucleus/.libs/libnucleus.so ../../../ajax/.libs/libajax.so ../../../plplot/.libs/libeplplot.so -L/usr/lib64 -lX11 -lXaw -lXt -lgd -lpng -lz -lm -Wl,--rpath -Wl,/usr/local/lib
/usr/bin/ld: cannot find -lXaw
collect2: ld returned 1 exit status
make[1]: *** [fclique] Error 1
make[1]: Leaving directory `/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src'
make: *** [all-recursive] Error 1

Any help would be appreciated.
Thanks Mick Head of Bioinformatics Institute for Animal Health Compton Berks RG20 7NN 01635 578411 Please consider the environment and don't print this e-mail unless you really need to. The information contained in this message may be confidential or legally privileged and is intended solely for the addressee. If you have received this message in error please delete it & notify the originator immediately. Unauthorised use, disclosure, copying or alteration of this message is forbidden & may be unlawful. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Institute. This email, and associated attachments, has been checked locally for viruses but we can accept no responsibility once it has left our systems. Communications on Institute computers are monitored to secure the effective operation of the systems and for other lawful purposes. The Institute for Animal Health is a company limited by guarantee, registered in England no. 559784. The Institute is also a registered charity, Charity Commissioners Reference No. 228824 From jasonzhang2002 at gmail.com Mon Jul 6 16:35:21 2009 From: jasonzhang2002 at gmail.com (jason zhang) Date: Mon, 06 Jul 2009 09:35:21 -0700 Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <4A5227C9.7040004@gmail.com> Hi, Michael I did this on my Fedora server:

yum install libXaw libXaw-compat libXaw-devel

to resolve this. -jason michael watson (IAH-C) wrote: > Hello > > > > I did this on a test server a few weeks ago and wrote a script and it > worked. > > > > Now I come to the production server - well, I get some errors. Same OS, > same tar balls. Very frustrating. Can anyone help? > > > > I have PHYLIPNEW-3.67 unpacked in EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67.
> In that directory, I run configure and then make > ... > /usr/bin/ld: cannot find -lXaw > ... From steffen_moeller at gmx.de Mon Jul 6 16:28:47 2009 From: steffen_moeller at gmx.de (Steffen Moeller) Date: Mon, 06 Jul 2009 18:28:47 +0200 Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> Message-ID: <4A52263F.1050806@gmx.de> Hi Michael, michael watson (IAH-C) wrote: > I did this on a test server a few weeks ago and wrote a script and it > worked. > > Now I come to the production server - well, I get some errors.
Same OS, > same tar balls. Very frustrating. Can anyone help? ... > > gcc -O2 -D__amd64__ -o .libs/fclique clique.o disc.o phylip.o > ../../../nucleus/.libs/libnucleus.so ../../../ajax/.libs/libajax.so > ../../../plplot/.libs/libeplplot.so -L/usr/lib64 -lX11 -lXaw -lXt -lgd > -lpng -lz -lm -Wl,--rpath -Wl,/usr/local/lib > > /usr/bin/ld: cannot find -lXaw > > collect2: ld returned 1 exit status your production machine is probably not prepared for graphical user interfaces :) Debian has it in the package libxaw7. Many greetings Steffen $ dpkg -S /usr/lib/libXaw.so.7 libxaw7: /usr/lib/libXaw.so.7 From michael.watson at bbsrc.ac.uk Tue Jul 7 08:18:39 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 7 Jul 2009 09:18:39 +0100 Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW In-Reply-To: <4A5227C9.7040004@gmail.com> References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> <4A5227C9.7040004@gmail.com> Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508FD0DD3@iahce2ksrv1.iah.bbsrc.ac.uk> Hi Guys Thanks for all the replies. One last question: do I need the X11 development libraries in order for fdrawgram and fdrawtree to produce postscript files? Thanks Michael -----Original Message----- From: jason zhang [mailto:jasonzhang2002 at gmail.com] Sent: 06 July 2009 17:35 To: michael watson (IAH-C) Cc: emboss at lists.open-bio.org Subject: Re: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW Hi, Michael I did this in my fedora server : yum install libXaw libXaw-compat libXaw-devel to resolve this. -jason michael watson (IAH-C) wrote: > Hello > > > > I did this on a test server a few weeks ago and wrote a script and it > worked. > > > > Now I come to the production server - well, I get some errors. Same OS, > same tar balls. Very frustrating. Can anyone help? > > > > I have PHYLIPNEW-3.67 unpacked in EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67. 
> In that directory, I run configure and then make > ... From ellis at biochem.dental.upenn.edu Tue Jul 7 19:34:27 2009 From: ellis at biochem.dental.upenn.edu (Ellis Golub) Date: Tue, 7 Jul 2009 15:34:27 -0400 Subject: [EMBOSS] Structure Message-ID: <1090707153427.ZM647422@biochem.dental.upenn.edu> Hello -- I've downloaded the EMBASSY STRUCTURE-0.1.0 package and tried to install it. I have previously installed EMBOSS-6.0.1. Using the same flags with configure that I used for the original EMBOSS installation, all seems to go well until I try to run one of the programs.
The following output is obtained:

> pdbparse
1274873:pdbparse: rld: Error: unresolvable symbol in /usr/local/emboss/lib/libplplot.so.6: gdImageColorAllocate
1274873:pdbparse: rld: Error: unresolvable symbol in /usr/local/emboss/lib/libplplot.so.6: gdImageLine
1274873:pdbparse: rld: Fatal Error: this executable has unresolvable symbols

The /usr/local/emboss/lib/libplplot.so.6 file is the one produced with the EMBOSS install, and all the EMBOSS files work fine, and the graphics outputs are fine. Anyone have a clue? Ellis -- ============================================================= Ellis Golub Phone: (215) 898-4629 Biochemistry Department FAX: (215) 898-3695 School of Dental Medicine ellis at biochem.dental.upenn.edu University of Pennsylvania 240 South 40th Street Philadelphia, PA 19104-6030 Home: http://biochem.dental.upenn.edu/~ellis/ PGP: http://biochem.dental.upenn.edu/~ellis/public_key.html ============================================================== From lueck at ipk-gatersleben.de Wed Jul 8 09:59:39 2009 From: lueck at ipk-gatersleben.de (lueck at ipk-gatersleben.de) Date: Wed, 8 Jul 2009 11:59:39 +0200 Subject: [EMBOSS] Many-to-many with needle and water In-Reply-To: <4A51D360.1050107@ebi.ac.uk> References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> <4A51D360.1050107@ebi.ac.uk> Message-ID: <20090708115939.8spikcm8zxi8cscw@webmail.ipk-gatersleben.de> Hi! I would be interested in "superneedle" if the speed is acceptable. I'm still looking for a fast and good enough short sequence finder/aligner (see http://www.mailinglistarchive.com/biopython at biopython.org/msg00941.html). At the moment I'm still using BLAST because I have no time to switch, but later I would like to change to another algorithm. Kind regards Stefanie Quoting Peter Rice: > Peter Cock or biopython wrote: >> Hi Peter R. et al, >> >> I gather EMBOSS is looking for feedback for new applications (given >> the recent funding from the BBSRC - congratulations again). How about
How about >> suggestions for extensions to existing EMBOSS applications? >> >> I've used bits of EMBOSS for several years now (thank you!). Something >> I have sometimes wanted to do is a many-to-many pairwise sequence >> alignment with the EMBOSS tools needle and water. >> >> Right now, needle and water take two files (here referred to as A and >> B), file A has just one sequence, and file B can have one or more >> sequences. I'd like to be able to supply two files both with multiple >> entries, and have needle/water do pairwise alignments between all the >> sequences in A against all the sequences in B. This might be useful >> for finding reciprocal best hits in comparative genomics (as a slower >> but exact alternative to FASTA or BLAST). > > The application is easy to add (after the release) > > The usual problem with all-against-all is that it involves loading one > of the inputs as a sequence set entirely in memory - to avoid reading > one input many times over. > > We have an application supermatcher which does this - the first sequence > is streamed through, the second is a sequence set loaded into memory. It > uses word matching to find seed alignments then runs a limited alignment > around the hits. > > superwater would be a possible name (or superneedle). > > How popular would such a program be? > > How large would the smaller input set be?
> > regards, > > Peter > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > > From pmr at ebi.ac.uk Wed Jul 8 11:04:04 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 08 Jul 2009 12:04:04 +0100 Subject: [EMBOSS] Many-to-many with needle and water In-Reply-To: <20090708115939.8spikcm8zxi8cscw@webmail.ipk-gatersleben.de> References: <320fb6e00907060312l5506c766i1c4a4ff48fba8c7e@mail.gmail.com> <4A51D360.1050107@ebi.ac.uk> <20090708115939.8spikcm8zxi8cscw@webmail.ipk-gatersleben.de> Message-ID: <4A547D24.6070703@ebi.ac.uk> Hi Stefanie, lueck at ipk-gatersleben.de wrote: > Hi! > > I would be interested in "superneedle" if the speed is acceptable. I'm > still looking for a fast and good enough short sequence finder/aligner > (see > http://www.mailinglistarchive.com/biopython at biopython.org/msg00941.html). > > At the moment I'm still using BLAST because I have no time to switch but > later I would like to change to another algorithm. From the Biopython message it seems you mainly want to find exact 21-mers. Can you give an example of a query and a database so I can test the speed of current EMBOSS applications like wordfinder? regards, Peter Rice From biopython at maubp.freeserve.co.uk Wed Jul 8 21:50:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 8 Jul 2009 22:50:19 +0100 Subject: [EMBOSS] transeq and ambiguous codons Message-ID: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> Hi all, Something I mentioned to Peter Rice in passing at BOSC/ISMB 2009 was I'd found an oddity in transeq with certain ambiguous codons while testing Biopython's translations. Here is a specific example (but I suspect there are more).
For reference, I am expecting EMBOSS transeq to be using the NCBI tables: http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi

First consider the following example, the codon TAN, which can be TAA, TAC, TAG or TAT which translate to stop or Y. Therefore the translation of TAN should be "* or Y", and EMBOSS transeq opts for "X". Which is fine:

$ transeq asis:TAATACTAGTATTAN -stdout -auto
>asis_1
*Y*YX

Similarly for the codon TNN, again EMBOSS transeq opts for "X" because this could be a stop codon, or W, or F, or L, or S, or Y or C! Again, this is fine:

$ transeq asis:TNN -stdout -auto
>asis_1
X

However, consider the codon TRR. R means A or G, so this can mean TAA, TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI standard table agree here). Therefore the translation of TRR should be "* or W", which I would expect based on the above examples to result in "X". But instead EMBOSS transeq gives "*":

$ transeq asis:TAATGATAGTGGTRRTNN -stdout -auto
>asis_1
***W*X

I think this is a bug. However, I am aware that the machine I tried this on is rather old, and I don't actually know which version of EMBOSS it is. How can I find out? As far as I know, there is no "-version" or "-v" or "--version" switch, and the "-help" information doesn't include this important piece of information. Nor is this in the FAQ: http://emboss.sourceforge.net/docs/faq.html So that makes two questions - how should transeq translate "TRR", and how do I check the version of EMBOSS? Thanks, Peter C. From SMarkel at accelrys.com Wed Jul 8 23:53:05 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Wed, 8 Jul 2009 19:53:05 -0400 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> Message-ID: <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net> Peter, Answer to question #2: run the program embossversion.
> embossversion
Writes the current EMBOSS version number to a file
6.0.1

Scott

Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics > -----Original Message----- > From: emboss-bounces at lists.open-bio.org [mailto:emboss-bounces at lists.open- > bio.org] On Behalf Of Peter > Sent: Wednesday, 08 July 2009 2:50 PM > To: emboss at lists.open-bio.org > Subject: [EMBOSS] transeq and ambiguous codons > > Hi all, > > Something I mentioned to Peter Rice in passing at BOSC/ISMB 2009 was > I'd found an oddity in transeq with certain ambiguous codons while > testing Biopython's translations. Here is a specific example (but I > suspect there are more). For reference, I am expecting EMBOSS transeq > to be using the NCBI tables: > http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi > > First consider the following example, the codon TAN, which can be TAA, > TAC, TAG or TAT which translate to stop or Y. Therefore the > translation of TAN should be "* or Y", and EMBOSS transeq opts for > "X". Which is fine: > > $ transeq asis:TAATACTAGTATTAN -stdout -auto > >asis_1 > *Y*YX > > Similarly for the codon TNN, again EMBOSS transeq opts for "X" because > this could be a stop codon, or W, or F, or L, or S, or Y or C! Again, > this is fine: > > $ transeq asis:TNN -stdout -auto >asis_1 > X > > However, consider the codon TRR. R means A or G, so this can mean TAA, > TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI > standard table agree here).
Therefore the translation of TRR should be > "* or W", which I would expect based on the above examples to result > in "X". But instead EMBOSS transeq gives "*": > > $ transeq asis:TAATGATAGTGGTRRTNN -stdout -auto > >asis_1 > ***W*X > > I think this is a bug. > > However, I am aware that the machine I tried this on is rather old, > and I don't actually know which version of EMBOSS it is. How can I > find out? As far as I know, there is no "-version" or "-v" or > "--version" switch, and the "-help" information doesn't include this > important piece of information. Nor is this in the FAQ: > http://emboss.sourceforge.net/docs/faq.html > > So that makes two questions - how should transeq translate "TRR", and > how do I check the version of EMBOSS? > > Thanks, > > Peter C. > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss From pmr at ebi.ac.uk Thu Jul 9 07:38:34 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 09 Jul 2009 08:38:34 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> Message-ID: <4A559E7A.4020807@ebi.ac.uk> Peter C. wrote: > Hi all, > > Something I mentioned to Peter Rice in passing at BOSC/ISMB 2009 was > I'd found an oddity in transeq with certain ambiguous codons which > testing Biopython's translations. Here is a specific example (but I > suspect there are more). For reference, I am expecting EMBOSS transeq > to be using the NCBI tables: > http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi Yes, those are our source. > However, consider the codon TRR. R means A or G, so this can mean TAA, > TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI > standard table agree here). 
Therefore the translation of TRR should be > "* or W", which I would expect based on the above examples to result > in "X". But instead EMBOSS transeq gives "*": > > $ transeq asis:TAATGATAGTGGTRRTNN -stdout -auto >> asis_1 > ***W*X Hmmm... something odd there. TRA translates as 'X'. I will take a look, but we are bundling the 6.1.0 release so (unless I am very quick) any resolution will be a patch after the release. > So that makes two questions - how should transeq translate "TRR", and > how do I check the version of EMBOSS? Scott got there first (different time zone I assume :-) with the suggestion of embossversion. Meanwhile .... are there any translation utilities we could usefully include in a future version? I am already contemplating one to report the translations of ambiguous codons, just for information in tracking and debugging translation tables - and explaining transeq and other applications to new users. regards, Peter Rice From biopython at maubp.freeserve.co.uk Thu Jul 9 08:59:52 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Jul 2009 09:59:52 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net> Message-ID: <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com> On Thu, Jul 9, 2009 at 12:53 AM, Scott Markel wrote: > > Peter, > > Answer to question #2: run the program embossversion. > >> embossversion > Writes the current EMBOSS version number to a file > 6.0.1 > > Scott Thanks Scott (& Thomas) for pointing out the embossversion program. I would still question why the EMBOSS tools don't also support the Unix convention of a version switch. Hypothetically, aren't some (many?) of the tools standalone and couldn't they be installed individually (e.g. as part of someone else's software bundle)? i.e. 
Can EMBOSS really guarantee that the needle tool and the embossversion tool are in sync? Peter From pmr at ebi.ac.uk Thu Jul 9 09:08:37 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 09 Jul 2009 10:08:37 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> Message-ID: <4A55B395.4090301@ebi.ac.uk> Peter C. wrote: > However, consider the codon TRR. R means A or G, so this can mean TAA, > TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI > standard table agree here). Therefore the translation of TRR should be > "* or W", which I would expect based on the above examples to result > in "X". But instead EMBOSS transeq gives "*": This is a side effect of the way backtranslation works. EMBOSS calculates the "most ambiguous codon" for each amino acid and stop, and uses this for back translation. Thus a '*' in a protein sequence would be rendered as 'TRR' by backtranseq. To provide consistent translation of the backtranseq results, TRR is assumed to be a backtranslated stop. Similarly, MGN is 'R' because it could reasonably result from a backtranslation of 'R'. I agree that it would also be reasonable to be strict about translation in transeq and render TRR as 'X'. It depends on your philosophy of where the ambiguity codes came from - from backtranslation, or the curious mind of a bioinformatician :-) So .... it's not a bug, it's a feature ... which means I can relax for now and contemplate some extras in the next release. In future, we will at least make sure TRA and other 'unambiguous ambiguous codons' get understood as '*' etc. TRR I would prefer to leave as it is by default, with an option for rendering it as 'X' or an alternative to transeq with the strict translation rules enforced.
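To make the two rules discussed above concrete, here is a minimal Python sketch; it is an illustration, not EMBOSS or Biopython code, and the helper names are hypothetical. It assumes NCBI translation table 1 and models the "most ambiguous codon" as the per-position IUPAC union of every codon for a residue. That assumption reproduces the TRR and MGN examples from this thread, but EMBOSS's actual implementation may differ.

```python
from itertools import product

# NCBI standard genetic code (translation table 1). Codons are listed in the
# conventional TCAG order: TTT, TTC, TTA, TTG, TCT, ..., GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes, plus the reverse mapping (base set -> code).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CODE_FOR_BASES = {frozenset(bases): code for code, bases in IUPAC.items()}


def expand_codon(codon):
    """List every unambiguous codon that a (possibly ambiguous) codon can stand for."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in codon.upper()))]


def strict_translate(seq):
    """Strict rule: a codon whose expansions disagree on the residue becomes 'X'."""
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        residues = {CODON_TABLE[c] for c in expand_codon(seq[i:i + 3])}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


def most_ambiguous_codon(aa):
    """Assumed backtranslation rule: per-position IUPAC union of all codons for aa."""
    codons = [codon for codon, residue in CODON_TABLE.items() if residue == aa]
    return "".join(
        CODE_FOR_BASES[frozenset(codon[pos] for codon in codons)] for pos in range(3)
    )


print(strict_translate("TAATGATAGTGGTRRTNN"))
print(most_ambiguous_codon("*"), most_ambiguous_codon("R"))
print(strict_translate(most_ambiguous_codon("*")))
```

Run as-is, this prints `***WXX`, then `TRR MGN`, then `X`. Under the strict rule TRR becomes X, as does MGN, because MGN also covers AGT and AGC, which encode S. TRA, however, only expands to the stop codons TAA and TGA, so it is unambiguously '*'. This is the trade-off described above: backtranslating '*' yields TRR, but strictly translating TRR does not give '*' back.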
regards,

Peter Rice

From pmr at ebi.ac.uk Thu Jul 9 09:16:40 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 09 Jul 2009 10:16:40 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net> <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com>
Message-ID: <4A55B578.1060808@ebi.ac.uk>

Peter C. wrote:
> Thanks Scott (& Thomas) for pointing out the embossversion program.
>
> I would still question why the EMBOSS tools don't also support the
> Unix convention of a version switch. Hypothetically, aren't some
> (many?) of the tools standalone and couldn't they be installed
> individually (e.g. as part of someone else's software bundle)? i.e.
> Can EMBOSS really guarantee that the needle tool and the
> embossversion tool are in sync?

We could easily add a -version global qualifier ... for the next release.

We can guarantee that embossversion and needle are in sync - assuming they are built using the same libraries, as that is where the version is recorded. Standalone builds are an issue though, and it would help debug in a few cases.

regards,

Peter Rice

From biopython at maubp.freeserve.co.uk Thu Jul 9 09:21:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Jul 2009 10:21:45 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <4A55B578.1060808@ebi.ac.uk>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net> <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com> <4A55B578.1060808@ebi.ac.uk>
Message-ID: <320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com>

On Thu, Jul 9, 2009 at 10:16 AM, Peter Rice wrote:
>
> Peter C. wrote:
>
>> Thanks Scott (& Thomas) for pointing out the embossversion program.
>>
>> I would still question why the EMBOSS tools don't also support the
>> Unix convention of a version switch. Hypothetically, aren't some
>> (many?) of the tools standalone and couldn't they be installed
>> individually (e.g. as part of someone else's software bundle)? i.e.
>> Can EMBOSS really guarantee that the needle tool and the
>> embossversion tool are in sync?
>
> We could easily add a -version global qualifier ... for the next release.
>
> We can guarantee that embossversion and needle are in sync - assuming
> they are built using the same libraries as that is where the version is
> recorded. Standalone build are an issue though and it would help debug
> in a few cases.

That sounds good to me :)

Peter C.

From biopython at maubp.freeserve.co.uk Thu Jul 9 09:28:20 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Jul 2009 10:28:20 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com>
Message-ID: <320fb6e00907090228u7fd0df2by1e2ca18fc0a5c43@mail.gmail.com>

On Wed, Jul 8, 2009 at 10:50 PM, Peter wrote:
> Hi all,
>
> Something I mentioned to Peter Rice in passing at BOSC/ISMB 2009 was
> I'd found an oddity in transeq with certain ambiguous codons while
> testing Biopython's translations. Here is a specific example (but I
> suspect there are more). For reference, I am expecting EMBOSS transeq
> to be using the NCBI tables:
> http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
>
> First consider the following example, the codon TAN, which can be TAA,
> TAC, TAG or TAT which translate to stop or Y. Therefore the
> translation of TAN should be "* or Y", and EMBOSS transeq opts for
> "X".
> Which is fine:

Using raw output instead of the default FASTA works better in emails:

$ transeq asis:TAATACTAGTATTAN -stdout -auto -osformat raw
*Y*YX

> Similarly for the codon TNN, again EMBOSS transeq opts for "X" because
> this could be a stop codon, or W, or F, or L, or S, or Y or C! Again,
> this is fine:

Again, using raw output works better in emails:

$ transeq asis:TNN -stdout -auto -osformat raw
X

> However, consider the codon TRR. R means A or G, so this can mean TAA,
> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
> standard table agree here). Therefore the translation of TRR should be
> "* or W", which I would expect based on the above examples to result
> in "X". But instead EMBOSS transeq gives "*":

Again, using raw output works better in emails:

$ transeq asis:TAATGATAGTGGTRR -stdout -auto -osformat raw
***W*

> I think this is a bug.
>
> However, I am aware that the machine I tried this on is rather old,
> and I don't actually know which version of EMBOSS it is.

I can check the old machine later, but I just retested on a Mac using EMBOSS 6.0.1 (the current release), and see the same behaviour.

Peter C.

From michael.watson at bbsrc.ac.uk Thu Jul 9 11:33:59 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 9 Jul 2009 12:33:59 +0100
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> <4A5227C9.7040004@gmail.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>

Hmmm. My sys admin says....

I can find:
libXaw-devel.x86_64 : X.Org X11 libXaw development package
libXaw.x86_64 : X.Org X11 libXaw runtime library
libXaw.i386 : X.Org X11 libXaw runtime library
libXaw-devel.i386 : X.Org X11 libXaw development package

Have I got compatibility issues here?
-----Original Message-----
From: jason zhang [mailto:jasonzhang2002 at gmail.com]
Sent: Mon 06/07/2009 5:35 PM
To: michael watson (IAH-C)
Cc: emboss at lists.open-bio.org
Subject: Re: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW

Hi, Michael

I did this on my Fedora server:

yum install libXaw libXaw-compat libXaw-devel

to resolve this.

-jason

michael watson (IAH-C) wrote:
> Hello
>
> I did this on a test server a few weeks ago and wrote a script and it
> worked.
>
> Now I come to the production server - well, I get some errors. Same OS,
> same tar balls. Very frustrating. Can anyone help?
>
> I have PHYLIPNEW-3.67 unpacked in EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67.
> In that directory, I run configure and then make
>
> I get a variety of errors, but this time it's:
>
> Making all in src
>
> make[1]: Entering directory
> `/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src'
>
> gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
> -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\"
> -DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
> -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1
> -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1
> -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1
> -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1
> -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1
> -DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1
> -DPLD_png=1 -I.
> -I../include -I../../../ajax -I../../../nucleus
> -I../../../plplot -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64
> -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2
> -D__amd64__ -MT clique.o -MD -MP -MF .deps/clique.Tpo -c -o clique.o
> clique.c
>
> mv -f .deps/clique.Tpo .deps/clique.Po
>
> gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
> -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\"
> -DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
> -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1
> -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1
> -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1
> -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1
> -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1
> -DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1
> -DPLD_png=1 -I. -I../include -I../../../ajax -I../../../nucleus
> -I../../../plplot -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64
> -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2
> -D__amd64__ -MT disc.o -MD -MP -MF .deps/disc.Tpo -c -o disc.o disc.c
>
> mv -f .deps/disc.Tpo .deps/disc.Po
>
> gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
> -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"PHYLIPNEW\"
> -DVERSION=\"3.67\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
> -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1
> -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1
> -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1
> -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1
> -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_STRDUP=1 -DHAVE_STRSTR=1
> -DHAVE_STRCHR=1 -DHAVE_ERAND48=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1
> -DPLD_png=1 -I.
> -I../include -I../../../ajax -I../../../nucleus
> -I../../../plplot -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64
> -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2
> -D__amd64__ -MT phylip.o -MD -MP -MF .deps/phylip.Tpo -c -o phylip.o
> phylip.c
>
> mv -f .deps/phylip.Tpo .deps/phylip.Po
>
> /bin/sh ../libtool --tag=CC --mode=link gcc -O2 -D__amd64__ -o
> fclique clique.o disc.o phylip.o ../../../nucleus/libnucleus.la
> ../../../ajax/libajax.la ../../../plplot/libeplplot.la -L/usr/lib64
> -lX11 -lXaw -lXt -lm -lgd -lpng -lz -lm
>
> mkdir .libs
>
> gcc -O2 -D__amd64__ -o .libs/fclique clique.o disc.o phylip.o
> ../../../nucleus/.libs/libnucleus.so ../../../ajax/.libs/libajax.so
> ../../../plplot/.libs/libeplplot.so -L/usr/lib64 -lX11 -lXaw -lXt -lgd
> -lpng -lz -lm -Wl,--rpath -Wl,/usr/local/lib
>
> /usr/bin/ld: cannot find -lXaw
>
> collect2: ld returned 1 exit status
>
> make[1]: *** [fclique] Error 1
>
> make[1]: Leaving directory
> `/home/bioinfvm/EMBOSS-6.0.1/embassy/PHYLIPNEW-3.67/src'
>
> make: *** [all-recursive] Error 1
>
> Any help would be appreciated.
>
> Thanks
>
> Mick
>
> Head of Bioinformatics
> Institute for Animal Health
> Compton
> Berks
> RG20 7NN
> 01635 578411
>
> Please consider the environment and don't print this e-mail unless you
> really need to.
>
> The information contained in this message may be confidential or legally
> privileged and is intended solely for the addressee. If you have
> received this message in error please delete it & notify the originator
> immediately. Unauthorised use, disclosure, copying or alteration of
> this message is forbidden & may be unlawful. The contents of this
> e-mail are the views of the sender and do not necessarily represent the
> views of the Institute. This email, and associated attachments, has
> been checked locally for viruses but we can accept no responsibility
> once it has left our systems.
> Communications on Institute computers are
> monitored to secure the effective operation of the systems and for other
> lawful purposes.
>
> The Institute for Animal Health is a company limited by guarantee,
> registered in England no. 559784.
>
> The Institute is also a registered charity, Charity Commissioners
> Reference No. 228824
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss

From ajb at ebi.ac.uk Thu Jul 9 12:08:14 2009
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Thu, 9 Jul 2009 13:08:14 +0100 (BST)
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> <4A5227C9.7040004@gmail.com> <8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk>

> Hmmm. My sys admin says....
>
> I can find:
> libXaw-devel.x86_64 : X.Org X11 libXaw development package
> libXaw.x86_64 : X.Org X11 libXaw runtime library
> libXaw.i386 : X.Org X11 libXaw runtime library
> libXaw-devel.i386 : X.Org X11 libXaw development package
>
> Have I got compatibility issues here?

What operating system and version are you running?
We're at the bundling stage here for the next release and there might be an opportunity for us to check.
Alan

From michael.watson at bbsrc.ac.uk Thu Jul 9 12:22:10 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 9 Jul 2009 13:22:10 +0100
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk> <4A5227C9.7040004@gmail.com> <8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk> <35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B292CE@iahce2ksrv1.iah.bbsrc.ac.uk>

I think it's Red Hat Enterprise Linux 5 64 bit version....

-----Original Message-----
From: ajb at ebi.ac.uk [mailto:ajb at ebi.ac.uk]
Sent: Thu 09/07/2009 1:08 PM
To: michael watson (IAH-C)
Cc: emboss at emboss.open-bio.org
Subject: Re: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW

> Hmmm. My sys admin says....
>
> I can find:
> libXaw-devel.x86_64 : X.Org X11 libXaw development package
> libXaw.x86_64 : X.Org X11 libXaw runtime library
> libXaw.i386 : X.Org X11 libXaw runtime library
> libXaw-devel.i386 : X.Org X11 libXaw development package
>
> Have I got compatibility issues here?

What operating system and version are you running?
We're at the bundling stage here for the next release and there might be an opportunity for us to check.

Alan

From A.J.Pemberton at bham.ac.uk Thu Jul 9 13:12:01 2009
From: A.J.Pemberton at bham.ac.uk (Anthony Pemberton)
Date: Thu, 9 Jul 2009 14:12:01 +0100
Subject: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
In-Reply-To: <35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9508FD0DC2@iahce2ksrv1.iah.bbsrc.ac.uk><4A5227C9.7040004@gmail.com><8975119BCD0AC5419D61A9CF1A923E9508B292C9@iahce2ksrv1.iah.bbsrc.ac.uk> <35478.86.26.12.63.1247141294.squirrel@webmail.ebi.ac.uk>
Message-ID:

Guys,

The system in question is probably a 64bit system.
Very often (not always) there is not a compatibility issue with the packages you list below; they are simply the 32bit and 64bit versions of the libraries. On RHEL 5 they will install in /usr/lib and /usr/lib64 respectively.

Under RHEL 5 just:

# up2date -i libXaw libXaw-devel

(# represents root prompt)

I am fairly sure a good install of the packages requires these libraries. You will need a Red Hat subscription for up2date to work (or install from your distro CD/DVD).

Regards,

Tony Pemberton

> -----Original Message-----
> From: emboss-bounces at lists.open-bio.org [mailto:emboss-
> bounces at lists.open-bio.org] On Behalf Of ajb at ebi.ac.uk
> Sent: 09 July 2009 13:08
> To: michael watson (IAH-C)
> Cc: emboss at emboss.open-bio.org
> Subject: Re: [EMBOSS] Problems installing EMBASSY / PHYLIPNEW
>
> > Hmmm. My sys admin says....
> >
> > I can find:
> > libXaw-devel.x86_64 : X.Org X11 libXaw development package
> > libXaw.x86_64 : X.Org X11 libXaw runtime library
> > libXaw.i386 : X.Org X11 libXaw runtime library
> > libXaw-devel.i386 : X.Org X11 libXaw development package
> >
> > Have I got compatibility issues here?
>
> What operating system and version are you running?
> We're at the bundling stage here for the next release and there
> might be an opportunity for us to check.
>
> Alan

From biopython at maubp.freeserve.co.uk Fri Jul 10 09:14:42 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 10 Jul 2009 10:14:42 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <4A55B395.4090301@ebi.ac.uk>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <4A55B395.4090301@ebi.ac.uk>
Message-ID: <320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com>

On Thu, Jul 9, 2009 at 10:08 AM, Peter Rice wrote:
>
> Peter C. wrote:
>> However, consider the codon TRR.
>> R means A or G, so this can mean TAA,
>> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
>> standard table agree here). Therefore the translation of TRR should be
>> "* or W", which I would expect based on the above examples to result
>> in "X". But instead EMBOSS transeq gives "*":
>
> This is a side effect of the way backtranslation works...

OK, leaving TRR aside for the moment (I'm not sure I'd have done it that way, but I think I follow your logic), I have some more problem cases for you to consider (all using the default standard NCBI table 1).

Most of these are 'unambiguous ambiguous codons' as you put it, and I would agree using X when a more specific letter is possible isn't ideal but isn't actually wrong. The "ATS" and related codons (see below) however are simply wrong.

--------------------------------------------------------------------------------------

TRA means TAA or TGA, which are both stop codons. Therefore TRA should translate as a stop, not as an X:

$ transeq asis:TAATGATRA -stdout -auto -osformat raw
**X

--------------------------------------------------------------------------------------

Now look at YTA, which means CTA or TTA, both of which encode L, so YTA should be L not X:

$ transeq asis:CTATTAYTA -stdout -auto -osformat raw
LLX

Likewise for YTG and YTR, and YTN.

--------------------------------------------------------------------------------------

Another example, ATW means ATA or ATT, which both translate as I, so ATW should translate as I not X:

$ transeq asis:ATAATTATW -stdout -auto -osformat raw
IIX

--------------------------------------------------------------------------------------

Conversely, ATS means ATC or ATG, which translate as I and M. Remember S means G or C.
Therefore ATS should translate as X, and not I:

$ transeq asis:ATCATGATS -stdout -auto -osformat raw
IMI

Likewise H means A, G or C, so ATH shows the same bug, as do some other AT* codons:

$ transeq asis:ATAATCATGATH -stdout -auto -osformat raw
IIMI

[*** This one strikes me as a clear bug ***]

--------------------------------------------------------------------------------------

Now for another debatable one, RAT means AAT or GAT, which code for N and D. So, you could use B (Asx) here rather than the broader X.

$ transeq asis:AATGATRAT -stdout -auto -osformat raw
NDX

Again, the same thing for others like RAC -> X not B, and RAY -> X not B.

Similarly, you don't use J to mean leucine (L) or isoleucine (I), and opt for X (again, this is justifiable), e.g. WTA:

$ transeq asis:ATATTAWTA -stdout -auto -osformat raw
ILX

--------------------------------------------------------------------------------------

This list is only partial, and only for the standard table.

Peter C.

From pmr at ebi.ac.uk Fri Jul 10 09:30:52 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 10 Jul 2009 10:30:52 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <4A55B395.4090301@ebi.ac.uk> <320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com>
Message-ID: <4A570A4C.2000500@ebi.ac.uk>

Peter C. wrote:
> OK, leaving TRR aside for the moment (I'm not sure I'd have done it that
> way, but I think I follow your logic), I have some more problem cases for
> you to consider (all using the default standard NCBI table 1).
>
> Most of these are 'unambiguous ambiguous codons' as you put it, and
> I would agree using X when a more specific letter is possible isn't ideal
> but isn't actually wrong. The "ATS" and related codons (see below)
> however are simply wrong.

They do look wrong. The "X when it could pick a residue" ones I knew of.
The others need a closer look. The plan is to work through all possible codons and all the NCBI genetic codes as soon as the release is out.

It should be a simple patch to ajtranslate.c when I'm done.

> --------------------------------------------------------------------------------------
>
> Now for another debatable one, RAT means AAT or GAT which code
> for N and D. So, you could use B (Asx) here rather than the broader X.
>
> Similarly, you don't use J to mean leucine (L) or to isoleucine (I), and
> opt for X (again, this is justifiable). e.g. WTA

Hmmm ... B and Z are ambiguity codes for amino acid analysers, where all the amide bonds are broken and that includes N->D and Q->E. We used to have one of those in the lab. Similarly, J is for mass spec, where I and L have the same molecular weight. I don't consider them appropriate for translation.

So I plan to go for unique amino acids where possible with the ambiguity codes.

What do our users think?

regards,

Peter

From biopython at maubp.freeserve.co.uk Fri Jul 10 23:10:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 11 Jul 2009 00:10:19 +0100
Subject: [EMBOSS] transeq and ambiguous codons
In-Reply-To: <4A570A4C.2000500@ebi.ac.uk>
References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <4A55B395.4090301@ebi.ac.uk> <320fb6e00907100214v6799a217l507e089f635ef781@mail.gmail.com> <4A570A4C.2000500@ebi.ac.uk>
Message-ID: <320fb6e00907101610o63140f2cp7f0aebd1cc7022d1@mail.gmail.com>

On Fri, Jul 10, 2009 at 10:30 AM, Peter Rice wrote:
>
> Peter C. wrote:
>>
>> OK, leaving TRR aside for the moment (I'm not sure I'd have done it that
>> way, but I think I follow your logic), I have some more problem cases for
>> you to consider (all using the default standard NCBI table 1).
>>
>> Most of these are 'unambiguous ambiguous codons' as you put it, and
>> I would agree using X when a more specific letter is possible isn't ideal
>> but isn't actually wrong.
>> The "ATS" and related codons (see below)
>> however are simply wrong.
>
> They do look wrong. The "X when it could pick a residue" ones I knew of.
>
> The others need a closer look. The plan is to work through all possible
> codons and all the NCBI genetic codes as soon as the release is out.
>
> It should be a simple patch to ajtranslate.c when I'm done.
>

OK - I appreciate this is too last minute for the imminent EMBOSS release.

>> --------------------------------------------------------------------------------------
>>
>> Now for another debatable one, RAT means AAT or GAT which code
>> for N and D. So, you could use B (Asx) here rather than the broader X.
>>
>> Similarly, you don't use J to mean leucine (L) or to isoleucine (I), and
>> opt for X (again, this is justifiable). e.g. WTA
>
> Hmmm ... B and Z are ambiguity codes for amino acid analyser where all the
> amide bonds are broken and that includes N->D and Q->E. We used to have one
> of those in the lab. Similarly, J is for mass spec where I and L have the
> same molecular weight. I don't consider them appropriate for translation.

Well, as I said, this is debatable. On the one hand, B and Z are IUPAC standards (although J isn't yet); on the other, amino acids don't have the full ambiguous alphabet that we have for nucleotides, so some might find such a translation surprising.

http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html

> So I plan to go for unique amino acids where possible with the ambiguity
> codes.

Good :)

Peter C.
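Peter C. argues for a strict rule: expand every ambiguity code, translate each concrete codon, and keep a residue only if all of them agree. That rule can be sketched in Python as follows. The sketch assumes NCBI table 1 and uses hypothetical function names. The optional `use_group_codes` flag is not a transeq option; it only shows the debated B/Z/J alternative. This is not the ajtranslate.c patch mentioned in the thread.

```python
from itertools import product

# NCBI standard genetic code (table 1), first base varying slowest, order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Protein ambiguity letters discussed in the thread (used only if requested).
GROUP_CODES = {frozenset("DN"): "B", frozenset("EQ"): "Z", frozenset("IL"): "J"}


def translate_codon(codon, use_group_codes=False):
    """Strictly translate one (possibly ambiguous) codon."""
    # Expand every ambiguity code into concrete codons and translate each one.
    residues = {
        STANDARD_TABLE["".join(bases)]
        for bases in product(*(IUPAC[base] for base in codon.upper()))
    }
    if len(residues) == 1:
        return residues.pop()  # every possible codon agrees
    if use_group_codes:
        return GROUP_CODES.get(frozenset(residues), "X")
    return "X"


def translate(seq, use_group_codes=False):
    """Translate complete codons of `seq`; a trailing partial codon is ignored."""
    return "".join(
        translate_codon(seq[i:i + 3], use_group_codes)
        for i in range(0, len(seq) - 2, 3)
    )


print(translate("TAATGATRA"))                          # ***
print(translate("ATCATGATS"))                          # IMX
print(translate("AATGATRAT", use_group_codes=True))    # NDB
```

Under the IUPAC definitions in `IUPAC`, H stands for A, C or T (not G). ATH therefore expands only to the isoleucine codons ATA, ATC and ATT, and this sketch returns I for it. Likewise, YTR gives L, but YTN gives X because it also covers the phenylalanine codons TTC and TTT. The group codes are off by default, in line with the stated preference for unique residues.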
From aradwen at gmail.com Sat Jul 11 10:29:40 2009
From: aradwen at gmail.com (Radwen ANIBA)
Date: Sat, 11 Jul 2009 12:29:40 +0200
Subject: [EMBOSS] Problem with protein caracters
Message-ID:

Hi everyone

I'm trying to use some programs that come with the EMBOSS package to analyze some protein sequences, but I sometimes get this message:

Error: ajSeqTypeCheckIn: Sequence must be protein sequence without BZ U X or *: found bad character 'X'

Is there any way to force the program to accept these types of residues?

Thank you

radwen

From pmr at ebi.ac.uk Sat Jul 11 10:54:21 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Sat, 11 Jul 2009 11:54:21 +0100
Subject: [EMBOSS] Problem with protein caracters
In-Reply-To:
References:
Message-ID: <4A586F5D.5060702@ebi.ac.uk>

Radwen ANIBA wrote:
> I'm trying to use some programs that comes with emboss package to analyze
> some protein sequences but I have sometimes this message :
>
> Error: ajSeqTypeCheckIn: Sequence must be protein sequence without BZ U X or
> *: found bad character 'X'
>
> Is there any manner to force the program considering these types of residues

EMBOSS uses the type attribute of the input sequence (or seqset or seqall) to identify the type of the input sequence (nucleotide, protein, or any) and the characters that are allowed (gaps, stops, non-standard residues and ambiguity characters).

Your application is expecting "pureprotein". This is only used by applications unable to handle the ambiguity codes (it can be difficult to define what an algorithm should do with them).

The alternatives are:

protein - accepts all characters, converts stops to X
proteinstandard - converts U, O and J to 'X'
stopproteinstandard - converts stops, U, O, J to X

"protein" is probably what you want.
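The differences between these sequence types can be made concrete with a hypothetical Python helper. It applies only the rules stated in this reply. It is not how ajSeqTypeCheckIn is implemented, the names are invented for illustration, and any other characters that EMBOSS may treat specially are deliberately ignored.

```python
# Hypothetical illustration of the sequence types described above; not EMBOSS code.
PURE_PROTEIN_REJECTS = set("BZUX*")  # characters named in the pureprotein error message

# Characters that each type converts to X, as listed in the reply.
CONVERT_TO_X = {
    "protein": set("*"),
    "proteinstandard": set("UOJ"),
    "stopproteinstandard": set("*UOJ"),
}


def apply_sequence_type(seq, seq_type):
    """Return `seq` as the given type would accept it, or raise ValueError."""
    seq = seq.upper()
    if seq_type == "pureprotein":
        for char in seq:
            if char in PURE_PROTEIN_REJECTS:
                raise ValueError(f"found bad character {char!r}")
        return seq
    to_x = CONVERT_TO_X[seq_type]
    return "".join("X" if char in to_x else char for char in seq)


for seq_type in ("protein", "proteinstandard", "stopproteinstandard"):
    print(seq_type, apply_sequence_type("MKUX*", seq_type))
```

With the input MKUX*, "protein" gives MKUXX, "proteinstandard" gives MKXX* and "stopproteinstandard" gives MKXXX. "pureprotein" raises an error on the X, mirroring the message in the original question.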
You need to be able to do something with the ambiguity codes X, B, Z and J, and with the non-standard amino acids U (selenocysteine) and O (pyrrolysine).

Hope this helps

Peter Rice

From liisa.koski at dnalandmarks.ca Mon Jul 13 13:56:37 2009
From: liisa.koski at dnalandmarks.ca (Liisa Koski)
Date: Mon, 13 Jul 2009 09:56:37 -0400
Subject: [EMBOSS] fuzznuc score
Message-ID:

Hello,

I would like to know how the score is calculated for the fuzznuc results. What is taken into consideration (strand, exact matches, etc.)? What does a high score vs lower score mean?

Many thanks,
Liisa

From pmr at ebi.ac.uk Mon Jul 13 14:43:18 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 13 Jul 2009 15:43:18 +0100
Subject: [EMBOSS] fuzznuc score
In-Reply-To:
References:
Message-ID: <4A5B4806.1080905@ebi.ac.uk>

Liisa Koski wrote:
> Hello,
>
> I would like to know how the score is calculated for the fuzznuc results.
> What is taken into consideration(strand, exact matches, etc?). What does a
> high score vs lower score mean?

The score in EMBOSS report output is a required field for standard formats such as GFF.

For fuzznuc the score is the length of the sequence that is found, minus any mismatches. Patterns that can match very long sequences can therefore give a very high score.

Hope this helps,

Peter Rice

From ajb at ebi.ac.uk Wed Jul 15 11:19:33 2009
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Wed, 15 Jul 2009 12:19:33 +0100 (BST)
Subject: [EMBOSS] EMBOSS 6.1.0 release now available
Message-ID: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk>

Dear EMBOSS users and developers,

A new version of EMBOSS (6.1.0) is now available for download from our ftp server:

ftp://emboss.open-bio.org/pub/EMBOSS/

If you use any of the EMBASSY packages (e.g. PHYLIP, VIENNA etc) then, as usual, remember to re-download and compile those too.
A new version of mEMBOSS, the Windows port, is also available from:

ftp://emboss.open-bio.org/pub/EMBOSS/windows/mEMBOSS-6.1.0-setup.exe

Many new capabilities have been added and bugs fixed throughout.

Release highlights for EMBOSS include:

* Full support for the new SwissProt format. In most cases the entry can be read and written exactly
* Full support for EMBL and GenBank entries. In most cases the entry can be read and written exactly
* Support for FASTQ short read formats for sequence and quality data
* Full support for protein and nucleotide sequence parsing from PDB entries
* Full support for GFF3 feature format as the new default feature output
* Improved summary information at the end of report output
* Alignment output using multiple sequence formats
* Extended support for distance matrix file formats
* Improved support for regular expression and pattern searching
* Improved support for large sequence alignments
* Support for remote locations in feature table processing, for example retrieval in coderet
* Output directory support extended to allow directories to be created
* Normalisation option for hydrophobicity plots (pepwindow and pepwindowall)
* Processing of methylation sites in restriction mapping
* Embossdata reports results alphabetically sorted
* Command line qualifiers should be unique after 5 characters to allow safe abbreviation
* Improved configuration procedures for X11 support
* Support for dasgff report format, making it possible to write EMBOSS-based DAS annotation servers

Release highlights for EMBASSY include:

* Support for MEME 4.0
* Phylipnew updated to Phylip 3.68
* Support for the HMMERDB environment variable in Hmmernew
* Bug fixes for the MSE multiple sequence editor

Release highlights for Jemboss include:

* Refactoring of the source code
* Location of the 'Execution mode' menu moved near to the 'Go' button in the application forms. When a user runs a job for the first time in 'batch' mode
When a user runs a job for the first time in 'batch' mode an information message is displayed * Automatic configuration of the standalone Jemboss GUI on UNIX systems after typing "make install" for EMBOSS. This standalone GUI can be run using the runJemboss.csh script in the EMBOSS 'bin' directory. This assumes that you have a reasonably up-to-date version of Java installed (1.6 preferred) For future extensions, we have added: * Parsing of cross-reference information from SwissProt and EMBL/GenBank formats * Code to delete and update database indexes New EMBOSS wiki EMBOSS now has a Wiki at http://emboss.open-bio.org/wiki where we will maintain the master copies of documentation for the applications and libraries, and where we have sections for planning new features and applications for the next 3 years of funding. Please contribute any corrections to the documentation and add new ideas to the "Planning" section. We will, of course, be making the wiki prettier as it matures. Important note for Developers New distributions of operating systems have started to use the series 2 version of libtool. We therefore now use this in our CVS repository. The latest stable version of libtool is 2.2.6a (reported by libtool itself as 2.2.6). Developers using systems with older (1.5.x) libtool versions will have to install a local copy of libtool. This would typically be done by downloading the source code from the GNU site: ftp://ftp.gnu.org/ After installing libtool it will usually be necessary to then re-install autoconf (2.63) and automake (1.11) to the same directory root (they are often tied to the version of libtool they were provided with). They too are available from the GNU ftp server. Make sure that your PATH is refreshed between doing the installations of the GNU tools in order that the previous versions aren't referenced. We note that one system (cygwin) currently provides an experimental version of libtool (2.2.7). 
Developers on these systems (and, in general, on any system with a higher version of libtool than in our CVS repository) should type:

autoreconf -fi

before attempting compilation. We will usually keep up-to-date with libtool stable releases within a libtool series.

New BBSRC funding and future work

As previously announced, we have recently been refunded by the BBSRC. What we said in that announcement bears repeating here.

The core aims of the funding proposal were to continue support, maintenance and development of EMBOSS, and to provide extensive online training materials for users, developers and system administrators using text from a series of books to be published by Cambridge University Press.

We are also explicitly targeting areas where we see EMBOSS can be expanded:

* Richer data content in EMBOSS outputs leading to major improvements in the integration and visualisation of results in browsers.
* Processing many more data fields in EMBOSS inputs (taxonomy, genes, GO terms, cross-references, keywords).
* Extending and improving database access: better indexing, query language support and combining searches across multiple databases, support for non-sequence data resources and new data access methods.
* Scaling up the libraries and adding new applications to support the data volumes generated by next-generation sequencing runs. We anticipate many more users will be working with short read data mapped to reference sequences over the next few years.
* We aim to add at least 100 new applications in these 3 years. Suggestions for new applications are very welcome.
* Major work on new developments and new library code will start from August.
Alan From db60 at st-andrews.ac.uk Wed Jul 15 12:47:17 2009 From: db60 at st-andrews.ac.uk (Daniel Barker) Date: Wed, 15 Jul 2009 13:47:17 +0100 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> Message-ID: <4A5DCFD5.8050101@st-andrews.ac.uk> Dear Alan, Very glad to see the project will continue with new funding. One request. Could EMBOSS please add a full global alignment program or option? Needle allows 'overhanging ends' with no penalty. From the current Needle documentation: 'There is no penalty for the hanging ends of the overlap. In bioinformatics, it is usually reasonable to assume that the sequences are incomplete and there should be no penalty for failing to align the missing bases.' For protein alignments, this tends to be very lenient towards alignments with these kinds of domain architecture:

protein 1: A-B-C
protein 2: C-X-Y-Z

As opposed to the following situation, which (correctly for global alignment) does tend to be heavily penalized:

protein 1: A-C-D
protein 2: C

I don't think this makes sense. Or at least, it definitely does not make sense for all applications. Generally with pairwise alignment of proteins I want either local alignment (e.g. water), or full global alignment which isn't available in EMBOSS. Thank you for considering this. Best regards, Daniel -- Daniel Barker http://bio.st-andrews.ac.uk/staff/db60.htm The University of St Andrews is a charity registered in Scotland : No SC013532 From pmr at ebi.ac.uk Wed Jul 15 13:44:26 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 15 Jul 2009 14:44:26 +0100 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A5DCFD5.8050101@st-andrews.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5DCFD5.8050101@st-andrews.ac.uk> Message-ID: <4A5DDD3A.9070605@ebi.ac.uk> Daniel Barker wrote: > One request.
Could EMBOSS please add a full global alignment program or > option? Needle allows 'overhanging ends' with no penalty. > > I don't think this makes sense. Or at least, it definitely does not make > sense for all applications. Generally with pairwise alignment of > proteins I want either local alignment (e.g. water), or full global > alignment which isn't available in EMBOSS. Thanks for the suggestion. Would you be happy with an option for needle to penalise gaps at the end, or a separate application (if so, we need to invent a new name)? regards, Peter Rice From db60 at st-andrews.ac.uk Wed Jul 15 13:59:25 2009 From: db60 at st-andrews.ac.uk (Daniel Barker) Date: Wed, 15 Jul 2009 14:59:25 +0100 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A5DDD3A.9070605@ebi.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5DCFD5.8050101@st-andrews.ac.uk> <4A5DDD3A.9070605@ebi.ac.uk> Message-ID: <4A5DE0BD.7090103@st-andrews.ac.uk> Dear Peter, > Would you be happy with an option for needle to penalise gaps at the > end, or a separate application (if so, we need to invent a new name)? An option for needle would be fine. This may be the best solution.
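The difference Daniel describes comes down to how the dynamic-programming matrix is initialised and where the final score is read. The following minimal Python sketch is an illustration only, not needle's implementation: it uses a single linear gap score with made-up parameters (match=1, mismatch=-1, gap=-2) instead of needle's affine -gapopen/-gapextend penalties, and the penalize_end_gaps flag is a hypothetical name, not a real needle option.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2, penalize_end_gaps: bool = True) -> int:
    """Needleman-Wunsch alignment score with a linear gap penalty.

    Illustrative sketch only: the scoring parameters and the
    penalize_end_gaps flag are assumptions, not EMBOSS needle options.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    if penalize_end_gaps:
        # Full global alignment: leading gaps are charged like any other gap.
        for i in range(1, n + 1):
            dp[i][0] = i * gap
        for j in range(1, m + 1):
            dp[0][j] = j * gap
    # Otherwise the first row and column stay 0, so leading overhangs are free.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,     # a[i-1] against a gap in b
                           dp[i][j - 1] + gap)     # b[j-1] against a gap in a
    if penalize_end_gaps:
        return dp[n][m]
    # Free end gaps: a trailing overhang in either sequence costs nothing,
    # so the best cell in the last row or last column is the answer.
    return max(max(dp[n]), max(row[m] for row in dp))


# Daniel's first domain example, written with one letter per domain.
print(global_score("ABC", "CXYZ", penalize_end_gaps=False))  # 1
print(global_score("ABC", "CXYZ", penalize_end_gaps=True))   # -5
```

With one letter per domain, the A-B-C / C-X-Y-Z pair scores 1 when end gaps are free (A-B and X-Y-Z simply hang off either side of the single C-C match) but -5 when every gap is charged, which is the leniency Daniel is objecting to.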
Thanks a lot and best wishes, Daniel -- Daniel Barker http://bio.st-andrews.ac.uk/staff/db60.htm From charles-listes-emboss at plessy.org Wed Jul 15 14:59:53 2009 From: charles-listes-emboss at plessy.org (Charles Plessy) Date: Wed, 15 Jul 2009 23:59:53 +0900 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> Message-ID: <20090715145953.GA30812@kunpuu.plessy.org> On Wed, Jul 15, 2009 at 12:19:33PM +0100, ajb at ebi.ac.uk wrote: > Dear EMBOSS users and developers, > > A new version of EMBOSS (6.1.0) is now available for download > from our ftp server: > > ftp://emboss.open-bio.org/pub/EMBOSS/ Dear EMBOSS developers, congratulations on the release! I have updated our Debian EMBOSS packages, and they should become available for the Sid (unstable) distribution in our mirrors within a day or two. Please let me know if these packages prove difficult to install on our current stable distribution (Lenny), and I will prepare a semi-official backport. I will update and rebuild the Debian EMBASSY packages against EMBOSS 6.1.0 later this week. Have a nice day -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From golharam at umdnj.edu Thu Jul 16 19:09:03 2009 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 16 Jul 2009 15:09:03 -0400 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> Message-ID: <4A5F7ACF.4060808@umdnj.edu> Any thought on implementing some of the algorithms using CUDA when possible on GPUs? This could speed up some programs significantly.
From pmr at ebi.ac.uk Fri Jul 17 07:39:11 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 17 Jul 2009 08:39:11 +0100 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A5F7ACF.4060808@umdnj.edu> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5F7ACF.4060808@umdnj.edu> Message-ID: <4A602A9F.4040908@ebi.ac.uk> Ryan Golhar wrote: > Any thought on implementing some of the algorithms using CUDA when > possible on GPUs? This could speed up some programs significantly. Yes indeed. At BOSC/ISMB last month we were discussing closer collaborations with the other Open Bio Foundation projects.
One of these is BioManyCores which is aiming at OpenCL programming but is currently concentrating on CUDA. When our new workstations are delivered we will be looking into CUDA. Which applications would you most like to speed up? (current EMBOSS programs, and suggestions for new ones) regards, Peter Rice From hpm at bioinfo-user.org.uk Fri Jul 17 09:59:42 2009 From: hpm at bioinfo-user.org.uk (Hamish McWilliam) Date: Fri, 17 Jul 2009 10:59:42 +0100 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A602A9F.4040908@ebi.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5F7ACF.4060808@umdnj.edu> <4A602A9F.4040908@ebi.ac.uk> Message-ID: <4A604B8E.1000004@bioinfo-user.org.uk> Hi Peter, >> Any thought on implementing some of the algorithms using CUDA when >> possible on GPUs? This could speed up some programs significantly. Our server systems do not have particularly powerful GPUs, but do have multiple CPU cores, so threading, and possibly the use of on-core vectorization (see http://en.wikipedia.org/wiki/SIMD), seem like more generally applicable methods for improving performance in our case. One interesting option for Intel platforms is the Intel Compiler (icc), which will vectorize some code constructs as a platform-specific optimization. Unfortunately we are running a mixture of AMD and Intel systems of various vintages, so this option is going to require a lot of testing to check it works and gives us any benefits. > Yes indeed. > > At BOSC/ISMB last month we were discussing closer collaborations with > the other Open Bio Foundation projects. > > One of these is BioManyCores which is aiming at OpenCL programming but > is currently concentrating on CUDA. > > When our new workstations are delivered we will be looking into CUDA. Given that OpenCL supports both GPU and CPU vectorization, and CUDA is Nvidia GPU-specific, it may be worth waiting for OpenCL to be adopted.
MacOS X Snow Leopard is only a couple of months away after all ;-) > Which applications would you most like to speed up? (current EMBOSS > programs, and suggestions for new ones) At our end the bottlenecks are mainly the indexing (dbi* & dbx*) and reformatting tools (seqret). All the best, Hamish From georgios at biotek.uio.no Fri Jul 17 10:52:38 2009 From: georgios at biotek.uio.no (George Magklaras) Date: Fri, 17 Jul 2009 12:52:38 +0200 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A604B8E.1000004@bioinfo-user.org.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5F7ACF.4060808@umdnj.edu> <4A602A9F.4040908@ebi.ac.uk> <4A604B8E.1000004@bioinfo-user.org.uk> Message-ID: <4A6057F6.7010704@biotek.uio.no> Hamish McWilliam wrote: > > At our end the bottlenecks are mainly the indexing (dbi* & dbx*) and > reformatting tools (seqret). > I second that. -- George Magklaras BSc Hons MPhil RHCE:805008309135525 Senior Computer Systems Engineer/UNIX-Linux Systems Administrator EMBnet Technical Management Board The Biotechnology Centre of Oslo, University of Oslo http://folk.uio.no/georgios From golharam at umdnj.edu Fri Jul 17 15:03:02 2009 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 17 Jul 2009 11:03:02 -0400 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A602A9F.4040908@ebi.ac.uk> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5F7ACF.4060808@umdnj.edu> <4A602A9F.4040908@ebi.ac.uk> Message-ID: <4A6092A6.3070808@umdnj.edu> I think the sequence alignment programs or anything dealing with matrices would be a good start. Even (as Hamish mentioned) making use of multi-core systems would be good. We are interested in doing some CUDA programming here. We are doing some training in about a month. Based on that, we'll see how well we know it and what we want to pursue. Maybe we can help you out a bit with this?
From pmr at ebi.ac.uk Fri Jul 17 15:34:31 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 17 Jul 2009 16:34:31 +0100 Subject: [EMBOSS] EMBOSS 6.1.0 release now available In-Reply-To: <4A6057F6.7010704@biotek.uio.no> References: <39390.86.26.12.63.1247656773.squirrel@webmail.ebi.ac.uk> <4A5F7ACF.4060808@umdnj.edu> <4A602A9F.4040908@ebi.ac.uk> <4A604B8E.1000004@bioinfo-user.org.uk> <4A6057F6.7010704@biotek.uio.no> Message-ID: <4A609A07.2080609@ebi.ac.uk> George Magklaras wrote: > Hamish McWilliam wrote: > At our end the bottlenecks are mainly the indexing (dbi* & dbx*) and > reformatting tools (seqret). >> > I second that. dbi and dbx we understand and can speed up. What large data are you using through seqret? I assume it is not single sequences that are slow.
regards, Peter Rice From belegdol at gmail.com Sun Jul 19 12:38:19 2009 From: belegdol at gmail.com (Julian Sikorski) Date: Sun, 19 Jul 2009 14:38:19 +0200 Subject: [EMBOSS] Packaging EMBOSS for Fedora In-Reply-To: <1244850556.8999.7.camel@login-svr1.ebi.ac.uk> References: <1244850556.8999.7.camel@login-svr1.ebi.ac.uk> Message-ID: On 13.06.2009 01:49, Mahmut Uludag wrote: > Hi Julian, > >> I have one crucial question: is it worth packaging jemboss? It seems it >> hasn't seen development since 2005, and I have experienced some important >> issues when trying to use it (mainly not working with files with spaces >> in filenames). What do you think? > > I had a quick look into the files with spaces in filenames problem, we > can include a fix in the coming release (6.1) for this problem. Previous > release of EMBOSS (6.0) included some fixes of similar nature, as well > as some small GUI improvements. We recently have updated EMBOSS/Jemboss > make/install machinery to make standalone Jemboss come out of the box when > users install EMBOSS, I think we were successful and this will be a new > feature in EMBOSS 6.1. > > Regards, > Mahmut Given that I'm finally an MSc and have some more free time, I decided to revisit the packaging and update the package to 6.1.0.
Unfortunately, there seem to be some problems with make install:

    /usr/bin/make install-exec-hook
    make[7]: Entering directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
    mkdir -p -- /builddir/build/BUILDROOT/EMBOSS-6.1.0-1.fc11.x86_64/usr/share/EMBOSS/jemboss/org/emboss/jemboss/editor
    /usr/bin/install: cannot stat `*.class': No such file or directory
    make[7]: *** [install-exec-hook] Error 1
    make[7]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
    make[6]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
    make[6]: *** [install-exec-am] Error 2
    make[5]: *** [install-am] Error 2
    make[5]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
    make[4]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss'
    make[4]: *** [install-recursive] Error 1
    make[3]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss'
    make[2]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org'
    make[3]: *** [install-recursive] Error 1
    make[2]: *** [install-recursive] Error 1
    make[1]: Leaving directory `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss'
    make[1]: *** [install-recursive] Error 1
    make: *** [install-recursive] Error 1

6.0.1 including the 1-12 patch was building/installing fine. Am I doing something wrong?
Cheers, Julian From uludag at ebi.ac.uk Sun Jul 19 16:18:29 2009 From: uludag at ebi.ac.uk (uludag at ebi.ac.uk) Date: Sun, 19 Jul 2009 17:18:29 +0100 (BST) Subject: [EMBOSS] Packaging EMBOSS for Fedora In-Reply-To: References: <1244850556.8999.7.camel@login-svr1.ebi.ac.uk> Message-ID: <49993.78.105.201.225.1248020309.squirrel@webmail.ebi.ac.uk>

> there seem to be some problems with make install:
>
> /usr/bin/make install-exec-hook
> make[7]: Entering directory
> `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor'
> mkdir -p --
> /builddir/build/BUILDROOT/EMBOSS-6.1.0-1.fc11.x86_64/usr/share/EMBOSS/jemboss/org/emboss/jemboss/editor
> /usr/bin/install: cannot stat `*.class': No such file or directory

Looks like we didn't test the --with-java and --with-javaos configure options thoroughly before this release. However, most users will not need these two options any more, as EMBOSS-6.1.0 includes precompiled Jemboss class files collected in a Java archive (JAR) file. You should hopefully not get the above error if you omit these two options when you configure your EMBOSS installation. Regards, Mahmut From biopython at maubp.freeserve.co.uk Mon Jul 20 13:24:02 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 20 Jul 2009 14:24:02 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net> <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com> <4A55B578.1060808@ebi.ac.uk> <320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com> Message-ID: <320fb6e00907200624x5513f3b5u8f7fa0c88e23fc8d@mail.gmail.com> On Thu, Jul 9, 2009 at 10:21 AM, Peter wrote: > On Thu, Jul 9, 2009 at 10:16 AM, Peter Rice wrote: >> >> Peter C. wrote: >> >>> Thanks Scott (& Thomas) for pointing out the embossversion program.
>>> >>> I would still question why the EMBOSS tools don't also support the >>> Unix convention of a version switch. Hypothetically, aren't some >>> (many?) of the tools standalone and couldn't they be installed >>> individually (e.g. as part of someone else's software bundle)? i.e. >>> Can EMBOSS really guarantee that the needle tool and the >>> embossversion tool are in sync? >> >> We could easily add a -version global qualifier ... for the next release. >> >> We can guarantee that embossversion and needle are in sync - assuming >> they are built using the same libraries, as that is where the version is >> recorded. Standalone builds are an issue though and it would help debug >> in a few cases. > > That sounds good to me :) > Thinking about this again, rather than adding a whole new argument (-version), why not just include the program version as the first line of the help output (from -help)? This should also solve the corner case of standalone builds, and would make it very easy to find the version (without having to know about the embossversion tool). Thanks, Peter C. From pmr at ebi.ac.uk Mon Jul 20 13:37:38 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 20 Jul 2009 14:37:38 +0100 Subject: [EMBOSS] transeq and ambiguous codons In-Reply-To: <320fb6e00907200624x5513f3b5u8f7fa0c88e23fc8d@mail.gmail.com> References: <320fb6e00907081450y2fd135e0x817f03c41357e297@mail.gmail.com> <1F1240778FB0AF46B4E5A72C44D2C74732E01A91@exch1-hi.accelrys.net> <320fb6e00907090159w6a76df66gbf8da2581594414d@mail.gmail.com> <4A55B578.1060808@ebi.ac.uk> <320fb6e00907090221m66f830b6ya25ad8eeeecf8f12@mail.gmail.com> <320fb6e00907200624x5513f3b5u8f7fa0c88e23fc8d@mail.gmail.com> Message-ID: <4A647322.7080903@ebi.ac.uk> Peter C. wrote: > Thinking about this again, rather than adding a whole new argument > (-version), why not just include the program version as the first line of > the help output (from -help)?
This should also solve the corner case > of standalone builds, and makes it very easy to find the version > (without having to know about the embossversion tool). Too late ... I already have working code for --version :-) This can report the EMBOSS version number. For EMBASSY applications it also reports the EMBASSY package version number (something embossversion is unable to do). There was also a "version:" ACD application attribute. This has to be renamed to "versionnumber:" to avoid a clash with the new qualifier name, and will be reported if defined. The "versionnumber" attribute in ACD was originally defined only for third-party applications in the SoapLab interface, so we do not expect renaming the attribute to cause any problems (we will make the changes to SoapLab). However, the suggestion of adding it to the -help output is a good one. I will add something there too. regards, Peter Rice From biopython at maubp.freeserve.co.uk Mon Jul 20 15:41:43 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 20 Jul 2009 16:41:43 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines Message-ID: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> Hi all, I've just updated my Mac to EMBOSS 6.1.0, and have found an issue with seqret conversion of IntelliGenetics files. After some digging, I think this problem relates to having DOS new lines in a file on Unix (in my case, Mac OS X).
For illustration, I'm using the example file from the EMBOSS website, saved to disk (using Unix new lines on a Mac): http://emboss.sourceforge.net/docs/themes/seqformats/ig Using EMBOSS 6.0.1, there was a problem: $ embossversion Writes the current EMBOSS version number to a file 6.0.1 $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter >HSFAU ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaaH-sapiensfaugenebasesH SFAUctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcga aaacggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgatta acactgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacag ccgtagcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacat ggtagctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgc cccgtcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggag ctaggactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgt gacacgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccat cttcgcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaaggg cttgtagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgc tccgtggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgt gagccgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatc tcctttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcg ccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccagg aaacggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtg ctcttcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcat gtagcctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgccc 
ctggaggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagta gcaggccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgt ctagtgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagta cttctcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacaca gacgtccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatccta gtctggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctata aattagaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaac tttgttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagagg ggttctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacag gtaaagtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtga gtgagagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtc cctgggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatg ctaggtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaac aggagaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgct ttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtct tttgtaattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttca tctttacttgcaaggcctcagggagaggtgtgcttctcgg i.e. The two sequences have been munged into one, with the name of the second sequence as part of the sequence. 
Using EMBOSS 6.1.0, the following now works: $ embossversion Reports the current EMBOSS version number 6.1.0 $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter >HSFAU H.sapiens fau mRNA, 518 bases ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaa >HSFAU1 H.sapiens fau 1 gene, 2016 bases ctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcgaaaac ggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgattaacac tgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacagccgt agcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacatggta gctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgccccg tcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggagctag gactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgtgaca cgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccatcttc gcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaagggcttg tagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgctccg tggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgtgagc cgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatctcct ttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcgccaa tatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaac ggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtgctct tcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcatgtag cctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctgg aggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagtagcag gccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgtctag 
tgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagtacttc tcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacacagacg tccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatcctagtct ggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctataaatt agaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaactttg ttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagaggggtt ctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacaggtaa agtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtgagtga gagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtccctg ggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatgctag gtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaacagga gaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgt caacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtcttttg taattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttcatctt tacttgcaaggcctcagggagaggtgtgcttctcgg i.e. There was a problem with this example file in EMBOSS 6.0.1, but things look fine in EMBOSS 6.1.0. Great :) However, if we now convert this input file to use DOS/Windows newlines, and repeat the test (on Mac OS X, so Unix): $ embossversionReports the current EMBOSS version number 6.1.0 $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter H.sapiens fau mRNA, 518 bases ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaa H.sapiens fau 1 gene, 2016 bases ctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcgaaaac ggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgattaacac 
tgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacagccgt agcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacatggta gctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgccccg tcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggagctag gactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgtgaca cgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccatcttc gcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaagggcttg tagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgctccg tggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgtgagc cgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatctcct ttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcgccaa tatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaac ggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtgctct tcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcatgtag cctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctgg aggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagtagcag gccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgtctag tgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagtacttc tcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacacagacg tccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatcctagtct ggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctataaatt agaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaactttg ttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagaggggtt ctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacaggtaa agtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtgagtga gagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtccctg ggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatgctag gtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaacagga gaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgt caacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtcttttg taattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttcatctt tacttgcaaggcctcagggagaggtgtgcttctcgg i.e. The ">" is missing on all the FASTA sequences. 
So, it looks like EMBOSS 6.1.0 fixed one problem with IntelliGenetics files, but there is still an issue here. Peter C. P.S. Should I have reported this possible bug via sourceforge? P.P.S. Back in 2006, I reported a similar issue with a data corruption reading stockholm/pfam with DOS newlines (Sourceforge Bug #1588956, long since fixed). It seems to me that EMBOSS would benefit from explicit testing of all the file formats using DOS/Windows newlines when run on Unix, and vice versa. Does that sound feasible, or just hopelessly ambitious? From biopython at maubp.freeserve.co.uk Mon Jul 20 15:50:32 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 20 Jul 2009 16:50:32 +0100 Subject: [EMBOSS] FASTQ format documentation Message-ID: <320fb6e00907200850t39582781p115780690955bde6@mail.gmail.com> Hi all, I was just trying to double check the names EMBOSS 6.1.0 supports for the various FASTQ file formats, and none of them are listed here: http://emboss.sourceforge.net/docs/themes/SequenceFormats.html Does this need updating, or should I be looking elsewhere? Thanks Peter C. From pmr at ebi.ac.uk Mon Jul 20 16:16:39 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 20 Jul 2009 17:16:39 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> Message-ID: <4A649867.4090205@ebi.ac.uk> Peter C. wrote: > Hi all, > > I've just updated my Mac to EMBOSS 6.1.0, and have found an > issue with seqret conversion of IntelliGenetics files. After some > digging, I think this problem relates to having DOS new lines in > a file on Unix (in my case, Mac OS X). we have an application "noreturn" to fix things like this. If you send me your file I will try to take a look at whether we should be catching the funny newline characters. > P.S. Should I have reported this possible bug via sourceforge?
The emboss-bug at emboss.open-bio.org list is the best way to get our attention > P.P.S. Back in 2006, I reported a similar issue with a data > corruption reading stockholm/pfam with DOS newlines > (Sourceforge Bug #1588956, long since fixed). It seems to > me that EMBOSS would benefit from explicit testing of all > the file formats using DOS/Windows newlines when run on > Unix, and vice versa. Does that sound feasible, or just > hopelessly ambitious? We can try ... how well does Biopython handle these? (i.e. do we need such examples for perl, python etc or is this an EMBOSS-specific issue?) regards, Peter From pmr at ebi.ac.uk Mon Jul 20 16:17:46 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 20 Jul 2009 17:17:46 +0100 Subject: [EMBOSS] FASTQ format documentation In-Reply-To: <320fb6e00907200850t39582781p115780690955bde6@mail.gmail.com> References: <320fb6e00907200850t39582781p115780690955bde6@mail.gmail.com> Message-ID: <4A6498AA.8040007@ebi.ac.uk> Peter C. wrote: > Hi all, > > I was just trying to double check the names EMBOSS 6.1.0 supports > for the various FASTQ file formats, and none of them are listed here: > http://emboss.sourceforge.net/docs/themes/SequenceFormats.html > > Does this need updating, or should I be looking elsewhere? Yes, and yes. We are cleaning up these things post-release. I will update that file, and also the equivalent on the wiki regards, Peter Rice From biopython at maubp.freeserve.co.uk Mon Jul 20 16:30:45 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 20 Jul 2009 17:30:45 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <4A649867.4090205@ebi.ac.uk> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> <4A649867.4090205@ebi.ac.uk> Message-ID: <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> On Mon, Jul 20, 2009 at 5:16 PM, Peter Rice wrote: > > Peter C.
wrote: >> Hi all, >> >> I've just updated my Mac to EMBOSS 6.1.0, and have found an >> issue with seqret conversion of IntelliGenetics files. After some >> digging, I think this problem relates to having DOS new lines in >> a file on Unix (in my case, Mac OS X). > > we have an application "noreturn" to fix things like this. That's basically an EMBOSS variant on unix2dos and dos2unix (or similar) existing Unix command line tools? I'm more interested in having all the EMBOSS tools handle either new line format themselves automatically. These days I am mostly working on Unix (including Mac OS X), but I do have to cope with Windows style text files quite often. > If you send me your file I will try to take a look at whether we should > be catching the funny newline characters. For this bug report I was using: http://emboss.sourceforge.net/docs/themes/seqformats/ig There are three more example files used in the Biopython unit tests here: http://biopython.open-bio.org/SRC/biopython/Tests/IntelliGenetics/ >> P.S. Should I have reported this possible bug via sourceforge? > > The emboss-bug at emboss.open-bio.org list is the best way to get > our attention Great, another mailing list to sign up to... but if that is your preferred route, that's fine. >> P.P.S. Back in 2006, I reported a similar issue with a data >> corruption reading stockholm/pfam with DOS newlines >> (Sourceforge Bug #1588956, long since fixed). It seems to >> me that EMBOSS would benefit from explicit testing of all >> the file formats using DOS/Windows newlines when run on >> Unix, and vice versa. Does that sound feasible, or just >> hopelessly ambitious? > > We can try ... how well does Biopython handle these? (i.e. do we need > such examples for perl, python etc or is this an EMBOSS-specific issue?) I think this is an EMBOSS specific issue.
I don't know enough about how all the different EMBOSS parsers work, but is there a single place where you could add automatic handling of either new line convention when reading in text? For reference, in Python, you can explicitly open text files in "universal newlines" mode, which takes care of this. I don't know about Perl. Peter C. From pmr at ebi.ac.uk Mon Jul 20 20:04:19 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 20 Jul 2009 21:04:19 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> <4A649867.4090205@ebi.ac.uk> <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> Message-ID: <4A64CDC3.9020902@ebi.ac.uk> Peter C. wrote: Thanks for the example files. I will start with those. >>> P.S. Should I have reported this possible bug via sourceforge? >> The emboss-bug at emboss.open-bio.org list is the best way to get >> our attention > > Great, another mailing list to sign up to... but if that is your > preferred route, that's fine. Actually, you can't sign up ... that's our bug reporting list :-) > I think this is an EMBOSS specific issue. I don't know enough about > how all the different EMBOSS parsers work, but is there a single > place where you could add automatic handling of either new line > convention when reading in text? Hope so. I think the issue is places where the parsing is checking explicitly for \n rather than \n and \r. The solution would be to strip both off before parsing. It will need a thorough clean through the ajseqread code.
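As a rough illustration of the two approaches mentioned in this exchange (not EMBOSS's actual ajseqread code, which is written in C), here is a minimal Python sketch. The first function follows the suggestion of stripping both \r and \n before parsing; the second uses the "universal newlines" mode Peter C. refers to. The function names and the small IG-like sample data are hypothetical.

```python
import io


def read_lines_strip_cr(raw: bytes) -> list[str]:
    """Split on LF only, then strip any trailing CR from each line.

    Handles Unix (LF) and DOS/Windows (CRLF) input. A file that uses lone
    CRs (classic Mac OS) comes back as one long "line".
    """
    lines = raw.decode("ascii").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty item produced by a final newline
    return [line.rstrip("\r") for line in lines]


def read_lines_universal(raw: bytes) -> list[str]:
    """Read in Python's universal newlines mode (newline=None).

    CRLF pairs and lone CRs are both translated to LF before we see them.
    """
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
    return [line.rstrip("\n") for line in stream]


# Hypothetical IG-like sample in the three newline conventions.
unix = b";comment\nHSFAU\nacgt1\n"
dos = unix.replace(b"\n", b"\r\n")
mac = unix.replace(b"\n", b"\r")

for name, data in (("unix", unix), ("dos", dos), ("mac", mac)):
    print(name, read_lines_strip_cr(data), read_lines_universal(data))
```

Both readers give the same three lines for the Unix and DOS samples; only the universal-newlines reader also splits the lone-CR sample, which the CR-stripping approach returns as a single line.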
regards, Peter From biopython at maubp.freeserve.co.uk Mon Jul 20 20:09:43 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 20 Jul 2009 21:09:43 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <4A64CDC3.9020902@ebi.ac.uk> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> <4A649867.4090205@ebi.ac.uk> <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> <4A64CDC3.9020902@ebi.ac.uk> Message-ID: <320fb6e00907201309r3ddca32dy3bfc159e201217b5@mail.gmail.com> Peter Rice wrote: > > Thanks for the example files. I will start with those. > > Peter C. wrote: >> I think this is an EMBOSS specific issue. I don't know enough about >> how all the different EMBOSS parsers work, but is there a single >> place where you could add automatic handling of either new line >> convention when reading in text? > > Hope so. I think the issue is places where the parsing is checking > explicitly for \n rather than \n and \r. The solution would be to strip > both off before parsing. It will need a thorough clean through the > ajseqread code. That sounds like a good investment of effort in the long run :) Peter C. From pmr at ebi.ac.uk Tue Jul 21 09:15:59 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 21 Jul 2009 10:15:59 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> Message-ID: <4A65874F.2010106@ebi.ac.uk> Peter C. wrote: > However, if we now convert this input file to use DOS/Windows > newlines, and repeat the test (on Mac OS X, so Unix): > > $ embossversion > Reports the current EMBOSS version number > 6.1.0 > $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter > H.sapiens fau mRNA, 518 bases > ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc > > i.e.
The ">" is missing on all the FASTA sequences. Actually, it's not missing ... it is hiding. The sequence id has a ^M appended to it, so the '>' and the id get overwritten by the description when you look at the file. Fixed by processing the IG format ID rather than simply copying it. Thanks for finding that one. regards, Peter Rice From biopython at maubp.freeserve.co.uk Tue Jul 21 09:23:03 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 21 Jul 2009 10:23:03 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <4A65874F.2010106@ebi.ac.uk> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> <4A65874F.2010106@ebi.ac.uk> Message-ID: <320fb6e00907210223x5a145ab5qfa9b29bc42a37815@mail.gmail.com> Peter Rice wrote: > > Peter C. wrote: >> However, if we now convert this input file to use DOS/Windows >> newlines, and repeat the test (on Mac OS X, so Unix): >> >> $ embossversion >> Reports the current EMBOSS version number >> 6.1.0 >> $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter >> H.sapiens fau mRNA, 518 bases >> ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc >> >> i.e. The ">" is missing on all the FASTA sequences. > > Actually, it's not missing ... it is hiding. > > The sequence id has a ^M appended to it, so the '>' and the id get > overwritten by the description when you look at the file. That makes sense, and I think I can see how it might have happened. > Fixed by processing the IG format ID rather than simply copying it. > > Thanks for finding that one. Sure, Peter C.
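A small Python sketch of why the '>' seemed to vanish, assuming a typical terminal that moves the cursor back to column 0 on a carriage return (^M). `render_on_terminal` is a hypothetical, simplified model of a terminal, and `fasta_header` only mimics the idea of cleaning the IG ID before copying it into the FASTA title line; EMBOSS's real fix lives in its own C parser.

```python
def render_on_terminal(text: str) -> str:
    """Approximate what a terminal shows for one line containing CR characters.

    A carriage return moves the cursor back to column 0, so the characters
    after it overwrite the ones already printed instead of following them.
    """
    screen: list[str] = []
    col = 0
    for ch in text:
        if ch == "\r":
            col = 0
            continue
        if col < len(screen):
            screen[col] = ch
        else:
            screen.append(ch)
        col += 1
    return "".join(screen)


def fasta_header(seq_id: str, description: str, clean_id: bool = True) -> str:
    """Build a FASTA title line from an ID taken from an IG file.

    With clean_id=False the ID is copied verbatim, so a DOS line ending
    leaves a CR on it; with clean_id=True surrounding whitespace
    (including CR) is stripped first.
    """
    if clean_id:
        seq_id = seq_id.strip()
    return f">{seq_id} {description}"


desc = "H.sapiens fau mRNA, 518 bases"
broken = fasta_header("HSFAU\r", desc, clean_id=False)
fixed = fasta_header("HSFAU\r", desc)
print(repr(render_on_terminal(broken)))  # the ">HSFAU" part has been overwritten
print(repr(render_on_terminal(fixed)))
```

The bytes of the broken line still contain ">HSFAU"; it is only the display that hides them, which matches the "not missing, hiding" diagnosis.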
From db60 at st-andrews.ac.uk Tue Jul 21 11:24:28 2009 From: db60 at st-andrews.ac.uk (Daniel Barker) Date: Tue, 21 Jul 2009 12:24:28 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> <4A649867.4090205@ebi.ac.uk> <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> Message-ID: <4A65A56C.9010308@st-andrews.ac.uk> Dear Peters et al., EMBOSS claims not to care about whether newlines are DOS or UNIX: 'EMBOSS programs can read in both PC and Unix text file formats, so it is not necessary for you to use this utility all of the time' - noreturn documentation. This would certainly be good. 'The newline problem' must be the single biggest computational waste of time I've experienced over the years! It's easy to avoid with tr, u2d, d2u, noreturn, etc. - but it's just one other thing that can go wrong, especially when data is shared between different places. I've noticed a small amount of software, in the world in general, still uses the Mac OS 9 (and earlier) convention where newline is \015 only. E.g. this tab-delimited text saved from Excel 2004 for Mac: $ od -bc Workbook1.txt 0000000 061 011 062 011 063 015 064 011 065 011 066 1 \t 2 \t 3 \r 4 \t 5 \t 6 0000013 $ I expect this usage will decline, since it's in conflict with the convention of Mac OS X's own command-line tools (\012 only, like Linux). Probably the '\015 only' convention hasn't had much impact on bioinformatics anyway? 
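The od listing maps directly onto byte values: octal 015 is CR (\r) and octal 012 is LF (\n). A minimal Python sketch, using a hypothetical `newline_style` helper, that classifies a file's convention by counting CRLF pairs, lone CRs and lone LFs:

```python
def newline_style(raw: bytes) -> str:
    """Classify line endings as 'unix' (LF), 'dos' (CRLF), 'mac' (lone CR,
    the Mac OS 9 convention), 'mixed' (more than one kind) or 'none'."""
    crlf = raw.count(b"\r\n")
    lone_cr = raw.count(b"\r") - crlf
    lone_lf = raw.count(b"\n") - crlf
    found = [
        name
        for name, n in (("dos", crlf), ("mac", lone_cr), ("unix", lone_lf))
        if n
    ]
    if not found:
        return "none"
    if len(found) > 1:
        return "mixed"
    return found[0]


# The 11 bytes from the od -bc listing above (octal 061 011 062 ... 066).
excel_2004 = bytes([0o61, 0o11, 0o62, 0o11, 0o63, 0o15, 0o64, 0o11, 0o65, 0o11, 0o66])
print(excel_2004, newline_style(excel_2004))
```

A check like this could also be run over a test set of DOS- and Unix-converted example files to confirm each file really has the line endings it is meant to exercise.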
Best wishes, Daniel -- Daniel Barker http://bio.st-andrews.ac.uk/staff/db60.htm The University of St Andrews is a charity registered in Scotland : No SC013532 From pmr at ebi.ac.uk Tue Jul 21 12:12:00 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 21 Jul 2009 13:12:00 +0100 Subject: [EMBOSS] EMBOSS seqret : IntelliGenetics and new DOS lines In-Reply-To: <4A65A56C.9010308@st-andrews.ac.uk> References: <320fb6e00907200841p621dcb7eh63d71b0730100813@mail.gmail.com> <4A649867.4090205@ebi.ac.uk> <320fb6e00907200930r6086a9a0x855de88cf9281543@mail.gmail.com> <4A65A56C.9010308@st-andrews.ac.uk> Message-ID: <4A65B090.7030507@ebi.ac.uk> Daniel Barker wrote: > Dear Peters et al., > > EMBOSS claims not to care about whether newlines are DOS or UNIX: > > 'EMBOSS programs can read in both PC and Unix text file formats, so it > is not necessary for you to use this utility all of the time' - noreturn > documentation. > > This would certainly be good. 'The newline problem' must be the single > biggest computational waste of time I've experienced over the years! Indeed. We get a little caught between accepting the extra carriage returns and the need for efficient parsing as some users run seqret to reformat large sequence files. Mac format (\015 only) is horrible because the C library functions are looking for line feed unless you're running on a Mac. I will give our examples a run through after converting to PC format and see if any others fall over. > I've noticed a small amount of software, in the world in general, still > uses the Mac OS 9 (and earlier) convention where newline is \015 only. > E.g. this tab-delimited text saved from Excel 2004 for Mac: > > $ od -bc Workbook1.txt > 0000000 061 011 062 011 063 015 064 011 065 011 066 > 1 \t 2 \t 3 \r 4 \t 5 \t 6 > 0000013 > $ > > I expect this usage will decline, since it's in conflict with the > convention of Mac OS X's own command-line tools (\012 only, like Linux). 
> Probably the '\015 only' convention hasn't had much impact on > bioinformatics anyway? In my experience mac users didn't venture out into the real world. On the other hand, unix users have often copied files from PCs. I used to do the same myself, which was why I had to write noreturn in the first place. regards, Peter Rice From stephen.taylor at imm.ox.ac.uk Thu Jul 23 09:43:35 2009 From: stephen.taylor at imm.ox.ac.uk (Stephen Taylor) Date: Thu, 23 Jul 2009 10:43:35 +0100 Subject: [EMBOSS] Tree building Message-ID: Hi, Are there any phylogeny programs in EMBOSS/EMBASSY that will allow me to build a tree from a multiple alignment without truncating the id name in the resulting treefile? Thanks, Steve From pmr at ebi.ac.uk Thu Jul 23 10:51:06 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 23 Jul 2009 11:51:06 +0100 Subject: [EMBOSS] Tree building In-Reply-To: References: Message-ID: <4A68409A.7020004@ebi.ac.uk> Stephen Taylor wrote: > Hi, > > Are there any phylogeny programs in EMBOSS/EMBASSY that will allow me to > build a tree from a multiple alignment without truncating the id name in > the resulting treefile? This can be done by adding some new output formats to the existing phylipnew embassy applications. Phylip truncates id names for its own formats. We can extend them, and only truncate for strict phylip format outputs. We are also open to suggestions for new applications in phylogenetics A good topic to discuss at the GMOD workshop in Oxford. regards, Peter From pmr at ebi.ac.uk Thu Jul 23 14:06:58 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 23 Jul 2009 15:06:58 +0100 Subject: [EMBOSS] Tree building In-Reply-To: References: <4A68409A.7020004@ebi.ac.uk> Message-ID: <4A686E82.1010803@ebi.ac.uk> Stephen Taylor wrote: > Hi Peter, > >> Stephen Taylor wrote: >>> Hi, >>> Are there any phylogeny programs in EMBOSS/EMBASSY that will allow me >>> to build a tree from a multiple alignment without truncating the id >>> name in the resulting treefile? 
> >> >> This can be done by adding some new output formats to the existing >> phylipnew embassy applications. Phylip truncates id names for its own >> formats. We can extend them, and only truncate for strict phylip >> format outputs. >> >> We are also open to suggestions for new applications in phylogenetics >> > > Sounds like a good idea. > > However, I kind of need it today:-). I guess I'll have to truncate them > to something sensible for the time being... Can you give some example data and algorithms to make the tree? Peter From stephen.taylor at imm.ox.ac.uk Thu Jul 23 15:07:36 2009 From: stephen.taylor at imm.ox.ac.uk (Stephen Taylor) Date: Thu, 23 Jul 2009 16:07:36 +0100 Subject: [EMBOSS] Tree building In-Reply-To: <50E41C3A73F46B4D876BD13F5E80264A02DAD6C6@exchange3.sims.scri.sari.ac.uk> References: <4A68409A.7020004@ebi.ac.uk> <4A686E82.1010803@ebi.ac.uk> <50E41C3A73F46B4D876BD13F5E80264A02DAD6C6@exchange3.sims.scri.sari.ac.uk> Message-ID: Hi Frank, > Hi Stephen, > >> However, I kind of need it today:-). I guess I'll have to truncate >> them >> to something sensible for the time being... > > If you need a tree *today*... :-) you could download our TOPALi > package (loads all formats of alignment) > I have already installed Topali and was waiting for the run to finish. I have over 1600 sequences and I wasn't sure when it was going to finish since I started it this morning (using the Quick NJ option on a Mac Mini). I think we should install it on a more powerful machine;-). However, after I got your email I checked the status and the job had finished. Good timing... > It does basic NJ, but also model-selection, PhyML, RaxML and > MrBayes. RaxML is quite fast. > > We internally rename species names when TOPALi submits jobs to the > above external programs, but put the names back when we show the > alignment and the tree. Yes, all the labels are there in full. Thanks a lot. Peter - I can send you some data offline if you wish.
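Frank describes TOPALi renaming sequences before handing jobs to external programs and restoring the names afterwards. A minimal Python sketch of that general idea (not TOPALi's code), which could also serve as a stopgap around PHYLIP's 10-character name limit: replace each name with a short placeholder, build the tree on the placeholders, then substitute the names back into the Newick output. The `seqN` placeholder scheme and the example names are hypothetical, and the sketch assumes the original names contain no Newick punctuation or whitespace (otherwise they would need quoting).

```python
import re


def shorten_ids(names: list[str]) -> tuple[list[str], dict[str, str]]:
    """Give each name a short placeholder ID and remember the mapping."""
    mapping: dict[str, str] = {}
    short_ids: list[str] = []
    for i, name in enumerate(names, start=1):
        short = f"seq{i}"  # hypothetical scheme; fits 10 chars up to seq9999999
        mapping[short] = name
        short_ids.append(short)
    return short_ids, mapping


def restore_ids(newick: str, mapping: dict[str, str]) -> str:
    """Replace placeholder labels in a Newick tree with the original names."""
    # Every run of characters between Newick punctuation is a candidate label.
    # Branch lengths such as 0.05 also match but are left alone because they
    # are not keys in the mapping; whole-token matching keeps seq1 apart from seq10.
    return re.sub(r"[^(),:;\s]+", lambda m: mapping.get(m.group(0), m.group(0)), newick)


names = ["Homo_sapiens_FAU_mRNA", "Mus_musculus_Fau_gene", "Rattus_norvegicus_Fau"]
short, mapping = shorten_ids(names)
tree = "((seq1:0.1,seq2:0.2):0.05,seq3:0.3);"  # e.g. a tree built on the short IDs
print(restore_ids(tree, mapping))
```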
Steve From stephen.taylor at imm.ox.ac.uk Thu Jul 23 13:34:26 2009 From: stephen.taylor at imm.ox.ac.uk (Stephen Taylor) Date: Thu, 23 Jul 2009 14:34:26 +0100 Subject: [EMBOSS] Tree building In-Reply-To: <4A68409A.7020004@ebi.ac.uk> References: <4A68409A.7020004@ebi.ac.uk> Message-ID: Hi Peter, > Stephen Taylor wrote: >> Hi, >> Are there any phylogeny programs in EMBOSS/EMBASSY that will allow >> me to build a tree from a multiple alignment without truncating the >> id name in the resulting treefile? > > This can be done by adding some new output formats to the existing > phylipnew embassy applications. Phylip truncates id names for its > own formats. We can extend them, and only truncate for strict phylip > format outputs. > > We are also open to suggestions for new applications in phylogenetics > Sounds like a good idea. However, I kind of need it today:-). I guess I'll have to truncate them to something sensible for the time being... Thanks. Steve From gbottu at vub.ac.be Thu Jul 23 16:19:11 2009 From: gbottu at vub.ac.be (Guy Bottu) Date: Thu, 23 Jul 2009 18:19:11 +0200 Subject: [EMBOSS] Tree building In-Reply-To: <4A68409A.7020004@ebi.ac.uk> References: <4A68409A.7020004@ebi.ac.uk> Message-ID: <4A688D7F.6080304@vub.ac.be> Peter Rice wrote: > This can be done by adding some new output formats to the existing > phylipnew embassy applications. Phylip truncates id names for its own > formats. We can extend them, and only truncate for strict phylip format > outputs. Interesting to note the following : standard PHYLIP format reserves the first 10 characters for the name and has the sequence start at position 11. But some programmers like the one of PhyML use a format they call PHYLIP, which however allows a name of any length but then demands that there be a space between the name and the sequence. I already had trouble when I used a standard PHYLIP file made by PHYLIP or EMBOSS with a name of 10 characters as input to PhyML. 
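Guy's point about column layout can be shown with a short Python sketch. The helpers below are hypothetical and only model the name/sequence part of a data row (real PHYLIP files also have a header line with the counts and may be interleaved): strict rows pad or truncate the name to 10 columns with no separator, while a relaxed reader splits at the first whitespace, so a strict row whose name is exactly 10 characters long gets misread.

```python
def strict_phylip_row(name: str, seq: str) -> str:
    """Strict PHYLIP: the name fills exactly columns 1-10 (truncated or padded
    with spaces) and the sequence starts in column 11, with no separator."""
    return f"{name[:10]:<10}{seq}"


def relaxed_phylip_row(name: str, seq: str) -> str:
    """'Relaxed' PHYLIP as described for PhyML: any-length name, then a space."""
    return f"{name} {seq}"


def parse_relaxed_row(row: str) -> tuple[str, str]:
    """Read a row the relaxed way: the name ends at the first whitespace."""
    parts = row.split(None, 1)
    return parts[0], (parts[1] if len(parts) > 1 else "")


print(repr(strict_phylip_row("HSFAU", "ACGTACGT")))        # padded to 10 columns
print(repr(strict_phylip_row("HSFAU1_gene", "ACGTACGT")))  # name truncated to 10
print(parse_relaxed_row(strict_phylip_row("HSFAU1_gen", "ACGTACGT")))  # name and sequence run together
```

Short names survive the round trip because the padding supplies the whitespace a relaxed reader needs; it is the 10-character case that breaks.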
Guy Bottu From belegdol at gmail.com Wed Jul 29 11:54:49 2009 From: belegdol at gmail.com (Julian Sikorski) Date: Wed, 29 Jul 2009 13:54:49 +0200 Subject: [EMBOSS] Packaging EMBOSS for Fedora In-Reply-To: <49993.78.105.201.225.1248020309.squirrel@webmail.ebi.ac.uk> References: <1244850556.8999.7.camel@login-svr1.ebi.ac.uk> <49993.78.105.201.225.1248020309.squirrel@webmail.ebi.ac.uk> Message-ID: On 19.07.2009 18:18, uludag at ebi.ac.uk wrote: > >> there seem to be some problems with make install: >> >> /usr/bin/make install-exec-hook >> make[7]: Entering directory >> `/builddir/build/BUILD/EMBOSS-6.1.0/jemboss/org/emboss/jemboss/editor' >> mkdir -p -- >> /builddir/build/BUILDROOT/EMBOSS-6.1.0-1.fc11.x86_64/usr/share/EMBOSS/jemboss/org/emboss/jemboss/editor >> /usr/bin/install: cannot stat `*.class': No such file or directory > > Looks like we didn't test the --with-java and --with-javaos configure > options well, before this release. However, most users will not need these > two options any more as EMBOSS-6.1.0 includes precompiled jemboss class > files collected in a java archive file. You should hopefully not get the > above error if you omit these two options when you configure your emboss > installation. > > Regards, > Mahmut Thank you, removing these two seems to have done the trick! Julian From biopython at maubp.freeserve.co.uk Thu Jul 30 15:00:37 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 30 Jul 2009 16:00:37 +0100 Subject: [EMBOSS] FASTQ records with no sequence? Message-ID: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com> Hi all, On the continuing topic of the nebulous FASTQ format, are there any strong views as to whether FASTQ files could hold records without a sequence (and therefore no quality scores)? This could make sense as output from an (aggressive) quality filter. This is a corner case, and applies to other file formats too of course (e.g. FASTA).
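To make the two candidate layouts concrete: a zero-length record could be written as two lines ('@id' then '+') or as four lines with blank sequence and quality lines. A minimal Python sketch of a reader that accepts both, assuming unwrapped records, plus a writer that always emits four lines. It is illustrative only, not the EMBOSS or Biopython parser; the function names are hypothetical, and the two-line case relies on a sequence line never starting with '+'.

```python
def parse_fastq(text: str) -> list[tuple[str, str, str]]:
    """Parse unwrapped FASTQ into (id, sequence, quality) tuples, accepting
    zero-length records with or without blank sequence/quality lines."""
    lines = text.splitlines()
    records = []
    i = 0
    while i < len(lines):
        header = lines[i]
        if not header.startswith("@"):
            raise ValueError(f"expected '@' line, got {header!r}")
        rid = header[1:]
        i += 1
        if i < len(lines) and lines[i].startswith("+"):
            # Two-line form: '+' straight after the title, so no seq/qual lines.
            plus, seq, qual = lines[i], "", ""
            i += 1
        else:
            if i + 2 >= len(lines):
                raise ValueError(f"truncated record {rid!r}")
            seq, plus, qual = lines[i], lines[i + 1], lines[i + 2]
            i += 3
        # The '+' line may be bare or repeat the title.
        if not plus.startswith("+") or plus[1:] not in ("", rid):
            raise ValueError(f"bad '+' line in record {rid!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {rid!r}")
        records.append((rid, seq, qual))
    return records


def write_fastq(records: list[tuple[str, str, str]]) -> str:
    """Always write four lines per record, so an empty read gets blank lines."""
    return "".join(f"@{rid}\n{seq}\n+\n{qual}\n" for rid, seq, qual in records)


two_line = "@empty1\n+\n@read2\nACGT\n+\nIIII\n"
four_line = "@empty1\n\n+empty1\n\n@read2\nACGT\n+\nIIII\n"
print(parse_fastq(two_line))
print(repr(write_fastq(parse_fastq(four_line))))
```

Keeping the empty record in place, rather than dropping it, is what would let a pair of FASTQ files for paired-end reads stay in sync after trimming.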
I mentioned this to Peter Rice (EMBOSS) off list, and he replied: On Thu, Jul 30, 2009 at 2:56 PM, Peter Rice wrote: > EMBOSS rejects zero length sequences - something we put in some years > ago for misformatted FASTA files that someone ran through a Taverna > workflow to launch clustalw via EMBOSS's "emma". The user had got his > carriage control characters mangled so the sequence was appended to the > FASTA '>' line and appeared as a long description with no sequence. > > I can well imagine for filtering paired reads that zero length sequences > would be useful. > > At the point where the test is made we know the sequence format. > We can therefore define some or all formats as accepting or rejecting > zero length sequences. > > Similarly we can easily extend to define some applications (e.g. emma) > as requiring a minimum sequence length. > > regards, > > Peter Peter Rice is of course correct - in general the meaning and validity of a zero length sequence is context dependent. I think Peter Rice makes a good point regarding paired end reads. What I assume he was getting at is the situation where due to quality trimming, one of a pair might be trimmed to nothing - leaving essentially a singleton read. However, paired end reads are normally stored using a matched pair of FASTQ files, so it could be important to keep the zero length read present, so that the two files can be read in together in sync. If we do want to allow zero length sequences in FASTQ, would both of the following be valid? Should there be empty sequence and quality lines, or no sequence and quality lines?
"@identifier\n+\n" (two lines, just the @ and + lines) "@identifier\n\n+\n\n" (four lines, including blank seq and qual lines) or with the repeated identifier on the plus lines: "@identifier\n+identifier\n" (two lines, just the @ and + lines) "@identifier\n\n+identifier\n\n" (four lines, including blank lines) As we are recommending no line wrapping on output this means typical FASTQ records would be four lines - so doing the same makes sense here too. Peter C. From pmr at ebi.ac.uk Thu Jul 30 15:09:41 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 30 Jul 2009 16:09:41 +0100 Subject: [EMBOSS] FASTQ records with no sequence? In-Reply-To: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com> References: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com> Message-ID: <4A71B7B5.40502@ebi.ac.uk> Peter C. wrote: > As we are recommending no line wrapping on output this means > typical FASTQ records would be four lines - so doing the same > makes sense here too. I vote for 4 lines on output. It should be possible to allow zero lines on input depending on where the '+' check is. regards, Peter From biopython at maubp.freeserve.co.uk Thu Jul 30 15:19:24 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 30 Jul 2009 16:19:24 +0100 Subject: [EMBOSS] FASTQ records with no sequence? In-Reply-To: <4A71B7B5.40502@ebi.ac.uk> References: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com> <4A71B7B5.40502@ebi.ac.uk> Message-ID: <320fb6e00907300819x35ae00c3wa20a382376134db7@mail.gmail.com> On Thu, Jul 30, 2009 at 4:09 PM, Peter Rice wrote: > > Peter C. wrote: > >> As we are recommending no line wrapping on output this means >> typical FASTQ records would be four lines - so doing the same >> makes sense here too. > > I vote for 4 lines on output. If we want to allow zero length sequences, then yes, I would also vote for the 4 line output (i.e. blank lines for the sequence and the quality string). 
> It should be possible to allow zero lines on input depending on where the '+' check is. Yes, I'm pretty sure a parser could cope with any of the zero length sequence FASTQ examples I gave. Peter From biopython at maubp.freeserve.co.uk Thu Jul 30 15:35:25 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 30 Jul 2009 16:35:25 +0100 Subject: [EMBOSS] FASTQ records with no sequence? In-Reply-To: <320fb6e00907300819x35ae00c3wa20a382376134db7@mail.gmail.com> References: <320fb6e00907300800x5f8e78eci5df8333df713e4c@mail.gmail.com> <4A71B7B5.40502@ebi.ac.uk> <320fb6e00907300819x35ae00c3wa20a382376134db7@mail.gmail.com> Message-ID: <320fb6e00907300835v3a9d46d4w77c344bbf6efa08d@mail.gmail.com> Hi all, On the continuing topic of the nebulous FASTQ format, are there any strong views as to whether FASTQ files could hold records without a sequence (and therefore no quality scores)? This could make sense as output from an (aggressive) quality filter. This was a discussion I meant to start on the OBF list, not the EMBOSS list - so here is the start of the thread: http://lists.open-bio.org/pipermail/emboss/2009-July/003707.html Basically in some contexts an empty FASTQ record makes sense, so perhaps we should include examples of this for our test suite. However, there is more than one reasonable way to represent such a record (either omitting the sequence and quality lines, or including blank sequence and quality lines). On Thu, Jul 30, 2009 at 4:09 PM, Peter Rice wrote: > > Peter C. wrote: > >> As we are recommending no line wrapping on output this means >> typical FASTQ records would be four lines - so doing the same >> makes sense here too. > > I vote for 4 lines on output. If we want to allow zero length sequences, then yes, I would also vote for the 4 line output (i.e. blank lines for the sequence and the quality string). > It should be possible to allow zero lines on input depending on
Yes, I'm pretty sure a parser could cope with any of the zero length sequence FASTQ examples I gave. Peter From wzhao6898 at gmail.com Fri Jul 31 16:21:06 2009 From: wzhao6898 at gmail.com (David Zhao) Date: Fri, 31 Jul 2009 16:21:06 +0000 (UTC) Subject: [EMBOSS] How to specify the number of partial digestion using DIGEST Message-ID: Hi there, I'm trying to use DIGEST to generate virtual tryptic digest for a list of proteins, where I'd like to get 2 or 3 miscleavage digestion. I know -overlap returns 1 miscleavage, and -allpartials returns all the combinations and result in a huge file, is there a way I can specify how many miscleavage I need? Thanks, David Zhao