From michael.watson at bbsrc.ac.uk Tue Dec 1 09:33:23 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 1 Dec 2009 14:33:23 +0000 Subject: [EMBOSS] Trimming illumina short reads based on quality Message-ID: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> Hi I'm sorry if I've not been keeping up to date on what is doubtless a hot topic. Does EMBOSS allow one to trim short reads based on quality data (from a fastq file)? If not, I have read that it is planned - any idea when it will be implemented? Otherwise, alternative suggestions are welcome! Thanks Mick From golharam at umdnj.edu Tue Dec 1 10:46:20 2009 From: golharam at umdnj.edu (Ryan Golhar) Date: Tue, 01 Dec 2009 10:46:20 -0500 Subject: [EMBOSS] Trimming illumina short reads based on quality In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <4B153A4C.6000904@umdnj.edu> Michael, Doesn't Illumina provide tools to do this? I know with ABI Solid data, they have a perl script capable of trimming data based on quality scores. Ryan michael watson (IAH-C) wrote: > Hi > > I'm sorry if I've not been keeping up to date on what is doubtless a hot topic. > > Does EMBOSS allow one to trim short reads based on quality data (from a fastq file)? > > If not, I have read that it is planned - any idea when it will be implemented? > > Otherwise, alternative suggestions are welcome! 
> > Thanks > Mick > > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > From d.m.a.martin at dundee.ac.uk Tue Dec 1 11:16:21 2009 From: d.m.a.martin at dundee.ac.uk (David Martin) Date: Tue, 01 Dec 2009 16:16:21 +0000 Subject: [EMBOSS] Trimming illumina short reads based on quality In-Reply-To: <4B153A4C.6000904@umdnj.edu> References: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> <4B153A4C.6000904@umdnj.edu> Message-ID: <4B154155.6F09.00E0.1@dundee.ac.uk> >>> On 12/1/2009 at 3:46 PM, in message <4B153A4C.6000904 at umdnj.edu>, Ryan Golhar wrote: I think virtually every man and his dog who has done anything with Illumina reads has a variety of perl scripts that do this. It depends how you want to do the trimming. Do you want to clip to a specific length, clip on quality (absolute or average over a window) and do you have a minimum length requirement? Do you want to clip 3' and 5' ends or just one? ..d Michael, Doesn't Illumina provide tools to do this? I know with ABI Solid data, they have a perl script capable of trimming data based on quality scores. Ryan michael watson (IAH-C) wrote: > Hi > > I'm sorry if I've not been keeping up to date on what is doubtless a hot topic. > > Does EMBOSS allow one to trim short reads based on quality data (from a fastq file)? > > If not, I have read that it is planned - any idea when it will be implemented? > > Otherwise, alternative suggestions are welcome! > David Martin PhD College of Life Sciences University of Dundee The University of Dundee is a Scottish Registered Charity, No. SC015096. The University of Dundee is a registered Scottish charity, No: SC015096 From matthias.dodt at mdc-berlin.de Tue Dec 1 11:17:48 2009 From: matthias.dodt at mdc-berlin.de (Matthias Dodt) Date: Tue, 01 Dec 2009 17:17:48 +0100 Subject: [EMBOSS] using sixpack Message-ID: <4B1541AC.8000301@mdc-berlin.de> Hi there! 
I have some problems using sixpack for 6-frame translation. I want to convert a fasta file of contigs with sixpack. The command is: sixpack contigs.fa -outseq protein_sequence The problem is that sixpack only converts the first sequence in the fasta file. How can I force it to process the whole file? thanks! greetings mat From ztu at msi.umn.edu Tue Dec 1 13:38:54 2009 From: ztu at msi.umn.edu (Zheng Jin Tu) Date: Tue, 1 Dec 2009 12:38:54 -0600 (CST) Subject: [EMBOSS] Trimming illumina short reads based on quality In-Reply-To: <4B154155.6F09.00E0.1@dundee.ac.uk> References: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> <4B153A4C.6000904@umdnj.edu> <4B154155.6F09.00E0.1@dundee.ac.uk> Message-ID: You need to find a bioinformatician to do the coding. I am not sure how to set the correct filter parameters, but here is some experience to share: Basically one checks both the sequence string and the quality string from the xxx_qseq.txt file, matching each nucleotide to its quality character. For Phred-scaled scores (as with 454), qual 20 is 99% and qual 30 is 99.9% confident, if I am correct. In GA, the qual score is character coded and can be read out with the ord function in perl: qualscore = ord( quality_char ) - 64; I am not sure of the appropriate cut-off value. B is quality score 2, so at least we remove these BBBBB runs. One may set a min length filter to get rid of reads shorter than, say, 10 nucleotides after trimming for low quality score. Set another max_score or avg_score filter, like 5, and it can filter out the third and fourth sequences in the lines below. R0174436 1 8 119 0 1418 0 1 .GATCTTCTCCTTCACCTCCTCCAGGTCCTTGGTCAGCTCAGCACGCAGAG Bb^`bb____bbaaVI_Zbbaba`X_bb`aUbbb`W\\a^\bbT[_Xb]__ 0 R0174436 1 8 119 0 991 0 1 .GCCAATCTGTACTTGTCTTCTTCAGTTCCCACTTTGAATACCGCACAGTC BaGT]]bb[]`_]abaIaaaVbb^``abbaM`Ubbb`babaQT]XS_[a[B 0 R0174436 1 8 119 1791 1559 0 1 A.................................................. BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0 R0174436 1 8 119 1791 1997 0 1 A.................................................. 
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0 Some people also look for trimming down poly T in RNA-seq case. But not sure how many TTTT should be out. Or also do AAAA case for reverse case? Finally better output to fastq file format. @R0174436:6:83:0:1815#0/1 .GTCAATGCGTTCCACCCCCTCTGGGTAGCCTCCAACATCATGTACGTCGA +R0174436:6:83:0:1815#0/1 Ba`babbb]bb\Xb]b___V^^aaaa___Z_\_[aaa]babb`X`^b\bbb @R0174436:6:83:0:506#0/1 .GCAGGAGAAGCATTTTATCTTTGTATTTTCTTCACTGGCAACAACAATGT +R0174436:6:83:0:506#0/1 BaOW_\I]__a``a_\_J_HU_V_J\a`aa^bab^^]]]Y]^`[`[T]]\^ Good luck. TU =================================================== On Tue, 1 Dec 2009, David Martin wrote: > >>> On 12/1/2009 at 3:46 PM, in message <4B153A4C.6000904 at umdnj.edu>, Ryan Golhar wrote: > I think virtually every man and his dog who has done anything with Illumina reads has a variety of perl scripts that do this. It depends how you want to do the trimming. Do you want to clip to a specific length, clip on quality (absolute or average over a window) and do you have a minimum length requirement? > > Do you want to clip 3' and 5' ends or just one? > > ..d > > > Michael, > > Doesn't Illumina provide tools to do this? I know with ABI Solid data, > they have a perl script capable of trimming data based on quality scores. > > Ryan > > > michael watson (IAH-C) wrote: > > Hi > > > > I'm sorry if I've not been keeping up to date on what is doubtless a hot topic. > > > > Does EMBOSS allow one to trim short reads based on quality data (from a fastq file)? > > > > If not, I have read that it is planned - any idea when it will be implemented? > > > > Otherwise, alternative suggestions are welcome! > > > > > > David Martin PhD > College of Life Sciences > University of Dundee > The University of Dundee is a Scottish Registered Charity, No. SC015096. 
> > > The University of Dundee is a registered Scottish charity, No: SC015096 > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > From biopython at maubp.freeserve.co.uk Tue Dec 1 14:45:58 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 1 Dec 2009 19:45:58 +0000 Subject: [EMBOSS] Trimming illumina short reads based on quality In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <320fb6e00912011145i2111042ap222cd8fdaef0364b@mail.gmail.com> On Tue, Dec 1, 2009 at 2:33 PM, michael watson (IAH-C) wrote: > > Hi > > I'm sorry if I've not been keeping up to date on what is doubtless a hot topic. > > Does EMBOSS allow one to trim short reads based on quality data (from a fastq file)? > > If not, I have read that it is planned - any idea when it will be implemented? Not yet, but it has been proposed and I understand it is on the EMBOSS to-do list along with quality filtering (Peter Rice has suggested the name quaffle for this): http://lists.open-bio.org/pipermail/bioperl-l/2009-July/030493.html I dare say suggestions for precise trimming algorithms (e.g. median over sliding window) might be welcome. > Otherwise, alternative suggestions are welcome! I'm sure there are plenty of scripts out there, in Perl, Python etc. What is your language of choice? Peter C. 
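For anyone reading this thread later, here is a minimal Python sketch of the kind of 3' quality trimming discussed above. It is only an illustration under stated assumptions: qualities are taken to be Phred+64 encoded (score = ord(char) - 64, as in the qseq example earlier in the thread), the window uses a mean rather than the median suggested above, and the function names and default thresholds (min_q=20, window=5, min_len=10) are made up for this example. Nothing here is part of EMBOSS, Biopython or the FASTX-Toolkit.

```python
def phred64(qual):
    """Decode a Phred+64 quality string into a list of integer scores.

    Assumption: Illumina GA pipeline 1.3+ style encoding, score = ord(char) - 64.
    """
    return [ord(c) - 64 for c in qual]


def trim_3prime(seq, qual, min_q=20, window=5, min_len=10):
    """Trim bases from the 3' end until the trailing window is good enough.

    The read is shortened one base at a time until the mean score of the
    last `window` bases is at least `min_q`. Returns the trimmed
    (seq, qual) pair, or None if fewer than `min_len` bases survive.
    All thresholds are illustrative defaults, not recommended values.
    """
    scores = phred64(qual)
    end = len(scores)
    while end > 0:
        # Mean quality over the last `window` bases of the current read
        win = scores[max(0, end - window):end]
        if sum(win) / len(win) >= min_q:
            break
        end -= 1
    if end < min_len:
        return None
    return seq[:end], qual[:end]
```

Because the cut is decided on a window mean, a few low-quality bases can survive at the new 3' end when the bases before them score highly; a stricter per-base cut-off, or a median as suggested above, would behave differently. A read made up entirely of 'B' qualities is rejected by the minimum-length filter.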
From zen.lu at roslin.ed.ac.uk Tue Dec 1 15:15:46 2009 From: zen.lu at roslin.ed.ac.uk (zen lu (RI)) Date: Tue, 1 Dec 2009 20:15:46 +0000 Subject: [EMBOSS] Trimming illumina short reads based on quality In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <050C9A545DC1D84BAC7A678B76A56C3C251E351D24@ebrcexch1.ebrc.bbsrc.ac.uk> The fastx-toolkit may be what you are looking for: http://hannonlab.cshl.edu/fastx_toolkit/ ________________________________________ From: emboss-bounces at lists.open-bio.org [emboss-bounces at lists.open-bio.org] On Behalf Of michael watson (IAH-C) [michael.watson at bbsrc.ac.uk] Sent: 01 December 2009 14:33 To: emboss at lists.open-bio.org Subject: [EMBOSS] Trimming illumina short reads based on quality Hi I'm sorry if I've not been keeping up to date on what is doubtless a hot topic. Does EMBOSS allow one to trim short reads based on quality data (from a fastq file)? If not, I have read that it is planned - any idea when it will be implemented? Otherwise, alternative suggestions are welcome! Thanks Mick _______________________________________________ EMBOSS mailing list EMBOSS at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/emboss From pmr at ebi.ac.uk Wed Dec 2 04:43:42 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 02 Dec 2009 09:43:42 +0000 Subject: [EMBOSS] using sixpack In-Reply-To: <4B1541AC.8000301@mdc-berlin.de> References: <4B1541AC.8000301@mdc-berlin.de> Message-ID: <4B1636CE.6050403@ebi.ac.uk> On 12/01/2009 04:17 PM, Matthias Dodt wrote: > Hi there! > > I have some problems using sixpack for 6-frame translation. I want to > convert a fasta file of contigs with sixpack. The command is: > > sixpack contigs.fa -outseq protein_sequence > > The problem is that sixpack only converts the first sequence in the > fasta file. How can i force it to process the whole file?? 
Two options: One is to change the EMBOSS code to loop over each sequence. The other is to write a script that extracts each sequence in turn and launches sixpack. We can consider this for the next EMBOSS release. It applies to other applications too. In general, would users (and developers of web and other interfaces) be happy if more applications could read every sequence in a fasta file? This raises questions of how to mark up the output so that it is clear where each result comes from. There will always be applications where it is more sensible to process only a single sequence. A third option (there is so often another way): getorf will find and report open reading frames in all input sequences getorf contigs.fa -outseq protein_sequence There will be differences in the output - getorf limits ORFs to a minimum of 30 nucleotides. You get the same effect in sixpack with -orfmin 10 (oops, sixpack counts amino acids - we will try to make them consistent in the next release!) You can also add -minsize 3 to the getorf command line to report all ORFs like sixpack does. Hope this helps, Peter Rice From matthias.dodt at mdc-berlin.de Thu Dec 3 04:39:43 2009 From: matthias.dodt at mdc-berlin.de (Matthias Dodt) Date: Thu, 03 Dec 2009 10:39:43 +0100 Subject: [EMBOSS] using sixpack In-Reply-To: <4B1636CE.6050403@ebi.ac.uk> References: <4B1541AC.8000301@mdc-berlin.de> <4B1636CE.6050403@ebi.ac.uk> Message-ID: <4B17875F.80803@mdc-berlin.de> Hello Peter! Thank you very much, getorf is sufficient for me- greetings mat Peter Rice wrote: > On 12/01/2009 04:17 PM, Matthias Dodt wrote: >> Hi there! >> >> I have some problems using sixpack for 6-frame translation. I want to >> convert a fasta file of contigs with sixpack. The command is: >> >> sixpack contigs.fa -outseq protein_sequence >> >> The problem is that sixpack only converts the first sequence in the >> fasta file. How can i force it to process the whole file?? 
> > Two options: > > One is to change the EMBOSS code to loop over each sequence. > > The other is to write a script that extracts each sequence in turn and > launches sixpack. > > We can consider this for the next EMBOSS release. It applies to other > applications too. In general, would users (and developers of web and > other interfaces) be happy if more applications could read every > sequence in a fasta file? > > This raises questions of how to mark up the output so that it is clear > where each results comes from. There will always be applications where > it is more sensible to proces sonly a single sequence. > > A third option (there is so often another way): > > getorf will find and report open reading frames in all input sequences > > getorf contigs.fa -outseq protein_sequence > > There will be differences in the output - getorf limits ORFs to 30 > nucleotides. You get the same effect in sixpack with -orfmin 10 (oops, > sixpack counts amino acids - we will try to make them consistent in the > next release!) > > You can also add -minsize 3 to the getorf command line to report all > ORFs like sixpack does. 
> > Hope this helps, > > Peter Rice -- ------------------------------------------------ Matthias Dodt Scientific Programmer at Bioinformatics platform AG Dieterich Berlin Institute for Medical Systems Biology at the Max-Delbrueck-Center for Molecular Medicine Robert-Roessle-Strasse 10, 13125 Berlin, Germany fon: +49 30 9406 4261 email: matthias.dodt at mdc-berlin.de From biopython at maubp.freeserve.co.uk Mon Dec 7 14:36:30 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 7 Dec 2009 19:36:30 +0000 Subject: [EMBOSS] Unknown output format 'refseqp' and 'genpept' In-Reply-To: References: Message-ID: <320fb6e00912071136i26de9cadx9da04e1527999345@mail.gmail.com> Hi, I have a protein IntelliGenetics file used in the Biopython test suite: http://biopython.org/SRC/biopython/Tests/IntelliGenetics/VIF_mase-pro.txt I am using EMBOSS 6.1.0 (patch level 2 I think), and I am trying to turn this into a "GenBank Protein File", or GenPept file, using EMBOSS seqret. EMBOSS can read the file fine, this works: $ seqret -auto -sformat=ig -osformat=fasta VIF_mase-pro.txt temp.txt Giving FASTA output with 16 gapped protein sequences, which is good - although the ID of the first record is a bit odd. Using "genbank" as the output format in EMBOSS seems to mean nucleotide and not protein: $ seqret -auto -sformat=ig -osformat=genbank VIF_mase-pro.txt temp.txt Error: Sequence format 'genbank' not supported for protein sequences Error: Sequence format 'genbank' not supported for protein sequences ... Error: Sequence format 'genbank' not supported for protein sequences Referring to the documentation, http://emboss.sourceforge.net/docs/themes/SequenceFormats.html I then tried "genpept" and "refseqp": $ seqret -auto -sformat=ig -osformat=genpept VIF_mase-pro.txt temp.txt Error: Unknown output format 'genpept' Error: Unknown output format 'genpept' ... 
Error: unknown output format 'genpept' $ seqret -auto -sformat=ig -osformat=refseqp VIF_mase-pro.txt temp.txt Error: Unknown output format 'refseqp' Error: Unknown output format 'refseqp' ... Error: unknown output format 'refseqp' Doesn't EMBOSS seqret support genpept/refseqp as an output format? Thanks, Peter C. From pmr at ebi.ac.uk Tue Dec 8 08:32:41 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 08 Dec 2009 13:32:41 +0000 Subject: [EMBOSS] Unknown output format 'refseqp' and 'genpept' In-Reply-To: <320fb6e00912071136i26de9cadx9da04e1527999345@mail.gmail.com> References: <320fb6e00912071136i26de9cadx9da04e1527999345@mail.gmail.com> Message-ID: <4B1E5579.2070407@ebi.ac.uk> Peter wrote: > Hi, > > I have a protein IntelliGenetics file used in the Biopython test suite: > http://biopython.org/SRC/biopython/Tests/IntelliGenetics/VIF_mase-pro.txt > > I am using EMBOSS 6.1.0 (patch level 2 I think), and I am trying > to turn this into a "GenBank Protein File", or GenPept file, using > EMBOSS seqret. > > Doesn't EMBOSS seqret support genpept/refseqp as an output format? Oddly enough you are the first to ask for it. Does biopython have a definition of the fields it expects to write out in a GenPept or RefseqP format file? We would be able to allow GenBank as an alias for, presumably, genpept. Might be a good time to merge the format names and details from biopython and emboss. Where can I find the biopython ones? 
regards, Peter From biopython at maubp.freeserve.co.uk Tue Dec 8 08:53:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Dec 2009 13:53:13 +0000 Subject: [EMBOSS] Unknown output format 'refseqp' and 'genpept' In-Reply-To: <4B1E5579.2070407@ebi.ac.uk> References: <320fb6e00912071136i26de9cadx9da04e1527999345@mail.gmail.com> <4B1E5579.2070407@ebi.ac.uk> Message-ID: <320fb6e00912080553w490f66vffab00edbe192069@mail.gmail.com> On Tue, Dec 8, 2009 at 1:32 PM, Peter Rice wrote: > > Peter wrote: >> >> Hi, >> >> I have a protein IntelliGenetics file used in the Biopython test suite: >> http://biopython.org/SRC/biopython/Tests/IntelliGenetics/VIF_mase-pro.txt It probably doesn't matter what the input file is here, the fact that it was an (obsolete) format like IntelliGenetics was just chance as I was working on a Biopython unit test. >> I am using EMBOSS 6.1.0 (patch level 2 I think), and I am trying >> to turn this into a "GenBank Protein File", or GenPept file, using >> EMBOSS seqret. >> >> Doesn't EMBOSS seqret support genpept/refseqp as an output format? > > Oddly enough you are the first to ask for it. That surprises me a little bit. Could I suggest you treat known input formats which are not supported as output formats a little differently and instead of this: unknown output format 'genpept' Perhaps give, format 'genpept' is not supported for output (only input) This would help the user rule out having a typo etc. > Does biopython have a definition of the fields it expects to write out in a > GenPept or RefseqP format file? We would be able to allow GenBank as an > alias for, presumably, genpept. Not explicitly, no. I was hoping to use EMBOSS for cross validation ;) With hindsight this may have been a mistake, but we use "genbank" format to mean either nucleotides or proteins. On parsing we just look at the units of length in the LOCUS line (bp or aa). 
We also try to cope with both the current NCBI files and some older variants we have in our unit tests (different offsets in the LOCUS line). > Might be a good time to merge the format names and details from biopython > and emboss. Where can Ifine the biopython ones? There are two tables on the wiki which include version information: http://biopython.org/wiki/SeqIO http://biopython.org/wiki/AlignIO You can also consult the built in documentation, also available online: http://biopython.org/DIST/docs/api/Bio.SeqIO-module.html http://biopython.org/DIST/docs/api/Bio.AlignIO-module.html For a long time I avoided having aliases (multiple names for the same thing). However, we now treat "gb" as an alias for "genbank" (since this is what the NCBI use in Entrez). We also treat "fastq-sanger" and "fastq" the same. Peter C (the one at Biopython) From pmr at ebi.ac.uk Tue Dec 8 09:11:59 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 08 Dec 2009 14:11:59 +0000 Subject: [EMBOSS] Unknown output format 'refseqp' and 'genpept' In-Reply-To: <320fb6e00912080553w490f66vffab00edbe192069@mail.gmail.com> References: <320fb6e00912071136i26de9cadx9da04e1527999345@mail.gmail.com> <4B1E5579.2070407@ebi.ac.uk> <320fb6e00912080553w490f66vffab00edbe192069@mail.gmail.com> Message-ID: <4B1E5EAF.80301@ebi.ac.uk> Peter C. wrote: > Could I suggest you treat known input formats which are not supported > as output formats a little differently and instead of this: > > unknown output format 'genpept' > > Perhaps give, > > format 'genpept' is not supported for output (only input) > > This would help the user rule out having a typo etc. A useful suggestion. We can apply that to feature formats too. I'll see what I can do. It may be worth a tidy-up of what we do with formats that are only valid for nucleotide or protein (though that is a little tricky as we currently try to let some fail over to an equivalent format). 
>> Does biopython have a definition of the fields it expects to write out in a >> GenPept or RefseqP format file? We would be able to allow GenBank as an >> alias for, presumably, genpept. > > Not explicitly, no. I was hoping to use EMBOSS for cross validation ;) No problem. We'll go first then and try to define standard formats. > With hindsight this may have been a mistake, but we use "genbank" > format to mean either nucleotides of proteins. On parsing we just > look at the units of length in the LOCUS line (bp or aa). We also > try to cope with both the current NCBI files and some older variants > we have in our unit tests (different offsets in the LOCUS line). We try that too on input, but for output we have to be explicit so the user can pick just one of the choices. regards, Peter R. From biopython at maubp.freeserve.co.uk Tue Dec 8 09:29:25 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Dec 2009 14:29:25 +0000 Subject: [EMBOSS] Unknown output format 'refseqp' and 'genpept' In-Reply-To: <4B1E5EAF.80301@ebi.ac.uk> References: <320fb6e00912071136i26de9cadx9da04e1527999345@mail.gmail.com> <4B1E5579.2070407@ebi.ac.uk> <320fb6e00912080553w490f66vffab00edbe192069@mail.gmail.com> <4B1E5EAF.80301@ebi.ac.uk> Message-ID: <320fb6e00912080629h969fc27m46dc0165ff1832d0@mail.gmail.com> On Tue, Dec 8, 2009 at 2:11 PM, Peter Rice wrote: > >> With hindsight this may have been a mistake, but we use "genbank" >> format to mean either nucleotides of proteins. On parsing we just >> look at the units of length in the LOCUS line (bp or aa). We also >> try to cope with both the current NCBI files and some older variants >> we have in our unit tests (different offsets in the LOCUS line). > > We try that too on input, but for output we have to be explicit so the user > can pick just one of the choices. 
I imagine that as with Biopython, sometimes the user has made it explicit that they are dealing with nucleotides or proteins (lots of the EMBOSS tools have switches for this), so you know if you should be using "aa" or "bp" in the LOCUS line. Peter From kellert at ohsu.edu Tue Dec 8 15:13:36 2009 From: kellert at ohsu.edu (Tom Keller) Date: Tue, 8 Dec 2009 12:13:36 -0800 Subject: [EMBOSS] newbie emma (clustalw) question Message-ID: This is an ongoing frustration. I use EMBOSS only occasionally, and I can never find a good guide for setting environmental variables. For example, I wanted to use emma with a new install of clustalw2. I made a link to it in /usr/local/bin, but emma can't find it. How do I fix this in an easy-to-maintain manner? thanks, Tom Thomas (Tom) Keller kellert at ohsu.edu 503.494.2442 6339b R Jones Hall (BSc/CROET) www.ohsu.edu/xd/research/research-cores/dna-analysis/ From kellert at ohsu.edu Tue Dec 8 16:08:54 2009 From: kellert at ohsu.edu (Tom Keller) Date: Tue, 8 Dec 2009 13:08:54 -0800 Subject: [EMBOSS] newbie emma (clustalw) question In-Reply-To: References: Message-ID: <3A1B51FF-D28D-43DE-A555-E6B6D3163352@ohsu.edu> I finally found the answer. It's simple, but if you don't know it, you're only guessing, and that is really not a good approach. The environmental variable to set is EMBOSS_CLUSTALW in whatever dot shell control file you use for setting env. For the bash shell, I edited .bashrc with: export EMBOSS_CLUSTALW=/usr/local/bin/clustalw How would one go about requesting that this sort of info be added as a comment to the application tfm page? So instead of reading COMMENT none it would read: Set env variable EMBOSS_CLUSTALW= thanks, Tom On Dec 8, 2009, at 12:13 PM, Tom Keller wrote: > This is an ongoing frustration. I use EMBOSS only occasionally, and I can never find a good guide for setting environmental variables. For example, I wanted to use emma with a new install of clustalw2. 
I made a link to it in /usr/local/bin, but emma can't find it. How do I fix this in a easy to maintain manner? > > thanks, > Tom > > > Thomas (Tom) Keller > kellert at ohsu.edu > 503.494.2442 > 6339b R Jones Hall (BSc/CROET) > www.ohsu.edu/xd/research/research-cores/dna-analysis/ > > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss Thomas (Tom) Keller kellert at ohsu.edu 503.494.2442 6339b R Jones Hall (BSc/CROET) www.ohsu.edu/xd/research/research-cores/dna-analysis/ From pmr at ebi.ac.uk Wed Dec 9 04:21:43 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 09 Dec 2009 09:21:43 +0000 Subject: [EMBOSS] newbie emma (clustalw) question In-Reply-To: <3A1B51FF-D28D-43DE-A555-E6B6D3163352@ohsu.edu> References: <3A1B51FF-D28D-43DE-A555-E6B6D3163352@ohsu.edu> Message-ID: <4B1F6C27.6050502@ebi.ac.uk> On 08/12/09 21:08, Tom Keller wrote: > On Dec 8, 2009, at 12:13 PM, Tom Keller wrote: >> This is an ongoing frustration. I use EMBOSS only occasionally, and >> I can never find a good guide for setting environmental variables. >> For example, I wanted to use emma with a new install of clustalw2. >> I made a link to it in /usr/local/bin, but emma can't find it. How >> do I fix this in a easy to maintain manner? You got there first. The simple answer is that emma will look for clustalw (not clustalw2) in your path. This should include a link. If you can run clustalw from the command line then it should also run from within emma. > I finally found the answer. It's simple, but if you don't know it, > your only guessing, and that is really not a good approach. > > The environmental variable to set is EMBOSS_CLUSTALW in whatever dot > shell control file you use for setting env. 
> for the bash shell, I edited .bashrc with: > export EMBOSS_CLUSTALW=/usr/local/bin/clustalw This was a feature we added so that clustalw could be launched from a local installation, or to run an executable called clustalw2 (or clustalw183) > How would one go about requesting that this sort of info be added as > a comment to the application tfm page? So instead of reading > COMMENT > none > > it read: Set env variable EMBOSS_CLUSTALW= You just did :-) We will go through the external programs and add more information. There is a paragraph in the emma documentation (diagnostic error messages) but it is not easy to find (and probably has not been read by many users - I just found a typo in it). We plan for a future release to call all external applications in a common way, and to issue a standard error message describing the path and environment variable options. We will also add a reference to external programs in the ACD files for interface developers to identify dependencies on other packages. We will also look into adding messages to the EMBOSS configure script to warn about required third party packages. Thanks for the suggestions regards, Peter Rice From pmr at ebi.ac.uk Thu Dec 10 08:36:14 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 10 Dec 2009 13:36:14 +0000 Subject: [EMBOSS] EMBOSS 6.1.0 patch 1.3 Message-ID: <4B20F94E.4000506@ebi.ac.uk> A patch for EMBOSS 6.1.0 is on the FTP server. This fixes problems with extractfeat, using format names with dashes (fastq-sanger) in USAs, scaling issues in plot outputs, and some minor bugs. The files are on our FTP server ftp://emboss.open-bio.org/pub/EMBOSS/fixes with a patch file and instructions in the patches subdirectory. Fix 3. 
EMBOSS-6.1.0/ajax/ajfeat.c EMBOSS-6.1.0/ajax/ajfeat.h EMBOSS-6.1.0/ajax/ajgraph.c EMBOSS-6.1.0/ajax/ajmath.c EMBOSS-6.1.0/ajax/ajseq.c EMBOSS-6.1.0/ajax/ajseqread.c EMBOSS-6.1.0/ajax/ajseqwrite.c EMBOSS-6.1.0/nucleus/embmisc.c EMBOSS-6.1.0/nucleus/embmisc.h EMBOSS-6.1.0/nucleus/embpat.c EMBOSS-6.1.0/emboss/coderet.c EMBOSS-6.1.0/emboss/extractfeat.c EMBOSS-6.1.0/emboss/notseq.c EMBOSS-6.1.0/emboss/prettyplot.c EMBOSS-6.1.0/emboss/seqmatchall.c EMBOSS-6.1.0/emboss/showfeat.c EMBOSS-6.1.0/emboss/showpep.c EMBOSS-6.1.0/emboss/showseq.c EMBOSS-6.1.0/emboss/twofeat.c EMBOSS-6.1.0/jemboss/utils/install-jemboss-server.sh EMBOSS-6.1.0/jemboss/org/emboss/jemboss/server/AppendToLogFileThread.java EMBOSS-6.1.0/jemboss/org/emboss/jemboss/server/JembossAuthServer.java 02-Dec-2009: Fixes problems with extractfeat. The fix includes cleaner definitions of functions used to match feature tags and feature types which result in minor updates to 6 other applications. Extractfeat in previous versions used its own text parser to extract feature data from only a limited set of formats. In release 6.1.0 it was replaced by the standard EMBOSS feature table. With no options set, extractfeat rejected all features (type '*' was needed to extract features). Extractfeat default settings now extract all features from an entry. Features on the reverse strand were incorrectly processed (an effect caused by some of the old extractfeat code remaining). Reverse strand features are now correctly parsed, including both "join(complement())" and "complement(join())" syntax in EMBL/GenBank/DDBJ feature tables. Fixes an issue in GenBank parsing where the ORIGIN line is absent. Fixes scaling errors in prettyplot, especially in mEMBOSS when plotting to a window on screen (the default output). The plplot library does not report the true width and height for several devices. The assumptions in prettyplot depend on reasonable size estimates. Release 6.2.0 will have further corrections to plplot device scaling. 
Fixes the counting of non-coding features in coderet. Fixes a seqmatchall error for short sequences with perfect matches. When reverse-complementing sequences, also reverses the quality scores. Allows '-' in format names in the USA syntax, to allow fastq-sanger fastq-illumina and fastq-solexa format names to be used. When reading protein sequences, a sequence with only a stop is now recognized as empty (zero length) after processing ambiguity codes and stops. Fixes a problem writing features in PIR format when the feature table is empty, for example a report file with no hits. Fixes a dependency on 'ant' to install a Jemboss server. Fixes a problem in logging Jemboss info/error messages. regards, Peter Rice From charles-listes-emboss at plessy.org Mon Dec 14 06:59:18 2009 From: charles-listes-emboss at plessy.org (Charles Plessy) Date: Mon, 14 Dec 2009 20:59:18 +0900 Subject: [EMBOSS] EMBOSS 6.1.0 patch 1.3 In-Reply-To: <4B20F94E.4000506@ebi.ac.uk> References: <4B20F94E.4000506@ebi.ac.uk> Message-ID: <20091214115918.GB22410@kunpuu.plessy.org> On Thu, Dec 10, 2009 at 01:36:14PM +0000, Peter Rice wrote: > A patch for EMBOSS 6.1.0 is on the FTP server. This fixes problems with > extractfeat, using format names with dashes (fastq-sanger) in USAs, > scaling issues in plot outputs, and some minor bugs. > > The files are on our FTP server ftp://emboss.open-bio.org/pub/EMBOSS/fixes > with a patch file and instructions in the patches subdirectory. > > Fix 3. 
EMBOSS-6.1.0/ajax/ajfeat.c > EMBOSS-6.1.0/ajax/ajfeat.h > EMBOSS-6.1.0/ajax/ajgraph.c > EMBOSS-6.1.0/ajax/ajmath.c > EMBOSS-6.1.0/ajax/ajseq.c > EMBOSS-6.1.0/ajax/ajseqread.c > EMBOSS-6.1.0/ajax/ajseqwrite.c > EMBOSS-6.1.0/nucleus/embmisc.c > EMBOSS-6.1.0/nucleus/embmisc.h > EMBOSS-6.1.0/nucleus/embpat.c > EMBOSS-6.1.0/emboss/coderet.c > EMBOSS-6.1.0/emboss/extractfeat.c > EMBOSS-6.1.0/emboss/notseq.c > EMBOSS-6.1.0/emboss/prettyplot.c > EMBOSS-6.1.0/emboss/seqmatchall.c > EMBOSS-6.1.0/emboss/showfeat.c > EMBOSS-6.1.0/emboss/showpep.c > EMBOSS-6.1.0/emboss/showseq.c > EMBOSS-6.1.0/emboss/twofeat.c > EMBOSS-6.1.0/jemboss/utils/install-jemboss-server.sh Dear Peter and all EMBOSS developers, There were two issues that were discussed previously on the EMBOSS user list or on the sourceforge tracker, that I do not see fixed in this patch. The EMBOSS package distributed by Debian contains a patch for each of them. 1) To prevent vectorstrip to discard FASTQ qualities. http://git.debian.org/?p=debian-med/emboss.git;a=blob;f=debian/patches/transient-vectorstrip.patch;h=c4d8fd1e6c6d223ec8664b1aa08171407f9ace55;hb=HEAD https://sourceforge.net/tracker/index.php?func=detail&aid=2886368&group_id=93650&atid=605034 2) To help tfm to find the HTML documentation. http://git.debian.org/?p=debian-med/emboss.git;a=blob;f=debian/patches/tfm-html.patch;h=c0595eb5008f53fec27e0ba4ff144c6385c6acdd;hb=HEAD https://sourceforge.net/tracker/?func=detail&aid=2877960&group_id=93650&atid=605031 I would welcome in particular comments on the second patch. If it is not suitable, then I will remove it from the Debian package. 
Best regards, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From pmr at ebi.ac.uk Mon Dec 14 08:44:10 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 14 Dec 2009 13:44:10 +0000 Subject: [EMBOSS] EMBOSS 6.1.0 patch 1.3 In-Reply-To: <20091214115918.GB22410@kunpuu.plessy.org> References: <4B20F94E.4000506@ebi.ac.uk> <20091214115918.GB22410@kunpuu.plessy.org> Message-ID: <4B26412A.5020906@ebi.ac.uk> On 12/14/2009 11:59 AM, Charles Plessy wrote: > There were two issues that were discussed previously on the EMBOSS user list or > on the sourceforge tracker, that I do not see fixed in this patch. The EMBOSS > package distributed by Debian contains a patch for each of them. > > 1) To prevent vectorstrip to discard FASTQ qualities. > http://git.debian.org/?p=debian-med/emboss.git;a=blob;f=debian/patches/transient-vectorstrip.patch;h=c4d8fd1e6c6d223ec8664b1aa08171407f9ace55;hb=HEAD > https://sourceforge.net/tracker/index.php?func=detail&aid=2886368&group_id=93650&atid=605034 Sorry, we missed that update in going through the changes in the ajseq.c file. We will add it to the next patch. > 2) To help tfm to find the HTML documentation. > http://git.debian.org/?p=debian-med/emboss.git;a=blob;f=debian/patches/tfm-html.patch;h=c0595eb5008f53fec27e0ba4ff144c6385c6acdd;hb=HEAD > https://sourceforge.net/tracker/?func=detail&aid=2877960&group_id=93650&atid=605031 This one is wrong. If the documentation is not found in the install directory (which it should be with a standard 'make install'), tfm looks in ajNamValueBaseDir, which is the top-level directory. The Debian patch uses ajNamValueRootDir, which is the emboss/ subdirectory used when finding uninstalled ACD files. embossversion -full shows both these paths.
Hope this helps, Peter Rice From stephen.taylor at imm.ox.ac.uk Tue Dec 15 07:14:07 2009 From: stephen.taylor at imm.ox.ac.uk (Steve Taylor) Date: Tue, 15 Dec 2009 12:14:07 +0000 Subject: [EMBOSS] Genpept entry in MSE Message-ID: <4B277D8F.8010205@imm.ox.ac.uk> Hi, I am trying to load a Genpept entry into MSE, EMBOSS Version 6.0.1 on Fedora. Unfortunately it doesn't like the LOCUS line. It loads, but warns: Warning: bad Genbank LOCUS line 'LOCUS ACN78416 225 aa linear BCT 21-MAR-2009' Changing the aa to bp fixes it. Thanks, Steve LOCUS ACN78416 225 aa linear BCT 21-MAR-2009 DEFINITION galactosyltransferase A [Pasteurella multocida]. ACCESSION ACN78416 VERSION ACN78416.1 GI:224999306 DBSOURCE accession FJ755839.1 KEYWORDS . SOURCE Pasteurella multocida ORGANISM Pasteurella multocida Bacteria; Proteobacteria; Gammaproteobacteria; Pasteurellales; Pasteurellaceae; Pasteurella. REFERENCE 1 (residues 1 to 225) AUTHORS Boyce,J.D., Harper,M., St Michael,F., John,M., Aubry,A., Parnas,H., Logan,S.M., Wilkie,I.W., Ford,M., Cox,A.D. and Adler,B. TITLE Identification of novel glycosyltransferases required for assembly of the Pasteurella multocida A:1 lipopolysaccharide and their involvement in virulence JOURNAL Infect. Immun. 77 (4), 1532-1542 (2009) PUBMED 19168738 REFERENCE 2 (residues 1 to 225) AUTHORS Harper,M., John,M., Adler,B. and Boyce,J. TITLE Direct Submission JOURNAL Submitted (17-FEB-2009) Microbiology, Monash University, Wellington road, Clayton, Melbourne, Victoria 3800, Australia COMMENT Method: conceptual translation supplied by author. 
FEATURES Location/Qualifiers source 1..225 /organism="Pasteurella multocida" /strain="VP161" /serotype="1" /db_xref="taxon:747" /note="serogroup: A" Protein 1..225 /product="galactosyltransferase A" /function="adds beta-D-galactose to both the 4 and 6 position of alpha-L,D-heptose IV" /name="putative bi-functional galactosyltransferase; GatA" Region 5..188 /region_name="Glyco_transf_25" /note="Glycosyltransferase family 25 [lipooligosaccharide (LOS) biosynthesis protein] is a family of glycosyltransferases involved in LOS biosynthesis. The members include the beta(1,4) galactosyltransferases: Lgt2 of Moraxella catarrhalis, LgtB and LgtE of...; cd06532" /db_xref="CDD:133474" CDS 1..225 /gene="gatA" /coded_by="FJ755839.1:1..678" /transl_table=11 ORIGIN 1 mklpkiivis lknsprrqii shrlsglgld feffdavygk dltkeeleki dyeffpkycg 61 skgaltlgei gcamshikiy ehivannleq viileddaiv slyfeeivla alqklpnrre 121 ilfldhgkak vypfmrnlpe ryrlaryrkp skhskrfivr ttaylitleg akkllkhayp 181 irmpsdfltg llqlthinay giepscvfgg veseinemer raglk // From biopython at maubp.freeserve.co.uk Tue Dec 15 08:13:54 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 13:13:54 +0000 Subject: [EMBOSS] Genpept entry in MSE In-Reply-To: <4B277D8F.8010205@imm.ox.ac.uk> References: <4B277D8F.8010205@imm.ox.ac.uk> Message-ID: <320fb6e00912150513g6ca79e70lf86bca737fef5f5f@mail.gmail.com> On Tue, Dec 15, 2009 at 12:14 PM, Steve Taylor wrote: > > Hi, > > I am trying to load a Genpept entry into MSE, EMBOSS Version 6.0.1 on > Fedora. Unfortunately it doesn't like the LOCUS line. > > It loads, but warns: > > Warning: bad Genbank LOCUS line 'LOCUS ACN78416 225 aa > linear BCT 21-MAR-2009' > > Changing the aa to bp fixes it. What command line did you use?
If you specified format "genbank", I think you should use format name "genpept" or "refseqp" instead: http://emboss.sourceforge.net/docs/themes/SequenceFormats.html Peter From stephen.taylor at imm.ox.ac.uk Tue Dec 15 10:26:59 2009 From: stephen.taylor at imm.ox.ac.uk (Steve Taylor) Date: Tue, 15 Dec 2009 15:26:59 +0000 Subject: [EMBOSS] Genpept entry in MSE In-Reply-To: <320fb6e00912150513g6ca79e70lf86bca737fef5f5f@mail.gmail.com> References: <4B277D8F.8010205@imm.ox.ac.uk> <320fb6e00912150513g6ca79e70lf86bca737fef5f5f@mail.gmail.com> Message-ID: <4B27AAC3.9070209@imm.ox.ac.uk> Peter wrote: > On Tue, Dec 15, 2009 at 12:14 PM, Steve Taylor > wrote: >> Hi, >> >> I am trying to load a Genpept entry into MSE, EMBOSS Version 6.0.1 on >> Fedora. Unfortunately it doesn't like the LOCUS line. >> >> It loads, but warns: >> >> Warning: bad Genbank LOCUS line 'LOCUS ACN78416 225 aa >> linear BCT 21-MAR-2009' >> >> Changing the aa to bp fixes it. > > What command line did you use? If you specified format "genbank", > I think you should use format name "genpept" or "refseqp" instead: > http://emboss.sourceforge.net/docs/themes/SequenceFormats.html I didn't specify any format. I assumed it would pick it up... However, I still get the error if I use mse -sformat1 genpept -sequence ACN78417.pep Is this what you mean? Steve From biopython at maubp.freeserve.co.uk Tue Dec 15 10:56:34 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 15:56:34 +0000 Subject: [EMBOSS] Genpept entry in MSE In-Reply-To: <4B27AAC3.9070209@imm.ox.ac.uk> References: <4B277D8F.8010205@imm.ox.ac.uk> <320fb6e00912150513g6ca79e70lf86bca737fef5f5f@mail.gmail.com> <4B27AAC3.9070209@imm.ox.ac.uk> Message-ID: <320fb6e00912150756y2a213b21v1917be1b3133d3b4@mail.gmail.com> On Tue, Dec 15, 2009 at 3:26 PM, Steve Taylor wrote: > > I didn't specify any format. I assumed it would pick it up... Emboss is normally pretty good at deducing file formats, so I would have expected it to cope too. 
> However, I still get the error if I use > > mse -sformat1 genpept -sequence ACN78417.pep > > Is this what you mean? Probably - although I don't think I have ever used mse myself. Hopefully an EMBOSS developer can enlighten us. Peter C. From pmr at ebi.ac.uk Tue Dec 15 11:41:34 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 15 Dec 2009 16:41:34 +0000 Subject: [EMBOSS] Genpept entry in MSE In-Reply-To: <320fb6e00912150756y2a213b21v1917be1b3133d3b4@mail.gmail.com> References: <4B277D8F.8010205@imm.ox.ac.uk> <320fb6e00912150513g6ca79e70lf86bca737fef5f5f@mail.gmail.com> <4B27AAC3.9070209@imm.ox.ac.uk> <320fb6e00912150756y2a213b21v1917be1b3133d3b4@mail.gmail.com> Message-ID: <4B27BC3E.3090601@ebi.ac.uk> On 15/12/09 15:56, Peter wrote: > On Tue, Dec 15, 2009 at 3:26 PM, Steve Taylor > wrote: >> >> I didn't specify any format. I assumed it would pick it up... > > Emboss is normally pretty good at deducing file formats, so I > would have expected it to cope too. > >> However, I still get the error if I use >> >> mse -sformat1 genpept -sequence ACN78417.pep >> >> Is this what you mean? > > Probably - although I don't think I have ever used mse myself. > > Hopefully an EMBOSS developer can enlighten us. Which version of EMBOSS are you running? We are looking into improving support for genpept and the refseq variants in a future release. Genpept was cleaned up in one of the patches for 6.1.0 so this should address your problem. regards, Peter Rice From kellert at ohsu.edu Tue Dec 15 15:16:07 2009 From: kellert at ohsu.edu (Tom Keller) Date: Tue, 15 Dec 2009 12:16:07 -0800 Subject: [EMBOSS] Macports/EMBOSS dyld issue Message-ID: Hi, I'm running Mac OS X 10.6, and have EMBOSS 6.0.1 installed via MacPorts. 
And I have macport installed jpeg.7.dylib at /opt/local/lib/ But I get the following error: $ wossname wossname dyld: Library not loaded: /opt/local/lib/libjpeg.62.dylib Referenced from: /opt/local/bin/wossname Reason: image not found Trace/BPT trap I tried making a link from jpeg.7.dylib to /opt/local/lib/libjpeg.62.dylib but then I get the error: dyld: Library not loaded: /opt/local/lib/libjpeg.62.dylib Referenced from: /opt/local/bin/wossname Reason: Incompatible library version: wossname requires version 63.0.0 or later, but libjpeg.62.dylib provides version 8.0.0 Trace/BPT trap Can someone suggest a solution? Thomas (Tom) Keller kellert at ohsu.edu 503.494.2442 6339b R Jones Hall (BSc/CROET) www.ohsu.edu/xd/research/research-cores/dna-analysis/ From biopython at maubp.freeserve.co.uk Tue Dec 15 16:21:22 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 21:21:22 +0000 Subject: [EMBOSS] Macports/EMBOSS dyld issue In-Reply-To: References: Message-ID: <320fb6e00912151321r3453c849t32beb55888988021@mail.gmail.com> On Tue, Dec 15, 2009 at 8:16 PM, Tom Keller wrote: > > Hi, > I'm running Mac OS X 10.6, and have EMBOSS 6.0.1 installed via MacPorts. And I have macport installed jpeg.7.dylib at /opt/local/lib/ > > But I get the following error: > $ wossname wossname > dyld: Library not loaded: /opt/local/lib/libjpeg.62.dylib > Referenced from: /opt/local/bin/wossname > Reason: image not found > Trace/BPT trap > > I tried making a link from jpeg.7.dylib to /opt/local/lib/libjpeg.62.dylib but then I get the error: > > dyld: Library not loaded: /opt/local/lib/libjpeg.62.dylib > Referenced from: /opt/local/bin/wossname > Reason: Incompatible library version: wossname requires version 63.0.0 or later, but libjpeg.62.dylib provides version 8.0.0 > Trace/BPT trap > > Can someone suggest a solution?
> > Thomas (Tom) Keller > kellert at ohsu.edu > 503.494.2442 > 6339b R Jones Hall (BSc/CROET) > www.ohsu.edu/xd/research/research-cores/dna-analysis/ That looks like two problems, you seem to have libjpeg 62.x.x which is too old, but also EMBOSS (or dyld) isn't reporting the same kind of version number. Do you (or MacPorts) have a libjpeg.63.dylib file you could try? [I've never tried this - this is an informed guess at best] Peter From d.leader at bio.gla.ac.uk Wed Dec 16 07:29:56 2009 From: d.leader at bio.gla.ac.uk (David Leader) Date: Wed, 16 Dec 2009 12:29:56 +0000 Subject: [EMBOSS] Rebase installation nightmare Message-ID: <52CBBC8D-4374-4E88-AFE4-1A09F10E68CC@bio.gla.ac.uk> I discovered that the programs using restriction enzymes do not work on my installations of EMBOSS: ERROR application terminated EMBOSS An error in remap.c at line 236: Cannot locate enzyme file. Run REBASEEXTRACT So I tried to follow the instructions on the EMBOSS site at sourceforge: ------------------------------------------------------------------------ -------- Rebase is the restriction enzyme database maintained by New England Biolabs. It is needed for programs such as remap and restrict. The latest version of Rebase can be obtained by anonymous FTP.3.10 EMBOSS needs the withrefm file. The data is extracted for EMBOSS with the program rebaseextract. % mkdir /site/prog/emboss/data/REBASE % rebaseextract Extract data from REBASE Full pathname of WITHREFM: /data/rebase/withrefm.208 Rebase is now installed and ready to use. ------------------------------------------------------ ... FTP.3.10 ftp://ftp.ebi.ac.uk/pub/databases/rebase ------------------------------------------------------ 1. I downloaded the file withrefm.912.Z and extracted the file withrefm.912 2. I attempted to make a directory called REBASE but it already existed at /usr/local/EMBOSS-3.0.0/emboss/data/REBASE on my OS X installation. The contents were; Makefile.am Makefile.in dummyfile 3. 
OK, so I tried to run rebaseextract: david: rebaseextract Extract data from REBASE Full pathname of WITHREFM file: /Users/david/withrefm.912 Full pathname of PROTO file: Nothing in the instructions about a PROTO file. Go back to the ftp site and download proto.912 Full pathname of PROTO file: /Users/david/proto.912 EMBOSS An error in ajfile.c at line 2173: Cannot write to file /usr/local/share/EMBOSS/data/REBASE/embossre.enz 4. So I check whether there is a directory /usr/local/share/EMBOSS/data/REBASE. There is, with a file entitled "dummyfile". 5. Give up. Life's too short to learn C. David ___________________________________________________ Dr. David P. Leader, Faculty of Biomedical & Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK Phone: +44 (0)141 330 5905 http://doolittle.ibls.gla.ac.uk/leader http://motif.gla.ac.uk/ The University of Glasgow, charity number SC004401 ___________________________________________________ From uludag at ebi.ac.uk Wed Dec 16 09:12:45 2009 From: uludag at ebi.ac.uk (Mahmut Uludag) Date: Wed, 16 Dec 2009 14:12:45 +0000 Subject: [EMBOSS] Rebase installation nightmare In-Reply-To: <52CBBC8D-4374-4E88-AFE4-1A09F10E68CC@bio.gla.ac.uk> References: <52CBBC8D-4374-4E88-AFE4-1A09F10E68CC@bio.gla.ac.uk> Message-ID: <1260972765.1497.97.camel@emboss1.ebi.ac.uk> Hi David, > EMBOSS An error in ajfile.c at line 2173: > Cannot write to file /usr/local/share/EMBOSS/data/REBASE/embossre.enz This could be a simple write permission error. Typically the above location (/usr/local/share/) should be updated using admin accounts. I also noticed that your EMBOSS version is not one of the recent releases. I don't know whether there have been any major changes in rebase file syntax during the last few years that would make EMBOSS-3 incompatible with the latest rebase files.
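A quick way to test the permission theory before re-running rebaseextract is to ask the operating system whether the current user may create files in the data directory. The following is only a minimal sketch in Python: the helper name can_write_into is made up for illustration, and the path is simply the one from the error message above.

```python
import os


def can_write_into(directory: str) -> bool:
    """Return True if the current user may create files in `directory`.

    Hypothetical helper for illustration; it is not part of EMBOSS.
    """
    # Creating a file needs write and search (execute) permission on the directory.
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Directory named in the rebaseextract error message.
    target = "/usr/local/share/EMBOSS/data/REBASE"
    if can_write_into(target):
        print(f"{target} is writable by this user")
    else:
        print(f"{target} is missing or not writable by this user; "
              "an admin account (e.g. via sudo) is probably needed")
```

If the check reports that the directory is not writable, running rebaseextract from an admin account is the likely remedy.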
Regards, Mahmut From gbottu at vub.ac.be Wed Dec 16 09:28:34 2009 From: gbottu at vub.ac.be (Guy Bottu) Date: Wed, 16 Dec 2009 15:28:34 +0100 Subject: [EMBOSS] Rebase installation nightmare In-Reply-To: <52CBBC8D-4374-4E88-AFE4-1A09F10E68CC@bio.gla.ac.uk> References: <52CBBC8D-4374-4E88-AFE4-1A09F10E68CC@bio.gla.ac.uk> Message-ID: <4B28EE92.8000903@vub.ac.be> Yes, this is an old pain. It is difficult to find good EMBOSS documentation. This should improve when John Ison puts the new manual on-line... > 4. So I check whether there is a directory > /usr/local/share/EMBOSS/data/REBASE. There is, with a file entitled > "dummyfile". Is the directory REBASE writable for the UNIX user who runs the program rebaseextract? Guy Bottu From d.leader at bio.gla.ac.uk Wed Dec 16 09:57:21 2009 From: d.leader at bio.gla.ac.uk (David Leader) Date: Wed, 16 Dec 2009 14:57:21 +0000 Subject: [EMBOSS] Rebase installation nightmare Message-ID: <0F56441F-517A-416D-B9B5-75C1B6AF68F9@bio.gla.ac.uk> > I discovered that the programs using restriction enzymes do not > work on my installations of EMBOSS: > > ERROR application terminated > EMBOSS An error in remap.c at line 236: > Cannot locate enzyme file. Run REBASEEXTRACT > > So I tried to follow the instructions on the EMBOSS site at > sourceforge: OK, woken from the nightmare, thanks to Guy and Mahmut. Always forget that even though I'm an admin user at the GUI level on my Mac I need to sudo to install things on the terminal. (In my defence, if I hadn't been asked for files that weren't mentioned in the docs I might have considered that it was my fault rather than emboss's.) David ___________________________________________________ Dr. David P.
Leader, Faculty of Biomedical & Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK Phone: +44 (0)141 330 5905 http://doolittle.ibls.gla.ac.uk/leader http://motif.gla.ac.uk/ The University of Glasgow, charity number SC004401 ___________________________________________________ From sean at seanmcollins.com Tue Dec 22 09:24:29 2009 From: sean at seanmcollins.com (Sean Collins) Date: Tue, 22 Dec 2009 09:24:29 -0500 Subject: [EMBOSS] EMBOSS amd64 packages Message-ID: <96FA7C41-3AED-403B-A9A7-56EA8603C049@seanmcollins.com> Hello, I have built amd64 packages of the latest stable EMBOSS (emboss.sourceforge.net) release for CentOS 5.4 and would like to share them with the community. I have spec files, SRPMS and RPMs. Thank You, Sean Collins From belegdol at gmail.com Tue Dec 22 14:59:45 2009 From: belegdol at gmail.com (Julian Sikorski) Date: Tue, 22 Dec 2009 20:59:45 +0100 Subject: [EMBOSS] EMBOSS amd64 packages In-Reply-To: <96FA7C41-3AED-403B-A9A7-56EA8603C049@seanmcollins.com> References: <96FA7C41-3AED-403B-A9A7-56EA8603C049@seanmcollins.com> Message-ID: W dniu 22.12.2009 15:24, Sean Collins pisze: > Hello, > > I have built amd64 packages of the latest stable EMBOSS > (emboss.sourceforge.net) release for CentOS 5.4 and would like to share > them with the community. I have spec files, SRPMS and RPMs. > > Thank You, > Sean Collins Hi, I package EMBOSS for Fedora. Did you create your spec files from scratch, or did you use mine as a basis? Maybe we could join forces and you could co-maintain emboss in the EL branch? Are you a Fedora packager? Julian From sean at seanmcollins.com Tue Dec 22 15:38:43 2009 From: sean at seanmcollins.com (Sean Collins) Date: Tue, 22 Dec 2009 15:38:43 -0500 Subject: [EMBOSS] EMBOSS amd64 packages In-Reply-To: References: <96FA7C41-3AED-403B-A9A7-56EA8603C049@seanmcollins.com> Message-ID: <8A6FFA1B-DE98-41C6-88AD-09AE1150F935@seanmcollins.com> Hi Julian, pleasure to speak with you. 
On Dec 22, 2009, at 2:59 PM, Julian Sikorski wrote: > I package EMBOSS for Fedora. Did you create your spec files from > scratch, or did you use mine as a basis? I extracted spec files from the Emboss 5.0.0 RPM on the emboss site, which has Ryan Golhar listed in the %changelog section. > Maybe we could join forces and > you could co-maintain emboss in the EL branch? Are you a Fedora > packager? I am not a Fedora packager but would be happy to become one to maintain an EL/EPEL branch of EMBOSS. Thank You, Sean Collins From michael.watson at bbsrc.ac.uk Tue Dec 1 14:33:23 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 1 Dec 2009 14:33:23 +0000 Subject: [EMBOSS] Trimming illumina short reads based on quality Message-ID: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> Hi I'm sorry if I've not been keeping up to date on what is doubtless a hot topic. Does EMBOSS allow one to trim short reads based on quality data (from a fastq file)? If not, I have read that it is planned - any idea when it will be implemented? Otherwise, alternative suggestions are welcome! Thanks Mick From golharam at umdnj.edu Tue Dec 1 15:46:20 2009 From: golharam at umdnj.edu (Ryan Golhar) Date: Tue, 01 Dec 2009 10:46:20 -0500 Subject: [EMBOSS] Trimming illumina short reads based on quality In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <4B153A4C.6000904@umdnj.edu> Michael, Doesn't Illumina provide tools to do this? I know with ABI Solid data, they have a perl script capable of trimming data based on quality scores. Ryan michael watson (IAH-C) wrote: > Hi > > I'm sorry if I've not been keeping up to date on what is doubtless a hot topic. > > Does EMBOSS allow one to trim short reads based on quality data (from a fastq file)? > > If not, I have read that it is planned - any idea when it will be implemented? 
> > Otherwise, alternative suggestions are welcome! > > Thanks > Mick > > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > From d.m.a.martin at dundee.ac.uk Tue Dec 1 16:16:21 2009 From: d.m.a.martin at dundee.ac.uk (David Martin) Date: Tue, 01 Dec 2009 16:16:21 +0000 Subject: [EMBOSS] Trimming illumina short reads based on quality In-Reply-To: <4B153A4C.6000904@umdnj.edu> References: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> <4B153A4C.6000904@umdnj.edu> Message-ID: <4B154155.6F09.00E0.1@dundee.ac.uk> >>> On 12/1/2009 at 3:46 PM, in message <4B153A4C.6000904 at umdnj.edu>, Ryan Golhar wrote: I think virtually every man and his dog who has done anything with Illumina reads has a variety of perl scripts that do this. It depends how you want to do the trimming. Do you want to clip to a specific length, clip on quality (absolute or average over a window) and do you have a minimum length requirement? Do you want to clip 3' and 5' ends or just one? ..d Michael, Doesn't Illumina provide tools to do this? I know with ABI Solid data, they have a perl script capable of trimming data based on quality scores. Ryan michael watson (IAH-C) wrote: > Hi > > I'm sorry if I've not been keeping up to date on what is doubtless a hot topic. > > Does EMBOSS allow one to trim short reads based on quality data (from a fastq file)? > > If not, I have read that it is planned - any idea when it will be implemented? > > Otherwise, alternative suggestions are welcome! > David Martin PhD College of Life Sciences University of Dundee The University of Dundee is a Scottish Registered Charity, No. SC015096. 
The University of Dundee is a registered Scottish charity, No: SC015096 From matthias.dodt at mdc-berlin.de Tue Dec 1 16:17:48 2009 From: matthias.dodt at mdc-berlin.de (Matthias Dodt) Date: Tue, 01 Dec 2009 17:17:48 +0100 Subject: [EMBOSS] using sixpack Message-ID: <4B1541AC.8000301@mdc-berlin.de> Hi there! I have some problems using sixpack for 6-frame translation. I want to convert a fasta file of contigs with sixpack. The command is: sixpack contigs.fa -outseq protein_sequence The problem is that sixpack only converts the first sequence in the fasta file. How can I force it to process the whole file? Thanks! greetings mat From ztu at msi.umn.edu Tue Dec 1 18:38:54 2009 From: ztu at msi.umn.edu (Zheng Jin Tu) Date: Tue, 1 Dec 2009 12:38:54 -0600 (CST) Subject: [EMBOSS] Trimming illumina short reads based on quality In-Reply-To: <4B154155.6F09.00E0.1@dundee.ac.uk> References: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> <4B153A4C.6000904@umdnj.edu> <4B154155.6F09.00E0.1@dundee.ac.uk> Message-ID: You will need to find a bioinformatician to do the coding. I am not sure how to set the correct filter parameters, but I can share some experience: Basically one checks both the sequence string and the quality string from the xxx_qseq.txt file, matching each nucleotide to its quality character. For 454, qual 20 is 99% and 40 is 99.99% confident, if I am correct. In GA, the qual score is ASCII-encoded and can be read out with the ord function in Perl: $qualscore = ord($quality_char) - 64; I am not sure of the appropriate cutoff value. B is quality score 2, so at the very least we remove these BBBBB runs. You may set a min length filter to get rid of reads shorter than about 10 nucleotides after trimming for low quality. Set another filter on max_score or avg_score, like 5; it can filter out the third and fourth sequences in the lines below.
R0174436 1 8 119 0 1418 0 1 .GATCTTCTCCTTCACCTCCTCCAGGTCCTTGGTCAGCTCAGCACGCAGAG Bb^`bb____bbaaVI_Zbbaba`X_bb`aUbbb`W\\a^\bbT[_Xb]__ 0 R0174436 1 8 119 0 991 0 1 .GCCAATCTGTACTTGTCTTCTTCAGTTCCCACTTTGAATACCGCACAGTC BaGT]]bb[]`_]abaIaaaVbb^``abbaM`Ubbb`babaQT]XS_[a[B 0 R0174436 1 8 119 1791 1559 0 1 A.................................................. BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0 R0174436 1 8 119 1791 1997 0 1 A.................................................. BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0 Some people also look for trimming down poly T in RNA-seq case. But not sure how many TTTT should be out. Or also do AAAA case for reverse case? Finally better output to fastq file format. @R0174436:6:83:0:1815#0/1 .GTCAATGCGTTCCACCCCCTCTGGGTAGCCTCCAACATCATGTACGTCGA +R0174436:6:83:0:1815#0/1 Ba`babbb]bb\Xb]b___V^^aaaa___Z_\_[aaa]babb`X`^b\bbb @R0174436:6:83:0:506#0/1 .GCAGGAGAAGCATTTTATCTTTGTATTTTCTTCACTGGCAACAACAATGT +R0174436:6:83:0:506#0/1 BaOW_\I]__a``a_\_J_HU_V_J\a`aa^bab^^]]]Y]^`[`[T]]\^ Good luck. TU =================================================== On Tue, 1 Dec 2009, David Martin wrote: > >>> On 12/1/2009 at 3:46 PM, in message <4B153A4C.6000904 at umdnj.edu>, Ryan Golhar wrote: > I think virtually every man and his dog who has done anything with Illumina reads has a variety of perl scripts that do this. It depends how you want to do the trimming. Do you want to clip to a specific length, clip on quality (absolute or average over a window) and do you have a minimum length requirement? > > Do you want to clip 3' and 5' ends or just one? > > ..d > > > Michael, > > Doesn't Illumina provide tools to do this? I know with ABI Solid data, > they have a perl script capable of trimming data based on quality scores. > > Ryan > > > michael watson (IAH-C) wrote: > > Hi > > > > I'm sorry if I've not been keeping up to date on what is doubtless a hot topic. 
> > > > Does EMBOSS allow one to trim short reads based on quality data (from a fastq file)? > > > > If not, I have read that it is planned - any idea when it will be implemented? > > > > Otherwise, alternative suggestions are welcome! > > > > > > David Martin PhD > College of Life Sciences > University of Dundee > The University of Dundee is a Scottish Registered Charity, No. SC015096. > > > The University of Dundee is a registered Scottish charity, No: SC015096 > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > From biopython at maubp.freeserve.co.uk Tue Dec 1 19:45:58 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 1 Dec 2009 19:45:58 +0000 Subject: [EMBOSS] Trimming illumina short reads based on quality In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <320fb6e00912011145i2111042ap222cd8fdaef0364b@mail.gmail.com> On Tue, Dec 1, 2009 at 2:33 PM, michael watson (IAH-C) wrote: > > Hi > > I'm sorry if I've not been keeping up to date on what is doubtless a hot topic. > > Does EMBOSS allow one to trim short reads based on quality data (from a fastq file)? > > If not, I have read that it is planned - any idea when it will be implemented? Not yet, but it has been proposed and I understand it is on the EMBOSS to do list along with quality filtering (Peter Rice has suggested the name quaffle for this): http://lists.open-bio.org/pipermail/bioperl-l/2009-July/030493.html I dare say suggestions for precise trimming algorithms (e.g. median over sliding window) might be welcome. > Otherwise, alternative suggestions are welcome! I'm sure there are plenty of scripts out there, in Perl, Python etc. What is your language of choice? Peter C.
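As an illustration of the kind of trimming discussed in this thread, here is a minimal Python sketch of 3' trimming with a median over a sliding window. It is not EMBOSS functionality and not anyone's production script. The function name trim_3prime, the default window of 5 and the threshold of 20 are assumptions chosen for the example; the offset of 64 follows the ord(...) - 64 decoding mentioned earlier in the thread (Sanger-style FASTQ would use 33 instead).

```python
from statistics import median

ILLUMINA_OFFSET = 64  # assumption: GA pipeline encoding, as in "ord(char) - 64"


def trim_3prime(seq, qual, threshold=20, window=5, offset=ILLUMINA_OFFSET):
    """Trim low-quality bases from the 3' end of a read.

    Hypothetical helper: returns the trimmed (sequence, quality) pair.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = [ord(ch) - offset for ch in qual]
    end = len(scores)
    # Step 1: shorten the read while the median score of the last
    # `window` bases (fewer if the read is shorter) is below the threshold.
    while end > 0 and median(scores[max(0, end - window):end]) < threshold:
        end -= 1
    # Step 2: drop any individual low-quality bases still left at the end.
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # Toy read: seven good bases ('h' = 40) followed by three 'B' (= 2).
    trimmed_seq, trimmed_qual = trim_3prime("ACGTACGTAC", "hhhhhhhBBB")
    print(trimmed_seq, trimmed_qual)
    # A minimum-length filter can then be applied by the caller.
    min_length = 10
    print("keep" if len(trimmed_seq) >= min_length else "discard")
```

A read whose quality string consists only of B characters comes back empty, so a minimum-length check after trimming discards it.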
From zen.lu at roslin.ed.ac.uk Tue Dec 1 20:15:46 2009 From: zen.lu at roslin.ed.ac.uk (zen lu (RI)) Date: Tue, 1 Dec 2009 20:15:46 +0000 Subject: [EMBOSS] Trimming illumina short reads based on quality In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC50148732064@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <050C9A545DC1D84BAC7A678B76A56C3C251E351D24@ebrcexch1.ebrc.bbsrc.ac.uk> The fastx-toolkit may be what you are looking for: http://hannonlab.cshl.edu/fastx_toolkit/ ________________________________________ From: emboss-bounces at lists.open-bio.org [emboss-bounces at lists.open-bio.org] On Behalf Of michael watson (IAH-C) [michael.watson at bbsrc.ac.uk] Sent: 01 December 2009 14:33 To: emboss at lists.open-bio.org Subject: [EMBOSS] Trimming illumina short reads based on quality Hi I'm sorry if I've not been keeping up to date on what is doubtless a hot topic. Does EMBOSS allow one to trim short reads based on quality data (from a fastq file)? If not, I have read that it is planned - any idea when it will be implemented? Otherwise, alternative suggestions are welcome! Thanks Mick _______________________________________________ EMBOSS mailing list EMBOSS at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/emboss From pmr at ebi.ac.uk Wed Dec 2 09:43:42 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 02 Dec 2009 09:43:42 +0000 Subject: [EMBOSS] using sixpack In-Reply-To: <4B1541AC.8000301@mdc-berlin.de> References: <4B1541AC.8000301@mdc-berlin.de> Message-ID: <4B1636CE.6050403@ebi.ac.uk> On 12/01/2009 04:17 PM, Matthias Dodt wrote: > Hi there! > > I have some problems using sixpack for 6-frame translation. I want to > convert a fasta file of contigs with sixpack. The command is: > > sixpack contigs.fa -outseq protein_sequence > > The problem is that sixpack only converts the first sequence in the > fasta file. How can i force it to process the whole file?? 
Two options: One is to change the EMBOSS code to loop over each sequence. The other is to write a script that extracts each sequence in turn and launches sixpack. We can consider this for the next EMBOSS release. It applies to other applications too. In general, would users (and developers of web and other interfaces) be happy if more applications could read every sequence in a fasta file? This raises questions of how to mark up the output so that it is clear where each result comes from. There will always be applications where it is more sensible to process only a single sequence. A third option (there is so often another way): getorf will find and report open reading frames in all input sequences getorf contigs.fa -outseq protein_sequence There will be differences in the output - getorf limits ORFs to 30 nucleotides. You get the same effect in sixpack with -orfmin 10 (oops, sixpack counts amino acids - we will try to make them consistent in the next release!) You can also add -minsize 3 to the getorf command line to report all ORFs like sixpack does.
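The script approach (the second option above) could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names split_fasta and run_sixpack_per_record and the record_00001-style file names are made up, sixpack is assumed to be on the PATH, and the only qualifiers used are the ones already shown on this list (-outseq and -auto), so all other outputs are left at sixpack's defaults.

```python
import subprocess
import sys
from pathlib import Path


def split_fasta(text):
    """Split multi-FASTA text into a list of single-record FASTA strings."""
    records = []
    current = []
    for line in text.splitlines():
        if line.startswith(">"):
            if current:
                records.append("\n".join(current) + "\n")
            current = [line]
        elif current and line.strip():
            # Sequence line belonging to the record being collected.
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return records


def run_sixpack_per_record(fasta_path, out_dir):
    """Write each record to its own file and launch sixpack on it."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = split_fasta(Path(fasta_path).read_text())
    for number, record in enumerate(records, start=1):
        single = out_dir / f"record_{number:05d}.fa"
        single.write_text(record)
        protein = out_dir / f"record_{number:05d}.pep"
        # Same form as the command in the original question, one record at a time.
        subprocess.run(
            ["sixpack", str(single), "-outseq", str(protein), "-auto"],
            check=True,
        )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python sixpack_each.py contigs.fa output_directory
    run_sixpack_per_record(sys.argv[1], sys.argv[2])
```

Numbering the per-record files avoids trouble with sequence identifiers that contain characters unsuitable for file names.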
> > Two options: > > One is to change the EMBOSS code to loop over each sequence. > > The other is to write a script that extracts each sequence in turn and > launches sixpack. > > We can consider this for the next EMBOSS release. It applies to other > applications too. In general, would users (and developers of web and > other interfaces) be happy if more applications could read every > sequence in a fasta file? > > This raises questions of how to mark up the output so that it is clear > where each results comes from. There will always be applications where > it is more sensible to proces sonly a single sequence. > > A third option (there is so often another way): > > getorf will find and report open reading frames in all input sequences > > getorf contigs.fa -outseq protein_sequence > > There will be differences in the output - getorf limits ORFs to 30 > nucleotides. You get the same effect in sixpack with -orfmin 10 (oops, > sixpack counts amino acids - we will try to make them consistent in the > next release!) > > You can also add -minsize 3 to the getorf command line to report all > ORFs like sixpack does. 
> > Hope this helps, > > Peter Rice -- ------------------------------------------------ Matthias Dodt Scientific Programmer at Bioinformatics platform AG Dieterich Berlin Institute for Medical Systems Biology at the Max-Delbrueck-Center for Molecular Medicine Robert-Roessle-Strasse 10, 13125 Berlin, Germany fon: +49 30 9406 4261 email: matthias.dodt at mdc-berlin.de From biopython at maubp.freeserve.co.uk Mon Dec 7 19:36:30 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 7 Dec 2009 19:36:30 +0000 Subject: [EMBOSS] Unknown output format 'refseqp' and 'genpept' In-Reply-To: References: Message-ID: <320fb6e00912071136i26de9cadx9da04e1527999345@mail.gmail.com> Hi, I have a protein IntelliGenetics file used in the Biopython test suite: http://biopython.org/SRC/biopython/Tests/IntelliGenetics/VIF_mase-pro.txt I am using EMBOSS 6.1.0 (patch level 2 I think), and I am trying to turn this into a "GenBank Protein File", or GenPept file, using EMBOSS seqret. EMBOSS can read the file fine, this works: $ seqret -auto -sformat=ig -osformat=fasta VIF_mase-pro.txt temp.txt Giving FASTA output with 16 gapped protein sequences, which is good - although the ID of the first record is a bit odd. Using "genbank" as the output format in EMBOSS seems to mean nucleotide and not protein: $ seqret -auto -sformat=ig -osformat=genbank VIF_mase-pro.txt temp.txt Error: Sequence format 'genbank' not supported for protein sequences Error: Sequence format 'genbank' not supported for protein sequences ... Error: Sequence format 'genbank' not supported for protein sequences Referring to the documentation, http://emboss.sourceforge.net/docs/themes/SequenceFormats.html I then tried "genpept" and "refseqp": $ seqret -auto -sformat=ig -osformat=genpept VIF_mase-pro.txt temp.txt Error: Unknown output format 'genpept' Error: Unknown output format 'genpept' ... 
Error: unknown output format 'genpept' $ seqret -auto -sformat=ig -osformat=refseqp VIF_mase-pro.txt temp.txt Error: Unknown output format 'refseqp' Error: Unknown output format 'refseqp' ... Error: unknown output format 'refseqp' Doesn't EMBOSS seqret support genpept/refseqp as an output format? Thanks, Peter C. From pmr at ebi.ac.uk Tue Dec 8 13:32:41 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 08 Dec 2009 13:32:41 +0000 Subject: [EMBOSS] Unknown output format 'refseqp' and 'genpept' In-Reply-To: <320fb6e00912071136i26de9cadx9da04e1527999345@mail.gmail.com> References: <320fb6e00912071136i26de9cadx9da04e1527999345@mail.gmail.com> Message-ID: <4B1E5579.2070407@ebi.ac.uk> Peter wrote: > Hi, > > I have a protein IntelliGenetics file used in the Biopython test suite: > http://biopython.org/SRC/biopython/Tests/IntelliGenetics/VIF_mase-pro.txt > > I am using EMBOSS 6.1.0 (patch level 2 I think), and I am trying > to turn this into a "GenBank Protein File", or GenPept file, using > EMBOSS seqret. > > Doesn't EMBOSS seqret support genpept/refseqp as an output format? Oddly enough you are the first to ask for it. Does biopython have a definition of the fields it expects to write out in a GenPept or RefseqP format file? We would be able to allow GenBank as an alias for, presumably, genpept. Might be a good time to merge the format names and details from biopython and emboss. Where can I find the biopython ones?
regards, Peter From biopython at maubp.freeserve.co.uk Tue Dec 8 13:53:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Dec 2009 13:53:13 +0000 Subject: [EMBOSS] Unknown output format 'refseqp' and 'genpept' In-Reply-To: <4B1E5579.2070407@ebi.ac.uk> References: <320fb6e00912071136i26de9cadx9da04e1527999345@mail.gmail.com> <4B1E5579.2070407@ebi.ac.uk> Message-ID: <320fb6e00912080553w490f66vffab00edbe192069@mail.gmail.com> On Tue, Dec 8, 2009 at 1:32 PM, Peter Rice wrote: > > Peter wrote: >> >> Hi, >> >> I have a protein IntelliGenetics file used in the Biopython test suite: >> http://biopython.org/SRC/biopython/Tests/IntelliGenetics/VIF_mase-pro.txt It probably doesn't matter what the input file is here; the fact that it was an (obsolete) format like IntelliGenetics was just chance as I was working on a Biopython unit test. >> I am using EMBOSS 6.1.0 (patch level 2 I think), and I am trying >> to turn this into a "GenBank Protein File", or GenPept file, using >> EMBOSS seqret. >> >> Doesn't EMBOSS seqret support genpept/refseqp as an output format? > > Oddly enough you are the first to ask for it. That surprises me a little bit. Could I suggest you treat known input formats which are not supported as output formats a little differently and instead of this: unknown output format 'genpept' Perhaps give, format 'genpept' is not supported for output (only input) This would help the user rule out having a typo etc. > Does biopython have a definition of the fields it expects to write out in a > GenPept or RefseqP format file? We would be able to allow GenBank as an > alias for, presumably, genpept. Not explicitly, no. I was hoping to use EMBOSS for cross validation ;) With hindsight this may have been a mistake, but we use "genbank" format to mean either nucleotides or proteins. On parsing we just look at the units of length in the LOCUS line (bp or aa).
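As a rough sketch of that units check (an illustration only, not Biopython's actual parser; locus_units is a made-up helper name), something like this in shell would tell the two cases apart. It scans the fields rather than fixed columns, on the assumption that the unit always appears as a separate "aa" or "bp" token:

```shell
# Sketch only: classify a GenBank/GenPept LOCUS line by its length units.
# locus_units is a hypothetical helper name, not part of Biopython or EMBOSS.
locus_units() {
    # Prints "protein" for aa, "nucleotide" for bp, "unknown" otherwise.
    # Fields are scanned by token, not by column offset, so LOCUS lines
    # with different spacing are treated the same way.
    printf '%s\n' "$1" | awk '
        $1 != "LOCUS" { print "unknown"; exit }
        {
            for (i = 3; i <= NF; i++) {
                if ($i == "aa") { print "protein"; exit }
                if ($i == "bp") { print "nucleotide"; exit }
            }
            print "unknown"
        }'
}
```

For example, locus_units "LOCUS ACN78416 225 aa linear BCT 21-MAR-2009" prints protein.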
We also try to cope with both the current NCBI files and some older variants we have in our unit tests (different offsets in the LOCUS line). > Might be a good time to merge the format names and details from biopython > and emboss. Where can I find the biopython ones? There are two tables on the wiki which include version information: http://biopython.org/wiki/SeqIO http://biopython.org/wiki/AlignIO You can also consult the built in documentation, also available online: http://biopython.org/DIST/docs/api/Bio.SeqIO-module.html http://biopython.org/DIST/docs/api/Bio.AlignIO-module.html For a long time I avoided having aliases (multiple names for the same thing). However, we now treat "gb" as an alias for "genbank" (since this is what the NCBI use in Entrez). We also treat "fastq-sanger" and "fastq" the same. Peter C (the one at Biopython) From pmr at ebi.ac.uk Tue Dec 8 14:11:59 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 08 Dec 2009 14:11:59 +0000 Subject: [EMBOSS] Unknown output format 'refseqp' and 'genpept' In-Reply-To: <320fb6e00912080553w490f66vffab00edbe192069@mail.gmail.com> References: <320fb6e00912071136i26de9cadx9da04e1527999345@mail.gmail.com> <4B1E5579.2070407@ebi.ac.uk> <320fb6e00912080553w490f66vffab00edbe192069@mail.gmail.com> Message-ID: <4B1E5EAF.80301@ebi.ac.uk> Peter C. wrote: > Could I suggest you treat known input formats which are not supported > as output formats a little differently and instead of this: > > unknown output format 'genpept' > > Perhaps give, > > format 'genpept' is not supported for output (only input) > > This would help the user rule out having a typo etc. A useful suggestion. We can apply that to feature formats too. I'll see what I can do. It may be worth a tidy up on what we do with formats that are only valid for nucleotide or protein (though that is a little tricky as we currently try to let some fail over to an equivalent format).
>> Does biopython have a definition of the fields it expects to write out in a >> GenPept or RefseqP format file? We would be able to allow GenBank as an >> alias for, presumably, genpept. > > Not explicitly, no. I was hoping to use EMBOSS for cross validation ;) No problem. We'll go first then and try to define standard formats. > With hindsight this may have been a mistake, but we use "genbank" > format to mean either nucleotides or proteins. On parsing we just > look at the units of length in the LOCUS line (bp or aa). We also > try to cope with both the current NCBI files and some older variants > we have in our unit tests (different offsets in the LOCUS line). We try that too on input, but for output we have to be explicit so the user can pick just one of the choices. regards, Peter R. From biopython at maubp.freeserve.co.uk Tue Dec 8 14:29:25 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Dec 2009 14:29:25 +0000 Subject: [EMBOSS] Unknown output format 'refseqp' and 'genpept' In-Reply-To: <4B1E5EAF.80301@ebi.ac.uk> References: <320fb6e00912071136i26de9cadx9da04e1527999345@mail.gmail.com> <4B1E5579.2070407@ebi.ac.uk> <320fb6e00912080553w490f66vffab00edbe192069@mail.gmail.com> <4B1E5EAF.80301@ebi.ac.uk> Message-ID: <320fb6e00912080629h969fc27m46dc0165ff1832d0@mail.gmail.com> On Tue, Dec 8, 2009 at 2:11 PM, Peter Rice wrote: > >> With hindsight this may have been a mistake, but we use "genbank" >> format to mean either nucleotides or proteins. On parsing we just >> look at the units of length in the LOCUS line (bp or aa). We also >> try to cope with both the current NCBI files and some older variants >> we have in our unit tests (different offsets in the LOCUS line). > > We try that too on input, but for output we have to be explicit so the user > can pick just one of the choices.
I imagine that as with Biopython, sometimes the user has made it explicit that they are dealing with nucleotides or proteins (lots of the EMBOSS tools have switches for this), so you know if you should be using "aa" or "bp" in the LOCUS line. Peter From kellert at ohsu.edu Tue Dec 8 20:13:36 2009 From: kellert at ohsu.edu (Tom Keller) Date: Tue, 8 Dec 2009 12:13:36 -0800 Subject: [EMBOSS] newbie emma (clustalw) question Message-ID: This is an ongoing frustration. I use EMBOSS only occasionally, and I can never find a good guide for setting environmental variables. For example, I wanted to use emma with a new install of clustalw2. I made a link to it in /usr/local/bin, but emma can't find it. How do I fix this in an easy to maintain manner? thanks, Tom Thomas (Tom) Keller kellert at ohsu.edu 503.494.2442 6339b R Jones Hall (BSc/CROET) www.ohsu.edu/xd/research/research-cores/dna-analysis/ From kellert at ohsu.edu Tue Dec 8 21:08:54 2009 From: kellert at ohsu.edu (Tom Keller) Date: Tue, 8 Dec 2009 13:08:54 -0800 Subject: [EMBOSS] newbie emma (clustalw) question In-Reply-To: References: Message-ID: <3A1B51FF-D28D-43DE-A555-E6B6D3163352@ohsu.edu> I finally found the answer. It's simple, but if you don't know it, you're only guessing, and that is really not a good approach. The environmental variable to set is EMBOSS_CLUSTALW in whatever dot shell control file you use for setting env. For the bash shell, I edited .bashrc with: export EMBOSS_CLUSTALW=/usr/local/bin/clustalw How would one go about requesting that this sort of info be added as a comment to the application tfm page? So instead of reading COMMENT none it would read: Set env variable EMBOSS_CLUSTALW= thanks, Tom On Dec 8, 2009, at 12:13 PM, Tom Keller wrote: > This is an ongoing frustration. I use EMBOSS only occasionally, and I can never find a good guide for setting environmental variables. For example, I wanted to use emma with a new install of clustalw2.
I made a link to it in /usr/local/bin, but emma can't find it. How do I fix this in an easy to maintain manner? > > thanks, > Tom > > > Thomas (Tom) Keller > kellert at ohsu.edu > 503.494.2442 > 6339b R Jones Hall (BSc/CROET) > www.ohsu.edu/xd/research/research-cores/dna-analysis/ Thomas (Tom) Keller kellert at ohsu.edu 503.494.2442 6339b R Jones Hall (BSc/CROET) www.ohsu.edu/xd/research/research-cores/dna-analysis/ From pmr at ebi.ac.uk Wed Dec 9 09:21:43 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 09 Dec 2009 09:21:43 +0000 Subject: [EMBOSS] newbie emma (clustalw) question In-Reply-To: <3A1B51FF-D28D-43DE-A555-E6B6D3163352@ohsu.edu> References: <3A1B51FF-D28D-43DE-A555-E6B6D3163352@ohsu.edu> Message-ID: <4B1F6C27.6050502@ebi.ac.uk> On 08/12/09 21:08, Tom Keller wrote: > On Dec 8, 2009, at 12:13 PM, Tom Keller wrote: >> This is an ongoing frustration. I use EMBOSS only occasionally, and >> I can never find a good guide for setting environmental variables. >> For example, I wanted to use emma with a new install of clustalw2. >> I made a link to it in /usr/local/bin, but emma can't find it. How >> do I fix this in an easy to maintain manner? You got there first. The simple answer is that emma will look for clustalw (not clustalw2) in your path. This should include a link. If you can run clustalw from the command line then it should also run from within emma. > I finally found the answer. It's simple, but if you don't know it, > you're only guessing, and that is really not a good approach. > > The environmental variable to set is EMBOSS_CLUSTALW in whatever dot > shell control file you use for setting env.
> for the bash shell, I edited .bashrc with: > export EMBOSS_CLUSTALW=/usr/local/bin/clustalw This was a feature we added so that clustalw could be launched from a local installation, or to run an executable called clustalw2 (or clustalw183) > How would one go about requesting that this sort of info be added as > a comment to the application tfm page? So instead of reading > COMMENT > none > > it read: Set env variable EMBOSS_CLUSTALW= You just did :-) We will go through the external programs and add more information. There is a paragraph in the emma documentation (diagnostic error messages) but it is not easy to find (and probably has not been read by many users - I just found a typo in it). We plan for a future release to call all external applications in a common way, and to issue a standard error message describing the path and environment variable options. We will also add a reference to external programs in the ACD files for interface developers to identify dependencies on other packages. We will also look into adding messages to the EMBOSS configure script to warn about required third party packages. Thanks for the suggestions regards, Peter Rice From pmr at ebi.ac.uk Thu Dec 10 13:36:14 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 10 Dec 2009 13:36:14 +0000 Subject: [EMBOSS] EMBOSS 6.1.0 patch 1.3 Message-ID: <4B20F94E.4000506@ebi.ac.uk> A patch for EMBOSS 6.1.0 is on the FTP server. This fixes problems with extractfeat, using format names with dashes (fastq-sanger) in USAs, scaling issues in plot outputs, and some minor bugs. The files are on our FTP server ftp://emboss.open-bio.org/pub/EMBOSS/fixes with a patch file and instructions in the patches subdirectory. Fix 3. 
EMBOSS-6.1.0/ajax/ajfeat.c EMBOSS-6.1.0/ajax/ajfeat.h EMBOSS-6.1.0/ajax/ajgraph.c EMBOSS-6.1.0/ajax/ajmath.c EMBOSS-6.1.0/ajax/ajseq.c EMBOSS-6.1.0/ajax/ajseqread.c EMBOSS-6.1.0/ajax/ajseqwrite.c EMBOSS-6.1.0/nucleus/embmisc.c EMBOSS-6.1.0/nucleus/embmisc.h EMBOSS-6.1.0/nucleus/embpat.c EMBOSS-6.1.0/emboss/coderet.c EMBOSS-6.1.0/emboss/extractfeat.c EMBOSS-6.1.0/emboss/notseq.c EMBOSS-6.1.0/emboss/prettyplot.c EMBOSS-6.1.0/emboss/seqmatchall.c EMBOSS-6.1.0/emboss/showfeat.c EMBOSS-6.1.0/emboss/showpep.c EMBOSS-6.1.0/emboss/showseq.c EMBOSS-6.1.0/emboss/twofeat.c EMBOSS-6.1.0/jemboss/utils/install-jemboss-server.sh EMBOSS-6.1.0/jemboss/org/emboss/jemboss/server/AppendToLogFileThread.java EMBOSS-6.1.0/jemboss/org/emboss/jemboss/server/JembossAuthServer.java 02-Dec-2009: Fixes problems with extractfeat. The fix includes cleaner definitions of functions used to match feature tags and feature types which result in minor updates to 6 other applications. Extractfeat in previous versions used its own text parser to extract feature data from only a limited set of formats. In release 6.1.0 it was replaced by the standard EMBOSS feature table. With no options set, extractfeat rejected all features (type '*' was needed to extract features). Extractfeat default settings now extract all features from an entry. Features on the reverse strand were incorrectly processed (an effect caused by some of the old extractfeat code remaining). Reverse strand features are now correctly parsed, including both "join(complement())" and "complement(join())" syntax in EMBL/GenBank/DDBJ feature tables. Fixes an issue in GenBank parsing where the ORIGIN line is absent. Fixes scaling errors in prettyplot, especially in mEMBOSS when plotting to a window on screen (the default output). The plplot library does not report the true width and height for several devices. The assumptions in prettyplot depend on reasonable size estimates. Release 6.2.0 will have further corrections to plplot device scaling. 
Fixes the counting of non-coding features in coderet. Fixes a seqmatchall error for short sequences with perfect matches. When reverse-complementing sequences, also reverses the quality scores. Allows '-' in format names in the USA syntax, to allow fastq-sanger, fastq-illumina and fastq-solexa format names to be used. When reading protein sequences, a sequence with only a stop is now recognized as empty (zero length) after processing ambiguity codes and stops. Fixes a problem writing features in PIR format when the feature table is empty, for example a report file with no hits. Fixes a dependency on 'ant' to install a Jemboss server. Fixes a problem in logging Jemboss info/error messages. regards, Peter Rice From charles-listes-emboss at plessy.org Mon Dec 14 11:59:18 2009 From: charles-listes-emboss at plessy.org (Charles Plessy) Date: Mon, 14 Dec 2009 20:59:18 +0900 Subject: [EMBOSS] EMBOSS 6.1.0 patch 1.3 In-Reply-To: <4B20F94E.4000506@ebi.ac.uk> References: <4B20F94E.4000506@ebi.ac.uk> Message-ID: <20091214115918.GB22410@kunpuu.plessy.org> On Thu, Dec 10, 2009 at 01:36:14PM +0000, Peter Rice wrote: > A patch for EMBOSS 6.1.0 is on the FTP server. This fixes problems with > extractfeat, using format names with dashes (fastq-sanger) in USAs, > scaling issues in plot outputs, and some minor bugs. > > The files are on our FTP server ftp://emboss.open-bio.org/pub/EMBOSS/fixes > with a patch file and instructions in the patches subdirectory. > > Fix 3.
EMBOSS-6.1.0/ajax/ajfeat.c > EMBOSS-6.1.0/ajax/ajfeat.h > EMBOSS-6.1.0/ajax/ajgraph.c > EMBOSS-6.1.0/ajax/ajmath.c > EMBOSS-6.1.0/ajax/ajseq.c > EMBOSS-6.1.0/ajax/ajseqread.c > EMBOSS-6.1.0/ajax/ajseqwrite.c > EMBOSS-6.1.0/nucleus/embmisc.c > EMBOSS-6.1.0/nucleus/embmisc.h > EMBOSS-6.1.0/nucleus/embpat.c > EMBOSS-6.1.0/emboss/coderet.c > EMBOSS-6.1.0/emboss/extractfeat.c > EMBOSS-6.1.0/emboss/notseq.c > EMBOSS-6.1.0/emboss/prettyplot.c > EMBOSS-6.1.0/emboss/seqmatchall.c > EMBOSS-6.1.0/emboss/showfeat.c > EMBOSS-6.1.0/emboss/showpep.c > EMBOSS-6.1.0/emboss/showseq.c > EMBOSS-6.1.0/emboss/twofeat.c > EMBOSS-6.1.0/jemboss/utils/install-jemboss-server.sh Dear Peter and all EMBOSS developers, There were two issues that were discussed previously on the EMBOSS user list or on the sourceforge tracker, that I do not see fixed in this patch. The EMBOSS package distributed by Debian contains a patch for each of them. 1) To prevent vectorstrip to discard FASTQ qualities. http://git.debian.org/?p=debian-med/emboss.git;a=blob;f=debian/patches/transient-vectorstrip.patch;h=c4d8fd1e6c6d223ec8664b1aa08171407f9ace55;hb=HEAD https://sourceforge.net/tracker/index.php?func=detail&aid=2886368&group_id=93650&atid=605034 2) To help tfm to find the HTML documentation. http://git.debian.org/?p=debian-med/emboss.git;a=blob;f=debian/patches/tfm-html.patch;h=c0595eb5008f53fec27e0ba4ff144c6385c6acdd;hb=HEAD https://sourceforge.net/tracker/?func=detail&aid=2877960&group_id=93650&atid=605031 I would welcome in particular comments on the second patch. If it is not suitable, then I will remove it from the Debian package. 
Best regards, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From pmr at ebi.ac.uk Mon Dec 14 13:44:10 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 14 Dec 2009 13:44:10 +0000 Subject: [EMBOSS] EMBOSS 6.1.0 patch 1.3 In-Reply-To: <20091214115918.GB22410@kunpuu.plessy.org> References: <4B20F94E.4000506@ebi.ac.uk> <20091214115918.GB22410@kunpuu.plessy.org> Message-ID: <4B26412A.5020906@ebi.ac.uk> On 12/14/2009 11:59 AM, Charles Plessy wrote: > There were two issues that were discussed previously on the EMBOSS user list or > on the sourceforge tracker, that I do not see fixed in this patch. The EMBOSS > package distributed by Debian contains a patch for each of them. > > 1) To prevent vectorstrip to discard FASTQ qualities. > http://git.debian.org/?p=debian-med/emboss.git;a=blob;f=debian/patches/transient-vectorstrip.patch;h=c4d8fd1e6c6d223ec8664b1aa08171407f9ace55;hb=HEAD > https://sourceforge.net/tracker/index.php?func=detail&aid=2886368&group_id=93650&atid=605034 Sorry, we missed that update in going through the changes in the ajseq.c file. We will add it to the next patch. > 2) To help tfm to find the HTML documentation. > http://git.debian.org/?p=debian-med/emboss.git;a=blob;f=debian/patches/tfm-html.patch;h=c0595eb5008f53fec27e0ba4ff144c6385c6acdd;hb=HEAD > https://sourceforge.net/tracker/?func=detail&aid=2877960&group_id=93650&atid=605031 This one is wrong. If the documentation is not found in the install directory (which it should be with a standard 'make install') tfm looks in ajNamValueBaseDir which is the top level directory. The Debian patch uses ajNamValueRootDir which is the emboss/ subdirectory used when finding uninstalled ACD files. embossversion -full shows both these paths.
Hope this helps, Peter Rice From stephen.taylor at imm.ox.ac.uk Tue Dec 15 12:14:07 2009 From: stephen.taylor at imm.ox.ac.uk (Steve Taylor) Date: Tue, 15 Dec 2009 12:14:07 +0000 Subject: [EMBOSS] Genpept entry in MSE Message-ID: <4B277D8F.8010205@imm.ox.ac.uk> Hi, I am trying to load a Genpept entry into MSE, EMBOSS Version 6.0.1 on Fedora. Unfortunately it doesn't like the LOCUS line. It loads, but warns: Warning: bad Genbank LOCUS line 'LOCUS ACN78416 225 aa linear BCT 21-MAR-2009' Changing the aa to bp fixes it. Thanks, Steve LOCUS ACN78416 225 aa linear BCT 21-MAR-2009 DEFINITION galactosyltransferase A [Pasteurella multocida]. ACCESSION ACN78416 VERSION ACN78416.1 GI:224999306 DBSOURCE accession FJ755839.1 KEYWORDS . SOURCE Pasteurella multocida ORGANISM Pasteurella multocida Bacteria; Proteobacteria; Gammaproteobacteria; Pasteurellales; Pasteurellaceae; Pasteurella. REFERENCE 1 (residues 1 to 225) AUTHORS Boyce,J.D., Harper,M., St Michael,F., John,M., Aubry,A., Parnas,H., Logan,S.M., Wilkie,I.W., Ford,M., Cox,A.D. and Adler,B. TITLE Identification of novel glycosyltransferases required for assembly of the Pasteurella multocida A:1 lipopolysaccharide and their involvement in virulence JOURNAL Infect. Immun. 77 (4), 1532-1542 (2009) PUBMED 19168738 REFERENCE 2 (residues 1 to 225) AUTHORS Harper,M., John,M., Adler,B. and Boyce,J. TITLE Direct Submission JOURNAL Submitted (17-FEB-2009) Microbiology, Monash University, Wellington road, Clayton, Melbourne, Victoria 3800, Australia COMMENT Method: conceptual translation supplied by author. 
FEATURES Location/Qualifiers source 1..225 /organism="Pasteurella multocida" /strain="VP161" /serotype="1" /db_xref="taxon:747" /note="serogroup: A" Protein 1..225 /product="galactosyltransferase A" /function="adds beta-D-galactose to both the 4 and 6 position of alpha-L,D-heptose IV" /name="putative bi-functional galactosyltransferase; GatA" Region 5..188 /region_name="Glyco_transf_25" /note="Glycosyltransferase family 25 [lipooligosaccharide (LOS) biosynthesis protein] is a family of glycosyltransferases involved in LOS biosynthesis. The members include the beta(1,4) galactosyltransferases: Lgt2 of Moraxella catarrhalis, LgtB and LgtE of...; cd06532" /db_xref="CDD:133474" CDS 1..225 /gene="gatA" /coded_by="FJ755839.1:1..678" /transl_table=11 ORIGIN 1 mklpkiivis lknsprrqii shrlsglgld feffdavygk dltkeeleki dyeffpkycg 61 skgaltlgei gcamshikiy ehivannleq viileddaiv slyfeeivla alqklpnrre 121 ilfldhgkak vypfmrnlpe ryrlaryrkp skhskrfivr ttaylitleg akkllkhayp 181 irmpsdfltg llqlthinay giepscvfgg veseinemer raglk // From biopython at maubp.freeserve.co.uk Tue Dec 15 13:13:54 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 13:13:54 +0000 Subject: [EMBOSS] Genpept entry in MSE In-Reply-To: <4B277D8F.8010205@imm.ox.ac.uk> References: <4B277D8F.8010205@imm.ox.ac.uk> Message-ID: <320fb6e00912150513g6ca79e70lf86bca737fef5f5f@mail.gmail.com> On Tue, Dec 15, 2009 at 12:14 PM, Steve Taylor wrote: > > Hi, > > I am trying to load a Genpept entry into MSE, EMBOSS Version 6.0.1 on > Fedora. Unfortunately it doesn't like the LOCUS line. > > It loads, but warns: > > Warning: bad Genbank LOCUS line 'LOCUS ACN78416 225 aa > linear BCT 21-MAR-2009' > > Changing the aa to bp fixes it. What command line did you use?
If you specified format "genbank", I think you should use format name "genpept" or "refseqp" instead: http://emboss.sourceforge.net/docs/themes/SequenceFormats.html Peter From stephen.taylor at imm.ox.ac.uk Tue Dec 15 15:26:59 2009 From: stephen.taylor at imm.ox.ac.uk (Steve Taylor) Date: Tue, 15 Dec 2009 15:26:59 +0000 Subject: [EMBOSS] Genpept entry in MSE In-Reply-To: <320fb6e00912150513g6ca79e70lf86bca737fef5f5f@mail.gmail.com> References: <4B277D8F.8010205@imm.ox.ac.uk> <320fb6e00912150513g6ca79e70lf86bca737fef5f5f@mail.gmail.com> Message-ID: <4B27AAC3.9070209@imm.ox.ac.uk> Peter wrote: > On Tue, Dec 15, 2009 at 12:14 PM, Steve Taylor > wrote: >> Hi, >> >> I am trying to load a Genpept entry into MSE, EMBOSS Version 6.0.1 on >> Fedora. Unfortunately it doesn't like the LOCUS line. >> >> It loads, but warns: >> >> Warning: bad Genbank LOCUS line 'LOCUS ACN78416 225 aa >> linear BCT 21-MAR-2009' >> >> Changing the aa to bp fixes it. > > What command line did you use? If you specified format "genbank", > I think you should use format name "genpept" or "refseqp" instead: > http://emboss.sourceforge.net/docs/themes/SequenceFormats.html I didn't specify any format. I assumed it would pick it up... However, I still get the error if I use mse -sformat1 genpept -sequence ACN78417.pep Is this what you mean? Steve From biopython at maubp.freeserve.co.uk Tue Dec 15 15:56:34 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 15:56:34 +0000 Subject: [EMBOSS] Genpept entry in MSE In-Reply-To: <4B27AAC3.9070209@imm.ox.ac.uk> References: <4B277D8F.8010205@imm.ox.ac.uk> <320fb6e00912150513g6ca79e70lf86bca737fef5f5f@mail.gmail.com> <4B27AAC3.9070209@imm.ox.ac.uk> Message-ID: <320fb6e00912150756y2a213b21v1917be1b3133d3b4@mail.gmail.com> On Tue, Dec 15, 2009 at 3:26 PM, Steve Taylor wrote: > > I didn't specify any format. I assumed it would pick it up... Emboss is normally pretty good at deducing file formats, so I would have expected it to cope too. 
> However, I still get the error if I use > > mse -sformat1 genpept -sequence ACN78417.pep > > Is this what you mean? Probably - although I don't think I have ever used mse myself. Hopefully an EMBOSS developer can enlighten us. Peter C. From pmr at ebi.ac.uk Tue Dec 15 16:41:34 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 15 Dec 2009 16:41:34 +0000 Subject: [EMBOSS] Genpept entry in MSE In-Reply-To: <320fb6e00912150756y2a213b21v1917be1b3133d3b4@mail.gmail.com> References: <4B277D8F.8010205@imm.ox.ac.uk> <320fb6e00912150513g6ca79e70lf86bca737fef5f5f@mail.gmail.com> <4B27AAC3.9070209@imm.ox.ac.uk> <320fb6e00912150756y2a213b21v1917be1b3133d3b4@mail.gmail.com> Message-ID: <4B27BC3E.3090601@ebi.ac.uk> On 15/12/09 15:56, Peter wrote: > On Tue, Dec 15, 2009 at 3:26 PM, Steve Taylor > wrote: >> >> I didn't specify any format. I assumed it would pick it up... > > Emboss is normally pretty good at deducing file formats, so I > would have expected it to cope too. > >> However, I still get the error if I use >> >> mse -sformat1 genpept -sequence ACN78417.pep >> >> Is this what you mean? > > Probably - although I don't think I have ever used mse myself. > > Hopefully an EMBOSS developer can enlighten us. Which version of EMBOSS are you running? We are looking into improving support for genpept and the refseq variants in a future release. Genpept was cleaned up in one of the patches for 6.1.0 so this should address your problem. regards, Peter Rice From kellert at ohsu.edu Tue Dec 15 20:16:07 2009 From: kellert at ohsu.edu (Tom Keller) Date: Tue, 15 Dec 2009 12:16:07 -0800 Subject: [EMBOSS] Macports/EMBOSS dyld issue Message-ID: Hi, I'm running Mac OS X 10.6, and have EMBOSS 6.0.1 installed via MacPorts. 
And I have macport installed jpeg.7.dylib at /opt/local/lib/ But I get the following error: $ wossname wossname dyld: Library not loaded: /opt/local/lib/libjpeg.62.dylib Referenced from: /opt/local/bin/wossname Reason: image not found Trace/BPT trap I tried making a link from jpeg.7.dylib to /opt/local/lib/libjpeg.62.dylib but then I get the error: dyld: Library not loaded: /opt/local/lib/libjpeg.62.dylib Referenced from: /opt/local/bin/wossname Reason: Incompatible library version: wossname requires version 63.0.0 or later, but libjpeg.62.dylib provides version 8.0.0 Trace/BPT trap Can someone suggest a solution? Thomas (Tom) Keller kellert at ohsu.edu 503.494.2442 6339b R Jones Hall (BSc/CROET) www.ohsu.edu/xd/research/research-cores/dna-analysis/ From biopython at maubp.freeserve.co.uk Tue Dec 15 21:21:22 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 21:21:22 +0000 Subject: [EMBOSS] Macports/EMBOSS dyld issue In-Reply-To: References: Message-ID: <320fb6e00912151321r3453c849t32beb55888988021@mail.gmail.com> On Tue, Dec 15, 2009 at 8:16 PM, Tom Keller wrote: > > Hi, > I'm running Mac OS X 10.6, and have EMBOSS 6.0.1 installed via MacPorts. And I have macport installed jpeg.7.dylib at /opt/local/lib/ > > But I get the following error: > $ wossname wossname > dyld: Library not loaded: /opt/local/lib/libjpeg.62.dylib > Referenced from: /opt/local/bin/wossname > Reason: image not found > Trace/BPT trap > > I tried making a link from jpeg.7.dylib to /opt/local/lib/libjpeg.62.dylib but then I get the error: > > dyld: Library not loaded: /opt/local/lib/libjpeg.62.dylib > Referenced from: /opt/local/bin/wossname > Reason: Incompatible library version: wossname requires version 63.0.0 or later, but libjpeg.62.dylib provides version 8.0.0 > Trace/BPT trap > > Can someone suggest a solution?
> > Thomas (Tom) Keller > kellert at ohsu.edu > 503.494.2442 > 6339b R Jones Hall (BSc/CROET) > www.ohsu.edu/xd/research/research-cores/dna-analysis/ That looks like two problems, you seem to have libjpeg 62.x.x which is too old, but also EMBOSS (or dyld) isn't reporting the same kind of version number. Do you (or MacPorts) have a libjpeg.63.dylib file you could try? [I've never tried this - this is an informed guess at best] Peter From d.leader at bio.gla.ac.uk Wed Dec 16 12:29:56 2009 From: d.leader at bio.gla.ac.uk (David Leader) Date: Wed, 16 Dec 2009 12:29:56 +0000 Subject: [EMBOSS] Rebase installation nightmare Message-ID: <52CBBC8D-4374-4E88-AFE4-1A09F10E68CC@bio.gla.ac.uk> I discovered that the programs using restriction enzymes do not work on my installations of EMBOSS: ERROR application terminated EMBOSS An error in remap.c at line 236: Cannot locate enzyme file. Run REBASEEXTRACT So I tried to follow the instructions on the EMBOSS site at sourceforge: ------------------------------------------------------------------------ -------- Rebase is the restriction enzyme database maintained by New England Biolabs. It is needed for programs such as remap and restrict. The latest version of Rebase can be obtained by anonymous FTP.3.10 EMBOSS needs the withrefm file. The data is extracted for EMBOSS with the program rebaseextract. % mkdir /site/prog/emboss/data/REBASE % rebaseextract Extract data from REBASE Full pathname of WITHREFM: /data/rebase/withrefm.208 Rebase is now installed and ready to use. ------------------------------------------------------ ... FTP.3.10 ftp://ftp.ebi.ac.uk/pub/databases/rebase ------------------------------------------------------ 1. I downloaded the file withrefm.912.Z and extracted the file withrefm.912 2. I attempted to make a directory called REBASE but it already existed at /usr/local/EMBOSS-3.0.0/emboss/data/REBASE on my OS X installation. The contents were; Makefile.am Makefile.in dummyfile 3. 
OK, so I tried to run rebaseextract:

david: rebaseextract
Extract data from REBASE
Full pathname of WITHREFM file: /Users/david/withrefm.912
Full pathname of PROTO file:

Nothing in the instructions about a PROTO file. Go back to the ftp site and download proto.912

Full pathname of PROTO file: /Users/david/proto.912

EMBOSS An error in ajfile.c at line 2173:
Cannot write to file /usr/local/share/EMBOSS/data/REBASE/embossre.enz

4. So I check whether there is a directory /usr/local/share/EMBOSS/data/REBASE. There is, with a file entitled "dummyfile".

5. Give up. Life's too short to learn C.

David
___________________________________________________
Dr. David P. Leader,
Faculty of Biomedical & Life Sciences,
University of Glasgow, Glasgow G12 8QQ, UK
Phone: +44 (0)141 330 5905
http://doolittle.ibls.gla.ac.uk/leader
http://motif.gla.ac.uk/

The University of Glasgow, charity number SC004401
___________________________________________________

From uludag at ebi.ac.uk Wed Dec 16 14:12:45 2009
From: uludag at ebi.ac.uk (Mahmut Uludag)
Date: Wed, 16 Dec 2009 14:12:45 +0000
Subject: [EMBOSS] Rebase installation nightmare
In-Reply-To: <52CBBC8D-4374-4E88-AFE4-1A09F10E68CC@bio.gla.ac.uk>
References: <52CBBC8D-4374-4E88-AFE4-1A09F10E68CC@bio.gla.ac.uk>
Message-ID: <1260972765.1497.97.camel@emboss1.ebi.ac.uk>

Hi David,

> EMBOSS An error in ajfile.c at line 2173:
> Cannot write to file /usr/local/share/EMBOSS/data/REBASE/embossre.enz

This could be a simple write permission error. Typically the above location (/usr/local/share/) should be updated using admin accounts.

I also noticed that your EMBOSS version is not one of the recent releases. I don't know whether there have been any major changes in rebase file syntax during the last few years that would make EMBOSS-3 incompatible with the latest rebase files.
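
A minimal sketch of how the write-permission guess could be checked from the terminal. The helper name `can_write_dir` is hypothetical (it is not part of EMBOSS), and the path is simply the one from the error message quoted above:

```shell
# Hypothetical helper (not part of EMBOSS): report whether the current
# user can write to the given directory.
can_write_dir() {
    dir="$1"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "writable"
    else
        echo "not writable"
    fi
}

# Directory taken from the "Cannot write to file" error message.
REBASE_DIR=/usr/local/share/EMBOSS/data/REBASE
echo "$REBASE_DIR: $(can_write_dir "$REBASE_DIR")"

# If the directory is reported as not writable, one option is to rerun
# the extraction with admin rights, for example:
#   sudo rebaseextract
```

If the directory turns out to be writable, the cause is more likely elsewhere, for example the EMBOSS/REBASE version question raised above.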
Regards,
Mahmut

From gbottu at vub.ac.be Wed Dec 16 14:28:34 2009
From: gbottu at vub.ac.be (Guy Bottu)
Date: Wed, 16 Dec 2009 15:28:34 +0100
Subject: [EMBOSS] Rebase installation nightmare
In-Reply-To: <52CBBC8D-4374-4E88-AFE4-1A09F10E68CC@bio.gla.ac.uk>
References: <52CBBC8D-4374-4E88-AFE4-1A09F10E68CC@bio.gla.ac.uk>
Message-ID: <4B28EE92.8000903@vub.ac.be>

Yes, this is an old pain. It is difficult to find good EMBOSS documentation. Should improve when John Ison puts the new manual on-line...

> 4. So I check whether there is a directory
> /usr/local/share/EMBOSS/data/REBASE. There is, with a file entitled
> "dummyfile".

Is the directory REBASE writable for the UNIX user who runs the program rebaseextract?

Guy Bottu

From d.leader at bio.gla.ac.uk Wed Dec 16 14:57:21 2009
From: d.leader at bio.gla.ac.uk (David Leader)
Date: Wed, 16 Dec 2009 14:57:21 +0000
Subject: [EMBOSS] Rebase installation nightmare
Message-ID: <0F56441F-517A-416D-B9B5-75C1B6AF68F9@bio.gla.ac.uk>

> I discovered that the programs using restriction enzymes do not
> work on my installations of EMBOSS:
>
> ERROR application terminated
> EMBOSS An error in remap.c at line 236:
> Cannot locate enzyme file. Run REBASEEXTRACT
>
> So I tried to follow the instructions on the EMBOSS site at
> sourceforge:

OK, woken from the nightmare, thanks to Guy and Mahmut. I always forget that even though I'm an admin user at the GUI level on my Mac I need to sudo to install things on the terminal. (In my defence, if I hadn't been asked for files that weren't mentioned in the docs I might have considered that it was my fault rather than emboss's.)

David
___________________________________________________
Dr. David P.
Leader,
Faculty of Biomedical & Life Sciences,
University of Glasgow, Glasgow G12 8QQ, UK
Phone: +44 (0)141 330 5905
http://doolittle.ibls.gla.ac.uk/leader
http://motif.gla.ac.uk/

The University of Glasgow, charity number SC004401
___________________________________________________

From sean at seanmcollins.com Tue Dec 22 14:24:29 2009
From: sean at seanmcollins.com (Sean Collins)
Date: Tue, 22 Dec 2009 09:24:29 -0500
Subject: [EMBOSS] EMBOSS amd64 packages
Message-ID: <96FA7C41-3AED-403B-A9A7-56EA8603C049@seanmcollins.com>

Hello,

I have built amd64 packages of the latest stable EMBOSS (emboss.sourceforge.net) release for CentOS 5.4 and would like to share them with the community. I have spec files, SRPMS and RPMs.

Thank You,
Sean Collins

From belegdol at gmail.com Tue Dec 22 19:59:45 2009
From: belegdol at gmail.com (Julian Sikorski)
Date: Tue, 22 Dec 2009 20:59:45 +0100
Subject: [EMBOSS] EMBOSS amd64 packages
In-Reply-To: <96FA7C41-3AED-403B-A9A7-56EA8603C049@seanmcollins.com>
References: <96FA7C41-3AED-403B-A9A7-56EA8603C049@seanmcollins.com>
Message-ID:

On 22.12.2009 15:24, Sean Collins wrote:
> Hello,
>
> I have built amd64 packages of the latest stable EMBOSS
> (emboss.sourceforge.net) release for CentOS 5.4 and would like to share
> them with the community. I have spec files, SRPMS and RPMs.
>
> Thank You,
> Sean Collins

Hi,

I package EMBOSS for Fedora. Did you create your spec files from scratch, or did you use mine as a basis? Maybe we could join forces and you could co-maintain emboss in the EL branch? Are you a Fedora packager?

Julian

From sean at seanmcollins.com Tue Dec 22 20:38:43 2009
From: sean at seanmcollins.com (Sean Collins)
Date: Tue, 22 Dec 2009 15:38:43 -0500
Subject: [EMBOSS] EMBOSS amd64 packages
In-Reply-To:
References: <96FA7C41-3AED-403B-A9A7-56EA8603C049@seanmcollins.com>
Message-ID: <8A6FFA1B-DE98-41C6-88AD-09AE1150F935@seanmcollins.com>

Hi Julian, pleasure to speak with you.

On Dec 22, 2009, at 2:59 PM, Julian Sikorski wrote:

> I package EMBOSS for Fedora. Did you create your spec files from
> scratch, or did you use mine as a basis?

I extracted spec files from the Emboss 5.0.0 RPM on the emboss site, which has Ryan Golhar listed in the %changelog section.

> Maybe we could join forces and
> you could co-maintain emboss in the EL branch? Are you a Fedora
> packager?

I am not a Fedora packager but would be happy to become one to maintain an EL/EPEL branch of EMBOSS.

Thank You,
Sean Collins