From charles-listes-emboss at plessy.org Wed Aug 5 06:16:57 2009
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Wed, 5 Aug 2009 19:16:57 +0900
Subject: [EMBOSS] Redistribution terms of PHILIPNEW.
Message-ID: <20090805101657.GA26099@kunpuu.plessy.org>

Dear EMBOSS developers,

I am preparing a Debian package for EMBASSY's PHYLIPNEW package. The redistribution terms of Phylip itself are:

/* version 3.6. (c) Copyright 1993-2002 by the University of Washington. Written by Joseph Felsenstein, Akiko Fuseki, Sean Lamont, Andrew Keeffe, and Dan Fineman. Permission is granted to copy and use this program provided no fee is charged for it and provided that this copyright notice is not removed. */

And for its documentation:

Copyright 1986-2000 by the University of Washington. Written by Joseph Felsenstein. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.

I see that the documentation in emboss-doc is a derivative of the Phylip documentation. What are the redistribution terms for it?

For the rest of the EMBOSS-specific work, there is a hint that the license could be the GNU GPL, since this is what the COPYING file contains, but the GNU GPL does not allow linking to software that prohibits commercial use. As copyright holders, you are not yourself bound by the GPL, so this does not prevent you from distributing PHYLIPNEW, but this buggy situation makes it un-redistributable for third parties like Debian.

But maybe the license of the EMBASSY part of PHYLIPNEW is not the GNU GPL? Can you clarify?

Have a nice day,

--
Charles Plessy
Debian Med packaging team, http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

From pmr at ebi.ac.uk Wed Aug 5 06:47:45 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 05 Aug 2009 11:47:45 +0100
Subject: [EMBOSS] Redistribution terms of PHILIPNEW.
In-Reply-To: <20090805101657.GA26099@kunpuu.plessy.org>
References: <20090805101657.GA26099@kunpuu.plessy.org>
Message-ID: <4A796351.3000108@ebi.ac.uk>

Charles Plessy wrote:
> Dear EMBOSS developers,
>
> I see that the documentation in emboss-doc is a derivative of the Phylip
> documentation. What are the redistribution terms for it ?

The changes are only to conform to EMBOSS documentation style and to use EMBOSS examples. The Phylip redistribution terms apply.

> For the rest of the EMBOSS-specific work, there is a hint that the license
> could be the GNU GPL, since this is what the COPYING file contains, but the GNU
> GPL does not allow linking to software that prohibits commercial use. As
> copyright holders, you are not yourself bound by the GPL, so this does not
> prevent you from distributing PHYLIPNEW, but this buggy situation makes it
> un-redistributable for third parties like Debian.

The original licence applies. The COPYING file has been accidentally left there. We will replace it with the phylip copyright statements from the phylip-3.68 doc/main.html file (and check the other EMBASSY packages). The AUTHORS file should be completed as it is presently empty.

If you check the README file you will see the changes we made. They certainly do not change the code significantly, only the interface.

> But maybe the license of the EMBASSY part of PHYLIPNEW is not the GNU GPL? Can
> you clarify?

Definitely not GNU GPL.

regards,

Peter Rice

From biopython at maubp.freeserve.co.uk Thu Aug 6 13:28:05 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 6 Aug 2009 18:28:05 +0100
Subject: [EMBOSS] GFF/GFF2/GFF3 examples on EMBOSS webpage
Message-ID: <320fb6e00908061028m776fbf9buc56e1fb73f7e3a0b@mail.gmail.com>

Hi all,

I was just looking at this page:

http://emboss.sourceforge.net/docs/themes/SequenceFormats.html

This table lists GFF2 as one entry, and GFF/GFF3 as another.
They link to:

http://emboss.sourceforge.net/docs/themes/seqformats/gff2
and
http://emboss.sourceforge.net/docs/themes/seqformats/gff
respectively.

These examples appear to be identical (and the header says it is a GFF2 file). So I am a bit confused. Should one be a GFF3 file, and simply one file was uploaded twice by mistake?

Thanks,

Peter C.

From isabelle.wells at roche.com Tue Aug 18 04:25:41 2009
From: isabelle.wells at roche.com (Wells, Isabelle)
Date: Tue, 18 Aug 2009 10:25:41 +0200
Subject: [EMBOSS] inosine in nucleotide sequence databases
Message-ID: <6DE144B7487D104290A097EA7C0C356A016BB94889@rkamsem701.emea.roche.com>

Hi all,

Can emboss handle inosine in nucleotide sequences? We have a nucleotide file in embl format where some sequences contain inosine. Dbiflat doesn't seem to index the database properly although no error message was given and those inosine containing sequences cannot be retrieved with seqret. Any suggestions on what we could do apart from replacing inosine by X or N?

Many thanks,

Isabelle Wells

From pmr at ebi.ac.uk Tue Aug 18 06:05:05 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 18 Aug 2009 11:05:05 +0100
Subject: [EMBOSS] inosine in nucleotide sequence databases
In-Reply-To: <6DE144B7487D104290A097EA7C0C356A016BB94889@rkamsem701.emea.roche.com>
References: <6DE144B7487D104290A097EA7C0C356A016BB94889@rkamsem701.emea.roche.com>
Message-ID: <4A8A7CD1.9030700@ebi.ac.uk>

Dear Isabelle,

Wells, Isabelle wrote:
> Can emboss handle inosine in nucleotide sequences? We have a
> nucleotide file in embl format where some sequences contain inosine.
> Dbiflat doesn't seem to index the database properly although no error
> message was given and those inosine containing sequences cannot be
> retrieved with seqret. Any suggestions on what we could do apart from
> replacing inosine by X or N?

I assume your dbiflat problem is an error in retrieving the entries, unless there is some other format problem in the database that prevents entries from being recognized by the dbiflat parser. If you can send me one of the Inosine-containing entries (or a fake entry if these ones are proprietary information) I can check.

We treat Inosine as a modified base. These are usually in RNA sequences. You should replace it by X or N and if you have an EMBL format feature table you could add a modified_base feature with a /mod_base=I qualifier to mark each Inosine.

EMBOSS does nothing special with these in the current release, but you can perhaps suggest applications to use the modified base information.

Hope this helps,

Peter Rice

From xiz407 at gmail.com Tue Aug 18 11:23:45 2009
From: xiz407 at gmail.com (Zhou Xiang)
Date: Tue, 18 Aug 2009 10:23:45 -0500
Subject: [EMBOSS] can vectorstrip trim only a substring of the adapter?
Message-ID:

Hi all,

I used vectorstrip to trim the 3' adapter off the sequences. But it seemed that the program searched for the existence of the entire adapter.

For example, if I have the read:

CCCCCTTTTTAAAAAGGGGG

and the 3' adapter is:

CCAAAGGG

the program will not trim the read to

CCCCCTTTTTAA

because it does not use the substring "AAAGGG" in the adapter sequence.

Any comments about this? How can I trim only a substring of the adapter? I hope it can search for the longest match, but substring matches should also be accepted if no entire adapter is found in the sequence.

Thanks!

-Xiang

From pmr at ebi.ac.uk Tue Aug 18 12:06:43 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 18 Aug 2009 17:06:43 +0100
Subject: [EMBOSS] can vectorstrip trim only a substring of the adapter?
In-Reply-To:
References:
Message-ID: <4A8AD193.8050102@ebi.ac.uk>

Dear Zhou Xiang,

> How can i trim only a substring of the adapter?

You can use the -mismatch parameter to increase the allowed number of mismatches.
A higher percent mismatch allows less precise matching, but in this case the value needs to be set quite high (25).

We are interested in any comments on removing 3' adapters from short reads. We expect that we can find improvements in the methods used by vectorstrip. Please send us any suggestions.

regards,

Peter Rice

From Frank.Foerster at biozentrum.uni-wuerzburg.de Tue Aug 18 14:15:44 2009
From: Frank.Foerster at biozentrum.uni-wuerzburg.de (Frank Förster)
Date: Tue, 18 Aug 2009 20:15:44 +0200
Subject: [EMBOSS] Needle with penalty for end gaps
Message-ID: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de>

Hi,

In the announcement thread for the new EMBOSS version 6.1.0 there was a request by Daniel Barker for a program allowing complete global alignments including penalties for end gaps. ( http://www.mail-archive.com/emboss at lists.open-bio.org/msg01202.html )

The suggestion was to add a command line parameter to the needle program to enable/disable the penalties.

Are there any news on this topic? I have to perform a lot of pairwise global alignments (without free end gaps) and either I have to program my own software or use existing software. Needle has all needed features except the "only free end behavior". So I am really interested in getting a version of needle able to help me ;)

Thanks for the great EMBOSS package.

Regards,

Frank

--
Dipl. Biochem. Frank Förster
Department of Bioinformatics
University of Würzburg, Germany
Fon: +49 931 - 318 4555
Fax: +49 931 - 318 4552
frank.foerster at biozentrum.uni-wuerzburg.de

From pmr at ebi.ac.uk Wed Aug 19 03:26:10 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 19 Aug 2009 08:26:10 +0100
Subject: [EMBOSS] Needle with penalty for end gaps
In-Reply-To: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de>
References: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de>
Message-ID: <4A8BA912.4080904@ebi.ac.uk>

Frank Förster wrote:
> Are there any news on this topic?
> I have to perform a lot of pairwise
> global alignments (without free end gaps) and either I have to program
> my own software or use existing software. Needle has all needed
> features except the "only free end behavior". So I am really interested
> in getting a version of needle able to help me ;)

We are just coming to the end of our "40 days and 40 nights" since the release when we try not to break anything by making changes - and while we work on finishing the book texts which is really what has kept us busy.

We will get on to this next week (the 40 days runs out on Monday 24th :-) and can give you an early version to try.

> Thanks for the great EMBOSS package.

Thanks for the very welcome thanks!

regards,

Peter

From Frank.Foerster at biozentrum.uni-wuerzburg.de Wed Aug 19 03:29:19 2009
From: Frank.Foerster at biozentrum.uni-wuerzburg.de (Frank Förster)
Date: Wed, 19 Aug 2009 09:29:19 +0200
Subject: [EMBOSS] Needle with penalty for end gaps
In-Reply-To: <4A8BA912.4080904@ebi.ac.uk>
References: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de> <4A8BA912.4080904@ebi.ac.uk>
Message-ID: <4A8BA9CF.5070604@biozentrum.uni-wuerzburg.de>

Dear Peter,

thank you for your fast reply.

> We will get on to this next week (the 40 days runs out on Monday 24th
> :-) and can give you an early version to try.

This sounds very kind of you. I can hardly wait but I will ;)

Regards,

Frank

--
Dipl. Biochem. Frank Förster
Department of Bioinformatics
University of Würzburg, Germany
Fon: +49 931 - 318 4555
Fax: +49 931 - 318 4552
frank.foerster at biozentrum.uni-wuerzburg.de

From biopython at maubp.freeserve.co.uk Wed Aug 19 07:08:26 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 19 Aug 2009 12:08:26 +0100
Subject: [EMBOSS] vectorstrip on FASTQ files
Message-ID: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com>

Hi,

I'm trying to use vectorstrip on FASTQ files (as a simple way to remove adaptor or primer sequences).
However, it seems that on output the FASTQ qualities are missing (all set to the double quote, ASCII 34, meaning PHRED quality 1 or random). Is this a known bug (or rather, a missing feature)?

For illustration I am using a Sanger style FASTQ file from the NCBI SRA (short reads originally from Solexa/Illumina), SRR014849.fastq which you can download from ftp://ftp.ncbi.nlm.nih.gov/sra/static/SRX003/SRX003639/SRR014849.fastq.gz

I am pretending "GTTGGAACCG" is a 5' adaptor sequence, and want to find any matches in some FASTQ reads, and trim it off taking only the sequence to the right. For simplicity I'm allowing no mismatches.

Here is the start of the file:

$ head -n 12 SRR014849.fastq
@SRR014849.1 EIXKN4201CFU84 length=93
GGGGGGGGGGGGGGGGCTTTTTTTGTTTGGAACCGAAAGGGTTTTGAATTTCAAACCCTTTTCGGTTTCCAACCTTCCAAAGCAATGCCAATA
+SRR014849.1 EIXKN4201CFU84 length=93
3+&$#"""""""""""7F at 71,'";C?,B;?6B;:EA1EA1EA5'9B:?:#9EA0D at 2EA5':>5?:%A;A8A;?9B;D@/=5B;4B>+C?,EA09B;@;9E@/EA/E@/B:;1B:B:;A9<5SRR014849.9_from_31_to_84 EIXKN4201AL42E length=84 AAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA

Using Sanger FASTQ runs:

$ vectorstrip -sequence SRR014849.fastq -sformat fastq-sanger -readfile N -alinker "GTTGGAACCG" -blinker "" -osformat fastq-sanger -outseq SRR014849_5trimmed.fastq -mismatch 0 -besthits Y -outfile SRR014849_5trimmed.txt
Removes vectors from the ends of nucleotide sequence(s)

But the output is missing the quality scores:

$ head -n 4 SRR014849_5trimmed.fastq
@SRR014849.9_from_31_to_84 EIXKN4201AL42E length=84
AAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA
+
""""""""""""""""""""""""""""""""""""""""""""""""""""""

Is this something simple to add to vectorstrip? What about other annotation (e.g. running vectorstrip on annotated GenBank or EMBL files)?

Thanks,

Peter C.

P.S. This is with EMBOSS 6.1.0 with a patch from Peter Rice, running on Mac OS X.
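
A minimal sketch of the behaviour requested above, written in Python rather than taken from EMBOSS: it removes a 5' adaptor by exact match (no mismatches, as in the example run) and slices the quality string at the same position so that bases and qualities stay in step. The function name `trim_5prime` and the example read are hypothetical; this is not how vectorstrip is implemented.

```python
def trim_5prime(seq, qual, adapter):
    """Trim a 5' adaptor from one FASTQ record.

    Returns (trimmed_seq, trimmed_qual), keeping only what lies to the
    right of the first exact adaptor match, or None if the adaptor is
    not found in the read.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    pos = seq.find(adapter)          # exact match only
    if pos < 0:
        return None
    cut = pos + len(adapter)         # first base after the adaptor
    # Slice both strings at the same index so each base keeps its score.
    return seq[cut:], qual[cut:]


# Hypothetical read: 4 leading bases, the adaptor, then 6 bases to keep.
read_seq = "ACGT" + "GTTGGAACCG" + "AAAGGG"
read_qual = "IIII" + "5555555555" + "ABCDEF"
result = trim_5prime(read_seq, read_qual, "GTTGGAACCG")
```

Here `result` holds the six retained bases together with their original six quality characters, instead of a line of placeholder double quotes.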

From pmr at ebi.ac.uk Wed Aug 19 07:24:41 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 19 Aug 2009 12:24:41 +0100
Subject: [EMBOSS] vectorstrip on FASTQ files
In-Reply-To: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com>
References: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com>
Message-ID: <4A8BE0F9.2020404@ebi.ac.uk>

Peter C. wrote:
> Hi,
>
> I'm trying to use vectorstrip on FASTQ files (as a simple way to
> remove adaptor or primer sequences). However, it seems that on output
> the FASTQ qualities are missing (all set to the double quote, ASCII
> 34, meaning PHRED quality 1 or random). Is this a known bug (or
> rather, a missing feature)?

It is a missing feature. vectorstrip was written before quality scores became fashionable and, curiously, nobody has asked for them before.

We will certainly retain them in a future release.

regards,

Peter

From biopython at maubp.freeserve.co.uk Wed Aug 19 07:31:17 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 19 Aug 2009 12:31:17 +0100
Subject: [EMBOSS] vectorstrip on FASTQ files
In-Reply-To: <4A8BE0F9.2020404@ebi.ac.uk>
References: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com> <4A8BE0F9.2020404@ebi.ac.uk>
Message-ID: <320fb6e00908190431i23a27ed7g46cf9223b191d5f5@mail.gmail.com>

Peter Rice wrote:
>
> Peter C. wrote:
>> Hi,
>>
>> I'm trying to use vectorstrip on FASTQ files (as a simple way to
>> remove adaptor or primer sequences). However, it seems that on output
>> the FASTQ qualities are missing (all set to the double quote, ASCII
>> 34, meaning PHRED quality 1 or random). Is this a known bug (or
>> rather, a missing feature)?
>
> It is a missing feature. vectorstrip was written before quality scores
> became fashionable and, curiously, nobody has asked for them before.
>
> We will certainly retain them in a future release.

Great - thanks!

Peter C.
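
For reference, the quality characters discussed in this thread can be decoded with a few lines of Python. This sketch assumes the Sanger FASTQ convention (PHRED score = ASCII code minus 33); the helper names are made up for illustration and are not part of EMBOSS.

```python
SANGER_OFFSET = 33  # Sanger-style FASTQ: quality character = chr(PHRED + 33)


def phred_scores(quality_line, offset=SANGER_OFFSET):
    """Decode a FASTQ quality line into a list of integer PHRED scores."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Convert a PHRED score to the estimated probability of a wrong base."""
    return 10 ** (-q / 10)


# '!' is ASCII 33, the double quote is ASCII 34, 'I' is ASCII 73.
scores = phred_scores('!"I')
p_wrong = error_probability(scores[1])
```

Under that convention the double quote character (ASCII 34) decodes to PHRED 1, an error probability of roughly 0.79, which is why a line of double quotes carries essentially no quality information.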

From frank.foerster at biozentrum.uni-wuerzburg.de Mon Aug 24 05:01:49 2009
From: frank.foerster at biozentrum.uni-wuerzburg.de (Frank Förster)
Date: Mon, 24 Aug 2009 11:01:49 +0200
Subject: [EMBOSS] Gap cost restrictions for needle/water/stretcher?
Message-ID: <4A9256FD.8070708@biozentrum.uni-wuerzburg.de>

Hi,

I have only one question about the allowed gap costs in several programs. I am using needle, water and stretcher for example.

There are some restrictions to the gap costs I have to use:

1) needle: float from 0-100 for gapopen and 0-10 for gapextend
2) water: float from 0.000-10.000 for gapopen and 0.000-10.000 for gapextend
3) stretcher: positive integer

What is the meaning of these restrictions? I think you use an integer value for stretcher (I did not check the source code) and floats for needle/water.

But why the restriction for water to three decimal places?

But more interesting, why the restriction to 0-100/0-10 for needle/water?

Thank you for your efforts!

Frank Förster

--
Dipl. Biochem. Frank Förster
Department of Bioinformatics
University of Würzburg, Germany
Fon: +49 931 - 318 4555
Fax: +49 931 - 318 4552
frank.foerster at biozentrum.uni-wuerzburg.de

From pmr at ebi.ac.uk Mon Aug 24 07:52:48 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 24 Aug 2009 12:52:48 +0100
Subject: [EMBOSS] Gap cost restrictions for needle/water/stretcher?
In-Reply-To: <4A9256FD.8070708@biozentrum.uni-wuerzburg.de>
References: <4A9256FD.8070708@biozentrum.uni-wuerzburg.de>
Message-ID: <4A927F10.1000701@ebi.ac.uk>

Frank Förster wrote:
> Hi,
>
> I have only one question about the allowed gap costs in several
> programs. I am using needle, water and stretcher for example.
>
> There are some restrictions to the gap costs I have to use:
>
> 1) needle: float from 0-100 for gapopen and 0-10 for gapextend
> 2) water: float from 0.000-10.000 for gapopen and 0.000-10.000 for
> gapextend
> 3) stretcher: positive integer
>
> What is the meaning of these restrictions? I think you use an integer
> value for stretcher (I did not check the source code) and floats for
> needle/water.

Stretcher and matcher were imported code that used integer values for speed. Our matrix files use integer values so we can use integers or floats as gap penalty values.

> But why the restriction for water to three decimal places?

There is no 3 decimal places restriction, we only use 3 decimal places to write out the values.

> But more interesting, why the restriction to 0-100/0-10 for needle/water?

We set limits for needle and water with the first release of EMBOSS and nobody has asked for a higher value. Zero is useful for some cases, either to not penalise the number of gaps (for example a large number of single base gaps in a single nucleotide read) or to not penalise the gap length (genomic sequence aligned to mRNA/cDNA). The upper limits are enough for the cases we have seen.

More interesting is why we have no upper limit for stretcher and matcher. We should be consistent. These were third-party applications (from Bill Pearson's fasta2 package) that we imported.

Does anyone object to setting the same gap penalty limits for all applications?

Can anyone think of a use case that needs a larger maximum value?

We can add applications to suggest gap penalties for each matrix file ... or store default values in the files. Is this useful?

regards,

Peter Rice

From pmr at ebi.ac.uk Tue Aug 25 06:59:35 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 25 Aug 2009 11:59:35 +0100
Subject: [EMBOSS] EMBOSS patch 1-2 for 6.1.0
Message-ID: <4A93C417.2070502@ebi.ac.uk>

A patch for EMBOSS 6.1.0 is on the FTP server.
This fixes a problem with reading the new UniProt/SwissProt description line. The bug is in extending strings within lists, so it may have had other effects. We recommend patching your EMBOSS installations as several of the new format SwissProt entries are unreadable.

The files are on our FTP server ftp://emboss.open-bio.org/pub/EMBOSS/fixes with a patch file and instructions in the patches subdirectory.

Fix 2.
EMBOSS-6.1.0/ajax/ajmem.h
EMBOSS-6.1.0/ajax/ajstr.c
EMBOSS-6.1.0/ajax/ajstr.h
EMBOSS-6.1.0/nucleus/embaln.c

24-Aug-2009: Fix string extension so that pointers in lists remain valid. This fixes a bug in processing SwissProt complex descriptions. Fix definition of AJRESIZE0 macro. Fix processing of first match in a prophet profile alignment.

regards,

Peter Rice

From CAPS at novozymes.com Tue Aug 25 07:56:34 2009
From: CAPS at novozymes.com (CAPS (Carsten P. Sönksen))
Date: Tue, 25 Aug 2009 13:56:34 +0200
Subject: [EMBOSS] Pepstats "Molecular weight" calculations
Message-ID: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net>

Hi

We are using pepstats for molecular weight calculations and subsequent comparison with masses determined by mass spectrometry. I am looking for the mass table used for the molecular weight calculations of the proteins in order to determine the accuracy, and for how it could be changed.

The other question is implementation of a molecular weight assuming that the cysteines form disulfide bridges. This question is related to my first line. Since we compare the intact molecular weight of the proteins we want to be as precise as possible and thus measure the difference between reduced and oxidized cysteine residues. Most proteins with cysteine residues form disulfide bridges. Would it be possible to include a molecular weight calculation which takes disulfide bridges into account?
So that an even number of cysteines is calculated with the mass of oxidized cysteines (S-S), and if there should be a single cysteine left then it is calculated with a sulfhydryl group (SH)?

Best Regards

Carsten P. Sönksen
Senior Scientist
Novozymes A/S
Krogshoejvej 36
2880 Bagsvaerd Denmark
Phone: +45 44461123
Mobile: +45 30771123
E-mail: caps at novozymes.com

Novozymes A/S (reg. no.: 10007127). Registered address: Krogshoejvej 36 DK-2880 Bagsvaerd, Denmark

This e-mail (including any attachments) is for the intended addressee(s) only and may contain confidential and/or proprietary information protected by law. You are hereby notified that any unauthorized reading, disclosure, copying or distribution of this e-mail or use of information herein is strictly prohibited. If you are not an intended recipient you should delete this e-mail immediately. Thank you.

From pmr at ebi.ac.uk Tue Aug 25 08:56:30 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 25 Aug 2009 13:56:30 +0100
Subject: [EMBOSS] Pepstats "Molecular weight" calculations
In-Reply-To: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net>
References: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net>
Message-ID: <4A93DF7E.4030305@ebi.ac.uk>

CAPS (Carsten P. Sönksen) wrote:
> Hi
> We are using Pepstat for molecular weight calculations and subsequent
> comparison with mass spectrometric determined masses. I am looking for
> the mass table used for the molecular weight calculations of the
> proteins in order to determine the accuracy. And how it could be
> possible to change it.

The table is in a file called Emolwt.dat.

This should be included in the local data files section of the pepstats documentation. We will add it. It is at least mentioned in the -help output and in the command line section of the documentation. The local data files section should describe the file in more detail.

A copy in your local directory (embossdata -fetch will copy the EMBOSS version for you) will be used in preference to the installed copy.

> The other question is implementation of a molecular weight assuming that
> the cysteins form disulfide bridges. This question is related to my
> first line. Since we compare the intact molecular weight of the proteins
> we want to be as precise as possible and thus measure the difference
> between reduced and oxidized cystein residues. Most proteins with
> cystein residues form disulfide bridges. Would it be possible to include
> a molecular weight calculation which takes disulfide bridges into
> account? So that an even nr of cysteins are calculated with the mass of
> oxidized cysteins (S-S) and if there should be an single cystein left
> then it is calculated with a sulfhydryl group (SH)?

Good suggestion. We can add that for the next release. We would add an option for the number of S-S bridges and adjust the molecular weight. We have a similar option already for iep.

Is there a need for single cysteines to allow for inter-chain disulphide bridges?

Are there any other adjustments you would like?

regards,

Peter Rice

From CAPS at novozymes.com Tue Aug 25 09:46:28 2009
From: CAPS at novozymes.com (CAPS (Carsten P. Sönksen))
Date: Tue, 25 Aug 2009 15:46:28 +0200
Subject: [EMBOSS] Pepstats "Molecular weight" calculations
In-Reply-To: <4A93DF7E.4030305@ebi.ac.uk>
References: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net> <4A93DF7E.4030305@ebi.ac.uk>
Message-ID: <4D0464992D73D44A93400D893E2A2C763C01848E4B@NZT0013E.dknz.nzcorp.net>

Hi Peter,

Thanks a lot for the fast and positive reply.

Regarding the molecular weight calculation including the disulfide bridges: would it be possible to have an option so that pepstats always calculates the molecular weight for the highest possible number of disulfide bridges, and if there is a single cysteine left then this one is calculated with a sulfhydryl group? This option would also be nice for the iep calculation.

"Is there a need for single cysteines to allow for inter-chain disulphide bridges?"

Not currently. I believe that would take us to a level where human interaction is needed.

Right now I have no further adjustments in mind.

Do you have an estimated time range when I can expect the next release?

Best Regards

Carsten P. Sönksen
Senior Scientist
Novozymes A/S
Krogshoejvej 36
2880 Bagsvaerd Denmark
Phone: +45 44461123
Mobile: +45 30771123
E-mail: caps at novozymes.com

From gbottu at vub.ac.be Tue Aug 25 12:06:44 2009
From: gbottu at vub.ac.be (Guy Bottu)
Date: Tue, 25 Aug 2009 18:06:44 +0200
Subject: [EMBOSS] wrappers4EMBOSS 2.3.0 released
Message-ID: <4A940C14.5070705@vub.ac.be>

Dear users of wrappers4EMBOSS,

This mail concerns you if you are using or intend to use wrappers4EMBOSS with one of the following : EMBOSS 6.1.0, MRS 4, PhyML 3, CLUSTAL 2, InterProScan 4.5, EBI fastA access through Web Services.
You might be interested to upgrade for one of the following reasons : - We support all EMBOSS versions from 3.0.0 to 6.1.0 (it was necessary to take account of the fact that MYEMBOSS can use "source" as well as "src" as directory name and that EMBOSS 6.1.0 requests to have parameter names that are unique in the first 6 characters). - We support MRS version 4 as well as version 3. - We have abandoned support for PhyML version 2 in favour of version 3. The wrapper for ModelGenerator has been modified accordingly in order to automatically start PhyML with a model generated by ModelGenerator, using not anymore the script generated by ModelGenerator itself (it is for version 2) but instead a Perl script that parses the ModelGenerator output. The user can choose whether to use the model selected according to Akaike, modified Akaike or Bayesian information criterion. - We support the new optional features introduced in CLUSTAL version 2 (using UPGMA instead of NJ, not using sequence weights, improving the alignment by iterative re-alignment). - The module for InterProScan works with version 4.5 and has HAMAP in its menu. - The list of databank names in ebi_fasta has been adapted to the recent situation on the server. Guy Bottu, wEMBOSS development team From charles-listes-emboss at plessy.org Wed Aug 5 10:16:57 2009 From: charles-listes-emboss at plessy.org (Charles Plessy) Date: Wed, 5 Aug 2009 19:16:57 +0900 Subject: [EMBOSS] Redistribution terms of PHILIPNEW. Message-ID: <20090805101657.GA26099@kunpuu.plessy.org> Dear EMBOSS developers, I am preparing a Debian package for EMBASSY?s PHILIPNEW package. The redistribution terms of Phyilp itself are: /* version 3.6. (c) Copyright 1993-2002 by the University of Washington. Written by Joseph Felsenstein, Akiko Fuseki, Sean Lamont, Andrew Keeffe, and Dan Fineman. Permission is granted to copy and use this program provided no fee is charged for it and provided that this copyright notice is not removed. 
*/ And for its documentation: Copyright 1986-2000 by the University of Washington. Written by Joseph Felsenstein. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed. I see that the documentation in emboss-doc is a derivative of the Phylip documentation. What are the redistribution terms for it ? For the rest of the EMBOSS-specific work, there is a hint that the license could be the GNU GPL, since this is what the COPYING file contains, but the GNU GPL does not allow linking to software that prohibits commercial use. As copyright holders, you are not yourself bound by the GPL, so this does not prevent you from distributing PHYLIPNEW, but this buggy situation makes it un-redistributable for third parties like Debian. But maybe the license of the EMBASSY part of PHYLIPNEW is not the GNU GPL? Can you clarify? Have a nice day, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From pmr at ebi.ac.uk Wed Aug 5 10:47:45 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 05 Aug 2009 11:47:45 +0100 Subject: [EMBOSS] Redistribution terms of PHILIPNEW. In-Reply-To: <20090805101657.GA26099@kunpuu.plessy.org> References: <20090805101657.GA26099@kunpuu.plessy.org> Message-ID: <4A796351.3000108@ebi.ac.uk> Charles Plessy wrote: > Dear EMBOSS developers, > > I see that the documentation in emboss-doc is a derivative of the Phylip > documentation. What are the redistribution terms for it ? The changes are only to conform to EMBOSS documentation style and to use EMBOSS examples. The Phylip redistribution terms apply. > For the rest of the EMBOSS-specific work, there is a hint that the license > could be the GNU GPL, since this is what the COPYING file contains, but the GNU > GPL does not allow linking to software that prohibits commercial use. 
As > copyright holders, you are not yourself bound by the GPL, so this does not > prevent you from distributing PHYLIPNEW, but this buggy situation makes it > un-redistributable for third parties like Debian. The original licence applies. The COPYING file has been accidentally left there. We will replace it with the phylip copyright statements from the phylip-3.68 doc/main.html file (and check the other EMBASSY packages). The AUTHORS file should be completed as it is presently empty. If you check the README file you will see the changes we made. They certainly do not change the code significantly, only the interface. > But maybe the license of the EMBASSY part of PHYLIPNEW is not the GNU GPL? Can > you clarify? Definitely not GNU GPL. regards, Peter Rice From biopython at maubp.freeserve.co.uk Thu Aug 6 17:28:05 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 6 Aug 2009 18:28:05 +0100 Subject: [EMBOSS] GFF/GFF2/GFF3 examples on EMBOSS webpage Message-ID: <320fb6e00908061028m776fbf9buc56e1fb73f7e3a0b@mail.gmail.com> Hi all, I was just looking at this page: http://emboss.sourceforge.net/docs/themes/SequenceFormats.html This table lists GFF2 as one entry, and GFF/GFF3 as another. They link to: http://emboss.sourceforge.net/docs/themes/seqformats/gff2 and http://emboss.sourceforge.net/docs/themes/seqformats/gff respectively. These examples appear to be identical (and the header says it is a GFF2 file). So I am a bit confused. Should one be a GFF3 file, and simply one file was uploaded twice by mistake? Thanks, Peter C. From isabelle.wells at roche.com Tue Aug 18 08:25:41 2009 From: isabelle.wells at roche.com (Wells, Isabelle) Date: Tue, 18 Aug 2009 10:25:41 +0200 Subject: [EMBOSS] inosine in nucleotide sequence databases Message-ID: <6DE144B7487D104290A097EA7C0C356A016BB94889@rkamsem701.emea.roche.com> Hi all, Can emboss handle inosine in nucleotide sequences? We have a nucleotide file in embl format where some sequences contain inosine.
Dbiflat doesn't seem to index the database properly although no error message was given and those inosine containing sequences cannot be retrieved with seqret. Any suggestions on what we could do apart from replacing inosine by X or N? Many thanks, Isabelle Wells From pmr at ebi.ac.uk Tue Aug 18 10:05:05 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 18 Aug 2009 11:05:05 +0100 Subject: [EMBOSS] inosine in nucleotide sequence databases In-Reply-To: <6DE144B7487D104290A097EA7C0C356A016BB94889@rkamsem701.emea.roche.com> References: <6DE144B7487D104290A097EA7C0C356A016BB94889@rkamsem701.emea.roche.com> Message-ID: <4A8A7CD1.9030700@ebi.ac.uk> Dear Isabelle, Wells, Isabelle wrote: > Can emboss handle inosine in nucleotide sequences? We have a > nucleotide file in embl format where some sequences contain inosine. > Dbiflat doesn't seem to index the database properly although no error > message was given and those inosine containing sequences cannot be > retrieved with seqret. Any suggestions on what we could do apart from > replacing inosine by X or N? I assume your dbiflat problem is an error in retrieving the entries, unless there is some other format problem in the database that prevents entries from being recognized by the dbiflat parser. If you can send me one of the Inosine-containing entries (or a fake entry if these ones are proprietary information) I can check. We treat Inosine as a modified base. These are usually in RNA sequences. You should replace it by X or N and if you have an EMBL format feature table you could add a modified_base feature with a /mod_base=I qualifier to mark each Inosine. EMBOSS does nothing special with these in the current release, but you can perhaps suggest applications to use the modified base information. Hope this helps, Peter Rice From xiz407 at gmail.com Tue Aug 18 15:23:45 2009 From: xiz407 at gmail.com (Zhou Xiang) Date: Tue, 18 Aug 2009 10:23:45 -0500 Subject: [EMBOSS] can vectorstrip trim only a substring of the adapter?
Message-ID: Hi all, I used the vectorstrip to trim the 3' adapter off the sequences. But it seemed that the program searched for the existence of the entire adapter. For example, if I have the read: CCCCCTTTTTAAAAAGGGGG And 3' adapter is: CCAAAGGG The program will not trim the read to CCCCCTTTTTAA Because it does not use the substring "AAAGGG" in the adapter sequence. Any comments about this? How can I trim only a substring of the adapter? I hope it can search for the longest match, but substring matches should also be accepted if no entire adapter is found in the sequence. Thanks! -Xiang From pmr at ebi.ac.uk Tue Aug 18 16:06:43 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 18 Aug 2009 17:06:43 +0100 Subject: [EMBOSS] can vectorstrip trim only a substring of the adapter? In-Reply-To: References: Message-ID: <4A8AD193.8050102@ebi.ac.uk> Dear Zhou Xiang, > How can i trim only a substring of the adapter? You can use the -mismatch parameter to increase the allowed number of mismatches. A higher percent mismatch allows less precise matching, but in this case the value needs to be set quite high (25). We are interested in any comments on removing 3' adapters from short reads. We expect that we can find improvements in the methods used by vectorstrip. Please send us any suggestions. regards, Peter Rice From Frank.Foerster at biozentrum.uni-wuerzburg.de Tue Aug 18 18:15:44 2009 From: Frank.Foerster at biozentrum.uni-wuerzburg.de (Frank Förster) Date: Tue, 18 Aug 2009 20:15:44 +0200 Subject: [EMBOSS] Needle with penalty for end gaps Message-ID: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de> Hi, in the announcement thread for the new EMBOSS version 6.1.0 there was a request by Daniel Barker for a program allowing complete global alignments including penalties for end gaps ( http://www.mail-archive.com/emboss@lists.open-bio.org/msg01202.html ). The suggestion was to add a command line parameter to the needle program to enable/disable the penalties.
Is there any news on this topic? I have to perform a lot of pairwise global alignments (without free end gaps) and either I have to program my own software or use existing software. Needle owns all needed features except the "only free end behavior". So I am really interested in getting a version of needle able to help me ;) Thanks for the great EMBOSS package. Regards, Frank -- Dipl. Biochem. Frank Förster Department of Bioinformatics University of Würzburg, Germany Fon: +49 931 - 318 4555 Fax: +49 931 - 318 4552 frank.foerster at biozentrum.uni-wuerzburg.de From pmr at ebi.ac.uk Wed Aug 19 07:26:10 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 19 Aug 2009 08:26:10 +0100 Subject: [EMBOSS] Needle with penalty for end gaps In-Reply-To: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de> References: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de> Message-ID: <4A8BA912.4080904@ebi.ac.uk> Frank Förster wrote: > Is there any news on this topic? I have to perform a lot of pairwise > global alignments (without free end gaps) and either I have to program > my own software or use existing software. Needle owns all needed > features except the "only free end behavior". So I am really interested > in getting a version of needle able to help me ;) We are just coming to the end of our "40 days and 40 nights" since the release when we try not to break anything by making changes - and while we work on finishing the book texts which is really what has kept us busy. We will get on to this next week (the 40 days runs out on Monday 24th :-) and can give you an early version to try. > Thanks for the great EMBOSS package. Thanks for the very welcome thanks!
regards, Peter From Frank.Foerster at biozentrum.uni-wuerzburg.de Wed Aug 19 07:29:19 2009 From: Frank.Foerster at biozentrum.uni-wuerzburg.de (Frank Förster) Date: Wed, 19 Aug 2009 09:29:19 +0200 Subject: [EMBOSS] Needle with penalty for end gaps In-Reply-To: <4A8BA912.4080904@ebi.ac.uk> References: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de> <4A8BA912.4080904@ebi.ac.uk> Message-ID: <4A8BA9CF.5070604@biozentrum.uni-wuerzburg.de> Dear Peter, thank you for your fast reply. > We will get on to this next week (the 40 days runs out on Monday 24th > :-) and can give you an early version to try. This sounds very kind of you. I can hardly wait but I will ;) Regards, Frank -- Dipl. Biochem. Frank Förster Department of Bioinformatics University of Würzburg, Germany Fon: +49 931 - 318 4555 Fax: +49 931 - 318 4552 frank.foerster at biozentrum.uni-wuerzburg.de From biopython at maubp.freeserve.co.uk Wed Aug 19 11:08:26 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 19 Aug 2009 12:08:26 +0100 Subject: [EMBOSS] vectorstrip on FASTQ files Message-ID: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com> Hi, I'm trying to use vectorstrip on FASTQ files (as a simple way to remove adaptor or primer sequences). However, it seems that on output the FASTQ qualities are missing (all set to the double quote, ASCII 34, meaning PHRED quality 1 or random). Is this a known bug (or rather, a missing feature)? For illustration I am using a Sanger style FASTQ file from the NCBI SRA (short reads originally from Solexa/Illumina), SRR014849.fastq which you can download from ftp://ftp.ncbi.nlm.nih.gov/sra/static/SRX003/SRX003639/SRR014849.fastq.gz I am pretending "GTTGGAACCG" is 5' adaptor sequence, and want to find any matches in some FASTQ reads, and trim it off taking only the sequence to the right. For simplicity I'm allowing no mismatches.
Here is the start of the file: $ head -n 12 SRR014849.fastq @SRR014849.1 EIXKN4201CFU84 length=93 GGGGGGGGGGGGGGGGCTTTTTTTGTTTGGAACCGAAAGGGTTTTGAATTTCAAACCCTTTTCGGTTTCCAACCTTCCAAAGCAATGCCAATA +SRR014849.1 EIXKN4201CFU84 length=93 3+&$#"""""""""""7F@71,'";C?,B;?6B;:EA1EA1EA5'9B:?:#9EA0D@2EA5':>5?:%A;A8A;?9B;D@/=5B;4B>+C?,EA09B;@;9E@/EA/E@/B:;1B:B:;A9<5SRR014849.9_from_31_to_84 EIXKN4201AL42E length=84 AAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA Using Sanger FASTQ runs: $ vectorstrip -sequence SRR014849.fastq -sformat fastq-sanger -readfile N -alinker "GTTGGAACCG" -blinker "" -osformat fastq-sanger -outseq SRR014849_5trimmed.fastq -mismatch 0 -besthits Y -outfile SRR014849_5trimmed.txt Removes vectors from the ends of nucleotide sequence(s) But the output is missing the quality scores: $ head -n 4 SRR014849_5trimmed.fastq @SRR014849.9_from_31_to_84 EIXKN4201AL42E length=84 AAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA + """""""""""""""""""""""""""""""""""""""""""""""""""""" Is this something simple to add to vectorstrip? What about other annotation (e.g. running vector strip on annotated GenBank or EMBL files)? Thanks, Peter C. P.S. This is with EMBOSS 6.1.0 with a patch from Peter Rice, running on Mac OS X. From pmr at ebi.ac.uk Wed Aug 19 11:24:41 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 19 Aug 2009 12:24:41 +0100 Subject: [EMBOSS] vectorstrip on FASTQ files In-Reply-To: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com> References: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com> Message-ID: <4A8BE0F9.2020404@ebi.ac.uk> Peter C. wrote: > Hi, > > I'm trying to use vectorstrip on FASTQ files (as a simple way to > remove adaptor or primer sequences). However, it seems that on output > the FASTQ qualities are missing (all set to the double quote, ASCII > 34, meaning PHRED quality 1 or random). Is this a known bug (or > rather, a missing feature)? It is a missing feature.
vectorstrip was written before quality scores became fashionable and, curiously, nobody has asked for them before. We will certainly retain them in a future release. regards, Peter From biopython at maubp.freeserve.co.uk Wed Aug 19 11:31:17 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 19 Aug 2009 12:31:17 +0100 Subject: [EMBOSS] vectorstrip on FASTQ files In-Reply-To: <4A8BE0F9.2020404@ebi.ac.uk> References: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com> <4A8BE0F9.2020404@ebi.ac.uk> Message-ID: <320fb6e00908190431i23a27ed7g46cf9223b191d5f5@mail.gmail.com> Peter Rice wrote: > > Peter C. wrote: >> Hi, >> >> I'm trying to use vectorstrip on FASTQ files (as a simple way to >> remove adaptor or primer sequences). However, it seems that on output >> the FASTQ qualities are missing (all set to the double quote, ASCII >> 34, meaning PHRED quality 1 or random). Is this a known bug (or >> rather, a missing feature)? > > It is a missing feature. vectorstrip was written before quality scores > became fashionable and, curiously, nobody has asked for them before. > > We will certainly retain them in a future release. Great - thanks! Peter C. From frank.foerster at biozentrum.uni-wuerzburg.de Mon Aug 24 09:01:49 2009 From: frank.foerster at biozentrum.uni-wuerzburg.de (Frank Förster) Date: Mon, 24 Aug 2009 11:01:49 +0200 Subject: [EMBOSS] Gap cost restrictions for needle/water/stretcher? Message-ID: <4A9256FD.8070708@biozentrum.uni-wuerzburg.de> Hi, I have only one question about the allowed gap costs in several programs. I am using needle, water and stretcher for example. There are some restrictions to the gap costs I have to use: 1) needle: float from 0-100 for gapopen and 0-10 for gapextend 2) water: float from 0.000-10.000 for gapopen and 0.000-10.000 for gapextend 3) stretcher: positive integer What is the meaning of these restrictions?
I think you use an integer value for stretcher (I did not check the source code) and floats for needle/water. But why the restriction for water to three decimal places? But more interesting, why the restriction to 0-100/0-10 for needle/water? Thank you for your efforts! Frank Förster -- Dipl. Biochem. Frank Förster Department of Bioinformatics University of Würzburg, Germany Fon: +49 931 - 318 4555 Fax: +49 931 - 318 4552 frank.foerster at biozentrum.uni-wuerzburg.de From pmr at ebi.ac.uk Mon Aug 24 11:52:48 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 24 Aug 2009 12:52:48 +0100 Subject: [EMBOSS] Gap cost restrictions for needle/water/stretcher? In-Reply-To: <4A9256FD.8070708@biozentrum.uni-wuerzburg.de> References: <4A9256FD.8070708@biozentrum.uni-wuerzburg.de> Message-ID: <4A927F10.1000701@ebi.ac.uk> Frank Förster wrote: > Hi, > > I have only one question about the allowed gap costs in several > programs. I am using needle, water and stretcher for example. > > There are some restrictions to the gap costs I have to use: > > 1) needle: float from 0-100 for gapopen and 0-10 for gapextend > 2) water: float from 0.000-10.000 for gapopen and 0.000-10.000 for > gapextend > 3) stretcher: positive integer > > What is the meaning of these restrictions? I think you use an integer > value for stretcher (I did not check the source code) and floats for > needle/water. Stretcher and matcher were imported code that used integer values for speed. Our matrix files use integer values so we can use integer or floats as gap penalty values. > But why the restriction for water to three decimal places? There is no 3 decimal places restriction, we only use 3 decimal places to write out the values. > But more interesting, why the restriction to 0-100/0-10 for needle/water? We set limits for needle and water with the first release of EMBOSS and nobody has asked for a higher value.
Zero is useful for some cases, either to not penalise the number of gaps (for example a large number of single base gaps in a single nucleotide read) or to not penalise the gap length (genomic sequence aligned to mRNA/cDNA). The upper limits are enough for the cases we have seen. More interesting is why we have no upper limit for stretcher and matcher. We should be consistent. These were third-party applications (from Bill Pearson's fasta2 package) that we imported. Does anyone object to setting the same gap penalty limits for all applications? Can anyone think of a use case that needs a larger maximum value? We can add applications to suggest gap penalties for each matrix file ... or store default values in the files. Is this useful? regards, Peter Rice From pmr at ebi.ac.uk Tue Aug 25 10:59:35 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 25 Aug 2009 11:59:35 +0100 Subject: [EMBOSS] EMBOSS patch 1-2 for 6.1.0 Message-ID: <4A93C417.2070502@ebi.ac.uk> A patch for EMBOSS 6.1.0 is on the FTP server. This fixes a problem with reading the new UniProt/SwissProt description line. The bug is in extending strings within lists, so it may have had other effects. We recommend patching your EMBOSS installations as several of the new format SwissProt entries are unreadable. The files are on our FTP server ftp://emboss.open-bio.org/pub/EMBOSS/fixes with a patch file and instructions in the patches subdirectory. Fix 2. EMBOSS-6.1.0/ajax/ajmem.h EMBOSS-6.1.0/ajax/ajstr.c EMBOSS-6.1.0/ajax/ajstr.h EMBOSS-6.1.0/nucleus/embaln.c 24-Aug-2009: Fix string extension so that pointers in lists remain valid. This fixes a bug in processing SwissProt complex descriptions. Fix definition of AJRESIZE0 macro.
Fix processing of first match in a prophet profile alignment regards, Peter Rice From CAPS at novozymes.com Tue Aug 25 11:56:34 2009 From: CAPS at novozymes.com (CAPS (Carsten P. Sönksen)) Date: Tue, 25 Aug 2009 13:56:34 +0200 Subject: [EMBOSS] Pepstats "Molecular weight" calculations Message-ID: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net> Hi We are using pepstats for molecular weight calculations and subsequent comparison with masses determined by mass spectrometry. I am looking for the mass table used for the molecular weight calculations of the proteins in order to determine the accuracy. And how it could be possible to change it. The other question is implementation of a molecular weight assuming that the cysteines form disulfide bridges. This question is related to my first line. Since we compare the intact molecular weight of the proteins we want to be as precise as possible and thus measure the difference between reduced and oxidized cysteine residues. Most proteins with cysteine residues form disulfide bridges. Would it be possible to include a molecular weight calculation which takes disulfide bridges into account? So that an even number of cysteines are calculated with the mass of oxidized cysteines (S-S) and if there should be a single cysteine left then it is calculated with a sulfhydryl group (SH)? Best Regards Carsten P. Sönksen Senior Scientist Novozymes A/S Krogshoejvej 36 2880 Bagsvaerd Denmark Phone: +45 44461123 Mobile: +45 30771123 E-mail: caps at novozymes.com Novozymes A/S (reg. no.: 10007127). Registered address: Krogshoejvej 36 DK-2880 Bagsvaerd, Denmark This e-mail (including any attachments) is for the intended addressee(s) only and may contain confidential and/or proprietary information protected by law. You are hereby notified that any unauthorized reading, disclosure, copying or distribution of this e-mail or use of information herein is strictly prohibited.
If you are not an intended recipient you should delete this e-mail immediately. Thank you. From pmr at ebi.ac.uk Tue Aug 25 12:56:30 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 25 Aug 2009 13:56:30 +0100 Subject: [EMBOSS] Pepstats "Molecular weight" calculations In-Reply-To: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net> References: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net> Message-ID: <4A93DF7E.4030305@ebi.ac.uk> CAPS (Carsten P. Sönksen) wrote: > Hi > We are using pepstats for molecular weight calculations and subsequent > comparison with masses determined by mass spectrometry. I am looking for > the mass table used for the molecular weight calculations of the > proteins in order to determine the accuracy. And how it could be > possible to change it. The table is in a file called Emolwt.dat. This should be included in the local data files section of the pepstats documentation. We will add it. It is at least mentioned in the -help output and in the command line section of the documentation. The local data files section should describe the file in more detail. A copy in your local directory (embossdata -fetch will copy the EMBOSS version for you) will be used in preference to the installed copy. > The other question is implementation of a molecular weight assuming that > the cysteines form disulfide bridges. This question is related to my > first line. Since we compare the intact molecular weight of the proteins > we want to be as precise as possible and thus measure the difference > between reduced and oxidized cysteine residues. Most proteins with > cysteine residues form disulfide bridges. Would it be possible to include > a molecular weight calculation which takes disulfide bridges into > account? So that an even number of cysteines are calculated with the mass of > oxidized cysteines (S-S) and if there should be a single cysteine left > then it is calculated with a sulfhydryl group (SH)? Good suggestion.
We can add that for the next release. We would add an option for the number of S-S bridges and adjust the molecular weight. We have a similar option already for iep. Is there a need for single cysteines to allow for inter-chain disulphide bridges? Are there any other adjustments you would like? regards, Peter Rice From CAPS at novozymes.com Tue Aug 25 13:46:28 2009 From: CAPS at novozymes.com (CAPS (Carsten P. Sönksen)) Date: Tue, 25 Aug 2009 15:46:28 +0200 Subject: [EMBOSS] Pepstats "Molecular weight" calculations In-Reply-To: <4A93DF7E.4030305@ebi.ac.uk> References: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net> <4A93DF7E.4030305@ebi.ac.uk> Message-ID: <4D0464992D73D44A93400D893E2A2C763C01848E4B@NZT0013E.dknz.nzcorp.net> Hi Peter, Thanks a lot for the fast and positive reply. Regarding the molecular weight calculation including the disulfide bridges: Would it be possible to have the option that pepstats always calculates the molecular weight for the highest number of possible disulfide bridges and if there is a single cysteine left then this one should be calculated with a sulfhydryl group? This option would also be nice for the iep calculation. "Is there a need for single cysteines to allow for inter-chain disulphide bridges?" Not currently; I believe we would then reach a level where human interaction is needed. Right now I have no further adjustments in mind. Do you have an estimated time range when I can expect the next release? Best Regards Carsten P. Sönksen Senior Scientist Novozymes A/S Krogshoejvej 36 2880 Bagsvaerd Denmark Phone: +45 44461123 Mobile: +45 30771123 E-mail: caps at novozymes.com Novozymes A/S (reg. no.: 10007127). Registered address: Krogshoejvej 36 DK-2880 Bagsvaerd, Denmark This e-mail (including any attachments) is for the intended addressee(s) only and may contain confidential and/or proprietary information protected by law.
You are hereby notified that any unauthorized reading, disclosure, copying or distribution of this e-mail or use of information herein is strictly prohibited. If you are not an intended recipient you should delete this e-mail immediately. Thank you. -----Original Message----- From: Peter Rice [mailto:pmr at ebi.ac.uk] Sent: 25. august 2009 14:57 To: CAPS (Carsten P. Sönksen) Cc: emboss at lists.open-bio.org; TAPO (Thomas Agersten Poulsen) Subject: Re: [EMBOSS] Pepstats "Molecular weight" calculations CAPS (Carsten P. Sönksen) wrote: > Hi > We are using pepstats for molecular weight calculations and subsequent > comparison with masses determined by mass spectrometry. I am looking for > the mass table used for the molecular weight calculations of the > proteins in order to determine the accuracy. And how it could be > possible to change it. The table is in a file called Emolwt.dat. This should be included in the local data files section of the pepstats documentation. We will add it. It is at least mentioned in the -help output and in the command line section of the documentation. The local data files section should describe the file in more detail. A copy in your local directory (embossdata -fetch will copy the EMBOSS version for you) will be used in preference to the installed copy. > The other question is implementation of a molecular weight assuming that > the cysteines form disulfide bridges. This question is related to my > first line. Since we compare the intact molecular weight of the proteins > we want to be as precise as possible and thus measure the difference > between reduced and oxidized cysteine residues. Most proteins with > cysteine residues form disulfide bridges. Would it be possible to include > a molecular weight calculation which takes disulfide bridges into > account? So that an even number of cysteines are calculated with the mass of > oxidized cysteines (S-S) and if there should be a single cysteine left > then it is calculated with a sulfhydryl group (SH)?
Good suggestion. We can add that for the next release. We would add an option for the number of S-S bridges and adjust the molecular weight. We have a similar option already for iep. Is there a need for single cysteines to allow for inter-chain disulphide bridges? Are there any other adjustments you would like? regards, Peter Rice From gbottu at vub.ac.be Tue Aug 25 16:06:44 2009 From: gbottu at vub.ac.be (Guy Bottu) Date: Tue, 25 Aug 2009 18:06:44 +0200 Subject: [EMBOSS] wrappers4EMBOSS 2.3.0 released Message-ID: <4A940C14.5070705@vub.ac.be> Dear users of wrappers4EMBOSS, This mail concerns you if you are using or intend to use wrappers4EMBOSS with one of the following: EMBOSS 6.1.0, MRS 4, PhyML 3, CLUSTAL 2, InterProScan 4.5, EBI fastA access through Web Services. You might be interested to upgrade for one of the following reasons: - We support all EMBOSS versions from 3.0.0 to 6.1.0 (it was necessary to take account of the fact that MYEMBOSS can use "source" as well as "src" as directory name and that EMBOSS 6.1.0 requires parameter names that are unique in the first 6 characters). - We support MRS version 4 as well as version 3. - We have abandoned support for PhyML version 2 in favour of version 3. The wrapper for ModelGenerator has been modified accordingly in order to automatically start PhyML with a model generated by ModelGenerator, no longer using the script generated by ModelGenerator itself (it is for version 2) but instead a Perl script that parses the ModelGenerator output. The user can choose whether to use the model selected according to Akaike, modified Akaike or Bayesian information criterion. - We support the new optional features introduced in CLUSTAL version 2 (using UPGMA instead of NJ, not using sequence weights, improving the alignment by iterative re-alignment). - The module for InterProScan works with version 4.5 and has HAMAP in its menu.
- The list of databank names in ebi_fasta has been adapted to the recent situation on the server. Guy Bottu, wEMBOSS development team
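[Editorial note] The announcement above mentions that EMBOSS 6.1.0 wants parameter names to be unique in their first 6 characters. A minimal sketch of how a wrapper author might check a list of names for such clashes is given below. It is not taken from wrappers4EMBOSS; the function name `prefix_collisions` and the sample parameter names are hypothetical, and the comparison is assumed to be case-sensitive.

```python
from collections import defaultdict


def prefix_collisions(names, width=6):
    """Return groups of names that share the same first `width` characters.

    Hypothetical helper for illustration; the result maps each clashing
    prefix to the list of names that start with it.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name[:width]].append(name)
    # Keep only prefixes claimed by more than one name.
    return {prefix: group for prefix, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    # Made-up parameter names, not from any real ACD file.
    params = ["sequence", "sequenceformat", "outfile", "gapopen", "gapextend"]
    for prefix, group in prefix_collisions(params).items():
        print(f"clash on '{prefix}': {', '.join(group)}")
```

With the made-up names above, only "sequence" and "sequenceformat" clash, since both begin with "sequen"; "gapopen" and "gapextend" differ within the first 6 characters.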