From charles-listes-emboss at plessy.org  Wed Aug  5 06:16:57 2009
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Wed, 5 Aug 2009 19:16:57 +0900
Subject: [EMBOSS] Redistribution terms of PHILIPNEW.
Message-ID: <20090805101657.GA26099@kunpuu.plessy.org>

Dear EMBOSS developers,

I am preparing a Debian package for EMBASSY's PHYLIPNEW package. The
redistribution terms of Phylip itself are:

/* version 3.6. (c) Copyright 1993-2002 by the University of Washington.
   Written by Joseph Felsenstein, Akiko Fuseki, Sean Lamont, Andrew Keeffe,
   and Dan Fineman.
   Permission is granted to copy and use this program provided no fee is
   charged for it and provided that this copyright notice is not removed. */

And for its documentation:

   Copyright 1986-2000 by the University of
   Washington.  Written by Joseph Felsenstein.  Permission is granted to copy 
   this document provided that no fee is charged for it and that this copyright 
   notice is not removed. 

I see that the documentation in emboss-doc is a derivative of the Phylip
documentation. What are the redistribution terms for it ?

For the rest of the EMBOSS-specific work, there is a hint that the license
could be the GNU GPL, since this is what the COPYING file contains, but the GNU
GPL does not allow linking to software that prohibits commercial use. As
copyright holders, you are not yourself bound by the GPL, so this does not
prevent you from distributing PHYLIPNEW, but this buggy situation makes it
un-redistributable for third parties like Debian.

But maybe the license of the EMBASSY part of PHYLIPNEW is not the GNU GPL? Can
you clarify?

Have a nice day,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

From pmr at ebi.ac.uk  Wed Aug  5 06:47:45 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 05 Aug 2009 11:47:45 +0100
Subject: [EMBOSS] Redistribution terms of PHILIPNEW.
In-Reply-To: <20090805101657.GA26099@kunpuu.plessy.org>
References: <20090805101657.GA26099@kunpuu.plessy.org>
Message-ID: <4A796351.3000108@ebi.ac.uk>

Charles Plessy wrote:
> Dear EMBOSS developers,
> 
> I see that the documentation in emboss-doc is a derivative of the Phylip
> documentation. What are the redistribution terms for it ?

The changes are only to conform to EMBOSS documentation style and to use 
EMBOSS examples. The Phylip redistribution terms apply.

> For the rest of the EMBOSS-specific work, there is a hint that the license
> could be the GNU GPL, since this is what the COPYING file contains, but the GNU
> GPL does not allow linking to software that prohibits commercial use. As
> copyright holders, you are not yourself bound by the GPL, so this does not
> prevent you from distributing PHYLIPNEW, but this buggy situation makes it
> un-redistributable for third parties like Debian.

The original licence applies.

The COPYING file has been accidentally left there. We will replace it 
with the phylip copyright statements from the phylip-3.68 doc/main.html 
file (and check the other EMBASSY packages). The AUTHORS file should be 
completed as it is presently empty.

If you check the README file you will see the changes we made. They 
certainly do not change the code significantly, only the interface.

> But maybe the license of the EMBASSY part of PHYLIPNEW is not the GNU GPL? Can
> you clarify?

Definitely not GNU GPL.

regards,

Peter Rice

From biopython at maubp.freeserve.co.uk  Thu Aug  6 13:28:05 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 6 Aug 2009 18:28:05 +0100
Subject: [EMBOSS] GFF/GFF2/GFF3 examples on EMBOSS webpage
Message-ID: <320fb6e00908061028m776fbf9buc56e1fb73f7e3a0b@mail.gmail.com>

Hi all,

I was just looking at this page:
http://emboss.sourceforge.net/docs/themes/SequenceFormats.html

This table lists GFF2 as one entry, and GFF/GFF3 as another. They link
to: http://emboss.sourceforge.net/docs/themes/seqformats/gff2 and
http://emboss.sourceforge.net/docs/themes/seqformats/gff respectively.

These examples appear to be identical (and the header says it is a
GFF2 file). So I am a bit confused. Should one be a GFF3 file, and
simply one file was uploaded twice by mistake?

Thanks,

Peter C.

From isabelle.wells at roche.com  Tue Aug 18 04:25:41 2009
From: isabelle.wells at roche.com (Wells, Isabelle)
Date: Tue, 18 Aug 2009 10:25:41 +0200
Subject: [EMBOSS] inosine in nucleotide sequence databases
Message-ID: <6DE144B7487D104290A097EA7C0C356A016BB94889@rkamsem701.emea.roche.com>

Hi all,
Can emboss handle inosine in nucleotide sequences? We have a nucleotide file in embl format where some sequences contain inosine. Dbiflat doesn't seem to index the database properly although no error message was given and those inosine containing sequences cannot be retrieved with seqret. Any suggestions on what we could do apart from replacing inosine by X or N?
Many thanks,
Isabelle Wells


From pmr at ebi.ac.uk  Tue Aug 18 06:05:05 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 18 Aug 2009 11:05:05 +0100
Subject: [EMBOSS] inosine in nucleotide sequence databases
In-Reply-To: <6DE144B7487D104290A097EA7C0C356A016BB94889@rkamsem701.emea.roche.com>
References: <6DE144B7487D104290A097EA7C0C356A016BB94889@rkamsem701.emea.roche.com>
Message-ID: <4A8A7CD1.9030700@ebi.ac.uk>

Dear Isabelle,

Wells, Isabelle wrote:
> Can emboss handle inosine in nucleotide sequences? We have a
> nucleotide file in embl format where some sequences contain inosine.
> Dbiflat doesn't seem to index the database properly although no error
> message was given and those inosine containing sequences cannot be
> retrieved with seqret. Any suggestions on what we could do apart from
> replacing inosine by X or N?

I assume your dbiflat problem is an error in retrieving the entries,
unless there is some other format problem in the database that prevents
entries from being recognized by the dbiflat parser. If you can send me
one of the Inosine-containing entries (or a fake entry if these ones are
proprietary information) I can check.

We treat Inosine as a modified base. These are usually in RNA sequences.
You should replace it by X or N and if you have an EMBL format feature
table you could add a modified_base feature with a /mod_base=I qualifier
to mark each Inosine. EMBOSS does nothing special with these in the
current release, but you can perhaps suggest applications to use the
modified base information.

Hope this helps,

Peter Rice
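
The replacement suggested above can be sketched in Python as follows. This is only an illustration, not EMBOSS code: the function names and the layout of the feature-table lines are assumptions, and the qualifier value is simply copied from the message above.

```python
def replace_inosine(seq, placeholder="N"):
    """Replace each inosine (I/i) with a placeholder base.

    Returns the cleaned sequence and the 1-based positions that were
    changed, so they can be recorded as modified_base features.
    """
    positions = [i + 1 for i, base in enumerate(seq) if base in "Ii"]
    cleaned = "".join(placeholder if base in "Ii" else base for base in seq)
    return cleaned, positions


def modified_base_features(positions):
    """Format feature lines for the positions (hypothetical layout)."""
    lines = []
    for pos in positions:
        lines.append(f"FT   modified_base   {pos}")
        lines.append("FT                   /mod_base=I")
    return lines


cleaned, positions = replace_inosine("ACGIIT")
print(cleaned)      # ACGNNT
print(positions)    # [4, 5]
print("\n".join(modified_base_features(positions)))
```

Keeping the positions separate from the sequence means the original inosine sites can still be recovered from the feature table after indexing.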

From xiz407 at gmail.com  Tue Aug 18 11:23:45 2009
From: xiz407 at gmail.com (Zhou Xiang)
Date: Tue, 18 Aug 2009 10:23:45 -0500
Subject: [EMBOSS] can vectorstrip trim only a substring of the adapter?
Message-ID: <d55614630908180823qcd029f6yd86bc00449ebcef4@mail.gmail.com>

Hi all,

I used vectorstrip to trim the 3' adapter off the sequences.
But it seems that the program searches for the entire adapter.

For example, if I have the read: CCCCCTTTTTAAAAAGGGGG
and the 3' adapter is: CCAAAGGG
the program will not trim the read to CCCCCTTTTTAA,
because it does not use the substring "AAAGGG" of the adapter sequence.

Any comments about this? How can I trim only a substring of the adapter?
I would like it to search for the longest match, but substring matches should
also be accepted if the entire adapter is not found in the sequence.
Thanks!

-Xiang

From pmr at ebi.ac.uk  Tue Aug 18 12:06:43 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 18 Aug 2009 17:06:43 +0100
Subject: [EMBOSS] can vectorstrip trim only a substring of the adapter?
In-Reply-To: <d55614630908180823qcd029f6yd86bc00449ebcef4@mail.gmail.com>
References: <d55614630908180823qcd029f6yd86bc00449ebcef4@mail.gmail.com>
Message-ID: <4A8AD193.8050102@ebi.ac.uk>

Dear Zhou Xiang,

> How can i trim only a substring of the adapter?

You can use the -mismatch parameter to increase the allowed number of
mismatches.

A higher percent mismatch allows less precise matching, but in this case the
value needs to be set quite high (25).

We are interested in any comments on removing 3' adapters from short
reads. We expect that we can find improvements in the methods used by
vectorstrip. Please send us any suggestions.

regards,

Peter Rice
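
The behaviour asked for in this thread (trim at the longest substring shared with the adapter) can be sketched in Python. This is not how vectorstrip works; it is a minimal illustration using the standard difflib module, and the function name and the min_match cutoff are assumptions made for the example.

```python
from difflib import SequenceMatcher


def trim_3prime_adapter(read, adapter, min_match=4):
    """Trim a read at the longest substring it shares with the adapter.

    If the longest shared substring is shorter than min_match, the read
    is returned unchanged. min_match is an arbitrary illustrative cutoff.
    """
    matcher = SequenceMatcher(None, read, adapter, autojunk=False)
    match = matcher.find_longest_match(0, len(read), 0, len(adapter))
    if match.size < min_match:
        return read
    # Keep only the part of the read before the shared substring.
    return read[:match.a]


print(trim_3prime_adapter("CCCCCTTTTTAAAAAGGGGG", "CCAAAGGG"))  # CCCCCTTTTTAA
```

A real trimmer would also need to guard against short chance matches in the middle of a read, which this sketch only does through the length cutoff.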

From Frank.Foerster at biozentrum.uni-wuerzburg.de  Tue Aug 18 14:15:44 2009
From: Frank.Foerster at biozentrum.uni-wuerzburg.de (Frank Förster)
Date: Tue, 18 Aug 2009 20:15:44 +0200
Subject: [EMBOSS] Needle with penalty for end gaps
Message-ID: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de>

Hi,

In the announcement thread for the new EMBOSS version 6.1.0, Daniel Barker
requested a program allowing complete global alignments including
penalties for end gaps
( http://www.mail-archive.com/emboss@lists.open-bio.org/msg01202.html ).

The suggestion was to add a command line parameter to the needle program
to enable/disable the penalties.

Is there any news on this topic? I have to perform a lot of pairwise
global alignments (without free end gaps), so I must either write
my own software or use existing software. Needle has all the features I
need, except that it only offers free end gaps. So I am really interested
in getting a version of needle that can help me ;)

Thanks for the great EMBOSS package.

Regards,
Frank

-- 
Dipl. Biochem. Frank Förster
Department of Bioinformatics
University of Würzburg, Germany
Fon: +49 931 - 318 4555
Fax: +49 931 - 318 4552
frank.foerster at biozentrum.uni-wuerzburg.de
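
The difference between free and penalised end gaps can be illustrated with a small Python sketch. It is a simplified dynamic-programming score with a single linear gap penalty, not needle's algorithm (needle uses separate gap open and extension penalties and a substitution matrix); the function name and the scoring values are assumptions chosen for the example.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2, penalise_end_gaps=True):
    """Score a global alignment of a and b with a linear gap penalty.

    With penalise_end_gaps=False, leading and trailing gaps cost nothing:
    the first row and column start at zero and the best score is taken
    from anywhere in the last row or last column.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if penalise_end_gaps:
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    if penalise_end_gaps:
        return score[-1][-1]
    return max(max(score[-1]), max(row[-1] for row in score))


print(global_score("ACGT", "CG", penalise_end_gaps=True))   # -2
print(global_score("ACGT", "CG", penalise_end_gaps=False))  # 2
```

In the example the two matched bases score 2 either way; the two overhanging bases cost 4 only when end gaps are penalised.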

From pmr at ebi.ac.uk  Wed Aug 19 03:26:10 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 19 Aug 2009 08:26:10 +0100
Subject: [EMBOSS] Needle with penalty for end gaps
In-Reply-To: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de>
References: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de>
Message-ID: <4A8BA912.4080904@ebi.ac.uk>

Frank Förster wrote:
> Is there any news on this topic? I have to perform a lot of pairwise
> global alignments (without free end gaps), so I must either write
> my own software or use existing software. Needle has all the features I
> need, except that it only offers free end gaps. So I am really interested
> in getting a version of needle that can help me ;)

We are just coming to the end of our "40 days and 40 nights" since the 
release when we try not to break anything by making changes - and while 
we work on finishing the book texts which is really what has kept us busy.

We will get on to this next week (the 40 days runs out on Monday 24th 
:-) and can give you an early version to try.

> Thanks for the great EMBOSS package.

Thanks for the very welcome thanks!

regards,

Peter

From Frank.Foerster at biozentrum.uni-wuerzburg.de  Wed Aug 19 03:29:19 2009
From: Frank.Foerster at biozentrum.uni-wuerzburg.de (Frank Förster)
Date: Wed, 19 Aug 2009 09:29:19 +0200
Subject: [EMBOSS] Needle with penalty for end gaps
In-Reply-To: <4A8BA912.4080904@ebi.ac.uk>
References: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de>
	<4A8BA912.4080904@ebi.ac.uk>
Message-ID: <4A8BA9CF.5070604@biozentrum.uni-wuerzburg.de>

Dear Peter,

thank you for your fast reply.

> We will get on to this next week (the 40 days runs out on Monday 24th
> :-) and can give you an early version to try.

This sounds very kind of you. I can hardly wait but I will ;)

Regards,
Frank

-- 
Dipl. Biochem. Frank Förster
Department of Bioinformatics
University of Würzburg, Germany
Fon: +49 931 - 318 4555
Fax: +49 931 - 318 4552
frank.foerster at biozentrum.uni-wuerzburg.de

From biopython at maubp.freeserve.co.uk  Wed Aug 19 07:08:26 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 19 Aug 2009 12:08:26 +0100
Subject: [EMBOSS] vectorstrip on FASTQ files
Message-ID: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com>

Hi,

I'm trying to use vectorstrip on FASTQ files (as a simple way to
remove adaptor or primer sequences). However, it seems that on output
the FASTQ qualities are missing (all set to the double quote, ASCII
34, meaning PHRED quality 1 or random). Is this a known bug (or
rather, a missing feature)?

For illustration I am using a Sanger style FASTQ file from the NCBI
SRA (short reads originally from Solexa/Illumina), SRR014849.fastq
which you can download from
ftp://ftp.ncbi.nlm.nih.gov/sra/static/SRX003/SRX003639/SRR014849.fastq.gz

I am pretending "GTTGGAACCG" is 5' adaptor sequence, and want to find
any matches in some FASTQ reads, and trim it off taking only the
sequence to the right. For simplicity I'm allowing no mismatches.
Here is the start of the file:

$ head -n 12 SRR014849.fastq
@SRR014849.1 EIXKN4201CFU84 length=93
GGGGGGGGGGGGGGGGCTTTTTTTGTTTGGAACCGAAAGGGTTTTGAATTTCAAACCCTTTTCGGTTTCCAACCTTCCAAAGCAATGCCAATA
+SRR014849.1 EIXKN4201CFU84 length=93
3+&$#"""""""""""7F@71,'";C?,B;?6B;:EA1EA1EA5'9B:?:#9EA0D@2EA5':>5?:%A;A8A;?9B;D@/=<?7=9<2A8==
@SRR014849.3 EIXKN4201D4ZBL length=119
GGGGGGGGGCTGTTGGCCGAGGTTGGAGTAGCCAGGGGGAAGGCATGGCCAGCCGTTGAGAAATGCTTGTTGAAGTTTTCGATAATAATGGATTTATCGGTGGTGACCGTGTTACCTAG
+SRR014849.3 EIXKN4201D4ZBL length=119
;3.*(&$"";<=A9@8A9;<B;B;B;8=<==B;<FB8/'@8B:==<B;A9<<A8=B;==;A=)=<<B;=A9<@7<FB5(<<=<B;<B;:A9=EA0;<;B:<A8=<<@8<<<B;<A99=<
@SRR014849.9 EIXKN4201AL42E length=84
AACATAAAGAGCAATAGACAGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA
+SRR014849.9 EIXKN4201AL42E length=84
B:=8<EA087<;@8<<<8<:8A9=3>5B;4B>+C?,EA09B;@;9E@/EA/E@/B:;1B:B:;A9<5<B;;8EA0<<B;FB6)7

Notice the "adaptor" is in the third sequence, SRR014849.9,
AACATAAAGAGCAATAGACAGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA
This should be trimmed to just:
AAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA

Using FASTA as output looks fine:

$ vectorstrip -sequence SRR014849.fastq -sformat fastq-sanger \
    -readfile N -alinker "GTTGGAACCG" -blinker "" -osformat fasta \
    -outseq SRR014849_5trimmed.fasta -mismatch 0 -besthits Y \
    -outfile SRR014849_5trimmed.txt
Removes vectors from the ends of nucleotide sequence(s)

$ head -n 2 SRR014849_5trimmed.fasta
>SRR014849.9_from_31_to_84 EIXKN4201AL42E length=84
AAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA

Using Sanger FASTQ runs:

$ vectorstrip -sequence SRR014849.fastq -sformat fastq-sanger \
    -readfile N -alinker "GTTGGAACCG" -blinker "" -osformat fastq-sanger \
    -outseq SRR014849_5trimmed.fastq -mismatch 0 -besthits Y \
    -outfile SRR014849_5trimmed.txt
Removes vectors from the ends of nucleotide sequence(s)

But the output is missing the quality scores:

$ head -n 4 SRR014849_5trimmed.fastq
@SRR014849.9_from_31_to_84 EIXKN4201AL42E length=84
AAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA
+
""""""""""""""""""""""""""""""""""""""""""""""""""""""

Is this something simple to add to vectorstrip? What about other
annotation (e.g. running vector strip on annotated GenBank or EMBL
files)?

Thanks,

Peter C.

P.S. This is with EMBOSS 6.1.0 with a patch from Peter Rice, running
on Mac OS X.
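
The output hoped for here, with the quality string trimmed alongside the sequence, can be sketched in a few lines of Python. This is an illustration only and not vectorstrip code; the function name is an assumption, and it handles exact matches only, as in the example above.

```python
def trim_5prime_fastq(name, seq, qual, adaptor):
    """Trim everything up to and including the first exact adaptor match.

    The quality string is sliced with the same offset as the sequence,
    so each remaining base keeps its original score. Returns None when
    the adaptor is absent.
    """
    start = seq.find(adaptor)
    if start == -1:
        return None
    cut = start + len(adaptor)
    return name, seq[cut:], qual[cut:]


# The SRR014849.9 record quoted above.
seq = "AACATAAAGAGCAATAGACAGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA"
qual = "B:=8<EA087<;@8<<<8<:8A9=3>5B;4B>+C?,EA09B;@;9E@/EA/E@/B:;1B:B:;A9<5<B;;8EA0<<B;FB6)7"

name, tseq, tqual = trim_5prime_fastq("SRR014849.9", seq, qual, "GTTGGAACCG")
print("@" + name)
print(tseq)
print("+")
print(tqual)

# In Sanger FASTQ the score is the character code minus 33,
# so the double quote written by vectorstrip encodes PHRED 1.
print(ord('"') - 33)  # 1
```

The trimmed sequence and quality string stay the same length, which is the property the current FASTQ output loses.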

From pmr at ebi.ac.uk  Wed Aug 19 07:24:41 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 19 Aug 2009 12:24:41 +0100
Subject: [EMBOSS] vectorstrip on FASTQ files
In-Reply-To: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com>
References: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com>
Message-ID: <4A8BE0F9.2020404@ebi.ac.uk>

Peter C. wrote:
> Hi,
> 
> I'm trying to use vectorstrip on FASTQ files (as a simple way to
> remove adaptor or primer sequences). However, it seems that on output
> the FASTQ qualities are missing (all set to the double quote, ASCII
> 34, meaning PHRED quality 1 or random). Is this a known bug (or
> rather, a missing feature)?

It is a missing feature. vectorstrip was written before quality scores
became fashionable and, curiously, nobody has asked for them before.

We will certainly retain them in a future release.

regards,

Peter

From biopython at maubp.freeserve.co.uk  Wed Aug 19 07:31:17 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 19 Aug 2009 12:31:17 +0100
Subject: [EMBOSS] vectorstrip on FASTQ files
In-Reply-To: <4A8BE0F9.2020404@ebi.ac.uk>
References: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com>
	<4A8BE0F9.2020404@ebi.ac.uk>
Message-ID: <320fb6e00908190431i23a27ed7g46cf9223b191d5f5@mail.gmail.com>

 Peter Rice wrote:
>
> Peter C. wrote:
>> Hi,
>>
>> I'm trying to use vectorstrip on FASTQ files (as a simple way to
>> remove adaptor or primer sequences). However, it seems that on output
>> the FASTQ qualities are missing (all set to the double quote, ASCII
>> 34, meaning PHRED quality 1 or random). Is this a known bug (or
>> rather, a missing feature)?
>
> It is a missing feature. vectorstrip was written before quality scores
> became fashionable and, curiously, nobody has asked for them before.
>
> We will certainly retain them in a future release.

Great - thanks!

Peter C.

From frank.foerster at biozentrum.uni-wuerzburg.de  Mon Aug 24 05:01:49 2009
From: frank.foerster at biozentrum.uni-wuerzburg.de (Frank Förster)
Date: Mon, 24 Aug 2009 11:01:49 +0200
Subject: [EMBOSS] Gap cost restrictions for needle/water/stretcher?
Message-ID: <4A9256FD.8070708@biozentrum.uni-wuerzburg.de>

Hi,

I have a question about the allowed gap costs in several programs. I am
using needle, water and stretcher, for example.

There are some restrictions on the gap costs I have to use:

1) needle: float from 0-100 for gapopen and 0-10 for gapextend
2) water: float from 0.000-10.000 for gapopen and 0.000-10.000 for gapextend
3) stretcher: positive integer

What is the meaning of these restrictions? I think you use an integer value for
stretcher (I did not check the source code) and floats for needle/water.

But why the restriction for water to three decimal places?

But more interesting, why the restriction to 0-100/0-10 for needle/water?

Thank you for your efforts!

Frank Förster


-- 
Dipl. Biochem. Frank Förster
Department of Bioinformatics
University of Würzburg, Germany
Fon: +49 931 - 318 4555
Fax: +49 931 - 318 4552
frank.foerster at biozentrum.uni-wuerzburg.de

From pmr at ebi.ac.uk  Mon Aug 24 07:52:48 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 24 Aug 2009 12:52:48 +0100
Subject: [EMBOSS] Gap cost restrictions for needle/water/stretcher?
In-Reply-To: <4A9256FD.8070708@biozentrum.uni-wuerzburg.de>
References: <4A9256FD.8070708@biozentrum.uni-wuerzburg.de>
Message-ID: <4A927F10.1000701@ebi.ac.uk>

Frank Förster wrote:
> Hi,
> 
> I have a question about the allowed gap costs in several
> programs. I am using needle, water and stretcher, for example.
> 
> There are some restrictions on the gap costs I have to use:
> 
> 1) needle: float from 0-100 for gapopen and 0-10 for gapextend
> 2) water: float from 0.000-10.000 for gapopen and 0.000-10.000 for
> gapextend
> 3) stretcher: positive integer
> 
> What is the meaning of these restrictions? I think you use an integer
> value for stretcher (I did not check the source code) and floats for
> needle/water.

Stretcher and matcher were imported code that used integer values for
speed. Our matrix files use integer values so we can use integers or
floats as gap penalty values.

> But why the restriction for water to three decimal places?

There is no 3 decimal places restriction, we only use 3 decimal places
to write out the values.

> But more interesting, why the restriction to 0-100/0-10 for needle/water?

We set limits for needle and water with the first release of EMBOSS and
nobody has asked for a higher value.

Zero is useful for some cases, either to not penalise the number of gaps
(for example a large number of single base gaps in a single nucleotide
read) or to not penalise the gap length (genomic sequence aligned to
mRNA/cDNA).

The upper limits are enough for the cases we have seen.

More interesting is why we have no upper limit for stretcher and
matcher. We should be consistent. These were third-party applications
(from Bill Pearson's fasta2 package) that we imported.

Does anyone object to setting the same gap penalty limits for all
applications?

Can anyone think of a use case that needs a larger maximum value?

We can add applications to suggest gap penalties for each matrix file
... or store default values in the files. Is this useful?

regards,

Peter Rice


From pmr at ebi.ac.uk  Tue Aug 25 06:59:35 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 25 Aug 2009 11:59:35 +0100
Subject: [EMBOSS] EMBOSS patch 1-2 for 6.1.0
Message-ID: <4A93C417.2070502@ebi.ac.uk>

A patch for EMBOSS 6.1.0 is on the FTP server. This fixes a problem with 
reading the new UniProt/SwissProt description line. The bug is in extending 
strings within lists, so it may have had other effects. We recommend 
patching your EMBOSS installations as several of the new format SwissProt 
entries are unreadable.

The files are on our FTP server ftp://emboss.open-bio.org/pub/EMBOSS/fixes
with a patch file and instructions in the patches subdirectory.

Fix 2. EMBOSS-6.1.0/ajax/ajmem.h
        EMBOSS-6.1.0/ajax/ajstr.c
        EMBOSS-6.1.0/ajax/ajstr.h
        EMBOSS-6.1.0/nucleus/embaln.c

24-Aug-2009: Fix string extension so that pointers in lists remain valid.
              This fixes a bug in processing SwissProt complex descriptions.
              Fix definition of AJRESIZE0 macro.
              Fix processing of first match in a prophet profile alignment

regards,

Peter Rice


From CAPS at novozymes.com  Tue Aug 25 07:56:34 2009
From: CAPS at novozymes.com (CAPS (Carsten P. Sönksen))
Date: Tue, 25 Aug 2009 13:56:34 +0200
Subject: [EMBOSS] Pepstats "Molecular weight" calculations
Message-ID: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net>

Hi
We are using pepstats for molecular weight calculations and subsequent comparison with masses determined by mass spectrometry.
I am looking for the mass table used for the molecular weight calculations of the proteins, in order to determine the accuracy, and would like to know how it could be changed.

The other question concerns implementing a molecular weight calculation that assumes the cysteines form disulfide bridges.
This question is related to my first one. Since we compare the intact molecular weight of the proteins, we want to be as precise as possible and thus measure the difference between reduced and oxidized cysteine residues. Most proteins with cysteine residues form disulfide bridges.
Would it be possible to include a molecular weight calculation which takes disulfide bridges into account, so that an even number of cysteines is calculated with the mass of oxidized cysteines (S-S), and if a single cysteine is left over it is calculated with a sulfhydryl group (SH)?

Best Regards
Carsten P. Sönksen
Senior Scientist

Novozymes A/S
Krogshoejvej 36
2880 Bagsvaerd Denmark
Phone: +45 44461123
Mobile: +45 30771123
E-mail: caps at novozymes.com
Novozymes A/S (reg. no.: 10007127). Registered address: Krogshoejvej 36 DK-2880 Bagsvaerd, Denmark
This e-mail (including any attachments) is for the intended addressee(s) only and may contain confidential and/or proprietary information protected by law. You are hereby notified that any unauthorized reading, disclosure, copying or distribution of this e-mail or use of information herein is strictly prohibited. If you are not an intended recipient you should delete this e-mail immediately. Thank you.


From pmr at ebi.ac.uk  Tue Aug 25 08:56:30 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 25 Aug 2009 13:56:30 +0100
Subject: [EMBOSS] Pepstats "Molecular weight" calculations
In-Reply-To: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net>
References: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net>
Message-ID: <4A93DF7E.4030305@ebi.ac.uk>

CAPS (Carsten P. Sönksen) wrote:
> Hi
> We are using Pepstat for molecular weight calculations and subsequent
> comparison with mass spectrometric determined masses. I am looking for
> the mass table used for the molecular weight calculations of the
> proteins in order to determine the accuracy. And how it could be
> possible to change it.

The table is in a file called Emolwt.dat

This should be included in the local data files section of the pepstats 
documentation. We will add it. It is at least mentioned in the -help output 
and in the command line section of the documentation. The local data files 
section should describe the file in more detail.

A copy in your local directory (embossdata -fetch will copy the EMBOSS 
version for you) will be used in preference to the installed copy.

> The other question is implementation of a molecular weight assuming that
> the cysteins form disulfide bridges. This question is related to my
> first line. Since we compare the intact molecular weight of the proteins
> we want to be as precise as possible and thus measure the difference
> between reduced and oxidized cystein residues. Most proteins with
> cystein residues form disulfide bridges. Would it be possible to include
> a molecular weight calculation which takes disulfide bridges into
> account? So that an even nr of cysteins are calculated with the mass of
> oxidized  cysteins (S-S) and if there should be an single cystein left
> then it is calculated with a sulfhydryl group (SH)?

Good suggestion. We can add that for the next release. We would add an 
option for the number of S-S bridges and adjust the molecular weight.
We have a similar option already for iep.

Is there a need for single cysteines to allow for inter-chain disulphide 
bridges?

Are there any other adjustments you would like?

regards,

Peter Rice
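
The adjustment discussed in this thread can be sketched in Python. This is not the pepstats implementation: the function name is an assumption, the reduced weight is taken as an input rather than computed from Emolwt.dat, and the sketch assumes that each disulfide bridge removes two hydrogen atoms and that the maximum possible number of bridges is formed.

```python
HYDROGEN_MASS = 1.00794  # average atomic mass of hydrogen, in Da


def disulfide_adjusted_weight(reduced_weight, sequence):
    """Subtract two hydrogens per disulfide bridge from a reduced weight.

    Assumes the maximum number of bridges: cysteines are paired off, and
    an odd one left over keeps its sulfhydryl (SH) group.
    """
    bridges = sequence.upper().count("C") // 2
    return reduced_weight - bridges * 2 * HYDROGEN_MASS


# Three cysteines: one bridge is formed and one cysteine stays reduced.
print(disulfide_adjusted_weight(1000.0, "ACCGC"))
```

An option for an explicit number of bridges, as proposed above, would replace the automatic pairing with a user-supplied count.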

From CAPS at novozymes.com  Tue Aug 25 09:46:28 2009
From: CAPS at novozymes.com (CAPS (Carsten P. Sönksen))
Date: Tue, 25 Aug 2009 15:46:28 +0200
Subject: [EMBOSS] Pepstats "Molecular weight" calculations
In-Reply-To: <4A93DF7E.4030305@ebi.ac.uk>
References: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net>
	<4A93DF7E.4030305@ebi.ac.uk>
Message-ID: <4D0464992D73D44A93400D893E2A2C763C01848E4B@NZT0013E.dknz.nzcorp.net>

Hi Peter,

Thanks a lot for the fast and positive reply.

Regarding the molecular weight calculation including the disulfide bridges:
Would it be possible to have an option so that pepstats always calculates the molecular weight for the highest possible number of disulfide bridges, and if a single cysteine is left over it is calculated with a sulfhydryl group?

This option would also be nice for the iep calculation.

"Is there a need for single cysteines to allow for inter-chain disulphide 
bridges?"
Not currently. I believe we would then reach a level where human interaction is needed.

Right now I have no further adjustments in mind.

Do you have an estimate of when I can expect the next release?


Best Regards
Carsten P. Sönksen
Senior Scientist

Novozymes A/S
Krogshoejvej 36

2880 Bagsvaerd Denmark
Phone: +45 44461123
Mobile: +45 30771123
E-mail: caps at novozymes.com





From gbottu at vub.ac.be  Tue Aug 25 12:06:44 2009
From: gbottu at vub.ac.be (Guy Bottu)
Date: Tue, 25 Aug 2009 18:06:44 +0200
Subject: [EMBOSS] wrappers4EMBOSS 2.3.0 released
Message-ID: <4A940C14.5070705@vub.ac.be>

	Dear users of wrappers4EMBOSS,

This mail concerns you if you are using or intend to use wrappers4EMBOSS 
with one of the following : EMBOSS 6.1.0, MRS 4, PhyML 3, CLUSTAL 2, 
InterProScan 4.5, EBI fastA access through Web Services.

You might be interested in upgrading for one of the following reasons:
- We support all EMBOSS versions from 3.0.0 to 6.1.0 (it was necessary 
to take account of the fact that MYEMBOSS can use "source" as well as 
"src" as directory name and that EMBOSS 6.1.0 requires parameter 
names to be unique in the first 6 characters).
- We support MRS version 4 as well as version 3.
- We have abandoned support for PhyML version 2 in favour of version 3. 
The wrapper for ModelGenerator has been modified accordingly in order to 
automatically start PhyML with a model generated by ModelGenerator, 
no longer using the script generated by ModelGenerator itself (which is 
for version 2) but instead a Perl script that parses the ModelGenerator 
output. The user can choose whether to use the model selected according 
to the Akaike, modified Akaike or Bayesian information criterion.
- We support the new optional features introduced in CLUSTAL version 2 
(using UPGMA instead of NJ, not using sequence weights, improving the 
alignment by iterative re-alignment).
- The module for InterProScan works with version 4.5 and has HAMAP in 
its menu.
- The list of databank names in ebi_fasta has been adapted to the current 
situation on the server.

	Guy Bottu,
	wEMBOSS development team

From charles-listes-emboss at plessy.org  Wed Aug  5 10:16:57 2009
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Wed, 5 Aug 2009 19:16:57 +0900
Subject: [EMBOSS] Redistribution terms of PHILIPNEW.
Message-ID: <20090805101657.GA26099@kunpuu.plessy.org>

Dear EMBOSS developers,

I am preparing a Debian package for EMBASSY?s PHILIPNEW package. The
redistribution terms of Phyilp itself are:

/* version 3.6. (c) Copyright 1993-2002 by the University of Washington.
   Written by Joseph Felsenstein, Akiko Fuseki, Sean Lamont, Andrew Keeffe,
   and Dan Fineman.
   Permission is granted to copy and use this program provided no fee is
   charged for it and provided that this copyright notice is not removed. */

And for its documentation:

   Copyright 1986-2000 by the University of
   Washington.  Written by Joseph Felsenstein.  Permission is granted to copy 
   this document provided that no fee is charged for it and that this copyright 
   notice is not removed. 

I see that the documentation in emboss-doc is a derivative of the Phylip
documentation. What are the redistribution terms for it?

For the rest of the EMBOSS-specific work, there is a hint that the license
could be the GNU GPL, since this is what the COPYING file contains, but the GNU
GPL does not allow linking to software that prohibits commercial use. As
copyright holders, you are not yourselves bound by the GPL, so this does not
prevent you from distributing PHYLIPNEW, but this buggy situation makes it
un-redistributable for third parties like Debian.

But maybe the license of the EMBASSY part of PHYLIPNEW is not the GNU GPL? Can
you clarify?

Have a nice day,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan


From pmr at ebi.ac.uk  Wed Aug  5 10:47:45 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 05 Aug 2009 11:47:45 +0100
Subject: [EMBOSS] Redistribution terms of PHILIPNEW.
In-Reply-To: <20090805101657.GA26099@kunpuu.plessy.org>
References: <20090805101657.GA26099@kunpuu.plessy.org>
Message-ID: <4A796351.3000108@ebi.ac.uk>

Charles Plessy wrote:
> Dear EMBOSS developers,
> 
> I see that the documentation in emboss-doc is a derivative of the Phylip
> documentation. What are the redistribution terms for it ?

The changes are only to conform to EMBOSS documentation style and to use 
EMBOSS examples. The Phylip redistribution terms apply.

> For the rest of the EMBOSS-specific work, there is a hint that the license
> could be the GNU GPL, since this is what the COPYING file contains, but the GNU
> GPL does not allow linking to software that prohibits commercial use. As
> copyright holders, you are not yourself bound by the GPL, so this does not
> prevent you from distributing PHYLIPNEW, but this buggy situation makes it
> un-redistributable for third parties like Debian.

The original licence applies.

The COPYING file has been accidentally left there. We will replace it 
with the phylip copyright statements from the phylip-3.68 doc/main.html 
file (and check the other EMBASSY packages). The AUTHORS file should be 
completed as it is presently empty.

If you check the README file you will see the changes we made. They 
certainly do not change the code significantly, only the interface.

> But maybe the license of the EMBASSY part of PHYLIPNEW is not the GNU GPL? Can
> you clarify?

Definitely not GNU GPL.

regards,

Peter Rice


From biopython at maubp.freeserve.co.uk  Thu Aug  6 17:28:05 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 6 Aug 2009 18:28:05 +0100
Subject: [EMBOSS] GFF/GFF2/GFF3 examples on EMBOSS webpage
Message-ID: <320fb6e00908061028m776fbf9buc56e1fb73f7e3a0b@mail.gmail.com>

Hi all,

I was just looking at this page:
http://emboss.sourceforge.net/docs/themes/SequenceFormats.html

This table lists GFF2 as one entry, and GFF/GFF3 as another. They link
to: http://emboss.sourceforge.net/docs/themes/seqformats/gff2 and
http://emboss.sourceforge.net/docs/themes/seqformats/gff respectively.

These examples appear to be identical (and the header says it is a
GFF2 file). So I am a bit confused. Should one be a GFF3 file, and
simply one file was uploaded twice by mistake?

Thanks,

Peter C.


From isabelle.wells at roche.com  Tue Aug 18 08:25:41 2009
From: isabelle.wells at roche.com (Wells, Isabelle)
Date: Tue, 18 Aug 2009 10:25:41 +0200
Subject: [EMBOSS] inosine in nucleotide sequence databases
Message-ID: <6DE144B7487D104290A097EA7C0C356A016BB94889@rkamsem701.emea.roche.com>

Hi all,
Can EMBOSS handle inosine in nucleotide sequences? We have a nucleotide
file in EMBL format where some sequences contain inosine. dbiflat does not
seem to index the database properly, although no error message is given,
and the inosine-containing sequences cannot be retrieved with seqret. Any
suggestions on what we could do apart from replacing inosine by X or N?
Many thanks,
Isabelle Wells


From pmr at ebi.ac.uk  Tue Aug 18 10:05:05 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 18 Aug 2009 11:05:05 +0100
Subject: [EMBOSS] inosine in nucleotide sequence databases
In-Reply-To: <6DE144B7487D104290A097EA7C0C356A016BB94889@rkamsem701.emea.roche.com>
References: <6DE144B7487D104290A097EA7C0C356A016BB94889@rkamsem701.emea.roche.com>
Message-ID: <4A8A7CD1.9030700@ebi.ac.uk>

Dear Isabelle,

Wells, Isabelle wrote:
> Can emboss handle inosine in nucleotide sequences? We have a
> nucleotide file in embl format where some sequences contain inosine.
> Dbiflat doesn't seem to index the database properly although no error
> message was given and those inosine containing sequences cannot be
> retrieved with seqret. Any suggestions on what we could do apart from
> replacing inosine by X or N?

I assume your dbiflat problem is an error in retrieving the entries,
unless there is some other format problem in the database that prevents
entries from being recognized by the dbiflat parser. If you can send me
one of the Inosine-containing entries (or a fake entry if these are
proprietary information) I can check.

We treat Inosine as a modified base. These are usually in RNA sequences.
You should replace it by X or N and if you have an EMBL format feature
table you could add a modified_base feature with a /mod_base=I qualifier
to mark each Inosine. EMBOSS does nothing special with these in the
current release, but you can perhaps suggest applications to use the
modified base information.
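
As a rough illustration of the replacement described above, here is a
minimal sketch in Python. It is not EMBOSS code: the function names are
made up for this example, the feature line layout is only an
approximation of the EMBL feature table, and the qualifier text is
copied from this message as written.

```python
def mask_inosine(seq, replacement="N"):
    """Replace inosine (I or i) and report where it was found.

    Returns the masked sequence and the 1-based positions of each inosine.
    """
    positions = [i + 1 for i, base in enumerate(seq) if base in "Ii"]
    masked = "".join(replacement if base in "Ii" else base for base in seq)
    return masked, positions


def modified_base_features(positions):
    """Build approximate EMBL feature table lines, one feature per position."""
    lines = []
    for pos in positions:
        # Feature key followed by the location of the modified base.
        lines.append(f"FT   {'modified_base':<16}{pos}")
        # Qualifier line, written as suggested in the message above.
        lines.append(f"FT   {'':<16}/mod_base=I")
    return lines


if __name__ == "__main__":
    masked, positions = mask_inosine("ACGIUGI")
    print(masked)
    print("\n".join(modified_base_features(positions)))
```

The positions are kept separately so that the information about the
modified bases is not lost when the residue itself is replaced.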

Hope this helps,

Peter Rice


From xiz407 at gmail.com  Tue Aug 18 15:23:45 2009
From: xiz407 at gmail.com (Zhou Xiang)
Date: Tue, 18 Aug 2009 10:23:45 -0500
Subject: [EMBOSS] can vectorstrip trim only a substring of the adapter?
Message-ID: <d55614630908180823qcd029f6yd86bc00449ebcef4@mail.gmail.com>

Hi all,

I used the vectorstrip to trim the 3' adapter off the sequences.
But it seemed that the program searched for the existence of the entire
adapter.

For example, if I have the read: CCCCCTTTTTAAAAAGGGGG
And 3' adapter is: CCAAAGGG
The program will not trim the read to CCCCCTTTTTAA
Because it does not use the substring "AAAGGG" in the adapter sequence.

Any comments about this? How can I trim only a substring of the adapter?
I hope it can search for the longest match, but substring matches should
also be accepted if no entire adapter is found in the sequence.
Thanks!

-Xiang


From pmr at ebi.ac.uk  Tue Aug 18 16:06:43 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 18 Aug 2009 17:06:43 +0100
Subject: [EMBOSS] can vectorstrip trim only a substring of the adapter?
In-Reply-To: <d55614630908180823qcd029f6yd86bc00449ebcef4@mail.gmail.com>
References: <d55614630908180823qcd029f6yd86bc00449ebcef4@mail.gmail.com>
Message-ID: <4A8AD193.8050102@ebi.ac.uk>

Dear Zhou Xiang,

> How can i trim only a substring of the adapter?

You can use the -mismatch parameter to increase the allowed number of
mismatches.

A higher percent mismatch allows less precise matching, but in this case the
value needs to be set quite high (25).
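
To show where the figure of 25 comes from for the read and adapter in
the question, here is a minimal sketch in Python. It only counts
mismatches for ungapped placements of the whole adapter along the read;
this is an assumption made for illustration and is not a description of
how vectorstrip searches. The function name is hypothetical.

```python
def mismatch_profile(read, adapter):
    """Return (offset, mismatches) for every ungapped placement of adapter."""
    k = len(adapter)
    profile = []
    for offset in range(len(read) - k + 1):
        window = read[offset:offset + k]
        mismatches = sum(1 for r, a in zip(window, adapter) if r != a)
        profile.append((offset, mismatches))
    return profile


if __name__ == "__main__":
    read = "CCCCCTTTTTAAAAAGGGGG"
    adapter = "CCAAAGGG"
    # Pick the placement with the fewest mismatches (first one on ties).
    offset, mismatches = min(mismatch_profile(read, adapter), key=lambda p: p[1])
    percent = 100 * mismatches / len(adapter)
    print(offset, mismatches, percent)  # prints: 10 2 25.0
```

The best placement has 2 mismatches out of 8 adapter bases, i.e. 25
percent. Note that it starts at offset 10, so trimming there would
leave CCCCCTTTTT rather than the CCCCCTTTTTAA asked for.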

We are interested in any comments on removing 3' adapters from short
reads. We expect that we can find improvements in the methods used by
vectorstrip. Please send us any suggestions.
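
One possible approach to the substring matching asked about, sketched
in Python. This is not part of vectorstrip; the function name and the
min_len parameter are assumptions for illustration. It tries adapter
substrings from longest to shortest and trims the read at the first
exact hit.

```python
def trim_3prime_adapter(read, adapter, min_len=4):
    """Trim read at the longest exact adapter substring found in it.

    Substrings shorter than min_len are ignored to limit chance matches.
    The read is returned unchanged if nothing matches.
    """
    for length in range(len(adapter), min_len - 1, -1):
        for start in range(len(adapter) - length + 1):
            piece = adapter[start:start + length]
            pos = read.find(piece)
            if pos != -1:
                # Drop the matched piece and everything 3' of it.
                return read[:pos]
    return read


if __name__ == "__main__":
    print(trim_3prime_adapter("CCCCCTTTTTAAAAAGGGGG", "CCAAAGGG"))
    # prints: CCCCCTTTTTAA
```

Short substrings will match by chance in many reads, so the minimum
length matters a great deal with this kind of search.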

regards,

Peter Rice


From Frank.Foerster at biozentrum.uni-wuerzburg.de  Tue Aug 18 18:15:44 2009
From: Frank.Foerster at biozentrum.uni-wuerzburg.de (=?ISO-8859-15?Q?Frank_F=F6rster?=)
Date: Tue, 18 Aug 2009 20:15:44 +0200
Subject: [EMBOSS] Needle with penalty for end gaps
Message-ID: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de>

Hi,

in the announcement thread for the new EMBOSS version 6.1.0, Daniel Barker
requested a program allowing complete global alignments including
penalties for end gaps. (
http://www.mail-archive.com/emboss@lists.open-bio.org/msg01202.html )

The suggestion was to add a command line parameter to the needle program
to enable/disable the penalties.

Is there any news on this topic? I have to perform a lot of pairwise
global alignments (without free end gaps), so I either have to write
my own software or use existing software. Needle has all the features I
need except that its end gaps are always free. So I am really interested
in getting a version of needle that can help me ;)
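
To make the distinction concrete, here is a minimal sketch in Python of
a global alignment score with and without end-gap penalties. It is not
needle's code: it uses a single linear gap penalty instead of separate
gapopen and gapextend values, and the function name, parameters and
scoring values are assumptions chosen for illustration.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2, penalise_end_gaps=True):
    """Best global alignment score of a and b with a linear gap penalty.

    With penalise_end_gaps=False, leading and trailing gaps cost nothing.
    """
    n, m = len(a), len(b)
    end = gap if penalise_end_gaps else 0
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * end  # leading gap in b
    for j in range(1, m + 1):
        score[0][j] = j * end  # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A gap placed after the last residue of the other sequence
            # is a trailing end gap.
            up = score[i - 1][j] + (end if j == m else gap)
            left = score[i][j - 1] + (end if i == n else gap)
            score[i][j] = max(diag, up, left)
    return score[n][m]


if __name__ == "__main__":
    print(global_score("ACGT", "CG", penalise_end_gaps=True))   # prints: -2
    print(global_score("ACGT", "CG", penalise_end_gaps=False))  # prints: 2
```

In both cases the best alignment places CG under the middle of ACGT;
the only difference is whether the two overhanging bases are charged.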

Thanks for the great EMBOSS package.

Regards,
Frank

-- 
Dipl. Biochem. Frank Förster
Department of Bioinformatics
University of Würzburg, Germany
Fon: +49 931 - 318 4555
Fax: +49 931 - 318 4552
frank.foerster at biozentrum.uni-wuerzburg.de


From pmr at ebi.ac.uk  Wed Aug 19 07:26:10 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 19 Aug 2009 08:26:10 +0100
Subject: [EMBOSS] Needle with penalty for end gaps
In-Reply-To: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de>
References: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de>
Message-ID: <4A8BA912.4080904@ebi.ac.uk>

Frank Förster wrote:
> Are there any news on this topic? I have to perform a lot of pairwise
> global alignments (without free end gaps) and either I have to program
> my own software or use existing software. Needle owns all needed
> features except the "only free end behavior". So I am really interested
> in getting a version of needle able to help me ;)

We are just coming to the end of our "40 days and 40 nights" since the 
release when we try not to break anything by making changes - and while 
we work on finishing the book texts which is really what has kept us busy.

We will get on to this next week (the 40 days runs out on Monday 24th 
:-) and can give you an early version to try.

> Thanks for the great EMBOSS package.

Thanks for the very welcome thanks!

regards,

Peter


From Frank.Foerster at biozentrum.uni-wuerzburg.de  Wed Aug 19 07:29:19 2009
From: Frank.Foerster at biozentrum.uni-wuerzburg.de (=?ISO-8859-15?Q?Frank_F=F6rster?=)
Date: Wed, 19 Aug 2009 09:29:19 +0200
Subject: [EMBOSS] Needle with penalty for end gaps
In-Reply-To: <4A8BA912.4080904@ebi.ac.uk>
References: <4A8AEFD0.8040403@biozentrum.uni-wuerzburg.de>
	<4A8BA912.4080904@ebi.ac.uk>
Message-ID: <4A8BA9CF.5070604@biozentrum.uni-wuerzburg.de>

Dear Peter,

thank you for your fast reply.

> We will get on to this next week (the 40 days runs out on Monday 24th
> :-) and can give you an early version to try.

This sounds very kind of you. I can hardly wait but I will ;)

Regards,
Frank

-- 
Dipl. Biochem. Frank Förster
Department of Bioinformatics
University of Würzburg, Germany
Fon: +49 931 - 318 4555
Fax: +49 931 - 318 4552
frank.foerster at biozentrum.uni-wuerzburg.de


From biopython at maubp.freeserve.co.uk  Wed Aug 19 11:08:26 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 19 Aug 2009 12:08:26 +0100
Subject: [EMBOSS] vectorstrip on FASTQ files
Message-ID: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com>

Hi,

I'm trying to use vectorstrip on FASTQ files (as a simple way to
remove adaptor or primer sequences). However, it seems that on output
the FASTQ qualities are missing (all set to the double quote, ASCII
34, meaning PHRED quality 1 or random). Is this a known bug (or
rather, a missing feature)?
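
For reference, a minimal sketch in Python of how a Sanger FASTQ quality
character maps to a PHRED score (ASCII code minus 33) and to an error
probability. The function names are just for this example.

```python
def sanger_phred(char):
    """PHRED quality encoded by one Sanger FASTQ character (offset 33)."""
    return ord(char) - 33


def error_probability(quality):
    """Probability that the base call is wrong for a given PHRED quality."""
    return 10 ** (-quality / 10)


if __name__ == "__main__":
    q = sanger_phred('"')
    print(q)  # prints: 1
    print(round(error_probability(q), 3))  # prints: 0.794
```

A quality of 1 corresponds to an error probability of about 0.79,
which is why a line of double quotes carries no useful information.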

For illustration I am using a Sanger style FASTQ file from the NCBI
SRA (short reads originally from Solexa/Illumina), SRR014849.fastq
which you can download from
ftp://ftp.ncbi.nlm.nih.gov/sra/static/SRX003/SRX003639/SRR014849.fastq.gz

I am pretending "GTTGGAACCG" is 5' adaptor sequence, and want to find
any matches in some FASTQ reads, and trim it off taking only the
sequence to the right. For simplicity I'm allowing no mismatches.
Here is the start of the file:

$ head -n 12 SRR014849.fastq
@SRR014849.1 EIXKN4201CFU84 length=93
GGGGGGGGGGGGGGGGCTTTTTTTGTTTGGAACCGAAAGGGTTTTGAATTTCAAACCCTTTTCGGTTTCCAACCTTCCAAAGCAATGCCAATA
+SRR014849.1 EIXKN4201CFU84 length=93
3+&$#"""""""""""7F@71,'";C?,B;?6B;:EA1EA1EA5'9B:?:#9EA0D@2EA5':>5?:%A;A8A;?9B;D@/=<?7=9<2A8==
@SRR014849.3 EIXKN4201D4ZBL length=119
GGGGGGGGGCTGTTGGCCGAGGTTGGAGTAGCCAGGGGGAAGGCATGGCCAGCCGTTGAGAAATGCTTGTTGAAGTTTTCGATAATAATGGATTTATCGGTGGTGACCGTGTTACCTAG
+SRR014849.3 EIXKN4201D4ZBL length=119
;3.*(&$"";<=A9@8A9;<B;B;B;8=<==B;<FB8/'@8B:==<B;A9<<A8=B;==;A=)=<<B;=A9<@7<FB5(<<=<B;<B;:A9=EA0;<;B:<A8=<<@8<<<B;<A99=<
@SRR014849.9 EIXKN4201AL42E length=84
AACATAAAGAGCAATAGACAGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA
+SRR014849.9 EIXKN4201AL42E length=84
B:=8<EA087<;@8<<<8<:8A9=3>5B;4B>+C?,EA09B;@;9E@/EA/E@/B:;1B:B:;A9<5<B;;8EA0<<B;FB6)7

Notice the "adaptor" is in the third sequence, SRR014849.9,
AACATAAAGAGCAATAGACAGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA
This should be trimmed to just:
AAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA

Using FASTA as output looks fine:

$ vectorstrip -sequence SRR014849.fastq -sformat fastq-sanger
-readfile N -alinker "GTTGGAACCG" -blinker "" -osformat fasta -outseq
SRR014849_5trimmed.fasta -mismatch 0 -besthits Y -outfile
SRR014849_5trimmed.txt
Removes vectors from the ends of nucleotide sequence(s)

$ head -n 2 SRR014849_5trimmed.fasta
>SRR014849.9_from_31_to_84 EIXKN4201AL42E length=84
AAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA

Using Sanger FASTQ runs:

$ vectorstrip -sequence SRR014849.fastq -sformat fastq-sanger
-readfile N -alinker "GTTGGAACCG" -blinker "" -osformat fastq-sanger
-outseq SRR014849_5trimmed.fastq -mismatch 0 -besthits Y -outfile
SRR014849_5trimmed.txt
Removes vectors from the ends of nucleotide sequence(s)

But the output is missing the quality scores:

$ head -n 4 SRR014849_5trimmed.fastq
@SRR014849.9_from_31_to_84 EIXKN4201AL42E length=84
AAAGGGTTTGAATTCAAACCCTTTGGTTCCAACTTGTCTTGCTTTAGCCTTTTA
+
""""""""""""""""""""""""""""""""""""""""""""""""""""""

Is this something simple to add to vectorstrip? What about other
annotation (e.g. running vectorstrip on annotated GenBank or EMBL
files)?
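
What I would expect is sketched below in Python: when the 5' adaptor
is removed, the quality string is sliced at the same position as the
sequence. This is only an illustration with exact matching and a
made-up function name, not a proposal for how vectorstrip should be
implemented.

```python
def trim_5prime(seq, qual, adaptor):
    """Remove everything up to and including the first exact adaptor match.

    The quality string is cut at the same position so that sequence and
    qualities stay the same length. Unmatched reads are returned as given.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    pos = seq.find(adaptor)
    if pos == -1:
        return seq, qual
    cut = pos + len(adaptor)
    return seq[cut:], qual[cut:]


if __name__ == "__main__":
    # Toy record: the qualities here are invented for the example.
    seq, qual = trim_5prime("AAGTTGGAACCGACGT", "IIHHGGFFEEDDCCBB", "GTTGGAACCG")
    print(seq)   # prints: ACGT
    print(qual)  # prints: CCBB
```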

Thanks,

Peter C.

P.S. This is with EMBOSS 6.1.0 with a patch from Peter Rice, running
on Mac OS X.


From pmr at ebi.ac.uk  Wed Aug 19 11:24:41 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 19 Aug 2009 12:24:41 +0100
Subject: [EMBOSS] vectorstrip on FASTQ files
In-Reply-To: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com>
References: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com>
Message-ID: <4A8BE0F9.2020404@ebi.ac.uk>

Peter C. wrote:
> Hi,
> 
> I'm trying to use vectorstrip on FASTQ files (as a simple way to
> remove adaptor or primer sequences). However, it seems that on output
> the FASTQ qualities are missing (all set to the double quote, ASCII
> 34, meaning PHRED quality 1 or random). Is this a known bug (or
> rather, a missing feature)?

It is a missing feature. vectorstrip was written before quality scores
became fashionable and, curiously, nobody has asked for them before.

We will certainly retain them in a future release.

regards,

Peter


From biopython at maubp.freeserve.co.uk  Wed Aug 19 11:31:17 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 19 Aug 2009 12:31:17 +0100
Subject: [EMBOSS] vectorstrip on FASTQ files
In-Reply-To: <4A8BE0F9.2020404@ebi.ac.uk>
References: <320fb6e00908190408j25f2eca0l6356b0fcd0526422@mail.gmail.com>
	<4A8BE0F9.2020404@ebi.ac.uk>
Message-ID: <320fb6e00908190431i23a27ed7g46cf9223b191d5f5@mail.gmail.com>

 Peter Rice wrote:
>
> Peter C. wrote:
>> Hi,
>>
>> I'm trying to use vectorstrip on FASTQ files (as a simple way to
>> remove adaptor or primer sequences). However, it seems that on output
>> the FASTQ qualities are missing (all set to the double quote, ASCII
>> 34, meaning PHRED quality 1 or random). Is this a known bug (or
>> rather, a missing feature)?
>
> It is a missing feature. vectorstrip was written before quality scores
> became fashionable and, curiously, nobody has asked for them before.
>
> We will certainly retain them in a future release.

Great - thanks!

Peter C.


From frank.foerster at biozentrum.uni-wuerzburg.de  Mon Aug 24 09:01:49 2009
From: frank.foerster at biozentrum.uni-wuerzburg.de (=?ISO-8859-15?Q?Frank_F=F6rster?=)
Date: Mon, 24 Aug 2009 11:01:49 +0200
Subject: [EMBOSS] Gap cost restrictions for needle/water/stretcher?
Message-ID: <4A9256FD.8070708@biozentrum.uni-wuerzburg.de>

Hi,

I have a question about the allowed gap costs in several programs. I am
using needle, water and stretcher, for example.

There are some restrictions on the gap costs I have to use:

1) needle: float from 0-100 for gapopen and 0-10 for gapextend
2) water: float from 0.000-10.000 for gapopen and 0.000-10.000 for gapextend
3) stretcher: positive integer

What is the meaning of these restrictions? I think you use an integer value for
stretcher (I did not check the source code) and floats for needle/water.

But why the restriction for water to three decimal places?

And more interestingly, why the restriction to 0-100/0-10 for needle/water?

Thank you for your efforts!

Frank Förster


-- 
Dipl. Biochem. Frank Förster
Department of Bioinformatics
University of Würzburg, Germany
Fon: +49 931 - 318 4555
Fax: +49 931 - 318 4552
frank.foerster at biozentrum.uni-wuerzburg.de


From pmr at ebi.ac.uk  Mon Aug 24 11:52:48 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 24 Aug 2009 12:52:48 +0100
Subject: [EMBOSS] Gap cost restrictions for needle/water/stretcher?
In-Reply-To: <4A9256FD.8070708@biozentrum.uni-wuerzburg.de>
References: <4A9256FD.8070708@biozentrum.uni-wuerzburg.de>
Message-ID: <4A927F10.1000701@ebi.ac.uk>

Frank Förster wrote:
> Hi,
> 
> I have only one question about the allowed gap costs in several
> programs. I using needle, water and stretcher for example.
> 
> There are some restrictions to the gap costs a have to use:
> 
> 1) needle: float from 0-100 for gapopen and 0-10 for gapextend
> 2) water: float from 0.000-10.000 for gapopen and 0.000-10.000 for
> gapextend
> 2) stretcher: positive integer
> 
> What are the meaning of these restrictions? I think you use an integer
> value for stretcher (I did not check the source code) and floats for
> needle/water.

Stretcher and matcher were imported code that used integer values for
speed. Our matrix files use integer values so we can use integers or
floats as gap penalty values.

> But why the restriction for water to three decimal places?

There is no 3 decimal places restriction, we only use 3 decimal places
to write out the values.

> But more interesting, why the restriction to 0-100/0-10 for needle/water?

We set limits for needle and water with the first release of EMBOSS and
nobody has asked for a higher value.

Zero is useful for some cases, either to not penalise the number of gaps
(for example a large number of single base gaps in a single nucleotide
read) or to not penalise the gap length (genomic sequence aligned to
mRNA/cDNA).
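
The two zero cases can be illustrated with a minimal sketch in Python.
The formula below is one common affine convention (the opening penalty
covers the first gap position, each further position costs one
extension); it is assumed here for illustration and is not taken from
the EMBOSS source. The function name is hypothetical.

```python
def affine_gap_cost(length, gapopen, gapextend):
    """Cost of one gap of the given length under an assumed affine scheme."""
    if length < 1:
        raise ValueError("a gap has at least one position")
    return gapopen + gapextend * (length - 1)


if __name__ == "__main__":
    # gapextend = 0: a long gap (e.g. an intron) costs the same as a short one.
    print(affine_gap_cost(1, 10.0, 0.0), affine_gap_cost(5000, 10.0, 0.0))
    # gapopen = 0: many single-base gaps add nothing to the total penalty.
    print(sum(affine_gap_cost(1, 0.0, 0.5) for _ in range(20)))
```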

The upper limits are enough for the cases we have seen.

More interesting is why we have no upper limit for stretcher and
matcher. We should be consistent. These were third-party applications
(from Bill Pearson's fasta2 package) that we imported.

Does anyone object to setting the same gap penalty limits for all
applications?

Can anyone think of a use case that needs a larger maximum value?

We can add applications to suggest gap penalties for each matrix file
... or store default values in the files. Is this useful?

regards,

Peter Rice


From pmr at ebi.ac.uk  Tue Aug 25 10:59:35 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 25 Aug 2009 11:59:35 +0100
Subject: [EMBOSS] EMBOSS patch 1-2 for 6.1.0
Message-ID: <4A93C417.2070502@ebi.ac.uk>

A patch for EMBOSS 6.1.0 is on the FTP server. This fixes a problem with 
reading the new UniProt/SwissProt description line. The bug is in extending 
strings within lists, so it may have had other effects. We recommend 
patching your EMBOSS installations as several of the new format SwissProt 
entries are unreadable.

The files are on our FTP server ftp://emboss.open-bio.org/pub/EMBOSS/fixes
with a patch file and instructions in the patches subdirectory.

Fix 2. EMBOSS-6.1.0/ajax/ajmem.h
        EMBOSS-6.1.0/ajax/ajstr.c
        EMBOSS-6.1.0/ajax/ajstr.h
        EMBOSS-6.1.0/nucleus/embaln.c

24-Aug-2009: Fix string extension so that pointers in lists remain valid.
              This fixes a bug in processing SwissProt complex descriptions.
              Fix definition of AJRESIZE0 macro.
              Fix processing of first match in a prophet profile alignment

regards,

Peter Rice


From CAPS at novozymes.com  Tue Aug 25 11:56:34 2009
From: CAPS at novozymes.com (=?iso-8859-1?Q?CAPS_=28Carsten_P=2E_S=F6nksen=29?=)
Date: Tue, 25 Aug 2009 13:56:34 +0200
Subject: [EMBOSS] Pepstats "Molecular weight" calculations
Message-ID: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net>

Hi
We are using pepstats for molecular weight calculations and subsequent
comparison with masses determined by mass spectrometry.
I am looking for the mass table used for the molecular weight calculations
of the proteins in order to determine the accuracy, and for a way to
change it.

The other question is about implementing a molecular weight calculation
that assumes the cysteines form disulfide bridges.
This question is related to my first one. Since we compare the intact
molecular weight of the proteins, we want to be as precise as possible and
thus measure the difference between reduced and oxidized cysteine residues.
Most proteins with cysteine residues form disulfide bridges.
Would it be possible to include a molecular weight calculation which takes
disulfide bridges into account? So that an even number of cysteines is
calculated with the mass of oxidized cysteines (S-S), and if there is a
single cysteine left, it is calculated with a sulfhydryl group (SH)?

Best Regards
Carsten P. Sönksen
Senior Scientist

Novozymes A/S
Krogshoejvej 36
2880 Bagsvaerd Denmark
Phone: +45 44461123
Mobile: +45 30771123
E-mail: caps at novozymes.com
Novozymes A/S (reg. no.: 10007127). Registered address: Krogshoejvej 36 DK-2880 Bagsvaerd, Denmark
This e-mail (including any attachments) is for the intended addressee(s) only and may contain confidential and/or proprietary information protected by law. You are hereby notified that any unauthorized reading, disclosure, copying or distribution of this e-mail or use of information herein is strictly prohibited. If you are not an intended recipient you should delete this e-mail immediately. Thank you.


From pmr at ebi.ac.uk  Tue Aug 25 12:56:30 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 25 Aug 2009 13:56:30 +0100
Subject: [EMBOSS] Pepstats "Molecular weight" calculations
In-Reply-To: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net>
References: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net>
Message-ID: <4A93DF7E.4030305@ebi.ac.uk>

CAPS (Carsten P. Sönksen) wrote:
> Hi
> We are using Pepstat for molecular weight calculations and subsequent
> comparison with mass spectrometric determined masses. I am looking for
> the mass table used for the molecular weight calculations of the
> proteins in order to determine the accuracy. And how it could be
> possible to change it.

The table is in a file called Emolwt.dat

This should be included in the local data files section of the pepstats 
documentation. We will add it. It is at least mentioned in the -help output 
and in the command line section of the documentation. The local data files 
section should describe the file in more detail.

A copy in your local directory (embossdata -fetch will copy the EMBOSS 
version for you) will be used in preference to the installed copy.

> The other question is implementation of a molecular weight assuming that
> the cysteins form disulfide bridges. This question is related to my
> first line. Since we compare the intact molecular weight of the proteins
> we want to be as precise as possible and thus measure the difference
> between reduced and oxidized cystein residues. Most proteins with
> cystein residues form disulfide bridges. Would it be possible to include
> a molecular weight calculation which takes disulfide bridges into
> account? So that an even nr of cysteins are calculated with the mass of
> oxidized  cysteins (S-S) and if there should be an single cystein left
> then it is calculated with a sulfhydryl group (SH)?

Good suggestion. We can add that for the next release. We would add an 
option for the number of S-S bridges and adjust the molecular weight.
We have a similar option already for iep.

Is there a need for single cysteines to allow for inter-chain disulphide 
bridges?

Are there any other adjustments you would like?

regards,

Peter Rice


From CAPS at novozymes.com  Tue Aug 25 13:46:28 2009
From: CAPS at novozymes.com (=?iso-8859-1?Q?CAPS_=28Carsten_P=2E_S=F6nksen=29?=)
Date: Tue, 25 Aug 2009 15:46:28 +0200
Subject: [EMBOSS] Pepstats "Molecular weight" calculations
In-Reply-To: <4A93DF7E.4030305@ebi.ac.uk>
References: <4D0464992D73D44A93400D893E2A2C763C01848AED@NZT0013E.dknz.nzcorp.net>
	<4A93DF7E.4030305@ebi.ac.uk>
Message-ID: <4D0464992D73D44A93400D893E2A2C763C01848E4B@NZT0013E.dknz.nzcorp.net>

Hi Peter,

Thanks a lot for the fast and positive reply.

Regarding the molecular weight calculation including the disulfide bridges:
Would it be possible to have an option so that pepstats always calculates
the molecular weight for the highest possible number of disulfide bridges
and, if a single cysteine is left over, calculates that one with a
sulfhydryl group?
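
As a minimal sketch in Python of the rule I have in mind: pair up as
many cysteines as possible and subtract two hydrogen atoms per bridge
from the fully reduced molecular weight. This is not pepstats code; the
function name is made up, and the average atomic mass of hydrogen
(1.00794) is used here as an assumption about which mass scale applies.

```python
HYDROGEN_AVG_MASS = 1.00794  # average atomic mass of hydrogen, in Da


def oxidised_molecular_weight(reduced_mw, n_cysteines):
    """Adjust a reduced molecular weight for the maximum number of S-S bridges.

    Returns the adjusted weight, the number of bridges and the number of
    cysteines left with a free sulfhydryl group (0 or 1).
    """
    bridges = n_cysteines // 2
    free_thiols = n_cysteines % 2
    # Forming one disulfide bridge removes two hydrogen atoms.
    adjusted = reduced_mw - 2 * HYDROGEN_AVG_MASS * bridges
    return adjusted, bridges, free_thiols


if __name__ == "__main__":
    # Hypothetical protein: 10000.0 Da reduced, 5 cysteines.
    print(oxidised_molecular_weight(10000.0, 5))
```

With 5 cysteines this gives 2 bridges and 1 free sulfhydryl group, and
the weight drops by about 4.03 Da.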

This option would also be nice for the iep calculation.

"Is there a need for single cysteines to allow for inter-chain disulphide 
bridges?"
Not currently. I believe that would take us to a level where human
interaction is needed.

Right now no further adjustments come to mind.

Do you have an estimated time range when I can expect the next release?


Best Regards
Carsten P. Sönksen
Senior Scientist

Novozymes A/S
Krogshoejvej 36

2880 Bagsvaerd Denmark
Phone: +45 44461123
Mobile: +45 30771123
E-mail: caps at novozymes.com



From gbottu at vub.ac.be  Tue Aug 25 16:06:44 2009
From: gbottu at vub.ac.be (Guy Bottu)
Date: Tue, 25 Aug 2009 18:06:44 +0200
Subject: [EMBOSS] wrappers4EMBOSS 2.3.0 released
Message-ID: <4A940C14.5070705@vub.ac.be>

	Dear users of wrappers4EMBOSS,

This mail concerns you if you are using or intend to use wrappers4EMBOSS 
with one of the following: EMBOSS 6.1.0, MRS 4, PhyML 3, CLUSTAL 2, 
InterProScan 4.5, EBI fastA access through Web Services.

You might be interested in upgrading for one of the following reasons:
- We support all EMBOSS versions from 3.0.0 to 6.1.0 (we had to take
into account that MYEMBOSS can use "source" as well as "src" as directory
name, and that EMBOSS 6.1.0 requires parameter names to be unique in
their first 6 characters).
- We support MRS version 4 as well as version 3.
- We have abandoned support for PhyML version 2 in favour of version 3.
The wrapper for ModelGenerator has been modified accordingly: it now
starts PhyML automatically with a model generated by ModelGenerator,
no longer using the script generated by ModelGenerator itself (which is
for version 2) but a Perl script that parses the ModelGenerator
output. The user can choose whether to use the model selected according
to the Akaike, modified Akaike or Bayesian information criterion.
- We support the new optional features introduced in CLUSTAL version 2
(using UPGMA instead of NJ, not using sequence weights, improving the
alignment by iterative re-alignment).
- The module for InterProScan works with version 4.5 and has HAMAP in
its menu.
- The list of databank names in ebi_fasta has been updated to match the
current situation on the server.
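
For anyone writing their own wrappers, the requirement on parameter
names mentioned in the first point can be checked with a few lines of
Python. This is only a sketch: the function name and the example
parameter names are hypothetical and are not taken from wrappers4EMBOSS
or from any ACD file.

```python
from collections import defaultdict


def prefix_collisions(names, prefix_len=6):
    """Group parameter names that share the same first prefix_len characters.

    Returns a dict mapping each clashing prefix to the names that share it.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name[:prefix_len]].append(name)
    return {prefix: group for prefix, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    print(prefix_collisions(["modelfile", "modeltype", "outfile", "tree"]))
    # prints: {'modelf': ['modelfile'], ...} only for clashes; here nothing clashes
    print(prefix_collisions(["sequence", "sequencetype", "outfile"]))
```

An empty result means every name is already unique in its first 6
characters.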

	Guy Bottu,
	wEMBOSS development team