From aengus.stewart at cancer.org.uk Thu Mar 2 09:56:25 2006
From: aengus.stewart at cancer.org.uk (Aengus Stewart)
Date: Thu, 02 Mar 2006 14:56:25 +0000
Subject: [EMBOSS] DB - finding out how many sequences
Message-ID: <44070799.6090804@cancer.org.uk>

Hi,

Do any of the EMBOSS apps output the number of sequences that they have searched?

I am after this figure as I have a data library issue.

Some sequences are "not found" by EMBOSS even though I know they are in the original flat files.

I am trying to figure out if this is a

configuration problem
data problem
indexing problem.

The indexing with dbiflat doesn't complain, but I would like to be able to check my input number of sequences against what EMBOSS thinks was output.

Cheers
Aengus

--
-----------------------------------------------------------------------
Aengus Stewart
Group Leader
Bioinformatics and BioStatistics Tel: +44 (0)20 7269 3679
Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK
-----------------------------------------------------------------------

This electronic message contains information which may be privileged and
confidential. The information is intended to be for the use of the
individual(s) or entity named above. Be aware that any third party
disclosure, distribution, copying or use of this communication, without
prior permission, is strictly prohibited.

From jison at ebi.ac.uk Thu Mar 2 11:35:35 2006
From: jison at ebi.ac.uk (Jon Ison)
Date: Thu, 2 Mar 2006 16:35:35 -0000 (GMT)
Subject: [EMBOSS] DB - finding out how many sequences
In-Reply-To: <44070799.6090804@cancer.org.uk>
References: <44070799.6090804@cancer.org.uk>
Message-ID: <43713.172.31.100.168.1141317335.squirrel@webmail.ebi.ac.uk>

Ay up Aengus

Not so far as I'm aware, although you could get that number indirectly by using infoseq.

You could try using dbxflat too, which does generate some stats on the input data. I don't know whether the stats include the number of sequences that were indexed, but it's worth a look.
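For the input side of that comparison, here is a minimal sketch, assuming the flat files are in EMBL or Swiss-Prot format, where every entry opens with an `ID` line (GenBank-style files would need `LOCUS` instead). The function name `count_entries` is made up for illustration and is not an EMBOSS tool.

```shell
# count_entries FILE...
# Count entries in EMBL/Swiss-Prot style flat files by counting the
# "ID" lines that open each entry (the file format is an assumption).
count_entries() {
    cat "$@" | grep -c '^ID '
}
```

Something like `count_entries /path/to/flatfiles/*.dat` then gives a figure to hold against the number of sequences infoseq lists for the indexed database.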

Cheers

Jon

From pmr at ebi.ac.uk Thu Mar 2 12:39:07 2006
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 02 Mar 2006 17:39:07 +0000
Subject: [EMBOSS] DB - finding out how many sequences
In-Reply-To: <44070799.6090804@cancer.org.uk>
References: <44070799.6090804@cancer.org.uk>
Message-ID: <44072DBB.1060506@ebi.ac.uk>

Hi Aengus,

> Does any of the EMBOSS apps output the number of sequences that it has searched?
>
> I am after this figure as I have a data library issue.
>
> Some sequences are "not found" by EMBOSS even though I know they are in the original flat files.

What is your database definition?

regards,

Peter

From joanne at bioinformatics.ubc.ca Mon Mar 6 17:51:00 2006
From: joanne at bioinformatics.ubc.ca (Joanne Fox)
Date: Mon, 06 Mar 2006 14:51:00 -0800
Subject: [EMBOSS] Warning: Cannot open division file '' for database 'swissprot'
Message-ID: <440CBCD4.8030605@bioinformatics.ubc.ca>

Hello EMBOSS community,

I used dbiflat to index the latest flatfile distribution of swissprot (uniprot_sprot.dat). Now I am trying to use this database with the EMBOSS patmatdb program and I'm encountering an error that reads, "Warning: Cannot open division file '' for database 'swissprot'".

I searched the mailing list archives and I see others with this same problem that boil down to permissions and/or path problems. However, I still can't figure out what's going wrong on my system. I've put more detailed information below.

I'm new to the world of configuring EMBOSS so if anyone has any ideas about what might be going wrong, I'd really appreciate the advice.

Thanks,
Joanne.

--
| Joanne Fox
| http://bioinformatics.ubc.ca/people/joanne

~> showdb
Displays information on the currently available databases
# Name       Type ID  Qry All Comment
# ====       ==== ==  === === =======
swissprot    P    OK  OK  OK  Swissprot Release 7.1, 2/21/2006

~> patmatdb
Search a protein sequence with a motif
Input sequence(s): swissprot
Warning: Cannot open division file '' for database 'swissprot'
Error: Unable to read sequence 'swissprot'
Input sequence(s): swissprot:*
Warning: Cannot open division file '' for database 'swissprot'
Error: Unable to read sequence 'swissprot:*'
Died: patmatdb terminated: Bad value for '-sequence' and no more retries

contents of .embossrc file:

set emboss_logfile /usr/local/software/bioinformatics/emboss/log/emboss.log
set emboss_database_dir /raid1/bioinformatics/data

DB swissprot [
   type: P
   method: emblcd
   format: swissprot
   dir: \$emboss_database_dir/swissprot/swissprot_V7_1
   file: "*.dat"
   release: "7.1"
   comment: "Swissprot Release 7.1, 2/21/2006"
]

contents of the
/raid1/bioinformatics/data/swissprot/swissprot_V7_1/ directory:

-rw-r--r-- 1 bin bin   1165524 Mar 6 12:37 acnum.hit
-rw-r--r-- 1 bin bin   3899118 Mar 6 12:37 acnum.trg
-rw-r--r-- 1 bin bin       322 Mar 6 12:37 division.lkp
-rw-r--r-- 1 bin bin   4368405 Mar 6 12:37 entrynam.idx
-rw-r--r-- 1 bin bin 802445434 Mar 6 12:23 uniprot_sprot.dat

From yezhiqiang at gmail.com Mon Mar 6 16:29:49 2006
From: yezhiqiang at gmail.com (Zhiqiang Ye)
Date: Tue, 7 Mar 2006 05:29:49 +0800
Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence?
Message-ID: <34198fe40603061329u85c9f95p@mail.gmail.com>

Dear all,

Does EMBOSS have a handy way to mutate a protein sequence in a specified way?
For example, I have a sequence foo.fasta:

>foo
MATSCGLLKIIQRE

It has a mutant called 'A2L'. Is there any way to perform this operation and produce the following output (with an option to check that foo.fasta has 'A' at position 2)?

>foo A2L
MLTSCGLLKIIQRE

My way: use extractseq to extract two files, one before position 2 and the other after position 2. Then create a FASTA file containing 'L'. After that, I use union to join these 3 sequence files into one.

Or write a Perl script that does this by changing a substring of the sequence string.

It would be nice if EMBOSS could provide a 'mutate' program!

Thank you :)

Best regards!
--
Zhiqiang Ye

From Marc.Logghe at DEVGEN.com Tue Mar 7 02:59:27 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 7 Mar 2006 08:59:27 +0100
Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B99@ANTARESIA.be.devgen.com>

Can msbar do something for you?
msbar = "Mutate sequence beyond all recognition"

Cheers,
Marc

> -----Original Message-----
> From: emboss-bounces at emboss.open-bio.org
> [mailto:emboss-bounces at emboss.open-bio.org] On Behalf Of Zhiqiang Ye
> Sent: Monday, March 06, 2006 10:30 PM
> To: emboss at emboss.open-bio.org
> Subject: [EMBOSS] Does emboss have a handy way for mutate a
> protein sequence?
> > Dear all, > > Does emboss have a handy way for mutate a protein > sequence by the specified way? > For example, I have a sequence foo.fasta > > >foo > MATSCGLLKIIQRE > > It has a mutant called 'A2L'. Is there any way to do this > operation to output(with an option to check the foo.fasta has > 'A' at position > 2): > >foo A2L > MLTSCGLLKIIQRE > > My way: use extractseq to extract two file: one before > position 2, the other after postion 2. Then creat a fasta > file contain 'L'. After that, I use union to connect these > 3 sequence file in to one. > > Or write a perl script to do this by change a string's substring. > > How If emboss could provide a 'mutate' ! > > Thank you :) > > Best regards! > -- > Zhiqiang Ye > > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss > From David.Bauer at SCHERING.DE Tue Mar 7 01:40:24 2006 From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE) Date: Tue, 7 Mar 2006 07:40:24 +0100 Subject: [EMBOSS] Antwort: Does emboss have a handy way for mutate a protein sequence? In-Reply-To: <34198fe40603061329u85c9f95p@mail.gmail.com> Message-ID: What about this solution: cutseq foo.fasta -from 2 -to 2 | pasteseq -filter -pos 1 -bs asis:'L' | descseq -filter -append -desc "A2L" >foo A2L MLTSCGLLKIIQRE Cheers, David. emboss-bounces at emboss.open-bio.org schrieb am 06/03/2006 22:29:49: > Dear all, > > Does emboss have a handy way for mutate a protein sequence by > the specified way? > For example, I have a sequence foo.fasta > > >foo > MATSCGLLKIIQRE > > It has a mutant called 'A2L'. Is there any way to do this operation > to output(with an option to check the foo.fasta has 'A' at position > 2): > >foo A2L > MLTSCGLLKIIQRE > > My way: use extractseq to extract two file: one before position 2, > the other after postion 2. Then creat a fasta file contain 'L'. After > that, I use union to connect these 3 sequence file in to one. 
> > Or write a perl script to do this by change a string's substring. > > How If emboss could provide a 'mutate' ! > > Thank you :) > > Best regards! > -- > Zhiqiang Ye > > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss From yezhiqiang at gmail.com Tue Mar 7 08:22:57 2006 From: yezhiqiang at gmail.com (Zhiqiang Ye) Date: Tue, 7 Mar 2006 21:22:57 +0800 Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence? In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746B99@ANTARESIA.be.devgen.com> References: <0C528E3670D8CE4B8E013F6749231AA6746B99@ANTARESIA.be.devgen.com> Message-ID: <34198fe40603070522r5d0920f9q@mail.gmail.com> 2006/3/7, Marc Logghe : > Can msbar do something for you ? Msbar = "Mutate sequence beyond all > recognition" Thank you. I have checked msbar, it cannot do what I need. -- Zhiqiang Ye From pmr at ebi.ac.uk Tue Mar 7 08:55:39 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 07 Mar 2006 13:55:39 +0000 Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence? In-Reply-To: <34198fe40603061329u85c9f95p@mail.gmail.com> References: <34198fe40603061329u85c9f95p@mail.gmail.com> Message-ID: <440D90DB.6040605@ebi.ac.uk> Zhiqiang Ye wrote: > Dear all, > > Does emboss have a handy way for mutate a protein sequence by > the specified way? > For example, I have a sequence foo.fasta > > >>foo > > MATSCGLLKIIQRE > > It has a mutant called 'A2L'. Is there any way to do this operation > to output(with an option to check the foo.fasta has 'A' at position > 2): > >>foo A2L > > MLTSCGLLKIIQRE EMBOSS has several programs to change sequences. None does exactly what you ask. You could look at: biosed (does what you ask for longer replacements, but will change all 'A's to 'L's.) We could extend biosed to specify the position of the pattern ... is that what you need? 
regards, Peter From gbottu at ben.vub.ac.be Tue Mar 7 10:38:08 2006 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Tue, 7 Mar 2006 16:38:08 +0100 Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence In-Reply-To: <34198fe40603061329u85c9f95p@mail.gmail.com> References: <34198fe40603061329u85c9f95p@mail.gmail.com> Message-ID: <20060307153808.GA15947@bigben.ulb.ac.be> Your solution works, as does the one proposed by David Bauer. Both are however rather tedious. It goes much easier with an interactive sequence editor. There is MSE, not a standard EMBOSS program but distributed as Embassadir. It runs in a VT100 terminal ; it is a little bit intimidating for the novice user, but you can with some practice learn to use it. At the BEN site we have besides MSE also installed SeaView, a graphical mode editor (has versions for Windows, Macintosh and X-Window). These editors are of course only usable if you work locally in your own computer or in a terminal session in a remote computer. It will not work if you are using a Web interface for EMBOSS ... although some Web interfaces might have an applet mode editor that allows to save the modified sequence back on the server (is there one in Jemboss ?). Hope this helps, Guy Bottu, Belgian EMBnet Node From yezhiqiang at gmail.com Tue Mar 7 08:25:02 2006 From: yezhiqiang at gmail.com (Zhiqiang Ye) Date: Tue, 7 Mar 2006 21:25:02 +0800 Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence? In-Reply-To: References: <34198fe40603061329u85c9f95p@mail.gmail.com> Message-ID: <34198fe40603070525w1ffa7155n@mail.gmail.com> 2006/3/7, David.Bauer at schering.de : > What about this solution: > > cutseq foo.fasta -from 2 -to 2 | pasteseq -filter -pos 1 -bs asis:'L' | > descseq -filter -append -desc "A2L" Thanks a lot. It works very well! 
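The wild-type check asked for in the original question is not part of the pipeline above. A minimal sketch of the substitution with that check, using plain shell and awk instead of EMBOSS tools, might look like this. It assumes a single-sequence FASTA file and a mutation written as one residue, a 1-based position and one residue (e.g. A2L); the function name `mutate_fasta` is made up for illustration.

```shell
# mutate_fasta FILE MUTATION     e.g. mutate_fasta foo.fasta A2L
# Assumes FILE holds exactly one FASTA record (illustrative sketch only).
mutate_fasta() {
    file=$1
    mut=$2
    wt=${mut%"${mut#?}"}          # first character: wild-type residue
    mt=${mut#"${mut%?}"}          # last character: mutant residue
    pos=${mut#?}; pos=${pos%?}    # digits in between: 1-based position
    awk -v wt="$wt" -v mt="$mt" -v pos="$pos" -v mut="$mut" '
        /^>/ { header = $0; next }      # remember the description line
             { seq = seq $0 }           # join wrapped sequence lines
        END {
            found = substr(seq, pos, 1)
            if (found != wt) {
                printf "expected %s at position %d, found %s\n", wt, pos, found > "/dev/stderr"
                exit 1
            }
            print header " " mut
            print substr(seq, 1, pos - 1) mt substr(seq, pos + 1)
        }' "$file"
}
```

The function returns a non-zero status and prints nothing to standard output when the residue at the given position is not the expected wild type, so a batch run can stop or skip on a mismatch.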

Best
--
Zhiqiang Ye

From yezhiqiang at gmail.com Tue Mar 7 11:33:40 2006
From: yezhiqiang at gmail.com (Zhiqiang Ye)
Date: Wed, 8 Mar 2006 00:33:40 +0800
Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence?
In-Reply-To: <440D90DB.6040605@ebi.ac.uk>
References: <34198fe40603061329u85c9f95p@mail.gmail.com> <440D90DB.6040605@ebi.ac.uk>
Message-ID: <34198fe40603070833p627159b0i@mail.gmail.com>

2006/3/7, Peter Rice :
>
> EMBOSS has several programs to change sequences. None does exactly what you ask.
>
> You could look at:
>
> biosed (does what you ask for longer replacements, but will change all 'A's to
> 'L's.)

Yeah, it will change all 'A's to 'L's...

> We could extend biosed to specify the position of the pattern ... is that what
> you need?
>

Yes! If biosed can be extended to do this, it will be better :)

Best Regards!
--
Zhiqiang Ye

From yezhiqiang at gmail.com Tue Mar 7 11:42:00 2006
From: yezhiqiang at gmail.com (Zhiqiang Ye)
Date: Wed, 8 Mar 2006 00:42:00 +0800
Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence
In-Reply-To: <20060307153808.GA15947@bigben.ulb.ac.be>
References: <34198fe40603061329u85c9f95p@mail.gmail.com> <20060307153808.GA15947@bigben.ulb.ac.be>
Message-ID: <34198fe40603070842i15a34763s@mail.gmail.com>

Hi Guy Bottu,

Thank you. But I have to do a batch of these substitutions, so a command-line solution will be better. I wrote an ugly shell script to do this, following David Bauer's suggestion.

#!/bin/bash
# Usage: mutate.sh foo.fasta A2L
# Uses bash substring expansion, so it needs bash rather than plain sh.
mutation=$2
WT=${mutation:0:1}                  # wild-type residue (parsed, but not checked against the input)
POS=${mutation:1:${#mutation}-2}    # 1-based position of the residue to replace
MT=${mutation: -1}                  # mutant residue
POS2=$((POS - 1))                   # pasteseq inserts after this position

cutseq -filter -from "$POS" -to "$POS" < "$1" |
    pasteseq -filter -pos "$POS2" -bs "asis:$MT" |
    descseq -filter -append -desc " (mutant: $mutation )"

With this script mutate.sh in my ~/bin, I can type this:

mutate.sh foo.fasta A2L

Best
--
Zhiqiang Ye

From Marc.Logghe at DEVGEN.com Wed Mar 8 04:00:14 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 8 Mar 2006 10:00:14 +0100
Subject: [EMBOSS] Oddcomp behaves oddly ...
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746BA7@ANTARESIA.be.devgen.com>

... Or rather, how should I use it properly?

OK, suppose you run compseq to obtain the frequency for individual residues:
compseq tsw:Q62671 -word 1
Apparently this example protein sequence is rather rich in leucine (106 L out of 889).

In order to detect this leucine bias, a little file was created (leu.comp) that had the following content:

Word size 1
Total count 0

# bias should be detected as 106 > 100
L 100

Oddcomp was run like this:
oddcomp tsw:Q62671 -infile leu.comp -window 889

But the sequence is not reported.
When I change the L count to 10 in leu.comp it does not work either.
Strangely enough, when the default window is taken (30) the sequence is reported.
What is happening here?

Regards,
Marc

From d.gatherer at vir.gla.ac.uk Wed Mar 8 04:30:13 2006
From: d.gatherer at vir.gla.ac.uk (Derek Gatherer)
Date: Wed, 08 Mar 2006 09:30:13 +0000
Subject: [EMBOSS] clustalw vs. emma
Message-ID: <6.2.3.4.1.20060308092040.02aabc78@lenzie.gla.ac.uk>

Morning all

Is there some unusual default being passed to emma?
For instance, here's emma with a vanilla set of parameters on a fairly well conserved set of proteins (bdlf4.fa):

yoda:cluscheck 157 > emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto

 CLUSTAL W (1.83) Multiple Sequence Alignments

Sequence type explicitly set to Protein
Sequence format is Pearson
Sequence 1: AG876-BDLF4    225 aa
Sequence 2: B95-BDLF4      225 aa
Sequence 3: GD1-BDLF4      225 aa
Sequence 4: RLV-BDLF4      238 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 100
Sequences (1:3) Aligned. Score: 98
Sequences (1:4) Aligned. Score: 85
Sequences (2:3) Aligned. Score: 98
Sequences (2:4) Aligned. Score: 85
Sequences (3:4) Aligned. Score: 86
Guide tree file created: [00029986C]
Start of Multiple Alignment
There are 3 groups
Aligning...
Group 1: Sequences: 2 Score:3770
Group 2: Sequences: 3 Score:3741
Group 3: Sequences: 4 Score:3462
Alignment Score 8058
GCG-Alignment file created [00029986B]

and now clustalw, unwrapped in emma, with the same input file

yoda:cluscheck 158 > clustalw bdlf4.fa

 CLUSTAL W (1.83) Multiple Sequence Alignments

Sequence format is Pearson
Sequence 1: AG876-BDLF4    225 aa
Sequence 2: B95-BDLF4      225 aa
Sequence 3: GD1-BDLF4      225 aa
Sequence 4: RLV-BDLF4      238 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 100
Sequences (1:3) Aligned. Score: 98
Sequences (1:4) Aligned. Score: 88
Sequences (2:3) Aligned. Score: 98
Sequences (2:4) Aligned. Score: 88
Sequences (3:4) Aligned. Score: 88
Guide tree file created: [bdlf4.dnd]
Start of Multiple Alignment
There are 3 groups
Aligning...
Group 1: Sequences: 2 Score:4959
Group 2: Sequences: 3 Score:4928
Group 3: Sequences: 4 Score:4677
Alignment Score 8187
CLUSTAL-Alignment file created [bdlf4.aln]

Why is the scoring subtly different? and see what it does to the N-terminal of the alignment....

First with emma:

             1                                               50
AG876-BDLF4  ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
B95-BDLF4    ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
GD1-BDLF4    ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
RLV-BDLF4    MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLP

now with clustalw:

AG876-BDLF4     MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
B95-BDLF4       MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
GD1-BDLF4       MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
RLV-BDLF4       MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLPESMASVFACW
                ***:**:*              * ***..  **.********** *:*************

Clustalw alone clearly gives the correct alignment whereas emma is wrong. I thought that emma simply wrapped clustalw for automation, but it appears it is doing something else. Out of a set of 80 proteins I am trying to pipeline through alignment, emma gives a variant result for 7 of them.....

Any thoughts, as always, much appreciated

cheers
Derek

From Marc.Logghe at DEVGEN.com Wed Mar 8 05:36:56 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 8 Mar 2006 11:36:56 +0100
Subject: [EMBOSS] Oddcomp behaves oddly ...
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746BAA@ANTARESIA.be.devgen.com>

> Basically what is happening is that there is a check for the
> length of the sequence being shorter than the window. It may
> well be this that is giving the problem.

This was a perfect diagnosis. It works fine when I make the window size off by one.
But I guess it should not be a problem for oddcomp if the window size is equal to (or even larger than) the length of the sequence? It is a way of saying: don't bother with window sizes, just take the complete thing. Could be a nice-to-have feature.
Thanks David,
Marc

From david at compbio.dundee.ac.uk Wed Mar 8 05:26:23 2006
From: david at compbio.dundee.ac.uk (David Martin)
Date: Wed, 08 Mar 2006 10:26:23 +0000
Subject: [EMBOSS] Oddcomp behaves oddly ...
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746BA7@ANTARESIA.be.devgen.com> Message-ID: On 8/3/06 9:00 am, "Marc Logghe" wrote: > ... Or rather, how should I use it properly ? > > OK, suppose your run compseq to obtain the frequency for individual > residues: > compseq tsw:Q62671 -word 1 > Apparently this example protein sequence is rather rich in leucine (106 > L out of 889). > > In order to detect this leucine bias, a little file was created > (leu.comp) that had the following content: > > Word size 1 > Total count 0 > > # bias should be detected as 106 > 100 > L 100 > > > Oddcomp was run like this: > oddcomp tsw:Q62671 -infile leu.comp -window 889 Try window 888 (ie shorter than the length of the sequence). There are a couple of minor bugs in the oddcomp code that I will forward to the team. Basically what is happening is that there is a check for the length of the sequence being shorter than the window. It may well be this that is giving the problem. It is a long time since I wrote this and C is not my usual language so apologies if this is not a comprehensive answer. ..d > > But the sequece is not reported. > When I change the L count to 10 in leu.comp it does not work neither. > Strangely enough, when the default window is taken (30) the sequence is > reported. > What is happening here ? > > Regards, > Marc > > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss From jison at ebi.ac.uk Wed Mar 8 06:05:33 2006 From: jison at ebi.ac.uk (Jon Ison) Date: Wed, 8 Mar 2006 11:05:33 -0000 (GMT) Subject: [EMBOSS] Oddcomp behaves oddly ... 
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746BAA@ANTARESIA.be.devgen.com> References: <0C528E3670D8CE4B8E013F6749231AA6746BAA@ANTARESIA.be.devgen.com> Message-ID: <56257.84.92.187.247.1141815933.squirrel@webmail.ebi.ac.uk> Hi Marc What might be cleaner is if we modify the ACD file so that any window size bigger than the sequence length is reprompted for. Also, to add a qualifier to set the window to the sequence length, if that'd help. Cheers Jon >> Basically what is happening is that there is a check for the >> length of the sequence being shorter than the window. It may >> well be this that is giving the problem. > > This was a perfect diagnosis. It works fine when I make the window size > off one. > But I guess it should not be a problem for oddcomp being the window size > equal (or even larger) to the length of the sequence ? It is a way of > saying: don't bother with window sizes, just take the complete thing. > Could be a nice to have feature. > Thanks David, > Marc > > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss > From Marc.Logghe at DEVGEN.com Wed Mar 8 06:36:06 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Wed, 8 Mar 2006 12:36:06 +0100 Subject: [EMBOSS] Oddcomp behaves oddly ... Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746BAB@ANTARESIA.be.devgen.com> Hi David, I am afraid there are some remaining oddities with oddcomp. Tried another protein, other residue. 

Word size 1
Total count 0
S 4

First a set of sequences of length 20 is generated (kind of mimicking a sliding window):
splitter wormpep:ZK822.4 -size 20 -overlap 19 > split.fa

Second, oddcomp is run (with the window option off by one):
oddcomp split.fa -window 19 -infile compseq.data

#
# Output from 'oddcomp'
#
# The Expected frequencies are taken from the file: compseq.data
#
# Word size: 1
ZK822.4_36-55
ZK822.4_37-56
ZK822.4_38-57
ZK822.4_39-58
ZK822.4_40-59
ZK822.4_41-60
# END #

The first 20mer:
>ZK822.4_36-55
SAGSSGSNFLSGLQNSSFGQ

It is clear that there are 7 S residues in this stretch and we were looking for 4 or more, so that makes sense.
However, when you run oddcomp again with an S count of 5 instead of 4, no sequence is reported!

Cheers,
Marc

> -----Original Message-----
> From: David Martin [mailto:david at compbio.dundee.ac.uk]
> Sent: Wednesday, March 08, 2006 11:26 AM
> To: Marc Logghe; emboss at emboss.open-bio.org
> Subject: Re: [EMBOSS] Oddcomp behaves oddly ...
>
> On 8/3/06 9:00 am, "Marc Logghe" wrote:
>
> > ... Or rather, how should I use it properly ?
> >
> > OK, suppose your run compseq to obtain the frequency for individual
> > residues:
> > compseq tsw:Q62671 -word 1
> > Apparently this example protein sequence is rather rich in leucine
> > (106 L out of 889).
> >
> > In order to detect this leucine bias, a little file was created
> > (leu.comp) that had the following content:
> >
> > Word size 1
> > Total count 0
> >
> > # bias should be detected as 106 > 100
> > L 100
> >
> >
> > Oddcomp was run like this:
> > oddcomp tsw:Q62671 -infile leu.comp -window 889
>
> Try window 888 (ie shorter than the length of the sequence).
> There are a couple of minor bugs in the oddcomp code that I
> will forward to the team.
>
> Basically what is happening is that there is a check for the
> length of the sequence being shorter than the window. It may
> well be this that is giving the problem.
> > It is a long time since I wrote this and C is not my usual > language so apologies if this is not a comprehensive answer. > > ..d > > > > > But the sequece is not reported. > > When I change the L count to 10 in leu.comp it does not > work neither. > > Strangely enough, when the default window is taken (30) the > sequence > > is reported. > > What is happening here ? > > > > Regards, > > Marc > > > > _______________________________________________ > > EMBOSS mailing list > > EMBOSS at emboss.open-bio.org > > http://newportal.open-bio.org/mailman/listinfo/emboss > > > From pmr at ebi.ac.uk Wed Mar 8 07:09:25 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 08 Mar 2006 12:09:25 +0000 Subject: [EMBOSS] Oddcomp behaves oddly ... In-Reply-To: References: Message-ID: <440EC975.6090907@ebi.ac.uk> David Martin wrote: > Basically what is happening is that there is a check for the length of the > sequence being shorter than the window. It may well be this that is giving > the problem. Not that part - it accepts a window the same length as the sequence (oddcomp can read more than one sequence, and does have to skip those too short to fit a window). A later loop does fail if the window size matches the sequence - I am testing allowing it to run just one more time :-) > It is a long time since I wrote this and C is not my usual language so > apologies if this is not a comprehensive answer. Snakke de fortran? >>But the sequece is not reported. >>When I change the L count to 10 in leu.comp it does not work neither. >>Strangely enough, when the default window is taken (30) the sequence is >>reported. Same problem I believe - it is the window size matching sequence length that stops the last for loop from checking anything. regadrs, Peter From pmr at ebi.ac.uk Wed Mar 8 08:13:24 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 08 Mar 2006 13:13:24 +0000 Subject: [EMBOSS] Oddcomp behaves oddly ... 
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746BAB@ANTARESIA.be.devgen.com> References: <0C528E3670D8CE4B8E013F6749231AA6746BAB@ANTARESIA.be.devgen.com> Message-ID: <440ED874.7070100@ebi.ac.uk> Marc Logghe wrote: > Hi David, > I am afraid there are some remaining oddities with oddcomp. > The first 20mer: > >>ZK822.4_36-55 > > SAGSSGSNFLSGLQNSSFGQ > > It is clear that there are 7 S residues in this stretch and we were > looking for 4 or more, so that makes sense. > However, when you run oddseq again with S count of 5 instead of 4, no > sequence is reported ! At least 2 bugs here. Firstly, with more than one sequence as input, some internal values were not fully reset. Also the word size is used (as 2) before it is set to 1. For 8 Serines in this set I am still only getting one hit out of two. A little more investigation needed ... I am getting closer :-) regards, Peter From ajb at ebi.ac.uk Thu Mar 9 10:58:33 2006 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Thu, 9 Mar 2006 15:58:33 -0000 (GMT) Subject: [EMBOSS] clustalw vs. emma In-Reply-To: <6.2.3.4.1.20060308092040.02aabc78@lenzie.gla.ac.uk> References: <6.2.3.4.1.20060308092040.02aabc78@lenzie.gla.ac.uk> Message-ID: <45378.81.96.70.96.1141919913.squirrel@webmail.ebi.ac.uk> Hi Derek, emma is indeed just a wrapper for clustalw. You can see what default parameters it is using by specifying -debug on the command line and then looking at the emma.dbg file. Search for a line saying "Executing 'clustalw" I suspect that the default gap extension penalty is rather high in your case. If you use (e.g.) -gapext 0.2 then you'll get something approaching the default clustalw behaviour. The defaults for your sequences seem to be: -gapopen=10.000 -gapext=5.000 -gapdist=8 HTH Alan > Morning all > > Is there some unusual default being passed to emma? 
For instance, > here's emma with a vanilla set of parameters on a fairly well > conserved set of proteins (bdlf4.fa): > > yoda:cluscheck 157 > emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto > > CLUSTAL W (1.83) Multiple Sequence Alignments > > Sequence type explicitly set to Protein > Sequence format is Pearson > Sequence 1: AG876-BDLF4 225 aa > Sequence 2: B95-BDLF4 225 aa > Sequence 3: GD1-BDLF4 225 aa > Sequence 4: RLV-BDLF4 238 aa > Start of Pairwise alignments > Aligning... > Sequences (1:2) Aligned. Score: 100 > Sequences (1:3) Aligned. Score: 98 > Sequences (1:4) Aligned. Score: 85 > Sequences (2:3) Aligned. Score: 98 > Sequences (2:4) Aligned. Score: 85 > Sequences (3:4) Aligned. Score: 86 > Guide tree file created: [00029986C] > Start of Multiple Alignment > There are 3 groups > Aligning... > Group 1: Sequences: 2 Score:3770 > Group 2: Sequences: 3 Score:3741 > Group 3: Sequences: 4 Score:3462 > Alignment Score 8058 > GCG-Alignment file created [00029986B] > > and now clustalw, unwrapped in emma, with the same input file > > yoda:cluscheck 158 > clustalw bdlf4.fa > > CLUSTAL W (1.83) Multiple Sequence Alignments > > Sequence format is Pearson > Sequence 1: AG876-BDLF4 225 aa > Sequence 2: B95-BDLF4 225 aa > Sequence 3: GD1-BDLF4 225 aa > Sequence 4: RLV-BDLF4 238 aa > Start of Pairwise alignments > Aligning... > Sequences (1:2) Aligned. Score: 100 > Sequences (1:3) Aligned. Score: 98 > Sequences (1:4) Aligned. Score: 88 > Sequences (2:3) Aligned. Score: 98 > Sequences (2:4) Aligned. Score: 88 > Sequences (3:4) Aligned. Score: 88 > Guide tree file created: [bdlf4.dnd] > Start of Multiple Alignment > There are 3 groups > Aligning... > Group 1: Sequences: 2 Score:4959 > Group 2: Sequences: 3 Score:4928 > Group 3: Sequences: 4 Score:4677 > Alignment Score 8187 > CLUSTAL-Alignment file created [bdlf4.aln] > > Why is the scoring subtly different? and see what it does to the > N-terminal of the alignment.... 
> > First with emma: > > 1 50 > AG876-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP > B95-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP > GD1-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP > RLV-BDLF4 MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLP > > now with clustalw: > > AG876-BDLF4 > MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW > B95-BDLF4 > MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW > GD1-BDLF4 > MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW > RLV-BDLF4 > MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLPESMASVFACW > ***:**:* * ***.. **.********** > *:************* > > Clustalw alone clearly gives the correct alignment whereas emma is > wrong. I thought that emma simply wrapped clustalw for automation, > but it appears it is doing something else. Out of a set of 80 > proteins I am trying to pipeline through alignment, emma gives a > variant result for 7 of them..... > > Any thoughts, as always, much appreciated > > cheers > Derek > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss > From d.gatherer at vir.gla.ac.uk Thu Mar 9 11:18:55 2006 From: d.gatherer at vir.gla.ac.uk (Derek Gatherer) Date: Thu, 09 Mar 2006 16:18:55 +0000 Subject: [EMBOSS] clustalw vs. 
emma In-Reply-To: <45378.81.96.70.96.1141919913.squirrel@webmail.ebi.ac.uk> References: <6.2.3.4.1.20060308092040.02aabc78@lenzie.gla.ac.uk> <45378.81.96.70.96.1141919913.squirrel@webmail.ebi.ac.uk> Message-ID: <6.2.3.4.1.20060309160317.02abb870@lenzie.gla.ac.uk> Thanks Alan That indeed is the cause of the problem: Executing 'clustalw -infile=00052348A -outfile=00052348B -align -type=protein -output=gcg -pwmatrix=blosum -pwgapopen=10.000 -pwgapext=0.100 -newtree=00052348C -matrix=blosum -gapopen=10.000 -gapext=5.000 -gapdist=8 -hgapresidues=GPSNDQEKR -maxdiv=30' However, on attempting to manually specify it, I run into another one: [gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -pwgapextend 5 Died: Unknown qualifier -pwgapextend In the docs http://emboss.sourceforge.net/apps/cvs/emma.html, there are quite a few optional parameters of this sort, some of which work and others don't, eg: [gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -gapextend 5 Died: Unknown qualifier -gapextend [gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -pwgapextend 5 Died: Unknown qualifier -pwgapextend [gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -gapopen 5 Died: Unknown qualifier -gapopen [gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -gapdist 5 CLUSTAL W (1.83) Multiple Sequence Alignments so -gapdist works at least. Cheers Derek At 15:58 09/03/2006, ajb at ebi.ac.uk wrote: >Hi Derek, > >emma is indeed just a wrapper for clustalw. You can see what default >parameters it is using by specifying -debug on the command line >and then looking at the emma.dbg file. Search for a line >saying "Executing 'clustalw" > >I suspect that the default gap extension penalty is rather high >in your case. If you use (e.g.) 
-gapext 0.2 then you'll get >something approaching the default clustalw behaviour. The defaults >for your sequences seem to be: > > -gapopen=10.000 -gapext=5.000 -gapdist=8 > > >HTH > >Alan > > > Morning all > > > > Is there some unusual default being passed to emma? For instance, > > here's emma with a vanilla set of parameters on a fairly well > > conserved set of proteins (bdlf4.fa): > > > > yoda:cluscheck 157 > emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto > > > > CLUSTAL W (1.83) Multiple Sequence Alignments > > > > Sequence type explicitly set to Protein > > Sequence format is Pearson > > Sequence 1: AG876-BDLF4 225 aa > > Sequence 2: B95-BDLF4 225 aa > > Sequence 3: GD1-BDLF4 225 aa > > Sequence 4: RLV-BDLF4 238 aa > > Start of Pairwise alignments > > Aligning... > > Sequences (1:2) Aligned. Score: 100 > > Sequences (1:3) Aligned. Score: 98 > > Sequences (1:4) Aligned. Score: 85 > > Sequences (2:3) Aligned. Score: 98 > > Sequences (2:4) Aligned. Score: 85 > > Sequences (3:4) Aligned. Score: 86 > > Guide tree file created: [00029986C] > > Start of Multiple Alignment > > There are 3 groups > > Aligning... > > Group 1: Sequences: 2 Score:3770 > > Group 2: Sequences: 3 Score:3741 > > Group 3: Sequences: 4 Score:3462 > > Alignment Score 8058 > > GCG-Alignment file created [00029986B] > > > > and now clustalw, unwrapped in emma, with the same input file > > > > yoda:cluscheck 158 > clustalw bdlf4.fa > > > > CLUSTAL W (1.83) Multiple Sequence Alignments > > > > Sequence format is Pearson > > Sequence 1: AG876-BDLF4 225 aa > > Sequence 2: B95-BDLF4 225 aa > > Sequence 3: GD1-BDLF4 225 aa > > Sequence 4: RLV-BDLF4 238 aa > > Start of Pairwise alignments > > Aligning... > > Sequences (1:2) Aligned. Score: 100 > > Sequences (1:3) Aligned. Score: 98 > > Sequences (1:4) Aligned. Score: 88 > > Sequences (2:3) Aligned. Score: 98 > > Sequences (2:4) Aligned. Score: 88 > > Sequences (3:4) Aligned. 
Score: 88 > > Guide tree file created: [bdlf4.dnd] > > Start of Multiple Alignment > > There are 3 groups > > Aligning... > > Group 1: Sequences: 2 Score:4959 > > Group 2: Sequences: 3 Score:4928 > > Group 3: Sequences: 4 Score:4677 > > Alignment Score 8187 > > CLUSTAL-Alignment file created [bdlf4.aln] > > > > Why is the scoring subtly different? and see what it does to the > > N-terminal of the alignment.... > > > > First with emma: > > > > 1 50 > > AG876-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP > > B95-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP > > GD1-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP > > RLV-BDLF4 MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLP > > > > now with clustalw: > > > > AG876-BDLF4 > > MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW > > B95-BDLF4 > > MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW > > GD1-BDLF4 > > MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW > > RLV-BDLF4 > > MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLPESMASVFACW > > ***:**:* * ***.. **.********** > > *:************* > > > > Clustalw alone clearly gives the correct alignment whereas emma is > > wrong. I thought that emma simply wrapped clustalw for automation, > > but it appears it is doing something else. Out of a set of 80 > > proteins I am trying to pipeline through alignment, emma gives a > > variant result for 7 of them..... > > > > Any thoughts, as always, much appreciated > > > > cheers > > Derek > > _______________________________________________ > > EMBOSS mailing list > > EMBOSS at emboss.open-bio.org > > http://newportal.open-bio.org/mailman/listinfo/emboss > > From pmr at ebi.ac.uk Thu Mar 9 12:01:15 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 09 Mar 2006 17:01:15 +0000 Subject: [EMBOSS] clustalw vs. 
emma In-Reply-To: <6.2.3.4.1.20060309160317.02abb870@lenzie.gla.ac.uk> References: <6.2.3.4.1.20060308092040.02aabc78@lenzie.gla.ac.uk> <45378.81.96.70.96.1141919913.squirrel@webmail.ebi.ac.uk> <6.2.3.4.1.20060309160317.02abb870@lenzie.gla.ac.uk> Message-ID: <44105F5B.3050200@ebi.ac.uk> Derek Gatherer wrote: > In the docs http://emboss.sourceforge.net/apps/cvs/emma.html, there > are quite a few optional parameters of this sort, some of which work > and others don't, eg: Yup - we're putting that right (some people have noticed the application docs are moving around). The emboss.sf.net website only documents things for the latest code in CVS. We are adding documentation for release 3.0.0 (that is why the new directories are appearing). The release 3.0.0 documentation is installed on your system when you install 3.0.0 - if you install to /usr/local/bin it will be in: /usr/local/share/EMBOSS/doc/programs/html (this will change in release 4.0.0). You are seeing some of the changes made to make standard names for command line qualifiers since 3.0.0 Hope that helps, Peter From blanchard at microbio.umass.edu Thu Mar 9 16:18:55 2006 From: blanchard at microbio.umass.edu (Jeffrey Blanchard) Date: Thu, 9 Mar 2006 16:18:55 -0500 Subject: [EMBOSS] d_ino Message-ID: Hello, I am trying to install EMBOSS under cygwin for teaching purposes. make crashes on ajfile because d_ino appears to be missing in current version of cygwin. Is there a work around for this? Thanks, Jeff ------------------------------- Jeffrey L. 
Blanchard Assistant Professor Department of Microbiology University of Massachusetts Amherst, MA 01003 Office and Lab: Morrill I N330 Tel: 413-577-2130 Fax: 413-545-1578 http://www.bio.umass.edu/micro/blanchard/Lab_About.html From ajb at ebi.ac.uk Thu Mar 9 19:22:45 2006 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Fri, 10 Mar 2006 00:22:45 -0000 (GMT) Subject: [EMBOSS] d_ino In-Reply-To: References: Message-ID: <41243.81.96.70.96.1141950165.squirrel@webmail.ebi.ac.uk> Hi, Yes indeed there is a fix. Look in the directory. ftp://emboss.open-bio.org/pub/EMBOSS/fixes/ The README file there will usually tell you what each of the files fixes. HTH Alan Bleasby EBI > Hello, > > I am trying to install EMBOSS under cygwin for teaching purposes. > > make crashes on ajfile because d_ino appears to be missing in current > version of cygwin. > > Is there a work around for this? > > Thanks, Jeff > > ------------------------------- > Jeffrey L. Blanchard > Assistant Professor > Department of Microbiology > University of Massachusetts > Amherst, MA 01003 > Office and Lab: Morrill I N330 > Tel: 413-577-2130 > Fax: 413-545-1578 > http://www.bio.umass.edu/micro/blanchard/Lab_About.html > > > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss > From jison at ebi.ac.uk Wed Mar 15 12:09:59 2006 From: jison at ebi.ac.uk (Jon Ison) Date: Wed, 15 Mar 2006 17:09:59 -0000 (GMT) Subject: [EMBOSS] EMBOSS Developers Course - reminder Message-ID: <39760.172.31.70.94.1142442599.squirrel@webmail.ebi.ac.uk> Hi There's still some places left on this course. Get in touch if you'd like to attend. 
Cheers Jon BSDC 2006 Bioinformatics Software Development Course April 18-20 2006 Following from the highly successful BSDC 2003/2004 courses, a new series of courses on 'Bioinformatics Software Development' using EMBOSS will be held in the training room at The Wellcome Trust Conference Centre on April 18-20, 2006. The course will give a good introduction to programming in EMBOSS. By the end of the course you will be experienced in all the steps in writing a basic bioinformatics application using the EMBOSS programming libraries. The course would suit competent programmers, probably with at least a couple of years of experience. A reasonable working knowledge of C is required to get the most out of the course, familiarity with pointers is helpful but not essential. That said, all are welcome regardless of background or experience. Places are limited so please email Liz Ford (ford at ebi.ac.uk) to register as soon as possible. We do not make a profit on the course but must charge £125 / person (for the 3-days) to recover some of our costs. We are unable to take credit card payments. The preferred method of payment is by cheque made payable to 'Industry Workshops'. If you wish to pay in cash or by bank transfer please contact Liz Ford (ford at ebi.ac.uk) To read more about the course see http://emboss.sourceforge.net/developers/developers_course/ To read more about EMBOSS see http://emboss.sourceforge.net/ To register: email Liz Ford (ford at ebi.ac.uk) with your full name, address, phone number You will then receive an email back confirming your registration or not. Please note, as mentioned before, places are limited so not all registrations will be successful. 
For further information email Jon Ison (jison at ebi.ac.uk) From pmr at ebi.ac.uk Mon Mar 27 12:50:09 2006 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Mon, 27 Mar 2006 18:50:09 +0100 (BST) Subject: [EMBOSS] [BiO BB] Building an alignment from BLAST hsp In-Reply-To: <00b501c651be$95b37500$e6028a0a@GOLHARMOBILE1> References: <00b501c651be$95b37500$e6028a0a@GOLHARMOBILE1> Message-ID: <2253.86.132.217.176.1143481809.squirrel@webmail.ebi.ac.uk> Ryan Golhar wrote: > I have a BLAST alignment: query sequence and database sequence. > > The alignment is only showing the HSP from the blast output as expected, > however I want to build an alignment of the entire database sequence > against my query sequence. > > I tried using needle from EMBOSS, however its aligning the sequences > completely different than BLAST does. What I'd really like is a way to > anchor the alignment based on the BLAST HSP. Does anyone know how to do > this, or what tool(s) will allow me to do this? You are quite right that EMBOSS may align the sequences completely differently - unless the HSPs are very significant and cover most of the sequence this will be true of any attempt to simply realign. There has to be some way to pass on the HSPs as fixed positions, as in the BioPerl solution. However, it could make a nice EMBOSS application - the only question would be how you would like to specify the HSPs. Perhaps we could read BLAST output (in some specified format), or perhaps some other way to give the input alignments. We do have at least one EMBOSS application that does something similar (finds all long perfect matches and interpolates) - we just need to reuse the interpolation code which is basically doing a global alignment of the bits in between. That also tackles the problem of choosing which non-compatible initial matches to use. 
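[Editorial sketch] The interpolation idea described above (keep the matched segments fixed and globally align the stretches in between) can be sketched roughly as follows. This is a minimal Python illustration, not EMBOSS or BioPerl code: the function names `needleman_wunsch` and `anchored_align`, the HSP tuple layout `(query_start, subject_start, query_aln, subject_aln)` with 0-based starts, and the simple match/mismatch/gap scores are all assumptions made for the example. It also assumes the HSPs are already collinear and non-overlapping, so it does not address the harder problem of choosing between incompatible initial matches.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b with a linear gap penalty.

    Returns the two gapped strings. The scoring values are placeholders,
    not the defaults of any EMBOSS program.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_align(query, subject, hsps):
    """Build a full-length alignment that keeps each HSP fixed.

    hsps: list of (query_start, subject_start, query_aln, subject_aln),
    where the *_aln strings are the (possibly gapped) HSP rows.
    """
    out_q, out_s = [], []
    qpos = spos = 0
    for q_start, s_start, q_aln, s_aln in sorted(hsps):
        if q_start < qpos or s_start < spos:
            raise ValueError("HSPs overlap or are not collinear")
        # Globally align the unanchored stretch before this HSP.
        gap_q, gap_s = needleman_wunsch(query[qpos:q_start],
                                        subject[spos:s_start])
        out_q += [gap_q, q_aln]
        out_s += [gap_s, s_aln]
        qpos = q_start + len(q_aln.replace("-", ""))
        spos = s_start + len(s_aln.replace("-", ""))
    # Align whatever remains after the last HSP.
    gap_q, gap_s = needleman_wunsch(query[qpos:], subject[spos:])
    out_q.append(gap_q)
    out_s.append(gap_s)
    return "".join(out_q), "".join(out_s)


if __name__ == "__main__":
    q_row, s_row = anchored_align("AAACCCGGGTTT", "TTAAACCCGGGAA",
                                  [(3, 5, "CCCGGG", "CCCGGG")])
    print(q_row)
    print(s_row)
```

How the HSP coordinates get out of the BLAST report is left open here, as it is in the discussion; a real tool would also want affine gap penalties and a substitution matrix rather than the flat scores used above.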
Hope that helps, Peter From golharam at umdnj.edu Mon Mar 27 11:50:42 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon, 27 Mar 2006 11:50:42 -0500 Subject: [EMBOSS] Building an alignment from BLAST hsp Message-ID: <00b501c651be$95b37500$e6028a0a@GOLHARMOBILE1> I have a BLAST alignment: query sequence and database sequence. The alignment is only showing the HSP from the blast output as expected, however I want to build an alignment of the entire database sequence against my query sequence. I tried using needle from EMBOSS, however its aligning the sequences completely different than BLAST does. What I'd really like is a way to anchor the alignment based on the BLAST HSP. Does anyone know how to do this, or what tool(s) will allow me to do this? Ryan From golharam at umdnj.edu Mon Mar 27 13:03:39 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon, 27 Mar 2006 13:03:39 -0500 Subject: [EMBOSS] [BiO BB] Building an alignment from BLAST hsp In-Reply-To: <2253.86.132.217.176.1143481809.squirrel@webmail.ebi.ac.uk> Message-ID: <010501c651c8$c6b4bb00$e6028a0a@GOLHARMOBILE1> Hi Peter, > You are quite right that EMBOSS may align the sequences completely > differently - unless the HSPs are very significant and cover most > of the sequence this will be true of any attempt to simply realign. > There has to be some way to pass on the HSPs as fixed positions, > as in the BioPerl solution. I looked at a bioperl method, but can't seem to find something that will accomplish this. > However, it could make a nice EMBOSS application - the only question > would be how you would like to specify the HSPs. Perhaps we could read > BLAST output (in some specified format), or perhaps some other way to > give the input alignments. Yes, I agree. I suppose the best way would be to specify the two sequences and the blast output. The application could then construct an alignment based on a particular HSP (probably the first one, or whatever the user specifies). 
Ryan From letondal at pasteur.fr Tue Mar 28 02:25:07 2006 From: letondal at pasteur.fr (Catherine Letondal) Date: Tue, 28 Mar 2006 09:25:07 +0200 Subject: [EMBOSS] [BiO BB] Building an alignment from BLAST hsp In-Reply-To: <010501c651c8$c6b4bb00$e6028a0a@GOLHARMOBILE1> References: <010501c651c8$c6b4bb00$e6028a0a@GOLHARMOBILE1> Message-ID: <4b91818a096ba42d8d53279a7f63e6ea@pasteur.fr> On Mar 27, 2006, at 8:03 PM, Ryan Golhar wrote: > Hi Peter, > >> You are quite right that EMBOSS may align the sequences completely >> differently - unless the HSPs are very significant and cover most >> of the sequence this will be true of any attempt to simply realign. >> There has to be some way to pass on the HSPs as fixed positions, >> as in the BioPerl solution. > > I looked at a bioperl method, but can't seem to find something that > will > accomplish this. > >> However, it could make a nice EMBOSS application - the only question >> would be how you would like to specify the HSPs. Perhaps we could read > >> BLAST output (in some specified format), or perhaps some other way to >> give the input alignments. > > Yes, I agree. I suppose the best way would be to specify the two > sequences and the blast output. The application could then construct > an > alignment based on a particular HSP (probably the first one, or > whatever > the user specifies). > Have you tried this: http://bioweb.pasteur.fr/seqanal/interfaces/seqsblast.html It is based on bioperl. check "Get HSP" option (you can even extend it). 
Best, -- Catherine Letondal -- Institut Pasteur -- Computing Center From cquijano at iib.uam.es Tue Mar 28 04:49:01 2006 From: cquijano at iib.uam.es (Carlos Quijano) Date: Tue, 28 Mar 2006 11:49:01 +0200 Subject: [EMBOSS] [BiO BB] Building an alignment from BLAST hsp In-Reply-To: <4b91818a096ba42d8d53279a7f63e6ea@pasteur.fr> References: <010501c651c8$c6b4bb00$e6028a0a@GOLHARMOBILE1> <4b91818a096ba42d8d53279a7f63e6ea@pasteur.fr> Message-ID: <1143539342.8611.45.camel@localhost.localdomain> Hi all, I didnt read it before, sorry for the "lapsus". And sorry for the information if what I tell you is not exactly what you needed, Ryan. What you are looking for is just _MVIEW_, an old but nice application. Use scholar.google.com / pubmed to find more information about it, I remember that there are web servers running cgi's somewhere. It is possible than during this last years, somebody has published a new better tool or a new mview version.... Look for it. MVIEW is a parser for your blast output. MVIEW works for your problem because you wanna align only one sequence (as a template) to a entire database (I suppose that with any cutoff in the e-value or p-vale, at least the default, it is, ten) or against a set of some sequences or only one more sequence (2 sequences alignment). I continue with some considerations about aligning HSPs from Blast the way you pretend and mview does... there are important considerations and it is only a minute to read: Remember, what you get is what you wanted, but not a real thing (this is something very typical in bioinformatics - and all science - hahaha). You dont get a real multiple alignment, you get an artifact that is a entire database's gene-blast.hsps constructs piled down a template gene (your sequence). All right then. 
You don't have by any means an alignment, nor even an alignment of the genes using HSPs, because, there can be some hsps alignable between sequences in the database that are hidden for the alignment when sequences are piled down your sequence, because your sequence lacks these hsps and they are _ignored_. Why is this so important? What I actually mean is that if you use this "sequences piled down a template" as a multiple alignment, you will be lying about the topology underlying (it is, not lying ;-) in the gene network, that arises from your database plus your sequence when correctly aligned, it is, all against all... etc,etc, etc. Well, it is the mathematical exhaustive-optimal way... normally we use heuristics again, and again, and again... But "all against all" is the key concept involved in the multiple alignment problem. It is very important to be aware of these things. needle is the optimal way <-> Blast is the heuristic Clustal is also a very very heuristic solution to the massive problem of multiple alignment. And personally I prefer to use muscle that uses a better mathematical model and is (right now) the quickest aligner for the most of the cases. I am sure that most of you know it. I hope it is useful for newbies and others, so forgive me for the boring tedious discourse... CQ On Tue, 28-03-2006 at 09:25 +0200, Catherine Letondal wrote: > On Mar 27, 2006, at 8:03 PM, Ryan Golhar wrote: > > > Hi Peter, > > > >> You are quite right that EMBOSS may align the sequences completely > >> differently - unless the HSPs are very significant and cover most > >> of the sequence this will be true of any attempt to simply realign. > >> There has to be some way to pass on the HSPs as fixed positions, > >> as in the BioPerl solution. > > > > I looked at a bioperl method, but can't seem to find something that > > will > > accomplish this. > > > >> However, it could make a nice EMBOSS application - the only question > >> would be how you would like to specify the HSPs. 
Perhaps we could read > > > >> BLAST output (in some specified format), or perhaps some other way to > >> give the input alignments. > > > > Yes, I agree. I suppose the best way would be to specify the two > > sequences and the blast output. The application could then construct > > an > > alignment based on a particular HSP (probably the first one, or > > whatever > > the user specifies). > > > > Have you tried this: > http://bioweb.pasteur.fr/seqanal/interfaces/seqsblast.html > > It is based on bioperl. check "Get HSP" option (you can even extend it). > > Best, > > -- > Catherine Letondal -- Institut Pasteur -- Computing Center > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss Carlos Quijano http://www2.iib.uam.es/cquijano Evolution and Development laboratory Regulation of Gene Expression Department Institute for Biomedical Research http://www.iib.uam.es From kvddrift at earthlink.net Wed Mar 29 19:36:23 2006 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 29 Mar 2006 19:36:23 -0500 Subject: [EMBOSS] crash on intel-Mac Message-ID: Hi, I got a report from a user (of the fink package of emboss) that the following crashes occur on his Mac with an intel processor: % wossname Error: Failed to compile regular expression '^(.*/)[^/]+/?$' at position 716: range out of order in character class Bus error All other programs just give a bus error. I don't get these errors on a Mac with a PowerPC processor. This is emboss 3.0.0. - Koen. From areagp61 at yahoo.it Thu Mar 30 03:31:42 2006 From: areagp61 at yahoo.it (Graziano P.) 
Date: Thu, 30 Mar 2006 10:31:42 +0200 (CEST) Subject: [EMBOSS] dbifasta index file format Message-ID: <20060330083142.4237.qmail@web26207.mail.ukl.yahoo.com> hello EMBOSS users, I have some databases in fasta format (ncbi | format) and I want to index them using dbifasta, then I want to access the index files using a program that will be developed by a computer scientist of my group. I need to index the databases by accession number, ginumber and description. I have read in the dbifasta help info about the structure of the index files when the databases were indexed by accession number, but I have not found info about the structure of the index files when the databases are indexed by description. Anyone knows where I can find detailed information about the structure of the index files? Regards Graziano ___________________________________ Yahoo! Messenger with Voice: chiama da PC a telefono a tariffe esclusive http://it.messenger.yahoo.com From ajb at ebi.ac.uk Thu Mar 30 03:38:10 2006 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Thu, 30 Mar 2006 09:38:10 +0100 (BST) Subject: [EMBOSS] crash on intel-Mac In-Reply-To: References: Message-ID: <37407.81.98.244.247.1143707890.squirrel@webmail.ebi.ac.uk> Hi, Thanks. We already have a report of this and are working on a solution. Alan From gbottu at ben.vub.ac.be Thu Mar 30 04:37:23 2006 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Thu, 30 Mar 2006 11:37:23 +0200 Subject: [EMBOSS] A note about fastA format(s) - Checked by AntiVir DEMO version - Message-ID: <20060330093723.GA18690@bigben.ulb.ac.be> Dear friends, We are using EMBOSS version 3.0. One of my colleagues tried to use a multiple sequence file in fastA format, where each comment line starts with a string containing multiple pipe signs. An USA of type fasta::file:xx|yy|zz|uu|ss did not work. After some trial I found that putting "pearson" instead of "fasta" helped. 
This is strange, since according to the on-line manual at http://emboss.sourceforge.net/docs/themes/SequenceFormats.html "fasta" and "pearson" are synonyms. Here it seems that "fasta" is instead treated the same as "ncbi". Comments ? Guy Bottu, BEN From enrique.deandres at pcm.uam.es Thu Mar 30 10:46:30 2006 From: enrique.deandres at pcm.uam.es (Enrique de Andres Saiz) Date: Thu, 30 Mar 2006 17:46:30 +0200 Subject: [EMBOSS] Problem indexing PDB fasta file Message-ID: <442BFD56.9010908@pcm.uam.es> Hello, I'm trying to index the fasta file of the PDB database with dbifasta command and I get a lot of warnings as: Warning: Duplicate ID skipped: '1FNT_A' All hits will point to first ID found I have been looking the PDB fasta file and I see that, for the previous warning, there are an entry whoose id is '1FNT_A' and another one whoose id is '1FNT_a'. Then, this make me think that EMBOSS is case-insensitive. Is this true? Are there any way to distinguish between the two id's? Thanks in advance, Enrique. From pmr at ebi.ac.uk Thu Mar 30 16:47:19 2006 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Thu, 30 Mar 2006 22:47:19 +0100 (BST) Subject: [EMBOSS] A note about fastA format(s) - Checked by AntiVir DEMO version - In-Reply-To: <20060330093723.GA18690@bigben.ulb.ac.be> References: <20060330093723.GA18690@bigben.ulb.ac.be> Message-ID: <50335.68.153.173.207.1143755239.squirrel@webmail.ebi.ac.uk> Dear Guy, > We are using EMBOSS version 3.0. One of my colleagues tried to use a > multiple sequence file in fastA format, where each comment line starts > with a string containing multiple pipe signs. An USA of type > fasta::file:xx|yy|zz|uu|ss > did not work. After some trial I found that putting "pearson" instead of > "fasta" helped. This is strange, since according to the on-line manual at > http://emboss.sourceforge.net/docs/themes/SequenceFormats.html > "fasta" and "pearson" are synonyms. Here it seems that "fasta" is instead > treated the same as "ncbi". Comments ? 
Yes, that is indeed true. We had to make changes to support various NCBI formats, and made FASTA and NCBI the same. We kept "pearson" as the original plain fasta format. We will update the documentation - it will take a little time to check for any other changes to the formats. regards, Peter From ajb at ebi.ac.uk Fri Mar 31 07:12:53 2006 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Fri, 31 Mar 2006 13:12:53 +0100 (BST) Subject: [EMBOSS] crash on intel-Mac In-Reply-To: References: Message-ID: <51078.81.98.244.247.1143807173.squirrel@webmail.ebi.ac.uk> This should now be fixed as long as you apply all the fixes to EMBOSS-3.0.0 from the directory: ftp://emboss.open-bio.org/pub/EMBOSS/fixes/ The latest file there is a new 'configure'; however, if you've not applied previous patches in the above directory as well, then you'll get a compilation failure. Look at the README for details of what the patches fix. Thanks to Bill van Etten for previous emails on this. Changes to the CVS developers version will follow. 
Alan From dksamuel at gmail.com Fri Mar 31 23:12:14 2006 From: dksamuel at gmail.com (Duleep Samuel) Date: Sat, 1 Apr 2006 09:42:14 +0530 Subject: [EMBOSS] Fwd: EMBOSS for Windows without Cygwin In-Reply-To: <442CCD71.60202@gmail.com> References: <442CCD71.60202@gmail.com> Message-ID: Is the latest EMBOSS version 3.0.0.0 available anywhere as a precompiled binary for Windows XP, I have tried compiling using cygwin and it crashed, I loaded EMBOSS for windows which is a port of version 2.10.0, loaded Staden Package and made Spin aware of EMBOSS and am working, but feel bad that I am _One_ whole release behind, If anyone has a complied binary I can download for testing and report back on useability, regards, Samuel, Virologist, India 
From joanne at bioinformatics.ubc.ca Mon Mar 6 22:51:00 2006 From: joanne at bioinformatics.ubc.ca (Joanne Fox) Date: Mon, 06 Mar 2006 14:51:00 -0800 Subject: [EMBOSS] Warning: Cannot open division file '' for database 'swissprot' Message-ID: <440CBCD4.8030605@bioinformatics.ubc.ca> Hello EMBOSS community, I used dbiflat to index the latest flatfile distribution of swissprot (uniprot_sprot.dat). Now I am trying to use this database with the EMBOSS patmatdb program and I'm encountering an error that reads, "Warning: Cannot open division file '' for database 'swissprot'". I searched the mailing list archives and I see others with this same problem that boil down to permissions and/or path problems. However, I still can't figure out what's going wrong on my system. I've put more detailed information below. I'm new to the world of configuring EMBOSS so if anyone has any ideas about what might be going wrong, I'd really appreciate the advice. Thanks, Joanne. 
-- | Joanne Fox | http://bioinformatics.ubc.ca/people/joanne ~> showdb Displays information on the currently available databases # Name Type ID Qry All Comment # ==== ==== == === === ======= swissprot P OK OK OK Swissprot Release 7.1, 2/21/2006 ~> patmatdb Search a protein sequence with a motif Input sequence(s): swissprot Warning: Cannot open division file '' for database 'swissprot' Error: Unable to read sequence 'swissprot' Input sequence(s): swissprot:* Warning: Cannot open division file '' for database 'swissprot' Error: Unable to read sequence 'swissprot:*' Died: patmatdb terminated: Bad value for '-sequence' and no more retries contents of .embossrc file: set emboss_logfile /usr/local/software/bioinformatics/emboss/log/emboss.log set emboss_database_dir /raid1/bioinformatics/data DB swissprot [ type: P method: emblcd format: swissprot dir: \$emboss_database_dir/swissprot/swissprot_V7_1 file: "*.dat" release: "7.1" comment: "Swissprot Release 7.1, 2/21/2006" ] contents of the /raid1/bioinformatics/data/swissprot/swissprot_V7_1/ directory: -rw-r--r-- 1 bin bin 1165524 Mar 6 12:37 acnum.hit -rw-r--r-- 1 bin bin 3899118 Mar 6 12:37 acnum.trg -rw-r--r-- 1 bin bin 322 Mar 6 12:37 division.lkp -rw-r--r-- 1 bin bin 4368405 Mar 6 12:37 entrynam.idx -rw-r--r-- 1 bin bin 802445434 Mar 6 12:23 uniprot_sprot.dat From yezhiqiang at gmail.com Mon Mar 6 21:29:49 2006 From: yezhiqiang at gmail.com (Zhiqiang Ye) Date: Tue, 7 Mar 2006 05:29:49 +0800 Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence? Message-ID: <34198fe40603061329u85c9f95p@mail.gmail.com> Dear all, Does emboss have a handy way for mutate a protein sequence by the specified way? For example, I have a sequence foo.fasta >foo MATSCGLLKIIQRE It has a mutant called 'A2L'. 
Is there any way to do this operation so that the output is as follows (with an option to check that foo.fasta has 'A' at position 2):
>foo A2L
MLTSCGLLKIIQRE
My way: use extractseq to extract two files, one before position 2 and the other after position 2. Then create a fasta file containing 'L'. After that, I use union to join these 3 sequence files into one. Or write a perl script to do this by changing a substring of a string. How nice it would be if emboss could provide a 'mutate'! Thank you :) Best regards! -- Zhiqiang Ye From Marc.Logghe at DEVGEN.com Tue Mar 7 07:59:27 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Tue, 7 Mar 2006 08:59:27 +0100 Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence? Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B99@ANTARESIA.be.devgen.com> Can msbar do something for you ? Msbar = "Mutate sequence beyond all recognition" Cheers, Marc > -----Original Message----- > From: emboss-bounces at emboss.open-bio.org > [mailto:emboss-bounces at emboss.open-bio.org] On Behalf Of Zhiqiang Ye > Sent: Monday, March 06, 2006 10:30 PM > To: emboss at emboss.open-bio.org > Subject: [EMBOSS] Does emboss have a handy way for mutate a > protein sequence? > > Dear all, > > Does emboss have a handy way for mutate a protein > sequence by the specified way? > For example, I have a sequence foo.fasta > > >foo > MATSCGLLKIIQRE > > It has a mutant called 'A2L'. Is there any way to do this > operation to output(with an option to check the foo.fasta has > 'A' at position > 2): > >foo A2L > MLTSCGLLKIIQRE > > My way: use extractseq to extract two file: one before > position 2, the other after postion 2. Then creat a fasta > file contain 'L'. After that, I use union to connect these > 3 sequence file in to one. > > Or write a perl script to do this by change a string's substring. > > How If emboss could provide a 'mutate' ! > > Thank you :) > > Best regards!
> -- > Zhiqiang Ye > > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss > From David.Bauer at SCHERING.DE Tue Mar 7 06:40:24 2006 From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE) Date: Tue, 7 Mar 2006 07:40:24 +0100 Subject: [EMBOSS] Antwort: Does emboss have a handy way for mutate a protein sequence? In-Reply-To: <34198fe40603061329u85c9f95p@mail.gmail.com> Message-ID: What about this solution: cutseq foo.fasta -from 2 -to 2 | pasteseq -filter -pos 1 -bs asis:'L' | descseq -filter -append -desc "A2L" >foo A2L MLTSCGLLKIIQRE Cheers, David. emboss-bounces at emboss.open-bio.org schrieb am 06/03/2006 22:29:49: > Dear all, > > Does emboss have a handy way for mutate a protein sequence by > the specified way? > For example, I have a sequence foo.fasta > > >foo > MATSCGLLKIIQRE > > It has a mutant called 'A2L'. Is there any way to do this operation > to output(with an option to check the foo.fasta has 'A' at position > 2): > >foo A2L > MLTSCGLLKIIQRE > > My way: use extractseq to extract two file: one before position 2, > the other after postion 2. Then creat a fasta file contain 'L'. After > that, I use union to connect these 3 sequence file in to one. > > Or write a perl script to do this by change a string's substring. > > How If emboss could provide a 'mutate' ! > > Thank you :) > > Best regards! > -- > Zhiqiang Ye > > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss From yezhiqiang at gmail.com Tue Mar 7 13:22:57 2006 From: yezhiqiang at gmail.com (Zhiqiang Ye) Date: Tue, 7 Mar 2006 21:22:57 +0800 Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence? 
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746B99@ANTARESIA.be.devgen.com> References: <0C528E3670D8CE4B8E013F6749231AA6746B99@ANTARESIA.be.devgen.com> Message-ID: <34198fe40603070522r5d0920f9q@mail.gmail.com> 2006/3/7, Marc Logghe : > Can msbar do something for you ? Msbar = "Mutate sequence beyond all > recognition" Thank you. I have checked msbar, it cannot do what I need. -- Zhiqiang Ye From pmr at ebi.ac.uk Tue Mar 7 13:55:39 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 07 Mar 2006 13:55:39 +0000 Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence? In-Reply-To: <34198fe40603061329u85c9f95p@mail.gmail.com> References: <34198fe40603061329u85c9f95p@mail.gmail.com> Message-ID: <440D90DB.6040605@ebi.ac.uk> Zhiqiang Ye wrote: > Dear all, > > Does emboss have a handy way for mutate a protein sequence by > the specified way? > For example, I have a sequence foo.fasta > > >>foo > > MATSCGLLKIIQRE > > It has a mutant called 'A2L'. Is there any way to do this operation > to output(with an option to check the foo.fasta has 'A' at position > 2): > >>foo A2L > > MLTSCGLLKIIQRE EMBOSS has several programs to change sequences. None does exactly what you ask. You could look at: biosed (does what you ask for longer replacements, but will change all 'A's to 'L's.) We could extend biosed to specify the position of the pattern ... is that what you need? regards, Peter From gbottu at ben.vub.ac.be Tue Mar 7 15:38:08 2006 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Tue, 7 Mar 2006 16:38:08 +0100 Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence In-Reply-To: <34198fe40603061329u85c9f95p@mail.gmail.com> References: <34198fe40603061329u85c9f95p@mail.gmail.com> Message-ID: <20060307153808.GA15947@bigben.ulb.ac.be> Your solution works, as does the one proposed by David Bauer. Both are however rather tedious. It goes much easier with an interactive sequence editor. 
There is MSE, not a standard EMBOSS program but distributed as Embassadir. It runs in a VT100 terminal ; it is a little bit intimidating for the novice user, but you can with some practice learn to use it. At the BEN site we have besides MSE also installed SeaView, a graphical mode editor (has versions for Windows, Macintosh and X-Window). These editors are of course only usable if you work locally in your own computer or in a terminal session in a remote computer. It will not work if you are using a Web interface for EMBOSS ... although some Web interfaces might have an applet mode editor that allows to save the modified sequence back on the server (is there one in Jemboss ?). Hope this helps, Guy Bottu, Belgian EMBnet Node From yezhiqiang at gmail.com Tue Mar 7 13:25:02 2006 From: yezhiqiang at gmail.com (Zhiqiang Ye) Date: Tue, 7 Mar 2006 21:25:02 +0800 Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence? In-Reply-To: References: <34198fe40603061329u85c9f95p@mail.gmail.com> Message-ID: <34198fe40603070525w1ffa7155n@mail.gmail.com> 2006/3/7, David.Bauer at schering.de : > What about this solution: > > cutseq foo.fasta -from 2 -to 2 | pasteseq -filter -pos 1 -bs asis:'L' | > descseq -filter -append -desc "A2L" Thanks a lot. It works very well! Best -- Zhiqiang Ye From yezhiqiang at gmail.com Tue Mar 7 16:33:40 2006 From: yezhiqiang at gmail.com (Zhiqiang Ye) Date: Wed, 8 Mar 2006 00:33:40 +0800 Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence? In-Reply-To: <440D90DB.6040605@ebi.ac.uk> References: <34198fe40603061329u85c9f95p@mail.gmail.com> <440D90DB.6040605@ebi.ac.uk> Message-ID: <34198fe40603070833p627159b0i@mail.gmail.com> 2006/3/7, Peter Rice : > > EMBOSS has several programs to change sequences. None does exactly what you ask. > > You could look at: > > biosed (does what you ask for longer replacements, but will change all 'A's to > 'L's.) Yeah, it will change all 'A's to 'L's... 
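For comparison, a position-specific substitution with the wild-type check asked for at the start of this thread can be sketched in plain bash, without any EMBOSS program. This is only an illustration under stated assumptions: the function name mutate_seq is hypothetical, it works on a bare sequence string rather than a FASTA file, and the residue comparison is case-sensitive.

```bash
#!/bin/bash
# Hypothetical helper (not part of EMBOSS): apply one point mutation written
# as <wild-type><1-based position><mutant>, e.g. A2L, to a sequence string.
mutate_seq() {
    local seq=$1 mutation=$2
    local re='^([A-Za-z])([0-9]+)([A-Za-z])$'
    if [[ ! $mutation =~ $re ]]; then
        echo "bad mutation code: $mutation" >&2
        return 2
    fi
    local wt=${BASH_REMATCH[1]} mt=${BASH_REMATCH[3]}
    # Force base 10 so a position such as 08 is not read as octal.
    local pos=$((10#${BASH_REMATCH[2]}))
    if (( pos < 1 || pos > ${#seq} )); then
        echo "position $pos is outside the sequence (length ${#seq})" >&2
        return 2
    fi
    # Refuse to mutate if the residue found is not the stated wild type.
    local found=${seq:pos-1:1}
    if [[ $found != "$wt" ]]; then
        echo "expected $wt at position $pos but found $found" >&2
        return 1
    fi
    # Residues before the position, the mutant residue, residues after it.
    printf '%s%s%s\n' "${seq:0:pos-1}" "$mt" "${seq:pos}"
}
```

With the example from this thread, `mutate_seq MATSCGLLKIIQRE A2L` prints MLTSCGLLKIIQRE, while a code such as G2L is rejected because position 2 holds A.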
> We could extend biosed to specify the position of the pattern ... is that what > you need? > Yes! If biosed can be extended to do this, it will be better :) Best Regards! -- Zhiqiang Ye From yezhiqiang at gmail.com Tue Mar 7 16:42:00 2006 From: yezhiqiang at gmail.com (Zhiqiang Ye) Date: Wed, 8 Mar 2006 00:42:00 +0800 Subject: [EMBOSS] Does emboss have a handy way for mutate a protein sequence In-Reply-To: <20060307153808.GA15947@bigben.ulb.ac.be> References: <34198fe40603061329u85c9f95p@mail.gmail.com> <20060307153808.GA15947@bigben.ulb.ac.be> Message-ID: <34198fe40603070842i15a34763s@mail.gmail.com> hi, Guy Bottu Thank you. But I have to do a batch of these substitutions, so a command line solution will be better. I wrote an ugly shell script to do this, following David Bauer's suggestion.

#!/bin/bash
# Usage: mutate.sh <fasta file> <mutation code, e.g. A2L>
mutation=$2
WT=${mutation:0:1}                 # wild-type residue (not checked against the sequence)
POS=${mutation:1:${#mutation}-2}   # 1-based position of the residue
MT=${mutation: -1}                 # mutant residue
POS2=$((POS - 1))                  # pasteseq inserts after this position
cat "$1" | cutseq -filter -from "$POS" -to "$POS" |
    pasteseq -filter -pos "$POS2" -bs "asis:$MT" |
    descseq -filter -append -desc " (mutant: $mutation )"

With this script mutate.sh in my ~/bin, I can type this:

mutate.sh foo.fasta A2L

Best -- Zhiqiang Ye From Marc.Logghe at DEVGEN.com Wed Mar 8 09:00:14 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Wed, 8 Mar 2006 10:00:14 +0100 Subject: [EMBOSS] Oddcomp behaves oddly ... Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746BA7@ANTARESIA.be.devgen.com> ... Or rather, how should I use it properly ? OK, suppose you run compseq to obtain the frequency for individual residues:

compseq tsw:Q62671 -word 1

Apparently this example protein sequence is rather rich in leucine (106 L out of 889). In order to detect this leucine bias, a little file was created (leu.comp) that had the following content:

Word size 1
Total count 0

# bias should be detected as 106 > 100
L 100

Oddcomp was run like this:

oddcomp tsw:Q62671 -infile leu.comp -window 889

But the sequence is not reported.
When I change the L count to 10 in leu.comp it does not work neither. Strangely enough, when the default window is taken (30) the sequence is reported. What is happening here ? Regards, Marc From d.gatherer at vir.gla.ac.uk Wed Mar 8 09:30:13 2006 From: d.gatherer at vir.gla.ac.uk (Derek Gatherer) Date: Wed, 08 Mar 2006 09:30:13 +0000 Subject: [EMBOSS] clustalw vs. emma Message-ID: <6.2.3.4.1.20060308092040.02aabc78@lenzie.gla.ac.uk> Morning all Is there some unusual default being passed to emma? For instance, here's emma with a vanilla set of parameters on a fairly well conserved set of proteins (bdlf4.fa): yoda:cluscheck 157 > emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto CLUSTAL W (1.83) Multiple Sequence Alignments Sequence type explicitly set to Protein Sequence format is Pearson Sequence 1: AG876-BDLF4 225 aa Sequence 2: B95-BDLF4 225 aa Sequence 3: GD1-BDLF4 225 aa Sequence 4: RLV-BDLF4 238 aa Start of Pairwise alignments Aligning... Sequences (1:2) Aligned. Score: 100 Sequences (1:3) Aligned. Score: 98 Sequences (1:4) Aligned. Score: 85 Sequences (2:3) Aligned. Score: 98 Sequences (2:4) Aligned. Score: 85 Sequences (3:4) Aligned. Score: 86 Guide tree file created: [00029986C] Start of Multiple Alignment There are 3 groups Aligning... Group 1: Sequences: 2 Score:3770 Group 2: Sequences: 3 Score:3741 Group 3: Sequences: 4 Score:3462 Alignment Score 8058 GCG-Alignment file created [00029986B] and now clustalw, unwrapped in emma, with the same input file yoda:cluscheck 158 > clustalw bdlf4.fa CLUSTAL W (1.83) Multiple Sequence Alignments Sequence format is Pearson Sequence 1: AG876-BDLF4 225 aa Sequence 2: B95-BDLF4 225 aa Sequence 3: GD1-BDLF4 225 aa Sequence 4: RLV-BDLF4 238 aa Start of Pairwise alignments Aligning... Sequences (1:2) Aligned. Score: 100 Sequences (1:3) Aligned. Score: 98 Sequences (1:4) Aligned. Score: 88 Sequences (2:3) Aligned. Score: 98 Sequences (2:4) Aligned. Score: 88 Sequences (3:4) Aligned. 
Score: 88 Guide tree file created: [bdlf4.dnd] Start of Multiple Alignment There are 3 groups Aligning... Group 1: Sequences: 2 Score:4959 Group 2: Sequences: 3 Score:4928 Group 3: Sequences: 4 Score:4677 Alignment Score 8187 CLUSTAL-Alignment file created [bdlf4.aln] Why is the scoring subtly different? and see what it does to the N-terminal of the alignment.... First with emma: 1 50 AG876-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP B95-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP GD1-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP RLV-BDLF4 MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLP now with clustalw: AG876-BDLF4 MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW B95-BDLF4 MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW GD1-BDLF4 MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW RLV-BDLF4 MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLPESMASVFACW ***:**:* * ***.. **.********** *:************* Clustalw alone clearly gives the correct alignment whereas emma is wrong. I thought that emma simply wrapped clustalw for automation, but it appears it is doing something else. Out of a set of 80 proteins I am trying to pipeline through alignment, emma gives a variant result for 7 of them..... Any thoughts, as always, much appreciated cheers Derek From Marc.Logghe at DEVGEN.com Wed Mar 8 10:36:56 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Wed, 8 Mar 2006 11:36:56 +0100 Subject: [EMBOSS] Oddcomp behaves oddly ... Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746BAA@ANTARESIA.be.devgen.com> > Basically what is happening is that there is a check for the > length of the sequence being shorter than the window. It may > well be this that is giving the problem. This was a perfect diagnosis. It works fine when I make the window size off one. But I guess it should not be a problem for oddcomp being the window size equal (or even larger) to the length of the sequence ? 
It is a way of saying: don't bother with window sizes, just take the complete thing. Could be a nice to have feature. Thanks David, Marc From david at compbio.dundee.ac.uk Wed Mar 8 10:26:23 2006 From: david at compbio.dundee.ac.uk (David Martin) Date: Wed, 08 Mar 2006 10:26:23 +0000 Subject: [EMBOSS] Oddcomp behaves oddly ... In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746BA7@ANTARESIA.be.devgen.com> Message-ID: On 8/3/06 9:00 am, "Marc Logghe" wrote: > ... Or rather, how should I use it properly ? > > OK, suppose your run compseq to obtain the frequency for individual > residues: > compseq tsw:Q62671 -word 1 > Apparently this example protein sequence is rather rich in leucine (106 > L out of 889). > > In order to detect this leucine bias, a little file was created > (leu.comp) that had the following content: > > Word size 1 > Total count 0 > > # bias should be detected as 106 > 100 > L 100 > > > Oddcomp was run like this: > oddcomp tsw:Q62671 -infile leu.comp -window 889 Try window 888 (ie shorter than the length of the sequence). There are a couple of minor bugs in the oddcomp code that I will forward to the team. Basically what is happening is that there is a check for the length of the sequence being shorter than the window. It may well be this that is giving the problem. It is a long time since I wrote this and C is not my usual language so apologies if this is not a comprehensive answer. ..d > > But the sequece is not reported. > When I change the L count to 10 in leu.comp it does not work neither. > Strangely enough, when the default window is taken (30) the sequence is > reported. > What is happening here ? > > Regards, > Marc > > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss From jison at ebi.ac.uk Wed Mar 8 11:05:33 2006 From: jison at ebi.ac.uk (Jon Ison) Date: Wed, 8 Mar 2006 11:05:33 -0000 (GMT) Subject: [EMBOSS] Oddcomp behaves oddly ... 
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746BAA@ANTARESIA.be.devgen.com> References: <0C528E3670D8CE4B8E013F6749231AA6746BAA@ANTARESIA.be.devgen.com> Message-ID: <56257.84.92.187.247.1141815933.squirrel@webmail.ebi.ac.uk> Hi Marc What might be cleaner is if we modify the ACD file so that any window size bigger than the sequence length is reprompted for. Also, to add a qualifier to set the window to the sequence length, if that'd help. Cheers Jon >> Basically what is happening is that there is a check for the >> length of the sequence being shorter than the window. It may >> well be this that is giving the problem. > > This was a perfect diagnosis. It works fine when I make the window size > off one. > But I guess it should not be a problem for oddcomp being the window size > equal (or even larger) to the length of the sequence ? It is a way of > saying: don't bother with window sizes, just take the complete thing. > Could be a nice to have feature. > Thanks David, > Marc > > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss > From Marc.Logghe at DEVGEN.com Wed Mar 8 11:36:06 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Wed, 8 Mar 2006 12:36:06 +0100 Subject: [EMBOSS] Oddcomp behaves oddly ... Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746BAB@ANTARESIA.be.devgen.com> Hi David, I am afraid there are some remaining oddities with oddcomp. Tried another protein, other residue. 
Word size 1
Total count 0
S 4

First a set of sequences is generated (kind of mimicking sliding window) of length 20:

splitter wormpep:ZK822.4 -size 20 -overlap 19 > split.fa

Second, oddcomp is run (with window option off by one):

oddcomp split.fa -window 19 -infile compseq.data

#
# Output from 'oddcomp'
#
# The Expected frequencies are taken from the file: compseq.data
#
# Word size: 1
ZK822.4_36-55
ZK822.4_37-56
ZK822.4_38-57
ZK822.4_39-58
ZK822.4_40-59
ZK822.4_41-60
# END #

The first 20mer:

>ZK822.4_36-55
SAGSSGSNFLSGLQNSSFGQ

It is clear that there are 7 S residues in this stretch and we were looking for 4 or more, so that makes sense. However, when you run oddcomp again with S count of 5 instead of 4, no sequence is reported ! Cheers, Marc > -----Original Message----- > From: David Martin [mailto:david at compbio.dundee.ac.uk] > Sent: Wednesday, March 08, 2006 11:26 AM > To: Marc Logghe; emboss at emboss.open-bio.org > Subject: Re: [EMBOSS] Oddcomp behaves oddly ... > > On 8/3/06 9:00 am, "Marc Logghe" wrote: > > > ... Or rather, how should I use it properly ? > > > > OK, suppose your run compseq to obtain the frequency for individual > > residues: > > compseq tsw:Q62671 -word 1 > > Apparently this example protein sequence is rather rich in leucine > > (106 L out of 889). > > > > In order to detect this leucine bias, a little file was created > > (leu.comp) that had the following content: > > > > Word size 1 > > Total count 0 > > > > # bias should be detected as 106 > 100 > > L 100 > > > > > > Oddcomp was run like this: > > oddcomp tsw:Q62671 -infile leu.comp -window 889 > > Try window 888 (ie shorter than the length of the sequence). > There are a couple of minor bugs in the oddcomp code that I > will forward to the team. > > Basically what is happening is that there is a check for the > length of the sequence being shorter than the window. It may > well be this that is giving the problem.
> > It is a long time since I wrote this and C is not my usual > language so apologies if this is not a comprehensive answer. > > ..d > > > > > But the sequece is not reported. > > When I change the L count to 10 in leu.comp it does not > work neither. > > Strangely enough, when the default window is taken (30) the > sequence > > is reported. > > What is happening here ? > > > > Regards, > > Marc > > > > _______________________________________________ > > EMBOSS mailing list > > EMBOSS at emboss.open-bio.org > > http://newportal.open-bio.org/mailman/listinfo/emboss > > > From pmr at ebi.ac.uk Wed Mar 8 12:09:25 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 08 Mar 2006 12:09:25 +0000 Subject: [EMBOSS] Oddcomp behaves oddly ... In-Reply-To: References: Message-ID: <440EC975.6090907@ebi.ac.uk> David Martin wrote: > Basically what is happening is that there is a check for the length of the > sequence being shorter than the window. It may well be this that is giving > the problem. Not that part - it accepts a window the same length as the sequence (oddcomp can read more than one sequence, and does have to skip those too short to fit a window). A later loop does fail if the window size matches the sequence - I am testing allowing it to run just one more time :-) > It is a long time since I wrote this and C is not my usual language so > apologies if this is not a comprehensive answer. Snakke de fortran? >>But the sequece is not reported. >>When I change the L count to 10 in leu.comp it does not work neither. >>Strangely enough, when the default window is taken (30) the sequence is >>reported. Same problem I believe - it is the window size matching sequence length that stops the last for loop from checking anything. regards, Peter From pmr at ebi.ac.uk Wed Mar 8 13:13:24 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 08 Mar 2006 13:13:24 +0000 Subject: [EMBOSS] Oddcomp behaves oddly ...
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746BAB@ANTARESIA.be.devgen.com> References: <0C528E3670D8CE4B8E013F6749231AA6746BAB@ANTARESIA.be.devgen.com> Message-ID: <440ED874.7070100@ebi.ac.uk> Marc Logghe wrote: > Hi David, > I am afraid there are some remaining oddities with oddcomp. > The first 20mer: > >>ZK822.4_36-55 > > SAGSSGSNFLSGLQNSSFGQ > > It is clear that there are 7 S residues in this stretch and we were > looking for 4 or more, so that makes sense. > However, when you run oddseq again with S count of 5 instead of 4, no > sequence is reported ! At least 2 bugs here. Firstly, with more than one sequence as input, some internal values were not fully reset. Also the word size is used (as 2) before it is set to 1. For 8 Serines in this set I am still only getting one hit out of two. A little more investigation needed ... I am getting closer :-) regards, Peter From ajb at ebi.ac.uk Thu Mar 9 15:58:33 2006 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Thu, 9 Mar 2006 15:58:33 -0000 (GMT) Subject: [EMBOSS] clustalw vs. emma In-Reply-To: <6.2.3.4.1.20060308092040.02aabc78@lenzie.gla.ac.uk> References: <6.2.3.4.1.20060308092040.02aabc78@lenzie.gla.ac.uk> Message-ID: <45378.81.96.70.96.1141919913.squirrel@webmail.ebi.ac.uk> Hi Derek, emma is indeed just a wrapper for clustalw. You can see what default parameters it is using by specifying -debug on the command line and then looking at the emma.dbg file. Search for a line saying "Executing 'clustalw" I suspect that the default gap extension penalty is rather high in your case. If you use (e.g.) -gapext 0.2 then you'll get something approaching the default clustalw behaviour. The defaults for your sequences seem to be: -gapopen=10.000 -gapext=5.000 -gapdist=8 HTH Alan > Morning all > > Is there some unusual default being passed to emma? 
From d.gatherer at vir.gla.ac.uk Thu Mar 9 16:18:55 2006 From: d.gatherer at vir.gla.ac.uk (Derek Gatherer) Date: Thu, 09 Mar 2006 16:18:55 +0000 Subject: [EMBOSS] clustalw vs.
emma In-Reply-To: <45378.81.96.70.96.1141919913.squirrel@webmail.ebi.ac.uk> References: <6.2.3.4.1.20060308092040.02aabc78@lenzie.gla.ac.uk> <45378.81.96.70.96.1141919913.squirrel@webmail.ebi.ac.uk> Message-ID: <6.2.3.4.1.20060309160317.02abb870@lenzie.gla.ac.uk> Thanks Alan That indeed is the cause of the problem:

Executing 'clustalw -infile=00052348A -outfile=00052348B -align -type=protein -output=gcg -pwmatrix=blosum -pwgapopen=10.000 -pwgapext=0.100 -newtree=00052348C -matrix=blosum -gapopen=10.000 -gapext=5.000 -gapdist=8 -hgapresidues=GPSNDQEKR -maxdiv=30'

However, on attempting to manually specify it, I run into another one:

[gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -pwgapextend 5
Died: Unknown qualifier -pwgapextend

In the docs http://emboss.sourceforge.net/apps/cvs/emma.html, there are quite a few optional parameters of this sort, some of which work and others don't, eg:

[gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -gapextend 5
Died: Unknown qualifier -gapextend
[gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -pwgapextend 5
Died: Unknown qualifier -pwgapextend
[gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -gapopen 5
Died: Unknown qualifier -gapopen
[gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -gapdist 5
CLUSTAL W (1.83) Multiple Sequence Alignments

so -gapdist works at least. Cheers Derek At 15:58 09/03/2006, ajb at ebi.ac.uk wrote: >Hi Derek, > >emma is indeed just a wrapper for clustalw. You can see what default >parameters it is using by specifying -debug on the command line >and then looking at the emma.dbg file. Search for a line >saying "Executing 'clustalw" > >I suspect that the default gap extension penalty is rather high >in your case. If you use (e.g.)
-gapext 0.2 then you'll get >something approaching the default clustalw behaviour. The defaults >for your sequences seem to be: > > -gapopen=10.000 -gapext=5.000 -gapdist=8 > > >HTH > >Alan From pmr at ebi.ac.uk Thu Mar 9 17:01:15 2006 From: pmr at ebi.ac.uk (Peter Rice) Date: Thu, 09 Mar 2006 17:01:15 +0000 Subject: [EMBOSS] clustalw vs.
emma In-Reply-To: <6.2.3.4.1.20060309160317.02abb870@lenzie.gla.ac.uk> References: <6.2.3.4.1.20060308092040.02aabc78@lenzie.gla.ac.uk> <45378.81.96.70.96.1141919913.squirrel@webmail.ebi.ac.uk> <6.2.3.4.1.20060309160317.02abb870@lenzie.gla.ac.uk> Message-ID: <44105F5B.3050200@ebi.ac.uk> Derek Gatherer wrote: > In the docs http://emboss.sourceforge.net/apps/cvs/emma.html, there > are quite a few optional parameters of this sort, some of which work > and others don't, eg: Yup - we're putting that right (some people have noticed the application docs are moving around). The emboss.sf.net website only documents things for the latest code in CVS. We are adding documentation for release 3.0.0 (that is why the new directories are appearing). The release 3.0.0 documentation is installed on your system when you install 3.0.0 - if you install to /usr/local/bin it will be in: /usr/local/share/EMBOSS/doc/programs/html (this will change in release 4.0.0). You are seeing some of the changes made to make standard names for command line qualifiers since 3.0.0 Hope that helps, Peter From blanchard at microbio.umass.edu Thu Mar 9 21:18:55 2006 From: blanchard at microbio.umass.edu (Jeffrey Blanchard) Date: Thu, 9 Mar 2006 16:18:55 -0500 Subject: [EMBOSS] d_ino Message-ID: Hello, I am trying to install EMBOSS under cygwin for teaching purposes. make crashes on ajfile because d_ino appears to be missing in current version of cygwin. Is there a work around for this? Thanks, Jeff ------------------------------- Jeffrey L. 
Blanchard Assistant Professor Department of Microbiology University of Massachusetts Amherst, MA 01003 Office and Lab: Morrill I N330 Tel: 413-577-2130 Fax: 413-545-1578 http://www.bio.umass.edu/micro/blanchard/Lab_About.html From ajb at ebi.ac.uk Fri Mar 10 00:22:45 2006 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Fri, 10 Mar 2006 00:22:45 -0000 (GMT) Subject: [EMBOSS] d_ino In-Reply-To: References: Message-ID: <41243.81.96.70.96.1141950165.squirrel@webmail.ebi.ac.uk> Hi, Yes indeed there is a fix. Look in the directory. ftp://emboss.open-bio.org/pub/EMBOSS/fixes/ The README file there will usually tell you what each of the files fixes. HTH Alan Bleasby EBI > Hello, > > I am trying to install EMBOSS under cygwin for teaching purposes. > > make crashes on ajfile because d_ino appears to be missing in current > version of cygwin. > > Is there a work around for this? > > Thanks, Jeff > > ------------------------------- > Jeffrey L. Blanchard > Assistant Professor > Department of Microbiology > University of Massachusetts > Amherst, MA 01003 > Office and Lab: Morrill I N330 > Tel: 413-577-2130 > Fax: 413-545-1578 > http://www.bio.umass.edu/micro/blanchard/Lab_About.html > > > _______________________________________________ > EMBOSS mailing list > EMBOSS at emboss.open-bio.org > http://newportal.open-bio.org/mailman/listinfo/emboss > From jison at ebi.ac.uk Wed Mar 15 17:09:59 2006 From: jison at ebi.ac.uk (Jon Ison) Date: Wed, 15 Mar 2006 17:09:59 -0000 (GMT) Subject: [EMBOSS] EMBOSS Developers Course - reminder Message-ID: <39760.172.31.70.94.1142442599.squirrel@webmail.ebi.ac.uk> Hi There's still some places left on this course. Get in touch if you'd like to attend. 
Cheers Jon BSDC 2006 Bioinformatics Software Development Course April 18-20 2006 Following from the highly successful BSDC 2003/2004 courses, a new series of courses on 'Bioinformatics Software Development' using EMBOSS will be held in the training room at The Wellcome Trust Conference Centre on April 18-20, 2006. The course will give a good introduction to programming in EMBOSS. By the end of the course you will be experienced in all the steps in writing a basic bioinformatics application using the EMBOSS programming libraries. The course would suit competent programmers, probably with at least a couple of years of experience. A reasonable working knowledge of C is required to get the most out of the course; familiarity with pointers is helpful but not essential. That said, all are welcome regardless of background or experience. Places are limited so please email Liz Ford (ford at ebi.ac.uk) to register as soon as possible. We do not make a profit on the course but must charge £125 / person (for the 3 days) to recover some of our costs. We are unable to take credit card payments. The preferred method of payment is by cheque made payable to 'Industry Workshops'. If you wish to pay in cash or by bank transfer please contact Liz Ford (ford at ebi.ac.uk). To read more about the course see http://emboss.sourceforge.net/developers/developers_course/ To read more about EMBOSS see http://emboss.sourceforge.net/ To register: email Liz Ford (ford at ebi.ac.uk) with your full name, address, phone number. You will then receive an email back confirming your registration or not. Please note, as mentioned before, places are limited so not all registrations will be successful.
For further information email Jon Ison (jison at ebi.ac.uk) From pmr at ebi.ac.uk Mon Mar 27 17:50:09 2006 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Mon, 27 Mar 2006 18:50:09 +0100 (BST) Subject: [EMBOSS] [BiO BB] Building an alignment from BLAST hsp In-Reply-To: <00b501c651be$95b37500$e6028a0a@GOLHARMOBILE1> References: <00b501c651be$95b37500$e6028a0a@GOLHARMOBILE1> Message-ID: <2253.86.132.217.176.1143481809.squirrel@webmail.ebi.ac.uk> Ryan Golhar wrote: > I have a BLAST alignment: query sequence and database sequence. > > The alignment is only showing the HSP from the blast output as expected, > however I want to build an alignment of the entire database sequence > against my query sequence. > > I tried using needle from EMBOSS, however its aligning the sequences > completely different than BLAST does. What I'd really like is a way to > anchor the alignment based on the BLAST HSP. Does anyone know how to do > this, or what tool(s) will allow me to do this? You are quite right that EMBOSS may align the sequences completely differently - unless the HSPs are very significant and cover most of the sequence this will be true of any attempt to simply realign. There has to be some way to pass on the HSPs as fixed positions, as in the BioPerl solution. However, it could make a nice EMBOSS application - the only question would be how you would like to specify the HSPs. Perhaps we could read BLAST output (in some specified format), or perhaps some other way to give the input alignments. We do have at least one EMBOSS application that does something similar (finds all long perfect matches and interpolates) - we just need to reuse the interpolation code which is basically doing a global alignment of the bits in between. That also tackles the problem of choosing which non-compatible initial matches to use. 
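A minimal sketch of the anchoring idea discussed above, not an existing EMBOSS program: it assumes a single ungapped HSP given as hypothetical 0-based start offsets (q_start, s_start) and a length, keeps that block fixed, and runs a plain Needleman-Wunsch global alignment on the regions on either side of it. The scoring values (match 1, mismatch -1, gap -2) and all function names are illustrative assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b; returns the two gapped strings.

    Scoring is a simple linear gap model chosen for illustration only.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_align(query, subject, q_start, s_start, length):
    """Align query and subject with one ungapped HSP held fixed.

    q_start/s_start are 0-based offsets of the HSP in each sequence
    (hypothetical input convention; BLAST reports 1-based positions).
    """
    left_q, left_s = needleman_wunsch(query[:q_start], subject[:s_start])
    mid_q = query[q_start:q_start + length]
    mid_s = subject[s_start:s_start + length]
    right_q, right_s = needleman_wunsch(query[q_start + length:],
                                        subject[s_start + length:])
    return left_q + mid_q + right_q, left_s + mid_s + right_s


if __name__ == "__main__":
    q_aln, s_aln = anchored_align("ACGTTTGA", "GGACGTTTGACC", 0, 2, 8)
    print(q_aln)  # --ACGTTTGA--
    print(s_aln)  # GGACGTTTGACC
```

Several HSPs could be handled the same way by sorting them, dropping those that overlap or cross, and aligning each gap between consecutive anchors; choosing which incompatible HSPs to keep is the harder part and is not attempted here.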
Hope that helps, Peter From golharam at umdnj.edu Mon Mar 27 16:50:42 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon, 27 Mar 2006 11:50:42 -0500 Subject: [EMBOSS] Building an alignment from BLAST hsp Message-ID: <00b501c651be$95b37500$e6028a0a@GOLHARMOBILE1> I have a BLAST alignment: query sequence and database sequence. The alignment is only showing the HSP from the blast output as expected, however I want to build an alignment of the entire database sequence against my query sequence. I tried using needle from EMBOSS, however its aligning the sequences completely different than BLAST does. What I'd really like is a way to anchor the alignment based on the BLAST HSP. Does anyone know how to do this, or what tool(s) will allow me to do this? Ryan From golharam at umdnj.edu Mon Mar 27 18:03:39 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon, 27 Mar 2006 13:03:39 -0500 Subject: [EMBOSS] [BiO BB] Building an alignment from BLAST hsp In-Reply-To: <2253.86.132.217.176.1143481809.squirrel@webmail.ebi.ac.uk> Message-ID: <010501c651c8$c6b4bb00$e6028a0a@GOLHARMOBILE1> Hi Peter, > You are quite right that EMBOSS may align the sequences completely > differently - unless the HSPs are very significant and cover most > of the sequence this will be true of any attempt to simply realign. > There has to be some way to pass on the HSPs as fixed positions, > as in the BioPerl solution. I looked at a bioperl method, but can't seem to find something that will accomplish this. > However, it could make a nice EMBOSS application - the only question > would be how you would like to specify the HSPs. Perhaps we could read > BLAST output (in some specified format), or perhaps some other way to > give the input alignments. Yes, I agree. I suppose the best way would be to specify the two sequences and the blast output. The application could then construct an alignment based on a particular HSP (probably the first one, or whatever the user specifies). 
Ryan From letondal at pasteur.fr Tue Mar 28 07:25:07 2006 From: letondal at pasteur.fr (Catherine Letondal) Date: Tue, 28 Mar 2006 09:25:07 +0200 Subject: [EMBOSS] [BiO BB] Building an alignment from BLAST hsp In-Reply-To: <010501c651c8$c6b4bb00$e6028a0a@GOLHARMOBILE1> References: <010501c651c8$c6b4bb00$e6028a0a@GOLHARMOBILE1> Message-ID: <4b91818a096ba42d8d53279a7f63e6ea@pasteur.fr> On Mar 27, 2006, at 8:03 PM, Ryan Golhar wrote: > Hi Peter, > >> You are quite right that EMBOSS may align the sequences completely >> differently - unless the HSPs are very significant and cover most >> of the sequence this will be true of any attempt to simply realign. >> There has to be some way to pass on the HSPs as fixed positions, >> as in the BioPerl solution. > > I looked at a bioperl method, but can't seem to find something that > will > accomplish this. > >> However, it could make a nice EMBOSS application - the only question >> would be how you would like to specify the HSPs. Perhaps we could read > >> BLAST output (in some specified format), or perhaps some other way to >> give the input alignments. > > Yes, I agree. I suppose the best way would be to specify the two > sequences and the blast output. The application could then construct > an > alignment based on a particular HSP (probably the first one, or > whatever > the user specifies). > Have you tried this: http://bioweb.pasteur.fr/seqanal/interfaces/seqsblast.html It is based on bioperl. check "Get HSP" option (you can even extend it). 
Best, -- Catherine Letondal -- Institut Pasteur -- Computing Center From cquijano at iib.uam.es Tue Mar 28 09:49:01 2006 From: cquijano at iib.uam.es (Carlos Quijano) Date: Tue, 28 Mar 2006 11:49:01 +0200 Subject: [EMBOSS] [BiO BB] Building an alignment from BLAST hsp In-Reply-To: <4b91818a096ba42d8d53279a7f63e6ea@pasteur.fr> References: <010501c651c8$c6b4bb00$e6028a0a@GOLHARMOBILE1> <4b91818a096ba42d8d53279a7f63e6ea@pasteur.fr> Message-ID: <1143539342.8611.45.camel@localhost.localdomain> Hi all, I didn't read it before, sorry for the "lapsus". And sorry if what I tell you is not exactly what you needed, Ryan. What you are looking for is just _MVIEW_, an old but nice application. Use scholar.google.com / pubmed to find more information about it; I remember that there are web servers running CGIs somewhere. It is possible that during these last years somebody has published a better new tool or a new mview version.... Look for it. MVIEW is a parser for your blast output. MVIEW works for your problem because you want to align only one sequence (as a template) to an entire database (I suppose with some cutoff in the e-value or p-value, at least the default, that is, ten) or against a set of some sequences or only one more sequence (a 2-sequence alignment). I continue with some considerations about aligning HSPs from Blast the way you intend and mview does... they are important and it takes only a minute to read: Remember, what you get is what you wanted, but not a real thing (this is something very typical in bioinformatics - and all science - hahaha). You don't get a real multiple alignment; you get an artifact, namely the blast HSP constructs of an entire database's genes piled under a template gene (your sequence). All right then.
You don't have by any means an alignment, not even an alignment of the genes using HSPs, because there can be HSPs alignable between sequences in the database that are hidden from the alignment when the sequences are piled under your sequence: your sequence lacks these HSPs, so they are _ignored_. Why is this so important? What I actually mean is that if you use these "sequences piled under a template" as a multiple alignment, you will be lying about the underlying topology (that is, not lying ;-) of the gene network that arises from your database plus your sequence when correctly aligned, that is, all against all... etc, etc, etc. Well, that is the mathematically exhaustive, optimal way... normally we use heuristics again, and again, and again... But "all against all" is the key concept involved in the multiple alignment problem. It is very important to be aware of these things. needle is the optimal way <-> Blast is the heuristic. Clustal is also a very, very heuristic solution to the massive problem of multiple alignment. And personally I prefer to use muscle, which uses a better mathematical model and is (right now) the quickest aligner in most cases. I am sure that most of you know it. I hope it is useful for newbies and others, so forgive me for the boring, tedious discourse... CQ On Tue, 28-03-2006 at 09:25 +0200, Catherine Letondal wrote: > On Mar 27, 2006, at 8:03 PM, Ryan Golhar wrote: > > > Hi Peter, > > > >> You are quite right that EMBOSS may align the sequences completely > >> differently - unless the HSPs are very significant and cover most > >> of the sequence this will be true of any attempt to simply realign. > >> There has to be some way to pass on the HSPs as fixed positions, > >> as in the BioPerl solution. > > > > I looked at a bioperl method, but can't seem to find something that > > will > > accomplish this. > > > >> However, it could make a nice EMBOSS application - the only question > >> would be how you would like to specify the HSPs.
Perhaps we could read > > > >> BLAST output (in some specified format), or perhaps some other way to > >> give the input alignments. > > > > Yes, I agree. I suppose the best way would be to specify the two > > sequences and the blast output. The application could then construct > > an > > alignment based on a particular HSP (probably the first one, or > > whatever > > the user specifies). > > > > Have you tried this: > http://bioweb.pasteur.fr/seqanal/interfaces/seqsblast.html > > It is based on bioperl. check "Get HSP" option (you can even extend it). > > Best, > > -- > Catherine Letondal -- Institut Pasteur -- Computing Center > > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss Carlos Quijano http://www2.iib.uam.es/cquijano Evolution and Development laboratory Regulation of Gene Expression Department Institute for Biomedical Research http://www.iib.uam.es From kvddrift at earthlink.net Thu Mar 30 00:36:23 2006 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 29 Mar 2006 19:36:23 -0500 Subject: [EMBOSS] crash on intel-Mac Message-ID: Hi, I got a report from a user (of the fink package of emboss) that the following crashes occur on his Mac with an intel processor: % wossname Error: Failed to compile regular expression '^(.*/)[^/]+/?$' at position 716: range out of order in character class Bus error All other programs just give a bus error. I don't get these errors on a Mac with a PowerPC processor. This is emboss 3.0.0. - Koen. From areagp61 at yahoo.it Thu Mar 30 08:31:42 2006 From: areagp61 at yahoo.it (Graziano P.) 
Date: Thu, 30 Mar 2006 10:31:42 +0200 (CEST) Subject: [EMBOSS] dbifasta index file format Message-ID: <20060330083142.4237.qmail@web26207.mail.ukl.yahoo.com> hello EMBOSS users, I have some databases in fasta format (ncbi | format) and I want to index them using dbifasta, then I want to access the index files using a program that will be developed by a computer scientist of my group. I need to index the databases by accession number, ginumber and description. I have read in the dbifasta help info about the structure of the index files when the databases were indexed by accession number, but I have not found info about the structure of the index files when the databases are indexed by description. Anyone knows where I can find detailed information about the structure of the index files? Regards Graziano ___________________________________ Yahoo! Messenger with Voice: chiama da PC a telefono a tariffe esclusive http://it.messenger.yahoo.com From ajb at ebi.ac.uk Thu Mar 30 08:38:10 2006 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Thu, 30 Mar 2006 09:38:10 +0100 (BST) Subject: [EMBOSS] crash on intel-Mac In-Reply-To: References: Message-ID: <37407.81.98.244.247.1143707890.squirrel@webmail.ebi.ac.uk> Hi, Thanks. We already have a report of this and are working on a solution. Alan From gbottu at ben.vub.ac.be Thu Mar 30 09:37:23 2006 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Thu, 30 Mar 2006 11:37:23 +0200 Subject: [EMBOSS] A note about fastA format(s) - Checked by AntiVir DEMO version - Message-ID: <20060330093723.GA18690@bigben.ulb.ac.be> Dear friends, We are using EMBOSS version 3.0. One of my colleagues tried to use a multiple sequence file in fastA format, where each comment line starts with a string containing multiple pipe signs. An USA of type fasta::file:xx|yy|zz|uu|ss did not work. After some trial I found that putting "pearson" instead of "fasta" helped. 
This is strange, since according to the on-line manual at http://emboss.sourceforge.net/docs/themes/SequenceFormats.html "fasta" and "pearson" are synonyms. Here it seems that "fasta" is instead treated the same as "ncbi". Comments ? Guy Bottu, BEN From enrique.deandres at pcm.uam.es Thu Mar 30 15:46:30 2006 From: enrique.deandres at pcm.uam.es (Enrique de Andres Saiz) Date: Thu, 30 Mar 2006 17:46:30 +0200 Subject: [EMBOSS] Problem indexing PDB fasta file Message-ID: <442BFD56.9010908@pcm.uam.es> Hello, I'm trying to index the fasta file of the PDB database with the dbifasta command and I get a lot of warnings such as: Warning: Duplicate ID skipped: '1FNT_A' All hits will point to first ID found I have been looking at the PDB fasta file and I see that, for the previous warning, there is an entry whose id is '1FNT_A' and another one whose id is '1FNT_a'. This makes me think that EMBOSS is case-insensitive. Is this true? Is there any way to distinguish between the two IDs? Thanks in advance, Enrique. From pmr at ebi.ac.uk Thu Mar 30 21:47:19 2006 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Thu, 30 Mar 2006 22:47:19 +0100 (BST) Subject: [EMBOSS] A note about fastA format(s) - Checked by AntiVir DEMO version - In-Reply-To: <20060330093723.GA18690@bigben.ulb.ac.be> References: <20060330093723.GA18690@bigben.ulb.ac.be> Message-ID: <50335.68.153.173.207.1143755239.squirrel@webmail.ebi.ac.uk> Dear Guy, > We are using EMBOSS version 3.0. One of my colleagues tried to use a > multiple sequence file in fastA format, where each comment line starts > with a string containing multiple pipe signs. An USA of type > fasta::file:xx|yy|zz|uu|ss > did not work. After some trial I found that putting "pearson" instead of > "fasta" helped. This is strange, since according to the on-line manual at > http://emboss.sourceforge.net/docs/themes/SequenceFormats.html > "fasta" and "pearson" are synonyms. Here it seems that "fasta" is instead > treated the same as "ncbi". Comments ?
Yes, that is indeed true. We had to make changes to support various NCBI formats, and made FASTA and NCBI the same. We kept "pearson" as the original plain fasta format. We will update the documentation - it will take a little time to check for any other changes to the formats. regards, Peter From ajb at ebi.ac.uk Fri Mar 31 12:12:53 2006 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Fri, 31 Mar 2006 13:12:53 +0100 (BST) Subject: [EMBOSS] crash on intel-Mac In-Reply-To: References: Message-ID: <51078.81.98.244.247.1143807173.squirrel@webmail.ebi.ac.uk> This should now be fixed as long as you apply all the fixes to EMBOSS-3.0.0 from the directory: ftp://emboss.open-bio.org/pub/EMBOSS/fixes/ The latest file there is a new 'configure'; however, if you've not applied the previous patches in the above directory as well, you'll get a compilation failure. Look at the README for details of what the patches fix. Thanks to Bill van Etten for previous emails on this. Changes to the CVS developers version will follow. Alan