From misoh049 at gmail.com Mon Jul 18 22:00:35 2011
From: misoh049 at gmail.com (Tae-Kyung Kim)
Date: Tue, 19 Jul 2011 11:00:35 +0900
Subject: [emboss-dev] Enquiry about emboss needle program customizing
Message-ID:

I am now working for the Korea Bioinformation Center, in particular on
integrating Korean bio-resources as national infrastructure.

Now, I am urgently customizing the global alignment program needle
for fungi sequence database analysis. Although I know that emboss needle
is the best program, it is not easy to customize it for my purpose.
Would you please let me know how to modify the program?

I have two questions.

First, I would like to give the sequences directly as parameters instead
of file names, like this:

./needle -asequence ATGCATATAAA -bsequence ATAGAATAAA -gapopen 10 -gapextend 0.5 -stdout -auto

I want to reduce file I/O when performing a large number of global
sequence alignments this way.

Second, I would like to get the real identity, meaning the number
of '|' characters in the needle result.

 ATAAAAAA
 | | | | | | |
 ATAATAAA
 ---------------------------------------
 Real Identity = 7

I thank you in advance for your support and look forward to hearing from you.

-- 
Tae-Kyung Kim, Ph.D.
Bio-Resource Information Team
Korea Bioinformation Center (KOBIC)
111 Gwahangno, Yuseong-gu, Daejeon 305-806, Korea
TEL: +82-42-879-8548
FAX: +82-42-879-8519

From pmr at ebi.ac.uk Tue Jul 19 01:50:00 2011
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 19 Jul 2011 06:50:00 +0100
Subject: [emboss-dev] Enquiry about emboss needle program customizing
In-Reply-To:
References:
Message-ID: <4E251B08.3080606@ebi.ac.uk>

On 19/07/2011 03:00, Tae-Kyung Kim wrote:
> I am now working for the Korea Bioinformation Center, in particular on
> integrating Korean bio-resources as national infrastructure.
>
> Now, I am urgently customizing the global alignment program needle

I think we can help. I believe EMBOSS can already do what you need.

> First, I would like to give the sequences directly as parameters instead
> of file names, like this:
>
> ./needle -asequence ATGCATATAAA -bsequence ATAGAATAAA -gapopen 10
> -gapextend 0.5 -stdout -auto

All EMBOSS programs can do this. EMBOSS has a sequence format "asis" which
treats the text given in place of the file name as the sequence itself, so
your command line becomes:

./needle -asequence asis::ATGCATATAAA -bsequence asis::ATAGAATAAA -gapopen 10 -gapextend 0.5 -stdout -auto

For long sequences your system needs to allow long command lines. EMBOSS
has no restriction on the length of the asis:: sequence, but sometimes the
system gives an error message or truncates the command line. For long
sequences you can, of course, save the sequence to a file and use the
filename as input.

> Second, I would like to get the real identity, meaning the number
> of '|' characters in the needle result.
>
> ATAAAAAA
> | | | | | | |
> ATAATAAA
> ---------------------------------------
> Real Identity = 7

Needle reports this information: if you look in the header of the output
you will find an Identity: line, which gives the number of positions with
a '|' in the output.

#=======================================
#
# Aligned_sequences: 2
# 1: asis
# 2: asis
# Matrix: EDNAFULL
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 14
# Identity: 7/14 (50.0%)
# Similarity: 7/14 (50.0%)
# Gaps: 7/14 (50.0%)
# Score: 24.0
#
#
#=======================================

asis               1 ATGCAT---ATAAA     11
                         ||   |||||
asis               1 ----ATAGAATAAA     10

You can also select an alternative output format with the -aformat
qualifier (alignment format). Most of the alignment formats also include
this header.

A list of alignment formats can be found on our website at

http://emboss.open-bio.org/html/use/ch05s04.html

(This is chapter 5.4 of the new EMBOSS User's Guide book)

Hope this helps.
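
The Identity: header line described above can be read by a program directly, which avoids counting '|' characters in the alignment body. What follows is a minimal sketch in C, assuming the header line has the form shown in the sample output (# Identity: 7/14 (50.0%)). The function name parse_identity_line is hypothetical and is not part of EMBOSS.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of EMBOSS.
 * Parses a header line such as "# Identity: 7/14 (50.0%)".
 * On success stores the two numbers and returns 1; returns 0 for any
 * other line. */
int parse_identity_line(const char *line, int *matches, int *length)
{
    static const char label[] = "# Identity:";
    size_t n = sizeof label - 1;

    /* Require the label at the start of the line so that other header
     * lines are not mistaken for it. */
    if (strncmp(line, label, n) != 0)
        return 0;

    /* %d skips the run of spaces between the label and the numbers. */
    return sscanf(line + n, "%d/%d", matches, length) == 2;
}
```

Called on each line read from the needle output, the function returns 1 only for the Identity line; for the sample header above it would store 7 and 14.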
Peter Rice
EMBOSS Team

From misoh049 at gmail.com Fri Jul 22 01:19:16 2011
From: misoh049 at gmail.com (Tae-Kyung Kim)
Date: Fri, 22 Jul 2011 14:19:16 +0900
Subject: [emboss-dev] Enquiry about emboss needleall program
Message-ID:

Hi,

I am now trying to perform many-to-many global sequence alignment.
I know that *needleall* supports such an operation, but I would like to get
the same result as with the needle program, including the alignment and
identity. Is there any method to get it?

I have just used the -[no]brief option, but I didn't get what I want.

Thanks.

-- 
Tae-Kyung Kim, Ph.D.
Bio-Resource Information Team
Korea Bioinformation Center (KOBIC)
111 Gwahangno, Yuseong-gu, Daejeon 305-806, Korea
TEL: +82-42-879-8548
FAX: +82-42-879-8519

From pmr at ebi.ac.uk Fri Jul 22 03:30:33 2011
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 22 Jul 2011 08:30:33 +0100
Subject: [emboss-dev] Enquiry about emboss needleall program
In-Reply-To:
References:
Message-ID: <4E292719.7020907@ebi.ac.uk>

On 22/07/2011 06:19, Tae-Kyung Kim wrote:
> Hi,
>
> I am now trying to perform many-to-many global sequence alignment.
> I know that *needleall* supports such an operation, but I would like to get
> the same result as with the needle program, including the alignment and
> identity. Is there any method to get it?

Why do you want to use needle instead of needleall?

needleall reads two sets of sequences (many to many).

needle reads a single sequence, and a set of sequences to compare to
(one to many).

That should be the only difference between the applications.

> I have just used the -[no]brief option, but I didn't get what I want.

The identity is in the header above each sequence alignment in the
default output.

There are other alignment output formats available with the -aformat
option.

Can you give an example of what you would like to see? We probably already
have a format in needle and needleall that gives what you need.
regards,

Peter Rice

From misoh049 at gmail.com Fri Jul 22 04:18:45 2011
From: misoh049 at gmail.com (Tae-Kyung Kim)
Date: Fri, 22 Jul 2011 17:18:45 +0900
Subject: [emboss-dev] Enquiry about emboss needleall program
In-Reply-To: <4E292719.7020907@ebi.ac.uk>
References: <4E292719.7020907@ebi.ac.uk>
Message-ID:

Thank you for the fast response.

When I first learned of the EMBOSS package a week ago, I thought that
needle was the only global alignment program, so I tried to perform
many-to-many global alignment with needle. As you know, this was
inefficient: the many I/O operations caused performance delays.
(To reduce the overhead, I asked you how to pass the sequence parameter
by stdin ^^)

On studying the other emboss programs further, I found the needleall
program for many-to-many alignment and tried to use it. However, there was
no description of the result format in ./needleall --help, which is why
I sent my inquiry to you.

Fortunately, I found the solution by googling just after sending the email
to you.

sol) ./needleall -asequence a.seq -bsequence b.seq -nobrief -gapopen 10 -gapextend 0.5 -stdout -auto *-aformat3 srspair*

After testing with the following test program, I am satisfied with the
much better performance than with single needle runs.

Thank you again.

Best Regards,
Kim.
/* my test application */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int get_needle_all_result(char *fname1, char *fname2){
    char cmd[10000], *cp1, *cp2, c1[1000], c2[1000];
    char seq1[100], seq2[100];
    int i=0, j=0, bar_cnt=0, dot_cnt=0, empty_line=0;
    FILE *fp;

    /* build the command; snprintf cannot write past the end of cmd */
    snprintf(cmd, sizeof cmd, "./needleall -asequence %s -bsequence %s -nobrief -gapopen 10 -gapextend 0.5 -stdout -auto -aformat3 srspair", fname1, fname2);
    fp = popen(cmd,"r");
    if(fp==NULL){
        printf("Process Open Error!\n");
        exit(1);
    }
    while(1){
        cp1 = fgets(c1,1000,fp);
        if(cp1==NULL) break;
        if(strncmp(c1,"# 1: ",5)==0){
            /* header line "# 1: name": strip the newline, keep the name */
            c1[strcspn(c1,"\n")] = '\0';
            snprintf(seq1, sizeof seq1, "%s", c1+5);
        }else if(strncmp(c1,"# 2: ",5)==0){
            c1[strcspn(c1,"\n")] = '\0';
            snprintf(seq2, sizeof seq2, "%s", c1+5);
            /* skip the remaining header lines */
            for(i=0;i<16;i++) fgets(c1,1000,fp);
            bar_cnt=0; dot_cnt=0; empty_line=0;
            while(1){
                cp2 = fgets(c2,1000,fp);
                if(cp2==NULL) break;
                c2[strcspn(c2,"\n")] = '\0';
                if(strstr(c2,"|")!=NULL || strstr(c2,".")!=NULL){
                    /* count match symbols, stopping at the end of the line */
                    for(j=0;j<71 && c2[j]!='\0';j++) {
                        if(c2[j]=='|') bar_cnt++;
                        else if(c2[j]=='.') dot_cnt++;
                    }
                }
                if(strlen(c2)==0) empty_line++;
                else empty_line=0;
                /* two consecutive empty lines end one alignment */
                if(empty_line==2) {
                    printf("%s:%s:%d:%d\n",seq1,seq2,bar_cnt,dot_cnt); /* linked list */
                    empty_line=0;
                    break;
                }
            }
        }
    }
    pclose(fp);   /* a stream opened with popen is closed with pclose */
    return 0;
}

On Fri, Jul 22, 2011 at 4:30 PM, Peter Rice wrote:

> On 22/07/2011 06:19, Tae-Kyung Kim wrote:
>
>> Hi,
>>
>> I am now trying to perform many-to-many global sequence alignment.
>> I know that *needleall* supports such an operation, but I would like to get
>> the same result as with the needle program, including the alignment and
>> identity. Is there any method to get it?
>>
>
> Why do you want to use needle instead of needleall?
>
> needleall reads two sets of sequences (many to many).
>
> needle reads a single sequence, and a set of sequences to compare to (one
> to many).
>
> That should be the only difference between the applications.
>
>> I have just used the -[no]brief option, but I didn't get what I want.
>>
>
> The identity is in the header above each sequence alignment in the default
> output.
>
> There are other alignment output formats available with the -aformat
> option.
>
> Can you give an example of what you would like to see? We probably already
> have a format in needle and needleall that gives what you need.
>
> regards,
>
> Peter Rice
>

-- 
Tae-Kyung Kim, Ph.D.
Bio-Resource Information Team
Korea Bioinformation Center (KOBIC)
111 Gwahangno, Yuseong-gu, Daejeon 305-806, Korea
TEL: +82-42-879-8548
FAX: +82-42-879-8519

From pmr at ebi.ac.uk Fri Jul 22 04:48:17 2011
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 22 Jul 2011 09:48:17 +0100
Subject: [emboss-dev] Enquiry about emboss needleall program
In-Reply-To:
References: <4E292719.7020907@ebi.ac.uk>
Message-ID: <4E293951.6060006@ebi.ac.uk>

On 07/22/2011 09:18 AM, Tae-Kyung Kim wrote:
> Fortunately, I found the solution by googling just after sending the email
> to you.
>
> sol) ./needleall -asequence a.seq -bsequence b.seq -nobrief -gapopen 10
> -gapextend 0.5 -stdout -auto *-aformat3 srspair*
>
> After testing with the following test program, I am satisfied with the
> much better performance than with single needle runs.

You can simply use -aformat (if there is only one alignment output there
is no need for the parameter number 3) to avoid confusion when you try
another application with a different number of parameters.

We are also happy to add new alignment formats if anyone needs a different
output format. Any new alignment format will be available to all alignment
outputs (but can be declared as only for pairwise/multiple or
nucleotide/protein alignments).

Happy to be able to help.

Peter Rice
EMBOSS Team
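
The needleall command line discussed above, with -aformat in place of -aformat3, can be assembled in C before it is passed to popen(). What follows is a minimal sketch, assuming the same qualifiers and values used earlier in the thread. The function name build_needleall_cmd is hypothetical, and the file names are inserted without shell quoting, so they are assumed to contain no spaces or shell metacharacters.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not part of EMBOSS.
 * Writes the needleall command line into buf.
 * Returns 0 on success, -1 if buf is too small or formatting fails.
 * Assumption: the file names need no shell quoting. */
int build_needleall_cmd(char *buf, size_t size,
                        const char *afile, const char *bfile)
{
    int n = snprintf(buf, size,
                     "./needleall -asequence %s -bsequence %s -nobrief "
                     "-gapopen 10 -gapextend 0.5 -stdout -auto "
                     "-aformat srspair",
                     afile, bfile);

    /* snprintf returns the length it wanted to write; a value that does
     * not fit in buf means the command was truncated. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

Checking the return value matters here because a truncated command could still run, but with arguments other than the intended ones.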