From tzhu at mail.bnu.edu.cn Thu Aug 2 20:53:19 2012
From: tzhu at mail.bnu.edu.cn (Tao Zhu)
Date: Fri, 03 Aug 2012 08:53:19 +0800
Subject: [EMBOSS] Why tfscan generate duplicate results?
Message-ID: <501B20FF.7050308@mail.bnu.edu.cn>

tfscan from emboss 6.5.7.0

My input sequence is an intron sequence from Arabidopsis thaliana
(attached file: test.fasta).

I run:

$ tfscan -sequence test.fasta -menu P -mismatch 0 -outfile test.out

The result is:

########################################
# Program: tfscan
# Rundate: Fri 3 Aug 2012 08:55:50
# Commandline: tfscan
#    -sequence test.fasta
#    -menu P
#    -mismatch 0
#    -outfile test.out
# Report_format: seqtable
# Report_file: test.out
########################################

#=======================================
#
# Sequence: Atha     from: 1   to: 3388
# HitCount: 20
#=======================================

Start   End Strand Accession Factor                                                Sequence
 3108  3114   +    R03715                                                          ggttaat
 3108  3114   +    R03715                                                          ggttaat
 2482  2488   +    R03710                                                          gaaagaa
 3354  3359   +    R02731    T00627; NIT2;Quality: 2; Species: Neurospora crassa.  tatctc
 2185  2190   +    R02731    T00627; NIT2;Quality: 2; Species: Neurospora crassa.  tatctc
 1008  1013   +    R02731    T00627; NIT2;Quality: 2; Species: Neurospora crassa.  tatctc
 1261  1266   +    R02731    T00627; NIT2;Quality: 2; Species: Neurospora crassa.  gagata
 3354  3359   +    R02729    T00627; NIT2;Quality: 2; Species: Neurospora crassa.  tatctc
 2185  2190   +    R02729    T00627; NIT2;Quality: 2; Species: Neurospora crassa.  tatctc
 1008  1013   +    R02729    T00627; NIT2;Quality: 2; Species: Neurospora crassa.  tatctc
 2726  2731   +    R02728    T00627; NIT2;Quality: 2; Species: Neurospora crassa.  tatcta
  989   994   +    R02728    T00627; NIT2;Quality: 2; Species: Neurospora crassa.  tatcta
  223   228   +    R02728    T00627; NIT2;Quality: 2; Species: Neurospora crassa.  tatcta
 2726  2731   +    R02728    T00627; NIT2;Quality: 2; Species: Neurospora crassa.  tatcta
  989   994   +    R02728    T00627; NIT2;Quality: 2; Species: Neurospora crassa.  tatcta
  223   228   +    R02728    T00627; NIT2;Quality: 2; Species: Neurospora crassa.  tatcta
 2876  2879   +    R01203                                                          ctcc
 2449  2452   +    R01203                                                          ctcc
 2141  2144   +    R01203                                                          ctcc
 2877  2883   +    R01202                                                          tccacct

#---------------------------------------
#---------------------------------------

#---------------------------------------
# Reported_sequences: 1
# Reported_hitcount: 20
#---------------------------------------

It can be seen that there are duplicate items: for example, the hit at
3108-3114, +, appears twice and is counted twice. Why is that?

--
Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing 100875, China
Email: tzhu at mail.bnu.edu.cn

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.fasta
Type: application/x-wine-extension-fasta
Size: 3394 bytes
Desc: not available
URL:

From georgios at biotek.uio.no Mon Aug 6 07:45:29 2012
From: georgios at biotek.uio.no (Georgios Magklaras)
Date: Mon, 06 Aug 2012 13:45:29 +0200
Subject: [EMBOSS] Databases
In-Reply-To: <5003F595.9080407@yahoo.co.uk>
References: <5003F595.9080407@yahoo.co.uk>
Message-ID: <501FAE59.80809@biotek.uio.no>

Hi Nabeel,

I made this guide on the subject of EMBOSS databases:

http://epistolatory.blogspot.no/2012/08/the-bioinformatics-sysadmin.html

I hope it explains the things you need to know to get started.

GM

Best regards,

--
--
George Magklaras PhD
RHCE no: 805008309135525
Senior Systems Engineer/IT Manager
Biotechnology Center of Oslo and
the Norwegian Center for Molecular Medicine
EMBnet TMPC Chair

http://folk.uio.no/georgios

Tel: +47 22840535

On 07/16/2012 01:05 PM, Peter Rice wrote:
> On 16/07/2012 06:56, Nabeel Ahmed wrote:
>> I have recently installed EMBOSS-6.4.0 (Ubuntu 11.10).
>> I am unable to make it work directly with live databases (embl,
>> uniprot) ,
>> working totally fine with local sequence files.
>> e.g
>>
>> % *plotorf *
>> Plot potential open reading frames in a nucleotide sequence
>> Input nucleotide sequence: *embl:x13776*
>>
>> *Error:* Failed to open filename 'embl'**
>>
>> Used 'showdb' , displayed table with zero rows.
>
> That is very strange.
>
> EMBOSS 6.4.0 includes a set of installed databases which should appear
> in showdb.
>
> One of these is the data resource catalogue (drcat) which in turn
> defines the embl database for web access.
>
> David Bauer has already explained how to get web access working if you
> have a proxy to go through, but that does not explain the empty showdb
> output.
>
> With drcat defined in a standard installation, and web access, you
> should already have access to the embl database (it will not appear in
> the showdb output of a standard installation, but is reachable by a
> lookup in DRCAT).
>
> Could you please try the command
>
> embossversion -full
>
> which should include the value of Emboss_Standard. In this directory
> you should have a file emboss.standard that defines drcat, several
> other databases, and the servers that David Bauer mentioned.
>
> My first guess would be that there is some problem with access
> permissions to that file.
>
> regards
>
> Peter Rice
> EMBOSS Team
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>
>

From georgios at biotek.uio.no Mon Aug 6 07:50:42 2012
From: georgios at biotek.uio.no (Georgios Magklaras)
Date: Mon, 06 Aug 2012 13:50:42 +0200
Subject: [EMBOSS] EMBOSS 6.5 installation and basic db setup walkthrough
Message-ID: <501FAF92.2070209@biotek.uio.no>

Hi,

For the interest of the EMBOSS community, I have made a short-ish two
part article series, explaining how to get an EMBOSS installation from
sources on a Linux system.

Part 1:
http://epistolatory.blogspot.no/2012/07/a-linux-emboss-65-production-server.html

Part 2:
http://epistolatory.blogspot.no/2012_08_01_archive.html

My plan is to add parts to the article series and also to highlight some
of the new functionality of EMBOSS 6.5. Comments and suggestions are
welcome.

GM

Best regards,

--
--
George Magklaras PhD
RHCE no: 805008309135525
Senior Systems Engineer/IT Manager
Biotechnology Center of Oslo and
the Norwegian Center for Molecular Medicine
EMBnet TMPC Chair

http://folk.uio.no/georgios

Tel: +47 22840535

From drozenbaum at yahoo.com Tue Aug 14 13:59:14 2012
From: drozenbaum at yahoo.com (Daniel Rozenbaum)
Date: Tue, 14 Aug 2012 10:59:14 -0700 (PDT)
Subject: [EMBOSS] Sequence annotation parsing and format conversion
Message-ID: <1344967154.22837.YahooMailNeo@web130203.mail.mud.yahoo.com>

Greetings,

My sincerest apologies if this question has already been addressed here:
I'm trying to understand how EMBOSS works with sequence annotation.
Here's an example (I'm using EMBOSS 6.4.0.0):

I have a sequence in GENBANK format with extensive annotation, stored in
a file /tmp/W02578.genbank (sequence listing at the end of this email).
I feed it through the seqret utility as follows:

seqret /abss/tmp/W02578.genbank -osformat2 genbank -feature Y -auto -osname W02578.emboss_genbank2genbank -osdirectory /tmp

In the resultant file parts of the sequence annotation, such as fields
AUTHORS, TITLE, COMMENT, and BASE COUNT are omitted, and values of some
of the other fields are modified.

I understand that entret is the tool to use when one is interested in
the sequence record as is, but what I'm trying to understand is whether
it is EMBOSS's parsing and internal representation of the sequence data
where parts of the annotation are omitted, and whether it's necessarily
the case that some of the annotation fields are going to be
lost/modified when converting between formats as well?

Many thanks,
Daniel

=== start /tmp/W02578.genbank ===
LOCUS       W02578        644 bp    mRNA            EST       18-APR-1996
DEFINITION  za52e02.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone
            296186 5'.
ACCESSION   W02578
NID         g1274623
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 644)
  AUTHORS   Hillier,L., Clark,N., Dubuque,T., Elliston,K., Hawkins,M.,
            Holman,M., Hultman,M., Kucaba,T., Le,M., Lennon,G., Marra,M.,
            Parsons,J., Rifkin,L., Rohlfing,T., Soares,M., Tan,F.,
            Trevaskis,E., Waterston,R., Williamson,A., Wohldmann,P. and
            Wilson,R.
  TITLE     The WashU-Merck EST Project
  JOURNAL   Unpublished (1995)
COMMENT
            Contact: Wilson RK
            WashU-Merck EST Project
            Washington University School of Medicine
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: est at watson.wustl.edu
            This clone is available royalty-free through LLNL ; contact the
            IMAGE Consortium (info at image.llnl.gov) for further information.
            Seq primer: mob.REGA+ET
            High quality sequence stop: 320.
FEATURES             Location/Qualifiers
     source          1..644
                     /organism="Homo sapiens"
                     /note="Organ: Liver and Spleen; Vector: pT7T3D (Pharmacia)
                     with a modified polylinker; Site_1: Pac I; Site_2: Eco RI;
                     1st strand cDNA was primed with a Pac I - oligo(dT) primer
                     [5' AACTGGAAGAATTAATTAAAGATCTTTTTTTTTTTTTTTTTTT 3'],
                     double-stranded cDNA was ligated to Eco RI adaptors
                     (Pharmacia), digested with Pac I and cloned into the Pac I
                     and Eco RI sites of the modified pT7T3 vector.  Library
                     went through one round of normalization. Library
                     constructed by Bento Soares and M.Fatima Bonaldo."
                     /clone="296186"
                     /clone_lib="Soares fetal liver spleen 1NFLS"
                     /sex="male"
                     /dev_stage="20 week-post conception fetus"
                     /lab_host="DH10B (ampicillin resistant)"
     mRNA            <1..>644
BASE COUNT      176 a    140 c    148 g    172 t      8 others
ORIGIN
        1 acgatgatga caatgaaatt agtgcctgtt ttcttgcaaa tttagcactt ggaacattta
       61 aagaaaggtc tatgctgtca tatggggttt attgggaact atcctcctgg ccccaccctg
      121 ccccttcttt ttggttttga catcattcat ttccacctgg gaatttctgg tgccatgcca
      181 gaaagaatga ggaacctgta ttcctcttct tcgtgataat ataatctcta tttttttagg
      241 aaaacaaaaa tgaaaaacta ctccatttga ggattgtaat tcccacccct cttgcttctt
      301 ccccacctca ccatctccca gaccctcttc ccttctgtct tctcctccaa tacataaaag
      361 gacacagaca aggaactttg ctggaaaggg gnaacccatt ttcagggatc aggtcaaagg
      421 gcaagcaagc aggatagact cnaggtgtgt gaaatatgtt atacaccagg aggctggcac
      481 tggnatggtc ccaaacaaga atggtgtccg tctggggtct ggaatgtaag agttaaggga
      541 agggaangaa gggactacaa gangagtcgg agatggatga nggaaacaac acaatttccc
      601 aggccagtga tgcttgtggt gnacagntgt tcccgaggtc gggg
//
=== end /tmp/W02578.genbank ===

=== start /tmp/W02578.emboss_genbank2genbank ===
LOCUS       W02578                   644 bp    DNA     linear   UNC 14-AUG-2012
DEFINITION  za52e02.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone
            296186 5'.
ACCESSION   W02578
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  human.
REFERENCE   1  (bases 1 to 644)
FEATURES             Location/Qualifiers
     source          1..644
                     /organism="Homo sapiens"
                     /note="Organ: Liver and Spleen; Vector: pT7T3D (Pharmacia)
                     with a modified polylinker; Site_1: Pac I; Site_2: Eco RI;
                     1st strand cDNA was primed with a Pac I - oligo(dT) primer
                     [5' AACTGGAAGAATTAATTAAAGATCTTTTTTTTTTTTTTTTTTT 3'],
                     double-stranded cDNA was ligated to Eco RI adaptors
                     (Pharmacia), digested with Pac I and cloned into the Pac I
                     and Eco RI sites of the modified pT7T3 vector. Library
                     went through one round of normalization. Library
                     constructed by Bento Soares and M.Fatima Bonaldo."
                     /clone="296186"
                     /clone_lib="Soares fetal liver spleen 1NFLS"
                     /sex="male"
                     /dev_stage="20 week-post conception fetus"
                     /lab_host="DH10B (ampicillin resistant)"
     mRNA            <1..>644
ORIGIN
       1  acgatgatga caatgaaatt agtgcctgtt ttcttgcaaa tttagcactt ggaacattta
      61  aagaaaggtc tatgctgtca tatggggttt attgggaact atcctcctgg ccccaccctg
     121  ccccttcttt ttggttttga catcattcat ttccacctgg gaatttctgg tgccatgcca
     181  gaaagaatga ggaacctgta ttcctcttct tcgtgataat ataatctcta tttttttagg
     241  aaaacaaaaa tgaaaaacta ctccatttga ggattgtaat tcccacccct cttgcttctt
     301  ccccacctca ccatctccca gaccctcttc ccttctgtct tctcctccaa tacataaaag
     361  gacacagaca aggaactttg ctggaaaggg gnaacccatt ttcagggatc aggtcaaagg
     421  gcaagcaagc aggatagact cnaggtgtgt gaaatatgtt atacaccagg aggctggcac
     481  tggnatggtc ccaaacaaga atggtgtccg tctggggtct ggaatgtaag agttaaggga
     541  agggaangaa gggactacaa gangagtcgg agatggatga nggaaacaac acaatttccc
     601  aggccagtga tgcttgtggt gnacagntgt tcccgaggtc gggg
//
=== end /tmp/W02578.emboss_genbank2genbank ===

From ricepeterm at yahoo.co.uk Wed Aug 15 03:53:12 2012
From: ricepeterm at yahoo.co.uk (Peter Rice)
Date: Wed, 15 Aug 2012 08:53:12 +0100
Subject: [EMBOSS] Sequence annotation parsing and format conversion
In-Reply-To: <1344967154.22837.YahooMailNeo@web130203.mail.mud.yahoo.com>
References: <1344967154.22837.YahooMailNeo@web130203.mail.mud.yahoo.com>
Message-ID: <502B5568.6090003@yahoo.co.uk>

Dear Daniel,

On 14/08/2012 18:59, Daniel Rozenbaum wrote:
> seqret /abss/tmp/W02578.genbank -osformat2 genbank -feature Y -auto -osname W02578.emboss_genbank2genbank -osdirectory /tmp
>
> In the resultant file parts of the sequence annotation, such as fields AUTHORS, TITLE, COMMENT, and BASE COUNT are omitted, and values of some of the other fields are modified.

You are correct ... there are some gaps in the coverage of records in
GenBank format.

We will update those for the next release with the aim of preserving
information when rewriting in GenBank format (we will aim to reproduce
the full entry) and where possible retaining information when writing in
EMBL format.

There are some surprising inconsistencies in the current genbank to
genbank conversion (for example the ORGANISM record).

When tested on the EMBL version of this entry the current EMBOSS 6.5
release reproduces the entry exactly (comparing seqret to entret) apart
from the exact wrapping of feature annotation. We should be able to do
the same for GenBank format.

Many thanks for the report

Peter Rice
EMBOSS Team

From drozenbaum at yahoo.com Wed Aug 15 12:57:20 2012
From: drozenbaum at yahoo.com (Daniel Rozenbaum)
Date: Wed, 15 Aug 2012 09:57:20 -0700 (PDT)
Subject: [EMBOSS] Support for multi-line annotation in ig format
Message-ID: <1345049840.40402.YahooMailNeo@web130204.mail.mud.yahoo.com>

Dear list,

(Peter, many thanks for your prompt reply to my previous inquiry!)

We need to deal with extensive databases in Intelligenetics format with
multiple lines in annotation of each record. It appears however that
EMBOSS concatenates all annotation lines into a single line when
building its internal representation of the sequence description:

% cat /tmp/IGSEQ.ig
; Annotation line 1
; Annotation line 2
; Annotation line 3
IGSEQ
ACGCATCGCATCAGACTACGC1


% seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig -osdirectory /tmp


% cat /tmp/IGSEQ.emboss_ig2ig.ig
;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
IGSEQ
ACGCATCGCATCAGACTACGC1

Are there any plans to support multi-line annotation in this format?

Many thanks,
Daniel

From ricepeterm at yahoo.co.uk Wed Aug 15 13:36:25 2012
From: ricepeterm at yahoo.co.uk (Peter Rice)
Date: Wed, 15 Aug 2012 18:36:25 +0100
Subject: [EMBOSS] Support for multi-line annotation in ig format
In-Reply-To: <1345049840.40402.YahooMailNeo@web130204.mail.mud.yahoo.com>
References: <1345049840.40402.YahooMailNeo@web130204.mail.mud.yahoo.com>
Message-ID: <502BDE19.9070603@yahoo.co.uk>

On 15/08/2012 17:57, Daniel Rozenbaum wrote:
> Dear list,
>
> (Peter, many thanks for your prompt reply to my previous inquiry!)
>
> We need to deal with extensive databases in Intelligenetics format with multiple lines in annotation of each record. It appears however that EMBOSS concatenates all annotation lines into a single line when building its internal representation of the sequence description:
>
> % cat /tmp/IGSEQ.ig
> ; Annotation line 1
> ; Annotation line 2
> ; Annotation line 3
> IGSEQ
> ACGCATCGCATCAGACTACGC1
>
>
> % seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig -osdirectory /tmp
>
>
> % cat /tmp/IGSEQ.emboss_ig2ig.ig
> ;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
> IGSEQ
> ACGCATCGCATCAGACTACGC1
>
> Are there any plans to support multi-line annotation in this format?

Interesting thought. We will take a look. It will need some care to
maintain compatibility with other formats that have single (FASTA) or
multiple (swissprot) descriptions.

Which package is using this IG format?

regards,

Peter Rice
EMBOSS Team

From ivaylo.stoimenov at gmail.com Fri Aug 24 10:46:42 2012
From: ivaylo.stoimenov at gmail.com (Ivaylo Stoimenov)
Date: Fri, 24 Aug 2012 16:46:42 +0200
Subject: [EMBOSS] eprimer32 question
Message-ID:

Hi,
I was wondering if it is possible for eprimer32 to read a sequence from a
string. The default input is from a file, but I need to feed primer3 from
a string, which is changing in the body of my program. Any help will be
appreciated.

Regards,
Ivaylo Stoimenov

From p.j.a.cock at googlemail.com Fri Aug 24 10:59:27 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 24 Aug 2012 15:59:27 +0100
Subject: [EMBOSS] eprimer32 question
In-Reply-To:
References:
Message-ID:

On Fri, Aug 24, 2012 at 3:46 PM, Ivaylo Stoimenov wrote:
> Hi,
> I was wondering if it is possible for eprimer32 to read a sequence from a
> string. The default input is from a file, but I need to feed primer3 from a
> string, which is changing in the body of my program. Any help will be
> appreciated.
>

Many EMBOSS tools will take a sequence on the command line
using the pretend file format "asis" (as is), or can read from stdin.
Because in this case EMBOSS is wrapping primer3, that may not
be possible - but worth checking.

Peter

From ricepeterm at yahoo.co.uk Fri Aug 24 12:34:55 2012
From: ricepeterm at yahoo.co.uk (Peter Rice)
Date: Fri, 24 Aug 2012 17:34:55 +0100
Subject: [EMBOSS] eprimer32 question
In-Reply-To:
References:
Message-ID: <5037AD2F.2060607@yahoo.co.uk>

Dear Ivaylo and Peter,

On 24/08/2012 15:59, Peter Cock wrote:
> On Fri, Aug 24, 2012 at 3:46 PM, Ivaylo Stoimenov
> wrote:
>> Hi,
>> I was wondering if it is possible for eprimer32 to read a sequence from a
>> string. The default input is from a file, but I need to feed primer3 from a
>> string, which is changing in the body of my program. Any help will be
>> appreciated.
>>
> Many EMBOSS tools will take a sequence on the command line
> using the pretend file format "asis" (as is), or can read from stdin.
> Because in this case EMBOSS is wrapping primer3, that may not
> be possible - but worth checking.

Good idea to be careful - some (EMBASSY) wrappers pass the input file
name directly but eprimer32 does read the sequence and then send it to
primer3 as a string.

Any EMBOSS sequence input can be in the form asis::atcgatcgtagctgac
which simply says the string is the sequence (rather than the file
name). As the sequence has no name you can add:

-sid myseqname

to the command line which is then used in the output, and is also used
to create the default output filename of myseqname.eprimer32

The only limit will be the length of command line you are allowed on
your system, so long sequences may fail. But that is a system (shell)
limit - EMBOSS will read any length of sequence that is passed to it
this way.

regards,

Peter Rice
EMBOSS Team

From ivaylo.stoimenov at gmail.com Mon Aug 27 06:16:41 2012
From: ivaylo.stoimenov at gmail.com (Ivaylo Stoimenov)
Date: Mon, 27 Aug 2012 12:16:41 +0200
Subject: [EMBOSS] eprimer32 question
In-Reply-To: <5037AD2F.2060607@yahoo.co.uk>
References: <5037AD2F.2060607@yahoo.co.uk>
Message-ID:

Hi Peter and Peter,
Many thanks to both of you for the valuable help. It is very important for
me to have as less file-related operations as possible, and passing a
string helps to reduce the input from files.
I wondered if it is possible to hijack the output of Primer3 to some sort
of object directly to eprimer32 without writing to a file and thus making
Primer3 sort of a function to an external program. I am aiming of running a
validation cycle for the primers suggested by Primer3 and if necessary to
change the input to Primer3 until certain criteria are fulfilled. Therefore
I would like to skip a file-write-read operation until everything is
optimal. Do you know if that is still possible, or I always need to read
from Primer3 output files.

Kind regards,
Ivaylo

2012/8/24 Peter Rice

> Dear Ivaylo and Peter,
>
>
> On 24/08/2012 15:59, Peter Cock wrote:
>
>> On Fri, Aug 24, 2012 at 3:46 PM, Ivaylo Stoimenov
>> wrote:
>>
>>> Hi,
>>> I was wondering if it is possible for eprimer32 to read a sequence from a
>>> string. The default input is from a file, but I need to feed primer3
>>> from a
>>> string, which is changing in the body of my program. Any help will be
>>> appreciated.
>>>
>> Many EMBOSS tools will take a sequence on the command line
>> using the pretend file format "asis" (as is), or can read from stdin.
>> Because in this case EMBOSS is wrapping primer3, that may not
>> be possible - but worth checking.
>>
>
> Good idea to be careful - some (EMBASSY) wrappers pass the input file name
> directly but eprimer32 does read the sequence and then send it to primer3
> as a string.
>
> Any EMBOSS sequence input can be in the form asis::atcgatcgtagctgac which
> simply says the string is the sequence (rather than the file name). As the
> sequence has no name you can add:
>
> -sid myseqname
>
> to the command line which is then used in the output, and is also used to
> create the default output filename of myseqname.eprimer32
>
> The only limit will be the length of command line you are allowed on your
> system, so long sequences may fail. But that is a system (shell) limit -
> EMBOSS will read any length of sequence that is passed to it this way.
>
> regards,
>
> Peter Rice
> EMBOSS Team
>
>

From p.j.a.cock at googlemail.com Mon Aug 27 09:43:44 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 27 Aug 2012 14:43:44 +0100
Subject: [EMBOSS] eprimer32 question
In-Reply-To:
References: <5037AD2F.2060607@yahoo.co.uk>
Message-ID:

On Mon, Aug 27, 2012 at 11:16 AM, Ivaylo Stoimenov wrote:
> Hi Peter and Peter,
> Many thanks to both of you for the valuable help. It is very important for
> me to have as less file-related operations as possible, and passing a
> string helps to reduce the input from files.
> I wondered if it is possible to hijack the output of Primer3 to some sort
> of object directly to eprimer32 without writing to a file and thus making
> Primer3 sort of a function to an external program. I am aiming of running a
> validation cycle for the primers suggested by Primer3 and if necessary to
> change the input to Primer3 until certain criteria are fulfilled. Therefore
> I would like to skip a file-write-read operation until everything is
> optimal. Do you know if that is still possible, or I always need to read
> from Primer3 output files.

I think what you're asking for would require compiling the EMBOSS
or underlying Primer3 function into your program. That may not be
possible due to licensing restrictions (e.g. EMBOSS is GPL, what
does your tool use?).

However, you should be able to avoid a file-write-read by asking
EMBOSS to write the output to stdout, and reading that from your
program.

(This is related to the other suggestion I made for passing the
sequence to EMBOSS without using an input file: You can do
this in the command line using the "asis" trick, or use stdin.)

Peter

From ricepeterm at yahoo.co.uk Mon Aug 27 13:28:31 2012
From: ricepeterm at yahoo.co.uk (Peter Rice)
Date: Mon, 27 Aug 2012 18:28:31 +0100
Subject: [EMBOSS] eprimer32 question
In-Reply-To:
References: <5037AD2F.2060607@yahoo.co.uk>
Message-ID: <503BAE3F.2090902@yahoo.co.uk>

Dear Ivaylo and Peter,

> On Mon, Aug 27, 2012 at 11:16 AM, Ivaylo Stoimenov
> wrote:
>> Hi Peter and Peter,
>> Many thanks to both of you for the valuable help. It is very important for
>> me to have as less file-related operations as possible, and passing a
>> string helps to reduce the input from files.
>> I wondered if it is possible to hijack the output of Primer3 to some sort
>> of object directly to eprimer32 without writing to a file and thus making
>> Primer3 sort of a function to an external program. I am aiming of running a
>> validation cycle for the primers suggested by Primer3 and if necessary to
>> change the input to Primer3 until certain criteria are fulfilled. Therefore
>> I would like to skip a file-write-read operation until everything is
>> optimal. Do you know if that is still possible, or I always need to read
>> from Primer3 output files.

Interesting suggestion, but already done! That is how eprimer32 works.
The primer3 program is started, its input is sent to standard input, and
its results are read from standard output.

So all you should need to do is to request EMBOSS eprimer32 output goes
to standard output.

You can either use 'stdout' as the output file name, or use the -filter
command line option which also defaults the first input file to standard
input (but is happy with the asis:: sequence string as input), writes by
default to standard output as the first output file, and also adds -auto
to default all other unspecified options. -filter is intended to allow
EMBOSS applications to be run as part of a pipe series.

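The two routes just described (naming 'stdout' as the output file, or
using -filter) could be assembled from a calling program roughly as
follows. This is only a sketch under stated assumptions: the helper name
build_eprimer32_args is hypothetical, the -sequence and -outfile
qualifier names are assumed to apply to eprimer32 in the same way they
appear in the tfscan command earlier in this archive, and asis::, -sid,
-auto and -filter are used as described in this thread. Nothing here has
been checked against a real EMBOSS installation.

```python
def build_eprimer32_args(sequence, name, use_filter=False):
    """Return an argument list for eprimer32 that avoids input and output files.

    Hypothetical helper: it only builds the command line, it does not run it.
    """
    if not sequence.isalpha():
        raise ValueError("sequence must contain letters only")
    # asis:: tells EMBOSS that the string itself is the sequence;
    # -sid supplies the name that such a sequence otherwise lacks.
    args = ["eprimer32", "-sequence", "asis::" + sequence, "-sid", name]
    if use_filter:
        # -filter: write to standard output and default the other options.
        args.append("-filter")
    else:
        # Explicit route: name stdout as the output file, -auto to avoid prompts.
        args += ["-outfile", "stdout", "-auto"]
    return args


if __name__ == "__main__":
    print(" ".join(build_eprimer32_args("atcgatcgtagctgac", "myseqname")))
```

Passing the arguments as a list keeps the shell out of the picture, but
the operating system's limit on command-line length still applies, so
very long sequences may need to go through standard input instead.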
On 27/08/2012 14:43, Peter Cock wrote:
> I think what you're asking for would require compiling the EMBOSS
> or underlying Primer3 function into your program. That may not be
> possible due to licensing restrictions (e.g. EMBOSS is GPL, what
> does your tool use?).

Fortunately not a problem. The way EMBOSS works it passes input and
reads output through a pipe between processes (so, in GPL terms, primer3
and eprimer32 are not in the same binary file).

> However, you should be able to avoid a file-write-read by asking
> EMBOSS to write the output to stdout, and reading that from your
> program.

Exactly as above.

> (This is related to the other suggestion I made for passing the
> sequence to EMBOSS without using an input file: You can do
> this in the command line using the "asis" trick, or use stdin.)

Indeed. This is how the -filter pipe works.

regards,

Peter Rice
EMBOSS Team

From ivaylo.stoimenov at gmail.com Tue Aug 28 11:10:30 2012
From: ivaylo.stoimenov at gmail.com (Ivaylo Stoimenov)
Date: Tue, 28 Aug 2012 17:10:30 +0200
Subject: [EMBOSS] eprimer32 question
In-Reply-To: <503BAE3F.2090902@yahoo.co.uk>
References: <5037AD2F.2060607@yahoo.co.uk> <503BAE3F.2090902@yahoo.co.uk>
Message-ID:

Hi,
You are extremely helpful again! Thank you.
Just a last detail. How to rescue the output from stdout and put it back to
the EMBOSS parser. If I feed one sequence at a time, I assume I have to use
primer3 module read function. Could you please write one or two lines of
code, which will hijack the output from Primer3 into some sort of
Primer3.Record object.
This might look easy, but I was not able to figure out how to do it.
Hopefully this will be my last question.

Kind regards,
Ivaylo

2012/8/27 Peter Rice

> Dear Ivaylo and Peter,
>
>
> On Mon, Aug 27, 2012 at 11:16 AM, Ivaylo Stoimenov
>> wrote:
>>
>>> Hi Peter and Peter,
>>> Many thanks to both of you for the valuable help. It is very important
>>> for
>>> me to have as less file-related operations as possible, and passing a
>>> string helps to reduce the input from files.
>>> I wondered if it is possible to hijack the output of Primer3 to some sort
>>> of object directly to eprimer32 without writing to a file and thus making
>>> Primer3 sort of a function to an external program. I am aiming of
>>> running a
>>> validation cycle for the primers suggested by Primer3 and if necessary to
>>> change the input to Primer3 until certain criteria are fulfilled.
>>> Therefore
>>> I would like to skip a file-write-read operation until everything is
>>> optimal. Do you know if that is still possible, or I always need to read
>>> from Primer3 output files.
>>>
>>
> Interesting suggestion, but already done! That is how eprimer32 works. The
> primer3 program is started, its input is sent to standard input, and its
> results are read from standard output.
>
> So all you should need to do is to request EMBOSS eprimer32 output goes to
> standard output.
>
> You can either use 'stdout' as the output file name, or use the -filter
> command line option which also defaults the first input file to standard
> input (but is happy with the asis:: sequence string as input), writes by
> default to standard output as the first output file, and also adds -auto to
> default all other unspecified options. -filter is intended to allow EMBOSS
> applications to be run as part of a pipe series.
>
>
> On 27/08/2012 14:43, Peter Cock wrote:
>
>> I think what you're asking for would require compiling the EMBOSS
>> or underlying Primer3 function into your program. That may not be
>> possible due to licensing restrictions (e.g. EMBOSS is GPL, what
>> does your tool use?).
>>
>
> Fortunately not a problem. The way EMBOSS works it passes input and reads
> output through a pipe between processes (so, in GPL terms, primer3 and
> eprimer32 are not in the same binary file).
> > > However, you should be able to avoid a file-write-read by asking >> EMBOSS to write the output to stdout, and reading that from your >> program. >> > > Exactly as above. > > > (This is related to the other suggestion I made for passing the >> sequence to EMBOSS without using an input file: You can do >> this in the command line using the "asis" trick, or use stdin.) >> > > Indeed. This is how the -filter pipe works. > > > regards, > > Peter Rice > EMBOSS Team > > From tzhu at mail.bnu.edu.cn Fri Aug 3 00:53:19 2012 From: tzhu at mail.bnu.edu.cn (Tao Zhu) Date: Fri, 03 Aug 2012 08:53:19 +0800 Subject: [EMBOSS] Why tfscan generate duplicate results? Message-ID: <501B20FF.7050308@mail.bnu.edu.cn> tfscan from emboss 6.5.7.0 My input sequence is an intron sequence from Arabidopsis thaliana(in attached files: test.fasta) I run: $ tfscan -sequence test.fasta -menu P -mismatch 0 -outfile test.out the result is: ######################################## # Program: tfscan # Rundate: Fri 3 Aug 2012 08:55:50 # Commandline: tfscan # -sequence test.fasta # -menu P # -mismatch 0 # -outfile test.out # Report_format: seqtable # Report_file: test.out ######################################## #======================================= # # Sequence: Atha from: 1 to: 3388 # HitCount: 20 #======================================= Start End Strand Accession Factor Sequence 3108 3114 + R03715 ggttaat 3108 3114 + R03715 ggttaat 2482 2488 + R03710 gaaagaa 3354 3359 + R02731 T00627; NIT2;Quality: 2; Species: Neurospora crassa. tatctc 2185 2190 + R02731 T00627; NIT2;Quality: 2; Species: Neurospora crassa. tatctc 1008 1013 + R02731 T00627; NIT2;Quality: 2; Species: Neurospora crassa. tatctc 1261 1266 + R02731 T00627; NIT2;Quality: 2; Species: Neurospora crassa. gagata 3354 3359 + R02729 T00627; NIT2;Quality: 2; Species: Neurospora crassa. tatctc 2185 2190 + R02729 T00627; NIT2;Quality: 2; Species: Neurospora crassa. tatctc 1008 1013 + R02729 T00627; NIT2;Quality: 2; Species: Neurospora crassa. 
tatctc 2726 2731 + R02728 T00627; NIT2;Quality: 2; Species: Neurospora crassa. tatcta 989 994 + R02728 T00627; NIT2;Quality: 2; Species: Neurospora crassa. tatcta 223 228 + R02728 T00627; NIT2;Quality: 2; Species: Neurospora crassa. tatcta 2726 2731 + R02728 T00627; NIT2;Quality: 2; Species: Neurospora crassa. tatcta 989 994 + R02728 T00627; NIT2;Quality: 2; Species: Neurospora crassa. tatcta 223 228 + R02728 T00627; NIT2;Quality: 2; Species: Neurospora crassa. tatcta 2876 2879 + R01203 ctcc 2449 2452 + R01203 ctcc 2141 2144 + R01203 ctcc 2877 2883 + R01202 tccacct #--------------------------------------- #--------------------------------------- #--------------------------------------- # Reported_sequences: 1 # Reported_hitcount: 20 #--------------------------------------- It could be seen that there exists duplicate items: for example, 3108-3114, +, appear and be counted twice. Why so? -- Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing 100875, China Email: tzhu at mail.bnu.edu.cn -------------- next part -------------- A non-text attachment was scrubbed... Name: test.fasta Type: application/x-wine-extension-fasta Size: 3394 bytes Desc: not available URL: From georgios at biotek.uio.no Mon Aug 6 11:45:29 2012 From: georgios at biotek.uio.no (Georgios Magklaras) Date: Mon, 06 Aug 2012 13:45:29 +0200 Subject: [EMBOSS] Databases In-Reply-To: <5003F595.9080407@yahoo.co.uk> References: <5003F595.9080407@yahoo.co.uk> Message-ID: <501FAE59.80809@biotek.uio.no> Hi Nabeel, I made this guide on the subject of EMBOSS databases: http://epistolatory.blogspot.no/2012/08/the-bioinformatics-sysadmin.html I hope it explains the things you need to know to get started. 
GM Best regards, -- -- George Magklaras PhD RHCE no: 805008309135525 Senior Systems Engineer/IT Manager Biotechnology Center of Oslo and the Norwegian Center for Molecular Medicine EMBnet TMPC Chair http://folk.uio.no/georgios Tel: +47 22840535 On 07/16/2012 01:05 PM, Peter Rice wrote: > On 16/07/2012 06:56, Nabeel Ahmed wrote: >> I have recently installed EMBOSS-6.4.0 (Ubuntu 11.10). >> I am unable to make it work directly with live databases (embl, >> uniprot) , >> working totally fine with local sequence files. >> e.g >> >> % *plotorf * >> Plot potential open reading frames in a nucleotide sequence >> Input nucleotide sequence: *embl:x13776* >> >> *Error:* Failed to open filename 'embl'** >> >> Used 'showdb' , displayed table with zero rows. > > That is very strange. > > EMBOSS 6.4.0 includes a set of installed databases which should appear > in showdb. > > One of these is the data resource catalogue (drcat) which in turn > defines the embl database for web access. > > David Bauer has already explained how to get web access working if you > have a proxy to go through, but that does not explain the empty showdb > output. > > With drcat defined in a standard installation, and web access, you > should already have access to the embl database (it will not appear in > the showdb output of a standard installation, but is reachable by a > lookup in DRCAT). > > Could you please try the command > > embossversion -full > > which should include the value of Emboss_Standard. In this directory > you should have a file emboss.standard that defines drcat, several > other databases, and the servers that David Bauer mentioned. > > My first guess would be that there is some problem with access > permissions to that file. 
>
> regards
>
> Peter Rice
> EMBOSS Team
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>
>

From georgios at biotek.uio.no Mon Aug 6 11:50:42 2012
From: georgios at biotek.uio.no (Georgios Magklaras)
Date: Mon, 06 Aug 2012 13:50:42 +0200
Subject: [EMBOSS] EMBOSS 6.5 installation and basic db setup walkthrough
Message-ID: <501FAF92.2070209@biotek.uio.no>

Hi,

For the interest of the EMBOSS community, I have written a short two-part article series explaining how to build an EMBOSS installation from sources on a Linux system.

Part 1:
http://epistolatory.blogspot.no/2012/07/a-linux-emboss-65-production-server.html

Part 2:
http://epistolatory.blogspot.no/2012_08_01_archive.html

My plan is to add further parts to the series and also to highlight some of the new functionality of EMBOSS 6.5.

Comments and suggestions are welcome.

GM

Best regards,

--
--
George Magklaras PhD
RHCE no: 805008309135525
Senior Systems Engineer/IT Manager
Biotechnology Center of Oslo and
the Norwegian Center for Molecular Medicine
EMBnet TMPC Chair

http://folk.uio.no/georgios
Tel: +47 22840535

From drozenbaum at yahoo.com Tue Aug 14 17:59:14 2012
From: drozenbaum at yahoo.com (Daniel Rozenbaum)
Date: Tue, 14 Aug 2012 10:59:14 -0700 (PDT)
Subject: [EMBOSS] Sequence annotation parsing and format conversion
Message-ID: <1344967154.22837.YahooMailNeo@web130203.mail.mud.yahoo.com>

Greetings,

My sincerest apologies if this question has already been addressed here: I'm trying to understand how EMBOSS works with sequence annotation. Here's an example (I'm using EMBOSS 6.4.0.0): I have a sequence in GENBANK format with extensive annotation, stored in a file /tmp/W02578.genbank (sequence listing at the end of this email).
I feed it through the seqret utility as follows:

seqret /abss/tmp/W02578.genbank -osformat2 genbank -feature Y -auto -osname W02578.emboss_genbank2genbank -osdirectory /tmp

In the resultant file parts of the sequence annotation, such as fields AUTHORS, TITLE, COMMENT, and BASE COUNT are omitted, and values of some of the other fields are modified.

I understand that entret is the tool to use when one is interested in the sequence record as is, but what I'm trying to understand is whether it is EMBOSS's parsing and internal representation of the sequence data where parts of the annotation are omitted, and whether it's necessarily the case that some of the annotation fields are going to be lost/modified when converting between formats as well?

Many thanks,
Daniel

=== start /tmp/W02578.genbank ===
LOCUS       W02578        644 bp    mRNA            EST       18-APR-1996
DEFINITION  za52e02.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone
            296186 5'.
ACCESSION   W02578
NID         g1274623
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 644)
  AUTHORS   Hillier,L., Clark,N., Dubuque,T., Elliston,K., Hawkins,M.,
            Holman,M., Hultman,M., Kucaba,T., Le,M., Lennon,G., Marra,M.,
            Parsons,J., Rifkin,L., Rohlfing,T., Soares,M., Tan,F.,
            Trevaskis,E., Waterston,R., Williamson,A., Wohldmann,P. and
            Wilson,R.
  TITLE     The WashU-Merck EST Project
  JOURNAL   Unpublished (1995)
COMMENT
            Contact: Wilson RK
            WashU-Merck EST Project
            Washington University School of Medicine
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: est at watson.wustl.edu
            This clone is available royalty-free through LLNL ; contact the
            IMAGE Consortium (info at image.llnl.gov) for further information.
            Seq primer: mob.REGA+ET
            High quality sequence stop: 320.
FEATURES             Location/Qualifiers
     source          1..644
                     /organism="Homo sapiens"
                     /note="Organ: Liver and Spleen; Vector: pT7T3D (Pharmacia)
                     with a modified polylinker; Site_1: Pac I; Site_2: Eco RI;
                     1st strand cDNA was primed with a Pac I - oligo(dT) primer
                     [5' AACTGGAAGAATTAATTAAAGATCTTTTTTTTTTTTTTTTTTT 3'],
                     double-stranded cDNA was ligated to Eco RI adaptors
                     (Pharmacia), digested with Pac I and cloned into the Pac I
                     and Eco RI sites of the modified pT7T3 vector.  Library
                     went through one round of normalization. Library
                     constructed by Bento Soares and M.Fatima Bonaldo."
                     /clone="296186"
                     /clone_lib="Soares fetal liver spleen 1NFLS"
                     /sex="male"
                     /dev_stage="20 week-post conception fetus"
                     /lab_host="DH10B (ampicillin resistant)"
     mRNA            <1..>644
BASE COUNT      176 a    140 c    148 g    172 t      8 others
ORIGIN
        1 acgatgatga caatgaaatt agtgcctgtt ttcttgcaaa tttagcactt ggaacattta
       61 aagaaaggtc tatgctgtca tatggggttt attgggaact atcctcctgg ccccaccctg
      121 ccccttcttt ttggttttga catcattcat ttccacctgg gaatttctgg tgccatgcca
      181 gaaagaatga ggaacctgta ttcctcttct tcgtgataat ataatctcta tttttttagg
      241 aaaacaaaaa tgaaaaacta ctccatttga ggattgtaat tcccacccct cttgcttctt
      301 ccccacctca ccatctccca gaccctcttc ccttctgtct tctcctccaa tacataaaag
      361 gacacagaca aggaactttg ctggaaaggg gnaacccatt ttcagggatc aggtcaaagg
      421 gcaagcaagc aggatagact cnaggtgtgt gaaatatgtt atacaccagg aggctggcac
      481 tggnatggtc ccaaacaaga atggtgtccg tctggggtct ggaatgtaag agttaaggga
      541 agggaangaa gggactacaa gangagtcgg agatggatga nggaaacaac acaatttccc
      601 aggccagtga tgcttgtggt gnacagntgt tcccgaggtc gggg
//
=== end /tmp/W02578.genbank ===

=== start /tmp/W02578.emboss_genbank2genbank ===
LOCUS       W02578                   644 bp    DNA     linear   UNC 14-AUG-2012
DEFINITION  za52e02.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone
            296186 5'.
ACCESSION   W02578
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  human.
REFERENCE   1  (bases 1 to 644)
FEATURES             Location/Qualifiers
     source          1..644
                     /organism="Homo sapiens"
                     /note="Organ: Liver and Spleen; Vector: pT7T3D (Pharmacia)
                     with a modified polylinker; Site_1: Pac I; Site_2: Eco RI;
                     1st strand cDNA was primed with a Pac I - oligo(dT) primer
                     [5' AACTGGAAGAATTAATTAAAGATCTTTTTTTTTTTTTTTTTTT 3'],
                     double-stranded cDNA was ligated to Eco RI adaptors
                     (Pharmacia), digested with Pac I and cloned into the Pac I
                     and Eco RI sites of the modified pT7T3 vector. Library
                     went through one round of normalization. Library
                     constructed by Bento Soares and M.Fatima Bonaldo."
                     /clone="296186"
                     /clone_lib="Soares fetal liver spleen 1NFLS"
                     /sex="male"
                     /dev_stage="20 week-post conception fetus"
                     /lab_host="DH10B (ampicillin resistant)"
     mRNA            <1..>644
ORIGIN
       1  acgatgatga caatgaaatt agtgcctgtt ttcttgcaaa tttagcactt ggaacattta
      61  aagaaaggtc tatgctgtca tatggggttt attgggaact atcctcctgg ccccaccctg
     121  ccccttcttt ttggttttga catcattcat ttccacctgg gaatttctgg tgccatgcca
     181  gaaagaatga ggaacctgta ttcctcttct tcgtgataat ataatctcta tttttttagg
     241  aaaacaaaaa tgaaaaacta ctccatttga ggattgtaat tcccacccct cttgcttctt
     301  ccccacctca ccatctccca gaccctcttc ccttctgtct tctcctccaa tacataaaag
     361  gacacagaca aggaactttg ctggaaaggg gnaacccatt ttcagggatc aggtcaaagg
     421  gcaagcaagc aggatagact cnaggtgtgt gaaatatgtt atacaccagg aggctggcac
     481  tggnatggtc ccaaacaaga atggtgtccg tctggggtct ggaatgtaag agttaaggga
     541  agggaangaa gggactacaa gangagtcgg agatggatga nggaaacaac acaatttccc
     601  aggccagtga tgcttgtggt gnacagntgt tcccgaggtc gggg
//
=== end /tmp/W02578.emboss_genbank2genbank ===

From ricepeterm at yahoo.co.uk Wed Aug 15 07:53:12 2012
From: ricepeterm at yahoo.co.uk (Peter Rice)
Date: Wed, 15 Aug 2012 08:53:12 +0100
Subject: [EMBOSS] Sequence annotation parsing and format conversion
In-Reply-To: <1344967154.22837.YahooMailNeo@web130203.mail.mud.yahoo.com>
References: <1344967154.22837.YahooMailNeo@web130203.mail.mud.yahoo.com>
Message-ID: <502B5568.6090003@yahoo.co.uk>

Dear Daniel,

On 14/08/2012 18:59, Daniel Rozenbaum wrote:
> seqret /abss/tmp/W02578.genbank -osformat2 genbank -feature Y -auto -osname W02578.emboss_genbank2genbank -osdirectory /tmp
>
> In the resultant file parts of the sequence annotation, such as fields AUTHORS, TITLE, COMMENT, and BASE COUNT are omitted, and values of some of the other fields are modified.

You are correct ... there are some gaps in the coverage of records in GenBank format.

We will update those for the next release with the aim of preserving information when rewriting in GenBank format (we will aim to reproduce the full entry) and where possible retaining information when writing in EMBL format.

There are some surprising inconsistencies in the current genbank to genbank conversion (for example the ORGANISM record).
When tested on the EMBL version of this entry the current EMBOSS 6.5 release reproduces the entry exactly (comparing seqret to entret) apart from the exact wrapping of feature annotation. We should be able to do the same for GenBank format.

Many thanks for the report

Peter Rice
EMBOSS Team

From drozenbaum at yahoo.com Wed Aug 15 16:57:20 2012
From: drozenbaum at yahoo.com (Daniel Rozenbaum)
Date: Wed, 15 Aug 2012 09:57:20 -0700 (PDT)
Subject: [EMBOSS] Support for multi-line annotation in ig format
Message-ID: <1345049840.40402.YahooMailNeo@web130204.mail.mud.yahoo.com>

Dear list,

(Peter, many thanks for your prompt reply to my previous inquiry!)

We need to deal with extensive databases in Intelligenetics format with multiple lines in annotation of each record. It appears however that EMBOSS concatenates all annotation lines into a single line when building its internal representation of the sequence description:

% cat /tmp/IGSEQ.ig
; Annotation line 1
; Annotation line 2
; Annotation line 3
IGSEQ
ACGCATCGCATCAGACTACGC1

% seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig -osdirectory /tmp

% cat /tmp/IGSEQ.emboss_ig2ig.ig
;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
IGSEQ
ACGCATCGCATCAGACTACGC1

Are there any plans to support multi-line annotation in this format?

Many thanks,
Daniel

From ricepeterm at yahoo.co.uk Wed Aug 15 17:36:25 2012
From: ricepeterm at yahoo.co.uk (Peter Rice)
Date: Wed, 15 Aug 2012 18:36:25 +0100
Subject: [EMBOSS] Support for multi-line annotation in ig format
In-Reply-To: <1345049840.40402.YahooMailNeo@web130204.mail.mud.yahoo.com>
References: <1345049840.40402.YahooMailNeo@web130204.mail.mud.yahoo.com>
Message-ID: <502BDE19.9070603@yahoo.co.uk>

On 15/08/2012 17:57, Daniel Rozenbaum wrote:
> Dear list,
>
> (Peter, many thanks for your prompt reply to my previous inquiry!)
>
> We need to deal with extensive databases in Intelligenetics format with multiple lines in annotation of each record.
It appears however that EMBOSS concatenates all annotation lines into a single line when building its internal representation of the sequence description:
>
> % cat /tmp/IGSEQ.ig
> ; Annotation line 1
> ; Annotation line 2
> ; Annotation line 3
> IGSEQ
> ACGCATCGCATCAGACTACGC1
>
>
> % seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig -osdirectory /tmp
>
>
> % cat /tmp/IGSEQ.emboss_ig2ig.ig
> ;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
> IGSEQ
> ACGCATCGCATCAGACTACGC1
>
> Are there any plans to support multi-line annotation in this format?

Interesting thought. We will take a look. It will need some care to maintain compatibility with other formats that have single (FASTA) or multiple (swissprot) descriptions.

Which package is using this IG format?

regards,

Peter Rice
EMBOSS Team

From ivaylo.stoimenov at gmail.com Fri Aug 24 14:46:42 2012
From: ivaylo.stoimenov at gmail.com (Ivaylo Stoimenov)
Date: Fri, 24 Aug 2012 16:46:42 +0200
Subject: [EMBOSS] eprimer32 question
Message-ID:

Hi,
I was wondering if it is possible for eprimer32 to read a sequence from a string. The default input is from a file, but I need to feed primer3 from a string, which is changing in the body of my program. Any help will be appreciated.

Regards,
Ivaylo Stoimenov

From p.j.a.cock at googlemail.com Fri Aug 24 14:59:27 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 24 Aug 2012 15:59:27 +0100
Subject: [EMBOSS] eprimer32 question
In-Reply-To:
References:
Message-ID:

On Fri, Aug 24, 2012 at 3:46 PM, Ivaylo Stoimenov wrote:
> Hi,
> I was wondering if it is possible for eprimer32 to read a sequence from a
> string. The default input is from a file, but I need to feed primer3 from a
> string, which is changing in the body of my program. Any help will be
> appreciated.
>

Many EMBOSS tools will take a sequence on the command line using the pretend file format "asis" (as is), or can read from stdin.
Because in this case EMBOSS is wrapping primer3, that may not be possible - but worth checking. Peter From ricepeterm at yahoo.co.uk Fri Aug 24 16:34:55 2012 From: ricepeterm at yahoo.co.uk (Peter Rice) Date: Fri, 24 Aug 2012 17:34:55 +0100 Subject: [EMBOSS] eprimer32 question In-Reply-To: References: Message-ID: <5037AD2F.2060607@yahoo.co.uk> Dear Ivaylo and Peter, On 24/08/2012 15:59, Peter Cock wrote: > On Fri, Aug 24, 2012 at 3:46 PM, Ivaylo Stoimenov > wrote: >> Hi, >> I was wondering if it is possible for eprimer32 to read a sequence from a >> string. The default input is from a file, but I need to feed primer3 from a >> string, which is changing in the body of my program. Any help will be >> appreciated. >> > Many EMBOSS tools will take a sequence on the command line > using the pretend file format "asis" (as is), or can read from stdin. > Because in this case EMBOSS is wrapping primer3, that may not > be possible - but worth checking. Good idea to be careful - some (EMBASSY) wrappers pass the input file name directly but eprimer32 does read the sequence and then send it to primer3 as a string. Any EMBOSS sequence input can be in the form asis::atcgatcgtagctgac which simply says the string is the sequence (rather than the file name). As the sequence has no name you can add: -sid myseqname to the command line which is then used in the output, and is also used to create the default output filename of myseqname.eprimer32 The only limit will be the length of command line you are allowed on your system, so long sequences may fail. But that is a system (shell) limit - EMBOSS will read any length of sequence that is passed to it this way. 
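As an illustration of the asis:: form described above, here is a minimal Python sketch (an addition to the archive, not part of Peter's message). It only assembles the eprimer32 argument list and optionally runs it. The helper names build_eprimer32_argv and run_eprimer32 are hypothetical, and the -sequence, -outfile and -auto qualifiers are assumptions based on usual EMBOSS usage; check them against `eprimer32 -help` on your own installation. Only asis:: and -sid come directly from the message.

```python
import subprocess


def build_eprimer32_argv(sequence, seq_id):
    """Assemble an eprimer32 command that takes the sequence as a string.

    Hypothetical helper; qualifier names other than -sid are assumptions.
    """
    if not sequence or any(ch.isspace() for ch in sequence):
        raise ValueError("sequence must be a non-empty string without whitespace")
    return [
        "eprimer32",
        "-sequence", "asis::" + sequence,  # the string itself is the sequence
        "-sid", seq_id,                    # name shown in the report
        "-outfile", "stdout",              # send the report to standard output
        "-auto",                           # do not prompt for remaining options
    ]


def run_eprimer32(sequence, seq_id):
    """Run eprimer32 and return its report text (needs EMBOSS and primer3)."""
    argv = build_eprimer32_argv(sequence, seq_id)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems, but the operating system's limit on command-line length still applies, so very long sequences may need stdin instead.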

regards,

Peter Rice
EMBOSS Team

From ivaylo.stoimenov at gmail.com Mon Aug 27 10:16:41 2012
From: ivaylo.stoimenov at gmail.com (Ivaylo Stoimenov)
Date: Mon, 27 Aug 2012 12:16:41 +0200
Subject: [EMBOSS] eprimer32 question
In-Reply-To: <5037AD2F.2060607@yahoo.co.uk>
References: <5037AD2F.2060607@yahoo.co.uk>
Message-ID:

Hi Peter and Peter,
Many thanks to both of you for the valuable help. It is very important for me to have as few file-related operations as possible, and passing a string helps to reduce the input from files.
I wondered whether it is possible to hijack the output of Primer3 into some sort of object, directly from eprimer32, without writing to a file, thus making Primer3 a sort of function for an external program. I am aiming to run a validation cycle for the primers suggested by Primer3 and, if necessary, to change the input to Primer3 until certain criteria are fulfilled. Therefore I would like to skip the file write-read operation until everything is optimal. Do you know whether that is possible, or do I always need to read from Primer3 output files?

Kind regards,
Ivaylo

2012/8/24 Peter Rice

> Dear Ivaylo and Peter,
>
>
> On 24/08/2012 15:59, Peter Cock wrote:
>
>> On Fri, Aug 24, 2012 at 3:46 PM, Ivaylo Stoimenov
>> wrote:
>>
>>> Hi,
>>> I was wondering if it is possible for eprimer32 to read a sequence from a
>>> string. The default input is from a file, but I need to feed primer3
>>> from a
>>> string, which is changing in the body of my program. Any help will be
>>> appreciated.
>>>
>>> Many EMBOSS tools will take a sequence on the command line
>> using the pretend file format "asis" (as is), or can read from stdin.
>> Because in this case EMBOSS is wrapping primer3, that may not
>> be possible - but worth checking.
>>
>
> Good idea to be careful - some (EMBASSY) wrappers pass the input file name
> directly but eprimer32 does read the sequence and then send it to primer3
> as a string.
> > Any EMBOSS sequence input can be in the form asis::atcgatcgtagctgac which > simply says the string is the sequence (rather than the file name). As the > sequence has no name you can add: > > -sid myseqname > > to the command line which is then used in the output, and is also used to > create the default output filename of myseqname.eprimer32 > > The only limit will be the length of command line you are allowed on your > system, so long sequences may fail. But that is a system (shell) limit - > EMBOSS will read any length of sequence that is passed to it this way. > > regards, > > Peter Rice > EMBOSS Team > > From p.j.a.cock at googlemail.com Mon Aug 27 13:43:44 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 27 Aug 2012 14:43:44 +0100 Subject: [EMBOSS] eprimer32 question In-Reply-To: References: <5037AD2F.2060607@yahoo.co.uk> Message-ID: On Mon, Aug 27, 2012 at 11:16 AM, Ivaylo Stoimenov wrote: > Hi Peter and Peter, > Many thanks to both of you for the valuable help. It is very important for > me to have as less file-related operations as possible, and passing a > string helps to reduce the input from files. > I wondered if it is possible to hijack the output of Primer3 to some sort > of object directly to eprimer32 without writing to a file and thus making > Primer3 sort of a function to an external program. I am aiming of running a > validation cycle for the primers suggested by Primer3 and if necessary to > change the input to Primer3 until certain criteria are fulfilled. Therefore > I would like to skip a file-write-read operation until everything is > optimal. Do you know if that is still possible, or I always need to read > from Primer3 output files. I think what you're asking for would require compiling the EMBOSS or underlying Primer3 function into your program. That may not be possible due to licensing restrictions (e.g. EMBOSS is GPL, what does your tool use?). 

However, you should be able to avoid a file-write-read by asking EMBOSS to write the output to stdout, and reading that from your program.

(This is related to the other suggestion I made for passing the sequence to EMBOSS without using an input file: You can do this in the command line using the "asis" trick, or use stdin.)

Peter

From ricepeterm at yahoo.co.uk Mon Aug 27 17:28:31 2012
From: ricepeterm at yahoo.co.uk (Peter Rice)
Date: Mon, 27 Aug 2012 18:28:31 +0100
Subject: [EMBOSS] eprimer32 question
In-Reply-To:
References: <5037AD2F.2060607@yahoo.co.uk>
Message-ID: <503BAE3F.2090902@yahoo.co.uk>

Dear Ivaylo and Peter,

> On Mon, Aug 27, 2012 at 11:16 AM, Ivaylo Stoimenov
> wrote:
>> Hi Peter and Peter,
>> Many thanks to both of you for the valuable help. It is very important for
>> me to have as less file-related operations as possible, and passing a
>> string helps to reduce the input from files.
>> I wondered if it is possible to hijack the output of Primer3 to some sort
>> of object directly to eprimer32 without writing to a file and thus making
>> Primer3 sort of a function to an external program. I am aiming of running a
>> validation cycle for the primers suggested by Primer3 and if necessary to
>> change the input to Primer3 until certain criteria are fulfilled. Therefore
>> I would like to skip a file-write-read operation until everything is
>> optimal. Do you know if that is still possible, or I always need to read
>> from Primer3 output files.

Interesting suggestion, but already done! That is how eprimer32 works. The primer3 program is started, its input is sent to standard input, and its results are read from standard output.

So all you should need to do is request that the eprimer32 output goes to standard output.

You can either use 'stdout' as the output file name, or use the -filter command line option which also defaults the first input file to standard input (but is happy with the asis:: sequence string as input), writes by default to standard output as the first output file, and also adds -auto to default all other unspecified options. -filter is intended to allow EMBOSS applications to be run as part of a pipe series.

On 27/08/2012 14:43, Peter Cock wrote:
> I think what you're asking for would require compiling the EMBOSS
> or underlying Primer3 function into your program. That may not be
> possible due to licensing restrictions (e.g. EMBOSS is GPL, what
> does your tool use?).

Fortunately not a problem. The way EMBOSS works, it passes input and reads output through a pipe between processes (so, in GPL terms, primer3 and eprimer32 are not in the same binary file).

> However, you should be able to avoid a file-write-read by asking
> EMBOSS to write the output to stdout, and reading that from your
> program.

Exactly as above.

> (This is related to the other suggestion I made for passing the
> sequence to EMBOSS without using an input file: You can do
> this in the command line using the "asis" trick, or use stdin.)

Indeed. This is how the -filter pipe works.

regards,

Peter Rice
EMBOSS Team

From ivaylo.stoimenov at gmail.com Tue Aug 28 15:10:30 2012
From: ivaylo.stoimenov at gmail.com (Ivaylo Stoimenov)
Date: Tue, 28 Aug 2012 17:10:30 +0200
Subject: [EMBOSS] eprimer32 question
In-Reply-To: <503BAE3F.2090902@yahoo.co.uk>
References: <5037AD2F.2060607@yahoo.co.uk> <503BAE3F.2090902@yahoo.co.uk>
Message-ID:

Hi,
You are extremely helpful again! Thank you. Just one last detail: how do I capture the output from stdout and pass it to the EMBOSS parser? If I feed one sequence at a time, I assume I have to use the read function of the primer3 module. Could you please write one or two lines of code that will hijack the output from Primer3 into some sort of Primer3.Record object?
This might look easy, but I was not able to figure out how to do it. Hopefully this will be my last question. Kind regards, Ivaylo 2012/8/27 Peter Rice > Dear Ivaylo and Peter, > > > On Mon, Aug 27, 2012 at 11:16 AM, Ivaylo Stoimenov >> wrote: >> >>> Hi Peter and Peter, >>> Many thanks to both of you for the valuable help. It is very important >>> for >>> me to have as less file-related operations as possible, and passing a >>> string helps to reduce the input from files. >>> I wondered if it is possible to hijack the output of Primer3 to some sort >>> of object directly to eprimer32 without writing to a file and thus making >>> Primer3 sort of a function to an external program. I am aiming of >>> running a >>> validation cycle for the primers suggested by Primer3 and if necessary to >>> change the input to Primer3 until certain criteria are fulfilled. >>> Therefore >>> I would like to skip a file-write-read operation until everything is >>> optimal. Do you know if that is still possible, or I always need to read >>> from Primer3 output files. >>> >> > Interesting suggestion, but already done! That is how eprimer32 works. The > primer3 program is started, its input is sent to standard input, and its > results are read from standard output. > > So all you should need to do is to request EMBOSS primer32 output goes to > standard output. > > You can either use 'stdout' as the output file name, or use the -filter > command line option which also defaults the first input file to standard > input (but is happy with the asis:: sequence string as input), writes by > default to standard output as the first output file, and also adds -auto to > default all other unspecified options. -filter is intended to allow EMBOSS > applications to be run as part of a pipe series. > > > On 27/08/2012 14:43, Peter Cock wrote: > >> I think what you're asking for would require compiling the EMBOSS >> or underlying Primer3 function into your program. 
That may not be >> possible due to licensing restrictions (e.g. EMBOSS is GPL, what >> does your tool use?). >> > > Fortunately not a problem. The way EMBOSS works it passes input and reads > output through a pipe between processes (so, in GPL terms, primer3 and > eprimer32 are not in the same binary file). > > > However, you should be able to avoid a file-write-read by asking >> EMBOSS to write the output to stdout, and reading that from your >> program. >> > > Exactly as above. > > > (This is related to the other suggestion I made for passing the >> sequence to EMBOSS without using an input file: You can do >> this in the command line using the "asis" trick, or use stdin.) >> > > Indeed. This is how the -filter pipe works. > > > regards, > > Peter Rice > EMBOSS Team > >
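
On the question above about turning the stdout text into a Primer3.Record: the thread does not name the parser library, so the following is only a sketch under the assumption that the Biopython module Bio.Emboss.Primer3 is meant. It captures a command's standard output in memory with subprocess and hands the text to Primer3.read through an in-memory handle, so no intermediate file is written. The function names capture_stdout and parse_eprimer32_report are hypothetical, and the eprimer32 qualifiers in the usage note are assumptions to be checked against your installation.

```python
import io
import subprocess
import sys


def capture_stdout(argv):
    """Run a command and return the text it wrote to standard output."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def parse_eprimer32_report(report_text):
    """Parse eprimer32 report text into a Biopython Primer3 record.

    Assumes Biopython is installed; the import is kept local so that
    capture_stdout remains usable without it.
    """
    from Bio.Emboss import Primer3

    # Primer3.read expects a file-like handle, so wrap the text in StringIO.
    return Primer3.read(io.StringIO(report_text))


if __name__ == "__main__":
    # Stand-in command: shows the plumbing without needing EMBOSS installed.
    print(capture_stdout([sys.executable, "-c", "print('report text')"]))
```

With EMBOSS and Biopython available, something like `record = parse_eprimer32_report(capture_stdout(["eprimer32", "-sequence", "asis::atcgatcgtagctgac", "-outfile", "stdout", "-auto"]))` should give a record whose `primers` list can be inspected in a validation loop and the command re-run with changed options until the criteria are met.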