From wrp at virginia.edu Mon Jul 2 10:31:40 2012
From: wrp at virginia.edu (William Pearson)
Date: Mon, 2 Jul 2012 10:31:40 -0400
Subject: [Bioperl-l] Application Deadline - 2012 CSHL Computational and Comparative Genomics Course
Message-ID:

Course announcement - Application deadline, July 15, 2012

Cold Spring Harbor
COMPUTATIONAL & COMPARATIVE GENOMICS
Oct 31 - Nov 6, 2012
Application Deadline: July 15, 2012

INSTRUCTORS:
William Pearson, University of Virginia, Charlottesville, VA
Lisa Stubbs, University of Illinois, Urbana, IL

This course presents a comprehensive overview of the theory and practice of computational methods for the identification and characterization of functional elements from DNA sequence data. The course focuses on approaches for extracting the maximum amount of information from protein and DNA sequence similarity through sequence database searches, statistical analysis, and multiple sequence alignment. Additional topics include:

Alignment and analysis of "Next-Gen" sequencing data
The Galaxy environment for high-throughput analysis
Identification of conserved signals in aligned and unaligned sequences
Regulatory element and motif recognition
Integration of genetic and sequence information in biological databases
The ENSEMBL genome browser and BioMart
Function/phenotype prediction for sequence variants

The course combines lectures with hands-on exercises; students are encouraged to pose challenging sequence analysis problems using their own data. The course is designed for biologists seeking advanced training in biological sequence and genome analysis, computational biology core resource directors and staff, and for scientists in other disciplines, such as computer science, who wish to survey current research problems in biological sequence analysis. Advanced programming skills are not required.

The lecture/lab schedule for the 2011 course can be found at fasta.bioch.virginia.edu/cshl

Speakers in the 2011 course included:

Aaron Mackey, U. of Virginia, Next-Gen analysis pipelines
Bert Overduin, European Bioinformatics Institute, UK, ENSEMBL and BioMart
Francis Ouellette, Ontario Institute for Cancer Research, Databases for Biological Function
William Pearson, U. of Virginia, Similarity Searching, Multiple Alignment
Lisa Stubbs, U. of Illinois, Urbana, ChIP, Transcription Factors, and Comparative Genomics
James Taylor, Emory, Galaxy and genome analysis pipelines

The primary focus of the computational and comparative genomics course is the theory and practice of algorithms used in computational biology, with the goal of using current methods more effectively and evaluating new approaches. Students who wish to learn Perl programming for Bioinformatics are encouraged to apply to the Programming for Biology course. Students who would like in-depth training in the analysis of next-generation sequencing data (e.g., SNP calling and the detection of structural variants) should apply to the course on Advanced Sequencing Technologies & Applications. This Computational and Comparative Genomics course will discuss methods for phenotype prediction from variation data.

To apply to the course, fill out and send in the form at:
http://meetings.cshl.edu/course/courseapp_instr.shtml

From shalabh.sharma7 at gmail.com Mon Jul 2 13:09:57 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Jul 2012 13:09:57 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
Message-ID:

Hi All,
I am just confused about the translation frames. I used bioperl to parse a blastx report. The report shows that the frame used is -2, but when I translate the sequence using EMBOSS or some other program the frame is -1. Am I doing something wrong here?
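One possible source of the mismatch, offered as a hedged sketch and not as a diagnosis: BLAST numbers minus-strand frames from the 3' end of the query it searched, so the same hit can get a different frame number once the 2642:3697 region named in the sequence header of this message is cut out and translated on its own. The helper name blast_frame and the contig length 3698 below are made up for illustration; the real contig length is not given in this thread.

```perl
use strict;
use warnings;

# BLAST-style frame for a hit on a translated nucleotide query.
# $lo and $hi are the 1-based query coordinates of the hit ($lo <= $hi),
# $strand is +1 or -1, and $len is the length of the sequence searched.
sub blast_frame {
    my ($lo, $hi, $strand, $len) = @_;
    return $strand > 0
        ?  (($lo - 1) % 3) + 1        # plus strand: count from the first base
        : -((($len - $hi) % 3) + 1);  # minus strand: count from the last base
}

# The extracted fragment 2642:3697 translated on its own.
my $fragment_len = 3697 - 2642 + 1;
print blast_frame(1, $fragment_len, -1, $fragment_len), "\n";

# The same hit on the full contig. 3698 is a HYPOTHETICAL contig length,
# chosen only to show how the number can shift.
print blast_frame(2642, 3697, -1, 3698), "\n";
```

With these assumed numbers the fragment comes out as frame -1 and the full-length query as frame -2. Plugging in the real contig length would confirm or rule out this explanation.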
Here is the sequence: >gi|378759230|gb|AHBJ01000169.1| SAR86 cluster bacterium SAR86D scf1120176765857, whole genome shotgun sequence 2642:3697 AGCTTCCCATGGAACCCATGCAAGTGCAATATTTGTTTCTAGCTCTGGTGACCACCAAGGAGATGTCACGTAGCCCACCTCATCTTCATCAGTATTAGTTACTATCCAAAAATCAGAAGCATAATCTGTGATTTCTTTTCCTCCAAGGGTTAAACCAACCATCTTCATTTTAAATGGTGCATTTCCTTCATCTATGATTGCTCTCTGTTTTTCAAGCTCTTCTTTACCAATGTAATCAGCTGCTTTATTTCTTGGTACCTGATAACTTAAATTAACCTGAAAGGGAGAAGTTTCATGATCCAGATCTTGTCCCCAAGACAAAATTCCAGCTGCAATGCGACGATGATGCGCAGGAGCTATGACCATTAAGCCAAATTCTTCTCCAGCCTCAAGAACAGCATTCCACATTTTTTCTGCATTATCATGTGCGTCACGAACATATATTTCATAACCTTTTTCGCCTGTAAAACCAGTTTGACTGATTACACAATCAGCTCCACCAACCTGAGTTTCTAAAATTCCATAATAAGGAACTTCTCTTAACTCTTCGCCAGCTAACTTTGCCATAAGATCTTCAGATAAAGGGCCTTGAATTTGAACAGGACAAACATCAATCTCATCAATTTCTACGTCATATTTTTTAGACACATTTACGCCTTGAAGCCAAAGTAAGAGATCGCTGTCTGATATTGAGAACCAGAATTCATCTTCTGTTAGTCTTAATAGAACAGGGTCATTTAAAACCCCTCCTTTTTCATTGCATAAAATCGCATATTTACCATTTCCGGGTTTAATTTTTGTAGCATCACGAGTTATTACATAATCTGTAAAAGCTTCTGCATCTGGACCTTTTACTCTTATCTGTCTTTCAACAGCAACATTCCACATAGTAACTCTATTAACCAAGGCTTCGTATTCAACCATGGCACCGCCATCTTCAGGTTTTACATAGCCTCGTGGATGATAAATTCGATTATATACAGTTGCTCTCCAACAGCCCGCTTCATGAGATAGATGCCAAAAAGGCGATTTTCTTACCCGGGTTGAAATTAATAA This is a part of blast report by bioperl: >JCVI_READ_1105499496127 /Indian_Ocean/gcvT Length = 352 Score = 655 bits (1690), Expect = 0.0 Identities = 311/352 (88%), Positives = 329/352 (93%) Frame = -2 Query: 3697 LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV 3518 +LISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGY+KPEDGGAMVEY+ALVNRVTMWNV Sbjct: 1 MLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYIKPEDGGAMVEYDALVNRVTMWNV 60 ..... ..... 
Query: 2797 GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA 2642 GLTLGGKEITDYA DFW+V + D + PWWSPEL TNIAL WVPW A Sbjct: 301 GLTLGGKEITDYAPDFWLVADMDGMMLDISLPPWWSPELNTNIALGWVPWSA 352 This is EMBOSS output (from EBI): >EMBOSS_001_4 LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV AVERQIRVKGPDAEAFTDYVITRDATKIKPGNGKYAILCNEKGGVLNDPVLLRLTEDEFW FSISDSDLLLWLQGVNVSKKYDVEIDEIDVCPVQIQGPLSEDLMAKLAGEELREVPYYGI LETQVGGADCVISQTGFTGEKGYEIYVRDAHDNAEKMWNAVLEAGEEFGLMVIAPAHHRR IAAGILSWGQDLDHETSPFQVNLSYQVPRNKAADYIGKEELEKQRAIIDEGNAPFKMKMV GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA >EMBOSS_001_5 INFNPGKKIAFLASIS*SGLLESNCI*SNLSSTRLCKT*RWRCHG*IRSLG**SYYVECC C*KTDKSKRSRCRSFYRLCNNS*CYKN*TRKW*ICDFMQ*KRRGFK*PCSIKTNRR*ILV ...... You can see its a frame -1. I would really appreciate your help. Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From scott at scottcain.net Mon Jul 2 14:50:45 2012 From: scott at scottcain.net (Scott Cain) Date: Mon, 2 Jul 2012 14:50:45 -0400 Subject: [Bioperl-l] GMOD Summer School application deadline Message-ID: Hello, The deadline to apply for the GMOD Summer School is in one week, July 9th. The application is available as a Google Form: https://docs.google.com/spreadsheet/embeddedform?formkey=dG5hNGFiQ3UwYTV2LUZxZW04Qm1yZXc6MQ In the GMOD Summer School (August 24-29, 2012) we will cover the installation, configuration and use of a variety of GMOD tools, including Chado, GBrowse, JBrowse and Tripal. For more information on the course, see the course web page at http://gmod.org/wiki/2012_GMOD_Summer_School The course will make heavy use of the Amazon Web Service (aka, the Cloud) via a grant from Amazon. Enrollment is limited to 24 students, and the application process is competitive: the last few years we've received over 75 applications for those 24 spots. I look forward to seeing you in North Carolina in August! 
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From p.j.a.cock at googlemail.com Mon Jul 2 15:34:40 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 2 Jul 2012 20:34:40 +0100
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To:
References:
Message-ID:

On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma wrote:
> Hi All,
> I am just confused about the translation frames. I used bioperl to
> parse a blastx report.
> Reports shows that the frame used is -2 but when i translate the sequence
> using EMBOSS or Some other program the frame is -1.
> Am i doing something wrong here.

Possibly there are conflicting definitions of frames -1, -2, and -3 here
(and that's leaving out the possibility of -0, -1 and -2 counting). Some
will count from the first base (start for forward strand), others the last
base (start of reverse strand). This can make comparing the output
of different tools quite confusing.

Peter

From shalabh.sharma7 at gmail.com Mon Jul 2 16:39:29 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Jul 2012 16:39:29 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>
References: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>
Message-ID:

Hi Peter and Brian,
Thanks a lot for your reply. I have already taken this in account.
So if i parse the blast report (my previous example) i get strand '-1' and frame '1' (according to bioperl) so if we convert it to general term then its -2 because bioperl starts from 0.
Also for bioperl forward frame translation working fine.

Thanks
Shalabh

On Mon, Jul 2, 2012 at 4:24 PM, Brian Osborne wrote:
> Shalabh,
>
> Also take a look at this:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29
>
> Brian O.
--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From bosborne11 at verizon.net Mon Jul 2 16:24:24 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 02 Jul 2012 16:24:24 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To:
References:
Message-ID: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>

Shalabh,

Also take a look at this:

http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29

Brian O.

From vebaev at gmail.com Tue Jul 3 12:35:26 2012
From: vebaev at gmail.com (vebaev at gmail.com)
Date: Tue, 3 Jul 2012 09:35:26 -0700 (PDT)
Subject: [Bioperl-l] CFP - International Conference on Bioinformatics and Computational Biology - BIOCOMP BG 2012
Message-ID: <7b498b4c-2b2e-4e1f-871f-513203488bf1@googlegroups.com>

International Conference on Bioinformatics and Computational Biology - BIOCOMP BG 2012
September 20-21, 2012, Varna, Bulgaria

Dear Colleague,
It is our pleasure to circulate the 2nd announcement of the International Conference on Bioinformatics and Computational Biology - BIOCOMP 2012 (http://biocomp.bio.uni-plovdiv.bg/).

Keynote speakers
Prof. Dr. Klaas Vandepoele - Ghent University, Belgium
Dr. Andreas Gisel - Institute for Biomedical Technologies, Italy
Prof. Wojciech Karlowski - Insitute of Molecular Biology and Biotechnology, Poland
Prof. Mario A. 
Fares - University of Dublin, Trinity College, Ireland
Dr. Andrey Kajava - CRBM - Macromolecular Biochemistry Research Center, France
Dr. Gaurav Sablok - Istituto Agrario San Michele (IASMA), Italy

Topics
Topics of interest include, but are not limited to:
High-performance bio-computing
High-throughput sequencing data analysis (NGS)
Bio-ontologies
Molecular evolution
Comparative genomics
Molecular modeling and simulation
Computational genetics
Computational proteomics
Data mining and visualization
Software tools and applications
Gene expression analysis
Gene networks
Structural biology
Genome analysis
Databases
Systems biology
Special topic: bioinformatics and miRNAs

Recent achievements in these fields will be presented. The conference will include plenary and poster sessions. Participants' proposals will be taken under advisement in compiling the program.

Publications
All accepted abstracts will be published in the conference abstract book. The best 20 abstracts will be peer-reviewed and published as full-text manuscripts in special issues of Springer and Elsevier journals:
Interdisciplinary Sciences: Computational Life Sciences (ISSN: 1867-1462)
Journal of Computational Science (ISSN: 1877-7503)

Venue
The venue of the conference is the 4-star all-inclusive Sunny Day Black Sea resort, Bulgaria.

Registration and abstract submission
All actions related to BIOCOMP 2012 (abstract submission, registration, etc.) may be completed via the Conference website at http://biocomp.bio.uni-plovdiv.bg/

Accommodation
IMPORTANT: Accommodation is included in the conference registration fee.

Important dates
Abstract Submission Deadline - 20 August 2012
Early Registration Fee Payment Deadline - 20 August 2012
Arriving, Poster set up, Registration - 19 September 2012
Plenary and Poster Sessions - 20-21 September 2012

You may find details of the Conference by visiting the Conference website at http://biocomp.bio.uni-plovdiv.bg/

Looking forward to seeing you in Bulgaria!
------------------------------------------------ Dr. Vesselin Baev Research Assistant Professor University of Plovdiv Dept. Plant Phys. and Molecular Biology Bioinformatics SMART Group Tzar Assen 24,Plovdiv 4000, BULGARIA Office:+359 32 261 (560); Mobile:+359 89 43 80 945 vebaev at gmail.com; baev at uni-plovdiv.bg; CV: http://plantgene.eu/ From tarakaramji at gmail.com Tue Jul 3 15:33:43 2012 From: tarakaramji at gmail.com (Tarakaramji Moturu) Date: Tue, 3 Jul 2012 19:33:43 +0000 (UTC) Subject: [Bioperl-l] Invitation to connect on LinkedIn Message-ID: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod> LinkedIn ------------ I'd like to add you to my professional network on LinkedIn. - Tarakaramji Tarakaramji Moturu Student at GITAM University Vishakhapatnam Area, India Confirm that you know Tarakaramji Moturu: https://www.linkedin.com/e/1505z7-h47dlkop-69/isd/7726719493/9xC087NO/?hs=false&tok=2UuxBwCCkl7Rk1 -- You are receiving Invitation to Connect emails. Click to unsubscribe: http://www.linkedin.com/e/1505z7-h47dlkop-69/q7l5PgNeLXh3mAgNJzs79PDWzhT0l80xWa/goo/bioperl-l%40bioperl%2Eorg/20061/I2613636655_1/?hs=false&tok=0hY4YIDwkl7Rk1 (c) 2012 LinkedIn Corporation. 2029 Stierlin Ct, Mountain View, CA 94043, USA. From l.m.timmermans at students.uu.nl Wed Jul 4 03:16:34 2012 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Wed, 4 Jul 2012 10:16:34 +0300 Subject: [Bioperl-l] Invitation to connect on LinkedIn In-Reply-To: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod> References: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod> Message-ID: On Tue, Jul 3, 2012 at 10:33 PM, Tarakaramji Moturu wrote: > LinkedIn > ------------ > > > > I'd like to add you to my professional network on LinkedIn. 
> > - Tarakaramji Sending messages like this directly over mailinglists is a rather bad idea, if only because LinkedIn will think bioperl-l at bioperl.org is one of the email addresses of whomever accepts the request (which is relevant for retrieving a lost password, I think). Leon From ulrik.stervbo at gmail.com Fri Jul 6 03:03:08 2012 From: ulrik.stervbo at gmail.com (Ulrik Stervbo) Date: Fri, 6 Jul 2012 09:03:08 +0200 Subject: [Bioperl-l] problem in using protparam.pm module In-Reply-To: References: <9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu> Message-ID: I had the same problem, and realized it is because I am behind a proxy. This is what I did to the Protparam module: Changed the url to 'http://web.expasy.org/cgi-bin/protparam/protparam' as previously found Added: $browser->proxy(['http'], 'http://[my proxy]/'); after initialization of the LWP agent. The proxy settings is what made Perl choke. (If only one could make perl see global proxy settings). Cheers, Ulrik 2011/7/28 Shachi Gahoi : > Please help me how to run protparam using bioperl module > > On Wed, Jul 27, 2011 at 11:05 AM, Chris Fields wrote: > >> The web service appears to have changed, but it looks as if no tests have >> been written up for this module which would have caught this out. We can >> write some basic tests up to check for simple functionality. >> >> chris >> >> On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote: >> >> > Dear All, >> > >> > i am using protparam.pm module. but when i am running this script it is >> > printing one error message >> > >> > "Can't call method "throw" without a package or object reference at >> > /usr/share/perl5/Bio/Root/Root.pm line 368, line 1." >> > >> > Kindly help me to solve this problem. 
>> > >> > >> > Script is here---- >> > >> ################################################################################### >> > #!/usr/bin/perl >> > >> > use warnings; >> > use Bio::SeqIO; >> > use Bio::Tools::Protparam; >> > >> > >> > $seqfile='test1.fasta'; >> > >> > $seqio = Bio::SeqIO->new(-file => "$seqfile", -format => 'Fasta'); >> > >> > >> > while( $seq = $seqio->next_seq() ) >> > { >> > >> > >> > my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq); >> > >> > print >> > "ID : ", $seq->display_id,"\n", >> > "Amino acid number : ",$pp->amino_acid_number(),"\n", >> > "Number of negative amino acids : ",$pp->num_neg(),"\n", >> > "Number of positive amino acids : ",$pp->num_pos(),"\n", >> > "Molecular weight : ",$pp->molecular_weight(),"\n", >> > "Theoretical pI : ",$pp->theoretical_pI(),"\n", >> > "Total number of atoms : ", $pp->total_atoms(),"\n", >> > "Number of carbon atoms : ",$pp->num_carbon(),"\n", >> > "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n", >> > "Number of nitrogen atoms : ",$pp->num_nitro(),"\n", >> > "Number of oxygen atoms : ",$pp->num_oxygen(),"\n", >> > "Number of sulphur atoms : ",$pp->num_sulphur(),"\n", >> > "Half life : ", $pp->half_life(),"\n", >> > "Instability Index : ", $pp->instability_index(),"\n", >> > "Stability class : ", $pp->stability(),"\n", >> > "Aliphatic_index : ",$pp->aliphatic_index(),"\n", >> > "Gravy : ", $pp->gravy(),"\n", >> > "Composition of A : ", $pp->AA_comp('A'),"\n", >> > "Composition of R : ", $pp->AA_comp('R'),"\n", >> > "Composition of N : ", $pp->AA_comp('N'),"\n", >> > "Composition of D : ", $pp->AA_comp('D'),"\n", >> > "Composition of C : ", $pp->AA_comp('C'),"\n", >> > "Composition of Q : ", $pp->AA_comp('Q'),"\n", >> > "Composition of E : ", $pp->AA_comp('E'),"\n", >> > "Composition of G : ", $pp->AA_comp('G'),"\n", >> > "Composition of H : ", $pp->AA_comp('H'),"\n", >> > "Composition of I : ", $pp->AA_comp('I'),"\n", >> > "Composition of L : ", $pp->AA_comp('L'),"\n", >> > "Composition of 
K : ", $pp->AA_comp('K'),"\n", >> > "Composition of M : ", $pp->AA_comp('M'),"\n", >> > "Composition of F : ", $pp->AA_comp('F'),"\n", >> > "Composition of P : ", $pp->AA_comp('P'),"\n", >> > "Composition of S : ", $pp->AA_comp('S'),"\n", >> > "Composition of T : ", $pp->AA_comp('T'),"\n", >> > "Composition of W : ", $pp->AA_comp('W'),"\n", >> > "Composition of Y : ", $pp->AA_comp('Y'),"\n", >> > "Composition of V : ", $pp->AA_comp('V'),"\n", >> > "Composition of B : ", $pp->AA_comp('B'),"\n", >> > "Composition of Z : ", $pp->AA_comp('Z'),"\n", >> > "Composition of X : ", $pp->AA_comp('X'),"\n"; >> > } >> > >> ################################################################################### >> > >> > >> > >> > >> > -- >> > Regards, >> > Shachi >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > Regards, > Shachi > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Fri Jul 6 13:49:46 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Fri, 6 Jul 2012 10:49:46 -0700 Subject: [Bioperl-l] problem in using protparam.pm module In-Reply-To: References: <9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu> Message-ID: <8C9056B6-1DA4-4BE0-B008-429C2F6C05BE@gmail.com> you might try the PERL_LWP_ENV_PROXY and HTTP_PROXY env variables http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#CONSTRUCTOR_METHODS http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#Proxy_attributes I can't test it my end though w/o a proxy service. On Jul 6, 2012, at 12:03 AM, Ulrik Stervbo wrote: > I had the same problem, and realized it is because I am behind a proxy. 

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From bubli_thakur at rediffmail.com Sun Jul 1 10:59:29 2012
From: bubli_thakur at rediffmail.com (subarna thakur)
Date: Sun, 01 Jul 2012 14:59:29 -0000
Subject: [Bioperl-l] Ks saturation
Message-ID: <20120617031856.16345.qmail@f4mail-235-140.rediffmail.com>

Dear all,I am trying to calculate dn/ds values of all orthologous gene pair between a pair of genome using pairwsie_kaks.pl script within bioperl which evokes the codeml program in runmode -2. 
When I analyze the results, some of the genes have anomalously high dS (Ks) values, some of them even exceeding 100, and as a result the average Ks for the whole genome shoots up. These are orthologous genes that share more than 50% sequence identity. Should I include these genes in the analysis or leave them out? If I leave them out, up to what cutoff value of Ks should I go? In some papers I have found that Ks values as high as 5.6 were considered. Is there a way of determining the cutoff value for Ks?

Subarna

From haywardjeremya at gmail.com Fri Jul 6 13:56:12 2012
From: haywardjeremya at gmail.com (Jeremy Hayward)
Date: Fri, 6 Jul 2012 14:56:12 -0300
Subject: [Bioperl-l] Two 'host' tags?
Message-ID:

Hi-- Clueless newbie here, for which apologies.

I've posted a description of my problem, inputs and outputs, at Gist 2816510; https://gist.github.com/2816510

Briefly, I'm trying to take a genbank file (.gb), and create a FASTA file with a specific identifier line for each sequence. Specifically, I want the "host" tag as the identifier. With the help of the Bioperl beginner readme and the HOWTOs (which are great!) I've worked out how to loop through my sequences and get the 'host' tag for each one. For some reason, I get two identifier lines for each sequence. I guess the problem is in the 'for' loop: it's running the stuff below it twice, once with the actual 'host' tag data and once with...nothing? Not sure.

I think I can work out how to use s/ and a regex just to delete the second identifier line, but that feels like I'm avoiding the problem instead of fixing it. Any help appreciated!

Many thanks,

--Jeremy Hayward

From jason.stajich at gmail.com Fri Jul 6 15:39:52 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 6 Jul 2012 12:39:52 -0700
Subject: [Bioperl-l] Two 'host' tags?
In-Reply-To:
References:
Message-ID:

Hi Jeremy -

You are printing for every feature in the loop (e.g. 
the source and the misc_RNA ) - you only want to loop through the features, then grab the one which is source, then change or print the info when you see that. So you could have an if( $feature->primary_tag eq 'source') in there or something as well. Alternatively I've left it pretty much intact and just simplified it a bit. You should also try and use Bio::SeqIO to print instead of your printing. I updated the code here to be simpler - right now it warns you that you are printing IDs with spaces (which is something you should think about when it comes to your output file, but I don't know your downstream plans). Also you could put other info in the description field if you wanted to capture accession number or the endophyte name too. https://gist.github.com/3062285 Best, Jason On Jul 6, 2012, at 10:56 AM, Jeremy Hayward wrote: > Hi-- Clueless newbie here, for which apologies. > > I've posted a description of my problem, inputs and outputs, at Gist > 2816510; https://gist.github.com/2816510 > > Briefly, I'm trying to take a genbank file (.gb), and create a FASTA > file with a specific identifier line for each sequence. Specifically, > I want the "host" tag as the identifier. With the help of the Bioperl > beginner readme and the HOWTO's (which are great!) I've worked out how > to loop through my sequences and get the 'host' tag for each one. For > some reason, I get two identifier lines for each sequence. I guess the > problem is in the 'for' loop--it's running the stuff below it twice, > once with the actual 'host' tag data and once with...nothing? Not > sure. > > I think I can work out how to use s/ and a regex just to delete the > second identifier line, but that feels like I'm avoiding the problem > instead of fixing it. Any help appreciated! 
> > > Many thanks, > > --Jeremy Hayward > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From bosborne11 at verizon.net Fri Jul 6 15:51:11 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 06 Jul 2012 15:51:11 -0400 Subject: [Bioperl-l] Two 'host' tags? In-Reply-To: References: Message-ID: <456448FF-C413-42D1-833A-FAA74E4FEF9E@verizon.net> Jeremy, Looks like each of your individual sequences has 2 features, but you only care about the 'source' feature ( if ($feat_object->primary_tag eq "source") ?). Also, try not to print out the sequence like you're doing, try to build a Sequence object for each input sequence and then write its contents to your fasta file using write_seq(). You will set the id for your Sequence object using display_name(). Brian O. On Jul 6, 2012, at 1:56 PM, Jeremy Hayward wrote: > Hi-- Clueless newbie here, for which apologies. > > I've posted a description of my problem, inputs and outputs, at Gist > 2816510; https://gist.github.com/2816510 > > Briefly, I'm trying to take a genbank file (.gb), and create a FASTA > file with a specific identifier line for each sequence. Specifically, > I want the "host" tag as the identifier. With the help of the Bioperl > beginner readme and the HOWTO's (which are great!) I've worked out how > to loop through my sequences and get the 'host' tag for each one. For > some reason, I get two identifier lines for each sequence. I guess the > problem is in the 'for' loop--it's running the stuff below it twice, > once with the actual 'host' tag data and once with...nothing? Not > sure. > > I think I can work out how to use s/ and a regex just to delete the > second identifier line, but that feels like I'm avoiding the problem > instead of fixing it. Any help appreciated! 
> > > Many thanks, > > --Jeremy Hayward > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dejian.zhao at gmail.com Wed Jul 11 13:31:37 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Thu, 12 Jul 2012 01:31:37 +0800 Subject: [Bioperl-l] Errors with Bio::Seq objects Message-ID: <4FFDB879.1020906@gmail.com> Hi, I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed. I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea. Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot! 
Best, Dejian PS: The commands and results $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb NM_053056 $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb mRNA $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb CACACG $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb Bio::Seq::RichSeq=HASH(0x20a3e7b0) From jimhu at tamu.edu Wed Jul 11 14:01:27 2012 From: jimhu at tamu.edu (Jim Hu) Date: Wed, 11 Jul 2012 13:01:27 -0500 Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <4FFDB879.1020906@gmail.com> References: <4FFDB879.1020906@gmail.com> Message-ID: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu> Hi Dejian, On Jul 11, 2012, at 12:31 PM, De-Jian Zhao 
wrote: > Hi, > > I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed. > > I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea. That's correct about Bio::Seq objects being returned. Actually, it is probably a kind of Bio::Seq object. For example, SeqIO may return a Bio::Seq::RichSeq object that inherits methods from Bio::Seq objects. However, as explained below, the methods are working as they should... they are just returning objects when you are expecting something else. > > Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot! 
> > Best, > Dejian > > > > PS: The commands and results > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb > NM_053056 > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb > mRNA > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb > CACACG > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb > Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) http://doc.bioperl.org/bioperl-live/Bio/Seq.html#POD24 $seq_obj->get_SeqFeatures() returns an array of SeqFeature objects, which are references. So this worked as expected. I usually write this as script files, so I've never done it all with perl -e. But you need to iterate over the array and query the objects for the information you want about the features. 
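To make the point about iterating concrete, here is a minimal sketch of the same thing written as a script rather than a one-liner. It assumes BioPerl is installed and that the GenBank file (nt.gb in the question) is given on the command line; the variable names, the choice of the 'gene' tag, and the decision to translate only CDS features are illustrative assumptions, not something taken from the original post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# Assumption: the GenBank file name is passed as the first argument (e.g. nt.gb).
my $file = shift or die "usage: $0 file.gb\n";

my $seqio = Bio::SeqIO->new(-file => $file, -format => 'genbank');

while (my $seq = $seqio->next_seq) {
    print "Record: ", $seq->display_id, "\n";

    # get_SeqFeatures() returns a list of feature objects; ask each one for
    # the information you want instead of printing the object itself.
    for my $feat ($seq->get_SeqFeatures) {
        printf "%-12s %d..%d\n", $feat->primary_tag, $feat->start, $feat->end;

        # Tags are optional, so check before reading them.
        if ($feat->has_tag('gene')) {
            print "    gene: ", join(', ', $feat->get_tag_values('gene')), "\n";
        }

        # For a CDS, translate only the coding region rather than the whole mRNA.
        if ($feat->primary_tag eq 'CDS') {
            print "    protein: ", $feat->spliced_seq->translate->seq, "\n";
        }
    }
}
```

Run it as "perl features.pl nt.gb" (the script name is arbitrary). Each call on a feature returns plain text or numbers, which is what print can display; the objects themselves only ever print as references.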
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
> Bio::Seq::RichSeq=HASH(0x20a3e7b0)

translate() returns a new Seq object rather than a string, which is why print shows only the object reference. I think

$ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb

should work (haven't tried it).

Jim

>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

=====================================
Jim Hu
Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054

From bosborne11 at verizon.net Wed Jul 11 13:47:25 2012
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 11 Jul 2012 13:47:25 -0400
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID:

Dejian,

These are not "failures". get_SeqFeatures() returns a list of Bio::SeqFeature objects and translate() returns a Bio::Seq object, so print shows you object references rather than text. Start here:

www.bioperl.org/wiki/HOWTO:Beginners

Brian O.

On Jul 11, 2012, at 1:31 PM, De-Jian Zhao wrote:

> Hi,
>
> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed.
>
> I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea.
>
> Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot!
> > Best, > Dejian > > > > PS: The commands and results > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb > NM_053056 > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb > mRNA > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb > CACACG > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb > Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb > Bio::Seq::RichSeq=HASH(0x20a3e7b0) > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Jul 11 15:02:46 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Jul 2012 19:02:46 +0000 
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Peng, Has this been filed as a bug yet? https://redmine.open-bio.org/projects/bioperl Seems like it would be fairly easy to fix, but I want to track it just in case. chris On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote: > Hello guys, > > Just a follow-up, it seems to me the bioperl-live version is still having the same problem - calling hit "query" while query sequence "hit". I also looked into the test script written for hmmer3 (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment part - I guess that's why this bug was not discovered. > > To be simple, here's an output of hmmsearch v3.0: > # hmmsearch :: search profile(s) against a sequence database > # HMMER 3.0 (March 2010); http://hmmer.org/ > # Copyright (C) 2010 Howard Hughes Medical Institute. > # Freely distributed under the GNU General Public License (GPLv3). 
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > # query HMM file: /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm > # target sequence database: /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa > # output directed to file: /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt > # number of worker threads: 4 > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > Query: CRP0000 [M=75] > Scores for complete sequences (score includes all domains): > --- full sequence --- --- best 1 domain --- -#dom- > E-value score bias E-value score bias exp N Sequence Description > ------- ------ ----- ------- ------ ----- ---- -- -------- ----------- > 5.5e-25 95.0 14.4 5.7e-25 95.0 10.0 1.0 1 Chr2_540228_540404_+ > > Domain annotation for each sequence (and alignments): > >> Chr2_540228_540404_+ > # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc > --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- > 1 ! 95.0 10.0 3.6e-30 5.7e-25 20 74 .. 4 59 .] 
1 59 [] 0.95 > > Alignments for each domain: > == domain 1 score: 95.0 bits; conditional E-value: 3.6e-30 > CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74 > ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c > Chr2_540228_540404_+ 4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59 > 568899***99********************************************* PP > > And here is a dump of the parsed HSP object: > $VAR1 = bless( { > 'VERBOSE' => 0, > 'IDENTICAL' => 0, > 'RANK' => 1, > 'STRANDED' => 'NONE', > 'EVALUE' => '3.6e-30', > 'HSP_LENGTH' => 56, > 'ALGORITHM' => 'HMMSEARCH' > 'SCORE' => '95.0', > 'GAP_SYMBOL' => '-', > 'CONSERVED' => 0, > > 'HIT_NAME' => 'Chr2_540228_540404_+', > 'HIT_DESC' => '', > 'HIT_START' => '20', > 'HIT_END' => '74', > 'HIT_LENGTH' => 56, > 'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc', > 'HIT_FRAME' => 0, > > 'QUERY_NAME' => 'CRP0000', > 'QUERY_DESC' => undef, > 'QUERY_START' => '4', > 'QUERY_END' => '59', > 'QUERY_LENGTH' => '75', > 'QUERY_FRAME' => 0, > 'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC', > > 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c', > }, 'Bio::Search::HSP::HMMERHSP' ); > > Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values. > > Thanks, > > Peng, > > On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: > I'll try the bioperl-live version. Thanks guys. > Scott Givan > 541-740-4685 > Sent from an iPhone (so expect typos). > > On Jul 19, 2011, at 10:34 PM, "Chris Fields" wrote: > > > This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo. I believe the one in bioperl-live is newer. Scott, can you give that a try? > > > > chris > > > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: > > > >> Hi Scott, > >> > >> Thanks for writing. 
I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be. > >> > >> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan. > >> > >> What version of HMMER3 you're using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I'm doubting it. > >> > >> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving this a shot if you haven't already. > >> > >> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem. > >> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter. > >> > >> Best, > >> Thomas > >> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: > >> > >>> Hi Thomas, > >>> > >>> I'm using modules in the bipoerl-hmmer3 git repository to parse hmmscan > >>> reports. When I parse the files and walk through the HSP's like: > >>> > >>> while (my $hit = $rslt->next_model) { > >>> > >>> while (my $domain = $hit->next_hsp) { > >>> > >>> And retrieve the "hit" coordinates like: > >>> > >>> print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'), > >>> "\n"; > >>> > >>> The coordinates returned correspond to what I would call the "query", > >>> since they are for the sequence I fed to hmmscan to search the profile > >>> database. Likewise, when retrieving the query coordinates like > >>> $domain->start('query'), I get what I consider the "hit" coordinates, > >>> since they are for the domain profile. Is this the intended behavior? > >>> > >>> Thanks. > >>> > >>> scott > >>> > >>> -- > >>> Scott A. 
Givan > >>> Associate Director > >>> Informatics Research Core Facility > >>> 240e Bond Life Sciences Center > >>> Research Assistant Professor > >>> Molecular Microbiology and Immunology > >>> University of Missouri, Columbia > >>> > >>> TEL 573-882-2948 > >>> FAX 573-884-9676 > >>> http://ircf.rnet.missouri.edu > >>> > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From p.j.a.cock at googlemail.com Wed Jul 11 17:00:56 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 11 Jul 2012 22:00:56 +0100 Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J wrote: > Peng, > > Has this been filed as a bug yet? > > https://redmine.open-bio.org/projects/bioperl > > Seems like it would be fairly easy to fix, but I want to track it just in case. > > chris Hi all, This could be the unfortunate fact that hmmscan and hmmsearch return very similar tabular output, but with query and hit interchanged. i.e. You need some extra information to know which way round they are (not possible with the current output). This was an issue in Bow's Biopython SearchIO project - which for the moment he solved by handling this as two hmmer file formats. In the medium term we're hoping hmmer3 will add some header information or something. 
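Until the parser can tell the two report types apart, one stopgap is to undo the swap after parsing, keyed on which program produced the report. The following is only a sketch on a plain hash that mirrors the keys of the dumped HSP quoted earlier in this thread; swap_hit_query is a hypothetical helper, not part of BioPerl, and the assumption that hmmsearch reports are the ones that come back swapped is taken from that single dump.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return a copy of an HSP-like
# hash with hit and query coordinates/sequences exchanged, but only when the
# report's algorithm is one the caller has flagged as being swapped.
sub swap_hit_query {
    my ($hsp, %swap_for) = @_;
    my %fixed = %$hsp;                                  # work on a copy
    return \%fixed unless $swap_for{ uc $hsp->{ALGORITHM} };

    for my $field (qw(START END SEQ)) {
        # Skip fields that are not present in this record.
        next unless exists $hsp->{"HIT_$field"} && exists $hsp->{"QUERY_$field"};
        @fixed{"HIT_$field", "QUERY_$field"} = @{$hsp}{"QUERY_$field", "HIT_$field"};
    }
    return \%fixed;
}

# Values copied from the dump in this thread (sequences left out for brevity).
my %hsp = (
    ALGORITHM   => 'HMMSEARCH',
    HIT_NAME    => 'Chr2_540228_540404_+',
    HIT_START   => 20,
    HIT_END     => 74,
    QUERY_NAME  => 'CRP0000',
    QUERY_START => 4,
    QUERY_END   => 59,
);

# Assumption: only hmmsearch reports need the swap.
my $fixed = swap_hit_query(\%hsp, HMMSEARCH => 1);
printf "%s %d-%d\n", $fixed->{HIT_NAME},   @{$fixed}{qw(HIT_START HIT_END)};
printf "%s %d-%d\n", $fixed->{QUERY_NAME}, @{$fixed}{qw(QUERY_START QUERY_END)};
```

With the values above this prints "Chr2_540228_540404_+ 4-59" and "CRP0000 20-74", i.e. the sequence gets the alignment coordinates and the profile gets the HMM coordinates. Keeping the decision in the caller's hands (the HMMSEARCH => 1 argument) reflects the point that the report alone may not say which way round it is.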
Peter

From zhoupenggeni at gmail.com Wed Jul 11 13:45:00 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 10:45:00 -0700 (PDT)
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To:
References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
Message-ID:

Hello guys,

Just a follow-up: it seems to me the bioperl-live version still has the same problem - the hit coordinates are reported as "query" and the query coordinates as "hit". I also looked at the test script written for hmmer3 (bioperl-live/t/SearchIO/hmmer.t), and it doesn't cover the alignment part - I guess that's why this bug was not discovered.

To keep it simple, here is the output of one hmmsearch v3.0 run:

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
# target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
# output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
# number of worker threads:        4
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       CRP0000  [M=75]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence             Description
    ------- ------ -----    ------- ------ -----   ---- --  --------             -----------
    5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+

Domain annotation for each sequence (and alignments):
>> Chr2_540228_540404_+
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 []    0.95

  Alignments for each domain:
  == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
               CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
                          ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
  Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
                          568899***99********************************************* PP

And here is a dump of the parsed HSP object:

$VAR1 = bless( {
    'VERBOSE' => 0,
    'IDENTICAL' => 0,
    'RANK' => 1,
    'STRANDED' => 'NONE',
    'EVALUE' => '3.6e-30',
    'HSP_LENGTH' => 56,
    'ALGORITHM' => 'HMMSEARCH',
    'SCORE' => '95.0',
    'GAP_SYMBOL' => '-',
    'CONSERVED' => 0,

    'HIT_NAME' => 'Chr2_540228_540404_+',
    'HIT_DESC' => '',
    'HIT_START' => '20',
    'HIT_END' => '74',
    'HIT_LENGTH' => 56,
    'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
    'HIT_FRAME' => 0,

    'QUERY_NAME' => 'CRP0000',
    'QUERY_DESC' => undef,
    'QUERY_START' => '4',
    'QUERY_END' => '59',
    'QUERY_LENGTH' => '75',
    'QUERY_FRAME' => 0,
    'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',

    'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
}, 'Bio::Search::HSP::HMMERHSP' );

Clearly, the "HIT_START", "HIT_END" and "HIT_SEQ" values should be exchanged with the "QUERY_START", "QUERY_END" and "QUERY_SEQ" values: the names are right (the query is the profile CRP0000, the hit is the sequence), but the hit carries the HMM coordinates 20-74 and the query carries the sequence coordinates 4-59.

Thanks,

Peng

On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote:
>
> I'll try the bioperl-live version. Thanks guys.
>
> Scott Givan
> 541-740-4685
> Sent from an iPhone (so expect typos).
>
> On Jul 19, 2011, at 10:34 PM, "Chris Fields" wrote:
>
> > This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo. I believe the one in bioperl-live is newer. Scott, can you give that a try?
> >
> > chris
> >
> > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:
> >
> >> Hi Scott,
> >>
> >> Thanks for writing.
I'm on the road at the moment so I have to be > briefer and less thorough than I'd like to be. > >> > >> What you are observing is not the intended behavior. Oddly, it's not > what I recall obtaining in my tests on this software, though I was mostly > interested in hmmsearch at the time and may have been sloppier than I > should have been when it came to hmmscan. > >> > >> What version of HMMER3 you're using? There have been some small > formatting changes in the past that might be causing a burp in the parser, > though I'm doubting it. > >> > >> Kai Blin wrote some test scripts (found here: > bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate > query/hit coordinates. It might be worth giving this a shot if you haven't > already. > >> > >> Also, if you don't mind, I'm happy to run your code on your output file > on my end. It might help me diagnose the problem. > >> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case > anyone else has insight into this matter. > >> > >> Best, > >> Thomas > >> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: > >> > >>> Hi Thomas, > >>> > >>> I'm using modules in the bipoerl-hmmer3 git repository to parse hmmscan > >>> reports. When I parse the files and walk through the HSP's like: > >>> > >>> while (my $hit = $rslt->next_model) { > >>> > >>> while (my $domain = $hit->next_hsp) { > >>> > >>> And retrieve the "hit" coordinates like: > >>> > >>> print "hit coords: ", $domain->start('hit'), "-", > $domain->end('hit'), > >>> "\n"; > >>> > >>> The coordinates returned correspond to what I would call the "query", > >>> since they are for the sequence I fed to hmmscan to search the profile > >>> database. Likewise, when retrieving the query coordinates like > >>> $domain->start('query'), I get what I consider the "hit" coordinates, > >>> since they are for the domain profile. Is this the intended behavior? > >>> > >>> Thanks. > >>> > >>> scott > >>> > >>> -- > >>> Scott A. 
Givan
> >>> Associate Director
> >>> Informatics Research Core Facility
> >>> 240e Bond Life Sciences Center
> >>> Research Assistant Professor
> >>> Molecular Microbiology and Immunology
> >>> University of Missouri, Columbia
> >>>
> >>> TEL 573-882-2948
> >>> FAX 573-884-9676
> >>> http://ircf.rnet.missouri.edu
> >>>
> >>>
> >>>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From zhoupenggeni at gmail.com Wed Jul 11 14:03:17 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 11:03:17 -0700 (PDT)
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <4FFDB879.1020906@gmail.com>
References: <4FFDB879.1020906@gmail.com>
Message-ID: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>

Hi,

I guess that's what the commands are supposed to do: the get_SeqFeatures() method returns an array of Bio::SeqFeature objects, and the translate() method returns a Bio::Seq object. And you can't simply "print" an object in Perl (you only get its reference, as in your output) - you can "dump" it though:

$ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb

$ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->translate()); ' nt.gb

On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote:
>
> Hi,
>
> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and
> tested the Bio::SeqIO module as follows. The first 3 commands succeeded;
> however the last 2 failed.
> > I think $seqio->next_seq() produces a Bio::Seq object which contains the > sequence, features and annotation (according to the DESCRIPTION of > "perldoc Bio::Seq") and thus the invocation of the methods > get_SeqFeatures() and translate() should be valid. However, the results > denied this idea. > > Will anyone explain what happened to the last 2 commands? I have > encountered numerous cases of failures when testing the bioperl methods. > I want to translate the mRNA sequence and extract the sequence features. > What are the right commands? Thanks a lot! > > Best, > Dejian > > > > PS: The commands and results > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->display_id(); ' nt.gb > NM_053056 > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->molecule(); ' nt.gb > mRNA > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->subseq(1,6); ' nt.gb > CACACG > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb > 
Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) > > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->translate(); ' nt.gb > Bio::Seq::RichSeq=HASH(0x20a3e7b0) > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > 
From zhoupenggeni at gmail.com Wed Jul 11 16:05:56 2012 From: zhoupenggeni at gmail.com (Peng Zhou) Date: Wed, 11 Jul 2012 13:05:56 -0700 
(PDT) Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Thanks Chris, here is the link of the filed bug: https://redmine.open-bio.org/issues/3369 On Wednesday, July 11, 2012 2:02:46 PM UTC-5, Christopher Fields wrote: > > Peng, > > Has this been filed as a bug yet? > > https://redmine.open-bio.org/projects/bioperl > > Seems like it would be fairly easy to fix, but I want to track it just in > case. > > chris > > On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote: > > > Hello guys, > > > > Just a follow-up, it seems to me the bioperl-live version is still > having the same problem - calling hit "query" while query sequence "hit". I > also looked into the test script written for hmmer3 > (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment > part - I guess that's why this bug was not discovered. > > > > To be simple, here's an output of hmmsearch v3.0: > > # hmmsearch :: search profile(s) against a sequence database > > # HMMER 3.0 (March 2010); http://hmmer.org/ > > # Copyright (C) 2010 Howard Hughes Medical Institute. > > # Freely distributed under the GNU General Public License (GPLv3). 
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - > > # query HMM file: > /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm > > # target sequence database: > /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa > > > # output directed to file: > /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt > > # number of worker threads: 4 > > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - > > > > Query: CRP0000 [M=75] > > Scores for complete sequences (score includes all domains): > > --- full sequence --- --- best 1 domain --- -#dom- > > E-value score bias E-value score bias exp N Sequence > Description > > ------- ------ ----- ------- ------ ----- ---- -- -------- > ----------- > > 5.5e-25 95.0 14.4 5.7e-25 95.0 10.0 1.0 1 > Chr2_540228_540404_+ > > > > Domain annotation for each sequence (and alignments): > > >> Chr2_540228_540404_+ > > # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali > to envfrom env to acc > > --- ------ ----- --------- --------- ------- ------- ------- > ------- ------- ------- ---- > > 1 ! 95.0 10.0 3.6e-30 5.7e-25 20 74 .. 4 > 59 .] 
1 59 [] 0.95 > > > > Alignments for each domain: > > == domain 1 score: 95.0 bits; conditional E-value: 3.6e-30 > > CRP0000 20 > tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74 > > ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg > rrrC+Ct++c > > Chr2_540228_540404_+ 4 > GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59 > > > 568899***99********************************************* PP > > > > And here is a dump of the parsed HSP object: > > $VAR1 = bless( { > > 'VERBOSE' => 0, > > 'IDENTICAL' => 0, > > 'RANK' => 1, > > 'STRANDED' => 'NONE', > > 'EVALUE' => '3.6e-30', > > 'HSP_LENGTH' => 56, > > 'ALGORITHM' => 'HMMSEARCH' > > 'SCORE' => '95.0', > > 'GAP_SYMBOL' => '-', > > 'CONSERVED' => 0, > > > > 'HIT_NAME' => 'Chr2_540228_540404_+', > > 'HIT_DESC' => '', > > 'HIT_START' => '20', > > 'HIT_END' => '74', > > 'HIT_LENGTH' => 56, > > 'HIT_SEQ' => > 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc', > > 'HIT_FRAME' => 0, > > > > 'QUERY_NAME' => 'CRP0000', > > 'QUERY_DESC' => undef, > > 'QUERY_START' => '4', > > 'QUERY_END' => '59', > > 'QUERY_LENGTH' => '75', > > 'QUERY_FRAME' => 0, > > 'QUERY_SEQ' => > 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC', > > > > 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs > +nCa+vC++Egf gG+crg rrrC+Ct++c', > > }, 'Bio::Search::HSP::HMMERHSP' ); > > > > Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be > exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values. > > > > Thanks, > > > > Peng, > > > > On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: > > I'll try the bioperl-live version. Thanks guys. > > Scott Givan > > 541-740-4685 > > Sent from an iPhone (so expect typos). > > > > On Jul 19, 2011, at 10:34 PM, "Chris Fields" > wrote: > > > > > This might be a disconnect between the HMMER3 version in bioperl-live > and the one in Kai's bioperl-hmmer3 repo. I believe the one in > bioperl-live is newer. Scott, can you give that a try? 
> > > > > > chris > > > > > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: > > > > > >> Hi Scott, > > >> > > >> Thanks for writing. I'm on the road at the moment so I have to be > briefer and less thorough than I'd like to be. > > >> > > >> What you are observing is not the intended behavior. Oddly, it's not > what I recall obtaining in my tests on this software, though I was mostly > interested in hmmsearch at the time and may have been sloppier than I > should have been when it came to hmmscan. > > >> > > >> What version of HMMER3 you're using? There have been some small > formatting changes in the past that might be causing a burp in the parser, > though I'm doubting it. > > >> > > >> Kai Blin wrote some test scripts (found here: > bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate > query/hit coordinates. It might be worth giving this a shot if you haven't > already. > > >> > > >> Also, if you don't mind, I'm happy to run your code on your output > file on my end. It might help me diagnose the problem. > > >> > > >> Sorry this is being a thorn in your side! I've cc'ed the list in case > anyone else has insight into this matter. > > >> > > >> Best, > > >> Thomas > > >> > > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: > > >> > > >>> Hi Thomas, > > >>> > > >>> I'm using modules in the bipoerl-hmmer3 git repository to parse > hmmscan > > >>> reports. When I parse the files and walk through the HSP's like: > > >>> > > >>> while (my $hit = $rslt->next_model) { > > >>> > > >>> while (my $domain = $hit->next_hsp) { > > >>> > > >>> And retrieve the "hit" coordinates like: > > >>> > > >>> print "hit coords: ", $domain->start('hit'), "-", > $domain->end('hit'), > > >>> "\n"; > > >>> > > >>> The coordinates returned correspond to what I would call the > "query", > > >>> since they are for the sequence I fed to hmmscan to search the > profile > > >>> database. 
Likewise, when retrieving the query coordinates like > > >>> $domain->start('query'), I get what I consider the "hit" > coordinates, > > >>> since they are for the domain profile. Is this the intended > behavior? > > >>> > > >>> Thanks. > > >>> > > >>> scott > > >>> > > >>> -- > > >>> Scott A. Givan > > >>> Associate Director > > >>> Informatics Research Core Facility > > >>> 240e Bond Life Sciences Center > > >>> Research Assistant Professor > > >>> Molecular Microbiology and Immunology > > >>> University of Missouri, Columbia > > >>> > > >>> TEL 573-882-2948 > > >>> FAX 573-884-9676 > > >>> http://ircf.rnet.missouri.edu > > >>> > > >>> > > >>> > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > 
From w.arindrarto at gmail.com Wed Jul 11 17:25:44 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 11 Jul 2012 23:25:44 +0200 Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Hi everyone, Just as an additional info that might be useful: The current Biopython parser for the plain text format parses the very first line to find out which HMMER flavor produces the result. Both 'hmm from' and 'hmmto' are query coordinates if the flavor is hmmsearch or phmmer; and they're hit coordinates if the flavor is hmmscan. 
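[Editorial sketch] The rule described in the paragraph above (read the program name from the first line of the report, then decide which side the HMM coordinates belong to) could look roughly like this in plain Perl. This is only an illustration under stated assumptions: hmm_coords_role() is a hypothetical helper, not part of BioPerl or Biopython, and the hmmscan banner line used in the example is assumed to have the same "# program :: description" shape as the hmmsearch banner quoted earlier in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl or Biopython function): decide whether
# the "hmmfrom"/"hmm to" columns of a HMMER3 plain-text report describe the
# query or the hit, based on the program named in the report's first line.
sub hmm_coords_role {
    my ($first_line) = @_;
    my ($program) = $first_line =~ /^#\s*(\w+)\s*::/
        or die "Cannot find a program name in: $first_line\n";
    $program = lc $program;
    # Mapping as stated in the message above.
    return 'query' if $program eq 'hmmsearch' or $program eq 'phmmer';
    return 'hit'   if $program eq 'hmmscan';
    die "Unrecognized HMMER program: $program\n";
}

print hmm_coords_role('# hmmsearch :: search profile(s) against a sequence database'), "\n";
print hmm_coords_role('# hmmscan :: search sequence(s) against a profile database'), "\n";
```

A parser could call such a check once per report and then assign the "hmmfrom"/"hmm to" and "alifrom"/"ali to" columns to the query or hit side accordingly, rather than hard-coding one orientation.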
This information is not available in other HMMER command line output formats (tblout and domtblout), which as Peter has mentioned, required us to treat different flavors of the table output as different formats for the time being. Fortunately, after contacting the HMMER developers they mentioned that this is not the case anymore in their development branch (and their future planned release). Hope that helps :), Bow On Wed, Jul 11, 2012 at 11:00 PM, Peter Cock wrote: > On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J > wrote: >> Peng, >> >> Has this been filed as a bug yet? >> >> https://redmine.open-bio.org/projects/bioperl >> >> Seems like it would be fairly easy to fix, but I want to track it just in case. >> >> chris > > Hi all, > > This could be the unfortunate fact that hmmscan and > hmmsearch return very similar tabular output, but > with query and hit interchanged. i.e. You need some > extra information to know which way round they are > (not possible with the current output). This was an > issue in Bow's Biopython SearchIO project - which > for the moment he solved by handling this as two > hmmer file formats. In the medium term we're hoping > hmmer3 will add some header information or something. > > Peter From dejian.zhao at gmail.com Thu Jul 12 01:04:54 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Thu, 12 Jul 2012 13:04:54 +0800 Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com> References: <4FFDB879.1020906@gmail.com> <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com> Message-ID: <4FFE5AF6.1020300@gmail.com> Thank you, Peng. That's great! Actually I am wondering how to get the whole content of an object these days; "Dumping it" is a good solution. 
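[Editorial sketch] The "Class=HASH(0x...)" output discussed in this thread is ordinary Perl behaviour rather than anything specific to BioPerl: printing an object prints its reference, and the data has to be requested through an accessor method. The toy class below shows this with core Perl only. Toy::Seq is a hypothetical stand-in invented for the illustration; it is not a BioPerl class, and its revcom() method merely mimics the way a method can return a new object instead of a string.

```perl
use strict;
use warnings;

# Toy::Seq is a hypothetical stand-in for a sequence object (NOT BioPerl).
package Toy::Seq;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{seq} }, $class;
}

# Accessor returning the sequence string.
sub seq { return $_[0]{seq} }

# Returns a NEW Toy::Seq object holding the reverse complement.
sub revcom {
    my ($self) = @_;
    my $rc = scalar reverse $self->seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return Toy::Seq->new(seq => $rc);
}

package main;

my $obj = Toy::Seq->new(seq => 'CACACG');
print $obj->revcom, "\n";        # prints a reference such as Toy::Seq=HASH(0x...)
print $obj->revcom->seq, "\n";   # prints the string CGTGTG
```

The same pattern applies to the commands in this thread: chain an accessor onto the returned object, or loop over a returned list of objects and call a method on each element, instead of printing the objects themselves.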
On 2012-7-12 2:03, Peng Zhou wrote: > Hi, > > I guess that's what the commands are supposed to do: the get_SeqFeatures() > method return an array of Bio::SeqFeature objects, and the translate() > method returns a Bio::Seq object. And you can't simply "print" an object in > perl - you can "dump" it though: > > $ perl -e ' use Bio::SeqIO; use Data::Dumper; my > $seqio=Bio::SeqIO->new(-file=>shift); > print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb > > $ perl -e ' use Bio::SeqIO; use Data::Dumper; my > $seqio=Bio::SeqIO->new(-file=>shift); > print Dumper($seqio->next_seq()->translate()); ' nt.gb > > On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote: >> Hi, >> >> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and >> tested the Bio::SeqIO module as follows. The first 3 commands succeeded; >> however the last 2 failed. >> >> I think $seqio->next_seq() produces a Bio::Seq object which contains the >> sequence, features and annotation (according to the DESCRIPTION of >> "perldoc Bio::Seq") and thus the invocation of the methods >> get_SeqFeatures() and translate() should be valid. However, the results >> denied this idea. >> >> Will anyone explain what happened to the last 2 commands? I have >> encountered numerous cases of failures when testing the bioperl methods. >> I want to translate the mRNA sequence and extract the sequence features. >> What are the right commands? Thanks a lot! 
>> >> Best, >> Dejian >> >> >> >> PS: The commands and results >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->display_id(); ' nt.gb >> NM_053056 >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->molecule(); ' nt.gb >> mRNA >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->subseq(1,6); ' nt.gb >> CACACG >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb >> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) >> >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->translate(); ' nt.gb >> Bio::Seq::RichSeq=HASH(0x20a3e7b0) >> From dejian.zhao at gmail.com Thu Jul 12 01:14:33 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Thu, 12 Jul 2012 13:14:33 +0800 Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu> References: 
<4FFDB879.1020906@gmail.com> <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu> Message-ID: <4FFE5D39.6010406@gmail.com> Thank you, Jim. You are right. It works. This example deepens my understanding of OOP. On 2012-7-12 2:01, Jim Hu wrote: >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb >> > Bio::Seq::RichSeq=HASH(0x20a3e7b0) > ->translate returns a new Seq object. I think > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb > > should work (haven't tried it). From kai.blin at biotech.uni-tuebingen.de Thu Jul 12 09:43:19 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 12 Jul 2012 15:43:19 +0200 Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: <4FFED477.3090907@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-07-11 23:25, Wibowo Arindrarto wrote: Hi, > The current Biopython parser for the plain text format parses the > very first line to find out which HMMER flavor produces the result. > Both 'hmm from' and 'hmmto' are query coordinates if the flavor is > hmmsearch or phmmer; and they're hit coordinates if the flavor is > hmmscan. Whoops. I mostly looked at hmmscan when writing the parser, because that's the file format I needed for my code. The code clearly should follow the way the hmmer2 parser works, and differentiate between hmmsearch and hmmscan type output. As I said on the bug report, I'm happy to look at code fixing this. > This information is not available in other HMMER command line > output formats (tblout and domtblout), which as Peter has > mentioned, required us to treat different flavors of the table > output as different formats for the time being. 
As far as I'm aware, BioPerl currently doesn't parse the table output format. Seeing how much repeated pain we run into with all these parsers in the different Bio* projects, I wonder if there was a smarter way to deal with parsing. Maybe at least some shared grammar file that we could use for testing, to make sure we at least have the same expectations about file formats in the different language implementations. Ideally we'd auto-generate the parsers from the grammar specification, but I guess that'll stay wishful thinking for quite a bit. > Fortunately, after contacting the HMMER developers they mentioned > that this is not the case anymore in their development branch (and > their future planned release). That's certainly good news. :) Cheers, Kai - -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJP/tR3AAoJEKM5lwBiwTTP6OoIAM3J9chdyfmTuQTp4KMxVIk7 PCkJy+aLcnfa3d7s8BVPG0GWQTPrfHLX6a7zWfoSLzL9RBShFWCQIxGpu+Tq3yR8 Hu/TpoFIg8bB1iAroAWLdsX8nio3Idlcl5JN38LBsFEUirFrGAsvfdN/+fYrP5Ni y0ULP18uihiN07sVG88nZXNyEB7fIscVYdO90GsGq03/KOTRsRD4kugapiQJIy4D lrqnYznLa4p30lBDCEHbTaHYbfIs7/8tryfHJsfjimjg8IoSMHMJfIkI7/z0qlL+ bxt/HuGMsm1Ak08xEAoT7T00t5tcAp1gclgZsO/CrviOicmhUgd6iri/kIpzg0c= =acWd -----END PGP SIGNATURE----- From cjfields at illinois.edu Thu Jul 12 11:24:13 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 12 Jul 2012 15:24:13 +0000 Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: <4FFED477.3090907@biotech.uni-tuebingen.de> References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> 
<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> <4FFED477.3090907@biotech.uni-tuebingen.de> Message-ID: <1C3A31F9-9717-49F3-A880-FA725D0F3CDB@illinois.edu> On Jul 12, 2012, at 8:43 AM, Kai Blin wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 2012-07-11 23:25, Wibowo Arindrarto wrote: > > Hi, > >> The current Biopython parser for the plain text format parses the >> very first line to find out which HMMER flavor produces the result. >> Both 'hmm from' and 'hmmto' are query coordinates if the flavor is >> hmmsearch or phmmer; and they're hit coordinates if the flavor is >> hmmscan. > > Whoops. I mostly looked at hmmscan when writing the parser, because > that's the file format I needed for my code. The code clearly should > follow the way the hmmer2 parser works, and differentiate between > hmmsearch and hmmscan type output. > > As I said on the bug report, I'm happy to look at code fixing this. Seems like it should be easy enough to address if there is something in the output that indicates the report type. >> This information is not available in other HMMER command line >> output formats (tblout and domtblout), which as Peter has >> mentioned, required us to treat different flavors of the table >> output as different formats for the time being. > > As far as I'm aware, BioPerl currently doesn't parse the table output > format. The only reason to do so is if the table provides additional information the actual hits don't (this can be the case with BLAST reports). > Seeing how much repeated pain we run into with all these parsers in > the different Bio* projects, I wonder if there was a smarter way to > deal with parsing. Maybe at least some shared grammar file that we > could use for testing, to make sure we at least have the same > expectations about file formats in the different language > implementations. Ideally we'd auto-generate the parsers from the > grammar specification, but I guess that'll stay wishful thinking for > quite a bit. 
I would fully support something like this. I've been thinking about this with Marpa::XS (which now has a compiled library, libmarpa, to make it less perl-centric), and there have been talks of using a similar toolkit with the bioruby folks. We could always have a plain-perl/python/ruby/etc fallback for the most common formats.

chris

From buschj at hhu.de Sun Jul 15 15:46:42 2012
From: buschj at hhu.de (jobu)
Date: Sun, 15 Jul 2012 21:46:42 +0200
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches
Message-ID: <50031E22.3060902@hhu.de>

Dear All.

Still being a beginner in Perl and just having started to look into BioPerl, I hope to ask my question at the right place.

I locally ran a standalone blastn search of many short query-sequences against a set of target-fasta-sequences consisting of whole chromosomal sequence data. What I need to do now is to get, let's say, 100nt each Up- and Downstream out of my target sequences for each Blast match.

At this point I can only assume that BioPerl might be helpful in resolving this task, though I haven't found a module yet that will manage to do this locally on my hard drive. Thus I would be thankful for the slightest hint where to begin.

Sincerely
Jochen

From Russell.Smithies at agresearch.co.nz Sun Jul 15 17:19:14 2012
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 16 Jul 2012 09:19:14 +1200
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches
In-Reply-To: <50031E22.3060902@hhu.de>
References: <50031E22.3060902@hhu.de>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCF2A4CAA@exchsth.agresearch.co.nz>

Hi Jochen,

I don't think BioPerl can directly manipulate blast databases, so I'd probably do it with fastacmd to extract the sequence from the original blast database.

e.g.
fastacmd -s X51494.1 -d /dataset/blastdata/active/nt -L 100,200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

Or if you're using blast+, use the blastdbcmd command:

e.g.
blastdbcmd -entry X51494.1 -db /dataset/blastdata/active/nt -range 100-200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

So to put it all together: use BioPerl to parse your existing blast results and pull out each hit's coordinates, then use a system call to run fastacmd or blastdbcmd to extract the sequence from the blast database, then write the sequences to a file.

These might be useful:
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/HOWTO:SearchIO#Speed_improvements_with_lightweight_objects
http://www.bioperl.org/wiki/HOWTO:BlastPlus
http://www.bioperl.org/wiki/HOWTO:StandAloneBlast

--Russell

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jobu
Sent: Monday, 16 July 2012 7:47 a.m.
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches

Dear All.

Still being a beginner in Perl and just having started to look into BioPerl, I hope to ask my question at the right place.

I locally ran a standalone blastn search of many short query-sequences against a set of target-fasta-sequences consisting of whole chromosomal sequence data. What I need to do now is to get, let's say, 100nt each Up- and Downstream out of my target sequences for each Blast match.

At this point I can only assume that BioPerl might be helpful in resolving this task, though I haven't found a module yet that will manage to do this locally on my hard drive. Thus I would be thankful for the slightest hint where to begin.
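
The workflow Russell outlines (take each hit's coordinates, work out the flanking ranges, then ask blastdbcmd for them) can be sketched roughly as follows. This is only an illustration in plain Python, not BioPerl code: the function names `flanking_ranges` and `blastdbcmd_args` are made up for this sketch, the hit coordinates and subject length are assumed to come from whatever parser you use, and only the `-entry`, `-db` and `-range` options shown in the reply are relied on.

```python
# Hypothetical helpers: compute 1-based, inclusive flanking ranges around a
# BLAST hit on the subject sequence, clamped to the sequence boundaries.

def flanking_ranges(hit_start, hit_end, seq_len, flank=100):
    """Return (left, right) ranges as (start, end) tuples, or None when the
    hit already touches that end of the subject sequence."""
    lo, hi = sorted((hit_start, hit_end))  # minus-strand hits report start > end
    left = (max(1, lo - flank), lo - 1) if lo > 1 else None
    right = (hi + 1, min(seq_len, hi + flank)) if hi < seq_len else None
    return left, right


def blastdbcmd_args(entry, db, rng):
    """Build the argument list for a blastdbcmd call like the one shown above."""
    return ["blastdbcmd", "-entry", entry, "-db", db,
            "-range", "{}-{}".format(rng[0], rng[1])]


if __name__ == "__main__":
    # Illustrative numbers only: a hit at 150..250 on a 1000 nt subject.
    left, right = flanking_ranges(150, 250, 1000)
    for rng in (left, right):
        if rng is not None:
            print(" ".join(blastdbcmd_args("X51494.1",
                                           "/dataset/blastdata/active/nt", rng)))
```

Note that "left" and "right" are in subject coordinates; for a hit on the minus strand the left flank is biologically downstream, so the labels need swapping there. The argument lists can be passed to `subprocess.run` once blastdbcmd is installed.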
Sincerely Jochen

From dcmertens.perl at gmail.com Tue Jul 17 08:57:55 2012 From: dcmertens.perl at gmail.com (David Mertens) Date: Tue, 17 Jul 2012 07:57:55 -0500 Subject: [Bioperl-l] Announcing The Quantified Onion Google Group and perl4science.github.com Message-ID: Hello everybody - I returned from YAPC::NA this year intending to build-up the scientific Perl community. One outgrowth of this has been Joel Berger's creation of perl4science.github.com and gizmomathboy's creation of The Quantified Onion Google Group . perl4science is meant to be a landing page for anybody looking to combine Perl and science. Since it is a github repository, it makes it about as easy as possible for others to contribute content or fixes. If you have a project that scientists would find useful, you should fork the project, add your content, and issue a pull request. It's that easy. The Quantified Onion is meant to be a space for scientists to discuss how we use Perl in our science and to work together to grow adoption of Perl among scientists. 
It will undoubtedly attract newcomers to Perl asking beginner questions, at which point we will gently refer them to the appropriate manual pages. Interesting discussions thus far (in my mind) include a discussion about teaching test-driven design and a discussion about submitting an article to Computing in Science and Engineering for their November issue, which is supposed to be about Modern Programming Languages.

I would like to begin putting on workshops on Perl for Scientists and Engineers (and encourage others to do the same), and I will begin the discussion on The Quantified Onion.

If you know of other Perl science resources, please feel free to add them to perl4science or post them on The Quantified Onion, and please join The Quantified Onion. Together, we can grow Perl's adoption among scientists!

David Mertens

--
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." -- Brian Kernighan

From cjfields at illinois.edu Wed Jul 18 10:29:02 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 18 Jul 2012 14:29:02 +0000
Subject: [Bioperl-l] [blast-announce] OLD_BLAST parameter to be discontinued. Alternative NCBI BLAST parsable formats are available
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF2B63F6C5@CHIMBX5.ad.uillinois.edu>

Not sure if anyone is using this as a means of getting their reports (I don't), but I'm posting this here just in case.

-c

Begin forwarded message:

> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]"
> Subject: [blast-announce] OLD_BLAST parameter to be discontinued. Alternative NCBI BLAST parsable formats are available
> Date: July 18, 2012 9:17:05 AM CDT
> To: NLM/NCBI List blast-announce
>
> Beginning Sept. 10, 2012, the BLAST service will ignore the OLD_BLAST parameter in posted URLs.
We are removing this old and little used option to prepare for upcoming enhancements to the BLAST service later this year. Setting OLD_BLAST=true produces an older version of the BLAST HTML results that a few people have used for automated processing (parsing) of results. NCBI BLAST supports a number of different and more stable formats for parsing. These include XML, tabular reports and ASN.1. For more information, please see BLAST Developer Information (http://1.usa.gov/O8AocI) and links on that page. > From dejian.zhao at gmail.com Wed Jul 18 11:36:14 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Wed, 18 Jul 2012 23:36:14 +0800 Subject: [Bioperl-l] Which graphic module should I learn? Message-ID: <5006D7EE.1020205@gmail.com> Hi, all. Currently I am working on a genome. I will draw some pictures based on the sequencing data. In the long run, I will use the module in my future projects, so I want to learn a popular module to get better support from the community. I searched in cpan with the command "i /SVG/" and got 234 items. Which one is popular in bioinformatics? Which module should I start with? Thanks for any suggestions. Best, De-Jian From scott at scottcain.net Wed Jul 18 11:46:01 2012 From: scott at scottcain.net (Scott Cain) Date: Wed, 18 Jul 2012 11:46:01 -0400 Subject: [Bioperl-l] Which graphic module should I learn? In-Reply-To: <5006D7EE.1020205@gmail.com> References: <5006D7EE.1020205@gmail.com> Message-ID: Hi De-Jian, Of course, it depends on what you want to do, but if you're referring to the genome feature/annotation type graphics, Bio::Graphics already supports SVG pretty well, via GD::SVG. Scott On Wed, Jul 18, 2012 at 11:36 AM, De-Jian Zhao wrote: > Hi, all. > > Currently I am working on a genome. I will draw some pictures based on the > sequencing data. In the long run, I will use the module in my future > projects, so I want to learn a popular module to get better support from the > community. 
I searched in cpan with the command "i /SVG/" and got 234 items. > Which one is popular in bioinformatics? Which module should I start with? > Thanks for any suggestions. > > Best, > De-Jian > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Tue Jul 24 23:08:05 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 25 Jul 2012 03:08:05 +0000 Subject: [Bioperl-l] BioPerl and Travis-CI Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Peter Cock has graciously helped start up a branch for bioperl-live that is using Travis-CI (a nice continuous integration tool). Results from Peter's fork are found here: http://travis-ci.org/#!/peterjc/bioperl-live As this is now pulled into the main bioperl repo, results will be here: http://travis-ci.org/#!/bioperl/bioperl-live I'll be working on this and expect this will be added to master in the next few days. chris From p.j.a.cock at googlemail.com Wed Jul 25 06:31:13 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 25 Jul 2012 11:31:13 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J wrote: > Peter Cock has graciously helped start up a branch for bioperl-live > that is using Travis-CI (a nice continuous integration tool). 
Results > from Peter's fork are found here: > > http://travis-ci.org/#!/peterjc/bioperl-live > > As this is now pulled into the main bioperl repo, results will be here: > > http://travis-ci.org/#!/bioperl/bioperl-live > > I'll be working on this and expect this will be added to master in > the next few days. > > chris We've had this running for Biopython for a month now, and it has been a useful complement to the BuildBot (which covers other operating systems). This was following BioRuby's lead: http://biopython.org/pipermail/biopython-dev/2012-June/009742.html The current BioPerl Travis configuration is probably usable right now (after changing the branch whitelist to either master, or simple all branches). Other remaining issues include sorting out which dependencies should be installed, and streamlining their verbose output (e.g. using tail). TravisCI can send out emails (e.g. on test failures), and perhaps bioperl-guts-l might be a sensible place to send these. Initially we'd disabled the emails for Biopython. I'd like to use an RSS feed... there is a JSON API which BioRuby are using for http://www.biogems.info/ which tracks their plugins. 
Peter From p.j.a.cock at googlemail.com Fri Jul 27 11:03:05 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 27 Jul 2012 16:03:05 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> Message-ID: On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J wrote: > On Jul 27, 2012, at 9:47 AM, Peter Cock wrote: > >> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: >>> >>> That's done now - except for the circular dependencies, and GD, >>> which might be easy to solve if anyone knows what the error >>> means - see commit message here: >>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a >> >> Re: https://twitter.com/cjfields/status/228861370454638592 >> Not sure why you got GD to work when something very similar >> had failed for me. Oh well - job done :) > > It was the lack of gdlib-config in the libgd2-xpm package, you need > libgd2-xpm-dev. One of the fun things about Debian packaging. Ah - I should have guessed that. >>> Would a single clean commit of the (current) .travis.yml file be >>> preferable to the current series of commits? And you you want >>> a pull request, or would you just merge/cherry-pick manually? >> >> Given all the churn between our revisions, personally I'd opt for >> a single clean commit to bioperl/master - but your call. >> >> Peter > > Yep, about to merge it over. It's working now, just need to > whitelist master instead of travis after the merge. I'd removed the whitelist altogether here: https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd My thinking was BioPerl seems to have multiple feature branches under the official repo, so they should get tested too. 
You'd be in a better position than me to judge what would work best for BioPerl here. Peter From cjfields at illinois.edu Fri Jul 27 10:58:21 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 27 Jul 2012 14:58:21 +0000 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote: > On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: >> >> That's done now - except for the circular dependencies, and GD, >> which might be easy to solve if anyone knows what the error >> means - see commit message here: >> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a > > Re: https://twitter.com/cjfields/status/228861370454638592 > Not sure why you got GD to work when something very similar > had failed for me. Oh well - job done :) It was the lack of gdlib-config in the libgd2-xpm package, you need libgd2-xpm-dev. One of the fun things about Debian packaging. >> Would a single clean commit of the (current) .travis.yml file be >> preferable to the current series of commits? And you you want >> a pull request, or would you just merge/cherry-pick manually? > > Given all the churn between our revisions, personally I'd opt for > a single clean commit to bioperl/master - but your call. > > Peter Yep, about to merge it over. It's working now, just need to whitelist master instead of travis after the merge. 
chris From cjfields at illinois.edu Fri Jul 27 12:26:34 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 27 Jul 2012 16:26:34 +0000 Subject: [Bioperl-l] BioPerl Travis-CI now live Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D54D@CITESMBX5.ad.uillinois.edu> All commits to bioperl-live master branch on github are now being tracked: http://travis-ci.org/#!/bioperl/bioperl-live The .travis.yml file has a whitelist for branches to be tested; if anyone wants to test additional branches feel free to add them to the list! chris From cjfields at illinois.edu Fri Jul 27 11:15:19 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 27 Jul 2012 15:15:19 +0000 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D2D6@CITESMBX5.ad.uillinois.edu> On Jul 27, 2012, at 10:03 AM, Peter Cock wrote: > On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J > wrote: >> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote: >> >>> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: >>>> >>>> That's done now - except for the circular dependencies, and GD, >>>> which might be easy to solve if anyone knows what the error >>>> means - see commit message here: >>>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a >>> >>> Re: https://twitter.com/cjfields/status/228861370454638592 >>> Not sure why you got GD to work when something very similar >>> had failed for me. Oh well - job done :) >> >> It was the lack of gdlib-config in the libgd2-xpm package, you need >> libgd2-xpm-dev. One of the fun things about Debian packaging. > > Ah - I should have guessed that. > >>>> Would a single clean commit of the (current) .travis.yml file be >>>> preferable to the current series of commits? 
And you you want >>>> a pull request, or would you just merge/cherry-pick manually? >>> >>> Given all the churn between our revisions, personally I'd opt for >>> a single clean commit to bioperl/master - but your call. >>> >>> Peter >> >> Yep, about to merge it over. It's working now, just need to >> whitelist master instead of travis after the merge. > > I'd removed the whitelist altogether here: > https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd > > My thinking was BioPerl seems to have multiple feature branches > under the official repo, so they should get tested too. You'd be > in a better position than me to judge what would work best for > BioPerl here. > > Peter We'll keep it to master for now. It's pretty easy to add branches as needed, and I didn't want to expand to all the potentially stale branches unless explicitly set (we need to triage all those at some point). chris From p.j.a.cock at googlemail.com Fri Jul 27 10:47:18 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 27 Jul 2012 15:47:18 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: > > That's done now - except for the circular dependencies, and GD, > which might be easy to solve if anyone knows what the error > means - see commit message here: > https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a Re: https://twitter.com/cjfields/status/228861370454638592 Not sure why you got GD to work when something very similar had failed for me. Oh well - job done :) > Would a single clean commit of the (current) .travis.yml file be > preferable to the current series of commits? And you you want > a pull request, or would you just merge/cherry-pick manually? 
Given all the churn between our revisions, personally I'd opt for a single clean commit to bioperl/master - but your call. Peter From robfsouza at gmail.com Fri Jul 27 18:29:22 2012 From: robfsouza at gmail.com (Robson de Souza) Date: Fri, 27 Jul 2012 15:29:22 -0700 (PDT) Subject: [Bioperl-l] obf sites offline? Message-ID: <9bef8a3b-08ca-4868-be7a-193e7596290d@googlegroups.com> I can't access any of the OBF sites, either from work (USA) or my phone... is there something going on? Robson From p.j.a.cock at googlemail.com Thu Jul 26 11:22:26 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 26 Jul 2012 16:22:26 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: On Wed, Jul 25, 2012 at 11:31 AM, Peter Cock wrote: > On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J > wrote: >> Peter Cock has graciously helped start up a branch for bioperl-live >> that is using Travis-CI (a nice continuous integration tool). Results >> from Peter's fork are found here: >> >> http://travis-ci.org/#!/peterjc/bioperl-live >> >> As this is now pulled into the main bioperl repo, results will be here: >> >> http://travis-ci.org/#!/bioperl/bioperl-live >> >> I'll be working on this and expect this will be added to master in >> the next few days. >> >> chris > > We've had this running for Biopython for a month now, and it has > been a useful complement to the BuildBot (which covers other > operating systems). This was following BioRuby's lead: > http://biopython.org/pipermail/biopython-dev/2012-June/009742.html > > The current BioPerl Travis configuration is probably usable right > now (after changing the branch whitelist to either master, or simple > all branches). > > Other remaining issues include sorting out which dependencies > should be installed, and streamlining their verbose output (e.g. > using tail). 
That's done now - except for the circular dependencies, and GD, which might be easy to solve if anyone knows what the error means - see commit message here: https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a Would a single clean commit of the (current) .travis.yml file be preferable to the current series of commits? And you you want a pull request, or would you just merge/cherry-pick manually? > TravisCI can send out emails (e.g. on test failures), and perhaps > bioperl-guts-l might be a sensible place to send these. Initially > we'd disabled the emails for Biopython. I'd like to use an RSS > feed... there is a JSON API which BioRuby are using for > http://www.biogems.info/ which tracks their plugins. I've filed an issue for news feed support in TravisCI, https://github.com/travis-ci/travis-core/issues/82 Regards, Peter From p.j.a.cock at googlemail.com Tue Jul 31 06:37:35 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 31 Jul 2012 11:37:35 +0100 Subject: [Bioperl-l] Travis Continuous Integration testing & pull requests Message-ID: Hi all, I'm cross posting as this is an announcement. Please keep any follow up discussion to the relevant project specific mailing list, or if general open-bio-l please. Those following the OBF blog or the OBF or Bio* Twitter accounts will have already seen this, which I posted yesterday: http://news.open-bio.org/news/2012/07/travis-ci-for-testing/ In summary, since earlier this year BioRuby and then Biopython and BioPerl have been using Travis-CI.org (a hosted continuous integration service for the open source community) to run their unit tests automatically whenever their GitHub repositories are updated. 
In addition we now have TravisCI automatically running our tests on any new GitHub pull requests - supported by an OBF donation to Travis-CI, see: http://about.travis-ci.org/blog/announcing-pull-request-support/ Currently BioJava only uses GitHub as an SVN mirror - but this should still let you start using TravisCI for automated testing: http://about.travis-ci.org/docs/user/languages/java/ For EMBOSS, this is another incentive to convert from CVS to github - TravisCI recently announced support for C/C++ projects: http://about.travis-ci.org/blog/support_for_go_c_and_cpp/ http://about.travis-ci.org/docs/user/languages/c/ Potentially there are other OBF projects where this would be useful too. Regards, Peter
From shalabh.sharma7 at gmail.com Mon Jul 2 13:09:57 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 2 Jul 2012 13:09:57 -0400 Subject: [Bioperl-l] translation frame problem in bioperl Message-ID: Hi All, I am just confused about the translation frames. I used bioperl to parse a blastx report. Reports shows that the frame used is -2 but when i translate the sequence using EMBOSS or Some other program the frame is -1. Am i doing something wrong here. Here is the sequence: >gi|378759230|gb|AHBJ01000169.1| SAR86 cluster bacterium SAR86D scf1120176765857, whole genome shotgun sequence 2642:3697 
AGCTTCCCATGGAACCCATGCAAGTGCAATATTTGTTTCTAGCTCTGGTGACCACCAAGGAGATGTCACGTAGCCCACCTCATCTTCATCAGTATTAGTTACTATCCAAAAATCAGAAGCATAATCTGTGATTTCTTTTCCTCCAAGGGTTAAACCAACCATCTTCATTTTAAATGGTGCATTTCCTTCATCTATGATTGCTCTCTGTTTTTCAAGCTCTTCTTTACCAATGTAATCAGCTGCTTTATTTCTTGGTACCTGATAACTTAAATTAACCTGAAAGGGAGAAGTTTCATGATCCAGATCTTGTCCCCAAGACAAAATTCCAGCTGCAATGCGACGATGATGCGCAGGAGCTATGACCATTAAGCCAAATTCTTCTCCAGCCTCAAGAACAGCATTCCACATTTTTTCTGCATTATCATGTGCGTCACGAACATATATTTCATAACCTTTTTCGCCTGTAAAACCAGTTTGACTGATTACACAATCAGCTCCACCAACCTGAGTTTCTAAAATTCCATAATAAGGAACTTCTCTTAACTCTTCGCCAGCTAACTTTGCCATAAGATCTTCAGATAAAGGGCCTTGAATTTGAACAGGACAAACATCAATCTCATCAATTTCTACGTCATATTTTTTAGACACATTTACGCCTTGAAGCCAAAGTAAGAGATCGCTGTCTGATATTGAGAACCAGAATTCATCTTCTGTTAGTCTTAATAGAACAGGGTCATTTAAAACCCCTCCTTTTTCATTGCATAAAATCGCATATTTACCATTTCCGGGTTTAATTTTTGTAGCATCACGAGTTATTACATAATCTGTAAAAGCTTCTGCATCTGGACCTTTTACTCTTATCTGTCTTTCAACAGCAACATTCCACATAGTAACTCTATTAACCAAGGCTTCGTATTCAACCATGGCACCGCCATCTTCAGGTTTTACATAGCCTCGTGGATGATAAATTCGATTATATACAGTTGCTCTCCAACAGCCCGCTTCATGAGATAGATGCCAAAAAGGCGATTTTCTTACCCGGGTTGAAATTAATAA This is a part of blast report by bioperl: >JCVI_READ_1105499496127 /Indian_Ocean/gcvT Length = 352 Score = 655 bits (1690), Expect = 0.0 Identities = 311/352 (88%), Positives = 329/352 (93%) Frame = -2 Query: 3697 LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV 3518 +LISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGY+KPEDGGAMVEY+ALVNRVTMWNV Sbjct: 1 MLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYIKPEDGGAMVEYDALVNRVTMWNV 60 ..... ..... 
Query: 2797 GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA 2642
            GLTLGGKEITDYA DFW+V + D   +     PWWSPEL TNIAL WVPW A
Sbjct: 301  GLTLGGKEITDYAPDFWLVADMDGMMLDISLPPWWSPELNTNIALGWVPWSA 352

This is EMBOSS output (from EBI):

>EMBOSS_001_4
LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV
AVERQIRVKGPDAEAFTDYVITRDATKIKPGNGKYAILCNEKGGVLNDPVLLRLTEDEFW
FSISDSDLLLWLQGVNVSKKYDVEIDEIDVCPVQIQGPLSEDLMAKLAGEELREVPYYGI
LETQVGGADCVISQTGFTGEKGYEIYVRDAHDNAEKMWNAVLEAGEEFGLMVIAPAHHRR
IAAGILSWGQDLDHETSPFQVNLSYQVPRNKAADYIGKEELEKQRAIIDEGNAPFKMKMV
GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA
>EMBOSS_001_5
INFNPGKKIAFLASIS*SGLLESNCI*SNLSSTRLCKT*RWRCHG*IRSLG**SYYVECC
C*KTDKSKRSRCRSFYRLCNNS*CYKN*TRKW*ICDFMQ*KRRGFK*PCSIKTNRR*ILV
......

You can see it's frame -1.

I would really appreciate your help.

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From scott at scottcain.net Mon Jul 2 14:50:45 2012
From: scott at scottcain.net (Scott Cain)
Date: Mon, 2 Jul 2012 14:50:45 -0400
Subject: [Bioperl-l] GMOD Summer School application deadline
Message-ID:

Hello,

The deadline to apply for the GMOD Summer School is in one week, July 9th. The application is available as a Google Form:

https://docs.google.com/spreadsheet/embeddedform?formkey=dG5hNGFiQ3UwYTV2LUZxZW04Qm1yZXc6MQ

In the GMOD Summer School (August 24-29, 2012) we will cover the installation, configuration and use of a variety of GMOD tools, including Chado, GBrowse, JBrowse and Tripal. For more information on the course, see the course web page at

http://gmod.org/wiki/2012_GMOD_Summer_School

The course will make heavy use of Amazon Web Services (aka, the Cloud) via a grant from Amazon. Enrollment is limited to 24 students, and the application process is competitive: the last few years we've received over 75 applications for those 24 spots.

I look forward to seeing you in North Carolina in August!
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                      scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)     216-392-3087
Ontario Institute for Cancer Research

From p.j.a.cock at googlemail.com Mon Jul 2 15:34:40 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 2 Jul 2012 20:34:40 +0100
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To:
References:
Message-ID:

On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma wrote:
> Hi All,
> I am just confused about the translation frames. I used bioperl to
> parse a blastx report.
> Reports shows that the frame used is -2 but when i translate the sequence
> using EMBOSS or Some other program the frame is -1.
> Am i doing something wrong here.

Possibly there are conflicting definitions of frames -1, -2, and -3 here (and that's leaving out the possibility of -0, -1 and -2 counting). Some will count from the first base (start for forward strand), others the last base (start of reverse strand). This can make comparing the output of different tools quite confusing.

Peter

From shalabh.sharma7 at gmail.com Mon Jul 2 16:39:29 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 2 Jul 2012 16:39:29 -0400
Subject: [Bioperl-l] translation frame problem in bioperl
In-Reply-To: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>
References: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net>
Message-ID:

Hi Peter and Brian,

Thanks a lot for your reply. I have already taken this into account. If I parse the blast report (my previous example) I get strand '-1' and frame '1' (according to bioperl), so if we convert it to the general convention then it's -2, because bioperl starts from 0. Also, forward-frame translation in bioperl is working fine.

Thanks
Shalabh

On Mon, Jul 2, 2012 at 4:24 PM, Brian Osborne wrote:
> Shalabh,
>
> Also take a look at this:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29
>
> Brian O.
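
Peter's point about conflicting frame definitions can be made concrete with a small sketch. It is plain Python rather than BioPerl, the function names are invented for illustration, and it does not claim which tool follows which convention (the thread does not say; check each tool's documentation). It computes the reverse-strand frame of a hit two ways: counting from the last base of the sequence, and reusing the codon boundaries of the forward frames.

```python
# Illustrative only: two ways of numbering reverse-strand reading frames.
# Coordinates are 1-based; hit_end is the highest coordinate of the hit.

def minus_frame_from_last_base(seq_len, hit_end):
    """Frame -1 starts at the last base of the sequence, -2 one base in, etc."""
    return -((seq_len - hit_end) % 3 + 1)


def minus_frame_shared_codons(hit_end):
    """Frame -N uses the same codon boundaries as forward frame N."""
    return -(hit_end % 3 + 1)


if __name__ == "__main__":
    # The extracted region 2642..3697 is 1056 nt long, a multiple of three,
    # so a hit ending at its last base is frame -1 under both definitions.
    print(minus_frame_from_last_base(1056, 1056), minus_frame_shared_codons(1056))

    # Hypothetical 1057 nt sequence: the two definitions now disagree.
    print(minus_frame_from_last_base(1057, 1057), minus_frame_shared_codons(1057))

    # Hypothetical full contig of 3698 nt with the hit ending at 3697:
    # counted from the contig's last base, the same hit is frame -2.
    print(minus_frame_from_last_base(3698, 3697))
```

The last case shows one way a mismatch like the one reported could arise, assuming the search was run against the whole contig while the translation was done on the extracted 2642:3697 region: the frame depends on the length of the sequence it is counted against. The contig length used here is made up, so treat this as a possibility to check rather than a diagnosis.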
> > > On Jul 2, 2012, at 3:34 PM, Peter Cock wrote: > > > On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma > > wrote: > >> Hi All, > >> I am just confused about the translation frames. I used > bioperl to > >> parse a blastx report. > >> Reports shows that the frame used is -2 but when i translate the > sequence > >> using EMBOSS or Some other program the frame is -1. > >> Am i doing something wrong here. > > > > Possibly there are conflicting definitions of frames -1, -2, and -3 here > > (and that's leaving out the possibility of -0, -1 and -2 counting). Some > > will count from the first base (start for forward strand), others the > last > > base (start of reverse strand). This can make comparing the output > > of different tools quite confusing. > > > > Peter > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From bosborne11 at verizon.net Mon Jul 2 16:24:24 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 02 Jul 2012 16:24:24 -0400 Subject: [Bioperl-l] translation frame problem in bioperl In-Reply-To: References: Message-ID: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net> Shalabh, Also take a look at this: http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29 Brian O. On Jul 2, 2012, at 3:34 PM, Peter Cock wrote: > On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma > wrote: >> Hi All, >> I am just confused about the translation frames. I used bioperl to >> parse a blastx report. >> Reports shows that the frame used is -2 but when i translate the sequence >> using EMBOSS or Some other program the frame is -1. >> Am i doing something wrong here. 
> > Possibly there are conflicting definitions of frames -1, -2, and -3 here > (and that's leaving out the possibility of -0, -1 and -2 counting). Some > will count from the first base (start for forward strand), others the last > base (start of reverse strand). This can make comparing the output > of different tools quite confusing. > > Peter > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From vebaev at gmail.com Tue Jul 3 12:35:26 2012 From: vebaev at gmail.com (vebaev at gmail.com) Date: Tue, 3 Jul 2012 09:35:26 -0700 (PDT) Subject: [Bioperl-l] CFP - International Conference on Bioinformatics and Computational Biology - BIOCOMP BG 2012 Message-ID: <7b498b4c-2b2e-4e1f-871f-513203488bf1@googlegroups.com> International Conference on Bioinformatics and Computational Biology - BIOCOMP BG 2012 September 20-21, 2012, Varna, Bulgaria Dear Colleague, It is our pleasure to circulate the 2nd announcement of the International Conference on Bioinformatics and Computational Biology - BIOCOMP 2012 (http://biocomp.bio.uni-plovdiv.bg/). Keynote speakers Prof. Dr. Klaas Vandepoele - Ghent University, Belgium Dr. Andreas Gisel - Institute for Biomedical Technologies, Italy Prof. Wojciech Karlowski - Insitute of Molecular Biology and Biotechnology, Poland Prof. Mario A. 
Fares - University of Dublin, Trinity College, Ireland
Dr. Andrey Kajava - CRBM - Macromolecular Biochemistry Research Center, France
Dr. Gaurav Sablok - Istituto Agrario San Michele (IASMA), Italy

Topics

Topics of interest include, but are not limited to:

High-performance bio-computing
High-throughput sequencing data analysis (NGS)
Bio-ontologies
Molecular evolution
Comparative genomics
Molecular modeling and simulation
Computational genetics
Computational proteomics
Data mining and visualization
Software tools and applications
Gene expression analysis
Gene networks
Structural biology
Genome analysis
Databases
Systems biology
Special topic: bioinformatics and miRNAs

Recent achievements in these fields will be presented. The conference will include plenary and poster sessions. Participants' proposals will be taken under advisement in compiling the program.

Publications

All accepted abstracts will be published in the conference abstract book. The best 20 abstracts will be peer-reviewed and published as full-text manuscripts in a Special Issue of Springer and Elsevier journals:

Interdisciplinary Sciences: Computational Life Sciences (ISSN: 1867-1462)
Journal of Computational Science (ISSN: 1877-7503)

Venue

The venue of the conference is the 4-star All-inclusive Sunny Day Black Sea resort, Bulgaria.

Registration and abstract submission

All actions related to BIOCOMP 2012 (abstract submission, registration, etc.) may be completed via the Conference website at http://biocomp.bio.uni-plovdiv.bg/

Accommodation

IMPORTANT: Accommodation is included in the conference registration fee.

Important dates

Abstract Submission Deadline - 20 August 2012
Early Registration Fee Payment Deadline - 20 August 2012
Arriving, Poster set up, Registration - 19 September 2012
Plenary and Poster Sessions - 20-21 September 2012

You may find details of the Conference on the Conference website at http://biocomp.bio.uni-plovdiv.bg/

Looking forward to seeing you in Bulgaria!
------------------------------------------------ Dr. Vesselin Baev Research Assistant Professor University of Plovdiv Dept. Plant Phys. and Molecular Biology Bioinformatics SMART Group Tzar Assen 24,Plovdiv 4000, BULGARIA Office:+359 32 261 (560); Mobile:+359 89 43 80 945 vebaev at gmail.com; baev at uni-plovdiv.bg; CV: http://plantgene.eu/ From tarakaramji at gmail.com Tue Jul 3 15:33:43 2012 From: tarakaramji at gmail.com (Tarakaramji Moturu) Date: Tue, 3 Jul 2012 19:33:43 +0000 (UTC) Subject: [Bioperl-l] Invitation to connect on LinkedIn Message-ID: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod> LinkedIn ------------ I'd like to add you to my professional network on LinkedIn. - Tarakaramji Tarakaramji Moturu Student at GITAM University Vishakhapatnam Area, India Confirm that you know Tarakaramji Moturu: https://www.linkedin.com/e/1505z7-h47dlkop-69/isd/7726719493/9xC087NO/?hs=false&tok=2UuxBwCCkl7Rk1 -- You are receiving Invitation to Connect emails. Click to unsubscribe: http://www.linkedin.com/e/1505z7-h47dlkop-69/q7l5PgNeLXh3mAgNJzs79PDWzhT0l80xWa/goo/bioperl-l%40bioperl%2Eorg/20061/I2613636655_1/?hs=false&tok=0hY4YIDwkl7Rk1 (c) 2012 LinkedIn Corporation. 2029 Stierlin Ct, Mountain View, CA 94043, USA. From l.m.timmermans at students.uu.nl Wed Jul 4 03:16:34 2012 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Wed, 4 Jul 2012 10:16:34 +0300 Subject: [Bioperl-l] Invitation to connect on LinkedIn In-Reply-To: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod> References: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod> Message-ID: On Tue, Jul 3, 2012 at 10:33 PM, Tarakaramji Moturu wrote: > LinkedIn > ------------ > > > > I'd like to add you to my professional network on LinkedIn. 
> > - Tarakaramji Sending messages like this directly over mailinglists is a rather bad idea, if only because LinkedIn will think bioperl-l at bioperl.org is one of the email addresses of whomever accepts the request (which is relevant for retrieving a lost password, I think). Leon From ulrik.stervbo at gmail.com Fri Jul 6 03:03:08 2012 From: ulrik.stervbo at gmail.com (Ulrik Stervbo) Date: Fri, 6 Jul 2012 09:03:08 +0200 Subject: [Bioperl-l] problem in using protparam.pm module In-Reply-To: References: <9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu> Message-ID: I had the same problem, and realized it is because I am behind a proxy. This is what I did to the Protparam module: Changed the url to 'http://web.expasy.org/cgi-bin/protparam/protparam' as previously found Added: $browser->proxy(['http'], 'http://[my proxy]/'); after initialization of the LWP agent. The proxy settings is what made Perl choke. (If only one could make perl see global proxy settings). Cheers, Ulrik 2011/7/28 Shachi Gahoi : > Please help me how to run protparam using bioperl module > > On Wed, Jul 27, 2011 at 11:05 AM, Chris Fields wrote: > >> The web service appears to have changed, but it looks as if no tests have >> been written up for this module which would have caught this out. We can >> write some basic tests up to check for simple functionality. >> >> chris >> >> On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote: >> >> > Dear All, >> > >> > i am using protparam.pm module. but when i am running this script it is >> > printing one error message >> > >> > "Can't call method "throw" without a package or object reference at >> > /usr/share/perl5/Bio/Root/Root.pm line 368, line 1." >> > >> > Kindly help me to solve this problem. 
>> > >> > >> > Script is here---- >> > >> ################################################################################### >> > #!/usr/bin/perl >> > >> > use warnings; >> > use Bio::SeqIO; >> > use Bio::Tools::Protparam; >> > >> > >> > $seqfile='test1.fasta'; >> > >> > $seqio = Bio::SeqIO->new(-file => "$seqfile", -format => 'Fasta'); >> > >> > >> > while( $seq = $seqio->next_seq() ) >> > { >> > >> > >> > my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq); >> > >> > print >> > "ID : ", $seq->display_id,"\n", >> > "Amino acid number : ",$pp->amino_acid_number(),"\n", >> > "Number of negative amino acids : ",$pp->num_neg(),"\n", >> > "Number of positive amino acids : ",$pp->num_pos(),"\n", >> > "Molecular weight : ",$pp->molecular_weight(),"\n", >> > "Theoretical pI : ",$pp->theoretical_pI(),"\n", >> > "Total number of atoms : ", $pp->total_atoms(),"\n", >> > "Number of carbon atoms : ",$pp->num_carbon(),"\n", >> > "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n", >> > "Number of nitrogen atoms : ",$pp->num_nitro(),"\n", >> > "Number of oxygen atoms : ",$pp->num_oxygen(),"\n", >> > "Number of sulphur atoms : ",$pp->num_sulphur(),"\n", >> > "Half life : ", $pp->half_life(),"\n", >> > "Instability Index : ", $pp->instability_index(),"\n", >> > "Stability class : ", $pp->stability(),"\n", >> > "Aliphatic_index : ",$pp->aliphatic_index(),"\n", >> > "Gravy : ", $pp->gravy(),"\n", >> > "Composition of A : ", $pp->AA_comp('A'),"\n", >> > "Composition of R : ", $pp->AA_comp('R'),"\n", >> > "Composition of N : ", $pp->AA_comp('N'),"\n", >> > "Composition of D : ", $pp->AA_comp('D'),"\n", >> > "Composition of C : ", $pp->AA_comp('C'),"\n", >> > "Composition of Q : ", $pp->AA_comp('Q'),"\n", >> > "Composition of E : ", $pp->AA_comp('E'),"\n", >> > "Composition of G : ", $pp->AA_comp('G'),"\n", >> > "Composition of H : ", $pp->AA_comp('H'),"\n", >> > "Composition of I : ", $pp->AA_comp('I'),"\n", >> > "Composition of L : ", $pp->AA_comp('L'),"\n", >> > "Composition of 
K : ", $pp->AA_comp('K'),"\n", >> > "Composition of M : ", $pp->AA_comp('M'),"\n", >> > "Composition of F : ", $pp->AA_comp('F'),"\n", >> > "Composition of P : ", $pp->AA_comp('P'),"\n", >> > "Composition of S : ", $pp->AA_comp('S'),"\n", >> > "Composition of T : ", $pp->AA_comp('T'),"\n", >> > "Composition of W : ", $pp->AA_comp('W'),"\n", >> > "Composition of Y : ", $pp->AA_comp('Y'),"\n", >> > "Composition of V : ", $pp->AA_comp('V'),"\n", >> > "Composition of B : ", $pp->AA_comp('B'),"\n", >> > "Composition of Z : ", $pp->AA_comp('Z'),"\n", >> > "Composition of X : ", $pp->AA_comp('X'),"\n"; >> > } >> > >> ################################################################################### >> > >> > >> > >> > >> > -- >> > Regards, >> > Shachi >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > Regards, > Shachi > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Fri Jul 6 13:49:46 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Fri, 6 Jul 2012 10:49:46 -0700 Subject: [Bioperl-l] problem in using protparam.pm module In-Reply-To: References: <9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu> Message-ID: <8C9056B6-1DA4-4BE0-B008-429C2F6C05BE@gmail.com> you might try the PERL_LWP_ENV_PROXY and HTTP_PROXY env variables http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#CONSTRUCTOR_METHODS http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#Proxy_attributes I can't test it my end though w/o a proxy service. On Jul 6, 2012, at 12:03 AM, Ulrik Stervbo wrote: > I had the same problem, and realized it is because I am behind a proxy. 
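[Editorial sketch] The proxy suggestions in this thread can be tried outside the Protparam module with a few lines of LWP. This is a minimal sketch, not the module's actual code: the proxy address is a made-up placeholder, and it assumes a libwww-perl version whose LWP::UserAgent constructor accepts the env_proxy option (the PERL_LWP_ENV_PROXY variable mentioned above is meant to switch the same behaviour on by default).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical proxy address, for illustration only.
my $proxy_url = 'http://proxy.example.org:3128/';

# Option 1: let LWP pick the proxy up from the environment.
# Normally http_proxy is already set by the shell; it is set here only
# so that the sketch is self-contained.
local $ENV{http_proxy} = $proxy_url;
my $ua_env = LWP::UserAgent->new( env_proxy => 1 );
print "From environment: ", ( $ua_env->proxy('http') // 'none' ), "\n";

# Option 2: set the proxy explicitly, as in the change described above.
my $ua_explicit = LWP::UserAgent->new;
$ua_explicit->proxy( ['http'], $proxy_url );
print "Set explicitly:   ", ( $ua_explicit->proxy('http') // 'none' ), "\n";
```

If option 1 works in your environment, no edit to the module source should be needed, since the setting then lives in the shell rather than in the code.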
> > This is what I did to the Protparam module: > Changed the url to 'http://web.expasy.org/cgi-bin/protparam/protparam' > as previously found > > Added: > $browser->proxy(['http'], 'http://[my proxy]/'); after initialization > of the LWP agent. > > The proxy settings is what made Perl choke. (If only one could make > perl see global proxy settings). > > Cheers, > Ulrik > > 2011/7/28 Shachi Gahoi : >> Please help me how to run protparam using bioperl module >> >> On Wed, Jul 27, 2011 at 11:05 AM, Chris Fields wrote: >> >>> The web service appears to have changed, but it looks as if no tests have >>> been written up for this module which would have caught this out. We can >>> write some basic tests up to check for simple functionality. >>> >>> chris >>> >>> On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote: >>> >>>> Dear All, >>>> >>>> i am using protparam.pm module. but when i am running this script it is >>>> printing one error message >>>> >>>> "Can't call method "throw" without a package or object reference at >>>> /usr/share/perl5/Bio/Root/Root.pm line 368, line 1." >>>> >>>> Kindly help me to solve this problem. 
>>>> >>>> >>>> Script is here---- >>>> >>> ################################################################################### >>>> #!/usr/bin/perl >>>> >>>> use warnings; >>>> use Bio::SeqIO; >>>> use Bio::Tools::Protparam; >>>> >>>> >>>> $seqfile='test1.fasta'; >>>> >>>> $seqio = Bio::SeqIO->new(-file => "$seqfile", -format => 'Fasta'); >>>> >>>> >>>> while( $seq = $seqio->next_seq() ) >>>> { >>>> >>>> >>>> my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq); >>>> >>>> print >>>> "ID : ", $seq->display_id,"\n", >>>> "Amino acid number : ",$pp->amino_acid_number(),"\n", >>>> "Number of negative amino acids : ",$pp->num_neg(),"\n", >>>> "Number of positive amino acids : ",$pp->num_pos(),"\n", >>>> "Molecular weight : ",$pp->molecular_weight(),"\n", >>>> "Theoretical pI : ",$pp->theoretical_pI(),"\n", >>>> "Total number of atoms : ", $pp->total_atoms(),"\n", >>>> "Number of carbon atoms : ",$pp->num_carbon(),"\n", >>>> "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n", >>>> "Number of nitrogen atoms : ",$pp->num_nitro(),"\n", >>>> "Number of oxygen atoms : ",$pp->num_oxygen(),"\n", >>>> "Number of sulphur atoms : ",$pp->num_sulphur(),"\n", >>>> "Half life : ", $pp->half_life(),"\n", >>>> "Instability Index : ", $pp->instability_index(),"\n", >>>> "Stability class : ", $pp->stability(),"\n", >>>> "Aliphatic_index : ",$pp->aliphatic_index(),"\n", >>>> "Gravy : ", $pp->gravy(),"\n", >>>> "Composition of A : ", $pp->AA_comp('A'),"\n", >>>> "Composition of R : ", $pp->AA_comp('R'),"\n", >>>> "Composition of N : ", $pp->AA_comp('N'),"\n", >>>> "Composition of D : ", $pp->AA_comp('D'),"\n", >>>> "Composition of C : ", $pp->AA_comp('C'),"\n", >>>> "Composition of Q : ", $pp->AA_comp('Q'),"\n", >>>> "Composition of E : ", $pp->AA_comp('E'),"\n", >>>> "Composition of G : ", $pp->AA_comp('G'),"\n", >>>> "Composition of H : ", $pp->AA_comp('H'),"\n", >>>> "Composition of I : ", $pp->AA_comp('I'),"\n", >>>> "Composition of L : ", $pp->AA_comp('L'),"\n", >>>> "Composition 
of K : ", $pp->AA_comp('K'),"\n", >>>> "Composition of M : ", $pp->AA_comp('M'),"\n", >>>> "Composition of F : ", $pp->AA_comp('F'),"\n", >>>> "Composition of P : ", $pp->AA_comp('P'),"\n", >>>> "Composition of S : ", $pp->AA_comp('S'),"\n", >>>> "Composition of T : ", $pp->AA_comp('T'),"\n", >>>> "Composition of W : ", $pp->AA_comp('W'),"\n", >>>> "Composition of Y : ", $pp->AA_comp('Y'),"\n", >>>> "Composition of V : ", $pp->AA_comp('V'),"\n", >>>> "Composition of B : ", $pp->AA_comp('B'),"\n", >>>> "Composition of Z : ", $pp->AA_comp('Z'),"\n", >>>> "Composition of X : ", $pp->AA_comp('X'),"\n"; >>>> } >>>> >>> ################################################################################### >>>> >>>> >>>> >>>> >>>> -- >>>> Regards, >>>> Shachi >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> -- >> Regards, >> Shachi >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From bubli_thakur at rediffmail.com Sun Jul 1 10:59:29 2012 From: bubli_thakur at rediffmail.com (subarna thakur) Date: Sun, 01 Jul 2012 14:59:29 -0000 Subject: [Bioperl-l] =?utf-8?q?Ks_saturation?= Message-ID: <20120617031856.16345.qmail@f4mail-235-140.rediffmail.com> Dear all,I am trying to calculate dn/ds values of  all orthologous gene pair between a pair of genome using pairwsie_kaks.pl script within bioperl which evokes the codeml program in runmode -2. 
When I analyze the results, some of the genes have anomalously high dS (Ks) values, some of them even exceeding 100, as a result of which the average Ks for the whole genome shoots up. These genes are orthologous and even share more than 50% sequence identity. Should I include these genes in the analysis or leave them out? If I leave them out, up to what cutoff value of Ks should I consider genes for analysis? In some papers, I have found that Ks values as high as 5.6 were considered. Is there a way of determining the cutoff value for Ks?

Subarna

From haywardjeremya at gmail.com Fri Jul 6 13:56:12 2012
From: haywardjeremya at gmail.com (Jeremy Hayward)
Date: Fri, 6 Jul 2012 14:56:12 -0300
Subject: [Bioperl-l] Two 'host' tags?
Message-ID:

Hi-- Clueless newbie here, for which apologies.

I've posted a description of my problem, inputs and outputs, at Gist
2816510; https://gist.github.com/2816510

Briefly, I'm trying to take a genbank file (.gb), and create a FASTA
file with a specific identifier line for each sequence. Specifically,
I want the "host" tag as the identifier. With the help of the Bioperl
beginner readme and the HOWTO's (which are great!) I've worked out how
to loop through my sequences and get the 'host' tag for each one. For
some reason, I get two identifier lines for each sequence. I guess the
problem is in the 'for' loop--it's running the stuff below it twice,
once with the actual 'host' tag data and once with...nothing? Not
sure.

I think I can work out how to use s/ and a regex just to delete the
second identifier line, but that feels like I'm avoiding the problem
instead of fixing it. Any help appreciated!

Many thanks,

--Jeremy Hayward

From jason.stajich at gmail.com Fri Jul 6 15:39:52 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 6 Jul 2012 12:39:52 -0700
Subject: [Bioperl-l] Two 'host' tags?
In-Reply-To:
References:
Message-ID:

Hi Jeremy -

You are printing for every feature in the loop (e.g.
the source and the misc_RNA ) - you only want to loop through the features, then grab the one which is source, then change or print the info when you see that. So you could have an if( $feature->primary_tag eq 'source') in there or something as well. Alternatively I've left it pretty much intact and just simplified it a bit. You should also try and use Bio::SeqIO to print instead of your printing. I updated the code here to be simpler - right now it warns you that you are printing IDs with spaces (which is something you should think about when it comes to your output file, but I don't know your downstream plans). Also you could put other info in the description field if you wanted to capture accession number or the endophyte name too. https://gist.github.com/3062285 Best, Jason On Jul 6, 2012, at 10:56 AM, Jeremy Hayward wrote: > Hi-- Clueless newbie here, for which apologies. > > I've posted a description of my problem, inputs and outputs, at Gist > 2816510; https://gist.github.com/2816510 > > Briefly, I'm trying to take a genbank file (.gb), and create a FASTA > file with a specific identifier line for each sequence. Specifically, > I want the "host" tag as the identifier. With the help of the Bioperl > beginner readme and the HOWTO's (which are great!) I've worked out how > to loop through my sequences and get the 'host' tag for each one. For > some reason, I get two identifier lines for each sequence. I guess the > problem is in the 'for' loop--it's running the stuff below it twice, > once with the actual 'host' tag data and once with...nothing? Not > sure. > > I think I can work out how to use s/ and a regex just to delete the > second identifier line, but that feels like I'm avoiding the problem > instead of fixing it. Any help appreciated! 
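[Editorial sketch] The advice in the reply above (act only on the 'source' feature and let Bio::SeqIO do the writing) might look roughly like this. It is not the code from either gist: the sub name host_of, the choice to replace spaces with underscores, and putting the accession number in the description are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return the first 'host' value found on a 'source'
# feature of the sequence, or nothing if there is none.
sub host_of {
    my ($seq) = @_;
    for my $feat ( $seq->get_SeqFeatures ) {
        next unless $feat->primary_tag eq 'source';   # skip misc_RNA etc.
        next unless $feat->has_tag('host');
        my ($host) = $feat->get_tag_values('host');
        return $host;
    }
    return;
}

# Usage: perl host2fasta.pl input.gb output.fasta  (file names are yours)
if ( @ARGV == 2 ) {
    my ( $in_file, $out_file ) = @ARGV;
    my $in  = Bio::SeqIO->new( -file => $in_file,     -format => 'genbank' );
    my $out = Bio::SeqIO->new( -file => ">$out_file", -format => 'fasta' );

    while ( my $seq = $in->next_seq ) {
        my $host = host_of($seq);
        next unless defined $host;            # no host tag: skip the record
        ( my $id = $host ) =~ s/\s+/_/g;      # assumption: no spaces in IDs
        $seq->desc( $seq->accession_number ); # keep the accession around
        $seq->display_id($id);
        $out->write_seq($seq);                # one header line per sequence
    }
}
```

Because the header is written once per sequence rather than once per feature, the duplicate identifier line should not appear.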
> > > Many thanks, > > --Jeremy Hayward > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From bosborne11 at verizon.net Fri Jul 6 15:51:11 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 06 Jul 2012 15:51:11 -0400 Subject: [Bioperl-l] Two 'host' tags? In-Reply-To: References: Message-ID: <456448FF-C413-42D1-833A-FAA74E4FEF9E@verizon.net> Jeremy, Looks like each of your individual sequences has 2 features, but you only care about the 'source' feature ( if ($feat_object->primary_tag eq "source") ?). Also, try not to print out the sequence like you're doing, try to build a Sequence object for each input sequence and then write its contents to your fasta file using write_seq(). You will set the id for your Sequence object using display_name(). Brian O. On Jul 6, 2012, at 1:56 PM, Jeremy Hayward wrote: > Hi-- Clueless newbie here, for which apologies. > > I've posted a description of my problem, inputs and outputs, at Gist > 2816510; https://gist.github.com/2816510 > > Briefly, I'm trying to take a genbank file (.gb), and create a FASTA > file with a specific identifier line for each sequence. Specifically, > I want the "host" tag as the identifier. With the help of the Bioperl > beginner readme and the HOWTO's (which are great!) I've worked out how > to loop through my sequences and get the 'host' tag for each one. For > some reason, I get two identifier lines for each sequence. I guess the > problem is in the 'for' loop--it's running the stuff below it twice, > once with the actual 'host' tag data and once with...nothing? Not > sure. > > I think I can work out how to use s/ and a regex just to delete the > second identifier line, but that feels like I'm avoiding the problem > instead of fixing it. Any help appreciated! 
> > > Many thanks, > > --Jeremy Hayward > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dejian.zhao at gmail.com Wed Jul 11 13:31:37 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Thu, 12 Jul 2012 01:31:37 +0800 Subject: [Bioperl-l] Errors with Bio::Seq objects Message-ID: <4FFDB879.1020906@gmail.com> Hi, I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed. I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea. Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot! 
Best, Dejian PS: The commands and results $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb NM_053056 $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb mRNA $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb CACACG $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb Bio::Seq::RichSeq=HASH(0x20a3e7b0) From jimhu at tamu.edu Wed Jul 11 14:01:27 2012 From: jimhu at tamu.edu (Jim Hu) Date: Wed, 11 Jul 2012 13:01:27 -0500 Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <4FFDB879.1020906@gmail.com> References: <4FFDB879.1020906@gmail.com> Message-ID: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu> Hi Dejian, On Jul 11, 2012, at 12:31 PM, De-Jian Zhao 
wrote: > Hi, > > I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed. > > I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea. That's correct about Bio::Seq objects being returned. Actually, it is probably a kind of Bio::Seq object. For example, SeqIO may return a Bio::Seq::RichSeq object that inherits methods from Bio::Seq objects. However, as explained below, the methods are working as they should... they are just returning objects when you are expecting something else. > > Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot! 
> > Best, > Dejian > > > > PS: The commands and results > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb > NM_053056 > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb > mRNA > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb > CACACG > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb > Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) http://doc.bioperl.org/bioperl-live/Bio/Seq.html#POD24 $seq_obj->get_SeqFeatures() returns an array of SeqFeature objects, which are references. So this worked as expected. I usually write this as script files, so I've never done it all with perl -e. But you need to iterate over the array and query the objects for the information you want about the features. 
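[Editorial sketch] Iterating over the feature objects, as described above, could look something like the following in a script file. This is a sketch under assumptions: the sub names describe_features and cds_proteins are invented here, and translating each CDS feature via spliced_seq is one possible reading of "translate the mRNA", not something stated in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: one tab-separated line per feature, built by
# querying each feature object instead of printing the reference itself.
sub describe_features {
    my ($seq) = @_;
    my @lines;
    for my $feat ( $seq->get_SeqFeatures ) {
        push @lines, join "\t",
            $feat->primary_tag, $feat->start, $feat->end,
            ( $feat->strand // 0 );
    }
    return @lines;
}

# Hypothetical helper: protein strings for every CDS feature.
# translate() returns a sequence object, so ->seq is needed for the string.
sub cds_proteins {
    my ($seq) = @_;
    return map  { $_->spliced_seq->translate->seq }
           grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
}

# Usage: perl features.pl nt.gb
if (@ARGV) {
    my $seqio = Bio::SeqIO->new( -file => $ARGV[0], -format => 'genbank' );
    while ( my $seq = $seqio->next_seq ) {
        print "$_\n" for describe_features($seq);
        print "CDS translation: $_\n" for cds_proteins($seq);
    }
}
```

Translating the CDS feature rather than the whole record avoids reading through the untranslated regions of an mRNA entry.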
> > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb > Bio::Seq::RichSeq=HASH(0x20a3e7b0) ->translate returns a new Seq object. I think $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb should work (haven't tried it). Jim > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From bosborne11 at verizon.net Wed Jul 11 13:47:25 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 11 Jul 2012 13:47:25 -0400 Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <4FFDB879.1020906@gmail.com> References: <4FFDB879.1020906@gmail.com> Message-ID: Dejian, These are not "failures". The get_SeqFeatures() and translate() methods will return Bio::Seq objects or a Bio::Seq object. Start here: www.bioperl.org/wiki/HOWTO:Beginners Brian O. On Jul 11, 2012, at 1:31 PM, De-Jian Zhao wrote: > Hi, > > I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed. > > I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea. > > Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot! 
> > Best, > Dejian > > > > PS: The commands and results > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb > NM_053056 > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb > mRNA > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb > CACACG > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb > Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb > Bio::Seq::RichSeq=HASH(0x20a3e7b0) > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Jul 11 15:02:46 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Jul 2012 19:02:46 +0000 
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Peng, Has this been filed as a bug yet? https://redmine.open-bio.org/projects/bioperl Seems like it would be fairly easy to fix, but I want to track it just in case. chris On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote: > Hello guys, > > Just a follow-up, it seems to me the bioperl-live version is still having the same problem - calling hit "query" while query sequence "hit". I also looked into the test script written for hmmer3 (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment part - I guess that's why this bug was not discovered. > > To be simple, here's an output of hmmsearch v3.0: > # hmmsearch :: search profile(s) against a sequence database > # HMMER 3.0 (March 2010); http://hmmer.org/ > # Copyright (C) 2010 Howard Hughes Medical Institute. > # Freely distributed under the GNU General Public License (GPLv3). 
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > # query HMM file: /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm > # target sequence database: /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa > # output directed to file: /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt > # number of worker threads: 4 > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > Query: CRP0000 [M=75] > Scores for complete sequences (score includes all domains): > --- full sequence --- --- best 1 domain --- -#dom- > E-value score bias E-value score bias exp N Sequence Description > ------- ------ ----- ------- ------ ----- ---- -- -------- ----------- > 5.5e-25 95.0 14.4 5.7e-25 95.0 10.0 1.0 1 Chr2_540228_540404_+ > > Domain annotation for each sequence (and alignments): > >> Chr2_540228_540404_+ > # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc > --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- > 1 ! 95.0 10.0 3.6e-30 5.7e-25 20 74 .. 4 59 .] 
1 59 [] 0.95 > > Alignments for each domain: > == domain 1 score: 95.0 bits; conditional E-value: 3.6e-30 > CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74 > ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c > Chr2_540228_540404_+ 4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59 > 568899***99********************************************* PP > > And here is a dump of the parsed HSP object: > $VAR1 = bless( { > 'VERBOSE' => 0, > 'IDENTICAL' => 0, > 'RANK' => 1, > 'STRANDED' => 'NONE', > 'EVALUE' => '3.6e-30', > 'HSP_LENGTH' => 56, > 'ALGORITHM' => 'HMMSEARCH' > 'SCORE' => '95.0', > 'GAP_SYMBOL' => '-', > 'CONSERVED' => 0, > > 'HIT_NAME' => 'Chr2_540228_540404_+', > 'HIT_DESC' => '', > 'HIT_START' => '20', > 'HIT_END' => '74', > 'HIT_LENGTH' => 56, > 'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc', > 'HIT_FRAME' => 0, > > 'QUERY_NAME' => 'CRP0000', > 'QUERY_DESC' => undef, > 'QUERY_START' => '4', > 'QUERY_END' => '59', > 'QUERY_LENGTH' => '75', > 'QUERY_FRAME' => 0, > 'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC', > > 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c', > }, 'Bio::Search::HSP::HMMERHSP' ); > > Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values. > > Thanks, > > Peng, > > On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: > I'll try the bioperl-live version. Thanks guys. > Scott Givan > 541-740-4685 > Sent from an iPhone (so expect typos). > > On Jul 19, 2011, at 10:34 PM, "Chris Fields" wrote: > > > This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo. I believe the one in bioperl-live is newer. Scott, can you give that a try? > > > > chris > > > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: > > > >> Hi Scott, > >> > >> Thanks for writing. 
I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be. > >> > >> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan. > >> > >> What version of HMMER3 you're using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I'm doubting it. > >> > >> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving this a shot if you haven't already. > >> > >> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem. > >> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter. > >> > >> Best, > >> Thomas > >> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: > >> > >>> Hi Thomas, > >>> > >>> I'm using modules in the bipoerl-hmmer3 git repository to parse hmmscan > >>> reports. When I parse the files and walk through the HSP's like: > >>> > >>> while (my $hit = $rslt->next_model) { > >>> > >>> while (my $domain = $hit->next_hsp) { > >>> > >>> And retrieve the "hit" coordinates like: > >>> > >>> print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'), > >>> "\n"; > >>> > >>> The coordinates returned correspond to what I would call the "query", > >>> since they are for the sequence I fed to hmmscan to search the profile > >>> database. Likewise, when retrieving the query coordinates like > >>> $domain->start('query'), I get what I consider the "hit" coordinates, > >>> since they are for the domain profile. Is this the intended behavior? > >>> > >>> Thanks. > >>> > >>> scott > >>> > >>> -- > >>> Scott A. 
Givan > >>> Associate Director > >>> Informatics Research Core Facility > >>> 240e Bond Life Sciences Center > >>> Research Assistant Professor > >>> Molecular Microbiology and Immunology > >>> University of Missouri, Columbia > >>> > >>> TEL 573-882-2948 > >>> FAX 573-884-9676 > >>> http://ircf.rnet.missouri.edu > >>> > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From p.j.a.cock at googlemail.com Wed Jul 11 17:00:56 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 11 Jul 2012 22:00:56 +0100 Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J wrote: > Peng, > > Has this been filed as a bug yet? > > https://redmine.open-bio.org/projects/bioperl > > Seems like it would be fairly easy to fix, but I want to track it just in case. > > chris Hi all, This could be the unfortunate fact that hmmscan and hmmsearch return very similar tabular output, but with query and hit interchanged. i.e. You need some extra information to know which way round they are (not possible with the current output). This was an issue in Bow's Biopython SearchIO project - which for the moment he solved by handling this as two hmmer file formats. In the medium term we're hoping hmmer3 will add some header information or something. 
Peter From zhoupenggeni at gmail.com Wed Jul 11 13:45:00 2012 From: zhoupenggeni at gmail.com (Peng Zhou) Date: Wed, 11 Jul 2012 10:45:00 -0700 (PDT) Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Hello guys, Just a follow-up, it seems to me the bioperl-live version is still having the same problem - calling hit "query" while query sequence "hit". I also looked into the test script written for hmmer3 ( bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment part - I guess that's why this bug was not discovered. To be simple, here's an output of hmmsearch v3.0: # hmmsearch :: search profile(s) against a sequence database # HMMER 3.0 (March 2010); http://hmmer.org/ # Copyright (C) 2010 Howard Hughes Medical Institute. # Freely distributed under the GNU General Public License (GPLv3). # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # query HMM file: /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm # target sequence database: /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa # output directed to file: /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt # number of worker threads: 4 # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query: CRP0000 [M=75] Scores for complete sequences (score includes all domains): --- full sequence --- --- best 1 domain --- -#dom- E-value score bias E-value score bias exp N Sequence Description ------- ------ ----- ------- ------ ----- ---- -- -------- ----------- 5.5e-25 95.0 14.4 5.7e-25 95.0 10.0 1.0 1 Chr2_540228_540404_+ Domain annotation for each sequence (and alignments): >> Chr2_540228_540404_+ # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc --- ------ ----- --------- --------- ------- ------- ------- ------- ------- 
------- ---- 1 ! 95.0 10.0 3.6e-30 5.7e-25 20 74 .. 4 59 .] 1 59 [] 0.95 Alignments for each domain: == domain 1 score: 95.0 bits; conditional E-value: 3.6e-30 CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74 ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c Chr2_540228_540404_+ 4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59 568899***99********************************************* PP And here is a dump of the parsed HSP object: $VAR1 = bless( { 'VERBOSE' => 0, 'IDENTICAL' => 0, 'RANK' => 1, 'STRANDED' => 'NONE', 'EVALUE' => '3.6e-30', 'HSP_LENGTH' => 56, 'ALGORITHM' => 'HMMSEARCH' 'SCORE' => '95.0', 'GAP_SYMBOL' => '-', 'CONSERVED' => 0, 'HIT_NAME' => 'Chr2_540228_540404_+', 'HIT_DESC' => '', 'HIT_START' => '20', 'HIT_END' => '74', 'HIT_LENGTH' => 56, 'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc', 'HIT_FRAME' => 0, 'QUERY_NAME' => 'CRP0000', 'QUERY_DESC' => undef, 'QUERY_START' => '4', 'QUERY_END' => '59', 'QUERY_LENGTH' => '75', 'QUERY_FRAME' => 0, 'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC', 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c', }, 'Bio::Search::HSP::HMMERHSP' ); Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values. Thanks, Peng, On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: > > I'll try the bioperl-live version. Thanks guys. > > Scott Givan > 541-740-4685 > Sent from an iPhone (so expect typos). > > On Jul 19, 2011, at 10:34 PM, "Chris Fields" > wrote: > > > This might be a disconnect between the HMMER3 version in bioperl-live > and the one in Kai's bioperl-hmmer3 repo. I believe the one in > bioperl-live is newer. Scott, can you give that a try? > > > > chris > > > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: > > > >> Hi Scott, > >> > >> Thanks for writing. 
I'm on the road at the moment so I have to be > briefer and less thorough than I'd like to be. > >> > >> What you are observing is not the intended behavior. Oddly, it's not > what I recall obtaining in my tests on this software, though I was mostly > interested in hmmsearch at the time and may have been sloppier than I > should have been when it came to hmmscan. > >> > >> What version of HMMER3 you're using? There have been some small > formatting changes in the past that might be causing a burp in the parser, > though I'm doubting it. > >> > >> Kai Blin wrote some test scripts (found here: > bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate > query/hit coordinates. It might be worth giving this a shot if you haven't > already. > >> > >> Also, if you don't mind, I'm happy to run your code on your output file > on my end. It might help me diagnose the problem. > >> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case > anyone else has insight into this matter. > >> > >> Best, > >> Thomas > >> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: > >> > >>> Hi Thomas, > >>> > >>> I'm using modules in the bipoerl-hmmer3 git repository to parse hmmscan > >>> reports. When I parse the files and walk through the HSP's like: > >>> > >>> while (my $hit = $rslt->next_model) { > >>> > >>> while (my $domain = $hit->next_hsp) { > >>> > >>> And retrieve the "hit" coordinates like: > >>> > >>> print "hit coords: ", $domain->start('hit'), "-", > $domain->end('hit'), > >>> "\n"; > >>> > >>> The coordinates returned correspond to what I would call the "query", > >>> since they are for the sequence I fed to hmmscan to search the profile > >>> database. Likewise, when retrieving the query coordinates like > >>> $domain->start('query'), I get what I consider the "hit" coordinates, > >>> since they are for the domain profile. Is this the intended behavior? > >>> > >>> Thanks. > >>> > >>> scott > >>> > >>> -- > >>> Scott A. 
Givan > >>> Associate Director > >>> Informatics Research Core Facility > >>> 240e Bond Life Sciences Center > >>> Research Assistant Professor > >>> Molecular Microbiology and Immunology > >>> University of Missouri, Columbia > >>> > >>> TEL 573-882-2948 > >>> FAX 573-884-9676 > >>> http://ircf.rnet.missouri.edu > >>> > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From zhoupenggeni at gmail.com Wed Jul 11 14:03:17 2012 From: zhoupenggeni at gmail.com (Peng Zhou) Date: Wed, 11 Jul 2012 11:03:17 -0700 (PDT) Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <4FFDB879.1020906@gmail.com> References: <4FFDB879.1020906@gmail.com> Message-ID: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com> Hi, I guess that's what the commands are supposed to do: the get_SeqFeatures() method return an array of Bio::SeqFeature objects, and the translate() method returns a Bio::Seq object. And you can't simply "print" an object in perl - you can "dump" it though: $ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb $ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->translate()); ' nt.gb On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote: > > Hi, > > I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and > tested the Bio::SeqIO module as follows. The first 3 commands succeeded; > however the last 2 failed. 
> > I think $seqio->next_seq() produces a Bio::Seq object which contains the > sequence, features and annotation (according to the DESCRIPTION of > "perldoc Bio::Seq") and thus the invocation of the methods > get_SeqFeatures() and translate() should be valid. However, the results > denied this idea. > > Will anyone explain what happened to the last 2 commands? I have > encountered numerous cases of failures when testing the bioperl methods. > I want to translate the mRNA sequence and extract the sequence features. > What are the right commands? Thanks a lot! > > Best, > Dejian > > > > PS: The commands and results > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->display_id(); ' nt.gb > NM_053056 > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->molecule(); ' nt.gb > mRNA > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->subseq(1,6); ' nt.gb > CACACG > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb > 
Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) > > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->translate(); ' nt.gb > Bio::Seq::RichSeq=HASH(0x20a3e7b0) > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l >
From zhoupenggeni at gmail.com Wed Jul 11 16:05:56 2012 From: zhoupenggeni at gmail.com (Peng Zhou) Date: Wed, 11 Jul 2012 13:05:56 -0700
(PDT) Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Thanks Chris, here is the link of the filed bug: https://redmine.open-bio.org/issues/3369 On Wednesday, July 11, 2012 2:02:46 PM UTC-5, Christopher Fields wrote: > > Peng, > > Has this been filed as a bug yet? > > https://redmine.open-bio.org/projects/bioperl > > Seems like it would be fairly easy to fix, but I want to track it just in > case. > > chris > > On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote: > > > Hello guys, > > > > Just a follow-up, it seems to me the bioperl-live version is still > having the same problem - calling hit "query" while query sequence "hit". I > also looked into the test script written for hmmer3 > (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment > part - I guess that's why this bug was not discovered. > > > > To be simple, here's an output of hmmsearch v3.0: > > # hmmsearch :: search profile(s) against a sequence database > > # HMMER 3.0 (March 2010); http://hmmer.org/ > > # Copyright (C) 2010 Howard Hughes Medical Institute. > > # Freely distributed under the GNU General Public License (GPLv3). 
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > # query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
> > # target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
> > # output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
> > # number of worker threads:        4
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> >
> > Query:       CRP0000  [M=75]
> > Scores for complete sequences (score includes all domains):
> >    --- full sequence ---   --- best 1 domain ---    -#dom-
> >     E-value  score  bias    E-value  score  bias    exp  N  Sequence             Description
> >     ------- ------ -----    ------- ------ -----   ---- --  --------             -----------
> >     5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+
> >
> > Domain annotation for each sequence (and alignments):
> > >> Chr2_540228_540404_+
> >    #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
> >  ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
> >    1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 []    0.95
> >
> >   Alignments for each domain:
> >   == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
> >                CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
> >                           ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
> >   Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
> >                           568899***99********************************************* PP
> >
> > And here is a dump of the parsed HSP object:
> > $VAR1 = bless( {
> >     'VERBOSE' => 0,
> >     'IDENTICAL' => 0,
> >     'RANK' => 1,
> >     'STRANDED' => 'NONE',
> >     'EVALUE' => '3.6e-30',
> >     'HSP_LENGTH' => 56,
> >     'ALGORITHM' => 'HMMSEARCH',
> >     'SCORE' => '95.0',
> >     'GAP_SYMBOL' => '-',
> >     'CONSERVED' => 0,
> >
> >     'HIT_NAME' => 'Chr2_540228_540404_+',
> >     'HIT_DESC' => '',
> >     'HIT_START' => '20',
> >     'HIT_END' => '74',
> >     'HIT_LENGTH' => 56,
> >     'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
> >     'HIT_FRAME' => 0,
> >
> >     'QUERY_NAME' => 'CRP0000',
> >     'QUERY_DESC' => undef,
> >     'QUERY_START' => '4',
> >     'QUERY_END' => '59',
> >     'QUERY_LENGTH' => '75',
> >     'QUERY_FRAME' => 0,
> >     'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',
> >
> >     'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
> > }, 'Bio::Search::HSP::HMMERHSP' );
> >
> > Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values.
> >
> > Thanks,
> >
> > Peng,
> >
> > On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote:
> > I'll try the bioperl-live version. Thanks guys.
> > Scott Givan
> > 541-740-4685
> > Sent from an iPhone (so expect typos).
> >
> > On Jul 19, 2011, at 10:34 PM, "Chris Fields" wrote:
> >
> > > This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo. I believe the one in bioperl-live is newer. Scott, can you give that a try?
> > >
> > > chris
> > >
> > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:
> > >
> > >> Hi Scott,
> > >>
> > >> Thanks for writing. I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be.
> > >>
> > >> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan.
> > >>
> > >> What version of HMMER3 are you using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I'm doubting it.
> > >>
> > >> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving this a shot if you haven't already.
> > >>
> > >> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem.
> > >>
> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter.
> > >>
> > >> Best,
> > >> Thomas
> > >>
> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote:
> > >>
> > >>> Hi Thomas,
> > >>>
> > >>> I'm using modules in the bioperl-hmmer3 git repository to parse hmmscan
> > >>> reports. When I parse the files and walk through the HSPs like:
> > >>>
> > >>> while (my $hit = $rslt->next_model) {
> > >>>
> > >>> while (my $domain = $hit->next_hsp) {
> > >>>
> > >>> And retrieve the "hit" coordinates like:
> > >>>
> > >>> print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'), "\n";
> > >>>
> > >>> The coordinates returned correspond to what I would call the "query",
> > >>> since they are for the sequence I fed to hmmscan to search the profile
> > >>> database. Likewise, when retrieving the query coordinates like
> > >>> $domain->start('query'), I get what I consider the "hit" coordinates,
> > >>> since they are for the domain profile. Is this the intended behavior?
> > >>>
> > >>> Thanks.
> > >>>
> > >>> scott
> > >>>
> > >>> --
> > >>> Scott A. Givan
> > >>> Associate Director
> > >>> Informatics Research Core Facility
> > >>> 240e Bond Life Sciences Center
> > >>> Research Assistant Professor
> > >>> Molecular Microbiology and Immunology
> > >>> University of Missouri, Columbia
> > >>>
> > >>> TEL 573-882-2948
> > >>> FAX 573-884-9676
> > >>> http://ircf.rnet.missouri.edu
> > >>>
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From zhoupenggeni at gmail.com Wed Jul 11 16:05:56 2012
From: zhoupenggeni at gmail.com (Peng Zhou)
Date: Wed, 11 Jul 2012 13:05:56 -0700 (PDT)
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To:
References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
Message-ID:

Thanks Chris, here is the link of the filed bug:
https://redmine.open-bio.org/issues/3369

On Wednesday, July 11, 2012 2:02:46 PM UTC-5, Christopher Fields wrote:
>
> Peng,
>
> Has this been filed as a bug yet?
>
> https://redmine.open-bio.org/projects/bioperl
>
> Seems like it would be fairly easy to fix, but I want to track it just in case.
>
> chris

From w.arindrarto at gmail.com Wed Jul 11 17:25:44 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 11 Jul 2012 23:25:44 +0200
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To:
References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
Message-ID:

Hi everyone,

Just as an additional info that might be useful:

The current Biopython parser for the plain text format parses the very first line to find out which HMMER flavor produces the result. Both 'hmm from' and 'hmmto' are query coordinates if the flavor is hmmsearch or phmmer; and they're hit coordinates if the flavor is hmmscan.
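
A minimal Perl sketch of the rule just described, assuming the plain-text report starts with the usual banner line (for example `# hmmsearch :: search profile(s) against a sequence database`, as in the report quoted earlier in this thread). The helper name `hmm_coords_belong_to` is invented for illustration and is not part of BioPerl or Biopython; flavors other than the three named above are deliberately rejected rather than guessed.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): given the first line of a
# HMMER3 plain-text report, say whether the 'hmmfrom'/'hmm to' columns
# describe the query or the hit.
sub hmm_coords_belong_to {
    my ($first_line) = @_;

    # The banner looks like "# hmmsearch :: search profile(s) against ..."
    my ($flavor) = $first_line =~ /^#\s*(\w+)\s*::/
        or die "Cannot determine HMMER flavor from: $first_line\n";

    # hmmsearch and phmmer: the profile is the query.
    return 'query' if $flavor eq 'hmmsearch' or $flavor eq 'phmmer';
    # hmmscan: the profile is the hit.
    return 'hit'   if $flavor eq 'hmmscan';

    die "Unhandled HMMER flavor: $flavor\n";
}

my $banner = '# hmmsearch :: search profile(s) against a sequence database';
my $side   = hmm_coords_belong_to($banner);
print "hmm coordinates are $side coordinates\n";
```

A parser could call such a check once per report and then decide which of the two coordinate pairs to store as query and which as hit.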
This information is not available in other HMMER command line output formats (tblout and domtblout), which as Peter has mentioned, required us to treat different flavors of the table output as different formats for the time being.

Fortunately, after contacting the HMMER developers they mentioned that this is not the case anymore in their development branch (and their future planned release).

Hope that helps :),
Bow

On Wed, Jul 11, 2012 at 11:00 PM, Peter Cock wrote:
> On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J wrote:
>> Peng,
>>
>> Has this been filed as a bug yet?
>>
>> https://redmine.open-bio.org/projects/bioperl
>>
>> Seems like it would be fairly easy to fix, but I want to track it just in case.
>>
>> chris
>
> Hi all,
>
> This could be the unfortunate fact that hmmscan and
> hmmsearch return very similar tabular output, but
> with query and hit interchanged. i.e. You need some
> extra information to know which way round they are
> (not possible with the current output). This was an
> issue in Bow's Biopython SearchIO project - which
> for the moment he solved by handling this as two
> hmmer file formats. In the medium term we're hoping
> hmmer3 will add some header information or something.
>
> Peter

From dejian.zhao at gmail.com Thu Jul 12 01:04:54 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 13:04:54 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>
References: <4FFDB879.1020906@gmail.com> <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com>
Message-ID: <4FFE5AF6.1020300@gmail.com>

Thank you, Peng. That's great! Actually I am wondering how to get the whole content of an object these days; "Dumping it" is a good solution.
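
To make the point concrete, here is a small sketch that uses an in-memory sequence instead of the nt.gb file from this thread; the sequence `ATGGCCAAGTGA`, the id `demo` and the single `CDS` feature are made up for illustration. Printing an object only shows its class and address, so the sketch asks each object for the fields of interest and falls back to Data::Dumper only when the whole structure is wanted.

```perl
use strict;
use warnings;
use Data::Dumper;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Toy sequence and feature, invented for this example.
my $seq = Bio::Seq->new(
    -id       => 'demo',
    -seq      => 'ATGGCCAAGTGA',
    -alphabet => 'dna',
);
$seq->add_SeqFeature(
    Bio::SeqFeature::Generic->new(
        -start       => 1,
        -end         => 12,
        -strand      => 1,
        -primary_tag => 'CDS',
    )
);

# get_SeqFeatures() returns a list of feature objects; call accessor
# methods on each one instead of printing the object itself.
for my $feat ($seq->get_SeqFeatures) {
    printf "%s %d..%d\n", $feat->primary_tag, $feat->start, $feat->end;
}

# translate() returns a new sequence object; seq() gives its residues.
my $protein = $seq->translate->seq;
print "$protein\n";

# Dump a whole object when you want to inspect every field.
local $Data::Dumper::Maxdepth = 2;    # keep the dump short
print Dumper(($seq->get_SeqFeatures)[0]);
```

With a GenBank file the same accessor calls would be made on the object returned by `$seqio->next_seq()`.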

On 2012-7-12 2:03, Peng Zhou wrote:
> Hi,
>
> I guess that's what the commands are supposed to do: the get_SeqFeatures()
> method returns an array of Bio::SeqFeature objects, and the translate()
> method returns a Bio::Seq object. And you can't simply "print" an object in
> perl - you can "dump" it though:
>
> $ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb
>
> $ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->translate()); ' nt.gb
>
> On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote:
>> Hi,
>>
>> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and
>> tested the Bio::SeqIO module as follows. The first 3 commands succeeded;
>> however the last 2 failed.
>>
>> I think $seqio->next_seq() produces a Bio::Seq object which contains the
>> sequence, features and annotation (according to the DESCRIPTION of
>> "perldoc Bio::Seq") and thus the invocation of the methods
>> get_SeqFeatures() and translate() should be valid. However, the results
>> denied this idea.
>>
>> Will anyone explain what happened to the last 2 commands? I have
>> encountered numerous cases of failures when testing the bioperl methods.
>> I want to translate the mRNA sequence and extract the sequence features.
>> What are the right commands? Thanks a lot!
>>
>> Best,
>> Dejian
>>
>>
>>
>> PS: The commands and results
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb
>> NM_053056
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb
>> mRNA
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb
>> CACACG
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb
>> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18)
>>
>>
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
>> Bio::Seq::RichSeq=HASH(0x20a3e7b0)
>>

From dejian.zhao at gmail.com Thu Jul 12 01:14:33 2012
From: dejian.zhao at gmail.com (De-Jian Zhao)
Date: Thu, 12 Jul 2012 13:14:33 +0800
Subject: [Bioperl-l] Errors with Bio::Seq objects
In-Reply-To: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>
References: <4FFDB879.1020906@gmail.com> <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu>
Message-ID: <4FFE5D39.6010406@gmail.com>

Thank you, Jim. You are right. It works. This example deepens my understanding of OOP.

On 2012-7-12 2:01, Jim Hu wrote:
>> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb
>> > Bio::Seq::RichSeq=HASH(0x20a3e7b0)
> ->translate returns a new Seq object. I think
>
> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb
>
> should work (haven't tried it).

From kai.blin at biotech.uni-tuebingen.de Thu Jul 12 09:43:19 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 12 Jul 2012 15:43:19 +0200
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To:
References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu>
Message-ID: <4FFED477.3090907@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-07-11 23:25, Wibowo Arindrarto wrote:

Hi,

> The current Biopython parser for the plain text format parses the
> very first line to find out which HMMER flavor produces the result.
> Both 'hmm from' and 'hmmto' are query coordinates if the flavor is
> hmmsearch or phmmer; and they're hit coordinates if the flavor is
> hmmscan.

Whoops. I mostly looked at hmmscan when writing the parser, because that's the file format I needed for my code. The code clearly should follow the way the hmmer2 parser works, and differentiate between hmmsearch and hmmscan type output.

As I said on the bug report, I'm happy to look at code fixing this.

> This information is not available in other HMMER command line
> output formats (tblout and domtblout), which as Peter has
> mentioned, required us to treat different flavors of the table
> output as different formats for the time being.

As far as I'm aware, BioPerl currently doesn't parse the table output format.

Seeing how much repeated pain we run into with all these parsers in the different Bio* projects, I wonder if there was a smarter way to deal with parsing. Maybe at least some shared grammar file that we could use for testing, to make sure we at least have the same expectations about file formats in the different language implementations. Ideally we'd auto-generate the parsers from the grammar specification, but I guess that'll stay wishful thinking for quite a bit.

> Fortunately, after contacting the HMMER developers they mentioned
> that this is not the case anymore in their development branch (and
> their future planned release).

That's certainly good news. :)

Cheers,
Kai

- --
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJP/tR3AAoJEKM5lwBiwTTP6OoIAM3J9chdyfmTuQTp4KMxVIk7
PCkJy+aLcnfa3d7s8BVPG0GWQTPrfHLX6a7zWfoSLzL9RBShFWCQIxGpu+Tq3yR8
Hu/TpoFIg8bB1iAroAWLdsX8nio3Idlcl5JN38LBsFEUirFrGAsvfdN/+fYrP5Ni
y0ULP18uihiN07sVG88nZXNyEB7fIscVYdO90GsGq03/KOTRsRD4kugapiQJIy4D
lrqnYznLa4p30lBDCEHbTaHYbfIs7/8tryfHJsfjimjg8IoSMHMJfIkI7/z0qlL+
bxt/HuGMsm1Ak08xEAoT7T00t5tcAp1gclgZsO/CrviOicmhUgd6iri/kIpzg0c=
=acWd
-----END PGP SIGNATURE-----

From cjfields at illinois.edu Thu Jul 12 11:24:13 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 12 Jul 2012 15:24:13 +0000
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <4FFED477.3090907@biotech.uni-tuebingen.de>
References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> <4FFED477.3090907@biotech.uni-tuebingen.de>
Message-ID: <1C3A31F9-9717-49F3-A880-FA725D0F3CDB@illinois.edu>

On Jul 12, 2012, at 8:43 AM, Kai Blin wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 2012-07-11 23:25, Wibowo Arindrarto wrote:
>
> Hi,
>
>> The current Biopython parser for the plain text format parses the
>> very first line to find out which HMMER flavor produces the result.
>> Both 'hmm from' and 'hmmto' are query coordinates if the flavor is
>> hmmsearch or phmmer; and they're hit coordinates if the flavor is
>> hmmscan.
>
> Whoops. I mostly looked at hmmscan when writing the parser, because
> that's the file format I needed for my code. The code clearly should
> follow the way the hmmer2 parser works, and differentiate between
> hmmsearch and hmmscan type output.
>
> As I said on the bug report, I'm happy to look at code fixing this.

Seems like it should be easy enough to address if there is something in the output that indicates the report type.

>> This information is not available in other HMMER command line
>> output formats (tblout and domtblout), which as Peter has
>> mentioned, required us to treat different flavors of the table
>> output as different formats for the time being.
>
> As far as I'm aware, BioPerl currently doesn't parse the table output
> format.

The only reason to do so is if the table provides additional information the actual hits don't (this can be the case with BLAST reports).

> Seeing how much repeated pain we run into with all these parsers in
> the different Bio* projects, I wonder if there was a smarter way to
> deal with parsing. Maybe at least some shared grammar file that we
> could use for testing, to make sure we at least have the same
> expectations about file formats in the different language
> implementations. Ideally we'd auto-generate the parsers from the
> grammar specification, but I guess that'll stay wishful thinking for
> quite a bit.

I would fully support something like this, been thinking about this with Marpa::XS (which now has a compiled library, libmarpa, to make it less perl-centric), and there have been talks of using a similar toolkit with the bioruby folks. We could always have a plain-perl/python/ruby/etc fallback in the most common formats.

chris

From buschj at hhu.de Sun Jul 15 15:46:42 2012
From: buschj at hhu.de (jobu)
Date: Sun, 15 Jul 2012 21:46:42 +0200
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches
Message-ID: <50031E22.3060902@hhu.de>

Dear All.

Still being a beginner in Perl and just having started to look into BioPerl, I hope to ask my question at the right place.

I locally ran a standalone blastn search of many short query-sequences against a set of target-fasta-sequences consisting of whole chromosomal sequence data. What I need to do now is to get let's say 100nt each Up- and Downstream out of my target sequences for each Blast match.

At this point I only can assume that BioPerl might be helpful in resolving this task, though I haven't found a module yet that will manage to do this locally on my hard drive. Thus I would be thankful for the slightest hint where to begin.

Sincerely
Jochen

From Russell.Smithies at agresearch.co.nz Sun Jul 15 17:19:14 2012
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 16 Jul 2012 09:19:14 +1200
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches
In-Reply-To: <50031E22.3060902@hhu.de>
References: <50031E22.3060902@hhu.de>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCF2A4CAA@exchsth.agresearch.co.nz>

Hi Jochen,

I don't think BioPerl can directly manipulate blast databases so I'd probably do it with fastacmd to extract the sequence from the original blast database.

eg.
fastacmd -s X51494.1 -d /dataset/blastdata/active/nt -L 100,200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

Or if you're using blast+, use the blastdbcmd command:

eg.
blastdbcmd -entry X51494.1 -db /dataset/blastdata/active/nt -range 100-200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

So to add it all together, try using BioPerl to parse your existing blast results and pull out each hit's coordinates then use a system call to exec fastacmd or blastdbcmd to extract the sequence from the blast database then write the sequences to file.

These might be useful:
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/HOWTO:SearchIO#Speed_improvements_with_lightweight_objects
http://www.bioperl.org/wiki/HOWTO:BlastPlus
http://www.bioperl.org/wiki/HOWTO:StandAloneBlast

--Russell

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jobu
Sent: Monday, 16 July 2012 7:47 a.m.
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches

Dear All.

Still being a beginner in Perl and just having started to look into BioPerl, I hope to ask my question at the right place.

I locally ran a standalone blastn search of many short query-sequences against a set of target-fasta-sequences consisting of whole chromosomal sequence data. What I need to do now is to get let's say 100nt each Up- and Downstream out of my target sequences for each Blast match.

At this point I only can assume that BioPerl might be helpful in resolving this task, though I haven't found a module yet that will manage to do this locally on my hard drive. Thus I would be thankful for the slightest hint where to begin.
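
For the task discussed in this thread (100 nt on each side of every BLAST match), the coordinate arithmetic can be sketched in plain Perl as below. This is only an illustration under stated assumptions: `flank_ranges` is a made-up helper, the hit coordinates are assumed to be 1-based with start <= end (as Bio::SearchIO reports them via `$hsp->start('hit')` and `$hsp->end('hit')`), and the accession and database path are simply the ones from the blastdbcmd example above. The two windows are called left and right on purpose: which of them is upstream depends on the strand of the match.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the flanking windows around one hit.
# $start/$end are 1-based hit coordinates with $start <= $end,
# $len is the length of the subject sequence, $flank the window size.
sub flank_ranges {
    my ($start, $end, $len, $flank) = @_;
    my %range;

    # Window before the hit, clipped at position 1.
    if ($start > 1) {
        my $from = $start - $flank < 1 ? 1 : $start - $flank;
        $range{left} = [ $from, $start - 1 ];
    }

    # Window after the hit, clipped at the end of the subject sequence.
    if ($end < $len) {
        my $to = $end + $flank > $len ? $len : $end + $flank;
        $range{right} = [ $end + 1, $to ];
    }

    return \%range;
}

# Example values, invented for illustration.
my $ranges = flank_ranges(250, 300, 1000, 100);

for my $side (sort keys %$ranges) {
    my ($from, $to) = @{ $ranges->{$side} };
    # Print the command instead of running it; pass it to system()
    # once the accession and database match your own data.
    print "$side: blastdbcmd -entry X51494.1 -db /dataset/blastdata/active/nt -range $from-$to\n";
}
```

Hits that touch either end of the subject sequence simply get a shorter window, or none, on that side.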

Sincerely
Jochen

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================

From dcmertens.perl at gmail.com Tue Jul 17 08:57:55 2012
From: dcmertens.perl at gmail.com (David Mertens)
Date: Tue, 17 Jul 2012 07:57:55 -0500
Subject: [Bioperl-l] Announcing The Quantified Onion Google Group and perl4science.github.com
Message-ID:

Hello everybody -

I returned from YAPC::NA this year intending to build up the scientific Perl community. One outgrowth of this has been Joel Berger's creation of perl4science.github.com and gizmomathboy's creation of The Quantified Onion Google Group.

perl4science is meant to be a landing page for anybody looking to combine Perl and science. Since it is a github repository, it makes it about as easy as possible for others to contribute content or fixes. If you have a project that scientists would find useful, you should fork the project, add your content, and issue a pull request. It's that easy.

The Quantified Onion is meant to be a space for scientists to discuss how we use Perl in our science and to work together to grow adoption of Perl among scientists.
It will undoubtedly attract newcomers to Perl asking beginner questions, at which point we will gently refer them to the appropriate manual pages. Interesting discussions thus far (in my mind) include a discussion about teaching test-driven design and a discussion about submitting an article to Computing in Science and Engineering for their November Issue, which is supposed to be about Modern Programming Languages. I would like to begin putting on workshops on Perl for Scientists and Engineers (and encourage others to do the same), and I will begin the discussion on The Quantified Onion.

If you know of other Perl science resources, please feel free to add them to perl4science or post them on The Quantified Onion, and please join The Quantified Onion. Together, we can grow Perl's adoption among scientists!

David Mertens

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan

From cjfields at illinois.edu Wed Jul 18 10:29:02 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 18 Jul 2012 14:29:02 +0000
Subject: [Bioperl-l] [blast-announce] OLD_BLAST parameter to be discontinued. Alternative NCBI BLAST parsable formats are available
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF2B63F6C5@CHIMBX5.ad.uillinois.edu>

Not sure if anyone is using this as a means of getting their reports (I don't), but I'm posting this here just in case.

-c

Begin forwarded message:

> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]"
> Subject: [blast-announce] OLD_BLAST parameter to be discontinued. Alternative NCBI BLAST parsable formats are available
> Date: July 18, 2012 9:17:05 AM CDT
> To: NLM/NCBI List blast-announce
>
> Beginning Sept. 10, 2012, the BLAST service will ignore the OLD_BLAST parameter in posted URLs.
We are removing this old and little used option to prepare for upcoming enhancements to the BLAST service later this year. Setting OLD_BLAST=true produces an older version of the BLAST HTML results that a few people have used for automated processing (parsing) of results. NCBI BLAST supports a number of different and more stable formats for parsing. These include XML, tabular reports and ASN.1. For more information, please see BLAST Developer Information (http://1.usa.gov/O8AocI) and links on that page. > From dejian.zhao at gmail.com Wed Jul 18 11:36:14 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Wed, 18 Jul 2012 23:36:14 +0800 Subject: [Bioperl-l] Which graphic module should I learn? Message-ID: <5006D7EE.1020205@gmail.com> Hi, all. Currently I am working on a genome. I will draw some pictures based on the sequencing data. In the long run, I will use the module in my future projects, so I want to learn a popular module to get better support from the community. I searched in cpan with the command "i /SVG/" and got 234 items. Which one is popular in bioinformatics? Which module should I start with? Thanks for any suggestions. Best, De-Jian From scott at scottcain.net Wed Jul 18 11:46:01 2012 From: scott at scottcain.net (Scott Cain) Date: Wed, 18 Jul 2012 11:46:01 -0400 Subject: [Bioperl-l] Which graphic module should I learn? In-Reply-To: <5006D7EE.1020205@gmail.com> References: <5006D7EE.1020205@gmail.com> Message-ID: Hi De-Jian, Of course, it depends on what you want to do, but if you're referring to the genome feature/annotation type graphics, Bio::Graphics already supports SVG pretty well, via GD::SVG. Scott On Wed, Jul 18, 2012 at 11:36 AM, De-Jian Zhao wrote: > Hi, all. > > Currently I am working on a genome. I will draw some pictures based on the > sequencing data. In the long run, I will use the module in my future > projects, so I want to learn a popular module to get better support from the > community. 
I searched in cpan with the command "i /SVG/" and got 234 items. > Which one is popular in bioinformatics? Which module should I start with? > Thanks for any suggestions. > > Best, > De-Jian > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Tue Jul 24 23:08:05 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 25 Jul 2012 03:08:05 +0000 Subject: [Bioperl-l] BioPerl and Travis-CI Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Peter Cock has graciously helped start up a branch for bioperl-live that is using Travis-CI (a nice continuous integration tool). Results from Peter's fork are found here: http://travis-ci.org/#!/peterjc/bioperl-live As this is now pulled into the main bioperl repo, results will be here: http://travis-ci.org/#!/bioperl/bioperl-live I'll be working on this and expect this will be added to master in the next few days. chris From p.j.a.cock at googlemail.com Wed Jul 25 06:31:13 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 25 Jul 2012 11:31:13 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J wrote: > Peter Cock has graciously helped start up a branch for bioperl-live > that is using Travis-CI (a nice continuous integration tool). 
Results > from Peter's fork are found here: > > http://travis-ci.org/#!/peterjc/bioperl-live > > As this is now pulled into the main bioperl repo, results will be here: > > http://travis-ci.org/#!/bioperl/bioperl-live > > I'll be working on this and expect this will be added to master in > the next few days. > > chris We've had this running for Biopython for a month now, and it has been a useful complement to the BuildBot (which covers other operating systems). This was following BioRuby's lead: http://biopython.org/pipermail/biopython-dev/2012-June/009742.html The current BioPerl Travis configuration is probably usable right now (after changing the branch whitelist to either master, or simply all branches). Other remaining issues include sorting out which dependencies should be installed, and streamlining their verbose output (e.g. using tail). TravisCI can send out emails (e.g. on test failures), and perhaps bioperl-guts-l might be a sensible place to send these. Initially we'd disabled the emails for Biopython. I'd like to use an RSS feed... there is a JSON API which BioRuby are using for http://www.biogems.info/ which tracks their plugins.
Peter From p.j.a.cock at googlemail.com Fri Jul 27 11:03:05 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 27 Jul 2012 16:03:05 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> Message-ID: On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J wrote: > On Jul 27, 2012, at 9:47 AM, Peter Cock wrote: > >> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: >>> >>> That's done now - except for the circular dependencies, and GD, >>> which might be easy to solve if anyone knows what the error >>> means - see commit message here: >>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a >> >> Re: https://twitter.com/cjfields/status/228861370454638592 >> Not sure why you got GD to work when something very similar >> had failed for me. Oh well - job done :) > > It was the lack of gdlib-config in the libgd2-xpm package, you need > libgd2-xpm-dev. One of the fun things about Debian packaging. Ah - I should have guessed that. >>> Would a single clean commit of the (current) .travis.yml file be >>> preferable to the current series of commits? And you you want >>> a pull request, or would you just merge/cherry-pick manually? >> >> Given all the churn between our revisions, personally I'd opt for >> a single clean commit to bioperl/master - but your call. >> >> Peter > > Yep, about to merge it over. It's working now, just need to > whitelist master instead of travis after the merge. I'd removed the whitelist altogether here: https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd My thinking was BioPerl seems to have multiple feature branches under the official repo, so they should get tested too. 
You'd be in a better position than me to judge what would work best for BioPerl here. Peter From cjfields at illinois.edu Fri Jul 27 10:58:21 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 27 Jul 2012 14:58:21 +0000 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote: > On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: >> >> That's done now - except for the circular dependencies, and GD, >> which might be easy to solve if anyone knows what the error >> means - see commit message here: >> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a > > Re: https://twitter.com/cjfields/status/228861370454638592 > Not sure why you got GD to work when something very similar > had failed for me. Oh well - job done :) It was the lack of gdlib-config in the libgd2-xpm package, you need libgd2-xpm-dev. One of the fun things about Debian packaging. >> Would a single clean commit of the (current) .travis.yml file be >> preferable to the current series of commits? And you you want >> a pull request, or would you just merge/cherry-pick manually? > > Given all the churn between our revisions, personally I'd opt for > a single clean commit to bioperl/master - but your call. > > Peter Yep, about to merge it over. It's working now, just need to whitelist master instead of travis after the merge. 
chris From cjfields at illinois.edu Fri Jul 27 12:26:34 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 27 Jul 2012 16:26:34 +0000 Subject: [Bioperl-l] BioPerl Travis-CI now live Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D54D@CITESMBX5.ad.uillinois.edu> All commits to bioperl-live master branch on github are now being tracked: http://travis-ci.org/#!/bioperl/bioperl-live The .travis.yml file has a whitelist for branches to be tested; if anyone wants to test additional branches feel free to add them to the list! chris From cjfields at illinois.edu Fri Jul 27 11:15:19 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 27 Jul 2012 15:15:19 +0000 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D2D6@CITESMBX5.ad.uillinois.edu> On Jul 27, 2012, at 10:03 AM, Peter Cock wrote: > On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J > wrote: >> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote: >> >>> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: >>>> >>>> That's done now - except for the circular dependencies, and GD, >>>> which might be easy to solve if anyone knows what the error >>>> means - see commit message here: >>>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a >>> >>> Re: https://twitter.com/cjfields/status/228861370454638592 >>> Not sure why you got GD to work when something very similar >>> had failed for me. Oh well - job done :) >> >> It was the lack of gdlib-config in the libgd2-xpm package, you need >> libgd2-xpm-dev. One of the fun things about Debian packaging. > > Ah - I should have guessed that. > >>>> Would a single clean commit of the (current) .travis.yml file be >>>> preferable to the current series of commits? 
And you you want >>>> a pull request, or would you just merge/cherry-pick manually? >>> >>> Given all the churn between our revisions, personally I'd opt for >>> a single clean commit to bioperl/master - but your call. >>> >>> Peter >> >> Yep, about to merge it over. It's working now, just need to >> whitelist master instead of travis after the merge. > > I'd removed the whitelist altogether here: > https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd > > My thinking was BioPerl seems to have multiple feature branches > under the official repo, so they should get tested too. You'd be > in a better position than me to judge what would work best for > BioPerl here. > > Peter We'll keep it to master for now. It's pretty easy to add branches as needed, and I didn't want to expand to all the potentially stale branches unless explicitly set (we need to triage all those at some point). chris From p.j.a.cock at googlemail.com Fri Jul 27 10:47:18 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 27 Jul 2012 15:47:18 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: > > That's done now - except for the circular dependencies, and GD, > which might be easy to solve if anyone knows what the error > means - see commit message here: > https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a Re: https://twitter.com/cjfields/status/228861370454638592 Not sure why you got GD to work when something very similar had failed for me. Oh well - job done :) > Would a single clean commit of the (current) .travis.yml file be > preferable to the current series of commits? And you you want > a pull request, or would you just merge/cherry-pick manually? 
Given all the churn between our revisions, personally I'd opt for a single clean commit to bioperl/master - but your call. Peter From robfsouza at gmail.com Fri Jul 27 18:29:22 2012 From: robfsouza at gmail.com (Robson de Souza) Date: Fri, 27 Jul 2012 15:29:22 -0700 (PDT) Subject: [Bioperl-l] obf sites offline? Message-ID: <9bef8a3b-08ca-4868-be7a-193e7596290d@googlegroups.com> I can't access any of the OBF sites, either from work (USA) or my phone... is there something going on? Robson From p.j.a.cock at googlemail.com Thu Jul 26 11:22:26 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 26 Jul 2012 16:22:26 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: On Wed, Jul 25, 2012 at 11:31 AM, Peter Cock wrote: > On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J > wrote: >> Peter Cock has graciously helped start up a branch for bioperl-live >> that is using Travis-CI (a nice continuous integration tool). Results >> from Peter's fork are found here: >> >> http://travis-ci.org/#!/peterjc/bioperl-live >> >> As this is now pulled into the main bioperl repo, results will be here: >> >> http://travis-ci.org/#!/bioperl/bioperl-live >> >> I'll be working on this and expect this will be added to master in >> the next few days. >> >> chris > > We've had this running for Biopython for a month now, and it has > been a useful complement to the BuildBot (which covers other > operating systems). This was following BioRuby's lead: > http://biopython.org/pipermail/biopython-dev/2012-June/009742.html > > The current BioPerl Travis configuration is probably usable right > now (after changing the branch whitelist to either master, or simple > all branches). > > Other remaining issues include sorting out which dependencies > should be installed, and streamlining their verbose output (e.g. > using tail). 
That's done now - except for the circular dependencies, and GD, which might be easy to solve if anyone knows what the error means - see commit message here: https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a Would a single clean commit of the (current) .travis.yml file be preferable to the current series of commits? And do you want a pull request, or would you just merge/cherry-pick manually? > TravisCI can send out emails (e.g. on test failures), and perhaps > bioperl-guts-l might be a sensible place to send these. Initially > we'd disabled the emails for Biopython. I'd like to use an RSS > feed... there is a JSON API which BioRuby are using for > http://www.biogems.info/ which tracks their plugins. I've filed an issue for news feed support in TravisCI, https://github.com/travis-ci/travis-core/issues/82 Regards, Peter From p.j.a.cock at googlemail.com Tue Jul 31 06:37:35 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 31 Jul 2012 11:37:35 +0100 Subject: [Bioperl-l] Travis Continuous Integration testing & pull requests Message-ID: Hi all, I'm cross posting as this is an announcement. Please keep any follow-up discussion to the relevant project-specific mailing list or, if general, to open-bio-l please. Those following the OBF blog or the OBF or Bio* Twitter accounts will have already seen this, which I posted yesterday: http://news.open-bio.org/news/2012/07/travis-ci-for-testing/ In summary, since earlier this year BioRuby and then Biopython and BioPerl have been using Travis-CI.org (a hosted continuous integration service for the open source community) to run their unit tests automatically whenever their GitHub repositories are updated.
In addition we now have TravisCI automatically running our tests on any new GitHub pull requests - supported by an OBF donation to Travis-CI, see: http://about.travis-ci.org/blog/announcing-pull-request-support/ Currently BioJava only uses GitHub as an SVN mirror - but this should still let you start using TravisCI for automated testing: http://about.travis-ci.org/docs/user/languages/java/ For EMBOSS, this is another incentive to convert from CVS to github - TravisCI recently announced support for C/C++ projects: http://about.travis-ci.org/blog/support_for_go_c_and_cpp/ http://about.travis-ci.org/docs/user/languages/c/ Potentially there are other OBF projects where this would be useful too. Regards, Peter
From shalabh.sharma7 at gmail.com Mon Jul 2 13:09:57 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 2 Jul 2012 13:09:57 -0400 Subject: [Bioperl-l] translation frame problem in bioperl Message-ID: Hi All, I am just confused about the translation frames. I used bioperl to parse a blastx report. Reports shows that the frame used is -2 but when i translate the sequence using EMBOSS or Some other program the frame is -1. Am i doing something wrong here. Here is the sequence: >gi|378759230|gb|AHBJ01000169.1| SAR86 cluster bacterium SAR86D scf1120176765857, whole genome shotgun sequence 2642:3697
AGCTTCCCATGGAACCCATGCAAGTGCAATATTTGTTTCTAGCTCTGGTGACCACCAAGGAGATGTCACGTAGCCCACCTCATCTTCATCAGTATTAGTTACTATCCAAAAATCAGAAGCATAATCTGTGATTTCTTTTCCTCCAAGGGTTAAACCAACCATCTTCATTTTAAATGGTGCATTTCCTTCATCTATGATTGCTCTCTGTTTTTCAAGCTCTTCTTTACCAATGTAATCAGCTGCTTTATTTCTTGGTACCTGATAACTTAAATTAACCTGAAAGGGAGAAGTTTCATGATCCAGATCTTGTCCCCAAGACAAAATTCCAGCTGCAATGCGACGATGATGCGCAGGAGCTATGACCATTAAGCCAAATTCTTCTCCAGCCTCAAGAACAGCATTCCACATTTTTTCTGCATTATCATGTGCGTCACGAACATATATTTCATAACCTTTTTCGCCTGTAAAACCAGTTTGACTGATTACACAATCAGCTCCACCAACCTGAGTTTCTAAAATTCCATAATAAGGAACTTCTCTTAACTCTTCGCCAGCTAACTTTGCCATAAGATCTTCAGATAAAGGGCCTTGAATTTGAACAGGACAAACATCAATCTCATCAATTTCTACGTCATATTTTTTAGACACATTTACGCCTTGAAGCCAAAGTAAGAGATCGCTGTCTGATATTGAGAACCAGAATTCATCTTCTGTTAGTCTTAATAGAACAGGGTCATTTAAAACCCCTCCTTTTTCATTGCATAAAATCGCATATTTACCATTTCCGGGTTTAATTTTTGTAGCATCACGAGTTATTACATAATCTGTAAAAGCTTCTGCATCTGGACCTTTTACTCTTATCTGTCTTTCAACAGCAACATTCCACATAGTAACTCTATTAACCAAGGCTTCGTATTCAACCATGGCACCGCCATCTTCAGGTTTTACATAGCCTCGTGGATGATAAATTCGATTATATACAGTTGCTCTCCAACAGCCCGCTTCATGAGATAGATGCCAAAAAGGCGATTTTCTTACCCGGGTTGAAATTAATAA This is a part of blast report by bioperl: >JCVI_READ_1105499496127 /Indian_Ocean/gcvT Length = 352 Score = 655 bits (1690), Expect = 0.0 Identities = 311/352 (88%), Positives = 329/352 (93%) Frame = -2 Query: 3697 LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV 3518 +LISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGY+KPEDGGAMVEY+ALVNRVTMWNV Sbjct: 1 MLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYIKPEDGGAMVEYDALVNRVTMWNV 60 ..... ..... 
Query: 2797 GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA 2642 GLTLGGKEITDYA DFW+V + D + PWWSPEL TNIAL WVPW A Sbjct: 301 GLTLGGKEITDYAPDFWLVADMDGMMLDISLPPWWSPELNTNIALGWVPWSA 352 This is EMBOSS output (from EBI): >EMBOSS_001_4 LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV AVERQIRVKGPDAEAFTDYVITRDATKIKPGNGKYAILCNEKGGVLNDPVLLRLTEDEFW FSISDSDLLLWLQGVNVSKKYDVEIDEIDVCPVQIQGPLSEDLMAKLAGEELREVPYYGI LETQVGGADCVISQTGFTGEKGYEIYVRDAHDNAEKMWNAVLEAGEEFGLMVIAPAHHRR IAAGILSWGQDLDHETSPFQVNLSYQVPRNKAADYIGKEELEKQRAIIDEGNAPFKMKMV GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA >EMBOSS_001_5 INFNPGKKIAFLASIS*SGLLESNCI*SNLSSTRLCKT*RWRCHG*IRSLG**SYYVECC C*KTDKSKRSRCRSFYRLCNNS*CYKN*TRKW*ICDFMQ*KRRGFK*PCSIKTNRR*ILV ...... You can see its a frame -1. I would really appreciate your help. Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From scott at scottcain.net Mon Jul 2 14:50:45 2012 From: scott at scottcain.net (Scott Cain) Date: Mon, 2 Jul 2012 14:50:45 -0400 Subject: [Bioperl-l] GMOD Summer School application deadline Message-ID: Hello, The deadline to apply for the GMOD Summer School is in one week, July 9th. The application is available as a Google Form: https://docs.google.com/spreadsheet/embeddedform?formkey=dG5hNGFiQ3UwYTV2LUZxZW04Qm1yZXc6MQ In the GMOD Summer School (August 24-29, 2012) we will cover the installation, configuration and use of a variety of GMOD tools, including Chado, GBrowse, JBrowse and Tripal. For more information on the course, see the course web page at http://gmod.org/wiki/2012_GMOD_Summer_School The course will make heavy use of the Amazon Web Service (aka, the Cloud) via a grant from Amazon. Enrollment is limited to 24 students, and the application process is competitive: the last few years we've received over 75 applications for those 24 spots. I look forward to seeing you in North Carolina in August! 
Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From p.j.a.cock at googlemail.com Mon Jul 2 15:34:40 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 2 Jul 2012 20:34:40 +0100 Subject: [Bioperl-l] translation frame problem in bioperl In-Reply-To: References: Message-ID: On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma wrote: > Hi All, > I am just confused about the translation frames. I used bioperl to > parse a blastx report. > Reports shows that the frame used is -2 but when i translate the sequence > using EMBOSS or Some other program the frame is -1. > Am i doing something wrong here. Possibly there are conflicting definitions of frames -1, -2, and -3 here (and that's leaving out the possibility of -0, -1 and -2 counting). Some will count from the first base (start for forward strand), others the last base (start of reverse strand). This can make comparing the output of different tools quite confusing. Peter From shalabh.sharma7 at gmail.com Mon Jul 2 16:39:29 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 2 Jul 2012 16:39:29 -0400 Subject: [Bioperl-l] translation frame problem in bioperl In-Reply-To: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net> References: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net> Message-ID: Hi Peter and Brian, Thanks a lot for your reply. I have already taken this into account. If I parse the BLAST report (my previous example) I get strand '-1' and frame '1' (according to BioPerl), so converted to the usual convention that is -2, because BioPerl counts frames from 0. Also, forward-frame translation works fine in BioPerl. Thanks Shalabh On Mon, Jul 2, 2012 at 4:24 PM, Brian Osborne wrote: > Shalabh, > > Also take a look at this: > > http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29 > > Brian O.
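A minimal sketch of the frame-numbering point raised in this thread, assuming the convention in which a minus-strand frame is counted from the last base of the query sequence (tools that count from the first base will number frames differently). The helper minus_strand_frame is hypothetical and not part of BioPerl, and the contig length 3698 is invented purely for illustration, since the real contig length is not given in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): frame of a minus-strand hit,
# ASSUMING frames are counted from the last base of the query sequence.
# $query_length : length of the sequence the coordinates refer to
# $hsp_end      : highest 1-based query coordinate covered by the hit
sub minus_strand_frame {
    my ($query_length, $hsp_end) = @_;
    # Number of bases between the end of the hit and the end of the query,
    # reduced modulo 3 and reported as -1, -2 or -3.
    return -((($query_length - $hsp_end) % 3) + 1);
}

# The extracted region 2642..3697 taken on its own: 1056 bases, hit ends at 1056.
print "extracted region: ", minus_strand_frame(1056, 1056), "\n";

# The same hit counted against a longer contig (length 3698 is made up).
print "longer contig:    ", minus_strand_frame(3698, 3697), "\n";
```

Under these assumptions the same hit is numbered -1 against the extracted region and -2 against the longer sequence, so the frame reported depends on which sequence the coordinates are counted against as well as on each tool's convention.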
> > > On Jul 2, 2012, at 3:34 PM, Peter Cock wrote: > > > On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma > > wrote: > >> Hi All, > >> I am just confused about the translation frames. I used > bioperl to > >> parse a blastx report. > >> Reports shows that the frame used is -2 but when i translate the > sequence > >> using EMBOSS or Some other program the frame is -1. > >> Am i doing something wrong here. > > > > Possibly there are conflicting definitions of frames -1, -2, and -3 here > > (and that's leaving out the possibility of -0, -1 and -2 counting). Some > > will count from the first base (start for forward strand), others the > last > > base (start of reverse strand). This can make comparing the output > > of different tools quite confusing. > > > > Peter > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From bosborne11 at verizon.net Mon Jul 2 16:24:24 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 02 Jul 2012 16:24:24 -0400 Subject: [Bioperl-l] translation frame problem in bioperl In-Reply-To: References: Message-ID: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net> Shalabh, Also take a look at this: http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29 Brian O. On Jul 2, 2012, at 3:34 PM, Peter Cock wrote: > On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma > wrote: >> Hi All, >> I am just confused about the translation frames. I used bioperl to >> parse a blastx report. >> Reports shows that the frame used is -2 but when i translate the sequence >> using EMBOSS or Some other program the frame is -1. >> Am i doing something wrong here. 
> > Possibly there are conflicting definitions of frames -1, -2, and -3 here > (and that's leaving out the possibility of -0, -1 and -2 counting). Some > will count from the first base (start for forward strand), others the last > base (start of reverse strand). This can make comparing the output > of different tools quite confusing. > > Peter > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From vebaev at gmail.com Tue Jul 3 12:35:26 2012 From: vebaev at gmail.com (vebaev at gmail.com) Date: Tue, 3 Jul 2012 09:35:26 -0700 (PDT) Subject: [Bioperl-l] CFP - International Conference on Bioinformatics and Computational Biology - BIOCOMP BG 2012 Message-ID: <7b498b4c-2b2e-4e1f-871f-513203488bf1@googlegroups.com> International Conference on Bioinformatics and Computational Biology - BIOCOMP BG 2012 September 20-21, 2012, Varna, Bulgaria Dear Colleague, It is our pleasure to circulate the 2nd announcement of the International Conference on Bioinformatics and Computational Biology - BIOCOMP 2012 (http://biocomp.bio.uni-plovdiv.bg/). Keynote speakers Prof. Dr. Klaas Vandepoele - Ghent University, Belgium Dr. Andreas Gisel - Institute for Biomedical Technologies, Italy Prof. Wojciech Karlowski - Insitute of Molecular Biology and Biotechnology, Poland Prof. Mario A. 
Fares - University of Dublin, Trinity College, Ireland Dr. Andrey Kajava - CRBM - Macromolecular Biochemistry Research Center, France Dr. Gaurav Sablok - Istituto Agrario San Michele (IASMA), Italy Topics Topics of interest include, but are not limited to: High-performance bio-computing High-throughput sequencing data analysis (NGS) Bio-ontologies Molecular evolution Comparative genomics Molecular modeling and simulation Computational genetics Computational proteomics Data mining and visualization Software tools and applications Gene expression analysis Gene networks Structural biology Genome analysis Databases Systems biology Special topic: bioinformatics and miRNAs Recent achievements in these fields will be presented. The conference will include plenary and poster sessions. Participants' proposals will be taken under advisement in compiling the program. Publications All accepted abstracts will be published in the conference abstract book. Best 20 abstracts will be peer-reviewed and published as full text manuscripts in a Special Issue of Springer and Elsevier journals: Interdisciplinary Sciences: Computational Life Sciences (ISSN: 1867-1462). Journal of Computational Science (ISSN: 1877-7503) Venue The venue of the conference is 4-star All-inclusive Sunny Day Black Sea resort, Bulgaria Registration and abstract submission All the actions related to the BIOCOMP 2012 (abstract submission, registration etc) may be completed via the Conference website at http://biocomp.bio.uni-plovdiv.bg/ Accommodation IMPORTANT: Accommodation is included in the conference registration fee. Important dates Abstract Submission Deadline - 20 August 2012 Early Registration Fee Payment Deadline - 20 August 2012 Arriving, Poster set up, Registration - 19 September 2012 Plenary and Poster Sessions - 20-21 September 2012 You may find details of the Conference by visiting the Conference website at http://biocomp.bio.uni-plovdiv.bg/ Looking forward to seeing you in Bulgaria!
------------------------------------------------ Dr. Vesselin Baev Research Assistant Professor University of Plovdiv Dept. Plant Phys. and Molecular Biology Bioinformatics SMART Group Tzar Assen 24,Plovdiv 4000, BULGARIA Office:+359 32 261 (560); Mobile:+359 89 43 80 945 vebaev at gmail.com; baev at uni-plovdiv.bg; CV: http://plantgene.eu/ From tarakaramji at gmail.com Tue Jul 3 15:33:43 2012 From: tarakaramji at gmail.com (Tarakaramji Moturu) Date: Tue, 3 Jul 2012 19:33:43 +0000 (UTC) Subject: [Bioperl-l] Invitation to connect on LinkedIn Message-ID: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod> LinkedIn ------------ I'd like to add you to my professional network on LinkedIn. - Tarakaramji Tarakaramji Moturu Student at GITAM University Vishakhapatnam Area, India Confirm that you know Tarakaramji Moturu: https://www.linkedin.com/e/1505z7-h47dlkop-69/isd/7726719493/9xC087NO/?hs=false&tok=2UuxBwCCkl7Rk1 -- You are receiving Invitation to Connect emails. Click to unsubscribe: http://www.linkedin.com/e/1505z7-h47dlkop-69/q7l5PgNeLXh3mAgNJzs79PDWzhT0l80xWa/goo/bioperl-l%40bioperl%2Eorg/20061/I2613636655_1/?hs=false&tok=0hY4YIDwkl7Rk1 (c) 2012 LinkedIn Corporation. 2029 Stierlin Ct, Mountain View, CA 94043, USA. From l.m.timmermans at students.uu.nl Wed Jul 4 03:16:34 2012 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Wed, 4 Jul 2012 10:16:34 +0300 Subject: [Bioperl-l] Invitation to connect on LinkedIn In-Reply-To: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod> References: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod> Message-ID: On Tue, Jul 3, 2012 at 10:33 PM, Tarakaramji Moturu wrote: > LinkedIn > ------------ > > > > I'd like to add you to my professional network on LinkedIn. 
> > - Tarakaramji Sending messages like this directly over mailinglists is a rather bad idea, if only because LinkedIn will think bioperl-l at bioperl.org is one of the email addresses of whomever accepts the request (which is relevant for retrieving a lost password, I think). Leon From ulrik.stervbo at gmail.com Fri Jul 6 03:03:08 2012 From: ulrik.stervbo at gmail.com (Ulrik Stervbo) Date: Fri, 6 Jul 2012 09:03:08 +0200 Subject: [Bioperl-l] problem in using protparam.pm module In-Reply-To: References: <9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu> Message-ID: I had the same problem, and realized it is because I am behind a proxy. This is what I did to the Protparam module: Changed the url to 'http://web.expasy.org/cgi-bin/protparam/protparam' as previously found Added: $browser->proxy(['http'], 'http://[my proxy]/'); after initialization of the LWP agent. The proxy settings is what made Perl choke. (If only one could make perl see global proxy settings). Cheers, Ulrik 2011/7/28 Shachi Gahoi : > Please help me how to run protparam using bioperl module > > On Wed, Jul 27, 2011 at 11:05 AM, Chris Fields wrote: > >> The web service appears to have changed, but it looks as if no tests have >> been written up for this module which would have caught this out. We can >> write some basic tests up to check for simple functionality. >> >> chris >> >> On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote: >> >> > Dear All, >> > >> > i am using protparam.pm module. but when i am running this script it is >> > printing one error message >> > >> > "Can't call method "throw" without a package or object reference at >> > /usr/share/perl5/Bio/Root/Root.pm line 368, line 1." >> > >> > Kindly help me to solve this problem. 
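On the proxy workaround described above: LWP::UserAgent has an env_proxy method that loads proxy settings from the *_proxy environment variables (for example http_proxy), which may avoid hard-coding a proxy URL. The sketch below is standalone and does not patch Bio::Tools::Protparam; make_agent is a hypothetical helper name, and whether the module's own agent can be configured this way without editing it is not confirmed here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: build a user agent that honours proxy settings
# found in the environment (e.g. http_proxy, no_proxy).
sub make_agent {
    my $ua = LWP::UserAgent->new;
    $ua->env_proxy;    # load *_proxy environment variables, if any are set
    return $ua;
}

my $browser = make_agent();
my $proxy   = $browser->proxy('http');    # undef when no http proxy is configured
print defined $proxy
    ? "Using http proxy: $proxy\n"
    : "No http proxy configured\n";
```

To try it, set the variable for one run, e.g. `http_proxy=http://proxy.example.org:8080/ perl agent.pl`, where both the proxy URL and the script name are placeholders.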
>> > >> > >> > Script is here---- >> > >> ################################################################################### >> > #!/usr/bin/perl >> > >> > use warnings; >> > use Bio::SeqIO; >> > use Bio::Tools::Protparam; >> > >> > >> > $seqfile='test1.fasta'; >> > >> > $seqio = Bio::SeqIO->new(-file => "$seqfile", -format => 'Fasta'); >> > >> > >> > while( $seq = $seqio->next_seq() ) >> > { >> > >> > >> > my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq); >> > >> > print >> > "ID : ", $seq->display_id,"\n", >> > "Amino acid number : ",$pp->amino_acid_number(),"\n", >> > "Number of negative amino acids : ",$pp->num_neg(),"\n", >> > "Number of positive amino acids : ",$pp->num_pos(),"\n", >> > "Molecular weight : ",$pp->molecular_weight(),"\n", >> > "Theoretical pI : ",$pp->theoretical_pI(),"\n", >> > "Total number of atoms : ", $pp->total_atoms(),"\n", >> > "Number of carbon atoms : ",$pp->num_carbon(),"\n", >> > "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n", >> > "Number of nitrogen atoms : ",$pp->num_nitro(),"\n", >> > "Number of oxygen atoms : ",$pp->num_oxygen(),"\n", >> > "Number of sulphur atoms : ",$pp->num_sulphur(),"\n", >> > "Half life : ", $pp->half_life(),"\n", >> > "Instability Index : ", $pp->instability_index(),"\n", >> > "Stability class : ", $pp->stability(),"\n", >> > "Aliphatic_index : ",$pp->aliphatic_index(),"\n", >> > "Gravy : ", $pp->gravy(),"\n", >> > "Composition of A : ", $pp->AA_comp('A'),"\n", >> > "Composition of R : ", $pp->AA_comp('R'),"\n", >> > "Composition of N : ", $pp->AA_comp('N'),"\n", >> > "Composition of D : ", $pp->AA_comp('D'),"\n", >> > "Composition of C : ", $pp->AA_comp('C'),"\n", >> > "Composition of Q : ", $pp->AA_comp('Q'),"\n", >> > "Composition of E : ", $pp->AA_comp('E'),"\n", >> > "Composition of G : ", $pp->AA_comp('G'),"\n", >> > "Composition of H : ", $pp->AA_comp('H'),"\n", >> > "Composition of I : ", $pp->AA_comp('I'),"\n", >> > "Composition of L : ", $pp->AA_comp('L'),"\n", >> > "Composition of 
K : ", $pp->AA_comp('K'),"\n", >> > "Composition of M : ", $pp->AA_comp('M'),"\n", >> > "Composition of F : ", $pp->AA_comp('F'),"\n", >> > "Composition of P : ", $pp->AA_comp('P'),"\n", >> > "Composition of S : ", $pp->AA_comp('S'),"\n", >> > "Composition of T : ", $pp->AA_comp('T'),"\n", >> > "Composition of W : ", $pp->AA_comp('W'),"\n", >> > "Composition of Y : ", $pp->AA_comp('Y'),"\n", >> > "Composition of V : ", $pp->AA_comp('V'),"\n", >> > "Composition of B : ", $pp->AA_comp('B'),"\n", >> > "Composition of Z : ", $pp->AA_comp('Z'),"\n", >> > "Composition of X : ", $pp->AA_comp('X'),"\n"; >> > } >> > >> ################################################################################### >> > >> > >> > >> > >> > -- >> > Regards, >> > Shachi >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > Regards, > Shachi > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Fri Jul 6 13:49:46 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Fri, 6 Jul 2012 10:49:46 -0700 Subject: [Bioperl-l] problem in using protparam.pm module In-Reply-To: References: <9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu> Message-ID: <8C9056B6-1DA4-4BE0-B008-429C2F6C05BE@gmail.com> you might try the PERL_LWP_ENV_PROXY and HTTP_PROXY env variables http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#CONSTRUCTOR_METHODS http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#Proxy_attributes I can't test it my end though w/o a proxy service. On Jul 6, 2012, at 12:03 AM, Ulrik Stervbo wrote: > I had the same problem, and realized it is because I am behind a proxy. 
> > This is what I did to the Protparam module: > Changed the url to 'http://web.expasy.org/cgi-bin/protparam/protparam' > as previously found > > Added: > $browser->proxy(['http'], 'http://[my proxy]/'); after initialization > of the LWP agent. > > The proxy settings is what made Perl choke. (If only one could make > perl see global proxy settings). > > Cheers, > Ulrik > > 2011/7/28 Shachi Gahoi : >> Please help me how to run protparam using bioperl module >> >> On Wed, Jul 27, 2011 at 11:05 AM, Chris Fields wrote: >> >>> The web service appears to have changed, but it looks as if no tests have >>> been written up for this module which would have caught this out. We can >>> write some basic tests up to check for simple functionality. >>> >>> chris >>> >>> On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote: >>> >>>> Dear All, >>>> >>>> i am using protparam.pm module. but when i am running this script it is >>>> printing one error message >>>> >>>> "Can't call method "throw" without a package or object reference at >>>> /usr/share/perl5/Bio/Root/Root.pm line 368, line 1." >>>> >>>> Kindly help me to solve this problem. 
>>>> >>>> >>>> Script is here---- >>>> >>> ################################################################################### >>>> #!/usr/bin/perl >>>> >>>> use warnings; >>>> use Bio::SeqIO; >>>> use Bio::Tools::Protparam; >>>> >>>> >>>> $seqfile='test1.fasta'; >>>> >>>> $seqio = Bio::SeqIO->new(-file => "$seqfile", -format => 'Fasta'); >>>> >>>> >>>> while( $seq = $seqio->next_seq() ) >>>> { >>>> >>>> >>>> my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq); >>>> >>>> print >>>> "ID : ", $seq->display_id,"\n", >>>> "Amino acid number : ",$pp->amino_acid_number(),"\n", >>>> "Number of negative amino acids : ",$pp->num_neg(),"\n", >>>> "Number of positive amino acids : ",$pp->num_pos(),"\n", >>>> "Molecular weight : ",$pp->molecular_weight(),"\n", >>>> "Theoretical pI : ",$pp->theoretical_pI(),"\n", >>>> "Total number of atoms : ", $pp->total_atoms(),"\n", >>>> "Number of carbon atoms : ",$pp->num_carbon(),"\n", >>>> "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n", >>>> "Number of nitrogen atoms : ",$pp->num_nitro(),"\n", >>>> "Number of oxygen atoms : ",$pp->num_oxygen(),"\n", >>>> "Number of sulphur atoms : ",$pp->num_sulphur(),"\n", >>>> "Half life : ", $pp->half_life(),"\n", >>>> "Instability Index : ", $pp->instability_index(),"\n", >>>> "Stability class : ", $pp->stability(),"\n", >>>> "Aliphatic_index : ",$pp->aliphatic_index(),"\n", >>>> "Gravy : ", $pp->gravy(),"\n", >>>> "Composition of A : ", $pp->AA_comp('A'),"\n", >>>> "Composition of R : ", $pp->AA_comp('R'),"\n", >>>> "Composition of N : ", $pp->AA_comp('N'),"\n", >>>> "Composition of D : ", $pp->AA_comp('D'),"\n", >>>> "Composition of C : ", $pp->AA_comp('C'),"\n", >>>> "Composition of Q : ", $pp->AA_comp('Q'),"\n", >>>> "Composition of E : ", $pp->AA_comp('E'),"\n", >>>> "Composition of G : ", $pp->AA_comp('G'),"\n", >>>> "Composition of H : ", $pp->AA_comp('H'),"\n", >>>> "Composition of I : ", $pp->AA_comp('I'),"\n", >>>> "Composition of L : ", $pp->AA_comp('L'),"\n", >>>> "Composition 
of K : ", $pp->AA_comp('K'),"\n", >>>> "Composition of M : ", $pp->AA_comp('M'),"\n", >>>> "Composition of F : ", $pp->AA_comp('F'),"\n", >>>> "Composition of P : ", $pp->AA_comp('P'),"\n", >>>> "Composition of S : ", $pp->AA_comp('S'),"\n", >>>> "Composition of T : ", $pp->AA_comp('T'),"\n", >>>> "Composition of W : ", $pp->AA_comp('W'),"\n", >>>> "Composition of Y : ", $pp->AA_comp('Y'),"\n", >>>> "Composition of V : ", $pp->AA_comp('V'),"\n", >>>> "Composition of B : ", $pp->AA_comp('B'),"\n", >>>> "Composition of Z : ", $pp->AA_comp('Z'),"\n", >>>> "Composition of X : ", $pp->AA_comp('X'),"\n"; >>>> } >>>> >>> ################################################################################### >>>> >>>> >>>> >>>> >>>> -- >>>> Regards, >>>> Shachi >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> -- >> Regards, >> Shachi >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From bubli_thakur at rediffmail.com Sun Jul 1 10:59:29 2012 From: bubli_thakur at rediffmail.com (subarna thakur) Date: Sun, 01 Jul 2012 14:59:29 -0000 Subject: [Bioperl-l] =?utf-8?q?Ks_saturation?= Message-ID: <20120617031856.16345.qmail@f4mail-235-140.rediffmail.com> Dear all,I am trying to calculate dn/ds values of  all orthologous gene pair between a pair of genome using pairwsie_kaks.pl script within bioperl which evokes the codeml program in runmode -2. 
When I analyze the results, some of the genes have anomalously high dS (Ks) values, some even exceeding 100, so the average Ks for the whole genome shoots up. These genes are orthologous and share more than 50% sequence identity. Should I include these genes in the analysis or leave them out? If I leave them out, what Ks cutoff should I use? Some papers have used Ks values as high as 5.6. Is there a way to determine the cutoff value for Ks? Subarna From haywardjeremya at gmail.com Fri Jul 6 13:56:12 2012 From: haywardjeremya at gmail.com (Jeremy Hayward) Date: Fri, 6 Jul 2012 14:56:12 -0300 Subject: [Bioperl-l] Two 'host' tags? Message-ID: Hi-- Clueless newbie here, for which apologies. I've posted a description of my problem, inputs and outputs, at Gist 2816510; https://gist.github.com/2816510 Briefly, I'm trying to take a genbank file (.gb), and create a FASTA file with a specific identifier line for each sequence. Specifically, I want the "host" tag as the identifier. With the help of the Bioperl beginner readme and the HOWTO's (which are great!) I've worked out how to loop through my sequences and get the 'host' tag for each one. For some reason, I get two identifier lines for each sequence. I guess the problem is in the 'for' loop--it's running the stuff below it twice, once with the actual 'host' tag data and once with...nothing? Not sure. I think I can work out how to use s/ and a regex just to delete the second identifier line, but that feels like I'm avoiding the problem instead of fixing it. Any help appreciated! Many thanks, --Jeremy Hayward From jason.stajich at gmail.com Fri Jul 6 15:39:52 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Fri, 6 Jul 2012 12:39:52 -0700 Subject: [Bioperl-l] Two 'host' tags? In-Reply-To: References: Message-ID: Hi Jeremy - You are printing for every feature in the loop (e.g.
the source and the misc_RNA ) - you only want to loop through the features, then grab the one which is source, then change or print the info when you see that. So you could have an if( $feature->primary_tag eq 'source') in there or something as well. Alternatively I've left it pretty much intact and just simplified it a bit. You should also try and use Bio::SeqIO to print instead of your printing. I updated the code here to be simpler - right now it warns you that you are printing IDs with spaces (which is something you should think about when it comes to your output file, but I don't know your downstream plans). Also you could put other info in the description field if you wanted to capture accession number or the endophyte name too. https://gist.github.com/3062285 Best, Jason On Jul 6, 2012, at 10:56 AM, Jeremy Hayward wrote: > Hi-- Clueless newbie here, for which apologies. > > I've posted a description of my problem, inputs and outputs, at Gist > 2816510; https://gist.github.com/2816510 > > Briefly, I'm trying to take a genbank file (.gb), and create a FASTA > file with a specific identifier line for each sequence. Specifically, > I want the "host" tag as the identifier. With the help of the Bioperl > beginner readme and the HOWTO's (which are great!) I've worked out how > to loop through my sequences and get the 'host' tag for each one. For > some reason, I get two identifier lines for each sequence. I guess the > problem is in the 'for' loop--it's running the stuff below it twice, > once with the actual 'host' tag data and once with...nothing? Not > sure. > > I think I can work out how to use s/ and a regex just to delete the > second identifier line, but that feels like I'm avoiding the problem > instead of fixing it. Any help appreciated! 
> > > Many thanks, > > --Jeremy Hayward > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From bosborne11 at verizon.net Fri Jul 6 15:51:11 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 06 Jul 2012 15:51:11 -0400 Subject: [Bioperl-l] Two 'host' tags? In-Reply-To: References: Message-ID: <456448FF-C413-42D1-833A-FAA74E4FEF9E@verizon.net> Jeremy, Looks like each of your individual sequences has 2 features, but you only care about the 'source' feature ( if ($feat_object->primary_tag eq "source") ?). Also, try not to print out the sequence like you're doing, try to build a Sequence object for each input sequence and then write its contents to your fasta file using write_seq(). You will set the id for your Sequence object using display_name(). Brian O. On Jul 6, 2012, at 1:56 PM, Jeremy Hayward wrote: > Hi-- Clueless newbie here, for which apologies. > > I've posted a description of my problem, inputs and outputs, at Gist > 2816510; https://gist.github.com/2816510 > > Briefly, I'm trying to take a genbank file (.gb), and create a FASTA > file with a specific identifier line for each sequence. Specifically, > I want the "host" tag as the identifier. With the help of the Bioperl > beginner readme and the HOWTO's (which are great!) I've worked out how > to loop through my sequences and get the 'host' tag for each one. For > some reason, I get two identifier lines for each sequence. I guess the > problem is in the 'for' loop--it's running the stuff below it twice, > once with the actual 'host' tag data and once with...nothing? Not > sure. > > I think I can work out how to use s/ and a regex just to delete the > second identifier line, but that feels like I'm avoiding the problem > instead of fixing it. Any help appreciated! 
> > > Many thanks, > > --Jeremy Hayward > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dejian.zhao at gmail.com Wed Jul 11 13:31:37 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Thu, 12 Jul 2012 01:31:37 +0800 Subject: [Bioperl-l] Errors with Bio::Seq objects Message-ID: <4FFDB879.1020906@gmail.com> Hi, I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed. I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea. Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot! 
Best, Dejian PS: The commands and results $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb NM_053056 $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb mRNA $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb CACACG $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb Bio::Seq::RichSeq=HASH(0x20a3e7b0) From jimhu at tamu.edu Wed Jul 11 14:01:27 2012 From: jimhu at tamu.edu (Jim Hu) Date: Wed, 11 Jul 2012 13:01:27 -0500 Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <4FFDB879.1020906@gmail.com> References: <4FFDB879.1020906@gmail.com> Message-ID: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu> Hi Dejian, On Jul 11, 2012, at 12:31 PM, De-Jian Zhao 
wrote: > Hi, > > I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed. > > I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea. That's correct about Bio::Seq objects being returned. Actually, it is probably a kind of Bio::Seq object. For example, SeqIO may return a Bio::Seq::RichSeq object that inherits methods from Bio::Seq objects. However, as explained below, the methods are working as they should... they are just returning objects when you are expecting something else. > > Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot! 
> > Best, > Dejian > > > > PS: The commands and results > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb > NM_053056 > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb > mRNA > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb > CACACG > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb > Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) http://doc.bioperl.org/bioperl-live/Bio/Seq.html#POD24 $seq_obj->get_SeqFeatures() returns an array of SeqFeature objects, which are references. So this worked as expected. I usually write this as script files, so I've never done it all with perl -e. But you need to iterate over the array and query the objects for the information you want about the features. 
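The iteration described above can be sketched as a short script rather than a one-liner. This is only a minimal sketch, assuming a GenBank file is passed as the first argument; feature_lines is a hypothetical helper name, not part of BioPerl, while get_SeqFeatures, primary_tag, start, end, strand, get_all_tags and get_tag_values are the usual feature accessors.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# feature_lines() is a hypothetical helper: it turns every feature on a
# sequence object into one tab-separated line of
# primary tag, start, end, strand, then tag=value pairs.
sub feature_lines {
    my ($seq) = @_;
    my @lines;
    for my $feat ($seq->get_SeqFeatures) {
        my @tags;
        for my $tag ($feat->get_all_tags) {
            # a tag may carry several values; join them with '|'
            push @tags, "$tag=" . join('|', $feat->get_tag_values($tag));
        }
        push @lines, join("\t",
            $feat->primary_tag, $feat->start, $feat->end,
            $feat->strand // 0, @tags);
    }
    return @lines;
}

# Only run when a file name is given, e.g.: perl features.pl nt.gb
if (@ARGV) {
    my $seqio = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $seqio->next_seq) {
        print "$_\n" for feature_lines($seq);
    }
}
```

Printing the feature objects directly only shows their references; asking each object for the fields you care about, as above, gives readable output.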
> > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb > Bio::Seq::RichSeq=HASH(0x20a3e7b0) translate() returns a new sequence object, not a string, so print shows the object reference; call seq() on it to get the residues. I think $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb should work (haven't tried it). Jim > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From bosborne11 at verizon.net Wed Jul 11 13:47:25 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 11 Jul 2012 13:47:25 -0400 Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <4FFDB879.1020906@gmail.com> References: <4FFDB879.1020906@gmail.com> Message-ID: Dejian, These are not "failures". get_SeqFeatures() returns a list of sequence feature objects and translate() returns a new sequence object, so print shows object references rather than text. Start here: www.bioperl.org/wiki/HOWTO:Beginners Brian O. On Jul 11, 2012, at 1:31 PM, De-Jian Zhao wrote: > Hi, > > I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed. > > I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea. > > Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot!
> > Best, > Dejian > > > > PS: The commands and results > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb > NM_053056 > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb > mRNA > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb > CACACG > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb > Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb > Bio::Seq::RichSeq=HASH(0x20a3e7b0) > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Jul 11 15:02:46 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Jul 2012 19:02:46 +0000 
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Peng, Has this been filed as a bug yet? https://redmine.open-bio.org/projects/bioperl Seems like it would be fairly easy to fix, but I want to track it just in case. chris On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote: > Hello guys, > > Just a follow-up, it seems to me the bioperl-live version is still having the same problem - calling hit "query" while query sequence "hit". I also looked into the test script written for hmmer3 (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment part - I guess that's why this bug was not discovered. > > To be simple, here's an output of hmmsearch v3.0: > # hmmsearch :: search profile(s) against a sequence database > # HMMER 3.0 (March 2010); http://hmmer.org/ > # Copyright (C) 2010 Howard Hughes Medical Institute. > # Freely distributed under the GNU General Public License (GPLv3). 
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > # query HMM file: /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm > # target sequence database: /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa > # output directed to file: /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt > # number of worker threads: 4 > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > Query: CRP0000 [M=75] > Scores for complete sequences (score includes all domains): > --- full sequence --- --- best 1 domain --- -#dom- > E-value score bias E-value score bias exp N Sequence Description > ------- ------ ----- ------- ------ ----- ---- -- -------- ----------- > 5.5e-25 95.0 14.4 5.7e-25 95.0 10.0 1.0 1 Chr2_540228_540404_+ > > Domain annotation for each sequence (and alignments): > >> Chr2_540228_540404_+ > # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc > --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- > 1 ! 95.0 10.0 3.6e-30 5.7e-25 20 74 .. 4 59 .] 
1 59 [] 0.95 > > Alignments for each domain: > == domain 1 score: 95.0 bits; conditional E-value: 3.6e-30 > CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74 > ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c > Chr2_540228_540404_+ 4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59 > 568899***99********************************************* PP > > And here is a dump of the parsed HSP object: > $VAR1 = bless( { > 'VERBOSE' => 0, > 'IDENTICAL' => 0, > 'RANK' => 1, > 'STRANDED' => 'NONE', > 'EVALUE' => '3.6e-30', > 'HSP_LENGTH' => 56, > 'ALGORITHM' => 'HMMSEARCH' > 'SCORE' => '95.0', > 'GAP_SYMBOL' => '-', > 'CONSERVED' => 0, > > 'HIT_NAME' => 'Chr2_540228_540404_+', > 'HIT_DESC' => '', > 'HIT_START' => '20', > 'HIT_END' => '74', > 'HIT_LENGTH' => 56, > 'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc', > 'HIT_FRAME' => 0, > > 'QUERY_NAME' => 'CRP0000', > 'QUERY_DESC' => undef, > 'QUERY_START' => '4', > 'QUERY_END' => '59', > 'QUERY_LENGTH' => '75', > 'QUERY_FRAME' => 0, > 'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC', > > 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c', > }, 'Bio::Search::HSP::HMMERHSP' ); > > Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values. > > Thanks, > > Peng, > > On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: > I'll try the bioperl-live version. Thanks guys. > Scott Givan > 541-740-4685 > Sent from an iPhone (so expect typos). > > On Jul 19, 2011, at 10:34 PM, "Chris Fields" wrote: > > > This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo. I believe the one in bioperl-live is newer. Scott, can you give that a try? > > > > chris > > > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: > > > >> Hi Scott, > >> > >> Thanks for writing. 
I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be. > >> > >> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan. > >> > >> What version of HMMER3 you're using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I'm doubting it. > >> > >> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving this a shot if you haven't already. > >> > >> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem. > >> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter. > >> > >> Best, > >> Thomas > >> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: > >> > >>> Hi Thomas, > >>> > >>> I'm using modules in the bipoerl-hmmer3 git repository to parse hmmscan > >>> reports. When I parse the files and walk through the HSP's like: > >>> > >>> while (my $hit = $rslt->next_model) { > >>> > >>> while (my $domain = $hit->next_hsp) { > >>> > >>> And retrieve the "hit" coordinates like: > >>> > >>> print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'), > >>> "\n"; > >>> > >>> The coordinates returned correspond to what I would call the "query", > >>> since they are for the sequence I fed to hmmscan to search the profile > >>> database. Likewise, when retrieving the query coordinates like > >>> $domain->start('query'), I get what I consider the "hit" coordinates, > >>> since they are for the domain profile. Is this the intended behavior? > >>> > >>> Thanks. > >>> > >>> scott > >>> > >>> -- > >>> Scott A. 
Givan > >>> Associate Director > >>> Informatics Research Core Facility > >>> 240e Bond Life Sciences Center > >>> Research Assistant Professor > >>> Molecular Microbiology and Immunology > >>> University of Missouri, Columbia > >>> > >>> TEL 573-882-2948 > >>> FAX 573-884-9676 > >>> http://ircf.rnet.missouri.edu > >>> > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From p.j.a.cock at googlemail.com Wed Jul 11 17:00:56 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 11 Jul 2012 22:00:56 +0100 Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J wrote: > Peng, > > Has this been filed as a bug yet? > > https://redmine.open-bio.org/projects/bioperl > > Seems like it would be fairly easy to fix, but I want to track it just in case. > > chris Hi all, This could be the unfortunate fact that hmmscan and hmmsearch return very similar tabular output, but with query and hit interchanged. i.e. You need some extra information to know which way round they are (not possible with the current output). This was an issue in Bow's Biopython SearchIO project - which for the moment he solved by handling this as two hmmer file formats. In the medium term we're hoping hmmer3 will add some header information or something. 
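Until the parser distinguishes the two report types, the interchange reported in this thread can be worked around on the caller's side. The following is only a sketch under stated assumptions: true_coords and MockHSP are hypothetical names made up for illustration, and the only thing assumed about a real HSP object is that it answers start('hit'), end('hit'), start('query') and end('query') as in Scott's snippet. The numbers are the ones from Peng's hmmsearch dump.

```perl
use strict;
use warnings;

# The parser currently files the sequence coordinates under 'query' and
# the profile coordinates under 'hit', so ask for the opposite name.
my %SWAP = (hit => 'query', query => 'hit');

# true_coords() is a hypothetical helper, not a BioPerl method.
sub true_coords {
    my ($hsp, $which) = @_;
    my $key = $SWAP{$which}
        or die "expected 'hit' or 'query', got '$which'\n";
    return ($hsp->start($key), $hsp->end($key));
}

{
    # Stand-in for a parsed HSP, holding the values from the dump above.
    package MockHSP;
    sub new   { my ($class, %c) = @_; return bless {%c}, $class }
    sub start { $_[0]{"$_[1]_start"} }
    sub end   { $_[0]{"$_[1]_end"} }
}

my $hsp = MockHSP->new(hit_start => 20, hit_end => 74,
                       query_start => 4, query_end => 59);

printf "sequence (hit) coords: %d-%d\n", true_coords($hsp, 'hit');      # 4-59
printf "profile (query) coords: %d-%d\n", true_coords($hsp, 'query');   # 20-74
```

Keeping the swap in one helper means it can be deleted in a single place once the module itself is corrected.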
Peter From zhoupenggeni at gmail.com Wed Jul 11 13:45:00 2012 From: zhoupenggeni at gmail.com (Peng Zhou) Date: Wed, 11 Jul 2012 10:45:00 -0700 (PDT) Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Hello guys, Just a follow-up, it seems to me the bioperl-live version is still having the same problem - calling hit "query" while query sequence "hit". I also looked into the test script written for hmmer3 ( bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment part - I guess that's why this bug was not discovered. To be simple, here's an output of hmmsearch v3.0: # hmmsearch :: search profile(s) against a sequence database # HMMER 3.0 (March 2010); http://hmmer.org/ # Copyright (C) 2010 Howard Hughes Medical Institute. # Freely distributed under the GNU General Public License (GPLv3). # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # query HMM file: /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm # target sequence database: /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa # output directed to file: /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt # number of worker threads: 4 # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query: CRP0000 [M=75] Scores for complete sequences (score includes all domains): --- full sequence --- --- best 1 domain --- -#dom- E-value score bias E-value score bias exp N Sequence Description ------- ------ ----- ------- ------ ----- ---- -- -------- ----------- 5.5e-25 95.0 14.4 5.7e-25 95.0 10.0 1.0 1 Chr2_540228_540404_+ Domain annotation for each sequence (and alignments): >> Chr2_540228_540404_+ # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc --- ------ ----- --------- --------- ------- ------- ------- ------- ------- 
------- ---- 1 ! 95.0 10.0 3.6e-30 5.7e-25 20 74 .. 4 59 .] 1 59 [] 0.95 Alignments for each domain: == domain 1 score: 95.0 bits; conditional E-value: 3.6e-30 CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74 ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c Chr2_540228_540404_+ 4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59 568899***99********************************************* PP And here is a dump of the parsed HSP object: $VAR1 = bless( { 'VERBOSE' => 0, 'IDENTICAL' => 0, 'RANK' => 1, 'STRANDED' => 'NONE', 'EVALUE' => '3.6e-30', 'HSP_LENGTH' => 56, 'ALGORITHM' => 'HMMSEARCH', 'SCORE' => '95.0', 'GAP_SYMBOL' => '-', 'CONSERVED' => 0, 'HIT_NAME' => 'Chr2_540228_540404_+', 'HIT_DESC' => '', 'HIT_START' => '20', 'HIT_END' => '74', 'HIT_LENGTH' => 56, 'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc', 'HIT_FRAME' => 0, 'QUERY_NAME' => 'CRP0000', 'QUERY_DESC' => undef, 'QUERY_START' => '4', 'QUERY_END' => '59', 'QUERY_LENGTH' => '75', 'QUERY_FRAME' => 0, 'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC', 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c', }, 'Bio::Search::HSP::HMMERHSP' ); Clearly, the "HIT_START", "HIT_END" and "HIT_SEQ" values should actually be exchanged with the "QUERY_START", "QUERY_END" and "QUERY_SEQ" values. Thanks, Peng On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: > > I'll try the bioperl-live version. Thanks guys. > > Scott Givan > 541-740-4685 > Sent from an iPhone (so expect typos). > > On Jul 19, 2011, at 10:34 PM, "Chris Fields" > wrote: > > > This might be a disconnect between the HMMER3 version in bioperl-live > and the one in Kai's bioperl-hmmer3 repo. I believe the one in > bioperl-live is newer. Scott, can you give that a try? > > > > chris > > > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: > > > >> Hi Scott, > >> > >> Thanks for writing. 
I'm on the road at the moment so I have to be > briefer and less thorough than I'd like to be. > >> > >> What you are observing is not the intended behavior. Oddly, it's not > what I recall obtaining in my tests on this software, though I was mostly > interested in hmmsearch at the time and may have been sloppier than I > should have been when it came to hmmscan. > >> > >> What version of HMMER3 you're using? There have been some small > formatting changes in the past that might be causing a burp in the parser, > though I'm doubting it. > >> > >> Kai Blin wrote some test scripts (found here: > bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate > query/hit coordinates. It might be worth giving this a shot if you haven't > already. > >> > >> Also, if you don't mind, I'm happy to run your code on your output file > on my end. It might help me diagnose the problem. > >> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case > anyone else has insight into this matter. > >> > >> Best, > >> Thomas > >> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: > >> > >>> Hi Thomas, > >>> > >>> I'm using modules in the bipoerl-hmmer3 git repository to parse hmmscan > >>> reports. When I parse the files and walk through the HSP's like: > >>> > >>> while (my $hit = $rslt->next_model) { > >>> > >>> while (my $domain = $hit->next_hsp) { > >>> > >>> And retrieve the "hit" coordinates like: > >>> > >>> print "hit coords: ", $domain->start('hit'), "-", > $domain->end('hit'), > >>> "\n"; > >>> > >>> The coordinates returned correspond to what I would call the "query", > >>> since they are for the sequence I fed to hmmscan to search the profile > >>> database. Likewise, when retrieving the query coordinates like > >>> $domain->start('query'), I get what I consider the "hit" coordinates, > >>> since they are for the domain profile. Is this the intended behavior? > >>> > >>> Thanks. > >>> > >>> scott > >>> > >>> -- > >>> Scott A. 
Givan > >>> Associate Director > >>> Informatics Research Core Facility > >>> 240e Bond Life Sciences Center > >>> Research Assistant Professor > >>> Molecular Microbiology and Immunology > >>> University of Missouri, Columbia > >>> > >>> TEL 573-882-2948 > >>> FAX 573-884-9676 > >>> http://ircf.rnet.missouri.edu > >>> > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From zhoupenggeni at gmail.com Wed Jul 11 14:03:17 2012 From: zhoupenggeni at gmail.com (Peng Zhou) Date: Wed, 11 Jul 2012 11:03:17 -0700 (PDT) Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <4FFDB879.1020906@gmail.com> References: <4FFDB879.1020906@gmail.com> Message-ID: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com> Hi, I guess that's what the commands are supposed to do: the get_SeqFeatures() method returns an array of Bio::SeqFeature objects, and the translate() method returns a Bio::Seq object. And you can't usefully "print" an object in Perl (you only get its class name and memory address, as in your output) - you can "dump" it though: $ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb $ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->translate()); ' nt.gb On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote: > > Hi, > > I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and > tested the Bio::SeqIO module as follows. The first 3 commands succeeded; > however the last 2 failed. 
> > I think $seqio->next_seq() produces a Bio::Seq object which contains the > sequence, features and annotation (according to the DESCRIPTION of > "perldoc Bio::Seq") and thus the invocation of the methods > get_SeqFeatures() and translate() should be valid. However, the results > denied this idea. > > Will anyone explain what happened to the last 2 commands? I have > encountered numerous cases of failures when testing the bioperl methods. > I want to translate the mRNA sequence and extract the sequence features. > What are the right commands? Thanks a lot! 
> > Best, > Dejian > > > > PS: The commands and results > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->display_id(); ' nt.gb > NM_053056 > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->molecule(); ' nt.gb > mRNA > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->subseq(1,6); ' nt.gb > CACACG > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb > Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) > > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->translate(); ' nt.gb > Bio::Seq::RichSeq=HASH(0x20a3e7b0) > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From zhoupenggeni at gmail.com Wed Jul 11 16:05:56 2012 From: zhoupenggeni at gmail.com (Peng Zhou) Date: Wed, 11 Jul 2012 13:05:56 -0700 
(PDT) Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Thanks Chris, here is the link of the filed bug: https://redmine.open-bio.org/issues/3369 On Wednesday, July 11, 2012 2:02:46 PM UTC-5, Christopher Fields wrote: > > Peng, > > Has this been filed as a bug yet? > > https://redmine.open-bio.org/projects/bioperl > > Seems like it would be fairly easy to fix, but I want to track it just in > case. > > chris > > On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote: > > > Hello guys, > > > > Just a follow-up, it seems to me the bioperl-live version is still > having the same problem - calling hit "query" while query sequence "hit". I > also looked into the test script written for hmmer3 > (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment > part - I guess that's why this bug was not discovered. > > > > To be simple, here's an output of hmmsearch v3.0: > > # hmmsearch :: search profile(s) against a sequence database > > # HMMER 3.0 (March 2010); http://hmmer.org/ > > # Copyright (C) 2010 Howard Hughes Medical Institute. > > # Freely distributed under the GNU General Public License (GPLv3). 
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - > > # query HMM file: > /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm > > # target sequence database: > /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa > > > # output directed to file: > /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt > > # number of worker threads: 4 > > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - > > > > Query: CRP0000 [M=75] > > Scores for complete sequences (score includes all domains): > > --- full sequence --- --- best 1 domain --- -#dom- > > E-value score bias E-value score bias exp N Sequence > Description > > ------- ------ ----- ------- ------ ----- ---- -- -------- > ----------- > > 5.5e-25 95.0 14.4 5.7e-25 95.0 10.0 1.0 1 > Chr2_540228_540404_+ > > > > Domain annotation for each sequence (and alignments): > > >> Chr2_540228_540404_+ > > # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali > to envfrom env to acc > > --- ------ ----- --------- --------- ------- ------- ------- > ------- ------- ------- ---- > > 1 ! 95.0 10.0 3.6e-30 5.7e-25 20 74 .. 4 > 59 .] 
1 59 [] 0.95 > > > > Alignments for each domain: > > == domain 1 score: 95.0 bits; conditional E-value: 3.6e-30 > > CRP0000 20 > tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74 > > ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg > rrrC+Ct++c > > Chr2_540228_540404_+ 4 > GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59 > > > 568899***99********************************************* PP > > > > And here is a dump of the parsed HSP object: > > $VAR1 = bless( { > > 'VERBOSE' => 0, > > 'IDENTICAL' => 0, > > 'RANK' => 1, > > 'STRANDED' => 'NONE', > > 'EVALUE' => '3.6e-30', > > 'HSP_LENGTH' => 56, > > 'ALGORITHM' => 'HMMSEARCH' > > 'SCORE' => '95.0', > > 'GAP_SYMBOL' => '-', > > 'CONSERVED' => 0, > > > > 'HIT_NAME' => 'Chr2_540228_540404_+', > > 'HIT_DESC' => '', > > 'HIT_START' => '20', > > 'HIT_END' => '74', > > 'HIT_LENGTH' => 56, > > 'HIT_SEQ' => > 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc', > > 'HIT_FRAME' => 0, > > > > 'QUERY_NAME' => 'CRP0000', > > 'QUERY_DESC' => undef, > > 'QUERY_START' => '4', > > 'QUERY_END' => '59', > > 'QUERY_LENGTH' => '75', > > 'QUERY_FRAME' => 0, > > 'QUERY_SEQ' => > 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC', > > > > 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs > +nCa+vC++Egf gG+crg rrrC+Ct++c', > > }, 'Bio::Search::HSP::HMMERHSP' ); > > > > Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be > exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values. > > > > Thanks, > > > > Peng, > > > > On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: > > I'll try the bioperl-live version. Thanks guys. > > Scott Givan > > 541-740-4685 > > Sent from an iPhone (so expect typos). > > > > On Jul 19, 2011, at 10:34 PM, "Chris Fields" > wrote: > > > > > This might be a disconnect between the HMMER3 version in bioperl-live > and the one in Kai's bioperl-hmmer3 repo. I believe the one in > bioperl-live is newer. Scott, can you give that a try? 
> > > > > > chris > > > > > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: > > > > > >> Hi Scott, > > >> > > >> Thanks for writing. I'm on the road at the moment so I have to be > briefer and less thorough than I'd like to be. > > >> > > >> What you are observing is not the intended behavior. Oddly, it's not > what I recall obtaining in my tests on this software, though I was mostly > interested in hmmsearch at the time and may have been sloppier than I > should have been when it came to hmmscan. > > >> > > >> What version of HMMER3 you're using? There have been some small > formatting changes in the past that might be causing a burp in the parser, > though I'm doubting it. > > >> > > >> Kai Blin wrote some test scripts (found here: > bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate > query/hit coordinates. It might be worth giving this a shot if you haven't > already. > > >> > > >> Also, if you don't mind, I'm happy to run your code on your output > file on my end. It might help me diagnose the problem. > > >> > > >> Sorry this is being a thorn in your side! I've cc'ed the list in case > anyone else has insight into this matter. > > >> > > >> Best, > > >> Thomas > > >> > > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: > > >> > > >>> Hi Thomas, > > >>> > > >>> I'm using modules in the bipoerl-hmmer3 git repository to parse > hmmscan > > >>> reports. When I parse the files and walk through the HSP's like: > > >>> > > >>> while (my $hit = $rslt->next_model) { > > >>> > > >>> while (my $domain = $hit->next_hsp) { > > >>> > > >>> And retrieve the "hit" coordinates like: > > >>> > > >>> print "hit coords: ", $domain->start('hit'), "-", > $domain->end('hit'), > > >>> "\n"; > > >>> > > >>> The coordinates returned correspond to what I would call the > "query", > > >>> since they are for the sequence I fed to hmmscan to search the > profile > > >>> database. 
Likewise, when retrieving the query coordinates like > > >>> $domain->start('query'), I get what I consider the "hit" > coordinates, > > >>> since they are for the domain profile. Is this the intended > behavior? > > >>> > > >>> Thanks. > > >>> > > >>> scott > > >>> > > >>> -- > > >>> Scott A. Givan > > >>> Associate Director > > >>> Informatics Research Core Facility > > >>> 240e Bond Life Sciences Center > > >>> Research Assistant Professor > > >>> Molecular Microbiology and Immunology > > >>> University of Missouri, Columbia > > >>> > > >>> TEL 573-882-2948 > > >>> FAX 573-884-9676 > > >>> http://ircf.rnet.missouri.edu > > >>> > > >>> > > >>> > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From w.arindrarto at gmail.com Wed Jul 11 17:25:44 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 11 Jul 2012 23:25:44 +0200 Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Hi everyone, Just as an additional info that might be useful: The current Biopython parser for the plain text format parses the very first line to find out which HMMER flavor produces the result. Both 'hmm from' and 'hmmto' are query coordinates if the flavor is hmmsearch or phmmer; and they're hit coordinates if the flavor is hmmscan. 
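A minimal sketch of the flavor check described above, in plain Perl with no BioPerl dependency. The subroutine names program_from_header and assign_coords and the hash keys are hypothetical, chosen only for illustration; this is not the code used by the BioPerl hmmer3 parser or by Biopython. It assumes the first line of the report has the form "# hmmsearch :: ...", as in the hmmsearch output quoted earlier in this thread, and that hmmscan and phmmer reports start the same way.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Extract the program name from the first line of a HMMER3 text report,
# e.g. "# hmmsearch :: search profile(s) against a sequence database".
# Returns undef if the line does not look like a HMMER3 banner.
sub program_from_header {
    my ($first_line) = @_;
    return $first_line =~ /^#\s*(\w+)\s*::/ ? lc $1 : undef;
}

# Map the raw domain-table columns onto query/hit coordinates.
# hmmsearch/phmmer: the profile (hmm from / hmm to) is the query.
# hmmscan:          the profile is the hit.
sub assign_coords {
    my ($program, %raw) = @_;    # keys: hmm_from hmm_to ali_from ali_to
    my @hmm = @raw{qw(hmm_from hmm_to)};
    my @ali = @raw{qw(ali_from ali_to)};

    if ($program eq 'hmmsearch' || $program eq 'phmmer') {
        return (query => \@hmm, hit => \@ali);
    }
    elsif ($program eq 'hmmscan') {
        return (query => \@ali, hit => \@hmm);
    }
    die "Unrecognised HMMER program '$program'\n";
}

# Values taken from the hmmsearch domain line quoted in this thread.
my $header  = '# hmmsearch :: search profile(s) against a sequence database';
my $program = program_from_header($header)
    or die "Could not determine the HMMER program from the header\n";
my %coords  = assign_coords($program,
    hmm_from => 20, hmm_to => 74, ali_from => 4, ali_to => 59);

printf "%s: query %d-%d, hit %d-%d\n", $program,
    @{ $coords{query} }, @{ $coords{hit} };
```

With the hmmsearch example from this thread the profile CRP0000 is the query, so the script prints "hmmsearch: query 20-74, hit 4-59". Passing 'hmmscan' as the program swaps the two pairs.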
This information is not available in other HMMER command line output formats (tblout and domtblout), which as Peter has mentioned, required us to treat different flavors of the table output as different formats for the time being. Fortunately, after contacting the HMMER developers they mentioned that this is not the case anymore in their development branch (and their future planned release). Hope that helps :), Bow On Wed, Jul 11, 2012 at 11:00 PM, Peter Cock wrote: > On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J > wrote: >> Peng, >> >> Has this been filed as a bug yet? >> >> https://redmine.open-bio.org/projects/bioperl >> >> Seems like it would be fairly easy to fix, but I want to track it just in case. >> >> chris > > Hi all, > > This could be the unfortunate fact that hmmscan and > hmmsearch return very similar tabular output, but > with query and hit interchanged. i.e. You need some > extra information to know which way round they are > (not possible with the current output). This was an > issue in Bow's Biopython SearchIO project - which > for the moment he solved by handling this as two > hmmer file formats. In the medium term we're hoping > hmmer3 will add some header information or something. > > Peter From dejian.zhao at gmail.com Thu Jul 12 01:04:54 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Thu, 12 Jul 2012 13:04:54 +0800 Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com> References: <4FFDB879.1020906@gmail.com> <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com> Message-ID: <4FFE5AF6.1020300@gmail.com> Thank you, Peng. That's great! Actually I am wondering how to get the whole content of an object these days; "Dumping it" is a good solution. 
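A small sketch of why printing an object shows something like "Class=HASH(0x...)" and how dumping or calling an accessor gets at the content. Toy::Seq is a hypothetical stand-in class written for this example; it is not part of BioPerl, and its revcom method only mimics the way translate() returns a new object rather than a string. Only the core Data::Dumper module is used.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

{
    # Hypothetical toy class, NOT a BioPerl module.
    package Toy::Seq;

    sub new        { my ($class, %args) = @_; return bless {%args}, $class }
    sub display_id { return $_[0]{id} }
    sub seq        { return $_[0]{seq} }

    # Returns a new Toy::Seq object, not a plain string.
    sub revcom {
        my ($self) = @_;
        my $rc = scalar reverse $self->seq;
        $rc =~ tr/ACGTacgt/TGCAtgca/;
        return Toy::Seq->new(id => $self->display_id, seq => $rc);
    }
}

my $seq = Toy::Seq->new(id => 'demo', seq => 'CACACG');
my $rc  = $seq->revcom;

# Printing the object itself only shows its class and memory address.
print "$rc\n";

# Calling an accessor on the returned object gives the actual data.
print $rc->seq, "\n";

# Dumping shows the whole internal structure of the object.
local $Data::Dumper::Sortkeys = 1;
print Dumper($rc);
```

The first print gives a string of the form "Toy::Seq=HASH(0x...)", the second gives CGTGTG, and the Dumper call shows the blessed hash with its id and seq keys. For a method that returns a list of objects, the same idea applies inside a loop that calls an accessor on each element.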
On 2012-7-12 2:03, Peng Zhou wrote: > Hi, > > I guess that's what the commands are supposed to do: the get_SeqFeatures() > method return an array of Bio::SeqFeature objects, and the translate() > method returns a Bio::Seq object. And you can't simply "print" an object in > perl - you can "dump" it though: > > $ perl -e ' use Bio::SeqIO; use Data::Dumper; my > $seqio=Bio::SeqIO->new(-file=>shift); > print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb > > $ perl -e ' use Bio::SeqIO; use Data::Dumper; my > $seqio=Bio::SeqIO->new(-file=>shift); > print Dumper($seqio->next_seq()->translate()); ' nt.gb > > On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote: >> Hi, >> >> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and >> tested the Bio::SeqIO module as follows. The first 3 commands succeeded; >> however the last 2 failed. >> >> I think $seqio->next_seq() produces a Bio::Seq object which contains the >> sequence, features and annotation (according to the DESCRIPTION of >> "perldoc Bio::Seq") and thus the invocation of the methods >> get_SeqFeatures() and translate() should be valid. However, the results >> denied this idea. >> >> Will anyone explain what happened to the last 2 commands? I have >> encountered numerous cases of failures when testing the bioperl methods. >> I want to translate the mRNA sequence and extract the sequence features. >> What are the right commands? Thanks a lot! 
>> >> Best, >> Dejian >> >> >> >> PS: The commands and results >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->display_id(); ' nt.gb >> NM_053056 >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->molecule(); ' nt.gb >> mRNA >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->subseq(1,6); ' nt.gb >> CACACG >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb >> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) >> >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->translate(); ' nt.gb >> Bio::Seq::RichSeq=HASH(0x20a3e7b0) >> From dejian.zhao at gmail.com Thu Jul 12 01:14:33 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Thu, 12 Jul 2012 13:14:33 +0800 Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu> References: 
<4FFDB879.1020906@gmail.com> <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu> Message-ID: <4FFE5D39.6010406@gmail.com> Thank you, Jim. You are right. It works. This example deepens my understanding of OOP. On 2012-7-12 2:01, Jim Hu wrote: >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb >> > Bio::Seq::RichSeq=HASH(0x20a3e7b0) > ->translate returns a new Seq object. I think > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb > > should work (haven't tried it). From kai.blin at biotech.uni-tuebingen.de Thu Jul 12 09:43:19 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 12 Jul 2012 15:43:19 +0200 Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: <4FFED477.3090907@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-07-11 23:25, Wibowo Arindrarto wrote: Hi, > The current Biopython parser for the plain text format parses the > very first line to find out which HMMER flavor produces the result. > Both 'hmm from' and 'hmmto' are query coordinates if the flavor is > hmmsearch or phmmer; and they're hit coordinates if the flavor is > hmmscan. Whoops. I mostly looked at hmmscan when writing the parser, because that's the file format I needed for my code. The code clearly should follow the way the hmmer2 parser works, and differentiate between hmmsearch and hmmscan type output. As I said on the bug report, I'm happy to look at code fixing this. > This information is not available in other HMMER command line > output formats (tblout and domtblout), which as Peter has > mentioned, required us to treat different flavors of the table > output as different formats for the time being. 
As far as I'm aware, BioPerl currently doesn't parse the table output format.

Seeing how much repeated pain we run into with all these parsers in the different Bio* projects, I wonder if there was a smarter way to deal with parsing. Maybe at least some shared grammar file that we could use for testing, to make sure we at least have the same expectations about file formats in the different language implementations. Ideally we'd auto-generate the parsers from the grammar specification, but I guess that'll stay wishful thinking for quite a bit.

> Fortunately, after contacting the HMMER developers they mentioned
> that this is not the case anymore in their development branch (and
> their future planned release).

That's certainly good news. :)

Cheers,
Kai

- --
Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJP/tR3AAoJEKM5lwBiwTTP6OoIAM3J9chdyfmTuQTp4KMxVIk7
PCkJy+aLcnfa3d7s8BVPG0GWQTPrfHLX6a7zWfoSLzL9RBShFWCQIxGpu+Tq3yR8
Hu/TpoFIg8bB1iAroAWLdsX8nio3Idlcl5JN38LBsFEUirFrGAsvfdN/+fYrP5Ni
y0ULP18uihiN07sVG88nZXNyEB7fIscVYdO90GsGq03/KOTRsRD4kugapiQJIy4D
lrqnYznLa4p30lBDCEHbTaHYbfIs7/8tryfHJsfjimjg8IoSMHMJfIkI7/z0qlL+
bxt/HuGMsm1Ak08xEAoT7T00t5tcAp1gclgZsO/CrviOicmhUgd6iri/kIpzg0c=
=acWd
-----END PGP SIGNATURE-----

From cjfields at illinois.edu Thu Jul 12 11:24:13 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 12 Jul 2012 15:24:13 +0000
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates
In-Reply-To: <4FFED477.3090907@biotech.uni-tuebingen.de>
References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com>
<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> <4FFED477.3090907@biotech.uni-tuebingen.de> Message-ID: <1C3A31F9-9717-49F3-A880-FA725D0F3CDB@illinois.edu> On Jul 12, 2012, at 8:43 AM, Kai Blin wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 2012-07-11 23:25, Wibowo Arindrarto wrote: > > Hi, > >> The current Biopython parser for the plain text format parses the >> very first line to find out which HMMER flavor produces the result. >> Both 'hmm from' and 'hmmto' are query coordinates if the flavor is >> hmmsearch or phmmer; and they're hit coordinates if the flavor is >> hmmscan. > > Whoops. I mostly looked at hmmscan when writing the parser, because > that's the file format I needed for my code. The code clearly should > follow the way the hmmer2 parser works, and differentiate between > hmmsearch and hmmscan type output. > > As I said on the bug report, I'm happy to look at code fixing this. Seems like it should be easy enough to address if there is something in the output that indicates the report type. >> This information is not available in other HMMER command line >> output formats (tblout and domtblout), which as Peter has >> mentioned, required us to treat different flavors of the table >> output as different formats for the time being. > > As far as I'm aware, BioPerl currently doesn't parse the table output > format. The only reason to do so is if the table provides additional information the actual hits don't (this can be the case with BLAST reports). > Seeing how much repeated pain we run into with all these parsers in > the different Bio* projects, I wonder if there was a smarter way to > deal with parsing. Maybe at least some shared grammar file that we > could use for testing, to make sure we at least have the same > expectations about file formats in the different language > implementations. Ideally we'd auto-generate the parsers from the > grammar specification, but I guess that'll stay wishful thinking for > quite a bit. 
I would fully support something like this, been thinking about this with Marpa::XS (which now has a compiled library, libmarpa, to make it less perl-centric), and there have been talks of using a similar toolkit with the bioruby folks. We could always have a plain-perl/python/ruby/etc fallback in the most common formats. chris From buschj at hhu.de Sun Jul 15 15:46:42 2012 From: buschj at hhu.de (jobu) Date: Sun, 15 Jul 2012 21:46:42 +0200 Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches Message-ID: <50031E22.3060902@hhu.de> Dear All. Still being a beginner in Perl and just having started to look into BioPerl, I hope to ask my question at the right place. I locally ran a standalone blastn search of many short query-sequences against a set of target-fasta-sequences consisting of whole chromosomal sequence data. What I need to do now is to get let's say 100nt each Up- and Downstream out of my target sequences for each Blast match. At this point I only can assume that BioPerl might be helpfull in resolving this task, though I haven't found a module yet that will manage to do this locally on my harddrive. Thus I would be thankful for the slightest hint where to begin. Sincerely Jochen From Russell.Smithies at agresearch.co.nz Sun Jul 15 17:19:14 2012 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 16 Jul 2012 09:19:14 +1200 Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches In-Reply-To: <50031E22.3060902@hhu.de> References: <50031E22.3060902@hhu.de> Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCF2A4CAA@exchsth.agresearch.co.nz> Hi Jochen, I don't think BioPerl can directly manipulate blast databases so I'd probably do it with fastacmd to extract the sequence from the original blast database. eg. 
    fastacmd -s X51494.1 -d /dataset/blastdata/active/nt -L 100,200

    >gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
    ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
    CATGCAGCTACAAGGCATGAT

Or if you're using blast+, use the blastdbcmd command: eg.

    blastdbcmd -entry X51494.1 -db /dataset/blastdata/active/nt -range 100-200

    >gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
    ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
    CATGCAGCTACAAGGCATGAT

So to add it all together, try using BioPerl to parse your existing blast results and pull out each hit's coordinates then use a system call to exec fastacmd or blastdbcmd to extract the sequence from the blast database then write the sequences to file.

These might be useful:
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/HOWTO:SearchIO#Speed_improvements_with_lightweight_objects
http://www.bioperl.org/wiki/HOWTO:BlastPlus
http://www.bioperl.org/wiki/HOWTO:StandAloneBlast

--Russell

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jobu
Sent: Monday, 16 July 2012 7:47 a.m.
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches

Dear All.

Still being a beginner in Perl and just having started to look into BioPerl, I hope to ask my question at the right place. I locally ran a standalone blastn search of many short query-sequences against a set of target-fasta-sequences consisting of whole chromosomal sequence data. What I need to do now is to get let's say 100nt each Up- and Downstream out of my target sequences for each Blast match. At this point I only can assume that BioPerl might be helpfull in resolving this task, though I haven't found a module yet that will manage to do this locally on my harddrive. Thus I would be thankful for the slightest hint where to begin.
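The coordinate arithmetic for the flanks can be sketched in plain Perl before any sequence is fetched. The subroutine name flanking_ranges and the example coordinates are hypothetical; the sketch assumes 1-based, inclusive hit coordinates on the target sequence (as BLAST reports them) and a known target length. It only computes the ranges, written as "start-end" like the range shown in the blastdbcmd example, and does not call fastacmd or blastdbcmd itself.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Compute the flanking ranges on either side of a hit, in plus-strand
# coordinates of the target sequence (1-based, inclusive), clamped to
# [1, target length]. A flank is undef when the hit touches that end.
sub flanking_ranges {
    my ($hit_start, $hit_end, $target_len, $flank) = @_;

    # Minus-strand hits may be reported with start > end; normalise them.
    ($hit_start, $hit_end) = ($hit_end, $hit_start) if $hit_start > $hit_end;

    my ($left, $right);
    if ($hit_start > 1) {
        my $from = $hit_start - $flank;
        $from = 1 if $from < 1;
        $left = [ $from, $hit_start - 1 ];
    }
    if ($hit_end < $target_len) {
        my $to = $hit_end + $flank;
        $to = $target_len if $to > $target_len;
        $right = [ $hit_end + 1, $to ];
    }
    return ($left, $right);
}

# Each entry: hit start, hit end, target length (made-up example values).
for my $hit ([250, 300, 1000], [40, 90, 150]) {
    my ($left, $right) = flanking_ranges(@$hit, 100);
    my $left_str  = defined $left  ? join('-', @$left)  : 'none';
    my $right_str = defined $right ? join('-', @$right) : 'none';
    printf "hit %d-%d: left %s, right %s\n",
        $hit->[0], $hit->[1], $left_str, $right_str;
}
```

This prints "hit 250-300: left 150-249, right 301-400" and "hit 40-90: left 1-39, right 91-150". The ranges are named left and right rather than upstream and downstream because, for a hit on the minus strand, the left flank on the plus strand is the downstream side relative to the query.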
Sincerely Jochen _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From dcmertens.perl at gmail.com Tue Jul 17 08:57:55 2012 From: dcmertens.perl at gmail.com (David Mertens) Date: Tue, 17 Jul 2012 07:57:55 -0500 Subject: [Bioperl-l] Announcing The Quantified Onion Google Group and perl4science.github.com Message-ID: Hello everybody - I returned from YAPC::NA this year intending to build-up the scientific Perl community. One outgrowth of this has been Joel Berger's creation of perl4science.github.com and gizmomathboy's creation of The Quantified Onion Google Group . perl4science is meant to be a landing page for anybody looking to combine Perl and science. Since it is a github repository, it makes it about as easy as possible for others to contribute content or fixes. If you have a project that scientists would find useful, you should fork the project, add your content, and issue a pull request. It's that easy. The Quantified Onion is meant to be a space for scientists to discuss how we use Perl in our science and to work together to grow adoption of Perl among scientists. 
It will undoubtedly attract newcomers to Perl asking beginner questions, at which point we will gently refer them to the appropriate manual pages.

Interesting discussions thus far (in my mind) include a discussion about teaching test-driven design and a discussion about submitting an article to Computing in Science and Engineering for their November Issue, which is supposed to be about Modern Programming Languages. I would like to begin putting on workshops on Perl for Scientists and Engineers (and encourage others to do the same), and I will begin the discussion on The Quantified Onion.

If you know of other Perl science resources, please feel free to add them to perl4science or post them on The Quantified Onion, and please join The Quantified Onion. Together, we can grow Perl's adoption among scientists!

David Mertens

--
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." -- Brian Kernighan

From cjfields at illinois.edu Wed Jul 18 10:29:02 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 18 Jul 2012 14:29:02 +0000
Subject: [Bioperl-l] [blast-announce] OLD_BLAST parameter to be discontinued. Alternative NCBI BLAST parsable formats are available
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF2B63F6C5@CHIMBX5.ad.uillinois.edu>

Not sure if anyone is using this as a means of getting their reports (I don't), but I'm posting this here just in case.

-c

Begin forwarded message:

> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]"
> Subject: [blast-announce] OLD_BLAST parameter to be discontinued. Alternative NCBI BLAST parsable formats are available
> Date: July 18, 2012 9:17:05 AM CDT
> To: NLM/NCBI List blast-announce
>
> Beginning Sept. 10, 2012, the BLAST service will ignore the OLD_BLAST parameter in posted URLs.
We are removing this old and little used option to prepare for upcoming enhancements to the BLAST service later this year. Setting OLD_BLAST=true produces an older version of the BLAST HTML results that a few people have used for automated processing (parsing) of results. NCBI BLAST supports a number of different and more stable formats for parsing. These include XML, tabular reports and ASN.1. For more information, please see BLAST Developer Information (http://1.usa.gov/O8AocI) and links on that page. > From dejian.zhao at gmail.com Wed Jul 18 11:36:14 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Wed, 18 Jul 2012 23:36:14 +0800 Subject: [Bioperl-l] Which graphic module should I learn? Message-ID: <5006D7EE.1020205@gmail.com> Hi, all. Currently I am working on a genome. I will draw some pictures based on the sequencing data. In the long run, I will use the module in my future projects, so I want to learn a popular module to get better support from the community. I searched in cpan with the command "i /SVG/" and got 234 items. Which one is popular in bioinformatics? Which module should I start with? Thanks for any suggestions. Best, De-Jian From scott at scottcain.net Wed Jul 18 11:46:01 2012 From: scott at scottcain.net (Scott Cain) Date: Wed, 18 Jul 2012 11:46:01 -0400 Subject: [Bioperl-l] Which graphic module should I learn? In-Reply-To: <5006D7EE.1020205@gmail.com> References: <5006D7EE.1020205@gmail.com> Message-ID: Hi De-Jian, Of course, it depends on what you want to do, but if you're referring to the genome feature/annotation type graphics, Bio::Graphics already supports SVG pretty well, via GD::SVG. Scott On Wed, Jul 18, 2012 at 11:36 AM, De-Jian Zhao wrote: > Hi, all. > > Currently I am working on a genome. I will draw some pictures based on the > sequencing data. In the long run, I will use the module in my future > projects, so I want to learn a popular module to get better support from the > community. 
I searched in cpan with the command "i /SVG/" and got 234 items. > Which one is popular in bioinformatics? Which module should I start with? > Thanks for any suggestions. > > Best, > De-Jian > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Tue Jul 24 23:08:05 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 25 Jul 2012 03:08:05 +0000 Subject: [Bioperl-l] BioPerl and Travis-CI Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Peter Cock has graciously helped start up a branch for bioperl-live that is using Travis-CI (a nice continuous integration tool). Results from Peter's fork are found here: http://travis-ci.org/#!/peterjc/bioperl-live As this is now pulled into the main bioperl repo, results will be here: http://travis-ci.org/#!/bioperl/bioperl-live I'll be working on this and expect this will be added to master in the next few days. chris From p.j.a.cock at googlemail.com Wed Jul 25 06:31:13 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 25 Jul 2012 11:31:13 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J wrote: > Peter Cock has graciously helped start up a branch for bioperl-live > that is using Travis-CI (a nice continuous integration tool). 
Results > from Peter's fork are found here: > > http://travis-ci.org/#!/peterjc/bioperl-live > > As this is now pulled into the main bioperl repo, results will be here: > > http://travis-ci.org/#!/bioperl/bioperl-live > > I'll be working on this and expect this will be added to master in > the next few days. > > chris We've had this running for Biopython for a month now, and it has been a useful complement to the BuildBot (which covers other operating systems). This was following BioRuby's lead: http://biopython.org/pipermail/biopython-dev/2012-June/009742.html The current BioPerl Travis configuration is probably usable right now (after changing the branch whitelist to either master, or simple all branches). Other remaining issues include sorting out which dependencies should be installed, and streamlining their verbose output (e.g. using tail). TravisCI can send out emails (e.g. on test failures), and perhaps bioperl-guts-l might be a sensible place to send these. Initially we'd disabled the emails for Biopython. I'd like to use an RSS feed... there is a JSON API which BioRuby are using for http://www.biogems.info/ which tracks their plugins. 
Peter From p.j.a.cock at googlemail.com Fri Jul 27 11:03:05 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 27 Jul 2012 16:03:05 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> Message-ID: On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J wrote: > On Jul 27, 2012, at 9:47 AM, Peter Cock wrote: > >> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: >>> >>> That's done now - except for the circular dependencies, and GD, >>> which might be easy to solve if anyone knows what the error >>> means - see commit message here: >>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a >> >> Re: https://twitter.com/cjfields/status/228861370454638592 >> Not sure why you got GD to work when something very similar >> had failed for me. Oh well - job done :) > > It was the lack of gdlib-config in the libgd2-xpm package, you need > libgd2-xpm-dev. One of the fun things about Debian packaging. Ah - I should have guessed that. >>> Would a single clean commit of the (current) .travis.yml file be >>> preferable to the current series of commits? And you you want >>> a pull request, or would you just merge/cherry-pick manually? >> >> Given all the churn between our revisions, personally I'd opt for >> a single clean commit to bioperl/master - but your call. >> >> Peter > > Yep, about to merge it over. It's working now, just need to > whitelist master instead of travis after the merge. I'd removed the whitelist altogether here: https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd My thinking was BioPerl seems to have multiple feature branches under the official repo, so they should get tested too. 
You'd be in a better position than me to judge what would work best for BioPerl here. Peter From cjfields at illinois.edu Fri Jul 27 10:58:21 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 27 Jul 2012 14:58:21 +0000 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote: > On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: >> >> That's done now - except for the circular dependencies, and GD, >> which might be easy to solve if anyone knows what the error >> means - see commit message here: >> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a > > Re: https://twitter.com/cjfields/status/228861370454638592 > Not sure why you got GD to work when something very similar > had failed for me. Oh well - job done :) It was the lack of gdlib-config in the libgd2-xpm package, you need libgd2-xpm-dev. One of the fun things about Debian packaging. >> Would a single clean commit of the (current) .travis.yml file be >> preferable to the current series of commits? And you you want >> a pull request, or would you just merge/cherry-pick manually? > > Given all the churn between our revisions, personally I'd opt for > a single clean commit to bioperl/master - but your call. > > Peter Yep, about to merge it over. It's working now, just need to whitelist master instead of travis after the merge. 
chris From cjfields at illinois.edu Fri Jul 27 12:26:34 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 27 Jul 2012 16:26:34 +0000 Subject: [Bioperl-l] BioPerl Travis-CI now live Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D54D@CITESMBX5.ad.uillinois.edu> All commits to bioperl-live master branch on github are now being tracked: http://travis-ci.org/#!/bioperl/bioperl-live The .travis.yml file has a whitelist for branches to be tested; if anyone wants to test additional branches feel free to add them to the list! chris From cjfields at illinois.edu Fri Jul 27 11:15:19 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 27 Jul 2012 15:15:19 +0000 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D2D6@CITESMBX5.ad.uillinois.edu> On Jul 27, 2012, at 10:03 AM, Peter Cock wrote: > On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J > wrote: >> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote: >> >>> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: >>>> >>>> That's done now - except for the circular dependencies, and GD, >>>> which might be easy to solve if anyone knows what the error >>>> means - see commit message here: >>>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a >>> >>> Re: https://twitter.com/cjfields/status/228861370454638592 >>> Not sure why you got GD to work when something very similar >>> had failed for me. Oh well - job done :) >> >> It was the lack of gdlib-config in the libgd2-xpm package, you need >> libgd2-xpm-dev. One of the fun things about Debian packaging. > > Ah - I should have guessed that. > >>>> Would a single clean commit of the (current) .travis.yml file be >>>> preferable to the current series of commits? 
And do you want >>>> a pull request, or would you just merge/cherry-pick manually? >>> >>> Given all the churn between our revisions, personally I'd opt for >>> a single clean commit to bioperl/master - but your call. >>> >>> Peter >> >> Yep, about to merge it over. It's working now, just need to >> whitelist master instead of travis after the merge. > > I'd removed the whitelist altogether here: > https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd > > My thinking was BioPerl seems to have multiple feature branches > under the official repo, so they should get tested too. You'd be > in a better position than me to judge what would work best for > BioPerl here. > > Peter We'll keep it to master for now. It's pretty easy to add branches as needed, and I didn't want to expand to all the potentially stale branches unless explicitly set (we need to triage all those at some point). chris From p.j.a.cock at googlemail.com Fri Jul 27 10:47:18 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 27 Jul 2012 15:47:18 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: > > That's done now - except for the circular dependencies, and GD, > which might be easy to solve if anyone knows what the error > means - see commit message here: > https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a Re: https://twitter.com/cjfields/status/228861370454638592 Not sure why you got GD to work when something very similar had failed for me. Oh well - job done :) > Would a single clean commit of the (current) .travis.yml file be > preferable to the current series of commits? And do you want > a pull request, or would you just merge/cherry-pick manually?
Given all the churn between our revisions, personally I'd opt for a single clean commit to bioperl/master - but your call. Peter From robfsouza at gmail.com Fri Jul 27 18:29:22 2012 From: robfsouza at gmail.com (Robson de Souza) Date: Fri, 27 Jul 2012 15:29:22 -0700 (PDT) Subject: [Bioperl-l] obf sites offline? Message-ID: <9bef8a3b-08ca-4868-be7a-193e7596290d@googlegroups.com> I can't access any of the OBF sites, either from work (USA) or my phone... is there something going on? Robson From p.j.a.cock at googlemail.com Thu Jul 26 11:22:26 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 26 Jul 2012 16:22:26 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: On Wed, Jul 25, 2012 at 11:31 AM, Peter Cock wrote: > On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J > wrote: >> Peter Cock has graciously helped start up a branch for bioperl-live >> that is using Travis-CI (a nice continuous integration tool). Results >> from Peter's fork are found here: >> >> http://travis-ci.org/#!/peterjc/bioperl-live >> >> As this is now pulled into the main bioperl repo, results will be here: >> >> http://travis-ci.org/#!/bioperl/bioperl-live >> >> I'll be working on this and expect this will be added to master in >> the next few days. >> >> chris > > We've had this running for Biopython for a month now, and it has > been a useful complement to the BuildBot (which covers other > operating systems). This was following BioRuby's lead: > http://biopython.org/pipermail/biopython-dev/2012-June/009742.html > > The current BioPerl Travis configuration is probably usable right > now (after changing the branch whitelist to either master, or simply > all branches). > > Other remaining issues include sorting out which dependencies > should be installed, and streamlining their verbose output (e.g. > using tail).
That's done now - except for the circular dependencies, and GD, which might be easy to solve if anyone knows what the error means - see commit message here: https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a Would a single clean commit of the (current) .travis.yml file be preferable to the current series of commits? And do you want a pull request, or would you just merge/cherry-pick manually? > TravisCI can send out emails (e.g. on test failures), and perhaps > bioperl-guts-l might be a sensible place to send these. Initially > we'd disabled the emails for Biopython. I'd like to use an RSS > feed... there is a JSON API which BioRuby are using for > http://www.biogems.info/ which tracks their plugins. I've filed an issue for news feed support in TravisCI, https://github.com/travis-ci/travis-core/issues/82 Regards, Peter From p.j.a.cock at googlemail.com Tue Jul 31 06:37:35 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 31 Jul 2012 11:37:35 +0100 Subject: [Bioperl-l] Travis Continuous Integration testing & pull requests Message-ID: Hi all, I'm cross-posting as this is an announcement. Please keep any follow-up discussion to the relevant project-specific mailing list or, if it is general, to open-bio-l. Those following the OBF blog or the OBF or Bio* Twitter accounts will have already seen this, which I posted yesterday: http://news.open-bio.org/news/2012/07/travis-ci-for-testing/ In summary, since earlier this year BioRuby and then Biopython and BioPerl have been using Travis-CI.org (a hosted continuous integration service for the open source community) to run their unit tests automatically whenever their GitHub repositories are updated.
In addition we now have TravisCI automatically running our tests on any new GitHub pull requests - supported by an OBF donation to Travis-CI, see: http://about.travis-ci.org/blog/announcing-pull-request-support/ Currently BioJava only uses GitHub as an SVN mirror - but this should still let you start using TravisCI for automated testing: http://about.travis-ci.org/docs/user/languages/java/ For EMBOSS, this is another incentive to convert from CVS to github - TravisCI recently announced support for C/C++ projects: http://about.travis-ci.org/blog/support_for_go_c_and_cpp/ http://about.travis-ci.org/docs/user/languages/c/ Potentially there are other OBF projects where this would be useful too. Regards, Peter From wrp at virginia.edu Mon Jul 2 14:31:40 2012 From: wrp at virginia.edu (William Pearson) Date: Mon, 2 Jul 2012 10:31:40 -0400 Subject: [Bioperl-l] Application Deadline - 2012 CSHL Computational and Comparative Genomics Course Message-ID: Course announcement - Application deadline, July 15, 2012 Cold Spring Harbor COMPUTATIONAL & COMPARATIVE GENOMICS Oct 31 - Nov 6, 2011 Application Deadline: July 15, 2012 INSTRUCTORS: William Pearson, University of Virginia, Charlottesville, VA Lisa Stubbs, University of Illinois, Urbana, IL This course presents a comprehensive overview of the theory and practice of computational methods for the identification and characterization of functional elements from DNA sequence data. The course focuses on approaches for extracting the maximum amount of information from protein and DNA sequence similarity through sequence database searches, statistical analysis, and multiple sequence alignment. 
Additional topics include: Alignment and analysis of "Next-Gen" sequencing data The Galaxy environment for high-throughput analysis Identification of conserved signals in aligned and unaligned sequences Regulatory element and motif recognition Integration of genetic and sequence information in biological databases The ENSEMBL genome browser and BioMart Function/phenotype prediction for sequence variants The course combines lectures with hands-on exercises; students are encouraged to pose challenging sequence analysis problems using their own data. The course is designed for biologists seeking advanced training in biological sequence and genome analysis, computational biology core resource directors and staff, and for scientists in other disciplines, such as computer science, who wish to survey current research problems in biological sequence analysis. Advanced programming skills are not required. The lecture/lab schedule for the 2011 course can be found at fasta.bioch.virginia.edu/cshl Speakers in the 2011 course included: Aaron Mackey, U. of Virginia, Next-Gen analysis pipelines Bert Overduin, European Bioinformatics Institute, UK, ENSEMBL and BioMart Francis Ouellette, Ontario Institute for Cancer Research, Databases for Biological Function William Pearson, U. of Virginia, Similarity Searching, Multiple Alignment Lisa Stubbs, U. of Illinois, Urbana, ChIP, Transcription Factors, and Comparative Genomics James Taylor, Emory, Galaxy and genome analysis pipelines The primary focus of the computational and comparative genomics course is the theory and practice of algorithms used in computational biology, with the goal of using current methods more effectively and evaluating new approaches. Students who wish to learn Perl programming for Bioinformatics are encouraged to apply to the Programming for Biology course. 
Students who would like in-depth training in the analysis of next-generation sequencing data (e.g., SNP calling and the detection of structural variants) should apply to the course on Advanced Sequencing Technologies & Applications. This Computational and Comparative Genomics course will discuss methods for phenotype prediction from variation data. To apply to the course, fill out and send in the form at: http://meetings.cshl.edu/course/courseapp_instr.shtml From shalabh.sharma7 at gmail.com Mon Jul 2 17:09:57 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 2 Jul 2012 13:09:57 -0400 Subject: [Bioperl-l] translation frame problem in bioperl Message-ID: Hi All, I am just confused about the translation frames. I used bioperl to parse a blastx report. Reports shows that the frame used is -2 but when i translate the sequence using EMBOSS or Some other program the frame is -1. Am i doing something wrong here. Here is the sequence: >gi|378759230|gb|AHBJ01000169.1| SAR86 cluster bacterium SAR86D scf1120176765857, whole genome shotgun sequence 2642:3697 
AGCTTCCCATGGAACCCATGCAAGTGCAATATTTGTTTCTAGCTCTGGTGACCACCAAGGAGATGTCACGTAGCCCACCTCATCTTCATCAGTATTAGTTACTATCCAAAAATCAGAAGCATAATCTGTGATTTCTTTTCCTCCAAGGGTTAAACCAACCATCTTCATTTTAAATGGTGCATTTCCTTCATCTATGATTGCTCTCTGTTTTTCAAGCTCTTCTTTACCAATGTAATCAGCTGCTTTATTTCTTGGTACCTGATAACTTAAATTAACCTGAAAGGGAGAAGTTTCATGATCCAGATCTTGTCCCCAAGACAAAATTCCAGCTGCAATGCGACGATGATGCGCAGGAGCTATGACCATTAAGCCAAATTCTTCTCCAGCCTCAAGAACAGCATTCCACATTTTTTCTGCATTATCATGTGCGTCACGAACATATATTTCATAACCTTTTTCGCCTGTAAAACCAGTTTGACTGATTACACAATCAGCTCCACCAACCTGAGTTTCTAAAATTCCATAATAAGGAACTTCTCTTAACTCTTCGCCAGCTAACTTTGCCATAAGATCTTCAGATAAAGGGCCTTGAATTTGAACAGGACAAACATCAATCTCATCAATTTCTACGTCATATTTTTTAGACACATTTACGCCTTGAAGCCAAAGTAAGAGATCGCTGTCTGATATTGAGAACCAGAATTCATCTTCTGTTAGTCTTAATAGAACAGGGTCATTTAAAACCCCTCCTTTTTCATTGCATAAAATCGCATATTTACCATTTCCGGGTTTAATTTTTGTAGCATCACGAGTTATTACATAATCTGTAAAAGCTTCTGCATCTGGACCTTTTACTCTTATCTGTCTTTCAACAGCAACATTCCACATAGTAACTCTATTAACCAAGGCTTCGTATTCAACCATGGCACCGCCATCTTCAGGTTTTACATAGCCTCGTGGATGATAAATTCGATTATATACAGTTGCTCTCCAACAGCCCGCTTCATGAGATAGATGCCAAAAAGGCGATTTTCTTACCCGGGTTGAAATTAATAA This is a part of blast report by bioperl: >JCVI_READ_1105499496127 /Indian_Ocean/gcvT Length = 352 Score = 655 bits (1690), Expect = 0.0 Identities = 311/352 (88%), Positives = 329/352 (93%) Frame = -2 Query: 3697 LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV 3518 +LISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGY+KPEDGGAMVEY+ALVNRVTMWNV Sbjct: 1 MLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYIKPEDGGAMVEYDALVNRVTMWNV 60 ..... ..... 
Query: 2797 GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA 2642 GLTLGGKEITDYA DFW+V + D + PWWSPEL TNIAL WVPW A Sbjct: 301 GLTLGGKEITDYAPDFWLVADMDGMMLDISLPPWWSPELNTNIALGWVPWSA 352 This is EMBOSS output (from EBI): >EMBOSS_001_4 LLISTRVRKSPFWHLSHEAGCWRATVYNRIYHPRGYVKPEDGGAMVEYEALVNRVTMWNV AVERQIRVKGPDAEAFTDYVITRDATKIKPGNGKYAILCNEKGGVLNDPVLLRLTEDEFW FSISDSDLLLWLQGVNVSKKYDVEIDEIDVCPVQIQGPLSEDLMAKLAGEELREVPYYGI LETQVGGADCVISQTGFTGEKGYEIYVRDAHDNAEKMWNAVLEAGEEFGLMVIAPAHHRR IAAGILSWGQDLDHETSPFQVNLSYQVPRNKAADYIGKEELEKQRAIIDEGNAPFKMKMV GLTLGGKEITDYASDFWIVTNTDEDEVGYVTSPWWSPELETNIALAWVPWEA >EMBOSS_001_5 INFNPGKKIAFLASIS*SGLLESNCI*SNLSSTRLCKT*RWRCHG*IRSLG**SYYVECC C*KTDKSKRSRCRSFYRLCNNS*CYKN*TRKW*ICDFMQ*KRRGFK*PCSIKTNRR*ILV ...... You can see its a frame -1. I would really appreciate your help. Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From scott at scottcain.net Mon Jul 2 18:50:45 2012 From: scott at scottcain.net (Scott Cain) Date: Mon, 2 Jul 2012 14:50:45 -0400 Subject: [Bioperl-l] GMOD Summer School application deadline Message-ID: Hello, The deadline to apply for the GMOD Summer School is in one week, July 9th. The application is available as a Google Form: https://docs.google.com/spreadsheet/embeddedform?formkey=dG5hNGFiQ3UwYTV2LUZxZW04Qm1yZXc6MQ In the GMOD Summer School (August 24-29, 2012) we will cover the installation, configuration and use of a variety of GMOD tools, including Chado, GBrowse, JBrowse and Tripal. For more information on the course, see the course web page at http://gmod.org/wiki/2012_GMOD_Summer_School The course will make heavy use of the Amazon Web Service (aka, the Cloud) via a grant from Amazon. Enrollment is limited to 24 students, and the application process is competitive: the last few years we've received over 75 applications for those 24 spots. I look forward to seeing you in North Carolina in August! 
Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From p.j.a.cock at googlemail.com Mon Jul 2 19:34:40 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 2 Jul 2012 20:34:40 +0100 Subject: [Bioperl-l] translation frame problem in bioperl In-Reply-To: References: Message-ID: On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma wrote: > Hi All, > I am just confused about the translation frames. I used bioperl to > parse a blastx report. > Reports shows that the frame used is -2 but when i translate the sequence > using EMBOSS or Some other program the frame is -1. > Am i doing something wrong here. Possibly there are conflicting definitions of frames -1, -2, and -3 here (and that's leaving out the possibility of -0, -1 and -2 counting). Some will count from the first base (start for forward strand), others the last base (start of reverse strand). This can make comparing the output of different tools quite confusing. Peter From shalabh.sharma7 at gmail.com Mon Jul 2 20:39:29 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 2 Jul 2012 16:39:29 -0400 Subject: [Bioperl-l] translation frame problem in bioperl In-Reply-To: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net> References: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net> Message-ID: Hi Peter and Brian, Thanks a lot for your replies. I have already taken this into account. If I parse the BLAST report (my previous example) I get strand '-1' and frame '1' (according to BioPerl), so if we convert it to the usual convention it is -2, because BioPerl counts frames from 0. Also, forward-frame translation in BioPerl is working fine. Thanks Shalabh On Mon, Jul 2, 2012 at 4:24 PM, Brian Osborne wrote: > Shalabh, > > Also take a look at this: > > http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29 > > Brian O.
> > > On Jul 2, 2012, at 3:34 PM, Peter Cock wrote: > > > On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma > > wrote: > >> Hi All, > >> I am just confused about the translation frames. I used > bioperl to > >> parse a blastx report. > >> Reports shows that the frame used is -2 but when i translate the > sequence > >> using EMBOSS or Some other program the frame is -1. > >> Am i doing something wrong here. > > > > Possibly there are conflicting definitions of frames -1, -2, and -3 here > > (and that's leaving out the possibility of -0, -1 and -2 counting). Some > > will count from the first base (start for forward strand), others the > last > > base (start of reverse strand). This can make comparing the output > > of different tools quite confusing. > > > > Peter > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From bosborne11 at verizon.net Mon Jul 2 20:24:24 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 02 Jul 2012 16:24:24 -0400 Subject: [Bioperl-l] translation frame problem in bioperl In-Reply-To: References: Message-ID: <98F2B304-71F3-42BD-9603-6858F03CC9F5@verizon.net> Shalabh, Also take a look at this: http://www.bioperl.org/wiki/HOWTO:SearchIO#frame.28.29 Brian O. On Jul 2, 2012, at 3:34 PM, Peter Cock wrote: > On Mon, Jul 2, 2012 at 6:09 PM, shalabh sharma > wrote: >> Hi All, >> I am just confused about the translation frames. I used bioperl to >> parse a blastx report. >> Reports shows that the frame used is -2 but when i translate the sequence >> using EMBOSS or Some other program the frame is -1. >> Am i doing something wrong here. 
> > Possibly there are conflicting definitions of frames -1, -2, and -3 here > (and that's leaving out the possibility of -0, -1 and -2 counting). Some > will count from the first base (start for forward strand), others the last > base (start of reverse strand). This can make comparing the output > of different tools quite confusing. > > Peter > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From vebaev at gmail.com Tue Jul 3 16:35:26 2012 From: vebaev at gmail.com (vebaev at gmail.com) Date: Tue, 3 Jul 2012 09:35:26 -0700 (PDT) Subject: [Bioperl-l] CFP - International Conference on Bioinformatics and Computational Biology - BIOCOMP BG 2012 Message-ID: <7b498b4c-2b2e-4e1f-871f-513203488bf1@googlegroups.com> International Conference on Bioinformatics and Computational Biology - BIOCOMP BG 2012 September 20-21, 2012, Varna, Bulgaria Dear Colleague, It is our pleasure to circulate the 2nd announcement of the International Conference on Bioinformatics and Computational Biology - BIOCOMP 2012 (http://biocomp.bio.uni-plovdiv.bg/). Keynote speakers Prof. Dr. Klaas Vandepoele - Ghent University, Belgium Dr. Andreas Gisel - Institute for Biomedical Technologies, Italy Prof. Wojciech Karlowski - Insitute of Molecular Biology and Biotechnology, Poland Prof. Mario A. 
Fares - University of Dublin, Trinity College, Ireland
Dr. Andrey Kajava - CRBM - Macromolecular Biochemistry Research Center, France
Dr. Gaurav Sablok - Istituto Agrario San Michele (IASMA), Italy

Topics

Topics of interest include, but are not limited to:
High-performance bio-computing
High-throughput sequencing data analysis (NGS)
Bio-ontologies
Molecular evolution
Comparative genomics
Molecular modeling and simulation
Computational genetics
Computational proteomics
Data mining and visualization
Software tools and applications
Gene expression analysis
Gene networks
Structural biology
Genome analysis
Databases
Systems biology
Special topic: bioinformatics and miRNAs

Recent achievements in these fields will be presented. The conference will include plenary and poster sessions. Participants' proposals will be taken under advisement in compiling the program.

Publications

All accepted abstracts will be published in the conference abstract book. The best 20 abstracts will be peer-reviewed and published as full-text manuscripts in a Special Issue of Springer and Elsevier journals:
Interdisciplinary Sciences: Computational Life Sciences (ISSN: 1867-1462)
Journal of Computational Science (ISSN: 1877-7503)

Venue

The venue of the conference is the 4-star all-inclusive Sunny Day Black Sea resort, Bulgaria.

Registration and abstract submission

All actions related to BIOCOMP 2012 (abstract submission, registration, etc.) may be completed via the Conference website at http://biocomp.bio.uni-plovdiv.bg/

Accommodation

IMPORTANT: Accommodation is included in the conference registration fee.

Important dates

Abstract Submission Deadline - 20 August 2012
Early Registration Fee Payment Deadline - 20 August 2012
Arrival, Poster Setup, Registration - 19 September 2012
Plenary and Poster Sessions - 20-21 September 2012

You may find details of the Conference by visiting the Conference website at http://biocomp.bio.uni-plovdiv.bg/

Looking forward to seeing you in Bulgaria!
------------------------------------------------ Dr. Vesselin Baev Research Assistant Professor University of Plovdiv Dept. Plant Phys. and Molecular Biology Bioinformatics SMART Group Tzar Assen 24,Plovdiv 4000, BULGARIA Office:+359 32 261 (560); Mobile:+359 89 43 80 945 vebaev at gmail.com; baev at uni-plovdiv.bg; CV: http://plantgene.eu/ From tarakaramji at gmail.com Tue Jul 3 19:33:43 2012 From: tarakaramji at gmail.com (Tarakaramji Moturu) Date: Tue, 3 Jul 2012 19:33:43 +0000 (UTC) Subject: [Bioperl-l] Invitation to connect on LinkedIn Message-ID: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod> LinkedIn ------------ I'd like to add you to my professional network on LinkedIn. - Tarakaramji Tarakaramji Moturu Student at GITAM University Vishakhapatnam Area, India Confirm that you know Tarakaramji Moturu: https://www.linkedin.com/e/1505z7-h47dlkop-69/isd/7726719493/9xC087NO/?hs=false&tok=2UuxBwCCkl7Rk1 -- You are receiving Invitation to Connect emails. Click to unsubscribe: http://www.linkedin.com/e/1505z7-h47dlkop-69/q7l5PgNeLXh3mAgNJzs79PDWzhT0l80xWa/goo/bioperl-l%40bioperl%2Eorg/20061/I2613636655_1/?hs=false&tok=0hY4YIDwkl7Rk1 (c) 2012 LinkedIn Corporation. 2029 Stierlin Ct, Mountain View, CA 94043, USA. From l.m.timmermans at students.uu.nl Wed Jul 4 07:16:34 2012 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Wed, 4 Jul 2012 10:16:34 +0300 Subject: [Bioperl-l] Invitation to connect on LinkedIn In-Reply-To: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod> References: <757342252.16905070.1341344023805.JavaMail.app@ela4-bed83.prod> Message-ID: On Tue, Jul 3, 2012 at 10:33 PM, Tarakaramji Moturu wrote: > LinkedIn > ------------ > > > > I'd like to add you to my professional network on LinkedIn. 
> > - Tarakaramji Sending messages like this directly over mailinglists is a rather bad idea, if only because LinkedIn will think bioperl-l at bioperl.org is one of the email addresses of whomever accepts the request (which is relevant for retrieving a lost password, I think). Leon From ulrik.stervbo at gmail.com Fri Jul 6 07:03:08 2012 From: ulrik.stervbo at gmail.com (Ulrik Stervbo) Date: Fri, 6 Jul 2012 09:03:08 +0200 Subject: [Bioperl-l] problem in using protparam.pm module In-Reply-To: References: <9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu> Message-ID: I had the same problem, and realized it is because I am behind a proxy. This is what I did to the Protparam module: Changed the url to 'http://web.expasy.org/cgi-bin/protparam/protparam' as previously found Added: $browser->proxy(['http'], 'http://[my proxy]/'); after initialization of the LWP agent. The proxy settings is what made Perl choke. (If only one could make perl see global proxy settings). Cheers, Ulrik 2011/7/28 Shachi Gahoi : > Please help me how to run protparam using bioperl module > > On Wed, Jul 27, 2011 at 11:05 AM, Chris Fields wrote: > >> The web service appears to have changed, but it looks as if no tests have >> been written up for this module which would have caught this out. We can >> write some basic tests up to check for simple functionality. >> >> chris >> >> On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote: >> >> > Dear All, >> > >> > i am using protparam.pm module. but when i am running this script it is >> > printing one error message >> > >> > "Can't call method "throw" without a package or object reference at >> > /usr/share/perl5/Bio/Root/Root.pm line 368, line 1." >> > >> > Kindly help me to solve this problem. 
>> > >> > >> > Script is here---- >> > >> ################################################################################### >> > #!/usr/bin/perl >> > >> > use warnings; >> > use Bio::SeqIO; >> > use Bio::Tools::Protparam; >> > >> > >> > $seqfile='test1.fasta'; >> > >> > $seqio = Bio::SeqIO->new(-file => "$seqfile", -format => 'Fasta'); >> > >> > >> > while( $seq = $seqio->next_seq() ) >> > { >> > >> > >> > my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq); >> > >> > print >> > "ID : ", $seq->display_id,"\n", >> > "Amino acid number : ",$pp->amino_acid_number(),"\n", >> > "Number of negative amino acids : ",$pp->num_neg(),"\n", >> > "Number of positive amino acids : ",$pp->num_pos(),"\n", >> > "Molecular weight : ",$pp->molecular_weight(),"\n", >> > "Theoretical pI : ",$pp->theoretical_pI(),"\n", >> > "Total number of atoms : ", $pp->total_atoms(),"\n", >> > "Number of carbon atoms : ",$pp->num_carbon(),"\n", >> > "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n", >> > "Number of nitrogen atoms : ",$pp->num_nitro(),"\n", >> > "Number of oxygen atoms : ",$pp->num_oxygen(),"\n", >> > "Number of sulphur atoms : ",$pp->num_sulphur(),"\n", >> > "Half life : ", $pp->half_life(),"\n", >> > "Instability Index : ", $pp->instability_index(),"\n", >> > "Stability class : ", $pp->stability(),"\n", >> > "Aliphatic_index : ",$pp->aliphatic_index(),"\n", >> > "Gravy : ", $pp->gravy(),"\n", >> > "Composition of A : ", $pp->AA_comp('A'),"\n", >> > "Composition of R : ", $pp->AA_comp('R'),"\n", >> > "Composition of N : ", $pp->AA_comp('N'),"\n", >> > "Composition of D : ", $pp->AA_comp('D'),"\n", >> > "Composition of C : ", $pp->AA_comp('C'),"\n", >> > "Composition of Q : ", $pp->AA_comp('Q'),"\n", >> > "Composition of E : ", $pp->AA_comp('E'),"\n", >> > "Composition of G : ", $pp->AA_comp('G'),"\n", >> > "Composition of H : ", $pp->AA_comp('H'),"\n", >> > "Composition of I : ", $pp->AA_comp('I'),"\n", >> > "Composition of L : ", $pp->AA_comp('L'),"\n", >> > "Composition of 
K : ", $pp->AA_comp('K'),"\n", >> > "Composition of M : ", $pp->AA_comp('M'),"\n", >> > "Composition of F : ", $pp->AA_comp('F'),"\n", >> > "Composition of P : ", $pp->AA_comp('P'),"\n", >> > "Composition of S : ", $pp->AA_comp('S'),"\n", >> > "Composition of T : ", $pp->AA_comp('T'),"\n", >> > "Composition of W : ", $pp->AA_comp('W'),"\n", >> > "Composition of Y : ", $pp->AA_comp('Y'),"\n", >> > "Composition of V : ", $pp->AA_comp('V'),"\n", >> > "Composition of B : ", $pp->AA_comp('B'),"\n", >> > "Composition of Z : ", $pp->AA_comp('Z'),"\n", >> > "Composition of X : ", $pp->AA_comp('X'),"\n"; >> > } >> > >> ################################################################################### >> > >> > >> > >> > >> > -- >> > Regards, >> > Shachi >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > Regards, > Shachi > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Fri Jul 6 17:49:46 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Fri, 6 Jul 2012 10:49:46 -0700 Subject: [Bioperl-l] problem in using protparam.pm module In-Reply-To: References: <9AA27ADA-FFE1-4735-BDE4-56C9B9A18009@illinois.edu> Message-ID: <8C9056B6-1DA4-4BE0-B008-429C2F6C05BE@gmail.com> you might try the PERL_LWP_ENV_PROXY and HTTP_PROXY env variables http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#CONSTRUCTOR_METHODS http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm#Proxy_attributes I can't test it my end though w/o a proxy service. On Jul 6, 2012, at 12:03 AM, Ulrik Stervbo wrote: > I had the same problem, and realized it is because I am behind a proxy. 
> > This is what I did to the Protparam module: > Changed the url to 'http://web.expasy.org/cgi-bin/protparam/protparam' > as previously found > > Added: > $browser->proxy(['http'], 'http://[my proxy]/'); after initialization > of the LWP agent. > > The proxy settings is what made Perl choke. (If only one could make > perl see global proxy settings). > > Cheers, > Ulrik > > 2011/7/28 Shachi Gahoi : >> Please help me how to run protparam using bioperl module >> >> On Wed, Jul 27, 2011 at 11:05 AM, Chris Fields wrote: >> >>> The web service appears to have changed, but it looks as if no tests have >>> been written up for this module which would have caught this out. We can >>> write some basic tests up to check for simple functionality. >>> >>> chris >>> >>> On Jul 26, 2011, at 10:58 PM, Shachi Gahoi wrote: >>> >>>> Dear All, >>>> >>>> i am using protparam.pm module. but when i am running this script it is >>>> printing one error message >>>> >>>> "Can't call method "throw" without a package or object reference at >>>> /usr/share/perl5/Bio/Root/Root.pm line 368, line 1." >>>> >>>> Kindly help me to solve this problem. 
>>>> >>>> >>>> Script is here---- >>>> >>> ################################################################################### >>>> #!/usr/bin/perl >>>> >>>> use warnings; >>>> use Bio::SeqIO; >>>> use Bio::Tools::Protparam; >>>> >>>> >>>> $seqfile='test1.fasta'; >>>> >>>> $seqio = Bio::SeqIO->new(-file => "$seqfile", -format => 'Fasta'); >>>> >>>> >>>> while( $seq = $seqio->next_seq() ) >>>> { >>>> >>>> >>>> my $pp = Bio::Tools::Protparam->new(-seq=>$seq->seq); >>>> >>>> print >>>> "ID : ", $seq->display_id,"\n", >>>> "Amino acid number : ",$pp->amino_acid_number(),"\n", >>>> "Number of negative amino acids : ",$pp->num_neg(),"\n", >>>> "Number of positive amino acids : ",$pp->num_pos(),"\n", >>>> "Molecular weight : ",$pp->molecular_weight(),"\n", >>>> "Theoretical pI : ",$pp->theoretical_pI(),"\n", >>>> "Total number of atoms : ", $pp->total_atoms(),"\n", >>>> "Number of carbon atoms : ",$pp->num_carbon(),"\n", >>>> "Number of hydrogen atoms : ",$pp->num_hydrogen(),"\n", >>>> "Number of nitrogen atoms : ",$pp->num_nitro(),"\n", >>>> "Number of oxygen atoms : ",$pp->num_oxygen(),"\n", >>>> "Number of sulphur atoms : ",$pp->num_sulphur(),"\n", >>>> "Half life : ", $pp->half_life(),"\n", >>>> "Instability Index : ", $pp->instability_index(),"\n", >>>> "Stability class : ", $pp->stability(),"\n", >>>> "Aliphatic_index : ",$pp->aliphatic_index(),"\n", >>>> "Gravy : ", $pp->gravy(),"\n", >>>> "Composition of A : ", $pp->AA_comp('A'),"\n", >>>> "Composition of R : ", $pp->AA_comp('R'),"\n", >>>> "Composition of N : ", $pp->AA_comp('N'),"\n", >>>> "Composition of D : ", $pp->AA_comp('D'),"\n", >>>> "Composition of C : ", $pp->AA_comp('C'),"\n", >>>> "Composition of Q : ", $pp->AA_comp('Q'),"\n", >>>> "Composition of E : ", $pp->AA_comp('E'),"\n", >>>> "Composition of G : ", $pp->AA_comp('G'),"\n", >>>> "Composition of H : ", $pp->AA_comp('H'),"\n", >>>> "Composition of I : ", $pp->AA_comp('I'),"\n", >>>> "Composition of L : ", $pp->AA_comp('L'),"\n", >>>> "Composition 
of K : ", $pp->AA_comp('K'),"\n", >>>> "Composition of M : ", $pp->AA_comp('M'),"\n", >>>> "Composition of F : ", $pp->AA_comp('F'),"\n", >>>> "Composition of P : ", $pp->AA_comp('P'),"\n", >>>> "Composition of S : ", $pp->AA_comp('S'),"\n", >>>> "Composition of T : ", $pp->AA_comp('T'),"\n", >>>> "Composition of W : ", $pp->AA_comp('W'),"\n", >>>> "Composition of Y : ", $pp->AA_comp('Y'),"\n", >>>> "Composition of V : ", $pp->AA_comp('V'),"\n", >>>> "Composition of B : ", $pp->AA_comp('B'),"\n", >>>> "Composition of Z : ", $pp->AA_comp('Z'),"\n", >>>> "Composition of X : ", $pp->AA_comp('X'),"\n"; >>>> } >>>> >>> ################################################################################### >>>> >>>> >>>> >>>> >>>> -- >>>> Regards, >>>> Shachi >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> -- >> Regards, >> Shachi >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From bubli_thakur at rediffmail.com Sun Jul 1 14:59:29 2012 From: bubli_thakur at rediffmail.com (subarna thakur) Date: Sun, 01 Jul 2012 14:59:29 -0000 Subject: [Bioperl-l] Ks saturation Message-ID: <20120617031856.16345.qmail@f4mail-235-140.rediffmail.com> Dear all, I am trying to calculate dN/dS values for all orthologous gene pairs between a pair of genomes using the pairwise_kaks.pl script within BioPerl, which invokes the codeml program in runmode -2.
When I analyze the results, some of the genes have anomalously high dS (Ks) values, some of them exceeding 100, and as a result the average Ks for the whole genome shoots up. These genes are orthologous and share more than 50% sequence identity. Should I include these genes in the analysis or leave them out? If I leave them out, what Ks cutoff should I use for the analysis? In some papers, I have found that Ks values as high as 5.6 were considered. Is there a way to determine the cutoff value for Ks? Subarna From haywardjeremya at gmail.com Fri Jul 6 17:56:12 2012 From: haywardjeremya at gmail.com (Jeremy Hayward) Date: Fri, 6 Jul 2012 14:56:12 -0300 Subject: [Bioperl-l] Two 'host' tags? Message-ID: Hi-- Clueless newbie here, for which apologies. I've posted a description of my problem, inputs and outputs, at Gist 2816510; https://gist.github.com/2816510 Briefly, I'm trying to take a genbank file (.gb), and create a FASTA file with a specific identifier line for each sequence. Specifically, I want the "host" tag as the identifier. With the help of the Bioperl beginner readme and the HOWTO's (which are great!) I've worked out how to loop through my sequences and get the 'host' tag for each one. For some reason, I get two identifier lines for each sequence. I guess the problem is in the 'for' loop--it's running the stuff below it twice, once with the actual 'host' tag data and once with...nothing? Not sure. I think I can work out how to use s/ and a regex just to delete the second identifier line, but that feels like I'm avoiding the problem instead of fixing it. Any help appreciated! Many thanks, --Jeremy Hayward From jason.stajich at gmail.com Fri Jul 6 19:39:52 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Fri, 6 Jul 2012 12:39:52 -0700 Subject: [Bioperl-l] Two 'host' tags? In-Reply-To: References: Message-ID: Hi Jeremy - You are printing for every feature in the loop (e.g. 
the source and the misc_RNA ) - you only want to loop through the features, then grab the one which is source, then change or print the info when you see that. So you could have an if( $feature->primary_tag eq 'source') in there or something as well. Alternatively I've left it pretty much intact and just simplified it a bit. You should also try and use Bio::SeqIO to print instead of your printing. I updated the code here to be simpler - right now it warns you that you are printing IDs with spaces (which is something you should think about when it comes to your output file, but I don't know your downstream plans). Also you could put other info in the description field if you wanted to capture accession number or the endophyte name too. https://gist.github.com/3062285 Best, Jason On Jul 6, 2012, at 10:56 AM, Jeremy Hayward wrote: > Hi-- Clueless newbie here, for which apologies. > > I've posted a description of my problem, inputs and outputs, at Gist > 2816510; https://gist.github.com/2816510 > > Briefly, I'm trying to take a genbank file (.gb), and create a FASTA > file with a specific identifier line for each sequence. Specifically, > I want the "host" tag as the identifier. With the help of the Bioperl > beginner readme and the HOWTO's (which are great!) I've worked out how > to loop through my sequences and get the 'host' tag for each one. For > some reason, I get two identifier lines for each sequence. I guess the > problem is in the 'for' loop--it's running the stuff below it twice, > once with the actual 'host' tag data and once with...nothing? Not > sure. > > I think I can work out how to use s/ and a regex just to delete the > second identifier line, but that feels like I'm avoiding the problem > instead of fixing it. Any help appreciated! 
> > > Many thanks, > > --Jeremy Hayward > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From bosborne11 at verizon.net Fri Jul 6 19:51:11 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 06 Jul 2012 15:51:11 -0400 Subject: [Bioperl-l] Two 'host' tags? In-Reply-To: References: Message-ID: <456448FF-C413-42D1-833A-FAA74E4FEF9E@verizon.net> Jeremy, Looks like each of your individual sequences has 2 features, but you only care about the 'source' feature ( if ($feat_object->primary_tag eq "source") ?). Also, try not to print out the sequence like you're doing, try to build a Sequence object for each input sequence and then write its contents to your fasta file using write_seq(). You will set the id for your Sequence object using display_name(). Brian O. On Jul 6, 2012, at 1:56 PM, Jeremy Hayward wrote: > Hi-- Clueless newbie here, for which apologies. > > I've posted a description of my problem, inputs and outputs, at Gist > 2816510; https://gist.github.com/2816510 > > Briefly, I'm trying to take a genbank file (.gb), and create a FASTA > file with a specific identifier line for each sequence. Specifically, > I want the "host" tag as the identifier. With the help of the Bioperl > beginner readme and the HOWTO's (which are great!) I've worked out how > to loop through my sequences and get the 'host' tag for each one. For > some reason, I get two identifier lines for each sequence. I guess the > problem is in the 'for' loop--it's running the stuff below it twice, > once with the actual 'host' tag data and once with...nothing? Not > sure. > > I think I can work out how to use s/ and a regex just to delete the > second identifier line, but that feels like I'm avoiding the problem > instead of fixing it. Any help appreciated! 
> > > Many thanks, > > --Jeremy Hayward > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dejian.zhao at gmail.com Wed Jul 11 17:31:37 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Thu, 12 Jul 2012 01:31:37 +0800 Subject: [Bioperl-l] Errors with Bio::Seq objects Message-ID: <4FFDB879.1020906@gmail.com> Hi, I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed. I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea. Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot! 
Best, Dejian PS: The commands and results $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb NM_053056 $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb mRNA $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb CACACG $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb Bio::Seq::RichSeq=HASH(0x20a3e7b0) From jimhu at tamu.edu Wed Jul 11 18:01:27 2012 From: jimhu at tamu.edu (Jim Hu) Date: Wed, 11 Jul 2012 13:01:27 -0500 Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <4FFDB879.1020906@gmail.com> References: <4FFDB879.1020906@gmail.com> Message-ID: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu> Hi Dejian, On Jul 11, 2012, at 12:31 PM, De-Jian Zhao 
wrote: > Hi, > > I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed. > > I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea. That's correct about Bio::Seq objects being returned. Actually, it is probably a kind of Bio::Seq object. For example, SeqIO may return a Bio::Seq::RichSeq object that inherits methods from Bio::Seq objects. However, as explained below, the methods are working as they should... they are just returning objects when you are expecting something else. > > Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot! 
> > Best, > Dejian > > > > PS: The commands and results > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb > NM_053056 > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb > mRNA > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb > CACACG > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb > Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) http://doc.bioperl.org/bioperl-live/Bio/Seq.html#POD24 $seq_obj->get_SeqFeatures() returns an array of SeqFeature objects, which are references. So this worked as expected. I usually write this as script files, so I've never done it all with perl -e. But you need to iterate over the array and query the objects for the information you want about the features. 
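For reference, a minimal sketch of that iteration. It builds a small made-up sequence with one feature in memory so that it runs without nt.gb; the sequence, the feature, and the helper name describe_features are assumptions for illustration only, not anything from the original post. With a GenBank file, $seq would come from Bio::SeqIO's next_seq() instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Stand-in for a record read from a GenBank file (made-up data).
my $seq = Bio::Seq->new(
    -display_id => 'example',
    -seq        => 'ATGGCCAAATAG',
    -alphabet   => 'dna',
);
$seq->add_SeqFeature(
    Bio::SeqFeature::Generic->new(
        -primary_tag => 'CDS',
        -start       => 1,
        -end         => 12,
        -strand      => 1,
        -tag         => { product => 'hypothetical protein' },
    )
);

# Hypothetical helper: one tab-separated summary line per feature.
# get_SeqFeatures() returns objects, so each one is queried for the
# fields we want instead of being printed directly.
sub describe_features {
    my ($seq_obj) = @_;
    my @lines;
    for my $feat ( $seq_obj->get_SeqFeatures ) {
        my @tags = map {
            my $tag = $_;
            "$tag=" . join( ',', $feat->get_tag_values($tag) )
        } sort { $a cmp $b } $feat->get_all_tags;
        push @lines,
            join( "\t", $feat->primary_tag, $feat->start, $feat->end, @tags );
    }
    return @lines;
}

print "$_\n" for describe_features($seq);

# translate() also returns an object; seq() gives the protein string.
print $seq->translate->seq, "\n";
```

The same pattern applies to any method that returns objects: keep the return value in a variable (or loop over it) and call accessor methods such as primary_tag(), start(), end() or seq() to get printable values.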
> > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb > Bio::Seq::RichSeq=HASH(0x20a3e7b0) ->translate returns a new Seq object. I think $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb should work (haven't tried it). Jim > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ===================================== Jim Hu Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From bosborne11 at verizon.net Wed Jul 11 17:47:25 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 11 Jul 2012 13:47:25 -0400 Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <4FFDB879.1020906@gmail.com> References: <4FFDB879.1020906@gmail.com> Message-ID: Dejian, These are not "failures". get_SeqFeatures() returns a list of Bio::SeqFeature objects and translate() returns a Bio::Seq object, so print shows object references rather than strings. Start here: www.bioperl.org/wiki/HOWTO:Beginners Brian O. On Jul 11, 2012, at 1:31 PM, De-Jian Zhao wrote: > Hi, > > I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and tested the Bio::SeqIO module as follows. The first 3 commands succeeded; however the last 2 failed. > > I think $seqio->next_seq() produces a Bio::Seq object which contains the sequence, features and annotation (according to the DESCRIPTION of "perldoc Bio::Seq") and thus the invocation of the methods get_SeqFeatures() and translate() should be valid. However, the results denied this idea. > > Will anyone explain what happened to the last 2 commands? I have encountered numerous cases of failures when testing the bioperl methods. I want to translate the mRNA sequence and extract the sequence features. What are the right commands? Thanks a lot! 
> > Best, > Dejian > > > > PS: The commands and results > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->display_id(); ' nt.gb > NM_053056 > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->molecule(); ' nt.gb > mRNA > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->subseq(1,6); ' nt.gb > CACACG > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb > Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb > Bio::Seq::RichSeq=HASH(0x20a3e7b0) > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Jul 11 19:02:46 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Jul 2012 19:02:46 +0000 
Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Peng, Has this been filed as a bug yet? https://redmine.open-bio.org/projects/bioperl Seems like it would be fairly easy to fix, but I want to track it just in case. chris On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote: > Hello guys, > > Just a follow-up, it seems to me the bioperl-live version is still having the same problem - calling hit "query" while query sequence "hit". I also looked into the test script written for hmmer3 (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment part - I guess that's why this bug was not discovered. > > To be simple, here's an output of hmmsearch v3.0: > # hmmsearch :: search profile(s) against a sequence database > # HMMER 3.0 (March 2010); http://hmmer.org/ > # Copyright (C) 2010 Howard Hughes Medical Institute. > # Freely distributed under the GNU General Public License (GPLv3). 
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > # query HMM file: /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm > # target sequence database: /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa > # output directed to file: /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt > # number of worker threads: 4 > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > Query: CRP0000 [M=75] > Scores for complete sequences (score includes all domains): > --- full sequence --- --- best 1 domain --- -#dom- > E-value score bias E-value score bias exp N Sequence Description > ------- ------ ----- ------- ------ ----- ---- -- -------- ----------- > 5.5e-25 95.0 14.4 5.7e-25 95.0 10.0 1.0 1 Chr2_540228_540404_+ > > Domain annotation for each sequence (and alignments): > >> Chr2_540228_540404_+ > # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc > --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- > 1 ! 95.0 10.0 3.6e-30 5.7e-25 20 74 .. 4 59 .] 
1 59 [] 0.95 > > Alignments for each domain: > == domain 1 score: 95.0 bits; conditional E-value: 3.6e-30 > CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74 > ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c > Chr2_540228_540404_+ 4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59 > 568899***99********************************************* PP > > And here is a dump of the parsed HSP object: > $VAR1 = bless( { > 'VERBOSE' => 0, > 'IDENTICAL' => 0, > 'RANK' => 1, > 'STRANDED' => 'NONE', > 'EVALUE' => '3.6e-30', > 'HSP_LENGTH' => 56, > 'ALGORITHM' => 'HMMSEARCH' > 'SCORE' => '95.0', > 'GAP_SYMBOL' => '-', > 'CONSERVED' => 0, > > 'HIT_NAME' => 'Chr2_540228_540404_+', > 'HIT_DESC' => '', > 'HIT_START' => '20', > 'HIT_END' => '74', > 'HIT_LENGTH' => 56, > 'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc', > 'HIT_FRAME' => 0, > > 'QUERY_NAME' => 'CRP0000', > 'QUERY_DESC' => undef, > 'QUERY_START' => '4', > 'QUERY_END' => '59', > 'QUERY_LENGTH' => '75', > 'QUERY_FRAME' => 0, > 'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC', > > 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c', > }, 'Bio::Search::HSP::HMMERHSP' ); > > Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values. > > Thanks, > > Peng, > > On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: > I'll try the bioperl-live version. Thanks guys. > Scott Givan > 541-740-4685 > Sent from an iPhone (so expect typos). > > On Jul 19, 2011, at 10:34 PM, "Chris Fields" wrote: > > > This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo. I believe the one in bioperl-live is newer. Scott, can you give that a try? > > > > chris > > > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: > > > >> Hi Scott, > >> > >> Thanks for writing. 
I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be. > >> > >> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan. > >> > >> What version of HMMER3 you're using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I'm doubting it. > >> > >> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving this a shot if you haven't already. > >> > >> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem. > >> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter. > >> > >> Best, > >> Thomas > >> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: > >> > >>> Hi Thomas, > >>> > >>> I'm using modules in the bipoerl-hmmer3 git repository to parse hmmscan > >>> reports. When I parse the files and walk through the HSP's like: > >>> > >>> while (my $hit = $rslt->next_model) { > >>> > >>> while (my $domain = $hit->next_hsp) { > >>> > >>> And retrieve the "hit" coordinates like: > >>> > >>> print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'), > >>> "\n"; > >>> > >>> The coordinates returned correspond to what I would call the "query", > >>> since they are for the sequence I fed to hmmscan to search the profile > >>> database. Likewise, when retrieving the query coordinates like > >>> $domain->start('query'), I get what I consider the "hit" coordinates, > >>> since they are for the domain profile. Is this the intended behavior? > >>> > >>> Thanks. > >>> > >>> scott > >>> > >>> -- > >>> Scott A. 
Givan > >>> Associate Director > >>> Informatics Research Core Facility > >>> 240e Bond Life Sciences Center > >>> Research Assistant Professor > >>> Molecular Microbiology and Immunology > >>> University of Missouri, Columbia > >>> > >>> TEL 573-882-2948 > >>> FAX 573-884-9676 > >>> http://ircf.rnet.missouri.edu > >>> > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From p.j.a.cock at googlemail.com Wed Jul 11 21:00:56 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 11 Jul 2012 22:00:56 +0100 Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J wrote: > Peng, > > Has this been filed as a bug yet? > > https://redmine.open-bio.org/projects/bioperl > > Seems like it would be fairly easy to fix, but I want to track it just in case. > > chris Hi all, This could be the unfortunate fact that hmmscan and hmmsearch return very similar tabular output, but with query and hit interchanged. i.e. You need some extra information to know which way round they are (not possible with the current output). This was an issue in Bow's Biopython SearchIO project - which for the moment he solved by handling this as two hmmer file formats. In the medium term we're hoping hmmer3 will add some header information or something. 
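To illustrate the point about the two programs, the sketch below maps the domain-table columns (hmmfrom/hmm to and alifrom/ali to) onto query and hit depending on which program wrote the report. The function name assign_coords and its return structure are hypothetical and are not part of Bio::SearchIO; the sketch also assumes the caller already knows the program name, which is exactly the extra information a parser needs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): decide which coordinate pair
# belongs to the query and which to the hit.
#   hmmsearch: query = profile HMM, hit = target sequence
#   hmmscan:   query = sequence,    hit = profile HMM
sub assign_coords {
    my ( $program, $hmm_from, $hmm_to, $ali_from, $ali_to ) = @_;
    my $model    = { start => $hmm_from, end => $hmm_to };
    my $sequence = { start => $ali_from, end => $ali_to };

    my $name = lc $program;
    return ( query => $model,    hit => $sequence ) if $name eq 'hmmsearch';
    return ( query => $sequence, hit => $model )    if $name eq 'hmmscan';
    die "Unrecognised program '$program'\n";
}

# Coordinates from the hmmsearch domain line quoted in this thread:
# hmm 20-74, ali 4-59.
my %coords = assign_coords( 'hmmsearch', 20, 74, 4, 59 );
printf "query %d-%d, hit %d-%d\n",
    $coords{query}{start}, $coords{query}{end},
    $coords{hit}{start},   $coords{hit}{end};
```

For the hmmsearch report in this thread, this assigns 20-74 to the query (the CRP0000 profile) and 4-59 to the hit (Chr2_540228_540404_+), which is the assignment Peng expected; passing 'hmmscan' with the same numbers swaps the two.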
Peter From zhoupenggeni at gmail.com Wed Jul 11 17:45:00 2012 From: zhoupenggeni at gmail.com (Peng Zhou) Date: Wed, 11 Jul 2012 10:45:00 -0700 (PDT) Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Hello guys, Just a follow-up, it seems to me the bioperl-live version is still having the same problem - calling hit "query" while query sequence "hit". I also looked into the test script written for hmmer3 ( bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment part - I guess that's why this bug was not discovered. To be simple, here's an output of hmmsearch v3.0: # hmmsearch :: search profile(s) against a sequence database # HMMER 3.0 (March 2010); http://hmmer.org/ # Copyright (C) 2010 Howard Hughes Medical Institute. # Freely distributed under the GNU General Public License (GPLv3). 
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # query HMM file: /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm # target sequence database: /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa # output directed to file: /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt # number of worker threads: 4 # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query: CRP0000 [M=75] Scores for complete sequences (score includes all domains): --- full sequence --- --- best 1 domain --- -#dom- E-value score bias E-value score bias exp N Sequence Description ------- ------ ----- ------- ------ ----- ---- -- -------- ----------- 5.5e-25 95.0 14.4 5.7e-25 95.0 10.0 1.0 1 Chr2_540228_540404_+ Domain annotation for each sequence (and alignments): >> Chr2_540228_540404_+ # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- 1 ! 95.0 10.0 3.6e-30 5.7e-25 20 74 .. 4 59 .] 
1 59 [] 0.95 Alignments for each domain: == domain 1 score: 95.0 bits; conditional E-value: 3.6e-30 CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74 ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c Chr2_540228_540404_+ 4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59 568899***99********************************************* PP And here is a dump of the parsed HSP object: $VAR1 = bless( { 'VERBOSE' => 0, 'IDENTICAL' => 0, 'RANK' => 1, 'STRANDED' => 'NONE', 'EVALUE' => '3.6e-30', 'HSP_LENGTH' => 56, 'ALGORITHM' => 'HMMSEARCH' 'SCORE' => '95.0', 'GAP_SYMBOL' => '-', 'CONSERVED' => 0, 'HIT_NAME' => 'Chr2_540228_540404_+', 'HIT_DESC' => '', 'HIT_START' => '20', 'HIT_END' => '74', 'HIT_LENGTH' => 56, 'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc', 'HIT_FRAME' => 0, 'QUERY_NAME' => 'CRP0000', 'QUERY_DESC' => undef, 'QUERY_START' => '4', 'QUERY_END' => '59', 'QUERY_LENGTH' => '75', 'QUERY_FRAME' => 0, 'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC', 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c', }, 'Bio::Search::HSP::HMMERHSP' ); Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values. Thanks, Peng, On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: > > I'll try the bioperl-live version. Thanks guys. > > Scott Givan > 541-740-4685 > Sent from an iPhone (so expect typos). > > On Jul 19, 2011, at 10:34 PM, "Chris Fields" > wrote: > > > This might be a disconnect between the HMMER3 version in bioperl-live > and the one in Kai's bioperl-hmmer3 repo. I believe the one in > bioperl-live is newer. Scott, can you give that a try? > > > > chris > > > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: > > > >> Hi Scott, > >> > >> Thanks for writing. I'm on the road at the moment so I have to be > briefer and less thorough than I'd like to be. 
> >> > >> What you are observing is not the intended behavior. Oddly, it's not > what I recall obtaining in my tests on this software, though I was mostly > interested in hmmsearch at the time and may have been sloppier than I > should have been when it came to hmmscan. > >> > >> What version of HMMER3 you're using? There have been some small > formatting changes in the past that might be causing a burp in the parser, > though I'm doubting it. > >> > >> Kai Blin wrote some test scripts (found here: > bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate > query/hit coordinates. It might be worth giving this a shot if you haven't > already. > >> > >> Also, if you don't mind, I'm happy to run your code on your output file > on my end. It might help me diagnose the problem. > >> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case > anyone else has insight into this matter. > >> > >> Best, > >> Thomas > >> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: > >> > >>> Hi Thomas, > >>> > >>> I'm using modules in the bipoerl-hmmer3 git repository to parse hmmscan > >>> reports. When I parse the files and walk through the HSP's like: > >>> > >>> while (my $hit = $rslt->next_model) { > >>> > >>> while (my $domain = $hit->next_hsp) { > >>> > >>> And retrieve the "hit" coordinates like: > >>> > >>> print "hit coords: ", $domain->start('hit'), "-", > $domain->end('hit'), > >>> "\n"; > >>> > >>> The coordinates returned correspond to what I would call the "query", > >>> since they are for the sequence I fed to hmmscan to search the profile > >>> database. Likewise, when retrieving the query coordinates like > >>> $domain->start('query'), I get what I consider the "hit" coordinates, > >>> since they are for the domain profile. Is this the intended behavior? > >>> > >>> Thanks. > >>> > >>> scott > >>> > >>> -- > >>> Scott A. 
Givan > >>> Associate Director > >>> Informatics Research Core Facility > >>> 240e Bond Life Sciences Center > >>> Research Assistant Professor > >>> Molecular Microbiology and Immunology > >>> University of Missouri, Columbia > >>> > >>> TEL 573-882-2948 > >>> FAX 573-884-9676 > >>> http://ircf.rnet.missouri.edu > >>> > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From zhoupenggeni at gmail.com Wed Jul 11 18:03:17 2012 From: zhoupenggeni at gmail.com (Peng Zhou) Date: Wed, 11 Jul 2012 11:03:17 -0700 (PDT) Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <4FFDB879.1020906@gmail.com> References: <4FFDB879.1020906@gmail.com> Message-ID: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com> Hi, I guess that's what the commands are supposed to do: the get_SeqFeatures() method return an array of Bio::SeqFeature objects, and the translate() method returns a Bio::Seq object. And you can't simply "print" an object in perl - you can "dump" it though: $ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb $ perl -e ' use Bio::SeqIO; use Data::Dumper; my $seqio=Bio::SeqIO->new(-file=>shift); print Dumper($seqio->next_seq()->translate()); ' nt.gb On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote: > > Hi, > > I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and > tested the Bio::SeqIO module as follows. The first 3 commands succeeded; > however the last 2 failed. 
> > I think $seqio->next_seq() produces a Bio::Seq object which contains the > sequence, features and annotation (according to the DESCRIPTION of > "perldoc Bio::Seq") and thus the invocation of the methods > get_SeqFeatures() and translate() should be valid. However, the results > denied this idea. > > Will anyone explain what happened to the last 2 commands? I have > encountered numerous cases of failures when testing the bioperl methods. > I want to translate the mRNA sequence and extract the sequence features. > What are the right commands? Thanks a lot! > > Best, > Dejian > > > > PS: The commands and results > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->display_id(); ' nt.gb > NM_053056 > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->molecule(); ' nt.gb > mRNA > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->subseq(1,6); ' nt.gb > CACACG > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb > 
Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) > > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); > print $seqio->next_seq()->translate(); ' nt.gb > Bio::Seq::RichSeq=HASH(0x20a3e7b0) > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > 
From zhoupenggeni at gmail.com Wed Jul 11 20:05:56 2012 From: zhoupenggeni at gmail.com (Peng Zhou) Date: Wed, 11 Jul 2012 13:05:56 -0700 
(PDT) Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Thanks Chris, here is the link of the filed bug: https://redmine.open-bio.org/issues/3369 On Wednesday, July 11, 2012 2:02:46 PM UTC-5, Christopher Fields wrote: > > Peng, > > Has this been filed as a bug yet? > > https://redmine.open-bio.org/projects/bioperl > > Seems like it would be fairly easy to fix, but I want to track it just in > case. > > chris > > On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote: > > > Hello guys, > > > > Just a follow-up, it seems to me the bioperl-live version is still > having the same problem - calling hit "query" while query sequence "hit". I > also looked into the test script written for hmmer3 > (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment > part - I guess that's why this bug was not discovered. > > > > To be simple, here's an output of hmmsearch v3.0: > > # hmmsearch :: search profile(s) against a sequence database > > # HMMER 3.0 (March 2010); http://hmmer.org/ > > # Copyright (C) 2010 Howard Hughes Medical Institute. > > # Freely distributed under the GNU General Public License (GPLv3). 
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - > > # query HMM file: > /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm > > # target sequence database: > /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa > > > # output directed to file: > /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt > > # number of worker threads: 4 > > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - > > > > Query: CRP0000 [M=75] > > Scores for complete sequences (score includes all domains): > > --- full sequence --- --- best 1 domain --- -#dom- > > E-value score bias E-value score bias exp N Sequence > Description > > ------- ------ ----- ------- ------ ----- ---- -- -------- > ----------- > > 5.5e-25 95.0 14.4 5.7e-25 95.0 10.0 1.0 1 > Chr2_540228_540404_+ > > > > Domain annotation for each sequence (and alignments): > > >> Chr2_540228_540404_+ > > # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali > to envfrom env to acc > > --- ------ ----- --------- --------- ------- ------- ------- > ------- ------- ------- ---- > > 1 ! 95.0 10.0 3.6e-30 5.7e-25 20 74 .. 4 > 59 .] 
1 59 [] 0.95 > > > > Alignments for each domain: > > == domain 1 score: 95.0 bits; conditional E-value: 3.6e-30 > > CRP0000 20 > tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74 > > ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg > rrrC+Ct++c > > Chr2_540228_540404_+ 4 > GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59 > > > 568899***99********************************************* PP > > > > And here is a dump of the parsed HSP object: > > $VAR1 = bless( { > > 'VERBOSE' => 0, > > 'IDENTICAL' => 0, > > 'RANK' => 1, > > 'STRANDED' => 'NONE', > > 'EVALUE' => '3.6e-30', > > 'HSP_LENGTH' => 56, > > 'ALGORITHM' => 'HMMSEARCH' > > 'SCORE' => '95.0', > > 'GAP_SYMBOL' => '-', > > 'CONSERVED' => 0, > > > > 'HIT_NAME' => 'Chr2_540228_540404_+', > > 'HIT_DESC' => '', > > 'HIT_START' => '20', > > 'HIT_END' => '74', > > 'HIT_LENGTH' => 56, > > 'HIT_SEQ' => > 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc', > > 'HIT_FRAME' => 0, > > > > 'QUERY_NAME' => 'CRP0000', > > 'QUERY_DESC' => undef, > > 'QUERY_START' => '4', > > 'QUERY_END' => '59', > > 'QUERY_LENGTH' => '75', > > 'QUERY_FRAME' => 0, > > 'QUERY_SEQ' => > 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC', > > > > 'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs > +nCa+vC++Egf gG+crg rrrC+Ct++c', > > }, 'Bio::Search::HSP::HMMERHSP' ); > > > > Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be > exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values. > > > > Thanks, > > > > Peng, > > > > On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: > > I'll try the bioperl-live version. Thanks guys. > > Scott Givan > > 541-740-4685 > > Sent from an iPhone (so expect typos). > > > > On Jul 19, 2011, at 10:34 PM, "Chris Fields" > wrote: > > > > > This might be a disconnect between the HMMER3 version in bioperl-live > and the one in Kai's bioperl-hmmer3 repo. I believe the one in > bioperl-live is newer. Scott, can you give that a try? 
> > > > > > chris > > > > > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: > > > > > >> Hi Scott, > > >> > > >> Thanks for writing. I'm on the road at the moment so I have to be > briefer and less thorough than I'd like to be. > > >> > > >> What you are observing is not the intended behavior. Oddly, it's not > what I recall obtaining in my tests on this software, though I was mostly > interested in hmmsearch at the time and may have been sloppier than I > should have been when it came to hmmscan. > > >> > > >> What version of HMMER3 you're using? There have been some small > formatting changes in the past that might be causing a burp in the parser, > though I'm doubting it. > > >> > > >> Kai Blin wrote some test scripts (found here: > bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate > query/hit coordinates. It might be worth giving this a shot if you haven't > already. > > >> > > >> Also, if you don't mind, I'm happy to run your code on your output > file on my end. It might help me diagnose the problem. > > >> > > >> Sorry this is being a thorn in your side! I've cc'ed the list in case > anyone else has insight into this matter. > > >> > > >> Best, > > >> Thomas > > >> > > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: > > >> > > >>> Hi Thomas, > > >>> > > >>> I'm using modules in the bipoerl-hmmer3 git repository to parse > hmmscan > > >>> reports. When I parse the files and walk through the HSP's like: > > >>> > > >>> while (my $hit = $rslt->next_model) { > > >>> > > >>> while (my $domain = $hit->next_hsp) { > > >>> > > >>> And retrieve the "hit" coordinates like: > > >>> > > >>> print "hit coords: ", $domain->start('hit'), "-", > $domain->end('hit'), > > >>> "\n"; > > >>> > > >>> The coordinates returned correspond to what I would call the > "query", > > >>> since they are for the sequence I fed to hmmscan to search the > profile > > >>> database. 
Likewise, when retrieving the query coordinates like > > >>> $domain->start('query'), I get what I consider the "hit" > coordinates, > > >>> since they are for the domain profile. Is this the intended > behavior? > > >>> > > >>> Thanks. > > >>> > > >>> scott > > >>> > > >>> -- > > >>> Scott A. Givan > > >>> Associate Director > > >>> Informatics Research Core Facility > > >>> 240e Bond Life Sciences Center > > >>> Research Assistant Professor > > >>> Molecular Microbiology and Immunology > > >>> University of Missouri, Columbia > > >>> > > >>> TEL 573-882-2948 > > >>> FAX 573-884-9676 > > >>> http://ircf.rnet.missouri.edu > > >>> > > >>> > > >>> > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > 
From w.arindrarto at gmail.com Wed Jul 11 21:25:44 2012 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 11 Jul 2012 23:25:44 +0200 Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: Hi everyone, Just as an additional info that might be useful: The current Biopython parser for the plain text format parses the very first line to find out which HMMER flavor produces the result. Both 'hmm from' and 'hmmto' are query coordinates if the flavor is hmmsearch or phmmer; and they're hit coordinates if the flavor is hmmscan. 
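
The rule just described can be sketched in a few lines of plain Perl. This is only an illustration under stated assumptions: the helper names program_from_header and assign_coords are made up for this example and are not part of BioPerl or Biopython, and the coordinates are the ones from the hmmsearch report posted earlier in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): read the program name from the
# first line of a HMMER3 plain-text report, e.g. "# hmmsearch :: ...".
sub program_from_header {
    my ($line) = @_;
    return $line =~ /^#\s*(\w+)\s*::/ ? lc $1 : undef;
}

# Hypothetical helper (not BioPerl API): decide which coordinate columns
# of a domain row belong to the query and which to the hit.
sub assign_coords {
    my ($program, %row) = @_;
    my ($q, $h);
    if ($program eq 'hmmsearch' or $program eq 'phmmer') {
        # the profile is the query, the sequence is the hit
        ($q, $h) = ('hmm', 'ali');
    }
    elsif ($program eq 'hmmscan') {
        # the sequence is the query, the profile is the hit
        ($q, $h) = ('ali', 'hmm');
    }
    else {
        die "unsupported HMMER program: $program\n";
    }
    return (
        query_start => $row{"${q}from"},
        query_end   => $row{"${q}to"},
        hit_start   => $row{"${h}from"},
        hit_end     => $row{"${h}to"},
    );
}

my $program = program_from_header(
    '# hmmsearch :: search profile(s) against a sequence database');
my %c = assign_coords($program,
    hmmfrom => 20, hmmto => 74, alifrom => 4, alito => 59);

# prints "query 20-74, hit 4-59"
printf "query %d-%d, hit %d-%d\n",
    @c{qw(query_start query_end hit_start hit_end)};
```

With 'hmmscan' as the program the same row would come back with the two coordinate pairs exchanged, which is the distinction the parser needs to make.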
This information is not available in other HMMER command line output formats (tblout and domtblout), which as Peter has mentioned, required us to treat different flavors of the table output as different formats for the time being. Fortunately, after contacting the HMMER developers they mentioned that this is not the case anymore in their development branch (and their future planned release). Hope that helps :), Bow On Wed, Jul 11, 2012 at 11:00 PM, Peter Cock wrote: > On Wed, Jul 11, 2012 at 8:02 PM, Fields, Christopher J > wrote: >> Peng, >> >> Has this been filed as a bug yet? >> >> https://redmine.open-bio.org/projects/bioperl >> >> Seems like it would be fairly easy to fix, but I want to track it just in case. >> >> chris > > Hi all, > > This could be the unfortunate fact that hmmscan and > hmmsearch return very similar tabular output, but > with query and hit interchanged. i.e. You need some > extra information to know which way round they are > (not possible with the current output). This was an > issue in Bow's Biopython SearchIO project - which > for the moment he solved by handling this as two > hmmer file formats. In the medium term we're hoping > hmmer3 will add some header information or something. > > Peter From dejian.zhao at gmail.com Thu Jul 12 05:04:54 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Thu, 12 Jul 2012 13:04:54 +0800 Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com> References: <4FFDB879.1020906@gmail.com> <25cf332a-f998-4eae-96ff-d9db1ee2ff2c@googlegroups.com> Message-ID: <4FFE5AF6.1020300@gmail.com> Thank you, Peng. That's great! Actually I am wondering how to get the whole content of an object these days; "Dumping it" is a good solution. 
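
For anyone reading along, the behaviour discussed here (print showing only something like Class=HASH(0x...)) can be reproduced without BioPerl. The sketch below uses a made-up Toy::Seq class as a stand-in; it is not the real Bio::Seq, and its revcom method is just a convenient example of a method that returns a new object rather than a string.

```perl
use strict;
use warnings;

# Toy stand-in for a sequence class; NOT the real Bio::Seq.
package Toy::Seq;

sub new { my ($class, %args) = @_; return bless {%args}, $class }

# Accessor: returns the plain sequence string.
sub seq { return $_[0]{seq} }

# Returns a NEW object holding the reverse complement, not a string.
sub revcom {
    my ($self) = @_;
    my $rc = scalar reverse $self->seq;
    $rc =~ tr/ACGT/TGCA/;
    return Toy::Seq->new(seq => $rc);
}

package main;

my $obj = Toy::Seq->new(seq => 'CACACG');

# Printing the returned object only shows its class and memory address.
my $as_printed = "" . $obj->revcom;
print "$as_printed\n";    # something like Toy::Seq=HASH(0x...)

# Calling an accessor on the returned object gives the actual data.
my $as_string = $obj->revcom->seq;
print "$as_string\n";     # prints "CGTGTG"
```

Data::Dumper is handy for looking inside an unfamiliar object, while the accessor methods are usually what you want in a real script.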
On 2012-7-12 2:03, Peng Zhou wrote: > Hi, > > I guess that's what the commands are supposed to do: the get_SeqFeatures() > method return an array of Bio::SeqFeature objects, and the translate() > method returns a Bio::Seq object. And you can't simply "print" an object in > perl - you can "dump" it though: > > $ perl -e ' use Bio::SeqIO; use Data::Dumper; my > $seqio=Bio::SeqIO->new(-file=>shift); > print Dumper($seqio->next_seq()->get_SeqFeatures()); ' nt.gb > > $ perl -e ' use Bio::SeqIO; use Data::Dumper; my > $seqio=Bio::SeqIO->new(-file=>shift); > print Dumper($seqio->next_seq()->translate()); ' nt.gb > > On Wednesday, July 11, 2012 12:31:37 PM UTC-5, De-Jian Zhao wrote: >> Hi, >> >> I downloaded a nucleotide sequence from Genbank (file name: nt.gb) and >> tested the Bio::SeqIO module as follows. The first 3 commands succeeded; >> however the last 2 failed. >> >> I think $seqio->next_seq() produces a Bio::Seq object which contains the >> sequence, features and annotation (according to the DESCRIPTION of >> "perldoc Bio::Seq") and thus the invocation of the methods >> get_SeqFeatures() and translate() should be valid. However, the results >> denied this idea. >> >> Will anyone explain what happened to the last 2 commands? I have >> encountered numerous cases of failures when testing the bioperl methods. >> I want to translate the mRNA sequence and extract the sequence features. >> What are the right commands? Thanks a lot! 
>> >> Best, >> Dejian >> >> >> >> PS: The commands and results >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->display_id(); ' nt.gb >> NM_053056 >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->molecule(); ' nt.gb >> mRNA >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->subseq(1,6); ' nt.gb >> CACACG >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->get_SeqFeatures(); ' nt.gb >> Bio::SeqFeature::Generic=HASH(0x20a30898)Bio::SeqFeature::Generic=HASH(0x20a30bb0)Bio::SeqFeature::Generic=HASH(0x20a30cd0)Bio::SeqFeature::Generic=HASH(0x20a317b0)Bio::SeqFeature::Generic=HASH(0x20a31720)Bio::SeqFeature::Generic=HASH(0x20a39a18)Bio::SeqFeature::Generic=HASH(0x20a317e0)Bio::SeqFeature::Generic=HASH(0x20a398b0)Bio::SeqFeature::Generic=HASH(0x20a39838)Bio::SeqFeature::Generic=HASH(0x20a39e98)Bio::SeqFeature::Generic=HASH(0x20a3b898)Bio::SeqFeature::Generic=HASH(0x20a3a120)Bio::SeqFeature::Generic=HASH(0x20a3bda8)Bio::SeqFeature::Generic=HASH(0x20a3c030)Bio::SeqFeature::Generic=HASH(0x20a3c2b8)Bio::SeqFeature::Generic=HASH(0x20a3be20)Bio::SeqFeature::Generic=HASH(0x20a3c0a8)Bio::SeqFeature::Generic=HASH(0x20a3bb98)Bio::SeqFeature::Generic=HASH(0x20a3c300)Bio::SeqFeature::Generic=HASH(0x20a3c588)Bio::SeqFeature::Generic=HASH(0x20a3d838)Bio::SeqFeature::Generic=HASH(0x20a3dfb8)Bio::SeqFeature::Generic=HASH(0x20a3dd18) >> >> >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); >> print $seqio->next_seq()->translate(); ' nt.gb >> Bio::Seq::RichSeq=HASH(0x20a3e7b0) >> From dejian.zhao at gmail.com Thu Jul 12 05:14:33 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Thu, 12 Jul 2012 13:14:33 +0800 Subject: [Bioperl-l] Errors with Bio::Seq objects In-Reply-To: <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu> References: 
<4FFDB879.1020906@gmail.com> <9CA9DA3A-B03F-4EC3-977C-E18A6F4D9B6F@tamu.edu> Message-ID: <4FFE5D39.6010406@gmail.com> Thank you, Jim. You are right. It works. This example deepens my understanding of OOP. On 2012-7-12 2:01, Jim Hu wrote: >> $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate(); ' nt.gb >> > Bio::Seq::RichSeq=HASH(0x20a3e7b0) > ->translate returns a new Seq object. I think > > $ perl -e ' use Bio::SeqIO; my $seqio=Bio::SeqIO->new(-file=>shift); print $seqio->next_seq()->translate()->seq(); ' nt.gb > > should work (haven't tried it). From kai.blin at biotech.uni-tuebingen.de Thu Jul 12 13:43:19 2012 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 12 Jul 2012 15:43:19 +0200 Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> <1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> Message-ID: <4FFED477.3090907@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-07-11 23:25, Wibowo Arindrarto wrote: Hi, > The current Biopython parser for the plain text format parses the > very first line to find out which HMMER flavor produces the result. > Both 'hmm from' and 'hmmto' are query coordinates if the flavor is > hmmsearch or phmmer; and they're hit coordinates if the flavor is > hmmscan. Whoops. I mostly looked at hmmscan when writing the parser, because that's the file format I needed for my code. The code clearly should follow the way the hmmer2 parser works, and differentiate between hmmsearch and hmmscan type output. As I said on the bug report, I'm happy to look at code fixing this. > This information is not available in other HMMER command line > output formats (tblout and domtblout), which as Peter has > mentioned, required us to treat different flavors of the table > output as different formats for the time being. 
As far as I'm aware, BioPerl currently doesn't parse the table output format. Seeing how much repeated pain we run into with all these parsers in the different Bio* projects, I wonder if there was a smarter way to deal with parsing. Maybe at least some shared grammar file that we could use for testing, to make sure we at least have the same expectations about file formats in the different language implementations. Ideally we'd auto-generate the parsers from the grammar specification, but I guess that'll stay wishful thinking for quite a bit. > Fortunately, after contacting the HMMER developers they mentioned > that this is not the case anymore in their development branch (and > their future planned release). That's certainly good news. :) Cheers, Kai - -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJP/tR3AAoJEKM5lwBiwTTP6OoIAM3J9chdyfmTuQTp4KMxVIk7 PCkJy+aLcnfa3d7s8BVPG0GWQTPrfHLX6a7zWfoSLzL9RBShFWCQIxGpu+Tq3yR8 Hu/TpoFIg8bB1iAroAWLdsX8nio3Idlcl5JN38LBsFEUirFrGAsvfdN/+fYrP5Ni y0ULP18uihiN07sVG88nZXNyEB7fIscVYdO90GsGq03/KOTRsRD4kugapiQJIy4D lrqnYznLa4p30lBDCEHbTaHYbfIs7/8tryfHJsfjimjg8IoSMHMJfIkI7/z0qlL+ bxt/HuGMsm1Ak08xEAoT7T00t5tcAp1gclgZsO/CrviOicmhUgd6iri/kIpzg0c= =acWd -----END PGP SIGNATURE----- From cjfields at illinois.edu Thu Jul 12 15:24:13 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 12 Jul 2012 15:24:13 +0000 Subject: [Bioperl-l] hmmer3.pm question re query and hit coordinates In-Reply-To: <4FFED477.3090907@biotech.uni-tuebingen.de> References: <7CF4A2C5-F44F-4C0D-A3B7-5ED131A1A9ED@gmail.com> 
<1823BCEE-5D27-4FF9-8D57-082AE0CFE8ED@illinois.edu> <4FFED477.3090907@biotech.uni-tuebingen.de> Message-ID: <1C3A31F9-9717-49F3-A880-FA725D0F3CDB@illinois.edu> On Jul 12, 2012, at 8:43 AM, Kai Blin wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 2012-07-11 23:25, Wibowo Arindrarto wrote: > > Hi, > >> The current Biopython parser for the plain text format parses the >> very first line to find out which HMMER flavor produces the result. >> Both 'hmm from' and 'hmmto' are query coordinates if the flavor is >> hmmsearch or phmmer; and they're hit coordinates if the flavor is >> hmmscan. > > Whoops. I mostly looked at hmmscan when writing the parser, because > that's the file format I needed for my code. The code clearly should > follow the way the hmmer2 parser works, and differentiate between > hmmsearch and hmmscan type output. > > As I said on the bug report, I'm happy to look at code fixing this. Seems like it should be easy enough to address if there is something in the output that indicates the report type. >> This information is not available in other HMMER command line >> output formats (tblout and domtblout), which as Peter has >> mentioned, required us to treat different flavors of the table >> output as different formats for the time being. > > As far as I'm aware, BioPerl currently doesn't parse the table output > format. The only reason to do so is if the table provides additional information the actual hits don't (this can be the case with BLAST reports). > Seeing how much repeated pain we run into with all these parsers in > the different Bio* projects, I wonder if there was a smarter way to > deal with parsing. Maybe at least some shared grammar file that we > could use for testing, to make sure we at least have the same > expectations about file formats in the different language > implementations. Ideally we'd auto-generate the parsers from the > grammar specification, but I guess that'll stay wishful thinking for > quite a bit. 
I would fully support something like this. I've been thinking about this with Marpa::XS (which now has a compiled library, libmarpa, to make it less perl-centric), and there have been talks of using a similar toolkit with the bioruby folks. We could always have a plain-perl/python/ruby/etc fallback in the most common formats. chris From buschj at hhu.de Sun Jul 15 19:46:42 2012 From: buschj at hhu.de (jobu) Date: Sun, 15 Jul 2012 21:46:42 +0200 Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches Message-ID: <50031E22.3060902@hhu.de> Dear All. Still being a beginner in Perl and just having started to look into BioPerl, I hope to ask my question at the right place. I locally ran a standalone blastn search of many short query-sequences against a set of target-fasta-sequences consisting of whole chromosomal sequence data. What I need to do now is to get, let's say, 100nt each Up- and Downstream out of my target sequences for each Blast match. At this point I can only assume that BioPerl might be helpful in resolving this task, though I haven't found a module yet that will manage to do this locally on my hard drive. Thus I would be thankful for the slightest hint where to begin. Sincerely Jochen From Russell.Smithies at agresearch.co.nz Sun Jul 15 21:19:14 2012 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 16 Jul 2012 09:19:14 +1200 Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches In-Reply-To: <50031E22.3060902@hhu.de> References: <50031E22.3060902@hhu.de> Message-ID: <18DF7D20DFEC044098A1062202F5FFF34CCF2A4CAA@exchsth.agresearch.co.nz> Hi Jochen, I don't think BioPerl can directly manipulate blast databases so I'd probably do it with fastacmd to extract the sequence from the original blast database. e.g. 
fastacmd -s X51494.1 -d /dataset/blastdata/active/nt -L 100,200 >gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4) ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG CATGCAGCTACAAGGCATGAT Or if you're using blast+, use the blastdbcmd command, e.g. blastdbcmd -entry X51494.1 -db /dataset/blastdata/active/nt -range 100-200 >gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4) ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG CATGCAGCTACAAGGCATGAT So to put it all together, try using BioPerl to parse your existing blast results and pull out each hit's coordinates, then use a system call to exec fastacmd or blastdbcmd to extract the sequence from the blast database, then write the sequences to file. These might be useful: http://www.bioperl.org/wiki/HOWTO:SearchIO http://www.bioperl.org/wiki/HOWTO:SearchIO#Speed_improvements_with_lightweight_objects http://www.bioperl.org/wiki/HOWTO:BlastPlus http://www.bioperl.org/wiki/HOWTO:StandAloneBlast --Russell -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jobu Sent: Monday, 16 July 2012 7:47 a.m. To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches Dear All. Still being a beginner in Perl and just having started to look into BioPerl, I hope to ask my question at the right place. I locally ran a standalone blastn search of many short query-sequences against a set of target-fasta-sequences consisting of whole chromosomal sequence data. What I need to do now is to get, let's say, 100nt each Up- and Downstream out of my target sequences for each Blast match. At this point I can only assume that BioPerl might be helpful in resolving this task, though I haven't found a module yet that will manage to do this locally on my hard drive. Thus I would be thankful for the slightest hint where to begin. 
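One way to sketch the approach suggested above (take each hit's coordinates from the parsed report, then request a range from the BLAST database) is shown below in plain Perl. This is only an illustration under stated assumptions: flank_ranges and blastdbcmd_args are hypothetical helper names, not BioPerl functions; the hit coordinates and target length are made up; and only the blastdbcmd options that appear in the message above (-entry, -db, -range) are used. In a real script the coordinates would come from the parsed BLAST report, for instance via Bio::SearchIO as described in the HOWTO linked above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): given the 1-based start and end
# of a BLAST hit on the target sequence, return the two flanking ranges,
# clamped to the sequence boundaries. A flank of zero length comes back as undef.
sub flank_ranges {
    my ($hit_start, $hit_end, $seq_length, $flank) = @_;

    # Raw BLAST output can report start > end for minus-strand hits,
    # so put the coordinates into ascending order first.
    ($hit_start, $hit_end) = ($hit_end, $hit_start) if $hit_start > $hit_end;

    my $up_start = $hit_start - $flank;
    $up_start = 1 if $up_start < 1;                  # do not run off the left end
    my $up_end = $hit_start - 1;

    my $down_start = $hit_end + 1;
    my $down_end   = $hit_end + $flank;
    $down_end = $seq_length if $down_end > $seq_length;   # nor off the right end

    my $upstream   = $up_end   >= $up_start   ? [$up_start,   $up_end]   : undef;
    my $downstream = $down_end >= $down_start ? [$down_start, $down_end] : undef;
    return ($upstream, $downstream);
}

# Build the blastdbcmd argument list for one range, using only the options
# shown in the message above.
sub blastdbcmd_args {
    my ($entry, $db, $range) = @_;
    return ('blastdbcmd', '-entry', $entry, '-db', $db,
            '-range', "$range->[0]-$range->[1]");
}

# Made-up example: a hit at positions 150..220 on a 1000 nt target, 100 nt flanks.
my ($up, $down) = flank_ranges(150, 220, 1000, 100);
for my $range ($up, $down) {
    next unless defined $range;
    print join(' ', blastdbcmd_args('X51494.1', '/dataset/blastdata/active/nt', $range)), "\n";
}
```

The printed lines are the commands that would be run; passing the same list to Perl's system() would execute them without going through a shell. Note that the two ranges are in plus-strand coordinates of the target, so for a minus-strand hit the biologically upstream flank is the second range and the extracted sequence would still need to be reverse-complemented.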
Sincerely Jochen _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From dcmertens.perl at gmail.com Tue Jul 17 12:57:55 2012 From: dcmertens.perl at gmail.com (David Mertens) Date: Tue, 17 Jul 2012 07:57:55 -0500 Subject: [Bioperl-l] Announcing The Quantified Onion Google Group and perl4science.github.com Message-ID: Hello everybody - I returned from YAPC::NA this year intending to build-up the scientific Perl community. One outgrowth of this has been Joel Berger's creation of perl4science.github.com and gizmomathboy's creation of The Quantified Onion Google Group . perl4science is meant to be a landing page for anybody looking to combine Perl and science. Since it is a github repository, it makes it about as easy as possible for others to contribute content or fixes. If you have a project that scientists would find useful, you should fork the project, add your content, and issue a pull request. It's that easy. The Quantified Onion is meant to be a space for scientists to discuss how we use Perl in our science and to work together to grow adoption of Perl among scientists. 
It will undoubtedly attract newcomers to Perl asking beginner questions, at which point we will gently refer them to the appropriate manual pages. Interesting discussions thus far (in my mind) include a discussion about teaching test-driven design and a discussion about submitting an article to Computing in Science and Engineering for their November Issue, which is supposed to be about Modern Programming Languages. I would like to begin putting on workshops on Perl for Scientists and Engineers (and encourage others to do the same), and I will begin the discussion on The Quantified Onion. If you know of other Perl science resources, please feel free to add them to perl4science or post them on The Quantified Onion, and please join The Quantified Onion. Together, we can grow Perl's adoption among scientists! David Mertens -- "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." -- Brian Kernighan From cjfields at illinois.edu Wed Jul 18 14:29:02 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 18 Jul 2012 14:29:02 +0000 Subject: [Bioperl-l] [blast-announce] OLD_BLAST parameter to be discontinued. Alternative NCBI BLAST parsable formats are available References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF2B63F6C5@CHIMBX5.ad.uillinois.edu> Not sure if anyone is using this as a means of getting their reports (I don't), but I'm posting this here just in case. -c Begin forwarded message: > From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" > Subject: [blast-announce] OLD_BLAST parameter to be discontinued. Alternative NCBI BLAST parsable formats are available > Date: July 18, 2012 9:17:05 AM CDT > To: NLM/NCBI List blast-announce > > Beginning Sept. 10, 2012, the BLAST service will ignore the OLD_BLAST parameter in posted URLs. 
We are removing this old and little used option to prepare for upcoming enhancements to the BLAST service later this year. Setting OLD_BLAST=true produces an older version of the BLAST HTML results that a few people have used for automated processing (parsing) of results. NCBI BLAST supports a number of different and more stable formats for parsing. These include XML, tabular reports and ASN.1. For more information, please see BLAST Developer Information (http://1.usa.gov/O8AocI) and links on that page. > From dejian.zhao at gmail.com Wed Jul 18 15:36:14 2012 From: dejian.zhao at gmail.com (De-Jian Zhao) Date: Wed, 18 Jul 2012 23:36:14 +0800 Subject: [Bioperl-l] Which graphic module should I learn? Message-ID: <5006D7EE.1020205@gmail.com> Hi, all. Currently I am working on a genome. I will draw some pictures based on the sequencing data. In the long run, I will use the module in my future projects, so I want to learn a popular module to get better support from the community. I searched in cpan with the command "i /SVG/" and got 234 items. Which one is popular in bioinformatics? Which module should I start with? Thanks for any suggestions. Best, De-Jian From scott at scottcain.net Wed Jul 18 15:46:01 2012 From: scott at scottcain.net (Scott Cain) Date: Wed, 18 Jul 2012 11:46:01 -0400 Subject: [Bioperl-l] Which graphic module should I learn? In-Reply-To: <5006D7EE.1020205@gmail.com> References: <5006D7EE.1020205@gmail.com> Message-ID: Hi De-Jian, Of course, it depends on what you want to do, but if you're referring to the genome feature/annotation type graphics, Bio::Graphics already supports SVG pretty well, via GD::SVG. Scott On Wed, Jul 18, 2012 at 11:36 AM, De-Jian Zhao wrote: > Hi, all. > > Currently I am working on a genome. I will draw some pictures based on the > sequencing data. In the long run, I will use the module in my future > projects, so I want to learn a popular module to get better support from the > community. 
I searched in cpan with the command "i /SVG/" and got 234 items. > Which one is popular in bioinformatics? Which module should I start with? > Thanks for any suggestions. > > Best, > De-Jian > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Wed Jul 25 03:08:05 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 25 Jul 2012 03:08:05 +0000 Subject: [Bioperl-l] BioPerl and Travis-CI Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Peter Cock has graciously helped start up a branch for bioperl-live that is using Travis-CI (a nice continuous integration tool). Results from Peter's fork are found here: http://travis-ci.org/#!/peterjc/bioperl-live As this is now pulled into the main bioperl repo, results will be here: http://travis-ci.org/#!/bioperl/bioperl-live I'll be working on this and expect this will be added to master in the next few days. chris From p.j.a.cock at googlemail.com Wed Jul 25 10:31:13 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 25 Jul 2012 11:31:13 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J wrote: > Peter Cock has graciously helped start up a branch for bioperl-live > that is using Travis-CI (a nice continuous integration tool). 
Results > from Peter's fork are found here: > > http://travis-ci.org/#!/peterjc/bioperl-live > > As this is now pulled into the main bioperl repo, results will be here: > > http://travis-ci.org/#!/bioperl/bioperl-live > > I'll be working on this and expect this will be added to master in > the next few days. > > chris We've had this running for Biopython for a month now, and it has been a useful complement to the BuildBot (which covers other operating systems). This was following BioRuby's lead: http://biopython.org/pipermail/biopython-dev/2012-June/009742.html The current BioPerl Travis configuration is probably usable right now (after changing the branch whitelist to either master, or simply all branches). Other remaining issues include sorting out which dependencies should be installed, and streamlining their verbose output (e.g. using tail). TravisCI can send out emails (e.g. on test failures), and perhaps bioperl-guts-l might be a sensible place to send these. Initially we'd disabled the emails for Biopython. I'd like to use an RSS feed... there is a JSON API which BioRuby are using for http://www.biogems.info/ which tracks their plugins. 
Peter From p.j.a.cock at googlemail.com Fri Jul 27 15:03:05 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 27 Jul 2012 16:03:05 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> Message-ID: On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J wrote: > On Jul 27, 2012, at 9:47 AM, Peter Cock wrote: > >> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: >>> >>> That's done now - except for the circular dependencies, and GD, >>> which might be easy to solve if anyone knows what the error >>> means - see commit message here: >>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a >> >> Re: https://twitter.com/cjfields/status/228861370454638592 >> Not sure why you got GD to work when something very similar >> had failed for me. Oh well - job done :) > > It was the lack of gdlib-config in the libgd2-xpm package, you need > libgd2-xpm-dev. One of the fun things about Debian packaging. Ah - I should have guessed that. >>> Would a single clean commit of the (current) .travis.yml file be >>> preferable to the current series of commits? And you you want >>> a pull request, or would you just merge/cherry-pick manually? >> >> Given all the churn between our revisions, personally I'd opt for >> a single clean commit to bioperl/master - but your call. >> >> Peter > > Yep, about to merge it over. It's working now, just need to > whitelist master instead of travis after the merge. I'd removed the whitelist altogether here: https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd My thinking was BioPerl seems to have multiple feature branches under the official repo, so they should get tested too. 
You'd be in a better position than me to judge what would work best for BioPerl here. Peter From cjfields at illinois.edu Fri Jul 27 14:58:21 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 27 Jul 2012 14:58:21 +0000 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote: > On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: >> >> That's done now - except for the circular dependencies, and GD, >> which might be easy to solve if anyone knows what the error >> means - see commit message here: >> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a > > Re: https://twitter.com/cjfields/status/228861370454638592 > Not sure why you got GD to work when something very similar > had failed for me. Oh well - job done :) It was the lack of gdlib-config in the libgd2-xpm package, you need libgd2-xpm-dev. One of the fun things about Debian packaging. >> Would a single clean commit of the (current) .travis.yml file be >> preferable to the current series of commits? And you you want >> a pull request, or would you just merge/cherry-pick manually? > > Given all the churn between our revisions, personally I'd opt for > a single clean commit to bioperl/master - but your call. > > Peter Yep, about to merge it over. It's working now, just need to whitelist master instead of travis after the merge. 
chris From cjfields at illinois.edu Fri Jul 27 16:26:34 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 27 Jul 2012 16:26:34 +0000 Subject: [Bioperl-l] BioPerl Travis-CI now live Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D54D@CITESMBX5.ad.uillinois.edu> All commits to bioperl-live master branch on github are now being tracked: http://travis-ci.org/#!/bioperl/bioperl-live The .travis.yml file has a whitelist for branches to be tested; if anyone wants to test additional branches feel free to add them to the list! chris From cjfields at illinois.edu Fri Jul 27 15:15:19 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 27 Jul 2012 15:15:19 +0000 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> <118F034CF4C3EF48A96F86CE585B94BF3140D21F@CITESMBX5.ad.uillinois.edu> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF3140D2D6@CITESMBX5.ad.uillinois.edu> On Jul 27, 2012, at 10:03 AM, Peter Cock wrote: > On Fri, Jul 27, 2012 at 3:58 PM, Fields, Christopher J > wrote: >> On Jul 27, 2012, at 9:47 AM, Peter Cock wrote: >> >>> On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: >>>> >>>> That's done now - except for the circular dependencies, and GD, >>>> which might be easy to solve if anyone knows what the error >>>> means - see commit message here: >>>> https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a >>> >>> Re: https://twitter.com/cjfields/status/228861370454638592 >>> Not sure why you got GD to work when something very similar >>> had failed for me. Oh well - job done :) >> >> It was the lack of gdlib-config in the libgd2-xpm package, you need >> libgd2-xpm-dev. One of the fun things about Debian packaging. > > Ah - I should have guessed that. > >>>> Would a single clean commit of the (current) .travis.yml file be >>>> preferable to the current series of commits? 
And you you want >>>> a pull request, or would you just merge/cherry-pick manually? >>> >>> Given all the churn between our revisions, personally I'd opt for >>> a single clean commit to bioperl/master - but your call. >>> >>> Peter >> >> Yep, about to merge it over. It's working now, just need to >> whitelist master instead of travis after the merge. > > I'd removed the whitelist altogether here: > https://github.com/peterjc/bioperl-live/commit/96dc5866f4406179353909c72d812623341c8fbd > > My thinking was BioPerl seems to have multiple feature branches > under the official repo, so they should get tested too. You'd be > in a better position than me to judge what would work best for > BioPerl here. > > Peter We'll keep it to master for now. It's pretty easy to add branches as needed, and I didn't want to expand to all the potentially stale branches unless explicitly set (we need to triage all those at some point). chris From p.j.a.cock at googlemail.com Fri Jul 27 14:47:18 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 27 Jul 2012 15:47:18 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: On Thu, Jul 26, 2012 at 4:22 PM, Peter Cock wrote: > > That's done now - except for the circular dependencies, and GD, > which might be easy to solve if anyone knows what the error > means - see commit message here: > https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a Re: https://twitter.com/cjfields/status/228861370454638592 Not sure why you got GD to work when something very similar had failed for me. Oh well - job done :) > Would a single clean commit of the (current) .travis.yml file be > preferable to the current series of commits? And you you want > a pull request, or would you just merge/cherry-pick manually? 
Given all the churn between our revisions, personally I'd opt for a single clean commit to bioperl/master - but your call. Peter From robfsouza at gmail.com Fri Jul 27 22:29:22 2012 From: robfsouza at gmail.com (Robson de Souza) Date: Fri, 27 Jul 2012 15:29:22 -0700 (PDT) Subject: [Bioperl-l] obf sites offline? Message-ID: <9bef8a3b-08ca-4868-be7a-193e7596290d@googlegroups.com> I can't access any of the OBF sites, either from work (USA) or my phone... is there something going on? Robson From p.j.a.cock at googlemail.com Thu Jul 26 15:22:26 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 26 Jul 2012 16:22:26 +0100 Subject: [Bioperl-l] BioPerl and Travis-CI In-Reply-To: References: <118F034CF4C3EF48A96F86CE585B94BF3140B017@CITESMBX5.ad.uillinois.edu> Message-ID: On Wed, Jul 25, 2012 at 11:31 AM, Peter Cock wrote: > On Wed, Jul 25, 2012 at 4:08 AM, Fields, Christopher J > wrote: >> Peter Cock has graciously helped start up a branch for bioperl-live >> that is using Travis-CI (a nice continuous integration tool). Results >> from Peter's fork are found here: >> >> http://travis-ci.org/#!/peterjc/bioperl-live >> >> As this is now pulled into the main bioperl repo, results will be here: >> >> http://travis-ci.org/#!/bioperl/bioperl-live >> >> I'll be working on this and expect this will be added to master in >> the next few days. >> >> chris > > We've had this running for Biopython for a month now, and it has > been a useful complement to the BuildBot (which covers other > operating systems). This was following BioRuby's lead: > http://biopython.org/pipermail/biopython-dev/2012-June/009742.html > > The current BioPerl Travis configuration is probably usable right > now (after changing the branch whitelist to either master, or simple > all branches). > > Other remaining issues include sorting out which dependencies > should be installed, and streamlining their verbose output (e.g. > using tail). 
That's done now - except for the circular dependencies, and GD, which might be easy to solve if anyone knows what the error means - see commit message here: https://github.com/peterjc/bioperl-live/commit/905441ac09939be3368c14de38d04486c7e9849a Would a single clean commit of the (current) .travis.yml file be preferable to the current series of commits? And do you want a pull request, or would you just merge/cherry-pick manually? > TravisCI can send out emails (e.g. on test failures), and perhaps > bioperl-guts-l might be a sensible place to send these. Initially > we'd disabled the emails for Biopython. I'd like to use an RSS > feed... there is a JSON API which BioRuby are using for > http://www.biogems.info/ which tracks their plugins. I've filed an issue for news feed support in TravisCI, https://github.com/travis-ci/travis-core/issues/82 Regards, Peter From p.j.a.cock at googlemail.com Tue Jul 31 10:37:35 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 31 Jul 2012 11:37:35 +0100 Subject: [Bioperl-l] Travis Continuous Integration testing & pull requests Message-ID: Hi all, I'm cross-posting as this is an announcement. Please keep any follow-up discussion to the relevant project-specific mailing list, or, if general, to open-bio-l please. Those following the OBF blog or the OBF or Bio* Twitter accounts will have already seen this, which I posted yesterday: http://news.open-bio.org/news/2012/07/travis-ci-for-testing/ In summary, since earlier this year BioRuby and then Biopython and BioPerl have been using Travis-CI.org (a hosted continuous integration service for the open source community) to run their unit tests automatically whenever their GitHub repositories are updated. 
In addition we now have TravisCI automatically running our tests on any new GitHub pull requests - supported by an OBF donation to Travis-CI, see: http://about.travis-ci.org/blog/announcing-pull-request-support/ Currently BioJava only uses GitHub as an SVN mirror - but this should still let you start using TravisCI for automated testing: http://about.travis-ci.org/docs/user/languages/java/ For EMBOSS, this is another incentive to convert from CVS to github - TravisCI recently announced support for C/C++ projects: http://about.travis-ci.org/blog/support_for_go_c_and_cpp/ http://about.travis-ci.org/docs/user/languages/c/ Potentially there are other OBF projects where this would be useful too. Regards, Peter