From manju.rawat2 at gmail.com Thu Sep 1 02:53:53 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Thu, 1 Sep 2011 02:53:53 -0400
Subject: [Bioperl-l] Bioperl query....
In-Reply-To:
References: <4E5CC8AC.8050800@gmail.com>
Message-ID:

Thanks for the reply.
I have already seen this link, but I am confused.
I used the following code and ran it:

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => 'seqs.blast');
while( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      if( $hsp->length('total') > 50 ) {
        if ( $hsp->percent_identity >= 75 ) {
          print "Query=", $result->query_name,
                " Hit=", $hit->name,
                " Length=", $hsp->length('total'),
                " gaps=", $hsp->gaps,
                " Percent_id=", $hsp->percent_identity, "\n";
        }
      }
    }
  }
}

It shows me the following output, together with this error:

Argument "" isn't numeric in numeric lt (<) at /usr/local/share/perl/5.10.1/Bio/SearchIO/SearchResultEventBuilder.pm line 279, line 4113.
Query=NM_181451 Hit=ref|NM_181451.1| Length=1349 gaps=1 Percent_id=100
Query=NM_181451 Hit=ref|XM_002706247.1| Length=1345 gaps=13 Percent_id=93.8289962825279
Query=NM_181451 Hit=ref|NM_001098089.1| Length=1323 gaps=7 Percent_id=91.9123204837491
Query=NM_181451 Hit=ref|NM_001008415.1| Length=1211 gaps=5 Percent_id=94.9628406275805
Query=NM_181451 Hit=ref|XM_001251693.3| Length=1320 gaps=5 Percent_id=91.969696969697
Query=NM_181451 Hit=ref|NM_001097567.1| Length=1338 gaps=4 Percent_id=91.5545590433483
Query=NM_181451 Hit=gb|AY075103.1| Length=1334 gaps=1 Percent_id=91.304347826087
................
..........

Please help me find what the error in this code is.

Thanks
Manju Rawat.

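A note on the message above: `Argument "" isn't numeric in numeric lt (<)` is a Perl warning and not a fatal error, which is consistent with the hits still being printed. The sketch below is not BioPerl code. `$threshold` and `$score` are made-up names standing in for whatever empty field the parser met in the report; the sketch only shows that a numeric comparison on an empty string warns and then carries on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a field that was empty in the parsed report.
my $threshold = "";
my $score     = 50;

my @warnings;
my $below;
{
    # Collect warnings instead of printing them, so they can be inspected.
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    # An empty string used as a number is treated as 0; Perl warns and continues.
    $below = ( $threshold < $score ) ? 1 : 0;
}
print "comparison still ran, result: $below\n";
print "warning seen: $warnings[0]" if @warnings;

# Guarding the value before comparing avoids the warning altogether.
my $safe = ( defined $threshold && length $threshold ) ? $threshold : 0;
print "guarded value: $safe\n";
```

If the same thing is happening inside the parser, the results can be used as they are; whether the empty field matters depends on which value was missing from the report.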
From locarpau at upvnet.upv.es Thu Sep 1 10:49:06 2011
From: locarpau at upvnet.upv.es (Lorenzo Carretero Paulet)
Date: Thu, 1 Sep 2011 16:49:06 +0200
Subject: [Bioperl-l] Parsing PAML mlc files
In-Reply-To:
References: <9645AF32-5EC3-41FA-9A32-45B6B92E31FD@illinois.edu><4DF56976.8080704@upvnet.upv.es><9866C4A4-AC36-4A25-B38F-3006A7BB0F11@illinois.edu> <1314636433.4e5bc291a40c6@webmail.upv.es> <1A4207F8295607498283FE9E93B775B407CCB2D3@EX02.asurite.ad.asu.edu> <1314659986.4e5c1e9268078@webmail.upv.es>
Message-ID: <1314888546.4e5f9b628ecea@webmail.upv.es>

Hi all,
I'm trying to parse mlc output files from PAML using Bio::Tools::Phylo::PAML as:

my $mlcfile = "/Applications/Bioinformatics/paml44/bin/mlc";
my $parserF = Bio::Tools::Phylo::PAML->new (-file =>$mclfile);
if ( my $paml_result = $parserF->next_result )
{
    say Dumper $paml_result; #Prints Ok
    for ( my $model_result= $paml_result->get_NSSite_results )
    {
        #say Dumper $model_result; #Prints nothing
        $ns_string = "model ".$model_result->model_num."\n ".$model_result->model_description()."\n ".$model_result->time_used."\n";
        $dnds_site_classes = $model_result->dnds_site_classes; #a hashref
        #say Dumper $dnds_site_classes;
        for my $sites ( $model_result->get_BEB_pos_selected_sites )
        ...
        ...
        ...

The Bio::Tools::Phylo::PAML::Result object is OK, as I can print it using Dumper. In contrast, it seems that the Bio::Tools::Phylo::PAML::ModelResult object is not being properly instantiated, as I get the error message:

"Can't call method "model_num" without a package or object reference at ..."

What am I missing?
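One thing that may be worth ruling out, offered as a guess and not as a diagnosis of the parser: in Perl, `for ( my $x = f() )` assigns the return value in scalar context, so if f() returns a list, $x receives the number of elements and not an object. The sketch below uses a hypothetical get_results() in place of the real BioPerl method, whose return value is assumed here to be a plain list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns a list of result objects.
sub get_results {
    my @results = ( { model_num => 0 }, { model_num => 1 } );
    return @results;
}

# Scalar assignment: the returned array is evaluated in scalar context,
# so $count holds the number of elements, not one of the objects.
my $count = get_results();
print "scalar assignment gives: $count\n";

# List iteration: each pass through the loop gets one element of the list.
my @seen;
for my $result ( get_results() ) {
    push @seen, $result->{model_num};
    print "model $result->{model_num}\n";
}
```

Under that assumption, writing the loop as `for my $model_result ( ... )` would hand each object to the loop body in turn.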
Best,
Lorenzo

From jason.stajich at gmail.com Thu Sep 1 16:23:47 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 1 Sep 2011 13:23:47 -0700
Subject: [Bioperl-l] Parsing PAML mlc files
In-Reply-To: <1314888546.4e5f9b628ecea@webmail.upv.es>
References: <9645AF32-5EC3-41FA-9A32-45B6B92E31FD@illinois.edu><4DF56976.8080704@upvnet.upv.es><9866C4A4-AC36-4A25-B38F-3006A7BB0F11@illinois.edu> <1314636433.4e5bc291a40c6@webmail.upv.es> <1A4207F8295607498283FE9E93B775B407CCB2D3@EX02.asurite.ad.asu.edu> <1314659986.4e5c1e9268078@webmail.upv.es> <1314888546.4e5f9b628ecea@webmail.upv.es>
Message-ID:

Lorenzo -

I am sure this is a problem with changes in the output from PAML; this is a classic problem with this suite. It requires some debugging of the parser, and I am not sure there is anyone out there with time to do the debugging. I can say all this worked before on an earlier version of PAML, but I don't know specifically what is going on with the latest paml4.4 version.

Jason

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Scott.Markel at accelrys.com Thu Sep 1 17:22:21 2011
From: Scott.Markel at accelrys.com (Scott Markel)
Date: Thu, 1 Sep 2011 14:22:21 -0700
Subject: [Bioperl-l] file format for alignment plus features for aligned sequences
Message-ID: <5ACBA19439E77B43A06F4CAB897EC97702F8302A05@EXCH1-COLO.accelrys.net>

A question on behalf of the Discovery Studio group at Accelrys -

They have alignment data with annotations, e.g., visualization settings or alignment properties. The aligned sequences also have features, e.g., domain boundaries or secondary structure motifs.

They currently use BSML to save sequences and features. Is there an extension of BSML that can also save the alignment information?

Are there any good file formats that can be used to store an alignment plus features associated with the aligned sequences?

Are there other mailing lists that might be more appropriate for these questions?

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect    email:  smarkel at accelrys.com
Accelrys (Pipeline Pilot R&D)         mobile: +1 858 205 3653
10188 Telesis Court, Suite 100        voice:  +1 858 799 5603
San Diego, CA 92121                   fax:    +1 858 799 5222
USA                                   web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Secretary, Board of Directors: International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics

From ihok at hotmail.com Thu Sep 1 23:49:50 2011
From: ihok at hotmail.com (Jack Tanner)
Date: Thu, 1 Sep 2011 23:49:50 -0400
Subject: [Bioperl-l] Bio::Ext::Align?
Message-ID:

I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty quiet codebase these days... Is it dead?

Does anyone have a spec file for building an SRPM for it for RHEL 6?

From cjfields at illinois.edu Fri Sep 2 00:31:17 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 2 Sep 2011 04:31:17 +0000
Subject: [Bioperl-l] Bio::Ext::Align?
In-Reply-To:
References:
Message-ID: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu>

Yes, it's essentially deprecated (unmaintained). I don't know of anyone who has packaged that up in a while, if ever.

chris

From David.Messina at sbc.su.se Fri Sep 2 04:44:07 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 2 Sep 2011 10:44:07 +0200
Subject: [Bioperl-l] Bio::Ext::Align?
In-Reply-To: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu>
References: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu>
Message-ID:

As it happens, a colleague of mine needed Bio::Ext::Align for hhpred:
http://toolkit.tuebingen.mpg.de/hhpred

He got it working thus:

> Hi Dave,
> thanks a lot. i made it work. The error i got later on was:
> relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object;
> recompile with -fPIC
>
> the solution is:
> perl Makefile.PL PREFIX= /fftwsingle --enable-shared --with-pic --enable-single
>
> make
> make install
> http://forums.fedoraforum.org/showthread.php?t=232607

Dave

From David.Messina at sbc.su.se Fri Sep 2 05:30:33 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 2 Sep 2011 11:30:33 +0200
Subject: [Bioperl-l] Parsing PAML mlc files
In-Reply-To:
References: <9645AF32-5EC3-41FA-9A32-45B6B92E31FD@illinois.edu> <4DF56976.8080704@upvnet.upv.es> <9866C4A4-AC36-4A25-B38F-3006A7BB0F11@illinois.edu> <1314636433.4e5bc291a40c6@webmail.upv.es> <1A4207F8295607498283FE9E93B775B407CCB2D3@EX02.asurite.ad.asu.edu> <1314659986.4e5c1e9268078@webmail.upv.es> <1314888546.4e5f9b628ecea@webmail.upv.es>
Message-ID:

Looking back at the commit history, back in April and May 2010, I made some updates for the January 2010 edition of PAML 4.4. All tests passed at that time, but:

- the tests may be incomplete
- PAML has undoubtedly changed since then, even if it's still called version 4.4

I can't look at this right now myself, but please file a bug report on this, and hopefully someone else can.

Dave

From cjfields at illinois.edu Fri Sep 2 09:00:27 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 2 Sep 2011 13:00:27 +0000
Subject: [Bioperl-l] Bio::Ext::Align?
In-Reply-To:
References: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu>
Message-ID: <81EEC2D4-D039-41D5-A10F-91DD26B4D21E@illinois.edu>

I think, if this is actively being used, we should split it away from bioperl-ext and release it on its own. Otherwise I worry about the long-term support for it.

chris

From ihok at hotmail.com Fri Sep 2 11:20:44 2011
From: ihok at hotmail.com (Jack Tanner)
Date: Fri, 2 Sep 2011 11:20:44 -0400
Subject: [Bioperl-l] Bio::Ext::Align?
In-Reply-To: <81EEC2D4-D039-41D5-A10F-91DD26B4D21E@illinois.edu>
References: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> <81EEC2D4-D039-41D5-A10F-91DD26B4D21E@illinois.edu>
Message-ID:

I also see that someone's forked it on Github and made some packaging fixes. It'd be nice to see it revived.

From manju.rawat2 at gmail.com Sat Sep 3 01:29:56 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Sat, 3 Sep 2011 01:29:56 -0400
Subject: [Bioperl-l] hsps_successfully_gapped: 47
Message-ID:

Hello,

Is there any method in BioPerl through which we can extract number_of_hsps_successfully_gapped: from a BLAST file?
If anyone knows about it, please help me.

Thanks
Manju Rawat

From manju.rawat2 at gmail.com Sat Sep 3 06:00:22 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Sat, 3 Sep 2011 06:00:22 -0400
Subject: [Bioperl-l] blast result not matching.
Message-ID:

Hi,
I am doing BLAST using BioPerl, but it is not showing me the complete result.

My program is the following:

#!usr/bin/perl -w
use Bio::Perl;
# this script will only work with an internet connection
# on the computer it is run on
$seq = new_sequence("ATTGGTTTGGGGACCCAATTTGTGTGTTATATGTA");
$blast_result=blast_sequence($seq);
write_blast(">xyz.blast",$blast_result);

and the output:

BLASTN 2.2.25+
Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= blast-sequence-temp-id
         (30 letters)

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST,
STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           14,527,398 sequences; 37,346,598,701 total letters

                                                   Score    E
Sequences producing significant alignments:        (bits) value

  Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST,
  STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
    Posted date: Sep 2, 2011 4:14 PM
  Number of letters in database: 37,346,598,701
  Number of sequences in database: 14,527,398

Matrix: blastn matrix:2 -3
Gap Penalties Existence: 5, Extension: 2
expect: 1e-10
allowgaps: yes

Search Statistics
A: 0
Hits_to_DB: 737,387
S1: 23
S1_bits: 22.0
S2: 77
S2_bits: 70.7
X1: 22
X1_bits: 20.1
X2: 33
X2_bits: 29.8
X3: 110
X3_bits: 99.2
dbentries: 14,527,398
dbletters: -1308106959
effectivedblength: 36,954,358,955
effectivespace: 110,863,076,865
effectivespaceused: 110,863,076,865
entropy: 0.912
entropy_gapped: 0.780
kappa: 0.408
kappa_gapped: 0.410
lambda: 0.634
lambda_gapped: 0.625
length_adjustment: 27
num_extensions: 8,057
num_successful_extensions: 8,057
number_of_hsps_better_than_expect_value_cutoff_without_gapping: 0
number_of_hsps_gapped: 8,057
number_of_hsps_successfully_gapped: 0
querylength: 3
seqs_better_than_1e-10: 0

This result does not match the NCBI result.
Is there anything wrong?

Thanks
Manju Rawat

From florent.angly at gmail.com Sun Sep 4 22:14:37 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 05 Sep 2011 12:14:37 +1000
Subject: [Bioperl-l] Bio::SeqIO can't guess the format of data from a pipe
In-Reply-To:
References: <2E070303-ADDF-472B-835B-6D3959B83E9A@illinois.edu> <4E58D105.7050805@gmail.com> <4E5A0590.2010805@gmail.com> <8D639B95-0666-4F09-8E9E-88C8CDF76ABC@illinois.edu> <4E5AC2B8.9060808@gmail.com>
Message-ID: <4E64308D.5060304@gmail.com>

Thanks for your advice Chris. I put a format() and variant() method in Bio::Root::IO. All the Bio::*IO modules inherit these methods.
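The idea of a single format() living in a shared base class can be sketched as follows. This is an illustration only, not the code committed to Bio::Root::IO: the My::* package names are hypothetical, and the sketch assumes the format name is simply the last component of the class name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base class standing in for a common IO root class.
package My::Root::IO;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Derive the format name from the last component of the object's class.
sub format {
    my ($self) = @_;
    my ($name) = ref($self) =~ /::(\w+)$/;
    return lc $name;
}

# Two hypothetical format-specific subclasses; neither defines format().
package My::SeqIO::fasta;
our @ISA = ('My::Root::IO');

package My::TreeIO::newick;
our @ISA = ('My::Root::IO');

package main;

# Both subclasses answer format() through inheritance alone.
my @formats = map { $_->new->format } qw(My::SeqIO::fasta My::TreeIO::newick);
print "$_\n" for @formats;
```

The point of the design is that the getter is written once and every format-specific class picks it up without copy-pasting.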
Regarding the module naming, 8 follow the convention Bio::*IO and 8 follow the Bio::*::IO convention. If we decide to rename some IO modules for consistency, I would prefer the Bio::*::IO convention.

Regards,

Florent

On 29/08/11 11:10, Chris Fields wrote:
> On Aug 28, 2011, at 5:35 PM, Florent Angly wrote:
>
>> Hi,
>>
>> I implemented the format() getter method in Bio::SeqIO as discussed, essentially following the way proposed by Hilmar. The variant() method is not needed since Bio::SeqIO::fastq already has a get/set method for that.
> Right, but the method could be used by other modules if it were moved to Bio::SeqIO, for instance.
>
>> I noticed that there are plenty more Bio*IO modules that could benefit from having a format() method, e.g.:
>> Bio::AlignIO
>> Bio::ClusterIO
>> Bio::FeatureIO
>> Bio::MapIO
>> Bio::OntologyIO
>> Bio::SearchIO
>> Bio::TreeIO
>> Bio::Assembly::IO *
>> The code could be copy-pasted for each of them but it is not very graceful. Is there a way we could have all these IO modules share the same format() method?
> Move the method to Bio::Root::IO, the common base class for all of the above.
>
>> * Note how the IO class for Bio::Assembly is called Bio::Assembly::IO, and not Bio::AssemblyIO like for other classes. This may be something to change in the future for consistency.
>>
>> Florent
> That's possible; one could take advantage of that for redesign/API issues if it were needed.
>
> chris

From p.j.a.cock at googlemail.com Mon Sep 5 05:44:06 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 5 Sep 2011 10:44:06 +0100
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References:
Message-ID:

On Mon, Sep 5, 2011 at 8:53 AM, Manju Rawat wrote:
> Hi,
> I doing blast using bioperl...but it not showing me complete result..
>
>
> my program is following...
> ...
>
> this result is not matching with with NCBI result...
> Is there anything wrong..

The NCBI website for BLAST uses different default values to the BLAST command line tools. Check things like the gap parameters if you want to use the same settings.

Peter

From p.j.a.cock at googlemail.com Mon Sep 5 06:25:15 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 5 Sep 2011 11:25:15 +0100
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References:
Message-ID:

Please CC the mailing list.

On Mon, Sep 5, 2011 at 11:19 AM, Manju Rawat wrote:
> Hi,
>
> Thanks for the reply...
> but when i am blasting after getting sequence of any gene (from NCBI using
> bioperl see below)..it showing me same result as shown in NCBI..
>
> #!usr/bin/perl -w
> use Bio::Perl;
> $seq_object = get_sequence('NCBI',"NM_181451");
> $blast_result = blast_sequence($seq);
> write_blast(">roa1.blast",$blast_report);
>
>
> I dnt know why its not working when i am blasting my own sequence..
>

Maybe you need to give the sequence as a FASTA entry rather than a plain string?

Peter

From manju.rawat2 at gmail.com Mon Sep 5 06:40:57 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Mon, 5 Sep 2011 06:40:57 -0400
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References:
Message-ID:

No, I also tried this; it doesn't work either.
Please help me.

From cjfields at illinois.edu Mon Sep 5 15:42:49 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 5 Sep 2011 19:42:49 +0000
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References:
Message-ID: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu>

Are you using the latest BioPerl? I believe there had been some fixes addressing remote blast.

chris

From manju.rawat2 at gmail.com Tue Sep 6 06:59:50 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Tue, 6 Sep 2011 06:59:50 -0400
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu>
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu>
Message-ID:

BioPerl version 1.6.9 is installed on my system.
That is not the reason, because BLAST works fine when I blast with the following code:

#!usr/bin/perl -w
use Bio::Perl;
$seq = get_sequence('NCBI',"NM_181451");
$blast_result=blast_sequence($seq);
write_blast(">xyz.blast",$blast_result);

Manju

From sidd.basu at gmail.com Tue Sep 6 11:51:09 2011
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Tue, 6 Sep 2011 10:51:09 -0500
Subject: [Bioperl-l] Bioinformatics Job Opening at dictyBase in Chicago
Message-ID: <20110906155106.GB1841@Macintosh-388.local>

Hi All,
We have an open position for a Bioinformatics Software Engineer at dictyBase (Northwestern University in Chicago). The job involves developing web applications and middleware for a genome database using modern Perl (DBIx-Class/Moose/MVC web frameworks etc.) as well as integration of various genomic tools (gbrowse, intermine, apollo, biomart, pathway tools etc.).
For full details please see: http://www.dictybase.org/dictybase_jobs.html.

thanks,
-siddhartha

Siddhartha Basu
Software developer, dictybase
http://www.dictybase.org

From slucky at ibab.ac.in Wed Sep 7 09:39:03 2011
From: slucky at ibab.ac.in (Lucky Singh)
Date: Wed, 07 Sep 2011 19:09:03 +0530
Subject: [Bioperl-l] Fwd: Re: Problem using Bio::Tools::Run::RemoteBlast
Message-ID: <4E6773F7.7000703@ibab.ac.in>

-------- Original Message --------
Subject: Re: [Bioperl-l] Problem using Bio::Tools::Run::RemoteBlast
Date: Sat, 27 Aug 2011 20:36:58 +0530
From: Lucky Singh
To: Carnë Draug

On Friday 26 August 2011 07:50 PM, Carnë Draug wrote:
> On 22 August 2011 07:01, Lucky Singh wrote:
>> Now I
>> wanted to host it from web server, but This program is not working from it
>> may be it is not able to create or write on file from web server but in
>> command line it is working fine. I don't know the possible reason, please
>> help me to figure it out.
> Have you looked in the apache logs (look in
> /var/log/apache2/error.log) ? Can you pastebin your whole code and the
> content of the error log after trying to run the script?

Dear Carnë Draug,

As per your suggestion, I am attaching the blast code file. Currently it is not showing any error in error.log.
Thanks a lot for your valuable reply; I will be highly grateful if you can get me out of this problem :)

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: blastn
URL:

From jason.stajich at gmail.com Wed Sep 7 12:13:46 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Wed, 7 Sep 2011 09:13:46 -0700
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu>
Message-ID: <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>

I don't think it works. I am not sure why - probably a bug - but can you go back to what it is you are trying to do? The Bio::Perl functions in that module are intended to be shortcuts, but the original modules should work.
Can you recap what it is you want to accomplish? It may be better not to do this with the Bio::Perl module but instead to make more direct use of the underlying modules.

From cjfields at illinois.edu Wed Sep 7 12:33:52 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 7 Sep 2011 16:33:52 +0000
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To: <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
Message-ID:

I think there was an issue with Bio::Perl BLAST submissions fixed in the 1.6.901 release (1.6.9 != 1.6.901, the latter is newer). From CPAN:

1.6.901 May 18, 2011
...
[Bug fixes]
* [3205] - small fix to Bio::Perl blast_sequence() to make compliant with docs [genehack, cjfields]

chris

From carandraug+dev at gmail.com Wed Sep 7 12:47:16 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 7 Sep 2011 17:47:16 +0100
Subject: [Bioperl-l] Problem using Bio::Tools::Run::RemoteBlast
In-Reply-To: <4E590812.9030006@ibab.ac.in>
References: <37711.192.168.1.254.1313992876.squirrel@webmail.ibab.ac.in> <4E590812.9030006@ibab.ac.in>
Message-ID:

Hi

sorry for the late reply. Please try to always reply to the mailing list, maybe someone else can help you too.

I don't know about the script as I never used RemoteBlast from bioperl. But given a quick look at it, you're not loading the CGI module on the script ( http://perldoc.perl.org/CGI.html ).
Here's a simple example using the CGI module ( http://pastebin.com/miMd70wn ) and a HTML page that will use it ( http://pastebin.com/kWwwMijd ).

If nothing shows up on error.log, take a look in access.log. Try some simple CGI script first, such as "hello world!" to see if the problem lies on your bioperl part of the script, or in the web server, or some other part.

Carnë

From scott at scottcain.net Wed Sep 7 13:57:31 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 7 Sep 2011 13:57:31 -0400
Subject: [Bioperl-l] October GMOD Meeting in Toronto
Message-ID:

Hello,

The early registration deadline for the October GMOD meeting in Toronto, Canada is approaching. Please register by September 13th to avoid the late registration fee. You can register here:

http://gmod.eventbrite.com/

For information about the GMOD meeting please see the page at:

http://gmod.org/wiki/October_2011_GMOD_Meeting

In addition to the main meeting, there will be a free BioMart workshop on the following Friday, which you can also register for at the main meeting registration page.

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From longbow0 at gmail.com Wed Sep 7 16:19:37 2011
From: longbow0 at gmail.com (longbow leo)
Date: Wed, 7 Sep 2011 15:19:37 -0500
Subject: [Bioperl-l] How to determine strains were evolved independently in a phylogenetic tree?
Message-ID:

Hi,

I have created a phylogenetic for a virus protein which contained about 200 strains. Next I need to do an analysis to check whether several strains in this tree were evolved independently. Although it is not too difficult to do manually, I still have litter idea how to do this in a Perl script since there are some datasets need to do.

At first I tried to use the method "is_monophyletic" in the module "Bio::Tree::TreeFunctionsI" to do this analysis, but it seems it doesn't work as I have thought. According to the description of "is_monophyletic" method, it "Will do a test of monophyly for the nodes specified in comparison to a chosen outgroup". Does here test whether the outgroup strain is monophyletic to the nodes, or test the nodes only?
The description sounds like the latter but the what the script did seemed to be the first. Are there any suggestions? Thank you very much! Haizhou Liu From greg at ebi.ac.uk Thu Sep 8 06:40:30 2011 From: greg at ebi.ac.uk (Gregory Jordan) Date: Thu, 8 Sep 2011 11:40:30 +0100 Subject: [Bioperl-l] How to determine strains were evolved independently in a phylogenetic tree? In-Reply-To: References: Message-ID: Hi Haizhou, I'm not sure I understand exactly what you're trying to do. But to clarify the BioPerl code: the is_monophyletic method (for the actual code, see here https://github.com/bioperl/bioperl-live/blob/master/Bio/Tree/TreeFunctionsI.pm#L832) tests whether the single outgroup node falls *within* or *outside* the last common ancestor of the group of nodes given. If the outgroup node falls *outside* the subtree defined by this LCA node, then the group of nodes can be called monophyletic with respect to that outgroup (at least as far as my understanding of the word 'monophyletic' goes). If the outgroup node falls *within* the subtree defined by this LCA node, then the group of nodes is not monophyletic with respect to that outgroup node. The term "evolved independently" sounds slightly vague to me -- what is it exactly about the shape of your tree that allows you to call a strain independent or not? If you gave an example or two of trees where you consider the evolution to be independent and non-independent, I (or someone else on the list) may be able to help you find the right method to do this automatically. Cheers, Greg On Wed, Sep 7, 2011 at 9:19 PM, longbow leo wrote: > Hi, > > I have created a phylogenetic for a virus protein which contained about 200 > strains. Next I need to do an analysis to check whether several strains in > this tree were evolved independently. Although it is not too difficult to > do > manually, I still have litter idea how to do this in a Perl script since > there are some datasets need to do. 
> > At first I tried to use the method "is_monophyletic" in the module > "Bio::Tree::TreeFunctionsI" to do this analysis, but it seems it doesn't > work as I have thought. According to the description of "is_monophyletic" > method, it "Will do a test of monophyly for the nodes specified in > comparison to a chosen outgroup". Does here test whether the outgroup > strain > is monophyletic to the nodes, or test the nodes only? The description > sounds > like the latter but the what the script did seemed to be the first. > > Are there any suggestions? > > Thank you very much! > > > Haizhou Liu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From manju.rawat2 at gmail.com Thu Sep 8 02:11:12 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Thu, 8 Sep 2011 02:11:12 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: Toady i installed the latest version of bioperl in my system via CPAN.. But this still not sowing the complete result.. I just want to do nucleotide blast using bioperl..but while i am doing blast with my sequence it shwowing very samll result.. I dnt know whether it is wrong or right...but while i am blasting the same sequence in NCBI it showing a diffrent result.. and i have also tried to use the orignl module..but it also dnt work.. Pl see reult of the balst in attached file of this mail.. #!usr/bin/perl -w use Bio::Perl; use Bio::SearchIO; $blast_report =blast_sequence('acggctgctgtagatctgatgct'); write_blast(">resl.blast",$blast_report); Thanks. Manju Rawat -------------- next part -------------- A non-text attachment was scrubbed... 
Name: resl.blast Type: application/octet-stream Size: 1680 bytes Desc: not available URL: From cjfields at illinois.edu Thu Sep 8 09:05:10 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 8 Sep 2011 13:05:10 +0000 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: <6D4A142B-9455-4CC3-AFDB-F9B3B991B57F@illinois.edu> Submissions to NCBI BLAST via their web interface have different parameters than those submitted via their QBLAST interface (what is used in BioPerl). So the fact there are differing results isn't too surprising, particularly if the results fall close to the e-value cutoff for one or the other. You will need to set the proper parameters, which I don't believe is possible via the (very simple) Bio::Perl interface, but is possible via Bio::Tools::Run::RemoteBlast. chris On Sep 8, 2011, at 1:11 AM, Manju Rawat wrote: > Toady i installed the latest version of bioperl in my system via CPAN.. > But this still not sowing the complete result.. > > > I just want to do nucleotide blast using bioperl..but while i am doing blast with my sequence it shwowing very samll result.. > I dnt know whether it is wrong or right...but while i am blasting the same sequence in NCBI it showing a diffrent result.. > > and i have also tried to use the orignl module..but it also dnt work.. > > Pl see reult of the balst in attached file of this mail.. > #!usr/bin/perl -w > use Bio::Perl; > use Bio::SearchIO; > $blast_report =blast_sequence('acggctgctgtagatctgatgct'); > write_blast(">resl.blast",$blast_report); > > Thanks. > Manju Rawat > > > From David.Messina at sbc.su.se Thu Sep 8 09:33:19 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 8 Sep 2011 15:33:19 +0200 Subject: [Bioperl-l] Fwd: blast result not matching. 
In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: As I think has been said earlier in this thread, it's almost certainly a discrepancy in the BLAST parameters between what the blast_sequence function in the Bio::Perl module is sending, and what the BLAST website is doing. In this case, you have a very short sequence. If you look in the "Algorithm parameters" section of the BLAST web form, you'll see that there is an option that is checked by default, "Automatically adjust parameters for short input sequences". If I uncheck that option, I get the same results as you did when you submitted your BLAST through BioPerl (see http://cl.ly/9ynq). So to get the same results from a BioPerl-submitted BLAST and a BLAST on NCBI's website, you need to have the same parameters. You can set the parameters from BioPerl as described in the documentation: http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Tools/Run/RemoteBlast.pm As Jason said earlier, the blast_sequence function in Bio::Perl is intended as a simple demonstration and uses the default BLAST parameters. That function is a wrapper around the RemoteBlast module. Since you want to do something a little different, I believe you'll need to use the RemoteBlast module directly. Dave On Thu, Sep 8, 2011 at 08:11, Manju Rawat wrote: > Toady i installed the latest version of bioperl in my system via CPAN.. > But this still not sowing the complete result.. > > > I just want to do nucleotide blast using bioperl..but while i am doing > blast > with my sequence it shwowing very samll result.. > I dnt know whether it is wrong or right...but while i am blasting the same > sequence in NCBI it showing a diffrent result.. > > and i have also tried to use the orignl module..but it also dnt work.. > > Pl see reult of the balst in attached file of this mail.. 
> #!usr/bin/perl -w
> use Bio::Perl;
> use Bio::SearchIO;
> $blast_report =blast_sequence('acggctgctgtagatctgatgct');
> write_blast(">resl.blast",$blast_report);
>
> Thanks.
> Manju Rawat
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From abualiga2 at gmail.com Thu Sep 8 10:44:39 2011
From: abualiga2 at gmail.com (galeb abu-ali)
Date: Thu, 8 Sep 2011 10:44:39 -0400
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
Message-ID:

Hi,

I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of multiple tags within a primary tag. E.g., when there are several 'function' tag-values within a 'CDS' primary tag, I don't know how to link those 'function' tag-values to a particular 'locus_tag'. As parsed values are returned as a list, I tried creating an array of hashes, where the hash-key is 'locus_tag' and hash-values are multiple 'function' tags, but am failing miserably. Pasted below is what I managed so far. At your convenience, please advise.

thanks!

galeb

#!/usr/local/bin/perl
# parse_gbk.pl
# gsa 09042011
# script to parse out features from gbk
# http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Customizing_Sequence_Object_Construction

use strict; use warnings;
use Bio::SeqIO;

my @loci;
my @seqs;
my @directions;
my @start_coords;
my @end_coords;
my @genes;
my @products;
my @notes;
my @functions;
my %functions;

my $gb_file = shift;
my $seqio_obj = Bio::SeqIO->new(-file => $gb_file );
my $seq_obj = $seqio_obj->next_seq;

for my $feat_obj ( $seq_obj->get_SeqFeatures ) {
    if ( $feat_obj->primary_tag eq ( 'gene' ) ) {
        if ($feat_obj->has_tag( 'locus_tag' ) ) {
            push ( @seqs, $feat_obj->seq->seq ); #collect sequences
            for my $val ( $feat_obj->get_tag_values( 'locus_tag' ) ) {
                push ( @loci, $val ); # locus_tags
            }
        }
        if ( $feat_obj->has_tag( 'gene' ) ) {
            for my $val ( $feat_obj->get_tag_values( 'gene' ) ) {
                push ( @genes, $val ); # gene names
            }
        }
        else {
            push ( @genes, "" ); # if gene names are absent, leave empty
        }
        if ( $feat_obj->location->isa( 'Bio::Location::Simple' ) ) { # gene coordinates
            for my $location ( $feat_obj->location ) {
                push ( @start_coords, $location->start );
                push ( @end_coords, $location->end );
                if ( $location->strand == -1 ) {
                    push ( @directions, "reverse" );
                }
                else {
                    push ( @directions, "forward" );
                }
            }
        }
    }
    # gene products, notes, functions
    if ( $feat_obj->primary_tag eq ( 'CDS' )
        || $feat_obj->primary_tag eq ( 'misc_feature' )
        || $feat_obj->primary_tag eq ( 'ncRNA' )
        || $feat_obj->primary_tag eq ( 'rRNA' )
        || $feat_obj->primary_tag eq ( 'tRNA' )
        || $feat_obj->primary_tag eq ( 'misc_RNA' ) ) {
        if ( $feat_obj->has_tag( 'product' ) ) {
            for my $product ( $feat_obj->get_tag_values( 'product' ) ) {
                push ( @products, $product );
            }
        }
        else {
            push ( @products, "" );
        }
        if ( $feat_obj->has_tag( 'note' ) ) {
            for my $note ( $feat_obj->get_tag_values( 'note' ) ) {
                push ( @notes, $note );
            }
        }
        else {
            push ( @notes, "" );
        }
        if ( $feat_obj->has_tag( 'function' ) ) {
            for my $function ( $feat_obj->get_tag_values( 'function' ) ) {
                push ( @functions, $function );
            }
        }
        else {
            push ( @functions, "" );
        }
    }
}

print "locus\tgene_name\tstart_nt\tend_nt\tlength_nt\tdirection\tproduct\tnote\tfunction\tsequence_nt\n"; # header

for ( my $elem = 0; $elem < scalar @loci; ++$elem ) {
    print $loci[$elem], "\t", $genes[$elem], "\t", $start_coords[$elem], "\t",
        $end_coords[$elem], "\t", length( $seqs[$elem] ), "\t",
        $directions[$elem], "\t", $products[$elem], "\t", $notes[$elem], "\t",
        $functions[$elem], "\t", $seqs[$elem], "\n";
}

From p.j.a.cock at googlemail.com Thu Sep 8 11:27:56 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 8 Sep 2011 16:27:56 +0100
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References:
Message-ID:

On Thu, Sep 8, 2011 at 3:44 PM, galeb abu-ali wrote:
> Hi,
>
> I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of
> multiple tags within a primary tag. E.g., when there are several 'function'
> tag-values within a 'CDS' primary tag, I don't know how to link those
> 'function' tag-values to a particular 'locus_tag'.

Do you have GenBank features with multiple locus_tag qualifiers?
That would be very unusual...

Peter

From cjfields at illinois.edu Thu Sep 8 11:32:21 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 15:32:21 +0000
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References:
Message-ID:

On Sep 8, 2011, at 10:27 AM, Peter Cock wrote:

> On Thu, Sep 8, 2011 at 3:44 PM, galeb abu-ali wrote:
>> Hi,
>>
>> I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of
>> multiple tags within a primary tag. E.g., when there are several 'function'
>> tag-values within a 'CDS' primary tag, I don't know how to link those
>> 'function' tag-values to a particular 'locus_tag'.
>
> Do you have GenBank features with multiple locus_tag qualifiers?
> That would be very unusual...
>
> Peter

Agreed; in order to clarify what you mean, I think we would need to see the record in question to get a better idea of the problem.

chris

From abualiga2 at gmail.com Thu Sep 8 11:39:20 2011
From: abualiga2 at gmail.com (galeb abu-ali)
Date: Thu, 8 Sep 2011 11:39:20 -0400
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References:
Message-ID:

I guess I was not clear. 'locus_tag' qualifiers are single, but there are multiple 'function' qualifiers within a primary feature (e.g. 'CDS').

# gbk file
LOCUS       NC_011748            5154862 bp    DNA     circular BCT 15-MAY-2010

# example feature
     gene            complement(1336169..1337905)
                     /gene="cvrA"
                     /locus_tag="EC55989_1287"
                     /db_xref="GeneID:7145846"
     CDS             complement(1336169..1337905)
                     /gene="cvrA"
                     /locus_tag="EC55989_1287"
                     /function="7 : Transport and binding proteins"
                     /function="15.10 : Adaptations to atypical conditions"
                     /function="16.1 : Circulate"
                     /inference="ab initio prediction:AMIGene:2.0"
                     /note="the Vibrio parahaemolyticus gene VP2867 was found
                     to be a potassium/proton antiporter; can rapidly extrude
                     potassium against a potassium gradient at alkaline pH
                     when cloned and expressed in Escherichia coli"
                     /codon_start=1
                     /transl_table=11
                     /product="potassium/proton antiporter"
                     /protein_id="YP_002402372.1"
                     /db_xref="GI:218694705"
                     /db_xref="GeneID:7145846"
                     /translation="MDATTIISLFILGSILVTSSILLSSFSSRLGIPILVIFLAIGML
                     AGVDGVGGIPFDNYPFAYMVSNLALAIILLDGGMRTQASSFRVALGPALSLATLGVLI
                     TSGLTGMMAAWLFNLDLIEGLLIGAIVGSTDAAAVFSLLGGKGLNERVGSTLEIESGS
                     NDPMAVFLTITLIAMIQQHESSVSWMFVVDILQQFGLGIVIGLGGGYLLLQMINRIAL
                     PAGLYPLLALSGGILIFALTTALEGSGILAVYLCGFLLGNRPIRNRYGILQNFDGLAW
                     LAQIAMFLVLGLLVNPSDLLPIAIPALILSAWMIFFARPLSVFAGLLPFRGFNLRERV
                     FISWVGLRGAVPIILAVFPMMAGLENARLFFNVAFFVVLVSLLLQGTSLSWAAKKAKV
                     VVPPVGRPVSRVGLDIHPENPWEQFVYQLSADKWCVGAALRDLHMPKETRIAALFRDN
                     QLLHPTGSTRLREGDVLCVIGRERDLPALGKLFSQSPPVALDQRFFGDFILEASAKYA
                     DVALIYGLEDGREYRDKQQTLGEIVQQLLGAAPVVGDQVEFAGMIWTVAEKEDNEVLK
                     IGVRVAEEEAES"

On Thu, Sep 8, 2011 at 11:32 AM, Fields,
Christopher J < cjfields at illinois.edu> wrote: > On Sep 8, 2011, at 10:27 AM, Peter Cock wrote: > > > On Thu, Sep 8, 2011 at 3:44 PM, galeb abu-ali > wrote: > >> Hi, > >> > >> I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of > >> multiple tags within a primary tag. E.g., when there are several > 'function' > >> tag-values within a 'CDS' primary tag, I don't know how to link those > >> 'function' tag-values to a particular 'locus_tag'. > > > > Do you have GenBank features with multiple locus_tag qualifiers? > > That would be very unusual... > > > > Peter > > Agreed; in order to clarify what you mean, I think we would need to see the > record in question to get a better idea of the problem. > > chris From p.j.a.cock at googlemail.com Thu Sep 8 11:46:28 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 8 Sep 2011 16:46:28 +0100 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: Message-ID: On Thu, Sep 8, 2011 at 4:39 PM, galeb abu-ali wrote: > I guess I was not clear. 'locus_tag' qualifiers are single, but there are > mutliple 'function' qualifiers within a primary feature (e.g. 'CDS'). So are your intending to look at all the CDS features only, and build a hash using the locus_tag as the key, and a list of the 'function' qualifiers as values? Peter From abualiga2 at gmail.com Thu Sep 8 11:55:08 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 11:55:08 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: Message-ID: So are your intending to look at all the CDS features only, and build a hash using the locus_tag as the key, and a list of the 'function' qualifiers as values? Precisely! I want to create a tab delim file with 'locus_tag' as the common identifier to all the features and gene sequences. 
So far, I parsed out sequences and single instance qualifiers, but 'function' and 'db_xref' qualifiers give me grief. galeb From abualiga2 at gmail.com Thu Sep 8 12:14:07 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 12:14:07 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: I only had a quick look at your code, so maybe I'm missing something but you are currently pushing all products of all CDSs into the same array, i.e. you do not assign them to a datastructure that links a particular CDS to a list of products. You then use the same index to print out a locus from the @loci array and a product from @products, but the two will not match up because you will have more products than loci. That's right. Products are not the issue in this particular case, as it's E.coli and there's no alternate splicing as far as I know so there is a single product per gene. But there are plenty more 'function' qualifiers, for example, than loci. And I don't know how to create a data structure that will link a 'gene' (as primary tag) to all other qualifiers, whether they belong to 'CDS', 'Misc_RNA', 'Misc_feature', or other primary tags. From ss2489 at cornell.edu Thu Sep 8 12:28:40 2011 From: ss2489 at cornell.edu (Surya Saha) Date: Thu, 8 Sep 2011 12:28:40 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: You might want to explore using a hash of complex records that are very similar to structures in C/C++. 
More info at http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS -Surya On Thu, Sep 8, 2011 at 12:14 PM, galeb abu-ali wrote: > I only had a quick look at your code, so maybe I'm missing something but > you are currently pushing all products of all CDSs into the same array, > i.e. you do not assign them to a datastructure that links a particular > CDS to a list of products. You then use the same index to print out a > locus from the @loci array and a product from @products, but the two > will not match up because you will have more products than loci. > > > > That's right. Products are not the issue in this particular case, as it's > E.coli and there's no alternate splicing as far as I know so there is a > single product per gene. But there are plenty more 'function' qualifiers, > for example, than loci. And I don't know how to create a data structure > that > will link a 'gene' (as primary tag) to all other qualifiers, whether they > belong to 'CDS', 'Misc_RNA', 'Misc_feature', or other primary tags. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From fs5 at sanger.ac.uk Thu Sep 8 12:04:57 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Thu, 08 Sep 2011 17:04:57 +0100 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: Message-ID: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> I only had a quick look at your code, so maybe I'm missing something but you are currently pushing all products of all CDSs into the same array, i.e. you do not assign them to a datastructure that links a particular CDS to a list of products. You then use the same index to print out a locus from the @loci array and a product from @products, but the two will not match up because you will have more products than loci. 
Frank On Thu, 2011-09-08 at 10:44 -0400, galeb abu-ali wrote: > Hi, > > I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of > multiple tags within a primary tag. E.g., when there are several 'function' > tag-values within a 'CDS' primary tag, I don't know how to link those > 'function' tag-values to a particular 'locus_tag'. As parsed values are > returned as a list, I tried creating an array of hashes, where the hash-key > is 'locus_tag' and hash-values are multiple 'function' tags, but am failing > miserably. Pasted below is what I managed so far. At your convenience, > please advise. > > thanks! > > galeb > > #!/usr/local/bin/perl > # parse_gbk.pl > # gsa 09042011 > # script to parse out features from gbk > # > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Customizing_Sequence_Object_Construction > > use strict; use warnings; > use Bio::SeqIO; > > my @loci; > my @seqs; > my @directions; > my @start_coords; > my @end_coords; > my @genes; > my @products; > my @notes; > my @functions; > my %functions; > > my $gb_file = shift; > my $seqio_obj = Bio::SeqIO->new(-file => $gb_file ); > my $seq_obj = $seqio_obj->next_seq; > > for my $feat_obj ( $seq_obj->get_SeqFeatures ) { > if ( $feat_obj->primary_tag eq ( 'gene' ) ) { > if ($feat_obj->has_tag( 'locus_tag' ) ) { > push ( @seqs, $feat_obj->seq->seq ); #collect sequences > for my $val ( $feat_obj->get_tag_values( 'locus_tag' ) ) > { > push ( @loci, $val ); # locus_tags > } > } > if ( $feat_obj->has_tag( 'gene' ) ) { > for my $val ( $feat_obj->get_tag_values( 'gene' ) > ) { > push ( @genes, $val ); # gene names > } > } > else { > push ( @genes, "" ); # if gene names are absent, leave > empty > } > if ( $feat_obj->location->isa( 'Bio::Location::Simple' ) ) { # gene > coordinates > for my $location ( $feat_obj->location ) { > push ( @start_coords, $location->start ); > push ( @end_coords, $location->end ); > if ( $location->strand == -1 ) { > push ( @directions, "reverse" ); > } > else { > push ( 
@directions, "forward" ); > } > } > } > } > # gene products, notes, functions > if ( $feat_obj->primary_tag eq ( 'CDS' ) || $feat_obj->primary_tag eq ( > 'misc_feature' ) || $feat_obj->primary_tag eq ( 'ncRNA' ) || > $feat_obj->primary_tag eq ( 'rRNA' ) || $feat_obj->primary_tag eq ( 'tRNA' ) > || $feat_obj->primary_tag eq ( 'misc_RNA' ) ) { > if ( $feat_obj->has_tag( 'product' ) ) { > for my $product ( $feat_obj->get_tag_values( 'product' ) ) { > push ( @products, $product ); > } > } > else { > push ( @products, "" ); > } > if ( $feat_obj->has_tag( 'note' ) ) { > for my $note ( $feat_obj->get_tag_values( 'note' ) ) { > push ( @notes, $note ); > } > } > else { > push ( @notes, "" ); > } > if ( $feat_obj->has_tag( 'function' ) ) { > for my $function ( $feat_obj->get_tag_values( 'function' ) ) { > push ( @functions, $function ); > } > } > else { > push ( @functions, "" ); > } > > } > } > > print > "locus\tgene_name\tstart_nt\tend_nt\tlength_nt\tdirection\tproduct\tnote\tfunction\tsequence_nt\n"; > # header > > for ( my $elem = 0; $elem < scalar @loci; ++$elem ) { > print $loci[$elem], "\t",$genes[$elem], "\t", $start_coords[$elem], > "\t", $end_coords[$elem], "\t", length( $seqs[$elem] ), "\t", > $directions[$elem], "\t", $products[$elem], "\t", $notes[$elem], "\t", > $functions[$elem], "\t", $seqs[$elem], "\n"; > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
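
[Editor's sketch, not part of the original thread.] The structure Peter and Frank describe above, a hash keyed by 'locus_tag' whose value is the list of 'function' strings for that CDS, might look roughly like the following. This is a minimal illustration under stated assumptions, not anyone's actual script: the three function strings are copied from the EC55989_1287 record quoted earlier in the thread, the second locus (EC55989_0001) is made up to show a CDS with no /function qualifier, and in a real parser each record would be filled from `get_tag_values('locus_tag')` and `get_tag_values('function')` rather than typed in by hand.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hand-typed stand-ins for parsed CDS features (assumption: a real script
# would build these from Bio::SeqIO feature objects instead).
my @cds_features = (
    {
        locus_tag => 'EC55989_1287',
        function  => [
            '7 : Transport and binding proteins',
            '15.10 : Adaptations to atypical conditions',
            '16.1 : Circulate',
        ],
    },
    {
        locus_tag => 'EC55989_0001',    # hypothetical locus, no /function
        function  => [],
    },
);

# Hash of arrays: locus_tag => [ all function values for that locus ].
my %functions_for;
for my $cds (@cds_features) {
    my $locus = $cds->{locus_tag};
    $functions_for{$locus} ||= [];      # keep loci that have no functions
    push @{ $functions_for{$locus} }, @{ $cds->{function} };
}

# One tab-delimited row per locus; multiple functions are joined with ';'
# so they stay in a single column.
for my $locus ( sort keys %functions_for ) {
    print join( "\t", $locus, join( ';', @{ $functions_for{$locus} } ) ), "\n";
}
```

Keying everything on the locus_tag, instead of pushing each qualifier into its own parallel array, means a feature with three 'function' values and a feature with none each still produce exactly one row, which is the misalignment Frank points out.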
From cjfields at illinois.edu Thu Sep 8 12:51:22 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 16:51:22 +0000
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

There is no need to do that if one is using the Bio::SeqFeatureI interface. Note that get_tag_values always returns a list, so to snag a single value for a tag in a scalar, force list context on the LHS by enclosing the variable in ().

chris

-----------------------------
#!/usr/bin/env perl

use Modern::Perl;
use Bio::SeqIO;

my $in = Bio::SeqIO->new(-format => 'genbank',
                         -file => shift);

while (my $seq = $in->next_seq) {
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'CDS';
        my ($locus) = $feat->has_tag('locus_tag') ?
            $feat->get_tag_values('locus_tag') : '';
        my @funcs = $feat->has_tag('function') ?
            $feat->get_tag_values('function') : ();
        say join("\t", $locus, join(',', @funcs));
    }
}

On Sep 8, 2011, at 11:28 AM, Surya Saha wrote:

> You might want to explore using a hash of complex records that are very
> similar to structures in C/C++. More info at
> http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS
>
> -Surya
>
> On Thu, Sep 8, 2011 at 12:14 PM, galeb abu-ali wrote:
>
>> I only had a quick look at your code, so maybe I'm missing something but
>> you are currently pushing all products of all CDSs into the same array,
>> i.e. you do not assign them to a datastructure that links a particular
>> CDS to a list of products. You then use the same index to print out a
>> locus from the @loci array and a product from @products, but the two
>> will not match up because you will have more products than loci.
>>
>>
>>
>> That's right. Products are not the issue in this particular case, as it's
>> E.coli and there's no alternate splicing as far as I know so there is a
>> single product per gene.
But there are plenty more 'function' qualifiers, >> for example, than loci. And I don't know how to create a data structure >> that >> will link a 'gene' (as primary tag) to all other qualifiers, whether they >> belong to 'CDS', 'Misc_RNA', 'Misc_feature', or other primary tags. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From abualiga2 at gmail.com Thu Sep 8 12:51:42 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 12:51:42 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: You might want to explore using a hash of complex records that are very similar to structures in C/C++. More info at http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS alright, thanks! From jskittrell at unmc.edu Thu Sep 8 12:40:31 2011 From: jskittrell at unmc.edu (Jeff S Kittrell) Date: Thu, 8 Sep 2011 11:40:31 -0500 Subject: [Bioperl-l] Error when parsing a blast file Message-ID: An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Thu Sep 8 13:28:53 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 8 Sep 2011 17:28:53 +0000 Subject: [Bioperl-l] Error when parsing a blast file In-Reply-To: References: Message-ID: <1F3664C5-6D6B-409C-BE5A-C5EB08975231@illinois.edu> What version of bioperl are you using? I think this issue was addressed a while ago, but it's possible there has been a regression. chris On Sep 8, 2011, at 11:40 AM, Jeff S Kittrell wrote: > Hello Gentlemen, > > I am using BioPerl to a parse a blast output file but have run into some difficulties. 
I've pin pointed the problem and have pasted an example below. If you look at query position 223-224 you will see a large insertion 65ish nucleotides. Since the insertion spans the entire line there are no nucleotide position numbers at the end or beginning of the line nor any nucleotides within the line (dashes only). > When the SearchIO parser encounters this record it dies with the error > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: no data for midline Query ------------------------------------------------------------ > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:368 > STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl5/Bio/SearchIO/blast.pm:1805 > STACK: BlastParseNucleotideForDBTopHitCONTIGSQUERY.pl:24 > ----------------------------------------------------------- > > > Has anyone encountered this problem before? Am I doing something wrong? > > Thanks > > Jeff Kittrell > Department of Genetics, Cell Biology & Anatomy > University of Nebraska Medical Center > 985805 Nebraska Medical Center > Omaha, NE 68198-5805 > > Query= 78065535 > > Length=523 > Score E > Sequences producing significant alignments: (Bits) Value > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled... 
576 1e-163 > > > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled > receptor 123 (GPR123), mRNA > Length=4298 > > Score = 576 bits (638), Expect = 1e-163 > Identities = 466/583 (80%), Gaps = 82/583 (14%) > Strand=Plus/Minus > > Query 1 CAGGACTCCGTGG-----ATGGCATCTCGGGCAGGGCCACGCTGGGGTCTGGGTGGGTCC 55 > ||||||||||||| | ||||||||||||||||||| |||||||||| |||||||| > Sbjct 2537 CAGGACTCCGTGGGCAGCAGGGCATCTCGGGCAGGGCCATGCTGGGGTCTCAGTGGGTCC 2478 > > Query 56 TTTGATGGAAGCCCCTGCTCTGCCTCTGGGGCGCCCCAGGACTGGAGGCCACAGGACAGA 115 > |||||||||| |||||||||||||||| ||| ||||||||||||||| |||||||||||| > Sbjct 2477 TTTGATGGAATCCCCTGCTCTGCCTCTAGGGTGCCCCAGGACTGGAGACCACAGGACAGA 2418 > > Query 116 AACCAGATGACCTTGTGCAGGGACGAGCACGTGGAACTGGGATAAAAGGAGTGGGCGTGG 175 > |||| ||||||| ||||| ||||| |||||| |||| |||||||| ||||||||||||| > Sbjct 2417 AACCGGATGACCGTGTGC-GGGACCAGCACGCGGAATTGGGATAAGGGGAGTGGGCGTGG 2359 > > Query 176 CCCAGAGCTTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGT------------ 223 > ||| |||| ||||||||||||||||||||||||||||||||||||||| > Sbjct 2358 CCCGGAGCGTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGTGTGATCACAAGG 2299 > > Query ------------------------------------------------------------ > > Sbjct 2298 AAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGG 2239 > > Query 224 ---GTGAACTGCTTCCGAAAGGTGGGGTCACTTTGGTGCCCCCAGTGACCTCATGTGGCA 280 > |||||| ||||| |||||| |||||||||| ||| |||||||||||||||||||||| > Sbjct 2238 GGTGTGAACGGCTTCTGAAAGGCGGGGTCACTTCGGTACCCCCAGTGACCTCATGTGGCA 2179 > > Query 281 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACTGTGTCCCCTG-CTCCGCC 339 > ||||||||||||||||||||||||||||||||||||||||| |||||| ||| | || | > Sbjct 2178 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACCGTGTCCTCTGCCCCCATC 2119 > > Query 340 TACACAGTAGTTTCATTTTTCCAGGGTCCTGTTCGGATGTTGCCGGTCCCATCGGTGCCA 399 > |||||||||||||| |||||||||||||| |||||||||||||||||||| ||||||||| > Sbjct 2118 TACACAGTAGTTTCGTTTTTCCAGGGTCCCGTTCGGATGTTGCCGGTCCCGTCGGTGCCA 2059 > > Query 400 AACGGCAGGTCTTCTAGCAAGTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 459 > 
||||||||| |||||||||| ||||||||||||||||||||||||||||||||||||||| > Sbjct 2058 AACGGCAGGCCTTCTAGCAATTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 1999 > > Query 460 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAGGTGACCAGGCC 502 > ||||||||||||||||||||||||||||||| ||| || |||| > Sbjct 1998 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAAGTGGCCGGGCC 1956 > > > > Lambda K H > 0.634 0.408 0.912 > > Gapped > Lambda K H > 0.625 0.410 0.780 > > Effective search space used: 47712920310 > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From abualiga2 at gmail.com Thu Sep 8 13:51:34 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 13:51:34 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: thanks, Chris! works perfect. To make sure I understand what's going on, forcing list context on $locus allows me to get one value at a time, which is then concatenated with \t to concatenated functions. thanks again! galeb On Thu, Sep 8, 2011 at 12:51 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > There is no need to do that if one is using the Bio::SeqFeatureI interface. > Note that get_tag_values always returns a list, so to snag a single value > for a tag in a scalar, force list context on the LHS by enclosing the > variable in (). > > chris > > ----------------------------- > #!/usr/bin/env perl > > use Modern::Perl; > use Bio::SeqIO; > > my $in = Bio::SeqIO->new(-format => 'genbank', > -file => shift); > > while (my $seq = $in->next_seq) { > for my $feat ($seq->get_SeqFeatures) { > next unless $feat->primary_tag eq 'CDS'; > my ($locus) = $feat->has_tag('locus_tag') ? > $feat->get_tag_values('locus_tag') : ''; > my @funcs = $feat->has_tag('function') ? 
>             $feat->get_tag_values('function') : ();
>         say join("\t", $locus, join(',', @funcs));
>     }
> }
>

From cjfields at illinois.edu Thu Sep 8 14:27:06 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 18:27:06 +0000
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID: <2CDD8237-140E-410B-A18C-68FAF54719D5@illinois.edu>

On Sep 8, 2011, at 12:51 PM, galeb abu-ali wrote:

> thanks, Chris! works perfect.
> To make sure I understand what's going on, forcing list context on $locus allows me to get one value at a time,...

You have to be careful in this circumstance; doing this:

    my $foo = @bar;

evaluates @bar in scalar context, which returns the number of elements in @bar. The following

    my ($foo) = @bar;

forces list context and assigns the first value in @bar to $foo but tosses the rest. If you are sure there is only one value in @bar anyway, the above is fine (and is a common perl idiom).

> which is then concatenated with \t to concatenated functions.

I'm just using a simple join() to print off the results. Note the second element in the join list is an embedded join() with comma-sep values for functions.

chris

> thanks again!
>
> galeb
>
> On Thu, Sep 8, 2011 at 12:51 PM, Fields, Christopher J wrote:
> There is no need to do that if one is using the Bio::SeqFeatureI interface. Note that get_tag_values always returns a list, so to snag a single value for a tag in a scalar, force list context on the LHS by enclosing the variable in ().
>
> chris
>
> -----------------------------
> #!/usr/bin/env perl
>
> use Modern::Perl;
> use Bio::SeqIO;
>
> my $in = Bio::SeqIO->new(-format => 'genbank',
>                          -file   => shift);
>
> while (my $seq = $in->next_seq) {
>     for my $feat ($seq->get_SeqFeatures) {
>         next unless $feat->primary_tag eq 'CDS';
>         my ($locus) = $feat->has_tag('locus_tag') ?
>             $feat->get_tag_values('locus_tag') : '';
>         my @funcs = $feat->has_tag('function') ?
>             $feat->get_tag_values('function') : ();
>         say join("\t", $locus, join(',', @funcs));
>     }
> }
>

From cjfields at illinois.edu Thu Sep 8 14:30:06 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 18:30:06 +0000
Subject: [Bioperl-l] Error when parsing a blast file
In-Reply-To:
References: <1F3664C5-6D6B-409C-BE5A-C5EB08975231@illinois.edu>
Message-ID:

Try updating to the latest CPAN release (1.6.901, which is the pre-1.7 release).

chris

On Sep 8, 2011, at 1:19 PM, Jeff S Kittrell wrote:

> chris,
>
> I am using version 1.6.1
>
> Thanks,
>
> Jeff Kittrell
> Department of Genetics, Cell Biology & Anatomy
> University of Nebraska Medical Center
> 985805 Nebraska Medical Center
> Omaha, NE 68198-5805
>
> "Fields, Christopher J" ---09/08/2011 12:28:56 PM---What version of bioperl are you using? I think this issue was addressed a while ago, but it's possi
>
> From: "Fields, Christopher J"
> To: Jeff S Kittrell
> Cc: " "
> Date: 09/08/2011 12:28 PM
> Subject: Re: [Bioperl-l] Error when parsing a blast file
>
> What version of bioperl are you using? I think this issue was addressed a while ago, but it's possible there has been a regression.
>
> chris
>
> On Sep 8, 2011, at 11:40 AM, Jeff S Kittrell wrote:
>
> > Hello Gentlemen,
> >
> > I am using BioPerl to parse a blast output file but have run into some difficulties. I've pinpointed the problem and have pasted an example below. If you look at query position 223-224 you will see a large insertion of 65ish nucleotides. Since the insertion spans the entire line there are no nucleotide position numbers at the end or beginning of the line nor any nucleotides within the line (dashes only).
> > When the SearchIO parser encounters this record it dies with the error > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: no data for midline Query ------------------------------------------------------------ > > STACK: Error::throw > > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:368 > > STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl5/Bio/SearchIO/blast.pm:1805 > > STACK: BlastParseNucleotideForDBTopHitCONTIGSQUERY.pl:24 > > ----------------------------------------------------------- > > > > > > Has anyone encountered this problem before? Am I doing something wrong? > > > > Thanks > > > > Jeff Kittrell > > Department of Genetics, Cell Biology & Anatomy > > University of Nebraska Medical Center > > 985805 Nebraska Medical Center > > Omaha, NE 68198-5805 > > > > Query= 78065535 > > > > Length=523 > > Score E > > Sequences producing significant alignments: (Bits) Value > > > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled... 
576 1e-163 > > > > > > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled > > receptor 123 (GPR123), mRNA > > Length=4298 > > > > Score = 576 bits (638), Expect = 1e-163 > > Identities = 466/583 (80%), Gaps = 82/583 (14%) > > Strand=Plus/Minus > > > > Query 1 CAGGACTCCGTGG-----ATGGCATCTCGGGCAGGGCCACGCTGGGGTCTGGGTGGGTCC 55 > > ||||||||||||| | ||||||||||||||||||| |||||||||| |||||||| > > Sbjct 2537 CAGGACTCCGTGGGCAGCAGGGCATCTCGGGCAGGGCCATGCTGGGGTCTCAGTGGGTCC 2478 > > > > Query 56 TTTGATGGAAGCCCCTGCTCTGCCTCTGGGGCGCCCCAGGACTGGAGGCCACAGGACAGA 115 > > |||||||||| |||||||||||||||| ||| ||||||||||||||| |||||||||||| > > Sbjct 2477 TTTGATGGAATCCCCTGCTCTGCCTCTAGGGTGCCCCAGGACTGGAGACCACAGGACAGA 2418 > > > > Query 116 AACCAGATGACCTTGTGCAGGGACGAGCACGTGGAACTGGGATAAAAGGAGTGGGCGTGG 175 > > |||| ||||||| ||||| ||||| |||||| |||| |||||||| ||||||||||||| > > Sbjct 2417 AACCGGATGACCGTGTGC-GGGACCAGCACGCGGAATTGGGATAAGGGGAGTGGGCGTGG 2359 > > > > Query 176 CCCAGAGCTTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGT------------ 223 > > ||| |||| ||||||||||||||||||||||||||||||||||||||| > > Sbjct 2358 CCCGGAGCGTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGTGTGATCACAAGG 2299 > > > > Query ------------------------------------------------------------ > > > > Sbjct 2298 AAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGG 2239 > > > > Query 224 ---GTGAACTGCTTCCGAAAGGTGGGGTCACTTTGGTGCCCCCAGTGACCTCATGTGGCA 280 > > |||||| ||||| |||||| |||||||||| ||| |||||||||||||||||||||| > > Sbjct 2238 GGTGTGAACGGCTTCTGAAAGGCGGGGTCACTTCGGTACCCCCAGTGACCTCATGTGGCA 2179 > > > > Query 281 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACTGTGTCCCCTG-CTCCGCC 339 > > ||||||||||||||||||||||||||||||||||||||||| |||||| ||| | || | > > Sbjct 2178 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACCGTGTCCTCTGCCCCCATC 2119 > > > > Query 340 TACACAGTAGTTTCATTTTTCCAGGGTCCTGTTCGGATGTTGCCGGTCCCATCGGTGCCA 399 > > |||||||||||||| |||||||||||||| |||||||||||||||||||| ||||||||| > > Sbjct 2118 TACACAGTAGTTTCGTTTTTCCAGGGTCCCGTTCGGATGTTGCCGGTCCCGTCGGTGCCA 2059 > > 
> > Query 400 AACGGCAGGTCTTCTAGCAAGTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 459 > > ||||||||| |||||||||| ||||||||||||||||||||||||||||||||||||||| > > Sbjct 2058 AACGGCAGGCCTTCTAGCAATTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 1999 > > > > Query 460 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAGGTGACCAGGCC 502 > > ||||||||||||||||||||||||||||||| ||| || |||| > > Sbjct 1998 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAAGTGGCCGGGCC 1956 > > > > > > > > Lambda K H > > 0.634 0.408 0.912 > > > > Gapped > > Lambda K H > > 0.625 0.410 0.780 > > > > Effective search space used: 47712920310 > > > > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From abualiga2 at gmail.com Thu Sep 8 14:34:41 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 14:34:41 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: <2CDD8237-140E-410B-A18C-68FAF54719D5@illinois.edu> References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> <2CDD8237-140E-410B-A18C-68FAF54719D5@illinois.edu> Message-ID: many thanks again, Chris! I was reading Programming Perl, but this sums it up better. On Thu, Sep 8, 2011 at 2:27 PM, Fields, Christopher J wrote: > On Sep 8, 2011, at 12:51 PM, galeb abu-ali wrote: > > > thanks, Chris! works perfect. > > To make sure I understand what's going on, forcing list context on $locus > allows me to get one value at a time,... > > You have to be careful in this circumstance; doing this: > > my $foo = @bar; > > is scalar context on a list, which returns the number of elements in @bar. > The following > > my ($foo) = @bar; > > forces list context and assigns the first value in @bar to $foo but tosses > the rest. If you are sure there is only one value in @bar anyway, the above > is fine (and is a common perl idiom). > > > which is then concatenated with \t to concatenated functions. 
>
> I'm just using a simple join() to print off the results. Note the second element in the join list is an embedded join() with comma-sep values for functions.
>
> chris
>
> > thanks again!
> >
> > galeb
> >
> > On Thu, Sep 8, 2011 at 12:51 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:
> > There is no need to do that if one is using the Bio::SeqFeatureI interface. Note that get_tag_values always returns a list, so to snag a single value for a tag in a scalar, force list context on the LHS by enclosing the variable in ().
> >
> > chris
> >
> > -----------------------------
> > #!/usr/bin/env perl
> >
> > use Modern::Perl;
> > use Bio::SeqIO;
> >
> > my $in = Bio::SeqIO->new(-format => 'genbank',
> >                          -file   => shift);
> >
> > while (my $seq = $in->next_seq) {
> >     for my $feat ($seq->get_SeqFeatures) {
> >         next unless $feat->primary_tag eq 'CDS';
> >         my ($locus) = $feat->has_tag('locus_tag') ?
> >             $feat->get_tag_values('locus_tag') : '';
> >         my @funcs = $feat->has_tag('function') ?
> >             $feat->get_tag_values('function') : ();
> >         say join("\t", $locus, join(',', @funcs));
> >     }
> > }
> >

From David.Messina at sbc.su.se Fri Sep 9 05:40:25 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 9 Sep 2011 11:40:25 +0200
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
Message-ID:

Hi Manju,

> But this is not showing all query coverage as it shows in simple blast (see attached file)

I'm not sure what you mean by query coverage here, as the blast report you attached doesn't (as far as I can see) include a calculation of the number or percentage of query bases covered.

But in any case, everything in that blast report is available in the Bio::SearchIO object that B::T::R::RemoteBlast returns. Have you taken a look at http://www.bioperl.org/wiki/HOWTO:SearchIO ?
That, along with the module documentation, should help you find the parts of the BLAST report you're looking for.

> and i also want to write that result in a blast file..Is there any method
> which can write the remoteblast output
> in a file with blast extension?

It is possible to write out the results in a format that closely resembles the native blast report, but it's not recommended. If you want to just run BLAST and get back a report, there's no need to use BioPerl to parse the report first and then recreate the report.

This might also be a good time to mention that, if you're doing more than a few hundred BLAST searches, you'll find it much more efficient to download the database and the BLAST program from NCBI and run them on your own computer. NCBI severely limits the speed and frequency of remote BLASTs, and furthermore it's much more prone to failure.

Also, if you're using BLAST+, you can run your BLASTs on NCBI's computers remotely without BioPerl. Check out the -remote command-line option: it's my favorite new feature!

Dave

From David.Messina at sbc.su.se Fri Sep 9 06:53:01 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 9 Sep 2011 12:53:01 +0200
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
Message-ID:

If you don't want to learn how to do this in BioPerl, then take my previous suggestion and just use NCBI's tools:

> Also, if you're using BLAST+, you can run your BLASTs on NCBI's computers
> remotely without BioPerl. Check out the -remote command-line option

On Fri, Sep 9, 2011 at 12:07, Manju Rawat wrote:

> I don't know much about BioPerl...
> and I just want to blast my sequences using BioPerl...
> and see the result in a file...
> Please tell me what I should do.
>

From manju.rawat2 at gmail.com Fri Sep 9 07:05:57 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Fri, 9 Sep 2011 07:05:57 -0400
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
Message-ID:

I want to learn... and I am learning it from the start...
My main goal is to write a program that gives me the sequences which have no blast result (no matches in any database, or in a particular database).
For this I have to run blast many times... but I am not getting the desired result from blast... this is the main problem I am facing.
Now please tell me what procedure I should follow...

Manju

From cjfields at illinois.edu Fri Sep 9 09:03:26 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 9 Sep 2011 13:03:26 +0000
Subject: [Bioperl-l] blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
Message-ID:

If you are planning on looking against 'everything' (e.g. nt or nr), and you have many sequences to run, I would follow Dave's suggestion and download BLAST locally.

chris

On Sep 9, 2011, at 6:05 AM, Manju Rawat wrote:

> I want to learn... and I am learning it from the start...
> My main goal is to write a program that gives me the sequences which have
> no blast result (no matches in any database, or in a particular database).
> For this I have to run blast many times... but I am not getting the desired
> result from blast... this is the main problem I am facing.
> Now please tell me what procedure I should follow...
>
>
> Manju
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From manju.rawat2 at gmail.com Fri Sep 9 05:01:55 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Fri, 9 Sep 2011 05:01:55 -0400
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
Message-ID:

Thanks to all.. It's working..

I tried that module... and got the following result in the terminal...

waiting......db is All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
hit name is ref|NM_181451.1| score is 240
hit name is ref|NM_001008415.1| score is 234
hit name is ref|XM_002706247.1| score is 212
hit name is ref|XM_002683856.1| score is 208
hit name is gb|EF197120.1| score is 208
hit name is ref|XR_083566.1| score is 198
hit name is ref|NM_001097567.1| score is 198
hit name is ref|NM_001098089.1| score is 198
hit name is ref|XM_002699708.1| score is 192
hit name is ref|XM_592786.5| score is 192
hit name is ref|XM_001251693.3| score is 192
hit name is gb|AF490400.1| score is 190
hit name is gb|AY075103.1| score is 190
hit name is ref|XR_083457.1| score is 178

But this is not showing all query coverage as it shows in simple blast (see attached file)

and i also want to write that result in a blast file.. Is there any method which can write the remoteblast output in a file with blast extension?

Thanks
Manju Rawat.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: res.blast Type: application/octet-stream Size: 218976 bytes Desc: not available URL: From ross at cuhk.edu.hk Sat Sep 10 02:39:23 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Sat, 10 Sep 2011 14:39:23 +0800 Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file Message-ID: <048a01cc6f84$60c41090$224c31b0$@edu.hk> I use the following code to derive the distance between two nodes but an error "MSG: could not find the lca of supplied nodes; can't find distance either" What's the problem? use Bio::TreeIO; ($treefh) = @ARGV; my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick"); my $tree = $treeio->next_tree; $keyword="Mycobacterium_tuberculosis_H37Rv"; my $Tnodes = $tree->find_node(-id => $keyword); $keyword="Mycobacterium_smegmatis_str._MC2_155"; my $Mnodes = $tree->find_node(-id => $keyword); $keyword="Mycobacterium_abscessus"; my $Anodes = $tree->find_node(-id => $keyword); my @root = $tree->get_root_node; #my $distances = $tree->distance(-nodes => [$node[0],$root]); my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]); print "Dist:$distances\n"; #### the following is the infile (((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. 
6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1. 4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC 
_35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. 
_ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; From greg at ebi.ac.uk Sat Sep 10 11:39:52 2011 From: greg at ebi.ac.uk (Gregory Jordan) Date: Sat, 10 Sep 2011 16:39:52 +0100 Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file In-Reply-To: <048a01cc6f84$60c41090$224c31b0$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> Message-ID: Hi Ross, Which version of BioPerl are you using? 
With the refactored tree code (available from the tree_api_refresh branch on the BioPerl github repo: https://github.com/bioperl/bioperl-live/blob/topic/tree_api_refresh/Bio/Tree/NodeFunctionsI.pm#L406) the following script works for me. Do those values look sensible to you?

The code on the new branch is a bit experimental, so I wouldn't be surprised if all the edge cases for calculations like this aren't covered.

--greg

use Bio::TreeIO;

my $treeio = Bio::TreeIO->new(-file => 'temp.nh', -format => "newick");
my $tree = $treeio->next_tree;

my $mt = $tree->find("Mycobacterium_tuberculosis_H37Rv");
my $ms = $tree->find("Mycobacterium_smegmatis_str._MC2_155");
my $ma = $tree->find("Mycobacterium_abscessus");

print "MT - MA: ".$mt->distance($ma)."\n";
print "MT - MS: ".$mt->distance($ms)."\n";
print "MS - MA: ".$ms->distance($ma)."\n";

# MT - MA: 0.24326
# MT - MS: 0.18573
# MS - MA: 0.20729

--greg

On Sat, Sep 10, 2011 at 7:39 AM, Ross KK Leung wrote:
> I use the following code to derive the distance between two nodes but an
> error "MSG: could not find the lca of supplied nodes; can't find distance
> either"
>
> What's the problem?
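For BioPerl 1.6.x, where the `find` method from the refactored branch is not available, the calls already used in the original script (`find_node` and `distance(-nodes => ...)`) can be kept. What follows is only a minimal sketch under one assumption: that the script is exactly as posted. In that script `$Tnode` (without the trailing `s`) is never assigned, so `distance` would receive an undefined node, which may well be what produces the LCA message. Running under `use strict` reports that kind of typo at compile time. The helper name `leaf_distance` is made up for this illustration and is not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): look up two nodes by id and
# return the path length between them using the 1.6.x tree methods.
sub leaf_distance {
    my ($tree, $id_a, $id_b) = @_;
    my $node_a = $tree->find_node(-id => $id_a)
        or die "No node with id '$id_a'\n";
    my $node_b = $tree->find_node(-id => $id_b)
        or die "No node with id '$id_b'\n";
    return $tree->distance(-nodes => [$node_a, $node_b]);
}

# Runs only when a Newick file is given, e.g.: perl TreeCalDist.pl temp.nh
if (@ARGV) {
    require Bio::TreeIO;    # loaded here so the helper has no hard dependency
    my $treeio = Bio::TreeIO->new(-file => $ARGV[0], -format => 'newick');
    my $tree   = $treeio->next_tree;
    my $dist   = leaf_distance(
        $tree,
        'Mycobacterium_tuberculosis_H37Rv',
        'Mycobacterium_smegmatis_str._MC2_155',
    );
    print "Dist: $dist\n";
}
```

Failing early on a missing node makes it easier to tell a misspelled leaf name from a genuine problem in the distance calculation.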
From ross at cuhk.edu.hk Sat Sep 10 19:06:44 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sun, 11 Sep 2011 07:06:44 +0800
Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file
In-Reply-To:
References:
<048a01cc6f84$60c41090$224c31b0$@edu.hk>
Message-ID: <04a601cc700e$4f051d10$ed0f5730$@edu.hk>

Hi Greg,

The values are correct! However, how do I install this bioperl-live module? My BioPerl is 1.6.1, and I get an error:

Can't locate object method "find" via package "Bio::Tree::Tree" at TreeCalDist.pl line 32, line 1.

my $mt = $tree->find($keyword); #line 32

From manju.rawat2 at gmail.com Mon Sep 12 01:37:35 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Mon, 12 Sep 2011 01:37:35 -0400
Subject: [Bioperl-l] no blast result
Message-ID:

Hello,
I want to make a program which first generates a random sequence and then gives me the result (sequence) which has no BLAST result (no
matches in any database, or in a particular database). Is there anybody who can help me do this?

Please reply if anybody knows about it.

Thanks
Manju

From zhangchnxp at gmail.com Mon Sep 12 01:51:59 2011
From: zhangchnxp at gmail.com (Zhang chn)
Date: Mon, 12 Sep 2011 13:51:59 +0800
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID:

Hi,

IMHO, due to the nature of BLAST, it is usually impossible to get no results from a random sequence; instead you get a set of matches with lower scores. What you can do is focus on the e-value, say, by setting a threshold on it.

FYI, http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html

On Mon, Sep 12, 2011 at 1:37 PM, Manju Rawat wrote:
> Hello,
> I want to make a program which first generates a random sequence and then
> gives me the result (sequence) which has no BLAST result (no matches in any
> database, or in a particular database). Is there anybody who can help me do
> this?
>
> Please reply if anybody knows about it.
> Thanks
> Manju
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From manju.rawat2 at gmail.com Mon Sep 12 01:58:38 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Mon, 12 Sep 2011 01:58:38 -0400
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID:

Yes, I know this, and it would also be useful to me if I got results with lower scores. But how could I do this?

Manju

From zhangchnxp at gmail.com Mon Sep 12 02:04:17 2011
From: zhangchnxp at gmail.com (Zhang chn)
Date: Mon, 12 Sep 2011 14:04:17 +0800
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID:

Please read the documentation for Bio::Tools::Run::StandAloneBlast and Bio::AlignIO.

On Mon, Sep 12, 2011 at 1:58 PM, Manju Rawat wrote:
> Yes, I know this, and it would also be useful to me if I got results with
> lower scores.
> But how could I do this?
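The e-value threshold idea from this thread can be sketched in Perl. This is an illustration under stated assumptions, not a complete pipeline: it assumes BLAST has already been run separately and its report saved to a file, and the file name and the 1e-5 cutoff in the usage comment are placeholders. `random_dna` and `has_significant_hit` are hypothetical helper names; `Bio::SearchIO`, `next_result`, `next_hit` and `significance` are the BioPerl calls used to read the report.

```perl
use strict;
use warnings;

# Build a random DNA sequence of the requested length (core Perl only).
sub random_dna {
    my ($length) = @_;
    my @bases = qw(A C G T);
    return join '', map { $bases[ int rand @bases ] } 1 .. $length;
}

# Return 1 if any hit in a BLAST report has an e-value at or below the
# cutoff, otherwise 0. Bio::SearchIO is loaded lazily so that random_dna()
# can be used on a machine without BioPerl.
sub has_significant_hit {
    my ($report_file, $max_evalue) = @_;
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $report_file);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            return 1 if $hit->significance <= $max_evalue;
        }
    }
    return 0;
}

# Write one random sequence in FASTA format to standard output.
my $seq = random_dna(200);
print ">random_1\n$seq\n";

# Once a report exists for that sequence (placeholder file name and cutoff):
# print "kept: no hit below cutoff\n"
#     unless has_significant_hit('random_1.blast', 1e-5);
```

Treating "no result" as "no hit at or below a chosen e-value" follows the advice given above, since low-scoring chance matches are to be expected for random input.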
From manju.rawat2 at gmail.com Mon Sep 12 07:12:40 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Mon, 12 Sep 2011 07:12:40 -0400
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID:

I read this, but the default program is not running properly. It shows many errors, such as:

MSG: cannot find path to blastall..
Use of uninitialized value $_[0] in join or string at /usr/share/perl/5.10/File/Spec/Unix.pm line 41.

And it is not showing the output which I want.

Please help me.

Manju Rawat

From arguelloj at gmail.com Sun Sep 11 22:52:42 2011
From: arguelloj at gmail.com (J. Fernando Arguello)
Date: Sun, 11 Sep 2011 19:52:42 -0700
Subject: [Bioperl-l] BioPerl - quick general question
Message-ID:

Dear BioPerl,

I'm excited to see a project like this! Basically I have a computer science background with a few years of development, research and minimal bioinformatics experience.

Dumb question... where is the best place on the BioPerl wiki(s) for a developer to begin who wants to contribute new code or bug fixes to BioPerl in the future?

Any input is much appreciated. Thank you all for your time.

Best,
Fernando
jfa

From briano at bioteam.net Mon Sep 12 09:20:36 2011
From: briano at bioteam.net (Brian Osborne)
Date: Mon, 12 Sep 2011 09:20:36 -0400
Subject: [Bioperl-l] Fwd: cds sequence extract
References: <112c4ef2.641e.1325c4b21cb.Coremail.maliang7121@163.com>
Message-ID: <671CAF11-55A4-462A-BC5B-805C87E1EB0E@bioteam.net>

Liang Ma,

I'm forwarding this to the Bioperl mailing list. If you're starting out with Bioperl I suggest you read this:

http://www.bioperl.org/wiki/HOWTO:Beginners

Brian O.

Begin forwarded message:

> From: maliang7121
> Date: September 12, 2011 2:20:20 AM EDT
> To: briano at bioteam.net
> Subject: cds sequence extract
>
> Dear Brian:
>
> I am a student at the Chinese Academy of Sciences. I have begun to love BioPerl, but now I have a problem.
>
> According to the script in the attachment, I can easily download sequences from NCBI. Now I need to extract the CDS sequences from the GenBank format files and put them all in a single file in FASTA format. I cannot do it; could you spend a few minutes writing a script for me?
>
> Best!
>
> Liang Ma

Brian O.

--
Brian Osborne, PhD
BioTeam: http://bioteam.net
email: briano at bioteam.net
mobile: 978-317-3101

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: acc.txt
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: get_seq_by_acc_ml.pl
Type: text/x-perl-script
Size: 583 bytes
Desc: not available
URL:

From fs5 at sanger.ac.uk Mon Sep 12 09:54:21 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 12 Sep 2011 14:54:21 +0100
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>

It looks like BLAST is not installed on your system. The BioPerl module only runs BLAST for you and parses the output, but you still need the BLAST executables installed on your system. Follow the instructions on the NCBI website to download and install BLAST, then try running it on the command line with the "blastall" command. If that works, you can also run it via BioPerl.

Frank

On Mon, 2011-09-12 at 07:12 -0400, Manju Rawat wrote:
> I read this, but the default program is not running properly. It shows many
> errors, such as:
>
> MSG: cannot find path to blastall..
> Use of uninitialized value $_[0] in join or string at
> /usr/share/perl/5.10/File/Spec/Unix.pm line 41.
>
> And it is not showing the output which I want.
>
> Please help me.
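Before involving BioPerl at all, it can help to confirm that a BLAST executable is actually reachable from the environment the script runs in. The following is a small sketch using only core Perl; `find_executable` is a hypothetical helper name, and the two program names checked are `blastall` (legacy BLAST) and `blastn` (BLAST+). It only looks at the directories in `PATH` and makes no claim about how any BioPerl wrapper locates its programs.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable file called $name found in
# the directories listed in PATH, or nothing if there is none.
sub find_executable {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Report which of the two BLAST flavours, if any, is visible on PATH.
for my $program (qw(blastall blastn)) {
    my $path = find_executable($program);
    print defined $path
        ? "$program: $path\n"
        : "$program: not found on PATH\n";
}
```

If neither program is reported, the "cannot find path to blastall" message is most likely an installation or `PATH` issue rather than a problem in the Perl code.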
> > Manju Rawat > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From p.j.a.cock at googlemail.com Mon Sep 12 10:00:30 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 12 Sep 2011 15:00:30 +0100 Subject: [Bioperl-l] no blast result In-Reply-To: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: On Mon, Sep 12, 2011 at 2:54 PM, Frank Schwach wrote: > looks like BLAST is not install on your system. The BioPerl module only > runs BLAST for you and parses the output but you still need the BLAST > executables installed on your system. Follow the instructions on the > NCBI website to download and install BLAST and try running it on the > commandline with the "blastall" command. If that works then you can run > it also via BioPerl. > Frank Hang on - blastall is from the "legacy" BLAST suite, does BioPerl still talk to that or the new BLAST+ suite (e.g. binaries blastn and blastp rather then blastall)? Peter From cjfields at illinois.edu Mon Sep 12 13:45:56 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 12 Sep 2011 17:45:56 +0000 Subject: [Bioperl-l] BioPerl - quick general question In-Reply-To: References: Message-ID: <62B9B300-96AC-4511-A1B9-CFF36CBE6288@illinois.edu> On Sep 11, 2011, at 9:52 PM, J. Fernando Arguello wrote: > Dear BioPerl, > > I'm excited to see a project like this! Basically I have a computer science > background with a few years of development, research and minimal > bioinformatics experience. 
> > Dumb question...where is the best place for a developer to begin on the > BioPerl wiki(s), who is wanting to contribute new code or bug fixes to > BioPerl in the future? The basic starting point: the HOWTOs and the tutorial (not sure how up-to-date some of the latter are, in general they should work): http://www.bioperl.org/wiki/HOWTOs http://www.bioperl.org/wiki/Tutorials > Any input is much appreciated. Thank you all for your time. > > Best, > Fernando > jfa We gladly welcome anyone willing to hack on BioPerl. The repository is now on github (core is https://github.com/bioperl/bioperl-live), so it's fairly easy to fork the code and make changes. We are in the middle of splitting up the large codebase into more manageable subdistributions, so it's probably a good idea to ask on list about specific code in case the code is question resides in a separate repository. Let us know if you have additional questions! Cheers! chris From shalabh.sharma7 at gmail.com Mon Sep 12 14:00:16 2011 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 12 Sep 2011 14:00:16 -0400 Subject: [Bioperl-l] Module for SOCS Message-ID: Hi All, I am using SOCS for mapping my SOILD data. I was just wondering if there is any module in bioperl to analyze SOCS output files directly or mapreads format. Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From greg at ebi.ac.uk Tue Sep 13 04:30:58 2011 From: greg at ebi.ac.uk (Gregory Jordan) Date: Tue, 13 Sep 2011 09:30:58 +0100 Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file In-Reply-To: <04a601cc700e$4f051d10$ed0f5730$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> Message-ID: Hi Ross, I don't typically 'install' versions of BioPerl from GitHub. 
Rather, I check out the code into a directory that's on my Perl search path
(and make sure any other BioPerl code isn't on the path anymore). I think the
following commands should get you the right set of code:

> git clone git://github.com/bioperl/bioperl-live.git
> git checkout topic/tree_api_refresh

After that, I'm afraid I'll have to leave it to you (or someone else on the
list). I'm no Perl guru, so I don't know the "right" way to direct Perl
towards a developmental BioPerl branch.

Cheers,
Greg

2011/9/11 Ross KK Leung

> Hi Greg,
>
> The values are correct! However, how to install this bioperl-live module?
> my bioperl is 1.6.1 but there's an error:
>
> Can't locate object method "find" via package "Bio::Tree::Tree" at
> TreeCalDist.pl line 32, line 1.
>
> my $mt = $tree->find($keyword); #line 32
>
> *From:* gjuggler at gmail.com [mailto:gjuggler at gmail.com] *On Behalf Of *Gregory
> Jordan
> *Sent:* 10 September 2011 23:40
> *To:* bioperl-l List; Ross KK Leung
> *Subject:* Re: [Bioperl-l] fail to obtain node-to-node distance from a
> newick file
>
> Hi Ross,
>
> Which version of BioPerl are you using?
>
> With the refactored tree code (available from the tree_api_refresh branch
> on the BioPerl github repo:
> https://github.com/bioperl/bioperl-live/blob/topic/tree_api_refresh/Bio/Tree/NodeFunctionsI.pm#L406)
> the following script works for me. Do those values look sensible to you? The
> code on the new branch is a bit experimental, so I wouldn't be surprised if
> all the edge cases for calculations like this aren't covered.
>
> --greg
>
> use Bio::TreeIO;
>
> my $treeio = new Bio::TreeIO(-file => 'temp.nh', -format => "newick");
> my $tree = $treeio->next_tree;
> my $mt = $tree->find("Mycobacterium_tuberculosis_H37Rv");
> my $ms = $tree->find("Mycobacterium_smegmatis_str._MC2_155");
> my $ma = $tree->find("Mycobacterium_abscessus");
> my $distance = $mt->distance($ma);
> print "MT - MA: ".$mt->distance($ma)."\n";
> print "MT - MS: ".$mt->distance($ms)."\n";
> print "MS - MA: ".$ms->distance($ma)."\n";
> # MT - MA: 0.24326
> # MT - MS: 0.18573
> # MS - MA: 0.20729
>
> --greg
>
> On Sat, Sep 10, 2011 at 7:39 AM, Ross KK Leung wrote:
>
> I use the following code to derive the distance between two nodes but an
> error "MSG: could not find the lca of supplied nodes; can't find distance
> either"
>
> What's the problem?
> > > > use Bio::TreeIO; > > > > ($treefh) = @ARGV; > > > > my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick"); > > my $tree = $treeio->next_tree; > > > > $keyword="Mycobacterium_tuberculosis_H37Rv"; > > my $Tnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_smegmatis_str._MC2_155"; > > my $Mnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_abscessus"; > > my $Anodes = $tree->find_node(-id => $keyword); > > > > my @root = $tree->get_root_node; > > #my $distances = $tree->distance(-nodes => [$node[0],$root]); > > > > my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]); > > print "Dist:$distances\n"; > > > > > > #### the following is the infile > > > (((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM > > u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M > > ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: > > 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac > > terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM > > u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu > > berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac > > terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 > > -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium > > _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. > > 6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis > > _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1. 
> > 4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 > > .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac > > terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My > > cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc > > occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e > > rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo > > coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D > > ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne > > bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 > > 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r > > esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory > > nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. > > 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory > > nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu > > m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, > > ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii > > _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 > > 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot > > uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
> > 2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 > > 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: > > 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol > > ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 > > 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 > > :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st > > riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu > > m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. > > 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 > > )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N > > RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi > > ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ > > sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( > > Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 > > 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are > > nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr > > omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 > > 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 > > 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 > > 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 > > .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. > > 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 > > )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea > > e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
> > 02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ > > Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT > > CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ > > SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte > > r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 > > 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac > > eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ > > actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane > > nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) > > 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph > > ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC > > C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 > > 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 > > 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo > > ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ > > sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen > > anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) > > 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line > > ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: > > 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta > > xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 > > 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ > > taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco > > sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 > > 
8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC > > _35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 > > .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 > > 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 > > 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom > > yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s > > tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od > > ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 > > 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce > > llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 > > 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ > > 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. > > 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ > > DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces > > _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 > > 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep > > tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 > > 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc > > es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida > > ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s > > viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 > > .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 > > :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S > > treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 > > 
63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 > > :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. > > _ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept > > omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 > > E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. > > 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci > > dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass > > onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo > > monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 > > 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 > > 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra > > nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 > > )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 > > 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. > > 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos > > us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro > > metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ > > neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 > > 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu > > m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 > > )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte > > rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
> > 03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC > > C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact > > erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 > > 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 > > 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube > > rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium > > _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu > > m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 > > 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- > > 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 > > :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 > > E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis > > _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t > > uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc > > ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri > > um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium > > _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba > > cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K > > ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E > -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l**** > > ** ** > From manju.rawat2 at gmail.com Tue Sep 13 07:20:07 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Tue, 13 Sep 2011 07:20:07 -0400 Subject: [Bioperl-l] no blast result In-Reply-To: References: 
 <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

This is the Perl code:

#!/usr/bin/perl -w
use Bio::Perl;
use Bio::SearchIO;
use Bio::Tools::Run::StandAloneBlast;
@params = ('database' => 'swissprot',
           'READMETHOD' => 'Blastn');

$factory = Bio::Tools::Run::StandAloneBlast->new(@params);

$input = Bio::Seq->new(-id=>"testquery",-seq=>"gatcgtataccgtacagct");
$blast_report = $factory->blastall($input);

write_blast(">rs.blast",$blast_report);

It shows this error:

Use of uninitialized value $_[0] in join or string
at /usr/share/perl/5.10/File/Spec/Unix.pm line 41.

MSG: cannot find path to blastall

From fs5 at sanger.ac.uk Tue Sep 13 11:09:24 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 13 Sep 2011 16:09:24 +0100
Subject: [Bioperl-l] no blast result
In-Reply-To:
References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID: <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk>

Peter: in BioPerl 1.6 the default executable name in
Bio::Tools::Run::StandAloneBlast is still set to "blastall" - I'm not sure
if it works with blast+ too.

Manju: as I said previously, you need to check that you can run BLAST on
the command line, i.e. make sure it is actually installed on your
system. Have you done that?
You can also check the Bio::Tools::Run::StandAloneBlast docs to see how
you can manually set the path to your BLAST executable if it is not in
your path. You have to install BLAST first before you can run this
module.
The other error you get from your code refers to something that is
outside of the code fragment you show here, so I can't comment on that
one.
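The command-line check can also be done from Perl itself. What follows is
only a minimal sketch, assuming nothing beyond core Perl: it looks for an
executable called "blastall" in each directory of PATH. The helper name
find_on_path is made up for this example and is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of BioPerl): return the full path of the
# first executable file called $name found in the PATH directories,
# or undef if there is none.
sub find_on_path {
    my ($name) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $name );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my $program = 'blastall';    # legacy BLAST binary named in the error message
my $found   = find_on_path($program);

if ( defined $found ) {
    print "$program found at $found\n";
}
else {
    print "$program is not on PATH; install BLAST or add its directory to PATH\n";
}
```

If this reports that blastall is missing, the "cannot find path to blastall"
message is most likely telling you the same thing.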
Frank On Tue, 2011-09-13 at 07:20 -0400, Manju Rawat wrote: > this is the perl code > > #!usr/bin/perl -w > use Bio::Perl; > use Bio::SearchIO > use Bio::Tools::Run::StandAloneBlast; > @params = ('database' => 'swissprot', > 'READMETHOD' => 'Blastn'); > > $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > > > $input = Bio::Seq->new(-id=>"testquery",-seq=>"gatcgtataccgtacagct"); > $blast_report = $factory->blastall($input); > > > write_blast(">rs.blast",$blast_report); > > > It showing error that > > > Use of uninitialized value $_[0] in join or string > at /usr/share/perl/5.10/File/Spec/Unix.pm line 41. > > MSG: cannot find path to blastall > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From David.Messina at sbc.su.se Tue Sep 13 11:34:20 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 13 Sep 2011 17:34:20 +0200 Subject: [Bioperl-l] no blast result In-Reply-To: <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: There's a separate Bio::Tools::Run::BlastPlus module for blast+. And a related HOWTO: http://www.bioperl.org/wiki/HOWTO:BlastPlus On Tue, Sep 13, 2011 at 17:09, Frank Schwach wrote: > Peter: in BioPerl 1.6 the default executable name in Bio::Tools::Run > StandAloneBlast is still set to "blastall" - I'm not sure if it works > with blast+ too. > > Manju: as I said previously, you need to check that you can run BLAST on > the command line, i.e. make sure it is actually installed on your > system. Have you done that? > You can also check the Bio::Tools::Run::StandAloneBlast docs to see how > you can manually set the path to your BLAST executable if it is not in > your path. 
You have to install BLAST fisrt before you can run this > module. > The other error you get from yuor code refers to something that is > outside of the code fragment you show here, so can't comment on that > one. > > Frank > > > On Tue, 2011-09-13 at 07:20 -0400, Manju Rawat wrote: > > this is the perl code > > > > #!usr/bin/perl -w > > use Bio::Perl; > > use Bio::SearchIO > > use Bio::Tools::Run::StandAloneBlast; > > @params = ('database' => 'swissprot', > > 'READMETHOD' => 'Blastn'); > > > > $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > > > > > > $input = Bio::Seq->new(-id=>"testquery",-seq=>"gatcgtataccgtacagct"); > > $blast_report = $factory->blastall($input); > > > > > > write_blast(">rs.blast",$blast_report); > > > > > > It showing error that > > > > > > Use of uninitialized value $_[0] in join or string > > at /usr/share/perl/5.10/File/Spec/Unix.pm line 41. > > > > MSG: cannot find path to blastall > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Tue Sep 13 15:36:21 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 13 Sep 2011 19:36:21 +0000 Subject: [Bioperl-l] no blast result In-Reply-To: References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <6570DEC6-B485-44B0-868A-AAC6329B3224@illinois.edu> On Sep 12, 2011, at 9:00 AM, Peter Cock wrote: > On Mon, Sep 12, 2011 at 2:54 PM, Frank Schwach wrote: >> looks like BLAST is not install on your system. 
The BioPerl module only >> runs BLAST for you and parses the output but you still need the BLAST >> executables installed on your system. Follow the instructions on the >> NCBI website to download and install BLAST and try running it on the >> commandline with the "blastall" command. If that works then you can run >> it also via BioPerl. >> Frank > > Hang on - blastall is from the "legacy" BLAST suite, does > BioPerl still talk to that or the new BLAST+ suite (e.g. binaries > blastn and blastp rather then blastall)? > > Peter (aside: thought I sent this the other day. never mix grant writing and open source) Both BLAST and BLAST+ are supported via different modules. Some users don't want to use BLAST+ for various reasons, though this may soon be out of their control when NCBI eventually stops supporting legacy BLAST entirely. chris From manju.rawat2 at gmail.com Wed Sep 14 07:32:19 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Wed, 14 Sep 2011 07:32:19 -0400 Subject: [Bioperl-l] no blast result In-Reply-To: References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> <6570DEC6-B485-44B0-868A-AAC6329B3224@illinois.edu> Message-ID: On Wed, Sep 14, 2011 at 7:31 AM, Manju Rawat wrote: > I am trying to install Blast+ in my system.(ubuntu) from this link > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html. > but i am getting error.. > > first i downloaded the blast(ncbi-blast-2.2.25+-ia32-linux.tar.gz) from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ > . and then extract it in the home/abc/ folder. > after that i set the path for configuration in terminal i.e > > *PATH=$PATH:/home/abc/blast-2.2.25+/bin* > > > but when i am running blast -help in terminal it showing me error that > > error while loading shared libraries: > libbz2.so.1: cannot open shared object file: No such file or directory. 
>
>

--
Regards
Manju Rawat
Project Assistant(NAIP)
Genomics Lab
ABTC,NDRI
Karnal-132001,Haryana

From kumarsaurabh20 at gmail.com Thu Sep 15 07:20:47 2011
From: kumarsaurabh20 at gmail.com (kumar Saurabh)
Date: Thu, 15 Sep 2011 13:20:47 +0200
Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux
Message-ID:

Hi,

I need to integrate the primer3 module in one of our pipelines. In the
process, I was testing the initial code given on the CPAN website. But
whenever I try to run this program it gives me the error "Cannot locate the
Object method add_target via the package Bio::Tools:Run::Primer3Redux...."

The code I am using is as follows:

# design some primers.
# the output will be put into temp.out
use Bio::Tools::Primer3Redux;
use Bio::Tools::Run::Primer3Redux;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file=>'sample.dna');
my $seq = $seqio->next_seq;

my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out",
    -path => "/home/singh/Downloads/primer3-2.2.3/src/primer3_core");

# or after the fact you can change the program_name
$primer3->program_name('my_superfast_primer3');

unless ($primer3->executable) {
    print STDERR "primer3 can not be found. Is it installed?\n";
    exit(-1)
}

# set the maximum and minimum Tm of the primer
$primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90);

# Design the primers. This runs primer3 and returns a
# Bio::Tools::Primer3::result object with the results
# Primer3 can run in several modes (see explanation for
# 'PRIMER_TASK' in the primer3 documentation). To run a task,
# either call it by its PRIMER_TASK name as in these examples:
$pcr_primer_results = $primer3->pick_pcr_primers($seq);
$pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq );
$check_results = $primer3->check_primers();

# Alternatively, explicitly set the PRIMER_TASK parameter and
# use the generic 'run' method (this is mainly here for backwards
# compatibility) :
$primer3->PRIMER_TASK( 'pick_left_only' );
$result = $primer3->run( $seq );

# If no task is set and the 'run' method is called, primer3 will default to
# pick pcr primers.

# see the Bio::Tools::Primer3Redux POD for
# things that you can get from this. For example:

print "There were ", $results->num_primer_pairs, " primer pairs\n";

Can anyone help me with this???

Best regards,
Kumar

From fs5 at sanger.ac.uk Thu Sep 15 09:44:03 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 15 Sep 2011 14:44:03 +0100
Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux
In-Reply-To:
References:
Message-ID: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk>

Hi Kumar,

We are currently working on this module and you might want to check out
the latest version on Chris Fields' github project:

https://github.com/cjfields/Bio-Tools-Primer3Redux

There will probably be some changes again once I get some time again to
work on a few points we discussed lately. You can also check out my repo
here:
https://github.com/fschwach/Bio-Tools-Primer3Redux

but I will certainly have to make changes to that code because I used
AUTOLOAD in the last version, which is probably not a good idea.
My recommendation would be to use Chris' repo and see if that works for
you. If not, feedback would be much appreciated.

Cheers,

Frank

On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote:
> Hi,
>
> I need to integrate the primer3 module in one of our pipeline. In a process,
> I was testing the initial code given on the CPAN website.
But whenever I try > to run this program its giving me error...that "Cannot locate the Object > method add_target via the package Bio::Tools:Run::Primer3Redux...." > > The line of codes I am using is as follows: > > # design some primers. > # the output will be put into temp.out > use Bio::Tools::Primer3Redux; > use Bio::Tools::Run::Primer3Redux; > use Bio::SeqIO; > > my $seqio = Bio::SeqIO->new(-file=>'sample.dna'); > my $seq = $seqio->next_seq; > > my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out", > -path => > "/home/singh/Downloads/primer3-2.2.3/src/primer3_core"); > > # or after the fact you can change the program_name > $primer3->program_name('my_superfast_primer3'); > > unless ($primer3->executable) { > print STDERR "primer3 can not be found. Is it installed?\n"; > exit(-1) > } > > # set the maximum and minimum Tm of the primer > $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90); > > # Design the primers. This runs primer3 and returns a > # Bio::Tools::Primer3::result object with the results > # Primer3 can run in several modes (see explanation for > # 'PRIMER_TASK' in the primer3 doccumentation). To run a task, > # either call it by its PRIMER_TASK name as in these examples: > $pcr_primer_results = $primer3->pick_pcr_primers($seq); > $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq ); > $check_results = $primer3->check_primers(); > > # Alternatively, explicitly set the PRIMER_TASK parameter and > # use the generic 'run' method (this is mainly here for backwards > # compatibility) : > $primer3->PRIMER_TASK( 'pick_left_only' ); > $result = $primer3->run( $seq ); > > # If no task is set and the 'run' method is called, primer3 will default > to > # pick pcr primers. > > # see the Bio::Tools::Primer3Redux POD for > # things that you can get from this. For example: > > print "There were ", $results->num_primer_pairs, " primer pairs\n"; > > > Can anyone help me with this??? 
> > > Best regards, > Kumar > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at illinois.edu Thu Sep 15 10:13:38 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 15 Sep 2011 14:13:38 +0000 Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux In-Reply-To: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: I mentioned off-list that this should be filed as a github issue so we don't lose track. Unfortunately I can't get to it until next week (grant deadline). chris On Sep 15, 2011, at 8:44 AM, Frank Schwach wrote: > Hi Kumar, > > We are currently working on this module and you might want to check out > the latest version on Chris Field's github project: > > https://github.com/cjfields/Bio-Tools-Primer3Redux > > There will probably be some changes again once I get some time again to > work on a few points we discussed lately. You can also check out my repo > here: > https://github.com/fschwach/Bio-Tools-Primer3Redux > > but I will certainly have to make changes to that code because I used > AUTOLAD in the last version, which is probably not a good idea. > My recommendation would be to use Chris' repo and see if that works for > you. If not, feedback would be much appreciated. > > Cheers, > > Frank > > > > > On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote: >> Hi, >> >> I need to integrate the primer3 module in one of our pipeline. In a process, >> I was testing the initial code given on the CPAN website. 
But whenever I try >> to run this program its giving me error...that "Cannot locate the Object >> method add_target via the package Bio::Tools:Run::Primer3Redux...." >> >> The line of codes I am using is as follows: >> >> # design some primers. >> # the output will be put into temp.out >> use Bio::Tools::Primer3Redux; >> use Bio::Tools::Run::Primer3Redux; >> use Bio::SeqIO; >> >> my $seqio = Bio::SeqIO->new(-file=>'sample.dna'); >> my $seq = $seqio->next_seq; >> >> my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out", >> -path => >> "/home/singh/Downloads/primer3-2.2.3/src/primer3_core"); >> >> # or after the fact you can change the program_name >> $primer3->program_name('my_superfast_primer3'); >> >> unless ($primer3->executable) { >> print STDERR "primer3 can not be found. Is it installed?\n"; >> exit(-1) >> } >> >> # set the maximum and minimum Tm of the primer >> $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90); >> >> # Design the primers. This runs primer3 and returns a >> # Bio::Tools::Primer3::result object with the results >> # Primer3 can run in several modes (see explanation for >> # 'PRIMER_TASK' in the primer3 doccumentation). To run a task, >> # either call it by its PRIMER_TASK name as in these examples: >> $pcr_primer_results = $primer3->pick_pcr_primers($seq); >> $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq ); >> $check_results = $primer3->check_primers(); >> >> # Alternatively, explicitly set the PRIMER_TASK parameter and >> # use the generic 'run' method (this is mainly here for backwards >> # compatibility) : >> $primer3->PRIMER_TASK( 'pick_left_only' ); >> $result = $primer3->run( $seq ); >> >> # If no task is set and the 'run' method is called, primer3 will default >> to >> # pick pcr primers. >> >> # see the Bio::Tools::Primer3Redux POD for >> # things that you can get from this. 
For example: >> >> print "There were ", $results->num_primer_pairs, " primer pairs\n"; >> >> >> Can anyone help me with this??? >> >> >> Best regards, >> Kumar >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From fs5 at sanger.ac.uk Thu Sep 15 10:43:48 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Thu, 15 Sep 2011 15:43:48 +0100 Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux In-Reply-To: References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <1316097828.3797.700.camel@deskpro15336.internal.sanger.ac.uk> I also haven't had the time yet to work on this again but, yes, we need to make sure we don't loose track of where we are. On Thu, 2011-09-15 at 14:13 +0000, Fields, Christopher J wrote: > I mentioned off-list that this should be filed as a github issue so we don't lose track. Unfortunately I can't get to it until next week (grant deadline). > > chris > > On Sep 15, 2011, at 8:44 AM, Frank Schwach wrote: > > > Hi Kumar, > > > > We are currently working on this module and you might want to check out > > the latest version on Chris Field's github project: > > > > https://github.com/cjfields/Bio-Tools-Primer3Redux > > > > There will probably be some changes again once I get some time again to > > work on a few points we discussed lately. 
You can also check out my repo > > here: > > https://github.com/fschwach/Bio-Tools-Primer3Redux > > > > but I will certainly have to make changes to that code because I used > > AUTOLAD in the last version, which is probably not a good idea. > > My recommendation would be to use Chris' repo and see if that works for > > you. If not, feedback would be much appreciated. > > > > Cheers, > > > > Frank > > > > > > > > > > On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote: > >> Hi, > >> > >> I need to integrate the primer3 module in one of our pipeline. In a process, > >> I was testing the initial code given on the CPAN website. But whenever I try > >> to run this program its giving me error...that "Cannot locate the Object > >> method add_target via the package Bio::Tools:Run::Primer3Redux...." > >> > >> The line of codes I am using is as follows: > >> > >> # design some primers. > >> # the output will be put into temp.out > >> use Bio::Tools::Primer3Redux; > >> use Bio::Tools::Run::Primer3Redux; > >> use Bio::SeqIO; > >> > >> my $seqio = Bio::SeqIO->new(-file=>'sample.dna'); > >> my $seq = $seqio->next_seq; > >> > >> my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out", > >> -path => > >> "/home/singh/Downloads/primer3-2.2.3/src/primer3_core"); > >> > >> # or after the fact you can change the program_name > >> $primer3->program_name('my_superfast_primer3'); > >> > >> unless ($primer3->executable) { > >> print STDERR "primer3 can not be found. Is it installed?\n"; > >> exit(-1) > >> } > >> > >> # set the maximum and minimum Tm of the primer > >> $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90); > >> > >> # Design the primers. This runs primer3 and returns a > >> # Bio::Tools::Primer3::result object with the results > >> # Primer3 can run in several modes (see explanation for > >> # 'PRIMER_TASK' in the primer3 doccumentation). 
To run a task, > >> # either call it by its PRIMER_TASK name as in these examples: > >> $pcr_primer_results = $primer3->pick_pcr_primers($seq); > >> $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq ); > >> $check_results = $primer3->check_primers(); > >> > >> # Alternatively, explicitly set the PRIMER_TASK parameter and > >> # use the generic 'run' method (this is mainly here for backwards > >> # compatibility) : > >> $primer3->PRIMER_TASK( 'pick_left_only' ); > >> $result = $primer3->run( $seq ); > >> > >> # If no task is set and the 'run' method is called, primer3 will default > >> to > >> # pick pcr primers. > >> > >> # see the Bio::Tools::Primer3Redux POD for > >> # things that you can get from this. For example: > >> > >> print "There were ", $results->num_primer_pairs, " primer pairs\n"; > >> > >> > >> Can anyone help me with this??? > >> > >> > >> Best regards, > >> Kumar > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > The Wellcome Trust Sanger Institute is operated by Genome Research > > Limited, a charity registered in England with number 1021457 and a > > company registered in England with number 2742969, whose registered > > office is 215 Euston Road, London, NW1 2BE. > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
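For anyone who hits the same "Can't locate object method" error before the module is updated, one generic Perl approach is to ask the object which methods it actually supports before calling one. The sketch below is only an illustration of that check: the package My::Runner and its set_parameters method are hypothetical stand-ins, not the Primer3Redux API, so the module's POD remains the place to look for the real method names.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a wrapper object; NOT the real Primer3Redux class.
package My::Runner;
sub new { my ($class) = @_; return bless { params => {} }, $class }
sub set_parameters {
    my ($self, %p) = @_;
    @{ $self->{params} }{ keys %p } = values %p;   # store every key/value pair
    return $self;
}

package main;

# Return the first method name in @candidates that $obj implements, or undef.
sub first_supported_method {
    my ($obj, @candidates) = @_;
    for my $name (@candidates) {
        return $name if $obj->can($name);
    }
    return;
}

my $runner = My::Runner->new;
my $method = first_supported_method($runner, 'add_targets', 'set_parameters');

if (defined $method) {
    # Call whichever method exists, by name.
    $runner->$method( PRIMER_MIN_TM => 56, PRIMER_MAX_TM => 90 );
    print "used $method\n";
}
else {
    warn "neither method is available; check the module's POD\n";
}
```

With the stand-in class above, the script reports that set_parameters was used, because the object does not implement add_targets.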
From manju.rawat2 at gmail.com Fri Sep 16 01:09:25 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Fri, 16 Sep 2011 01:09:25 -0400 Subject: [Bioperl-l] no blast result In-Reply-To: <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: Hello Frank, Yes,u r rite..I tried to run blast in terminal but its not working.. I have installed the latest version of blast and download the database correctly.. But when i running blastn-help command in terminal it showing me error that blastn: error while loading shared libraries: libbz2.so.1: cannot open shared object file: No such file or directory. and when i am running the blastall command then it showing that *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out* Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/ legacy_blast.pl line 85. Program failed, try executing the command manually. While i have set the path of environment variable PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin I have checked everything but not able tp fine the error.. Pl help me. Manju From manju.rawat2 at gmail.com Fri Sep 16 01:12:03 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Fri, 16 Sep 2011 01:12:03 -0400 Subject: [Bioperl-l] Command line error in BLAST+ Message-ID: Hi, I have installed the latest version of blast and download the database correctly Using this tutorial http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html But when i running blastn-help command in terminal it showing me error that blastn: error while loading shared libraries: libbz2.so.1: cannot open shared object file: No such file or directory. and when i am running the blastall command then it showing that *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out* Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/ legacy_blast.pl line 85. 
Program failed, try executing the command manually. While i have set the path of environment variable PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin I have checked everything but not able tp fine the error.. Pl help me. Thanks Manju From p.j.a.cock at googlemail.com Fri Sep 16 04:15:46 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 16 Sep 2011 09:15:46 +0100 Subject: [Bioperl-l] Command line error in BLAST+ In-Reply-To: References: Message-ID: On Fri, Sep 16, 2011 at 6:12 AM, Manju Rawat wrote: > Hi, > > > I have installed the latest version of blast and download the database > correctly Using this tutorial > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html > > But when i running blastn-help command in terminal it showing me error that > blastn: error while loading shared libraries: libbz2.so.1: cannot open > shared object file: No such file or directory. > > > and when i am running the blastall command then it showing that > *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out* > Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/ > legacy_blast.pl line 85. > Program failed, try executing the command manually. > > While i have set the path of environment variable > PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin > > I have checked everything but not able tp fine the error.. > > Pl help me. > > Thanks > Manju You're using the BioPerl wrapper for legacy blast (blastall), which is not installed. Instead you have the new blast+ suite which includes a wrapper using the perl script legacy_blast.pl to imitate the old blastall tool (in this case it calls the new tool blastn). Fix 1: Edit legacy_blast.pl to use the path to blastn etc under your home directory Fix 2: Install BLAST+ at system level Fix 3: Use the BioPerl wrapper for BLAST+ instead. I'd go with option 3. 
Peter From p.j.a.cock at googlemail.com Fri Sep 16 04:17:58 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 16 Sep 2011 09:17:58 +0100 Subject: [Bioperl-l] no blast result In-Reply-To: References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: On Fri, Sep 16, 2011 at 6:09 AM, Manju Rawat wrote: > Hello Frank, > > Yes,u r rite..I tried to run blast in terminal but its not working.. > I have installed the latest version of blast and download the database > correctly.. > > But when i running blastn-help command in terminal it showing me error that > blastn: error while loading shared libraries: libbz2.so.1: cannot open > shared object file: No such file or directory. > > > and when i am running the blastall command then it showing that > legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out > Can't exec "/usr/bin/blastn": No such file or directory at > /usr/bin/legacy_blast.pl line 85. > Program failed, try executing the command manually. > > While i have set the path of environment variable > PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin > > I have checked everything but not able tp fine the error.. > > Pl help me. > Manju For the benefit of anyone reading the archives later, I tried to answer this in Manju's new thread: http://lists.open-bio.org/pipermail/bioperl-l/2011-September/035696.html Peter From fs5 at sanger.ac.uk Fri Sep 16 04:36:37 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Fri, 16 Sep 2011 09:36:37 +0100 Subject: [Bioperl-l] Command line error in BLAST+ In-Reply-To: References: Message-ID: <1316162197.3797.721.camel@deskpro15336.internal.sanger.ac.uk> Hi Manju, Are you on Ubuntu? I think I've seen problems with this bzip library on Ubuntu before. It's not a problem with BLAST in any case. Should be possible to install the missing files through your package manager. 
I'm sure Google will know what to do :) Not sure what went wrong with your blast installation. What happens if you run blastall directly (without the legacy_blast.pl script)? In any case, it might be better to ask the NCBI people for help with the BLAST installation as this is not a BioPerl problem. cheers, Frank On Fri, 2011-09-16 at 01:12 -0400, Manju Rawat wrote: > Hi, > > > I have installed the latest version of blast and download the database > correctly Using this tutorial > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html > > But when i running blastn-help command in terminal it showing me error that > blastn: error while loading shared libraries: libbz2.so.1: cannot open > shared object file: No such file or directory. > > > and when i am running the blastall command then it showing that > *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out* > Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/ > legacy_blast.pl line 85. > Program failed, try executing the command manually. > > While i have set the path of environment variable > PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin > > I have checked everything but not able tp fine the error.. > > Pl help me. > > Thanks > Manju > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
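The PATH side of this problem can be checked from Perl by looking for the BLAST+ programs in each PATH directory. The sketch below uses only core modules; the helper name find_in_path is made up for illustration, and it does not address the separate libbz2 shared-library error. As the error message in this thread shows, legacy_blast.pl looked for blastn in /usr/bin; the BLAST+ documentation describes a --path option for pointing it at another directory, which is worth confirming against the installed version.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable named $program found in the
# directories listed in $ENV{PATH}, or undef if there is none.
sub find_in_path {
    my ($program) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Report where (or whether) each program of interest is found.
for my $program (qw(blastn makeblastdb legacy_blast.pl)) {
    my $found = find_in_path($program);
    printf "%-16s %s\n", $program, defined $found ? $found : 'NOT FOUND on PATH';
}
```

If blastn is reported as not found even though PATH was edited, the change was probably made in a different shell session or was not exported.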
From ross at cuhk.edu.hk Fri Sep 16 04:51:38 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Fri, 16 Sep 2011 16:51:38 +0800 Subject: [Bioperl-l] use blast to extract similar sequences In-Reply-To: References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> Message-ID: <085501cc744d$d90b4500$8b21cf00$@edu.hk> I wonder whether bioperl has any built-in modules that extracts sequences based on blast results. For example, a short query sequence of length 1000 is to blast against a reference genome of 3M. The homologous sequence of 1000 +/- 20 is extracted. Why is +/- 20 needed? Because we can't guarantee there must have a good match. Frequent blast users may be well aware that then there can be coverage, split-up due to local alignments, etc and that's why I would like to know if anybody has already developed a module to handle this kind of problem. Thanks in advance! From cjfields at illinois.edu Fri Sep 16 09:22:07 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 16 Sep 2011 13:22:07 +0000 Subject: [Bioperl-l] use blast to extract similar sequences In-Reply-To: <085501cc744d$d90b4500$8b21cf00$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk> Message-ID: That seems like a pretty straightforward thing to do; there isn't an all-in-one way of doing this, but that's a good thing (it's a separation of concerns). 
1) Run and parse BLAST results and grab seqID and coordinates for each hit (or each HSP for each hit) (Bio::SearchIO)
2) Pull the right subsequence +/- 20bp using above from the indexed flat file of your reference (Bio::DB::Fasta)

You can get revcomped sequence from Bio::DB::Fasta directly by flipping coordinates:

  # raw sequence
  my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
  my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);

chris

On Sep 16, 2011, at 3:51 AM, Ross KK Leung wrote:

> I wonder whether bioperl has any built-in modules that extracts sequences based on blast results. For example, a short query sequence of length 1000 is to blast against a reference genome of 3M. The homologous sequence of 1000 +/- 20 is extracted. Why is +/- 20 needed? Because we can't guarantee there must have a good match. Frequent blast users may be well aware that then there can be coverage, split-up due to local alignments, etc and that's why I would like to know if anybody has already developed a module to handle this kind of problem. Thanks in advance!
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From wsavigne at yahoo.com Fri Sep 16 16:45:12 2011
From: wsavigne at yahoo.com (Willy Savigne)
Date: Fri, 16 Sep 2011 13:45:12 -0700 (PDT)
Subject: [Bioperl-l] question Bioperl installation
Message-ID: <1316205912.93564.YahooMailClassic@web160515.mail.bf1.yahoo.com>

my name is william how do download Bioperl i tried other site but NOTHING i would like to know info in downloading bioperl .This is my first time into knowing bioinformatic i just got a book developing bioinformatic and begginning perl bioinformatic. I do alot Dna and RNA sequencing and more.
Thank u willy

From ross at cuhk.edu.hk Sun Sep 18 06:51:05 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sun, 18 Sep 2011 18:51:05 +0800
Subject: [Bioperl-l] snp/frameshift identification
In-Reply-To:
References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk>
Message-ID: <08a901cc75f0$dd463b30$97d2b190$@edu.hk>

Dear Bioperl-users,

Following Fields, Christopher J's advice on sequence extraction, I manage to proceed to the last stage of non-synonymous SNP identification. Now what I have in hand is thousands of reliable multiple sequence alignment files, e.g.

>seq1
ATGACAGACACGACGTTGCCGTAG
>seq2
ATGACAGACACGACGTAGCCGTAG
>seq3
ATGACAGACACGACGTTGCCGTAG

Seq2 has a T->A mutation and that leads to a stop codon generation. I wonder if Bioperl has handled this kind of SNP or frameshift or non-sense mutations that lead to change of amino acid in the translated protein product.

Thanks again to the community that helps me a great deal so I can catch up progress during this Sat/Sun!!

From rondonbio at yahoo.com.br Mon Sep 19 09:46:36 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Mon, 19 Sep 2011 06:46:36 -0700 (PDT)
Subject: [Bioperl-l] help-> SearchIO
Message-ID: <1316439996.6247.YahooMailNeo@web130220.mail.mud.yahoo.com>

Hi guys! I need your help in a loop that I have in SearchIO. I need to check the nucleotide coverage of queries using BLAST. I'm using the script below. It opens the alignment, creates arrays for each query with zeros in each nucleotide position but, when I add values to the coverage of each nucleotide, the script does it once and passes to another query. Can you help me? Thank you very much,

Rondon a Brazilian friend.

use Bio::SearchIO;

my $alignment = new Bio::SearchIO ( -format => 'blastXML',
                                    -file   => $alignment_file );

my %positions;
while (my $result = $alignment->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            my $query_name = $result->query_name();
            my $tam = $result -> query_length();
            my @pos = $hsp->seq_inds('query','identical');
            for (0..$tam){ ${$positions{$query_name}}[$_] = 0 } # make arrays for each query and populate with 0 in each position
            foreach my $num (@pos) {
                ${$positions{$query_name}}[$num -1]++;    #This loop is where I believe that is an error.
            }
        }
    }
}

foreach my $key (keys %positions){
    print "$key\t@{$positions{$key}}\n";
}

exit;

From roy.chaudhuri at gmail.com Mon Sep 19 12:29:41 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 19 Sep 2011 17:29:41 +0100
Subject: [Bioperl-l] help-> SearchIO
In-Reply-To: <1316439996.6247.YahooMailNeo@web130220.mail.mud.yahoo.com>
References: <1316439996.6247.YahooMailNeo@web130220.mail.mud.yahoo.com>
Message-ID: <4E776DF5.6040504@gmail.com>

Hi Rondon,

The line where you populate your arrayref with 0 (starting "for (0..$tam)") is within the HSP loop, so the data from every successive HSP will overwrite the previous one in your hash. You will therefore only see the data for the last HSP from each query. If you move that line to execute once per result (i.e. just after the line starting "while (my $result ="), then I think it should work as you intended.

Cheers,
Roy.

On 19/09/2011 14:46, Rondon Neto wrote:
> Hi guys! I need your help in a loop that I have in SearchIO. I need
> to check the nucleotide coverage of queries using BLAST. I'm using the
> script below. It opens the alignment, creates arrays for each query
> with zeros in each nucleotide position but, when I add values to the
> coverage of each nucleotide, the script does it once and passes to
> another query. Can you help me? Thank you very much,
>
> Rondon a Brazilian friend.
> > use Bio::SearchIO; > > my $alignment = new Bio::SearchIO ( -format => 'blastXML', > -file => $alignment_file ); > > my %positions; > while (my $result = $alignment->next_result) { > while (my $hit = $result->next_hit) { > while (my $hsp = $hit->next_hsp) { > my $query_name = $result->query_name(); > my $tam = $result -> query_length(); > my @pos = $hsp->seq_inds('query','identical'); > for (0..$tam){ ${$positions{$query_name}}[$_] = 0 } # make arrays for each query and populate with 0 in each position > foreach my $num (@pos) { > ${$positions{$query_name}}[$num -1]++; #This loop is where I believe that is an error. > } > } > } > } > > foreach my $key (keys %positions){ > print "$key\t@{$positions{$key}}\n"; > } > > exit; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Mon Sep 19 12:39:40 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 19 Sep 2011 17:39:40 +0100 Subject: [Bioperl-l] question Bioperl installation In-Reply-To: <1316205912.93564.YahooMailClassic@web160515.mail.bf1.yahoo.com> References: <1316205912.93564.YahooMailClassic@web160515.mail.bf1.yahoo.com> Message-ID: <4E77704C.20604@gmail.com> Hi Willy, There are instructions for downloading and installing BioPerl on the wiki: http://www.bioperl.org/wiki/Getting_BioPerl http://www.bioperl.org/wiki/Installing_BioPerl These are the first two results when you Google for "bioperl download". Note that the wiki is a little out of date, the latest BioPerl version is 1.6.901: http://search.cpan.org/~cjfields/BioPerl-1.6.901/ Cheers, Roy. On 16/09/2011 21:45, Willy Savigne wrote: > my name is william how do download Bioperl i tried other site but > NOTHING i would like to know info in downloading bioperl .This is > my first time into knowing bioinformatic i just got a book > developing bioinformatic and begginning perl bioinformatic. 
I do alot > Dna and RNA sequencing and more. > > Thank u willy > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Tue Sep 20 13:01:21 2011 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 20 Sep 2011 13:01:21 -0400 Subject: [Bioperl-l] Question about a phylogenetic tree Message-ID: <36D93DC4-135D-408F-8169-AC8D5E59BD90@verizon.net> All, I have code that starts with a sequence file and makes a tree (Bio::Tree::Tree) using Muscle to align and then Phyml, here's the last part that makes the tree: ..... get the files etc .... my %alignparams = ( -seqtype => 'nucleo', -usetree_nowarn => $guidetreefile, -in => $tempfile ); my $aligner = Bio::Tools::Run::Alignment::Muscle->new(%alignparams); # $align is a Bio::SimpleAlign object my $align = $aligner->align($tempfile); my %treeparams = ( -data_type => 'nt', -model => 'K80', # Kimura -tree => 'BIONJ', -bootstrap => 1000 ); my $treemaker = Bio::Tools::Run::Phylo::Phyml->new(%treeparams); #$tree is a Bio::Tree::Tree object my $tree = $treemaker->run($align); My question: do I get the pairwise distance between 2 sequences (based on Kimura here) by doing something like: $distance = $tree->subtree_length($internal_node) Where $internal_node is the parent of the pair in question? Excuse me if this is obvious, have never made Bioperl trees before! Brian O. From bosborne11 at verizon.net Tue Sep 20 15:17:13 2011 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 20 Sep 2011 15:17:13 -0400 Subject: [Bioperl-l] Question about a phylogenetic tree In-Reply-To: <36D93DC4-135D-408F-8169-AC8D5E59BD90@verizon.net> References: <36D93DC4-135D-408F-8169-AC8D5E59BD90@verizon.net> Message-ID: Ah, I see: my $distances = $tree->distance(-nodes => [$node1,$node2]); Brian O. 
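For readers new to tree distances: the distance between two nodes is generally understood as the sum of the branch lengths along the path that connects them through their common ancestor. The sketch below shows that idea in plain Perl on a toy tree. The hash layout and the function names are hypothetical, chosen only for illustration; this is not a copy of how Bio::Tree::Tree implements distance().

```perl
use strict;
use warnings;

# Toy tree ((A:0.1,B:0.2)AB:0.3,C:0.4)root; each node stores its parent
# and the length of the branch leading up to that parent.
my %tree = (
    root => { parent => undef,  length => 0   },
    AB   => { parent => 'root', length => 0.3 },
    A    => { parent => 'AB',   length => 0.1 },
    B    => { parent => 'AB',   length => 0.2 },
    C    => { parent => 'root', length => 0.4 },
);

# Map every ancestor of $node (including $node itself) to the summed
# branch length between $node and that ancestor.
sub ancestor_depths {
    my ($node) = @_;
    my %depth;
    my $sum = 0;
    while (defined $node) {
        $depth{$node} = $sum;
        $sum += $tree{$node}{length};
        $node = $tree{$node}{parent};
    }
    return \%depth;
}

# Path-length distance: climb from $y until reaching an ancestor shared
# with $x, then add the length of $x's own climb to that ancestor.
sub path_distance {
    my ($x, $y) = @_;
    my $up_from_x = ancestor_depths($x);
    my $sum  = 0;
    my $node = $y;
    while (defined $node) {
        return $sum + $up_from_x->{$node} if exists $up_from_x->{$node};
        $sum += $tree{$node}{length};
        $node = $tree{$node}{parent};
    }
    die "nodes '$x' and '$y' are not in the same tree\n";
}

printf "A-B: %.1f\n", path_distance('A', 'B');   # 0.1 + 0.2
printf "A-C: %.1f\n", path_distance('A', 'C');   # 0.1 + 0.3 + 0.4
```

On this toy tree the A-B distance is 0.3 and the A-C distance is 0.8, which matches adding the branch lengths by hand.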
On Sep 20, 2011, at 1:01 PM, Brian Osborne wrote: > All, > > I have code that starts with a sequence file and makes a tree (Bio::Tree::Tree) using Muscle to align and then Phyml, here's the last part that makes the tree: > > ..... get the files etc .... > > my %alignparams = ( > -seqtype => 'nucleo', > -usetree_nowarn => $guidetreefile, > -in => $tempfile > ); > my $aligner = Bio::Tools::Run::Alignment::Muscle->new(%alignparams); > > # $align is a Bio::SimpleAlign object > my $align = $aligner->align($tempfile); > > my %treeparams = ( > -data_type => 'nt', > -model => 'K80', # Kimura > -tree => 'BIONJ', > -bootstrap => 1000 > ); > my $treemaker = Bio::Tools::Run::Phylo::Phyml->new(%treeparams); > > #$tree is a Bio::Tree::Tree object > my $tree = $treemaker->run($align); > > My question: do I get the pairwise distance between 2 sequences (based on Kimura here) by doing something like: > > $distance = $tree->subtree_length($internal_node) > > Where $internal_node is the parent of the pair in question? Excuse me if this is obvious, have never made Bioperl trees before! > > Brian O. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From manju.rawat2 at gmail.com Thu Sep 22 07:07:39 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Thu, 22 Sep 2011 16:37:39 +0530 Subject: [Bioperl-l] database for Bos Touras Message-ID: Hello To All, I want to blast my sequence Only in Bos Touras Database Using Local Blast(Blast+). But I dnt Know which database I should use for this From this Link. ftp://ftp.ncbi.nlm.nih.gov/blast/db/ Pl tell me which DB I Should use?? 
Thanks
Manju

From hrh at fmi.ch Thu Sep 22 07:44:56 2011
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Thu, 22 Sep 2011 13:44:56 +0200
Subject: [Bioperl-l] database for Bos Touras
In-Reply-To:
References:
Message-ID: <4E7B1FB8.8090208@fmi.ch>

assuming you mean 'Bos taurus', it might be easier to get the data from ucsc:
http://hgdownload.cse.ucsc.edu/downloads.html#cow

or ensembl:
ftp://ftp.ensembl.org/pub/release-64/fasta/bos_taurus/dna/

Regards, Hans

On 09/22/2011 01:07 PM, Manju Rawat wrote:
> Hello To All,
>
> I want to blast my sequence Only in Bos Touras Database Using Local
> Blast(Blast+).
> But I dnt Know which database I should use for this From this Link.
> ftp://ftp.ncbi.nlm.nih.gov/blast/db/
>
> Pl tell me which DB I Should use??
>
> Thanks
> Manju
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hrh at fmi.ch Thu Sep 22 08:16:00 2011
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Thu, 22 Sep 2011 14:16:00 +0200
Subject: [Bioperl-l] database for Bos Touras
In-Reply-To:
References: <4E7B1FB8.8090208@fmi.ch>
Message-ID: <4E7B2700.8080904@fmi.ch>

Yes, BLAST uses fasta files. You (may need to concatenate the individual chromosomes and then you) need to index them with 'makeblastdb' which is also part of the blast+ software package.

see: http://www.ncbi.nlm.nih.gov/books/NBK1762/

Hans

On 09/22/2011 01:49 PM, Manju Rawat wrote:
> It will work on Local Blast or not??????

From bosborne11 at verizon.net Thu Sep 22 12:16:39 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 22 Sep 2011 12:16:39 -0400
Subject: [Bioperl-l] [bioperl-live] genbank_ref_extractor: new script to make search on entrez gene and retrieve related sequences (#23)
In-Reply-To:
References:
Message-ID: <245C75D5-61EC-4395-B64F-47D8471568F5@verizon.net>

Carne,

This is impressive looking, it is now in scripts/. Thanks again,

Brian O.

On Sep 21, 2011, at 11:25 AM, Carnë Draug wrote:

> Hi
>
> I wrote a script with bioperl that I would like to share back. It takes a list of searches for Entrez Gene and attempts to retrieve the related sequences (genomic, transcripts and proteins). It is also possible to obtain extra upstream and downstream bp for genomic sequences and control the naming of the files. In the end it can save all the results in a CSV file.
>
> Hope you find it up to your coding standards. Suggestions for improvements are welcome, including for a better name.
>
> Carnë
>
> You can merge this Pull Request by running:
>
> git pull https://github.com/carandraug/bioperl-live bp_genbank_ref_extractor
>
> Or you can view, comment on it, or merge it online at:
>
> https://github.com/bioperl/bioperl-live/pull/23
>
> -- Commit Summary --
>
> * genbank_ref_extractor: new script to make search on entrez gene and retrieve related sequences
>
> -- File Changes --
>
> A scripts/Bio-DB-EUtilities/bp_genbank_ref_extractor.pl (1064)
>
> -- Patch Links --
>
> https://github.com/bioperl/bioperl-live/pull/23.patch
> https://github.com/bioperl/bioperl-live/pull/23.diff
>
> --
> Reply to this email directly or view it on GitHub:
> https://github.com/bioperl/bioperl-live/pull/23

From bluecurio at gmail.com Thu Sep 22 15:32:07 2011
From: bluecurio at gmail.com (Daniel Renfro)
Date: Thu, 22 Sep 2011 14:32:07 -0500
Subject: [Bioperl-l] Download RefSeq revision history programmatically
Message-ID: <29B78DBCEBCA42B5A99461FE1E9BC33F@gmail.com>

I am working on a project to find historical differences in GenBank/RefSeq files. I would like to download all the old revisions of a file (for example NC_000913 [http://www.ncbi.nlm.nih.gov/nuccore/NC_000913.2?report=girevhist]) using any technology available. I wrote a page-scraper in Perl, but I can't get NCBI to return plaintext, only HTML (which does nobody any good.)

Does anyone know of a way to get all the "revisions" (not just "versions") of a GenBank/RefSeq file?
-Daniel -- http://ecoliwiki.net/User:DanielRenfro Hu Lab Research Associate 979-862-4055 From ross at cuhk.edu.hk Tue Sep 27 10:16:14 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 27 Sep 2011 22:16:14 +0800 Subject: [Bioperl-l] obtain a distance matrix from tree In-Reply-To: References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk> Message-ID: <014201cc7d20$03abd4c0$0b037e40$@edu.hk> After using MEGA to generate a newick tree file (phylogram), I wonder if Bioperl has any convenient functions to derive the (n x n) distance (by NJ, MP etc) matrix. Thanks for your advice in advance! From thomas.sharpton at gmail.com Tue Sep 27 16:02:44 2011 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Tue, 27 Sep 2011 13:02:44 -0700 Subject: [Bioperl-l] obtain a distance matrix from tree In-Reply-To: <014201cc7d20$03abd4c0$0b037e40$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk> <014201cc7d20$03abd4c0$0b037e40$@edu.hk> Message-ID: Hi Ross, For very large trees, I found it to be more efficient to do this in R using the ape package. 
I have a script in my GitHub repo that converts a tree to a distance matrix in R, at the link below:

https://github.com/sharpton/PhylOTU/blob/master/tree_to_matrix.R

That said, I've also done this in BioPerl using something like the following:

    use strict;
    use warnings;
    use Bio::TreeIO;

    # -file takes a path; -fh would need an already opened filehandle
    my $treein = Bio::TreeIO->new( -file => "input_tree.nwk", -format => 'newick' );
    while ( my $tree = $treein->next_tree ) {
        my %dist_matrix = ();
        my @leaves = $tree->get_leaf_nodes;
        foreach my $leaf1 (@leaves) {
            my $id1 = $leaf1->id;
            foreach my $leaf2 (@leaves) {
                my $id2 = $leaf2->id;
                next if $id1 eq $id2;
                # store each unordered pair of leaves only once
                next if ( defined( $dist_matrix{$id1}->{$id2} ) || defined( $dist_matrix{$id2}->{$id1} ) );
                my $distance = $tree->distance( -nodes => [ $leaf1, $leaf2 ] );
                $dist_matrix{$id1}->{$id2} = $distance;
            }
        }
        # print the distance matrix for this tree here,
        # while %dist_matrix is still in scope....
    }

This will put the information you need to create either a full or an upper-triangle distance matrix into the hash %dist_matrix. I didn't test the above, so hopefully there are no bugs.... Someone else may have a more elegant solution.

Best,
Tom

PS: Sorry if you get this twice.

On Sep 27, 2011, at 7:16 AM, Ross KK Leung wrote:
> After using MEGA to generate a newick tree file (phylogram), I wonder if
> Bioperl has any convenient functions to derive the (n x n) distance (by NJ,
> MP etc) matrix. Thanks for your advice in advance!

From ross at cuhk.edu.hk Tue Sep 27 23:57:52 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Wed, 28 Sep 2011 11:57:52 +0800
Subject: [Bioperl-l] ancestral state derived from Tree
In-Reply-To: <29B78DBCEBCA42B5A99461FE1E9BC33F@gmail.com>
References: <29B78DBCEBCA42B5A99461FE1E9BC33F@gmail.com>
Message-ID: <017701cc7d92$cba88d20$62f9a760$@edu.hk>

Using Tom's advice, I was able to obtain the distance matrix for the following tree with BioPerl TreeIO.

((((((A:1.00000000,B:1.00000000):1.00000000,C:1.00000000):0.00000000,D:0.00000000):1.00000000,(E:0.00000000,(F:2.00000000,G:1.00000000):0.00000000):0.00000000):2.00000000,(H:3.00000000,(I:2.00000000,(J:1.00000000,(K:2.00000000,(L:2.00000000,M:2.00000000):0.00000000):0.00000000):0.00000000):0.00000000):0.00000000):1.00000000,(N:0.00000000,((O:0.00000000,P:0.00000000):1.00000000,(Q:2.00000000,(R:2.66666667,S:3.66666667):3.66666667):0.00000000):1.00000000):3.00000000,(T:0.00000000,(U:0.00000000,V:0.00000000):1.00000000):16.00000000);

The last few nodes, T, U and V, should be monophyletic, but U and V should be more closely related. Although I can use TreeIO methods like is_monophyletic or is_paraphyletic to test this case, the problem becomes trickier for nodes A, B, C and D, because D does not differ from the common ancestor of nodes A, B, C and D. Since is_monophyletic does not take this case into account, is there any workaround?
I have to pay attention to such a detail in order to make a better guess at the ancestral state(s) at various points of this tree. Thanks again to the TreeIO developers for making tree analysis easier for us biologists!

From manju.rawat2 at gmail.com Wed Sep 28 05:54:07 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Wed, 28 Sep 2011 15:24:07 +0530
Subject: [Bioperl-l] how to blast a seq against multiple dbase
Message-ID:

Hello,

I have downloaded all the chromosomes of Bos taurus and converted them to BLAST format using makeblastdb, and now I want to BLAST my sequence locally against all of these chromosomes. I now have 29 databases. Is there any method by which I can BLAST my sequence against all 29 databases in my program?

What should I write for 'database'?

    @params = ('database' => '????????', 'outfile' => 'blast2.out',
               '_READMETHOD' => 'Blast', 'prog'=> 'blastn');

Thanks
Manju Rawat

From p.j.a.cock at googlemail.com Wed Sep 28 06:02:07 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 28 Sep 2011 11:02:07 +0100
Subject: [Bioperl-l] how to blast a seq against multiple dbase
In-Reply-To:
References:
Message-ID:

On Wed, Sep 28, 2011 at 10:54 AM, Manju Rawat wrote:
> Hello,
>
> I have downloaded all the chromosome of Bos Taurus and i'd changed them in
> blast format using makeblastdb..and now i want to localy blast my sequence
> against these all chromosome..
> now i have 29 database.Is there any method by which can i blast my sequence
> against all 29 database in my program..
>
>
> whta should i write in database????
>
> @params = ('database' => '????????', 'outfile' => 'blast2.out',
>            '_READMETHOD' => 'Blast', 'prog'=> 'blastn');
>

The simple answer is to make a combined database. This works internally with alias files; have a look at the NR and NT databases for example - they act like single databases but are actually a collection of chunks.
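A minimal sketch of the alias-file route, assuming the per-chromosome databases were built with makeblastdb and that BLAST+'s blastdb_aliastool is on the PATH. The database names (bos_taurus_chr1 ... bos_taurus_chr29, bos_taurus_all) and the helper alias_command are made up for illustration; they are not from the thread. The script only builds and prints the command, and the line that would run it is left commented out.

```perl
use strict;
use warnings;

# Build the argument list for blastdb_aliastool, which writes an alias
# file that makes several existing BLAST databases searchable as one.
# (alias_command is a hypothetical helper, not part of BioPerl.)
sub alias_command {
    my ($alias_name, @db_names) = @_;
    return (
        'blastdb_aliastool',
        '-dblist', join(' ', @db_names),   # space-separated database names
        '-dbtype', 'nucl',                 # chromosomes are nucleotide data
        '-out',    $alias_name,            # name of the combined alias
        '-title',  "Combined database $alias_name",
    );
}

# Hypothetical names: one database per chromosome, as built by makeblastdb.
my @chromosome_dbs = map { "bos_taurus_chr$_" } 1 .. 29;
my @cmd = alias_command('bos_taurus_all', @chromosome_dbs);
print join(' ', @cmd), "\n";

# Uncomment to actually create the alias (requires BLAST+):
# system(@cmd) == 0 or die "blastdb_aliastool failed: $?";
```

The alias name would then be given wherever a single database name is expected, so the 29 databases do not have to be rebuilt.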
Even simpler would be to combine your Bos taurus sequence files into a single multi-entry FASTA file, and make that into a single BLAST database. Peter From awitney at sgul.ac.uk Wed Sep 28 06:42:39 2011 From: awitney at sgul.ac.uk (Adam Witney) Date: Wed, 28 Sep 2011 11:42:39 +0100 Subject: [Bioperl-l] how to blast a seq against multiple dbase In-Reply-To: References: Message-ID: I think if you want to keep the databases separate you would need to create a factory for each database, something like this foreach my $db ( @databases ) { my $factory = Bio::Tools::Run::StandAloneBlastPlus->new( -db_data => $db , < ? any other params ? > ); ? do blast stuff? } or as Peter says in another email you could combine your databases and run one query then filter them out in the results. regards adam On 28 Sep 2011, at 10:54, Manju Rawat wrote: > Hello, > > I have downloaded all the chromosome of Bos Taurus and i'd changed them in > blast format using makeblastdb..and now i want to localy blast my sequence > against these all chromosome.. > now i have 29 database.Is there any method by which can i blast my sequence > against all 29 database in my program.. > > > whta should i write in database???? > > @params = ('database' => '????????', 'outfile' => 'blast2.out', > '_READMETHOD' => 'Blast', 'prog'=> 'blastn'); > > > > Thanks > Manju Rawat > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From carandraug+dev at gmail.com Wed Sep 28 07:43:02 2011 From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=) Date: Wed, 28 Sep 2011 12:43:02 +0100 Subject: [Bioperl-l] retrieving bioperl version for scripts Message-ID: Hi everyone, is there a recommended way to get the version of a script that is part of bioperl (the ones in the scripts directory)? 
Rather than hard-coding a version for the script independent of BioPerl, I thought of using the BioPerl version itself. How can this be done?

Thanks in advance,
Carnë

From carandraug+dev at gmail.com Wed Sep 28 11:00:34 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 28 Sep 2011 16:00:34 +0100
Subject: [Bioperl-l] retrieving bioperl version for scripts
In-Reply-To:
References:
Message-ID:

2011/9/28 longbow leo :
> Hi, Carnë,
>
> Do you mean this:
>
> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION'
>
> In my machine, the output is

Thank you. Yes, this is what I was looking for. I looked at where that variable comes from, and so I think I'll use

    use Bio::Root::Version;
    say $Bio::Root::Version::VERSION;

Carnë

From pcantalupo at gmail.com Wed Sep 28 12:54:19 2011
From: pcantalupo at gmail.com (Paul Cantalupo)
Date: Wed, 28 Sep 2011 12:54:19 -0400
Subject: [Bioperl-l] algorithm_version not working in multi-result blast output
Message-ID:

Hello,

I'm using the most recent copy of bioperl-live (pulled yesterday). I have a BLASTN (from blast+) output file for 3 query sequences (https://gist.github.com/1248342). I used this script, https://gist.github.com/1248338, to print the query id, algorithm and algorithm_version for each result. When I run the script, I get the following output:

    GFAVMM201BADC0  BLASTN  2.2.25+
    GFAVMM201A1JOH  BLASTN
    GFAVMM201D933Z  BLASTN

algorithm_version returns the correct version for the first result but the empty string for the 2nd and 3rd queries. Why? This functionality worked about a month ago. What has changed to cause this to happen?

Thank you,

Paul

From rondonbio at yahoo.com.br Wed Sep 28 15:47:45 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Wed, 28 Sep 2011 12:47:45 -0700 (PDT)
Subject: [Bioperl-l] best Hit
Message-ID: <1317239265.98674.YahooMailNeo@web130214.mail.mud.yahoo.com>

Hi guys.
I have this subroutine that returns a hash with nucleotide's coverage of each query from a blast alignment. So, I want to compute uniq hits. If a hit has already been aligned with a query, it must be eliminated from my experiment. Can anyone check if it's right or can fix it to me? Is there a way to do that directly in blast? Thank you Rondon Neto sub nucleotide_coverage{ #Bio::SearchIO dependent #This subroutine return a Hash and a file with nucleotide coverage? #for each query in an blast alignment xlm file. The input is the #alignment file. my ($alignment_file) = @_; my $alignment = new Bio::SearchIO ( -format => 'blastXML', ? ?-file ? => $alignment_file );my %positions;my @used_reads; while (my $result = $alignment->next_result) { my $query_name = $result->query_name(); my $tam = $result -> query_length(); for (0..$tam-1){ ${$positions{$query_name}}[$_] = 0 }? while (my $hit = $result->next_hit) { my $hit_name = $hit->name; # Here is my best hit parser. Is it ok? foreach my $read (@used_reads) { if ( $read eq $hit_name ) { next; } } while (my $hsp = $hit->next_hsp) { my $query_name = $result->query_name(); my @pos = $hsp->seq_inds('query','identical'); foreach my $num (@pos) { ${$positions{$query_name}}[$num-1]++; } } push (@used_reads, $hit_name); } } my $outfile = "nucleotide_coverage.txt"; open OUT, ">$outfile" or die $!;foreach my $key (keys %positions){print OUT "$key\t@{$positions{$key}}\n"; } close OUT; return \%positions; } From shalabh.sharma7 at gmail.com Wed Sep 28 15:53:07 2011 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 28 Sep 2011 15:53:07 -0400 Subject: [Bioperl-l] Getting taxa from gi Message-ID: Hi All, I know this has been discussed before, but this is kind of a new problem that i am facing. I want to get taxonomy (full linage) information from the huge list of GI's. I am using Bio::DB:Genbak for this with perl-5.12.3. Here is my small script. #! 
/usr/local/perl-5.12.3/bin/perl -w use strict; use warnings; use Bio::DB::GenBank; my @ids = qw( CP000490 ); my $gbh = Bio::DB::GenBank->new(); foreach my $id( @ids ) { # say "* ID: $id"; my $seq = $gbh->get_Seq_by_acc( $id ); my $org = $seq->species; #print "$org\n"; my $class = join'-', $org->classification; print "$class\n"; } The output is: Paracoccus denitrificans PD1222-Paracoccus-Rhodobacteraceae-Rhodobacterales-Alphaproteobacteria-Proteobacteria-Bacteria which is fine but i also want to get the taxa id, and if possible taxa ids for all the linage classification. ideally i would like to get something like this: 318586 - - - - - - - 1224 - 2 I would really appreciate your help. Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From cjfields at illinois.edu Wed Sep 28 17:36:37 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 28 Sep 2011 21:36:37 +0000 Subject: [Bioperl-l] retrieving bioperl version for scripts In-Reply-To: References: Message-ID: On Sep 28, 2011, at 10:00 AM, Carn? Draug wrote: > 2011/9/28 longbow leo : >> Hi, Carn?, >> >> Do you mean this: >> >> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION' >> >> In my machine, the output is > > Thank you. Yes this is what I was looking for. I looked down how that > variable comes up and so I think I'll use > > use Bio::Root::Version; > say $Bio::Root::Version::VERSION; > > Carn? Just a warning on this: we're shortly to announce a major restructuring effort with BioPerl that will dramatically affect core versioning, mainly from the point of view that modularizing BioPerl into more manageable sub-distributions will require that each sub-distribution have it's own version (not necessarily a bad thing). 
chris From cjfields at illinois.edu Wed Sep 28 17:40:48 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 28 Sep 2011 21:40:48 +0000 Subject: [Bioperl-l] algorithm_version not working in multi-result blast output In-Reply-To: References: Message-ID: <4240A221-7BF7-4723-A7A1-30C806CE3DC6@illinois.edu> Not sure, but I would hazard a guess that only the 'Query=' line is present in concatenated BLAST reports past the initial report, and the version isn't carried over (I recall this being a problem with the algorithm() as well, but that was fixed a while ago. This should be an easy enough fix, but can you submit it as a bug so we can track it? chris On Sep 28, 2011, at 11:54 AM, Paul Cantalupo wrote: > Hello, > > I'm using the most recent copy of bioperl-live (pulled yesterday). I > have a BLASTN (from blast+) output file for 3 query sequences > (https://gist.github.com/1248342). I used this script, > https://gist.github.com/1248338, to print the query id, algorithm and > algorithm_version for each result. When I run the script, I get the > following output: > > GFAVMM201BADC0 BLASTN 2.2.25+ > GFAVMM201A1JOH BLASTN > GFAVMM201D933Z BLASTN > > Algorithm_version outputs the correct version for the first result but > outputs the empty string for the 2nd and 3rd query. Why? This > functionality worked about a month ago. What has changed to cause this > to happen? > > Thank you, > > Paul > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From carandraug+dev at gmail.com Wed Sep 28 18:07:53 2011 From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=) Date: Wed, 28 Sep 2011 23:07:53 +0100 Subject: [Bioperl-l] retrieving bioperl version for scripts In-Reply-To: References: Message-ID: 2011/9/28 Fields, Christopher J : > On Sep 28, 2011, at 10:00 AM, Carn? 
Draug wrote: > >> 2011/9/28 longbow leo : >>> Hi, Carn?, >>> >>> Do you mean this: >>> >>> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION' >>> >>> In my machine, the output is >> >> Thank you. Yes this is what I was looking for. I looked down how that >> variable comes up and so I think I'll use >> >> use Bio::Root::Version; >> say $Bio::Root::Version::VERSION; >> >> Carn? > > Just a warning on this: we're shortly to announce a major restructuring effort with BioPerl that will dramatically affect core versioning, mainly from the point of view that modularizing BioPerl into more manageable sub-distributions will require that each sub-distribution have it's own version (not necessarily a bad thing). Where will the scripts end up after this restructuration? What I want is to create a version of the script (not of bioperl). Since the script is released with bioperl, they are the same. I actually already made the commit that makes this, just haven't bothered with the pull request yet. Also, will there be a release before this change? Carn? From shalabh.sharma7 at gmail.com Thu Sep 29 10:37:53 2011 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 29 Sep 2011 10:37:53 -0400 Subject: [Bioperl-l] GFF to GTF Message-ID: Hi, Is there any module to convert GFF file to GTF? Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From cjfields at illinois.edu Thu Sep 29 11:07:27 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 29 Sep 2011 15:07:27 +0000 Subject: [Bioperl-l] retrieving bioperl version for scripts In-Reply-To: References: Message-ID: <46733E36-5795-4EB6-9C2B-C978000FFD46@illinois.edu> On Sep 28, 2011, at 5:07 PM, Carn? Draug wrote: > 2011/9/28 Fields, Christopher J : >> On Sep 28, 2011, at 10:00 AM, Carn? 
Draug wrote: >> >>> 2011/9/28 longbow leo : >>>> Hi, Carn?, >>>> >>>> Do you mean this: >>>> >>>> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION' >>>> >>>> In my machine, the output is >>> >>> Thank you. Yes this is what I was looking for. I looked down how that >>> variable comes up and so I think I'll use >>> >>> use Bio::Root::Version; >>> say $Bio::Root::Version::VERSION; >>> >>> Carn? >> >> Just a warning on this: we're shortly to announce a major restructuring effort with BioPerl that will dramatically affect core versioning, mainly from the point of view that modularizing BioPerl into more manageable sub-distributions will require that each sub-distribution have it's own version (not necessarily a bad thing). > > Where will the scripts end up after this restructuration? What I want > is to create a version of the script (not of bioperl). Since the > script is released with bioperl, they are the same. I actually already > made the commit that makes this, just haven't bothered with the pull > request yet. > > Also, will there be a release before this change? > > Carn? Scripts will likely go with the distribution that they most closely are tied to, but that's still an area for debate (some may equally fall within one distribution or another, which will be tricky). 
For more on the release aspects see the (currently being revised and thus not complete) wiki page: http://www.bioperl.org/wiki/BioPerl_Modularization chris From pcantalupo at gmail.com Thu Sep 29 12:13:05 2011 From: pcantalupo at gmail.com (Paul Cantalupo) Date: Thu, 29 Sep 2011 12:13:05 -0400 Subject: [Bioperl-l] algorithm_version not working in multi-result blast output In-Reply-To: <4240A221-7BF7-4723-A7A1-30C806CE3DC6@illinois.edu> References: <4240A221-7BF7-4723-A7A1-30C806CE3DC6@illinois.edu> Message-ID: Bug submitted: https://redmine.open-bio.org/issues/3298 On Wed, Sep 28, 2011 at 5:40 PM, Fields, Christopher J wrote: > Not sure, but I would hazard a guess that only the 'Query=' line is present in concatenated BLAST reports past the initial report, and the version isn't carried over (I recall this being a problem with the algorithm() as well, but that was fixed a while ago. > > This should be an easy enough fix, but can you submit it as a bug so we can track it? > > chris > > On Sep 28, 2011, at 11:54 AM, Paul Cantalupo wrote: > >> Hello, >> >> I'm using the most recent copy of bioperl-live (pulled yesterday). I >> have a BLASTN (from blast+) output file for 3 query sequences >> (https://gist.github.com/1248342). I used this script, >> https://gist.github.com/1248338, to print the query id, algorithm and >> algorithm_version for each result. When I run the script, I get the >> following output: >> >> GFAVMM201BADC0 ?BLASTN ?2.2.25+ >> GFAVMM201A1JOH ?BLASTN >> GFAVMM201D933Z ?BLASTN >> >> Algorithm_version outputs the correct version for the first result but >> outputs the empty string for the 2nd and 3rd query. Why? This >> functionality worked about a month ago. What has changed to cause this >> to happen? 
>> >> Thank you, >> >> Paul >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jluis.lavin at unavarra.es Fri Sep 30 04:23:19 2011 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Fri, 30 Sep 2011 10:23:19 +0200 Subject: [Bioperl-l] Bio-Graphics module Message-ID: <52f9bcbb5c40302fe5d1ea274982c24b.squirrel@webmail.unavarra.es> Dear All, I'm currently using Perl 5.10.0 version and Bioperl 1.6.1 running on a windows machine. I read about the Bio-Graphics module and it'd be wonderful to install it, but seems like it is only available for Perl 5.8... Is there any other Perl and/or Bioperl module to do the same kind of genomic and Blast report representation currently available? Thanks in advance -- Dr. Jos? Luis Lav?n Trueba Dpto. de Producci?n Agraria Grupo de Gen?tica y Microbiolog?a Universidad P?blica de Navarra 31006 Pamplona Navarra SPAIN From cjfields at illinois.edu Fri Sep 30 08:38:01 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 30 Sep 2011 12:38:01 +0000 Subject: [Bioperl-l] Bio-Graphics module In-Reply-To: <52f9bcbb5c40302fe5d1ea274982c24b.squirrel@webmail.unavarra.es> References: <52f9bcbb5c40302fe5d1ea274982c24b.squirrel@webmail.unavarra.es> Message-ID: It's available for all perl versions from 5.8.8 up. I have it running with perl 5.14. Now, I recall there being problems with installation on Mac OS X, though I think that was mainly due to GD.pm and libgd. chris On Sep 30, 2011, at 3:23 AM, wrote: > > Dear All, > > I'm currently using Perl 5.10.0 version and Bioperl 1.6.1 running on a > windows machine. > > I read about the Bio-Graphics module and it'd be wonderful to install it, > but seems like it is only available for Perl 5.8... > Is there any other Perl and/or Bioperl module to do the same kind of > genomic and Blast report representation currently available? 
> > Thanks in advance > > -- > Dr. Jos? Luis Lav?n Trueba > > Dpto. de Producci?n Agraria > Grupo de Gen?tica y Microbiolog?a > Universidad P?blica de Navarra > 31006 Pamplona > Navarra > SPAIN > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jillianrowe91286 at gmail.com Wed Sep 28 02:03:32 2011 From: jillianrowe91286 at gmail.com (Jill) Date: Tue, 27 Sep 2011 23:03:32 -0700 (PDT) Subject: [Bioperl-l] Gene Type in Entrez gene? Message-ID: Hi there, I am using the Bio::DB::Eutilities module to download gene sequences based on a query. while (my $docsum = $summaries->next_DocSum) { ## some items in DocSum are also named ChrStart so we pick the genomic ## information item and get the coordinates from it my ($genomic_info) = $docsum->get_Items_by_name('GenomicInfoType'); ## some entries may have no data on genomic coordinates. This condition filters then out if (!$genomic_info) { ## found no genomic coordinates data next; } ## get coordinates of sequence ## get_contents_by_name always returns a list my ($chr_acc_ver) = $genomic_info- >get_contents_by_name("ChrAccVer"); my ($chr_start) = $genomic_info- >get_contents_by_name("ChrStart"); my ($chr_stop) = $genomic_info- >get_contents_by_name("ChrStop"); my $strand; if ($chr_start < $chr_stop) { $strand = 1; $chr_start = $chr_start +1 - $bp5_extra; $chr_stop = $chr_stop +1 + $bp5_extra; } elsif ($chr_start > $chr_stop) { $strand = 2; $chr_start = $chr_start +1 - (-$bp5_extra); $chr_stop = $chr_stop +1 + (-$bp5_extra); } else { next; } while (my $item = $docsum->next_Item('flattened')) { next if ($item->get_name =~ m/NomenclatureName/); if($item->get_name =~ m/Description/) { $description = $item->get_content if $item->get_content; $description =~ tr/ /_/; print $description, "\n";} if($item->get_name =~ m/Name/) { $name = $item->get_content if $item->get_content; print $name, "\n"; } 
printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content; } } Then I go on to use genbank to download the sequences based on the chromosome splice. For what I have it works great. But I am trying to get to the gene type (either protein coding or pseudo) as well. I can see it in the summary on the Entrez Gene sight, but can't get to it through bioperl. When I have it print out all the contents of the summary it doesn't show up there either. Any help? Thanks! From liam.elbourne at mq.edu.au Thu Sep 29 17:34:04 2011 From: liam.elbourne at mq.edu.au (Liam Elbourne) Date: Fri, 30 Sep 2011 07:34:04 +1000 Subject: [Bioperl-l] GFF to GTF In-Reply-To: References: Message-ID: <8D027281-44E6-467C-8D22-D2D2F87D04B6@mq.edu.au> Hi Shalabh, Not sure about bioperl (I looked a while back and either missed it or it's not there) but there is a program associated with the cufflinks suite called gffread that should convert. Regards, Liam Elbourne. On 30/09/2011, at 12:37 AM, shalabh sharma wrote: > Hi, > Is there any module to convert GFF file to GTF? > > Thanks > Shalabh > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From carandraug+dev at gmail.com Fri Sep 30 09:18:04 2011 From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=) Date: Fri, 30 Sep 2011 14:18:04 +0100 Subject: [Bioperl-l] Gene Type in Entrez gene? In-Reply-To: References: Message-ID: On 28 September 2011 07:03, Jill wrote: > Hi there, > > I am using the Bio::DB::Eutilities module to download gene sequences > based on a query. > > > [...] > } > > > Then I go on to use genbank to download the sequences based on the > chromosome splice. For what I have it works great. 
But I am trying to > get to the gene type (either protein coding or pseudo) as well. I can > see it in the summary on the Entrez Gene sight, but can't get to it > through bioperl. When I have it print out all the contents of the > summary it doesn't show up there either. > > Any help? Hi Jill, there's already a script in bioperl that does what you want, it's just not part of the current stable release. You can get it here https://github.com/bioperl/bioperl-live/blob/master/scripts/Bio-DB-EUtilities/bp_genbank_ref_extractor.pl You can download the script alone, it will work fine in previous releases of bioperl, no need to write another one. Carn? Draug
From jason.stajich at gmail.com Thu Sep 1 16:23:47 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Thu, 1 Sep 2011 13:23:47 -0700 Subject: [Bioperl-l] Parsing PAML mlc files In-Reply-To: <1314888546.4e5f9b628ecea@webmail.upv.es> References: <9645AF32-5EC3-41FA-9A32-45B6B92E31FD@illinois.edu><4DF56976.8080704@upvnet.upv.es><9866C4A4-AC36-4A25-B38F-3006A7BB0F11@illinois.edu> <1314636433.4e5bc291a40c6@webmail.upv.es> <1A4207F8295607498283FE9E93B775B407CCB2D3@EX02.asurite.ad.asu.edu> <1314659986.4e5c1e9268078@webmail.upv.es> <1314888546.4e5f9b628ecea@webmail.upv.es> Message-ID: Lorenzo - I am sure this is a problem with changes in the output from PAML - this is classic problem with this suite. This requires some debugging of the parser, not sure if there is anyone out there with time to do the debugging. I can say all this worked before on an earlier version of PAML but I don't know specifically what is going on with the latest paml4.4 version.
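Separately from any parser changes, one thing that may be worth ruling out is the loop header in the posted snippet. This is a reading of the code as posted, not something confirmed in the thread. In Perl, `for ( my $x = f() )` performs a scalar assignment, so f() is called in scalar context, whereas `for my $x ( f() )` iterates over the returned list. The sketch below uses a made-up sub, get_results, as a stand-in, and assumes (unverified here) that get_NSSite_results returns a plain list in the same way.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns a list of result
# objects; returning an array means scalar context yields its size.
sub get_results {
    my @results = ('model0', 'model1', 'model2');
    return @results;
}

my @seen_scalar;
for ( my $item = get_results() ) {    # scalar assignment: $item is the count
    push @seen_scalar, $item;
}

my @seen_list;
for my $item ( get_results() ) {      # list context: one element per pass
    push @seen_list, $item;
}

print "scalar form saw: @seen_scalar\n";
print "list form saw:   @seen_list\n";
```

If that is what happens in the original script, $model_result would hold a number rather than an object, and a method call on it would be expected to fail; writing the loop as `for my $model_result ( $paml_result->get_NSSite_results )` would be the form to try.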
Jason On Sep 1, 2011, at 7:49 AM, Lorenzo Carretero Paulet wrote: > Hi all, > I'm trying to parse mlc output files from PAML using Bio::Tools::Phylo::PAML as: > > my $mlcfile = "/Applications/Bioinformatics/paml44/bin/mlc"; > my $parserF = Bio::Tools::Phylo::PAML->new (-file =>$mclfile); > if ( my $paml_result = $parserF->next_result ) > { > say Dumper $paml_result; #Prints Ok > for ( my $model_result= $paml_result->get_NSSite_results ) > { > #say Dumper $model_result; #Prints nothing > $ns_string = "model ".$model_result->model_num."\n > ".$model_result->model_description()."\n ".$model_result->time_used."\n"; > $dnds_site_classes = $model_result->dnds_site_classes; #a hashref > #say Dumper $dnds_site_classes; > for my $sites ( $model_result->get_BEB_pos_selected_sites ) > ... > ... > ... > > The Bio::Tools::Phylo::PAML::Result object is ok, as I can print it using > DUmper. In contrast, it seems that the Bio::Tools::Phylo::PAML::ModelResult > object is not being properly instantiated, as I get the error message: > > "Can't call method "model_num" without a package or object reference at ..." > > What am I missing? > Best, > Lorenzo > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Scott.Markel at accelrys.com Thu Sep 1 17:22:21 2011 From: Scott.Markel at accelrys.com (Scott Markel) Date: Thu, 1 Sep 2011 14:22:21 -0700 Subject: [Bioperl-l] file format for alignment plus features for aligned sequences Message-ID: <5ACBA19439E77B43A06F4CAB897EC97702F8302A05@EXCH1-COLO.accelrys.net> A question on behalf of the Discovery Studio group at Accelrys - They have alignment data with annotations, e.g., visualization settings or alignment properties. The aligned sequences also have features, e.g., domain boundaries or secondary structure motifs. They currently use BSML to save sequences and features. 
Is there an extension of BSML that can also save the alignment information? Are there any good file formats that can be used to store an alignment plus features associated with the aligned sequences? Are there other mailing lists that might be more appropriate for these questions?

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com
http://www.linkedin.com/in/smarkel
Secretary, Board of Directors: International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics

From ihok at hotmail.com Thu Sep 1 23:49:50 2011
From: ihok at hotmail.com (Jack Tanner)
Date: Thu, 1 Sep 2011 23:49:50 -0400
Subject: [Bioperl-l] Bio::Ext::Align?
Message-ID:

I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty quiet codebase these days... Is it dead?

Does anyone have a spec file for building an SRPM for it for RHEL 6?

From cjfields at illinois.edu Fri Sep 2 00:31:17 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 2 Sep 2011 04:31:17 +0000
Subject: [Bioperl-l] Bio::Ext::Align?
In-Reply-To:
References:
Message-ID: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu>

Yes, it's essentially deprecated (unmaintained). I don't know of anyone who has packaged that up in a while, if ever.

chris

On Sep 1, 2011, at 10:49 PM, Jack Tanner wrote:

> I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty quiet codebase these days... Is it dead?
>
> Does anyone have a spec file for building an SRPM for it for RHEL 6?
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Fri Sep 2 04:44:07 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 2 Sep 2011 10:44:07 +0200 Subject: [Bioperl-l] Bio::Ext::Align? In-Reply-To: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> References: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> Message-ID: As it happens, a colleague of mine needed Bio::Ext::Align for hhrpred: http://toolkit.tuebingen.mpg.de/hhpred He got it working thus: > Hi Dave, > thanks a lot. i made it work. The error i got later on was: relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making > a shared object; > recompile with -fPIC > > the solution is: > perl Makefile.PL PREFIX= /fftwsingle --enable-shared > --with-pic --enable-single > > make > make install > http://forums.fedoraforum.org/ > showthread.php?t=232607 Dave On Fri, Sep 2, 2011 at 06:31, Fields, Christopher J wrote: > Yes, it's essentially deprecated (unmaintained). I don't know of anyone > who has packaged that up in a while, if ever. > > chris > > On Sep 1, 2011, at 10:49 PM, Jack Tanner wrote: > > > I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty > quiet codebase these days... Is it dead? > > > > Does anyone have a spec file for building an SRPM for it for RHEL 6? 
> > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Fri Sep 2 05:30:33 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 2 Sep 2011 11:30:33 +0200 Subject: [Bioperl-l] Parsing PAML mlc files In-Reply-To: References: <9645AF32-5EC3-41FA-9A32-45B6B92E31FD@illinois.edu> <4DF56976.8080704@upvnet.upv.es> <9866C4A4-AC36-4A25-B38F-3006A7BB0F11@illinois.edu> <1314636433.4e5bc291a40c6@webmail.upv.es> <1A4207F8295607498283FE9E93B775B407CCB2D3@EX02.asurite.ad.asu.edu> <1314659986.4e5c1e9268078@webmail.upv.es> <1314888546.4e5f9b628ecea@webmail.upv.es> Message-ID: Looking back at the commit history, back in April and May 2010, I made some updates for the January 2010 edition of PAML 4.4. All tests passed at that time, but: - the tests may be incomplete - PAML has undoubtedly changed since then, even if it's still called version 4.4 I can't look at this right now myself, but please file a bug report on this, and hopefully someone else can. Dave On Thu, Sep 1, 2011 at 22:23, Jason Stajich wrote: > Lorenzo - > > I am sure this is a problem with changes in the output from PAML - this is > classic problem with this suite. This requires some debugging of the > parser, not sure if there is anyone out there with time to do the debugging. > I can say all this worked before on an earlier version of PAML but I don't > know specifically what is going on with the latest paml4.4 version. 
> > Jason > > > On Sep 1, 2011, at 7:49 AM, Lorenzo Carretero Paulet wrote: > > > Hi all, > > I'm trying to parse mlc output files from PAML using > Bio::Tools::Phylo::PAML as: > > > > my $mlcfile = "/Applications/Bioinformatics/paml44/bin/mlc"; > > my $parserF = Bio::Tools::Phylo::PAML->new (-file =>$mclfile); > > if ( my $paml_result = $parserF->next_result ) > > { > > say Dumper $paml_result; #Prints Ok > > for ( my $model_result= $paml_result->get_NSSite_results ) > > { > > #say Dumper $model_result; #Prints nothing > > $ns_string = "model ".$model_result->model_num."\n > > ".$model_result->model_description()."\n ".$model_result->time_used."\n"; > > $dnds_site_classes = $model_result->dnds_site_classes; #a hashref > > #say Dumper $dnds_site_classes; > > for my $sites ( $model_result->get_BEB_pos_selected_sites ) > > ... > > ... > > ... > > > > The Bio::Tools::Phylo::PAML::Result object is ok, as I can print it using > > DUmper. In contrast, it seems that the > Bio::Tools::Phylo::PAML::ModelResult > > object is not being properly instantiated, as I get the error message: > > > > "Can't call method "model_num" without a package or object reference at > ..." > > > > What am I missing? > > Best, > > Lorenzo > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Fri Sep 2 09:00:27 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 2 Sep 2011 13:00:27 +0000 Subject: [Bioperl-l] Bio::Ext::Align? 
In-Reply-To: References: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> Message-ID: <81EEC2D4-D039-41D5-A10F-91DD26B4D21E@illinois.edu> I think, if this is actively being used, we should split it away from bioperl-ext and release it on its own. Otherwise I worry about the long-term support for it/ chris On Sep 2, 2011, at 3:44 AM, Dave Messina wrote: > As it happens, a colleague of mine needed Bio::Ext::Align for hhrpred: > http://toolkit.tuebingen.mpg.de/hhpred > > > He got it working thus: > Hi Dave, > thanks a lot. i made it work. The error i got later on was: > relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; > recompile with -fPIC > > the solution is: > perl Makefile.PL PREFIX= /fftwsingle --enable-shared --with-pic --enable-single > > make > make install > http://forums.fedoraforum.org/showthread.php?t=232607 > > > > Dave > > > > > > On Fri, Sep 2, 2011 at 06:31, Fields, Christopher J wrote: > Yes, it's essentially deprecated (unmaintained). I don't know of anyone who has packaged that up in a while, if ever. > > chris > > On Sep 1, 2011, at 10:49 PM, Jack Tanner wrote: > > > I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty quiet codebase these days... Is it dead? > > > > Does anyone have a spec file for building an SRPM for it for RHEL 6? > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ihok at hotmail.com Fri Sep 2 11:20:44 2011 From: ihok at hotmail.com (Jack Tanner) Date: Fri, 2 Sep 2011 11:20:44 -0400 Subject: [Bioperl-l] Bio::Ext::Align? 
In-Reply-To: <81EEC2D4-D039-41D5-A10F-91DD26B4D21E@illinois.edu> References: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> <81EEC2D4-D039-41D5-A10F-91DD26B4D21E@illinois.edu> Message-ID: I also see that someone's forked it on Github and made some packaging fixes. It'd be nice to see it revived. On 9/2/2011 9:00 AM, Fields, Christopher J wrote: > I think, if this is actively being used, we should split it away from bioperl-ext and release it on its own. Otherwise I worry about the long-term support for it/ > > chris > > On Sep 2, 2011, at 3:44 AM, Dave Messina wrote: > >> As it happens, a colleague of mine needed Bio::Ext::Align for hhrpred: >> http://toolkit.tuebingen.mpg.de/hhpred >> >> >> He got it working thus: >> Hi Dave, >> thanks a lot. i made it work. The error i got later on was: >> relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; >> recompile with -fPIC >> >> the solution is: >> perl Makefile.PL PREFIX= /fftwsingle --enable-shared --with-pic --enable-single >> >> make >> make install >> http://forums.fedoraforum.org/showthread.php?t=232607 >> >> >> >> Dave >> >> >> >> >> >> On Fri, Sep 2, 2011 at 06:31, Fields, Christopher J wrote: >> Yes, it's essentially deprecated (unmaintained). I don't know of anyone who has packaged that up in a while, if ever. >> >> chris >> >> On Sep 1, 2011, at 10:49 PM, Jack Tanner wrote: >> >>> I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty quiet codebase these days... Is it dead? >>> >>> Does anyone have a spec file for building an SRPM for it for RHEL 6? 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > From manju.rawat2 at gmail.com Sat Sep 3 01:29:56 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Sat, 3 Sep 2011 01:29:56 -0400 Subject: [Bioperl-l] hsps_successfully_gapped: 47 Message-ID: Hello, Is There any method in BioPerl through which we can extract number_of_hsps_successfully_gapped: from a blast file.. If any one know about the it Pl help me... Thanks Manju Rawat From manju.rawat2 at gmail.com Sat Sep 3 06:00:22 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Sat, 3 Sep 2011 06:00:22 -0400 Subject: [Bioperl-l] blast result not matching. Message-ID: Hi, I doing blast using bioperl...but it not showing me complete result.. my program is following... #!usr/bin/perl -w use Bio::Perl; # this script will only work with an internet connection # on the computer it is run on $seq = new_sequence("ATTGGTTTGGGGACCCAATTTGTGTGTTATATGTA"); $blast_result=blast_sequence($seq); write_blast(">xyz.blast",$blast_result); and Output.. BLASTN 2.2.25+ Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. 
Query= blast-sequence-temp-id (30 letters) Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) 14,527,398 sequences; 37,346,598,701 total letters Score E Sequences producing significant alignments: (bits) value Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) Posted date: Sep 2, 2011 4:14 PM Number of letters in database: 37,346,598,701 Number of sequences in database: 14,527,398 Matrix: blastn matrix:2 -3 Gap Penalties Existence: 5, Extension: 2 expect: 1e-10 allowgaps: yes Search Statistics A: 0 Hits_to_DB: 737,387 S1: 23 S1_bits: 22.0 S2: 77 S2_bits: 70.7 X1: 22 X1_bits: 20.1 X2: 33 X2_bits: 29.8 X3: 110 X3_bits: 99.2 dbentries: 14,527,398 dbletters: -1308106959 effectivedblength: 36,954,358,955 effectivespace: 110,863,076,865 effectivespaceused: 110,863,076,865 entropy: 0.912 entropy_gapped: 0.780 kappa: 0.408 kappa_gapped: 0.410 lambda: 0.634 lambda_gapped: 0.625 length_adjustment: 27 num_extensions: 8,057 num_successful_extensions: 8,057 number_of_hsps_better_than_expect_value_cutoff_without_gapping: 0 number_of_hsps_gapped: 8,057 number_of_hsps_successfully_gapped: 0 querylength: 3 seqs_better_than_1e-10: 0 this result is not matching with with NCBI result... Is there anything wrong.. Thanks Manju Rawat From florent.angly at gmail.com Sun Sep 4 22:14:37 2011 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 05 Sep 2011 12:14:37 +1000 Subject: [Bioperl-l] Bio::SeqIO can't guess the format of data from a pipe In-Reply-To: References: <2E070303-ADDF-472B-835B-6D3959B83E9A@illinois.edu> <4E58D105.7050805@gmail.com> <4E5A0590.2010805@gmail.com> <8D639B95-0666-4F09-8E9E-88C8CDF76ABC@illinois.edu> <4E5AC2B8.9060808@gmail.com> Message-ID: <4E64308D.5060304@gmail.com> Thanks for your advice Chris. I put a format() and variant() method in Bio::Root::IO. All the Bio::*IO methods inherit these methods. 
Regarding the module naming, 8 follow the convention Bio::*IO and 8 follow the Bio::*::IO convention. If we decide to rename some IO modules for consistency, I would prefer the Bio::*::IO convention. Regards, Florent On 29/08/11 11:10, Chris Fields wrote: > On Aug 28, 2011, at 5:35 PM, Florent Angly wrote: > >> Hi, >> >> I implemented the format() getter method in Bio::SeqIO as discussed, essentially following the way proposed by Hilmar. The variant() method is not needed since Bio::SeqIO::fastq already has a get/set method for that. > Right, but the method could be used by other modules if it were moved to Bio::SeqIO. for instance. > >> I noticed that there are plenty more Bio*IO modules that could benefit from having a format() method, e.g.: >> Bio::AlignIO >> Bio::ClusterIO >> Bio::FeatureIO >> Bio::MapIO >> Bio::OntologyIO >> Bio::SearchIO >> Bio::TreeIO >> Bio::Assembly::IO * >> The code could be copy-pasted for each of them but it is not very graceful. Is there a way we could have all these IO modules share the same format() method? > Move the method to Bio::Root::IO, the common base class for all of the above. > >> * Note how the IO class for Bio::Assembly is called Bio::Assembly::IO, and not Bio::AssemblyIO like for other classes. This may be something to change in the future for consistency. >> >> Florent > That's possible; one could take advantage of that for redesign/API issues if it were needed. > > chris From manju.rawat2 at gmail.com Mon Sep 5 03:53:40 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Mon, 5 Sep 2011 03:53:40 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: Message-ID: Hi, I doing blast using bioperl...but it not showing me complete result.. my program is following... 
#!usr/bin/perl -w use Bio::Perl; # this script will only work with an internet connection # on the computer it is run on $seq = new_sequence("ATTGGTTTGGGGACCCAATTTGTGTGTTATATGTA"); $blast_result=blast_sequence($seq); write_blast(">xyz.blast",$blast_result); *and Output..* BLASTN 2.2.25+ Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= blast-sequence-temp-id (30 letters) Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) 14,527,398 sequences; 37,346,598,701 total letters Score E Sequences producing significant alignments: (bits) value Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) Posted date: Sep 2, 2011 4:14 PM Number of letters in database: 37,346,598,701 Number of sequences in database: 14,527,398 Matrix: blastn matrix:2 -3 Gap Penalties Existence: 5, Extension: 2 expect: 1e-10 allowgaps: yes Search Statistics A: 0 Hits_to_DB: 737,387 S1: 23 S1_bits: 22.0 S2: 77 S2_bits: 70.7 X1: 22 X1_bits: 20.1 X2: 33 X2_bits: 29.8 X3: 110 X3_bits: 99.2 dbentries: 14,527,398 dbletters: -1308106959 effectivedblength: 36,954,358,955 effectivespace: 110,863,076,865 effectivespaceused: 110,863,076,865 entropy: 0.912 entropy_gapped: 0.780 kappa: 0.408 kappa_gapped: 0.410 lambda: 0.634 lambda_gapped: 0.625 length_adjustment: 27 num_extensions: 8,057 num_successful_extensions: 8,057 number_of_hsps_better_than_expect_value_cutoff_without_gapping: 0 number_of_hsps_gapped: 8,057 number_of_hsps_successfully_gapped: 0 querylength: 3 seqs_better_than_1e-10: 0 this result is not matching with with NCBI result... Is there anything wrong.. 
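The statistics in the report above may already account for the empty hit list. The report shows `expect: 1e-10`, a query of 30 letters, a `2 -3` match/mismatch matrix, and `S2: 77`, which looks like the minimum raw score needed to meet that cutoff. At +2 per match, a 30-letter query cannot score above 60 (and the 35 bases in the script could not score above 70), so no alignment could reach 77. The sketch below redoes that arithmetic with the standard Karlin-Altschul relations, bit score = (lambda * S - ln K) / ln 2 and E = search space * 2^(-bit score), using only numbers printed in the report. Treating `lambda_gapped`, `kappa_gapped` and `effectivespace` as the right inputs is an assumption, and the variable names are illustrative:

```perl
use strict;
use warnings;

# Values copied from the BLAST report quoted above
my $lambda       = 0.625;              # lambda_gapped
my $kappa        = 0.410;              # kappa_gapped
my $search_space = 110_863_076_865;    # effectivespace
my $query_len    = 30;                 # "Query= ... (30 letters)"
my $match_reward = 2;                  # "blastn matrix:2 -3"
my $cutoff       = 1e-10;              # "expect: 1e-10"

# Best case: every query base matches and there are no gaps
my $max_raw = $query_len * $match_reward;

# Karlin-Altschul statistics: convert the raw score to bits,
# then to the expected number of chance hits of that score
my $max_bits = ( $lambda * $max_raw - log($kappa) ) / log(2);
my $best_e   = $search_space * 2 ** ( -$max_bits );

printf "best raw score %d, %.1f bits, E-value about %.1e\n",
    $max_raw, $max_bits, $best_e;
print $best_e <= $cutoff
    ? "a perfect hit could pass the cutoff\n"
    : "even a perfect hit cannot pass the cutoff\n";
```

If this reading is right, the difference from the NCBI web page would come from the E-value cutoff rather than from the sequence format: as far as I know, the web form adjusts its parameters for short input sequences by default, whereas the report here was produced with a cutoff of 1e-10. A full-length query such as NM_181451 easily clears that cutoff, which would explain why that case works.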
Thanks Manju Rawat From p.j.a.cock at googlemail.com Mon Sep 5 05:44:06 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 5 Sep 2011 10:44:06 +0100 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: Message-ID: On Mon, Sep 5, 2011 at 8:53 AM, Manju Rawat wrote: > Hi, > I doing blast using bioperl...but it not showing me complete result.. > > > my program is following... > ... > > this result is not matching with with NCBI result... > Is there anything wrong.. The NCBI website for BLAST uses different default values to the BLAST command line tools. Check things like the gap parameters if you want to use the same settings. Peter From p.j.a.cock at googlemail.com Mon Sep 5 06:25:15 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 5 Sep 2011 11:25:15 +0100 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: Message-ID: Please CC the mailing list. On Mon, Sep 5, 2011 at 11:19 AM, Manju Rawat wrote: > Hi, > > Thanks for the reply... > but when i am blasting after getting sequence of any gene (from NCBI using > bioperl see below)..it showing me same result as shown in NCBI.. > > #!usr/bin/perl -w > use Bio::Perl; > $seq_object = get_sequence('NCBI',"NM_181451"); > $blast_result = blast_sequence($seq); > write_blast(">roa1.blast",$blast_report); > > > I dnt know why its not working when i am blasting my own sequence.. > Maybe you need give the sequence as a FASTA entry rather than a plain string? Peter From manju.rawat2 at gmail.com Mon Sep 5 06:40:57 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Mon, 5 Sep 2011 06:40:57 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: Message-ID: No..i also tried this..this also dont work.. pls help me.. From cjfields at illinois.edu Mon Sep 5 15:42:49 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 5 Sep 2011 19:42:49 +0000 Subject: [Bioperl-l] Fwd: blast result not matching. 
In-Reply-To: References: Message-ID: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> Are you using the latest BioPerl? I believe there had been some fixes addressing remote blast. chris On Sep 5, 2011, at 5:40 AM, Manju Rawat wrote: > No..i also tried this..this also dont work.. > pls help me.. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From manju.rawat2 at gmail.com Tue Sep 6 06:59:50 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Tue, 6 Sep 2011 06:59:50 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> Message-ID: bioperl 1.6.9 version is installed in my system.. its not the reason bcs blast is working fine when i am blasting with follwing code.. #!usr/bin/perl -w use Bio::Perl; $seq = get_sequence('NCBI',"NM_181451"); $blast_result=blast_sequence($seq); write_blast(">xyz.blast",$blast_result); Manju From sidd.basu at gmail.com Tue Sep 6 11:51:09 2011 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Tue, 6 Sep 2011 10:51:09 -0500 Subject: [Bioperl-l] Bioinformatics Job Opening at dictyBase in Chicago Message-ID: <20110906155106.GB1841@Macintosh-388.local> Hi All, We have an open position for a Bioinformatics Software Engineer at dictyBase(Northwestern University in Chicago). The job involves developing web application and middleware for a genome database using modern perl(DBIx-Class/Moose/MVC web frameworks etc) as well as integration of various genomic tools(gbrowse, intermine, apollo, biomart, pathway tools etc..). For full details please see: http://www.dictybase.org/dictybase_jobs.html. 
thanks,
-siddhartha

Siddhartha Basu
Software developer, dictybase
http://www.dictybase.org

From slucky at ibab.ac.in Wed Sep 7 09:39:03 2011
From: slucky at ibab.ac.in (Lucky Singh)
Date: Wed, 07 Sep 2011 19:09:03 +0530
Subject: [Bioperl-l] Fwd: Re: Problem using Bio::Tools::Run::RemoteBlast
Message-ID: <4E6773F7.7000703@ibab.ac.in>

-------- Original Message --------
Subject: Re: [Bioperl-l] Problem using Bio::Tools::Run::RemoteBlast
Date: Sat, 27 Aug 2011 20:36:58 +0530
From: Lucky Singh
To: Carnë Draug

On Friday 26 August 2011 07:50 PM, Carnë Draug wrote:
> On 22 August 2011 07:01, Lucky Singh wrote:
>> Now I
>> wanted to host it from web server, but This program is not working from it
>> may be it is not able to create or write on file from web server but in
>> command line it is working fine. I don't know the possible reason, please
>> help me to figure it out.
> Have you looked in the apache logs (look in
> /var/log/apache2/error.log) ? Can you pastebin your whole code and the
> content of the error log after trying to run the script?

Dear Carnë Draug,

As per your suggestion, I am attaching blast code file currently it is not showing any error on error.log.
Thanks a lot for your valuable reply and will be highly grateful if you can get me out of this problem :)
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: blastn
URL:

From jason.stajich at gmail.com Wed Sep 7 12:13:46 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Wed, 7 Sep 2011 09:13:46 -0700
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu>
Message-ID: <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>

I don't think it works. I am not sure why - probably a bug - but can you go back to what it is you are trying to do? The Bio::Perl functions in that modules are intended to be shortcuts but the original modules should work.
Can you recap what it is you want to accomplish, it may be better to do this with the Bio::Perl module but instead use a more direct use of the underlying modules. On Sep 6, 2011, at 3:59 AM, Manju Rawat wrote: > bioperl 1.6.9 version is installed in my system.. > its not the reason bcs blast is working fine when i am blasting with > follwing code.. > > #!usr/bin/perl -w > use Bio::Perl; > $seq = get_sequence('NCBI',"NM_181451"); > $blast_result=blast_sequence($seq); > write_blast(">xyz.blast",$blast_result); > > > Manju > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Sep 7 12:33:52 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 7 Sep 2011 16:33:52 +0000 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: I think there was an issue with Bio::Perl BLAST submissions fixed in the 1.6.901 release (1.6.9 != 1.6.901, the latter is newer). From CPAN: 1.6.901 May 18, 2011 ... [Bug fixes] * [3205] - small fix to Bio::Perl blast_sequence() to make compliant with docs [genehack, cjfields] chris On Sep 7, 2011, at 11:13 AM, Jason Stajich wrote: > I don't think it works. I am not sure why - probably a bug - but can you go back to what it is you are trying to do? The Bio::Perl functions in that modules are intended to be shortcuts but the original modules should work. > Can you recap what it is you want to accomplish, it may be better to do this with the Bio::Perl module but instead use a more direct use of the underlying modules. > > > On Sep 6, 2011, at 3:59 AM, Manju Rawat wrote: > >> bioperl 1.6.9 version is installed in my system.. 
>> its not the reason bcs blast is working fine when i am blasting with
>> follwing code..
>>
>> #!usr/bin/perl -w
>> use Bio::Perl;
>> $seq = get_sequence('NCBI',"NM_181451");
>> $blast_result=blast_sequence($seq);
>> write_blast(">xyz.blast",$blast_result);
>>
>>
>> Manju
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From carandraug+dev at gmail.com Wed Sep 7 12:47:16 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 7 Sep 2011 17:47:16 +0100
Subject: [Bioperl-l] Problem using Bio::Tools::Run::RemoteBlast
In-Reply-To: <4E590812.9030006@ibab.ac.in>
References: <37711.192.168.1.254.1313992876.squirrel@webmail.ibab.ac.in> <4E590812.9030006@ibab.ac.in>
Message-ID:

2011/8/27 Lucky Singh :
> On Friday 26 August 2011 07:50 PM, Carnë Draug wrote:
>>
>> On 22 August 2011 07:01, Lucky Singh wrote:
>>>
>>> Now I
>>> wanted to host it from web server, but This program is not working from
>>> it
>>> may be it is not able to create or write on file from web server but in
>>> command line it is working fine. I don't know the possible reason, please
>>> help me to figure it out.
>>
>> Have you looked in the apache logs (look in
>> /var/log/apache2/error.log) ? Can you pastebin your whole code and the
>> content of the error log after trying to run the script?
>
> Dear Carnë Draug,
>
> As per your suggestion, I am attaching blast code file currently it is not
> showing any error on error.log.
> Thanks a lot for your valuable reply and will be highly grateful if you can
> get me out of this problem :)

Hi sorry for the late reply. Please try to always reply to the mailing list, maybe someone else can help you too.

I don't know about the script as I never used RemoteBlast from bioperl. But given a quick look at it, you're not loading the CGI module on the script ( http://perldoc.perl.org/CGI.html ).
Here's a simple example using the CGI module ( http://pastebin.com/miMd70wn ) and a HTML page that will use it ( http://pastebin.com/kWwwMijd ).

If nothing shows up on error.log, take a look in access.log. Try some simple CGI script first, such as "hello world!" to see if the problem lies on your bioperl part of the script, or in the web server, or some other part.

Carnë

From scott at scottcain.net Wed Sep 7 13:57:31 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 7 Sep 2011 13:57:31 -0400
Subject: [Bioperl-l] October GMOD Meeting in Toronto
Message-ID:

Hello,

The early registration deadline for the October GMOD meeting in Toronto, Canada is approaching. Please register by September 13th to avoid the late registration fee. You can register here:

http://gmod.eventbrite.com/

For information about the GMOD meeting please see the page at:

http://gmod.org/wiki/October_2011_GMOD_Meeting

In addition to the main meeting, there will be a free BioMart workshop on the following Friday, which you can also register for at the main meeting registration page.

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                      scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)     216-392-3087
Ontario Institute for Cancer Research

From longbow0 at gmail.com Wed Sep 7 16:19:37 2011
From: longbow0 at gmail.com (longbow leo)
Date: Wed, 7 Sep 2011 15:19:37 -0500
Subject: [Bioperl-l] How to determine strains were evolved independently in a phylogenetic tree?
Message-ID:

Hi,

I have created a phylogenetic for a virus protein which contained about 200 strains. Next I need to do an analysis to check whether several strains in this tree were evolved independently. Although it is not too difficult to do manually, I still have litter idea how to do this in a Perl script since there are some datasets need to do.

At first I tried to use the method "is_monophyletic" in the module "Bio::Tree::TreeFunctionsI" to do this analysis, but it seems it doesn't work as I have thought. According to the description of "is_monophyletic" method, it "Will do a test of monophyly for the nodes specified in comparison to a chosen outgroup". Does here test whether the outgroup strain is monophyletic to the nodes, or test the nodes only?
The description sounds like the latter, but what the script did seemed to be the former.

Are there any suggestions?

Thank you very much!

Haizhou Liu

From greg at ebi.ac.uk Thu Sep 8 06:40:30 2011
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Thu, 8 Sep 2011 11:40:30 +0100
Subject: [Bioperl-l] How to determine strains were evolved independently in a phylogenetic tree?
In-Reply-To:
References:
Message-ID:

Hi Haizhou,

I'm not sure I understand exactly what you're trying to do. But to clarify the BioPerl code: the is_monophyletic method (for the actual code, see here https://github.com/bioperl/bioperl-live/blob/master/Bio/Tree/TreeFunctionsI.pm#L832) tests whether the single outgroup node falls *within* or *outside* the last common ancestor of the group of nodes given.

If the outgroup node falls *outside* the subtree defined by this LCA node, then the group of nodes can be called monophyletic with respect to that outgroup (at least as far as my understanding of the word 'monophyletic' goes).

If the outgroup node falls *within* the subtree defined by this LCA node, then the group of nodes is not monophyletic with respect to that outgroup node.

The term "evolved independently" sounds slightly vague to me -- what is it exactly about the shape of your tree that allows you to call a strain independent or not? If you gave an example or two of trees where you consider the evolution to be independent and non-independent, I (or someone else on the list) may be able to help you find the right method to do this automatically.

Cheers,
Greg

On Wed, Sep 7, 2011 at 9:19 PM, longbow leo wrote:

> Hi,
>
> I have created a phylogenetic for a virus protein which contained about 200
> strains. Next I need to do an analysis to check whether several strains in
> this tree were evolved independently. Although it is not too difficult to do
> manually, I still have litter idea how to do this in a Perl script since
> there are some datasets need to do.
>
> At first I tried to use the method "is_monophyletic" in the module
> "Bio::Tree::TreeFunctionsI" to do this analysis, but it seems it doesn't
> work as I have thought. According to the description of "is_monophyletic"
> method, it "Will do a test of monophyly for the nodes specified in
> comparison to a chosen outgroup". Does here test whether the outgroup strain
> is monophyletic to the nodes, or test the nodes only? The description sounds
> like the latter but the what the script did seemed to be the first.
>
> Are there any suggestions?
>
> Thank you very much!
>
>
> Haizhou Liu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From manju.rawat2 at gmail.com Thu Sep 8 02:11:12 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Thu, 8 Sep 2011 02:11:12 -0400
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
Message-ID:

Today I installed the latest version of BioPerl on my system via CPAN, but it is still not showing the complete result.

I just want to do a nucleotide BLAST using BioPerl, but when I BLAST my sequence it shows a very small result. I don't know whether it is wrong or right, but when I BLAST the same sequence at NCBI it shows a different result.

I have also tried to use the original module, but it doesn't work either.

Please see the BLAST result in the file attached to this mail.

#!/usr/bin/perl -w
use Bio::Perl;
use Bio::SearchIO;
$blast_report = blast_sequence('acggctgctgtagatctgatgct');
write_blast(">resl.blast", $blast_report);

Thanks.
Manju Rawat
-------------- next part --------------
A non-text attachment was scrubbed...
Name: resl.blast Type: application/octet-stream Size: 1680 bytes Desc: not available URL: From cjfields at illinois.edu Thu Sep 8 09:05:10 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 8 Sep 2011 13:05:10 +0000 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: <6D4A142B-9455-4CC3-AFDB-F9B3B991B57F@illinois.edu> Submissions to NCBI BLAST via their web interface have different parameters than those submitted via their QBLAST interface (what is used in BioPerl). So the fact there are differing results isn't too surprising, particularly if the results fall close to the e-value cutoff for one or the other. You will need to set the proper parameters, which I don't believe is possible via the (very simple) Bio::Perl interface, but is possible via Bio::Tools::Run::RemoteBlast. chris On Sep 8, 2011, at 1:11 AM, Manju Rawat wrote: > Toady i installed the latest version of bioperl in my system via CPAN.. > But this still not sowing the complete result.. > > > I just want to do nucleotide blast using bioperl..but while i am doing blast with my sequence it shwowing very samll result.. > I dnt know whether it is wrong or right...but while i am blasting the same sequence in NCBI it showing a diffrent result.. > > and i have also tried to use the orignl module..but it also dnt work.. > > Pl see reult of the balst in attached file of this mail.. > #!usr/bin/perl -w > use Bio::Perl; > use Bio::SearchIO; > $blast_report =blast_sequence('acggctgctgtagatctgatgct'); > write_blast(">resl.blast",$blast_report); > > Thanks. > Manju Rawat > > > From David.Messina at sbc.su.se Thu Sep 8 09:33:19 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 8 Sep 2011 15:33:19 +0200 Subject: [Bioperl-l] Fwd: blast result not matching. 
In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: As I think has been said earlier in this thread, it's almost certainly a discrepancy in the BLAST parameters between what the blast_sequence function in the Bio::Perl module is sending, and what the BLAST website is doing. In this case, you have a very short sequence. If you look in the "Algorithm parameters" section of the BLAST web form, you'll see that there is an option that is checked by default, "Automatically adjust parameters for short input sequences". If I uncheck that option, I get the same results as you did when you submitted your BLAST through BioPerl (see http://cl.ly/9ynq). So to get the same results from a BioPerl-submitted BLAST and a BLAST on NCBI's website, you need to have the same parameters. You can set the parameters from BioPerl as described in the documentation: http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Tools/Run/RemoteBlast.pm As Jason said earlier, the blast_sequence function in Bio::Perl is intended as a simple demonstration and uses the default BLAST parameters. That function is a wrapper around the RemoteBlast module. Since you want to do something a little different, I believe you'll need to use the RemoteBlast module directly. Dave On Thu, Sep 8, 2011 at 08:11, Manju Rawat wrote: > Toady i installed the latest version of bioperl in my system via CPAN.. > But this still not sowing the complete result.. > > > I just want to do nucleotide blast using bioperl..but while i am doing > blast > with my sequence it shwowing very samll result.. > I dnt know whether it is wrong or right...but while i am blasting the same > sequence in NCBI it showing a diffrent result.. > > and i have also tried to use the orignl module..but it also dnt work.. > > Pl see reult of the balst in attached file of this mail.. 
> #!usr/bin/perl -w > use Bio::Perl; > use Bio::SearchIO; > $blast_report =blast_sequence('acggctgctgtagatctgatgct'); > write_blast(">resl.blast",$blast_report); > > Thanks. > Manju Rawat > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From abualiga2 at gmail.com Thu Sep 8 10:44:39 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 10:44:39 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag Message-ID: Hi, I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of multiple tags within a primary tag. E.g., when there are several 'function' tag-values within a 'CDS' primary tag, I don't know how to link those 'function' tag-values to a particular 'locus_tag'. As parsed values are returned as a list, I tried creating an array of hashes, where the hash-key is 'locus_tag' and hash-values are multiple 'function' tags, but am failing miserably. Pasted below is what I managed so far. At your convenience, please advise. thanks! 
galeb

#!/usr/local/bin/perl
# parse_gbk.pl
# gsa 09042011
# script to parse out features from gbk
# http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Customizing_Sequence_Object_Construction

use strict; use warnings;
use Bio::SeqIO;

my @loci;
my @seqs;
my @directions;
my @start_coords;
my @end_coords;
my @genes;
my @products;
my @notes;
my @functions;
my %functions;

my $gb_file = shift;
my $seqio_obj = Bio::SeqIO->new(-file => $gb_file );
my $seq_obj = $seqio_obj->next_seq;

for my $feat_obj ( $seq_obj->get_SeqFeatures ) {
    if ( $feat_obj->primary_tag eq ( 'gene' ) ) {
        if ($feat_obj->has_tag( 'locus_tag' ) ) {
            push ( @seqs, $feat_obj->seq->seq ); #collect sequences
            for my $val ( $feat_obj->get_tag_values( 'locus_tag' ) ) {
                push ( @loci, $val ); # locus_tags
            }
        }
        if ( $feat_obj->has_tag( 'gene' ) ) {
            for my $val ( $feat_obj->get_tag_values( 'gene' ) ) {
                push ( @genes, $val ); # gene names
            }
        }
        else {
            push ( @genes, "" ); # if gene names are absent, leave empty
        }
        if ( $feat_obj->location->isa( 'Bio::Location::Simple' ) ) { # gene coordinates
            for my $location ( $feat_obj->location ) {
                push ( @start_coords, $location->start );
                push ( @end_coords, $location->end );
                if ( $location->strand == -1 ) {
                    push ( @directions, "reverse" );
                }
                else {
                    push ( @directions, "forward" );
                }
            }
        }
    }
    # gene products, notes, functions
    if (   $feat_obj->primary_tag eq ( 'CDS' )
        || $feat_obj->primary_tag eq ( 'misc_feature' )
        || $feat_obj->primary_tag eq ( 'ncRNA' )
        || $feat_obj->primary_tag eq ( 'rRNA' )
        || $feat_obj->primary_tag eq ( 'tRNA' )
        || $feat_obj->primary_tag eq ( 'misc_RNA' ) ) {
        if ( $feat_obj->has_tag( 'product' ) ) {
            for my $product ( $feat_obj->get_tag_values( 'product' ) ) {
                push ( @products, $product );
            }
        }
        else {
            push ( @products, "" );
        }
        if ( $feat_obj->has_tag( 'note' ) ) {
            for my $note ( $feat_obj->get_tag_values( 'note' ) ) {
                push ( @notes, $note );
            }
        }
        else {
            push ( @notes, "" );
        }
        if ( $feat_obj->has_tag( 'function' ) ) {
            for my $function ( $feat_obj->get_tag_values( 'function' ) ) {
                push ( @functions, $function );
            }
        }
        else {
            push ( @functions, "" );
        }
    }
}

print "locus\tgene_name\tstart_nt\tend_nt\tlength_nt\tdirection\tproduct\tnote\tfunction\tsequence_nt\n"; # header

for ( my $elem = 0; $elem < scalar @loci; ++$elem ) {
    print $loci[$elem], "\t", $genes[$elem], "\t", $start_coords[$elem], "\t",
        $end_coords[$elem], "\t", length( $seqs[$elem] ), "\t",
        $directions[$elem], "\t", $products[$elem], "\t", $notes[$elem], "\t",
        $functions[$elem], "\t", $seqs[$elem], "\n";
}

From p.j.a.cock at googlemail.com Thu Sep 8 11:27:56 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 8 Sep 2011 16:27:56 +0100
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References:
Message-ID:

On Thu, Sep 8, 2011 at 3:44 PM, galeb abu-ali wrote:
> Hi,
>
> I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of
> multiple tags within a primary tag. E.g., when there are several 'function'
> tag-values within a 'CDS' primary tag, I don't know how to link those
> 'function' tag-values to a particular 'locus_tag'.

Do you have GenBank features with multiple locus_tag qualifiers?
That would be very unusual...

Peter

From cjfields at illinois.edu Thu Sep 8 11:32:21 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 15:32:21 +0000
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References:
Message-ID:

On Sep 8, 2011, at 10:27 AM, Peter Cock wrote:

> On Thu, Sep 8, 2011 at 3:44 PM, galeb abu-ali wrote:
>> Hi,
>>
>> I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of
>> multiple tags within a primary tag. E.g., when there are several 'function'
>> tag-values within a 'CDS' primary tag, I don't know how to link those
>> 'function' tag-values to a particular 'locus_tag'.
>
> Do you have GenBank features with multiple locus_tag qualifiers?
> That would be very unusual...
>
> Peter

Agreed; in order to clarify what you mean, I think we would need to see the record in question to get a better idea of the problem.

chris

From abualiga2 at gmail.com Thu Sep 8 11:39:20 2011
From: abualiga2 at gmail.com (galeb abu-ali)
Date: Thu, 8 Sep 2011 11:39:20 -0400
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References:
Message-ID:

I guess I was not clear. 'locus_tag' qualifiers are single, but there are multiple 'function' qualifiers within a primary feature (e.g. 'CDS').

# gbk file
LOCUS       NC_011748            5154862 bp    DNA     circular BCT 15-MAY-2010

# example feature
     gene            complement(1336169..1337905)
                     /gene="cvrA"
                     /locus_tag="EC55989_1287"
                     /db_xref="GeneID:7145846"
     CDS             complement(1336169..1337905)
                     /gene="cvrA"
                     /locus_tag="EC55989_1287"
                     /function="7 : Transport and binding proteins"
                     /function="15.10 : Adaptations to atypical conditions"
                     /function="16.1 : Circulate"
                     /inference="ab initio prediction:AMIGene:2.0"
                     /note="the Vibrio parahaemolyticus gene VP2867 was found
                     to be a potassium/proton antiporter; can rapidly extrude
                     potassium against a potassium gradient at alkaline pH when
                     cloned and expressed in Escherichia coli"
                     /codon_start=1
                     /transl_table=11
                     /product="potassium/proton antiporter"
                     /protein_id="YP_002402372.1"
                     /db_xref="GI:218694705"
                     /db_xref="GeneID:7145846"
                     /translation="MDATTIISLFILGSILVTSSILLSSFSSRLGIPILVIFLAIGML
                     AGVDGVGGIPFDNYPFAYMVSNLALAIILLDGGMRTQASSFRVALGPALSLATLGVLI
                     TSGLTGMMAAWLFNLDLIEGLLIGAIVGSTDAAAVFSLLGGKGLNERVGSTLEIESGS
                     NDPMAVFLTITLIAMIQQHESSVSWMFVVDILQQFGLGIVIGLGGGYLLLQMINRIAL
                     PAGLYPLLALSGGILIFALTTALEGSGILAVYLCGFLLGNRPIRNRYGILQNFDGLAW
                     LAQIAMFLVLGLLVNPSDLLPIAIPALILSAWMIFFARPLSVFAGLLPFRGFNLRERV
                     FISWVGLRGAVPIILAVFPMMAGLENARLFFNVAFFVVLVSLLLQGTSLSWAAKKAKV
                     VVPPVGRPVSRVGLDIHPENPWEQFVYQLSADKWCVGAALRDLHMPKETRIAALFRDN
                     QLLHPTGSTRLREGDVLCVIGRERDLPALGKLFSQSPPVALDQRFFGDFILEASAKYA
                     DVALIYGLEDGREYRDKQQTLGEIVQQLLGAAPVVGDQVEFAGMIWTVAEKEDNEVLK
                     IGVRVAEEEAES"

On Thu, Sep 8, 2011 at 11:32 AM, Fields,
Christopher J < cjfields at illinois.edu> wrote: > On Sep 8, 2011, at 10:27 AM, Peter Cock wrote: > > > On Thu, Sep 8, 2011 at 3:44 PM, galeb abu-ali > wrote: > >> Hi, > >> > >> I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of > >> multiple tags within a primary tag. E.g., when there are several > 'function' > >> tag-values within a 'CDS' primary tag, I don't know how to link those > >> 'function' tag-values to a particular 'locus_tag'. > > > > Do you have GenBank features with multiple locus_tag qualifiers? > > That would be very unusual... > > > > Peter > > Agreed; in order to clarify what you mean, I think we would need to see the > record in question to get a better idea of the problem. > > chris From p.j.a.cock at googlemail.com Thu Sep 8 11:46:28 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 8 Sep 2011 16:46:28 +0100 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: Message-ID: On Thu, Sep 8, 2011 at 4:39 PM, galeb abu-ali wrote: > I guess I was not clear. 'locus_tag' qualifiers are single, but there are > mutliple 'function' qualifiers within a primary feature (e.g. 'CDS'). So are your intending to look at all the CDS features only, and build a hash using the locus_tag as the key, and a list of the 'function' qualifiers as values? Peter From abualiga2 at gmail.com Thu Sep 8 11:55:08 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 11:55:08 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: Message-ID: So are your intending to look at all the CDS features only, and build a hash using the locus_tag as the key, and a list of the 'function' qualifiers as values? Precisely! I want to create a tab delim file with 'locus_tag' as the common identifier to all the features and gene sequences. 
So far, I parsed out sequences and single instance qualifiers, but 'function' and 'db_xref' qualifiers give me grief. galeb From abualiga2 at gmail.com Thu Sep 8 12:14:07 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 12:14:07 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: I only had a quick look at your code, so maybe I'm missing something but you are currently pushing all products of all CDSs into the same array, i.e. you do not assign them to a datastructure that links a particular CDS to a list of products. You then use the same index to print out a locus from the @loci array and a product from @products, but the two will not match up because you will have more products than loci. That's right. Products are not the issue in this particular case, as it's E.coli and there's no alternate splicing as far as I know so there is a single product per gene. But there are plenty more 'function' qualifiers, for example, than loci. And I don't know how to create a data structure that will link a 'gene' (as primary tag) to all other qualifiers, whether they belong to 'CDS', 'Misc_RNA', 'Misc_feature', or other primary tags. From ss2489 at cornell.edu Thu Sep 8 12:28:40 2011 From: ss2489 at cornell.edu (Surya Saha) Date: Thu, 8 Sep 2011 12:28:40 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: You might want to explore using a hash of complex records that are very similar to structures in C/C++. 
More info at http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS -Surya On Thu, Sep 8, 2011 at 12:14 PM, galeb abu-ali wrote: > I only had a quick look at your code, so maybe I'm missing something but > you are currently pushing all products of all CDSs into the same array, > i.e. you do not assign them to a datastructure that links a particular > CDS to a list of products. You then use the same index to print out a > locus from the @loci array and a product from @products, but the two > will not match up because you will have more products than loci. > > > > That's right. Products are not the issue in this particular case, as it's > E.coli and there's no alternate splicing as far as I know so there is a > single product per gene. But there are plenty more 'function' qualifiers, > for example, than loci. And I don't know how to create a data structure > that > will link a 'gene' (as primary tag) to all other qualifiers, whether they > belong to 'CDS', 'Misc_RNA', 'Misc_feature', or other primary tags. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From fs5 at sanger.ac.uk Thu Sep 8 12:04:57 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Thu, 08 Sep 2011 17:04:57 +0100 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: Message-ID: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> I only had a quick look at your code, so maybe I'm missing something but you are currently pushing all products of all CDSs into the same array, i.e. you do not assign them to a datastructure that links a particular CDS to a list of products. You then use the same index to print out a locus from the @loci array and a product from @products, but the two will not match up because you will have more products than loci. 
Frank On Thu, 2011-09-08 at 10:44 -0400, galeb abu-ali wrote: > Hi, > > I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of > multiple tags within a primary tag. E.g., when there are several 'function' > tag-values within a 'CDS' primary tag, I don't know how to link those > 'function' tag-values to a particular 'locus_tag'. As parsed values are > returned as a list, I tried creating an array of hashes, where the hash-key > is 'locus_tag' and hash-values are multiple 'function' tags, but am failing > miserably. Pasted below is what I managed so far. At your convenience, > please advise. > > thanks! > > galeb > > #!/usr/local/bin/perl > # parse_gbk.pl > # gsa 09042011 > # script to parse out features from gbk > # > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Customizing_Sequence_Object_Construction > > use strict; use warnings; > use Bio::SeqIO; > > my @loci; > my @seqs; > my @directions; > my @start_coords; > my @end_coords; > my @genes; > my @products; > my @notes; > my @functions; > my %functions; > > my $gb_file = shift; > my $seqio_obj = Bio::SeqIO->new(-file => $gb_file ); > my $seq_obj = $seqio_obj->next_seq; > > for my $feat_obj ( $seq_obj->get_SeqFeatures ) { > if ( $feat_obj->primary_tag eq ( 'gene' ) ) { > if ($feat_obj->has_tag( 'locus_tag' ) ) { > push ( @seqs, $feat_obj->seq->seq ); #collect sequences > for my $val ( $feat_obj->get_tag_values( 'locus_tag' ) ) > { > push ( @loci, $val ); # locus_tags > } > } > if ( $feat_obj->has_tag( 'gene' ) ) { > for my $val ( $feat_obj->get_tag_values( 'gene' ) > ) { > push ( @genes, $val ); # gene names > } > } > else { > push ( @genes, "" ); # if gene names are absent, leave > empty > } > if ( $feat_obj->location->isa( 'Bio::Location::Simple' ) ) { # gene > coordinates > for my $location ( $feat_obj->location ) { > push ( @start_coords, $location->start ); > push ( @end_coords, $location->end ); > if ( $location->strand == -1 ) { > push ( @directions, "reverse" ); > } > else { > push ( 
@directions, "forward" ); > } > } > } > } > # gene products, notes, functions > if ( $feat_obj->primary_tag eq ( 'CDS' ) || $feat_obj->primary_tag eq ( > 'misc_feature' ) || $feat_obj->primary_tag eq ( 'ncRNA' ) || > $feat_obj->primary_tag eq ( 'rRNA' ) || $feat_obj->primary_tag eq ( 'tRNA' ) > || $feat_obj->primary_tag eq ( 'misc_RNA' ) ) { > if ( $feat_obj->has_tag( 'product' ) ) { > for my $product ( $feat_obj->get_tag_values( 'product' ) ) { > push ( @products, $product ); > } > } > else { > push ( @products, "" ); > } > if ( $feat_obj->has_tag( 'note' ) ) { > for my $note ( $feat_obj->get_tag_values( 'note' ) ) { > push ( @notes, $note ); > } > } > else { > push ( @notes, "" ); > } > if ( $feat_obj->has_tag( 'function' ) ) { > for my $function ( $feat_obj->get_tag_values( 'function' ) ) { > push ( @functions, $function ); > } > } > else { > push ( @functions, "" ); > } > > } > } > > print > "locus\tgene_name\tstart_nt\tend_nt\tlength_nt\tdirection\tproduct\tnote\tfunction\tsequence_nt\n"; > # header > > for ( my $elem = 0; $elem < scalar @loci; ++$elem ) { > print $loci[$elem], "\t",$genes[$elem], "\t", $start_coords[$elem], > "\t", $end_coords[$elem], "\t", length( $seqs[$elem] ), "\t", > $directions[$elem], "\t", $products[$elem], "\t", $notes[$elem], "\t", > $functions[$elem], "\t", $seqs[$elem], "\n"; > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
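
The linking structure described above (each locus tied to its own list of values, rather than parallel arrays that drift out of step) can be sketched as a hash of arrays keyed by locus_tag. This is only a minimal illustration in plain Perl, not code posted to the list: the (locus_tag, function) pairs are hard-coded stand-ins for what a feature loop would produce, EC55989_1287 and its three functions come from the record posted in this thread, and EC_HYPOTHETICAL, %functions_for and format_row are made-up names for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# (locus_tag, function) pairs, standing in for what a loop over
# features would yield. The EC55989_1287 values are taken from the
# record posted in this thread.
my @pairs = (
    [ 'EC55989_1287', '7 : Transport and binding proteins' ],
    [ 'EC55989_1287', '15.10 : Adaptations to atypical conditions' ],
    [ 'EC55989_1287', '16.1 : Circulate' ],
);

# Hash of arrays: each locus_tag maps to its own list of functions,
# so the number of functions per locus no longer matters.
my %functions_for;
for my $pair (@pairs) {
    my ($locus, $function) = @$pair;
    push @{ $functions_for{$locus} }, $function;
}

# EC_HYPOTHETICAL is an invented locus with no /function qualifier;
# it gets an empty list so it still appears in the output.
$functions_for{'EC_HYPOTHETICAL'} ||= [];

# Build one tab-delimited row: locus, then all functions in one column.
sub format_row {
    my ($locus, $funcs) = @_;
    return join "\t", $locus, join(', ', @$funcs);
}

print format_row($_, $functions_for{$_}), "\n" for sort keys %functions_for;
```

Because every list hangs off its own key, a locus with three functions and a locus with none both produce exactly one output row, which is the property the index-based printing loop in the original script cannot guarantee.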
From cjfields at illinois.edu Thu Sep 8 12:51:22 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 16:51:22 +0000
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

There is no need to do that if one is using the Bio::SeqFeatureI interface. Note that get_tag_values always returns a list, so to snag a single value for a tag in a scalar, force list context on the LHS by enclosing the variable in ().

chris

-----------------------------
#!/usr/bin/env perl

use Modern::Perl;
use Bio::SeqIO;

my $in = Bio::SeqIO->new(-format => 'genbank',
                         -file   => shift);

while (my $seq = $in->next_seq) {
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'CDS';
        my ($locus) = $feat->has_tag('locus_tag') ?
                      $feat->get_tag_values('locus_tag') : '';
        my @funcs = $feat->has_tag('function') ?
                    $feat->get_tag_values('function') : ();
        say join("\t", $locus, join(',', @funcs));
    }
}

On Sep 8, 2011, at 11:28 AM, Surya Saha wrote:

> You might want to explore using a hash of complex records that are very
> similar to structures in C/C++. More info at
> http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS
>
> -Surya
>
> On Thu, Sep 8, 2011 at 12:14 PM, galeb abu-ali wrote:
>
>> I only had a quick look at your code, so maybe I'm missing something but
>> you are currently pushing all products of all CDSs into the same array,
>> i.e. you do not assign them to a datastructure that links a particular
>> CDS to a list of products. You then use the same index to print out a
>> locus from the @loci array and a product from @products, but the two
>> will not match up because you will have more products than loci.
>>
>>
>>
>> That's right. Products are not the issue in this particular case, as it's
>> E.coli and there's no alternate splicing as far as I know so there is a
>> single product per gene.
But there are plenty more 'function' qualifiers, >> for example, than loci. And I don't know how to create a data structure >> that >> will link a 'gene' (as primary tag) to all other qualifiers, whether they >> belong to 'CDS', 'Misc_RNA', 'Misc_feature', or other primary tags. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From abualiga2 at gmail.com Thu Sep 8 12:51:42 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 12:51:42 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: You might want to explore using a hash of complex records that are very similar to structures in C/C++. More info at http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS alright, thanks! From jskittrell at unmc.edu Thu Sep 8 12:40:31 2011 From: jskittrell at unmc.edu (Jeff S Kittrell) Date: Thu, 8 Sep 2011 11:40:31 -0500 Subject: [Bioperl-l] Error when parsing a blast file Message-ID: An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Thu Sep 8 13:28:53 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 8 Sep 2011 17:28:53 +0000 Subject: [Bioperl-l] Error when parsing a blast file In-Reply-To: References: Message-ID: <1F3664C5-6D6B-409C-BE5A-C5EB08975231@illinois.edu> What version of bioperl are you using? I think this issue was addressed a while ago, but it's possible there has been a regression. chris On Sep 8, 2011, at 11:40 AM, Jeff S Kittrell wrote: > Hello Gentlemen, > > I am using BioPerl to a parse a blast output file but have run into some difficulties. 
I've pin pointed the problem and have pasted an example below. If you look at query position 223-224 you will see a large insertion 65ish nucleotides. Since the insertion spans the entire line there are no nucleotide position numbers at the end or beginning of the line nor any nucleotides within the line (dashes only). > When the SearchIO parser encounters this record it dies with the error > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: no data for midline Query ------------------------------------------------------------ > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:368 > STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl5/Bio/SearchIO/blast.pm:1805 > STACK: BlastParseNucleotideForDBTopHitCONTIGSQUERY.pl:24 > ----------------------------------------------------------- > > > Has anyone encountered this problem before? Am I doing something wrong? > > Thanks > > Jeff Kittrell > Department of Genetics, Cell Biology & Anatomy > University of Nebraska Medical Center > 985805 Nebraska Medical Center > Omaha, NE 68198-5805 > > Query= 78065535 > > Length=523 > Score E > Sequences producing significant alignments: (Bits) Value > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled... 
576 1e-163 > > > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled > receptor 123 (GPR123), mRNA > Length=4298 > > Score = 576 bits (638), Expect = 1e-163 > Identities = 466/583 (80%), Gaps = 82/583 (14%) > Strand=Plus/Minus > > Query 1 CAGGACTCCGTGG-----ATGGCATCTCGGGCAGGGCCACGCTGGGGTCTGGGTGGGTCC 55 > ||||||||||||| | ||||||||||||||||||| |||||||||| |||||||| > Sbjct 2537 CAGGACTCCGTGGGCAGCAGGGCATCTCGGGCAGGGCCATGCTGGGGTCTCAGTGGGTCC 2478 > > Query 56 TTTGATGGAAGCCCCTGCTCTGCCTCTGGGGCGCCCCAGGACTGGAGGCCACAGGACAGA 115 > |||||||||| |||||||||||||||| ||| ||||||||||||||| |||||||||||| > Sbjct 2477 TTTGATGGAATCCCCTGCTCTGCCTCTAGGGTGCCCCAGGACTGGAGACCACAGGACAGA 2418 > > Query 116 AACCAGATGACCTTGTGCAGGGACGAGCACGTGGAACTGGGATAAAAGGAGTGGGCGTGG 175 > |||| ||||||| ||||| ||||| |||||| |||| |||||||| ||||||||||||| > Sbjct 2417 AACCGGATGACCGTGTGC-GGGACCAGCACGCGGAATTGGGATAAGGGGAGTGGGCGTGG 2359 > > Query 176 CCCAGAGCTTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGT------------ 223 > ||| |||| ||||||||||||||||||||||||||||||||||||||| > Sbjct 2358 CCCGGAGCGTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGTGTGATCACAAGG 2299 > > Query ------------------------------------------------------------ > > Sbjct 2298 AAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGG 2239 > > Query 224 ---GTGAACTGCTTCCGAAAGGTGGGGTCACTTTGGTGCCCCCAGTGACCTCATGTGGCA 280 > |||||| ||||| |||||| |||||||||| ||| |||||||||||||||||||||| > Sbjct 2238 GGTGTGAACGGCTTCTGAAAGGCGGGGTCACTTCGGTACCCCCAGTGACCTCATGTGGCA 2179 > > Query 281 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACTGTGTCCCCTG-CTCCGCC 339 > ||||||||||||||||||||||||||||||||||||||||| |||||| ||| | || | > Sbjct 2178 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACCGTGTCCTCTGCCCCCATC 2119 > > Query 340 TACACAGTAGTTTCATTTTTCCAGGGTCCTGTTCGGATGTTGCCGGTCCCATCGGTGCCA 399 > |||||||||||||| |||||||||||||| |||||||||||||||||||| ||||||||| > Sbjct 2118 TACACAGTAGTTTCGTTTTTCCAGGGTCCCGTTCGGATGTTGCCGGTCCCGTCGGTGCCA 2059 > > Query 400 AACGGCAGGTCTTCTAGCAAGTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 459 > 
||||||||| |||||||||| ||||||||||||||||||||||||||||||||||||||| > Sbjct 2058 AACGGCAGGCCTTCTAGCAATTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 1999 > > Query 460 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAGGTGACCAGGCC 502 > ||||||||||||||||||||||||||||||| ||| || |||| > Sbjct 1998 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAAGTGGCCGGGCC 1956 > > > > Lambda K H > 0.634 0.408 0.912 > > Gapped > Lambda K H > 0.625 0.410 0.780 > > Effective search space used: 47712920310 > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From abualiga2 at gmail.com Thu Sep 8 13:51:34 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 13:51:34 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: thanks, Chris! works perfect. To make sure I understand what's going on, forcing list context on $locus allows me to get one value at a time, which is then concatenated with \t to concatenated functions. thanks again! galeb On Thu, Sep 8, 2011 at 12:51 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > There is no need to do that if one is using the Bio::SeqFeatureI interface. > Note that get_tag_values always returns a list, so to snag a single value > for a tag in a scalar, force list context on the LHS by enclosing the > variable in (). > > chris > > ----------------------------- > #!/usr/bin/env perl > > use Modern::Perl; > use Bio::SeqIO; > > my $in = Bio::SeqIO->new(-format => 'genbank', > -file => shift); > > while (my $seq = $in->next_seq) { > for my $feat ($seq->get_SeqFeatures) { > next unless $feat->primary_tag eq 'CDS'; > my ($locus) = $feat->has_tag('locus_tag') ? > $feat->get_tag_values('locus_tag') : ''; > my @funcs = $feat->has_tag('function') ? 
> $feat->get_tag_values('function') : (); > say join("\t", $locus, join(',', @funcs)); > } > } > > > > From cjfields at illinois.edu Thu Sep 8 14:27:06 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 8 Sep 2011 18:27:06 +0000 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <2CDD8237-140E-410B-A18C-68FAF54719D5@illinois.edu> On Sep 8, 2011, at 12:51 PM, galeb abu-ali wrote: > thanks, Chris! works perfect. > To make sure I understand what's going on, forcing list context on $locus allows me to get one value at a time,... You have to be careful in this circumstance; doing this: my $foo = @bar; is scalar context on a list, which returns the number of elements in @bar. The following my ($foo) = @bar; forces list context and assigns the first value in @bar to $foo but tosses the rest. If you are sure there is only one value in @bar anyway, the above is fine (and is a common perl idiom). > which is then concatenated with \t to concatenated functions. I'm just using a simple join() to print off the results. Note the second element in the join list is an embedded join() with comma-sep values for functions. chris > thanks again! > > galeb > > On Thu, Sep 8, 2011 at 12:51 PM, Fields, Christopher J wrote: > There is no need to do that if one is using the Bio::SeqFeatureI interface. Note that get_tag_values always returns a list, so to snag a single value for a tag in a scalar, force list context on the LHS by enclosing the variable in (). > > chris > > ----------------------------- > #!/usr/bin/env perl > > use Modern::Perl; > use Bio::SeqIO; > > my $in = Bio::SeqIO->new(-format => 'genbank', > -file => shift); > > while (my $seq = $in->next_seq) { > for my $feat ($seq->get_SeqFeatures) { > next unless $feat->primary_tag eq 'CDS'; > my ($locus) = $feat->has_tag('locus_tag') ? 
> $feat->get_tag_values('locus_tag') : ''; > my @funcs = $feat->has_tag('function') ? > $feat->get_tag_values('function') : (); > say join("\t", $locus, join(',', @funcs)); > } > } > > > From cjfields at illinois.edu Thu Sep 8 14:30:06 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 8 Sep 2011 18:30:06 +0000 Subject: [Bioperl-l] Error when parsing a blast file In-Reply-To: References: <1F3664C5-6D6B-409C-BE5A-C5EB08975231@illinois.edu> Message-ID: Try updating to the latest CPAN release (1.6.901, which is the pre-1.7 release). chris On Sep 8, 2011, at 1:19 PM, Jeff S Kittrell wrote: > chris, > > I am using version 1.6.1 > > Thanks, > > > Jeff Kittrell > Department of Genetics, Cell Biology & Anatomy > University of Nebraska Medical Center > 985805 Nebraska Medical Center > Omaha, NE 68198-5805 > > "Fields, Christopher J" ---09/08/2011 12:28:56 PM---What version of bioperl are you using? I think this issue was addressed a while ago, but it's possi > > > From: > > "Fields, Christopher J" > > To: > > Jeff S Kittrell > > Cc: > > " " > > Date: > > 09/08/2011 12:28 PM > > Subject: > > Re: [Bioperl-l] Error when parsing a blast file > > > > What version of bioperl are you using? I think this issue was addressed a while ago, but it's possible there has been a regression. > > chris > > On Sep 8, 2011, at 11:40 AM, Jeff S Kittrell wrote: > > > Hello Gentlemen, > > > > I am using BioPerl to parse a blast output file but have run into some difficulties. I've pinpointed the problem and have pasted an example below. If you look at query position 223-224 you will see a large insertion of 65ish nucleotides. Since the insertion spans the entire line, there are no nucleotide position numbers at the end or beginning of the line, nor any nucleotides within the line (dashes only). 
> > When the SearchIO parser encounters this record it dies with the error > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: no data for midline Query ------------------------------------------------------------ > > STACK: Error::throw > > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:368 > > STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl5/Bio/SearchIO/blast.pm:1805 > > STACK: BlastParseNucleotideForDBTopHitCONTIGSQUERY.pl:24 > > ----------------------------------------------------------- > > > > > > Has anyone encountered this problem before? Am I doing something wrong? > > > > Thanks > > > > Jeff Kittrell > > Department of Genetics, Cell Biology & Anatomy > > University of Nebraska Medical Center > > 985805 Nebraska Medical Center > > Omaha, NE 68198-5805 > > > > Query= 78065535 > > > > Length=523 > > Score E > > Sequences producing significant alignments: (Bits) Value > > > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled... 
576 1e-163 > > > > > > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled > > receptor 123 (GPR123), mRNA > > Length=4298 > > > > Score = 576 bits (638), Expect = 1e-163 > > Identities = 466/583 (80%), Gaps = 82/583 (14%) > > Strand=Plus/Minus > > > > Query 1 CAGGACTCCGTGG-----ATGGCATCTCGGGCAGGGCCACGCTGGGGTCTGGGTGGGTCC 55 > > ||||||||||||| | ||||||||||||||||||| |||||||||| |||||||| > > Sbjct 2537 CAGGACTCCGTGGGCAGCAGGGCATCTCGGGCAGGGCCATGCTGGGGTCTCAGTGGGTCC 2478 > > > > Query 56 TTTGATGGAAGCCCCTGCTCTGCCTCTGGGGCGCCCCAGGACTGGAGGCCACAGGACAGA 115 > > |||||||||| |||||||||||||||| ||| ||||||||||||||| |||||||||||| > > Sbjct 2477 TTTGATGGAATCCCCTGCTCTGCCTCTAGGGTGCCCCAGGACTGGAGACCACAGGACAGA 2418 > > > > Query 116 AACCAGATGACCTTGTGCAGGGACGAGCACGTGGAACTGGGATAAAAGGAGTGGGCGTGG 175 > > |||| ||||||| ||||| ||||| |||||| |||| |||||||| ||||||||||||| > > Sbjct 2417 AACCGGATGACCGTGTGC-GGGACCAGCACGCGGAATTGGGATAAGGGGAGTGGGCGTGG 2359 > > > > Query 176 CCCAGAGCTTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGT------------ 223 > > ||| |||| ||||||||||||||||||||||||||||||||||||||| > > Sbjct 2358 CCCGGAGCGTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGTGTGATCACAAGG 2299 > > > > Query ------------------------------------------------------------ > > > > Sbjct 2298 AAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGG 2239 > > > > Query 224 ---GTGAACTGCTTCCGAAAGGTGGGGTCACTTTGGTGCCCCCAGTGACCTCATGTGGCA 280 > > |||||| ||||| |||||| |||||||||| ||| |||||||||||||||||||||| > > Sbjct 2238 GGTGTGAACGGCTTCTGAAAGGCGGGGTCACTTCGGTACCCCCAGTGACCTCATGTGGCA 2179 > > > > Query 281 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACTGTGTCCCCTG-CTCCGCC 339 > > ||||||||||||||||||||||||||||||||||||||||| |||||| ||| | || | > > Sbjct 2178 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACCGTGTCCTCTGCCCCCATC 2119 > > > > Query 340 TACACAGTAGTTTCATTTTTCCAGGGTCCTGTTCGGATGTTGCCGGTCCCATCGGTGCCA 399 > > |||||||||||||| |||||||||||||| |||||||||||||||||||| ||||||||| > > Sbjct 2118 TACACAGTAGTTTCGTTTTTCCAGGGTCCCGTTCGGATGTTGCCGGTCCCGTCGGTGCCA 2059 > > 
> > Query 400 AACGGCAGGTCTTCTAGCAAGTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 459 > > ||||||||| |||||||||| ||||||||||||||||||||||||||||||||||||||| > > Sbjct 2058 AACGGCAGGCCTTCTAGCAATTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 1999 > > > > Query 460 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAGGTGACCAGGCC 502 > > ||||||||||||||||||||||||||||||| ||| || |||| > > Sbjct 1998 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAAGTGGCCGGGCC 1956 > > > > > > > > Lambda K H > > 0.634 0.408 0.912 > > > > Gapped > > Lambda K H > > 0.625 0.410 0.780 > > > > Effective search space used: 47712920310 > > > > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From abualiga2 at gmail.com Thu Sep 8 14:34:41 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 14:34:41 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: <2CDD8237-140E-410B-A18C-68FAF54719D5@illinois.edu> References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> <2CDD8237-140E-410B-A18C-68FAF54719D5@illinois.edu> Message-ID: many thanks again, Chris! I was reading Programming Perl, but this sums it up better. On Thu, Sep 8, 2011 at 2:27 PM, Fields, Christopher J wrote: > On Sep 8, 2011, at 12:51 PM, galeb abu-ali wrote: > > > thanks, Chris! works perfect. > > To make sure I understand what's going on, forcing list context on $locus > allows me to get one value at a time,... > > You have to be careful in this circumstance; doing this: > > my $foo = @bar; > > is scalar context on a list, which returns the number of elements in @bar. > The following > > my ($foo) = @bar; > > forces list context and assigns the first value in @bar to $foo but tosses > the rest. If you are sure there is only one value in @bar anyway, the above > is fine (and is a common perl idiom). > > > which is then concatenated with \t to concatenated functions. 
> > I'm just using a simple join() to print off the results. Note the second > element in the join list is an embedded join() with comma-sep values for > functions. > > chris > > > thanks again! > > > > galeb > > > > On Thu, Sep 8, 2011 at 12:51 PM, Fields, Christopher J < > cjfields at illinois.edu> wrote: > > There is no need to do that if one is using the Bio::SeqFeatureI > interface. Note that get_tag_values always returns a list, so to snag a > single value for a tag in a scalar, force list context on the LHS by > enclosing the variable in (). > > > > chris > > > > ----------------------------- > > #!/usr/bin/env perl > > > > use Modern::Perl; > > use Bio::SeqIO; > > > > my $in = Bio::SeqIO->new(-format => 'genbank', > > -file => shift); > > > > while (my $seq = $in->next_seq) { > > for my $feat ($seq->get_SeqFeatures) { > > next unless $feat->primary_tag eq 'CDS'; > > my ($locus) = $feat->has_tag('locus_tag') ? > > $feat->get_tag_values('locus_tag') : ''; > > my @funcs = $feat->has_tag('function') ? > > $feat->get_tag_values('function') : (); > > say join("\t", $locus, join(',', @funcs)); > > } > > } > > > > > > > > From David.Messina at sbc.su.se Fri Sep 9 05:40:25 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 9 Sep 2011 11:40:25 +0200 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: Hi Manju, > But this is not showing all query coverage as it shows in simple blast. (see > attached file) > I'm not sure what you mean by query coverage here, as the blast report you attached doesn't (as far as I can see) include a calculation of the number or percentage of query bases covered. But in any case, everything in that blast report is available in the Bio::SearchIO object that Bio::Tools::Run::RemoteBlast returns. Have you taken a look at http://www.bioperl.org/wiki/HOWTO:SearchIO ? 
That, along with the module documentation, should help you find the parts of the BLAST report you're looking for. > and I also want to write that result to a blast file. Is there any method > which can write the remoteblast output > to a file with a blast extension? > It is possible to write out the results in a format that closely resembles the native blast report, but it's not recommended. If you want to just run BLAST and get back a report, there's no need to use BioPerl to parse the report first and then recreate the report. This might also be a good time to mention that, if you're doing more than a few hundred BLAST searches, you'll find it much more efficient to download the database and the BLAST program from NCBI and run them on your own computer. NCBI severely limits the speed and frequency of remote BLASTs, and furthermore remote searching is much more prone to failure. Also, if you're using BLAST+, you can run your BLASTs on NCBI's computers remotely without BioPerl. Check out the -remote command-line option; it's my favorite new feature! Dave From David.Messina at sbc.su.se Fri Sep 9 06:53:01 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 9 Sep 2011 12:53:01 +0200 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: If you don't want to learn how to do this in BioPerl, then take my previous suggestion and just use NCBI's tools: > Also, if you're using BLAST+, you can run your BLASTs on NCBI's computers > remotely without BioPerl. Check out the -remote command-line option On Fri, Sep 9, 2011 at 12:07, Manju Rawat wrote: > I don't know much about BioPerl.... > and I just want to blast my sequences using BioPerl... > and see the result in a file... > Please tell me what I should do. 
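A minimal sketch of what reading a saved report through Bio::SearchIO can look like, here listing the queries that came back with no hits at all. It assumes BioPerl is installed and that the report is a plain-text BLAST file passed as the first argument; the helper names queries_without_hits and hit_counts_from_report are made up for illustration, while query_name and num_hits are accessors from the Bio::Search::Result::ResultI interface.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: given [query_name, hit_count] pairs,
# return the names of the queries that had zero hits.
sub queries_without_hits {
    my @pairs = @_;
    return map { $_->[0] } grep { $_->[1] == 0 } @pairs;
}

# Hypothetical helper: read a BLAST report with Bio::SearchIO and
# collect one [query_name, hit_count] pair per result (per query).
sub hit_counts_from_report {
    my ($file) = @_;
    require Bio::SearchIO;    # loaded only when a report is actually parsed
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);
    my @pairs;
    while (my $result = $in->next_result) {
        push @pairs, [ $result->query_name, $result->num_hits ];
    }
    return @pairs;
}

# Usage: perl no_hits.pl seqs.blast
if (@ARGV) {
    print "$_\n" for queries_without_hits(hit_counts_from_report($ARGV[0]));
}
```

Keeping the filtering step separate from the parsing step means the same filter could be reused on counts gathered from several reports, for example one per database searched.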
> From manju.rawat2 at gmail.com Fri Sep 9 07:05:57 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Fri, 9 Sep 2011 07:05:57 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: I want to learn, and I am learning it from the start. My main goal is to write a program that reports the sequences which have no BLAST result (no matches in any database, or in a particular database). For this I have to run BLAST many times, but I am not getting the desired result from BLAST; this is the main problem I am facing. Please tell me what procedure I should follow. Manju From cjfields at illinois.edu Fri Sep 9 09:03:26 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 9 Sep 2011 13:03:26 +0000 Subject: [Bioperl-l] blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: If you are planning on searching against 'everything' (e.g. nt or nr), and you have many sequences to run, I would follow Dave's suggestion and download BLAST locally. chris On Sep 9, 2011, at 6:05 AM, Manju Rawat wrote: > I want to learn, and I am learning it from the start. > My main goal is to write a program that reports the sequences > which have no BLAST result (no matches in any database, or in a > particular database). > For this I have to run BLAST many times, but I am not getting the desired result > from BLAST; this is the main problem I am facing. > Please tell me what procedure I should follow. 
> > > Manju From manju.rawat2 at gmail.com Fri Sep 9 05:01:55 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Fri, 9 Sep 2011 05:01:55 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: Thanks to all. It's working. I tried that module and got the following result in the terminal: waiting......db is All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) hit name is ref|NM_181451.1| score is 240 hit name is ref|NM_001008415.1| score is 234 hit name is ref|XM_002706247.1| score is 212 hit name is ref|XM_002683856.1| score is 208 hit name is gb|EF197120.1| score is 208 hit name is ref|XR_083566.1| score is 198 hit name is ref|NM_001097567.1| score is 198 hit name is ref|NM_001098089.1| score is 198 hit name is ref|XM_002699708.1| score is 192 hit name is ref|XM_592786.5| score is 192 hit name is ref|XM_001251693.3| score is 192 hit name is gb|AF490400.1| score is 190 hit name is gb|AY075103.1| score is 190 hit name is ref|XR_083457.1| score is 178 But this is not showing all query coverage as it shows in simple blast (see attached file), and I also want to write that result to a blast file. Is there any method which can write the remoteblast output to a file with a blast extension? Thanks Manju Rawat. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: res.blast Type: application/octet-stream Size: 218976 bytes Desc: not available URL: From ross at cuhk.edu.hk Sat Sep 10 02:39:23 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Sat, 10 Sep 2011 14:39:23 +0800 Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file Message-ID: <048a01cc6f84$60c41090$224c31b0$@edu.hk> I use the following code to derive the distance between two nodes but an error "MSG: could not find the lca of supplied nodes; can't find distance either" What's the problem? use Bio::TreeIO; ($treefh) = @ARGV; my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick"); my $tree = $treeio->next_tree; $keyword="Mycobacterium_tuberculosis_H37Rv"; my $Tnodes = $tree->find_node(-id => $keyword); $keyword="Mycobacterium_smegmatis_str._MC2_155"; my $Mnodes = $tree->find_node(-id => $keyword); $keyword="Mycobacterium_abscessus"; my $Anodes = $tree->find_node(-id => $keyword); my @root = $tree->get_root_node; #my $distances = $tree->distance(-nodes => [$node[0],$root]); my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]); print "Dist:$distances\n"; #### the following is the infile (((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. 
6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1. 4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC 
_35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. 
_ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; From greg at ebi.ac.uk Sat Sep 10 11:39:52 2011 From: greg at ebi.ac.uk (Gregory Jordan) Date: Sat, 10 Sep 2011 16:39:52 +0100 Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file In-Reply-To: <048a01cc6f84$60c41090$224c31b0$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> Message-ID: Hi Ross, Which version of BioPerl are you using? 
With the refactored tree code (available from the tree_api_refresh branch on the BioPerl github repo: https://github.com/bioperl/bioperl-live/blob/topic/tree_api_refresh/Bio/Tree/NodeFunctionsI.pm#L406) the following script works for me. Do those values look sensible to you? The code on the new branch is a bit experimental, so I wouldn't be surprised if all the edge cases for calculations like this aren't covered. --greg use Bio::TreeIO; my $treeio = new Bio::TreeIO(-file => 'temp.nh', -format => "newick"); my $tree = $treeio->next_tree; my $mt = $tree->find("Mycobacterium_tuberculosis_H37Rv"); my $ms = $tree->find("Mycobacterium_smegmatis_str._MC2_155"); my $ma = $tree->find("Mycobacterium_abscessus"); my $distance = $mt->distance($ma); print "MT - MA: ".$mt->distance($ma)."\n"; print "MT - MS: ".$mt->distance($ms)."\n"; print "MS - MA: ".$ms->distance($ma)."\n"; # MT - MA: 0.24326 # MT - MS: 0.18573 # MS - MA: 0.20729 --greg On Sat, Sep 10, 2011 at 7:39 AM, Ross KK Leung wrote: > I use the following code to derive the distance between two nodes but an > error "MSG: could not find the lca of supplied nodes; can't find distance > either" > > > > What's the problem? 
> > > > use Bio::TreeIO; > > > > ($treefh) = @ARGV; > > > > my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick"); > > my $tree = $treeio->next_tree; > > > > $keyword="Mycobacterium_tuberculosis_H37Rv"; > > my $Tnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_smegmatis_str._MC2_155"; > > my $Mnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_abscessus"; > > my $Anodes = $tree->find_node(-id => $keyword); > > > > my @root = $tree->get_root_node; > > #my $distances = $tree->distance(-nodes => [$node[0],$root]); > > > > my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]); > > print "Dist:$distances\n"; > > > > > > #### the following is the infile > > > (((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM > > u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M > > ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: > > 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac > > terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM > > u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu > > berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac > > terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 > > -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium > > _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. > > 6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis > > _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1. 
> > 4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 > > .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac > > terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My > > cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc > > occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e > > rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo > > coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D > > ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne > > bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 > > 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r > > esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory > > nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. > > 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory > > nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu > > m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, > > ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii > > _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 > > 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot > > uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
> > 2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 > > 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: > > 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol > > ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 > > 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 > > :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st > > riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu > > m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. > > 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 > > )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N > > RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi > > ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ > > sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( > > Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 > > 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are > > nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr > > omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 > > 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 > > 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 > > 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 > > .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. > > 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 > > )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea > > e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
> > 02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ > > Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT > > CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ > > SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte > > r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 > > 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac > > eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ > > actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane > > nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) > > 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph > > ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC > > C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 > > 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 > > 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo > > ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ > > sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen > > anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) > > 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line > > ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: > > 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta > > xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 > > 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ > > taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco > > sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 > > 
8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC > > _35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 > > .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 > > 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 > > 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom > > yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s > > tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od > > ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 > > 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce > > llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 > > 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ > > 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. > > 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ > > DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces > > _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 > > 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep > > tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 > > 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc > > es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida > > ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s > > viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 > > .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 > > :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S > > treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 > > 
63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 > > :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. > > _ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept > > omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 > > E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. > > 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci > > dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass > > onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo > > monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 > > 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 > > 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra > > nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 > > )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 > > 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. > > 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos > > us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro > > metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ > > neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 > > 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu > > m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 > > )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte > > rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
> > 03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC > > C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact > > erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 > > 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 > > 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube > > rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium > > _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu > > m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 > > 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- > > 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 > > :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 > > E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis > > _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t > > uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc > > ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri > > um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium > > _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba > > cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K > > ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E > -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ross at cuhk.edu.hk Sat Sep 10 19:06:44 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Sun, 11 Sep 2011 07:06:44 +0800 Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file In-Reply-To: References: 
<048a01cc6f84$60c41090$224c31b0$@edu.hk>
Message-ID: <04a601cc700e$4f051d10$ed0f5730$@edu.hk>

Hi Greg,

The values are correct! However, how do I install this bioperl-live module? My BioPerl is 1.6.1, and I get an error:

Can't locate object method "find" via package "Bio::Tree::Tree" at TreeCalDist.pl line 32, line 1.

my $mt = $tree->find($keyword); #line 32

From: gjuggler at gmail.com [mailto:gjuggler at gmail.com] On Behalf Of Gregory Jordan
Sent: 10 September 2011 23:40
To: bioperl-l List; Ross KK Leung
Subject: Re: [Bioperl-l] fail to obtain node-to-node distance from a newick file

Hi Ross,

Which version of BioPerl are you using?

With the refactored tree code (available from the tree_api_refresh branch on the BioPerl github repo: https://github.com/bioperl/bioperl-live/blob/topic/tree_api_refresh/Bio/Tree/NodeFunctionsI.pm#L406) the following script works for me. Do those values look sensible to you? The code on the new branch is a bit experimental, so I wouldn't be surprised if all the edge cases for calculations like this aren't covered.

--greg

use Bio::TreeIO;

my $treeio = new Bio::TreeIO(-file => 'temp.nh', -format => "newick");
my $tree = $treeio->next_tree;
my $mt = $tree->find("Mycobacterium_tuberculosis_H37Rv");
my $ms = $tree->find("Mycobacterium_smegmatis_str._MC2_155");
my $ma = $tree->find("Mycobacterium_abscessus");
my $distance = $mt->distance($ma);
print "MT - MA: ".$mt->distance($ma)."\n";
print "MT - MS: ".$mt->distance($ms)."\n";
print "MS - MA: ".$ms->distance($ma)."\n";
# MT - MA: 0.24326
# MT - MS: 0.18573
# MS - MA: 0.20729

--greg

On Sat, Sep 10, 2011 at 7:39 AM, Ross KK Leung wrote:

I use the following code to derive the distance between two nodes, but I get the error "MSG: could not find the lca of supplied nodes; can't find distance either"

What's the problem?
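One likely cause of the "could not find the lca" warning in the script quoted below: the tuberculosis node is stored in $Tnodes but passed to distance() as $Tnode, which is never set, so one of the two nodes is undefined. Under "use strict" that typo would be reported at compile time. The following is only a minimal sketch against the find_node() and distance() methods of the stock tree object, assuming a small made-up tree in place of the real file; the helper names leaf and leaf_distance are illustrative and not part of BioPerl.

```perl
use strict;
use warnings;
use Bio::TreeIO;

# Small stand-in tree (hypothetical data, not the poster's file).
my $newick = '((A:0.1,B:0.2)90:0.05,(C:0.3,D:0.4)80:0.06);';
open my $fh, '<', \$newick or die "Cannot open in-memory tree: $!";

my $treeio = Bio::TreeIO->new(-fh => $fh, -format => 'newick');
my $tree   = $treeio->next_tree or die "No tree parsed";

# Look up a node by id and fail loudly if it is missing,
# instead of silently passing undef on to distance().
sub leaf {
    my ($t, $id) = @_;
    my $node = $t->find_node(-id => $id);
    defined $node or die "No node with id '$id'";
    return $node;
}

# Sum of branch lengths on the path between two named nodes.
sub leaf_distance {
    my ($t, $id_a, $id_b) = @_;
    return $t->distance(-nodes => [ leaf($t, $id_a), leaf($t, $id_b) ]);
}

printf "A - B: %.2f\n", leaf_distance($tree, 'A', 'B');
printf "A - C: %.2f\n", leaf_distance($tree, 'A', 'C');
```

With the real data, the same two helpers could be called with the full taxon names, for example "Mycobacterium_tuberculosis_H37Rv" and "Mycobacterium_abscessus".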
use Bio::TreeIO; ($treefh) = @ARGV; my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick"); my $tree = $treeio->next_tree; $keyword="Mycobacterium_tuberculosis_H37Rv"; my $Tnodes = $tree->find_node(-id => $keyword); $keyword="Mycobacterium_smegmatis_str._MC2_155"; my $Mnodes = $tree->find_node(-id => $keyword); $keyword="Mycobacterium_abscessus"; my $Anodes = $tree->find_node(-id => $keyword); my @root = $tree->get_root_node; #my $distances = $tree->distance(-nodes => [$node[0],$root]); my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]); print "Dist:$distances\n"; #### the following is the infile (((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. 6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1. 
4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC 
_35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. 
_ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From manju.rawat2 at gmail.com Mon Sep 12 01:37:35 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Mon, 12 Sep 2011 01:37:35 -0400 Subject: [Bioperl-l] no blast result Message-ID: Hello, I want to make a program which first generate the random sequence and then gives me that result(sequence) which have no blast result(no 
matches in any database or in a particular database). Is there anybody who can help me do this?

Please reply if anybody knows about it.
Thanks
Manju

From zhangchnxp at gmail.com Mon Sep 12 01:51:59 2011
From: zhangchnxp at gmail.com (Zhang chn)
Date: Mon, 12 Sep 2011 13:51:59 +0800
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID:

Hi,

IMHO, due to the nature of BLAST, a random sequence will usually not come back with no results at all; instead you get a set of matches with lower scores. What you can do is focus on the e-value, say, by setting a threshold on it.

FYI, http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html

On Mon, Sep 12, 2011 at 1:37 PM, Manju Rawat wrote:

> Hello,
> I want to make a program which first generate the random sequence and then
> gives me that result(sequence) which have no blast result(no matches in any
> database/or particular database).Is there any body who can help me in doing
> this.
>
> Pl reply if anybody knows about it..
> Thanks
> Manju

From manju.rawat2 at gmail.com Mon Sep 12 01:58:38 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Mon, 12 Sep 2011 01:58:38 -0400
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID:

Yes, I know this, and results with lower scores are also useful to me.
But how could I do this?

Manju

From zhangchnxp at gmail.com Mon Sep 12 02:04:17 2011
From: zhangchnxp at gmail.com (Zhang chn)
Date: Mon, 12 Sep 2011 14:04:17 +0800
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID:

Please read the documentation for Bio::Tools::Run::StandAloneBlast and Bio::AlignIO.

On Mon, Sep 12, 2011 at 1:58 PM, Manju Rawat wrote:

> Ya i know this....And it is also in my use if i get result with lower
> scores.
> But how could I do this?
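One way to act on the e-value suggestion in this thread, as a rough sketch only: parse an existing BLAST report with Bio::SearchIO and list the queries that have no hit at or below a chosen threshold. The sub name queries_without_hits, the script name no_hits.pl and the default cutoff of 1e-5 are assumptions made for illustration; none of them comes from the posts above.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Return the names of queries that have no hit with an
# e-value at or below $max_evalue in the given BLAST report.
sub queries_without_hits {
    my ($report_file, $max_evalue) = @_;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $report_file);
    my @unmatched;
    while (my $result = $in->next_result) {
        my $has_significant_hit = 0;
        while (my $hit = $result->next_hit) {
            my $evalue = $hit->significance;
            # Skip hits whose e-value could not be parsed.
            next unless defined $evalue && length $evalue;
            if ($evalue <= $max_evalue) {
                $has_significant_hit = 1;
                last;
            }
        }
        push @unmatched, $result->query_name unless $has_significant_hit;
    }
    return @unmatched;
}

# Usage (hypothetical file names): perl no_hits.pl report.blast [max_evalue]
if (@ARGV) {
    my ($report_file, $max_evalue) = @ARGV;
    $max_evalue = 1e-5 unless defined $max_evalue;
    print "$_\n" for queries_without_hits($report_file, $max_evalue);
}
```

The report itself still has to be produced by a locally installed BLAST; the sketch only filters what BLAST has already written.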
From manju.rawat2 at gmail.com Mon Sep 12 07:12:40 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Mon, 12 Sep 2011 07:12:40 -0400
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID:

I read this, but the default program is not running properly. It shows many errors, such as:

MSG: cannot find path to blastall
Use of uninitialized value $_[0] in join or string at /usr/share/perl/5.10/File/Spec/Unix.pm line 41.

And it is not showing the output I want.

Please help me.

Manju Rawat

From arguelloj at gmail.com Sun Sep 11 22:52:42 2011
From: arguelloj at gmail.com (J. Fernando Arguello)
Date: Sun, 11 Sep 2011 19:52:42 -0700
Subject: [Bioperl-l] BioPerl - quick general question
Message-ID:

Dear BioPerl,

I'm excited to see a project like this! Basically I have a computer science background with a few years of development, research and minimal bioinformatics experience.

Dumb question: where on the BioPerl wiki(s) is the best place to begin for a developer who wants to contribute new code or bug fixes to BioPerl in the future?

Any input is much appreciated. Thank you all for your time.

Best,
Fernando
jfa

From briano at bioteam.net Mon Sep 12 09:20:36 2011
From: briano at bioteam.net (Brian Osborne)
Date: Mon, 12 Sep 2011 09:20:36 -0400
Subject: [Bioperl-l] Fwd: cds sequence extract
References: <112c4ef2.641e.1325c4b21cb.Coremail.maliang7121@163.com>
Message-ID: <671CAF11-55A4-462A-BC5B-805C87E1EB0E@bioteam.net>

Liang Ma,

I'm forwarding this to the Bioperl mailing list. If you're starting out with Bioperl I suggest you read this:

http://www.bioperl.org/wiki/HOWTO:Beginners

Brian O.

Begin forwarded message:

> From: maliang7121
> Date: September 12, 2011 2:20:20 AM EDT
> To: briano at bioteam.net
> Subject: cds sequence extract
>
> Dear Brian:
>
> I am a student at the Chinese Academy of Sciences. I have begun to love BioPerl, but now I have a problem.
>
> Using the script in the attachment, I can easily download sequences from NCBI. Now I need to extract the CDS sequences from the GenBank-format files and put them all in a single file in FASTA format, but I cannot do it. Could you spend a few minutes writing a script for me?
>
> Best!
>
> Liang Ma

Brian O.

--
Brian Osborne, PhD
BioTeam: http://bioteam.net
email: briano at bioteam.net
mobile: 978-317-3101

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: acc.txt
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: get_seq_by_acc_ml.pl
Type: text/x-perl-script
Size: 583 bytes
Desc: not available
URL:

From fs5 at sanger.ac.uk Mon Sep 12 09:54:21 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 12 Sep 2011 14:54:21 +0100
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>

It looks like BLAST is not installed on your system. The BioPerl module only runs BLAST for you and parses the output, but you still need the BLAST executables installed on your system. Follow the instructions on the NCBI website to download and install BLAST, and try running it on the command line with the "blastall" command. If that works, then you can also run it via BioPerl.

Frank

On Mon, 2011-09-12 at 07:12 -0400, Manju Rawat wrote:
> I read this..but default program is not runnig fine.it showing many error
> that
>
> MSG: cannot find path to blastall..
> Use of uninitialized value $_[0] in join or string at
> /usr/share/perl/5.10/File/Spec/Unix.pm line 41.
>
> Am this this is not showing output which i want..
>
> Pl help me..
>
> Manju Rawat

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From p.j.a.cock at googlemail.com Mon Sep 12 10:00:30 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 12 Sep 2011 15:00:30 +0100
Subject: [Bioperl-l] no blast result
In-Reply-To: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

On Mon, Sep 12, 2011 at 2:54 PM, Frank Schwach wrote:
> looks like BLAST is not install on your system. The BioPerl module only
> runs BLAST for you and parses the output but you still need the BLAST
> executables installed on your system. Follow the instructions on the
> NCBI website to download and install BLAST and try running it on the
> commandline with the "blastall" command. If that works then you can run
> it also via BioPerl.
> Frank

Hang on - blastall is from the "legacy" BLAST suite. Does BioPerl still talk to that, or to the new BLAST+ suite (e.g. the binaries blastn and blastp rather than blastall)?

Peter

From cjfields at illinois.edu Mon Sep 12 13:45:56 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 12 Sep 2011 17:45:56 +0000
Subject: [Bioperl-l] BioPerl - quick general question
In-Reply-To:
References:
Message-ID: <62B9B300-96AC-4511-A1B9-CFF36CBE6288@illinois.edu>

On Sep 11, 2011, at 9:52 PM, J. Fernando Arguello wrote:

> Dear BioPerl,
>
> I'm excited to see a project like this! Basically I have a computer science
> background with a few years of development, research and minimal
> bioinformatics experience.
>
> Dumb question...where is the best place for a developer to begin on the
> BioPerl wiki(s), who is wanting to contribute new code or bug fixes to
> BioPerl in the future?

The basic starting point: the HOWTOs and the tutorial (not sure how up-to-date some of the latter are; in general they should work):

http://www.bioperl.org/wiki/HOWTOs
http://www.bioperl.org/wiki/Tutorials

> Any input is much appreciated. Thank you all for your time.
>
> Best,
> Fernando
> jfa

We gladly welcome anyone willing to hack on BioPerl. The repository is now on github (core is https://github.com/bioperl/bioperl-live), so it's fairly easy to fork the code and make changes. We are in the middle of splitting up the large codebase into more manageable subdistributions, so it's probably a good idea to ask on list about specific code in case the code in question resides in a separate repository.

Let us know if you have additional questions! Cheers!

chris

From shalabh.sharma7 at gmail.com Mon Sep 12 14:00:16 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 12 Sep 2011 14:00:16 -0400
Subject: [Bioperl-l] Module for SOCS
Message-ID:

Hi All,

I am using SOCS for mapping my SOLiD data. I was just wondering if there is any module in BioPerl to analyze SOCS output files, or the mapreads format, directly.

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From greg at ebi.ac.uk Tue Sep 13 04:30:58 2011
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Tue, 13 Sep 2011 09:30:58 +0100
Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file
In-Reply-To: <04a601cc700e$4f051d10$ed0f5730$@edu.hk>
References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk>
Message-ID:

Hi Ross,

I don't typically 'install' versions of BioPerl from GitHub.
Rather, I check out the code into a directory that's on my Perl search path (and make sure any other BioPerl code isn't on the path anymore). I think the following commands should get you the right set of code:

> git clone git://github.com/bioperl/bioperl-live.git
> cd bioperl-live
> git checkout topic/tree_api_refresh

After that, I'm afraid I'll have to leave it to you (or someone else on the list). I'm no Perl guru, so I don't know the "right" way to direct Perl towards a development BioPerl branch.

Cheers,
Greg

2011/9/11 Ross KK Leung

> Hi Greg,
>
> The values are correct! However, how to install this bioperl-live module?
> my bioperl is 1.6.1 but there's an error:
>
> Can't locate object method "find" via package "Bio::Tree::Tree" at
> TreeCalDist.pl line 32, line 1.
>
> my $mt = $tree->find($keyword); #line 32
>
> From: gjuggler at gmail.com [mailto:gjuggler at gmail.com] On Behalf Of Gregory
> Jordan
> Sent: 2011年9月10日 23:40
> To: bioperl-l List; Ross KK Leung
> Subject: Re: [Bioperl-l] fail to obtain node-to-node distance from a
> newick file
>
> Hi Ross,
>
> Which version of BioPerl are you using?
>
> With the refactored tree code (available from the tree_api_refresh branch
> on the BioPerl github repo:
> https://github.com/bioperl/bioperl-live/blob/topic/tree_api_refresh/Bio/Tree/NodeFunctionsI.pm#L406)
> the following script works for me. Do those values look sensible to you?
> The code on the new branch is a bit experimental, so I wouldn't be surprised if
> all the edge cases for calculations like this aren't covered.
>
> --greg
>
> use Bio::TreeIO;
>
> my $treeio = new Bio::TreeIO(-file => 'temp.nh', -format => "newick");
> my $tree = $treeio->next_tree;
> my $mt = $tree->find("Mycobacterium_tuberculosis_H37Rv");
> my $ms = $tree->find("Mycobacterium_smegmatis_str._MC2_155");
> my $ma = $tree->find("Mycobacterium_abscessus");
> my $distance = $mt->distance($ma);
> print "MT - MA: ".$mt->distance($ma)."\n";
> print "MT - MS: ".$mt->distance($ms)."\n";
> print "MS - MA: ".$ms->distance($ma)."\n";
> # MT - MA: 0.24326
> # MT - MS: 0.18573
> # MS - MA: 0.20729
>
> --greg
>
> On Sat, Sep 10, 2011 at 7:39 AM, Ross KK Leung wrote:
>
> I use the following code to derive the distance between two nodes but an
> error "MSG: could not find the lca of supplied nodes; can't find distance
> either"
>
>
>
> What's the problem?
> > > > use Bio::TreeIO; > > > > ($treefh) = @ARGV; > > > > my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick"); > > my $tree = $treeio->next_tree; > > > > $keyword="Mycobacterium_tuberculosis_H37Rv"; > > my $Tnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_smegmatis_str._MC2_155"; > > my $Mnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_abscessus"; > > my $Anodes = $tree->find_node(-id => $keyword); > > > > my @root = $tree->get_root_node; > > #my $distances = $tree->distance(-nodes => [$node[0],$root]); > > > > my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]); > > print "Dist:$distances\n"; > > > > > > #### the following is the infile > > > (((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM > > u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M > > ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: > > 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac > > terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM > > u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu > > berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac > > terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 > > -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium > > _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. > > 6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis > > _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1. 
> > 4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 > > .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac > > terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My > > cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc > > occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e > > rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo > > coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D > > ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne > > bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 > > 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r > > esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory > > nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. > > 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory > > nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu > > m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, > > ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii > > _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 > > 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot > > uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
> > 2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 > > 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: > > 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol > > ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 > > 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 > > :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st > > riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu > > m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. > > 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 > > )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N > > RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi > > ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ > > sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( > > Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 > > 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are > > nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr > > omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 > > 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 > > 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 > > 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 > > .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. > > 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 > > )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea > > e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
> > 02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ > > Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT > > CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ > > SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte > > r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 > > 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac > > eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ > > actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane > > nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) > > 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph > > ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC > > C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 > > 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 > > 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo > > ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ > > sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen > > anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) > > 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line > > ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: > > 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta > > xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 > > 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ > > taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco > > sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 > > 
8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC > > _35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 > > .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 > > 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 > > 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom > > yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s > > tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od > > ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 > > 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce > > llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 > > 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ > > 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. > > 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ > > DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces > > _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 > > 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep > > tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 > > 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc > > es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida > > ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s > > viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 > > .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 > > :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S > > treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 > > 
63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 > > :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. > > _ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept > > omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 > > E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. > > 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci > > dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass > > onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo > > monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 > > 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 > > 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra > > nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 > > )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 > > 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. > > 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos > > us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro > > metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ > > neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 > > 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu > > m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 > > )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte > > rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
> > 03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC > > C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact > > erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 > > 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 > > 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube > > rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium > > _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu > > m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 > > 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- > > 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 > > :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 > > E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis > > _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t > > uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc > > ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri > > um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium > > _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba > > cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K > > ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E > -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l**** > > ** ** > From manju.rawat2 at gmail.com Tue Sep 13 07:20:07 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Tue, 13 Sep 2011 07:20:07 -0400 Subject: [Bioperl-l] no blast result In-Reply-To: References: 
<1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

This is the Perl code:

#!/usr/bin/perl -w
use Bio::Perl;
use Bio::SearchIO;
use Bio::Tools::Run::StandAloneBlast;
@params = ('database' => 'swissprot',
           'READMETHOD' => 'Blastn');

$factory = Bio::Tools::Run::StandAloneBlast->new(@params);

$input = Bio::Seq->new(-id=>"testquery",-seq=>"gatcgtataccgtacagct");
$blast_report = $factory->blastall($input);

write_blast(">rs.blast",$blast_report);

It shows the following error:

Use of uninitialized value $_[0] in join or string at /usr/share/perl/5.10/File/Spec/Unix.pm line 41.

MSG: cannot find path to blastall

From fs5 at sanger.ac.uk Tue Sep 13 11:09:24 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 13 Sep 2011 16:09:24 +0100
Subject: [Bioperl-l] no blast result
In-Reply-To:
References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID: <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk>

Peter: in BioPerl 1.6 the default executable name in Bio::Tools::Run::StandAloneBlast is still set to "blastall" - I'm not sure if it works with blast+ too.

Manju: as I said previously, you need to check that you can run BLAST on the command line, i.e. make sure it is actually installed on your system. Have you done that?
You can also check the Bio::Tools::Run::StandAloneBlast docs to see how you can manually set the path to your BLAST executable if it is not in your path. You have to install BLAST first before you can run this module.
The other error you get from your code refers to something that is outside of the code fragment you show here, so I can't comment on that one.
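For reference, the command-line check described above can be roughly sketched in plain Perl. This is only an illustration under stated assumptions: `find_executable` is a hypothetical helper (it is not part of BioPerl), it uses nothing but the core File::Spec module, and it assumes a Unix-like system where the program names carry no file extension.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper, not part of BioPerl: return the full path of the
# first executable file called $name in the directories listed in PATH,
# or undef if none is found.
sub find_executable {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x $candidate;
    }
    return;
}

# blastall is the legacy BLAST driver; blastn is one of the BLAST+ binaries.
for my $program (qw(blastall blastn)) {
    my $path    = find_executable($program);
    my $message = defined $path
        ? "$program found at $path"
        : "$program not found on PATH";
    print "$message\n";
}
```

If neither program is reported as found, the "cannot find path to blastall" message is expected, and BLAST has to be installed (or its directory added to PATH) before the BioPerl wrapper can run it.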
Frank On Tue, 2011-09-13 at 07:20 -0400, Manju Rawat wrote: > this is the perl code > > #!usr/bin/perl -w > use Bio::Perl; > use Bio::SearchIO > use Bio::Tools::Run::StandAloneBlast; > @params = ('database' => 'swissprot', > 'READMETHOD' => 'Blastn'); > > $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > > > $input = Bio::Seq->new(-id=>"testquery",-seq=>"gatcgtataccgtacagct"); > $blast_report = $factory->blastall($input); > > > write_blast(">rs.blast",$blast_report); > > > It showing error that > > > Use of uninitialized value $_[0] in join or string > at /usr/share/perl/5.10/File/Spec/Unix.pm line 41. > > MSG: cannot find path to blastall > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From David.Messina at sbc.su.se Tue Sep 13 11:34:20 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 13 Sep 2011 17:34:20 +0200 Subject: [Bioperl-l] no blast result In-Reply-To: <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: There's a separate Bio::Tools::Run::BlastPlus module for blast+. And a related HOWTO: http://www.bioperl.org/wiki/HOWTO:BlastPlus On Tue, Sep 13, 2011 at 17:09, Frank Schwach wrote: > Peter: in BioPerl 1.6 the default executable name in Bio::Tools::Run > StandAloneBlast is still set to "blastall" - I'm not sure if it works > with blast+ too. > > Manju: as I said previously, you need to check that you can run BLAST on > the command line, i.e. make sure it is actually installed on your > system. Have you done that? > You can also check the Bio::Tools::Run::StandAloneBlast docs to see how > you can manually set the path to your BLAST executable if it is not in > your path. 
You have to install BLAST fisrt before you can run this > module. > The other error you get from yuor code refers to something that is > outside of the code fragment you show here, so can't comment on that > one. > > Frank > > > On Tue, 2011-09-13 at 07:20 -0400, Manju Rawat wrote: > > this is the perl code > > > > #!usr/bin/perl -w > > use Bio::Perl; > > use Bio::SearchIO > > use Bio::Tools::Run::StandAloneBlast; > > @params = ('database' => 'swissprot', > > 'READMETHOD' => 'Blastn'); > > > > $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > > > > > > $input = Bio::Seq->new(-id=>"testquery",-seq=>"gatcgtataccgtacagct"); > > $blast_report = $factory->blastall($input); > > > > > > write_blast(">rs.blast",$blast_report); > > > > > > It showing error that > > > > > > Use of uninitialized value $_[0] in join or string > > at /usr/share/perl/5.10/File/Spec/Unix.pm line 41. > > > > MSG: cannot find path to blastall > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Tue Sep 13 15:36:21 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 13 Sep 2011 19:36:21 +0000 Subject: [Bioperl-l] no blast result In-Reply-To: References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <6570DEC6-B485-44B0-868A-AAC6329B3224@illinois.edu> On Sep 12, 2011, at 9:00 AM, Peter Cock wrote: > On Mon, Sep 12, 2011 at 2:54 PM, Frank Schwach wrote: >> looks like BLAST is not install on your system. 
>> The BioPerl module only
>> runs BLAST for you and parses the output but you still need the BLAST
>> executables installed on your system. Follow the instructions on the
>> NCBI website to download and install BLAST and try running it on the
>> commandline with the "blastall" command. If that works then you can run
>> it also via BioPerl.
>> Frank
>
> Hang on - blastall is from the "legacy" BLAST suite, does
> BioPerl still talk to that or the new BLAST+ suite (e.g. binaries
> blastn and blastp rather than blastall)?
>
> Peter

(aside: thought I sent this the other day. never mix grant writing and open source)

Both BLAST and BLAST+ are supported via different modules. Some users don't want to use BLAST+ for various reasons, though this may soon be out of their control when NCBI eventually stops supporting legacy BLAST entirely.

chris

From manju.rawat2 at gmail.com Wed Sep 14 07:32:19 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Wed, 14 Sep 2011 07:32:19 -0400
Subject: [Bioperl-l] no blast result
In-Reply-To:
References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> <6570DEC6-B485-44B0-868A-AAC6329B3224@illinois.edu>
Message-ID:

On Wed, Sep 14, 2011 at 7:31 AM, Manju Rawat wrote:

> I am trying to install BLAST+ on my system (Ubuntu) following this page:
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html
> but I am getting an error.
>
> First I downloaded BLAST (ncbi-blast-2.2.25+-ia32-linux.tar.gz) from
> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
> and extracted it into the home/abc/ folder.
> After that I set the path in the terminal, i.e.
>
> PATH=$PATH:/home/abc/blast-2.2.25+/bin
>
> But when I run blast -help in the terminal, it shows this error:
>
> error while loading shared libraries:
> libbz2.so.1: cannot open shared object file: No such file or directory.
>
> --
Regards
Manju Rawat
Project Assistant (NAIP)
Genomics Lab
ABTC, NDRI
Karnal-132001, Haryana

From kumarsaurabh20 at gmail.com Thu Sep 15 07:20:47 2011
From: kumarsaurabh20 at gmail.com (kumar Saurabh)
Date: Thu, 15 Sep 2011 13:20:47 +0200
Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux
Message-ID:

Hi,

I need to integrate the primer3 module into one of our pipelines. In the process, I was testing the initial code given on the CPAN website. But whenever I try to run this program, it gives me the error "Cannot locate the Object method add_target via the package Bio::Tools::Run::Primer3Redux...."

The code I am using is as follows:

# design some primers.
# the output will be put into temp.out
use Bio::Tools::Primer3Redux;
use Bio::Tools::Run::Primer3Redux;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file=>'sample.dna');
my $seq = $seqio->next_seq;

my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out",
                                                 -path => "/home/singh/Downloads/primer3-2.2.3/src/primer3_core");

# or after the fact you can change the program_name
$primer3->program_name('my_superfast_primer3');

unless ($primer3->executable) {
    print STDERR "primer3 can not be found. Is it installed?\n";
    exit(-1)
}

# set the maximum and minimum Tm of the primer
$primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90);

# Design the primers. This runs primer3 and returns a
# Bio::Tools::Primer3::result object with the results
# Primer3 can run in several modes (see explanation for
# 'PRIMER_TASK' in the primer3 documentation).
# To run a task,
# either call it by its PRIMER_TASK name as in these examples:
$pcr_primer_results = $primer3->pick_pcr_primers($seq);
$pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq );
$check_results = $primer3->check_primers();

# Alternatively, explicitly set the PRIMER_TASK parameter and
# use the generic 'run' method (this is mainly here for backwards
# compatibility) :
$primer3->PRIMER_TASK( 'pick_left_only' );
$result = $primer3->run( $seq );

# If no task is set and the 'run' method is called, primer3 will default to
# pick pcr primers.

# see the Bio::Tools::Primer3Redux POD for
# things that you can get from this. For example:

print "There were ", $results->num_primer_pairs, " primer pairs\n";

Can anyone help me with this?

Best regards,
Kumar

From fs5 at sanger.ac.uk Thu Sep 15 09:44:03 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 15 Sep 2011 14:44:03 +0100
Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux
In-Reply-To:
References:
Message-ID: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk>

Hi Kumar,

We are currently working on this module and you might want to check out the latest version on Chris Fields' GitHub project:

https://github.com/cjfields/Bio-Tools-Primer3Redux

There will probably be some changes again once I get some time to work on a few points we discussed lately. You can also check out my repo here:
https://github.com/fschwach/Bio-Tools-Primer3Redux

but I will certainly have to make changes to that code because I used AUTOLOAD in the last version, which is probably not a good idea.
My recommendation would be to use Chris' repo and see if that works for you. If not, feedback would be much appreciated.

Cheers,

Frank

On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote:
> Hi,
>
> I need to integrate the primer3 module in one of our pipeline. In a process,
> I was testing the initial code given on the CPAN website.
But whenever I try > to run this program its giving me error...that "Cannot locate the Object > method add_target via the package Bio::Tools:Run::Primer3Redux...." > > The line of codes I am using is as follows: > > # design some primers. > # the output will be put into temp.out > use Bio::Tools::Primer3Redux; > use Bio::Tools::Run::Primer3Redux; > use Bio::SeqIO; > > my $seqio = Bio::SeqIO->new(-file=>'sample.dna'); > my $seq = $seqio->next_seq; > > my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out", > -path => > "/home/singh/Downloads/primer3-2.2.3/src/primer3_core"); > > # or after the fact you can change the program_name > $primer3->program_name('my_superfast_primer3'); > > unless ($primer3->executable) { > print STDERR "primer3 can not be found. Is it installed?\n"; > exit(-1) > } > > # set the maximum and minimum Tm of the primer > $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90); > > # Design the primers. This runs primer3 and returns a > # Bio::Tools::Primer3::result object with the results > # Primer3 can run in several modes (see explanation for > # 'PRIMER_TASK' in the primer3 doccumentation). To run a task, > # either call it by its PRIMER_TASK name as in these examples: > $pcr_primer_results = $primer3->pick_pcr_primers($seq); > $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq ); > $check_results = $primer3->check_primers(); > > # Alternatively, explicitly set the PRIMER_TASK parameter and > # use the generic 'run' method (this is mainly here for backwards > # compatibility) : > $primer3->PRIMER_TASK( 'pick_left_only' ); > $result = $primer3->run( $seq ); > > # If no task is set and the 'run' method is called, primer3 will default > to > # pick pcr primers. > > # see the Bio::Tools::Primer3Redux POD for > # things that you can get from this. For example: > > print "There were ", $results->num_primer_pairs, " primer pairs\n"; > > > Can anyone help me with this??? 
> > > Best regards, > Kumar > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at illinois.edu Thu Sep 15 10:13:38 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 15 Sep 2011 14:13:38 +0000 Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux In-Reply-To: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: I mentioned off-list that this should be filed as a github issue so we don't lose track. Unfortunately I can't get to it until next week (grant deadline). chris On Sep 15, 2011, at 8:44 AM, Frank Schwach wrote: > Hi Kumar, > > We are currently working on this module and you might want to check out > the latest version on Chris Field's github project: > > https://github.com/cjfields/Bio-Tools-Primer3Redux > > There will probably be some changes again once I get some time again to > work on a few points we discussed lately. You can also check out my repo > here: > https://github.com/fschwach/Bio-Tools-Primer3Redux > > but I will certainly have to make changes to that code because I used > AUTOLAD in the last version, which is probably not a good idea. > My recommendation would be to use Chris' repo and see if that works for > you. If not, feedback would be much appreciated. > > Cheers, > > Frank > > > > > On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote: >> Hi, >> >> I need to integrate the primer3 module in one of our pipeline. In a process, >> I was testing the initial code given on the CPAN website. 
But whenever I try >> to run this program its giving me error...that "Cannot locate the Object >> method add_target via the package Bio::Tools:Run::Primer3Redux...." >> >> The line of codes I am using is as follows: >> >> # design some primers. >> # the output will be put into temp.out >> use Bio::Tools::Primer3Redux; >> use Bio::Tools::Run::Primer3Redux; >> use Bio::SeqIO; >> >> my $seqio = Bio::SeqIO->new(-file=>'sample.dna'); >> my $seq = $seqio->next_seq; >> >> my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out", >> -path => >> "/home/singh/Downloads/primer3-2.2.3/src/primer3_core"); >> >> # or after the fact you can change the program_name >> $primer3->program_name('my_superfast_primer3'); >> >> unless ($primer3->executable) { >> print STDERR "primer3 can not be found. Is it installed?\n"; >> exit(-1) >> } >> >> # set the maximum and minimum Tm of the primer >> $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90); >> >> # Design the primers. This runs primer3 and returns a >> # Bio::Tools::Primer3::result object with the results >> # Primer3 can run in several modes (see explanation for >> # 'PRIMER_TASK' in the primer3 doccumentation). To run a task, >> # either call it by its PRIMER_TASK name as in these examples: >> $pcr_primer_results = $primer3->pick_pcr_primers($seq); >> $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq ); >> $check_results = $primer3->check_primers(); >> >> # Alternatively, explicitly set the PRIMER_TASK parameter and >> # use the generic 'run' method (this is mainly here for backwards >> # compatibility) : >> $primer3->PRIMER_TASK( 'pick_left_only' ); >> $result = $primer3->run( $seq ); >> >> # If no task is set and the 'run' method is called, primer3 will default >> to >> # pick pcr primers. >> >> # see the Bio::Tools::Primer3Redux POD for >> # things that you can get from this. 
For example: >> >> print "There were ", $results->num_primer_pairs, " primer pairs\n"; >> >> >> Can anyone help me with this??? >> >> >> Best regards, >> Kumar >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From fs5 at sanger.ac.uk Thu Sep 15 10:43:48 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Thu, 15 Sep 2011 15:43:48 +0100 Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux In-Reply-To: References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <1316097828.3797.700.camel@deskpro15336.internal.sanger.ac.uk> I also haven't had the time yet to work on this again but, yes, we need to make sure we don't loose track of where we are. On Thu, 2011-09-15 at 14:13 +0000, Fields, Christopher J wrote: > I mentioned off-list that this should be filed as a github issue so we don't lose track. Unfortunately I can't get to it until next week (grant deadline). > > chris > > On Sep 15, 2011, at 8:44 AM, Frank Schwach wrote: > > > Hi Kumar, > > > > We are currently working on this module and you might want to check out > > the latest version on Chris Field's github project: > > > > https://github.com/cjfields/Bio-Tools-Primer3Redux > > > > There will probably be some changes again once I get some time again to > > work on a few points we discussed lately. 
You can also check out my repo > > here: > > https://github.com/fschwach/Bio-Tools-Primer3Redux > > > > but I will certainly have to make changes to that code because I used > > AUTOLAD in the last version, which is probably not a good idea. > > My recommendation would be to use Chris' repo and see if that works for > > you. If not, feedback would be much appreciated. > > > > Cheers, > > > > Frank > > > > > > > > > > On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote: > >> Hi, > >> > >> I need to integrate the primer3 module in one of our pipeline. In a process, > >> I was testing the initial code given on the CPAN website. But whenever I try > >> to run this program its giving me error...that "Cannot locate the Object > >> method add_target via the package Bio::Tools:Run::Primer3Redux...." > >> > >> The line of codes I am using is as follows: > >> > >> # design some primers. > >> # the output will be put into temp.out > >> use Bio::Tools::Primer3Redux; > >> use Bio::Tools::Run::Primer3Redux; > >> use Bio::SeqIO; > >> > >> my $seqio = Bio::SeqIO->new(-file=>'sample.dna'); > >> my $seq = $seqio->next_seq; > >> > >> my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out", > >> -path => > >> "/home/singh/Downloads/primer3-2.2.3/src/primer3_core"); > >> > >> # or after the fact you can change the program_name > >> $primer3->program_name('my_superfast_primer3'); > >> > >> unless ($primer3->executable) { > >> print STDERR "primer3 can not be found. Is it installed?\n"; > >> exit(-1) > >> } > >> > >> # set the maximum and minimum Tm of the primer > >> $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90); > >> > >> # Design the primers. This runs primer3 and returns a > >> # Bio::Tools::Primer3::result object with the results > >> # Primer3 can run in several modes (see explanation for > >> # 'PRIMER_TASK' in the primer3 doccumentation). 
To run a task, > >> # either call it by its PRIMER_TASK name as in these examples: > >> $pcr_primer_results = $primer3->pick_pcr_primers($seq); > >> $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq ); > >> $check_results = $primer3->check_primers(); > >> > >> # Alternatively, explicitly set the PRIMER_TASK parameter and > >> # use the generic 'run' method (this is mainly here for backwards > >> # compatibility) : > >> $primer3->PRIMER_TASK( 'pick_left_only' ); > >> $result = $primer3->run( $seq ); > >> > >> # If no task is set and the 'run' method is called, primer3 will default > >> to > >> # pick pcr primers. > >> > >> # see the Bio::Tools::Primer3Redux POD for > >> # things that you can get from this. For example: > >> > >> print "There were ", $results->num_primer_pairs, " primer pairs\n"; > >> > >> > >> Can anyone help me with this??? > >> > >> > >> Best regards, > >> Kumar > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > The Wellcome Trust Sanger Institute is operated by Genome Research > > Limited, a charity registered in England with number 1021457 and a > > company registered in England with number 2742969, whose registered > > office is 215 Euston Road, London, NW1 2BE. > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
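Archive note: the error quoted in this thread ("Cannot locate the Object method ...") is what Perl reports when code calls a method that the installed version of a module does not define. A minimal sketch of checking for that before calling anything from a synopsis is given below. The class `My::Runner` and the helper `missing_methods` are hypothetical stand-ins for illustration only and are not part of Primer3Redux.

```perl
use strict;
use warnings;

# Stand-in class for illustration only (hypothetical); in real use the
# object would be the wrapper returned by the module's own constructor.
{
    package My::Runner;
    sub new            { return bless {}, shift }
    sub set_parameters { my ($self, %p) = @_; @{$self}{ keys %p } = values %p; return $self }
}

# Return the names in @methods that $obj does not implement.
sub missing_methods {
    my ($obj, @methods) = @_;
    return grep { !$obj->can($_) } @methods;
}

my $runner  = My::Runner->new;
my @missing = missing_methods($runner, qw(set_parameters add_targets));
print "Not implemented by ", ref($runner), ": @missing\n" if @missing;
```

Running the same check against the real wrapper object would show whether the installed module matches the synopsis being followed, which is the mismatch suspected in this thread.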
From manju.rawat2 at gmail.com Fri Sep 16 01:09:25 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Fri, 16 Sep 2011 01:09:25 -0400 Subject: [Bioperl-l] no blast result In-Reply-To: <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: Hello Frank, Yes,u r rite..I tried to run blast in terminal but its not working.. I have installed the latest version of blast and download the database correctly.. But when i running blastn-help command in terminal it showing me error that blastn: error while loading shared libraries: libbz2.so.1: cannot open shared object file: No such file or directory. and when i am running the blastall command then it showing that *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out* Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/ legacy_blast.pl line 85. Program failed, try executing the command manually. While i have set the path of environment variable PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin I have checked everything but not able tp fine the error.. Pl help me. Manju From manju.rawat2 at gmail.com Fri Sep 16 01:12:03 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Fri, 16 Sep 2011 01:12:03 -0400 Subject: [Bioperl-l] Command line error in BLAST+ Message-ID: Hi, I have installed the latest version of blast and download the database correctly Using this tutorial http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html But when i running blastn-help command in terminal it showing me error that blastn: error while loading shared libraries: libbz2.so.1: cannot open shared object file: No such file or directory. and when i am running the blastall command then it showing that *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out* Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/ legacy_blast.pl line 85. 
Program failed, try executing the command manually.

While i have set the path of environment variable
PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin

I have checked everything but not able to find the error..

Pl help me.

Thanks
Manju

From p.j.a.cock at googlemail.com Fri Sep 16 04:15:46 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 16 Sep 2011 09:15:46 +0100
Subject: [Bioperl-l] Command line error in BLAST+
In-Reply-To: 
References: 
Message-ID: 

On Fri, Sep 16, 2011 at 6:12 AM, Manju Rawat wrote:
> Hi,
>
>
> I have installed the latest version of blast and download the database
> correctly Using this tutorial
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html
>
> But when i running blastn-help command in terminal it showing me error that
> blastn: error while loading shared libraries: libbz2.so.1: cannot open
> shared object file: No such file or directory.
>
>
> and when i am running the blastall command then it showing that
> *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out*
> Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/
> legacy_blast.pl line 85.
> Program failed, try executing the command manually.
>
> While i have set the path of environment variable
> PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin
>
> I have checked everything but not able to find the error..
>
> Pl help me.
>
> Thanks
> Manju

You're using the BioPerl wrapper for legacy blast (blastall), which
is not installed. Instead you have the new blast+ suite which includes
a wrapper using the perl script legacy_blast.pl to imitate the old
blastall tool (in this case it calls the new tool blastn).

Fix 1: Edit legacy_blast.pl to use the path to blastn etc under
your home directory

Fix 2: Install BLAST+ at system level

Fix 3: Use the BioPerl wrapper for BLAST+ instead.

I'd go with option 3.
Peter From p.j.a.cock at googlemail.com Fri Sep 16 04:17:58 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 16 Sep 2011 09:17:58 +0100 Subject: [Bioperl-l] no blast result In-Reply-To: References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: On Fri, Sep 16, 2011 at 6:09 AM, Manju Rawat wrote: > Hello Frank, > > Yes,u r rite..I tried to run blast in terminal but its not working.. > I have installed the latest version of blast and download the database > correctly.. > > But when i running blastn-help command in terminal it showing me error that > blastn: error while loading shared libraries: libbz2.so.1: cannot open > shared object file: No such file or directory. > > > and when i am running the blastall command then it showing that > legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out > Can't exec "/usr/bin/blastn": No such file or directory at > /usr/bin/legacy_blast.pl line 85. > Program failed, try executing the command manually. > > While i have set the path of environment variable > PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin > > I have checked everything but not able tp fine the error.. > > Pl help me. > Manju For the benefit of anyone reading the archives later, I tried to answer this in Manju's new thread: http://lists.open-bio.org/pipermail/bioperl-l/2011-September/035696.html Peter From fs5 at sanger.ac.uk Fri Sep 16 04:36:37 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Fri, 16 Sep 2011 09:36:37 +0100 Subject: [Bioperl-l] Command line error in BLAST+ In-Reply-To: References: Message-ID: <1316162197.3797.721.camel@deskpro15336.internal.sanger.ac.uk> Hi Manju, Are you on Ubuntu? I think I've seen problems with this bzip library on Ubuntu before. It's not a problem with BLAST in any case. Should be possible to install the missing files through your package manager. 
I'm sure Google will know what to do :) Not sure what went wrong with your blast installation. What happens if you run blastall directly (without the legacy_blast.pl script)? In any case, it might be better to ask the NCBI people for help with the BLAST installation as this is not a BioPerl problem. cheers, Frank On Fri, 2011-09-16 at 01:12 -0400, Manju Rawat wrote: > Hi, > > > I have installed the latest version of blast and download the database > correctly Using this tutorial > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html > > But when i running blastn-help command in terminal it showing me error that > blastn: error while loading shared libraries: libbz2.so.1: cannot open > shared object file: No such file or directory. > > > and when i am running the blastall command then it showing that > *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out* > Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/ > legacy_blast.pl line 85. > Program failed, try executing the command manually. > > While i have set the path of environment variable > PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin > > I have checked everything but not able tp fine the error.. > > Pl help me. > > Thanks > Manju > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
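Archive note: the message 'Can't exec "/usr/bin/blastn"' quoted above shows that the script looked for blastn in /usr/bin, not in the directory that was appended to PATH. A minimal sketch for checking which BLAST+ programs are actually visible on the current PATH, using only core Perl modules, could look like the following. The helper name `find_on_path` is hypothetical, and the list of tool names is taken from this thread.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable file called $name found in
# @dirs (defaults to the directories listed in PATH), or nothing if absent.
sub find_on_path {
    my ($name, @dirs) = @_;
    @dirs = File::Spec->path unless @dirs;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Report where each tool mentioned in the thread would be picked up from.
for my $tool (qw(blastn makeblastdb legacy_blast.pl)) {
    my $where = find_on_path($tool);
    printf "%-16s %s\n", $tool, defined $where ? $where : 'not found on PATH';
}
```

If blastn is reported under the home directory while legacy_blast.pl still tries /usr/bin, that points to the path configured inside the script rather than to the shell environment.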
From ross at cuhk.edu.hk Fri Sep 16 04:51:38 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Fri, 16 Sep 2011 16:51:38 +0800 Subject: [Bioperl-l] use blast to extract similar sequences In-Reply-To: References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> Message-ID: <085501cc744d$d90b4500$8b21cf00$@edu.hk> I wonder whether bioperl has any built-in modules that extracts sequences based on blast results. For example, a short query sequence of length 1000 is to blast against a reference genome of 3M. The homologous sequence of 1000 +/- 20 is extracted. Why is +/- 20 needed? Because we can't guarantee there must have a good match. Frequent blast users may be well aware that then there can be coverage, split-up due to local alignments, etc and that's why I would like to know if anybody has already developed a module to handle this kind of problem. Thanks in advance! From cjfields at illinois.edu Fri Sep 16 09:22:07 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 16 Sep 2011 13:22:07 +0000 Subject: [Bioperl-l] use blast to extract similar sequences In-Reply-To: <085501cc744d$d90b4500$8b21cf00$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk> Message-ID: That seems like a pretty straightforward thing to do; there isn't an all-in-one way of doing this, but that's a good thing (it's a separation of concerns). 
1) Run and parse BLAST results and grab seqID and coordinates for each hit (or each HSP for each hit) (Bio::SearchIO)
2) Pull the right subsequence +/- 20bp using above from the indexed flat file of your reference (Bio::DB::Fasta)

You can get revcomped sequence from Bio::DB::Fasta directly by flipping coordinates:

    # raw sequence
    my $seq    = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
    my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);

chris

On Sep 16, 2011, at 3:51 AM, Ross KK Leung wrote:

> I wonder whether bioperl has any built-in modules that extracts sequences based on blast results. For example, a short query sequence of length 1000 is to blast against a reference genome of 3M. The homologous sequence of 1000 +/- 20 is extracted. Why is +/- 20 needed? Because we can't guarantee there must have a good match. Frequent blast users may be well aware that then there can be coverage, split-up due to local alignments, etc and that's why I would like to know if anybody has already developed a module to handle this kind of problem. Thanks in advance!
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From wsavigne at yahoo.com Fri Sep 16 16:45:12 2011
From: wsavigne at yahoo.com (Willy Savigne)
Date: Fri, 16 Sep 2011 13:45:12 -0700 (PDT)
Subject: [Bioperl-l] question Bioperl installation
Message-ID: <1316205912.93564.YahooMailClassic@web160515.mail.bf1.yahoo.com>

my name is william how do download Bioperl i tried other site but NOTHING i would like to know info in downloading bioperl .This is my first time into knowing bioinformatic i just got a book developing bioinformatic and begginning perl bioinformatic. I do alot Dna and RNA sequencing and more.
Thank u willy

From ross at cuhk.edu.hk Sun Sep 18 06:51:05 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sun, 18 Sep 2011 18:51:05 +0800
Subject: [Bioperl-l] snp/frameshift identification
In-Reply-To: 
References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk>
Message-ID: <08a901cc75f0$dd463b30$97d2b190$@edu.hk>

Dear Bioperl-users,

Following Fields, Christopher J's advice on sequence extraction, I manage to proceed to the last stage of non-synonymous SNP identification. Now what I have in hand is thousands of reliable multiple sequence alignment files, e.g.

>seq1
ATGACAGACACGACGTTGCCGTAG
>seq2
ATGACAGACACGACGTAGCCGTAG
>seq3
ATGACAGACACGACGTTGCCGTAG

Seq2 has a T->A mutation and that leads to a stop codon generation. I wonder if Bioperl has handled this kind of SNP or frameshift or non-sense mutations that lead to change of amino acid in the translated protein product. Thanks again to the community that helps me a great deal so I can catch up progress during this Sat/Sun!!

From rondonbio at yahoo.com.br Mon Sep 19 09:46:36 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Mon, 19 Sep 2011 06:46:36 -0700 (PDT)
Subject: [Bioperl-l] help-> SearchIO
Message-ID: <1316439996.6247.YahooMailNeo@web130220.mail.mud.yahoo.com>

Hi guys! I need your help in a loop that I have in SearchIO. I need to check the nucleotide coverage of queries using BLAST. I'm using the script below. It opens the alignment and creates arrays for each query with zeros in each nucleotide position but, when I add values to the coverage of each nucleotide, the script does it once and passes to another query. Can you help me? Thank you very much,

Rondon a Brazilian friend.

use Bio::SearchIO;

my $alignment = new Bio::SearchIO ( -format => 'blastXML',
                                    -file   => $alignment_file );

my %positions;
while (my $result = $alignment->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            my $query_name = $result->query_name();
            my $tam = $result -> query_length();
            my @pos = $hsp->seq_inds('query','identical');
            for (0..$tam){ ${$positions{$query_name}}[$_] = 0 } # make arrays for each query and populate with 0 in each position
            foreach my $num (@pos) {
                ${$positions{$query_name}}[$num -1]++;    #This loop is where I believe that is an error.
            }
        }
    }
}

foreach my $key (keys %positions){
    print "$key\t@{$positions{$key}}\n";
}

exit;

From roy.chaudhuri at gmail.com Mon Sep 19 12:29:41 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 19 Sep 2011 17:29:41 +0100
Subject: [Bioperl-l] help-> SearchIO
In-Reply-To: <1316439996.6247.YahooMailNeo@web130220.mail.mud.yahoo.com>
References: <1316439996.6247.YahooMailNeo@web130220.mail.mud.yahoo.com>
Message-ID: <4E776DF5.6040504@gmail.com>

Hi Rondon,

The line where you populate your arrayref with 0 (starting "for (0..$tam)") is within the HSP loop, so every successive HSP resets the counts accumulated from the previous ones in your hash. You will therefore only see the data for the last HSP from each query. If you move that line to execute once per result (i.e. just after the line starting "while (my $result ="), then I think it should work as you intended.

Cheers,
Roy.

On 19/09/2011 14:46, Rondon Neto wrote:
> Hi guys! I need your help in a loop that I have in SearchIO. I need
> to check the nucleotide coverage of queries using BLAST. I'm using the
> script below. It opens the alignment and creates arrays for each query
> with zeros in each nucleotide position but, when I add values to the
> coverage of each nucleotide, the script does it once and passes to
> another query. Can you help me? Thank you very much,
>
> Rondon a Brazilian friend.
> > use Bio::SearchIO; > > my $alignment = new Bio::SearchIO ( -format => 'blastXML', > -file => $alignment_file ); > > my %positions; > while (my $result = $alignment->next_result) { > while (my $hit = $result->next_hit) { > while (my $hsp = $hit->next_hsp) { > my $query_name = $result->query_name(); > my $tam = $result -> query_length(); > my @pos = $hsp->seq_inds('query','identical'); > for (0..$tam){ ${$positions{$query_name}}[$_] = 0 } # make arrays for each query and populate with 0 in each position > foreach my $num (@pos) { > ${$positions{$query_name}}[$num -1]++; #This loop is where I believe that is an error. > } > } > } > } > > foreach my $key (keys %positions){ > print "$key\t@{$positions{$key}}\n"; > } > > exit; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Mon Sep 19 12:39:40 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 19 Sep 2011 17:39:40 +0100 Subject: [Bioperl-l] question Bioperl installation In-Reply-To: <1316205912.93564.YahooMailClassic@web160515.mail.bf1.yahoo.com> References: <1316205912.93564.YahooMailClassic@web160515.mail.bf1.yahoo.com> Message-ID: <4E77704C.20604@gmail.com> Hi Willy, There are instructions for downloading and installing BioPerl on the wiki: http://www.bioperl.org/wiki/Getting_BioPerl http://www.bioperl.org/wiki/Installing_BioPerl These are the first two results when you Google for "bioperl download". Note that the wiki is a little out of date, the latest BioPerl version is 1.6.901: http://search.cpan.org/~cjfields/BioPerl-1.6.901/ Cheers, Roy. On 16/09/2011 21:45, Willy Savigne wrote: > my name is william how do download Bioperl i tried other site but > NOTHING i would like to know info in downloading bioperl .This is > my first time into knowing bioinformatic i just got a book > developing bioinformatic and begginning perl bioinformatic. 
I do alot
> Dna and RNA sequencing and more.
>
> Thank u willy
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bosborne11 at verizon.net Tue Sep 20 13:01:21 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 20 Sep 2011 13:01:21 -0400
Subject: [Bioperl-l] Question about a phylogenetic tree
Message-ID: <36D93DC4-135D-408F-8169-AC8D5E59BD90@verizon.net>

All,

I have code that starts with a sequence file and makes a tree (Bio::Tree::Tree) using Muscle to align and then Phyml, here's the last part that makes the tree:

..... get the files etc ....

    my %alignparams = (
        -seqtype        => 'nucleo',
        -usetree_nowarn => $guidetreefile,
        -in             => $tempfile
    );
    my $aligner = Bio::Tools::Run::Alignment::Muscle->new(%alignparams);

    # $align is a Bio::SimpleAlign object
    my $align = $aligner->align($tempfile);

    my %treeparams = (
        -data_type => 'nt',
        -model     => 'K80',    # Kimura
        -tree      => 'BIONJ',
        -bootstrap => 1000
    );
    my $treemaker = Bio::Tools::Run::Phylo::Phyml->new(%treeparams);

    #$tree is a Bio::Tree::Tree object
    my $tree = $treemaker->run($align);

My question: do I get the pairwise distance between 2 sequences (based on Kimura here) by doing something like:

    $distance = $tree->subtree_length($internal_node)

Where $internal_node is the parent of the pair in question? Excuse me if this is obvious, have never made Bioperl trees before!

Brian O.

From bosborne11 at verizon.net Tue Sep 20 15:17:13 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 20 Sep 2011 15:17:13 -0400
Subject: [Bioperl-l] Question about a phylogenetic tree
In-Reply-To: <36D93DC4-135D-408F-8169-AC8D5E59BD90@verizon.net>
References: <36D93DC4-135D-408F-8169-AC8D5E59BD90@verizon.net>
Message-ID: 

Ah, I see:

    my $distances = $tree->distance(-nodes => [$node1,$node2]);

Brian O.
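Archive note: the `distance` call in the reply above can be tried on a small tree without running Muscle or Phyml. The sketch below assumes BioPerl is installed and uses a toy Newick string that is not data from this thread. The value returned is the path length along the tree's branches between the two nodes, which need not equal a pairwise distance computed directly from the alignment.

```perl
use strict;
use warnings;
use Bio::TreeIO;

# Toy Newick tree (an assumption for illustration, not data from the thread).
my $newick = "((A:1,B:2):3,C:4);\n";

# Read the tree from an in-memory filehandle instead of a file on disk.
open my $fh, '<', \$newick or die "Cannot open in-memory handle: $!";
my $treeio = Bio::TreeIO->new(-format => 'newick', -fh => $fh);
my $tree   = $treeio->next_tree or die "No tree parsed";

# Look up two leaves by id; find_node returns a list of matches.
my ($node_a) = $tree->find_node(-id => 'A');
my ($node_b) = $tree->find_node(-id => 'B');

# distance() sums the branch lengths on the path between the two nodes.
my $dist_ab = $tree->distance(-nodes => [$node_a, $node_b]);
print "A-B path length: $dist_ab\n";
```

The same call works for any pair of nodes in the tree, so looping over all leaf pairs is one way to fill a full distance table.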
On Sep 20, 2011, at 1:01 PM, Brian Osborne wrote: > All, > > I have code that starts with a sequence file and makes a tree (Bio::Tree::Tree) using Muscle to align and then Phyml, here's the last part that makes the tree: > > ..... get the files etc .... > > my %alignparams = ( > -seqtype => 'nucleo', > -usetree_nowarn => $guidetreefile, > -in => $tempfile > ); > my $aligner = Bio::Tools::Run::Alignment::Muscle->new(%alignparams); > > # $align is a Bio::SimpleAlign object > my $align = $aligner->align($tempfile); > > my %treeparams = ( > -data_type => 'nt', > -model => 'K80', # Kimura > -tree => 'BIONJ', > -bootstrap => 1000 > ); > my $treemaker = Bio::Tools::Run::Phylo::Phyml->new(%treeparams); > > #$tree is a Bio::Tree::Tree object > my $tree = $treemaker->run($align); > > My question: do I get the pairwise distance between 2 sequences (based on Kimura here) by doing something like: > > $distance = $tree->subtree_length($internal_node) > > Where $internal_node is the parent of the pair in question? Excuse me if this is obvious, have never made Bioperl trees before! > > Brian O. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From manju.rawat2 at gmail.com Thu Sep 22 07:07:39 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Thu, 22 Sep 2011 16:37:39 +0530 Subject: [Bioperl-l] database for Bos Touras Message-ID: Hello To All, I want to blast my sequence Only in Bos Touras Database Using Local Blast(Blast+). But I dnt Know which database I should use for this From this Link. ftp://ftp.ncbi.nlm.nih.gov/blast/db/ Pl tell me which DB I Should use?? 
Thanks
Manju

From hrh at fmi.ch Thu Sep 22 07:44:56 2011
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Thu, 22 Sep 2011 13:44:56 +0200
Subject: [Bioperl-l] database for Bos Touras
In-Reply-To: 
References: 
Message-ID: <4E7B1FB8.8090208@fmi.ch>

assuming you mean 'Bos taurus', it might be easier to get the data from ucsc:

http://hgdownload.cse.ucsc.edu/downloads.html#cow

or ensembl:

ftp://ftp.ensembl.org/pub/release-64/fasta/bos_taurus/dna/

Regards, Hans

On 09/22/2011 01:07 PM, Manju Rawat wrote:
> Hello To All,
>
> I want to blast my sequence Only in Bos Touras Database Using Local
> Blast(Blast+).
> But I dnt Know which database I should use for this From this Link.
> ftp://ftp.ncbi.nlm.nih.gov/blast/db/
>
> Pl tell me which DB I Should use??
>
> Thanks
> Manju
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hrh at fmi.ch Thu Sep 22 08:16:00 2011
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Thu, 22 Sep 2011 14:16:00 +0200
Subject: [Bioperl-l] database for Bos Touras
In-Reply-To: 
References: <4E7B1FB8.8090208@fmi.ch>
Message-ID: <4E7B2700.8080904@fmi.ch>

Yes, BLAST uses fasta files. You (may need to concatenate the individual chromosomes and then you) need to index them with 'makeblastdb' which is also part of the blast+ software package.

see: http://www.ncbi.nlm.nih.gov/books/NBK1762/

Hans

On 09/22/2011 01:49 PM, Manju Rawat wrote:
> It will work on Local Blast or not??????

From bosborne11 at verizon.net Thu Sep 22 12:16:39 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 22 Sep 2011 12:16:39 -0400
Subject: [Bioperl-l] [bioperl-live] genbank_ref_extractor: new script to make search on entrez gene and retrieve related sequences (#23)
In-Reply-To: 
References: 
Message-ID: <245C75D5-61EC-4395-B64F-47D8471568F5@verizon.net>

Carne,

This is impressive looking, it is now in scripts/.

Thanks again,

Brian O.

On Sep 21, 2011, at 11:25 AM, Carnë Draug wrote:

> Hi
>
> I wrote a script with bioperl that I would like to share back. It takes a list of searches for Entrez Gene and attempts to retrieve the related sequences (genomic, transcripts and proteins). It is also possible to obtain extra upstream and downstream bp for genomic sequences and control the naming of the files. In the end it can save all the results in a CSV file.
>
> Hope you find it up to your coding standards. Suggestions for improvements are welcome, including for a better name.
>
> Carnë
>
> You can merge this Pull Request by running:
>
> git pull https://github.com/carandraug/bioperl-live bp_genbank_ref_extractor
>
> Or you can view, comment on it, or merge it online at:
>
> https://github.com/bioperl/bioperl-live/pull/23
>
> -- Commit Summary --
>
> * genbank_ref_extractor: new script to make search on entrez gene and retrieve related sequences
>
> -- File Changes --
>
> A scripts/Bio-DB-EUtilities/bp_genbank_ref_extractor.pl (1064)
>
> -- Patch Links --
>
> https://github.com/bioperl/bioperl-live/pull/23.patch
> https://github.com/bioperl/bioperl-live/pull/23.diff
>
> --
> Reply to this email directly or view it on GitHub:
> https://github.com/bioperl/bioperl-live/pull/23

From bluecurio at gmail.com Thu Sep 22 15:32:07 2011
From: bluecurio at gmail.com (Daniel Renfro)
Date: Thu, 22 Sep 2011 14:32:07 -0500
Subject: [Bioperl-l] Download RefSeq revision history programmatically
Message-ID: <29B78DBCEBCA42B5A99461FE1E9BC33F@gmail.com>

I am working on a project to find historical differences in GenBank/RefSeq files. I would like to download all the old revisions of a file (for example NC_000913 [http://www.ncbi.nlm.nih.gov/nuccore/NC_000913.2?report=girevhist]) using any technology available. I wrote a page-scraper in Perl, but I can't get NCBI to return plaintext, only HTML (which does nobody any good.)

Does anyone know of a way to get all the "revisions" (not just "versions") of a GenBank/RefSeq file?
-Daniel -- http://ecoliwiki.net/User:DanielRenfro Hu Lab Research Associate 979-862-4055 From ross at cuhk.edu.hk Tue Sep 27 10:16:14 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 27 Sep 2011 22:16:14 +0800 Subject: [Bioperl-l] obtain a distance matrix from tree In-Reply-To: References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk> Message-ID: <014201cc7d20$03abd4c0$0b037e40$@edu.hk> After using MEGA to generate a newick tree file (phylogram), I wonder if Bioperl has any convenient functions to derive the (n x n) distance (by NJ, MP etc) matrix. Thanks for your advice in advance! From thomas.sharpton at gmail.com Tue Sep 27 16:02:44 2011 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Tue, 27 Sep 2011 13:02:44 -0700 Subject: [Bioperl-l] obtain a distance matrix from tree In-Reply-To: <014201cc7d20$03abd4c0$0b037e40$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk> <014201cc7d20$03abd4c0$0b037e40$@edu.hk> Message-ID: Hi Ross, For very large trees, I found it to be more efficient to do this in R using the ape package. 
I have a script listed in my github repo that will convert a tree to a distance matrix via in R at the link below: https://github.com/sharpton/PhylOTU/blob/master/tree_to_matrix.R That said, I've also done this in Bioperl using something like the following: use Bio::TreeIO; my $treein = Bio::TreeIO->new( -fh => "input_tree.nwk", -format => 'newick' ); while( my $tree = $treein->next_tree ){ my %dist_matrix = (); my @leaves = $tree->get_leaf_nodes; foreach my $leaf1( @leaves ){ my $id1 = $leaf1->id; foreach my $leaf2( @leaves ){ my $id2 = $leaf2->id; next if $id1 eq $id2; next if( defined( $dist_matrix{$id1}->{$id2} ) || defined ( $dist_matrix{$id2}->{$id1} ) ); my $distance = $tree->distance( -nodes => [$leaf1, $leaf2] ); $dist_matrix{$id1}->{$id2} = $distance; } } } #print distance matrix here.... This will put the information you need to create either a full or a upper triangle distance matrix into the hash %dist_matrix. I didn't test the above, so hopefully there are no bugs.... Someone else may have a more elegant solution. Best, Tom PS: Sorry if you get this twice. On Sep 27, 2011, at 7:16 AM, Ross KK Leung wrote: > After using MEGA to generate a newick tree file (phylogram), I > wonder if > Bioperl has any convenient functions to derive the (n x n) distance > (by NJ, > MP etc) matrix. Thanks for your advice in advance! > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From member at linkedin.com Tue Sep 27 19:45:10 2011 From: member at linkedin.com (Razi Khaja via LinkedIn) Date: Tue, 27 Sep 2011 23:45:10 +0000 (UTC) Subject: [Bioperl-l] Invitation to connect on LinkedIn Message-ID: <1856085440.8574001.1317167110185.JavaMail.app@ela4-bed82.prod> LinkedIn ------------ Razi Khaja requested to add you as a connection on LinkedIn: ------------------------------------------ Bolotin,, I'd like to add you to my professional network on LinkedIn. 
Accept invitation from Razi Khaja
http://www.linkedin.com/e/5drwke-gt3jaequ-6k/uez6TYkHzbaXxXM-lUk23auFwJZodcPlXc2UWC0Ao8h/blk/I3148646357_2/1BpC5vrmRLoRZcjkkZt5YCpnlOt3RApnhMpmdzgmhxrSNBszYOnPsRcPoQdzwQcjd9bSsPizpOoltTbP0NdPgMd3kTcPgLrCBxbOYWrSlI/EML_comm_afe/?hs=false&tok=2TjQgihXkh-kU1

View profile of Razi Khaja
http://www.linkedin.com/e/5drwke-gt3jaequ-6k/rsn/35197242/UkCS/?hs=false&tok=3k9X2Qfnoh-kU1

------------------------------------------

From ross at cuhk.edu.hk Tue Sep 27 23:57:52 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Wed, 28 Sep 2011 11:57:52 +0800
Subject: [Bioperl-l] ancestral state derived from Tree
In-Reply-To: <29B78DBCEBCA42B5A99461FE1E9BC33F@gmail.com>
References: <29B78DBCEBCA42B5A99461FE1E9BC33F@gmail.com>
Message-ID: <017701cc7d92$cba88d20$62f9a760$@edu.hk>

Following Tom's advice, I was able to obtain the distance matrix for the following tree with Bioperl TreeIO.

((((((A:1.00000000,B:1.00000000):1.00000000,C:1.00000000):0.00000000,D:0.00000000):1.00000000,(E:0.00000000,(F:2.00000000,G:1.00000000):0.00000000):0.00000000):2.00000000,(H:3.00000000,(I:2.00000000,(J:1.00000000,(K:2.00000000,(L:2.00000000,M:2.00000000):0.00000000):0.00000000):0.00000000):0.00000000):0.00000000):1.00000000,(N:0.00000000,((O:0.00000000,P:0.00000000):1.00000000,(Q:2.00000000,(R:2.66666667,S:3.66666667):3.66666667):0.00000000):1.00000000):3.00000000,(T:0.00000000,(U:0.00000000,V:0.00000000):1.00000000):16.00000000);

The last few nodes, T, U and V, should be monophyletic, but U and V should be more closely related to each other. Although I can use TreeIO methods like is_monophyletic or is_paraphyletic to test this case, the problem becomes trickier for nodes A, B, C and D, because D is at zero distance from the common ancestor of nodes A, B, C and D. Since is_monophyletic does not account for this case, is there any workaround?
I have to pay attention to such a detail in order to make a better guess for the ancestral state(s) at various points of this tree. Thanks again to the TreeIO developers for making tree analysis easier for us biologists!

From manju.rawat2 at gmail.com Wed Sep 28 05:54:07 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Wed, 28 Sep 2011 15:24:07 +0530
Subject: [Bioperl-l] how to blast a seq against multiple dbase
Message-ID:

Hello,

I have downloaded all the chromosomes of Bos taurus and converted them to BLAST format using makeblastdb, and now I want to BLAST my sequence locally against all of these chromosomes. I now have 29 databases. Is there any way to BLAST my sequence against all 29 databases in my program?

What should I write for 'database'?

@params = ('database' => '????????', 'outfile' => 'blast2.out',
           '_READMETHOD' => 'Blast', 'prog' => 'blastn');

Thanks
Manju Rawat

From p.j.a.cock at googlemail.com Wed Sep 28 06:02:07 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 28 Sep 2011 11:02:07 +0100
Subject: [Bioperl-l] how to blast a seq against multiple dbase
In-Reply-To:
References:
Message-ID:

On Wed, Sep 28, 2011 at 10:54 AM, Manju Rawat wrote:
> Hello,
>
> I have downloaded all the chromosomes of Bos taurus and converted them to
> BLAST format using makeblastdb, and now I want to BLAST my sequence locally
> against all of these chromosomes. I now have 29 databases. Is there any way
> to BLAST my sequence against all 29 databases in my program?
>
> What should I write for 'database'?
>
> @params = ('database' => '????????', 'outfile' => 'blast2.out',
>            '_READMETHOD' => 'Blast', 'prog' => 'blastn');
>

The simple answer is to make a combined database. This works internally with alias files; have a look at the NR and NT databases for example - they act like single databases but are actually a collection of chunks.
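As a rough sketch of the alias-file route described above (it was not posted in this thread): BLAST+ ships a tool called blastdb_aliastool that groups existing databases under one name. The Perl below only builds the command line; the database names bt_chr1 .. bt_chr29, the alias name bos_taurus_all and the helper alias_command are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical database names: one BLAST database per chromosome, as
# produced by makeblastdb. Replace these with the real database paths.
my @chromosome_dbs = map { "bt_chr$_" } 1 .. 29;

# Build the argument list for blastdb_aliastool (part of BLAST+).
# -dblist takes a space-separated list of existing databases,
# -dbtype nucl marks them as nucleotide, -out names the alias database.
sub alias_command {
    my ($alias_name, @dbs) = @_;
    return (
        'blastdb_aliastool',
        '-dblist', join(' ', @dbs),
        '-dbtype', 'nucl',
        '-out',    $alias_name,
        '-title',  "Combined $alias_name",
    );
}

my @cmd = alias_command('bos_taurus_all', @chromosome_dbs);

# Show the command, quoting any argument that contains whitespace.
print join(' ', map { /\s/ ? qq{"$_"} : $_ } @cmd), "\n";

# To actually create the alias (requires BLAST+ on the PATH), uncomment:
# system(@cmd) == 0 or die "blastdb_aliastool failed: $?";
```

If the alias is created this way, the single name (here bos_taurus_all) would be the value to try for 'database' in the parameter list quoted above.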
Even simpler would be to combine your Bos taurus sequence files into a single multi-entry FASTA file, and make that into a single BLAST database.

Peter

From awitney at sgul.ac.uk Wed Sep 28 06:42:39 2011
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 28 Sep 2011 11:42:39 +0100
Subject: [Bioperl-l] how to blast a seq against multiple dbase
In-Reply-To:
References:
Message-ID:

I think if you want to keep the databases separate you would need to create a factory for each database, something like this

foreach my $db ( @databases ) {
    # -db_name refers to an existing BLAST database
    # (-db_data is for building a new database from sequence data)
    my $factory = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => $db,
        # ... any other params ...
    );
    # ... do blast stuff ...
}

or, as Peter says in another email, you could combine your databases, run one query, and then separate the hits by chromosome in the results.

regards

adam

On 28 Sep 2011, at 10:54, Manju Rawat wrote:

> Hello,
>
> I have downloaded all the chromosomes of Bos taurus and converted them to
> BLAST format using makeblastdb, and now I want to BLAST my sequence locally
> against all of these chromosomes. I now have 29 databases. Is there any way
> to BLAST my sequence against all 29 databases in my program?
>
> What should I write for 'database'?
>
> @params = ('database' => '????????', 'outfile' => 'blast2.out',
>            '_READMETHOD' => 'Blast', 'prog' => 'blastn');
>
> Thanks
> Manju Rawat
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From carandraug+dev at gmail.com Wed Sep 28 07:43:02 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Wed, 28 Sep 2011 12:43:02 +0100
Subject: [Bioperl-l] retrieving bioperl version for scripts
Message-ID:

Hi everyone,

is there a recommended way to get the version of a script that is part of bioperl (the ones in the scripts directory)?
Rather than hard-coding a version for the script that is independent of bioperl, I thought of using the bioperl version itself. How can this be done?

Thanks in advance,
Carnë

From carandraug+dev at gmail.com Wed Sep 28 11:00:34 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Wed, 28 Sep 2011 16:00:34 +0100
Subject: [Bioperl-l] retrieving bioperl version for scripts
In-Reply-To:
References:
Message-ID:

2011/9/28 longbow leo :
> Hi, Carnë,
>
> Do you mean this:
>
> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION'
>
> In my machine, the output is

Thank you. Yes, this is what I was looking for. I looked into how that variable is set, and so I think I'll use

use Bio::Root::Version;
say $Bio::Root::Version::VERSION;

Carnë

From pcantalupo at gmail.com Wed Sep 28 12:54:19 2011
From: pcantalupo at gmail.com (Paul Cantalupo)
Date: Wed, 28 Sep 2011 12:54:19 -0400
Subject: [Bioperl-l] algorithm_version not working in multi-result blast output
Message-ID:

Hello,

I'm using the most recent copy of bioperl-live (pulled yesterday). I have a BLASTN (from blast+) output file for 3 query sequences (https://gist.github.com/1248342). I used this script, https://gist.github.com/1248338, to print the query id, algorithm and algorithm_version for each result. When I run the script, I get the following output:

GFAVMM201BADC0  BLASTN  2.2.25+
GFAVMM201A1JOH  BLASTN
GFAVMM201D933Z  BLASTN

algorithm_version outputs the correct version for the first result but outputs the empty string for the 2nd and 3rd queries. Why? This functionality worked about a month ago. What has changed to cause this to happen?

Thank you,

Paul

From rondonbio at yahoo.com.br Wed Sep 28 15:47:45 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Wed, 28 Sep 2011 12:47:45 -0700 (PDT)
Subject: [Bioperl-l] best Hit
Message-ID: <1317239265.98674.YahooMailNeo@web130214.mail.mud.yahoo.com>

Hi guys.
I have this subroutine that returns a hash with the nucleotide coverage of each query from a BLAST alignment. I want to count unique hits only: if a hit has already been aligned with a query, it must be eliminated from my experiment. Can anyone check whether it's right, or fix it for me? Is there a way to do that directly in BLAST?

Thank you
Rondon Neto

sub nucleotide_coverage {
    # Bio::SearchIO dependent
    # This subroutine returns a hash and a file with nucleotide coverage
    # for each query in a BLAST alignment XML file. The input is the
    # alignment file.
    my ($alignment_file) = @_;
    my $alignment = Bio::SearchIO->new(
        -format => 'blastXML',
        -file   => $alignment_file
    );
    my %positions;
    my @used_reads;
    while (my $result = $alignment->next_result) {
        my $query_name = $result->query_name();
        my $tam = $result->query_length();
        for (0..$tam-1) {
            ${$positions{$query_name}}[$_] = 0
        }
        while (my $hit = $result->next_hit) {
            my $hit_name = $hit->name;
            # Here is my best hit parser. Is it ok?
            # (as written, this "next" only advances the inner foreach, not the hit loop)
            foreach my $read (@used_reads) {
                if ( $read eq $hit_name ) {
                    next;
                }
            }
            while (my $hsp = $hit->next_hsp) {
                my $query_name = $result->query_name();
                my @pos = $hsp->seq_inds('query','identical');
                foreach my $num (@pos) {
                    ${$positions{$query_name}}[$num-1]++;
                }
            }
            push (@used_reads, $hit_name);
        }
    }
    my $outfile = "nucleotide_coverage.txt";
    open OUT, ">$outfile" or die $!;
    foreach my $key (keys %positions) {
        print OUT "$key\t@{$positions{$key}}\n";
    }
    close OUT;
    return \%positions;
}

From shalabh.sharma7 at gmail.com Wed Sep 28 15:53:07 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 28 Sep 2011 15:53:07 -0400
Subject: [Bioperl-l] Getting taxa from gi
Message-ID:

Hi All,

I know this has been discussed before, but this is kind of a new problem that I am facing. I want to get taxonomy (full lineage) information for a huge list of GIs. I am using Bio::DB::GenBank for this with perl-5.12.3. Here is my small script.

#!/usr/local/perl-5.12.3/bin/perl -w
use strict;
use warnings;
use Bio::DB::GenBank;

my @ids = qw( CP000490 );
my $gbh = Bio::DB::GenBank->new();
foreach my $id( @ids ) {
    # say "* ID: $id";
    my $seq = $gbh->get_Seq_by_acc( $id );
    my $org = $seq->species;
    #print "$org\n";
    my $class = join'-', $org->classification;
    print "$class\n";
}

The output is:

Paracoccus denitrificans PD1222-Paracoccus-Rhodobacteraceae-Rhodobacterales-Alphaproteobacteria-Proteobacteria-Bacteria

which is fine, but I also want to get the taxon ID and, if possible, the taxon IDs for every rank in the lineage. Ideally I would like to get something like this:

318586 - - - - - - - 1224 - 2

I would really appreciate your help.

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From cjfields at illinois.edu Wed Sep 28 17:36:37 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 28 Sep 2011 21:36:37 +0000
Subject: [Bioperl-l] retrieving bioperl version for scripts
In-Reply-To:
References:
Message-ID:

On Sep 28, 2011, at 10:00 AM, Carnë Draug wrote:

> 2011/9/28 longbow leo :
>> Hi, Carnë,
>>
>> Do you mean this:
>>
>> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION'
>>
>> In my machine, the output is
>
> Thank you. Yes, this is what I was looking for. I looked into how that
> variable is set, and so I think I'll use
>
> use Bio::Root::Version;
> say $Bio::Root::Version::VERSION;
>
> Carnë

Just a warning on this: we're shortly to announce a major restructuring effort with BioPerl that will dramatically affect core versioning, mainly from the point of view that modularizing BioPerl into more manageable sub-distributions will require that each sub-distribution have its own version (not necessarily a bad thing).
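Given that caveat, a script may prefer not to depend on one particular package variable. The following is a minimal sketch using core Perl only: it tries a list of modules in order and reports the first version it finds, falling back to 'unknown'. The helper name first_available_version is hypothetical, and the module list is just an example.

```perl
use strict;
use warnings;

# Return the version of the first module in the list that can be loaded
# and defines $VERSION, or the string 'unknown' if none qualifies.
sub first_available_version {
    my @modules = @_;
    for my $module (@modules) {
        # Turn Some::Module into Some/Module.pm so require can load it.
        (my $file = "$module.pm") =~ s{::}{/}g;
        next unless eval { require $file; 1 };
        # UNIVERSAL::VERSION returns the package's $VERSION (or undef).
        my $version = $module->VERSION;
        return $version if defined $version;
    }
    return 'unknown';
}

print "version: ",
    first_available_version('Bio::Root::Version', 'Bio::SeqIO'), "\n";
```

Because the lookup is wrapped in eval, the script still runs where a module has moved or is not installed; it simply reports 'unknown'.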
chris

From cjfields at illinois.edu Wed Sep 28 17:40:48 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 28 Sep 2011 21:40:48 +0000
Subject: [Bioperl-l] algorithm_version not working in multi-result blast output
In-Reply-To:
References:
Message-ID: <4240A221-7BF7-4723-A7A1-30C806CE3DC6@illinois.edu>

Not sure, but I would hazard a guess that only the 'Query=' line is present in concatenated BLAST reports past the initial report, and the version isn't carried over (I recall this being a problem with algorithm() as well, but that was fixed a while ago).

This should be an easy enough fix, but can you submit it as a bug so we can track it?

chris

On Sep 28, 2011, at 11:54 AM, Paul Cantalupo wrote:

> Hello,
>
> I'm using the most recent copy of bioperl-live (pulled yesterday). I
> have a BLASTN (from blast+) output file for 3 query sequences
> (https://gist.github.com/1248342). I used this script,
> https://gist.github.com/1248338, to print the query id, algorithm and
> algorithm_version for each result. When I run the script, I get the
> following output:
>
> GFAVMM201BADC0 BLASTN 2.2.25+
> GFAVMM201A1JOH BLASTN
> GFAVMM201D933Z BLASTN
>
> Algorithm_version outputs the correct version for the first result but
> outputs the empty string for the 2nd and 3rd query. Why? This
> functionality worked about a month ago. What has changed to cause this
> to happen?
>
> Thank you,
>
> Paul
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From carandraug+dev at gmail.com Wed Sep 28 18:07:53 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Wed, 28 Sep 2011 23:07:53 +0100
Subject: [Bioperl-l] retrieving bioperl version for scripts
In-Reply-To:
References:
Message-ID:

2011/9/28 Fields, Christopher J :
> On Sep 28, 2011, at 10:00 AM, Carnë
Draug wrote:
>
>> 2011/9/28 longbow leo :
>>> Hi, Carnë,
>>>
>>> Do you mean this:
>>>
>>> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION'
>>>
>>> In my machine, the output is
>>
>> Thank you. Yes, this is what I was looking for. I looked into how that
>> variable is set, and so I think I'll use
>>
>> use Bio::Root::Version;
>> say $Bio::Root::Version::VERSION;
>>
>> Carnë
>
> Just a warning on this: we're shortly to announce a major restructuring effort with BioPerl that will dramatically affect core versioning, mainly from the point of view that modularizing BioPerl into more manageable sub-distributions will require that each sub-distribution have its own version (not necessarily a bad thing).

Where will the scripts end up after this restructuring? What I want is to give the script a version (not bioperl). Since the script is released with bioperl, the two versions are the same. I have actually already made the commit that does this; I just haven't bothered with the pull request yet.

Also, will there be a release before this change?

Carnë

From shalabh.sharma7 at gmail.com Thu Sep 29 10:37:53 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 29 Sep 2011 10:37:53 -0400
Subject: [Bioperl-l] GFF to GTF
Message-ID:

Hi,
Is there any module to convert a GFF file to GTF?

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From cjfields at illinois.edu Thu Sep 29 11:07:27 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 29 Sep 2011 15:07:27 +0000
Subject: [Bioperl-l] retrieving bioperl version for scripts
In-Reply-To:
References:
Message-ID: <46733E36-5795-4EB6-9C2B-C978000FFD46@illinois.edu>

On Sep 28, 2011, at 5:07 PM, Carnë Draug wrote:

> 2011/9/28 Fields, Christopher J :
>> On Sep 28, 2011, at 10:00 AM, Carnë
Draug wrote: >> >>> 2011/9/28 longbow leo : >>>> Hi, Carn?, >>>> >>>> Do you mean this: >>>> >>>> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION' >>>> >>>> In my machine, the output is >>> >>> Thank you. Yes this is what I was looking for. I looked down how that >>> variable comes up and so I think I'll use >>> >>> use Bio::Root::Version; >>> say $Bio::Root::Version::VERSION; >>> >>> Carn? >> >> Just a warning on this: we're shortly to announce a major restructuring effort with BioPerl that will dramatically affect core versioning, mainly from the point of view that modularizing BioPerl into more manageable sub-distributions will require that each sub-distribution have it's own version (not necessarily a bad thing). > > Where will the scripts end up after this restructuration? What I want > is to create a version of the script (not of bioperl). Since the > script is released with bioperl, they are the same. I actually already > made the commit that makes this, just haven't bothered with the pull > request yet. > > Also, will there be a release before this change? > > Carn? Scripts will likely go with the distribution that they most closely are tied to, but that's still an area for debate (some may equally fall within one distribution or another, which will be tricky). 
For more on the release aspects see the (currently being revised and thus not complete) wiki page: http://www.bioperl.org/wiki/BioPerl_Modularization chris From pcantalupo at gmail.com Thu Sep 29 12:13:05 2011 From: pcantalupo at gmail.com (Paul Cantalupo) Date: Thu, 29 Sep 2011 12:13:05 -0400 Subject: [Bioperl-l] algorithm_version not working in multi-result blast output In-Reply-To: <4240A221-7BF7-4723-A7A1-30C806CE3DC6@illinois.edu> References: <4240A221-7BF7-4723-A7A1-30C806CE3DC6@illinois.edu> Message-ID: Bug submitted: https://redmine.open-bio.org/issues/3298 On Wed, Sep 28, 2011 at 5:40 PM, Fields, Christopher J wrote: > Not sure, but I would hazard a guess that only the 'Query=' line is present in concatenated BLAST reports past the initial report, and the version isn't carried over (I recall this being a problem with the algorithm() as well, but that was fixed a while ago. > > This should be an easy enough fix, but can you submit it as a bug so we can track it? > > chris > > On Sep 28, 2011, at 11:54 AM, Paul Cantalupo wrote: > >> Hello, >> >> I'm using the most recent copy of bioperl-live (pulled yesterday). I >> have a BLASTN (from blast+) output file for 3 query sequences >> (https://gist.github.com/1248342). I used this script, >> https://gist.github.com/1248338, to print the query id, algorithm and >> algorithm_version for each result. When I run the script, I get the >> following output: >> >> GFAVMM201BADC0 ?BLASTN ?2.2.25+ >> GFAVMM201A1JOH ?BLASTN >> GFAVMM201D933Z ?BLASTN >> >> Algorithm_version outputs the correct version for the first result but >> outputs the empty string for the 2nd and 3rd query. Why? This >> functionality worked about a month ago. What has changed to cause this >> to happen? 
>>
>> Thank you,
>>
>> Paul
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From jluis.lavin at unavarra.es Fri Sep 30 04:23:19 2011
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Fri, 30 Sep 2011 10:23:19 +0200
Subject: [Bioperl-l] Bio-Graphics module
Message-ID: <52f9bcbb5c40302fe5d1ea274982c24b.squirrel@webmail.unavarra.es>

Dear All,

I'm currently using Perl 5.10.0 version and Bioperl 1.6.1 running on a windows machine.

I read about the Bio-Graphics module and it'd be wonderful to install it, but seems like it is only available for Perl 5.8...
Is there any other Perl and/or Bioperl module to do the same kind of genomic and Blast report representation currently available?

Thanks in advance

--
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From cjfields at illinois.edu Fri Sep 30 08:38:01 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 30 Sep 2011 12:38:01 +0000
Subject: [Bioperl-l] Bio-Graphics module
In-Reply-To: <52f9bcbb5c40302fe5d1ea274982c24b.squirrel@webmail.unavarra.es>
References: <52f9bcbb5c40302fe5d1ea274982c24b.squirrel@webmail.unavarra.es>
Message-ID:

It's available for all perl versions from 5.8.8 up. I have it running with perl 5.14. Now, I recall there being problems with installation on Mac OS X, though I think that was mainly due to GD.pm and libgd.

chris

On Sep 30, 2011, at 3:23 AM, wrote:

>
> Dear All,
>
> I'm currently using Perl 5.10.0 version and Bioperl 1.6.1 running on a
> windows machine.
>
> I read about the Bio-Graphics module and it'd be wonderful to install it,
> but seems like it is only available for Perl 5.8...
> Is there any other Perl and/or Bioperl module to do the same kind of
> genomic and Blast report representation currently available?
>
> Thanks in advance
>
> --
> Dr. José Luis Lavín Trueba
>
> Dpto. de Producción Agraria
> Grupo de Genética y Microbiología
> Universidad Pública de Navarra
> 31006 Pamplona
> Navarra
> SPAIN
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jillianrowe91286 at gmail.com Wed Sep 28 02:03:32 2011
From: jillianrowe91286 at gmail.com (Jill)
Date: Tue, 27 Sep 2011 23:03:32 -0700 (PDT)
Subject: [Bioperl-l] Gene Type in Entrez gene?
Message-ID:

Hi there,

I am using the Bio::DB::EUtilities module to download gene sequences based on a query.

while (my $docsum = $summaries->next_DocSum) {
    ## some items in DocSum are also named ChrStart so we pick the genomic
    ## information item and get the coordinates from it
    my ($genomic_info) = $docsum->get_Items_by_name('GenomicInfoType');

    ## some entries may have no data on genomic coordinates. This condition filters them out
    if (!$genomic_info) {
        ## found no genomic coordinates data
        next;
    }

    ## get coordinates of sequence
    ## get_contents_by_name always returns a list
    my ($chr_acc_ver) = $genomic_info->get_contents_by_name("ChrAccVer");
    my ($chr_start)   = $genomic_info->get_contents_by_name("ChrStart");
    my ($chr_stop)    = $genomic_info->get_contents_by_name("ChrStop");
    my $strand;
    if ($chr_start < $chr_stop) {
        $strand    = 1;
        $chr_start = $chr_start +1 - $bp5_extra;
        $chr_stop  = $chr_stop +1 + $bp5_extra;
    } elsif ($chr_start > $chr_stop) {
        $strand    = 2;
        $chr_start = $chr_start +1 - (-$bp5_extra);
        $chr_stop  = $chr_stop +1 + (-$bp5_extra);
    } else {
        next;
    }

    while (my $item = $docsum->next_Item('flattened')) {
        next if ($item->get_name =~ m/NomenclatureName/);
        if($item->get_name =~ m/Description/) {
            $description = $item->get_content if $item->get_content;
            $description =~ tr/ /_/;
            print $description, "\n";
        }
        if($item->get_name =~ m/Name/) {
            $name = $item->get_content if $item->get_content;
            print $name, "\n";
        }
        printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content;
    }
}

Then I go on to use genbank to download the sequences based on the chromosome splice. For what I have it works great. But I am trying to get to the gene type (either protein coding or pseudo) as well. I can see it in the summary on the Entrez Gene site, but can't get to it through bioperl. When I have it print out all the contents of the summary it doesn't show up there either.

Any help?

Thanks!

From liam.elbourne at mq.edu.au Thu Sep 29 17:34:04 2011
From: liam.elbourne at mq.edu.au (Liam Elbourne)
Date: Fri, 30 Sep 2011 07:34:04 +1000
Subject: [Bioperl-l] GFF to GTF
In-Reply-To:
References:
Message-ID: <8D027281-44E6-467C-8D22-D2D2F87D04B6@mq.edu.au>

Hi Shalabh,

Not sure about bioperl (I looked a while back and either missed it or it's not there) but there is a program associated with the cufflinks suite called gffread that should convert.

Regards,
Liam Elbourne.

On 30/09/2011, at 12:37 AM, shalabh sharma wrote:

> Hi,
> Is there any module to convert GFF file to GTF?
>
> Thanks
> Shalabh
>
>
> --
> Shalabh Sharma
> Scientific Computing Professional Associate (Bioinformatics Specialist)
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From carandraug+dev at gmail.com Fri Sep 30 09:18:04 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Fri, 30 Sep 2011 14:18:04 +0100
Subject: [Bioperl-l] Gene Type in Entrez gene?
In-Reply-To:
References:
Message-ID:

On 28 September 2011 07:03, Jill wrote:
> Hi there,
>
> I am using the Bio::DB::Eutilities module to download gene sequences
> based on a query.
>
>
> [...]
> }
>
>
> Then I go on to use genbank to download the sequences based on the
> chromosome splice. For what I have it works great.
But I am trying to > get to the gene type (either protein coding or pseudo) as well. I can > see it in the summary on the Entrez Gene sight, but can't get to it > through bioperl. When I have it print out all the contents of the > summary it doesn't show up there either. > > Any help? Hi Jill, there's already a script in bioperl that does what you want, it's just not part of the current stable release. You can get it here https://github.com/bioperl/bioperl-live/blob/master/scripts/Bio-DB-EUtilities/bp_genbank_ref_extractor.pl You can download the script alone, it will work fine in previous releases of bioperl, no need to write another one. Carn? Draug From manju.rawat2 at gmail.com Thu Sep 1 02:53:53 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Thu, 1 Sep 2011 02:53:53 -0400 Subject: [Bioperl-l] Bioperl query.... In-Reply-To: References: <4E5CC8AC.8050800@gmail.com> Message-ID: Thanks For The Reply.. I have already seen this link..But I am confused. I used to following code and run it... my $in = Bio::SearchIO->new(-format => 'blast', -file => 'seqs.blast'); while( my $result = $in->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query=", $result->query_name, " Hit=", $hit->name, " Length=", $hsp->length('total'), " gaps=", $hsp->gaps, " Percent_id=", $hsp->percent_identity, "\n"; } } } }} and it showing me following output with an error that.. *Erro--*rArgument "" isn't numeric in numeric lt (<) at /usr/local/share/perl/5.10.1/Bio/SearchIO/SearchResultEventBuilder.pm line 279, line 4113. 
Query=NM_181451 Hit=ref|NM_181451.1| Length=1349 gaps=1 Percent_id=100 Query=NM_181451 Hit=ref|XM_002706247.1| Length=1345 gaps=13 Percent_id=93.8289962825279 Query=NM_181451 Hit=ref|NM_001098089.1| Length=1323 gaps=7 Percent_id=91.9123204837491 Query=NM_181451 Hit=ref|NM_001008415.1| Length=1211 gaps=5 Percent_id=94.9628406275805 Query=NM_181451 Hit=ref|XM_001251693.3| Length=1320 gaps=5 Percent_id=91.969696969697 Query=NM_181451 Hit=ref|NM_001097567.1| Length=1338 gaps=4 Percent_id=91.5545590433483 Query=NM_181451 Hit=gb|AY075103.1| Length=1334 gaps=1 Percent_id=91.304347826087 ................ .......... Pl Find.whats the error i this code... Thanks Manju Rawat. From locarpau at upvnet.upv.es Thu Sep 1 10:49:06 2011 From: locarpau at upvnet.upv.es (Lorenzo Carretero Paulet) Date: Thu, 1 Sep 2011 16:49:06 +0200 Subject: [Bioperl-l] Parsing PAML mlc files In-Reply-To: References: <9645AF32-5EC3-41FA-9A32-45B6B92E31FD@illinois.edu><4DF56976.8080704@upvnet.upv.es><9866C4A4-AC36-4A25-B38F-3006A7BB0F11@illinois.edu> <1314636433.4e5bc291a40c6@webmail.upv.es> <1A4207F8295607498283FE9E93B775B407CCB2D3@EX02.asurite.ad.asu.edu> <1314659986.4e5c1e9268078@webmail.upv.es> Message-ID: <1314888546.4e5f9b628ecea@webmail.upv.es> Hi all, I'm trying to parse mlc output files from PAML using Bio::Tools::Phylo::PAML as: my $mlcfile = "/Applications/Bioinformatics/paml44/bin/mlc"; my $parserF = Bio::Tools::Phylo::PAML->new (-file =>$mclfile); if ( my $paml_result = $parserF->next_result ) { say Dumper $paml_result; #Prints Ok for ( my $model_result= $paml_result->get_NSSite_results ) { #say Dumper $model_result; #Prints nothing $ns_string = "model ".$model_result->model_num."\n ".$model_result->model_description()."\n ".$model_result->time_used."\n"; $dnds_site_classes = $model_result->dnds_site_classes; #a hashref #say Dumper $dnds_site_classes; for my $sites ( $model_result->get_BEB_pos_selected_sites ) ... ... ... 
The Bio::Tools::Phylo::PAML::Result object is ok, as I can print it using DUmper. In contrast, it seems that the Bio::Tools::Phylo::PAML::ModelResult object is not being properly instantiated, as I get the error message: "Can't call method "model_num" without a package or object reference at ..." What am I missing? Best, Lorenzo From jason.stajich at gmail.com Thu Sep 1 16:23:47 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Thu, 1 Sep 2011 13:23:47 -0700 Subject: [Bioperl-l] Parsing PAML mlc files In-Reply-To: <1314888546.4e5f9b628ecea@webmail.upv.es> References: <9645AF32-5EC3-41FA-9A32-45B6B92E31FD@illinois.edu><4DF56976.8080704@upvnet.upv.es><9866C4A4-AC36-4A25-B38F-3006A7BB0F11@illinois.edu> <1314636433.4e5bc291a40c6@webmail.upv.es> <1A4207F8295607498283FE9E93B775B407CCB2D3@EX02.asurite.ad.asu.edu> <1314659986.4e5c1e9268078@webmail.upv.es> <1314888546.4e5f9b628ecea@webmail.upv.es> Message-ID: Lorenzo - I am sure this is a problem with changes in the output from PAML - this is classic problem with this suite. This requires some debugging of the parser, not sure if there is anyone out there with time to do the debugging. I can say all this worked before on an earlier version of PAML but I don't know specifically what is going on with the latest paml4.4 version. 
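Independently of the PAML version, one thing that may be worth checking in the loop quoted above is its form: `for ( my $model_result = $paml_result->get_NSSite_results )` performs a scalar assignment inside the parentheses, so the method is called in scalar context and the variable receives a plain value rather than an object, which would be consistent with the reported error. (The quoted snippet also declares `$mlcfile` but passes `$mclfile`, which may simply be a transcription slip.) The sketch below reproduces the effect in plain Perl; `get_results` is a hypothetical stand-in and does not use the real PAML classes.

```perl
use strict;
use warnings;

# Stand-in for a method that returns a list of result objects
# (hypothetical; it only mimics the shape of get_NSSite_results).
sub get_results {
    my @results = ( { model_num => 0 }, { model_num => 1 } );
    return @results;
}

# Form used in the quoted snippet: "my $x = LIST" inside the parentheses
# is a scalar assignment, so the sub runs in scalar context and the
# variable holds the element count instead of an object.
my @from_assignment;
for ( my $not_an_object = get_results() ) {
    push @from_assignment, $not_an_object;
}

# Usual foreach form: the loop variable is declared before the list.
my @from_foreach;
for my $model_result ( get_results() ) {
    push @from_foreach, $model_result;
}

printf "assignment form: %d iteration(s), value is %s\n",
    scalar @from_assignment,
    ( ref( $from_assignment[0] ) ? 'a reference' : 'not a reference' );
printf "foreach form:    %d iteration(s), value is %s\n",
    scalar @from_foreach,
    ( ref( $from_foreach[0] ) ? 'a reference' : 'not a reference' );
```

If this is the cause, rewriting the loop as `for my $model_result ( $paml_result->get_NSSite_results ) { ... }` would be the first thing to try before debugging the parser itself.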
Jason On Sep 1, 2011, at 7:49 AM, Lorenzo Carretero Paulet wrote: > Hi all, > I'm trying to parse mlc output files from PAML using Bio::Tools::Phylo::PAML as: > > my $mlcfile = "/Applications/Bioinformatics/paml44/bin/mlc"; > my $parserF = Bio::Tools::Phylo::PAML->new (-file =>$mclfile); > if ( my $paml_result = $parserF->next_result ) > { > say Dumper $paml_result; #Prints Ok > for ( my $model_result= $paml_result->get_NSSite_results ) > { > #say Dumper $model_result; #Prints nothing > $ns_string = "model ".$model_result->model_num."\n > ".$model_result->model_description()."\n ".$model_result->time_used."\n"; > $dnds_site_classes = $model_result->dnds_site_classes; #a hashref > #say Dumper $dnds_site_classes; > for my $sites ( $model_result->get_BEB_pos_selected_sites ) > ... > ... > ... > > The Bio::Tools::Phylo::PAML::Result object is ok, as I can print it using > DUmper. In contrast, it seems that the Bio::Tools::Phylo::PAML::ModelResult > object is not being properly instantiated, as I get the error message: > > "Can't call method "model_num" without a package or object reference at ..." > > What am I missing? > Best, > Lorenzo > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Scott.Markel at accelrys.com Thu Sep 1 17:22:21 2011 From: Scott.Markel at accelrys.com (Scott Markel) Date: Thu, 1 Sep 2011 14:22:21 -0700 Subject: [Bioperl-l] file format for alignment plus features for aligned sequences Message-ID: <5ACBA19439E77B43A06F4CAB897EC97702F8302A05@EXCH1-COLO.accelrys.net> A question on behalf of the Discovery Studio group at Accelrys - They have alignment data with annotations, e.g., visualization settings or alignment properties. The aligned sequences also have features, e.g., domain boundaries or secondary structure motifs. They currently use BSML to save sequences and features. 
Is there an extension of BSML that can also save the alignment information? Are there any good file formats that can be used to store an alignment plus features associated with the aligned sequences? Are there other mailing lists that might be more appropriate for these questions? Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Secretary, Board of Directors: International Society for Computational Biology Chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics From ihok at hotmail.com Thu Sep 1 23:49:50 2011 From: ihok at hotmail.com (Jack Tanner) Date: Thu, 1 Sep 2011 23:49:50 -0400 Subject: [Bioperl-l] Bio::Ext::Align? Message-ID: I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty quiet codebase these days... Is it dead? Does anyone have a spec file for building an SRPM for it for RHEL 6? From cjfields at illinois.edu Fri Sep 2 00:31:17 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 2 Sep 2011 04:31:17 +0000 Subject: [Bioperl-l] Bio::Ext::Align? In-Reply-To: References: Message-ID: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> Yes, it's essentially deprecated (unmaintained). I don't know of anyone who has packaged that up in a while, if ever. chris On Sep 1, 2011, at 10:49 PM, Jack Tanner wrote: > I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty quiet codebase these days... Is it dead? > > Does anyone have a spec file for building an SRPM for it for RHEL 6? 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Fri Sep 2 04:44:07 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 2 Sep 2011 10:44:07 +0200 Subject: [Bioperl-l] Bio::Ext::Align? In-Reply-To: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> References: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> Message-ID: As it happens, a colleague of mine needed Bio::Ext::Align for hhrpred: http://toolkit.tuebingen.mpg.de/hhpred He got it working thus: > Hi Dave, > thanks a lot. i made it work. The error i got later on was: relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making > a shared object; > recompile with -fPIC > > the solution is: > perl Makefile.PL PREFIX= /fftwsingle --enable-shared > --with-pic --enable-single > > make > make install > http://forums.fedoraforum.org/ > showthread.php?t=232607 Dave On Fri, Sep 2, 2011 at 06:31, Fields, Christopher J wrote: > Yes, it's essentially deprecated (unmaintained). I don't know of anyone > who has packaged that up in a while, if ever. > > chris > > On Sep 1, 2011, at 10:49 PM, Jack Tanner wrote: > > > I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty > quiet codebase these days... Is it dead? > > > > Does anyone have a spec file for building an SRPM for it for RHEL 6? 
> > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Fri Sep 2 05:30:33 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 2 Sep 2011 11:30:33 +0200 Subject: [Bioperl-l] Parsing PAML mlc files In-Reply-To: References: <9645AF32-5EC3-41FA-9A32-45B6B92E31FD@illinois.edu> <4DF56976.8080704@upvnet.upv.es> <9866C4A4-AC36-4A25-B38F-3006A7BB0F11@illinois.edu> <1314636433.4e5bc291a40c6@webmail.upv.es> <1A4207F8295607498283FE9E93B775B407CCB2D3@EX02.asurite.ad.asu.edu> <1314659986.4e5c1e9268078@webmail.upv.es> <1314888546.4e5f9b628ecea@webmail.upv.es> Message-ID: Looking back at the commit history, back in April and May 2010, I made some updates for the January 2010 edition of PAML 4.4. All tests passed at that time, but: - the tests may be incomplete - PAML has undoubtedly changed since then, even if it's still called version 4.4 I can't look at this right now myself, but please file a bug report on this, and hopefully someone else can. Dave On Thu, Sep 1, 2011 at 22:23, Jason Stajich wrote: > Lorenzo - > > I am sure this is a problem with changes in the output from PAML - this is > classic problem with this suite. This requires some debugging of the > parser, not sure if there is anyone out there with time to do the debugging. > I can say all this worked before on an earlier version of PAML but I don't > know specifically what is going on with the latest paml4.4 version. 
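Separately from any change in PAML's output format, one thing that may be worth ruling out in the quoted snippet is the loop header itself. In Perl, `for ( my $model_result = LIST )` performs a scalar assignment, so a method that returns an array hands back its element count rather than an object, and on the Perl versions of this period calling a method on a number such as 0 or 1 produces the same "without a package or object reference" message. The sketch below uses only core Perl; `get_results` is a hypothetical stand-in and it is an assumption here that `get_NSSite_results` returns a plain list in the same way.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns a list of result objects
# (plain hashes here, not real Bio::Tools::Phylo::PAML::ModelResult objects).
sub get_results {
    my @results = ( { model_num => 0 }, { model_num => 1 } );
    return @results;
}

# Scalar assignment: the call is made in scalar context, so the returned
# array collapses to its element count.
my $model_result = get_results();
print "scalar assignment: $model_result\n";    # prints "scalar assignment: 2"

# for ( my $x = LIST ) therefore loops exactly once, over that count.
my $passes = 0;
for ( my $x = get_results() ) {
    $passes++;
}
print "passes: $passes\n";                     # prints "passes: 1"

# Declaring the loop variable before the parentheses iterates the list itself.
my @seen;
for my $result ( get_results() ) {
    push @seen, $result->{model_num};
}
print "models: @seen\n";                       # prints "models: 0 1"
```

If that is what is happening, rewriting the header as `for my $model_result ( $paml_result->get_NSSite_results )` would be the first thing to try before debugging the parser.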
> > Jason > > > On Sep 1, 2011, at 7:49 AM, Lorenzo Carretero Paulet wrote: > > > Hi all, > > I'm trying to parse mlc output files from PAML using > Bio::Tools::Phylo::PAML as: > > > > my $mlcfile = "/Applications/Bioinformatics/paml44/bin/mlc"; > > my $parserF = Bio::Tools::Phylo::PAML->new (-file =>$mclfile); > > if ( my $paml_result = $parserF->next_result ) > > { > > say Dumper $paml_result; #Prints Ok > > for ( my $model_result= $paml_result->get_NSSite_results ) > > { > > #say Dumper $model_result; #Prints nothing > > $ns_string = "model ".$model_result->model_num."\n > > ".$model_result->model_description()."\n ".$model_result->time_used."\n"; > > $dnds_site_classes = $model_result->dnds_site_classes; #a hashref > > #say Dumper $dnds_site_classes; > > for my $sites ( $model_result->get_BEB_pos_selected_sites ) > > ... > > ... > > ... > > > > The Bio::Tools::Phylo::PAML::Result object is ok, as I can print it using > > DUmper. In contrast, it seems that the > Bio::Tools::Phylo::PAML::ModelResult > > object is not being properly instantiated, as I get the error message: > > > > "Can't call method "model_num" without a package or object reference at > ..." > > > > What am I missing? > > Best, > > Lorenzo > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Fri Sep 2 09:00:27 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 2 Sep 2011 13:00:27 +0000 Subject: [Bioperl-l] Bio::Ext::Align? 
In-Reply-To: References: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> Message-ID: <81EEC2D4-D039-41D5-A10F-91DD26B4D21E@illinois.edu> I think, if this is actively being used, we should split it away from bioperl-ext and release it on its own. Otherwise I worry about the long-term support for it/ chris On Sep 2, 2011, at 3:44 AM, Dave Messina wrote: > As it happens, a colleague of mine needed Bio::Ext::Align for hhrpred: > http://toolkit.tuebingen.mpg.de/hhpred > > > He got it working thus: > Hi Dave, > thanks a lot. i made it work. The error i got later on was: > relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; > recompile with -fPIC > > the solution is: > perl Makefile.PL PREFIX= /fftwsingle --enable-shared --with-pic --enable-single > > make > make install > http://forums.fedoraforum.org/showthread.php?t=232607 > > > > Dave > > > > > > On Fri, Sep 2, 2011 at 06:31, Fields, Christopher J wrote: > Yes, it's essentially deprecated (unmaintained). I don't know of anyone who has packaged that up in a while, if ever. > > chris > > On Sep 1, 2011, at 10:49 PM, Jack Tanner wrote: > > > I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty quiet codebase these days... Is it dead? > > > > Does anyone have a spec file for building an SRPM for it for RHEL 6? > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ihok at hotmail.com Fri Sep 2 11:20:44 2011 From: ihok at hotmail.com (Jack Tanner) Date: Fri, 2 Sep 2011 11:20:44 -0400 Subject: [Bioperl-l] Bio::Ext::Align? 
In-Reply-To: <81EEC2D4-D039-41D5-A10F-91DD26B4D21E@illinois.edu> References: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> <81EEC2D4-D039-41D5-A10F-91DD26B4D21E@illinois.edu> Message-ID: I also see that someone's forked it on Github and made some packaging fixes. It'd be nice to see it revived. On 9/2/2011 9:00 AM, Fields, Christopher J wrote: > I think, if this is actively being used, we should split it away from bioperl-ext and release it on its own. Otherwise I worry about the long-term support for it/ > > chris > > On Sep 2, 2011, at 3:44 AM, Dave Messina wrote: > >> As it happens, a colleague of mine needed Bio::Ext::Align for hhrpred: >> http://toolkit.tuebingen.mpg.de/hhpred >> >> >> He got it working thus: >> Hi Dave, >> thanks a lot. i made it work. The error i got later on was: >> relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; >> recompile with -fPIC >> >> the solution is: >> perl Makefile.PL PREFIX= /fftwsingle --enable-shared --with-pic --enable-single >> >> make >> make install >> http://forums.fedoraforum.org/showthread.php?t=232607 >> >> >> >> Dave >> >> >> >> >> >> On Fri, Sep 2, 2011 at 06:31, Fields, Christopher J wrote: >> Yes, it's essentially deprecated (unmaintained). I don't know of anyone who has packaged that up in a while, if ever. >> >> chris >> >> On Sep 1, 2011, at 10:49 PM, Jack Tanner wrote: >> >>> I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty quiet codebase these days... Is it dead? >>> >>> Does anyone have a spec file for building an SRPM for it for RHEL 6? 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > From manju.rawat2 at gmail.com Sat Sep 3 01:29:56 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Sat, 3 Sep 2011 01:29:56 -0400 Subject: [Bioperl-l] hsps_successfully_gapped: 47 Message-ID: Hello, Is There any method in BioPerl through which we can extract number_of_hsps_successfully_gapped: from a blast file.. If any one know about the it Pl help me... Thanks Manju Rawat From manju.rawat2 at gmail.com Sat Sep 3 06:00:22 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Sat, 3 Sep 2011 06:00:22 -0400 Subject: [Bioperl-l] blast result not matching. Message-ID: Hi, I doing blast using bioperl...but it not showing me complete result.. my program is following... #!usr/bin/perl -w use Bio::Perl; # this script will only work with an internet connection # on the computer it is run on $seq = new_sequence("ATTGGTTTGGGGACCCAATTTGTGTGTTATATGTA"); $blast_result=blast_sequence($seq); write_blast(">xyz.blast",$blast_result); and Output.. BLASTN 2.2.25+ Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. 
Query= blast-sequence-temp-id (30 letters) Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) 14,527,398 sequences; 37,346,598,701 total letters Score E Sequences producing significant alignments: (bits) value Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) Posted date: Sep 2, 2011 4:14 PM Number of letters in database: 37,346,598,701 Number of sequences in database: 14,527,398 Matrix: blastn matrix:2 -3 Gap Penalties Existence: 5, Extension: 2 expect: 1e-10 allowgaps: yes Search Statistics A: 0 Hits_to_DB: 737,387 S1: 23 S1_bits: 22.0 S2: 77 S2_bits: 70.7 X1: 22 X1_bits: 20.1 X2: 33 X2_bits: 29.8 X3: 110 X3_bits: 99.2 dbentries: 14,527,398 dbletters: -1308106959 effectivedblength: 36,954,358,955 effectivespace: 110,863,076,865 effectivespaceused: 110,863,076,865 entropy: 0.912 entropy_gapped: 0.780 kappa: 0.408 kappa_gapped: 0.410 lambda: 0.634 lambda_gapped: 0.625 length_adjustment: 27 num_extensions: 8,057 num_successful_extensions: 8,057 number_of_hsps_better_than_expect_value_cutoff_without_gapping: 0 number_of_hsps_gapped: 8,057 number_of_hsps_successfully_gapped: 0 querylength: 3 seqs_better_than_1e-10: 0 this result is not matching with with NCBI result... Is there anything wrong.. Thanks Manju Rawat From florent.angly at gmail.com Sun Sep 4 22:14:37 2011 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 05 Sep 2011 12:14:37 +1000 Subject: [Bioperl-l] Bio::SeqIO can't guess the format of data from a pipe In-Reply-To: References: <2E070303-ADDF-472B-835B-6D3959B83E9A@illinois.edu> <4E58D105.7050805@gmail.com> <4E5A0590.2010805@gmail.com> <8D639B95-0666-4F09-8E9E-88C8CDF76ABC@illinois.edu> <4E5AC2B8.9060808@gmail.com> Message-ID: <4E64308D.5060304@gmail.com> Thanks for your advice Chris. I put a format() and variant() method in Bio::Root::IO. All the Bio::*IO methods inherit these methods. 
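As a rough illustration of why one definition in the base class is enough, here is a minimal core-Perl sketch. The `My::*` package names and the way the values are stored are hypothetical; the thread does not show how the real Bio::Root::IO methods obtain the format or variant.

```perl
use strict;
use warnings;

# Hypothetical base class standing in for a shared IO parent such as Bio::Root::IO.
package My::Root::IO;

sub new {
    my ( $class, %args ) = @_;
    my $self = { _format => $args{-format}, _variant => $args{-variant} };
    return bless $self, $class;
}

# Read-only getters, defined once and inherited by every subclass.
sub format  { my ($self) = @_; return $self->{_format}; }
sub variant { my ($self) = @_; return $self->{_variant}; }

# Hypothetical subclasses; neither defines format() or variant() itself.
package My::SeqIO;
our @ISA = ('My::Root::IO');

package My::TreeIO;
our @ISA = ('My::Root::IO');

package main;

my $seqio  = My::SeqIO->new( -format => 'fastq', -variant => 'illumina' );
my $treeio = My::TreeIO->new( -format => 'newick' );

print $seqio->format, ' ', $seqio->variant, "\n";    # prints "fastq illumina"
print $treeio->format, "\n";                         # prints "newick"
```

The point of the design is that nothing has to be copied into each IO class; method lookup through `@ISA` finds the getters in the parent.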
Regarding the module naming, 8 follow the convention Bio::*IO and 8 follow the Bio::*::IO convention. If we decide to rename some IO modules for consistency, I would prefer the Bio::*::IO convention. Regards, Florent On 29/08/11 11:10, Chris Fields wrote: > On Aug 28, 2011, at 5:35 PM, Florent Angly wrote: > >> Hi, >> >> I implemented the format() getter method in Bio::SeqIO as discussed, essentially following the way proposed by Hilmar. The variant() method is not needed since Bio::SeqIO::fastq already has a get/set method for that. > Right, but the method could be used by other modules if it were moved to Bio::SeqIO. for instance. > >> I noticed that there are plenty more Bio*IO modules that could benefit from having a format() method, e.g.: >> Bio::AlignIO >> Bio::ClusterIO >> Bio::FeatureIO >> Bio::MapIO >> Bio::OntologyIO >> Bio::SearchIO >> Bio::TreeIO >> Bio::Assembly::IO * >> The code could be copy-pasted for each of them but it is not very graceful. Is there a way we could have all these IO modules share the same format() method? > Move the method to Bio::Root::IO, the common base class for all of the above. > >> * Note how the IO class for Bio::Assembly is called Bio::Assembly::IO, and not Bio::AssemblyIO like for other classes. This may be something to change in the future for consistency. >> >> Florent > That's possible; one could take advantage of that for redesign/API issues if it were needed. > > chris From manju.rawat2 at gmail.com Mon Sep 5 03:53:40 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Mon, 5 Sep 2011 03:53:40 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: Message-ID: Hi, I doing blast using bioperl...but it not showing me complete result.. my program is following... 
#!usr/bin/perl -w use Bio::Perl; # this script will only work with an internet connection # on the computer it is run on $seq = new_sequence("ATTGGTTTGGGGACCCAATTTGTGTGTTATATGTA"); $blast_result=blast_sequence($seq); write_blast(">xyz.blast",$blast_result); *and Output..* BLASTN 2.2.25+ Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= blast-sequence-temp-id (30 letters) Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) 14,527,398 sequences; 37,346,598,701 total letters Score E Sequences producing significant alignments: (bits) value Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) Posted date: Sep 2, 2011 4:14 PM Number of letters in database: 37,346,598,701 Number of sequences in database: 14,527,398 Matrix: blastn matrix:2 -3 Gap Penalties Existence: 5, Extension: 2 expect: 1e-10 allowgaps: yes Search Statistics A: 0 Hits_to_DB: 737,387 S1: 23 S1_bits: 22.0 S2: 77 S2_bits: 70.7 X1: 22 X1_bits: 20.1 X2: 33 X2_bits: 29.8 X3: 110 X3_bits: 99.2 dbentries: 14,527,398 dbletters: -1308106959 effectivedblength: 36,954,358,955 effectivespace: 110,863,076,865 effectivespaceused: 110,863,076,865 entropy: 0.912 entropy_gapped: 0.780 kappa: 0.408 kappa_gapped: 0.410 lambda: 0.634 lambda_gapped: 0.625 length_adjustment: 27 num_extensions: 8,057 num_successful_extensions: 8,057 number_of_hsps_better_than_expect_value_cutoff_without_gapping: 0 number_of_hsps_gapped: 8,057 number_of_hsps_successfully_gapped: 0 querylength: 3 seqs_better_than_1e-10: 0 this result is not matching with with NCBI result... Is there anything wrong.. 
Thanks Manju Rawat From p.j.a.cock at googlemail.com Mon Sep 5 05:44:06 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 5 Sep 2011 10:44:06 +0100 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: Message-ID: On Mon, Sep 5, 2011 at 8:53 AM, Manju Rawat wrote: > Hi, > I doing blast using bioperl...but it not showing me complete result.. > > > my program is following... > ... > > this result is not matching with with NCBI result... > Is there anything wrong.. The NCBI website for BLAST uses different default values to the BLAST command line tools. Check things like the gap parameters if you want to use the same settings. Peter From p.j.a.cock at googlemail.com Mon Sep 5 06:25:15 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 5 Sep 2011 11:25:15 +0100 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: Message-ID: Please CC the mailing list. On Mon, Sep 5, 2011 at 11:19 AM, Manju Rawat wrote: > Hi, > > Thanks for the reply... > but when i am blasting after getting sequence of any gene (from NCBI using > bioperl see below)..it showing me same result as shown in NCBI.. > > #!usr/bin/perl -w > use Bio::Perl; > $seq_object = get_sequence('NCBI',"NM_181451"); > $blast_result = blast_sequence($seq); > write_blast(">roa1.blast",$blast_report); > > > I dnt know why its not working when i am blasting my own sequence.. > Maybe you need give the sequence as a FASTA entry rather than a plain string? Peter From manju.rawat2 at gmail.com Mon Sep 5 06:40:57 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Mon, 5 Sep 2011 06:40:57 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: Message-ID: No..i also tried this..this also dont work.. pls help me.. From cjfields at illinois.edu Mon Sep 5 15:42:49 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 5 Sep 2011 19:42:49 +0000 Subject: [Bioperl-l] Fwd: blast result not matching. 
In-Reply-To: References: Message-ID: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> Are you using the latest BioPerl? I believe there had been some fixes addressing remote blast. chris On Sep 5, 2011, at 5:40 AM, Manju Rawat wrote: > No..i also tried this..this also dont work.. > pls help me.. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From manju.rawat2 at gmail.com Tue Sep 6 06:59:50 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Tue, 6 Sep 2011 06:59:50 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> Message-ID: bioperl 1.6.9 version is installed in my system.. its not the reason bcs blast is working fine when i am blasting with follwing code.. #!usr/bin/perl -w use Bio::Perl; $seq = get_sequence('NCBI',"NM_181451"); $blast_result=blast_sequence($seq); write_blast(">xyz.blast",$blast_result); Manju From sidd.basu at gmail.com Tue Sep 6 11:51:09 2011 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Tue, 6 Sep 2011 10:51:09 -0500 Subject: [Bioperl-l] Bioinformatics Job Opening at dictyBase in Chicago Message-ID: <20110906155106.GB1841@Macintosh-388.local> Hi All, We have an open position for a Bioinformatics Software Engineer at dictyBase(Northwestern University in Chicago). The job involves developing web application and middleware for a genome database using modern perl(DBIx-Class/Moose/MVC web frameworks etc) as well as integration of various genomic tools(gbrowse, intermine, apollo, biomart, pathway tools etc..). For full details please see: http://www.dictybase.org/dictybase_jobs.html. 
thanks, -siddhartha Siddhartha Basu Software developer, dictybase http://www.dictybase.org From slucky at ibab.ac.in Wed Sep 7 09:39:03 2011 From: slucky at ibab.ac.in (Lucky Singh) Date: Wed, 07 Sep 2011 19:09:03 +0530 Subject: [Bioperl-l] Fwd: Re: Problem using Bio::Tools::Run::RemoteBlast Message-ID: <4E6773F7.7000703@ibab.ac.in> -------- Original Message -------- Subject: Re: [Bioperl-l] Problem using Bio::Tools::Run::RemoteBlast Date: Sat, 27 Aug 2011 20:36:58 +0530 From: Lucky Singh To: Carnë Draug On Friday 26 August 2011 07:50 PM, Carnë Draug wrote: > On 22 August 2011 07:01, Lucky Singh wrote: >> Now I >> wanted to host it from web server, but This program is not working from it >> may be it is not able to create or write on file from web server but in >> command line it is working fine. I don't know the possible reason, please >> help me to figure it out. > Have you looked in the apache logs (look in > /var/log/apache2/error.log) ? Can you pastebin your whole code and the > content of the error log after trying to run the script? Dear Carnë Draug, As per your suggestion, I am attaching blast code file currently it is not showing any error on error.log. Thanks a lot for your valuable reply and will be highly grateful if you can get me out of this problem :) -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: blastn URL: From jason.stajich at gmail.com Wed Sep 7 12:13:46 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Wed, 7 Sep 2011 09:13:46 -0700 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> Message-ID: <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> I don't think it works. I am not sure why - probably a bug - but can you go back to what it is you are trying to do? The Bio::Perl functions in that modules are intended to be shortcuts but the original modules should work.
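For reference, a more direct use of the underlying module might look roughly like the sketch below, which follows the pattern in the Bio::Tools::Run::RemoteBlast synopsis. The database name, the expect value, the WORD_SIZE setting and the output filename are illustrative assumptions, not verified equivalents of what the NCBI web form does. The report posted earlier in the thread lists "expect: 1e-10", so a looser cutoff is one parameter worth trying for a query this short. The script needs BioPerl and a network connection.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::Tools::Run::RemoteBlast;

# Query sequence from the thread; the display id is an arbitrary label.
my $query = Bio::Seq->new(
    -id  => 'my_query',
    -seq => 'ATTGGTTTGGGGACCCAATTTGTGTGTTATATGTA',
);

my $factory = Bio::Tools::Run::RemoteBlast->new(
    -prog       => 'blastn',
    -data       => 'nt',          # assumption: nucleotide collection
    -expect     => '10',          # assumption: much looser than 1e-10
    -readmethod => 'SearchIO',
);

# Assumption: extra submission parameters can be passed through the
# package-level HEADER hash; 7 is an illustrative word size only.
$Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'} = 7;

$factory->submit_blast($query);

# Poll until every request id has been retrieved or has failed.
while ( my @rids = $factory->each_rid ) {
    for my $rid (@rids) {
        my $rc = $factory->retrieve_blast($rid);
        if ( !ref $rc ) {
            # A negative code signals an error; otherwise the search is still running.
            $factory->remove_rid($rid) if $rc < 0;
            sleep 5;
            next;
        }
        my $result = $rc->next_result;
        $factory->save_output('my_query.blast');    # hypothetical filename
        $factory->remove_rid($rid);
        next unless $result;
        while ( my $hit = $result->next_hit ) {
            printf "%s\t%s\n", $hit->name, $hit->significance;
        }
    }
}
```

Comparing the parameter block printed at the end of the saved report with the one on the NCBI results page is probably the quickest way to see which settings still differ.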
Can you recap what it is you want to accomplish, it may be better to do this with the Bio::Perl module but instead use a more direct use of the underlying modules. On Sep 6, 2011, at 3:59 AM, Manju Rawat wrote: > bioperl 1.6.9 version is installed in my system.. > its not the reason bcs blast is working fine when i am blasting with > follwing code.. > > #!usr/bin/perl -w > use Bio::Perl; > $seq = get_sequence('NCBI',"NM_181451"); > $blast_result=blast_sequence($seq); > write_blast(">xyz.blast",$blast_result); > > > Manju > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Sep 7 12:33:52 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 7 Sep 2011 16:33:52 +0000 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: I think there was an issue with Bio::Perl BLAST submissions fixed in the 1.6.901 release (1.6.9 != 1.6.901, the latter is newer). From CPAN: 1.6.901 May 18, 2011 ... [Bug fixes] * [3205] - small fix to Bio::Perl blast_sequence() to make compliant with docs [genehack, cjfields] chris On Sep 7, 2011, at 11:13 AM, Jason Stajich wrote: > I don't think it works. I am not sure why - probably a bug - but can you go back to what it is you are trying to do? The Bio::Perl functions in that modules are intended to be shortcuts but the original modules should work. > Can you recap what it is you want to accomplish, it may be better to do this with the Bio::Perl module but instead use a more direct use of the underlying modules. > > > On Sep 6, 2011, at 3:59 AM, Manju Rawat wrote: > >> bioperl 1.6.9 version is installed in my system.. 
>> its not the reason bcs blast is working fine when i am blasting with >> follwing code.. >> >> #!usr/bin/perl -w >> use Bio::Perl; >> $seq = get_sequence('NCBI',"NM_181451"); >> $blast_result=blast_sequence($seq); >> write_blast(">xyz.blast",$blast_result); >> >> >> Manju >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From carandraug+dev at gmail.com Wed Sep 7 12:47:16 2011 From: carandraug+dev at gmail.com (Carnë Draug) Date: Wed, 7 Sep 2011 17:47:16 +0100 Subject: [Bioperl-l] Problem using Bio::Tools::Run::RemoteBlast In-Reply-To: <4E590812.9030006@ibab.ac.in> References: <37711.192.168.1.254.1313992876.squirrel@webmail.ibab.ac.in> <4E590812.9030006@ibab.ac.in> Message-ID: 2011/8/27 Lucky Singh : > On Friday 26 August 2011 07:50 PM, Carnë Draug wrote: >> >> On 22 August 2011 07:01, Lucky Singh wrote: >>> >>> Now I >>> wanted to host it from web server, but This program is not working from >>> it >>> may be it is not able to create or write on file from web server but in >>> command line it is working fine. I don't know the possible reason, please >>> help me to figure it out. >> >> Have you looked in the apache logs (look in >> /var/log/apache2/error.log) ? Can you pastebin your whole code and the >> content of the error log after trying to run the script? > > Dear Carnë Draug, > > As per your suggestion, I am attaching blast code file currently it is not > showing any error on error.log. > Thanks a lot for your valuable reply and will be highly grateful if you can > get me out of this problem :) Hi sorry for the late reply. Please try to always reply to the mailing list, maybe someone else can help you too. I don't know about the script as I never used RemoteBlast from bioperl. But given a quick look at it, you're not loading the CGI module on the script ( http://perldoc.perl.org/CGI.html ).
Here's a simple example using the CGI module ( http://pastebin.com/miMd70wn ) and a HTML page that will use it ( http://pastebin.com/kWwwMijd ). If nothing shows up on error.log, take a look in access.log. Try some simple CGI script first, such as "hello world!" to see if the problem lies on your bioperl part of the script, or in the web server, or some other part. Carnë From scott at scottcain.net Wed Sep 7 13:57:31 2011 From: scott at scottcain.net (Scott Cain) Date: Wed, 7 Sep 2011 13:57:31 -0400 Subject: [Bioperl-l] October GMOD Meeting in Toronto Message-ID: Hello, The early registration deadline for the October GMOD meeting in Toronto, Canada is approaching. Please register by September 13th to avoid the late registration fee. You can register here: http://gmod.eventbrite.com/ For information about the GMOD meeting please see the page at: http://gmod.org/wiki/October_2011_GMOD_Meeting In addition to the main meeting, there will be a free BioMart workshop on the following Friday, which you can also register for at the main meeting registration page. Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research
From longbow0 at gmail.com Wed Sep 7 16:19:37 2011 From: longbow0 at gmail.com (longbow leo) Date: Wed, 7 Sep 2011 15:19:37 -0500 Subject: [Bioperl-l] How to determine strains were evolved independently in a phylogenetic tree? Message-ID: Hi, I have created a phylogenetic for a virus protein which contained about 200 strains. Next I need to do an analysis to check whether several strains in this tree were evolved independently. Although it is not too difficult to do manually, I still have litter idea how to do this in a Perl script since there are some datasets need to do. At first I tried to use the method "is_monophyletic" in the module "Bio::Tree::TreeFunctionsI" to do this analysis, but it seems it doesn't work as I have thought. According to the description of "is_monophyletic" method, it "Will do a test of monophyly for the nodes specified in comparison to a chosen outgroup". Does here test whether the outgroup strain is monophyletic to the nodes, or test the nodes only?
The description sounds like the latter but the what the script did seemed to be the first. Are there any suggestions? Thank you very much! Haizhou Liu From greg at ebi.ac.uk Thu Sep 8 06:40:30 2011 From: greg at ebi.ac.uk (Gregory Jordan) Date: Thu, 8 Sep 2011 11:40:30 +0100 Subject: [Bioperl-l] How to determine strains were evolved independently in a phylogenetic tree? In-Reply-To: References: Message-ID: Hi Haizhou, I'm not sure I understand exactly what you're trying to do. But to clarify the BioPerl code: the is_monophyletic method (for the actual code, see here https://github.com/bioperl/bioperl-live/blob/master/Bio/Tree/TreeFunctionsI.pm#L832) tests whether the single outgroup node falls *within* or *outside* the last common ancestor of the group of nodes given. If the outgroup node falls *outside* the subtree defined by this LCA node, then the group of nodes can be called monophyletic with respect to that outgroup (at least as far as my understanding of the word 'monophyletic' goes). If the outgroup node falls *within* the subtree defined by this LCA node, then the group of nodes is not monophyletic with respect to that outgroup node. The term "evolved independently" sounds slightly vague to me -- what is it exactly about the shape of your tree that allows you to call a strain independent or not? If you gave an example or two of trees where you consider the evolution to be independent and non-independent, I (or someone else on the list) may be able to help you find the right method to do this automatically. Cheers, Greg On Wed, Sep 7, 2011 at 9:19 PM, longbow leo wrote: > Hi, > > I have created a phylogenetic for a virus protein which contained about 200 > strains. Next I need to do an analysis to check whether several strains in > this tree were evolved independently. Although it is not too difficult to > do > manually, I still have litter idea how to do this in a Perl script since > there are some datasets need to do. 
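The test Greg describes can be sketched in a few lines of core Perl. This is only an illustration of the logic on a toy tree stored as a child-to-parent hash, not the BioPerl implementation; the node names and helper subs are hypothetical. In BioPerl itself the corresponding call takes the group as `-nodes` and the reference node as `-outgroup`.

```perl
use strict;
use warnings;

# Toy tree (((A,B)n1,C)n2,D)root stored as child => parent.
my %parent = (
    A  => 'n1', B  => 'n1',
    n1 => 'n2', C  => 'n2',
    n2 => 'root', D => 'root',
);

# Path from a node up to the root, starting with the node itself.
sub lineage {
    my ($node) = @_;
    my @path = ($node);
    push @path, $parent{ $path[-1] } while exists $parent{ $path[-1] };
    return @path;
}

# Last common ancestor: the first node on one lineage that lies on all the others.
sub lca {
    my @nodes  = @_;
    my @first  = lineage( shift @nodes );
    my @others = map { +{ map { $_ => 1 } lineage($_) } } @nodes;
    for my $candidate (@first) {
        return $candidate unless grep { !$_->{$candidate} } @others;
    }
    return;
}

# The group is monophyletic with respect to the outgroup when the outgroup
# does not sit below the group's last common ancestor.
sub is_monophyletic {
    my ( $outgroup, @group ) = @_;
    my $clade_root = lca(@group);
    my ( undef, @ancestors ) = lineage($outgroup);
    my $inside = grep { $_ eq $clade_root } @ancestors;
    return $inside ? 0 : 1;
}

print is_monophyletic( 'D', 'A', 'B' ), "\n";    # prints 1: D lies outside the (A,B) clade
print is_monophyletic( 'B', 'A', 'C' ), "\n";    # prints 0: B lies inside the clade spanning A and C
```

So the question being answered is about the group of nodes, with the outgroup acting only as the reference point that must fall outside their common ancestor.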
> > At first I tried to use the method "is_monophyletic" in the module > "Bio::Tree::TreeFunctionsI" to do this analysis, but it seems it doesn't > work as I have thought. According to the description of "is_monophyletic" > method, it "Will do a test of monophyly for the nodes specified in > comparison to a chosen outgroup". Does here test whether the outgroup > strain > is monophyletic to the nodes, or test the nodes only? The description > sounds > like the latter but the what the script did seemed to be the first. > > Are there any suggestions? > > Thank you very much! > > > Haizhou Liu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From manju.rawat2 at gmail.com Thu Sep 8 02:11:12 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Thu, 8 Sep 2011 02:11:12 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: Toady i installed the latest version of bioperl in my system via CPAN.. But this still not sowing the complete result.. I just want to do nucleotide blast using bioperl..but while i am doing blast with my sequence it shwowing very samll result.. I dnt know whether it is wrong or right...but while i am blasting the same sequence in NCBI it showing a diffrent result.. and i have also tried to use the orignl module..but it also dnt work.. Pl see reult of the balst in attached file of this mail.. #!usr/bin/perl -w use Bio::Perl; use Bio::SearchIO; $blast_report =blast_sequence('acggctgctgtagatctgatgct'); write_blast(">resl.blast",$blast_report); Thanks. Manju Rawat -------------- next part -------------- A non-text attachment was scrubbed... 
Name: resl.blast Type: application/octet-stream Size: 1680 bytes Desc: not available URL: From cjfields at illinois.edu Thu Sep 8 09:05:10 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 8 Sep 2011 13:05:10 +0000 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: <6D4A142B-9455-4CC3-AFDB-F9B3B991B57F@illinois.edu> Submissions to NCBI BLAST via their web interface have different parameters than those submitted via their QBLAST interface (what is used in BioPerl). So the fact there are differing results isn't too surprising, particularly if the results fall close to the e-value cutoff for one or the other. You will need to set the proper parameters, which I don't believe is possible via the (very simple) Bio::Perl interface, but is possible via Bio::Tools::Run::RemoteBlast. chris On Sep 8, 2011, at 1:11 AM, Manju Rawat wrote: > Toady i installed the latest version of bioperl in my system via CPAN.. > But this still not sowing the complete result.. > > > I just want to do nucleotide blast using bioperl..but while i am doing blast with my sequence it shwowing very samll result.. > I dnt know whether it is wrong or right...but while i am blasting the same sequence in NCBI it showing a diffrent result.. > > and i have also tried to use the orignl module..but it also dnt work.. > > Pl see reult of the balst in attached file of this mail.. > #!usr/bin/perl -w > use Bio::Perl; > use Bio::SearchIO; > $blast_report =blast_sequence('acggctgctgtagatctgatgct'); > write_blast(">resl.blast",$blast_report); > > Thanks. > Manju Rawat > > > From David.Messina at sbc.su.se Thu Sep 8 09:33:19 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 8 Sep 2011 15:33:19 +0200 Subject: [Bioperl-l] Fwd: blast result not matching. 
In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: As I think has been said earlier in this thread, it's almost certainly a discrepancy in the BLAST parameters between what the blast_sequence function in the Bio::Perl module is sending, and what the BLAST website is doing. In this case, you have a very short sequence. If you look in the "Algorithm parameters" section of the BLAST web form, you'll see that there is an option that is checked by default, "Automatically adjust parameters for short input sequences". If I uncheck that option, I get the same results as you did when you submitted your BLAST through BioPerl (see http://cl.ly/9ynq). So to get the same results from a BioPerl-submitted BLAST and a BLAST on NCBI's website, you need to have the same parameters. You can set the parameters from BioPerl as described in the documentation: http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Tools/Run/RemoteBlast.pm As Jason said earlier, the blast_sequence function in Bio::Perl is intended as a simple demonstration and uses the default BLAST parameters. That function is a wrapper around the RemoteBlast module. Since you want to do something a little different, I believe you'll need to use the RemoteBlast module directly. Dave On Thu, Sep 8, 2011 at 08:11, Manju Rawat wrote: > Toady i installed the latest version of bioperl in my system via CPAN.. > But this still not sowing the complete result.. > > > I just want to do nucleotide blast using bioperl..but while i am doing > blast > with my sequence it shwowing very samll result.. > I dnt know whether it is wrong or right...but while i am blasting the same > sequence in NCBI it showing a diffrent result.. > > and i have also tried to use the orignl module..but it also dnt work.. > > Pl see reult of the balst in attached file of this mail.. 
> #!usr/bin/perl -w > use Bio::Perl; > use Bio::SearchIO; > $blast_report =blast_sequence('acggctgctgtagatctgatgct'); > write_blast(">resl.blast",$blast_report); > > Thanks. > Manju Rawat > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From abualiga2 at gmail.com Thu Sep 8 10:44:39 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 10:44:39 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag Message-ID: Hi, I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of multiple tags within a primary tag. E.g., when there are several 'function' tag-values within a 'CDS' primary tag, I don't know how to link those 'function' tag-values to a particular 'locus_tag'. As parsed values are returned as a list, I tried creating an array of hashes, where the hash-key is 'locus_tag' and hash-values are multiple 'function' tags, but am failing miserably. Pasted below is what I managed so far. At your convenience, please advise. thanks! 
galeb

#!/usr/local/bin/perl
# parse_gbk.pl
# gsa 09042011
# script to parse out features from gbk
# http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Customizing_Sequence_Object_Construction

use strict; use warnings;
use Bio::SeqIO;

my @loci;
my @seqs;
my @directions;
my @start_coords;
my @end_coords;
my @genes;
my @products;
my @notes;
my @functions;
my %functions;

my $gb_file = shift;
my $seqio_obj = Bio::SeqIO->new(-file => $gb_file );
my $seq_obj = $seqio_obj->next_seq;

for my $feat_obj ( $seq_obj->get_SeqFeatures ) {
    if ( $feat_obj->primary_tag eq ( 'gene' ) ) {
        if ($feat_obj->has_tag( 'locus_tag' ) ) {
            push ( @seqs, $feat_obj->seq->seq ); #collect sequences
            for my $val ( $feat_obj->get_tag_values( 'locus_tag' ) ) {
                push ( @loci, $val ); # locus_tags
            }
        }
        if ( $feat_obj->has_tag( 'gene' ) ) {
            for my $val ( $feat_obj->get_tag_values( 'gene' ) ) {
                push ( @genes, $val ); # gene names
            }
        }
        else {
            push ( @genes, "" ); # if gene names are absent, leave empty
        }
        if ( $feat_obj->location->isa( 'Bio::Location::Simple' ) ) { # gene coordinates
            for my $location ( $feat_obj->location ) {
                push ( @start_coords, $location->start );
                push ( @end_coords, $location->end );
                if ( $location->strand == -1 ) {
                    push ( @directions, "reverse" );
                }
                else {
                    push ( @directions, "forward" );
                }
            }
        }
    }
    # gene products, notes, functions
    if ( $feat_obj->primary_tag eq ( 'CDS' ) || $feat_obj->primary_tag eq ( 'misc_feature' ) ||
         $feat_obj->primary_tag eq ( 'ncRNA' ) || $feat_obj->primary_tag eq ( 'rRNA' ) ||
         $feat_obj->primary_tag eq ( 'tRNA' ) || $feat_obj->primary_tag eq ( 'misc_RNA' ) ) {
        if ( $feat_obj->has_tag( 'product' ) ) {
            for my $product ( $feat_obj->get_tag_values( 'product' ) ) {
                push ( @products, $product );
            }
        }
        else {
            push ( @products, "" );
        }
        if ( $feat_obj->has_tag( 'note' ) ) {
            for my $note ( $feat_obj->get_tag_values( 'note' ) ) {
                push ( @notes, $note );
            }
        }
        else {
            push ( @notes, "" );
        }
        if ( $feat_obj->has_tag( 'function' ) ) {
            for my $function ( $feat_obj->get_tag_values( 'function' ) ) {
                push ( @functions, $function );
            }
        }
        else {
            push ( @functions, "" );
        }
    }
}

print "locus\tgene_name\tstart_nt\tend_nt\tlength_nt\tdirection\tproduct\tnote\tfunction\tsequence_nt\n"; # header

for ( my $elem = 0; $elem < scalar @loci; ++$elem ) {
    print $loci[$elem], "\t", $genes[$elem], "\t", $start_coords[$elem], "\t",
        $end_coords[$elem], "\t", length( $seqs[$elem] ), "\t",
        $directions[$elem], "\t", $products[$elem], "\t", $notes[$elem], "\t",
        $functions[$elem], "\t", $seqs[$elem], "\n";
}

From p.j.a.cock at googlemail.com Thu Sep 8 11:27:56 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 8 Sep 2011 16:27:56 +0100
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References:
Message-ID:

On Thu, Sep 8, 2011 at 3:44 PM, galeb abu-ali wrote:
> Hi,
>
> I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of
> multiple tags within a primary tag. E.g., when there are several 'function'
> tag-values within a 'CDS' primary tag, I don't know how to link those
> 'function' tag-values to a particular 'locus_tag'.

Do you have GenBank features with multiple locus_tag qualifiers?
That would be very unusual...

Peter

From cjfields at illinois.edu Thu Sep 8 11:32:21 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 15:32:21 +0000
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References:
Message-ID:

On Sep 8, 2011, at 10:27 AM, Peter Cock wrote:

> On Thu, Sep 8, 2011 at 3:44 PM, galeb abu-ali wrote:
>> Hi,
>>
>> I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of
>> multiple tags within a primary tag. E.g., when there are several 'function'
>> tag-values within a 'CDS' primary tag, I don't know how to link those
>> 'function' tag-values to a particular 'locus_tag'.
>
> Do you have GenBank features with multiple locus_tag qualifiers?
> That would be very unusual...
>
> Peter

Agreed; in order to clarify what you mean, I think we would need to see the record in question to get a better idea of the problem.

chris

From abualiga2 at gmail.com Thu Sep 8 11:39:20 2011
From: abualiga2 at gmail.com (galeb abu-ali)
Date: Thu, 8 Sep 2011 11:39:20 -0400
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References:
Message-ID:

I guess I was not clear. 'locus_tag' qualifiers are single, but there are multiple 'function' qualifiers within a primary feature (e.g. 'CDS').

# gbk file
LOCUS       NC_011748            5154862 bp    DNA     circular BCT 15-MAY-2010

# example feature
     gene            complement(1336169..1337905)
                     /gene="cvrA"
                     /locus_tag="EC55989_1287"
                     /db_xref="GeneID:7145846"
     CDS             complement(1336169..1337905)
                     /gene="cvrA"
                     /locus_tag="EC55989_1287"
                     /function="7 : Transport and binding proteins"
                     /function="15.10 : Adaptations to atypical conditions"
                     /function="16.1 : Circulate"
                     /inference="ab initio prediction:AMIGene:2.0"
                     /note="the Vibrio parahaemolyticus gene VP2867 was found
                     to be a potassium/proton antiporter; can rapidly extrude
                     potassium against a potassium gradient at alkaline pH
                     when cloned and expressed in Escherichia coli"
                     /codon_start=1
                     /transl_table=11
                     /product="potassium/proton antiporter"
                     /protein_id="YP_002402372.1"
                     /db_xref="GI:218694705"
                     /db_xref="GeneID:7145846"
                     /translation="MDATTIISLFILGSILVTSSILLSSFSSRLGIPILVIFLAIGML
                     AGVDGVGGIPFDNYPFAYMVSNLALAIILLDGGMRTQASSFRVALGPALSLATLGVLI
                     TSGLTGMMAAWLFNLDLIEGLLIGAIVGSTDAAAVFSLLGGKGLNERVGSTLEIESGS
                     NDPMAVFLTITLIAMIQQHESSVSWMFVVDILQQFGLGIVIGLGGGYLLLQMINRIAL
                     PAGLYPLLALSGGILIFALTTALEGSGILAVYLCGFLLGNRPIRNRYGILQNFDGLAW
                     LAQIAMFLVLGLLVNPSDLLPIAIPALILSAWMIFFARPLSVFAGLLPFRGFNLRERV
                     FISWVGLRGAVPIILAVFPMMAGLENARLFFNVAFFVVLVSLLLQGTSLSWAAKKAKV
                     VVPPVGRPVSRVGLDIHPENPWEQFVYQLSADKWCVGAALRDLHMPKETRIAALFRDN
                     QLLHPTGSTRLREGDVLCVIGRERDLPALGKLFSQSPPVALDQRFFGDFILEASAKYA
                     DVALIYGLEDGREYRDKQQTLGEIVQQLLGAAPVVGDQVEFAGMIWTVAEKEDNEVLK
                     IGVRVAEEEAES"

On Thu, Sep 8, 2011 at 11:32 AM, Fields,
Christopher J < cjfields at illinois.edu> wrote: > On Sep 8, 2011, at 10:27 AM, Peter Cock wrote: > > > On Thu, Sep 8, 2011 at 3:44 PM, galeb abu-ali > wrote: > >> Hi, > >> > >> I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of > >> multiple tags within a primary tag. E.g., when there are several > 'function' > >> tag-values within a 'CDS' primary tag, I don't know how to link those > >> 'function' tag-values to a particular 'locus_tag'. > > > > Do you have GenBank features with multiple locus_tag qualifiers? > > That would be very unusual... > > > > Peter > > Agreed; in order to clarify what you mean, I think we would need to see the > record in question to get a better idea of the problem. > > chris From p.j.a.cock at googlemail.com Thu Sep 8 11:46:28 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 8 Sep 2011 16:46:28 +0100 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: Message-ID: On Thu, Sep 8, 2011 at 4:39 PM, galeb abu-ali wrote: > I guess I was not clear. 'locus_tag' qualifiers are single, but there are > mutliple 'function' qualifiers within a primary feature (e.g. 'CDS'). So are your intending to look at all the CDS features only, and build a hash using the locus_tag as the key, and a list of the 'function' qualifiers as values? Peter From abualiga2 at gmail.com Thu Sep 8 11:55:08 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 11:55:08 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: Message-ID: So are your intending to look at all the CDS features only, and build a hash using the locus_tag as the key, and a list of the 'function' qualifiers as values? Precisely! I want to create a tab delim file with 'locus_tag' as the common identifier to all the features and gene sequences. 
So far, I parsed out sequences and single instance qualifiers, but 'function' and 'db_xref' qualifiers give me grief. galeb From abualiga2 at gmail.com Thu Sep 8 12:14:07 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 12:14:07 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: I only had a quick look at your code, so maybe I'm missing something but you are currently pushing all products of all CDSs into the same array, i.e. you do not assign them to a datastructure that links a particular CDS to a list of products. You then use the same index to print out a locus from the @loci array and a product from @products, but the two will not match up because you will have more products than loci. That's right. Products are not the issue in this particular case, as it's E.coli and there's no alternate splicing as far as I know so there is a single product per gene. But there are plenty more 'function' qualifiers, for example, than loci. And I don't know how to create a data structure that will link a 'gene' (as primary tag) to all other qualifiers, whether they belong to 'CDS', 'Misc_RNA', 'Misc_feature', or other primary tags. From ss2489 at cornell.edu Thu Sep 8 12:28:40 2011 From: ss2489 at cornell.edu (Surya Saha) Date: Thu, 8 Sep 2011 12:28:40 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: You might want to explore using a hash of complex records that are very similar to structures in C/C++. 
More info at http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS -Surya On Thu, Sep 8, 2011 at 12:14 PM, galeb abu-ali wrote: > I only had a quick look at your code, so maybe I'm missing something but > you are currently pushing all products of all CDSs into the same array, > i.e. you do not assign them to a datastructure that links a particular > CDS to a list of products. You then use the same index to print out a > locus from the @loci array and a product from @products, but the two > will not match up because you will have more products than loci. > > > > That's right. Products are not the issue in this particular case, as it's > E.coli and there's no alternate splicing as far as I know so there is a > single product per gene. But there are plenty more 'function' qualifiers, > for example, than loci. And I don't know how to create a data structure > that > will link a 'gene' (as primary tag) to all other qualifiers, whether they > belong to 'CDS', 'Misc_RNA', 'Misc_feature', or other primary tags. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From fs5 at sanger.ac.uk Thu Sep 8 12:04:57 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Thu, 08 Sep 2011 17:04:57 +0100 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: Message-ID: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> I only had a quick look at your code, so maybe I'm missing something but you are currently pushing all products of all CDSs into the same array, i.e. you do not assign them to a datastructure that links a particular CDS to a list of products. You then use the same index to print out a locus from the @loci array and a product from @products, but the two will not match up because you will have more products than loci. 
Frank On Thu, 2011-09-08 at 10:44 -0400, galeb abu-ali wrote: > Hi, > > I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of > multiple tags within a primary tag. E.g., when there are several 'function' > tag-values within a 'CDS' primary tag, I don't know how to link those > 'function' tag-values to a particular 'locus_tag'. As parsed values are > returned as a list, I tried creating an array of hashes, where the hash-key > is 'locus_tag' and hash-values are multiple 'function' tags, but am failing > miserably. Pasted below is what I managed so far. At your convenience, > please advise. > > thanks! > > galeb > > #!/usr/local/bin/perl > # parse_gbk.pl > # gsa 09042011 > # script to parse out features from gbk > # > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Customizing_Sequence_Object_Construction > > use strict; use warnings; > use Bio::SeqIO; > > my @loci; > my @seqs; > my @directions; > my @start_coords; > my @end_coords; > my @genes; > my @products; > my @notes; > my @functions; > my %functions; > > my $gb_file = shift; > my $seqio_obj = Bio::SeqIO->new(-file => $gb_file ); > my $seq_obj = $seqio_obj->next_seq; > > for my $feat_obj ( $seq_obj->get_SeqFeatures ) { > if ( $feat_obj->primary_tag eq ( 'gene' ) ) { > if ($feat_obj->has_tag( 'locus_tag' ) ) { > push ( @seqs, $feat_obj->seq->seq ); #collect sequences > for my $val ( $feat_obj->get_tag_values( 'locus_tag' ) ) > { > push ( @loci, $val ); # locus_tags > } > } > if ( $feat_obj->has_tag( 'gene' ) ) { > for my $val ( $feat_obj->get_tag_values( 'gene' ) > ) { > push ( @genes, $val ); # gene names > } > } > else { > push ( @genes, "" ); # if gene names are absent, leave > empty > } > if ( $feat_obj->location->isa( 'Bio::Location::Simple' ) ) { # gene > coordinates > for my $location ( $feat_obj->location ) { > push ( @start_coords, $location->start ); > push ( @end_coords, $location->end ); > if ( $location->strand == -1 ) { > push ( @directions, "reverse" ); > } > else { > push ( 
@directions, "forward" ); > } > } > } > } > # gene products, notes, functions > if ( $feat_obj->primary_tag eq ( 'CDS' ) || $feat_obj->primary_tag eq ( > 'misc_feature' ) || $feat_obj->primary_tag eq ( 'ncRNA' ) || > $feat_obj->primary_tag eq ( 'rRNA' ) || $feat_obj->primary_tag eq ( 'tRNA' ) > || $feat_obj->primary_tag eq ( 'misc_RNA' ) ) { > if ( $feat_obj->has_tag( 'product' ) ) { > for my $product ( $feat_obj->get_tag_values( 'product' ) ) { > push ( @products, $product ); > } > } > else { > push ( @products, "" ); > } > if ( $feat_obj->has_tag( 'note' ) ) { > for my $note ( $feat_obj->get_tag_values( 'note' ) ) { > push ( @notes, $note ); > } > } > else { > push ( @notes, "" ); > } > if ( $feat_obj->has_tag( 'function' ) ) { > for my $function ( $feat_obj->get_tag_values( 'function' ) ) { > push ( @functions, $function ); > } > } > else { > push ( @functions, "" ); > } > > } > } > > print > "locus\tgene_name\tstart_nt\tend_nt\tlength_nt\tdirection\tproduct\tnote\tfunction\tsequence_nt\n"; > # header > > for ( my $elem = 0; $elem < scalar @loci; ++$elem ) { > print $loci[$elem], "\t",$genes[$elem], "\t", $start_coords[$elem], > "\t", $end_coords[$elem], "\t", length( $seqs[$elem] ), "\t", > $directions[$elem], "\t", $products[$elem], "\t", $notes[$elem], "\t", > $functions[$elem], "\t", $seqs[$elem], "\n"; > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
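
The point Frank makes above (parallel arrays drift out of step as soon as one feature carries several values for a tag) can be illustrated with a minimal sketch in plain Perl. It is only a sketch under stated assumptions: the @features list is a hypothetical stand-in for what primary_tag and get_tag_values would return for the cvrA record posted earlier in the thread, and the %record layout is one possible choice, not something prescribed by BioPerl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for parsed features (assumption, not BioPerl objects):
# each entry mimics a primary tag plus its tag => list-of-values pairs.
my @features = (
    {   primary_tag => 'gene',
        tags        => { locus_tag => ['EC55989_1287'], gene => ['cvrA'] },
    },
    {   primary_tag => 'CDS',
        tags        => {
            locus_tag => ['EC55989_1287'],
            product   => ['potassium/proton antiporter'],
            function  => [
                '7 : Transport and binding proteins',
                '15.10 : Adaptations to atypical conditions',
                '16.1 : Circulate',
            ],
        },
    },
);

# One record per locus_tag; every tag keeps a list, so multiple
# 'function' values stay attached to the locus they came from.
my %record;

for my $feat (@features) {
    my $tags = $feat->{tags};
    my ($locus) = @{ $tags->{locus_tag} || [] };   # list context: first value
    next unless defined $locus;                    # skip features without a locus_tag
    for my $tag (qw(gene product function)) {
        next unless $tags->{$tag};
        push @{ $record{$locus}{$tag} }, @{ $tags->{$tag} };
    }
}

# One tab-delimited row per locus; multi-valued tags are joined in place.
for my $locus ( sort keys %record ) {
    my $r = $record{$locus};
    print join( "\t",
        $locus,
        join( ',',  @{ $r->{gene}     || [] } ),
        join( ',',  @{ $r->{product}  || [] } ),
        join( '; ', @{ $r->{function} || [] } ),
    ), "\n";
}
```

Because rows are built per locus_tag rather than per array index, a feature with three 'function' qualifiers and one with none both end up in the right row. With real data the inner loop would presumably call has_tag and get_tag_values on each feature object instead of reading the mock hash.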
From cjfields at illinois.edu Thu Sep 8 12:51:22 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 16:51:22 +0000
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

There is no need to do that if one is using the Bio::SeqFeatureI interface. Note that get_tag_values always returns a list, so to snag a single value for a tag in a scalar, force list context on the LHS by enclosing the variable in ().

chris

-----------------------------
#!/usr/bin/env perl

use Modern::Perl;
use Bio::SeqIO;

my $in = Bio::SeqIO->new(-format => 'genbank',
                         -file   => shift);

while (my $seq = $in->next_seq) {
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'CDS';
        my ($locus) = $feat->has_tag('locus_tag') ?
            $feat->get_tag_values('locus_tag') : '';
        my @funcs = $feat->has_tag('function') ?
            $feat->get_tag_values('function') : ();
        say join("\t", $locus, join(',', @funcs));
    }
}

On Sep 8, 2011, at 11:28 AM, Surya Saha wrote:

> You might want to explore using a hash of complex records that are very
> similar to structures in C/C++. More info at
> http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS
>
> -Surya
>
> On Thu, Sep 8, 2011 at 12:14 PM, galeb abu-ali wrote:
>
>> I only had a quick look at your code, so maybe I'm missing something but
>> you are currently pushing all products of all CDSs into the same array,
>> i.e. you do not assign them to a datastructure that links a particular
>> CDS to a list of products. You then use the same index to print out a
>> locus from the @loci array and a product from @products, but the two
>> will not match up because you will have more products than loci.
>>
>>
>>
>> That's right. Products are not the issue in this particular case, as it's
>> E.coli and there's no alternate splicing as far as I know so there is a
>> single product per gene.
But there are plenty more 'function' qualifiers, >> for example, than loci. And I don't know how to create a data structure >> that >> will link a 'gene' (as primary tag) to all other qualifiers, whether they >> belong to 'CDS', 'Misc_RNA', 'Misc_feature', or other primary tags. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From abualiga2 at gmail.com Thu Sep 8 12:51:42 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 12:51:42 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: You might want to explore using a hash of complex records that are very similar to structures in C/C++. More info at http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS alright, thanks! From jskittrell at unmc.edu Thu Sep 8 12:40:31 2011 From: jskittrell at unmc.edu (Jeff S Kittrell) Date: Thu, 8 Sep 2011 11:40:31 -0500 Subject: [Bioperl-l] Error when parsing a blast file Message-ID: An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Thu Sep 8 13:28:53 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 8 Sep 2011 17:28:53 +0000 Subject: [Bioperl-l] Error when parsing a blast file In-Reply-To: References: Message-ID: <1F3664C5-6D6B-409C-BE5A-C5EB08975231@illinois.edu> What version of bioperl are you using? I think this issue was addressed a while ago, but it's possible there has been a regression. chris On Sep 8, 2011, at 11:40 AM, Jeff S Kittrell wrote: > Hello Gentlemen, > > I am using BioPerl to a parse a blast output file but have run into some difficulties. 
I've pin pointed the problem and have pasted an example below. If you look at query position 223-224 you will see a large insertion 65ish nucleotides. Since the insertion spans the entire line there are no nucleotide position numbers at the end or beginning of the line nor any nucleotides within the line (dashes only). > When the SearchIO parser encounters this record it dies with the error > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: no data for midline Query ------------------------------------------------------------ > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:368 > STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl5/Bio/SearchIO/blast.pm:1805 > STACK: BlastParseNucleotideForDBTopHitCONTIGSQUERY.pl:24 > ----------------------------------------------------------- > > > Has anyone encountered this problem before? Am I doing something wrong? > > Thanks > > Jeff Kittrell > Department of Genetics, Cell Biology & Anatomy > University of Nebraska Medical Center > 985805 Nebraska Medical Center > Omaha, NE 68198-5805 > > Query= 78065535 > > Length=523 > Score E > Sequences producing significant alignments: (Bits) Value > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled... 
576 1e-163 > > > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled > receptor 123 (GPR123), mRNA > Length=4298 > > Score = 576 bits (638), Expect = 1e-163 > Identities = 466/583 (80%), Gaps = 82/583 (14%) > Strand=Plus/Minus > > Query 1 CAGGACTCCGTGG-----ATGGCATCTCGGGCAGGGCCACGCTGGGGTCTGGGTGGGTCC 55 > ||||||||||||| | ||||||||||||||||||| |||||||||| |||||||| > Sbjct 2537 CAGGACTCCGTGGGCAGCAGGGCATCTCGGGCAGGGCCATGCTGGGGTCTCAGTGGGTCC 2478 > > Query 56 TTTGATGGAAGCCCCTGCTCTGCCTCTGGGGCGCCCCAGGACTGGAGGCCACAGGACAGA 115 > |||||||||| |||||||||||||||| ||| ||||||||||||||| |||||||||||| > Sbjct 2477 TTTGATGGAATCCCCTGCTCTGCCTCTAGGGTGCCCCAGGACTGGAGACCACAGGACAGA 2418 > > Query 116 AACCAGATGACCTTGTGCAGGGACGAGCACGTGGAACTGGGATAAAAGGAGTGGGCGTGG 175 > |||| ||||||| ||||| ||||| |||||| |||| |||||||| ||||||||||||| > Sbjct 2417 AACCGGATGACCGTGTGC-GGGACCAGCACGCGGAATTGGGATAAGGGGAGTGGGCGTGG 2359 > > Query 176 CCCAGAGCTTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGT------------ 223 > ||| |||| ||||||||||||||||||||||||||||||||||||||| > Sbjct 2358 CCCGGAGCGTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGTGTGATCACAAGG 2299 > > Query ------------------------------------------------------------ > > Sbjct 2298 AAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGG 2239 > > Query 224 ---GTGAACTGCTTCCGAAAGGTGGGGTCACTTTGGTGCCCCCAGTGACCTCATGTGGCA 280 > |||||| ||||| |||||| |||||||||| ||| |||||||||||||||||||||| > Sbjct 2238 GGTGTGAACGGCTTCTGAAAGGCGGGGTCACTTCGGTACCCCCAGTGACCTCATGTGGCA 2179 > > Query 281 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACTGTGTCCCCTG-CTCCGCC 339 > ||||||||||||||||||||||||||||||||||||||||| |||||| ||| | || | > Sbjct 2178 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACCGTGTCCTCTGCCCCCATC 2119 > > Query 340 TACACAGTAGTTTCATTTTTCCAGGGTCCTGTTCGGATGTTGCCGGTCCCATCGGTGCCA 399 > |||||||||||||| |||||||||||||| |||||||||||||||||||| ||||||||| > Sbjct 2118 TACACAGTAGTTTCGTTTTTCCAGGGTCCCGTTCGGATGTTGCCGGTCCCGTCGGTGCCA 2059 > > Query 400 AACGGCAGGTCTTCTAGCAAGTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 459 > 
||||||||| |||||||||| ||||||||||||||||||||||||||||||||||||||| > Sbjct 2058 AACGGCAGGCCTTCTAGCAATTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 1999 > > Query 460 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAGGTGACCAGGCC 502 > ||||||||||||||||||||||||||||||| ||| || |||| > Sbjct 1998 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAAGTGGCCGGGCC 1956 > > > > Lambda K H > 0.634 0.408 0.912 > > Gapped > Lambda K H > 0.625 0.410 0.780 > > Effective search space used: 47712920310 > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From abualiga2 at gmail.com Thu Sep 8 13:51:34 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 13:51:34 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: thanks, Chris! works perfect. To make sure I understand what's going on, forcing list context on $locus allows me to get one value at a time, which is then concatenated with \t to concatenated functions. thanks again! galeb On Thu, Sep 8, 2011 at 12:51 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > There is no need to do that if one is using the Bio::SeqFeatureI interface. > Note that get_tag_values always returns a list, so to snag a single value > for a tag in a scalar, force list context on the LHS by enclosing the > variable in (). > > chris > > ----------------------------- > #!/usr/bin/env perl > > use Modern::Perl; > use Bio::SeqIO; > > my $in = Bio::SeqIO->new(-format => 'genbank', > -file => shift); > > while (my $seq = $in->next_seq) { > for my $feat ($seq->get_SeqFeatures) { > next unless $feat->primary_tag eq 'CDS'; > my ($locus) = $feat->has_tag('locus_tag') ? > $feat->get_tag_values('locus_tag') : ''; > my @funcs = $feat->has_tag('function') ? 
>         $feat->get_tag_values('function') : ();
>         say join("\t", $locus, join(',', @funcs));
>     }
> }
>
>
>

From cjfields at illinois.edu Thu Sep 8 14:27:06 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 18:27:06 +0000
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID: <2CDD8237-140E-410B-A18C-68FAF54719D5@illinois.edu>

On Sep 8, 2011, at 12:51 PM, galeb abu-ali wrote:

> thanks, Chris! works perfect.
> To make sure I understand what's going on, forcing list context on $locus allows me to get one value at a time,...

You have to be careful in this circumstance; doing this:

    my $foo = @bar;

evaluates the array in scalar context, which returns the number of elements in @bar. The following

    my ($foo) = @bar;

forces list context and assigns the first value in @bar to $foo but tosses the rest. If you are sure there is only one value in @bar anyway, the above is fine (and is a common perl idiom).

> which is then concatenated with \t to concatenated functions.

I'm just using a simple join() to print off the results. Note the second element in the join list is an embedded join() with comma-sep values for functions.

chris

> thanks again!
>
> galeb
>
> On Thu, Sep 8, 2011 at 12:51 PM, Fields, Christopher J wrote:
> There is no need to do that if one is using the Bio::SeqFeatureI interface. Note that get_tag_values always returns a list, so to snag a single value for a tag in a scalar, force list context on the LHS by enclosing the variable in ().
>
> chris
>
> -----------------------------
> #!/usr/bin/env perl
>
> use Modern::Perl;
> use Bio::SeqIO;
>
> my $in = Bio::SeqIO->new(-format => 'genbank',
>                          -file   => shift);
>
> while (my $seq = $in->next_seq) {
>     for my $feat ($seq->get_SeqFeatures) {
>         next unless $feat->primary_tag eq 'CDS';
>         my ($locus) = $feat->has_tag('locus_tag') ?
>             $feat->get_tag_values('locus_tag') : '';
>         my @funcs = $feat->has_tag('function') ?
>             $feat->get_tag_values('function') : ();
>         say join("\t", $locus, join(',', @funcs));
>     }
> }
>
>

From cjfields at illinois.edu Thu Sep 8 14:30:06 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 18:30:06 +0000
Subject: [Bioperl-l] Error when parsing a blast file
In-Reply-To:
References: <1F3664C5-6D6B-409C-BE5A-C5EB08975231@illinois.edu>
Message-ID:

Try updating to the latest CPAN release (1.6.901, which is the pre-1.7 release).

chris

On Sep 8, 2011, at 1:19 PM, Jeff S Kittrell wrote:

> chris,
>
> I am using version 1.6.1
>
> Thanks,
>
>
> Jeff Kittrell
> Department of Genetics, Cell Biology & Anatomy
> University of Nebraska Medical Center
> 985805 Nebraska Medical Center
> Omaha, NE 68198-5805
>
> "Fields, Christopher J" ---09/08/2011 12:28:56 PM---What version of bioperl are you using? I think this issue was addressed a while ago, but it's possi
>
> From: "Fields, Christopher J"
> To: Jeff S Kittrell
> Cc: " "
> Date: 09/08/2011 12:28 PM
> Subject: Re: [Bioperl-l] Error when parsing a blast file
>
>
> What version of bioperl are you using? I think this issue was addressed a while ago, but it's possible there has been a regression.
>
> chris
>
> On Sep 8, 2011, at 11:40 AM, Jeff S Kittrell wrote:
>
> > Hello Gentlemen,
> >
> > I am using BioPerl to a parse a blast output file but have run into some difficulties. I've pin pointed the problem and have pasted an example below. If you look at query position 223-224 you will see a large insertion 65ish nucleotides. Since the insertion spans the entire line there are no nucleotide position numbers at the end or beginning of the line nor any nucleotides within the line (dashes only).
> > When the SearchIO parser encounters this record it dies with the error > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: no data for midline Query ------------------------------------------------------------ > > STACK: Error::throw > > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:368 > > STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl5/Bio/SearchIO/blast.pm:1805 > > STACK: BlastParseNucleotideForDBTopHitCONTIGSQUERY.pl:24 > > ----------------------------------------------------------- > > > > > > Has anyone encountered this problem before? Am I doing something wrong? > > > > Thanks > > > > Jeff Kittrell > > Department of Genetics, Cell Biology & Anatomy > > University of Nebraska Medical Center > > 985805 Nebraska Medical Center > > Omaha, NE 68198-5805 > > > > Query= 78065535 > > > > Length=523 > > Score E > > Sequences producing significant alignments: (Bits) Value > > > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled... 
576 1e-163 > > > > > > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled > > receptor 123 (GPR123), mRNA > > Length=4298 > > > > Score = 576 bits (638), Expect = 1e-163 > > Identities = 466/583 (80%), Gaps = 82/583 (14%) > > Strand=Plus/Minus > > > > Query 1 CAGGACTCCGTGG-----ATGGCATCTCGGGCAGGGCCACGCTGGGGTCTGGGTGGGTCC 55 > > ||||||||||||| | ||||||||||||||||||| |||||||||| |||||||| > > Sbjct 2537 CAGGACTCCGTGGGCAGCAGGGCATCTCGGGCAGGGCCATGCTGGGGTCTCAGTGGGTCC 2478 > > > > Query 56 TTTGATGGAAGCCCCTGCTCTGCCTCTGGGGCGCCCCAGGACTGGAGGCCACAGGACAGA 115 > > |||||||||| |||||||||||||||| ||| ||||||||||||||| |||||||||||| > > Sbjct 2477 TTTGATGGAATCCCCTGCTCTGCCTCTAGGGTGCCCCAGGACTGGAGACCACAGGACAGA 2418 > > > > Query 116 AACCAGATGACCTTGTGCAGGGACGAGCACGTGGAACTGGGATAAAAGGAGTGGGCGTGG 175 > > |||| ||||||| ||||| ||||| |||||| |||| |||||||| ||||||||||||| > > Sbjct 2417 AACCGGATGACCGTGTGC-GGGACCAGCACGCGGAATTGGGATAAGGGGAGTGGGCGTGG 2359 > > > > Query 176 CCCAGAGCTTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGT------------ 223 > > ||| |||| ||||||||||||||||||||||||||||||||||||||| > > Sbjct 2358 CCCGGAGCGTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGTGTGATCACAAGG 2299 > > > > Query ------------------------------------------------------------ > > > > Sbjct 2298 AAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGG 2239 > > > > Query 224 ---GTGAACTGCTTCCGAAAGGTGGGGTCACTTTGGTGCCCCCAGTGACCTCATGTGGCA 280 > > |||||| ||||| |||||| |||||||||| ||| |||||||||||||||||||||| > > Sbjct 2238 GGTGTGAACGGCTTCTGAAAGGCGGGGTCACTTCGGTACCCCCAGTGACCTCATGTGGCA 2179 > > > > Query 281 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACTGTGTCCCCTG-CTCCGCC 339 > > ||||||||||||||||||||||||||||||||||||||||| |||||| ||| | || | > > Sbjct 2178 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACCGTGTCCTCTGCCCCCATC 2119 > > > > Query 340 TACACAGTAGTTTCATTTTTCCAGGGTCCTGTTCGGATGTTGCCGGTCCCATCGGTGCCA 399 > > |||||||||||||| |||||||||||||| |||||||||||||||||||| ||||||||| > > Sbjct 2118 TACACAGTAGTTTCGTTTTTCCAGGGTCCCGTTCGGATGTTGCCGGTCCCGTCGGTGCCA 2059 > > 
> > Query 400 AACGGCAGGTCTTCTAGCAAGTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 459 > > ||||||||| |||||||||| ||||||||||||||||||||||||||||||||||||||| > > Sbjct 2058 AACGGCAGGCCTTCTAGCAATTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 1999 > > > > Query 460 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAGGTGACCAGGCC 502 > > ||||||||||||||||||||||||||||||| ||| || |||| > > Sbjct 1998 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAAGTGGCCGGGCC 1956 > > > > > > > > Lambda K H > > 0.634 0.408 0.912 > > > > Gapped > > Lambda K H > > 0.625 0.410 0.780 > > > > Effective search space used: 47712920310 > > > > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From abualiga2 at gmail.com Thu Sep 8 14:34:41 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 14:34:41 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: <2CDD8237-140E-410B-A18C-68FAF54719D5@illinois.edu> References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> <2CDD8237-140E-410B-A18C-68FAF54719D5@illinois.edu> Message-ID: many thanks again, Chris! I was reading Programming Perl, but this sums it up better. On Thu, Sep 8, 2011 at 2:27 PM, Fields, Christopher J wrote: > On Sep 8, 2011, at 12:51 PM, galeb abu-ali wrote: > > > thanks, Chris! works perfect. > > To make sure I understand what's going on, forcing list context on $locus > allows me to get one value at a time,... > > You have to be careful in this circumstance; doing this: > > my $foo = @bar; > > is scalar context on a list, which returns the number of elements in @bar. > The following > > my ($foo) = @bar; > > forces list context and assigns the first value in @bar to $foo but tosses > the rest. If you are sure there is only one value in @bar anyway, the above > is fine (and is a common perl idiom). > > > which is then concatenated with \t to concatenated functions. 
>
> I'm just using a simple join() to print off the results. Note the second
> element in the join list is an embedded join() with comma-sep values for
> functions.
>
> chris
>
> > thanks again!
> >
> > galeb
> >
> > On Thu, Sep 8, 2011 at 12:51 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:
> > There is no need to do that if one is using the Bio::SeqFeatureI
> > interface. Note that get_tag_values always returns a list, so to snag a
> > single value for a tag in a scalar, force list context on the LHS by
> > enclosing the variable in ().
> >
> > chris
> >
> > -----------------------------
> > #!/usr/bin/env perl
> >
> > use Modern::Perl;
> > use Bio::SeqIO;
> >
> > my $in = Bio::SeqIO->new(-format => 'genbank',
> >                          -file   => shift);
> >
> > while (my $seq = $in->next_seq) {
> >     for my $feat ($seq->get_SeqFeatures) {
> >         next unless $feat->primary_tag eq 'CDS';
> >         my ($locus) = $feat->has_tag('locus_tag') ?
> >             $feat->get_tag_values('locus_tag') : '';
> >         my @funcs = $feat->has_tag('function') ?
> >             $feat->get_tag_values('function') : ();
> >         say join("\t", $locus, join(',', @funcs));
> >     }
> > }
> >
> >
> >

From David.Messina at sbc.su.se Fri Sep 9 05:40:25 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 9 Sep 2011 11:40:25 +0200
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
Message-ID:

Hi Manju,

> But this is not showing all query coverage as it shows in simple blast (see
> attached file)

I'm not sure what you mean by query coverage here, as the blast report you attached doesn't (as far as I can see) include a calculation of the number or percentage of query bases covered.

But in any case, everything in that blast report is available in the Bio::SearchIO object that B::T::R::RemoteBlast returns. Have you taken a look at http://www.bioperl.org/wiki/HOWTO:SearchIO ?
That, along with the module documentation, should help you find the parts of the BLAST report you're looking for.

> and i also want to write that result in a blast file. Is there any method
> which can write the remoteblast output
> in a file with blast extension?

It is possible to write out the results in a format that closely resembles the native blast report, but it's not recommended. If you want to just run BLAST and get back a report, there's no need to use BioPerl to parse the report first and then recreate the report.

This might also be a good time to mention that, if you're doing more than a few hundred BLAST searches, you'll find it much more efficient to download the database and the BLAST program from NCBI and run them on your own computer. NCBI severely limits the speed and frequency of remote BLASTs, and furthermore remote searches are much more prone to failure.

Also, if you're using BLAST+, you can run your BLASTs on NCBI's computers remotely without BioPerl. Check out the -remote command-line option; it's my favorite new feature!

Dave

From David.Messina at sbc.su.se Fri Sep 9 06:53:01 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 9 Sep 2011 12:53:01 +0200
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
Message-ID:

If you don't want to learn how to do this in BioPerl, then take my previous suggestion and just use NCBI's tools:

> Also, if you're using BLAST+, you can run your BLASTs on NCBI's computers
> remotely without BioPerl. Check out the -remote command-line option

On Fri, Sep 9, 2011 at 12:07, Manju Rawat wrote:

> I don't know more about BioPerl....
> and i just want to blast my sequences using bioperl...
> and see the result in a file...
> pls tell me what should i do???
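
On the query-coverage question raised earlier in this thread: the text report does not carry a coverage figure, but one can be derived from the query coordinates of a hit's HSPs. What follows is only a sketch under stated assumptions: query_coverage is a hypothetical helper (not a BioPerl method), the ranges are made up, and it takes plain [start, end] pairs so that it runs without BioPerl installed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): percentage of the query that
# is covered by at least one HSP, given 1-based inclusive [start, end] pairs.
sub query_coverage {
    my ($query_length, @ranges) = @_;
    return 0 unless $query_length && @ranges;

    # Normalise each pair so that start <= end, then sort by start.
    my @norm;
    for my $r (@ranges) {
        my ($s, $e) = @$r;
        ($s, $e) = ($e, $s) if $s > $e;
        push @norm, [$s, $e];
    }
    my @sorted = sort { $a->[0] <=> $b->[0] } @norm;

    # Merge overlapping or adjacent ranges and add up the merged lengths.
    my $covered = 0;
    my ($cur_start, $cur_end) = @{ shift @sorted };
    for my $r (@sorted) {
        if ($r->[0] <= $cur_end + 1) {
            $cur_end = $r->[1] if $r->[1] > $cur_end;
        }
        else {
            $covered += $cur_end - $cur_start + 1;
            ($cur_start, $cur_end) = @$r;
        }
    }
    $covered += $cur_end - $cur_start + 1;

    return 100 * $covered / $query_length;
}

# Made-up ranges for illustration: two overlapping HSPs and a separate one.
my @hsps = ([1, 100], [51, 150], [301, 400]);
printf "Query coverage: %.1f%%\n", query_coverage(500, @hsps);
# prints "Query coverage: 50.0%"
```

With Bio::SearchIO, the pairs would presumably be collected from $hsp->start('query') and $hsp->end('query') for each HSP of a hit, and the length from $result->query_length. Merging overlapping HSPs is one reasonable definition of coverage; it may not match the figure shown on the NCBI web page exactly.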

From manju.rawat2 at gmail.com Fri Sep 9 07:05:57 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Fri, 9 Sep 2011 07:05:57 -0400
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
Message-ID:

I want to learn, and I am learning it from the beginning.
My main goal is to write a program that reports the sequences which have no BLAST result (no matches in any database, or in a particular database).
For this I have to run BLAST many times, but I am not getting the desired result from BLAST. This is the main problem I am facing.
Now please tell me what procedure I should follow.

Manju

From cjfields at illinois.edu Fri Sep 9 09:03:26 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 9 Sep 2011 13:03:26 +0000
Subject: [Bioperl-l] blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
Message-ID:

If you are planning on looking against 'everything' (e.g. nt or nr), and you have many sequences to run, I would follow Dave's suggestion and download BLAST locally.

chris

On Sep 9, 2011, at 6:05 AM, Manju Rawat wrote:

> I want to learn, and I am learning it from the beginning.
> My main goal is to write a program that reports the sequences which have
> no BLAST result (no matches in any database, or in a particular database).
> For this I have to run BLAST many times, but I am not getting the desired
> result from BLAST. This is the main problem I am facing.
> Now please tell me what procedure I should follow.
>
>
> Manju

From manju.rawat2 at gmail.com Fri Sep 9 05:01:55 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Fri, 9 Sep 2011 05:01:55 -0400
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
Message-ID:

Thanks to all. It's working.
I tried that module and got the following result in the terminal:

waiting......db is All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
hit name is ref|NM_181451.1| score is 240
hit name is ref|NM_001008415.1| score is 234
hit name is ref|XM_002706247.1| score is 212
hit name is ref|XM_002683856.1| score is 208
hit name is gb|EF197120.1| score is 208
hit name is ref|XR_083566.1| score is 198
hit name is ref|NM_001097567.1| score is 198
hit name is ref|NM_001098089.1| score is 198
hit name is ref|XM_002699708.1| score is 192
hit name is ref|XM_592786.5| score is 192
hit name is ref|XM_001251693.3| score is 192
hit name is gb|AF490400.1| score is 190
hit name is gb|AY075103.1| score is 190
hit name is ref|XR_083457.1| score is 178

But this is not showing the query coverage as it is shown in a simple BLAST (see attached file).

I also want to write that result to a blast file. Is there any method which can write the RemoteBlast output to a file with a blast extension?

Thanks
Manju Rawat.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: res.blast Type: application/octet-stream Size: 218976 bytes Desc: not available URL: From ross at cuhk.edu.hk Sat Sep 10 02:39:23 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Sat, 10 Sep 2011 14:39:23 +0800 Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file Message-ID: <048a01cc6f84$60c41090$224c31b0$@edu.hk> I use the following code to derive the distance between two nodes but an error "MSG: could not find the lca of supplied nodes; can't find distance either" What's the problem? use Bio::TreeIO; ($treefh) = @ARGV; my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick"); my $tree = $treeio->next_tree; $keyword="Mycobacterium_tuberculosis_H37Rv"; my $Tnodes = $tree->find_node(-id => $keyword); $keyword="Mycobacterium_smegmatis_str._MC2_155"; my $Mnodes = $tree->find_node(-id => $keyword); $keyword="Mycobacterium_abscessus"; my $Anodes = $tree->find_node(-id => $keyword); my @root = $tree->get_root_node; #my $distances = $tree->distance(-nodes => [$node[0],$root]); my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]); print "Dist:$distances\n"; #### the following is the infile (((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. 
6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1. 4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC 
_35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. 
_ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; From greg at ebi.ac.uk Sat Sep 10 11:39:52 2011 From: greg at ebi.ac.uk (Gregory Jordan) Date: Sat, 10 Sep 2011 16:39:52 +0100 Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file In-Reply-To: <048a01cc6f84$60c41090$224c31b0$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> Message-ID: Hi Ross, Which version of BioPerl are you using? 
With the refactored tree code (available from the tree_api_refresh branch on the BioPerl github repo: https://github.com/bioperl/bioperl-live/blob/topic/tree_api_refresh/Bio/Tree/NodeFunctionsI.pm#L406) the following script works for me. Do those values look sensible to you?

The code on the new branch is a bit experimental, so I wouldn't be surprised if some edge cases for calculations like this aren't covered.

--greg

use Bio::TreeIO;

my $treeio = Bio::TreeIO->new(-file => 'temp.nh', -format => "newick");
my $tree = $treeio->next_tree;

my $mt = $tree->find("Mycobacterium_tuberculosis_H37Rv");
my $ms = $tree->find("Mycobacterium_smegmatis_str._MC2_155");
my $ma = $tree->find("Mycobacterium_abscessus");

my $distance = $mt->distance($ma);
print "MT - MA: ".$mt->distance($ma)."\n";
print "MT - MS: ".$mt->distance($ms)."\n";
print "MS - MA: ".$ms->distance($ma)."\n";

# MT - MA: 0.24326
# MT - MS: 0.18573
# MS - MA: 0.20729

--greg

On Sat, Sep 10, 2011 at 7:39 AM, Ross KK Leung wrote:

> I use the following code to derive the distance between two nodes but an
> error "MSG: could not find the lca of supplied nodes; can't find distance
> either"
>
>
>
> What's the problem?
> > > > use Bio::TreeIO; > > > > ($treefh) = @ARGV; > > > > my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick"); > > my $tree = $treeio->next_tree; > > > > $keyword="Mycobacterium_tuberculosis_H37Rv"; > > my $Tnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_smegmatis_str._MC2_155"; > > my $Mnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_abscessus"; > > my $Anodes = $tree->find_node(-id => $keyword); > > > > my @root = $tree->get_root_node; > > #my $distances = $tree->distance(-nodes => [$node[0],$root]); > > > > my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]); > > print "Dist:$distances\n"; > > > > > > #### the following is the infile > > > (((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM > > u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M > > ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: > > 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac > > terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM > > u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu > > berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac > > terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 > > -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium > > _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. > > 6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis > > _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1. 
> > 4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 > > .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac > > terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My > > cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc > > occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e > > rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo > > coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D > > ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne > > bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 > > 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r > > esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory > > nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. > > 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory > > nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu > > m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, > > ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii > > _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 > > 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot > > uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
> > 2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 > > 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: > > 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol > > ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 > > 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 > > :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st > > riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu > > m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. > > 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 > > )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N > > RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi > > ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ > > sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( > > Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 > > 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are > > nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr > > omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 > > 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 > > 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 > > 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 > > .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. > > 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 > > )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea > > e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
> > 02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ > > Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT > > CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ > > SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte > > r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 > > 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac > > eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ > > actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane > > nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) > > 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph > > ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC > > C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 > > 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 > > 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo > > ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ > > sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen > > anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) > > 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line > > ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: > > 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta > > xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 > > 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ > > taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco > > sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 > > 
8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC > > _35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 > > .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 > > 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 > > 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom > > yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s > > tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od > > ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 > > 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce > > llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 > > 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ > > 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. > > 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ > > DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces > > _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 > > 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep > > tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 > > 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc > > es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida > > ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s > > viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 > > .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 > > :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S > > treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 > > 
63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 > > :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. > > _ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept > > omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 > > E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. > > 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci > > dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass > > onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo > > monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 > > 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 > > 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra > > nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 > > )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 > > 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. > > 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos > > us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro > > metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ > > neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 > > 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu > > m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 > > )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte > > rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
> > 03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC > > C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact > > erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 > > 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 > > 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube > > rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium > > _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu > > m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 > > 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- > > 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 > > :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 > > E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis > > _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t > > uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc > > ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri > > um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium > > _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba > > cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K > > ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E > -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ross at cuhk.edu.hk Sat Sep 10 19:06:44 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Sun, 11 Sep 2011 07:06:44 +0800 Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file In-Reply-To: References: 
<048a01cc6f84$60c41090$224c31b0$@edu.hk>
Message-ID: <04a601cc700e$4f051d10$ed0f5730$@edu.hk>

Hi Greg,

The values are correct! However, how do I install this bioperl-live module? My BioPerl is 1.6.1, and I get this error:

Can't locate object method "find" via package "Bio::Tree::Tree" at TreeCalDist.pl line 32, line 1.

my $mt = $tree->find($keyword); #line 32

From: gjuggler at gmail.com [mailto:gjuggler at gmail.com] On Behalf Of Gregory Jordan
Sent: September 10, 2011 23:40
To: bioperl-l List; Ross KK Leung
Subject: Re: [Bioperl-l] fail to obtain node-to-node distance from a newick file

Hi Ross,

Which version of BioPerl are you using?

With the refactored tree code (available from the tree_api_refresh branch on the BioPerl github repo: https://github.com/bioperl/bioperl-live/blob/topic/tree_api_refresh/Bio/Tree/NodeFunctionsI.pm#L406) the following script works for me. Do those values look sensible to you? The code on the new branch is a bit experimental, so I wouldn't be surprised if all the edge cases for calculations like this aren't covered.

--greg

use Bio::TreeIO;

my $treeio = new Bio::TreeIO(-file => 'temp.nh', -format => "newick");
my $tree = $treeio->next_tree;
my $mt = $tree->find("Mycobacterium_tuberculosis_H37Rv");
my $ms = $tree->find("Mycobacterium_smegmatis_str._MC2_155");
my $ma = $tree->find("Mycobacterium_abscessus");
my $distance = $mt->distance($ma);
print "MT - MA: ".$mt->distance($ma)."\n";
print "MT - MS: ".$mt->distance($ms)."\n";
print "MS - MA: ".$ms->distance($ma)."\n";
# MT - MA: 0.24326
# MT - MS: 0.18573
# MS - MA: 0.20729

--greg

On Sat, Sep 10, 2011 at 7:39 AM, Ross KK Leung wrote:

I use the following code to derive the distance between two nodes, but I get the error "MSG: could not find the lca of supplied nodes; can't find distance either".

What's the problem?
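A note on the script in this message: it stores the lookups in $Tnodes, $Mnodes and $Anodes, but then passes $Tnode (no trailing "s") to distance(). Without "use strict", Perl silently treats $Tnode as a new, undefined variable, which may be what leads to the "could not find the lca" warning. The following is only a minimal sketch of the same calls under strict, assuming a BioPerl 1.6.x install where find_node() and distance() are called on the tree object. The toy tree, the in-memory filehandle and the names $node_a and $node_b are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;
use Bio::TreeIO;

# Toy tree for illustration only (an assumption, not the tree from the post).
my $newick = "((A:0.1,B:0.2):0.3,C:0.4);\n";
open my $fh, '<', \$newick or die "Cannot open in-memory handle: $!";

my $treeio = Bio::TreeIO->new(-fh => $fh, -format => 'newick');
my $tree   = $treeio->next_tree or die "No tree could be parsed";

# In scalar context find_node() returns the first node with a matching id.
my $node_a = $tree->find_node(-id => 'A') or die "Node A not found";
my $node_b = $tree->find_node(-id => 'B') or die "Node B not found";

# distance() adds up the branch lengths on the path between the two nodes.
# Under 'use strict' a misspelled variable name here is a compile-time error
# instead of a silent undef.
my $d_ab = $tree->distance(-nodes => [$node_a, $node_b]);
print "A - B: $d_ab\n";
```

Checking each find_node() result before calling distance() also separates a label that is missing from the tree from a real problem in the LCA calculation.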
use Bio::TreeIO;

($treefh) = @ARGV;

my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick");
my $tree = $treeio->next_tree;

$keyword="Mycobacterium_tuberculosis_H37Rv";
my $Tnodes = $tree->find_node(-id => $keyword);
$keyword="Mycobacterium_smegmatis_str._MC2_155";
my $Mnodes = $tree->find_node(-id => $keyword);
$keyword="Mycobacterium_abscessus";
my $Anodes = $tree->find_node(-id => $keyword);

my @root = $tree->get_root_node;
#my $distances = $tree->distance(-nodes => [$node[0],$root]);

my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]);
print "Dist:$distances\n";

#### the following is the infile
(((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. 6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1.
4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC 
_35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. 
_ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From manju.rawat2 at gmail.com Mon Sep 12 01:37:35 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Mon, 12 Sep 2011 01:37:35 -0400 Subject: [Bioperl-l] no blast result Message-ID: Hello, I want to make a program which first generate the random sequence and then gives me that result(sequence) which have no blast result(no 
matches in any database, or in a particular database). Is there anybody who can help me do this?

Please reply if anybody knows about it.

Thanks
Manju

From zhangchnxp at gmail.com Mon Sep 12 01:51:59 2011
From: zhangchnxp at gmail.com (Zhang chn)
Date: Mon, 12 Sep 2011 13:51:59 +0800
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID:

Hi,

IMHO, due to the nature of BLAST, a random sequence will usually not come back with no results at all; instead you get a set of matches with lower scores. What you can do is focus on the E-value, say, by setting a threshold on it.

FYI, http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html

On Mon, Sep 12, 2011 at 1:37 PM, Manju Rawat wrote:
> Hello,
> I want to make a program which first generates a random sequence and then
> gives me the result (sequence) which has no BLAST result (no matches in any
> database, or in a particular database). Is there anybody who can help me do
> this?
>
> Please reply if anybody knows about it.
> Thanks
> Manju

From manju.rawat2 at gmail.com Mon Sep 12 01:58:38 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Mon, 12 Sep 2011 01:58:38 -0400
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID:

Yes, I know this. It would also be useful to me if I get results with lower scores. But how could I do this?

Manju

From zhangchnxp at gmail.com Mon Sep 12 02:04:17 2011
From: zhangchnxp at gmail.com (Zhang chn)
Date: Mon, 12 Sep 2011 14:04:17 +0800
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID:

Please read the documentation for Bio::Tools::Run::StandAloneBlast and Bio::AlignIO.

On Mon, Sep 12, 2011 at 1:58 PM, Manju Rawat wrote:
> Yes, I know this. It would also be useful to me if I get results with lower
> scores.
> But how could I do this?
>
> Manju

From manju.rawat2 at gmail.com Mon Sep 12 07:12:40 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Mon, 12 Sep 2011 07:12:40 -0400
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID:

I read this, but the default program is not running properly. It shows many errors, such as:

MSG: cannot find path to blastall..
Use of uninitialized value $_[0] in join or string at /usr/share/perl/5.10/File/Spec/Unix.pm line 41.

It is also not showing the output that I want.

Please help me.

Manju Rawat

From arguelloj at gmail.com Sun Sep 11 22:52:42 2011
From: arguelloj at gmail.com (J. Fernando Arguello)
Date: Sun, 11 Sep 2011 19:52:42 -0700
Subject: [Bioperl-l] BioPerl - quick general question
Message-ID:

Dear BioPerl,

I'm excited to see a project like this! Basically I have a computer science background with a few years of development, research and minimal bioinformatics experience.

Dumb question: where on the BioPerl wiki(s) is the best place to begin for a developer who wants to contribute new code or bug fixes to BioPerl in the future?

Any input is much appreciated. Thank you all for your time.

Best,
Fernando
jfa

From briano at bioteam.net Mon Sep 12 09:20:36 2011
From: briano at bioteam.net (Brian Osborne)
Date: Mon, 12 Sep 2011 09:20:36 -0400
Subject: [Bioperl-l] Fwd: cds sequence extract
References: <112c4ef2.641e.1325c4b21cb.Coremail.maliang7121@163.com>
Message-ID: <671CAF11-55A4-462A-BC5B-805C87E1EB0E@bioteam.net>

Liang Ma,

I'm forwarding this to the Bioperl mailing list. If you're starting out with Bioperl I suggest you read this:

http://www.bioperl.org/wiki/HOWTO:Beginners

Brian O.

Begin forwarded message:

> From: maliang7121
> Date: September 12, 2011 2:20:20 AM EDT
> To: briano at bioteam.net
> Subject: cds sequence extract
>
> Dear Brian:
>
> I am a student at the Chinese Academy of Sciences. I am beginning to love BioPerl, but now I have a problem.
>
> Using the attached script, I can easily download sequences from NCBI. Now I need to extract the CDS sequences from the GenBank format files and put them all in a single file in FASTA format. I cannot do it. Could you spend a few minutes writing a script for me?
>
> Best!
>
> Liang Ma

Brian O.

--
Brian Osborne, PhD
BioTeam: http://bioteam.net
email: briano at bioteam.net
mobile: 978-317-3101

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: acc.txt
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: get_seq_by_acc_ml.pl
Type: text/x-perl-script
Size: 583 bytes
Desc: not available
URL:

From fs5 at sanger.ac.uk Mon Sep 12 09:54:21 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 12 Sep 2011 14:54:21 +0100
Subject: [Bioperl-l] no blast result
In-Reply-To:
References:
Message-ID: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>

It looks like BLAST is not installed on your system. The BioPerl module only runs BLAST for you and parses the output, but you still need the BLAST executables installed on your system. Follow the instructions on the NCBI website to download and install BLAST, and try running it on the command line with the "blastall" command. If that works, then you can also run it via BioPerl.

Frank

On Mon, 2011-09-12 at 07:12 -0400, Manju Rawat wrote:
> I read this, but the default program is not running properly. It shows many
> errors, such as:
>
> MSG: cannot find path to blastall..
> Use of uninitialized value $_[0] in join or string at
> /usr/share/perl/5.10/File/Spec/Unix.pm line 41.
>
> It is also not showing the output that I want.
>
> Please help me.
>
> Manju Rawat

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From p.j.a.cock at googlemail.com Mon Sep 12 10:00:30 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 12 Sep 2011 15:00:30 +0100
Subject: [Bioperl-l] no blast result
In-Reply-To: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

On Mon, Sep 12, 2011 at 2:54 PM, Frank Schwach wrote:
> looks like BLAST is not install on your system. The BioPerl module only
> runs BLAST for you and parses the output but you still need the BLAST
> executables installed on your system. Follow the instructions on the
> NCBI website to download and install BLAST and try running it on the
> commandline with the "blastall" command. If that works then you can run
> it also via BioPerl.
> Frank

Hang on - blastall is from the "legacy" BLAST suite. Does BioPerl still talk to that, or to the new BLAST+ suite (e.g. binaries blastn and blastp rather than blastall)?

Peter

From cjfields at illinois.edu Mon Sep 12 13:45:56 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 12 Sep 2011 17:45:56 +0000
Subject: [Bioperl-l] BioPerl - quick general question
In-Reply-To:
References:
Message-ID: <62B9B300-96AC-4511-A1B9-CFF36CBE6288@illinois.edu>

On Sep 11, 2011, at 9:52 PM, J. Fernando Arguello wrote:

> Dear BioPerl,
>
> I'm excited to see a project like this! Basically I have a computer science
> background with a few years of development, research and minimal
> bioinformatics experience.
>
> Dumb question: where on the BioPerl wiki(s) is the best place to begin for
> a developer who wants to contribute new code or bug fixes to BioPerl in the
> future?

The basic starting points are the HOWTOs and the tutorials (I'm not sure how up to date some of the latter are, but in general they should work):

http://www.bioperl.org/wiki/HOWTOs
http://www.bioperl.org/wiki/Tutorials

> Any input is much appreciated. Thank you all for your time.
>
> Best,
> Fernando
> jfa

We gladly welcome anyone willing to hack on BioPerl. The repository is now on github (core is https://github.com/bioperl/bioperl-live), so it's fairly easy to fork the code and make changes. We are in the middle of splitting up the large codebase into more manageable subdistributions, so it's probably a good idea to ask on the list about specific code in case the code in question resides in a separate repository.

Let us know if you have additional questions! Cheers!

chris

From shalabh.sharma7 at gmail.com Mon Sep 12 14:00:16 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 12 Sep 2011 14:00:16 -0400
Subject: [Bioperl-l] Module for SOCS
Message-ID:

Hi All,

I am using SOCS for mapping my SOLiD data. I was just wondering if there is any module in BioPerl to analyze SOCS output files, or the mapreads format, directly.

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From greg at ebi.ac.uk Tue Sep 13 04:30:58 2011
From: greg at ebi.ac.uk (Gregory Jordan)
Date: Tue, 13 Sep 2011 09:30:58 +0100
Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file
In-Reply-To: <04a601cc700e$4f051d10$ed0f5730$@edu.hk>
References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk>
Message-ID:

Hi Ross,

I don't typically 'install' versions of BioPerl from GitHub.
Rather, I check out the code into a directory that's on my Perl search path (and make sure any other BioPerl code isn't on the path anymore). I think the following commands should get you the right set of code:

> git clone git://github.com/bioperl/bioperl-live.git
> cd bioperl-live
> git checkout topic/tree_api_refresh

After that, I'm afraid I'll have to leave it to you (or someone else on the list). I'm no Perl guru, so I don't know the "right" way to direct Perl towards a developmental BioPerl branch.

Cheers,
Greg

2011/9/11 Ross KK Leung

> Hi Greg,
>
> The values are correct! However, how do I install this bioperl-live module?
> My BioPerl is 1.6.1, and I get this error:
>
> Can't locate object method "find" via package "Bio::Tree::Tree" at
> TreeCalDist.pl line 32, line 1.
>
> my $mt = $tree->find($keyword); #line 32
>
> From: gjuggler at gmail.com [mailto:gjuggler at gmail.com] On Behalf Of Gregory Jordan
> Sent: September 10, 2011 23:40
> To: bioperl-l List; Ross KK Leung
> Subject: Re: [Bioperl-l] fail to obtain node-to-node distance from a newick file
>
> Hi Ross,
>
> Which version of BioPerl are you using?
>
> With the refactored tree code (available from the tree_api_refresh branch
> on the BioPerl github repo:
> https://github.com/bioperl/bioperl-live/blob/topic/tree_api_refresh/Bio/Tree/NodeFunctionsI.pm#L406)
> the following script works for me. Do those values look sensible to you?
> The code on the new branch is a bit experimental, so I wouldn't be surprised
> if all the edge cases for calculations like this aren't covered.
>
> --greg
>
> use Bio::TreeIO;
>
> my $treeio = new Bio::TreeIO(-file => 'temp.nh', -format => "newick");
> my $tree = $treeio->next_tree;
> my $mt = $tree->find("Mycobacterium_tuberculosis_H37Rv");
> my $ms = $tree->find("Mycobacterium_smegmatis_str._MC2_155");
> my $ma = $tree->find("Mycobacterium_abscessus");
> my $distance = $mt->distance($ma);
> print "MT - MA: ".$mt->distance($ma)."\n";
> print "MT - MS: ".$mt->distance($ms)."\n";
> print "MS - MA: ".$ms->distance($ma)."\n";
> # MT - MA: 0.24326
> # MT - MS: 0.18573
> # MS - MA: 0.20729
>
> --greg
>
> On Sat, Sep 10, 2011 at 7:39 AM, Ross KK Leung wrote:
>
> I use the following code to derive the distance between two nodes but an
> error "MSG: could not find the lca of supplied nodes; can't find distance
> either"
>
> What's the problem?
> > > > use Bio::TreeIO; > > > > ($treefh) = @ARGV; > > > > my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick"); > > my $tree = $treeio->next_tree; > > > > $keyword="Mycobacterium_tuberculosis_H37Rv"; > > my $Tnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_smegmatis_str._MC2_155"; > > my $Mnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_abscessus"; > > my $Anodes = $tree->find_node(-id => $keyword); > > > > my @root = $tree->get_root_node; > > #my $distances = $tree->distance(-nodes => [$node[0],$root]); > > > > my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]); > > print "Dist:$distances\n"; > > > > > > #### the following is the infile > > > (((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM > > u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M > > ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: > > 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac > > terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM > > u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu > > berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac > > terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 > > -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium > > _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. > > 6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis > > _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1. 
> > 4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 > > .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac > > terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My > > cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc > > occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e > > rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo > > coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D > > ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne > > bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 > > 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r > > esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory > > nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. > > 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory > > nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu > > m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, > > ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii > > _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 > > 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot > > uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
> > 2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 > > 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: > > 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol > > ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 > > 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 > > :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st > > riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu > > m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. > > 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 > > )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N > > RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi > > ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ > > sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( > > Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 > > 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are > > nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr > > omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 > > 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 > > 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 > > 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 > > .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. > > 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 > > )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea > > e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
> > 02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ > > Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT > > CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ > > SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte > > r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 > > 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac > > eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ > > actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane > > nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) > > 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph > > ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC > > C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 > > 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 > > 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo > > ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ > > sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen > > anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) > > 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line > > ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: > > 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta > > xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 > > 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ > > taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco > > sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 > > 
8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC > > _35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 > > .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 > > 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 > > 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom > > yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s > > tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od > > ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 > > 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce > > llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 > > 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ > > 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. > > 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ > > DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces > > _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 > > 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep > > tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 > > 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc > > es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida > > ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s > > viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 > > .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 > > :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S > > treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 > > 
63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 > > :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. > > _ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept > > omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 > > E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. > > 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci > > dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass > > onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo > > monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 > > 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 > > 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra > > nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 > > )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 > > 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. > > 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos > > us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro > > metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ > > neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 > > 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu > > m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 > > )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte > > rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
> > 03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC > > C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact > > erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 > > 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 > > 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube > > rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium > > _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu > > m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 > > 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- > > 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 > > :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 > > E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis > > _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t > > uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc > > ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri > > um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium > > _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba > > cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K > > ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E > -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l**** > > ** ** > From manju.rawat2 at gmail.com Tue Sep 13 07:20:07 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Tue, 13 Sep 2011 07:20:07 -0400 Subject: [Bioperl-l] no blast result In-Reply-To: References: 
<1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

This is the Perl code:

#!/usr/bin/perl -w
use Bio::Perl;
use Bio::SearchIO;
use Bio::Tools::Run::StandAloneBlast;
@params = ('database' => 'swissprot',
           'READMETHOD' => 'Blastn');

$factory = Bio::Tools::Run::StandAloneBlast->new(@params);

$input = Bio::Seq->new(-id=>"testquery",-seq=>"gatcgtataccgtacagct");
$blast_report = $factory->blastall($input);

write_blast(">rs.blast",$blast_report);

It is showing this error:

Use of uninitialized value $_[0] in join or string
at /usr/share/perl/5.10/File/Spec/Unix.pm line 41.

MSG: cannot find path to blastall

From fs5 at sanger.ac.uk Tue Sep 13 11:09:24 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 13 Sep 2011 16:09:24 +0100
Subject: [Bioperl-l] no blast result
In-Reply-To:
References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID: <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk>

Peter: in BioPerl 1.6 the default executable name in
Bio::Tools::Run::StandAloneBlast is still set to "blastall" - I'm not sure
if it works with blast+ too.

Manju: as I said previously, you need to check that you can run BLAST on
the command line, i.e. make sure it is actually installed on your
system. Have you done that?
You can also check the Bio::Tools::Run::StandAloneBlast docs to see how
you can manually set the path to your BLAST executable if it is not in
your path. You have to install BLAST first before you can run this
module.
The other error you get from your code refers to something that is
outside of the code fragment you show here, so I can't comment on that
one.

Frank


On Tue, 2011-09-13 at 07:20 -0400, Manju Rawat wrote:
> This is the Perl code:
>
> #!/usr/bin/perl -w
> use Bio::Perl;
> use Bio::SearchIO;
> use Bio::Tools::Run::StandAloneBlast;
> @params = ('database' => 'swissprot',
>            'READMETHOD' => 'Blastn');
>
> $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
>
> $input = Bio::Seq->new(-id=>"testquery",-seq=>"gatcgtataccgtacagct");
> $blast_report = $factory->blastall($input);
>
> write_blast(">rs.blast",$blast_report);
>
> It is showing this error:
>
> Use of uninitialized value $_[0] in join or string
> at /usr/share/perl/5.10/File/Spec/Unix.pm line 41.
>
> MSG: cannot find path to blastall

--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.

From David.Messina at sbc.su.se Tue Sep 13 11:34:20 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 13 Sep 2011 17:34:20 +0200
Subject: [Bioperl-l] no blast result
In-Reply-To: <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk>
References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
 <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

There's a separate Bio::Tools::Run::BlastPlus module for blast+.

And a related HOWTO:
http://www.bioperl.org/wiki/HOWTO:BlastPlus


On Tue, Sep 13, 2011 at 17:09, Frank Schwach wrote:

> Peter: in BioPerl 1.6 the default executable name in
> Bio::Tools::Run::StandAloneBlast is still set to "blastall" - I'm not sure
> if it works with blast+ too.
>
> Manju: as I said previously, you need to check that you can run BLAST on
> the command line, i.e. make sure it is actually installed on your
> system. Have you done that?
> You can also check the Bio::Tools::Run::StandAloneBlast docs to see how
> you can manually set the path to your BLAST executable if it is not in
> your path.
> You have to install BLAST first before you can run this
> module.
> The other error you get from your code refers to something that is
> outside of the code fragment you show here, so I can't comment on that
> one.
>
> Frank
>
>
> On Tue, 2011-09-13 at 07:20 -0400, Manju Rawat wrote:
> > This is the Perl code:
> >
> > #!/usr/bin/perl -w
> > use Bio::Perl;
> > use Bio::SearchIO;
> > use Bio::Tools::Run::StandAloneBlast;
> > @params = ('database' => 'swissprot',
> >            'READMETHOD' => 'Blastn');
> >
> > $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
> >
> > $input = Bio::Seq->new(-id=>"testquery",-seq=>"gatcgtataccgtacagct");
> > $blast_report = $factory->blastall($input);
> >
> > write_blast(">rs.blast",$blast_report);
> >
> > It is showing this error:
> >
> > Use of uninitialized value $_[0] in join or string
> > at /usr/share/perl/5.10/File/Spec/Unix.pm line 41.
> >
> > MSG: cannot find path to blastall
> >
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at illinois.edu Tue Sep 13 15:36:21 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 13 Sep 2011 19:36:21 +0000
Subject: [Bioperl-l] no blast result
In-Reply-To:
References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID: <6570DEC6-B485-44B0-868A-AAC6329B3224@illinois.edu>

On Sep 12, 2011, at 9:00 AM, Peter Cock wrote:

> On Mon, Sep 12, 2011 at 2:54 PM, Frank Schwach wrote:
>> Looks like BLAST is not installed on your system.
>> The BioPerl module only
>> runs BLAST for you and parses the output but you still need the BLAST
>> executables installed on your system. Follow the instructions on the
>> NCBI website to download and install BLAST and try running it on the
>> command line with the "blastall" command. If that works then you can run
>> it also via BioPerl.
>> Frank
>
> Hang on - blastall is from the "legacy" BLAST suite, does
> BioPerl still talk to that or the new BLAST+ suite (e.g. binaries
> blastn and blastp rather than blastall)?
>
> Peter

(aside: thought I sent this the other day. never mix grant writing and open source)

Both BLAST and BLAST+ are supported via different modules. Some users don't
want to use BLAST+ for various reasons, though this may soon be out of their
control when NCBI eventually stops supporting legacy BLAST entirely.

chris

From manju.rawat2 at gmail.com Wed Sep 14 07:32:19 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Wed, 14 Sep 2011 07:32:19 -0400
Subject: [Bioperl-l] no blast result
In-Reply-To:
References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
 <6570DEC6-B485-44B0-868A-AAC6329B3224@illinois.edu>
Message-ID:

On Wed, Sep 14, 2011 at 7:31 AM, Manju Rawat wrote:

> I am trying to install BLAST+ on my system (Ubuntu) from this link:
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html
> but I am getting an error.
>
> First I downloaded BLAST (ncbi-blast-2.2.25+-ia32-linux.tar.gz) from
> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
> and then extracted it in the home/abc/ folder.
> After that I set the path in the terminal, i.e.
>
> PATH=$PATH:/home/abc/blast-2.2.25+/bin
>
> But when I run blast -help in the terminal it shows me this error:
>
> error while loading shared libraries:
> libbz2.so.1: cannot open shared object file: No such file or directory.
>

--
Regards
Manju Rawat
Project Assistant(NAIP)
Genomics Lab
ABTC,NDRI
Karnal-132001,Haryana

From kumarsaurabh20 at gmail.com Thu Sep 15 07:20:47 2011
From: kumarsaurabh20 at gmail.com (kumar Saurabh)
Date: Thu, 15 Sep 2011 13:20:47 +0200
Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux
Message-ID:

Hi,

I need to integrate the primer3 module in one of our pipelines. In the process,
I was testing the initial code given on the CPAN website. But whenever I try
to run this program it gives me an error: "Cannot locate the Object
method add_target via the package Bio::Tools::Run::Primer3Redux...."

The code I am using is as follows:

# design some primers.
# the output will be put into temp.out
use Bio::Tools::Primer3Redux;
use Bio::Tools::Run::Primer3Redux;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file=>'sample.dna');
my $seq = $seqio->next_seq;

my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out",
    -path => "/home/singh/Downloads/primer3-2.2.3/src/primer3_core");

# or after the fact you can change the program_name
$primer3->program_name('my_superfast_primer3');

unless ($primer3->executable) {
    print STDERR "primer3 can not be found. Is it installed?\n";
    exit(-1)
}

# set the maximum and minimum Tm of the primer
$primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90);

# Design the primers. This runs primer3 and returns a
# Bio::Tools::Primer3::result object with the results
# Primer3 can run in several modes (see explanation for
# 'PRIMER_TASK' in the primer3 documentation).
# To run a task,
# either call it by its PRIMER_TASK name as in these examples:
$pcr_primer_results = $primer3->pick_pcr_primers($seq);
$pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq );
$check_results = $primer3->check_primers();

# Alternatively, explicitly set the PRIMER_TASK parameter and
# use the generic 'run' method (this is mainly here for backwards
# compatibility) :
$primer3->PRIMER_TASK( 'pick_left_only' );
$result = $primer3->run( $seq );

# If no task is set and the 'run' method is called, primer3 will default to
# pick pcr primers.

# see the Bio::Tools::Primer3Redux POD for
# things that you can get from this. For example:

print "There were ", $results->num_primer_pairs, " primer pairs\n";


Can anyone help me with this???


Best regards,
Kumar

From fs5 at sanger.ac.uk Thu Sep 15 09:44:03 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 15 Sep 2011 14:44:03 +0100
Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux
In-Reply-To:
References:
Message-ID: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk>

Hi Kumar,

We are currently working on this module and you might want to check out
the latest version on Chris Fields' github project:

https://github.com/cjfields/Bio-Tools-Primer3Redux

There will probably be some changes again once I get some time again to
work on a few points we discussed lately. You can also check out my repo
here:
https://github.com/fschwach/Bio-Tools-Primer3Redux

but I will certainly have to make changes to that code because I used
AUTOLOAD in the last version, which is probably not a good idea.
My recommendation would be to use Chris' repo and see if that works for
you. If not, feedback would be much appreciated.

Cheers,

Frank


On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote:
> Hi,
>
> I need to integrate the primer3 module in one of our pipelines. In the process,
> I was testing the initial code given on the CPAN website.
> But whenever I try
> to run this program it gives me an error: "Cannot locate the Object
> method add_target via the package Bio::Tools::Run::Primer3Redux...."
>
> The code I am using is as follows:
>
> # design some primers.
> # the output will be put into temp.out
> use Bio::Tools::Primer3Redux;
> use Bio::Tools::Run::Primer3Redux;
> use Bio::SeqIO;
>
> my $seqio = Bio::SeqIO->new(-file=>'sample.dna');
> my $seq = $seqio->next_seq;
>
> my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out",
>     -path => "/home/singh/Downloads/primer3-2.2.3/src/primer3_core");
>
> # or after the fact you can change the program_name
> $primer3->program_name('my_superfast_primer3');
>
> unless ($primer3->executable) {
>     print STDERR "primer3 can not be found. Is it installed?\n";
>     exit(-1)
> }
>
> # set the maximum and minimum Tm of the primer
> $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90);
>
> # Design the primers. This runs primer3 and returns a
> # Bio::Tools::Primer3::result object with the results
> # Primer3 can run in several modes (see explanation for
> # 'PRIMER_TASK' in the primer3 documentation). To run a task,
> # either call it by its PRIMER_TASK name as in these examples:
> $pcr_primer_results = $primer3->pick_pcr_primers($seq);
> $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq );
> $check_results = $primer3->check_primers();
>
> # Alternatively, explicitly set the PRIMER_TASK parameter and
> # use the generic 'run' method (this is mainly here for backwards
> # compatibility) :
> $primer3->PRIMER_TASK( 'pick_left_only' );
> $result = $primer3->run( $seq );
>
> # If no task is set and the 'run' method is called, primer3 will default to
> # pick pcr primers.
>
> # see the Bio::Tools::Primer3Redux POD for
> # things that you can get from this. For example:
>
> print "There were ", $results->num_primer_pairs, " primer pairs\n";
>
>
> Can anyone help me with this???
>
>
> Best regards,
> Kumar
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.

From cjfields at illinois.edu Thu Sep 15 10:13:38 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 15 Sep 2011 14:13:38 +0000
Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux
In-Reply-To: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk>
References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

I mentioned off-list that this should be filed as a github issue so we don't
lose track. Unfortunately I can't get to it until next week (grant deadline).

chris

On Sep 15, 2011, at 8:44 AM, Frank Schwach wrote:

> Hi Kumar,
>
> We are currently working on this module and you might want to check out
> the latest version on Chris Fields' github project:
>
> https://github.com/cjfields/Bio-Tools-Primer3Redux
>
> There will probably be some changes again once I get some time again to
> work on a few points we discussed lately. You can also check out my repo
> here:
> https://github.com/fschwach/Bio-Tools-Primer3Redux
>
> but I will certainly have to make changes to that code because I used
> AUTOLOAD in the last version, which is probably not a good idea.
> My recommendation would be to use Chris' repo and see if that works for
> you. If not, feedback would be much appreciated.
>
> Cheers,
>
> Frank
>
>
> On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote:
>> Hi,
>>
>> I need to integrate the primer3 module in one of our pipelines. In the process,
>> I was testing the initial code given on the CPAN website.
>> But whenever I try
>> to run this program it gives me an error: "Cannot locate the Object
>> method add_target via the package Bio::Tools::Run::Primer3Redux...."
>>
>> The code I am using is as follows:
>>
>> # design some primers.
>> # the output will be put into temp.out
>> use Bio::Tools::Primer3Redux;
>> use Bio::Tools::Run::Primer3Redux;
>> use Bio::SeqIO;
>>
>> my $seqio = Bio::SeqIO->new(-file=>'sample.dna');
>> my $seq = $seqio->next_seq;
>>
>> my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out",
>>     -path => "/home/singh/Downloads/primer3-2.2.3/src/primer3_core");
>>
>> # or after the fact you can change the program_name
>> $primer3->program_name('my_superfast_primer3');
>>
>> unless ($primer3->executable) {
>>     print STDERR "primer3 can not be found. Is it installed?\n";
>>     exit(-1)
>> }
>>
>> # set the maximum and minimum Tm of the primer
>> $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90);
>>
>> # Design the primers. This runs primer3 and returns a
>> # Bio::Tools::Primer3::result object with the results
>> # Primer3 can run in several modes (see explanation for
>> # 'PRIMER_TASK' in the primer3 documentation). To run a task,
>> # either call it by its PRIMER_TASK name as in these examples:
>> $pcr_primer_results = $primer3->pick_pcr_primers($seq);
>> $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq );
>> $check_results = $primer3->check_primers();
>>
>> # Alternatively, explicitly set the PRIMER_TASK parameter and
>> # use the generic 'run' method (this is mainly here for backwards
>> # compatibility) :
>> $primer3->PRIMER_TASK( 'pick_left_only' );
>> $result = $primer3->run( $seq );
>>
>> # If no task is set and the 'run' method is called, primer3 will default to
>> # pick pcr primers.
>>
>> # see the Bio::Tools::Primer3Redux POD for
>> # things that you can get from this.
>> # For example:
>>
>> print "There were ", $results->num_primer_pairs, " primer pairs\n";
>>
>>
>> Can anyone help me with this???
>>
>>
>> Best regards,
>> Kumar
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From fs5 at sanger.ac.uk Thu Sep 15 10:43:48 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 15 Sep 2011 15:43:48 +0100
Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux
In-Reply-To:
References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID: <1316097828.3797.700.camel@deskpro15336.internal.sanger.ac.uk>

I also haven't had the time yet to work on this again but, yes, we need
to make sure we don't lose track of where we are.


On Thu, 2011-09-15 at 14:13 +0000, Fields, Christopher J wrote:
> I mentioned off-list that this should be filed as a github issue so we don't
> lose track. Unfortunately I can't get to it until next week (grant deadline).
>
> chris
>
> On Sep 15, 2011, at 8:44 AM, Frank Schwach wrote:
>
> > Hi Kumar,
> >
> > We are currently working on this module and you might want to check out
> > the latest version on Chris Fields' github project:
> >
> > https://github.com/cjfields/Bio-Tools-Primer3Redux
> >
> > There will probably be some changes again once I get some time again to
> > work on a few points we discussed lately.
> > You can also check out my repo
> > here:
> > https://github.com/fschwach/Bio-Tools-Primer3Redux
> >
> > but I will certainly have to make changes to that code because I used
> > AUTOLOAD in the last version, which is probably not a good idea.
> > My recommendation would be to use Chris' repo and see if that works for
> > you. If not, feedback would be much appreciated.
> >
> > Cheers,
> >
> > Frank
> >
> >
> > On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote:
> >> Hi,
> >>
> >> I need to integrate the primer3 module in one of our pipelines. In the process,
> >> I was testing the initial code given on the CPAN website. But whenever I try
> >> to run this program it gives me an error: "Cannot locate the Object
> >> method add_target via the package Bio::Tools::Run::Primer3Redux...."
> >>
> >> The code I am using is as follows:
> >>
> >> # design some primers.
> >> # the output will be put into temp.out
> >> use Bio::Tools::Primer3Redux;
> >> use Bio::Tools::Run::Primer3Redux;
> >> use Bio::SeqIO;
> >>
> >> my $seqio = Bio::SeqIO->new(-file=>'sample.dna');
> >> my $seq = $seqio->next_seq;
> >>
> >> my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out",
> >>     -path => "/home/singh/Downloads/primer3-2.2.3/src/primer3_core");
> >>
> >> # or after the fact you can change the program_name
> >> $primer3->program_name('my_superfast_primer3');
> >>
> >> unless ($primer3->executable) {
> >>     print STDERR "primer3 can not be found. Is it installed?\n";
> >>     exit(-1)
> >> }
> >>
> >> # set the maximum and minimum Tm of the primer
> >> $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90);
> >>
> >> # Design the primers. This runs primer3 and returns a
> >> # Bio::Tools::Primer3::result object with the results
> >> # Primer3 can run in several modes (see explanation for
> >> # 'PRIMER_TASK' in the primer3 documentation).
To run a task, > >> # either call it by its PRIMER_TASK name as in these examples: > >> $pcr_primer_results = $primer3->pick_pcr_primers($seq); > >> $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq ); > >> $check_results = $primer3->check_primers(); > >> > >> # Alternatively, explicitly set the PRIMER_TASK parameter and > >> # use the generic 'run' method (this is mainly here for backwards > >> # compatibility) : > >> $primer3->PRIMER_TASK( 'pick_left_only' ); > >> $result = $primer3->run( $seq ); > >> > >> # If no task is set and the 'run' method is called, primer3 will default > >> to > >> # pick pcr primers. > >> > >> # see the Bio::Tools::Primer3Redux POD for > >> # things that you can get from this. For example: > >> > >> print "There were ", $results->num_primer_pairs, " primer pairs\n"; > >> > >> > >> Can anyone help me with this??? > >> > >> > >> Best regards, > >> Kumar > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > The Wellcome Trust Sanger Institute is operated by Genome Research > > Limited, a charity registered in England with number 1021457 and a > > company registered in England with number 2742969, whose registered > > office is 215 Euston Road, London, NW1 2BE. > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From manju.rawat2 at gmail.com Fri Sep 16 01:09:25 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Fri, 16 Sep 2011 01:09:25 -0400 Subject: [Bioperl-l] no blast result In-Reply-To: <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: Hello Frank, Yes,u r rite..I tried to run blast in terminal but its not working.. I have installed the latest version of blast and download the database correctly.. But when i running blastn-help command in terminal it showing me error that blastn: error while loading shared libraries: libbz2.so.1: cannot open shared object file: No such file or directory. and when i am running the blastall command then it showing that *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out* Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/ legacy_blast.pl line 85. Program failed, try executing the command manually. While i have set the path of environment variable PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin I have checked everything but not able tp fine the error.. Pl help me. Manju From manju.rawat2 at gmail.com Fri Sep 16 01:12:03 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Fri, 16 Sep 2011 01:12:03 -0400 Subject: [Bioperl-l] Command line error in BLAST+ Message-ID: Hi, I have installed the latest version of blast and download the database correctly Using this tutorial http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html But when i running blastn-help command in terminal it showing me error that blastn: error while loading shared libraries: libbz2.so.1: cannot open shared object file: No such file or directory. and when i am running the blastall command then it showing that *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out* Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/ legacy_blast.pl line 85. 
Program failed, try executing the command manually. While i have set the path of environment variable PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin I have checked everything but not able tp fine the error.. Pl help me. Thanks Manju From p.j.a.cock at googlemail.com Fri Sep 16 04:15:46 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 16 Sep 2011 09:15:46 +0100 Subject: [Bioperl-l] Command line error in BLAST+ In-Reply-To: References: Message-ID: On Fri, Sep 16, 2011 at 6:12 AM, Manju Rawat wrote: > Hi, > > > I have installed the latest version of blast and download the database > correctly Using this tutorial > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html > > But when i running blastn-help command in terminal it showing me error that > blastn: error while loading shared libraries: libbz2.so.1: cannot open > shared object file: No such file or directory. > > > and when i am running the blastall command then it showing that > *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out* > Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/ > legacy_blast.pl line 85. > Program failed, try executing the command manually. > > While i have set the path of environment variable > PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin > > I have checked everything but not able tp fine the error.. > > Pl help me. > > Thanks > Manju You're using the BioPerl wrapper for legacy blast (blastall), which is not installed. Instead you have the new blast+ suite which includes a wrapper using the perl script legacy_blast.pl to imitate the old blastall tool (in this case it calls the new tool blastn). Fix 1: Edit legacy_blast.pl to use the path to blastn etc under your home directory Fix 2: Install BLAST+ at system level Fix 3: Use the BioPerl wrapper for BLAST+ instead. I'd go with option 3. 
Peter From p.j.a.cock at googlemail.com Fri Sep 16 04:17:58 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 16 Sep 2011 09:17:58 +0100 Subject: [Bioperl-l] no blast result In-Reply-To: References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: On Fri, Sep 16, 2011 at 6:09 AM, Manju Rawat wrote: > Hello Frank, > > Yes,u r rite..I tried to run blast in terminal but its not working.. > I have installed the latest version of blast and download the database > correctly.. > > But when i running blastn-help command in terminal it showing me error that > blastn: error while loading shared libraries: libbz2.so.1: cannot open > shared object file: No such file or directory. > > > and when i am running the blastall command then it showing that > legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out > Can't exec "/usr/bin/blastn": No such file or directory at > /usr/bin/legacy_blast.pl line 85. > Program failed, try executing the command manually. > > While i have set the path of environment variable > PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin > > I have checked everything but not able tp fine the error.. > > Pl help me. > Manju For the benefit of anyone reading the archives later, I tried to answer this in Manju's new thread: http://lists.open-bio.org/pipermail/bioperl-l/2011-September/035696.html Peter From fs5 at sanger.ac.uk Fri Sep 16 04:36:37 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Fri, 16 Sep 2011 09:36:37 +0100 Subject: [Bioperl-l] Command line error in BLAST+ In-Reply-To: References: Message-ID: <1316162197.3797.721.camel@deskpro15336.internal.sanger.ac.uk> Hi Manju, Are you on Ubuntu? I think I've seen problems with this bzip library on Ubuntu before. It's not a problem with BLAST in any case. Should be possible to install the missing files through your package manager. 
I'm sure Google will know what to do :) Not sure what went wrong with your blast installation. What happens if you run blastall directly (without the legacy_blast.pl script)? In any case, it might be better to ask the NCBI people for help with the BLAST installation as this is not a BioPerl problem. cheers, Frank On Fri, 2011-09-16 at 01:12 -0400, Manju Rawat wrote: > Hi, > > > I have installed the latest version of blast and download the database > correctly Using this tutorial > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html > > But when i running blastn-help command in terminal it showing me error that > blastn: error while loading shared libraries: libbz2.so.1: cannot open > shared object file: No such file or directory. > > > and when i am running the blastall command then it showing that > *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out* > Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/ > legacy_blast.pl line 85. > Program failed, try executing the command manually. > > While i have set the path of environment variable > PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin > > I have checked everything but not able tp fine the error.. > > Pl help me. > > Thanks > Manju > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
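A minimal sketch related to the PATH question in this thread (not from any of the posters). The error above shows legacy_blast.pl looking for /usr/bin/blastn even though PATH had been extended, so it can help to confirm where, if anywhere, blastn is visible on PATH from Perl. The helper name find_in_path is made up for illustration; File::Spec->path is core Perl.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first executable called $program found
# in the directories listed in PATH, or nothing if there is none.
sub find_in_path {
    my ($program) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my $blastn = find_in_path('blastn');
print defined $blastn
    ? "blastn found at $blastn\n"
    : "blastn is not on PATH\n";
```

If this reports a location under the home directory, the binary is reachable from PATH and any remaining failure comes from the hard-coded /usr/bin location or from the missing shared library, both discussed above.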
From ross at cuhk.edu.hk Fri Sep 16 04:51:38 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Fri, 16 Sep 2011 16:51:38 +0800 Subject: [Bioperl-l] use blast to extract similar sequences In-Reply-To: References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> Message-ID: <085501cc744d$d90b4500$8b21cf00$@edu.hk> I wonder whether bioperl has any built-in modules that extracts sequences based on blast results. For example, a short query sequence of length 1000 is to blast against a reference genome of 3M. The homologous sequence of 1000 +/- 20 is extracted. Why is +/- 20 needed? Because we can't guarantee there must have a good match. Frequent blast users may be well aware that then there can be coverage, split-up due to local alignments, etc and that's why I would like to know if anybody has already developed a module to handle this kind of problem. Thanks in advance! From cjfields at illinois.edu Fri Sep 16 09:22:07 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 16 Sep 2011 13:22:07 +0000 Subject: [Bioperl-l] use blast to extract similar sequences In-Reply-To: <085501cc744d$d90b4500$8b21cf00$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk> Message-ID: That seems like a pretty straightforward thing to do; there isn't an all-in-one way of doing this, but that's a good thing (it's a separation of concerns). 
1) Run and parse BLAST results and grab seqID and coordinates for each hit (or each HSP for each hit) (Bio::SearchIO)
2) Pull the right subsequence +/- 20bp using above from the indexed flat file of your reference (Bio::DB::Fasta)

You can get revcomped sequence from Bio::DB::Fasta directly by flipping coordinates:

    # raw sequence
    my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
    my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);

chris

On Sep 16, 2011, at 3:51 AM, Ross KK Leung wrote:

> I wonder whether bioperl has any built-in modules that extracts sequences based on blast results. For example, a short query sequence of length 1000 is to blast against a reference genome of 3M. The homologous sequence of 1000 +/- 20 is extracted. Why is +/- 20 needed? Because we can't guarantee there must have a good match. Frequent blast users may be well aware that then there can be coverage, split-up due to local alignments, etc and that's why I would like to know if anybody has already developed a module to handle this kind of problem. Thanks in advance!

From wsavigne at yahoo.com Fri Sep 16 16:45:12 2011
From: wsavigne at yahoo.com (Willy Savigne)
Date: Fri, 16 Sep 2011 13:45:12 -0700 (PDT)
Subject: [Bioperl-l] question Bioperl installation
Message-ID: <1316205912.93564.YahooMailClassic@web160515.mail.bf1.yahoo.com>

my name is william how do download Bioperl i tried other site but NOTHING i would like to know info in downloading bioperl .This is my first time into knowing bioinformatic i just got a book developing bioinformatic and begginning perl bioinformatic. I do alot Dna and RNA sequencing and more.
Thank u willy

From ross at cuhk.edu.hk Sun Sep 18 06:51:05 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sun, 18 Sep 2011 18:51:05 +0800
Subject: [Bioperl-l] snp/frameshift identification
In-Reply-To:
References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk>
Message-ID: <08a901cc75f0$dd463b30$97d2b190$@edu.hk>

Dear Bioperl-users,

Following Fields, Christopher J's advice on sequence extraction, I manage to proceed to the last stage of non-synonymous SNP identification. Now what I have in hand is thousands of reliable multiple sequence alignment files, e.g.

>seq1
ATGACAGACACGACGTTGCCGTAG
>seq2
ATGACAGACACGACGTAGCCGTAG
>seq3
ATGACAGACACGACGTTGCCGTAG

Seq2 has a T->A mutation and that leads to a stop codon generation. I wonder if Bioperl has handled this kind of SNP or frameshift or non-sense mutations that lead to change of amino acid in the translated protein product.

Thanks again to the community that helps me a great deal so I can catch up progress during this Sat/Sun!!

From rondonbio at yahoo.com.br Mon Sep 19 09:46:36 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Mon, 19 Sep 2011 06:46:36 -0700 (PDT)
Subject: [Bioperl-l] help-> SearchIO
Message-ID: <1316439996.6247.YahooMailNeo@web130220.mail.mud.yahoo.com>

Hi guys! I need your help in a loop that I have in SearchIO. I need to check the nucleotide coverage of querys using BLAST. I'm using the script below. It's open the alignment, create arrays for each query with zeros in each nucleotide position but, when I adds values to the coverage of each nucleotide, the script does it once and passes to another query. Can you help me? Thank you very much,

Rondon a Brazilian friend.

    use Bio::SearchIO;

    my $alignment = new Bio::SearchIO ( -format => 'blastXML',
                                        -file   => $alignment_file );

    my %positions;
    while (my $result = $alignment->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                my $query_name = $result->query_name();
                my $tam = $result -> query_length();
                my @pos = $hsp->seq_inds('query','identical');
                for (0..$tam){ ${$positions{$query_name}}[$_] = 0 } # make arrays for each query and populate with 0 in each position
                foreach my $num (@pos) {
                    ${$positions{$query_name}}[$num -1]++; #This loop is where I believe that is an error.
                }
            }
        }
    }

    foreach my $key (keys %positions){
        print "$key\t@{$positions{$key}}\n";
    }

    exit;

From roy.chaudhuri at gmail.com Mon Sep 19 12:29:41 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 19 Sep 2011 17:29:41 +0100
Subject: [Bioperl-l] help-> SearchIO
In-Reply-To: <1316439996.6247.YahooMailNeo@web130220.mail.mud.yahoo.com>
References: <1316439996.6247.YahooMailNeo@web130220.mail.mud.yahoo.com>
Message-ID: <4E776DF5.6040504@gmail.com>

Hi Rondon,

The line where you populate your arrayref with 0 (starting "for (0..$tam)") is within the HSP loop, so the data from every successive HSP will overwrite the previous one in your hash. You will therefore only see the data for the last HSP from each query. If you move that line to execute once per result (i.e. just after the line starting "while (my $result ="), then I think it should work as you intended.

Cheers,
Roy.

On 19/09/2011 14:46, Rondon Neto wrote:
> Hi guys! I need your help in a loop that I have in SearchIO. I need
> to check the nucleotide coverage of querys using BLAST. I'm using the
> script below. It's open the alignment, create arrays for each query
> with zeros in each nucleotide position but, when I adds values to the
> coverage of each nucleotide, the script does it once and passes to
> another query. Can you help me? Thank you very much,
>
> Rondon a Brazilian friend.
> > use Bio::SearchIO; > > my $alignment = new Bio::SearchIO ( -format => 'blastXML', > -file => $alignment_file ); > > my %positions; > while (my $result = $alignment->next_result) { > while (my $hit = $result->next_hit) { > while (my $hsp = $hit->next_hsp) { > my $query_name = $result->query_name(); > my $tam = $result -> query_length(); > my @pos = $hsp->seq_inds('query','identical'); > for (0..$tam){ ${$positions{$query_name}}[$_] = 0 } # make arrays for each query and populate with 0 in each position > foreach my $num (@pos) { > ${$positions{$query_name}}[$num -1]++; #This loop is where I believe that is an error. > } > } > } > } > > foreach my $key (keys %positions){ > print "$key\t@{$positions{$key}}\n"; > } > > exit; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Mon Sep 19 12:39:40 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 19 Sep 2011 17:39:40 +0100 Subject: [Bioperl-l] question Bioperl installation In-Reply-To: <1316205912.93564.YahooMailClassic@web160515.mail.bf1.yahoo.com> References: <1316205912.93564.YahooMailClassic@web160515.mail.bf1.yahoo.com> Message-ID: <4E77704C.20604@gmail.com> Hi Willy, There are instructions for downloading and installing BioPerl on the wiki: http://www.bioperl.org/wiki/Getting_BioPerl http://www.bioperl.org/wiki/Installing_BioPerl These are the first two results when you Google for "bioperl download". Note that the wiki is a little out of date, the latest BioPerl version is 1.6.901: http://search.cpan.org/~cjfields/BioPerl-1.6.901/ Cheers, Roy. On 16/09/2011 21:45, Willy Savigne wrote: > my name is william how do download Bioperl i tried other site but > NOTHING i would like to know info in downloading bioperl .This is > my first time into knowing bioinformatic i just got a book > developing bioinformatic and begginning perl bioinformatic. 
I do alot > Dna and RNA sequencing and more. > > Thank u willy > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Tue Sep 20 13:01:21 2011 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 20 Sep 2011 13:01:21 -0400 Subject: [Bioperl-l] Question about a phylogenetic tree Message-ID: <36D93DC4-135D-408F-8169-AC8D5E59BD90@verizon.net> All, I have code that starts with a sequence file and makes a tree (Bio::Tree::Tree) using Muscle to align and then Phyml, here's the last part that makes the tree: ..... get the files etc .... my %alignparams = ( -seqtype => 'nucleo', -usetree_nowarn => $guidetreefile, -in => $tempfile ); my $aligner = Bio::Tools::Run::Alignment::Muscle->new(%alignparams); # $align is a Bio::SimpleAlign object my $align = $aligner->align($tempfile); my %treeparams = ( -data_type => 'nt', -model => 'K80', # Kimura -tree => 'BIONJ', -bootstrap => 1000 ); my $treemaker = Bio::Tools::Run::Phylo::Phyml->new(%treeparams); #$tree is a Bio::Tree::Tree object my $tree = $treemaker->run($align); My question: do I get the pairwise distance between 2 sequences (based on Kimura here) by doing something like: $distance = $tree->subtree_length($internal_node) Where $internal_node is the parent of the pair in question? Excuse me if this is obvious, have never made Bioperl trees before! Brian O. From bosborne11 at verizon.net Tue Sep 20 15:17:13 2011 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 20 Sep 2011 15:17:13 -0400 Subject: [Bioperl-l] Question about a phylogenetic tree In-Reply-To: <36D93DC4-135D-408F-8169-AC8D5E59BD90@verizon.net> References: <36D93DC4-135D-408F-8169-AC8D5E59BD90@verizon.net> Message-ID: Ah, I see: my $distances = $tree->distance(-nodes => [$node1,$node2]); Brian O. 
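For readers of the archive, a minimal sketch of the distance() call mentioned just above, run on a toy tree. The Newick string and leaf names are assumptions for illustration, not data from the thread. As far as I understand it, the value returned is the sum of branch lengths along the path between the two nodes, which need not equal the raw pairwise distance under the substitution model.

```perl
use strict;
use warnings;
use Bio::TreeIO;

# Toy Newick tree (an assumption for illustration only).
my $newick = "((A:1,B:2):3,C:4);";

# Read the tree from an in-memory filehandle so no file is needed.
open my $fh, '<', \$newick or die "Cannot open in-memory tree: $!";
my $treeio = Bio::TreeIO->new(-fh => $fh, -format => 'newick');
my $tree   = $treeio->next_tree;

# Look up the two leaves by id, then ask the tree for the path length
# between them.
my ($node_a) = $tree->find_node(-id => 'A');
my ($node_b) = $tree->find_node(-id => 'B');
my $distance = $tree->distance(-nodes => [$node_a, $node_b]);

print "A-B path length: $distance\n";
```

With a tree produced by Phyml, the same two lookups would use the sequence ids from the alignment instead of A and B.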
On Sep 20, 2011, at 1:01 PM, Brian Osborne wrote: > All, > > I have code that starts with a sequence file and makes a tree (Bio::Tree::Tree) using Muscle to align and then Phyml, here's the last part that makes the tree: > > ..... get the files etc .... > > my %alignparams = ( > -seqtype => 'nucleo', > -usetree_nowarn => $guidetreefile, > -in => $tempfile > ); > my $aligner = Bio::Tools::Run::Alignment::Muscle->new(%alignparams); > > # $align is a Bio::SimpleAlign object > my $align = $aligner->align($tempfile); > > my %treeparams = ( > -data_type => 'nt', > -model => 'K80', # Kimura > -tree => 'BIONJ', > -bootstrap => 1000 > ); > my $treemaker = Bio::Tools::Run::Phylo::Phyml->new(%treeparams); > > #$tree is a Bio::Tree::Tree object > my $tree = $treemaker->run($align); > > My question: do I get the pairwise distance between 2 sequences (based on Kimura here) by doing something like: > > $distance = $tree->subtree_length($internal_node) > > Where $internal_node is the parent of the pair in question? Excuse me if this is obvious, have never made Bioperl trees before! > > Brian O. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From manju.rawat2 at gmail.com Thu Sep 22 07:07:39 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Thu, 22 Sep 2011 16:37:39 +0530 Subject: [Bioperl-l] database for Bos Touras Message-ID: Hello To All, I want to blast my sequence Only in Bos Touras Database Using Local Blast(Blast+). But I dnt Know which database I should use for this From this Link. ftp://ftp.ncbi.nlm.nih.gov/blast/db/ Pl tell me which DB I Should use?? 
Thanks
Manju

From hrh at fmi.ch Thu Sep 22 07:44:56 2011
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Thu, 22 Sep 2011 13:44:56 +0200
Subject: [Bioperl-l] database for Bos Touras
In-Reply-To:
References:
Message-ID: <4E7B1FB8.8090208@fmi.ch>

assuming you mean 'Bos taurus', it might be easier to get the data from

ucsc: http://hgdownload.cse.ucsc.edu/downloads.html#cow

or

ensembl: ftp://ftp.ensembl.org/pub/release-64/fasta/bos_taurus/dna/

Regards, Hans

On 09/22/2011 01:07 PM, Manju Rawat wrote:
> Hello To All,
>
> I want to blast my sequence Only in Bos Touras Database Using Local
> Blast(Blast+).
> But I dnt Know which database I should use for this From this Link.
> ftp://ftp.ncbi.nlm.nih.gov/blast/db/
>
> Pl tell me which DB I Should use??
>
> Thanks
> Manju

From hrh at fmi.ch Thu Sep 22 08:16:00 2011
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Thu, 22 Sep 2011 14:16:00 +0200
Subject: [Bioperl-l] database for Bos Touras
In-Reply-To:
References: <4E7B1FB8.8090208@fmi.ch>
Message-ID: <4E7B2700.8080904@fmi.ch>

Yes, BLAST uses fasta files. You (may need to concatenate the individual chromosomes and then you) need to index them with 'makeblastdb' which is also part of the blast+ software package.

see: http://www.ncbi.nlm.nih.gov/books/NBK1762/

Hans

On 09/22/2011 01:49 PM, Manju Rawat wrote:
> It will work on Local Blast or not??????

From bosborne11 at verizon.net Thu Sep 22 12:16:39 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 22 Sep 2011 12:16:39 -0400
Subject: [Bioperl-l] [bioperl-live] genbank_ref_extractor: new script to make search on entrez gene and retrieve related sequences (#23)
In-Reply-To:
References:
Message-ID: <245C75D5-61EC-4395-B64F-47D8471568F5@verizon.net>

Carne,

This is impressive looking, it is now in scripts/.

Thanks again,

Brian O.

On Sep 21, 2011, at 11:25 AM, Carnë Draug wrote:

> Hi
>
> I wrote a script with bioperl that I would like to share back. It takes a list of searches for Entrez Gene and attempts to retrieve the related sequences (genomic, transcripts and proteins). It is also possible to obtain extra upstream and downstream bp for genomic sequences and control the naming of the files. In the end it can save all the results in a CSV file.
>
> Hope you find it up to your coding standards. Suggestions for improvements are welcome, including for a better name.
>
> Carnë
>
> You can merge this Pull Request by running:
>
>   git pull https://github.com/carandraug/bioperl-live bp_genbank_ref_extractor
>
> Or you can view, comment on it, or merge it online at:
>
>   https://github.com/bioperl/bioperl-live/pull/23
>
> -- Commit Summary --
>
> * genbank_ref_extractor: new script to make search on entrez gene and retrieve related sequences
>
> -- File Changes --
>
> A scripts/Bio-DB-EUtilities/bp_genbank_ref_extractor.pl (1064)
>
> -- Patch Links --
>
>   https://github.com/bioperl/bioperl-live/pull/23.patch
>   https://github.com/bioperl/bioperl-live/pull/23.diff
>
> --
> Reply to this email directly or view it on GitHub:
> https://github.com/bioperl/bioperl-live/pull/23

From bluecurio at gmail.com Thu Sep 22 15:32:07 2011
From: bluecurio at gmail.com (Daniel Renfro)
Date: Thu, 22 Sep 2011 14:32:07 -0500
Subject: [Bioperl-l] Download RefSeq revision history programmatically
Message-ID: <29B78DBCEBCA42B5A99461FE1E9BC33F@gmail.com>

I am working on a project to find historical differences in GenBank/RefSeq files. I would like to download all the old revisions of a file (for example NC_000913 [http://www.ncbi.nlm.nih.gov/nuccore/NC_000913.2?report=girevhist]) using any technology available. I wrote a page-scraper in Perl, but I can't get NCBI to return plaintext, only HTML (which does nobody any good.)

Does anyone know of a way to get all the "revisions" (not just "versions") of a GenBank/RefSeq file?
-Daniel -- http://ecoliwiki.net/User:DanielRenfro Hu Lab Research Associate 979-862-4055 From ross at cuhk.edu.hk Tue Sep 27 10:16:14 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 27 Sep 2011 22:16:14 +0800 Subject: [Bioperl-l] obtain a distance matrix from tree In-Reply-To: References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk> Message-ID: <014201cc7d20$03abd4c0$0b037e40$@edu.hk> After using MEGA to generate a newick tree file (phylogram), I wonder if Bioperl has any convenient functions to derive the (n x n) distance (by NJ, MP etc) matrix. Thanks for your advice in advance! From thomas.sharpton at gmail.com Tue Sep 27 16:02:44 2011 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Tue, 27 Sep 2011 13:02:44 -0700 Subject: [Bioperl-l] obtain a distance matrix from tree In-Reply-To: <014201cc7d20$03abd4c0$0b037e40$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk> <014201cc7d20$03abd4c0$0b037e40$@edu.hk> Message-ID: Hi Ross, For very large trees, I found it to be more efficient to do this in R using the ape package. 
I have a script listed in my github repo that will convert a tree to a distance matrix in R at the link below:

https://github.com/sharpton/PhylOTU/blob/master/tree_to_matrix.R

That said, I've also done this in Bioperl using something like the following:

    use Bio::TreeIO;

    # -file takes a filename; -fh would expect an already opened filehandle
    my $treein = Bio::TreeIO->new( -file => "input_tree.nwk", -format => 'newick' );
    while( my $tree = $treein->next_tree ){
        my %dist_matrix = ();
        my @leaves = $tree->get_leaf_nodes;
        foreach my $leaf1( @leaves ){
            my $id1 = $leaf1->id;
            foreach my $leaf2( @leaves ){
                my $id2 = $leaf2->id;
                next if $id1 eq $id2;
                # skip pairs already stored in either orientation
                next if( defined( $dist_matrix{$id1}->{$id2} ) || defined ( $dist_matrix{$id2}->{$id1} ) );
                my $distance = $tree->distance( -nodes => [$leaf1, $leaf2] );
                $dist_matrix{$id1}->{$id2} = $distance;
            }
        }
        # %dist_matrix is scoped to this loop, so print the distance matrix here....
    }

This will put the information you need to create either a full or an upper triangle distance matrix into the hash %dist_matrix. I didn't test the above, so hopefully there are no bugs.... Someone else may have a more elegant solution.

Best,
Tom

PS: Sorry if you get this twice.

On Sep 27, 2011, at 7:16 AM, Ross KK Leung wrote:

> After using MEGA to generate a newick tree file (phylogram), I wonder if
> Bioperl has any convenient functions to derive the (n x n) distance (by NJ,
> MP etc) matrix. Thanks for your advice in advance!

From ross at cuhk.edu.hk Tue Sep 27 23:57:52 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Wed, 28 Sep 2011 11:57:52 +0800
Subject: [Bioperl-l] ancestral state derived from Tree
In-Reply-To: <29B78DBCEBCA42B5A99461FE1E9BC33F@gmail.com>
References: <29B78DBCEBCA42B5A99461FE1E9BC33F@gmail.com>
Message-ID: <017701cc7d92$cba88d20$62f9a760$@edu.hk>

By using Tom's advice, I'm able to obtain the distance matrix for the following tree by Bioperl TreeIO.

((((((A:1.00000000,B:1.00000000):1.00000000,C:1.00000000):0.00000000,D:0.00000000):1.00000000,(E:0.00000000,(F:2.00000000,G:1.00000000):0.00000000):0.00000000):2.00000000,(H:3.00000000,(I:2.00000000,(J:1.00000000,(K:2.00000000,(L:2.00000000,M:2.00000000):0.00000000):0.00000000):0.00000000):0.00000000):0.00000000):1.00000000,(N:0.00000000,((O:0.00000000,P:0.00000000):1.00000000,(Q:2.00000000,(R:2.66666667,S:3.66666667):3.66666667):0.00000000):1.00000000):3.00000000,(T:0.00000000,(U:0.00000000,V:0.00000000):1.00000000):16.00000000);

For the last few nodes T, U and V, they should be monophyletic but U and V should be more closely related. Although I can use TreeIO methods like is_monophyletic or is_paraphyletic to test in this case, the problem becomes more tricky for nodes A, B, C, D because D actually makes no difference from the common ancestor of nodes A, B, C and D. Since is_monophyletic does not take into account for this case, is there any workaround?
I have to pay attention to such a detail in order to make a better guess for the ancestral state(s) at various points of this tree. Thanks again for the TreeIO developers for making tree analysis easier for us biologists!

From manju.rawat2 at gmail.com Wed Sep 28 05:54:07 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Wed, 28 Sep 2011 15:24:07 +0530
Subject: [Bioperl-l] how to blast a seq against multiple dbase
Message-ID:

Hello,

I have downloaded all the chromosome of Bos Taurus and i'd changed them in blast format using makeblastdb..and now i want to locally blast my sequence against these all chromosome..
now i have 29 database.Is there any method by which can i blast my sequence against all 29 database in my program..

what should i write in database????

    @params = ('database' => '????????', 'outfile' => 'blast2.out',
               '_READMETHOD' => 'Blast', 'prog'=> 'blastn');

Thanks
Manju Rawat

From p.j.a.cock at googlemail.com Wed Sep 28 06:02:07 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 28 Sep 2011 11:02:07 +0100
Subject: [Bioperl-l] how to blast a seq against multiple dbase
In-Reply-To:
References:
Message-ID:

On Wed, Sep 28, 2011 at 10:54 AM, Manju Rawat wrote:
> Hello,
>
> I have downloaded all the chromosome of Bos Taurus and i'd changed them in
> blast format using makeblastdb..and now i want to localy blast my sequence
> against these all chromosome..
> now i have 29 database.Is there any method by which can i blast my sequence
> against all 29 database in my program..
>
>
> whta should i write in database????
>
> @params = ('database' => '????????', 'outfile' => 'blast2.out',
>            '_READMETHOD' => 'Blast', 'prog'=> 'blastn');
>

The simple answer is make a combined database. This works internally with alias files, have a look at the NR and NT databases for example - they act like single databases but are actually a collection of chunks.
Even simpler would be to combine your Bos taurus sequence files into a single multi-entry FASTA file, and make that into a single BLAST database.

Peter

From awitney at sgul.ac.uk Wed Sep 28 06:42:39 2011
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 28 Sep 2011 11:42:39 +0100
Subject: [Bioperl-l] how to blast a seq against multiple dbase
In-Reply-To:
References:
Message-ID:

I think if you want to keep the databases separate you would need to create a factory for each database, something like this

foreach my $db ( @databases ) {
    my $factory = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => $db,
        # ... any other params ...
    );
    # ... do blast stuff ...
}

or as Peter says in another email you could combine your databases and run one query then filter them out in the results.

regards

adam

On 28 Sep 2011, at 10:54, Manju Rawat wrote:

> Hello,
>
> I have downloaded all the chromosomes of Bos taurus and converted them to
> BLAST format using makeblastdb, and now I want to BLAST my sequence locally
> against all of these chromosomes.
> I now have 29 databases. Is there any way to BLAST my sequence
> against all 29 databases in my program?
>
>
> What should I write for 'database'?
>
> @params = ('database' => '????????', 'outfile' => 'blast2.out',
>            '_READMETHOD' => 'Blast', 'prog'=> 'blastn');
>
>
>
> Thanks
> Manju Rawat
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From carandraug+dev at gmail.com Wed Sep 28 07:43:02 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 28 Sep 2011 12:43:02 +0100
Subject: [Bioperl-l] retrieving bioperl version for scripts
Message-ID:

Hi everyone,

is there a recommended way to get the version of a script that is part of bioperl (the ones in the scripts directory)?
Rather than hard coding the version of the script independent of bioperl, I thought of using the bioperl version itself. How can this be done?

Thanks in advance,
Carnë

From carandraug+dev at gmail.com Wed Sep 28 11:00:34 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 28 Sep 2011 16:00:34 +0100
Subject: [Bioperl-l] retrieving bioperl version for scripts
In-Reply-To:
References:
Message-ID:

2011/9/28 longbow leo :
> Hi, Carnë,
>
> Do you mean this:
>
> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION'
>
> In my machine, the output is

Thank you. Yes this is what I was looking for. I looked down how that variable comes up and so I think I'll use

use Bio::Root::Version;
say $Bio::Root::Version::VERSION;

Carnë

From pcantalupo at gmail.com Wed Sep 28 12:54:19 2011
From: pcantalupo at gmail.com (Paul Cantalupo)
Date: Wed, 28 Sep 2011 12:54:19 -0400
Subject: [Bioperl-l] algorithm_version not working in multi-result blast output
Message-ID:

Hello,

I'm using the most recent copy of bioperl-live (pulled yesterday). I have a BLASTN (from blast+) output file for 3 query sequences (https://gist.github.com/1248342). I used this script, https://gist.github.com/1248338, to print the query id, algorithm and algorithm_version for each result. When I run the script, I get the following output:

GFAVMM201BADC0  BLASTN  2.2.25+
GFAVMM201A1JOH  BLASTN
GFAVMM201D933Z  BLASTN

Algorithm_version outputs the correct version for the first result but outputs the empty string for the 2nd and 3rd query. Why? This functionality worked about a month ago. What has changed to cause this to happen?

Thank you,

Paul

From rondonbio at yahoo.com.br Wed Sep 28 15:47:45 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Wed, 28 Sep 2011 12:47:45 -0700 (PDT)
Subject: [Bioperl-l] best Hit
Message-ID: <1317239265.98674.YahooMailNeo@web130214.mail.mud.yahoo.com>

Hi guys.
I have this subroutine that returns a hash with the nucleotide coverage of each query from a BLAST alignment. I want to count only unique hits: if a hit has already been aligned with a query, it must be eliminated from my experiment. Can anyone check whether it is right, or fix it for me? Is there a way to do that directly in BLAST?

Thank you

Rondon Neto

sub nucleotide_coverage {
    # Bio::SearchIO dependent
    # This subroutine returns a hash and a file with nucleotide coverage
    # for each query in a BLAST alignment XML file. The input is the
    # alignment file.
    my ($alignment_file) = @_;
    my $alignment = Bio::SearchIO->new( -format => 'blastXML',
                                        -file   => $alignment_file );
    my %positions;
    my @used_reads;
    while (my $result = $alignment->next_result) {
        my $query_name = $result->query_name();
        my $tam = $result->query_length();
        for (0..$tam-1) { ${$positions{$query_name}}[$_] = 0 }
        while (my $hit = $result->next_hit) {
            my $hit_name = $hit->name;
            # Here is my best hit parser. Is it ok?
            foreach my $read (@used_reads) {
                if ( $read eq $hit_name ) { next; }
            }
            while (my $hsp = $hit->next_hsp) {
                my $query_name = $result->query_name();
                my @pos = $hsp->seq_inds('query','identical');
                foreach my $num (@pos) { ${$positions{$query_name}}[$num-1]++; }
            }
            push (@used_reads, $hit_name);
        }
    }
    my $outfile = "nucleotide_coverage.txt";
    open OUT, ">$outfile" or die $!;
    foreach my $key (keys %positions) {
        print OUT "$key\t@{$positions{$key}}\n";
    }
    close OUT;
    return \%positions;
}

From shalabh.sharma7 at gmail.com Wed Sep 28 15:53:07 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 28 Sep 2011 15:53:07 -0400
Subject: [Bioperl-l] Getting taxa from gi
Message-ID:

Hi All,

I know this has been discussed before, but this is kind of a new problem that I am facing. I want to get taxonomy (full lineage) information from a huge list of GIs. I am using Bio::DB::GenBank for this with perl-5.12.3.
Here is my small script.

#!
/usr/local/perl-5.12.3/bin/perl -w
use strict;
use warnings;
use Bio::DB::GenBank;

my @ids = qw( CP000490 );
my $gbh = Bio::DB::GenBank->new();

foreach my $id ( @ids ) {
    # say "* ID: $id";
    my $seq = $gbh->get_Seq_by_acc( $id );
    my $org = $seq->species;
    #print "$org\n";
    my $class = join '-', $org->classification;
    print "$class\n";
}

The output is:
Paracoccus denitrificans PD1222-Paracoccus-Rhodobacteraceae-Rhodobacterales-Alphaproteobacteria-Proteobacteria-Bacteria

which is fine, but I also want to get the taxa id, and if possible the taxa ids for the whole lineage classification.
Ideally I would like to get something like this:

318586 - - - - - - - 1224 - 2

I would really appreciate your help.

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From cjfields at illinois.edu Wed Sep 28 17:36:37 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 28 Sep 2011 21:36:37 +0000
Subject: [Bioperl-l] retrieving bioperl version for scripts
In-Reply-To:
References:
Message-ID:

On Sep 28, 2011, at 10:00 AM, Carnë Draug wrote:

> 2011/9/28 longbow leo :
>> Hi, Carnë,
>>
>> Do you mean this:
>>
>> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION'
>>
>> In my machine, the output is
>
> Thank you. Yes this is what I was looking for. I looked down how that
> variable comes up and so I think I'll use
>
> use Bio::Root::Version;
> say $Bio::Root::Version::VERSION;
>
> Carnë

Just a warning on this: we're shortly to announce a major restructuring effort with BioPerl that will dramatically affect core versioning, mainly from the point of view that modularizing BioPerl into more manageable sub-distributions will require that each sub-distribution have its own version (not necessarily a bad thing).
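With that caveat in mind, one option might be for a script to carry its own $VERSION and report the BioPerl version only as extra information. This is a minimal sketch rather than an agreed convention: the script version number is made up, and it relies on Perl's universal VERSION method to read the same $Bio::Root::Version::VERSION variable mentioned above.

```perl
use strict;
use warnings;

# Hypothetical version number for the script itself.
our $VERSION = '0.01';

# Return the version reported by Bio::Root::Version, or undef when the
# module cannot be loaded (for example when BioPerl is not installed).
sub bioperl_version {
    my $version = eval {
        require Bio::Root::Version;
        Bio::Root::Version->VERSION;
    };
    return $version;
}

# Combine the script's own version with the BioPerl version, if any.
sub version_string {
    my $bp = bioperl_version();
    return defined $bp
        ? "script $VERSION (BioPerl $bp)"
        : "script $VERSION (BioPerl not found)";
}

print version_string(), "\n";
```

Keeping the two numbers separate means the script would still have something sensible to print if the distribution it ships in is later versioned independently.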
chris From cjfields at illinois.edu Wed Sep 28 17:40:48 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 28 Sep 2011 21:40:48 +0000 Subject: [Bioperl-l] algorithm_version not working in multi-result blast output In-Reply-To: References: Message-ID: <4240A221-7BF7-4723-A7A1-30C806CE3DC6@illinois.edu> Not sure, but I would hazard a guess that only the 'Query=' line is present in concatenated BLAST reports past the initial report, and the version isn't carried over (I recall this being a problem with the algorithm() as well, but that was fixed a while ago. This should be an easy enough fix, but can you submit it as a bug so we can track it? chris On Sep 28, 2011, at 11:54 AM, Paul Cantalupo wrote: > Hello, > > I'm using the most recent copy of bioperl-live (pulled yesterday). I > have a BLASTN (from blast+) output file for 3 query sequences > (https://gist.github.com/1248342). I used this script, > https://gist.github.com/1248338, to print the query id, algorithm and > algorithm_version for each result. When I run the script, I get the > following output: > > GFAVMM201BADC0 BLASTN 2.2.25+ > GFAVMM201A1JOH BLASTN > GFAVMM201D933Z BLASTN > > Algorithm_version outputs the correct version for the first result but > outputs the empty string for the 2nd and 3rd query. Why? This > functionality worked about a month ago. What has changed to cause this > to happen? > > Thank you, > > Paul > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From carandraug+dev at gmail.com Wed Sep 28 18:07:53 2011 From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=) Date: Wed, 28 Sep 2011 23:07:53 +0100 Subject: [Bioperl-l] retrieving bioperl version for scripts In-Reply-To: References: Message-ID: 2011/9/28 Fields, Christopher J : > On Sep 28, 2011, at 10:00 AM, Carn? 
Draug wrote: > >> 2011/9/28 longbow leo : >>> Hi, Carn?, >>> >>> Do you mean this: >>> >>> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION' >>> >>> In my machine, the output is >> >> Thank you. Yes this is what I was looking for. I looked down how that >> variable comes up and so I think I'll use >> >> use Bio::Root::Version; >> say $Bio::Root::Version::VERSION; >> >> Carn? > > Just a warning on this: we're shortly to announce a major restructuring effort with BioPerl that will dramatically affect core versioning, mainly from the point of view that modularizing BioPerl into more manageable sub-distributions will require that each sub-distribution have it's own version (not necessarily a bad thing). Where will the scripts end up after this restructuration? What I want is to create a version of the script (not of bioperl). Since the script is released with bioperl, they are the same. I actually already made the commit that makes this, just haven't bothered with the pull request yet. Also, will there be a release before this change? Carn? From shalabh.sharma7 at gmail.com Thu Sep 29 10:37:53 2011 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 29 Sep 2011 10:37:53 -0400 Subject: [Bioperl-l] GFF to GTF Message-ID: Hi, Is there any module to convert GFF file to GTF? Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From cjfields at illinois.edu Thu Sep 29 11:07:27 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 29 Sep 2011 15:07:27 +0000 Subject: [Bioperl-l] retrieving bioperl version for scripts In-Reply-To: References: Message-ID: <46733E36-5795-4EB6-9C2B-C978000FFD46@illinois.edu> On Sep 28, 2011, at 5:07 PM, Carn? Draug wrote: > 2011/9/28 Fields, Christopher J : >> On Sep 28, 2011, at 10:00 AM, Carn? 
Draug wrote: >> >>> 2011/9/28 longbow leo : >>>> Hi, Carn?, >>>> >>>> Do you mean this: >>>> >>>> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION' >>>> >>>> In my machine, the output is >>> >>> Thank you. Yes this is what I was looking for. I looked down how that >>> variable comes up and so I think I'll use >>> >>> use Bio::Root::Version; >>> say $Bio::Root::Version::VERSION; >>> >>> Carn? >> >> Just a warning on this: we're shortly to announce a major restructuring effort with BioPerl that will dramatically affect core versioning, mainly from the point of view that modularizing BioPerl into more manageable sub-distributions will require that each sub-distribution have it's own version (not necessarily a bad thing). > > Where will the scripts end up after this restructuration? What I want > is to create a version of the script (not of bioperl). Since the > script is released with bioperl, they are the same. I actually already > made the commit that makes this, just haven't bothered with the pull > request yet. > > Also, will there be a release before this change? > > Carn? Scripts will likely go with the distribution that they most closely are tied to, but that's still an area for debate (some may equally fall within one distribution or another, which will be tricky). 
For more on the release aspects see the (currently being revised and thus not complete) wiki page: http://www.bioperl.org/wiki/BioPerl_Modularization chris From pcantalupo at gmail.com Thu Sep 29 12:13:05 2011 From: pcantalupo at gmail.com (Paul Cantalupo) Date: Thu, 29 Sep 2011 12:13:05 -0400 Subject: [Bioperl-l] algorithm_version not working in multi-result blast output In-Reply-To: <4240A221-7BF7-4723-A7A1-30C806CE3DC6@illinois.edu> References: <4240A221-7BF7-4723-A7A1-30C806CE3DC6@illinois.edu> Message-ID: Bug submitted: https://redmine.open-bio.org/issues/3298 On Wed, Sep 28, 2011 at 5:40 PM, Fields, Christopher J wrote: > Not sure, but I would hazard a guess that only the 'Query=' line is present in concatenated BLAST reports past the initial report, and the version isn't carried over (I recall this being a problem with the algorithm() as well, but that was fixed a while ago. > > This should be an easy enough fix, but can you submit it as a bug so we can track it? > > chris > > On Sep 28, 2011, at 11:54 AM, Paul Cantalupo wrote: > >> Hello, >> >> I'm using the most recent copy of bioperl-live (pulled yesterday). I >> have a BLASTN (from blast+) output file for 3 query sequences >> (https://gist.github.com/1248342). I used this script, >> https://gist.github.com/1248338, to print the query id, algorithm and >> algorithm_version for each result. When I run the script, I get the >> following output: >> >> GFAVMM201BADC0 ?BLASTN ?2.2.25+ >> GFAVMM201A1JOH ?BLASTN >> GFAVMM201D933Z ?BLASTN >> >> Algorithm_version outputs the correct version for the first result but >> outputs the empty string for the 2nd and 3rd query. Why? This >> functionality worked about a month ago. What has changed to cause this >> to happen? 
>> >> Thank you, >> >> Paul >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jluis.lavin at unavarra.es Fri Sep 30 04:23:19 2011 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Fri, 30 Sep 2011 10:23:19 +0200 Subject: [Bioperl-l] Bio-Graphics module Message-ID: <52f9bcbb5c40302fe5d1ea274982c24b.squirrel@webmail.unavarra.es> Dear All, I'm currently using Perl 5.10.0 version and Bioperl 1.6.1 running on a windows machine. I read about the Bio-Graphics module and it'd be wonderful to install it, but seems like it is only available for Perl 5.8... Is there any other Perl and/or Bioperl module to do the same kind of genomic and Blast report representation currently available? Thanks in advance -- Dr. Jos? Luis Lav?n Trueba Dpto. de Producci?n Agraria Grupo de Gen?tica y Microbiolog?a Universidad P?blica de Navarra 31006 Pamplona Navarra SPAIN From cjfields at illinois.edu Fri Sep 30 08:38:01 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 30 Sep 2011 12:38:01 +0000 Subject: [Bioperl-l] Bio-Graphics module In-Reply-To: <52f9bcbb5c40302fe5d1ea274982c24b.squirrel@webmail.unavarra.es> References: <52f9bcbb5c40302fe5d1ea274982c24b.squirrel@webmail.unavarra.es> Message-ID: It's available for all perl versions from 5.8.8 up. I have it running with perl 5.14. Now, I recall there being problems with installation on Mac OS X, though I think that was mainly due to GD.pm and libgd. chris On Sep 30, 2011, at 3:23 AM, wrote: > > Dear All, > > I'm currently using Perl 5.10.0 version and Bioperl 1.6.1 running on a > windows machine. > > I read about the Bio-Graphics module and it'd be wonderful to install it, > but seems like it is only available for Perl 5.8... > Is there any other Perl and/or Bioperl module to do the same kind of > genomic and Blast report representation currently available? 
>
> Thanks in advance
>
> --
> Dr. José Luis Lavín Trueba
>
> Dpto. de Producción Agraria
> Grupo de Genética y Microbiología
> Universidad Pública de Navarra
> 31006 Pamplona
> Navarra
> SPAIN
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jillianrowe91286 at gmail.com Wed Sep 28 02:03:32 2011
From: jillianrowe91286 at gmail.com (Jill)
Date: Tue, 27 Sep 2011 23:03:32 -0700 (PDT)
Subject: [Bioperl-l] Gene Type in Entrez gene?
Message-ID:

Hi there,

I am using the Bio::DB::EUtilities module to download gene sequences based on a query.

while (my $docsum = $summaries->next_DocSum) {
    ## some items in DocSum are also named ChrStart so we pick the genomic
    ## information item and get the coordinates from it
    my ($genomic_info) = $docsum->get_Items_by_name('GenomicInfoType');

    ## some entries may have no data on genomic coordinates.
    ## This condition filters them out
    if (!$genomic_info) {
        ## found no genomic coordinates data
        next;
    }

    ## get coordinates of sequence
    ## get_contents_by_name always returns a list
    my ($chr_acc_ver) = $genomic_info->get_contents_by_name("ChrAccVer");
    my ($chr_start)   = $genomic_info->get_contents_by_name("ChrStart");
    my ($chr_stop)    = $genomic_info->get_contents_by_name("ChrStop");
    my $strand;
    if ($chr_start < $chr_stop) {
        $strand = 1;
        $chr_start = $chr_start +1 - $bp5_extra;
        $chr_stop = $chr_stop +1 + $bp5_extra;
    } elsif ($chr_start > $chr_stop) {
        $strand = 2;
        $chr_start = $chr_start +1 - (-$bp5_extra);
        $chr_stop = $chr_stop +1 + (-$bp5_extra);
    } else {
        next;
    }

    while (my $item = $docsum->next_Item('flattened')) {
        next if ($item->get_name =~ m/NomenclatureName/);
        if ($item->get_name =~ m/Description/) {
            $description = $item->get_content if $item->get_content;
            $description =~ tr/ /_/;
            print $description, "\n";
        }
        if ($item->get_name =~ m/Name/) {
            $name = $item->get_content if $item->get_content;
            print $name, "\n";
        }
        printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content;
    }
}

Then I go on to use GenBank to download the sequences based on the chromosome splice. For what I have it works great. But I am trying to get to the gene type (either protein coding or pseudo) as well. I can see it in the summary on the Entrez Gene site, but can't get to it through BioPerl. When I have it print out all the contents of the summary it doesn't show up there either.

Any help?

Thanks!

From liam.elbourne at mq.edu.au Thu Sep 29 17:34:04 2011
From: liam.elbourne at mq.edu.au (Liam Elbourne)
Date: Fri, 30 Sep 2011 07:34:04 +1000
Subject: [Bioperl-l] GFF to GTF
In-Reply-To:
References:
Message-ID: <8D027281-44E6-467C-8D22-D2D2F87D04B6@mq.edu.au>

Hi Shalabh,

Not sure about bioperl (I looked a while back and either missed it or it's not there) but there is a program associated with the cufflinks suite called gffread that should convert.

Regards,
Liam Elbourne.

On 30/09/2011, at 12:37 AM, shalabh sharma wrote:

> Hi,
> Is there any module to convert GFF file to GTF?
>
> Thanks
> Shalabh
>
>
> --
> Shalabh Sharma
> Scientific Computing Professional Associate (Bioinformatics Specialist)
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From carandraug+dev at gmail.com Fri Sep 30 09:18:04 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Fri, 30 Sep 2011 14:18:04 +0100
Subject: [Bioperl-l] Gene Type in Entrez gene?
In-Reply-To:
References:
Message-ID:

On 28 September 2011 07:03, Jill wrote:
> Hi there,
>
> I am using the Bio::DB::EUtilities module to download gene sequences
> based on a query.
>
>
> [...]
> }
>
>
> Then I go on to use genbank to download the sequences based on the
> chromosome splice. For what I have it works great.
But I am trying to > get to the gene type (either protein coding or pseudo) as well. I can > see it in the summary on the Entrez Gene sight, but can't get to it > through bioperl. When I have it print out all the contents of the > summary it doesn't show up there either. > > Any help? Hi Jill, there's already a script in bioperl that does what you want, it's just not part of the current stable release. You can get it here https://github.com/bioperl/bioperl-live/blob/master/scripts/Bio-DB-EUtilities/bp_genbank_ref_extractor.pl You can download the script alone, it will work fine in previous releases of bioperl, no need to write another one. Carn? Draug From manju.rawat2 at gmail.com Thu Sep 1 06:53:53 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Thu, 1 Sep 2011 02:53:53 -0400 Subject: [Bioperl-l] Bioperl query.... In-Reply-To: References: <4E5CC8AC.8050800@gmail.com> Message-ID: Thanks For The Reply.. I have already seen this link..But I am confused. I used to following code and run it... my $in = Bio::SearchIO->new(-format => 'blast', -file => 'seqs.blast'); while( my $result = $in->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query=", $result->query_name, " Hit=", $hit->name, " Length=", $hsp->length('total'), " gaps=", $hsp->gaps, " Percent_id=", $hsp->percent_identity, "\n"; } } } }} and it showing me following output with an error that.. *Erro--*rArgument "" isn't numeric in numeric lt (<) at /usr/local/share/perl/5.10.1/Bio/SearchIO/SearchResultEventBuilder.pm line 279, line 4113. 
Query=NM_181451 Hit=ref|NM_181451.1| Length=1349 gaps=1 Percent_id=100 Query=NM_181451 Hit=ref|XM_002706247.1| Length=1345 gaps=13 Percent_id=93.8289962825279 Query=NM_181451 Hit=ref|NM_001098089.1| Length=1323 gaps=7 Percent_id=91.9123204837491 Query=NM_181451 Hit=ref|NM_001008415.1| Length=1211 gaps=5 Percent_id=94.9628406275805 Query=NM_181451 Hit=ref|XM_001251693.3| Length=1320 gaps=5 Percent_id=91.969696969697 Query=NM_181451 Hit=ref|NM_001097567.1| Length=1338 gaps=4 Percent_id=91.5545590433483 Query=NM_181451 Hit=gb|AY075103.1| Length=1334 gaps=1 Percent_id=91.304347826087 ................ .......... Pl Find.whats the error i this code... Thanks Manju Rawat. From locarpau at upvnet.upv.es Thu Sep 1 14:49:06 2011 From: locarpau at upvnet.upv.es (Lorenzo Carretero Paulet) Date: Thu, 1 Sep 2011 16:49:06 +0200 Subject: [Bioperl-l] Parsing PAML mlc files In-Reply-To: References: <9645AF32-5EC3-41FA-9A32-45B6B92E31FD@illinois.edu><4DF56976.8080704@upvnet.upv.es><9866C4A4-AC36-4A25-B38F-3006A7BB0F11@illinois.edu> <1314636433.4e5bc291a40c6@webmail.upv.es> <1A4207F8295607498283FE9E93B775B407CCB2D3@EX02.asurite.ad.asu.edu> <1314659986.4e5c1e9268078@webmail.upv.es> Message-ID: <1314888546.4e5f9b628ecea@webmail.upv.es> Hi all, I'm trying to parse mlc output files from PAML using Bio::Tools::Phylo::PAML as: my $mlcfile = "/Applications/Bioinformatics/paml44/bin/mlc"; my $parserF = Bio::Tools::Phylo::PAML->new (-file =>$mclfile); if ( my $paml_result = $parserF->next_result ) { say Dumper $paml_result; #Prints Ok for ( my $model_result= $paml_result->get_NSSite_results ) { #say Dumper $model_result; #Prints nothing $ns_string = "model ".$model_result->model_num."\n ".$model_result->model_description()."\n ".$model_result->time_used."\n"; $dnds_site_classes = $model_result->dnds_site_classes; #a hashref #say Dumper $dnds_site_classes; for my $sites ( $model_result->get_BEB_pos_selected_sites ) ... ... ... 
The Bio::Tools::Phylo::PAML::Result object is ok, as I can print it using Dumper. In contrast, it seems that the Bio::Tools::Phylo::PAML::ModelResult object is not being properly instantiated, as I get the error message:

"Can't call method "model_num" without a package or object reference at ..."

What am I missing?
Best,
Lorenzo

From jason.stajich at gmail.com Thu Sep 1 20:23:47 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 1 Sep 2011 13:23:47 -0700
Subject: [Bioperl-l] Parsing PAML mlc files
In-Reply-To: <1314888546.4e5f9b628ecea@webmail.upv.es>
References: <9645AF32-5EC3-41FA-9A32-45B6B92E31FD@illinois.edu><4DF56976.8080704@upvnet.upv.es><9866C4A4-AC36-4A25-B38F-3006A7BB0F11@illinois.edu> <1314636433.4e5bc291a40c6@webmail.upv.es> <1A4207F8295607498283FE9E93B775B407CCB2D3@EX02.asurite.ad.asu.edu> <1314659986.4e5c1e9268078@webmail.upv.es> <1314888546.4e5f9b628ecea@webmail.upv.es>
Message-ID:

Lorenzo -

I am sure this is a problem with changes in the output from PAML - this is a classic problem with this suite. This requires some debugging of the parser; I am not sure if there is anyone out there with time to do the debugging. I can say all this worked before on an earlier version of PAML but I don't know specifically what is going on with the latest paml4.4 version.
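Separately from any parser issue, the quoted error is also what Perl reports when a list-returning method is assigned to a scalar inside for ( my $x = ... ), since the method is then called in scalar context. The sketch below reproduces that with mock classes only; MockModel and MockResult are hypothetical stand-ins and not BioPerl code, and it assumes get_NSSite_results returns a plain list as its documentation describes.

```perl
use strict;
use warnings;

# MockModel and MockResult are hypothetical stand-ins, not BioPerl classes;
# they only mimic a method that returns a list of objects.
package MockModel;
sub new {
    my ($class, $num) = @_;
    return bless { num => $num }, $class;
}
sub model_num { return $_[0]{num} }

package MockResult;
sub new {
    my ($class) = @_;
    return bless { models => [ map { MockModel->new($_) } 0, 1, 2 ] }, $class;
}
sub get_models { return @{ $_[0]{models} } }

package main;

my $result = MockResult->new;

# Scalar assignment: the method is called in scalar context, so the
# variable receives the number of elements (3 here), not an object.
my $count = $result->get_models;

# Calling a method on that plain number dies; capture the message.
my $error = '';
eval { $count->model_num; 1 } or $error = $@;

# List-context form: the loop variable is bound to each object in turn.
my @nums;
for my $model ( $result->get_models ) {
    push @nums, $model->model_num;
}

print "scalar context gave: $count\n";
print "error: $error";
print "models seen: @nums\n";
```

If the same thing is happening in the script above, writing the loop as for my $model_result ( $paml_result->get_NSSite_results ) would be worth trying before debugging the parser; the file variable is also declared as $mlcfile but passed as $mclfile.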
Jason On Sep 1, 2011, at 7:49 AM, Lorenzo Carretero Paulet wrote: > Hi all, > I'm trying to parse mlc output files from PAML using Bio::Tools::Phylo::PAML as: > > my $mlcfile = "/Applications/Bioinformatics/paml44/bin/mlc"; > my $parserF = Bio::Tools::Phylo::PAML->new (-file =>$mclfile); > if ( my $paml_result = $parserF->next_result ) > { > say Dumper $paml_result; #Prints Ok > for ( my $model_result= $paml_result->get_NSSite_results ) > { > #say Dumper $model_result; #Prints nothing > $ns_string = "model ".$model_result->model_num."\n > ".$model_result->model_description()."\n ".$model_result->time_used."\n"; > $dnds_site_classes = $model_result->dnds_site_classes; #a hashref > #say Dumper $dnds_site_classes; > for my $sites ( $model_result->get_BEB_pos_selected_sites ) > ... > ... > ... > > The Bio::Tools::Phylo::PAML::Result object is ok, as I can print it using > DUmper. In contrast, it seems that the Bio::Tools::Phylo::PAML::ModelResult > object is not being properly instantiated, as I get the error message: > > "Can't call method "model_num" without a package or object reference at ..." > > What am I missing? > Best, > Lorenzo > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Scott.Markel at accelrys.com Thu Sep 1 21:22:21 2011 From: Scott.Markel at accelrys.com (Scott Markel) Date: Thu, 1 Sep 2011 14:22:21 -0700 Subject: [Bioperl-l] file format for alignment plus features for aligned sequences Message-ID: <5ACBA19439E77B43A06F4CAB897EC97702F8302A05@EXCH1-COLO.accelrys.net> A question on behalf of the Discovery Studio group at Accelrys - They have alignment data with annotations, e.g., visualization settings or alignment properties. The aligned sequences also have features, e.g., domain boundaries or secondary structure motifs. They currently use BSML to save sequences and features. 
Is there an extension of BSML that can also save the alignment information? Are there any good file formats that can be used to store an alignment plus features associated with the aligned sequences? Are there other mailing lists that might be more appropriate for these questions? Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (Pipeline Pilot R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Secretary, Board of Directors: International Society for Computational Biology Chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics From ihok at hotmail.com Fri Sep 2 03:49:50 2011 From: ihok at hotmail.com (Jack Tanner) Date: Thu, 1 Sep 2011 23:49:50 -0400 Subject: [Bioperl-l] Bio::Ext::Align? Message-ID: I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty quiet codebase these days... Is it dead? Does anyone have a spec file for building an SRPM for it for RHEL 6? From cjfields at illinois.edu Fri Sep 2 04:31:17 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 2 Sep 2011 04:31:17 +0000 Subject: [Bioperl-l] Bio::Ext::Align? In-Reply-To: References: Message-ID: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> Yes, it's essentially deprecated (unmaintained). I don't know of anyone who has packaged that up in a while, if ever. chris On Sep 1, 2011, at 10:49 PM, Jack Tanner wrote: > I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty quiet codebase these days... Is it dead? > > Does anyone have a spec file for building an SRPM for it for RHEL 6? 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Fri Sep 2 08:44:07 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 2 Sep 2011 10:44:07 +0200 Subject: [Bioperl-l] Bio::Ext::Align? In-Reply-To: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> References: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> Message-ID: As it happens, a colleague of mine needed Bio::Ext::Align for hhrpred: http://toolkit.tuebingen.mpg.de/hhpred He got it working thus: > Hi Dave, > thanks a lot. i made it work. The error i got later on was: relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making > a shared object; > recompile with -fPIC > > the solution is: > perl Makefile.PL PREFIX= /fftwsingle --enable-shared > --with-pic --enable-single > > make > make install > http://forums.fedoraforum.org/ > showthread.php?t=232607 Dave On Fri, Sep 2, 2011 at 06:31, Fields, Christopher J wrote: > Yes, it's essentially deprecated (unmaintained). I don't know of anyone > who has packaged that up in a while, if ever. > > chris > > On Sep 1, 2011, at 10:49 PM, Jack Tanner wrote: > > > I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty > quiet codebase these days... Is it dead? > > > > Does anyone have a spec file for building an SRPM for it for RHEL 6? 
> > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Fri Sep 2 09:30:33 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 2 Sep 2011 11:30:33 +0200 Subject: [Bioperl-l] Parsing PAML mlc files In-Reply-To: References: <9645AF32-5EC3-41FA-9A32-45B6B92E31FD@illinois.edu> <4DF56976.8080704@upvnet.upv.es> <9866C4A4-AC36-4A25-B38F-3006A7BB0F11@illinois.edu> <1314636433.4e5bc291a40c6@webmail.upv.es> <1A4207F8295607498283FE9E93B775B407CCB2D3@EX02.asurite.ad.asu.edu> <1314659986.4e5c1e9268078@webmail.upv.es> <1314888546.4e5f9b628ecea@webmail.upv.es> Message-ID: Looking back at the commit history, back in April and May 2010, I made some updates for the January 2010 edition of PAML 4.4. All tests passed at that time, but: - the tests may be incomplete - PAML has undoubtedly changed since then, even if it's still called version 4.4 I can't look at this right now myself, but please file a bug report on this, and hopefully someone else can. Dave On Thu, Sep 1, 2011 at 22:23, Jason Stajich wrote: > Lorenzo - > > I am sure this is a problem with changes in the output from PAML - this is > classic problem with this suite. This requires some debugging of the > parser, not sure if there is anyone out there with time to do the debugging. > I can say all this worked before on an earlier version of PAML but I don't > know specifically what is going on with the latest paml4.4 version. 
> > Jason > > > On Sep 1, 2011, at 7:49 AM, Lorenzo Carretero Paulet wrote: > > > Hi all, > > I'm trying to parse mlc output files from PAML using > Bio::Tools::Phylo::PAML as: > > > > my $mlcfile = "/Applications/Bioinformatics/paml44/bin/mlc"; > > my $parserF = Bio::Tools::Phylo::PAML->new (-file =>$mclfile); > > if ( my $paml_result = $parserF->next_result ) > > { > > say Dumper $paml_result; #Prints Ok > > for ( my $model_result= $paml_result->get_NSSite_results ) > > { > > #say Dumper $model_result; #Prints nothing > > $ns_string = "model ".$model_result->model_num."\n > > ".$model_result->model_description()."\n ".$model_result->time_used."\n"; > > $dnds_site_classes = $model_result->dnds_site_classes; #a hashref > > #say Dumper $dnds_site_classes; > > for my $sites ( $model_result->get_BEB_pos_selected_sites ) > > ... > > ... > > ... > > > > The Bio::Tools::Phylo::PAML::Result object is ok, as I can print it using > > DUmper. In contrast, it seems that the > Bio::Tools::Phylo::PAML::ModelResult > > object is not being properly instantiated, as I get the error message: > > > > "Can't call method "model_num" without a package or object reference at > ..." > > > > What am I missing? > > Best, > > Lorenzo > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Fri Sep 2 13:00:27 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 2 Sep 2011 13:00:27 +0000 Subject: [Bioperl-l] Bio::Ext::Align? 
In-Reply-To: References: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> Message-ID: <81EEC2D4-D039-41D5-A10F-91DD26B4D21E@illinois.edu> I think, if this is actively being used, we should split it away from bioperl-ext and release it on its own. Otherwise I worry about the long-term support for it/ chris On Sep 2, 2011, at 3:44 AM, Dave Messina wrote: > As it happens, a colleague of mine needed Bio::Ext::Align for hhrpred: > http://toolkit.tuebingen.mpg.de/hhpred > > > He got it working thus: > Hi Dave, > thanks a lot. i made it work. The error i got later on was: > relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; > recompile with -fPIC > > the solution is: > perl Makefile.PL PREFIX= /fftwsingle --enable-shared --with-pic --enable-single > > make > make install > http://forums.fedoraforum.org/showthread.php?t=232607 > > > > Dave > > > > > > On Fri, Sep 2, 2011 at 06:31, Fields, Christopher J wrote: > Yes, it's essentially deprecated (unmaintained). I don't know of anyone who has packaged that up in a while, if ever. > > chris > > On Sep 1, 2011, at 10:49 PM, Jack Tanner wrote: > > > I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty quiet codebase these days... Is it dead? > > > > Does anyone have a spec file for building an SRPM for it for RHEL 6? > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ihok at hotmail.com Fri Sep 2 15:20:44 2011 From: ihok at hotmail.com (Jack Tanner) Date: Fri, 2 Sep 2011 11:20:44 -0400 Subject: [Bioperl-l] Bio::Ext::Align? 
In-Reply-To: <81EEC2D4-D039-41D5-A10F-91DD26B4D21E@illinois.edu> References: <82213CB3-C02A-415D-A4F3-66471D44C31C@illinois.edu> <81EEC2D4-D039-41D5-A10F-91DD26B4D21E@illinois.edu> Message-ID: I also see that someone's forked it on Github and made some packaging fixes. It'd be nice to see it revived. On 9/2/2011 9:00 AM, Fields, Christopher J wrote: > I think, if this is actively being used, we should split it away from bioperl-ext and release it on its own. Otherwise I worry about the long-term support for it/ > > chris > > On Sep 2, 2011, at 3:44 AM, Dave Messina wrote: > >> As it happens, a colleague of mine needed Bio::Ext::Align for hhrpred: >> http://toolkit.tuebingen.mpg.de/hhpred >> >> >> He got it working thus: >> Hi Dave, >> thanks a lot. i made it work. The error i got later on was: >> relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; >> recompile with -fPIC >> >> the solution is: >> perl Makefile.PL PREFIX= /fftwsingle --enable-shared --with-pic --enable-single >> >> make >> make install >> http://forums.fedoraforum.org/showthread.php?t=232607 >> >> >> >> Dave >> >> >> >> >> >> On Fri, Sep 2, 2011 at 06:31, Fields, Christopher J wrote: >> Yes, it's essentially deprecated (unmaintained). I don't know of anyone who has packaged that up in a while, if ever. >> >> chris >> >> On Sep 1, 2011, at 10:49 PM, Jack Tanner wrote: >> >>> I'd like to run Bio::Ext::Align, from bioperl-ext. Seems like a pretty quiet codebase these days... Is it dead? >>> >>> Does anyone have a spec file for building an SRPM for it for RHEL 6? 
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > From manju.rawat2 at gmail.com Sat Sep 3 05:29:56 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Sat, 3 Sep 2011 01:29:56 -0400 Subject: [Bioperl-l] hsps_successfully_gapped: 47 Message-ID: Hello, Is There any method in BioPerl through which we can extract number_of_hsps_successfully_gapped: from a blast file.. If any one know about the it Pl help me... Thanks Manju Rawat From manju.rawat2 at gmail.com Sat Sep 3 10:00:22 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Sat, 3 Sep 2011 06:00:22 -0400 Subject: [Bioperl-l] blast result not matching. Message-ID: Hi, I doing blast using bioperl...but it not showing me complete result.. my program is following... #!usr/bin/perl -w use Bio::Perl; # this script will only work with an internet connection # on the computer it is run on $seq = new_sequence("ATTGGTTTGGGGACCCAATTTGTGTGTTATATGTA"); $blast_result=blast_sequence($seq); write_blast(">xyz.blast",$blast_result); and Output.. BLASTN 2.2.25+ Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. 
Query= blast-sequence-temp-id (30 letters) Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) 14,527,398 sequences; 37,346,598,701 total letters Score E Sequences producing significant alignments: (bits) value Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) Posted date: Sep 2, 2011 4:14 PM Number of letters in database: 37,346,598,701 Number of sequences in database: 14,527,398 Matrix: blastn matrix:2 -3 Gap Penalties Existence: 5, Extension: 2 expect: 1e-10 allowgaps: yes Search Statistics A: 0 Hits_to_DB: 737,387 S1: 23 S1_bits: 22.0 S2: 77 S2_bits: 70.7 X1: 22 X1_bits: 20.1 X2: 33 X2_bits: 29.8 X3: 110 X3_bits: 99.2 dbentries: 14,527,398 dbletters: -1308106959 effectivedblength: 36,954,358,955 effectivespace: 110,863,076,865 effectivespaceused: 110,863,076,865 entropy: 0.912 entropy_gapped: 0.780 kappa: 0.408 kappa_gapped: 0.410 lambda: 0.634 lambda_gapped: 0.625 length_adjustment: 27 num_extensions: 8,057 num_successful_extensions: 8,057 number_of_hsps_better_than_expect_value_cutoff_without_gapping: 0 number_of_hsps_gapped: 8,057 number_of_hsps_successfully_gapped: 0 querylength: 3 seqs_better_than_1e-10: 0 this result is not matching with with NCBI result... Is there anything wrong.. Thanks Manju Rawat From florent.angly at gmail.com Mon Sep 5 02:14:37 2011 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 05 Sep 2011 12:14:37 +1000 Subject: [Bioperl-l] Bio::SeqIO can't guess the format of data from a pipe In-Reply-To: References: <2E070303-ADDF-472B-835B-6D3959B83E9A@illinois.edu> <4E58D105.7050805@gmail.com> <4E5A0590.2010805@gmail.com> <8D639B95-0666-4F09-8E9E-88C8CDF76ABC@illinois.edu> <4E5AC2B8.9060808@gmail.com> Message-ID: <4E64308D.5060304@gmail.com> Thanks for your advice Chris. I put a format() and variant() method in Bio::Root::IO. All the Bio::*IO methods inherit these methods. 
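A minimal sketch of how calling code might use the shared getters, assuming the installed BioPerl already includes the format() and variant() methods described above. The input file name is hypothetical, and the can() guards are only there so the script still runs against an older release that lacks these methods.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical input file; the format is given explicitly so nothing is guessed.
my $in = Bio::SeqIO->new(-file => 'reads.fastq', -format => 'fastq');

# format() and variant() are the getters discussed in this thread.
# Guard with can() in case the installed release predates them.
if ( $in->can('format') ) {
    print "format:  ", $in->format, "\n";
}
if ( $in->can('variant') ) {
    my $variant = $in->variant;
    print "variant: ", ( defined $variant ? $variant : '(none)' ), "\n";
}
```

Because the methods are looked up on the stream object itself, the same two calls should apply unchanged to the other IO classes listed in the quoted message, provided they inherit from Bio::Root::IO.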
Regarding the module naming, 8 follow the convention Bio::*IO and 8 follow the Bio::*::IO convention. If we decide to rename some IO modules for consistency, I would prefer the Bio::*::IO convention. Regards, Florent On 29/08/11 11:10, Chris Fields wrote: > On Aug 28, 2011, at 5:35 PM, Florent Angly wrote: > >> Hi, >> >> I implemented the format() getter method in Bio::SeqIO as discussed, essentially following the way proposed by Hilmar. The variant() method is not needed since Bio::SeqIO::fastq already has a get/set method for that. > Right, but the method could be used by other modules if it were moved to Bio::SeqIO. for instance. > >> I noticed that there are plenty more Bio*IO modules that could benefit from having a format() method, e.g.: >> Bio::AlignIO >> Bio::ClusterIO >> Bio::FeatureIO >> Bio::MapIO >> Bio::OntologyIO >> Bio::SearchIO >> Bio::TreeIO >> Bio::Assembly::IO * >> The code could be copy-pasted for each of them but it is not very graceful. Is there a way we could have all these IO modules share the same format() method? > Move the method to Bio::Root::IO, the common base class for all of the above. > >> * Note how the IO class for Bio::Assembly is called Bio::Assembly::IO, and not Bio::AssemblyIO like for other classes. This may be something to change in the future for consistency. >> >> Florent > That's possible; one could take advantage of that for redesign/API issues if it were needed. > > chris From manju.rawat2 at gmail.com Mon Sep 5 07:53:40 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Mon, 5 Sep 2011 03:53:40 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: Message-ID: Hi, I doing blast using bioperl...but it not showing me complete result.. my program is following... 
#!usr/bin/perl -w use Bio::Perl; # this script will only work with an internet connection # on the computer it is run on $seq = new_sequence("ATTGGTTTGGGGACCCAATTTGTGTGTTATATGTA"); $blast_result=blast_sequence($seq); write_blast(">xyz.blast",$blast_result); *and Output..* BLASTN 2.2.25+ Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= blast-sequence-temp-id (30 letters) Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) 14,527,398 sequences; 37,346,598,701 total letters Score E Sequences producing significant alignments: (bits) value Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) Posted date: Sep 2, 2011 4:14 PM Number of letters in database: 37,346,598,701 Number of sequences in database: 14,527,398 Matrix: blastn matrix:2 -3 Gap Penalties Existence: 5, Extension: 2 expect: 1e-10 allowgaps: yes Search Statistics A: 0 Hits_to_DB: 737,387 S1: 23 S1_bits: 22.0 S2: 77 S2_bits: 70.7 X1: 22 X1_bits: 20.1 X2: 33 X2_bits: 29.8 X3: 110 X3_bits: 99.2 dbentries: 14,527,398 dbletters: -1308106959 effectivedblength: 36,954,358,955 effectivespace: 110,863,076,865 effectivespaceused: 110,863,076,865 entropy: 0.912 entropy_gapped: 0.780 kappa: 0.408 kappa_gapped: 0.410 lambda: 0.634 lambda_gapped: 0.625 length_adjustment: 27 num_extensions: 8,057 num_successful_extensions: 8,057 number_of_hsps_better_than_expect_value_cutoff_without_gapping: 0 number_of_hsps_gapped: 8,057 number_of_hsps_successfully_gapped: 0 querylength: 3 seqs_better_than_1e-10: 0 this result is not matching with with NCBI result... Is there anything wrong.. 
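The report above lists "expect: 1e-10" and "seqs_better_than_1e-10: 0", so one thing worth trying for a query this short is a looser E-value cutoff. A minimal sketch using Bio::Tools::Run::RemoteBlast directly, following the pattern in that module's synopsis. The database name 'nt', the cutoff of 10, and the query id 'my_query' are assumptions for illustration, not settings taken from the NCBI web form. It needs an internet connection.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::Tools::Run::RemoteBlast;

# Assumed settings: nucleotide database 'nt' and a permissive E-value cutoff.
my $factory = Bio::Tools::Run::RemoteBlast->new(
    -prog       => 'blastn',
    -data       => 'nt',
    -expect     => '10',
    -readmethod => 'SearchIO',
);

my $query = Bio::Seq->new(
    -id  => 'my_query',    # hypothetical identifier
    -seq => 'ATTGGTTTGGGGACCCAATTTGTGTGTTATATGTA',
);

$factory->submit_blast($query);

# Poll NCBI until every submitted request has either finished or failed.
while ( my @rids = $factory->each_rid ) {
    for my $rid (@rids) {
        my $rc = $factory->retrieve_blast($rid);
        if ( !ref $rc ) {
            # A negative return code means the request failed; otherwise keep waiting.
            $factory->remove_rid($rid) if $rc < 0;
            sleep 5;
            next;
        }
        my $result = $rc->next_result;
        $factory->save_output('xyz.blast');    # keep the raw report on disk
        $factory->remove_rid($rid);
        while ( my $hit = $result->next_hit ) {
            print join( "\t", $hit->name, $hit->significance ), "\n";
        }
    }
}
```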
Thanks Manju Rawat From p.j.a.cock at googlemail.com Mon Sep 5 09:44:06 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 5 Sep 2011 10:44:06 +0100 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: Message-ID: On Mon, Sep 5, 2011 at 8:53 AM, Manju Rawat wrote: > Hi, > I doing blast using bioperl...but it not showing me complete result.. > > > my program is following... > ... > > this result is not matching with with NCBI result... > Is there anything wrong.. The NCBI website for BLAST uses different default values to the BLAST command line tools. Check things like the gap parameters if you want to use the same settings. Peter From p.j.a.cock at googlemail.com Mon Sep 5 10:25:15 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 5 Sep 2011 11:25:15 +0100 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: Message-ID: Please CC the mailing list. On Mon, Sep 5, 2011 at 11:19 AM, Manju Rawat wrote: > Hi, > > Thanks for the reply... > but when i am blasting after getting sequence of any gene (from NCBI using > bioperl see below)..it showing me same result as shown in NCBI.. > > #!usr/bin/perl -w > use Bio::Perl; > $seq_object = get_sequence('NCBI',"NM_181451"); > $blast_result = blast_sequence($seq); > write_blast(">roa1.blast",$blast_report); > > > I dnt know why its not working when i am blasting my own sequence.. > Maybe you need give the sequence as a FASTA entry rather than a plain string? Peter From manju.rawat2 at gmail.com Mon Sep 5 10:40:57 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Mon, 5 Sep 2011 06:40:57 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: Message-ID: No..i also tried this..this also dont work.. pls help me.. From cjfields at illinois.edu Mon Sep 5 19:42:49 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 5 Sep 2011 19:42:49 +0000 Subject: [Bioperl-l] Fwd: blast result not matching. 
In-Reply-To: References: Message-ID: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> Are you using the latest BioPerl? I believe there had been some fixes addressing remote blast. chris On Sep 5, 2011, at 5:40 AM, Manju Rawat wrote: > No..i also tried this..this also dont work.. > pls help me.. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From manju.rawat2 at gmail.com Tue Sep 6 10:59:50 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Tue, 6 Sep 2011 06:59:50 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> Message-ID: bioperl 1.6.9 version is installed in my system.. its not the reason bcs blast is working fine when i am blasting with follwing code.. #!usr/bin/perl -w use Bio::Perl; $seq = get_sequence('NCBI',"NM_181451"); $blast_result=blast_sequence($seq); write_blast(">xyz.blast",$blast_result); Manju From sidd.basu at gmail.com Tue Sep 6 15:51:09 2011 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Tue, 6 Sep 2011 10:51:09 -0500 Subject: [Bioperl-l] Bioinformatics Job Opening at dictyBase in Chicago Message-ID: <20110906155106.GB1841@Macintosh-388.local> Hi All, We have an open position for a Bioinformatics Software Engineer at dictyBase(Northwestern University in Chicago). The job involves developing web application and middleware for a genome database using modern perl(DBIx-Class/Moose/MVC web frameworks etc) as well as integration of various genomic tools(gbrowse, intermine, apollo, biomart, pathway tools etc..). For full details please see: http://www.dictybase.org/dictybase_jobs.html. 
thanks,
-siddhartha

Siddhartha Basu
Software developer, dictybase
http://www.dictybase.org

From slucky at ibab.ac.in Wed Sep 7 13:39:03 2011
From: slucky at ibab.ac.in (Lucky Singh)
Date: Wed, 07 Sep 2011 19:09:03 +0530
Subject: [Bioperl-l] Fwd: Re: Problem using Bio::Tools::Run::RemoteBlast
Message-ID: <4E6773F7.7000703@ibab.ac.in>

-------- Original Message --------
Subject: Re: [Bioperl-l] Problem using Bio::Tools::Run::RemoteBlast
Date: Sat, 27 Aug 2011 20:36:58 +0530
From: Lucky Singh
To: Carnë Draug

On Friday 26 August 2011 07:50 PM, Carnë Draug wrote:
> On 22 August 2011 07:01, Lucky Singh wrote:
>> Now I
>> wanted to host it from web server, but This program is not working from it
>> may be it is not able to create or write on file from web server but in
>> command line it is working fine. I don't know the possible reason, please
>> help me to figure it out.
> Have you looked in the apache logs (look in
> /var/log/apache2/error.log) ? Can you pastebin your whole code and the
> content of the error log after trying to run the script?

Dear Carnë Draug,

As per your suggestion, I am attaching the blast code file; currently it is not showing any error in error.log.
Thanks a lot for your valuable reply, and I will be highly grateful if you can get me out of this problem :)

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: blastn
URL:

From jason.stajich at gmail.com Wed Sep 7 16:13:46 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Wed, 7 Sep 2011 09:13:46 -0700
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To:
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu>
Message-ID: <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>

I don't think it works. I am not sure why - probably a bug - but can you go back to what it is you are trying to do? The Bio::Perl functions in that module are intended to be shortcuts, but the original modules should work.
Can you recap what it is you want to accomplish? It may be better to do this not with the Bio::Perl module but with a more direct use of the underlying modules.

On Sep 6, 2011, at 3:59 AM, Manju Rawat wrote:

> bioperl 1.6.9 version is installed in my system..
> its not the reason bcs blast is working fine when i am blasting with
> follwing code..
>
> #!usr/bin/perl -w
> use Bio::Perl;
> $seq = get_sequence('NCBI',"NM_181451");
> $blast_result=blast_sequence($seq);
> write_blast(">xyz.blast",$blast_result);
>
>
> Manju

From cjfields at illinois.edu Wed Sep 7 16:33:52 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 7 Sep 2011 16:33:52 +0000
Subject: [Bioperl-l] Fwd: blast result not matching.
In-Reply-To: <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com>
Message-ID:

I think there was an issue with Bio::Perl BLAST submissions fixed in the 1.6.901 release (1.6.9 != 1.6.901, the latter is newer). From CPAN:

1.6.901 May 18, 2011
...
[Bug fixes]
* [3205] - small fix to Bio::Perl blast_sequence() to make compliant with docs [genehack, cjfields]

chris

On Sep 7, 2011, at 11:13 AM, Jason Stajich wrote:

> I don't think it works. I am not sure why - probably a bug - but can you go back to what it is you are trying to do? The Bio::Perl functions in that module are intended to be shortcuts, but the original modules should work.
> Can you recap what it is you want to accomplish? It may be better to do this not with the Bio::Perl module but with a more direct use of the underlying modules.
>
>
> On Sep 6, 2011, at 3:59 AM, Manju Rawat wrote:
>
>> bioperl 1.6.9 version is installed in my system..
>> its not the reason bcs blast is working fine when i am blasting with
>> follwing code..
>>
>> #!usr/bin/perl -w
>> use Bio::Perl;
>> $seq = get_sequence('NCBI',"NM_181451");
>> $blast_result=blast_sequence($seq);
>> write_blast(">xyz.blast",$blast_result);
>>
>>
>> Manju
>

From carandraug+dev at gmail.com Wed Sep 7 16:47:16 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 7 Sep 2011 17:47:16 +0100
Subject: [Bioperl-l] Problem using Bio::Tools::Run::RemoteBlast
In-Reply-To: <4E590812.9030006@ibab.ac.in>
References: <37711.192.168.1.254.1313992876.squirrel@webmail.ibab.ac.in> <4E590812.9030006@ibab.ac.in>
Message-ID:

2011/8/27 Lucky Singh :
> On Friday 26 August 2011 07:50 PM, Carnë Draug wrote:
>>
>> On 22 August 2011 07:01, Lucky Singh wrote:
>>>
>>> Now I
>>> wanted to host it from web server, but This program is not working from
>>> it
>>> may be it is not able to create or write on file from web server but in
>>> command line it is working fine. I don't know the possible reason, please
>>> help me to figure it out.
>>
>> Have you looked in the apache logs (look in
>> /var/log/apache2/error.log) ? Can you pastebin your whole code and the
>> content of the error log after trying to run the script?
>
> Dear Carnë Draug,
>
> As per your suggestion, I am attaching blast code file currently it is not
> showing any error on error.log.
> Thanks a lot for your valuable reply and will be highly grateful if you can
> get me out of this problem :)

Hi sorry for the late reply. Please try to always reply to the mailing list, maybe someone else can help you too.

I don't know about the script as I never used RemoteBlast from bioperl. But given a quick look at it, you're not loading the CGI module on the script ( http://perldoc.perl.org/CGI.html ).
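As a point of reference, a minimal sketch of a script that does load the CGI module. This is not the poster's script (the attachment was scrubbed from the archive); the form field name 'seq' is hypothetical, and the sketch assumes the CGI module is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

my $q = CGI->new;

# The HTTP header must be printed before any other output,
# otherwise the web server reports an error instead of the page.
print $q->header('text/plain');
print "hello world!\n";

# Hypothetical form field carrying the query sequence.
my $seq = $q->param('seq');
$seq = '' unless defined $seq;
print "received ", length($seq), " characters of sequence\n";
```

If this much works from the browser, the web server and CGI setup are probably fine, and the remaining suspects are file permissions for the web server user and the BioPerl part of the script.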
Here's a simple example using the CGI module ( http://pastebin.com/miMd70wn ) and a HTML page that will use it ( http://pastebin.com/kWwwMijd ).

If nothing shows up on error.log, take a look in access.log. Try some simple CGI script first, such as "hello world!" to see if the problem lies on your bioperl part of the script, or in the web server, or some other part.

Carnë

From scott at scottcain.net Wed Sep 7 17:57:31 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 7 Sep 2011 13:57:31 -0400
Subject: [Bioperl-l] October GMOD Meeting in Toronto
Message-ID:

Hello,

The early registration deadline for the October GMOD meeting in Toronto, Canada is approaching. Please register by September 13th to avoid the late registration fee. You can register here:

http://gmod.eventbrite.com/

For information about the GMOD meeting please see the page at:

http://gmod.org/wiki/October_2011_GMOD_Meeting

In addition to the main meeting, there will be a free BioMart workshop on the following Friday, which you can also register for at the main meeting registration page.

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From longbow0 at gmail.com Wed Sep 7 20:19:37 2011
From: longbow0 at gmail.com (longbow leo)
Date: Wed, 7 Sep 2011 15:19:37 -0500
Subject: [Bioperl-l] How to determine strains were evolved independently in a phylogenetic tree?
Message-ID:

Hi,

I have created a phylogenetic for a virus protein which contained about 200 strains. Next I need to do an analysis to check whether several strains in this tree were evolved independently. Although it is not too difficult to do manually, I still have litter idea how to do this in a Perl script since there are some datasets need to do.

At first I tried to use the method "is_monophyletic" in the module "Bio::Tree::TreeFunctionsI" to do this analysis, but it seems it doesn't work as I have thought. According to the description of "is_monophyletic" method, it "Will do a test of monophyly for the nodes specified in comparison to a chosen outgroup". Does here test whether the outgroup strain is monophyletic to the nodes, or test the nodes only?
The description sounds like the latter but the what the script did seemed to be the first. Are there any suggestions? Thank you very much! Haizhou Liu From greg at ebi.ac.uk Thu Sep 8 10:40:30 2011 From: greg at ebi.ac.uk (Gregory Jordan) Date: Thu, 8 Sep 2011 11:40:30 +0100 Subject: [Bioperl-l] How to determine strains were evolved independently in a phylogenetic tree? In-Reply-To: References: Message-ID: Hi Haizhou, I'm not sure I understand exactly what you're trying to do. But to clarify the BioPerl code: the is_monophyletic method (for the actual code, see here https://github.com/bioperl/bioperl-live/blob/master/Bio/Tree/TreeFunctionsI.pm#L832) tests whether the single outgroup node falls *within* or *outside* the last common ancestor of the group of nodes given. If the outgroup node falls *outside* the subtree defined by this LCA node, then the group of nodes can be called monophyletic with respect to that outgroup (at least as far as my understanding of the word 'monophyletic' goes). If the outgroup node falls *within* the subtree defined by this LCA node, then the group of nodes is not monophyletic with respect to that outgroup node. The term "evolved independently" sounds slightly vague to me -- what is it exactly about the shape of your tree that allows you to call a strain independent or not? If you gave an example or two of trees where you consider the evolution to be independent and non-independent, I (or someone else on the list) may be able to help you find the right method to do this automatically. Cheers, Greg On Wed, Sep 7, 2011 at 9:19 PM, longbow leo wrote: > Hi, > > I have created a phylogenetic for a virus protein which contained about 200 > strains. Next I need to do an analysis to check whether several strains in > this tree were evolved independently. Although it is not too difficult to > do > manually, I still have litter idea how to do this in a Perl script since > there are some datasets need to do. 
> > At first I tried to use the method "is_monophyletic" in the module > "Bio::Tree::TreeFunctionsI" to do this analysis, but it seems it doesn't > work as I have thought. According to the description of "is_monophyletic" > method, it "Will do a test of monophyly for the nodes specified in > comparison to a chosen outgroup". Does here test whether the outgroup > strain > is monophyletic to the nodes, or test the nodes only? The description > sounds > like the latter but the what the script did seemed to be the first. > > Are there any suggestions? > > Thank you very much! > > > Haizhou Liu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From manju.rawat2 at gmail.com Thu Sep 8 06:11:12 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Thu, 8 Sep 2011 02:11:12 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: Toady i installed the latest version of bioperl in my system via CPAN.. But this still not sowing the complete result.. I just want to do nucleotide blast using bioperl..but while i am doing blast with my sequence it shwowing very samll result.. I dnt know whether it is wrong or right...but while i am blasting the same sequence in NCBI it showing a diffrent result.. and i have also tried to use the orignl module..but it also dnt work.. Pl see reult of the balst in attached file of this mail.. #!usr/bin/perl -w use Bio::Perl; use Bio::SearchIO; $blast_report =blast_sequence('acggctgctgtagatctgatgct'); write_blast(">resl.blast",$blast_report); Thanks. Manju Rawat -------------- next part -------------- A non-text attachment was scrubbed... 
Name: resl.blast Type: application/octet-stream Size: 1680 bytes Desc: not available URL: From cjfields at illinois.edu Thu Sep 8 13:05:10 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 8 Sep 2011 13:05:10 +0000 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: <6D4A142B-9455-4CC3-AFDB-F9B3B991B57F@illinois.edu> Submissions to NCBI BLAST via their web interface have different parameters than those submitted via their QBLAST interface (what is used in BioPerl). So the fact there are differing results isn't too surprising, particularly if the results fall close to the e-value cutoff for one or the other. You will need to set the proper parameters, which I don't believe is possible via the (very simple) Bio::Perl interface, but is possible via Bio::Tools::Run::RemoteBlast. chris On Sep 8, 2011, at 1:11 AM, Manju Rawat wrote: > Toady i installed the latest version of bioperl in my system via CPAN.. > But this still not sowing the complete result.. > > > I just want to do nucleotide blast using bioperl..but while i am doing blast with my sequence it shwowing very samll result.. > I dnt know whether it is wrong or right...but while i am blasting the same sequence in NCBI it showing a diffrent result.. > > and i have also tried to use the orignl module..but it also dnt work.. > > Pl see reult of the balst in attached file of this mail.. > #!usr/bin/perl -w > use Bio::Perl; > use Bio::SearchIO; > $blast_report =blast_sequence('acggctgctgtagatctgatgct'); > write_blast(">resl.blast",$blast_report); > > Thanks. > Manju Rawat > > > From David.Messina at sbc.su.se Thu Sep 8 13:33:19 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 8 Sep 2011 15:33:19 +0200 Subject: [Bioperl-l] Fwd: blast result not matching. 
In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: As I think has been said earlier in this thread, it's almost certainly a discrepancy in the BLAST parameters between what the blast_sequence function in the Bio::Perl module is sending, and what the BLAST website is doing. In this case, you have a very short sequence. If you look in the "Algorithm parameters" section of the BLAST web form, you'll see that there is an option that is checked by default, "Automatically adjust parameters for short input sequences". If I uncheck that option, I get the same results as you did when you submitted your BLAST through BioPerl (see http://cl.ly/9ynq). So to get the same results from a BioPerl-submitted BLAST and a BLAST on NCBI's website, you need to have the same parameters. You can set the parameters from BioPerl as described in the documentation: http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Tools/Run/RemoteBlast.pm As Jason said earlier, the blast_sequence function in Bio::Perl is intended as a simple demonstration and uses the default BLAST parameters. That function is a wrapper around the RemoteBlast module. Since you want to do something a little different, I believe you'll need to use the RemoteBlast module directly. Dave On Thu, Sep 8, 2011 at 08:11, Manju Rawat wrote: > Toady i installed the latest version of bioperl in my system via CPAN.. > But this still not sowing the complete result.. > > > I just want to do nucleotide blast using bioperl..but while i am doing > blast > with my sequence it shwowing very samll result.. > I dnt know whether it is wrong or right...but while i am blasting the same > sequence in NCBI it showing a diffrent result.. > > and i have also tried to use the orignl module..but it also dnt work.. > > Pl see reult of the balst in attached file of this mail.. 
> #!usr/bin/perl -w > use Bio::Perl; > use Bio::SearchIO; > $blast_report =blast_sequence('acggctgctgtagatctgatgct'); > write_blast(">resl.blast",$blast_report); > > Thanks. > Manju Rawat > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From abualiga2 at gmail.com Thu Sep 8 14:44:39 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 10:44:39 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag Message-ID: Hi, I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of multiple tags within a primary tag. E.g., when there are several 'function' tag-values within a 'CDS' primary tag, I don't know how to link those 'function' tag-values to a particular 'locus_tag'. As parsed values are returned as a list, I tried creating an array of hashes, where the hash-key is 'locus_tag' and hash-values are multiple 'function' tags, but am failing miserably. Pasted below is what I managed so far. At your convenience, please advise. thanks! 
galeb

#!/usr/local/bin/perl
# parse_gbk.pl
# gsa 09042011
# script to parse out features from gbk
# http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Customizing_Sequence_Object_Construction

use strict; use warnings;
use Bio::SeqIO;

my @loci;
my @seqs;
my @directions;
my @start_coords;
my @end_coords;
my @genes;
my @products;
my @notes;
my @functions;
my %functions;

my $gb_file = shift;
my $seqio_obj = Bio::SeqIO->new(-file => $gb_file );
my $seq_obj = $seqio_obj->next_seq;

for my $feat_obj ( $seq_obj->get_SeqFeatures ) {
    if ( $feat_obj->primary_tag eq ( 'gene' ) ) {
        if ($feat_obj->has_tag( 'locus_tag' ) ) {
            push ( @seqs, $feat_obj->seq->seq ); #collect sequences
            for my $val ( $feat_obj->get_tag_values( 'locus_tag' ) ) {
                push ( @loci, $val ); # locus_tags
            }
        }
        if ( $feat_obj->has_tag( 'gene' ) ) {
            for my $val ( $feat_obj->get_tag_values( 'gene' ) ) {
                push ( @genes, $val ); # gene names
            }
        }
        else {
            push ( @genes, "" ); # if gene names are absent, leave empty
        }
        if ( $feat_obj->location->isa( 'Bio::Location::Simple' ) ) { # gene coordinates
            for my $location ( $feat_obj->location ) {
                push ( @start_coords, $location->start );
                push ( @end_coords, $location->end );
                if ( $location->strand == -1 ) {
                    push ( @directions, "reverse" );
                }
                else {
                    push ( @directions, "forward" );
                }
            }
        }
    }
    # gene products, notes, functions
    if ( $feat_obj->primary_tag eq ( 'CDS' ) ||
         $feat_obj->primary_tag eq ( 'misc_feature' ) ||
         $feat_obj->primary_tag eq ( 'ncRNA' ) ||
         $feat_obj->primary_tag eq ( 'rRNA' ) ||
         $feat_obj->primary_tag eq ( 'tRNA' ) ||
         $feat_obj->primary_tag eq ( 'misc_RNA' ) ) {
        if ( $feat_obj->has_tag( 'product' ) ) {
            for my $product ( $feat_obj->get_tag_values( 'product' ) ) {
                push ( @products, $product );
            }
        }
        else {
            push ( @products, "" );
        }
        if ( $feat_obj->has_tag( 'note' ) ) {
            for my $note ( $feat_obj->get_tag_values( 'note' ) ) {
                push ( @notes, $note );
            }
        }
        else {
            push ( @notes, "" );
        }
        if ( $feat_obj->has_tag( 'function' ) ) {
            for my $function ( $feat_obj->get_tag_values( 'function' ) ) {
                push ( @functions, $function );
            }
        }
        else {
            push ( @functions, "" );
        }
    }
}

print "locus\tgene_name\tstart_nt\tend_nt\tlength_nt\tdirection\tproduct\tnote\tfunction\tsequence_nt\n"; # header

for ( my $elem = 0; $elem < scalar @loci; ++$elem ) {
    print $loci[$elem], "\t",$genes[$elem], "\t", $start_coords[$elem], "\t",
        $end_coords[$elem], "\t", length( $seqs[$elem] ), "\t",
        $directions[$elem], "\t", $products[$elem], "\t", $notes[$elem], "\t",
        $functions[$elem], "\t", $seqs[$elem], "\n";
}

From p.j.a.cock at googlemail.com Thu Sep 8 15:27:56 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 8 Sep 2011 16:27:56 +0100
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References:
Message-ID:

On Thu, Sep 8, 2011 at 3:44 PM, galeb abu-ali wrote:
> Hi,
>
> I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of
> multiple tags within a primary tag. E.g., when there are several 'function'
> tag-values within a 'CDS' primary tag, I don't know how to link those
> 'function' tag-values to a particular 'locus_tag'.

Do you have GenBank features with multiple locus_tag qualifiers?
That would be very unusual...

Peter

From cjfields at illinois.edu Thu Sep 8 15:32:21 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 15:32:21 +0000
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References:
Message-ID:

On Sep 8, 2011, at 10:27 AM, Peter Cock wrote:

> On Thu, Sep 8, 2011 at 3:44 PM, galeb abu-ali wrote:
>> Hi,
>>
>> I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of
>> multiple tags within a primary tag. E.g., when there are several 'function'
>> tag-values within a 'CDS' primary tag, I don't know how to link those
>> 'function' tag-values to a particular 'locus_tag'.
>
> Do you have GenBank features with multiple locus_tag qualifiers?
> That would be very unusual...
>
> Peter

Agreed; in order to clarify what you mean, I think we would need to see the
record in question to get a better idea of the problem.

chris

From abualiga2 at gmail.com Thu Sep 8 15:39:20 2011
From: abualiga2 at gmail.com (galeb abu-ali)
Date: Thu, 8 Sep 2011 11:39:20 -0400
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References:
Message-ID:

I guess I was not clear. 'locus_tag' qualifiers are single, but there are
multiple 'function' qualifiers within a primary feature (e.g. 'CDS').

# gbk file
LOCUS       NC_011748            5154862 bp    DNA     circular BCT 15-MAY-2010

# example feature
     gene            complement(1336169..1337905)
                     /gene="cvrA"
                     /locus_tag="EC55989_1287"
                     /db_xref="GeneID:7145846"
     CDS             complement(1336169..1337905)
                     /gene="cvrA"
                     /locus_tag="EC55989_1287"
                     /function="7 : Transport and binding proteins"
                     /function="15.10 : Adaptations to atypical conditions"
                     /function="16.1 : Circulate"
                     /inference="ab initio prediction:AMIGene:2.0"
                     /note="the Vibrio parahaemolyticus gene VP2867 was found
                     to be a potassium/proton antiporter; can rapidly extrude
                     potassium against a potassium gradient at alkaline pH
                     when cloned and expressed in Escherichia coli"
                     /codon_start=1
                     /transl_table=11
                     /product="potassium/proton antiporter"
                     /protein_id="YP_002402372.1"
                     /db_xref="GI:218694705"
                     /db_xref="GeneID:7145846"
                     /translation="MDATTIISLFILGSILVTSSILLSSFSSRLGIPILVIFLAIGML
                     AGVDGVGGIPFDNYPFAYMVSNLALAIILLDGGMRTQASSFRVALGPALSLATLGVLI
                     TSGLTGMMAAWLFNLDLIEGLLIGAIVGSTDAAAVFSLLGGKGLNERVGSTLEIESGS
                     NDPMAVFLTITLIAMIQQHESSVSWMFVVDILQQFGLGIVIGLGGGYLLLQMINRIAL
                     PAGLYPLLALSGGILIFALTTALEGSGILAVYLCGFLLGNRPIRNRYGILQNFDGLAW
                     LAQIAMFLVLGLLVNPSDLLPIAIPALILSAWMIFFARPLSVFAGLLPFRGFNLRERV
                     FISWVGLRGAVPIILAVFPMMAGLENARLFFNVAFFVVLVSLLLQGTSLSWAAKKAKV
                     VVPPVGRPVSRVGLDIHPENPWEQFVYQLSADKWCVGAALRDLHMPKETRIAALFRDN
                     QLLHPTGSTRLREGDVLCVIGRERDLPALGKLFSQSPPVALDQRFFGDFILEASAKYA
                     DVALIYGLEDGREYRDKQQTLGEIVQQLLGAAPVVGDQVEFAGMIWTVAEKEDNEVLK
                     IGVRVAEEEAES"

On Thu, Sep 8, 2011 at 11:32 AM, Fields,
Christopher J < cjfields at illinois.edu> wrote: > On Sep 8, 2011, at 10:27 AM, Peter Cock wrote: > > > On Thu, Sep 8, 2011 at 3:44 PM, galeb abu-ali > wrote: > >> Hi, > >> > >> I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of > >> multiple tags within a primary tag. E.g., when there are several > 'function' > >> tag-values within a 'CDS' primary tag, I don't know how to link those > >> 'function' tag-values to a particular 'locus_tag'. > > > > Do you have GenBank features with multiple locus_tag qualifiers? > > That would be very unusual... > > > > Peter > > Agreed; in order to clarify what you mean, I think we would need to see the > record in question to get a better idea of the problem. > > chris From p.j.a.cock at googlemail.com Thu Sep 8 15:46:28 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 8 Sep 2011 16:46:28 +0100 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: Message-ID: On Thu, Sep 8, 2011 at 4:39 PM, galeb abu-ali wrote: > I guess I was not clear. 'locus_tag' qualifiers are single, but there are > mutliple 'function' qualifiers within a primary feature (e.g. 'CDS'). So are your intending to look at all the CDS features only, and build a hash using the locus_tag as the key, and a list of the 'function' qualifiers as values? Peter From abualiga2 at gmail.com Thu Sep 8 15:55:08 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 11:55:08 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: Message-ID: So are your intending to look at all the CDS features only, and build a hash using the locus_tag as the key, and a list of the 'function' qualifiers as values? Precisely! I want to create a tab delim file with 'locus_tag' as the common identifier to all the features and gene sequences. 
So far, I parsed out sequences and single instance qualifiers, but 'function' and 'db_xref' qualifiers give me grief. galeb From abualiga2 at gmail.com Thu Sep 8 16:14:07 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 12:14:07 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: I only had a quick look at your code, so maybe I'm missing something but you are currently pushing all products of all CDSs into the same array, i.e. you do not assign them to a datastructure that links a particular CDS to a list of products. You then use the same index to print out a locus from the @loci array and a product from @products, but the two will not match up because you will have more products than loci. That's right. Products are not the issue in this particular case, as it's E.coli and there's no alternate splicing as far as I know so there is a single product per gene. But there are plenty more 'function' qualifiers, for example, than loci. And I don't know how to create a data structure that will link a 'gene' (as primary tag) to all other qualifiers, whether they belong to 'CDS', 'Misc_RNA', 'Misc_feature', or other primary tags. From ss2489 at cornell.edu Thu Sep 8 16:28:40 2011 From: ss2489 at cornell.edu (Surya Saha) Date: Thu, 8 Sep 2011 12:28:40 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: You might want to explore using a hash of complex records that are very similar to structures in C/C++. 
More info at http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS -Surya On Thu, Sep 8, 2011 at 12:14 PM, galeb abu-ali wrote: > I only had a quick look at your code, so maybe I'm missing something but > you are currently pushing all products of all CDSs into the same array, > i.e. you do not assign them to a datastructure that links a particular > CDS to a list of products. You then use the same index to print out a > locus from the @loci array and a product from @products, but the two > will not match up because you will have more products than loci. > > > > That's right. Products are not the issue in this particular case, as it's > E.coli and there's no alternate splicing as far as I know so there is a > single product per gene. But there are plenty more 'function' qualifiers, > for example, than loci. And I don't know how to create a data structure > that > will link a 'gene' (as primary tag) to all other qualifiers, whether they > belong to 'CDS', 'Misc_RNA', 'Misc_feature', or other primary tags. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From fs5 at sanger.ac.uk Thu Sep 8 16:04:57 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Thu, 08 Sep 2011 17:04:57 +0100 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: Message-ID: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> I only had a quick look at your code, so maybe I'm missing something but you are currently pushing all products of all CDSs into the same array, i.e. you do not assign them to a datastructure that links a particular CDS to a list of products. You then use the same index to print out a locus from the @loci array and a product from @products, but the two will not match up because you will have more products than loci. 
Frank On Thu, 2011-09-08 at 10:44 -0400, galeb abu-ali wrote: > Hi, > > I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of > multiple tags within a primary tag. E.g., when there are several 'function' > tag-values within a 'CDS' primary tag, I don't know how to link those > 'function' tag-values to a particular 'locus_tag'. As parsed values are > returned as a list, I tried creating an array of hashes, where the hash-key > is 'locus_tag' and hash-values are multiple 'function' tags, but am failing > miserably. Pasted below is what I managed so far. At your convenience, > please advise. > > thanks! > > galeb > > #!/usr/local/bin/perl > # parse_gbk.pl > # gsa 09042011 > # script to parse out features from gbk > # > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Customizing_Sequence_Object_Construction > > use strict; use warnings; > use Bio::SeqIO; > > my @loci; > my @seqs; > my @directions; > my @start_coords; > my @end_coords; > my @genes; > my @products; > my @notes; > my @functions; > my %functions; > > my $gb_file = shift; > my $seqio_obj = Bio::SeqIO->new(-file => $gb_file ); > my $seq_obj = $seqio_obj->next_seq; > > for my $feat_obj ( $seq_obj->get_SeqFeatures ) { > if ( $feat_obj->primary_tag eq ( 'gene' ) ) { > if ($feat_obj->has_tag( 'locus_tag' ) ) { > push ( @seqs, $feat_obj->seq->seq ); #collect sequences > for my $val ( $feat_obj->get_tag_values( 'locus_tag' ) ) > { > push ( @loci, $val ); # locus_tags > } > } > if ( $feat_obj->has_tag( 'gene' ) ) { > for my $val ( $feat_obj->get_tag_values( 'gene' ) > ) { > push ( @genes, $val ); # gene names > } > } > else { > push ( @genes, "" ); # if gene names are absent, leave > empty > } > if ( $feat_obj->location->isa( 'Bio::Location::Simple' ) ) { # gene > coordinates > for my $location ( $feat_obj->location ) { > push ( @start_coords, $location->start ); > push ( @end_coords, $location->end ); > if ( $location->strand == -1 ) { > push ( @directions, "reverse" ); > } > else { > push ( 
@directions, "forward" ); > } > } > } > } > # gene products, notes, functions > if ( $feat_obj->primary_tag eq ( 'CDS' ) || $feat_obj->primary_tag eq ( > 'misc_feature' ) || $feat_obj->primary_tag eq ( 'ncRNA' ) || > $feat_obj->primary_tag eq ( 'rRNA' ) || $feat_obj->primary_tag eq ( 'tRNA' ) > || $feat_obj->primary_tag eq ( 'misc_RNA' ) ) { > if ( $feat_obj->has_tag( 'product' ) ) { > for my $product ( $feat_obj->get_tag_values( 'product' ) ) { > push ( @products, $product ); > } > } > else { > push ( @products, "" ); > } > if ( $feat_obj->has_tag( 'note' ) ) { > for my $note ( $feat_obj->get_tag_values( 'note' ) ) { > push ( @notes, $note ); > } > } > else { > push ( @notes, "" ); > } > if ( $feat_obj->has_tag( 'function' ) ) { > for my $function ( $feat_obj->get_tag_values( 'function' ) ) { > push ( @functions, $function ); > } > } > else { > push ( @functions, "" ); > } > > } > } > > print > "locus\tgene_name\tstart_nt\tend_nt\tlength_nt\tdirection\tproduct\tnote\tfunction\tsequence_nt\n"; > # header > > for ( my $elem = 0; $elem < scalar @loci; ++$elem ) { > print $loci[$elem], "\t",$genes[$elem], "\t", $start_coords[$elem], > "\t", $end_coords[$elem], "\t", length( $seqs[$elem] ), "\t", > $directions[$elem], "\t", $products[$elem], "\t", $notes[$elem], "\t", > $functions[$elem], "\t", $seqs[$elem], "\n"; > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
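
Frank's point about the parallel arrays drifting out of step can be illustrated with one record per locus_tag instead. What follows is only a minimal sketch under stated assumptions, not code posted in the thread: the @features list is a hypothetical stand-in for what $feat->primary_tag and $feat->get_tag_values(...) would return (values copied from the EC55989_1287 record posted above), and %by_locus and row_for are illustrative names, not BioPerl API. It uses core Perl only, so it runs without BioPerl installed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in records. In a real script each entry would be a
# Bio::SeqFeatureI object and the values would come from get_tag_values().
my @features = (
    {
        primary_tag => 'gene',
        tags        => { locus_tag => ['EC55989_1287'], gene => ['cvrA'] },
    },
    {
        primary_tag => 'CDS',
        tags        => {
            locus_tag => ['EC55989_1287'],
            product   => ['potassium/proton antiporter'],
            function  => [
                '7 : Transport and binding proteins',
                '15.10 : Adaptations to atypical conditions',
                '16.1 : Circulate',
            ],
        },
    },
);

# One record per locus_tag. Every qualifier keeps its own list of values,
# so three 'function' values for one locus cannot shift the other columns.
my %by_locus;
for my $feat (@features) {
    my ($locus) = @{ $feat->{tags}{locus_tag} || [] };
    next unless defined $locus;    # skip features without a locus_tag
    for my $tag (qw(gene product function)) {
        push @{ $by_locus{$locus}{$tag} }, @{ $feat->{tags}{$tag} || [] };
    }
}

# Build one tab-delimited row for a locus; multiple values are joined with ';'.
sub row_for {
    my ($locus) = @_;
    my $rec  = $by_locus{$locus};
    my @cols = map { my $vals = $rec->{$_} || []; join ';', @$vals }
               qw(gene product function);
    return join "\t", $locus, @cols;
}

print row_for($_), "\n" for sort keys %by_locus;
```

Because the 'gene' and 'CDS' features share a locus_tag, their qualifiers end up in the same record, which is the linkage the original arrays could not provide. Sequence and coordinates could presumably be stored in the same per-locus record in the same way.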
From cjfields at illinois.edu Thu Sep 8 16:51:22 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 16:51:22 +0000
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

There is no need to do that if one is using the Bio::SeqFeatureI interface.
Note that get_tag_values always returns a list, so to snag a single value for
a tag in a scalar, force list context on the LHS by enclosing the variable
in ().

chris

-----------------------------
#!/usr/bin/env perl

use Modern::Perl;
use Bio::SeqIO;

my $in = Bio::SeqIO->new(-format => 'genbank',
                         -file   => shift);

while (my $seq = $in->next_seq) {
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'CDS';
        my ($locus) = $feat->has_tag('locus_tag') ?
            $feat->get_tag_values('locus_tag') : '';
        my @funcs = $feat->has_tag('function') ?
            $feat->get_tag_values('function') : ();
        say join("\t", $locus, join(',', @funcs));
    }
}

On Sep 8, 2011, at 11:28 AM, Surya Saha wrote:

> You might want to explore using a hash of complex records that are very
> similar to structures in C/C++. More info at
> http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS
>
> -Surya
>
> On Thu, Sep 8, 2011 at 12:14 PM, galeb abu-ali wrote:
>
>> I only had a quick look at your code, so maybe I'm missing something but
>> you are currently pushing all products of all CDSs into the same array,
>> i.e. you do not assign them to a datastructure that links a particular
>> CDS to a list of products. You then use the same index to print out a
>> locus from the @loci array and a product from @products, but the two
>> will not match up because you will have more products than loci.
>>
>>
>>
>> That's right. Products are not the issue in this particular case, as it's
>> E.coli and there's no alternate splicing as far as I know so there is a
>> single product per gene.
But there are plenty more 'function' qualifiers, >> for example, than loci. And I don't know how to create a data structure >> that >> will link a 'gene' (as primary tag) to all other qualifiers, whether they >> belong to 'CDS', 'Misc_RNA', 'Misc_feature', or other primary tags. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From abualiga2 at gmail.com Thu Sep 8 16:51:42 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 12:51:42 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: You might want to explore using a hash of complex records that are very similar to structures in C/C++. More info at http://perldoc.perl.org/perldsc.html#Declaration-of-a-HASH-OF-COMPLEX-RECORDS alright, thanks! From jskittrell at unmc.edu Thu Sep 8 16:40:31 2011 From: jskittrell at unmc.edu (Jeff S Kittrell) Date: Thu, 8 Sep 2011 11:40:31 -0500 Subject: [Bioperl-l] Error when parsing a blast file Message-ID: An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Thu Sep 8 17:28:53 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 8 Sep 2011 17:28:53 +0000 Subject: [Bioperl-l] Error when parsing a blast file In-Reply-To: References: Message-ID: <1F3664C5-6D6B-409C-BE5A-C5EB08975231@illinois.edu> What version of bioperl are you using? I think this issue was addressed a while ago, but it's possible there has been a regression. chris On Sep 8, 2011, at 11:40 AM, Jeff S Kittrell wrote: > Hello Gentlemen, > > I am using BioPerl to a parse a blast output file but have run into some difficulties. 
I've pin pointed the problem and have pasted an example below. If you look at query position 223-224 you will see a large insertion 65ish nucleotides. Since the insertion spans the entire line there are no nucleotide position numbers at the end or beginning of the line nor any nucleotides within the line (dashes only). > When the SearchIO parser encounters this record it dies with the error > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: no data for midline Query ------------------------------------------------------------ > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:368 > STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl5/Bio/SearchIO/blast.pm:1805 > STACK: BlastParseNucleotideForDBTopHitCONTIGSQUERY.pl:24 > ----------------------------------------------------------- > > > Has anyone encountered this problem before? Am I doing something wrong? > > Thanks > > Jeff Kittrell > Department of Genetics, Cell Biology & Anatomy > University of Nebraska Medical Center > 985805 Nebraska Medical Center > Omaha, NE 68198-5805 > > Query= 78065535 > > Length=523 > Score E > Sequences producing significant alignments: (Bits) Value > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled... 
576 1e-163 > > > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled > receptor 123 (GPR123), mRNA > Length=4298 > > Score = 576 bits (638), Expect = 1e-163 > Identities = 466/583 (80%), Gaps = 82/583 (14%) > Strand=Plus/Minus > > Query 1 CAGGACTCCGTGG-----ATGGCATCTCGGGCAGGGCCACGCTGGGGTCTGGGTGGGTCC 55 > ||||||||||||| | ||||||||||||||||||| |||||||||| |||||||| > Sbjct 2537 CAGGACTCCGTGGGCAGCAGGGCATCTCGGGCAGGGCCATGCTGGGGTCTCAGTGGGTCC 2478 > > Query 56 TTTGATGGAAGCCCCTGCTCTGCCTCTGGGGCGCCCCAGGACTGGAGGCCACAGGACAGA 115 > |||||||||| |||||||||||||||| ||| ||||||||||||||| |||||||||||| > Sbjct 2477 TTTGATGGAATCCCCTGCTCTGCCTCTAGGGTGCCCCAGGACTGGAGACCACAGGACAGA 2418 > > Query 116 AACCAGATGACCTTGTGCAGGGACGAGCACGTGGAACTGGGATAAAAGGAGTGGGCGTGG 175 > |||| ||||||| ||||| ||||| |||||| |||| |||||||| ||||||||||||| > Sbjct 2417 AACCGGATGACCGTGTGC-GGGACCAGCACGCGGAATTGGGATAAGGGGAGTGGGCGTGG 2359 > > Query 176 CCCAGAGCTTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGT------------ 223 > ||| |||| ||||||||||||||||||||||||||||||||||||||| > Sbjct 2358 CCCGGAGCGTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGTGTGATCACAAGG 2299 > > Query ------------------------------------------------------------ > > Sbjct 2298 AAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGG 2239 > > Query 224 ---GTGAACTGCTTCCGAAAGGTGGGGTCACTTTGGTGCCCCCAGTGACCTCATGTGGCA 280 > |||||| ||||| |||||| |||||||||| ||| |||||||||||||||||||||| > Sbjct 2238 GGTGTGAACGGCTTCTGAAAGGCGGGGTCACTTCGGTACCCCCAGTGACCTCATGTGGCA 2179 > > Query 281 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACTGTGTCCCCTG-CTCCGCC 339 > ||||||||||||||||||||||||||||||||||||||||| |||||| ||| | || | > Sbjct 2178 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACCGTGTCCTCTGCCCCCATC 2119 > > Query 340 TACACAGTAGTTTCATTTTTCCAGGGTCCTGTTCGGATGTTGCCGGTCCCATCGGTGCCA 399 > |||||||||||||| |||||||||||||| |||||||||||||||||||| ||||||||| > Sbjct 2118 TACACAGTAGTTTCGTTTTTCCAGGGTCCCGTTCGGATGTTGCCGGTCCCGTCGGTGCCA 2059 > > Query 400 AACGGCAGGTCTTCTAGCAAGTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 459 > 
||||||||| |||||||||| ||||||||||||||||||||||||||||||||||||||| > Sbjct 2058 AACGGCAGGCCTTCTAGCAATTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 1999 > > Query 460 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAGGTGACCAGGCC 502 > ||||||||||||||||||||||||||||||| ||| || |||| > Sbjct 1998 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAAGTGGCCGGGCC 1956 > > > > Lambda K H > 0.634 0.408 0.912 > > Gapped > Lambda K H > 0.625 0.410 0.780 > > Effective search space used: 47712920310 > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From abualiga2 at gmail.com Thu Sep 8 17:51:34 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 13:51:34 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: thanks, Chris! works perfect. To make sure I understand what's going on, forcing list context on $locus allows me to get one value at a time, which is then concatenated with \t to concatenated functions. thanks again! galeb On Thu, Sep 8, 2011 at 12:51 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > There is no need to do that if one is using the Bio::SeqFeatureI interface. > Note that get_tag_values always returns a list, so to snag a single value > for a tag in a scalar, force list context on the LHS by enclosing the > variable in (). > > chris > > ----------------------------- > #!/usr/bin/env perl > > use Modern::Perl; > use Bio::SeqIO; > > my $in = Bio::SeqIO->new(-format => 'genbank', > -file => shift); > > while (my $seq = $in->next_seq) { > for my $feat ($seq->get_SeqFeatures) { > next unless $feat->primary_tag eq 'CDS'; > my ($locus) = $feat->has_tag('locus_tag') ? > $feat->get_tag_values('locus_tag') : ''; > my @funcs = $feat->has_tag('function') ? 
> $feat->get_tag_values('function') : ();
> say join("\t", $locus, join(',', @funcs));
> }
> }
>
>
>

From cjfields at illinois.edu Thu Sep 8 18:27:06 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 18:27:06 +0000
Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag
In-Reply-To:
References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID: <2CDD8237-140E-410B-A18C-68FAF54719D5@illinois.edu>

On Sep 8, 2011, at 12:51 PM, galeb abu-ali wrote:

> thanks, Chris! works perfect.
> To make sure I understand what's going on, forcing list context on $locus allows me to get one value at a time,...

You have to be careful in this circumstance; doing this:

    my $foo = @bar;

is scalar context on a list, which returns the number of elements in @bar.
The following

    my ($foo) = @bar;

forces list context and assigns the first value in @bar to $foo but tosses
the rest. If you are sure there is only one value in @bar anyway, the above
is fine (and is a common perl idiom).

> which is then concatenated with \t to concatenated functions.

I'm just using a simple join() to print off the results. Note the second
element in the join list is an embedded join() with comma-sep values for
functions.

chris

> thanks again!
>
> galeb
>
> On Thu, Sep 8, 2011 at 12:51 PM, Fields, Christopher J wrote:
> There is no need to do that if one is using the Bio::SeqFeatureI interface. Note that get_tag_values always returns a list, so to snag a single value for a tag in a scalar, force list context on the LHS by enclosing the variable in ().
>
> chris
>
> -----------------------------
> #!/usr/bin/env perl
>
> use Modern::Perl;
> use Bio::SeqIO;
>
> my $in = Bio::SeqIO->new(-format => 'genbank',
> -file => shift);
>
> while (my $seq = $in->next_seq) {
> for my $feat ($seq->get_SeqFeatures) {
> next unless $feat->primary_tag eq 'CDS';
> my ($locus) = $feat->has_tag('locus_tag') ?
> $feat->get_tag_values('locus_tag') : '';
> my @funcs = $feat->has_tag('function') ?
> $feat->get_tag_values('function') : ();
> say join("\t", $locus, join(',', @funcs));
> }
> }
>
>

From cjfields at illinois.edu Thu Sep 8 18:30:06 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Sep 2011 18:30:06 +0000
Subject: [Bioperl-l] Error when parsing a blast file
In-Reply-To:
References: <1F3664C5-6D6B-409C-BE5A-C5EB08975231@illinois.edu>
Message-ID:

Try updating to the latest CPAN release (1.6.901, which is the pre-1.7
release).

chris

On Sep 8, 2011, at 1:19 PM, Jeff S Kittrell wrote:

> chris,
>
> I am using version 1.6.1
>
> Thanks,
>
>
> Jeff Kittrell
> Department of Genetics, Cell Biology & Anatomy
> University of Nebraska Medical Center
> 985805 Nebraska Medical Center
> Omaha, NE 68198-5805
>
> "Fields, Christopher J" ---09/08/2011 12:28:56 PM---What version of bioperl are you using? I think this issue was addressed a while ago, but it's possi
>
>
> From:
>
> "Fields, Christopher J"
>
> To:
>
> Jeff S Kittrell
>
> Cc:
>
> " "
>
> Date:
>
> 09/08/2011 12:28 PM
>
> Subject:
>
> Re: [Bioperl-l] Error when parsing a blast file
>
>
>
> What version of bioperl are you using? I think this issue was addressed a while ago, but it's possible there has been a regression.
>
> chris
>
> On Sep 8, 2011, at 11:40 AM, Jeff S Kittrell wrote:
>
> > Hello Gentlemen,
> >
> > I am using BioPerl to a parse a blast output file but have run into some difficulties. I've pin pointed the problem and have pasted an example below. If you look at query position 223-224 you will see a large insertion 65ish nucleotides. Since the insertion spans the entire line there are no nucleotide position numbers at the end or beginning of the line nor any nucleotides within the line (dashes only).
> > When the SearchIO parser encounters this record it dies with the error > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: no data for midline Query ------------------------------------------------------------ > > STACK: Error::throw > > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:368 > > STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl5/Bio/SearchIO/blast.pm:1805 > > STACK: BlastParseNucleotideForDBTopHitCONTIGSQUERY.pl:24 > > ----------------------------------------------------------- > > > > > > Has anyone encountered this problem before? Am I doing something wrong? > > > > Thanks > > > > Jeff Kittrell > > Department of Genetics, Cell Biology & Anatomy > > University of Nebraska Medical Center > > 985805 Nebraska Medical Center > > Omaha, NE 68198-5805 > > > > Query= 78065535 > > > > Length=523 > > Score E > > Sequences producing significant alignments: (Bits) Value > > > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled... 
576 1e-163 > > > > > > > gi|144922664|ref|NM_001083909.1| Homo sapiens G protein-coupled > > receptor 123 (GPR123), mRNA > > Length=4298 > > > > Score = 576 bits (638), Expect = 1e-163 > > Identities = 466/583 (80%), Gaps = 82/583 (14%) > > Strand=Plus/Minus > > > > Query 1 CAGGACTCCGTGG-----ATGGCATCTCGGGCAGGGCCACGCTGGGGTCTGGGTGGGTCC 55 > > ||||||||||||| | ||||||||||||||||||| |||||||||| |||||||| > > Sbjct 2537 CAGGACTCCGTGGGCAGCAGGGCATCTCGGGCAGGGCCATGCTGGGGTCTCAGTGGGTCC 2478 > > > > Query 56 TTTGATGGAAGCCCCTGCTCTGCCTCTGGGGCGCCCCAGGACTGGAGGCCACAGGACAGA 115 > > |||||||||| |||||||||||||||| ||| ||||||||||||||| |||||||||||| > > Sbjct 2477 TTTGATGGAATCCCCTGCTCTGCCTCTAGGGTGCCCCAGGACTGGAGACCACAGGACAGA 2418 > > > > Query 116 AACCAGATGACCTTGTGCAGGGACGAGCACGTGGAACTGGGATAAAAGGAGTGGGCGTGG 175 > > |||| ||||||| ||||| ||||| |||||| |||| |||||||| ||||||||||||| > > Sbjct 2417 AACCGGATGACCGTGTGC-GGGACCAGCACGCGGAATTGGGATAAGGGGAGTGGGCGTGG 2359 > > > > Query 176 CCCAGAGCTTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGT------------ 223 > > ||| |||| ||||||||||||||||||||||||||||||||||||||| > > Sbjct 2358 CCCGGAGCGTTTCCCCGCTGAGGTCTTTCACAAGGAAGGGGCAGGGGTGTGATCACAAGG 2299 > > > > Query ------------------------------------------------------------ > > > > Sbjct 2298 AAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGGGGTGTGATCACAAGGAAGGGGCAGG 2239 > > > > Query 224 ---GTGAACTGCTTCCGAAAGGTGGGGTCACTTTGGTGCCCCCAGTGACCTCATGTGGCA 280 > > |||||| ||||| |||||| |||||||||| ||| |||||||||||||||||||||| > > Sbjct 2238 GGTGTGAACGGCTTCTGAAAGGCGGGGTCACTTCGGTACCCCCAGTGACCTCATGTGGCA 2179 > > > > Query 281 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACTGTGTCCCCTG-CTCCGCC 339 > > ||||||||||||||||||||||||||||||||||||||||| |||||| ||| | || | > > Sbjct 2178 GATGGGCCCCCCACTCTGCTCTGAAGCTCCTCCAGGAACACCGTGTCCTCTGCCCCCATC 2119 > > > > Query 340 TACACAGTAGTTTCATTTTTCCAGGGTCCTGTTCGGATGTTGCCGGTCCCATCGGTGCCA 399 > > |||||||||||||| |||||||||||||| |||||||||||||||||||| ||||||||| > > Sbjct 2118 TACACAGTAGTTTCGTTTTTCCAGGGTCCCGTTCGGATGTTGCCGGTCCCGTCGGTGCCA 2059 > > 
> > Query 400 AACGGCAGGTCTTCTAGCAAGTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 459 > > ||||||||| |||||||||| ||||||||||||||||||||||||||||||||||||||| > > Sbjct 2058 AACGGCAGGCCTTCTAGCAATTTACCCTTGGGCAGCCCGTTCTGGCTGGGGCCACCAAAG 1999 > > > > Query 460 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAGGTGACCAGGCC 502 > > ||||||||||||||||||||||||||||||| ||| || |||| > > Sbjct 1998 GGCAGGGACTGTGTCCTCCGCAGCATCTCCAAGTGGCCGGGCC 1956 > > > > > > > > Lambda K H > > 0.634 0.408 0.912 > > > > Gapped > > Lambda K H > > 0.625 0.410 0.780 > > > > Effective search space used: 47712920310 > > > > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From abualiga2 at gmail.com Thu Sep 8 18:34:41 2011 From: abualiga2 at gmail.com (galeb abu-ali) Date: Thu, 8 Sep 2011 14:34:41 -0400 Subject: [Bioperl-l] genbank parsing of multiple 'function' tags within primary tag In-Reply-To: <2CDD8237-140E-410B-A18C-68FAF54719D5@illinois.edu> References: <1315497897.3797.443.camel@deskpro15336.internal.sanger.ac.uk> <2CDD8237-140E-410B-A18C-68FAF54719D5@illinois.edu> Message-ID: many thanks again, Chris! I was reading Programming Perl, but this sums it up better. On Thu, Sep 8, 2011 at 2:27 PM, Fields, Christopher J wrote: > On Sep 8, 2011, at 12:51 PM, galeb abu-ali wrote: > > > thanks, Chris! works perfect. > > To make sure I understand what's going on, forcing list context on $locus > allows me to get one value at a time,... > > You have to be careful in this circumstance; doing this: > > my $foo = @bar; > > is scalar context on a list, which returns the number of elements in @bar. > The following > > my ($foo) = @bar; > > forces list context and assigns the first value in @bar to $foo but tosses > the rest. If you are sure there is only one value in @bar anyway, the above > is fine (and is a common perl idiom). > > > which is then concatenated with \t to concatenated functions. 
> > I'm just using a simple join() to print off the results. Note the second > element in the join list is an embedded join() with comma-sep values for > functions. > > chris > > > thanks again! > > > > galeb > > > > On Thu, Sep 8, 2011 at 12:51 PM, Fields, Christopher J < > cjfields at illinois.edu> wrote: > > There is no need to do that if one is using the Bio::SeqFeatureI > interface. Note that get_tag_values always returns a list, so to snag a > single value for a tag in a scalar, force list context on the LHS by > enclosing the variable in (). > > > > chris > > > > ----------------------------- > > #!/usr/bin/env perl > > > > use Modern::Perl; > > use Bio::SeqIO; > > > > my $in = Bio::SeqIO->new(-format => 'genbank', > > -file => shift); > > > > while (my $seq = $in->next_seq) { > > for my $feat ($seq->get_SeqFeatures) { > > next unless $feat->primary_tag eq 'CDS'; > > my ($locus) = $feat->has_tag('locus_tag') ? > > $feat->get_tag_values('locus_tag') : ''; > > my @funcs = $feat->has_tag('function') ? > > $feat->get_tag_values('function') : (); > > say join("\t", $locus, join(',', @funcs)); > > } > > } > > > > > > > > From David.Messina at sbc.su.se Fri Sep 9 09:40:25 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 9 Sep 2011 11:40:25 +0200 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: Hi Manju, But this is not showing all query coverage as it shows in simple BLAST.(see > attached file) > I'm not sure what you mean by query coverage here, as the blast report you attached doesn't (as far as I can see) include a calculation of the number or percentage of query bases covered. But in any case, everything in that blast report is available in the Bio::SearchIO object that B::T::R::RemoteBlast returns. Have you taken a look at http://www.bioperl.org/wiki/HOWTO:SearchIO ? 
That, along with the module documentation, should help you find the parts of the BLAST report you're looking for. > and i also want to write that result in a blast file..Is there any method > which can write the remoteblast output > in a file with blast extension? > It is possible to write out the results in a format that closely resembles the native blast report, but it's not recommended. If you want to just run BLAST and get back a report, there's no need to use BioPerl to parse the report first and then recreate the report. This might also be a good time to mention that, if you're doing more than a few hundred BLAST searches, you'll find it much more efficient to download the database and the BLAST program from NCBI and run them on your own computer. NCBI severely limits the speed and frequency of remote BLASTs, and remote searches are also much more prone to failure. Also, if you're using BLAST+, you can run your BLASTs on NCBI's computers remotely without BioPerl. Check out the -remote command-line option; it's my favorite new feature! Dave From David.Messina at sbc.su.se Fri Sep 9 10:53:01 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 9 Sep 2011 12:53:01 +0200 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: If you don't want to learn how to do this in BioPerl, then take my previous suggestion and just use NCBI's tools: Also, if you're using BLAST+, you can run your BLASTs on NCBI's computers > remotely without BioPerl. Check out the -remote command-line option On Fri, Sep 9, 2011 at 12:07, Manju Rawat wrote: > I don't know much about BioPerl.... > and I just want to BLAST my sequences using BioPerl... > and see the result in a file... > pls tell me what I should do??? 
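If the searches are run with BLAST+ directly, as suggested above, one way to list the query sequences that returned no hits is to compare the IDs in the query FASTA file against the first column of tabular BLAST output. The following is only a minimal sketch in plain Perl, not something posted in this thread. It assumes the search was run with the tabular output option (-outfmt 6), that the query ID in column 1 equals the first word of each FASTA header, and that the file names are placeholders; the sub names are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Collect sequence IDs from FASTA header lines (first word after '>').
sub fasta_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        push @ids, $1 if $line =~ /^>(\S+)/;
    }
    return @ids;
}

# Collect query IDs that have at least one row in tabular BLAST output.
# Assumption: column 1 is the query ID, as in BLAST+ -outfmt 6.
sub queries_with_hits {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;   # skip comments and blank lines
        my ($query) = split /\t/, $line;          # list context keeps the first column
        $seen{$query} = 1;
    }
    return \%seen;
}

# Return the IDs that never appear as a query in the hit table, in input order.
sub queries_without_hits {
    my ($ids, $hit) = @_;
    return grep { !$hit->{$_} } @$ids;
}

# Only run when both file names are given on the command line.
if (@ARGV == 2) {
    my ($fasta, $table) = @ARGV;
    open my $fa, '<', $fasta or die "Cannot open $fasta: $!";
    open my $tb, '<', $table or die "Cannot open $table: $!";
    my @ids = fasta_ids($fa);
    my $hit = queries_with_hits($tb);
    print "$_\n" for queries_without_hits(\@ids, $hit);
}
```

Saved under a hypothetical name such as no_hits.pl, it would be called as "perl no_hits.pl seqs.fasta hits.tsv" and print one ID per line. The same idea works inside BioPerl by checking each Bio::SearchIO result for zero hits, but the plain-text version avoids parsing the full report.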
> From manju.rawat2 at gmail.com Fri Sep 9 11:05:57 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Fri, 9 Sep 2011 07:05:57 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: I want to learn...and I am learning it from the start... My main query is I want to make a program which gives me those results (sequences) which have no blast result (no matches in any database or in a particular database). For this I have to run blast many times....but I am not getting the desired result in blast...this is the main problem which I am facing.. now pls tell me what procedure I should follow... Manju From cjfields at illinois.edu Fri Sep 9 13:03:26 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 9 Sep 2011 13:03:26 +0000 Subject: [Bioperl-l] blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: If you are planning on looking against 'everything' (e.g. nt or nr), and you have many sequences to run, I would follow Dave's suggestion and download BLAST locally. chris On Sep 9, 2011, at 6:05 AM, Manju Rawat wrote: > I want to learn...and I am learning it from the start... > My main query is I want to make a program which gives me those > results (sequences) which have no blast result (no matches in any database or > in a particular database). > For this I have to run blast many times....but I am not getting the desired result > in blast...this is the main problem which I am facing.. > now pls tell me what procedure I should follow... 
> > > Manju > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From manju.rawat2 at gmail.com Fri Sep 9 09:01:55 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Fri, 9 Sep 2011 05:01:55 -0400 Subject: [Bioperl-l] Fwd: blast result not matching. In-Reply-To: References: <4AF09F30-CB69-4285-90A2-40AFEEEBD222@illinois.edu> <8FB3525D-3B2B-4D17-80D4-AA85B3D2AE8B@gmail.com> Message-ID: Thanks to all..It's working.. I tried that module...and got the following result in the terminal... waiting......db is All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) hit name is ref|NM_181451.1| score is 240 hit name is ref|NM_001008415.1| score is 234 hit name is ref|XM_002706247.1| score is 212 hit name is ref|XM_002683856.1| score is 208 hit name is gb|EF197120.1| score is 208 hit name is ref|XR_083566.1| score is 198 hit name is ref|NM_001097567.1| score is 198 hit name is ref|NM_001098089.1| score is 198 hit name is ref|XM_002699708.1| score is 192 hit name is ref|XM_592786.5| score is 192 hit name is ref|XM_001251693.3| score is 192 hit name is gb|AF490400.1| score is 190 hit name is gb|AY075103.1| score is 190 hit name is ref|XR_083457.1| score is 178 But this is not showing all query coverage as it shows in simple BLAST.(see attached file) and I also want to write that result in a blast file..Is there any method which can write the remoteblast output in a file with blast extension? Thanks Manju Rawat. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: res.blast Type: application/octet-stream Size: 218976 bytes Desc: not available URL: From ross at cuhk.edu.hk Sat Sep 10 06:39:23 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Sat, 10 Sep 2011 14:39:23 +0800 Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file Message-ID: <048a01cc6f84$60c41090$224c31b0$@edu.hk> I use the following code to derive the distance between two nodes but an error "MSG: could not find the lca of supplied nodes; can't find distance either" What's the problem? use Bio::TreeIO; ($treefh) = @ARGV; my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick"); my $tree = $treeio->next_tree; $keyword="Mycobacterium_tuberculosis_H37Rv"; my $Tnodes = $tree->find_node(-id => $keyword); $keyword="Mycobacterium_smegmatis_str._MC2_155"; my $Mnodes = $tree->find_node(-id => $keyword); $keyword="Mycobacterium_abscessus"; my $Anodes = $tree->find_node(-id => $keyword); my @root = $tree->get_root_node; #my $distances = $tree->distance(-nodes => [$node[0],$root]); my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]); print "Dist:$distances\n"; #### the following is the infile (((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. 
6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1. 4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC 
_35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. 
_ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; From greg at ebi.ac.uk Sat Sep 10 15:39:52 2011 From: greg at ebi.ac.uk (Gregory Jordan) Date: Sat, 10 Sep 2011 16:39:52 +0100 Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file In-Reply-To: <048a01cc6f84$60c41090$224c31b0$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> Message-ID: Hi Ross, Which version of BioPerl are you using? 
With the refactored tree code (available from the tree_api_refresh branch on the BioPerl github repo: https://github.com/bioperl/bioperl-live/blob/topic/tree_api_refresh/Bio/Tree/NodeFunctionsI.pm#L406) the following script works for me. Do those values look sensible to you? The code on the new branch is a bit experimental, so I wouldn't be surprised if all the edge cases for calculations like this aren't covered. --greg use Bio::TreeIO; my $treeio = new Bio::TreeIO(-file => 'temp.nh', -format => "newick"); my $tree = $treeio->next_tree; my $mt = $tree->find("Mycobacterium_tuberculosis_H37Rv"); my $ms = $tree->find("Mycobacterium_smegmatis_str._MC2_155"); my $ma = $tree->find("Mycobacterium_abscessus"); my $distance = $mt->distance($ma); print "MT - MA: ".$mt->distance($ma)."\n"; print "MT - MS: ".$mt->distance($ms)."\n"; print "MS - MA: ".$ms->distance($ma)."\n"; # MT - MA: 0.24326 # MT - MS: 0.18573 # MS - MA: 0.20729 --greg On Sat, Sep 10, 2011 at 7:39 AM, Ross KK Leung wrote: > I use the following code to derive the distance between two nodes but an > error "MSG: could not find the lca of supplied nodes; can't find distance > either" > > > > What's the problem? 
> > > > use Bio::TreeIO; > > > > ($treefh) = @ARGV; > > > > my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick"); > > my $tree = $treeio->next_tree; > > > > $keyword="Mycobacterium_tuberculosis_H37Rv"; > > my $Tnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_smegmatis_str._MC2_155"; > > my $Mnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_abscessus"; > > my $Anodes = $tree->find_node(-id => $keyword); > > > > my @root = $tree->get_root_node; > > #my $distances = $tree->distance(-nodes => [$node[0],$root]); > > > > my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]); > > print "Dist:$distances\n"; > > > > > > #### the following is the infile > > > (((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM > > u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M > > ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: > > 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac > > terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM > > u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu > > berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac > > terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 > > -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium > > _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. > > 6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis > > _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1. 
> > 4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 > > .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac > > terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My > > cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc > > occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e > > rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo > > coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D > > ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne > > bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 > > 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r > > esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory > > nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. > > 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory > > nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu > > m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, > > ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii > > _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 > > 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot > > uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
> > 2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 > > 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: > > 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol > > ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 > > 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 > > :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st > > riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu > > m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. > > 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 > > )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N > > RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi > > ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ > > sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( > > Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 > > 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are > > nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr > > omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 > > 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 > > 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 > > 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 > > .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. > > 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 > > )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea > > e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
> > 02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ > > Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT > > CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ > > SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte > > r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 > > 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac > > eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ > > actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane > > nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) > > 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph > > ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC > > C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 > > 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 > > 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo > > ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ > > sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen > > anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) > > 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line > > ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: > > 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta > > xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 > > 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ > > taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco > > sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 > > 
8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC > > _35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 > > .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 > > 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 > > 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom > > yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s > > tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od > > ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 > > 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce > > llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 > > 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ > > 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. > > 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ > > DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces > > _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 > > 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep > > tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 > > 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc > > es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida > > ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s > > viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 > > .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 > > :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S > > treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 > > 
63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 > > :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. > > _ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept > > omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 > > E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. > > 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci > > dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass > > onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo > > monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 > > 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 > > 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra > > nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 > > )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 > > 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. > > 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos > > us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro > > metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ > > neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 > > 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu > > m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 > > )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte > > rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
> > 03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC > > C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact > > erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 > > 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 > > 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube > > rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium > > _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu > > m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 > > 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- > > 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 > > :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 > > E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis > > _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t > > uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc > > ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri > > um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium > > _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba > > cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K > > ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E > -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ross at cuhk.edu.hk Sat Sep 10 23:06:44 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Sun, 11 Sep 2011 07:06:44 +0800 Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file In-Reply-To: References: 
<048a01cc6f84$60c41090$224c31b0$@edu.hk> Message-ID: <04a601cc700e$4f051d10$ed0f5730$@edu.hk> Hi Greg, The values are correct! However, how do I install this bioperl-live module? My BioPerl is 1.6.1, but there's an error: Can't locate object method "find" via package "Bio::Tree::Tree" at TreeCalDist.pl line 32, line 1. my $mt = $tree->find($keyword); #line 32 From: gjuggler at gmail.com [mailto:gjuggler at gmail.com] On Behalf Of Gregory Jordan Sent: 2011-09-10 23:40 To: bioperl-l List; Ross KK Leung Subject: Re: [Bioperl-l] fail to obtain node-to-node distance from a newick file Hi Ross, Which version of BioPerl are you using? With the refactored tree code (available from the tree_api_refresh branch on the BioPerl github repo: https://github.com/bioperl/bioperl-live/blob/topic/tree_api_refresh/Bio/Tree/NodeFunctionsI.pm#L406) the following script works for me. Do those values look sensible to you? The code on the new branch is a bit experimental, so I wouldn't be surprised if all the edge cases for calculations like this aren't covered. --greg use Bio::TreeIO; my $treeio = new Bio::TreeIO(-file => 'temp.nh', -format => "newick"); my $tree = $treeio->next_tree; my $mt = $tree->find("Mycobacterium_tuberculosis_H37Rv"); my $ms = $tree->find("Mycobacterium_smegmatis_str._MC2_155"); my $ma = $tree->find("Mycobacterium_abscessus"); my $distance = $mt->distance($ma); print "MT - MA: ".$mt->distance($ma)."\n"; print "MT - MS: ".$mt->distance($ms)."\n"; print "MS - MA: ".$ms->distance($ma)."\n"; # MT - MA: 0.24326 # MT - MS: 0.18573 # MS - MA: 0.20729 --greg On Sat, Sep 10, 2011 at 7:39 AM, Ross KK Leung wrote: I use the following code to derive the distance between two nodes but an error "MSG: could not find the lca of supplied nodes; can't find distance either" What's the problem? 
use Bio::TreeIO; ($treefh) = @ARGV; my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick"); my $tree = $treeio->next_tree; $keyword="Mycobacterium_tuberculosis_H37Rv"; my $Tnodes = $tree->find_node(-id => $keyword); $keyword="Mycobacterium_smegmatis_str._MC2_155"; my $Mnodes = $tree->find_node(-id => $keyword); $keyword="Mycobacterium_abscessus"; my $Anodes = $tree->find_node(-id => $keyword); my @root = $tree->get_root_node; #my $distances = $tree->distance(-nodes => [$node[0],$root]); my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]); print "Dist:$distances\n"; #### the following is the infile (((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. 6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1. 
4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC 
_35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. 
_ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From manju.rawat2 at gmail.com Mon Sep 12 05:37:35 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Mon, 12 Sep 2011 01:37:35 -0400 Subject: [Bioperl-l] no blast result Message-ID: Hello, I want to make a program which first generate the random sequence and then gives me that result(sequence) which have no blast result(no 
matches in any database or in a particular database). Is there anybody who can help me do this? Please reply if anybody knows about it. Thanks, Manju From zhangchnxp at gmail.com Mon Sep 12 05:51:59 2011 From: zhangchnxp at gmail.com (Zhang chn) Date: Mon, 12 Sep 2011 13:51:59 +0800 Subject: [Bioperl-l] no blast result In-Reply-To: References: Message-ID: Hi, IMHO, due to the nature of BLAST, it is usually impossible to get no results at all from a random sequence; instead, you get a set of matches with lower scores. What you can do is focus on the E-value, say, by setting a threshold on it. FYI, http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html On Mon, Sep 12, 2011 at 1:37 PM, Manju Rawat wrote: > Hello, > I want to make a program which first generate the random sequence and then > gives me that result(sequence) which have no blast result(no matches in any > database or in a particular database). Is there anybody who can help me do > this? > > Please reply if anybody knows about it. > Thanks > Manju From manju.rawat2 at gmail.com Mon Sep 12 05:58:38 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Mon, 12 Sep 2011 01:58:38 -0400 Subject: [Bioperl-l] no blast result In-Reply-To: References: Message-ID: Yes, I know this. It would also be useful for me if I get results with lower scores. But how could I do this? Manju From zhangchnxp at gmail.com Mon Sep 12 06:04:17 2011 From: zhangchnxp at gmail.com (Zhang chn) Date: Mon, 12 Sep 2011 14:04:17 +0800 Subject: [Bioperl-l] no blast result In-Reply-To: References: Message-ID: Please read the documentation for Bio::Tools::Run::StandAloneBlast and Bio::AlignIO. On Mon, Sep 12, 2011 at 1:58 PM, Manju Rawat wrote: > Yes, I know this. It would also be useful for me if I get results with lower > scores. > But how could I do this? 
> > > Manju > From manju.rawat2 at gmail.com Mon Sep 12 11:12:40 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Mon, 12 Sep 2011 07:12:40 -0400 Subject: [Bioperl-l] no blast result In-Reply-To: References: Message-ID: I read this, but the default program is not running properly. It shows many errors, such as: MSG: cannot find path to blastall.. Use of uninitialized value $_[0] in join or string at /usr/share/perl/5.10/File/Spec/Unix.pm line 41. And this is not showing the output that I want. Please help me. Manju Rawat From arguelloj at gmail.com Mon Sep 12 02:52:42 2011 From: arguelloj at gmail.com (J. Fernando Arguello) Date: Sun, 11 Sep 2011 19:52:42 -0700 Subject: [Bioperl-l] BioPerl - quick general question Message-ID: Dear BioPerl, I'm excited to see a project like this! Basically I have a computer science background with a few years of development, research and minimal bioinformatics experience. Dumb question...where is the best place for a developer to begin on the BioPerl wiki(s), who is wanting to contribute new code or bug fixes to BioPerl in the future? Any input is much appreciated. Thank you all for your time. Best, Fernando jfa From briano at bioteam.net Mon Sep 12 13:20:36 2011 From: briano at bioteam.net (Brian Osborne) Date: Mon, 12 Sep 2011 09:20:36 -0400 Subject: [Bioperl-l] Fwd: cds sequence extract References: <112c4ef2.641e.1325c4b21cb.Coremail.maliang7121@163.com> Message-ID: <671CAF11-55A4-462A-BC5B-805C87E1EB0E@bioteam.net> Liang Ma, I'm forwarding this to the Bioperl mailing list. If you're starting out with Bioperl I suggest you read this: http://www.bioperl.org/wiki/HOWTO:Beginners Brian O. Begin forwarded message: > From: maliang7121 > Date: September 12, 2011 2:20:20 AM EDT > To: briano at bioteam.net > Subject: cds sequence extract > > Dear Brian: > > I am a student at the Chinese Academy of Sciences. I have begun to love BioPerl, but now I have a problem. 
> > According to the script in the attachment, I can easily download sequences from NCBI. Now I need to extract the CDS sequences from the GenBank-format files and put them all in a single file in FASTA format. I cannot do it; could you spend a few minutes writing a script for me? > > Best! > > Liang Ma > > > Brian O. -- Brian Osborne, PhD BioTeam: http://bioteam.net email: briano at bioteam.net mobile: 978-317-3101 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: acc.txt URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: get_seq_by_acc_ml.pl Type: text/x-perl-script Size: 583 bytes Desc: not available URL: From fs5 at sanger.ac.uk Mon Sep 12 13:54:21 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 12 Sep 2011 14:54:21 +0100 Subject: [Bioperl-l] no blast result In-Reply-To: References: Message-ID: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> It looks like BLAST is not installed on your system. The BioPerl module only runs BLAST for you and parses the output, but you still need the BLAST executables installed on your system. Follow the instructions on the NCBI website to download and install BLAST and try running it on the command line with the "blastall" command. If that works, then you can also run it via BioPerl. Frank On Mon, 2011-09-12 at 07:12 -0400, Manju Rawat wrote: > I read this, but the default program is not running properly. It shows many errors, > such as: > > MSG: cannot find path to blastall.. > Use of uninitialized value $_[0] in join or string at > /usr/share/perl/5.10/File/Spec/Unix.pm line 41. > > And this is not showing the output that I want. > > Please help me. 
> > Manju Rawat -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From p.j.a.cock at googlemail.com Mon Sep 12 14:00:30 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 12 Sep 2011 15:00:30 +0100 Subject: [Bioperl-l] no blast result In-Reply-To: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: On Mon, Sep 12, 2011 at 2:54 PM, Frank Schwach wrote: > It looks like BLAST is not installed on your system. The BioPerl module only > runs BLAST for you and parses the output, but you still need the BLAST > executables installed on your system. Follow the instructions on the > NCBI website to download and install BLAST and try running it on the > command line with the "blastall" command. If that works, then you can also > run it via BioPerl. > Frank Hang on - blastall is from the "legacy" BLAST suite. Does BioPerl still talk to that, or to the new BLAST+ suite (e.g. binaries blastn and blastp rather than blastall)? Peter From cjfields at illinois.edu Mon Sep 12 17:45:56 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 12 Sep 2011 17:45:56 +0000 Subject: [Bioperl-l] BioPerl - quick general question In-Reply-To: References: Message-ID: <62B9B300-96AC-4511-A1B9-CFF36CBE6288@illinois.edu> On Sep 11, 2011, at 9:52 PM, J. Fernando Arguello wrote: > Dear BioPerl, > > I'm excited to see a project like this! Basically I have a computer science > background with a few years of development, research and minimal > bioinformatics experience. 
> > Dumb question...where is the best place for a developer to begin on the > BioPerl wiki(s), who is wanting to contribute new code or bug fixes to > BioPerl in the future? The basic starting point: the HOWTOs and the tutorials (I'm not sure how up to date some of the latter are, but in general they should work): http://www.bioperl.org/wiki/HOWTOs http://www.bioperl.org/wiki/Tutorials > Any input is much appreciated. Thank you all for your time. > > Best, > Fernando > jfa We gladly welcome anyone willing to hack on BioPerl. The repository is now on GitHub (core is https://github.com/bioperl/bioperl-live), so it's fairly easy to fork the code and make changes. We are in the middle of splitting up the large codebase into more manageable subdistributions, so it's probably a good idea to ask on the list about specific code in case the code in question resides in a separate repository. Let us know if you have additional questions! Cheers! chris From shalabh.sharma7 at gmail.com Mon Sep 12 18:00:16 2011 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Mon, 12 Sep 2011 14:00:16 -0400 Subject: [Bioperl-l] Module for SOCS Message-ID: Hi All, I am using SOCS for mapping my SOLiD data. I was just wondering if there is any module in BioPerl to analyze SOCS output files directly, or the mapreads format. Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From greg at ebi.ac.uk Tue Sep 13 08:30:58 2011 From: greg at ebi.ac.uk (Gregory Jordan) Date: Tue, 13 Sep 2011 09:30:58 +0100 Subject: [Bioperl-l] fail to obtain node-to-node distance from a newick file In-Reply-To: <04a601cc700e$4f051d10$ed0f5730$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> Message-ID: Hi Ross, I don't typically 'install' versions of BioPerl from GitHub. 
Rather, I check out the code into a directory that's on my Perl search path (and make sure any other BioPerl code isn't on the path anymore). I think the following commands should get you the right set of code: > git clone git://github.com/bioperl/bioperl-live.git > cd bioperl-live > git checkout topic/tree_api_refresh After that, I'm afraid I'll have to leave it to you (or someone else on the list). I'm no Perl guru, so I don't know the "right" way to direct Perl towards a development BioPerl branch. Cheers, Greg 2011/9/11 Ross KK Leung > Hi Greg, > > The values are correct! However, how do I install this bioperl-live module? > My BioPerl is 1.6.1, but there's an error: > > Can't locate object method "find" via package "Bio::Tree::Tree" at > TreeCalDist.pl line 32, line 1. > > my $mt = $tree->find($keyword); #line 32 > > From: gjuggler at gmail.com [mailto:gjuggler at gmail.com] On Behalf Of Gregory > Jordan > Sent: 2011-09-10 23:40 > To: bioperl-l List; Ross KK Leung > Subject: Re: [Bioperl-l] fail to obtain node-to-node distance from a > newick file > > Hi Ross, > > Which version of BioPerl are you using? > > With the refactored tree code (available from the tree_api_refresh branch > on the BioPerl github repo: > https://github.com/bioperl/bioperl-live/blob/topic/tree_api_refresh/Bio/Tree/NodeFunctionsI.pm#L406) > the following script works for me. Do those values look sensible to you? 
The > code on the new branch is a bit experimental, so I wouldn't be surprised if > all the edge cases for calculations like this aren't covered. > > --greg > > use Bio::TreeIO; > > my $treeio = new Bio::TreeIO(-file => 'temp.nh', -format => "newick"); > > my $tree = $treeio->next_tree; > > my $mt = $tree->find("Mycobacterium_tuberculosis_H37Rv"); > > my $ms = $tree->find("Mycobacterium_smegmatis_str._MC2_155"); > > my $ma = $tree->find("Mycobacterium_abscessus"); > > my $distance = $mt->distance($ma); > > print "MT - MA: ".$mt->distance($ma)."\n"; > > print "MT - MS: ".$mt->distance($ms)."\n"; > > print "MS - MA: ".$ms->distance($ma)."\n"; > > # MT - MA: 0.24326 > > # MT - MS: 0.18573 > > # MS - MA: 0.20729 > > --greg > > On Sat, Sep 10, 2011 at 7:39 AM, Ross KK Leung wrote: > > I use the following code to derive the distance between two nodes but an > error "MSG: could not find the lca of supplied nodes; can't find distance > either" > > > > What's the problem? 
> > > > use Bio::TreeIO; > > > > ($treefh) = @ARGV; > > > > my $treeio = new Bio::TreeIO(-file => $treefh, -format => "newick"); > > my $tree = $treeio->next_tree; > > > > $keyword="Mycobacterium_tuberculosis_H37Rv"; > > my $Tnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_smegmatis_str._MC2_155"; > > my $Mnodes = $tree->find_node(-id => $keyword); > > $keyword="Mycobacterium_abscessus"; > > my $Anodes = $tree->find_node(-id => $keyword); > > > > my @root = $tree->get_root_node; > > #my $distances = $tree->distance(-nodes => [$node[0],$root]); > > > > my $distances = $tree->distance(-nodes => [$Tnode,$Mnodes]); > > print "Dist:$distances\n"; > > > > > > #### the following is the infile > > > (((Mycobacterium_tuberculosis_SUMu006:4.1E-4,(Mycobacterium_tuberculosis_SUM > > u002:8.9E-4,Mycobacterium_tuberculosis_SUMu005:1.7E-4)31:1.4E-4)30:1.4E-4,(M > > ycobacterium_tuberculosis_SUMu007:4.3E-4,Mycobacterium_tuberculosis_SUMu009: > > 1.2E-4)22:1.4E-4)8:1.4E-4,(Mycobacterium_tuberculosis_SUMu004:3.7E-4,Mycobac > > terium_tuberculosis_SUMu008:2.0E-4)19:1.0E-4,(Mycobacterium_tuberculosis_SUM > > u003:1.4E-4,(((((Mycobacterium_tuberculosis_EAS054:0.00165,(Mycobacterium_tu > > berculosis_94_M4241A:0.0011,((Mycobacterium_tuberculosis_T17:0.00257,Mycobac > > terium_tuberculosis_KZN_605:0.00259)8:1.9E-4,(Mycobacterium_tuberculosis_'98 > > -R604_INH-RIF-EM':9.9E-4,(Mycobacterium_tuberculosis_C:0.00293,Mycobacterium > > _tuberculosis_str._Haarlem:0.00271)28:1.4E-4)1:1.3E-4)0:1.6E-4)0:1.7E-4)0:1. > > 6E-4,(Mycobacterium_tuberculosis_02_1987:0.00186,(Mycobacterium_tuberculosis > > _T85:0.00229,Mycobacterium_tuberculosis_210:0.00175)41:1.4E-4)22:1.1E-4)0:1. 
> > 4E-4,(((Mycobacterium_kansasii_ATCC_12478:0.03353,(Mycobacterium_marinum_M:8 > > .4E-4,Mycobacterium_ulcerans_Agy99:0.00621)100:0.03843)100:0.01058,((Mycobac > > terium_leprae_Br4923:1.7E-4,Mycobacterium_leprae_TN:1.6E-4)100:0.07379,(((My > > cobacterium_abscessus:0.09172,((Nocardia_farcinica_IFM_10152:0.09312,(Rhodoc > > occus_equi_103S:0.04972,((Rhodococcus_erythropolis_PR4:0.00223,Rhodococcus_e > > rythropolis_SK121:0.00169)100:0.05311,(Rhodococcus_jostii_RHA1:0.00657,Rhodo > > coccus_opacus_B4:0.00665)100:0.03198)100:0.02608)100:0.03679)100:0.0322,(((D > > ietzia_cinnamea_P4:0.17703,(Corynebacterium_amycolatum_SK46:0.16548,((Coryne > > bacterium_kroppenstedtii_DSM_44385:0.19137,(Corynebacterium_variabile_DSM_44 > > 702:0.14307,(Corynebacterium_urealyticum_DSM_7109:0.11864,(Corynebacterium_r > > esistens_DSM_45100:0.08617,(Corynebacterium_jeikeium_ATCC_43734:0.00292,Cory > > nebacterium_jeikeium_K411:0.00277)100:0.08012)100:0.02749)100:0.03867)100:0. > > 05776)100:0.02728,((Corynebacterium_glucuronolyticum_ATCC_51866:0.00103,Cory > > nebacterium_glucuronolyticum_ATCC_51867:0.00137)100:0.2108,(((Corynebacteriu > > m_efficiens_YS-314:0.04795,Corynebacterium_glutamicum_R:0.05369)100:0.07449, > > ((Corynebacterium_matruchotii_ATCC_14266:0.00162,Corynebacterium_matruchotii > > _ATCC_33806:0.00183)100:0.13467,(Corynebacterium_diphtheriae_NCTC_13129:0.07 > > 465,((Corynebacterium_pseudotuberculosis_C231:1.6E-4,Corynebacterium_pseudot > > uberculosis_I19:1.1E-4)55:1.8E-4,(Corynebacterium_pseudotuberculosis_1002:4. 
> > 2E-4,Corynebacterium_pseudotuberculosis_FRC41:1.2E-4)100:1.3E-4)100:0.0698)1 > > 00:0.03816)100:0.03537)100:0.01906,((Corynebacterium_ammoniagenes_DSM_20306: > > 0.10393,(((Corynebacterium_accolens_ATCC_49725:0.00229,Corynebacterium_accol > > ens_ATCC_49726:0.00293)100:0.03941,(Corynebacterium_pseudogenitalium_ATCC_33 > > 035:0.00268,Corynebacterium_tuberculostearicum_SK141:0.00306)100:0.02483)100 > > :0.04295,(Corynebacterium_aurimucosum_ATCC_700975:0.05398,Corynebacterium_st > > riatum_ATCC_6940:0.05404)100:0.02088)100:0.03132)100:0.05481,(Corynebacteriu > > m_genitalium_ATCC_33030:0.09551,Corynebacterium_lipophiloflavum_DSM_44291:0. > > 09316)100:0.0783)100:0.03398)100:0.02921)100:0.04667)100:0.02447)100:0.11754 > > )100:0.06305,(((Pseudonocardia_sp._P1:0.18877,(Saccharopolyspora_erythraea_N > > RRL_2338:0.1213,(Actinosynnema_mirum_DSM_43827:0.11851,(Saccharomonospora_vi > > ridis_DSM_43017:0.0927,(Amycolatopsis_mediterranei_U32:0.04231,Streptomyces_ > > sp._AA4:0.04144)100:0.05565)100:0.05467)55:0.01703)100:0.03241)100:0.02581,( > > Nakamurella_multipartita_DSM_44233:0.23119,(Geodermatophilus_obscurus_DSM_43 > > 160:0.21546,((Stackebrandtia_nassauensis_DSM_44728:0.21272,((Salinispora_are > > nicola_CNS-205:0.01818,Salinispora_tropica_CNB-440:0.02452)100:0.03265,(Micr > > omonospora_sp._ATCC_39149:0.0329,(Micromonospora_aurantiaca_ATCC_27029:0.001 > > 16,Micromonospora_sp._L5:7.7E-4)100:0.03199)100:0.01352)100:0.11098)100:0.09 > > 308,((((((Propionibacterium_freudenreichii_subsp._shermanii_CIRM-BIA1:0.2696 > > 8,(Propionibacterium_acnes_J139:0.00475,(Propionibacterium_acnes_KPA171202:6 > > .9E-4,(Propionibacterium_acnes_SK187:9.6E-4,(Propionibacterium_acnes_J165:6. > > 2E-4,Propionibacterium_acnes_SK137:8.8E-4)100:6.3E-4)100:0.00181)100:0.00315 > > )100:0.24712)45:0.13659,(Kribbella_flavida_DSM_17836:0.17436,(Nocardioidacea > > e_bacterium_Broad-1:0.13638,Nocardioides_sp._JS614:0.08869)100:0.11094)73:0. 
> > 02696)43:0.0456,((Kytococcus_sedentarius_DSM_20547:0.25655,(Dermacoccus_sp._ > > Ellin185:0.19062,(Intrasporangium_calvum_DSM_43043:0.13866,Janibacter_sp._HT > > CC2649:0.1412)100:0.04228)58:0.02864)100:0.06896,(Kineococcus_radiotolerans_ > > SRS30216:0.21207,((((Propionibacterium_acidifaciens_F0233:1.06116,Rubrobacte > > r_xylanophilus_DSM_9941:0.5414)87:0.41816,(((Tropheryma_whipplei_TW08/27:0.0 > > 0122,Tropheryma_whipplei_str._Twist:9.8E-4)100:0.6263,(Microbacterium_testac > > eum_StLB037:0.17489,(Leifsonia_xyli_subsp._xyli_str._CTCB07:0.12936,(marine_ > > actinobacterium_PHSC20C1:0.15679,(Clavibacter_michiganensis_subsp._michigane > > nsis_NCPPB_382:0.00662,Clavibacter_michiganensis_subsp._sepedonicus:0.00756) > > 100:0.1194)100:0.03032)100:0.03729)100:0.05755)100:0.10753,((Kocuria_rhizoph > > ila_DC2201:0.15484,(Rothia_dentocariosa_M567:0.0599,(Rothia_mucilaginosa_ATC > > C_25296:0.00472,Rothia_mucilaginosa_DY-18:0.00351)100:0.05376)100:0.16477)10 > > 0:0.06018,((Micrococcus_luteus_NCTC_2665:0.0060,Micrococcus_luteus_SK58:0.00 > > 536)100:0.2105,(Arthrobacter_arilaitensis_Re117:0.19987,(Renibacterium_salmo > > ninarum_ATCC_33209:0.11862,(Arthrobacter_aurescens_TC1:0.0397,(Arthrobacter_ > > sp._FB24:0.03191,(Arthrobacter_chlorophenolicus_A6:0.02327,Arthrobacter_phen > > anthrenivorans_Sphe3:0.0208)100:0.01937)100:0.01749)100:0.05605)100:0.05138) > > 54:0.02994)100:0.02418)100:0.06667)66:0.02055)2:0.00737,(Brevibacterium_line > > ns_BL2:0.16126,Brevibacterium_mcbrellneri_ATCC_49030:0.16657)100:0.16995)18: > > 0.03558,(Brachybacterium_faecium_DSM_4810:0.31488,(((Actinomyces_sp._oral_ta > > xon_848_str._F0332:0.18095,Arcanobacterium_haemolyticum_DSM_20595:0.23121)10 > > 0:0.1195,((Actinomyces_urogenitalis_DSM_15434:0.07685,(Actinomyces_sp._oral_ > > taxon_171_str._F0337:0.01337,(Actinomyces_oris_K20:0.00655,Actinomyces_visco > > sus_C505:0.00531)100:0.00537)100:0.07249)100:0.1364,(((Mobiluncus_mulieris_2 > > 
8-1:0.00136,(Mobiluncus_mulieris_ATCC_35243:8.3E-4,(Mobiluncus_mulieris_ATCC > > _35239:8.2E-4,Mobiluncus_mulieris_FB024-16:0.00149)77:5.2E-4)68:4.7E-4)100:0 > > .11036,(Mobiluncus_curtisii_ATCC_51333:0.00379,(Mobiluncus_curtisii_ATCC_430 > > 63:0.00217,Mobiluncus_curtisii_subsp._holmesii_ATCC_35242:0.00188)100:0.0018 > > 7)100:0.10314)100:0.23714,(Actinomyces_coleocanis_DSM_15436:0.23053,(Actinom > > yces_sp._oral_taxon_178_str._F0338:0.07648,(Actinomyces_sp._oral_taxon_180_s > > tr._F0310:0.034,(Actinomyces_odontolyticus_ATCC_17982:0.00322,Actinomyces_od > > ontolyticus_F0309:0.00348)100:0.02875)100:0.05144)100:0.13371)100:0.05044)10 > > 0:0.03821)98:0.03296)99:0.07932,(Beutenbergia_cavernae_DSM_12333:0.17071,(Ce > > llulomonas_flavigena_DSM_20109:0.11998,(Xylanimonas_cellulosilytica_DSM_1589 > > 4:0.13952,(Sanguibacter_keddieii_DSM_10542:0.0926,Jonesia_denitrificans_DSM_ > > 20603:0.19099)100:0.037)100:0.02528)100:0.03933)100:0.03437)90:0.02763)87:0. > > 02582)25:0.02892)16:0.01907)26:0.05274)57:0.03588,(Catenulispora_acidiphila_ > > DSM_44928:0.19325,((Streptomyces_bingchenggensis_BCW-1:0.02661,(Streptomyces > > _hygroscopicus_ATCC_53653:0.02627,Streptomyces_violaceusniger_Tu_4113:0.0226 > > 6)100:0.0122)100:0.02739,(((Streptomyces_avermitilis_MA-4680:0.02562,((Strep > > tomyces_viridochromogenes_DSM_40736:0.02687,((Streptomyces_ghanaensis_ATCC_1 > > 4672:0.0209,Streptomyces_griseoflavus_Tu4000:0.02075)100:0.01066,(Streptomyc > > es_sp._e14:0.03193,(Streptomyces_coelicolor_A3_2_:5.2E-4,Streptomyces_livida > > ns_TK24:6.7E-4)100:0.02763)21:0.00492)22:0.00461)100:0.00897,(Streptomyces_s > > viceus_ATCC_29083:0.02794,Streptomyces_scabiei_87.22:0.03155)79:0.00685)98:0 > > .00896)100:0.02063,(Streptomyces_albus_J1074:0.04846,(Streptomyces_sp._SPB74 > > :0.01027,(Streptomyces_sp._SPB78:0.0027,(Streptomyces_sp._SA3_actF:0.00376,S > > treptomyces_sp._SA3_actG:2.4E-4)100:6.7E-4)100:0.00617)100:0.05981)100:0.015 > > 
63)100:0.00959,((Streptomyces_sp._C:0.01998,Streptomyces_sp._Mg1:0.02056)100 > > :0.0383,(Streptomyces_pristinaespiralis_ATCC_25486:0.03823,(Streptomyces_sp. > > _ACTE:0.02415,((Streptomyces_griseus_subsp._griseus_NBRC_13350:4.1E-4,Strept > > omyces_sp._ACT-1:5.2E-4)100:0.00769,(Streptomyces_roseosporus_NRRL_11379:2.2 > > E-4,Streptomyces_roseosporus_NRRL_15998:0.00209)100:0.00782)100:0.017)100:0. > > 02297)60:0.00742)100:0.00985)100:0.0215)100:0.11349)100:0.05)46:0.02214,(Aci > > dothermus_cellulolyticus_11B:0.25166,((Nocardiopsis_dassonvillei_subsp._dass > > onvillei_DSM_43111:0.11961,Thermobifida_fusca_YX:0.09198)100:0.09418,(Thermo > > monospora_curvata_DSM_43183:0.12535,(Streptosporangium_roseum_DSM_43021:0.09 > > 754,Thermobispora_bispora_DSM_43833:0.08434)100:0.06253)100:0.0273)100:0.045 > > 43)74:0.02856)40:0.02885,(Frankia_symbiont_of_Datisca_glomerata:0.09569,(Fra > > nkia_sp._EuI1c:0.10847,((Frankia_alni_ACN14a:0.0351,Frankia_sp._CcI3:0.03651 > > )100:0.04249,(Frankia_sp._EAN1pec:0.03101,Frankia_sp._EUN1f:0.0394)100:0.046 > > 88)100:0.02188)100:0.03014)100:0.16741)88:0.0315)65:0.02571)88:0.05354)88:0. > > 03348)88:0.06784,(Segniliparus_rotundus_DSM_44985:0.04795,Segniliparus_rugos > > us_ATCC_BAA-974:0.03891)100:0.23144)81:0.0205)88:0.02214,(Tsukamurella_pauro > > metabola_DSM_20162:0.13958,(Gordonia_bronchialis_DSM_43247:0.08238,Gordonia_ > > neofelifaecis_NRRL_B-59395:0.10412)100:0.06124)100:0.03199)84:0.01596)88:0.0 > > 6919)88:0.04069,(Mycobacterium_smegmatis_str._MC2_155:0.05174,((Mycobacteriu > > m_sp._JLS:0.00135,(Mycobacterium_sp._KMS:1.4E-4,Mycobacterium_sp._MCS:2.3E-4 > > )100:0.00177)100:0.04943,(Mycobacterium_vanbaalenii_PYR-1:0.02631,(Mycobacte > > rium_gilvum_PYR-GCK:0.0011,Mycobacterium_sp._Spyr1:9.2E-4)100:0.03388)100:0. 
> > 03536)100:0.01213)100:0.02314)88:0.04937,(Mycobacterium_parascrofulaceum_ATC > > C_BAA-614:0.02796,(Mycobacterium_intracellulare_ATCC_13950:0.02159,(Mycobact > > erium_avium_subsp._paratuberculosis_K-10:0.0020,(Mycobacterium_avium_104:0.0 > > 032,Mycobacterium_avium_subsp._avium_ATCC_25291:0.00216)91:6.3E-4)100:0.0194 > > 3)100:0.01235)100:0.0158)88:0.01054)62:0.0066)88:0.04218,(Mycobacterium_tube > > rculosis_T46:7.6E-4,(Mycobacterium_tuberculosis_CPHL_A:5.8E-4,(Mycobacterium > > _tuberculosis_K85:8.0E-4,(Mycobacterium_bovis_AF2122/97:1.4E-4,(Mycobacteriu > > m_bovis_BCG_str._Pasteur_1173P2:1.4E-4,Mycobacterium_bovis_BCG_str._Tokyo_17 > > 2:1.4E-4)100:1.3E-4)100:1.8E-4)38:1.1E-4)33:1.4E-4)0:1.4E-4)1:1.5E-4)0:1.5E- > > 4,(Mycobacterium_tuberculosis_CDC1551:2.2E-4,(Mycobacterium_tuberculosis_T92 > > :0.00262,Mycobacterium_tuberculosis_GM_1503:0.00427)3:1.3E-4)0:1.4E-4)13:1.4 > > E-4,(((Mycobacterium_tuberculosis_SUMu012:8.5E-4,(Mycobacterium_tuberculosis > > _H37Ra_WGS_:0.00109,(Mycobacterium_tuberculosis_H37Ra:1.4E-4,Mycobacterium_t > > uberculosis_H37Rv:1.4E-4)58:1.4E-4)41:1.4E-4)72:1.4E-4,(Mycobacterium_tuberc > > ulosis_SUMu001:2.9E-4,(Mycobacterium_tuberculosis_SUMu010:1.1E-4,Mycobacteri > > um_tuberculosis_SUMu011:8.7E-4)58:1.4E-4)65:1.6E-4)61:0.00102,(Mycobacterium > > _tuberculosis_F11:1.1E-4,((Mycobacterium_tuberculosis_KZN_4207:1.4E-4,Mycoba > > cterium_tuberculosis_KZN_R506:1.4E-4)58:1.4E-4,(Mycobacterium_tuberculosis_K > > ZN_1435:1.4E-4,Mycobacterium_tuberculosis_KZN_V2475:1.4E-4)74:1.4E-4)78:1.1E > -4)36:1.4E-4)3:1.4E-4)46:1.4E-4)4:1.4E-4)88; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l**** > > ** ** > From manju.rawat2 at gmail.com Tue Sep 13 11:20:07 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Tue, 13 Sep 2011 07:20:07 -0400 Subject: [Bioperl-l] no blast result In-Reply-To: References: 
<1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

This is the Perl code:

#!/usr/bin/perl -w
use Bio::Perl;
use Bio::SearchIO;
use Bio::Tools::Run::StandAloneBlast;
@params = ('database' => 'swissprot',
           'READMETHOD' => 'Blastn');

$factory = Bio::Tools::Run::StandAloneBlast->new(@params);

$input = Bio::Seq->new(-id=>"testquery",-seq=>"gatcgtataccgtacagct");
$blast_report = $factory->blastall($input);

write_blast(">rs.blast",$blast_report);

It shows the following error:

Use of uninitialized value $_[0] in join or string
at /usr/share/perl/5.10/File/Spec/Unix.pm line 41.

MSG: cannot find path to blastall

From fs5 at sanger.ac.uk Tue Sep 13 15:09:24 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 13 Sep 2011 16:09:24 +0100
Subject: [Bioperl-l] no blast result
In-Reply-To:
References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID: <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk>

Peter: in BioPerl 1.6 the default executable name in
Bio::Tools::Run::StandAloneBlast is still set to "blastall" - I'm not sure
whether it works with blast+ too.

Manju: as I said previously, you need to check that you can run BLAST on
the command line, i.e. make sure it is actually installed on your
system. Have you done that?
You can also check the Bio::Tools::Run::StandAloneBlast docs to see how
you can manually set the path to your BLAST executable if it is not in
your PATH. You have to install BLAST first before you can run this
module.
The other error you get from your code refers to something that is
outside of the code fragment you show here, so I can't comment on that
one.

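The check suggested above (is a BLAST executable reachable at all?) can also be done from Perl before any BioPerl wrapper is involved. The following is a minimal sketch using only core modules; `find_in_path` is a hypothetical helper written for this illustration, not a BioPerl function, and the program names it looks for are simply the ones mentioned in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of BioPerl): return the full path of the
# first executable file called $name found in the PATH directories, or
# nothing if no such file exists.
sub find_in_path {
    my ($name) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $name );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Report on the legacy and the BLAST+ program names from this thread.
for my $program (qw(blastall blastn)) {
    my $found = find_in_path($program);
    print "$program: ", ( defined $found ? $found : 'not found in PATH' ), "\n";
}
```

If neither program is reported, the wrapper has nothing to call, which would be consistent with the "cannot find path to blastall" message. How to point Bio::Tools::Run::StandAloneBlast at a non-standard location is described in that module's documentation and is not assumed here.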
Frank On Tue, 2011-09-13 at 07:20 -0400, Manju Rawat wrote: > this is the perl code > > #!usr/bin/perl -w > use Bio::Perl; > use Bio::SearchIO > use Bio::Tools::Run::StandAloneBlast; > @params = ('database' => 'swissprot', > 'READMETHOD' => 'Blastn'); > > $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > > > $input = Bio::Seq->new(-id=>"testquery",-seq=>"gatcgtataccgtacagct"); > $blast_report = $factory->blastall($input); > > > write_blast(">rs.blast",$blast_report); > > > It showing error that > > > Use of uninitialized value $_[0] in join or string > at /usr/share/perl/5.10/File/Spec/Unix.pm line 41. > > MSG: cannot find path to blastall > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From David.Messina at sbc.su.se Tue Sep 13 15:34:20 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 13 Sep 2011 17:34:20 +0200 Subject: [Bioperl-l] no blast result In-Reply-To: <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: There's a separate Bio::Tools::Run::BlastPlus module for blast+. And a related HOWTO: http://www.bioperl.org/wiki/HOWTO:BlastPlus On Tue, Sep 13, 2011 at 17:09, Frank Schwach wrote: > Peter: in BioPerl 1.6 the default executable name in Bio::Tools::Run > StandAloneBlast is still set to "blastall" - I'm not sure if it works > with blast+ too. > > Manju: as I said previously, you need to check that you can run BLAST on > the command line, i.e. make sure it is actually installed on your > system. Have you done that? > You can also check the Bio::Tools::Run::StandAloneBlast docs to see how > you can manually set the path to your BLAST executable if it is not in > your path. 
You have to install BLAST fisrt before you can run this > module. > The other error you get from yuor code refers to something that is > outside of the code fragment you show here, so can't comment on that > one. > > Frank > > > On Tue, 2011-09-13 at 07:20 -0400, Manju Rawat wrote: > > this is the perl code > > > > #!usr/bin/perl -w > > use Bio::Perl; > > use Bio::SearchIO > > use Bio::Tools::Run::StandAloneBlast; > > @params = ('database' => 'swissprot', > > 'READMETHOD' => 'Blastn'); > > > > $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > > > > > > $input = Bio::Seq->new(-id=>"testquery",-seq=>"gatcgtataccgtacagct"); > > $blast_report = $factory->blastall($input); > > > > > > write_blast(">rs.blast",$blast_report); > > > > > > It showing error that > > > > > > Use of uninitialized value $_[0] in join or string > > at /usr/share/perl/5.10/File/Spec/Unix.pm line 41. > > > > MSG: cannot find path to blastall > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Tue Sep 13 19:36:21 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 13 Sep 2011 19:36:21 +0000 Subject: [Bioperl-l] no blast result In-Reply-To: References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <6570DEC6-B485-44B0-868A-AAC6329B3224@illinois.edu> On Sep 12, 2011, at 9:00 AM, Peter Cock wrote: > On Mon, Sep 12, 2011 at 2:54 PM, Frank Schwach wrote: >> looks like BLAST is not install on your system. 
The BioPerl module only >> runs BLAST for you and parses the output but you still need the BLAST >> executables installed on your system. Follow the instructions on the >> NCBI website to download and install BLAST and try running it on the >> commandline with the "blastall" command. If that works then you can run >> it also via BioPerl. >> Frank > > Hang on - blastall is from the "legacy" BLAST suite, does > BioPerl still talk to that or the new BLAST+ suite (e.g. binaries > blastn and blastp rather then blastall)? > > Peter (aside: thought I sent this the other day. never mix grant writing and open source) Both BLAST and BLAST+ are supported via different modules. Some users don't want to use BLAST+ for various reasons, though this may soon be out of their control when NCBI eventually stops supporting legacy BLAST entirely. chris From manju.rawat2 at gmail.com Wed Sep 14 11:32:19 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Wed, 14 Sep 2011 07:32:19 -0400 Subject: [Bioperl-l] no blast result In-Reply-To: References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> <6570DEC6-B485-44B0-868A-AAC6329B3224@illinois.edu> Message-ID: On Wed, Sep 14, 2011 at 7:31 AM, Manju Rawat wrote: > I am trying to install Blast+ in my system.(ubuntu) from this link > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html. > but i am getting error.. > > first i downloaded the blast(ncbi-blast-2.2.25+-ia32-linux.tar.gz) from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ > . and then extract it in the home/abc/ folder. > after that i set the path for configuration in terminal i.e > > *PATH=$PATH:/home/abc/blast-2.2.25+/bin* > > > but when i am running blast -help in terminal it showing me error that > > error while loading shared libraries: > libbz2.so.1: cannot open shared object file: No such file or directory. 
>
>

--
Regards
Manju Rawat
Project Assistant(NAIP)
Genomics Lab
ABTC,NDRI
Karnal-132001,Haryana

From kumarsaurabh20 at gmail.com Thu Sep 15 11:20:47 2011
From: kumarsaurabh20 at gmail.com (kumar Saurabh)
Date: Thu, 15 Sep 2011 13:20:47 +0200
Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux
Message-ID:

Hi,

I need to integrate the primer3 module into one of our pipelines. In the process,
I was testing the initial code given on the CPAN website. But whenever I try
to run this program it gives me an error: "Cannot locate the Object
method add_target via the package Bio::Tools::Run::Primer3Redux...."

The code I am using is as follows:

  # design some primers.
  # the output will be put into temp.out
  use Bio::Tools::Primer3Redux;
  use Bio::Tools::Run::Primer3Redux;
  use Bio::SeqIO;

  my $seqio = Bio::SeqIO->new(-file=>'sample.dna');
  my $seq = $seqio->next_seq;

  my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out",
                                            -path =>
"/home/singh/Downloads/primer3-2.2.3/src/primer3_core");

  # or after the fact you can change the program_name
  $primer3->program_name('my_superfast_primer3');

  unless ($primer3->executable) {
    print STDERR "primer3 can not be found. Is it installed?\n";
    exit(-1)
  }

  # set the maximum and minimum Tm of the primer
  $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90);

  # Design the primers. This runs primer3 and returns a
  # Bio::Tools::Primer3::result object with the results
  # Primer3 can run in several modes (see explanation for
  # 'PRIMER_TASK' in the primer3 documentation). To run a task,
  # either call it by its PRIMER_TASK name as in these examples:
  $pcr_primer_results = $primer3->pick_pcr_primers($seq);
  $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq );
  $check_results = $primer3->check_primers();

  # Alternatively, explicitly set the PRIMER_TASK parameter and
  # use the generic 'run' method (this is mainly here for backwards
  # compatibility) :
  $primer3->PRIMER_TASK( 'pick_left_only' );
  $result = $primer3->run( $seq );

  # If no task is set and the 'run' method is called, primer3 will default
  # to pick pcr primers.

  # see the Bio::Tools::Primer3Redux POD for
  # things that you can get from this. For example:

  print "There were ", $result->num_primer_pairs, " primer pairs\n";


Can anyone help me with this?


Best regards,
Kumar

From fs5 at sanger.ac.uk Thu Sep 15 13:44:03 2011
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 15 Sep 2011 14:44:03 +0100
Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux
In-Reply-To:
References:
Message-ID: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk>

Hi Kumar,

We are currently working on this module and you might want to check out
the latest version on Chris Fields' github project:

https://github.com/cjfields/Bio-Tools-Primer3Redux

There will probably be some changes again once I get some time to
work on a few points we discussed lately. You can also check out my repo
here:
https://github.com/fschwach/Bio-Tools-Primer3Redux

but I will certainly have to make changes to that code because I used
AUTOLOAD in the last version, which is probably not a good idea.
My recommendation would be to use Chris' repo and see if that works for
you. If not, feedback would be much appreciated.

Cheers,

Frank




On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote:
> Hi,
>
> I need to integrate the primer3 module into one of our pipelines. In the process,
> I was testing the initial code given on the CPAN website.
But whenever I try > to run this program its giving me error...that "Cannot locate the Object > method add_target via the package Bio::Tools:Run::Primer3Redux...." > > The line of codes I am using is as follows: > > # design some primers. > # the output will be put into temp.out > use Bio::Tools::Primer3Redux; > use Bio::Tools::Run::Primer3Redux; > use Bio::SeqIO; > > my $seqio = Bio::SeqIO->new(-file=>'sample.dna'); > my $seq = $seqio->next_seq; > > my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out", > -path => > "/home/singh/Downloads/primer3-2.2.3/src/primer3_core"); > > # or after the fact you can change the program_name > $primer3->program_name('my_superfast_primer3'); > > unless ($primer3->executable) { > print STDERR "primer3 can not be found. Is it installed?\n"; > exit(-1) > } > > # set the maximum and minimum Tm of the primer > $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90); > > # Design the primers. This runs primer3 and returns a > # Bio::Tools::Primer3::result object with the results > # Primer3 can run in several modes (see explanation for > # 'PRIMER_TASK' in the primer3 doccumentation). To run a task, > # either call it by its PRIMER_TASK name as in these examples: > $pcr_primer_results = $primer3->pick_pcr_primers($seq); > $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq ); > $check_results = $primer3->check_primers(); > > # Alternatively, explicitly set the PRIMER_TASK parameter and > # use the generic 'run' method (this is mainly here for backwards > # compatibility) : > $primer3->PRIMER_TASK( 'pick_left_only' ); > $result = $primer3->run( $seq ); > > # If no task is set and the 'run' method is called, primer3 will default > to > # pick pcr primers. > > # see the Bio::Tools::Primer3Redux POD for > # things that you can get from this. For example: > > print "There were ", $results->num_primer_pairs, " primer pairs\n"; > > > Can anyone help me with this??? 
> > > Best regards, > Kumar > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at illinois.edu Thu Sep 15 14:13:38 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 15 Sep 2011 14:13:38 +0000 Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux In-Reply-To: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: I mentioned off-list that this should be filed as a github issue so we don't lose track. Unfortunately I can't get to it until next week (grant deadline). chris On Sep 15, 2011, at 8:44 AM, Frank Schwach wrote: > Hi Kumar, > > We are currently working on this module and you might want to check out > the latest version on Chris Field's github project: > > https://github.com/cjfields/Bio-Tools-Primer3Redux > > There will probably be some changes again once I get some time again to > work on a few points we discussed lately. You can also check out my repo > here: > https://github.com/fschwach/Bio-Tools-Primer3Redux > > but I will certainly have to make changes to that code because I used > AUTOLAD in the last version, which is probably not a good idea. > My recommendation would be to use Chris' repo and see if that works for > you. If not, feedback would be much appreciated. > > Cheers, > > Frank > > > > > On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote: >> Hi, >> >> I need to integrate the primer3 module in one of our pipeline. In a process, >> I was testing the initial code given on the CPAN website. 
But whenever I try >> to run this program its giving me error...that "Cannot locate the Object >> method add_target via the package Bio::Tools:Run::Primer3Redux...." >> >> The line of codes I am using is as follows: >> >> # design some primers. >> # the output will be put into temp.out >> use Bio::Tools::Primer3Redux; >> use Bio::Tools::Run::Primer3Redux; >> use Bio::SeqIO; >> >> my $seqio = Bio::SeqIO->new(-file=>'sample.dna'); >> my $seq = $seqio->next_seq; >> >> my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out", >> -path => >> "/home/singh/Downloads/primer3-2.2.3/src/primer3_core"); >> >> # or after the fact you can change the program_name >> $primer3->program_name('my_superfast_primer3'); >> >> unless ($primer3->executable) { >> print STDERR "primer3 can not be found. Is it installed?\n"; >> exit(-1) >> } >> >> # set the maximum and minimum Tm of the primer >> $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90); >> >> # Design the primers. This runs primer3 and returns a >> # Bio::Tools::Primer3::result object with the results >> # Primer3 can run in several modes (see explanation for >> # 'PRIMER_TASK' in the primer3 doccumentation). To run a task, >> # either call it by its PRIMER_TASK name as in these examples: >> $pcr_primer_results = $primer3->pick_pcr_primers($seq); >> $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq ); >> $check_results = $primer3->check_primers(); >> >> # Alternatively, explicitly set the PRIMER_TASK parameter and >> # use the generic 'run' method (this is mainly here for backwards >> # compatibility) : >> $primer3->PRIMER_TASK( 'pick_left_only' ); >> $result = $primer3->run( $seq ); >> >> # If no task is set and the 'run' method is called, primer3 will default >> to >> # pick pcr primers. >> >> # see the Bio::Tools::Primer3Redux POD for >> # things that you can get from this. 
For example: >> >> print "There were ", $results->num_primer_pairs, " primer pairs\n"; >> >> >> Can anyone help me with this??? >> >> >> Best regards, >> Kumar >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From fs5 at sanger.ac.uk Thu Sep 15 14:43:48 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Thu, 15 Sep 2011 15:43:48 +0100 Subject: [Bioperl-l] issue with Bio::Tools::Run::Primer3Redux In-Reply-To: References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <1316097828.3797.700.camel@deskpro15336.internal.sanger.ac.uk> I also haven't had the time yet to work on this again but, yes, we need to make sure we don't loose track of where we are. On Thu, 2011-09-15 at 14:13 +0000, Fields, Christopher J wrote: > I mentioned off-list that this should be filed as a github issue so we don't lose track. Unfortunately I can't get to it until next week (grant deadline). > > chris > > On Sep 15, 2011, at 8:44 AM, Frank Schwach wrote: > > > Hi Kumar, > > > > We are currently working on this module and you might want to check out > > the latest version on Chris Field's github project: > > > > https://github.com/cjfields/Bio-Tools-Primer3Redux > > > > There will probably be some changes again once I get some time again to > > work on a few points we discussed lately. 
You can also check out my repo > > here: > > https://github.com/fschwach/Bio-Tools-Primer3Redux > > > > but I will certainly have to make changes to that code because I used > > AUTOLAD in the last version, which is probably not a good idea. > > My recommendation would be to use Chris' repo and see if that works for > > you. If not, feedback would be much appreciated. > > > > Cheers, > > > > Frank > > > > > > > > > > On Thu, 2011-09-15 at 13:20 +0200, kumar Saurabh wrote: > >> Hi, > >> > >> I need to integrate the primer3 module in one of our pipeline. In a process, > >> I was testing the initial code given on the CPAN website. But whenever I try > >> to run this program its giving me error...that "Cannot locate the Object > >> method add_target via the package Bio::Tools:Run::Primer3Redux...." > >> > >> The line of codes I am using is as follows: > >> > >> # design some primers. > >> # the output will be put into temp.out > >> use Bio::Tools::Primer3Redux; > >> use Bio::Tools::Run::Primer3Redux; > >> use Bio::SeqIO; > >> > >> my $seqio = Bio::SeqIO->new(-file=>'sample.dna'); > >> my $seq = $seqio->next_seq; > >> > >> my $primer3 = Bio::Tools::Run::Primer3Redux->new(-outfile => "temp.out", > >> -path => > >> "/home/singh/Downloads/primer3-2.2.3/src/primer3_core"); > >> > >> # or after the fact you can change the program_name > >> $primer3->program_name('my_superfast_primer3'); > >> > >> unless ($primer3->executable) { > >> print STDERR "primer3 can not be found. Is it installed?\n"; > >> exit(-1) > >> } > >> > >> # set the maximum and minimum Tm of the primer > >> $primer3->add_targets('PRIMER_MIN_TM'=>56, 'PRIMER_MAX_TM'=>90); > >> > >> # Design the primers. This runs primer3 and returns a > >> # Bio::Tools::Primer3::result object with the results > >> # Primer3 can run in several modes (see explanation for > >> # 'PRIMER_TASK' in the primer3 doccumentation). 
To run a task, > >> # either call it by its PRIMER_TASK name as in these examples: > >> $pcr_primer_results = $primer3->pick_pcr_primers($seq); > >> $pcr_and_hyb_results = $primer3->pick_pcr_primers_and_hyb_probe( $seq ); > >> $check_results = $primer3->check_primers(); > >> > >> # Alternatively, explicitly set the PRIMER_TASK parameter and > >> # use the generic 'run' method (this is mainly here for backwards > >> # compatibility) : > >> $primer3->PRIMER_TASK( 'pick_left_only' ); > >> $result = $primer3->run( $seq ); > >> > >> # If no task is set and the 'run' method is called, primer3 will default > >> to > >> # pick pcr primers. > >> > >> # see the Bio::Tools::Primer3Redux POD for > >> # things that you can get from this. For example: > >> > >> print "There were ", $results->num_primer_pairs, " primer pairs\n"; > >> > >> > >> Can anyone help me with this??? > >> > >> > >> Best regards, > >> Kumar > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > The Wellcome Trust Sanger Institute is operated by Genome Research > > Limited, a charity registered in England with number 1021457 and a > > company registered in England with number 2742969, whose registered > > office is 215 Euston Road, London, NW1 2BE. > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
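
For anyone who meets the same "cannot locate object method" error while the module is still changing: Perl can be asked whether an object supports a method before the call is made. The sketch below shows only that general technique, using the built-in `can` method. `My::WrapperStub` and its `set_parameters` method are hypothetical stand-ins invented for this illustration and are not the real Primer3Redux interface; the installed module's POD remains the place to look up the actual method names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a wrapper object. This is NOT the real
# Primer3Redux API; it only gives the check below something to inspect.
package My::WrapperStub;

sub new {
    my ($class) = @_;
    return bless { params => {} }, $class;
}

sub set_parameters {
    my ( $self, %p ) = @_;
    $self->{params}{$_} = $p{$_} for keys %p;
    return $self;
}

package main;

# Return those names from @names that $object can actually be called with.
sub supported_methods {
    my ( $object, @names ) = @_;
    return grep { $object->can($_) } @names;
}

my $wrapper = My::WrapperStub->new;
for my $method (qw(add_targets set_parameters)) {
    printf "%-15s %s\n", $method,
        $wrapper->can($method) ? 'available' : 'missing';
}
```

Running the same loop against a real wrapper object shows quickly whether example code from a web page still matches the version that is installed.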
From manju.rawat2 at gmail.com Fri Sep 16 05:09:25 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Fri, 16 Sep 2011 01:09:25 -0400
Subject: [Bioperl-l] no blast result
In-Reply-To: <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk>
References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk>
 <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk>
Message-ID:

Hello Frank,

Yes, you are right. I tried to run BLAST in the terminal, but it is not working.
I have installed the latest version of BLAST and downloaded the database
correctly.

But when I run the blastn -help command in the terminal, it shows this error:
blastn: error while loading shared libraries: libbz2.so.1: cannot open
shared object file: No such file or directory.


And when I run the blastall command, it shows:
legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out
Can't exec "/usr/bin/blastn": No such file or directory at
/usr/bin/legacy_blast.pl line 85.
Program failed, try executing the command manually.

This happens even though I have set the PATH environment variable:
PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin

I have checked everything but am not able to find the error.

Please help me.
Manju

From manju.rawat2 at gmail.com Fri Sep 16 05:12:03 2011
From: manju.rawat2 at gmail.com (Manju Rawat)
Date: Fri, 16 Sep 2011 01:12:03 -0400
Subject: [Bioperl-l] Command line error in BLAST+
Message-ID:

Hi,


I have installed the latest version of BLAST and downloaded the database
correctly, using this tutorial:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html

But when I run the blastn -help command in the terminal, it shows this error:
blastn: error while loading shared libraries: libbz2.so.1: cannot open
shared object file: No such file or directory.


And when I run the blastall command, it shows:
legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out
Can't exec "/usr/bin/blastn": No such file or directory at
/usr/bin/legacy_blast.pl line 85.
Program failed, try executing the command manually. While i have set the path of environment variable PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin I have checked everything but not able tp fine the error.. Pl help me. Thanks Manju From p.j.a.cock at googlemail.com Fri Sep 16 08:15:46 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 16 Sep 2011 09:15:46 +0100 Subject: [Bioperl-l] Command line error in BLAST+ In-Reply-To: References: Message-ID: On Fri, Sep 16, 2011 at 6:12 AM, Manju Rawat wrote: > Hi, > > > I have installed the latest version of blast and download the database > correctly Using this tutorial > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html > > But when i running blastn-help command in terminal it showing me error that > blastn: error while loading shared libraries: libbz2.so.1: cannot open > shared object file: No such file or directory. > > > and when i am running the blastall command then it showing that > *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out* > Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/ > legacy_blast.pl line 85. > Program failed, try executing the command manually. > > While i have set the path of environment variable > PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin > > I have checked everything but not able tp fine the error.. > > Pl help me. > > Thanks > Manju You're using the BioPerl wrapper for legacy blast (blastall), which is not installed. Instead you have the new blast+ suite which includes a wrapper using the perl script legacy_blast.pl to imitate the old blastall tool (in this case it calls the new tool blastn). Fix 1: Edit legacy_blast.pl to use the path to blastn etc under your home directory Fix 2: Install BLAST+ at system level Fix 3: Use the BioPerl wrapper for BLAST+ instead. I'd go with option 3. 
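
Before choosing between the fixes above, it may help to confirm where the BLAST+ programs actually are. The following is only a sketch in plain Perl, not part of BioPerl or BLAST+; `find_program` is a hypothetical helper name. It walks the directories in PATH and reports the first executable copy of each program. It does not change where legacy_blast.pl looks (the error message above suggests it tried /usr/bin), and it says nothing about the separate libbz2.so.1 loading error.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable file named $prog found in
# @dirs, or undef if none of the directories contains one.
# (find_program is a hypothetical helper, not a BioPerl function.)
sub find_program {
    my ($prog, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $prog);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Directories from the current PATH, in search order.
my @path_dirs = File::Spec->path;

for my $prog (qw(blastn legacy_blast.pl)) {
    my $found = find_program($prog, @path_dirs);
    printf "%-16s %s\n", $prog, defined $found ? $found : 'not found on PATH';
}
```

If blastn shows up only under the home directory, that would be consistent with the "Can't exec /usr/bin/blastn" message: the programs are installed, but not where the wrapper script expects them.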
Peter From p.j.a.cock at googlemail.com Fri Sep 16 08:17:58 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 16 Sep 2011 09:17:58 +0100 Subject: [Bioperl-l] no blast result In-Reply-To: References: <1315835661.3797.563.camel@deskpro15336.internal.sanger.ac.uk> <1315926565.3797.587.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: On Fri, Sep 16, 2011 at 6:09 AM, Manju Rawat wrote: > Hello Frank, > > Yes,u r rite..I tried to run blast in terminal but its not working.. > I have installed the latest version of blast and download the database > correctly.. > > But when i running blastn-help command in terminal it showing me error that > blastn: error while loading shared libraries: libbz2.so.1: cannot open > shared object file: No such file or directory. > > > and when i am running the blastall command then it showing that > legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out > Can't exec "/usr/bin/blastn": No such file or directory at > /usr/bin/legacy_blast.pl line 85. > Program failed, try executing the command manually. > > While i have set the path of environment variable > PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin > > I have checked everything but not able tp fine the error.. > > Pl help me. > Manju For the benefit of anyone reading the archives later, I tried to answer this in Manju's new thread: http://lists.open-bio.org/pipermail/bioperl-l/2011-September/035696.html Peter From fs5 at sanger.ac.uk Fri Sep 16 08:36:37 2011 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Fri, 16 Sep 2011 09:36:37 +0100 Subject: [Bioperl-l] Command line error in BLAST+ In-Reply-To: References: Message-ID: <1316162197.3797.721.camel@deskpro15336.internal.sanger.ac.uk> Hi Manju, Are you on Ubuntu? I think I've seen problems with this bzip library on Ubuntu before. It's not a problem with BLAST in any case. Should be possible to install the missing files through your package manager. 
I'm sure Google will know what to do :) Not sure what went wrong with your blast installation. What happens if you run blastall directly (without the legacy_blast.pl script)? In any case, it might be better to ask the NCBI people for help with the BLAST installation as this is not a BioPerl problem. cheers, Frank On Fri, 2011-09-16 at 01:12 -0400, Manju Rawat wrote: > Hi, > > > I have installed the latest version of blast and download the database > correctly Using this tutorial > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html > > But when i running blastn-help command in terminal it showing me error that > blastn: error while loading shared libraries: libbz2.so.1: cannot open > shared object file: No such file or directory. > > > and when i am running the blastall command then it showing that > *legacy_blast.pl blastall -p blastn -i amino.fa -d nr -o blast.out* > Can't exec "/usr/bin/blastn": No such file or directory at /usr/bin/ > legacy_blast.pl line 85. > Program failed, try executing the command manually. > > While i have set the path of environment variable > PATH=$PATH:/home/abc/ncbi-blast-2.2.25+/bin > > I have checked everything but not able tp fine the error.. > > Pl help me. > > Thanks > Manju > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From ross at cuhk.edu.hk Fri Sep 16 08:51:38 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Fri, 16 Sep 2011 16:51:38 +0800 Subject: [Bioperl-l] use blast to extract similar sequences In-Reply-To: References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> Message-ID: <085501cc744d$d90b4500$8b21cf00$@edu.hk> I wonder whether bioperl has any built-in modules that extracts sequences based on blast results. For example, a short query sequence of length 1000 is to blast against a reference genome of 3M. The homologous sequence of 1000 +/- 20 is extracted. Why is +/- 20 needed? Because we can't guarantee there must have a good match. Frequent blast users may be well aware that then there can be coverage, split-up due to local alignments, etc and that's why I would like to know if anybody has already developed a module to handle this kind of problem. Thanks in advance! From cjfields at illinois.edu Fri Sep 16 13:22:07 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 16 Sep 2011 13:22:07 +0000 Subject: [Bioperl-l] use blast to extract similar sequences In-Reply-To: <085501cc744d$d90b4500$8b21cf00$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk> Message-ID: That seems like a pretty straightforward thing to do; there isn't an all-in-one way of doing this, but that's a good thing (it's a separation of concerns). 
1) Run and parse BLAST results and grab seqID and coordinates for each hit (or each HSP for each hit) (Bio::SearchIO)
2) Pull the right subsequence +/- 20bp using above from the indexed flat file of your reference (Bio::DB::Fasta)

You can get revcomped sequence from Bio::DB::Fasta directly by flipping coordinates:

    # raw sequence
    my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
    my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);

chris

On Sep 16, 2011, at 3:51 AM, Ross KK Leung wrote:

> I wonder whether bioperl has any built-in modules that extracts sequences based on blast results. For example, a short query sequence of length 1000 is to blast against a reference genome of 3M. The homologous sequence of 1000 +/- 20 is extracted. Why is +/- 20 needed? Because we can't guarantee there must have a good match. Frequent blast users may be well aware that then there can be coverage, split-up due to local alignments, etc and that's why I would like to know if anybody has already developed a module to handle this kind of problem. Thanks in advance!
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From wsavigne at yahoo.com Fri Sep 16 20:45:12 2011
From: wsavigne at yahoo.com (Willy Savigne)
Date: Fri, 16 Sep 2011 13:45:12 -0700 (PDT)
Subject: [Bioperl-l] question Bioperl installation
Message-ID: <1316205912.93564.YahooMailClassic@web160515.mail.bf1.yahoo.com>

my name is william how do download Bioperl i tried other site but NOTHING i would like to know info in downloading bioperl. This is my first time into knowing bioinformatic i just got a book developing bioinformatic and beginning perl bioinformatic. I do a lot of DNA and RNA sequencing and more.

Thank u willy

From ross at cuhk.edu.hk Sun Sep 18 10:51:05 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sun, 18 Sep 2011 18:51:05 +0800
Subject: [Bioperl-l] snp/frameshift identification
In-Reply-To: 
References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk>
Message-ID: <08a901cc75f0$dd463b30$97d2b190$@edu.hk>

Dear Bioperl-users,

Following Fields, Christopher J's advice on sequence extraction, I managed to proceed to the last stage of non-synonymous SNP identification. Now what I have in hand is thousands of reliable multiple sequence alignment files, e.g.

>seq1
ATGACAGACACGACGTTGCCGTAG
>seq2
ATGACAGACACGACGTAGCCGTAG
>seq3
ATGACAGACACGACGTTGCCGTAG

Seq2 has a T->A mutation that creates a stop codon (TTG becomes TAG). I wonder if Bioperl can handle this kind of SNP, frameshift or nonsense mutation that changes the amino acids in the translated protein product. Thanks again to the community, which has helped me a great deal so I could catch up on progress during this Sat/Sun!!

From rondonbio at yahoo.com.br Mon Sep 19 13:46:36 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Mon, 19 Sep 2011 06:46:36 -0700 (PDT)
Subject: [Bioperl-l] help-> SearchIO
Message-ID: <1316439996.6247.YahooMailNeo@web130220.mail.mud.yahoo.com>

Hi guys! I need your help with a loop that I have in SearchIO. I need to check the nucleotide coverage of queries using BLAST. I'm using the script below. It opens the alignment and creates an array for each query with a zero in each nucleotide position, but when I add values to the coverage of each nucleotide, the script does it once and passes on to another query. Can you help me? Thank you very much,

Rondon
a Brazilian friend.

    use Bio::SearchIO;

    my $alignment = new Bio::SearchIO ( -format => 'blastXML',
                                        -file   => $alignment_file );

    my %positions;
    while (my $result = $alignment->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                my $query_name = $result->query_name();
                my $tam = $result -> query_length();
                my @pos = $hsp->seq_inds('query','identical');
                for (0..$tam){ ${$positions{$query_name}}[$_] = 0 } # make arrays for each query and populate with 0 in each position
                foreach my $num (@pos) {
                    ${$positions{$query_name}}[$num -1]++;    #This loop is where I believe that is an error.
                }
            }
        }
    }

    foreach my $key (keys %positions){
        print "$key\t@{$positions{$key}}\n";
    }

    exit;

From roy.chaudhuri at gmail.com Mon Sep 19 16:29:41 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 19 Sep 2011 17:29:41 +0100
Subject: [Bioperl-l] help-> SearchIO
In-Reply-To: <1316439996.6247.YahooMailNeo@web130220.mail.mud.yahoo.com>
References: <1316439996.6247.YahooMailNeo@web130220.mail.mud.yahoo.com>
Message-ID: <4E776DF5.6040504@gmail.com>

Hi Rondon,

The line where you populate your arrayref with 0 (starting "for (0..$tam)") is within the HSP loop, so the array is reset to zeros for every successive HSP and the counts from the previous ones are lost. You will therefore only see the data for the last HSP from each query. If you move that line to execute once per result (i.e. just after the line starting "while (my $result ="), then I think it should work as you intended. You will also need to move the "my $tam" line up with it, since the loop uses $tam.

Cheers,
Roy.

On 19/09/2011 14:46, Rondon Neto wrote:
> Hi guys! I need your help with a loop that I have in SearchIO. I need
> to check the nucleotide coverage of queries using BLAST. I'm using the
> script below. It opens the alignment and creates an array for each query
> with a zero in each nucleotide position, but when I add values to the
> coverage of each nucleotide, the script does it once and passes on to
> another query. Can you help me? Thank you very much,
>
> Rondon a Brazilian friend.
> > use Bio::SearchIO; > > my $alignment = new Bio::SearchIO ( -format => 'blastXML', > -file => $alignment_file ); > > my %positions; > while (my $result = $alignment->next_result) { > while (my $hit = $result->next_hit) { > while (my $hsp = $hit->next_hsp) { > my $query_name = $result->query_name(); > my $tam = $result -> query_length(); > my @pos = $hsp->seq_inds('query','identical'); > for (0..$tam){ ${$positions{$query_name}}[$_] = 0 } # make arrays for each query and populate with 0 in each position > foreach my $num (@pos) { > ${$positions{$query_name}}[$num -1]++; #This loop is where I believe that is an error. > } > } > } > } > > foreach my $key (keys %positions){ > print "$key\t@{$positions{$key}}\n"; > } > > exit; > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Mon Sep 19 16:39:40 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 19 Sep 2011 17:39:40 +0100 Subject: [Bioperl-l] question Bioperl installation In-Reply-To: <1316205912.93564.YahooMailClassic@web160515.mail.bf1.yahoo.com> References: <1316205912.93564.YahooMailClassic@web160515.mail.bf1.yahoo.com> Message-ID: <4E77704C.20604@gmail.com> Hi Willy, There are instructions for downloading and installing BioPerl on the wiki: http://www.bioperl.org/wiki/Getting_BioPerl http://www.bioperl.org/wiki/Installing_BioPerl These are the first two results when you Google for "bioperl download". Note that the wiki is a little out of date, the latest BioPerl version is 1.6.901: http://search.cpan.org/~cjfields/BioPerl-1.6.901/ Cheers, Roy. On 16/09/2011 21:45, Willy Savigne wrote: > my name is william how do download Bioperl i tried other site but > NOTHING i would like to know info in downloading bioperl .This is > my first time into knowing bioinformatic i just got a book > developing bioinformatic and begginning perl bioinformatic. 
I do alot > Dna and RNA sequencing and more. > > Thank u willy > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Tue Sep 20 17:01:21 2011 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 20 Sep 2011 13:01:21 -0400 Subject: [Bioperl-l] Question about a phylogenetic tree Message-ID: <36D93DC4-135D-408F-8169-AC8D5E59BD90@verizon.net> All, I have code that starts with a sequence file and makes a tree (Bio::Tree::Tree) using Muscle to align and then Phyml, here's the last part that makes the tree: ..... get the files etc .... my %alignparams = ( -seqtype => 'nucleo', -usetree_nowarn => $guidetreefile, -in => $tempfile ); my $aligner = Bio::Tools::Run::Alignment::Muscle->new(%alignparams); # $align is a Bio::SimpleAlign object my $align = $aligner->align($tempfile); my %treeparams = ( -data_type => 'nt', -model => 'K80', # Kimura -tree => 'BIONJ', -bootstrap => 1000 ); my $treemaker = Bio::Tools::Run::Phylo::Phyml->new(%treeparams); #$tree is a Bio::Tree::Tree object my $tree = $treemaker->run($align); My question: do I get the pairwise distance between 2 sequences (based on Kimura here) by doing something like: $distance = $tree->subtree_length($internal_node) Where $internal_node is the parent of the pair in question? Excuse me if this is obvious, have never made Bioperl trees before! Brian O. From bosborne11 at verizon.net Tue Sep 20 19:17:13 2011 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 20 Sep 2011 15:17:13 -0400 Subject: [Bioperl-l] Question about a phylogenetic tree In-Reply-To: <36D93DC4-135D-408F-8169-AC8D5E59BD90@verizon.net> References: <36D93DC4-135D-408F-8169-AC8D5E59BD90@verizon.net> Message-ID: Ah, I see: my $distances = $tree->distance(-nodes => [$node1,$node2]); Brian O. 
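
For anyone trying the `distance` call above, a minimal sketch with a toy tree follows. The Newick string and its branch lengths are made up for illustration and are not Phyml output. Note that `distance` sums branch lengths along the path between two nodes in the tree, which is not necessarily the same number as the raw pairwise K80 distance computed from the alignment.

```perl
use strict;
use warnings;
use Bio::TreeIO;

# Toy Newick tree with made-up branch lengths (an assumption for this
# example, not the output of Muscle/Phyml).
my $newick = "((A:1,B:2):3,C:4);\n";
open my $fh, '<', \$newick or die "Cannot open in-memory tree: $!";

my $treeio = Bio::TreeIO->new(-fh => $fh, -format => 'newick');
my $tree   = $treeio->next_tree;

# In scalar context find_node returns the first node with a matching id.
my $node_a = $tree->find_node(-id => 'A');
my $node_b = $tree->find_node(-id => 'B');
my $node_c = $tree->find_node(-id => 'C');

# Path length between two nodes: the branch lengths from each node up to
# their lowest common ancestor, added together.
my $d_ab = $tree->distance(-nodes => [$node_a, $node_b]);
my $d_ac = $tree->distance(-nodes => [$node_a, $node_c]);

print "A-B: $d_ab\n";
print "A-C: $d_ac\n";
```

Here A and B are siblings, so their distance is just their two terminal branches (1 + 2), while the A to C path also crosses the internal branch of length 3.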
On Sep 20, 2011, at 1:01 PM, Brian Osborne wrote: > All, > > I have code that starts with a sequence file and makes a tree (Bio::Tree::Tree) using Muscle to align and then Phyml, here's the last part that makes the tree: > > ..... get the files etc .... > > my %alignparams = ( > -seqtype => 'nucleo', > -usetree_nowarn => $guidetreefile, > -in => $tempfile > ); > my $aligner = Bio::Tools::Run::Alignment::Muscle->new(%alignparams); > > # $align is a Bio::SimpleAlign object > my $align = $aligner->align($tempfile); > > my %treeparams = ( > -data_type => 'nt', > -model => 'K80', # Kimura > -tree => 'BIONJ', > -bootstrap => 1000 > ); > my $treemaker = Bio::Tools::Run::Phylo::Phyml->new(%treeparams); > > #$tree is a Bio::Tree::Tree object > my $tree = $treemaker->run($align); > > My question: do I get the pairwise distance between 2 sequences (based on Kimura here) by doing something like: > > $distance = $tree->subtree_length($internal_node) > > Where $internal_node is the parent of the pair in question? Excuse me if this is obvious, have never made Bioperl trees before! > > Brian O. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From manju.rawat2 at gmail.com Thu Sep 22 11:07:39 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Thu, 22 Sep 2011 16:37:39 +0530 Subject: [Bioperl-l] database for Bos Touras Message-ID: Hello To All, I want to blast my sequence Only in Bos Touras Database Using Local Blast(Blast+). But I dnt Know which database I should use for this From this Link. ftp://ftp.ncbi.nlm.nih.gov/blast/db/ Pl tell me which DB I Should use?? 
Thanks Manju From hrh at fmi.ch Thu Sep 22 11:44:56 2011 From: hrh at fmi.ch (Hans-Rudolf Hotz) Date: Thu, 22 Sep 2011 13:44:56 +0200 Subject: [Bioperl-l] database for Bos Touras In-Reply-To: References: Message-ID: <4E7B1FB8.8090208@fmi.ch> assuming you mean 'Bos taurus', it might be easier to get the data from ucsc: http://hgdownload.cse.ucsc.edu/downloads.html#cow or ensembl: ftp://ftp.ensembl.org/pub/release-64/fasta/bos_taurus/dna/ Regards, Hans On 09/22/2011 01:07 PM, Manju Rawat wrote: > Hello To All, > > I want to blast my sequence Only in Bos Touras Database Using Local > Blast(Blast+). > But I dnt Know which database I should use for this From this Link. > ftp://ftp.ncbi.nlm.nih.gov/blast/db/ > > Pl tell me which DB I Should use?? > > Thanks > Manju > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hrh at fmi.ch Thu Sep 22 12:16:00 2011 From: hrh at fmi.ch (Hans-Rudolf Hotz) Date: Thu, 22 Sep 2011 14:16:00 +0200 Subject: [Bioperl-l] database for Bos Touras In-Reply-To: References: <4E7B1FB8.8090208@fmi.ch> Message-ID: <4E7B2700.8080904@fmi.ch> Yes, BLAST uses fasta files. You (may need to concatenate the individual chromosomes and the you) need to index them with 'makeblastdb' which is also part of the blast+ software package. see: http://www.ncbi.nlm.nih.gov/books/NBK1762/ Hans On 09/22/2011 01:49 PM, Manju Rawat wrote: > It will work on Local Blast or not?????? From bosborne11 at verizon.net Thu Sep 22 16:16:39 2011 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 22 Sep 2011 12:16:39 -0400 Subject: [Bioperl-l] [bioperl-live] genbank_ref_extractor: new script to make search on entrez gene and retrieve related sequences (#23) In-Reply-To: References: Message-ID: <245C75D5-61EC-4395-B64F-47D8471568F5@verizon.net> Carne, This is impressive looking, it is now in scripts/. Thanks again, Brian O. On Sep 21, 2011, at 11:25 AM, Carn? 
Draug wrote: > Hi > > I wrote a script with bioperl that I would like to share back. It takes a list of searches for Entrez Gene and attempts to retrieve the related sequences (genomic, transcripts and proteins). It is also possible to obtain extra upstream and downstream bp for genomic sequences and control the naming of the files. In the end it can save all the results in a CSV file. > > Hope you find it up to your coding standards. Suggestions for improvements are welcome, including for a better name. > > Carn? > > You can merge this Pull Request by running: > > git pull https://github.com/carandraug/bioperl-live bp_genbank_ref_extractor > > Or you can view, comment on it, or merge it online at: > > https://github.com/bioperl/bioperl-live/pull/23 > > -- Commit Summary -- > > * genbank_ref_extractor: new script to make search on entrez gene and retrieve related sequences > > -- File Changes -- > > A scripts/Bio-DB-EUtilities/bp_genbank_ref_extractor.pl (1064) > > -- Patch Links -- > > https://github.com/bioperl/bioperl-live/pull/23.patch > https://github.com/bioperl/bioperl-live/pull/23.diff > > -- > Reply to this email directly or view it on GitHub: > https://github.com/bioperl/bioperl-live/pull/23 From bluecurio at gmail.com Thu Sep 22 19:32:07 2011 From: bluecurio at gmail.com (Daniel Renfro) Date: Thu, 22 Sep 2011 14:32:07 -0500 Subject: [Bioperl-l] Download RefSeq revision history programmatically Message-ID: <29B78DBCEBCA42B5A99461FE1E9BC33F@gmail.com> I am working on a project to find historical differences in GenBank/RefSeq files. I would like to download all the old revisions of a file (for example NC_000913 [http://www.ncbi.nlm.nih.gov/nuccore/NC_000913.2?report=girevhist]) using any technology available. I wrote a page-scraper in Perl, but I can't get NCBI to return plaintext, only HTML (which does nobody any good.) Does anyone know of a way to get all the "revisions" (not just "versions") of a GenBank/RefSeq file? 
-Daniel -- http://ecoliwiki.net/User:DanielRenfro Hu Lab Research Associate 979-862-4055 From ross at cuhk.edu.hk Tue Sep 27 14:16:14 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 27 Sep 2011 22:16:14 +0800 Subject: [Bioperl-l] obtain a distance matrix from tree In-Reply-To: References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk> Message-ID: <014201cc7d20$03abd4c0$0b037e40$@edu.hk> After using MEGA to generate a newick tree file (phylogram), I wonder if Bioperl has any convenient functions to derive the (n x n) distance (by NJ, MP etc) matrix. Thanks for your advice in advance! From thomas.sharpton at gmail.com Tue Sep 27 20:02:44 2011 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Tue, 27 Sep 2011 13:02:44 -0700 Subject: [Bioperl-l] obtain a distance matrix from tree In-Reply-To: <014201cc7d20$03abd4c0$0b037e40$@edu.hk> References: <048a01cc6f84$60c41090$224c31b0$@edu.hk> <04a601cc700e$4f051d10$ed0f5730$@edu.hk> <085501cc744d$d90b4500$8b21cf00$@edu.hk> <014201cc7d20$03abd4c0$0b037e40$@edu.hk> Message-ID: Hi Ross, For very large trees, I found it to be more efficient to do this in R using the ape package. 
I have a script listed in my github repo that will convert a tree to a distance matrix in R at the link below:

https://github.com/sharpton/PhylOTU/blob/master/tree_to_matrix.R

That said, I've also done this in Bioperl using something like the following:

    use Bio::TreeIO;

    # -file takes a file name; -fh would need an already opened filehandle
    my $treein = Bio::TreeIO->new( -file => "input_tree.nwk", -format => 'newick' );
    while( my $tree = $treein->next_tree ){
        my %dist_matrix = ();
        my @leaves = $tree->get_leaf_nodes;
        foreach my $leaf1( @leaves ){
            my $id1 = $leaf1->id;
            foreach my $leaf2( @leaves ){
                my $id2 = $leaf2->id;
                next if $id1 eq $id2;
                # skip pairs already stored in either orientation
                next if( defined( $dist_matrix{$id1}->{$id2} ) || defined ( $dist_matrix{$id2}->{$id1} ) );
                my $distance = $tree->distance( -nodes => [$leaf1, $leaf2] );
                $dist_matrix{$id1}->{$id2} = $distance;
            }
        }
        # print the distance matrix for this tree here
        # (%dist_matrix only exists inside this loop)
    }

This will put the information you need to create either a full or an upper triangle distance matrix into the hash %dist_matrix. I didn't test the above, so hopefully there are no bugs.... Someone else may have a more elegant solution.

Best,
Tom

PS: Sorry if you get this twice.

On Sep 27, 2011, at 7:16 AM, Ross KK Leung wrote:

> After using MEGA to generate a newick tree file (phylogram), I
> wonder if
> Bioperl has any convenient functions to derive the (n x n) distance
> (by NJ,
> MP etc) matrix. Thanks for your advice in advance!
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From ross at cuhk.edu.hk Wed Sep 28 03:57:52 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Wed, 28 Sep 2011 11:57:52 +0800
Subject: [Bioperl-l] ancestral state derived from Tree
In-Reply-To: <29B78DBCEBCA42B5A99461FE1E9BC33F@gmail.com>
References: <29B78DBCEBCA42B5A99461FE1E9BC33F@gmail.com>
Message-ID: <017701cc7d92$cba88d20$62f9a760$@edu.hk>

By using Tom's advice, I'm able to obtain the distance matrix for the following tree by Bioperl TreeIO.

((((((A:1.00000000,B:1.00000000):1.00000000,C:1.00000000):0.00000000,D:0.00000000):1.00000000,(E:0.00000000,(F:2.00000000,G:1.00000000):0.00000000):0.00000000):2.00000000,(H:3.00000000,(I:2.00000000,(J:1.00000000,(K:2.00000000,(L:2.00000000,M:2.00000000):0.00000000):0.00000000):0.00000000):0.00000000):0.00000000):1.00000000,(N:0.00000000,((O:0.00000000,P:0.00000000):1.00000000,(Q:2.00000000,(R:2.66666667,S:3.66666667):3.66666667):0.00000000):1.00000000):3.00000000,(T:0.00000000,(U:0.00000000,V:0.00000000):1.00000000):16.00000000);

The last few nodes T, U and V should be monophyletic, but U and V should be more closely related to each other. Although I can use TreeIO methods like is_monophyletic or is_paraphyletic to test this case, the problem becomes more tricky for nodes A, B, C and D, because D is actually no different from the common ancestor of nodes A, B, C and D (its branch length is zero). Since is_monophyletic does not take this case into account, is there any workaround?
I have to pay attention to such a detail in order to make a better guess for the ancestral state(s) at various points of this tree. Thanks again for the TreeIO developers for making tree analysis easier for us biologists! From manju.rawat2 at gmail.com Wed Sep 28 09:54:07 2011 From: manju.rawat2 at gmail.com (Manju Rawat) Date: Wed, 28 Sep 2011 15:24:07 +0530 Subject: [Bioperl-l] how to blast a seq against multiple dbase Message-ID: Hello, I have downloaded all the chromosome of Bos Taurus and i'd changed them in blast format using makeblastdb..and now i want to localy blast my sequence against these all chromosome.. now i have 29 database.Is there any method by which can i blast my sequence against all 29 database in my program.. whta should i write in database???? @params = ('database' => '????????', 'outfile' => 'blast2.out', '_READMETHOD' => 'Blast', 'prog'=> 'blastn'); Thanks Manju Rawat From p.j.a.cock at googlemail.com Wed Sep 28 10:02:07 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 28 Sep 2011 11:02:07 +0100 Subject: [Bioperl-l] how to blast a seq against multiple dbase In-Reply-To: References: Message-ID: On Wed, Sep 28, 2011 at 10:54 AM, Manju Rawat wrote: > Hello, > > I have downloaded all the chromosome of Bos Taurus and i'd changed them in > blast format using makeblastdb..and now i want to localy blast my sequence > against these all chromosome.. > now i have 29 database.Is there any method by which can i blast my sequence > against all 29 database in my program.. > > > whta should i write in database???? > > @params = ('database' => '????????', 'outfile' => 'blast2.out', > ? ? ? ?'_READMETHOD' => 'Blast', 'prog'=> 'blastn'); > The simple answer is make a combined database. This works internally with alias files, have a look at the NR and NT databases for example - they act like singe databases but are actually a collection of chunks. 
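
The alias-file idea can be sketched roughly as follows. This assumes the plain-text .nal layout seen in NCBI's own nucleotide alias files (a TITLE line and a DBLIST line); the database names, the alias name and the helper `alias_file_text` are all hypothetical. BLAST+ also ships a `blastdb_aliastool` program for this job, which is probably the safer route if it is available.

```perl
use strict;
use warnings;

# Build the text of a BLAST nucleotide alias file (.nal) that groups
# several existing databases under one name.
# (alias_file_text is a hypothetical helper, not a BioPerl function.)
sub alias_file_text {
    my ($title, @db_names) = @_;
    return "TITLE $title\nDBLIST " . join(' ', @db_names) . "\n";
}

# Hypothetical database names: one makeblastdb database per chromosome.
my @dbs = map { "bt_chr$_" } 1 .. 29;

# Hypothetical alias name; the .nal file is written to the current
# directory, so the listed databases are assumed to live there too.
my $alias = 'bos_taurus_all';
open my $out, '>', "$alias.nal" or die "Cannot write $alias.nal: $!";
print {$out} alias_file_text('Bos taurus, all chromosomes', @dbs);
close $out or die "Cannot close $alias.nal: $!";

print "Wrote $alias.nal listing ", scalar(@dbs), " databases\n";
```

With such a file in place, the single name (here bos_taurus_all) would be given as the database, instead of looping over 29 separate names.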
Even simpler would be to combine your Bos taurus sequence files into a single multi-entry FASTA file, and make that into a single BLAST database. Peter From awitney at sgul.ac.uk Wed Sep 28 10:42:39 2011 From: awitney at sgul.ac.uk (Adam Witney) Date: Wed, 28 Sep 2011 11:42:39 +0100 Subject: [Bioperl-l] how to blast a seq against multiple dbase In-Reply-To: References: Message-ID: I think if you want to keep the databases separate you would need to create a factory for each database, something like this foreach my $db ( @databases ) { my $factory = Bio::Tools::Run::StandAloneBlastPlus->new( -db_data => $db , < ? any other params ? > ); ? do blast stuff? } or as Peter says in another email you could combine your databases and run one query then filter them out in the results. regards adam On 28 Sep 2011, at 10:54, Manju Rawat wrote: > Hello, > > I have downloaded all the chromosome of Bos Taurus and i'd changed them in > blast format using makeblastdb..and now i want to localy blast my sequence > against these all chromosome.. > now i have 29 database.Is there any method by which can i blast my sequence > against all 29 database in my program.. > > > whta should i write in database???? > > @params = ('database' => '????????', 'outfile' => 'blast2.out', > '_READMETHOD' => 'Blast', 'prog'=> 'blastn'); > > > > Thanks > Manju Rawat > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From carandraug+dev at gmail.com Wed Sep 28 11:43:02 2011 From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=) Date: Wed, 28 Sep 2011 12:43:02 +0100 Subject: [Bioperl-l] retrieving bioperl version for scripts Message-ID: Hi everyone, is there a recommended way to get the version of a script that is part of bioperl (the ones in the scripts directory)? 
Rather than hard coding the version of the script independent of bioperl, I thought on using the bioperl version itself. How can this be done? Thanks in advance, Carn? From carandraug+dev at gmail.com Wed Sep 28 15:00:34 2011 From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=) Date: Wed, 28 Sep 2011 16:00:34 +0100 Subject: [Bioperl-l] retrieving bioperl version for scripts In-Reply-To: References: Message-ID: 2011/9/28 longbow leo : > Hi, Carn?, > > Do you mean this: > > perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION' > > In my machine, the output is Thank you. Yes this is what I was looking for. I looked down how that variable comes up and so I think I'll use use Bio::Root::Version; say $Bio::Root::Version::VERSION; Carn? From pcantalupo at gmail.com Wed Sep 28 16:54:19 2011 From: pcantalupo at gmail.com (Paul Cantalupo) Date: Wed, 28 Sep 2011 12:54:19 -0400 Subject: [Bioperl-l] algorithm_version not working in multi-result blast output Message-ID: Hello, I'm using the most recent copy of bioperl-live (pulled yesterday). I have a BLASTN (from blast+) output file for 3 query sequences (https://gist.github.com/1248342). I used this script, https://gist.github.com/1248338, to print the query id, algorithm and algorithm_version for each result. When I run the script, I get the following output: GFAVMM201BADC0 ?BLASTN ?2.2.25+ GFAVMM201A1JOH ?BLASTN GFAVMM201D933Z ?BLASTN Algorithm_version outputs the correct version for the first result but outputs the empty string for the 2nd and 3rd query. Why? This functionality worked about a month ago. What has changed to cause this to happen? Thank you, Paul From rondonbio at yahoo.com.br Wed Sep 28 19:47:45 2011 From: rondonbio at yahoo.com.br (Rondon Neto) Date: Wed, 28 Sep 2011 12:47:45 -0700 (PDT) Subject: [Bioperl-l] best Hit Message-ID: <1317239265.98674.YahooMailNeo@web130214.mail.mud.yahoo.com> Hi guys.? 
I have this subroutine that returns a hash with the nucleotide coverage of each query from a blast alignment. Now I want to count only unique hits: if a hit has already been aligned with a query, it must be eliminated from my experiment. Can anyone check whether it's right, or fix it for me? Is there a way to do that directly in blast?

Thank you
Rondon Neto

sub nucleotide_coverage {
    # Bio::SearchIO dependent
    # This subroutine returns a hash and a file with the nucleotide coverage
    # for each query in a blast alignment XML file. The input is the
    # alignment file.
    my ($alignment_file) = @_;
    my $alignment = new Bio::SearchIO (
        -format => 'blastXML',
        -file   => $alignment_file
    );
    my %positions;
    my @used_reads;
    while (my $result = $alignment->next_result) {
        my $query_name = $result->query_name();
        my $tam = $result->query_length();
        for (0..$tam-1) { ${$positions{$query_name}}[$_] = 0 }
        while (my $hit = $result->next_hit) {
            my $hit_name = $hit->name;
            # Here is my best hit parser. Is it ok?
            foreach my $read (@used_reads) {
                if ( $read eq $hit_name ) { next; }
            }
            while (my $hsp = $hit->next_hsp) {
                my $query_name = $result->query_name();
                my @pos = $hsp->seq_inds('query','identical');
                foreach my $num (@pos) { ${$positions{$query_name}}[$num-1]++; }
            }
            push (@used_reads, $hit_name);
        }
    }
    my $outfile = "nucleotide_coverage.txt";
    open OUT, ">$outfile" or die $!;
    foreach my $key (keys %positions) {
        print OUT "$key\t@{$positions{$key}}\n";
    }
    close OUT;
    return \%positions;
}

From shalabh.sharma7 at gmail.com Wed Sep 28 19:53:07 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 28 Sep 2011 15:53:07 -0400
Subject: [Bioperl-l] Getting taxa from gi
Message-ID:

Hi All,

I know this has been discussed before, but this is kind of a new problem that I am facing. I want to get taxonomy (full lineage) information for a huge list of GIs. I am using Bio::DB::GenBank for this with perl-5.12.3. Here is my small script.

#!
/usr/local/perl-5.12.3/bin/perl -w
use strict;
use warnings;
use Bio::DB::GenBank;

my @ids = qw( CP000490 );
my $gbh = Bio::DB::GenBank->new();
foreach my $id ( @ids ) {
    # say "* ID: $id";
    my $seq = $gbh->get_Seq_by_acc( $id );
    my $org = $seq->species;
    #print "$org\n";
    my $class = join '-', $org->classification;
    print "$class\n";
}

The output is:

Paracoccus denitrificans PD1222-Paracoccus-Rhodobacteraceae-Rhodobacterales-Alphaproteobacteria-Proteobacteria-Bacteria

which is fine, but I also want to get the taxon ID and, if possible, the taxon IDs for the whole lineage classification. Ideally I would like to get something like this:

318586 - - - - - - - 1224 - 2

I would really appreciate your help.

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From cjfields at illinois.edu Wed Sep 28 21:36:37 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 28 Sep 2011 21:36:37 +0000
Subject: [Bioperl-l] retrieving bioperl version for scripts
In-Reply-To:
References:
Message-ID:

On Sep 28, 2011, at 10:00 AM, Carnë Draug wrote:

> 2011/9/28 longbow leo :
>> Hi, Carnë,
>>
>> Do you mean this:
>>
>> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION'
>>
>> In my machine, the output is
>
> Thank you. Yes this is what I was looking for. I looked down how that
> variable comes up and so I think I'll use
>
> use Bio::Root::Version;
> say $Bio::Root::Version::VERSION;
>
> Carnë

Just a warning on this: we're shortly to announce a major restructuring effort with BioPerl that will dramatically affect core versioning, mainly from the point of view that modularizing BioPerl into more manageable sub-distributions will require that each sub-distribution have its own version (not necessarily a bad thing).

chris

From cjfields at illinois.edu Wed Sep 28 21:40:48 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 28 Sep 2011 21:40:48 +0000
Subject: [Bioperl-l] algorithm_version not working in multi-result blast output
In-Reply-To:
References:
Message-ID: <4240A221-7BF7-4723-A7A1-30C806CE3DC6@illinois.edu>

Not sure, but I would hazard a guess that only the 'Query=' line is present in concatenated BLAST reports past the initial report, and the version isn't carried over (I recall this being a problem with algorithm() as well, but that was fixed a while ago).

This should be an easy enough fix, but can you submit it as a bug so we can track it?

chris

On Sep 28, 2011, at 11:54 AM, Paul Cantalupo wrote:

> Hello,
>
> I'm using the most recent copy of bioperl-live (pulled yesterday). I
> have a BLASTN (from blast+) output file for 3 query sequences
> (https://gist.github.com/1248342). I used this script,
> https://gist.github.com/1248338, to print the query id, algorithm and
> algorithm_version for each result. When I run the script, I get the
> following output:
>
> GFAVMM201BADC0 BLASTN 2.2.25+
> GFAVMM201A1JOH BLASTN
> GFAVMM201D933Z BLASTN
>
> Algorithm_version outputs the correct version for the first result but
> outputs the empty string for the 2nd and 3rd query. Why? This
> functionality worked about a month ago. What has changed to cause this
> to happen?
>
> Thank you,
>
> Paul

From carandraug+dev at gmail.com Wed Sep 28 22:07:53 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 28 Sep 2011 23:07:53 +0100
Subject: [Bioperl-l] retrieving bioperl version for scripts
In-Reply-To:
References:
Message-ID:

2011/9/28 Fields, Christopher J :
> On Sep 28, 2011, at 10:00 AM, Carnë
Draug wrote:
>
>> 2011/9/28 longbow leo :
>>> Hi, Carnë,
>>>
>>> Do you mean this:
>>>
>>> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION'
>>>
>>> In my machine, the output is
>>
>> Thank you. Yes this is what I was looking for. I looked down how that
>> variable comes up and so I think I'll use
>>
>> use Bio::Root::Version;
>> say $Bio::Root::Version::VERSION;
>>
>> Carnë
>
> Just a warning on this: we're shortly to announce a major restructuring effort with BioPerl that will dramatically affect core versioning, mainly from the point of view that modularizing BioPerl into more manageable sub-distributions will require that each sub-distribution have its own version (not necessarily a bad thing).

Where will the scripts end up after this restructuring? What I want is to create a version of the script (not of bioperl). Since the script is released with bioperl, they are the same. I actually already made the commit that does this, I just haven't bothered with the pull request yet.

Also, will there be a release before this change?

Carnë

From shalabh.sharma7 at gmail.com Thu Sep 29 14:37:53 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 29 Sep 2011 10:37:53 -0400
Subject: [Bioperl-l] GFF to GTF
Message-ID:

Hi,
Is there any module to convert a GFF file to GTF?

Thanks
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From cjfields at illinois.edu Thu Sep 29 15:07:27 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 29 Sep 2011 15:07:27 +0000
Subject: [Bioperl-l] retrieving bioperl version for scripts
In-Reply-To:
References:
Message-ID: <46733E36-5795-4EB6-9C2B-C978000FFD46@illinois.edu>

On Sep 28, 2011, at 5:07 PM, Carnë Draug wrote:

> 2011/9/28 Fields, Christopher J :
>> On Sep 28, 2011, at 10:00 AM, Carnë
Draug wrote:
>>
>>> 2011/9/28 longbow leo :
>>>> Hi, Carnë,
>>>>
>>>> Do you mean this:
>>>>
>>>> perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION'
>>>>
>>>> In my machine, the output is
>>>
>>> Thank you. Yes this is what I was looking for. I looked down how that
>>> variable comes up and so I think I'll use
>>>
>>> use Bio::Root::Version;
>>> say $Bio::Root::Version::VERSION;
>>>
>>> Carnë
>>
>> Just a warning on this: we're shortly to announce a major restructuring effort with BioPerl that will dramatically affect core versioning, mainly from the point of view that modularizing BioPerl into more manageable sub-distributions will require that each sub-distribution have its own version (not necessarily a bad thing).
>
> Where will the scripts end up after this restructuring? What I want
> is to create a version of the script (not of bioperl). Since the
> script is released with bioperl, they are the same. I actually already
> made the commit that makes this, just haven't bothered with the pull
> request yet.
>
> Also, will there be a release before this change?
>
> Carnë

Scripts will likely go with the distribution that they are most closely tied to, but that's still an area for debate (some may equally fall within one distribution or another, which will be tricky).
For more on the release aspects see the (currently being revised and thus not complete) wiki page:

http://www.bioperl.org/wiki/BioPerl_Modularization

chris

From pcantalupo at gmail.com Thu Sep 29 16:13:05 2011
From: pcantalupo at gmail.com (Paul Cantalupo)
Date: Thu, 29 Sep 2011 12:13:05 -0400
Subject: [Bioperl-l] algorithm_version not working in multi-result blast output
In-Reply-To: <4240A221-7BF7-4723-A7A1-30C806CE3DC6@illinois.edu>
References: <4240A221-7BF7-4723-A7A1-30C806CE3DC6@illinois.edu>
Message-ID:

Bug submitted: https://redmine.open-bio.org/issues/3298

On Wed, Sep 28, 2011 at 5:40 PM, Fields, Christopher J wrote:
> Not sure, but I would hazard a guess that only the 'Query=' line is present in concatenated BLAST reports past the initial report, and the version isn't carried over (I recall this being a problem with the algorithm() as well, but that was fixed a while ago).
>
> This should be an easy enough fix, but can you submit it as a bug so we can track it?
>
> chris
>
> On Sep 28, 2011, at 11:54 AM, Paul Cantalupo wrote:
>
>> Hello,
>>
>> I'm using the most recent copy of bioperl-live (pulled yesterday). I
>> have a BLASTN (from blast+) output file for 3 query sequences
>> (https://gist.github.com/1248342). I used this script,
>> https://gist.github.com/1248338, to print the query id, algorithm and
>> algorithm_version for each result. When I run the script, I get the
>> following output:
>>
>> GFAVMM201BADC0 BLASTN 2.2.25+
>> GFAVMM201A1JOH BLASTN
>> GFAVMM201D933Z BLASTN
>>
>> Algorithm_version outputs the correct version for the first result but
>> outputs the empty string for the 2nd and 3rd query. Why? This
>> functionality worked about a month ago. What has changed to cause this
>> to happen?
>>
>> Thank you,
>>
>> Paul

From jluis.lavin at unavarra.es Fri Sep 30 08:23:19 2011
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Fri, 30 Sep 2011 10:23:19 +0200
Subject: [Bioperl-l] Bio-Graphics module
Message-ID: <52f9bcbb5c40302fe5d1ea274982c24b.squirrel@webmail.unavarra.es>

Dear All,

I'm currently using Perl 5.10.0 version and Bioperl 1.6.1 running on a windows machine.

I read about the Bio-Graphics module and it'd be wonderful to install it, but seems like it is only available for Perl 5.8...
Is there any other Perl and/or Bioperl module to do the same kind of genomic and Blast report representation currently available?

Thanks in advance

--
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From cjfields at illinois.edu Fri Sep 30 12:38:01 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 30 Sep 2011 12:38:01 +0000
Subject: [Bioperl-l] Bio-Graphics module
In-Reply-To: <52f9bcbb5c40302fe5d1ea274982c24b.squirrel@webmail.unavarra.es>
References: <52f9bcbb5c40302fe5d1ea274982c24b.squirrel@webmail.unavarra.es>
Message-ID:

It's available for all perl versions from 5.8.8 up. I have it running with perl 5.14.

Now, I recall there being problems with installation on Mac OS X, though I think that was mainly due to GD.pm and libgd.

chris

On Sep 30, 2011, at 3:23 AM, wrote:

>
> Dear All,
>
> I'm currently using Perl 5.10.0 version and Bioperl 1.6.1 running on a
> windows machine.
>
> I read about the Bio-Graphics module and it'd be wonderful to install it,
> but seems like it is only available for Perl 5.8...
> Is there any other Perl and/or Bioperl module to do the same kind of
> genomic and Blast report representation currently available?
>
> Thanks in advance
>
> --
> Dr. José Luis Lavín Trueba
>
> Dpto. de Producción Agraria
> Grupo de Genética y Microbiología
> Universidad Pública de Navarra
> 31006 Pamplona
> Navarra
> SPAIN

From jillianrowe91286 at gmail.com Wed Sep 28 06:03:32 2011
From: jillianrowe91286 at gmail.com (Jill)
Date: Tue, 27 Sep 2011 23:03:32 -0700 (PDT)
Subject: [Bioperl-l] Gene Type in Entrez gene?
Message-ID:

Hi there,

I am using the Bio::DB::EUtilities module to download gene sequences based on a query.

while (my $docsum = $summaries->next_DocSum) {
    ## some items in DocSum are also named ChrStart so we pick the genomic
    ## information item and get the coordinates from it
    my ($genomic_info) = $docsum->get_Items_by_name('GenomicInfoType');

    ## some entries may have no data on genomic coordinates. This condition filters them out
    if (!$genomic_info) {
        ## found no genomic coordinates data
        next;
    }

    ## get coordinates of sequence
    ## get_contents_by_name always returns a list
    my ($chr_acc_ver) = $genomic_info->get_contents_by_name("ChrAccVer");
    my ($chr_start)   = $genomic_info->get_contents_by_name("ChrStart");
    my ($chr_stop)    = $genomic_info->get_contents_by_name("ChrStop");

    my $strand;
    if ($chr_start < $chr_stop) {
        $strand    = 1;
        $chr_start = $chr_start +1 - $bp5_extra;
        $chr_stop  = $chr_stop +1 + $bp5_extra;
    } elsif ($chr_start > $chr_stop) {
        $strand    = 2;
        $chr_start = $chr_start +1 - (-$bp5_extra);
        $chr_stop  = $chr_stop +1 + (-$bp5_extra);
    } else {
        next;
    }

    while (my $item = $docsum->next_Item('flattened')) {
        next if ($item->get_name =~ m/NomenclatureName/);
        if ($item->get_name =~ m/Description/) {
            $description = $item->get_content if $item->get_content;
            $description =~ tr/ /_/;
            print $description, "\n";
        }
        if ($item->get_name =~ m/Name/) {
            $name = $item->get_content if $item->get_content;
            print $name, "\n";
        }
        printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content;
    }
}

Then I go on to use genbank to download the sequences based on the chromosome splice. For what I have it works great. But I am trying to get to the gene type (either protein coding or pseudo) as well. I can see it in the summary on the Entrez Gene site, but can't get to it through bioperl. When I have it print out all the contents of the summary it doesn't show up there either.

Any help?

Thanks!

From liam.elbourne at mq.edu.au Thu Sep 29 21:34:04 2011
From: liam.elbourne at mq.edu.au (Liam Elbourne)
Date: Fri, 30 Sep 2011 07:34:04 +1000
Subject: [Bioperl-l] GFF to GTF
In-Reply-To:
References:
Message-ID: <8D027281-44E6-467C-8D22-D2D2F87D04B6@mq.edu.au>

Hi Shalabh,

Not sure about bioperl (I looked a while back and either missed it or it's not there) but there is a program associated with the cufflinks suite called gffread that should convert.

Regards,
Liam Elbourne.

On 30/09/2011, at 12:37 AM, shalabh sharma wrote:

> Hi,
> Is there any module to convert GFF file to GTF?
>
> Thanks
> Shalabh
>
>
> --
> Shalabh Sharma
> Scientific Computing Professional Associate (Bioinformatics Specialist)
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636

From carandraug+dev at gmail.com Fri Sep 30 13:18:04 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Fri, 30 Sep 2011 14:18:04 +0100
Subject: [Bioperl-l] Gene Type in Entrez gene?
In-Reply-To:
References:
Message-ID:

On 28 September 2011 07:03, Jill wrote:
> Hi there,
>
> I am using the Bio::DB::EUtilities module to download gene sequences
> based on a query.
>
>
> [...]
> }
>
>
> Then I go on to use genbank to download the sequences based on the
> chromosome splice. For what I have it works great.
But I am trying to
> get to the gene type (either protein coding or pseudo) as well. I can
> see it in the summary on the Entrez Gene site, but can't get to it
> through bioperl. When I have it print out all the contents of the
> summary it doesn't show up there either.
>
> Any help?

Hi Jill,

there's already a script in bioperl that does what you want; it's just not part of the current stable release. You can get it here:

https://github.com/bioperl/bioperl-live/blob/master/scripts/Bio-DB-EUtilities/bp_genbank_ref_extractor.pl

You can download the script alone. It will work fine with previous releases of bioperl, so there is no need to write another one.

Carnë Draug
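
For readers following the coordinate handling in Jill's snippet in this thread, here is a minimal sketch of that step in plain Perl, with no BioPerl calls. It assumes, as the "+1" in the posted code suggests, that ChrStart and ChrStop are 0-based and that a start greater than the stop marks the minus strand. The name flanked_coords and its arguments are hypothetical, used only for illustration; this is not part of BioPerl or of the script linked above.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not a BioPerl function): convert a
# 0-based ChrStart/ChrStop pair into 1-based coordinates plus a strand flag,
# widening the region by $extra bases on each side, as the posted code does.
sub flanked_coords {
    my ($chr_start, $chr_stop, $extra) = @_;

    # Equal coordinates give no orientation; the posted code skips these too.
    return if $chr_start == $chr_stop;

    if ($chr_start < $chr_stop) {
        # Plus strand (1): move the start left and the stop right.
        return (1, $chr_start + 1 - $extra, $chr_stop + 1 + $extra);
    }

    # Minus strand (2): start is the larger coordinate, so widen the other way.
    return (2, $chr_start + 1 + $extra, $chr_stop + 1 - $extra);
}

my ($strand, $from, $to) = flanked_coords(99, 199, 10);
print "strand=$strand from=$from to=$to\n";
```

Note that nothing here stops the lower coordinate from dropping below 1 when $extra is larger than the distance to the start of the chromosome; a real script would probably want to clamp it.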