From Laurence.Amilhat at toulouse.inra.fr Tue May 6 05:32:53 2008
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Tue, 06 May 2008 11:32:53 +0200
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: <24c96eca0801030615k44a1b188pb3aef683674f3153@mail.gmail.com>
References: <476A7736.109@toulouse.inra.fr> <24c96eca0712200732q20523c1co1075c15d056ff634@mail.gmail.com> <477CBFDC.8020503@toulouse.inra.fr> <24c96eca0801030615k44a1b188pb3aef683674f3153@mail.gmail.com>
Message-ID: <482025C5.1030805@toulouse.inra.fr>

Hello,

I am trying to convert a Newick tree file to an NHX file with species tags so that I can visualize it with the ATV viewer. The script runs, but I think the output is wrong, because ATV returns this error message:

"Failed to read gene tree from "BX881913.1.p.om.4.tfa_prot.tfa.taxid.alltree.cons_outtree.rooted.long.nhx" [Error in NHX format: More than one distance to parent:"0.0"]"

When I compare the input and output trees, they seem to differ: for example, the input file begins with (((((( and the output file begins with (((

Do you have any idea what I am doing wrong?

Here is my code:

use strict;
use Bio::TreeIO;
use Bio::Tree::NodeNHX;
use Getopt::Long;

my $tree_file;
my $outfile;
my $codefile;
my %corresp;

GetOptions('f|file:s' => \$tree_file,
           'o|out:s'  => \$outfile,
           'c|code:s' => \$codefile);

# Read the correspondence file
# For each sequence get:
# - the TAXID
# - the species name
# - the species name (with no space)
# - the complete fasta header
open (CODE, "< $codefile") or die "Cannot open $codefile: $!";
while (<CODE>) {
    chomp;
    my ($code, $a, $b, $c, $d, $e) = split (/\t/);
    $corresp{$code}{"taxid"}   = $b;
    $corresp{$code}{"species"} = $d;
    $corresp{$code}{"header"}  = $e;
    $corresp{$code}{"nom"}     = $c;
}
close (CODE);

my $treeio = new Bio::TreeIO (-format => 'nhx', -file => "$tree_file");
#my $treeout= new Bio::TreeIO (-format => 'nhx', -file =>">$outfile", -binary=>"1");
my $treeout = new Bio::TreeIO (-format => 'nhx', -file => ">$outfile");

# Read the tree, change the sequence header and add an NHX tag to specify the species
while (my $tree = $treeio->next_tree) {
    my @nodes = $tree->get_nodes();
    foreach my $nd (@nodes) {
        if ($nd->is_Leaf()) {
            my $id = $nd->id();
            print STDOUT "ID $id\n";
            # add an NHX tag to the node, holding the species name
            $nd->nhx_tag({S => $corresp{$id}{"nom"}});
            # replace the sequence code with its complete fasta header
            $id = $corresp{$id}{"header"};
            $nd->id($id);
        }
    }
    $treeout->write_tree($tree);
}

Here is the infile:

((((((20:3.0,21:3.0):2.0,(((17:3.0,18:3.0):2.0,19:3.0):3.0,(15:3.0,16:3.0):3.0):1.0):2.0,
14:3.0):3.0,22:3.0):3.0,((13:3.0,(11:3.0,(10:3.0,12:3.0):1.0):3.0):3.0,(2:3.0,
1:3.0):3.0):3.0):0.0,((5:3.0,4:3.0):3.0,(3:3.0,((8:3.0,6:3.0):3.0,(9:3.0,7:6.0):3.0):3.0):2.0):3.0);

Here is the output file:

(((lcl|Fam_018802_Contig1_2_TAXID=8022_:3.0[&&NHX:S=Oncorhynchus mykiss],BX881913.1.p.om.4_1_1_-_501_TAXID=8022_:3.0[&&NHX:S=Oncorhynchus my kiss]):3.0[&&NHX],(lcl|Fam_013546_Contig1_PIMPR_6_TAXID=90988_:3.0[&&NHX:S=Pimephales promelas],(lcl|ENSDARP00000087648_pep_known_chromosome _ZFISH7_13_51517919_51522668_-1_gene_ENSDARG00000063670_t:3.0[&&NHX:S=Danio
rerio],(lcl|ENSDARP00000087661_pep_novel_chromosome_ZFISH7_13_51 517919_51522668_-1_gene_ENSDARG00000063670_t:3.0[&&NHX:S=Danio rerio],lcl|ENSDARP00000087654_pep_known_chromosome_ZFISH7_13_51517544_5152273 9_-1_gene_ENSDARG00000063670_t:3.0[&&NHX:S=Danio rerio]):1.0[&&NHX]):3.0[&&NHX]):3.0[&&NHX]):3.0[&&NHX],(lcl|Fam_012588_Contig3090_GADMO_2_T AXID=8049_:3.0[&&NHX:S=Gadus morhua],(lcl|GSTENP00018428001_pep_known_chromosome_TETRAODON7_14_8497414_8500061_-1_gene_GSTENG00018428001_t:3 .0[&&NHX:S=Tetraodon nigroviridis],((lcl|ENSORLP00000013438_pep_novel_chromosome_MEDAKA1_24_3589482_3594915_-1_gene_ENSORLG00000010721_tr:3. 0[&&NHX:S=Oryzias latipes],lcl|ENSGACP00000006915_pep_novel_group_BROADS1_groupXVIII_2150130_2155380_1_gene_ENSGACG00000005224_:3.0[&&NHX:S= Gasterosteus aculeatus]):2.0[&&NHX],((lcl|ENSDARP00000074838_pep_novel_chromosome_ZFISH7_20_12837032_12851267_1_gene_ENSDARG00000011000_tr:3 .0[&&NHX:S=Danio rerio],lcl|ENSDARP00000015974_pep_known_chromosome_ZFISH7_20_12836852_12852683_1_gene_ENSDARG00000011000_tr:3.0[&&NHX:S=Dan io rerio]):3.0[&&NHX],(lcl|Contig618_HIPHI_5_TAXID=8267_:3.0[&&NHX:S=Hippoglossus hippoglossus],(lcl|Fam_023545_Contig2_2_TAXID=8022_:3.0[&& NHX:S=Oncorhynchus mykiss],lcl|ENSTRUP00000046040_pep_novel_scaffold_FUGU4_scaffold_185_27966_32394_1_gene_ENSTRUG00000017961_t:3.0[&&NHX:S= Takifugu rubripes]):2.0[&&NHX]):3.0[&&NHX]):1.0[&&NHX]):2.0[&&NHX]):3.0[&&NHX]):3.0[&&NHX]):0.0[&&NHX],((lcl|ENSORLP00000013701_pep_novel_ch romosome_MEDAKA1_15_25438171_25450498_-1_gene_ENSORLG00000010924_:3.0[&&NHX:S=Oryzias latipes],lcl|ENSGACP00000007323_pep_novel_group_BROADS 1_groupVI_6476613_6485834_1_gene_ENSGACG00000005527_tra:3.0[&&NHX:S=Gasterosteus aculeatus]):3.0[&&NHX],(lcl|GSTENP00030753001_pep_known_chr omosome_TETRAODON7_17_3400689_3407671_1_gene_GSTENG00030753001_tr:3.0[&&NHX:S=Tetraodon nigroviridis],((lcl|ENSTRUP00000035694_pep_novel_sca ffold_FUGU4_scaffold_125_722763_725332_1_gene_ENSTRUG00000013959:3.0[&&NHX:S=Takifugu 
rubripes],lcl|ENSTRUP00000035693_pep_novel_scaffold_FU GU4_scaffold_125_722763_725332_1_gene_ENSTRUG00000013959:3.0[&&NHX:S=Takifugu rubripes]):3.0[&&NHX],(lcl|ENSTRUP00000035695_pep_novel_scaffo ld_FUGU4_scaffold_125_722853_725332_1_gene_ENSTRUG00000013959:3.0[&&NHX:S=Takifugu rubripes],lcl|ENSTRUP00000035691_pep_novel_scaffold_FUGU4 _scaffold_125_718572_725332_1_gene_ENSTRUG00000013959:6.0[&&NHX:S=Takifugu rubripes]):3.0[&&NHX]):3.0[&&NHX]):2.0[&&NHX]):3.0[&&NHX];

--
====================================================================
= Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan         =
= Tel: 33 5 61 28 57 08   Email: laurence.amilhat at toulouse.inra.fr =
====================================================================

From shameer at ncbs.res.in Thu May 8 07:11:45 2008
From: shameer at ncbs.res.in (K. Shameer)
Date: Thu, 8 May 2008 16:41:45 +0530 (IST)
Subject: [Bioperl-l] HMMER - Parse hmmpfam output using bioperl - help!!
In-Reply-To: <952B0A4E-8A14-4E8E-B36D-14596B20E330@bioperl.org>
References: <946658.12337.qm@web36802.mail.mud.yahoo.com> <952B0A4E-8A14-4E8E-B36D-14596B20E330@bioperl.org>
Message-ID: <47772.192.168.1.1.1210245105.squirrel@mail.ncbs.res.in>

Dear All,

Here is the code snippet I used to get the hit name and hit length from an hmmpfam file. I need to add the sequence start and end information (query), the description of the domain, the score and the e-value.

I checked for the available methods in the Deobfuscator, but I couldn't find the details I wanted. Are there methods available in Bio::SearchIO or related modules?

__CODE__
$input = shift;
use Bio::SearchIO;
my $in = Bio::SearchIO->new(-format => 'hmmer',
                            -file   => $input);
while( my $result = $in->next_result ) {
    while( my $hit = $result->next_hit ) {
        print $hit->name(),"\t";
        while( my $hsp = $hit->next_hsp ) {
            print $hsp->length(), "\n";
        }
    }
}
_END_

Result for test.pf

TRAP_240kDa 680

--
Thanks in advance,
K. Shameer

From shameer at ncbs.res.in Thu May 8 08:31:22 2008
From: shameer at ncbs.res.in (K. Shameer)
Date: Thu, 8 May 2008 18:01:22 +0530 (IST)
Subject: [Bioperl-l] HMMER - Parse hmmpfam output using bioperl - help!!
In-Reply-To: <4822F8BE.6090507@sendu.me.uk>
References: <946658.12337.qm@web36802.mail.mud.yahoo.com> <952B0A4E-8A14-4E8E-B36D-14596B20E330@bioperl.org> <47772.192.168.1.1.1210245105.squirrel@mail.ncbs.res.in> <4822F8BE.6090507@sendu.me.uk>
Message-ID: <57429.192.168.1.1.1210249882.squirrel@mail.ncbs.res.in>

Hi Sendu,

Thanks for your quick reply. This is really useful.

Separately, one quick question: I don't have hmmer_pull.pm, as I am using an older version of BioPerl. Is there any provision to download and install individual BioPerl modules separately?

Thanks,
K. Shameer

> You'll find the relevant documentation under Bio::Search::Result,
> Bio::Search::Hit and Bio::Search::HSP.
>
> Using Bio::SearchIO->new(-format => 'hmmer_pull') will also give you a
> faster parser that may behave more closely to your expectation during
> your loops.
>
> Anyway, there are various obvious-named methods you can use:
>
> $result->query_description
> $hit->score
> $hit->significance
> $hit->description
> $hit->start('query')
> $hit->end('query')
> $hsp->start('query')
> $hsp->evalue
>
> etc.
>

From bix at sendu.me.uk Thu May 8 08:57:34 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 08 May 2008 13:57:34 +0100
Subject: [Bioperl-l] HMMER - Parse hmmpfam output using bioperl - help!!
In-Reply-To: <47772.192.168.1.1.1210245105.squirrel@mail.ncbs.res.in>
References: <946658.12337.qm@web36802.mail.mud.yahoo.com> <952B0A4E-8A14-4E8E-B36D-14596B20E330@bioperl.org> <47772.192.168.1.1.1210245105.squirrel@mail.ncbs.res.in>
Message-ID: <4822F8BE.6090507@sendu.me.uk>

K. Shameer wrote:
> Dear All,
>
> Here is the code snippet I used to get the hit name and hit length from an
> hmmpfam file. I need to add the sequence start and end information
> (query), description of domain, score and e-value.
>
> I checked for the available method in deobfuscator, but I couldn't find
> the details i wanted. Is there methods available in the Bio::SearchIO or
> related modules.
> __CODE__
> $input = shift;
> use Bio::SearchIO;
> my $in = Bio::SearchIO->new(-format => 'hmmer',
>                             -file   => $input);
> while( my $result = $in->next_result ) {
>     while( my $hit = $result->next_hit ) {
>         print $hit->name(),"\t";
>         while( my $hsp = $hit->next_hsp ) {
>             print $hsp->length(), "\n";
>         }
>     }
> }

You'll find the relevant documentation under Bio::Search::Result, Bio::Search::Hit and Bio::Search::HSP.

Using Bio::SearchIO->new(-format => 'hmmer_pull') will also give you a faster parser that may behave more closely to your expectations during your loops.

Anyway, there are various obviously named methods you can use:

$result->query_description
$hit->score
$hit->significance
$hit->description
$hit->start('query')
$hit->end('query')
$hsp->start('query')
$hsp->evalue

etc.

From bix at sendu.me.uk Thu May 8 09:43:28 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 08 May 2008 14:43:28 +0100
Subject: [Bioperl-l] HMMER - Parse hmmpfam output using bioperl - help!!
In-Reply-To: <57429.192.168.1.1.1210249882.squirrel@mail.ncbs.res.in>
References: <946658.12337.qm@web36802.mail.mud.yahoo.com> <952B0A4E-8A14-4E8E-B36D-14596B20E330@bioperl.org> <47772.192.168.1.1.1210245105.squirrel@mail.ncbs.res.in> <4822F8BE.6090507@sendu.me.uk> <57429.192.168.1.1.1210249882.squirrel@mail.ncbs.res.in>
Message-ID: <48230380.5040908@sendu.me.uk>

K. Shameer wrote:
> Hi Sendu,
>
> Thanks for your quick reply. This is really useful.
>
> Separately, One quick question, I dont have hmmer_pull.pm. I am using an
> older version of Bioperl. Is there any provision to to download and
> install individual bioperl modules separately ?

Generally you can grab individual modules from svn, e.g.:
http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/Bio/SearchIO/hmmer_pull.pm
(click the 'checkout' link).

However, in this particular case it's really complicated, because that module needs lots of other new modules to work. Use the normal hmmer.pm module; if you don't have any problems with it, stick with it. Otherwise I'd recommend upgrading your entire BioPerl to 1.5.2 or svn.

From prachi at stanford.edu Thu May 8 16:54:06 2008
From: prachi at stanford.edu (Prachi Shah)
Date: Thu, 8 May 2008 13:54:06 -0700
Subject: [Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter
Message-ID: <8684cf960805081354s6400b1eey917f6b9ae862eded@mail.gmail.com>

Hi all,

I am trying to order the HSPs within each BLAST Hit by ascending P-value. So, I parse my WU-BLAST report using Bio::SearchIO, create new Result, Hit and HSP objects in that order, and then write out another BLAST report with the Bio::SearchIO::Writer::TextResultWriter module. All this works fine. But when I try to parse this new BLAST report with Bio::SearchIO::blast, I get the following error:

------------- EXCEPTION -------------
MSG: no data for midline Query: 0 1
STACK Bio::SearchIO::blast::next_result /tools/perl/5.6.1/lib/site_perl/5.6.1/Bio/SearchIO/blast.pm:1151
STACK toplevel bin/testBlastParse.pl:12
--------------------------------------

I have copied below sample sections of both BLAST reports and the code. Any hints, pointers or suggestions are greatly appreciated.

Thanks,
Prachi

The old and new BLAST reports look slightly different; note especially the HSP start and stop coordinates for the QUERY sequence.

**Snippet of OLD blast report (generated by WU-BLAST):
----------------------------------------------------------------------------------------------------
Query= orf19.4890
(4931 letters)

Database: Ca21_Chromosomes
9 sequences; 14,324,492 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done

WARNING: hspmax=1000 was exceeded by 8 of the database sequences, causing the
         associated cutoff score, S2, to be transiently set as high as 113.

                                                                 Smallest
                                                                   Sum
                                                        High   Probability
Sequences producing High-scoring Segment Pairs:         Score  P(N)     N

Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)    24655  0.       1
Ca21chr5 Assembly 21, Ca21chr5 (1190941 nucleotides)     1682  3.4e-68  3
Ca21chr6 Assembly 21, Ca21chr6 (1033553 nucleotides)      908  3.0e-34  3
Ca21chr2 Assembly 21, Ca21chr2 (2232049 nucleotides)      859  4.7e-30  1
Ca21chr7 Assembly 21, Ca21chr7 (949626 nucleotides)       492  7.3e-24  3
Ca21chr4 Assembly 21, Ca21chr4 (1603475 nucleotides)      528  9.8e-21  2
Ca21chrR Assembly 21, Ca21chrR (2286425 nucleotides)      520  1.4e-19  5
Ca21chr3 Assembly 21, Ca21chr3 (1799426 nucleotides)      502  1.7e-14  2
Ca19-mtDNA Assembly 19, Ca19-mtDNA (40420 nucleotides)    313  2.9e-06  2

>Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)
Length = 3,188,577

Plus Strand HSPs:

Score = 506 (82.0 bits), Expect = 4.9e-14, P = 4.9e-14
Identities = 850/1549 (54%), Positives = 850/1549 (54%), Strand = Plus / Plus

Query:   3450 ATGCATATGGTAATGTTAA-AATCACTGATTTTGGA-TTTTGTGCTAAATTAAC-T-GAT 3505
              | | ||| | | || |||| ||| ||||| ||| | ||||| || | || | | | |
Sbjct: 155924 AGGGATACGATTAT-TTAAGAATT-CTGATATTGAAATTTTG-GC-ATTTTCATATAGTT 155979

Query:   3506 CAAAGA--AATAAACGTGCC-ACAATGGTGGGGACACCATATTGG-ATGGCACCTGAAGT 3561
              |||| | |||||| | | |||| || | ||| | | ||| | | | |
Sbjct: 155980 CAAACATTAATAAATATATTGAAAATGTTGATTTAATCAT-TAGTCATG---CTGGTACT 156035

Query:   3562 GGTTAAACAAAAGGAATATGATGAAAAAGTTGATGTTTGGTCATTGGGGATTATGACTAT 3621
              || | || | | || || | | | |||| | |||| |||| ||
Sbjct: 156036 GGATCAATCATTG--AT-TGTTTACAT--TTGAA--TAAACCATTAATTGTTATTGTTAA 156088

Query:   3622 TGAAATGATTGAAGGAGAACCACCTTATTTGAA-T-GAAGAACCATTAAAAGCATTATAT 3679
----------------------------------------------------------------------------------------------------

**Snippet of NEW blast report (generated using
Bio::SearchIO::Writer::TextResultWriter)
----------------------------------------------------------------------------------------------------
uery= orf19.4890
(4,931 letters)

Database: Ca21_Chromosomes
9 sequences; 14,324,492 total letters

                                                        Score    E
Sequences producing significant alignments:             (bits)   value
Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)    24655    0.
Ca21chr5 Assembly 21, Ca21chr5 (1190941 nucleotides)     1682    3.4e-68
Ca21chr6 Assembly 21, Ca21chr6 (1033553 nucleotides)      908    3.0e-34
Ca21chr2 Assembly 21, Ca21chr2 (2232049 nucleotides)      859    4.7e-30
Ca21chr7 Assembly 21, Ca21chr7 (949626 nucleotides)       492    7.3e-24
Ca21chr4 Assembly 21, Ca21chr4 (1603475 nucleotides)      528    9.8e-21
Ca21chrR Assembly 21, Ca21chrR (2286425 nucleotides)      520    1.4e-19
Ca21chr3 Assembly 21, Ca21chr3 (1799426 nucleotides)      502    1.7e-14
Ca19-mtDNA Assembly 19, Ca19-mtDNA (40420 nucleotides)    313    2.9e-06

>Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)
Length = 3188577

Score = 3705.3 bits (24655), Expect = 0., P = 0.
Identities = 4931/4931 (100%)
Frame = -1 / +1

Query:       1 ATAAAGGATGCCAAATAGTAGTAGTAAAATAGTAAATAGAATTGCAAAACAAAAATGATT -58
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248574 ATAAAGGATGCCAAATAGTAGTAGTAAAATAGTAAATAGAATTGCAAAACAAAAATGATT 2248633

Query:     -59 AAATAGCCCTTTATCAATAAATTTTTAAAGTTAGTTTCTTCTGGAACCCTACCCTCTTGG -118
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248634 AAATAGCCCTTTATCAATAAATTTTTAAAGTTAGTTTCTTCTGGAACCCTACCCTCTTGG 2248693

Query:    -119 TGTTAATCTTTTAAGTTAATATTTATAGTTAATAAAGTAGAAGTGTCTATTTATTGATTG -178
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248694 TGTTAATCTTTTAAGTTAATATTTATAGTTAATAAAGTAGAAGTGTCTATTTATTGATTG 2248753

Query:    -179 TTGTTGTTGTTGATTAAGAATATAAAGAAAAACAGAAAAGAAAAAAAGAAGGTTTAAAAA -238
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248754 TTGTTGTTGTTGATTAAGAATATAAAGAAAAACAGAAAAGAAAAAAAGAAGGTTTAAAAA 2248813

Query:    -239 AGTTAATTGTGAAGTAAAAGGGTTGAAAAATTTTTTTTTTTTCTGTTTCTCTCTTTGAGA -298
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248814 AGTTAATTGTGAAGTAAAAGGGTTGAAAAATTTTTTTTTTTTCTGTTTCTCTCTTTGAGA 2248873

Query:    -299 TTCTTTGACATATTTATTATTATAACACTATGCTATACTAAAAACAGTACTACCAATTGA -358
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248874 TTCTTTGACATATTTATTATTATAACACTATGCTATACTAAAAACAGTACTACCAATTGA 2248933

Query:    -359 ATTAAATTAAATTAAATTAAATTAAATTATTAGACCAATTTCAATAAAGATAAGCAATTT -418
----------------------------------------------------------------------------------------------------

**Here is the snippet of code that reads the old report, generates new objects and writes the new report:
----------------------------------------------------------------------------------------------------
my $blast_report = Bio::SearchIO->new(-format => 'blast',
                                      -file   => $blastOutputTmp);

my $writer = Bio::SearchIO::Writer::TextResultWriter->new(-no_wublastlinks => 0);
my $out_blast_report = Bio::SearchIO->new(-writer => $writer,
                                          -file   => ">$blastOutputFile");

my $sorted_blast_report;

while( my $result = $blast_report->next_result ) {

    my (%parameters, %statistics);

    foreach my $param ($result->available_parameters) {
        $parameters{$param} = $result->get_parameter($param);
    }

    foreach my $stat ($result->available_statistics) {
        $statistics{$stat} = $result->get_statistic($stat);
    }

    my $generic_result = Bio::Search::Result::BlastResult->new(
        -query_name          => $result->query_name,
        -query_length        => $result->query_length,
        -database_name       => $result->database_name,
        -database_entries    => $result->database_entries,
        -parameters          => \%parameters,
        -statistics          => \%statistics,
        -algorithm           => $result->algorithm,
        -query_description   => $result->query_description,
        -algorithm_reference => $result->algorithm_reference,
        -algorithm_version   => $result->algorithm_version,
        -database_letters    => $result->database_letters);

    while( my $hit = $result->next_hit ) {

        my $generic_hit = Bio::Search::Hit::BlastHit->new(
            -name         => $hit->name,
            -algorithm    => $hit->algorithm,
            -description  => $hit->description,
            -length       => $hit->length,
            -score        => $hit->score,
            -bits         => $hit->bits,
            -significance => $hit->significance);

        my (@hsp_sorted, @hsps);
        while( my $hsp = $hit->next_hsp ) {
            push(@hsps, $hsp);
        }

        @hsp_sorted = sort {$a->pvalue <=> $b->pvalue} @hsps;

        for(my $i=0; $i<=$#hsp_sorted; $i++) {
            $generic_hit->add_hsp($hsp_sorted[$i]);
        }

        $generic_result->add_hit($generic_hit);
    }

    $out_blast_report->write_result($generic_result);
}
----------------------------------------------------------------------------------------------------

From jason at bioperl.org Thu May 8 18:29:40 2008
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 8 May 2008 15:29:40 -0700
Subject: [Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter
In-Reply-To: <8684cf960805081354s6400b1eey917f6b9ae862eded@mail.gmail.com>
References: <8684cf960805081354s6400b1eey917f6b9ae862eded@mail.gmail.com>
Message-ID: <27483384-0188-44F5-8AF8-5293A7A83547@bioperl.org>

I suspect somehow you are not reconstituting the Hit or Result objects properly, but I didn't try to debug this myself.

You can now pass a sort function to the Result object to specify the Hit order; maybe we should add a sort function to the Hit object for retrieving the underlying HSPs in a programmable order. That seems like it would be a cleaner fix.

-jason

On May 8, 2008, at 1:54 PM, Prachi Shah wrote:

> Hi all,
>
> I am trying to order of HSPs within each BLAST Hit in the order of
> ascending P-values. So, I parse my WU-BLAST report using Bio::SearchIO
> and create new Result, Hit and HSP objects in the order and then write
> out another BLAST report with the
> Bio::SearchIO::Writer::TextResultWriter module. All this works fine.
> But, when I try to parse this new blast report with
> Bio::SearchIO::blast, I get the following error:
>
> ------------- EXCEPTION -------------
> MSG: no data for midline Query: 0 1
> STACK Bio::SearchIO::blast::next_result
> /tools/perl/5.6.1/lib/site_perl/5.6.1/Bio/SearchIO/blast.pm:1151
> STACK toplevel bin/testBlastParse.pl:12
> --------------------------------------
>
> I have copied below sample sections of both blast reports and the
> code. Any hints/ pointers/ suggestions are greatly appreciated.
>
> Thanks,
> Prachi
>
>
>
> The old vs new blast reports look slightly different, esp. note the
> HSP start and stop coordinates for the QUERY sequence.
>
> **Snippet of OLD blast report (generated by WU-BLAST):
> ----------------------------------------------------------------------
> ------------------------------
> Query= orf19.4890
> (4931 letters)
>
> Database: Ca21_Chromosomes
> 9 sequences; 14,324,492 total letters.
> Searching....
> 10....20....30....40....50....60....70....80....90....100% done
>
> WARNING: hspmax=1000 was exceeded by 8 of the database sequences,
> causing the
> associated cutoff score, S2, to be transiently set as
> high as 113.

From prachi at stanford.edu Thu May 8 18:35:30 2008
From: prachi at stanford.edu (Prachi Shah)
Date: Thu, 8 May 2008 15:35:30 -0700
Subject: [Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter
In-Reply-To: <27483384-0188-44F5-8AF8-5293A7A83547@bioperl.org>
References: <8684cf960805081354s6400b1eey917f6b9ae862eded@mail.gmail.com> <27483384-0188-44F5-8AF8-5293A7A83547@bioperl.org>
Message-ID: <8684cf960805081535v2a8c8261hcd373612100cdaf5@mail.gmail.com>

> I suspect somehow you are not reconstituting the Hit or Result objects properly,
> but I didn't try and debug this myself.

It's possible, but I haven't been able to pinpoint what is going wrong. Then again, the writer object is able to write the report without incident. I am at a loss.

> You can specify a sort order function to the Result object now to specify the Hit order,
> maybe we should add sort function to Hit object for retrieving the underlying HSPs in a
> programmable order. Seems like that would be a cleaner fix.

That would be ideal! But until that is available, I will have to make do with a solution like this one.

Thanks,
Prachi

> On May 8, 2008, at 1:54 PM, Prachi Shah wrote:
>
>> Hi all,
>>
>> I am trying to order of HSPs within each BLAST Hit in the order of
>> ascending P-values. So, I parse my WU-BLAST report using Bio::SearchIO
>> and create new Result, Hit and HSP objects in the order and then write
>> out another BLAST report with the
>> Bio::SearchIO::Writer::TextResultWriter module. All this works fine.
$result->next_hit ) { >> >> my $generic_hit = Bio::Search::Hit::BlastHit->new(-name >> => $hit->name, >> -algorithm => $hit->algorithm, >> -description => $hit->description, >> -length => $hit->length, >> -score => $hit->score, >> -bits => $hit->bits, >> -significance => $hit->significance); >> >> my (@hsp_sorted, @hsps); >> while( my $hsp = $hit->next_hsp ) { >> >> push(@hsps, $hsp); >> } >> >> @hsp_sorted = sort {$a->pvalue <=> $b->pvalue} @hsps; >> >> for(my $i=0; $i<=$#hsp_sorted; $i++) { >> >> $generic_hit->add_hsp($hsp_sorted[$i]); >> >> } >> >> $generic_result->add_hit($generic_hit); >> >> } >> >> $out_blast_report->write_result($generic_result); >> >> } >> ---------------------------------------------------------------------------------------------------- >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu May 8 20:03:11 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 May 2008 19:03:11 -0500 Subject: [Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter In-Reply-To: <8684cf960805081535v2a8c8261hcd373612100cdaf5@mail.gmail.com> References: <8684cf960805081354s6400b1eey917f6b9ae862eded@mail.gmail.com> <27483384-0188-44F5-8AF8-5293A7A83547@bioperl.org> <8684cf960805081535v2a8c8261hcd373612100cdaf5@mail.gmail.com> Message-ID: <6661CE6F-0795-4EDE-9D05-CD95BAB3DBA4@uiuc.edu> You can always post it as an enhancement request in bugzilla. I don't think it would be too hard to implement. chris On May 8, 2008, at 5:35 PM, Prachi Shah wrote: >> I suspect somehow you are not reconstituting the Hit or Result >> objects properly, >> but I didn't try and debug this myself. > > Its possible, but I haven't been to point out what is going wrong. But > then, the writer object is able to write the report without incident. > I am at a loss. 
>
>> You can specify a sort order function to the Result object now to
>> specify the Hit order,
>> maybe we should add sort function to Hit object for retrieving the
>> underlying HSPs in a
>> programmable order. Seems like that would be a cleaner fix.
>
> That would be ideal! But, until that is available, I will have to
> make-do with such a solution.
>
> Thanks,
> Prachi
>
>
>> On May 8, 2008, at 1:54 PM, Prachi Shah wrote:
>>
>>> Hi all,
>>>
>>> I am trying to order of HSPs within each BLAST Hit in the order of
>>> ascending P-values. So, I parse my WU-BLAST report using Bio::SearchIO
>>> and create new Result, Hit and HSP objects in the order and then write
>>> out another BLAST report with the
>>> Bio::SearchIO::Writer::TextResultWriter module. All this works fine.
>>> But, when I try to parse this new blast report with
>>> Bio::SearchIO::blast, I get the following error:
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: no data for midline Query: 0 1
>>> STACK Bio::SearchIO::blast::next_result
>>> /tools/perl/5.6.1/lib/site_perl/5.6.1/Bio/SearchIO/blast.pm:1151
>>> STACK toplevel bin/testBlastParse.pl:12
>>> --------------------------------------
>>>
>>> I have copied below sample sections of both blast reports and the
>>> code. Any hints/ pointers/ suggestions are greatly appreciated.
>>>
>>> Thanks,
>>> Prachi

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From punit_vergoboy2004 at yahoo.co.in Fri May 9 07:45:36 2008
From: punit_vergoboy2004 at yahoo.co.in (punit kumar)
Date: Fri, 9 May 2008 17:15:36 +0530 (IST)
Subject: [Bioperl-l] help_to_acces_clustal-w
Message-ID: <937459.50783.qm@web8712.mail.in.yahoo.com>

hi friends

can any one suggest me that how can i install fully the bioperl in my
computer and how can access the module of clustal-w in my programme .
i am waiting of any persons reply .

punit kumar kadimi.

From David.Messina at sbc.su.se Fri May 9 08:44:32 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 9 May 2008 14:44:32 +0200
Subject: [Bioperl-l] help_to_acces_clustal-w
In-Reply-To: <937459.50783.qm@web8712.mail.in.yahoo.com>
References: <937459.50783.qm@web8712.mail.in.yahoo.com>
Message-ID: <628aabb70805090544r2edc5fber7ce8fd49693fc041@mail.gmail.com>

Hi Punit,

You haven't said whether you've tried to install BioPerl already, or
what kind of computer you have, so I'm afraid we don't know what you
need help with.

There are detailed installation instructions on the website here:
http://www.bioperl.org/wiki/Installing_BioPerl

If you want to parse ClustalW output, you use the AlignIO module.
Details on that here:
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/AlignIO.html

Dave

From prachi at stanford.edu Fri May 9 13:43:41 2008
From: prachi at stanford.edu (Prachi Shah)
Date: Fri, 9 May 2008 10:43:41 -0700
Subject: [Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter
In-Reply-To: <6661CE6F-0795-4EDE-9D05-CD95BAB3DBA4@uiuc.edu>
References: <8684cf960805081354s6400b1eey917f6b9ae862eded@mail.gmail.com> <27483384-0188-44F5-8AF8-5293A7A83547@bioperl.org> <8684cf960805081535v2a8c8261hcd373612100cdaf5@mail.gmail.com> <6661CE6F-0795-4EDE-9D05-CD95BAB3DBA4@uiuc.edu>
Message-ID: <8684cf960805091043j706d2aaej8584b1e7d4e2e4d7@mail.gmail.com>

Thanks. I have put in a bugzilla request. Although, I do need
suggestions to solve my immediate problems. Any hints are greatly
appreciated.

Thanks,
Prachi

On Thu, May 8, 2008 at 5:03 PM, Chris Fields wrote:
> You can always post it as an enhancement request in bugzilla. I don't think
> it would be too hard to implement.
>
> chris

From vdar at yorku.ca Fri May 9 21:10:23 2008
From: vdar at yorku.ca (nisa_dar)
Date: Fri, 9 May 2008 18:10:23 -0700 (PDT)
Subject: [Bioperl-l] problems with clustalw
Message-ID: <17158917.post@talk.nabble.com>

Hi,

I need to do multiple sequence alignments of DNA sequences by using Bioperl.
I am using the following module
Bio::Tools::Run::Alignment::Clustalw;
and I am getting the following error message

Can't locate Bio/Tools/Run/Alignment/Clustalw.pm in @INC (@INC contains: /share/iNquiry/perl/lib/5.8.5/x86_64-linux-thread-multi /share/iNquiry/perl/lib/5.8.5 /share/iNquiry/perl/lib/x86_64-linux-thread-multi /share/iNquiry/perl/lib/5.8.4 /share/iNquiry/perl/lib/5.8.3 /share/iNquiry/perl/lib/5.8.2 /share/iNquiry/perl/lib/5.8.1 /share/iNquiry/perl/lib/5.8.0 /share/iNquiry/perl/lib /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/5.8.5 /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.4/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.3/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.2/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.1/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.0/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl/5.8.4 /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2 /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.4/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.3/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.2/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.1/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.0/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl/5.8.4 /usr/lib/perl5/vendor_perl/5.8.3 /usr/lib/perl5/vendor_perl/5.8.2 /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl .) at mult_align.pl line 9.
BEGIN failed--compilation aborted at mult_align.pl line 9.
Here is the piece of code that gives this message

#!/usr/bin/perl -w

use Bio::SeqIO;
use Bio::Align::AlignI;
use Bio::AlignIO;
use Bio::AlignIO::msf;
use Bio::SimpleAlign;
use Bio::PrimarySeq;
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::Root::IO;
use Bio::Seq;

my $query_string = "tatgtggctggcgagacacgacacttcatatggttttacctctacgtttgagtaattaagtacaatgagctatcact";
my $hit_string = "tatgtggctggcgagacacgacacttcatatggttttacctctacgtttgagtaattaagtacaatgagctatcact";
my $hit_string_two = "tatgtggctggcgagacacgacacttcatatggttttacctctacgtttgagtaattaagtacaatgagctatcact";

my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
my $ktuple = 2;
$factory->ktuple($ktuple);

my $seq_obj_on = Bio::Seq->new(-id =>"thal", -seq =>"$query_string");
my $seq_obj_too = Bio::Seq->new(-id =>"lyrata", -seq =>"$hit_string");
my $seq_obj_thre = Bio::Seq->new(-id =>"boechera", -seq =>"$hit_string_two");

my @seq_array = qw/$seq_obj_on $seq_obj_too $seq_obj_thre/;
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);

I would appreciate if anyone could help. I don't know how to supply the
environment variables at unix so if this is the solution please explain
how can I do that.

Thanks!

From bix at sendu.me.uk Sat May 10 02:02:56 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 10 May 2008 07:02:56 +0100
Subject: [Bioperl-l] problems with clustalw
In-Reply-To: <17158917.post@talk.nabble.com>
References: <17158917.post@talk.nabble.com>
Message-ID: <48253A90.8060300@sendu.me.uk>

nisa_dar wrote:
> I need to do multiple sequence alignments of DNA sequences by using Bioperl.
> I am using the following module
> Bio::Tools::Run::Alignment::Clustalw;
> and I am getting the following error message
>
> Can't locate Bio/Tools/Run/Alignment/Clustalw.pm in @INC

You need to install Bioperl-run, eg.

cpan
cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz

(if you have core 1.5.2)

From bix at sendu.me.uk Mon May 12 13:07:18 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 12 May 2008 18:07:18 +0100
Subject: [Bioperl-l] problems with clustalw
In-Reply-To: <1210611353.48287699ba469@mymail.yorku.ca>
References: <17158917.post@talk.nabble.com> <48253A90.8060300@sendu.me.uk>
    <1210608557.48286bad2f2c5@mymail.yorku.ca> <48286CF8.2060405@sendu.me.uk>
    <1210609239.48286e57324cd@mymail.yorku.ca> <4828700A.4050709@sendu.me.uk>
    <1210610375.482872c72d63e@mymail.yorku.ca> <482874C7.2050606@sendu.me.uk>
    <1210611353.48287699ba469@mymail.yorku.ca>
Message-ID: <48287946.8030007@sendu.me.uk>

vdar at yorku.ca wrote:
> Hi,
>
> Yes, I have clustalw installed and following is the result o which command
>
> $ which clustalw
> /opt/Bio/bin/clustalw
>
> Please see aa.txt as output of perl -V and mult_align.pl is my script

I've CC'd back the bioperl mailing list so other people can learn.
Please keep it CC'd.

Your script has two main errors:

use Clustalw;
$ENV{CLUSTALDIR} = '/opt/rocks/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/';

These should be:

use Bio::Tools::Run::Alignment::Clustalw;
$ENV{CLUSTALDIR} = '/opt/Bio/bin/clustalw';

There is also something very wrong with your installation, since you are
using perl 5.8.5 yet have bioperl-run installed into a directory for
5.8.8. This is why Bio::Tools::Run::Alignment::Clustalw wasn't being
found in the normal way; the 5.8.8 directory was never checked.

PERL5LIB="/opt/rocks/lib/perl5/site_perl/5.8.8" should let it be found.
If not, you might have to move the Bio folder from 5.8.8 to 5.8.5.
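
A minimal sketch of the @INC point above, in plain Perl with no BioPerl dependency. find_module is a hypothetical helper, not a BioPerl or core function; it reports which search directory, if any, holds a module's .pm file. Perl only consults the directories listed in @INC, so a Bio tree installed under a 5.8.8 site_perl directory stays invisible to a perl 5.8.5 until that directory is added. The directory name is the one quoted in this thread and will differ on other machines.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first directory in @dirs that contains
# the .pm file for $module, or nothing if no directory does.
sub find_module {
    my ($module, @dirs) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    for my $dir (@dirs) {
        next if ref $dir;    # @INC may also hold code-ref hooks; skip them
        return $dir if -e File::Spec->catfile($dir, @parts);
    }
    return;
}

my $module = 'Bio::Tools::Run::Alignment::Clustalw';

# Directory taken from the thread; adjust for your own installation.
my $extra = '/opt/rocks/lib/perl5/site_perl/5.8.8';

# What perl sees by default.
my $found = find_module($module, @INC);
print defined $found ? "found in $found\n" : "not in \@INC\n";

# The same search with the extra directory prepended. This mimics the
# effect of either of the following:
#   export PERL5LIB=/opt/rocks/lib/perl5/site_perl/5.8.8    (sh/bash, before running perl)
#   use lib '/opt/rocks/lib/perl5/site_perl/5.8.8';          (inside the script)
$found = find_module($module, $extra, @INC);
print defined $found ? "found in $found\n" : "still not found\n";
```

On a csh-style shell the equivalent of the export line is setenv PERL5LIB followed by the directory. Setting the variable in the shell affects every perl started from that shell, while use lib affects only the script that contains it.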
From bix at sendu.me.uk Mon May 12 13:37:37 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 12 May 2008 18:37:37 +0100 Subject: [Bioperl-l] problems with clustalw In-Reply-To: <1210613062.48287d46ba7c8@mymail.yorku.ca> References: <17158917.post@talk.nabble.com> <48253A90.8060300@sendu.me.uk> <1210608557.48286bad2f2c5@mymail.yorku.ca> <48286CF8.2060405@sendu.me.uk> <1210609239.48286e57324cd@mymail.yorku.ca> <4828700A.4050709@sendu.me.uk> <1210610375.482872c72d63e@mymail.yorku.ca> <482874C7.2050606@sendu.me.uk> <1210611353.48287699ba469@mymail.yorku.ca> <48287946.8030007@sendu.me.uk> <1210613062.48287d46ba7c8@mymail.yorku.ca> Message-ID: <48288061.9030001@sendu.me.uk> vdar at yorku.ca wrote: > Yes, seems like it worked, now I am having the following error message which is > not because of the errors in installation..right? > > $ perl mult_align.pl > Can't call method "isa" without a package or object reference at > /opt/rocks/lib/perl5/site_perl/5.8.8//Bio/Tools/Run/Alignment/Clustalw.pm line > 617. 
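
A minimal sketch of how the "isa" error quoted above can arise, assuming the array is built with qw// as in the script posted at the start of this thread. qw// splits its contents on whitespace and never interpolates variables, so it yields literal strings such as '$first' rather than the objects themselves. Demo::Seq is a made-up class standing in for Bio::Seq so that the snippet runs without BioPerl.

```perl
use strict;
use warnings;

# Stand-in objects; any blessed reference behaves the same way here.
my $first  = bless { id => 'thal' },   'Demo::Seq';
my $second = bless { id => 'lyrata' }, 'Demo::Seq';

my @quoted = qw/$first $second/;    # two literal strings, no interpolation
my @listed = ($first, $second);     # two object references

# ref() returns the class name for an object and '' for a plain string.
for my $item (@quoted, @listed) {
    if (ref $item) {
        print "object of class ", ref $item, " with id $item->{id}\n";
    }
    else {
        print "plain string: $item\n";
    }
}
```

A method call on one of the plain strings has no object to dispatch on, which is consistent with the "without a package or object reference" wording of the error.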
You weren't passing sequence objects to align() due to another error in
your script:

Instead of:
my @seq_array = qw/$seq_obj_on $seq_obj_too $seq_obj_thre/;
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);

You can have:
my @seq_array = ($seq_obj_on, $seq_obj_too, $seq_obj_thre);
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);

Or just:
my $aln = $factory->align([$seq_obj_on, $seq_obj_too, $seq_obj_thre]);

From vdar at yorku.ca Mon May 12 13:24:22 2008
From: vdar at yorku.ca (vdar at yorku.ca)
Date: Mon, 12 May 2008 13:24:22 -0400
Subject: [Bioperl-l] problems with clustalw
In-Reply-To: <48287946.8030007@sendu.me.uk>
References: <17158917.post@talk.nabble.com> <48253A90.8060300@sendu.me.uk>
    <1210608557.48286bad2f2c5@mymail.yorku.ca> <48286CF8.2060405@sendu.me.uk>
    <1210609239.48286e57324cd@mymail.yorku.ca> <4828700A.4050709@sendu.me.uk>
    <1210610375.482872c72d63e@mymail.yorku.ca> <482874C7.2050606@sendu.me.uk>
    <1210611353.48287699ba469@mymail.yorku.ca> <48287946.8030007@sendu.me.uk>
Message-ID: <1210613062.48287d46ba7c8@mymail.yorku.ca>

Yes, seems like it worked, now I am having the following error message which is
not because of the errors in installation..right?

$ perl mult_align.pl
Can't call method "isa" without a package or object reference at
/opt/rocks/lib/perl5/site_perl/5.8.8//Bio/Tools/Run/Alignment/Clustalw.pm line
617.

Quoting Sendu Bala :

> vdar at yorku.ca wrote:
> > Hi,
> >
> > Yes, I have clustalw installed and following is the result o which command
> >
> > $ which clustalw
> > /opt/Bio/bin/clustalw
> >
> > Please see aa.txt as output of perl -V and mult_align.pl is my script
>
> I've CC'd back the bioperl mailing list so other people can learn.
> Please keep it CC'd.
> > Your script has two main errors: > > use Clustalw; > $ENV{CLUSTALDIR} = > '/opt/rocks/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/'; > > These should be: > use Bio::Tools::Run::Alignment::Clustalw; > $ENV{CLUSTALDIR} = '/opt/Bio/bin/clustalw'; > > There is also something very wrong with your installation, since you are > using perl 5.8.5 yet have bioperl-run installed into a directory for > 5.8.8. This is why Bio::Tools::Run::Alignment::Clustalw wasn't being > found in the normal way; the 5.8.8 directory was never checked. > > PERL5LIB="/opt/rocks/lib/perl5/site_perl/5.8.8" should let it be found. > If not, you might have to move the Bio folder from 5.8.8 to 5.8.5. > From vdar at yorku.ca Mon May 12 14:19:27 2008 From: vdar at yorku.ca (vdar at yorku.ca) Date: Mon, 12 May 2008 14:19:27 -0400 Subject: [Bioperl-l] problems with clustalw In-Reply-To: <48288061.9030001@sendu.me.uk> References: <17158917.post@talk.nabble.com> <48253A90.8060300@sendu.me.uk> <1210608557.48286bad2f2c5@mymail.yorku.ca> <48286CF8.2060405@sendu.me.uk> <1210609239.48286e57324cd@mymail.yorku.ca> <4828700A.4050709@sendu.me.uk> <1210610375.482872c72d63e@mymail.yorku.ca> <482874C7.2050606@sendu.me.uk> <1210611353.48287699ba469@mymail.yorku.ca> <48287946.8030007@sendu.me.uk> <1210613062.48287d46ba7c8@mymail.yorku.ca> <48288061.9030001@sendu.me.uk> Message-ID: <1210616367.48288a2f1191a@mymail.yorku.ca> Thanks a lot! You have solved the problem. I am getting the following output now. Would it be possible for you to let me know how can I print the alignments? I need the alignments as we get after running web-based clustalw or multalin programs. CLUSTAL W (1.83) Multiple Sequence Alignments Sequence format is Pearson Sequence 1: thal 77 bp Sequence 2: lyrata 77 bp Sequence 3: boechera 77 bp Start of Pairwise alignments Aligning... Sequences (1:2) Aligned. Score: 100 Sequences (1:3) Aligned. Score: 100 Sequences (2:3) Aligned. 
Score: 100 Guide tree file created: [/tmp/EIZp1pI1gi/jKZ8gRG2dY.dnd] Start of Multiple Alignment There are 2 groups Aligning... Group 1: Sequences: 2 Score:1463 Group 2: Sequences: 3 Score:1463 Alignment Score 1551 GCG-Alignment file created [/tmp/EIZp1pI1gi/Wa9Du2UIum] Nisa Quoting Sendu Bala : > vdar at yorku.ca wrote: > > Yes, seems like it worked, now I am having the following error message > which is > > not because of the errors in installation..right? > > > > $ perl mult_align.pl > > Can't call method "isa" without a package or object reference at > > /opt/rocks/lib/perl5/site_perl/5.8.8//Bio/Tools/Run/Alignment/Clustalw.pm > line > > 617. > > You weren't passing sequence objects to align() due to another error in > your script: > > Instead of: > my @seq_array = qw/$seq_obj_on $seq_obj_too $seq_obj_thre/; > my $seq_array_ref = \@seq_array; > my $aln = $factory->align($seq_array_ref); > > You can have: > my @seq_array = ($seq_obj_on, $seq_obj_too, $seq_obj_thre); > my $seq_array_ref = \@seq_array; > my $aln = $factory->align($seq_array_ref); > > Or just: > my $aln = $factory->align([$seq_obj_on, $seq_obj_too, $seq_obj_thre]); > From bix at sendu.me.uk Mon May 12 14:50:45 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 12 May 2008 19:50:45 +0100 Subject: [Bioperl-l] problems with clustalw In-Reply-To: <1210616367.48288a2f1191a@mymail.yorku.ca> References: <17158917.post@talk.nabble.com> <48253A90.8060300@sendu.me.uk> <1210608557.48286bad2f2c5@mymail.yorku.ca> <48286CF8.2060405@sendu.me.uk> <1210609239.48286e57324cd@mymail.yorku.ca> <4828700A.4050709@sendu.me.uk> <1210610375.482872c72d63e@mymail.yorku.ca> <482874C7.2050606@sendu.me.uk> <1210611353.48287699ba469@mymail.yorku.ca> <48287946.8030007@sendu.me.uk> <1210613062.48287d46ba7c8@mymail.yorku.ca> <48288061.9030001@sendu.me.uk> <1210616367.48288a2f1191a@mymail.yorku.ca> Message-ID: <48289185.1030005@sendu.me.uk> vdar at yorku.ca wrote: > Thanks a lot! You have solved the problem. 
I am getting the following output > now. Would it be possible for you to let me know how can I print the > alignments? I need the alignments as we get after running web-based clustalw or > multalin programs. >[...] >> Or just: >> my $aln = $factory->align([$seq_obj_on, $seq_obj_too, $seq_obj_thre]); $aln is a Bio::SimpleAlign object. Check the docs for how to use it: http://docs.bioperl.org/bioperl-live/Bio/SimpleAlign.html For printing, you'll want to use AlignIO: http://docs.bioperl.org/bioperl-live/Bio/AlignIO.html For an example, see: http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods From vdar at yorku.ca Mon May 12 16:19:37 2008 From: vdar at yorku.ca (vdar at yorku.ca) Date: Mon, 12 May 2008 16:19:37 -0400 Subject: [Bioperl-l] problems with clustalw In-Reply-To: <48289185.1030005@sendu.me.uk> References: <17158917.post@talk.nabble.com> <48253A90.8060300@sendu.me.uk> <1210608557.48286bad2f2c5@mymail.yorku.ca> <48286CF8.2060405@sendu.me.uk> <1210609239.48286e57324cd@mymail.yorku.ca> <4828700A.4050709@sendu.me.uk> <1210610375.482872c72d63e@mymail.yorku.ca> <482874C7.2050606@sendu.me.uk> <1210611353.48287699ba469@mymail.yorku.ca> <48287946.8030007@sendu.me.uk> <1210613062.48287d46ba7c8@mymail.yorku.ca> <48288061.9030001@sendu.me.uk> <1210616367.48288a2f1191a@mymail.yorku.ca> <48289185.1030005@sendu.me.uk> Message-ID: <1210623577.4828a659ea351@mymail.yorku.ca> Thank you so much! Nisa Quoting Sendu Bala : > vdar at yorku.ca wrote: > > Thanks a lot! You have solved the problem. I am getting the following > output > > now. Would it be possible for you to let me know how can I print the > > alignments? I need the alignments as we get after running web-based > clustalw or > > multalin programs. > >[...] > >> Or just: > >> my $aln = $factory->align([$seq_obj_on, $seq_obj_too, $seq_obj_thre]); > > $aln is a Bio::SimpleAlign object. 
> Check the docs for how to use it:
> http://docs.bioperl.org/bioperl-live/Bio/SimpleAlign.html
>
> For printing, you'll want to use AlignIO:
> http://docs.bioperl.org/bioperl-live/Bio/AlignIO.html
> For an example, see:
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>
>

From vdar at yorku.ca Mon May 12 17:22:45 2008
From: vdar at yorku.ca (nisa_dar)
Date: Mon, 12 May 2008 14:22:45 -0700 (PDT)
Subject: [Bioperl-l] automated stand alone blast with repeat masker
Message-ID: <17189995.post@talk.nabble.com>

Hi,

I'm running a stand alone blast against my local databases by using the
following code

use Bio::Seq;
use Bio::Tools::Run::StandAloneBlast;

@params = (program => 'blastn', database => 'db.fa');
$blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);
$seq_obj = Bio::Seq->new(-id =>"test query", -seq =>"TTTAAATATATTTTGAAGTATAGATTATATGTT");
$report_obj = $blast_obj->blastall($seq_obj);
$result_obj = $report_obj->next_result;
print $result_obj->num_hits;

How can I include the code for repeat masker in it?

Thanks
Nisa
--
View this message in context: http://www.nabble.com/automated-stand-alone-blast-with-repeat-masker-tp17189995p17189995.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From David.Messina at sbc.su.se Mon May 12 17:58:40 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 12 May 2008 23:58:40 +0200
Subject: [Bioperl-l] automated stand alone blast with repeat masker
In-Reply-To: <17189995.post@talk.nabble.com>
References: <17189995.post@talk.nabble.com>
Message-ID: <628aabb70805121458o5bc808f8jf46869b08e65e8ac@mail.gmail.com>

I haven't done this myself, but from a quick search on the BioPerl website,
it looks like you'll want to use the Bio::Tools::Run::RepeatMasker module
to create a repeat-masked fasta file.

If you RepeatMask your query sequence(s), then you need to specify that
sequence when you create your Bio::Seq object.
If you instead RepeatMask your database, you'll need to create a blast database from the repeat-masked sequences and specify that db in your @params. I don't think there's a module for running formatdb, but you can do it through a system call. Dave From prachi at stanford.edu Mon May 12 19:17:53 2008 From: prachi at stanford.edu (Prachi Shah) Date: Mon, 12 May 2008 16:17:53 -0700 Subject: [Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter In-Reply-To: <8684cf960805091043j706d2aaej8584b1e7d4e2e4d7@mail.gmail.com> References: <8684cf960805081354s6400b1eey917f6b9ae862eded@mail.gmail.com> <27483384-0188-44F5-8AF8-5293A7A83547@bioperl.org> <8684cf960805081535v2a8c8261hcd373612100cdaf5@mail.gmail.com> <6661CE6F-0795-4EDE-9D05-CD95BAB3DBA4@uiuc.edu> <8684cf960805091043j706d2aaej8584b1e7d4e2e4d7@mail.gmail.com> Message-ID: <8684cf960805121617h9e2cf4ftdd5aee0f81635c47@mail.gmail.com> Thanks Jason for adding the sort_hsps method in Bio::Search::Hit::GenericHit. I tested it out and it works great. The other issue I have is the format of HSP start and stop coordinates when I write a new blast report (with HSPs sorted) using Bio::SearchIO::Writer::TextResultWriter. Below is an example of the same HSP alignment as output from BLAST and later when the blast report is generated by TextResultWriter. Notice, the change in start and stop coordinates. I would like to keep the start and stop format as in the first case. How do I specify that? Any indicators are greatly appreciated. Thanks, Prachi ---------------------------------------------------------------------------------------------------- **HSP alignment in blast report generated by BLAST itself: Score = 10150 (1529.0 bits), Expect = 0., Sum P(3) = 0. 
Identities = 2120/2345 (90%), Positives = 2120/2345 (90%), Strand = Minus / Plus Query: 2364 CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG 2305 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251160 CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG 2251219 Query: 2304 TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA 2245 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251220 TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA 2251279 Query: 2244 ATGGTGAAACGTTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNC 2185 |||||||||||||| | Sbjct: 2251280 ATGGTGAAACGTTTTTAGTATTATTATTGTTAGTGCTGTTGTTATTATTATTATTATTAC 2251339 Query: 2184 CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG 2125 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251340 CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG 2251399 Query: 2124 TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG 2065 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251400 TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG 2251459 Query: 2064 TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG 2005 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251460 TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG 2251519 Query: 2004 CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG 1945 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251520 CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG 2251579 Query: 1944 ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA 1885 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251580 ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA 2251639 Query: 1884 CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA 1825 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251640 
CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA 2251699 ---------------------------------------------------------------------------------------------------- ** HSP alignment written by TextResultWriter: Score = 1529.0 bits (10150), Expect = 0., P = 0. Identities = 2120/2345 (90%) Frame = -1 / +1 Query: 20 CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG -39 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251160 CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG 2251219 Query: -40 TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA -99 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251220 TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA 2251279 Query: -100 ATGGTGAAACGTTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNC -159 |||||||||||||| | Sbjct: 2251280 ATGGTGAAACGTTTTTAGTATTATTATTGTTAGTGCTGTTGTTATTATTATTATTATTAC 2251339 Query: -160 CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG -219 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251340 CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG 2251399 Query: -220 TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG -279 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251400 TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG 2251459 Query: -280 TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG -339 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251460 TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG 2251519 Query: -340 CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG -399 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251520 CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG 2251579 Query: -400 ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA -459 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251580 
ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA 2251639 Query: -460 CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA -519 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 2251640 CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA 2251699 From jason at bioperl.org Mon May 12 19:21:58 2008 From: jason at bioperl.org (Jason Stajich) Date: Mon, 12 May 2008 16:21:58 -0700 Subject: [Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter In-Reply-To: <8684cf960805121617h9e2cf4ftdd5aee0f81635c47@mail.gmail.com> References: <8684cf960805081354s6400b1eey917f6b9ae862eded@mail.gmail.com> <27483384-0188-44F5-8AF8-5293A7A83547@bioperl.org> <8684cf960805081535v2a8c8261hcd373612100cdaf5@mail.gmail.com> <6661CE6F-0795-4EDE-9D05-CD95BAB3DBA4@uiuc.edu> <8684cf960805091043j706d2aaej8584b1e7d4e2e4d7@mail.gmail.com> <8684cf960805121617h9e2cf4ftdd5aee0f81635c47@mail.gmail.com> Message-ID: <6DAEC561-D4C6-4F52-9359-84E4A336FD01@bioperl.org> that's a very strange bug - I don't quite understand where it is coming from. IF you don't mess with the HSP order and start with a report and generate the Text report output, does it also give the negative coordinates or are you still reconstituting the Hit/HSP objects "manually" in your code? -jason On May 12, 2008, at 4:17 PM, Prachi Shah wrote: > Thanks Jason for adding the sort_hsps method in > Bio::Search::Hit::GenericHit. I tested it out and it works great. > > The other issue I have is the format of HSP start and stop coordinates > when I write a new blast report (with HSPs sorted) using > Bio::SearchIO::Writer::TextResultWriter. Below is an example of the > same HSP alignment as output from BLAST and later when the blast > report is generated by TextResultWriter. Notice, the change in start > and stop coordinates. I would like to keep the start and stop format > as in the first case. How do I specify that? Any indicators are > greatly appreciated. 
> > Thanks, > Prachi > > ---------------------------------------------------------------------- > ------------------------------ > **HSP alignment in blast report generated by BLAST itself: > > Score = 10150 (1529.0 bits), Expect = 0., Sum P(3) = 0. > Identities = 2120/2345 (90%), Positives = 2120/2345 (90%), Strand = > Minus / Plus > > Query: 2364 > CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG 2305 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251160 > CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG > 2251219 > > Query: 2304 > TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA 2245 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251220 > TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA > 2251279 > > Query: 2244 > ATGGTGAAACGTTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNC 2185 > > |||||||||||||| | > Sbjct: 2251280 > ATGGTGAAACGTTTTTAGTATTATTATTGTTAGTGCTGTTGTTATTATTATTATTATTAC > 2251339 > > Query: 2184 > CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG 2125 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251340 > CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG > 2251399 > > Query: 2124 > TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG 2065 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251400 > TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG > 2251459 > > Query: 2064 > TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG 2005 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251460 > TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG > 2251519 > > Query: 2004 > CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG 1945 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251520 > CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG > 2251579 > > Query: 1944 > 
ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA 1885 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251580 > ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA > 2251639 > > Query: 1884 > CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA 1825 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251640 > CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA > 2251699 > > > ---------------------------------------------------------------------- > ------------------------------ > ** HSP alignment written by TextResultWriter: > > Score = 1529.0 bits (10150), Expect = 0., P = 0. > Identities = 2120/2345 (90%) > Frame = -1 / +1 > > Query: 20 > CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG -39 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251160 > CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG > 2251219 > > Query: -40 > TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA -99 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251220 > TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA > 2251279 > > Query: -100 > ATGGTGAAACGTTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNC -159 > > |||||||||||||| | > Sbjct: 2251280 > ATGGTGAAACGTTTTTAGTATTATTATTGTTAGTGCTGTTGTTATTATTATTATTATTAC > 2251339 > > Query: -160 > CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG -219 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251340 > CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG > 2251399 > > Query: -220 > TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG -279 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251400 > TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG > 2251459 > > Query: -280 > TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG -339 > > 
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251460 > TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG > 2251519 > > Query: -340 > CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG -399 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251520 > CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG > 2251579 > > Query: -400 > ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA -459 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251580 > ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA > 2251639 > > Query: -460 > CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA -519 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 2251640 > CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA > 2251699 From prachi at stanford.edu Mon May 12 19:26:41 2008 From: prachi at stanford.edu (Prachi Shah) Date: Mon, 12 May 2008 16:26:41 -0700 Subject: [Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter In-Reply-To: <6DAEC561-D4C6-4F52-9359-84E4A336FD01@bioperl.org> References: <8684cf960805081354s6400b1eey917f6b9ae862eded@mail.gmail.com> <27483384-0188-44F5-8AF8-5293A7A83547@bioperl.org> <8684cf960805081535v2a8c8261hcd373612100cdaf5@mail.gmail.com> <6661CE6F-0795-4EDE-9D05-CD95BAB3DBA4@uiuc.edu> <8684cf960805091043j706d2aaej8584b1e7d4e2e4d7@mail.gmail.com> <8684cf960805121617h9e2cf4ftdd5aee0f81635c47@mail.gmail.com> <6DAEC561-D4C6-4F52-9359-84E4A336FD01@bioperl.org> Message-ID: <8684cf960805121626y2fb9e8a1n7bbfc81e3a61a2bc@mail.gmail.com> Hi Jason, The negative coordinates in the HSP show up when I generate a Text report regardless of how/if I sort the HSP order. I think it has something to do with the frame. In the example I gave, the Query sequence matches the subject sequence on the negative strand. 
My guess is that TextResultWriter somehow takes the strand into account and tries to recalculates the start and stop locations? Thanks, Prachi On Mon, May 12, 2008 at 4:21 PM, Jason Stajich wrote: > that's a very strange bug - I don't quite understand where it is coming > from. IF you don't mess with the HSP order and start with a report and > generate the Text report output, does it also give the negative coordinates > or are you still reconstituting the Hit/HSP objects "manually" in your code? > > -jason > > > On May 12, 2008, at 4:17 PM, Prachi Shah wrote: > > > > Thanks Jason for adding the sort_hsps method in > > Bio::Search::Hit::GenericHit. I tested it out and it works great. > > > > The other issue I have is the format of HSP start and stop coordinates > > when I write a new blast report (with HSPs sorted) using > > Bio::SearchIO::Writer::TextResultWriter. Below is an example of the > > same HSP alignment as output from BLAST and later when the blast > > report is generated by TextResultWriter. Notice, the change in start > > and stop coordinates. I would like to keep the start and stop format > > as in the first case. How do I specify that? Any indicators are > > greatly appreciated. > > > > Thanks, > > Prachi > > > > > ---------------------------------------------------------------------------------------------------- > > **HSP alignment in blast report generated by BLAST itself: > > > > Score = 10150 (1529.0 bits), Expect = 0., Sum P(3) = 0. 
> > Identities = 2120/2345 (90%), Positives = 2120/2345 (90%), Strand = > > Minus / Plus > > > > Query: 2364 > CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG 2305 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251160 > CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG > > 2251219 > > > > Query: 2304 > TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA 2245 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251220 > TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA > > 2251279 > > > > Query: 2244 > ATGGTGAAACGTTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNC 2185 > > |||||||||||||| | > > Sbjct: 2251280 > ATGGTGAAACGTTTTTAGTATTATTATTGTTAGTGCTGTTGTTATTATTATTATTATTAC > > 2251339 > > > > Query: 2184 > CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG 2125 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251340 > CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG > > 2251399 > > > > Query: 2124 > TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG 2065 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251400 > TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG > > 2251459 > > > > Query: 2064 > TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG 2005 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251460 > TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG > > 2251519 > > > > Query: 2004 > CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG 1945 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251520 > CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG > > 2251579 > > > > Query: 1944 > ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA 1885 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251580 > 
ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA > > 2251639 > > > > Query: 1884 > CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA 1825 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251640 > CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA > > 2251699 > > > > > > > ---------------------------------------------------------------------------------------------------- > > ** HSP alignment written by TextResultWriter: > > > > Score = 1529.0 bits (10150), Expect = 0., P = 0. > > Identities = 2120/2345 (90%) > > Frame = -1 / +1 > > > > Query: 20 > CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG -39 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251160 > CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG > > 2251219 > > > > Query: -40 > TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA -99 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251220 > TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA > > 2251279 > > > > Query: -100 > ATGGTGAAACGTTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNC -159 > > |||||||||||||| | > > Sbjct: 2251280 > ATGGTGAAACGTTTTTAGTATTATTATTGTTAGTGCTGTTGTTATTATTATTATTATTAC > > 2251339 > > > > Query: -160 > CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG -219 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251340 > CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG > > 2251399 > > > > Query: -220 > TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG -279 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251400 > TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG > > 2251459 > > > > Query: -280 > TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG -339 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251460 > 
TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG > > 2251519 > > > > Query: -340 > CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG -399 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251520 > CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG > > 2251579 > > > > Query: -400 > ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA -459 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251580 > ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA > > 2251639 > > > > Query: -460 > CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA -519 > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > Sbjct: 2251640 > CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA > > 2251699 > > > > From jason at bioperl.org Mon May 12 19:53:15 2008 From: jason at bioperl.org (Jason Stajich) Date: Mon, 12 May 2008 16:53:15 -0700 Subject: [Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter In-Reply-To: <8684cf960805121626y2fb9e8a1n7bbfc81e3a61a2bc@mail.gmail.com> References: <8684cf960805081354s6400b1eey917f6b9ae862eded@mail.gmail.com> <27483384-0188-44F5-8AF8-5293A7A83547@bioperl.org> <8684cf960805081535v2a8c8261hcd373612100cdaf5@mail.gmail.com> <6661CE6F-0795-4EDE-9D05-CD95BAB3DBA4@uiuc.edu> <8684cf960805091043j706d2aaej8584b1e7d4e2e4d7@mail.gmail.com> <8684cf960805121617h9e2cf4ftdd5aee0f81635c47@mail.gmail.com> <6DAEC561-D4C6-4F52-9359-84E4A336FD01@bioperl.org> <8684cf960805121626y2fb9e8a1n7bbfc81e3a61a2bc@mail.gmail.com> Message-ID: <83452C02-671E-4468-85FB-F7F4FA556D71@bioperl.org> okay - so there's a bug - I remember someone tried to fix something in the writers recently so will have to look and see how that got broken and can be fixed. 
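
The numbers in the two reports quoted in this thread fit a simple pattern, offered here as an observation and not as a diagnosis of the TextResultWriter code. The HSP appears to cover query positions 20 to 2364 (2345 columns). BLAST labels the minus-strand query rows counting down from 2364, while the rewritten report counts down from 20, which is what produces the negative values. The sketch below reproduces both numberings. row_coordinates is a hypothetical helper, not a BioPerl method, and it assumes ungapped rows of 60 residues.

```perl
use strict;
use warnings;

# Hypothetical helper: given the position of the first residue, the strand
# (1 or -1) and the number of residues in each alignment row, return a
# [start, end] pair for every row.
sub row_coordinates {
    my ($first, $strand, @row_lengths) = @_;
    my @rows;
    my $pos = $first;
    for my $len (@row_lengths) {
        next unless $len > 0;
        my $last = $pos + $strand * ($len - 1);
        push @rows, [$pos, $last];
        $pos = $last + $strand;    # next row starts one step further along
    }
    return @rows;
}

# Minus strand, starting from the high end of the HSP (as in the BLAST report).
my @from_high = row_coordinates(2364, -1, 60, 60, 60);
print "Query: $_->[0] ... $_->[1]\n" for @from_high;

# Minus strand, starting from the low end (matches the rewritten report).
my @from_low = row_coordinates(20, -1, 60, 60, 60);
print "Query: $_->[0] ... $_->[1]\n" for @from_low;
```

If this reading is right, the fix belongs wherever the writer picks the starting coordinate for a minus-strand query, and the subject rows are unaffected because the subject is on the plus strand.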
-j On May 12, 2008, at 4:26 PM, Prachi Shah wrote: > Hi Jason, > > The negative coordinates in the HSP show up when I generate a Text > report regardless of how/if I sort the HSP order. I think it has > something to do with the frame. In the example I gave, the Query > sequence matches the subject sequence on the negative strand. My guess > is that TextResultWriter somehow takes the strand into account and > tries to recalculates the start and stop locations? > > Thanks, > Prachi > > On Mon, May 12, 2008 at 4:21 PM, Jason Stajich > wrote: >> that's a very strange bug - I don't quite understand where it is >> coming >> from. IF you don't mess with the HSP order and start with a >> report and >> generate the Text report output, does it also give the negative >> coordinates >> or are you still reconstituting the Hit/HSP objects "manually" in >> your code? >> >> -jason >> >> >> On May 12, 2008, at 4:17 PM, Prachi Shah wrote: >> >> >>> Thanks Jason for adding the sort_hsps method in >>> Bio::Search::Hit::GenericHit. I tested it out and it works great. >>> >>> The other issue I have is the format of HSP start and stop >>> coordinates >>> when I write a new blast report (with HSPs sorted) using >>> Bio::SearchIO::Writer::TextResultWriter. Below is an example of the >>> same HSP alignment as output from BLAST and later when the blast >>> report is generated by TextResultWriter. Notice, the change in start >>> and stop coordinates. I would like to keep the start and stop format >>> as in the first case. How do I specify that? Any indicators are >>> greatly appreciated. >>> >>> Thanks, >>> Prachi >>> >>> >> --------------------------------------------------------------------- >> ------------------------------- >>> **HSP alignment in blast report generated by BLAST itself: >>> >>> Score = 10150 (1529.0 bits), Expect = 0., Sum P(3) = 0. 
>>> Identities = 2120/2345 (90%), Positives = 2120/2345 (90%), Strand = >>> Minus / Plus >>> >>> Query: 2364 >> CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG 2305 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251160 >> CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG >>> 2251219 >>> >>> Query: 2304 >> TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA 2245 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251220 >> TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA >>> 2251279 >>> >>> Query: 2244 >> ATGGTGAAACGTTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNC 2185 >>> >>> |||||||||||||| | >>> Sbjct: 2251280 >> ATGGTGAAACGTTTTTAGTATTATTATTGTTAGTGCTGTTGTTATTATTATTATTATTAC >>> 2251339 >>> >>> Query: 2184 >> CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG 2125 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251340 >> CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG >>> 2251399 >>> >>> Query: 2124 >> TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG 2065 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251400 >> TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG >>> 2251459 >>> >>> Query: 2064 >> TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG 2005 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251460 >> TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG >>> 2251519 >>> >>> Query: 2004 >> CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG 1945 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251520 >> CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG >>> 2251579 >>> >>> Query: 1944 >> ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA 1885 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251580 >> 
ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA >>> 2251639 >>> >>> Query: 1884 >> CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA 1825 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251640 >> CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA >>> 2251699 >>> >>> >>> >> --------------------------------------------------------------------- >> ------------------------------- >>> ** HSP alignment written by TextResultWriter: >>> >>> Score = 1529.0 bits (10150), Expect = 0., P = 0. >>> Identities = 2120/2345 (90%) >>> Frame = -1 / +1 >>> >>> Query: 20 >> CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG -39 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251160 >> CATATCCAGATCTATCTTGATGATTCTTATTAGAATATGTATCTGAAGATGTGCCACTTG >>> 2251219 >>> >>> Query: -40 >> TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA -99 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251220 >> TTGGAGGTGGTGGAGCTCTTCTAGCAGGAATAAGTTCAGATTTATTCATCAAATTATTCA >>> 2251279 >>> >>> Query: -100 >> ATGGTGAAACGTTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNC -159 >>> >>> |||||||||||||| | >>> Sbjct: 2251280 >> ATGGTGAAACGTTTTTAGTATTATTATTGTTAGTGCTGTTGTTATTATTATTATTATTAC >>> 2251339 >>> >>> Query: -160 >> CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG -219 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251340 >> CAGAACTAGGTAATGAGCCTGATGATGATGTATGTTGGTGGGAAGAGCCATTTAGTTGTG >>> 2251399 >>> >>> Query: -220 >> TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG -279 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251400 >> TCAAATGATATGGAGTTGGTGGTTTTGGTGCAGCTCGACTAGGTTTGAATTGTGAGACAG >>> 2251459 >>> >>> Query: -280 >> TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG -339 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251460 
>> TAGATTTTGCTGGAGGTTTTACCCATTCTTGTAAATTTGCCTCTTGGACATTGTTTTTGG >>> 2251519 >>> >>> Query: -340 >> CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG -399 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251520 >> CTGATGAGTAATTGTTAGGGTCATTATTATTATTGTTGGTTTTGGAATTGATCATGGGTG >>> 2251579 >>> >>> Query: -400 >> ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA -459 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251580 >> ATCCAATTGGAGTTCCAGCAGCAGAATTACCTCCATTTATATCGGAATAAAATTCTAAAA >>> 2251639 >>> >>> Query: -460 >> CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA -519 >>> >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 2251640 >> CTTTAATAACAGCAACAGGATCTTTTTTCCAATCCTCATTAGTGATTTTCGAATGTTGTA >>> 2251699 >>> >> >> From cjfields at uiuc.edu Mon May 12 20:33:25 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 12 May 2008 19:33:25 -0500 Subject: [Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter In-Reply-To: <83452C02-671E-4468-85FB-F7F4FA556D71@bioperl.org> References: <8684cf960805081354s6400b1eey917f6b9ae862eded@mail.gmail.com> <27483384-0188-44F5-8AF8-5293A7A83547@bioperl.org> <8684cf960805081535v2a8c8261hcd373612100cdaf5@mail.gmail.com> <6661CE6F-0795-4EDE-9D05-CD95BAB3DBA4@uiuc.edu> <8684cf960805091043j706d2aaej8584b1e7d4e2e4d7@mail.gmail.com> <8684cf960805121617h9e2cf4ftdd5aee0f81635c47@mail.gmail.com> <6DAEC561-D4C6-4F52-9359-84E4A336FD01@bioperl.org> <8684cf960805121626y2fb9e8a1n7bbfc81e3a61a2bc@mail.gmail.com> <83452C02-671E-4468-85FB-F7F4FA556D71@bioperl.org> Message-ID: I ran some fixes on the writers recently. If we have the BLAST report generating this I can work on debugging it (I'll file a bug for tracking). 
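A note on the numbers quoted in this thread, offered as a guess rather than a diagnosis: the BLAST report labels the first query row 2364..2305, while the TextResultWriter output labels it 20..-39. With an alignment length of 2345 and assuming no gaps in the query, the low end of the query range would be 2364 - 2345 + 1 = 20, so the writer output looks like what you get by counting down from the low end of the range instead of the high end. The small sketch below only reproduces that arithmetic. `row_labels` is a hypothetical helper, not a BioPerl function, and nothing here has been checked against the TextResultWriter source.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): label $nrows alignment rows
# of a minus-strand query, counting down from $first in steps of $width.
sub row_labels {
    my ($first, $width, $nrows) = @_;
    my @rows;
    my $pos = $first;
    for (1 .. $nrows) {
        push @rows, [ $pos, $pos - $width + 1 ];
        $pos -= $width;
    }
    return @rows;
}

# Query range inferred from the thread (assumes no gaps in the query).
my ($low, $high) = (20, 2364);

# Counting down from the high end gives the labels in the BLAST report.
my @blast_style = row_labels($high, 60, 3);

# Counting down from the low end gives the labels in the rewritten report.
my @writer_style = row_labels($low, 60, 3);

printf "BLAST-style:  %d..%d\n", @$_ for @blast_style;
printf "Writer-style: %d..%d\n", @$_ for @writer_style;
```

If that reading is right, the fix would be a matter of which end of the query range the row counter starts from when the query strand is minus, but that is an assumption until someone looks at the writer code.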
chris

On May 12, 2008, at 6:53 PM, Jason Stajich wrote:

> okay - so there's a bug - I remember someone tried to fix something
> in the writers recently so will have to look and see how that got
> broken and can be fixed.
> -j

From David.Messina at sbc.su.se Tue May 13 11:29:16 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 13 May 2008 17:29:16 +0200
Subject: [Bioperl-l] help_to_acces_clustal-w
In-Reply-To: <462687.68349.qm@web8713.mail.in.yahoo.com>
References: <462687.68349.qm@web8713.mail.in.yahoo.com>
Message-ID: <628aabb70805130829r1e9b7c2fpbc6ecf036f01286f@mail.gmail.com>

Hi Punit,

Please make sure that you use 'reply to all' when responding so that this gets seen on the BioPerl mailing list, too.
On Mon, May 12, 2008 at 1:25 PM, punit kumar wrote:

> Actually, I use Perl version 5.6, and I have tried to install
> BioPerl on a Windows workstation before.
>
> But it was so painful. I do not want to change my prior version
> of Perl, which is used by my
> and I want to install BioPerl too, so that is my problem.

I have never tried installing Perl on Windows, but if you read the BioPerl installation guide for Windows that I pointed you to, you'll see that a straightforward ActivePerl installer is available. So you shouldn't have to install Perl from the source code.

Again, I haven't done it personally, but I would expect that the ActivePerl installer allows you to specify where it installs Perl. This would enable you to keep your existing Perl 5.6 installation and have a separate Perl 5.8.x installation for use with BioPerl.

According to the BioPerl Windows installation guide, once you install ActivePerl, there is a Perl Package Manager with a graphical interface that makes it very easy to install the latest version of BioPerl.

Dave

From jay at jays.net Tue May 13 12:28:36 2008
From: jay at jays.net (Jay Hannah)
Date: Tue, 13 May 2008 11:28:36 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] Script to convert blastall output into gff format.
In-Reply-To:
References:
Message-ID: <7419081D-CAEC-4F9C-ABD6-D5F8BBBA3106@jays.net>

On May 13, 2008, at 8:59 AM, Gabriel Dalmazo wrote:

> I've been searching for such a tool, but couldn't find this specific
> type of script. There are many parsers, but I don't know which one is
> most appropriate.
I think this does what you're asking for:

bioperl-live/scripts/utilities/search2gff.PLS
http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/utilities/search2gff.PLS

If your goal is visualization you might find this interesting:

http://www.bioperl.org/wiki/HOWTO:Graphics#Parsing_Real_BLAST_Output

HTH,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From jason at bioperl.org Tue May 13 13:25:42 2008
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 13 May 2008 10:25:42 -0700
Subject: [Bioperl-l] MSA manipulation
In-Reply-To:
References:
Message-ID:

Jon -

[CC-ing list in case others have input.]

AlignI is the interface module, but the actual implementation is in Bio::SimpleAlign. You want to read in alignments with Bio::AlignIO, and that will give you Bio::SimpleAlign objects that can then be manipulated. There are methods for removing columns, etc.

-jason

On May 13, 2008, at 2:35 AM, Jon Wright ((JIC)) wrote:

> Hi Jason,
>
> I am looking for a bioperl module into which you can load a multiple
> sequence alignment and manipulate it programmatically (similar to
> something like Jalview). The closest thing in BioPerl that I can find
> is your implementation of the Bio::Align::AlignI but this doesn't allow
> any manipulation as far as I can tell. Are you aware of anything else
> that might do the job?
>
> Thanks for your help.
>
> Jon
>
> *********************************************************
> Jonathan Wright
> Computational and Systems Biology Department
> John Innes Centre
> Norwich
> UK
>
> www.jic.bbsrc.ac.uk
>
> Tel.
> +44 (0)1603 450811
>
> *********************************************************

From vdar at yorku.ca Tue May 13 16:45:13 2008
From: vdar at yorku.ca (vdar at yorku.ca)
Date: Tue, 13 May 2008 16:45:13 -0400
Subject: [Bioperl-l] automated stand alone blast with repeat masker
In-Reply-To: <628aabb70805121458o5bc808f8jf46869b08e65e8ac@mail.gmail.com>
References: <17189995.post@talk.nabble.com> <628aabb70805121458o5bc808f8jf46869b08e65e8ac@mail.gmail.com>
Message-ID: <1210711513.4829fdd9b756d@mymail.yorku.ca>

Do we have to install it separately? It doesn't seem to be on my system, although I have BioPerl installed.

Quoting Dave Messina:

> I haven't done this myself, but from a quick search on the BioPerl website,
> it looks like you'll want to use the Bio::Tools::Run::RepeatMasker module
> to create a repeat-masked fasta file.
>
> If you RepeatMask your query sequence(s), then you need to specify that
> sequence when you create your Bio::Seq