[Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter

Jason Stajich jason at bioperl.org
Thu May 8 22:29:40 UTC 2008


I suspect somehow you are not reconstituting the Hit or Result  
objects properly, but I didn't try and debug this myself.

You can specify a sort order function to the Result object now to  
specify the Hit order, maybe we should add sort function to Hit  
object for retrieving the underlying HSPs in a
programmable order.  Seems like that would be a cleaner fix.

-jason

On May 8, 2008, at 1:54 PM, Prachi Shah wrote:

> Hi all,
>
> I am trying to order of HSPs within each BLAST Hit in the order of
> ascending P-values. So, I parse my WU-BLAST report using Bio::SearchIO
> and create new Result, Hit and HSP objects in the order and then write
> out another BLAST report with the
> Bio::SearchIO::Writer::TextResultWriter module. All this works fine.
> But, when I try to parse this new blast report with
> Bio::SearchIO::blast, I get the following error:
>
> ------------- EXCEPTION  -------------
> MSG: no data for midline Query: 0       1
> STACK Bio::SearchIO::blast::next_result
> /tools/perl/5.6.1/lib/site_perl/5.6.1/Bio/SearchIO/blast.pm:1151
> STACK toplevel bin/testBlastParse.pl:12
> --------------------------------------
>
> I have copied below sample sections of both blast reports and the
> code. Any hints/ pointers/ suggestions are greatly appreciated.
>
> Thanks,
> Prachi
>
>
>
> The old vs new blast reports look slightly different, esp. note the
> HSP start and stop coordinates for the QUERY sequence.
>
> **Snippet of OLD blast report (generated by WU-BLAST):
> ---------------------------------------------------------------------- 
> ------------------------------
> Query=  orf19.4890
>         (4931 letters)
>
> Database:  Ca21_Chromosomes
>            9 sequences; 14,324,492 total letters.
> Searching.... 
> 10....20....30....40....50....60....70....80....90....100% done
>
> WARNING:  hspmax=1000 was exceeded by 8 of the database sequences,  
> causing the
>           associated cutoff score, S2, to be transiently set as  
> high as 113.
>
>                                                                      S 
> mallest
>                                                                        
>  Sum
>                                                               High   
> Probability
> Sequences producing High-scoring Segment Pairs:              Score   
> P(N)      N
>
> Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)         24655   
> 0.        1
> Ca21chr5 Assembly 21, Ca21chr5 (1190941 nucleotides)          1682   
> 3.4e-68   3
> Ca21chr6 Assembly 21, Ca21chr6 (1033553 nucleotides)           908   
> 3.0e-34   3
> Ca21chr2 Assembly 21, Ca21chr2 (2232049 nucleotides)           859   
> 4.7e-30   1
> Ca21chr7 Assembly 21, Ca21chr7 (949626 nucleotides)            492   
> 7.3e-24   3
> Ca21chr4 Assembly 21, Ca21chr4 (1603475 nucleotides)           528   
> 9.8e-21   2
> Ca21chrR Assembly 21, Ca21chrR (2286425 nucleotides)           520   
> 1.4e-19   5
> Ca21chr3 Assembly 21, Ca21chr3 (1799426 nucleotides)           502   
> 1.7e-14   2
> Ca19-mtDNA Assembly 19, Ca19-mtDNA (40420 nucleotides)         313   
> 2.9e-06   2
>
>
>> Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)
>         Length = 3,188,577
>
>   Plus Strand HSPs:
>
>  Score = 506 (82.0 bits), Expect = 4.9e-14, P = 4.9e-14
>  Identities = 850/1549 (54%), Positives = 850/1549 (54%), Strand =  
> Plus / Plus
>
> Query:   3450 ATGCATATGGTAATGTTAA-AATCACTGATTTTGGA- 
> TTTTGTGCTAAATTAAC-T-GAT 3505
>               | | ||| | | || |||| |||  ||||| ||| | ||||| || |  ||  
> |  | | |
> Sbjct: 155924 AGGGATACGATTAT-TTAAGAATT-CTGATATTGAAATTTTG-GC- 
> ATTTTCATATAGTT
> 155979
>
> Query:   3506 CAAAGA--AATAAACGTGCC-ACAATGGTGGGGACACCATATTGG- 
> ATGGCACCTGAAGT 3561
>               |||| |  ||||||  |    | |||| ||     | ||| | |  |||    
> |  | | |
> Sbjct: 155980 CAAACATTAATAAATATATTGAAAATGTTGATTTAATCAT-TAGTCATG--- 
> CTGGTACT
> 156035
>
> Query:   3562  
> GGTTAAACAAAAGGAATATGATGAAAAAGTTGATGTTTGGTCATTGGGGATTATGACTAT 3621
>               || | ||  |  |  || || | | |   ||||   |    ||||      
> ||||   ||
> Sbjct: 156036 GGATCAATCATTG--AT-TGTTTACAT--TTGAA-- 
> TAAACCATTAATTGTTATTGTTAA
> 156088
>
> Query:   3622 TGAAATGATTGAAGGAGAACCACCTTATTTGAA-T- 
> GAAGAACCATTAAAAGCATTATAT 3679
> ---------------------------------------------------------------------- 
> ------------------------------
>
> **Snippet of NEW blast report (generated using
> Bio::SearchIO::Writer::TextResultWriter)
> ---------------------------------------------------------------------- 
> ------------------------------
> uery= orf19.4890
>        (4,931 letters)
>
> Database: Ca21_Chromosomes
>            9 sequences; 14,324,492 total letters
>
>                                                                   
> Score       E
> Sequences producing significant alignments:                       
> (bits)    value
> Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)                 
> 24655  0.
> Ca21chr5 Assembly 21, Ca21chr5 (1190941 nucleotides)
> 1682  3.4e-68
> Ca21chr6 Assembly 21, Ca21chr6 (1033553 nucleotides)
> 908   3.0e-34
> Ca21chr2 Assembly 21, Ca21chr2 (2232049 nucleotides)
> 859   4.7e-30
> Ca21chr7 Assembly 21, Ca21chr7 (949626 nucleotides)
> 492   7.3e-24
> Ca21chr4 Assembly 21, Ca21chr4 (1603475 nucleotides)
> 528   9.8e-21
> Ca21chrR Assembly 21, Ca21chrR (2286425 nucleotides)
> 520   1.4e-19
> Ca21chr3 Assembly 21, Ca21chr3 (1799426 nucleotides)
> 502   1.7e-14
> Ca19-mtDNA Assembly 19, Ca19-mtDNA (40420 nucleotides)
> 313   2.9e-06
>
>
>> Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)
>          Length = 3188577
>
>  Score = 3705.3 bits (24655), Expect = 0., P = 0.
>  Identities = 4931/4931 (100%)
>  Frame =  -1 / +1
>
> Query: 1        
> ATAAAGGATGCCAAATAGTAGTAGTAAAATAGTAAATAGAATTGCAAAACAAAAATGATT -58
>                 
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct: 2248574  
> ATAAAGGATGCCAAATAGTAGTAGTAAAATAGTAAATAGAATTGCAAAACAAAAATGATT
> 2248633
>
> Query: -59      
> AAATAGCCCTTTATCAATAAATTTTTAAAGTTAGTTTCTTCTGGAACCCTACCCTCTTGG -118
>                 
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct: 2248634  
> AAATAGCCCTTTATCAATAAATTTTTAAAGTTAGTTTCTTCTGGAACCCTACCCTCTTGG
> 2248693
>
> Query: -119     
> TGTTAATCTTTTAAGTTAATATTTATAGTTAATAAAGTAGAAGTGTCTATTTATTGATTG -178
>                 
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct: 2248694  
> TGTTAATCTTTTAAGTTAATATTTATAGTTAATAAAGTAGAAGTGTCTATTTATTGATTG
> 2248753
>
> Query: -179     
> TTGTTGTTGTTGATTAAGAATATAAAGAAAAACAGAAAAGAAAAAAAGAAGGTTTAAAAA -238
>                 
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct: 2248754  
> TTGTTGTTGTTGATTAAGAATATAAAGAAAAACAGAAAAGAAAAAAAGAAGGTTTAAAAA
> 2248813
>
> Query: -239     
> AGTTAATTGTGAAGTAAAAGGGTTGAAAAATTTTTTTTTTTTCTGTTTCTCTCTTTGAGA -298
>                 
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct: 2248814  
> AGTTAATTGTGAAGTAAAAGGGTTGAAAAATTTTTTTTTTTTCTGTTTCTCTCTTTGAGA
> 2248873
>
> Query: -299     
> TTCTTTGACATATTTATTATTATAACACTATGCTATACTAAAAACAGTACTACCAATTGA -358
>                 
> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct: 2248874  
> TTCTTTGACATATTTATTATTATAACACTATGCTATACTAAAAACAGTACTACCAATTGA
> 2248933
>
> Query: -359     
> ATTAAATTAAATTAAATTAAATTAAATTATTAGACCAATTTCAATAAAGATAAGCAATTT -418
>
> ---------------------------------------------------------------------- 
> ------------------------------
>
> **Here is the snippet of code that reads the old report, generates new
> objects and writes new report:
> ---------------------------------------------------------------------- 
> ------------------------------
>     my $blast_report = Bio::SearchIO->new(-format => 'blast',
>                       -file   => $blastOutputTmp);
>
>     my $writer =
> Bio::SearchIO::Writer::TextResultWriter->new(-no_wublastlinks => 0);
>     my $out_blast_report = Bio::SearchIO->new(-writer => $writer,
>                           -file   => ">$blastOutputFile");
>
>     my $sorted_blast_report;
>
>     while( my $result = $blast_report->next_result ) {
>
>         my (%parameters, %statistics);
>
>     foreach my $param ($result->available_parameters) {
>
>         $parameters{$param} = $result->get_parameter($param);
>     }
>
>     foreach my $stat ($result->available_statistics) {
>
>         $statistics{$stat} = $result->get_statistic($stat);
>     }
>
>         my $generic_result =
> Bio::Search::Result::BlastResult->new(-query_name          =>
> $result->query_name,
>                                    -query_length        =>
> $result->query_length,
>                                    -database_name       =>
> $result->database_name,
>                                    -database_entries    =>
> $result->database_entries,
>                                    -parameters          => \% 
> parameters,
>                                    -statistics          => \% 
> statistics,
>                                    -algorithm           => $result- 
> >algorithm,
>                                    -query_description   =>
> $result->query_description,
>                                    -algorithm_reference =>
> $result->algorithm_reference,
>                                    -algorithm_version   =>
> $result->algorithm_version,
>                                    -database_letters    =>
> $result->database_letters);
>
>         while( my $hit = $result->next_hit ) {
>
>         my $generic_hit = Bio::Search::Hit::BlastHit->new(-name
>  => $hit->name,
>                                   -algorithm    => $hit->algorithm,
>                                   -description  => $hit->description,
>                                   -length       => $hit->length,
>                                   -score        => $hit->score,
>                                   -bits         => $hit->bits,
>                                   -significance => $hit- 
> >significance);
>
>         my (@hsp_sorted, @hsps);
>             while( my $hsp = $hit->next_hsp ) {
>
>             push(@hsps, $hsp);
>         }
>
>         @hsp_sorted = sort {$a->pvalue <=> $b->pvalue} @hsps;
>
>         for(my $i=0; $i<=$#hsp_sorted; $i++) {
>
>             $generic_hit->add_hsp($hsp_sorted[$i]);
>
>         }
>
>         $generic_result->add_hit($generic_hit);
>
>     }
>
>     $out_blast_report->write_result($generic_result);
>
>     }
> ---------------------------------------------------------------------- 
> ------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




More information about the Bioperl-l mailing list