[Bioperl-l] different results with remote-blast skript

Jason Stajich jason at bioperl.org
Sun Jul 5 17:40:41 EDT 2009


integer overflow in blast....

On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:

> I'd guess it's a difference in the parameters used.
> Interesting that both have the number of letters in the db as  
> "-1,125,070,205", I assume that's a bug  :-)
>
> Stats from your remote_blast:
>
> 'stats' => {
>             'S1' => '42',
>             'S1_bits' => '20.8',
>             'lambda' => '0.309',
>             'entropy' => '0.345',
>             'kappa_gapped' => '0.0410',
>             'T' => '11',
>             'kappa' => '0.122',
>             'X3_bits' => '24.7',
>             'X1' => '16',
>             'lambda_gapped' => '0.267',
>             'X2' => '38',
>             'S2' => '74',
>             'seqs_better_than_cutoff' => '0',
>             'posted_date' => 'Jul 4, 2009  4:41 AM',
>             'Hits_to_DB' => '60102303',
>             'dbletters' => '-1125070205',
>             'A' => '40',
>             'num_successful_extensions' => '2004',
>             'num_extensions' => '1436892',
>             'X1_bits' => '7.1',
>             'X3' => '64',
>             'entropy_gapped' => '0.140',
>             'dbentries' => '9252258',
>             'X2_bits' => '14.6',
>             'S2_bits' => '33.1'
>           }
>
>
> Stats from a blast done on the NCBI webpage:
>
>  Database: All non-redundant GenBank CDS translations+PDB+SwissProt 
> +PIR+PRF
> excluding environmental samples from WGS projects
>    Posted date:  Jul 4, 2009  4:41 AM
>  Number of letters in database: -1,125,070,205
>  Number of sequences in database:  9,252,258
>
> Lambda     K      H
>   0.309    0.124    0.340
> Gapped
> Lambda     K      H
>   0.267   0.0410    0.140
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Sequences: 9252258
> Number of Hits to DB: 86493230
> Number of extensions: 3101413
> Number of successful extensions: 9001
> Number of sequences better than 100: 65
> Number of HSP's better than 100 without gapping: 0
> Number of HSP's gapped: 9000
> Number of HSP's successfully gapped: 66
> Length of query: 150
> Length of database: 3169897087
> Length adjustment: 113
> Effective length of query: 37
> Effective length of database: 2124391933
> Effective search space: 78602501521
> Effective search space used: 78602501521
> T: 11
> A: 40
> X1: 16 (7.1 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 42 (20.8 bits)
> S2: 65 (29.6 bits)
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
>> Sent: Sunday, 28 June 2009 10:15 p.m.
>> To: BioPerl List
>> Subject: [Bioperl-l] different results with remote-blast skript
>>
>> Hi again :)
>> please, I only have this little question:
>> why do I get different results with my remote::blast perl skript  
>> then on the
>> ncbi blast homepage?
>> I am using blastp, the query is an amino-sequence (different  
>> results with any
>> sequence, differences not only in number of hits but even in e- 
>> values, scores
>> etc...), the database is 'nr'.
>> PLEASE help me,
>> thank you in advance,
>> Jonas
>>
>> ps: my skript:
>> ##############################################################################
>> ##
>> use Bio::Seq::SeqFactory;
>>  use Bio::Tools::Run::RemoteBlast;
>>  use strict;
>>  my @blast_report;
>>  my $prog = 'blastp';
>>  my $db   = 'nr';
>>  my $e_val= '1e-10';
>>  #my $e_val= '10';
>>  my @params = ( '-prog' => $prog,
>>         '-data' => $db,
>>         '-expect' => $e_val,
>>         '-readmethod' => 'SearchIO' );
>>  my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>   $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>   $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>> $ 
>> Bio 
>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} =  
>> '1';
>>
>>  my
>> $ 
>> blast_seq 
>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD
>> PDDEYE';
>>  #$v is just to turn on and off the messages
>>  my $v = 1;
>>  my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>  
>> 'Bio::PrimarySeq');
>>  my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
>> "$blast_seq");
>>  my $filename='temp2.out';
>>  my $r = $factory->submit_blast($seq);
>>  print STDERR "waiting..." if( $v > 0 );
>>    while ( my @rids = $factory->each_rid )
>>    {
>>        foreach my $rid ( @rids )
>>        {
>>            my $rc = $factory->retrieve_blast($rid);
>>            if( !ref($rc) )
>>            {
>>                if( $rc < 0 )
>>                {
>>                    $factory->remove_rid($rid);
>>                }
>>                print STDERR "." if ( $v > 0 );
>>            }
>>                else
>>                {
>>                    my $result = $rc->next_result();
>>                    $factory->save_output($filename);
>>                    $factory->remove_rid($rid);
>>                    print "\nQuery Name: ", $result->query_name(),  
>> "\n";
>>                    while ( my $hit = $result->next_hit )
>>                    {
>>                        next unless ( $v > 0);
>>                        print "\thit name is ", $hit->name, "\n";
>>                        while( my $hsp = $hit->next_hsp )
>>                        {
>>                            print "\t\tscore is ", $hsp->score, "\n";
>>                        }
>>                    }
>>                }
>>        }
>>
>>
>>    }
>> @blast_report = get_file_data ($filename);
>> return @blast_report;
>> ##############################################################################
>> ####
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> = 
> ======================================================================
> Attention: The information contained in this message and/or  
> attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or  
> privileged
> material. Any review, retransmission, dissemination or other use of,  
> or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by  
> AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> = 
> ======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org



More information about the Bioperl-l mailing list