[Bioperl-l] different results with remote-blast skript

Chris Fields cjfields at illinois.edu
Sun Jul 5 22:51:39 UTC 2009


That inspires confidence ;>

chris

On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:

> integer overflow in blast....
>
> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>
>> I'd guess it's a difference in the parameters used.
>> Interesting that both have the number of letters in the db as  
>> "-1,125,070,205", I assume that's a bug  :-)
>>
>> Stats from your remote_blast:
>>
>> 'stats' => {
>>            'S1' => '42',
>>            'S1_bits' => '20.8',
>>            'lambda' => '0.309',
>>            'entropy' => '0.345',
>>            'kappa_gapped' => '0.0410',
>>            'T' => '11',
>>            'kappa' => '0.122',
>>            'X3_bits' => '24.7',
>>            'X1' => '16',
>>            'lambda_gapped' => '0.267',
>>            'X2' => '38',
>>            'S2' => '74',
>>            'seqs_better_than_cutoff' => '0',
>>            'posted_date' => 'Jul 4, 2009  4:41 AM',
>>            'Hits_to_DB' => '60102303',
>>            'dbletters' => '-1125070205',
>>            'A' => '40',
>>            'num_successful_extensions' => '2004',
>>            'num_extensions' => '1436892',
>>            'X1_bits' => '7.1',
>>            'X3' => '64',
>>            'entropy_gapped' => '0.140',
>>            'dbentries' => '9252258',
>>            'X2_bits' => '14.6',
>>            'S2_bits' => '33.1'
>>          }
>>
>>
>> Stats from a blast done on the NCBI webpage:
>>
>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt 
>> +PIR+PRF
>> excluding environmental samples from WGS projects
>>   Posted date:  Jul 4, 2009  4:41 AM
>> Number of letters in database: -1,125,070,205
>> Number of sequences in database:  9,252,258
>>
>> Lambda     K      H
>>  0.309    0.124    0.340
>> Gapped
>> Lambda     K      H
>>  0.267   0.0410    0.140
>> Matrix: BLOSUM62
>> Gap Penalties: Existence: 11, Extension: 1
>> Number of Sequences: 9252258
>> Number of Hits to DB: 86493230
>> Number of extensions: 3101413
>> Number of successful extensions: 9001
>> Number of sequences better than 100: 65
>> Number of HSP's better than 100 without gapping: 0
>> Number of HSP's gapped: 9000
>> Number of HSP's successfully gapped: 66
>> Length of query: 150
>> Length of database: 3169897087
>> Length adjustment: 113
>> Effective length of query: 37
>> Effective length of database: 2124391933
>> Effective search space: 78602501521
>> Effective search space used: 78602501521
>> T: 11
>> A: 40
>> X1: 16 (7.1 bits)
>> X2: 38 (14.6 bits)
>> X3: 64 (24.7 bits)
>> S1: 42 (20.8 bits)
>> S2: 65 (29.6 bits)
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>> To: BioPerl List
>>> Subject: [Bioperl-l] different results with remote-blast skript
>>>
>>> Hi again :)
>>> please, I only have this little question:
>>> why do I get different results with my remote::blast perl skript  
>>> then on the
>>> ncbi blast homepage?
>>> I am using blastp, the query is an amino-sequence (different  
>>> results with any
>>> sequence, differences not only in number of hits but even in e- 
>>> values, scores
>>> etc...), the database is 'nr'.
>>> PLEASE help me,
>>> thank you in advance,
>>> Jonas
>>>
>>> ps: my skript:
>>> ##############################################################################
>>> ##
>>> use Bio::Seq::SeqFactory;
>>> use Bio::Tools::Run::RemoteBlast;
>>> use strict;
>>> my @blast_report;
>>> my $prog = 'blastp';
>>> my $db   = 'nr';
>>> my $e_val= '1e-10';
>>> #my $e_val= '10';
>>> my @params = ( '-prog' => $prog,
>>>        '-data' => $db,
>>>        '-expect' => $e_val,
>>>        '-readmethod' => 'SearchIO' );
>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>  $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>> $ 
>>> Bio 
>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}  
>>> = '1';
>>>
>>> my
>>> $ 
>>> blast_seq 
>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD
>>> PDDEYE';
>>> #$v is just to turn on and off the messages
>>> my $v = 1;
>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>  
>>> 'Bio::PrimarySeq');
>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
>>> "$blast_seq");
>>> my $filename='temp2.out';
>>> my $r = $factory->submit_blast($seq);
>>> print STDERR "waiting..." if( $v > 0 );
>>>   while ( my @rids = $factory->each_rid )
>>>   {
>>>       foreach my $rid ( @rids )
>>>       {
>>>           my $rc = $factory->retrieve_blast($rid);
>>>           if( !ref($rc) )
>>>           {
>>>               if( $rc < 0 )
>>>               {
>>>                   $factory->remove_rid($rid);
>>>               }
>>>               print STDERR "." if ( $v > 0 );
>>>           }
>>>               else
>>>               {
>>>                   my $result = $rc->next_result();
>>>                   $factory->save_output($filename);
>>>                   $factory->remove_rid($rid);
>>>                   print "\nQuery Name: ", $result->query_name(),  
>>> "\n";
>>>                   while ( my $hit = $result->next_hit )
>>>                   {
>>>                       next unless ( $v > 0);
>>>                       print "\thit name is ", $hit->name, "\n";
>>>                       while( my $hsp = $hit->next_hsp )
>>>                       {
>>>                           print "\t\tscore is ", $hsp->score, "\n";
>>>                       }
>>>                   }
>>>               }
>>>       }
>>>
>>>
>>>   }
>>> @blast_report = get_file_data ($filename);
>>> return @blast_report;
>>> ##############################################################################
>>> ####
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> = 
>> = 
>> =====================================================================
>> Attention: The information contained in this message and/or  
>> attachments
>> from AgResearch Limited is intended only for the persons or entities
>> to which it is addressed and may contain confidential and/or  
>> privileged
>> material. Any review, retransmission, dissemination or other use  
>> of, or
>> taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipients is prohibited by  
>> AgResearch
>> Limited. If you have received this message in error, please notify  
>> the
>> sender immediately.
>> = 
>> = 
>> =====================================================================
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




More information about the Bioperl-l mailing list