[Bioperl-l] different results with remote-blast skript
Jason Stajich
jason at bioperl.org
Sun Jul 5 21:40:41 UTC 2009
integer overflow in blast....
On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
> I'd guess it's a difference in the parameters used.
> Interesting that both have the number of letters in the db as
> "-1,125,070,205", I assume that's a bug :-)
>
> Stats from your remote_blast:
>
> 'stats' => {
> 'S1' => '42',
> 'S1_bits' => '20.8',
> 'lambda' => '0.309',
> 'entropy' => '0.345',
> 'kappa_gapped' => '0.0410',
> 'T' => '11',
> 'kappa' => '0.122',
> 'X3_bits' => '24.7',
> 'X1' => '16',
> 'lambda_gapped' => '0.267',
> 'X2' => '38',
> 'S2' => '74',
> 'seqs_better_than_cutoff' => '0',
> 'posted_date' => 'Jul 4, 2009 4:41 AM',
> 'Hits_to_DB' => '60102303',
> 'dbletters' => '-1125070205',
> 'A' => '40',
> 'num_successful_extensions' => '2004',
> 'num_extensions' => '1436892',
> 'X1_bits' => '7.1',
> 'X3' => '64',
> 'entropy_gapped' => '0.140',
> 'dbentries' => '9252258',
> 'X2_bits' => '14.6',
> 'S2_bits' => '33.1'
> }
>
>
> Stats from a blast done on the NCBI webpage:
>
> Database: All non-redundant GenBank CDS translations+PDB+SwissProt
> +PIR+PRF
> excluding environmental samples from WGS projects
> Posted date: Jul 4, 2009 4:41 AM
> Number of letters in database: -1,125,070,205
> Number of sequences in database: 9,252,258
>
> Lambda K H
> 0.309 0.124 0.340
> Gapped
> Lambda K H
> 0.267 0.0410 0.140
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Sequences: 9252258
> Number of Hits to DB: 86493230
> Number of extensions: 3101413
> Number of successful extensions: 9001
> Number of sequences better than 100: 65
> Number of HSP's better than 100 without gapping: 0
> Number of HSP's gapped: 9000
> Number of HSP's successfully gapped: 66
> Length of query: 150
> Length of database: 3169897087
> Length adjustment: 113
> Effective length of query: 37
> Effective length of database: 2124391933
> Effective search space: 78602501521
> Effective search space used: 78602501521
> T: 11
> A: 40
> X1: 16 (7.1 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 42 (20.8 bits)
> S2: 65 (29.6 bits)
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
>> Sent: Sunday, 28 June 2009 10:15 p.m.
>> To: BioPerl List
>> Subject: [Bioperl-l] different results with remote-blast skript
>>
>> Hi again :)
>> please, I only have this little question:
>> why do I get different results with my remote::blast perl skript
>> then on the
>> ncbi blast homepage?
>> I am using blastp, the query is an amino-sequence (different
>> results with any
>> sequence, differences not only in number of hits but even in e-
>> values, scores
>> etc...), the database is 'nr'.
>> PLEASE help me,
>> thank you in advance,
>> Jonas
>>
>> ps: my skript:
>> ##############################################################################
>> ##
>> use Bio::Seq::SeqFactory;
>> use Bio::Tools::Run::RemoteBlast;
>> use strict;
>> my @blast_report;
>> my $prog = 'blastp';
>> my $db = 'nr';
>> my $e_val= '1e-10';
>> #my $e_val= '10';
>> my @params = ( '-prog' => $prog,
>> '-data' => $db,
>> '-expect' => $e_val,
>> '-readmethod' => 'SearchIO' );
>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>> $
>> Bio
>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} =
>> '1';
>>
>> my
>> $
>> blast_seq
>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD
>> PDDEYE';
>> #$v is just to turn on and off the messages
>> my $v = 1;
>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
>> 'Bio::PrimarySeq');
>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
>> "$blast_seq");
>> my $filename='temp2.out';
>> my $r = $factory->submit_blast($seq);
>> print STDERR "waiting..." if( $v > 0 );
>> while ( my @rids = $factory->each_rid )
>> {
>> foreach my $rid ( @rids )
>> {
>> my $rc = $factory->retrieve_blast($rid);
>> if( !ref($rc) )
>> {
>> if( $rc < 0 )
>> {
>> $factory->remove_rid($rid);
>> }
>> print STDERR "." if ( $v > 0 );
>> }
>> else
>> {
>> my $result = $rc->next_result();
>> $factory->save_output($filename);
>> $factory->remove_rid($rid);
>> print "\nQuery Name: ", $result->query_name(),
>> "\n";
>> while ( my $hit = $result->next_hit )
>> {
>> next unless ( $v > 0);
>> print "\thit name is ", $hit->name, "\n";
>> while( my $hsp = $hit->next_hsp )
>> {
>> print "\t\tscore is ", $hsp->score, "\n";
>> }
>> }
>> }
>> }
>>
>>
>> }
>> @blast_report = get_file_data ($filename);
>> return @blast_report;
>> ##############################################################################
>> ####
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> =
> ======================================================================
> Attention: The information contained in this message and/or
> attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or
> privileged
> material. Any review, retransmission, dissemination or other use of,
> or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by
> AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =
> ======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
--
Jason Stajich
jason at bioperl.org
More information about the Bioperl-l
mailing list