[Bioperl-l] different results with remote-blast skript
Chris Fields
cjfields at illinois.edu
Sun Jul 5 22:51:39 UTC 2009
That inspires confidence ;>
chris
On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
> integer overflow in blast....
>
> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>
>> I'd guess it's a difference in the parameters used.
>> Interesting that both have the number of letters in the db as
>> "-1,125,070,205", I assume that's a bug :-)
>>
>> Stats from your remote_blast:
>>
>> 'stats' => {
>> 'S1' => '42',
>> 'S1_bits' => '20.8',
>> 'lambda' => '0.309',
>> 'entropy' => '0.345',
>> 'kappa_gapped' => '0.0410',
>> 'T' => '11',
>> 'kappa' => '0.122',
>> 'X3_bits' => '24.7',
>> 'X1' => '16',
>> 'lambda_gapped' => '0.267',
>> 'X2' => '38',
>> 'S2' => '74',
>> 'seqs_better_than_cutoff' => '0',
>> 'posted_date' => 'Jul 4, 2009 4:41 AM',
>> 'Hits_to_DB' => '60102303',
>> 'dbletters' => '-1125070205',
>> 'A' => '40',
>> 'num_successful_extensions' => '2004',
>> 'num_extensions' => '1436892',
>> 'X1_bits' => '7.1',
>> 'X3' => '64',
>> 'entropy_gapped' => '0.140',
>> 'dbentries' => '9252258',
>> 'X2_bits' => '14.6',
>> 'S2_bits' => '33.1'
>> }
>>
>>
>> Stats from a blast done on the NCBI webpage:
>>
>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt
>> +PIR+PRF
>> excluding environmental samples from WGS projects
>> Posted date: Jul 4, 2009 4:41 AM
>> Number of letters in database: -1,125,070,205
>> Number of sequences in database: 9,252,258
>>
>> Lambda K H
>> 0.309 0.124 0.340
>> Gapped
>> Lambda K H
>> 0.267 0.0410 0.140
>> Matrix: BLOSUM62
>> Gap Penalties: Existence: 11, Extension: 1
>> Number of Sequences: 9252258
>> Number of Hits to DB: 86493230
>> Number of extensions: 3101413
>> Number of successful extensions: 9001
>> Number of sequences better than 100: 65
>> Number of HSP's better than 100 without gapping: 0
>> Number of HSP's gapped: 9000
>> Number of HSP's successfully gapped: 66
>> Length of query: 150
>> Length of database: 3169897087
>> Length adjustment: 113
>> Effective length of query: 37
>> Effective length of database: 2124391933
>> Effective search space: 78602501521
>> Effective search space used: 78602501521
>> T: 11
>> A: 40
>> X1: 16 (7.1 bits)
>> X2: 38 (14.6 bits)
>> X3: 64 (24.7 bits)
>> S1: 42 (20.8 bits)
>> S2: 65 (29.6 bits)
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>> To: BioPerl List
>>> Subject: [Bioperl-l] different results with remote-blast skript
>>>
>>> Hi again :)
>>> please, I only have this little question:
>>> why do I get different results with my remote::blast perl skript
>>> then on the
>>> ncbi blast homepage?
>>> I am using blastp, the query is an amino-sequence (different
>>> results with any
>>> sequence, differences not only in number of hits but even in e-
>>> values, scores
>>> etc...), the database is 'nr'.
>>> PLEASE help me,
>>> thank you in advance,
>>> Jonas
>>>
>>> ps: my skript:
>>> ##############################################################################
>>> ##
>>> use Bio::Seq::SeqFactory;
>>> use Bio::Tools::Run::RemoteBlast;
>>> use strict;
>>> my @blast_report;
>>> my $prog = 'blastp';
>>> my $db = 'nr';
>>> my $e_val= '1e-10';
>>> #my $e_val= '10';
>>> my @params = ( '-prog' => $prog,
>>> '-data' => $db,
>>> '-expect' => $e_val,
>>> '-readmethod' => 'SearchIO' );
>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>> $
>>> Bio
>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
>>> = '1';
>>>
>>> my
>>> $
>>> blast_seq
>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD
>>> PDDEYE';
>>> #$v is just to turn on and off the messages
>>> my $v = 1;
>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
>>> 'Bio::PrimarySeq');
>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
>>> "$blast_seq");
>>> my $filename='temp2.out';
>>> my $r = $factory->submit_blast($seq);
>>> print STDERR "waiting..." if( $v > 0 );
>>> while ( my @rids = $factory->each_rid )
>>> {
>>> foreach my $rid ( @rids )
>>> {
>>> my $rc = $factory->retrieve_blast($rid);
>>> if( !ref($rc) )
>>> {
>>> if( $rc < 0 )
>>> {
>>> $factory->remove_rid($rid);
>>> }
>>> print STDERR "." if ( $v > 0 );
>>> }
>>> else
>>> {
>>> my $result = $rc->next_result();
>>> $factory->save_output($filename);
>>> $factory->remove_rid($rid);
>>> print "\nQuery Name: ", $result->query_name(),
>>> "\n";
>>> while ( my $hit = $result->next_hit )
>>> {
>>> next unless ( $v > 0);
>>> print "\thit name is ", $hit->name, "\n";
>>> while( my $hsp = $hit->next_hsp )
>>> {
>>> print "\t\tscore is ", $hsp->score, "\n";
>>> }
>>> }
>>> }
>>> }
>>>
>>>
>>> }
>>> @blast_report = get_file_data ($filename);
>>> return @blast_report;
>>> ##############################################################################
>>> ####
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> =
>> =
>> =====================================================================
>> Attention: The information contained in this message and/or
>> attachments
>> from AgResearch Limited is intended only for the persons or entities
>> to which it is addressed and may contain confidential and/or
>> privileged
>> material. Any review, retransmission, dissemination or other use
>> of, or
>> taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipients is prohibited by
>> AgResearch
>> Limited. If you have received this message in error, please notify
>> the
>> sender immediately.
>> =
>> =
>> =====================================================================
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list