[Bioperl-l] different results with remote-blast skript
Jonas Schaer
Brotelzwieb at gmx.de
Mon Jul 6 08:14:18 EDT 2009
Hi guys, thanks for your answers so far.
@jason: integer overflow in blast.... sorry, but what do you mean by that?
how can I fix it...?
Since I never really changed any parameters I thought them all to be
default. whatever, I tried to get "better" results with my prog by changing
these:
$Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
$Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
$Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
with no effect...I guess these were default values anyway.
So please maybe you can tell me all the other parameters I can change with
my perl-skript AND how to do that?
Unfortunately both, perl and the blast-algorithm are pretty much new to me,
maybe thats why I just cannot find out how to do that on my own... :/
Here is the output I get with my remote-blast skript:
#################################################################################################################
Query Name:
MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
L
hit name is ref|XP_001702807.1|
score is 442
BLASTP 2.2.21+
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and
David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new
generation of protein database search programs", Nucleic
Acids Res. 25:3389-3402.
Reference for composition-based statistics: Alejandro A.
Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
John L. Spouge, Yuri I. Wolf, Eugene V. Koonin, and Stephen
F. Altschul (2001), "Improving the accuracy of PSI-BLAST
protein database searches with composition-based statistics
and other refinements", Nucleic Acids Res. 29:2994-3005.
RID: 53STX5G2013
Database: All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF excluding environmental samples
from WGS projects
9,252,587 sequences; 3,169,972,781 total letters
Query=
MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
ATGPDPDDEYE
Length=150
Score
E
Sequences producing significant alignments: (Bits)
Value
ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 174
2e-42
ALIGNMENTS
>ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii]
gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
Length=303
Score = 174 bits (442), Expect = 2e-42, Method: Composition-based stats.
Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150 (0%)
Query 1 MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds 60
MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
Sbjct 154 MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
213
Query 61 dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
120
DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
Sbjct 214 DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
273
Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150
AWHERDDNAFRQAHQNTAMATGPDPDDEYE
Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303
Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
excluding environmental samples from WGS projects
Posted date: Jul 5, 2009 4:41 AM
Number of letters in database: -1,124,994,511
Number of sequences in database: 9,252,587
Lambda K H
0.309 0.122 0.345
Gapped
Lambda K H
0.267 0.0410 0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 9252587
Number of Hits to DB: 60273703
Number of extensions: 1448367
Number of successful extensions: 2103
Number of sequences better than 10: 0
Number of HSP's better than 10 without gapping: 0
Number of HSP's gapped: 2113
Number of HSP's successfully gapped: 0
Length of query: 150
Length of database: 3169972781
Length adjustment: 113
Effective length of query: 37
Effective length of database: 2124430450
Effective search space: 78603926650
Effective search space used: 78603926650
T: 11
A: 40
X1: 16 (7.1 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 42 (20.8 bits)
S2: 74 (33.1 bits)
#################################################################################################################
and here are the hits (?) of the blast-algorithm on the ncbi-homepage with
the same query of course:
ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 300
3e-80
ref|XP_001942719.1| PREDICTED: similar to GA16705-PA [Acyrtho... 36.2
1.1
ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 [Blautia... 35.4
1.8
ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania brazil... 34.3
4.2
ref|XP_680841.1| hypothetical protein AN7572.2 [Aspergillus n... 33.5
6.0
ref|YP_001768110.1| hypothetical protein M446_1150 [Methyloba... 33.5
7.0
#################################################################################################################at
least the first hit is the same, but even there there is a different score
and e-value.
thanks so much for any help :)
regards, jonas
----- Original Message -----
From: "Chris Fields" <cjfields at illinois.edu>
To: "Jason Stajich" <jason at bioperl.org>
Cc: "Smithies, Russell" <Russell.Smithies at agresearch.co.nz>; "'BioPerl
List'" <bioperl-l at lists.open-bio.org>; "'Jonas Schaer'"
<Jonas_Schaer at gmx.de>
Sent: Monday, July 06, 2009 12:51 AM
Subject: Re: [Bioperl-l] different results with remote-blast skript
> That inspires confidence ;>
>
> chris
>
> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
>
>> integer overflow in blast....
>>
>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>>
>>> I'd guess it's a difference in the parameters used.
>>> Interesting that both have the number of letters in the db as
>>> "-1,125,070,205", I assume that's a bug :-)
>>>
>>> Stats from your remote_blast:
>>>
>>> 'stats' => {
>>> 'S1' => '42',
>>> 'S1_bits' => '20.8',
>>> 'lambda' => '0.309',
>>> 'entropy' => '0.345',
>>> 'kappa_gapped' => '0.0410',
>>> 'T' => '11',
>>> 'kappa' => '0.122',
>>> 'X3_bits' => '24.7',
>>> 'X1' => '16',
>>> 'lambda_gapped' => '0.267',
>>> 'X2' => '38',
>>> 'S2' => '74',
>>> 'seqs_better_than_cutoff' => '0',
>>> 'posted_date' => 'Jul 4, 2009 4:41 AM',
>>> 'Hits_to_DB' => '60102303',
>>> 'dbletters' => '-1125070205',
>>> 'A' => '40',
>>> 'num_successful_extensions' => '2004',
>>> 'num_extensions' => '1436892',
>>> 'X1_bits' => '7.1',
>>> 'X3' => '64',
>>> 'entropy_gapped' => '0.140',
>>> 'dbentries' => '9252258',
>>> 'X2_bits' => '14.6',
>>> 'S2_bits' => '33.1'
>>> }
>>>
>>>
>>> Stats from a blast done on the NCBI webpage:
>>>
>>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt
>>> +PIR+PRF
>>> excluding environmental samples from WGS projects
>>> Posted date: Jul 4, 2009 4:41 AM
>>> Number of letters in database: -1,125,070,205
>>> Number of sequences in database: 9,252,258
>>>
>>> Lambda K H
>>> 0.309 0.124 0.340
>>> Gapped
>>> Lambda K H
>>> 0.267 0.0410 0.140
>>> Matrix: BLOSUM62
>>> Gap Penalties: Existence: 11, Extension: 1
>>> Number of Sequences: 9252258
>>> Number of Hits to DB: 86493230
>>> Number of extensions: 3101413
>>> Number of successful extensions: 9001
>>> Number of sequences better than 100: 65
>>> Number of HSP's better than 100 without gapping: 0
>>> Number of HSP's gapped: 9000
>>> Number of HSP's successfully gapped: 66
>>> Length of query: 150
>>> Length of database: 3169897087
>>> Length adjustment: 113
>>> Effective length of query: 37
>>> Effective length of database: 2124391933
>>> Effective search space: 78602501521
>>> Effective search space used: 78602501521
>>> T: 11
>>> A: 40
>>> X1: 16 (7.1 bits)
>>> X2: 38 (14.6 bits)
>>> X3: 64 (24.7 bits)
>>> S1: 42 (20.8 bits)
>>> S2: 65 (29.6 bits)
>>>
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>>> To: BioPerl List
>>>> Subject: [Bioperl-l] different results with remote-blast skript
>>>>
>>>> Hi again :)
>>>> please, I only have this little question:
>>>> why do I get different results with my remote::blast perl skript
>>>> then on the
>>>> ncbi blast homepage?
>>>> I am using blastp, the query is an amino-sequence (different
>>>> results with any
>>>> sequence, differences not only in number of hits but even in e-
>>>> values, scores
>>>> etc...), the database is 'nr'.
>>>> PLEASE help me,
>>>> thank you in advance,
>>>> Jonas
>>>>
>>>> ps: my skript:
>>>> ##############################################################################
>>>> ##
>>>> use Bio::Seq::SeqFactory;
>>>> use Bio::Tools::Run::RemoteBlast;
>>>> use strict;
>>>> my @blast_report;
>>>> my $prog = 'blastp';
>>>> my $db = 'nr';
>>>> my $e_val= '1e-10';
>>>> #my $e_val= '10';
>>>> my @params = ( '-prog' => $prog,
>>>> '-data' => $db,
>>>> '-expect' => $e_val,
>>>> '-readmethod' => 'SearchIO' );
>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>> $
>>>> Bio
>>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
>>>> = '1';
>>>>
>>>> my
>>>> $
>>>> blast_seq
>>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD
>>>> PDDEYE';
>>>> #$v is just to turn on and off the messages
>>>> my $v = 1;
>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
>>>> 'Bio::PrimarySeq');
>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>
>>>> "$blast_seq");
>>>> my $filename='temp2.out';
>>>> my $r = $factory->submit_blast($seq);
>>>> print STDERR "waiting..." if( $v > 0 );
>>>> while ( my @rids = $factory->each_rid )
>>>> {
>>>> foreach my $rid ( @rids )
>>>> {
>>>> my $rc = $factory->retrieve_blast($rid);
>>>> if( !ref($rc) )
>>>> {
>>>> if( $rc < 0 )
>>>> {
>>>> $factory->remove_rid($rid);
>>>> }
>>>> print STDERR "." if ( $v > 0 );
>>>> }
>>>> else
>>>> {
>>>> my $result = $rc->next_result();
>>>> $factory->save_output($filename);
>>>> $factory->remove_rid($rid);
>>>> print "\nQuery Name: ", $result->query_name(),
>>>> "\n";
>>>> while ( my $hit = $result->next_hit )
>>>> {
>>>> next unless ( $v > 0);
>>>> print "\thit name is ", $hit->name, "\n";
>>>> while( my $hsp = $hit->next_hsp )
>>>> {
>>>> print "\t\tscore is ", $hsp->score, "\n";
>>>> }
>>>> }
>>>> }
>>>> }
>>>>
>>>>
>>>> }
>>>> @blast_report = get_file_data ($filename);
>>>> return @blast_report;
>>>> ##############################################################################
>>>> ####
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> =
>>> =
>>> =====================================================================
>>> Attention: The information contained in this message and/or
>>> attachments
>>> from AgResearch Limited is intended only for the persons or entities
>>> to which it is addressed and may contain confidential and/or
>>> privileged
>>> material. Any review, retransmission, dissemination or other use
>>> of, or
>>> taking of any action in reliance upon, this information by persons or
>>> entities other than the intended recipients is prohibited by
>>> AgResearch
>>> Limited. If you have received this message in error, please notify
>>> the
>>> sender immediately.
>>> =
>>> =
>>> =====================================================================
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason at bioperl.org
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
--------------------------------------------------------------------------------
No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release Date: 07/05/09
05:53:00
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 231 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090706/47916715/attachment-0004.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 358 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090706/47916715/attachment-0005.gif>
More information about the Bioperl-l
mailing list