[Bioperl-l] cdd-search with remoteblast?

Cook, Malcolm MEC at stowers.org
Thu Jul 23 14:53:55 UTC 2009


Chris,

I figured it was something like that.  Isolating the test in a file worked for me to decouple whatever cross-contamination was going on.  So, I'll just leave it at that for now unless any objections.  I suppose the correct follow on would be to file a bug....  I'm poised to go on vacation.... perhaps when I get back....

I don't use RemoteBlast myself at all in fact.... just wrote the rps thing to make sure I knew how the ncbi service worked...

Cheers,

Malcolm
________________________________________
From: Chris Fields [cjfields1 at gmail.com]
Sent: Wednesday, July 22, 2009 6:32 PM
To: Cook, Malcolm
Cc: 'BioPerl List'; 'Jonas Schaer'
Subject: Re: [Bioperl-l] cdd-search with remoteblast?

Malcolm, it's probably not you.  Looks like the get/put parameters are
set as globals, so there may be cross-contamination of instances
(worth checking JIC).

You can probably work around that to an extent by encompassing any
calls in blocks to localize changes.

chris

On Jul 21, 2009, at 11:59 AM, Cook, Malcolm wrote:

> Chris,
>
> I wound up adding a new test
>
> #       $Id: RemoteBlast_rpsblast.t 15874 2009-07-21 16:57:54Z mcook $
>
> with the comment :
>
> # malcolm_cook at stowers.org: this test is in a separate file from
> # RemoteBlast.t (on which it is modelled) since there is some sort of
> # side-effecting between the multiple remote blasts that is causing
> # this test to fail, if it comes last, or the other test to fail, if
> # this one comes first.  THIS IS A BUG EITHER IN REMOTE BLAST OR MY
> # UNDERSTANDING, i.e. of how to initialize it.
>
> In any case, the test passes and demos rpsblast usage.
>
> Cheers,
>
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>> -----Original Message-----
>> From: Chris Fields [mailto:cjfields1 at gmail.com]
>> Sent: Friday, July 10, 2009 1:05 PM
>> To: Cook, Malcolm
>> Cc: 'Jonas Schaer'; 'BioPerl List'
>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>
>> Malcolm,
>>
>> Nice!  Go ahead and add the test in; we can look at trying to
>> get CDD_SEARCH working at some point but this is a nice workaround.
>>
>> chris
>>
>> On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote:
>>
>>> Chris, I've added a test to bioperl RemoteBlast.t that demonstrates
>>> the following.  Is it appropriate to submit it?
>>>
>>> Jonas, OK, I was a little quick on the gun... but I've got it now.
>>>
>>> You don't need to change the wrapper.  Here is what you need to do:
>>>
>>> # 1) set your database like this:
>>>
>>> -database => 'cdsearch/cdd', # c.f.
>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html
>>> for other cdd database options
>>>
>>> # 2) add this line before submitting the job:
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';
>>>
>>> You're in - No other changes needed.
>>>
>>> Malcolm Cook
>>> Stowers Institute for Medical Research - Kansas City, Missouri
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jonas Schaer [mailto:Brotelzwieb at gmx.de]
>>>> Sent: Friday, July 10, 2009 4:18 AM
>>>> To: BioPerl List; Cook, Malcolm; Chris Fields
>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>>
>>>> Hi,
>>>> I tried to do what Malcom proposed my ($prog = 'rpsblast';
>>>> my $db   =
>>>> 'CDD';)  but that didn't work.
>>>>
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: Value rpsblast for PUT parameter PROGRAM does not match
>>>> expression t?blast[ pnx]. Rejecting.
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>>>> C:/Perl/site/lib/Bio/Tools
>>>> /Run/RemoteBlast.pm:329
>>>> STACK: Bio::Tools::Run::RemoteBlast::new
>>>> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl
>>>> ast.pm:257
>>>> STACK: blast_a_seq2.pm:14
>>>> -----------------------------------------------------------
>>>> So I should try to "change the wrapper to allow
>> 'rpsblast'", right?
>>>> Could You tell me how to do that, please? So sorry but I
>> have no idea
>>>> yet...:) If that doesn't work, is there any other way to run
>>>> cdd-searches with perl?
>>>> Thank you so much!
>>>> Regards, Jonas
>>>>
>>>> ----- Original Message -----
>>>> From: "Chris Fields" <cjfields1 at gmail.com>
>>>> To: "Cook, Malcolm" <MEC at stowers.org>
>>>> Cc: "'Jonas Schaer'" <Brotelzwieb at gmx.de>; "'BioPerl List'"
>>>> <bioperl-l at lists.open-bio.org>; "'Smithies, Russell'"
>>>> <Russell.Smithies at agresearch.co.nz>; <cjfields at bioperl.org>
>>>> Sent: Thursday, July 09, 2009 9:19 PM
>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>>
>>>>
>>>>> I've scheduled this tentatively for the 1.6 release
>> series (just not
>>>>> sure when yet).  It may work as is, but I haven't tried
>> it out yet
>>>>> (and am hazarding to guess it only retrieves the single
>> main RID at
>>>>> the moment).
>>>>>
>>>>> chris
>>>>>
>>>>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:
>>>>>
>>>>>> Jonas,
>>>>>>
>>>>>> If you want to continue to use the bioperl remoteblast
>> interface,
>>>>>> probably what you should do is simply call it twice.
>>>>>>
>>>>>> Once, as you already know how to do, which will return
>> without CDD
>>>>>> results.
>>>>>>
>>>>>> Secondly, to get the CDD results, call remoteblast a second time.
>>>>>> This time, using
>>>>>> -database => 'CDD'
>>>>>> -program => 'rpsblast'
>>>>>>
>>>>>> However, the wrapper may object to the 'rpsblast'
>> program.  It is
>>>>>> not listed in the POD -
>>>>>>
>>>> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R
>>>> emoteBlast.pm)
>>>>>>  If so, my guess is that changing the perl wrapper to allow
>>>>>> rpsblast will "just work" (tm).  I've cc:ed
>>>> cjfields at bioperl.org for
>>>>>> his opinion on this.
>>>>>>
>>>>>> Also, you might want to perform the CDD search first,
>> especially if
>>>>>> you are streaming results to eyeball that might like
>> something to
>>>>>> look at while the second (presumably longer) search is running.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Malcolm Cook
>>>>>> Stowers Institute for Medical Research - Kansas City, Missouri
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf
>> Of Jonas
>>>>>>> Schaer
>>>>>>> Sent: Thursday, July 09, 2009 5:16 AM
>>>>>>> To: BioPerl List; Smithies, Russell
>>>>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>>>>>
>>>>>>> Hi guys,
>>>>>>> Thank you all so much for your help and patience :). Of
>> course you
>>>>>>> were right and I finaly found the right put-parameter to get
>>>>>>> exactly the same hits as on the homepage.
>>>>>>> I do have an other question though :)...
>>>>>>> I now want to include a search for conserved domains,
>> but when I
>>>>>>> try to use the CDD_SEARCH-parameter
>>>>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#
>>>>>>> sub:CDD_SEARCH)
>>>>>>> like the other put-parameters the way chris once told me(works
>>>>>>> fine with the other params):
>>>>>>>
>>>>>>> my %put = (
>>>>>>>  WORD_SIZE => 3,
>>>>>>>  HITLIST_SIZE => 100,
>>>>>>>  THRESHOLD => 11,
>>>>>>>  FILTER => 'R',
>>>>>>>  GENETIC_CODE => 1,
>>>>>>>  CDD_SEARCH => 'on'
>>>>>>> ###I tried it
>>>>>>> with 'true' and '1', too.
>>>>>>>
>>>>>>> );
>>>>>>>
>>>>>>> for my $putName (keys %put) {
>>>>>>>  $factory->submit_parameter($putName,$put{$putName});
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ...an exception is thrown:
>>>>>>>
>>>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>>>> MSG: CDD_SEARCH is not a valid PUT parameter.
>>>>>>> STACK: Error::throw
>>>>>>> STACK: Bio::Root::Root::throw
>>>> C:/Perl/site/lib/Bio/Root/Root.pm:359
>>>>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>>>>>>> C:/Perl/site/lib/Bio/Tools
>>>>>>> /Run/RemoteBlast.pm:325
>>>>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383
>>>>>>> STACK: main::blast_it firsteval0.8.pm:288
>>>>>>> STACK: firsteval0.8.pm:35
>>>>>>> ----------------------------------------------------------- .
>>>>>>> I guess somehow this could be the solution to my problem:
>>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s
>>>>>>> ub:RID-for-Simultaneous
>>>>>>> , but unfortunately I don't understand what to do.
>>>>>>> I'm so sorry to bother you with this but please help me once
>>>>>>> more...:)
>>>>>>>
>>>>>>> Best regards and thanks in advance, Jonas
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: "Smithies, Russell" <Russell.Smithies at agresearch.co.nz>
>>>>>>> To: "'Jonas Schaer'" <Brotelzwieb at gmx.de>
>>>>>>> Cc: "'Chris Fields'" <cjfields at illinois.edu>; "'BioPerl List'"
>>>>>>> <bioperl-l at lists.open-bio.org>
>>>>>>> Sent: Monday, July 06, 2009 10:56 PM
>>>>>>> Subject: RE: [Bioperl-l] different results with
>>>> remote-blast skript
>>>>>>>
>>>>>>>
>>>>>>> Hi Jonas,
>>>>>>> You can't just play with the BLAST parameters and hope
>>>> for a "better"
>>>>>>> result.
>>>>>>> I'd suggest that if you aren't sure what they do, you
>> should leave
>>>>>>> them alone as small changes can make huge differences in the
>>>>>>> output - it's quite possible to miss finding what
>> you're looking
>>>>>>> for by using
>>>> the wrong
>>>>>>> parameters.
>>>>>>> If all else fails, read the blast manual:
>>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall
>>>>>>> _all.html
>>>>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/
>>>>>>> Or Read Ian Korfs' excellent book:
>>>>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp
>>>>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3
>>>>>>>
>>>>>>> Don't worry about the integer overflow bug as there's
>> nothing you
>>>>>>> can do about it. If you're interested, Google and Wikipedia are
>>>>>>> your
>>>>>>> friends:
>>>>>>> http://en.wikipedia.org/wiki/Integer_overflow
>>>>>>>
>>>>>>>
>>>>>>> Russell
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
>>>>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m.
>>>>>>>> To: BioPerl List; Chris Fields
>>>>>>>> Subject: Re: [Bioperl-l] different results with
>>>> remote-blast skript
>>>>>>>>
>>>>>>>> Hi guys, thanks for your answers so far.
>>>>>>>> @jason: integer overflow in blast.... sorry, but what do
>>>>>>> you mean by that?
>>>>>>>> how can I fix it...?
>>>>>>>>
>>>>>>>> Since I never really changed any parameters I thought them
>>>>>>> all to be
>>>>>>>> default.
>>>>>>>> whatever, I tried to get "better" results with my prog
>>>> by changing
>>>>>>>> these:
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>>>>>
>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI
>>>>>>> STICS'} =
>>>>>>>> '1';
>>>>>>>> with no effect...I guess these were default values anyway.
>>>>>>>>
>>>>>>>> So please maybe you can tell me all the other parameters I
>>>>>>> can change with
>>>>>>>> my
>>>>>>>> perl-skript AND how to do that?
>>>>>>>> Unfortunately both, perl and the blast-algorithm are pretty
>>>>>>> much new to
>>>>>>>> me,
>>>>>>>> maybe thats why I just cannot find out how to do that on my
>>>>>>> own... :/
>>>>>>>>
>>>>>>>> Here is the output I get with my remote-blast skript:
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>> ###################################
>>>>>>>> Query Name:
>>>>>>>>
>>>>
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
>>>>>>>> L
>>>>>>>>      hit name is ref|XP_001702807.1|
>>>>>>>>              score is 442
>>>>>>>> BLASTP 2.2.21+
>>>>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
>>>>>>> A. Schaffer,
>>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>>>> Lipman (1997),
>>>>>>>> "Gapped
>>>>>>>> BLAST and PSI-BLAST: a new generation of protein
>> database search
>>>>>>>> programs", Nucleic Acids Res. 25:3389-3402.
>>>>>>>>
>>>>>>>>
>>>>>>>> Reference for composition-based statistics: Alejandro A.
>>>>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
>>>>>>> John L. Spouge,
>>>>>>>> Yuri
>>>>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
>>>>>>> "Improving the
>>>>>>>> accuracy of PSI-BLAST protein database searches with
>>>>>>> composition-based
>>>>>>>> statistics and other refinements", Nucleic Acids Res.
>>>> 29:2994-3005.
>>>>>>>>
>>>>>>>>
>>>>>>>> RID: 53STX5G2013
>>>>>>>>
>>>>>>>>
>>>>>>>> Database: All non-redundant GenBank CDS
>>>>>>>> translations+PDB+SwissProt+PIR+PRF excluding
>>>> environmental samples
>>>>>>>> from WGS projects
>>>>>>>>         9,252,587 sequences; 3,169,972,781 total
>> letters Query=
>>>>>>>>
>>>>>>>
>>>>
>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
>>>>>>>>
>>>>>>>
>>>>
>> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTA
>>>> M
>>>>>>>> ATGPDPDDEYE
>>>>>>>> Length=150
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>     Score
>>>>>>>> E
>>>>>>>> Sequences producing significant alignments:
>>>>>>>    (Bits)
>>>>>>>> Value
>>>>>>>>
>>>>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>>>>>>> reinhard...   174
>>>>>>>> 2e-42
>>>>>>>>
>>>>>>>>
>>>>>>>> ALIGNMENTS
>>>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas
>>>> reinhardtii]
>>>>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
>>>>>>>> Length=303
>>>>>>>>
>>>>>>>> Score =  174 bits (442),  Expect = 2e-42, Method:
>>>>>>> Composition-based
>>>>>>>> stats.
>>>>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%),
>>>>>>> Gaps = 0/150
>>>>>>>> (0%)
>>>>>>>>
>>>>>>>> Query  1
>>>>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds
>>>>>>>> 60
>>>>>>>>
>>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>>>>>>> Sbjct  154
>>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
>>>>>>>> 213
>>>>>>>>
>>>>>>>> Query  61
>>>>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>>>> 120
>>>>>>>>
>>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>>>> Sbjct  214
>>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
>>>>>>>> 273
>>>>>>>>
>>>>>>>> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
>>>>>>>>          AWHERDDNAFRQAHQNTAMATGPDPDDEYE Sbjct  274
>>>>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Database: All non-redundant GenBank CDS
>>>>>>>> translations+PDB+SwissProt+PIR+PRF
>>>>>>>> excluding environmental samples from WGS projects
>>>>>>>>  Posted date:  Jul 5, 2009  4:41 AM Number of letters in
>>>>>>>> database: -1,124,994,511 Number of sequences in database:
>>>>>>>> 9,252,587
>>>>>>>>
>>>>>>>> Lambda     K      H
>>>>>>>> 0.309    0.122    0.345
>>>>>>>> Gapped
>>>>>>>> Lambda     K      H
>>>>>>>> 0.267   0.0410    0.140
>>>>>>>> Matrix: BLOSUM62
>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of
>> Sequences:
>>>>>>>> 9252587 Number of Hits to DB: 60273703 Number of extensions:
>>>>>>>> 1448367 Number of successful extensions: 2103 Number
>> of sequences
>>>>>>>> better than 10: 0 Number of HSP's better than 10
>> without gapping:
>>>>>>>> 0 Number of HSP's gapped: 2113 Number of HSP's successfully
>>>>>>>> gapped: 0 Length of query: 150 Length of database: 3169972781
>>>>>>>> Length adjustment: 113 Effective length of query: 37 Effective
>>>>>>>> length of database: 2124430450 Effective search space:
>>>>>>>> 78603926650 Effective search space used: 78603926650
>>>>>>>> T: 11
>>>>>>>> A: 40
>>>>>>>> X1: 16 (7.1 bits)
>>>>>>>> X2: 38 (14.6 bits)
>>>>>>>> X3: 64 (24.7 bits)
>>>>>>>> S1: 42 (20.8 bits)
>>>>>>>> S2: 74 (33.1 bits)
>>>>>>>>
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>> ###################################
>>>>>>>> and here are the hits (?) of the blast-algorithm on the
>>>>>>> ncbi-homepage with
>>>>>>>> the same query of course:
>>>>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
>>>>>>> reinhard...   300
>>>>>>>> 3e-80
>>>>>>>> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA
>>>>>>> [Acyrtho...  36.2
>>>>>>>> 1.1
>>>>>>>> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880
>>>>>>> [Blautia...  35.4
>>>>>>>> 1.8
>>>>>>>> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania
>>>>>>> brazil...  34.3
>>>>>>>> 4.2
>>>>>>>> ref|XP_680841.1|  hypothetical protein AN7572.2
>>>>>>> [Aspergillus n...  33.5
>>>>>>>> 6.0
>>>>>>>> ref|YP_001768110.1|  hypothetical protein M446_1150
>>>>>>> [Methyloba...  33.5
>>>>>>>> 7.0
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>> ###################################at
>>>>>>>> least the first hit is the same, but even there there is a
>>>>>>> different score
>>>>>>>> and e-value.
>>>>>>>>
>>>>>>>> thanks so much for any help :)
>>>>>>>> regards, jonas
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>> From: "Chris Fields" <cjfields at illinois.edu>
>>>>>>>> To: "Jason Stajich" <jason at bioperl.org>
>>>>>>>> Cc: "Smithies, Russell"
>>>>>>> <Russell.Smithies at agresearch.co.nz>; "'BioPerl
>>>>>>>> List'" <bioperl-l at lists.open-bio.org>; "'Jonas Schaer'"
>>>>>>>> <Jonas_Schaer at gmx.de>
>>>>>>>> Sent: Monday, July 06, 2009 12:51 AM
>>>>>>>> Subject: Re: [Bioperl-l] different results with
>>>> remote-blast skript
>>>>>>>>
>>>>>>>>
>>>>>>>>> That inspires confidence ;>
>>>>>>>>>
>>>>>>>>> chris
>>>>>>>>>
>>>>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
>>>>>>>>>
>>>>>>>>>> integer overflow in blast....
>>>>>>>>>>
>>>>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>>>>>>>>>>
>>>>>>>>>>> I'd guess it's a difference in the parameters used.
>>>>>>>>>>> Interesting that both have the number of letters in
>> the db as
>>>>>>>>>>> "-1,125,070,205", I assume that's a bug  :-)
>>>>>>>>>>>
>>>>>>>>>>> Stats from your remote_blast:
>>>>>>>>>>>
>>>>>>>>>>> 'stats' => {
>>>>>>>>>>>         'S1' => '42',
>>>>>>>>>>>         'S1_bits' => '20.8',
>>>>>>>>>>>         'lambda' => '0.309',
>>>>>>>>>>>         'entropy' => '0.345',
>>>>>>>>>>>         'kappa_gapped' => '0.0410',
>>>>>>>>>>>         'T' => '11',
>>>>>>>>>>>         'kappa' => '0.122',
>>>>>>>>>>>         'X3_bits' => '24.7',
>>>>>>>>>>>         'X1' => '16',
>>>>>>>>>>>         'lambda_gapped' => '0.267',
>>>>>>>>>>>         'X2' => '38',
>>>>>>>>>>>         'S2' => '74',
>>>>>>>>>>>         'seqs_better_than_cutoff' => '0',
>>>>>>>>>>>         'posted_date' => 'Jul 4, 2009  4:41 AM',
>>>>>>>>>>>         'Hits_to_DB' => '60102303',
>>>>>>>>>>>         'dbletters' => '-1125070205',
>>>>>>>>>>>         'A' => '40',
>>>>>>>>>>>         'num_successful_extensions' => '2004',
>>>>>>>>>>>         'num_extensions' => '1436892',
>>>>>>>>>>>         'X1_bits' => '7.1',
>>>>>>>>>>>         'X3' => '64',
>>>>>>>>>>>         'entropy_gapped' => '0.140',
>>>>>>>>>>>         'dbentries' => '9252258',
>>>>>>>>>>>         'X2_bits' => '14.6',
>>>>>>>>>>>         'S2_bits' => '33.1'
>>>>>>>>>>>       }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Stats from a blast done on the NCBI webpage:
>>>>>>>>>>>
>>>>>>>>>>> Database: All non-redundant GenBank CDS
>>>>>>> translations+PDB+SwissProt
>>>>>>>>>>> +PIR+PRF
>>>>>>>>>>> excluding environmental samples from WGS projects
>> Posted date:
>>>>>>>>>>> Jul 4, 2009  4:41 AM Number of letters in database:
>>>>>>>>>>> -1,125,070,205 Number of sequences in database:  9,252,258
>>>>>>>>>>>
>>>>>>>>>>> Lambda     K      H
>>>>>>>>>>> 0.309    0.124    0.340
>>>>>>>>>>> Gapped
>>>>>>>>>>> Lambda     K      H
>>>>>>>>>>> 0.267   0.0410    0.140
>>>>>>>>>>> Matrix: BLOSUM62
>>>>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of
>>>>>>>>>>> Sequences: 9252258 Number of Hits to DB: 86493230 Number of
>>>>>>>>>>> extensions: 3101413 Number of successful extensions: 9001
>>>>>>>>>>> Number of sequences better than 100: 65 Number of
>> HSP's better
>>>>>>>>>>> than 100 without gapping: 0 Number of HSP's gapped: 9000
>>>>>>>>>>> Number of HSP's successfully gapped: 66 Length of
>> query: 150
>>>>>>>>>>> Length of database: 3169897087 Length adjustment: 113
>>>>>>>>>>> Effective length of query: 37 Effective length of database:
>>>>>>>>>>> 2124391933 Effective search space: 78602501521 Effective
>>>>>>>>>>> search space used: 78602501521
>>>>>>>>>>> T: 11
>>>>>>>>>>> A: 40
>>>>>>>>>>> X1: 16 (7.1 bits)
>>>>>>>>>>> X2: 38 (14.6 bits)
>>>>>>>>>>> X3: 64 (24.7 bits)
>>>>>>>>>>> S1: 42 (20.8 bits)
>>>>>>>>>>> S2: 65 (29.6 bits)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-
>>>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
>>>>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>>>>>>>>>>> To: BioPerl List
>>>>>>>>>>>> Subject: [Bioperl-l] different results with
>>>> remote-blast skript
>>>>>>>>>>>>
>>>>>>>>>>>> Hi again :)
>>>>>>>>>>>> please, I only have this little question:
>>>>>>>>>>>> why do I get different results with my remote::blast
>>>>>>> perl skript
>>>>>>>>>>>> then on the
>>>>>>>>>>>> ncbi blast homepage?
>>>>>>>>>>>> I am using blastp, the query is an amino-sequence
>> (different
>>>>>>>>>>>> results with any sequence, differences not only in
>> number of
>>>>>>>>>>>> hits but
>>>> even in e-
>>>>>>>>>>>> values, scores
>>>>>>>>>>>> etc...), the database is 'nr'.
>>>>>>>>>>>> PLEASE help me,
>>>>>>>>>>>> thank you in advance,
>>>>>>>>>>>> Jonas
>>>>>>>>>>>>
>>>>>>>>>>>> ps: my skript:
>>>>>>>>>>>>
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>>>>>> ##
>>>>>>>>>>>> use Bio::Seq::SeqFactory;
>>>>>>>>>>>> use Bio::Tools::Run::RemoteBlast; use strict; my
>>>>>>>>>>>> @blast_report; my $prog = 'blastp';
>>>>>>>>>>>> my $db   = 'nr';
>>>>>>>>>>>> my $e_val= '1e-10';
>>>>>>>>>>>> #my $e_val= '10';
>>>>>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>>>>>     '-data' => $db,
>>>>>>>>>>>>     '-expect' => $e_val,
>>>>>>>>>>>>     '-readmethod' => 'SearchIO' ); my $factory =
>>>>>>>>>>>> Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}
>> = '11 1';
>>>>>>>>>>>>
>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} =
>> '10'; $ Bio
>>>>>>>>>>>>
>>>>>>>
>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
>>>>>>>>>>>> = '1';
>>>>>>>>>>>>
>>>>>>>>>>>> my
>>>>>>>>>>>> $
>>>>>>>>>>>> blast_seq
>>>>>>>>>>>>
>>>>>>>
>>>>
>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
>>>>>>>>>>>>
>>>>>>>>
>>>>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN
>>>>>>> AFRQAHQNTAMATGPD
>>>>>>>>>>>> PDDEYE';
>>>>>>>>>>>> #$v is just to turn on and off the messages my $v = 1; my
>>>>>>>>>>>> $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
>>>>>>>>>>>> 'Bio::PrimarySeq'); my $seq = $seqbuilder->create(-seq
>>>>>>>>>>>> =>$blast_seq,
>>>> -display_id =>
>>>>>>>>>>>> "$blast_seq");
>>>>>>>>>>>> my $filename='temp2.out';
>>>>>>>>>>>> my $r = $factory->submit_blast($seq); print STDERR
>>>>>>>>>>>> "waiting..." if( $v > 0 ); while ( my @rids =
>>>>>>>>>>>> $factory->each_rid ) {
>>>>>>>>>>>>    foreach my $rid ( @rids )
>>>>>>>>>>>>    {
>>>>>>>>>>>>        my $rc = $factory->retrieve_blast($rid);
>>>>>>>>>>>>        if( !ref($rc) )
>>>>>>>>>>>>        {
>>>>>>>>>>>>            if( $rc < 0 )
>>>>>>>>>>>>            {
>>>>>>>>>>>>                $factory->remove_rid($rid);
>>>>>>>>>>>>            }
>>>>>>>>>>>>            print STDERR "." if ( $v > 0 );
>>>>>>>>>>>>        }
>>>>>>>>>>>>            else
>>>>>>>>>>>>            {
>>>>>>>>>>>>                my $result = $rc->next_result();
>>>>>>>>>>>>                $factory->save_output($filename);
>>>>>>>>>>>>                $factory->remove_rid($rid);
>>>>>>>>>>>>                print "\nQuery Name: ",
>>>>>>> $result->query_name(),
>>>>>>>>>>>> "\n";
>>>>>>>>>>>>                while ( my $hit = $result->next_hit )
>>>>>>>>>>>>                {
>>>>>>>>>>>>                    next unless ( $v > 0);
>>>>>>>>>>>>                    print "\thit name is ",
>>>> $hit->name, "\n";
>>>>>>>>>>>>                    while( my $hsp = $hit->next_hsp )
>>>>>>>>>>>>                    {
>>>>>>>>>>>>                        print "\t\tscore is ",
>>>>>>> $hsp->score, "\n";
>>>>>>>>>>>>                    }
>>>>>>>>>>>>                }
>>>>>>>>>>>>            }
>>>>>>>>>>>>    }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> }
>>>>>>>>>>>> @blast_report = get_file_data ($filename); return
>>>>>>>>>>>> @blast_report;
>>>>>>>>>>>>
>>>>>>>>
>>>>>>> ##############################################################
>>>>>>> ################
>>>>>>>>>>>> ####
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>> =
>>>>>>>>>>> =
>>>>>>>>>>>
>>>>>>>
>>>>
>> =====================================================================
>>>>>>>>>>> Attention: The information contained in this message and/or
>>>>>>>>>>> attachments from AgResearch Limited is intended only for the
>>>>>>> persons or entities
>>>>>>>>>>> to which it is addressed and may contain
>> confidential and/or
>>>>>>>>>>> privileged material. Any review, retransmission,
>> dissemination
>>>> or other use
>>>>>>>>>>> of, or
>>>>>>>>>>> taking of any action in reliance upon, this information
>>>>>>> by persons or
>>>>>>>>>>> entities other than the intended recipients is
>> prohibited by
>>>>>>>>>>> AgResearch Limited. If you have received this message in
>>>>>>>>>>> error,
>>>>>>> please notify
>>>>>>>>>>> the
>>>>>>>>>>> sender immediately.
>>>>>>>>>>> =
>>>>>>>>>>> =
>>>>>>>>>>>
>>>>>>>
>>>>
>> =====================================================================
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jason Stajich
>>>>>>>>>> jason at bioperl.org
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> --------------------------------------------------------------
>>>>>>> ----------------
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> No virus found in this incoming message.
>>>>>>>> Checked by AVG - www.avg.com
>>>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release
>>>>>>> Date: 07/05/09
>>>>>>>> 05:53:00
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------
>>>>>>> ------------------
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> No virus found in this incoming message.
>>>>>>> Checked by AVG - www.avg.com
>>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release
>>>>>>> Date: 07/05/09
>>>>>>> 17:54:00
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>
>>>>
>>>> --------------------------------------------------------------
>>>> ------------------
>>>>
>>>>
>>>>
>>>> No virus found in this incoming message.
>>>> Checked by AVG - www.avg.com
>>>> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release
>>>> Date: 07/09/09
>>>> 05:55:00
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





More information about the Bioperl-l mailing list