[Bioperl-l] cdd-search with remoteblast?

Cook, Malcolm MEC at stowers.org
Tue Jul 21 12:59:15 EDT 2009


Chris,

I wound up adding a new test

#       $Id: RemoteBlast_rpsblast.t 15874 2009-07-21 16:57:54Z mcook $

with the comment :

# malcolm_cook at stowers.org: this test is in a separate file from
# RemoteBlast.t (on which it is modelled) since there is some sort of
# side-effecting between the multiple remote blasts that is causing
# this test to fail, if it comes last, or the other test to fail, if
# this one comes first.  THIS IS A BUG EITHER IN REMOTE BLAST OR MY
# UNDERSTANDING, i.e. of how to initialize it.

In any case, the test passes and demos rpsblast usage.

Cheers,


Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri


> -----Original Message-----
> From: Chris Fields [mailto:cjfields1 at gmail.com]
> Sent: Friday, July 10, 2009 1:05 PM
> To: Cook, Malcolm
> Cc: 'Jonas Schaer'; 'BioPerl List'
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
> Malcolm,
>
> Nice!  Go ahead and add the test in; we can look at trying to
> get CDD_SEARCH working at some point but this is a nice workaround.
>
> chris
>
> On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote:
>
> > Chris, I've added a test to bioperl RemoteBlast.t that demonstrates
> > the following.  Is it appropriate to submit it?
> >
> > Jonas, OK, I was a little quick on the gun... but I've got it now.
> >
> > You don't need to change the wrapper.  Here is what you need to do:
> >
> > # 1) set your database like this:
> >
> > -database => 'cdsearch/cdd', # c.f.
> > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html
> >  for other cdd database options
> >
> > # 2) add this line before submitting the job:
> > $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';
> >
> > You're in - No other changes needed.
> >
> > Malcolm Cook
> > Stowers Institute for Medical Research - Kansas City, Missouri
> >
> >
> >> -----Original Message-----
> >> From: Jonas Schaer [mailto:Brotelzwieb at gmx.de]
> >> Sent: Friday, July 10, 2009 4:18 AM
> >> To: BioPerl List; Cook, Malcolm; Chris Fields
> >> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
> >>
> >> Hi,
> >> I tried to do what Malcom proposed my ($prog = 'rpsblast';
> >> my $db   =
> >> 'CDD';)  but that didn't work.
> >>
> >> ------------- EXCEPTION: Bio::Root::Exception -------------
> >> MSG: Value rpsblast for PUT parameter PROGRAM does not match
> >> expression t?blast[ pnx]. Rejecting.
> >> STACK: Error::throw
> >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> >> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> >> C:/Perl/site/lib/Bio/Tools
> >> /Run/RemoteBlast.pm:329
> >> STACK: Bio::Tools::Run::RemoteBlast::new
> >> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl
> >> ast.pm:257
> >> STACK: blast_a_seq2.pm:14
> >> -----------------------------------------------------------
> >> So I should try to "change the wrapper to allow
> 'rpsblast'", right?
> >> Could You tell me how to do that, please? So sorry but I
> have no idea
> >> yet...:) If that doesn't work, is there any other way to run
> >> cdd-searches with perl?
> >> Thank you so much!
> >> Regards, Jonas
> >>
> >> ----- Original Message -----
> >> From: "Chris Fields" <cjfields1 at gmail.com>
> >> To: "Cook, Malcolm" <MEC at stowers.org>
> >> Cc: "'Jonas Schaer'" <Brotelzwieb at gmx.de>; "'BioPerl List'"
> >> <bioperl-l at lists.open-bio.org>; "'Smithies, Russell'"
> >> <Russell.Smithies at agresearch.co.nz>; <cjfields at bioperl.org>
> >> Sent: Thursday, July 09, 2009 9:19 PM
> >> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
> >>
> >>
> >>> I've scheduled this tentatively for the 1.6 release
> series (just not
> >>> sure when yet).  It may work as is, but I haven't tried
> it out yet
> >>> (and am hazarding to guess it only retrieves the single
> main RID at
> >>> the moment).
> >>>
> >>> chris
> >>>
> >>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:
> >>>
> >>>> Jonas,
> >>>>
> >>>> If you want to continue to use the bioperl remoteblast
> interface,
> >>>> probably what you should do is simply call it twice.
> >>>>
> >>>> Once, as you already know how to do, which will return
> without CDD
> >>>> results.
> >>>>
> >>>> Secondly, to get the CDD results, call remoteblast a second time.
> >>>> This time, using
> >>>> -database => 'CDD'
> >>>> -program => 'rpsblast'
> >>>>
> >>>> However, the wrapper may object to the 'rpsblast'
> program.  It is
> >>>> not listed in the POD -
> >>>>
> >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R
> >> emoteBlast.pm)
> >>>>   If so, my guess is that changing the perl wrapper to allow
> >>>> rpsblast will "just work" (tm).  I've cc:ed
> >> cjfields at bioperl.org for
> >>>> his opinion on this.
> >>>>
> >>>> Also, you might want to perform the CDD search first,
> especially if
> >>>> you are streaming results to eyeball that might like
> something to
> >>>> look at while the second (presumably longer) search is running.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Malcolm Cook
> >>>> Stowers Institute for Medical Research - Kansas City, Missouri
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: bioperl-l-bounces at lists.open-bio.org
> >>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf
> Of Jonas
> >>>>> Schaer
> >>>>> Sent: Thursday, July 09, 2009 5:16 AM
> >>>>> To: BioPerl List; Smithies, Russell
> >>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
> >>>>>
> >>>>> Hi guys,
> >>>>> Thank you all so much for your help and patience :). Of
> course you
> >>>>> were right and I finaly found the right put-parameter to get
> >>>>> exactly the same hits as on the homepage.
> >>>>> I do have an other question though :)...
> >>>>> I now want to include a search for conserved domains,
> but when I
> >>>>> try to use the CDD_SEARCH-parameter
> >>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#
> >>>>> sub:CDD_SEARCH)
> >>>>> like the other put-parameters the way chris once told me(works
> >>>>> fine with the other params):
> >>>>>
> >>>>> my %put = (
> >>>>>   WORD_SIZE => 3,
> >>>>>   HITLIST_SIZE => 100,
> >>>>>   THRESHOLD => 11,
> >>>>>   FILTER => 'R',
> >>>>>   GENETIC_CODE => 1,
> >>>>>   CDD_SEARCH => 'on'
> >>>>> ###I tried it
> >>>>> with 'true' and '1', too.
> >>>>>
> >>>>> );
> >>>>>
> >>>>> for my $putName (keys %put) {
> >>>>>   $factory->submit_parameter($putName,$put{$putName});
> >>>>> }
> >>>>>
> >>>>>
> >>>>> ...an exception is thrown:
> >>>>>
> >>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>>>> MSG: CDD_SEARCH is not a valid PUT parameter.
> >>>>> STACK: Error::throw
> >>>>> STACK: Bio::Root::Root::throw
> >> C:/Perl/site/lib/Bio/Root/Root.pm:359
> >>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> >>>>> C:/Perl/site/lib/Bio/Tools
> >>>>> /Run/RemoteBlast.pm:325
> >>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383
> >>>>> STACK: main::blast_it firsteval0.8.pm:288
> >>>>> STACK: firsteval0.8.pm:35
> >>>>> ----------------------------------------------------------- .
> >>>>> I guess somehow this could be the solution to my problem:
> >>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s
> >>>>> ub:RID-for-Simultaneous
> >>>>> , but unfortunately I don't understand what to do.
> >>>>> I'm so sorry to bother you with this but please help me once
> >>>>> more...:)
> >>>>>
> >>>>> Best regards and thanks in advance, Jonas
> >>>>>
> >>>>> ----- Original Message -----
> >>>>> From: "Smithies, Russell" <Russell.Smithies at agresearch.co.nz>
> >>>>> To: "'Jonas Schaer'" <Brotelzwieb at gmx.de>
> >>>>> Cc: "'Chris Fields'" <cjfields at illinois.edu>; "'BioPerl List'"
> >>>>> <bioperl-l at lists.open-bio.org>
> >>>>> Sent: Monday, July 06, 2009 10:56 PM
> >>>>> Subject: RE: [Bioperl-l] different results with
> >> remote-blast skript
> >>>>>
> >>>>>
> >>>>> Hi Jonas,
> >>>>> You can't just play with the BLAST parameters and hope
> >> for a "better"
> >>>>> result.
> >>>>> I'd suggest that if you aren't sure what they do, you
> should leave
> >>>>> them alone as small changes can make huge differences in the
> >>>>> output - it's quite possible to miss finding what
> you're looking
> >>>>> for by using
> >> the wrong
> >>>>> parameters.
> >>>>> If all else fails, read the blast manual:
> >>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall
> >>>>> _all.html
> >>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/
> >>>>> Or Read Ian Korfs' excellent book:
> >>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp
> >>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3
> >>>>>
> >>>>> Don't worry about the integer overflow bug as there's
> nothing you
> >>>>> can do about it. If you're interested, Google and Wikipedia are
> >>>>> your
> >>>>> friends:
> >>>>> http://en.wikipedia.org/wiki/Integer_overflow
> >>>>>
> >>>>>
> >>>>> Russell
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
> >>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m.
> >>>>>> To: BioPerl List; Chris Fields
> >>>>>> Subject: Re: [Bioperl-l] different results with
> >> remote-blast skript
> >>>>>>
> >>>>>> Hi guys, thanks for your answers so far.
> >>>>>> @jason: integer overflow in blast.... sorry, but what do
> >>>>> you mean by that?
> >>>>>> how can I fix it...?
> >>>>>>
> >>>>>> Since I never really changed any parameters I thought them
> >>>>> all to be
> >>>>>> default.
> >>>>>> whatever, I tried to get "better" results with my prog
> >> by changing
> >>>>>> these:
> >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >>>>>>
> >>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI
> >>>>> STICS'} =
> >>>>>> '1';
> >>>>>> with no effect...I guess these were default values anyway.
> >>>>>>
> >>>>>> So please maybe you can tell me all the other parameters I
> >>>>> can change with
> >>>>>> my
> >>>>>> perl-skript AND how to do that?
> >>>>>> Unfortunately both, perl and the blast-algorithm are pretty
> >>>>> much new to
> >>>>>> me,
> >>>>>> maybe thats why I just cannot find out how to do that on my
> >>>>> own... :/
> >>>>>>
> >>>>>> Here is the output I get with my remote-blast skript:
> >>>>>>
> >>>>> ##############################################################
> >>>>> ################
> >>>>>> ###################################
> >>>>>> Query Name:
> >>>>>>
> >>
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL
> >>>>>> L
> >>>>>>       hit name is ref|XP_001702807.1|
> >>>>>>               score is 442
> >>>>>> BLASTP 2.2.21+
> >>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
> >>>>> A. Schaffer,
> >>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> >>>>> Lipman (1997),
> >>>>>> "Gapped
> >>>>>> BLAST and PSI-BLAST: a new generation of protein
> database search
> >>>>>> programs", Nucleic Acids Res. 25:3389-3402.
> >>>>>>
> >>>>>>
> >>>>>> Reference for composition-based statistics: Alejandro A.
> >>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin,
> >>>>> John L. Spouge,
> >>>>>> Yuri
> >>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
> >>>>> "Improving the
> >>>>>> accuracy of PSI-BLAST protein database searches with
> >>>>> composition-based
> >>>>>> statistics and other refinements", Nucleic Acids Res.
> >> 29:2994-3005.
> >>>>>>
> >>>>>>
> >>>>>> RID: 53STX5G2013
> >>>>>>
> >>>>>>
> >>>>>> Database: All non-redundant GenBank CDS
> >>>>>> translations+PDB+SwissProt+PIR+PRF excluding
> >> environmental samples
> >>>>>> from WGS projects
> >>>>>>          9,252,587 sequences; 3,169,972,781 total
> letters Query=
> >>>>>>
> >>>>>
> >>
> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
> >>>>>>
> >>>>>
> >>
> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTA
> >> M
> >>>>>> ATGPDPDDEYE
> >>>>>> Length=150
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>      Score
> >>>>>> E
> >>>>>> Sequences producing significant alignments:
> >>>>>     (Bits)
> >>>>>> Value
> >>>>>>
> >>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
> >>>>> reinhard...   174
> >>>>>> 2e-42
> >>>>>>
> >>>>>>
> >>>>>> ALIGNMENTS
> >>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas
> >> reinhardtii]
> >>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
> >>>>>> Length=303
> >>>>>>
> >>>>>> Score =  174 bits (442),  Expect = 2e-42, Method:
> >>>>> Composition-based
> >>>>>> stats.
> >>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%),
> >>>>> Gaps = 0/150
> >>>>>> (0%)
> >>>>>>
> >>>>>> Query  1
> >>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds
> >>>>>> 60
> >>>>>>
> >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> >>>>>> Sbjct  154
> >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
> >>>>>> 213
> >>>>>>
> >>>>>> Query  61
> >>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR
> >>>>>> 120
> >>>>>>
> >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> >>>>>> Sbjct  214
> >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
> >>>>>> 273
> >>>>>>
> >>>>>> Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
> >>>>>>           AWHERDDNAFRQAHQNTAMATGPDPDDEYE Sbjct  274
> >>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Database: All non-redundant GenBank CDS
> >>>>>> translations+PDB+SwissProt+PIR+PRF
> >>>>>> excluding environmental samples from WGS projects
> >>>>>>   Posted date:  Jul 5, 2009  4:41 AM Number of letters in
> >>>>>> database: -1,124,994,511 Number of sequences in database:
> >>>>>> 9,252,587
> >>>>>>
> >>>>>> Lambda     K      H
> >>>>>>  0.309    0.122    0.345
> >>>>>> Gapped
> >>>>>> Lambda     K      H
> >>>>>>  0.267   0.0410    0.140
> >>>>>> Matrix: BLOSUM62
> >>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of
> Sequences:
> >>>>>> 9252587 Number of Hits to DB: 60273703 Number of extensions:
> >>>>>> 1448367 Number of successful extensions: 2103 Number
> of sequences
> >>>>>> better than 10: 0 Number of HSP's better than 10
> without gapping:
> >>>>>> 0 Number of HSP's gapped: 2113 Number of HSP's successfully
> >>>>>> gapped: 0 Length of query: 150 Length of database: 3169972781
> >>>>>> Length adjustment: 113 Effective length of query: 37 Effective
> >>>>>> length of database: 2124430450 Effective search space:
> >>>>>> 78603926650 Effective search space used: 78603926650
> >>>>>> T: 11
> >>>>>> A: 40
> >>>>>> X1: 16 (7.1 bits)
> >>>>>> X2: 38 (14.6 bits)
> >>>>>> X3: 64 (24.7 bits)
> >>>>>> S1: 42 (20.8 bits)
> >>>>>> S2: 74 (33.1 bits)
> >>>>>>
> >>>>>>
> >>>>> ##############################################################
> >>>>> ################
> >>>>>> ###################################
> >>>>>> and here are the hits (?) of the blast-algorithm on the
> >>>>> ncbi-homepage with
> >>>>>> the same query of course:
> >>>>>> ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas
> >>>>> reinhard...   300
> >>>>>> 3e-80
> >>>>>> ref|XP_001942719.1|  PREDICTED: similar to GA16705-PA
> >>>>> [Acyrtho...  36.2
> >>>>>> 1.1
> >>>>>> ref|ZP_03781446.1|  hypothetical protein RUMHYD_00880
> >>>>> [Blautia...  35.4
> >>>>>> 1.8
> >>>>>> ref|XP_001563232.1|  leucyl-tRNA synthetase [Leishmania
> >>>>> brazil...  34.3
> >>>>>> 4.2
> >>>>>> ref|XP_680841.1|  hypothetical protein AN7572.2
> >>>>> [Aspergillus n...  33.5
> >>>>>> 6.0
> >>>>>> ref|YP_001768110.1|  hypothetical protein M446_1150
> >>>>> [Methyloba...  33.5
> >>>>>> 7.0
> >>>>>>
> >>>>> ##############################################################
> >>>>> ################
> >>>>>> ###################################at
> >>>>>> least the first hit is the same, but even there there is a
> >>>>> different score
> >>>>>> and e-value.
> >>>>>>
> >>>>>> thanks so much for any help :)
> >>>>>> regards, jonas
> >>>>>>
> >>>>>>
> >>>>>> ----- Original Message -----
> >>>>>> From: "Chris Fields" <cjfields at illinois.edu>
> >>>>>> To: "Jason Stajich" <jason at bioperl.org>
> >>>>>> Cc: "Smithies, Russell"
> >>>>> <Russell.Smithies at agresearch.co.nz>; "'BioPerl
> >>>>>> List'" <bioperl-l at lists.open-bio.org>; "'Jonas Schaer'"
> >>>>>> <Jonas_Schaer at gmx.de>
> >>>>>> Sent: Monday, July 06, 2009 12:51 AM
> >>>>>> Subject: Re: [Bioperl-l] different results with
> >> remote-blast skript
> >>>>>>
> >>>>>>
> >>>>>>> That inspires confidence ;>
> >>>>>>>
> >>>>>>> chris
> >>>>>>>
> >>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
> >>>>>>>
> >>>>>>>> integer overflow in blast....
> >>>>>>>>
> >>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
> >>>>>>>>
> >>>>>>>>> I'd guess it's a difference in the parameters used.
> >>>>>>>>> Interesting that both have the number of letters in
> the db as
> >>>>>>>>> "-1,125,070,205", I assume that's a bug  :-)
> >>>>>>>>>
> >>>>>>>>> Stats from your remote_blast:
> >>>>>>>>>
> >>>>>>>>> 'stats' => {
> >>>>>>>>>          'S1' => '42',
> >>>>>>>>>          'S1_bits' => '20.8',
> >>>>>>>>>          'lambda' => '0.309',
> >>>>>>>>>          'entropy' => '0.345',
> >>>>>>>>>          'kappa_gapped' => '0.0410',
> >>>>>>>>>          'T' => '11',
> >>>>>>>>>          'kappa' => '0.122',
> >>>>>>>>>          'X3_bits' => '24.7',
> >>>>>>>>>          'X1' => '16',
> >>>>>>>>>          'lambda_gapped' => '0.267',
> >>>>>>>>>          'X2' => '38',
> >>>>>>>>>          'S2' => '74',
> >>>>>>>>>          'seqs_better_than_cutoff' => '0',
> >>>>>>>>>          'posted_date' => 'Jul 4, 2009  4:41 AM',
> >>>>>>>>>          'Hits_to_DB' => '60102303',
> >>>>>>>>>          'dbletters' => '-1125070205',
> >>>>>>>>>          'A' => '40',
> >>>>>>>>>          'num_successful_extensions' => '2004',
> >>>>>>>>>          'num_extensions' => '1436892',
> >>>>>>>>>          'X1_bits' => '7.1',
> >>>>>>>>>          'X3' => '64',
> >>>>>>>>>          'entropy_gapped' => '0.140',
> >>>>>>>>>          'dbentries' => '9252258',
> >>>>>>>>>          'X2_bits' => '14.6',
> >>>>>>>>>          'S2_bits' => '33.1'
> >>>>>>>>>        }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Stats from a blast done on the NCBI webpage:
> >>>>>>>>>
> >>>>>>>>> Database: All non-redundant GenBank CDS
> >>>>> translations+PDB+SwissProt
> >>>>>>>>> +PIR+PRF
> >>>>>>>>> excluding environmental samples from WGS projects
> Posted date:
> >>>>>>>>> Jul 4, 2009  4:41 AM Number of letters in database:
> >>>>>>>>> -1,125,070,205 Number of sequences in database:  9,252,258
> >>>>>>>>>
> >>>>>>>>> Lambda     K      H
> >>>>>>>>> 0.309    0.124    0.340
> >>>>>>>>> Gapped
> >>>>>>>>> Lambda     K      H
> >>>>>>>>> 0.267   0.0410    0.140
> >>>>>>>>> Matrix: BLOSUM62
> >>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of
> >>>>>>>>> Sequences: 9252258 Number of Hits to DB: 86493230 Number of
> >>>>>>>>> extensions: 3101413 Number of successful extensions: 9001
> >>>>>>>>> Number of sequences better than 100: 65 Number of
> HSP's better
> >>>>>>>>> than 100 without gapping: 0 Number of HSP's gapped: 9000
> >>>>>>>>> Number of HSP's successfully gapped: 66 Length of
> query: 150
> >>>>>>>>> Length of database: 3169897087 Length adjustment: 113
> >>>>>>>>> Effective length of query: 37 Effective length of database:
> >>>>>>>>> 2124391933 Effective search space: 78602501521 Effective
> >>>>>>>>> search space used: 78602501521
> >>>>>>>>> T: 11
> >>>>>>>>> A: 40
> >>>>>>>>> X1: 16 (7.1 bits)
> >>>>>>>>> X2: 38 (14.6 bits)
> >>>>>>>>> X3: 64 (24.7 bits)
> >>>>>>>>> S1: 42 (20.8 bits)
> >>>>>>>>> S2: 65 (29.6 bits)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-
> >>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
> >>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
> >>>>>>>>>> To: BioPerl List
> >>>>>>>>>> Subject: [Bioperl-l] different results with
> >> remote-blast skript
> >>>>>>>>>>
> >>>>>>>>>> Hi again :)
> >>>>>>>>>> please, I only have this little question:
> >>>>>>>>>> why do I get different results with my remote::blast
> >>>>> perl skript
> >>>>>>>>>> then on the
> >>>>>>>>>> ncbi blast homepage?
> >>>>>>>>>> I am using blastp, the query is an amino-sequence
> (different
> >>>>>>>>>> results with any sequence, differences not only in
> number of
> >>>>>>>>>> hits but
> >> even in e-
> >>>>>>>>>> values, scores
> >>>>>>>>>> etc...), the database is 'nr'.
> >>>>>>>>>> PLEASE help me,
> >>>>>>>>>> thank you in advance,
> >>>>>>>>>> Jonas
> >>>>>>>>>>
> >>>>>>>>>> ps: my skript:
> >>>>>>>>>>
> >>>>>>
> >>>>> ##############################################################
> >>>>> ################
> >>>>>>>>>> ##
> >>>>>>>>>> use Bio::Seq::SeqFactory;
> >>>>>>>>>> use Bio::Tools::Run::RemoteBlast; use strict; my
> >>>>>>>>>> @blast_report; my $prog = 'blastp';
> >>>>>>>>>> my $db   = 'nr';
> >>>>>>>>>> my $e_val= '1e-10';
> >>>>>>>>>> #my $e_val= '10';
> >>>>>>>>>> my @params = ( '-prog' => $prog,
> >>>>>>>>>>      '-data' => $db,
> >>>>>>>>>>      '-expect' => $e_val,
> >>>>>>>>>>      '-readmethod' => 'SearchIO' ); my $factory =
> >>>>>>>>>> Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}
> = '11 1';
> >>>>>>>>>>
> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} =
> '10'; $ Bio
> >>>>>>>>>>
> >>>>>
> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}
> >>>>>>>>>> = '1';
> >>>>>>>>>>
> >>>>>>>>>> my
> >>>>>>>>>> $
> >>>>>>>>>> blast_seq
> >>>>>>>>>>
> >>>>>
> >>
> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR
> >>>>>>>>>>
> >>>>>>
> >>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN
> >>>>> AFRQAHQNTAMATGPD
> >>>>>>>>>> PDDEYE';
> >>>>>>>>>> #$v is just to turn on and off the messages my $v = 1; my
> >>>>>>>>>> $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>
> >>>>>>>>>> 'Bio::PrimarySeq'); my $seq = $seqbuilder->create(-seq
> >>>>>>>>>> =>$blast_seq,
> >> -display_id =>
> >>>>>>>>>> "$blast_seq");
> >>>>>>>>>> my $filename='temp2.out';
> >>>>>>>>>> my $r = $factory->submit_blast($seq); print STDERR
> >>>>>>>>>> "waiting..." if( $v > 0 ); while ( my @rids =
> >>>>>>>>>> $factory->each_rid ) {
> >>>>>>>>>>     foreach my $rid ( @rids )
> >>>>>>>>>>     {
> >>>>>>>>>>         my $rc = $factory->retrieve_blast($rid);
> >>>>>>>>>>         if( !ref($rc) )
> >>>>>>>>>>         {
> >>>>>>>>>>             if( $rc < 0 )
> >>>>>>>>>>             {
> >>>>>>>>>>                 $factory->remove_rid($rid);
> >>>>>>>>>>             }
> >>>>>>>>>>             print STDERR "." if ( $v > 0 );
> >>>>>>>>>>         }
> >>>>>>>>>>             else
> >>>>>>>>>>             {
> >>>>>>>>>>                 my $result = $rc->next_result();
> >>>>>>>>>>                 $factory->save_output($filename);
> >>>>>>>>>>                 $factory->remove_rid($rid);
> >>>>>>>>>>                 print "\nQuery Name: ",
> >>>>> $result->query_name(),
> >>>>>>>>>> "\n";
> >>>>>>>>>>                 while ( my $hit = $result->next_hit )
> >>>>>>>>>>                 {
> >>>>>>>>>>                     next unless ( $v > 0);
> >>>>>>>>>>                     print "\thit name is ",
> >> $hit->name, "\n";
> >>>>>>>>>>                     while( my $hsp = $hit->next_hsp )
> >>>>>>>>>>                     {
> >>>>>>>>>>                         print "\t\tscore is ",
> >>>>> $hsp->score, "\n";
> >>>>>>>>>>                     }
> >>>>>>>>>>                 }
> >>>>>>>>>>             }
> >>>>>>>>>>     }
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> }
> >>>>>>>>>> @blast_report = get_file_data ($filename); return
> >>>>>>>>>> @blast_report;
> >>>>>>>>>>
> >>>>>>
> >>>>> ##############################################################
> >>>>> ################
> >>>>>>>>>> ####
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> Bioperl-l mailing list
> >>>>>>>>>> Bioperl-l at lists.open-bio.org
> >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>>> =
> >>>>>>>>> =
> >>>>>>>>>
> >>>>>
> >>
> =====================================================================
> >>>>>>>>> Attention: The information contained in this message and/or
> >>>>>>>>> attachments from AgResearch Limited is intended only for the
> >>>>> persons or entities
> >>>>>>>>> to which it is addressed and may contain
> confidential and/or
> >>>>>>>>> privileged material. Any review, retransmission,
> dissemination
> >> or other use
> >>>>>>>>> of, or
> >>>>>>>>> taking of any action in reliance upon, this information
> >>>>> by persons or
> >>>>>>>>> entities other than the intended recipients is
> prohibited by
> >>>>>>>>> AgResearch Limited. If you have received this message in
> >>>>>>>>> error,
> >>>>> please notify
> >>>>>>>>> the
> >>>>>>>>> sender immediately.
> >>>>>>>>> =
> >>>>>>>>> =
> >>>>>>>>>
> >>>>>
> >>
> =====================================================================
> >>>>>>>>>
> >>>>>>>>> _______________________________________________
> >>>>>>>>> Bioperl-l mailing list
> >>>>>>>>> Bioperl-l at lists.open-bio.org
> >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Jason Stajich
> >>>>>>>> jason at bioperl.org
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> Bioperl-l mailing list
> >>>>>>>> Bioperl-l at lists.open-bio.org
> >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> --------------------------------------------------------------
> >>>>> ----------------
> >>>>>> --
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> No virus found in this incoming message.
> >>>>>> Checked by AVG - www.avg.com
> >>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release
> >>>>> Date: 07/05/09
> >>>>>> 05:53:00
> >>>>>
> >>>>>
> >>>>> --------------------------------------------------------------
> >>>>> ------------------
> >>>>>
> >>>>>
> >>>>>
> >>>>> No virus found in this incoming message.
> >>>>> Checked by AVG - www.avg.com
> >>>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release
> >>>>> Date: 07/05/09
> >>>>> 17:54:00
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>
> >>
> >> --------------------------------------------------------------
> >> ------------------
> >>
> >>
> >>
> >> No virus found in this incoming message.
> >> Checked by AVG - www.avg.com
> >> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release
> >> Date: 07/09/09
> >> 05:55:00
> >>
> >>
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>




More information about the Bioperl-l mailing list