[Bioperl-l] hmmer3.pm question re query and hit coordinates

Peng Zhou zhoupenggeni at gmail.com
Wed Jul 11 20:05:56 UTC 2012


Thanks Chris, here is the link of the filed 
bug: https://redmine.open-bio.org/issues/3369

On Wednesday, July 11, 2012 2:02:46 PM UTC-5, Christopher Fields wrote:
>
> Peng, 
>
> Has this been filed as a bug yet?   
>
>     https://redmine.open-bio.org/projects/bioperl 
>
> Seems like it would be fairly easy to fix, but I want to track it just in 
> case. 
>
> chris 
>
> On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote: 
>
> > Hello guys, 
> > 
> > Just a follow-up, it seems to me the bioperl-live version is still 
> having the same problem - calling hit "query" while query sequence "hit". I 
> also looked into the test script written for hmmer3 
> (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment 
> part - I guess that's why this bug was not discovered. 
> > 
> > To be simple, here's an output of hmmsearch v3.0: 
> > # hmmsearch :: search profile(s) against a sequence database 
> > # HMMER 3.0 (March 2010); http://hmmer.org/ 
> > # Copyright (C) 2010 Howard Hughes Medical Institute. 
> > # Freely distributed under the GNU General Public License (GPLv3). 
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
> - 
> > # query HMM file:                 
>  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm 
> > # target sequence database:       
>  /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa 
>
> > # output directed to file:         
> /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt 
> > # number of worker threads:        4 
> > # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
> - 
> > 
> > Query:       CRP0000  [M=75] 
> > Scores for complete sequences (score includes all domains): 
> >    --- full sequence ---   --- best 1 domain ---    -#dom- 
> >     E-value  score  bias    E-value  score  bias    exp  N  Sequence     
>             Description 
> >     ------- ------ -----    ------- ------ -----   ---- --  --------     
>             ----------- 
> >     5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1 
>  Chr2_540228_540404_+     
> > 
> > Domain annotation for each sequence (and alignments): 
> > >> Chr2_540228_540404_+   
> >    #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali 
> to    envfrom  env to     acc 
> >  ---   ------ ----- --------- --------- ------- -------    ------- 
> -------    ------- -------    ---- 
> >    1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4     
>  59 .]       1      59 [] 0.95 
> > 
> >   Alignments for each domain: 
> >   == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30 
> >                CRP0000 20 
> tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74 
> >                           ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg 
> rrrC+Ct++c 
> >   Chr2_540228_540404_+  4 
> GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59 
> >                           
> 568899***99********************************************* PP 
> > 
> > And here is a dump of the parsed HSP object: 
> > $VAR1 = bless( { 
> >                  'VERBOSE' => 0, 
> >                  'IDENTICAL' => 0, 
> >                  'RANK' => 1, 
> >                  'STRANDED' => 'NONE', 
> >                  'EVALUE' => '3.6e-30', 
> >                  'HSP_LENGTH' => 56, 
> >                  'ALGORITHM' => 'HMMSEARCH' 
> >                  'SCORE' => '95.0', 
> >                  'GAP_SYMBOL' => '-', 
> >                  'CONSERVED' => 0, 
> >                   
> >                  'HIT_NAME' => 'Chr2_540228_540404_+', 
> >                  'HIT_DESC' => '', 
> >                  'HIT_START' => '20', 
> >                  'HIT_END' => '74', 
> >                  'HIT_LENGTH' => 56, 
> >                  'HIT_SEQ' => 
> 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc', 
> >                  'HIT_FRAME' => 0, 
> >                   
> >                  'QUERY_NAME' => 'CRP0000', 
> >                  'QUERY_DESC' => undef, 
> >                  'QUERY_START' => '4', 
> >                  'QUERY_END' => '59', 
> >                  'QUERY_LENGTH' => '75', 
> >                  'QUERY_FRAME' => 0, 
> >                  'QUERY_SEQ' => 
> 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC', 
> >                   
> >                  'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs 
> +nCa+vC++Egf gG+crg rrrC+Ct++c', 
> >                }, 'Bio::Search::HSP::HMMERHSP' ); 
> > 
> > Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be 
> exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values. 
> > 
> > Thanks, 
> > 
> > Peng, 
> > 
> > On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote: 
> > I'll try the bioperl-live version. Thanks guys. 
> > Scott Givan 
> > 541-740-4685 
> > Sent from an iPhone (so expect typos). 
> > 
> > On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu> 
> wrote: 
> > 
> > > This might be a disconnect between the HMMER3 version in bioperl-live 
> and the one in Kai's bioperl-hmmer3 repo.  I believe the one in 
> bioperl-live is newer.  Scott, can you give that a try? 
> > > 
> > > chris 
> > > 
> > > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote: 
> > > 
> > >> Hi Scott, 
> > >> 
> > >> Thanks for writing. I'm on the road at the moment so I have to be 
> briefer and less thorough than I'd like to be. 
> > >> 
> > >> What you are observing is not the intended behavior. Oddly, it's not 
> what I recall obtaining in my tests on this software, though I was mostly 
> interested in hmmsearch at the time and may have been sloppier than I 
> should have been when it came to hmmscan. 
> > >> 
> > >> What version of HMMER3 you're using? There have been some small 
> formatting changes in the past that might be causing a burp in the parser, 
> though I'm doubting it. 
> > >> 
> > >> Kai Blin wrote some test scripts (found here: 
> bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate 
> query/hit coordinates. It might be worth giving this a shot if you haven't 
> already. 
> > >> 
> > >> Also, if you don't mind, I'm happy to run your code on your output 
> file on my end. It might help me diagnose the problem. 
> > >> 
> > >> Sorry this is being a thorn in your side! I've cc'ed the list in case 
> anyone else has insight into this matter. 
> > >> 
> > >> Best, 
> > >> Thomas 
> > >> 
> > >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote: 
> > >> 
> > >>> Hi Thomas, 
> > >>> 
> > >>> I'm using modules in the bipoerl-hmmer3 git repository to parse 
> hmmscan 
> > >>> reports. When I parse the files and walk through the HSP's like: 
> > >>> 
> > >>> while (my $hit = $rslt->next_model) { 
> > >>> 
> > >>>    while (my $domain = $hit->next_hsp) { 
> > >>> 
> > >>> And retrieve the "hit" coordinates like: 
> > >>> 
> > >>>        print "hit coords: ", $domain->start('hit'), "-", 
> $domain->end('hit'), 
> > >>> "\n"; 
> > >>> 
> > >>> The coordinates returned correspond to what I would call the 
> "query", 
> > >>> since they are for the sequence I fed to hmmscan to search the 
> profile 
> > >>> database. Likewise, when retrieving the query coordinates like 
> > >>> $domain->start('query'), I get what I consider the "hit" 
> coordinates, 
> > >>> since they are for the domain profile. Is this the intended 
> behavior? 
> > >>> 
> > >>> Thanks. 
> > >>> 
> > >>> scott 
> > >>> 
> > >>> -- 
> > >>> Scott A. Givan 
> > >>> Associate Director 
> > >>> Informatics Research Core Facility 
> > >>> 240e Bond Life Sciences Center 
> > >>> Research Assistant Professor 
> > >>> Molecular Microbiology and Immunology 
> > >>> University of Missouri, Columbia 
> > >>> 
> > >>> TEL 573-882-2948 
> > >>> FAX 573-884-9676 
> > >>> http://ircf.rnet.missouri.edu 
> > >>> 
> > >>> 
> > >>> 
> > >> 
> > >> _______________________________________________ 
> > >> Bioperl-l mailing list 
> > >> Bioperl-l at lists.open-bio.org 
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l 
> > > 
> > 
> > _______________________________________________ 
> > Bioperl-l mailing list 
> > Bioperl-l at lists.open-bio.org 
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l 
> > 
> > 
> > 
> > 
>
>
> _______________________________________________ 
> Bioperl-l mailing list 
> Bioperl-l at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/bioperl-l 
>



More information about the Bioperl-l mailing list