[Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds

Marc Logghe Marc.Logghe at ablynx.com
Thu Mar 27 13:26:24 UTC 2008


Hi all,

I am a little bit confused about the above mentioned seq_inds() method.
At first, I had the impression that the method returns an array of
positions in the hsp (hit or query) sequence.

At least that is what one would expect looking at the example usage in
the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
second code block).

Am I correct in believing you can only do this if your hsp query stretch
starts at position 1 of the query sequence?

I think seq_inds() returns a list of positions relative to the query/hit
sequence. So, the code shown in the HOWTO is a kind of special case.

However, I do not understand how seq_inds() is dealing with gaps.

An example. If you blast the worm protein ZK822.4 against swissprot
using blastp at ncbi you get this hsp as top:

 

>sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
Length=1461
 
 Score = 35.8 bits (81),  Expect = 0.48, Method: Composition-based
stats.
 Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)
 
Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
            +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193

 

Now, if you call seq_inds(query => 'gap') on that particular hsp object,
you get these positions: 417, 431, 432. Obviously, there is no gap in
the original query sequence at these positions. 
How do you have to read these numbers ? Remark also that for instance
417 is the res just in front of the gap.

Regards,

Marc

 

 





More information about the Bioperl-l mailing list