Marc.Logghe at ablynx.com
Thu Mar 27 13:26:24 UTC 2008
I am a little bit confused about the above mentioned seq_inds() method.
At first, I had the impression that the method returns an array of
positions in the hsp (hit or query) sequence.
At least that is what one would expect looking at the example usage in
the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
second code block).
Am I correct in believing you can only do this if your hsp query stretch
starts at position 1 of the query sequence?
I think seq_inds() returns a list of positions relative to the query/hit
sequence. So, the code shown in the HOWTO is a kind of special case.
However, I do not understand how seq_inds() is dealing with gaps.
An example. If you blast the worm protein ZK822.4 against swissprot
using blastp at ncbi you get this hsp as top:
>sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
Score = 35.8 bits (81), Expect = 0.48, Method: Composition-based
Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)
Query 402 IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL 453
+A+ E TT K +KQ ++ NK NK KK T+ P+AA+ + I AE +Q L
Sbjct 139 VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL 193
Now, if you call seq_inds(query => 'gap') on that particular hsp object,
you get these positions: 417, 431, 432. Obviously, there is no gap in
the original query sequence at these positions.
How do you have to read these numbers ? Remark also that for instance
417 is the res just in front of the gap.
More information about the Bioperl-l