cjfields at uiuc.edu
Thu Mar 27 15:05:59 UTC 2008
According to the GenericHSP::seq_inds() POD, seq_inds() reports
residue positions (indices) for the query/subject based on identity/
conservation, i.e. these are fro the original sequence positions as
determined by the HSP data, not alignment column positions. 'gaps'
should be reported at the position prior to where a gap is inserted.
However I think something is getting borked when the gap length is
longer than one, so I would partially qualify this as a bug.
Example: When I ran this using bioperl-live it gives a different set
of gaps indices which appear to be correct. I reran the BLASTP using
the web form using your query against swissprot and parsed it. I got
slightly different results for the BLAST report (probably differences
in the query sequence):
>gi|74746888|sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
Score = 35.8 bits (81), Expect = 0.47, Method: Composition-based
Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)
Query 394 IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL 445
+A+ E TT K +KQ ++ NK NK KK T+ P+AA+ + I AE +Q L
Sbjct 139 VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL 193
seq_inds('query' => 'gaps') reports 409,423, and 424, which is
partially correct, e.g. there is a gap inserted after position 409 and
423 in the query. However, no gap is present after 424; I think this
occurs b/c the gap length is 2. The other HSPs report similar problems.
P.S. Just saw than Sendu posted; I agree, seq. positions with gap
lengths > 1 should be repeated. Should be easy to fix that.
On Mar 27, 2008, at 8:26 AM, Marc Logghe wrote:
> Hi all,
> I am a little bit confused about the above mentioned seq_inds()
> At first, I had the impression that the method returns an array of
> positions in the hsp (hit or query) sequence.
> At least that is what one would expect looking at the example usage in
> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> second code block).
> Am I correct in believing you can only do this if your hsp query
> starts at position 1 of the query sequence?
> I think seq_inds() returns a list of positions relative to the query/
> sequence. So, the code shown in the HOWTO is a kind of special case.
> However, I do not understand how seq_inds() is dealing with gaps.
> An example. If you blast the worm protein ZK822.4 against swissprot
> using blastp at ncbi you get this hsp as top:
>> sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
> Score = 35.8 bits (81), Expect = 0.48, Method: Composition-based
> Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)
> Query 402 IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL
> +A+ E TT K +KQ ++ NK NK KK T+ P+AA+ + I AE +Q L
> Sbjct 139 VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL
> Now, if you call seq_inds(query => 'gap') on that particular hsp
> you get these positions: 417, 431, 432. Obviously, there is no gap in
> the original query sequence at these positions.
> How do you have to read these numbers ? Remark also that for instance
> 417 is the res just in front of the gap.
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
More information about the Bioperl-l