[Biopython-dev] SearchIO HSP indexing
Colin Archer
colin.aibn at gmail.com
Sat Feb 9 13:06:13 UTC 2013
Hi everyone,
I have a question about the implementation of
high-scoring segment pairs (HSPs) in SearchIO. I am currently parsing a BLAST
output file in XML format, and this is one of the hits (alignment details
removed to save space):
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gnl|BL_ORD_ID|111</Hit_id>
<Hit_def>ref|NC_007779|:125695-127587</Hit_def>
<Hit_accession>111</Hit_accession>
<Hit_len>1893</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>3352.79</Hsp_bit-score>
<Hsp_score>1815</Hsp_score>
<Hsp_evalue>0</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>1893</Hsp_query-to>
<Hsp_hit-from>1</Hsp_hit-from>
<Hsp_hit-to>1893</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_hit-frame>1</Hsp_hit-frame>
<Hsp_identity>1867</Hsp_identity>
<Hsp_positive>1867</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
</Hsp>
<Hsp>
<Hsp_num>2</Hsp_num>
<Hsp_bit-score>399.997</Hsp_bit-score>
<Hsp_score>216</Hsp_score>
<Hsp_evalue>2.88061e-111</Hsp_evalue>
<Hsp_query-from>331</Hsp_query-from>
<Hsp_query-to>881</Hsp_query-to>
<Hsp_hit-from>22</Hsp_hit-from>
<Hsp_hit-to>581</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_hit-frame>1</Hsp_hit-frame>
<Hsp_identity>452</Hsp_identity>
<Hsp_positive>452</Hsp_positive>
<Hsp_gaps>19</Hsp_gaps>
<Hsp_align-len>565</Hsp_align-len>
</Hsp>
Using Hsp 1 as an example, the query and hit starts ("Hsp_query-from" and
"Hsp_hit-from") are both 1 in the XML, but when I access the HSP objects from
the BlastResult, both values are equal to 0:
>>> blast_record[0][0].query_start
0
>>> blast_record[0][0].hit_start
0
However, when I access the end values for the query and hit, the result
is 1893, not the 1892 I would expect if 1893 were converted to zero-based:
>>> blast_record[0][0].query_end
1893
>>> blast_record[0][0].hit_end
1893
Is this correct? I find it a little confusing that one result is zero-based
and the other one-based.
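For what it's worth, the values above would be consistent with Python-style
slice coordinates (zero-based start, exclusive end). That is only my reading
of the output, not something I have confirmed in the SearchIO source. A minimal
sketch of that conversion, where to_python_coords is a hypothetical helper of
my own and not part of Biopython:

```python
def to_python_coords(one_based_from, one_based_to):
    """Convert a 1-based inclusive range to a 0-based half-open (start, end) pair.

    Hypothetical helper for illustration only; not a Biopython function.
    """
    # Only the start shifts down by one; the exclusive end of a half-open
    # interval has the same numeric value as the 1-based inclusive end.
    return one_based_from - 1, one_based_to


# Hsp 1 from the XML above: Hsp_query-from=1, Hsp_query-to=1893
query_start, query_end = to_python_coords(1, 1893)
print(query_start, query_end)    # 0 1893
print(query_end - query_start)   # 1893, the number of query positions covered

# Hsp 2 hit range from the XML above: Hsp_hit-from=22, Hsp_hit-to=581
hit_start, hit_end = to_python_coords(22, 581)
print(hit_start, hit_end)        # 21 581
```

If that is the convention, then end minus start gives the span length
directly, and seq[start:end] would slice out exactly the aligned region.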
Thanks
Colin