[Biopython-dev] SearchIO HSP indexing

Peter Cock p.j.a.cock at googlemail.com
Sat Feb 9 13:16:43 UTC 2013


On Sat, Feb 9, 2013 at 1:06 PM, Colin Archer <colin.aibn at gmail.com> wrote:
> Hi everyone,
>                   I have a question about the implementation of
> high-scoring segment pairs (HSPs) in SearchIO. I am currently parsing a BLAST
> output file in XML format, and this is one of the hits (alignment details
> removed to save space):
>
>         <Hit>
>           <Hit_num>1</Hit_num>
>           <Hit_id>gnl|BL_ORD_ID|111</Hit_id>
>           <Hit_def>ref|NC_007779|:125695-127587</Hit_def>
>           <Hit_accession>111</Hit_accession>
>           <Hit_len>1893</Hit_len>
>           <Hit_hsps>
>             <Hsp>
>               <Hsp_num>1</Hsp_num>
>               <Hsp_bit-score>3352.79</Hsp_bit-score>
>               <Hsp_score>1815</Hsp_score>
>               <Hsp_evalue>0</Hsp_evalue>
>               <Hsp_query-from>1</Hsp_query-from>
>               <Hsp_query-to>1893</Hsp_query-to>
>               <Hsp_hit-from>1</Hsp_hit-from>
>               <Hsp_hit-to>1893</Hsp_hit-to>
>               <Hsp_query-frame>1</Hsp_query-frame>
>               <Hsp_hit-frame>1</Hsp_hit-frame>
>               <Hsp_identity>1867</Hsp_identity>
>               <Hsp_positive>1867</Hsp_positive>
>               <Hsp_gaps>0</Hsp_gaps>
>             </Hsp>
>             <Hsp>
>               <Hsp_num>2</Hsp_num>
>               <Hsp_bit-score>399.997</Hsp_bit-score>
>               <Hsp_score>216</Hsp_score>
>               <Hsp_evalue>2.88061e-111</Hsp_evalue>
>               <Hsp_query-from>331</Hsp_query-from>
>               <Hsp_query-to>881</Hsp_query-to>
>               <Hsp_hit-from>22</Hsp_hit-from>
>               <Hsp_hit-to>581</Hsp_hit-to>
>               <Hsp_query-frame>1</Hsp_query-frame>
>               <Hsp_hit-frame>1</Hsp_hit-frame>
>               <Hsp_identity>452</Hsp_identity>
>               <Hsp_positive>452</Hsp_positive>
>               <Hsp_gaps>19</Hsp_gaps>
>               <Hsp_align-len>565</Hsp_align-len>
>             </Hsp>
>
> Using Hsp1 as an example, the query and hit starts ("Hsp_query-from" and
> "Hsp_hit-from") are both 1 in the XML, but when I access the Hsp objects from
> the BlastResult, both values are equal to 0:
>
> >>> blast_record[0][0].query_start
> 0
> >>> blast_record[0][0].hit_start
> 0
>
> However, when I access the end coordinates for the query and hit, the result
> isn't 1892 (1893 converted to zero-based) but 1893:
>
> >>> blast_record[0][0].query_end
> 1893
> >>> blast_record[0][0].hit_end
> 1893
>
> Is this correct? I find it a little confusing that one result is zero-based
> and the other one-based.
>
> Thanks
> Colin

Hi Colin,

SearchIO positions, like those elsewhere in Biopython, should be using
Python-style counting (zero-based start, exclusive end). Looking at this one:

               <Hsp_hit-from>1</Hsp_hit-from>
               <Hsp_hit-to>1893</Hsp_hit-to>

That is like a GenBank/EMBL location 1..1893, which in Python string
slicing is [0:1893]: the start is reduced by one but the end is unchanged.
The nice thing is that the length, 1893, is simply the difference between
the Python slicing-style end and start.
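
To make the arithmetic concrete, here is a minimal sketch in plain Python.
The helper name to_python_slice is hypothetical (it is not part of
SearchIO), and it assumes the "from" value is not larger than the "to"
value, as in both HSPs quoted above:

```python
def to_python_slice(one_based_from, one_based_to):
    """Convert a 1-based inclusive range to a 0-based half-open (start, end).

    Hypothetical helper for illustration only; assumes from <= to.
    """
    return one_based_from - 1, one_based_to


# Hit coordinates of the two HSPs quoted above.
hsp1 = to_python_slice(1, 1893)   # Hsp_hit-from=1,  Hsp_hit-to=1893
hsp2 = to_python_slice(22, 581)   # Hsp_hit-from=22, Hsp_hit-to=581

# The span length is simply end - start.
hsp1_length = hsp1[1] - hsp1[0]
hsp2_length = hsp2[1] - hsp2[0]

# The same pair can be used directly to slice a sequence.
sequence = "N" * 1893             # stand-in for the real hit sequence
hsp1_slice_len = len(sequence[hsp1[0]:hsp1[1]])

print(hsp1, hsp1_length)
print(hsp2, hsp2_length)
```

So the first HSP becomes (0, 1893), matching the hit_start and hit_end
values you saw, and the second becomes (21, 581).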

Perhaps we need to work on the help text? Any suggestions?

Thanks,

Peter


