[Biopython] how to get the hit length from Bio.Blast.NCBIXML?

Ann Loraine aloraine at gmail.com
Sun Mar 7 14:55:19 UTC 2010


Hello,

I'm using Bio.Blast.NCBIXML to parse blastx results for an annotation
project. I'm searching contig consensus sequences (assembled from 454
reads) against a protein database.

Since these are assembled ESTs and may be incomplete, I need to know
how much of a matched sequence was included in the alignment so that I
can compute the percent coverage of both the hit and query.
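The coverage calculation described above can be sketched as follows. This is only an illustration: the helper name `percent_coverage` and the coordinates in the usage lines are hypothetical, it assumes 1-based inclusive coordinates, and it covers a single HSP (several HSPs on one hit would need their intervals merged first). For blastx the query coordinates are in nucleotides and the hit coordinates in amino acids, so each span is divided by the length of its own sequence.

```python
def percent_coverage(start, end, total_length):
    """Percent of a sequence spanned by one aligned interval.

    Coordinates are assumed to be 1-based and inclusive. abs() is used
    so the result is the same if start and end arrive in reverse order.
    """
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    span = abs(end - start) + 1
    return 100.0 * span / total_length


# Hypothetical HSP coordinates, for illustration only.
query_cov = percent_coverage(2, 421, 422)  # query span / query length
hit_cov = percent_coverage(5, 144, 431)    # hit span / hit length
print("query coverage: %.1f%%, hit coverage: %.1f%%" % (query_cov, hit_cov))
```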

How do I retrieve the "hit length" from the objects returned by the parser?

I couldn't find anything in the record and alignment objects that
contains this information -- if it is not there, should it be added?

The hit length appears in the XML:

*cut*
    <Iteration>
      <Iteration_iter-num>3</Iteration_iter-num>
      <Iteration_query-ID>lcl|3_0</Iteration_query-ID>
      <Iteration_query-def>Both_1_c25003</Iteration_query-def>
      <Iteration_query-len>422</Iteration_query-len>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>gnl|BL_ORD_ID|12864</Hit_id>
          <Hit_def>gi|255551002|ref|XP_002516549.1| catalytic, putative [Ricinus communis]</Hit_def>
          <Hit_accession>12864</Hit_accession>
          <Hit_len>431</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>112.079</Hsp_bit-score>

*paste*
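As far as I know, Bio.Blast.NCBIXML stores the Hit_len value on each alignment object as `alignment.length`, but that is worth checking against the installed release. Independently of the parser, the value can be read straight from the XML with the standard library. The sketch below assumes a fragment trimmed from the excerpt above and closed by hand so that it is well-formed; the function name `hit_lengths` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed from the excerpt above; closing tags added by hand so it parses.
FRAGMENT = """<Iteration>
  <Iteration_query-def>Both_1_c25003</Iteration_query-def>
  <Iteration_query-len>422</Iteration_query-len>
  <Iteration_hits>
    <Hit>
      <Hit_num>1</Hit_num>
      <Hit_id>gnl|BL_ORD_ID|12864</Hit_id>
      <Hit_len>431</Hit_len>
    </Hit>
  </Iteration_hits>
</Iteration>"""


def hit_lengths(iteration_xml):
    """Return (query_length, [(hit_id, hit_length), ...]) for one Iteration."""
    root = ET.fromstring(iteration_xml)
    query_len = int(root.findtext("Iteration_query-len"))
    hits = [
        (hit.findtext("Hit_id"), int(hit.findtext("Hit_len")))
        for hit in root.iter("Hit")
    ]
    return query_len, hits


query_len, hits = hit_lengths(FRAGMENT)
for hit_id, hit_len in hits:
    print(hit_id, "hit length:", hit_len, "query length:", query_len)
```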

Best,

Ann Loraine


