[Biopython] How to extract start and end positions of a sequence in blast output file

Sun Aug 30 11:29:08 UTC 2009

On Sat, Aug 29, 2009 at 8:10 PM, jorma kala <jjkk73 at gmail.com> wrote:
> Hi,
> I'm using Blast through the biopython module.
> Is it possible to retrieve start and end positions on the genome of an
> aligned sequence from a blast record object?

Yes, see below.

> (I've been looking at the Biopython tutorial, section 'the Blast record
> class', but haven't been able to find it.)
> Thank you very much

Have you tried using the built-in help to find out more about
the HSP object?

e.g.
>>> from Bio.Blast import NCBIXML
>>> record = NCBIXML.read(open("xbt003.xml"))
>>> help(record.alignments[0].hsps[0])
...

Or, have you come across Python's built-in dir() function? It
lists all the attributes and methods of an object (names
starting with an underscore are special or private and can
usually be ignored).

e.g.
>>> from Bio.Blast import NCBIXML
>>> record = NCBIXML.read(open("xbt003.xml"))
>>> dir(record.alignments[0].hsps[0])
['__doc__', '__init__', '__module__', '__str__', 'align_length',
'bits', 'expect', 'frame', 'gaps', 'identities', 'match',
'num_alignments', 'positives', 'query', 'query_end', 'query_start',
'sbjct', 'sbjct_end', 'sbjct_start', 'score', 'strand']
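If the dir() listing gets long, the underscore-prefixed names can be
filtered out with a list comprehension. This is a minimal sketch;
public_attributes is a hypothetical helper name, not part of Biopython:

```python
def public_attributes(obj):
    """Return the names listed by dir(obj) that do not start with "_".

    Names such as __doc__ or __init__ are special or private and are
    usually not what you want when exploring an object like an HSP.
    """
    # dir() already returns the names sorted alphabetically
    return [name for name in dir(obj) if not name.startswith("_")]
```

Called as public_attributes(record.alignments[0].hsps[0]), it should
give the listing above without the entries that start with an
underscore (the exact names may vary between Biopython versions).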

The help text tells you this, but you could also guess it from
the dir() listing: sbjct_start and sbjct_end are what you want
(the start and end of the subject sequence, i.e. the database
match), while query_start and query_end give the same positions
for your query sequence.
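To collect these positions for every match, not just the first HSP,
you can loop over record.alignments and each alignment's hsps. This is
a minimal sketch, assuming a record parsed with NCBIXML as above;
hsp_coordinates and subject_interval are hypothetical helper names,
not Biopython functions. It also assumes that BLAST's 1-based
coordinates are wanted unchanged, and that for a minus-strand hit
sbjct_start can be larger than sbjct_end, which is why
subject_interval orders the pair:

```python
def hsp_coordinates(blast_record):
    """Yield (title, query_start, query_end, sbjct_start, sbjct_end) per HSP.

    blast_record is assumed to be a record from NCBIXML.read or
    NCBIXML.parse; coordinates are returned exactly as BLAST reports them.
    """
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            yield (
                alignment.title,
                hsp.query_start,
                hsp.query_end,
                hsp.sbjct_start,
                hsp.sbjct_end,
            )


def subject_interval(hsp):
    """Return the (low, high) subject positions for one HSP.

    Assumption: for a minus-strand hit sbjct_start may be greater than
    sbjct_end, so the two values are put in ascending order here.
    """
    return (min(hsp.sbjct_start, hsp.sbjct_end),
            max(hsp.sbjct_start, hsp.sbjct_end))
```

For example, with record loaded as above:

>>> for title, q_start, q_end, s_start, s_end in hsp_coordinates(record):
...     print(title, s_start, s_end)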

Peter