[Biopython] How to extract start and end positions of a sequence in blast output file
Peter
biopython at maubp.freeserve.co.uk
Sun Aug 30 07:29:08 EDT 2009
On Sat, Aug 29, 2009 at 8:10 PM, jorma kala<jjkk73 at gmail.com> wrote:
> Hi,
> I'm using Blast through the biopython module.
> Is it possible to retrieve start and end positions on the genome of an
> aligned sequence from a blast record object?
Yes - see below.
> (I've been looking at the Biopython tutorial, section 'the Blast record
> class', but haven't been able to find it.)
> Thank you very much
Have you tried using the built-in help to find out more about
the HSP object?
e.g.
>>> from Bio.Blast import NCBIXML
>>> record = NCBIXML.read(open("xbt003.xml"))
>>> help(record.alignments[0].hsps[0])
...
Or, have you come across the Python built-in function dir? It gives
a listing of all the attributes and methods of an object (although
those starting with an underscore are special or private and
should usually be ignored).
e.g.
>>> from Bio.Blast import NCBIXML
>>> record = NCBIXML.read(open("xbt003.xml"))
>>> dir(record.alignments[0].hsps[0])
['__doc__', '__init__', '__module__', '__str__', 'align_length',
'bits', 'expect', 'frame', 'gaps', 'identities', 'match',
'num_alignments', 'positives', 'query', 'query_end', 'query_start',
'sbjct', 'sbjct_end', 'sbjct_start', 'score', 'strand']
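If the underscore names get in the way, they can be filtered out. This is only a small sketch: DemoHSP is a made-up stand-in so the snippet runs without a BLAST file, and public_names is a hypothetical helper, not part of Biopython. With a real record you would pass record.alignments[0].hsps[0] instead.

```python
class DemoHSP:
    """Hypothetical stand-in for an HSP object, for illustration only."""

    def __init__(self):
        self.query_start = 1
        self.query_end = 50
        self.sbjct_start = 1200
        self.sbjct_end = 1249


def public_names(obj):
    """Return the names reported by dir() that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


print(public_names(DemoHSP()))
```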
The help text tells you this, but you could also guess from
using dir - sbjct_start and sbjct_end are what you want
(the start/end of the subject sequence, i.e. the database
match), while query_start and query_end are those for
your query sequence.
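To collect these positions for every hit, you can loop over the alignments and their HSPs. The following is a rough sketch rather than a definitive recipe: hsp_coordinates is a hypothetical helper name, and demo_record is a made-up stand-in built with SimpleNamespace so the example runs without a BLAST file. It assumes each alignment has a title and a list of hsps, as the record returned by NCBIXML.read does.

```python
from types import SimpleNamespace


def hsp_coordinates(record):
    """Return one (title, query_start, query_end, sbjct_start, sbjct_end)
    tuple per HSP in a BLAST record."""
    rows = []
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            rows.append((alignment.title,
                         hsp.query_start, hsp.query_end,
                         hsp.sbjct_start, hsp.sbjct_end))
    return rows


# Stand-in data with invented values, for illustration only.
# With real output you would instead use something like:
#   from Bio.Blast import NCBIXML
#   record = NCBIXML.read(open("xbt003.xml"))
demo_record = SimpleNamespace(alignments=[
    SimpleNamespace(title="hypothetical hit", hsps=[
        SimpleNamespace(query_start=1, query_end=50,
                        sbjct_start=1200, sbjct_end=1249),
    ]),
])

for row in hsp_coordinates(demo_record):
    print(row)
```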
Peter