[Biopython] align sequence to genomic DNA
Horea Chrristian
h.chr at mail.ru
Fri Feb 27 17:42:21 UTC 2015
Hi again, I am trying to align a sequence to the mouse genome, I am
currently doing it thus:
```
#!/usr/bin/env python
from __future__ import division
__author__ = 'Horea Christian'
from Bio import Entrez
Entrez.email = "h.chr at mail.ru"
from Bio import pairwise2
from Bio.pairwise2 import format_alignment
from Bio.Blast import NCBIWWW
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.Alphabet import generic_dna
from Bio.Blast import NCBIXML
my_seq =
Seq("AATATGGCGAGGAACACTGAAAAATTGGAAAATTTANATGTCCACTGTANGACATGGAATATGGCAAGAAAACTGAAAATCATGGAAAATGANAAACATCCACGTGACGACTTGAAAAATGACGAAATCACTGAAAAACGTGAAAAATGANAAATGCACACTCTCGGACCTGGAATATGGCGAGAAAACTGAAAATCACGGAAAATGAGAAATACACACTTTANGACGTGAAATATGGANAGGAAAACTGAAAAAGTGGAAAATTTANAAATGTCCACTGTAGGACTGTTGGGAGCCGCGCGAGATATGGCA", generic_dna)
result_handle = NCBIWWW.qblast("blastn", "nt", my_seq)
blast_records = NCBIXML.read(result_handle)
E_VALUE_THRESH = 1e-100
for alignment in blast_records.alignments:
for hsp in alignment.hsps:
if hsp.expect < E_VALUE_THRESH:
print('****Alignment****')
print('sequence:', alignment.title)
print('length:', alignment.length)
print('e value:', hsp.expect)
print(hsp.query[0:75] + '...')
print(hsp.match[0:75] + '...')
print(hsp.sbjct[0:75] + '...')
```
It seems to work quite well (you can run it yourselves), except of
course that I get huge numbers of high-confidence hits. No matter, my
issue right now is with using bipython. So how can I parse the position
of the matching sequence on the genome from my NCBIWWW.qblast results? I
have tried to look at the raw XML of resut_handle, but I was not able to
find any such tag? :-/ Can you help me out? Also, other suggestions
regarding how I currently use bipython are welcome!
Cheers,
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.open-bio.org/pipermail/biopython/attachments/20150227/e5a41c56/attachment.html>
More information about the Biopython
mailing list