[BioPython] parsing the blastoutput and printing the alingment

Muthuraman, Manickam manickam.muthuraman at wur.nl
Thu Jun 15 14:13:17 UTC 2006


Dear peter

here is the code my_blast.out and the error. My need is to get all the blast hit sequences in fasta format. By parsing and i can extract accession number from it.

Code
from Bio import Fasta
file_for_blast=open('/home/manickam/Documents/m_cold.fasta','r')
f_iterator=Fasta.Iterator(file_for_blast)
f_record=f_iterator.next()
from Bio.Blast import NCBIWWW
result_handle=NCBIWWW.qblast('blastp','nr',f_record, format_type="XML")
save_file=open('/home/manickam/my_blast.out','w')
blast_results=result_handle.read()
save_file.write(blast_results)
save_file.close()
blast_out=open('/home/manickam/my_blast.out','r')
from Bio.Blast import NCBIXML
from Bio.Blast import NCBIStandalone
b_parser=NCBIXML.BlastParser()
b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
b_record = b_iterator.next()
for alignment in b_record.alignments:
    print "inside 2 loop"
    for hsp in alignment.hsps:
        print "inside 1 loop"
        print 'seq:',alignment.title
blast_out.close()

Error
[root at bioinfo python]# python blas.py
/usr/lib/python2.4/site-packages/Bio/Blast/NCBIWWW.py:1064: UserWarning: qblast works only with blastn and blastp for now.
  warnings.warn("qblast works only with blastn and blastp for now.")
Traceback (most recent call last):
  File "blas.py", line 16, in ?
    b_record = b_iterator.next()
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", line 1410, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
    self._parser.parse(handler)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.4/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:4: not well-formed (invalid token)
[root at bioinfo python]#            

my_blast.out
HTTP/1.1 200 OK
Date: Thu, 15 Jun 2006 13:57:19 GMT
Server: Nde
Content-Type: application/xml
Connection: close

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_version>BLASTP 2.2.14 [May-07-2006]</BlastOutput_version>
  <BlastOutput_reference>Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), &quot;Gapped BLAST and PSI-BLAST: a new generation of protein database search programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>nr</BlastOutput_db>
  <BlastOutput_query-ID>1_13944</BlastOutput_query-ID>
  <BlastOutput_query-def>1BK0</BlastOutput_query-def>
  <BlastOutput_query-len>331</BlastOutput_query-len>
  <BlastOutput_param>
.
.
.
.
.
.
.
.
 <Hsp_identity>76</Hsp_identity>
              <Hsp_positive>128</Hsp_positive>
              <Hsp_gaps>27</Hsp_gaps>
              <Hsp_align-len>295</Hsp_align-len>
              <Hsp_qseq>VPKIDVSPLFGD-DQAAKMRVAQQIDAASRDTGFFYAVNHGIN---VQRLSQKTKEFHMSITPEEKWDLAIRAYNKEHQDQVRAGYYLSIPGKKAVESFCYLNP--NFTPDHPRIQAKTPTHEVNVWPDETKHPGFQDFAEQYYWDVFGLSSALLKGYALALGKEENFFARHFKPDDTLASVVLIRYP-YLDPYPEAAIKTAADGTKLSFEWHEDVSLITVLYQSNVQNLQVETAAGYQDIEADDTGYLINCGSYMAHLTNNYYKAPIHRV--KWVNAERQSLPFFVNLGYDSVI</Hsp_qseq>
              <Hsp_hseq>LPVIDLSLLDGSPESAAKFR--DDLLCATHDVGFFYLVGHGVDESLMDDLLAASREFFD--LPEDQKFAVENVKSPQFRGYTRVGGELT-EGKTDWREQIDVGPERDVIDNAPGLADYWRLEGPNLWPDAV--PQLRGLVNEWNDKLSAVSLRLLRAWAHALGAPEDVFDNAFA-DKPFPQLKIVRYPGESNPEPKQGVGAHRDGGVLTL----------LMVEPGKGGLQVDYNGEWVDVPPKPGAFVVNIGEMLELATEGYLKATLHRVISPLIGDDRISIPFFFNPALDTVM</Hsp_hseq>
              <Hsp_midline>+P ID+S L G  + AAK R    +  A+ D GFFY V HG++   +  L   ++EF     PE++        + + +   R G  L+  GK        + P  +   + P +         N+WPD    P  +    ++   +  +S  LL+ +A ALG  E+ F   F  D     + ++RYP   +P P+  +    DG  L+           ++ +     LQV+    + D+      +++N G  +   T  Y KA +HRV    +  +R S+PFF N   D+V+</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
      <Iteration_stat>
        <Statistics>
          <Statistics_db-num>3695564</Statistics_db-num>
          <Statistics_db-len>1269795892</Statistics_db-len>
          <Statistics_hsp-len>0</Statistics_hsp-len>
          <Statistics_eff-space>0</Statistics_eff-space>
          <Statistics_kappa>0.041</Statistics_kappa>
          <Statistics_lambda>0.267</Statistics_lambda>
          <Statistics_entropy>0.14</Statistics_entropy>
        </Statistics>
      </Iteration_stat>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>





More information about the Biopython mailing list