[BioPython] plain txt blast output - xml instead

Peter biopython at maubp.freeserve.co.uk
Thu Jun 15 17:30:18 UTC 2006


Rohini Damle wrote:
> Hi,
> I am using BioPython 1.41 on Windows. I have also updated
> NCBIStandalone.py from the link you gave. Here is my code:
> 
> from Bio.Blast import NCBIStandalone
> from Bio.Blast import NCBIXML
> blast_out = open("4proteinblast.xml","r")
> b_iterator = NCBIStandalone.Iterator(blast_out, NCBIXML.BlastParser())
> 
> for b_record in b_iterator:
>     query_name = b_record.query
>     print query_name
>     for alignment in b_record.alignments:
>         print '****Alignment****'
>         print 'sequence:', alignment.title
> 
> This code gives "Sequences producing significant alignments" for all 4
> proteins, but prints the query name as P1.

This code does the same thing, but prints less on screen, so it's easier
to read:

from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML

blast_out = open("4proteinblast.xml", "r")
# Parse the XML output one BLAST record (one query) at a time
b_iterator = NCBIStandalone.Iterator(blast_out, NCBIXML.BlastParser())

for b_record in b_iterator:
    query_name = b_record.query
    print query_name
    for alignment in b_record.alignments:
        # One line per hit: the query name and the first word of the hit title
        print query_name, alignment.title.split()[0]

blast_out.close()


> I mean I am getting all the information I want, but I have 4 protein
> queries and this code gives only P1 as the query (not P2, P3, P4,
> although it gives information about them). I am attaching the XML file
> of the 4 protein BLAST results. Thank you for your help.

Looking at the raw XML file by hand, I could only see references to P1, 
the first protein.

If the file had results for all four proteins I would expect to see:

<?xml version="1.0"?>
... results for P1 ...
<?xml version="1.0"?>
... results for P2 ...
<?xml version="1.0"?>
... results for P3 ...
<?xml version="1.0"?>
... results for P4 ...
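Rather than reading the file by hand, a short script can count the XML
documents and list the query of each one. This is a minimal sketch using
only the Python standard library, not Biopython. It assumes the layout
above (every query's results start with their own <?xml ...?> line) and
that the query name sits in a BlastOutput_query-def element;
split_xml_documents and query_names are hypothetical helpers, and SAMPLE
is a made-up two-query stand-in for the real file.

```python
import xml.etree.ElementTree as ET


def split_xml_documents(text):
    """Split text holding concatenated XML documents into a list of strings.

    Hypothetical helper: assumes every document starts with its own
    '<?xml ...?>' declaration, as in the layout shown above.
    """
    pieces = text.split("<?xml")
    # pieces[0] is whatever precedes the first declaration (normally empty)
    return ["<?xml" + piece for piece in pieces[1:] if piece.strip()]


def query_names(text):
    """Return the BlastOutput_query-def value of each document in text."""
    names = []
    for doc in split_xml_documents(text):
        root = ET.fromstring(doc)
        names.append(root.findtext("BlastOutput_query-def"))
    return names


# Made-up two-query stand-in for a real BLAST XML file
SAMPLE = (
    '<?xml version="1.0"?>\n'
    "<BlastOutput><BlastOutput_query-def>P1</BlastOutput_query-def></BlastOutput>\n"
    '<?xml version="1.0"?>\n'
    "<BlastOutput><BlastOutput_query-def>P2</BlastOutput_query-def></BlastOutput>\n"
)

for name in query_names(SAMPLE):
    print("query: %s" % name)
```

On the real file, query_names(open("4proteinblast.xml").read()) should
list four names if all four queries were searched; a single P1 would
confirm what I saw by eye.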

Are you sure you gave BLAST all four input sequences, and not just the
first sequence?
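One way to check is to list the records in the FASTA file you passed to
BLAST. This is a minimal sketch, assuming plain FASTA input with one
line starting with ">" per sequence; fasta_record_ids is a hypothetical
helper, and since the input file is not named in this thread the example
uses a made-up four-sequence sample.

```python
def fasta_record_ids(text):
    """Return the identifier of each record in FASTA-formatted text.

    Hypothetical helper: a record starts at a line beginning with '>', and
    its identifier is taken to be the first word after the '>'.
    """
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            ids.append(words[0] if words else "")
    return ids


# Made-up four-sequence input; the real input file is not named in the thread
SAMPLE_FASTA = """>P1
MKTAYIAKQR
>P2
MSEQNNTEML
>P3
MAHHHHHHVD
>P4
MGSSHHHHHH
"""

ids = fasta_record_ids(SAMPLE_FASTA)
print("%d sequence(s): %s" % (len(ids), ", ".join(ids)))
```

If your real input file gives only P1, BLAST only ever saw the first
sequence, which would explain an XML file containing only P1 results.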

Peter
