[BioPython] Zerg BLAST parser - timings

Leighton Pritchard L.Pritchard at scri.sari.ac.uk
Mon Sep 1 11:37:57 EDT 2003


Hi,

I don't imagine Zerg will be the best option in every circumstance, but 
time trials with the script below (which extracts all query sequence names 
from NCBI standalone BLAST output) gave the following on our Linux server:

361 kB output file, 3 runs:
-------------------------------------------
Module                  Mean            StDev

NCBIStandalone.Parser   0.986s          0.003s
Zerg                    0.032s          0.005s


367 MB output file, 3 runs:
-------------------------------------------
Module                  Mean            StDev

NCBIStandalone.Parser   1461.08s        20.95s
Zerg                    58.29s          4.38s


There are almost certainly faster ways to use the Biopython parser code 
(specialist consumer, perhaps?) than in the test script given below, but I 
couldn't think of them nearly as quickly as I could implement the quick 
run-through with the Zerg parser.
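
If only the query names are wanted, one lighter-weight possibility is to skip record parsing altogether and scan for the `Query=` lines that NCBI standalone BLAST writes at the start of each result. This is not the Biopython consumer mechanism mentioned above, only an untimed plain-Python sketch; `query_names` is a made-up name, the sample text is invented, and a name wrapped onto a second line is cut to its first line.

```python
import io

def query_names(handle):
    """Collect the text following 'Query=' on each query header line.

    Assumes NCBI plain-text BLAST output; a name that wraps onto a
    second line is truncated to its first line.
    """
    names = []
    for line in handle:
        if line.startswith("Query="):
            names.append(line[len("Query="):].strip())
    return names

# Tiny invented stand-in for a BLAST report, for illustration only.
sample = io.StringIO(
    "BLASTP 2.2.6\n\nQuery= seq1 hypothetical protein\n         (120 letters)\n\n"
    "BLASTP 2.2.6\n\nQuery= seq2\n         (85 letters)\n"
)
print(query_names(sample))
```

Whether this would beat either parser on real output would need the same kind of time trial as above.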

In this simple case, the Zerg code turned out to be roughly 25 to 31 times 
faster than the Biopython code in the script below (about 31 times on the 
small file and 25 times on the large one).  The Zerg wrapper and 
build instructions can be obtained from
http://bioinf.scri.sari.ac.uk/lp/pyzerg.shtml
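
As a quick check on the ratio quoted above, the speedup can be recomputed from the mean timings in the two tables. This is only arithmetic on the reported figures; the dictionary layout and the `speedup` helper are made up for illustration.

```python
# Mean timings in seconds, copied from the two tables in this message.
timings = {
    "small file": {"NCBIStandalone.Parser": 0.986, "Zerg": 0.032},
    "large file": {"NCBIStandalone.Parser": 1461.08, "Zerg": 58.29},
}

def speedup(means):
    """Ratio of the Biopython parser mean to the Zerg mean."""
    return means["NCBIStandalone.Parser"] / means["Zerg"]

for label, means in timings.items():
    print("%s: Zerg is about %.1f times faster" % (label, speedup(means)))
```

This gives about 30.8 for the small file and 25.1 for the large one.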

############
import time

testfile = './testBLAST2.out'

print "Begin Biopython test"
# BIOPYTHON

from Bio.Blast import NCBIStandalone

fhandle = open(testfile, 'r')

parser = NCBIStandalone.BlastParser()
iterator = NCBIStandalone.Iterator(fhandle, parser)

biotime0 = time.time()
bioquerylist = []
# The iterator returns None once the last record has been read
while 1:
    record = iterator.next()
    if record is None:
        break
    bioquerylist.append(record.query)
biotime = time.time() - biotime0
fhandle.close()

print "End Biopython test"

print "Begin Zerg test"
# ZERG

import zerg

zergtime0 = time.time()
zergquerylist = []
zerg.open_file(testfile)
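code, value = zerg.get_token()
# The loop ends when get_token() returns a false code (end of input);
# in this script, code 2 is the token carrying the query name
while code:
    if code == 2:
        zergquerylist.append(value)
    code, value = zerg.get_token()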
zergtime = time.time() - zergtime0
print "End Zerg test"

print "Queries found - Bio: %s; Zerg: %s" % (len(bioquerylist), len(zergquerylist))
print "Time (s) - Bio: %s; Zerg: %s" % (biotime, zergtime)
######
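
The means and standard deviations in the tables come from three runs each. Below is a minimal sketch of how repeated timings like these could be gathered and summarised with the standard library alone. `time_runs` and `dummy_parse` are hypothetical names, the dummy workload merely stands in for either parser loop above, and the sample standard deviation is an assumption, since the message does not say which form was used.

```python
import statistics
import time

def time_runs(func, runs=3):
    """Call func() `runs` times; return (mean, sample stdev) in seconds."""
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        elapsed.append(time.perf_counter() - start)
    return statistics.mean(elapsed), statistics.stdev(elapsed)

def dummy_parse():
    # Placeholder workload standing in for a parser loop.
    return sum(range(100000))

mean, stdev = time_runs(dummy_parse)
print("Mean: %.3fs  StDev: %.3fs" % (mean, stdev))
```

With only three runs the standard deviation is a rough guide at best, so differences of a few percent between modules would not be meaningful.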


Dr Leighton Pritchard AMRSC
PPI, Scottish Crop Research Institute
Invergowrie, Dundee, DD2 5DA, Scotland, UK
L.Pritchard at scri.sari.ac.uk
PGP key 47B4A485: http://www.keyserver.net http://pgp.mit.edu


