[BioPython] blast parse
Jose Blanca
jblanca at btc.upv.es
Wed Jan 30 04:15:49 EST 2008
Hi:
I'm new to the list and to Biopython. I come from Perl and I'm liking Python a
lot.
I'm trying to read a big BLAST file and it takes a lot of time and memory. I'm
not sure if I'm taking the most efficient path. Basically I'm doing:
from Bio.Blast import NCBIXML
blasth = open('blast.xml', 'r')
p = NCBIXML.BlastParser()
blast_parse = p.parse(blasth)
for blast_result in blast_parse:
    pass  # do whatever with each record
I was expecting to read the records one by one, but the call to
p.parse(blasth) takes a lot of time and memory. I'm not sure what this
function returns: a list or an iterator. I've looked at the NCBIXML.py file
and the BlastParser class has two parse methods (am I wrong?):
def parse(self, handler):
    """Parses the XML data
    handler -- file handler or StringIO
    This method returns a list of Blast record objects.
    """

def parse(handle, debug=0):
    """Returns an iterator a Blast record for each query.
    handle - file handle to and XML file to parse
    debug - integer, amount of debug information to print
    This is a generator function that returns multiple Blast records
    objects - one for each query sequence given to blast. The file
    is read incrementally, returning complete records as they are read
    in.
I guess that the first function reads the complete file before returning
anything, while the second should read and return the records one by one. I
don't know if this guess is correct.
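To make the distinction in that guess concrete, here is a minimal sketch in plain Python. It does not use Biopython: `parse_all` and `parse_lazily` are hypothetical stand-ins for a list-returning and a generator-style parse function, and each "record" is assumed to be just one line of text.

```python
import io

def parse_all(handle):
    """Hypothetical stand-in: read everything, then return a list."""
    return [line.strip() for line in handle]

def parse_lazily(handle):
    """Hypothetical stand-in: generator that yields one record at a time."""
    for line in handle:
        yield line.strip()

data = "query1\nquery2\nquery3\n"

# The list version consumes the whole handle before it returns.
handle = io.StringIO(data)
records = parse_all(handle)
rest_after_list = handle.read()   # nothing is left to read

# The generator version reads only as far as it has been asked to.
handle = io.StringIO(data)
lazy = parse_lazily(handle)
first = next(lazy)
rest_after_first = handle.read()  # the other two lines are still unread
```

If the two parse functions quoted above behave like these stand-ins, only the list-returning one would need to hold every record in memory at once.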
Is there another way to read these huge BLAST files without using so much
memory?
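One possible alternative, sketched here without Biopython, is to stream the XML with the standard library's `xml.etree.ElementTree.iterparse`. This is only an illustration under assumptions: `iter_queries` is a hypothetical helper, the file is assumed to follow the NCBI BLAST XML layout with one `Iteration` element per query, and the tiny inline sample stands in for a real blast.xml.

```python
import io
import xml.etree.ElementTree as ET

def iter_queries(handle):
    """Yield (query definition, number of hits) for each <Iteration>."""
    for event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "Iteration":
            query = elem.findtext("Iteration_query-def")
            hits = len(elem.findall("Iteration_hits/Hit"))
            yield query, hits
            elem.clear()  # drop this record's children to free memory

# Tiny made-up sample standing in for a real BLAST XML file.
sample = io.BytesIO(b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>seq1</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_id>a</Hit_id></Hit>
        <Hit><Hit_id>b</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>seq2</Iteration_query-def>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
""")

summary = list(iter_queries(sample))
```

With a real file, the handle would come from `open('blast.xml', 'rb')` and the loop would process each query as it is yielded instead of collecting everything into a list.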
Best regards,
--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)