[BioPython] blast parse
Michiel de Hoon
mjldehoon at yahoo.com
Wed Jan 30 09:56:56 UTC 2008
Dear Jose,
To get the records one by one, use

    from Bio.Blast import NCBIXML
    blast_parse = NCBIXML.parse(blasth)
    for blast_result in blast_parse:
        # do whatever with blast_result
        pass

This avoids having to read the complete XML file all at once.
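
To illustrate why iterating keeps memory low, here is a minimal sketch that does not use Biopython at all. It streams a BLAST XML file with the standard library's xml.etree.ElementTree.iterparse and yields one small summary per <Iteration> element. The function name iter_queries and the tiny sample document are made up for the illustration; this is not how NCBIXML.parse is implemented, only the same idea in miniature.

```python
import io
import xml.etree.ElementTree as ET


def iter_queries(handle):
    """Yield (query_def, hit_count) for each <Iteration>, one at a time.

    Hypothetical helper for illustration only; not part of Biopython.
    """
    for event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "Iteration":
            query = elem.findtext("Iteration_query-def")
            hits = len(elem.findall("Iteration_hits/Hit"))
            yield query, hits
            # Drop the children of the record just processed so that
            # finished records do not pile up in memory.
            elem.clear()


# A made-up, heavily trimmed BLAST XML document with two queries.
sample = io.BytesIO(b"""<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>seq1</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_id>a</Hit_id></Hit>
        <Hit><Hit_id>b</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>seq2</Iteration_query-def>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
""")

results = list(iter_queries(sample))
```

In real use you would loop over iter_queries(handle) directly instead of calling list() on it, since building the list brings back the memory cost.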
To the developers:
We should probably think about removing the NCBIXML.BlastParser.parse method, and perhaps adding an NCBIXML.read function that reads exactly one record from the XML file.
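
As a rough sketch of what "read exactly one record" could mean, the helper below takes any iterator of records and returns the single record it contains. The name read_one and the choice to raise ValueError for zero or several records are assumptions for the sake of discussion, not an existing Biopython function.

```python
def read_one(records):
    """Return the only record in `records`.

    Hypothetical helper: raises ValueError if there are no records
    or more than one. Not an existing Biopython function.
    """
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found") from None
    try:
        next(iterator)
    except StopIteration:
        # Exactly one record was available.
        return first
    raise ValueError("More than one record found")


single = read_one(["record_a"])
```

With something like this, a one-query file could be handled as read_one(NCBIXML.parse(handle)) without writing a loop.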
--Michiel.
Jose Blanca <jblanca at btc.upv.es> wrote:
Hi:
I'm new to the list and to Biopython. I come from Perl and I'm liking Python a
lot.
I'm trying to read a big BLAST file and it takes a lot of time and memory. I'm
not sure if I'm taking the most efficient path. Basically I'm doing:

    from Bio.Blast import NCBIXML
    blasth = open('blast.xml', 'r')
    p = NCBIXML.BlastParser()
    blast_parse = p.parse(blasth)
    for blast_result in blast_parse:
        # do whatever
        pass

I was expecting to read the records one by one, but the call to
p.parse(blasth) takes a lot of time and memory. I'm not sure about what this
function returns, a list or an iterator. I've looked at the NCBIXML.py file
and the BlastParser class has two parse methods (am I wrong?).

    def parse(self, handler):
        """Parses the XML data
        handler -- file handler or StringIO
        This method returns a list of Blast record objects.
        """

    def parse(handle, debug=0):
        """Returns an iterator a Blast record for each query.
        handle - file handle to and XML file to parse
        debug - integer, amount of debug information to print
        This is a generator function that returns multiple Blast records
        objects - one for each query sequence given to blast. The file
        is read incrementally, returning complete records as they are read
        in.

I guess that the first function would read the complete file before returning
anything, but the second should return and read the records one by one. I
don't know if this guess is correct.
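
The difference between the two return styles can be seen in a small, Biopython-free sketch. The functions parse_all and parse_lazy are invented for the illustration, and each "record" is just one line of text. The point is only how much of the handle has been consumed at each step.

```python
import io


def parse_all(handle):
    """Eager: consume the whole handle, then return a list."""
    return [line.strip() for line in handle]


def parse_lazy(handle):
    """Lazy: a generator that yields one record at a time."""
    for line in handle:
        yield line.strip()


# Eager version: the call itself reads everything.
eager_handle = io.StringIO("rec1\nrec2\nrec3\n")
eager = parse_all(eager_handle)
eager_remaining = eager_handle.read()  # nothing is left to read

# Lazy version: nothing is read until a record is requested.
lazy_handle = io.StringIO("rec1\nrec2\nrec3\n")
lazy = parse_lazy(lazy_handle)
first = next(lazy)
lazy_remaining = lazy_handle.read()  # the unread rest of the input
```

A function that returns a list has to hold every record at once, which is where the time and memory go on a large file.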
Is there another way to read these huge BLAST files without using so much
memory?
Best regards,
--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)