[BioPython] Blast-Parser

Arne Mueller a.mueller@icrf.icnet.uk
Thu, 30 Sep 1999 13:39:35 +0100

Hi all,

I've read about the blast-parser project on www.biopython.org. I
already wrote a parser for blast2/psiblast protein sequence output
some months ago. The parser takes a (psi)blast file and represents it as a
tree of objects: each iteration is a branch of the root, each iteration
contains a dictionary of hits, and each hit contains a list of HSPs.
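To make the tree layout concrete, here is a minimal sketch of such a structure. It is not the code in blast.py: the class names (BlastReport, Iteration, Hit, HSP), their fields, and the helper methods are hypothetical, chosen only to show the root -> iterations -> dictionary of hits -> list of HSPs nesting described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HSP:
    """One high-scoring segment pair (hypothetical field set)."""
    score: float
    expect: float
    query_start: int
    query_end: int
    sbjct_start: int
    sbjct_end: int


@dataclass
class Hit:
    """A database sequence that matched the query; owns a list of HSPs."""
    name: str
    hsps: List[HSP] = field(default_factory=list)

    def best_expect(self) -> float:
        # Smallest E-value among this hit's HSPs (hits are only created
        # together with their first HSP, so the list is never empty here).
        return min(h.expect for h in self.hsps)


@dataclass
class Iteration:
    """One (psi)blast round; hits are keyed by subject sequence name."""
    number: int
    hits: Dict[str, Hit] = field(default_factory=dict)

    def add_hsp(self, name: str, hsp: HSP) -> None:
        # Create the hit the first time its name is seen, then append the HSP.
        self.hits.setdefault(name, Hit(name)).hsps.append(hsp)


@dataclass
class BlastReport:
    """Root of the tree: one branch per iteration."""
    iterations: List[Iteration] = field(default_factory=list)


if __name__ == "__main__":
    # Made-up numbers, purely to show how the tree is filled and walked.
    report = BlastReport()
    it1 = Iteration(1)
    it1.add_hsp("subject_A", HSP(250.0, 1e-60, 1, 120, 3, 122))
    it1.add_hsp("subject_A", HSP(40.0, 2e-3, 130, 160, 200, 230))
    report.iterations.append(it1)
    for it in report.iterations:
        for name, hit in it.hits.items():
            print(it.number, name, len(hit.hsps), hit.best_expect())
```

Keying hits by name in a dictionary makes it cheap to attach a second HSP to a hit that was already seen, which is the usual case when one subject aligns to the query in several places.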

The parser is a module, and its classes can be inherited by your own
classes. There's also a Python program 'blastflt.py' that uses the
parser module to read all the hits and alignments from the (psi)blast
output. It's a nice example of how to use the blast.py module
(blastparser.py would be a better name for the module).

However, there's no documentation apart from the docstrings in the module,
and I've only tested the parser extensively on the parts I'm interested
in, so there will surely be many bugs!

The parser is also rather slow because it uses regular expressions and
is very tolerant of errors in the blast output file. It requires an
additional class 'IO' which defines methods like 'getline' and
'ungetline'; the IO class is a dirty hack and has to be rewritten.
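For anyone wondering what 'getline'/'ungetline' buys a parser like this, here is a minimal sketch of a line reader with pushback, plus a tolerant regular expression for HSP score lines. This is not the IO.py or blast.py code: LineReader, SCORE_RE and read_hsp_scores are hypothetical names, and the score-line pattern assumes the usual blast2 text layout (" Score = 108 bits (267), Expect = 4e-23", sometimes "Expect(2)" or an E-value written as "e-103").

```python
import io
import re
from typing import List, Optional, TextIO, Tuple


class LineReader:
    """Hypothetical stand-in for an 'IO' class: file lines with pushback."""

    def __init__(self, handle: TextIO) -> None:
        self._handle = handle
        self._pushed: List[str] = []  # stack of lines given back via ungetline()

    def getline(self) -> Optional[str]:
        # Return a pushed-back line first, otherwise read the file; None at EOF.
        if self._pushed:
            return self._pushed.pop()
        line = self._handle.readline()
        return line if line else None

    def ungetline(self, line: str) -> None:
        # Give a line back so the next getline() returns it again.
        self._pushed.append(line)


# Tolerant pattern (assumed blast2 layout): flexible whitespace, anything
# between "bits" and "Expect", and an optional "(2)"-style suffix on Expect.
SCORE_RE = re.compile(r"Score\s*=\s*([\d.]+)\s*bits.*?Expect\S*\s*=\s*([\d.eE+-]+)")


def read_hsp_scores(reader: LineReader) -> List[Tuple[float, float]]:
    """Collect (bit score, E-value) pairs for one hit, stopping at the next '>' line."""
    scores = []
    while True:
        line = reader.getline()
        if line is None:
            break
        if line.startswith(">"):
            # This header belongs to the next hit: push it back for the caller.
            reader.ungetline(line)
            break
        m = SCORE_RE.search(line)
        if m:
            expect = m.group(2)
            if expect.startswith(("e", "E")):
                expect = "1" + expect  # an E-value printed as 'e-103' means 1e-103
            scores.append((float(m.group(1)), float(expect)))
    return scores


if __name__ == "__main__":
    text = (">subject_A some description\n"
            " Score = 108 bits (267), Expect = 4e-23\n"
            " Score =  40.4 bits (93), Expect(2) = e-103\n"
            ">subject_B another description\n"
            " Score = 35.0 bits (80), Expect = 0.5\n")
    reader = LineReader(io.StringIO(text))
    reader.getline()                  # consume the '>subject_A' header
    print(read_hsp_scores(reader))    # [(108.0, 4e-23), (40.4, 1e-103)]
    print(reader.getline().rstrip())  # the pushed-back '>subject_B ...' header
```

The point of pushback is that the parser only knows a hit has ended once it has already read the next hit's header; ungetline() lets that line be handed on to whoever parses the next hit instead of being lost.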

Anyway, if anybody is interested, I can send you the modules blast.py and
IO.py as well as the example program blastflt.py. You may get some
inspiration and come up with a better parser ...



Arne Mueller
Biomolecular Modelling Laboratory
Imperial Cancer Research Fund
44 Lincoln's Inn Fields
London WC2A 3PX, U.K.
phone : +44-(0)171 2693405      | fax :+44-(0)171-269-3534
email : a.mueller@icrf.icnet.uk | http://www.icnet.uk/bmm/