Thu, 30 Sep 1999 11:53:27 -0700 (PDT)
Sounds like you have some good stuff that may be the basis for a general,
robust, well-tested blast parser. I would like to:
1. Archive the package in ScriptCentral so that other people can use it.
2. Look into testing/enhancing it for general use in the core.
#1 should be easy, if that's what you want. #2 is also possible, if it's
acceptable to you to release it under the biopython license
(http://www.biopython.org/License.shtml). I don't know when the
discussion about blast parsers is going to start, but I'm going to try to
push it through soon! :)
Also, why did you represent the iterations of psi-blast as a tree?
Please let me know what you think!
On Thu, 30 Sep 1999, Arne Mueller wrote:
> Hi all,
> I've read about the blast-parser project on www.biopython.org. I already
> wrote a parser for blast2/psiblast protein sequence output some months
> ago. The parser takes a (psi)blast file and represents it as a tree of
> objects: each iteration is a branch of the root, each iteration
> contains a dictionary of hits, and each hit a list of HSPs.
> The parser is a module, and its classes can be subclassed in your own
> implementations. There's also a Python program,
> 'blastflt.py', that uses the parser module to read all the hits and
> alignments from the (psi)blast output. It's a nice example of how to use
> the blast.py module (blastparser.py would be a better name for the module).
> However, there's no documentation except the docstrings in the module, and
> I've only tested the parser extensively for what I'm interested in, so
> there'll be many bugs (for sure)!
> The parser is also rather slow because it uses regular expressions, and
> it is very liberal with errors in the blast output file. It requires an
> additional class, 'IO', which defines methods like 'getline' and
> 'ungetline'. The IO class is a dirty hack and has to be rewritten.
> Anyway, if anybody is interested, I can send you the modules blast.py and
> IO.py as well as the example program blastflt.py. You may get some
> inspiration and come up with a better parser ...
> Arne Mueller
> Biomolecular Modelling Laboratory
> Imperial Cancer Research Fund
> 44 Lincoln's Inn Fields
> London WC2A 3PX, U.K.
> phone : +44-(0)171 2693405 | fax :+44-(0)171-269-3534
> email : firstname.lastname@example.org | http://www.icnet.uk/bmm/
> BioPython mailing list - BioPython@biopython.org
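For readers following the thread, the layout described in the quoted message (a root with one branch per iteration, a dictionary of hits per iteration, a list of HSPs per hit) and the 'getline'/'ungetline' helper might look roughly like the sketch below. This is only an illustration under assumptions: every class name, field, and signature here is hypothetical and is not taken from blast.py or IO.py, which are not shown in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class HSP:
    """One high-scoring segment pair (hypothetical fields)."""
    score: float
    evalue: float


@dataclass
class Hit:
    """One database sequence, holding its list of HSPs."""
    name: str
    hsps: List[HSP] = field(default_factory=list)


@dataclass
class Iteration:
    """One (psi)blast round: a dictionary of hits keyed by sequence name."""
    number: int
    hits: Dict[str, Hit] = field(default_factory=dict)


@dataclass
class BlastResult:
    """Root of the tree; each iteration is one branch."""
    iterations: List[Iteration] = field(default_factory=list)


class LineReader:
    """Line source with one level of pushback (assumed behaviour of 'IO')."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = iter(lines)
        self._pushed: Optional[str] = None

    def getline(self) -> Optional[str]:
        """Return the next line, or None at end of input."""
        if self._pushed is not None:
            line, self._pushed = self._pushed, None
            return line
        return next(self._lines, None)

    def ungetline(self, line: str) -> None:
        """Push one line back so the next getline() returns it again."""
        self._pushed = line


# Build a tiny tree by hand: root -> iteration 1 -> hit "seqA" -> one HSP.
result = BlastResult()
round1 = Iteration(number=1)
round1.hits["seqA"] = Hit("seqA", [HSP(score=120.0, evalue=1e-30)])
result.iterations.append(round1)

# Peek at a line, then push it back for the next reader.
reader = LineReader(["first", "second"])
peeked = reader.getline()
reader.ungetline(peeked)
```

A plain blast run would simply carry a single Iteration under the root, so the same traversal code could serve both blast2 and psi-blast output; whether that was the motivation for the tree is a guess, not something the thread states.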