[BioPython] Blast-Parser

Jeffrey Chang jchang@SMI.Stanford.EDU
Thu, 30 Sep 1999 11:53:27 -0700 (PDT)

Hi Arne,

Sounds like you have some good stuff that may be the basis for a general,
robust, well-tested blast parser.  I would like to:
1.  Archive the package in ScriptCentral so that other people can use it.
2.  Look into testing/enhancing it for general use in the core.

#1 should be easy, if that's what you want.  #2 is also possible, if it's
acceptable to you to release it under the biopython license
(http://www.biopython.org/License.shtml).  I don't know when the
discussion about blast parsers is going to start, but I'm going to try to
push it through soon!  :) 

Also, why did you represent the iterations of psi-blast as a tree?

Please let me know what you think!


On Thu, 30 Sep 1999, Arne Mueller wrote:

> Hi all,
> I've read about the blast-parser project on www.biopython.org. Some
> months ago I wrote a parser for blast2/psiblast protein sequence
> output. The parser takes a (psi)blast file and represents it as a
> tree of objects: each iteration is a branch of the root, each iteration
> contains a dictionary of hits and each hit a list of HSPs. 
> The parser is a module, and its classes can be subclassed in your
> own implementation. There's also a Python program
> 'blastflt.py' that uses the parser module to read all the hits and
> alignments from the (psi)blast output. It's a nice example of how to use the
> blast.py module (blastparser.py would be a better name for the module).
> However, there's no documentation except the docstrings in the module, and
> I've only tested the parser extensively for what I'm interested in, so
> there'll be many bugs (for sure)!
> The parser is also rather slow because it uses regular expressions and
> is very tolerant of errors in the blast output file. It requires an
> additional class 'IO', which defines methods like 'getline' and
> 'ungetline'; the IO class is a dirty hack and has to be rewritten.
> Anyway, if anybody is interested, I can send you the modules blast.py and
> IO.py as well as the example program blastflt.py. You may get some
> inspirations and come up with a better parser ...
> 	greetings,
> 	Arne
> -- 
> Arne Mueller
> Biomolecular Modelling Laboratory
> Imperial Cancer Research Fund
> 44 Lincoln's Inn Fields
> London WC2A 3PX, U.K.
> phone : +44-(0)171 2693405      | fax :+44-(0)171-269-3534
> email : a.mueller@icrf.icnet.uk | http://www.icnet.uk/bmm/
> _______________________________________________
> BioPython mailing list  -  BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython
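Arne's description (a root with one branch per psi-blast iteration, each iteration holding a dictionary of hits, each hit holding a list of HSPs) can be sketched in Python roughly as follows. This is not his blast.py: the class names, the HSP fields, and the new_hits helper are hypothetical, chosen only to illustrate the layout, and the sketch assumes Python 3.7+ (dataclasses, insertion-ordered dicts). One possible benefit of keeping iterations as separate branches, relevant to the question above, is that rounds can be compared directly, e.g. to see which hits first appear in a given iteration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HSP:
    # One high-scoring segment pair; these fields are illustrative only.
    score: float
    evalue: float
    query_start: int
    query_end: int
    sbjct_start: int
    sbjct_end: int


@dataclass
class Hit:
    # A database sequence that matched the query, with all of its HSPs.
    name: str
    hsps: List[HSP] = field(default_factory=list)


@dataclass
class Iteration:
    # One psi-blast round; hits are keyed by sequence name.
    number: int
    hits: Dict[str, Hit] = field(default_factory=dict)

    def add_hsp(self, hit_name: str, hsp: HSP) -> None:
        # Create the Hit the first time its name is seen, then append the HSP.
        self.hits.setdefault(hit_name, Hit(hit_name)).hsps.append(hsp)


@dataclass
class BlastResult:
    # Root of the tree: each Iteration is one branch.
    query: str
    iterations: List[Iteration] = field(default_factory=list)

    def new_iteration(self) -> Iteration:
        it = Iteration(len(self.iterations) + 1)
        self.iterations.append(it)
        return it

    def new_hits(self, n: int) -> List[str]:
        # Hypothetical helper: hits in iteration n (1-based) that were
        # absent from iteration n - 1.
        current = self.iterations[n - 1].hits
        previous = self.iterations[n - 2].hits if n > 1 else {}
        return [name for name in current if name not in previous]


# Build a tiny two-round result by hand.
result = BlastResult("query1")
round1 = result.new_iteration()
round1.add_hsp("seqA", HSP(50.0, 1e-10, 1, 100, 5, 104))
round2 = result.new_iteration()
round2.add_hsp("seqA", HSP(52.0, 1e-11, 1, 100, 5, 104))
round2.add_hsp("seqB", HSP(30.0, 1e-3, 10, 60, 1, 51))
print(result.new_hits(2))  # ['seqB']
```

A real parser would fill these objects from the report text; subclassing them (as Arne suggests for his own classes) would let users attach extra fields or methods without touching the parsing code.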
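The IO.py module itself is not shown in the thread. As a rough illustration of what a class with 'getline' and 'ungetline' is for, here is a minimal pushback line reader: a parser reads ahead until it hits the start of the next section, then hands that line back so the next stage sees it first. The class name LineReader, returning None at end of input, and keeping pushed-back lines on a stack are all assumptions of this sketch, not a description of Arne's code.

```python
import io
from typing import List, Optional, TextIO


class LineReader:
    """Wrap a text stream so a parser can push lines back after reading them.

    getline() returns the next line, or None at end of input.
    ungetline() pushes a line back so the next getline() returns it again.
    """

    def __init__(self, handle: TextIO) -> None:
        self._handle = handle
        self._pushed: List[str] = []  # stack of lines given back by the parser

    def getline(self) -> Optional[str]:
        if self._pushed:
            return self._pushed.pop()  # most recently pushed line comes first
        line = self._handle.readline()
        return line if line else None  # readline() returns '' at EOF

    def ungetline(self, line: str) -> None:
        self._pushed.append(line)


# Read lines until a section header, then give the header back so the
# code responsible for the next section starts from it.
reader = LineReader(io.StringIO("header\nSequences producing\nhit1\n"))
consumed = []
while True:
    line = reader.getline()
    if line is None or line.startswith("Sequences producing"):
        if line is not None:
            reader.ungetline(line)
        break
    consumed.append(line)
next_line = reader.getline()
```

Because pushed-back lines live on a stack, several ungetline() calls are returned in reverse order, which matches the usual "unread" semantics when a parser backs out of a partial match.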