[Biojava-l] Parsing a BLAST file
Keith James
kdj@sanger.ac.uk
05 Nov 2001 09:41:45 +0000
>>>>> "Susan" == Susan Glass <SGlass@genetics.com> writes:
[...]
Susan> This works fine, but I believe I should be able to use
Susan> objects in the org.biojava.bio.search package to avoid
Susan> writing my own content handler. Is this true? If so, could
Susan> someone please point me to some example code (or other
Susan> help) that starts with a BLAST output file, and ends with
Susan> hit objects? Unfortunately I'm having trouble choosing the
Susan> correct classes to use with only the javadocs as a guide.
Susan> I found a mail list posting from August that deals with
Susan> this
Susan> (http://biojava.org/pipermail/biojava-l/2001-August/001480.html),
Susan> but it uses a package org.biojava.bio.program.ssbind
Susan> that I don't have and can't find.
Hi Susan,
Here goes,
org.biojava.bio.search
The package currently contains classes for representing search data
(which are not specific to any search program or algorithm), but it
does not deal with getting/parsing/interpreting the data from the
search program.
org.biojava.bio.program.sax
Which, as you know, does the SAX parsing for searches (and other
files).
(For completeness, org.biojava.bio.program.search is a more primitive
and less flexible way to approach some of the tasks the SAX package
achieves. It did, and still does, use the org.biojava.bio.search
classes to store its results.)
There was an obvious gap between the two; you could parse the search
output with the SAX package, but had no way of making the result/hit
objects in org.biojava.bio.search. The reason you can't find the
org.biojava.bio.program.ssbind package which does this is probably
that you have downloaded an older release.
org.biojava.bio.program.ssbind exists in the current CVS head and the
September 20th build on the ftp site (I'd recommend checking out the
CVS version as then you get some tests too).
There is a SAX handler in there called SeqSimilarityAdapter which
converts the SAX events into method calls on an
org.biojava.bio.program.search.SearchContentHandler. Currently two
implementations of the SearchContentHandler interface are provided:
BlastLikeSearchBuilder
which is the one that will build the org.biojava.bio.search.*
results. Also present is
BlastLikeHomologyBuilder
which builds org.biojava.bio.seq.homol.Homology objects
instead.
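To make the shape of that arrangement concrete, here is a rough,
self-contained sketch of the same adapter idea using only the JDK's
SAX classes. It is NOT the BioJava API: HitHandler, HitListBuilder,
SaxToHitAdapter, Hit and the tiny XML sample are all hypothetical
names made up for illustration, standing in for SearchContentHandler,
BlastLikeSearchBuilder, SeqSimilarityAdapter and the search result
objects respectively. The real classes have their own method names and
constructor arguments, so check the javadocs and the ssbind tests for
those.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: every type below is hypothetical, not part of BioJava.
final class SearchAdapterSketch {

    /** Callback interface the adapter talks to (stand-in for a search content handler). */
    interface HitHandler {
        void startHit(String subjectId);
        void addScore(double score);
        void endHit();
    }

    /** Simple program-independent hit object. */
    static final class Hit {
        final String subjectId;
        final double score;

        Hit(String subjectId, double score) {
            this.subjectId = subjectId;
            this.score = score;
        }
    }

    /** Builder: turns the stream of callbacks into a list of Hit objects. */
    static final class HitListBuilder implements HitHandler {
        private final List<Hit> hits = new ArrayList<>();
        private String currentId;
        private double currentScore;

        @Override public void startHit(String subjectId) {
            currentId = subjectId;
            currentScore = Double.NaN; // no score seen yet for this hit
        }

        @Override public void addScore(double score) {
            currentScore = score;
        }

        @Override public void endHit() {
            hits.add(new Hit(currentId, currentScore));
        }

        List<Hit> getHits() {
            return hits;
        }
    }

    /** Adapter: a SAX handler that translates element events into HitHandler calls. */
    static final class SaxToHitAdapter extends DefaultHandler {
        private final HitHandler target;

        SaxToHitAdapter(HitHandler target) {
            this.target = target;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("Hit".equals(qName)) {
                target.startHit(atts.getValue("id"));
                target.addScore(Double.parseDouble(atts.getValue("score")));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("Hit".equals(qName)) {
                target.endHit();
            }
        }
    }

    /** Made-up input; real BLAST output is text that a dedicated parser turns into SAX events. */
    static final String SAMPLE =
        "<Search><Hit id=\"seqA\" score=\"87.5\"/><Hit id=\"seqB\" score=\"42.0\"/></Search>";

    /** Wires parser -> adapter -> builder and returns the collected hits. */
    static List<Hit> parse(String xml) {
        HitListBuilder builder = new HitListBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new SaxToHitAdapter(builder));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse search output", e);
        }
        return builder.getHits();
    }

    public static void main(String[] args) {
        for (Hit h : parse(SAMPLE)) {
            System.out.println(h.subjectId + " " + h.score);
        }
    }
}
```

The point of the split is that the adapter knows about SAX events and
the builder knows about result objects, so swapping the builder (as
with the search and homology builders above) changes what gets built
without touching the parsing side.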
If you do get the CVS version you'll find a whole bunch of working
Blast -> search object and Fasta -> search object examples in the
ssbind tests subdirectory.
I hope this is of some help.
cheers,
Keith
--
-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
Pathogen Sequencing Unit, Wellcome Trust Sanger Institute, Cambridge, UK