[Biojava-l] Stop condition for blast parser
mark.schreiber at novartis.com
Tue Mar 10 02:36:50 UTC 2009
Hi -
There are many ways to stop the parsing, but which one fits really depends
on how you have set the program up. Note that the Blast parsing system of
BioJava has no way to shut itself down, but that kind of control probably
shouldn't happen at that level anyway.
A crude but effective procedure is to write out the results when you find
the hit of interest and then simply call System.exit().
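A rough sketch of that crude approach, assuming a per-record callback and
tab-separated records that begin with the contig id. The class ExitOnHit,
the method handleRecord and the sample records are hypothetical stand-ins
for your customized handler, not BioJava API:

```java
import java.io.PrintStream;

public class ExitOnHit {

    /**
     * Hypothetical callback, invoked once per parsed record.
     * Writes the record and returns true if it belongs to queryId.
     */
    public static boolean handleRecord(String record, String queryId, PrintStream out) {
        if (!record.startsWith(queryId + "\t")) {
            return false;      // not the contig we are looking for
        }
        out.println(record);   // write the result first ...
        out.flush();           // ... and make sure it has left the buffer
        return true;
    }

    public static void main(String[] args) {
        // Made-up records standing in for the events the parser would deliver.
        String[] records = { "contig1\tgeneA", "contig2\tgeneB", "contig3\tgeneC" };
        for (String record : records) {
            if (handleRecord(record, "contig2", System.out)) {
                System.exit(0); // stops the whole JVM, parser included
            }
        }
    }
}
```

The important detail is the order: flush (or close) whatever you are writing
to before calling System.exit(), because nothing after that call will run.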
Another approach would be to spawn Tasks that each parse a set of records
and signal the main thread when one of them has found the hit, so that the
main thread can shut the remaining ones down. If you are using Java 1.4 or
earlier then you would need to do this with plain Threads. With Java 5
(1.5) or later you can use the java.util.concurrent packages, which are
much nicer to deal with.
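The task-based approach could be sketched with java.util.concurrent roughly
as follows. This assumes the Blast output has already been split into chunks
of records, represented here as plain tab-separated strings. The class
EarlyStopSearch and the method findHit are hypothetical names; in a real
program each task would drive a parser over its own chunk. The lambda syntax
needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EarlyStopSearch {

    /**
     * Scans the chunks in parallel and returns a record whose first
     * tab-separated field equals queryId, or null if no task succeeded.
     */
    public static String findHit(List<List<String>> chunks, String queryId) {
        if (chunks.isEmpty()) {
            return null; // invokeAny rejects an empty task list
        }
        List<Callable<String>> tasks = new ArrayList<>();
        for (List<String> chunk : chunks) {
            tasks.add(() -> {
                for (String record : chunk) {
                    // Stop early if another task has already found the hit.
                    if (Thread.currentThread().isInterrupted()) {
                        throw new InterruptedException();
                    }
                    if (record.startsWith(queryId + "\t")) {
                        return record;
                    }
                }
                // A task "fails" when its chunk does not contain the hit.
                throw new IllegalStateException("hit not in this chunk");
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Returns the result of one task that completed successfully
            // and cancels the tasks that are still running.
            return pool.invokeAny(tasks);
        } catch (ExecutionException e) {
            return null; // every task failed, so no chunk produced a hit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            pool.shutdownNow(); // interrupt anything still running
        }
    }
}
```

Here the "signal" to the main thread is simply the return of invokeAny, and
the shutdown is the shutdownNow() call in the finally block. Tasks only stop
promptly because they check the interrupt flag inside their loop.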
One thing I don't understand is why you don't blast each contig
separately. In that case each result file would contain only your hit of
interest. That means 90K separate blasts, but there are versions of blast
that run on clusters and the database (3 million genes) is not huge, so it
should be an embarrassingly parallel problem.
- Mark
biojava-l-bounces at lists.open-bio.org wrote on 03/10/2009 03:00:36 AM:
> Hi Mark!
>
> Mark Schreiber wrote:
> > You could just customize BlastEcho to pass on the events of interest,
> > ignore those that are not interesting.
> That's what I am doing right now. But I don't know how to tell my
> customized BlastEcho to stop when a certain condition is met during a
> particular event call. What's the command for stopping there?
>
> > It could also exit if a certain
> > event occurs.
> How?
>
> > Remember it cost almost nothing to read the file so you
> > save time by only sending interesting events for parsing.
> Hmm, I am not sure if it's really almost nothing when I have about
> 90,000 contigs that were blasted against a database with maybe 3,000,000
> genes. The blast output that I am parsing is about 13 GB, and in every
> cycle I am looking for the results of one particular contig of these
> 90,000 contigs. So I definitely experienced that the time adds up a lot
> when it runs over the whole file in each of these 90,000 cycles,
> even though the contig I am looking for was already at the beginning of
> the file.
>
>
> Cheers,
> Marcel
>
> >
> > On 7 Mar 2009, 12:01 PM, "Marcel Huntemann"
> > <marcel.huntemann at gmail.com> wrote:
> >
> > But where? I can't do it in my customized handler, can I?
> >
> > Mark Schreiber wrote:
> > > Because the blast parser uses event based parsing you should be
> > > able to c...