[Biojava-dev] Proposed change to RichFormat interface

mark.schreiber at novartis.com mark.schreiber at novartis.com
Thu Jun 8 01:03:22 UTC 2006


Very cool!

Can you put this example in the cookbook?

- Mark





Richard Holland <richard.holland at ebi.ac.uk>
Sent by: biojava-dev-bounces at lists.open-bio.org
06/07/2006 08:36 PM

 
        To:     Mark Schreiber <mark.schreiber at novartis.com>
        cc:     biojava-dev <biojava-dev at biojava.org>, Michael Heuer <heuermh at acm.org>, Michael Heuer <heuermh at shell3.shore.net>
        Subject:        Re: [Biojava-dev] Proposed change to RichFormat interface


Hi guys.

See org.biojavax.bio.seq.io.DebuggingRichSeqIOListener.

It extends BufferedInputStream, so it can be used to wrap a normal
InputStream before the stream is passed around.

It also implements RichSeqIOListener.

The idea is that you do something like this:

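        import java.io.BufferedReader;
        import java.io.FileInputStream;
        import java.io.InputStream;
        import java.io.InputStreamReader;
        import org.biojava.bio.seq.io.SymbolTokenization;
        import org.biojavax.Namespace;
        import org.biojavax.RichObjectFactory;
        import org.biojavax.bio.seq.io.DebuggingRichSeqIOListener;
        import org.biojavax.bio.seq.io.FastaFormat;

        // (inside a method declared to throw Exception)
        Namespace ns = RichObjectFactory.getDefaultNamespace();
        InputStream is = new FileInputStream("myFastaFile.fasta");
        FastaFormat format = new FastaFormat();

        // Wrap the raw stream: the debugger is both the input source
        // and the listener that receives the parser's events.
        DebuggingRichSeqIOListener debug = new DebuggingRichSeqIOListener(is);
        BufferedReader br = new BufferedReader(new InputStreamReader(debug));

        // Let the format guess a tokenization (alphabet) from the wrapped stream.
        SymbolTokenization symParser = format.guessSymbolTokenization(debug);

        // Parse one record, passing the debugger in as the listener.
        format.readRichSequence(br, symParser, debug, ns);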

This will then dump out everything as it is read, with each event printed
in-line with the input at the point where the parser emits it.
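The general shape of a class that is both the input wrapper and the listener can be sketched without BioJava. This is not the DebuggingRichSeqIOListener source: EchoingInputStream and onEvent are hypothetical names, and a real RichSeqIOListener has one callback per event rather than a single string-based method. Two caveats apply to this sketch: bytes re-read after mark()/reset() are echoed twice, and because InputStreamReader and BufferedReader read ahead in blocks, echoed input can land in the log ahead of the events for the record actually being parsed.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch: a stream wrapper that is also the event listener.
 * Bytes pulled through it and event markers go into one shared log, in the
 * order they happen.
 */
class EchoingInputStream extends BufferedInputStream {
    private final StringBuilder log = new StringBuilder();

    EchoingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public synchronized int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            log.append((char) b); // assumes single-byte text such as ASCII
        }
        return b;
    }

    @Override
    public synchronized int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = 0; i < n; i++) {
            log.append((char) (buf[off + i] & 0xff)); // echo each byte returned
        }
        return n;
    }

    /** Hypothetical listener callback: note an event in-line with the input. */
    synchronized void onEvent(String name) {
        log.append('[').append(name).append("]\n");
    }

    /** Everything echoed so far; bytes re-read after reset() appear twice. */
    synchronized String getLog() {
        return log.toString();
    }
}
```

In a real listener each callback (for example the one for the start of a sequence) would simply call onEvent with its own name, so the log reads as input and events interleaved.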

Hope this helps?

cheers,
Richard
 

On Wed, 2006-06-07 at 14:02 +0800, mark.schreiber at novartis.com wrote:
> That might be a more elegant solution.
> 
> We could even make the InputStream implement RichSeqIOListener, so that it
> would be sending data to the RichFormat while also listening to what the
> RichFormat makes of that data.
> 
> When the RichFormat emits a startXXX() event, the InputStreamIOListener
> could record the line number and start buffering all the data sent as the
> readLine() requests are made (while also passing it on to the RichFormat).
> When the RichFormat emits the corresponding endXXX() event, the buffer can
> be cleared and the process starts again.
> 
> Only problem might be what to do when the RichFormat consumes data in 
> between emitting events (which is allowed).
> 
> - Mark
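Mark's buffering idea can be sketched as follows, assuming the format reads its input only through readLine(). Because his description is in terms of readLine() requests, this works at the reader level rather than on the raw InputStream. RecordBufferingReader, startRecord() and endRecord() are hypothetical names standing in for the listener side of the startXXX()/endXXX() events; mark()/reset() is not handled here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical sketch: a reader the parser reads from, which also hears the
 * parser's start/end events and keeps the text of the record in progress.
 */
class RecordBufferingReader extends BufferedReader {
    private final StringBuilder buffer = new StringBuilder();
    private int linesRead = 0;        // lines handed to the parser so far
    private int recordStartLine = 0;  // line the parser was on at startRecord()
    private boolean recording = false;

    RecordBufferingReader(Reader in) {
        super(in);
    }

    @Override
    public String readLine() throws IOException {
        String line = super.readLine();
        if (line != null) {
            linesRead++;
            if (recording) {
                buffer.append(line).append('\n'); // pass through and keep a copy
            }
        }
        return line;
    }

    /** On a startXXX() event: note the current line and start buffering. */
    void startRecord() {
        buffer.setLength(0);
        recordStartLine = linesRead;
        recording = true;
    }

    /** On the matching endXXX() event: the record parsed, so drop the buffer. */
    void endRecord() {
        buffer.setLength(0);
        recording = false;
    }

    int recordStartLine() {
        return recordStartLine;
    }

    /** Text read since the last startRecord(); useful for a failure report. */
    String bufferedText() {
        return buffer.toString();
    }
}
```

If parsing throws before endRecord() is called, recordStartLine() and bufferedText() hold what a bug report would need. Lines read between an endRecord() and the next startRecord() are not buffered, which is exactly the gap Mark raises about data consumed between events.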
> 
> 
> 
> 
> 
> Michael Heuer <heuermh at acm.org>
> Sent by: Michael Heuer <heuermh at shell3.shore.net>
> 06/07/2006 01:51 PM
> 
> 
>         To:     mark.schreiber at novartis.com
>         cc:     biojava-dev at biojava.org
>         Subject:        Re: [Biojava-dev] Proposed change to RichFormat interface
> 
> 
> Mark Schreiber wrote:
> 
> > Hi all -
> >
> > I would like to propose a change to the RichFormat interface. I think we
> > should do this now: we haven't done a stable biojavax roll-out yet, so
> > interface changes should still be allowed. The additional methods would be:
> >
> > public String currentLine();
> > public int currentLineNumber();
> >
> > This would make debugging a lot easier, and it would also make it much
> > simpler to build a RichSeqIOListener that logs and debugs. I was trying to
> > do this a while back: I started a background process that parsed 6GB of
> > GenBank records looking for records that failed. It worked OK, but it would
> > be much better with the ability to query the RichFormat in the above way.
> > We might even be able to make it a utility that people could run on
> > suspect files to generate standard bug reports, making it easier for us to
> > debug the parser code.
> >
> > What do people think??
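One way a format could answer the two proposed methods, assuming it reads all of its input through readLine() on a reader it controls, is to delegate to a reader that remembers the last line. LineTrackingReader is a hypothetical name; it builds on java.io.LineNumberReader, which already counts lines and restores the count on reset(). The sketch also restores the remembered line on reset(), since parsers that peek ahead with mark()/reset() would otherwise report a stale line.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;

/**
 * Hypothetical helper a format could delegate currentLine() and
 * currentLineNumber() to.
 */
class LineTrackingReader extends LineNumberReader {
    private String currentLine; // last line returned by readLine(), or null
    private String markedLine;  // value of currentLine when mark() was called

    LineTrackingReader(Reader in) {
        super(in);
    }

    @Override
    public String readLine() throws IOException {
        currentLine = super.readLine(); // LineNumberReader bumps the count
        return currentLine;
    }

    @Override
    public void mark(int readAheadLimit) throws IOException {
        super.mark(readAheadLimit);     // also saves the line number
        markedLine = currentLine;
    }

    @Override
    public void reset() throws IOException {
        super.reset();                  // also restores the line number
        currentLine = markedLine;
    }

    /** The line most recently read, or null before the first read / at EOF. */
    String currentLine() {
        return currentLine;
    }

    /** Number of lines read so far, i.e. the 1-based number of currentLine(). */
    int currentLineNumber() {
        return getLineNumber();
    }
}
```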
> 
> Another possibility would be to leave this sort of progress tracking up
> to the client, in that they could wrap the InputStream in something like
> a CountingInputStream before passing it to the parser(s):
> 
> http://jakarta.apache.org/commons/io/api-release/org/apache/commons/io/input/CountingInputStream.html
> 
>    michael
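A JDK-only stand-in for the Commons IO class linked above shows what the client-side approach sees. LineCountingInputStream is a hypothetical name, not part of Commons IO; it counts bytes and newline bytes as they pass through (skip() is not counted). Because InputStreamReader and BufferedReader fill their buffers in blocks, the count reflects how far the reader has pulled from the stream, which can be ahead of the line the parser is actually working on.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical counting wrapper: tallies bytes and '\n' bytes read through it. */
class LineCountingInputStream extends FilterInputStream {
    private long bytes = 0;
    private long newlines = 0;

    LineCountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            tally(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len); // -1 at end of stream: loop skipped
        for (int i = 0; i < n; i++) {
            tally(buf[off + i]);
        }
        return n;
    }

    private void tally(int b) {
        bytes++;
        if (b == '\n') {
            newlines++;
        }
    }

    long getByteCount() {
        return bytes;
    }

    long getNewlineCount() {
        return newlines;
    }
}
```

On a small file, a single readLine() on a BufferedReader built over this wrapper will typically already report the whole input as read, which is why a counter at the stream level cannot by itself say which record failed.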
> 
> 
> 
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416
