[Biojava-l] DNA Strider Format and SeqIOListener problem
Thomas Down
td2@sanger.ac.uk
Tue, 12 Mar 2002 10:12:22 +0000
On Mon, Mar 11, 2002 at 11:17:28AM -0500, Marc Colosimo wrote:
> Hi,
>
> I've been working on porting a format reader for DNA Strider files (a
> very popular Mac program). About 50 to 80 percent of it has been coded
> (20% documented). I have limited time to work on this and each
> time I pick it up I get stuck with implementing the SeqIOListener
> interface. There is very little documentation on how this works. I
> understand that I need to chain them together when calling StreamReader:
Hi...
To write a basic file-parser, you shouldn't have to write
any implementations of SeqIOListener (or SequenceBuilder).
The basic pattern for BioJava sequence parsing is just:
Raw data ---> |SequenceFormat| ---> events ---> |SeqIOListener|
To support a new format, you just need to write a SequenceFormat
implementation, which parses the raw data and passes the information
on by calling methods on the SeqIOListener interface.
If you want to instantiate normal in-memory Sequence objects,
then you should use SimpleSequenceBuilder as the listener
at the end of this chain.
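As a rough illustration of that pattern, here is a self-contained
sketch. It deliberately does not use the real BioJava interfaces:
ToyListener, ToyFormat and ToyBuilder are hypothetical stand-ins for
SeqIOListener, a SequenceFormat implementation and
SimpleSequenceBuilder, and the file layout (name on the first line,
sequence data on the rest) is invented for the example.

```java
import java.io.BufferedReader;
import java.util.Iterator;

// Hypothetical, cut-down stand-in for SeqIOListener (not the real interface).
interface ToyListener {
    void startSequence();
    void setName(String name);
    void addSymbols(String symbols);
    void endSequence();
}

// Stand-in for a SequenceFormat: parses raw data and fires events.
class ToyFormat {
    // Invented layout: first line is the name, the rest is sequence data.
    static void read(BufferedReader in, ToyListener listener) {
        Iterator<String> lines = in.lines().iterator();
        listener.startSequence();
        listener.setName(lines.hasNext() ? lines.next().trim() : "");
        while (lines.hasNext()) {
            String line = lines.next().trim();
            if (!line.isEmpty()) {
                listener.addSymbols(line);
            }
        }
        listener.endSequence();
    }
}

// Stand-in for SimpleSequenceBuilder: turns events into an in-memory object.
class ToyBuilder implements ToyListener {
    private String name = "";
    private final StringBuilder symbols = new StringBuilder();
    private boolean complete = false;

    public void startSequence() {
        // Reset state so the builder can be reused for the next sequence.
        name = "";
        symbols.setLength(0);
        complete = false;
    }
    public void setName(String name) { this.name = name; }
    public void addSymbols(String symbols) { this.symbols.append(symbols); }
    public void endSequence() { complete = true; }

    String getName() { return name; }
    String getSymbols() { return symbols.toString(); }
    boolean isComplete() { return complete; }
}
```

Calling ToyFormat.read(reader, builder) drives the whole chain. The
format object never needs to know what the listener does with the
events, which is the point of the design.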
Obviously, it's possible to write `filters' which implement
SeqIOListener, receive one stream of events, then pass a
(slightly modified) event stream on to another listener.
A lot of the parsers supplied with BioJava actually consist
of a SequenceFormat which just parses the basic `shape' of
the file, then a filter which processes the data further.
This was originally done for the sake of code reuse, but I
admit it does make the system rather harder to follow.
I'd suggest that you don't bother with this pattern unless it
makes life significantly easier -- just put everything in
the SequenceFormat object.
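For what it's worth, a filter of the kind described above might look
roughly like this. Again this is only a sketch with made-up names:
SketchListener is a hypothetical cut-down listener interface (not
SeqIOListener itself), UpperCaseFilter is an invented filter that
upper-cases symbol data, and RecordingListener just logs what arrives
at the end of the chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, cut-down listener interface (not the real SeqIOListener).
interface SketchListener {
    void startSequence();
    void setName(String name);
    void addSymbols(String symbols);
    void endSequence();
}

// A filter: receives events, modifies some, and forwards all of them.
class UpperCaseFilter implements SketchListener {
    private final SketchListener delegate;

    UpperCaseFilter(SketchListener delegate) {
        this.delegate = delegate;
    }

    public void startSequence() { delegate.startSequence(); }
    public void setName(String name) { delegate.setName(name); }
    public void addSymbols(String symbols) {
        // The only event this filter changes: normalise case.
        delegate.addSymbols(symbols.toUpperCase(Locale.ROOT));
    }
    public void endSequence() { delegate.endSequence(); }
}

// End of the chain for this example: records each event it receives.
class RecordingListener implements SketchListener {
    final List<String> events = new ArrayList<String>();

    public void startSequence() { events.add("start"); }
    public void setName(String name) { events.add("name:" + name); }
    public void addSymbols(String symbols) { events.add("symbols:" + symbols); }
    public void endSequence() { events.add("end"); }
}
```

A parser would be handed new UpperCaseFilter(realListener) in place of
realListener. Filters can be stacked this way, which is where the
extra indirection (and the extra difficulty in following the code)
comes from.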
As for the URI property: this should contain a URI which
identifies the sequence, e.g.:
file:///home/thomas/new-seq.seq
http://www.genome.org/exciting-clone.fa
urn:sequence/embl:AL121903
If there's no sensible way to generate a URI for a sequence,
I'd suggest just passing in the sequence name for this property.
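One way to pick the value, sketched with a hypothetical helper class
(SequenceUris and its methods are not part of BioJava). It assumes
you either know the source file or only have a sequence name. Note
that java.io.File.toURI() renders local paths as file:/..., not
file:///..., which is an equivalent form.

```java
import java.io.File;

// Hypothetical helper for choosing the URI property of a sequence.
class SequenceUris {
    // Use the file's URI when the source file is known,
    // otherwise fall back to the plain sequence name.
    static String uriFor(File source, String sequenceName) {
        if (source != null) {
            return source.toURI().toString();
        }
        return sequenceName;
    }

    // Builds a URN shaped like the urn:sequence/embl:AL121903 example.
    static String urnFor(String database, String accession) {
        return "urn:sequence/" + database + ":" + accession;
    }
}
```

The resulting string is what would be passed on as the URI of the
sequence being parsed.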
Hope this helps,
Thomas.