[Biojava-l] Re: DNA Strider Format and SeqIOListener problem
Marc Colosimo
MEColosimo@alumni.carnegiemellon.edu
Tue, 12 Mar 2002 16:42:05 -0500
Thomas Down wrote:
> On Mon, Mar 11, 2002 at 11:17:28AM -0500, Marc Colosimo wrote:
> > Hi,
> >
> > I've been working on porting a format reader for DNA Strider files (a
> > very popular Mac program). About 50 to 80 percent of it has been coded
> > (20% documented). I have limited time to work on this and each
> > time I pick it up I get stuck with implementing the SeqIOListener
> > interface. There is very little documentation on how this works. I
> > understand that I need to chain them together when calling StreamReader:
>
> Hi...
>
> To write a basic file-parser, you shouldn't have to write
> any implementations of SeqIOListener (or SequenceBuilder).
> The basic pattern for BioJava sequence parsing is just:
>
> Raw data ---> |SequenceFormat| ---> events ---> |SeqIOListener|
>
> To support a new format, you just need to write a SequenceFormat
> implementation, which parses raw data and passes information on
> by calling methods on the SeqIOListener interfaces.
>
> If you want to instantiate normal, in-memory, objects,
> then you should be using SimpleSequenceBuilder as the listener
> at the end of this chain.
This answered one of my main questions, but not the other. After looking over
the docs I found a partial answer to the second one (where is there a list of
defined Features?).
The Docs for Feature say:
"We may need some standardisation (sp error) for what the fields mean. In
particular, we should be compliant where sensible with GFF. "
I guess this has not been done.
So in my case I should just do the following?
public final static String PROPERTY_DESCRIPTIONLINE = "description_line";
[snip]
siol.addSequenceProperty(PROPERTY_DESCRIPTIONLINE, description);
where siol is a SequenceBuilder obtained from SimpleSequenceBuilder.FACTORY
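To make sure I have the chain right, here is a minimal sketch of how I picture it. It does not use the BioJava classes: MiniListener, RecordingListener, ToyFormat and addResidues are hypothetical stand-ins that I made up for illustration (the real SeqIOListener has more methods and takes Symbols rather than Strings), and the text layout being parsed is invented, not the actual DNA Strider format.

```java
import java.io.BufferedReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, cut-down stand-in for a listener interface (NOT BioJava's real SeqIOListener).
interface MiniListener {
    void startSequence();
    void setName(String name);
    void addSequenceProperty(Object key, Object value);
    void addResidues(String residues);
    void endSequence();
}

// Plays the role of the builder at the end of the chain: collects events in memory.
class RecordingListener implements MiniListener {
    String name;
    final Map<Object, Object> properties = new LinkedHashMap<>();
    final StringBuilder residues = new StringBuilder();
    boolean finished;

    public void startSequence() { finished = false; }
    public void setName(String name) { this.name = name; }
    public void addSequenceProperty(Object key, Object value) { properties.put(key, value); }
    public void addResidues(String r) { residues.append(r); }
    public void endSequence() { finished = true; }
}

// Plays the role of the format object: parses raw data and emits events.
class ToyFormat {
    static final String PROPERTY_DESCRIPTIONLINE = "description_line";

    // Invented layout (an assumption): a line starting with ';' is the description,
    // every other non-blank line holds residues.
    static void readSequence(BufferedReader in, String name, MiniListener siol) {
        siol.startSequence();
        siol.setName(name);
        in.lines().map(String::trim).filter(s -> !s.isEmpty()).forEach(line -> {
            if (line.startsWith(";")) {
                siol.addSequenceProperty(PROPERTY_DESCRIPTIONLINE, line.substring(1).trim());
            } else {
                siol.addResidues(line);
            }
        });
        siol.endSequence();
    }
}
```

The format object only knows about the listener interface, so whatever sits at the end of the chain decides what gets built from the events.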
>
>
> Obviously, it's possible to write `filters' which implement
> SeqIOListener, receive one stream of events, then pass a
> (slightly modified) event stream on to another listener.
> A lot of the parsers supplied with BioJava actually consist
> of a SequenceFormat which just parses the basic `shape' of
> the file, then a filter which processes the data further.
> This was originally done for the sake of code reuse, but I
> admit it does make the system rather harder to follow.
>
Would adding defined (or standardized) features allow for a simple
SeqIOListener that implements them all?
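For reference, this is roughly how I understand the filter idea, as a sketch only. PropertyEvents, TrimmingFilter and EventLog are hypothetical names of mine, and the interface is cut down to two methods so the example stays short; it is not the real SeqIOListener.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-method stand-in for the listener interface (NOT BioJava's real SeqIOListener).
interface PropertyEvents {
    void setName(String name);
    void addSequenceProperty(Object key, Object value);
}

// Filter: receives events, adjusts some of them, and forwards all of them to the next listener.
class TrimmingFilter implements PropertyEvents {
    private final PropertyEvents delegate;

    TrimmingFilter(PropertyEvents delegate) { this.delegate = delegate; }

    public void setName(String name) {
        delegate.setName(name); // passed through unchanged
    }

    public void addSequenceProperty(Object key, Object value) {
        // Only String values are modified; anything else is forwarded as-is.
        Object cleaned = (value instanceof String) ? ((String) value).trim() : value;
        delegate.addSequenceProperty(key, cleaned);
    }
}

// End of the chain: records what it receives so the effect of the filter is visible.
class EventLog implements PropertyEvents {
    final List<String> events = new ArrayList<>();

    public void setName(String name) { events.add("name=" + name); }
    public void addSequenceProperty(Object key, Object value) { events.add(key + "=" + value); }
}
```

Wrapping the final listener as new TrimmingFilter(new EventLog()) means the parser can stay unaware that any post-processing happens.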
>
> I'd suggest that you don't bother with this pattern unless it
> makes life significantly easier -- just put everything in
> the SequenceFormat object.
>
> As for the URI property... This is to contain a URI which
> identifies the sequence. e.g.:
>
> file:///home/thomas/new-seq.seq
> http://www.genome.org/exciting-clone.fa
> urn:sequence/embl:AL121903
>
> If there's no sensible way to generate a URI for a sequence,
> I'd suggest just passing in the sequence name for this property.
>
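A small sketch of how I might generate that value, assuming the reader knows the source file (if any) and the sequence name. SequenceUris, uriFor and urnFor are hypothetical helpers of my own, not part of BioJava; the urn layout simply copies the example above.

```java
import java.io.File;

// Hypothetical helper (not part of BioJava) for choosing a value for the URI property.
class SequenceUris {
    // Returns a file: URI when a source file is known, otherwise falls back to the bare name.
    static String uriFor(File source, String sequenceName) {
        if (source != null) {
            return source.toURI().toString();
        }
        return sequenceName;
    }

    // Builds a urn in the same shape as the example above, e.g. urn:sequence/embl:AL121903.
    static String urnFor(String database, String accession) {
        return "urn:sequence/" + database + ":" + accession;
    }
}
```

Note that File.toURI() prints the single-slash form (file:/home/...) rather than file:///home/..., which refers to the same file.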
> Hope this helps,
>
> Thomas.
Yes, it does.
Thanks,
Marc