[Biojava-l] Re: DNA Strider Format and SeqIOListener problem

Matthew Pocock matthew_pocock@yahoo.co.uk
Wed, 13 Mar 2002 13:03:15 +0000


Hi Marc,

All the projects seem to be moving slowly towards using controlled
vocabularies for some things. I guess as this matures, some of the
property names will match vocabulary terms. The problem is that some
information is stored in interfaces and classes, while other information
is stored in meta-data (e.g. the type property of a feature), and
sometimes the two kinds of data are correlated. Please come up with
standards for any properties you want to use, and we will see if they
can be pushed through the whole project. In particular, I like the idea
of a well-known description property.

Matthew

ps Thomas, what is the state of play with the BioSQL ontology stuff?

Marc Colosimo wrote:
> 
> Thomas Down wrote:
> 
> 
>>On Mon, Mar 11, 2002 at 11:17:28AM -0500, Marc Colosimo wrote:
>>
>>>Hi,
>>>
>>>I've been working on porting a format reader for DNA Strider files (a
>>>very popular Mac program). About 50 to 80 percent of it has been coded
>>>(20% documented). I have limited time to work on this, and each
>>>time I pick it up I get stuck implementing the SeqIOListener
>>>interface. There is very little documentation on how this works. I
>>>understand that I need to chain them together when calling StreamReader:
>>
>>Hi...
>>
>>To write a basic file-parser, you shouldn't have to write
>>any implementations of SeqIOListener (or SequenceBuilder).
>>The basic pattern for BioJava sequence parsing is just:
>>
>>   Raw data ---> |SequenceFormat| ---> events ---> |SeqIOListener|
>>
>>To support a new format, you just need to write a SequenceFormat
>>implementation, which parses raw data and passes information on
>>by calling methods on the SeqIOListener interfaces.
>>
>>If you want to instantiate normal, in-memory, objects,
>>then you should be using SimpleSequenceBuilder as the listener
>>at the end of this chain.
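To make the chain above concrete, here is a minimal, self-contained Java sketch of the event flow. It deliberately does not use BioJava: MiniSeqIOListener, ToyFormat and MiniSequenceBuilder are hypothetical, cut-down stand-ins for SeqIOListener, a SequenceFormat implementation and SimpleSequenceBuilder, and the ">name then residue lines" input layout is a toy assumption, not the DNA Strider format. It only shows the shape: the format parses text and calls listener methods, and the builder at the end of the chain turns those calls into in-memory state.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of: raw data -> SequenceFormat -> events -> SeqIOListener.
// All types here are hypothetical, simplified stand-ins, NOT the BioJava API.
final class PipelineSketch {

    /** Cut-down stand-in for a SeqIOListener: receives parse events. */
    interface MiniSeqIOListener {
        void startSequence();
        void setName(String name);
        void setURI(String uri);
        void addResidues(String residues);
        void addSequenceProperty(Object key, Object value);
        void endSequence();
    }

    /** Stand-in for a SequenceFormat: parses raw text and fires events. */
    static final class ToyFormat {
        // Toy layout (an assumption, not DNA Strider): a ">name" line, then residue
        // lines; a blank line or end of input ends the record.
        boolean readSequence(BufferedReader in, MiniSeqIOListener listener) throws IOException {
            String header = in.readLine();
            if (header == null || !header.startsWith(">")) {
                return false; // nothing (more) to parse
            }
            String name = header.substring(1).trim();
            listener.startSequence();
            listener.setName(name);
            listener.setURI(name); // no better URI available, so reuse the name
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                listener.addResidues(line.trim());
            }
            listener.endSequence();
            return true;
        }
    }

    /** Stand-in for SimpleSequenceBuilder: collects events into in-memory state. */
    static final class MiniSequenceBuilder implements MiniSeqIOListener {
        private String name;
        private String uri;
        private final StringBuilder residues = new StringBuilder();
        private final Map<Object, Object> properties = new LinkedHashMap<>();
        private boolean finished;

        public void startSequence() { residues.setLength(0); properties.clear(); finished = false; }
        public void setName(String name) { this.name = name; }
        public void setURI(String uri) { this.uri = uri; }
        public void addResidues(String r) { residues.append(r); }
        public void addSequenceProperty(Object key, Object value) { properties.put(key, value); }
        public void endSequence() { finished = true; }

        String getName() { return name; }
        String getURI() { return uri; }
        String getResidues() { return residues.toString(); }
        Map<Object, Object> getProperties() { return properties; }
        boolean isFinished() { return finished; }
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader(">clone1\nACGT\nTTGA\n"));
        MiniSequenceBuilder builder = new MiniSequenceBuilder();
        new ToyFormat().readSequence(in, builder);
        System.out.println(builder.getName() + " " + builder.getResidues()); // clone1 ACGTTTGA
    }
}
```

The real SeqIOListener interface has more methods and different signatures, so treat this only as an illustration of who calls whom.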
> 
> 
> This answered one of my main questions, but not the other. After looking over
> the docs I found a partial answer to that one (where is there a list of defined
> Features?).
> 
> The Docs for Feature say:
> 
> "We may need some standardisation (sp error) for what the fields mean. In
> particular, we should be compliant where sensible with GFF. "
> 
> I guess this has not been done.
> 
> So in my case I should just do the following.
> 
> public final static String PROPERTY_DESCRIPTIONLINE = "description_line";
> 
> [snip]
> 
> siol.addSequenceProperty(PROPERTY_DESCRIPTIONLINE, description);
> 
> where siol is the listener handed to the format, e.g. a builder obtained
> from SimpleSequenceBuilder.FACTORY.makeSequenceBuilder()
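A sketch of the property idea under stated assumptions: the format defines a public key constant and calls addSequenceProperty on whichever listener it was handed, so downstream code can look the value up through the same constant. In BioJava that listener would presumably be a builder made by the factory rather than the factory object itself. Here the one-method MiniSeqIOListener, RecordingListener and emitDescription are hypothetical stand-ins so the example runs on its own, and "description_line" is simply the key proposed above, not an agreed standard.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: publishing a description line under a shared, well-known property key.
// MiniSeqIOListener and RecordingListener are hypothetical stand-ins, not BioJava types.
final class DescriptionPropertySketch {

    /** Hypothetical key constant; a project-wide standard name would replace it. */
    public static final String PROPERTY_DESCRIPTIONLINE = "description_line";

    interface MiniSeqIOListener {
        void addSequenceProperty(Object key, Object value);
    }

    /** Records every property event so the calls can be inspected. */
    static final class RecordingListener implements MiniSeqIOListener {
        final List<String> events = new ArrayList<>();
        public void addSequenceProperty(Object key, Object value) {
            events.add(key + "=" + value);
        }
    }

    /** The format emits the property on whatever listener it was handed. */
    static void emitDescription(MiniSeqIOListener siol, String description) {
        if (description != null && !description.isEmpty()) {
            siol.addSequenceProperty(PROPERTY_DESCRIPTIONLINE, description);
        }
    }

    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        emitDescription(listener, "pUC19 cloning vector");
        emitDescription(listener, ""); // empty descriptions are skipped
        System.out.println(listener.events); // [description_line=pUC19 cloning vector]
    }
}
```

Keeping the key in one public constant means a later switch to a standardised vocabulary term is a one-line change.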
> 
> 
>>
>>Obviously, it's possible to write `filters' which implement
>>SeqIOListener, receive one stream of events, then pass a
>>(slightly modified) event stream on to another listener.
>>A lot of the parsers supplied with BioJava actually consist
>>of a SequenceFormat which just parses the basic `shape' of
>>the file, then a filter which processes the data further.
>>This was originally done for the sake of code reuse, but I
>>admit it does make the system rather harder to follow.
>>
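The filter pattern described above can be sketched as a listener that wraps the next listener in the chain, overrides the events it wants to change, and forwards the rest untouched. All types below are hypothetical stand-ins, not BioJava classes, and the particular changes made by NormalisingFilter (upper-casing residues, dropping blank property values) are invented purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch of the "filter" pattern: receive events, tweak some, forward all to the next listener.
// MiniSeqIOListener and the classes below are hypothetical stand-ins, not BioJava types.
final class FilterSketch {

    interface MiniSeqIOListener {
        void startSequence();
        void addResidues(String residues);
        void addSequenceProperty(Object key, Object value);
        void endSequence();
    }

    /** Forwards every event unchanged; subclasses override only what they alter. */
    static class ForwardingListener implements MiniSeqIOListener {
        protected final MiniSeqIOListener next;
        ForwardingListener(MiniSeqIOListener next) { this.next = next; }
        public void startSequence() { next.startSequence(); }
        public void addResidues(String residues) { next.addResidues(residues); }
        public void addSequenceProperty(Object key, Object value) { next.addSequenceProperty(key, value); }
        public void endSequence() { next.endSequence(); }
    }

    /** Example filter: upper-cases residues and drops blank property values. */
    static final class NormalisingFilter extends ForwardingListener {
        NormalisingFilter(MiniSeqIOListener next) { super(next); }
        @Override public void addResidues(String residues) {
            next.addResidues(residues.toUpperCase(Locale.ROOT));
        }
        @Override public void addSequenceProperty(Object key, Object value) {
            if (value != null && !value.toString().trim().isEmpty()) {
                next.addSequenceProperty(key, value);
            }
        }
    }

    /** Terminal listener that logs whatever reaches it. */
    static final class LoggingListener implements MiniSeqIOListener {
        final List<String> log = new ArrayList<>();
        public void startSequence() { log.add("start"); }
        public void addResidues(String residues) { log.add("residues:" + residues); }
        public void addSequenceProperty(Object key, Object value) { log.add(key + "=" + value); }
        public void endSequence() { log.add("end"); }
    }

    public static void main(String[] args) {
        LoggingListener sink = new LoggingListener();
        MiniSeqIOListener chain = new NormalisingFilter(sink);
        chain.startSequence();
        chain.addResidues("acgt");
        chain.addSequenceProperty("comment", "  ");
        chain.addSequenceProperty("organism", "E. coli");
        chain.endSequence();
        System.out.println(sink.log); // [start, residues:ACGT, organism=E. coli, end]
    }
}
```

A forwarding base class keeps each filter small, since it overrides only what it changes; the cost, as noted above, is that the behaviour of the whole chain is spread across several classes.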
> 
> 
> Would adding defined (or standardized) features allow for a simple
> SeqIOListener that implements them all?
> 
> 
>>I'd suggest that you don't bother with this pattern unless it
>>makes life significantly easier -- just put everything in
>>the SequenceFormat object.
>>
>>As for the URI property...  This is to contain a URI which
>>identifies the sequence.  e.g.:
>>
>>   file:///home/thomas/new-seq.seq
>>   http://www.genome.org/exciting-clone.fa
>>   urn:sequence/embl:AL121903
>>
>>If there's no sensible way to generate a URI for a sequence,
>>I'd suggest just passing in the sequence name for this property.
>>
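A minimal sketch of that rule for choosing the URI property value: build a file: URI when the sequence was read from a known local path, otherwise fall back to the sequence name. chooseUri is a hypothetical helper, not part of BioJava; it relies only on the standard java.nio.file.Path.toUri(), whose exact output depends on the platform (on Windows, for example, the path gains a drive letter).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: choose a value for the URI property, falling back to the sequence name.
// chooseUri is a hypothetical helper, not part of BioJava.
final class UriSketch {

    /**
     * Returns a file: URI when the sequence came from a known local file,
     * otherwise the bare sequence name (the fallback suggested above).
     */
    static String chooseUri(String sourcePath, String sequenceName) {
        if (sourcePath != null && !sourcePath.isEmpty()) {
            Path p = Paths.get(sourcePath).toAbsolutePath().normalize();
            return p.toUri().toString();
        }
        return sequenceName;
    }

    public static void main(String[] args) {
        System.out.println(chooseUri("/home/thomas/new-seq.seq", "new-seq"));
        System.out.println(chooseUri(null, "exciting-clone")); // exciting-clone
    }
}
```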
>>Hope this helps,
>>
>>    Thomas.
> 
> 
> Yes, it does.
> Thanks,
> Marc
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>