[Bioperl-l] sequence filtering

Hilmar Lapp hlapp@gnf.org
Tue, 8 Oct 2002 14:59:52 -0700


I'm trying to pull the daily full RefSeq cumulative update through bioperl. Before even getting my hands dirty, I realized that this can't work because there are full chromosomes in there, and their sequences will choke perl. OTOH, I'm not interested in those anyway, and ideally I could just skip over any sequence that has some property matching some pattern.

As always, there is more than one way to make this work, and I'm wondering what could be the (subjectively :) 'best' way in the absence of event-based parsing. Some options that crossed my mind:

a) pass an optional additional parameter to next_seq() which is a closure returning TRUE if the entry is to be parsed and returned and FALSE otherwise. For this option the questions would be, when to call this function (every line, every 'item', before feature table, before sequence, any combination of those?), and what to pass to the closure as argument (a hash map with properties? an instantiated Bio::SeqI object? the current line? the current slot that was parsed and its value? something else?).

b) create a SeqFilterI interface and pass an object implementing it. This is really just a more OO form of a), and the same kinds of questions need to be answered.

c) send events to an event listener, and skip over the sequence if any of the listeners returns FALSE (i.e., join by AND). This is again very similar to a), but more flexible and also more heavy-weight (more method calls). Again, similar kinds of questions would need to be answered in order to define SeqParseEventI or a similar interface.
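
To make option a) a bit more concrete, here is a rough sketch in plain Perl. It is not Bio::SeqIO code: ToySeqReader, its toy LOCUS/ORIGIN/'//' record layout, and the 'id' and 'length' property names are all made up for illustration. It picks just one possible answer to the questions above, namely calling the closure once per entry, after the header line is parsed and before any sequence is read, and passing it a hash ref of properties.

```perl
use strict;
use warnings;

# Hypothetical toy reader, NOT the Bio::SeqIO API. Records look like:
#   LOCUS <id> <length>
#   ORIGIN
#   <sequence lines>
#   //
package ToySeqReader;

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh }, $class;
}

# next_seq($filter): $filter is an optional closure. It receives a hash ref
# of the header properties parsed so far and returns true to keep the entry.
sub next_seq {
    my ($self, $filter) = @_;
    my $fh = $self->{fh};
    while (defined(my $line = <$fh>)) {
        next unless $line =~ /^LOCUS\s+(\S+)\s+(\d+)/;
        my %props = (id => $1, length => $2);
        # Decide before touching the sequence; no filter means keep all.
        my $keep = !$filter || $filter->(\%props);
        my $seq = '';
        while (defined($line = <$fh>)) {
            last if $line =~ m{^//};
            # Rejected entries are read past without accumulating anything.
            next if !$keep || $line =~ /^ORIGIN/;
            $line =~ s/\s+//g;
            $seq .= $line;
        }
        return { %props, seq => $seq } if $keep;
    }
    return;    # end of input
}

package main;

my $data = <<'END';
LOCUS NM_000001 12
ORIGIN
acgtac
gtacgt
//
LOCUS NC_000001 245000000
ORIGIN
nnnnnnnnnn
//
LOCUS NM_000002 4
ORIGIN
ttga
//
END

open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $reader = ToySeqReader->new($fh);

# Skip whole chromosomes, assuming their accessions start with NC_.
my $skip_chromosomes = sub { $_[0]{id} !~ /^NC_/ };

my @ids;
while (my $entry = $reader->next_seq($skip_chromosomes)) {
    push @ids, $entry->{id};
    print "$entry->{id}\t", length($entry->{seq}), "\n";
}
```

In this sketch the NC_ entry is never assembled in memory, which is the point of filtering before the sequence block. The price is that the closure only sees what has been parsed up to the call, so a filter on, say, a feature would need a later call point.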

I'd be glad to hear anyone's thoughts on this. Also, I'm sure there are better ways. If you know one, I'd be glad to learn.

My preference is for simplicity, and so far I don't think a) is that bad, although it does lack some flexibility.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------