[Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split

Jason Stajich jason@chg.mc.duke.edu
Thu, 25 Jan 2001 10:06:17 -0500 (EST)

On Wed, 24 Jan 2001, Hilmar Lapp wrote:

> Jason Stajich wrote:
> > 
> > I'd just like to reiterate - beware bioperl-live is development code.
> > 
> > I added these handlers for Fuzzy and Split features.  I decided to create
> > methods start_fuzzy,end_fuzzy for Bio::Location::Fuzzy to handle whether
> > or not we saw the <, > descriptors.  I probably need some more test cases
> I may have missed the obvious solution, but how are we going to
> distinguish 'unknown start/end' and 'somewhere in between'? That is,
> '<150' meaning 'before position 150', making it non-obvious how to
> return a minimal start, and '120.130' meaning it's between two known
> positions. Will I have to test fuzzy_start() before I'm allowed to
> safely call min_start()? (no, I don't want to suggest exceptions ...
> :O)

Hmm, perhaps I was confused.  I thought Split Location would deal with
min_start/max_end.  I believe a fuzzy location can have 3 qualities: a fuzzy
start (<150..100), a fuzzy end (90..<100), and a fuzzy 'range' (1.12) [for
lack of a better word, suggestions welcome].  All 3 can be present in the
same location, so they have to be independent properties.  When you call
start, it will return what it thinks is the start, but you'll have to
test whether the range or the start is fuzzy ($loc->range_fuzzy ||
$loc->start_fuzzy).  Perhaps that is too tedious?  I'd rather not throw an
exception here, but can be persuaded.

Feel free to suggest a better set of methods for this.  

Now I'm cheating because I just added range_fuzzy this morning since I
wanted to think about that some more.  Learning by doing....
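
To make the calling convention concrete, here is a rough sketch of the
test-before-trust pattern described above.  The package name
Sketch::FuzzyLocation, its constructor arguments, and the describe_start
helper are all made up for illustration; this is not the actual
Bio::Location::Fuzzy code, only the three independent flags and the
($loc->range_fuzzy || $loc->start_fuzzy) test come from the discussion.

```perl
use strict;
use warnings;

# Hypothetical stand-in class for illustration only; this is NOT the
# real Bio::Location::Fuzzy interface.
package Sketch::FuzzyLocation;

sub new {
    my ($class, %args) = @_;
    my $self = {
        start       => $args{start},
        end         => $args{end},
        # The three flags are stored independently, so any
        # combination of them can be set on one location.
        start_fuzzy => $args{start_fuzzy} ? 1 : 0,
        end_fuzzy   => $args{end_fuzzy}   ? 1 : 0,
        range_fuzzy => $args{range_fuzzy} ? 1 : 0,
    };
    return bless $self, $class;
}

sub start       { return $_[0]->{start} }
sub end         { return $_[0]->{end} }
sub start_fuzzy { return $_[0]->{start_fuzzy} }
sub end_fuzzy   { return $_[0]->{end_fuzzy} }
sub range_fuzzy { return $_[0]->{range_fuzzy} }

package main;

# Caller-side check: start() always returns a number, and the caller
# decides how far to trust it by testing the flags.
sub describe_start {
    my ($loc) = @_;
    my $qualifier = ($loc->range_fuzzy || $loc->start_fuzzy)
        ? 'approximately'
        : 'exactly';
    return "$qualifier " . $loc->start;
}

my $exact = Sketch::FuzzyLocation->new(start => 90, end => 100);
my $fuzzy = Sketch::FuzzyLocation->new(
    start => 150, end => 200, start_fuzzy => 1,
);

print describe_start($exact), "\n";
print describe_start($fuzzy), "\n";
```

The point of the sketch is only that start never dies: the cost is that
every caller who cares about exactness has to remember the flag test.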

Oh, and I think I just messed up: I'm not handling the 5' and 3' sides
differently for the fuzziness (< vs >).  I will fix that by having
start_fuzzy/end_fuzzy return -1, 0, or 1, meaning fuzzy on the 5' side, not
fuzzy, or fuzzy on the 3' side.  Unless you think they should return
"<100" or "100>" instead?

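As a sketch of the -1/0/1 idea, the helper below splits a single position
token into its number and a fuzziness code.  The name parse_position and the
set of accepted spellings are assumptions for illustration, not what is in
bioperl-live; the trailing form "100>" is accepted only because it is the
spelling used above.

```perl
use strict;
use warnings;

# Hypothetical helper: split one position token into its number and a
# fuzziness code (-1 = fuzzy on the 5' side, 0 = exact, 1 = fuzzy on
# the 3' side).
sub parse_position {
    my ($token) = @_;
    return ($1, -1) if $token =~ /^<(\d+)$/;
    return ($1,  1) if $token =~ /^>(\d+)$/;
    return ($1,  1) if $token =~ /^(\d+)>$/;   # trailing form, as in "100>"
    return ($1,  0) if $token =~ /^(\d+)$/;
    die "unrecognized position token '$token'\n";
}

for my $token ('<100', '100', '>100', '100>') {
    my ($pos, $fuzzy) = parse_position($token);
    printf "%-5s position=%d fuzzy=%d\n", $token, $pos, $fuzzy;
}
```

Returning a code rather than the original string keeps the position numeric,
and the "<100" text can still be rebuilt from the pair when writing out.
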
> > to make sure we are really getting everything to work, but the test in
> > t/SeqIO  test.genbank in genbank.out seem to work for most things except
> > the  variation feature type which uses the operator 'replace'.  We'll have
> > to define that in the FTHelper model, I didn't plan for it.
> > 
> I'm not sure the 'replace' operator is still standard (i.e., allowed).
> I seem to recall that it is no longer among the allowed operators, so
> you might wish to double-check on NCBI's feature table grammar
> definition.

Okay, well, we'll have to think about whether we want to handle
non-standard operators in a generic way, e.g. a 'NonStandardLocation' that
stores a tag describing the operator so that we can preserve the operator
name, or whether we should build in the flexibility some other way.  Clearly
what is being output right now

variation       2913^2913

is quite different from

variation       replace(347,"c")
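
One way the generic approach could look, purely as a sketch: keep the
operator name and its raw argument text so the original string can be
written back out unchanged.  The package Sketch::NonStandardLocation, the
parse_nonstandard helper, and the short list of operators treated as
standard (join, order, complement) are assumptions for illustration, not
existing bioperl code.

```perl
use strict;
use warnings;

# Hypothetical container that preserves an operator we do not model.
package Sketch::NonStandardLocation;

sub new {
    my ($class, %args) = @_;
    my $self = {
        operator  => $args{operator},
        arguments => $args{arguments},
    };
    return bless $self, $class;
}

sub operator  { return $_[0]->{operator} }
sub arguments { return $_[0]->{arguments} }

# Rebuild the location text exactly as it was read.
sub to_string {
    my ($self) = @_;
    return sprintf '%s(%s)', $self->operator, $self->arguments;
}

package main;

# Operators assumed to be handled elsewhere by dedicated classes.
my %STANDARD = map { $_ => 1 } qw(join order complement);

# Returns a NonStandardLocation for "op(...)" text with an unknown
# operator, or undef when the text should go to the normal parser.
sub parse_nonstandard {
    my ($text) = @_;
    return undef unless $text =~ /^([A-Za-z_]+)\((.*)\)$/;
    my ($op, $args) = ($1, $2);
    return undef if $STANDARD{$op};
    return Sketch::NonStandardLocation->new(
        operator  => $op,
        arguments => $args,
    );
}

my $loc = parse_nonstandard('replace(347,"c")');
print $loc->operator, "\n";
print $loc->to_string, "\n";
```

With something like this the variation feature would round-trip as text
even though nothing downstream understands what 'replace' means.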

> 	Hilmar
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp@gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------

Jason Stajich
Center for Human Genetics
Duke University Medical Center