[Biocorba-l] SeqFeatureLocation

Alan Robinson alan@ebi.ac.uk
Thu, 8 Feb 2001 17:15:12 +0000 (GMT Standard Time)


Ah... Did you mean that the location would just be (5.10) or (1^3)?

E.g. 

FT  XYZ                  (5.10)
                         /db_xref="SWISS-PROT:P12345"


In that case, the locations are valid:

1^3 would be a valid location and "points to a site between two adjacent
bases anywhere between bases 1 and 3".

The location 5.10 is also valid and "indicates that the exact location is
unknown but that it is one of the bases between bases 5 and 10,
inclusive".
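
To make the two readings concrete, here is a minimal sketch in Python
(not part of BioCorba or the IDL). The names WholeFuzzyLocation,
parse_whole_fuzzy and candidate_sites are hypothetical, and so is the
'BETWEEN' code; only 'WITHIN' appears elsewhere in this thread. It
assumes a location made of a single a.b or a^b term, with or without the
surrounding parentheses.

```python
import re
from dataclasses import dataclass

# One "a.b" or "a^b" term; "a..b" (a plain span) deliberately does not match.
_FORM = re.compile(r"(\d+)([.^])(\d+)")


@dataclass(frozen=True)
class WholeFuzzyLocation:
    """Hypothetical holder for a location whose exact site is unknown."""
    low: int
    high: int
    kind: str  # 'WITHIN' for a.b, 'BETWEEN' (assumed name) for a^b


def parse_whole_fuzzy(text):
    """Parse '(5.10)', '5.10', '(1^3)' or '1^3' into a WholeFuzzyLocation."""
    body = text.strip()
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    match = _FORM.fullmatch(body)
    if match is None:
        raise ValueError(f"not a single a.b or a^b location: {text!r}")
    low, high = int(match.group(1)), int(match.group(3))
    if low >= high:
        raise ValueError(f"lower bound must be below upper bound: {text!r}")
    kind = "WITHIN" if match.group(2) == "." else "BETWEEN"
    return WholeFuzzyLocation(low, high, kind)


def candidate_sites(loc):
    """List every site the location could refer to.

    a.b -> each single base from a to b, inclusive.
    a^b -> each pair of adjacent bases (i, i + 1) lying within a..b.
    """
    if loc.kind == "WITHIN":
        return list(range(loc.low, loc.high + 1))
    return [(i, i + 1) for i in range(loc.low, loc.high)]
```

Under these assumptions, 5.10 expands to six candidate bases and 1^3 to
two candidate sites, between bases 1 and 2 or between bases 2 and 3.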


For BioCorba, I'm working on the 95% rule (i.e. it should be able to
handle 95% of all cases - of which the above, I would say, aren't
typical).


What does this mean in terms of the return type without changing the IDL?

Immediate choices are:

1) Return a 'null' value for the end 'SeqFeaturePosition'.

2) Throw an "UnableToProcess" exception and return the location as a
string. 
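
As a rough illustration of the two choices, here is a sketch in Python
rather than IDL. The struct and field names follow the ones used in this
thread, but their definitions here are guesses and not the actual
BioCorba IDL. In particular, it assumes 'extension' holds the upper
bound, which is only one reading of the (1.2) example and is not settled
here; the functions option_null_end and option_raise are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeqFeaturePosition:
    position: int
    extension: int  # assumption: the upper bound of the fuzzy range
    fuzzy: str      # e.g. 'EXACT', 'WITHIN', 'BEFORE', 'AFTER'


@dataclass
class SeqFeatureLocation:
    start: SeqFeaturePosition
    end: Optional[SeqFeaturePosition]  # None stands in for the 'null' end


class UnableToProcess(Exception):
    """Carries the unparsed location so the client still gets it as a string."""

    def __init__(self, location_string):
        super().__init__(f"cannot represent location {location_string!r}")
        self.location_string = location_string


def option_null_end(low, high, fuzzy):
    """Choice 1: put the whole fuzzy range in the start, leave the end null."""
    return SeqFeatureLocation(SeqFeaturePosition(low, high, fuzzy), None)


def option_raise(location_string):
    """Choice 2: refuse to model the location and hand back the raw string."""
    raise UnableToProcess(location_string)
```

With choice 1 every client has to check for a missing end position; with
choice 2 only clients that care about the rare cases need to catch the
exception and read its location_string.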


--
============================================================
Alan J. Robinson, D.Phil.             Tel:+44-(0)1223 494444
European Bioinformatics Institute     Fax:+44-(0)1223 494468
EMBL Outstation - Hinxton             Email:  alan@ebi.ac.uk
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, UK                http://industry.ebi.ac.uk/~alan/
============================================================

On Wed, 7 Feb 2001, Jason Stajich wrote:

> Okay, so I've been told that 
> <5..100 and 5<..100 mean the same thing.  I feel better about that.  
> 
> But I'm still not clear how the location model will handle
> (5.10) or (1^3) as locations.   Are they really valid locations?  We can
> fudge it by making the start position be the value (since it can be
> represented that way) and make it so there is no ending position.  Sort of
> circumvents the model though.   
> 
> So we don't have a part for a fuzzy location only fuzzy endpoints.  What
> if the whole location is fuzzy, ie it is within 5.10 but we're not sure
> where it starts or ends.  To make this work we'd need to add a fuzzy field
> to the SeqFeatureLocation struct.  
> 
> Thanks.
> 
> On Tue, 6 Feb 2001, Jason Stajich wrote:
> 
> > Alan - I've finally gotten a chance to think about the location model some
> > more.. 
> > 
> > I like that the SeqFeatureLocation has 2 SeqFeaturePositions for start and end, and
> > SeqFeaturePosition can handle all the codes.  But I want to be sure that
> > the following cases can really be handled by this.  Correct me if any of
> > these are wrong.  I think we might need another field to represent whether
> > or not the fuzzy code that is BEFORE or AFTER is also on the 3' or 5'
> > strand... Or we change the code to be BEFORE-3' BEFORE-5', AFTER-3',
> > AFTER-5'.  Yuck, I know... 
> > 
> > 1..100 --  Location has 2 position objects for start and end fuzzy code is
> >            'EXACT', extension = 0
> > (1.2)..30 -- start is a position with fuzzy code 'WITHIN', position = 1,
> >              extension = 2 
> > 1^3       -- is this a legal location?  If it is, how do we represent it?
> > <20..40   -- (feature starts before bp 20 on 5' strand), position=20
> >               extension=0, fuzzy='BEFORE'
> > >20..100  -- (feature starts after bp 20 on 5' strand) fuzzy='AFTER'
> > 20<..100  -- (feature starts before bp 20 on 3' strand) fuzzy='BEFORE'
> > 20>..100  -- (feature starts after bp 20 on 3' strand)  fuzzy='AFTER'
> > 
> > see 
> > http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#FeaturesB
> > for explanation
> > 
> > "The location of each feature is provided as well, and can be a single
> > base, a contiguous span of bases, a joining of sequence spans, and
> > other representations.  If a feature is located on the complementary
> > strand, the word "complement" will appear before the base span. If the
> > "<" symbol precedes a base span, the sequence is partial on the 5' end
> > (e.g., CDS <1..206).  If the ">" symbol follows a base span, the
> > sequence is partial on the 3' end (e.g., CDS 435..915>)."
> > 
> > 
> > Jason Stajich
> > jason@chg.mc.duke.edu
> > Center for Human Genetics
> > Duke University Medical Center 
> > http://www.chg.duke.edu/ 
> > 
> > 
> > _______________________________________________
> > Biocorba-l mailing list
> > Biocorba-l@biocorba.org
> > http://www.biocorba.org/mailman/listinfo/biocorba-l
> > 