[Biocorba-l] SeqFeatureLocation
Alan Robinson
alan@ebi.ac.uk
Thu, 8 Feb 2001 17:15:12 +0000 (GMT Standard Time)
Ah... Did you mean that the location would just be (5.10) or (1^3)?
E.g.
FT XYZ (5.10)
/db_xref="SWISS-PROT:P12345"
In that case, the locations are valid:
1^3 would be a valid location and "points to a site between two adjacent
bases anywhere between bases 1 and 3".
The location 5.10 is also valid and "indicates that the exact location is
unknown but that it is one of the bases between bases 5 and 10,
inclusive".
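
To make the distinction between the two forms concrete, here is a minimal Python sketch that tells them apart. It is an illustration only, not part of BioCorba or its IDL: the names `WholeFuzzyLocation` and `parse_whole_fuzzy` and the kind labels "BETWEEN" and "WITHIN" are hypothetical, and it handles only the two forms discussed here (with or without the surrounding parentheses).

```python
import re
from typing import NamedTuple


class WholeFuzzyLocation(NamedTuple):
    """Hypothetical container for a location that is fuzzy as a whole."""
    kind: str   # "BETWEEN" for a^b (a site between bases), "WITHIN" for a.b (one base in the range)
    low: int
    high: int


# One or more digits, a single '^' or '.', then one or more digits.
_PATTERN = re.compile(r"(\d+)(\^|\.)(\d+)")


def parse_whole_fuzzy(text: str) -> WholeFuzzyLocation:
    """Parse 'a^b', 'a.b', '(a^b)' or '(a.b)'; anything else raises ValueError."""
    text = text.strip()
    # Drop one pair of enclosing parentheses, e.g. "(5.10)" -> "5.10".
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a whole-fuzzy location: {text!r}")
    low, op, high = int(match.group(1)), match.group(2), int(match.group(3))
    if low > high:
        raise ValueError(f"range is reversed: {text!r}")
    kind = "BETWEEN" if op == "^" else "WITHIN"
    return WholeFuzzyLocation(kind, low, high)
```

A plain span such as 5..10 is deliberately rejected, since it has exact endpoints and fits the existing start/end model.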
For BioCorba, I'm working on the 95% rule (i.e. it should be able to
handle 95% of all cases - of which the above, I would say, aren't
typical).
What does this mean in terms of the return type without changing the IDL?
Immediate choices are:
1) Return a 'null' value for the end 'SeqFeaturePosition'.
2) Throw an "UnableToProcess" exception and return the location as a
string.
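
A rough Python sketch of what the two choices would look like to a caller follows. This is not the BioCorba IDL: the field names (position, extension, fuzzy) are taken from Jason's description quoted below, the helper names `within_as_null_end` and `reject` are hypothetical, and how 'extension' should be computed for a range is left to the caller because it has not been pinned down in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeqFeaturePosition:
    position: int
    extension: int
    fuzzy: str  # one of 'EXACT', 'WITHIN', 'BEFORE', 'AFTER'


@dataclass
class SeqFeatureLocation:
    start: SeqFeaturePosition
    end: Optional[SeqFeaturePosition]  # None stands in for the 'null' of choice 1


class UnableToProcess(Exception):
    """Choice 2: carries the unparsed location back to the caller as a string."""

    def __init__(self, location: str):
        super().__init__(f"cannot model location {location!r}")
        self.location = location


def within_as_null_end(position: int, extension: int) -> SeqFeatureLocation:
    """Choice 1: describe the whole range in the start position, leave end empty."""
    start = SeqFeaturePosition(position=position, extension=extension, fuzzy="WITHIN")
    return SeqFeatureLocation(start=start, end=None)


def reject(location: str) -> None:
    """Choice 2: refuse to model the location and hand back the raw string."""
    raise UnableToProcess(location)
```

With choice 1 every client has to check for a missing end position; with choice 2 clients that care can catch the exception and read its location string, and the rest can ignore such features.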
--
============================================================
Alan J. Robinson, D.Phil. Tel:+44-(0)1223 494444
European Bioinformatics Institute Fax:+44-(0)1223 494468
EMBL Outstation - Hinxton Email: alan@ebi.ac.uk
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, UK http://industry.ebi.ac.uk/~alan/
============================================================
On Wed, 7 Feb 2001, Jason Stajich wrote:
> Okay, so I've been told that
> <5..100 and 5<..100 mean the same thing. I feel better about that.
>
> But I'm still not clear how the location model will handle
> (5.10) or (1^3) as locations. Are they really valid locations? We can
> fudge it by making the start position be the value (since it can be
> represented that way) and make it so there is no ending position. Sort of
> circumvents the model though.
>
> So we don't have a part for a fuzzy location, only fuzzy endpoints. What
> if the whole location is fuzzy, i.e. it is within 5.10 but we're not sure
> where it starts or ends? To make this work we'd need to add a fuzzy field
> to the SeqFeatureLocation struct.
>
> Thanks.
>
> On Tue, 6 Feb 2001, Jason Stajich wrote:
>
> > Alan - I've finally gotten a chance to think about the location model some
> > more..
> >
> > I like that SeqFeatureLocation has 2 SeqFeaturePositions for start and end, and
> > SeqFeaturePosition can handle all the codes. But I want to be sure that
> > the following cases can really be handled by this. Correct me if any of
> > these are wrong. I think we might need another field to represent whether
> > or not the fuzzy code that is BEFORE or AFTER is also on the 3' or 5'
> > strand... Or we change the code to be BEFORE-3' BEFORE-5', AFTER-3',
> > AFTER-5'. Yuck, I know...
> >
> > 1..100 -- Location has 2 position objects for start and end; fuzzy code is
> > 'EXACT', extension = 0
> > (1.2)..30 -- start is a position with fuzzy code 'WITHIN', position = 1,
> > extension = 2
> > 1^3 -- is this a legal location? If it is, how do we represent it?
> > <20..40 -- (feature starts before bp 20 on 5' strand), position=20
> > extension=0, fuzzy='BEFORE'
> > >20..100 -- (feature starts after bp 20 on 5' strand) fuzzy='AFTER'
> > 20<..100 -- (feature starts before bp 20 on 3' strand) fuzzy='BEFORE'
> > 20>..100 -- (feature starts after bp 20 on 3' strand) fuzzy='AFTER'
> >
> > see
> > http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#FeaturesB
> > for an explanation
> >
> > "The location of each feature is provided as well, and can be a single
> > base, a contiguous span of bases, a joining of sequence spans, and
> > other representations. If a feature is located on the complementary
> > strand, the word "complement" will appear before the base span. If the
> > "<" symbol precedes a base span, the sequence is partial on the 5' end
> > (e.g., CDS <1..206). If the ">" symbol follows a base span, the
> > sequence is partial on the 3' end (e.g., CDS 435..915>)."
> >
> >
> > Jason Stajich
> > jason@chg.mc.duke.edu
> > Center for Human Genetics
> > Duke University Medical Center
> > http://www.chg.duke.edu/
> >
> >
> > _______________________________________________
> > Biocorba-l mailing list
> > Biocorba-l@biocorba.org
> > http://www.biocorba.org/mailman/listinfo/biocorba-l
> >
>
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/