[Bioperl-l] split location problems

Jason Stajich jason at bioperl.org
Tue Oct 17 02:48:14 UTC 2006


This was probably exposed by the fact that the Split object used to
explicitly sort the features by start*strand, always.  With remote
locations, and the need to set the order explicitly (for features that
are not required to be 5' -> 3'), that code must have been removed.  I
think there is just one place that is missing a 'reverse' on the list
of sub-locations when the top-level feature is a complement.  I'll wait for
your fix before wading in.  We will probably want to figure out a
'consolidate' method to shrink redundant and equivalent representations to
the shortest possible form.  Ugh, this is really starting to resemble
writing a boolean logic toolkit....
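
To make the missing 'reverse' concrete, here is a minimal sketch of the
rewrite on plain location strings.  It is Python, not BioPerl, and it is not
the Split code: the names consolidate, unwrap and split_top_level are
hypothetical, and it assumes well-formed input where every segment of the
join is a simple complement(start..end).

```python
def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def unwrap(text, operator):
    """Return the inside of 'operator(...)', or None if text has another form."""
    prefix = operator + "("
    if text.startswith(prefix) and text.endswith(")"):
        return text[len(prefix):-1]
    return None


def consolidate(location):
    """Rewrite join(complement(a),complement(b)) as complement(join(b,a)).

    Hypothetical helper: any location that is not a join made up entirely
    of complement(...) segments is returned unchanged.
    """
    inner = unwrap(location, "join")
    if inner is None:
        return location
    segments = [unwrap(part, "complement") for part in split_top_level(inner)]
    if any(segment is None for segment in segments):
        return location  # mixed strands: leave the location alone
    # Hoisting complement() outside the join flips the reading direction,
    # so the segment order has to be reversed as well.
    return "complement(join(" + ",".join(reversed(segments)) + "))"


print(consolidate("join(complement(4918..5163),complement(2691..4571))"))
```

Dropping the reversed() call reproduces the symptom in the bug report: the
segments come out in their original order inside the complement.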
-jason

On 10/16/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Oct 16, 2006, at 5:45 PM, Jason Stajich wrote:
>
> > The whole point of split locations is to represent genes with
> > introns so that is not the "rare" case.
> >
> > I'm confused where the problem is.  The locations that I get out
> > with to_FTstring on the location object are exactly the same as
> > those input.
>
> The problem is with a subset of split locations described in the
> bug report.  The following works:
>
> complement(join(2691..4571,4918..5163))
>
> whereas this:
>
> join(complement(4918..5163),complement(2691..4571))
>
> gives this:
>
> complement(join(4918..5163,2691..4571))
>
> which does not describe the same location.  It should be:
>
> complement(join(2691..4571,4918..5163))
>
> since 'join' implies that the order of the segments to be joined is
> important ('order' and 'bond' do not, I guess).
>
> > I have processed the genbank fungal genomes into GFF3 and have had
> > no problems so I'm confused where you are breaking down.  If I
> > write them out as embl I also get the correct thing.  This is using
> > the CVS version of bioperl from the HEAD.
> >
> > I've added code to test this to bug 2101 including a C. glabrata
> > chromosome downloaded from genbank.  Perhaps the problem is on the
> > EMBL parsing side, I didn't test that.
> >
> > On the technical side, I still am not sure I fully know where the
> > strand information should be stored - the top level container or
> > the sub-features.  I'll try and stay up on the discussion if
> > anything has been decided that I should know about.
> >
> > -jason
>
> Split::strand() sets the sublocations as well, which seems to confuse
> the situation more but it is consistent with LocationI, as Hilmar
> points out.  I'm looking into a few solutions now, including a fix in
> Split::to_FTstring().
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


-- 
Jason Stajich
jason at bioperl.org
http://www.duke.edu/~jes12/


