[Bioperl-l] Bio::Location::Split question

Chris Fields cjfields at uiuc.edu
Mon Sep 18 21:55:38 UTC 2006


> I'm not sure what you're suggesting.
> 
> Are you suggesting that the examples are not identical in resulting
> DNA sequence (because in my book they are, because the order of
> segments is reversed in the second example).
> 
> Or are you suggesting that there is a bug in how BioPerl resolves
> split locations?
> 
> Or both?
> 
> 	-hilmar

This is from the GenBank/EMBL/DDBJ feature table definition for Locations:

---------------------------------------
complement(join(2691..4571,4918..5163))
       Joins regions 2691 to 4571 and 4918 to 5163, then
       complements the joined segments (the feature is on the
       strand complementary to the presented strand)

join(complement(4918..5163),complement(2691..4571))
       Complements regions 4918 to 5163 and 2691 to 4571, then
       joins the complemented segments (the feature is on the
       strand complementary to the presented strand)
---------------------------------------

These two forms are equivalent only if the order of the sublocations is
reversed in the second one; otherwise they are not the same:

complement(join(A..B,C..D))
join(complement(C..D),complement(A..B))

Both give the order [dcba].  Normally only the first form is seen, and just
about every time I see the second, the location order is reversed, so the
two are technically the same.  No problem there.
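
To make the equivalence concrete, here is a small standalone sketch in Python
(not BioPerl code).  It assumes a toy evaluator that understands only simple
N..M ranges, join() and complement(); the names extract, revcomp and
split_top_level, and the 10-base test sequence, are made up for illustration:

```python
# Toy evaluator for a small subset of feature-table location strings.
# Assumption: only N..M ranges, join(...) and complement(...) are handled.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def extract(loc, seq):
    """Return the sequence described by loc, joining left to right as written."""
    loc = loc.strip()
    if loc.startswith("complement(") and loc.endswith(")"):
        return revcomp(extract(loc[len("complement("):-1], seq))
    if loc.startswith("join(") and loc.endswith(")"):
        inner = loc[len("join("):-1]
        return "".join(extract(part, seq) for part in split_top_level(inner))
    start, end = (int(x) for x in loc.split(".."))
    return seq[start - 1:end]  # coordinates are 1-based and inclusive


seq = "AACCGGTTAC"  # hypothetical sequence; A..B = 1..4, C..D = 7..10
first = extract("complement(join(1..4,7..10))", seq)
second = extract("join(complement(7..10),complement(1..4))", seq)
unreversed = extract("join(complement(1..4),complement(7..10))", seq)
print(first == second)      # the two forms from the definition
print(first == unreversed)  # same form without reversing the order
```

The first comparison covers the two forms as they appear in the definition;
the second shows what happens when the sublocation order is not reversed.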

However, if I take the two examples above, run them through
FTLocationFactory, and then call to_FTstring() to get the feature string
back, this is what I get:

complement(join(2691..4571,4918..5163))

complement(join(4918..5163,2691..4571))

Which one is correct?  As I read the feature table definition above, 'join'
implies that the order of the sublocations matters: they are joined from
left to right as written, irrespective of their order on the sequence.
Hence the two different variations.  For Bioperl, do we always assume that
the sublocations in a join are in sequence order, or in the order in which
they appear in the original location string?  And how do remote locations
fit in here?  If we want to stick with one form, it seems that simply
reversing the sublocation order when converting the second form should give
the correct order for the join.
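
A rough sketch of that reversal, again as standalone Python rather than
anything in BioPerl.  The function name to_complement_join is hypothetical,
and the pattern deliberately matches only plain N..M ranges, so strings with
remote sublocations (or anything else it does not recognize) are returned
unchanged:

```python
import re

# Assumption: sublocations are plain N..M ranges (no remote accessions,
# no fuzzy < or > bounds).
RANGE = r"\d+\.\.\d+"
JOIN_OF_COMPLEMENTS = re.compile(
    rf"join\(\s*complement\({RANGE}\)(?:\s*,\s*complement\({RANGE}\))*\s*\)"
)


def to_complement_join(loc):
    """Rewrite join(complement(x),complement(y)) as complement(join(y,x)).

    Strings that do not match the simple pattern are returned unchanged.
    """
    loc = loc.strip()
    if not JOIN_OF_COMPLEMENTS.fullmatch(loc):
        return loc
    ranges = re.findall(RANGE, loc)
    # Reverse the sublocation order so the joined sequence stays the same.
    return "complement(join(" + ",".join(reversed(ranges)) + "))"


print(to_complement_join("join(complement(4918..5163),complement(2691..4571))"))
```

With the reversal, the second example from the definition comes out in the
same form as the first instead of with its sublocations in the original
order.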

Even stranger, split locations that include a remote sublocation behave
differently (actually, they behave somewhat correctly).  If I add a faux
remote location to the original strings above, this is what I get from the
location object's to_FTstring():

complement(join(2691..4571,ABC1234.5:4918..5163))

join(complement(ABC1234.5:4918..5163),complement(2691..4571))

That way lies madness....

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign




