[Bioperl-l] Bug #2936

Fields, Christopher J cjfields at illinois.edu
Sat Mar 16 10:02:12 EDT 2013


On Mar 16, 2013, at 7:46 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Fri, Mar 15, 2013 at 9:36 PM, Francisco J. Ossandón wrote:
>> 
>> ... Another thing that
>> is weird to me is that the sublocations are free to have different strands
>> values (like the first being positive strand and the second being negative
>> strand), since I can't think of one example where that can happen in real
>> genomes. In fact one of the tests in PrimarySeq.t is designed exactly to
>> have sublocations in opposite strands at the same time and then extract the
>> sequence, so I wonder if I'm wrong and there are real cases like that...
>> 
> 
> This is a real biological phenomenon - trans-splicing, often in tRNA genes,
> for example:
> http://www.ncbi.nlm.nih.gov/genbank/genomesubmit_annotation#transpliced
> 
> As a result the BioPerl / BioSQL / Biopython etc location models do have
> to cope with this corner case.  Worse, there are examples where pieces
> from different chromosomes are spliced together - which is even harder
> to deal with - like my favourite pathological example, nad1 in NC_016406
> (and NC_016402), which has the following GenBank location string:
> 
> join(complement(149815..150200),
> complement(295492..295573),complement(293787..293978),
> NC_016402.1:6618..6676,181647..181905)
> 
> See also:
> http://blastedbio.blogspot.co.uk/2012/03/missing-external-exons-in-genbank-with.html
> 
> Peter

Exactly.  The problem in the current implementation is that the internal logic in Bio::Location does try to handle cases like this, but it relies on an assumption that holds for some of them and is wrong for others (say, circular chromosomes where a feature wraps the origin, where sorting the sublocations messes things up).  It really needs a top-to-bottom overhaul to be explicit about the join, but such an overhaul would break some expected behavior, hence my suggestion of putting this in a BioPerl v2.
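
To make the sorting problem concrete, here is a minimal sketch in plain Python (not BioPerl code, and not how Bio::Location is actually implemented). The `extract` helper, the `sort_by_start` flag and the 20 bp toy genome are all made up for illustration; the only point is that a join has to be read in the order it is written, and that sorting sublocations by start coordinate gives the wrong sequence for a feature that wraps the origin.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (unambiguous uppercase bases only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def extract(seq, parts, sort_by_start=False):
    """Concatenate the sublocations of a join.

    parts is a list of (start, end, strand) tuples, 1-based and inclusive,
    with strand +1 or -1. sort_by_start=True mimics the (assumed) shortcut
    of ordering sublocations by coordinate instead of trusting the join.
    """
    if sort_by_start:
        parts = sorted(parts, key=lambda p: p[0])
    pieces = []
    for start, end, strand in parts:
        sub = seq[start - 1:end]
        pieces.append(revcomp(sub) if strand == -1 else sub)
    return "".join(pieces)


# Toy 20 bp circular genome: AAA at the start, GGG at the end.
genome = "AAA" + "C" * 14 + "GGG"

# join(18..20,1..3): a feature that wraps the origin.
wrap = [(18, 20, 1), (1, 3, 1)]

print(extract(genome, wrap))                      # GGGAAA (join order kept)
print(extract(genome, wrap, sort_by_start=True))  # AAAGGG (sorted, wrong)
```

Each part carries its own strand, so the same explicit-order approach also covers the mixed-strand joins Peter describes; only remote parts like NC_016402.1:6618..6676 would need a lookup of the other sequence.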

chris

