[Bioperl-l] Bug #2936

Peter Cock p.j.a.cock at googlemail.com
Sat Mar 16 12:46:22 UTC 2013


On Fri, Mar 15, 2013 at 9:36 PM, Francisco J. Ossandón wrote:
>
> ... Another thing that
> is weird to me is that the sublocations are free to have different strand
> values (like the first being positive strand and the second being negative
> strand), since I can't think of one example where that can happen in real
> genomes. In fact one of the tests in PrimarySeq.t is designed exactly to
> have sublocations in opposite strands at the same time and then extract the
> sequence, so I wonder if I'm wrong and there are real cases like that...
>

This is a real biological phenomenon: trans-splicing, often in tRNA genes,
for example:
http://www.ncbi.nlm.nih.gov/genbank/genomesubmit_annotation#transpliced

As a result the BioPerl / BioSQL / Biopython etc location models do have
to cope with this corner case.  Worse, there are examples where pieces
from different chromosomes are spliced together - which is even harder
to deal with - like my favourite pathological example, nad1 in NC_016406
(and NC_016402), which has the following GenBank location string:

join(complement(149815..150200),
complement(295492..295573),complement(293787..293978),
NC_016402.1:6618..6676,181647..181905)
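
To make the two complications concrete (mixed strands and a piece on
another record), here is a minimal sketch in plain Python that breaks the
location string above into its parts. It is only an illustration, not the
BioPerl or Biopython parser: the names parse_join and parse_part are
hypothetical, and it assumes a flat join(...) of simple start..end ranges,
each optionally wrapped in complement(...) and optionally prefixed with
an accession.

```python
import re

# The nad1 location string quoted above, with the line breaks removed.
LOCATION = (
    "join(complement(149815..150200),"
    "complement(295492..295573),complement(293787..293978),"
    "NC_016402.1:6618..6676,181647..181905)"
)


def parse_part(piece):
    """Parse one sub-location into (reference, start, end, strand).

    Hypothetical helper: reference is None when the piece lies on the
    record being annotated, strand is -1 for complement(...) and +1
    otherwise.
    """
    strand = 1
    wrapped = re.fullmatch(r"complement\((.*)\)", piece)
    if wrapped:
        strand = -1
        piece = wrapped.group(1)
    simple = re.fullmatch(r"(?:([A-Za-z0-9_.]+):)?(\d+)\.\.(\d+)", piece)
    if simple is None:
        raise ValueError("unsupported sub-location: %r" % piece)
    reference, start, end = simple.groups()
    return reference, int(start), int(end), strand


def parse_join(location):
    """Split a flat join(...) location into its parsed sub-locations."""
    location = "".join(location.split())  # drop spaces and line breaks
    joined = re.fullmatch(r"join\((.*)\)", location)
    if joined is None:
        raise ValueError("expected a join(...) location")
    # Assumption: no nested join/order, so commas only separate parts.
    return [parse_part(piece) for piece in joined.group(1).split(",")]


parts = parse_join(LOCATION)
for reference, start, end, strand in parts:
    print(reference or "(this record)", start, end, strand)
```

The parsed parts show both problems at once: the first three pieces are on
the reverse strand and the last two on the forward strand, while the fourth
piece refers to NC_016402.1 rather than the record carrying the feature, so
its sequence cannot be extracted without fetching that other record.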

See also:
http://blastedbio.blogspot.co.uk/2012/03/missing-external-exons-in-genbank-with.html

Peter



