[Bioperl-l] Re: *major* error in genbank parser or am i just insane?

Ewan Birney birney@ebi.ac.uk
Fri, 9 Aug 2002 09:24:02 +0100 (BST)


On Fri, 9 Aug 2002, Brian King wrote:

> > But is this just random cruft from Genbank/EMBL that
> > they didn't realise when they designed it, or
> > something deeper?
> 
> After long struggles with the join operator I finally
> concluded that it's just a way to represent
> hierarchical features in the flat feature table
> structure.  The regions within the join usually
> correspond to some other contiguous feature in the
> same feature table.  I'm interested to know if someone
> with more experience than me sees it the same way.  

I know this does not hold up 100% across the archive - there are CDS lines
with no corresponding separate exon features...
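A minimal sketch of the mapping Brian describes, and of the gap noted above, might look like this. It is a Python illustration with hypothetical helper names (`split_join`, `attach_exons`, `SubFeature`), not BioPerl's API. It assumes the simplest forward-strand location syntax only: no `complement()`, `order()`, `<`/`>` partial markers or remote accessions. Each contiguous region of a CDS join becomes a generic sub-feature, and each is paired with an exon feature that contains it. Containment rather than exact equality is itself a heuristic, since the first and last coding regions usually stop short of their exons because of UTRs.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One "start..end" range inside a simple location string.
RANGE_RE = re.compile(r"(\d+)\.\.(\d+)")


@dataclass
class SubFeature:
    start: int
    end: int
    exon: Optional[Tuple[int, int]] = None  # matched exon feature, if any


def split_join(location: str) -> List[Tuple[int, int]]:
    """Split 'join(a..b,c..d,...)' (or a bare 'a..b') into its contiguous
    (start, end) regions. Assumes the simple syntax described above."""
    regions = [(int(s), int(e)) for s, e in RANGE_RE.findall(location)]
    if not regions:
        raise ValueError(f"unsupported location: {location!r}")
    return regions


def attach_exons(cds_location: str,
                 exons: List[Tuple[int, int]]) -> List[SubFeature]:
    """Turn each region of a CDS join into a generic sub-feature and pair it
    with the first exon feature whose span contains it (heuristic)."""
    subs = []
    for start, end in split_join(cds_location):
        match = next((ex for ex in exons if ex[0] <= start and end <= ex[1]),
                     None)
        subs.append(SubFeature(start, end, match))
    return subs


if __name__ == "__main__":
    cds = "join(120..300,401..560,701..850)"
    exons = [(1, 300), (401, 560)]  # no exon feature covers the last region
    for sub in attach_exons(cds, exons):
        status = f"exon {sub.exon}" if sub.exon else "no exon feature"
        print(f"{sub.start}..{sub.end}: {status}")
```

The last region is the awkward case: a CDS region with no corresponding exon feature, which a parser could only keep as a generic sub-feature.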

> 
> Because of the ambiguities in the join operator my
> ideal solution would be to not support the join syntax
> at all, but to match up the joined feature with its
> intended sub-features in the same table when parsing,
> or at least create generic sub-features at the
> contiguous regions on the join.  I'd make a real
> hierarchical representation in the object model and
> abandon the join syntax.  Unfortunately you'd have to
> hard-code some biological knowledge to judge if a
> corresponding sub-feature was really supposed to be
> part of a joined feature.  I doubt that round-trip
> preservation of the GenBank/EMBL record is necessary. 
> You could write out the record in a format that has
> hierarchical features and refer to the original record
> as needed.  All that would be pretty hard to
> do, but I like to have an ideal in mind anyway.
> 

Ha!

This is very hard to do because you have to handle:

   (a) CDS features with no exon features

and, my particular favourite:

   (b) an mRNA join operator which is out of sync with the CDS join
       operator (!)


Quite what is going on in (b) is, of course, anyone's guess. The simplest
explanation is a typo by the author, but perhaps he/she was trying to say
something profound ;)


Certainly, matching joins to sub-features automatically works for 90% of
cases, but sadly not 100%.
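One way to flag the out-of-sync case automatically is sketched below in Python. The helper names are hypothetical and the same simple forward-strand syntax is assumed (no `complement()`, `order()`, partial markers or remote accessions). The idea: if the two joins agree, clipping the mRNA regions to the coding span should reproduce the CDS join exactly.

```python
import re
from typing import List, Tuple

Region = Tuple[int, int]

# One "start..end" range inside a simple location string.
RANGE_RE = re.compile(r"(\d+)\.\.(\d+)")


def split_join(location: str) -> List[Region]:
    """Contiguous (start, end) regions of a simple 'join(a..b,...)'."""
    return [(int(s), int(e)) for s, e in RANGE_RE.findall(location)]


def expected_cds_regions(mrna: List[Region], cds_start: int,
                         cds_end: int) -> List[Region]:
    """Clip the mRNA regions to the coding span [cds_start, cds_end]."""
    clipped = []
    for start, end in mrna:
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo <= hi:  # keep only regions that overlap the coding span
            clipped.append((lo, hi))
    return clipped


def joins_in_sync(mrna_location: str, cds_location: str) -> bool:
    """Heuristic: True if the CDS join equals the clipped mRNA join."""
    cds = split_join(cds_location)
    if not cds:
        raise ValueError(f"unsupported location: {cds_location!r}")
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    expected = expected_cds_regions(split_join(mrna_location),
                                    cds_start, cds_end)
    return expected == cds


if __name__ == "__main__":
    mrna = "join(1..300,401..560,701..900)"
    print(joins_in_sync(mrna, "join(120..300,401..560,701..850)"))
    print(joins_in_sync(mrna, "join(120..300,401..555,701..850)"))
```

A `False` only says the two joins disagree; deciding whether that is a typo or something intentional still needs a human.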



Don't forget remote features as well (joins across entries), which open
their own can of worms...
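A remote region carries an accession prefix, as in a location such as `join(1..100,J00194.1:100..202)`, where the second region lies in another entry (the accession here is only illustrative). The Python sketch below, with hypothetical helper names, at least separates local from remote regions so a parser knows which other entries it would have to fetch. It still assumes no `complement()`, `order()` or partial markers.

```python
import re
from typing import List, NamedTuple, Optional

# One region, optionally prefixed by "ACCESSION[.VERSION]:" when it points
# into another entry, e.g. "J00194.1:100..202".
SEGMENT_RE = re.compile(
    r"(?:([A-Za-z][A-Za-z0-9_]*(?:\.\d+)?):)?(\d+)\.\.(\d+)")


class Segment(NamedTuple):
    accession: Optional[str]  # None means "this entry"
    start: int
    end: int


def segments(location: str) -> List[Segment]:
    """Split a simple join() into segments, keeping any remote accession."""
    return [Segment(acc or None, int(s), int(e))
            for acc, s, e in SEGMENT_RE.findall(location)]


def remote_accessions(location: str) -> List[str]:
    """Accessions that must be fetched before the feature can be resolved."""
    return sorted({seg.accession for seg in segments(location)
                   if seg.accession})


if __name__ == "__main__":
    loc = "join(1..100,J00194.1:100..202)"
    print(segments(loc))
    print(remote_accessions(loc))
```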


> Sorry I only have an analysis and no solution.
> 
> Regards,
> Brian

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------