[Bioperl-l] *major* error in genbank parser or am i just insane?

Hilmar Lapp hlapp@gnf.org
Thu, 8 Aug 2002 17:10:46 -0700


I'm sorry if I was driving anyone nuts. I had about 2 minutes left before the wireless was cut, was concerned about something getting missed, and thought that was a bigger risk than getting flamed for being thoughtless...

Reconsidering the whole thing: sure, both representations yield exactly the same DNA sequence, and round-tripping is the only issue. I'm happy to give up being able to round-trip down to the line break; semantic sanity is indeed much preferable. Elia, are you OK with this?
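
As a sanity check on the claim that both representations yield the same sequence, here is a minimal Python sketch (not BioPerl code; all function names are hypothetical and the input sequence is random). It relies only on the identity revcomp(A + B) = revcomp(B) + revcomp(A), using the two coordinate pairs from the location strings quoted in this thread.

```python
import random

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def subseq(seq, start, end):
    """Slice using GenBank-style 1-based, inclusive coordinates."""
    return seq[start - 1:end]

def complement_of_join(seq, spans):
    """complement(join(a, b)): join on the forward strand, then revcomp."""
    joined = "".join(subseq(seq, s, e) for s, e in spans)
    return revcomp(joined)

def join_of_complements(seq, spans):
    """join(complement(b), complement(a)): revcomp each span, then join."""
    return "".join(revcomp(subseq(seq, s, e)) for s, e in spans)

# Random stand-in sequence; any sequence of sufficient length would do.
random.seed(0)
seq = "".join(random.choice("ACGT") for _ in range(6000))

a = complement_of_join(seq, [(2691, 4571), (4918, 5163)])
b = join_of_complements(seq, [(4918, 5163), (2691, 4571)])
assert a == b
```

Note that the span order is reversed in the second form; with the same order the two would generally differ.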

So, as for split locations on the complementary strand, where does strand -1 go? Right now the parent is -1 whereas the sublocs are +1. This does not map 1:1 to biosql because seqfeatures (the 'parent') don't have a strand, and the parent's -1 currently appears to get lost when stored in biosql, at least in my copy of the code (older versions may not have this problem). Hence, shall we switch to making the parent strand-less (0) and putting the sublocs on strand -1?
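
The proposed switch could be sketched roughly as follows. This is Python rather than the actual BioPerl classes, and every name in it (SimpleLoc, SplitLoc, push_strand_down) is hypothetical. It also assumes that the sublocation order gets reversed when a -1 parent strand is pushed down, to match the join(complement(...),complement(...)) form quoted below; whether the real code should reorder is a separate decision.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SimpleLoc:
    start: int
    end: int
    strand: int = 1   # +1, -1, or 0 (strand-less)

@dataclass
class SplitLoc:
    sublocs: List[SimpleLoc] = field(default_factory=list)
    strand: int = 0   # parent strand; 0 means strand-less

def push_strand_down(split):
    """Return a strand-less parent whose sublocs carry the effective strand."""
    if split.strand == 0:
        # Already strand-less: copy the sublocs unchanged.
        return SplitLoc([SimpleLoc(s.start, s.end, s.strand) for s in split.sublocs], 0)
    # Effective strand of each subloc is parent strand times its own strand.
    subs = [SimpleLoc(s.start, s.end, s.strand * split.strand) for s in split.sublocs]
    if split.strand == -1:
        # Assumption: reverse the order so reading the sublocs in sequence
        # still yields the same spliced sequence.
        subs.reverse()
    return SplitLoc(subs, 0)

# Current representation: parent on -1, sublocs on +1.
old = SplitLoc([SimpleLoc(2691, 4571, 1), SimpleLoc(4918, 5163, 1)], strand=-1)
new = push_strand_down(old)
```

With this layout the strand lives only on the simple locations, so nothing is lost if the store has no strand column for the parent.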

If I'm again missing something I'm happy to take more flames ...

	-hilmar

> -----Original Message-----
> From: Ewan Birney [mailto:birney@ebi.ac.uk]
> Sent: Thursday, August 08, 2002 2:37 PM
> To: Lincoln Stein
> Cc: Hilmar Lapp; Chris Mungall; Elia Stupka; Jason Stajich;
> bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] *major* error in genbank parser or am i just
> insane?
> 
> 
> On Thu, 8 Aug 2002, Lincoln Stein wrote:
> 
> > Hilmar Sez:
> > >  feature) are /not/ the same. You can't collapse this all into one
> > 
> > I've been trying to stay out of this, but Hilmar's last comment is driving me
> > nuts.  How do these two statements come to be semantically different?
> > 
> > 	complement(join(2691..4571,4918..5163))
> > 	join(complement(4918..5163),complement(2691..4571))
> > 
> > They both produce the same DNA sequence in the same orientation.  If GenBank
> > chooses to represent numeric 0 as +0 sometimes and as -0 others, are we
> > morally obligated to maintain the distinction in the data model and round
> > trip it?
> 
> I agree with you, Lincoln. Although there is an argument for being able to
> round trip it, I think semantic sanity is a good thing.
> 
> 
> I vote we go for one representation (strand only on simple features) which
> means we are in sync with biosql and it means we will have to take the hit
> about not being able to provide a diff-able round-trip...
> 
> 
> 
> 
> 
> > 
> > Lincoln
> > 
> > -- 
> > ========================================================================
> > Lincoln D. Stein                           Cold Spring Harbor Laboratory
> > lstein@cshl.org                            Cold Spring Harbor, NY
> > ========================================================================
> > 
> 
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>. 
> -----------------------------------------------------------------
> 
>