[BioPython] Re: [Bioperl-l] *major* error in genbank parser or am i just insane? (fwd)

Danny Yoo dyoo@acoma.Stanford.EDU
Wed, 7 Aug 2002 12:18:30 -0700 (PDT)


Hi everyone,

There's been a lot of activity on the Bioperl mailing list recently about
their Genbank location parser, and there was a call to compare the
behavior of it versus that of the other Bio projects.


Perhaps it might be a good thing to collaborate with them?  Someone can
write a set of common tests to make sure that sublocation parsing and
sequence extraction are done consistently between Biopython and
Bioperl.  I'll try writing something this afternoon.
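
As a rough starting point for such a shared test, here is a minimal sketch
in plain Python.  It does not use the Biopython or Bioperl parsers; the
helper names (revcomp, split_top_level, extract) are made up for
illustration, and it only understands plain "a..b" ranges, join() and
complement().  The check itself is that the two forms quoted from the FT
definition further down should extract the same sequence.

```python
import re

# Translation table for complementing DNA (unambiguous bases only).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def extract(location, seq):
    """Extract the sequence for a (very restricted) location string.

    Hypothetical helper: handles only 'a..b', join(...) and complement(...),
    with 1-based inclusive coordinates as in the feature table.
    """
    location = location.strip()
    if location.startswith("complement(") and location.endswith(")"):
        inner = location[len("complement("):-1]
        return revcomp(extract(inner, seq))
    if location.startswith("join(") and location.endswith(")"):
        inner = location[len("join("):-1]
        return "".join(extract(part, seq) for part in split_top_level(inner))
    match = re.fullmatch(r"(\d+)\.\.(\d+)", location)
    if match is None:
        raise ValueError("unsupported location: %r" % location)
    start, end = int(match.group(1)), int(match.group(2))
    return seq[start - 1:end]


# Toy sequence; the coordinates are invented for the example.
toy = "AACCGGTTAC"
outer = extract("complement(join(1..3,7..9))", toy)
inner = extract("join(complement(7..9),complement(1..3))", toy)
assert outer == inner == "TAAGTT"
```

The same pair of location strings could be fed to each project's real
parser and the extracted sequences compared against a table like this one.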



---------- Forwarded message ----------
Date: Wed, 7 Aug 2002 11:39:19 -0700 (PDT)
From: Chris Mungall <cjm@fruitfly.org>
To: Hilmar Lapp <hlapp@gnf.org>
Cc: Elia Stupka <elia@fugu-sg.org>, Jason Stajich <jason@cgt.mc.duke.edu>,
     bioperl-l@bioperl.org
Subject: Re: [Bioperl-l] *major* error in genbank parser or am i just
    insane?


i would have thought the sublocations' strand should be -1, as they
represent exons on the reverse strand. but i don't really understand the
whole bioperl location+seqfeature semantics/model; when outside the
bioperl world i just have one class that rolls seqfeature and location
into one.

i'm happy to have hilmar revoke my fix and instead go with checking the
parent location strand rather than the sublocation strand (if someone
could fix the genbank dumper to print the complement correctly that would
be great). if we go this route i will fix bioperl-db so that the parent
location strand goes into the seqfeature_location table. note that this
will introduce a slight disjunction between biosql and bioperl (in biosql
we absolutely must represent -ve strand exons as
seqfeature_location.strand = -1). hmm, how does biojava handle this?
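
to make the two conventions concrete, here is a small python sketch. the
classes and the flatten_rows helper are hypothetical (this is neither the
bioperl object model nor the biosql schema); it only assumes that the
effective strand of a sublocation is the parent strand multiplied by the
sublocation's own strand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubLocation:
    start: int
    end: int
    strand: int = 1  # strand relative to the parent location


@dataclass
class SplitLocation:
    strand: int  # strand of the parent (split) location
    subs: List[SubLocation]


def flatten_rows(split: SplitLocation) -> List[Tuple[int, int, int]]:
    """Fold the parent strand into each sublocation.

    Returns (start, end, strand) tuples, i.e. the shape a per-exon
    table row would need if every exon must carry its own strand.
    """
    return [(s.start, s.end, split.strand * s.strand) for s in split.subs]


# Reading of complement(join(2691..4571,4918..5163)) with the strand
# stored on the parent and the sublocations left at +1.
parent_model = SplitLocation(
    strand=-1,
    subs=[SubLocation(2691, 4571), SubLocation(4918, 5163)],
)
rows = flatten_rows(parent_model)
assert rows == [(2691, 4571, -1), (4918, 5163, -1)]
```

under that assumption a parent-strand model can still be written out as
per-exon rows with strand = -1, at the cost of a conversion step on the
way in and out.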

On Wed, 7 Aug 2002, Hilmar Lapp wrote:

> After looking at Chris' fix, it appears to be wrong: it would set
> the sublocs' strand to -1. The problem lies elsewhere, I'm going to
> revoke that fix.
>
> 	-hilmar
>
> On Wednesday, August 7, 2002, at 10:10  AM, Hilmar Lapp wrote:
>
> > I have no idea what the present status on that is, but my reply was
> > generally not about a long-term/high-level/design/it would
> > be much better if/ discussion. I basically asked the question what
> > complement(join(1..100,201..300)) exactly means, and whether it has
> > been decided how exactly it shall be translated into strand()
> > attributes of the parent and sub-locations. This hasn't been
> > answered yet ...
> >
> > Quoting from the FT definition:
> >
> > complement(join(2691..4571,4918..5163))
> >     Joins regions 2691 to 4571 and 4918 to 5163, then complements
> >     the joined segments (the feature is on the strand complementary
> >     to the presented strand)
> >
> > join(complement(4918..5163),complement(2691..4571))
> >     Complements regions 4918 to 5163 and 2691 to 4571, then joins
> >     the complemented segments (the feature is on the strand
> >     complementary to the presented strand)
> >
> > The case in question is the first example. To translate this
> > properly to Bioperl locations, this means the parent SplitLoc is
> > strand -1, whereas the subs are strand +1. Right?
> >
> > 	-hilmar
> >
> >
> > On Tuesday, August 6, 2002, at 10:24  PM, Chris Mungall wrote:
> >
> >> ok, committed - it seems to have had some weird knock-on effect
> >> breaking other tests - i can uncommit if this is bad
> >>
> >> On Wed, 7 Aug 2002, Elia Stupka wrote:
> >>
> >>>> we need a short term fix for the standard situation even more -
> >>>> shall i commit my change or will this mess things up more?
> >>>
> >>> Please commit it, I cannot stand when
> >>> long-term/high-level/design/it would be much better if/
> >>> discussions get in the way of production improvement fixes.
> >>>
> >>> Once it's committed I can set off a script for the diffing of in/out
> >>> genbank so you can be comfortable that it's not screwing up the
> >>> rest of genbank parsing ;)
> >>>
> >>> Elia
> >>>
> >>> ********************************
> >>> * http://www.fugu-sg.org/~elia *
> >>> * tel:    +65 6874 1467        *
> >>> * mobile: +65 9030 7613        *
> >>> * fax:    +65 6777 0402        *
> >>> ********************************
> >>>
> >>>
> >>>
> >>
> >>
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp                            email: lapp at gnf.org
> > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > -------------------------------------------------------------
> >
> >
