[BioPerl] Re: [Bioperl-l] gff_string on an HSPI object is not Bio::DB::GFF friendly

Lincoln Stein lstein at cshl.edu
Mon Jan 12 18:34:14 EST 2004


Hello,

You might want to look at the relative-to-absolute coordinate mapper that's in 
the Bio::Das library (version 0.94).  Here's an excerpt from the synopsis:

        use Bio::Das::Map 'print_location';

        my $m = Bio::Das::Map->new('my_map');
        $m->add_segment(['chr1',100,1000]   => ['c1.1',1,901]);
        $m->add_segment(['chr1',1001,2000]  => ['c1.2',501,1500]);
        $m->add_segment(['chr1',2001,4000]  => ['c1.1',3000,4999]);
        $m->add_segment(['c1.1',4000,4999]  => ['c1.1.1',1,1000]);

        my @abs_locations = $m->resolve('c1.1.1',500=>600);
        print_location(@abs_locations);

        for my $location (@abs_locations) {
           my @rel_locations = $m->project($location,'c1.1.1');
           print_location(@rel_locations);

           my @all_rel_locations = $m->sub_segments($location);
           print_location(@all_rel_locations);
        }

If you don't like using the [seqid,start,end] triples, you can use 
Bio::LocationI objects instead.
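
For the simplest case (one child feature laid directly on one parent), the arithmetic behind a relative-to-absolute mapping can be sketched in plain Perl. This is only an illustration and not how Bio::Das::Map is implemented: the helper name rel_to_abs, the [seqid,start,end,strand] parent layout, and the convention that relative position 1 on a reverse-strand parent is the parent's absolute end are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Das::Map): convert a 1-based,
# inclusive range given relative to a parent feature into absolute
# coordinates.  $parent is [seqid, start, end, strand], already absolute.
sub rel_to_abs {
    my ($parent, $rel_start, $rel_end) = @_;
    my ($seqid, $p_start, $p_end, $p_strand) = @$parent;

    # Refuse ranges that are inverted or fall outside the parent.
    die "relative range does not fit inside the parent feature\n"
        if $rel_start < 1
        || $rel_start > $rel_end
        || $rel_end > $p_end - $p_start + 1;

    if ($p_strand >= 0) {
        # Forward-strand parent: offset from the parent's start.
        return ($seqid, $p_start + $rel_start - 1, $p_start + $rel_end - 1);
    }

    # Reverse-strand parent (assumed convention): count back from the
    # parent's end and swap so that start <= end in the result.
    return ($seqid, $p_end - $rel_end + 1, $p_end - $rel_start + 1);
}

my @fwd = rel_to_abs(['chr1', 1001, 2000, +1], 10, 20);
my @rev = rel_to_abs(['chr1', 1001, 2000, -1], 10, 20);

print join("\t", @fwd), "\n";   # chr1  1010  1020
print join("\t", @rev), "\n";   # chr1  1981  1991
```

The reverse-strand branch is where the ambiguity about strandedness raised further down the thread shows up: a different convention would give different absolute coordinates, so whichever one is chosen has to be agreed on at the application level.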

You can find the module on CPAN, or at www.biodas.org.

Lincoln

On Monday 12 January 2004 11:32 am, Aaron J.Mackey wrote:
> Actually, all I really need is a relative-to-absolute coordinate
> mapper, so that I can prepare input in relative coordinates, and feed
> it to a dbGFF database in absolute coordinates (which was what I was
> hoping load_gff.pl might now be doing automatically, given all your
> talk about GFF3).  I realize now, however, that relative coordinates are
> not within the purview of the GFF3 spec, but are rather an application issue.
>
> Thanks for all your thoughts,
>
> -Aaron
>
> On Jan 12, 2004, at 11:18 AM, Scott Cain wrote:
> > Aaron,
> >
> > I really doubt that the current release of GBrowse supports relative
> > coordinates as described by both you and Allen.  I have to say I'm not
> > sure, because I am in the process of developing a set of test data.
> >
> > As for chado, it should actually be fairly easy to adapt it to work with
> > relative coordinates.  The main change (for me) would be in the gbrowse
> > chado adaptor, which assumes that all features have as the 'srcfeature'
> > the 'top' feature (ie, all features are directly laid on the
> > chromosome/arm/contig/whatever).  The reason it does that is because
> > that is the way that the fruitfly people use it, and so that was the
> > data I had to develop the adaptor for.
> >
> > If having relative coordinates is something that would be useful for you
> > to use chado, let me know (and send me sample GFF3 data) and I will work
> > on it.  Otherwise, it will go in the TODO file.
> >
> > Thanks,
> > Scott
> >
> > On Fri, 2004-01-09 at 17:22, Allen Day wrote:
> >> We don't support this in the chado load_gff3.pl script, but it wouldn't be
> >> very difficult to add handling of simple cases.  I am concerned though
> >> about difficulties handling potential ambiguity wrt the strandedness of
> >> relative coordinates.
> >>
> >> I assume by relative coordinates here, you mean you're describing a
> >> feature's position in terms of the position of another feature which is
> >> itself described in absolute coordinates (or is relative to a feature
> >> which is).
> >>
> >> -Allen
> >>
> >> On Fri, 9 Jan 2004, Aaron J.Mackey wrote:
> >>> Hi Scott,
> >>>
> >>> Thanks for the quick reply, but that wasn't exactly the nature of the
> >>> question; the question was whether (apart from Gap attributes)
> >>> gbrowse, BDGFF, and/or, specifically, load_gff.pl variants know the
> >>> rest of GFF3, namely the ability to accept input GFF3 with features
> >>> that aren't in absolute reference coordinates, but in relative
> >>> coordinates.  And is that ability in release 1.58, or some CVS branch I
> >>> can access (code that lives quietly in the depths of Lincoln's hard
> >>> drive doesn't count)?
> >>>
> >>> Thanks,
> >>>
> >>> -Aaron
> >>>
> >>> On Jan 9, 2004, at 4:47 PM, Scott Cain wrote:
> >>>> OK, I am going to answer this, but if I am wrong, I'm sure Lincoln will
> >>>> correct me.  I don't think gbrowse or BDGFF knows how to deal with cigar
> >>>> lines in Gap attributes yet.  For the time being, it is safer to continue
> >>>> to put separate HSPs on separate GFF lines.
> >>>>
> >>>> Scott
> >>>>
> >>>> On Fri, 2004-01-09 at 16:42, Aaron J.Mackey wrote:
> >>>>> Forgive me for a stupid question, but does GBrowse (v1.58) now support
> >>>>> GFF3?  Namely, can I have start/stops in sub-feature coordinates in my
> >>>>> input GFF3 and expect bp_load_gff.pl to behave properly (i.e. generate
> >>>>> "canonical" top-level coordinates for storage)?  I didn't see anything
> >>>>> in the documentation, so I was surprised to see some of the words in
> >>>>> these posts ...
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> On Jan 9, 2004, at 4:09 PM, Mark Wilkinson wrote:
> >>>>>> Cool.  I'm heavily into making the HSPs output proper GFF3 today for
> >>>>>> some of the Gbrowse tools that I have been working on, so I will jump in
> >>>>>> and do this over the next day or two.
> >>>>>>
> >>>>>> Cheers!
> >>>>>>
> >>>>>> Mark
> >>>>>>
> >>>>>> On Fri, 2004-01-09 at 14:49, Scott Cain wrote:
> >>>>>>> I think everything you wrote below is correct.  As far as I know, only
> >>>>>>> Allen and I have been working on BTGFF's GFF3 code, and we haven't
> >>>>>>> touched the alignment portion, so I am not surprised that it is wrong.
> >>>>>>> I suppose fixing BTGFF may break some tools, but I know that the chado
> >>>>>>> loader I wrote will handle it correctly :-)
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Scott
> >>>>>>>
> >>>>>>> On Fri, 2004-01-09 at 15:45, Mark Wilkinson wrote:
> >>>>>>>> On Fri, 2004-01-09 at 11:22, Scott Cain wrote:
> >>>>>>>>>   - be sure to use a SO term for the type (ie, match or one of its
> >>>>>>>>>     children)
> >>>>>>>>
> >>>>>>>> So... actually the existing implementation of GFF3 in bioperl
> >>>>>>>> from Bio::Tools::GFF->new(-gff_version => 3)
> >>>>>>>> does not generate correctly formatted GFF3 for alignment features,
> >>>>>>>> yeah?
> >>>>>>>>
> >>>>>>>> e.g. for column 9 of an alignment feature I get:
> >>>>>>>>
> >>>>>>>> 	Target=gi|2828774:54232..54206
> >>>>>>>>
> >>>>>>>> whereas I think I should be getting
> >>>>>>>>
> >>>>>>>> 	Target=gi|2828774+54232+54206
> >>>>>>>>
> >>>>>>>> In addition, it passes through all sorts of other tags that begin with
> >>>>>>>> capital letters:
> >>>>>>>>
> >>>>>>>> 	Bits=46.1;FracId=0.962962962962963
> >>>>>>>>
> >>>>>>>> these should be
> >>>>>>>>
> >>>>>>>> 	bits=46.1;fracId=0.962962962962963
> >>>>>>>>
> >>>>>>>> if I am reading the spec correctly.
> >>>>>>>>
> >>>>>>>> Finally, the column-3 term that comes out is "similarity", but it
> >>>>>>>> should be one of the *match terms.  Is that also correct?
> >>>>>>>>
> >>>>>>>> Please confirm that I am interpreting the GFF3 spec correctly for these
> >>>>>>>> Alignment features and I would be happy to go in and fix things
> >>>>>>>> (a.k.a. break everyone else's tools ;-) )
> >>>>>>>>
> >>>>>>>> Cheerio!
> >>>>>>>>
> >>>>>>>> Mark
> >>>>>>
> >>>>>> --
> >>>>>> Mark Wilkinson <markw at illuminae.com>
> >>>>>> Illuminae
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Bioperl-l mailing list
> >>>>>> Bioperl-l at portal.open-bio.org
> >>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>>> --
> >>>> ------------------------------------------------------------------------
> >>>> Scott Cain, Ph. D.
> >>>> cain at cshl.org
> >>>> GMOD Coordinator (http://www.gmod.org/)
> >>>> 216-392-3087
> >>>> Cold Spring Harbor Laboratory
> >>>
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.
> > cain at cshl.org
> > GMOD Coordinator (http://www.gmod.org/)
> > 216-392-3087
> > Cold Spring Harbor Laboratory

-- 
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

