[Bioperl-l] Bio::DB::GFF3 nightmare

Lincoln Stein lstein at cshl.edu
Thu Mar 30 23:40:14 UTC 2006


If you insist on using the Bio::DB::GFF database with GFF3 data, you must that 
the segment method returns a segment of the genome corresponding to the start 
and end positions of the indicated feature. When you call features() it 
returns everything that overlaps the region.

The API to use to get a single feature is get_feature_by_name():

	my $transcript = $dmdb->get_feature_by_name('CG30501-RA');
	my @exons     = $transcript->get_SeqFeatures;

Or, if you want all transcripts in the gene CG17800:

	my @transcripts = $dmdb->get_feature_by_attribute(Gene=>'CG17800');
	for my $transcript (@transcripts) {
		my @exons = $transcript->get_SeqFeatures;
	}

The latter depends on your having changed the gene Parent attribute into a 
Gene attribute as recommended in my previous email.

Lincoln

On Wednesday 29 March 2006 18:46, Marco Blanchette wrote:
> Dear all--
>
> I have been trying to display exon/intron structure of mRNAs for a given
> gene the D. melanogaster GadFly GFF3 annotation 4.2.1 loaded into mySQL
> using bp_bulk_loadd_gff.pl. I keep getting  mRNAs from other genes that
> fall within the segment of the queried gene. For example:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::DB::GFF;
>
> my $agg1 = Bio::DB::GFF::Aggregator->new(    -method => 'pre_mRNA',
>                                             -sub_parts    =>
> ['exon','five_prime_UTR','three_prime_UTR'],
>                                         );
>
> my $dmdb = Bio::DB::GFF    ->new( -adaptor => 'dbi::mysql',
>                           -dsn =>
> 'dbi:mysql:database=dmel_421;host=riolab.net',
>                           -user => 'guest',
>                           -aggregators=> [$agg1],
>                 );
>
> my @genes = qw (CG17800);
>
> for my $gene (@genes){
>     my $tg = $dmdb->segment(-name => $gene);
>
>     my @transcripts = $tg->features(-type => 'pre_mRNA',
>                                      );
>
>      for my $tc (@transcripts){
>          my %atts = $tc->attributes;
>          print "$_ => $atts{$_}\n" foreach (keys %atts);
>      }
> }
>
> This script generate the output:
> Parent => CG30501-RA
> Name => Dscam:23
> Parent => CG17800-RE
> Parent => CG30500-RA
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
>
> Where Neither CG30501-RA nor CG30500-RA are coming from the gene CG17800.
> If I pass @transcripts to a Bio::Graphics::Panel object, I get, of course,
> all the different mRNAs even the one that don¹t belong to the CG17800 gene.
>
> I just can¹t figure out how to restrict the $tg->feature() call to the
> queried gene (ie CG17800)
>
> Many thanks
> ______________________________
> Marco Blanchette, Ph.D.
>
> mblanche at uclink.berkeley.edu
>
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu




More information about the Bioperl-l mailing list