[Bioperl-l] Bio::DB::GFF and GadFly GFF3 issues
Lincoln Stein
lstein at cshl.edu
Tue Mar 7 02:33:37 UTC 2006
Hi Marco,
I'm awfully sorry about this, but it looks like Bio::DB::GFF was broken at
some point in the recent past and nobody picked up on it. The other thing
that happened is that the FlyBase folks have started representing their
transcripts with a single CDS rather than with multiple CDS's -- this doesn't
affect you now, but it will if you try to use the processed_transcript glyph.
So, first of you, you'll have to CVS update bioperl again. Second, look at
the enclosed draw_gene.pl script that will do what you want. The main trick
here is that I added type "intron" to the processed_transcript feature
aggregator, enabling you to get the introns. Ordinarily introns are not
included in the processed_transcript (because they're processed out!)
Lincoln
On Monday 06 March 2006 19:02, Marco Blanchette wrote:
> Lincoln--
>
> I did what you suggested and still can't get a drawing of the exon/intron
> mRNA structure using the following script:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::DB::GFF;
> use Bio::Graphics;
> use Bio::SeqFeature::Generic;
>
> my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
> -dsn => "chr4_mod",
> );
> my @genes = ('CG2381','CG2041'); ##two genes on the fourth chromosome
> foreach my $gene (@genes){
>
> my $tg = $dmdb->segment(-name => $gene);
>
> if ($tg){
>
> my $panel = Bio::Graphics::Panel->new(
> -length => $tg->length,
> -width => 800,
> -pad_left => 10,
> -pad_right => 10,
> );
>
> my $full_length =
> Bio::SeqFeature::Generic->new(-start=>1,-end=>$tg->length);
>
> $panel->add_track($full_length,
> -glyph => 'arrow',
> -tick => 2,
> -fgcolor => 'black',
> -double => 1,
> );
>
> my $tcs = $tg->features(-types =>'processed_transcript',
> -attributes => {Gene=> $gene},
> -iterator => 1);
>
> while (my $tc = $tcs->next_seq){
>
> my $track = $panel->add_track( generic => $tc,
> -bgcolor => 'blue',
> -label => 1,
> -bump => 0,
> #-connector => 'solid',
> );
> }
> open FH, ">$gene.png" || die "Can't create file $gene.png\n";
> print FH $panel->png;
> $panel->finished;
> close FH;
> }
> }
>
> I get 2 files with the mRNA tracts labeled with the RNA id but without any
> exon/intron structure.
>
> However, If I extract the exons first and draw them as in:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::DB::GFF;
> use Bio::Graphics;
> use Bio::SeqFeature::Generic;
>
> my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
> -dsn => "chr4_mod",
> );
> my @genes = ('CG2381','CG2041'); ##two genes on the fourth chromosome
> foreach my $gene (@genes){
>
> my $tg = $dmdb->segment(-name => $gene);
>
> if ($tg){
> my $panel = Bio::Graphics::Panel->new(
> -length => $tg->length,
> -width => 800,
> -pad_left => 10,
> -pad_right => 10,
> );
>
> my $full_length =
> Bio::SeqFeature::Generic->new(-start=>1,-end=>$tg->length);
>
> $panel->add_track($full_length,
> -glyph => 'arrow',
> -tick => 2,
> -fgcolor => 'black',
> -double => 1,
> );
>
> my @transcripts = $tg->features(-types =>'processed_transcript',
> -attributes => {Gene => $gene},
> );
>
> for my $tc (@transcripts){
> my @exons = $tc->features( -types => 'exon',
> -attributes => {Parent =>
> $tc->group});
>
> my @introns = $tc->features(-types => 'intron',
> -attributes => {Parent =>
> $tc->group});
>
> my $track = $panel->add_track( generic => \@exons,
> -bgcolor => 'blue',
> -label => 1,
> -bump => 0,
> );
> }
> open FH, ">$gene.png" || die "Can't create file $gene.png\n";
> print FH $panel->png;
> $panel->finished;
> close FH;
> }
> }
>
> I get the exon drawn correctly but I can add line connecting them.
> Moreover, since each exon are individual feature, I can't get the mRNA id
> displayed on each tract...
>
> I tried to pass the @intron array to the glyph in different ways to draw
> hat line between exon without any success.
>
> Am I going the right direction or their is some better workaround?
>
> Many thanks
>
> Marco
>
> On 3/6/06 10:31 AM, "Lincoln Stein" <lstein at cshl.edu> wrote:
> > Hi,
> >
> > Since I wrote the last message I have done some more testing and have
> > determined that the flybase GFF3 files cannot be stored in Bio::DB::GFF
> > due to limitations in the Bio::DB::GFF data model. The issue is that
> > Bio::DB::GFF can only store one level of parentage, and not the two
> > levels needed by flybase genes.
> >
> > Here is a quick fix to preprocess the gff3 files so that they can be used
> > by Bio::DB::GFF:
> >
> > while (<>) {
> > my @fields = split "\t";
> > next unless $fields[2] eq 'mRNA';
> > s/Parent=([^;]+)/Gene=$1/;
> > } continue {
> > print;
> > }
> >
> > This turns the "Parent" field of mRNA lines into a "Gene" attribute. You
> > can then find all transcripts corresponding to a particular gene in much
> > the way you tried earlier:
> >
> > my $tcs = $tg->features(-types =>'processed_transcript',
> > -attributes => {Gene=> $gene},
> > -iterator => 1);
> >
> > I am going back to work on Bio::DB::GFF3, which will fix this problem.
> >
> > Lincoln
> >
> > On Monday 06 March 2006 00:02, Marco Blanchette wrote:
> >> Dear all--
> >>
> >> I am trying to forge my first bioperl weapons with the
> >> Bio::DB::GFF and Bio::Graphics modules. My goal is to display genes with
> >> their underlying mRNAs and later on add addition useful info (ie binding
> >> site for our preferred proteins).
> >>
> >> I loaded the GadFly gff3 annotation in a mysql database using
> >> bulk_load_gff.pl and I am trying to pass a Bio::SeqFeatureI to the
> >> Bio::Graphics::add_feature method.
> >>
> >> My understanding is that:
> >> my $tcs = $tg->features(-types =>'processed_transcript',
> >> -attributes => {Parent =>
> >> $gene}, -iterator => 1);
> >>
> >> Produces a Bio::SeqIO object that can be iterate through the next_seq
> >> method to get a Bio::Seq object that could be used to extract a
> >> Bio::SeqFeatureI by using the get_SeqFeatures method.
> >>
> >> Somehow, my script does not produce the expected results. Could somebody
> >> put me on back on the right track.
> >>
> >>
> >> #!/usr/bin/perl
> >> use strict;
> >> use warnings;
> >> use Bio::DB::GFF;
> >> use Bio::Graphics;
> >>
> >> my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
> >> -dsn => "chr4",
> >> );
> >>
> >>
> >> my @genes = ('CG2041'); ##a gene on the fourth chromosome
> >>
> >> foreach my $gene (@genes){
> >>
> >> my $geneseg = $dmdb->segment(-name => $gene, -merge);
> >>
> >> if ($geneseg){
> >>
> >> my @tgs = $geneseg->features(-types => 'gene');
> >>
> >> for my $tg (@tgs){
> >>
> >> my $length = $tg->length();
> >>
> >> my $panel = Bio::Graphics::Panel->new(-length => $length, -width
> >> => 800);
> >>
> >> my $track = $panel->add_track( -glyph => 'generic',
> >> -label => 1);
> >>
> >> my $tcs = $tg->features(-types =>'processed_transcript',
> >> -attributes => {Parent =>
> >> $gene}, -iterator => 1);
> >>
> >> while ( my $tc = $tcs->next_seq ){
> >> $track->add_feature($tc->get_SeqFeatures);
> >> }
> >>
> >> print $panel->png;
> >> }
> >> }
> >> }
> >>
> >> Many thanks
> >>
> >>
> >> Marco Blanchette, Ph.D.
> >>
> >> mblanche at berkeley.edu
> >>
> >> Donald C. Rio's lab
> >> Department of Molecular and Cell Biology
> >> 16 Barker Hall
> >> University of California
> >> Berkeley, CA 94720-3204
> >>
> >> Tel: (510) 642-1084
> >> Cell: (510) 847-0996
> >> Fax: (510) 642-6062
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Marco Blanchette, Ph.D.
>
> mblanche at berkeley.edu
>
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
--
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CG7486.png
Type: image/png
Size: 3556 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060306/84ac0614/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: draw_gene.pl
Type: application/x-perl
Size: 1469 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060306/84ac0614/attachment.pl>
More information about the Bioperl-l
mailing list