[Bioperl-l] Genes from MySQL database using Bio::DB::GFF

Scott Cain cain.cshl at gmail.com
Thu Aug 17 05:11:52 UTC 2006


Hi Marco,

Well, it works for me :-)

I ran this script:

#!/usr/bin/perl  -w
use strict;

use Bio::DB::GFF;
my $db = Bio::DB::GFF->new( -adaptor => 'dbi::pg',
                              -dsn => 'dbi:Pg:dbname=flybase');

my @feat = $db->get_feature_by_name('FBgn0025803');

for (@feat) {
    print "$_\n" if ($_->method eq 'gene');
}

and got one line:

gene:.(FBgn0025803)

The only real difference is that this is in a PostgreSQL database and not
MySQL.  I used Pg since I have that installed.  I'll blow away this
database, install MySQL and see if that makes a difference (of course,
it shouldn't, but you never know...)

Gaah!  I ran the exact same script with a mysql Bio::DB::GFF and got
this out:

gene:.(FBgn0025803)
gene:.(FBgn0025803)

Looks like a bug in the mysql adaptor.  I'll see if I can track it down;
in the meantime, you could switch to a real database :-)
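
If switching databases isn't an option, a client-side workaround is to
drop exact repeats from the returned list before printing.  What follows
is only a rough sketch, not a fix for the adaptor: it uses plain hash refs
as stand-ins for feature objects so that it runs without a database, and
feature_key and unique_features are names made up for this example.  With
real features you would presumably build the key from the accessors
already used in this thread ($f->method, $f->name, $f->refseq, $f->start,
$f->end, $f->strand).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a string key from the fields that identify a feature.
# ASSUMPTION: the hash keys below stand in for the accessor methods
# of a real feature object; they are not the Bio::DB::GFF API.
sub feature_key {
    my ($f) = @_;
    return join "\t",
        map { defined $f->{$_} ? $f->{$_} : '' }
        qw(method name ref start end strand);
}

# Return the features with exact repeats removed, keeping first-seen order.
sub unique_features {
    my %seen;
    return grep { !$seen{ feature_key($_) }++ } @_;
}

# Mock data modelled on the duplicated gene reported in this thread.
my @feat = (
    { method => 'gene', name => 'FBgn0025803', ref => '3R',
      start => 16966463, end => 17038413, strand => 1 },
    { method => 'gene', name => 'FBgn0025803', ref => '3R',
      start => 16966463, end => 17038413, strand => 1 },
    { method => 'gene', name => 'FBgn0025681', ref => '2L',
      start => 2992964, end => 2998614, strand => 1 },
);

my @unique = unique_features(@feat);
printf "%d feature(s) in, %d out\n", scalar @feat, scalar @unique;
```

With the mock data above this prints "3 feature(s) in, 2 out".  Keying on
coordinates as well as the name means two distinct features that merely
share a name are both kept.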

Scott



On Wed, 2006-08-16 at 23:30 -0400, Scott Cain wrote:
> Hi Marco,
> 
> I'm working on it right now--my first guess (without doing any real
> work), I'm betting on the problem being an incompatibility between the
> GFF3 file and the Bio::DB::GFF schema.
> 
> Scott
> 
> 
> On Wed, 2006-08-16 at 19:59 -0700, Marco Blanchette wrote:
> > Dear all,
> > 
> > I am desperately trying to get a list of gene coordinates from a MySQL
> > database version of the fly genome populated using the Bio::DB::GFF module.
> > I have a list of 277 IDs in a text file that, when parsed through the
> > following script, returns 279 entries (2 more entries than the number of
> > genes in the starting list).
> > 
> > Here is the script:
> > 
> > use Bio::DB::GFF;
> > my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
> >                               -dsn => 'dbi:mysql:database=dmel_43_new');
> > while (<>){
> >     chomp;
> >     my @feat = $db->get_feature_by_name($_);
> >     for my $f (@feat){
> >         if ($f->type->method eq 'gene'){
> >         print     "Name: ", $f->name,
> >                 " Strand: ", $f->strand,
> >                 " Start: ", $f->start,
> >                 " End: ", $f->end,
> >                 "\n";
> >         }
> >     }
> > }
> > 
> > I totally don't understand where the 2 extra entries are coming from.
> > Nothing differentiates them from each other. Moreover, when I double-check
> > the MySQL database, both genes have only a single 'gene' entry in the
> > fdata table.
> > 
> > Is there a bug in the way I am trying to fetch the individual genes, or
> > is something wrong with the latest Bio::DB::GFF module from the CVS
> > repository?
> > 
> > Here is a test script, with its output, that I am using to try to track
> > down what the problem is. I hope this helps:
> > 
> > use Bio::DB::GFF;
> > my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
> >                               -dsn => 'dbi:mysql:database=dmel_43_new');
> > my %dups;
> > my ($j, $i) = (0, 0);    # initialise both counters, not just $j
> > while (<>){
> >     chomp;
> >     my $id = $_;
> >     my @feat = $db->get_feature_by_name($id);
> >     my $feat_size = $#feat;    # last index of @feat, i.e. feature count minus one
> >     $j++ if $feat_size == 2;
> >     
> >     for my $f (@feat){
> >         $i++;
> >           
> >         if (exists $dups{$f->group} && $f->type->method eq 'gene'){
> >             print     "Calling >>>", $f->group,
> >                         " ID=", $i,
> >                         " from \@feat of size $feat_size",
> >                         "\n";
> >             print     "Chr: ", $f->refseq,
> >                     " Strand: ", $f->strand,
> >                     " Start: ", $f->start,
> >                     " End: ", $f->end,
> >                     "\n";
> >             print "Offending >>>", $dups{$f->group}->[0]->group,
> >                   " ID=", $dups{$f->group}->[1], "\n";
> >             print     "Chr: ", $dups{$f->group}->[0]->refseq,
> >                     " Strand: ", $dups{$f->group}->[0]->strand,
> >                     " Start: ", $dups{$f->group}->[0]->start,
> >                       " End: ", $dups{$f->group}->[0]->end;
> >             print "\n\n";
> >          } elsif ($f->type->method eq 'gene') {
> >             $dups{$f->group} = [$f, $i];
> >          }
> >     }
> > }
> > 
> > print "#### there was $j \@feat with only 2 features\n";
> > 
> > Output of the test script:
> > 
> > $ perl test.pl hrp36_targets.txt
> > Calling >>>FBgn0025803 ID=98 from @feat of size 2
> > Chr: 3R Strand: 1 Start: 16966463 End: 17038413
> > Offending >>>FBgn0025803 ID=97
> > Chr: 3R Strand: 1 Start: 16966463 End: 17038413
> > 
> > Calling >>>FBgn0025681 ID=304 from @feat of size 2
> > Chr: 2L Strand: 1 Start: 2992964 End: 2998614
> > Offending >>>FBgn0025681 ID=303
> > Chr: 2L Strand: 1 Start: 2992964 End: 2998614
> > 
> > #### there was 11 @feat with only 2 features
> > 
> > With the hope someone can find out the problem...
> > 
> > Cheers,
> > 
> > Marco
> > 
> > ______________________________
> > Marco Blanchette, Ph.D.
> > 
> > mblanche at uclink.berkeley.edu
> > 
> > Donald C. Rio's lab
> > Department of Molecular and Cell Biology
> > 16 Barker Hall
> > University of California
> > Berkeley, CA 94720-3204
> > 
> > Tel: (510) 642-1084
> > Cell: (510) 847-0996
> > Fax: (510) 642-6062
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
