[Bioperl-l] Genes from MySQL database using Bio::DB::GFF

Scott Cain cain.cshl at gmail.com
Thu Aug 17 03:30:47 UTC 2006


Hi Marco,

I'm working on it right now--my first guess (without doing any real
work), I'm betting on the problem being an incompatibility between the
GFF3 file and the Bio::DB::GFF schema.

Scott


On Wed, 2006-08-16 at 19:59 -0700, Marco Blanchette wrote:
> Dear all,
> 
> I am desperately trying to get a list of gene coordinates from a MySQL
> database version of the fly genome populated using the Bio::DB::GFF module.
> I have a list of 277 id in a text file that when parsed through the
> following script return 279 entries (2 more entries then the number of genes
> in the starting list).
> 
> Here is the script:
> 
> use Bio::DB::GFF;
> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>                               -dsn => 'dbi:mysql:database=dmel_43_new');
> while (<>){
>     chomp;
>     my @feat = $db->get_feature_by_name($_);
>     for my $f (@feat){
>         if ($f->type->method eq 'gene'){
>         print     "Name: ", $f->name,
>                 " Strand: ", $f->strand,
>                 " Start: ", $f->start,
>                 " End: ", $f->end,
>                 "\n";
>         }
>     }
> }
> 
> I totally don¹t understand where the 2 extra entries are coming from.
> Nothing differentiate them from each other. Moreover, when I double check
> the MySQL database, both genes are having only a single Œgene¹ entry in the
> fdata table.
> 
> Is there a bug in the way I am trying to fetch the individual genes or
> something is wrong with the latest Bio::DB::GFF module from the CVS
> repository?
> 
> Here is a test script and it¹s output that I am using to try to tract down
> what the problem is. Hope this could help:
> 
> use Bio::DB::GFF;
> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>                               -dsn => 'dbi:mysql:database=dmel_43_new');
> my %dups;
> my ($j, $i) =0;
> while (<>){
>     chomp;
>     my $id = $_;
>     my @feat = $db->get_feature_by_name($id);
>     my $feat_size = $#feat;
>     $j++ if $feat_size == 2;
>     
>     for my $f (@feat){
>         $i++;
>           
>         if (exists $dups{$f->group} && $f->type->method eq 'gene'){
>             print     "Calling >>>", $f->group,
>                         " ID=", $i,
>                         " from \@feat of size $feat_size",
>                         "\n";
>             print     "Chr: ", $f->refseq,
>                     " Strand: ", $f->strand,
>                     " Start: ", $f->start,
>                     " End: ", $f->end,
>                     "\n";
>             print "Offending >>>", $dups{$f->group}->[0]->group,
>                   " ID=", $dups{$f->group}->[1], "\n";
>             print     "Chr: ", $dups{$f->group}->[0]->refseq,
>                     " Strand: ", $dups{$f->group}->[0]->strand,
>                     " Start: ", $dups{$f->group}->[0]->start,
>                       " End: ", $dups{$f->group}->[0]->end;
>             print "\n\n";
>          } elsif ($f->type->method eq 'gene') {
>             $dups{$f->group} = [$f, $i];
>          }
>     }
> }
> 
> print "#### there was $j \@feat with only 2 features\n";
> 
> Output of the test script:
> 
> $ perl test.pl hrp36_targets.txt
> Calling >>>FBgn0025803 ID=98 from @feat of size 2
> Chr: 3R Strand: 1 Start: 16966463 End: 17038413
> Offending >>>FBgn0025803 ID=97
> Chr: 3R Strand: 1 Start: 16966463 End: 17038413
> 
> Calling >>>FBgn0025681 ID=304 from @feat of size 2
> Chr: 2L Strand: 1 Start: 2992964 End: 2998614
> Offending >>>FBgn0025681 ID=303
> Chr: 2L Strand: 1 Start: 2992964 End: 2998614
> 
> #### there was 11 @feat with only 2 features
> 
> With the hope someone can find out the problem...
> 
> Cheers,
> 
> Marco
> 
> ______________________________
> Marco Blanchette, Ph.D.
> 
> mblanche at uclink.berkeley.edu
> 
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
> 
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060816/779d9754/attachment.sig>


More information about the Bioperl-l mailing list