[Bioperl-l] Re: Bio::DB::GFF trouble + Blast question

Scott Cain cain at cshl.org
Thu Jul 3 21:20:42 EDT 2003


Venky,

Fair enough--if I change my example script to have the @genes line like
yours, it works for CG6667 and several other nearby numbers.

I am more inclined to distrust the database itself.  Perhaps a row in
the database is corrupted.  Try these queries:

mysql> select * from fgroup where gname='CG6667';
+------+--------+--------+
| gid  | gclass | gname  |
+------+--------+--------+
| 8185 | Gene   | CG6667 |
+------+--------+--------+

mysql> select * from fdata where gid=8185; (using the gid from above)
+-------+------+----------+----------+---------------+---------+--------+---------+--------+------+---------------+--------------+
| fid   | fref | fstart   | fstop    | fbin          | ftypeid | fscore | fstrand | fphase | gid  | ftarget_start | ftarget_stop |
+-------+------+----------+----------+---------------+---------+--------+---------+--------+------+---------------+--------------+
| 23255 | 2L   | 17414916 | 17428443 | 100000.000174 |       5 |   NULL | -       | NULL   | 8185 |          NULL |         NULL |
+-------+------+----------+----------+---------------+---------+--------+---------+--------+------+---------------+--------------+

mysql> select * from fdna where fref='2L' and foffset>=17414000 and foffset<=17430000;

which should give 9 rows of DNA chunks, each 2000 bases long.

If any of your results are different from mine (excluding ids), then I
think your database has a problem.
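
As a rough cross-check on the fdna query, the expected row count can be
worked out without touching the database. The sketch below assumes that
fdna stores sequence in fixed 2000-base chunks whose foffset values are
multiples of 2000; the helper names expected_offsets and missing_offsets
are made up for illustration and are not part of Bio::DB::GFF.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: chunks are 2000 bases long and start at multiples of 2000.
my $CHUNK = 2000;

# Hypothetical helper: list the foffset values expected in the
# inclusive range [$lo, $hi].
sub expected_offsets {
    my ($lo, $hi) = @_;
    # Round $lo up to the next chunk boundary.
    my $first = int(($lo + $CHUNK - 1) / $CHUNK) * $CHUNK;
    my @offsets;
    for (my $o = $first; $o <= $hi; $o += $CHUNK) {
        push @offsets, $o;
    }
    return @offsets;
}

# Hypothetical helper: list the expected offsets that are absent from
# the foffset values actually returned by the query.
sub missing_offsets {
    my ($lo, $hi, @found) = @_;
    my %seen = map { $_ => 1 } @found;
    return grep { !$seen{$_} } expected_offsets($lo, $hi);
}

my @expected = expected_offsets(17_414_000, 17_430_000);
print scalar(@expected), " chunks expected\n";
print "$_\n" for @expected;
```

Passing the foffset values from your own query result to
missing_offsets(17_414_000, 17_430_000, @found) would list any chunk
that is absent under these assumptions.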

Scott



On Thu, 2003-07-03 at 15:56, Venky Nandagopal wrote:
> Scott,
> 
> Thanks for the reply. I should have been more careful with my email -- my 
> script actually has the following lines
> 
> @genes = $db->get_feature_by_name("Gene" => $gene_id);
> print $genes[0]->seq;
> 
> The script works for every other CG number I've tried --- it only fails for
> CG6667, which makes me think that there must be something weird going on
> with Bio::DB::GFF, not the script.
> 
> Venky
> 
> 
> 
> On 03 Jul 2003 13:46:08 -0400, Scott Cain <cain at cshl.org> wrote:
> 
> > Venky,
> >
> > It is not clear to me why it is necessary here, but you need to
> > specify the class of the object, in this case 'Gene'.
> >
> > Here is an example script that works for me:
> >
> > #!/usr/bin/perl
> > use strict;
> > use Bio::DB::GFF;
> > my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
> >                            -dsn     => 'fly');
> > my @genes = $db->get_feature_by_name(-class=>'Gene',-name=>'CG6665');
> > print $genes[0]->seq,"\n";
> >
> > Scott
> >
> > On Thu, 2003-07-03 at 11:58, bioperl-l-request at portal.open-bio.org
> > wrote:
> >> Message: 12
> >> Date: Thu, 03 Jul 2003 01:43:57 -0700
> >> From: Venky Nandagopal <venky at OCF.Berkeley.EDU>
> >> Subject: [Bioperl-l] Bio::DB::GFF trouble + Blast question
> >> To: bioperl-l at portal.open-bio.org
> >> Message-ID: <oprrp7vjpxqwe008 at mail.ocf.berkeley.edu>
> >> Content-Type: text/plain; charset=utf-8; format=flowed
> >>
> >> Hi,
> >>
> >> I have a couple of problems: (1) I use a database created using 
> >> process_gadfly.pl to access the D.mel genome, via Bio::DB::GFF. I have a 
> >> utility script that returns the sequence of a gene given the CG number, 
> >> using @genes = get_feature_by_name(CG####); print $genes[0]->seq;
> >> This script seems to work fine for most CG numbers, except for CG6667, 
> >> which is the ID for the dorsal gene. For some reason, no sequence is 
> >> returned by the seq() method. The gene object is not undefined though, 
> >> since $genes[0]->asString returns "gene:gadfly(CG6667)"; similarly the 
> >> start, end, strand methods work fine. I have tried getting transcripts 
> >> instead of the gene etc etc, but CG6667 refuses to yield any sequence. 
> >> Can anyone provide an explanation for this?
> >>
> >>
> >> (2) This is not directly connected to Bioperl, but BLAST reports sometimes
> >> provide Expect values in the form "Expect(3)=0.0". What does the 3 refer 
> >> to? Sometimes it says "Expect(7+)=1e-23".
> >>
> >>
> >> Thanks
> >> Venky
> >
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


