[Bioperl-l] Re: load_gff.pl question

Scott Cain cain at cshl.org
Wed Aug 6 15:21:17 EDT 2003


Shin,

The problem you are running into is not really with load_gff.pl, but
with the database schema.  Assuming you are using MySQL, the table
create statement for fdata looks like this:

 create table fdata (
    fid                 int not null  auto_increment,
    fref                varchar(100) not null,
    fstart              int unsigned   not null,
    fstop               int unsigned   not null,
    fbin                double(20,6)  not null,
    ftypeid             int not null,
    fscore              float,
    fstrand             enum('+','-'),
    fphase              enum('0','1','2'),
    gid                 int not null,
    ftarget_start       int unsigned,
    ftarget_stop        int unsigned,
    primary key(fid),
    unique index(fref,fbin,fstart,fstop,ftypeid,gid),
    index(ftypeid),
    index(gid)
 );

The problem you have is with the unique index on
(fref,fbin,fstart,fstop,ftypeid,gid).  This index conflicts with your
data: the similar lines are getting assigned the same gid (group id),
since they look like the same thing, and two rows that agree on all six
indexed columns cannot both be stored.  The quick way to fix this is to
remove the 'unique' from the index declaration, which can be found in
Bio/DB/GFF/Adaptor/dbi/mysql.pm, and then run load_gff.pl as usual.  The
longer way to fix this is to look at your data, figure out why the lines
are all getting assigned the same group id, and make them sufficiently
different so that they don't.
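
To make the conflict concrete, here is a rough sketch of what happens
with and without 'unique' on that index.  It is only an illustration
under some assumptions: it uses SQLite through Python rather than MySQL
so that it runs standalone, the table is trimmed to the indexed columns,
and the index name (fdata_key), the load() helper and the sample rows
are all made up for the example.  It is not what load_gff.pl itself does.

```python
import sqlite3

# Hypothetical sample rows: (fref, fstart, fstop, fbin, ftypeid, gid).
# The first two are identical in every indexed column; the third differs in gid.
rows = [
    ("chr1", 100, 200, 1000.0, 1, 7),
    ("chr1", 100, 200, 1000.0, 1, 7),
    ("chr1", 100, 200, 1000.0, 1, 8),
]

def load(rows, unique):
    """Insert rows into a trimmed-down fdata table; return (loaded, rejected)."""
    conn = sqlite3.connect(":memory:")
    # Assumption: only the indexed columns are kept, and SQLite types
    # stand in for the MySQL ones.
    conn.execute(
        """create table fdata (
               fid     integer primary key autoincrement,
               fref    varchar(100) not null,
               fstart  int not null,
               fstop   int not null,
               fbin    double not null,
               ftypeid int not null,
               gid     int not null
           )"""
    )
    kind = "unique index" if unique else "index"
    conn.execute(
        f"create {kind} fdata_key on fdata (fref,fbin,fstart,fstop,ftypeid,gid)"
    )
    loaded = rejected = 0
    for row in rows:
        try:
            conn.execute(
                "insert into fdata (fref,fstart,fstop,fbin,ftypeid,gid) "
                "values (?,?,?,?,?,?)",
                row,
            )
            loaded += 1
        except sqlite3.IntegrityError:
            # A row with the same six indexed values already exists.
            rejected += 1
    conn.close()
    return loaded, rejected

print("with unique index:   ", load(rows, unique=True))
print("without unique index:", load(rows, unique=False))
```

With the unique index the second row is refused because it matches the
first in all six columns; with a plain index all three rows go in.  The
third row shows the "longer way": once the gid differs, the unique index
no longer objects.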

Hope that helps,
Scott

On Wed, 2003-08-06 at 13:31, bioperl-l-request at portal.open-bio.org
wrote:
> Where do I start to customize this script to allow loading of large 
> number of similar entities?

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


