[Bioperl-l] Error loading GFF3: MSG: xxx doesn't have a primary id ...

Scott Cain scott at scottcain.net
Fri May 22 13:25:20 UTC 2009


Hi Dan,

There are a few problems:

  * Parent-child relationships aren't allowed to span multiple GFF
files.  An ID only identifies relationships within a single GFF3
file, and there is no guarantee that the database you are loading
into will use it.  While the Bio::DB::SeqFeature::Store database does
store IDs, it doesn't use them for anything.

  * I don't think a match_part should have a parent that is a
cloned_genomic_insert anyway.  match_part terms should have some
variant of match (EST_match, cross_genome_match, etc) as their parent,
or they should have no parent at all and be grouped by a shared ID.
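
To make the first point concrete, here is a minimal sketch (in Python,
not part of BioPerl) that lists Parent values with no matching ID in
the same GFF3 text.  The function names are made up for illustration,
and it assumes tab-separated columns and ignores URL-escaping in
attribute values.  It only checks that parents resolve within one
file; it says nothing about whether the parent's type is appropriate.

```python
def parse_attributes(column9):
    """Turn 'ID=x;Parent=a,b' into {'ID': ['x'], 'Parent': ['a', 'b']}."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        attrs[key.strip()] = [v for v in value.split(",") if v]
    return attrs


def unresolved_parents(gff3_text):
    """Return the sorted Parent values that no feature in this text defines as ID."""
    ids = set()
    parents = set()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        attrs = parse_attributes(columns[8])
        ids.update(attrs.get("ID", []))
        parents.update(attrs.get("Parent", []))
    return sorted(parents - ids)


# One clone and one hit taken from the files quoted below.
clones = (
    "S.lycopersicum-chr4\tSGN:chr04.v14.agp\tcloned_genomic_insert\t"
    "7558295\t7620759\t.\t+\t.\t"
    "ID=C04HBa0002B09.1;Alias=90;Ontology_term=SO:0000914\n"
)
hits = (
    "S.lycopersicum-chr4\tBLASTN\tmatch_part\t7565714\t7565734\t42.1\t+\t0\t"
    "Target=BAC10.Contig16 199 219;score=42.1;Parent=C04HBa0002B09.1\n"
)

print(unresolved_parents(hits))           # ['C04HBa0002B09.1']
print(unresolved_parents(clones + hits))  # []
```

Run on the hits file alone, the parent is reported as unresolved,
which matches the situation the loader complains about; with both
sets of lines in one text, the reference resolves.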

Scott


On Fri, May 22, 2009 at 7:38 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
> Hi,
>
> I'm using Bio::DB::SeqFeature::Store::GFF3Loader to load GFF into a
> Bio::DB::SeqFeature::Store database.
>
> I first load in a set of 'clones' in a GFF file that looks like this...
>
> S.lycopersicum-chr4     SGN:chr04.v14.agp       cloned_genomic_insert   7400895 7558294 .       -       .       ID=C04SLm0125H12.1;Alias=89;Ontology_term=SO:0000914
> S.lycopersicum-chr4     SGN:chr04.v14.agp       cloned_genomic_insert   7558295 7620759 .       +       .       ID=C04HBa0002B09.1;Alias=90;Ontology_term=SO:0000914
> S.lycopersicum-chr4     SGN:chr04.v14.agp       cloned_genomic_insert   7670760 7801908 .       +       .       ID=C04HBa0077O05.2;Alias=92;Ontology_term=SO:0000914
>
>
> And then I load a bunch of Blast hits from those clones in a GFF file
> that looks like this...
>
> S.lycopersicum-chr4     BLASTN  match_part      14263569        14263620        56.0    -       0       Target=BAC10.Contig16 314 365;score=56.0;Parent=C04HBa0107N23.1
> S.lycopersicum-chr4     BLASTN  match_part      7565714 7565734 42.1    +       0       Target=BAC10.Contig16 199 219;score=42.1;Parent=C04HBa0002B09.1
> S.lycopersicum-chr4     BLASTN  match_part      4309103 4309134 48.1    -       0       Target=BAC10.Contig18 1704 1735;score=48.1;Parent=C04HBa0308B07.2
>
>
> I'm not 100% sure I got the "tags" part of the latter GFF correct.
>
>
> I'm getting the following error loading the second GFF file:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: C04HBa0002B09.1 doesn't have a primary id
> STACK: Error::throw
> STACK: Bio::Root::Root::throw ~/perl5/lib/perl5/Bio/Root/Root.pm:368
> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables
> ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/GFF3Loader.pm:685
> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree
> ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/GFF3Loader.pm:664
> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::finish_load
> ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/GFF3Loader.pm:318
> STACK: Bio::DB::SeqFeature::Store::Loader::load_fh
> ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/Loader.pm:325
> STACK: Bio::DB::SeqFeature::Store::Loader::load
> ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/Loader.pm:222
> STACK: ~/BiO/Util/my_seqfeature_load.plx:44
> -----------------------------------------------------------
>
>
> As you can see the ID C04HBa0002B09.1 (from the Parent tag of the
> second GFF) *does* exist in the first GFF.
>
>
> The features are apparently loaded correctly, and calling 'reindex' on
> the database seems to run without error. I tried to look into the
> above code, but I'm confused by all the calls to the Load 'Helper'.
>
> a) is this a problem with my GFF?
> b) is this important? (the features are apparently loaded)
> c) can you fix it? ;-)
>
>
> Thanks for any tips,
> Dan.



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research



