[Bioperl-l] Error loading GFF3: MSG: xxx doesn't have a primary id ...
Scott Cain
scott at scottcain.net
Fri May 22 13:25:20 UTC 2009
Hi Dan,
There are a few problems:
* Parent-child relationships aren't allowed to span multiple GFF
files. The ID only identifies relationships within a single GFF3
file, and there is no guarantee that the database you are loading
into will use it. While the SFStore database does store IDs, it
doesn't use them for anything.
* I don't think a match_part should have a parent that is a
cloned_genomic_insert anyway. match_part features should have some
variant of match (EST_match, cross_genome_match, etc.) as their parent,
or they should have no parent at all and be grouped by a shared ID.
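
To check for the first problem before loading, something like the
following could be run over each GFF3 file. This is only a minimal
Python sketch, not part of BioPerl: the function name
unresolved_parents is made up for illustration, and it assumes
tab-separated, nine-column GFF3 lines. It reports every Parent value
that has no matching ID in the same input.

```python
def unresolved_parents(gff_lines):
    """Return sorted Parent values with no matching ID in the same GFF3 input.

    Hypothetical helper for illustration; assumes tab-separated GFF3.
    """
    ids, parents = set(), set()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line (e.g. sequence data)
        for pair in cols[8].split(";"):
            key, sep, value = pair.partition("=")
            if not sep:
                continue  # malformed attribute, ignore it
            if key == "ID":
                ids.add(value)
            elif key == "Parent":
                # A feature may list several parents, comma-separated.
                parents.update(value.split(","))
    return sorted(parents - ids)


# One clone and one BLAST hit taken from the files quoted below.
clones = ["\t".join([
    "S.lycopersicum-chr4", "SGN:chr04.v14.agp", "cloned_genomic_insert",
    "7558295", "7620759", ".", "+", ".",
    "ID=C04HBa0002B09.1;Alias=90;Ontology_term=SO:0000914",
])]
hits = ["\t".join([
    "S.lycopersicum-chr4", "BLASTN", "match_part",
    "7565714", "7565734", "42.1", "+", "0",
    "Target=BAC10.Contig16 199 219;score=42.1;Parent=C04HBa0002B09.1",
])]

print(unresolved_parents(hits))           # ['C04HBa0002B09.1']
print(unresolved_parents(clones + hits))  # []
```

The hits file on its own has a dangling Parent, which is the situation
the loader is complaining about. Putting both sets of features in one
file resolves the reference, although the second point above still
applies.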
Scott
On Fri, May 22, 2009 at 7:38 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
> Hi,
>
> I'm using Bio::DB::SeqFeature::Store::GFF3Loader to load GFF into a
> DB::SeqFeature::Store database.
>
> I first load in a set of 'clones' in a GFF file that looks like this...
>
> S.lycopersicum-chr4 SGN:chr04.v14.agp cloned_genomic_insert 7400895 7558294 . - . ID=C04SLm0125H12.1;Alias=89;Ontology_term=SO:0000914
> S.lycopersicum-chr4 SGN:chr04.v14.agp cloned_genomic_insert 7558295 7620759 . + . ID=C04HBa0002B09.1;Alias=90;Ontology_term=SO:0000914
> S.lycopersicum-chr4 SGN:chr04.v14.agp cloned_genomic_insert 7670760 7801908 . + . ID=C04HBa0077O05.2;Alias=92;Ontology_term=SO:0000914
>
>
> And then I load a bunch of Blast hits from those clones in a GFF file
> that looks like this...
>
> S.lycopersicum-chr4 BLASTN match_part 14263569 14263620 56.0 - 0 Target=BAC10.Contig16 314 365;score=56.0;Parent=C04HBa0107N23.1
> S.lycopersicum-chr4 BLASTN match_part 7565714 7565734 42.1 + 0 Target=BAC10.Contig16 199 219;score=42.1;Parent=C04HBa0002B09.1
> S.lycopersicum-chr4 BLASTN match_part 4309103 4309134 48.1 - 0 Target=BAC10.Contig18 1704 1735;score=48.1;Parent=C04HBa0308B07.2
>
>
> I'm not 100% sure I got the "tags" part of the latter GFF correct.
>
>
> I'm getting the following error loading the second GFF file:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: C04HBa0002B09.1 doesn't have a primary id
> STACK: Error::throw
> STACK: Bio::Root::Root::throw ~/perl5/lib/perl5/Bio/Root/Root.pm:368
> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables
> ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/GFF3Loader.pm:685
> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree
> ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/GFF3Loader.pm:664
> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::finish_load
> ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/GFF3Loader.pm:318
> STACK: Bio::DB::SeqFeature::Store::Loader::load_fh
> ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/Loader.pm:325
> STACK: Bio::DB::SeqFeature::Store::Loader::load
> ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/Loader.pm:222
> STACK: ~/BiO/Util/my_seqfeature_load.plx:44
> -----------------------------------------------------------
>
>
> As you can see the ID C04HBa0002B09.1 (from the Parent tag of the
> second GFF) *does* exist in the first GFF.
>
>
> The features are apparently loaded correctly, and calling 'reindex' on
> the database seems to run without error. I tried to look into the
> above code, but I'm confused by all the calls to the Load 'Helper'.
>
> a) is this the problem of my GFF?
> b) is this important? (the features are apparently loaded)
> c) can you fix it? ;-)
>
>
> Thanks for any tips,
> Dan.
--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research