[Bioperl-l] Error loading GFF3: MSG: xxx doesn't have a primary id ...

Dan Bolser dan.bolser at gmail.com
Fri May 22 15:53:15 UTC 2009


2009/5/22 Scott Cain <scott at scottcain.net>:
> Hi Dan,
>
> There are a few problems:
>
>  * Parent-child relationships aren't allowed to span multiple GFF
> files.  The ID is only used to identify relationships within a
> single GFF3 file, and there is no guarantee that the database you are
> loading into will use them.  While the SFStore database does store them,
> it doesn't use them for anything.
>
>  * I don't think a match_part should have a parent that is a
> cloned_genomic_insert anyway.  match_part terms should have some
> variant of match (EST_match, cross_genome_match, etc.) as their parent,
> or they should have no parent at all and be grouped by a shared ID.
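
A minimal sketch of the first rule above, written in Python rather than BioPerl and not a description of how GFF3Loader works internally: under the GFF3 specification an ID only has meaning inside the file that defines it, so every Parent value has to match an ID defined in that same file. The function names `parse_attributes` and `dangling_parents` are hypothetical. Run over the BLAST hits file on its own, it would flag Parent=C04HBa0002B09.1, because that ID is defined only in the clones file.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 column 9 string into a dict of tag -> list of values."""
    attrs = {}
    if column9.strip() in ("", "."):
        return attrs
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        tag, _, value = pair.partition("=")
        # Multiple values are comma-separated; each one is URL-escaped.
        attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attrs


def dangling_parents(gff_lines):
    """Return (line_number, parent_id) for every Parent value that has
    no matching ID anywhere in the same file."""
    ids = set()
    refs = []
    for n, line in enumerate(gff_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        ids.update(attrs.get("ID", []))
        for parent in attrs.get("Parent", []):
            refs.append((n, parent))
    # Check only after the whole file is read, since a Parent may
    # legally appear before the line that defines its ID.
    return [(n, p) for n, p in refs if p not in ids]
```

Concatenating the two files would make the check pass, but as Scott notes, the better fix is to stop using Parent to point at the clones at all.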

Ah... I was using Parent incorrectly (a remnant of 'remapping' my HSPs
from the clone level to the chromosomal level). Now, instead of Parent,
I set "ID=HIT:MyCloneID" (where "MyCloneID" identifies the underlying
clone for this HSP), and the GFF loads without error.
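
The fix above can be sketched as a line-by-line rewrite: each match_part's Parent=CLONE becomes ID=HIT:CLONE, so all HSPs from the same clone share one ID. In GFF3, lines that share an ID are treated as parts of a single feature, which is the shared-ID grouping Scott describes. `parent_to_shared_id` is a hypothetical Python helper, not part of BioPerl, and it assumes each line carries exactly one Parent value and no existing ID tag.

```python
def parent_to_shared_id(line, prefix="HIT:"):
    """Rewrite Parent=X as ID=<prefix>X on one GFF3 feature line.

    Hypothetical helper: assumes one Parent value and no existing ID tag.
    Comment lines, blank lines and non-9-column lines pass through unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line
    rewritten = []
    for pair in cols[8].split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        tag, _, value = pair.partition("=")
        if tag == "ID":
            raise ValueError("line already has an ID; refusing to add a second one")
        if tag == "Parent":
            if "," in value:
                raise ValueError("multiple Parent values are not handled here")
            # HSPs from the same clone now share this ID and are grouped together.
            rewritten.append("ID=" + prefix + value)
        else:
            rewritten.append(pair)
    cols[8] = ";".join(rewritten)
    return "\t".join(cols) + "\n"
```

To rewrite a whole file, apply it to every line, for example `dst.writelines(parent_to_shared_id(l) for l in src)` with `src` and `dst` opened on the input and output paths.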

Now back in GBrowse I just need to work out why HSPs are being grouped
by Target (which is really cool!) and how to get a clickable HSP
density plot in the overview panel ;-) Progress!


Thanks for the help debugging my numerous mistakes.

All the best,
Dan.

> Scott
>
>
> On Fri, May 22, 2009 at 7:38 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
>> Hi,
>>
>> I'm using Bio::DB::SeqFeature::Store::GFF3Loader to load GFF into a
>> Bio::DB::SeqFeature::Store database.
>>
>> I first load in a set of 'clones' in a GFF file that looks like this...
>>
>> S.lycopersicum-chr4     SGN:chr04.v14.agp       cloned_genomic_insert   7400895 7558294 .       -       .       ID=C04SLm0125H12.1;Alias=89;Ontology_term=SO:0000914
>> S.lycopersicum-chr4     SGN:chr04.v14.agp       cloned_genomic_insert   7558295 7620759 .       +       .       ID=C04HBa0002B09.1;Alias=90;Ontology_term=SO:0000914
>> S.lycopersicum-chr4     SGN:chr04.v14.agp       cloned_genomic_insert   7670760 7801908 .       +       .       ID=C04HBa0077O05.2;Alias=92;Ontology_term=SO:0000914
>>
>>
>> And then I load a bunch of Blast hits from those clones in a GFF file
>> that looks like this...
>>
>> S.lycopersicum-chr4     BLASTN  match_part      14263569        14263620        56.0    -       0       Target=BAC10.Contig16 314 365;score=56.0;Parent=C04HBa0107N23.1
>> S.lycopersicum-chr4     BLASTN  match_part      7565714 7565734 42.1    +       0       Target=BAC10.Contig16 199 219;score=42.1;Parent=C04HBa0002B09.1
>> S.lycopersicum-chr4     BLASTN  match_part      4309103 4309134 48.1    -       0       Target=BAC10.Contig18 1704 1735;score=48.1;Parent=C04HBa0308B07.2
>>
>>
>> I'm not 100% sure I got the "tags" part of the latter GFF correct.
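
On the "tags" question: in GFF3, column 9 is a list of tag=value pairs separated by semicolons. The reserved Target tag takes the form `target_id start end [strand]`, with any spaces inside target_id escaped as %20, and tags starting with a lowercase letter (such as `score` here) are free for application use. A minimal Python sketch of reading a Target value, using the hypothetical helper names `parse_target` and `target_from_column9`:

```python
from urllib.parse import unquote


def parse_target(value):
    """Parse a GFF3 Target value 'target_id start end [strand]'.

    Hypothetical helper; returns (target_id, start, end, strand_or_None).
    """
    fields = value.split(" ")
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 space-separated fields in {value!r}")
    target_id = unquote(fields[0])  # spaces in the ID itself are written as %20
    start, end = int(fields[1]), int(fields[2])
    strand = fields[3] if len(fields) == 4 else None
    if strand not in (None, "+", "-"):
        raise ValueError(f"Target strand must be '+' or '-', got {strand!r}")
    return target_id, start, end, strand


def target_from_column9(column9):
    """Find the Target tag in a column 9 string and parse it (None if absent)."""
    for pair in column9.split(";"):
        tag, _, value = pair.partition("=")
        if tag == "Target":
            # Pass the raw value so %20 is decoded only inside target_id,
            # after the space-separated fields have been split.
            return parse_target(value)
    return None
```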
>>
>>
>> I'm getting the following error loading the second GFF file:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: C04HBa0002B09.1 doesn't have a primary id
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw ~/perl5/lib/perl5/Bio/Root/Root.pm:368
>> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/GFF3Loader.pm:685
>> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/GFF3Loader.pm:664
>> STACK: Bio::DB::SeqFeature::Store::GFF3Loader::finish_load ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/GFF3Loader.pm:318
>> STACK: Bio::DB::SeqFeature::Store::Loader::load_fh ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/Loader.pm:325
>> STACK: Bio::DB::SeqFeature::Store::Loader::load ~/perl5/lib/perl5/Bio/DB/SeqFeature/Store/Loader.pm:222
>> STACK: ~/BiO/Util/my_seqfeature_load.plx:44
>> -----------------------------------------------------------
>>
>>
>> As you can see, the ID C04HBa0002B09.1 (from the Parent tag of the
>> second GFF file) *does* exist in the first GFF file.
>>
>>
>> The features are apparently loaded correctly, and calling 'reindex' on
>> the database seems to run without error. I tried to look into the
>> above code, but I'm confused by all the calls to the Load 'Helper'.
>>
>> a) Is this a problem with my GFF?
>> b) Is this important? (The features are apparently loaded.)
>> c) Can you fix it? ;-)
>>
>>
>> Thanks for any tips,
>> Dan.
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>
