[Bioperl-l] [BioSQL-l] Problem loading GO.

Leighton Pritchard lpritc at scri.ac.uk
Tue Apr 17 13:35:44 UTC 2007


Hi Hilmar, 

Thanks for the very quick response. Apologies for the long reply, but I
thought it might be useful in case anyone else runs into the same
problems I did.

On Tue, 2007-04-17 at 00:00 -0400, Hilmar Lapp wrote:
> Apparently the parser
> fails to parse out database and accession for this db_xref of term
> GO:0018901.
> 
> If you can edit the obo file, you can try deleting the db_xref(s) for  
> that term that look odd (or delete all if you don't need them).

You're spot on - see further down for details...

> Note that the argument for --fmtargs here should read
> "-defs_file,/path/to/Downloads/GO.defs". (Note that within the quotes  
> there is no tilde expansion.)

D'oh!  Thanks for the note - my bad, there.
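If the loader is driven from a script, one way to sidestep the tilde pitfall is to expand the path before building the --fmtargs string. This is a minimal Python sketch; the paths are placeholders, the option names are copied from the command used in this thread, and the script only assembles the argument list rather than running the loader.

```python
import os


def build_fmtargs(defs_path: str) -> str:
    """Return a --fmtargs value with '~' already expanded.

    The shell does not expand '~' inside quotes, so we do it ourselves.
    """
    return "-defs_file," + os.path.expanduser(defs_path)


# Placeholder paths; option names taken from the bp_load_ontology.pl command in this thread.
argv = [
    "bp_load_ontology.pl",
    "--format", "goflat",
    "--fmtargs", build_fmtargs("~/Downloads/GO.defs"),
    os.path.expanduser("~/Downloads/function.ontology"),
]
print(argv)
```

Because argv is a list, it could be handed to subprocess.run(argv) without any shell quoting at all.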

> This is one of the things why you've got to love MySQL (and am I correct
> in inferring that you're using MySQL?). 

The 'choice' was forced upon me ;)

> It may be necessary to widen the length of dbname.accession here, for  
> example to 80 chars? Let me know if you need help with the DDL  
> command to do this.

I've fixed that now (and added it to my local biosqldb-mysql.sql
schema), but with a clean BioSQL schema and using:

[lpritc@lplinuxdev sql]$ bp_load_ontology.pl --host localhost --dbname biosql \
    --namespace "Gene Ontology" --dbuser lpritc --dbpass ******** \
    --format goflat --fmtargs "-defs_file,/home/lpritc/Downloads/GO.defs" \
    /home/lpritc/Downloads/function.ontology

I was still getting errors with the GO flatfile:

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were ("","","0","") FKs ()
Column 'dbname' cannot be null
---------------------------------------------------
Could not store term GO:0047554, name '2-pyrone-4,6-dicarboxylate lactonase activity':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::BioSQL::TermAdaptor::store_children /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:293
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term',
'Bio::Ontology::GOterm=HASH(0x88497a4)', '-db',
'Bio::DB::BioSQL::DBAdaptor=HASH(0x897f074)', '-termfactory',
'Bio::Ontology::TermFactory=HASH(0x8d64ad8)', '-throw',
'CODE(0x851abc8)', '-mergeobs', ...) called
at /usr/bin/bp_load_ontology.pl line 610

I tracked this down to what looks like poor formatting in the GO.defs
file (note that the first and third definition_reference lines appear
to be two halves of the same entry):

term: 2-pyrone-4,6-dicarboxylate lactonase activity
goid: GO:0047554
definition: Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O = 4-carboxy-2-hydroxyhexa-2,4-dienedioate.
definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN
definition_reference: EC:3.1.1.57
definition_reference: MetaCyc:2-PYRONE-4

I found 43 similar errors for other GOIDs. They appear to result from an
escaped comma (the string "\,") inside a dbxref, mostly in MetaCyc
entries but also in some UM-BBD_pathwayID entries.
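To find every affected GOID rather than hitting them one load at a time, a quick scan of GO.defs for references with an empty database or accession part might help. This is a minimal Python sketch, assuming the layout quoted above: blank-line separated stanzas of "key: value" lines, with the goid line appearing before the stanza's definition_reference lines. Note that it only catches the half with the missing database (":6-..."); the truncated "MetaCyc:2-PYRONE-4" half still looks like a valid db:accession pair and has to be fixed by eye.

```python
def find_bad_references(text: str) -> list[tuple[str, str]]:
    """Return (goid, reference) pairs whose database or accession part is empty."""
    bad: list[tuple[str, str]] = []
    goid = "?"
    for line in text.splitlines():
        if not line.strip():
            goid = "?"  # a blank line ends the current stanza
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # wrapped continuation text, not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "goid":
            goid = value
        elif key == "definition_reference":
            db, sep, acc = value.partition(":")
            if not sep or not db.strip() or not acc.strip():
                bad.append((goid, value))
    return bad


sample = """term: 2-pyrone-4,6-dicarboxylate lactonase activity
goid: GO:0047554
definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN
definition_reference: EC:3.1.1.57
definition_reference: MetaCyc:2-PYRONE-4
"""
print(find_bad_references(sample))
```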

These errors appear to have carried through into the generated OBO
format files as well, e.g.:

def: "Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O = 4-carboxy-2-hydroxyhexa-2,4-dienedioate." [:6-DICARBOXYLATE-LACTONASE-RXN, EC:3.1.1.57, MetaCyc:2-PYRONE-4]

so it is something for the GO guys to fix, I guess.
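For comparison, an escaped comma only survives if the dbxref list is split on unescaped commas. This Python sketch assumes a one-line OBO definition of the form def: "text" [xref, xref, ...] in which a literal comma inside a dbxref is written as "\,". It is an illustration of escape-aware splitting, not the parser used by BioPerl or by the GO tools.

```python
import re

# Assumed shape of a one-line OBO definition: def: "text" [xref, xref, ...]
DEF_RE = re.compile(r'^def:\s*"((?:[^"\\]|\\.)*)"\s*\[(.*)\]')


def split_dbxrefs(xref_list: str) -> list[str]:
    """Split on commas not preceded by a backslash, dropping the escape."""
    parts: list[str] = []
    current: list[str] = []
    chars = iter(xref_list)
    for ch in chars:
        if ch == "\\":
            # keep the next character literally, so "\," becomes ","
            current.append(next(chars, ""))
        elif ch == ",":
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def parse_def_line(line: str) -> tuple[str, list[str]]:
    """Return (definition text, dbxrefs) for an OBO 'def:' line."""
    m = DEF_RE.match(line)
    if m is None:
        raise ValueError(f"not a def line: {line!r}")
    return m.group(1), split_dbxrefs(m.group(2))


line = r'def: "Catalysis of the reaction." [EC:3.1.1.57, MetaCyc:2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN]'
print(parse_def_line(line))
```

A plain str.split(",") on the same bracket contents would produce three pieces, which is the kind of split visible in the GO.defs entry above.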


After fixing the above, though, another error is thrown (with the same
command as before):

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were ("GO:0006905","vesicle transport","OBSOLETE (was not defined before being made obsolete).","X","") FKs (1)
Duplicate entry 'vesicle transport-1-X' for key 3
---------------------------------------------------
Could not store term GO:0006905, name 'vesicle transport':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term',
'Bio::Ontology::GOterm=HASH(0xbcac418)', '-db',
'Bio::DB::BioSQL::DBAdaptor=HASH(0x957805c)', '-termfactory',
'Bio::Ontology::TermFactory=HASH(0x995db20)', '-throw',
'CODE(0x9113bd0)', '-mergeobs', ...) called
at /usr/bin/bp_load_ontology.pl line 610

There are two terms, GO:0006905 and GO:0005480, that are identical in
the term table except for their GOIDs: both are named "vesicle
transport" and both are obsolete, so they collide on the same unique key:

term: vesicle transport
goid: GO:0005480
definition: OBSOLETE (was not defined before being made obsolete).
definition_reference: GOC:go_curators
comment: This term was made obsolete because it represents a biological
process and not a molecular function. To update annotations, use the
biological process term 'vesicle-mediated transport ; GO:0016192'.

term: vesicle transport
goid: GO:0006905
definition: OBSOLETE (was not defined before being made obsolete).
definition_reference: GOC:go_curators
comment: This term was made obsolete because the meaning of the term is
ambiguous. To update annotations, consider the biological process term
'vesicle-mediated transport ; GO:0016192'.
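Judging from the warning ('vesicle transport-1-X'), the unique key seems to combine the term name, the ontology and the obsolete flag, so any two obsolete terms sharing a name in the same ontology will clash. A pre-load check could group GOIDs by (name, obsolete). This is a Python sketch under two assumptions: stanzas are blank-line separated as in the excerpts above, and an obsolete term is recognised by a definition starting with "OBSOLETE".

```python
from collections import defaultdict
from typing import Iterator


def stanzas(text: str) -> Iterator[dict[str, str]]:
    """Yield blank-line separated GO.defs stanzas as {key: first value}."""
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                yield current
            current = {}
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("term", "goid", "definition"):
            current.setdefault(key.strip(), value.strip())
    if current:
        yield current


def find_name_clashes(text: str) -> dict[tuple[str, bool], list[str]]:
    """Group GOIDs by (name, obsolete?) and keep only groups with more than one GOID."""
    groups: dict[tuple[str, bool], list[str]] = defaultdict(list)
    for st in stanzas(text):
        if "term" in st and "goid" in st:
            # Assumption: obsolete terms carry an "OBSOLETE" definition prefix.
            obsolete = st.get("definition", "").startswith("OBSOLETE")
            groups[(st["term"], obsolete)].append(st["goid"])
    return {k: v for k, v in groups.items() if len(v) > 1}


sample_defs = """term: vesicle transport
goid: GO:0005480
definition: OBSOLETE (was not defined before being made obsolete).

term: vesicle transport
goid: GO:0006905
definition: OBSOLETE (was not defined before being made obsolete).
"""
print(find_name_clashes(sample_defs))
```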

I used the --noobsolete flag to avoid the duplicate-term error
(reasoning that, since I'm populating the database for the first time,
ignoring the obsolete terms won't hurt), but then one last error was
thrown:

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were ("PMID","","0","") FKs ()
Column 'accession' cannot be null
---------------------------------------------------
Could not store term GO:0032933, name 'SREBP-mediated signaling pathway':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::BioSQL::TermAdaptor::store_children /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:293
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term',
'Bio::Ontology::GOterm=HASH(0xbe18f14)', '-db',
'Bio::DB::BioSQL::DBAdaptor=HASH(0x99bbf2c)', '-termfactory',
'Bio::Ontology::TermFactory=HASH(0x9da0ad8)', '-throw',
'CODE(0x9556bb4)', '-mergeobs', ...) called
at /usr/bin/bp_load_ontology.pl line 610

The offending entry was:

term: SREBP-mediated signaling pathway
goid: GO:0032933
definition: A series of molecular signals from the endoplasmic reticulum
to the nucleus generated as a consequence of altered levels of one or
more lipids, and resulting in the activation of transcription by SREBP.
definition_reference: GOC:mah
definition_reference: PMID:0

I commented out the definition_reference for PMID:0 (the warning above
shows the accession arriving as an empty string), which seemed to fix
matters.
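The same workaround can be applied to a whole file before loading. This Python sketch drops, rather than comments out, any definition_reference line whose accession is empty or "0", and reports what it removed. Treating "0" as a placeholder is an assumption based on the PMID:0 case above, and a reference with no ":" separator is also treated as having no accession.

```python
def drop_placeholder_references(text: str) -> tuple[str, list[str]]:
    """Remove definition_reference lines whose accession is empty or '0'."""
    kept: list[str] = []
    dropped: list[str] = []
    for line in text.splitlines(keepends=True):
        key, sep, value = line.partition(":")
        if sep and key.strip() == "definition_reference":
            # "PMID:0" -> database "PMID", accession "0"; no ':' means no accession
            _db, _sep, acc = value.strip().partition(":")
            if acc.strip() in ("", "0"):
                dropped.append(line.rstrip("\r\n"))
                continue
        kept.append(line)
    return "".join(kept), dropped


entry = """term: SREBP-mediated signaling pathway
goid: GO:0032933
definition_reference: GOC:mah
definition_reference: PMID:0
"""
cleaned, removed = drop_placeholder_references(entry)
print(removed)
```

The cleaned text could then be written to a new file and passed as the defs_file in --fmtargs, leaving the original GO.defs untouched.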

The process.ontology and component.ontology files then went into the
database without a hitch.  Thanks again for your help,

L.

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland DD2 5DA
e:lpritc at scri.ac.uk            w:http://bioinf.scri.ac.uk/lp
gpg/pgp: 0xFEFC205C