[Bioperl-l] [BioSQL-l] Problem loading GO.

Hilmar Lapp hlapp at gmx.net
Tue Apr 17 04:00:55 UTC 2007


Hi Leighton, please see below:

On Apr 16, 2007, at 11:55 AM, Leighton Pritchard wrote:

> Hi,
>
> I've been trying to upload the GO into a clean BioSQL (MySQL, 1.4.1)
> schema using the BioPerl bp_load_ontology.pl script, with the OBOv1.0,
> OBOv1.2, and the most recent flatfiles from
> http://www.geneontology.org/GO.downloads.ontology.shtml - none of my
> attempts have been successful.  The errors below are from a Linux
> installation, but the same errors are thrown on OS X, too.  I am using
> the most recent versions of BioPerl and bioperl-db, installed via CPAN:
>
> [lpritc@lplinuxdev sequence_data]$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.005002102
>
> and bioperl-db 1.5.2.
>
> I have attached the traceback below (running with --safe throws a number of equivalent errors),

Using --safe will throw the same errors, but will continue loading.  
I.e., you'd lose the one term, but keep everything else.

I do realize that, especially for a graph, losing an internal node can be quite detrimental.

> [...]
> ########
>
> [lpritc@lplinuxdev sequence_data]$ bp_load_ontology.pl --host localhost --dbname biosql --namespace "Gene Ontology" --dbuser lpritc --dbpass ******** --format obo ~/Downloads/gene_ontology_edit.obo
> Loading ontology gene_ontology:
>         ... terms
>         ... relationships
>         Done with gene_ontology.
> Loading ontology biological_process:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were ("","","0","") FKs ()
> Column 'dbname' cannot be null
> ---------------------------------------------------

This would point to a problem with the BioPerl OBO parser. According to the message, both the database name and the accession of the db_xref for the term are, surely erroneously, empty. Apparently the parser fails to parse out the database and accession for this db_xref of term GO:0018901.

If you can edit the obo file, you can try deleting that term's db_xref(s) that look odd (or all of them, if you don't need them).

I'd have to debug the obo parser to see exactly where it goes wrong.
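If you'd rather not do that edit by hand, here is a minimal awk sketch. It assumes (I haven't checked this against the current GO file) that the term's db_xrefs are the lines whose tag starts with "xref" (e.g. "xref:" or "xref_analog:") inside its [Term] stanza; xrefs listed in brackets after the def: text are left alone. The function name strip_term_xrefs is made up for illustration.

```shell
# Hypothetical helper: print an OBO file with the xref lines of one term removed.
# $1 = term id (e.g. GO:0018901), $2 = input .obo file; output goes to stdout.
strip_term_xrefs() {
    awk -v term="$1" '
        /^\[/                 { in_term = 0 }  # any stanza header ends the previous stanza
        $0 == ("id: " term)   { in_term = 1 }  # now inside the stanza of the target term
        in_term && /^xref/    { next }         # drop xref:, xref_analog:, ... lines there
                              { print }        # pass every other line through unchanged
    ' "$2"
}
```

For example, strip_term_xrefs GO:0018901 ~/Downloads/gene_ontology_edit.obo > ~/Downloads/go_fixed.obo, and then load go_fixed.obo instead. Only that one stanza is touched, so the rest of the ontology loads as before.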

> Could not store term GO:0018901, name '2,4-dichlorophenoxyacetic acid metabolic process':
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> [...]
> [lpritc@lplinuxdev sequence_data]$ bp_load_ontology.pl --host localhost --dbname biosql --namespace "Gene Ontology" --dbuser lpritc --dbpass ******** --format goflat --fmtargs ~/Downloads/GO.defs

Note that the argument for --fmtargs here should read "-defs_file,/path/to/Downloads/GO.defs". (Within the quotes there is no tilde expansion, so spell out the full path.)
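A minimal sketch of building that argument, assuming a POSIX shell and that GO.defs sits in ~/Downloads as in your command. Because "~" is not expanded inside quotes, the path is built from $HOME. The variable name FMTARGS is only for illustration, and the full command is left commented out since it needs your database.

```shell
# "~" inside quotes stays literal, so spell the defs file path with $HOME.
FMTARGS="-defs_file,$HOME/Downloads/GO.defs"
printf '%s\n' "$FMTARGS"

# Full invocation (connection options as in your report); uncomment to run:
# bp_load_ontology.pl --host localhost --dbname biosql \
#     --namespace "Gene Ontology" --dbuser lpritc --dbpass '********' \
#     --format goflat --fmtargs "$FMTARGS" ~/Downloads/function.ontology
```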

> ~/Downloads/function.ontology
> Loading ontology Gene Ontology:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were ("MetaCyc","2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN","0","") FKs ()
> Duplicate entry '2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX-MetaCyc-0' for key 2
> ---------------------------------------------------

This is one of those things you've got to love about MySQL (and am I correct in inferring that you're using MySQL?). The dbxref.accession column (which holds the second value in parentheses) is 40 chars wide, whereas "2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN" is 41 chars long, so loading it should have resulted in an exception. Instead, MySQL simply and silently truncates it to 40 chars. The duplicate entry in the message is the table's unique key, i.e., accession, dbname, and version joined by hyphens: the truncated accession "2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX" followed by "-MetaCyc-0". So a row whose accession matches those first 40 chars is already in the table.
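To make the arithmetic concrete, a small POSIX-shell check. It takes the accession exactly as printed in the warning (backslashes included) and assumes the unique key is (accession, dbname, version), which matches the order in the duplicate-entry message.

```shell
# Accession as printed in the warning; single quotes keep the backslashes literal.
acc='2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN'
printf 'length: %s\n' "${#acc}"

# Keep only the first 40 characters, as a VARCHAR(40) column silently would.
truncated=$(printf '%.40s' "$acc")

# Join accession, dbname and version with hyphens, like MySQL's key message.
printf '%s\n' "$truncated-MetaCyc-0"
```

This prints "length: 41" followed by exactly the key from the "Duplicate entry" line.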

It may be necessary to widen the dbxref.accession column here, for example to 80 chars. Let me know if you need help with the DDL command to do this.
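For reference, a sketch of that DDL. It assumes the stock MySQL BioSQL definition of the column (VARCHAR(40) BINARY NOT NULL). Please compare with the output of SHOW CREATE TABLE dbxref first, because MODIFY replaces the whole column definition. Rows that were already truncated stay truncated, so it's cleanest to widen the column in a fresh schema before loading.

```shell
# Assumed column definition; check it against SHOW CREATE TABLE dbxref.
WIDEN_SQL='ALTER TABLE dbxref MODIFY accession VARCHAR(80) BINARY NOT NULL;'
printf '%s\n' "$WIDEN_SQL"

# To apply it (user and database from your command; -p prompts for the password):
# mysql -u lpritc -p biosql -e "$WIDEN_SQL"
```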

Let me know how far this gets you.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================