[Bioperl-l] New GO Parser and errors loading biosql database

Hilmar Lapp hlapp at gmx.net
Wed Feb 18 04:19:30 EST 2004


On Tuesday, February 17, 2004, at 11:42  AM, Law, Annie wrote:

> Hi,
>
> I would appreciate help with the following.
> I have installed the newest bioperl-db and biosql schema from cvs.
> I tried to load the database with information from godatabase.org and 
> got
> some errors listed further below (the
> Tables did not fill at all).
> Next I tried to load the database with Locuslink data from NCBI.
>
> 1) I got the LL file from NCBI and tried to load an empty database with a
> LL_tmpl file (for human) and
> It seemed to load properly and the tables were filling up but then it
> stopped after about 900 bioentries.
> I'm not sure what went wrong.  There seems to be a complaint about 
> duplicate
> entry but I don't think I should
> Modify the source file.
>

It should never be necessary to modify the source file.

First of all, unless you're testing or debugging something and actually 
*want* to get thrown out upon the first error, you should always 
specify --safe, for load_ontology.pl as well as for 
load_seqdatabase.pl. This will roll back a sequence that fails to load, 
but will otherwise keep going.
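
As a rough illustration of the above, here is a dry-run sketch that only 
prints a load_seqdatabase.pl command line with --safe added. The script 
path, user name, and input file are placeholders I made up, not values 
from a real setup; the remaining options are the ones used in the 
command quoted below.

```shell
#!/bin/sh
# Dry-run sketch: prints the loader command instead of executing it.
# SCRIPT, DBUSER, DBNAME and INPUT are placeholders (assumptions).
SCRIPT=bioperl-db/scripts/biosql/load_seqdatabase.pl
DBUSER=biosql_user
DBNAME=bioseqdb
INPUT=LL_tmpl

# --safe rolls back an entry that fails to load and continues with the next.
CMD="perl $SCRIPT --safe --dbuser=$DBUSER --dbname=$DBNAME --namespace=LocusLink --format=locuslink $INPUT"

# Print the command for review; add --dbpass as needed before running it.
echo "$CMD"
```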


> [root@ data]# perl /root/bioperl-db/scripts/biosql/load_seqdatabase.pl
> --dbuser=root
>  --dbpass=mss22 --dbname bioseqdb --namespace "LocusLink" -format 
> locuslink
> /var/lib/mysql/LL_
> _tmpl --dbpass=bioinf1 --dbname bioseqdb --namespace "LocusLink" 
> -format
> locuslink /var/lib/mysql/LL_tmpl
> Loading /var/lib/mysql/LL_tmpl ...
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values 
> were
> ("GO:0005699","kinetochore","","") FKs (6)
> Duplicate entry 'kinetochore-6' for key 2
> ---------------------------------------------------

This basically means that there is already another term 'kinetochore' 
in the same ontology, but with a different GO id. I.e., the look-up by 
GO id failed for this one, prompting the system to insert the term as a 
new one, which then (unexpectedly) fails too because of the unique key 
violation.

This is not atypical, since annotation is a work in progress. Also, and 
actually more likely, you may have had remnants of previous data loads 
there. GO terms get merged with others, and then previously primary IDs 
either get retired or become secondary IDs. A database of annotated 
genes like LL may not be immediately up to date.

Especially for LL the best thing is to always pre-load GO and other 
ontologies that sequences are associated (annotated) with. Also, it's 
not a bad idea to pre-load the NCBI taxonomy database using the script 
load_ncbi_taxonomy.pl in the biosql repository.
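
The load order suggested above could look roughly like the following 
dry-run sketch, which prints the three steps rather than running them. 
The connection settings, the ontology namespace, and the input file 
names (GO_FILES... in particular) are placeholders of mine; check each 
script's own documentation for the options it actually needs.

```shell
#!/bin/sh
# Dry-run sketch of the suggested load order; prints commands, runs nothing.
# Connection settings, namespaces and file names are placeholders (assumptions).
DB="--dbname=bioseqdb --dbuser=biosql_user"

# 1. taxonomy, 2. ontologies, 3. annotated sequences that refer to both
PLAN="perl load_ncbi_taxonomy.pl $DB
perl load_ontology.pl --safe $DB --namespace=GO --format=goflat GO_FILES...
perl load_seqdatabase.pl --safe $DB --namespace=LocusLink --format=locuslink LL_tmpl"

echo "$PLAN"
```

Loading the ontology first means the GO terms referenced by LocusLink 
entries should already be present when the sequence loader looks them up.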

>
> 2) Updating GO parser
> I saw that the GO parser was updated recently and I have located the 
> code
> version 1.17.2.1 I downloaded the new version.  I am using bioperl 1.4.
> Should I just take the new dagflat.pm and replace the old one or are 
> there
> more steps involved?

Not really. There is also an updated test but you don't need that.

>   When I download whole Modules I need to use make
> commands.
>
> Also I saw that dagflat.pm requires graph.pm.

It's not actually dagflat that requires it but the OntologyEngineI 
implementation it populates behind the scenes 
(Bio::Ontology::SimpleGOEngine if you're curious).

But as a consequence of all this, yes, if you use the dagflat parser 
(and goflat and soflat are basically just other names for the same 
parser), you do need Graph.pm from CPAN.

>   Is this graph.pm part of the
> bioperl 1.4 package I couldn't seem to find it or do I need to 
> download and
> install from CPAN.

You get it from CPAN. The name is Graph.pm. If the CPAN shell doesn't 
understand that, ask it to install Graph::Directed.
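
A small sketch of how one might check for the module before installing 
it. It only tests whether the perl on your PATH can load 
Graph::Directed; the install command is left as a comment because it 
needs network access and write permission.

```shell
#!/bin/sh
# Check whether Graph::Directed is loadable by the perl on PATH.
if perl -MGraph::Directed -e 1 2>/dev/null; then
    GRAPH_STATUS=installed
else
    GRAPH_STATUS=missing
    # To install from CPAN:
    #   perl -MCPAN -e 'install Graph::Directed'
fi
echo "Graph::Directed: $GRAPH_STATUS"
```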


>   I searched CPAN for graph.pm and got several hits.  Is
> this the one I need? http://search.cpan.org/~mverb/GDGraph-1.43/
> Do I also need GD.pm? I think I saw somewhere that it is required?
> http://search.cpan.org/~lds/GD-2.11/GD.pm
> Although this could be a mistake
>

You do not need GD (or GDGraph or whatever) for bioperl-db.


> Where is the best place to install GD and graph.pm (with dagflat.pm or 
> the
> main perl library)?
> I'm not sure whether the main perl library is /usr/lib/perl5/5.8.0 or
> /usr/lib/perl5/site_perl/5.8.0/Bio

The CPAN shell will do that automatically. Also, if you just say 'make 
install' in a perl module's root source directory, it will be installed 
in the right place. The only thing to be careful about is to use the 
same perl for running 'perl Makefile.PL' that you otherwise use for 
running perl scripts.
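
To see which perl that is and where it installs and searches for 
modules, something along these lines may help. It is a sketch that 
assumes perl is on your PATH and changes nothing on the system.

```shell
#!/bin/sh
# Show which perl is first on PATH and where it installs and looks for modules.
PERL_BIN=$(command -v perl || true)
if [ -n "$PERL_BIN" ]; then
    echo "perl on PATH: $PERL_BIN"
    # installsitelib is where 'make install' puts pure-perl modules by default
    "$PERL_BIN" -MConfig -e 'print "site library: $Config{installsitelib}\n"'
    # @INC is the search path used when a script loads a module
    "$PERL_BIN" -e 'print "$_\n" for @INC'
else
    echo "no perl found on PATH"
fi
```

If the perl that ran 'perl Makefile.PL' reports a different site 
library than the perl that runs your scripts, modules installed with 
one will not be found by the other.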

>
>
> 3) I installed Bioperl-db and downloaded the biosql schema 
> successfully but
> when I tried to use the Load_ontology.pl I got some errors which seem 
> to be
> saying that I am missing some main modules such as goflat (I recorded a
> script of the output). But I have goflat.pm.
> Am I calling the perl script incorrectly?  Or are there still some 
> modules I
> need to install? I'm not sure that I am using the correct
> Syntax for the format field.

It is failing because you don't have Graph.pm installed, as the stack 
trace states:

> Bio::OntologyIO: goflat cannot be found
> Exception
> ------------- EXCEPTION  -------------
> MSG: Failed to load module Bio::OntologyIO::goflat. Can't locate
> Graph/Directed.pm in @INC (@INC contains:

The initial message that goflat.pm cannot be found is just a (wrong in 
this case) interpretation of the failure to dynamically load the module.
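
One way to see the underlying error directly is to load the parser 
module by hand and print what perl reports. This is only a sketch; it 
assumes perl is on your PATH, and the message you get depends on what 
is installed on your machine.

```shell
#!/bin/sh
# Try to load the parser module directly and report the real error, if any.
LOAD_RESULT=$(perl -e '
    if (eval { require Bio::OntologyIO::goflat; 1 }) {
        print "goflat loaded\n";
    } else {
        # $@ names the file perl could not locate, which may be a dependency
        print "load failed: $@";
    }
' 2>&1)
echo "$LOAD_RESULT"
```

If the message names Graph/Directed.pm rather than goflat.pm, the 
parser itself is present and only its dependency is missing.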

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



