[BioSQL-l] Loading long strings to Oracle

Teemu Kivioja Teemu.Kivioja at vtt.fi
Tue Jun 28 11:19:44 EDT 2005


Hi,

I have couple of possibly related problems with loading to the Oracle database.

1.) When trying to load yeast proteins from SGD, I get:

perl load_seqdatabase.pl --host sboracle1.ad.vtt.fi  --driver Oracle 
--testonly --dbname BfxDB --format swiss --printerror test.swiss
Loading test.swiss ...
DBD::Oracle::st execute failed: ORA-01461: can bind a LONG value only for 
insert into a LONG column (DBD ERROR: OCIStmtExecute) [for statement 
``UPDATE biosequence SET version = NVL(?,version), length = NVL(?,length), 
alphabet = NVL(?,alphabet), crc = NVL(?,crc), seq = NVL(?,seq), ent_oid = 
NVL(?,ent_oid) WHERE ent_oid = ?'' with params: 
:p5='MAKQRQTTKSSKRYRYSSFKARIDDLKIEPARNLEKRVHDYVESSHFLASFDQWKEINLSAKFTEFAAEIEHDVQTLPQILYHDKKIFNSLVSFINFHDEFSLQPLLDLLAQFCHDLGPDFLKFYEEAIKTLINLLDAAIEFESSNVFEWGFNCLAYIFKYLSKFLVKKLVLTCDLLIPLLSHSKEYLSRFSAEALSFLVRKCPVSNLREFVRSVFEKLEGDDEQTNLYEGLLILFTESMTSTQETLHSKAKAIMSVLLHEALTKSSPERSVSLLSDIWMNISKYASIESLLPVYEVMYQDFNDSLDATNIDRILKVLTTIVFSESGRKIPDWNKITILIERIMSQSENCASLSQDKVAFLFALFIRNSDVKTLTLFHQKLFNYALTNISDCFLE...', 
:p3='protein', :p6='14404', :p1=undef, :p7='14404', :p4='F6ED4E3E9AE0F468', 
:p2=2493]) at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BaseDriver.pm 
line 1115, <GEN0> line 51.

The file test.swiss only includes the record:
ID   YBL004W        STANDARD;      PRT;   2494 AA.

It seems that I can get rid of this error message by explicitly telling 
that the type of sequence field is CLOB by adding the code
	    if ($slots[$i] eq "seq") {
	      $self->bind_param($sth, $j, $slotvals->[$i],
				{ ora_type => ORA_CLOB });
	    } else {
	      $self->bind_param($sth, $j, $slotvals->[$i]);
	    }


2.) When trying insert InterPro (interpro 10.0
ftp://ftp.ebi.ac.uk/pub/databases/interpro/interpro.xml) I get:

perl load_ontology.pl --format 'interpro' --host sboracle1.ad.vtt.fi 
--namespace interpro  --driver Oracle --dbname BfxDB --testonly --fmtargs 
"ontology_engine,simple" interpro.xml
...
11900
Loading ontology InterPro:
         ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were 
("IPR000911","Ribosomal protein L11","Ribosomes are ...

ORA-01461: can bind a LONG value only for insert into a LONG column (DBD 
ERROR: error possibly near <*> indicator at char 14 in 'INSERT INTO te<*>rm 
(identifier, name, definition, is_obsolete, ont_oid) VALUES (:p1, :p2, :p3, 
:p4, :p5)')
---------------------------------------------------
Could not store term IPR000911, name 'Ribosomal protein L11':

------------- EXCEPTION  -------------
MSG: create: object (Bio::Ontology::InterProTerm) failed to insert or to be 
found by unique key

Again, the annotation is >2000 characters long but well under the 4000 
character limit.

3.) As others have already reported, the memory usage can be high,  the 
above load_ontology process takes about 2.5GB of memory.

I guess the fact that the problems 1 and 2 already arise with strings that 
are <4000 chars long might be related to the local character coding. The code:

my $hash_ref = $dbh->ora_nls_parameters();
my $database_charset = $hash_ref->{NLS_CHARACTERSET};	
my $national_charset = $hash_ref->{NLS_NCHAR_CHARACTERSET};
print "database charset: $database_charset\n";
print "national charset: $national_charset\n";

gives

database charset: WE8ISO8859P1
national charset: AL16UTF16

and

$  locale LC_CTYPE | head
upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;alnum;combining;combining_level3
toupper;tolower;totitle
16
6
UTF-8
70
84
1
0
1

Some details of the system:
Enterprise Linux, 2.4.21-32.0.1.ELsmp (64-bit)
Oracle 10g, version 10.1.0.3.0 - 64bit
Perl, v5.8.0 built for x86_64-linux-thread-multi
Bioperl 1.4
bioperl-db 0.1
DBD::Oracle 1.16
Biosql-schema downloaded on May 10

What would be the best way to solve these problems?

Best regards,
Teemu Kivioja




------------------------------------------------------------------
Teemu Kivioja, Research Scientist
VTT Biotechnology
P.O. Box 1500, FIN-02044 VTT, Finland
(Street address: Tietotie 2, Espoo, Otaniemi)
Email: Teemu.Kivioja at vtt.fi
Phone: +358 20 722 7111
Fax: +358 20 722 7071



More information about the BioSQL-l mailing list