On Commercial OODBs for XML RE: Bioperl: XML/BioPerl

Otillar, Robert {~Palo Alto} ROBERT.OTILLAR@Roche.COM
Thu, 14 Jan 1999 00:34:20 -0800



> Is anyone aware of plans on the part of the database organizations to
> serve XML?
> 
		I've been giving thought to XML as a representation standard
to couple with a persistent-storage database for genomic data. I thought you
might be interested in a little of what I've found out about the commercial
OODB products.

		Object Design, Inc. (www.odi.com) is in late beta testing
of their Excelon XML database/server, part of the ObjectStore line of
products. Of interest:
		ObjectStore has Perl drivers on CPAN. (The supported
bindings are C++ and Java.)
		The ObjectStore suite of programs is available for $500 to
academics under a strict non-commercial-use license. Their full suite costs
~$10K-15K for commercial developers. Any developer can get their single-user
version, PSE Pro, for ~$250. This product is a usefully functional subset of
the enterprise-level ObjectStore OODB. It includes their PSE Win32 C++ and
run-anywhere PSE Java databases plus a few Win32 class-design tools.
		ODI's premier products do not run on Linux, sadly, nor do
they appear to work with Apache (just Netscape's or Microsoft's IIS web
servers). Their flagship rapid database development products are all Win32,
with ObjectStore and some tools supported on Unix. The personal PSE Pro Java
db 'runs on any machine with a Java Development Kit'.
		Their rapid-development tools and web-server attachments for
generating dynamic www pages from their OODB server appear to have nice GUIs
and labor-saving object modelling-and-design features.
		ODI has some type of drivers for attaching relational
databases, like Oracle, to allow a single federated database interface
across diverse products. I haven't used these or read their specs yet, though.
		For PSE Pro and ObjectStore, queries are written in native
Java, and persistence extends to any object reachable from an already
persistent object. Index structures into persistent object data include
hashes, trees, and lists.
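
Persistence by reachability, as described above, can be illustrated with a small sketch. This is plain Python, not ObjectStore's or PSE Pro's API; the `Sequence` and `Alignment` classes and the `reachable` helper are hypothetical names invented for the illustration. The idea is only that naming one root object is enough to identify everything the store would have to keep.

```python
class Sequence:
    """Hypothetical biological object; not taken from any real library."""
    def __init__(self, name):
        self.name = name
        self.alignments = []      # links to other objects in the graph


class Alignment:
    """Hypothetical link from one sequence to a matching subject sequence."""
    def __init__(self, subject, score):
        self.subject = subject
        self.score = score


def reachable(root):
    """Return every instance reachable from root via attributes, lists, dicts."""
    seen, stack, found = set(), [root], []
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if hasattr(obj, "__dict__"):          # a user-defined object
            found.append(obj)
            stack.extend(vars(obj).values())
        elif isinstance(obj, (list, tuple)):
            stack.extend(obj)
        elif isinstance(obj, dict):
            stack.extend(obj.values())
    return found


seq_a = Sequence("seqA")
seq_b = Sequence("seqB")
orphan = Sequence("seqC")                      # never linked from the root
seq_a.alignments.append(Alignment(seq_b, 87.5))

# Only seq_a is named as the persistent root; seq_b comes along because the
# alignment points to it, while the orphan is left out.
persisted = reachable(seq_a)
names = sorted(o.name for o in persisted if isinstance(o, Sequence))

# A hash-style index over the persisted sequences, keyed by name.
by_name = {o.name: o for o in persisted if isinstance(o, Sequence)}
```

A real OODB would do this traversal inside its storage layer; the sketch only shows which objects end up in the persistent set.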
		ODI was a sponsor of the Objects in Bioinformatics
conference.
		I've read opinions from ODI's competitors that their products
may not scale well to multi-gigabyte datasets, due to the pointer-schema
they employ (i.e. corruption of the database may become frequent on large,
dynamic databases). I haven't confirmed this yet, and am still trying to
understand their differences from Objectivity (below).
		Summary: ODI
			Nice GUIs and rapid-development tools.
			Leading native XML support in OODBs.
			C++ and Java supported; unsupported CPAN bindings
for Perl.
			Affordable to small academic teams.
			Mostly a Win32 product.
			Very informative web site with white papers,
downloadable docs, and trial software.
			

		Objectivity, Inc. (www.objectivity.com) makes the commercial
OODB system used by the physics collaboration for the CERN upgrade, a system
which will require petabytes of data storage and handling. Hence I expect
Objectivity is quite robust and should scale very well. I haven't seen them
mention native/explicit support for XML, though true OODBs naturally fit
with XML data storage in any case.
		Objectivity offers C++ and Java bindings and supports Win32,
Unix, and Linux (C++ only on Linux right now).
		The UCLA bioinformatics group is now using Objectivity.
Rumor is that the design tools are much more primitive than ODI's, but when
I talked to Objectivity they said (more or less) that their focus is on
optimizing their OODB's efficiency, robustness, and capabilities, rather
than the 'you can use it without even coding' approach.
		Objectivity's products cost ~$10,000/developer, with
academic discounts on a case-by-case basis.
		Objectivity has drivers for attaching relational databases
(i.e. support for use in a federation scheme).
		Objectivity's locking method (for concurrent access) may be
at the container level, a significantly different granularity than ODI's.
		Working Summary on Objectivity:
			Expected to be very robust and scalable.
			C++ and Java supported. No Perl bindings found
yet.
			Win32, Unix, and Linux product.
			Possibly expensive for academic groups, due to small
discount.
			
			
		I am still checking out Objectivity and ODI, which appear to
be the two primary commercial vendors for general-purpose OODB products. I
haven't looked into Poet, Jasmine, or VERSANT very deeply yet, but am
reading....

		I will let you know more as I find it if there's an interest
in such information.

	Thoughts or suggestions?

	Perhaps this is redundant with what's already been said:
		--Using Bioperl modules to create a true
abstraction/intermediate layer for genomic analysis software would simplify
much updating of code. For example: 
		(NCBI2.0 blast-driver)->XML
		(WU-BLAST-driver)->XML
		(SSEARCH-driver)->XML

		XML->sequence object containing scoring, sequence, and
similarity-alignment objects.

		Then, when NCBI BLAST 2.1 comes out: write one new driver,
don't touch other code. (Download one new driver, don't recheck existing
code.) It would also lower the barrier to folks like me contributing bits
here and there, like parser-drivers for ISS or novel search methods.
Abstracting meta-analysis tools' input to any XML standard might raise the
value of writing converters that generate data that is at least a superset
of the standard. Then perhaps GeneDoc, my web-server, and my own code could
all read the same file without a large overhead of
"which-program-wrote-the-file-containing-that-alignment-data?" code.
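
To make the driver idea concrete, here is a minimal sketch in Python (Bioperl itself is Perl; Python is used here only for illustration). Everything in it is an assumption: the two input formats are made-up stand-ins rather than real BLAST or SSEARCH output, and the `<search>`/`<hit>` element names are not a proposed schema. It shows two drivers emitting one shared XML form and a single reader that never needs to know which program wrote the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical outputs of two search programs (NOT real BLAST/SSEARCH formats).
TOOL_A_OUTPUT = "seqB 87.5 1e-20\nseqC 42.0 3e-05"
TOOL_B_OUTPUT = "hit=seqD;score=55.0;evalue=2e-08"


def make_xml(program, hits):
    """Build the shared XML document from (subject, score, evalue) tuples."""
    root = ET.Element("search", program=program)
    for subject, score, evalue in hits:
        ET.SubElement(root, "hit", subject=subject,
                      score=str(score), evalue=str(evalue))
    return ET.tostring(root, encoding="unicode")


def tool_a_driver(text):
    """Driver for the whitespace-separated format."""
    hits = []
    for line in text.splitlines():
        subject, score, evalue = line.split()
        hits.append((subject, float(score), float(evalue)))
    return make_xml("tool-a", hits)


def tool_b_driver(text):
    """Driver for the key=value format."""
    hits = []
    for line in text.splitlines():
        fields = dict(item.split("=") for item in line.split(";"))
        hits.append((fields["hit"], float(fields["score"]),
                     float(fields["evalue"])))
    return make_xml("tool-b", hits)


class Hit:
    """Program-independent similarity hit built from the shared XML."""
    def __init__(self, program, subject, score, evalue):
        self.program = program
        self.subject = subject
        self.score = score
        self.evalue = evalue


def read_hits(xml_text):
    """The only reader downstream code needs, whichever driver wrote the XML."""
    root = ET.fromstring(xml_text)
    return [Hit(root.get("program"), h.get("subject"),
                float(h.get("score")), float(h.get("evalue")))
            for h in root.iter("hit")]


all_hits = (read_hits(tool_a_driver(TOOL_A_OUTPUT))
            + read_hits(tool_b_driver(TOOL_B_OUTPUT)))
best = max(all_hits, key=lambda h: h.score)
```

Supporting a new program version would then mean adding one more driver function; `read_hits` and everything built on `Hit` stay as they are.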
		--In addition, Bioperl tools have made parsing different
programs' output much easier, but it remains very system-intensive to
repeatedly parse numerous collections of files for analysis. This cost would
be reduced if the hard part of parsing could be contained in one interface
layer, as mentioned above. I would expect that parsing a known, or at least
validatable (like XML), format would be much faster. 
		For people who deal with thousands of BLAST & analysis files
(surely many others have this problem?) a persistence/database approach
appears inevitable and will require abstraction equivalent to an XML
representation. Hence a standardized format would be very useful to research
on large, highly annotated collections of biological objects which might
require export for analysis with not-in-house software. In any case, I
believe data analysis on this scale is in the future of Bioperl, since the
ability to store & analyze the markup on a genome is what gives its
sequence value.
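
As a rough sketch of the parse-once, query-many-times approach, the example below loads already-parsed hits into a database and answers a question across runs without touching the original files again. It uses Python's built-in `sqlite3` module purely as a stand-in: SQLite is relational, not an OODB, and the table layout, column names, and sample rows are all invented for the illustration.

```python
import sqlite3

# Hypothetical hits already parsed from many result files:
# (run_id, query_name, subject_name, score)
parsed_hits = [
    ("run1", "seqA", "seqB", 87.5),
    ("run1", "seqA", "seqC", 42.0),
    ("run2", "seqX", "seqB", 61.0),
]

# ":memory:" keeps the example self-contained; a file path would persist it.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE hit ("
    "run_id TEXT, query_name TEXT, subject_name TEXT, score REAL)"
)
# An index so lookups by subject do not scan every stored hit.
db.execute("CREATE INDEX hit_subject ON hit (subject_name)")
db.executemany("INSERT INTO hit VALUES (?, ?, ?, ?)", parsed_hits)
db.commit()


def queries_hitting(subject, min_score):
    """Return (query_name, score) pairs that hit subject, best score first."""
    rows = db.execute(
        "SELECT query_name, score FROM hit "
        "WHERE subject_name = ? AND score >= ? ORDER BY score DESC",
        (subject, min_score),
    )
    return rows.fetchall()


result = queries_hitting("seqB", 50.0)
```

The same question asked of thousands of flat files would mean re-parsing each one; here the parsing cost is paid once, at load time.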

	Thanks,
		Bobby O

	p.s. I enjoyed finding the XML thread on returning from holiday and
hope the Bioperl forum's discussions will result in a draft
schema/recommendation for sequence objects.
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================