Database directions for the 1990’s: part trois

The second of eight recorded lectures from the May 31, 1990 symposium entitled “The 1990’s – The Database Decade” features François Bancilhon, at the time one of the research team at Altair. Soon after this talk, he would move on to create O2 Technology, which developed the relatively successful O2 object-oriented database system.

In this highly entertaining talk, François describes the necessary features of an object-oriented database system and outlines some potential research areas, along with his thoughts on the technology’s future in the market.

François is now the Chairman and CEO of Mandriva. Prior to that he held positions at a variety of companies and research centers including SomaLogic, O2 Technology, MCC, INRIA, and the University of Paris.

This article summarizes François’s lecture, and is not a verbatim transcript of his remarks. Errors in fact or emphasis are my responsibility alone.

François Bancilhon: Object-oriented database systems

First, let me describe my affiliation. I am part of the research group at Altair (pronounced Al-tay-eer), which was started in 1986, and is a consortium with founding partners IN2 (a subsidiary of Siemens AG), INRIA, Bull, and LRI (a University of Paris research lab). Altair has approximately 45 people, and funding of 120 million French francs. The goal of this consortium is to build an object-oriented database system. Our first prototype was completed in September 1989. Our initial beta release is scheduled for the middle of this year [1990] and our goal is to have a released product sometime in 1991.

Overview of database technology

Let me begin with a brief overview of database technology. I will try to cover in 5 minutes what Paul Larson covered in one hour, but in a French way rather than a Scandinavian way: I talk faster and I move my arms, so you will see everything from a different perspective.

So our first question is: What exactly is the functionality of a database system?

A database system is a software system that provides six services:

persistence;
disk management;
data sharing;
data reliability;
data security; and
adhoc querying.

I think it is important to distinguish the idea of persistence from the idea of disk management; the latter covers the handling of large volumes of data on disk, not merely ensuring that data is written to persistent storage. Reliability is a notion tied to recovery, that is, recovery from failure. Adhoc querying is a holdover from work on relational systems; previously, in order to get data out of a database you had to write a program. Moreover, with adhoc querying a requirement is that simple queries can be expressed simply.

DBMS History

Over the past few decades, we have:

1960’s: file systems
1970’s: CODASYL and hierarchical database systems
1980’s: relational systems
1990’s: what’s next? relational again? extended relational? object-oriented? extensible? deductive? Perhaps no DBMS at all?

The number and extent of these initiatives will be governed by supply and demand. Note what has happened with the adoption of relational database management systems:

1970: Codd’s seminal paper is published in Communications of the ACM
1975: Relational database technology is here
1980: Products based on that technology start to appear
1985: The marketplace for such products develops, following the commitment of IBM to relational technology
1990: Relational database products are dominant in the marketplace; one definition being that IBM’s DB2 licenses (in number) now are greater than the number of IMS licenses – someone here [at the symposium] from IBM can probably confirm that.

From that timeline it appears that it takes about 20 years from the creation of relational technology to its dominance in the marketplace. It is a source of pride for the visionaries and the development teams to take an idea and create something that is dominant in the marketplace, but also humbling, in a way, at the same time. Such technology adoptions, then, are relatively rare in one’s lifetime.

Today’s database user demands:

better performance: there are new benchmarks (ET1/TP1, a TP benchmark, the Wisconsin benchmark, the new Sun benchmark developed by Rick Cattell, Tektronix benchmarks), all of which are designed to push the limits of database system performance;
better programmer productivity: improvements to the design, coding, debugging and maintenance of applications; and
new application handling: CAD, CAM, CIM, CASE, cartography, urban systems, editorial information systems, office automation systems.

These needs are different from those needs that relational database systems were designed to handle. At the same time, however, there is still much continuing research in relational database systems – arguably, perhaps, it could be termed “polishing a round ball”. But other approaches and extensions to database technology are also underway. Some of these include:

deductive database systems – John Mylopoulos gave us a talk about that yesterday;
persistent programming languages – The basic idea here is that ordinarily programs don’t save your output; once the program ends, “poof”. One can save the results to a file system or to a database, but there are some problems with that approach. For one thing, a program has a set of types, and a database system’s set of types do not usually match the types used by the program, so there is much conversion of data back and forth. So with a persistent programming language, the idea is to support the ability to take any piece of data in a program and make it persist. And when I re-execute my program, that data is just “there”, and that means there is no longer any need for a “special” DBMS.
extending relational systems in various ways – see the work of David DeWitt at Wisconsin. Extensions are required because you cannot write an entire program in SQL today; the idea here is to extend SQL with programming constructs. Conceptually, this is the same thing as persistent programming languages, but this approach is coming at the problem from a very different starting point.
Lastly, there is lots of effort going into object-oriented database systems and that is the real subject of this talk.
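
As a rough illustration of the persistent-programming-language idea described in the list above, the following Python sketch uses the standard-library shelve module so that an ordinary list survives between two runs of the same function. The file name program_state, the key visits, and the function run_program are assumptions made for this example and do not come from the talk; shelve is only a stand-in for a true persistent language.

```python
import os
import shelve
import tempfile


def run_program(store_path):
    """One 'execution' of a program whose data outlives the run."""
    with shelve.open(store_path) as store:
        # On the first run nothing is stored yet; on later runs the
        # list is simply "there", with no conversion code in between.
        visits = store.get("visits", [])
        visits.append(len(visits) + 1)
        store["visits"] = visits  # reassign so the change is written back
        return list(visits)


# Hypothetical location for the persistent store.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "program_state")

first = run_program(path)   # first execution
second = run_program(path)  # re-execution sees the earlier data
```

The program keeps using its own types (a plain Python list) and never translates them into a separate database type system, which is the mismatch the persistent-language approach tries to remove.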

My reading of the crystal ball

The following technologies: extensible systems, database programming technologies, and persistent programming languages will have an impact on future commercial database systems.

Deductive DBMS technology will have a much longer-term impact;
relational systems, with some minor extensions, are going to be here for a long time, mainly in traditional markets;
The next generation of database systems will be object-oriented. The migration to object-orientation will start with specific niche applications and subsequently will begin to address conventional markets. Lots of technology is under development today with persistent languages, although it is difficult to invent a new programming language to solve existing problems. Nonetheless new object-oriented database systems will come – once their credibility is established – and businesses will migrate towards OODBMS systems.

Theory and practice

When you see a diagram with a relational database, it always appears “clean” – it is often represented by a picture of a disk. And that’s because the definition of a relational database is well-defined; unlike, for example, an expert system which is usually diagrammed as a cloud of smoke. But when the idea of object-oriented database management systems came around, people argued about the definition of things. There are lots of ongoing conferences, work, prototypes, startups – everybody and his brother are “object-oriented” and it has, consequently, been hard to establish firm rules and definitions. As with many things in computing we tend to build things first and then analyze them later. To give you an idea of the activity going on today, we have:

Conferences and workshops: DOOD, OODBS I and OODBS II, DBPL I and DBPL II, sessions at ECOOP and OOPSLA;
Research groups: at OGC, Brown University, MCC, HP Labs, Altair, the University of Karlsruhe;
Prototypes: Encore/OB Server, Exodus, IRIS, Orion, O2 – lots of prototypes.
Products: Gemstone, Ontos, Vision, GBase;
Startups: Servio Logic, OntoLogic, Graphael, Object Design, Object Sciences, Objectivity, Itasca…

So, to try to come up with a consensus as to what is meant by object-oriented database technology, a number of colleagues and I wrote a paper [1], presented in Japan in 1989, that defines an object-oriented database system according to three types of rules: mandatory, or golden, rules for which we have broad consensus; optional rules (bonus and goodies) that make an object-oriented system better, but are not a requirement; and open rules that define specific choices but for which there is no broad agreement. So in the rest of this talk, I’m going to describe the “golden rules”. Now, there is often an object-oriented programming expert in the room and about now he would start giving me hell about the definitions that I’m going to be using; but in response let me open my umbrella and say that I am just a humble database person trying to pick what object-oriented technology I want to include in my database system, and nothing more.

The Golden Rules

The “golden rules” of an object-oriented database management system are:

Complex objects
Object identity
Encapsulation
Types or classes
Inheritance
Overloading and late binding
Computational completeness
Extensibility

and in the remainder of the talk I will discuss these rules in turn.

1. Complex objects

The idea of complex objects is to be able to assemble simple, atomic types into more complex objects, including set constructors, list constructors, tuple constructors, record constructors, and so on. The constructors must be orthogonal – so that you can develop sets of sets of lists of tuples, or whatever you want. For those of you who are familiar with NF2 relations (non-1NF, or nested, relations) those constructors are not orthogonal. Similarly, CODASYL list and record types have constructors that are not orthogonal either.

The three minimal constructors to be supported in an object-oriented database system are sets, lists, and tuples. More basic constructors are better, and there is clearly a need for advanced types of constructors with CAD applications, for example.
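
A small Python sketch may help show what orthogonal constructors look like in practice. It uses immutable stand-ins (frozenset for sets, tuple for lists, NamedTuple for tuples) so that any constructor can be nested inside any other. The names Stop, Tour, Catalogue and City, and the sample data, are assumptions made for the example.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Stop(NamedTuple):
    """Tuple constructor: a record with named fields."""
    what: str
    when: str


# List constructor applied to tuples: an ordered sequence of stops.
Tour = Tuple[Stop, ...]
# Set constructor applied to lists of tuples.
Catalogue = FrozenSet[Tour]

paris_tour = (Stop("Louvre", "1990-06-01"), Stop("Eiffel Tower", "1990-06-02"))
lyon_tour = (Stop("Old Town", "1990-06-03"),)

# A set of lists of tuples; the duplicate tour collapses because it is a set.
catalogue = frozenset({paris_tour, lyon_tour, paris_tour})


class City(NamedTuple):
    """Nesting the other way round: a tuple holding a list of sets."""
    name: str
    districts: Tuple[FrozenSet[str], ...]


city = City("Paris", (frozenset({"Marais"}), frozenset({"Montmartre", "Pigalle"})))
```

Nothing restricts which constructor may appear at which level, which is the property that nested relations and CODASYL record types lack.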

2. Object identity

Object identity is the notion that objects have “identity” independent of their values, usually implemented via object identifiers that are independent of any object value; in fact, the programmer often doesn’t have to “see” the object identifier at all.

The big thing object identity gives you is the ability to share objects amongst other objects in the database.
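
The difference between identity and value, and the sharing it enables, can be sketched in Python, where `is` compares identity and `==` compares value. The classes and sample data here are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Address:
    city: str


@dataclass
class Employee:
    name: str
    address: Address


shared = Address("Paris")
alice = Employee("Alice", shared)
bob = Employee("Bob", shared)                # the very same Address object
carol = Employee("Carol", Address("Paris"))  # equal value, distinct identity

equal_before = carol.address == alice.address  # same value at this point

# One update to the shared object is seen through every reference to it,
# but not through Carol's separate copy.
shared.city = "Lyon"
```

After the update, Alice and Bob both see Lyon because they share one object, while Carol still sees Paris. The program never handles an explicit identifier to make this work.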

3. Encapsulation

Encapsulation is the big idea that comes from object orientation and abstract data types (ADTs). Encapsulation is the idea that you can define an object and specify the set of operations you can perform on that object. You cannot “see” the internals of the object – you can only access the values of an object through these specific methods.

To gain adoption of the idea of encapsulation we need to change some habits in how we develop database system applications. Consider a relational system and its database schema. We might have:

CREATE TABLE Employee ( employeeName CHAR(50), age INTEGER, salary INTEGER );

and two application programs: one being raiseSalary( employeeName ) and the other fireEmployee( employeeName ). (Aside: here I have a simple employment policy; either I raise the salaries of my employees, or I fire them.) So with this system we have the data in the database and the application programs are in system files; schema and applications are designed separately and stored separately. Within some limits, such as accessing the data through a view, all employee data can be seen and manipulated by all application programs.

Consider instead an object-oriented world. We have an object with a memory part:

(employeeName: CHAR(50), age: INTEGER, salary: INTEGER)

and two methods that operate on that object: raiseSalary(employeeName) and fireEmployee(employeeName). Now we have data and methods designed at the same time, and in this uniform system data and their operations are stored and manipulated within the same system. The key is that access to the employee “record” is only through the two methods raiseSalary and fireEmployee, and nothing else.
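
The employee object described above might look roughly like this in Python. This is a sketch under assumptions: the percent parameter, the is_employed helper, and the error raised after firing are invented for the example. Python also only discourages outside access to double-underscore attributes through name mangling, rather than enforcing it as an OODBMS would.

```python
class Employee:
    """State is reachable only through the methods defined here."""

    def __init__(self, employee_name, age, salary):
        # Double-underscore names are mangled, so code outside the class
        # cannot reach them under these names.
        self.__employee_name = employee_name
        self.__age = age
        self.__salary = salary
        self.__employed = True

    def raise_salary(self, percent):
        """Counterpart of raiseSalary; percent is an assumed parameter."""
        if not self.__employed:
            raise ValueError("cannot raise the salary of a fired employee")
        self.__salary += self.__salary * percent // 100
        return self.__salary

    def fire_employee(self):
        """Counterpart of fireEmployee."""
        self.__employed = False

    def is_employed(self):
        """Hypothetical read-only helper, added so the effect is observable."""
        return self.__employed


ann = Employee("Ann", 40, 1000)
new_salary = ann.raise_salary(10)
ann.fire_employee()
```

Callers see the two operations and their results, while the stored fields remain an implementation detail that can change without affecting them.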

4. Types or classes

Types, of course, come from higher-level programming languages. Relational systems, too, have types; but in a relational system one cannot have a type distinct from a relation itself, at least in today’s products. With object-oriented database systems, the idea is that objects of the same kind, or nature, should be grouped together and we should be able to define their common features. So we characterize collections of individual objects by their class, or type. What we have is:

same nature = same internal structure and same behaviour

For objects, we separate specification from implementation; the class, or type, description includes the structure description, the description of its operations, and the implementation of those operations.

Are types the same as classes? Some definitions distinguish between them, but I am not prepared to go into the details in this talk.

5. Inheritance

Inheritance is arguably the most important and significant feature of object-oriented database systems.

Suppose I have a relational system with two relations:

CREATE TABLE student ( name CHAR(50), birthdate DATE, grades CHAR(100) );
CREATE TABLE employee ( name CHAR(50), birthdate DATE, salary INTEGER, seniority CHAR(40), diplomas CHAR(100) );

and for students we have two methods, computeGPA and age, and for employees we also have two methods, salary and age. So in a relational system we define two relations and write four applications. With an object-oriented solution, we can factor out the common parts: the structure (name, birthdate) and the method “age”. This refactoring yields an object hierarchy where we recognize that we have an object of type Person from which the other two are derived:

Person( name, birthdate )
Student( grades )
Employee(  seniority, salary, diplomas )

and the program “age” is written only once, and it is written at the exact point of abstraction where it fits in the model.
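
The Person, Student and Employee hierarchy can be sketched in Python as follows. The field types, the default values, and the choice to pass the current date into age() are assumptions made to keep the example self-contained and deterministic.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Person:
    name: str
    birthdate: date

    def age(self, today):
        """Written once, at the level of abstraction where it belongs."""
        had_birthday = (today.month, today.day) >= (
            self.birthdate.month,
            self.birthdate.day,
        )
        return today.year - self.birthdate.year - (0 if had_birthday else 1)


@dataclass
class Student(Person):
    grades: list = field(default_factory=list)

    def compute_gpa(self):
        return sum(self.grades) / len(self.grades)


@dataclass
class Employee(Person):
    seniority: str = ""
    salary: int = 0
    diplomas: list = field(default_factory=list)


student = Student("Ann", date(1970, 6, 15), [3.0, 4.0])
employee = Employee("Bob", date(1960, 1, 1), salary=100)
symposium_day = date(1990, 5, 31)
```

Both subclasses answer age() without defining it, so the four separate applications of the relational version shrink to three method bodies here: age, compute_gpa, and whatever Employee adds.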

The benefits of inheritance include:

avoids code redundancy;
makes code simpler by placing it at the right level of abstraction in an object hierarchy;
makes the object model the perfect paradigm for software re-usability, where the software component is the object and customization is performed through specialization. In the auto industry, a new model of a vehicle might have 20% new parts over the previous model; but 80% of the new model is built using existing fabrications. But in today’s software development practice it is difficult to specialize, or even find, a software component that we can re-use, and so applications are commonly 100% new code. Specialization of existing software components is the only way we can solve this problem.
provides a compact, well-structured and easy-to-understand schema description.

Note that inheritance can be multiple or single.

6. Overloading and late binding

Let’s look at a different example. Assume we have three types of objects: employee, bitmap, and graph. Assume we put them in a bundle and want to display them. We define three operations, displayEmployee(), displayBitmap(), and displayGraph(). Note that the code of each of these three operations will be different. Altogether, we have three method names, three types, and three procedure bodies. Moreover to display the bundle we need to have something like:

For x in Bundle DO
  Case of type(x):
     Employee: displayEmployee(x);
     Bitmap: displayBitmap(x);
     Graph: displayGraph(x);
  End

Note that if we add a new type to the bundle, we must rewrite the program.

Instead of the above, we introduce a new, conceptual type called displayableObject, with a single method, display(). displayableObject has three subtypes: Bitmap, Employee, and Graph. Each of these subtypes inherits the display() method and re-defines it. With this construction we have four types, a single method name, and three method bodies. To display the bundle, our code is simplified to:

For x in Bundle DO
  display(x);

and the resulting code is simpler, and will not have to change if a new type is introduced to the bundle.

Inheritance, however, is not free; there is a price you have to pay for that functionality, and that price is called “late binding”. That is, you cannot translate, at compile time, the name of a procedure into an address, because the system does not know until run-time which routine needs to be invoked for the given object. There are some known techniques from object-oriented programming that are designed to help with the performance of late binding, but these are beyond the scope of my talk.
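
The display() example and the run-time dispatch it relies on can be sketched in Python, where every method call is resolved at run time. The constructor arguments, the returned strings, and the later-added Sound type are hypothetical and exist only for illustration.

```python
class DisplayableObject:
    """Conceptual supertype: names display() but supplies no useful body."""

    def display(self):
        raise NotImplementedError("subtypes must redefine display()")


class Employee(DisplayableObject):
    def __init__(self, name):
        self.name = name

    def display(self):
        return f"Employee {self.name}"


class Bitmap(DisplayableObject):
    def __init__(self, width, height):
        self.width, self.height = width, height

    def display(self):
        return f"Bitmap {self.width}x{self.height}"


class Graph(DisplayableObject):
    def __init__(self, node_count):
        self.node_count = node_count

    def display(self):
        return f"Graph with {self.node_count} nodes"


def display_bundle(bundle):
    # Which display() body runs is decided per object at run time
    # (late binding); there is no case analysis on the type of x.
    return [x.display() for x in bundle]


class Sound(DisplayableObject):
    """A type added later; display_bundle does not change."""

    def __init__(self, seconds):
        self.seconds = seconds

    def display(self):
        return f"Sound lasting {self.seconds}s"


lines = display_bundle([Employee("Ann"), Bitmap(640, 480), Graph(12), Sound(3)])
```

The cost the talk mentions is visible in display_bundle: the call x.display() cannot be turned into a fixed address ahead of time, because the bundle may contain types that did not exist when the function was written.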

7. Extensibility

With extensibility, the idea is that a system comes with a set of predefined classes and objects (date, currency, polygon, area, etc.) and the user can extend the system to support new types, objects of those types, and methods for those objects. The programmer should not have to distinguish between system-supplied types and user-defined types; they should work similarly, but could of course be implemented in different ways.
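
As a minimal sketch of this requirement, the Python code below defines a user-level Currency type and passes it through the same generic code as the library-supplied date type. Currency, its cents field, and the latest() helper are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True, order=True)
class Currency:
    """User-defined type; ordering and equality are generated from cents."""

    cents: int

    def __add__(self, other):
        return Currency(self.cents + other.cents)


def latest(values):
    """Generic code that does not care whether a type is built in."""
    return sorted(values)[-1]


newest_date = latest([date(1989, 9, 1), date(1990, 5, 31)])  # supplied type
highest_price = latest([Currency(500), Currency(250)])       # user type
total = Currency(500) + Currency(250)
```

The caller of latest() cannot tell which type came with the system and which was added by a user, even though the two are implemented quite differently underneath.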

8. Computational completeness

It has been proven that SQL is not a computationally complete language; you cannot write everything in SQL because SQL only supports first-order logic. An object-oriented database system, by contrast, should allow any application to be written completely within it, without resorting to an external programming language. General-purpose programming languages are computationally complete, that is, Turing-complete; relational systems are not, because SQL is not.
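
A standard illustration of this limit is transitive closure: a question such as “who reports, directly or indirectly, to whom?” needs a number of joins that depends on the data, which a single first-order relational query cannot express. The Python sketch below answers it by repeating a join step until nothing new appears. The reports_to pairs and the function names are hypothetical.

```python
def join_step(reachable, edges):
    """One relational join: extend every known pair by one more edge."""
    return {(a, d) for (a, b) in reachable for (c, d) in edges if b == c}


def transitive_closure(edges):
    """Repeat the join until a fixpoint is reached.

    The loop is the part that needs a computationally complete language:
    the number of iterations is not known in advance.
    """
    closure = set(edges)
    while True:
        new_pairs = join_step(closure, edges) - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


# Hypothetical data: (employee, manager) pairs.
reports_to = {("ann", "bob"), ("bob", "cy"), ("cy", "dee")}
all_reports = transitive_closure(reports_to)
```

Each call to join_step corresponds to something a relational query can do; the surrounding loop is what has to live in a host programming language when the database language is not complete.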

Final commercial

At Altair we are implementing an object-oriented database system called O2. It is a complete OODBMS operating on Sun servers and workstations, which is one of our major engineering challenges. O2 is multi-lingual, supporting both CO2 and BasicO2 as programming languages. O2 supports a query language (OQL) that has a SQL-like syntax and a functional flavour. O2 offers two modes of operation: development and execution. In development mode, schema changes are permitted but system execution is somewhat slower. In execution mode, schema changes are not permitted but system execution is considerably faster. A version of the O2 prototype is available for distribution and a product is being derived from this prototype.

Conclusions (Technology)

While O2 and other prototype and commercial systems are successfully implementing object-oriented features, a number of research problems still need solutions. However, I am confident that with the amount of activity in the area, solutions to these problems will be found. Today, the technology needed to build such systems is about ready. Perhaps it could be said that the products are a little bit ahead of the technology, but this approach will prove effective in the long run. These companies are simply trying to be the first to the market.

On the performance side, there is nothing inherently difficult. Object-oriented database systems will quickly improve as the generations of these systems evolve. I claim, in fact, that OODBMS systems will offer far better performance than relational systems on the OO benchmarks. It is reasonable to expect that in the short term, relational and object-oriented workloads will be different.

Finally, it may take some time before these object-oriented ideas are “normalized” and uniformly adopted. It is much too early for that process to begin now.

Conclusions (Market)

The products are here! There is strong demand for this technology and customers are experimenting with these systems today, particularly with those specific applications (CAD, CAM, CIM) where current database technology is insufficient. Proof of the validity of object-oriented database systems will come in the next few years. Currently, there are seven companies offering object-oriented database system products, and at least two or three of these should survive the next three years.

There is nothing besides the weight of history (?) that keeps traditional business applications from benefiting from this technology; I predict that this will start in the mid-to-late 1990’s. To permit this to happen, OODBMS systems must offer at least the level of service provided by relational database products, because otherwise no-one will purchase them. So object-oriented database systems must be commercially viable, with, for example, full transaction support, COMMIT, ROLLBACK, and so on.

Questions

At this point François took questions from the audience, and several of the questions concerned specifics about the query language used in O2. François, in response, presented some brief examples:

The O2 language

Suppose we wish to determine the name and street of the restaurants in Paris that are rated with two stars:

select tuple( name: x.name,
              street: x.address.streetAddress.streetName )
from x in Restaurant
where (x.numberOfStars = 2) and (x.address.city = 'Paris' )

The approach of the O2 query language is a little less assertional than SQL in a relational system; the primary difference is that one needs to have associative access to sets of objects. However, if you look at a set of OQL queries, and a set of corresponding SQL queries that reference a relational schema, you don’t really see that many differences.
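
For readers more used to programming languages than query languages, the restaurant query above corresponds roughly to the following Python comprehension over in-memory objects. This is only an analogy and not O2 code: the class layout is inferred from the path expressions in the query, city is simplified to a string, and the sample restaurants are invented.

```python
from dataclasses import dataclass


@dataclass
class StreetAddress:
    number: int
    streetType: str
    streetName: str


@dataclass
class Address:
    streetAddress: StreetAddress
    city: str  # simplified to a string for this sketch


@dataclass
class Restaurant:
    name: str
    numberOfStars: int
    address: Address


# Invented sample data.
restaurants = [
    Restaurant("Chez A", 2, Address(StreetAddress(1, "rue", "de Rivoli"), "Paris")),
    Restaurant("Chez B", 3, Address(StreetAddress(2, "rue", "Cler"), "Paris")),
    Restaurant("Chez C", 2, Address(StreetAddress(3, "rue", "Mercière"), "Lyon")),
]

# select tuple(name, street) from x in Restaurant where ...
result = [
    {"name": x.name, "street": x.address.streetAddress.streetName}
    for x in restaurants
    if x.numberOfStars == 2 and x.address.city == "Paris"
]
```

The path expression x.address.streetAddress.streetName follows object references directly, where a relational schema would typically need joins or a flattened table.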

For DDL, here is an example of some class declarations:

add class Tour 
    type list ( tuple ( what: PlaceToSee, when: Date ) )
add class Address
    type tuple ( streetAddress: tuple (number: integer,
                                       streetType: string,
                                       streetName: string ),
                 city: City,
                 zipCode: integer )
add class City
    type tuple (name: string, 
                placesToSee: set ( PlaceToSee ),
                country: string )

O2 has a special language for the DBA so that the DBA can control the physical placement of objects on the disk; the language distinguishes between the logical and the physical schema.

The SQL versus object-oriented debate

Currently there is an interesting debate about whether object-oriented database technology will be developed from scratch or will evolve from changes to existing relational systems.

A similar thing occurred with older data management technologies and relational systems. There existed an opinion that while relational seemed “great”, relational was just theory, and it was unusable. But later, the opinion changed to “SQL isn’t bad, it’s just a language” and what followed was that many products started to offer SQL front-ends, such as gateways.

I think developers need to be given new design methodologies to work with object-oriented database systems. Today, many practitioners are comfortable with 3GL application programming languages and 3NF relations; we need to bridge the gap between these new systems and new design paradigms, and current design practice.

Analysis

Over the past 22 years, despite many advances in object-oriented database technology, object-oriented database systems never made it to the mainstream and never overcame the head start that relational database systems such as Oracle and DB2 already had in the marketplace.

Today it is common to see relational products, such as Oracle, labelled as “object-relational” systems, or offering “object-oriented features”. These labels are merely buzzwords, justified by their proponents because, for example, Oracle offers support for more complex types, such as spatial data. However, if you measure Oracle against the 8 “golden rules” outlined above, it is clear that Oracle is not an object-oriented database system. Attempts to make SQL object-oriented largely failed as well. Part of that effort remains in the SQL:2011 standard, with its support for structured types, array types, and entity-type hierarchies. However, these SQL type constructors are not orthogonal and SQL still does not offer inheritance, encapsulation, or overloading. It is relatively commonplace in companies to use SQL precompilers to massage SQL source code so that it works with “subclasses” or entity-type hierarchies, hardly the sophisticated and elegant inheritance and overloading mechanisms François discussed.

In 2012 object-oriented database systems are a niche product, and the products François mentioned in his talk have either disappeared entirely or have relatively minor followings. Perhaps today’s most successful OODBMS is InterSystems’ Caché, but InterSystems’ revenue pales in comparison to the revenue from systems such as IBM’s DB2, Oracle, and Microsoft SQL Server.

The drive towards better integration of object-oriented programming with database systems continues, but today along very different lines: the advent of object-relational mapping toolkits, such as Hibernate and Microsoft Entity Framework, which attempt to bridge between objects in an application program and relations within a database, with SQL as the query language in the middle. The complexity of such systems is a significant challenge for practitioners to overcome.
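
To give a feel for what such mapping layers do, here is a deliberately tiny hand-written mapping in Python over an in-memory SQLite database. It does not use Hibernate or Entity Framework and is not modelled on their APIs: the save and find_by_name functions are hypothetical, and the table simply reuses the Employee schema from the encapsulation example in the talk.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class Employee:
    employeeName: str
    age: int
    salary: int


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (employeeName CHAR(50), age INTEGER, salary INTEGER)"
)


def save(conn, emp):
    """Object -> row: flatten the object's fields into SQL parameters."""
    conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", astuple(emp))


def find_by_name(conn, name):
    """Row -> object: rebuild an Employee from the selected columns."""
    row = conn.execute(
        "SELECT employeeName, age, salary FROM Employee WHERE employeeName = ?",
        (name,),
    ).fetchone()
    return Employee(*row) if row else None


original = Employee("Ann", 40, 1000)
save(conn, original)
loaded = find_by_name(conn, "Ann")
missing = find_by_name(conn, "Zed")
```

The object that comes back has the same value as the one saved but is a different object, so object identity, one of the golden rules above, is something a mapping layer has to reconstruct for itself. Real toolkits add caching, relationship handling and query generation on top of this basic round trip, which is where much of their complexity comes from.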

[1] M. Atkinson, F. Bancilhon, D. DeWitt, K. Dittrich, D. Maier, and S. Zdonik. The Object-Oriented Database System Manifesto. In Proceedings of the First International Conference on Deductive and Object-Oriented Databases, pages 223-40, Kyoto, Japan, December 1989.