Different Kinds of Data Models: History and a Suggestion
Published: October 1, 2010
In this article, David C. Hay tries to redeem himself for some of his contributions to controversy in the data modeling world.
For an industry that is supposed to be helping the world at large get its use of language sorted out, the data modeling industry has been example number one of how not to do it. All you have to do is to start referring to “logical” data models, and you will get into an argument. Yours truly has certainly made his contribution to the controversy.
Whenever I have gone into a company, I have found that there is invariably one term that describes the very most important concept in the company. As it happens, that is the word for which it is impossible to find a single definition. For example, for the Alberta Ministry of Transportation, the word was “road.” They are in the business of building and maintaining roads. The problem is that a road is either a line, describing a path from one place (like the airport) to another (like the hotel); an area, if they are planning rights of way; or a solid, if they are engineering its physical construction.
So, the word “road” could not show up anywhere in my data model for them.
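As a minimal sketch of what that looks like, each of the three meanings becomes its own entity type, and none of them is named simply "Road." The names RoutePath, RightOfWayArea, and PavementStructure, and all of their attributes, are hypothetical illustrations. They are not the entities in the model actually built for the ministry.

```python
from dataclasses import dataclass

# Hypothetical entity types: each captures one meaning of "road".
# None of these names or attributes come from the actual Alberta model.


@dataclass(frozen=True)
class RoutePath:
    """A road as a line: a path from one place to another."""
    origin: str
    destination: str
    length_km: float


@dataclass(frozen=True)
class RightOfWayArea:
    """A road as an area: land set aside when planning rights of way."""
    parcel_id: str
    area_hectares: float


@dataclass(frozen=True)
class PavementStructure:
    """A road as a solid: the physical construction being engineered."""
    surface_material: str
    depth_m: float


ENTITY_TYPES = (RoutePath, RightOfWayArea, PavementStructure)


def entity_names():
    """Return the entity names that would appear in the model."""
    return [cls.__name__ for cls in ENTITY_TYPES]


if __name__ == "__main__":
    print(entity_names())
```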
For us in the data modeling world, there are three words in this category: “conceptual model,” “logical model,” and “physical model.”
Danette McGilvray and I laid out a table of the kinds of models that exist and the way they are described by various players in the industry. This is reproduced here as Table 1.
The table shows four kinds of “data” models, and three sets of names associated with each. The three “approaches” are:
Table 1: Model Types⁵
Essentially, the differences seem to be between (1) “overview models,” (2) “models of the business,” (3) “models of a DBMS-specific data structure,” and (4) “models of the underlying physical database structure.” These seem to be reasonable categories. The problem is whether “conceptual” refers to (1) or (2), whether “logical” refers to (2) or (3), and whether “physical” refers to (3) or (4).
The problem originated with the “Three Schema Architecture,” laid out by the American National Standards Institute (ANSI) in 1975.⁶ This saw the world of data described from three perspectives. First of all, every individual makes sense out of the world in terms of structures assembled in his own head. This view of reality is called, in ANSI’s terms, the “external schema.” Various people will look at the same chunk of reality and form different external schemata. There is (according to ANSI, at least) an underlying reality that is the source of all of these external views. This underlying reality could be represented by a “conceptual schema.” This is the schema from which the others are derived. With this, different schemata can be derived to use in support of various data technologies. Each of these is an “internal schema.” The “internal schema” can then be subdivided into the “logical schema,” which represents the conceptual structure translated into a particular data storage approach (in 1975, the principal candidates were “hierarchical” and “network”), and the “physical schema,” which deals with the actual physical storage media.
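The relationship among the three kinds of schema can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ANSI's specification: the Customer entity, its attributes, the two views, and the storage column names are all hypothetical.

```python
# Conceptual schema: one shared definition of the facts about a customer.
# Every entity, attribute, and column name below is hypothetical.
CONCEPTUAL_SCHEMA = {
    "Customer": ["customer_id", "name", "credit_limit", "shipping_address"],
}

# External schemata: each group of users sees only the attributes it cares about.
EXTERNAL_SCHEMATA = {
    "sales_view": {"Customer": ["customer_id", "name", "credit_limit"]},
    "shipping_view": {"Customer": ["customer_id", "name", "shipping_address"]},
}

# Internal schema: how one particular storage technology holds the same facts.
INTERNAL_SCHEMA = {
    "Customer": {
        "table": "CUST",
        "columns": {
            "customer_id": "CUST_ID",
            "name": "CUST_NM",
            "credit_limit": "CR_LIM",
            "shipping_address": "SHIP_ADDR",
        },
    },
}


def view_is_derivable(view):
    """True if every attribute in an external view exists in the conceptual schema."""
    return all(
        set(attrs) <= set(CONCEPTUAL_SCHEMA.get(entity, []))
        for entity, attrs in view.items()
    )


def project(record, view, entity):
    """Present a conceptual-level record through one external view."""
    return {attr: record[attr] for attr in view[entity]}


def to_internal(record, entity):
    """Translate a conceptual-level record into the internal column names."""
    columns = INTERNAL_SCHEMA[entity]["columns"]
    return {columns[attr]: value for attr, value in record.items()}
```

The point of the arrangement is that both the views and the storage layout are expressed in terms of the conceptual schema, so either can change without disturbing the other.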
In 1987, John Zachman organized the world in a different way. He addressed other things besides data, but for our purposes here, we’ll discuss the “Data” column of his “Framework for Information (now ‘Enterprise’) Architecture.”⁷ As he saw the world in 1987, these were the perspectives of interest:
More interesting—even though they were relatively abstract—were what I and my colleagues called “conceptual” models. Contrary to what many people expected, these were in fact presentable and understandable to business management. Moreover, they were also instructive to them.
These two phenomena you described appear to be examples of this more general concept. Is that so? (Well, I guess so.) Are there any other examples of that concept that you haven’t told me about? (Hmm. Now that you mention it, there is also …) And so forth.
In the data models I and my colleagues produced, about 10% of the content was what we were told. The other 90% were the logical implications of what we were told. This turned out to be very illuminating to the management group as well as to the modelers.
The result was a model that truly represented the enterprise, but in terms much more fundamental than any revealed in interviews with the people immersed in the daily grind of carrying out the business.
So, when I wrote my book, Requirements Analysis: From Business Views to Architecture, I took the liberty of squeezing another perspective into the framework. Between the business owner’s view and the designer’s view, I inserted:
Architect’s View (Model of the fundamental structure of the enterprise) – the integration of multiple owners' views to arrive at a coherent, unified view of the enterprise. This view is in terms of fundamental entities, of which those seen by the owners are examples.
This moved the Designer’s View to row 4, which seemed appropriate, since the bottom three rows should have been about technology, while the top three rows should have been about the enterprise.
The Builder’s and the Out-of-Context View got collapsed into a single “Builder’s View.” This was not unreasonable, since the distinction between the two in Mr. Zachman’s description seemed less than clear, to me at least.
I got heat from some of Mr. Zachman’s followers (although not from him) for tinkering with the sacred Zachman Framework. Much to my pleasant surprise, however, a few years later, he and Stan Locke showed me their updated version of the framework. In this one, the views were these:
Meanwhile, the question of what exactly it meant to model the business owner’s view kept arising, and with it the topic of semantics. How can we systematically address the language used by business owners to describe their work? This was not a trivial task, since people in different departments often described the same thing in different terms and used the same terms to describe very different things.
Out of this came the effort, begun by the Business Rules Group and eventually taken over by the Object Management Group, to describe the “Semantics of Business Vocabulary and Business Rules” (SBVR). This was published in 2008.⁹
Here, for the first time, you had a comprehensive approach to describing an enterprise’s rules in a consistent form—and by virtue of that, you had the ability to describe the enterprise’s structure (as seen by the business owners) in a consistent way. The result of this is linguistic, not graphical, but it was a major step toward capturing the semantics of the enterprise.
Meanwhile, other people, completely outside the world of business and databases, who had been working in the area of linguistics for, say, several millennia, were inspired by the advent of the World Wide Web to bring their knowledge of semantics and ontology into that world. It turned out that those architectural data models my colleagues and I were building were examples of something called an ontology.¹⁰ The word originally described the branch of ancient Greek philosophy concerned with finding and describing “what exists.” (This is a lot trickier than you might suppose. Look at our efforts in this regard. Philosophers have been at it for a very long time. There are still some questions about that today, but that is beyond the scope of this discussion.) In modern terms, an ontology is a collection of terms to describe what exists in an enterprise. The terms must be meaningful in a particular context, with well-defined relationships and a means for drawing inferences from them. Out of conversations that included Sir Tim Berners-Lee and the World Wide Web Consortium (W3C) came the Semantic Web, a way of linking data (not just pages) over the World Wide Web. From that came the semantic languages RDF and OWL.
Without going into detail, the Resource Description Framework (RDF) is a way of describing the world in terms of simple sentences. The Web Ontology Language (OWL; don’t ask about the acronym) builds on RDF to create a much more powerful, structured language for describing the world. With these two languages, it became possible to describe an enterprise with the language provided by its workers, and then to use software called “inference engines” to identify discontinuities and conflicts of terminology. An architectural model can be directly mapped into these two languages, by the way.¹¹
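To make "simple sentences" concrete, here is a minimal sketch in plain Python rather than a real RDF toolkit. Each statement is a (subject, predicate, object) triple. A toy function stands in for an inference engine: it applies the RDFS idea that subclass relationships are transitive and that an instance of a class is also an instance of its superclasses. The `ex:` names are hypothetical, and the prefixed strings are shorthand for illustration, not full IRIs.

```python
TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Three "simple sentences", each a (subject, predicate, object) triple.
# The ex: terms are invented for this example.
TRIPLES = {
    ("ex:Highway2", TYPE, "ex:Highway"),
    ("ex:Highway", SUBCLASS_OF, "ex:Route"),
    ("ex:Route", SUBCLASS_OF, "ex:GeographicFeature"),
}


def infer(triples):
    """Apply two rules repeatedly until nothing new is derived:
    1. subClassOf is transitive;
    2. an instance of a class is an instance of that class's superclasses.
    """
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                # Only chain a statement onto a subclass statement about its object.
                if o != s2 or p2 != SUBCLASS_OF:
                    continue
                if p == SUBCLASS_OF:
                    new.add((s, SUBCLASS_OF, o2))
                elif p == TYPE:
                    new.add((s, TYPE, o2))
        if new <= known:
            return known
        known |= new


if __name__ == "__main__":
    for triple in sorted(infer(TRIPLES)):
        print(triple)
```

Nothing in the input says that `ex:Highway2` is a geographic feature, yet the result contains that statement. Real inference engines work from far richer rule sets, but the principle of deriving unstated facts from stated ones is the same.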
In short, the issue of how to deal with the semantics of an organization (Row 2 of Mr. Zachman’s Framework) is finally being addressed.
So, where does this leave our original problem with conceptual and logical models? I hereby modify my original organization described above. Harking back to the original ANSI ideas about the “External,” “Conceptual” and “Internal” schema, as updated by the upgraded Zachman Framework, I propose the following definitions:
David C. Hay - In the information industry since it was called “data processing,” Dave Hay has been producing data models to support strategic and requirements planning for more than twenty-five years. As President of Essential Strategies, Inc. for nearly twenty of those years, Dave has worked in a variety of industries including, among others, banking, clinical pharmaceutical research, broadcasting, and all aspects of oil production and processing. Projects entailed various aspects of defining corporate information architecture, identifying requirements, and planning strategies for the implementation of new systems.
Dave’s recently published book, Enterprise Model Patterns: Describing the World, is an “upper ontology” consisting of a comprehensive model of any enterprise from several levels of abstraction. It is the successor to his groundbreaking 1995 book, Data Model Patterns: Conventions of Thought – the original book describing standard data model configurations for standard business situations.
In between, he has written Requirements Analysis: From Business Views to Architecture (2003) and Data Model Patterns: A Metadata Map (2006). Since he took the unusual step of using UML in the Enterprise Model Patterns… book, a follow-on book, UML and Data Modeling: A Reconciliation, was published later in 2011. This book shows data modelers how to adapt the UML notation to their purposes, and shows UML modelers how to adapt UML to produce business-oriented architectural models.
Dave has spoken at numerous international and local DAMA, semantics, and other conferences as well as at various user group meetings. He can be reached at firstname.lastname@example.org, (713) 464-8316, or via his company's website.