Architecture Made Easy, Part 1
Data Modeling: Rules of Abstraction
Published: April 3, 2008
This article describes how to avoid getting a dozen different data models from twelve experienced data modelers who were given the same requirements.
Numerous concepts have been introduced into the dialog of data architecture and design over the past quarter century. Whether the concepts are those of Martin, Codd, Simsion, Yourdon, Date, DeMarco, Booch, Gane or Chisholm, the results have been either good or bad, depending simply on who worked on the project. However, as automated systems within an enterprise become larger and more numerous, the one characteristic that steadily becomes more pervasive is the incompatibility of data and database designs across applications.
In fact, incompatibility is frequently the outcome when people lose contact with one another as the result of forming a new group. It does not take long for new groups to head off in their own direction; it can occur in minutes, never mind what happens after months or years. The need for autonomy is usually explained as necessary in order to get anything done in a reasonable amount of time. As a result, numerous vocabularies, data structures, and information architectures are created with each new project.
The good news, however, is that there is an approach that can bring together people on different projects, with disparate vocabularies and different paradigms, so that for the first time modelers who model the same things no longer produce different data models.
As you may have already guessed, the vocabulary itself is not the key. A number of companies have attempted to consolidate their vocabulary into a specific set of terms. However, no matter how well these terms are defined, the ability to create disparate database designs is more formidable than a vocabulary can address.
We are also not talking about the types of differences that would result from having different knowledge, business priorities, or modeling style, as style can be addressed by rigorous standards. The differences that are created every day in data models arise from how we choose words and conceptualize abstractions: aye, there’s the rub.
For example, automation systems fall into two major types. Control systems live in the tangible world of physical objects and operate mechanical equipment, such as a B-2 bomber, which is far too complicated to fly without a computer. Information systems cross into the world of intangible ideas and concepts.
Between these two types of automation systems, the disciplines of design and automation appear to go completely separate ways. From a process perspective, the two types truly belong to distinct paradigms, with about a hundred fundamental differences between them (which is a different and exciting story).
From a data modeling perspective, however, control and information systems are not fundamentally different; they just abstract things differently. More precisely, in control systems we do not have to abstract things at all, whereas in information systems we abstract many things at many levels of abstraction, for both process and data.
From a data modeling perspective, control systems inherit concrete physical objects that occasionally differ in their names, while information systems inherit ambiguous language-based names that can vary widely from one individual to the next, not only in vocabulary but in the way each individual abstracts an idea before labeling it with a name.
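As a rough illustration of that last point, the sketch below shows two modelers describing the same real-world things at different levels of abstraction. The entity and attribute names (Customer, Supplier, Party, role) are hypothetical and chosen only for this example; they do not come from any particular project.

```python
from dataclasses import dataclass, fields

# Modeler A (hypothetical): one entity per business term.
@dataclass
class Customer:
    customer_id: int
    customer_name: str

@dataclass
class Supplier:
    supplier_id: int
    supplier_name: str

# Modeler B (hypothetical): a single, more abstract entity with a role.
@dataclass
class Party:
    party_id: int
    party_name: str
    role: str  # for example "customer" or "supplier"

def attribute_names(*entities):
    """Collect the attribute names declared by the given entity classes."""
    return {f.name for entity in entities for f in fields(entity)}

model_a = attribute_names(Customer, Supplier)
model_b = attribute_names(Party)

# Attribute names that the two models have in common.
shared = model_a & model_b
print(sorted(shared))
```

Both models are defensible, yet they have no attribute names in common, so nothing in one design lines up directly with the other.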
Existing engineering disciplines, such as normalization, reorganize attributes to minimize the redundancy of data values, but offer no ability to alter an abstraction, never mind render a consistent set of abstractions from multiple disparate ones.
So what have the great minds of architecture and design been missing? Although the answer has been elusive for a number of decades, the solution is surprisingly simple.
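A minimal sketch of what normalization does and does not do, using made-up order data (the table layout and the names order_id, customer, city and amount are assumptions for illustration only):

```python
# Hypothetical denormalized rows: the customer's city repeats on every order.
orders_flat = [
    {"order_id": 1, "customer": "Acme", "city": "New York", "amount": 250},
    {"order_id": 2, "customer": "Acme", "city": "New York", "amount": 100},
    {"order_id": 3, "customer": "Bolt", "city": "Boston", "amount": 75},
]

def normalize(rows):
    """Split rows into a customer table and an order table.

    The city is stored once per customer instead of once per order.
    """
    customers = {}
    orders = []
    for row in rows:
        customers[row["customer"]] = {"city": row["city"]}
        orders.append(
            {
                "order_id": row["order_id"],
                "customer": row["customer"],
                "amount": row["amount"],
            }
        )
    return customers, orders

customers, orders = normalize(orders_flat)
```

The redundant city values are gone, but the abstraction itself ("customer") is exactly what the modeler started with. Had another modeler begun with a different abstraction of the same thing, normalization would produce an equally clean but still incompatible design.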
In the final analysis, it doesn’t even matter what particular name we assign a given thing, as long as we perform a set of basic steps to ensure that our abstractions are consistent and correct before we begin the data normalization process. As such, the Four Rules of Abstraction are extremely straightforward.
The Rules of Abstraction
1st Abstract Form (1AF) – self dependence
2nd Abstract Form (2AF) – time dependence
3rd Abstract Form (3AF) – essential dependence
4th Abstract Form (4AF) – accidental dependence
Although the rules of abstraction may not be appropriate for individual user conceptual data models, where it is more important to capture the precise language of the business user, they do become appropriate for a unified conceptual model and essential for a logical data model.
Once the rules of abstraction have been applied, tasks like mapping data elements to fields in metadata repositories and columns in databases become an easy exercise.
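To make the mapping task concrete, here is a small sketch that assumes the data elements have already been consistently abstracted. The element names and the physical column names are hypothetical, and the sketch does not implement the rules of abstraction themselves.

```python
# Hypothetical logical data elements, assumed to be consistently abstracted.
data_elements = ["party_id", "party_name", "party_role"]

# Hypothetical physical columns in one application's database.
column_map = {
    "party_id": "PARTY.PARTY_ID",
    "party_name": "PARTY.NAME",
    "party_role": "PARTY.ROLE_CD",
}

def map_elements(elements, mapping):
    """Return (mapped, unmapped).

    mapped is a dict of element -> column for elements found in the mapping;
    unmapped is a list of elements that have no column.
    """
    mapped = {e: mapping[e] for e in elements if e in mapping}
    unmapped = [e for e in elements if e not in mapping]
    return mapped, unmapped

mapped, unmapped = map_elements(data_elements, column_map)
```

When the abstractions agree, every element finds exactly one column and the unmapped list stays empty; when they do not, the leftovers in that list are where the manual reconciliation work begins.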
Put another way, using the rules of abstraction within information systems is a way to bring information systems one step closer to the tangible world of control systems, where architecture and design are significantly easier in either an object-oriented or SOA approach.
In any event, please let me know if you enjoyed this article. Corrections, enhancements and suggestions are always welcome.
James Luisi - Jim has thirty years of experience in information architecture, architecture and governance within control and information systems in the financial and defense industries. More information is available on LinkedIn.com; feel free to send him a link. Jim is an author, a conference speaker, and an enterprise architect at a large financial conglomerate in the New York area.