Building a Better Data Model

Introduction

As information systems become more complicated and specialized, data analysts need a better way to express more information in their models. Whether we are designing a
cutting-edge transactional system or a data warehouse, we rely on our user community to define the business rules that dictate and validate the data. Further, we are beginning to realize the
importance not only of a common repository of metadata, but also of capturing another layer of analysis: a conceptual level of knowledge.

We first need to be able to design a conceptual schema, one that accurately and completely defines business rules in a way that our users can understand. This conceptual layer is free of
implementation details such as database vendor and schema implementation (Relational vs. Object-Relational (OR) vs. Object-Oriented (OO), etc.). We can then express that conceptual knowledge in a
somewhat implementation-biased and more abstract logical notation. Finally, we can express the model in a physical notation, whose schema may not be exactly like that of the logical
layer (because of details such as controlled denormalization).

In this fashion, the logical and physical schemas are nothing more than an abstraction of the conceptual schema, which contains all of the information we need to accurately express the business
rules and data requirements. We can then see our conceptual schema represented as any target implementation in any syntax notation (IDEF1X, Chen, UML, etc.). We can toggle between them at will,
especially when we are dealing with users more versed in one notation over the others or with companies that have standards in place that dictate the model syntax used (for example, in government
work, IDEF1X). Which syntax you use to describe the logical level is irrelevant at the design stage. What really matters is the data and the rules we apply to it.

We also need to communicate better with our users. For example, say you are validating your model with your users using terms like entities, attributes, and relationships (or even worse, foreign
keys, referential integrity, and tuples) and your users are vigorously nodding while giving you a blank look. Odds are, they don’t have a complete understanding of what you are talking about. Most
of the time, they won’t even ask for clarification. The reason for this is simple: we use a very obscure language that most people don’t understand (or even want to understand). And if your user
doesn’t quite understand what you are talking about but nods vigorously anyway, I bet the model isn’t completely accurate.

Many of the points I have raised above have been addressed by a different sort of data modeling notation than most people are familiar with. I contend that in order to build a better data model, we
need to capture as much information as we can at the conceptual level, and Object Role Modeling (ORM) is the best way to do this.

ORM in a Nutshell

Designing a database requires a complete understanding of the subject area, or Universe of Discourse (UoD), to be implemented. Thus, a good database model is one that specifies the UoD in a clear
and unambiguous way. ORM uses a natural language (English, for example, but any language will work) and easy-to-understand diagrams that are populated with example data to accomplish this goal. No
other popular notation allows you to do this, yet it is critical to getting the model right.

Another notable aspect of ORM is that, since it is based on natural language, it can be completely expressed in either graphical or textual format. Further, a natural language is much easier for your
users to understand, express, and verify than the technical jargon that we tend to use.

The root of ORM is the elementary fact. You express the UoD in terms of objects (such as person, department, project, etc.) playing roles (works for, manages, reports to, etc.). You make no
distinction as to whether an object is an attribute or an entity. In essence, you are delaying commitment on importance, which lets you concentrate on the data and the rules and makes
later adjustments easy. You just express the UoD in simple, easy-to-understand facts such as: “Person works for Department”, “Person works on Project”, “Person manages
Department”, “Department manages Project”, “Person reports to Person”, “Person has Parking Space”, “Person receives parking reimbursement in Amount”, “Person drives Car”, and “Person
owns Car”.
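To make the idea of elementary facts concrete, here is a minimal sketch in Python that stores the fact types listed above as plain data. The `FactType` class and the helper functions are hypothetical names chosen for illustration; they are not part of ORM or of any ORM tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactType:
    """A binary elementary fact type: <subject type> <predicate> <object type>."""
    subject_type: str
    predicate: str
    object_type: str

    def verbalize(self) -> str:
        # Read the fact type back as a natural-language sentence.
        return f"{self.subject_type} {self.predicate} {self.object_type}"


# The fact types from the text, stored without an entity/attribute split.
FACT_TYPES = [
    FactType("Person", "works for", "Department"),
    FactType("Person", "works on", "Project"),
    FactType("Person", "manages", "Department"),
    FactType("Department", "manages", "Project"),
    FactType("Person", "reports to", "Person"),
    FactType("Person", "has", "Parking Space"),
    FactType("Person", "receives parking reimbursement in", "Amount"),
    FactType("Person", "drives", "Car"),
    FactType("Person", "owns", "Car"),
]


def object_types(fact_types):
    """Every object type mentioned in any fact type, sorted by name."""
    names = set()
    for ft in fact_types:
        names.add(ft.subject_type)
        names.add(ft.object_type)
    return sorted(names)


def roles_played(fact_types, name):
    """Number of roles an object type plays across all fact types."""
    return sum(
        (ft.subject_type == name) + (ft.object_type == name)
        for ft in fact_types
    )


for ft in FACT_TYPES:
    print(ft.verbalize())
```

Counting roles with `roles_played` hints at how importance can surface later: an object type that plays many roles is a likely candidate for a table of its own when the schema is mapped.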

Using this fact-based approach, ORM makes reengineering and schema evolution a simple exercise. Further, this approach simplifies normalization worries: the elementary nature of the facts ensures
that the schema is in an optimal (often, fifth) normal form. This approach also allows you to define constraints at what ER would call the attribute level. I have listed the kinds of constraints you can use in ORM
below, with (sometimes unrealistic) examples of their potential uses. See Figure 1 for an example of how these facts would be graphically represented.

Subset constraints: “A Person can manage a Department only if that Person works for that Department.”
Equality constraints: “If a Person owns a Car they must also drive that Car, and if a Person drives a Car they must own that Car.”
Exclusionary constraints: “A Person can either have a Parking Space or receive a parking reimbursement, not both.”
Mandatory disjunction: “A Person must either have a Parking Space or receive a parking reimbursement.”
Frequency: “Person may own a maximum of two Cars.”
Ring constraints: Given a fact in the form of ‘Person plays a role with Person’, specify whether the relationship is reflexive, symmetric, transitive, irreflexive, asymmetric, antisymmetric,
and/or intransitive, such as: “A Person cannot report to themselves (irreflexive).”
Or any combination of the above constraints such as: “A Person can only work on a Project if that Person’s Department manages that Project.”
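Several of the constraints above can be read as simple set conditions over a sample population. The sketch below is one way to check them in Python. The sample data and all function names are assumptions made for illustration; an ORM tool would express and enforce these constraints in its own way.

```python
from collections import Counter

# Hypothetical sample population; each fact type is a set of (Person, other) pairs.
people = {"Ann", "Bob", "Cy"}
works_for = {("Ann", "Sales"), ("Bob", "Sales"), ("Cy", "IT")}
manages = {("Ann", "Sales")}
has_parking_space = {("Ann", "P1")}
receives_reimbursement = {("Bob", 50), ("Cy", 50)}
owns = {("Ann", "Car1"), ("Ann", "Car2"), ("Bob", "Car3")}
reports_to = {("Bob", "Ann"), ("Cy", "Ann")}


def players(facts):
    """Objects playing the first role of a binary fact type."""
    return {first for first, _ in facts}


def subset_violations(sub, sup):
    """Subset constraint: facts in `sub` that have no matching fact in `sup`."""
    return sub - sup


def exclusion_violations(a, b):
    """Exclusion constraint: objects that play both first roles."""
    return players(a) & players(b)


def mandatory_disjunction_violations(population, a, b):
    """Mandatory disjunction: objects that play neither first role."""
    return population - (players(a) | players(b))


def frequency_violations(facts, maximum):
    """Frequency constraint: objects playing the first role too often."""
    counts = Counter(first for first, _ in facts)
    return {obj for obj, n in counts.items() if n > maximum}


def irreflexive_violations(facts):
    """Ring constraint (irreflexive): facts that relate an object to itself."""
    return {(a, b) for a, b in facts if a == b}


# "A Person can manage a Department only if that Person works for it."
print(subset_violations(manages, works_for))
# "A Person may own a maximum of two Cars."
print(frequency_violations(owns, 2))
```

Each function returns the offending objects or facts rather than a yes/no answer, so a failed check points directly at the sample rows to discuss with the business experts.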

Figure 1:

An example ORM schema (sample data omitted)

ORM also allows you to have facts with an arity (number of objects in the fact) greater than two. For example, let’s say you have a fact like “Movie receives Rating in Country”. ORM allows you
to model this fact naturally or via nesting, which allows you to add other (potentially optional) roles to the nested fact. See Figure Two.

Figure 2:

A ternary example in ORM: (left) flattened version (right) nested version (sample data omitted). The nested version (in this case) indicates that the rating is optional while the flattened version
dictates a rating must be given.
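The difference between the two versions can be sketched with plain Python data structures. In this sketch the nested (objectified) fact is called `Release`, read as "Movie is released in Country"; that name, like the sample movies and ratings, is an assumption for illustration and does not come from the figure.

```python
from typing import NamedTuple


class MovieRating(NamedTuple):
    """Flattened ternary fact: Movie receives Rating in Country."""
    movie: str
    rating: str
    country: str


class Release(NamedTuple):
    """Nested fact (hypothetical name): Movie is released in Country."""
    movie: str
    country: str


# Flattened version: a fact exists only when a rating is known.
flattened = {MovieRating("Movie A", "PG", "US")}

# Nested version: the release exists on its own; the rating is an optional role.
releases = {Release("Movie A", "US"), Release("Movie A", "AU")}
release_rating = {Release("Movie A", "US"): "PG"}


def unrated(all_releases, ratings):
    """Releases that do not (yet) play the optional 'receives Rating' role."""
    return {r for r in all_releases if r not in ratings}


def flatten(ratings):
    """Rebuild the flattened ternary facts from the nested version."""
    return {MovieRating(r.movie, rating, r.country) for r, rating in ratings.items()}


print(unrated(releases, release_rating))
print(flatten(release_rating) == flattened)
```

The nested form can record that "Movie A" was released in "AU" before any rating exists, which the flattened form cannot represent.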

ORM also uses a formalized, accurate, and complete definition of subtyping and inheritance, derived fields (including a distinction between merely derived and derived-and-stored), and schema
transformations and evolution.

Finally, mapping the conceptual schema into a logical schema and a physical schema, and implementing the constraints, are trivial (especially with the use of CASE tools; however, mapping issues and
CASE tool implementation are beyond the scope of this article).

Advantages of Using Object Role Modeling

ER’s logical model implementation allows you to view objects and relationships, and a few constraints, at a high level in a compact notation. However, ER’s use of attributes makes the model
inherently unstable with regard to schema changes. Further, ER schemas make it difficult to apply a population check (with real data) and are missing many important constraints, particularly
at the attribute level. ER relationships also tend to be binary (while ORM allows relationships of any arity), which forces you to use unnatural intersection entities and other conceptual
falsehoods. ORM allows you to speak to the business experts in their own language, without having to use such artificial constructs.

Other benefits of ORM include:

The fact-based approach of ORM is a simpler and more accurate approach as it is easier to get one fact correct than many facts simultaneously.
ER tends to set a level of importance (is it an entity or an attribute?) early in the modeling process. If you do not perform those initial steps correctly the first time, you will end up
changing your model later (and possibly correcting the data itself). ORM assigns no initial importance to objects at all. Rather, the importance of a particular object reveals itself much later on (by
discovering that you have many roles attached to an object, or by actually mapping the entities/tables).
In ORM, cardinality is linked to the sample data sets you provide while modeling. It is much easier to determine constraints when you are looking at the provided data.
In ORM, semantic domains (i.e., units or ranges such as “name”, “SSN”, “age”, “date”) are automatically included. This allows for stronger typing and is less error-prone.
ORM has less implicit duplication of attributes than ER does. For example, in ER, you could have an athlete entity with two attributes: the country they represent and their birthplace.
(Experienced modelers would point out that this is bad ER modeling. For the sake of argument, say the initial model only had the country the athlete represents and the birthplace was added later.
Making this change is not so easy once the model is implemented.) In ORM, you would have only two objects, Athlete and Country, which play two roles with each other: “… represents …” and “…
born in …”. Again, ORM is less prone to errors (the previous mistake is impossible). Being less error-prone means your model (and therefore your database) is much more stable, and the magnitude of
problems caused by a change to the model is lessened.
Finally, ORM has many more constructs inherent to the language and is therefore more expressive of the actual UoD.
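Two of the points above, adding a second role between Athlete and Country and reading cardinality from sample data, can be sketched in a few lines of Python. The dictionary layout, the function name, and all athlete and country names are assumptions made for illustration.

```python
from collections import defaultdict

# Conceptual schema as fact types over two object types, Athlete and Country.
# All athlete and country names are made-up sample data.
schema = {
    "Athlete represents Country": {
        ("Athlete A", "Country X"),
        ("Athlete B", "Country X"),
    },
}

# The later requirement adds a second fact type; the first one is not touched.
schema["Athlete born in Country"] = {
    ("Athlete A", "Country Y"),
    ("Athlete B", "Country X"),
}


def uniqueness_counterexamples(facts):
    """First-role objects that are paired with more than one second-role object.

    An empty result means the sample population is consistent with the rule
    "each first-role object plays this role at most once".
    """
    partners = defaultdict(set)
    for first, second in facts:
        partners[first].add(second)
    return {first for first, seen in partners.items() if len(seen) > 1}


for name, population in schema.items():
    print(name, uniqueness_counterexamples(population))
```

An empty result does not prove the constraint holds in the business; it only shows the sample data does not contradict it, which is the prompt for asking the users whether an athlete could ever represent two countries.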

Conclusion

I am not suggesting that you abandon your ER or OO models. In fact, I believe that an ER or OO notation is quite adequate for expressing a compact summary of your data structure. But it should be
used only as an abstraction of the conceptual model, from which the logical and physical schemas can be completely derived and which captures all of the information and rules that you need to accurately depict and
implement your data model. Further, I don’t believe that you should always use an ER model to verify your model with your users, as it is often too obscure and foreign to them and
doesn’t express enough information for them to completely verify the model. The final notation you use is irrelevant: it’s how you get there that is important.

Further Reading

This article was intended to be an introductory glance at the benefits of Object Role Modeling. I have made little attempt to show the method in practice due to space limitations and the scope of
this article. Further reading on these subjects is recommended and encouraged. In addition to the sources cited in the references below, readers can find additional information in the Journal of
Conceptual Modeling (www.inconcept.com) and at The Official Site for Conceptual Data Modeling (www.orm.net).
The papers by Dr. Terry Halpin (and interviews with Dr. Halpin) cited below are found at the latter URL.

References

Frank, M., Black Belt Design (interview), DBMS, September, 1995

Hallock, P.J. & Becker, S.A., Introduction to Data Modeling (lecture)

Halpin, T.A., Business Rules and Object Role Modeling, Database Programming & Design, October, 1996

Halpin, T.A. 1995, Conceptual Schema and Relational Database Design, 2nd edn, Prentice Hall Australia.

Halpin, T.A. Object Role Modeling: An Overview

Ross, R.G., Modeling for Data and Business Rules (An Interview with Terry Halpin), Data Base Newsletter, vol. 25, no. 5, September/October 1997


Scot Becker
