Building a Better Data Model
Published: December 1, 1998
Introduction
As Information Systems get more complicated and specialized, Data Analysts find themselves in need of a better way to express more information in their models. Whether we are designing a cutting-edge transactional system or a data warehouse, we rely on our user community to define the business rules that dictate and validate the data. Further, we are beginning to realize the importance not only of a common repository of meta-data, but also of capturing another layer of analysis, namely, a conceptual level of knowledge.
We need to be able to first design a conceptual schema, one that accurately and completely defines business rules in a way that our users can understand. This conceptual layer is free of implementation details such as database vendor and schema implementation (Relational vs. Object-Relational (OR) vs. Object-Oriented (OO), etc.). We can then express that conceptual knowledge in a somewhat implementation-biased and more abstract logical notation. Finally, we can express the model in a physical notation, in a schema that may not be exactly like that of the logical layer (I am speaking of details such as controlled de-normalization here). In this fashion, the logical and physical schemas are nothing more than abstractions of the conceptual schema, which contains all of the information we need to accurately express the business rules and data requirements.
We can then see our conceptual schema represented as any target implementation in any syntax notation (IDEF1X, Chen, UML, etc.). We can toggle between them at will, especially when we are dealing with users more versed in one notation than the others, or with companies that have standards in place that dictate the model syntax used (for example, in government work, IDEF1X). What syntax you use to describe the logical level is completely irrelevant at the design stage. What is really important is the data, and the rules we apply to the data.
We also need to communicate better with our users. For example, say you are validating your model with your users using terms like entities, attributes, and relationships (or even worse, foreign keys, referential integrity, and tuples) and your users are vigorously nodding while giving you a blank look. Odds are, they don't have a complete understanding of what you are talking about. Most of the time, they won't even ask for clarification. The reason for this is simple: we use a very obscure language that most people don't understand (or even want to understand). And if your users don't quite understand what you are talking about but nod vigorously anyway, I bet the model isn't completely accurate.
Many of the points I have raised above have been addressed by a different sort of data modeling notation than most people are familiar with. I contend that in order to build a better data model, we need to capture as much information as we can at the conceptual level, and Object Role Modeling (ORM) is the best way to do this.
ORM in a Nutshell
Designing a database requires a complete understanding of the subject area, or Universe of Discourse (UoD), to be implemented. Thus, a good database model is one that specifies the UoD in a clear and unambiguous way. To accomplish this goal, ORM uses a natural language (English, for example, but any language will work) and easy-to-understand diagrams that are populated with example data. No other popular notation allows you to do this, yet it is extremely critical. Another notable aspect of ORM is that, since it is based on natural language, it can be completely expressed in either graphical or textual format. Further, a natural language is much easier for your users to understand, express, and verify than the technical jargon that we tend to use.
The root of ORM is the elementary fact. You express the UoD in terms of objects (such as person, department, project, etc.) playing roles (works for, manages, reports to, etc.). You make no distinction as to whether an object is an attribute or an entity. In essence, you are delaying commitment on importance, which lets you concern yourself only with the data and the rules and makes later adjustments easy. You just express the UoD in simple, easy-to-understand facts such as: "Person works for Department", "Person works on Project", "Person manages Department", "Department manages Project", "Person reports to Person", "Person has Parking Space", "Person receives parking reimbursement in Amount", "Person drives Car", and "Person owns Car".
Using this fact-based approach, ORM makes reengineering and schema evolution a simple exercise. Further, this approach simplifies normalization worries: the elementary nature of the facts ensures that the schema is in an optimal (often, fifth) normal form. This approach also allows you to make (in ER terms) attribute-level constraints. I have listed the kinds of constraints you can use in ORM with (sometimes, unrealistic) examples of potential uses of these constraints below. See Figure One for an example of how these facts would be graphically represented.
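As a rough illustration of the fact-based idea, the following Python sketch stores an elementary fact type as two object types joined by a predicate, populates it with sample data, verbalizes each row as a sentence, and checks one constraint against the population. The class `FactType`, its method names, the sample people and departments, and the rule "each Person works for at most one Department" are all assumptions made for this example; they are not part of ORM's notation or of any CASE tool.

```python
from dataclasses import dataclass, field


@dataclass
class FactType:
    """A binary elementary fact type (hypothetical helper for illustration)."""
    subject: str    # object type playing the first role, e.g. "Person"
    predicate: str  # role reading, e.g. "works for"
    obj: str        # object type playing the second role, e.g. "Department"
    population: list = field(default_factory=list)  # sample (subject, object) pairs

    def reading(self) -> str:
        """The fact type as a sentence pattern a user can verify."""
        return f"{self.subject} {self.predicate} {self.obj}"

    def add(self, subject_value: str, object_value: str) -> None:
        """Add one sample fact to the population."""
        self.population.append((subject_value, object_value))

    def verbalize(self) -> list:
        """Render every sample fact as a plain sentence."""
        return [
            f"{self.subject} '{s}' {self.predicate} {self.obj} '{o}'"
            for s, o in self.population
        ]

    def uniqueness_violations(self) -> list:
        """Subject values that occur more than once, i.e. rows that would
        break an assumed 'at most one' rule on the first role."""
        seen, duplicates = set(), []
        for s, _ in self.population:
            if s in seen and s not in duplicates:
                duplicates.append(s)
            seen.add(s)
        return duplicates


works_for = FactType("Person", "works for", "Department")
works_for.add("Ann", "Sales")
works_for.add("Bob", "Research")
works_for.add("Ann", "Research")  # made-up row that conflicts with the assumed rule

print(works_for.reading())
for sentence in works_for.verbalize():
    print(sentence)
print("Violates 'at most one Department':", works_for.uniqueness_violations())
```

The point of the sketch is the population check: showing users the sentence "Person 'Ann' works for Department 'Research'" next to "Person 'Ann' works for Department 'Sales'" lets them say whether the rule or the data is wrong, without any talk of keys or tables.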
ORM also allows you to have facts with an arity (number of objects in the fact) greater than two. For example, let's say you have a fact like "Movie receives Rating in Country". ORM allows you to model this fact naturally or via nesting, which allows you to add other (potentially optional) roles to the nested fact. See Figure Two.
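The two ways of modeling this fact can be sketched in Python as follows. The flat form keeps "Movie receives Rating in Country" as a single three-role fact. The nested form first treats the Movie and Country pair as an object in its own right and then attaches the rating, plus an optional extra role, to that object. The names `MovieRating`, `Release`, and `opened_on`, and all sample values, are hypothetical and chosen only for this example.

```python
from typing import NamedTuple


class MovieRating(NamedTuple):
    """Flat ternary fact: Movie receives Rating in Country."""
    movie: str
    rating: str
    country: str


class Release(NamedTuple):
    """Nested (objectified) fact: Movie is released in Country."""
    movie: str
    country: str


# Flat form: each row fills all three roles at once.
flat_facts = [
    MovieRating("Example Movie", "PG", "US"),
    MovieRating("Example Movie", "12", "UK"),
]

# Nested form: the Release object plays further roles of its own.
release_rating = {}  # Release receives Rating
opened_on = {}       # Release opened on Date (hypothetical optional role)

for fact in flat_facts:
    release_rating[Release(fact.movie, fact.country)] = fact.rating

# The optional role is recorded for only one of the two releases.
opened_on[Release("Example Movie", "US")] = "1998-12-01"


def flatten(ratings: dict) -> list:
    """Rebuild the flat ternary facts from the nested form."""
    return [
        MovieRating(release.movie, rating, release.country)
        for release, rating in ratings.items()
    ]


print(flatten(release_rating) == flat_facts)
print(opened_on.get(Release("Example Movie", "UK")))
```

Both forms carry the same rating information, which is why the round trip through `flatten` gives back the original rows. The nested form earns its keep only when the objectified fact needs additional roles, such as the optional date here.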
ORM also uses a formalized, accurate, and complete definition of subtyping and inheritance, derived fields (including a distinction between merely derived and derived-and-stored), and schema transformations and evolution. Finally, mapping the conceptual schema into a logical schema and a physical schema, and implementing the constraints, are trivial, especially with the use of CASE tools (mapping issues and CASE tool implementation are, however, beyond the scope of this article).
Advantages of Using Object Role Modeling
ER's logical model implementation allows you to view objects and relationships, and a few constraints, at a high level in a compact notation. However, ER's use of attributes makes the model inherently unstable with regard to schema changes. Further, ER schemas make it difficult to apply a population check (with real data) and are missing many important constraints, particularly at the attribute level. ER relationships also tend to be binary (while ORM allows relationships of any arity), which forces you to use unnatural intersection entities and other conceptual falsehoods. ORM allows you to speak to the business experts in their own language, without having to use such artificial constructs. Other benefits of ORM include:
Conclusion
I am not suggesting that you abandon your ER or OO models. In fact, I believe that an ER or OO notation is quite adequate for expressing a compact summary of your data structure. But it should be used only as an abstraction of the conceptual model, from which the logical and physical schemas can be completely derived and which captures all of the information and rules that you need to accurately depict and implement your data model. Further, I don't believe that you should always use an ER model to verify your model with your users, as it is often too obscure and foreign to them and doesn't express enough information for them to completely verify the model. The final notation you use is completely irrelevant: it's how you get there that is important.
Further Reading
This article was intended to be an introductory glance at the benefits of Object Role Modeling. I have made little attempt to show the method in practice due to space limitations and the scope of this article. Further reading on these subjects is recommended and encouraged. In addition to the sources cited in the references below, readers can find additional information in the Journal of Conceptual Modeling (www.inconcept.com) and at The Official Site for Conceptual Data Modeling (www.orm.net). The papers by Dr. Terry Halpin (and interviews with Dr. Halpin) cited below are found at the latter URL.
References
Frank, M., Black Belt Design (interview), DBMS, September 1995.
Hallock, P.J. & Becker, S.A., Introduction to Data Modeling (lecture).
Halpin, T.A., Business Rules and Object Role Modeling, Database Programming & Design, October 1996.
Halpin, T.A., Conceptual Schema and Relational Database Design, 2nd edn, Prentice Hall Australia, 1995.
Halpin, T.A., Object Role Modeling: An Overview.
Ross, R.G., Modeling for Data and Business Rules (An Interview with Terry Halpin), Data Base Newsletter, vol. 25, no. 5, September/October 1997.