Why Data Models Cannot Work
Published: February 1, 2009
Malcolm Chisholm explains why enterprise-level information knowledge management will never be attained by producing a comprehensive set of data models.
A data model represents things of significance to an enterprise and their interrelationships. It does this in such a way that characteristics of these things can be identified and understood as discrete facts. Data is a stored representation of these facts. Thus, a data model provides an organized way of cataloging the things of significance to an enterprise in terms of the information that is recorded about them. Additionally, and very practically, a data model can specify a design for a database to hold this data.
Yet, can data models really do all of this? Do they tell us the truth, the whole truth and nothing but the truth about what it is they are purporting to describe? Or do data models have inherent limits that prevent them from meeting our expectations? This is not the same thing as asking if data models can be done badly. There is no doubt that they can be, but this is more a reflection on data modelers than data models. What we are asking is whether data models that are done as well as they can be tell us everything we need to know about the structure of the data we are managing.
The Yardstick for Judging a Data Model
If a data model does represent characteristics of things that can be stored as facts in a database, then the data model should tell us as much (or maybe more) about these facts as we can find out from looking at the corresponding database. In other words, we should not discover more about the characteristics of the things of significance to an enterprise by looking at a physical database than we can by looking at the corresponding data model. A database is thus the yardstick for judging the data model from which it is built.
Where is the “truth” – is it in the data model, or the database, or both? Let us try an exercise by taking a very simple example of a data model that has a single entity and a database built from it that consists of a single table.
The entity is Financial Instrument, which can be defined as an obligation issued by a third party that conveys an interest in ownership, debt or other thing of value. Common examples would be stock, bonds and options. Figure 1 shows the structure of this entity taken from the data model.
Figure 1: The Financial Instrument Entity from the Data Model
Figure 2: The Financial Instrument Table in the Database
What Does the Data Model Tell Us?
If we are to ask what a data model is telling us, then we should have some expectation of the way in which it will provide answers. I suggest that the best way to understand a data model is to find what propositions it is stating. Propositions are statements that can be judged as true or false, or indeterminate if they cannot be decided upon. They are a major component of traditional logic, which provides a set of rules for stating and inspecting propositions that are extremely useful in data analysis.
We can look at both the data model and the database to see what propositions can be mined from both sources. Figure 3 shows the propositions that can be extracted from the data model. It also shows whether each of these propositions is true or false in absolute terms. That is, whether the proposition is true or false in what is generally termed the “real world” as opposed to the context of the data model or database. The reasons for judging certain propositions as false or indeterminate are given in Figure 3.
Figure 3: Propositions Extracted from the Data Model
Figure 4 shows propositions that can be extracted from the database table shown in Figure 2. It is possible that even more propositions could be extracted.
Figure 4: Propositions Extracted from the Database
Three things are immediately apparent:
Thus, the database provides more information about the data being managed than the data model does. Furthermore, the database can contradict the data model on certain points, and on these points the database is right and the data model is wrong. So the data model only tells us some things that are true. It does not tell us everything that is true, and it tells us some falsehoods that can be proven by the database it specifies.
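The way a database can contradict its data model can be illustrated with a minimal sketch. The table and column names below are hypothetical (the article's Figure 2 is not reproduced here), and the helper `counterexamples` is invented for this illustration. It tests a universal proposition that a model might state, "Every Financial Instrument has a Maturity Date", against the rows actually stored.

```python
import sqlite3

# Hypothetical table: these column names are illustrative only and are
# not taken from the article's Figure 2.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE financial_instrument (
        instrument_id   INTEGER PRIMARY KEY,
        instrument_name TEXT NOT NULL,
        instrument_type TEXT NOT NULL,
        maturity_date   TEXT          -- nullable: a stock never matures
    )
    """
)
conn.executemany(
    "INSERT INTO financial_instrument VALUES (?, ?, ?, ?)",
    [
        (1, "Example Corp common stock", "STOCK", None),
        (2, "Example Corp 5% bond", "BOND", "2030-06-30"),
    ],
)


def counterexamples(conn, table, column):
    """Return the rows that falsify 'Every <table> has a <column>'."""
    # Identifiers cannot be bound as SQL parameters, so they are
    # interpolated; only pass trusted names to this helper.
    query = f"SELECT * FROM {table} WHERE {column} IS NULL"
    return conn.execute(query).fetchall()


# Universal proposition under test:
# "Every Financial Instrument has a Maturity Date."
violations = counterexamples(conn, "financial_instrument", "maturity_date")
print(violations)
```

In this made-up data, the stock row falsifies the universal proposition, while the same check on `instrument_name` finds no counterexample. The database settles the question; the model alone cannot.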
Figure 5: Analysis of Propositions that are False or Indeterminate
Analysis of Data Model Propositions
Now let us look a little further into the propositions that have been extracted from the data model to see what we can understand in general from them.
1. Every data model expresses propositions about each entity it contains.
2. Every such proposition has the entity as the basis of the subject.
3. Every such proposition has an attribute as the basis of the predicate.
4. Every such proposition is universal.
5. Every such proposition is affirmative.
6. The only terms that can appear in such propositions are the entity and the attributes defined for the entity.
7. A data model cannot distinguish attributes that are characteristics of the entity being modeled from attributes that represent other entities.
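The shape of these propositions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Entity` class and `propositions` function are hypothetical, and the attribute names are invented because the article's Figure 1 is not reproduced here. The point is that an entity definition can only ever yield statements of one form: universal, affirmative, with the entity as subject and an attribute as predicate.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A single entity from a data model: a name plus attribute names."""
    name: str
    attributes: list[str] = field(default_factory=list)


def propositions(entity: Entity) -> list[str]:
    """List the universal, affirmative propositions the entity expresses."""
    # Subject: the entity. Predicate: one attribute. No other terms,
    # no "some", and no negation are available in this structure.
    return [f"Every {entity.name} has a {attr}." for attr in entity.attributes]


# Hypothetical attributes, chosen only for illustration.
instrument = Entity(
    name="Financial Instrument",
    attributes=["Name", "Type", "Maturity Date"],
)

for statement in propositions(instrument):
    print(statement)
```

Nothing in this structure can produce "Some Financial Instruments have no Maturity Date", which is the kind of statement the following points address.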
When we look at the database as well as the data model, we can see that there are additional issues.
8. No data model can express a particular proposition about an entity and its attributes.
9. No data model can express a negative proposition about an entity and its attributes.
10. No data model can use terms other than entities and attributes to express propositions.
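By contrast, a physical schema can carry particular and negative statements. The sketch below uses Python's standard `sqlite3` module with a hypothetical table (not the article's Figure 2). The helper names `column_propositions` and `is_rejected` are invented for this illustration. A nullable column is read as a particular proposition, and a CHECK constraint enforces a negative one: "No financial instrument has a negative face value".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, invented for this illustration.
conn.execute(
    """
    CREATE TABLE financial_instrument (
        instrument_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        maturity_date TEXT,
        face_value    REAL CHECK (face_value >= 0)
    )
    """
)


def column_propositions(conn, table):
    """Derive one proposition per column from the table's catalog entry."""
    props = []
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})")
    for _cid, col, _type, notnull, _default, pk in rows:
        if notnull or pk:
            props.append(f"Every {table} row has a {col}.")
        else:
            # A nullable column admits a particular proposition.
            props.append(f"Some {table} rows may have no {col}.")
    return props


def is_rejected(conn, sql, params):
    """Return True if the database refuses the statement as a violation."""
    try:
        with conn:
            conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


for statement in column_propositions(conn, "financial_instrument"):
    print(statement)

# Negative proposition enforced by the CHECK constraint.
rejected = is_rejected(
    conn,
    "INSERT INTO financial_instrument (name, face_value) VALUES (?, ?)",
    ("Hypothetical bond", -100.0),
)
print(rejected)
```

Both kinds of statement come from the database catalog and its constraints, not from the entity and attribute list, which is where the model's vocabulary ends.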
Conclusion
The above analysis shows that data models have a structure that only allows us to express certain kinds of knowledge about an individual entity. Only universal, affirmative propositions that use the terms corresponding to the entity and its attributes can be expressed. Yet the underlying database can express much more knowledge about the way it is organizing its information, including particular and negative propositions. The structure of a data model can force it to misrepresent and ignore truths that are present in a database.
Individuals who have to work with a database, be they business users or IT staff, need to fully understand databases. They will not be able to do so from data models. Data models have some advantages and can be very useful. However, we cannot represent that they are the sum total of knowledge about databases. Enterprise-level information knowledge management will never be attained by producing a comprehensive set of data models.
Malcolm Chisholm, Ph.D., has over 25 years of experience in enterprise information management and has worked in a wide range of sectors. He specializes in setting up and developing enterprise information management units, master data management and business rules. Malcolm has authored two books: Managing Reference Data in Enterprise Databases (Morgan Kaufmann, 2000) and How to Build a Business Rules Engine (Morgan Kaufmann, 2003). He can be contacted at mchisholm@refdataportal.com.