The Data Modeling Addict - July 2006
How Well Does the Model Capture the Requirements?
Published: July 1, 2006
This article focuses on the first of the 10 categories
Published in TDAN.com July 2006
An application's flexibility and data quality depend quite a bit on the underlying data model. In other words, a good data model can lead to a good application and a bad data model can lead to a bad application. Therefore, we need an objective way of measuring what is good or bad about the model. After reviewing hundreds of data models, I formalized the criteria I have been using into what I call the Data Model Scorecard.
The Scorecard contains 10 categories:
This is the second in a series of articles on the Data Model Scorecard. The first article on the Scorecard summarized the 10 categories, and each subsequent article will focus on a single category. This article focuses on the first of the 10 categories, How well does the model capture the requirements? For more on the Scorecard, please refer to the book, Data Modeling Made Simple: A Practical Guide for Business & IT Professionals.
How well does the model capture the requirements?
This is the “correctness” category. That is, we need to understand the content of what is being modeled. This can be the most difficult of all 10 categories to grade, because we really need to understand how the business works and what the business wants from its application. If we are modeling a sales data mart, for example, we need to understand both how the invoicing process works in our company and what reports and queries will be needed to answer key sales questions from the business.
What makes this category even more challenging is the possibility that the business requirements are not well defined, or differ from verbal requirements, or keep changing, usually with the scope expanding instead of contracting. We need to ensure our model represents the data requirements, as the costs can be devastating if there is even a slight difference between what was required and what was delivered. Beyond not delivering what was expected, there is the risk that the IT/business relationship will suffer.
Here are a few of the red flags I look for to validate this first category. By red flag, I mean something that stands out as a violation of this category.
Modeling the wrong perspective. Usually one of my first questions when reviewing a data model is why it is being produced in the first place. What are the goals of the model, and who is the audience whose needs should be met with this model?
For example, if there is a need for analysts to understand a business area such as manufacturing, a model capturing how an existing application views the manufacturing area will not usually be acceptable. Although in reality it is likely that the manufacturing application and the business processes will work very similarly, there will be differences at times, and these differences can be large, especially in the case of ERP packages.
Data elements with formats different from industry standards. For example, a five-character Social Security number or a six-character phone number. This is a red flag that can be identified without much knowledge of the content of the model.
Incorrect cardinality. Assume the business rule is “We hire only applicants who completed a master's degree.” Does the model in fig. 1 show this?
No. It shows that each applicant can obtain zero, one or many degrees. This cardinality allows an applicant to have zero degrees, which violates our business rule. Also, degree includes all possible types of degrees, with a master's degree being just one of these. So if Bob has only a bachelor's degree, that would not satisfy our business rule. We will need to subtype to restrict the rule specifically to master's degrees, as shown in fig. 2.
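To make the gap concrete, here is a minimal Python sketch that compares the loose cardinality (zero, one or many degrees of any kind) with the business rule (at least one master's degree) on a few sample applicants. The class names, the `level` values, and the applicants other than Bob are assumptions made for illustration; they are not taken from the figures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Degree:
    # Hypothetical degree type label, e.g. "bachelor's" or "master's".
    level: str


@dataclass
class Applicant:
    name: str
    degrees: List[Degree] = field(default_factory=list)


def allowed_by_loose_model(applicant: Applicant) -> bool:
    """Zero, one or many degrees of any type: every applicant is allowed."""
    return len(applicant.degrees) >= 0


def satisfies_business_rule(applicant: Applicant) -> bool:
    """The business rule: at least one degree must be a master's degree."""
    return any(d.level == "master's" for d in applicant.degrees)


# Sample data (assumed): Bob has only a bachelor's degree.
applicants = [
    Applicant("Bob", [Degree("bachelor's")]),
    Applicant("Ann", [Degree("bachelor's"), Degree("master's")]),
    Applicant("Cy"),  # no degrees at all
]

# Applicants the loose model accepts but the business rule rejects.
gaps = [
    a.name
    for a in applicants
    if allowed_by_loose_model(a) and not satisfies_business_rule(a)
]
print(gaps)
```

Every applicant the loose model accepts but the rule rejects is a case where the model fails to capture the requirement. Running a check like this against sample data during a model review is one way to surface a cardinality mismatch early.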
In fig. 2, the subtyping symbol captures that each degree can be a master's degree. The relationship between applicant and master's degree captures that each applicant must have at least one master's degree, which supports our business rule.
As a proactive measure to improve the correctness of the data model, I have found the following techniques to be very helpful:
Stay tuned! Our next article will focus on the completeness category.