|
The Data Modeling Addict – October 2008
The Data Model Scorecard Category 10 of 10: How Well Does the Metadata Match the Data
Published: October 1, 2008 The eleventh of a series of articles on the Data Model Scorecard™, this article focuses on the data category.
An application’s flexibility and data quality depend quite a bit on the underlying data model. In other words, a good data model can lead to a good application, and a bad data model can lead to a bad application. Therefore, we need an objective way of measuring what is good or bad about the model. After reviewing hundreds of data models, I formalized the criteria I have been using into what I call the Data Model Scorecard™. The Scorecard contains 10 categories:
This is the tenth of a series of articles on the Data Model Scorecard™. The first article on the Scorecard summarized the 10 categories, the second article focused on the correctness category, the third article focused on the completeness category, the fourth article focused on the structure category, the fifth article focused on the abstraction category, the sixth article focused on the standards category, the seventh article focused on the readability category, the eighth article focused on the definitions category, the ninth article focused on the real world category, the tenth article focused on the consistency category, and this article focuses on the data category. That is, How well does the metadata match the data? For more on the Scorecard, please refer to my book, Data Modeling Made Simple: A Practical Guide for Business & IT Professionals. How Well Does the Metadata Match the Data?This question ensures the model and the actual data that will be stored within the resulting tables are consistent with each other. This category determines how well the data elements and their rules match reality. If a data element is called email address text, for example, will it actually contain the email address? Examining a handful to a few hundred sample values can help ensure that the entity, data element, and relationship definitions are consistent with the data they will support. Collecting real data can be difficult to do early in a project’s life cycle, but the earlier the better so you can avoid future surprises that can be much more costly. If data surprises are not caught during the design phase, they will most likely be caught during system testing or user acceptance testing at a much higher cost of time and money. Here are a few of the red flags I look for to validate this category:
As a proactive measure to ensure the data matches the metadata, I have found using a standard template comparing the data to the metadata to be very effective. For example, I use a template called the Data Quality Validation Template (refer to chapter 5 of The Data Modeler’s Workbench for more details including an example). A data to metadata comparison template can contain metadata in one set of columns, the results of data checking in another set of columns, and validation by a business expert in the final set of columns. Validation by a business expert is essential to ensuring there is a correct data to metadata match. Go to Current Issue | Go to Issue Archive Recent articles by Steve Hoberman
Steve Hoberman -
Steve Hoberman is a world-recognized innovator and thought-leader in the field of data modeling. He has worked as a business intelligence and data management practitioner and trainer since 1990. Steve is known for his entertaining, interactive teaching and lecture style (watch out for flying candy!) and is a popular, frequent presenter at industry conferences, both nationally and internationally. Steve is a columnist and frequent contributor to industry publications, as well as the author of Data Modeler’s Workbench and Data Modeling Made Simple. He is the founder of the Design Challenges group and inventor of the Data Model Scorecard™. Please visit www.stevehoberman.com to learn more about his training and consulting services, and to sign up for his Design Challenges! He can be reached at me@stevehoberman.com. |