Measuring the Quality of Models
Published: October 1, 2000
Published in TDAN.com October 2000

INTRODUCTION

Many professions have plenty of yardsticks for measuring their version of ‘quality’. Sadly, data modelers seemed excluded from that lucky group. For a long time the profession has hunted for a reliable way to measure the inherent quality of a model. This article responds to that need. It describes a fresh, practical approach for evaluating a model’s quality. The article also describes the basis of quality, the criteria for assessing a model and how to apply the measures during the evaluation process.

A SEARCH FOR QUALITY

Just any old thing slapped down simply won’t do. Modelers take great pride in their work and want to produce high quality deliverables. But how can they know the value of their creation? Does it depend on:
The short answer is, no. These things poorly gauge a model’s quality because they fail in the most basic of ways – they are inappropriate. Assessing the quality of anything – models included – has two parts. One comes from measuring the right things, in the right way, with the right yardsticks. But the heart of quality comes from the second aspect: judging something based on its intended function and purpose. This seems only fair. To do otherwise is like drinking coffee and then complaining about the bad martini. So the search for quality starts by asking, “What’s the purpose of a data model?” [1]

QUALITY MEASURES FROM PURPOSE

As a “technologically independent description of the business from the data perspective” a data model [2] depicts some part of the business. Through a process of evolutionary refinement a model keeps changing until the business clients are satisfied it says the things they want. At this point the model faithfully captures and re-transmits its descriptive message about the business. This essentially describes its purpose and intent.

Now this won’t seem like headline news. Modelers know the ERD and its accompanying text capture and convey facts that describe the business. But the lightning bolt about quality measures comes from the implications of that simple statement. Namely, since models describe the business they should be measured based on their ability to do just that – describe the business. Models store and pass on information about the enterprise. They act as vehicles of communication so it seems reasonable to measure them based on how well they do that job – communicate. As such, communications-based measures provide a way to successfully evaluate a model’s quality.

YARDSTICKS FOR MEASURING QUALITY

Generally speaking, what it takes to deliver a ‘message’ depends on what needs to be communicated.
Simple messages require very little to communicate their intent - “No, I don’t want to switch my long distance carrier, and don’t call again.” More complex messages have more components and take longer to convey their full information content. But despite their complexity, the number of components, or the vehicle(s) chosen to communicate them, messages must have ‘information integrity’. That is, the message must fully transmit the intended information, no more and no less. What's more, for the overall message to have integrity, all of its parts – either alone or when used together - must also have integrity. More specifically then, the quality of a model is a function of the collective integrity of its components. Detecting ‘leakage’ or loss of information integrity at the more detailed level provides the key for measuring a model’s total reliability, and subsequent quality. As such, the following communications-based measures detect information leaks:
A model should fully convey its intended message – the one held within the structure of the ERD and all supplemental text – and only its intended message. A model that struggles to do this cannot reliably speak for or about the business. In essence it doesn’t communicate effectively, and therefore has a reduced quality and value. The following describes how to use these yardsticks to measure a model’s quality.

Accuracy

Accuracy describes the relationship between the model and the business area it represents. The quality of accuracy requires that each assertion in the model truly reflect the business intent. For example, does a Relationship show the right linkage (min. and max. cardinality) between two business concepts (Entities)? Any disagreement between the model and the business intent or target creates a contradiction. This says the model doesn’t faithfully capture and reflect the business need.

For the purposes of this article, Entity definitions fall under this category. Other measures - like Clarity - also evaluate the definition, but the concept of Accuracy generally best quantifies this model characteristic. The rationale goes like this: a definition establishes the essential, inherent meaning of a business concept. It creates a boundary or ‘domain’ that corresponds to the scope of the Entity. A domain whose footprint doesn’t perfectly align with the required business scope for the concept is effectively inaccurate. Passing the test of Accuracy requires relentlessly combing through the model and asking the business, “Is this right?”

Clarity

The test of clarity requires that the model state its message in clear and unambiguous terms. Because of its small number of symbols and limited grammar [3], the ERD usually avoids this problem. Most clarity issues come from the text portions of a model. For all its richness, the English language is a notoriously weak tool for rigorous specification.
A model lacks clarity when the same statement has two or more interpretations. For example, “SUPERVISORs should monitor the hours of work for all EMPLOYEEs with a TIME CARD”. This can mean either:
This statement causes an information loss because it doesn’t make a clear assertion about the business requirement. Now look at the word, ‘should’. All by itself this dangerous little word creates more confusion. How strictly are SUPERVISORs to monitor EMPLOYEEs? Did the modeler intend to write a Business Rule or merely record a suggestion? Short, direct statements create the best text messages. Follow the rules of good grammar. Lastly, use words like ‘must’ and ‘will’ to write Business Rules. These simple guidelines can eliminate much of the ambiguity in a model.

Completeness

Complete models give a thorough picture of the business. These models fully document the business area they represent. Readers have everything they need at hand to understand what the model wants to communicate. Lack of completeness can play havoc with a model, especially when it passes into the hands of others. Consider an undocumented Business Rule. Since it isn’t recorded, a programmer won’t translate it into code. Or worse yet, on discovering a need for the rule, they invent what seems reasonable(!?) and then, most likely, do not record their action.

Completeness has two parts, structural and semantic. Structural completeness refers to the mandatory properties an object must have in order to transmit its part of the message. The Zachman Framework comes in handy at this point. It provides a way to describe what makes an object ‘complete’ for its row, and its requirements for transition into the next. For example, an Attribute could be complete in Row 2 with only a Name. A Row 3 Attribute is complete when the Definition is added, but it must have a Data Type value to be fully qualified for transformation into Row 4. These rules will vary depending upon each site’s overall data management architecture. Semantic completeness means having text-based properties like definitions and Business Rules.
However, a complete model also contains things like Modeling Notes, Glossaries, Session Notes, Issues Logs, etc. This type of information gives the model a sense of context. It greatly helps the reader understand what gave the model its ‘shape’ and intent: who participated in what decisions, how business issues affected certain objects and so on. Lastly, semantic completeness is subjective but its absence is obvious when there are still questions about the model. Making a model complete requires the professional judgment of those who create and review it. If it seems reasonable and useful to add more information, do so. Otherwise, don’t add it. When in doubt, err on the side of caution.

Conciseness

Models that repeat information or have overlapping data raise the issue of conciseness. Having the same facts in many locations puts the model’s information integrity at risk. In a database, duplicate data can cause update anomalies. In a model, duplicate data can suffer from metadata update anomalies. Models can have two types of redundancy. Inter-model redundancy occurs when, for the same business area, different types of models – e.g., data and process - repeat the same information. Intra-model redundancy occurs if the same model repeats information. Both types of redundancy have two common causes:
As much as possible models should contain only highly normalized metadata. Standards for object content can help by ensuring that certain information is captured in only one place. Each object (Entity, Attribute, Process, etc.) and component (ERD, State Transition Diagram, etc.) becomes responsible for recording specific information about the business. This has several benefits including:
Consistency

Consistency requires that no statement in the model directly or indirectly conflict with any other statement from the same model. A model behaves inconsistently when it has a contradiction based on two or more sources. Moreover, as a measure of quality, consistency is a necessary condition for a model to integrate; that is, to provide an internally harmonious and uniform view of the business.

Take the case of an ‘Attributive’ Entity with an optional Relationship to its parent. The model says two different things about the Entity. The classification says ‘Attributive’, but the Relationship – by breaking the child-to-parent dependency rule - says otherwise. The statements give inconsistent messages, and in doing so cause a loss of integrity.

A leak like this may seem trivial. But any and all types of leakage can cause downstream problems for a project. Since the model is the basis for translating the business description into an actual system, plugging leaks becomes an important activity. Catching problems here helps avoid confusion, unnecessary costs and project delays. For the above: is referential integrity (RI) required or not? Imagine the costs, both direct and indirect, that could result if the need for RI is discovered “too late”, or is built inconsistently in the system.

Finding conflicting metadata can get difficult, especially when the full business model has oceans of text spread between both data and process models. Content standards can help a bit. But modelers and reviewers need to religiously scour all parts of the model to ensure no inconsistencies lurk in either the text or the graphics.

A MODEL-REVIEW PROCESS FOR ENSURING QUALITY

While the yardsticks for measuring quality have value, the process of applying them may have even more importance. It also forms the lynchpin that binds the measures together. The review process uses two techniques called Direct Feedback (DF) and Business-Based Questioning (BBQ).
In Direct Feedback the reviewer makes statements that say in essence, “The model tells me that...” Phrased in quality-measurement terms, these observations literally feed back to the modeler the messages the reviewer gets from the model. For example, “This statement about SUPERVISORs, EMPLOYEEs and TIMESHEETs is unclear. It could mean either….” The DF method helps the modeler realize what the model says to others, intended or not. Model reviewers also benefit since it’s a face-saving way to check their understanding before commenting on the model. Direct Feedback works to create a shared framework of understanding. Both reviewer and modeler can use it to identify simple mistakes, places where the model seems incomplete or lacks clarity, and similar issues.

Larger concerns, or ones that need handling with greater sensitivity, can use the Business-Based Questioning (BBQ) technique. This approach is useful when the reviewer can’t or doesn’t want to prejudge the model, or wants to carefully address a topic with the modeler. For example a reviewer may say, “The ‘many’ side of this Relationship is shown as optional. What business processes allow an EMPLOYEE not to have a PERIOD OF EMPLOYMENT?” Or, “The EMPLOYEE definition excludes those who no longer work here. If not through EMPLOYEE, how will the company manage pensioners?” Notice how the approach re-shapes a concern about the model into a business issue; the questions are business-based to start a discussion about the business. They are also open-ended in order to generate the wide-ranging dialogue necessary to get all information on the table before deciding what changes – if any – are required in the model.

Overall, the BBQ approach creates an environment that is neutral and non-judgmental. It focuses on understanding the business as a step to developing its description. In doing so it keeps the light of critical examination where it belongs, on the business and on the model instead of sweating the modeler.
The essence of these techniques comes from a philosophy of: “Critique, don’t criticize”. A critique-based approach reduces stress for both reviewer and reviewee. Neither has to worry about spending emotional energy wrestling over issues or whose opinion counts for more. Together they can focus their attention on the common ground of the business and the model as a description of it. Even better, Direct Feedback and Business-Based Questioning reframe the model review process from a checklist-driven, point-scoring event into a co-operative venture for ensuring the model’s quality [4].

AN ADDED BENEFIT

One final but important benefit comes from judging models based on their communications value: better descriptions of the business through greater freedom of expression. Traditionally, aspects of individual modeling style have always been viewed with suspicion. For many, a sense of style in a model is disturbing. After all, how can a quality deliverable come from anything so personal and subjective? Yet every modeler knows the unspoken truth: models describe the business, but only as seen through the eyes of the modeler. Style seems unavoidable. But instead of rejecting it, why not fully embrace it and leverage the advantages?

A small example best explains this idea… Who hasn’t heard – especially when just learning to model – comments like, “You don’t need to subtype that Entity, just classify it”? Statements like this miss the point. Instead, ask the question, “What message is intended because the Entity is sub-typed?” Like their paint-and-brush kin, modelers need a form of artistic license. With this freedom they can bring elements of style into play that will make the model more expressive. Using a range of techniques modelers can stress or reduce the importance of business concepts, draw attention to significant changes of state, make a Business Rule more obvious - whatever they think is necessary to communicate [5].
In this way modelers can carefully craft each part of the model – each ‘word’ and ‘phrase’ - to create a finely tuned message about the business. Moreover, using the principles of quality they can ensure their model still communicates effectively and with integrity. Otherwise, denying a modeler this sense of style condemns us all to looking at dry, two-dimensional stick-drawings instead of richly textured and articulate portraits about the business.

PROVABLE QUALITY FOR MODELS

The table below summarizes the five measures of quality. Each of them focuses on judging a model based on how well it delivers its message about the business. As aspects of communication these measures protect against a loss of information integrity. Overall, they make valuable tools for creating high-quality deliverables.

[Table 1]

Some might worry about this approach to quality. It may seem too Zen-like because it elevates aspects of style. Or, there could be concerns about its rigor for properly evaluating a model. But, consider the following... This approach takes as its input the direct statements made by the model; the things the model says. It then measures those statements against the criteria. The rigor of the approach stems from the outcomes of that process. The process produces quantitative, bi-conditional decisions on issues of model quality:
Collectively these criteria for quality require that the model communicate its description of the business accurately, clearly, concisely, consistently and completely. Who could ask more from a model? What else could there be?
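
For readers who keep model metadata in a repository, a few of the checks described above can be expressed as simple pass/fail tests. The following is only a minimal sketch in Python, not a method from this article: the class names, the property names, the row labels and the sample Attribute 'Hire Date' are all hypothetical. It covers three of the article's examples: the Attribute properties required at each Zachman row (structural completeness), the 'Attributive' Entity with an optional Relationship to its parent (consistency), and the word 'should' in a Business Rule (clarity).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical rule table based on the article's Attribute example:
# Row 2 needs only a Name, Row 3 adds a Definition, and a Data Type
# is needed before transformation into Row 4. Real rules vary by site.
REQUIRED_PROPERTIES = {
    "row2": ("name",),
    "row3": ("name", "definition"),
    "row3_to_row4": ("name", "definition", "data_type"),
}


@dataclass
class Attribute:
    name: Optional[str] = None
    definition: Optional[str] = None
    data_type: Optional[str] = None


@dataclass
class Entity:
    name: str
    classification: str                 # e.g. "Attributive"
    parent_relationship_optional: bool  # True if the child-to-parent link is optional


def missing_properties(attribute: Attribute, stage: str) -> List[str]:
    """Structural completeness: list the properties still missing for a stage."""
    return [p for p in REQUIRED_PROPERTIES[stage] if not getattr(attribute, p)]


def is_consistent(entity: Entity) -> bool:
    """Consistency: an 'Attributive' Entity must not have an optional parent link."""
    return not (entity.classification == "Attributive"
                and entity.parent_relationship_optional)


def uses_weak_wording(rule_text: str) -> bool:
    """Clarity: flag Business Rules that say 'should' instead of 'must' or 'will'."""
    return re.search(r"\bshould\b", rule_text, re.IGNORECASE) is not None


# Hypothetical sample data for illustration only.
hire_date = Attribute(name="Hire Date",
                      definition="The date an EMPLOYEE started work.")
print(missing_properties(hire_date, "row3"))          # []
print(missing_properties(hire_date, "row3_to_row4"))  # ['data_type']
```

Each function answers one question with a yes/no or a list of gaps, which mirrors the idea of measuring the model's own statements against fixed criteria. Accuracy is deliberately left out: only the business can answer, “Is this right?”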
--------------------------------------------------------------------------------
DATA MODEL QUALITY REVIEW
Project:
Review #n (Date)
--------------------------------------------------------------------------------
General Comments: (Comments and observations that relate to the model as a whole)
--------------------------------------------------------------------------------
Object Name: Object Type: (Entity, Attribute, Relationship, etc.)
--------------------------------------------------------------------------------
Questions: (Any questions the reviewer may have can be listed here)
--------------------------------------------------------------------------------
Perspective: Corporate: Project: (Records if the concern is within the scope of the project or extends beyond it. For example, does an Entity definition conflict with that of an established Corporate object?)
Quality Concern: Accuracy: Clarity: Completeness: (One or more quality concerns can be raised for the same object)
--------------------------------------------------------------------------------
Action Required: Yes: No:
--------------------------------------------------------------------------------
Comments: Suggestions:
--------------------------------------------------------------------------------
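
Teams that prefer to log review findings electronically could hold the same fields in a small record structure. This is a hedged sketch only: the class and field names are assumptions drawn from the form's labels, and the list of quality concerns is extended to all five measures named in the article even though the form shows three.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Perspective(Enum):
    CORPORATE = "Corporate"
    PROJECT = "Project"


class QualityConcern(Enum):
    ACCURACY = "Accuracy"
    CLARITY = "Clarity"
    COMPLETENESS = "Completeness"
    CONCISENESS = "Conciseness"
    CONSISTENCY = "Consistency"


@dataclass
class ObjectReview:
    """One per-object entry of the review form (field names are hypothetical)."""
    object_name: str
    object_type: str                      # Entity, Attribute, Relationship, etc.
    perspective: Perspective
    concerns: List[QualityConcern] = field(default_factory=list)
    questions: List[str] = field(default_factory=list)
    action_required: bool = False
    comments: str = ""
    suggestions: str = ""


def open_actions(reviews: List[ObjectReview]) -> List[ObjectReview]:
    """Return the entries the modeler still has to act on."""
    return [r for r in reviews if r.action_required]


# Example entry using the Business-Based Question quoted in the article.
entry = ObjectReview(
    object_name="EMPLOYEE to PERIOD OF EMPLOYMENT",
    object_type="Relationship",
    perspective=Perspective.PROJECT,
    concerns=[QualityConcern.ACCURACY],
    questions=["What business processes allow an EMPLOYEE not to have "
               "a PERIOD OF EMPLOYMENT?"],
    action_required=True,
)
print(len(open_actions([entry])))  # 1
```

Recording the question rather than a verdict keeps the entry in the spirit of Business-Based Questioning.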
[1] For simplicity the article uses data models as the focus of discussion. However, the points offered apply to all types of models.
[2] The term ‘data model’ refers to more than what is shown by the ERD. A model also contains Definitions, complex Business Rules, logical Attribute properties, Notes, etc.
[3] The notation used on the ERD. The Information Engineering (IE) methodology uses crow’s feet, black and white circles, and short lines across the Relationship as its symbols – the language in effect. The valid combinations of the symbols define the grammar.
[4] A sample model-review form is provided. Contact the authors for more information about its use.
[5] Since the ERD is the strongest way to communicate in a model, most aspects of style show up there.
Peter A. McDougall -
Peter A. McDougall is a Senior Data Administrator with the Insurance Corporation of British Columbia, in Vancouver, British Columbia. In the past 18 years he has worked in a number of IT areas including telecommunications, systems development and data administration. In his career he successfully introduced many initiatives dedicated to increasing the quality of the data deliverable and to improving the return on investment in the data administration function. Peter is a noted conference speaker and has written several articles on managing the data resource. His revolutionary Syntactic-Role technique is the first formalised methodology for analysing and defining entities. He holds a B.Sc. in Computer Science from the University of British Columbia. He can be reached at: Phone: (604) 661-6044 / Fax: (604) 661-6406
John Claxton - John Claxton graduated with a diploma in Operations Management from the British Columbia Institute of Technology and a BSc in Computer Science from Simon Fraser University. He has spent twenty years working in Information Technology, in both Systems Development and Data Administration.