Measuring the Quality of Models

Published in TDAN.com October 2000

INTRODUCTION

Many professions have plenty of yardsticks for measuring their own version of ‘quality’. Sadly, data modelers seem to have been left out of that lucky group. For a long time the profession has hunted for a reliable way to measure the inherent quality of a model. This article responds to that need. It describes a fresh, practical approach for evaluating a model’s quality, and it also explains the basis of quality, the criteria for assessing a model, and how to apply the measures during the evaluation process.

A SEARCH FOR QUALITY

Just any old thing slapped down simply won’t do. Modelers take great pride in their work and want to produce high quality deliverables. But how can they know the value of their creation? Does
it depend on:

Strict adherence to modeling standards?
Creating ‘Corporate’-class objects for the repository?
Lines that don’t cross on the Entity-Relationship Diagram (ERD)?

The short answer is no. These things poorly gauge a model’s quality because they fail in the most basic way: they are the wrong yardsticks.

Assessing the quality of anything, models included, has two parts. One comes from measuring the right things, in the right way, with the right yardsticks. But the heart of quality comes from the second: judging something by its intended function and purpose. This seems only fair. To do otherwise is like drinking coffee and then complaining about the bad martini. So the search for quality starts by asking, “What’s the purpose of a data model?” [1]

QUALITY MEASURES FROM PURPOSE

As a “technologically independent description of the business from the data perspective”, a data model [2] depicts some part of the business. Through a process of evolutionary refinement, a model keeps changing until the business clients are satisfied it says what they want it to say. At that point the model faithfully captures and re-transmits its descriptive message about the business. This, in essence, is its purpose and intent.

Now this won’t seem like headline news. Modelers know the ERD and its accompanying text capture and convey facts that describe the business. But the lightning bolt about quality measures comes from the implications of that simple statement. Namely, since models describe the business, they should be measured on their ability to do just that: describe the business. Models store and pass on information about the enterprise. They act as vehicles of communication, so it seems reasonable to measure them by how well they do that job: communicate. Communications-based measures therefore provide a sound way to evaluate a model’s quality.

YARDSTICKS FOR MEASURING QUALITY

Generally speaking, what it takes to deliver a ‘message’ depends on what needs to be communicated. Simple messages need very little to convey their intent: “No, I don’t want to switch my long distance carrier, and don’t call again.” More complex messages have more components and take longer to convey their full information content. But regardless of their complexity, the number of components, or the vehicle(s) chosen to communicate them, messages must have ‘information integrity’. That is, the message must fully transmit the intended information, no more and no less. What’s more, for the overall message to have integrity, all of its parts, whether alone or used together, must also have integrity.

More specifically, then, the quality of a model is a function of the collective integrity of its components. Detecting ‘leakage’, or loss of information integrity, at the more detailed level is the key to measuring a model’s total reliability, and hence its quality. The following communications-based measures detect information leaks:

Accuracy – Tests the information for being true and correct
Clarity – Tests the information to see if its meaning is clear
Completeness – Tests if enough is said to make the information understandable
Conciseness – Tests to see if the same information is repeated
Consistency – Tests to see if one piece of information contradicts another

A model should fully convey its intended message, the one held within the structure of the ERD and all supplemental text, and only its intended message. A model that struggles to do this is less than fully reliable when it speaks for or about the business. In essence it doesn’t communicate effectively, and therefore has reduced quality and value.

The following describes how to use these yardsticks to measure a model’s quality.

Accuracy

Accuracy describes the relationship between the model and the business area it represents. The quality of accuracy requires that each assertion in the model truly reflect the business intent. For example, does a Relationship show the right linkage (minimum and maximum cardinality) between two business concepts (Entities)? Any disagreement between the model and the business intent or target creates a contradiction, a sign that the model doesn’t faithfully capture and reflect the business need.

For the purposes of this article, Entity definitions fall under this category. Other measures – like Clarity – also evaluate the definition, but the concept of Accuracy generally best quantifies
this model characteristic. The rationale goes like this: a definition establishes the essential, inherent meaning of a business concept. It creates a boundary or ‘domain’ that
corresponds to the scope of the Entity. A domain whose footprint doesn’t perfectly align with the required business scope for the concept is effectively inaccurate.

Passing the test of Accuracy requires relentlessly combing through the model and asking the business, “Is this right?”
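The cardinality question above can be made concrete. The following is a minimal Python sketch, not part of the original method, that compares each modeled Relationship’s minimum and maximum cardinality with the answer the business gave to “Is this right?”. The `Cardinality` class, the `accuracy_mismatches` function and the convention of keying each link by (from Entity, to Entity) are all hypothetical. A link the business has not yet confirmed is flagged as well, since nobody has asked the question about it.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical representation of the cardinality at one end of a Relationship.
@dataclass(frozen=True)
class Cardinality:
    minimum: int            # 0 = optional, 1 = mandatory
    maximum: Optional[int]  # None stands for 'many'


# A link is keyed by (from Entity, to Entity); this keying is an assumption.
Link = tuple[str, str]


def accuracy_mismatches(model: dict[Link, Cardinality],
                        business: dict[Link, Cardinality]) -> list[Link]:
    """Return links whose modeled cardinality differs from the cardinality
    the business confirmed, including links the business has not confirmed."""
    return sorted(link for link, card in model.items()
                  if business.get(link) != card)


if __name__ == "__main__":
    model = {("EMPLOYEE", "PERIOD OF EMPLOYMENT"): Cardinality(0, None)}
    business = {("EMPLOYEE", "PERIOD OF EMPLOYMENT"): Cardinality(1, None)}
    print(accuracy_mismatches(model, business))
    # [('EMPLOYEE', 'PERIOD OF EMPLOYMENT')]
```

The check only finds disagreements with answers already recorded; getting those answers still means sitting down with the business.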

Clarity

The test of clarity requires that the model state its message in clear and unambiguous terms. Because of its small number of symbols and limited grammar [3], the ERD usually avoids
this problem. Most clarity issues come from the text portions of a model. For all its richness, the English language is a notoriously weak tool for rigorous specification.

A model lacks clarity when the same statement has two or more interpretations. For example, “SUPERVISORs should monitor the hours of work for all EMPLOYEEs with a TIME CARD”. This can
mean either:

The TIME CARD is the tool by which SUPERVISORs will monitor EMPLOYEEs
SUPERVISORs will monitor only those EMPLOYEEs who have a TIME CARD

This statement causes an information loss because it doesn’t make a clear assertion about the business requirement. Now look at the word ‘should’. All by itself this dangerous little word creates more confusion. How strictly are SUPERVISORs to monitor EMPLOYEEs? Did the modeler intend to write a Business Rule or merely record a suggestion?

Short, direct statements create the best text messages. Follow the rules of good grammar. Lastly, use words like ‘must’ and ‘will’ to write Business Rules. These simple
guidelines can eliminate much of the ambiguity in a model.
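Part of this guidance can be checked mechanically. The sketch below is a hypothetical lint, assuming Business Rules are stored as plain strings; the function names are invented for this example, and treating ‘could’, ‘might’ and ‘may’ as weak alongside ‘should’ is an added assumption. It cannot catch structural ambiguity like the TIME CARD sentence, which still needs a human reader.

```python
import re

# Assumed word lists: modals that weaken a Business Rule into a suggestion,
# and the stronger modals the text recommends.
WEAK_MODALS = ("should", "could", "might", "may")
STRONG_MODALS = ("must", "will")


def _words(text: str) -> list[str]:
    """Split text into lower-case whole words."""
    return re.findall(r"[a-z']+", text.lower())


def flag_weak_modals(rule_text: str) -> list[str]:
    """Return the weak modal verbs found in a Business Rule statement.

    Matching is on whole words, so 'shoulder' is not flagged.
    """
    return [w for w in _words(rule_text) if w in WEAK_MODALS]


def uses_strong_modal(rule_text: str) -> bool:
    """True if the statement contains 'must' or 'will' as a whole word."""
    return any(w in STRONG_MODALS for w in _words(rule_text))


if __name__ == "__main__":
    rule = ("SUPERVISORs should monitor the hours of work "
            "for all EMPLOYEEs with a TIME CARD")
    print(flag_weak_modals(rule))   # ['should']
    print(uses_strong_modal(rule))  # False
```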

Completeness

Complete models give a thorough picture of the business. These models fully document the business area they represent. Readers have everything they need at hand to understand what the model wants
to communicate.

Lack of completeness can play havoc with a model, especially when it passes into the hands of others. Consider an undocumented Business Rule. Since it isn’t recorded, a programmer won’t translate it into code. Or worse, on discovering a need for the rule, the programmer invents what seems reasonable(!?) and then, most likely, doesn’t record the decision.

Completeness has two parts, structural and semantic. Structural completeness refers to the mandatory properties an object must have in order to transmit its part of the message. The Zachman Framework comes in handy at this point. It provides a way to describe what makes an object ‘complete’ for its row, and what it requires to move into the next row. For example, an Attribute could be complete in Row 2 with only a Name. In Row 3 an Attribute is complete once its Definition is added, but it must also have a Data Type value to be fully qualified for transformation into Row 4. These rules will vary depending on each site’s overall data management architecture.
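The Attribute example can be written down as a small rule table. This Python sketch is only an illustration under stated assumptions: the `Attribute` class, the `REQUIRED_BY_ROW` table and the helper functions are hypothetical, and the table encodes just the example above (Name for Row 2, Definition for Row 3, Data Type to enter Row 4). A real site would substitute its own rules.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule table taken from the example in the text. Key 4 means
# "qualified for transformation into Row 4". Real rules vary by site.
REQUIRED_BY_ROW = {
    2: ("name",),
    3: ("name", "definition"),
    4: ("name", "definition", "data_type"),
}


@dataclass
class Attribute:
    name: Optional[str] = None
    definition: Optional[str] = None
    data_type: Optional[str] = None


def missing_properties(attr: Attribute, row: int) -> list[str]:
    """List the mandatory properties the Attribute still lacks for `row`.

    A property counts as missing if it is None or only whitespace.
    """
    return [p for p in REQUIRED_BY_ROW[row]
            if not (getattr(attr, p) or "").strip()]


def is_complete(attr: Attribute, row: int) -> bool:
    """True if the Attribute has every property its row requires."""
    return not missing_properties(attr, row)


def ready_for_next_row(attr: Attribute, row: int) -> bool:
    """True if the Attribute meets the requirements of the following row."""
    return is_complete(attr, row + 1)


if __name__ == "__main__":
    attr = Attribute(name="Hire Date",
                     definition="The date an EMPLOYEE starts work.")
    print(is_complete(attr, 3))         # True
    print(missing_properties(attr, 4))  # ['data_type']
```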

Semantic completeness means having text-based properties like Definitions and Business Rules. But a complete model also contains things like Modeling Notes, Glossaries, Session Notes, Issues Logs, etc. This type of information gives the model a sense of context. It greatly helps the reader understand what gave the model its ‘shape’ and intent: who participated in which decisions, how business issues affected certain objects, and so on.

Lastly, semantic completeness is subjective, but its absence is obvious when questions about the model remain. Making a model complete requires the professional judgment of those who create and review it. If it seems reasonable and useful to add more information, do so. Otherwise, don’t add it. When in doubt, err on the side of caution.

Conciseness

Models that repeat information or have overlapping data raise the issue of conciseness. Having the same facts in many locations puts the model’s information integrity at risk. In a database, duplicate data can cause update anomalies. In a model, duplicate metadata can suffer the same kind of anomaly: one copy gets changed while another does not, and the model begins to contradict itself. Models can have two types of redundancy. Inter-model redundancy occurs when, for the same business area, different types of models (e.g., data and process) repeat the same information. Intra-model redundancy occurs when a single model repeats information.

Both types of redundancy have two common causes:

Different pieces of text make overlapping statements
The text repeats information already shown by the graphic. For example, the text restates a Business Rule that a Relationship on the ERD already shows
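Verbatim repetition of the first kind can be found mechanically. The sketch below is a hypothetical helper, assuming the model’s text is available as a mapping from a location (such as an object’s definition or Business Rule list) to its statements; the location names and function names are invented for illustration. It normalizes case, punctuation and spacing, then reports any statement that occurs more than once. It only catches near-verbatim repeats; paraphrased overlaps, and text that restates what the ERD shows, still need a reviewer.

```python
import re
from collections import defaultdict


def normalize(statement: str) -> str:
    """Lower-case the statement, drop punctuation and collapse whitespace."""
    return " ".join(re.findall(r"[a-z0-9]+", statement.lower()))


def find_repeated_statements(
        text_by_location: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each normalized statement that occurs more than once to the
    sorted list of locations where it occurs (a location may repeat)."""
    occurrences: dict[str, list[str]] = defaultdict(list)
    for location, statements in text_by_location.items():
        for statement in statements:
            key = normalize(statement)
            if key:  # ignore empty or punctuation-only text
                occurrences[key].append(location)
    return {key: sorted(locs) for key, locs in occurrences.items()
            if len(locs) > 1}


if __name__ == "__main__":
    model_text = {
        "EMPLOYEE.business_rules":
            ["Every EMPLOYEE must have a PERIOD OF EMPLOYMENT."],
        "PERIOD OF EMPLOYMENT.business_rules":
            ["every employee must have a period of employment"],
        "EMPLOYEE.definition":
            ["A person who works for the company."],
    }
    for statement, locations in find_repeated_statements(model_text).items():
        print(statement, "->", locations)
```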

As much as possible, models should contain only highly normalized metadata. Standards for object content can help by ensuring that certain information is captured in only one place. Each object (Entity, Attribute, Process, etc.) and component (ERD, State Transition Diagram, etc.) becomes responsible for recording specific information about the business. This has several benefits, including:

Helping find certain pieces of information about the business
Easier maintenance – both in development and long-term
Evaluating the completeness of documentation for an object and the model overall

Consistency

Consistency requires that no statement in the model directly or indirectly conflict with any other statement in the same model. A model behaves inconsistently when two or more of its sources contradict each other. Moreover, as a measure of quality, consistency is a necessary condition for a model to integrate, that is, to provide an internally harmonious and uniform view of the business.

Take the case of an ‘Attributive’ Entity with an optional Relationship to its parent. The model says two different things about the Entity. The classification says ‘Attributive’, meaning the Entity depends on its parent, but the optional Relationship breaks that child-to-parent dependency rule and says otherwise. The statements give inconsistent messages, and in doing so cause a loss of integrity.
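A contradiction of this particular kind can be searched for automatically once classifications and optionality are recorded as data. The Python sketch below is hypothetical: the `Entity` and `Relationship` classes, the convention that each Relationship record links a child Entity to its parent, and the example names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    classification: str  # e.g. "Attributive"


@dataclass
class Relationship:
    # Assumed convention: each record links a child Entity to its parent.
    child: str
    parent: str
    child_must_have_parent: bool  # False means the Relationship is optional


def attributive_conflicts(
        entities: list[Entity],
        relationships: list[Relationship]) -> list[tuple[str, str]]:
    """Return (child, parent) pairs where an Attributive Entity has an
    optional Relationship to its parent, contradicting its classification."""
    attributive = {e.name for e in entities
                   if e.classification == "Attributive"}
    return [(r.child, r.parent) for r in relationships
            if r.child in attributive and not r.child_must_have_parent]


if __name__ == "__main__":
    # Hypothetical example Entities.
    entities = [Entity("EMPLOYEE", "Kernel"),
                Entity("EMPLOYEE SKILL", "Attributive")]
    relationships = [Relationship("EMPLOYEE SKILL", "EMPLOYEE",
                                  child_must_have_parent=False)]
    print(attributive_conflicts(entities, relationships))
    # [('EMPLOYEE SKILL', 'EMPLOYEE')]
```

A check like this only covers the contradictions someone thought to encode; most inconsistencies between text and graphics still have to be found by reading.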

A leak like this may seem trivial. But any and all types of leakage can cause downstream problems for a project. Since the model is the basis for translating the business description into an actual system, plugging leaks becomes an important activity. Catching problems here helps avoid confusion, unnecessary costs and project delays. For the example above: is referential integrity (RI) required or not? Imagine the costs, both direct and indirect, that could result if the need for RI is discovered “too late”, or is built inconsistently into the system.

Finding conflicting metadata can get difficult, especially when the full business model has oceans of text spread between both data and process models. Content standards can help a bit. But
modelers and reviewers need to religiously scour all parts of the model to ensure no inconsistencies lurk in either the text or the graphics.

A MODEL-REVIEW PROCESS FOR ENSURING QUALITY

While the yardsticks for measuring quality have value, the process of applying them may matter even more. That process is also the linchpin that binds the measures together.

The review process uses two techniques called Direct Feedback (DF) and Business-Based Questioning (BBQ). In Direct Feedback the reviewer makes statements that say, in essence, “The model tells me that…” Phrased in quality-measurement terms, these observations literally feed back to the modeler the messages the reviewer gets from the model. For example, “This statement about SUPERVISORs, EMPLOYEEs and TIME CARDs is unclear. It could mean either….” The DF method helps the modeler realize what the model says to others, intended or not. Model reviewers also benefit, since it’s a face-saving way to check their understanding before commenting on the model. Direct Feedback creates a shared framework of understanding that both reviewer and modeler can use to identify simple mistakes, places where the model seems incomplete or lacks clarity, and similar issues.

Larger concerns, or ones that need handling with greater sensitivity, can use the Business-Based Questioning (BBQ) technique. This approach is useful when the reviewer can’t or doesn’t want to prejudge the model, or wants to address a topic carefully with the modeler. For example, a reviewer may say, “The ‘many’ side of this Relationship is shown as optional. What business processes allow an EMPLOYEE not to have a PERIOD OF EMPLOYMENT?” Or, “The EMPLOYEE definition excludes those who no longer work here. If not through EMPLOYEE, how will the company manage pensioners?” Notice how the approach reshapes a concern about the model into a business issue: the questions are business-based, to start a discussion about the business. They are also open-ended, to generate the wide-ranging dialogue needed to get all the information on the table before deciding what changes, if any, the model requires. Overall, the BBQ approach creates an environment that is neutral and non-judgmental. It focuses on understanding the business as a step toward describing it. In doing so it keeps the light of critical examination where it belongs: on the business and the model, instead of on the modeler.

The essence of these techniques comes from a philosophy of “Critique, don’t criticize”. A critique-based approach reduces stress for both reviewer and reviewee. Neither has to spend emotional energy wrestling over issues or over whose opinion counts for more. Together they can focus their attention on the common ground: the business, and the model as a description of it. Even better, Direct Feedback and Business-Based Questioning reframe the model review process from a checklist-driven, point-scoring event into a co-operative venture for ensuring the model’s quality [4].

AN ADDED BENEFIT

One final but important benefit comes from judging models by their communications value: better descriptions of the business through greater freedom of expression.

Traditionally, individual modeling style has been viewed with suspicion. For many, a sense of style in a model is disturbing. After all, how can a quality deliverable come from anything so personal and subjective? Yet every modeler knows the unspoken truth: models describe the business, but only as seen through the eyes of the modeler. Style seems unavoidable. So instead of rejecting it, why not embrace it fully and leverage its advantages? A small example best explains this idea.

Who hasn’t heard, especially when just learning to model, comments like “You don’t need to subtype that Entity, just classify it”? Statements like this miss the point. Instead, ask the question, “What message does subtyping the Entity convey?”

Like their paint-and-brush kin, modelers need a form of artistic license. With this freedom they can bring elements of style into play that will make the model more expressive.

Using a range of techniques, modelers can stress or reduce the importance of business concepts, draw attention to significant changes of state, make a Business Rule more obvious, or do whatever else they think is necessary to communicate [5]. In this way modelers can carefully craft each part of the model, each ‘word’ and ‘phrase’, to create a finely tuned message about the business. Moreover, using the principles of quality, they can ensure their model still communicates effectively and with integrity. Denying modelers this sense of style condemns us all to looking at dry, two-dimensional stick drawings instead of richly textured and articulate portraits of the business.

PROVABLE QUALITY FOR MODELS

The table below summarizes the five measures of quality. Each focuses on judging a model by how well it delivers its message about the business. As aspects of communication, these measures protect against a loss of information integrity. Overall, they make valuable tools for creating high-quality deliverables.

[Table 1: Summary of the five measures of quality]

Some might worry about this approach to quality. It may seem too Zen-like because it elevates aspects of style. Or there could be concerns about its rigor for properly evaluating a model. But consider the following.

This approach takes as its input the direct statements made by the model: the things the model says. It then measures those statements against the criteria. The rigor of the approach stems from the outcomes of that process, which produces clear-cut, yes-or-no decisions on issues of model quality:

A statement either accurately describes the business or it doesn’t
A statement has only one interpretation or it doesn’t
A model sufficiently documents its business domain or it doesn’t
Statements repeat data or they don’t
Statements agree or they don’t
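Because each test yields a yes-or-no answer, a reviewer’s findings can be kept as simple booleans, much like the Quality Concern boxes on the sample review form. The following Python sketch is one hypothetical way to record and tally them per model object; the class, field and function names are invented for illustration and are not part of the original method.

```python
from dataclasses import dataclass, fields


@dataclass
class ObjectAssessment:
    """Yes/no outcome of each quality test for one model object
    (Entity, Attribute, Relationship, etc.). True means the test passed."""
    accurate: bool    # describes the business correctly
    clear: bool       # has only one interpretation
    complete: bool    # documents enough for readers to understand it
    concise: bool     # does not repeat information stated elsewhere
    consistent: bool  # agrees with every other statement in the model

    def concerns(self) -> list[str]:
        """Names of the tests this object fails (its quality concerns)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def summarize(assessments: list[ObjectAssessment]) -> dict[str, int]:
    """Count how many reviewed objects fail each test."""
    counts = {f.name: 0 for f in fields(ObjectAssessment)}
    for assessment in assessments:
        for name in assessment.concerns():
            counts[name] += 1
    return counts


if __name__ == "__main__":
    reviewed = [
        ObjectAssessment(accurate=True, clear=False, complete=True,
                         concise=True, consistent=False),
        ObjectAssessment(True, True, True, True, True),
    ]
    print(summarize(reviewed))
```

Tallies like these record the outcome of a review; the Direct Feedback and Business-Based Questioning discussions are still what produce each answer.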

Collectively, these criteria for quality require that the model communicate its description of the business accurately, clearly, completely, concisely and consistently. Who could ask more from a model? What else could there be?

——————————————————————————–

DATA MODEL QUALITY REVIEW

Project:

Review #n (Date)

——————————————————————————–

General Comments:

(Comments and observations that relate to the model as a whole)

DATA MODEL QUALITY REVIEW

Project:

Review #n (Date)

——————————————————————————–

Object Name:

Object Type: (Entity, Attribute, Relationship, etc.)

——————————————————————————–

Questions:

(Any questions the reviewer may have can be listed here)

——————————————————————————–

Perspective: Corporate: Project:

(Records if the concern is within the scope of the project or extends beyond it. For example, does an Entity definition conflict with that of an established Corporate object?)

Quality Concern:   Accuracy:   Clarity:   Completeness:   Conciseness:   Consistency:

(One or more quality concerns can be raised for the same object)

——————————————————————————–

Action Required: Yes: No:
Action Taken: Yes: No:
Action Date:

——————————————————————————–

Comments:

Suggestions:

——————————————————————————–

[1] For simplicity the article uses data models as the focus of discussion. However, the points offered apply to all types of models.

[2] The term ‘data model’ refers to more than what is shown by the ERD. A model also contains Definitions, complex Business Rules, logical Attribute properties, Notes,
etc.

[3] That is, the notation used on the ERD. The Information Engineering (IE) methodology uses crow’s feet, black and white circles, and short lines across the Relationship as its symbols, in effect its language. The valid combinations of the symbols define the grammar.

[4] A sample model-review form is provided. Contact the authors for more information about its use.

[5] Since the ERD is the strongest means of communication in a model, most aspects of style show up there.
