Thought Leaders On Data Modeling:
Excerpt from Data Modeling Theory and Practice
Published: April 1, 2007
Published in TDAN.com April 2007
This article presents an excerpt from Chapter 6 of my new book, Data Modeling Theory and Practice (Technics Publications). The book as a whole looks at the nature of the data modeling task through a review of the literature, interviews, surveys, and "laboratory" experiments. This chapter, titled "What the Thought Leaders Think," reports the results of interviews with seventeen influential people in the data management field, most of them specialists in data modeling, who generously responded to my request to videotape them for my research. The interviews were conducted in 2002 and, therefore, reflect my understanding of the interviewees' opinions and positions at that time.
In this excerpt, we skip the introduction of the interviewees (most of whose names will be known to TDAN.com readers) and the discussion of the protocol and cut directly to a central question about data modeling: is it a descriptive or a creative (design) process? We then look at relevant aspects of the data modeling environment and data modeling problems. In a later excerpt, we will cover three further dimensions: process, products and people.
Description vs. Design - Explicit Positions
I'm sure I could find people who fit both those ends of that spectrum very well and are adamant that they're right about it.
- Karen Lopez
Opinions on the description / design question, presented directly, covered the full spectrum. Table 1 summarizes my overall classification of the interviewees' positions, based both on their direct responses and the wider-ranging discussions that followed.
The assessment shown in Table 1 is included primarily to illustrate that each characterization had support from several interviewees and that the diversity of views reported in this chapter was not a product of including one or two "outlier" individuals. The "position depends on language" row in the table reflects the view of Harry Ellis, who argued that data modeling with entity-relationship-based approaches should be strongly characterized as design, but that his current work using the CBML language is properly characterized as highly descriptive. As such, his comments provided qualified support for both positions.
Interviewees clearly understood this question and their views were largely articulate and unambiguous:
"Data modeling is not a process of creation; it is a process of discovery."
"Data modeling is certainly a descriptive activity, it's not a design activity."
"I believe rabidly and intensely that it's a design process."
"We're designing (but) some of the people that we work with see us as scribes."
It bears re-emphasizing that the context of the question was development of a new database rather than (for example) enterprise modeling, creating reference data for metadata repository mappings, data warehouse design or reverse engineering. Several interviewees were concerned that their views be reported only in that context.
Proponents of the descriptive characterization as well as those of the design characterization generally contended that the resulting model could be translated relatively mechanically into a conceptual schema (logical database design was the common term), or at least a default schema prior to performance tuning. They did not see a descriptive model as merely an input to a design stage. "There is no translation: we model the tables," said one advocate of the descriptive view.
Key themes in the descriptive characterization were:
The concept of a single objective reality was often at least implicit. One interviewee discussed a situation in which he had, on reflection, changed the way that the product concept was represented in a data model. Was this a case of coming up with a better design? "No," he said, "we finally figured out the product for this particular organization." Finally? "I have no claims of infallibility. It's perfectly reasonable that someone would be smarter ... and I'd look at him and say 'yeah you're right...'"
The practical impact of the descriptive position is illustrated by one interviewee's account of his testimony as an expert witness in an intellectual property case. He argued that "the data model is a description of the problem and therefore by definition one data model will look pretty similar to others. It's not a patentable or copyrightable thing... My model is my best description of my understanding of the nature of things and therefore I can't patent that because it's reality."
Proponents of the design position offered less elaboration at this point. Three key themes that they raised - negotiability of requirements, diversity of product and the role of creativity - are discussed later.
The "design" group saw data models as solutions. John Zachman used the Zachman Framework (Zachman 1987; Sowa and Zachman 1992) to distinguish the descriptive "business owner's view" (Row 2) from the logical data model of Row 3. In this formulation, the logical data model is a solution to the business owner's well-articulated problem: "Once you've defined what the things are you're trying to manage... someone has to invent the filing system..." John Zachman was among several of the design proponents who drew an analogy with architecture.
Two interviewees' responses highlighted the role of the modeling language. Terry Halpin, noting that he was "biased" from using ORM, saw data modeling as both description and design. On the one hand, ORM supports a descriptive approach to requirements ("In ORM you verbalize the data requirements and that verbalization itself is the model - essentially"); on the other, there are numerous opportunities to modify the default conceptual schema that results. Harry Ellis echoed some of the earlier proponents of the ORM method when he argued that "if the language was adequately rich, what the domain expert is saying could be precisely and accurately written down ... in such a way that the technical applications would be absolutely definitive."
Environment
Sometimes "it's an art" is used as an excuse for not following generally accepted practices or internal standards.
- Karen Lopez
Describing only in terms of data misses the big picture.
- Ron Ross
Two themes fell under the heading of Environment - the context in which data modeling takes place. The first was the impact of an enterprise model on data modeling at the application level. Here the choice and creativity associated with the design characterization were seen as impediments to the consistency needed to support data integration:
"The crucial problem with creativity is that the more creativity goes into the model - the more idiosyncratic the models - the harder they are to fit together. Even if it's excellent it becomes problematic - and generally it isn't."
By establishing standards for data representation, an enterprise data model or architecture should (and would) render data modeling a more descriptive process. An enterprise model could become a surrogate for the business as an "absolute statement of truth." As one interviewee put it, if you still insist on being a soloist, you'll be asked to leave the orchestra. Enterprise models were also seen as valuable for encouraging a broader view at the application level: "if you see only one line of business your model is going to be very different than when you're looking at it across the entire enterprise." All but one of the interviewees who raised the role of an enterprise model as a vehicle for enforcing conformity were speaking from the position of enforcer rather than conformer, and the scenario was seen more as a goal than a current reality.
The second Environment theme was the need to see data modeling as only one technique amongst many, particularly in the context of understanding or negotiating business requirements. Several interviewees pointed out the danger of over-reliance on data models as a means of understanding the business and its requirements. Other techniques nominated included process and workflow modeling, Critical Success Factor and Key Performance Indicator analysis, Use Cases, and business objectives. These were seen as adjuncts or (often) precursors to data modeling. In the context of the description / design question, they suggested a separate "requirements elicitation" stage rather than direct, descriptive mapping of business concepts onto a model.
The negotiability of requirements
Data modeling is all about helping a business come up with a better way of doing business
- Alec Sharp
Data modelers should not resolve business problems
- Michael Brackett
If the description position was more comprehensively argued when the description / design question was presented directly, the balance was restored in the discussion of the negotiability of requirements. Consistently, the proponents of the design position argued not only that business requirements were negotiable, but that data modelers should be active in exposing new ways of doing business. The preferred method was to inform business stakeholders of the (negative) consequences of existing perspectives (to "expose the business to itself"), using the data models to facilitate discussion. Business rules may reflect the limitations of past technologies or systems, and need to be challenged rather than blindly accepted. Peter Aiken stated that "if the users aren't by a third to a half way through the session jumping up to the board and saying 'this is wrong' then ... I don't consider the modeling session a success." Alec Sharp expressed the view even more strongly: "it's criminal not to do something to help people see the consequences of having chosen a particular reality."
Implicit (or indeed explicit in some cases) is the view that the business does not know what it wants - or at least what is best for it. Modelers were seen as being able to make suggestions ("Have you thought about doing it this way?") and to bring in their own general or industry-specific business knowledge to provide new perspectives. One interviewee cited a case in which the business had specified some 500 attributes to be included in a database for reporting; after the data modeler reviewed how they would be used in practice, the number was reduced to 150.
More broadly, Alec Sharp talked of the "myth of requirements," arguing that the view that the business knows them already and that the job of the analyst is to extract them is "patently false" and a legacy of the early days of computerized systems. Data modelers who buy into the myth "may end up with a better data model but not with a better business." Architecture was invoked as a very close analogy ("don't tell me how to do it, tell me what you need to do") including recognition of the right of the client to say "thanks for your idea but no thanks."
Some of the aims and claims for business change were ambitious, even grandiose: "a skilled data modeler might help a business transform itself"; "the client said 'this has been a revelation'"; "the real benefit of data modeling ... is in synthesis of new ways of looking at things"; "I can help companies see themselves differently"; "help them define where they want their business to go - and model that"; "(changing) the management practices of the organization itself." Alec Sharp, addressing the apparent "scope creep" in the definition of data modeling, commented: "I'm perfectly happy to do this from my role as data modeler, because I don't choose to limit that role..."
Richard Barker offered an opposing view: "I used to play around in that area but until I became a main board director of a company and learned the essence of running a business, I didn't really understand. That was a massive change." Interviewees from the description camp also supported the primacy of the business in determining its data model: "What we're modeling is what the domain expert says is right. You have to presume that the domain expert knows exactly the way the business is or wants to be - every little bit of it. The modeler is only articulating that..."
On the subject of challenging business requirements, one interviewee simply stated, "I never have that conversation." Another said that the business's view of its data should be questioned only in rare cases. And the final decision definitely lay with the business: "when there is a discrepancy it is the business which answers the discrepancy, not the data modeler." This extended to changing data names in the interests of precision: "Right up front you put your own spin on the business rather than letting the business have its say."
The term "design" is used here in its plain English sense rather than as a stage in the applications development lifecycle.
CBML: Corporate Business Modelling Language (Department of Defence (UK) 2005; Ellis and Nell 2005).
The issue with data warehouse design was not the use of different modeling languages (viz. star schemas) but the constraining effect of accommodating existing (legacy) data structures.
Most interviewees used the term "logical data model" to denote what in academic work would generally be called a conceptual data model.
ORM: Object Role Modeling (Halpin 2001).
Graeme Simsion - Graeme is the author of Data Modeling Essentials and Data Modeling Theory and Practice. Since mid-2007, he is no longer involved in data management and data modeling, but continues to draw on his experience as a consultancy manager to advise and teach on the management and delivery of consulting services.