Prepositions, Not Verbs or Nouns
Published: May 1, 2010
David Hay presents his opposing view to the assertions of other data modeling experts.
Recently, Steve Hoberman published an article in TDAN.COM, “Three Situations that Weaken Data Model Precision”1 (April 1, 2010), that stressed three flaws common in data models: inadequate definitions, using dummy values when a value is required, and using vague labels for relationships (or leaving them out altogether).
While he is absolutely correct in criticizing those who leave out relationship names, he makes one assertion about those names that troubles me – even though it expresses an opinion common in the data modeling world: “A very important part of [a proper] sentence is the verbs.” In this, he is supported by no less than Graeme Simsion and Graham Witt, along with Ron Ross, among others. I, on the other hand, wish to contest that assertion.
Interestingly enough, while the data modeling community sees relationship end names as verbs, the object-oriented community sees “association” ends (“roles”) as nouns. That is, they portray a role name as a label for a target entity class. As a label, it is a noun. I contest this as well.
A relationship is not a verb. It’s not a noun. It’s a preposition.
Premises
First, please understand that this conversation is about conceptual data models. It is conceptual models that describe classes of things in the world and the relationships between them. Logical models are more oriented to the database technology, so relationship names are not as semantically significant. The relationships in physical models are concerned with foreign keys and other mechanisms.
With that in mind, let’s understand the nature of a conceptual data model. It has several characteristics that distinguish it from a logical model or a physical database design. These characteristics also distinguish it from a process model:
Note that a conceptual data model is a kind of ontology – a word from Greek philosophy that describes “the branch of metaphysics concerned with identifying, in the most general terms, the kinds of things that actually exist.”3 (I love using a hot new buzzword – especially when it’s 2500 years old!) You can think of Aristotle as the father of data modeling. In modern times, the word “ontology” means “a catalog of the types of things that are assumed to exist:
Two Common Views
There are two approaches commonly taken to naming relationships: one by the data modeling community, and one by the object-oriented community. They are different.
Data Modelers
Mr. Hoberman’s books and course materials describe the use of “verb phrases” to describe relationships. For example, in the article, he cites “A Customer can place one or many Orders.” In his concern for precision in a data model, he correctly points out that the verb must have meaningful content, so things like “has” and “associated with” are not useful.5
Graeme Simsion and Graham Witt also use verbs in their relationships. For example, “Each Customer may make one, or more Purchases,” and “each Purchase must be made by exactly one Customer.”6
Ron Ross has often said that he is not producing “data models,” but rather “fact models.” For example, he cites as a list of facts:
In some cases, Mr. Ross annotates a model to describe the role being implied by the relationship. For example, if a Person rides in a Car, then a note next to Person can show that the Person is playing the role of being a [rider].7
Object Modelers
As it happens, many UML authors also use verbs for association names, but that is a name that applies to the entire relationship – ostensibly in both directions. In practice, the modeler usually picks a direction, picks a verb, and labels it for that direction – along the same lines a data modeler would use.
Those who take advantage of the UML feature that allows labeling “roles” at each end of an association see the role name as a noun, essentially describing the entity class that is its object. In UML, a role describing an association end “represents the behavior of an element”8. This sounds like verbs again. But in fact a role name “provides a name to identify an association end within an association, as well as to navigate from one object to another using the association”.9 This name is usually a noun. It describes the part played by the property that is a related class. (“Properties” in UML may be either attributes or related classes.) In the example above, a Party would be a customer in an Order. In this case, two association roles would be expected between Party and Order, so the role names customer and vendor work well.
Note that to the object-oriented modeler, though, a UML “association” does not represent a relationship in the semantic sense. It represents the path to be taken by a program in navigating from one entity class to another. So the point of view is focused on how to find the other entity class, rather than on what the relationship means. From that point of view, all that is required is a label on the other class. Indeed, as often as not, the entity class name itself is deemed sufficient.
The standards organization ACORD has developed a UML model describing the insurance industry.
Among other things, in an association from Contract to Contract Header the role played by Contract Header is labeled contractHeaderElement.10
Issue One
First of all, as stated above, a conceptual data model is fundamentally a representation of classes of things significant to an organization and the relationships of those things to each other. Data modelers are often keen to refer to “employee”, “customer” and “vendor”. However, these are not classes of significant entities. The significant entity classes are Person, Organization, and their super-type which, by convention, is most commonly called Party. “Employee” is a Person with a defined relationship (“employed by”) with a company.
The roles that are implicit in these names should be represented in the names of relationships, not buried in the class names of people and organizations playing those roles. Thus, a Party (which may be either a Person or an Organization) may be a customer in one or more Orders. Other Parties may be vendors in the same Orders.
Now this is a point that has been made to death in the past, and many modelers do avoid the “Customer” entity class. Indeed, Mr. Ross did in his presentation. He modeled only People and Organizations. Similarly, the ACORD model strongly emphasizes Party. Unfortunately, neither Mr. Hoberman nor Messrs. Simsion and Witt are so rigorous.
Issue Two
The camouflaging of relationships in entity class names is but one problem. A more significant issue has two parts. The first is simply this:
A relationship is not a verb.
Verbs describe activities or processes. These are more appropriately the subject of process models. To assert that a Customer places an Order is to describe a business process. The input is presumably a requirement of some sort, and the output is a request for services or materials. On the other hand, if you want to describe the Order as a thing of significance to the enterprise, then you probably also want to represent its relationship to other things.
In this case, two of the related things are instances of Party, one of whom is presumably the customer in the Order, and the other is presumably the vendor in the Order, as described above. No, the verb is not the part of speech to describe a relationship.
The second part of this issue is this:
A relationship is not a noun, either.
In the ACORD insurance industry model, mentioned previously, the role describing the target entity class in the association from Contract to Contract Header was labeled contractHeaderElement.11 Another relationship is from Contract to Contract Element with Contract Element playing the role, elementsIncludedinContract.12
Note that what is being labeled is not the relationship between Contract and these elements. It is not about the nature of the association. Rather it is the answer to the question, “How can I identify that entity class when viewing it from the point of view of this entity class?”13 In these examples the target entity class is part of the relationship name. In many cases it is simply the object entity class name itself. For example, the relationship between an Order and Line Item might be simply “lineItem.”
In UML, both attributes and relationships to other classes are considered properties of a class. This means that, in the second example, above, “elementsIncludedinContract” is a property of Contract, identifying the class that it is related to. That is, it is the name of the role played by Contract Element in describing Contract. As it happens, “included in” is a reasonable relationship name. The problem is that it would be a property of Contract Element, not Contract.
That is, “each Contract Element must be (“1,”) included in one and only one (“,1”) Contract.” To make the role name a property of Contract, you have to say something like “each Contract may be associated with one or more (“0,*”) Contract Elements, each of which must be (“1,”) included in one and only one (“,1”) Contract.” This is not just a little convoluted.
In the example above, a Java program would be expected to navigate from Contract to Contract Element guided by the path labeled “elementsIncludedinContract”. “Included in” is a clue to the role involved here, but this is fundamentally just a way to find the Contract Element. Some practitioners would simply label the role “contractElement.” Note that there is no indication of the meaning of the role. No, a noun as a role name does not work in a conceptual model attempting to describe the world.
From the above two premises, we conclude that:
The part of speech that describes relationships between things is the preposition.
Remember the children’s program Sesame Street, and Grover’s words “over,” “under,” “around,” and so forth? He was teaching kids about the relationships between physical things – but the part of speech is the same even if the “things” are human beings or intangible concepts.
Now don’t get the idea that verbs aren’t part of relationships. It’s just that in a model describing what exists, the only verb of interest is “to be.” This can be extended to describe optionality, in the form of “must be” and “may be.” That is, every relationship sentence should have the structure:
Each
< entity class 1 > (noun)
must be (or) may be (verb)
< relationship > (prepositional phrase)
one and only one (or) one or more (adjective phrase)
< entity class 2 > (noun)
Indeed, among the “verbish” examples, you’ll find such things as “is assigned to,” “is the parent of,” and so forth. The “verbish” part of these relationship names is “is.” The heart of the relationship name is in the preposition, even if the modeler doesn’t realize it.
Mr. Hoberman’s example, “A Customer can place one or many Orders” suffers from two problems. First of all, the entity class isn’t “Customer.” That name encodes the relationship into the class name. The entity class is either Person, Organization or Party. It is the relationship name, not the entity class name, where “customerness” should be captured. In addition to providing too much information about a relationship, “customer” doesn’t tell us enough about the nature of the underlying class: Is it a Person, an Organization, or either – a Party?
Taking the most general view, then, the sentence could read:
Each Party may be a customer in one or more Orders.
(Note also that “may be . . . one or more” is a little more graceful than “can . . . one or many.”)
In the case of Messrs. Simsion and Witt, they do have a rigorous structure for their relationships:
Each
< entity class 1 > (noun)
must (or) may
< relationship > (verb)
one and only one (or) one or more
< entity class 2 > (noun)
This is consistently applied and rigorous, but it does mean that a lot of the verbs begin with “be,” as in “each Operation must be managed by a Surgeon.” To be sure, they can now say, going the other direction, that “each Surgeon may manage one or more Operations.” But that camouflages the fact that each Surgeon is in fact playing the role of being the manager of the Operations.
In the case of their relationship, “each Customer may make one, or more Purchases,” since “purchase” is more specific than “order,” this could be rendered as:
Each Party may be the buyer in one or more Purchases.
Again, I am compelled to assume that either a Person or an Organization may be a buyer in the Purchase, but I don’t know that because it is not clear in the original sentence.
Mr. Ross made his name as an advocate for documenting business rules, and he asserts that his “fact models” are essential as the basis for describing business rules. This is certainly true. The question is whether the syntax described above to make model relationships can also be used to support business rules. For example, Mr. Ross describes the following business rules:
If we are concerned with the precision of the rules, however, it does not reduce their clarity to say instead:
Indeed, it could be said that in addition to being more precise, it is in fact clearer. The object-oriented modeler might say that in the relationship between Order and Party “customer” is a role played by Party, with “Party as customer” as a “property” of Order. But from the semantic perspective, “customer in Order” is a predicate of Party, not Order.
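As a rough illustration only, the relationship sentence structure argued for in this article (Each, entity class 1, “must be” or “may be,” a prepositional phrase, “one and only one” or “one or more,” entity class 2) can be sketched as a small data structure that generates the sentence. The class name Relationship and its field names are hypothetical, invented for this sketch; they do not come from any modeling tool or from the authors cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One end of a relationship, read from subject to target (hypothetical sketch)."""
    subject: str             # entity class 1 (noun), e.g. "Party"
    mandatory: bool          # True -> "must be", False -> "may be"
    phrase: str              # prepositional phrase, e.g. "a customer in"
    many: bool               # True -> "one or more", False -> "one and only one"
    target: str              # entity class 2 (noun), singular, e.g. "Order"
    target_plural: str = ""  # optional override for irregular plurals

    def sentence(self) -> str:
        # The only verb is "to be"; optionality is carried by must/may.
        verb = "must be" if self.mandatory else "may be"
        if self.many:
            cardinality = "one or more"
            noun = self.target_plural or self.target + "s"
        else:
            cardinality = "one and only one"
            noun = self.target
        return f"Each {self.subject} {verb} {self.phrase} {cardinality} {noun}."


# The role ("customer", "buyer") lives in the relationship, not in the class name.
customer_in = Relationship("Party", False, "a customer in", True, "Order")
buyer_in = Relationship("Party", False, "the buyer in", True, "Purchase")
included_in = Relationship("Contract Element", True, "included in", False, "Contract")

for rel in (customer_in, buyer_in, included_in):
    print(rel.sentence())
```

Note that the sketch has no field for a verb other than “to be.” Whatever the modeler wants to say about the role has to go into the prepositional phrase, which is the point of the argument above.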
Think prepositions, not verbs and not nouns.
Conclusion
The “relationship” part of “entity/relationship” modeling is far more challenging and subtle than most people realize. One complaint your author often hears when teaching the approach to naming relationships described here is that it is hard!
That is true. If you are successful, the person reading the relationship sentence will find it to be obvious. Before you found the right words to make it seem that clear, however, the essential nature of that relationship was not so obvious. In data modeling, to come up with the right name to describe exactly how two things are related to each other – so as to make it sound obvious – requires you to understand the fundamental nature of the relationship to a much greater degree than your reader will. What exactly is the role being played there? You can see it. You know that it is true. But coming up with the right word is, well, challenging. It requires skill in using language.
For a novelist to come up with the right word to convey an image or a person’s temperament – that’s hard, too. This is what literary skill is all about. If you went into computer science because you couldn’t write well in college, you are now officially in trouble.
End Notes:
David C. Hay - In the information industry since the days of punched cards, paper tape and teletype machines, Dave has been producing data models to support strategic and requirements planning for more than twenty years. He has worked in a variety of industries, including, among others, banking, clinical pharmaceutical research, and all aspects of oil production and processing.
He is the founder and President of Essential Strategies, Inc., a seventeen-year-old consulting firm dedicated to helping clients define corporate information architecture, identify requirements, and plan strategies for the implementation of new systems. Dave is the author of the books Data Model Patterns: Conventions of Thought and Requirements Analysis: From Business Views to Architecture. His new book, Data Model Patterns: A Metadata Map, is a comprehensive schema of metadata from many different perspectives. He has also spoken at numerous international and local DAMA conferences, Oracle user group conferences, and many others.
He can be reached at dch@essentialstrategies.com, (713) 464-8316, or via his company's website at http://www.essentialstrategies.com.