UML as a Data Modeling Notation, Part 1
Two Modeling Worlds
Published: September 3, 2008
Part 1 of this three-part series describes the basic differences between notations and how they can be reconciled.
This series of articles has two audiences: the data modelers who have been convinced that UML has nothing to do with them, and the UML experts who don’t realize that data modeling really is different from object modeling (and the differences are important). The authors’ objective is to finally bring the two groups together in peace.
INTRODUCTION 1: FOR THE DATA MODELER
INTRODUCTION 2: FOR THE UML MODELER
A section has been added, by the way, about the aesthetic characteristics desirable in a data model, with a few words about how to present a model to business observers. This is included because, no matter what notation is used, a conceptual entity/relationship model is intended to be a means for communicating with the business. Contrary to what some in both camps believe, aesthetics is important. The easy part of these articles (for both audiences) is to understand the notation required for this approach. More difficult is the change in attitude required in each case, in order to be successful. Before proceeding, three observations should be kept in mind.
The objective of this series of articles is to provide all modelers with guidance as to how to produce an excellent conceptual entity/relationship model using UML Class Diagram notation.
NOTATION
Both the various forms of entity/relationship notation and UML can describe entity classes and relationships. Figure 1 shows a model fragment in the notation developed by Richard Barker and Harry Ellis. It asserts that an instance of an ORDER will be described by values of “Order number” and “Order date”, while a LINE ITEM is described by values of “Line number”, “Quantity”, “Price”, and “Delivery date”. Moreover, a value of “(Extended value)” may be computed for each instance of LINE ITEM as well. In addition, this model fragment asserts that:
Figure 1: A Relationship in Barker-Ellis Notation
Figure 2 shows exactly the same information, but in UML form. In the Barker-Ellis notation, the “may be” part of the first assertion is represented by a dashed line connected to the first entity class (ORDER). In the UML model, this is represented by the “0..” notation next to the second entity class (LINE ITEM). In the Barker-Ellis notation, the “must be” part of the second assertion is represented by a solid line next to the first entity class (LINE ITEM). In the UML model, it is represented by “1..” next to the second entity class (ORDER). The “one or more” part of the first assertion is represented in the Barker-Ellis notation by a “crow’s foot” (<-) symbol next to the second entity class; in the UML version, it is represented by “..*” next to the second entity class. The “exactly one” part of the second assertion is represented in the Barker-Ellis notation by a straight line (with no crow’s foot); in the UML model, it is represented by the characters “..1” next to the second entity class.
Figure 2: A Relationship in UML
The two forms are semantically equivalent. Note that the form “1..1” is often abbreviated “1”. Because the optionality part (“may be” or “must be”) of the notation is next to the first entity class in the Barker-Ellis notation, by convention, the relationship name is next to the first entity class as well. In UML, optionality is denoted by the symbols next to the second entity class. For that reason, in the case of UML, the relationship name is next to the second entity class. To protect the sanity of those who have to work with both notations, in all cases, relationship sentences are read in a clockwise direction: left to right above and right to left below.
In the Barker-Ellis model, mandatory attributes are designated with an asterisk (*) or an octothorpe (#), and optional ones are designated with a circle (o). In UML, the same symbols used for relationships are also used for attributes. The mandatory attributes in the example are annotated with “[1]”, representing “1..1”, and meaning that at least one value is required, but no more than one is permitted. Optional attributes are annotated with “[0..1]”, meaning that a value is not required but, again, no more than one value is permitted. In the entity/relationship modeling world, the second “..1” is unnecessary, since the original relational theory rules prohibiting multi-valued attributes are in effect. In the UML world such things are permitted, so the “..1” will always be present.
Note that, strictly speaking, any expressions could be used to describe the roles played by each end of the relationship, but in disciplined data modeling, there are stringent rules, described in the following section.
LANGUAGE
An entity/relationship diagram is primarily a graphic portrayal of English language assertions about an organization. Therefore, the only language to appear on a diagram must use terms relevant to the business.
That is, only business terms (and conventional English) may be used as the names of entity classes and the names of roles. Note that the typographical conventions (all capitals for entity class names, italics for relationships) are unnecessary. Indeed, a case could be made for showing the sentences in all normal case. It is helpful, however, to distinguish the components in this tutorial so that their role in the sentences is clear.
Entity Classes
An entity class is the name of a “thing or object of significance to a business, whether real or imagined, about which information needs to be known or held” [Barker 1989, p. 5-1]. This may be a concrete thing, such as PERSON or GEOGRAPHIC LOCATION, or it may be an abstraction, such as LINE ITEM or PROJECT ROLE. A subset of the UML concept of “class” can be used for this, provided that it is understood to mean only entity/relationship model classes (that is, things of significance to the enterprise), and only if the conventions described here for naming are followed. Specifically, the name of an entity class is in the singular, and refers to an instance of that class. Hence, ORDER and LINE ITEM, above. While the name “Project history” is not allowed, an entity class called PROJECT could contain instances over time, so it may in fact be a project “history”. But that is not how it is named. Database table names are not allowed, nor are abbreviations or acronyms. Classes that are computer artifacts (“window”, “cursor”, and the like) are not allowed.
Attributes
As in UML, an attribute in an entity/relationship model is a characteristic of an entity class that “serves to qualify, identify, classify, quantify, or express the state of an entity” [Barker 1989, p. 5-6]. In the examples above, attributes of ORDER are “Order number” and “Order date”. Attributes of LINE ITEM are “Line number”, “Quantity”, “Price”, and “/Extended value”. The “/” in front of “Extended value” is a UML symbol for a computed field.
(Most entity/relationship notations have no such symbol, although your authors’ convention surrounds the name with parentheses.) Each value of /Extended value is derived from the expression “Quantity times Price”. The algorithm is not shown in an entity/relationship drawing, but must be documented behind the scenes. In UML, it can be shown in an annotation on the drawing.
UML has the ability to display a large number of things about an attribute: its data type, its “visibility”, whether it is “read-only” or not, and so forth. In the entity/relationship version, the only things to display are the attribute name, whether it is optional or not, its optional “<<ID>>” stereotype (more on that below), and the optional “/” that designates it as a derived attribute. Data type must be documented behind the scenes, but, as it adds clutter, it is not normally shown on a diagram used for presentations. It can be included on diagrams if they are used solely for documentation. “Visibility” is a characteristic of an attribute’s use in a particular context, and does not belong on a structural diagram.
As with entity class names, attribute names must be common English names for the characteristics involved. In general, it is not necessary to include the entity class name in the attribute name, but in some companies, standards dictate that the entity class name be inserted in front of the common attributes, for example, “Person name” and “Person ID”.
Relationships and Roles
A relationship between two entity classes consists of two assertions about them. Each assertion is one entity class’s role with respect to the other. This can be described using the UML line for an “association”. In one sense a UML association is equivalent to an entity/relationship relationship, but a relationship in an entity/relationship model is more constrained in what it can represent than is an object-oriented association.
Specifically, as will be described below, each relationship is a pair of assertions about the nature of the business. It is not simply recognition that two things are somehow associated with each other.
Note that, while many-to-many relationships are common in preliminary entity/relationship models, by the time the model has been refined into a “conceptual” model, they have all been resolved into one-to-many relationships. This is important because often the intersection of the two entity classes contains important business information. Simply saying that each A is related to a lot of Bs and each B is related to a lot of As tells you nothing about each occurrence of an A being related to a B.
In Information Engineering and the Barker-Ellis notation for entity/relationship modeling, cardinality (called “multiplicity” in UML) is represented by either the presence or absence of the “crow’s foot” (>-) symbol. Optionality (also known as “modality”) is represented (in Information Engineering) by either a circle (O) or vertical line (|), or (in the Barker-Ellis notation) by whether half of the relationship line involved is solid or dashed.
In UML, cardinality is represented by characters: “..1” (meaning that an instance of the first entity class can be associated with no more than one instance of the second class) or “..*” (meaning that an instance of the first entity class can be associated with an unlimited number of instances of the second class). A relationship’s optionality can be either “0..” (meaning that the relationship is optional) or “1..” (meaning that it is required). UML, by the way, unlike entity/relationship modeling, supports a variety of values for maximum cardinality. For example, the expression could be “1,4, > 7”, meaning the value must be exactly 1 or 4, or greater than 7.
Unlike in conventional UML usage, each relationship consists of two ordinary English sentences, although each sentence does have a rigorous structure. Each relationship end is called a “role” in UML.
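The way a UML multiplicity string combines optionality (the part before the dots) and cardinality (the part after them) can be sketched in a few lines of Python. This is only an illustration, not part of either notation: the names `Multiplicity` and `parse_multiplicity` are invented here, and the sketch handles only the simple “min..max” forms and the “1” abbreviation discussed in this article, not comma-separated expressions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Multiplicity:
    """Hypothetical holder for one end of a relationship."""
    minimum: int            # 0 = optional ("may be"), 1 = mandatory ("must be")
    maximum: Optional[int]  # None stands for "*" (no upper limit)

    @property
    def optional(self) -> bool:
        # Optionality is carried by the "0.." or "1.." half.
        return self.minimum == 0

    @property
    def many(self) -> bool:
        # Cardinality is carried by the "..1" or "..*" half.
        return self.maximum is None or self.maximum > 1


def parse_multiplicity(text: str) -> Multiplicity:
    """Parse 'min..max'; a lone number such as '1' is read as '1..1'."""
    text = text.strip()
    if ".." in text:
        low, high = text.split("..", 1)
    else:
        low = high = text  # the abbreviated form
    maximum = None if high == "*" else int(high)
    return Multiplicity(int(low), maximum)


print(parse_multiplicity("0..*"))  # Multiplicity(minimum=0, maximum=None)
print(parse_multiplicity("1"))     # Multiplicity(minimum=1, maximum=1)
```

Keeping the two halves as separate fields mirrors the entity/relationship view, in which optionality and cardinality are distinct ideas that happen to share one UML label.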
Thus, the relationship portrayed in Figure 2 shows cardinality and optionality in graphic terms. Specifically:
Note that the assertion “an ORDER may be composed of one or more items” is often expressed as “an ORDER is composed of zero, one, or more items”, but in your authors’ opinion, the latter is clumsier. Saying this to a non-technical audience sounds, well, technical.
Note also that each role name is in the form of a prepositional phrase, not a verb. The preposition is the part of speech that denotes a relationship. (Remember “Grover words”?) Verbs represent actions, which are the subject of a process-oriented model, not a structural one.
The most common configurations are “1..1”, for “…must be … exactly one…”, and “0..*”, for “…may be … one or more…”. As mentioned above, because it is so common, “1..1” is often abbreviated “1”. That means, when reading such a role, the reader must parse “1” into its two components. Thus, in Figure 2, above, the role reading from right to left produces the sentence:
From left to right, it reads:
Note that if the modeler is successful, these relationship sentences appear almost self-evident to the viewer. These are perfectly normal, non-technical sentences. Not only do they sound normal, they are also strong sentences, such that if the assertions are in fact wrong, a viewer cannot simply let them go. The viewer has to disagree with them.
Note also, however, that coming up with such self-evident role names is very hard. To do so means that you must really understand the nature of the relationship, and you must be good at manipulating the English language (or whatever language you are modeling in). Unfortunately, many modelers don’t have the inclination or the ability to do so. The final product suffers.
This, by the way, is a very different approach to naming roles than is taken in the object-oriented community. There the role is usually a noun, describing what the second entity is about. In many cases, this is indeed a noun form of the relationship coming back (“customer” rather than “customer in”), but in many others it simply reproduces the name of the entity. Figure 3 shows a model developed under object-oriented rules. In this, the SUBJECT AREA plays the role of being the containing subject area for the ENTITY. The ENTITY, in turn, plays the role of being the contained entity for the SUBJECT AREA. This is as opposed to the entity/relationship approach of asserting that each ENTITY may be contained in one or more SUBJECT AREAS and that each SUBJECT AREA may be a container of one or more ENTITIES.
Figure 3: Object-oriented Roles
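The entity/relationship reading convention described in this section (subject entity class, “may be” or “must be”, a prepositional role name, “exactly one” or “one or more”, object entity class) is regular enough to sketch as a small Python function. The function name `read_role` and its parameters are invented for this illustration; the role name “part of” in the second call is an assumption, since only “composed of”, “contained in”, and “a container of” appear in the text above.

```python
def read_role(subject: str, role: str, multiplicity: str,
              obj: str, obj_plural: str) -> str:
    """Turn one relationship end into an ordinary English sentence.

    multiplicity is the UML label at the far end, e.g. "0..*" or "1".
    """
    low, _, high = multiplicity.partition("..")
    if not high:
        high = low  # "1" is the abbreviation of "1..1"
    optionality = "may be" if low == "0" else "must be"
    if high == "1":
        return f"Each {subject} {optionality} {role} exactly one {obj}."
    return f"Each {subject} {optionality} {role} one or more {obj_plural}."


print(read_role("ORDER", "composed of", "0..*", "LINE ITEM", "LINE ITEMS"))
# Each ORDER may be composed of one or more LINE ITEMS.

print(read_role("LINE ITEM", "part of", "1", "ORDER", "ORDERS"))
# Each LINE ITEM must be part of exactly one ORDER.

print(read_role("ENTITY", "contained in", "0..*", "SUBJECT AREA", "SUBJECT AREAS"))
# Each ENTITY may be contained in one or more SUBJECT AREAS.
```

A noun-style role such as “contained entity” does not fit this template at all, which is one way to see the difference between the two naming approaches.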
A Constraint
Some entity/relationship notations have the ability to describe an “exclusive or” arrangement of relationships. For example, Figure 4 shows the assertion:
The “arc” across the relationship lines denotes this.
Figure 4: Exclusive Or in the Barker-Ellis Notation
Not all entity/relationship notations can show this, but in fact UML can. In UML, it is called an “XOR Constraint” and is shown in Figure 5.
Figure 5: Exclusive Or in UML
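What an “exclusive or” constraint demands of the data can be sketched as a simple check in Python. The entity classes used here are purely hypothetical (a LINE ITEM that is for either a PRODUCT or a SERVICE); they are assumptions made for the sketch and are not taken from Figures 4 and 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineItem:
    """Hypothetical entity class with two mutually exclusive relationships."""
    line_number: int
    product: Optional[str] = None  # assumed relationship: "for" one PRODUCT
    service: Optional[str] = None  # assumed relationship: "for" one SERVICE


def satisfies_xor(item: LineItem) -> bool:
    """True when exactly one of the two relationships is populated."""
    return (item.product is None) != (item.service is None)


print(satisfies_xor(LineItem(1, product="Widget")))    # True
print(satisfies_xor(LineItem(2)))                      # False: neither is set
print(satisfies_xor(LineItem(3, "Widget", "Repair")))  # False: both are set
```

The check reflects a mandatory exclusive or, in which one of the two relationships must be present; an optional arc would also accept the case where neither is set.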
Part 2 of this series will explore the representation of sub-types and unique identifiers, as well as some UML features that are, well, unnecessary for an entity/relationship model.
References:
Barker, R. 1989. CASE*Method: Entity Relationship Modeling. (Wokingham, England: Addison Wesley).
Box, G. E. P., and Norman R. Draper. 1987. Empirical Model-Building and Response Surfaces. (Wiley). P. 424. Available at http://en.wikiquote.org/wiki/Special:BookSources/0471810339
Hay, D. 1999. “UML Misses the Boat,” East Coast Oracle Users' Group: ECO 99 (Conference Proceedings / HTML File). Apr 1, 1999. Available at http://essentialstrategies.com/publications/objects/umleco.htm
Hay, D. 2003. Requirements Analysis: From Business Views to Architecture. (Upper Saddle River, NJ: Prentice Hall PTR).
Martin, J., and James Odell. 1995. Object-Oriented Methods. (Englewood Cliffs, NJ: Prentice Hall).
Miller, G. A. 1956. “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information”, The Psychological Review, Vol. 63, No. 2 (March, 1956). Available at http://www.musanim.com/miller1956/
Page-Jones, M. 2000. Fundamentals of Object-Oriented Design in UML. (New York: Dorset House). Pp. 233-240.
Rumbaugh, J., Ivar Jacobson, Grady Booch. 1999. The Unified Modeling Language Reference Manual
David C. Hay - In the information industry since the days of punched cards, paper tape and teletype machines, Dave has been producing data models to support strategic and requirements planning for more than twenty years. He has worked in a variety of industries, including, among others, banking, clinical pharmaceutical research, and all aspects of oil production and processing.
He is the founder and President of Essential Strategies, Inc., a seventeen-year-old consulting firm dedicated to helping clients define corporate information architecture, identify requirements, and plan strategies for the implementation of new systems. Dave is the author of the books Data Model Patterns: Conventions of Thought and Requirements Analysis: From Business Views to Architecture. His new book, Data Model Patterns: A Metadata Map, is a comprehensive schema of metadata from many different perspectives. He has also spoken at numerous international and local DAMA conferences, Oracle user group conferences, and many others.
He can be reached at dch@essentialstrategies.com, (713) 464-8316, or via his company's website at http://www.essentialstrategies.com.
Michael J. Lynott - Mike has been doing data modeling and database design since his introduction to the world of databases in the early '80s. He was part of Oracle Corporation's introduction into Computer-Aided Systems Engineering (CASE) and has been a leading expert in conceptual data modeling ever since. After that, he was a consultant with eTransitions of New Jersey, working with renowned author and consultant Ulka Rodgers. In recent years, he has been senior enterprise information architect for a large retailer in Boise, Idaho. Mike has written a number of papers for various publications and conferences.