TDAN: The Data Administration Newsletter, Since 1997



UML as a Data Modeling Notation, Part 4

by David C. Hay, Michael J. Lynott
Published: December 1, 2008
This article, Part 4 of what was originally a three-part series, responds to issues that were raised after the first three articles of the series were published.

The series of articles was originally presented in three parts. Part 1 set the stage, describing the basic differences between the notations and, in principle, how they can be reconciled. Part 2 went into more detail, addressing sub-types and constraints, along with which UML elements should not be used in a data model and what has to be added (unique identifiers). Part 3 discussed the aesthetics of modeling, as well as some quirky aspects of UML that were worth noting.

A Postscript

Dave Hay used the approach to conceptual entity/relationship modeling described in this series, published in the last three issues of TDAN, to present a version of the Information Management Metadata Model to the Object Management Group for review. Specifically, the technology-independent models of both entity/relationship modeling and relational design were presented in this form. In general, they were well received, but one reviewer pointed out that UML practitioners viewing this model would need information about its context if they were not to misinterpret it. The suggestion was made that perhaps UML should not be used where the objectives of an entity/relationship model are clearly different from those of a more typical UML model.

Dave is receptive to that, since that was what he had expected to do in the first place, but he’s invested enough in this approach that he would like to see it work. So, he and Mike have come up with the following modifications to the technique, in an attempt to make the results more palatable to the UML community.

Understand that the principal difference between entity/relationship modeling and UML modeling lies in their contents: entity/relationship modeling, as presented in these articles, is constrained to represent only objects and classes of a particular universe of discourse (such as a business). UML in its pure form has no such restriction. An object can be a cursor, a window on a computer screen, a database, or any other computer artifact. These are not included in a technology-independent entity/relationship model.

A UML model is typically created by an object-oriented designer to provide to a programmer, while an entity/relationship model is created by a business analyst to be reviewed by subject area experts, and then to be submitted to a physical database designer. These viewers of the model have very different perspectives on the issues at hand.

This difference, however, need not prevent the UML and data modeling communities from sharing the notation.

As we’ve seen in the previous three articles of this series, there are two areas where the approaches differ:

  • Sub-types

  • Role name representation


Most developers of UML models and some developers of entity/relationship models favor the approach of representing sub-type boxes outside super-type boxes, connected by lines denoting specialization. We have argued here for the “box-in-box” notation (shown in Figure 1) for two reasons:

  • It is more compact. Given the constraint that a model must fit on an 8½ x 11 (or A4) sheet of paper, the space taken up by separate sub-type boxes is a cost.

  • It is more representative of the business reality. An instance of a sub-type really is an instance of its super-type(s). This notation makes it clearer that an attribute or relationship of a super-type is also an attribute or relationship of all its sub-types.1

Even so, we recognize that the box-in-box notation has already been taken. In UML 2.0, a composite structure diagram is used to describe run-time architectures that aren’t clear from a typical object or class diagram. “UML 2 has added a composite structure diagram that shows the participating elements and their relationship in the context of a specific classifier such as a use case, object, collaboration, class, or activity.”2

“A composite structure is a set of interconnected elements that collaborate at runtime to achieve some purpose. Each element has some defined role in the collaboration.”3

A composite structure diagram is drawn as a larger rectangle with its components contained as rectangles within it. To read the diagram in Figure 1 as a composite structure diagram is to imagine that PERSON and ORGANIZATION are components of PARTY, not sub-types of it. Note that this differs from the composition diamond on a relationship, which denotes the class-model idea that an instance of one class is composed of instances of another class. A composite structure diagram, by contrast, asserts that a system component is composed of other system components.


Figure 1: Prior Example

In fact, the drawing in Figure 1 does show generalization, not composition. To make this clear, we recommend drawing the generalization lines inside the boxes, as shown in Figure 2. This keeps the aesthetic orientation we are looking for while signaling the correct meaning to UML aficionados. Strictly, this should not be an issue: any viewer of this model should understand that it is a conceptual model describing an enterprise, not a run-time model describing a system (and this should be noted in every diagram’s legend). Still, the additional notation should help.


Figure 2: New Example
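To make the generalization reading of Figures 1 and 2 concrete, here is a minimal Python sketch of a model's metadata. The `EntityClass` type and the attributes “name”, “birth date”, and “tax id” are hypothetical illustrations, not taken from the figures. Each box has at most one enclosing super-type, matching the single-inheritance limit acknowledged in end note 1, and the sketch expresses only generalization: PERSON is a kind of PARTY and inherits its attributes, which is not the part-of reading a composite structure diagram would suggest.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntityClass:
    """One box in a conceptual model. A box sits inside at most one super-type box."""
    name: str
    attributes: List[str] = field(default_factory=list)
    supertype: Optional["EntityClass"] = None

    def effective_attributes(self) -> List[str]:
        # Every attribute of an enclosing super-type is also an attribute
        # of this sub-type, so collect them from the outermost box inward.
        inherited = self.supertype.effective_attributes() if self.supertype else []
        return inherited + self.attributes

    def is_a(self, other: "EntityClass") -> bool:
        # Generalization: an instance of a sub-type is also an instance of
        # each super-type that encloses it.
        current: Optional["EntityClass"] = self
        while current is not None:
            if current is other:
                return True
            current = current.supertype
        return False


# Hypothetical attributes, for illustration only.
party = EntityClass("Party", ["name"])
person = EntityClass("Person", ["birth date"], supertype=party)
organization = EntityClass("Organization", ["tax id"], supertype=party)

print(person.effective_attributes())  # ['name', 'birth date']
print(person.is_a(party))             # True
print(person.is_a(organization))      # False
```

Keeping a single `supertype` reference, rather than a list, is what makes the nesting a strict tree of boxes within boxes.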


In your authors’ entity/relationship world, the two sentences describing a relationship are assertions about two entity classes.4 Each sentence takes the form subject (first entity class) | predicate (role) | object (second entity class). Analogously, an entity class is asserted to have an attribute, with “described by” implied as the predicate and the attribute itself playing the part of the object.

In Figure 3, a sample relationship is described by two role names:

  • Each Association End must be owned by one and only one Association.

  • Each Association must be the owner of one or more Association Ends.


Figure 3: Original E/R Example
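The subject | predicate | object form can be generated mechanically from a relationship's two ends. The following is a minimal Python sketch under our own assumptions: the `RelationshipEnd` and `Relationship` types, their field names, and the naive “add an s” pluralization are hypothetical, and the role names are held by the relationship rather than by either entity class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipEnd:
    entity_class: str  # subject of the sentence read from this end
    role: str          # predicate, e.g. "owned by"
    mandatory: bool    # True -> "must be", False -> "may be"
    many: bool         # True if each subject instance relates to one or more
                       # instances at the other end, False for exactly one


@dataclass(frozen=True)
class Relationship:
    # Both ends, with their role names, belong to the relationship itself.
    first: RelationshipEnd
    second: RelationshipEnd


def sentence(subject: RelationshipEnd, obj: RelationshipEnd) -> str:
    """Read one direction as subject | predicate | object."""
    modality = "must be" if subject.mandatory else "may be"
    if subject.many:
        quantity, noun = "one or more", obj.entity_class + "s"  # naive plural
    else:
        quantity, noun = "one and only one", obj.entity_class
    return f"Each {subject.entity_class} {modality} {subject.role} {quantity} {noun}."


def sentences(rel: Relationship) -> list:
    return [sentence(rel.first, rel.second), sentence(rel.second, rel.first)]


ownership = Relationship(
    RelationshipEnd("Association End", "owned by", mandatory=True, many=False),
    RelationshipEnd("Association", "the owner of", mandatory=True, many=True),
)
for line in sentences(ownership):
    print(line)
```

Run as written, this prints the two sentences listed for Figure 3.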

When your authors first learned that “roles” and attributes are “owned by” or “properties of” a UML class, this seemed very compatible with the way we looked at entity/relationship entity classes. What mystified us was the way UML modelers name the roles. What we now realize is that to the extent that one can create a sentence from a UML role name, the role name turns out to be a property of the object rather than a property of the subject.

Figure 4 shows the UML version of this model.5 Here, the UML sentences would be:

  • Each Association End has (as a property) the role of an owning association with respect to one or more Association Ends.

    That is, “Each Association End has the role of [an Association’s being] an owning association with respect to one or more Association Ends.”

  • Each Association has (as a property) the role of one or more owned ends with respect to one and only one Association.

    That is, “Each Association has the role of one or more [Association Ends’ being] owned ends with respect to one and only one Association.”


Figure 4: UML Example

That is, to a UML modeler, each role is a property of the object of the sentence (Association End and Association, respectively, above) rather than its subject (Association and Association End).
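The UML reading corresponds to how an object-oriented design would hold these roles as attributes. The following Python sketch is an illustration under our own assumptions: `owned_end` and `owning_association` are Python renderings of the role names discussed above, and the `attach` helper is hypothetical. Each role name drawn at the far end of the line becomes a property of the class at the near end.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Association:
    # UML role "owned ends" is drawn next to Association End, but it is a
    # property of Association: the ends this Association owns.
    owned_end: List[AssociationEnd] = field(default_factory=list)


@dataclass(eq=False)
class AssociationEnd:
    # UML role "owning association" is drawn next to Association, but it is a
    # property of Association End. repr=False avoids a recursive repr loop.
    owning_association: Optional[Association] = field(default=None, repr=False)


def attach(association: Association, end: AssociationEnd) -> None:
    # Hypothetical helper: set both properties so the two directions agree.
    end.owning_association = association
    association.owned_end.append(end)


assoc = Association()
end = AssociationEnd()
attach(assoc, end)
print(end.owning_association is assoc)  # True
```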

In our approach (“Each Association must be owning (the owner of, actually) one or more Association Ends.”), the role of “owning” is a property of Association, the subject of the sentence. Going the other direction, we would put it “Each Association End must be owned ends of (or rather, owned by) one and only one Association.”

In recognition of these different points of view, your authors have no problem with putting the role name at the other end of the line, as shown in Figure 5. We can still follow the convention of reading in a clockwise direction, finding the cardinality symbols at the far end on the same side of the line. This differs from our original portrayal (Figure 3, above), where, for example, “owned by” was next to Association.


Figure 5: Updated ER Example

This revised approach has the advantage of putting the role name next to the entity class playing the role, which may be more comfortable for UML readers. A UML reader can still interpret it to say that each Association End has the property of one Association being the owner of one or more Association Ends. Similarly, each Association has the property of one or more Association Ends being owned by one and only one Association.

The UML modeler can think of the role as describing the second entity class while being a property of the first, and the data modeler can think of it as a predicate of the first entity class expressed in terms of the second. We read in a clockwise direction to preserve our sanity when dealing with multiple notations, but if you prefer to read in the other direction, that’s okay too.

A word about the role names themselves. As stated in a previous article, because of the nature of the relationship sentences, role names in an entity/relationship model must be prepositional phrases or gerunds. Nouns don’t work. The preposition is the part of speech that describes relationships. (Remember “Grover” words?) Nouns describe things, and we already have entity classes to do that.
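A modeling tool could flag role names that break this rule. The following Python sketch is only a rough heuristic of our own: the preposition list is a small assumed sample, and the “starts with an -ing word” test will misfire on nouns such as “thing”. It accepts phrases ending in a preposition and gerund-style names, and rejects bare nouns such as the UML-style “owned ends”.

```python
# A deliberately small, assumed list of English prepositions; extend as needed.
PREPOSITIONS = {"about", "at", "by", "for", "from", "in", "into", "of", "on", "to", "with"}


def looks_like_role_name(role: str) -> bool:
    """Rough heuristic: accept prepositional phrases and gerunds, reject bare nouns."""
    words = role.lower().split()
    if not words:
        return False
    # Phrases such as "owned by" or "the owner of" end in a preposition.
    if words[-1] in PREPOSITIONS:
        return True
    # Gerund-style names such as "owning" start with an -ing word.
    return words[0].endswith("ing")


for role in ["owned by", "the owner of", "owning", "owned ends", "owner"]:
    print(f"{role!r}: {looks_like_role_name(role)}")
```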

Note that this is still an entity/relationship model, so the entity class names and role names have spaces in them.

A further departure from common UML practice is that here entity class names are not repeated in the role names. That bit of redundancy in UML apparently comes from the fact that in Java programming, a class only “knows” what is in its namespace. The other class in the role is not in its namespace, so apparently the role name has to communicate what that class is. This is clearly a technology-specific requirement, not appropriate for a technology-independent model. The conversion of the entity/relationship model to a design model could, however, append the entity class name to the role name automatically.
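That automatic step can be quite small. The following Python sketch assumes a hypothetical naming convention of our own (lowerCamelCase role words followed by the name of the other entity class); neither the function nor the convention comes from any particular tool.

```python
def design_role_name(role: str, other_entity_class: str) -> str:
    """Hypothetical convention: lowerCamelCase role words followed by the other class's name."""
    words = role.split() + other_entity_class.split()
    first, *rest = words
    return first.lower() + "".join(word.capitalize() for word in rest)


print(design_role_name("owning", "Association"))    # owningAssociation
print(design_role_name("owned by", "Association"))  # ownedByAssociation
```

Keeping this in the transformation, rather than in the conceptual model, leaves the entity/relationship role names free of class names while still giving programmers the names they expect.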

One issue is that many tools allow a role to be the “property of” only one entity class. Given our differing interpretations of “property,” this will not work. To resolve this, simply make all role names properties of the relationships they belong to. Given the current state of tools, this means that converting the model to either a relational design or an OO design will be ambiguous, but that is an assignment for the tool makers to resolve.

In spite of all our best efforts, in this approach to using UML as a data modeling notation the meanings of many of the symbols are clearly slightly different from their meanings when UML is used to support object-oriented design. This is natural; the symbols take on different meanings again when the notation is used to support relational database design:

  • An entity class is a thing of significance to the enterprise. This is technology-independent.

  • An object-oriented class is a piece of program code, representing any kind of object. This is dependent on object-oriented technology.

  • A relational table is a collection of rows and columns stored on a computer. This is dependent on database technology.

Just as a transformation is required to convert an entity/relationship model into a relational database design, so is one required to convert an entity/relationship model into an object-oriented design. This may involve an automated process of attaching class names to role names, as well as manual efforts to add UML design adornments such as navigation and composition (and, of course, behavior).
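For the relational case, one common mapping places a foreign key column on the table for the entity class at the “many” end of the relationship, made NOT NULL when that end's participation is mandatory. The following Python sketch illustrates that mapping only; the table naming rule and the surrogate primary key column `ID` are our assumptions, not part of the authors' method, and the object-oriented transformation would need different steps such as naming properties and adding navigation.

```python
def table_name(entity_class: str) -> str:
    # Assumed naming convention: upper case, spaces replaced by underscores.
    return entity_class.upper().replace(" ", "_")


def foreign_key_column(child: str, parent: str, mandatory: bool) -> tuple:
    """Map a many-to-one relationship to a foreign key column on the child table.

    Assumes every table has a surrogate primary key column named ID.
    Returns (table that receives the column, column definition).
    """
    parent_table = table_name(parent)
    nullability = " NOT NULL" if mandatory else ""
    column = f"{parent_table}_ID INTEGER{nullability} REFERENCES {parent_table} (ID)"
    return table_name(child), column


# "Each Association End must be owned by one and only one Association."
print(foreign_key_column("Association End", "Association", mandatory=True))
# ('ASSOCIATION_END', 'ASSOCIATION_ID INTEGER NOT NULL REFERENCES ASSOCIATION (ID)')
```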

Because the meanings of the models are different, should the notations be different? There are strong arguments for making them so, but these articles have attempted to show that this is not required for the models to make sense. Whatever notation is used, precise, semantically clear models can be produced, and producing them is worthwhile whatever the modeler’s background.

End Notes

  1. Yes, we acknowledge that this arrangement precludes representing multiple inheritance (a sub-type having more than one super-type), but it is our view that situations apparently requiring multiple inheritance should be modeled differently. The controversy continues.

  2. Eriksson, H-E, Magnus Penker, Brian Lyons, David Fado. UML 2 Toolkit. Indianapolis, Indiana: Wiley Publishing, Inc. Page 34.

  3. Wikipedia, “Composite Structure Diagram.”

  4. Back in the days when Dr. Chen invented “thing/relationship modeling” and the real-time programming community invented “thing-oriented programming”, they used different thesauri to come up with the language we use today. Dr. Chen called things “entities” and classes of things “entity class types”. The entity/relationship community got sloppy and lazy over the years and started calling the classes “entities.” When the two modeling communities started talking to each other this caused some confusion.

    For this reason, your authors are calling classes of entities “entity classes” and instances of entities “instances of entity classes”. In this paper, an “entity class” is simply the kind of class being addressed here.

  5. Our thanks to Jim Logan of Model Driven Solutions for this example. 




David C. Hay - In the information industry since it was called “data processing,” Dave Hay has been producing data models to support strategic and requirements planning for more than twenty-five years. As President of Essential Strategies, Inc. for nearly twenty of those years, Dave has worked in a variety of industries including, among others, banking, clinical pharmaceutical research, broadcasting, and all aspects of oil production and processing.  Projects entailed various aspects of defining corporate information architecture, identifying requirements, and planning strategies for the implementation of new systems.  

Dave’s recently published book, Enterprise Model Patterns: Describing the World, is an “upper ontology” consisting of a comprehensive model of any enterprise from several levels of abstraction. It is the successor to his groundbreaking 1995 book, Data Model Patterns: Conventions of Thought – the original book describing standard data model configurations for standard business situations. 

In between, he has written Requirements Analysis: From Business Views to Architecture (2003) and Data Model Patterns: A Metadata Map (2006). Since he took the unusual step of using UML in the Enterprise Model Patterns… book, he followed it with UML and Data Modeling: A Reconciliation, published later in 2011. This book shows data modelers how to adapt the UML notation to their purposes, and shows UML modelers how to adapt UML to produce business-oriented architectural models.

Dave has spoken at numerous international and local DAMA, semantics, and other conferences as well as at various user group meetings. He can be reached at (713) 464-8316 or via his company's website.
Michael J. Lynott - Mike has been doing data modeling and database design since his introduction to the world of databases in the early '80s. He was part of Oracle Corporation's introduction into Computer-Aided Systems Engineering (CASE) and has been a leading expert in conceptual data modeling ever since. After that, he was a consultant with eTransitions of New Jersey, working with renowned author and consultant Ulka Rodgers. In recent years, he has been senior enterprise information architect for a large retailer in Boise, Idaho. Mike has written a number of papers for various publications and conferences.