A Repository Model - The Object-Oriented Design Model
Published: January 1, 2001
Published in TDAN.com January 2001
A teacher who can arouse a feeling for one single good action, for one single good poem, accomplishes more than he who fills our memory with rows on rows of natural objects, classified with name and form.
Johann Wolfgang Goethe
Elective affinities, Book II, Chap. 7
Whoever tries for great objects must suffer something.
Plutarch, Lives, Crassus
In the last two issues of TDAN.com, articles in this space ("A Repository Model - The Analysis Model" and "A Repository Model - The Relational Design Model") presented the first components of a data catalogue ("metadata repository" in the current argot), in data model form. In the first article, emphasis was placed on the elements required to support analysis - entities, attributes, relationships, and so forth. The second article described relational design, with its tables, columns, and keys. This issue covers the wonderful world of object-oriented design.
Your author is indebted to Meilir Page-Jones for his excellent book, Fundamentals of Object-Oriented Design in UML,[1] which provided the theoretical basis for this article. He also graciously reviewed this article and answered questions. I would also like to thank Mark Spencer, Ed Landale, Joe Newcum, and Mark Gokman for their contributions to this and my previous two articles. Each provided extremely useful comments to help me refine my points. Errors in this article, however, are mine, not my reviewers'.
As before, this article makes the point that, in most situations, there are relatively few very well defined things that we want to keep track of in a catalogue. To model these things should not be very difficult. These articles present a relatively simple set of models to describe a catalogue that will support a typical application. Yes, these are sketches, and they could certainly be made more elaborate. But they should accurately represent at least those things they set out to represent - concisely and in concrete terms.
This article is intended to describe object-oriented design. This is currently a hot topic, but one which is unfortunately often misunderstood. It is substantially more complex than relational design, which has made this article much more difficult to write than either of the others.
If I have successfully described the subject, object-oriented developers will find what is written here self-evident. If that happens, the meta-model presented here must be reasonably correct and the first objective of the article will have been met. A second objective, however, is to provide assistance to those who are not as familiar with the object-oriented approach. The nice thing about data modeling is that, properly done, it is a powerful tool for exploring and describing areas with which you would be otherwise unfamiliar. If some readers come away with a better understanding of just what object-orientation is, that second objective will have been met.
Readers are encouraged to disagree with the particulars of these models. The nice thing about data modeling is that it gives us a very good language with which finally to clarify what we disagree about.
Classes
One of the claims made about the virtues of UML is that the same symbols can be used for the classes identified during requirements analysis as are used to describe the classes created in a computer system. This is unfortunate, since these are not the same thing, and to use the same symbol and terminology to describe both is misleading. The extent to which they are not the same thing will be demonstrated in detail by this article.
In the relational world the distinction is made between entities that represent things of significance to the business, and tables and columns that are representations of these things in the computer. It is true that many data modelers in the relational world confuse them, but at least in principle entities and tables can be treated separately. Indeed, while the database design should be based on the entity model, it is often appropriate for the designer to depart from that structure for reasons of performance or other physical characteristics of the system.
The UML approach means that the same symbol is representing very different things - classes in the world and classes that are computer artifacts. The confusion between these things can be unfortunate. Even in the object-oriented world, the bits of code that describe classes are not the same things as the classes the code describes.
Because of the importance of distinguishing the real-world "class" of requirements analysis from the computerized "class" of design, the former will be referred to as entity/class. The latter will here be simply called class. (In previous articles I referred to "object classes", since I view the word "class" as referring to a wide range of things outside the world of system development. It has been pointed out to me, however, that within the object-oriented domain, the things we are talking about here are simply "classes". So, "entity/object class" from the previous articles is hereby renamed entity/class, and henceforth this article will be about the entity class.)
Figure 1 shows class (implementation), representing the piece of code that describes a class. Note that this is not the same as the entity/class from previous articles that described a thing in the world.
Unlike in the relational world, sub-types and super-types can be implemented directly. That is, each class may be a generalization of one or more other classes and each class may be inheriting from one and only one other class. (Again, for philosophical reasons, in this article we are ruling out multiple inheritance - although, of course, the model could be changed to accommodate it.) (OK, if you insist, change the model to say that "each class may be inheriting from one or more other classes." You realize of course that this means you'll have to add an intersect entity.)
Meilir Page-Jones describes a class/implementation as being in one of four "domains":
A business class represents something in the business, which may be either:
An application class represents something specific to an application, and may be either:
An architectural class concerns the specifics of an implementation in a particular computer. This might be one of the following:
A foundation class is one that is widely usable. Foundation classes include:
We can probably assert a business rule that a class that is in one of the domains listed can only inherit from other classes in the same domain (business class to business class, foundation class to foundation class, etc.). Note that an attribute of class is the "Program Code" that implements it. This is in addition to the "Name" of the class. To provide for a bit more flexibility, the model redundantly also asserts that each class must be an example of one and only one class domain. These are the same domains represented as sub-types, above. That is, "Business Class", "Application Class", "Architectural Class", and "Foundation Class" are all class domains. Each class domain, however, may be composed of one or more other class domains. That is, the class domain structure allows for specification of the sub-domains listed above, which are not shown on the model as sub-types. Be aware, by the way, that there are other ways to classify classes as well, but we won't go into those here.
Figure 1 : Class
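The assertions around Figure 1 can be sketched in code. What follows is only a minimal illustration in Python, not the author's model: the names ClassDomain, CatalogClass, part_of, and inherits_from are assumptions made for this sketch, and the sub-domain shown is hypothetical. It records a class's "Name", "Program Code", and domain, permits at most one parent (single inheritance), and checks the proposed rule that a class inherits only within its own top-level domain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassDomain:
    """A class domain such as "Business Class"; it may be part of a broader domain."""
    name: str
    part_of: Optional["ClassDomain"] = None

    def top_level(self) -> "ClassDomain":
        # Walk up the domain structure until we reach a domain with no parent.
        domain = self
        while domain.part_of is not None:
            domain = domain.part_of
        return domain


@dataclass
class CatalogClass:
    """One catalogued class (implementation): a piece of code, not a thing in the world."""
    name: str
    program_code: str
    domain: ClassDomain                              # one and only one class domain
    inherits_from: Optional["CatalogClass"] = None   # at most one parent class

    def __post_init__(self) -> None:
        parent = self.inherits_from
        if parent is not None and parent.domain.top_level() != self.domain.top_level():
            raise ValueError(
                f"{self.name} may not inherit from {parent.name}: different domains"
            )


business = ClassDomain("Business Class")
foundation = ClassDomain("Foundation Class")
sub_domain = ClassDomain("Hypothetical Sub-Domain", part_of=business)

party = CatalogClass("Party", "class Party: ...", business)
person = CatalogClass("Person", "class Person(Party): ...", sub_domain, inherits_from=party)
```

Attempting to create a foundation class that inherits from party would raise a ValueError under this sketch. Allowing multiple inheritance would mean replacing the single inherits_from reference with a collection, the code-level counterpart of the intersect entity mentioned above.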
In Figure 2, we show that each class may be described by one or more class elements. A class element is an attribute of one and only one class, describing it, just as an attribute described an entity/class in the first article of this series.
In object-oriented design, however, there are two kinds of attributes. An instance attribute takes on a different value for every occurrence of the class. The attribute "Name" for the class "person" is different for each person. This is an instance attribute. You can also have class attributes. In entity relationship modeling, these are usually handled by creating a parent entity, but in object-oriented design, they can be dealt with more directly and more intimately within the entity being described. A class attribute for "contract", for example, could be "Next Contract Number".
Instance attributes are of two kinds. Discrete instance attributes, such as "State", or "Color", take values from a discrete list. Other instance attributes, such as "specific gravity", take values from a continuous range. A particular kind of discrete instance attribute, state, will be described in detail, below.
A discrete instance attribute may be given one or more legal values. Since we don't have "polymorphism" in this model to deal with varying formats, it is necessary to have the explicit attributes of legal value be "Text Value", "Date Value", and "Numeric Value". A business rule decrees that only one would be used for an instance of an object in this class.
Note that class element also has the attribute "Visibility". Is this class element (is this instance attribute, for example) visible to any part of a system outside the class it is part of? Visibility is of at least three kinds:
There are other kinds of visibility that are implemented by specific object-oriented languages, but these three are the ones most commonly used. Note that a class element may itself be the use of another class. For example, the instance attribute "Name" could itself be a class.
Figure 2 : Class Elements
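The difference between an instance attribute and a class attribute may be easier to see in code. The following is a small sketch in Python, using the "Next Contract Number" example from the text; the names Contract, customer_name, and contract_number are assumptions made for illustration only.

```python
class Contract:
    # Class attribute: one value shared by the class as a whole.
    next_contract_number = 1

    def __init__(self, customer_name):
        # Instance attributes: each occurrence of the class has its own values.
        self.customer_name = customer_name
        self.contract_number = Contract.next_contract_number
        Contract.next_contract_number += 1


first = Contract("First hypothetical customer")
second = Contract("Second hypothetical customer")
```

Each contract carries its own number, while the next number to be issued is kept once, by the class itself, rather than in a separate parent entity.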
As implied by the definition of its domain described above, a business class may be derived directly from the entity/relationship (or object) model created during requirements analysis. Specifically, as shown in Figure 3, a business class may be based on one or more class definitions, each of which is in turn the use of either an entity/class or a relationship. That is, a class definition is the fact that a particular analysis artifact (entity/class, or relationship) is implemented as a class - specifically, a business class. Similarly an attribute definition is the fact that a particular attribute from the entity/relationship model is implemented as an instance attribute. That is, an attribute definition is the use of an attribute as an instance attribute. (As pointed out above, the instance attribute may itself be the use of another class.)
Figure 3 : Class Definitions
Objects
Figure 4 shows that a class may be embodied in one or more actual objects. That is, an object is an instance of one and only one class. It is possible, for example, to define a class Flap (as on an airplane wing) and then to discuss a specific object leftFlap.Flap and rightFlap.Flap where leftFlap and rightFlap are specific objects of the class Flap. Then, when the program is run, there may be many examples of leftFlap.Flap and rightFlap.Flap.
Note that in this model - and in program code - the object described here is still the definition of an object, not the object itself. It is a piece of program code that describes an instance of a class. When the program is run, there will in turn be one or more instances of the object, each with its own identifier (or "handle" as it is known) and with its own values for attributes.
When using a relational database implementation, it is these run-time occurrences of objects that will constitute rows in a relational table. Since there may be several different objects defined for the class, the table will require an additional column ("Object Type" or some such), to identify which object this instance is an instance of. In a table describing "Flaps", for example, for any particular instance of one of the objects, the "Object Type" would be either "leftFlap" or "rightFlap".
Thus we have three levels of instantiation in object-oriented design: the class, the object, and occurrences of the object. This is as opposed to the analysis situation where you have only two: the entity and occurrences of that entity.
In another example (described in more detail below), when run, the statement New:hom1.Hominoid creates instances of the object hom1 of the class Hominoid. (Mr. Page-Jones follows programming conventions and uses a period to separate an object name from its class in written descriptions. UML, on the other hand, uses a colon to separate an object name from its class name.[3] In either case, the expression is underlined to denote its referring to an object, not a class.)
Persistence
Object-oriented programming may not have to be concerned with a physical database at all. It is perfectly common to define objects that only survive for the period that the program defining them is running. In business applications, however, it may be necessary to preserve an object's identity and data beyond the life of the program. That is, it is necessary to maintain persistent objects. Given current technology, this is typically done by storing the objects in relational tables and columns.
Figure 5 shows that each class may be made persistent in one or more persistence mechanisms. Currently, the most common persistence mechanisms are tables and columns. Classes are typically made persistent in tables, and instance attributes are typically made persistent in columns. This doesn't mean, however, that others might not also be used. Historically, they have been such things as ISAM files, network databases, and other kinds of data storage technology.
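As a rough illustration of a class made persistent in a table, with run-time occurrences stored as rows and an "Object Type" column identifying which object each row is an occurrence of, here is a sketch using Python's built-in sqlite3 module. The table layout and the angle_degrees column are assumptions made for this example; they do not come from the model.

```python
import sqlite3

# An in-memory database stands in for whatever relational store is really used.
conn = sqlite3.connect(":memory:")

# The class Flap is made persistent in a table; its instance attributes become columns.
conn.execute(
    "CREATE TABLE flap ("
    " object_id INTEGER PRIMARY KEY,"  # the run-time identifier ('handle')
    " object_type TEXT NOT NULL CHECK (object_type IN ('leftFlap', 'rightFlap')),"
    " angle_degrees REAL NOT NULL)"    # hypothetical instance attribute
)

# Each run-time occurrence of an object becomes one row.
conn.executemany(
    "INSERT INTO flap (object_type, angle_degrees) VALUES (?, ?)",
    [("leftFlap", 10.0), ("rightFlap", 12.5), ("leftFlap", 0.0)],
)

rows = conn.execute(
    "SELECT object_type, COUNT(*) FROM flap"
    " GROUP BY object_type ORDER BY object_type"
).fetchall()
```

The three levels of instantiation are all visible here: the class is the table, the objects are the permitted values of object_type, and the occurrences are the rows.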
State
An instance attribute may be a state, which describes a condition for each object which is an instance of a class. A state is an instance attribute that is controlled by business rules which in turn constrain how an object may move from one value to another. (See Figure 6.) Note that the complete "state" of a class is the sum of the values of all its state attributes.
As stated above, a state, as a discrete instance attribute, may be given one or more legal values. In the case of state, however, a business rule states that it must be given one or more legal values. A transformation is a rule for changing the value of a state from one legal value to another. A business rule asserts that a transformation specifically applies to the conversion of one legal value to another for a state, not for any other kind of discrete instance attribute.
Note, by the way, that this concept of state can also be implemented in a corresponding way in a relational design. Because object-orientation began in the real-time systems world, however, the concept is more central to this approach.
Behavior
Figure 7 adds operation to the model. An operation is a function that is performed by objects in a class. Typically, an operation operates on one or more instance attributes, although it might not. "Visibility" is also an attribute of operation. That is, as with class elements, an operation may be seen throughout the system, within its own class, or only within its class and its sub-types.
Note that what relational programmers would consider an attribute may in fact be implemented as a call to an operation that returns the requested value. In object-oriented land, it doesn't matter whether the value was stored in a table or derived in some other way.
Mr. Page-Jones uses an example in his book, written in his version of a generic object-oriented language. His class Hominoid is a video game character that turns right or left and goes forward. It can detect if it is facing a wall and must turn. It is described as follows:
Hominoid
New: Hominoid
// creates and returns a new instance of Hominoid
(Operations)
turnLeft
// turns the hominoid counterclockwise by 90 degrees
turnRight
// turns the hominoid clockwise by 90 degrees
advance (noOfSquares: Integer, out advanceOK: Boolean)
// moves the hominoid a certain number of squares along
// the direction that it's facing and returns
// whether successful
display
// shows the hominoid as an icon on the screen
(Attributes that are really operations)
location: Square
// returns the current square that the hominoid is on
facingWall: Boolean
// returns whether or not the hominoid is at a wall of the grid[4]
Essentially, the definition of the class is in terms of its operations. These include New, which creates an instance of Hominoid at run time, plus turnLeft, turnRight, advance, and display. It does have two instance attributes (location and facingWall), but as noted above, these are each a call to an operation that will return a value. So, even the instance attributes refer to operations as well. The instance attribute facingWall, by the way, is an example of a state, with legal values "Yes" and "No".
An operation must be implemented by a method, a piece of program code that carries it out. Like other kinds of program code, this is a kind of module, where a module is any piece of code, as we defined it in the previous article. Another kind of module is a package, which is a collection of classes. Actually, a module may be composed of other modules, so a method may be composed of other methods, and a package may be composed of other packages.
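To make the distinction between an operation and the method that implements it more concrete, here is one possible rendering of the Hominoid description in Python. It is a sketch under stated assumptions, not Mr. Page-Jones' code: the square grid, its size, the starting square, and the representation of headings are all invented for the example, the display operation is omitted, and the "out" argument advanceOK is returned as an ordinary result because Python has no output parameters.

```python
class Hominoid:
    # Headings in counterclockwise order as (dx, dy) steps, with y growing upward.
    _HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]

    def __init__(self, grid_size=8, start=(0, 0)):
        # grid_size and start are assumptions of this sketch.
        self._grid_size = grid_size
        self._x, self._y = start
        self._heading = 0  # index into _HEADINGS

    def _inside(self, x, y):
        return 0 <= x < self._grid_size and 0 <= y < self._grid_size

    def turn_left(self):
        """Turn counterclockwise by 90 degrees."""
        self._heading = (self._heading + 1) % 4

    def turn_right(self):
        """Turn clockwise by 90 degrees."""
        self._heading = (self._heading - 1) % 4

    def advance(self, no_of_squares):
        """Move along the current heading; return whether the move succeeded."""
        dx, dy = self._HEADINGS[self._heading]
        new_x = self._x + dx * no_of_squares
        new_y = self._y + dy * no_of_squares
        if not self._inside(new_x, new_y):
            return False
        self._x, self._y = new_x, new_y
        return True

    @property
    def location(self):
        """Looks like an attribute, but is really an operation returning the square."""
        return (self._x, self._y)

    @property
    def facing_wall(self):
        """Looks like an attribute, but is derived each time it is asked for."""
        dx, dy = self._HEADINGS[self._heading]
        return not self._inside(self._x + dx, self._y + dy)


hom1 = Hominoid(grid_size=4)
advance_ok = hom1.advance(3)
```

A caller reads hom1.location and hom1.facing_wall exactly as it would read stored attributes, although neither value is stored in that form.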
An object behaves by having its operations send messages to other objects, thus triggering those objects' operations. Specifically, as shown in Figure 8, each message is from one object to another object. The message is actually sent by an operation that is performed by (the "from") object in a class. Specifically, this is the class that is embodied in the object that is the source of the message. Each message then acts as a trigger to invoke one of the operations that is performed by the class that the receiving object is an example of.
If the messages are asynchronous, there may be a message queue in front of the receiving object to store messages until they can be processed. That is, messages which are concurrent or asynchronous must be stored until the receiving object can process them. Hence, each message must be either to another object, or to a message queue.
Again, in this model, as in the program code involved, we are dealing with the definition of a message (describing it, as well as its normal source and destination). Actual messages are created when the program is run.
A message must be an example of a message type. Message types include "informative", which provides an object with information to update itself, "interrogative", which requests an object to reveal something of itself, and "imperative", which requests an object to take some action upon itself.
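The message, message type, and message queue described above might be sketched as follows in Python. The Lamp receiver and its operations are hypothetical, chosen only to keep the example short. Messages are held in a queue in front of the receiving object and each one, when processed, triggers one of that object's operations.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class MessageType(Enum):
    INFORMATIVE = "informative"      # gives the object information to update itself
    INTERROGATIVE = "interrogative"  # asks the object to reveal something of itself
    IMPERATIVE = "imperative"        # asks the object to take some action upon itself


@dataclass
class Message:
    message_type: MessageType
    operation: str          # name of the operation the message triggers
    argument: object = None


class Lamp:
    """Hypothetical receiving object with a message queue in front of it."""

    def __init__(self):
        self.is_on = False
        self.queue = deque()

    # Operations that messages may trigger.
    def switch_on(self, _argument=None):
        self.is_on = True

    def report_is_on(self, _argument=None):
        return self.is_on

    def receive(self, message):
        # Asynchronous messages are stored until the object can process them.
        self.queue.append(message)

    def process_next(self):
        # Take the oldest message and invoke the operation it names.
        message = self.queue.popleft()
        operation = getattr(self, message.operation)
        return operation(message.argument)


lamp = Lamp()
lamp.receive(Message(MessageType.IMPERATIVE, "switch_on"))
lamp.receive(Message(MessageType.INTERROGATIVE, "report_is_on"))
```

Nothing happens to the lamp until process_next is called. The two queued messages are then handled in the order in which they arrived.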
Each message may include one or more input or output message arguments, as shown in Figure 9. Each message argument must be for a particular message and it must be a reference to another object. Message arguments may be either input arguments or output arguments, as determined by the value of each message argument's "In indicator" and "Out indicator". Both indicators are present, since the same message argument could be both an input and an output argument. In a program, these arguments are shown with input arguments first (optionally preceded by the word "in"), followed by the word "out" and the output arguments, optionally followed by "inout" and any arguments that are both input and output arguments. Each argument is itself typically a reference to an object, but this can be an object in a "Foundation Class" - such as a kind of "integer", "character", or some such. In the example above, "noOfSquares" could be an object in the class "Integer". In Mr. Page-Jones' example, if hom1 is defined as an object of class Hominoid then a message advance would be specified as hom1.advance(noOfSquares, out advanceOK), where noOfSquares is an input parameter (the number of squares to advance) and advanceOK is an output parameter (whether or not the advance was successful).[5] Again, a run-time occurrence of hom1 would have an object id and would in fact advance a particular number of squares (like "5").
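The "In indicator" and "Out indicator" can be illustrated with a small Python sketch that writes a message out in the notation quoted above. The names MessageArgument and format_message are assumptions made for the example, as is the choice to separate several arguments of the same kind with commas.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MessageArgument:
    name: str
    class_name: str        # the class of the object the argument refers to
    in_indicator: bool
    out_indicator: bool


def format_message(target: str, operation: str, arguments: List[MessageArgument]) -> str:
    """Write a message with input arguments first, then 'out', then 'inout'."""
    inputs = [a.name for a in arguments if a.in_indicator and not a.out_indicator]
    outputs = [a.name for a in arguments if a.out_indicator and not a.in_indicator]
    both = [a.name for a in arguments if a.in_indicator and a.out_indicator]

    groups = []
    if inputs:
        groups.append(", ".join(inputs))
    if outputs:
        groups.append("out " + ", ".join(outputs))
    if both:
        groups.append("inout " + ", ".join(both))
    return f"{target}.{operation}({', '.join(groups)})"


advance_arguments = [
    MessageArgument("noOfSquares", "Integer", in_indicator=True, out_indicator=False),
    MessageArgument("advanceOK", "Boolean", in_indicator=False, out_indicator=True),
]
text = format_message("hom1", "advance", advance_arguments)
```

For the two arguments shown, the result is the same expression used in the text: hom1.advance(noOfSquares, out advanceOK).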
Figure 10 shows that a message to an object may be acting as one or more state triggers, each of which must be (the trigger) of one transformation from one legal value to another legal value. The two legal values must be of a state that is part of the class that is embodied in the destination object. In this model, the business rule governing the transformation is simply presented as a text attribute of state trigger. Perhaps a more sophisticated model could represent the structure of such a rule more explicitly. This is left as an assignment for the reader.
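A state, its legal values, and the transformations between them can be sketched in Python as follows. This is an illustration only: the State class, its apply operation, and the contract status values "Draft", "Signed", and "Closed" are hypothetical, and the business rule behind each transformation is reduced to a simple list of permitted pairs.

```python
class State:
    """A discrete instance attribute whose changes are limited to listed transformations."""

    def __init__(self, name, legal_values, transformations, initial_value):
        self.name = name
        self.legal_values = set(legal_values)
        self.transformations = set(transformations)  # (from_value, to_value) pairs

        # A transformation must run from one legal value to another legal value.
        for from_value, to_value in self.transformations:
            if from_value not in self.legal_values or to_value not in self.legal_values:
                raise ValueError(f"transformation uses an illegal value: {from_value} -> {to_value}")
        if initial_value not in self.legal_values:
            raise ValueError(f"illegal initial value: {initial_value}")
        self.value = initial_value

    def apply(self, to_value):
        """Called when a message acts as a state trigger for this state."""
        if (self.value, to_value) not in self.transformations:
            raise ValueError(f"no transformation from {self.value} to {to_value}")
        self.value = to_value


status = State(
    name="Contract Status",
    legal_values=["Draft", "Signed", "Closed"],
    transformations=[("Draft", "Signed"), ("Signed", "Closed")],
    initial_value="Draft",
)
status.apply("Signed")
```

An attempt to move the status directly from "Draft" to "Closed" would be rejected, because no transformation connects those two legal values.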
A Personal Comment
I would be dishonest if I did not confess that this article was by far the most difficult of the three repository articles. Indeed, it is one of the most difficult I have ever written. I have been pleased throughout my career to be able to take my data modeling technique to any industry and within a few weeks understand that industry better than many people who work there. This is the first time I have taken it to my own industry. The experience has been very illuminating.
For the last several years there has been friction between the object-oriented aficionados and those more schooled in relational technology. I confess to having contributed my part to that friction. The problem has been that the language and the perspectives of the two groups are very different. The fascinating thing about putting together this article has been that finally I have been able to dissect the object-oriented terms in a way that (it is to be hoped) can make them clearer to all, and perhaps to clarify the sources of some of the disputes.
You will find personal observations in the article to be sure, but I have tried hard to be as objective and honest as possible in presenting each concept. Please feel free to take me to task if you believe I have failed in that anywhere in the discussion. And of course, correct me where I am simply mistaken.
As I have stated in the previous articles, should you, dear reader, take exception to any of the models presented above - good! It is about time we had a discussion on the specific content we expect in a repository, instead of being surrounded by fluff pieces talking about what a good idea it is. The purpose of a data model is to be wrong. This one represents your author's best guess as to the truth, and it is there for people to correct. Tell me exactly which assertions (entities and/or relationships) you disagree with.
Please either write to me at davehay@essentialstrategies.com or post your disagreements to the Data Management Mailing list. You may subscribe to this list by sending an e-mail to dm-discuss-subscribe@egroups.com, or go to its homepage at http://www.egroups.com/list/dm-discuss. Alternatively, if you think these models are completely wrong, please submit your own article to TDAN.com describing your counter argument. Send it to rseiner@tdan.com. I am sure Bob Seiner would be glad to hear from you.
In your disagreements, I ask only two things:
1. The model is a set of assertions in the form: "Each must be (where the line next to the first entity is solid) or may be (where the line next to the first entity is dashed) one or more (where there is a "crow's foot" next to the second entity) or one and only one (where there is no "crow's foot" next to the second entity)." (For example, "Each column must be part of one and only one table; each table may be composed of one or more columns.") Please express counter assertions in the same form. Yes, it is true that this is an unconventional approach to defining relationship names, and it is hard. But it is hard because to come up with a reasonable name (one that sounds perfectly obvious to the reader), you must really understand the nature of the relationship. If UML is used, each can be shown as a role name.
2. If you draw an alternative model, organize it so that the crow's feet (or the asterisks, if you use UML) are to the left or the top of the model. This tends to put reference entities in the lower right part of the diagram, and intersect or transaction entities in the upper left. It provides a consistent organization for the drawing, and makes it easier for all to see where the differences are.
I look forward to hearing your comments and observations.
[1] - Meilir Page-Jones, Fundamentals of Object-Oriented Design in UML, Addison-Wesley, (Reading, MA: 2000).
[2] - Ibid., pages 233-240.
[3] - Grady Booch, James Rumbaugh, and Ivar Jacobson, The Unified Modeling Language User Guide, Addison-Wesley, (Reading, MA: 1999), page 185.
[4] - Page-Jones, op. cit., page 6.
[5] - Ibid., page 22.
David C. Hay - In the information industry since the days of punched cards, paper tape and teletype machines, Dave has been producing data models to support strategic and requirements planning for more than twenty years. He has worked in a variety of industries, including, among others, banking, clinical pharmaceutical research, and all aspects of oil production and processing.
He is the founder and President of Essential Strategies, Inc., a fourteen-year-old consulting firm dedicated to helping clients define corporate information architecture, identify requirements, and plan strategies for the implementation of new systems. Dave is the author of the books Data Model Patterns: Conventions of Thought and Requirements Analysis: From Business Views to Architecture. His new book, Data Model Patterns: A Metadata Map, is a comprehensive schema of metadata from many different perspectives. He has also spoken at numerous international and local DAMA conferences, Oracle user group conferences, and many others.
He can be reached at dch@essentialstrategies.com, (713) 464-8316, or via his company's website at http://www.essentialstrategies.com.