Graphical Patterns for Data Models
Published: October 1, 2002
Published in TDAN.com October 2002

1. INTRODUCTION

One of the reasons that data modelers are tolerated is that they produce a nice wall-size diagram. It contains a lot of information and saves the time of going through pages and pages of documentation. Data modelers are very often judged by this most visible deliverable. In spite of both the real and the perceived importance of the diagrams, we usually fail miserably here. Most diagrams are just a mess of lines and boxes laid out without rhyme or reason. Instead of helping, they hinder understanding and turn people away. The material that follows is devoted to improving graphical representation through precise and meaningful layout and a well-defined color scheme.

2. LAYOUT

2.1. LAYOUT: GENERAL PRINCIPLES

The following are some general diagramming rules:

"Balance: An equal distribution of weight... Because our own balance is so important to us, we feel uncomfortable when we see something that isn't balanced. (We avoid leaning trees.) The same is true of a layout. If a layout isn't balanced, readers feel uneasy - they feel something is wrong with the page" [Siebert92]

2.2. LAYOUT: BASIC ELEMENTS

The entity boxes and relationship lines are the basic graphical elements. One challenge is that the size of an entity does not always correspond to its importance. The box size depends on the length of the entity name and the number of associated relationships. For example, a minor, insignificant associative entity might take up a lot of space on the diagram to accommodate its long name and all the lines hooked to it. So the entity's position on the diagram, rather than its size, should carry the meaning. We are used to reading books from left to right and from top to bottom. The diagram should read the same way. To get that effect, we position the dependent entity to the right of and below its parent entities. I also tend to show
2.3. LAYOUT: RELATIONSHIPS

Circular references and multiple relationships between two entities should not be allowed [Reingruber94]. As a result, a model with only transactional data has a relatively small number of relationships. Advanced applications, which store business rules, tend to have a higher number of relationships. Data warehouses have an even higher number, with each fact table connected to each dimension. A sampling of models showed 2-3 relationship lines connected to each entity for operational databases, and 3-5 for data warehouses. At that density it becomes difficult to trace the relationships. Techniques to improve the situation include:
- Follow the 'no-line crossing' rule whenever possible,
- Draw straight parallel lines whenever possible,
- Combine relationship lines into beams; a whole beam becomes a visual element instead of individual lines.

2.4. GRAPHICAL PATTERNS: STRAIGHT LINES

Let's take a look at an example of layered positioning with straight lines (Fig. 1). The grandparent is above the parent, which in turn is above the child entity, and so on. The most abstract concepts go at the top, and the most detailed are at the bottom. It was music to my ears when, after only five minutes with the model, my database administrator said: "all the data below the INSPECTION entity will be uploaded from a handheld device". He intuitively understood the convention and started to use it right away.
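The layered convention can be stated as a simple rule: place each entity one row below its lowest-placed parent, so that every relationship line points downward. The following is only a minimal sketch of that rule in Python, not a description of any modeling tool. It assumes the relationships are supplied as (parent, child) pairs, and the function name `assign_layers` is hypothetical.

```python
from collections import defaultdict


def assign_layers(relationships):
    """Return {entity: row}, where row 0 is the top of the diagram.

    Each entity is placed one row below its lowest-placed parent.
    Raises ValueError if the relationships contain a circular reference.
    """
    parents = defaultdict(set)
    entities = set()
    for parent, child in relationships:
        parents[child].add(parent)
        entities.update((parent, child))

    layers = {}

    def row_of(entity, visiting=()):
        if entity in layers:
            return layers[entity]
        if entity in visiting:
            raise ValueError(f"circular reference through {entity}")
        rows_above = [row_of(p, visiting + (entity,)) for p in parents[entity]]
        # Entities without parents sit on the top row.
        layers[entity] = 1 + max(rows_above) if rows_above else 0
        return layers[entity]

    for entity in entities:
        row_of(entity)
    return layers
```

For illustration, with the hypothetical pairs FACILITY to INSPECTION, INSPECTION to FINDING, and FACILITY to FINDING, the sketch puts FACILITY on row 0, INSPECTION on row 1, and FINDING on row 2, even though FINDING is also a direct child of FACILITY.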
2.5. GRAPHICAL PATTERNS: RELATIONSHIP BEAMS

When model complexity increases, it is no longer possible to use just straight lines and there is a need to introduce something different. Fig. 2 shows examples of "relationship beams":
- A main business entity with a number of code (static) entities
- Multiple subtypes

An agent performs a number of activities on behalf of a client, such as making calls, sending letters and e-mails. There is a relationship beam from the agent to three activities, and a similar one from the client.
2.6. GRAPHICAL PATTERNS: DATA WAREHOUSE

In Fig. 3 all dimensions are at the top, and the facts are at the bottom. Relationship lines from each dimension are combined into beams, with lines dropping into individual facts. Introducing beams reduces the clutter, because the number of beams is much smaller than the number of individual lines. It is also much easier to trace beams than to trace each line separately, especially if a diagram spans multiple pages.
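The size of the reduction is easy to estimate. The sketch below assumes the fully connected case mentioned earlier, where each fact table is related to each dimension, and assumes one beam per dimension. The function name and the sample numbers are illustrative and are not taken from Fig. 3.

```python
def line_and_beam_counts(num_dimensions, num_facts):
    """Compare the number of visual elements with and without beams.

    Assumes every fact is related to every dimension and that all lines
    leaving one dimension are bundled into a single beam.
    """
    individual_lines = num_dimensions * num_facts
    beams = num_dimensions
    return individual_lines, beams


lines, beams = line_and_beam_counts(num_dimensions=5, num_facts=4)
print(lines, beams)  # 20 individual lines versus 5 beams
```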
3. COLOR SCHEMES

Color augments a flat diagram with a third dimension. There are a couple of common-sense rules:
3.1. COLOR SCHEMES: DOMAIN-INDEPENDENT

The following are some domain-independent color schemes. Their advantage is that they can be re-used across models, establishing consistency.

3.1.1. DEVELOPMENT CYCLE SCHEME

This scheme shows where each entity is in the development cycle. In this example (Fig. 4), it separates release 1 entities from release 2 entities.
3.1.2. ABSTRACTION-LAYER SCHEME

This scheme (Fig. 5) separates entities into layers, reinforcing the layered positioning patterns discussed earlier. The three layers are:
This is the scheme I use most often.
3.1.3. FOUR ARCHETYPE SCHEME

The Four Color Archetypes by Peter Coad [Coad99] include:
The book [Coad99] is devoted to the models designed and presented in this color scheme.

3.2. COLOR SCHEMES: DOMAIN-SPECIFIC

3.2.1. SUBJECT AREA SCHEME

I remember how much fun we had picking out colors for subject areas. Greenish colors were slated for all money-related subject areas, grayish colors for workflow, etc. But with 20 to 30 subject areas in the enterprise model, there are simply too many colors for the reader to follow. I don't believe the subject area scheme to be useful.

3.2.2. IMPLEMENTATION SCHEME

In Fig. 6 the diagram is color-coded by the target database:
- Entities owned by the server-based database are in one color
- Entities owned by the mobile handheld database are in a different color

Please note that an entity can be used by multiple applications, but is considered to be owned by only one.
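A scheme like this amounts to a lookup from an entity's single owner to a fill color. The sketch below is only an illustration of that idea. The entity names, owner labels, and hex colors are all hypothetical, and none of them are taken from Fig. 6.

```python
# Hypothetical fill colors for the two target databases.
OWNER_COLORS = {"server": "#cfe2f3", "handheld": "#fff2cc"}

# Each entity has exactly one owner, even if several applications use it.
ENTITY_OWNER = {
    "FACILITY": "server",
    "INSPECTION": "server",
    "FINDING": "handheld",
}


def entity_color(entity):
    """Return the fill color for an entity based on its owning database."""
    return OWNER_COLORS[ENTITY_OWNER[entity]]


print(entity_color("FINDING"))  # "#fff2cc", the hypothetical handheld color
```

Keeping ownership in one table and colors in another means the same ownership data can be re-colored without touching the entity list.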
4. BENEFITS

What are the benefits of such diagram improvements?
References

[Coad99] - Peter Coad et al. Java Modeling in Color with UML. Prentice Hall: NJ (1999).
[Siebert92] - Lori Siebert and Lisa Ballard. Making a Good Layout. North Light Books: Cincinnati (1992).
[Reingruber94] - Michael Reingruber and William W. Gregory. The Data Modeling Handbook: A Best-Practice Approach to Building Quality Data Models. A Wiley-QED Publication: New York (1994).
Alex Friedgan, Ph.D. - Alex Friedgan, Ph.D., is a Principal with Data Cartography. Alex has worked in multiple roles, including: research engineer, developer, analyst, college professor, database administrator, and data
architect. He succeeded in solving problems of reverse engineering, agile development, data warehousing, distributed data architecture, object modeling, enterprise data management, metadata
repository and metadata stand-alone solutions. He can be reached at alex.friedgan@gmail.com.