The Data Model Resource Book
Published: January 1, 1997
Author: Len Silverston, W.H.Inmon, Kent Graziano
Does your organization have an enterprise data model? How long would it take to create one? What about an enterprise data warehouse design? The Data Model Resource Book by Len Silverston, W.H. Inmon, and Kent Graziano can help in this area. The Resource Book provides library of logical data models and sample data warehouse and data mart designs. It highlights the importance of subject-area modeling and shows the relationship among high-level models, logical data models, warehouse designs, and data mart designs. It also provides a methodology to transform logical data models into data warehouse designs. But what are data models anyway?
Data models represent information areas of interest. While there are many ways to create data models, two modeling methodologies stand out—top-down and bottom-up. Bottom-up models are often the result of a reengineering effort. They usually start with existing data structures--forms, fields on application screens, or reports. These models are usually physical, application-specific, and incomplete from an enterprise perspective. They may not promote data sharing, especially if they are built without reference to other parts of the organization.
Top-down logical data models, on the other hand, are created in an abstract way by getting information from people who know the subject area. A system may not implement all the entities in a logical model, but the model serves as a reference point or template. Sometimes models are created in a mixture of the two methods: by considering the data needs and structure of an application and by consistently referencing a subject-area model. Unfortunately, in many environments the distinction between a logical data model and a physical data model is blurred. In addition, some CASE tools don’t make a distinction between logical and physical data models.
Data models are the result of a conscious, deliberate effort to understand and represent business data. But what good are they, especially in the context of a data warehouse? Data models and the modeling process are useful in a number of ways:Communication. Data modeling can facilitate dialog among members of the business and IT community. Business people have the opportunity to share experience, come to a common understanding, and see different points of view. Consistent Representation of Corporate Data. Modeling standards allow for data to be represented in the same way across the enterprise. This is especially important for data elements which may be represented in multiple ways. Common examples include phone numbers, social security numbers, dates, and addresses. Data Integrity. A normalized data model implemented in a database improves data integrity by implementing referential integrity and reducing data redundancy. Data Flexibility. In the context of a data warehouse, only application-neutral, subject-oriented data structures can easily satisfy needs for business intelligence across several departments. For example, a customer-profitability model might require data from several subject areas, including customer (customer service calls), finance (revenue and cost information), and human resources (hourly wage information). If the data warehouse is "tuned" for a specific area of reporting or analysis, easily extracting data for other business areas may be difficult. Warehouse data structures that are flat or "tuned" for a specific report often require a dedicated staff of report writers to satisfy business intelligence needs. Notation
The modeling notation in the Resource Book is somewhat "standard." You’ll see rounded rectangles (entities), lines, and "crow’s feet." Entities are named with singular nouns. Primary keys are marked with a "pound" sign ("#"). Supertype/subtype relationships are shown as a box-in-a-box. Primary keys are not migrated to related entities.
The Subject Areas
The main part of the book presents seven logical data models of core subject areas common to most industries:
The chapter on each subject area describes key features of the subject area, entities, attributes, and the subject area’s relationship to other subject areas.
Levels of Models
The Resource Book contains one of the best notations I have seen for showing the relationship among high-level models, mid-level models, and physical designs. Figure 1 shows major subject areas, while Figure 2 shows the relationship among the subject areas, mid-level models, and physical models. High-level models are appropriate for business presentations. Mid-level logical data models are appropriate for business-area analysis and transformation into physical designs for implementing systems. The physical model in Figure 2 is implicit. The book doesn’t present any physical data models for operational systems (as opposed to decision support systems). Rather, it assumes that logical models are transformed into physical designs.
Transforming the Logical Model
While the methodology for creating a logical data model is well-known, the methodology for creating a data warehouse model from a logical (or physical) data model is not as well known. In the Resource Book, logical data models are transformed into data warehouse designs. Key transformations include
Chapter Ten applies the transformation rules just mentioned to derive a sample data warehouse data model. Purely operational data (such as comments) are removed. Invoice and invoice detail tables are merged. A "load_date" column is added to several tables. These transformations result in three subject areas appropriate for business analysis: sales, budgeting and purchasing, and human resources. The data warehouse data model consists of several denormalized and reference tables, such as geography and products. The enterprise data warehouse design is suitable for decision support or for a repository, whose data will eventually be distributed to a departmental data mart.
Sample Data Mart Schemas
If the enterprise data warehouse or data warehouse design is inappropriate for departmental analysis, designers may want to create a departmental data mart. The Resource Book presents two data mart schemas for sales analysis and human resource analysis. These schemas are standard "star" schemas. Star schemas contain a "central table, called a fact table, with relationships to several lookup-tables called dimensions." Star schema structures allow for browsing of dimension tables and generally only require a few joins to get useful information.
The Resource Book is one of the more useful books I have seen on data architecture. It is just what it says: an (excellent) resource library for seven logical data models, one warehouse design, and two data mart designs. These data models and designs can be used as templates or starting points for your own modeling, an introduction to subject areas you might not be familiar with, validating your existing models, and a help to building a corporate data model. The logical models tend to be very complete. You probably won’t need all their features, but they provide a good reference.
The book provides a good notation for showing the relationship among high-level models, mid-level models, and data warehouse and data mart designs. Instance tables (sample data) help bring the models to life. The book also provides a good methodology for transforming logical data models to data warehouse designs. I do have one nit: I think the logical data models would be more readable if primary keys were migrated into the related entities, especially when an entity receives keys from several (four or five) other entities. All in all, the book is an extremely useful resource.
The Resource Book is published by John Wiley & Sons. A CD-ROM of SQL scripts to create the models via a CASE tool is also available. The ISBN number for the CD-ROM is 0-471-15366-4.
Recent articles by Van Scott
Van Scott -
Van Scott is a creative enterprise architect with a very strong methodological bent. He is intrigued by the enterprise, enterprise architectures, and data warehousing. Very project-focused. Known for taking notes, listening, and reading. Published author in music, psychology, and data management. Practicing musician and composer. Principal consultant at Sonata Consulting, Inc. You can reach Van at firstname.lastname@example.org or 940.566.3503. Current project is building a large call detail data warehouse for a firm in the telecommunications industry.