Modeling Data and Designing Databases

Published in TDAN.com January 2007


1. Objective

The objective of this paper is to present an approach to the comprehensive, efficient and effective process of creating data models through mission, database domain, and localized
entity-relationship modeling. The paper then proceeds with an overall process and check list for designing databases.

At first, it might seem strange to separate the process of modeling data from the process of designing databases. But the two really are different, and if the first is not done before the
second, you will not only have a bad database design, you will also have a high probability of databases that are not integrated, that are redundant, and that possess conflicting semantics.
These are the inverse of the desired characteristics.


Case Study

A school system needs to build a database application (an application where the database is the core component). The problem domain is to track individuals who are being trained to be Certified
Medical Technicians (CMT). The school system has about 140,000 students.

There are about 210 health rooms across the system, and whenever a “nurse” is absent from a school, a “certified medical technician” has to handle the dispensing of medicine, medical
record keeping, and the like. “Nurse” is in quotes because a nurse can be a CNA (certified nursing assistant), an LPN (Licensed Practical Nurse), or an RN (Registered Nurse). A CMT can only be
supervised by an RN.

Schools are responsible for nominating CMT candidates. They must pass a reading, writing, and arithmetic test. If the test is failed the CMT candidate cannot come back to the class for 90 days. The
CMT nominee must “sit” for a 40 hour class. There are three tests during the class, and a practical demonstration-based exam at the end. If any test is failed the CMT candidate is “expelled”
and they have to retake the entire course starting with the entrance test.

There is also a follow-up review and direct observation of every CMT every 45 school days. After the first 45-day period, the CMT candidate can then apply to the State to become a Certified
Medical Technician. If, thereafter, the CMT fails the follow-up review and/or any direct observation, the supervising RN must report that failure to the State. The State may then de-certify the
CMT. If de-certified, the process begins again from the entrance test. Giving the wrong drugs or dosage to a student, or failing to properly document medical treatments, is a serious matter, and our
State takes it seriously.

The task is to build the database application against these requirements. A Nurse Manager of the school health organization chose MS/Access to build the application. Since the Nurse Manager was
told that MS/Access is really simple, and since she is a subject matter expert, she just fired up Access on her computer and started to type in the database’s design. One table, of course. As she
discovered something she forgot, she just modified the database’s one table design, one column at a time.

After about nine months, the one-table database in MS/Access was complete. The Nurse Manager was very proud of her accomplishment. She then started to type in data for about 300 records of CMTs in
various stages and states. Of course “she” herself is required to be present to understand and explain each and every row of data. There are status codes everywhere, some of the status codes are
related one to the other. There is no history. All the obvious questions were then asked. That is, those related to the underlying process, test re-takes, test failures, observations,
re-certifications, CMT moves from one school to the next, changes in RN supervisors, refreshes of names, addresses, and schools from the school system’s databases, and the like. When the Nurse Manager
was unable to deal with these questions she started to feel “inadequate.”

A question was then asked, “Where’s your model of the data?” She stated, “Oh, right here.” She showed the MS/Access database table. It had never occurred to her that a model of the data is
different from a database’s design. The next two hours were then spent going over how to figure out the model of the data. At the end of that time, about 15-20 entities were identified and
related. “Yikes,” she exclaimed, “How am I going to be able to do that in Access?” She went away sad because she then knew that her requirements had grown into something she clearly did not know
how to build. Stating that a database’s design is not a data model would have been like pouring salt on an open wound. What should she have done? That’s the subject of this short paper. The paper is
written as if she had all the necessary tools to accomplish all the steps. Will she have these tools? Will the school system expend the resources to make “simple, little applications” like
these practically possible? Good, sophisticated tool sets like Clarion for Windows make both client/server and Internet-based applications like these very practical and cost effective to
accomplish.


2. Topics Covered

The topics in this paper include:

  • How to discover the data model
  • Where to put your data model
  • How to build a database design


3. How to Discover the Data Model

The overall process of modeling data is provided in Figure 1. Instantly you may be asking, “Why are functions and organizations involved in modeling data?” Well, if you don’t know, in a general
way, what you’re going to do with the data, nor who’s going to be using it, then how can you know if you’ve got the right model? Some suggest that you are just supposed to cover a whole wall of a
50 x 30 room with butcher block paper, put a rectangle in the upper left corner, write “Customer,” and then wait for divine intercession. Some have grown old and retired waiting for the
revelations.

A careful review of the process model in Figure 1 shows that neither “schema” nor “database” (other than database domains) is specified. That’s because it’s a process to model data.
Thereafter, the data model can be employed to help create database schemas that are, of course, an essential requirement for database-centric applications. Provided here are summary descriptions of
these data modeling processes, and reasons why they are performed.

Figure 1.


3.1 Missions

Missions are the idealized descriptions of what the enterprise is all about, not just what is done on a day to day basis with respect to a specific narrow function. Identification and specification
of the relevant missions is critical. In the case study, the focus was on the database’s design to accomplish known processes. There was no overall understanding of how these collections of
processes fit into the overall mission of the school system.

Because she was focused only on the database’s design, she missed the opportunity to identify all the different objects (in the broadest sense of the word) that should have been involved. Her
focus was on the CMT only. Mission isn’t the sole answer, however. Rather, missions are the context within which the data model answer is derived.


3.2 Organizations

Organizations accomplish aspects of missions with databases, information systems, and functions. In the case study, the enumeration of the various organizations serves to identify the groups that
have to be involved in getting, maintaining, and using the information. For example, CMTs are transferred from one school to the next. Knowing that, and knowing that the last thing on the mind of a
transferred CMT is probably to notify someone in the school system’s Health Services of the transfer, knowing a CMT’s current school is a real problem. The school system’s HR
organization should have the most current assignment for a person. Consequently, a feature of the resulting CMT system would be to access the HR system’s database on a weekly basis and
validate, CMT by CMT, whether the recorded CMT school assignment is the correct one. If it is, fine. Otherwise, that school’s Assigned CMT records would have to be flagged as no longer having an
available CMT.
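The weekly HR-validation pass just described can be sketched as follows. The record shapes (simple employee-to-school mappings) and names are illustrative assumptions, not the school system’s actual data structures.

```python
def validate_cmt_locations(cmt_assignments, hr_assignments):
    """Compare each CMT's recorded school against HR's current record.

    cmt_assignments: {employee_id: school_code} as held by Health Services
    hr_assignments:  {employee_id: school_code} as held by HR (authoritative)
    Returns the set of school codes that must be flagged as no longer
    having an available CMT.
    """
    flagged_schools = set()
    for employee_id, recorded_school in cmt_assignments.items():
        current_school = hr_assignments.get(employee_id)
        if current_school != recorded_school:
            flagged_schools.add(recorded_school)
    return flagged_schools

# Example: CMT 1002 transferred from school "S21" to "S35" without
# telling Health Services, so S21 must be flagged.
cmt = {1001: "S14", 1002: "S21"}
hr = {1001: "S14", 1002: "S35"}
print(validate_cmt_locations(cmt, hr))  # {'S21'}
```

The point of the sketch is that HR remains the authoritative source; Health Services only reconciles against it.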

Again, organization, like mission, is not the answer to knowing the full set of requirements. Rather, it too is a contextual element, and it provides the identification of the groups that have to be
involved in the full set of requirements for a complete solution.


3.3 Functions

Functions are the procedures that are performed by groups as they accomplish the various missions of the enterprise from within different enterprise organizations. This is where the actual
activities are specified that need to be performed. Setting out and defining these functions causes the identification of the areas of data, data elements, states, exceptions, and the like involved in
the CMT effort. Functions are, of course, all identified within the scope of the mission-organizations that result from the previous step. These functions directly lead to the set of processes,
automated and manual, that are needed in some way for a complete CMT solution.


3.4 Prior Data and Reports

Prior data and reports generally refer to the set of all data that must be involved in the application area, and by inference in the data model’s design. In the case study, there were State
requirements for CMT applications, transactions, and reporting. There were also data requirements to support the individual CMT person such as biographic data, assignments, tests, scores, CMT
progress states, and the like. All these data need to be identified and analyzed within the context of the missions, organizations, and functions as they directly lead to subjects, entities,
attributes, and of course, relationships.


3.5 Scope Model

The above four (missions, organizations, functions, and prior data) are all brought together to convey one overall understanding of just what the mission is, which organizations are involved, what
activities are performed, and just what data is involved in accomplishing the effort.

Once the scope model is complete, then from a non-IT point of view, the specification is not only complete, it should also be able to be reviewed for missing items and functionality. If any are
found then they should be corrected before proceeding. Correcting these errors here is very inexpensive as no databases are yet designed nor has any software been built. Once the scope model is as
complete as it can be, then, the next step, database domains, can proceed quickly.


3.6 Database Domains

Database domains are text- or list-based, noun-intensive statements. Database domain paragraphs are the precursor step to entity-relationship models. Well done, each sentence in a database domain
text becomes a multi-entity and relationship statement. Here, there’s no intended granularity to an entity. Some may end up being complex (i.e., database objects), others simpler, and some even
just data elements.

Once database domains are complete, the various entity-relationship diagrams are created, subject by subject. If there’s enough information, then a full set of attributes is created for each
entity. It’s best that the entities are in third normal form so that there is a crispness to the specificity of the set of entities within each subject. Relationships can be made between entities
in different subjects to provide a more uniform model of the data.
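To make the third-normal-form point concrete, here is a small sketch contrasting the case study’s one-table design with a 3NF split of the same facts. The column and entity names are assumptions for illustration; the rejoin shows the decomposition loses nothing.

```python
# One-table design: facts about the CMT, the class, and the test all
# repeat on every row, so renaming a class must touch many rows.
flat_row = {
    "cmt_name": "J. Smith", "school": "S14",
    "class_title": "CMT Spring Cohort", "class_hours": 40,
    "test_number": 2, "test_score": 88,
}

# 3NF: each non-key attribute depends on the key of its own entity.
employee = {"employee_id": 1001, "cmt_name": "J. Smith", "school": "S14"}
cmt_class = {"class_id": 7, "class_title": "CMT Spring Cohort", "class_hours": 40}
test_score = {"employee_id": 1001, "class_id": 7, "test_number": 2, "test_score": 88}

def rejoin(employee, cmt_class, test_score):
    """Recombine the normalized rows; the decomposition is lossless."""
    assert test_score["employee_id"] == employee["employee_id"]
    assert test_score["class_id"] == cmt_class["class_id"]
    return {
        "cmt_name": employee["cmt_name"], "school": employee["school"],
        "class_title": cmt_class["class_title"],
        "class_hours": cmt_class["class_hours"],
        "test_number": test_score["test_number"],
        "test_score": test_score["test_score"],
    }

print(rejoin(employee, cmt_class, test_score) == flat_row)  # True
```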

From the point of view of the case study, here are examples of database domain statements:

  • School system has health rooms.
  • Health rooms have assigned supervising RNs.
  • Schools have certified medical technicians.
  • School system delivers CMT classes.
  • CMT classes have certified teachers.
  • School system employees are enrolled in CMT classes.
  • School system CMT enrollees have passed (or failed) CMT tests.
  • Candidate CMTs apply for State certifications.
  • State-certified CMTs must re-certify every two years from the date of original certification.

From these examples, entities, attributes, and relationships are very obvious. Entity-relationship models are easily drawn and validated by reviewing the subject-verb-object relationship and
cardinality with subject matter experts. If any statement is missing, now is the time to add it.
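The mapping from domain statements to entities and relationships can be sketched as below. The statements are pre-split into subject-verb-object triples by hand; this is an illustration of the mapping, not a natural-language parser.

```python
# Each database-domain sentence, hand-decomposed into a
# subject-verb-object triple.
STATEMENTS = [
    ("School system", "has", "health rooms"),
    ("Health rooms", "have assigned", "supervising RNs"),
    ("Schools", "have", "certified medical technicians"),
    ("School system", "delivers", "CMT classes"),
]

def to_er(statements):
    """Collect candidate entities (the nouns) and relationships (the triples)."""
    entities, relationships = set(), []
    for subject, verb, obj in statements:
        entities.add(subject)
        entities.add(obj)
        relationships.append((subject, verb, obj))
    return entities, relationships

entities, relationships = to_er(STATEMENTS)
print(sorted(entities))
```

Each triple then becomes a line on the entity-relationship diagram, with cardinality confirmed by the subject matter experts.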


3.7 Database Objects

Database objects are collections of entities (actually collections of tables within a database’s schema). Full database object specification includes its data structure (i.e., a table collection);
table-based add, delete, and update processes; states; and state transform specifications. But for the purposes of this effort, the database object models are employed to provide a coherent
understanding of the processes that are required to operate against multiple entities so as to ensure an overall integrity.

In the case study, the school system is clearly a database object as it has multiple contained entities. The CMT class is probably a simple database object, and the Certification Date is certainly a
data element. Because the entity-relationship diagrams created from the database domains already exist, identifying the database objects, simple entities, and data elements is fast, and most often
correct on first recognition.


3.8 Data Elements

During the process of creating attributes within entities, each attribute should be mapped to enterprise-wide data elements as a way to maximize semantic uniformity across differently named
attributes that represent the same data element, and also to ensure a uniformity of value domains. Once data elements are created and mapped to attributes, “where-used” reports can document
maximum semantic integrity.

In the case study, Certification Date would be a data element. Whether it is the First Certification Date, or a Renewal Certification Date is immaterial. They are columns or attributes that map to
the data element, Certification Date.
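The attribute-to-data-element mapping, and the “where-used” report derived from it, can be sketched as follows. The entity and attribute names are illustrative assumptions.

```python
# Map of (entity, attribute) -> enterprise data element. Differently
# named attributes can share one data element, as with the two
# certification dates from the case study.
ELEMENT_MAP = {
    ("CMT Application", "first_certification_date"): "Certification Date",
    ("CMT Renewal", "renewal_certification_date"): "Certification Date",
    ("CMT Test Score", "test_score"): "Test Score",
}

def where_used(element_map):
    """Invert the map: data element -> list of (entity, attribute)."""
    report = {}
    for (entity, attribute), element in element_map.items():
        report.setdefault(element, []).append((entity, attribute))
    return report

report = where_used(ELEMENT_MAP)
print(report["Certification Date"])
```

The inverted report is what documents semantic integrity: every place a data element’s values appear is listed under one name.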


3.9 Data Models

If the steps above are accomplished, then the data model for the CMTs will largely be complete. If the effort is part of an overall larger modeling effort, then many of the subjects, entities,
attributes, and relationships will already have been created. Having these available from a metadata repository will greatly speed these steps. The remaining step is to actually create a database
schema of just the relevant data specifications and then proceed through prototyping to then validate the data model.

In the case study, a set of entities was discovered very quickly, including, for example:

  • CMT Instructor
  • School System Employee
  • CMT Class
  • CMT Test
  • CMT Test Score
  • In-school Observation
  • Health Room
  • CMT Assignment
  • State Certification Record
  • Supervising Registered Nurse


3.10 Validation through Prototyping

This step, validation through prototyping, is critical because up to this point the solution has been almost entirely paper-based. Even if data models are in a metadata repository, they are still
untested designs. Data model design testing requires a process model through which the data model can be exercised to know that it represents the necessary data in an easy-to-use way.

The first step is to make a database schema. This is accomplished by importing collections, or subsets of collections of entities into a database’s schema. Once represented as a database schema, a
code generator that ingests the schema can materialize an actual database-centric application.
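The schema-then-generate step can be sketched with SQLite. The table and column names are assumed from the case-study entity list, and the “generator” here merely reads the schema back out, standing in for a real code generator that ingests it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A two-table subset of the case-study entities, imported as a schema.
conn.executescript("""
CREATE TABLE cmt_class (
    class_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE cmt_test_score (
    class_id INTEGER REFERENCES cmt_class(class_id),
    employee_id INTEGER,
    test_number INTEGER,
    score INTEGER
);
""")

# Read the schema back, as a code generator would before emitting
# screens and reports for each table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['cmt_class', 'cmt_test_score']
```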

Once the application is generated, it can be electronically “shaped” to match a desired behavior model, and then demonstrated to users. User feedback is essential because that’s when you hear,
“Where’s the XYZ data?” Reviewing something on paper is just not sufficient. Once a review is done, and the comments are all logged, just throw the generated application away. If there are
changes to the data models, then make the changes and re-generate the prototype application. Once a series of cycles is performed, the application and the data model will eventually become stable.
If there are six such cycles, all within a two-month period, then that’s like implementing at version 6 versus implementing at version 1 and then spending two or three years achieving the same
design. For sure, this strategy is faster, cheaper, and far more effective.

In the case study, the database’s schema was able to be quickly created because all the entities, attributes, and relationships existed in the metadata repository. A schema with all the tables was
created in one afternoon. It was reviewed. The application was generated about two hours later. Another day was spent pruning the application prior to demonstration. Then six sessions with relevant
subject matter experts were set up, one every two weeks. That was more than enough time to cycle back in the changes to the data model, the database’s design, and the generated application.


4. Where to Put Your Data Model

The data models should be placed into a metadata repository that is integrated and non-redundant. The model of such a metadata repository is similar to the one in Figure 2. Each box represents a
category of metadata and is sometimes called a meta-entity. This is a very high-level subset representation of the metadata repository data model. The diagram shows the obvious places where
the results from Steps 3.1 through 3.8 are stored. Step 3.9 produces the metadata for the subject, entity, and attribute meta-entities. Relationships among entities are not shown. Step 3.10 causes
the creation of the metadata for the business information system, module, database table, and column meta-entities. Relationships among tables are not shown.
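The layering of meta-entities just described can be sketched as data structures: data-model metadata (subject, entity, attribute) on one side, implementation metadata (table, column) on the other, with columns mapping back to attributes and attributes to data elements. The class and field names are illustrative, not the repository’s actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Attribute:
    name: str
    data_element: str  # link back to the enterprise data element

@dataclass
class Entity:
    name: str
    subject: str
    attributes: list = field(default_factory=list)

@dataclass
class Column:
    name: str
    table: str
    attribute: Attribute  # each implemented column maps to a model attribute

# One thread through the repository, from column back to data element.
cert_date = Attribute("first_certification_date", "Certification Date")
record = Entity("State Certification Record", "Certification", [cert_date])
col = Column("first_cert_dt", "state_certification", cert_date)
print(col.attribute.data_element)  # Certification Date
```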

It is sometimes stated that metadata repositories are too expensive and take too long to implement, and thus that they are an impractical solution. Not true. If you are an IT data management
professional, then a metadata repository capable of storing all the metadata identified in this paper and performing all the functions this paper describes can be purchased for less than one day’s
consulting fee. A fully sophisticated code generator can be purchased for less than three days’ consulting fees. The products are there; the way and the strategy are there. Is the will there?


5. How to Build a Database Design

The database’s design starts after Step 3.10 is complete. What exists at that point is a functionally acceptable database design that conforms to business policy requirements. It is, however, most
likely not complete from an Information Technology point of view. The complete iterations of database design include:

  • Ensuring that the database schema meets business policy requirements.
  • Incorporating database structures to properly reflect historical data.
  • Adding database columns to support comprehensive audits.
  • Installing security devices to protect against misuse and theft.
  • Inclusion of database administrator special columns and processes.
  • Creating generalized versus specialized data structures to support multiple-use data.

Figure 2.

Once a database schema has been created and the database application completed, there may be other iterations such as:

  • Tuning database structures to ensure adequate performance.
  • Accommodating different flavors of SQL syntax for different database management systems (DBMS).
  • Installing special key structures instead of business-fact-based keys.
  • Accommodating the special physical database requirements of different DBMS.
  • Modifying database structures to support special or intensive analyses and reporting.

Finally special iterations of a database’s design may be needed for:

  • Ensuring that the database is “non-stop” and always optimized.
  • Accommodating the needs of client/server, the Internet, or Service Oriented Architectures, or combinations of all three.

These iterations are all addressed in a Whitemarsh paper, Iterations of Database Design that is available from the Whitemarsh website, www.wiscorp.com.

In terms of the case study, software systems like Clarion for Windows can interact with MS/Access databases through ODBC and JDBC. So the original objective, to have a simple, efficient, and
effective database-centered application to manage the CMT area, is very practical. For sure, with the right tool set the CMT application could have been accomplished within two months of the need
being cited. But because the correct process was not known, nor were the proper tools used, about 12 months of a professional’s time was wasted. That wasted time, worth about $125,000, is many times
more than the actual cost of doing the CMT application.


6. Conclusions

The practical applications of the points made in this paper include:

  • A data model is not the same as a database design.
  • The activities for modeling data are an integral part of traditional requirements analysis and design.
  • Metadata resulting from the activities in all the steps of Section 3 should be stored in a metadata repository.
  • Database design begins in earnest after the database has been functionally validated through prototyping.
  • From an architecture and engineering point of view it is both invalid and counter productive in terms of semantic integration, non-redundancy, and enterprise interoperability to begin real
    business system design and implementation until after a data model has been thoroughly validated.

Michael Gorman

Michael, the President of Whitemarsh Information Systems Corporation, has been involved in database and DBMS for more than 40 years. Michael has been the Secretary of the ANSI Database Languages Committee for more than 30 years. This committee standardizes SQL. A full list of Whitemarsh's clients and products can be found on the website. Whitemarsh has developed a very comprehensive Metadata CASE/Repository tool, Metabase, that supports enterprise architectures, information systems planning, comprehensive data model creation and management, and interfaces with the finest code generator on the market, Clarion ( www.SoftVelocity.com). The Whitemarsh website makes available data management books, courses, workshops, methodologies, software, and metrics. Whitemarsh prices are very reasonable and are designed for the individual, the information technology organization and professional training organizations. Whitemarsh provides free use of its materials for universities/colleges. Please contact Whitemarsh for assistance in data modeling, data architecture, enterprise architecture, metadata management, and for on-site delivery of data management workshops, courses, and seminars. Our phone number is (301) 249-1142. Our email address is: mmgorman@wiscorp.com.
