|
Conquering the Logical-Physical Divide – August 2010
What is “Normal”?
Published: August 1, 2010 Larry Burns discusses the degree of normalization that is appropriate in a database schema and how that determination can be made.
Of all the issues surrounding the logical-physical divide, perhaps none is as critical (or as misunderstood) as the subject of normalization. I’ve alluded to this in previous articles, mentioning, for example, how implementing a fully normalized logical model in a physical database, and forcing application developers to code to the normalized schema, results in what Scott Ambler calls the “object-relational impedance mismatch.” This is why I advocate the use of a virtual data layer (VDL) that allows developers to code to database objects that more closely resemble the “real life” application objects.1
Implementing a fully normalized schema may lead to unnecessary complexity and poor performance in the database, and it may not be an appropriate database design for some applications (rules databases and document databases come immediately to mind). On the other hand, as we’ve previously seen, premature or improper denormalization of the database schema can result in corrupted data, poor performance, and a variety of application issues.2 So how do you decide what degree of normalization is appropriate in a database schema? First, it’s necessary to understand what normalization means in the logical realm and what it means in the physical realm. The logical data model, as I’ve said before, documents the business data requirements (the domain of the data and the business rules that constrain the data) of a business subject area (such as Accounting, Inventory, Order Processing, Personnel, etc.). It is a requirements document, not a design document. The logical data model is usually fully (or mostly) normalized; that is, the identified data elements, or attributes, are organized into groups of interrelated entities (objects with a specific business meaning) such that any given attribute:
However, the purpose of normalization in physical database design is different from that of logical design. Physical database design consists of mapping the business requirements of the logical model to an appropriate choice of implementation, architecture, and technology. In other words, you first need to answer some fundamental questions:
The degree of normalization in the physical design (and database schema) depends on the nature and purpose of the database being created. OLTP databases will necessarily be more normalized than databases that support reporting and analysis (and, as we’ll see in my next article, there may be denormalized data structures even within a mostly normalized database schema). And, as I said, some types of specialized databases (such as rules databases and document databases) require specialized design. But even when a logical model is being implemented in a mostly normalized OLTP database design, you probably still won’t implement everything in the logical model as-is. Some minor entities in the logical model might be implemented as rules or constraints, rather than as code tables, or they might not be implemented at all. Single-node supertype-subtype relationships might be collapsed into a single database table. Large or compound natural (business) keys might be replaced by a single surrogate (system-generated) key in the database. And so on. The point is this: Beware of blind assumptions that a given logical model must be implemented exactly as-is in a relational database. There is much analysis to be done, and many decisions to be made, in creating a physical database design that satisfies both the business requirements of the logical design, and the application requirements of the systems and users that will consume the data. NOTE: I’d like to make this a dialogue, so please feel free to email questions, comments and concerns to me. Thanks for reading! References:
Go to Current Issue | Go to Issue Archive Recent articles by Larry Burns
Larry Burns - Larry Burns has worked in IT for more than 25 years as a database administrator, application developer, consultant and teacher. He holds a B.S. in Mathematics from the University of Washington and a
Masters degree in Software Engineering from Seattle University. He currently works for a Fortune 500 company as a database consultant on numerous application development projects, and teaches a
series of data management classes for application developers. He was a contribut0r to DAMA International’s Data Management Body of Knowledge (DAMA-DMBOK), and is a former instructor and
advisor in the certificate program for Data Resource Management at the University of Washington in Seattle. You can contact him at Larry_Burns@comcast.net.
|