Conquering the Logical-Physical Divide – November 2010
Stupid Database Tricks
Published: November 1, 2010
Larry Burns explains why having a complete and accurate logical data model is so critical to database development.
We’ve been having lots of problems with one of our databases recently, and these problems serve to illustrate some fundamental truths about how to design and build databases (and how not to!).
The database in question is the database for a third-party vendor product that helps us keep track of equipment in our manufacturing plants that is connected to our network (that is, each piece of equipment has an IP address). As a DBA, I usually don’t have much to do with vendor databases. We keep them backed up, of course, so that they can be recovered if need be, and occasionally the vendor will send us update scripts to run. Other than that, we mostly don’t think about them.
But I had to think about this database – a lot – after receiving a frantic call from the application analyst responsible for supporting the product. It seems that a user had mistakenly assigned the same IP address to two different tools, and the product wouldn’t allow him to correct the mistake! I loaded a backup of the database to a test server, so the application analyst could write a SQL script to correct the problem. Second phone call: the update script updated hundreds of records, most of which were not supposed to be changed. Could I check to see why this happened?
What I discovered dismayed – but didn’t surprise – me. The database contained no data integrity constraints whatsoever (except for primary keys built on system-assigned identifiers). The lack of data integrity constraints (including referential integrity constraints between tables) is what allowed two tools to be assigned the same IP address. What I also discovered is that several of the code (minor entity) tables contained duplicate name values (and, in some cases, duplicate IP address values), which is what caused the analyst’s update script to change many more records than intended.
After laboriously figuring out how to write an update script that updated only the records that needed to be changed, I fired off an email to the vendor support rep. After describing the problems I had found in their database, I explained that databases of this sort invite an infinite (and steadily descending) cycle of data fixes, each of which results in some other problem that needs to be fixed, and so on and on. In my tenure as an application support analyst, I had to work on several such databases (which may explain my subsequent interest in data management!). I worked in one shop where the answer to every production problem was to “fix” the data in the database so that the application would stop blowing up. This fix, in turn, caused additional problems for the application; the data would be further “massaged,” resulting in more fixes, etc. This cycle of futility came to an abrupt end when one of the “fixes” deleted most of the records in the database – and there wasn’t a backup!
Most database issues come down to one fundamental question: What is a database? Far too many application developers think of a database as nothing more than a repository for application-specific data (what they call a “persistence engine”). To them, a database is a sort of “Fibber McGee’s closet”1 into which data gets stuffed and then, sometime later, retrieved. We hope that the data, by that time, is correct, timely, business-relevant, and easily consumable, but it probably won’t be. Databases built according to this definition of “database” characteristically have few, if any, data integrity constraints because developers prefer to code all such “intelligence” in the application.2
I advocate a more complete definition of “database.” A database, in my view, is not just a repository for business data; it is a repository for the business rules that constrain the data! This is why having a complete and accurate logical data model is so critical to database development – the logical data model documents the business data requirements (and the business rules) that need to be implemented, in some fashion, in the database. As I explained in my recent article,3 it is not necessary (and usually not desirable) to implement the logical data model as though it were the actual database design. Nevertheless, the database, once implemented, must encapsulate the business data requirements and business rules that are documented in the logical model.
Having a database that encapsulates – and enforces – the business rules surrounding business data helps ensure that the data in the database will always be complete and correct, and that it will always represent, at any given moment in time, a valid view of the business. Data that can be trusted to be correct, accurate and timely, that is easily consumable, and that is business-relevant (and not just application-specific) is data that can be reused, over and over again, in support of multiple applications, multiple business processes, and multiple business initiatives. This, in turn, greatly multiplies the value of the data (and the database) to the business.4
As I always say, data management is the art of turning data into business value. Data that is locked in application-specific databases (especially databases that don’t bother to enforce the correctness or business-relevance of their data) is data that is not going to be of much value to the business!
NOTE: I’d like to make this a dialogue, so please feel free to email questions, comments and concerns to me. Thanks for reading!
Recent articles by Larry Burns
Larry Burns - Larry Burns has worked in IT for more than 25 years as a database administrator, application developer, consultant and teacher. He holds a B.S. in Mathematics from the University of Washington and a Masters degree in Software Engineering from Seattle University. He currently works for a Fortune 500 company as a database consultant on numerous application development projects, and teaches a series of data management classes for application developers. He was a contribut0r to DAMA International’s Data Management Body of Knowledge (DAMA-DMBOK), and is a former instructor and advisor in the certificate program for Data Resource Management at the University of Washington in Seattle. You can contact him at Larry_Burns@comcast.net.