Views: The Key to Database Agility
Published: October 1, 2005
Published in TDAN.com October 2005
I always enjoy reading Scott Ambler's books and articles on Agile Development, and as a DBA, I'm particularly interested in his "agile" approach to data administration (as expressed, for example, in his book Agile Database Techniques). However, there is an important technique for creating truly agile databases that I haven't seen in his writings; one which is well-known to DBAs, but little understood in the application development community - the use of database views to create a layer of abstraction between the normalized relational tables and the applications that use the data.
Let's start with what Mr. Ambler calls the "object-relational impedance mismatch". What does this mean, and why does it exist? Data structures in a relational database are designed and built as a normalized association of base tables. This is necessary to ensure data integrity, eliminate data redundancy, improve the performance of database updates, and promote the reuse of data by multiple applications. But most application objects (class objects) aren't normalized; they see data in more generalized terms. Take, for example, the Karate School data model in Scott's May 2005 article in Software Development Magazine (incidentally, I have to quibble with his terminology here; I think this is a logical data model, not a physical one). The model shows the data split into several tables, such as PaymentType, BeltAttempt, and MembershipHold. But it's safe to say that the application objects aren't set up this way; I would venture to guess there are only two object classes here - a Student class and a Payment class. The "impedance mismatch" results from trying to map the non-normalized view of data in the class model to the normalized structures in the data model.
Developers find having to work with normalized data structures awkward and time-consuming, since it involves coding complex SQL queries that join data from multiple tables. But storing data in normalized form is necessary, for the reasons mentioned above. There's another reason as well: "refactoring" non-normalized data structures into normalized ones after the fact is always extremely difficult and labor-intensive, and sometimes isn't even possible (because data in non-key fields must be "refactored" into key fields, and data in these fields may have missing or incorrect values).
Now here's the good news: you don't have to work with the normalized tables! In fact, it isn't necessary for applications (and developers) to even be aware of their existence. It is very easy (at least for DBAs) to create an abstraction layer of joined views on top of the normalized data tables, putting the base tables completely "behind the scenes", and out of sight.
A view is a database object that looks and functions exactly like a table, except that it can contain data from multiple base tables. Data in a view can be reformatted and renamed (there's a way around those pesky DBA naming standards!), creating a data object that can map exactly to an object class - hence, no more "impedance mismatch"!
Figure 1 shows a simplified example of a normalized data structure for a timekeeping and task scheduling application. Figure 2 shows a view that merges data from these tables into a single virtual "table". Figure 3 shows the SQL code that creates the view (Note: the work of creating the views should be done by DBAs when creating the database, based on the data requirements of the class model, before the database is turned over to developers). As you can see, many of the "columns" (fields) in the view have been renamed, and transformations have been applied to some of the data formats (for example, the OTApproved field has been changed from a binary bit to a "Yes" or "No" text field). The database identifier keys have been masked in the view, and replaced with actual data values (e.g., ProjectID has been replaced with ProjectDescription). The various components of the EmployeeName (EmployeeFirstName and EmployeeLastName) have been joined together. Figure 4 shows the mapping from an EmployeeTask object class to the new EmployeeTasks view; note the disappearance of the "impedance mismatch"!
Figure 1: Normalized base tables for an employee task-scheduling database
Figure 2: Non-normalized view of the employee task-scheduling data
Figure 3: SQL code to create the view
Figure 4: Mapping between object class and view
Figure 5: Creating an XML view of data
Views create a layer of abstraction in the database that enables applications (and developers) to avoid having to deal with the "nuts and bolts" of the actual implementation and maintenance of the data (probably the greatest single source of friction between developers and data people). In object terms, views provide a means of encapsulating data access without introducing coupling (i.e., denormalization) or diminishing cohesion in the database. They have numerous other advantages as well:
Most importantly, views provide a means for making databases truly "agile". With views, you can change the way application objects see, and work with, data on the fly, without the time and expense of costly "refactoring" efforts. If you want to add, remove, or rename a column, change a column format, or bring in new data from another source, it's as easy as editing and recompiling the view (or creating a new view, if you don't want to interfere with other developers and applications). Applications can be migrated over time from one view to another, with both views supported in the database during the deprecation (transition) period.
Of course there are some issues, at least for DBAs, in this approach: views have to be written and maintained (possibly written by DBAs and maintained by developers?), and all major database tools (including the vendors') are table-based. You can easily generate DDL to create tables, indexes and triggers from data models and create change scripts to implement table changes in multiple environments, but vendor (and tool) support for views is minimal, which makes working with views more labor-intensive (and costly).
Views have other limitations as well: although you can combine local data with data in other databases (called foreign views), you can't update local data through views that reference non-local data (why not, I wonder?), nor can you create primary key / foreign key relationships involving views (again, why not?).
Another major issue for DBAs is the problem of updating the base tables in the database through updates on the views. Where I work, the general approach to database updating involves the use of stored procedures (what we call fundamental stored procedures, or FSPs), the code for which can be automatically generated from database tables, and I am currently working on extending this capability to views. We also created a .NET component to encapsulate the ADO.NET code required to invoke these procedures, in order to make their use easier for developers.
It is also possible to handle the table updates directly from updates on the views, through the use of what are called Instead-Of triggers. Instead-Of triggers are pieces of SQL code that function like stored procedures, except that they are invoked automatically whenever an INSERT, UPDATE, or DELETE command is executed against a view. Unlike normal database triggers, Instead-Of triggers actually do the work of performing the database update (that's why they're called "instead-of" triggers), and they can take the data from the view and split it into separate updates for each of the base tables involved. Instead-Of triggers can also handle transaction management (making sure that all table updates are either committed or rolled back as a single unit of work), concurrency control (handling multiple updates on the same view from multiple users), and error reporting (throwing exceptions back to the application if a database error occurs), the same way our FSPs do.
However, these Instead-Of triggers have to be coded manually; currently, we do not have an automated means of generating this code. This involves more DBA time and work (which can impact development schedules and budgets), and introduces the possibility of coding errors. The trigger code must also be manually maintained over time, to match changes to the views and/or base tables. However, this does at least transfer the work of dealing with the "object-relational impedance mismatch" from the developers to the DBAs, where it belongs.
In spite of the issues and obstacles involved, I firmly believe that the use of views in databases affords the best hope of bringing developers and DBAs together, facilitating the application development process, and bridging the object-relational gap. I encourage developers and DBAs everywhere to seriously consider this approach.
Recent articles by Larry Burns
Larry Burns - Larry Burns has worked in IT for more than 25 years as a database administrator, application developer, consultant and teacher. He holds a B.S. in Mathematics from the University of Washington and a Masters degree in Software Engineering from Seattle University. He currently works for a Fortune 500 company as a database consultant on numerous application development projects, and teaches a series of data management classes for application developers. He was a contribut0r to DAMA International’s Data Management Body of Knowledge (DAMA-DMBOK), and is a former instructor and advisor in the certificate program for Data Resource Management at the University of Washington in Seattle. You can contact him at Larry_Burns@comcast.net.