How to Protect the Data Warehouse Using the Repository
Published: June 1, 1997
This article is about how data warehouse developers can make use of meta data to "protect" the data warehouse.
Meta data is often discussed in connection with data warehouses. While it is understood that knowledge workers need meta data to perform their functions, other individuals involved with the data warehouse can also reap great rewards from its use. This article is about how data warehouse developers can make use of meta data to "protect" the data warehouse.
Knowledge Workers need to know:
As a matter of common practice, the meta data that supports the data warehouse is stored in some sort of meta data repository. The repository can exist in several warehouse building products, or it can be a centralized repository that brings together disparate meta data (much like a warehouse project of its own). This topic has been written about and presented many times. However, one item that is almost never mentioned is that warehouse developers also need support in the form of meta data.
For this article, let us assume that the meta data is stored in a centralized repository. A centralized meta data repository (or a coordinated set of product repositories) can be used to avert potential disasters by giving the developers an early "heads up" when operational system changes may affect the loading process. In a perfect world, warehouse developers would be notified whenever changes are made to operational systems and data that may impact the warehouse building process. In the real world, this type of communication does not always exist. That is why "warehouse protection" using the meta data repository is important.
The Set Up
To "protect" the data warehouse, there are five primary types of meta data that must be stored and kept current in the repository:
By capturing these types of meta data, warehouse developers can effectively monitor the operational systems for changes that may impact data that feeds the warehouse. For example:
In many of these situations, changes are caught on the back end (the audit, testing or loading process), causing warehouse developers to spend valuable time tracing the source of a data problem. The three types of meta data (and two types of relations between meta data), when harvested properly and coordinated with a change management system, can be used to proactively "protect" the data warehouse. To "protect" the warehouse, there are two change control processes that must be in place:
These two change control processes will be used to identify when a "warehouse protected component" is checked in and checked out of the development, test, and production system environments.
The Look
Warehouse developers should be able to identify all of their warehouse source data sets (table, copybook, ...) by their data definition names (table names, copybook names, ...). The meta data repository should be able to identify the programs that make use of these file definitions. The repository should also be able to identify the jobs that execute the programs. To protect the data warehouse, tag these items in the repository as "warehouse protected". There are three primary "action" points in a change control process that are critical to the warehouse developer:
When any of these three actions is taken on the system components mentioned earlier, the warehouse development team needs to be informed immediately. The earlier in the change control process the warehouse developers are made aware of changes that affect the warehouse source data, the more likely they are to be prepared to accept and transform the operational data when the warehouse load process is (re)initiated. Let us take a closer look at these actions:
The Retrieval of Source Code from Production
The Movement of Source Code to Test
The Movement of Source Code to Production
The Delivery
If the repository administrator can identify warehouse "protected" system components and the change control environment is capable of reporting check-ins and check-outs of system components, all that needs to be developed is the ability to match components that have changed to the components tagged as "protected" and to report the matches to the warehouse developers. This can be done by:
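As a rough illustration of the matching step described above, the following Python sketch derives the set of "warehouse protected" components from the repository relationships (data definitions used by programs, programs executed by jobs) and then filters a list of change control events against that set. Everything named here is an assumption made for illustration: the `ChangeEvent` record, the action identifiers, the dictionary shapes standing in for repository queries, and the component names are hypothetical and do not come from any particular repository or change control product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """One check-in or check-out reported by the change control system (hypothetical shape)."""
    component: str   # data definition, program, or job name
    action: str      # one of ACTIONS below
    changed_by: str


# The three action points discussed in the article; the identifiers are illustrative.
ACTIONS = ("retrieve_from_production", "move_to_test", "move_to_production")


def protected_components(source_definitions, program_uses, job_runs):
    """Tag warehouse source definitions, the programs that use them,
    and the jobs that execute those programs as 'warehouse protected'."""
    definitions = set(source_definitions)
    programs = {p for p, used in program_uses.items() if definitions & set(used)}
    jobs = {j for j, run in job_runs.items() if programs & set(run)}
    return definitions | programs | jobs


def match_changes(events, protected):
    """Return the events that touch a protected component at one of the action points."""
    return [e for e in events if e.component in protected and e.action in ACTIONS]


def format_notice(event):
    """Build the one-line notice sent to the warehouse developers."""
    return f"WAREHOUSE PROTECTED: {event.component} {event.action} by {event.changed_by}"


if __name__ == "__main__":
    # Hypothetical repository content: program -> definitions used, job -> programs run.
    program_uses = {"CUSTLOAD": {"CUSTMAST"}, "PAYCALC": {"PAYROLL"}}
    job_runs = {"JOBCUST1": {"CUSTLOAD"}, "JOBPAY01": {"PAYCALC"}}
    protected = protected_components({"CUSTMAST"}, program_uses, job_runs)

    events = [
        ChangeEvent("CUSTMAST", "retrieve_from_production", "dev1"),
        ChangeEvent("PAYCALC", "move_to_test", "dev2"),
        ChangeEvent("JOBCUST1", "move_to_production", "ops1"),
    ]
    for event in match_changes(events, protected):
        print(format_notice(event))
```

In practice the two dictionaries would be replaced by queries against the meta data repository, and the event list by whatever report the change control environment can produce. The point of the sketch is only that the match itself is a simple set membership test once both sides are available.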
Summary
This type of approach is easily completed if the IS shop has implemented a meta data repository and has a sound means of change control. Both of these require commitment and time to put in place. Not all shops are this fortunate, and many have only just begun to identify the need for a meta data repository and a strict change control environment. But many of the less fortunate shops also have data warehouses in production or in the analysis or building phase.
To the shops that have repositories and change control in place, this article has identified additional uses for meta data in the data warehouse environment. To the less fortunate shops, perhaps this article can raise a warning signal as to where warehouse development can break down due to a lack of communication. No matter who you are, if you are developing a data warehouse, meta data can play a large role in "protecting" what is often considered to be "the-mother-of-all" IS development efforts. It is certainly worth consideration.
Robert S. Seiner - Robert (Bob) S. Seiner is recognized as the publisher of The Data Administration Newsletter, LLC – www.TDAN.com - an award-winning electronic publication that focuses on sharing information about data, information, content and knowledge management disciplines. TDAN.com celebrated its 12th anniversary in 2009. Mr. Seiner speaks often at major data management and meta-data management, business intelligence and knowledge management related conferences and user group meetings across the U.S. He can be reached at the newsletter at rseiner@tdan.com or 412-220-9643.
Mr. Seiner is the President and Principal Consultant of KIK Consulting & Educational Services, LLC – www.KIKconsulting.com. KIK, celebrating its 7th anniversary, is a company that focuses on knowledge transfer and consultative mentoring in the fields of data governance and data stewardship implementations, metadata management, master data management and data architecture. Beyond knowledge-transfer-focused consulting, Mr. Seiner offers two-day in-house and public courses on how to build and implement data governance / stewardship programs and metadata programs. Contact Mr. Seiner at KIK at rseiner@kikconsulting.com.