Managing Metadata Within and Across Warehousing Efforts
From the Applied Technology Guide - Computer Associates, Inc.
Published: July 1, 2000
Published in TDAN.com July 2000
Most large organizations today have some experience with data warehousing implementations. These typically take the form of data mart implementations in departmental focus areas, such as financial analysis or customer-focused systems that assist business units. Many organizations have multiple warehousing initiatives underway simultaneously, and, in the decentralized manner typical of most corporations, these systems are most likely based on products from multiple data warehousing vendors. This approach has worked to date: it has allowed reasonably rapid implementation of these systems and has demonstrated to the organization the benefit and potential of data warehousing as a business tool, at a fraction of the cost of the enterprise data warehouse model.
However, this is the typical "ready, fire, aim" approach that gave us the legacy data Tower of Babel we have today, and, in keeping with that history, some areas of the business are beginning to show signs of stress as a result. Data and meta data are spread across multiple data warehousing systems, and system managers are wondering how best to coordinate and manage the dispersed meta data they have today. How do we maintain consistency when business rules change as a result of corporate reorganizations, regulatory changes, or other changes in business practices? What happens when an application needs to change the technical definition of a data element? How many places are affected by each of these potential changes? These issues, among others, are forcing businesses to take a larger view, an enterprise view, of meta data management systems. Coordinating meta data across multiple data warehouses is one significant step in the right direction, and a repository is just the tool to do that.
A Repository as a Meta Data Integration Platform
Ideally, a corporation should adopt a repository as a meta data integration platform, making meta data available across the organization. This would serve to manage key meta data across all of the data warehouse and data mart implementations within an organization. This would allow all of the participants to share common data structures, business rule definitions, and data definitions from system to system across the enterprise.
The platform would accept and manage information from multiple sources. These would include databases from the major vendors (e.g., IBM, Informix, Oracle, Microsoft, and Sybase) and a broad spectrum of tools, from extraction tools to analysis tools. On the output side, the system should provide open access by multiple tools as well as APIs for custom needs.
The meta data repository also facilitates consistency and maintainability. It provides a common understanding across warehouse efforts, which promotes sharing and reuse. If a new data element definition is required for a data mart implementation, the platform should permit versioning to support the need. With a shared meta data repository, the exchange of key information between business decision managers (facilitated by solid end-user access tools) becomes more feasible. And when multiple data marts and data warehouses are involved, a central meta data platform reduces the effort required to maintain them as a whole.
Repository systems need to contribute to and integrate with the existing legacy system environment and play an active role throughout the lifecycle of data warehousing systems to be truly considered enterprise meta data repositories.
Documenting database and legacy information is an important capability of a meta data repository. Legacy models provide the information sourcing, data inventorying, and design that are key to developing an effective data warehouse. The meta data surrounding the acquisition, access, and distribution of warehouse data is the key to providing the business user with a complete map of the data warehouse.
The repository should play an active role across the entire life cycle of the data warehouse and in everything that contributes system and business value, including existing legacy systems as sources and third-party tools. The repository then contributes not only in the development phases but also where the bulk of the cost of all IS systems lies: downstream support and maintenance. The tools involved include systems management, database management, business intelligence, and application development tools, along with the components listed below.
What Needs to Be In an Enterprise Repository to Make the Warehouse Work Better
Some areas to focus on in reviewing repository functionality are discussed in the following sections ...
Nonproprietary Relational Database Management System
A repository should ideally use an industry-standard DBMS, which provides significant advantages over a DBMS developed by the repository vendor. These advantages include advanced tools and utilities for database management (such as backups and performance tuning) as well as dramatically enhanced reporting capabilities. Furthermore, maintainability and accessibility are enhanced by an "open" system.
Using a standard database also allows the repository vendor to focus on the quality of the repository, not the features of the database management system. In addition, it allows the vendor to take advantage of new features made available by the DBMS vendor.
Fully Extensible Meta Model
A repository should be completely self-defining and extensible, based on a common entity/relationship diagram. By using a model that reflects industry standards, it can give users the ability to customize the meta model easily to meet their specific needs. The repository should support the following meta model extensions:
The vendor should also support the Microsoft Open Information Model, which will allow information to be shared across multiple vendor products. Ideally, the vendor will be part of the Open Information Model design team.
Application Programming Interface (API) Access
API access to the repository can give an organization the flexibility needed to create a meta data management system that suits its unique needs. This architecture makes the repository more powerful by allowing users to create custom applications and programs.
In addition, the API separates meta data from the tools that access and manipulate it, which adds flexibility. Tools manipulate meta data through the API, giving them transparent access to the data. If the underlying data structures change, the tools do not need to be changed. This allows for greater efficiency and flexibility in an organization's application development.
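The separation described above can be illustrated with a minimal Python sketch. Every name in it (MetadataStore, FlatStore, NestedStore, RepositoryAPI, report_tool) is hypothetical and is not taken from any vendor product; the sketch only assumes that tools call one stable API while the storage layout behind it is free to change.

```python
from typing import Protocol


class MetadataStore(Protocol):
    """Storage backend hidden behind the API (hypothetical interface)."""

    def fetch(self, name: str) -> dict: ...


class FlatStore:
    """Keeps each definition as one flat dictionary."""

    def __init__(self) -> None:
        self._rows = {"customer_id": {"type": "INTEGER", "owner": "Sales"}}

    def fetch(self, name: str) -> dict:
        return dict(self._rows[name])


class NestedStore:
    """Holds the same facts in a different internal layout."""

    def __init__(self) -> None:
        self._rows = {
            "customer_id": {
                "technical": {"type": "INTEGER"},
                "business": {"owner": "Sales"},
            }
        }

    def fetch(self, name: str) -> dict:
        row = self._rows[name]
        return {"type": row["technical"]["type"], "owner": row["business"]["owner"]}


class RepositoryAPI:
    """The only surface that tools are allowed to call."""

    def __init__(self, store: MetadataStore) -> None:
        self._store = store

    def describe(self, name: str) -> str:
        meta = self._store.fetch(name)
        return f"{name}: {meta['type']}, owned by {meta['owner']}"


def report_tool(api: RepositoryAPI) -> str:
    """A 'tool' that depends on the API, not on the storage layout."""
    return api.describe("customer_id")
```

Swapping FlatStore for NestedStore changes the data structures but leaves report_tool untouched, which is the property the paragraph above describes.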
Central Point of Meta Data Control
The repository serves as a central point of control for data, providing a single place of record about information assets across the enterprise. It documents where the data is located, who created and maintains the data, what application processes it drives, what relationship it has with other data, and how it should be translated and transformed. This provides users with the ability to locate and utilize data that was previously inaccessible. Furthermore, a central location for the control of meta data ensures consistency and accuracy of information, providing users with repeatable, reliable results and organizations with a competitive advantage.
Impact Analysis Capability
If the repository has an impact analysis facility, it can provide virtually unlimited navigation of the repository definitions to show the total impact of any change. Users can easily determine where any entity is used or what it relates to by using impact analysis views.
An impact analysis facility answers the real questions in the analysis phases without forcing a user to sift through large quantities of unfocused information. Furthermore, sophisticated impact analysis capabilities allow better time estimates for system maintenance tasks. They also reduce the amount of rework resulting from faulty impact analysis (e.g., a program not being changed after a change to a table that it queries).
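One common way to compute "the total impact of any change" is a breadth-first walk over "is used by" relationships. The Python sketch below assumes the repository can supply those relationships as a simple dictionary; the object names (table:CUSTOMER and so on) and the function impact_of are invented for illustration.

```python
from collections import deque

# Hypothetical "is used by" edges: each key is used by the objects in its list.
USED_BY = {
    "table:CUSTOMER": ["program:LOAD_CUST", "view:V_CUSTOMER"],
    "view:V_CUSTOMER": ["report:CHURN", "mart:SALES"],
    "program:LOAD_CUST": [],
    "mart:SALES": ["report:REVENUE"],
}


def impact_of(changed: str, used_by: dict[str, list[str]]) -> list[str]:
    """Return every object reachable from `changed`, in breadth-first order."""
    seen = {changed}
    queue = deque([changed])
    impacted = []
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, []):
            if dependent not in seen:  # guard against cycles and repeats
                seen.add(dependent)
                impacted.append(dependent)
                queue.append(dependent)
    return impacted
```

Calling impact_of("table:CUSTOMER", USED_BY) lists the program, view, data mart, and reports that would need review, including indirect dependents such as report:REVENUE that a manual check could miss.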
Naming Standards Flexibility
A repository should provide a detailed map of data definitions and elements, thereby allowing an organization to evaluate redundant definitions and elements and decide which ones should be eliminated, translated, or converted. By enforcing naming standards, the repository assists in reducing data redundancies and increasing data sharing, making the application development process more efficient and therefore less costly. In addition, an easily enforceable standard encourages organizations to define and use consistent data definitions, thereby increasing the reuse of standard definitions across disparate tools.
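As a rough illustration of enforcing a naming standard and surfacing redundant definitions, the Python sketch below assumes one particular standard (lowercase words joined by underscores, ending in a class word such as id or date). The standard, the class-word list, and the function names are assumptions made for the example, not something the text prescribes.

```python
import re
from collections import defaultdict

# Assumed standard: lowercase words joined by underscores, ending in a class word.
CLASS_WORDS = ("id", "name", "date", "amt", "code")
NAME_PATTERN = re.compile(r"[a-z]+(_[a-z]+)*_(%s)" % "|".join(CLASS_WORDS))


def violations(names: list[str]) -> list[str]:
    """Return the names that do not follow the assumed standard."""
    return [n for n in names if not NAME_PATTERN.fullmatch(n)]


def redundant_groups(definitions: dict[str, str]) -> list[list[str]]:
    """Group element names whose normalized definitions are identical."""
    by_text = defaultdict(list)
    for name, text in definitions.items():
        # Normalize case and whitespace so trivial differences do not hide duplicates.
        by_text[" ".join(text.lower().split())].append(name)
    return [sorted(group) for group in by_text.values() if len(group) > 1]
```

A real repository would likely match definitions more loosely than exact normalized text, but even this simple grouping shows how a central map of definitions exposes candidates to eliminate, translate, or convert.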
In repository discussions, "versioning" can have many different definitions. For example, some version control capabilities are:
The repository's versioning capabilities facilitate the application lifecycle development process by allowing developers to work with the same object concurrently. Developers should be able to modify or change objects to meet their requirements without affecting other developers.
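Since "versioning" can mean several things, the Python sketch below picks just one interpretation: each developer checks out a private copy of an object, and each commit is stored as a new version without overwriting earlier ones. The class VersionedObject and its methods are hypothetical, and the sketch deliberately ignores conflict detection and merging.

```python
from copy import deepcopy
from typing import Optional


class VersionedObject:
    """Keeps every committed version of one repository object."""

    def __init__(self, name: str, definition: dict) -> None:
        self.name = name
        self._versions = [deepcopy(definition)]  # version 1 is stored at index 0

    @property
    def latest(self) -> int:
        return len(self._versions)

    def checkout(self, version: Optional[int] = None) -> dict:
        """Return a private copy; edits to it do not touch the repository."""
        number = self.latest if version is None else version
        return deepcopy(self._versions[number - 1])

    def commit(self, definition: dict) -> int:
        """Store the edited copy as a new version and return its number."""
        self._versions.append(deepcopy(definition))
        return self.latest
```

Two developers who each call checkout() receive independent copies, so one can change a data type while the other continues to see the original until a new version is committed.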
Robust Query and Reporting
The repository should provide business users with a vehicle for robust query and report generation. The end-user interface should pass queries seamlessly to the repository's own query tool or to third-party products for automatic query generation and execution. Furthermore, business users should be able to create detailed reports from these tools, increasing the amount of valuable decision support information they are able to receive from the repository.
Data Warehousing Support
The repository provides information about the location and nature of operational data which is critical in the construction of a data warehouse. It acts as a guide to the warehouse data, storing information necessary to define the migration environment, mappings of sources to targets, translation requirements, business rules, and selection criteria to build the warehouse.
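The kind of meta data listed above (source-to-target mappings, translation requirements, business rules, and selection criteria) can be pictured as data that drives a load step. The Python sketch below is illustrative only: the column names, the "active customers" criterion, and the Mapping class are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Mapping:
    source: str                            # column in the operational system
    target: str                            # column in the warehouse
    transform: Callable[[object], object]  # translation requirement
    rule: str                              # business rule, kept as documentation


# Hypothetical mappings that a repository might hold for one source table.
MAPPINGS = [
    Mapping("CUST_NM", "customer_name", lambda v: str(v).strip().title(), "Trim and title-case"),
    Mapping("CUST_ST", "state_code", lambda v: str(v).upper(), "Upper-case state code"),
]


def select(row: dict) -> bool:
    """Selection criterion (assumed): load only active customers."""
    return row.get("STATUS") == "A"


def migrate(rows: list[dict]) -> list[dict]:
    """Apply the selection criterion, then every mapping, to each source row."""
    return [
        {m.target: m.transform(row[m.source]) for m in MAPPINGS}
        for row in rows
        if select(row)
    ]
```

Because the mappings are data rather than hard-coded logic, a change to a translation rule is made in one place and is visible to anyone consulting the repository.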
Organizations are becoming increasingly aware of the limitations of their own systems and internal data. Attempts to liberate and leverage data across the organization's stovepipes have been replete with frustration and too many examples of failure. These experiences, coupled with drivers demanding flexibility in business processes, are hastening the day when businesses will implement an enterprise-level view of meta data. All major vendors are aggressively pursuing ways to supply this enterprise-level capability. It is critical that corporations understand the issues at hand as they adopt enterprise strategies and that they be in a position to evaluate which set of vendor products is appropriate to their situation.
Business Information Demand: An organization's continuously increasing, constantly changing need for current, accurate information, often on short notice, to support its business activities.
Computer Associates, Inc. http://www.ca.com/products/platinum/wp/wp_meta.htm
The Applied Technology Guide, sponsored by Computer Associates, Inc., can be found at http://www.techguide.com.