Data Stewardship: Process for Achieving Data Integrity

Steward – from Old English for “keeper of the sty”, a sty ward.

Data Steward – Person responsible for managing the data in a corporation in terms of integrated, consistent definitions, structures, calculations, derivations, and so on.

Corporations are demanding better and better sources of data. The explosive growth of data warehousing and sophistication of the access tools are proof that data is one of the most critical assets
any company possesses. Data, in the form of information, must be delivered to decision-makers quickly, concisely and, more importantly, accurately.

The data warehouse is an excellent mechanism for getting information into the hands of decision-makers. However, it is only as good as the data that goes into it. Problems occur when we attempt to
acquire and deliver this information. A major effort must be made in defining, integrating and synchronizing the data coming from the myriad operational systems producing data throughout the
corporation. Who should be responsible for this important task? The answer for a growing number of companies is a new business function called Data Stewardship.

What is Data Stewardship?

Data Stewardship has, as its main objective, the management of the corporation’s data assets in order to improve their reusability, accessibility, and quality. It is the Data Stewards’
responsibility to approve business naming standards, develop consistent data definitions, determine data aliases, develop standard calculations and derivations, document the business rules of the
corporation, monitor the quality of the data in the data warehouse, define security requirements, and so forth (see Table 1 for a list of the data integration issues determined by Data Stewards).

This data about data, or meta data, developed by Data Stewards can then be used by the corporation’s knowledge workers in their everyday analyses to determine what comparisons should be made,
which trends are significant, that apples have indeed been compared to apples, etc.
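
As a minimal sketch of what such meta data might look like, the Python below records one attribute's standard name, definition, aliases, and derivation, then resolves department-specific aliases back to the standard name. The class, its field names, and the sample "net_revenue" entry are hypothetical illustrations, not the format of any particular repository product.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeMetaData:
    """One steward-approved attribute definition (hypothetical structure)."""
    standard_name: str
    definition: str
    steward: str
    aliases: set = field(default_factory=set)
    derivation: str = ""


def build_alias_index(entries):
    """Map every alias and standard name (lower-cased) to the standard name."""
    index = {}
    for entry in entries:
        for name in {entry.standard_name, *entry.aliases}:
            key = name.lower()
            # An alias that points at two attributes is an unresolved
            # integration issue, so refuse to build the index.
            if key in index and index[key] != entry.standard_name:
                raise ValueError(f"alias {name!r} claimed by two attributes")
            index[key] = entry.standard_name
    return index


# Illustrative entry only; the names and the formula are assumptions.
net_revenue = AttributeMetaData(
    standard_name="net_revenue",
    definition="Gross sales less returns and discounts.",
    steward="Order subject area steward",
    aliases={"net_sales", "NetRev"},
    derivation="gross_sales - returns - discounts",
)

alias_index = build_alias_index([net_revenue])
print(alias_index["netrev"])  # net_revenue
```

A knowledge worker who meets "NetRev" in one report and "net_sales" in another can look up both and confirm that they refer to the same standard attribute, which is the apples-to-apples check described above.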

Just as the demand for a data warehouse with good data has grown, the need for a Data Stewardship function has likewise grown. More and more companies are recognizing the critical role this
function serves in the overall quest for high quality, available data. Such an integrated, corporate-wide view of the data provides the foundation for the shared data so critical in the data
warehouse.

What qualities should you look for in a Data Steward?

Data Stewards are well respected by the end user community because of their thorough understanding of how the business works. They have the confidence of both the IT and end user communities that
they are not creating meta data and business rules that are impossible to implement or counter to the corporation’s culture.

Table 2 lists the skill sets for a Data Steward. These are divided into two sets: Technical Skills and Interpersonal Skills.

The technical skills may seem more clear-cut than the interpersonal ones. First, Data Stewards will need some knowledge of the IT systems and DBMSs employed in the corporation. This ensures that the
Data Stewards remain grounded in the reality of what is technologically feasible. Second, Data Stewards should be able to understand both logical and physical data models: how entities relate to
each other, what redundancy is, and why normalization rules are important. They are not, however, usually responsible for the creation of these models; that task generally falls into the domain of the Data
Administration group.

Interpersonal skills are sometimes overlooked when choosing a Data Steward; yet these skills tend to be most important. Many times the Data Steward will find himself or herself in the situation of
trying to facilitate an agreement between two differing factions. Data integration can be a highly charged issue affecting the very core of how a company will continue to do business. Because of
this, the Data Steward must be able to reach a consensus wherever possible or at least a reasonable compromise. Secondly, these resources often must perform the difficult role of Organizational
Change Agent, smoothing the way for changes that will inevitably happen as integration of data occurs.

What is the scope of a Data Steward?

A typical corporate Data Stewardship function should have one Data Steward assigned to each major data subject area. These subject areas consist of the critical data entities or subjects such as
Customer, Order, Product, Market Segment, Employee, Organization, Inventory, etc. Usually, there are about 15-20 major subject areas in any corporation. As an example, one Data Steward would be
responsible for the Customer subject area and another would be assigned to the Product subject area.

The Data Steward responsible for a subject area usually works with a select group of employees representing all aspects of the company for that subject area. This committee of peers is responsible
for resolving integration issues concerning their subject area. The results of the committee’s work are passed on to the Data Administration and Database Administration functions for
implementation into the corporate data models, meta data repository, and ultimately, the data warehouse construct itself.

Just as there is a Data Architect in most Data Administration functions, there should be a “lead” Data Steward responsible for the work of the individual Data Stewards. The lead Data Steward’s
responsibility is to determine and control the domain of each Data Steward. These domains can become muddy and unclear, especially where subject areas intersect. Political battles can develop
between the Data Stewards if their domains are not clearly established.
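
One way a lead Data Steward might keep domains from overlapping is to record which entities each Data Steward owns and flag any entity claimed more than once. The sketch below assumes a simple mapping from steward to entities; the steward names and the assignments themselves are invented for illustration.

```python
from collections import defaultdict


def find_domain_conflicts(assignments):
    """Return entities claimed by more than one steward.

    `assignments` maps a steward's name to the set of entities in that
    steward's domain. The result maps each contested entity to the sorted
    list of stewards claiming it.
    """
    claimed_by = defaultdict(list)
    for steward, entities in assignments.items():
        for entity in entities:
            claimed_by[entity].append(steward)
    return {
        entity: sorted(stewards)
        for entity, stewards in claimed_by.items()
        if len(stewards) > 1
    }


# Hypothetical assignments; "Order" sits where two subject areas intersect.
assignments = {
    "customer_steward": {"Customer", "Market Segment", "Order"},
    "product_steward": {"Product", "Inventory", "Order"},
}

conflicts = find_domain_conflicts(assignments)
print(conflicts)  # {'Order': ['customer_steward', 'product_steward']}
```

An empty result means every entity has exactly one owner; any entry in the result is a boundary the lead Data Steward still needs to settle.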

Where do you look for a good Data Steward?

Data Stewards generally come from either the end user community or the IT department:

Subject matter experts from within the end user community make good Data Stewards. They are quite knowledgeable about specific parts of the corporation. However, they may need training in some of
the technical aspects of data models and IT systems. In addition, they must be familiar with business areas other than their own. Otherwise they can be perceived as biased toward their perspectives
on the data.

Data modelers from the IT Data Administration function also make good Data Stewards. They understand the technical issues of data integration and usually acquire a great deal of exposure to the
business community while modeling the business rules, data entities and attributes. In addition, they generally have good rapport with end users and Database Administrators alike. However, these
modelers must have the respect of the end user community and the authority to make decisions on its behalf.

How do you differentiate the roles of Data Stewards, Data Administrators, and Database Administrators?

Each function must have its own roles and responsibilities, spelled out clearly to avoid any confusion. There is little overlap in terms of each group’s responsibilities; however, there is a great
deal of collaboration and communication that must take place to ensure the data assets of the corporation are used to their maximum return on investment. Table 3 lists the specific roles for each
function – Data Stewardship, Data Administration and Database Administration.

A final note on the importance of Data Stewardship.

The Data Stewardship position probably has the highest profile within the corporation of the three mentioned above. Why? Because Data Stewards act as the conduit between IT and end users. They
have the difficult but very rewarding task of guaranteeing that one of the corporation’s most critical assets, its data, is used to its fullest capacity.

For Data Stewardship to succeed in your corporation, a new incentive paradigm must be developed – one that rewards people on the basis of horizontal integration rather than only vertical or
“bottom line” success. As long as a department or division is solely focused on its bottom line, it will see no benefit in changing its business practices to integrate data and business rules
with another department or division. The new incentives should be driven by each group's success in resolving integration issues, developing unified definitions, changing business practices to
conform to the new standards, and so on.

Table 1: Data integration issues

Data Stewards are responsible for the following:

  • Business Naming Standards
  • Standard Entity Definitions
  • Standard Attribute Definitions
  • Business Rules Specification
  • Standard Calculation and Summarization Definitions
  • Entity and Attribute Aliases
  • Data Quality Analyses
  • Sources of Data for the Data Warehouse
  • Data Security Specification
  • Data Retention Criteria
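
As a rough illustration of how a business rule specification and a data quality analysis from this list fit together, the sketch below expresses one rule as a Python function and measures the share of records that satisfy it. The rule, the field names, and the sample rows are all made up for the example.

```python
def rule_pass_rate(records, rule):
    """Return the fraction of records satisfying a business rule (0.0 to 1.0)."""
    if not records:
        raise ValueError("no records to analyze")
    passed = sum(1 for record in records if rule(record))
    return passed / len(records)


def has_valid_customer_id(record):
    """Hypothetical rule: every order must carry a non-empty customer ID."""
    return bool(record.get("customer_id"))


# Made-up sample rows standing in for data warehouse records.
orders = [
    {"order_id": 1, "customer_id": "C-100"},
    {"order_id": 2, "customer_id": ""},
    {"order_id": 3, "customer_id": "C-101"},
    {"order_id": 4},
]

rate = rule_pass_rate(orders, has_valid_customer_id)
print(f"{rate:.0%} of orders pass the customer ID rule")  # 50% of orders pass the customer ID rule
```

Tracking such a pass rate over time gives a Data Steward a simple, repeatable measure of whether data quality in a subject area is improving.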

Table 2: Skill sets needed for Data Stewards

Technical Skill Set:

  • Basic Understanding of Data Modeling (Conceptual, Logical and Physical)
  • Basic Understanding of DBMSs
  • Basic Understanding of Data Warehouse concepts
  • Facilitation Skill
  • Technical Writing

Interpersonal Skill Set:

  • Solid Understanding of the Business
  • Excellent Communication Skills
  • Objectivity
  • Creativity
  • Diplomacy
  • Team Player
  • Well Respected in Their Subject Area

Table 3: Roles/Responsibilities of Data Stewards, Data Administrators and Database Administrators

For the Data Steward:

  • Resolving Data Integration Issues
  • Determining Data Security
  • Documenting Data Definitions, Calculations, Summarizations, etc.
  • Maintaining/Updating Business Rules
  • Analyzing and Improving Data Quality

For the Data Administrators:

  • Translating the Business Rules into Data Models
  • Maintaining Conceptual, Logical and Physical Data Models
  • Assisting in Data Integration Resolution
  • Maintaining the Meta Data Repository

For the Database Administrators:

  • Generating Physical DB Schema
  • Performing Database Tuning
  • Creating Database Backups
  • Planning for Database Capacity
  • Implementing Data Security Requirements

© Copyright 1997 – Intelligent Solutions, Inc.

Claudia Imhoff is the founder of Intelligent Solutions, Inc. (http://www.intelsols.com), a consulting company specializing in data management. She has co-authored two books with Bill Inmon,
“Building the Operational Data Store” and “The Corporate Information Factory”, both published by John Wiley & Sons, and is a columnist for several magazines. She is a frequent
speaker at national and international conferences on the topics of the Corporate Information Factory and the Information Ecosystem.

Claudia Imhoff

A thought leader, visionary, and practitioner, Claudia Imhoff, Ph.D., is an internationally recognized expert on analytics, business intelligence, and the architectures to support these initiatives. Dr. Imhoff has co-authored five books on these subjects and writes articles (totaling more than 150) for technical and business magazines.

She is also the Founder of the Boulder BI Brain Trust, a consortium of independent analysts and consultants (www.BBBT.us). You can follow them on Twitter at #BBBT
