Architecture Made Easy, Part 11
Data Governance: The New Philosophy of Data Governance
Published: September 1, 2010
Jim Luisi and Sean Kimball look at the benefits of having an appropriate balance of architecture and governance.
Architecture is the first step toward organizing and simplifying anything complex, and governance is essential for managing and controlling either development or change. While it is obvious that we need architecture and governance, which architecture and what form of governance is not at all obvious, and the wrong combination can result in failure.
Determining the optimal architecture and the ideal form of governance, however, is not easy. If it were, everyone would be doing it and reaping the many benefits. But exactly what are those benefits?
The Benefits of Balance

When the appropriate balance of architecture and governance is achieved, the business is rewarded with a competitive advantage.
Moreover, if transparency of data and its associated metadata is not at the core of the solution, you need to start over completely, because all the councils and committees in the world will do nothing but harm your ability to conduct business, and your regulatory violations will never subside, never mind cease. To help understand why, let’s take a quick look at where it all started.
Brief History of Data Governance

Before it was ever called “data governance,” much of the industry had relatively good data governance. Early on, data was centralized on the mainframe and often organized into tidy subject areas, such as customer, vendor, and contract master files. Data was generally secure, relatively clean and accurate, and business users often kept paper copies of their data to do with as they wanted or needed.
For example, securities traders and brokers kept their own customer and contact lists, transaction journals, and portfolio positions, because their livelihood fully depended upon that information. If they lost their job, got a better offer down the road, or if the computer lost their data, the prepared business user would survive.
With data volumes rapidly expanding, mid-range computers for departmental computing emerged to co-exist with mainframes that housed the books and records of the enterprise.
However, for the first time it became easy for business users to keep their own electronic copies of their customer and contact lists, transaction journals, and portfolio positions redundantly on departmental computers, so that they could rapidly meet their needs in a more localized environment. Copies of master files proliferated across the computing landscape as data governance entered the dark ages of information architecture. As many IT departments tried to lock down uncontrolled access to data, the situation continued to spiral out of control, only to get worse with the advent of the personal computer and Windows servers.
With the advent of desktop computers and servers that could fit under desks or in closets, the business continued to do what it needed to do, including copying production data to portable thumb drives to safeguard it. Business users located the most reliable data they could among the myriad of redundant data sources and continued to conduct business as best they could.
History shows that no matter how much IT may attempt to lock down data access across any of the three tiers of computing platforms, the business will continue to get around every measure to meet its needs for data and information. Data governance councils and architectural committees are merely momentary obstacles as business users find their way around those controls. And for the record, thank goodness they do, because the moment business users are kept away from their own data, the company will invariably go out of business.
The New Philosophy of Data Governance

When we finally realize that the ingenuity of business users in overcoming any obstacle placed between them and their data is almost without limit, and that if IT could successfully separate business users from their data the result would be fatal to the business, a viable solution becomes possible.
The first principle of the new data governance philosophy is that IT must do everything it can to empower the business users to achieve relatively unfettered access to reliable data in a self-serve model.
Yes, IT power brokers, middlemen, and control freaks will go nuts, and there will be a hundred excuses for every suggestion that might support this principle; but as difficult as it may be for some to accept, embracing the new philosophy is the only way to give IT a chance of meeting the myriad of regulatory rules.
That said, empowering business users does not mean that we should hand over the keys to the kingdom including all of the applications, compilers and databases. In fact, that would not empower the business at all. Instead, according to this new philosophy, empowering business users means putting an infrastructure in place that makes it easy for business users to get to the most reliable data conveniently on their own. Many within IT will claim that this is simply not possible, and they will most assuredly point to the complex tools and data landscape as proof.
Necessity is the Mother of Invention

There are those rare moments when IT can demonstrate a level of ingenuity that rivals that of business users. When presenting IT with a complex challenge, one would hope that someone among those expensive, highly experienced, and educated IT individuals could come up with something, even if it means looking at what their kids are doing in their Internet browsers. One such example is “mashups.”
Mashups1 are a type of web application that combines content from more than one source to create something new that can be displayed or listened to, and they can be assembled simply by dragging and dropping two components together with the mouse.
The term “mashup” originally comes from pop music, where the music from one song is seamlessly combined with the vocal track from another to create something new that was mashed together. This does not mean that users will be able to set their production data to their favorite tune, but they will be able to pair it with their favorite form of visual illustration.
Now you may ask what mashups have to do with empowering business users to self-serve their own ad hoc reporting needs, never mind data governance, and the answer is surprisingly simple.
Mashups and Data Governance

Business mashups built from data across the enterprise are easy to define and manage: the user simply drags a “data mashup” onto a “widget.”
The component referred to as the “data mashup” is a query that renders a particular set of fields available for use in an ad hoc report. The component referred to as a “widget” is simply a prefabricated form to provide data visualization, such as a list, chart, bar graph, pie chart, or map.
When a business user combines a data mashup with a widget, they link the fields of interest from the data mashup to the fields of the widget used to create the display; the result is called a mashup. They are easy enough to create that our kids assemble them on their laptops, smart phones, and iPads.
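The combination described above can be sketched in a few lines of code. This is a hedged illustration only: the function names, the "positions" data, and the field-linking scheme are invented for the example and do not reflect any real mashup platform's API.

```python
# Hypothetical sketch: a "data mashup" (a query exposing a fixed set of
# fields) combined with a "widget" (a prefabricated visualization) to
# form a mashup. All names here are illustrative assumptions.

def data_mashup(rows, fields):
    """A data mashup: a query that renders a particular set of fields."""
    return [{f: row[f] for f in fields} for row in rows]

def list_widget(records, field_map):
    """A trivial 'list' widget: renders records via a field-to-column mapping."""
    header = list(field_map.values())
    body = [[rec[src] for src in field_map] for rec in records]
    return [header] + body

# Source data, e.g. from a governed, virtualized data source (invented).
positions = [
    {"cusip": "037833100", "qty": 500, "desk": "equities"},
    {"cusip": "594918104", "qty": 200, "desk": "equities"},
]

# The business user drags the data mashup onto the widget and links fields.
mashup = data_mashup(positions, fields=["cusip", "qty"])
report = list_widget(mashup, field_map={"cusip": "Security", "qty": "Quantity"})
for row in report:
    print(row)
```

The point of the separation is that IT governs what `data_mashup` may expose, while the business user freely chooses and links the widget.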
Under the new data governance philosophy, the role of IT is to facilitate and encourage mashup creation by business users while non-intrusively managing the selection of the reliable data sources, minimizing the replication of data, and providing data security, privacy, and reliable response time performance.
Business Approach to Data Governance

With the proliferation of data across a wide array of mid-range UNIX and Windows-based servers, not to mention personal computers, the sheer number of databases and database columns has become staggering. Many large companies have thousands of databases, plus tens of thousands of MS Access databases and spreadsheets.
Any approach that starts from the many columns across the data landscape, such as reverse engineering each database, is almost certain to fail simply because of the volume of database columns that must be manually analyzed to determine each field’s business meaning and, depending upon the sparseness and quality of the data content, the business value it represents to business users.
An easier approach, one that gives an organization a fighting chance, is to begin with the fewest number of parts from a top-down perspective. This means starting from a logical data architecture2, which identifies data subject areas, and conceptual data models that visually illustrate the business data glossary in a meaningful and useful way.
As an alternative, one may also begin with a simple business data glossary that collects a basic set of “business” metadata about each business field, which is much more useful than what one often finds inside a CASE tool.
For example, a typical CASE tool data dictionary will feature a field name (e.g., coupon rate), an abbreviation (e.g., CPN_RT), and a field description that is often without any value to anyone (e.g., A coupon rate is the rate of a coupon.). In comparison, a proper business data glossary captures substantially richer business metadata.
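To make the contrast concrete, here is a hedged sketch of what a business data glossary entry might carry beyond the thin CASE-tool record. The particular attribute set (steward, subject area, valid values) is an assumption about typical glossary content, not a prescription from the article.

```python
# Hypothetical business data glossary entry. The attributes chosen here
# are illustrative assumptions about useful "business" metadata.
from dataclasses import dataclass

@dataclass
class GlossaryEntry:
    field_name: str            # business-friendly name, e.g. "coupon rate"
    abbreviation: str          # physical abbreviation, e.g. "CPN_RT"
    business_definition: str   # a definition a business user can act on
    data_steward: str          # who to ask about this field
    subject_area: str          # e.g. "securities master"
    valid_values: str = ""     # domain, units, or range where applicable

coupon_rate = GlossaryEntry(
    field_name="coupon rate",
    abbreviation="CPN_RT",
    business_definition=(
        "Annual interest paid by a bond, expressed as a percentage "
        "of its face value."
    ),
    data_steward="fixed-income data steward",
    subject_area="securities master",
    valid_values="0.0 to 25.0 percent",
)
print(coupon_rate.field_name, "->", coupon_rate.abbreviation)
```

Compare the `business_definition` above with the circular CASE-tool description ("A coupon rate is the rate of a coupon"): the glossary entry tells a business user what the field means and whom to ask about it.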
When managed properly as a business data glossary, the business data glossary field names can guide business users to the data mashups that meet their business need, or at a minimum it can help business users to communicate exactly what the business needs are to someone in IT who is helping them establish the necessary data mashups.
Leveraging Enterprise Standards

At a more detailed level, there is a wide array of specialized data-related disciplines, such as: risk management; data stewardship; regulatory compliance; data quality; data security; test data generation; data encryption; data masking; data accessibility; master data management; data virtualization; taxonomy; ontology; data archival; data administration; data abstraction; data normalization and de-normalization; logical and physical data architectures; current state architecture; future state architecture; data frameworks; data blueprints; reference architectures; terminal services architectures; conceptual, logical and physical data models; enterprise service bus architecture; ETL architecture; XML; various types of files and databases; access methods; referential integrity; operational data stores; data warehouses; data marts; star schemas and snowflakes; data mining; data visualization; data analytics; statistical and non-statistical data analysis; and business intelligence.
Governed by the standards from the enterprise, these disciplines help put the appropriate processes and standards in place to ensure the appropriate use, care, protection, and accessibility of data.
Often there is an enterprise information architect who is usually responsible for developing a well formed set of frameworks, standards, and data architectures that are useful across the enterprise.
These enterprise artifacts may then be adopted, and often further detailed, by each line of business to facilitate integration across the various lines of business. Once adopted, capabilities such as business intelligence, data analysis, and data visualization involving cross-area business data become attainable with much less effort and expense than would otherwise be possible.
A common approach for promoting the artifacts of the enterprise is to place responsibility on the enterprise information architect to sell their ideas, as opposed to imposing them. The enterprise information architect then often assists each line of business in adapting the artifacts to meet its business priorities.
Top Down from Business and Enterprise

While data governance artifacts such as frameworks, standards, and logical data architectures flow top-down from the enterprise, the creation of the business data glossary flows down from the business.
The subject-matter experts in the business, sometimes referred to as “data stewards,” advocate for their business users within their area of expertise. As advocates for the business, data stewards help business users get in touch with data that can provide valuable insight into the books and records of the business and its operational workflows.
At the lowest level, data stewards take control of business data glossary field names, ensuring that they are clear and unambiguous among the many other data fields that also exist in that area of the business. In addition, information discovered by a data steward during research should be easily recorded and readily accessible to the business users they advocate for.
Data Governance Portal

A data governance portal is the window into the data assets of each business area across the enterprise. It contains the data that is present within the business and the results of the ongoing research conducted by the data steward.
Imagine a Google-like interface that business users could use to search for business data glossary field entries, mashup reports, and mashup components that they could easily assemble or use directly from within the data governance portal.3
As an example, if someone were looking for a data mashup for ad hoc reporting that compares LIBOR to thirty-year fixed jumbo rates, the list of data mashups involving those business data glossary fields would be displayed with their business definitions and associated business metadata.
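The lookup described above amounts to matching a user's search terms against the glossary fields each data mashup exposes. The following is a minimal sketch under that assumption; the catalog, mashup names, and matching rule are all invented for illustration.

```python
# Hypothetical portal search: find data mashups whose business data
# glossary fields mention every term the user typed. Catalog is invented.
mashup_catalog = [
    {"name": "Rate comparison",
     "fields": {"libor rate", "30-year fixed jumbo rate"}},
    {"name": "Desk positions",
     "fields": {"cusip", "quantity", "desk"}},
]

def search_mashups(terms, catalog):
    """Return names of mashups whose glossary fields contain all terms."""
    terms = [t.lower() for t in terms]
    hits = []
    for m in catalog:
        blob = " ".join(m["fields"])        # flatten field names for matching
        if all(t in blob for t in terms):
            hits.append(m["name"])
    return hits

print(search_mashups(["libor", "jumbo"], mashup_catalog))
```

A production portal would of course return the full business definitions and metadata alongside each hit, as the article describes; the sketch only shows the matching step.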
As such, the data governance portal would provide business users with an integrated self-service ad hoc reporting capability that would simultaneously allow IT to support the regulatory compliance needs of the corporation.
In the event that the business user requires data not already supported by existing data mashups, the data governance portal could route the inquiry to the data steward responsible for that area of data. Once researched and created, the incremental information would then be available on the portal to business users simply for the asking.
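That routing-and-caching behavior can be sketched in a few lines. Everything here is hypothetical: the steward directory, the subject areas, and the idea of keying the knowledge base by question text are assumptions made for the example.

```python
# Hypothetical sketch: the portal routes an unmet data request to the
# data steward for that subject area, then caches the result so later
# users get the answer "simply for the asking". All names are invented.
stewards = {"rates": "rates-steward", "positions": "positions-steward"}
answered = {}   # portal knowledge base of previously researched requests

def request_data(subject_area, question):
    if question in answered:
        return answered[question]             # already researched: self-serve
    # Unknown area falls back to the enterprise information architect.
    steward = stewards.get(subject_area, "enterprise information architect")
    answer = f"routed to {steward}"           # steward researches, publishes
    answered[question] = answer
    return answer

print(request_data("rates", "LIBOR vs jumbo spread"))   # first time: routed
print(request_data("rates", "LIBOR vs jumbo spread"))   # thereafter: cached
```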
Implementation Tips

When architecting a data governance portal and mashup capability, you will probably need to leverage data virtualization to help stay aligned with your logical data architecture.
In the most detailed representation of your logical data architecture, each subject area of data is then expanded into a conceptual data model that business users and IT staff can easily relate to.
Even after having accomplished this ease of understanding, a number of challenges are likely to exist, particularly when the current environment is laden with data quality issues.
In such a situation, there is a need for a good data virtualization4 strategy. A suitable data virtualization strategy will save large amounts of rework within mashups and other forms of reporting, as it can centrally address data scrubbing, value standardization, and consistent formats across the data landscape.
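The central-scrubbing idea can be illustrated with a toy virtualization layer. The source systems, their data-quality quirks, and the cleanup rules below are invented for the example; a real strategy would use a virtualization product, not hand-written Python.

```python
# Hypothetical data virtualization layer: one logical "customers" view
# that scrubs and standardizes values from redundant physical sources,
# so every mashup sees one clean form. Sources and quirks are invented.
SOURCES = {
    "legacy_mainframe": [{"cust": " ACME CORP ", "state": "N.Y."}],
    "dept_server":      [{"cust": "Acme Corp",   "state": "NY"}],
}

STATE_CODES = {"N.Y.": "NY", "NY": "NY"}   # value standardization table

def virtual_customers():
    """Scrub whitespace, standardize case and state codes, drop duplicates."""
    seen, out = set(), []
    for rows in SOURCES.values():
        for row in rows:
            rec = {
                "customer": row["cust"].strip().title(),
                "state": STATE_CODES.get(row["state"], row["state"]),
            }
            key = (rec["customer"], rec["state"])
            if key not in seen:             # collapse redundant copies
                seen.add(key)
                out.append(rec)
    return out

print(virtual_customers())
```

Because the scrubbing lives in one place, every mashup built on the virtual view inherits the cleanup for free, which is exactly the rework savings the strategy aims for.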
To make matters even more challenging, let’s say that the goal of your organization is to migrate to a future state environment.
What is particularly useful is that a good data virtualization approach can render the data in a form that business users can more readily relate to from the perspective of your business data glossary and/or logical data architecture. The logical data architecture, in fact, should be an illustration of your future state data architecture.
As you build an inventory of mashups, it is particularly useful to protect that investment by insulating your growing inventory of data mashups and other reports from the migration of your data landscape toward your future data architecture.
Perhaps most important to meeting the needs of your business and your regulatory requirements is an efficient process for supporting this new philosophy of data governance, as it can go a long way toward eliminating the desire to build MS Access applications and Excel spreadsheets for ad hoc reporting across the business community.
Summary

Proper data governance should eliminate layers of process, not add them.
When optimized, the business and IT community alike gain two basic capabilities.
Depending upon the extent to which your industry is regulated and what the plausible financial damages and fines could be, not to mention the potential damage to brand reputation, operating either without data governance or with the wrong form of data governance may be construed as irresponsible. Without the right data governance, the interests of the various stakeholders across the enterprise and regulatory bodies will not be represented.
Please don’t hesitate to let me know which articles in the “Architecture Made Easy” series are useful to your organization. In addition, corrections, enhancements, and suggestions are always welcome and are requested.
Sean Kimball - Sean has twenty-five years of experience within the largest financial conglomerates. He was formerly the Chief Enterprise Architect for a large U.S.-based financial conglomerate and is among the most innovative executives to be found speaking at select industry conferences.
James Luisi - Jim has thirty years of experience in information architecture, architecture, and governance within control and information systems in the financial and defense industries; more information is available on LinkedIn.com. Feel free to send him a link. Jim is an author, conference speaker, and enterprise architect at a large financial conglomerate in the New York area.