
The Role and ROI of Enterprise Schema Management
Published: July 1, 2003
Published in TDAN.com July 2006
Most organizations are looking for better ways to share information, cut costs, increase responsiveness, improve data quality and reduce risk. With increasing demand to link disparate data systems and categorize items uniformly, 21st-century analysts and developers need an enterprise-wide map of what information is available, who is responsible for it and the detailed structure of the data definitions. Metadata is a part of the solution, along with a generalized, centralized system to define, manage and share structural and taxonomic schemas.
A schema describes the logical organization of any information resource, including the structure and definition of data and metadata. Typical examples include:
A schema also defines the legal values, terminology and rules that constrain information in order to improve consistency and data integrity. Because each situation starts with different requirements, people use somewhat different ways of defining data (structure, metadata, taxonomy, semantics, etc.), which hinder cross-system information integration and retrieval. Unique and changing data definitions within “islands of information” clearly impede information flow across an enterprise.
Where does one currently find the “enterprise schema standards”? What are the various schemas in use? Who are the stakeholders and stewards of data? What technical, policy and political problems might arise if I make changes to my data definitions or schema?
The process of defining and maintaining a centralized repository or registry of enterprise schemas is called Enterprise Schema Management (ESM). The justification for Enterprise Schema Management begins with two key insights:
Shared definitions of schema and metadata drive configuration, mapping and synchronization processes that ensure fast integration, continuous interoperability and frictionless information access.
With the increasing need for interoperability, information sharing and IT responsiveness, and with flat headcount at best, efficiency is more important than ever. Doing things the same old way will create no advantage. Most enterprises can easily remove costs and improve availability of information using ESM. What's more, most large companies have duplicate efforts underway and don't even know it. Perhaps more dangerous, much stored information is simply unavailable because it is organized inconsistently or stove-piped within the application in which it was created.
The fact that information is organized, structured and described differently in each information source makes interoperability, cross-system retrieval and information sharing a struggle. This remains true even after years of investment in extract, transform and load (ETL), data integration, application integration and federated search.
To answer the question “So what?”, let's consider the business value of Enterprise Schema Management. While many factors contribute to ROI, let's review five primary business imperatives:
These business reasons can provide the financial motivation to move forward with ESM. Doing so involves leadership, process and technology, along with centralization, reconciliation, impact analysis and change management.
A registry of what information is available, how it is used (structure and semantics) and who its stewards and stakeholders are is remarkably absent from most organizations. Where does one currently find the “enterprise schema standards”? What are the various schemas in use? Who are the stakeholders and stewards of data? What technical and political problems might arise if I make changes to my data definitions or schema? Without a reference point, how can one compare to the authoritative source? These questions are asked every day by people working with schema and metadata during system integration and information retrieval projects.
The larger the organization, the more likely that communication gaps exist, and the more likely that overlapping projects, even conflicting projects, are actively producing information that goes unseen. The expense of duplicate effort is hidden from top management, but that doesn't mean it is acceptable, given our endless quest to reduce cost.
Typical users of ESM include information architects, data managers, subject matter experts, software developers and application integrators. Key tasks include:
Import: gathering schema, including metadata definitions, taxonomy and vocabulary
Process: assignment of stewardship, stakeholders, definition of relationships, reconciliation of diverse viewpoints, analysis of impact of change, and change management
Export: making schema available via search and automated synchronization of subscribing systems
Reconciliation of Disparate Systems
Enterprise Schema Management is used to reconcile the relationships and different meanings used by various applications.
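As a rough illustration of what reconciliation records might look like inside a schema repository, the sketch below maps each system's local term to a shared concept and flags terms that mean different things in different systems. The system names, concept names and the `TermMapping` structure are all hypothetical, chosen only for this example; they are not taken from any particular ESM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermMapping:
    system: str      # hypothetical system name
    local_term: str  # term as it appears in that system's schema
    concept: str     # shared concept the term has been reconciled to


# Hypothetical reconciliation results for three systems.
MAPPINGS = [
    TermMapping("billing", "Price", "unit_sale_price"),
    TermMapping("catalog", "Cost", "unit_sale_price"),
    TermMapping("procurement", "Cost", "unit_purchase_cost"),
]


def concepts_for(term: str) -> dict[str, str]:
    """Return {system: concept} for every system that uses the local term."""
    return {m.system: m.concept for m in MAPPINGS if m.local_term == term}


def is_ambiguous(term: str) -> bool:
    """A term is ambiguous when systems map it to different concepts."""
    return len(set(concepts_for(term).values())) > 1


def synonyms(concept: str) -> set[tuple[str, str]]:
    """Return every (system, local term) pair reconciled to one concept."""
    return {(m.system, m.local_term) for m in MAPPINGS if m.concept == concept}
```

In this toy data, `is_ambiguous("Cost")` is true because the two systems using that word mean different things by it, while `synonyms("unit_sale_price")` shows that one system's "Price" and another's "Cost" refer to the same concept. Documenting the mapping, rather than forcing a rename, is the design choice being illustrated.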
The semantics must be described and understood by others, because the word “State” could also be associated with concepts including “completed” or “restless”. “Cost” and “Price” could mean the same thing or very different things within various systems that are supposed to share information or reduce redundancy.
This reconciliation or mediation process can result in consistency along a continuum. On one end is extreme standardization, in which all systems use the same hierarchy (structure, taxonomy) and vocabulary (terminology). This is generally impractical for two reasons: 1) disruption and 2) loss of information fidelity. At the other end, total independence (developers creating self-serving, but perhaps arbitrary schema) sets up an environment in which redundancy and inconsistency add cost and interfere with information interchange.
Experience tells us that support for varied and shifting viewpoints is essential for effective information production and knowledge management. Enterprise Schema Management does not command absolute conformance, because mappings and alternatives can be documented within the enterprise schema repository. Indeed, flexibility is important in order to support varied viewpoints, which are sometimes needed to maintain the fidelity of information. What is required is the registry and documentation of interdependencies, so that when anyone makes a change, interconnected systems are aware, ideally before any disruptive impact.
Impact Analysis
Once the schema from interconnected (or potentially interconnected) systems are imported and reconciled, relationships are mapped and stewards and stakeholders are identified, you have vastly simplified the integration and retrieval process. However, requirements change, which may impact this now carefully constructed network of interrelated schema. Given change, Enterprise Schema Management systems generally include some sort of impact analysis.
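One way to picture impact analysis is as a walk over the interdependencies recorded in the registry. The sketch below is a minimal illustration under assumed data: the item identifiers, the `DEPENDENTS` and `STEWARDS` tables and the steward names are hypothetical, and a real ESM system would hold far richer relationships.

```python
from collections import deque

# Hypothetical registry data: each schema item maps to the items or systems
# that consume it directly.
DEPENDENTS = {
    "vocab:us_states": ["element:State"],
    "element:State": ["schema:Address"],
    "schema:Address": ["system:crm", "system:shipping"],
    "element:Price": ["system:billing"],
}

# Hypothetical stewardship assignments.
STEWARDS = {
    "schema:Address": "alice",
    "system:crm": "alice",
    "system:shipping": "bob",
    "system:billing": "carol",
}


def impacted_by(changed: str) -> list[str]:
    """Breadth-first walk returning everything downstream of a changed item."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        item = queue.popleft()
        for dependent in DEPENDENTS.get(item, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


def stewards_to_notify(changed: str) -> set[str]:
    """Collect the stewards of every impacted item that has one assigned."""
    return {STEWARDS[i] for i in impacted_by(changed) if i in STEWARDS}
```

With this data, a proposed change to the `vocab:us_states` vocabulary reaches the `State` element, the `Address` schema and two systems, so two stewards would be asked to review it before it takes effect.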
Impact analysis can reveal which people and systems are affected, at a granular level, by any potential change. The key, of course, is understanding and resolving impacts before the change occurs. This way, interdependency does not mean vulnerability. Using various types of collaboration, consensus rules, voting, etc., impacted individuals can discover, resolve and approve changes. Using the Web and email, rather than requiring “yet another meeting”, accelerates the resolution process by enabling solutions and approvals without requiring everyone to get into one room at the same time.
The process of Enterprise Schema Management essentially captures knowledge to aid in information sharing. Capturing schema (structure and semantics) within an enterprise repository also minimizes the loss of information that employee turnover causes. (Ever been dependent on an individual because only they knew the internals of a system?)
Change Management
Streamlining the process of change management via distance collaboration can chop weeks from integration schedules. By creating an enterprise schema repository, you move the know-how from people's heads and their personal workspaces into a shared, clearly documented view of enterprise information. What terminology and structure does each use? What are the relationships among distributed information assets? Are there gaps, overlaps or conflicts? This process is difficult enough when doing point-to-point integration, but becomes impossible to reconcile and manage over time when the goal is integrating an evolving network of interconnected enterprise applications, databases and content management systems.
XML Schema is not enough
While XML and XML Schema would seem to make things better, they can also fuel the flame of independent, potentially arbitrary definitions and rework when re-use and interoperability are the objective.
XML Schemas include definitions for the legal names of elements, valid attributes and vocabulary, along with the specific hierarchical structure of an XML document. To be valid, an XML document must conform to the specifications or schema. An XML Schema also includes metadata describing the purpose or meaning of the values. For example, when describing a location, “WA” is the information (data value) and “State” is the metadata. Any valid XML document conforms to a defined structure or schema, which defines legal elements, attributes and vocabulary.
Simply using XML Schemas does not ensure interoperability. While an XML document is self-describing and structurally consistent, one cannot be sure of the meaning or semantics represented by perfectly legal elements and attributes. Information exchange requires a shared understanding by source and receiver. Here are some quotes to illustrate the point:
“Communication cannot occur unless there is a shared context for communication…through semantic mediation, which now ranks as one of the most complex and important issue,” according to IDC in a June 2002 report.
“Anyone building any kind of software these days needs to build in the capability to communicate not just physically, but semantically with the rest of the world,” wrote Esther Dyson in Release 1.0, in February 2003.
In addition to the interoperability risks associated with unmanaged sprouting of XML Schema, there is the obvious waste of time when developers “re-invent the wheel” to define structure and taxonomy that already exists or can be re-used. Removing this hidden cost of redundant development (and subsequent integration) provides a huge ROI for enterprise schema management.
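To make the gap between structure and meaning concrete, the sketch below parses two small, well-formed XML fragments that both carry a `State` element. The fragments, the root element names and the `SEMANTICS` lookup table are hypothetical, invented for this illustration; the code checks only well-formedness with Python's standard library and performs no XML Schema validation.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents from different systems; both use <State>.
ORDER_DOC = "<Order><State>completed</State></Order>"
ADDRESS_DOC = "<Address><State>WA</State></Address>"

# Hypothetical semantic registry: (root element, child element) -> concept.
# This is the knowledge that the XML structure alone does not carry.
SEMANTICS = {
    ("Order", "State"): "workflow_status",
    ("Address", "State"): "us_state_code",
}


def state_concept(xml_text: str) -> tuple[str, str]:
    """Return the <State> value and the concept the registry assigns to it."""
    root = ET.fromstring(xml_text)   # raises ParseError if not well-formed
    value = root.findtext("State")   # same element name in both documents
    return value, SEMANTICS[(root.tag, "State")]
```

Both fragments parse cleanly and expose the same element name, yet the registry lookup shows one `State` is a workflow status and the other a location code. A receiver that relied on the element name alone could not tell them apart.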
The META Group wrote, in May 2002, “Most corporations…have not recognized that they are about to have a management problem, as the number of XML Schemas and DTDs (Document Type Definitions) they must deal with grows out of control.” A big motivation for XML is reuse and standardization of artifacts and schemas, but without Enterprise Schema Management, reuse is hard to achieve. Perhaps worse, people may think XML enables systems to interoperate, but one system may have a different use of terminology or a different interpretation of the data being exchanged. Trouble starts if the semantic meaning of that data was not understood or delineated within the data exchange process. Given that current specifications do not prescribe a semantic solution beyond providing narrative information within the WSDL file or associated XML Schema, the schema repository has a key role to play.
Successfully integrating and sharing information from across the enterprise requires an enterprise perspective, and the enterprise is not now, nor at any point in the near future, going to be 100% XML. While XML-based documents are increasingly pervasive within the modern enterprise, critical information is also stored and retrieved within relational databases. In this situation, schema is used to define the columns and data types within the tables of a database. Legacy applications generally have their own templates not based on XML Schema. Some systems still use DTDs (Document Type Definitions), as used with SGML.
Given the heterogeneous nature of enterprise information, a cross-system, language-independent perspective is required when modeling enterprise schema. This is particularly true given the business imperative to integrate structured and unstructured information from disparate sources. A generalized schema model is necessary to find overlaps, gaps and conflicts.
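A minimal sketch of such a generalized model follows, assuming each source (a relational table or an XML schema) has already been described as a list of fields tagged with a shared concept and a normalized data type. The `SchemaField` structure, the two sample sources and the `compare` function are hypothetical illustrations, not a description of any specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaField:
    concept: str    # shared meaning assigned during reconciliation
    name: str       # local name in the source system
    data_type: str  # normalized, language-independent type


# Hypothetical sources expressed in one neutral model.
CRM_TABLE = [   # derived from a relational table definition
    SchemaField("customer_name", "CUST_NM", "string"),
    SchemaField("us_state_code", "STATE", "string"),
    SchemaField("unit_sale_price", "PRICE", "decimal"),
]
ORDER_XML = [   # derived from an XML schema
    SchemaField("customer_name", "CustomerName", "string"),
    SchemaField("unit_sale_price", "Price", "string"),
    SchemaField("order_date", "OrderDate", "date"),
]


def compare(a: list[SchemaField], b: list[SchemaField]) -> dict[str, list[str]]:
    """Classify concepts as overlaps, conflicts or gaps between two sources."""
    by_a = {f.concept: f for f in a}
    by_b = {f.concept: f for f in b}
    shared = by_a.keys() & by_b.keys()
    return {
        # same concept, same normalized type: a candidate for reuse
        "overlaps": sorted(c for c in shared if by_a[c].data_type == by_b[c].data_type),
        # same concept, different type: needs mediation
        "conflicts": sorted(c for c in shared if by_a[c].data_type != by_b[c].data_type),
        # concept present in only one of the two sources
        "gaps": sorted(by_a.keys() ^ by_b.keys()),
    }
```

Because the comparison is keyed on the shared concept rather than the local name, `CUST_NM` and `CustomerName` are recognized as an overlap even though their syntax differs, while the price field surfaces as a conflict because the two sources type it differently.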
Reuse and interoperability require the ability to distinguish “like” from “unlike” and to establish relationships based on meaning and intent rather than syntax and structure.
Now that we have reviewed the driving forces for ESM, we can turn to key characteristics of any ESM solution:
1) Comprehensive: Manages both "Structural" Schema and Taxonomic Schema (elements, classes and vocabularies, terms and vocabulary views)
Schema standards and existing schema editing tools tend to focus on either structural schema definitions (simple scalar and complex data element definitions, e.g. Dublin Core, ebXML) or taxonomic schemas (controlled vocabularies such as UN Geography or Getty vocabularies). Complete specifications of enterprise schemas involve both. An integrated system must address the domain specification and classification mechanisms of vocabularies and the overlying data definitions of all of the string, date, numeric and vocabulary-type elements and how they are combined into complex structures.
2) Flexible: Respects the inevitability of diversity and heterogeneity within the standards management process.
Despite the larger organizational need for structural data standards, for a variety of technical, functional or cultural reasons, complete uniformity is not usually practical or desirable. Schema diversity is caused by a variety of things:
A workable ESM system must allow for the management of disparate schema structures while promoting consistency and evolution toward an evolving enterprise standard, without squelching adoption with an all-or-nothing mandate.
3) Consensus-Managed: Implements guarantees to stakeholders that schema definition entities placed into the shared domain will be managed in such a way that their interests will be accounted for.
Stakeholders will be hesitant to adopt and support a standards methodology that does not ensure that their systems architectures will not be adversely affected. If the definitions of elements, vocabularies or schema structures that were previously locally maintained are modified without notice or approval and subsequently implemented or mandated in their systems, their business or technical processes may break. An impact analysis, voting and consensus-enforcing mechanism must be part of a well-designed ESM system so that the “schema donors” are assured of continued shared control of the now-shared schema assets.
4) Culturally Responsive: Allows change management processes to be customized and "tweaked" at all levels of the organizational tree.
Not all organizations are the same. Some have a very top-down management structure and others are collaborative to the point of orchestrated chaos. The ESM system will influence the culture but should be responsive to the natural business and social culture of the enterprise it serves. In some cases, organizations have hybrid mosaics of top-down and bottom-up collaborative styles. Schema change management processes, an essential part of ESM, should allow the appropriate style and rules to be established for the enterprise and its sub-divisions.
5) Granular/Modular: Implements schema asset management, change management and permission to a highly granular level to allow maximum reusability and distributed management with the appropriate level of organizational security and control.
XML Schema standards such as DTD and XSD provide a useful baseline model for defining schemas, particularly for on-the-wire interchange of data packets. Regardless of which standard is agreed upon, any mechanism that makes a complete schema document the atomic unit of management, ownership and workflow is critically limited in flexibility and reusability. The ideal ESM system manages not a list of versioned documents but a living network of referential schema definition objects, which can be reused, combined and managed in rich ways. Consequently, the power of object-oriented inheritance, modularity, granular permission control, history logging, impact analysis and change control can be brought to bear to simplify the challenge of managing a complex and massive enterprise schema.
6) Lifecycle/Evolutionary: ESM is a full-lifecycle system that not only sets the standard but makes it practical and workable to keep the standards current.
Establishing an enterprise schema is a daunting challenge that few organizations have yet met. But more difficult still may be the discipline of keeping the standard current through the large and small changes necessitated by reorganizations, changing business needs, ongoing analysis and mergers and acquisitions. An ESM system should keep stakeholders engaged and involved in the day-to-day process of tracking and resolution by keeping the “noise” level of irrelevant distractions down, informing every appropriate stakeholder automatically when changes affect them and enabling the most agile change process possible with a simple, repeatable online process that reduces unnecessary committee administrivia.
7) Humanistic (vs. strictly technical): Schema Standards should include not only technical structure definitions but also human-readable labeling and descriptive information, along with the validation and display tips necessary to drive client interactive functions.
Enterprise Schema Management systems should be designed with the understanding that knowledge domain experts aren’t always programmers, but should still be full partners in the schema management and development process. As such, the terms and techniques for modeling schemas must be accessible to anyone who can administer a typical content management system or manage a taxonomy. Furthermore, an ESM system should support globalized representations, including the language, date-time and currency conventions that are present in large, global organizations. Finally, schema is more than a machine-readable standard; it is also a standard for human-mediated data management and consumption. The Enterprise Schema must include the ability to manage labeling, descriptive and display-hint information for schema definitions, which can be propagated across all impacted systems and languages so that each user in the information value chain accurately understands the meaning and intent of the data they are viewing.
8) Actionable: Schema Standards should be described in such a standard, detailed and consistent manner that the implementation and enforcement of those standards can usually be accomplished automatically through a standard implementation infrastructure or methodology.
Rather than collecting dust as a “study”, an enterprise-wide schema or taxonomy map should be living and connected to impacted systems. The ideal ESM contains a detailed, consistent, thoroughly cross-referenced specification of the Enterprise Schema that is accessible online by all systems regardless of architecture or language. Through a standardized API, process or infrastructure, all systems under management by the ESM should be configurable, through a repeatable subscription process, to those parts of the enterprise schema that apply to that system.
Note: Systems that have no practical programmatic schema administration interface (e.g.
one-off custom business applications) must be updated by developers to the specification implicit in the schema components to which they subscribe and which the system stakeholders are involved in collaboratively maintaining.
CONCLUSION
Why is Enterprise Schema Management worth the trouble? Consider the goals: interoperability, information sharing and cost reduction. Aren’t these critical objectives for both IT and business management? Indeed, consider what happens if you do not implement Enterprise Schema Management: information overload continues to get worse. Redundant data and redundant effort continue to drain profit from your enterprise. And spending on information integration continues to consume perhaps 70% of your IT resources, handicapping your ability to respond to new business requirements.
Just one factor, the reuse of “core” schemas, provides tremendous cost justification for ESM by increasing programmer productivity, speeding the completion of projects and ensuring interoperability. When you add another aspect, the automatic propagation of agreed-upon changes to all subscribing systems, it is easy to imagine proclaiming that you have discovered a way to decrease spending while improving information sharing.
Once you consider the schemas used across your enterprise as an information asset, critical to integration and an untapped source of cost savings, you’ll discover that Enterprise Schema Management is critically important to achieving the goals of the IT team and business management. Defining the structure of information using schemas is nothing new. But gathering together the schemas used among interconnected applications in an enterprise schema repository has become a source of competitive advantage for pioneers in the field of information management.
Copyright SchemaLogic © 2003
Peter Hallett - Peter Hallett, VP of Marketing at SchemaLogic, has worked in business intelligence and knowledge management for twenty years, with a focus on business solutions for information integration, analysis
and retrieval.
Breanna Anderson - As CTO of SchemaLogic, Breanna Anderson brings twenty years of experience in the design, development and deployment of database systems, RAD tools, content production and knowledge management
systems.