TDAN: The Data Administration Newsletter, Since 1997

THE DATA ADMINISTRATION NEWSLETTER – TDAN.com
ROBERT S. SEINER – PUBLISHER

XML: Catalyst for Convergence?

by William J. Lewis
Published: September 1, 1999

In the article "XML: The New Esperanto?" I suggested that if XML is accepted by a critical mass of e-commerce participants and industries as a technological enabler, it will then act as an organizational motivator. It appears that this motivation is indeed picking up steam. The primary evidence for this momentum is the increased activity surrounding industry-specific vocabularies implemented in XML "schemas".

An XML schema, according to Microsoft, is a definition of a document [type], which includes:

  • the elements that can appear within the document
  • the attributes that can be associated with an element, including whether an element is empty or can include text, and any default values that may exist
  • the structure of the document: which elements are child elements of others, the sequence in which the child elements can appear, and the number of child elements
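As a rough illustration of those three parts, the Python sketch below describes a tiny document type as a plain dictionary and checks a parsed document against it. This is not Microsoft's schema syntax or any real schema language. The element names ACCOUNT and ACCTID, the "currency" attribute and its default, and the validate function are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema description (an assumption for illustration only):
# for each element, the attributes it may carry (with default values) and
# the child elements it must contain, in order.
SCHEMA = {
    "ACCOUNT": {"attributes": {"currency": "USD"}, "children": ["ACCTID", "BALAMT"]},
    "ACCTID": {"attributes": {}, "children": []},
    "BALAMT": {"attributes": {}, "children": []},
}


def validate(elem, schema=SCHEMA):
    """Return a list of problems found in elem and its descendants."""
    rule = schema.get(elem.tag)
    if rule is None:
        return [f"unknown element {elem.tag}"]
    problems = []
    # Attributes: every attribute present must be declared for this element.
    for name in elem.attrib:
        if name not in rule["attributes"]:
            problems.append(f"unknown attribute {name} on {elem.tag}")
    # Structure: children must match the declared sequence and count exactly.
    child_tags = [child.tag for child in elem]
    if child_tags != rule["children"]:
        problems.append(
            f"{elem.tag}: expected children {rule['children']}, found {child_tags}"
        )
    for child in elem:
        problems.extend(validate(child, schema))
    return problems


good = ET.fromstring(
    '<ACCOUNT currency="EUR"><ACCTID>1</ACCTID><BALAMT>5.00</BALAMT></ACCOUNT>'
)
swapped = ET.fromstring(
    "<ACCOUNT><BALAMT>5.00</BALAMT><ACCTID>1</ACCTID></ACCOUNT>"
)
print(validate(good))
print(validate(swapped))
```

Note that the check is purely syntactic: it accepts any BALAMT in the right position without knowing what kind of balance it holds.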

Technological curmudgeons may note that any similarities between an XML schema and a COBOL FD (or a data declaration in just about any computing language) may not be purely coincidental. There are only so many ways to describe what is, after all, a data record. Also, as we know, describing the syntax of a data structure does not necessarily indicate the meaning of its contents. (There he goes again with that semantics thing.) One could certainly define within an XML schema an element with a tag name "XY01". But this is where the "critical sharing mass" comes in: the tag is understood either implicitly ("heck, everybody knows what an XY01 is!") or explicitly (i.e., documented in a "schema repository"), or nobody will use/share that document. If a given XML schema is shared, it would certainly follow that some meaning is being conveyed.

Progress on XML schema development can be tracked in at least two portals (formerly known as Web sites), XML.org and biztalk.org. These are "schema repositories" at which schemas will be registered (stored). There are a good number of participants currently registered at XML.org. Also, there appears to be a significant amount of activity around mapping the older format standards, such as EDI, FIX and IFX in the finance industry, to XML schemas. So is some cautious optimism warranted regarding XML as a "catalyst for convergence" toward fewer, more ubiquitous data-element-naming standards?

Maybe not, due to several factors. The same element name (tag) can occur in an unlimited number of schemas. The same element name can occur multiple times within the same schema. And it's also highly likely, for the sake of expediency, that the data elements within various older formats will merely be mapped one-for-one to XML schemas. Let's look at some examples — maybe hypothetical, maybe not — related to the representation of bank account balances.

  • Say OFX (Open Financial Exchange) maps its data elements one-for-one to XML schemas. The OFX element named BALAMT becomes an XML element with the tag BALAMT.
  • Say IFX (Interactive Financial Exchange) also maps its data elements, one-for-one, to XML schemas. The IFX element named BALAMT becomes an XML element also tagged BALAMT.

Can we now say that OFX and IFX have "converged" in XML? Can we assume that these two XML elements, having identical tags, are semantically equivalent, i.e., synonymous? Actually we cannot, because, going back to the sources, IFX and OFX qualify balance amounts (Ledger, Available, Current, etc.) differently. In IFX, the meaning of the balance amount is qualified by the value assigned to another field (BALTYPE). In OFX, the meaning is qualified by the name of the "aggregate" (i.e., group level) field (LEDGERBAL, AVAILBAL) in which BALAMT is nested. There can be multiple BALAMT elements in a single OFX/XML document.
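The difference can be sketched in Python. The two XML fragments below are hypothetical one-for-one mappings, not excerpts from either specification. Only the names BALAMT, BALTYPE, LEDGERBAL and AVAILBAL come from the discussion above; the wrapper elements STMT and BAL, the BALTYPE values, the amounts and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical OFX-style mapping: the enclosing aggregate names the balance type.
OFX_XML = """
<STMT>
  <LEDGERBAL><BALAMT>1200.00</BALAMT></LEDGERBAL>
  <AVAILBAL><BALAMT>950.00</BALAMT></AVAILBAL>
</STMT>
"""

# Hypothetical IFX-style mapping: a sibling BALTYPE field names the balance type.
IFX_XML = """
<STMT>
  <BAL><BALTYPE>Ledger</BALTYPE><BALAMT>1200.00</BALAMT></BAL>
  <BAL><BALTYPE>Avail</BALTYPE><BALAMT>950.00</BALAMT></BAL>
</STMT>
"""


def naive_balances(xml_text):
    """Collect every BALAMT by tag alone, ignoring its context."""
    return [e.text for e in ET.fromstring(xml_text).iter("BALAMT")]


def ofx_balances(xml_text):
    """Qualify each BALAMT by the tag of the aggregate it is nested in."""
    root = ET.fromstring(xml_text)
    return {
        agg.tag: agg.findtext("BALAMT")
        for agg in root
        if agg.find("BALAMT") is not None
    }


def ifx_balances(xml_text):
    """Qualify each BALAMT by the value of its sibling BALTYPE field."""
    root = ET.fromstring(xml_text)
    return {
        bal.findtext("BALTYPE"): bal.findtext("BALAMT")
        for bal in root.iter("BAL")
    }


print(naive_balances(OFX_XML))
print(ofx_balances(OFX_XML))
print(ifx_balances(IFX_XML))
```

Reading BALAMT by tag alone returns the same undifferentiated list of amounts for both documents. Recovering which amount is the ledger balance takes a different rule for each format, which is exactly the knowledge that a shared tag name does not carry.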

Convergence occurs only when the meanings of equivalent labels are precisely synonymous. A Balance Amount (Available) is not equivalent to a Balance Amount (Ledger), as anyone who's tried to write a check on an un-cleared deposited check can attest.

So we're not converged yet — is there hope? Check out UDEF.com. These folks have the right idea. UDEF is a system for classifying and identifying data elements according to their meaning.

Under UDEF, a data element is assigned a unique identifier based on its meaning. Applying this to the above IFX/OFX/XML example, "Ledger Balance Amount" could be assigned a fully-qualified UDEF identifier, say U-g.9_13.11 (I didn't say it was pretty). Since XML is eXtensible, within any schema an attribute "UDEF_ID" could be defined on any element. The value of UDEF_ID could be set to U-g.9_13.11, for example, for any XML element that is equivalent to "Ledger Balance Amount". A less-precisely-qualified "Balance Amount" element would take a less-well-qualified UDEF_ID value, say U-g.9_13. True semantic convergence could begin to become a reality.
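A minimal Python sketch of that idea follows. The identifiers U-g.9_13.11 and U-g.9_13 are the illustrative values used above, not looked-up UDEF codes. The three documents, the find_by_udef function, and the rule that a longer identifier refines a shorter one sharing its prefix are assumptions for the sake of the example.

```python
import xml.etree.ElementTree as ET

LEDGER_BALANCE = "U-g.9_13.11"  # illustrative "Ledger Balance Amount" identifier
BALANCE = "U-g.9_13"            # illustrative, less-qualified "Balance Amount"

# Hypothetical documents: two different structures, plus one loosely qualified.
DOC_A = (
    '<STMT><LEDGERBAL><BALAMT UDEF_ID="U-g.9_13.11">1200.00</BALAMT>'
    "</LEDGERBAL></STMT>"
)
DOC_B = (
    "<STMT><BAL><BALTYPE>Ledger</BALTYPE>"
    '<BALAMT UDEF_ID="U-g.9_13.11">1200.00</BALAMT></BAL></STMT>'
)
DOC_C = '<STMT><BALAMT UDEF_ID="U-g.9_13">500.00</BALAMT></STMT>'


def find_by_udef(xml_text, udef_id, include_more_specific=False):
    """Return the text of every element whose UDEF_ID matches udef_id.

    With include_more_specific=True, identifiers that extend udef_id with a
    further dotted qualifier also match (an assumed refinement rule).
    """
    hits = []
    for elem in ET.fromstring(xml_text).iter():
        value = elem.get("UDEF_ID")
        if value is None:
            continue
        exact = value == udef_id
        refined = include_more_specific and value.startswith(udef_id + ".")
        if exact or refined:
            hits.append(elem.text)
    return hits


print(find_by_udef(DOC_A, LEDGER_BALANCE))
print(find_by_udef(DOC_B, LEDGER_BALANCE))
print(find_by_udef(DOC_C, LEDGER_BALANCE))
```

The same lookup finds the ledger balance in both structures without knowing how either format qualifies its tags, while the loosely qualified element in the third document is correctly left out of a search for ledger balances.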

So, to summarize the XML landscape: progress in semantic content still needs to catch up with the pace of progress in syntactic form. The shape of the XML conference table and the initial agenda have been proposed and generally agreed upon. Like spectators in the gallery at the Yalta conference, we're watching the participants enter the negotiating room and take their seats. Substantive talks are about to begin; the diplomatic language must be very precise.

William J. Lewis has spent twenty years in the Information Technology field, the last thirteen managing data, meta data and data models. His work has appeared in previous releases of The Data Administration Newsletter, in DM Direct, Database Programming and Design, and IDUG Solutions Journal. He is currently an Associate Director in the Analytical Business practice of Cambridge Technology Partners.