XML: Catalyst for Convergence?
Published: September 1, 1999
In the article XML: The New Esperanto? I suggested that if XML is accepted by a critical mass of e-commerce participants and industries as a technological enabler, it will then act as an organizational motivator. It appears that this motivation is indeed picking up steam. The primary evidence for this momentum is the increased activity surrounding industry-specific vocabularies implemented in XML "schemas".
An XML schema, according to Microsoft, is a definition of a document [type], which includes:
Technological curmudgeons may note that any similarities between an XML schema and a COBOL FD (or a data declaration in just about any computing language) may not be purely coincidental. There are only so many ways to describe what is, after all, a data record. Also, as we know, describing the syntax of a data structure does not necessarily indicate the meaning of its contents. (There he goes again with that semantics thing.) One could certainly define within an XML schema an element with a tag name "XY01". But this is where the "critical sharing mass" comes in: the tag is understood either implicitly ("heck, everybody knows what an XY01 is!") or explicitly (i.e., documented in a "schema repository"), or nobody will use/share that document. If a given XML schema is shared, it would certainly follow that some meaning is being conveyed.
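To make the syntax-versus-meaning point concrete, here is a minimal sketch in Python. The RECORD document, the REPOSITORY dictionary and the meaning assigned to "XY01" are all hypothetical; the dictionary merely stands in for a shared "schema repository" entry and does not reflect any real repository's contents or API.

```python
import xml.etree.ElementTree as ET

# A well-formed document: the parser accepts it whether or not anyone
# knows what an "XY01" is.
RECORD = "<RECORD><XY01>1250.00</XY01></RECORD>"

# Hypothetical stand-in for a schema repository: tag name -> documented meaning.
REPOSITORY = {"XY01": "Ledger balance amount"}


def describe(xml_text, repository):
    """Return (tag, value, meaning) for each child of the root element.

    The meaning is None when the tag is not documented in the repository.
    """
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text, repository.get(child.tag)) for child in root]


print(describe(RECORD, REPOSITORY))  # meaning comes from the repository
print(describe(RECORD, {}))          # same syntax, no shared meaning
```

The parse succeeds in both calls; only the lookup against the shared documentation tells a recipient what the value represents.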
Progress on XML schema development can be tracked in at least two portals (formerly known as Web sites), XML.org and BizTalk.org. These are "schema repositories" at which schemas will be registered (stored). There are a good number of participants currently registered at XML.org. Also, there appears to be a significant amount of activity around mapping the older format standards, such as EDI, FIX and IFX in the finance industry, to XML schemas. So is some cautious optimism warranted regarding XML as a "catalyst for convergence" toward fewer, more ubiquitous data-element-naming standards?
Maybe not, due to several factors. The same element name (tag) can occur in an unlimited number of schemas. The same element name can occur multiple times within the same schema. And it's also highly likely, for the sake of expediency, that the data elements within various older formats will merely be mapped one-for-one to XML schemas. Let's look at some examples (maybe hypothetical, maybe not) related to the representation of bank account balances.
Can we now say that OFX and IFX have "converged" in XML? Can we assume that these two XML elements, having identical tags, are semantically equivalent, i.e., synonymous? Actually we cannot, because, going back to the sources, IFX and OFX qualify balance amounts (Ledger, Available, Current, etc.) differently. In IFX, the meaning of the balance amount is qualified by the value assigned to another field (BALTYPE). In OFX, the meaning is qualified by the name of the "aggregate" (i.e., group level) field (LEDGERBAL, AVAILBAL) in which BALAMT is nested. There can be multiple BALAMT elements in a single OFX/XML document.
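The two qualification styles just described can be sketched in Python. The fragments below are illustrative assumptions, not excerpts from the IFX or OFX specifications: only BALTYPE, BALAMT, LEDGERBAL and AVAILBAL come from the discussion above, while the wrapper element names (ACCTBAL, STMTRS), the sample amounts and both helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# IFX-style (assumed layout): a sibling BALTYPE value qualifies BALAMT.
IFX_STYLE = """
<ACCTBAL>
  <BALTYPE>Ledger</BALTYPE>
  <BALAMT>1250.00</BALAMT>
</ACCTBAL>
""".strip()

# OFX-style (assumed layout): the enclosing aggregate's name qualifies BALAMT.
OFX_STYLE = """
<STMTRS>
  <LEDGERBAL><BALAMT>1250.00</BALAMT></LEDGERBAL>
  <AVAILBAL><BALAMT>900.00</BALAMT></AVAILBAL>
</STMTRS>
""".strip()


def ifx_style_balances(xml_text):
    """Map each balance type to its amount, using the sibling BALTYPE value."""
    balances = {}
    for group in ET.fromstring(xml_text).iter():
        bal_type = group.find("BALTYPE")  # direct children only
        bal_amt = group.find("BALAMT")
        if bal_type is not None and bal_amt is not None:
            balances[bal_type.text.strip()] = bal_amt.text.strip()
    return balances


def ofx_style_balances(xml_text):
    """Map each balance type to its amount, using the parent aggregate's tag."""
    balances = {}
    for aggregate in ET.fromstring(xml_text).iter():
        bal_amt = aggregate.find("BALAMT")  # direct children only
        if bal_amt is not None:
            balances[aggregate.tag] = bal_amt.text.strip()
    return balances


print(ifx_style_balances(IFX_STYLE))
print(ofx_style_balances(OFX_STYLE))
# Reading the IFX-style fragment with the OFX rule silently loses the qualifier:
print(ofx_style_balances(IFX_STYLE))
```

Both documents use the tag BALAMT, yet each needs its own rule to recover which balance is meant. Applying the wrong rule still parses cleanly; it simply reports the wrong qualifier, which is the sense in which identical tags do not guarantee identical meaning.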
Convergence occurs only when the meanings of equivalent labels are precisely synonymous. A Balance Amount (Available) is not equivalent to a Balance Amount (Ledger), as anyone who's tried to write a check on an un-cleared deposited check can attest.
So we're not converged yet. Is there hope? Check out UDEF.com. These folks have the right idea. UDEF is a system for classifying and identifying data elements according to their meaning.
Under UDEF, a data element is assigned a unique identifier based on its meaning. Applying this to the above IFX/OFX/XML example, "Ledger Balance Amount" could be assigned a fully-qualified UDEF identifier, say U-g.9_13.11 (I didn't say it was pretty). Since XML is eXtensible, within any schema an attribute "UDEF_ID" could be defined on any element. The value of UDEF_ID could be set to U-g.9_13.11, for example, for any XML element that is equivalent to "Ledger Balance Amount". A less-precisely-qualified "Balance Amount" element would take a less-well-qualified UDEF_ID value, say U-g.9_13. True semantic convergence could begin to become a reality.
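Here is a minimal sketch of how such an attribute might be used, again in Python. The identifiers are the made-up examples from the paragraph above, not real UDEF assignments, and the three documents and both helper functions are hypothetical. The rule that a longer identifier "narrows" a shorter one by appending dot-separated parts is an assumption drawn from the example values, not a statement of how UDEF actually defines its identifiers.

```python
import xml.etree.ElementTree as ET

LEDGER_BALANCE_ID = "U-g.9_13.11"  # "Ledger Balance Amount" (example value)
BALANCE_ID = "U-g.9_13"            # less-qualified "Balance Amount" (example value)

# Three hypothetical documents with different tag names and nesting.
DOC_A = ('<ACCTBAL><BALTYPE>Ledger</BALTYPE>'
         '<BALAMT UDEF_ID="U-g.9_13.11">1250.00</BALAMT></ACCTBAL>')
DOC_B = ('<STMTRS><LEDGERBAL>'
         '<BALAMT UDEF_ID="U-g.9_13.11">1250.00</BALAMT>'
         '</LEDGERBAL></STMTRS>')
DOC_C = '<ACCOUNT><BAL UDEF_ID="U-g.9_13">900.00</BAL></ACCOUNT>'


def find_by_udef_id(xml_text, udef_id):
    """Return the text of every element whose UDEF_ID equals udef_id exactly."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter() if el.get("UDEF_ID") == udef_id]


def is_same_or_narrower(candidate_id, general_id):
    """Assumed convention: an ID narrows another by appending '.'-separated parts."""
    return candidate_id == general_id or candidate_id.startswith(general_id + ".")


# The same meaning is located regardless of tag name or nesting.
print(find_by_udef_id(DOC_A, LEDGER_BALANCE_ID))
print(find_by_udef_id(DOC_B, LEDGER_BALANCE_ID))
# The less-qualified element is not mistaken for a ledger balance.
print(find_by_udef_id(DOC_C, LEDGER_BALANCE_ID))
print(is_same_or_narrower(LEDGER_BALANCE_ID, BALANCE_ID))
```

Matching on the identifier rather than the tag means a consumer needs no schema-specific rule about sibling fields or enclosing aggregates to decide whether two elements carry the same meaning.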
So, to summarize the XML landscape: progress in semantic content still needs to catch up with the pace of progress in syntactic form. The shape of the XML conference table and the initial agenda have been proposed and generally agreed upon. Like spectators in the gallery at the Yalta conference, we're watching the participants enter the negotiating room and take their seats. Substantive talks are about to begin; the diplomatic language must be very precise.
William J. Lewis has spent twenty years in the Information Technology field, the last thirteen managing data, meta data and data models. His work has appeared in previous releases of The Data Administration Newsletter, in DM Direct, Database Programming and Design, and IDUG Solutions Journal. He is currently an Associate Director in the Analytical Business practice of Cambridge Technology Partners.