TDAN: The Data Administration Newsletter, Since 1997

ROBERT S. SEINER – PUBLISHER


XML: The New Esperanto?

by William J. Lewis
Published: September 1, 1999
"Many companies report a strong interest in XML. XML however, is so flexible that this is similar to expressing a strong interest in ASCII characters." -- Microsoft BizTalk Framework Overview

It was a beautiful spring day in Chicago. I was taking a break from DCI's Data Warehouse World at the Navy Pier, having lunch at a seafood restaurant and alternating between catching up on magazine reading and gazing out the window. I had ordered a dessert and asked the waitress to also bring a cup of coffee. "Regular or decaf?" Regular, please. "Cream and sugar?" Just black, thanks.

This brief exchange brought my thoughts back to an article on XML that I had just finished reading. (For the uninitiated, XML stands for eXtensible Markup Language. I love internal caps; remember Calvin and Hobbes' G.R.O.S.S., Get Rid Of Slimy GirlS?) The general idea of the article was that XML, thanks to its ability to combine data and meta data in the same document, would remove once and for all the barriers to enterprises sharing not only data but also the meaning of that data. "All other entities…can understand what the data means and correlate it with similar data from other providers." (1) The promise is rather like the intent of the international language Esperanto (www.esperanto.org), whose goal was to remove all barriers to understanding in spoken and written language by providing a worldwide second language.

Meta data is defined in XML documents in the form of labels, or "tags", which enclose each data value. While reading this article it occurred to me that this labeling, while certainly a big step toward conveying meaning, by no means ensures communication of complete and unambiguous meaning. Take, for example, that verbal, decidedly low-tech interchange about coffee. I had assigned the label coffee to my request. To me, the label coffee is unambiguous; it always means "regular, and black". But that's just me; it's a "local variable". The waitress, working in the more precise vocabulary of the food-service professional, had no way of receiving a specific-enough meaning without eliciting the details "behind" the label. The label I had originally chosen was insufficient to let us share a complete and accurate meaning.

What does this have to do with XML? Well, if the "meaning" (i.e., meta data) included in an XML document takes only the form of labels, then to be global (shared) these labels must be 1) very specific, and 2) comprehended the same way by all parties, to avoid misunderstandings. (Practitioners of corporate law make a good living from the consequences of corporate misunderstandings!) This is why the development of "vocabularies" or "grammars" is absolutely essential to the successful and widespread adoption of XML. The technology and the syntax are the easy parts; once again, the hard part rears its ugly head: those pesky ol' semantics.

So if the success of XML (a form) depends on the development of vocabularies (the content), how are we doing so far in developing these vocabularies? Apparently things are proceeding apace in both the technical and business camps, in the traditional, not to say free-for-all, manner. On the technical side, Microsoft and the World Wide Web Consortium (W3C) both rightly disavow any direct involvement in resolving semantic issues, appropriately concentrating instead on guidelines for syntaxes and formats, and on establishing "repositories" for storing and sharing the vocabularies once they come rolling in. Microsoft is jumping in/on the effort with both feet, framing things up with its BizTalk Framework (www.biztalk.org). Another site to watch is www.xml.org, which advertises itself as "the first global XML industry portal featuring an XML registry and repository that offers automated public access to XML schemas for electronic commerce, business-to-business transactions, and tools and application interoperability." Whew. As of August 1999, content remains thin at all these registries, but let's stay tuned. When the work begins to get done, we can rest assured there is no lack of places to put the results.

On the business side, many of the players signing up on this new bandwagon are industry-specific protocol standards organizations far better established than the new kid, XML. Making things even more complicated and ominous, many of these organizations bring long-held antagonisms to the party. In the finance industry alone, there are Open Financial Exchange (OFX), Interactive Financial Exchange (IFX), and Bank Internet Payment System (BIPS), as well as GOLD and SET. Evidently GOLD is merging with OFX. Lost yet? Then there is Financial Information eXchange (FIX) as well, which hopefully will FIX everything. (Or wasn't it XML that was going to fix everything?) A veritable alphabet soup; and again, this is just the financial industry.

Why are vocabularies and their semantics, or meaning, so important? Are the potential risks of not getting it right limited to getting decaf, latte, or Irish coffee? The risks of ambiguity in any exchange of business information are of course quite serious; remember those attorneys waiting in the wings. For example, in the banking industry, isn't the label account_balance specific enough to enable an accurate exchange of information? Well, in a packaged data warehouse product for the financial services industry (with which I happen to have direct experience), two entire tables, with columns numbering in the hundreds, are devoted solely to different types of account balances: month-to-date, year-to-date, average, aggregate, maximum, last month's, current month's, collected, uncollected, overdraft, combinations of the above, and more. So say the following element is received as part of an XML document:

<account_balance>1234.56</account_balance>

What does this balance mean: collected, uncollected, average, minimum, or what? The quality of its meaning depends entirely on a specific vocabulary shared and agreed to by the participants in the transaction.
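To make the point concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree, assuming the element arrived as `<account_balance>1234.56</account_balance>`. A parser recovers the label, the number, and any attributes, and nothing more. The second, "qualified" form and its attribute names kind and period are hypothetical illustrations, not taken from any real financial standard.

```python
import xml.etree.ElementTree as ET


def read_element(xml_text: str) -> tuple[str, float, dict[str, str]]:
    """Return everything a parser can recover: label, value, attributes."""
    element = ET.fromstring(xml_text)
    return element.tag, float(element.text), dict(element.attrib)


# The element from the text: one label and one number, nothing else.
print(read_element("<account_balance>1234.56</account_balance>"))

# A hypothetical, more qualified form. The attribute names "kind" and
# "period" are illustrative assumptions, not part of any real standard.
print(read_element(
    '<account_balance kind="collected" period="month-to-date">1234.56</account_balance>'
))
```

Note that the qualified form only moves the problem: sender and receiver must still agree on what "collected" and "month-to-date" mean, which is exactly the shared-vocabulary work described above.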

In the vocabulary of IFX, for example, the label DEPACCTBAL means "deposit account aggregate balance". This sounds pretty specific, but should one assume a month-to-date aggregate, a life-to-date aggregate, or last month's aggregate balance? And this is just one of many "standards"…FIX, OFX, et al. no doubt have their own labels for "deposit account aggregate balance"!
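One way to expose the gap is to record, in a shared dictionary, not just a label's expansion but every dimension of meaning that expansion leaves open. The Python sketch below is a hypothetical illustration, not part of IFX or any other standard: only the label DEPACCTBAL and its meaning "deposit account aggregate balance" come from the text, while the BalanceConcept structure and the crosswalk keyed by (standard, label) are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class BalanceConcept:
    """One hypothetical dictionary entry: what a balance label denotes."""
    account_type: str
    measure: str
    period: Optional[str]  # None: the label does not say which period


# Hypothetical crosswalk keyed by (standard, label). Only the IFX label
# DEPACCTBAL and its expansion come from the text; labels for OFX, FIX,
# and the others would have to come from their own specifications.
CROSSWALK = {
    ("IFX", "DEPACCTBAL"): BalanceConcept("deposit", "aggregate", None),
}


def unresolved_dimensions(standard: str, label: str) -> list[str]:
    """List the dimensions of meaning that a label leaves unstated."""
    concept = CROSSWALK[(standard, label)]
    return [f.name for f in fields(concept) if getattr(concept, f.name) is None]


print(unresolved_dimensions("IFX", "DEPACCTBAL"))  # ['period']
```

Recording the gap is the easy part; deciding which period the label should denote, and getting every party to agree, is the semantic work still to be done.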

The IT trade press ought to be taking a leadership role in delineating the differences between technical, syntactic form and semantic business content. But, as somebody once said, I don't want to go off on a rant here… In its ubiquitous coverage of XML, the press persistently blurs the tough issues by confusing form with content. A recent article described the "magic" of XML's support for "complex semantics" (2). The article cited the example of an element that could be represented in XML equivalently as either a single string or a concatenation of three sub-strings (first-middle-last). Well, to begin with, this ain't no breakthrough in syntax…just about any conventional data-representation standard, even the DATA DIVISION in much-maligned COBOL, supports sub-strings as a matter of course (the relational model being a notable exception). But even more serious is the misrepresentation of this sub-string feature as an example of complex semantics. Truly complex semantics are issues of meaning: for example, deciding whether the customer "John Q. Public" is the same party as the supplier of the same name, or as the customers "Jack Public" or "John Quincy Public". The equivalence of the strings "John Q. Public" and "John"||"Q."||"Public" is a mere formatting rule. It scarcely qualifies as a semantic issue, much less a complex one. Such reckless assertions trivialize the serious work still outstanding, and can badly mislead technology consumers who need to understand all the issues bearing on real-world implementations.
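The difference between a formatting rule and a semantic question can be shown in a few lines. In this Python sketch the element names name, first, middle, and last are hypothetical, and joining the parts with spaces is an assumed formatting rule; the point is that the formatting question is settled mechanically while the identity question is not.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same name; element names are assumptions.
SINGLE = "<name>John Q. Public</name>"
PARTS = "<name><first>John</first><middle>Q.</middle><last>Public</last></name>"


def name_text(xml_text: str) -> str:
    """Formatting rule: join sub-elements with spaces, else take the text."""
    element = ET.fromstring(xml_text)
    parts = [child.text for child in element]
    return " ".join(parts) if parts else element.text


# The formatting question is settled mechanically.
print(name_text(SINGLE) == name_text(PARTS))  # True

# The semantic question is not: string comparison cannot say whether
# "Jack Public" is the same customer, or whether a supplier with the same
# name is the same party. That takes business rules the markup doesn't carry.
print(name_text(SINGLE) == "Jack Public")  # False, which proves nothing
```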

Another industry authority asserts that in order to use XML's "powerful capability to integrate dissimilar systems and databases within an enterprise…organizations need to have the identification and management of their enterprise meta data under control". (3) No doubt…but which is the hard part here, the part businesses have had such limited success with over the past decades? Most of us know from direct experience how far most businesses are from having the identification and management of their enterprise meta data under control…and if they did have it, wouldn't they already be just a short step away from integrating their dissimilar systems and databases, XML or not?

Well, on the upside, it indeed appears that XML, just like the information superhighway onto which it is poised to merge, may well be viewed by a critical mass of e-commerce participants and industries as a technological enabler, and hopefully it will act as an organizational motivator as well. In contrast to Esperanto, the financial benefits of widespread adoption of XML are obvious and immediate. E-money makes the world go 'round, as they say, and XML's potential for shared meaning (and shared e-Revenue!) will hopefully motivate this critical mass to lay down their arms, come together around the conference table and hammer out a common vocabulary and dictionary for their common data. (Wait…did someone say…"data dictionary"?)
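As a rough illustration of what a shared data dictionary might enforce mechanically, here is a Python sketch. The dictionary contents, the definition text, and the required qualifiers kind and period are all hypothetical; they stand in for whatever the trading partners actually agree on.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared data dictionary agreed by all trading partners.
# Each entry holds a definition plus the attributes every use must supply.
DATA_DICTIONARY = {
    "account_balance": {
        "definition": "Balance of a deposit account (hypothetical definition)",
        "required": {"kind", "period"},
    },
}


def check_against_dictionary(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the document conforms."""
    problems = []
    for element in ET.fromstring(xml_text).iter():
        entry = DATA_DICTIONARY.get(element.tag)
        if entry is None:
            problems.append(f"<{element.tag}> is not defined in the shared dictionary")
            continue
        missing = entry["required"] - set(element.attrib)
        for name in sorted(missing):
            problems.append(f"<{element.tag}> lacks required qualifier {name!r}")
    return problems


print(check_against_dictionary("<account_balance>1234.56</account_balance>"))
```

A check like this can only confirm that labels are defined and qualifiers are present; it cannot confirm that both sides understand "collected" or "month-to-date" the same way. That agreement is the human part of the job.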

But to get this done, we must stop congratulating ourselves for agreeing on the shape of the negotiating table and get going on the real stuff: the meaning of the shared data, the currency we need to power the brave new digital economy.

1) Osterfelt, Susan, "Business Intelligence: Sharing Financial Data: An XML eXaMpLe", DM Review, May 1999

2) Trustman, John, and Meshako, Susan, "XML Again", Intelligent Enterprise, August 3, 1999

3) Finkelstein, Clive, "XML and Enterprise Information Portals", DM Review, July/August 1999




William J. Lewis has spent twenty years in the Information Technology field, the last thirteen managing data, meta data and data models. His work has appeared in previous issues of The Data Administration Newsletter, in DM Direct, Database Programming and Design, and IDUG Solutions Journal. He is currently an Associate Director in the Analytical Business practice of Cambridge Technology Partners.