Data-Oriented Application Engineering: An Idea Whose Time Has Returned
Published: January 1, 2007
Published in TDAN.com January 2007
This article contains material from the book Principles of Data-Oriented Application Engineering, currently in progress.
What is Data-Oriented Application Engineering?
Surveys of software development organizations relentlessly deliver the message that an acceptable project success rate still eludes us. Model-driven methods such as the Object Management Group's Model Driven Architecture (MDA) promise improvements in this state of affairs, but have yet to achieve widespread adoption, and could fall prey to the same lack of impact experienced by CASE back in the preceding century.
An approach that could yield greater success would be not only model-driven, but data-model-driven. Applying a data-oriented approach to application development projects has many advantages as an alternative to other contemporary application development methods. The term Application Engineering acknowledges this as well as earlier attempts to bring a level of discipline and rigor to the practice of transforming business requirements into computer applications.
A data-oriented approach would utilize a combination of current and potential techniques and tools. It would deliberately seek opportunities to take advantage of the durable and self-organizing properties of data, and to extend these properties throughout the application development process. Potential benefits would include accelerated analysis and development, more straightforward integration, higher maintainability and lower cost after deployment--all of which translate to quicker and greater return on investment.
Why Data Orientation: The Business View
Data is not a by-product of business computer applications--or as they are better known, enterprise-class applications. On the contrary, enterprise-class applications are by-products of the data they manage. Furthermore, enterprise-class applications do not actually automate business processes--they only automate the processing of data. (Our profession used to be called "Data Processing". Things aren't that different today.) The fundamental nature of enterprise-class applications is significantly different from video games, robotics and personal productivity software; they're much more like digitized filing cabinets.
In reality, the only business requirements that absolutely, unconditionally must be satisfied by a business application are data requirements. And the self-organizing properties of a set of data--functional dependencies and other specific types of fixed and conditional constraints--determine most, if not all, of the business rules required in an application using that data. So it follows that given a truly comprehensive, tool-supported data modeling technique, enterprise-class applications could conceivably be built based exclusively on an extended data model.
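As a rough illustration of rules following from the data model, the sketch below declares an entity with typed attributes and fixed constraints and derives record validation from that declaration alone. The names Attribute, Entity and order_line are hypothetical and are not taken from any existing tool; this is a minimal sketch, not a proposed modeling notation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Attribute:
    """One attribute of an entity, with its declared constraints (hypothetical)."""
    name: str
    type_: type
    required: bool = True
    check: Optional[Callable[[Any], bool]] = None  # optional fixed constraint


@dataclass
class Entity:
    """An entity in a (hypothetical) extended data model."""
    name: str
    attributes: List[Attribute]

    def validate(self, record: Dict[str, Any]) -> List[str]:
        """Return the rule violations the model implies for one record."""
        errors = []
        for attr in self.attributes:
            value = record.get(attr.name)
            if value is None:
                if attr.required:
                    errors.append(f"{attr.name} is required")
                continue
            if not isinstance(value, attr.type_):
                errors.append(f"{attr.name} must be {attr.type_.__name__}")
            elif attr.check is not None and not attr.check(value):
                errors.append(f"{attr.name} violates its constraint")
        return errors


# The "business rules" here are nothing more than the model's own declarations.
order_line = Entity("OrderLine", [
    Attribute("order_id", int),
    Attribute("quantity", int, check=lambda q: q > 0),
    Attribute("note", str, required=False),
])

print(order_line.validate({"order_id": 1, "quantity": 0}))
```

No rule is coded separately from the model: changing a declaration changes the behavior of validate, which is the property the argument above relies on.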
It is widely recognized that business processes and technology change much more rapidly than data does. Very similar data is used by the business before and after deployment of an application, but very often by the time an application goes to production, the business processes it emulates have changed significantly, and the technology on which it is based is a generation behind. The responsiveness of a business to its changing marketplace is significantly enhanced if its processes are embedded to the minimum extent possible in application software and technical platforms.
Data, in contrast to business processes, does not need to be translated by design or architecture into some different digital language in order to be operated on by a computer. A focus on data rather than software also enables more direct transition and traceability from requirements through deployment.
Why Data Orientation: The Technology View
With the growing adoption of service-oriented architectures, a data-oriented approach to application engineering is an idea whose time has returned. Data drives the services that drive SOA. And the potential of data orientation is not limited to service-oriented architectures. A data-oriented approach is arguably the most important reason for the widespread success of data warehousing.
Partitioning and delineating the boundaries of the "functional" aspects of an enterprise-class application--components, services, and other types of executables--can be a problematic and, to a significant extent, arbitrary endeavor. In contrast, defining the boundaries of data is much more precise, since we have the tools of functional dependency and constraints at our disposal. Basing the boundaries of the computer-based equivalents of business processes on the data they process removes much of this ambiguity. Ambiguity decreases quality and productivity; precision increases them.
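To make the precision of functional dependencies concrete, the sketch below computes an attribute closure, the standard relational-theory procedure for finding everything a set of attributes determines. The dependencies in FDS are invented sample data, and using the closure as a candidate data boundary is an assumption made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`.

    `fds` is an iterable of (determinant_set, dependent_set) pairs.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already hold the whole determinant, we gain its dependents.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Hypothetical dependencies for a small order-entry domain.
FDS = [
    ({"customer_id"}, {"customer_name", "region"}),
    ({"order_id"}, {"customer_id", "order_date"}),
    ({"order_id", "product_id"}, {"quantity"}),
]

# Everything that "belongs with" a customer key, by dependency alone.
print(sorted(closure({"customer_id"}, FDS)))
```

Given the same dependencies, the procedure always yields the same boundary, whereas two designers asked to carve out a "customer component" may well disagree.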
Extending Data Models
Fully data-oriented application engineering will require a more comprehensive data modeling vernacular than what is currently available.
A data model is not just a means of arriving at a database design--it is the comprehensive specification of the data requirements of a problem domain. When developing a business application, a data model is not just one of many artifacts produced--it is the primary artifact, to which any and all other artifacts should be directly related. Data management professionals realize this intuitively, but at this point in time the vast majority of application developers and software architects would probably disagree.
It could be argued that the proliferation of model types outside of data models has come about because of the limitations of data modeling tools and practice. If most or all of the specifications essential for generating an application can be stated in a single model type, or even a small number of highly integrated model types, less time will be spent on disparate and transitory documentation activities. This objective, incidentally, is consistent with the Agile Manifesto value statement of "Working software over comprehensive documentation". It is an example of how a data-oriented approach can help to attain development agility without compromising discipline and quality.
Much conventional "wisdom" has accumulated regarding the nature of data models, and this has seriously impeded their advancement. Significant but judicious extensions would be required to enable data models as the foundation of data-driven applications. Developing a viable data-oriented methodology will require substantial advancement and inter-connecting of these and other areas of research:
Ontology research may also have concepts and solutions to offer to help extend the boundaries of the current data modeling idiom.
Constructing a Data-Driven Application
Let's consider how a data-driven application would be developed, utilizing data-model-driven analysis, design and generation tools.
Any application development effort is an investment in enhancing the value of a subset of an enterprise's total data resource. So it follows that a data architect would function as the technical project lead on a data-oriented application engineering effort.
To obtain early and active stakeholder participation, a data-oriented project could be initiated by doing enough data modeling to allow model-based tools to generate initial screens. Construction would consist of generating concrete artifacts from abstract data models, having users evaluate the artifacts, changing the models in response to this feedback, and circling back to step one. The extended data model and corresponding application would grow through iterative refinement and extension.
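The generate-evaluate-revise loop can be pictured with a toy generator that turns a model fragment into a text mock-up of an entry screen. The widget mapping and the generate_form function are hypothetical stand-ins for a model-based tool, not a description of any real product.

```python
# Hypothetical mapping from attribute types to screen widgets.
FIELD_WIDGETS = {int: "number", str: "text", bool: "checkbox"}


def generate_form(entity_name, attributes):
    """Render a plain-text screen mock-up from (name, type, required) triples."""
    lines = [f"=== {entity_name} ==="]
    for name, type_, required in attributes:
        widget = FIELD_WIDGETS.get(type_, "text")
        label = name.replace("_", " ").title()
        marker = " *" if required else ""  # flag mandatory fields for reviewers
        lines.append(f"{label}{marker}: [{widget}]")
    return "\n".join(lines)


# Iteration 1: the first cut of the model.
customer_v1 = [("customer_name", str, True)]
print(generate_form("Customer", customer_v1))

# Iteration 2: user feedback changes the model, never the screen directly.
customer_v2 = customer_v1 + [("is_active", bool, False)]
print(generate_form("Customer", customer_v2))
```

The design choice worth noting is that feedback is applied only to the model; the screen is regenerated each time, so the model and the artifact cannot drift apart.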
In one way or another, all enterprise-class application design is data design: determining where the data comes from, where it goes to, and what happens to it on the way. In a data-oriented approach, data provisioning accounts for how and where the data within scope will be sourced. Data logistics defines where it goes afterward.
Specifications for systems configuration--execution scaffolding--should be available in the form of a business-domain-neutral, reusable operational platform model. This type of model can be expressed, for example, using IBM's Architecture Description Standard. Merging this model with the data-driven application model would create the equivalent of the MDA's Platform Specific Model, or PSM (see Figure 1 below).
Regardless of whether the data being processed is in memory, on disk or wire, local or remote, it should be specified in a similar manner in the extended data model. The data provisioned from a local database today could be sourced through an integration hub tomorrow. The operational platform model provides the current physical details at any point in time.
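One way to picture this separation is a logical data specification that never mentions a location, paired with a platform binding that can be swapped independently. Everything named below (CUSTOMER_SPEC, the two source functions, provision) is a hypothetical sketch under that assumption, with canned rows standing in for a real database or hub.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Logical side: what the application needs, with no physical detail.
CUSTOMER_SPEC = {"entity": "Customer",
                 "attributes": ["customer_id", "customer_name"]}


def from_local_db() -> List[Record]:
    """Stand-in for a local database read (returns canned rows)."""
    return [{"customer_id": 1, "customer_name": "Acme", "internal_flag": True}]


def from_integration_hub() -> List[Record]:
    """Stand-in for a call through an integration hub (returns canned rows)."""
    return [{"customer_id": 1, "customer_name": "Acme", "hub_trace": "abc"}]


def provision(spec, platform: Dict[str, Callable[[], List[Record]]]) -> List[Record]:
    """Fetch the entity from whatever source the platform model binds it to."""
    source = platform[spec["entity"]]
    # Keep only the attributes the logical model asks for.
    return [{a: row[a] for a in spec["attributes"]} for row in source()]


# Physical side: today's binding, then tomorrow's. The spec is untouched.
today = {"Customer": from_local_db}
tomorrow = {"Customer": from_integration_hub}

print(provision(CUSTOMER_SPEC, today) == provision(CUSTOMER_SPEC, tomorrow))
```

Only the binding dictionary changes between the two calls, which mirrors the claim that the operational platform model, not the extended data model, carries the current physical details.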
Figure 1. Data-Driven Application Construction
Much of the effort in designing a contemporary enterprise-class application goes into dividing it into manageable chunks and distributing the results. Layers or tiers result from horizontal partitioning; components result from vertical partitioning. (Think of cellular mitosis--cells dividing to form more complex organisms.) Creating any such partition in application software increases the complexity at that point by a factor of five: one component becomes two, with two interfaces and some sort of transport between, even if both happen to be concurrently memory-resident. The more interfaces, the more data element instances that need to be semantically reconciled. The more a given component is reused, the more semantic mappings are required.
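The factor-of-five arithmetic can be checked with a small counting sketch. It assumes the simplest possible accounting: every split adds one component, two interfaces and one transport, and each of these counts as one element. That accounting is an illustrative assumption, not a measured complexity model.

```python
def elements_after_splits(splits: int) -> dict:
    """Count the moving parts after splitting one component `splits` times."""
    components = 1 + splits   # each split turns one component into two
    interfaces = 2 * splits   # one interface on each side of the cut
    transports = splits       # one transport joining the two sides
    return {
        "components": components,
        "interfaces": interfaces,
        "transports": transports,
        "total": components + interfaces + transports,
    }


for n in range(4):
    print(n, elements_after_splits(n))
```

Under this accounting a single unpartitioned component is one element and the first split yields five, matching the factor cited above; each further split adds four more elements.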
The optimal component partitioning and distribution at any point in time could very possibly be derived by computer, based on the extended data model combined with the operational platform model. Executable units and their interfaces could be generated dynamically and precisely. In this way, when the technology and/or topology of the operational platform model changes, components could be re-partitioned and re-generated without causing mapping errors or affecting the extended data model.
Production deployment of a data-driven application could be accomplished more by incremental assimilation into the business than by a big bang implementation. Deployment units would be packaged and rolled out when an acceptable value threshold is reached. "Use cases" would then happen as users interact with data during assimilation. As the application is extended and enhanced, new and modified deployment units would be generated and rolled out.
Bill Lewis - Bill is a Data Architect with IBM Global Business Solutions. His more than 25 years of information technology experience span the financial services, energy, health care, software and consulting industries. In addition to his current specializations in data management, metadata management and business intelligence, he has been a leading-edge practitioner and thought leader on topics ranging from software development tools to IT architecture. He has contributed to numerous online and print publications, and is the author of Data Warehousing and E-Commerce. He can be reached at firstname.lastname@example.org.