Dimensional Analysis of Data Architecture
Published: June 1, 2009
In this article, the authors analyze an enterprise’s data architecture to obtain intelligent insights and find the best-performing solution for a given set of requirements.
Business intelligence (BI) is the activity of monitoring the key business processes (KBPs) of an enterprise for key performance indicators (KPIs) of its business activities measured across various
dimensions of the business environment. Such intelligence helps the enterprise make appropriate changes to stay responsive to customer requirements, the forces of competition and the changing
constraints in the landscape of the business ecosystem. This agility in responding to environmental stimuli largely determines an entity’s survival.
Data mining is the activity of identifying interesting patterns that are not obvious. These patterns could be associations between various dimensional values. Some could be classifications and clusters that spot commonalities between entities having apparently unrelated dimensional values. Others could be sequences, periods and time intervals between activities, or outliers and predictions based on past data. Data mining adds to the arsenal of business intelligence techniques.

In this paper, we apply this concept of business intelligence to analyzing the data architecture of an enterprise. First, we identify the processes and the measures in each of them. Next, we identify the various possible dimensions. Finally, we list the probable benefits that could be achieved by organizing and analyzing this data architecture model repository.

KEY BUSINESS PROCESSES, KEY PERFORMANCE INDICATORS AND THE MEASURES NEEDED
In our architecture dimensional model, we need to list all the key processes that we would like to monitor, the measures in these processes, and the formula to calculate each KPI from these measures. Later, we will inquire what kind of intelligence can be derived from monitoring the dimensions and measures, and their change observed over other dimensions and time. To identify the key performance indicators of data architecture, we can look to the service level agreements for quality parameters. These parameters address the non-functional requirements driving the design of data architecture.

The key processes to monitor could include, but are not limited to:
a. High performance – real time – closed loop – operational systems.
b. Availability across locations, people (ability, group, nationality, security, corporate), time zones and devices.
c. Reliability.
d. Scalability from startup to a few million users, like MySpace or Facebook.
A representative list of the measures in these processes could include, but is not limited to:

MULTIDIMENSIONALITY OF DATA ARCHITECTURE
A traditional data warehouse model uses the business dimensional model to analyze its business. For analyzing data architecture, we want to understand the multidimensionality of an enterprise data architecture solution, along with a similar understanding of the problems it solves. In the following sub-sections we discuss the various classes of dimensions and their interrelationships. The classifications include dimensions and sub-dimensions, co-existent and mutually exclusive dimensions of an enterprise at any instant of time, static and mutable dimensions, and overlapping and disjoint sub-dimensions. Dimensions also change over time, assuming varying levels of significance.

Static or Fixed Dimensions and Mutable Dimensions
Certain dimensions are fixed from the birth of an enterprise or an enterprise’s architecture due to the domain or the nature of data. Some examples are geographical applications that use spatial data types, and spatio-temporal applications that use spatial and temporal data types. Because of the nature of these applications, the data types are fixed and the probability that they will change is minuscule. An example from another domain could be the place and date of birth, or the first transaction date of a customer. This is a static dimension that will never change.

But there are other dimensions that are mutable, and which could change over the lifetime of the enterprise. For instance, the latency of the data available for analytics shrinks as technology advances and as the competitive business environment forces everyone to get the latest information for accurate decision making. To give an example from another domain, the gender of an employee in a human resource application is a static dimension, while the relative employee performance rank, education, skill sets, role and designation are all slowly changing dimensions.
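
A mutable dimension such as designation can be stored as a dated history of values, so that both its current value and how often it changes can be queried. The following is a minimal sketch under stated assumptions: the `DimensionVersion` record, the `count_changes` helper and the sample history are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DimensionVersion:
    """One dated value of a mutable (slowly changing) dimension. Hypothetical record."""
    entity_id: str
    value: str
    valid_from: date


def count_changes(history, entity_id, start, end):
    """Count how often an entity's dimension value changed within [start, end]."""
    versions = sorted(
        (v for v in history if v.entity_id == entity_id),
        key=lambda v: v.valid_from,
    )
    changes = 0
    # Compare each version with its predecessor; the first version is not a change.
    for previous, current in zip(versions, versions[1:]):
        if current.value != previous.value and start <= current.valid_from <= end:
            changes += 1
    return changes


# Hypothetical designation history for one employee (illustrative data only).
designation_history = [
    DimensionVersion("emp-1", "Engineer", date(2005, 7, 1)),
    DimensionVersion("emp-1", "Senior Engineer", date(2007, 4, 1)),
    DimensionVersion("emp-1", "Technical Lead", date(2008, 10, 1)),
]

# Number of designation changes in a three-year window.
promotions = count_changes(
    designation_history, "emp-1", date(2006, 6, 1), date(2009, 6, 1)
)
```

Keeping every version, rather than overwriting the value, is what makes the rate of change measurable later; a static dimension would simply have a single version.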

Though facts are largely monitored, some business intelligence is derived by measuring the rate of change of dimensions. A question such as “How many promotions did an employee get during the past three years?” measures the rate of change of the employee’s designation over time. The rate of change of any measure or dimension with respect to some other measure or dimension can yield interesting results, for example in CRM. In the telecom industry, how many times a customer has changed mobile phone talk-time plans in the last quarter could tell something about the customer’s happiness, or predict a churn.

Dimensions and Sub-Dimensions (both overlapping and disjoint) of Data Architecture
Within each dimension, there could be many independent perspectives. So we have classified the data administrators’ and data users’ sub-dimensions into one stakeholders’ dimension. The sub-dimensions could be overlapping, like domain and data type. They could also be disjoint, like data administrators and data users.

Significance of Dimensions and Abstraction Levels of Models
Though the dimensions could be many for a particular data architecture, a few of them assume prominence and have a larger say over data architecture decisions. Various abstractions could be modeled depending on the significance of the dimensions portrayed in that particular model. So a model at a very high level would probably have two or three dimensions that predominantly decide the architecture. As we go to more detailed abstractions, the other, subtler dimensions could be shown with appropriate interactions.

Co-Existent and Mutually Exclusive Dimensions at any Point in Time
Examples of co-existent dimensions are data types and domains. Another example could be enterprise data that is structured. There are dimensions whose values do not co-exist, such as an open source product data architecture that also falls under a defense security classification.

Single and Multi-Valued Dimensions at any Point in Time
The domain of a data architecture is always single valued. Product or project data architecture is also a single-valued dimension. Some dimensions are multi-valued, like types of administrators and types of users of data. Other examples are data type (structured, unstructured) and nature of data (master, transaction, metadata, audit, reference, and external).

Time Variant Polymorphism of the Dimensions
The architectural dimensions are a nebulous bunch of axes that come into play depending on the severity of the situation that is faced. Here the model emerges out of the concerns of that particular scenario. The meta-model consists of a mix of all these dimensions and has certain dimensions and their relationships prominent in the context of the current situation. For example, a product could be successful within a particular enterprise. It might then become so popular among the user community and regulatory bodies that it is prescribed for the entire industry. Now the scalability and security dimensions assume more importance than the functionality that the product was trying to achieve earlier. The architecture will now be measured against a different class of dimensions that have assumed prominence.

Dimensions and Sub-Dimensions
In this sub-section, we see the more common dimensions which are found in a typical enterprise. For a list of possible dimensions and a detailed discussion with examples, please see reference [1].
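
The shift in prominence described under time variant polymorphism, where scalability and security overtake functionality, can be illustrated by scoring candidate architectures against weighted dimensions and then changing the weights. This is only a sketch under stated assumptions: the `rank_architectures` function, the candidate names, the scores and the weights are all hypothetical, and a weighted sum is one simple choice of scoring rule rather than a method prescribed by the paper.

```python
def rank_architectures(scores, weights):
    """Rank candidates by the normalized weighted sum of their dimension scores."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one dimension needs a positive weight")
    ranked = []
    for name, dims in scores.items():
        # A dimension missing from a candidate contributes a score of 0.0.
        weighted = sum(w * dims.get(d, 0.0) for d, w in weights.items()) / total
        ranked.append((name, weighted))
    # Highest weighted score first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical scores in [0, 1] for two candidate architectures.
scores = {
    "single-site": {"functionality": 0.9, "scalability": 0.3, "security": 0.4},
    "distributed": {"functionality": 0.6, "scalability": 0.9, "security": 0.8},
}

# Early in the product's life, functionality dominates.
early = rank_architectures(
    scores, {"functionality": 0.8, "scalability": 0.1, "security": 0.1}
)

# After industry-wide adoption, scalability and security assume prominence.
later = rank_architectures(
    scores, {"functionality": 0.2, "scalability": 0.4, "security": 0.4}
)
```

With these illustrative numbers the "single-site" candidate ranks first under the early weights and the "distributed" candidate ranks first under the later ones, even though neither architecture's scores changed; only the significance of the dimensions did.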

SIGNIFICANCE OF DATA ARCHITECTURE WAREHOUSE
Yes, you have read the title of this section correctly. It is not data warehouse architecture, but a warehouse for data architecture models. Data architectures can be represented by architectural domain types and stored in one of three ways: by periodically collecting snapshots, by taking a snapshot of them during their significant life cycle activities, or by recording every change made to them as a change management induced transaction. This is similar to monitoring various types of facts (transaction, periodic snapshot, and cumulative snapshot) in a normal commercial enterprise. The use of such a warehouse would be to know:
a. What patterns always occur together, using clustering techniques?
b. What are the target domains for which an architecture style suits best?
c. What are the sequences of incidents that occur over the life cycle of a particular data architecture?
d. Applications [10] in
i. Targeting user training by data stewards, customer relationship management (CRM) systems to measure data users’ satisfaction, cross-selling data services, users’ segmentation for security classification.
ii. Forecasting for database capacity planning, customer retention for data stewards to measure their users’ satisfaction, comparative data architecture in the industry used by competition.
iii. Fraud detection for data security compromises in various data architectures.
e. Using the following operations and techniques [10]
i. Prediction: database capacity planning for scalability of concurrent users, volume of data and complexity of queries.
ii. Database segmentation for designing distributed databases (decide on fragments and allocations based on users and the complexity of their queries).
iii. Interrelationships between non-functional requirements (scalability and security, security and performance).
iv. Intrusion detection in regular data access patterns.

This will let one query this warehouse by defining the requirements, with the architectural choices narrowing down as the requirement specification progresses. At the end, a clustering algorithm could give all related architectures and rank them along whatever dimension we have chosen, according to the weightage we have provided. Importantly, we can see the time variant nature of similar architectures, trace their evolution, and use the past to extrapolate the future.

INTERCONNECTIONS AT INTERSECTS OF THE DIMENSIONAL MODEL
Some of the inter-connects that we have observed are [2]:
solutions for architectural problems by matching attributes between problems and solutions repositories. This approach is innovative and could be taken toward implementation through further research.

CONCLUSION
We have done a critical analysis of the data organization problem in the context of various environments that affect the selection and design of a solution to it. We have also proposed a repository mechanism to store, match and analyze the various problems and solutions with respect to their various dimensionalities. We observe the behavior of these solutions in terms of their key measures, which will be vital to improving their services to the functioning enterprise.

Over a period of time, we would be able to see the evolution, interesting patterns and classifications that emerge out of this data capture of the architecture’s behavior. Apart from storing and analyzing problems and solutions, we would also be able to match solutions to problems through non-obvious relationships, by dropping some dimensions and giving more significance to others. This is the way we see an enterprise being able to respond in an agile manner to the impulses it receives from changes in its external environment. This we call the intelligence of our business of data architecture. More than the technique, the realization that such a vast amount of intelligence is available for gaining an edge over the competition is the first step toward an intelligent architecture-enabled enterprise.

ACKNOWLEDGEMENT
The first two authors, Sundararajan and Anupama Nithyanand, are grateful to their mentor and third author, S. V. Subrahmanya, Vice President at E-Commerce Research Labs, for seeding and nurturing this idea, and to Dr. T. S. Mohan, Principal Researcher, Dr. Ravindra Babu Tallamraju, Principal Researcher, and Dr. Sakthi Balan Muthiah, Manager-Research at E-Commerce Research Labs at Education & Research, Infosys Technologies Limited, for their extensive reviews and expert guidance in articulating these ideas. The authors would like to thank all their colleagues and the participants of the authors’ training and knowledge sharing sessions at Infosys Technologies Limited, who contributed positively to these ideas. The authors would also like to acknowledge and thank the authors and publishers of the referenced papers and textbooks, which have been cited at appropriate sections of this paper, for making available their invaluable work, which served as an excellent reference for this paper. All trademarks and registered trademarks used in this paper are the properties of their respective owners / companies.

REFERENCES
Anupama Nithyanand - Anupama is a Group Manager in the Education and Research Department, Infosys Technologies Limited. She has a total of about 20 years of experience in education, research, consulting, and people development.
Her areas of interest include various application development technologies, and information systems.
S. V. Subrahmanya - S.V. Subrahmanya is currently Vice-President at Infosys Technologies Ltd. He heads E-COM Research Labs at Education & Research at Infosys. He has authored three books and published several papers
in international conferences. He has designed and taught several technical courses at Infosys Technologies Ltd. He has almost 23 years of experience in industry and academia. His specialization
is in software architecture.
P. A. Sundararajan - Sundararajan PA is a Technical Architect with E-Commerce Research Labs, Education and Research, Infosys Technologies Ltd. He has a total experience of about 14 years in application development and
data architecture, implementing solutions in the discrete manufacturing, ERP, mortgage and warranty domains with Java and Oracle technologies. His interests are in software architectures, data and
knowledge architectures, modeling, data analytics, data mining, web farming and information retrieval. He is very passionate about learning, teaching and researching in these areas.