Farm and Market Architecture: A Vital Question for Master Data Management
Published: July 1, 2010

Malcolm Chisholm suggests the idea of having many master data farms and a single master data market to separate master data production from master data distribution.
The past few years have seen enormous growth in interest in master data management (MDM). In terms of architecture, interest seems to have crystallized around various kinds of hubs. The original vision seems to have been that master data can be taken into these hubs from transaction applications where it is produced. Once in the hub, it can be integrated (including de-duplication), cleaned, enriched, and then distributed to consuming applications. There has been a debate on the scope of these hubs. For instance, in some cases little more than identifying information (keys) may be managed, while in others, full-blown “golden records” containing many non-key attributes are maintained. Additionally, there has been debate over whether an instance of a hub should manage one or a few master data entities, or whether a hub should be truly “multi-entity” and manage many such entities.
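The hub flow described above (take records in from transaction applications, integrate and de-duplicate them, clean them, then distribute them) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `SourceRecord` fields, the naive name-based `match_key` rule, and the `distribute` function are hypothetical and do not come from any MDM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    source_app: str  # hypothetical name of the producing transaction application
    name: str
    country: str


def match_key(record: SourceRecord) -> str:
    """Naive matching rule (an assumption): same name, ignoring case and spacing."""
    return " ".join(record.name.lower().split())


def build_golden_records(records: list[SourceRecord]) -> dict[str, dict]:
    """Integrate source records: de-duplicate on match_key and clean attributes."""
    golden: dict[str, dict] = {}
    for rec in records:
        entry = golden.setdefault(
            match_key(rec),
            {"name": " ".join(rec.name.split()).title(), "country": "", "sources": []},
        )
        # Survivorship rule (an assumption): the first non-empty country wins.
        if not entry["country"] and rec.country.strip():
            entry["country"] = rec.country.strip().upper()
        entry["sources"].append(rec.source_app)
    return golden


def distribute(golden: dict[str, dict], attributes: tuple = ("name",)) -> list[dict]:
    """Hand consuming applications only the attributes the hub is scoped to share."""
    return [{attr: entry[attr] for attr in attributes} for entry in golden.values()]
```

The `attributes` parameter loosely mirrors the scope debate: a hub that shares little more than identifying information would pass a short tuple, while a full “golden record” hub would pass many non-key attributes.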
Also during the past few years, many MDM projects have been implemented, and they have not always gone as well as originally anticipated. This has fed back into the architectural debates and into the way MDM product vendors position themselves. One important trend centers on the difficulty of relying on master data produced in legacy applications. Such data is often of inherently poor quality because the legacy applications were never intended to support MDM. Cleaning and integrating it in a hub therefore has an inescapable element of “hit-or-miss.” The result is that the hub master data usually has limits as to how reliable it can be, and these limits may fall below any level acceptable to the enterprise.

As a consequence, techniques for producing master data directly in MDM hubs have gained increasing attention. High-quality data is seen as the result of improved stewardship, and if facilities to support this can be built out in an MDM hub, then it seems possible to overcome the curse of master data produced by legacy applications. Figure 1 summarizes this architecture.

Figure 1: Simplified Modern MDM Hub Architecture

But is this architecture the final word? There are reasons to think that it might have inherent limitations and that another pattern may have more advantages. Let us look at why this might be so.

Producing Master Data versus Consuming Master Data

One of the early projects I worked on was for a large intergovernmental organization that ran social and economic projects in developing countries. These projects were financed from specific funds, and my task was to build an application that would permit the creation of new funds. I had considerable experience with nearly all the major applications in this environment, and I knew that the only fund data these applications required was a couple of attributes: Fund Code and Fund Name.
However, I was horrified to find that there were elaborate processes to create a new fund that involved a large number of parties, each with specific responsibilities. This not only dictated a complex workflow with many states that a nascent fund passed through, but it also meant that many additional attributes (and entities) were needed to store information about the fund on-boarding process. These included quite a lot of metadata about the process flow and participation by the users.

I had originally thought that the project simply had to produce a table of Fund Code and Fund Name. In a sense it did, because this was all the consuming applications required. But the process and data required to get to this end point were very complex. It could also take a long time to on-board a fund, sometimes months.

Does Master Data Production Fit in a Distribution Hub?

What this experience taught me is that the production of master data is quite different from the distribution of master data. When I looked at the data model of my fund on-boarding application, it was vastly different from the data model of the simple table that had to be distributed. Would it make sense to incorporate the unique data and processes needed to produce the fund data into a distribution hub? On my project, we chose not to.

Of course, in technical terms, there is nothing impossible about combining production and distribution in the same hub environment. But what is gained versus the problems that have to be overcome? Let us consider a few of the problems:
The Master Data Farmers

For decades, IT has viewed business users as a rather uniform lot. It is true that some have been recognized as being involved in data entry, others as making business decisions based on outputs, and yet others as sponsors for IT activities. However, we are now seeing the emergence of a new class of users, and perhaps the best term for them is data content managers. These users do not participate in running or managing the enterprise. They are subject-matter experts in specific data domains. In financial services, we now commonly see teams dedicated to specific areas such as client data, account data, instrument data, or corporate action data. The data content managers, more than anyone else, “own” the data they are responsible for, and it is nearly always master data.

Failure to recognize the existence of data content managers is a crucial error for IT, because IT certainly does not “own” master data. IT only builds and manages the environments in which master data can be created and distributed, and it has no interest in the data content itself. But the architecture that IT creates should match the way the data is managed. The data content managers are like farmers of master data. They slowly tend and grow their crops of data, and when these are ripe, the farmers send them to market to distribute them to the rest of the enterprise. Surely an architecture that confuses farm and market is inappropriate.

An additional issue is that the environments in which different kinds of master data are produced may need to be quite different from one another. A team of data content managers looking after financial instrument master data will have very little in common with a team looking after client and counterparty master data. Why should the two teams be offered only a single environment in which to do their work? It is a little like saying that a dairy farm and a peach orchard are both farms and so they should be designed on identical principles.
At a very high level this may be true, but for practical purposes such “one-size-fits-all” approaches always break down. Thus, we should not only separate master data production from distribution in our architecture, but also separate the production of each master data entity into its own independent application. Figure 2 summarizes this architectural pattern.

Figure 2: Farm and Market Architecture Applied to MDM

It is likely that as vendors try to put more and more specific master data production functionality into their hub products, they will be driven to the conclusion that the environments for the production of master data cannot be generalized.

The Master Data Market

In contrast to farming, markets do tend to be centralized. It does make sense to have a single distribution hub from which all enterprise applications can obtain master data. In this respect, the current hub architecture is really good, and it is very difficult to see what better pattern could be implemented. The issues we have been discussing lie much more with the production (the “farming”) of master data.

Like all architectural discussions, this one looks to some degree at an ideal state. Some integration and cleansing (at least data quality checking) will probably have to remain in the “market” hub. Also, the current designs of MDM hubs try very hard to deal with the fact that much master data is not truly “farmed” but is a by-product of legacy transaction applications. There will be no getting away from this in the near future, and so the issues of cleansing and integration will still need to be dealt with. However, architecture is also about planning for a target state, and the idea of having many master data farms and a single master data market at least represents a valid pattern for consideration in this planning.

Note: I am grateful to my colleague Fabio Corzo for suggesting the farm and market analogy when we were discussing these problems.
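The farm and market pattern can be illustrated with a minimal Python sketch built on the fund on-boarding example from earlier in the article. It is only a sketch under stated assumptions: the class names, the three workflow states, and the `publish` and `get` methods are hypothetical and are not taken from the original project or from any MDM product. The point it tries to show is that the farm keeps a rich production model (workflow state and participation history), while the market holds only the attributes that consumers need.

```python
from dataclasses import dataclass, field
from enum import Enum


class FundState(Enum):
    # Hypothetical workflow states; the real process had many more.
    PROPOSED = 1
    UNDER_REVIEW = 2
    APPROVED = 3


@dataclass
class FundInProduction:
    """Rich production-side record that never leaves the fund farm."""
    fund_code: str
    fund_name: str
    state: FundState = FundState.PROPOSED
    history: list = field(default_factory=list)  # (user, new_state) pairs

    def advance(self, user: str, new_state: FundState) -> None:
        # Only allow moving to the next state in sequence.
        if new_state.value != self.state.value + 1:
            raise ValueError(f"cannot move from {self.state.name} to {new_state.name}")
        self.history.append((user, new_state))
        self.state = new_state


class MasterDataMarket:
    """Single distribution point holding only what consumers need."""

    def __init__(self) -> None:
        self._shelves: dict[str, dict[str, dict]] = {}

    def publish(self, entity: str, key: str, attributes: dict) -> None:
        self._shelves.setdefault(entity, {})[key] = dict(attributes)

    def get(self, entity: str) -> dict[str, dict]:
        # Return copies so consumers cannot alter the market's data.
        return {k: dict(v) for k, v in self._shelves.get(entity, {}).items()}


class FundFarm:
    """Independent production application for one master data entity."""

    def __init__(self, market: MasterDataMarket) -> None:
        self._market = market
        self._funds: dict[str, FundInProduction] = {}

    def propose(self, fund_code: str, fund_name: str) -> None:
        self._funds[fund_code] = FundInProduction(fund_code, fund_name)

    def advance(self, fund_code: str, user: str, new_state: FundState) -> None:
        fund = self._funds[fund_code]
        fund.advance(user, new_state)
        if fund.state is FundState.APPROVED:
            # Only the "ripe" record goes to market, stripped of workflow metadata.
            self._market.publish("fund", fund.fund_code, {"fund_name": fund.fund_name})
```

Under this design, a second farm (for example, one for client data) could carry an entirely different internal model and workflow while still publishing to the same market through `publish`.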
Malcolm Chisholm

Malcolm Chisholm, Ph.D., has over 25 years of experience in enterprise information management and has worked in a wide range of sectors. He specializes in setting up and developing enterprise information management units, master data management and business rules. Malcolm has authored two books: Managing Reference Data in Enterprise Databases (Morgan Kaufmann, 2000) and How to Build a Business Rules Engine (Morgan Kaufmann, 2003). He can be contacted at mchisholm@refdataportal.com.