TDAN: The Data Administration Newsletter, Since 1997

THE DATA ADMINISTRATION NEWSLETTER – TDAN.com
ROBERT S. SEINER – PUBLISHER

A New Way of Thinking - January 2006
Market Baskets and Association Rules

by David Loshin
Published: January 1, 2006

Published in TDAN.com January 2006

There is a famous, although perhaps apocryphal, data mining story about beer and diapers. The story relates how a supermarket analyzed its product sales and noticed a correlation between purchases of beer and diapers on Friday afternoons and evenings. Noticing this correlation, the store's management concluded that young wives were sending their husbands off to the store on Friday afternoons to pick up diapers, and while the husbands were there, they decided to buy their weekend refreshments at the same time. Because of the correlation, the store managers decided to place beer next to the diapers on the shelves to encourage other young fathers toward the same behavior.

Whether or not the story is true, it is often used to illustrate the power of data mining, and it has three key elements:

  • The ability to use data mining to detect correlations in operational transaction data,
  • The business acumen to identify the root causes of the correlation, and, most importantly,
  • The ability to take action based on what has been learned.

While I am sure there is some truth in the story, what makes me wonder most is how perfectly it reflects what is supposed to happen, which differs greatly from what usually happens during a data analysis project. The first tip-off is that business managers are proactively looking to discover intelligence in their data; the second is that the story does not mention the minimum 6-9 month lag between expressing a desire to analyze transactions and actually getting results; and the third is that managers are poised and ready to take action based on what they learned.

However, the value of the story lies in its simplicity, as it conveys some very powerful messages. The first, and probably the most important, is the concept of the association rule, which expresses a dependence of one set of attribute characteristics on the values of another set of attribute characteristics, with some level of support and confidence. For example, we might say that 20% of the time that someone buys diapers on a Friday afternoon, they also buy beer. The 20% is the confidence: of the transactions in which the rule's condition holds (diapers bought on a Friday afternoon), the percentage in which its consequence also holds (beer is bought too). The support is the percentage of all transactions in which the rule is observed, that is, in which diapers and beer appear together.
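To make the two measures concrete, here is a minimal Python sketch that computes support and confidence over a handful of transactions. It assumes each transaction is represented as a set of item names; the function names `support` and `confidence` and the sample baskets are hypothetical, invented for illustration rather than drawn from any real store's data.

```python
from typing import List, Set

def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of all transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return 0.0  # rule never triggered; treat confidence as zero
    hits = sum(1 for t in matching if consequent <= t)
    return hits / len(matching)

# Hypothetical Friday-afternoon baskets, invented for illustration only.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "wipes"},
    {"diapers", "milk"},
    {"diapers", "formula"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"eggs"},
    {"milk"},
    {"bread", "eggs"},
]

print(confidence(baskets, {"diapers"}, {"beer"}))  # 0.2: beer in 1 of 5 diaper baskets
print(support(baskets, {"diapers", "beer"}))       # 0.1: both items in 1 of 10 baskets
```

With these invented baskets, diapers appear in five of ten transactions and beer accompanies them once, so the rule "diapers ⇒ beer" has 20% confidence but only 10% support. A rule can be fairly confident yet rare, which is why the two measures are reported together.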

Association rules are often used in an application called “market-basket analysis,” which reviews how often certain events take place at the same time. The simplest example, much like our beer and diapers story, is the market basket one uses at the supermarket. The object is to determine which products are purchased at the same time in order to exploit the correlation, either by encouraging the behavior by making it easier (e.g., placing the products together on the same shelf) or by discouraging it, if purchasing the correlated products together is undesirable.
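Finding which products are purchased together amounts to searching for item combinations whose support clears a threshold and then deriving rules from them. The sketch below is a simplified, assumed approach restricted to item pairs (a full frequent-itemset algorithm such as Apriori extends the same idea to larger itemsets); the function name `pair_rules`, the thresholds, and the sample baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)

def pair_rules(transactions: List[Set[str]], min_support: float, min_confidence: float) -> List[Rule]:
    """Derive single-item rules a -> b from item pairs that clear both thresholds."""
    n = len(transactions)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for t in transactions:
        item_counts.update(t)
        # Sorting gives each unordered pair a single canonical key.
        pair_counts.update(combinations(sorted(t), 2))

    rules: List[Rule] = []
    for (a, b), count in pair_counts.items():
        sup = count / n
        if sup < min_support:
            continue  # pair is too rare to be interesting
        # Confidence is asymmetric, so test both directions.
        for lhs, rhs in ((a, b), (b, a)):
            conf = count / item_counts[lhs]
            if conf >= min_confidence:
                rules.append((lhs, rhs, sup, conf))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

# Hypothetical baskets for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "diapers"},
    {"bread", "butter"},
    {"beer", "chips", "diapers"},
]

for lhs, rhs, sup, conf in pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

Each frequent pair yields two directional rules because confidence is asymmetric: in these baskets every purchase of milk includes bread (confidence 1.00), but only two of the three bread purchases include milk (confidence 0.67).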

More generally, association rule mining is used to find events that take place at or near the same time. Other examples include network analysis (looking for sentinel network events that might be precursors to network failures), attrition analysis (identifying the activities that lead up to a customer’s decision to end their affiliation with an organization), and even the determination of business decision strategies, such as predicting the success of commercial real estate purchases, marketing channels, or advertising approaches. Association rule mining has also been deployed in text mining and entity extraction applications to help determine guidelines for knowledge discovery.
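For events spread over time rather than grouped into baskets, one simple option, assumed here rather than taken from the article, is to treat "X is followed by a failure within a fixed window" as the rule and measure its confidence. The sketch uses a hypothetical network log of (timestamp, event type) pairs; the event names, the `sentinel_confidence` function, and the window length are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def sentinel_confidence(events: List[Tuple[float, str]], target: str, window: float) -> Dict[str, float]:
    """For each non-target event type X, the fraction of X occurrences that are
    followed by a `target` event within `window` time units, i.e. the confidence
    of the rule "X occurs" -> "target follows within the window"."""
    target_times = [ts for ts, kind in events if kind == target]
    occurrences: Dict[str, int] = defaultdict(int)
    followed: Dict[str, int] = defaultdict(int)
    for ts, kind in events:
        if kind == target:
            continue
        occurrences[kind] += 1
        # Count a hit if any target event lands in the interval (ts, ts + window].
        if any(ts < t <= ts + window for t in target_times):
            followed[kind] += 1
    return {kind: followed[kind] / occurrences[kind] for kind in sorted(occurrences)}

# Hypothetical network log of (timestamp, event type) pairs.
log = [
    (0, "link_flap"), (3, "router_down"),
    (10, "cpu_spike"),
    (20, "link_flap"), (22, "router_down"),
    (30, "config_push"),
    (40, "link_flap"),
    (50, "cpu_spike"), (54, "router_down"),
]

print(sentinel_confidence(log, target="router_down", window=5))
```

In this invented log, two of the three link flaps are followed by a router outage within five time units, so `link_flap` scores a confidence of about 0.67, the strongest sentinel candidate of the three. The window length is the key design choice: too short and real precursors are missed, too long and unrelated events start to look predictive. The nested scan is quadratic, which is acceptable for a sketch; a larger log would call for sorted timestamps and a binary search.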

But let’s quickly go back to the beer and diapers story. So far we have touched on only one of our key points: the ability to mine the data to discover association rules. The two important points not yet addressed involve not the technical side but the business side. Without a sound basis for cooperation between the technologists and the business clients, it is almost impossible to achieve any kind of success with these approaches. First, there must be agreement on the value of organizational data and on the potential for both discovering and exploiting actionable knowledge, and this can exist only when sound data management principles are in place, including:

  • A data quality management program, since without high-quality data the ability to rely on discovered knowledge is severely limited;
  • Sound and repeatable ROI models for data exploitation projects, since the business clients who take the chance of funding knowledge discovery should be assured of the potential gain from taking that risk; and
  • Nimbleness in taking action based on what is learned, since having the knowledge without being able to act on it is worse than not having the knowledge at all.

Realize that the value of a data mining program may be largely strategic in its early days, and recognize that this early value lies in the ability to change the ways individuals interoperate within an organization, enhancing the collaboration that ensures greater success as the program matures.

Copyright © 2006 Knowledge Integrity, Inc.


David Loshin - David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions, including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of Enterprise Knowledge Management: The Data Quality Approach (Morgan Kaufmann, 2001) and Business Intelligence: The Savvy Manager's Guide and is a frequent speaker on maximizing the value of information. David can be reached at loshin@knowledge-integrity.com or at (301) 754-6350.
