Tecflix

Association Rules in Depth

27 June 2014

Association Rules is one of the most useful and well-understood predictive analytical techniques, and a fundamental algorithm of classical data mining. Let Rafal teach you how to use it in SQL Server in this hands-on, detailed, 1-hour 40-minute, demo-filled video.

The easiest way to get started is to play with the Dependency Network viewer, which shows a web of connections, or associations, between related items. These might be products that customers buy at the same time, or sets of characteristics, such as the demographics of age, gender, and marital status, that reveal relationships between seemingly unrelated people. Make sure you understand when to use attribute names and when to use their values for different types of analysis, such as demographic analysis versus basket analysis.

Indeed, the most typical use of this technique is Market Basket Analysis. Make sure to watch the short 10-minute video about it before you continue with this one, as it also shows how you can do this analysis in Excel. Association Rules, however, can be used for much more than basket analysis, including finding correlations between large sets of individual values of any attribute of any set of items: not just products, but also transactions, events, animals, and, of course, people.

The Association Rules algorithm is simple in design yet very powerful. It works by efficiently identifying items that co-occur, like products sold in one shopping visit, or in a year, or perhaps cars sold over the lifetime of a customer. These itemsets are then analysed, from a probability point of view, to find items that occur more often when certain other items are present and, at the same time, significantly less often without them. This leads to the generation of the association rules from which the technique takes its name.
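
The first step, finding items that co-occur, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the baskets are made-up toy data, the function names are hypothetical, and the brute-force counting shown here is not how the SQL Server engine works internally, which identifies itemsets far more efficiently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set is one shopping visit.
baskets = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "bread"},
    {"card", "envelope"},
    {"card", "envelope", "milk"},
]

def count_itemsets(baskets, max_size=2):
    """Count how many baskets contain each itemset of up to max_size items."""
    counts = Counter()
    for basket in baskets:
        for size in range(1, max_size + 1):
            # Sorting gives every itemset one canonical tuple form.
            for itemset in combinations(sorted(basket), size):
                counts[itemset] += 1
    return counts

def frequent_itemsets(baskets, min_support=2, max_size=2):
    """Keep only itemsets that appear in at least min_support baskets."""
    counts = count_itemsets(baskets, max_size)
    return {items: n for items, n in counts.items() if n >= min_support}

frequent = frequent_itemsets(baskets)
for items, n in sorted(frequent.items()):
    print(items, n)
```

With this toy data, pairs such as cereal and milk survive because they appear together in two baskets, while bread and cereal, seen together only once, are dropped.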

To work with Association Rules you need to understand which rules are strong and useful enough to be of practical value. The strength of a rule is expressed as its probability and its importance. Although they seem similar, they are two very different metrics. Probability simply tells us how likely it is that the item on the right of the rule occurs when the items on the left are present. For example, rules with a probability of 1 (which means 100%, or certainty) denote things that always occur together, such as kits of items always sold in pairs and never alone. They are not usually of interest to us. Even rules with interesting probabilities, usually in the 40-60% (0.4–0.6) range, can be uninteresting!
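
As a rough illustration of rule probability, the sketch below assumes it is the conditional probability of the right-hand side given the left-hand side, a measure often called confidence. The baskets and the function name `rule_probability` are hypothetical and are not part of SQL Server.

```python
# Hypothetical toy data: each set is one shopping visit.
baskets = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "bread"},
    {"card", "envelope"},
    {"card", "envelope", "milk"},
]

def rule_probability(baskets, lhs, rhs):
    """Fraction of the baskets containing lhs that also contain rhs."""
    lhs, rhs = set(lhs), set(rhs)
    with_lhs = [b for b in baskets if lhs <= b]
    if not with_lhs:
        return 0.0  # the left-hand side never occurs, so there is no evidence
    return sum(1 for b in with_lhs if rhs <= b) / len(with_lhs)

# Cards are always sold with envelopes here: a "kit" rule with probability 1.
kit = rule_probability(baskets, {"card"}, {"envelope"})

# Milk appears in four baskets, two of which also contain cereal.
milk_cereal = rule_probability(baskets, {"milk"}, {"cereal"})

print(kit, milk_cereal)
```

The card and envelope rule is certain but tells us little we did not already know, which is why probability alone is not enough to judge a rule.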

The interestingness score, which we call the Rule Importance, usually takes values between 0 and 2, although it can be any number. It simply tells us how much more likely it is that the item on the right-hand side of the rule would happen with the items on the left than without them. The value is a base-10 logarithm. For example, a rule such as milk -> cereal with an importance of 1 tells us that it is 10 times more likely that cereal would be sold with milk than without milk. A rule such as card -> envelope with an importance of 2 means that envelopes sell with cards 100 times more often than without cards. An importance of 3 would mean a factor of 1000, and so on. Importance can even be negative, which just means the opposite: the thing on the right of the rule is that many times more likely to happen without the thing on the left than with it. In the demo you will see an example of such a rule, showing products that “repel” each other, from a shopper’s perspective.
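
The arithmetic behind importance can be checked with a small sketch. It assumes importance is the base-10 logarithm of the ratio between the probability of the right-hand item with the left-hand items and its probability without them, as described above. The counts and the function name `rule_importance` are invented for illustration, and the sketch ignores any adjustments the real engine may apply, for example for zero counts.

```python
import math

def rule_importance(both, lhs_total, rhs_without_lhs, without_lhs_total):
    """log10 of P(rhs | lhs) / P(rhs | not lhs), computed from basket counts.

    both              -- baskets containing lhs and rhs
    lhs_total         -- baskets containing lhs
    rhs_without_lhs   -- baskets containing rhs but not lhs
    without_lhs_total -- baskets not containing lhs
    """
    if min(both, lhs_total, rhs_without_lhs, without_lhs_total) <= 0:
        raise ValueError("this sketch needs a non-zero count in every cell")
    p_with = both / lhs_total
    p_without = rhs_without_lhs / without_lhs_total
    return math.log10(p_with / p_without)

# Cereal sells in 50 of 100 milk baskets but only 50 of 1000 others:
# ten times more likely with milk, so the importance is about 1.
attract = rule_importance(50, 100, 50, 1000)

# Here the right-hand item sells in 5 of 100 baskets with the left-hand item
# and 50 of 100 without it: ten times less likely, so about -1.
repel = rule_importance(5, 100, 50, 100)

print(round(attract, 3), round(repel, 3))
```

Reading the score backwards is just as simple: raising 10 to the power of the importance gives the factor, so 2 corresponds to 100 times and 3 to 1000 times.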

When doing your first analysis you will often get an empty, or a very small, set of results. This is caused by the default parameters of the algorithm, which were designed to save computing resources and to prevent runaway processing. With today’s capabilities, you can be quite brave about changing the MINIMUM_SUPPORT and MINIMUM_PROBABILITY parameters, which is necessary to get in-depth results for many data sets, even larger ones. Rafal shows how this can have a dramatic effect on the output of a simple basket analysis. The same applies, although with values moving in the opposite direction, to dense sets, such as those used to analyse demographic or transactional characteristics.
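
To see why strict thresholds can produce an empty result, here is a hedged Python sketch of a tiny rule miner limited to one-item rules. The argument names echo MINIMUM_SUPPORT and MINIMUM_PROBABILITY, but the function `mine_pair_rules`, the toy baskets, and the treatment of support as a fraction of all baskets are assumptions made for this example, not the SQL Server implementation.

```python
from collections import Counter
from itertools import combinations, permutations

# Hypothetical toy data: each set is one shopping visit.
baskets = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "bread"},
    {"card", "envelope"},
    {"card", "envelope", "milk"},
]

def mine_pair_rules(baskets, minimum_support, minimum_probability):
    """Return {(lhs, rhs): probability} for one-item rules lhs -> rhs.

    minimum_support is treated as a fraction of all baskets (an assumption).
    """
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        frozenset(pair) for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = {}
    for pair, count in pair_counts.items():
        if count / n < minimum_support:
            continue  # the itemset is too rare to be considered at all
        for lhs, rhs in permutations(sorted(pair)):
            probability = count / item_counts[lhs]
            if probability >= minimum_probability:
                rules[(lhs, rhs)] = probability
    return rules

strict = mine_pair_rules(baskets, minimum_support=0.5, minimum_probability=0.8)
relaxed = mine_pair_rules(baskets, minimum_support=0.2, minimum_probability=0.4)

print(len(strict), len(relaxed))
```

On this sparse toy data the strict settings return nothing, because no pair appears in half of the baskets, while the relaxed settings return a usable set of rules.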

The remaining parameters, MINIMUM_ITEMSET_SIZE, MAXIMUM_ITEMSET_SIZE, and MAXIMUM_ITEMSET_COUNT, help you prune or expand the results to match the type of data you analyse. We explain how to adjust them, but you need to experiment with various ranges as you move your project towards making shopping recommendations. Rafal shows how to make simple predictions, based just on the items bought, and more complex ones, which take into account the characteristics of the buyer, such as their gender. If you are going to use this technique a lot, make sure to also learn about the other two parameters: AUTODETECT_MINIMUM_SUPPORT and OPTIMIZED_PREDICTION_COUNT.
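
The idea of turning rules into recommendations can be sketched as follows. Everything here is hypothetical: the rules and their probabilities are invented, the function `recommend` is not a SQL Server API, and encoding a buyer characteristic as a pseudo-item such as "gender=F" is simply one convenient assumption for the example.

```python
# Hypothetical rules: (left-hand items, right-hand item) -> probability.
rules = {
    (frozenset({"milk"}), "cereal"): 0.5,
    (frozenset({"card"}), "envelope"): 1.0,
    (frozenset({"milk", "bread"}), "butter"): 0.7,
    (frozenset({"gender=F", "milk"}), "yoghurt"): 0.6,
}

def recommend(rules, basket, top_n=3):
    """Suggest items whose rule left-hand side is fully present in the basket."""
    basket = set(basket)
    best = {}
    for (lhs, rhs), probability in rules.items():
        if lhs <= basket and rhs not in basket:
            # Keep the strongest rule that predicts each candidate item.
            best[rhs] = max(best.get(rhs, 0.0), probability)
    # Highest probability first; ties are broken alphabetically.
    ranked = sorted(best.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]

# Simple prediction, based only on the items bought.
by_items = recommend(rules, {"milk", "bread"})

# More complex prediction that also uses a buyer characteristic.
by_profile = recommend(rules, {"milk", "gender=F"})

print(by_items, by_profile)
```

Ranking by probability alone is a simplification. In practice you would probably also want to weigh importance, so that obvious but uninteresting rules do not crowd out the useful ones.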

Association Rules is a fascinating algorithm. A simple design and a sound grounding in theory lead to many uses, and we hope that once you get familiar with it you will consider using it not only for basket analysis, but also to explore, and predict, co-occurrences of anything else: transactions, risks, generic events, faults, security breaches, and even customer preferences.


  • Introduction to Data Mining with Microsoft SQL Server 24-min Watch with Free Subscription

  • Data Mining Concepts and Tools 50-min

  • Data Mining Model Building, Testing and Predicting with Microsoft SQL Server and Excel 1-hour 20-min

  • What Are Decision Trees? 10-min Free—Watch Now

  • Decision Trees in Depth 1-hour 54-min

  • Why Cluster and Segment Data? 9-min Watch with Free Subscription

  • Clustering in Depth 1-hour 50-min

  • What is Market Basket Analysis? 10-min Watch with Free Subscription

  • Association Rules in Depth 1-hour 35-min

  • HappyCars Sample Data Set for Learning Data Mining

  • Additional Code and Data Samples (R, ML Services, SSAS) Get with Free Subscription
