Project Botticelli

What Are Decision Trees?

29 March 2013 · 5 comments · 18478 views

Simple ways to study complex data

A decision tree is a simple, highly visual, and very powerful analytical tool that analyses your data and builds a tree of nodes for you. Each node represents a logical decision: a choice of a value of one of your inputs (such as age or income) that makes the most profound difference to the output that you wish to study (such as customer profitability). This short, free, 10-minute video by Rafal introduces this useful tool in an easy way, explaining the concepts while analysing a simple set of retail data in the demo.
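
To make the idea of "the input that makes the most profound difference" concrete, here is a minimal sketch that scores each input by information gain and picks the best one for the root node. It is a generic textbook criterion, not necessarily the scoring method Microsoft Decision Trees uses, and the `customers` data and the helper names are hypothetical, made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction achieved by splitting the rows on one input attribute."""
    before = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after


def best_split(rows, attributes, target):
    """Return the input attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy retail data (not the Happy Cars data set).
customers = [
    {"age": "young", "income": "low", "profitable": "no"},
    {"age": "young", "income": "high", "profitable": "yes"},
    {"age": "old", "income": "low", "profitable": "no"},
    {"age": "old", "income": "high", "profitable": "yes"},
    {"age": "young", "income": "high", "profitable": "yes"},
    {"age": "old", "income": "low", "profitable": "no"},
]

root = best_split(customers, ["age", "income"], "profitable")
print(root)
```

In this toy data, income separates profitable from unprofitable customers perfectly, so it is chosen for the first node; a full tree would repeat the same choice within each branch.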

Microsoft Decision Trees can be used for three different purposes without having to modify your data much: classification, regression, and even associative analysis, which is similar to the Association Rules technique typically used for Market Basket Analysis. Without doubt, however, the simplest, flattened-data (case-level) decision trees are one of the best ways to start analysing any data before proceeding to other techniques. The possible exceptions are the Naive Bayes and Clustering techniques, which can be especially useful when you suspect that the data set is odd, or when it is one you do not understand well.

You will see how to create a simple decision tree using Excel and the free Data Mining Add-ins for Office (1:56), which connect Excel to a running instance of SQL Server Analysis Services 2012, 2008 R2, or 2008. If you use 2012, you need to be running a multidimensional instance for data mining to work. After an explanation of what the results mean, you will briefly see two more complex decision trees, a regressive one and an associative (nested) one, in the recently introduced SQL Server Data Tools 2012 (8:33).

If you are interested in learning about data mining, make sure to follow our entire tutorial, which includes a comprehensive, 2-hour module Decision Trees In Depth, available to our Full Access Members, who can also download our training data set, Happy Cars, which makes it possible for you to follow the demos.

Comments

Navdeep Agarwal · 1 April 2013

Excellent presentation!

ness · 29 January 2014

Hi Rafal,

Good presentation, well said! Thanks for sharing this, it is so informative. :)

I have a question regarding data mining. In our company we want to forecast future sales in each store for each brand. I used Time Series for this, with two years of data.

Some of the numbers look really good, especially for the brands that have two years of data, but for the newer brands the numbers are pretty far out of range, like 50–300% off.

Can you suggest another option? Which algorithm would best suit brands that do not have enough history, without affecting the forecasts for brands that do have enough history?

Your suggestion will be much appreciated.

Thank you in advance,
Ness

Rafal Lukawiecki · 1 February 2014

Thank you for your comments, Ness. In general, Time Series is useful for forecasting when your data shows a good level of statistical predictability. In other words, fit needs to exist in terms of data periodicity, and in terms of the actual progression of averages. It looks like that is the case for some of your data.

When it does not work, you may need to tackle the problem in several ways, but the overwhelming question you should try to answer first is whether that data series is predictable at all. What do business owners think? We cannot predict chaos (philosophy aside) nor highly uncontrollable events, which, by themselves, might be random enough to have too much variance. Make sure, too, not to try to predict too far into the future, as the variance of each further predicted data point grows fast enough to make the prediction useless. ARTXP is generally good for just 1–2 data points, ARIMA perhaps another 3–4, except for very stable and repeatable series.
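
As a rough illustration of why variance grows with each further predicted point, here is a minimal sketch that assumes a plain first-order autoregressive model, x_t = phi * x_(t-1) + e_t. That is a simplification chosen for clarity and is not how ARTXP or ARIMA are implemented in Analysis Services; the function name and the parameter values are hypothetical.

```python
def ar1_forecast_variance(phi, sigma2, horizon):
    """Variance of the forecast error `horizon` steps ahead for an AR(1) model.

    Each extra step adds the noise variance sigma2, scaled by phi ** (2 * k),
    so the uncertainty accumulates as the forecast reaches further out.
    """
    return sigma2 * sum(phi ** (2 * k) for k in range(horizon))


# Assumed values: strong persistence (phi = 0.9) and unit noise variance.
variances = [ar1_forecast_variance(0.9, 1.0, h) for h in range(1, 7)]
for step, variance in enumerate(variances, start=1):
    print(step, round(variance, 3))
```

Under these assumptions the error variance nearly doubles by the second predicted point and keeps rising, which is one way to see why only the first few forecast points tend to be usable.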

On the other hand, if your data is predictable but the algorithm does not work, check whether the series could be split into different components (such as subcategories or other modelling dimensions). You may find that only some of the components cause issues, while the rest work.

Ultimately, however, you could use other predictive techniques. A regressive decision tree might help, or you may want to reformulate the problem so that rather than predicting a series of time-related outcomes, you are predicting a fact related to other available inputs, and use any of the other approaches.
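
One way to picture that reformulation is sketched below: the series is turned into ordinary rows whose inputs are the previous values, and a single-split regression tree (a stump) is fitted to them. This is a deliberately tiny, assumption-laden illustration rather than the Microsoft Decision Trees algorithm, and the `sales` figures and function names are hypothetical.

```python
def make_lagged_rows(series, n_lags):
    """Turn a series into (inputs, output) pairs: previous n_lags values -> next value."""
    return [(series[i - n_lags:i], series[i]) for i in range(n_lags, len(series))]


def sse(values):
    """Sum of squared errors around the mean of the values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(rows):
    """Fit a one-split regression tree by minimising the total squared error."""
    best = None
    for j in range(len(rows[0][0])):
        for threshold in sorted({x[j] for x, _ in rows}):
            left = [y for x, y in rows if x[j] <= threshold]
            right = [y for x, y in rows if x[j] > threshold]
            if not left or not right:
                continue  # a split must leave cases on both sides
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, threshold,
                        sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("no valid split: every input value is identical")
    _, j, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x[j] <= threshold else right_mean


# Hypothetical monthly sales that alternate between two levels.
sales = [10, 20, 10, 20, 10, 20, 10, 20]
predict = fit_stump(make_lagged_rows(sales, n_lags=1))
print(predict([20]), predict([10]))
```

Once the problem is in this row-based form, other inputs (store, brand age, promotions, and so on) could be added as extra columns, which may help for brands with little history of their own.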

Finally, consider more parametrisation, and perhaps try other approaches to time series modelling: there are a few more esoteric techniques available in R. Enjoy and stay learning with us.

Syed Qazafi · 19 July 2017

Hi Rafal,
Can you please advise where I can download the files related to these videos?
I am a paid member but could not find a way to get any material related to the courses.
I am having a lot of problems with the videos, as they automatically stop streaming and come up with errors.
Your help will be highly appreciated.

Rafal Lukawiecki · 19 July 2017

Hi Syed, thanks for asking. The downloads for the Data Mining course are here: HappyCars Sample Data Set for Learning Data Mining.

I will contact you by email to help you resolve the video playback issues. You should have no problems with streaming at all. I suspect something is affecting your device, browser set-up, or perhaps local networking and firewall. I am sure we will get it sorted. Rafal.
