Project Botticelli

Data Mining Model Building, Testing and Predicting with Microsoft SQL Server and Excel

30 June 2012 · 17 comments · 7783 views

The Full Data Mining Lifecycle


This 1-hour-20-minute video, part of our Data Mining Series, discusses the entire lifecycle of a Data Mining Model. You will learn how to build models and mining structures, starting by creating a Data Source and a Data Source View, how to train them with your data, and how to view the results. Most importantly, you will also understand how to verify a model’s validity by applying tests of accuracy, reliability, and usefulness. You will understand, and see in use, such key verification techniques as the Lift Chart, Profit Chart, Classification Matrix, and Cross Validation. Finally, you will see how to predict unknown outcomes using your model.

We begin by introducing a well-known process methodology for data mining, known as CRISP-DM. The key to any data mining project is understanding its iterative nature: to gain a better business understanding you explore the meaning of your data using mining models, and you necessarily modify those models to get better, more accurate results. In the process of doing that, however, you change your understanding of the original issue you wanted to resolve, and the process continues. It is a loop, one that rewards you with intelligence, which helps you improve your organisation.

There are five stages in a data mining model’s lifecycle: Model Definition, Training, Testing, Use (Scoring), and Update, and we discuss each in detail, starting with Defining a Model. The main model definition concepts, such as Cases, Keys, Column Data and Content Types, Usage, and Discretization, have been introduced in another module of this course, Data Mining Concepts and Tools; please make sure to follow it if you are new to data mining. As there are several ways to build models, we briefly discuss the merits of using SQL Server Data Tools (SSDT), which is better suited to project-focused work, where you return to the same model over a period of time, perfecting it, as opposed to working entirely within Microsoft Excel, which is simpler and easier to use for a more casual data miner. In addition, you will hear, very briefly, about using PMML (Predictive Model Markup Language) and DMX (Data Mining Extensions) as alternative ways to define models.
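
For a flavour of the DMX alternative, here is a minimal sketch of a structure and model definition; the structure, model, and column names are illustrative, not taken from the demos:

    -- Define a mining structure holding the case columns
    CREATE MINING STRUCTURE [Customer Signatures] (
        [Customer Key] LONG KEY,
        [Age]          LONG CONTINUOUS,
        [Occupation]   TEXT DISCRETE,
        [Income]       DOUBLE CONTINUOUS,
        [Purchases]    LONG DISCRETIZED(AUTOMATIC, 5)
    )

    -- Add a Decision Trees model that predicts the Purchases column
    ALTER MINING STRUCTURE [Customer Signatures]
    ADD MINING MODEL [Purchases DT] (
        [Customer Key],
        [Age],
        [Occupation],
        [Income],
        [Purchases] PREDICT
    ) USING Microsoft_Decision_Trees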

Time to build a model! In the first series of demos you will see a Data Source and a Data Source View being created, so that we can use a simple, flat table of so-called customer signatures. It contains their demographic characteristics, such as age, occupation, and income, together with the known predictable outcomes, in our case the number of purchases each customer has made in our store, which we wish to study by building a mining model. Although our table comes from SQL Server, there is no restriction on where your data is stored: while the SQL Server Analysis Services (SSAS) Multidimensional and Data Mining engine performs the analysis, the data itself can come from any database, spreadsheet, or even a text file. The next step, creating a Mining Structure together with a Mining Model, is the heart of the mining process. Once defined, a mining model needs to be trained using your data. You will see how we train our model, and how we visualise the results using the Decision Tree viewer. A common question concerns the amount of data needed for training. As Rafal explains, there are no fixed rules on a specific number of cases; however, there are qualitative rules about the validity of the model that must be fulfilled, and these are often related to the amount of training data.
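
In DMX terms, training the structure, and thereby its models, amounts to an INSERT INTO statement. A minimal sketch, again with illustrative names, assuming a Data Source called [My Data Source] already exists on the SSAS server:

    -- Train the structure: SSAS reads the cases and processes its models
    INSERT INTO MINING STRUCTURE [Customer Signatures] (
        [Customer Key], [Age], [Occupation], [Income], [Purchases]
    )
    OPENQUERY([My Data Source],
        'SELECT CustomerKey, Age, Occupation, Income, Purchases
         FROM dbo.CustomerSignatures')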

Testing and validation of models is of paramount importance. After all, what good is a working model if it cannot be trusted? You will hear an extensive explanation of testing model Accuracy, Reliability, and Usefulness, just before you see it being done on our model. The demo explains, and shows at the same time, the most important validation techniques: Lift Charts, Profit Charts, Classification Matrices, and, finally, Cross Validation for reliability testing.
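
While the Lift, Profit, and Classification Matrix tests are easiest to run from the Mining Accuracy Chart tools in SSDT or Excel, Cross Validation can also be invoked as a DMX system stored procedure. A sketch, assuming the illustrative structure and model defined earlier:

    -- 10-fold cross-validation of the Purchases DT model
    CALL SystemGetCrossValidationResults(
        [Customer Signatures],
        [Purchases DT],
        10,            -- fold count
        0,             -- max cases (0 = use all cases)
        'Purchases'    -- target attribute being predicted
    )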

If your model passes those tests, you may want to use it to make predictions about the unknown by applying it to new data, which is sometimes called Model Scoring, or scoring data with a model. This is shown in two demos, both achieving the same results: first using SSDT, and later, in a simpler but equally powerful way, using Excel. You will see how we use our just-created, and validated, model of customers’ characteristics to predict the most likely numbers of purchases that future, potential customers might make. This is a very common usage scenario: you might apply a mining model to a marketing list to discover which customers are most likely to be interested in your products, and even to find out their profit potential to your company, their impact on your credit, their possibility of churn, or even their risk of fraud.
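
In DMX, scoring takes the form of a PREDICTION JOIN between the trained model and the new cases. A minimal sketch, with illustrative table and column names:

    -- Predict the likely number of purchases for each prospect
    SELECT
        t.[CustomerKey],
        Predict([Purchases])            AS [Predicted Purchases],
        PredictProbability([Purchases]) AS [Probability]
    FROM [Purchases DT]
    PREDICTION JOIN
        OPENQUERY([My Data Source],
            'SELECT CustomerKey, Age, Occupation, Income
             FROM dbo.Prospects') AS t
    ON  [Purchases DT].[Age]        = t.[Age]
    AND [Purchases DT].[Occupation] = t.[Occupation]
    AND [Purchases DT].[Income]     = t.[Income]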

If you would like to follow the demos shown in this video, you will need access to a working installation of SQL Server Analysis Services (2012+; 2017 works well) and the database engine—get a trial if you don’t have it, or the free Developer edition. You should also have access to Microsoft Office Excel 2013+ (2016 works well) and the free Data Mining Add-Ins, which come with their own sample mining data in an Excel workbook. You should also use the Adventure Works DW datasets, available from Codeplex, and you may want to use our own dataset, HappyCars, which is available for download from here.

Log in, or purchase access below, to view the premium version of this content.

Purchase This Course or Full Access Subscription

Single Course

$250/once

Access this course for its lifetime*.
Subscription

$480/year

Access all content on this site for 1 year.
Group Purchase

from $480/year

For small business & enterprise.

  • Redeem a prepaid code
  • Payment is instant and you will receive a tax invoice straight away.
  • Your satisfaction is paramount: we offer a no-quibble refund guarantee.

* We guarantee each course will be available for at least 2 years from today, unless it is marked above with a Course Retiring date.

Comments

hxy0135 · 14 August 2012

Hi Rafal,
Great lesson! I am looking forward to the next one.
I need to conduct a Customer Profitability study and a Product Profitability study. I hope I can get a quick start by focusing on the algorithms that apply to these tasks.
Could you please point me in the right direction on which algorithms I should use for each?

Thank you!

Rafal Lukawiecki · 14 August 2012

Hi Hua, thank you very much for the feedback, I am glad you enjoyed this chapter. Customer and Product Profitability analysis is one of the more frequent applications of Data Mining, so thank you for asking about the algorithms. You can use several different ones, depending on how you do the analysis, but it is often easiest to start with Decision Trees, and to progress through Clustering and into Neural Networks. Naïve Bayes can also help discover simple relationships between your inputs, if you have not already done that. You might even do something a little more exotic, like Association Rules between profit and customer characteristics, but I think a DT will give you the most at an early stage.

The main influence on the choice of the algorithm will be whether you decide to discretize profit or to stick with a continuous value. Since you are likely to be predicting profit (i.e. profit becomes the Predict column, the output), you will get a very different Decision Tree depending on that column’s Content Type. If it is continuous, you will see a regressive Decision Tree, that is, one filled with formulas that attempt to express profit as a function of your inputs. If it is discrete (or discretized), the DT will show the more traditional tree of nodes with probability-based conditions. Both approaches work, and both are very useful. In the end, it is all about the process and the discoveries you make about your data, and I am sure you will learn a lot about your profitability in the course of doing mining. BTW, I hope we will have a Decision Tree video module this autumn.
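
To make the distinction concrete, here is a hedged DMX sketch (all names invented for the example) showing the same algorithm producing the two kinds of trees, depending on the content type of the predictable column:

    -- An illustrative structure with profit both continuous and discretized
    CREATE MINING STRUCTURE [Profit Study] (
        [Sale Key]    LONG KEY,
        [Region]      TEXT DISCRETE,
        [Profit]      DOUBLE CONTINUOUS,
        [Profit Band] DOUBLE DISCRETIZED(EQUAL_AREAS, 5)
    )

    -- Continuous predict column: a regressive tree, with formulas in its nodes
    ALTER MINING STRUCTURE [Profit Study]
    ADD MINING MODEL [Profit Regression] (
        [Sale Key], [Region], [Profit] PREDICT
    ) USING Microsoft_Decision_Trees

    -- Discretized predict column: a classification tree of
    -- probability-based conditions over five profit bands
    ALTER MINING STRUCTURE [Profit Study]
    ADD MINING MODEL [Profit Bands] (
        [Sale Key], [Region], [Profit Band] PREDICT
    ) USING Microsoft_Decision_Trees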

hxy0135 · 14 August 2012

Thank you Rafal for your quick reply!

hxy0135 · 14 August 2012

Hi Rafal,
We have over 1 million products in our catalog, and more than two million transactions in a year. Among our transactions, 66% consist of one item, 16% have two items, and 8% have three items. I really do not know where I should start with that much data. Any advice, please?

Thank you,
Hua

Rafal Lukawiecki · 15 August 2012

Hi Hua, it sounds like you have a nice and sensibly large dataset to analyse, but it also seems like you may not have prepared or reformatted it for mining before. Above all, you need to start by clarifying the business problem that you are trying to answer. Decide if you could answer it by using other forms of analytics—could you calculate the aggregate profitability of a product without mining, just by building flat or per-product-category totals? If you wish to mine, express your business goal in terms of outputs whose relationship you want to model on your chosen inputs to the data mining process.

Based on the data you mentioned, it sounds to me like you are looking for a form of Market Basket Analysis with Profitability, since you did not mention any customer demographic or similar information that would be used for Customer Profitability analysis, yet you mentioned the relationship of sales to product items.

After you have clearly defined your goal, you will need to prepare your data set:

  1. Clean it, by removing or relabeling any obvious outliers and errors, as much as possible,
  2. Flatten it, by converting relational schema data to simple flat rows representing Cases, as explained in the video, or, for convenience, a simple Nested Case, especially for a Nested Decision Tree or Association Rules for Market Basket Analysis (we will explain that in other videos),
  3. Sample it, if the set is too large for your chosen algorithm, or the number of inputs/outputs, or parametrisation.

However, one would rarely do this just once and be done with it. As you mine and analyse, you will repeatedly reformat your data, for example by grouping or clustering products into categories. Perhaps you will need to partition the set into different subsets, representing different departments or data gathering processes. At every stage, you may want to derive new data from sales facts and mine the derived data as an input, but often also as a new output. Needless to say, we would love to help you with this process if you need hands-on consultancy, because it is not practical to guide you by means of web site comments: there is a lot to ask and try out, usually 2–3 days of intensive work. On the other hand, you can learn a lot of it yourself, given time and practice. Please also have a look at some of the recommended books on the subject, and come back for new videos.

Just to get you started, take a small sample of your products, perhaps 1,000, and their related sales. Try to work in Excel while you are new to this. Keep the data in PowerPivot, because it is easy to derive new data there by using Calculated Columns. Surface that data as a flat Pivot Table, and mine that. Aim to form a flat table of Line Items consisting of a SaleID (Basket ID), a Product Name (or ID), and the Sale Profit (which, I suspect, you will need to derive from the sales amount and the cost of sale). Aim for 50–100k line items for now, just to see where it goes—if it works, you can scale to millions using SSDT later. Use the Data Mining Add-In’s Table Analysis Tools and the Market Basket Analysis button, or the Data Mining tab, to create a simple Association Rules model. You can browse either of the models using the Browse button on the Data Mining tab; make sure to look at the Dependency Network diagram.
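
If you later outgrow the Excel add-ins, the same Market Basket idea can be expressed in DMX with a nested table of products per basket. A sketch, with invented names and rather arbitrary parameter values:

    -- Each case is a basket (Sale ID) with a nested table of its products
    CREATE MINING MODEL [Market Baskets] (
        [Sale ID]  LONG KEY,
        [Products] TABLE PREDICT (
            [Product Name] TEXT KEY
        )
    ) USING Microsoft_Association_Rules(
        MINIMUM_SUPPORT = 0.01,      -- itemset must appear in 1% of baskets
        MINIMUM_PROBABILITY = 0.3    -- minimum rule confidence
    )

Training such a nested model uses the DMX SHAPE … APPEND syntax to relate the line items (the nested rows) to their baskets (the cases), which we will explain in other videos.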

Above all, know what you are looking for, and formalise your goal before you set off. If, however, exploration is all you need at this stage, keep flattening the data and try different algorithms to learn how they function. Enjoy!

hxy0135 · 16 August 2012

Dear Rafal,
Very much appreciate your very detailed reply!
I know I have a long way to go. I am reading the books you recommended and hope I can quickly reach a level where I can conduct the analysis I need to do. And you are right, I should define my goal and stay focused.

I have one more question, please! Is it the case that Product Profitability analysis is better done with aggregated data, while Market Basket analysis uses raw transaction line data?

Thank you very much!
Hua

Rafal Lukawiecki · 16 August 2012

Don’t get blocked, though, on trying to define perfect analytical goals, because they will change as you get answers from the process. Data Mining is iterative in this sense: we refine the question as we get answers.

Product Profitability analysis can be done in many ways, including using aggregated and raw data, but, in my experience, some level of aggregation always helps. The difficulty is finding the right level! For example, you may prefer to do an early, wide-ranging analysis aggregating profit at category level, and go more granular as you explore, ending even at the line item level. Market Basket Analysis can also be done at line item (transactional) level, but this can be very slow with larger data sets. I suggest that you start at product model level or higher, perhaps at product category level, and see what connections you find. Interestingly, you may find quite different associations at category level than at line item level. Ideally, you would do this analysis at all levels, provided you have the resources, good enough data, and a suitable goal.

joseusul · 24 September 2012

Dear Rafal,

I have some questions regarding the Profit Chart Test.
For example, in a typical marketing campaign, besides the fixed costs:
Should we include, as the individual cost, the salary of the person who contacts the customer? Should we allocate this cost across the different prospect calls?

With regard to the Revenue per individual:
What does it mean, exactly? For example, in the case of HappyCars, is it the average revenue of the cars we have historically sold?
Shouldn’t we apply different weights according to the type of car (luxury, sport, …)?

Thanks in advance for your time.

Kind regards, José.

Rafal Lukawiecki · 24 September 2012

Dear José,

Lift and Profit Charts are theoretical testing techniques that visualise the accuracy of the model being verified, in terms of that model’s ability to correctly predict an outcome for known training cases. To help present this information in a way that is easier for a human interpreter to understand, the Profit Chart uses the concepts of:

  • a hypothetical cost of contacting a customer,
  • a profit that is generated when the right customer is contacted, because the model has made a correct prediction,
  • plus some “fixed cost,” effectively an offset, for the entire experiment.
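
To illustrate how those three numbers combine (the figures here are invented for the example): with 10,000 prospects, a $3 individual cost, a $500 fixed cost, and $15 of profit per correctly predicted customer, contacting the model’s top 30% (3,000 people), of whom 1,000 turn out to be genuine buyers, would plot as 1,000 × $15 − 3,000 × $3 − $500 = $5,500 at the 30% mark of the chart.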

As you have observed, these numbers are basic, and they do not represent the realistically more complex scenario of a typical marketing campaign.

This test was not designed for real-life use, as it represents an abstract, idealised “marketing” campaign. Indeed, if applied directly to a real-life campaign, it can be quite misleading! However, the theoretical simplicity of the easy-to-understand Lift and Profit Charts makes them very useful for studying a mining model’s predictive power. There are other popular abstract tests which are not always related to real-life situations: for example, a Receiver Operating Characteristic (ROC) curve is often used to describe a statistical experiment, including marketing campaigns, although this technique originated in radar signal processing. While it still employs some of its early nomenclature, like sensitivity, ROC is now used in medicine, machine learning, and general statistics. In the same sense, you should not think of the Profit Chart as being specifically designed for marketing campaign evaluation, but as a generalised, and somewhat oversimplified (this is a good thing), way of evaluating mining models.

If you wish to study more fully the value of a mining model for a campaign that you are working on, you should prepare a set of test cases representing different customers and known expected outcomes, use your model to calculate predictions, and evaluate the predicted results using your own plot or an Excel chart. In time, you will build several interesting sets of test data, and you will be able to study your campaigns’ expected outcomes with far more precision than a simple Profit Chart would support. However, start with a Lift and a Profit Chart at the early stages of modelling.
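
Using the same DMX PREDICTION JOIN mechanism sketched earlier in this page, one hedged way to do that (all names invented) is to select the known outcome alongside the prediction, so you can chart them against each other:

    -- Compare known outcomes with predictions for a prepared test set
    SELECT
        t.[CustomerKey],
        t.[ActualResponse],
        Predict([Response])            AS [PredictedResponse],
        PredictProbability([Response]) AS [Probability]
    FROM [Campaign Model]
    PREDICTION JOIN
        OPENQUERY([My Data Source],
            'SELECT CustomerKey, Age, Income, ActualResponse
             FROM dbo.CampaignTestCases') AS t
    ON  [Campaign Model].[Age]    = t.[Age]
    AND [Campaign Model].[Income] = t.[Income]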

joseusul · 24 September 2012

Hi Rafal,

Thank you very much for your quick and very useful response.

Regards, José.

Austine · 7 October 2012

Hi. Thanks for this lesson. I am trying to predict which bikes my company should buy in the next five years, as we are in the bike hire business.
Do I have to include a date data set, or just the hire data set? Your advice will save a soul from two days of pain.

Rafal Lukawiecki · 9 October 2012

Dear Austine, I’m afraid I couldn’t give you an easy answer without knowing more about your data, and about any reliable patterns of bike rentals that we could base predictions on. To put it simply, you could try to make a prediction using a Time Series approach, based on aggregates at regular date intervals, but this technique would, at best, only predict a very short period, maybe a month or a few months, but not a year, and most certainly not five years. Any long-term predictions based purely on a time series tend to be speculations, at best, because of unpredictable factors such as changing trends, competition, natural disasters, politics, the economy, and so on. Short-term predictions, such as 1–5 periods ahead, can be done, provided that a historical pattern can be identified in your data. You can check for that using the Historical Model parameters of the Time Series algorithm. Even in that case, you have to be very careful about your predictions. I should add that even in statistics, long-term extrapolation of data through any form of regression is difficult, and unreliable enough to be useless.
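
For reference, a short-horizon Time Series prediction of the kind described above might look like this in DMX (the model and column names are invented); the Historical Model check mentioned corresponds to the HISTORIC_MODEL_COUNT and HISTORIC_MODEL_GAP parameters of the Microsoft_Time_Series algorithm:

    -- Forecast rentals for the next 3 periods only;
    -- longer horizons become increasingly speculative
    SELECT FLATTENED
        PredictTimeSeries([Rental Count], 3)
    FROM [Bike Rental Forecast]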

On the other hand, you could base your predictions on other correlations which you may have in your data. For example, your bike demand may be a function of your rental prices, so by changing prices you could influence that demand. I would expect that there is a fair number of such interesting correlations in your organisation’s business, not to mention “gut sense,” or insight. You should make good use of those for predictions.

Above all, your request sounds like a more involved analytical quest than the quick prediction you may have been looking for. I would estimate it should take a good few days of work to find out if it is possible, but making it work well could be a matter of a few weeks of effort, spread over a few full seasonal cycles.

luchofarje · 22 April 2014

Hello Rafal, I'm from Sweden and I'm following these very educational videos. I got a warning, and then an error, when I click Get Results on the Mining Accuracy Chart. These are:
When the max cases is set to 0 (zero), cross-validation all the data cases in the data set....
and then it seems that it's being processed, but I get a Windows error with no message: "Failed to execute the query due to the following error:
The following system error occurred: "
Do I need some patch? Is it a bug? Or is there some configuration I'm missing?
I'm using the SQL Server 2012 Evaluation Edition.
Thanks for your answer.
Lucho

Rafal Lukawiecki · 22 April 2014

Hi Lucho, the first message (“When the max cases”) means that the cross-validation routine will use all of the cases from your training data set for this procedure. If you have a lot of data this can make it run for a long time—because, by default, this process will build 10 models using 10 partitioning schemes.

You would need to look into the details of the second error before I could help you more precisely (you only provided the top-level error message, not the detail). However, I can make a guess that the problem is caused by a time-out parameter, configurable in SSDT, which means that the underlying cross-validation query never completes before the time-out. This would be in line with a situation in which you have a lot of data. May I suggest that you first try reducing the number of cases used for cross-validation by changing that zero parameter to a number such as 100? If it completes, increase the number a bit, until you hit the limitation of your hardware/SQL installation, and then change the query time-out parameters. Good luck and regards to Sweden! :)

luchofarje · 24 April 2014

Hello Rafal, thanks for answering. I have changed the query timeout parameter and tested with different parameters, but I still get the error message. I don't know where to find more detail about this error: I have checked the Windows logs, SSAS logs, and SQL logs, and I can't find any strange error message.
Could you please let me know which version (build number) you are using for the demo? I have a suspicion that this is a bug.
Thanks for your answer,
Lucho

luchofarje · 24 April 2014

Hello Rafal, could you please recommend a book to learn more about each data mining algorithm and its common usage scenarios, or tell me where I may find more information about this topic?
Thanks in advance,
Lucho

Mouhannad.Oweis · 13 April 2016

Thank you Rafal. Great Video :). A book reference would be great in my opinion. Thank you.
