Tecflix

HappyCars Sample Data Set for Learning Data Mining

29 November 2015

HappyCars is our educational sample data set, used for teaching data science and data mining. It comes with SQL Server tables containing sample data, including Customers, NonCustomers, Sales, and CustomerActivity, plus a few utility views. It also comes with a SQL Server Data Tools (SSDT) project, HappyCarsDM, which contains a prebuilt data source and views, and a series of Mining Structures containing Mining Models. We also provide a version suitable for deployment as an Azure SQL Database, which is particularly useful while learning Azure ML.

If you decide to download and use our set, you will need to:

  1. accept the additional T&Cs listed below,
  2. change the name of the SSAS deployment server specified in the solution properties,
  3. change the name of the server specified in the data source.

How to Use HappyCars and HappyCarsAzure Data Sets

First of all, decide which version you would like to download. There are four options: HappyCars and HappyCarsAzure, each available either as a SQL Server script (.sql file) or as a SQL Server backup (.bak file). The data in both is identical; however, the Azure set meets additional criteria (keys, etc.) that make it possible to deploy it as an Azure SQL Database. If you choose the script, open it in SQL Server Management Studio (SSMS) and execute it (make sure to verify that it is free of malware before you do so). Provided that you have the necessary permissions, it should create a database called HappyCars or HappyCarsAzure, with a number of tables and views, on your server. Alternatively, use a command line such as:

sqlcmd -S server\instance -i HappyCarsData2012.sql

to execute it. If you decide to use the backup file, simply restore it to your server using SSMS. If you are learning about Azure ML, you can deploy the local database to Azure using SSMS; this is faster than running the .sql script directly against the Azure SQL database.
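The command-line route above, together with the backup option, can be sketched in Python. This is a minimal illustration rather than part of the HappyCars download: the helper names, the backup path, and the use of the `master` database are assumptions, and the sketch assumes Windows authentication (which sqlcmd uses when no login is given). It only builds the sqlcmd argument lists; nothing is executed unless you call `run` yourself.

```python
import subprocess
from typing import List


def script_args(server: str, script_path: str) -> List[str]:
    """Build a sqlcmd argument list that runs a .sql script file."""
    # -S: server\instance, -i: input script,
    # -b: exit with an error code if a batch fails
    return ["sqlcmd", "-S", server, "-b", "-i", script_path]


def query_args(server: str, query: str, database: str = "master") -> List[str]:
    """Build a sqlcmd argument list that runs one query and exits."""
    # -d: database to connect to, -Q: run the query, then exit
    return ["sqlcmd", "-S", server, "-b", "-d", database, "-Q", query]


def restore_statement(database: str, backup_path: str) -> str:
    """Build a basic T-SQL RESTORE statement for a .bak file."""
    db = database.replace("]", "]]")        # escape the bracketed identifier
    path = backup_path.replace("'", "''")   # escape the string literal
    return f"RESTORE DATABASE [{db}] FROM DISK = N'{path}' WITH RECOVERY;"


# Lists tables and views so you can check what the script or restore created.
LIST_TABLES = (
    "SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE "
    "FROM INFORMATION_SCHEMA.TABLES ORDER BY TABLE_TYPE, TABLE_NAME;"
)


def run(args: List[str]) -> None:
    """Execute a prepared command; raises CalledProcessError on failure."""
    subprocess.run(args, check=True)


if __name__ == "__main__":
    server = r"server\instance"  # replace with your own instance name
    # Option 1: run the downloaded script.
    print(script_args(server, "HappyCarsData2012.sql"))
    # Option 2: restore the backup (hypothetical file location).
    print(query_args(server, restore_statement("HappyCars", r"C:\Backups\HappyCars.bak")))
    # Afterwards: list the tables and views in the new database.
    print(query_args(server, LIST_TABLES, database="HappyCars"))
```

If the backup was taken on a server with a different folder layout, the restore may also need `WITH MOVE` clauses to relocate the data and log files; restoring through SSMS, as described above, lets you set those paths interactively.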

Additional Instructions for HappyCars (not Azure) and SQL Server Analysis Services Data Mining

If you are using SSAS Tabular, you will first need to install SSAS in Multidimensional and Data Mining mode, as the data mining engine is only available in that instance mode of SQL Server Analysis Services. Next, you need to make two changes to the HappyCarsDM SQL Server Data Tools project. Open it with SSDT by double-clicking the HappyCarsDM.sln file. First, select the project in the Solution Explorer, right-click it to open its properties, and change the Target Server option, on the Deployment tab, to your SSAS Multidimensional and Data Mining instance.

Secondly, double-click the Happy Cars DB.ds data source, edit the connection settings, and change the name of the server to your own instance of the SQL Server database engine where you have just installed, or restored, the HappyCars database.
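In connection-string terms, this second change usually amounts to replacing the Data Source value while leaving the other settings alone. The Python sketch below illustrates that idea only; the function name and the sample connection string are hypothetical and are not taken from the HappyCarsDM project, whose actual provider and settings may differ.

```python
def with_server(connection_string: str, server: str) -> str:
    """Return the connection string with its Data Source set to `server`.

    Simple 'key=value;key=value' parsing; it does not handle quoted
    values that themselves contain semicolons.
    """
    parts = []
    replaced = False
    for item in connection_string.split(";"):
        if not item.strip():
            continue  # skip empty segments, e.g. after a trailing ';'
        key, sep, _value = item.partition("=")
        if sep and key.strip().lower() == "data source":
            parts.append(f"{key.strip()}={server}")
            replaced = True
        else:
            parts.append(item.strip())
    if not replaced:
        # No Data Source key was present, so add one.
        parts.append(f"Data Source={server}")
    return ";".join(parts)


if __name__ == "__main__":
    # Hypothetical example string, for illustration only.
    original = (
        "Provider=SQLNCLI11.1;Data Source=localhost;"
        "Integrated Security=SSPI;Initial Catalog=HappyCars"
    )
    print(with_server(original, r"MYPC\SQL2016"))
```

In practice, the connection editor in SSDT makes this change for you when you pick a different server name, so the sketch is only meant to show what is being changed.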

You will need to deploy the project, and process each model, before you can view its results. Best of all, follow the demos in our Data Mining videos.

Terms and Conditions

  1. You acknowledge that Tecflix and Project Botticelli Ltd retain full copyright and ownership of this data set. By downloading it, you receive a non-transferable, personal, free license to use it, subject to you agreeing to these terms and conditions. No ownership rights for the data set are transferred to you.
  2. You agree to use this data set only for personal educational purposes. No commercial use is allowed. Commercial and educational institution licenses are available upon request, subject to additional terms and conditions. 
  3. You agree not to distribute this data, or any data sets derived from it, in portion, or as a whole, or publish, email, post, or share it in any way, without our express, written permission.
  4. You agree not to publish any articles, blog posts, or other works, that use the data from this set, or that quote or refer to this set, in any way.
  5. Neither Tecflix nor Project Botticelli Ltd offers any guarantee for the suitability of this data set for any purposes, and both expressly disclaim all and any liability for any events, or losses, directly or indirectly related to your use of this data set.
  6. You agree to verify the accuracy of any conclusions you reach by means of your use of this data set independently and without relying on Tecflix or Project Botticelli Ltd. This data set may contain errors, and it should not be trusted for any purposes other than learning.
  7. Neither Tecflix nor Project Botticelli Ltd warrants that the data set is free from defects, viruses or other malware. You must verify its safety and security before you use it and you agree to accept the liability for any loss that might arise from its use, for any reasons, including, but not limited to, security breaches.
  8. You accept all liability for your own use of this data set and you indemnify Tecflix and Project Botticelli Ltd from any claims that may arise from your use of this data set or from the consequences of your use of this data set. 
  9. You acknowledge that this data set is not based on realistic simulations or real statistics, and it does not represent a sampling of any real events or real customers. The set has been synthesized for educational purposes by means of a pseudo-random automated process in order to demonstrate common and uncommon aspects of data mining and a number of edge conditions. All names are fictional, and any resemblance to any living or deceased persons, companies, trademarks, or products is coincidental. Names referred to in the data set do not mean anything, imply anything, and should not be associated with anything.
  10. Tecflix and Project Botticelli Ltd reserve the right to remove access to this data set at any time and without prior notice.
  11. Tecflix and Project Botticelli Ltd reserve the right to amend these terms and conditions at any time and without notice.
  12. You agree that these Terms and Conditions are in addition to the Terms and Conditions of the Website, which you continue to accept.
