Project Botticelli

New BI Content on ProjectBotticelli.com (July 2012)

Data Mining: Model Building, Testing & Predicting

When you registered on ProjectBotticelli.com, you asked to be notified about new content. This is our sixth newsletter since we launched our site.

Summary of New Content:

The video and the sample data set are available to Full Access Members, while the PPTs are also available to registered, free members. 

News

You will be hearing a lot about Business Analytics over the next year, and beyond, as the domains of Business Intelligence, Predictive Analytics, Big Data Analytics, Data Warehousing, ETL and Data Integration slowly merge, exchanging and cross-breeding their underlying technologies. Data Mining, which combines statistics, machine learning, and database technologies, is only a few decades old, except for Bayesian algorithms, which were first thought of in the 1700s. Big Data Analytics is relatively new, introducing large-scale, inexpensive processing parallelism to analytics, primarily by means of Apache Hadoop. In the Microsoft world, Hadoop is now well integrated, and you will find other forms of parallelism in the world of data, for example in the Microsoft Parallel Data Warehouse.

It is clear that all of those approaches share the same goal: to provide you with meaningful answers about knowledge that has been hiding in your data, be it small or large, slow or fast, simple or complex, structured or not. Whilst Data Mining is relatively mature, and well past the famous Gartner Trough of Disillusionment phase, Big Data is still on the way up the hype slope. However, a few years from now, we will no longer need to distinguish between the individual underlying ideas, the various learning algorithms, or the ability to compute more in memory as opposed to scanning mechanical or SSD disks, nor will we need to choose between relational, multidimensional, tabular, or unstructured data. All of those building blocks of the future of Business Analytics are about to be integrated, perhaps giving rise to, simply, Analytics, without any adjectives, modern enough by 2020 standards. Incidentally, if you are interested in my thoughts on the naming of the things we do, have a look, some time, at my older article about Performance Intelligence.

In the meantime, Data Mining is undergoing a revival of interest, and now is a good time to learn it, if you are new to it, or to practice it more, if you have been doing it for a while. That is the key reason why we have been focusing our efforts on helping you learn that little-known, mature, and powerful component of SQL Server Analysis Services. Our new Data Mining series is a great place to study it. It already includes three modules and a sample data set to help you practice, and more modules are planned. The first module is free for everyone to watch; the remaining ones are for Full Access Members.

You can find out more about the newest content further below.

An Apology, and a Special Offer—10% Discount

Some of you who have a VAT registration number in the EU have experienced difficulties renewing your Full Access Membership using a discount code. I would like to apologise for that. I'm pleased to say that the issue seems to have been resolved, thanks to Sanna from Finland, who alerted us to it. To make up for it, I would like to extend the 10% discount to all of you, whether renewing or just signing up, valid on all memberships until the end of July 2012: use coupon code JUL2012NEWSLETTER at checkout. Feel free to share it, or tweet it.

Group Membership

If you work for a company, Data Mining training could be useful to your colleagues. You can get Full Access Membership for multiple users by purchasing a discounted Group Membership pack. It is very simple: buy a pack, receive pre-paid membership codes for your users, then each user redeems a code. You also get one tax invoice for the whole purchase, and everyone looks after their own usernames, passwords, and preferences. Above all, buying a Group Membership is cheaper than buying individually, and, until the end of July, you can use the 10% off code mentioned above.

New Content in More Detail


Validating a Data Mining Model Using a Lift Chart

The third module of our Data Mining tutorial discusses the entire lifecycle of a Data Mining Model. You will learn how to build models and mining structures, starting by creating a Data Source and a Data Source View. Subsequently, you will learn how to train a model with your data, and how to view the results. Most importantly, you will also understand how to verify a model’s validity by applying tests of accuracy, reliability, and usefulness. You will understand, and see in use, key verification techniques such as a Lift Chart, Profit Chart, Classification Matrix, and Cross Validation. Finally, you will see how to predict unknown outcomes using your model. This 1-hour 20-minute module includes 11 demos, which you can practice using our sample data set.
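If you would like a feel for what two of those verification techniques measure before watching the demos, here is a minimal sketch in plain Python. It is not how Analysis Services computes its charts: the function names, the sample scores, and the 0.5 threshold are all assumptions made up for illustration. It shows one common form of a lift chart (the share of actual buyers captured within the top-scored share of the population) and a simple classification matrix.

```python
from collections import Counter

def cumulative_gains(scores, actuals, steps=10):
    """Return (population share, share of positives captured) pairs.

    scores  -- predicted probabilities of the positive outcome (hypothetical)
    actuals -- 1 for a positive outcome, 0 otherwise
    """
    total_positives = sum(actuals)
    if total_positives == 0:
        raise ValueError("need at least one positive case")
    # Rank cases from the highest predicted probability to the lowest.
    ranked = [a for _, a in sorted(zip(scores, actuals),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    points = []
    for i in range(1, steps + 1):
        cutoff = n * i // steps           # number of top-ranked cases included
        captured = sum(ranked[:cutoff])   # positives found among them
        points.append((i / steps, captured / total_positives))
    return points

def classification_matrix(scores, actuals, threshold=0.5):
    """Count (predicted, actual) pairs, predicting 1 when score >= threshold."""
    return Counter((1 if s >= threshold else 0, a)
                   for s, a in zip(scores, actuals))

# Made-up example: ten prospects, three of whom actually bought.
sample_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
sample_actuals = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for share, captured in cumulative_gains(sample_scores, sample_actuals, steps=5):
    print(f"top {share:.0%} of population -> {captured:.0%} of buyers")
print(dict(classification_matrix(sample_scores, sample_actuals)))
```

A model that ranks no better than chance would capture buyers roughly in proportion to the population share contacted, so the further the captured share rises above that diagonal, the more useful the model.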

Data Mining Structures Available in HappyCars Sample Data Set

HappyCars is our educational sample data set, used for teaching Data Mining. It is not easy to find data sets that suit learning data mining, that is, ones that bring out the characteristics of the algorithms rather than merely present representative statistics. That's why we built this one for you, with the specific purpose of teaching Data Mining. It comes with SQL Server tables containing sample data, such as Customers, NonCustomers, Sales, and CustomerActivity, plus a few utility views, amongst others. It also comes with an SQL Server Data Tools (SSDT) project, HappyCarsDM, which contains a prebuilt data source and views, and a series of Mining Structures containing Mining Models, which we explain in the videos of our online Data Mining course. It is available, at no additional cost, to our Full Access Members, as an educational aid, helpful when following the videos.

Request for Feedback

Do the videos play smoothly, and is the resolution really good? They are supposed to switch bitrate automatically to match your bandwidth conditions, but only you can give realistic feedback about that. Is anything not behaving as you would expect, or have you experienced any technical glitches with the site? We know you have high expectations, just as you do when you attend our high-quality conference presentations. Do you like anything in particular, or is something perhaps not so convenient for you? We would be extremely grateful if you would let us know any suggestions or comments you have. Just email me: simply hit Reply, and even a one-liner will be much appreciated.

Thanks for reading and we hope you enjoy being a Full Access Member,

Rafal Lukawiecki, Strategic Consultant and Director, Project Botticelli Ltd

Project Botticelli New Content Announcements