Project Botticelli

New machine learning sample code and data set published

23 December 2016 · 2 comments · 773 views

While delivering my Practical Data Science course to over 300 new data scientists over the last year and a half, I have been updating and, I hope, improving some of the key demos that I use for teaching. As attendees have often asked me to share them, I decided to make them available more broadly. I have just uploaded: R code for performing classification diagnostics and plotting classifier performance; sample DMX that shows how to correctly query an association rules model to make cross-sell predictions with and without demographic (user-level) data; and a SQL Server R Services example that shows four different ways to analyse a 10-million-row data set, containing mortgage default risk information, using logistic regression. For your convenience, I have also included the 10-million-row data set as a SQL Server database backup. All of this is free; however, you need a registered account on our site. Get these files from here:

Enjoy machine learning!



aneesh · 15 April 2017

Hi Rafal,

Why are there no new video courses? :(


Rafal Lukawiecki · 17 April 2017

Hi Aneesh, thanks for asking. The new videos are ready, but they are patiently waiting for our web site infrastructure project to be completed. We are migrating to a new video streaming set-up, and there are also major changes to the back-end to deal with the new and upcoming price model. That project is overdue at the moment, and I am hoping it will be live in July. As soon as that has happened, we will release the new videos: currently, that is Chris Webb's new Power BI course and Marco and Alberto's DAX one. I am afraid we cannot release the new videos before those changes have taken place. Let me know if you have any questions.