Project Botticelli

Practical Machine Learning, AI & Data Science on Azure ML/Server and Azure SQL/Server in R (5 days)

This live classroom course is fully up-to-date for 2019, just refreshed for newly-released ML Server 9.4! It focuses on R and the technologies of Microsoft Machine Learning Server, Azure ML, SQL Server, and Azure SQL Database, whilst teaching you everything you need to know to start using machine learning, and to apply data science for analytics.

This course has two parts. You start with 2-day Part A: Introduction to Machine Learning, AI & Data Science with Azure ML followed by 3-day Part B: Intermediate Machine Learning in R, SQL Azure/Server, and Microsoft ML Server. The first part introduces the most important concepts and tools, while the second part teaches you R and how to use it for machine learning on the Microsoft platform. Most of the course is also applicable to Python programmers, as the key ML Server libraries are the same.

Registration and Dates

  • 2020 dates, including Estonia, Norway, and possibly Ireland, Sweden and UK, to be announced in our newsletter this winter. Please note, this course will be revised for 2020 and split into a separate Azure ML (new) course and a dedicated R in ML Server one.
  • Make sure to get our newsletter to be notified about additional, future dates.

Target Audience

  • Part A: Analysts, budding data scientists, database and BI developers, programmers, power users, DBAs, predictive modellers, forecasters, consultants, anyone interested in using ML for AI.
  • Part B: Current data scientists, ML/AI engineers, and all attendees of Part A.


Prerequisites

There are no prerequisites for Part A, other than general ability to work with data in any form: if you have used a spreadsheet, tables, databases, or you have written a program, no matter how long ago, you will be able to learn from Part A.

Part B expects that you have some understanding of machine learning. If you have attended a prior course on machine learning, or if you are versed in model validity, accuracy, and reliability, consider attending Part B only. If unsure, ask yourself these questions: Can I explain the difference between cross-validation and hold-out testing? Do I know which business metrics correspond to precision and which to recall? Is model accuracy more important than reliability? How does a boosted decision tree work? If in doubt, please attend both Parts A and B.

Why attend this class?

Because of Rafal’s 10+ years of real-world machine learning experience.

You will learn all the concepts and tools that you need to know from a great teacher who has trained over 800 data scientists world-wide and a highly-respected presenter capable of holding your attention, but, above all, from a practitioner of machine learning. Rafal Lukawiecki has been delivering ML, data mining, and data science projects for customers in retail, banking, entertainment, healthcare, manufacturing, education, and government sectors for over ten years. Because of that, you will learn:

  • everything essential to starting data science, ML, and AI projects,
  • all fundamental concepts,
  • how to avoid common pitfalls,
  • how to work fast yet accurately,
  • what is really useful and practical,
  • what is more theoretical but still important,
  • what hype you should be wary of.

You will be able to ask any questions related to your industry and you will get relevant, pragmatic, no-nonsense answers, helping you get ahead with your own projects.

Learn from Rafal who has done it all, not from those who just teach it—this is why it is called Practical Machine Learning.


Format

50% lectures, 30% demos, 20% tutorials.

You are encouraged to follow the demos on your machine, and you will be challenged to find answers to 3 larger problems during the tutorials. While they are a hands-on part of the course, if you prefer not to practice, you are welcome to use that time for additional Q&A, or to analyse your own data. We will provide you with all the necessary data sets, and we will explain what free or evaluation edition software needs to be installed to follow the course on your own laptop. In some training centres we are able to provide pre-built machines which you can use instead of your own—please enquire. You will need an Azure account (even a free one) during the course. You can copy course experiments and data into your workspace for learning and for future reference after the course.

Student Testimonials

The course was an immense learning experience, tapping into the vast knowledge base that is Rafal. His presentation skills and technique made the learning experience very enjoyable. The pace at which he managed to deliver the content was remarkable, even when delayed to answer questions he still managed to run through the enormous subject matter and keep to schedule. All in all it was a very enjoyable learning experience that has fuelled my desire to learn more on the subject.
Sean, Globoforce, Ireland

This was a 5 star course. Rafal is a world class teacher who brings the right combination of practical, technical and theoretical experience to the course. I have a Masters in Analytics and have worked on an Analytics Project for 3 years and yet I still learnt so much from this course. Without a doubt the best course I have been on.
Brian, Department of Social Affairs, Ireland

I highly recommend this course. Rafal’s knowledge, teaching skills and humour makes complex challenges much easier to grasp and understand.
Asbjørn, Genus AS, Norway

I initially stumbled across the Practical Data Science course having seen and been impressed by videos of Rafal speaking at Microsoft Ignite. I appreciated and enjoyed the way he discussed his (extensive) practical experience in the field as much as the technology and am pleased to say the course was no different. I came into the course from a background of working with database’s, but the world of data science is something I’ve always wanted to get more involved in. This course seemed to be ideally tailored for this.
Callum, UK public sector company

I had the pleasure of attending “Practical Data Science” in Copenhagen with Rafal. The course was great, and is just the way it is described—not only was it practical and exciting, but followed by in depth understanding of theory. Rafal is a great instructor, and certainly one of the best experts that I have had the chance to meet. Throughout the whole course I learned a lot and Rafal even took time to debate specific problems that we were contemplating.
Philip, Inspari A/S, Denmark

I can only recommend this course. Rafal is an excellent teacher. He shows real world examples that are directly applicable.
Jacquel, Datalytics AG, Switzerland

Part A: Introduction to Machine Learning, AI & Data Science with Azure ML (Mon-Tue)

To deliver the best possible training we follow the industry. The agenda and course content are subject to continuous improvement and revision without further notice.

Machine Learning Fundamentals

We begin with a thorough introduction of all of the key concepts, terminology, components, and tools. Topics include:

  • Machine learning vs. data mining vs. artificial intelligence
  • Model building vs model deployment
  • Explorative ML vs predictive modelling
  • Getting started
  • Data Driven Decisions
  • Tool landscape: open source R vs. Microsoft R, Python, Azure SQL Database, SQL Server, ML Server, Azure ML
  • Azure ML Studio vs Azure ML Services
  • GUI vs code-first approaches in Azure ML
  • Teamwork
  • Algorithms, frameworks, model validity


Algorithms

There are hundreds of machine learning algorithms, yet they belong to only about a dozen groups, of which 5 are in very common use. We will introduce those algorithm classes, and we will discuss some of the most often used examples in each class, while explaining which technology tools (Azure ML, SQL, or R) provide their most convenient implementation. You will also learn how to find more algorithms on the Internet and how to figure out if they are any good for real use. Topics include:

  • What do algorithms do?
  • Algorithm classes in R, Python, ML Server, Azure ML, and SSAS Data Mining
  • Supervised vs. unsupervised learning
  • Classifiers
  • Clustering
  • Regressions
  • Similarity Matching
  • Recommenders
  • Determining which algorithms/packages are good and trustworthy
  • Correlation is not causation


Data

Machine learning requires you to prepare your data into a rather unique, flat, denormalised format. While features (inputs) are always necessary, and you may need to engineer thousands of them, we do not need labels (predictive outputs) in all cases. Topics include:

  • Cases, observations, samples, rows, and signatures
  • How much data is enough?
  • Does big data help?
  • Inputs and outputs, features, labels, regressors, independent and dependent variables, factors
  • Data formats, discretization/quantizing vs. continuous
  • Indicator columns
  • Feature engineering
  • Azure ML data preparation and manipulation modules
  • Moving data around and its storage, SQL vs. NoSQL, files, BLOBs, data lakes, and Hadoop
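
As an illustration of the flat format described above, here is a minimal Python sketch of two of the listed preparation steps: indicator columns and discretisation. The rows, column names, and bucket boundaries are hypothetical, and the course itself performs these steps with Azure ML modules, SQL, and R rather than with this code.

```python
# Hypothetical customer rows; the column names are assumptions for illustration.
rows = [
    {"age": 23, "segment": "retail"},
    {"age": 41, "segment": "banking"},
    {"age": 67, "segment": "retail"},
]

def add_indicator_columns(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    flat = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        flat.append(new_row)
    return flat

def discretize(value, boundaries, labels):
    """Return the label of the first bucket whose upper boundary exceeds the value."""
    for upper, label in zip(boundaries, labels):
        if value < upper:
            return label
    return labels[-1]  # value is at or above the last boundary

flat_rows = add_indicator_columns(rows, "segment")
for row in flat_rows:
    # Keep the continuous "age" and add a discretised version next to it.
    row["age_band"] = discretize(row["age"], [30, 60], ["young", "middle", "senior"])
```

Each resulting row is a single flat record with numeric indicator columns, which is the shape most algorithms expect as input.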

Process of Data Science

The process consists of problem formulation, data preparation, modelling, validation, and deployment, carried out iteratively. You will briefly learn about the industry-standard CRISP-DM approach, but the main aim of this module is to teach you how to apply the scientific method of reasoning to solve real-world business problems with machine learning and statistics. Notably, you will learn how to start projects by expressing needs as hypotheses, and how to test them. Topics include:

  • Start of every project: stating business needs in data science terms
  • Hypotheses: null vs alternative
  • Hypothesis testing and experiments
  • Evaluating test results
  • The problem with p-values (briefly)
  • Bayesian vs. frequentist approach (briefly)
  • Student’s t-test, Pearson chi-squared test
  • Iterative hypothesis refinement
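
To make the idea of testing a hypothesis concrete, the following Python sketch computes a Pearson chi-squared statistic for a small contingency table and compares it with the standard critical value for one degree of freedom at the 5% significance level (3.841). The experiment and its counts are invented for illustration, and the null hypothesis assumed here is that showing an offer makes no difference to purchases.

```python
def chi_squared_statistic(observed):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if rows and columns were independent (the null hypothesis).
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic

# Hypothetical counts: rows = offer shown / not shown, columns = bought / did not buy.
observed = [[30, 70],
            [20, 80]]

statistic = chi_squared_statistic(observed)

# Critical value of the chi-squared distribution for 1 degree of freedom, alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_005 = 3.841
reject_null = statistic > CRITICAL_VALUE_DF1_ALPHA_005
```

With these counts the statistic stays below the critical value, so the data would not justify rejecting the null hypothesis, even though the raw purchase rates (30% vs. 20%) look different.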

Introduction to Model Building

At the heart of every project we build machine learning models! The process is simple and it follows a well-trodden path. In this module you will build your first decision tree and get it ready for validation in the next module. Topics include:

  • Connecting to data
  • Selecting features and the label
  • Splitting data to create a holdout
  • Initialising the algorithm (Two-class Boosted Decision Tree)
  • Training a decision tree
  • Scoring the holdout
  • Plotting accuracy
  • Information leakage
  • Dealing with troublesome feature selection
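
The split, train, and score steps listed above can be sketched in plain Python. Note that this sketch swaps in a one-split decision stump on synthetic data, not the Two-class Boosted Decision Tree used in the course, and all function names are hypothetical.

```python
import random

def split_holdout(rows, holdout_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split them into training and holdout sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    holdout_size = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - holdout_size
    return shuffled[:cut], shuffled[cut:]

def accuracy(threshold, rows):
    """Share of rows where the rule 'predict 1 when feature >= threshold' is correct."""
    return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)

def train_stump(train):
    """Pick the threshold, among training feature values, with the best training accuracy."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))

# Synthetic data: (feature, label) pairs where the label is 1 when the feature is >= 50.
data = [(x, 1 if x >= 50 else 0) for x in range(100)]

train, holdout = split_holdout(data)          # the holdout is never used for training
threshold = train_stump(train)                # learn from the training rows only
holdout_accuracy = accuracy(threshold, holdout)  # score on rows the model has not seen
```

Scoring only on the holdout is what guards against an over-optimistic accuracy figure; letting holdout rows influence training would be one form of the information leakage mentioned in the topic list.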

Introduction to Model Validation

The most important aspect of any data science, artificial intelligence, and machine learning project is the iterative validation and improvement of the models. Without validation, your models cannot be reliably used. There are several tests of model validity, most importantly those that check accuracy and reliability. Topics include:

  • Testing accuracy
  • False positives vs. false negatives
  • Classification (confusion) matrix
  • Precision and recall
  • Balancing precision with recall vs. business goals and constraints
  • Introduction to lift charts and ROC curves
  • Testing reliability
  • Testing usefulness
  • The bias-variance trade-off (briefly)
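
As a small worked example of the accuracy measures listed above, this Python sketch counts the cells of a classification (confusion) matrix for a binary classifier and derives accuracy, precision, and recall from them. The labels and predictions are made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # correctly flagged
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # missed positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # correctly ignored
    return tp, fp, fn, tn

# Hypothetical holdout labels and the model's predictions for the same rows.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp) if (tp + fp) else 0.0  # of those flagged, how many were right
recall = tp / (tp + fn) if (tp + fn) else 0.0     # of the real positives, how many were found
```

Here accuracy is 0.7 while recall is only 0.5: the model misses half of the real positives, which a single accuracy figure would hide.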

Deployment to Production

Although many models provide immediate, explorative value without needing any further deployment, for many others deployment into production is what enables their predictive and artificially intelligent uses. You will learn a very quick and simple way of deploying Azure ML models as a web service, which works well at the earlier stages of your projects. More advanced deployment techniques are covered in the intermediate course (Part B). Topics include:

  • Production concerns
  • Web service vs other deployment techniques
  • Experiment preparation for deployment
  • Creating a web service
  • Calling a predictive web service using Request/Response API
  • Consuming predictive web services in a Python application and in Excel

Part B: Intermediate Machine Learning in R on SQL Server and Microsoft ML Server (Wed-Fri)

Working with R

There is a large number of tools that you can use with R, and we begin the day focusing on the essential ones. You will also learn how to organise your workflow. Topics include:

  • RStudio
  • Why is RStudio better than RTVS 2017
  • R Tools for Visual Studio 2017 (please note, there is no RTVS for VS 2019)
  • Rattle
  • Microsoft Machine Learning Server vs SQL Machine Learning Services (Azure and Server)
  • Reproducible workflow
  • Package dependency management
  • Snapshots using MRAN Time Machine
  • Projects, files, scripts, history, version control using git
  • Notebooks and RMarkdown

Data Preparation in R

R uses data frames, data tables, and tibbles, amongst others, while ML Server adds XDFs and the ability to work with data stored natively in Hadoop, Spark, and SQL Server. While most data preparation should be done as close to the source as possible, preferably using SQL, you will need to learn how to perform some transformations in R. Topics include:

  • Data frames, tables, tibbles
  • Reading files and ODBC data
  • XDFs and connecting to data in ML Server
  • Tidyverse
  • dplyr
  • Scaling data access using ML Server to overcome R/Python memory and parallelism limitations

Plots and Visualisations in R

One of the strengths of R is the ease of creating accurate (and good looking!) plots. As a bare minimum you need to understand how to use the most popular visualisation package, ggplot2, and some of the built-in base functions. Topics include:

  • Summarising data
  • Base boxplots, histograms, scatter plots
  • ggplot2: grammar of graphics
  • Combining visualisations into layers
  • Density plots
  • Surfacing R graphics in Power BI and SQL Server
  • Plotting big data using ML Server

Clustering, Segmentation, Anomaly Detection

Segmentation is the main application of unsupervised learning using clustering algorithms. You will also learn how to apply this technique for anomaly (outlier) detection and data preprocessing. Topics include:

  • Introduction to segmentation
  • Clustering algorithms (k-means, EM, hierarchical, and others)
  • Working with k-means
  • Preparing data for clustering, incl. categorical, non-numeric data
  • Informal yet practical introduction to Principal Component Analysis (PCA)
  • Interpreting clusters
  • Validating cluster goodness of fit using plots and metrics
  • Anomaly detection with clustering, PCA and SVMs
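
A minimal sketch of k-means (Lloyd's algorithm) in plain Python shows both uses mentioned above: segmentation and anomaly detection by distance from the nearest cluster centre. The points and the hand-picked initial centroids are assumptions made for illustration; the course works with R and ML Server implementations instead.

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments

# Hypothetical 2-D data: two tight groups plus one point far away from both.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9), (25, 25)]
centroids, assignments = kmeans(points, initial_centroids=[(1, 1), (8, 8)])

# Anomaly score: distance from each point to the centre of its own cluster.
distances = [math.dist(p, centroids[a]) for p, a in zip(points, assignments)]
most_anomalous = points[distances.index(max(distances))]
```

The far-away point is assigned to the second cluster but sits much further from its centre than any other member, which is what makes it stand out as a candidate outlier. It also drags that centroid towards itself, one reason why interpreting and validating clusters matters.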


Classifiers

Without doubt, classifiers are the most important and most often used category of machine learning algorithms, and the foundation of algorithmic data science and of most of today’s Artificial Intelligence. We will focus on several variants of the most important classification algorithm, the decision tree, while progressively interpreting the results and improving its performance. After introducing neural networks and logistic regression we will also compare the performance of all of these classifiers on our test dataset. Topics include:

  • Introduction to classifiers
  • Two-class (binary) vs multi-class
  • Decision trees, forests, and boosting
  • Neural networks
  • Logistic regression
  • Implementing simple decision trees in plain R
  • Visualising plain decision trees
  • Decision Forests and Boosting in ML Server
  • Overfitting (overtraining) concerns
  • Pruning and Complexity Penalty (CP), regularisation weight and other hyperparameters
  • Minimum support and the size of the tree
  • Avoiding overfitting through hyperparameter tuning
  • Implementing parallelised logistic regression on big data using ML Server

Classifier Validation

Validation of classifiers will be your key concern, because classifiers are used so often, and because their accuracy is not easy to balance with business requirements, such as restricted resources, or a required level of business performance. Building on your understanding of model validity (introduced in Part A of this course), you will learn how to balance an acceptable number of false positives with false negatives by using classification (confusion) matrices, metrics of precision and recall, by plotting ROC (Receiver Operating Characteristic) curves, and by measuring their business impact using profit and cost charts. Attendees have commented in the past that this is the most important module of the entire course. Topics include:

  • Testing classifiers
  • Charting precision-recall and sensitivity-specificity
  • Balancing precision-recall with business goals and constraints
  • ROC curves and lift charts in detail
  • Other measures of accuracy, including AUC, and F1 scores
  • Class imbalance problem (fraud analytics and rare event prediction)
  • What exactly does cross-validation tell us?
  • Measuring quality of cross-validation
  • Optimising binary classifier prediction probability thresholds for a given business target
  • Refining models to improve accuracy and reliability
  • Refining Complexity Penalty through cross-validation using caret package
  • Hyperparameter tuning
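
One way to picture the optimisation of a prediction probability threshold against a business target is the Python sketch below. It assumes a simple, hypothetical profit model (a fixed gain for every true positive and a fixed cost for every false positive) and made-up scores; real profit and cost charts would use your own business figures.

```python
def profit_at_threshold(scores, labels, threshold, gain_per_tp, cost_per_fp):
    """Total profit if every case scored at or above the threshold is acted upon."""
    profit = 0
    for score, label in zip(scores, labels):
        if score >= threshold:
            profit += gain_per_tp if label == 1 else -cost_per_fp
    return profit

def best_threshold(scores, labels, gain_per_tp, cost_per_fp):
    """Try every observed score as a threshold and keep the most profitable one."""
    candidates = sorted(set(scores))
    return max(
        candidates,
        key=lambda t: profit_at_threshold(scores, labels, t, gain_per_tp, cost_per_fp),
    )

# Hypothetical predicted probabilities and true labels for a holdout set.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# Cheap false positives: it pays to cast a wide net.
threshold_cheap_fp = best_threshold(scores, labels, gain_per_tp=10, cost_per_fp=4)

# Expensive false positives: only the most confident predictions are worth acting on.
threshold_costly_fp = best_threshold(scores, labels, gain_per_tp=10, cost_per_fp=30)
```

The same model and the same scores lead to different thresholds (0.4 and 0.9 here) once the cost of a false positive changes, which is why the threshold is a business decision as much as a technical one.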


Regression

Considered by some as the numerical equivalent of classifiers, regression is a large subject of its own. We will introduce its simple but very popular form, linear regression, followed by the Generalised Linear Model and other forms of regression, and finally, the more precise, but also prone-to-overfitting, decision tree variants. Topics include:

  • Introduction to simple regressions in R
  • Linear regression (classic)
  • Generalised Linear Models (GLM)
  • Dealing with non-normal data (Gamma distribution)
  • Ordinal and multinomial regressions
  • Advice on working with (star) ratings and Likert scales
  • Regression decision trees and other ensemble regression algorithms
  • Regression as a building block of other algorithms

Regression Validation

Unlike classifiers, regressions are easier to assess. You will learn about basic tests of classical linear regressions that are easy to perform in R, and about measuring quality of machine learning, non-linear regressions. Topics include:

  • Measuring linear regression quality
  • Homoscedasticity, multicollinearity and other concerns
  • Common diagnostic plots
  • Making prettier regression validation scatterplots in ggplot2
  • Measuring machine learning regression quality
  • R-squared (Coefficient of Determination), RMSE, MAE, RAE, RSE

Deployment to Production

If you plan on using your models for prediction, rather than just for the exploration of data, or if you want to embed them as Artificial Intelligence in your applications, you need to deploy your models to production and maintain them on an on-going basis. Since we focus on the Microsoft ML Server and SQL ML Services (both Azure Database and Server), you will learn about the powerful and fast PREDICT T-SQL statement, and other supported mechanisms for deploying your models. We will also discuss how to deploy models as a web service, using these, and other Microsoft and non-Microsoft techniques. Topics include:

  • What needs to be deployed, and when?
  • PREDICT T-SQL statement
  • Using sp_execute_external_script
  • Model storage, management and serialisation concerns
  • Deploying web services using mrsdeploy and operationalisation server clusters
  • Consuming web services API from R
  • Consuming web services using Swagger and REST
  • On-going maintenance and model updates
  • Relationship to Azure ML

Please note: we reserve the right to amend the order of the modules to best suit the dynamic character of the class and to answer questions as they arise. Some subjects will only be covered if time allows, but your satisfaction is guaranteed.
