Project Botticelli

Machine Learning and Data Science in Open Source R with Microsoft SQL

5-day tutored, online R course

To find out about our live online course delivery format, available dates, more info on our other ML course, payment options, and the cancellation policy, click here. To book, use the green button on this page. Please note that this course is planned to be updated in 2024 using the most recent version of R and the current Microsoft data platform.

The R Logo is © 2016 The R Foundation

What Will You Learn?

  • Building and deploying machine learning models using the open source R programming language, including data preparation, visualisation, and stringent model validation.
  • High-performance ML using the newest version of Microsoft Azure SQL, Synapse, and SQL Server Machine Learning Services with Open Source R and RStudio.
  • Deployment to production with nanosecond-scale performance.
  • Successful data science project formulation and delivery.

Course Description

Above all, this course will teach you modern R: currently the most powerful language explicitly designed for advanced analytics, statistical learning, data science, and cutting-edge general-purpose machine learning. While Python is more popular as a universal programming language, and is also widely used for image and text analysis using deep learning, R is a clear leader in data science. You will learn how to do machine learning in R, especially on the classical data sets that you often encounter in business use. Even though such data might come from a data lake, typically you will find plenty of it in a data warehouse or a relational database, or you can acquire it from transactional business application files, or from devices such as healthcare equipment, point-of-sale devices, or manufacturing and transportation machinery. R is also great for exploratory analysis of data, and it can help you draw meaningful conclusions from real-world experiments, such as A/B marketing tests or product trials. This course will teach you the foundations of hypothesis testing so that you can draw such conclusions with a high degree of confidence.
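As a rough illustration of the kind of hypothesis test mentioned above, the following minimal sketch runs a two-sided two-proportion z-test on an A/B marketing experiment. It is written in Python using only the standard library, purely for illustration, and is not taken from the course materials. The function name `two_proportion_z_test` and the conversion counts are hypothetical, and the normal approximation is assumed to be acceptable for samples of this size.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value). Assumes both samples are large enough for the
    normal approximation, and that the pooled rate is strictly between 0 and 1.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_error
    # Probability of a difference at least this extreme in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: variant A converts 120 of 1000 visitors, variant B 150 of 1000.
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value suggests that the observed difference would be unlikely if both variants really converted at the same rate. What counts as small enough is a decision best made before the experiment starts.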

Microsoft SQL Server 2019 Machine Learning Services, now also available in Azure SQL, Azure Synapse, and Azure SQL Managed Instance, support R and Python through a number of high-performance, scalable, enterprise-ready, easy-to-use packages and libraries, notably RevoScale. You will learn how to use them during this course. Microsoft have announced some interesting improvements and changes to Machine Learning Services, due with a future version of SQL, and we will ensure you are ready for them! Rafal will also explain the future of the Microsoft technology, as it moves further into the open source world, with the upcoming versions of Azure SQL and SQL Server. You will also learn how to do almost everything using the most popular algorithms provided by open source R packages and functions, such as rpart, kmeansruns, fpc, cluster, AMR, factoextra, clusplot, ts, xts, e1071, caret, glm, and for extra help rattle, pROC, ROCR, arules, arulesViz, qdapTools, MLmetrics, and miscTools.

You will learn how to prepare and visualise data both by using open source packages, mainly dplyr and ggplot2 and other parts of the tidyverse meta-package, like readr, readxl, and lubridate, and more directly in SQL Server/Azure SQL, benefiting from its performance and scalability. We will even combine the power of R with Power BI to create informative visualisations that are otherwise impossible to create in Power BI alone.

Figure: Grouped Notched Boxplot in R

While learning about the data science process and hypothesis testing, you will discover that some complex business questions can be answered using simpler, statistical techniques, such as tests of significant differences between sets of data, or visualisations like notched box plots. We will refresh your knowledge of the rudimentary statistical concepts that are necessary for machine learning and data science, like knowing the difference between ordinal, interval, and ratio data, and thus why it does not make sense to calculate a mean star rating, while a median is possible. A little time has been allocated for the discussion of p-values, confidence intervals, and the differences between the Bayesian and frequentist interpretations of your results. Bear in mind that this is not a course about statistics, but a little working knowledge is a must in our industry, and it will make the rest of the course easier to follow.
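The point about star ratings can be sketched with a small example. The snippet below is a minimal Python illustration using only the standard library; the ratings are invented, the helper `summarise` is hypothetical, and none of it comes from the course materials. Because star ratings are ordinal, the gaps between stars are not guaranteed to be equal, so the mean is shown only for contrast with the median.

```python
from statistics import mean, median_low

# Hypothetical 1-5 star ratings for two products (invented for illustration).
ratings_a = [1, 1, 5, 5, 5]
ratings_b = [3, 3, 4, 4, 3]


def summarise(ratings):
    """Return the median (meaningful for ordinal data) and the mean (for contrast only)."""
    # median_low always returns a value that occurs in the data,
    # so the result is a real star rating rather than an interpolated one.
    return {"median": median_low(ratings), "mean": mean(ratings)}


summary_a = summarise(ratings_a)
summary_b = summarise(ratings_b)
print("Product A:", summary_a)
print("Product B:", summary_b)
```

Both products have the same mean rating, yet their medians differ by two stars: product A polarises its reviewers while product B is consistently middling, which the mean alone hides.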

Early in the course, you will learn all the fundamentals of machine learning—no prior knowledge is necessary. You will study: data preparation and relevant structures, algorithm classes and their applications, model evaluation and validation, including all the common performance metrics such as precision and recall. At the heart of this course, however, you will gain an intimate understanding of how some of the most important algorithms work and how to prepare data to make the algorithms give you the most they can.

Figure: Visualising Clustering Quality with Clusplot

Starting with clustering, you will learn about k-means, k-medians, spherical k-means, and expectation-maximisation. You will find out how to prepare non-numerical, and even some numerical, data for these algorithms using popular R functions such as mtabulate. Other than using clustering for segmentation, we will also study its use for anomaly detection. We will expand on that subject using other, specialised techniques, such as a One-Class SVM and PCA-Based Anomaly Detection, permitting you to predict anomalies, such as fraud.

Figure: Decision Tree in R on ML Server

We dedicate a full day to building classifiers. You will understand the differences between the most important decision tree algorithms (plain trees, forests, and boosting), and you will study both simpler and more complex neural networks, and how they relate to regressions. We will also cover the widely used logistic regression algorithm which, despite its name, is a classifier. Later in the course you will meet the large family of regression techniques, starting with classic linear regression, through GLM, the generalised linear model, to non-linear ML regressions. We will also have some time to cover the remaining big applications of machine and statistical learning, notably forecasting with time series, and, briefly, recommendation engines.

When deploying models to production, the benefits of using SQL ML Services will impress. After seeing how to do it using open source R, we will culminate with an extremely fast in-database deployment using the T-SQL PREDICT function and the related real-time sp_rxPredict, which returns predictions on a nanosecond scale! You will also see how to deploy your models using web services, interacting via Azure if needed. Please note, however, that this course does not focus on Azure ML, even though we will briefly discuss how to combine those technologies (please also see our other course by Rafal that focuses on Azure ML).

Every day we will work using RStudio, the most popular, free R IDE, which is recommended by Microsoft for building R applications on top of Azure SQL and SQL Server. All of our work will follow the modern principles of reproducible research: you will learn how to set up notebooks, how to manage packages and their dependencies, including versioning using snapshots, how to save your work, how to manage change using Git, and how to collaborate. At the end of the course you will keep your own R notebook containing almost 1000 lines of code and results! You are also welcome to keep all the data sets that you use during the course labs and tutorials. You will notice that throughout the week you understand and write better and more advanced R, whilst experiencing, first-hand, many of its real-world applications.

Model validity is the most important aspect of any machine learning project. A lot of time has been dedicated to explaining it in detail. We cover many validity metrics, such as precision, recall, AUC, F1 score, and accuracy (which is rarely a good metric), and the many charts we use to analyse models, especially: confusion matrix, lift/gain charts, ROC curve, precision-recall curve, profit and cost chart, calibration charts, scatter plots, and others used for regression evaluation, like histograms of residuals, QQ-Norm plot of residuals, scale-location, Cook’s distance, and many others. You will learn how to create those plots using R, and with the help of other tools. At the end of the course you will know when you can trust your models, and you will be able to explain your work to others, especially your project sponsors, who are rarely machine learning experts.
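To see why accuracy alone can mislead, consider the following minimal sketch, written in Python with the standard library only. The data are invented (5 fraud cases among 100 transactions), the function `classification_metrics` is a hypothetical helper rather than anything from the course or from an R package, and reporting an undefined precision or recall as 0.0 is an assumption made here for simplicity.

```python
def classification_metrics(actual, predicted):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: 5 fraudulent transactions (1) followed by 95 legitimate ones (0).
actual = [1] * 5 + [0] * 95

# A "model" that never flags anything.
never_flags = [0] * 100
naive = classification_metrics(actual, never_flags)

# A model that catches 4 of the 5 frauds but raises 6 false alarms.
flags_some = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89
useful = classification_metrics(actual, flags_some)

print("Never flags:", naive)
print("Flags some: ", useful)
```

The model that never flags anything scores higher on accuracy than the one that finds most of the fraud, while its recall is zero. On imbalanced data like this, precision, recall, and F1 describe the model's usefulness far better than accuracy does.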

This course will not only teach you the technology and how to use it but, much more importantly, you will understand how ML works, how to avoid common mistakes, such as overfitting/overtraining, how to balance model accuracy against its reliability (the bias-variance trade-off), and how to relate key ML performance metrics to your business goals, making your bosses and clients happy with your progress and results. You will gain clarity on how to start your data science projects and how to finish them. You will know how to express the business need in terms of testable hypotheses, which will guide model building and selection. You will understand what types of work are suited to ML, and which are unlikely to deliver results. You will discover what makes good first projects in your own area of specialisation. These are the key benefits of studying machine learning with Rafal Lukawiecki: an industry veteran who has been practising ML, data mining, statistical learning, and data science with his customers for well over a decade, and who studied artificial intelligence at Imperial College in the ‘90s under the guidance of the leaders and the inventors of this area of industry and science.

Target Audience

Analysts, budding and current data scientists, data engineers, DBAs, BI developers, programmers, power users, predictive modellers, forecasters, consultants, AI engineers, and anyone interested in using ML for AI.

Prerequisites

General ability to work with data in any form: using spreadsheets, tables, or databases. Prior knowledge of any programming language is helpful; however, if you are prepared to work harder by asking Rafal questions and doing a little additional homework during the week, you can use this course to learn R as your very first programming language.

This course will teach you machine learning and data science using R and Microsoft technologies: you do not need to know them before attending.

Format

50% lectures, 25% demos, 25% lab tutorials.

You are encouraged to follow the demos on your machine, and you will be challenged to find answers to a few larger problems during the tutorials. We will provide you with all the necessary data sets including structured R Markdown notebooks containing labs. While both the demos and the tutorials are a hands-on part of the course, if you prefer not to practice, you are welcome to use that time for additional Q&A, or to work with your own data. As each training centre is different, you will receive an email, two weeks before the course starts, explaining how to prepare your computer for the course, unless the centre is providing one for you. In any case, preparation is easy, because we will use an Azure virtual machine that has been fully preconfigured with all the necessary software for the course. If you follow our preparation guide, you can take this VM together with the course data for your own future learning and reference.

Why attend this class?

Because of Rafal’s 10+ years of real-world machine learning experience.

You will learn all the concepts and tools that you need to know not only from an experienced teacher who has trained over 900 data scientists world-wide, and a highly respected presenter capable of holding your attention, but, above all, from a practitioner of machine learning. Rafal Lukawiecki has been delivering ML, data mining, and data science projects for customers in retail, banking, entertainment, healthcare, manufacturing, education, and government sectors for twelve years. Because of that, you will learn:

  • everything essential to starting data science, ML, and AI projects,
  • all fundamental concepts,
  • how to avoid common pitfalls,
  • how to work fast yet accurately,
  • what is really useful and practical,
  • what is more theoretical but still important,
  • what hype you should be wary of.

You will be able to ask any questions related to your industry and you will get relevant, pragmatic, no-nonsense answers, helping you get ahead with your own projects.

Learn from Rafal who has done it all, not from those who just teach it—this is why it is Practical Machine Learning.

Student Testimonials

Selection of comments from students:

This was the most well-thought out and practical course on data science that I have attended during my 10-years in analytics. Rafal’s lucid and engaging style ensured a learning experience that was infinitely more stimulating and thought provoking than one might expect from 5 days of technical training delivered remotely. I look forward to applying Rafal’s wisdom to real-world business problems, confident of avoiding the implementation pitfalls that so often beset machine learning projects.
Brian, Department of Social Protection, Ireland

The course was an immense learning experience, tapping into the vast knowledge base that is Rafal. His presentation skills and technique made the learning experience very enjoyable. The pace at which he managed to deliver the content was remarkable, even when delayed to answer questions he still managed to run through the enormous subject matter and keep to schedule. All in all it was a very enjoyable learning experience that has fuelled my desire to learn more on the subject.
Sean, Globoforce, Ireland

I highly recommend this course. Rafal’s knowledge, teaching skills and humour makes complex challenges much easier to grasp and understand.
Asbjørn, Genus AS, Norway

I initially stumbled across the Practical Data Science course having seen and been impressed by videos of Rafal speaking at Microsoft Ignite. I appreciated and enjoyed the way he discussed his (extensive) practical experience in the field as much as the technology and am pleased to say the course was no different. I came into the course from a background of working with database’s, but the world of data science is something I’ve always wanted to get more involved in. This course seemed to be ideally tailored for this.
Callum, UK public sector company

I had the pleasure of attending “Practical Data Science” in Copenhagen with Rafal. The course was great, and is just the way it is described—not only was it practical and exciting, but followed by in depth understanding of theory. Rafal is a great instructor, and certainly one of the best experts that I have had the chance to meet. Throughout the whole course I learned a lot and Rafal even took time to debate specific problems that we were contemplating.
Philip, Inspari A/S, Denmark

I can only recommend this course. Rafal is an excellent teacher. He shows real world examples that are directly applicable.
Jacquel, Datalytics AG, Switzerland

R logo shown above © The R Foundation CC-BY-SA 4.0

Book R for ML and SQL

$1,999 (about €1,764*)

* Based on the current ECB reference rate.

The new 2024 course will be announced in our newsletter.