Project Botticelli

Introduction to Azure ML Classic

29 January 2015 · 4 comments · 2966 views

Process, Modelling, Validation, Experiments and Web Services

Rafal discusses a scoring experiment design in Azure Machine Learning

This full-length, 1-hour 40-minute, in-depth video introduces every aspect of Microsoft Azure Machine Learning: tools and concepts, the process, uploading data, modelling, validating results, preparing and publishing scoring experiments and even using deployed machine learning web services by calling them from a Python application.

It is easy to get started with Azure ML: all you need is a Microsoft account (a free one is OK), and once you have deployed an ML Workspace in the Azure Portal, you can start using the main development tool, ML Studio.

Of course, to do anything, you need to connect to data representing the cases which you want to analyse and model. If you are not sure what a case is, make sure to watch the preceding module of this online course, which introduces all of those important concepts in detail. There is a good variety of supported data sources, including flat files in formats such as CSV, ARFF (used by Weka), JSON, XML, and even R's native RData format, although RData is not yet fully supported by the various Azure ML modules, even though R code itself is. You can also connect via a web service to a flat file, get data via OData or even Power Query, or read from a Hadoop Hive table or an Azure blob.

However, if you are studying fairly typical cases/observations, such as customer or product signatures, the best way to manage them is to upload them to a SQL Azure Database and then connect to it from your Azure ML experiment. This gives you the best of both worlds: a neat, easy-to-use relational database for managing the data, changing column types, and even running SQL queries, combined with the ability to have all of the data efficiently read and processed by the machine learning modules inside Azure ML.
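If you would like to follow that route, a minimal sketch of the upload step in Python might look like the code below, assuming a pandas/SQLAlchemy toolchain. The server, credentials, file name, and table name are all placeholders, not part of the course materials:

    import pandas as pd
    from sqlalchemy import create_engine

    # One row per case (e.g., a customer signature); file name is hypothetical.
    cases = pd.read_csv("happy_cars.csv")

    # Placeholder SQL Azure connection string; requires the Microsoft ODBC driver.
    engine = create_engine(
        "mssql+pyodbc://user:password@myserver.database.windows.net/HappyCarsDB"
        "?driver=ODBC+Driver+17+for+SQL+Server"
    )

    # Replace the table on each upload; an Azure ML reader module can then
    # query it with plain SQL from inside the experiment.
    cases.to_sql("CustomerCases", engine, if_exists="replace", index=False)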

While there are many algorithms you could use in Azure ML, our demo focuses on building a two-class (also known as binary) classifier using a Decision Forest. In our case, we are building a service which can predict whether a potential customer is likely to purchase many cars from our fictitious company, Happy Cars. To create such a service, we follow most of the data mining/machine learning process, which can be summarised as a cycle of cycles, each of which progressively adjusts the input data (cases), builds models, and validates them, before deploying those into production. Having a good process in place is key to making machine learning part of your business, so make sure to watch this section.
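Azure ML Studio configures its Two-Class Decision Forest entirely through the GUI, so there is no code to show for that step. Purely as an illustration of the same idea, here is a sketch using scikit-learn's random forest (also an ensemble of decision trees), with hypothetical column names standing in for the HappyCars cases:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    cases = pd.read_csv("happy_cars.csv")
    X = cases.drop(columns=["BuysManyCars"])   # hypothetical label column
    y = cases["BuysManyCars"]                  # 1 = buys many cars, 0 = does not

    # Hold out 30% of the cases for validation, mirroring a train/test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)

    # A decision forest: many decision trees voting together on each case.
    forest = RandomForestClassifier(n_estimators=100, random_state=42)
    forest.fit(X_train, y_train)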

Model validation is a relatively complex subject, and there are many tests you should perform, many of which are discussed in a module on data mining. At the simplest level, you need to validate a model's accuracy, reliability, and usefulness. Azure ML will help you assess the first two, but the third needs input from a human domain expert, who can tell you if your model makes business sense. For example, a model that finds out that ladies usually buy female clothing would be a very accurate and reliable one, but not useful at all (unless you happen to be studying humanity from another, very different planet)! ROC (receiver operating characteristic) charts, lift charts, and the confusion/classification matrix help us assess model accuracy; they are shown in the demo, which also shows how to compare one model's performance against another. There is much to learn about your model at this stage. Notably, this is the time when you get a good feel for the ratio of false positive to false negative predictions that you want to balance against each other, something that leads to the selection of an appropriate prediction threshold (or cut-off value). Watch the video to find out why we are selecting a probability threshold of 0.3.
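Continuing the hypothetical scikit-learn sketch from above, this is roughly how the same accuracy measures look in code, including applying the 0.3 probability threshold rather than the default 0.5:

    from sklearn.metrics import confusion_matrix, roc_auc_score

    # Predicted probability that each held-out customer buys many cars.
    probs = forest.predict_proba(X_test)[:, 1]
    print("Area under the ROC curve:", roc_auc_score(y_test, probs))

    # A lower cut-off (0.3 instead of 0.5) flags more customers as likely
    # buyers: fewer false negatives, at the price of more false positives.
    predictions = (probs >= 0.3).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()
    print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")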

Once you are pleased with your model's performance because you have validated it, you may want to deploy it to production. This capability is fairly unique to Azure ML, and it is an important part of its design for reliable, manageable production use. Even though you could just deploy the experiment which built the model, doing so would be inefficient and somewhat limiting. Instead, create a new experiment which will be used for applying a previously saved model to new data by means of a web service call. This is known as scoring in the machine learning world, or as predicting in data mining, hence the term predictive analytics. In fact, this is not really an experiment any more ("experiment" is a bit of an odd term that Azure ML uses in this context); it is more of a query service. Once you have published it as a web service and optionally cleaned up the schema of the required inputs, you are ready to test your new service. This can be done by filling in values on a convenient web service test page, or by making an actual web service call from a different application. In our case, you will see an example of short, simple Python code which calls our service to predict if the customer in question is likely to buy four or more cars from our company in their lifetime; naturally, you could use pretty much any language you wish to make this call. The application can then offer that customer a special deal, or perhaps direct them to someone dedicated to looking after our most valuable customers.
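The exact code from the demo is in the video; as a rough, standard-library-only sketch, a call to a published Azure ML Classic scoring service follows the request-response pattern below. The URL, API key, and input schema are placeholders which you would copy from your own service's dashboard:

    import json
    import urllib.request

    URL = "https://<region>.services.azureml.net/workspaces/<ws-id>/services/<svc-id>/execute?api-version=2.0&details=true"
    API_KEY = "<your-api-key>"

    # Column names and values are hypothetical; use your service's schema.
    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": ["Age", "Income", "Region"],
                "Values": [["35", "52000", "West"]],
            }
        },
        "GlobalParameters": {},
    }

    request = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + API_KEY},
    )
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read())
    print(result)  # scored label and probability for the submitted case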

This is all you need to do to get started! In reality, you would want to perform more real-life testing, such as A/B testing of your predictive service, and you would need to constantly review the model for its changing validity; as we explain earlier in the video, this is part of the cyclical nature of using machine learning in your business. Make sure to follow the remaining modules of this course, especially those that explain the key aspects of model validity. Finally, if you would like to replicate the steps of the demo, make sure to download our sample data set, HappyCars, which is available to Full Access Members.


Comments

hxy0135 · 3 February 2015

Dear Rafal,
Watched the video. As always, it is well done. I especially like the part "Evaluate model accuracy". You gave a very clear explanation of how to evaluate a model.
In Azure ML Studio, Microsoft gives many samples, but for almost all models they did not provide any in-depth explanation of why they do things that way. Your in-depth analysis and explanation of model evaluation make your video valuable.

Thank you!

Rafal Lukawiecki · 3 February 2015

Thank you, Hua, for your very kind comment. I am happy that you saw the key point in how I, and my colleagues, teach complex concepts in our videos: we always want to focus on a real-world reason why something matters—I inherited this attitude from one of my first bosses, Ian, when I worked in Oxford. He would go to great lengths always to explain the “why”. I promise to keep up this approach throughout this course, and the others which are still to come.

rjsiii · 16 May 2016

Thanks Rafal for this video. I wish I could have made the course in Oakbrook last week but unfortunately could not. At 87:08 you set the input of the Score Model task as "Set as Publish Input". I'm not sure if it's because Azure has been updated (i.e., now I use "Import Data" and "Select Columns" instead of "Read" and "Project") or if I missed something.

How'd you "Set as Publish Input" and is that still required?

Thanks much, I've LOVED the site so far!
-Rob Schenck

Rafal Lukawiecki · 17 May 2016

Hi Rob, you are quite correct: Microsoft have changed a good bit of the UI since the video. The new way is to use a dedicated pair of modules called, at the moment, Input and Output, and they appear in the Web Service section of the module list, to the left of the canvas.

Perhaps you can make it to Chicago on Nov 7th? We have just added that course to the schedule.