Machine Learning for IT Security: From ML to Security AI

Jump to a chapter

Introduction (00:07)
Agenda (00:49)
Opportunities of Applying ML to IT Security (02:54)
Requirements (06:16)
Supervised Approach (Security Classifier) (15:32)
Unsupervised Approach (Anomalies and Outliers) (23:00)
False Positive Problem (28:19)
Security Artificial Intelligence (34:53)
Summary (39:33)

This video explains how to get started with machine learning for IT security purposes. Using ML for security can offer valuable new opportunities, leading to better, more thorough, proactive security, with fewer breaches. You can thank your existing, and some new, security-related data, such as logs, which we rarely use in a proactive manner. Logs are usually analysed to find out what happened, rather than to predict what is going to happen. Further, their size, their messiness, and the general difficulty of drawing valid conclusions from them make them hard to work with using traditional approaches.

Interestingly, this is exactly where machine learning can help. ML is good at working with large amounts of potentially confusing data. Above all, you can use your ML models to draw conclusions about the security risk of new, previously unseen situations, provided that you have followed the foundations of any good data science process, validated your models thoroughly to a statistical level of significance, and tested them before deployment.

As with all machine learning and data science projects, you need to prepare your data carefully, structuring it into the common flat-table format with one row per event (case), where columns represent attributes that you can legally, ethically, and morally analyse. If you are going to follow the supervised approach, you will need one additional column, the predictable target, also known as the label. This label column must exist before you build your model, and it needs to contain a known fact, ideally a Boolean of some sort, denoting whether the known event was or was not a security issue. Most likely you will build a classification model using an algorithm such as decision trees, logistic regression, or a neural network. This classifier will enable you to predict whether an event, a security credential, a person, an application, etc., that is currently performing some action is threatening enough to warrant an intervention.
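As a rough illustration of the supervised approach, here is a minimal sketch in Python using only the standard library. The table, its column names, and every value in it are invented for this example, and the tiny hand-written logistic regression stands in for whatever algorithm or library you would really use; it is not the method shown in the video.

```python
import math

# Hypothetical flat table: one row per login event, one column per attribute,
# plus the label column (1 = known security incident, 0 = benign).
# Columns: failed_logins, off_hours (0/1), new_device (0/1), label
EVENTS = [
    (0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (2, 0, 0, 0),
    (1, 0, 1, 0), (0, 0, 1, 0), (6, 1, 1, 1), (8, 1, 0, 1),
    (5, 0, 1, 1), (7, 1, 1, 1), (9, 0, 1, 1), (4, 1, 1, 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, epochs=2000, lr=0.1):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0]) - 1
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for *x, y in rows:  # the last column is the label
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_risk(model, x):
    """Return the predicted probability that event x is a security issue."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


model = train_logistic_regression(EVENTS)
risky = predict_risk(model, (7, 1, 1))   # many failures, off hours, new device
benign = predict_risk(model, (0, 0, 0))  # nothing unusual
print(f"risky event: {risky:.2f}, benign event: {benign:.2f}")
```

In a real project the model would be validated on held-out data before anyone acted on its scores; this sketch trains and scores on a toy table only to show the shape of the data and the role of the label column.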

What if you do not have such data? Or what if you are still at an earlier stage and you are merely doing some detective work to find out if there were any security issues that your data has captured? The unsupervised approach can help. Your data needs to be prepared in an identical way to the supervised approach: one row per case, with one column for each analysable attribute. However, you will not have, or will not need to use, the label (predictable) column. In this approach you will use algorithms such as clustering or association rules to find natural groupings of cases, looking for smaller, anomalous clusters, for outliers that belong to no cluster, or for links that connect one suspicious event to another. There are many other ways to combine algorithms when building your security model using machine learning.
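To make the idea of outliers concrete, the following sketch scores each row of an unlabelled table by its average distance to its nearest neighbours. This is a simple nearest-neighbour outlier score, used here as a stand-in for the clustering and association-rule algorithms mentioned above, not a replacement for them. The session table and its columns are hypothetical.

```python
import math
import statistics

# Hypothetical unlabelled table: one row per user session, no label column.
# Columns: logins_per_hour, mb_downloaded, distinct_hosts_contacted
SESSIONS = [
    (3, 20, 2), (4, 25, 3), (2, 18, 2), (5, 30, 3), (3, 22, 2),
    (4, 27, 3), (3, 19, 2), (5, 28, 4), (4, 24, 3), (40, 900, 55),
]


def standardise(rows):
    """Rescale every column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid dividing by 0
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]


def knn_outlier_scores(rows, k=3):
    """Score each row by its mean distance to its k nearest neighbours."""
    scaled = standardise(rows)
    scores = []
    for i, a in enumerate(scaled):
        dists = sorted(math.dist(a, b) for j, b in enumerate(scaled) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


scores = knn_outlier_scores(SESSIONS)
most_anomalous = max(range(len(SESSIONS)), key=scores.__getitem__)
print("most anomalous session:", SESSIONS[most_anomalous])
```

A high score only says that a session is unusual, not that it is malicious, so in practice the top-scoring rows would be handed to an analyst for investigation.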

Sooner or later, when you try to use your model in production, you will face a major issue: false positives (FPs). Simply speaking, your system will make mistakes and it will class benign events as threats! Dealing with them in a just, legal, ethical, and common-sense manner is extremely important, but not easy. If you fail to do so, you will cause damage and upset, and you may even expose your organisation to legal expenses or reputational losses, perhaps even straying into morally reprehensible territory: denying people a service they deserve simply because a computer thinks there is something wrong with them is unacceptable.

Having a process for dealing with false positives is fundamentally important, but so is tuning your model to balance their occurrence against its ability not to miss threats, measured as false negatives (FNs). My courses teach a great deal about this: there are many ways to find a balance between those two opposing goals (FP vs FN) that is good for everyone involved, whilst being cost-effective. By the way, that is, essentially, practical machine learning, as applied to real-world projects and situations, rather than treated as a mathematical optimisation problem.
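One common way to balance FPs against FNs is to pick the decision threshold that minimises the total cost of mistakes on validation data. The sketch below assumes you can put a cost on each kind of error; the scores, labels, and cost figures are all invented for illustration, and this is only one of many possible balancing methods.

```python
# Hypothetical validation results: (predicted risk score, true label),
# where label 1 means the event really was a threat.
SCORED = [
    (0.05, 0), (0.10, 0), (0.20, 0), (0.30, 0), (0.35, 1),
    (0.40, 0), (0.55, 0), (0.60, 1), (0.70, 1), (0.80, 0),
    (0.85, 1), (0.90, 1), (0.95, 1),
]


def confusion_counts(scored, threshold):
    """Count false positives and false negatives when flagging score >= threshold."""
    fp = sum(1 for s, y in scored if s >= threshold and y == 0)
    fn = sum(1 for s, y in scored if s < threshold and y == 1)
    return fp, fn


def cheapest_threshold(scored, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total error cost."""
    # Every observed score is a candidate; 1.01 means "never flag anything".
    candidates = sorted({s for s, _ in scored}) + [1.01]

    def total_cost(threshold):
        fp, fn = confusion_counts(scored, threshold)
        return fp * cost_fp + fn * cost_fn

    return min(candidates, key=total_cost)


# Assumed costs: a missed threat is five times as costly as a false alarm...
cautious = cheapest_threshold(SCORED, cost_fp=1.0, cost_fn=5.0)
# ...versus the opposite assumption, where false alarms are the expensive error.
lenient = cheapest_threshold(SCORED, cost_fp=5.0, cost_fn=1.0)
print("threshold when FNs are costly:", cautious)
print("threshold when FPs are costly:", lenient)
```

The same model produces very different numbers of false alarms depending on the costs you assign, which is why the choice of threshold is a business decision as much as a technical one.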

If all of this works well, you may want to take a step into the future of Artificial Intelligence for Security, aka Security AI. To do that, you need to introduce two interesting but risky automations. First of all, your machine learning model needs to be updated regularly, taking in new data (new logs etc.), but also the results of its own (i.e. the model’s) actions. In other words, when we know that your model is making good or wrong decisions, that very fact becomes additional, important data that is continually used to update the model. This is something we have done for decades, but more recently it became popular under the trendy name of reinforcement learning. Secondly, and very carefully, you may want to automate the security decisions that your model makes. Yes, that means the machine will decide to proactively deny access when it predicts a sufficiently high level of security risk.

Those two steps, autonomous security decisions and automatic model updates using new data and knowledge of the model’s own mistakes, are what turn your ML model into a Security AI system, one that learns and improves its chance of success automatically.
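The two automations can be sketched together as follows. The class name, the feature layout, the thresholds, and the middle "review" tier are all assumptions made for this illustration. The update rule is a plain online gradient step on confirmed outcomes, which is one simple way to feed a model's own results back into it and is not meant as a full reinforcement learning implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class OnlineRiskModel:
    """Hypothetical logistic model that is updated one confirmed outcome at a time."""
    weights: list
    bias: float = 0.0
    lr: float = 0.05

    def risk(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, was_threat):
        """Feed back a confirmed outcome: was this event really a threat?"""
        err = self.risk(x) - (1.0 if was_threat else 0.0)
        self.weights = [w - self.lr * err * xi for w, xi in zip(self.weights, x)]
        self.bias -= self.lr * err


def decide(model, x, review_at=0.5, deny_at=0.9):
    """Automate only the clear cases; send uncertain ones to a person."""
    r = model.risk(x)
    if r >= deny_at:
        return "deny"
    if r >= review_at:
        return "review"
    return "allow"


# Assumed features: (failed_logins, new_device)
model = OnlineRiskModel(weights=[0.0, 0.0])
risk_before = model.risk((5, 1))

# Hypothetical feedback stream of confirmed outcomes.
for _ in range(200):
    model.update((5, 1), was_threat=True)
    model.update((0, 0), was_threat=False)

risk_after = model.risk((5, 1))
print(f"risk before feedback: {risk_before:.2f}, after: {risk_after:.2f}")
print("decision for a quiet event:", decide(model, (0, 0)))
```

Keeping a "review" band between "allow" and "deny" is a deliberate design choice in this sketch: it limits fully automatic denials to cases where the model is most confident.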

With automatic Security AI it is very important to continually validate your models, because they will deteriorate if left to their own devices. Let me stress again that you also need a wonderful, people-friendly process for dealing with false positives. I have customers who have built that, for example for fraud detection, which is a form of security AI. In their case, I am happy to say that all autonomous actions of the system are vetted by a human controller, ensuring no “computer says no” annoyances, not to mention unethical actions. I wish everyone did it that way.
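Continual validation can start from something as simple as tracking how often human reviewers confirm the model's alerts. The sketch below keeps a rolling window of reviewer verdicts and signals when the confirmed share (the precision of recent alerts) drops below a floor. The class name, window size, and floor are hypothetical choices for this example.

```python
from collections import deque


class PrecisionMonitor:
    """Track the share of recent alerts that human reviewers confirmed as threats."""

    def __init__(self, window=50, min_precision=0.8):
        self.verdicts = deque(maxlen=window)  # keeps only the latest verdicts
        self.min_precision = min_precision

    def record(self, confirmed_threat):
        self.verdicts.append(bool(confirmed_threat))

    def precision(self):
        if not self.verdicts:
            return None
        return sum(self.verdicts) / len(self.verdicts)

    def needs_revalidation(self):
        """True once a full window shows precision below the agreed floor."""
        p = self.precision()
        window_full = len(self.verdicts) == self.verdicts.maxlen
        return p is not None and window_full and p < self.min_precision


monitor = PrecisionMonitor(window=10, min_precision=0.8)
for _ in range(10):
    monitor.record(True)        # reviewers confirm every alert
healthy = monitor.needs_revalidation()

for _ in range(5):
    monitor.record(False)       # reviewers start overturning alerts
degraded = monitor.needs_revalidation()
print("healthy period flagged:", healthy, "| degraded period flagged:", degraded)
```

This only watches false positives among alerts that were raised; missed threats need a separate check, for example periodic audits of events the model allowed.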

The Future Series (2019)

What is Artificial Intelligence? 19-min Watch with Free Subscription
What is Artificial Stupidity? 27-min Watch with Free Subscription
The Future of AI—How to Avoid Artificial Stupidity 1-hour 13-min
The Future of Power BI 20-min
Machine Learning for Security Applications: Why? 26-min Watch with Free Subscription
Machine Learning for IT Security: From ML to Security AI 42-min
Next Year in Machine Learning, Data Science, AI and BI 40-min Watch with Free Subscription
Microsoft Machine Learning Technologies: View Towards 2020 1-hour 26-min
