In cybersecurity, most of what gets called AI is really machine learning, and a large part of the tasks it handles are not human-related. Machine learning means solving certain tasks with an approach and particular methods based on the data you have. It has become a vital technology for cybersecurity: it helps stamp out cyber threats preemptively and bolsters security infrastructure through pattern detection, real-time cybercrime mapping, and thorough penetration testing. IT corporations use ML to strengthen their cybersecurity systems and to keep malware at bay. Let’s discuss machine learning for cybersecurity in 2023 in this blog.
MACHINE LEARNING FOR CYBERSECURITY 2023
- What is Machine Learning?
- Machine Learning tasks and cybersecurity.
- Cybersecurity tasks and machine learning.
- Machine Learning Cybersecurity for
- Network Protection.
- Endpoint Protection.
- Application Security.
- User Behavior.
- Process Behavior.
Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.
Machine Learning terminology
AI (Artificial Intelligence): the science of making things smart or, in other words, of having machines perform human tasks (e.g., visual recognition, NLP, etc.). The main point is that AI is not exactly machine learning or smart things. It can be a classic program installed in your robot cleaner, like edge detection. Roughly speaking, AI is a thing that somehow carries out human tasks.
ML (Machine Learning): an approach (just one of many) to AI that uses a system capable of learning from experience. It is intended not only for AI goals (e.g., copying human behavior); it can also reduce the effort and/or time spent on both simple and difficult tasks like stock price prediction. In other words, ML is a system that can recognize patterns by using examples rather than by programming them. If your system learns constantly, makes decisions based on data rather than hard-coded rules, and changes its behavior, it’s machine learning.
DL (Deep Learning): a set of techniques for implementing machine learning that recognize patterns of patterns, as in image recognition. The systems first identify object edges, then a structure, then an object type, and finally the object itself. The point is that deep learning is not exactly deep neural networks. There are other algorithms that were improved to learn patterns of patterns, such as Deep Q-Learning in reinforcement learning tasks.
Machine Learning tasks and Cybersecurity
Let’s look at the different types of machine learning tasks and how they relate to cybersecurity tasks.
- Regression (or prediction): a task of predicting the next value based on the previous values.
- Classification: a task of separating things into different categories.
- Clustering: similar to classification, but the classes are unknown; things are grouped by their similarity.
- Association rule learning (or recommendation): a task of recommending something based on previous experience.
- Dimensionality reduction (or generalization): a task of searching for the common and most important features in multiple examples.
- Generative models: a task of creating something based on previous knowledge of the distribution.
Regression (or prediction) is simple. The knowledge about the existing data is utilized to have an idea of the new data. Take an example of house price prediction. In cybersecurity, it can be applied to fraud detection. The features (e.g., the total amount of suspicious transactions, location, etc.) determine the probability of fraudulent actions.
As for the technical aspects of regression, all methods can be divided into two large categories: machine learning and deep learning. The same division is used for the other tasks.
For each task, examples of ML and DL methods are listed.
Machine learning for regression
Below is a short list of machine learning methods (having their own advantages and disadvantages) that can be used for regression tasks.
- Linear regression
- Polynomial regression
- Ridge regression
- Decision trees
- SVR (Support Vector Regression)
- Random forest
You can find a detailed explanation of each method in ML for regression.
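As a rough illustration of the regression idea, the sketch below fits a one-feature linear regression with ordinary least squares in plain Python and compares a forecast with an observed value. The data (daily transaction totals) and the names `fit_line` and `predict` are made up for this example; a real fraud model would use many features and a proper library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def predict(x, slope, intercept):
    """Value the fitted line expects at x."""
    return slope * x + intercept

# Hypothetical history: day number vs. total amount spent by one account.
history_days = [1, 2, 3, 4, 5]
history_amounts = [10.0, 20.0, 30.0, 40.0, 50.0]
slope, intercept = fit_line(history_days, history_amounts)

# Forecast day 6 and compare it with what was actually observed.
expected = predict(6, slope, intercept)
observed = 300.0
deviation = abs(observed - expected)
```

A large `deviation` does not prove fraud; it is only one signal that could feed into a risk score.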
Deep learning for regression
For regression tasks, the following deep learning models can be used:
- Artificial Neural Network (ANN)
- Recurrent Neural Network (RNN)
- Neural Turing Machines (NTM)
- Differentiable Neural Computer (DNC)
Classification is also straightforward. Imagine you have two piles of pictures classified by type (e.g., dogs and cats). In terms of cybersecurity, a spam filter separating spam from other messages can serve as an example. Spam filters are probably the first ML approach applied to cybersecurity tasks.
The supervised learning approach is usually used for classification, where examples of certain groups are known. All classes should be defined in the beginning.
Below is a list of the related algorithms.
Machine learning for classification
- Logistic Regression (LR)
- K-Nearest Neighbors (K-NN)
- Support Vector Machine (SVM)
- Random Forest Classification
Methods like SVM and random forests are generally considered to work best. Keep in mind that there are no one-size-fits-all rules, and they may not work well for your particular task.
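To make the spam-filter example concrete, here is a minimal K-NN classifier in plain Python. The two features (number of links, share of upper-case characters) and the tiny training set are invented for illustration; `knn_predict` is a hypothetical helper, not part of any library.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features per message: (number of links, share of upper-case characters).
training_set = [
    ((0, 0.02), "ham"),
    ((1, 0.05), "ham"),
    ((0, 0.10), "ham"),
    ((7, 0.60), "spam"),
    ((9, 0.45), "spam"),
    ((6, 0.70), "spam"),
]

label_a = knn_predict(training_set, (8, 0.50))  # many links, lots of capitals
label_b = knn_predict(training_set, (1, 0.03))  # looks like ordinary mail
```

In practice the features would need scaling, since the link count dominates the Euclidean distance here.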
Deep learning for classification
- Artificial Neural Network
- Convolutional Neural Networks
Deep learning methods work better if you have more data. But they consume more resources, especially if you are planning to use them in production and retrain systems periodically.
Clustering is similar to classification, with one major difference: the information about the classes of the data is unknown. There is no idea whether this data can be classified. This is unsupervised learning.
Supposedly, the best task for clustering is forensic analysis, where the reasons, course, and consequences of an incident are obscure. It’s required to classify all activities to find anomalies. Malware analysis solutions (i.e., malware protection or secure email gateways) may implement it to separate legitimate files from outliers.
Another interesting area where clustering can be applied is user behavior analytics. In this instance, application users are clustered so that it is possible to see whether they belong to a particular group.
Usually clustering is not applied to solving a particular task in cybersecurity as it is more like one of the subtasks in a pipeline (e.g., grouping users into separate groups to adjust risk values).
Machine learning for clustering
- K-means
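K-means is a common choice for this kind of grouping. The sketch below is a bare-bones version in plain Python; the user features (logins per day, files accessed per day) are hypothetical, and the centroids start at the first `k` points only to keep the example deterministic.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Plain k-means; centroids start at the first k points to stay deterministic."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep the old one if empty).
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical per-user features: (logins per day, files accessed per day).
users = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(users, k=2)
```

The resulting groups could then be used as a pipeline step, for example to adjust risk values per group.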
Deep learning for clustering
- Self-organizing Maps (SOM) or Kohonen Networks
Association Rule Learning
Netflix and SoundCloud recommend films or songs according to your movie or music preferences. In cybersecurity, this principle can be used primarily for incident response. If a company faces a wave of incidents and offers various types of responses, a system learns a type of response for a particular incident (e.g., mark it as a false positive, change a risk value, run the investigation). Risk management solutions can also benefit if they automatically assign risk values for new vulnerabilities or misconfigurations based on their description.
There are algorithms used for solving recommendation tasks.
Machine learning for association rule learning
Deep learning for association rule learning
- Deep Restricted Boltzmann Machine (RBM)
- Deep Belief Network (DBN)
- Stacked Autoencoder
The latest recommendation systems are based on restricted Boltzmann machines and their updated versions, such as promising deep belief networks.
Dimensionality reduction, or generalization, is not as popular as classification but is necessary if you deal with complex systems with unlabeled data and many potential features. You can’t apply clustering because typical methods restrict the number of features or they don’t work. Dimensionality reduction can help handle it and cut unnecessary features. Like clustering, dimensionality reduction is usually one of the tasks in a more complex model. As for cybersecurity tasks, dimensionality reduction is common for face detection solutions, like the ones you use on your iPhone.
Machine learning dimensionality reduction
- Principal Component Analysis (PCA)
- Singular-value decomposition (SVD)
- t-distributed Stochastic Neighbor Embedding (t-SNE)
- Linear Discriminant Analysis (LDA)
- Latent Semantic Analysis (LSA)
- Factor Analysis (FA)
- Independent Component Analysis (ICA)
- Non-negative Matrix Factorization (NMF)
Learn more about machine learning dimensionality reduction.
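As an illustration of PCA from the list above, the following sketch finds only the first principal component using power iteration on the covariance matrix, in plain Python. The three-feature samples are invented (the third feature is constant and carries no information), and `first_principal_component` and `project` are hypothetical helper names; a production system would rely on a numerical library instead.

```python
from math import sqrt

def first_principal_component(rows, iterations=100):
    """Approximate the top eigenvector of the covariance matrix by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return means, vec

def project(row, means, component):
    """Reduce one sample to a single coordinate along the component."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))

# Hypothetical samples: the second feature is twice the first, the third never changes.
samples = [(1, 2, 5), (2, 4, 5), (3, 6, 5), (4, 8, 5)]
means, component = first_principal_component(samples)
reduced = [project(row, means, component) for row in samples]
```

Here three features collapse into one coordinate per sample without losing any of the variation in the data.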
The task of generative models differs from the above-mentioned ones. While those tasks deal with the existing information and associated decisions, generative models are designed to simulate the actual data (not decisions) based on the previous decisions.
A simple task in offensive cybersecurity is to generate a list of input parameters to test a particular application for injection vulnerabilities.
Alternatively, you can have a vulnerability scanning tool for web applications. One of its modules tests files for unauthorized access. These tests can mutate existing filenames to identify new ones. For example, if a crawler detected a file called login.php, it’s worth checking for any backup or test copies by trying names like login_1.php, login_backup.php, login.php.2017. Generative models are good at this.
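The filename example can be sketched with hand-written mutation rules. Note that this is not a trained generative model: the rules below are fixed, whereas a learned model would infer such patterns from filenames it has seen. The `_old`, `.bak`, and `~` patterns and the `candidate_backups` helper are assumptions added for illustration.

```python
from pathlib import PurePosixPath

def candidate_backups(filename, years=("2017",)):
    """Generate plausible backup/test copies of a discovered filename."""
    path = PurePosixPath(filename)
    stem, suffix = path.stem, path.suffix  # e.g. "login", ".php"
    candidates = [
        f"{stem}_1{suffix}",
        f"{stem}_backup{suffix}",
        f"{stem}_old{suffix}",   # assumed pattern, not from the article
        f"{filename}.bak",       # assumed pattern, not from the article
        f"{filename}~",          # assumed pattern, not from the article
    ]
    candidates += [f"{filename}.{year}" for year in years]
    return candidates

names = candidate_backups("login.php")
```

A scanner would then request each candidate and check whether the server returns it.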
Machine learning generative models
- Markov Chains
- Genetic algorithms
Deep learning generative models
- Variational Autoencoders
- Generative adversarial networks (GANs)
- Boltzmann Machines
Recently, GANs showed impressive results. They successfully mimic a video. Imagine how it can be used for generating examples for fuzzing.
Cybersecurity Tasks and Machine Learning
Instead of looking at ML tasks and trying to apply them to cybersecurity, let’s look at the common cybersecurity tasks and machine learning opportunities. There are three dimensions (Why, What, and How).
The first dimension is a goal, or a task (e.g., detect threats, predict attacks, etc.). According to Gartner’s PPDR model, all security tasks can be divided into five categories:
The second dimension is a technical layer and an answer to the “What” question (e.g., at which level to monitor issues). Here is the list of layers for this dimension:
- network (network traffic analysis and intrusion detection);
- endpoint (anti-malware);
- application (WAF or database firewalls);
- user (UBA);
- process (anti-fraud).
Each layer has different subcategories. For example, network security can be wired, wireless, or cloud. Rest assured that you can’t apply the same algorithms with the same hyperparameters to all three areas, at least in the near future. The reason is the lack of data and algorithms to find better dependencies among the three areas so that it’s possible to change one algorithm to different ones.
The third dimension is a question of “How” (e.g., how to check the security of a particular area):
- in transit in real time;
- at rest;
For example, if you are dealing with endpoint protection and looking for an intrusion, you can monitor the processes of an executable file, do static binary analysis, analyze the history of actions on this endpoint, etc.
Some tasks should be solved in three dimensions. Sometimes, there are no values in some dimensions for certain tasks. Approaches can be the same in one dimension. Nonetheless, each particular point of this three-dimensional space of cybersecurity tasks has its intricacies.
It’s difficult to detail them all, so let’s focus on the most important dimension, technology layers, and look at cybersecurity solutions from this perspective.
Machine learning for Network Protection
Network protection is not a single area but a set of different solutions that focus on a protocol such as Ethernet, wireless, SCADA, or even virtual networks like SDNs.
Network protection refers to the well-known Intrusion Detection System (IDS) solutions. Some of them used a kind of ML years ago and mostly dealt with signature-based approaches.
ML in network security implies new solutions called Network Traffic Analytics (NTA) aimed at in-depth analysis of all the traffic at each layer and detecting attacks and anomalies.
How can ML help here? There are some examples:
- regression to predict the network packet parameters and compare them with the normal ones;
- classification to identify different classes of network attacks such as scanning and spoofing;
- clustering for forensic analysis.
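The first item, predicting a packet parameter and comparing it with what is observed, can be sketched with a very simple forecaster. Here an exponentially weighted moving average stands in for a trained regression model; the packet sizes, `alpha`, and `tolerance` values are made up for illustration.

```python
def ewma_anomalies(values, alpha=0.3, tolerance=0.5):
    """Predict each value with an exponentially weighted moving average
    and report indices whose relative error exceeds `tolerance`."""
    prediction = values[0]
    flagged = []
    for i, actual in enumerate(values[1:], start=1):
        if abs(actual - prediction) > tolerance * prediction:
            flagged.append(i)
        else:
            # Only learn from traffic that looked normal.
            prediction = alpha * actual + (1 - alpha) * prediction
    return flagged

# Hypothetical mean packet sizes (bytes) per one-minute window.
packet_sizes = [500, 510, 495, 505, 1500, 498, 502]
anomalous_windows = ewma_anomalies(packet_sizes)
```

Skipping the update on flagged windows is a design choice that keeps a single burst from shifting the baseline; it also means a genuine long-term change would keep being flagged until the baseline is reset.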
You can find at least 10 academic research papers describing diverse approaches:
- Machine Learning Techniques for Intrusion Detection
- Long Short Term Memory Networks for Anomaly Detection in Time Series
- Anomaly Detection Framework Using Rule Extraction for Efficient Intrusion Detection
- A survey of network anomaly detection techniques
- Shallow and Deep Networks Intrusion Detection System: A Taxonomy and Survey
- Deep Packet: A Novel Approach For Encrypted Traffic Classification Using Deep Learning
- Performance Comparison of Intrusion Detection Systems and Application of Machine Learning to Snort System
- Evaluation of Machine Learning Algorithms for Intrusion Detection System
- One Class of collective Anomaly Detection based on LSTM
- Network Traffic Anomaly Detection Using Recurrent Neural Networks
- Sequence Aggregation Rules for Anomaly Detection in Computer Network Traffic
- Big collection of all approaches for IDS
Machine learning for Endpoint Protection
The new generation of anti-viruses is Endpoint Detection and Response. It’s better to learn features in executable files or the process behavior. Keep in mind that if you deal with machine learning at the endpoint layer, your solution may differ depending on the type of endpoint (e.g., workstation, server, container, cloud instance, mobile, PLC, IoT device). Every endpoint has its specifics but the tasks are common:
- regression to predict the next system call for an executable process and compare it with the real one;
- classification to divide programs into such categories as malware, spyware, and ransomware;
- clustering for malware protection on secure email gateways (e.g., to separate legitimate file attachments from outliers).
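The idea of predicting the next system call and comparing it with the real one can be sketched with a simple transition-count model. This is a first-order frequency model rather than a true regression, and the traces and helper names are invented for illustration.

```python
from collections import defaultdict

def train_transitions(traces):
    """Count which system call follows which in known-good traces."""
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, current):
    """Most frequently observed successor of `current`, or None if unseen."""
    successors = counts.get(current)
    if not successors:
        return None
    return max(successors, key=successors.get)

def unexpected_steps(counts, trace):
    """Positions in `trace` whose call was never seen after the preceding call."""
    return [i for i, (cur, nxt) in enumerate(zip(trace, trace[1:]), start=1)
            if nxt not in counts.get(cur, {})]

# Hypothetical known-good traces of one process.
normal_traces = [
    ["open", "read", "read", "close"],
    ["open", "read", "write", "close"],
    ["open", "read", "close"],
]
model = train_transitions(normal_traces)

likely_after_read = predict_next(model, "read")
suspect_trace = ["open", "read", "mmap", "execve", "close"]
odd_positions = unexpected_steps(model, suspect_trace)
```

Real endpoint products would look at longer call sequences and arguments; this only shows the predict-then-compare pattern.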
Academic papers about endpoint protection and malware specifically are gaining popularity. Here are a few examples:
- Malware Detection by Eating a Whole EXE
- Deep learning at the shallow end: Malware classification for non-domain experts
- TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time
Machine learning for Application Security
Application security is my favorite area, by the way, especially ERP Security.
Where to use ML in app security? In WAFs or code analysis, both static and dynamic. As a reminder, application security can differ. There are web applications, databases, ERP systems, SaaS applications, microservices, etc. It’s almost impossible to build a universal ML model to deal with all threats effectively in the near future. However, you can try to solve some of the tasks.
Here are examples of what you can do with machine learning for application security:
- regression to detect anomalies in HTTP requests (for example, XXE and SSRF attacks and auth bypass);
- classification to detect known types of attacks like injections (SQLi, XSS, RCE, etc.);
- clustering user activity to detect DDoS attacks and mass exploitation.
More resources providing ideas of using ML for application security:
- Adaptively Detecting Malicious Queries in Web Attacks
- URLNet: Learning a URL Representation with Deep Learning for Malicious URL Detection
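For application security, classifying request parameters into known attack types such as SQLi or XSS can be sketched with a small multinomial naive Bayes classifier in plain Python. The payloads, labels, and tokenizer below are toy assumptions; a real WAF model would need far more data and careful feature design.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(payload):
    """Lower-case and split into words and single punctuation characters."""
    return re.findall(r"[a-z0-9_]+|[^\sa-z0-9_]", payload.lower())

def train(samples):
    """samples: iterable of (payload, label). Returns token counts and label counts."""
    token_counts = defaultdict(Counter)
    label_counts = Counter()
    for payload, label in samples:
        label_counts[label] += 1
        token_counts[label].update(tokenize(payload))
    return token_counts, label_counts

def classify(payload, token_counts, label_counts):
    """Multinomial naive Bayes with add-one smoothing."""
    vocab = {tok for counts in token_counts.values() for tok in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        denom = sum(token_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in tokenize(payload):
            score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training set (hypothetical request parameters).
training = [
    ("id=42", "benign"),
    ("q=red shoes", "benign"),
    ("name=alice&page=2", "benign"),
    ("id=1' OR '1'='1", "sqli"),
    ("id=1 UNION SELECT password FROM users", "sqli"),
    ("q=<script>alert(1)</script>", "xss"),
    ("name=<img src=x onerror=alert(1)>", "xss"),
]
token_counts, label_counts = train(training)

verdict_sqli = classify("id=7' OR 'a'='a", token_counts, label_counts)
verdict_xss = classify("q=<script>alert(document.cookie)</script>", token_counts, label_counts)
verdict_ok = classify("page=3", token_counts, label_counts)
```

Keeping punctuation as separate tokens matters here, because quotes and angle brackets carry most of the signal for injection payloads.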
Machine learning for User Behavior
This area started as Security Information and Event Management (SIEM).
SIEM was able to solve numerous tasks if configured properly, including user behavior search and ML. Then the UEBA solutions declared that SIEM couldn’t handle new, more advanced types of attacks and constant behavior change.
The market has accepted the point that a special solution is required if the threats are regarded from the user level.
However, even UEBA tools don’t cover all things connected with different user behavior. There are domain users, application users, SaaS users, social networks, messengers, and other accounts that should be monitored.
Unlike malware detection, which focuses on common attacks and allows training a classifier, user behavior is one of the more complex layers and an unsupervised learning problem. As a rule, there is no labeled dataset, nor any idea of what to look for. Therefore, the task of creating a universal algorithm for all types of users is tricky in the user behavior area. Here are the tasks that companies solve with the help of ML:
- regression to detect anomalies in user actions (e.g., login at an unusual time);
- classification to group different users for peer-group analysis;
- clustering to separate groups of users and detect outliers.
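The "login at an unusual time" example can be approximated without labels by building a per-user profile of past login hours. This is a simple frequency baseline rather than a regression model, and the history, the `min_share` threshold, and the helper names are assumptions for illustration.

```python
from collections import Counter

def build_profile(login_hours):
    """Relative frequency of each login hour (0-23) in a user's history."""
    counts = Counter(login_hours)
    total = len(login_hours)
    return {hour: counts[hour] / total for hour in range(24)}

def is_unusual(profile, hour, min_share=0.05):
    """A login is unusual if that hour and its neighbours were rarely used before."""
    nearby = sum(profile[(hour + offset) % 24] for offset in (-1, 0, 1))
    return nearby < min_share

# Hypothetical login history for one user (hour of day).
history = [9, 9, 10, 8, 9, 10, 9, 8, 17, 9]
profile = build_profile(history)

morning_login = is_unusual(profile, 9)
night_login = is_unusual(profile, 3)
```

Including the neighbouring hours avoids flagging a login at 11:00 for someone who usually logs in at 10:00.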
More resources on machine learning user behaviour:
- Detecting Anomalous User Behavior Using an Extended Isolation Forest Algorithm: An Enterprise Case Study
- Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams
Machine learning for Process Behavior
The process area is last but not least. When dealing with it, it’s necessary to know a business process in order to find something anomalous. Business processes can differ significantly. You can look for fraud in banking and retail systems, or on a plant floor in manufacturing. The two are different, and they demand a lot of domain knowledge. In machine learning, feature engineering (the way you represent data to your algorithm) is essential to achieve results. Similarly, features are different in all processes.
In general, here are examples of tasks in the process area:
- regression to predict the next user action and detect outliers such as credit card fraud;
- classification to detect known types of fraud;
- clustering to compare business processes and detect outliers.
Most of the research papers you can find relate to banking fraud; ICS and SCADA systems security is much less represented.
- Fraud with autoencoders
- A Survey of Credit Card Fraud Detection Techniques: Data and Technique Oriented Perspective
- Anomaly detection; Industrial control systems; convolutional neural networks
There are more areas left. This blog outlined the basics. On the one hand, machine learning is not a silver-bullet solution if you want to protect your systems. Undoubtedly, there are many issues with interpretability (particularly for deep learning algorithms), but humans also cannot interpret their own decisions, right?
On the other hand, with the growing amount of data and decreasing number of experts, ML is the only remedy. It works now and will be mandatory soon. It is better to start right now.
Author at Toward data science