Chapter 5. AI and ML on the Security Front: A Focus on Web Applications

AI and ML techniques and solutions, combined with automation, are now being used in security for threat detection and remediation. In the case of inbound web application requests, AI and ML techniques are exceptionally useful when it comes to observing, quantifying, and classifying inbound requests based on the degree of maliciousness.

Finding Anomalies

In most cases, ML solutions can develop an understanding of existing vulnerabilities because they can be taught to recognize attacks that could exploit those vulnerabilities. Increasingly, though, advanced applications of AI and ML are aimed less at identifying and defending against familiar threats, which traditional security systems can often handle, and more at finding and classifying anomalies. In the case of protecting web applications, AI and ML systems are being used to determine whether an inbound request “appears” to be legitimate traffic or whether it is malicious in nature.

ML techniques eliminate the need to have human analysts spend time on what is already understood, or what are often repeatable and mundane tasks. The machine can handle known and well-documented threats while also homing in on the anomalies and threat indicators that have not been seen before. As a point of reference, suppose that a website receives 1,000,000 requests per day, and only 100 of those requests are considered unusual, by whatever definition of unusual an organization uses. ML analysis can label those 100 requests as anomalies and highlight them for the security analyst. After they’re highlighted, the analyst can now spend more time investigating whether they are truly malicious in nature instead of trying to “find the anomalies” on their own.
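To make the idea of labeling the unusual requests for an analyst concrete, here is a minimal sketch that flags outliers with a robust z-score (median and median absolute deviation). The single feature used (the number of non-alphanumeric characters in a URL), the 3.5 threshold, the sample requests, and all function names are assumptions chosen for illustration; a real system would learn from many more features.

```python
from statistics import median

def special_char_count(url: str) -> int:
    """Hypothetical feature: characters that are neither letters nor digits."""
    return sum(1 for ch in url if not ch.isalnum())

def flag_anomalies(urls, threshold=3.5):
    """Return the URLs whose feature value sits far from the median.

    Uses the modified z-score: 0.6745 * |x - median| / MAD.
    """
    values = [special_char_count(u) for u in urls]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Typical requests all look alike, so anything different is flagged.
        return [u for u, v in zip(urls, values) if v != med]
    return [u for u, v in zip(urls, values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Made-up sample traffic: five ordinary requests and one injection attempt.
requests_seen = [
    "/index.html",
    "/products/42",
    "/about",
    "/cart/view",
    "/search?q=shoes",
    "/search?q=' OR '1'='1' --",
]
suspicious = flag_anomalies(requests_seen)
```

Only the last request ends up in `suspicious`, which is the short list the analyst would then investigate.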

In fact, web application monitoring is a natural use case for ML and AI. No single human can review millions of lines of web logs each day looking for anomalies. ML is needed to identify unusual behavioral patterns and bring them to the attention of analysts. More importantly, for common and repeated suspicious behaviors, organizations can use ML to automatically block the traffic and alert an analyst that the problem has been resolved. Not only does this help an organization become more efficient, it can also save money.

In the first scenario, in which ML identifies a potential attack and alerts an analyst, the analyst must investigate to determine whether the attack was real. While that investigation is ongoing, an attacker might have already gained access to an organization’s database and might even be downloading sensitive customer or organizational data. In the second scenario, the potential attack is blocked, and then an alert is generated. The analyst still must investigate, but if the attack turns out to have been a serious one, it has already been stopped. There is no costly data breach, and the attacker has moved on to another target.
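The difference between the two scenarios can be expressed as a small policy function. This is only a sketch: the score scale (0 to 1), the thresholds, the `times_seen` counter, and the `Decision` type are hypothetical, and a real deployment would tune them to its own tolerance for false positives.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str            # "block", "alert", or "allow"
    notify_analyst: bool

def decide(anomaly_score, times_seen,
           block_score=0.9, alert_score=0.5, repeat_limit=3):
    """Choose between block-then-alert and alert-only (illustrative thresholds)."""
    # Second scenario: a high-scoring behavior that keeps repeating is
    # blocked first, and the analyst reviews it after the fact.
    if anomaly_score >= block_score and times_seen >= repeat_limit:
        return Decision("block", True)
    # First scenario: suspicious but not clear-cut, so the traffic passes
    # while the analyst investigates.
    if anomaly_score >= alert_score:
        return Decision("alert", True)
    return Decision("allow", False)

repeat_offender = decide(anomaly_score=0.95, times_seen=5)
first_sighting = decide(anomaly_score=0.95, times_seen=0)
```

Here `repeat_offender` is blocked before the analyst looks at it, whereas `first_sighting` only raises an alert.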

AI techniques can also respond faster to vulnerabilities. Traditional techniques used with web application firewalls (WAFs) for inbound web requests, for example, compare every inbound request against lists of known good and known bad requests, whether or not the request has been seen before. This activity can add up to millions of comparisons, and it takes a significant amount of time to examine databases of everything seen in the past and run a comparison against each entry. Instead of comparing one thing with millions of possibilities, AI techniques can rapidly identify a pattern in the requests that can then be analyzed for its threat potential.
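One simple way to see why patterns scale better than lists is to collapse each request into a coarse structural signature, so that a handful of signatures stands in for an unbounded number of literal requests. The sketch below is a hand-written stand-in for what a learned model would do, not an ML algorithm in itself; the signature scheme and sample requests are assumptions made for illustration.

```python
import re

def signature(request: str) -> str:
    """Collapse a request into a coarse structural pattern (hypothetical scheme)."""
    s = re.sub(r"[A-Za-z]+", "a", request)  # any run of letters becomes 'a'
    s = re.sub(r"[0-9]+", "9", s)           # any run of digits becomes '9'
    return s

# Five known-good requests reduce to just two patterns.
known_good = ["/item/1", "/item/22", "/item/333", "/user/alice", "/user/bob"]
good_signatures = {signature(r) for r in known_good}

def looks_familiar(request: str) -> bool:
    """One set lookup replaces a comparison against every past request."""
    return signature(request) in good_signatures

never_seen_but_normal = looks_familiar("/item/98765")
injection_attempt = looks_familiar("/item/1;DROP TABLE users")
```

A request that was never logged before, such as `/item/98765`, still matches a familiar pattern, while the injection attempt does not and can be passed on for closer analysis.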

The great thing about this process is that it works even for unknown vulnerabilities. For example, in March 2018, many distributed denial of service (DDoS) protection companies noticed an uptick in probing of the memcached port (UDP port 11211). Memcached is a caching system designed to keep data in RAM, which can reduce latency on Linux and Windows servers by reducing the number of remote calls a system makes. Attackers figured out that they could use memcached to launch DDoS attacks, and very few organizations were protected against memcached traffic. DDoS protection providers were able to see those early probes because their AI/ML systems identified the probes as unusual. Thus, they were able to determine that a new type of botnet was being built and put protections in place to stop those attacks when they were first launched.

The following sections offer more specific examples of ML and AI techniques in action.

Bringing ML to Bot Attack Remediation

Much of the bot remediation activity today is still a very manual process. Bots using certain IP addresses or domains are identified, then steps can be taken to bar access by blocking them at the proxy or firewall.

However, the introduction of ML can greatly improve defense capabilities by combining external threat intelligence about bot behaviors with data collected from real traffic samples to learn about new bot patterns. This information is then fed into an ML solution. After the ML solution consumes the various data points, it can be told to run multiple models while a human analyst provides training input in an active feedback loop. The ML solution in turn can launch automated processes for blocking bot traffic based on its new understanding of what type of bot traffic to look for. This process is called active learning with labeled data, and ML solutions can perform it continuously without growing tired or becoming overwhelmed.
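The feedback loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the model is a nearest-centroid classifier, the two features (a normalized request rate and the share of requests with no input events) are invented, and `ask_analyst` is a hypothetical stand-in for the human who supplies labels. The idea it demonstrates is uncertainty sampling: the system asks the human only about the sample it is least sure of.

```python
from math import dist

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def margin(x, labeled):
    """Distance to the human centroid minus distance to the bot centroid.

    Positive means closer to known bots; near zero means uncertain.
    """
    bots = [p for p, y in labeled if y == "bot"]
    humans = [p for p, y in labeled if y == "human"]
    return dist(x, centroid(humans)) - dist(x, centroid(bots))

def predict(x, labeled):
    return "bot" if margin(x, labeled) > 0 else "human"

def active_learning_round(labeled, unlabeled, ask_analyst):
    """Ask the analyst about the most uncertain sample, then learn from it."""
    query = min(unlabeled, key=lambda x: abs(margin(x, labeled)))
    unlabeled.remove(query)
    labeled.append((query, ask_analyst(query)))
    return query

# Made-up data: (normalized request rate, share of requests with no input events).
labeled = [((0.9, 0.9), "bot"), ((0.1, 0.1), "human")]
unlabeled = [(0.95, 0.8), (0.5, 0.55), (0.05, 0.2)]

def ask_analyst(sample):
    # Stand-in for a human decision; a real loop would prompt an analyst.
    return "bot" if sample[0] >= 0.5 else "human"

queried = active_learning_round(labeled, unlabeled, ask_analyst)
```

The round selects the borderline sample `(0.5, 0.55)` for review rather than the two obvious ones, so the analyst's time goes where the model gains the most from it.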

Using Supervised ML-Based Defenses for Security Events and Log Analysis

The use of supervised ML can greatly speed, augment, and assist the work of security analysts and engineers to identify and mitigate exact threat sources. These analysts and engineers are tasked with viewing security events and logs, and then analyzing a multitude of data points generated from the deployed security controls such as firewalls, IPSs, IDSs, sandboxes, WAFs, endpoint protection platform (EPP) solutions, and privileged access management (PAM) solutions. Combing through collected data to pinpoint specific security threats can take weeks, even months, to accomplish. Taking steps to mitigate the threats takes even longer. Most analysts and engineers already carry a full workload, making it more difficult for them to expand responsibilities. Hiring more resources to ferret out attacks is often not possible. All the while, malicious activities are wreaking havoc on a company’s online presence and performance, while damaging business revenue and reputation along the way.

Products are already on the market that employ human-based, supervised ML. For most of these products, an IT security analyst initially provides feedback to the AI engines that are at work scanning massive numbers of log entries to identify anomalous behaviors. The supervised ML system augments the analyst as it’s trained to improve detection of “significant events” in the logs and to immediately bring those events to the analyst’s attention, which means that your business is more quickly on the path to threat resolution.
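As an illustration of how analyst feedback can train a system to surface “significant events,” the sketch below implements a tiny naive Bayes classifier over log tokens. It does not describe how any particular product works; the class name, the two labels, and the sample log lines (which use documentation-reserved IP ranges) are all assumptions.

```python
import math
from collections import Counter

LABELS = ("significant", "routine")

class LogClassifier:
    """Tiny multinomial naive Bayes over whitespace-separated log tokens."""

    def __init__(self):
        self.token_counts = {label: Counter() for label in LABELS}
        self.line_counts = Counter()

    def train(self, line, label):
        """Record one analyst-labeled log line."""
        self.token_counts[label].update(line.lower().split())
        self.line_counts[label] += 1

    def score(self, line, label):
        """Log-probability of the label given the line, up to a constant."""
        vocab = set().union(*self.token_counts.values())
        total = sum(self.token_counts[label].values())
        logp = math.log(self.line_counts[label] / sum(self.line_counts.values()))
        for token in line.lower().split():
            # Laplace smoothing; the extra 1 reserves mass for unseen tokens.
            count = self.token_counts[label][token]
            logp += math.log((count + 1) / (total + len(vocab) + 1))
        return logp

    def classify(self, line):
        return max(LABELS, key=lambda label: self.score(line, label))

clf = LogClassifier()
# Hypothetical analyst feedback on four log lines.
clf.train("failed password for root from 203.0.113.7", "significant")
clf.train("failed password for admin from 203.0.113.7", "significant")
clf.train("accepted password for alice from 10.0.0.5", "routine")
clf.train("session opened for user alice", "routine")

verdict = clf.classify("failed password for root from 198.51.100.9")
```

Although the source address in the final line was never seen during training, the classifier still labels the line significant because of the tokens it shares with the analyst's earlier examples.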

Deploying Increasingly Sophisticated Malware Detection

In the case of websites that allow uploads of files such as pictures, forms, or documents, traditional website malware detection runs on the servers themselves, and it identifies malicious files that might have already been uploaded to the server. This after-the-fact detection might already have allowed the malware to damage the application and its data files. Even worse, the malware might not be detected at all until a user executes it, at which point it spreads further, both internally and to website visitors.

We can implement more sophisticated and thoughtful ML techniques in the cloud or at the network edge, and we can apply these techniques to malware detection as well. This can help identify and stop human and nonhuman malicious activities before they pass beyond the edge to the servers and applications sitting behind it.

Using AI to Identify Bots

Analyzing the user’s behavior as they visit a website or application is one way to protect against the invasion of nonhuman visitors; in other words, bots. In the past, nonhuman activity was rarely identified, and there were only rudimentary bot challenges like CAPTCHA, which are still so often used today. However, all of that has changed. Today there are better bot challenges used for detection, such as JavaScript challenges, human interaction challenges, and device fingerprint challenges. And these challenges are getting better at detecting bot activity.

ML-based bot management solutions can automatically tweak existing bot challenges to improve detection rates. They can also identify which existing or new bot challenge is needed to defeat a particular bot that uses a repeating tactic or displays a certain pattern of activity. ML-based bot management solutions are needed to defend against the massive rise in malicious bot traffic caused by the steady stream of compromised consumer Internet of Things (IoT) devices that have been turned into malicious bots.

Today, security analysts can implement various bot challenges on a moment’s notice, identify normal usage patterns for each web application based on analysis of legitimate user and visitor behavior, and provide customizable security postures for bots that deviate from the standard usage behavior, activity, or frequency. These technologies are available today, and ML is only making them better.
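A per-application baseline with a graduated response might look like the following sketch. The request-rate feature, the sigma thresholds, the posture names, and the sample figures are all hypothetical; the point is only that the same visitor behavior can warrant different postures on different applications.

```python
from statistics import mean, pstdev

def build_baseline(legit_rates):
    """Summarize requests-per-minute samples from known-legitimate sessions."""
    # Floor the spread at 1.0 so a very uniform sample cannot divide by zero.
    return mean(legit_rates), max(pstdev(legit_rates), 1.0)

def posture(rate, baseline, challenge_sigma=3.0, block_sigma=6.0):
    """Map a visitor's deviation from the baseline to a security posture."""
    mu, sigma = baseline
    deviation = (rate - mu) / sigma
    if deviation >= block_sigma:
        return "block"
    if deviation >= challenge_sigma:
        return "javascript_challenge"
    return "allow"

# Made-up baselines for two applications with very different normal usage.
baselines = {
    "storefront": build_baseline([10, 12, 8, 11, 9]),
    "search_api": build_baseline([100, 120, 90, 110, 80]),
}

on_storefront = posture(60, baselines["storefront"])
on_search_api = posture(60, baselines["search_api"])
```

Sixty requests per minute is far outside normal use of the storefront, so that visitor is blocked, while the same rate is unremarkable on the search API and is allowed.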

Of course, AI and ML have uses beyond the realm of bot and botnet protection. Chapter 6 focuses on some other areas of security where AI and ML can drive success.
