9
THE ROOT CAUSES OF VULNERABILITIES

This chapter describes the common root causes of security vulnerabilities that result from the implementation of a protocol. These causes are distinct from vulnerabilities that derive from a protocol’s specification (as discussed in Chapter 7). A vulnerability does not have to be directly exploitable for it to be considered a vulnerability. It might weaken the security stance of the protocol, making other attacks easier. Or it might allow access to more serious vulnerabilities.

After reading this chapter, you’ll begin to see patterns in protocols that will help you identify security vulnerabilities during your analysis. (I won’t discuss how to exploit the different classes until Chapter 10.)

In this chapter, I’ll assume you are investigating the protocol using all means available to you, including analyzing the network traffic, reverse engineering the application’s binaries, reviewing source code, and manually testing the client and servers to determine actual vulnerabilities. Some vulnerabilities will always be easier to find using techniques such as fuzzing (a technique by which network protocol data is mutated to uncover issues) whereas others will be easier to find by reviewing code.

Vulnerability Classes

When you’re dealing with security vulnerabilities, it’s useful to categorize them into a set of distinct classes to assess the risk posed by their exploitation. As an example, consider a vulnerability that, when exploited, allows an attacker to compromise the system an application is running on.

Remote Code Execution

Remote code execution is a catchall term for any vulnerability that allows an attacker to run arbitrary code in the context of the application that implements the protocol. This could occur through hijacking the logic of the application or influencing the command line of subprocesses created during normal operation.

Remote code execution vulnerabilities are usually the most security critical because they allow an attacker to compromise the system on which the application is executing. Such a compromise would provide the attacker with access to anything the application can access and might even allow the hosting network to be compromised.

Denial-of-Service

Applications are generally designed to provide a service. If a vulnerability exists that when exploited causes an application to crash or become unresponsive, an attacker can use that vulnerability to deny legitimate users access to the application and the service it provides. Such a flaw is commonly referred to as a denial-of-service vulnerability, and exploiting it often requires few resources, sometimes as little as a single network packet, to bring down the entire application. Without a doubt, this can be quite detrimental in the wrong hands.

We can categorize denial-of-service vulnerabilities as either persistent or nonpersistent. A persistent vulnerability permanently prevents legitimate users from accessing the service (at least until an administrator corrects the issue). The reason is that exploiting the vulnerability corrupts some stored state that ensures the application crashes when it’s restarted. A nonpersistent vulnerability lasts only as long as an attacker is sending data to cause the denial-of-service condition. Usually, if the application is allowed to restart on its own or given sufficient time, service will be restored.

Information Disclosure

Many applications are black boxes, which in normal operation provide you with only certain information over the network. An information disclosure vulnerability exists if there is a way to get an application to provide information it wasn’t originally designed to provide, such as the contents of memory, filesystem paths, or authentication credentials. Such information might be directly useful to an attacker because it could aid further exploitation. For example, the information could disclose the location of important in-memory structures that could help in remote code execution.

Authentication Bypass

Many applications require users to supply authentication credentials before granting full access. Valid credentials might be a username and password or a more complex verification, like a cryptographically secure exchange. Authentication limits access to resources, but it can also reduce an application’s attack surface when an attacker is unauthenticated.

An authentication bypass vulnerability exists in an application if there is a way to authenticate to the application without providing all the authentication credentials. Such vulnerabilities might be as simple as an application incorrectly checking a password—for example, because it compares a simple checksum of the password, which is easy to brute force. Or vulnerabilities could be due to more complex issues, such as SQL injection (discussed later in “SQL Injection” on page 228).

Authorization Bypass

Not all users are created equal. Applications may support different types of users, such as read-only, low-privilege, or administrator, through the same interface. If an application provides access to resources like files, it might need to restrict access based on authentication. To allow access to secured resources, an authorization process must be built in to determine which rights and resources have been assigned to a user.

An authorization bypass vulnerability occurs when an attacker can gain extra rights or access to resources they are not privileged to access. For example, an attacker might change the authenticated user or user privileges directly, or a protocol might not correctly check user permissions.

NOTE

Don’t confuse authorization bypass with authentication bypass vulnerabilities. The major difference between the two is that an authentication bypass allows you to authenticate as a specific user from the system’s point of view; an authorization bypass allows an attacker to access a resource from an incorrect authentication state (which might in fact be unauthenticated).

Having defined the vulnerability classes, let’s look at their causes in more detail and explore some of the protocol structures in which you’ll find them. Each type of root cause contains a list of the possible vulnerability classes that it might lead to. Although this is not an exhaustive list, I cover those you are most likely to encounter regularly.

Memory Corruption Vulnerabilities

If you’ve done any analysis, memory corruption is most likely the primary security vulnerability you’ll have encountered. Applications store their current state in memory, and if that memory can be corrupted in a controlled way, the result can cause any class of security vulnerability. Such vulnerabilities can simply cause an application to crash (resulting in a denial-of-service condition) or be more dangerous, such as allowing an attacker to run executable code on the target system.

Memory-Safe vs. Memory-Unsafe Programming Languages

Memory corruption vulnerabilities are heavily dependent on the programming language the application was developed in. When it comes to memory corruption, the biggest difference between languages is tied to whether a language (and its hosting environment) is memory safe or memory unsafe. Memory-safe languages, such as Java, C#, Python, and Ruby, do not normally require the developer to deal with low-level memory management. They sometimes provide libraries or constructs to perform unsafe operations (such as C#’s unsafe keyword). But using these libraries or constructs requires developers to make their use explicit, which allows that use to be audited for safety. Memory-safe languages will also commonly perform bounds checking for in-memory buffer access to prevent out-of-bounds reads and writes. Just because a language is memory safe doesn’t mean it’s completely immune to memory corruption. However, corruption is more likely to be a bug in the language runtime than a mistake by the original developer.

On the other hand, memory-unsafe languages, such as C and C++, perform very little memory access verification and lack robust mechanisms for automatically managing memory. As a result, many types of memory corruption can occur. How exploitable these vulnerabilities are depends on the operating system, the compiler used, and how the application is structured.

Memory corruption is one of the oldest and best known root causes of vulnerabilities; therefore, considerable effort has been made to eliminate it. (I’ll discuss some of the mitigation strategies in more depth in Chapter 10 when I detail how you might exploit these vulnerabilities.)

Memory Buffer Overflows

Perhaps the best known memory corruption vulnerability is a buffer overflow. This vulnerability occurs when an application tries to put more data into a region of memory than that region was designed to hold. Buffer overflows may be exploited to get arbitrary programs to run or to bypass security restrictions, such as user access controls. Figure 9-1 shows a simple buffer overflow caused by input data that is too large for the allocated buffer, resulting in memory corruption.


Figure 9-1: Buffer overflow memory corruption

Buffer overflows can occur for one of two reasons. In a fixed-length buffer overflow, the application incorrectly assumes the input data will fit into the allocated buffer. In a variable-length buffer overflow, the size of the allocated buffer is incorrectly calculated.

Fixed-Length Buffer Overflows

By far, the simplest buffer overflow occurs when an application incorrectly checks the length of an external data value relative to a fixed-length buffer in memory. That buffer might reside on the stack, be allocated on a heap, or exist as a global buffer defined at compile time. The key is that the memory length is determined prior to knowledge of the actual data length.

The cause of the overflow depends on the application, but it can be as simple as the application not checking length at all or checking length incorrectly. Listing 9-1 is an example.

def read_string()
{
  byte str[32];
  int i = 0;

  do
  {
    str[i] = read_byte();
    i = i + 1;
  }
  while(str[i-1] != 0);
  printf("Read String: %s ", str);
}

Listing 9-1: A simple fixed-length buffer overflow

This code first allocates a 32-byte buffer on the stack in which to store the string. Next, it goes into a loop that reads a byte from the network and stores it at an incrementing index in the buffer. The loop exits when the last byte read from the network is equal to zero, which indicates that the entire value has been sent.

In this case, the developer has made a mistake: the loop doesn’t check the current index against the buffer length and therefore reads as much data as is available from the network, leading to memory corruption. Of course, this problem is due to the fact that unsafe programming languages do not perform bounds checks on arrays. This vulnerability might be very simple to exploit if no compiler mitigations are in place, such as stack cookies to detect the corruption.

Even if a developer performs a length check, that check may not be done correctly. Without automatic bounds checking on array access, it is up to the developer to verify all reads and writes. Listing 9-2 shows a corrected version of Listing 9-1 that takes into account strings that are longer than the buffer size. Still, even with the fix, a vulnerability is lurking in the code.

def read_string_fixed()
{
  byte str[32];
  int i = 0;

  do
  {
    str[i] = read_byte();
    i = i + 1;
  }
  while((str[i-1] != 0) && (i < 32));

  /* Ensure zero terminated if we ended because of length */
  str[i] = 0;

  printf("Read String: %s ", str);
}

Listing 9-2: An off-by-one buffer overflow

As in Listing 9-1, the code allocates a fixed stack buffer and reads the string in a loop. The first difference is in the loop condition: the developer has added a check to exit the loop once it has read 32 bytes, the maximum the stack buffer can hold. Unfortunately, to ensure that the string buffer is suitably terminated, a zero byte is then written at the current index. If the loop ended because of the length check, i has the value 32. But because languages like C start buffer indexing from 0, this actually means the code writes 0 to the 33rd element of the buffer, one past its end, thereby causing corruption, as shown in Figure 9-2.


Figure 9-2: An off-by-one error memory corruption

This results in an off-by-one error (due to the shift in index position), a common error in memory-unsafe languages with zero-based buffer indexing. If the overwritten value is important—for example, if it is the return address for the function—this vulnerability can be exploitable.
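To see what a safe version looks like, here is a minimal sketch in C rather than the book’s pseudocode (read_byte is assumed to behave like the helper in the listings): the loop reserves the final array element for the terminator and stops one byte early, so both the overflow in Listing 9-1 and the off-by-one in Listing 9-2 are avoided.

#include <stdio.h>

/* Assumed helper: reads a single byte from the network connection. */
extern unsigned char read_byte(void);

void read_string_safe(void)
{
  char str[32];
  size_t i = 0;

  /* Stop at sizeof(str) - 1 to leave room for the terminating zero byte. */
  while (i < sizeof(str) - 1)
  {
    unsigned char b = read_byte();
    if (b == 0)
      break;
    str[i] = (char)b;
    i = i + 1;
  }
  str[i] = 0;  /* i is at most 31, so this write stays inside the buffer */
  printf("Read String: %s\n", str);
}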

Variable-Length Buffer Overflows

An application doesn’t have to use fixed-length buffers to store protocol data. In most situations, it’s possible for the application to allocate a buffer of the correct size for the data being stored. However, if the application incorrectly calculates the buffer size, a variable-length buffer overflow can occur.

As the length of the buffer is calculated at runtime based on the length of the protocol data, you might think a variable-length buffer overflow is unlikely to be a real-world vulnerability. But this vulnerability can still occur in a number of ways. For one, an application might simply incorrectly calculate the buffer length. (Applications should be rigorously tested prior to being made generally available, but that’s not always the case.)

A bigger issue occurs if the calculation induces undefined behavior by the language or platform. For example, Listing 9-3 demonstrates a common way in which the length calculation is incorrect.

   def read_uint32_array()
   {
     uint32 len;
     uint32[] buf;

     // Read the number of words from the network
   len = read_uint32();

     // Allocate memory buffer
   buf = malloc(len * sizeof(uint32));

     // Read values
     for(uint32 i = 0; i < len; ++i)
     {
     buf[i] = read_uint32();
     }
     printf("Read in %d uint32 values ", len);
   }

Listing 9-3: An incorrect allocation length calculation

Here the memory buffer is dynamically allocated at runtime to be large enough for the input data from the protocol. First, the code reads a 32-bit integer, which it uses to determine the number of following 32-bit values in the protocol. Next, it calculates the total allocation size and allocates a buffer of that size. Finally, the code starts a loop that reads each value from the protocol into the allocated buffer.

What could possibly go wrong? To answer, let’s take a quick look at integer overflows.

Integer Overflows

At the processor instruction level, integer arithmetic operations are commonly performed using modulo arithmetic. Modulo arithmetic allows values to wrap if they go above a certain value, which is called the modulus. A processor uses modulo arithmetic if it supports only a certain native integer size, such as 32 or 64 bits. This means that the result of any arithmetic operation must always be within the ranges allowed for the fixed-size integer value. For example, an 8-bit integer can take only the values between 0 and 255; it cannot possibly represent any other values. Figure 9-3 shows what happens when you multiply a value by 4, causing the integer to overflow.


Figure 9-3: A simple integer overflow

Although this figure shows 8-bit integers for the sake of brevity, the same logic applies to 32-bit integers. When we multiply the original length 0x41 or 65 by 4, the result is 0x104 or 260. That result can’t possibly fit into an 8-bit integer with a range of 0 to 255. So the processor drops the overflowed bit (or more likely stores it in a special flag indicating that an overflow has occurred), and the result is the value 4—not what we expected. The processor might issue an error to indicate that an overflow has occurred, but memory-unsafe programming languages typically ignore this sort of error. In fact, the act of wrapping the integer value is used in architectures such as x86 to indicate the signed result of an operation. Higher-level languages might indicate the error, or they might not support integer overflow at all, for instance, by extending the size of the integer on demand.

Returning to Listing 9-3, you can see that if an attacker supplies a suitably chosen value for the buffer length, the multiplication by 4 will overflow. This results in a smaller number being allocated to memory than is being transmitted over the network. When the values are being read from the network and inserted into the allocated buffer, the parser uses the original length. Because the original length of the data doesn’t match up to the size of the allocation, values will be written outside of the buffer, causing memory corruption.
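One way to defend against this, sketched below in C rather than the book’s pseudocode, is to check for multiplication overflow before the allocation is made. The read_uint32 helper is assumed to behave like the one in Listing 9-3, and a real implementation would usually also cap len at a protocol-specific maximum.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

extern uint32_t read_uint32(void);  /* assumed network helper */

uint32_t *read_uint32_array_checked(uint32_t *out_len)
{
  uint32_t len = read_uint32();

  /* Reject any length whose size calculation would wrap around. */
  if (len > SIZE_MAX / sizeof(uint32_t))
    return NULL;

  uint32_t *buf = malloc((size_t)len * sizeof(uint32_t));
  if (buf == NULL)
    return NULL;

  for (uint32_t i = 0; i < len; ++i)
    buf[i] = read_uint32();

  printf("Read in %u uint32 values\n", (unsigned)len);
  *out_len = len;
  return buf;
}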

Out-of-Bounds Buffer Indexing

You’ve already seen that memory-unsafe languages do not perform bounds checks and that a vulnerability can occur when the size of a buffer is calculated incorrectly, leading to memory corruption. Out-of-bounds indexing stems from a different root cause: instead of the length of a data value being wrong, the attacker has some control over the position in the buffer that is accessed. If incorrect bounds checking is done on that access position, a vulnerability exists. The vulnerability can in many cases be exploited to write data outside the buffer, leading to selective memory corruption. Or it can be exploited by reading a value outside the buffer, which could lead to information disclosure or even remote code execution. Listing 9-4 shows an example that exploits the first case—writing data outside the buffer.

byte app_flags[32];

   def update_flag_value()
   {
   byte index = read_byte();
     byte value = read_byte();

     printf("Writing %d to index %d ", value, index);

   app_flags[index] = value;
   }

Listing 9-4: Writing to an out-of-bound buffer index

This short example shows a protocol with a common set of flags that can be updated by the client. Perhaps it’s designed to control certain server properties. The listing defines a fixed buffer of 32 flags. It then reads a byte from the network, which it uses as the index (with a possible range of 0 to 255), reads a second byte as the value, and writes that value into the flag buffer at the index. The vulnerability in this case should be obvious: an attacker can provide an index outside the valid range of 0 to 31, leading to selective memory corruption.

Out-of-bounds indexing doesn’t just have to involve writing. It works just as well when values are read from a buffer with an incorrect index. If the index were used to read a value and return it to the client, a simple information disclosure vulnerability would exist.

A particularly critical vulnerability could occur if the index were used to identify functions within an application to run. This usage could be something simple, such as using a command identifier as the index, which would usually be programmed by storing memory pointers to functions in a buffer. The index is then used to look up the function used to handle the specified command from the network. Out-of-bounds indexing would result in reading an unexpected value from memory that would be interpreted as a pointer to a function. This issue can easily result in exploitable remote code execution vulnerabilities. Typically, all that is required is finding an index value that, when read as a function pointer, would cause execution to transfer to a memory location an attacker can easily control.
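As a concrete illustration, here is a small C sketch of a bounds-checked dispatch table; the handler names and command numbering are invented for the example, and read_byte is an assumed network helper. Without the range check, any command value up to 255 would index past the table and call through whatever memory follows it.

#include <stdint.h>
#include <stdio.h>

typedef void (*command_handler)(void);

static void handle_login(void)  { /* ... */ }
static void handle_logout(void) { /* ... */ }

static const command_handler handlers[] = {
  handle_login,   /* command 0 */
  handle_logout,  /* command 1 */
};

#define NUM_HANDLERS (sizeof(handlers) / sizeof(handlers[0]))

extern uint8_t read_byte(void);  /* assumed network helper */

void dispatch_command(void)
{
  uint8_t cmd = read_byte();

  /* Validate the index before using it to look up a function pointer. */
  if (cmd >= NUM_HANDLERS)
  {
    fprintf(stderr, "unknown command %u\n", (unsigned)cmd);
    return;
  }
  handlers[cmd]();
}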

Data Expansion Attack

Even modern, high-speed networks compress data to reduce the number of raw octets being sent, whether to improve performance by reducing data transfer time or to reduce bandwidth costs. At some point, that data must be decompressed, and if compression is done by an application, data expansion attacks are possible, as shown in Listing 9-5.

   void read_compressed_buffer()
   {
     byte buf[];
     uint32 len;
     int i = 0;

     // Read the decompressed size
   len = read_uint32();

     // Allocate memory buffer
   buf = malloc(len);

   gzip_decompress_data(buf);

     printf("Decompressed in %d bytes ", len);
   }

Listing 9-5: Example code vulnerable to a data expansion attack

Here, the compressed data is prefixed with the total size of the decompressed data. The size is read from the network and used to allocate the required buffer. After that, a call is made to decompress the data into the buffer using a streaming algorithm, such as gzip. At no point does the code check whether the decompressed data will actually fit into the allocated buffer.

Of course, this attack isn’t limited to compression. Any data transformation process, whether it’s encryption, compression, or text encoding conversions, can change the data size and lead to an expansion attack.
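A safer pattern is to cap the claimed decompressed size and to use a decompression routine that refuses to write past the output buffer. The C sketch below uses zlib’s uncompress function as one example; it assumes the protocol also carries the compressed length so the compressed blob can be read up front, and the 16 MiB cap is an arbitrary illustration.

#include <stdint.h>
#include <stdlib.h>
#include <zlib.h>

#define MAX_DECOMPRESSED_SIZE (16 * 1024 * 1024)  /* example protocol cap */

extern uint32_t read_uint32(void);                   /* assumed network helpers */
extern void read_bytes(uint8_t *buf, uint32_t len);

uint8_t *read_compressed_buffer_checked(uint32_t *out_len)
{
  uint32_t claimed_len = read_uint32();
  uint32_t comp_len = read_uint32();

  /* Refuse sizes larger than the protocol should ever need. */
  if (claimed_len == 0 || claimed_len > MAX_DECOMPRESSED_SIZE ||
      comp_len == 0 || comp_len > MAX_DECOMPRESSED_SIZE)
    return NULL;

  uint8_t *comp = malloc(comp_len);
  uint8_t *buf = malloc(claimed_len);
  if (comp == NULL || buf == NULL)
  {
    free(comp);
    free(buf);
    return NULL;
  }
  read_bytes(comp, comp_len);

  /* uncompress() stops at dest_len instead of writing past the buffer. */
  uLongf dest_len = claimed_len;
  if (uncompress(buf, &dest_len, comp, comp_len) != Z_OK)
  {
    free(comp);
    free(buf);
    return NULL;
  }

  free(comp);
  *out_len = (uint32_t)dest_len;
  return buf;
}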

Dynamic Memory Allocation Failures

A system’s memory is finite, and the dynamic memory allocator must handle situations in which an application asks for more memory than is available. In the C language, this usually results in an error value being returned from the allocation functions (typically a NULL pointer); in other languages, it might result in the termination of the environment or the generation of an exception.

Several possible vulnerabilities may arise from not correctly handling a dynamic memory allocation failure. The most obvious is an application crash, which can lead to a denial-of-service condition.
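In C, the minimum defense is simply to check the allocator’s return value and fail the request cleanly instead of dereferencing a NULL pointer. A small sketch, with read_uint32 and read_bytes again assumed to be network helpers:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

extern uint32_t read_uint32(void);                   /* assumed network helpers */
extern void read_bytes(uint8_t *buf, uint32_t len);

int read_buffer_checked(void)
{
  uint32_t len = read_uint32();

  uint8_t *buf = malloc(len);
  if (buf == NULL)
  {
    /* Report the failure rather than crashing on a NULL dereference. */
    fprintf(stderr, "allocation of %u bytes failed\n", (unsigned)len);
    return -1;
  }

  read_bytes(buf, len);
  /* ... process buf ... */
  free(buf);
  return 0;
}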

Default or Hardcoded Credentials

When an application that uses authentication is deployed, default accounts are commonly added as part of the installation process. Usually, these accounts have a default username and password associated with them. The defaults create a problem if the administrator deploying the application does not reconfigure the credentials for these accounts prior to making the service available.

A more serious problem occurs when an application has hardcoded credentials that can be changed only by rebuilding the application. These credentials may have been added for debugging purposes during development and not removed before final release. Or they could be an intentional backdoor added with malicious intent. Listing 9-6 shows an example of authentication compromised by hardcoded credentials.

 def process_authentication()
 {
string username = read_string();
   string password = read_string();

   // Check for debug user, don't forget to remove this before release
if(username == "debug")
   {
     return true;
   }
   else
   {
   return check_user_password(username, password);
   }
}

Listing 9-6: An example of default credentials

The application first reads the username and password from the network and then checks for a hardcoded username, debug. If the application finds the username debug, it automatically passes the authentication process; otherwise, it follows the normal checking process. To exploit such a default username, all you’d need to do is log in as the debug user. In a real-world application, the credentials might not be that simple to use. The login process might require you to have an accepted source IP address, send a magic string to the application prior to login, and so on.

User Enumeration

Most user-facing authentication mechanisms use usernames to control access to resources. Typically, that username will be combined with a token, such as a password, to complete authentication. The user identity doesn’t have to be a secret: usernames are often a publicly available email address.

Still, there are advantages to not letting anyone, especially unauthenticated users, gain access to this information. An attacker who can identify valid user accounts has a much better chance of successfully brute-forcing passwords. Therefore, any vulnerability that discloses the existence of valid usernames or provides access to the user list is an issue worth identifying. A vulnerability that discloses the existence of users is shown in Listing 9-7.

 def process_authentication()
 {
   string username = read_string();
   string password = read_string();

if(user_exists(username) == false)
   {
   write_error("User " + username " doesn't exist");
   }
   else
   {
   if(check_user_password(username, password))
     {
       write_success("User OK");
     }
     else
     {
     write_error("User " + username " password incorrect");
     }
   }
}

Listing 9-7: Disclosing the existence of users in an application

The listing shows a simple authentication process where the username and password are read from the network. It first checks for the existence of the user; if the user doesn’t exist, an error is returned. If the user exists, the listing checks the password for that user. Again, if this fails, an error is written. You’ll notice that the two error messages are different depending on whether the user does not exist or only the password is incorrect. This information is sufficient to determine which usernames are valid.

By knowing a username, an attacker can more easily brute force valid authentication credentials. (It’s simpler to guess only a password rather than both a password and username.) Knowing a username can also give an attacker enough information to mount a successful social-engineering attack that would convince a user to disclose their password or other sensitive information.
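The usual fix is to return a single generic failure message regardless of which check failed, as in the following C sketch (the helper functions mirror the ones assumed by Listing 9-7). Note that the response timing should also be kept roughly uniform, because a noticeably faster rejection for unknown users leaks the same information.

#include <stdbool.h>

/* Assumed helpers mirroring the pseudocode in Listing 9-7. */
extern char *read_string(void);
extern bool user_exists(const char *username);
extern bool check_user_password(const char *username, const char *password);
extern void write_error(const char *msg);
extern void write_success(const char *msg);

void process_authentication_uniform(void)
{
  char *username = read_string();
  char *password = read_string();

  /* One generic error message: the response never reveals whether the
     username was valid or only the password was wrong. */
  if (user_exists(username) && check_user_password(username, password))
    write_success("User OK");
  else
    write_error("Invalid username or password");
}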

Incorrect Resource Access

Protocols that provide access to resources, such as HTTP or other file-sharing protocols, use an identifier for the resource you want to access. That identifier could be a file path or other unique identifier. The application must resolve that identifier in order to access the target resource. On success, the contents of the resource are accessed; otherwise, the protocol throws an error.

Several vulnerabilities can affect such protocols when they’re processing resource identifiers. It’s worth testing for all possible vulnerabilities and carefully observing the response from the application.

Canonicalization

If the resource identifier is a hierarchical list of resources and directories, it’s normally referred to as a path. Operating systems typically use two dots (..) in a path to indicate a parent directory relationship. Before a file can be accessed, the OS must find it using this relative path information. A very naive remote file protocol could take a path supplied by a remote user, concatenate it with a base directory, and pass that directly to the OS, as shown in Listing 9-8. This is known as a canonicalization vulnerability.

   def send_file_to_client()
   {
   string name = read_string();
    // Concatenate name from client with base path
   string fullPath = "/files" + name;

   int fd = open(fullPath, READONLY);

    // Read file to memory
   byte data[] = read_to_end(fd);

    // Send to client
   write_bytes(data, len(data));
   }

Listing 9-8: A path canonicalization vulnerability

This listing reads a string from the network that represents the name of the file to access. This string is then concatenated with a fixed base path into the full path to allow access only to a limited area of the filesystem. The file is then opened by the operating system, and if the path contains relative components, they are resolved. Finally, the file is read into memory and returned to the client.

If you find code that performs this same sequence of operations, you’ve identified a canonicalization vulnerability. An attacker could send a relative path that is resolved by the OS to a file outside the base directory, resulting in sensitive files being disclosed, as shown in Figure 9-4.

Even if an application does some checking on the path before sending it to the OS, the application must correctly match how the OS will interpret the string. For example, on Microsoft Windows both backslashes (\) and forward slashes (/) are acceptable as path separators. If an application checks only for backslashes, the standard separator on Windows, there might still be a vulnerability.


Figure 9-4: A normal path canonicalization operation versus a vulnerable one

Although having the ability to download files from a system might be enough to compromise it, a more serious issue results if the canonicalization vulnerability occurs in file upload protocols. If you can upload files to the application-hosting system and specify an arbitrary path, it’s much easier to compromise a system. You could, for example, upload scripts or other executable content to the system and get the system to execute that content, leading to remote code execution.
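On POSIX systems, one way to close this hole is to let the operating system canonicalize the combined path and then verify that the result still lies under the base directory, as in the C sketch below (error handling is minimal, and realpath requires the target to exist). Even with this check, care is still needed around symbolic links that change between the check and the open.

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BASE_DIR "/files"

/* Returns an open file only if the fully resolved path stays inside BASE_DIR. */
FILE *open_under_base(const char *name)
{
  char requested[PATH_MAX];
  char resolved[PATH_MAX];

  snprintf(requested, sizeof(requested), "%s/%s", BASE_DIR, name);

  /* realpath() resolves "." , ".." and symlinks before the prefix check. */
  if (realpath(requested, resolved) == NULL)
    return NULL;

  if (strncmp(resolved, BASE_DIR "/", strlen(BASE_DIR "/")) != 0)
    return NULL;  /* the resolved path escaped the base directory */

  return fopen(resolved, "rb");
}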

Verbose Errors

When an application attempts to retrieve a resource and the resource is not found, the application typically returns some error information. That error can be as simple as an error code or a full description of what doesn’t exist; however, it should not disclose any more information than required. Of course, that’s not always the case.

If an application returns an error message when requesting a resource that doesn’t exist and inserts local information about the resource being accessed into the error, a simple vulnerability is present. If a file was being accessed, the error might contain the local path to the file that was passed to the OS: this information might prove useful for someone trying to get further access to the hosting system, as shown in Listing 9-9.

 def send_file_to_client_with_error()
 {
string name = read_string();

   // Concatenate name from client with base path
string fullPath = "/files" + name;

if(!exist(fullPath))
   {
   write_error("File " + fullPath + " doesn't exist");
   }
   else
   {
   write_file_to_client(fullPath);
   }
}

Listing 9-9: An error message information disclosure

This listing shows a simple example of an error message being returned to a client when a requested file doesn’t exist. It reads a string from the network that represents the name of the file to access and concatenates it with a fixed base path into the full path. The existence of the file is then checked with the operating system. If the file doesn’t exist, the full path to the file is added to an error string and returned to the client; otherwise, the file data is returned.

The listing is vulnerable to disclosing the location of the base path on the local filesystem. Furthermore, the path could be used with other vulnerabilities to get more access to the system. It could also disclose the current user running the application if, for example, the resource directory was in the user’s home directory.
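The straightforward fix is to log any internal detail locally and send the client only a generic error that echoes, at most, what it already supplied. A C sketch, with the helper functions assumed to match Listing 9-9:

#include <stdbool.h>
#include <stdio.h>

/* Assumed helpers mirroring the pseudocode in Listing 9-9. */
extern char *read_string(void);
extern bool exist(const char *path);
extern void write_error(const char *msg);
extern void write_file_to_client(const char *path);

void send_file_to_client_generic_error(void)
{
  char *name = read_string();
  char fullPath[4096];

  snprintf(fullPath, sizeof(fullPath), "/files/%s", name);

  if (!exist(fullPath))
  {
    /* Keep the server-side path in the local log only. */
    fprintf(stderr, "request for missing file: %s\n", fullPath);
    write_error("File not found");
  }
  else
  {
    write_file_to_client(fullPath);
  }
}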

Memory Exhaustion Attacks

The resources of the system on which an application runs are finite: available disk space, memory, and processing power have limits. Once a critical system resource is exhausted, the system might start failing in unexpected ways, such as by no longer responding to new network connections.

When dynamic memory is used to process a protocol, the risk of overallocating memory or forgetting to free the allocated blocks always exists, resulting in memory exhaustion. The simplest way in which a protocol can be susceptible to a memory exhaustion vulnerability is if it allocates memory dynamically based on an absolute value transmitted in the protocol. For example, consider Listing 9-10.


 def read_buffer()
 {
   byte buf[];
   uint32 len;
   int i = 0;

   // Read the number of bytes from the network
len = read_uint32();

   // Allocate memory buffer
buf = malloc(len);

   // Allocate bytes from network
read_bytes(buf, len);

   printf("Read in %d bytes ", len);
 }

Listing 9-10: A memory exhaustion attack

This listing reads a variable-length buffer from the protocol. First, it reads in the length in bytes as an unsigned 32-bit integer. Next, it tries to allocate a buffer of that length, prior to reading it from the network. Finally, it reads the data from the network. The problem is that an attacker could easily specify a very large length, say 2 gigabytes, which when allocated would block out a large region of memory that no other part of the application could access. The attacker could then slowly send data to the server (to try to prevent the connection from closing due to a timeout) and, by repeating this multiple times, eventually starve the system of memory.

Most systems would not allocate physical memory until it was used, thereby limiting the general impact on the system as a whole. However, this attack would be more serious on dedicated embedded systems where memory is at a premium and virtual memory is nonexistent.
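The simplest mitigation is to enforce a protocol-level maximum on any client-supplied length before allocating, and ideally to read large payloads in bounded chunks. A C sketch of a capped version of Listing 9-10 (the 1 MiB limit is an arbitrary example, and the helpers are assumed):

#include <stdint.h>
#include <stdlib.h>

#define MAX_MESSAGE_SIZE (1024 * 1024)  /* example protocol cap */

extern uint32_t read_uint32(void);                   /* assumed network helpers */
extern void read_bytes(uint8_t *buf, uint32_t len);

uint8_t *read_buffer_bounded(uint32_t *out_len)
{
  uint32_t len = read_uint32();

  /* Reject anything larger than the protocol legitimately needs. */
  if (len == 0 || len > MAX_MESSAGE_SIZE)
    return NULL;

  uint8_t *buf = malloc(len);
  if (buf == NULL)
    return NULL;

  read_bytes(buf, len);
  *out_len = len;
  return buf;
}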

Storage Exhaustion Attacks

Storage exhaustion attacks are less likely to occur with today’s multi-terabyte hard disks but can still be a problem for more compact embedded systems or devices with limited storage. If an attacker can exhaust a system’s storage capacity, the application or others on that system could begin failing. Such an attack might even prevent the system from rebooting. For example, if an operating system needs to write certain files to disk before starting but can’t, a permanent denial-of-service condition can occur.

The most common cause of this type of vulnerability is in the logging of operating information to disk. For example, if logging is very verbose, generating a few hundred kilobytes of data per connection, and the maximum log size has no restrictions, it would be fairly simple to flood storage by making repeated connections to a service. Such an attack might be particularly effective if an application logs data sent to it remotely and supports compressed data. In such a case, an attacker could spend very little network bandwidth to cause a large amount of data to be logged.

CPU Exhaustion Attacks

Even though today’s average smartphone has multiple CPUs at its disposal, CPUs can do only a certain number of tasks at one time. It is possible to cause a denial-of-service condition if an attacker can consume CPU resources with a minimal amount of effort and bandwidth. Although this can be done in several ways, I’ll discuss only two: exploiting algorithmic complexity and identifying external controllable parameters to cryptographic systems.

Algorithmic Complexity

All computer algorithms have an associated computational cost that represents how much work needs to be performed for a particular input to get the desired output. The more work an algorithm requires, the more time it needs from the system’s processor. In an ideal world, an algorithm should take a constant amount of time, no matter what input it receives. But that is rarely the case.

Some algorithms become particularly expensive as the number of input parameters increases. For example, consider the sorting algorithm Bubble Sort. This algorithm inspects each value pair in a buffer and swaps them if the left value of the pair is greater than the right. This has the effect of bubbling the higher values to the end of the buffer until the entire buffer is sorted. Listing 9-11 shows a simple implementation.

def bubble_sort(int[] buf)
{
  do
  {
    bool swapped = false;
    int N = len(buf);
    for(int i = 1; i < N - 1; ++i)
    {
      if(buf[i-1] > buf[i])
      {
        // Swap values
        swap( buf[i-1], buf[i] );
        swapped = true;
      }
    }
  } while(swapped == true);
}

Listing 9-11: A simple Bubble Sort implementation

The amount of work this algorithm requires is proportional to the number of elements (let’s call the number N) in the buffer you need to sort. In the best case, this necessitates a single pass through the buffer, requiring N iterations, which occurs when all elements are already sorted. In the worst case, when the buffer is sorted in reverse, the algorithm needs on the order of N² comparisons to complete the sort. If an attacker could specify a large number of reverse-sorted values, the computational cost of doing this sort becomes significant. As a result, the sort could consume 100 percent of a CPU’s processing time and lead to denial-of-service.

In a real-world example of this, it was discovered that some programming environments, including PHP and Java, used an algorithm for their hash table implementations that took N² operations in the worst case. A hash table is a data structure that holds values keyed to another value, such as a textual name. The keys are first hashed using a simple algorithm, which then determines a bucket into which the value is placed. The N² cost arises when inserting new values into a bucket; ideally, there should be few collisions between the hash values of keys so that each bucket stays small. But by crafting a set of keys with the same hash (but, crucially, different key values), an attacker could cause a denial-of-service condition on a network service (such as a web server) by sending only a few requests.

Configurable Cryptography

Processing cryptographic primitives, such as hashing algorithms, can also create a significant computational workload, especially when dealing with authentication credentials. The rule in computer security is that passwords should always be hashed using a cryptographic digest algorithm before they are stored. This converts the password into a hash value that is virtually impossible to reverse into the original password, so even if the hash were disclosed, it would be difficult to recover the original password. But someone could still guess the password and generate the hash; if the guessed password matches when hashed, they’ve discovered the original password. To mitigate this problem, it’s typical to run the hashing operation multiple times to increase an attacker’s computational requirement. Unfortunately, this process also increases the computational cost for the application, which might be a problem when it comes to a denial-of-service condition.

A vulnerability can occur if either the hashing algorithm takes an exponential amount of time (based on the size of the input) or the algorithm’s number of iterations can be specified externally. The relationship between the time required by most cryptographic algorithms and a given input is fairly linear. However, if you can specify the algorithm’s number of iterations without any sensible upper bound, processing could take as long as the attacker desired. Such a vulnerable application is shown in Listing 9-12.

   def process_authentication()
   {
   string username = read_string();
     string password = read_string();
   int iterations = read_int();

     for(int i = 0; i < interations; ++i)
     {
     password = hash_password(password);
     }

   return check_user_password(username, password);
   }

Listing 9-12: Checking a vulnerable authentication

First, the username and password are read from the network. Next, the hashing algorithm’s number of iterations is read, and the hashing process is applied that number of times. Finally, the hashed password is checked against one stored by the application. Clearly, an attacker could supply a very large value for the iteration count that would likely consume a significant amount of CPU resources for an extended period of time, especially if the hashing algorithm is computationally complex.
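The defense is to never let the remote peer choose the work factor, or at the very least to clamp it to a fixed range, as in this C sketch (the helper functions and the bounds are assumptions for illustration):

#include <stdbool.h>
#include <stdint.h>

#define MIN_ITERATIONS 1000
#define MAX_ITERATIONS 100000  /* example bounds, tuned per deployment */

/* Assumed helpers mirroring the pseudocode in Listing 9-12. */
extern char *read_string(void);
extern uint32_t read_uint32(void);
extern void hash_password_inplace(char *password);
extern bool check_user_password(const char *username, const char *password);

bool process_authentication_clamped(void)
{
  char *username = read_string();
  char *password = read_string();
  uint32_t iterations = read_uint32();

  /* Clamp the client-supplied work factor to a sane range. */
  if (iterations < MIN_ITERATIONS) iterations = MIN_ITERATIONS;
  if (iterations > MAX_ITERATIONS) iterations = MAX_ITERATIONS;

  for (uint32_t i = 0; i < iterations; ++i)
    hash_password_inplace(password);

  return check_user_password(username, password);
}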

A good example of a cryptographic algorithm that a client can configure is the handling of public/private keys. Algorithms such as RSA rely on the computational cost of factoring a large public key value. The larger the key value, the more time it takes to perform encryption/decryption and the longer it takes to generate a new key pair.

Format String Vulnerabilities

Most programming languages have a mechanism to convert arbitrary data into a string, and it’s common to define some formatting mechanism to specify how the developer wants the output. Some of these mechanisms are quite powerful and privileged, especially in memory-unsafe languages.

A format string vulnerability occurs when the attacker can supply a string value to an application that is then used directly as the format string. The best-known, and probably the most dangerous, formatter is used by the C language’s printf and its variants, such as sprintf, which prints to a string. The printf function takes a format string as its first argument and then a list of the values to format. Listing 9-13 shows such a vulnerable application.

def process_authentication()
{
      string username = read_string();
      string password = read_string();

      // Print username and password to terminal
      printf(username);
      printf(password);

      return check_user_password(username, password);
}

Listing 9-13: The printf format string vulnerability

The format string for printf specifies the position and type of data using a %? syntax where the question mark is replaced by an alphanumeric character. The format specifier can also include formatting information, such as the number of decimal places in a number. An attacker who can directly control the format string could corrupt memory or disclose information about the current stack that might prove useful for further attacks. Table 9-2 shows a list of common printf format specifiers that an attacker could abuse.

Table 9-2: List of Commonly Exploitable printf Format Specifiers

Format specifier: %d, %p, %u, %x
Description: Prints integers
Potential vulnerabilities: Can be used to disclose information from the stack if returned to an attacker

Format specifier: %s
Description: Prints a zero-terminated string
Potential vulnerabilities: Can be used to disclose information from the stack if returned to an attacker or to cause invalid memory accesses, leading to denial-of-service

Format specifier: %n
Description: Writes the current number of printed characters to a pointer specified in the arguments
Potential vulnerabilities: Can be used to cause selective memory corruption or application crashes
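The fix for Listing 9-13 is mechanical: never pass externally supplied data as the format argument itself, only as a value formatted by a constant format string. A minimal C sketch (read_string is an assumed network helper):

#include <stdio.h>

extern char *read_string(void);  /* assumed network helper */

void log_credentials(void)
{
  char *username = read_string();
  char *password = read_string();

  /* The format string is a constant; any %n or %s sequences inside the
     untrusted values are printed literally instead of being interpreted. */
  printf("%s\n", username);
  printf("%s\n", password);
}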

Command Injection

Most OSes, especially Unix-based OSes, include a rich set of utilities designed for various tasks. Sometimes developers decide that the easiest way to execute a particular task, say password updating, is to execute an external application or operating system utility. Although this might not be a problem if the command line executed is entirely specified by the developer, often some data from the network client is inserted into the command line to perform the desired operation. Listing 9-14 shows such a vulnerable application.

 def update_password(string username)
 {
string oldpassword = read_string();
   string newpassword = read_string();

   if(check_user_password(username, oldpassword))
   {
     // Invoke update_password command
   system("/sbin/update_password -u " + username + " -p " + newpassword);
   }
 }

Listing 9-14: A password update vulnerable to command injection

The listing updates the current user’s password as long as the original password is known. It then builds a command line and invokes the Unix-style system function. Although we don’t control the username or oldpassword parameters (they must be correct for the call to system to be made), we do have complete control over newpassword. Because no sanitization is done and the system function uses the current Unix shell to execute the command line, the code in the listing is vulnerable to command injection. For example, we could specify a value for newpassword such as password; xcalc, which would first execute the password update command; the shell would then execute xcalc because it treats the semicolon as a separator in a list of commands to execute.
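On Unix-like systems, the robust fix is to avoid the shell entirely and pass each value as a separate entry in the argument vector. The C sketch below uses fork and execv; the update_password path and flags follow the listing, and everything else is illustrative.

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs /sbin/update_password without involving a shell, so metacharacters
   in username or newpassword are passed through as plain argument bytes. */
int run_update_password(const char *username, const char *newpassword)
{
  pid_t pid = fork();
  if (pid < 0)
    return -1;

  if (pid == 0)
  {
    char *const argv[] = {
      "/sbin/update_password",
      "-u", (char *)username,
      "-p", (char *)newpassword,
      NULL
    };
    execv("/sbin/update_password", argv);
    _exit(127);  /* only reached if exec fails */
  }

  int status = 0;
  waitpid(pid, &status, 0);
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}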

SQL Injection

Even the simplest application might need to persistently store and retrieve data. Applications can do this in a number of ways, but one of the most common is to use a relational database. Databases offer many advantages, not least of which is the ability to issue queries against the data to perform complex grouping and analysis.

The de facto standard for defining queries to relational databases is the Structured Query Language (SQL). This text-based language defines what data tables to read and how to filter that data to get the results the application wants. When using a text-based language, there is a temptation to build queries using string operations. However, this can easily result in a vulnerability similar to command injection: instead of untrusted data being inserted into a command line without appropriate escaping, it is inserted into a SQL query, which is then executed on the database. This technique can modify the operation of the query to return results of the attacker’s choosing. For example, what if the query extracted the current password for the authenticating user, as shown in Listing 9-15?

   def process_authentication()
   {
   string username = read_string();
     string password = read_string();

   string sql = "SELECT password FROM user_table WHERE user = '" + username "'";

   return run_query(sql) == password;
   }

Listing 9-15: An example of authentication vulnerable to SQL injection

This listing reads the username and password from the network. Then it builds a new SQL query as a string, using a SELECT statement to extract the password associated with the user from the user table. Finally, it executes that query on the database and checks that the password read from the network matches the one in the database.

The vulnerability in this listing is easy to exploit. In SQL, the strings need to be enclosed in single quotes to prevent them from being interpreted as commands in the SQL statement. If a username is sent in the protocol with an embedded single quote, an attacker could terminate the quoted string early. This would lead to an injection of new commands into the SQL query. For example, a UNION SELECT statement would allow the query to return an arbitrary password value. An attacker could use the SQL injection to bypass the authentication of an application.

SQL injection attacks can even result in remote code execution. For example, although disabled by default, Microsoft’s SQL Server’s database function xp_cmdshell allows you to execute OS commands. Oracle’s database even allows uploading arbitrary Java code. And of course, it’s also possible to find applications that pass raw SQL queries over the network. Even if a protocol is not intended for controlling the database, there’s still a good chance that it can be exploited to access the underlying database engine.
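The standard remedy is a parameterized (prepared) query, in which user-supplied data is bound as a value rather than concatenated into the SQL text. The following C sketch uses the SQLite API as one concrete example; a real application would also store password hashes rather than plaintext passwords.

#include <sqlite3.h>
#include <stdbool.h>
#include <string.h>

bool check_user_password_sql(sqlite3 *db, const char *username, const char *password)
{
  static const char *sql = "SELECT password FROM user_table WHERE user = ?";
  sqlite3_stmt *stmt = NULL;
  bool ok = false;

  if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK)
    return false;

  /* The username is bound as a value, so quotes inside it cannot change
     the structure of the query. */
  sqlite3_bind_text(stmt, 1, username, -1, SQLITE_TRANSIENT);

  if (sqlite3_step(stmt) == SQLITE_ROW)
  {
    const unsigned char *stored = sqlite3_column_text(stmt, 0);
    ok = (stored != NULL && strcmp((const char *)stored, password) == 0);
  }

  sqlite3_finalize(stmt);
  return ok;
}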

Text-Encoding Character Replacement

In an ideal world, everyone would be able to use one type of text encoding for all different languages. But we don’t live in an ideal world, and we use multiple text encodings as discussed in Chapter 3, such as ASCII and variants of Unicode.

Some conversions between text encodings cannot be round-tripped: converting from one encoding to another loses important information such that if the reverse process is applied, the original text can’t be restored. This is especially problematic when converting from a wide character set such as Unicode to a narrow one such as ASCII. It’s simply impossible to encode the entire Unicode character set in 7 bits.

Text-encoding conversions manage this problem in one of two ways. The simplest approach replaces the character that cannot be represented with a placeholder, such as the question mark (?) character. This might be a problem if the data value refers to something where the question mark is used as a delimiter or as a special character, for example, as in URL parsing where it represents the beginning of a query string.

The other approach is to apply a best-fit mapping. This is used for characters for which there is a similar character in the new encoding. For example, the quotation mark characters in Unicode have left-facing and right-facing forms that are mapped to specific code points, such as U+201C and U+201D for the left and right double quotation marks. These are outside the ASCII range, but in a conversion to ASCII, they’re commonly replaced with the closest equivalent: U+0022, the plain double quotation mark. Best-fit mapping can become a problem when the converted text is processed by the application. Although slightly corrupted text won’t usually cause much of a problem for a user, the automatic conversion process could cause the application to mishandle the data.

The important implementation issue arises when the application first verifies a security condition using one encoded form of a string and then uses a different encoded form of that string for a specific action, such as reading a resource or executing a command, as shown in Listing 9-16.

 def add_user()
 {
string username = read_unicode_string();

   // Ensure username doesn't contain any single quotes
if(username.contains("'") == false)
   {
     // Add user, need to convert to ASCII for the shell
   system("/sbin/add_user '" + username.toascii() + "'");
   }
 }

Listing 9-16: A text conversion vulnerability

In this listing, the application reads in a Unicode string representing a user to add to the system. It will pass the value to the add_user command, but it wants to avoid a command injection vulnerability; therefore, it first ensures that the username doesn’t contain any single quote characters that could be misinterpreted. Once satisfied that the string is okay, it converts it to ASCII (Unix systems typically work on a narrow character set, although many support UTF-8) and ensures that the value is enclosed in single quotes to prevent spaces from being misinterpreted.

Of course, if the best-fit mapping rules convert other characters back to a single quote, it would be possible to prematurely terminate the quoted string and return to the same sort of command injection vulnerabilities discussed earlier.
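The safer ordering is to perform the conversion first and then run the security check on the exact bytes that will reach the shell, or better, to skip the shell entirely as shown in the command injection section. A small C sketch under those assumptions (both helpers are hypothetical):

#include <stdbool.h>
#include <string.h>

/* Assumed helpers: read a network string already converted to ASCII (with
   whatever best-fit mapping the platform applies), and add the user without
   going through a shell (for example, via execv). */
extern char *read_string_as_ascii(void);
extern int run_add_user(const char *ascii_username);

bool add_user_checked(void)
{
  /* Convert first, then validate the converted bytes. */
  char *ascii_username = read_string_as_ascii();

  if (strchr(ascii_username, '\'') != NULL)
    return false;  /* reject anything that mapped to a single quote */

  return run_add_user(ascii_username) == 0;
}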

Final Words

This chapter showed you that many possible root causes exist for vulnerabilities, with a seemingly limitless number of variants in the wild. Even if something doesn’t immediately look vulnerable, persist. Vulnerabilities can appear in the most surprising places.

I’ve covered vulnerabilities ranging from memory corruptions, which can cause an application to behave in ways it was never designed to, to denial-of-service conditions that prevent legitimate users from accessing the service it provides. Identifying all these different issues can be a complex process.

As a protocol analyst, you can approach these issues from a number of angles. It is also vital that you adapt your strategy when looking for implementation vulnerabilities: take into account whether the application is written in a memory-safe or memory-unsafe language, keeping in mind that you are less likely to find memory corruption in, for example, a Java application.
