Structured Query Language (SQL) Injection

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Structured Query Language (SQL) Injection

Structured Query Language (SQL) was created in the 1970s, and has since become the de facto standard for relational data storage. All of the major database products (Oracle, DB2, Microsoft SQL Server, MySQL, to name just a few) support SQL, and even junior-level programmers are expected to have a solid grasp of SQL concepts and syntax. Unfortunately, most programming languages and frameworks make it far too easy for these programmers to write SQL queries that end up creating vulnerabilities that attackers can easily exploit to devastating effect.

Note

While we don’t expect you to have 20 years of experience as a SQL database administrator in order to get useful information from this chapter, we are going to assume that you have at least a basic familiarity with SQL. If you understand that SQL databases store data in tabular format, and that you use SELECT statements to retrieve data, INSERT statements to add data, UPDATE statements to modify data, and DELETE statements to erase data, then you should have all the SQL knowledge you’ll need for this chapter.

It may sound a little strange, but writing queries against a SQL database is a lot like playing the game MadLibs. If you’ve never played MadLibs before, it’s a game that comes in book form and is meant to be played by a group of people, usually on long road trips. Each game starts with a list of word types or categories, something like this:

The players then call out words that fit the required categories, the sillier the better:

After all the choices are made, the page is turned to reveal a sentence or paragraph incorporating the words that the players chose blindly:

On our summer vacation to Mars, we skied all week until we saw some purple cows!

Again, it sounds strange, but this is the same principle that web applications use to build SQL queries from user input. You ask the user for “words” that you then build into a “sentence” (that is, a SQL query) that they have no knowledge of. For example, let’s say you want to run a query against your database that will retrieve the first and last name of every salesperson in your company who works in a given U.S. state. The application code to perform this query might look something like this:

When a user enters a value for the state in the web application UI, say “CA,” the database engine executes the query:

This design works great, for both SQL queries and MadLibs, as long as everyone plays by the rules. But eventually, someone will get the idea to cheat. When this happens in MadLibs, you just end up with outcomes even sillier than usual:

On our summer vacation to blue, we stop sign all week until we saw some hamburgers and fries and milkshakes!

But when somebody cheats in a SQL query, the consequences are much more severe. Instead of entering the name of a state, like “CA” or “FL,” a cheater might enter a value like ′; DROP TABLE Users; --. The web application then builds the query using this input, and the database engine executes the SQL command:

What the cheater—whom we’ll refer to from now on as “the attacker”—has done here is to append, or inject, some SQL syntax of his own into the application developer’s intended SQL command. This attack technique is known as SQL injection. (Usually you’ll hear people pronounce this like “sequel injection” and not “S-Q-L injection.”) In this particular example, the attacker has instructed the database engine to drop (that is, delete) the table “Users” from the database. The database engine has no way of knowing that the DROP TABLE Users command actually came from an attacker, and the engine naïvely executes the command and deletes the table (assuming it actually exists, but more on that later).

SQL Injection Effects and Confidentiality-Integrity-Availability

If we look at this attack from a Confidentiality-Integrity-Availability perspective, at first glance it appears to be a straightforward availability issue. If an attacker can drop critical tables like the list of users from the database, it’s almost guaranteed that the application will just crash the next time someone tries to use it. It’s also almost guaranteed that the application will just keep crashing until you realize what the problem is and restore the database from its last known good backup (hopefully not too old of a backup, either!). Of course, we still haven’t addressed the root of the problem, which means there’s nothing from preventing the attacker from coming back in and deleting the table all over again.

Worse still, as bad as this is, it’s actually the least of our concerns. Successful SQL injection attacks can accomplish much more than deleting database tables, and can have effects beyond just availability concerns. What if, instead of choosing to delete records from the database, the attacker decided to alter existing records in the database, or add new records of his own?

or:

These are attacks that target the integrity of the application data, rather than its availability. And integrity attacks may not seem as bad as crashing the entire application so that no one can use it, but in some ways they’re actually worse. Deleting someone’s data is like keying their car: it’s a big, ugly, and obvious attack. Altering someone’s data is like cutting their brake lines: They probably won’t notice anything is wrong at first, but when they do, it’ll be much harder to recover from. How long ago was the data changed, and how far back do we have to restore data from? Exactly what data was changed in the first place? These are often very difficult questions to answer.

One of the hottest trends in the web application exploitation space right now is to combine a SQL injection data integrity attack with a persistent cross-site scripting attack as a kind of blended threat attack. (For a review of cross-site scripting attacks, refer to Chapter 6.) When an attacker finds a site vulnerable to SQL injection, he exploits the vulnerability to insert HTML <script> tags into the database records, as shown in Figure 7-1. These script tags point to a site under the attacker’s control that contains JavaScript malware such as a keystroke logger or drive-by downloader.

Figure 7-1 An attacker uses SQL injection to insert an HTML link to a malware site.

In this case, the attacker is assuming that the application is pulling the database records to display them in a web page. If he’s correct, when a victim views the compromised page, the application will pull the malicious script tag from the database record and include it with the page contents; the victim’s browser will then silently fetch the malware from the specified site, execute it, and the victim will be exposed to whatever payload the malware contains. Figure 7-2 shows a diagram of how a user would be exploited by this kind of attack.

Figure 7-2 A victim visits the affected site and silently downloads the attacker’s malware.

If this seems like a Rube Goldberg–style, overly convoluted attack that could never possibly work (SQL injection to persistent XSS to malware JavaScript to drive-by downloader or other malware payload), you may be surprised to learn that in 2008, over one million web pages were successfully infected by this exact attack. Site administrators began to notice that their Microsoft Active Server Pages (.ASP) web pages were behaving abnormally, and eventually realized that some kind of vulnerability had been exploited. Since the attacks were at first limited to Microsoft-specific .ASP files, security researchers assumed there was some previously unknown vulnerability (also known as a zero-day or 0-day vulnerability) in Microsoft Internet Information Services (IIS) to blame. However, further investigation revealed that all of the sites had been compromised with SQL injection. The reason that only .ASP pages were affected was that the attackers were not going after targets by hand; instead, they had programmed a botnet to search Google for vulnerable .ASP sites. When the botnet (later revealed to be the Asprox botnet) discovered a vulnerable page, it attempted to inject a script tag pointing to a malware site.

Tip

In response to the Asprox botnet attacks, the Hewlett-Packard Application Security Center, working in conjunction with the Microsoft Security Response Center (MSRC), developed a tool for IT professionals to determine whether their sites might be vulnerable to Asprox. The HP Scrawlr tool is still available for free download: Just search for “HP Scrawlr” (registration is required for downloading the tool from the HP web site). If you believe your site may be affected or just want to be proactive about SQL injection defense, download Scrawlr and try it against your sites.

A word of warning: don’t skip this just because you don’t use .ASP pages in your site. Other mass SQL injection attacks since Asprox such as “dnf666” and “LizaMoon” have been targeted at many different types of pages, and not just .ASP files. You may want to run Scrawlr or an equivalent SQL injection testing tool regardless of which web application framework you’re using.

So far, we’ve covered the potential availability and integrity aspects of SQL injection attacks. However, as bad as these effects are, the most damaging SQL injection attack of all may be a confidentiality attack: using SQL injection to steal data records. Again, this may not seem as bad as crashing the system or altering salaries. But in those cases, even though it may be difficult to repair the damage once it’s done, at least it’s possible. Once data is stolen, it can’t be un-stolen.

In one particularly ironic incident, the home page for the MySQL database engine, www.mysql.com, was itself compromised by a SQL injection attack. The attackers managed to extract the site’s list of valid users and their passwords, including those for some administrative users. They then posted this information to the public security vulnerability mailing list Full Disclosure for the whole world to see. This was bad enough for MySQL, but things could have been even worse had the attackers chosen to keep their list of passwords to themselves and just start exploiting those users.

If this happens to you, depending on the nature of your organization and your application, you may then be in violation of several compliance standards, including but not limited to: the Payment Card Industry Data Security Standard (PCI DSS), the Health Insurance Portability and Accountability Act (HIPAA), the Gramm-Leach-Bliley Act (GLBA), and the California Security Breach Information Act (CASB 1386). CASB 1386 states that if you do business in the state of California and you even suspect that California residents’ data has been stolen, you are legally obligated to inform them of the potential breach. It’s very frightening for a consumer to get a letter in the mail telling them that their credit card number may have been stolen, and it’s often difficult for them to forgive the organization at fault.

IMHO

There’s an ongoing religious argument among web application security experts as to whether cross-site scripting or SQL injection is the biggest threat to web applications. There are good arguments to be made on both sides, but I think you can sum it up by saying that the New York Times or the Wall Street Journal will never run a headline that says:

WEBSITE COOKIES LOST TO CROSS-SITE SCRIPTING HOLE

But this is a completely different story:

WEBSITE LEAKS CREDIT CARD NUMBERS; MILLIONS AFFECTED

Sometimes SQL injection vulnerabilities have multiple confidentiality-integrity-availability effects. The classic example of SQL injection is an attack against a user login procedure. The application prompts the user for their username and password, and then builds those inputs into a database query against a “Users” table:

The application executes this query and then checks to see whether any rows were returned from the database. If there was at least one row returned, then there must be a user in the system with the given username and password, so the user is valid and the application lets him in. If there were no rows returned, then there’s no user in the system with that username/password combination, and the application blocks the login attempt. But a SQL injection vulnerability can allow an attacker to completely bypass this authentication method. For example, what if an attacker were to enter the value foo for his username, and bar′ or ′1′ = ′1 for his password?

There may or may not be a user named foo in the database, and if so, he may or may not have a password of bar, but in either case it doesn’t really matter. The SQL clause OR ′1′=′1′ that the attacker injected assures him that all of the data in the table will be returned from the query, since the string value “1” is always equal to “1”. As long as there’s at least one user in the database (so that there’s at least one row returned by the injected query clause), the attacker will be granted access to the system. This may in turn have confidentiality effects—he can see data he wasn’t supposed to have access to—or integrity effects—he can change or add data he wasn’t supposed to have access to.

Probably the worst SQL injection attack payload is when the attacker can gain access to a command-line shell on the database server. In security lingo this is called owning or rooting the server. (Sometimes you’ll see this spelled as pwning the server.) If successful, this attack would enable him to do just about anything he wanted to do to the server, and with administrative privileges. He could read, alter, or delete any data on the system. The effects wouldn’t just be limited to data in the database either: even files on the server would be up for grabs too. The attacker could install new back doors into the system and alter the system logs so that computer forensics wouldn’t reveal the intrusion. Basically this would be a complete system compromise of confidentiality, integrity, and availability at the highest possible levels.

In case you’re wondering, this is not a purely hypothetical attack. The most famous (or maybe infamous) method of gaining shell access through a SQL injection vulnerability is to execute the Microsoft SQL Server stored procedure xp_cmdshell. The way xp_ cmdshell works is very simple: It takes a single string argument and then executes that as a command-line call. For example, the call

would perform a directory listing of the server’s C drive. Again, at this point the damage is limited only by the attacker’s imagination, and exploiting this through SQL injection is absolutely trivial:

If you’re running SQL Server, we strongly recommend disabling or removing the xp_cmdshell stored procedure. You can disable it through use of the sp_configure stored procedure, like so:

and in fact Microsoft ships newer versions of SQL Server (since SQL Server 2005) with xp_ cmdshell disabled by default. However, the fact that you can disable xp_cmdshell through a stored procedure begs the question: Couldn’t an attacker just re-enable xp_cmdshell with the same call? If the application database user is running with administrative privileges—which it often is, and later in this chapter we’ll see how to change this—then yes, an attacker could re-enable xp_cmdshell:

Given this, you’re better off removing the xp_cmdshell stored procedure from your system entirely, but if your code is vulnerable to injection attacks, then you’re still not out of trouble. Incredibly, it’s possible to re-create the xp_cmdshell procedure from scratch even if it’s been deleted from the database. There’s even an automated hacking tool called sqlninja that an attacker can use that will do this for him.

Even in light of these facts, it’s still important to either disable or remove xp_cmdshell as a defense-in-depth measure. Not every attacker knows how to re-create xp_cmdshell, and even if yours does, there’s no reason to make his job any easier.

The Dangers of Detailed Errors

One big unanswered question at this point is: Just how do the attackers find out the database table names to write SQL exploit code against them? As we just saw, some of the most damaging SQL injection attacks such as the authentication bypass technique and the xp_cmdshell stored procedure execution don’t necessarily require the attacker to have any knowledge of table names. But for now let’s assume the attacker isn’t interested in these attacks, so we’ll take a look at some methods that he’ll try to use to find out database metadata like table and column names.

What an attacker really hopes for when he starts casing your web site for SQL injection holes is that the site has been set up to display detailed error messages. You’ve probably visited sites before that showed cryptic error messages with application stack traces or maybe even snippets of the application’s source code. There’s really no reason for a site to be set up this way. Users can’t do anything with this information, and in most cases it probably confuses them or even scares them that something is wrong with the site. About the best thing you can say about sites with detailed error messages is that it’s easier for the site developers to track down and fix the bug, given that they’re getting debug messages from the live site. Unfortunately, this information comes at a huge price, since it’s not just available to the site developers but also to potential attackers.

The first thing an attacker will probably try to do to reveal detailed SQL error messages in your application is to input a single special SQL character such as a single quote or a semicolon into an input field.

This query has a syntax error—a mismatched number of single quotes—so the database engine won’t be able to process the query and it will return an error, as shown in Figure 7-3.

Figure 7-3 A dangerously detailed error message containing the application’s stack trace

At this point, now that the attacker has a detailed error message, he also now knows what the original intended SQL query syntax is. This is going to make his life much easier as he tries to inject his own code. Another piece of information that is going to make this job easier is that, per the ANSI SQL standard, all databases should supply an INFORMATION_SCHEMA view that contains metadata about the database. The SQL query syntax to retrieve all of the tables in the database is:

For the attacker, it’s now a simple matter of executing this query and returning the results to him. However, he can’t just execute this query directly; he has to inject it as part of the original. He can do this by injecting a UNION SELECT clause into his attack syntax. UNION SELECT is SQL syntax that allows you to select data from multiple tables into a single result set, which is exactly what the attacker wants to do here.

In Actual Practice

When you’re testing your own applications to see if they’re vulnerable to SQL injection, testing with your web browser is a great start, but eventually you’ll need more highly specialized tools. Let’s say that your application has a vulnerable query, something like:

Now let’s say that the web application user interface displays a combo box for the user to select their favorite color, and that the combo box is hard-coded to contain only the values “red,” “green,” and “blue.” If you just used a browser to test the application, you’d never find the SQL injection vulnerability, because you’d have no way of sending any values that the application didn’t intend. You couldn’t even choose “yellow,” much less ′; DROP TABLE Users; --. But attackers do not put artificial constraints on themselves; they do whatever they can to break into your database. They use proxy tools or other tools that can craft arbitrary HTTP requests. If you want to stay one step ahead of them, you’ll need to use the same tools they do. Eric Lawrence’s HTTP debugging tool Fiddler is one of our favorite tools for this purpose, and you can see a screenshot of it in Figure 7-4.

Figure 7-4 Using the HTTP debugging tool Fiddler to directly create a raw HTTP request

Note

While INFORMATION_SCHEMA may be an ANSI standard, the most popular SQL database engine, Oracle (with a reported 48 percent market share of relational database installations as of April 2010) does not support INFORMATION_SCHEMA. However, do not take this to mean that you’re safe from SQL injection just because you’re running Oracle! Oracle still implements the same functionality as INFORMATION_SCHEMA (and a lot more); it simply has its own alternative implementations. The equivalent query in Oracle to retrieve a list of user-defined tables is:

And to retrieve table metadata like column names and types, you can use the DESCRIBE command (this also works for MySQL):

DESCRIBE Salesperson

Your first step toward defending yourself from SQL injection should be to ensure that your applications never display detailed error messages to users. This includes pages that try to hide stack traces or debug information in HTML comment sections in the mistaken belief that these areas are only visible to developers. Remember that this information is only a “View Source” click away for a potential attacker.

Instead of detailed error messages, you should always be showing generic custom errors that don’t give away any internal details about the error at all, something like “We’re sorry, an error has occurred.” It’s important to note that this step alone is not sufficient to prevent SQL injection vulnerabilities, as we’ll see later, but it is a necessary step and helpful in preventing other vulnerabilities besides SQL injection.

The process for enabling custom errors varies from language to language and web server to web server. In most cases, such as for PHP applications, you’ll need to edit the application’s source code to trap errors and redirect the response to a custom error HTML page. However, in some cases, such as for Microsoft .NET applications, you can make this change via a configuration file, without having to change any source code.

IMHO

Some web application security experts recommend always returning HTTP response code 200 (which is the “OK” response code) even when an error occurs on the server. The rationale for this suggestion is that if the application were to return any codes in the 500–599 range (the “server error” range), attackers’ automated scanning tools would have an easier time detecting that a potential vulnerability exists in your application. While there is a grain of truth to this, I think in actual practice it’s overkill, and that it’s best to follow W3C specifications when possible. Always return the appropriate HTTP response code.

Blind SQL Injection: No Errors Required

Although it’s much easier for attackers to exploit SQL injection holes when the application presents them with detailed error messages, it’s by no means required. For our final look at the topic of exploiting SQL injection vulnerabilities—before we move on to the much more respectable topics of fixing SQL injection vulnerabilities and preventing them in the first place—we’ll examine a form of SQL injection that doesn’t rely on any error messages at all.

Let’s continue by adding some new functionality to our sales team management example application. Our new page will let the user enter a customer’s account number, and the application will check whether that customer has ever purchased anything. If so, then the user is redirected to the page “customerHistory.html,” and if not, then he is redirected back to the home page “home.html.” Here’s some sample code to demonstrate:

Let’s improve the code a little by catching any errors that the database throws, as we talked about in the previous section. We’ll change the code so that if an error occurs, the application will redirect the user to the home page, just as if they had searched for a nonexistent customer or one who had never placed an order.

The code still contains an underlying SQL injection vulnerability, but we’ve made any potential attacker’s work much more difficult if he wants to actually exploit that vulnerability. Assuming that the xp_cmdshell attack is infeasible (for example, maybe it’s not a Microsoft SQL Server database), he’s going to need to uncover some database metadata like table and column names if he wants to extract any actual data. As we saw before, normally he would look for such information in error messages, but we’ve cut off that attack vector by catching all database errors and just redirecting back to the home page. And even if he just made a lucky guess at a table name—“Sales” and “Users” are pretty common names, after all—the usual technique of pulling table data out with injected UNION SELECT clauses won’t work either. The application never displays a table of returned data; it just redirects the user to one page or the other. However, even given these constraints, a determined attacker will still be able to extract the entire contents of the database eventually.

Let’s step back into the attacker’s shoes for a minute. When he looks at the customer search page for the first time, he’ll probe it for vulnerability the same way he did for the salesperson search page in the previous section, by inputting a special SQL syntax character like a single quote or a percentage sign and watching for an error.

The database engine will fail trying to process this query due to the mismatched number of single quotes. It will throw an error, and the application code will catch the error and redirect the attacker to the home page. The attacker will then try the search again with the OR ′1′ = ′1′ injection clause we looked at earlier, when we were talking about using SQL injection to bypass authentication checks:

This time, the query will succeed, the engine will not throw an error, and (assuming there’s at least one row in the Sales table), the application will redirect to the customer page. This gives the attacker a crucial piece of information. He now knows that the query code actually is vulnerable to SQL injection, since it processed his injected SQL without failing. One more test query should give him everything he needs. He tries again with an injected clause that he knows will cause the query to return no records, something like AND ′1′ = ′2′.

This query succeeds but doesn’t return any rows (since “1” is never equal to “2”), and redirects to the home page. Now the attacker has everything he needs; it’s just a matter of time and patience.

While he may have been hoping for a little more data to work with, the mere fact that there is a difference in the response between a query that finds results and one that doesn’t is actually enough information for the attacker to be able to eventually extract the entire contents of the database. Since he knows what the results of both of these queries look like—a redirect to the customer history page if the query returns data, and a redirect to the home page if it doesn’t—he can now use this information to effectively “ask” the database a series of yes-or-no “questions.” The yes/no questions that will be most useful to the attacker will be ones he can ask against the INFORMATION_SCHEMA view.

With just yes/no answers, the attacker can’t ask the question “What are the names of all the tables in the database?” But he can ask the question “Is the first letter of the name of the first table in the database an ‘A’?”

If the result of this injection is that the application redirects to the customer history page, then the attacker knows the first letter of the first table is indeed an “A,” and he can move on to find the second letter. If the result is the home page, he tries again, this time checking whether the first letter of the first table name is a “B,” and so on. This technique of extracting the database contents one character at a time by asking yes/no questions is called blind SQL injection, and you can see this attack illustrated in Figure 7-5. Eventually, the attacker will be able to extract all of the table names from the database, and from there all of the column names and all of the row data.

Figure 7-5 An attacker extracts information from the application database by “asking” true/false “questions” using blind SQL injection attacks.

If a standard SQL injection attack is like playing MadLibs with the database engine, a blind SQL injection attack is more like playing “20 Questions,” or maybe more like “20 Million Questions.” It’s a tedious method, but effective, and there are tools that can help automate the process. The sqlninja tool mentioned earlier can perform automated blind SQL injection attacks, as can others such as Absinthe and sqlmap. The key takeaway from this should be that it’s important to disable detailed error messages, but this step alone is not enough to prevent SQL injection vulnerabilities. Eventually, you will need to address the root cause of the problem.

Solving the Problem: Validating Input

The root cause of SQL injection vulnerabilities is that an attacker can specify data (in this case, form field input values, URL querystring parameters, web service method parameter values, and so on) that is interpreted by the database engine as code (in this case, SQL syntax). In order to prevent the vulnerability, you need to ensure that the engine never treats user input as code. There are basically two ways to accomplish this, and a wise developer will use both. First, you can validate the user input to ensure that it doesn’t contain SQL syntax; and second, you can encode or escape the user input to ensure that data is always interpreted as data.

The first way that most people want to try to validate input is to check it for SQL syntax characters. This code example checks to see if the input value (in this case, the customer ID) contains any single quotes. If so, then the application blocks the request and returns an error:

Unfortunately, there are a couple of showstopper problems with this approach. The first is that it’s prone to false positives: it will block legitimate, nonmalicious input values. Any customers with apostrophes in their names, like John O’Malley, will be unable to use the system.

The second problem is that this type of validation method won’t have any effect on attack techniques that don’t rely on single quotes. When you write SQL commands, string and date field values must be wrapped in single quotes (for example, SELECT OrderID FROM Sales WHERE CustomerID = ′bryan′) but numeric fields must not be (for example, SELECT OrderID FROM Sales WHERE OrderCost > 1000). SQL injection attacks against numeric fields wouldn’t necessarily need single quotes, and the single-quote-checking code we just added would be ineffective in stopping these attacks.

Additionally, depending on when you check the input, an attacker may be able to slip an exploit payload past your validation code by encoding or escaping his attack. For example, he might apply URL encoding to his attack string, so that single-quote characters are converted to the value %27. If you try to validate users’ input before the application decodes it—for example, if you use a web application firewall (WAF) or other filter that works very early in the application’s request-handling lifecycle—then you may miss attacks like this.

Another variation of this kind of naïve defense is to try to prevent the OR 1=1 always-true-condition injection by checking the input value to see if it contains the string 1=1. The problem with this is that there are a lot of ways to write an always-true condition other than 1=1. An attacker could write 2=2 or 1<2 or 1<>2 or 1 IS NOT NULL or in fact an infinite number of other ways.

One validation method that is effective, although only for certain input types, is type conversion or casting. If a database column is a numeric or date type, you can improve security by converting its input values to the same type used in the database, rather than working with the values as strings.

If an attacker tries to inject any SQL into the input value, the conversion code will fail and the attack will be blocked. Unless someone comes up with a SQL injection payload that can be represented entirely as a decimal value—which is unlikely—this code is safe from SQL injection.

Regular Expressions

In general, validations based on blacklists (lists or patterns of known bad values) are less effective against attacks and less resilient to new attack variations than are validations based on whitelists (lists or patterns of known good values). We just saw this in our previous two examples: the single-quote checker (a blacklist filter) was prone to both false positives and false negatives, while the decimal conversion (a whitelist filter) worked correctly.

Note

Another good reason to avoid blacklists is that attack techniques are constantly changing. Attackers find new ways to break into applications, and if you’ve built a blacklist of attack patterns into your program, you’ll need to change its code to add new patterns whenever they’re discovered. Whitelists are usually much more resilient to new attacks. If a decimal value is good today, it’ll probably be good tomorrow too.

Type conversion validation can be very effective at blocking attacks, but it won’t work when the database type is a string or text type. A more powerful and flexible validation technique is required in these cases, and one way to do this is to test the inputs with regular expressions.

Regular expressions or regexes are essentially a kind of programming language, used to find complex patterns in strings. (An in-depth review of regexes or regex syntax is beyond the scope of this book; if you’re interested in learning more, the book Mastering Regular Expressions by Jeffrey Friedl is an excellent guide.)

Regexes are very powerful, but they’re also very complex. Even experienced regex developers sometimes have difficulty writing correct patterns. Because of this, a number of regex library sites have popped up online, where you can go to search for patterns or upload your own. Getting back to our example application, since we’re trying to test whether an input value could legitimately be a person’s name, we’ll go to www.regexlib.com, one of the most popular regex library sites, and search for “person’s name.” The top result is:

Into Action

Besides SQL injection, this regular expression pattern is also susceptible to a regular expression denial-of-service or ReDoS attack. An attacker could input a specially crafted value that would cause the regex engine to cycle through billions of iterations, consuming all of the server processor power and essentially hanging the process.

Whenever you add a regular expression pattern to your application, it’s a good idea to check that pattern for potential ReDoS. While this is fairly difficult to do just by manually inspecting the regex, there is a free downloadable tool from Microsoft called the SDL Regex Fuzzer that can test the pattern for you. You can see a screenshot of the SDL Regex Fuzzer in Figure 7-6.

Figure 7-6 The Microsoft SDL Regex Fuzzer tests regular expressions for denial-of-service vulnerabilities.

A quick test reveals that this pattern does match (that is, it would allow) names containing apostrophes, which is good. But another quick test reveals that it also matches on the SQL injection attack string X′ OR A IS NOT NULL. We’ve traded a false positive result for a false negative one, and that isn’t acceptable.

Solving the Problem: Escaping Input

Given all the problems we’ve looked at with single quotes and SQL injection attacks, at this point you might be wondering how it’s possible or even whether it’s possible at all to store single quotes in SQL databases. It is possible, and the way you do it is to escape single quotes into pairs of single quotes. For example, if you wanted to search for the customer O’Malley, the SQL query syntax would be:

A simple way to do this in your code is to replace all occurrences of single quotes with pairs of single quotes:

If an attacker tried to inject SQL into this query, the command text that would end up getting passed to the database would be something like:

The single-quote pairs all match up nicely, and the engine searches for a customer literally named ′ OR ′1′ = ′1, which won’t match any rows. The query will return no records, and the attack is successfully blocked.

If you think back to the cross-site scripting (XSS) attack we discussed in Chapter 6, you’ll remember that the way we ended up solving the problem was to encode the application’s output before it was sent to the victim’s browser. That way, any attack text like <script>alert(“xss”)</script> was just converted into harmless text, and the browser wouldn’t treat the attack as script and try to execute it. Exactly the same principle is at work here! By escaping the user input, we force the database engine to always treat it as text, and never as SQL code.

However, just as you had to encode more HTML characters than just less-than and greater-than to prevent XSS, you will have to escape more SQL characters than just single quotes in order to prevent SQL injection. SQL queries containing LIKE clauses can contain other special characters, such as percentage symbols and underscores, that act as wildcards. If your search term contains any of these symbols (for example, if you’re searching for the customer organization “100% Fun”), these will need to be escaped too. To do this, you can use the ESCAPE clause in your query to specify which character you would like to use as an escape character. For example, if you wanted to use the pipe character, you would write:

The default escape character is the backslash, and you can use this without having to explicitly specify an ESCAPE clause:

Tip

Some development frameworks have helper functions for escaping input. For example, if you’re using PHP to access a MySQL database, you can use the PHP method mysql_ real_escape_string to perform the appropriate escaping:

And in SQL Server, you can use the T-SQL command QUOTENAME to automatically escape single quotes, double quotes, or bracket characters.

The preferred way to escape input is to avoid ad-hoc SQL (combining SQL syntax strings with user input strings into one big string that’s passed to the database engine) altogether and instead to use parameterized queries. Parameterized queries use placeholder characters, usually question marks, to mark query parameters where variable input is expected. If we parameterized the customer search query we’ve been looking at, it would look like this:

We would then pass the CustomerID value we’re looking for as a separate input to the database query:

Notice that we didn’t need to perform any explicit escaping of the query parameter; the programming framework and the database engine took care of that for us. Also notice that we didn’t need to put single quotes around the question mark as we did when querying on the CustomerID field before. The database knows what the CustomerID field type is, and it can apply the appropriate formatting.

It’s important to note that using the parameterized query technique is not the same thing as using string replacement or string formatting commands to substitute values into a SQL syntax string:

This code is still completely vulnerable to SQL injection: string format, and replace functions don’t know anything about database types or correct SQL syntax escaping.

While parameterized queries work great for the majority of queries you’re likely to need, there are some edge cases where you can’t use them. The most common of these edge cases is when you need to use input variables to specify table or field names instead of just field values. For example, you can’t use parameterized queries to do this:

or this:

If your application needs to work this way, you’ll have to go back to ad-hoc SQL. But first take a good look at whether your application really does need to work this way. It’s dangerous to let users specify database objects like table and field names. Remember that we spent a lot of attention earlier on how to prevent detailed error messages containing this kind of information from being displayed to the user. Even if the object names aren’t directly displayed on the page—for example, if they’re hidden “value” attributes of combo box items—an attacker can still easily find them by clicking the View Source button in his browser.

Usually applications with this kind of behavior can be rewritten so that they select a predefined query hard-coded into the application. For example, if we want to display either all the customers in the database or all of the salespeople, we could do something like this:

Your Plan

SQL injection is serious business, but you shouldn’t be overwhelmed by the thought of defending against it. Follow these simple steps to keep attackers’ eyes and fingers out of your databases.

Ensure that only generic, nondescriptive error messages or HTTP 500 pages are displayed to users. Never give away database metadata like table names or application source code snippets in error messages.

Validate simple input types like credit card numbers or postal codes with regular expressions. If the input doesn’t match the expected value, return an error to the user (a generic, nondescriptive error!) and don’t execute the database query.

Always check to make sure that the input matches a good, valid pattern (for example, whitelist pattern matching) rather than whether it matches a bad, invalid pattern (for example, blacklist matching). Whitelist validation defenses are usually much more resilient to newly discovered attack techniques.

If the input is a date or numeric type, don’t work with it as a string. Cast or convert the user input from a string to the appropriate type.

Escaping user input is the single best way to prevent SQL injection. Use parameterized queries or stored procedures to escape SQL query parameters to safe values.

As a final alternative method to using parameterized queries in order to escape input, you can use stored procedures. If you use stored procedures the right way, you can actually make your application even more resilient to attack than if you’re using inline parameterized queries, but this topic is complex enough that it deserves its own section. We’ll cover stored procedures later in the chapter, but first we’ll discuss database permissions, since having an understanding of permissions is important toward understanding the added benefits of stored procedures.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Structured Query Language (SQL) Injection

Create new playlist

Sign In

Sign Up

Structured Query Language (SQL) Injection

SQL Injection Effects and Confidentiality-Integrity-Availability

The Dangers of Detailed Errors

Blind SQL Injection: No Errors Required

Solving the Problem: Validating Input

Regular Expressions

Solving the Problem: Escaping Input

Table of Contents for
Structured Query Language (SQL) Injection