Chapter 15. Stage 10: Security Response Planning

In this chapter:

  • Why Prepare to Respond?

  • Preparing to Respond

This chapter explains why you need to be prepared to respond to the discovery of security vulnerabilities in your software. Because this entire book is dedicated to telling you about a process to help you build secure software, it might seem strange that we also talk about how to respond when you fail to build secure software. So we’ll first explain why it’s important that you do just that.

Once we’ve discussed the need to prepare to respond to the discovery of vulnerabilities in your software, we’ll describe the preparations you should make during the software development phases. Early preparation for security response will save you from having to figure out your plan at a time when you need to be executing it.

Why Prepare to Respond?

As we write this chapter, we can hear you saying, “These guys have spent ten chapters telling me how to build secure software. Why are they saying now that I have to be prepared to respond to vulnerabilities? Won’t following all their instructions prevent me from having to deal with security response?”

We wish we could tell you that following all the guidance in this book to the letter will save you from the pain of responding when people find vulnerabilities in your software. But sadly, it’s just not so. This section tells you why.

Your Development Team Will Make Mistakes

It’s a fact of software development life that your team won’t achieve perfection. However, from our experience in applying the Security Development Lifecycle (SDL) at Microsoft, we’re confident that if you apply the techniques we describe in this book, you’ll produce software with many fewer vulnerabilities than you would if you only applied development best practices. No matter how well you apply common practices for producing reliable and bug-free software, paying attention to the errors that lead to security vulnerabilities and applying practices that eliminate or detect those errors will result in more secure code, fewer vulnerabilities, and less need for security response.

But your development team is still made up of human beings, and they will make mistakes, including mistakes with security impact. At some point during development, you’ll decide that the rate of vulnerability discovery is “low enough” and that it has become very difficult to discover remaining security bugs. When that point arrives, you’ll decide to ship your product. No matter how hard you and your team have tried, there will be a vulnerability or two (or maybe more) that your team should have found.

New Kinds of Vulnerabilities Will Appear

If you follow the guidance in this book, you’ll build software that is as secure as you can make it at the point in time when you’re doing development. You and your team will do your best, and that might be very good indeed. But we can guarantee that the security researchers will keep trying, and they will find a class of vulnerability that neither you nor we knew about, and one (or many) vulnerabilities in that class will affect your software.

Back in the 1970s and 1980s, one of us (Lipner) believed that it would be possible to apply highly structured formal specification and design methods along with formal verification of specifications and programs to produce software that would be substantially free of vulnerabilities. A few projects attempted to follow this path, but all failed. (Lipner led a project that came close to releasing an operating system intended to reach Class A1—the highest level—of the U.S. Trusted Computer Systems Evaluation Criteria, or Orange Book [Karger et al. 1991].) The obvious cause of the failures was that by the time a team had executed the highly structured development process, the system they were producing was obsolete and no one wanted to buy it. But even if those highly formal processes had been efficient enough to produce competitive products, we don’t believe they would have achieved their ambitious security goals. The reason is that new classes of vulnerabilities continue to be discovered, and those vulnerabilities almost always result from errors that are below the level of detail addressed by the formal methods. To quote Earl Boebert, a security researcher whose experience goes back to the early 1970s, “Security is in the weeds.” Not only do you need to get the specifications and designs right, but any error in the machine code that is actually running can undo you.

If you look at the history of vulnerability reports, you can convince yourself that new classes of vulnerabilities continue to emerge. Stack-based buffer overruns go back to the 1980s or before (Eichin and Rochlis 1989), and the authors put tremendous effort into removing them during the Microsoft Windows Server 2003 security push. But exploitable heap-based buffer overruns and integer overflow attacks on the length calculations that attempt to prevent buffer overruns are relatively new developments. For example, the ASN.1 network protocol–parsing component of Microsoft Windows was extensively scrubbed for buffer overrun vulnerabilities during the security push (CERT 2002). A researcher subsequently pointed out that the component was vulnerable to integer overflow attacks in the code that was supposed to prevent buffer overruns. Of course, our tools, training, and processes now focus on such vulnerabilities, but at the time of that particular security threat, we had to invoke our security response process and release Microsoft Security Bulletin MS04-007 to deal with vulnerabilities in a component we thought we had gotten right (Microsoft 2004a).
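
To make the failure mode concrete, here is a minimal C sketch of an integer overflow defeating a length check. The names, constants, and logic are ours for illustration; this is not the actual ASN.1 parsing code.

#include <stdio.h>

#define BUF_SIZE 256u

/* Vulnerable check: in 32-bit unsigned arithmetic, len1 + len2 can wrap
   around zero, so a huge pair of lengths passes even though the
   subsequent copy would overrun the buffer. */
static int lengths_ok_vulnerable(unsigned len1, unsigned len2)
{
    return len1 + len2 <= BUF_SIZE;
}

/* Fixed check: bound one operand, then test the other against the space
   that actually remains, so no addition can wrap. */
static int lengths_ok_fixed(unsigned len1, unsigned len2)
{
    return len1 <= BUF_SIZE && len2 <= BUF_SIZE - len1;
}

int main(void)
{
    /* 0xFFFFFFF0 + 0x20 wraps to 0x10 (16), which passes the naive test. */
    unsigned len1 = 0xFFFFFFF0u, len2 = 0x20u;

    printf("vulnerable check: %s\n",
           lengths_ok_vulnerable(len1, len2) ? "passes (overrun!)" : "rejects");
    printf("fixed check:      %s\n",
           lengths_ok_fixed(len1, len2) ? "passes" : "rejects");
    return 0;
}

The fixed check never computes a sum that can wrap: it bounds one operand first and then tests the other against the space that remains.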

Integer and heap overruns are far from the only examples of new kinds of vulnerabilities. In early 2000, researchers at Microsoft discovered cross-site scripting attacks, in which a coding error in a trusted Web site can allow a malicious site to act on a client with the privileges of the trusted site (Microsoft 2000a). We remember a Web security expert telling us that everyone knew about the class of attack at issue—and we also remember that the site for this expert’s organization was itself vulnerable to cross-site scripting attacks. In the subsequent six years, the discovery of cross-site scripting vulnerabilities on Web sites—and the discovery of new variations of cross-site scripting vulnerabilities such as HTTP response splitting—has become a common occurrence (Microsoft 2005).

A final example of new kinds of vulnerabilities concerns cryptography. One of us (Howard) got some press coverage during 2005 (eWeek 2005) when he remarked that Microsoft was pursuing a campaign to remove from its products some older encryption algorithms (including the DES, RC2, and RC4 symmetric encryption algorithms and the MD4, MD5, and SHA-1 hash algorithms). The reason for the removal is simple: cryptanalytic research has advanced, and in a few years, customers won’t be able to trust their sensitive data to the protection afforded by those algorithms. Removing an encryption algorithm is a process with a long lead time—you have to consider compatibility with older systems, data formats, and protocols—so it was important to start early and make a concerted effort before a crisis ensued. One aspect of removal was to consider how our security response process would react if there was a catastrophic break of one of the suspect algorithms.

Rules Will Change

In the years since Microsoft established its security response process in 1997, we have learned, to our occasional regret, that security is often about user and public perception, not just about technical reality. Although you’d think that you’d be “done” if you simply fixed code vulnerabilities that could lead to spoofing, tampering, repudiation, information disclosure, denial of service, and elevation of privilege, the way things work in practice is much more complicated. Five years ago, if a piece of malicious software exploited a vulnerability for which Microsoft had issued a security update, our practice was to tell customers to apply the update in terms that left many hearing the message that any attack or damage was their problem. If a worm or virus exploited a legitimate product feature after the user installed or executed the virus code, our answer was in essence, “You should not have run that malicious code.”

Today, we take a much more expansive view. We mobilize our security response process when viruses and worms appear: when the malicious code misleads gullible users rather than exploiting a vulnerability, sometimes when vulnerabilities or attacks involve third-party (non-Microsoft) software, and especially when the malicious code exploits a vulnerability for which we’ve already issued an update. In the last case, we try to ensure that customers do apply the update, and we often release cleaning or removal tools to help fix the damage done by malicious code. Furthermore, we’ve frequently made product changes to restrict the ways in which malicious programs that users have been tricked into running can exploit legitimate product features. (The first such product change was the Microsoft Office Outlook E-Mail Security Update, which blocks the delivery of executable attachments to e-mail messages [Microsoft 2000b].)

A final factor that has made security response more important is criminals’ growing interest in software and the Internet. When the Microsoft Security Response Center (MSRC) was created in 1997, viruses, worms, and malicious code were primarily a form of Internet vandalism—a nuisance that disrupted the use of widely deployed software. Today, there are frequent reports of individual criminals or organized crime releasing malicious code as part of various money-making schemes (Naraine 2004). As the security of Microsoft software continues to improve, these criminals are likely to target other vendors’ products rather than simply giving up. Like the famous bank robber Willie “the Actor” Sutton, they will “go where the money is” and where it’s easy to steal. If your software is used in important applications or to process sensitive data, it’s likely to be a target, and that means you’ll need to have a response process in place and ready to go.

The bottom line for your development team is that even if you get product security “right” with regard to eliminating exploitable vulnerabilities, you are likely to need a response process. The time to plan and organize your response process is long before you need to invoke it for the first time.

Preparing to Respond

There are actually two related components to preparing for security response. The first is establishing a security response process. At Microsoft, the MSRC deals with all externally discovered vulnerabilities in Microsoft products, no matter what the products’ functions or customer bases might be.

We could probably write another book on security response and the lessons learned from the operations of the MSRC. In this section, we’ll first summarize that unwritten book, providing an outline of the organization, issues, and functions of a security response center. We’ll outline the entire response process, addressing the functions of a response center proper and the related functions of the product development team that works with the response center to address newly discovered vulnerabilities.

The second component of preparing for security response is the responsibility of each product development team. In the next section, we’ll discuss the response process from the perspective of the development team, building on the overall discussion of the response process but emphasizing the preparatory steps that the product team must take to be ready when vulnerabilities are discovered and reported.

Building a Security Response Center

The role of a security response center is to coordinate your software organization’s response to newly discovered vulnerabilities in products that you have shipped to customers (or deployed through Web properties or released to your internal users). The response process proper begins with awareness of a newly discovered security vulnerability, extends through investigation of the implications of the vulnerability and development of a security update or patch, and culminates with release of the update and the associated communications to get users to apply the update in a timely fashion. Once the update has been released, the response process continues to monitor and assess the impact of the update—and the vulnerability it addresses—on users of the software. This monitoring function is important because the response process is also invoked in the case of emergencies such as the release of exploit code or a worm, virus, or Trojan horse (we refer to this menagerie as malware), whether the exploit code or malware is exploiting a newly discovered vulnerability or one for which an update is already available, or even exploiting user fallibility, where no vulnerability at all is involved.

We describe the response process by using the terminology that would normally be used by a software vendor, and we believe that essentially every software vendor should have a security response process. But user organizations such as banks, manufacturing companies, and airlines usually have large software development staffs and significant Internet presence (for e-commerce, marketing, communications, and other purposes), and that means that they might also need a security response process. If your organization builds software for its own use (especially to process customer data) or if you release software to your customers (as a product, a service such as e-commerce, or software embedded in another product such as a medical device), you might have thought about contingencies that could result from a security bug or malicious attack. A security response center helps you handle those contingencies when (not if) they occur and do it in a fast, organized, and effective way.

Which Vulnerabilities Will You Respond To?

We want to begin by emphasizing two considerations associated with the security response center and its functions:

  1. The security response center deals only with vulnerabilities in fielded software (this might include applications that are fielded on the Web to your customers or only to your in-house user community). If you become aware of a vulnerability in a product that’s still under development, you just fix it. The other chapters of this book deal with why and how you do that. Of course, if you become aware of a vulnerability in a fielded product and you’re also working on a new product or version that might be susceptible to the same vulnerability, you need to be sure that the vulnerability is removed from the new version before it’s shipped to customers.

  2. The security response process deals mainly with externally discovered vulnerabilities that are found by security researchers, customers, or malicious attackers outside of your organization. You have to fix those vulnerabilities promptly and get the fixes out to your customers before they can be harmed. In the case of vulnerabilities that are discovered by your engineering team, it’s almost always a much better practice to fix them in the next update, service pack, or new release than it is to release a special update or patch.

The second consideration here is worth additional discussion: if your development team finds a vulnerability that affects shipping products, why should you delay fixing that vulnerability until you have a service pack or new release ready to go? The answer is that the decision to delay is likely to keep your customers safer. If you release a fix for a security vulnerability in a patch or update, you will inevitably highlight the presence of the vulnerability. Security researchers have become very proficient at reverse engineering security updates and very prolific at releasing exploit code that can be used against systems that exhibit the vulnerability. In particular, the researchers are much faster at releasing exploit code than users are at applying the patches or updates. So your release of an update for a previously unknown vulnerability will give the vulnerability wide visibility and might lead to attacks against users who would have been safer if you had never released the update. Release in a service pack or new version is likely to obscure the security fix by mixing it with numerous other software changes that have nothing to do with security. This will make the reverse-engineering task much harder.

The previous paragraph might sound like an argument for “security through obscurity”—and it really is. One of the hard jobs that your response center and development team will have to handle is deciding how likely it is that a newly (internally) discovered vulnerability will be found outside of your organization and exploited. If the vulnerability is really bad (the potential impact is very serious, and opportunities for exploitation appear widespread and easy to find), and if the vulnerability seems likely to be discovered even if you don’t release an update, the vulnerability is a strong candidate for an exception to the rule; you should consider releasing a security update even though the vulnerability was discovered by your organization. If the vulnerability appears difficult to discover, the related security update is a strong candidate for inclusion in a later service pack. Your security response center and your development team will have to work together to evaluate the vulnerability’s difficulty of discovery. If an internally discovered vulnerability is very similar to one (or more!) that has been reported by external security researchers, it isn’t really an internal discovery, and you should almost certainly fix it in an update. In fact, your response process should already be finding and fixing vulnerabilities similar to those reported externally.

Note

Contrary to popular belief, software vendors commonly roll up security fixes—including fixes for internally discovered vulnerabilities—into big updates, or “dot” releases.

Where Do Vulnerability Reports Come From?

In the previous sections, we refer many times to security researchers. You might wonder who these security researchers are, what they are doing, and why they do it. There’s no single answer to those questions, but the following list gives you some examples from the experience of MSRC:

  • Security product vendors (especially suppliers of intrusion-detection and vulnerability-assessment products) have research departments that discover software vulnerabilities and then update their products to report vulnerable software versions or attempts to exploit the vulnerabilities.

  • Independent security consultants (who might be self-employed or work for consulting firms) conduct vulnerability research as a way of establishing their credibility and competence with potential clients. Some consultants also sell vulnerability information to user organizations to help them protect their systems and networks from attack.

  • Students of computer science and computer security conduct vulnerability research to improve their knowledge of software and security.

  • Various malicious parties conduct vulnerability research to find ways to attack computer users. These malicious parties are rumored to range from individuals with criminal intent (or individuals who want to sell vulnerability information to those with criminal intent) to organized crime rings bent on committing financial fraud to national governments seeking to steal secrets or disrupt systems or networks.

A security response center must interact with security researchers to protect users. One of the most important functions of the response center is to encourage responsible disclosure. Responsible disclosure refers to the practice whereby the finder of a vulnerability reports to the software developer and allows the developer to develop and release an update or patch before publicizing the details of the vulnerability. The developer’s security response center keeps the researcher apprised of the status of the response and update-development process and acknowledges the researcher’s cooperation when the update is released. (We’ll discuss this aspect of the process later in this chapter in “Managing the security researcher relationship.”) Responsible disclosure is important to the success of the response process because it minimizes the period of time when a vulnerability is known to potential attackers while users of the software are without a practical way to protect themselves from exploitation of the vulnerability. Research from Forrester (an IT analyst firm) has documented a measure of “days of risk” to quantify the benefits of an effective security response process and of responsible disclosure of vulnerabilities (Koetzle et al. 2004).

Security Response Process

The security response process integrates steps executed by the security response center with steps executed by the development team. The following section provides an overview of both sets of steps and how they fit into the overall process as well as brief descriptions of each of the steps. Figure 15-1 presents an overview of the flow of the response process.

Figure 15-1. Overview of the security response process.

At a high level, the process flow is broken into two parallel tracks. On the first track, the security response center focuses on the vulnerability report, communication with the security researcher, and managing the process from report to update release. The elements of this track are shown along the upper row of boxes in the figure. Along the second track, the development team and the specialized security team focus on the technical details of the vulnerability, the fix that remedies it, and the design or implementation errors that led to the vulnerability. The elements of this track are shown along the lower row of boxes in the figure.

The following sections summarize the security response activities associated with each element of each track. As we mentioned previously, it would be easy to produce an entire book that focuses solely on the response process. But in the book you are reading, we’ve limited ourselves to describing the overall response process and some key details of each component in hopes of giving developers of new response processes sufficient information to get started on the right path. If you are building a response process, the information in this section will tell you what issues to focus on and what steps your new response center should expect to follow as it receives its first vulnerability reports.

Vulnerability reporting

The response process begins with receipt of a vulnerability report. Create and publicize a point of contact that security researchers can use to initiate communication with your security response center. Microsoft has accepted reports sent to secure@microsoft.com for more than eight years. You might wish to use another e-mail alias such as security or vulnerability-report for your response center. Regardless of your choice, it’s important to make it widely known and to not change it.

Note

If you believe you have found a security bug in any Microsoft product, please send an e-mail message to MSRC at secure@microsoft.com.

The response center must monitor the reporting alias and respond directly (with more than just a canned acknowledgement) to every e-mail message that is intended as a vulnerability report and isn’t clearly spam or traffic on a mailing list. The response center should also monitor security mailing lists, such as BugTraq (SecurityFocus 2005) and Full-Disclosure (Cartwright 2002), for list traffic that might be irresponsibly disclosing a vulnerability report. Such reports increase the likelihood that a vulnerability will be exploited before your teams have developed a way for customers to protect themselves, so it’s important to detect these reports quickly and act at once.

Finally, the response center needs to have a relationship with your organization’s customer-support or field-service teams so that new vulnerabilities (or exploitations of vulnerabilities) reported by customers will find their way promptly into the response process. We recommend that the response center be staffed with enough “duty officers” that any report can be evaluated and acted on within a day (24 hours) of receipt, weekends and holidays included. Organizations with products that appear very unlikely to be the subject of vulnerability reports might get by with a response center that operates only during the workweek, but you accept a degree of risk (for your organization and its customers) by accepting such a limitation.

When a duty officer receives a report that isn’t obviously spam or a customer question unrelated to security, she begins the response process. This will surely involve sending a personal reply to the security researcher who sent the report, perhaps with a request for more detail or clarification. If the duty officer believes that the report might be a real vulnerability (not a known issue or a nonissue), she also opens a bug in the response-center tracking database and assigns it for investigation to the product team responsible for the product in question.
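
To make the hand-off concrete, the following hypothetical C sketch shows the kind of fields such a tracking-database record might carry; none of these names or fields come from the MSRC’s actual systems.

#include <stdio.h>
#include <string.h>
#include <time.h>

typedef enum { STATUS_NEW, STATUS_TRIAGE, STATUS_FIX, STATUS_RELEASED } case_status_t;

typedef struct {
    int           case_id;           /* tracking-database bug number        */
    time_t        received;          /* when the report arrived             */
    char          reporter[64];      /* researcher's name or e-mail alias   */
    char          product_team[64];  /* team assigned to investigate        */
    int           publicly_known;    /* nonzero if already on a public list */
    case_status_t status;
} vuln_case_t;

int main(void)
{
    vuln_case_t c = { 0 };

    c.case_id  = 1001;
    c.received = time(NULL);
    strcpy(c.reporter, "researcher@example.com");
    strcpy(c.product_team, "HypotheticalServer");
    c.status   = STATUS_TRIAGE;

    printf("case %d opened and assigned to %s\n", c.case_id, c.product_team);
    return 0;
}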

Triaging

Vulnerability reports come in all varieties. At one extreme is the report that provides a code fragment, Web page content, or HTTP request that causes code of the researcher’s choosing to run on a vulnerable system. The report and the impact of the vulnerability are clear at once. At the other extreme is a vague hint of something that might be a vulnerability, but there are few or no specifics either because the researcher doesn’t want to share full details (perhaps she wants to make the response center “work” so that the development organization will earn access to the details of the vulnerability) or because the researcher herself didn’t fully understand what was happening. Unfortunately, reports of the latter kind are not uncommon, and the fact that the researcher hasn’t worked out all the details definitely does not mean that the vulnerability can be ignored.

Important

All incoming security vulnerabilities must be triaged.

Triage is the process of finding out enough about the reported vulnerability to assess its potential impact. The response team must reproduce the vulnerability as reported and understand what a malicious attacker might do with it. It’s important to gain a full understanding of the vulnerability as reported and to find out all of its implications. The security researcher might have understood that she was reporting a vulnerability that could be used to cause a denial of service—to crash the application or the underlying operating system—whereas a more sophisticated exploit could run hostile code and allow an attacker to “own” the system. The triage team must discover the full implications of the report.

In our experience, the best organization to conduct triage combines security experts with experts in the product that is the subject of the vulnerability report. At Microsoft, the security experts are referred to as the Secure Windows Initiative Attack Team (SWIAT), and they’re one component of the Secure Windows Initiative (SWI) team that manages and executes the SDL. The product experts come from the product group that built the product that is the subject of the vulnerability report; we’ll talk more about their role later in this chapter in “Create Your Response Team” when we discuss product-group responsibilities for security response.

The product of the triage element of the process is an assessment that covers:

  • The validity of the security researcher’s report. Is the report describing a vulnerability at all?

  • The severity of the reported vulnerability. Assuming that the report is valid, what kind of impact could the vulnerability have on customers’ systems if exploited? Are there mitigating factors that would reduce the likelihood of successful exploit in the field? Microsoft’s triage process is based on the MSRC Security Bulletin Rating System as described at http://www.microsoft.com/technet/security/rating.mspx.

  • Any other factors that would either mitigate or amplify the need to respond to the vulnerability or the urgency of that need. For example, if the vulnerability is reported to a public e-mail list rather than directly to the response center, the potential for immediate exploitation amplifies the urgency of a response. Similarly, if the vulnerability is reported by a customer who is not pressing for an update and the vulnerability is otherwise unlikely to be discovered, the urgency of response is reduced, and it might well be appropriate to consider releasing a fix in a service pack.

When triage is complete, the response center must have a working plan for responding to the vulnerability. You should know whether an update will be released, and with what urgency, as well as any other plans, such as releasing public information about mitigations and workarounds for the vulnerability in advance of the update.
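
As a toy illustration of how the triage assessment might combine the factors listed above into a rating, consider the following C sketch. The severity names mirror the public MSRC bulletin rating levels, but the decision logic is our simplification, not Microsoft’s actual process.

#include <stdio.h>

typedef enum { SEV_LOW, SEV_MODERATE, SEV_IMPORTANT, SEV_CRITICAL } severity_t;

typedef struct {
    int is_valid;           /* does the report describe a real vulnerability? */
    int remote_code_exec;   /* could exploitation run attacker-chosen code?   */
    int needs_user_action;  /* mitigating factor: attack requires user action */
    int publicly_disclosed; /* amplifier: already posted to a public list     */
} triage_t;

/* Toy rating logic: remote code execution rates Critical; requiring user
   interaction is a mitigating factor that lowers the rating one level. */
static severity_t rate(const triage_t *t)
{
    if (!t->is_valid)
        return SEV_LOW;
    if (t->remote_code_exec)
        return t->needs_user_action ? SEV_IMPORTANT : SEV_CRITICAL;
    return SEV_MODERATE;
}

int main(void)
{
    static const char *names[] = { "Low", "Moderate", "Important", "Critical" };
    triage_t report = { 1, 1, 0, 1 };  /* valid, remote code, no user action, public */

    printf("severity: %s%s\n", names[rate(&report)],
           report.publicly_disclosed ? " (urgent: already public)" : "");
    return 0;
}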

Creating the fix

The product team works alongside the response center team in the triage of each new vulnerability report. Once triage is complete, the product team owns responsibility for developing any fix that is required to address the vulnerability. This is true whether the fix will be released in a security update or patch or in a service pack. Obviously, timing and packaging considerations differ in these two cases, but many important elements are common, and this section will discuss them.

Any fix for a reported security vulnerability has three critical aspects:

  1. It must eliminate the vulnerability that was reported.

  2. It must eliminate any related vulnerabilities. A related vulnerability might result either from repeating the same mistake that caused the reported vulnerability in similar code or from an underlying design flaw that leads to a pattern of vulnerabilities.

  3. It must not unnecessarily “break” legitimate functions of the code that contained the vulnerability. We refer to such breakage as “causing a regression.” Much of the testing element of the response process focuses on eliminating regressions, but building a regression-free fix is fundamentally a part of fix development rather than of testing. As is widely understood in software engineering, it is not possible to test quality into the end product.

Eliminating the vulnerability as reported sounds relatively simple, and it often is: you just find the code that fails to test for valid input, and you add (or correct) the test as needed. But it can be easy to make a very fundamental mistake in designing a fix. We’ve seen security updates (including updates released by Microsoft) that insert a test for valid input on a path leading to the vulnerable code instead of fixing the vulnerable code itself.

Figure 15-2 sketches an example of how not to fix vulnerabilities, and Figure 15-3 sketches an example of a proper fix, in which the required test is added to the underlying vulnerable code. Similar considerations apply to client-server applications or components—if the client component or code is controlled by an untrusted user, it’s vital that the security check be made in the server component.

Figure 15-2. Partial fix for an underlying vulnerability.

Figure 15-3. Correct fix for an underlying vulnerability.
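
In code, the difference between the two figures looks something like the following C sketch (all names are hypothetical):

#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 64
static char g_name[NAME_MAX_LEN];

/* The anti-pattern of Figure 15-2: the vulnerable routine stays unchecked,
   and the update adds the test on one known path into it. Any other caller
   of set_name_unchecked() can still overrun g_name. */
static void set_name_unchecked(const char *src)
{
    strcpy(g_name, src);
}

static void handle_request_partial_fix(const char *src)
{
    if (strlen(src) < NAME_MAX_LEN)   /* guards this one path only */
        set_name_unchecked(src);
}

/* The correct fix of Figure 15-3: the test moves into the vulnerable code
   itself, so every caller, present and future, is protected. */
static int set_name_fixed(const char *src)
{
    if (strlen(src) >= NAME_MAX_LEN)
        return -1;
    strcpy(g_name, src);
    return 0;
}

int main(void)
{
    handle_request_partial_fix("alice");
    if (set_name_fixed("bob") == 0)
        printf("name set to %s\n", g_name);
    return 0;
}

The same reasoning is what puts the check on the server side of a client-server interface: the server routine, like set_name_fixed here, cannot trust that its callers have validated anything.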

The quest for related vulnerabilities is motivated by the fact that security researchers often look for vulnerabilities similar to one that has just been fixed. There is nothing more frustrating to response center staff than to issue one update and immediately receive a report of a new vulnerability that looks just like the last one, in the same code. It’s a point of pride to MSRC and the associated product teams that in recent years they have become much more effective at finding and eliminating such related vulnerabilities.

Other factors that come into play in the response process can make the search for related vulnerabilities especially challenging. If the circumstances surrounding the vulnerability report suggest that a fix is urgent—for example, because a highly exploitable vulnerability has been made public—there might be insufficient time to do a thorough search for related vulnerabilities. If an initial review suggests that related vulnerabilities are likely to be present, the response team and product team will probably need to release multiple updates, including an immediate update to address the reported vulnerability and its most similar neighbors and a subsequent update to address the related vulnerabilities identified by a thorough search.

For an illustration of the search for related vulnerabilities, consider Microsoft Security Bulletins MS03-026, MS03-039, and MS04-012 (Microsoft 2003a, Microsoft 2003b, Microsoft 2004b), which were issued in response to an initial report of a vulnerability in the RPC/DCOM component of Windows and subsequent reports that were received after the initial update (MS03-026) was released. MSRC determined that it was important to release the initial update quickly, but a preliminary review of the affected code indicated that multiple additional vulnerabilities were present and likely to be found once security researchers saw the initial bulletin and update. So Microsoft initiated a process that involved releasing the fixes for the most urgent vulnerabilities while conducting a major review of the affected Windows components. The process culminated with the release of Microsoft Windows XP SP2 and Microsoft Windows Server 2003 SP1, in which remote anonymous access to the affected component was blocked by default, significantly reducing the attack surface to complement code-level changes that resulted from a very thorough review of the RPC/DCOM components.

Note

Requiring authenticated RPC/DCOM by default protected Windows XP SP2 users from the Zotob worm.

Security fixes and regressions

The development of security fixes that do not cause regressions for legitimate users is both important and challenging. There is no single secret to the successful avoidance of regressions, but one step that Microsoft has taken in the quest for regression-free security fixes is to minimize the set of changes included in a security update or patch. Microsoft often supplies individual users (especially corporate users who have complex internal computing environments) with a non-security fix—often called a QFE (Quick-Fix Engineering)—to resolve problems with specific applications or peripheral devices. Historically, when we released a security fix for a component that had been the target of one or more QFEs, we included the QFEs as well as the security fix in the update. With the release of Windows Server 2003 and Windows XP SP2, we changed the operation of the Windows installer and the packaging of security fixes so that users who had not installed any QFEs received only the security fix when they installed a security update. Users who had installed any QFE for the component being updated received all of the QFEs as well as the security update. This change in security update packaging has reduced the rate of regressions caused by security updates, especially for corporate customers. You’ll find more information on security updates and regressions in the "Testing" section later in this chapter.
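
The packaging rule just described reduces to a single branch at installation time, sketched here in C; component_has_qfe_installed is a hypothetical stand-in for the installer’s actual inspection of installed file versions.

#include <stdio.h>

typedef enum { PACKAGE_SECURITY_ONLY, PACKAGE_SECURITY_PLUS_QFES } package_t;

/* Hypothetical probe: the real installer inspects installed file versions
   to decide whether any QFE for the component is present. */
static int component_has_qfe_installed(void)
{
    return 0;
}

int main(void)
{
    /* The packaging rule described above: machines with no QFEs get only
       the security fix; machines already running a QFE for the component
       get the security fix plus all prior QFEs, so no earlier fix is lost. */
    package_t pick = component_has_qfe_installed()
                         ? PACKAGE_SECURITY_PLUS_QFES
                         : PACKAGE_SECURITY_ONLY;

    printf("installing: %s\n",
           pick == PACKAGE_SECURITY_PLUS_QFES
               ? "security fix + existing QFEs"
               : "security fix only");
    return 0;
}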

Security fixes for multiple product versions and locales

One final aspect of update development concerns product versions and localization. A single vulnerability might affect multiple product versions (for example, Windows XP and Windows Server 2003). If it does, update development and testing must be synchronized so that customers using all affected versions can be protected at the same time. This practice mitigates the risk of an attacker reverse engineering the update for one software version and then exploiting the vulnerability in other versions for which the update has not yet been released. For similar reasons, if the software is available in multiple language versions (Microsoft Windows is available in 28 languages, and Microsoft Office in 35), the update must be released for all versions at the same time. It would be unseemly for the French or German language versions of a product to remain vulnerable because an attacker had reverse engineered an update released only in English.

Managing the security researcher relationship

Security response is not just about accepting vulnerability reports and issuing updates and the associated security bulletins. Rather, there is a long-term aspect to security response that involves building relationships of trust and confidence with the security researchers who find and report vulnerabilities. Such a relationship is important to the vendor because it tends to develop the conditions that allow the response center to do its job well and minimize customers’ exposure to vulnerabilities for which no fix has been released.

From the response center’s perspective, the best scenario is one in which researchers practice responsible disclosure—keeping their findings private until the response center has issued an update. In the best case, researchers also understand the response process well enough to see that a long time interval between report and update release is not a matter of the response center ignoring the vulnerability report or the researcher. Rather, the long interval represents the time required for the response center and product team to search for related vulnerabilities, fix them all, and release a fully tested update that has minimal regression potential.

The response center has numerous techniques at its disposal to help manage the researcher relationship effectively. The first is simple communication: it’s important to keep the researcher apprised of the status of her vulnerability report (weekly updates are the norm) and to do so in a human and personal way. The MSRC used to avoid identifying the duty officer responsible for an individual report. This practice caused one researcher to conclude that the MSRC duty officer was actually a robot or artificial-intelligence program. Today, MSRC goes out of its way to identify duty officers and encourage researchers to establish a personal connection with “their” duty officers. Because researchers tend to specialize (in the browser, a spreadsheet application, or a database system), building a personal relationship between researcher and duty officer is often consistent with the efficiency of the response process because the duty officer can also be paired with one product team.

It’s easy for the response center to fall into the trap of believing that security researchers are a hostile camp bent on criticizing the products that the response center is supporting, and on putting users at risk by exposing product vulnerabilities. For example, one of our colleagues in the industry has been quoted as saying, “Most [vendors] don’t need threats to [fix reported vulnerabilities], and some researchers have become the problem.” We refer to this attitude as a trap because a response center that takes such an attitude will inevitably make researchers into adversaries who will not practice responsible disclosure or cooperate with other aspects of the response process. In contrast, the response center that assumes security researchers share the developer’s goal of making products more secure and protecting customers is likely to build a cooperative relationship with security researchers and wind up encouraging behaviors that benefit researchers, developers, and customers.

Since the late 1990s, almost all response centers have acknowledged the contribution of security researchers in the security bulletins they issue when an update is released. Such acknowledgements constitute a basic component of the cooperation between researchers and the response center. Some researchers have used public acknowledgements as indications of the quality of their work and have built healthy security research and consulting practices out of their success as researchers and vulnerability finders.

Response centers might well have options beyond acknowledgements to build more cooperative relations with researchers. Examples include offering organizations that conduct security research membership in partner programs, giving researchers early access to beta software, and offering internships or college recommendations. (MSRC has worked extensively with a very capable security researcher who is still in high school as this chapter is being written.) MSRC has also sponsored community-building events for security researchers and invited researchers with established track records to speak at in-house Microsoft security training conferences (ZDNet 2006).

Beyond keeping the security researcher apprised of the status of her report and the schedule for an update or patch, the response center might also wish to give the researcher an early copy of the update for testing and allow her to review and comment on the draft of the security bulletin. Of course, both of these options require a significant level of trust between response center and researcher, and neither is appropriate for the first report from a previously unknown researcher. But they are options for building the researcher relationship, and the response center should bear them in mind as its relationship with an individual researcher evolves over time.

Testing

Over the last 10 years, we’ve seen a steady reduction in the time interval between our release of a security update and the release of exploit code that shows how to take advantage of the vulnerability or even attack code that exploits the vulnerability for criminal purposes such as stealing customer information or launching distributed denial of service attacks. As a result, our advice to customers is that they apply the most critical security updates immediately, without taking a long time to test the updates for regressions or compatibility problems. We could not give such advice unless we were confident that our security updates wouldn’t cause such problems, and testing is one source of that confidence. (The quality of the security fix development itself is the other source.)

Security update testing aims to accomplish two purposes:

  1. To verify that the update in fact addresses the reported vulnerability and any related vulnerabilities

  2. To attempt to verify that the update will not cause regressions when users install it

Testing to verify that the update addresses the vulnerability involves more than simply trying any demonstration code that the researcher supplied to see if it still exhibits the vulnerability. The test team must review the source code for the affected component and then try variations to ensure that the fix addresses the underlying vulnerability rather than simply blocking one path to its exploitation. (See Figure 15-2 and Figure 15-3 and the "Creating the fix" section earlier in this chapter.) The test team members must also apply their own security research skills to see that no less-obvious variations of the reported vulnerability remain. At Microsoft, the function of verifying security fixes before they are released is performed by SWIAT. In addition to applying their own skills and experience at security vulnerability research, SWIAT members stay aware of external trends in security research, including vulnerability reports against non-Microsoft products. They apply this knowledge as they test each new security update. This last fact is important; in numerous cases, we have found and fixed bugs in Microsoft products before the products were shipped by analyzing competitors’ security bugs.

Although testing to ensure that the security update eliminates reported vulnerabilities is a practice unique to security, testing to detect and eliminate regressions involves more standard testing practices. Security update testing begins with execution of the regression test suites for the component being updated. It includes testing with common applications as well as with test deployments to users inside and outside of Microsoft. No user is allowed to deploy the update operationally until it is finally released and available to all customers (to ensure that all customers are protected equally). Testing by external users engages large corporate customers who are not informed of the specifics of the vulnerabilities addressed by the updates they are testing. These customers commit to special agreements to provide feedback on updates and to maintain the confidentiality of the updates they receive (because disclosure of an update could result in its being reverse engineered and exploited before Microsoft is able to release the final update and protect customers). The customers serve as proxies for other customers in their “vertical” industry segment who are likely to have similar line-of-business applications. The customer testing program has proven valuable in identifying regressions that might otherwise affect corporate line-of-business applications and in giving corporate customers confidence that they can deploy updates without unacceptable risk to the continued functioning of their applications. Although Microsoft’s in-house testing against common packaged applications has proven effective at detecting potential regressions, it’s difficult to anticipate all the ways in which corporate IT departments have coded the applications they develop, and it’s impossible to gain access to all (or even most) such applications for testing. The customer testing program is the best way we have found to detect and eliminate potential regressions in such applications.
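
Both testing goals can be seen in miniature in the following C sketch, which exercises a hypothetical patched routine: the exploit input and its boundary variations must now be rejected, and legitimate input must still succeed.

#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FIELD_MAX 16

/* Hypothetical patched routine under test. */
static int parse_field(char *dst, const char *src)
{
    if (strlen(src) >= FIELD_MAX)
        return -1;                 /* the security fix: reject oversized input */
    strcpy(dst, src);
    return 0;
}

int main(void)
{
    char buf[FIELD_MAX];

    /* Goal 1: the researcher's exploit input is rejected, and so are
       variations of it, including the boundary-length case. */
    assert(parse_field(buf, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA") == -1);
    assert(parse_field(buf, "0123456789ABCDEF") == -1); /* exactly FIELD_MAX */

    /* Goal 2: no regression; legitimate input still succeeds unchanged. */
    assert(parse_field(buf, "hello") == 0);
    assert(strcmp(buf, "hello") == 0);

    printf("all update tests passed\n");
    return 0;
}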

Content creation

The output of the response process goes beyond the security update to encompass content that provides information and guidance to customers using the affected software. MSRC produces content directed to IT professionals who work in enterprise IT departments and separate content for end users (primarily consumers who use Microsoft products at home). The end-user content is not detailed; it usually does little more than advise users that a vulnerability has been found and addressed and that they should install the update that Microsoft has released. The rationale for providing only this level of content is that most end users are unconcerned with the technical details of a security update and only want to be protected. The best and simplest way for them to be protected is to install the update. As more and more users enable the Automatic Update feature of Windows that installs new security updates without user intervention, even this content has become less relevant. However, it’s important to make information available to end users who might have heard about a vulnerability and want to know what has been done in response and to users who want information about the functioning of their systems. End users who want details of the vulnerability and of Microsoft’s response are referred to the content targeted at IT professionals.

Microsoft refers to content for IT professionals as security bulletins. Security bulletins must contain much more detail than end-user content about the vulnerability or vulnerabilities addressed by an update and the potential consequences of their exploitation. Where feasible, IT professional–oriented content should also tell system administrators about mitigations and workarounds. This information might allow administrators to determine that their particular configurations are not vulnerable to attack (even without installing the update) or tell them how they can prevent exploitation of a vulnerability without installing the update. Such information is important to organizations that need to schedule client or server downtime for update installation and that have an IT staff capable of analyzing their system environments and protecting their systems by taking administrative actions such as disabling system features or blocking network ports. At Microsoft, SWIAT develops information about mitigations and workarounds as part of the process of triaging the vulnerability report and searching for related vulnerabilities. MSRC produces the security bulletin for use by IT professionals.

Security advisories

In addition to security bulletins, Microsoft has established a practice of releasing security advisories in situations in which there is no security update. Security advisories are released when a circulating worm or virus is exploiting a vulnerability for which an update is already available or when a worm or virus is not exploiting any vulnerability. Security advisories are also released to convey information about mitigations and workarounds when information about a vulnerability becomes public before an update is ready for release.

Press outreach

One final aspect of content creation concerns preparation for press outreach. Vulnerabilities in the products of major software vendors such as Microsoft and Oracle can be newsworthy events, and the release of an update often triggers a round of coverage in the IT trade press and sometimes in the general press. MSRC prepares talking points and responses to anticipated press questions along with the other content associated with each update. MSRC personnel respond to press questions as needed when the update is released. In the case of updates that address especially serious vulnerabilities, MSRC reaches out to the press proactively to ensure wide dissemination of information about the vulnerability and its update and, thus, to encourage customers to deploy the update as rapidly as possible. Organizing for press response can help ensure that customers are not unduly confused or alarmed by the news associated with security vulnerabilities and that they get a clear picture of the risks that vulnerabilities pose and the appropriate actions to take in response.

Update release

The development and testing of a security update, the documentation of workarounds and mitigations, and the preparation of content all come together at the point of security update release. When all preparations have been completed, the updates are posted to a well-known Web site and made live for deployment through the various automatic updating facilities (Microsoft Windows Update, Automatic Update, Microsoft Update, Office Update). Security bulletins are posted to their own Web sites, and an e-mail notification (along with an RSS feed and MSN alert) is released to subscribers who have elected to be notified of the availability of new security bulletins. Microsoft’s customer support and sales organizations are also notified about the release of the update. They are directed to alert customers with whom Microsoft has a direct relationship that those customers should review the bulletin and consider installing the update or taking other action to protect their IT systems.

One important aspect of update release is to maximize predictability. Originally, Microsoft released security bulletins and updates whenever they were ready on the theory that this policy would protect customers as soon as possible. Although the theory was valid as far as it went, the practice had the effect of disrupting the operations of IT staffs. And because experience showed that the release of the update was really the event that started the race to reverse engineer the update and exploit the vulnerability, it was not clear that customers benefited from a release-when-ready policy. For those reasons, Microsoft led the industry in establishing the practice of releasing security updates on a predictable schedule, initially releasing updates weekly on Wednesdays and, in recent years, releasing on the second Tuesday of each month.

A second important aspect of update release is simultaneity. We alluded to this consideration in the "Security fixes for multiple product versions and locales" section earlier in this chapter. To the maximum extent possible, updates for all affected software versions and all language versions should be released at the same time. Furthermore, no customer should receive an update before any other. We’ve often discussed the latter policy with Chief Information Security Officers of major customers, many of whom believe that their organizations have a critical need to receive security updates or security bulletins before other customers. They make compelling cases, but on examination, it’s simply impossible to develop a consistent rationale for giving some customers access to updates before others—you find yourself on a slippery slope at whose bottom everyone receives the updates early. As we discussed previously, customers who receive the updates for testing are forbidden by the test agreement (and by their own best interest given that the updates they are testing are beta versions and might have unintended negative effects on customers’ systems) from putting them into production and are not informed of the specific vulnerabilities addressed by the updates. Microsoft carries this policy to the point of beginning the update process for its own systems at 10:00 Pacific Time on the second Tuesday of the month—the time when updates become available to customers. (However, we would almost certainly make an exception for the case of a vulnerability in the servers used to distribute updates and security bulletins because the loss of those servers would prevent not only Microsoft but also its customers from protecting themselves.)

Emergency situations have the potential to justify exceptions to our principles of predictability and simultaneity. Simply put, if a vulnerability is being exploited widely or poses a significant threat to the safety of customers and the Internet, the need for a speedy update can overwhelm the goals of releasing on a predictable schedule and of protecting all customers at once. We would be more reluctant to abandon simultaneity than predictability because it’s very hard to justify leaving some customers at risk while protecting others. (Fortunately, Microsoft’s development and packaging practices make it relatively simple to release updates for all product versions and languages at the same time.) One example of a decision to abandon predictability concerned the update released with Microsoft Security Bulletin MS06-001, “Vulnerability in Graphics Rendering Engine Could Allow Remote Code Execution” (Microsoft 2006). In that case, a vulnerability in the Windows Metafile Format (WMF) was discovered to be under active exploitation during the period between Christmas 2005 and New Year’s Day 2006. The MSRC team and the Windows team worked long days and nights through the New Year holiday weekend and into the following week to investigate the vulnerability, provide workaround information to customers, and build and test an update. Although the regular monthly release was planned for Tuesday, January 10, the MSRC determined that the severity of the vulnerability and the widespread customer concern would justify an out-of-band release as soon as the update had passed required testing. When that milestone was completed, the MSRC released the bulletin on Thursday, January 5, five days before the scheduled monthly release.

Once the bulletin and update are released, MSRC personnel initiate press outreach if warranted and respond to any press inquiries about the update. They also begin to monitor Internet activity for signs of the release of exploit code that would allow someone to attack customers or of worms, viruses, or other malware that exploit one of the vulnerabilities fixed by the update. Later in this chapter, the "Emergency Response Process" section discusses these situations in more detail.

Lessons learned

Although the urgent part of the response process concludes with the release of the security update and bulletin, one very important aspect remains. That is to ensure that security engineering practices, tools, testing, and training reflect the lessons to be learned from the vulnerability. At Microsoft, one staff member of the SWI team is responsible for conducting a root-cause analysis for every vulnerability fixed by a security update; for documenting the failures of design, coding, testing, training, and tools that allowed the vulnerability to make its way into the product; and for recommending changes that would prevent similar errors from occurring in the future. Updates to Microsoft’s static analysis tools, PREfix and PREfast, frequently result from the “lessons learned” process, and our security training classes (especially those taught by Michael Howard) are replete with samples of vulnerable code drawn from actual security vulnerabilities and the fixes that addressed them.

We’ve said throughout this book that absolute security isn’t achievable and that the only practical way to achieve more secure software for customers is to apply best practices and to learn from your mistakes. The “lessons learned” component of the security response process is key to learning from mistakes. It is absolutely vital that you not only recognize the specific causes and design or coding errors that lead to each security update but also use them as starting points for your own search for new kinds of vulnerabilities and ways to avoid them. In several cases at Microsoft, SWIAT investigations led to the identification of new classes of vulnerabilities related to but different from those reported by outside security researchers. The SWI team and product teams have then taken action to eliminate newly discovered vulnerabilities from product versions still under development. Most of these vulnerabilities have never been discovered by outside security researchers even though some examples remain in older product versions; if the vulnerabilities are discovered, customers who are using newer product versions that have been subject to the SDL are protected without any need to update their systems.

Emergency Response Process

The security response process described in the previous sections manages the “normal” security vulnerability cycle that begins with an external report of a vulnerability and culminates with the release of a security update and an update of development processes to reflect lessons learned. Although not exactly routine, this cycle has a relatively predictable flow and usually allows the product developer time to develop and test a security update and the associated communications.

There is an alternative vulnerability cycle that we at Microsoft refer to as the incident response or emergency response process. This cycle begins with some event—the irresponsible disclosure of a vulnerability or the launching of a worm, virus, or other piece of malware that might pose a significant and near-term threat to users of the affected software. At Microsoft, if the MSRC determines that the event in question could pose such a threat, it initiates what we refer to as the Software Security Incident Response Process (SSIRP).

The objective of the SSIRP is to mobilize Microsoft resources quickly to assess the potential threat and take action to minimize its impact on Microsoft customers. Each SSIRP incident is managed in a sequence of phases as shown in Figure 15-4. During the earlier phases, the process assembles a response team, identifies the scope and impact of the (real or potential) problem, and identifies a potential course of action toward its resolution. In the later phases, the process provides customers with information, tools, and updates as required to resolve the problem and reverse its impact to the extent feasible.

Figure 15-4. SSIRP flow.

The SSIRP is managed and executed by a cross-functional team of people drawn from MSRC, SWIAT, the customer support organization, and the Microsoft IT security organization. Each incident is assigned an emergency lead (overall manager for that incident), an engineering lead (focused on the technical aspects of the incident), and a communications lead (focused on customer impact and external communications). Engineers and managers from the product group (or groups) responsible for any affected product join the SSIRP team as required. The following sections describe the SSIRP’s phases.

Watch phase

The Watch phase begins immediately after MSRC or any other team recognizes an unusual event. MSRC often initiates the Watch phase, but other teams, including customer service and the Microsoft IT groups, might also initiate Watch. Outside parties, including security vendors, customers, the press, and CERTs or government agencies, might also provide reports that lead to the initiation of the Watch phase. The Watch phase is executed by a small group of “first responders” whose objective is limited to confirming that an incident is under way. Once confirmation is complete, the process moves to the next phase.

Alert and Mobilize phase

During the Alert and Mobilize phase, a full SSIRP team is assembled, and an emergency lead, engineering lead, and communications lead are designated. The product team (or teams) responsible for the affected product (or products) mobilize during this phase. They work with SWIAT to begin determining the technical realities underlying the incident. The customer service and communications teams evaluate the incident’s impact on customers and its visibility in the press. These two factors play a major role in evaluating the significance of an incident, along with the technical assessment of severity and potential impact. If an incident affects a large number of customers or affects customers in a major way, it is significant; if an incident attracts media attention, it is significant because customers will become concerned about the safety of their systems regardless of the realities of the threat. Technical considerations can also make an incident significant. For example, the irresponsible disclosure of a vulnerability that could be exploited to do significant harm to customers almost inevitably leads to a SSIRP mobilization because the exploit could occur before an update is available to protect customers. Because the release of security updates is regularly followed by reverse engineering of the updates, publication of exploit code, and release of malware that exploits a vulnerability fixed by an update, MSRC enters the Alert and Mobilize phase as a matter of course as part of the process of releasing updates on the second Tuesday of the month. This process ensures that the response and product teams are assembled and prepared to respond as quickly as possible if exploit code is released or a malicious attack is launched.

Assess and Stabilize phase

The objective of the Assess and Stabilize phase is to provide sufficient information and assistance to customers so that the threat of harm can be significantly mitigated. This objective implies that SWIAT and the product team must gain sufficient understanding of the incident to make a recommendation—either that customers apply an existing update or that they deploy some measure that mitigates the effects of a vulnerability or the potential for a successful attack. For many incidents, this sort of recommendation might be sufficient to keep the attack from causing significant harm if the attack is not damaging or widespread and if MSRC can alert customers to apply an existing update. Similarly, if an attack is not exploiting a vulnerability at all, a recommendation for user action might be sufficient.

If an incident does involve the exploitation of a new vulnerability for which no update has been developed, the identification and communication of mitigations and workarounds becomes critically important. The Assess and Stabilize phase aims to produce mitigations and workarounds as rapidly as possible and to disseminate them broadly to stop an attack or incident before it can cause significant harm. During both the Alert and Mobilize phase and the Assess and Stabilize phase, MSRC and SWIAT work with partners such as antivirus and intrusion-detection vendors to share information and to ensure that the partners provide updated signatures that can protect customers. This work with partners is especially vital when an attack is under way and no update is available or when the attack is not exploiting a vulnerability.

Resolve phase

The Resolve phase brings the incident to a close by releasing whatever tools, updates, or information is required to assist customers in recovering from the effects of an attack and protecting themselves from further attacks. If the incident involves a vulnerability for which no update is available, an update must be released before the Resolve phase can be closed. If the incident involves an attack that damages customers’ systems, customer support must have the information necessary to help customers recover to the maximum extent possible. Depending on the scope of the attack, a malicious-code cleaning tool might also be released.

Cooperation with antivirus and intrusion-detection vendors continues into the Resolve phase. Customer and press communication also continues until customers are aware of the workarounds, mitigations, updates, and tools released in response to the incident.

At the conclusion of the SSIRP for a given incident, the team conducts a postmortem to identify lessons learned and potential improvements for the SSIRP process. This postmortem goes beyond the normal security response “lessons learned” process because it covers the teams involved in customer recovery and a broader range of communications activities, in addition to covering a software vulnerability and the steps needed to prevent similar vulnerabilities in the future.

Security Response and the Development Team

In the previous section, we presented an overview of the functions and organization of a security response center, drawing heavily on our experience with the Microsoft Security Response Center and the way it handles both normal vulnerability reports and security incidents that range up to the seriousness of full-blown Internet emergencies. That overview referred to the role of the product team responsible for the affected product in each of the stages of response, from triage, through update development and testing, to release. This section focuses on the aspects of preparation for security response that our experience has shown to be necessary if a product team is to execute its part of the response process effectively. Two guiding principles run through this section: first, that the time to prepare for security response is before a vulnerability has been reported, and second (as we point out repeatedly in this book), that every team that ships software needs to be ready for security response.

Create Your Response Team

Our discussion of the security response process addressed at length the role of the response center as well as how the product team deals with a vulnerability report or security incident. That discussion assumed that the product team had people in place to execute their part of the response process and that the response center staff knew how to reach them. The times are long past when MSRC had to scramble to find the team responsible for a product or component after a vulnerability had been reported.

Today, the rule is simple: when you ship a piece of software, whether it’s a revenue product or a free release, you must identify the people who will respond to externally discovered vulnerabilities in that software and must provide their contact information to the response center. You should identify enough people that the response center can always find someone despite vacations and holidays. When someone in a response role leaves the product team, you must replace her. If the contact process breaks down, response center staff normally start contacting individuals higher in the management chain until they find someone who has the authority to get the response moving. At Microsoft, MSRC maintains emergency contact information for every product team’s management up to the vice president level as a backup for the contact lists of the people who are supposed to respond. All individuals on the contact lists provide information so that they can be contacted 24 hours a day and seven days a week.

The question of who plays the response-contact role is pretty well settled at Microsoft—almost all product teams designate program managers to drive their response process. The response program manager is expected to find testers to work with SWIAT to reproduce the vulnerability report. This program manager should also bring in the developers who are responsible for the offending code and can diagnose the root cause and make the fix as needed. Individual product groups organize the specifics of this process differently. Some have a dedicated sustained-engineering team with developers and testers who can build and test fixes, whereas others assign sustained-engineering program managers to coordinate the process but use developers and testers from the core development team. In every case, the response process can call on the developer who “owns” the code that exhibits the vulnerability to ensure clear understanding of the problem and the development of a correct fix.

Beyond merely creating the response team, you will need to be able to respond to vulnerabilities as long as the product is supported. For Microsoft products, this period is usually 10 years. As a result, we occasionally have to consider how we’ll support the security of a product long after we’ve stopped new development and reassigned most of the team originally responsible for the code. There’s no single solution to this problem, although the usual answer is that support responsibility goes to the team that develops the closest successor product. You’ll always have to be aware of continuing security response support, especially when you stop new development on a product or version.

Support Your Entire Product

The less-formal way of stating the requirement to support your entire product is to say, “If you ship it, you need to understand how to update it.” In large software organizations such as Microsoft, sharing and reuse of code and software components are common. (We’ve referred to those components as giblets, after the plastic bag of assorted innards that comes inside a frozen turkey.) The practice of reusing and sharing components has benefits for efficiency and consistency, but it does carry with it the risk that a vulnerability in giblet A will be manifest in product B, whose development team didn’t create giblet A and might not have the capability to update it.

The recommended response to this problem is simply, “Don’t do that.” If at all possible, rely on platform services that can be updated once as part of the operating system to protect users of all applications when a vulnerability is discovered. If you ship a component that was developed by another team, you must have a service-level agreement with the developing team so that they will respond when vulnerabilities are found and will work with you to develop, test, and package the necessary updates. If you can’t get such an agreement, take ownership of the source code and be prepared to respond on your own. If you can’t get ownership of the source code, develop your own component so you can support it correctly.

In the case of a widely shared component such as an image parser, class, component, or library, it’s especially important that the team that develops the component provide a plan for security response. Occasionally, after a vulnerability is reported to a team that shipped a component, the response center and the “shipping” team discover that the report actually affects a reused component. At that point, ideally, the response center brings in the team that developed the vulnerable component, and that team diagnoses the problem, develops a fix, and ensures that the fix is released by all of the “shipping” teams that have used the vulnerable component. Getting to this ideal state requires that the team that develops the component know which teams ship it and that there be a way to update the component wherever it’s used. Meeting the first requirement means that the developing team needs to have an authoritative list of “shipping” teams. Meeting the second requirement means that the installation and updating tools have to be robust enough to detect the vulnerable versions and apply the fix where needed. None of this is rocket science, but it can get very complicated in the case of a widely used component in which different “shipping” products might install different component versions and in which some of the teams involved fail to think about the need to update. At Microsoft, we’re still working to improve our processes in this area.
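
To illustrate the second requirement, here is a minimal sketch in C, using the Win32 file-version APIs, of how an updating tool might decide whether an installed copy of a shared component predates the first fixed build. The component path passed by the caller and the “first fixed” version number (6.0.2900.2180) are hypothetical placeholders, not taken from any real update.

/* Sketch: decide whether an installed component predates the fix.
 * Link with version.lib. */
#include <windows.h>
#include <stdlib.h>

/* Hypothetical "first fixed" version: 6.0.2900.2180. */
static const unsigned long long kFixedVersion =
    ((unsigned long long)((6UL << 16) | 0UL) << 32) |
    ((2900UL << 16) | 2180UL);

/* Returns 1 if the file exists and its version is below the fix. */
int NeedsUpdate(const wchar_t *path)
{
    DWORD handle = 0;
    DWORD size = GetFileVersionInfoSizeW(path, &handle);
    if (size == 0)
        return 0;   /* not installed (or unreadable); nothing to patch */

    BYTE *data = (BYTE *)malloc(size);
    if (data == NULL)
        return 0;

    VS_FIXEDFILEINFO *ffi = NULL;
    UINT len = 0;
    int stale = 0;
    if (GetFileVersionInfoW(path, handle, size, data) &&
        VerQueryValueW(data, L"\\", (LPVOID *)&ffi, &len) && ffi != NULL)
    {
        /* Pack major.minor.build.revision into one comparable value. */
        unsigned long long installed =
            ((unsigned long long)ffi->dwFileVersionMS << 32) |
            ffi->dwFileVersionLS;
        stale = (installed < kFixedVersion);
    }
    free(data);
    return stale;
}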

Support All Your Customers

It’s probably obvious that you’ll need to respond to vulnerabilities in all supported versions of your product, but we want to stress the point anyway. Our discussion of the response process emphasized the need to provide simultaneous updates for all supported versions, service packs, and local-language versions. Meeting this need means that your source control and testing systems need to be organized to produce and test the necessary updates for all supported versions (not just the most recent). Your customers do not want to be forced to upgrade to a new version, or even to a new service pack, to protect themselves from exploitation of a security vulnerability.

Support for local languages is another aspect of the development team’s role. Ideally, the local language support in your product will be designed well enough to allow you to build a single fix that applies to all language versions. However, some aspects of the fix or update might differ, depending on the language version. In that case, you should be prepared to do the necessary development and testing to release the fix for all languages at the same time. At Microsoft, our work to improve localization support for products has reduced the localization burden for updates so that only a few messages from the update installer package vary with local language. Furthermore, our processes now ensure that the localization work is completed rapidly enough so that updates for all supported languages ship at the same time. If you support an international market, it’s important to get the localization and internationalization support right in the first place because it will make security updating—as well as product enhancements in general—simpler for both your customers and your development teams.

To give you an idea of the level of effort required to support all your customers, we’ll cite the example of the Microsoft Internet Explorer browser component. At one time, before they could ship an update, the Internet Explorer team had to ensure the availability and testing of about 425 different packages, driven by the numbers of supported versions, operating system platforms, and local languages. Few products or technologies are as widely used or supported as Internet Explorer, but it’s still very important to consider the total number of versions you’ll need to support and to factor that number into your plans.
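
To see how such a number arises, consider a purely illustrative breakdown (not the actual Internet Explorer figures): 5 supported versions × 5 operating system platforms × 17 local languages would yield exactly 425 packages. Whatever the real factors were, it is the multiplication among the dimensions that drives the package count, which is why trimming any one dimension pays off across the whole matrix.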

Make Your Product Updatable

Once you’ve produced an update and released it, your customers aren’t protected until they’ve installed it. This section is about the work that the product team does to ensure that customers can actually install the update. At Microsoft, we’ve found that improvements to update deployment and installation have been one of the most significant factors in improving our security response processes—and our customers’ security. Even if your users are technology-savvy, they’ll benefit from easy update deployment and installation.

There is a wide range of techniques for installing security updates. The least effective technique is to ship your customers a package of updated product files and a readme file that tells them where on their system to copy the files. At the other extreme, you can build a tool into your product that detects the availability of a new update, copies it over the Web, and installs it for the customer with no manual intervention (assuming, of course, that the customer has consented to having his system updated in this manner). At Microsoft, we’ve sought to implement the latter approach. The reason is simple: more customers will actually deploy the updates, and fewer customers will be affected by malicious code and hostile attacks.
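
As a sketch of the latter approach, the basic client flow looks like the following in C. The helper functions are hypothetical stand-ins (stubbed here so the sketch compiles and runs), not a real update API; the point is the order of operations: detect, obtain consent, download, verify, and only then install.

/* Sketch of one pass of an automatic-update client. All helpers are
 * hypothetical stubs standing in for a real update service. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { int major, minor, build; } Version;

static bool CheckForUpdate(Version inst, Version *avail)
{
    /* Stub: pretend the server offers one newer build. */
    avail->major = inst.major;
    avail->minor = inst.minor;
    avail->build = inst.build + 1;
    return true;
}
static bool UserHasConsented(void) { return true; }  /* opt-in setting */
static bool DownloadPackage(Version v, const char *path)
{
    (void)v; (void)path; return true;                /* stub download */
}
static bool VerifyPackageSignature(const char *path)
{
    (void)path; return true;   /* see the signing discussion below */
}
static bool InstallPackage(const char *path)
{
    printf("installing %s\n", path);
    return true;
}

void UpdateOnce(Version installed)
{
    Version available;
    if (!CheckForUpdate(installed, &available))
        return;                        /* already current */
    if (!UserHasConsented())
        return;                        /* never install silently */
    const char *pkg = "update.pkg";
    if (DownloadPackage(available, pkg) && VerifyPackageSignature(pkg))
        InstallPackage(pkg);           /* reject unverified content */
}

int main(void)
{
    Version v = { 1, 0, 100 };
    UpdateOnce(v);
    return 0;
}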

Achieving a consistent updating experience is easiest if you pick one installer technology and use it for all your products. At Microsoft, we started with eight individual installers that had been developed or adopted by product teams over the years. We were rightly criticized by customers and analysts for having such a confusing set of updating tools, so we initiated a multi-year transition to two installers (one for operating system components and the other for applications) to ease the burden on customers. Even these two installers use identical command-line flags for options such as silent installation, easing the system administrator’s task. Similarly, we are moving from a variety of ways of getting to the updates on the Web—Windows Update, Office Update, the Microsoft Download Center Web site, and individual product download Web sites—to a single Microsoft Update Web site that supports automatic updating plus a consistent family of enterprise updating tools for use by administrators who must update large numbers of computers.

Our objective in making the transition to a consistent updating approach is to reduce the difficulty of updating for customers and, thus, to help them install updates more rapidly. We’d like to see all home users install updates automatically as soon as they’re released (because they are unlikely to have complex custom applications in which compatibility with an update becomes an issue), and we’d like to see businesses install updates very rapidly with little or no delay attributable to the difficulty of packaging and deploying updates. We still see a few software suppliers releasing updates in a form in which the administrator has to copy individual files into the appropriate directories on the system. We believe that such a manual and error-prone approach inevitably delays update deployment and thus increases risk to customers. Your organization is not likely to have as many products or versions to contend with as Microsoft does, but it’s still very desirable to make a common choice of updating technology and then apply it for all of the products you release.

The last component of making your product updatable involves ensuring that updates are delivered securely. We still see individuals and some organizations that post updates to the Web without giving their users any way of confirming either the source of the update or the integrity (freedom from alteration) of the content. Installing any code whose origin and integrity you can’t confirm is risky, and that statement is even more pertinent for a security update. You should ensure that your updating mechanism includes provisions for digitally signing the update content, for confirming that the signer is in fact the organization that claims to have authored the update, and for verifying that the signed content has not been tampered with. All of Microsoft’s updating mechanisms incorporate these attributes, and in the case of the automatic updating mechanisms (Windows Update, Office Update, Microsoft Update), signature and integrity verification are performed by the update client as part of the download and installation process. Finally, have a plan to deal with the compromise of the key that you use to sign your updates. Although we’ve never had to deal with this problem, in 2001 we did have to deal with a situation in which a commercial certification authority certified two fraudulently acquired code-signing digital certificates that claimed to belong to Microsoft (Microsoft 2001). We revoked the certificates, and to the best of our knowledge, they were never used. The experience reinforced our commitment to being able to deal with such a contingency.
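
On Windows, the verification step can be sketched with the WinVerifyTrust API, which checks both that the file’s Authenticode signature is valid and that the content has not been altered since signing. The sketch below abbreviates error handling, and confirming that the signer is specifically your organization (rather than merely any trusted publisher) requires additional certificate inspection beyond what is shown here.

/* Sketch: verify an update package's digital signature and integrity
 * with WinVerifyTrust. Link with wintrust.lib. */
#include <windows.h>
#include <wintrust.h>
#include <softpub.h>
#include <string.h>

/* Returns 1 if 'path' carries a valid, untampered signature that
 * chains to a trusted publisher. */
int IsUpdateSigned(const wchar_t *path)
{
    WINTRUST_FILE_INFO fileInfo;
    memset(&fileInfo, 0, sizeof(fileInfo));
    fileInfo.cbStruct = sizeof(fileInfo);
    fileInfo.pcwszFilePath = path;

    WINTRUST_DATA wvt;
    memset(&wvt, 0, sizeof(wvt));
    wvt.cbStruct = sizeof(wvt);
    wvt.dwUIChoice = WTD_UI_NONE;                    /* no dialogs */
    wvt.fdwRevocationChecks = WTD_REVOKE_WHOLECHAIN; /* catch revoked keys */
    wvt.dwUnionChoice = WTD_CHOICE_FILE;
    wvt.pFile = &fileInfo;
    wvt.dwStateAction = WTD_STATEACTION_VERIFY;

    GUID action = WINTRUST_ACTION_GENERIC_VERIFY_V2;
    LONG status = WinVerifyTrust(NULL, &action, &wvt);

    wvt.dwStateAction = WTD_STATEACTION_CLOSE;       /* release state */
    WinVerifyTrust(NULL, &action, &wvt);

    return status == ERROR_SUCCESS;
}

A real client would also distinguish specific failure codes, such as TRUST_E_NOSIGNATURE, and refuse to install on any of them.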

Find the Vulnerabilities Before the Researchers Do

The final but most important response-related task for the product team is to use vulnerability reports as a learning experience and to fix as many vulnerabilities as possible with as few updates as possible. To do this, the team must develop an in-depth understanding of each reported vulnerability and then determine whether the vulnerability represents an instance of some recurring pattern. If it does, the team must try to find the other instances and correct them all. In previous sections in this chapter, we discussed the three Microsoft updates, beginning with MS03-026, that addressed RPC/DCOM vulnerabilities. After the initial vulnerability report, MSRC, SWIAT, and the DCOM team quickly realized that the issue was just one instance of a pattern of vulnerabilities that they needed to address. In addressing the underlying problem, they conducted a series of code reviews and tests that lasted for several months. Ideally, they would have been able to release a single update to resolve all of the vulnerabilities at once, but receipt of new reports and concern over customers’ safety caused them to decide to release a series of three updates that eliminated progressively more vulnerabilities.

Some customers were upset over the fact that Microsoft released a succession of three updates to address the RPC/DCOM vulnerabilities; we would have preferred to release only one. But consider the alternatives: if we had simply fixed vulnerabilities as they were reported, we might well have issued 20 or more updates over a period of months or years, each addressing “the next” vulnerability. If we had waited until all of the vulnerabilities were eliminated before releasing any update, the odds were high that at least one of the vulnerabilities would have leaked out and been exploited while customers were still defenseless. We think we made the right choice to protect our customers.

Learning from security vulnerabilities involves two separate cycles. The shorter cycle is the security response cycle: the product team takes an external report, investigates it, and develops and releases an update that addresses the reported vulnerability and related vulnerabilities. The longer cycle reaches into the product-development process. The product team and the central security team update processes, training, tools, and standards to attempt to ensure that new product versions are not affected by the vulnerability or anything like it. This longer cycle is critically important, and it’s why we view security response as an integral component of SDL.

Summary

In the real world, products do not achieve perfect security, so software organizations must plan for security response. The response process encompasses a security response team or center, which faces customers and security researchers, and the product team, which must be prepared to investigate and eliminate security vulnerabilities. To implement the SDL effectively, the product team must treat each vulnerability report as a learning experience: it must attempt to find and fix related vulnerabilities in its security updates, and it must update its SDL processes based on the lessons learned from each vulnerability.

References

Microsoft 2001. Microsoft Corporation. “Microsoft Security Bulletin MS01-017: Erroneous VeriSign-Issued Digital Certificates Pose Spoofing Hazard,” March 2001.

Microsoft 2006. Microsoft Corporation. “Microsoft Security Bulletin MS06-001: Vulnerability in Graphics Rendering Engine Could Allow Remote Code Execution,” January 2006.