Performance Optimization

Audience

Programmers, Testers

We optimize when there’s a proven need.

Our organization had a problem.[54] Every transaction our software processed had a three-second latency. During peak business hours, transactions piled up—and with our recent surge in sales, the lag sometimes became hours. We cringed every time the phone rang; our customers were upset.

We knew what the problem was: we had recently changed our order preprocessing code. I remember thinking at the time that we might need to start caching the intermediate results of expensive database queries. I had even asked our customers to schedule a performance story. Other stories had been more important, but now performance was top priority.

I checked out the latest code and built it. All tests passed, as usual. Carlann suggested that we create an end-to-end performance test to demonstrate the problem. We created a test that placed 100 simultaneous orders, then ran it under our profiler.
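
A test like that can be short. Here is a minimal sketch in Python of what such an end-to-end performance test might look like; place_order() and the one-second goal are illustrative assumptions, not the project’s real code:

    import statistics
    import time
    from concurrent.futures import ThreadPoolExecutor

    from orders import place_order  # hypothetical entry point into the order system

    def time_one_order(order_id):
        """Time a single end-to-end transaction."""
        start = time.perf_counter()
        place_order(order_id)
        return time.perf_counter() - start

    def test_simultaneous_orders():
        # Place 100 orders at once and collect per-transaction timings.
        with ThreadPoolExecutor(max_workers=100) as pool:
            timings = list(pool.map(time_one_order, range(100)))
        mean = statistics.mean(timings)
        print(f"mean={mean:.2f}s stdev={statistics.stdev(timings):.2f}s")
        assert mean < 1.0, f"average transaction took {mean:.2f}s"  # illustrative goal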

The numbers confirmed my fears: the average transaction took around 3.2 seconds, with a standard deviation too small to be significant. The program spent nearly all that time within a single method: verify_order_id(). We started our investigation there.

I was pretty sure a cache was the right answer, but the profiler pointed to another possibility. The method retrieved a list of active order IDs on every invocation, regardless of the validity of the provided ID. Carlann suggested discounting obviously flawed IDs before testing potentially valid ones, so I made the change and we reran the tests. All passed. Unfortunately, that had no effect on the profile. We rolled back the change.
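
In sketch form, that kind of guard clause looks something like this; the ID pattern and query helper are hypothetical reconstructions, not the real code:

    import re

    from orders import fetch_active_order_ids  # hypothetical expensive database query

    ORDER_ID_PATTERN = re.compile(r"^[A-Z]{2}\d{8}$")  # hypothetical ID format

    def verify_order_id(order_id):
        # Reject obviously malformed IDs before paying for the database query.
        if not order_id or not ORDER_ID_PATTERN.match(order_id):
            return False
        return order_id in fetch_active_order_ids()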

Next, we agreed to implement the cache. Carlann coded a naïve cache. I started to worry about cache coherency, so I added a test to see what happened when a cached order went active. The test failed. Carlann fixed the bug in the cache code, and all tests passed again.

Note

Cache coherency requires that the data in the cache change when the data in the underlying data store changes and vice versa. It’s easy to get wrong.
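
One common way to keep such a cache coherent is to invalidate it on every write, as in this Python sketch (the class and database methods are hypothetical):

    class ActiveOrderCache:
        """Caches the active-order lookup and invalidates it on every write."""

        def __init__(self, database):
            self._database = database
            self._active_ids = None  # None means "not yet loaded"

        def is_active(self, order_id):
            if self._active_ids is None:
                self._active_ids = set(self._database.fetch_active_order_ids())
            return order_id in self._active_ids

        def mark_active(self, order_id):
            self._database.mark_active(order_id)  # write through to the store
            self._active_ids = None  # invalidate so the next read refetches

Wholesale invalidation is crude but hard to get wrong; finer-grained updates perform better but invite exactly the kind of bug the test caught.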

Unfortunately, all that effort was a waste. The performance was actually slightly worse than before, and the caching code had bloated our previously elegant method into a big mess. We sat in silence for a minute.

What else could it be? Was there a processor bug? Would we have to figure out some way to dump our JIT-compiled code so that we could graph the processor pipelines executing assembly code? Was there a problem with memory pages, or some garbage collector bug? The profiling statistics were clear. As far as we could tell, we had done everything correctly. I reverted all our changes and ran the profiler again: 3.2 seconds per transaction. What were we missing?

Over lunch that day, Carlann and I shared our performance testing woes with the team. “Why don’t you let me take a look at it?” offered Nate. “Maybe a fresh pair of eyes will help.”

Carlann and I nodded, only too glad to move on to something else. We formed new pairs and worked on nice simple problems for the rest of the day.

The next morning, at the stand-up meeting, Janice told us that she and Nate had found the answer. As it turned out, my initial preconceptions had blinded us to an obvious problem.

There was another method inlined within verify_order_id() that didn’t show up in the profiler. We hadn’t looked at it because I was sure I understood the code. Janice and Nate, however, stepped through the code. They found a method that was trying to make an unnecessary network connection on each transaction. Fixing it lopped three full seconds off each transaction. They had fixed the problem in less than half an hour.

Oh, and the cache I was sure we would need? We haven’t needed it yet.

How to Optimize

Modern computers are complex. Reading a single line of a file from a disk requires the coordination of the CPU, the kernel, a virtual file system, a system bus, the hard drive controller, the hard drive cache, OS buffers, system memory, and scheduling pipelines. Every component exists to solve a problem, and each has certain tricks to squeeze out performance. Is the data in a cache? Which cache? How’s your memory aligned? Are you reading asynchronously or are you blocking? There are so many variables it’s nearly impossible to predict the general performance of any single method.

Important

The days in which a programmer could predict performance by counting instructions are long gone.

The days in which a programmer could accurately predict performance by counting instruction clock cycles are long gone, yet some still approach performance with a simplistic, brute-force mindset. They make random guesses about performance based on 20-line test programs, flail around while writing code the first time, leave a twisty mess in the real program, and then take a long lunch.

Sometimes that even works. More often, it leaves a complex mess that doesn’t benefit overall performance. It can actually make your code slower.

A holistic approach is the only accurate way to optimize such complex systems. Measure the performance of the entire system, make an educated guess about what to change, then remeasure. If the performance gets better, keep the change. If it doesn’t, discard it. Once your performance test passes, stop—you’re done. Look for any missed refactoring opportunities and run your test suite one more time. Then integrate.

Usually, your performance test will be an end-to-end test. Although I avoid end-to-end tests in other situations (because they’re slow and fragile—see “Test-Driven Development” earlier in this chapter), they are often the only accurate way to reproduce real-world performance conditions.

You may be able to use your existing testing tool, such as xUnit, to write your performance tests. Sometimes you get better results from a specialized tool. Either way, encode your performance criteria into the test. Have it return a single, unambiguous pass/fail result as well as performance statistics.
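
For example, with Python’s unittest (an xUnit-style tool), the criteria and statistics might be encoded like this; run_performance_scenario() and the six-second threshold are assumptions for illustration:

    import statistics
    import unittest

    from perf_scenarios import run_performance_scenario  # hypothetical: returns per-transaction seconds

    class OrderPerformanceTest(unittest.TestCase):
        ACCEPTABLE_LATENCY = 6.0  # seconds; illustrative criterion from a story

        def test_latency_meets_criterion(self):
            timings = run_performance_scenario()
            mean = statistics.mean(timings)
            # Report statistics alongside the unambiguous pass/fail result.
            print(f"mean={mean:.2f}s worst={max(timings):.2f}s "
                  f"stdev={statistics.stdev(timings):.2f}s")
            self.assertLess(mean, self.ACCEPTABLE_LATENCY)

    if __name__ == "__main__":
        unittest.main()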

Important

Use a profiler to guide your optimization efforts.

If the test doesn’t pass, use the test as input to your profiler. Use the profiler to find the bottlenecks, and focus your efforts on reducing them. Although optimizations often make code more complex, keep your code as clean and simple as possible.
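
In Python, for instance, you could run the failing scenario under the standard library’s cProfile and sort by cumulative time; run_performance_scenario() is again a hypothetical stand-in for your performance test:

    import cProfile
    import pstats

    from perf_scenarios import run_performance_scenario  # hypothetical scenario function

    # Profile the failing test scenario rather than guessing at hot spots.
    cProfile.run("run_performance_scenario()", "perf.prof")
    stats = pstats.Stats("perf.prof")
    stats.sort_stats("cumulative").print_stats(10)  # the ten most expensive call paths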

If you’re adding new code, such as a cache, use test-driven development to create that code. If you’re removing or refactoring code, you may not need any new tests, but be sure to run your test suite after each change.

When to Optimize

Important

Performance optimizations must serve the customer’s needs.

Optimization has two major drawbacks: it often leads to complex, buggy code, and it takes time away from delivering features. Neither is in your customer’s interests. Optimize only when it serves a real, measurable need.

That doesn’t mean you should write stupid code. It means your priority should be code that’s clean and elegant. Once a story is done, if you’re still concerned about performance, run a test. If performance is a problem, fix it—but let your customer make the business decision about how important that fix is.

XP has an excellent mechanism for prioritizing customer needs: the combination of user stories and release planning. In other words, schedule performance optimization just like any other customer-valued work: with a story.

Of course, customers aren’t always aware of the need for performance stories, especially ones with highly technical requirements. If you’re concerned about potential performance problems in part of the system, explain your concern in terms of business tradeoffs and risks. They still might not agree. That’s OK—they’re the experts on business value and priorities. It’s their responsibility, and their decision.

Similarly, you have a responsibility to maintain an efficient development environment. If your tests start to take too long, go ahead and optimize until you meet a concrete goal, such as five or ten minutes. Keep in mind that the most common cause of a slow build is too much emphasis on end-to-end tests, not slow code.

How to Write a Performance Story

Like all stories, performance stories need a concrete, customer-valued goal. A typical story will express that goal in one or more of these terms:

Throughput

How many operations should complete in a given period of time?

Latency

How much delay is acceptable between starting and completing a single operation?

Responsiveness

How much delay is acceptable between starting an operation and receiving feedback about that operation? What kind of feedback is necessary? (Note that latency and responsiveness are related but different. Although good latency leads to good responsiveness, it’s possible to have good responsiveness even with poor latency.)

When writing performance stories, think about acceptable performance—the minimum necessary for satisfactory results—and best possible performance—the point at which further optimization adds little value.

Why have two performance numbers? Performance optimization can consume an infinite amount of time. Sometimes you reach your “best” goal early; this tells you when to stop. Other times you struggle even to meet the “acceptable” goal; this tells you when to keep going.

For example, a story for a server system could be, “Throughput of at least 100 transactions per minute (1,000 is best). Latency of six seconds per transaction (one second is best).” A client system might have the story, “Show progress bar within 1 second of click (0.1 second is best), and complete search within 10 seconds (1 second is best).”
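
Those two numbers translate directly into code. A sketch using the server story’s throughput goals (the function and figures are illustrative):

    ACCEPTABLE_TPM = 100   # transactions per minute; the story's minimum
    BEST_TPM = 1_000       # the point where further optimization adds little value

    def check_throughput(measured_tpm):
        """Report where measured throughput falls against the story's two goals."""
        assert measured_tpm >= ACCEPTABLE_TPM, (
            f"unacceptable: {measured_tpm} tpm is below {ACCEPTABLE_TPM} tpm")
        if measured_tpm >= BEST_TPM:
            print("best-possible goal met; stop optimizing")
        else:
            print(f"acceptable ({measured_tpm} tpm); further optimization is a business decision")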

Also consider the conditions under which your story must perform. What kinds of workstations or servers will the software run on? What kind of network bandwidth and latency will be available? What other loads will affect the system? How many people will use it simultaneously? The answers to these questions are likely the same for all stories. Help your customers determine the answers. If you have a standard deployment platform or a minimum platform recommendation, base your answers on that standard.

Questions

Why not optimize as we go? We know a section of code will be slow.

How can you really know until you measure it? If your optimization doesn’t affect code maintainability or effort—for example, if you have a choice between sorting libraries and you believe one would be faster for your situation—then it’s OK to put it in.

Ally

Simple Design

That’s not usually the case. Like any other work, the choice to optimize is the choice not to do something else. It’s a mistake to spend time optimizing code instead of adding a feature the customer wants. Further, optimizing code tends to increase complexity, which is in direct conflict with the goal of producing a simple design. Although we sometimes need to optimize, we shouldn’t reduce maintainability when there’s no direct customer value.

If you suspect a performance problem, ask your on-site customers for a ballpark estimate of acceptable performance and run a few tests to see if there’s anything to worry about. If there is, talk to your customers about creating and scheduling a story.

How do we write a performance test for situations involving thousands of transactions from many clients?

Good stress-testing tools exist for many network protocols, ranging from ad hoc shell scripts running telnet and netcat sessions to professional benchmarking applications. Your testers or QA department can recommend specific tools.

Our performance tests take too long to run. How can we maintain a 10-minute build?

Good performance tests often take a long time to run, and they may cause your build to take more time than you like. This is one of the few cases in which a multistage build (discussed in “Continuous Integration” in Chapter 7) is appropriate. Run your performance tests asynchronously, as a separate build from your standard 10-minute build (see “Ten-Minute Build” in Chapter 7), when you integrate.

Our customers don’t want to write performance stories. They say we should write the software to be fast from the beginning. How can we deal with this?

“Fast” is an abstract idea. Do they mean latency should be low? Should throughput be high? Should the application scale better than linearly with increased activity? Is there a point at which performance can plateau, or suffer, or regress? Is UI responsiveness more important than backend processing speed?

These goals need quantification and require programming time to meet. You can include them as part of existing stories, but separating them into their own stories gives your on-site customers more flexibility in scheduling and achieving business value.

Results

When you optimize code only as necessary, you invest in work your customers have identified as valuable rather than in perceived benefits. You quantify necessary performance improvements and capture that information in executable tests. You measure, test, and gather feedback, which leads you to acceptable solutions for reasonable investments of time and effort. Your code is more maintainable, and you favor simple, straightforward code over highly optimized code.

Contraindications

Important

It’s easy to guess wrong when it comes to performance.

Software is a complex system based on the interrelationship of many interacting parts. It’s easy to guess wrong when it comes to performance.

Therefore, don’t optimize code without specific performance criteria and objective proof, such as a performance test, that you’re not yet meeting those criteria. Throw away optimizations that don’t objectively improve performance.

Be cautious of optimizing without tests; optimization often adds complexity and increases the risk of defects.

Alternatives

There are no good alternatives to measurement-based optimization.

Many programmers attempt to optimize all code as they write it, often basing their optimizations on programmer folklore about what’s “fast.” Sometimes these beliefs come from trivial programs that execute an algorithm 1,000 times. A common example is a program that compares the speed of StringBuffer in Java (or StringBuilder in .NET) to string concatenation.

Unfortunately, this approach to optimization focuses on trivial algorithmic tricks. Network and hard drive latency are much bigger bottlenecks than CPU performance in modern computing. In other words, if your program talks over a network or writes data to a hard drive—most database updates do both—it probably doesn’t matter how fast your string concatenations are.
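
To see the scale of the difference, compare the folklore benchmark against a single simulated network round-trip; in this Python sketch, a 50 ms sleep stands in for real I/O latency:

    import time
    import timeit

    # The classic folklore benchmark: string concatenation versus join.
    concat_time = timeit.timeit(
        "s = ''\nfor p in parts:\n    s += p",
        setup="parts = ['x'] * 1000", number=1_000)
    join_time = timeit.timeit(
        "''.join(parts)", setup="parts = ['x'] * 1000", number=1_000)

    # One simulated network round-trip.
    io_time = timeit.timeit(lambda: time.sleep(0.05), number=1)

    print(f"1,000 concatenation runs: {concat_time:.4f}s")
    print(f"1,000 join runs:          {join_time:.4f}s")
    print(f"one 50 ms round-trip:     {io_time:.4f}s")

Even when the concatenation loop loses badly, the entire measured difference is often comparable to a handful of round-trips.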

On the other hand, some programs are CPU-bound. Some database queries are easy to cache and don’t substantially affect performance. The only way to be sure about a bottleneck is to measure performance under real-world conditions.

Further Reading

“Yet Another Optimization Article” [Fowler 2002b] also discusses the importance of measurement-based optimization. It’s available online at http://www.martinfowler.com/ieeeSoftware/yetOptimization.pdf.



[54] This is a fictionalized account inspired by real experiences.
