4

BusOps

The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency.

—Bill Gates

No partnership between two independent companies, no matter how well run, can match the speed, effectiveness, responsiveness and efficiency of a solely owned company.

—Edward Whitacre Jr., former CEO, GM and AT&T

Interlude: Operations Might Not Be Sexy, but Ignore It at Your Peril

Our medium-sized business recently replaced our core software system, which was built on the client-server model, with a new web-based package. We went about this the usual way: we compiled a list of our requirements, emphasizing business rules that were unique to our situation, then searched for packages that could handle them.

We knew we’d have growing pains trying to configure an off-the-shelf solution to our particular needs. At the same time, we wanted to move to the cloud, as we recognized this is the future and would eliminate the need to purchase, maintain, and manage expensive hardware. And so we created a formal RFP that we sent out to potential vendors.

After reviewing their responses, we homed in on two of the proposed solutions: the ones that looked like they’d adapt to our unique business requirements best and that also were cloud based.

Following our vendor meetings, we called references and dug further into our concerns about configuring the software to our needs. Fully aware of the challenges, we finally chose the vendor that had the best reputation, with the system that appeared to best meet our needs.

As we implemented the solution, we struggled (no surprise there) to configure our business rules so the system would support our business processes. With some pain and lessons learned, we reached the punch list stage and checked off every item. Finally, we were ready to roll out our new solution.

That’s when we discovered a major problem, one we’d never considered might be an issue. The new system didn’t perform as well as our old client-server solution. It wasn’t even close. The new system was sluggish. Data entry slowed and our ability to get work done deteriorated.

We were moving to the cloud, so it hadn’t occurred to us that we might need an IT operations professional on the team until after we experienced the consequences of not having one.

If you’ve been involved in a cloud migration, you probably know what’s coming. What used to run at wire speeds on our local area network was now running across the internet. Which was when we learned that with the cloud we couldn’t just ignore the network when it came to performance engineering.

And so our project team learned about such niceties as network latency,* bandwidth,† and what can affect the performance of JavaScript.‡

The project team has since learned these new words (and concepts) and is still wondering how it missed them all.

Meanwhile, I’m proud to report that while in private the head of IT Operations was shaking his head, in public he never once said, “I told you so” or otherwise faulted the project manager for failing to include his team in the project.

He didn’t have to. From that point forward his problem was that he didn’t have enough staff members to support all the projects that were asking for them.

If There’s No Such Thing as an IT Project, Is There Such a Thing as IT Operations?

In the world of organizational design, work comes in two forms: projects and operations.

Projects have a start and a finish. They produce tangible and unique work products that in some way, shape, or form make tomorrow different from yesterday.

Operations also produces work products. Unlike projects, though, the work products operations delivers are highly similar. In operations, tomorrow is supposed to look a whole lot like yesterday, because consistency is the key to success.

The premise of this book is that to make the achievement of intentional business-change routine and expected, it must be the planned outcome of what used to be thought of as “IT projects.”

Which leads to a logical question: If IT application delivery is just one dimension of a business-change project, does that mean IT operations—the part of IT that makes sure applications, once installed, are running properly and available for those authorized to use them—is just one dimension of business operations?

That’s a simple question. The answer, in contrast, is complicated.

IT Operations as a Service Provider

As you’re probably tired of reading by now, in many organizations IT is run as if it were a separate business—a service provider for its internal customers. We’ve already talked about the dysfunction this creates on the applications side of the IT house. It carries over to IT Operations too.

For example, in yet another case of metaphor-driven cures to largely imaginary maladies, the IT-as-a-business metaphor has led to a seriously strange practice: negotiated service level agreements (SLAs) between IT Operations and its internal customers.

Here’s the “logic” behind the practice.

When real IT outsourcing businesses negotiate contracts with their real, paying customers, they need a way for both parties to agree that the outsourcer is delivering the services it promised.

And so their contracts include SLAs for each provided service. In general, an SLA is a two-part metric. The first part is the minimum acceptable standard of service. The second part is how often the outsourcer has to achieve that level of service.

In some cases, defining the first part of the metric is trivial, as it is for critical systems where the minimum acceptable standard of service is “up and available.” The second part is more interesting; the requirement might be that the system has to be up and available 99.9 percent of the time.

In most situations, though, both parts of the metric must be defined, as is the case for the response to system outages. In this case the SLA might define the maximum acceptable standard for outages when they do occur to be one hour or less before restoration of service. The contractual SLA might specify that the outsourcer must achieve this level of service for 99 percent of all system outages.
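To make the two-part structure concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular outsourcer computes its SLAs: the function names, the one-observation-per-minute sampling, and the sample figures are all hypothetical. Part one of the metric is a yes/no test applied to each observation; part two is the fraction of observations that must pass it.

```python
def fraction_meeting_standard(observations, meets_standard):
    """Part two of the metric: how often the minimum standard (part one) was met."""
    observations = list(observations)
    if not observations:
        # Nothing was measured, so nothing was missed.
        return 1.0
    passed = sum(1 for item in observations if meets_standard(item))
    return passed / len(observations)


def sla_met(observations, meets_standard, required_fraction):
    """True when the standard was achieved at least as often as the SLA requires."""
    return fraction_meeting_standard(observations, meets_standard) >= required_fraction


# Availability SLA: "up and available" 99.9 percent of the time.
# Hypothetical 30-day month sampled once a minute, with 40 minutes of downtime.
MINUTES_IN_MONTH = 30 * 24 * 60
minute_samples = [True] * (MINUTES_IN_MONTH - 40) + [False] * 40
availability_ok = sla_met(minute_samples, lambda is_up: is_up, 0.999)

# Outage-response SLA: service restored within one hour for 99 percent of outages.
# Hypothetical outage durations, in minutes.
outage_minutes = [12, 45, 75, 30]
restoration_ok = sla_met(outage_minutes, lambda minutes: minutes <= 60, 0.99)

print(availability_ok)   # 40 minutes is inside the 43.2-minute monthly allowance
print(restoration_ok)    # one outage in four ran long, far short of 99 percent
```

With these made-up numbers the availability SLA is met and the outage-response SLA is not, which shows why the second part of the metric is usually the more interesting one.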

For outsourcers and their customers, SLAs are contractual matters. If the outsourcer fails to meet its SLAs, the contract specifies remedies, which are also a matter of contract negotiation. If the outsourcer refuses to provide the specified remedy, the customer can pursue the usual set of escalating legal alternatives.

If internal IT is supposed to act like an independent business, what could be more logical than it negotiating SLAs with its internal customers?

As it turns out, the answer is, lots of alternatives are more logical. Finding some that are less logical? That’s the challenge (but one we won’t bother exploring here).

Post-SLA IT Operations

Internal SLAs were never a particularly good idea for a number of reasons, some long-standing, some evolving.

The first, already stated, is that they reinforce the wrong IT/business relationship model: that of IT-as-a-business selling to internal customers.

The second is an obvious consequence of the difference between a real outsourcer and internal IT: If internal IT fails to achieve a negotiated SLA, what will its “customers” do? Sue? SLAs without nonperformance penalties are futile. SLAs with nonperformance penalties encourage interorganizational distrust.

The third is an example of Lewis’s first law of metrics: you get what you measure—that’s the risk you take. In this case IT Operations does measure service levels but lacks any metric regarding innovation.

Well-run IT Operations has to constantly balance between reliability and innovation. But any innovation entails some level of risk. Because SLAs look backward, not forward, they report only the negative consequences of innovation, not its benefits.

As an example, the initial conversion to solid-state drives was definitely risky. Their short-term reliability and long-term durability were, for early adopters, unproven. And yet they paid off handsomely for organizations that tried them, giving them a performance advantage in the world of analytics and big data.*

Staying on the leading edge requires some risk taking and forward thinking that SLAs by nature discourage.

Even without these reasons, and even if SLAs once made sense, they don’t anymore, for reasons associated with IT Operations’ two types of responsibilities: technical services and support services.

SLAs for technical services relate to such matters as system availability and performance. Support services are what the men and women who work in IT Operations do to provide assistance to the men and women who work in business operations. Support services SLAs relate to such questions as how long someone should expect to wait before the help desk responds to their request for assistance and how long they should expect to wait until the problem they’ve reported is resolved.

The Case against Technical SLAs

Here’s why technical SLAs are (or should be) a thing of the past: once upon a time, high-availability architectures were a choice. Now they aren’t.

The fact of the matter is that even when some poor business manager agreed to a technical SLA—98 percent uptime used to be common—he agreed only because he had no real choice. A critical system going down was never acceptable, no matter what the SLA said. Back in the day, the only acceptable service level was the one the telecom department delivered: every time someone picked up the phone they got a dial tone; every time they dialed a number the call went through.*

The Digital world has changed all that. Blame Amazon if you like. When your employees shop there, they never experience an outage, and “employees” includes your executives and managers. One hundred percent uptime all the time, and with snappy performance to go with it, is now what everyone understands is possible, and possible at a scale far beyond anything your company has to achieve.

Oh, and they (and you) also expect that Amazon will process orders, ship merchandise, and even handle returns flawlessly. Service accuracy is baked into Amazon. It’s been part of its competitive advantage for years.*

It’s now the norm and everyone’s expectation.

Should IT Operations continue to track service levels for the technical services it provides? Yes, if it isn’t doing very well, but only as a tool to get it to where continued tracking would be a waste of time.

Because while a given piece of equipment might fail, that’s no longer a reason for systems to be out of commission. That’s the nature of high-availability architectures. If a system is ever unavailable, that should be a sufficiently rare event that keeping statistical track is a waste of time.

What won’t be a waste of time is a root cause analysis, because every outage means your high-availability architecture has a design flaw that needs fixing.

What also isn’t a waste of time is continuing to analyze reported incidents to detect and address emerging problems early, before they become detectable to the business at large.

Occasional outages used to be a normal part of doing business. In 2018, as we write these words, outages are no longer normal.

Service-Related SLAs

A user has a problem and calls the help desk. The help desk has a service level for time to first response and another service level for time to resolution.

On any given day, for any given call, the help desk either responds more quickly or responds less quickly than the service level specifies. It responds more quickly when help desk staff capacity exceeds the call volume. It responds less quickly when the call volume exceeds the help desk staff’s capacity.

The SLA has absolutely nothing to do with the help desk’s performance. It’s just a stick. It’s useful for beating up the help desk manager and not much else.

That’s almost an entirely accurate statement. The only time it’s inaccurate is budget season, when the help desk’s service level performance can be used to justify hiring more staff.

This is, to be fair, no small matter. But it justifies the practice of tracking service performance, not negotiating SLAs.

What to do instead of negotiating SLAs: As we just said, by all means track performance. Otherwise you won’t know what needs attention and what doesn’t. It’s the negotiated SLAs that belong in the dustbin of outmoded management practices.

The Only IT Operations Metric That Matters

Pity the poor IT Operations manager.

For most people in management, success increases their visibility, which can lead to promotion, accolades, and better pay as well. The only time IT Operations is visible is when something goes wrong.

All good metrics are numerical representations of qualitative goals. And so the IT Operations metric that best reflects its goals is a measure of its invisibility. This “invisibility index” should be a composite metric that encompasses application availability and performance, the number of calls to the help desk (fewer calls means more invisibility), and (we’re finally getting to the point of this chapter!) some measure that reflects how often IT Operations performance is a bottleneck in other areas’ business processes and practices.
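One way such a composite might be assembled is sketched below in Python. Treat every detail as an assumption made for illustration: the field names, the baselines used to turn call and bottleneck counts into scores, and the equal weights are all hypothetical, and a real index would need to be tuned to the business it serves.

```python
from dataclasses import dataclass


@dataclass
class OpsPeriod:
    """Hypothetical inputs for one reporting period."""
    availability: float        # fraction of time applications were available, 0 to 1
    performance: float         # fraction of transactions meeting response targets, 0 to 1
    help_desk_calls: int       # calls received in the period
    calls_baseline: int        # assumed call volume that would score zero
    bottleneck_incidents: int  # times IT Operations held up a business process
    bottleneck_baseline: int   # assumed incident count that would score zero


def fewer_is_better(count, baseline):
    """Map a count onto 0..1: zero scores 1.0, the baseline or worse scores 0.0."""
    if baseline <= 0:
        return 1.0 if count == 0 else 0.0
    return max(0.0, 1.0 - count / baseline)


def invisibility_index(period, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted average of the four components; 1.0 means fully invisible."""
    components = (
        period.availability,
        period.performance,
        fewer_is_better(period.help_desk_calls, period.calls_baseline),
        fewer_is_better(period.bottleneck_incidents, period.bottleneck_baseline),
    )
    return sum(w * c for w, c in zip(weights, components)) / sum(weights)


# A period with perfect availability and performance, half the baseline call
# volume, and half the baseline number of bottleneck incidents.
sample = OpsPeriod(1.0, 1.0, 50, 100, 5, 10)
print(invisibility_index(sample))
```

The design choice worth noticing is that every component is scaled so that higher means less visible, which lets the four be averaged into a single number that rises as IT Operations fades into the background.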

Fixing IT Operations for IT

Typical IT organizations are divided into Applications and Operations—Apps and Ops for short.

In these typical IT organizations, Apps and Ops distrust each other. One reason is that Apps succeeds by making application changes, but for IT Operations, each application change creates a risk of increased visibility.

A second reason is that Application Development teams need IT Operations to create and maintain development and test environments. For Apps this means Ops is a bottleneck. For Ops this means additional and often unscheduled work.

Which is how DevOps happened. DevOps is a form of Agile (see chapter 3). Unlike most Agile variants, with DevOps, as its name implies, Application Development teams include one or more IT Operations staff to handle IT Operations project responsibilities collaboratively and on the project’s schedule, instead of their being handled through the IT Operations request queue.

DevOps has a number of other interesting characteristics, all of which we’re going to ignore here as they aren’t directly relevant to what this book is about.

Digital Business and the End of IT Operations

It’s possible there’s never been a management fad more confusing and ambiguous than Digital this and Digital that.

Behind all the ambiguity are specific Digital technologies that create new capabilities. Businesses can take advantage of them to create competitive advantage. Or, they can ignore them, letting competitors gain the advantage instead.

Behind the specifics is the central Digital reality: information technology is no longer optional. It’s deeply embedded in every business process and practice your company relies on to do business on a day-to-day basis.

A logical consequence: conceptually, IT Operations is just one collection of moving parts among many in overall Business Operations. It’s logically just as much the province of the COO as it is of the CIO.

Digital Business and the Beginning of BusOps

As a practical matter, wherever it reports, IT Operations should remain intact. Its effectiveness (and consequent invisibility) depends on the ongoing collaboration of a number of technically proficient specialists—practitioners of mature and well-developed disciplines.

Managing the processes and practices they’re responsible for depends, in turn, on managers who understand their inner workings. Leading them depends on managers who can empathize with these practitioners as they go about handling their responsibilities.

It’s also worth recognizing that reorganizations rarely fix anything. Mostly they remove barriers by putting groups that didn’t work well together before the reorganization under common management.

Which also means most reorganizations create barriers between groups that used to have common management but don’t anymore.

From our perspective, moving IT Operations in the organizational chart so it reports to the COO is neither more nor less logical than leaving it where it is. As for restructuring IT Operations, breaking it up and parceling out its responsibilities within the business . . . that just won’t work. There remains value in technical people working together on common problems.

What does need to change? DevOps points the way. The culture of collaboration we discussed in chapter 1 has to extend to the relationship between IT Operations and the rest of business operations just as surely and deeply as it does between business managers who want to run things differently and better and IT Applications.

So unchain your help desk staff. With no SLAs to shackle them to their chairs, you can encourage them to get up and visit someone with an issue, learn about what their challenges are, and offer insights into how else they can take advantage of the technologies at their disposal.

Meanwhile, as you’re fixing Agile, fix it more by adding the DevOps dimension of including system and security administrators on business-change project teams. Your projects will have better outcomes, and the additional knowledge of what matters in business areas will make IT Operations more effective in its day-to-day decision-making.

Let’s introduce a new term to make it official. Just as DevOps is the blending and collaboration of IT Apps and IT Ops, let’s start talking about BusOps* as the blending and collaboration of IT Operations and Business Operations.

The battle, according to military theorists,¹ is always for the hearts and minds. That often starts with vocabulary.

So BusOps it is. Add it to your working vocabulary. You just might be surprised . . . in a good way . . . at what comes of it.

If You Remember Nothing Else …

•   Service level agreements (SLAs) are contracts between suppliers and customers. That’s no longer the relationship between IT and the rest of the business, so SLAs have to go. Tracking service performance? That’s still necessary because otherwise you won’t know how you’re performing.

•   Another reason SLAs are pointless: the days of computer systems going down are over. We live in a 24x7 world. Amazon and Google don’t go down, and every executive, manager, and employee in your company knows it. So they won’t be satisfied with anything less from IT.

•   Business operations and IT Operations have a great deal in common. More than that, they’re inextricably linked: the point of IT Operations is to support business operations; without one you can’t have the other, and without the other you don’t need the one. Welcome to the world of BusOps.

What You Can Do Right Now

•   Eliminate all SLAs. Don’t worry. The business managers you’ve negotiated them with will breathe a sigh of relief.

•   Recognize, and make sure everyone within IT and throughout the business recognizes, that unplanned downtime is no longer normal or acceptable. Anything short of 100 percent availability means something needs fixing.

•   Institute the only IT Operations metric that matters: the invisibility index.

•   Introduce everyone in the business, both inside and outside IT, to the idea of BusOps and make it part of everyone’s vocabulary.

* This is the time it takes traffic to move across the internet. Traffic rarely goes from point A to point B. It usually hops around a dozen places before reaching its destination.

† Bandwidth is how wide the freeway is in lanes. Add new on-ramps and you may have a problem.

‡ JavaScript is a programming language used to make internet applications feel more interactive.

* Example taken from Dave’s personal experience.

* Yes, yes, yes, it’s true. With mobile phones we’ve traded reliability for convenience. But we still get annoyed when voice or data service isn’t available.

* When we use Amazon as an example, we’re talking about Amazon the retailer, not Amazon Web Services, its cloud-provider division, which ironically enough does have SLAs in place for its customers. But then, those are real, paying customers, so traditional service models and lawyers are both pretty much inescapable.

* Pronounced “Biz-Ops.”
