Building a culture for network automation

The network industry is heavily product-driven, perhaps more than nearly any other technology discipline. Very rarely do we hear about “revolutionary” new IT processes or stories of how an organization won against its competitors because of its great IT team; it’s always the new shiny hardware or software products that grab the headlines.

However, new products, even “revolutionary” ones, don’t solve business problems on their own. The advent of x86 virtualization technologies is arguably one of the biggest disruptions we’ve ever seen in IT, yet 10 years later, despite this disruption, we’re still taking weeks or even months to provision virtual machines. Clearly, our problems aren’t limited to the technology we use. Maybe we need a change in our culture as well. That’s what this lesson is about: why a good culture is a crucial, foundational element for network automation, and how to get there.

It’s also easy to over-rotate on topics of culture and think that it’s the sole cause of all of our problems. The reality is that we need a balance of good people, good process, and good technology in order to win. The cultural change discussed in this lesson is all about satisfying our desire to get things done and work on a team of similarly minded individuals.

In this lesson you won’t read about how many hugs your engineers should be giving per day, or why it’s important for a company to have an indoor trampoline to keep engineers happy. This lesson will take a different view on the subject of “culture,” and you’ll find that it all revolves around one very simple idea: people want to work with other people that give a crap about what they do.

If you’re in a position where you’re looking to keep things pretty consistent career-wise and not looking to make major changes, this lesson probably won’t mean much to you. However, since you’ve read this far, it’s quite likely you’re looking to improve, even a little bit. You’re not quite happy with the status quo. You want to be an agent for change in your organization, and on a personal note, you want to “level up” your technical skill set.

To that end, we’ll discuss three topics in particular:

  • Organizational Strategy and Flexibility

  • Embracing Failure

  • Skills and Education

Organizational Strategy and Flexibility

The first thing you need to address is your team and its place within the organization.

Transforming an Old-World Organization

Enterprise IT is not known for its ability to be on the cutting edge. Traditionally, the technology stack in an IT shop can lag five, or sometimes even ten, years behind what technology leaders are doing or thinking about. This trend isn’t totally unwarranted; the cutting edge got its name for a reason. However, there’s also a lot of bad or outdated reasoning for why an IT shop might use a “legacy” technology stack.

Automation encounters this hurdle all the time. There are often concerns from folks who want to automate but just can’t convince anyone else in their organization to take even the first steps. Or (and this is often worse because of the “bad reputation” it creates) someone started doing basic automation, it went wrong, and took down a revenue-generating system.

Unfortunately, there’s no easy answer to this. Each organization has its own battle scars and unique history. However, one potential answer can be found in a great book, How to Win Friends and Influence People. In it, Dale Carnegie hands down several nuggets of wisdom that are useful for life in general—but one theme that’s present in many places in the book is the need to see things from others’ perspective. To (heavily) paraphrase:

You can’t get anyone to do anything they don’t want to do. So you have to make them want to do it.

We’ll discuss this a bit more later, but in short, the business has to want the automation or in-house scripting. This cannot be some science project that upper management only finds out about when things go wrong.

Even if you start the right way, by communicating your automation strategy to the business, there will be opposition. Change always brings out the antibodies. It would be strange if it didn’t at least make someone feel uncomfortable. The important thing is to remember why you’re doing this—it’s not because automation is cool (even though it is), it’s for the tangible, measurable benefits it can provide to your business.

Another very important point is the need to do things slowly, building good, lasting engineering habits, and to set the right expectations from the beginning. Automation is not unlike healthy weight loss. Anyone that has been able to lose weight and keep it off will tell you it’s not about fad diets, it’s about fundamentals like eating the right things, with the right portions, and exercising. Short-term gains are not nearly as important as building healthy habits over the long term. Fad diets may show some short-term success, and they’re certainly flashy, but they aren’t meant to provide any lasting benefits or success.

Similarly, automation is incremental. If your organization is focused on putting together an “automation” or “DevOps” team, the effort is already doomed to failure. Automation is one of those things that needs buy-in across silos, and needs to grow organically over time. It is for this reason that those organizations that already heavily automate usually don’t really call it anything special. It’s just “modern operations.”

Note

Some organizations have had success with a temporary “virtual” team assembled from members of various IT disciplines, who are tasked with bringing automation into the organization. This can be helpful to get started, but don’t lose sight of the fact that the ultimate goal is to improve operations across the entire organization, not to have a team dedicated to automation so the rest of the organization doesn’t have to worry about it.

To recap, don’t try to boil the ocean or try to formally define everything you’ll need to automate in the next century. Just get started. Start small, and automate the simple stuff, even if it involves writing a few scripts and running them with cron, or focusing on automated troubleshooting initially. You’ll find there’s a lot more to do once you’ve simply gotten started, and you’ll have a lot more confidence to keep it up.

The Importance of Executive Buy-In

Again, it is very important that automation is done with a well-communicated purpose and strategy. That purpose has to focus on delivering value to the business—whether it’s better uptime, security, or just responding more quickly to changing business needs. Those metrics should already be tracked, and if you’re thinking about starting automation without these, you have the order wrong. The very first thing you should be addressing is how well you are communicating your short- and long-term technology goals with the business.

Once this is addressed, there are some very tangible benefits you’ll realize once you start down the automation path. First, any additional head count that’s needed for automation will be an easier pill to swallow. A very common complaint among engineers struggling to get automation started in their organization is the lack of resources to do it. With proper communication with the business, it will be widely understood that the cost of a bit of additional head count pales in comparison to ongoing outages resulting from either a totally manual process or half-baked automation tooling written by an overworked engineer.

However, the single most important reason to have frequent, quality communication regarding automation with the business leadership is that when things go wrong—and they will—you won’t find yourself ripping out all those new tools and processes, but rather working forward to fix them. We’ll discuss this later, but one of the reasons the hyper-scale web companies like Facebook and Google talk about embracing failure is because they ensure they learn from their failures, and strive to ensure they don’t encounter the same problems twice. Each failure is an opportunity to grow. Getting the business on board with this plan from the get-go will make sure that any failures are not only learned from, but planned for.

As an illustrative example, GitLab (the SaaS version of the software) famously had a significant outage in January 2017. Not only did the service go down, but restoring the service took 18 hours due to a series of previously undetected failures in their backup procedures. Rather than shut the doors and try to figure out what went wrong in private, GitLab published a Google Document outlining everything they knew, and exposed it publicly so users could see what was going on. They even live-streamed their work to bring the service back online. Once the service was available again, they published an extremely thorough blog post outlining not only what went wrong and how they fixed it, but what they’re putting into place to ensure this event doesn’t happen again. This went a long way to assure customers that GitLab was serious about their service and that they were interested in learning from mistakes.

The bottom line is that failures will happen and that automation doesn’t obviate the need for proper architectural design. Establishing a contract of transparency and frequent communication with the business leadership will help get the time and tools you need to build a proper foundation for an automation initiative.

Build Versus Buy

With all of IT becoming embroiled in the “open source movement,” it’s easy to get caught up in the hype. “Stop buying everything from vendors, and build everything yourself using open source software,” right? Unfortunately, this sentiment differs from reality.

Despite what the analysts might have you believe, big organizations like Facebook or Google—known for their automation chops—don’t build everything from scratch. At every part of their technology stack, they make compromises on what’s economical for them to “build” versus “buy off the shelf.” For instance, they may build their own servers for their huge data centers, but they don’t fabricate every single component from its base elements. They still have to buy something—they’ve just made the decision to go deeper into that stack. In contrast, they may find it perfectly acceptable to go with a canned solution from Cisco for their corporate wireless connectivity.

The point here is that everyone has to make this decision for themselves. It is likely that you are not at the scale of Facebook or Google, and therefore you won’t get the same benefits of building your own servers that they do, but it’s also very possible that you could benefit from taking on a bit more of the pie than you traditionally have.

A good rule of thumb to follow in Enterprise IT is that you can buy your way into 80% of the features you need. Vendors can’t cater to every Enterprise (try as they might), so they have to spread their engineering across their entire customer base and put out products that are good enough for the immediate use cases. This means that the technology stack you acquire in this way will leave about 20% of your specific use case unaddressed. Traditionally, Enterprise IT has just simply accepted this—but they don’t have to.

Even basic scripting can help fill in this remaining 20% feature gap. For instance—you may have a wireless controller that doesn’t generate reports the way you want. Instead of waiting for months/years for your vendor to change their UI (which may never happen), perhaps investigate if the controller comes with an API. Maybe you could write a Python script to retrieve this data and use a graphics library to generate some nice visuals for you. Note that this doesn’t mean you’re a software developer now—it’s simply knowing enough about scripting that you have an alternative to waiting years for your vendor to respond to your feature request.
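As a rough sketch of that idea, the script below summarizes client counts per access point from a controller's API response. The endpoint URL, the JSON field names (ap_name, client_count), and the function names are all hypothetical; a real controller's API will differ, so treat this as an illustration and consult your vendor's API documentation. A text bar chart stands in for a graphics library to keep the example free of third-party dependencies.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint; substitute whatever your controller actually exposes.
CONTROLLER_URL = "https://controller.example.com/api/access-points"


def fetch_access_points(url=CONTROLLER_URL):
    """Fetch the access point list as parsed JSON (assumed to be a list of dicts)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def clients_per_ap(access_points):
    """Map each AP name to its client count, using the assumed field names."""
    return {ap["ap_name"]: ap["client_count"] for ap in access_points}


def text_bar_chart(counts, width=40):
    """Render counts as a text bar chart, busiest AP first."""
    if not counts:
        return "no data"
    peak = max(counts.values()) or 1  # avoid dividing by zero when all counts are 0
    label_width = max(len(name) for name in counts)
    lines = []
    for name, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(width * count / peak)
        lines.append(f"{name.ljust(label_width)} | {bar} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample data in the assumed shape; swap in fetch_access_points() for live data.
    sample = [
        {"ap_name": "lobby", "client_count": 12},
        {"ap_name": "warehouse", "client_count": 3},
        {"ap_name": "floor-2", "client_count": 24},
    ]
    print(text_bar_chart(clients_per_ap(sample)))
```

Keeping the fetch, the data shaping, and the rendering in separate functions means the report logic can be exercised with sample data before it is ever pointed at a production controller.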

Each side of the “Build versus Buy” paradigm tends to have its own traits (Figure 1-1). Organizations that choose to go with a commercial, off-the-shelf solution in one area tend to rely on external resources for support, whereas a technology stack that’s built in-house tends to keep the support model in-house as well, which relies much more heavily on promoting self-sufficient expertise in that area.

Figure 1-1. Build versus Buy

However, no technology decision is so binary. A good way of looking at the “Build versus Buy” paradigm—as with many other concepts in this lesson—is as a spectrum. No organization builds everything from scratch. Each technology team must figure out the right mix for their business. In reality, each organization will have a combination of these two strategies.

Embracing Failure

It’s easy to become jaded against hyper-scale web companies talking about “failing fast,” and “embracing failure.” It does sound a bit absurd, doesn’t it? There are usually serious financial penalties associated with failure at the infrastructure layer, so it’s no wonder these ideas are met with a bit of resistance.

However, this resistance is often born out of a misunderstanding of the principle behind these catchphrases. The point of “embracing failure” isn’t that failure is awesome. It’s not. However, as bad as failure is, repeated failure is far worse. The idea behind “failing fast” is to never fail the same way twice. Go in with the assumption that failure will happen (because it will) and have a game plan for how you’ll learn from it. The reason it may appear that some of the web-scale companies are excited about failure is because with the processes and the culture they have, it usually represents a failure they’ve not yet seen, so they get to work on a new problem. They get to modify or build their systems in a way to account for that failure.

So, for the rest of us, the idea of embracing failure is not so different. The idea is to learn from failure, whether it’s an outage caused by a bug in the technology you already use, or someone fat-fingering an automation workflow or script and bringing down a datacenter. Failure happens with or without automation; the key is to understand and plan how your organization is going to react to it.

Note

This is why automated testing is so important. Putting this into place means that tests are not optional when changes are made; they’re literally part of how the change goes into production. Your automated tests are the machine-language version of the lessons you’ve learned in the past.
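A minimal sketch of that idea is a check that runs before every change and refuses to let a known-bad pattern through. The change format (a dict mapping interface names to settings) and both rules are invented for illustration; in practice the rules would come from your own outage history.

```python
def validate_change(change):
    """Return a list of problems found in a proposed change; empty means it may proceed.

    `change` is a hypothetical structure: {interface_name: {"vlan": int, "description": str}}.
    """
    problems = []
    for interface, settings in change.items():
        vlan = settings.get("vlan")
        # Lesson from a (hypothetical) past outage: VLAN IDs must fall within 1-4094.
        if not isinstance(vlan, int) or not 1 <= vlan <= 4094:
            problems.append(f"{interface}: VLAN {vlan!r} is outside 1-4094")
        # Lesson from another: undocumented ports are hard to troubleshoot later.
        if not (settings.get("description") or "").strip():
            problems.append(f"{interface}: missing description")
    return problems


def deploy(change, push):
    """Push a change only if validation passes; `push` is whatever applies the change."""
    problems = validate_change(change)
    if problems:
        raise ValueError("change rejected: " + "; ".join(problems))
    push(change)
```

Because deploy() always calls validate_change() first, there is no path to production that skips the checks, which is the property the note above describes.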

We’ve talked about the importance of obtaining buy-in from the business, and this is a big reason for doing just that. Failure is a natural part of IT regardless of where you are on the automation spectrum. Getting buy-in from the business can turn conversations about ripping out those “scripts gone wild” into conversations about learning from a failure and ensuring it doesn’t happen twice. Hold proper postmortems: be clear about where the problem occurred, bring data to the conversation, and be analytical rather than assigning blame. Failure isn’t always a sign that you’re doing the wrong thing; it can also be a sign that your technology stack or skill sets are maturing and experiencing growing pains. Include “what if” scenarios both in your architectural discussions and when coordinating resources and goals with the business. Build failure planning into everything you do, so there are no surprises when everyone has to rush to the NOC to fix something.

Failure is also a really common reason automation breaks down and organizations revert to manual processes. It may happen very subtly. Especially in network automation, when things go wrong, it’s tempting to circumvent the automation and log directly into infrastructure nodes, as you did before the automation was in place. Depending on how much automation is in place, this may literally be the only way to fix the problem, so this isn’t always a bad thing.

However, a good litmus test for the “automation health” of an organization is what that organization does to the automation after the failure. The healthiest organizations immediately work to modify the automation so the failure doesn’t occur again. We should learn from examples like the previously discussed outage at GitLab, which started working immediately after fixing the problem to ensure it never happened again. Software development teams do this often; when a bug is discovered in software, the bug is fixed, but a unit test is also created to re-create the parameters that caused the bug, to ensure that the bug is not reintroduced in the future (known as a regression).
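To make that concrete, here is a small sketch using Python's built-in unittest module. The helper function and the bug are hypothetical: imagine a script that once mangled interface names containing a subinterface, such as Gi0/1.100.

```python
import unittest


def normalize_interface(name):
    """Expand a short interface prefix to its long form (hypothetical helper).

    An earlier, buggy version (imagined for this example) split the name on "."
    and silently dropped the subinterface number.
    """
    prefixes = {"Gi": "GigabitEthernet", "Te": "TenGigabitEthernet"}
    for short, full in prefixes.items():
        # Only expand names that use the short form and are not already expanded.
        if name.startswith(short) and not name.startswith(full):
            return full + name[len(short):]
    return name


class NormalizeInterfaceTests(unittest.TestCase):
    def test_expands_short_prefix(self):
        self.assertEqual(normalize_interface("Gi0/1"), "GigabitEthernet0/1")

    def test_regression_subinterface_is_preserved(self):
        # Re-creates the exact input that triggered the original failure.
        self.assertEqual(normalize_interface("Gi0/1.100"), "GigabitEthernet0/1.100")
```

Saved to a file, these tests can be run with python -m unittest followed by the file name. The regression test stays in the suite permanently, so any future edit that reintroduces the bug fails before it reaches production.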

Failure happens. Learn from it, and use a process that helps ensure you don’t make the same mistake twice.

Skills and Education

Having the right skills has always been important in IT, especially if you want to differentiate yourself. This is certainly going to become even more critical as the pace of change continues to accelerate and you find your favorite technology stack being made obsolete by something new.

We’ll dive into a few specific areas of focus for enhancing your IT education and bringing your skill set into the next generation.

Learn What You Don’t Know

One of the most common reactions when we have conversations with peers or customers about what kind of things they can do to get started with automation is: “I didn’t even know that was possible!”

Indeed, working in IT can have a tendency to keep one in a bubble: constantly hearing different versions of the same message, and working on mostly the same things. This is one of the most dangerous scenarios for a technologist, because it’s worse than simply not knowing another technical discipline. In this scenario, you don’t know what you don’t know. It might never have occurred to you that you could use Python to talk to that switch in your broom closet, because those conversations just never made it into your world. If you stay in your bubble, you will have no idea what’s out there and will have difficulty growing as a technologist.

Fortunately, preventing this is easy. Frequently go outside your comfort zone. Jump into some kind of environment that operates outside your current “bubble.” There is a multitude of technologies that may be interesting to you, and you may never find out about them until you challenge yourself to explore a new area.

This has tremendous value to you in your own career development, and it also brings fresh ideas into your organization. This benefit isn’t always tangible, as evidenced by how difficult it can be to get approval to go to conferences, especially conferences that are outside your technical discipline. If you’re in a position where you’re in charge of deciding which conferences to invest in, realize that this is one of the least expensive ways to get fresh ideas into your own organization.

In short, go outside your comfort zone. Things don’t change that much in Enterprise IT, because our culture is very focused on and attached to IT vendors. These vendors have a vested interest in keeping things constant; rapid change doesn’t fit their sales model. One thing we can do to fight this is to stop getting all our ideas and guidance from vendors. Instead of going to your vendor’s big week-long marketing festival, maybe go to some smaller meet-ups, like your local network operator group. Or maybe some conferences outside your skill-set entirely, like a developer or automation conference.

Focus on Fundamentals

In any technical discipline, we always hear of new terms like “digital transformation” and “software defined” to describe the latest shiny new technology to enter the market. These terms give you a sense that you’re falling behind in the technology realm, and that buying the latest product (physical or virtual) will bring you back to the cutting edge.

In truth, most of us actually are a bit behind the times when it comes to technology. Especially in Enterprise IT, the technology stack can lag five, ten, or maybe even more years behind what’s considered the “cutting edge” stuff that folks in Silicon Valley are working with. However, buying the latest shiny product from your vendor never has and never will solve this problem for you. If this were the case, we would have solved this a long time ago. The real reason for stagnation in technology is an underinvestment in people and skill sets outside the traditional vendor-driven messaging we, at times, blindly follow.

Technology doesn’t really change that much. Speeds and feeds get bigger and better, and the industry can tend to go in strange directions at times, but it’s always a pendulum. Old patterns become new again, and the fundamental technology in use at the lowest level is usually the same. The TCP/IP you’ve known and loved from your earliest CCNA days is still very relevant in the latest Software Defined Networking products. The latest wireless products still reduce down to RF at their core.

This is one of the lessons that infrastructure professionals can learn from software developers. In general terms, infrastructure professionals don’t “build” as much as they “operate,” whereas software developers are accustomed to thinking like builders. To that end, software development is less of a skill than it is a collection of micro-skills. Just like a painter learns things like brush technique and the science of mixing colors, software developers pick up languages, tools, algorithms, and hardware knowledge, with the understanding that they will all become useful some day when the next big project calls for them.

These days, especially with the increasing importance of open source software in IT, the need for Systems or Computer Science fundamentals has never been higher. Learn about Linux. Explore a programming language. These fundamentals will help lead you to understanding more about what we’ve taken for granted in IT for so long. New IT products come out all the time, but they all run on hardware and software. So, whether you’re looking to make an entrance to a new discipline, or get deeper within your current one, focusing on the fundamentals is the best way to stay relevant across the vast, yet shallow changes in the IT stack over the long term.

Certifications?

Inevitably, we must answer the ever-popular question: “What is the value of IT certifications in the era of automation?” It’s an intriguing question, especially since it cannot be answered identically for everyone, as each one of us is at a different place in our IT career.

Certifications carry with them an implication that you know the material covered. So, while there are problems with IT certifications today, there’s no denying that there’s an interesting trade-off worth investigating. The certification will somewhat inflexibly define your capabilities in that area, but it’s something concrete and well-recognized by employers. Without certifications, you’d have to start every interview from scratch and prove to your potential employer that you know what you’re talking about. Certifications are a good way to short-circuit this, and certainly, if you’re new in IT, this is a very useful tool to have.

However, there are some limitations to what certifications bring to you. The value of this short circuit decreases over time as you gather more experience. In addition, certifications often serve the vendor first, so they won’t get you 100% coverage of everything you might want to know. You may find it useful to rely more on certifications at the beginning of your career, and as you gain experience, you can dive deeper into fundamentals, relying less on vendors to prove to employers what you know.

If you focus on the fundamentals, IT certifications become much more of a tactical tool than a career-defining education path. Certifications are perfectly fine for cutting through the initial stages of a hiring process, and for some employers, certifications are a requirement. However, understanding the fundamentals will not only help you win in an interview, it will also ensure that you continue to climb the technical ladder as the winds of IT change.

Won’t Automation Take My Job?!

One of the most common questions about automation is: “What will happen to my job?” Indeed, there’s a widespread belief that automation will mean a reduction in head count. After all, if a machine can do my job, who will pay me to do it manually?

This idea seems to be predicated on a few incorrect assumptions. The first of these is that automation is somehow an instantaneous, night/day difference, which is never the case. Automation is always incremental, and imperfect at every layer. You solve the easiest problems first, and gradually move up the stack to bigger problems. You occasionally go back and improve what you wrote last year.

Another incorrect assumption is that once automation is in place, there will be nothing left to do. This is also incorrect, not only because of the first reason given in the previous paragraph, but also because automation unlocks new capabilities you didn’t have before. It’s true that automation does eliminate the need for a warm-blooded human being to fill a certain role, but doing so creates new challenges that simply didn’t exist before, and those people should be reallocated to deal with the new problems. So while a certain role might be replaced by automation, there are always new opportunities opening up further up the stack.

So in a “post-automation” organization, it’s clear that roles and responsibilities will change. You still need good, well-trained people; they’ll just need to be reallocated to take on new challenges uncovered through the introduction of automation.

Summary

Hopefully this lesson has highlighted one very important truth: movements like “DevOps” aren’t just about new technology or tools, but they’re also not all about process, or even culture. DevOps is about all three working together. You have to approach your organization and your people with a systems mind-set, in the same way you might approach a technology problem. DevOps is about optimizing a human system and improving communication so that all three pillars of IT are working in harmony. Proper communication is extremely critical. You will not have success if you don’t understand how the business works. Nor will you have success if you are not able to communicate the value of what you’re doing.

Many of the thoughts shared in this lesson stem from the belief that, while society often likes to break every issue down into a binary, absolute choice between two polar opposites, the reality is often much more like a spectrum. This is very true in IT as well. What works for one organization may not work for you. It is up to you to take the pragmatic approach and really think about the problems you’re trying to solve. Don’t simply rely on IT analysts or big web-scale companies to make your strategic technology decisions for you.

Finally, the journey you need to undergo can’t possibly be contained in a single lesson. You need to get involved with other communities of people that have already made this journey, so you can learn from their mistakes. One great resource for this is the Slack team for “Network to Code” (sign up for free at http://slack.networktocode.com/). There, you’ll find over 50 channels focused on various topics related to infrastructure automation, broken down by vendor or open source project. Especially if you’re new to automation, this is an extremely good place to get started.

In conclusion, be the “automator”—not the “automated.”
