Imagine that your business is based in a large city, and management has decided to move all the employees from several small satellite locations to a centrally located office building. Your boss calls you in and tells you that you are going to volunteer to head the committee that gets to decide where everyone will sit in the new building. Aren’t you lucky? So, how would you go about it?
There are a lot of right ways you could organize the employees, but if it were me, I’d say we need to attack the problem on two main fronts: organizational and security.
A good place to start would be to examine the natural employee order you already have in place, all while identifying major job types and existing teams and groupings. But looking at the employees strictly from an organizational perspective isn’t going to be enough. This is mainly because, in real life, people don’t all sit in locked offices with security at the office door and nowhere else. Some people are in private offices, some in cubes, some in open work areas, and wherever they may sit, they tend to work and interact in groups. To facilitate that, building security is typically handled in multiple layers, from the entrance to the floor, to the building section, and sometimes down to the room itself. Organizing the employees so that the groupings make sense from both an organizational and a security viewpoint will drastically simplify things.
Google Cloud may not be an office building, but there are a lot of conceptual similarities when it comes to setting up our resource hierarchy and, later, your access control.
We’ve mentioned a few times that Google Cloud projects are logical collections of services with a billing account attached, and that folders are logical groupings of projects and other folders, used for organizational purposes and to define trust boundaries.
I can’t tell you how many times I’ve shown up at an organization and discovered that they had a hundred projects and not a single folder in sight. In my mind, I was thinking, “You’re doing it wrong.” Not that working without folders can’t work – it’s that using only projects means that all the security and policy settings are handled on a project-by-project basis. You can’t tell me that there aren’t groups of projects with the same organization policies and security settings, perhaps managed by the same teams? Instead of a hundred projects, each with their own settings, it would be much easier to group projects with similar policy configurations into folders, apply the policies at the folder level, and then let those policies inherit down to the projects themselves. In the long run, that’s so much easier to manage.
Time to get to work! In this chapter, we’re going to introduce infrastructure automation with Terraform, and then use Terraform to help set up our resources. We will be covering the following topics:
When I was in high school, my dad got his pilot’s license and bought a small, twin-engine Comanche. Imagine a smallish four-seater car with wings, with some luggage thrown behind the two back seats – that’s pretty much what it’s like. The plane rolled off the assembly line in 1965, but my dad upgrades it constantly. Commercial pilots would drool over that little plane’s electronics. Honestly, I’m not sure if he likes flying as much as he likes upgrading his plane.
When I was a senior in high school, I remember that we took a trip. As the pilot, he was in the left front seat, and I was riding shotgun next to him. We were getting ready to take off, and he pulled out a laminated card of steps. It looked like original 1965 equipment.
I said, “What’s the matter, can’t you remember how to take off?”
He looked at me through these big black rectangular glasses and deadpan said, “Sure, but this way, if some smartass interrupts me in the middle of things, I’m much less likely to forget a step and kill us all.”
We have all worked through something using a checklist. Checklists are both good and bad. They’re good because they remind us what to do, and in which order, but they’re bad because when we go through them manually, there’s always a chance we will forget something. As the list gets longer, and as we go through it more times, the chance of us missing something increases. Going through checklists can also be quite tedious.
Google calls most cloud-related checklists toil. In Google’s original book on building and running production systems, Site Reliability Engineering (SRE), which you can read for free at https://sre.google/books, Google defines toil as, “the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.”
Planning a resource hierarchy of projects and folders is an art and a science. Building it, and managing that structure over time, is toil.
Imagine planning the resource hierarchy for a data processing application you are moving into Google Cloud. Throughout the week, various company websites generate a folder full of files containing website usage data from direct web visitors and various banner advertisement services. Every Saturday, the application runs a Spark job that extracts and transforms the data in the files and throws the results into a big on-premises database. During the week, data scientists run queries against the database to study website usage trends. Since the table size has increased (it now contains over a trillion records), the query efficiency has dropped. A decision has been made to move the whole kit-n-caboodle to Google Cloud. The plan is to collect and store the files in Cloud Storage, do the Saturday Spark job on an ephemeral Dataproc cluster, and store and query the results in BigQuery.
From a project perspective, how would you approach this problem? Build a single project and do everything there? You could, sure, but Google would recommend that you build at least three projects – one for development; one for non-production staging, testing, and QA work; and one for the production system itself. And to make things properly toil-worthy, you don’t just create three identical projects once. At first, you won’t know all the infrastructural pieces you will need, which means elements will change over time. So not only do you need to build three identical sets of infrastructure, you will need to evolve them as your design changes while keeping them as similar to each other as possible.
Another related decision would be, are you going to store the data, process the data, and query the data out of a single set of projects, or are you going to store and process the data in one set, then move the results into a more centralized data warehousing project? So, you may not be talking about three projects – there could be considerably more.
Now, imagine the problem multiplying as the number of applications and systems expand.
Toil.
Let’s see how Terraform and other Infrastructure as Code (IaC) technologies can help.
If you need to create and manage three identical projects, the manual way would be to use checklists, build your initial development project, and document all the infrastructure. As the application evolves through various infrastructural changes, you must keep tweaking the checklist (and please lord, don’t miss documenting anything). Hold off on the non-production and production projects until you get close to your minimum viable product. As late in the game as possible, use your checklist to create the non-production project. Do your testing and QA work, and if any changes are needed, then document them, and retrofit them into development and non-production until you are ready to go to production. Finally, take your checklist and build the production project.
Matter of fact, where’s the new guy? Following a super long and detailed checklist sounds boring, so why don’t we get him to do it?
A much better approach would be to create a script that, in effect, is an executable version of a checklist. Basic scripting using bash files based on gcloud and other commands may not have much elegance, but they are tried and true and can get the job done. Like any scripting, if possible, store the files in a version control system such as Git. Even better, why not automate how the scripts are applied with a continuous integration (CI) and continuous delivery (CD) pipeline? Now, we’re getting code-like – that is, creating our infrastructure with code.
And thus, Infrastructure as Code (IaC) was born. IaC is the process of provisioning and managing your cloud infrastructure using scripts and other configuration files, which are typically stored in Git version control and ideally applied using CI/CD pipelines.
Though the bash scripting approach will work and is quite common in a lot of environments, it lacks many of the key benefits offered by tools specifically designed for IaC. Bash scripts, while flexible, are wordy and complex when it comes to creating infrastructure, and it’s even harder to write them so that they can adapt to inevitable change without tearing out the whole environment and starting from scratch. While there are several IaC tools, the two most commonly used with Google Cloud are Deployment Manager and Terraform.
Deployment Manager is an IaC tool built right into Google Cloud. It came out as Generally Available (GA) in 2015 and has been a core GCP tool ever since. Using a mix of YAML, Python, and Jinja2 files, it can automate how infrastructure is created, managed, and removed. Though it is nicely integrated right into Google Cloud, it has always felt a bit clunky, and its popularity has never really taken off. Also, with more and more companies operating their infrastructure over a mix of on-premises and cloud environments, a tool with broader applicability does have its draw. Enter Terraform:
Terraform (TF) is an open source IaC tool created by HashiCorp (https://www.hashicorp.com). HashiCorp provides several freemium products, including Terraform, Packer, and Vault. TF has more than 1,000 community-created providers, allowing it to create and manage infrastructure in most cloud and on-premises environments. TF configuration files are written in a human-readable format known as HashiCorp Configuration Language (HCL), and the overall creation and application of those files can be accomplished using the same steps, everywhere.
While this information is accurate, it’s also sort of misleading. Yes, TF can create and manage infrastructure in all the major cloud providers, as well as most on-premises environments. Yes, the major steps you go through to use TF in any environment are always the same. And yes, the majority of your work will be creating and using HCL files.
But.
The HCL to create a VM in AWS, and the HCL that’s used to create a VM in GCP, while sharing a lot of logical similarities, are quite different. When I hear people say, “Terraform is cloud-agnostic,” it makes my eye twitch. To me, cloud-agnostic would be one configuration file that works in all the major clouds with little to no changes. That isn’t true in TF. Using TF for different environments, if that is one of your goals, will require you to get good at working with different HCL resources and their nested key-value configurations.
But even with the need to create environmentally-specific HCL, the fact that the logical process for using TF never changes has a lot of advantages. That, when combined with TF’s capabilities in the Google Cloud space, means that TF is a far superior option compared to Google Cloud’s Deployment Manager. This is likely why, starting in 2022, Google now recommends TF as the first choice when selecting an IaC tool for use with Google Cloud. Using it will reduce toil, improve consistency, and allow you to build security and infrastructure best practices right into your resource hierarchy. So, let’s get TF working for us.
TF is a command-line utility written in Go that can be downloaded from the HashiCorp website. Since it’s commonly used with Google Cloud, you can also find it pre-installed in Cloud Shell. If you’d like to install it yourself on a laptop or VM, please go to https://learn.hashicorp.com/tutorials/terraform/install-cli.
To use TF, you must run through five logical steps:
Since these steps are used to create, modify, and destroy infrastructure, you tend to iterate through them multiple times. Putting them into a simplified graphic, they would look like this:
To help learn how TF works, let’s look at an example. We’ll start by setting up the recommended TF folder and file structure, with a set of scripts we’ll use to create a virtual machine and a Cloud Storage bucket. Then, we’ll learn how to use TF to apply the scripts and build the infrastructure with the Terraform CLI. Next, we’ll examine the results in the cloud and the state files that TF creates and uses to speed up future infrastructure changes. With some general TF knowledge under our belt, we’ll turn back to our foundation laying and create a foundation for future TF usage in GCP, all while following the recommendations laid out in Google’s security foundations guide.
Let’s get to work.
Imagine that you have a basic Linux, Apache, MySQL, and PHP (LAMP) stack application you are planning to build on a VM, and you know it will need to be able to store files in a Cloud Storage bucket. You would like to use TF to create the VM and Cloud Storage bucket. In reality, a lot of ancillary components would need to be added into the mix to make this work at the production level – from a service account to network configurations to firewall rules – but we’ll leave them out for now to simplify this example.
Note – The Files for This Example Can Be Found on GitHub
If you check out the GitHub repository at https://github.com/PacktPublishing/The-Ultimate-Guide-to-Building-a-Google-Cloud-Foundation, you will see the complete and working tf_lamp_fun subfolder.
To get started, you need to set up the TF folder structure. Though it’s possible to create a single folder and do everything within it, even using a single .tf script file, as your infrastructure becomes more and more complex, it’s very helpful to divide and conquer with TF modules. A Terraform module is a folder of TF files (literally, files with a .tf extension) that all work to handle some part of your infrastructure. The root module is either the top-level folder itself or some folder just off the top level (for example, if you wanted to have separate development and production root modules, which we will see later in this chapter). The root module controls the overall process, mostly by putting all the other modules into play. Folders that are typically nested in a /modules folder are submodules of that overall process. By default, you should create four text files in each module:

README.md is where you document what that module/submodule is doing.
main.tf is where you enter the configurations that do the work.
variables.tf defines and describes any dynamic variables the script expects, providing default values where appropriate.
outputs.tf defines and describes any output values from that particular script.
Note – Terraform Doesn’t Care about Filenames, It’s about the Extension
While the filenames and purposes I have used in this example are typical, and even a recommended best practice, they are in no way required. TF will evaluate any .tf files it finds in the current folder and process them in the order that TF feels is best. The folder structure and naming conventions are helpful only insofar as they remind you what’s defined where. If you feel that it is helpful to subdivide a module by splitting the main.tf file into multiple, smaller .tf files, then that is perfectly acceptable.
So, before I do anything else, let me go ahead and get that initial structure set up. For now, all the files are empty text files. For this example, the folder structure may look as follows. main.tf 1 would be the root script, main.tf 3 would be responsible for creating computing resources (in this example, a VM), and main.tf 2 would handle storage – that is, the Cloud Storage bucket:
Let’s start with the compute module. Defining resources in Terraform HCL can be accomplished by creating resource blocks in your TF files with the following syntax:
resource "some_resource_type" "name_for_this_resource" {
  key  = "value"
  key2 = "value"
}
Here, some_resource_type is a provider-specific name for the kind of resource being created, name_for_this_resource is a name you choose and use to refer to the resource elsewhere in the TF script files, and the key-value pairs are configurations that the resource type expects.
So, if you’re going to use TF to create a VM in Google Cloud, you need to check out the documentation for the TF Google Cloud provider for Compute Engine. A TF provider is like a driver of sorts, and it provides TF with a set of resources it knows how to implement. We’ll see in a bit how the provider gets loaded into the root module, but for now, let’s focus on the Google Cloud provider we’ll be using to create a VM.
The documentation for the Google Cloud provider can be found here: https://registry.terraform.io/providers/hashicorp/google/latest/docs. On the Google Cloud Platform Provider home page, down the left-hand side, you will see links to various resource types that it knows how to manage. Expanding Compute Engine | google_compute_instance will lead you to the documentation for creating VMs in Google Cloud. There are a lot of options you could specify for a new VM, but many of them have default values. I don’t need anything fancy, so I’m going to create my VM like this:
resource "google_compute_instance" "app_vm" {
  name                      = "demo-vm"
  machine_type              = var.machine_type
  zone                      = var.zone
  allow_stopping_for_update = true

  boot_disk {
    initialize_params {
      image = "ubuntu-2110-impish-v20220204" //ver 21.10
    }
  }

  network_interface {
    network = "default"
    access_config { //This adds an ephemeral public IP
    }
  }
}
This script creates a GCP VM. How can I tell? Because the resource type is set to google_compute_instance. The name and boot image values for the VM have also been explicitly declared. The machine type of the VM (chipset, CPU count, and amount of RAM) and the Google Cloud zone where the VM will be created were pulled from variables. The referenced variables (var.xxx) should be declared in the variables.tf file and their values can be provided in several different ways, as we’ll cover soon:
variable "zone" {
  description = "Zone for compute resources."
  type        = string
}

variable "machine_type" {
  description = "GCE machine type."
  type        = string
  default     = "n1-standard-4"
}
Anyone using this module can look at my variables.tf and learn something about the variables, which ones have default values, and which ones must be assigned a value.
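Though this chapter’s example doesn’t use it, Terraform also lets a module defend its inputs with a validation block inside a variable declaration. Here’s a minimal sketch – the n1-only rule is purely illustrative, not part of the book’s example:

```hcl
variable "machine_type" {
  description = "GCE machine type."
  type        = string
  default     = "n1-standard-4"

  // Illustrative guard: reject anything outside the n1 family
  validation {
    condition     = can(regex("^n1-", var.machine_type))
    error_message = "Only n1-family machine types are allowed by this module."
  }
}
```

If a caller passes, say, an e2 machine type, terraform plan fails with that error message instead of attempting the deployment.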
If you’d like to pass anything back from a module to the caller – either something needed as input to some other module (such as a machine IP address) or something you’d simply like to be printed back to the user when they run the script – you can add that to the outputs.tf file:
output "vm_public_ip" {
  value       = google_compute_instance.app_vm.network_interface.0.access_config.0.nat_ip
  description = "The public IP for the new VM"
}
Once the submodule has been fully defined, it can be called from your root module. Before doing that, though, the top of your root module (main.tf 1 in Figure 4.3) should start by loading the provider you are using for this series of scripts. For us, that’s Google Cloud’s provider, which is documented here: https://registry.terraform.io/providers/hashicorp/google/latest/docs. Our root script will start something like this:
provider "google" {
  project = var.project_id
}
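One thing this minimal provider block leaves implicit: HashiCorp recommends also pinning the provider source and version in a terraform block, so every teammate and pipeline run resolves the same provider. A hedged sketch – the version constraints here are illustrative, not the book’s:

```hcl
terraform {
  required_version = ">= 1.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 4.0"
    }
  }
}
```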
After loading the provider, a typical root module will call all the child modules, pass them any variable values they require, and let them do the majority – if not all – of the work. So, the rest of my main.tf file might look like this:
module "compute" {
  source = "./modules/compute"
  zone   = var.zone

  depends_on = [
    module.storage
  ]
}

module "storage" {
  source     = "./modules/storage"
  region     = var.region
  project_id = var.project_id
}
Notice that for compute, I’m specifying the VM’s zone using another variable, but I left out the machine type because I was happy with its default. I also wanted to make sure the GCS bucket was defined before the VM, so I added depends_on to the VM, which tells TF to make sure the storage module has been fully created before processing compute. TF can typically determine a good order of creation based on physical dependencies between resources, but if there is a dependency that TF can’t glean from the script, then this is a way to force ordering.
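As an aside, depends_on wouldn’t be needed if compute actually consumed something the storage module produced, because TF infers ordering from references. A sketch, assuming a hypothetical bucket_name output on the storage module and a matching input variable on compute:

```hcl
module "compute" {
  source = "./modules/compute"
  zone   = var.zone

  // Referencing a storage output creates an implicit dependency,
  // so TF builds the bucket before the VM without depends_on.
  bucket_name = module.storage.bucket_name
}
```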
Since I passed in values to the modules that, once again, came from variables, I must declare or re-declare those variables in the variables.tf file for the root module, like so:
variable "project_id" {
  description = "Project owning defined resources"
  type        = string
}

variable "zone" {
  description = "The default GCP zone"
  type        = string
}

variable "region" {
  description = "The default GCP region"
  type        = string
}
Let’s get back to the variables. There are several ways TF variable values can be provided when executing your scripts, including the following:
export TF_VAR_zone=us-central1-a
terraform apply -var="zone=us-central1-a"
I don’t want to have to specify the variables manually, so I’m going to create a .tfvars file:
project_id = "patrick-haggerty"
region     = "us-central1"
zone       = "us-central1-a"
With all the files in place, including the Cloud Storage files I’ve omitted from this book, let’s create our infrastructure.
Even though your TF module structure and files are in place, there are several key TF CLI commands you will need to continue. These commands will all be executed from within the root module folder:
To implement my example, I’m running my scripts out of Cloud Shell, which comes with the TF CLI pre-installed. To use the CLI, I change to the folder containing my root module. Here, that would be the tf_lamp_fun folder. First, I must run init to download all the requisite provider files, and then run validate to sanity check the various TF files throughout the module structure.
When you think everything is ready to roll, you can use terraform plan. It returns a hierarchical view of exactly what the scripts will do when applied, using several common symbols, including the following:
This is your chance to examine what TF is about to do. You’ll notice that the resources display the values for all the properties – the ones you explicitly assigned and the ones with defaults. If the plan appears to be doing what you like, then terraform apply it.
If you create and run a series of TF scripts from the command line, you’ll notice that several files are created:
By default, each time terraform plan/apply is executed, TF refreshes the state file from the live infrastructure. This ensures that no changes have been made behind TF’s back. As a best practice, all infrastructure changes should be made through TF, making the refresh an unneeded and potentially time-consuming step. Adding -refresh=false will disable this functionality.
Two additional TF CLI commands related to state are as follows:
Remember that TF isn’t just used to create and destroy infrastructure – it’s also used to update it. After some testing, I’ve decided that the default n1-standard-4 machine type for my VM is too big, wasting both resources and money. I’d like to change it down to n1-standard-1. The compute module already has the variable set up for machine_type, but the root module doesn’t. One option would be to update the variables.tf and terraform.tfvars files, just like we did with our other variables, but I’d like to show a slightly different approach this time. I’m going to go to the top of my root main.tf file and, just below provider, add some locals. locals let you define named values directly in a specific .tf file:
locals {
  cpu_count   = "4"
  chip_type   = "standard"
  chip_family = "n1"

  machine_type = format("%s-%s-%s", local.chip_family, local.chip_type, local.cpu_count)
}
If you’re thinking, “Hey, we could use locals to define all the variables,” you could, but I wouldn’t. This approach tends to work best when you use the .tfvars file for key naming elements, such as base names, and then use tricks such as the format() function, to combine the base elements into variable values. Say you’re creating a bucket, and your buckets all use the bkt-business_name-app_name-#### format, where the business name and app name could be defined in .tfvars and then used as name parts for multiple resources.
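To make that concrete, here’s what such a naming scheme might look like in locals – the base names are hypothetical:

```hcl
locals {
  business_name = "acme"     // hypothetical values for illustration
  app_name      = "webstats"

  // Yields "bkt-acme-webstats-0001"
  bucket_name = format("bkt-%s-%s-%04d", local.business_name, local.app_name, 1)
}
```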
To use the local variables I created here, I could update the compute module’s details, like so:
module "compute" {
  source       = "./modules/compute"
  zone         = var.zone
  machine_type = local.machine_type //notice the local.

  depends_on = [
    module.storage
  ]
}
When I run terraform plan, I will see the following:
~ resource "google_compute_instance" "app_vm" {
…
~ machine_type = "n1-standard-4" -> "n1-standard-1"
Note the ~, indicating that the resource will be updated in place. So, TF is going to power the machine down, edit its machine type specification, and then restart it. Nice.
Note – Terraform Has a Lot of Useful Functions
You can find them documented at https://www.terraform.io/language/functions.
Running TF scripts out of Cloud Shell, as I did for my demo, is nice and easy, but it will quickly become problematic if you try to use that same approach to manage an organization full of infrastructure. Some such problems are as follows:
Since Google has switched its recommended IaC automation tool from Deployment Manager to TF, they have also released some foundational best practices in a PDF with corresponding TF scripts. Many of the best practices have been built into this book, but you can check out Google’s latest security foundations guide at https://services.google.com/fh/files/misc/google-cloud-security-foundations-guide.pdf. The corresponding TF scripts can be found at https://github.com/terraform-google-modules/terraform-example-foundation. They are, in turn, based on Google’s more general examples of Terraform scripts, which can be found in their Cloud Foundation Toolkit at https://cloud.google.com/docs/terraform/blueprints/terraform-blueprints.
You don’t have to use all of Google’s TF scripts as-is, but they can make a good foundation for scripts you create on your own.
To get a solid foundation in place where you can use TF to manage the infrastructure across your entire Google Cloud footprint, you should start by creating two projects and placing them in a GCP folder toward the very top of your organizational structure: seed and cicd. Why two? Mostly because it keeps things very clean from a separation of concerns (SoC) point of view. CI/CD has its own set of requirements, so let’s package them all into one project, while everything TF-related lives in the other:
The easiest way to implement these two projects is to use the first part (0-bootstrap) of Google’s Example Foundation at https://github.com/terraform-google-modules/terraform-example-foundation/. This repository was built to implement an entire foundation layer that follows Google’s security foundations guide, which can be found at https://services.google.com/fh/files/misc/google-cloud-security-foundations-guide.pdf. It’s constructed from other Google TF blueprints, including bootstrap, folders, and project-factory. If the full sample org scripts don’t work for you, see if you can mix and match some of the others to get the work done the way you need.
The Example Foundation scripts can use Google’s Cloud Build or OSS Jenkins to do the CI/CD build work. To keep things simple, I’m going to stick with the default: Cloud Build.
To start, you will need an existing project in your organization that you can use as a launching pad. Alternatively, you could perform these steps using the Google Cloud SDK on your laptop. You will also need to log in to Google using your organizational administrator account. If you are using my naming scheme, then this will be in gcp-orgadmin-<first>.<last>@<domain> format.
Next, you must fork Google’s sample foundation repository of scripts so that you can edit them as you need. I’ve done exactly that, and I’ve put my copy in with the rest of the code from this book, so you can find it as a terraform-example-foundation sub-folder in my repository at https://github.com/PacktPublishing/The-Ultimate-Guide-to-Building-a-Google-Cloud-Foundation.
Note – You Should Use the Latest Version of the Example Foundation
I’m currently working with the 2021-12 version of Google’s blueprint guide and the 2022-06 latest version of the TF scripts. I don’t doubt that Google will be updating these files and that as they do, the steps I’m putting in this book – the steps that work with my fork of Google’s scripts – may start to drift from what you are seeing with the latest version. Please check the main repository and its documentation for updates.
A note on labels. In Google Cloud, a label is an extra identifier that helps you track resources at a finer grain. I’ll talk more about labels later in this chapter, so for now, think of them as sticky notes with identifying data, stuck to the side of various resources. Both the label key and its value may contain only lowercase letters (UTF-8), numeric characters, underscores, and dashes.
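In HCL, labels are just a key-value map on a resource. For instance, on the VM from earlier in this chapter – the label values here are purely illustrative:

```hcl
resource "google_compute_instance" "app_vm" {
  // ...other configuration as shown earlier...

  labels = {
    environment = "dev"
    team        = "data-eng"
    cost-center = "cc-1234"
  }
}
```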
Either on your laptop or in your launching pad project’s Cloud Shell window, clone down your copy of Google’s Cloud Foundation Toolkit scripts. Here’s an example where I’m using my copy. Again, this should be updated to reflect your fork of Google’s scripts:
git clone https://github.com/PacktPublishing/The-Ultimate-Guide-to-Building-a-Google-Cloud-Foundation
cd The-Ultimate-Guide-to-Building-a-Google-Cloud-Foundation/chapter04/terraform-example-foundation
Now that you have downloaded the scripts, let’s make some variable and value edits:
gcloud organizations list
gcloud beta billing accounts list
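The IDs those two commands return end up in the bootstrap module’s terraform.tfvars. Variable names can drift between versions of the Example Foundation, so treat this as a sketch and check your fork’s variables.tf; the values shown are placeholders:

```hcl
org_id          = "123456789012"         // from gcloud organizations list
billing_account = "01ABCD-EFGH23-456789" // from gcloud beta billing accounts list
default_region  = "us-central1"
```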
Excellent! Now that changes have been made to the variables, let’s use the scripts. Again, if you are using a newer version of Google’s example scripts, double-check their latest documentation and make sure these steps haven’t changed:
That should be it. If you look at the resource hierarchy for your organization in the GCP Console, by going to Navigation menu | IAM & Admin | Manage Resources, you should see a new fldr-bootstrap folder with the two new projects under it – that is, prj-b-cicd and prj-b-seed. You will also see the project labels we applied. This will look similar to the following:
Remember, the seed project will be doing the work, but next to no one should have direct access to it. The people administering your organizational structure will be doing that from the cicd project.
If you use the project selector (remember, that’s a little to the right of the Navigation menu’s hamburger button), you can navigate to your prj-b-cicd project. Once there, look at the project’s source repositories. You will see a series of Git repositories that have been created by the bootstrap process. Once you finish this book and have a better grasp of how this structure works, you will need to consider if this exact set of repositories makes sense in your organization, or if you will need something different. The list of repositories that have been created by the bootstrap TF scripts is one of its variables, so adding, removing, or renaming repositories will be easy to do, thanks to TF.
We will work with these various repositories and see how the CI/CD pipeline works shortly. First, though, it’s time to get to the fifth major step in Google’s 10 steps of foundation laying and start building our resource hierarchy.
Do you remember how we started this chapter? With figuring out where everyone got to sit in the new office building? It’s time to do exactly that, only for Google Cloud.
A well-planned Google Cloud resource hierarchy can help the entire GCP footprint by implementing key configuration and security principles in an automated way. If you’ve set up your bootstrap infrastructure, then you will have the CI/CD-driven TF projects in place, so let’s start laying in other key folders and projects. To do that, we’ll start by laying down some naming guidelines.
You can name your GCP folders and projects willy-nilly if you like, but when you’re implementing a level of automation, consistent naming makes sense. Do you remember the document we created back in Chapter 2, IAM, Users, Groups, and Admin Access, where we put the names for things such as our organizational administrator email address format? It’s time to break that document out again, or to create a new infrastructure naming document.
One of the standard naming features you may or may not decide to implement is Hungarian Notation. Throughout grade and high school, I attended an all-boys Catholic school in Irving, Texas named Cistercian. Most of the Cistercian monks at that time had escaped Hungary in or around the 1956 Hungarian Revolution, when Hungary temporarily threw out the Stalinist government it had been under since the end of WWII. Hungarians traditionally reverse their names, placing the family name before the given name. So, Hungarian-style naming would change my name from Patrick Haggerty to Haggerty Patrick.
In the 1970s, a Hungarian named Charles Simonyi, a programmer who started at Xerox PARC and later became the Chief Architect at Microsoft, came up with what we now call Hungarian Notation. Hungarian Notation advocates starting names with a prefix that indicates what exactly is being named. For this book, I’m going to use Google’s naming recommendations, which will leverage a type of Hungarian Notation. You can decide if Google’s naming style works for you, or if you’d like to implement naming some other way.
To see the naming guidelines from Google, check out their Security Foundations guide at https://services.google.com/fh/files/misc/google-cloud-security-foundations-guide.pdf. Let’s start with the names we are going to need to create an initial resource hierarchy. Note that {something} implies something is optional:
Following Google’s recommendation, notice that I’m starting my names with a Hungarian Notation that identifies what is being named, such as fldr for folder. Then, I’m using a mix of standard and user-selected naming components. The following is for reference purposes:
Now that we have an idea of how we are going to name things, let’s discuss how we will lay out our resource hierarchy.
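A naming convention only helps if it’s actually followed, so consider wiring a quick check into your automation. Here’s a minimal sketch – the prefix list and the pattern are my assumptions based on the convention above, not an official Google validation rule:

```shell
# Hypothetical name checker for the prefix convention discussed above.
# The prefix list and regex are assumptions; extend them to match your
# own infrastructure naming document.
valid_name() {
  echo "$1" | grep -Eq '^(fldr|prj|grp|sa|vpc|net)-[a-z0-9]+(-[a-z0-9]+)*$'
}

valid_name "fldr-common" && echo "fldr-common: ok"
valid_name "MyRandomProject" || echo "MyRandomProject: rejected"
```

A check like this can run as a pre-commit hook or an early Cloud Build step, so a typo in a name fails fast instead of surfacing halfway through an apply.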
There are lots of ways your resource hierarchy can be laid out, and many would be perfectly acceptable. Instead of looking at lots of examples, let’s look at two good and one not-so-good designs. First, let me show you a very common design that is, unfortunately, what you shouldn’t do:
I can hear some of you now. “What? This is the bad example? But this is exactly what I was planning on doing!”
To get this design, I imagined looking at a company org chart, and I created a resource hierarchy that matched it. On the surface, this seems reasonable (which is why I see it all the time, unfortunately). So, what’s wrong with it? Let’s take a look:
Note – Folders Have Limits
Whichever design you decide on, keep in mind that there are some limits with folders. Currently, Google limits folder depth to no more than 10 levels, and any given folder can have up to 300 direct subfolders.
OK; if the org chart approach is wrong, what might be better?
Well, let’s start with a few Google Cloud principles and behaviors:
Google is a big fan of a basic folder structure, just enough to simplify policy application, and filled with lots of purpose-built projects that follow a consistent naming convention. I think the recommendation makes sense. Have you ever heard of the Keep It Simple Stupid (KISS) principle? You should apply it to your resource hierarchy design. Imagine your organization and what it does, then split that purpose along the most major lines, not department lines, but more “what can Google do for you” lines.
It might be because of my development background and my introduction to Google Cloud from that perspective, but I like the basic approach Google takes in its security blueprint guide. This is a design focused on the belief that you’re using Google primarily as a place to build and run applications. It’s simple but easily extensible. It may look like this:
See? Simple, right? All the shared resources sit in a common folder and the application-related projects all sit in folders related to their stage of development, ending with the bootstrap folder that hosts the cicd and seed projects. The layout allows us to assign base permissions and organizational policies at a very high, centralized folder level, and we can add any projects we need while following a pre-agreed-on naming convention.
While the design is nice from an application development and deployment view of the world, it doesn’t address needs around centralized data storage, regulatory isolation requirements, or multiple departments doing development. The fix? Add a few more folders.
If your operating units have a lot of autonomy, then you may wish to consider adding a layer of business unit folders between the org and env folders. This would allow the EMEA business unit to have their copy of the environments, and then the AMER business unit could do the same. If you have multiple development teams, then you could add team folders just inside each environment folder. If you have centralized data and data warehousing needs, then a folder for that can either be in the common folder, or you could add a new top-level folder and put those things there. In the end, including our folder naming convention, the design might be closer to this:
To fully flesh out your design, you should also consider things such as the following:
But your planning won’t stop with folders – you will also likely need some common-use projects to go along with them.
The most obvious type of Google Cloud project is one where you need to accomplish some specific task or application goal. Need to build a data warehouse in BigQuery? You’re going to need a project. Planning on building a REST service that does currency conversion? You’ll need a dev/non-prod/prod project for that too. But what about projects for background tasks? I’m not talking about the background as in behind a UI – I’m talking more about handling core Google Cloud resources such as secrets, centralized networking, and DNS configurations. Luckily, the Google Cloud security blueprint and its related Example Foundation Terraform project have some suggestions.
Let’s start with ideas for the common folder. Remember that fldr-common (don’t forget our naming convention) will be used to house projects serving broad, cross-department needs. Some project suggestions from the Google security blueprint are as follows:
Once you have your common folders planned out, think about projects that you may need in every environment. If you are using the dev/non-prod/prod design, then you may want to pre-construct projects that can centralize resources at the environment level. Some examples are as follows:
While these are two nice starter lists, they are in no way meant to be exhaustive or mandatory. You may not be using Interconnect, so you wouldn’t need that project. You also may not want a way to cross-share VPC traffic, in which case the hubs would be unnecessary. And there’s no telling what other centralized or environmental services you may need that are specific to your organization, so make sure that you tweak the design to something that makes sense to your organization.
Before we move on and apply our infrastructure, let’s talk a little about resource labeling.
To help you manage organizations as they grow, you should seriously consider coming up with and using a core set of labels. As I mentioned earlier, in Google Cloud, a label is an arbitrary key-value piece of metadata you can stick to the side of most GCP resources. They can be used for tracking spend at a more granular level, and to help better filter GCP logs, among other things.
Common labels include things such as team, owner, region, business unit, department, data classification, cost center, component, environment, and state. We are going to apply a nice standard set of labels to our projects, starting with the following:
For more information about labels, go to https://cloud.google.com/resource-manager/docs/creating-managing-labels. As a best practice, apply labels via a labeling tool or through automation such as TF.
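Because malformed labels fail when you try to apply them, a quick format check before TF runs can save a pipeline round trip. This sketch encodes GCP’s documented constraints for the plain-ASCII case (keys start with a lowercase letter; keys and values use lowercase letters, digits, underscores, and dashes, up to 63 characters; values may be empty) – note that the real rules also permit international characters, which this simple version ignores:

```shell
# Label format checks based on GCP's documented label constraints
# (ASCII subset only):
#   keys:   1-63 chars, start with a lowercase letter, then lowercase
#           letters, digits, underscores, or dashes
#   values: 0-63 chars from the same character set (may be empty)
valid_label_key()   { echo "$1" | grep -Eq '^[a-z][a-z0-9_-]{0,62}$'; }
valid_label_value() { echo "$1" | grep -Eq '^[a-z0-9_-]{0,63}$'; }

valid_label_key "cost-center" && echo "key ok"
valid_label_value "finance_42" && echo "value ok"
```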
To implement my resource hierarchy, I’m going to continue using Google’s Example Foundation, which is, in turn, built on several of their other blueprints. All the blueprints can be found at https://cloud.google.com/docs/terraform/blueprints/terraform-blueprints. The ones you should pay most attention to at this point are:
The resource hierarchy I’m going to implement at this point is the basic five-folder version I first showed you as a form of good design. It would be trivial to use the folders module to add the extra business unit and team folders if I decided to go that route.
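If I did add business unit or team folders later, a hedged sketch of what that might look like with the public folders module follows – the module source and input names are based on my reading of that module’s documentation, so verify them against its README before use:

```hcl
# Illustrative only: creating a pair of business unit folders under an
# environment folder with the terraform-google-modules folders module.
# The inputs shown (parent, names, prefix) are assumptions; check the
# module's README for your version.
module "bu_folders" {
  source = "terraform-google-modules/folders/google"

  parent = "folders/${var.env_folder_id}" # e.g., the fldr-dev folder's ID
  names  = ["bu1", "bu2"]
  prefix = "fldr" # yields fldr-bu1 and fldr-bu2
}
```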
One thing to keep in mind is that your folder design isn’t set in stone. It is possible to move projects from one folder to another, so it’s possible to start with one design and decide to modify it. Having said that, it’s a lot easier to begin with a simple design that works and then grow from there, thus minimizing the need to move things around.
Note – Moving Projects Will Have IAM Repercussions
If you move a project from one point in the resource hierarchy to another, any IAM configurations you’ve directly set inside the project will move with it. However, the inherited permissions will change based on the project’s new location in the hierarchy. For more information on moving projects, go to https://cloud.google.com/resource-manager/docs/moving-projects-folders.
Before we dive deeper into creating our folder structure, let’s talk about how the cicd and seed projects work.
When we created the cicd and seed projects, we didn’t talk about how to use them. As you may recall, the seed project is where the high-powered TF service account lives, and where the TF state information is stored and encrypted. The cicd project is where infrastructure managers will work to submit changes to the CI/CD pipeline.
Note – You May Not Have the Project Quota You Need
The following example creates a passel of projects. As a result, Google may throw an error related to your TF service account, essentially saying that the user doesn’t have enough of a quota to create and set up billing on so many projects. You should go ahead and ask Google to increase the SA’s quota. Go to https://support.google.com/code/contact/billing_quota_increase and request a quota increase of 50 for the TF service account you generated while creating your bootstrap project. This should be in your notes.
If you navigate to your cicd project (prj-b-cicd) and then go to Navigation menu | Source Repositories, you will see that you currently have several Git repositories. The 0-bootstrap module created them as part of its setup task, and the actual repository list was controlled by the cloud_source_repos TF variable. The current set of repositories includes gcp-environments, gcp-networks, gcp-org, gcp-policies, and gcp-projects. We will cover the purposes of these different repositories soon.
Next, if you return to the cloud console and navigate to Navigation Menu | Cloud Build | Triggers, you will see that there are several pairs of Cloud Build triggers attached to these same repositories. Cloud Build triggers watch for repository changes and respond by following a prescribed set of actions defined in a YAML file.
The CI/CD pipeline is going to integrate with Git via a persistent branch strategy. That is, we are going to create and use Git branches to differentiate stages, such as dev, non-prod, and prod, and split between planning and applying TF scripts.
So, the plan/apply triggers work against the dev, non-prod, and prod environments. If you want to plan a job in the production environment, you would need to do the following:
Not too shabby. It makes good, logical sense and is easy to use.
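The persistent-branch strategy boils down to a simple branch-to-action mapping, which can be sketched like this (the branch names follow the strategy described above; the one-line summary of what each trigger does is mine, so verify it against the Example Foundation’s documentation):

```shell
# Sketch of the persistent-branch CI/CD strategy: a push to the plan
# branch only produces a Terraform plan for review, while a merge into
# an environment branch applies the change to that environment.
trigger_for_branch() {
  case "$1" in
    plan) echo "terraform plan (review only)" ;;
    development|non-production|production) echo "terraform apply to $1" ;;
    *) echo "no trigger" ;;
  esac
}

trigger_for_branch plan
trigger_for_branch production
```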
To understand this process, let’s implement the 1-org and 2-environments parts of the Example Foundation.
The next step in Google’s TF Example Foundation is 1-org. It sets up several projects in the common folder for things such as org-level secrets, central networking, and org-level logging. It also configures a slew of base security features. Most of this step deals with things we will discuss later in this book (access, security, and logging) since Google’s 10-step Cloud Foundation deals with them in a slightly different order.
Though we don’t get into the details of exactly what 1-org does at this point, it will nicely illustrate how our CI/CD pipeline works.
To check out the latest instructions for 1-org, go to https://github.com/terraform-google-modules/terraform-example-foundation/tree/master/1-org.
To start, make sure you are logged into your organizational administrator account. Then, either open Cloud Shell in the same launchpad project you used to deploy the bootstrap project or go back to the terminal window on your laptop if that’s where you deployed from. You will need to locate the folder where you cloned down the Example Foundation. Some of the steps we will follow use the new beta TF validator to check some of the things we will do against a set of policy configurations. This is not a required component, but one that the Example Foundation uses, so I’m leaving it in.
Let’s get started by loading the Terraform validator policies into the related Git repository:
gcloud source repos clone gcp-policies \
--project=CICD_PROJECT_ID
cd gcp-policies
cp -RT ../terraform-example-foundation/policy-library/ .
git add .
git commit -m 'Initial commit, sample policies'
git push --set-upstream origin master
With the policies where we need them, let’s get a copy of Google’s base example organization in place. This will create the common folder and the central use projects we mentioned earlier in this chapter. Before we apply it, though, let’s get the files ready:
gcloud source repos clone gcp-org \
--project=CICD_PROJECT_ID
cd gcp-org
git checkout -b plan
cp -RT ../terraform-example-foundation/1-org/ .
cp ../terraform-example-foundation/build/cloudbuild-tf-* .
cp ../terraform-example-foundation/build/tf-wrapper.sh .
chmod 755 ./tf-wrapper.sh
gcloud access-context-manager policies list \
--organization YOUR_ORGANIZATION_ID \
--format="value(name)"
cp ./envs/shared/terraform.example.tfvars ./envs/shared/terraform.tfvars
Excellent! At this point, we have everything prepped. Let’s examine what we have and make a few configuration changes before we apply them.
Here, you will see a pair of basic Cloud Build configuration files. Cloud Build, as I mentioned earlier, is a serverless Google Cloud CI/CD tool. To tell it what you want it to do, you must create a configuration file, similar to the two we are examining. These scripts will decide whether to use your account or to use a central service account (we will be using the SA), drop the name of our central TF state bucket in where needed, and then call a bash script file to do the TF work. Here, you can see how the bash scripts (entrypoint: /bin/bash) are called and the arguments that are passed to each.
The script contains wrappers for all the TF steps, from init through apply, passing in appropriate arguments where needed.
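In spirit, the wrapper is a thin dispatcher around the Terraform CLI. The following is a heavily simplified, hypothetical sketch of that idea – not the real tf-wrapper.sh, which also handles per-environment directories and validation – with the commands echoed rather than executed so you can see the flow:

```shell
# Simplified, hypothetical version of what a tf-wrapper-style script
# does: translate a single action argument into the corresponding
# terraform invocation. Echoed here instead of executed.
tf_action() {
  local action="$1" env="$2"
  case "$action" in
    init)  echo "terraform init" ;;
    plan)  echo "terraform plan -input=false -out=${env}.tfplan" ;;
    apply) echo "terraform apply -input=false ${env}.tfplan" ;;
    *)     echo "unknown action: ${action}" >&2; return 1 ;;
  esac
}

tf_action plan production
```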
Next, if you look in the base of your gcp-org folder, you’ll see a subfolder named shared. Expanding it, you will see all the .tf files Google has created to do the work. Remember how TF works – any file ending in .tf will be applied. In this case, instead of a main.tf file that calls a bunch of modules, the designers simply broke the process down into a single folder containing lots of .tf files. We’ve seen some of these files previously (*.tfvars, backend.tf, and so on). Some of the others you should note are as follows:
Now that we’ve taken our tour, let’s tweak a few things:
gcloud organizations list
gcloud beta billing accounts list
That’s it for the required variables. Now, let’s look at some you may want to consider. For a full description of all the variables and their use, see the variables.tf file. Let’s look at a few key options.
Excellent! With the variables all set, it’s time to finish the 1-org segment of the Example Foundation.
git add .
git commit -m 'Initial org commit'
git push --set-upstream origin plan
https://console.cloud.google.com/cloud-build/builds?project=YOUR_CLOUD_BUILD_PROJECT_ID
git checkout -b production
git push origin production
Fantastic! At this point, you should have the shared folder and all its related projects in place. To see them, go to Navigation menu | IAM & Admin | Manage Resources, and investigate the new structure.
Now that you know how all this stuff works, I’m going to let you build out the prod/non-prod/dev environments using the steps at https://github.com/terraform-google-modules/terraform-example-foundation/tree/master/2-environments. You’ll see that the process is almost identical to what you just did:
Before you tackle these steps, I want to point out a few interesting things and variable settings. First, you’ll notice that this time, there are a couple of subfolders in gcp-environments:
If you examine the variables file in the top-most gcp-environments folder, you will find some of the same configurations we’ve changed in other projects (org_id, billing_account, and so on).
Before you add and push your plan branch, make sure that you update the root terraform.tfvars file. You will notice that one of the variables references a security group that doesn’t exist yet, designed to hold users that need to be monitored at the environment level. We will discuss this group later, but for now, add the group with no specific permissions to your Google Cloud environment.
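For reference, the tfvars entry in question might look like the following – the variable name (monitoring_workspace_users) matches the 2-environments terraform.example.tfvars at the time of writing, but double-check it against the variables.tf in your copy, and substitute your own group address:

```hcl
# Hypothetical terraform.tfvars fragment for 2-environments. Create the
# group in Cloud Identity first; it doesn't need any permissions yet.
# Variable name and value format are assumptions -- verify against your
# copy's variables.tf.
monitoring_workspace_users = "gcp-monitoring-workspace-users@example.com"
```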
Also, the env_baseline TF files, once again, apply labels to the projects. You may want to edit those before implementing the plan.
I’m going to hold off on implementing 3-networks and 4-projects as we need to discuss a few other things first.
Fabulous work! You now have a CI/CD process in place to manage infrastructure in a toil-free, automated way. You can create TF scripts to set up Google Cloud resources and you’ve completed the fifth step in our 10-step foundation plan. Hey, you’re halfway there!
In this chapter, we made fantastic progress laying our Google Cloud foundation by completing step 5 of Google’s 10-step recipe: creating a resource hierarchy to control logical organization. Not only did we learn how to plan and build a resource hierarchy, but we also simplified all our future steps by learning how to automate infrastructure management using a popular IaC product: Terraform. To make using TF easy and standard, we also implemented a CI/CD pipeline that allows us to continue working with our infrastructure from a central pair of projects.
If you want to keep moving through the checklist steps with me, your tutor, please move on to the next chapter, where you will learn how to use GCP security roles to control access across your presence in Google Cloud.