As an alternative to CustomResourceDefinitions, you can use a custom API server. Custom API servers can serve API groups with resources the same way the main Kubernetes API server does. In contrast to CRDs, there are hardly any limits to what you can do with a custom API server.
This chapter begins by listing a number of reasons why CRDs might not be the right solution for your use case. It describes the aggregation pattern that makes it possible to extend the Kubernetes API surface with a custom API server. Finally, you’ll learn to actually implement a custom API server using Golang.
A custom API server can be used in place of CRDs. It can do everything that CRDs can do and offers nearly infinite flexibility. Of course, this comes at a cost: complexity of both development and operation.
Let’s look at some limits of CRDs as of the time of this writing (when Kubernetes 1.14 was the stable release). CRDs:
Use etcd
as their storage medium (or whatever the Kubernetes API server uses).
Do not support protobuf, only JSON.
Support only two kinds of subresources: /status and /scale (see “Subresources”).
Do not support graceful deletion.1 Finalizers can simulate this but do not allow a custom graceful deletion time.
Add significantly to the Kubernetes API server’s CPU load, because all algorithms are implemented in a generic way (for example, validation).
Implement only standard CRUD semantics for the API endpoints.
Do not support cohabitation of resources (i.e., resources in different API groups or resources of different names that share storage).2
A custom API server, in contrast, does not have these restrictions. A custom API server:
Can use any storage medium. There are custom API servers, such as:
The metrics API server, which stores data in memory for maximum performance
API servers mirroring a Docker registry in OpenShift
API servers writing to a time series database
API servers mirroring cloud APIs
API servers mirroring other API objects, like projects in OpenShift that mirror Kubernetes namespaces
Can provide protobuf support like all native Kubernetes resources do. For this you must create a .proto file by using go-to-protobuf and then using the protobuf compiler protoc
to generate serializers, which are then compiled into the binary.
Can provide any custom subresource; for example, the Kubernetes API server provides /exec, /logs, /port-forward, and more, most of which use very custom protocols like WebSockets or HTTP/2 streaming.
Can implement graceful deletion as Kubernetes does for pods. kubectl
waits for the deletion, and the user can even provide a custom graceful termination period.
Can implement all operations like validation, admission, and conversion in the most efficient way using Golang, without a roundtrip through webhooks, which add further latency. This can matter for high performance use cases or if there is a large number of objects. Think about pod objects in a huge cluster with thousands of nodes, and two magnitudes more pods.
Can implement custom semantics, like the atomic reservation of a service IP in the core v1 Service
kind. At the moment the service is created, a unique service IP is assigned and directly returned. To a limited degree, special semantics like this can of course be implemented with admission webhooks (see “Admission Webhooks”), though those webhooks can never reliably know whether the passed object was actually created or updated: they are called optimistically, but a later step in the request pipeline might cancel the request. In other words: side effects in webhooks are tricky because there is no undo trigger if a request fails.
Can serve resources that have a common storage mechanism (i.e., a common etcd
key path prefix) but live in different API groups or are named differently. For example, Kubernetes stores deployments and other resources in the API group extensions/v1
and then moves them to more specific API groups like apps/v1
.
In other words, custom API servers are a solution for situations where CRDs are still limited. In transitional scenarios where it is important to not break resource compatibility when moving to new semantics, custom API servers are often much more flexible.
To learn how custom API servers are implemented, in this section we will look at an example project: a custom API server implementing a pizza restaurant API. Let’s take a look at the requirements.
We want to create two kinds in the restaurant.programming-kubernetes.info
API group:
Topping
Pizza toppings (e.g., salami, mozzarella, or tomato)
Pizza
The type of pizza offered in the restaurant
The toppings are cluster-wide resources and hold only a floating-point value for the cost of one unit of the topping. An instance is as simple as:
apiVersion
:
restaurant.programming-kubernetes.info/v1alpha1
kind
:
Topping
metadata
:
name
:
mozzarella
spec
:
cost
:
1.0
Each pizza can have an arbitrary number of toppings; for example:
apiVersion
:
restaurant.programming-kubernetes.info/v1alpha1
kind
:
Pizza
metadata
:
name
:
margherita
spec
:
toppings
:
-
mozzarella
-
tomato
The list of toppings is ordered (like any list in YAML or JSON), but the order does not really matter for the semantics of the type. The customer will get the same pizza in any case. We want to allow duplicates in the list in order to allow, say, a pizza with extra cheese.
All this can be implemented easily with CRDs. Now let’s add some requirements that go beyond the basic CRD capabilities:3
We want to allow only toppings in a pizza specification that have a corresponding Topping
object.
We also want to assume that we first introduced this API as a v1alpha1
version but eventually learned that we want another representation of the toppings in the v1beta1
version of the same API.
In other words, we want to have two versions and convert seamlessly between them.
The full implementation of this API as a custom API server can be found at the book’s GitHub repository. In the rest of this chapter, we will go through all the major parts of that project and learn how it works. In the process, you’ll see a lot of the concepts presented in the previous chapter in a different light: namely, the Golang implementation that is also behind the Kubernetes API server. A number of design decisions highlighted in CRDs also will become clearer.
Hence, we highly recommend you read through this chapter even if you don’t plan to go the route of a custom API server. Maybe the concepts presented here will be made available for CRDs as well in the future, in which case having knowledge of custom API servers will be useful to you.
Before going into the technical implementation details, we want to take a higher-level view of the custom API server architecture in the context of a Kubernetes cluster.
Custom API servers are processes serving API groups, usually built using the generic API server library k8s.io/apiserver. These processes can run inside or outside of the cluster. In the former case, they run inside pods, with a service in front.
The main Kubernetes API server, called kube-apiserver
, is always the first point of contact for kubectl
and other API clients. API groups served by a custom API server are proxied by the kube-apiserver
process to the custom API server process. In other words, the kube-apiserver
process knows about all of the custom API servers and the API groups they serve, in order to be able to proxy the right requests to them.
The component doing this proxying is inside the kube-apiserver
process and is called kube-aggregator
. The process of proxying API requests to the custom API server is called API aggregation.
Let’s look a bit more into the path of requests targeted at a custom API server, but coming in at the Kubernetes API server TCP socket (see Figure 8-1):
Requests are received by the Kubernetes API server.
They pass the handler chain consisting of authentication, audit logging, impersonation, max-in-flight throttling, authorization, and more (the figure is just a sketch and is not complete).
As the Kubernetes API server knows the aggregated APIs, it can intercept requests to the HTTP path /apis/aggregated-API-group-name
.
The Kubernetes API server forwards the request to the custom API server.
The kube-aggregator
proxies requests under the HTTP path for an API group version (i.e., everything under /apis/group-name
/version
). It does not have to know the actual served resources in the API group version.
In contrast, the kube-aggregator
serves the discovery endpoints /apis and /apis/group-name
of all aggregated custom API servers itself (it uses the defined order explained in the following section) and returns the results without talking to the aggregated custom API servers. Instead it uses the information from the APIService
resource. Let’s look at this process in detail.
For the Kubernetes API server to know about the API groups a custom API server serves, one APIService
object must be created in the apiregistration.k8s.io/v1
API group. These objects list only the API groups and versions, not resources or any further details:
apiVersion
:
apiregistration.k8s.io/v1beta1
kind
:
APIService
metadata
:
name
:
name
spec
:
group
:
API-group-name
version
:
API-group-version
service
:
namespace
:
custom-API-server-service-namespace
name
:
-API-server-service
caBundle
:
base64-caBundle
insecureSkipTLSVerify
:
bool
groupPriorityMinimum
:
2000
versionPriority
:
20
The name is arbitrary, but for clarity we suggest you use a name that identifies the API group name and version—e.g., group-name-version
.
The service can be a normal ClusterIP
service in the cluster, or it can be an ExternalName
service with a given DNS name for out-of-cluster custom API servers. In both cases, the port must be 443. No other service port is supported (at the time of this writing). Service target port mapping allows any chosen, preferably nonrestricted, higher port to be used for the custom API server pods, so this is not a major restriction.
The certificate authority (CA) bundle is used for the Kubernetes API server to trust the contacted service. Note that API requests can contain confidential data. To avoid man-in-the-middle attacks, it is highly recommended that you set the caBundle
field and not use the insecureSkipTLSVerify
alternative. This is especially important for any production cluster, including a mechanism for certificate rotation.
Finally, there are two priorities in the APIService
object. These have some tricky semantics, described in the Golang code documentation for the APIService
type:
// GroupPriorityMininum is the priority this group should have at least. Higher
// priority means that the group is preferred by clients over lower priority ones.
// Note that other versions of this group might specify even higher
// GroupPriorityMinimum values such that the whole group gets a higher priority.
//
// The primary sort is based on GroupPriorityMinimum, ordered highest number to
// lowest (20 before 10). The secondary sort is based on the alphabetical
// comparison of the name of the object (v1.bar before v1.foo). We'd recommend
// something like: *.k8s.io (except extensions) at 18000 and PaaSes
// (OpenShift, Deis) are recommended to be in the 2000s
GroupPriorityMinimum
int32
`json:"groupPriorityMinimum"`
// VersionPriority controls the ordering of this API version inside of its
// group. Must be greater than zero. The primary sort is based on
// VersionPriority, ordered highest to lowest (20 before 10). Since it's inside
// of a group, the number can be small, probably in the 10s. In case of equal
// version priorities, the version string will be used to compute the order
// inside a group. If the version string is "kube-like", it will sort above non
// "kube-like" version strings, which are ordered lexicographically. "Kube-like"
// versions start with a "v", then are followed by a number (the major version),
// then optionally the string "alpha" or "beta" and another number (the minor
// version). These are sorted first by GA > beta > alpha (where GA is a version
// with no suffix such as beta or alpha), and then by comparing major version,
// then minor version. An example sorted list of versions:
// v10, v2, v1, v11beta2, v10beta3, v3beta1, v12alpha1, v11alpha2, foo1, foo10.
VersionPriority
int32
`json:"versionPriority"`
In other words, the GroupPriorityMinimum
value determines where the group is prioritized. If multiple APIService
objects for different versions differ, the highest value rules.
The second priority just orders the versions among each other to define the preferred version to be used by dynamic clients.
Here is a list of the GroupPriorityMinimum
values for the native Kubernetes API groups:
var
apiVersionPriorities
=
map
[
schema
.
GroupVersion
]
priority
{
{
Group
:
""
,
Version
:
"v1"
}:
{
group
:
18000
,
version
:
1
},
{
Group
:
"extensions"
,
Version
:
"v1beta1"
}:
{
group
:
17900
,
version
:
1
},
{
Group
:
"apps"
,
Version
:
"v1beta1"
}:
{
group
:
17800
,
version
:
1
},
{
Group
:
"apps"
,
Version
:
"v1beta2"
}:
{
group
:
17800
,
version
:
9
},
{
Group
:
"apps"
,
Version
:
"v1"
}:
{
group
:
17800
,
version
:
15
},
{
Group
:
"events.k8s.io"
,
Version
:
"v1beta1"
}:
{
group
:
17750
,
version
:
5
},
{
Group
:
"authentication.k8s.io"
,
Version
:
"v1"
}:
{
group
:
17700
,
version
:
15
},
{
Group
:
"authentication.k8s.io"
,
Version
:
"v1beta1"
}:
{
group
:
17700
,
version
:
9
},
{
Group
:
"authorization.k8s.io"
,
Version
:
"v1"
}:
{
group
:
17600
,
version
:
15
},
{
Group
:
"authorization.k8s.io"
,
Version
:
"v1beta1"
}:
{
group
:
17600
,
version
:
9
},
{
Group
:
"autoscaling"
,
Version
:
"v1"
}:
{
group
:
17500
,
version
:
15
},
{
Group
:
"autoscaling"
,
Version
:
"v2beta1"
}:
{
group
:
17500
,
version
:
9
},
{
Group
:
"autoscaling"
,
Version
:
"v2beta2"
}:
{
group
:
17500
,
version
:
1
},
{
Group
:
"batch"
,
Version
:
"v1"
}:
{
group
:
17400
,
version
:
15
},
{
Group
:
"batch"
,
Version
:
"v1beta1"
}:
{
group
:
17400
,
version
:
9
},
{
Group
:
"batch"
,
Version
:
"v2alpha1"
}:
{
group
:
17400
,
version
:
9
},
{
Group
:
"certificates.k8s.io"
,
Version
:
"v1beta1"
}:
{
group
:
17300
,
version
:
9
},
{
Group
:
"networking.k8s.io"
,
Version
:
"v1"
}:
{
group
:
17200
,
version
:
15
},
{
Group
:
"networking.k8s.io"
,
Version
:
"v1beta1"
}:
{
group
:
17200
,
version
:
9
},
{
Group
:
"policy"
,
Version
:
"v1beta1"
}:
{
group
:
17100
,
version
:
9
},
{
Group
:
"rbac.authorization.k8s.io"
,
Version
:
"v1"
}:
{
group
:
17000
,
version
:
15
},
{
Group
:
"rbac.authorization.k8s.io"
,
Version
:
"v1beta1"
}:
{
group
:
17000
,
version
:
12
},
{
Group
:
"rbac.authorization.k8s.io"
,
Version
:
"v1alpha1"
}:
{
group
:
17000
,
version
:
9
},
{
Group
:
"settings.k8s.io"
,
Version
:
"v1alpha1"
}:
{
group
:
16900
,
version
:
9
},
{
Group
:
"storage.k8s.io"
,
Version
:
"v1"
}:
{
group
:
16800
,
version
:
15
},
{
Group
:
"storage.k8s.io"
,
Version
:
"v1beta1"
}:
{
group
:
16800
,
version
:
9
},
{
Group
:
"storage.k8s.io"
,
Version
:
"v1alpha1"
}:
{
group
:
16800
,
version
:
1
},
{
Group
:
"apiextensions.k8s.io"
,
Version
:
"v1beta1"
}:
{
group
:
16700
,
version
:
9
},
{
Group
:
"admissionregistration.k8s.io"
,
Version
:
"v1"
}:
{
group
:
16700
,
version
:
15
},
{
Group
:
"admissionregistration.k8s.io"
,
Version
:
"v1beta1"
}:
{
group
:
16700
,
version
:
12
},
{
Group
:
"scheduling.k8s.io"
,
Version
:
"v1"
}:
{
group
:
16600
,
version
:
15
},
{
Group
:
"scheduling.k8s.io"
,
Version
:
"v1beta1"
}:
{
group
:
16600
,
version
:
12
},
{
Group
:
"scheduling.k8s.io"
,
Version
:
"v1alpha1"
}:
{
group
:
16600
,
version
:
9
},
{
Group
:
"coordination.k8s.io"
,
Version
:
"v1"
}:
{
group
:
16500
,
version
:
15
},
{
Group
:
"coordination.k8s.io"
,
Version
:
"v1beta1"
}:
{
group
:
16500
,
version
:
9
},
{
Group
:
"auditregistration.k8s.io"
,
Version
:
"v1alpha1"
}:
{
group
:
16400
,
version
:
1
},
{
Group
:
"node.k8s.io"
,
Version
:
"v1alpha1"
}:
{
group
:
16300
,
version
:
1
},
{
Group
:
"node.k8s.io"
,
Version
:
"v1beta1"
}:
{
group
:
16300
,
version
:
9
},
}
So using 2000
for PaaS-like APIs means that they are placed at the end of this list.4
The order of the API groups plays a role during the REST mapping process in kubectl
(see “REST Mapping”). This means it has actual influence on the user experience. If there are conflicting resource names or short names, the one with the highest GroupPriorityMinimum
value wins.
Also, in the special case of replacing of an API group version using a custom API server, this priority ordering might be of use. For example, you could replace a native Kubernetes API group with a modified one (for whatever reason) by placing the custom API service at a position with a lower GroupPriorityMinimum
value than the one in the upper table.
Note again that the Kubernetes API server does not need to know the list of resources for either of the discovery endpoints /apis, and /apis/group-name
, or for proxying. The list of resources is returned only via the third discovery endpoint, /apis/group-name
/version
. But as we have seen in the previous section, this endpoint is served by the aggregated custom API server, not by kube-aggregator
.
A custom API server resembles most of the parts that make up the Kubernetes API server, though of course with different API group implementations, and without an embedded kube-aggregator
or an embedded apiextension-apiserver
(which serves CRDs). This leads to nearly the same architectural picture (shown in Figure 8-2) as the one in Figure 8-1:
We observe a number of things. An aggregated API server:
Has the same basic internal structure as the Kubernetes API server.
Has its own handler chain, including authentication, audit, impersonation, max-in-flight throttling, and authorization (we will explain throughout this chapter why this is necessary; see, for example, “Delegated Authorization”).
Has its own resource handler pipeline, including decoding, conversion, admission, REST mapping, and encoding.
Calls admission webhooks.
Might write to etcd
(it can use a different storage backend, though). The etcd
cluster does not have to be the same as the one used by the Kubernetes API server.
Has its own scheme and registry implementation for custom API groups. The registry implementation might differ and be customized to any degree.
Does authentication again. It usually does client certificate authentication and token-based authentication, calling back to the Kubernetes API server with a TokenAccessReview
request. We will discuss the authentication and trust architecture in more detail shortly.
Does its own auditing. This means the Kubernetes API server audits certain fields, but only on the meta level. Object-level auditing is done in the aggregated custom API server.
Does its own authentication using SubjectAccessReview
requests to the Kubernetes API server. We will discuss authorization in more detail shortly.
An aggregated custom API server (based on k8s.io/apiserver) is built on the same authentication library as the Kubernetes API server. It can use client certificates or tokens to authenticate a user.
Because an aggregated custom API server is architecturally placed behind the Kubernetes API server (i.e., the Kubernetes API server receives requests and proxies them to the aggregated custom API server), requests are already authenticated by the Kubernetes API server. The Kubernetes API server stores the result of the authentication—that is, the username and group membership—in HTTP request headers, usually X-Remote-User
and X-Remote-Group
(these can be configured with the --requestheader-username-headers
and --requestheader-group-headers
flags).
The aggregated custom API server has to know when to trust these headers; otherwise, any other caller could claim to have done authentication and could set these headers. This is handled by a special request header client CA. It is stored in the config map kube-system/extension-apiserver-authentication (filename requestheader-client-ca-file). Here is an example:
apiVersion
:
v1
kind
:
ConfigMap
metadata
:
name
:
extension-apiserver-authentication
namespace
:
kube-system
data
:
client-ca-file
:
|
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
requestheader-allowed-names
:
'["aggregator"]'
requestheader-client-ca-file
:
|
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
requestheader-extra-headers-prefix
:
'["X-Remote-Extra-"]'
requestheader-group-headers
:
'["X-Remote-Group"]'
requestheader-username-headers
:
'["X-Remote-User"]'
With this information, an aggregated custom API server with default settings will authenticate:
Clients using client certificates matching the given client-ca-file
Clients preauthenticated by the Kubernetes API server whose requests are forwarded using the given requestheader-client-ca-file and whose username and group memberships are stored in the given HTTP headers X-Remote-Group
and X-Remote-User
Last but not least, there is a mechanism called TokenAccessReview
that forwards bearer tokens (received via the HTTP header Authorization: bearer token
) back to the Kubernetes API server in order to verify whether they are valid. The token access review mechanism is disabled by default but can optionally be enabled; see “Options and Config Pattern and Startup Plumbing”.
We will see in the following sections how delegated authentication is actually set up. While we’ve gone into detail about this mechanism here, inside an aggregated custom API server this is mostly done automatically by the k8s.io/apiserver library. But knowing what is going on behind the curtain is certainly valuable, especially where security is involved.
After authentication has been done, each request must be authorized. Authorization is based on the username and group list. The default authorization mechanism in Kubernetes is role-based access control (RBAC).
RBAC maps identities to roles, and roles to authorization rules, which finally accept or reject requests. We won’t go into all the details here about RBAC authorization objects like roles and cluster roles, or role bindings and cluster role bindings (see “Getting the Permissions Right” for more). From an architectural point of view it is enough to know that an aggregated custom API server authorizes requests using delegated authorization via SubjectAccessReview
s. It does not evaluate RBAC rules itself but instead delegates evaluation to the Kubernetes API server.
Let’s look at delegated authorization in more detail now.
A subject access review is sent from the aggregated custom API server to the Kubernetes API server on a request (if it does not find an answer in its authorization cache). Here is an example of such a review object:
apiVersion
:
authorization.k8s.io/v1
kind
:
SubjectAccessReview
spec
:
resourceAttributes
:
group
:
apps
resource
:
deployments
verb
:
create
namespace
:
default
version
:
v1
name
:
example
user
:
michael
groups
:
-
system:authenticated
-
admins
-
authors
The Kubernetes API server receives this from the aggregated custom API server, evaluates the RBAC rules in the cluster, and makes a decision, returning a SubjectAccessReview
object with a status field set; for example:
apiVersion
:
authorization.k8s.io/v1
kind
:
SubjectAccessReview
status
:
allowed
:
true
denied
:
false
reason
:
"rule
foo
allowed
this
request"
Note here that it is possible that both allowed
and denied
are false
. This means that the Kubernetes API server could not make a decision, in which case another authorizer inside an aggregated custom API server can make a decision (API servers implement an authorization chain that is queried one by one, with delegated authorization being one of the authorizers in that chain). This can be used to model nonstandard authorization logic—that is, if in certain cases there are no RBAC rules but an external authorization system is used instead.
Note that for performance reasons, the delegated authorization mechanism maintains a local cache in each aggregated custom API server. By default, it caches 1,024 authorization entries with:
5
minutes expiry for allowed authorization requests
30
seconds expiry for denied authorization requests
These values can be customized via --authorization-webhook-cache-authorized-ttl
and --authorization-webhook-cache-unauthorized-ttl
.
We’ll see in the following sections how delegated authorization is set up in code. Again, as with authentication, inside an aggregated custom API server delegated authorization is mostly done automatically by the k8s.io/apiserver library.
In the previous sections we looked at the architecture of aggregated API servers. In this section we want to look at the implementation of an aggregated custom API server in Golang.
The main Kubernetes API server is implemented via the k8s.io/apiserver library. A custom API server will use the very same code. The main difference is that our custom API server will run in-cluster. This means that it can assume that a kube-apiserver
is available in the cluster and use it to do delegated authorization and to retrieve other kube-native resources.
We also assume that an etcd
cluster is available and ready to be used by the aggregated custom API server. It is not important whether this etcd
is dedicated or shared with the Kubernetes API server. Our custom API server will use a different etcd
key space to avoid conflicts.
The code examples in this chapter refer to the example code on GitHub, so look there for the complete source code. We will show only the most interesting excerpt here, but you can always go to the complete example project, experiment with it, and—very important for learning—run it in a real cluster.
This pizza-apiserver
project implements the example API shown in “Example: A Pizza Restaurant”.
We’ll start with a couple of option structs that are bound to flags. Take them from k8s.io/apiserver and add our custom options. Option structs from k8s.io/apiserver can be tweaked in-code for special use cases, and the provided flags can be applied to a flag set in order to be accessible to the user.
In the example we start very simply by basing everything on the RecommendedOptions
. These recommended options set up everything as needed for a “normal” aggregated custom API server for simple APIs, like this:
import
(
...
informers
"github.com/programming-kubernetes/pizza-apiserver/pkg/
generated/informers/externalversions"
)
const
defaultEtcdPathPrefix
=
"/registry/restaurant.programming-kubernetes.info"
type
CustomServerOptions
struct
{
RecommendedOptions
*
genericoptions
.
RecommendedOptions
SharedInformerFactory
informers
.
SharedInformerFactory
}
func
NewCustomServerOptions
(
out
,
errOut
io
.
Writer
)
*
CustomServerOptions
{
o
:=
&
CustomServerOptions
{
RecommendedOptions
:
genericoptions
.
NewRecommendedOptions
(
defaultEtcdPathPrefix
,
apiserver
.
Codecs
.
LegacyCodec
(
v1alpha1
.
SchemeGroupVersion
),
genericoptions
.
NewProcessInfo
(
"pizza-apiserver"
,
"pizza-apiserver"
),
),
}
return
o
}
The CustomServerOptions
embed RecommendedOptions
and add one field on top. NewCustomServerOptions
is the constructor that fills the CustomServerOptions
struct with default values.
Let’s look into some of the more interesting details:
defaultEtcdPathPrefix
is the etcd
prefix for all of our keys. As a key space, we use /registry/pizza-apiserver.programming-kubernetes.info, clearly distinct from Kubernetes keys.
SharedInformerFactory
is the process-wide shared informer factory for our own CRs to avoid unnecessary informers for the same resources (see Figure 3-5). Note that it is imported from the generated informer code in our project and not from client-go
.
NewRecommendedOptions
sets everything up for an aggregated custom API server with default values.
Let’s take a quick look at NewRecommendedOptions
:
return
&
RecommendedOptions
{
Etcd
:
NewEtcdOptions
(
storagebackend
.
NewDefaultConfig
(
prefix
,
codec
)),
SecureServing
:
sso
.
WithLoopback
(),
Authentication
:
NewDelegatingAuthenticationOptions
(),
Authorization
:
NewDelegatingAuthorizationOptions
(),
Audit
:
NewAuditOptions
(),
Features
:
NewFeatureOptions
(),
CoreAPI
:
NewCoreAPIOptions
(),
ExtraAdmissionInitializers
:
func
(
c
*
server
.
RecommendedConfig
)
([]
admission
.
PluginInitializer
,
error
)
{
return
nil
,
nil
},
Admission
:
NewAdmissionOptions
(),
ProcessInfo
:
processInfo
,
Webhook
:
NewWebhookOptions
(),
}
All of these can be tweaked if necessary. For example, if a custom default serving port is desired, RecommendedOptions.SecureServing.SecureServingOptions.BindPort
can be set.
Let’s briefly go through the existing option structs:
Etcd
configures the storage stack that reads and write to etcd
.
SecureServing
configures everything around HTTPS (i.e., ports, certificates, etc.)
Authentication
sets up delegated authentication as described in “Delegated Authentication and Trust”.
Authorization
sets up delegated authorization as described in “Delegated Authorization”.
Audit
sets up the auditing output stack. This is disabled by default, but can be set to output an audit log file or to send audit events to an external backend.
Features
configures feature gates of alpha and beta features.
CoreAPI
holds a path to a kubeconfig file to access the main API server. This defaults to using the in-cluster configuration.
Admission
is a stack of mutating and validating admission plug-ins that execute for every incoming API request. This can be extended with custom in-code admission plug-ins, or the default admission chain can be tweaked for the custom API server.
ExtraAdmissionInitializers
allows us to add more initializers for admission. Initializers implement the plumbing of, for example, informers or clients through the custom API server. See “Admission” for more about custom admission.
ProcessInfo
holds information for event object creation (i.e., a process name and a namespace). We have set it to pizza-apiserver
for both values.
Webhook
configures how webhooks operate (e.g., general setting for authentication and admission webhook). It is set up with good defaults for a custom API server that runs inside of a cluster. For API servers outside of the cluster, this would be the place to configure how it can reach the webhook.
Options are coupled with flags; that is, they are conventionally on the same abstraction level as flags. As a rule of thumb, options do not hold “running” data structures. They are used during startup and then converted to configuration or server objects, which are then run.
Options can be validated via the Validate() error
method. This method will also check that the user-provided flag values make logical sense.
Options can be completed in order to set default values, which should not show up in the flags’ help text but which are necessary to get a complete set of options.
Options are converted to a server configuration (“config”) by the Config() (*apiserver.Config, error)
method. This is done by starting with a recommended default configuration and then applying the options to it:
func
(
o
*
CustomServerOptions
)
Config
()
(
*
apiserver
.
Config
,
error
)
{
err
:=
o
.
RecommendedOptions
.
SecureServing
.
MaybeDefaultWithSelfSignedCerts
(
"localhost"
,
nil
,
[]
net
.
IP
{
net
.
ParseIP
(
"127.0.0.1"
)},
)
if
err
!=
nil
{
return
nil
,
fmt
.
Errorf
(
"error creating self-signed cert: %v"
,
err
)
}
[
...
omitted
o
.
RecommendedOptions
.
ExtraAdmissionInitializers
...
]
serverConfig
:=
genericapiserver
.
NewRecommendedConfig
(
apiserver
.
Codecs
)
err
=
o
.
RecommendedOptions
.
ApplyTo
(
serverConfig
,
apiserver
.
Scheme
);
if
err
!=
nil
{
return
nil
,
err
}
config
:=
&
apiserver
.
Config
{
GenericConfig
:
serverConfig
,
ExtraConfig
:
apiserver
.
ExtraConfig
{},
}
return
config
,
nil
}
The config created here contains runnable data structures; in other words, configs are runtime objects, in contrast to the options, which correspond to flags. The line o.RecommendedOptions.SecureServing.MaybeDefaultWithSelfSignedCerts
creates self-signed certificates in case the user has not passed flags for pregenerated certificates.
As we’ve described, genericapiserver.NewRecommendedConfig
returns a default recommended configuration, and RecommendedOptions.ApplyTo
changes it according to flags (and other customized options).
The config struct of the pizza-apiserver
project itself is just a wrapper around the RecommendedConfig
for our example custom API server:
type
ExtraConfig
struct
{
// Place your custom config here.
}
type
Config
struct
{
GenericConfig
*
genericapiserver
.
RecommendedConfig
ExtraConfig
ExtraConfig
}
// CustomServer contains state for a Kubernetes custom api server.
type
CustomServer
struct
{
GenericAPIServer
*
genericapiserver
.
GenericAPIServer
}
type
completedConfig
struct
{
GenericConfig
genericapiserver
.
CompletedConfig
ExtraConfig
*
ExtraConfig
}
type
CompletedConfig
struct
{
// Embed a private pointer that cannot be instantiated outside of
// this package.
*
completedConfig
}
If more state for a running custom API server is necessary, ExtraConfig
is the place to put it.
Similarly to option structs, the config has a Complete() CompletedConfig
method that sets default values. Because it is necessary to actually call Complete()
for the underlying configuration, it is common to enforce that via the type system by introducing the unexported completedConfig
data type. The idea here is that only a call to Complete()
can turn a Config
into a completeConfig
. The compiler will complain if this call is not done:
func
(
cfg
*
Config
)
Complete
()
completedConfig
{
c
:=
completedConfig
{
cfg
.
GenericConfig
.
Complete
(),
&
cfg
.
ExtraConfig
,
}
c
.
GenericConfig
.
Version
=
&
version
.
Info
{
Major
:
"1"
,
Minor
:
"0"
,
}
return
completedConfig
{
&
c
}
}
Finally, the completed config can be turned into a CustomServer
runtime struct via the New()
constructor:
// New returns a new instance of CustomServer from the given config.
func
(
c
completedConfig
)
New
()
(
*
CustomServer
,
error
)
{
genericServer
,
err
:=
c
.
GenericConfig
.
New
(
"pizza-apiserver"
,
genericapiserver
.
NewEmptyDelegate
(),
)
if
err
!=
nil
{
return
nil
,
err
}
s
:=
&
CustomServer
{
GenericAPIServer
:
genericServer
,
}
[
...
omitted
API
installation
...
]
return
s
,
nil
}
Note that we have intentionally omitted the API installation part here. We’ll come back to this in “API Installation” (i.e., how you wire the registries into the custom API server during startup). A registry implements the API and storage semantics of an API group. We will see this for the restaurant API group in “Registry and Strategy”.
The CustomServer
object can finally be started with the Run(stopCh <-chan struct{}) error
method. This is called by the Run
method of the options in our example. That is, CustomServerOptions.Run
:
Creates the config
Completes the config
Creates the CustomServer
Calls CustomServer.Run
This is the code:
func
(
o
CustomServerOptions
)
Run
(
stopCh
<-
chan
struct
{})
error
{
config
,
err
:=
o
.
Config
()
if
err
!=
nil
{
return
err
}
server
,
err
:=
config
.
Complete
().
New
()
if
err
!=
nil
{
return
err
}
server
.
GenericAPIServer
.
AddPostStartHook
(
"start-pizza-apiserver-informers"
,
func
(
context
genericapiserver
.
PostStartHookContext
)
error
{
config
.
GenericConfig
.
SharedInformerFactory
.
Start
(
context
.
StopCh
)
o
.
SharedInformerFactory
.
Start
(
context
.
StopCh
)
return
nil
},
)
return
server
.
GenericAPIServer
.
PrepareRun
().
Run
(
stopCh
)
}
The PrepareRun()
call wires up the OpenAPI specification and might do other post-API-installation operations. After calling it, the Run
method starts the actual server. It blocks until stopCh
is closed.
This example also wires a post-start hook named start-pizza-apiserver-informers
. As the name suggests, a post-start hook is called after the HTTPS server is up and listening. Here, it starts the shared informer factories.
Note that even local in-process informers of resources provided by the custom API server itself speak via HTTPS to the localhost interface. So it makes sense to start them after the server is up and the HTTPS port is listening.
Also note that the /healthz endpoint returns success only after all post-start hooks have finished successfully.
With all the little plumbing pieces in place, the pizza-apiserver
project wraps everything up into a cobra
command:
// NewCommandStartCustomServer provides a CLI handler for 'start master' command
// with a default CustomServerOptions.
func
NewCommandStartCustomServer
(
defaults
*
CustomServerOptions
,
stopCh
<-
chan
struct
{},
)
*
cobra
.
Command
{
o
:=
*
defaults
cmd
:=
&
cobra
.
Command
{
Short
:
"Launch a custom API server"
,
Long
:
"Launch a custom API server"
,
RunE
:
func
(
c
*
cobra
.
Command
,
args
[]
string
)
error
{
if
err
:=
o
.
Complete
();
err
!=
nil
{
return
err
}
if
err
:=
o
.
Validate
();
err
!=
nil
{
return
err
}
if
err
:=
o
.
Run
(
stopCh
);
err
!=
nil
{
return
err
}
return
nil
},
}
flags
:=
cmd
.
Flags
()
o
.
RecommendedOptions
.
AddFlags
(
flags
)
return
cmd
}
With NewCommandStartCustomServer
the main()
method of the process is pretty simple:
func
main
()
{
logs
.
InitLogs
()
defer
logs
.
FlushLogs
()
stopCh
:=
genericapiserver
.
SetupSignalHandler
()
options
:=
server
.
NewCustomServerOptions
(
os
.
Stdout
,
os
.
Stderr
)
cmd
:=
server
.
NewCommandStartCustomServer
(
options
,
stopCh
)
cmd
.
Flags
().
AddGoFlagSet
(
flag
.
CommandLine
)
if
err
:=
cmd
.
Execute
();
err
!=
nil
{
klog
.
Fatal
(
err
)
}
}
Note especially the call to SetupSignalHandler
: it wires Unix signal handling. On SIGINT
(triggered when you press Ctrl-C in a terminal) and SIGKILL
, the stop channel is closed. The stop channel is passed to the running custom API server, and it shuts down when the stop channel is closed. Hence, the main loop will initiate a shutdown when one of the signals is received. This shutdown is graceful in the sense that running requests are finished (for up to 60 seconds by default) before termination. It also makes sure that all requests are sent to the audit backend and no audit data is dropped. After all that, cmd.Execute()
will return and the process will terminate.
Now we have everything in place to start the custom API server for the first time. Assuming you have a cluster configured in ~/.kube/config, you can use it for delegated authentication and authorization:
$
cd
$GOPATH
/src/github.com/programming-kubernetes/pizza-apiserver$
etcd&
$
go run . --etcd-servers localhost:2379--authentication-kubeconfig ~/.kube/config
--authorization-kubeconfig ~/.kube/config
--kubeconfig ~/.kube/config I0331 11:33:25.702320
64244
plugins.go:158]
Loaded3
mutating admission controller(
s)
successfully in the following order: NamespaceLifecycle,MutatingAdmissionWebhook,PizzaToppings. I0331 11:33:25.70234464244
plugins.go:161]
Loaded1
validating admission controller(
s)
successfully in the following order: ValidatingAdmissionWebhook. I0331 11:33:25.71414864244
secure_serving.go:116]
Serving securely on[
::]
:443
It will start up and start serving the generic API endpoints:
$
curl -k https://localhost:443/healthz
ok
We can also list the discovery endpoint, but the result is not very satisfying yet—we have not created an API, so the discovery is empty:
$
curl -k https://localhost:443/apis{
"kind"
:"APIGroupList"
,"groups"
:[]
}
Let’s take a look from a higher level:
We have started a custom API server with the recommended options and config.
We have a standard handler chain that includes delegated authentication, delegated authorization, and auditing.
We have an HTTPS server running and serving requests for the generic endpoints: /logs, /metrics, /version, /healthz, and /apis.
Figure 8-3 shows this from 10,000 feet.
Now that we’ve set up a running custom API server, it’s time to actually implement APIs. Before doing so, we have to understand API versions and how they are handled inside of an API server.
Every API server serves a number of resources and versions (see Figure 2-3). Some resources have multiple versions. To make multiple versions of a resource possible, the API server converts between versions.
To avoid quadratic growth of necessary conversions between versions, API servers use an internal version when implementing the actual API logic. The internal version is also often called hub version because it is a kind of hub that every other version is converted to and from (see Figure 8-4). The internal API logic is implemented just once for that hub version.
Figure 8-5 shows how the API servers make use of the internal version in the life-cycle of an API request:
The user sends a request using a specific version (e.g., v1
).
The API server decodes the payload and converts it to the internal version.
The API server passes the internal version through admission and validation.
The API logic is implemented for internal versions in the registry.
etcd
reads and writes the versioned object (e.g., v2
—the storage version); that is, it converts from and to the internal version.
Finally, the result is converted to the request version, in this case, v1
.
On each edge between the internal hub version and the external version, a conversion takes place. In Figure 8-6, you can count the number of conversions per request handler. In a writing operation (like creation and update), at least four conversions are done, and even more if admission webhooks are deployed in the cluster. As you can see, conversion is a crucial operation in every API implementation.
In addition to conversion, Figure 8-6 also shows when defaulting takes place. Defaulting is the process of filling in unspecified field values. Defaulting is highly coupled with conversion, and is always done on the external version when it comes in from the user’s request, from etcd
or from an admission webhook, but never when converted from the hub to the external version.
Conversion is crucial for the API server mechanics. It is also crucial that all conversions (back and forth) must be correct in the sense of being roundtrippable. Roundtrippable means that we can convert back and forth in the version graph (Figure 8-4) starting with random values, and we never lose any information; that is, conversions are bijective, or one-to-one. For example, we must be able to go from a random (but valid) v1
object to the internal hub type, then to v1alpha1
, back to the internal hub type, and then back to v1
. The resulting object must be equivalent to the original.
Making types roundtrippable often requires a lot of thought; it nearly always drives the API design of new versions and also influences the extension of old types in order to store the information that new versions carry.
In short: getting roundtripping right is hard—very hard at times. See “Roundtrip Testing” to learn how roundtripping can be tested effectively.
Defaulting logic can changed during the lifecycle of an API server. Imagine you add a new field to a type. The user might have old objects stored on disk, or the etcd
may have old objects. If that new field has a default, this field value is set when the old, stored objects are sent to the API server, or when the user retrieves one of the old objects from etcd
. It looks like the new field has existed forever, while in reality the defaulting process in the API server sets the field values during the processing of the request.
As we have seen, to add an API to the custom API server, we have to write the internal hub version types and the external version types and convert between them. This is what we’ll look at now for the pizza example project.
API types are traditionally placed into the pkg/apis/group-name
package of the project with pkg/apis/group-name
/types.go for internal types and pkg/apis/group-name
/version
/types.go for the external versions). So, for our example, pkg/apis/restaurant, pkg/apis/restaurant/v1alpha1/types.go, and pkg/apis/restaurant/v1beta1/types.go.
Conversions will be created at pkg/apis/group-name
/version
/zz_generated.conversion.go (for conversion-gen
output) and pkg/apis/group-name
/version
/conversion.go for custom conversions written by the developer.
In a similar way, defaulting code will be created for defaulter-gen
output at pkg/apis/group-name
/version
/zz_generated.defaults.go and at pkg/apis/group-name
/version
/defaults.go for custom defaulting code written by the developer. We have both pkg/apis/restaurant/v1alpha1/defaults.go and pkg/apis/restaurant/v1beta1/defaults.go in our example.
We go into more detail about conversion and defaulting in “Conversions” and “Defaulting”.
With the exception of conversion and defaulting, we’ve seen most of this process already for CustomResourceDefinitions in “Anatomy of a type”. Native types for the external versions in our custom API server are defined exactly the same way.
In addition, we have pkg/apis/group-name
/types.go for the internal types, the hub types. The main difference is that in the latter the SchemeGroupVersion
in the register.go file references runtime.APIVersionInternal
(which is a shortcut for "__internal"
).
// SchemeGroupVersion is group version used to register these objects
var
SchemeGroupVersion
=
schema
.
GroupVersion
{
Group
:
GroupName
,
Version
:
runtime
.
APIVersionInternal
}
Another difference between pkg/apis/group-name/types.go
and the external type files is the lack of JSON and protobuf tags.
JSON tags are used by some generators to detect whether a types.go file is for an external version or the internal version. So always drop those tags when copying and pasting external types in order to create or update the internal types.
Last but not least, there is a helper to install all versions of an API group into a scheme. This helper is traditionally placed in pkg/apis/group-name
/install/install.go. For our custom API server pkg/apis/restaurant/install/install.go, it looks as simple as this:
// Install registers the API group and adds types to a scheme
func
Install
(
scheme
*
runtime
.
Scheme
)
{
utilruntime
.
Must
(
restaurant
.
AddToScheme
(
scheme
))
utilruntime
.
Must
(
v1beta1
.
AddToScheme
(
scheme
))
utilruntime
.
Must
(
v1alpha1
.
AddToScheme
(
scheme
))
utilruntime
.
Must
(
scheme
.
SetVersionPriority
(
v1beta1
.
SchemeGroupVersion
,
v1alpha1
.
SchemeGroupVersion
,
))
}
Because we have multiple versions, the priority has to be defined. This order will be used to determine the default storage version of the resource. It used to also play a role in version selection in internal clients (clients that return internal version objects; refer back to the note “Versioned Clients and Internal Clients in the Past”). But internal clients are deprecated and are going away. Even code inside an API server will use an external version client in the future.
Conversion takes an object in one version and converts it into an object in another version. Conversion is implemented through conversion functions, some of them manually written (placed into pkg/apis/group-name
/version
/conversion.go by convention), and others autogenerated by conversion-gen
(placed by convention into pkg/apis/group-name
/version
/zz_generated.conversion.go).
Conversion is initiated via a scheme (see “Scheme”) using the Convert()
method, passing the source object in
and the target object out
:
func
(
s
*
Scheme
)
Convert
(
in
,
out
interface
{},
context
interface
{})
error
The context
is described as follows:
// ...an optional field that callers may use to pass info to conversion functions.
It is used only in very special cases and is usually nil
. Later in the chapter we will look at the conversion function scope, which allows us to access this context from within conversion functions.
To do the actual conversion, the scheme knows about all the Golang API types, their GroupVersionKinds, and the conversion functions between GroupVersionKinds. For this, conversion-gen
registers generated conversion functions via the local scheme builder. In our example custom API server, the zz_generated.conversion.go file starts like this:
func
init
()
{
localSchemeBuilder
.
Register
(
RegisterConversions
)
}
// RegisterConversions adds conversion functions to the given scheme.
// Public to allow building arbitrary schemes.
func
RegisterConversions
(
s
*
runtime
.
Scheme
)
error
{
if
err
:=
s
.
AddGeneratedConversionFunc
(
(
*
Topping
)(
nil
),
(
*
restaurant
.
Topping
)(
nil
),
func
(
a
,
b
interface
{},
scope
conversion
.
Scope
)
error
{
return
Convert_v1alpha1_Topping_To_restaurant_Topping
(
a
.(
*
Topping
),
b
.(
*
restaurant
.
Topping
),
scope
,
)
},
);
err
!=
nil
{
return
err
}
...
return
nil
}
...
The function Convert_v1alpha1_Topping_To_restaurant_Topping()
is generated. It takes a v1alpha1
object and converts it to the internal type.
The preceding complicated type conversion turns the typed conversion function into a uniformly typed func(a, b interface{}, scope conversion.Scope) error
. The scheme uses the latter types because it can call them without the use of reflection. Reflection is slow due to the many necessary allocations.
The manually written conversions in conversion.go take precedence during generation in the sense that conversion-gen
skips generation for types if it finds a manually written function in the packages with the Convert_source-package-basename_Kind
To_target-package-basename
_Kind conversion function naming pattern. For example:
func
Convert_v1alpha1_PizzaSpec_To_restaurant_PizzaSpec
(
in
*
PizzaSpec
,
out
*
restaurant
.
PizzaSpec
,
s
conversion
.
Scope
,
)
error
{
...
return
nil
}
In the simplest case, conversion functions just copy over values from the source to the target object. But for the previous example, which converts a v1alpha1
pizza specification to the internal type, simple copying is not enough. We have to adapt the different structure, which actually looks like the following:
func
Convert_v1alpha1_PizzaSpec_To_restaurant_PizzaSpec
(
in
*
PizzaSpec
,
out
*
restaurant
.
PizzaSpec
,
s
conversion
.
Scope
,
)
error
{
idx
:=
map
[
string
]
int
{}
for
_
,
top
:=
range
in
.
Toppings
{
if
i
,
duplicate
:=
idx
[
top
];
duplicate
{
out
.
Toppings
[
i
].
Quantity
++
continue
}
idx
[
top
]
=
len
(
out
.
Toppings
)
out
.
Toppings
=
append
(
out
.
Toppings
,
restaurant
.
PizzaTopping
{
Name
:
top
,
Quantity
:
1
,
})
}
return
nil
}
Clearly, no code generation can be so clever as to foresee what the user intended when defining these different types.
Note that during conversion the source object must never be mutated. But it is completely normal and, often for performance reasons, highly recommended to reuse data structures of the source in the target object if the types match.
This is so important that we reiterate it in a warning, because it has implications not only for the implementation of conversion but also for callers of conversions and consumers of conversion output.
Conversion functions must not mutate the source object, but the output is allowed to share data structures with the source. This means that consumers of conversion output have to make sure not to mutate an object if the original object must not be mutated.
For example, assume you have a pod *core.Pod
in the internal version, and you convert it to v1
as podv1 *corev1.Pod
, and mutate the resulting podv1
. This might also mutate the original pod
. If the pod
came from an informer, this is highly dangerous because informers have a shared cache and mutating pod
makes the cache inconsistent.
So, be aware of this property of conversion and do deep copies if necessary to avoid undesired and potentially dangerous mutations.
While this sharing of data structures leads to some risk, it also can avoid unnecessary allocations in many situations. Generated code goes so far that the generator compares source and target structs and uses Golang’s unsafe
packages to convert pointers to structs of the same memory layout via a simple type conversion. Because the internal type and the v1beta1
types for a pizza in our example have the same memory layout, we get this:
func
autoConvert_restaurant_PizzaSpec_To_v1beta1_PizzaSpec
(
in
*
restaurant
.
PizzaSpec
,
out
*
PizzaSpec
,
s
conversion
.
Scope
,
)
error
{
out
.
Toppings
=
*
(
*
[]
PizzaTopping
)(
unsafe
.
Pointer
(
&
in
.
Toppings
))
return
nil
}
On the machine language level, this is a NOOP and therefore as fast as it can get. It avoids allocating a slice in this case and copying item by item from in
to out
.
Last but not least, some words about the third argument of conversion functions: the conversion scope conversion.Scope
.
The conversion scope provides access to a number of conversion metalevel values. For example, it allows us to access the context
value that is passed to the scheme’s Convert(in, out interface{}, context interface{}) error
method via:
s
.
Meta
().
Context
It also allows us to call the scheme conversion for subtypes via s.Convert
, or without considering the registered conversion functions at all via s.DefaultConvert
.
In most conversion cases, though, there is no need to use the scope at all. You can just ignore its existence for the sake of simplicity until you hit a tricky situation where more context than the source and target object is necessary.
Defaulting is the step in an API request’s lifecycle that sets default values for omitted fields in incoming objects (from the client or from etcd
). For example, a pod has a restartPolicy
field. If the user does not specify it, a value will default to Always
.
Imagine we are using a very old Kubernetes version around the year 2014. The field restartPolicy
was just introduced to the system in the latest release at that time. After an upgrade of your cluster, there is a pod in etcd
without the restartPolicy
field. A kubectl get pod
would read the old pod from etcd
and the defaulting code would add the default value Always
. From the user’s point of view, magically the old pod suddenly has the new restartPolicy
field.
Refer back to Figure 8-6 to see where defaulting takes place today in the Kubernetes request pipeline. Note that defaulting is done only for external types, not internal types.
Now let’s look at the code that does defaulting. Defaulting is initiated by the k8s.io/apiserver code via the scheme, similarly to conversion. Hence, we have to register defaulting functions into the scheme for our custom types.
Again, similarly to conversions, most defaulting code is just generated with the defaulter-gen
binary. It traverses API types and creates defaulting functions in pkg/apis/group-name
/version
/zz_generated.defaults.go. The code doesn’t do anything by default other than calling defaulting functions for the substructures.
You can define your own defaulting logic by following the defaulting function naming pattern SetDefaultsKind
:
func
SetDefaults
Kind
(
obj
*
Type
)
{
...
}
In addition, and unlike with conversions, we have to call the registration of the generated function on the local scheme builder manually. This is unfortunately not done automatically:
func
init
()
{
localSchemeBuilder
.
Register
(
RegisterDefaults
)
}
Here, RegisterDefaults
is generated inside package pkg/apis/group-name
/version
/zz_generated.defaults.go.
For defaulting code, it is crucial to know when a field was set by the user and when it wasn’t. This is not that clear in many cases.
Golang has zero values for every type and sets them if a field is not found in the passed JSON or protobuf. Imagine a default of true
for a boolean field foo
. The zero value is false
. Unfortunately, it is not clear whether false
was set due to the user’s input or because false
is just the zero value of booleans.
To avoid this situation, often a pointer type must be used in the Golang API types (e.g., *bool
in the preceding case). A user-provided false
would lead to a non-nil
boolean pointer to a false
value, and a user-provided true
would lead to the non-nil
boolean pointer and a true
value. A not-provided field leads to nil
. This can be detected in the defaulting code:
func
SetDefaults
Kind
(
obj
*
Type
)
{
if
obj
.
Foo
==
nil
{
x
:=
true
obj
.
Foo
=
&
x
}
}
This gives the desired semantics: “foo defaults to true.”
This trick of using a pointer works for primitive types like strings. For maps and arrays, it is often hard to reach roundtrippability without identifying nil
maps/arrays and empty maps/arrays. Most defaulters for maps and arrays in Kubernetes therefore apply the default in both cases, working around encoding and decoding bugs.
Getting conversions right is hard. Roundtrip tests are an essential tool to check automatically in a randomized test that conversions behave as planned and do not lose data when converting from and to all known group versions.
Roundtrip tests are usually placed with the install.go file (for example, into pkg/apis/restaurant/install/roundtrip_test.go) and just call the roundtrip test functions from API Machinery:
import
(
...
"k8s.io/apimachinery/pkg/api/apitesting/roundtrip"
restaurantfuzzer
"github.com/programming-kubernetes/pizza-apiserver/pkg/apis/
restaurant/fuzzer"
)
func
TestRoundTripTypes
(
t
*
testing
.
T
)
{
roundtrip
.
RoundTripTestForAPIGroup
(
t
,
Install
,
restaurantfuzzer
.
Funcs
)
}
Internally, the RoundTripTestForAPIGroup
call installs the API group into a temporary scheme using the Install
functions. Then it creates random objects in the internal version using the given fuzzer, and then converts them to some external version and back to internal. The resulting objects must be equivalent to the original. This test is done hundreds or thousand of times with all external versions.
A fuzzer is a function that return a slice of randomizer functions for the internal types and their subtypes. In our example, the fuzzer is placed into the package pkg/apis/restaurant/fuzzer/fuzzer.go and contains a randomizer for the spec struct:
// Funcs returns the fuzzer functions for the restaurant api group.
var
Funcs
=
func
(
codecs
runtimeserializer
.
CodecFactory
)
[]
interface
{}
{
return
[]
interface
{}{
func
(
s
*
restaurant
.
PizzaSpec
,
c
fuzz
.
Continue
)
{
c
.
FuzzNoCustom
(
s
)
// fuzz first without calling this function again
// avoid empty Toppings because that is defaulted
if
len
(
s
.
Toppings
)
==
0
{
s
.
Toppings
=
[]
restaurant
.
PizzaTopping
{
{
"salami"
,
1
},
{
"mozzarella"
,
1
},
{
"tomato"
,
1
},
}
}
seen
:=
map
[
string
]
bool
{}
for
i
:=
range
s
.
Toppings
{
// make quantity strictly positive and of reasonable size
s
.
Toppings
[
i
].
Quantity
=
1
+
c
.
Intn
(
10
)
// remove duplicates
for
{
if
!
seen
[
s
.
Toppings
[
i
].
Name
]
{
break
}
s
.
Toppings
[
i
].
Name
=
c
.
RandString
()
}
seen
[
s
.
Toppings
[
i
].
Name
]
=
true
}
},
}
}
If no randomizer function is given, the underlying library github.com/google/gofuzz will generically try to fuzz the object by setting random values for base types and diving recursively into pointers, structs, maps, and slices, eventually calling custom randomizer functions if they are given by the developer.
When writing a randomizer function for one of the types, it is convenient to call c.FuzzNoCustom(s)
first. It randomizes the given object s
and also calls custom functions for substructures, but not for s
itself. Then the developer can restrict and fix the random values to make the object valid.
It is important to make fuzzers as general as possible in order to cover as many valid objects as possible. If the fuzzer is too restrictive, the test coverage will be bad. In many cases during the development of Kubernetes, regressions were not caught because the fuzzers in place were not good.
On the other hand, a fuzzer only has to consider objects that validate and are the projection of actual objects definable in the external versions. Often you have to restrict the random values set by c.FuzzNoCustom(s)
in a way that the randomized object becomes valid. For example, a string holding a URL does not have to roundtrip for arbitrary values if validation will reject arbitrary strings anyway.
Our preceding PizzaSpec
example first calls c.FuzzNoCustom(s)
and then fixes up the object by:
Defaulting the nil
case for toppings
Setting a reasonable quantity for each topping (without that, the conversion to v1alpha1
will explode in complexity, introducing high quantities into a string list)
Normalizing the topping names, as we know that duplicated toppings in a pizza spec will never roundtrip (for the internal types, note that v1alpha1 types have duplication)
Incoming objects are validated shortly after they have been deserialized, defaulted, and converted to the internal version. Figure 8-5 showed earlier how validation is done between mutating admission plug-ins and validating admission plug-ins, long before the actual creation or update logic is executed.
This means validation has to be implemented only once for the internal version, not for all external versions. This has the advantage that it obviously saves implementation work and also ensures consistency between versions. On the other hand, it means that validation errors do not refer to the external version. This can actually be observed with Kubernetes resources, but in practice it is no big deal.
In this section, we’ll look at the implementation of validation functions. The wiring into the custom API server—namely, calling validation from the strategy that configures the generic registry—will be covered in the next section. In other words, Figure 8-5 is slightly misleading in favor of visual simplicity.
For now it should be enough to look at the entry point into the validation inside the strategy:
func
(
pizzaStrategy
)
Validate
(
ctx
context
.
Context
,
obj
runtime
.
Object
,
)
field
.
ErrorList
{
pizza
:=
obj
.(
*
restaurant
.
Pizza
)
return
validation
.
ValidatePizza
(
pizza
)
}
This calls out to the ValidateKind(obj
*Kind) field.ErrorList
validation function in the validation package of the API group pkg/apis/group/validation
.
The validation functions return an error list. They are usually written in the same style, appending return values to an error list while recursively diving into the type, one validation function per struct:
// ValidatePizza validates a Pizza.
func
ValidatePizza
(
f
*
restaurant
.
Pizza
)
field
.
ErrorList
{
allErrs
:=
field
.
ErrorList
{}
errs
:=
ValidatePizzaSpec
(
&
f
.
Spec
,
field
.
NewPath
(
"spec"
))
allErrs
=
append
(
allErrs
,
errs
...
)
return
allErrs
}
// ValidatePizzaSpec validates a PizzaSpec.
func
ValidatePizzaSpec
(
s
*
restaurant
.
PizzaSpec
,
fldPath
*
field
.
Path
,
)
field
.
ErrorList
{
allErrs
:=
field
.
ErrorList
{}
prevNames
:=
map
[
string
]
bool
{}
for
i
:=
range
s
.
Toppings
{
if
s
.
Toppings
[
i
].
Quantity
<=
0
{
allErrs
=
append
(
allErrs
,
field
.
Invalid
(
fldPath
.
Child
(
"toppings"
).
Index
(
i
).
Child
(
"quantity"
),
s
.
Toppings
[
i
].
Quantity
,
"cannot be negative or zero"
,
))
}
if
len
(
s
.
Toppings
[
i
].
Name
)
==
0
{
allErrs
=
append
(
allErrs
,
field
.
Invalid
(
fldPath
.
Child
(
"toppings"
).
Index
(
i
).
Child
(
"name"
),
s
.
Toppings
[
i
].
Name
,
"cannot be empty"
,
))
}
else
{
if
prevNames
[
s
.
Toppings
[
i
].
Name
]
{
allErrs
=
append
(
allErrs
,
field
.
Invalid
(
fldPath
.
Child
(
"toppings"
).
Index
(
i
).
Child
(
"name"
),
s
.
Toppings
[
i
].
Name
,
"must be unique"
,
))
}
prevNames
[
s
.
Toppings
[
i
].
Name
]
=
true
}
}
return
allErrs
}
Note how the field path is maintained using Child
and Index
calls. The field path is the JSON path, which is printed in case of errors.
Often there is an additional set of validation functions that differs slightly for updates (while the preceding set is used for creation). In our example API server, this could look like the following:
func
(
pizzaStrategy
)
ValidateUpdate
(
ctx
context
.
Context
,
obj
,
old
runtime
.
Object
,
)
field
.
ErrorList
{
objPizza
:=
obj
.(
*
restaurant
.
Pizza
)
oldPizza
:=
old
.(
*
restaurant
.
Pizza
)
return
validation
.
ValidatePizzaUpdate
(
objPizza
,
oldPizza
)
}
This can be used to verify that no read-only fields are changed. Often an update validation calls the normal validation functions as well and only adds checks relevant for the update.
Validation is the right place to restrict object names on creation—for example, to be single-word only, or to not include any non-alpha-numeric characters.
Actually, any ObjectMeta
field can technically be restricted in a custom way, though that’s not desirable for many fields because it might break core API machinery behavior. A number of resources restrict the names because, for example, the name will show up in other systems or in other contexts that require a specially formatted name.
But even if there are special ObjectMeta
validations in place in a custom API server, the generic registry will validate against generic rules in any case, after the custom validation has passed. This allows us to return more specific error messages from the custom code first.
So far, we have seen how API types are defined and validate. The next step is the implementation of the REST logic for those API types. Figure 8-7 shows the registry as a central part of the implementation of an API group. The generic REST request handler code in k8s.io/apiserver calls out to the registry.
The REST logic is usually implemented by what is called the generic registry. It is—as the name suggests—a generic implementation of the registry interfaces in the package k8s.io/apiserver/pkg/registry/rest.
The generic registry implements the default REST behavior for “normal” resources. Nearly all Kubernetes resources use this implementation. Only a few, specifically those that do not persist objects (e.g., SubjectAccessReview
; see “Delegated Authorization”), have custom implementations.
In k8s.io/apiserver/pkg/registry/rest/rest.go you will find many interfaces, loosely corresponding to HTTP verbs and certain API functionalities. If an interface is implemented by a registry, the API endpoint code will offer certain REST features. Because the generic registry implements most of the k8s.io/apiserver/pkg/registry/rest interfaces, resources that use it will support all the default Kubernetes HTTP verbs (see “The HTTP Interface of the API Server”). Here is a list of those interfaces that are implemented, with the GoDoc description from the Kubernetes source code:
CollectionDeleter
An object that can delete a collection of RESTful resources
Creater
An object that can create an instance of a RESTful object
CreaterUpdater
A storage object that must support both create and update operations
Exporter
An object that knows how to strip a RESTful resource for export
Getter
An object that can retrieve a named RESTful resource
GracefulDeleter
An object that knows how to pass deletion options to allow delayed deletion of a RESTful object
Lister
An object that can retrieve resources that match the provided field and label criteria
Patcher
A storage object that supports both get and update
Scoper
An object that must be specified and indicates what scope the resource
Updater
An object that can update an instance of a RESTful object
Watcher
An object that should be implemented by all storage objects that want to offer the ability to watch for changes through the Watch
API
Let’s look at one of the interfaces, Creater
:
// Creater is an object that can create an instance of a RESTful object.
type
Creater
interface
{
// New returns an empty object that can be used with Create after request
// data has been put into it.
// This object must be a pointer type for use with Codec.DecodeInto([]byte,
// runtime.Object)
New
()
runtime
.
Object
// Create creates a new version of a resource.
Create
(
ctx
context
.
Context
,
obj
runtime
.
Object
,
createValidation
ValidateObjectFunc
,
options
*
metav1
.
CreateOptions
,
)
(
runtime
.
Object
,
error
)
}
A registry implementing this interface will be able to create objects. In contrast to NamedCreater
, the name of the new object either comes from ObjectMeta.Name
or is generated via ObjectMeta.GenerateName
. If a registry implements NamedCreater
, the name can also be passed through the HTTP path.
It is important to understand that the implemented interfaces determine which verbs will be supported by the API endpoint that is created while installing the API into the custom API server. See “API Installation” for how this is done in the code.
The generic registry can be customized to a certain degree using an object called a strategy. The strategy provides callbacks to functionality like validation, as we saw in “Validation”.
The strategy implements the REST strategy interfaces listed here with their GoDoc description (see k8s.io/apiserver/pkg/registry/rest for their definitions):
RESTCreateStrategy
Defines the minimum validation, accepted input, and name generation behavior to create an object that follows Kubernetes API conventions.
RESTDeleteStrategy
Defines deletion behavior on an object that follows Kubernetes API conventions.
RESTGracefulDeleteStrategy
Must be implemented by the registry that supports graceful deletion.
GarbageCollectionDeleteStrategy
Must be implemented by the registry that wants to orphan dependents by default.
RESTExportStrategy
Defines how to export a Kubernetes object.
RESTUpdateStrategy
Defines the minimum validation, accepted input, and name generation behavior to update an object that follows Kubernetes API conventions.
Let’s look again at the strategy for the creation case:
type
RESTCreateStrategy
interface
{
runtime
.
ObjectTyper
// The name generator is used when the standard GenerateName field is set.
// The NameGenerator will be invoked prior to validation.
names
.
NameGenerator
// NamespaceScoped returns true if the object must be within a namespace.
NamespaceScoped
()
bool
// PrepareForCreate is invoked on create before validation to normalize
// the object. For example: remove fields that are not to be persisted,
// sort order-insensitive list fields, etc. This should not remove fields
// whose presence would be considered a validation error.
//
// Often implemented as a type check and an initailization or clearing of
// status. Clear the status because status changes are internal. External
// callers of an api (users) should not be setting an initial status on
// newly created objects.
PrepareForCreate
(
ctx
context
.
Context
,
obj
runtime
.
Object
)
// Validate returns an ErrorList with validation errors or nil. Validate
// is invoked after default fields in the object have been filled in
// before the object is persisted. This method should not mutate the
// object.
Validate
(
ctx
context
.
Context
,
obj
runtime
.
Object
)
field
.
ErrorList
// Canonicalize allows an object to be mutated into a canonical form. This
// ensures that code that operates on these objects can rely on the common
// form for things like comparison. Canonicalize is invoked after
// validation has succeeded but before the object has been persisted.
// This method may mutate the object. Often implemented as a type check or
// empty method.
Canonicalize
(
obj
runtime
.
Object
)
}
The embedded ObjectTyper
recognizes objects; that is, it checks whether an object in a request is supported by the registry. This is important to create the right kind of objects (e.g., via a “foo” resource, only “Foo” resources should be created).
The NameGenerator
obviously generates names from the ObjectMeta.GenerateName
field.
Via NamespaceScoped
the strategy can support cluster-wide or namespaced resources by returning either false
or true
.
The PrepareForCreate
method is called with the incoming object before validation.
The Validate
method we’ve seen before in “Validation”: it’s the entry point to the validation functions.
Finally, the Canonicalize
method does normalization (e.g., sorting of slices).
The strategy object is plugged into a generic registry instance. Here is the REST storage constructor for our custom API server on GitHub:
// NewREST returns a RESTStorage object that will work against API services.
func
NewREST
(
scheme
*
runtime
.
Scheme
,
optsGetter
generic
.
RESTOptionsGetter
,
)
(
*
registry
.
REST
,
error
)
{
strategy
:=
NewStrategy
(
scheme
)
store
:=
&
genericregistry
.
Store
{
NewFunc
:
func
()
runtime
.
Object
{
return
&
restaurant
.
Pizza
{}
},
NewListFunc
:
func
()
runtime
.
Object
{
return
&
restaurant
.
PizzaList
{}
},
PredicateFunc
:
MatchPizza
,
DefaultQualifiedResource
:
restaurant
.
Resource
(
"pizzas"
),
CreateStrategy
:
strategy
,
UpdateStrategy
:
strategy
,
DeleteStrategy
:
strategy
,
}
options
:=
&
generic
.
StoreOptions
{
RESTOptions
:
optsGetter
,
AttrFunc
:
GetAttrs
,
}
if
err
:=
store
.
CompleteWithOptions
(
options
);
err
!=
nil
{
return
nil
,
err
}
return
&
registry
.
REST
{
store
},
nil
}
It instantiates the generic registry object genericregistry.Store
and sets a few fields. Many of these fields are optional and store.CompleteWithOptions
will default them if they are not set by the developer.
You can see how the custom strategy is first instantiated via the NewStrategy
constructor and then plugged into the registry for create
, update
, and delete
operators.
In addition, the NewFunc
is set to create a new object instance, and the NewListFunc
field is set to create a new object list. The PredicateFunc
translates a selector (which could be passed to a list request) into a predicate function, filtering runtime objects.
The returned object is a REST registry, just a simple wrapper in our example project around the generic registry object to make the type our own:
type
REST
struct
{
*
genericregistry
.
Store
}
With this we have everything to instantiate our API and wire it into the custom API server. In the following section we’ll see how to create an HTTP handler out of it.
To activate an API in an API server, two steps are necessary:
The API version must be installed into the API type’s (and conversion and defaulting functions’) server scheme.
The API version must be installed into the server HTTP multiplexer (mux).
The first step is usually done using init
functions somewhere centrally in the API server bootstrapping. This is done in pkg/apiserver/apiserver.go in our example custom API server, where the serverConfig
and CustomServer
objects are defined (see “Options and Config Pattern and Startup Plumbing”):
import
(
...
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/runtime/serializer"
"github.com/programming-kubernetes/pizza-apiserver/pkg/apis/restaurant/install"
)
var
(
Scheme
=
runtime
.
NewScheme
()
Codecs
=
serializer
.
NewCodecFactory
(
Scheme
)
)
Then for each API group that should be served, we call the Install()
function:
func
init
()
{
install
.
Install
(
Scheme
)
}
For technical reasons, we also have to add some discovery-related types to the scheme (this will probably go away in future versions of k8s.io/apiserver):
func
init
()
{
// we need to add the options to empty v1
// TODO: fix the server code to avoid this
metav1
.
AddToGroupVersion
(
Scheme
,
schema
.
GroupVersion
{
Version
:
"v1"
})
// TODO: keep the generic API server from wanting this
unversioned
:=
schema
.
GroupVersion
{
Group
:
""
,
Version
:
"v1"
}
Scheme
.
AddUnversionedTypes
(
unversioned
,
&
metav1
.
Status
{},
&
metav1
.
APIVersions
{},
&
metav1
.
APIGroupList
{},
&
metav1
.
APIGroup
{},
&
metav1
.
APIResourceList
{},
)
}
With this we have registered our API types in the global scheme, including conversion and defaulting functions. In other words, the empty scheme of Figure 8-3 now knows everything about our types.
The second step is to add the API group to the HTTP mux. The generic API server code embedded into our CustomServer
struct provides the InstallAPIGroup(apiGroupInfo *APIGroupInfo) error
method, which sets up the whole request pipeline for an API group.
The only thing we have to do is to provide a properly filled APIGroupInfo
struct. We do this in the constructor New()
(*CustomServer, error)
of the completedConfig
type:
// New returns a new instance of CustomServer from the given config.
func
(
c
completedConfig
)
New
()
(
*
CustomServer
,
error
)
{
genericServer
,
err
:=
c
.
GenericConfig
.
New
(
"pizza-apiserver"
,
genericapiserver
.
NewEmptyDelegate
())
if
err
!=
nil
{
return
nil
,
err
}
s
:=
&
CustomServer
{
GenericAPIServer
:
genericServer
,
}
apiGroupInfo
:=
genericapiserver
.
NewDefaultAPIGroupInfo
(
restaurant
.
GroupName
,
Scheme
,
metav1
.
ParameterCodec
,
Codecs
)
v1alpha1storage
:=
map
[
string
]
rest
.
Storage
{}
pizzaRest
:=
pizzastorage
.
NewREST
(
Scheme
,
c
.
GenericConfig
.
RESTOptionsGetter
)
v1alpha1storage
[
"pizzas"
]
=
customregistry
.
RESTInPeace
(
pizzaRest
)
toppingRest
:=
toppingstorage
.
NewREST
(
Scheme
,
c
.
GenericConfig
.
RESTOptionsGetter
,
)
v1alpha1storage
[
"toppings"
]
=
customregistry
.
RESTInPeace
(
toppingRest
)
apiGroupInfo
.
VersionedResourcesStorageMap
[
"v1alpha1"
]
=
v1alpha1storage
v1beta1storage
:=
map
[
string
]
rest
.
Storage
{}
pizzaRest
=
pizzastorage
.
NewREST
(
Scheme
,
c
.
GenericConfig
.
RESTOptionsGetter
)
v1beta1storage
[
"pizzas"
]
=
customregistry
.
RESTInPeace
(
pizzaRest
)
apiGroupInfo
.
VersionedResourcesStorageMap
[
"v1beta1"
]
=
v1beta1storage
if
err
:=
s
.
GenericAPIServer
.
InstallAPIGroup
(
&
apiGroupInfo
);
err
!=
nil
{
return
nil
,
err
}
return
s
,
nil
}
The APIGroupInfo
has references to the generic registry that we customized in “Registry and Strategy” via a strategy. For each group version and resource, we create an instance of the registry using the implemented constructors.
The customregistry.RESTInPeace
wrapper is just a helper that panics when the registry constructors return an error:
func
RESTInPeace
(
storage
rest
.
StandardStorage
,
err
error
)
rest
.
StandardStorage
{
if
err
!=
nil
{
err
=
fmt
.
Errorf
(
"unable to create REST storage: %v"
,
err
)
panic
(
err
)
}
return
storage
}
The registry itself is version-independent, as it operates on internal objects; refer back to Figure 8-5. Hence, we call the same registry constructor for each version.
The call to InstallAPIGroup
finally leads us to a complete custom API server ready to serve our custom API group, as shown earlier in Figure 8-7.
After all this heavy plumbing, it is time to see our new API groups in action. For this we start up the server as shown in “The First Start”. But this time the discovery info is not empty but instead shows our newly registered resource:
$
curl -k https://localhost:443/apis{
"kind"
:"APIGroupList"
,"groups"
:[
{
"name"
:"restaurant.programming-kubernetes.info"
,"versions"
:[
{
"groupVersion"
:"restaurant.programming-kubernetes.info/v1beta1"
,"version"
:"v1beta1"
}
,{
"groupVersion"
:"restaurant.programming-kubernetes.info/v1alpha1"
,"version"
:"v1alpha1"
}
]
,"preferredVersion"
:{
"groupVersion"
:"restaurant.programming-kubernetes.info/v1beta1"
,"version"
:"v1beta1"
}
,"serverAddressByClientCIDRs"
:[
{
"clientCIDR"
:"0.0.0.0/0"
,"serverAddress"
:":443"
}
]
}
]
}
With this, we have nearly reached our goal to serve the restaurant API. We have wired the API group versions, conversions are in place, and validation is working.
What’s missing is a check that a topping mentioned in a pizza actually exists in the cluster. We could add this in the validation functions. But traditionally these are just format validation functions, which are static and do not need other resources to run.
In contrast, more complex checks are implemented in admission—the topic of the next section.
Every request passes the chain of admission plug-ins after being unmarshaled, defaulted, and converted to internal types; refer back to Figure 8-2. More precisely, requests pass admission twice:
The mutating plug-ins
The validating plug-ins
Admission plug-ins can be both mutating and validating and therefore can potentially get called twice by the admission mechanism:
Once in the mutation phase, called for all mutating plug-ins sequentially
Once in the validation phase, called (potentially parallelized) for all validating plug-ins
More precisely, a plug-in can implement both the mutating and the validating admission interface, with two different methods for both cases.
Before the separation into mutating and validating, there was just one call to each plug-in. It was nearly impossible to keep an eye on which mutation each plug-in did and which admission plug-in order therefore made sense to lead to consistent behavior for the user.
This two-step architecture at least ensures that a validation is done at the end for all plug-ins, which guarantees consistency.
In addition, the chain (i.e., the order of plug-ins for both admission phases) is the same. Plug-ins are always enabled or disabled for both phases at the same time.
Admission plug-ins, at least those implemented in Golang as described in this chapter, work with internal types. In contrast, webhook admission plug-ins (see “Admission Webhooks”) are based on external types and involve conversion on the way to the webhook and back (in case of mutating webhooks).
But after all this theory, let’s get into the code.
An admission plug-in is a type implementing:
The admission plug-in interface Interface
Optionally the MutatingInterface
Optionally the ValidatingInterface
All three can be found in the package k8s.io/apiserver/pkg/admission:
// Operation is the type of resource operation being checked for
// admission control
type
Operation
string
.
// Operation constants
const
(
Create
Operation
=
"CREATE"
Update
Operation
=
"UPDATE"
Delete
Operation
=
"DELETE"
Connect
Operation
=
"CONNECT"
)
// Interface is an abstract, pluggable interface for Admission Control
// decisions.
type
Interface
interface
{
// Handles returns true if this admission controller can handle the given
// operation where operation can be one of CREATE, UPDATE, DELETE, or
// CONNECT.
Handles
(
operation
Operation
)
bool
.
}
type
MutationInterface
interface
{
Interface
// Admit makes an admission decision based on the request attributes.
Admit
(
a
Attributes
,
o
ObjectInterfaces
)
(
err
error
)
}
// ValidationInterface is an abstract, pluggable interface for Admission Control
// decisions.
type
ValidationInterface
interface
{
Interface
// Validate makes an admission decision based on the request attributes.
// It is NOT allowed to mutate.
Validate
(
a
Attributes
,
o
ObjectInterfaces
)
(
err
error
)
}
You see that the Interface
method Handles
is responsible for filtering on the operation. The mutating plug-ins are called via Admit
and the validating plug-ins are called via Validate
.
The ObjectInterfaces
gives access to helpers usually implemented by a scheme:
type
ObjectInterfaces
interface
{
// GetObjectCreater is the ObjectCreater for the requested object.
GetObjectCreater
()
runtime
.
ObjectCreater
// GetObjectTyper is the ObjectTyper for the requested object.
GetObjectTyper
()
runtime
.
ObjectTyper
// GetObjectDefaulter is the ObjectDefaulter for the requested object.
GetObjectDefaulter
()
runtime
.
ObjectDefaulter
// GetObjectConvertor is the ObjectConvertor for the requested object.
GetObjectConvertor
()
runtime
.
ObjectConvertor
}
The attributes passed to the plug-in (via Admit
or Validate
or both) basically contain all the information extractable from a request that is important to implementing advanced checks:
// Attributes is an interface used by AdmissionController to get information
// about a request that is used to make an admission decision.
type
Attributes
interface
{
// GetName returns the name of the object as presented in the request.
// On a CREATE operation, the client may omit name and rely on the
// server to generate the name. If that is the case, this method will
// return the empty string.
GetName
()
string
// GetNamespace is the namespace associated with the request (if any).
GetNamespace
()
string
// GetResource is the name of the resource being requested. This is not the
// kind. For example: pods.
GetResource
()
schema
.
GroupVersionResource
// GetSubresource is the name of the subresource being requested. This is a
// different resource, scoped to the parent resource, but it may have a
// different kind.
// For instance, /pods has the resource "pods" and the kind "Pod", while
// /pods/foo/status has the resource "pods", the sub resource "status", and
// the kind "Pod" (because status operates on pods). The binding resource for
// a pod, though, may be /pods/foo/binding, which has resource "pods",
// subresource "binding", and kind "Binding".
GetSubresource
()
string
// GetOperation is the operation being performed.
GetOperation
()
Operation
// IsDryRun indicates that modifications will definitely not be persisted for
// this request. This is to prevent admission controllers with side effects
// and a method of reconciliation from being overwhelmed.
// However, a value of false for this does not mean that the modification will
// be persisted, because it could still be rejected by a subsequent
// validation step.
IsDryRun
()
bool
// GetObject is the object from the incoming request prior to default values
// being applied.
GetObject
()
runtime
.
Object
// GetOldObject is the existing object. Only populated for UPDATE requests.
GetOldObject
()
runtime
.
Object
// GetKind is the type of object being manipulated. For example: Pod.
GetKind
()
schema
.
GroupVersionKind
// GetUserInfo is information about the requesting user.
GetUserInfo
()
user
.
Info
// AddAnnotation sets annotation according to key-value pair. The key
// should be qualified, e.g., podsecuritypolicy.admission.k8s.io/admit-policy,
// where "podsecuritypolicy" is the name of the plugin, "admission.k8s.io"
// is the name of the organization, and "admit-policy" is the key
// name. An error is returned if the format of key is invalid. When
// trying to overwrite annotation with a new value, an error is
// returned. Both ValidationInterface and MutationInterface are
// allowed to add Annotations.
AddAnnotation
(
key
,
value
string
)
error
}
In the mutating case—that is, in the implementation of the Admit(a Attributes) error
method—the attributes can be mutated, or more precisely, the object returned from GetObject() runtime.Object
can.
In the validating case, mutation is not allowed.
Both cases permit the call to AddAnnotation(key, value string) error
, which allows us to add annotations that end up in the audit output of the API server. This can be helpful in order to understand why an admission plug-in mutated or rejected a request.
Rejection is signaled by returning a non-nil
error from Admit
or Validate
.
It is good practice for mutating admission plug-ins to also validate the changes in the validating admission phase. The reason is that other plug-ins, including webhook admission plug-ins, might add further changes. If an admission plug-in guarantees that certain invariants are fulfilled, only the validation step can make sure this is really the case.
Admission plug-ins have to implement the Handles(operation Operation) bool
method from the admission.Interface
interfaces. There is a helper in the same package called Handler
. It can be instantiated using NewHandler(ops ...Operation) *Handler
and implements the Handles
method by embedding Handler
into the custom admission plug-in:
type
CustomAdmissionPlugin
struct
{
*
admission
.
Handler
...
}
Admission plug-ins should always check the GroupVersionKind of the passed object first:
func
(
d
*
PizzaToppingsPlugin
)
Admit
(
a
admission
.
Attributes
,
o
ObjectInterfaces
,
)
error
{
// we are only interested in pizzas
if
a
.
GetKind
().
GroupKind
()
!=
restaurant
.
Kind
(
"Pizza"
)
{
return
nil
}
...
}
and similarly for the validating case:
func
(
d
*
PizzaToppingsPlugin
)
Validate
(
a
admission
.
Attributes
,
o
ObjectInterfaces
,
)
error
{
// we are only interested in pizzas
if
a
.
GetKind
().
GroupKind
()
!=
restaurant
.
Kind
(
"Pizza"
)
{
return
nil
}
...
}
The full example admission implementation looks like this:
// Admit ensures that the object in-flight is of kind Pizza.
// In addition checks that the toppings are known.
func
(
d
*
PizzaToppingsPlugin
)
Validate
(
a
admission
.
Attributes
,
_
admission
.
ObjectInterfaces
,
)
error
{
// we are only interested in pizzas
if
a
.
GetKind
().
GroupKind
()
!=
restaurant
.
Kind
(
"Pizza"
)
{
return
nil
}
if
!
d
.
WaitForReady
()
{
return
admission
.
NewForbidden
(
a
,
fmt
.
Errorf
(
"not yet ready"
))
}
obj
:=
a
.
GetObject
()
pizza
:=
obj
.(
*
restaurant
.
Pizza
)
for
_
,
top
:=
range
pizza
.
Spec
.
Toppings
{
err
:=
_
,
err
:=
d
.
toppingLister
.
Get
(
top
.
Name
)
if
err
!=
nil
&&
errors
.
IsNotFound
(
err
)
{
return
admission
.
NewForbidden
(
a
,
fmt
.
Errorf
(
"unknown topping: %s"
,
top
.
Name
),
)
}
}
return
nil
}
It takes the following steps:
Checks that the passed object is of the right kind
Forbids access before the informers are ready
Verifies via the toppings informer lister that each topping mentioned in the pizza specification actually exists as a Topping
object in the cluster
Note here that the lister is just an interface to the informer in-memory store. So these Get
calls will be fast.
Admission plug-ins must be registered. This is done through a Register
function:
func
Register
(
plugins
*
admission
.
Plugins
)
{
plugins
.
Register
(
"PizzaTopping"
,
func
(
config
io
.
Reader
)
(
admission
.
Interface
,
error
)
{
return
New
()
},
)
}
This function is added to the plug-in list in the RecommendedOptions
(see “Options and Config Pattern and Startup Plumbing”):
func
(
o
*
CustomServerOptions
)
Complete
()
error
{
// register admission plugins
pizzatoppings
.
Register
(
o
.
RecommendedOptions
.
Admission
.
Plugins
)
// add admisison plugins to the RecommendedPluginOrder
oldOrder
:=
o
.
RecommendedOptions
.
Admission
.
RecommendedPluginOrder
o
.
RecommendedOptions
.
Admission
.
RecommendedPluginOrder
=
append
(
oldOrder
,
"PizzaToppings"
)
return
nil
}
Here, the RecommendedPluginOrder
list is prepopulated with the generic admission plug-ins, which every API server should keep enabled to be a good API convention citizen in the cluster.
It is best practice not to touch the order. One reason is that getting the order right is far from trivial. Of course, adding a custom plug-in at a location other than the end of the list is fine, if it is strictly necessary for the plug-in behavior.
The user of the custom API server will be able to disable a custom admission plug-in with the usual admission chain configuration flags (--disable-admission-plugins
, for example). By default our own plug-in is enabled, because we don’t explicitly disable it.
Admission plug-ins can be configured using a configuration file. To do so, we parse the output of the io.Reader
in the Register
function shown previously. The --admission-control-config-file
allows us to pass a configuration file to the plug-in, like so:
kind
:
AdmissionConfiguration
apiVersion
:
apiserver.k8s.io/v1alpha1
plugins
:
-
name
:
CustomAdmissionPlugin
path
:
custom-admission-plugin.yaml
Alternatively, we can do inline configuration to have all our admission configuration in one place:
kind
:
AdmissionConfiguration
apiVersion
:
apiserver.k8s.io/v1alpha1
plugins
:
-
name
:
CustomAdmissionPlugin
configuration
:
your-custom-yaml-inline-config
We briefly mentioned that our admission plug-in uses the toppings informer to check for the existence of toppings mentioned in the pizza. We have not talked about how to wire that into the admission plug-in. Let’s do this now.
Admission plug-ins often need clients and informers or other resources to implement their behavior. We can do this resource plumbing using plug-in initializers.
There are a number of standard plug-in initializers. If your plug-in wants to be called by them, it has to implement certain interfaces with callback methods (for more on this, see k8s.io/apiserver/pkg/admission/initializer):
// WantsExternalKubeClientSet defines a function that sets external ClientSet
// for admission plugins that need it.
type
WantsExternalKubeClientSet
interface
{
SetExternalKubeClientSet
(
kubernetes
.
Interface
)
admission
.
InitializationValidator
}
// WantsExternalKubeInformerFactory defines a function that sets InformerFactory
// for admission plugins that need it.
type
WantsExternalKubeInformerFactory
interface
{
SetExternalKubeInformerFactory
(
informers
.
SharedInformerFactory
)
admission
.
InitializationValidator
}
// WantsAuthorizer defines a function that sets Authorizer for admission
// plugins that need it.
type
WantsAuthorizer
interface
{
SetAuthorizer
(
authorizer
.
Authorizer
)
admission
.
InitializationValidator
}
// WantsScheme defines a function that accepts runtime.Scheme for admission
// plugins that need it.
type
WantsScheme
interface
{
SetScheme
(
*
runtime
.
Scheme
)
admission
.
InitializationValidator
}
Implement some of these and the plug-in gets called during launch, in order to get access to, say, Kubernetes resources or the API server global scheme.
In addition, the admission.InitializationValidator
interface is supposed to be implemented to do a final check that the plug-in is properly set up:
// InitializationValidator holds ValidateInitialization functions, which are
// responsible for validation of initialized shared resources and should be
// implemented on admission plugins.
type
InitializationValidator
interface
{
ValidateInitialization
()
error
}
Standard initializers are great, but we need access to the toppings informer. So, let’s look at how to add our own initializers. An initializer consists of:
A Wants*
interface (e.g., WantsRestaurantInformerFactory
), which should be implemented by an admission plug-in:
// WantsRestaurantInformerFactory defines a function that sets
// InformerFactory for admission plugins that need it.
type
WantsRestaurantInformerFactory
interface
{
SetRestaurantInformerFactory
(
informers
.
SharedInformerFactory
)
admission
.
InitializationValidator
}
The initializer struct, implementing admission.PluginInitializer
:
func
(
i
restaurantInformerPluginInitializer
)
Initialize
(
plugin
admission
.
Interface
,
)
{
if
wants
,
ok
:=
plugin
.(
WantsRestaurantInformerFactory
);
ok
{
wants
.
SetRestaurantInformerFactory
(
i
.
informers
)
}
}
In other words, the Initialize()
method checks that the passed plug-in implements the corresponding custom initializer Wants*
interface. If that is the case, the initializer will call the method on the plug-in.
Plumbing of the initializer constructor into RecommendedOptions.ExtraAdmissionInitializers
(see “Options and Config Pattern and Startup Plumbing”):
func
(
o
*
CustomServerOptions
)
Config
()
(
*
apiserver
.
Config
,
error
)
{
...
o
.
RecommendedOptions
.
ExtraAdmissionInitializers
=
func
(
c
*
genericapiserver
.
RecommendedConfig
)
(
[]
admission
.
PluginInitializer
,
error
,
)
{
client
,
err
:=
clientset
.
NewForConfig
(
c
.
LoopbackClientConfig
)
if
err
!=
nil
{
return
nil
,
err
}
informerFactory
:=
informers
.
NewSharedInformerFactory
(
client
,
c
.
LoopbackClientConfig
.
Timeout
,
)
o
.
SharedInformerFactory
=
informerFactory
return
[]
admission
.
PluginInitializer
{
custominitializer
.
New
(
informerFactory
),
},
nil
}
...
}
This code creates a loopback client for the restaurant API group, creates a corresponding informer factory, stores it in the options o
, and returns a plug-in initializer for it.
As promised, admission is the last step in the implementation to complete our custom API server for the restaurant API group. Now we want to see it in action, but not artificially on the local machine, but rather in a real Kubernetes cluster. This means we have to take a look at the deployment of an aggregated custom API server.
In “API Services”, we saw the APIService
object, which is used to register the custom API server API group versions with the aggregator inside the Kubernetes API server:
apiVersion
:
apiregistration.k8s.io/v1beta1
kind
:
APIService
metadata
:
name
:
name
spec
:
group
:
API-group-name
version
:
API-group-version
service
:
namespace
:
custom-API-server-service-namespace
name
:
custom-API-server-service
caBundle
:
base64-caBundle
insecureSkipTLSVerify
:
bool
groupPriorityMinimum
:
2000
versionPriority
:
20
The APIService
object points to a service. Usually, this service will be a normal cluster IP service: that is, the custom API server is deployed into the cluster using pods. The service forwards the requests to the pods.
Let’s look at the Kubernetes manifest to implement this.
We have the following manifests (found in the example code on GitHub) that will be part of an in-cluster deployment of a custom API service:
An APIService
for both versions v1alpha1
:
apiVersion
:
apiregistration.k8s.io/v1beta1
kind
:
APIService
metadata
:
name
:
v1alpha1.restaurant.programming-kubernetes.info
spec
:
insecureSkipTLSVerify
:
true
group
:
restaurant.programming-kubernetes.info
groupPriorityMinimum
:
1000
versionPriority
:
15
service
:
name
:
api
namespace
:
pizza-apiserver
version
:
v1alpha1
…and v1beta1
:
apiVersion
:
apiregistration.k8s.io/v1beta1
kind
:
APIService
metadata
:
name
:
v1alpha1.restaurant.programming-kubernetes.info
spec
:
insecureSkipTLSVerify
:
true
group
:
restaurant.programming-kubernetes.info
groupPriorityMinimum
:
1000
versionPriority
:
15
service
:
name
:
api
namespace
:
pizza-apiserver
version
:
v1alpha1
Note here that we set insecureSkipTLSVerify
. This is OK for development but inadequate for any production deployment. We’ll see how to fix this in “Certificates and Trust”.
A Service
in front of the custom API server instances running in the cluster:
apiVersion
:
v1
kind
:
Service
metadata
:
name
:
api
namespace
:
pizza-apiserver
spec
:
ports
:
-
port
:
443
protocol
:
TCP
targetPort
:
8443
selector
:
apiserver
:
"true"
A Deployment
(as shown here) or DaemonSet
for the custom API server pods:
apiVersion
:
apps/v1
kind
:
Deployment
metadata
:
name
:
pizza-apiserver
namespace
:
pizza-apiserver
labels
:
apiserver
:
"true"
spec
:
replicas
:
1
selector
:
matchLabels
:
apiserver
:
"true"
template
:
metadata
:
labels
:
apiserver
:
"true"
spec
:
serviceAccountName
:
apiserver
containers
:
-
name
:
apiserver
image
:
quay.io/programming-kubernetes/pizza-apiserver:latest
imagePullPolicy
:
Always
command
:
[
"/pizza-apiserver"
]
args
:
-
--etcd-servers=http://localhost:2379
-
--cert-dir=/tmp/certs
-
--secure-port=8443
-
--v=4
-
name
:
etcd
image
:
quay.io/coreos/etcd:v3.2.24
workingDir
:
/tmp
A namespace for the service and the deployment to live in:
apiVersion
:
v1
kind
:
Namespace
metadata
:
name
:
pizza-apiserver
spec
:
{}
Often, the aggregated API server is deployed to some nodes reserved for control plane pods, usually called masters. In that case, a DaemonSet
is a good choice to run one custom API server instance per master node. This leads to a high availability setup. Note, that API servers are stateless, which means they can easily be deployed multiple times and no leader election is necessary.
With these manifests, we are nearly done. As is so often the case, though, a secure deployment needs some more thought. You might have noticed that the pods (defined via the preceding deployment) use a custom service account, apiserver
. This can be created via another manifest:
kind
:
ServiceAccount
apiVersion
:
v1
metadata
:
name
:
apiserver
namespace
:
pizza-apiserver
This service account needs a number of permissions, which we can add via RBAC objects.
The service account of an API service first needs some generic permissions to participate in:
Objects can be created only in an existing namespace, and are deleted when the namespace is deleted. For this the API server has to get, list, and watch namespaces.
Admission webhooks configured via MutatingWebhookConfigurations
and ValidatedWebhookConfigurations
are called from each API server independently. For this the admission mechanism in our custom API server has to get, list, and watch these resources.
We configure both by creating an RBAC cluster role:
kind
:
ClusterRole
apiVersion
:
rbac.authorization.k8s.io/v1
metadata
:
name
:
aggregated-apiserver-clusterrole
rules
:
-
apiGroups
:
[
""
]
resources
:
[
"namespaces"
]
verbs
:
[
"get"
,
"watch"
,
"list"
]
-
apiGroups
:
[
"admissionregistration.k8s.io"
]
resources
:
[
"mutatingwebhookconfigurations"
,
"validatingwebhookconfigurations"
]
verbs
:
[
"get"
,
"watch"
,
"list"
]
and binding it to our service account apiserver
via a ClusterRoleBinding
:
apiVersion
:
rbac.authorization.k8s.io/v1
kind
:
ClusterRoleBinding
metadata
:
name
:
pizza-apiserver-clusterrolebinding
roleRef
:
apiGroup
:
rbac.authorization.k8s.io
kind
:
ClusterRole
name
:
aggregated-apiserver-clusterrole
subjects
:
-
kind
:
ServiceAccount
name
:
apiserver
namespace
:
pizza-apiserver
For delegated authentication and authorization, the service account has to be bound to the preexisting RBAC role extension-apiserver-authentication-reader
:
apiVersion
:
rbac.authorization.k8s.io/v1
kind
:
RoleBinding
metadata
:
name
:
pizza-apiserver-auth-reader
namespace
:
kube-system
roleRef
:
apiGroup
:
rbac.authorization.k8s.io
kind
:
Role
name
:
extension-apiserver-authentication-reader
subjects
:
-
kind
:
ServiceAccount
name
:
apiserver
namespace
:
pizza-apiserver
and the preexisting RBAC cluster role system:auth-delegator
:
apiVersion
:
rbac.authorization.k8s.io/v1
kind
:
ClusterRoleBinding
metadata
:
name
:
pizza-apiserver:system:auth-delegator
roleRef
:
apiGroup
:
rbac.authorization.k8s.io
kind
:
ClusterRole
name
:
system:auth-delegator
subjects
:
-
kind
:
ServiceAccount
name
:
apiserver
namespace
:
pizza-apiserver
Now with all manifests in place and RBAC set up, let’s deploy the API server to a real cluster.
From a checkout of the GitHub repository, and with configured kubectl
with cluster-admin
privileges (this is needed because RBAC rules can never escalate access):
$
cd
$GOPATH
/src/github.com/programming-kubernetes/pizza-apiserver$
cd
artifacts/deployment$
kubectl apply -f ns.yaml# create the namespace first
$
kubectl apply -f .# creating all manifests described above
Now the custom API server is launching:
$
kubectl get pods -A NAMESPACE NAME READY STATUS AGE pizza-apiserver pizza-apiserver-7779f8d486-8fpgj 0/2 ContainerCreating 1s$
# some moments later
$
kubectl get pods -A pizza-apiserver pizza-apiserver-7779f8d486-8fpgj 2/2 Running 75s
When it is running, we double-check that the Kubernetes API server does aggregation (i.e., proxying of requests). First check via APIService
s whether the Kubernetes API server thinks that our custom API server is available:
$
kubectl get apiservices v1alpha1.restaurant.programming-kubernetes.info
NAME SERVICE AVAILABLE
v1alpha1.restaurant.programming-kubernetes.info pizza-apiserver/api True
This looks good. Let’s try to list pizzas, with logging enabled to see whether something goes wrong:
$
kubectl get pizzas --v=
7 ... ... GET https://localhost:58727/apis?timeout=
32s ... ... GET https://localhost:58727/apis/restaurant.programming-kubernetes.info/ v1alpha1?timeout=
32s ... ... GET https://localhost:58727/apis/restaurant.programming-kubernetes.info/ v1beta1/namespaces/default/pizzas?limit=
500 ... Request Headers: ... Accept: application/json;
as
=
Table;
v
=
v1beta1;
g
=
meta.k8s.io, application/json ... User-Agent: kubectl/v1.15.0(
darwin/amd64)
kubernetes/f873d2a ... Response Status:200
OK in6
milliseconds No resources found.
This looks very good. We see that kubectl
queries the discovery information to find out what a pizza is. It queries the restaurant.programming-kubernetes.info/v1beta1 API to list the pizzas. Unsurprisingly, there aren’t any yet. But we can of course change that:
$
cd
../examples$
# install toppings first
$
ls topping*|
xargs -n1
kubectl create -f$
kubectl create -f pizza-margherita.yaml pizza.restaurant.programming-kubernetes.info/margherita created$
kubectl get pizza -o yaml margherita apiVersion: restaurant.programming-kubernetes.info/v1beta1 kind: Pizza metadata: creationTimestamp:"2019-05-05T13:39:52Z"
name: margherita namespace: default resourceVersion:"6"
pizzas/margherita uid: 42ab6e88-6f3b-11e9-8270-0e37170891d3 spec: toppings: - name: mozzarella quantity: 1 - name: tomato quantity: 1 status:{}
This looks awesome. But the margherita pizza was easy. Let’s try defaulting in action by creating an empty pizza that does not list any toppings:
apiVersion: restaurant.programming-kubernetes.info/v1alpha1 kind: Pizza metadata: name: salami spec:
Our defaulting should turn this into a salami pizza with a salami topping. Let’s try:
$
kubectl create -f empty-pizza.yaml pizza.restaurant.programming-kubernetes.info/salami created$
kubectl get pizza -o yaml salami apiVersion: restaurant.programming-kubernetes.info/v1beta1 kind: Pizza metadata: creationTimestamp:"2019-05-05T13:42:42Z"
name: salami namespace: default resourceVersion:"8"
pizzas/salami uid: a7cb7af2-6f3b-11e9-8270-0e37170891d3 spec: toppings: - name: salami quantity: 1 - name: mozzarella quantity: 1 - name: tomato quantity: 1 status:{}
This looks like a delicious salami pizza.
Now let’s check whether our custom admission plug-in is working. We first delete all pizzas and toppings, and then try to re-create the pizzas:
$
kubectl delete pizzas --all pizza.restaurant.programming-kubernetes.info"margherita"
deleted pizza.restaurant.programming-kubernetes.info"salami"
deleted$
kubectl delete toppings --all topping.restaurant.programming-kubernetes.info"mozzarella"
deleted topping.restaurant.programming-kubernetes.info"salami"
deleted topping.restaurant.programming-kubernetes.info"tomato"
deleted$
kubectl create -f pizza-margherita.yaml Error from server(
Forbidden)
: error when creating"pizza-margherita.yaml"
: pizzas.restaurant.programming-kubernetes.info"margherita"
is forbidden: unknown topping: mozzarella
No margherita without mozzarella, like in any good Italian restaurant.
Looks like we are done implementing what we described in “Example: A Pizza Restaurant”. But not quite. Security. Again. We have not taken care of the proper certificates. A malicious pizza seller could try to get between our users and the custom API server because the Kubernetes API server just accepts any serving certificates without checking them. Let’s fix this.
The APIService
object contains the caBundle
field. This configures how the aggregator (inside the Kubernetes API server) trusts the custom API server. This CA bundle contains the certificate (and intermediate certificates) used to verify that the aggregated API server has the identity it claims to have. For any serious deployment, put the corresponding CA bundle into this field.
While insecureSkipTLSVerify
is allowed in an APIService
in order to disable certification verification, it is a bad idea to use this in a production setup. The Kubernetes API server sends requests to a trusted aggregated API server. Setting insecureSkipTLSVerify
to true
means that any other actor can claim to be the aggregated API server. This is obviously insecure and should not be used in production environments.
The reverse trust from the custom API server to the Kubernetes API server, and its preauthentication of requests, is described in “Delegated Authentication and Trust”. We don’t have to do anything extra.
Back to the pizza example: to make it secure, we need a serving certificate and a key for the custom API server in the deployment. We put both into a serving-cert
secret and mount it into the pod at /var/run/apiserver/serving-cert/tls.{crt,key}. Then we use the tls.crt file as CA in the APIService
. This can all be found in the example code on GitHub.
The certificate-generation logic is scripted in a Makefile.
Note that in a real-world scenario we’d probably have some kind of cluster or company CA we can plug into the APIService
.
To see it in action, either start with a new cluster or just reuse the previous one and apply the new, secure manifests:
$
cd
../deployment-secure$
make openssl req -new -x509 -subj"/CN=api.pizza-apiserver.svc"
-nodes -newkey rsa:4096 -keyout tls.key -out tls.crt -days 365 Generating a4096
bit RSA private key ......................++ ................................................................++ writing new private key to'tls.key'
...$
ls *.yaml|
xargs -n1
kubectl apply -f clusterrolebinding.rbac.authorization.k8s.io/pizza-apiserver:system:auth-delegator unchanged rolebinding.rbac.authorization.k8s.io/pizza-apiserver-auth-reader unchanged deployment.apps/pizza-apiserver configured namespace/pizza-apiserver unchanged clusterrolebinding.rbac.authorization.k8s.io/pizza-apiserver-clusterrolebinding unchanged clusterrole.rbac.authorization.k8s.io/aggregated-apiserver-clusterrole unchanged serviceaccount/apiserver unchanged service/api unchanged secret/serving-cert created apiservice.apiregistration.k8s.io/v1alpha1.restaurant.programming-kubernetes.info configured apiservice.apiregistration.k8s.io/v1beta1.restaurant.programming-kubernetes.info configured
Note here the correct common name CN=api.pizza-apiserver.svc
in the certificate. The Kubernetes API server proxies the request to the api/pizza-apiserver service and hence its DNS name must be put into the certificate.
We double-check that we really have disabled the insecureSkipTLSVerify
flag in the APIService
:
$
kubectl get apiservices v1alpha1.restaurant.programming-kubernetes.info -o yaml apiVersion: apiregistration.k8s.io/v1 kind: APIService metadata: name: v1alpha1.restaurant.programming-kubernetes.info ... spec: caBundle: LS0tLS1C... group: restaurant.programming-kubernetes.info groupPriorityMinimum: 1000 service: name: api namespace: pizza-apiserver version: v1alpha1 versionPriority: 15 status: conditions: - lastTransitionTime:"2019-05-05T14:07:07Z"
message: all checks passed reason: Passed status:"True"
type
: Available artifacts/deploymen
This looks as expected: insecureSkipTLSVerify
is gone and the caBundle
field is filled with a base64 value of our certificate And: the service is still available.
Now let’s see whether kubectl
can still query the API:
$
kubectl get pizzas No resources found.$
cd
../examples$
ls topping*|
xargs -n1
kubectl create -f topping.restaurant.programming-kubernetes.info/mozzarella created topping.restaurant.programming-kubernetes.info/salami created topping.restaurant.programming-kubernetes.info/tomato created$
kubectl create -f pizza-margherita.yaml pizza.restaurant.programming-kubernetes.info/margherita created
The margherita pizza is back. This time it is perfectly secured. No chance for a malicious pizza seller to start a man-in-the-middle attack. Buon appetito!
Aggregated API servers using the RecommendOptions
(see “Options and Config Pattern and Startup Plumbing”) use etcd
for storage. This means that any deployment of a custom API server requires an etcd
cluster to be available.
This cluster can be in-cluster—for example, deployed using the etcd
operator. This operator allows us to launch and administrate an etcd
cluster in a declarative way. The operator will do updates, up and down scaling, and backup. This reduces the operational overhead a lot.
Alternatively, the etcd
of the cluster control plane (i.e., that of kube-apiserver
) can be used. Depending on the environment—self-deployed, on-premise, or hosted services like Google Container Engine (GKE)—this might be viable, or it might be impossible because the user has no access to the cluster at all (as is the case with GKE). In the viable cases, the custom API server has to use a key path that is distinct from the one used by the Kubernetes API server or other etcd
consumers. In our example custom API server, it looks like this:
const
defaultEtcdPathPrefix
=
"/registry/pizza-apiserver.programming-kubernetes.github.com"
func
NewCustomServerOptions
()
*
CustomServerOptions
{
o
:=
&
CustomServerOptions
{
RecommendedOptions
:
genericoptions
.
NewRecommendedOptions
(
defaultEtcdPathPrefix
,
...
),
}
return
o
}
This etcd
path prefix is different from Kubernetes API server paths, which use different group API names.
Last but not least, etcd
can be proxied. The project etcdproxy-controller implements this mechanism using the operator pattern; that is, etcd
proxies can be deployed automatically to the cluster and configured using EtcdProxy
objects.
The etcd
proxies will automatically do key mapping, so it is guaranteed that etcd
key prefixes will not conflict. This allows us to share etcd
clusters for multiple aggregated API servers without worrying that one aggregated API server reads or changes the data of another one. This will improve security in an environment where shared etcd
clusters are required, for example, due to resource constraints or to avoid operational overhead.
Depending on the context, one of these options must be chosen. Finally, aggregated API servers can of course also use other storage backends, at least in theory, as it requires a lot of custom code to implement the k8s.io/apiserver storage interfaces.
This was a pretty large chapter, and you made it to the end. You’ve gotten a lot of background about APIs in Kubernetes and how they are implemented.
We saw how aggregation of custom API servers fits into the architecture of a Kubernetes cluster. We saw how a custom API server receives requests that are proxies from the Kubernetes API server. We have seen how the Kubernetes API server preauthenticates these requests, and how API groups are implemented, with external versions and internal versions. We learned how objects are decoded into the Golang structs, how they are defaulted, how they are converted to internal types, and how they go through admission and validation and finally reach the registry. We saw how a strategy is plugged into a generic registry to implement “normal” Kubernetes-like REST resources, how we can add custom admissions, and how to configure a custom admission plug-in with a custom initializer. We now know how to do all the plumbing to start up a custom API server with a multiversion API group, and how to deploy the API group in a cluster with APIServices
. We saw how to configure RBAC rules to allow the custom API server to do its job. We discussed how kubectl
queries API groups. Finally, we learned how to secure the connection to our custom API server with certificates.
This was a lot. Now you have a much better understanding of what APIs are in Kubernetes and how they are implemented, and hopefully you are motivated to do one or more of the following:
Implement your own custom API server
Learn about the inner workings of Kubernetes
Contribute to Kubernetes in the future
We hope that you have found this a good starting point.
1 Graceful deletion means that the client can pass a graceful deletion period as part of the deletion call. The actual deletion is done by a controller asynchronously (the kubelet
does that for pods) by doing a forced deletion. This way pods have time to cleanly shut down.
2 Kubernetes uses cohabitation to migrate resources (e.g., deployments from the extensions/v1beta1
API group) to subject-specific API groups (e.g., apps/v1
). CRDs have no concept of shared storage.
3 We’ll see in Chapter 9 that CRD conversion and admission webhooks available in the latest Kubernetes versions also allow us to add these features to CRDs.
4 PaaS stands for Platform as a Service.