Appendix
Selected Topics

The purpose of this Appendix is to provide more information on selected topics that have been referenced earlier in the book:

  • The first topic concerns the relevant operations and management standards developed in the IETF;
  • The second topic is the development of modeling languages in general and the TOSCA standard in particular;
  • The third topic is dedicated to the notion of the REST API in the context of the World-Wide Web architecture;
  • The fourth topic covers the basic mechanisms that enable identity and access management.

A.1 The IETF Operations and Management Standards

The standards introduced here are the Simple Network Management Protocol (SNMP), the Common Open Policy Service (COPS), and the Network Configuration (NETCONF) protocol.

A.1.1 SNMP

The OSI approach to network management in the Internet was neither ignored nor completely abandoned. It was, indeed, simplified and constrained to focus on smaller, specific problems. Until fairly recently, the main objective of network management was monitoring—not configuration management. As we will soon see, the SNMP protocol ended up being much less than universal; a new set of protocols was needed for configuration management. In addition, access management (and overall security of network management) was not systematically approached until SNMPv3.

The 1990 IETF RFC 1155 defines the Structure of Management Information (SMI). As in the OSI, the MIBs were defined using ASN.1, but the use of ASN.1 is restricted, as we will explain shortly. Each object type is assigned a name—an object identifier—and its syntax and encoding are specified. Object identifiers are assigned according to administrative policies.

An object identifier is a path through a global naming tree structure. The edges are labeled by Unicode strings, which in turn are encoded as integers. Each edge indicates the respective administrative domain (which may change as the tree is being traversed). Figure A.1 depicts the root of the tree with three edges, administered respectively by ITU-T (0), ISO (1), and ISO and ITU-T jointly (2).

Figure A.1 The tree of SMI ASN.1 object identifiers.

Under the ISO administration there is space for international organizations (3), which provided space (6) for the US Department of Defense. The rest of the SMI paths are listed on the right-hand side of the figure. The “internet (1.3.6.1)” subtree is reserved for objects managed by the Internet Assigned Numbers Authority (IANA).

To allow independent object definition, the “private space (1.3.6.1.4)” is defined, with the 1.3.6.1.4.1 space allocated to enterprise objects.

Formally, all identifiers are defined through the OBJECT IDENTIFIER construct recursively—by referring to previously defined constructs. Thus, one writes:

internet   OBJECT IDENTIFIER ::= { iso org(3) dod(6) 1 }
private    OBJECT IDENTIFIER ::= { internet 4 }
enterprise OBJECT IDENTIFIER ::= { private 1 }

RFC 1155 permits only the primitive (non-aggregate) ASN.1 types. It restricts the use of constructor types to lists and tables, and defines a number of application-wide types (such as NetworkAddress, IpAddress, Counter—a 32-bit one, TimeTicks—in 0.01-sec intervals, etc.). RFC 1155 further defines the format of MIB object definitions; each object is described by a quintuple comprising the following (an example follows the list):

  1. Object name (a textual OBJECT DESCRIPTOR along with the OBJECT IDENTIFIER);
  2. The object structure proper, called syntax, which must resolve to an allowed structure type;
  3. A definition of the “semantics” of the object-type—written for people rather than machines to ensure “consistent meaning across all machines;”
  4. Access (read-only, read-write, write-only, or not-accessible);
  5. Status (mandatory, optional, or obsolete).
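
For concreteness, here is how such a definition looks in practice. The sysDescr object from the standard MIB-II (RFC 1213), written with the OBJECT-TYPE macro, exhibits all five elements of the quintuple (the description is abridged here); the last line assigns the object identifier relative to the system group, yielding 1.3.6.1.2.1.1.1:

sysDescr OBJECT-TYPE
    SYNTAX  DisplayString (SIZE (0..255))
    ACCESS  read-only
    STATUS  mandatory
    DESCRIPTION
        "A textual description of the entity."
    ::= { system 1 }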

The SNMP framework architecture is laid out in RFC 3411, which is part of Internet Standard 62 (STD 62). The SNMP system is realized by exchanging SNMP protocol messages among SNMP entities. At least one of these entities is a manager; the others are present in the devices or network elements. In its most general form, the architecture of an entity is presented in Figure A.2.

Figure A.2 SNMP entity (after RFC 3411).

An entity has the SNMP engine and applications. The engine executes the protocol and also takes care of the security services—in particular, confidentiality and authentication. Within an administrative domain, each engine (and hence each entity) is unique, and it is assigned a name—snmpEngineID.

The engine consists of the Dispatcher, the Message processing subsystem, the Security subsystem, and the Access control subsystem.

The dispatcher supports (concurrently) multiple protocol versions but provides a single abstract interface to SNMP applications.

The message processing subsystem relies on modules specific to the SNMP version in use.

The security subsystem provides the namesake services according to a given model. The model is characterized by the threats it protects against and the protocols it employs. Only the user-based model is mentioned explicitly, but the plan is to allow any model as a plug-in.

The Access control subsystem provides the authorization service (and thus arguably could have been considered part of the security subsystem). Similarly to the security subsystem, the access control subsystem relies on one or more specific access control models. RFC 3415, yet another part of STD 62, specifies the view-based access control model, which determines the access rights of a group—a set of zero or more principals (security names) that have the same access rights. As the standard explains, “For a particular context, identified by contextName, to which a group, identified by groupName, has access using a particular securityModel and securityLevel, that group's access rights are specified with respect to read-view, write-view, and notify-view.” Each view represents a set of object instances authorized for its respective action (i.e., reading or writing objects, or sending objects in a notification).

As far as applications within an SNMP entity go, they include command generators, which monitor and manipulate management data; command responders, which provide access to management data; notification originators, which initiate notification messages; notification receivers, which process asynchronous messages; and proxy forwarders, which merely forward messages toward the recipients. These applications are well-defined, but other applications are not excluded, so the architecture leaves a place to plug in an application.

A.1.2 COPS

COPS is specified in RFC 2748, but the context for this work has been set in RFC 2753, which provides the motivation, describes the terminology, and otherwise specifies the framework. We will introduce this context briefly.

First of all, the objective of the framework is to describe the execution of policy-based control over QoS admission control decisions, with the primary focus on the RSVP protocol as an example.

Among the goals of the framework are support for pre-emption, various policy styles, monitoring, and accounting. Pre-emption here means the ability to remove a previously granted resource so as to accommodate a new request. As far as the policy styles go, those to be supported include bi-lateral and multi-lateral service agreements and policies based on the notion of relative priority, as defined by the provider. Support for monitoring and accounting is effected by gathering the resource use and access data. The framework also states the requirement for fault tolerance and recovery in cases when a Policy Decision Point (PDP) fails or cannot be reached.

Figure A.3 presents the policy control architecture. Its main components are the PDP, where the policy-based decisions are made, and the Policy Enforcement Point (PEP), which queries the PDP and ensures that the latter's directives are executed. The PEP is part of the network element in which admission control takes place. Figure A.3 depicts the general case where the PDP is outside the network element; however, the PEP and PDP may be co-located within the same physical “box.” To this end, the framework allows both a co-located PDP module—called a local PDP (LPDP)—and a remote PDP module to be used.

Figure A.3 Policy control architecture (after RFC 2753).

Interactions between the PEP and the PDP are typically triggered when the PEP acts on an event (such as making a decision on whether to admit a given packet). The PEP itself may also trigger the interactions by sending a notification. In the former case, the PEP queries the PDP, supplying the admission control information. The latter could be a flowspec, the amount of bandwidth requested, or a description of the event that triggered the policy decision request (or a combination of these). The PDP responds with the decision, and the PEP acts on it by accepting or denying the original request. PDP may also send additional information—unrelated to admission control—or respond with an error message.

In order to formulate the decision, the PDP may, in turn, request additional information from the directory services or other services—most notably the Authentication, Authorization, and Accounting (AAA) services. Yet another function of the PDP is to export information relevant for monitoring and accounting purposes.

Meanwhile, RFC 2750 updated RSVP to include policy data as a payload in its protocol, to be processed by routers and PDPs, but otherwise opaque to every other entity. We use Figure A.4 to explain the interaction with RSVP.

Figure A.4 Policy control in an RSVP router (after RFC 2753).

When any RSVP-related event requires a policy decision, the reservation setup agent consults the PEP module. The latter first checks the LPDP—to obtain a partial policy decision (if available)—and then queries the PDP, attaching the partial decision fetched from the LPDP. The final policy decision is returned to the reservation setup agent.

A typical case is an admission control request (with which the associated policy elements are passed). The PDP may request, however, that at the same time the PEP raise a policy-related exception to the reservation setup agent. For example, the PDP may approve the request to proceed with the soft-state installation and forward the reservation upstream, but the session time may be limited, and so a respective notification regarding the path expiration would also need to travel upstream. This example illustrates the necessity of non-trivial interactions between COPS and RSVP.

Hence, whenever the PDP returns an error, it has to specify whether the event that generated the admission control request should be processed as usual (but with the addition of an error notification) or whether the processing should be halted.

Conversely, the PDP can itself initiate a communication (by issuing a notification) to the PEP when a decision made earlier needs to be changed or an error was detected. If the notification needs to propagate along the reservation path, the PEP has to convey this information to the reservation agent.

The actual COPS protocol, created for the general administration, configuration, and enforcement of policies, is consistent with the above framework.

Something intrinsically new about COPS—as compared with SNMP or CMIP—is that COPS employs a stateful client–server model, which is different from that of the remote procedure call. As in any client–server model, the PEP (client) sends requests to the remote PDP (server), and the PDP responds with the decisions. But all the requests from the client PEP are installed and remembered by the remote PDP until they are explicitly deleted by the PEP. The decisions can come in the form of a series of notifications to a single request. This, in fact, introduces a new behavior: two identical requests may result in different responses because the states of the system when the first and second of these requests arrive may be different—depending on which states had been installed. Another stateful feature of COPS is that PDP may “push” the configuration information to the client and later remove it.

The COPS stateful model supports two mechanisms of policy control, called respectively the outsourcing model and the configuration model. With the outsourcing mechanism, the PEP queries the PDP every time it needs a decision; when the configuration mechanism is employed, the PDP provisions the policy decision within the PEP.

Unlike SNMP, COPS was designed to leverage self-identifying objects and therefore it is extensible. COPS also runs on TCP, which ensures reliable transport. Although COPS may rely on TLS, it also has its own mechanisms for authentication, protection against replays, and message integrity.

The COPS model was found very useful in telecommunications, where it was both applied and further extended for QoS support. As far as Cloud Computing is concerned, the primary application of COPS is SDN.

A.1.3 Network Configuration (NETCONF) Model and Protocol

Figure A.5 presents the NETCONF architecture.

Figure A.5 NETCONF architecture.

The architecture has been developed—at least initially—with the command-line interface in mind, which explains some of its peculiar features. The protocol follows a client–server model in that a client issues commands carried in a remote procedure call and obtains the result in return. The server executes commands over the configuration datastore. RFC 6241 defines this term as follows: “The datastore holding the complete set of configuration data that is required to get a device from its initial default state into a desired operational state.” The datastore, in turn, is defined as “ … a conceptual place to store and access information.” As far as the implementation is concerned, “A datastore might be implemented, for example, using files, a database, flash memory locations, or combinations thereof.”

In this design, there are two major points of departure from both the client–server protocol and the CLI. In the departure from the strict client–server model, the NETCONF architecture supports asynchronous communications—the client gets notifications from the server. These are effectively interrupts as far as the client is concerned, and so the client has to be developed to supply proper interrupt handlers. In the departure from the CLI model, NETCONF uses structured XML-encoded data. In summary, NETCONF specifies a distributed object-oriented infrastructure.

This infrastructure, along with all other pieces of NETCONF, is conveniently presented in four layers as in Figure A.6. Following RFC 6241, the layers are explained by the respective examples—depicted on the right-hand side.

The diagram shows the four NETCONF layers between client and server: Content (configuration and notification data), Operations (<get>, <get-config>, etc.), Messages (<rpc>, <rpc-reply>, etc.), and Secure Transport. Content and Operations are defined using YANG.

Figure A.6 NETCONF layers.

We start with the lowest one, called secure transport. (Note that all four layers are modules within the OSI application layer. The choice of the word “transport” in the name of the layer is somewhat unfortunate, as the name conflicts with the OSI terminology; however, the confusion is avoided if one keeps in mind that NETCONF is strictly an application-layer protocol and that the key word in the term “secure transport” is “secure.”) What the term really tries to reflect is that all NETCONF messages must be passed over a secure interface. Initially, the example in mind was that supported by the Secure SHell (SSH) protocol, as opposed to telnet—in which CLI commands were transmitted in the clear. (Again, SSH was developed for CLI—with SSH, one gets access to the shell interpreter, which otherwise could be reached with the unsecure telnet.) Over time, however, other protocols—notably TLS—have been accepted here. It is mandatory that this layer provide authentication, data integrity, confidentiality, and replay protection.

The next layer, messages, is merely a transport-independent framing mechanism for encoding both RPC-related and notification-related structures. The four elements of the RPC structure are listed in the example entry of Figure A.6; the notification elements are listed in RFC 5277. In what follows, we will provide an example of the specification of an RPC call.

Figure A.7 presents both the XML encoding of the client's invocation of the deck-the-halls method, whose single parameter is the string boughs_of_holly, and the server's reply. First, the <rpc> element has a mandatory attribute message-id, a string, whose purpose is to uniquely identify the message (so as to be returned with the respective <rpc-reply>). This string is chosen by the sender of the RPC. We chose it to be “123,” following the tradition of encoding an integer.

Figure A.7 (a) Invocation of the deck-the-halls RPC method; (b) reply with the positive result.

The <rpc> element is defined by NETCONF, and the appropriate namespace is referred to by the xmlns string. The name of the method, deck-the-halls, follows. Since this method has been introduced by us, we have to provide the URI of the namespace in the xmlns string that follows. The only parameter, which we appropriately call <fixture>, is the string boughs_of_holly, a constant value.

In case of a successful execution, the <rpc-reply> is returned with the <ok> element. (Otherwise, it can be returned with <rpc-error>, in which case the error cause would be specified.)
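
Since the listing in Figure A.7 is not reproduced here, the following sketch shows what the two messages may look like (the base namespace is that of NETCONF, defined in RFC 6241; the URI of our own namespace is, of course, made up):

<rpc message-id="123"
     xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <deck-the-halls xmlns="http://example.com/seasonal/1.0">
    <fixture>boughs_of_holly</fixture>
  </deck-the-halls>
</rpc>

<rpc-reply message-id="123"
           xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <ok/>
</rpc-reply>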

Back to the layers of Figure A.6. At the third layer, Operations, the NETCONF-defined base protocol operations are processed. These operations are <get-config>, <edit-config>, <copy-config>, <delete-config>, <lock>, <unlock>, <close-session>, and <kill-session>. The first four are self-explanatory; the verb in each name specifies an operation that is performed on the configuration (or part of it). There are many nuances with the choice of parameters in the reply options, which are well explained in the RFC.

The <lock> operation requests that the server deny (presumably for a short time) modification requests for the entire datastore coming from other clients (including SNMP clients, clients executing CLI scripts, or human users). The lock is active until the <unlock> operation is issued or—to preclude a permanently locked system—at most for the duration of the session. If the datastore is locked, then <edit-config>, <copy-config>, or <delete-config> requests from other clients will be denied. (No semaphore or monitor infrastructure is maintained at the server; it is a client's job to retry.)
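
A minimal sketch of a lock request on the running configuration datastore (the structure follows RFC 6241) might look as follows:

<rpc message-id="101"
     xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <lock>
    <target>
      <running/>
    </target>
  </lock>
</rpc>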

There is, however, a much more nuanced approach to the execution of the <lock> operation on the server side, and we will address this shortly. Let us first finish with the two remaining operations. Of these, <close-session> results in releasing all locks—and other resources—and terminating the underlying secure transport session. (This seems a bit awkward, since there is no <open-session> operation; the whole idea here is to force the release of the locks and other resources.) The <kill-session> achieves the same result, but with some other client's session, which naturally means that this operation may succeed only if the client that issued it has proper authorization.

A nuance mentioned in the previous paragraph is that, with respect to configuration updates, NETCONF supports and partly implements the transactional model known as ACID—Atomic, Consistent, Isolated, and Durable. (For a detailed discussion of this subject, see Chapter 20 of [1].) In a nutshell, these properties refer to a group of operations that constitute a transaction in the distributed environment. Incidentally, this is a classical orchestration construct.

The first major problem here is ensuring that either the whole group of operations succeeds—consider what may happen in the distributed environment when one or more hosts suddenly crash in the middle of the transaction—or, in case of a failure, the effect of all operations that had already executed is undone.

A typical example of a vulnerable transaction here is getting money from a cash machine. This involves debiting an account and dispensing the cash. If the machine cannot dispense the cash, the account must never be debited, but the machine must not give the cash out before the account is debited.

Atomicity is the property whose presence solves this problem. It is implemented by logging and tracking the effect of all successful transaction operations as candidates before actually performing the changes. In case of a failure of an operation, the logged candidate changes are discarded, and so the transaction rolls back to the initial state. Otherwise, when all operations have completed successfully and the involved entities commit to the transaction, it is carried out. (Even if the server crashes in this last phase, it will still execute the committed transaction, according to the log, after it reboots.)

Here the consistency property means that if the associated data had been consistent before the transaction started, then these data will remain consistent after the transaction has completed. The isolation property deals with the visibility of the steps in an unfinished transaction; it is implementation-dependent. The durability property is concerned with the preservation of the system state (for example, in logs) to deal with crashes.

To this end, NETCONF defines the effect of its <commit> operation as follows: “The <commit> operation instructs the device to implement the configuration data contained in the candidate configuration. If the device is unable to commit all of the changes in the candidate configuration datastore, then the running configuration MUST remain unchanged. If the device does succeed in committing, the running configuration MUST be updated with the contents of the candidate configuration.”

NETCONF defines the <rollback-on-error> capability, in which case the <error-option> parameter of the <edit-config> operation may be set to rollback-on-error (a sketch of such a request follows the quoted list below). To avoid inconsistency in the case of shared configurations, the RFC recommends that a client lock the configuration. That is why the standard explicitly forbids granting a lock “if any of the following conditions is true:

  • A lock is already held by any NETCONF session or another entity;
  • The target configuration is <candidate>, it has already been modified, and these changes have not been committed or rolled back; or
  • The target configuration is <running>, and another NETCONF session has an ongoing confirmed commit.”
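
To make the rollback option concrete, here is a minimal sketch of an <edit-config> request that asks for the rollback behavior (the structure follows RFC 6241; the actual configuration content, which would be governed by a data model, is elided):

<rpc message-id="102"
     xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <edit-config>
    <target>
      <running/>
    </target>
    <error-option>rollback-on-error</error-option>
    <config>
      <!-- device-specific configuration data goes here -->
    </config>
  </edit-config>
</rpc>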

As we can see, the Operations layer is quite involved. (Reading all of the 112 pages of RFC 6241 will definitely confirm this observation, even more so the respective programming effort.) For a developer to deal with the implementation, a programmatic representation is needed, which is what is supposed to make up the fourth, Content, layer of the model. But RFC 6241 stops short of defining this layer—delegating the job to another standardization effort to specify “the NETCONF data models and protocol operations, covering the Operations and the Content layers.” Fortunately, the IETF NETMOD working group has completed such a specification, called YANG, and published it in RFC 6020.

In this book we won't be able to review YANG in any detail, but the interested reader will find the RFC very well written. We only note that YANG is the de-facto NETCONF modeling language. It is well structured, so following a module one can find both its high-level view and the ultimate encoding in NETCONF operations. By design, YANG has also been made extensible, thus allowing other SDOs to develop its extensions and individual programmers to produce plug-and-play modules.

YANG also maintains (limited) compatibility with SNMP: the SMIv2 MIB modules can be automatically translated into YANG modules for read-only access. There are several products and open-source projects—listed at http://trac.tools.ietf.org/wg/netconf/trac/wiki.

Naturally, NETCONF is used extensively in the SDN for configuring virtual switches. As noted in Chapter 7, there are NETCONF plug-ins used for this purpose as part of the open-source SDN project Open Daylight.

A.2 Orchestration with TOSCA

Introduced earlier, the Topology and Orchestration Specification for Cloud Applications (TOSCA) is an OASIS standard [2]. At the time of writing, it is being adopted by the industry. TOSCA is a DSL for software life cycle management. It is also a modeling language in that it describes the structure (a model) of a service so as to express what needs to get done in order to run it, as opposed to the programmatic how. The how part is generated automatically by the interpreter.

These types of language did not appear overnight. The more general idea of domain-specific modeling as a means of improving software productivity through automatic code generation, derived from a high-level specification, has been supported by the Domain-Specific Modeling (DSM) Forum, whose site provides much interesting information (including an authoritative bibliography) on the subject.

As an historical (and historic!) aside, as with several key technologies reviewed in this book, the major breakthrough with the domain-specific language development was achieved toward the end of the magic 1970s. Professor Noah Prywes, whom we first met at the very beginning of this book, published a seminal paper [3] that introduced the “Module Description Language (MODEL) designed for use by management, business, or accounting specialists who are not required to have computer training.” The MODEL language described the “input, output, and various formulae associated with system specification” but provided no sequencing information. The latter was the job of the MODEL Processor, which generated code (possibly after resolving inconsistencies or ambiguities in the process of interaction with the programmer). In the 1980s and 1990s, Professor Prywes and his graduate students at the University of Pennsylvania designed a full system for the distributed software life cycle specification, which could also be used for reverse engineering [4]. This was one successful example of technology transfer. The actual MODEL product, based on the research results, was developed and marketed by the Computer Command and Control company.

Let us go back to TOSCA. We start by expanding the explanation of its role in application orchestration with the help of Figure A.8, which is based on incisive material provided by our colleague Sivan Barzilay. An application here is expected to employ multiple virtual machines—including virtual networking appliances—interconnected in a particular way and ruled by a set of policies. A typical example here is deployment of a virtual network function (such as IMS) in an operator network.

The orchestration layering framework comprises five layers: the application orchestration layer (TOSCA life cycle specification and policy), the stack layer (OpenStack Heat), the convertor layer, the IaaS layer (SDN controller; Neutron, Glance, Cinder, Nova, Ceilometer), and the infrastructure layer (SDN virtual and physical devices; OpenStack and non-OpenStack Cloud nodes).

Figure A.8 Orchestration layering framework (courtesy of Sivan Barzilay).

Managing network function virtualization at the lowest—infrastructure—layer is far too complex, as we have already demonstrated in Chapter 7 when discussing the evolution of telephony network management.

At the IaaS layer, we know how to orchestrate stacks. Specifically, we are familiar with the OpenStack mechanisms for orchestration, which are of course applicable to the Cloud nodes (data centers) that are based on the OpenStack software. But what if there are nodes that deploy alternative implementations? To maintain the uniform service specification in this case, we need to terminate the Heat API at the convertor layer, where the back-end functions will then take over the conversion task.

At the stack layer, we combine virtualization with data communications (and SDN); hence the need to integrate orchestration of stacks with network topology entities. Once again, this aspect is particularly critical to network function virtualization where an application in fact is networking as a service.

Naturally, to integrate wide-area networking with stack orchestration, another layer of abstraction—and specification—is needed so as to port the services across different platform implementations. This is precisely what TOSCA aims to accomplish.

The core TOSCA specification [2] describes the components of a service as well as the relationship among these components. In addition—and this is the key point—the specification language allows one to specify the operational behavior through management procedures for both creating and modifying the services in orchestration. Thus, the TOSCA Service Template describes the invariants of topology and orchestration procedures that hold across different environments throughout the life cycle.

To shed more light on the specifics, let us start with the namespace. TOSCA uses XML and defines two namespace prefixes: the namespace with prefix tosca (http://docs.oasis-open.org/tosca/ns/2011/12) is the default; the other prefix, xs, refers to the XML Schema namespace (www.w3.org/2001/XMLSchema). TOSCA extensibility mechanisms support the use of entities (attributes and elements) from other namespaces, as long as they don't contradict any entity in the TOSCA namespace.

The description of the abstract syntactic structure of the service template is accompanied by Figure A.9. (We will soon take a look at a specific example, to illustrate the abstraction.)

Figure A.9 The structure of a TOSCA template.

The elementary components of the service are called nodes, the node types being declared at the highest level of the specification (along with the relationship types, Topology Template, and Plans). For example, the node types in a web service deployed in the Cloud might be a web server application, called My_App; a web server, X_Web_Server; the underlying operating system (a Linux distribution), Y_Linux; the virtual machine that hosts the application, Virtual_Machine; and, finally, the Cloud service providing the virtual machine, Z_Cloud.

Each node type defines the properties of the service component and the respective interfaces to the operations to be performed on the component. In the case of a server-type node, the properties might include the number of CPUs, memory size, the name of the image to instantiate and—as an essential security property—the SSH key pairs' location (cf. Figure A.11 later). (The values of the above parameters can be obtained through a specified input procedure.) The interfaces specify the operations on the node during the life cycle of a service. Each operation (e.g., create, start, or stop) comes along with the pointer to a script actually implementing the operation, as in:

create: scripts/server_library/install_server.sh

Figure A.10 A topology template example.

The TOSCA template in Figure A.11 carries the description “My most simple server template” and declares an input whose default value is TEST-KEYPAIR-FOR-BDD, along with its node_templates; the corresponding HOT template carries the property key_name: TEST-KEYPAIR-FOR-BDD.

Figure A.11 An example of translation of (a) the TOSCA template into (b) the corresponding HOT template (courtesy of Sivan Barzilay).

Such an interface definition is one place in TOSCA that provides a hook for procedural plug-ins. The relationship types specify the relations (or connections) among the nodes of given types, the idea being that the service is a directed graph whose vertices are the nodes and whose edges are the relationships.

Relationship types specify which nodes they can connect. The direction is denoted by explicitly declaring the source and target elements. (The direction, as we will see shortly, is essential for establishing the processing order.) The interface part allows us to plug in the code—just as in the node type case.

This graph-based representation (and the subsequent derivation of the actions obtained by traversing the graph) is the reason for the use of the word “topology” in TOSCA. With the earlier examples of nodes, we could use a relationship type HostedOn, as in “My_App [is] HostedOn X_Web_Server.” But My_App may also consume the services of a database represented by the node My_Database, and to specify this relationship we would introduce a new relationship type, QueriesDatabase.

Overall, the TOSCA topology template consists of a set of node templates and relationship templates. Figure A.10 illustrates a fragment of a topology template for our schematic web service.
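
To give a flavor of the XML rendering, here is a much-simplified skeleton of a fragment of such a template; namespace declarations, the enclosing definitions document, the type definitions, and most attributes are omitted, and the type names are ours:

<ServiceTemplate id="My_Web_Service">
  <TopologyTemplate>
    <NodeTemplate id="My_App" type="My_App_Type"/>
    <NodeTemplate id="X_Web_Server" type="Web_Server_Type"/>
    <RelationshipTemplate id="app_on_server" type="HostedOn">
      <SourceElement ref="My_App"/>
      <TargetElement ref="X_Web_Server"/>
    </RelationshipTemplate>
  </TopologyTemplate>
</ServiceTemplate>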

The final element of the TOSCA service template is Plans. Here the management aspects of service instances are defined via a workflow. In the choice of a specification language, TOSCA defers to other standards, such as OASIS's own Web Services Business Process Execution Language (BPEL) or the Business Process Model and Notation (BPMN) produced by the Object Management Group, although it allows the use of other languages. Either way, a workflow refers to the operations defined as part of the node and relationship templates (e.g., as in the interface specifications).

Figure A.11 (provided to us by Sivan Barzilay) is a simple example of translating a TOSCA template into a HOT template, which can be understood by OpenStack Heat (whose orchestration service also accepts AWS CloudFormation templates). But how is a specification interpreted so as to support the sequence of life cycle operations? A node template provides a rather straightforward answer to the question of how to deploy and manage the node. The relationship specifies the order of processing the node templates (and, as we noted earlier, it may inject additional processing logic). For example, for the relationship type HostedOn, the host should naturally be created and configured before the node it hosts. Similarly, for the client-server relationships, the server must be processed before any of its clients.

For additional reading, we recommend an incisive paper [5], which also describes the life cycle of a service in view of a TOSCA template created by the Cloud service provider during the service offering phase.

There has been significant effort in the industry to adopt TOSCA. As with many other initiatives, there is an open-source project, which is appropriately called OpenTOSCA. Pleasing to the authors' tastes, the naming conventions for the OpenTOSCA ecosystem output seem to be rooted in oenology: in addition to the TOSCA run-time environment, the OpenTOSCA Container, it provides a graphical modeling tool called Winery and a self-service portal for the applications available in the container, called Vinothek.

As we mentioned earlier, there is considerable effort in OpenStack to interwork with TOSCA. TOSCA is also the subject of ongoing research. One pressing item here is the definition of a policy framework, which, as an important example, can be used to specify security policies. Among the major drivers is certification of Cloud services (as, for example, addressed in [6]). Once certified, a service should remain unchanged; hence the need to capture certification requirements in a formal service description.

Noting that “TOSCA lacks a detailed description of how to apply, design, and implement policies,” [7] demonstrates how security policies can be defined. The paper considers two approaches. The first approach is plan-based, and so the workflows in the build, management, and termination plans are modified to support the annotated policies. The second approach does not involve any modification of plans; instead, the relevant operations are modified.

But while the research is ongoing, the OpenTOSCA code is already being used successfully in production. As reported in [8], the design, specification, and Cloud deployment of an Enterprise Content Management (ECM) system using OpenTOSCA and OpenStack can be achieved by a single graduate student in the course of an MS project.

A.3 The REST Architectural Style

The concept of the REST architectural style has been described in [9] and elaborated on in Chapter 5 of Roy Fielding's PhD dissertation [10]. A stubborn misconception in the industry is that REST is “HTTP-based.” Although Dr. Fielding has worked on the design, specification, and implementation of HTTP—whose specification he co-authored—he has stressed that REST is a style, which is completely protocol-independent. Nor is an HTTP-based API REST by default.

As [9] states, “REST is a coordinated set of architectural constraints that attempts to minimize latency and network communication while at the same time maximizing the independence and scalability of component implementations. REST enables the caching and reuse of interactions, dynamic substitutability of components, and processing of actions by intermediaries, thereby meeting the needs of an Internet scale distributed hypermedia system.”

“Caching” and the presence of the “intermediaries” here are quintessential architectural entities in the World-Wide Web, and in order to discuss REST, we need to review the architecture of the World-Wide Web. As we will see, the key to understanding the style lies in the last three words of the quote: “distributed hypermedia system.” In the next section, we review the latter concept; the two sections that follow briefly highlight aspects of the World-Wide Web architecture and outline the REST style.

A.3.1 The Origins and Development of Hypermedia

The word hypermedia refers here to a system of linked text, sound, and video. Syntactically, hypermedia are hypertext, as the actual links are ASCII-encoded. An early vision of hypermedia was expressed by Professor Vannevar Bush in 1945, in his famous article [11]:

“Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, ‘memex’ will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”

The implementation idea (unrealized in that form) was to use microfilm as the storage medium for “Books of all sorts, pictures, current periodicals, [and] newspapers.” These would be indexed and searched, mechanically, based on the codes entered on a typewriter keyboard and projected to a screen. The conception of associative linking—the key concept of hypertext—was part of the vision: “It affords an immediate step, however, to associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another.”

According to his seminal 1965 paper [12], Theodor Holm Nelson started to work on the computer-based implementation of Bush's concept in 1960. The term “hypertext” was introduced in [12] “to mean a body of written or pictorial material interconnected in such a complex way that it could not conveniently be presented or represented on paper.” The same goes for “films, sound recordings, and video recordings,” which “can now be arranged as non-linear systems—for instance, lattices—for editing purposes, or for display with different emphasis.” Along with “hypertext” came the word “hyperfilm,” denoting a “browsable or vari-sequenced movie,” which is “only one of the possible hypermedia that require our attention.”

Of course, the paper went much further than merely defining the terms—it presented the information structure, file structure, and even a language for expressing the file format.

In parallel with this development, two University of Pennsylvania professors, Noah Prywes—whom we now meet for the third time—and Harry J. Gray, had been building the Multi-List system, outlined in a 1959 paper [15] and further described in [16]. The system was envisioned for automating library functions through a software-based implementation of associative memory, in which the memory was organized in a linked-list structure. As reported in [17], Andries van Dam, a computer graphics pioneer, used Multi-List in his doctoral research at the University of Pennsylvania. In 1966 he defended his doctoral thesis “A Study of Digital Processing of Pictorial Data,” and was awarded the second PhD in Computer Science in history.

Subsequently, Nelson and van Dam joined forces to develop—together with a team of Brown University students—the Hypertext Editing System (HES) on the IBM 360 [18]. That project later morphed into a more advanced one, but the major elements of hypertext processing as we know it were already present here. The project involved multi-user access (not only for reading, but also for modifying files). The text displayed on a computer terminal could be selected with a light pen (a precursor of what has become the mouse), tagged, and annotated. On top of validating the hypertext idea, HES demonstrated that line editing (which, incidentally, was still around even two decades later) could be replaced with full-screen editing and formatting.

We will fast-forward to 1980, when Sir Timothy Berners-Lee developed his hypertext-based system ENQUIRE at CERN; to 1989, when he followed up with a different design—now uniting the hypertext technology with that of the Internet—and built the first website; and finally to 1990, when he released the first web browser, called WorldWideWeb.

The hypermedia component of the architecture referred to a resource (which initially was a document represented by a file, but later evolved to what could be produced on the fly by a program). The universal resource naming scheme, which we reviewed earlier, allows us to locate resources at servers by means of DNS look-up. A browser, located at the client device, interprets the web pages written in the HyperText Markup Language (HTML), and fetches the resources.

A resource is assigned a Uniform Resource Identifier (URI), defined in RFC 3986. A URI can be a Uniform Resource Name (URN), which uniquely identifies a resource but does not specify its location, or it can be a Uniform Resource Locator (URL), which actually specifies the resource location. The “or” is not exclusive: a URI can be both a URN and a URL.

A URI is a mere identifier; it does not have to refer to an accessible resource. An operation associated with a URI reference is defined by the protocol and the relevant metadata. RFC 3986 stresses that the typical operations that a system may attempt to perform on a resource—access, update, replace, or find attributes—are defined by the protocols that make use of URIs.

The syntax of URI is defined formally thus:

URI = <scheme> “:” <hier-part> [“?” <query>] [“#” <fragment>].

The scheme is typically the relevant protocol (such as http, mailto, or SIP), but it can be just the string urn, implying that the URI is a URN. The <hier-part> contains the DNS name of the host where the resource is located, the port to be used (this is optional; for http the default is 80), and the path to the resource (modeled after the Unix file system). The rest is the optional query (search) part. Figure A.12 lists several URI examples. (Note that, as is evident from the first two examples, a single resource may have different URIs.)

  • ftp://ftp.ietf.org/rfc/rfc3986
  • http://www.ietf.org/rfc/rfc3986.txt
  • ldap://[2001:db8::7]/c=GB?objectClass?one
  • mailto:Cloud.Administrator@example.com
  • news:comp.infosystems.www.servers.unix
  • tel:+1-816-555-1212
  • telnet://192.0.2.16:80/
  • urn:oasis:names:specification:docbook:dtd:xml:4.1.2

Figure A.12 URI examples.

Here is another example illustrating the search part of the URI string. When we googled “an example of URL,” the browser displayed the following URI with the result: https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=an%20example%20of%20url.
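
Mapped onto the syntax given above, this URI breaks down as follows:

scheme:    https
hier-part: //www.google.com/webhp
query:     sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8
fragment:  q=an%20example%20of%20url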

This example not only underlines the flexibility of the web hypermedia realization, but also demonstrates how the URIs store and “drive” the state of an application. The latter is a major tenet of the REST architectural style; we will return to this subject later. The reader will appreciate the quote from [9]: “Hypermedia was chosen as the user interface because of its simplicity and generality: the same interface can be used regardless of the information source, the flexibility of hypermedia relationships (links) allows for unlimited structuring, and the direct manipulation of links allows the complex relationships within the information to guide the reader through an application. Since information within large databases is often much easier to access via a search interface rather than browsing, the Web also incorporated the ability to perform simple queries by providing user-entered data to a service and rendering the result as hypermedia.”

The structure of a web page, which also provides a representation of the relevant resources, is specified using the HyperText Markup Language (HTML). HTML is standardized by W3C—an organization founded and headed by Sir Berners-Lee. The latest version, HTML 5.0, supports, among many other things, native (rather than plug-in-based) audio and video, browser storage, and in-line vector graphics.

Part of the resource representation is metadata (such as media type or last-modified time). Another data component is control data, whose purpose will become clear soon when we review data caching in the next section.

A.3.2 Highlights of the World-Wide Web Architecture

From the outset, the web design has been concerned with reducing bandwidth use (which also meant faster access to resources and, in many cases, lower networking fees). In the distributed environment, this can be achieved by a tried-and-true method—replicating data in one or more caches. Caching of the data starts at the client, where a page, once fetched, can be stored by the browser; further replication is performed by the proxies.

Figure A.13 illustrates this with an example where the proxies are deployed at the enterprise, the network provider, and the service provider. (We should note that deploying proxies not only provides physical storage for caching, but also enables content filtering, protocol translation, gathering of analytics, and end-user anonymity—all important features in the use of the World-Wide Web today. Needless to say, these very features come with well-known disadvantages where they are misused.)

Figure A.13 Caching with proxies (an example).

In the extreme case, when a server is down, a cache may still provide its services—from an end-user point of view a situation akin to observing the light of a star that has gone dark.

The known presence of caches naturally constrains any protocol deployed in this architecture, since an endpoint (a client or a proxy) that uses a cache has to know whether the cache is valid (i.e., contains the same information that the server does).

Another constraint is due to the dynamic nature of web pages. An ultimate HTML document does not have to be stored at the server—it can be created dynamically at the server as well as at the client, or at both, with a client and a server playing their respective parts.

In order to illustrate the working of the World-Wide Web, it is necessary to highlight the features of its original protocol, HTTP, defined in RFC 2616. HTTP runs over TCP (now using persistent connections) and so it has a “reliable pipe.” HTTP is a request/response protocol, whose requests are issued by a client, with responses returned by a server.

All HTTP messages are ASCII-encoded. The message headers are actually pure ASCII text, while the body of a message may contain the ASCII-encoded binary data.

HTTP has been geared toward implementing an object-oriented paradigm; consequently it is defined with a list of methods, which act on the resources identified in the request URI:

  • GET obtains the resource representation;
  • HEAD obtains only the HTTP header (typically to check the metadata associated with the resource);
  • POST “requests that the origin server accept the entity enclosed in the request as a new subordinate of the resource;” (We will review an example of interpretation later.)
  • PUT requests that the enclosed resource representation be stored under the request URI (creating or replacing the resource);
  • PATCH partially modifies the resource's representation;
  • DELETE requests that the resource be deleted;
  • OPTIONS requests information about the communication options available on the request/response chain associated with the resource;
  • TRACE initiates a loop-back of the original message;
  • CONNECT requests connection to a proxy that can be a tunnel.

The responses are grouped into reserved code ranges as follows:

  • 100–199: Information;
  • 200–299: Success with data (if present);
  • 300–399: Redirection;
  • 400–499: Client error (also request for authentication, as in 401, accompanied by a challenge);
  • 500–599: Server error.

These are mostly self-explanatory, but two brief observations are in order. First, redirection is a powerful (and potentially dangerous) feature, which—true to its name—instructs the client to go to another server for information. Processing here may be complex—for example, loops must be avoided and, in some cases, the decision on whether to allow redirection cannot be made by the browser itself. Yet redirection can be a tool for constructing services. As we will see in the last section of this appendix, HTTP-based APIs systematically use redirection in the implementation of token-based identity management schemes, such as those used in the Open Authorization (OAuth) protocol.

Second, client error may be used not to indicate an error, but rather to request additional processing. For example, the 401 response requests authentication of the client, for which it provides a challenge to be answered by the client so as to prove its identity. Similarly, in view of the proxies, the 407 response invokes the same procedure, except that the client has to authenticate to a proxy.

The last observation already hints that HTTP is well aware of the proxies, but there is much more to this. The nuances of dealing with caches are in the headers. A critical information element, which (typically) contains a hash of a given page, is called an Entity Tag (ETag), because it serves as a tag to the content of the page. ETags are used to check the freshness of the cache. When the resource representation is fetched (via the GET method) the first time, its ETag is stored along with the cached representation. The subsequent GET requests carry the ETag with an if-none-match request header field, making the request conditional: if the page is fresh (i.e., it hashes to the same quantity as the value of the ETag), there is no need to transfer the page. The same mechanism can be used with PUT to prevent it from modifying a page which the client does not know about.

Keeping time values also helps with validating freshness. The last-modified entity header field, returned in the response, records the date and time when the page was modified. The client can use the conditional GET with the same value in the if-modified-since header. The data elements that enable conditional processing are those that [9] refers to as control data.
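
A minimal sketch of a conditional GET and the server's response when the cached copy is still fresh might look as follows (the ETag value is, of course, made up):

GET /rfc/rfc3986.txt HTTP/1.1
Host: www.ietf.org
If-None-Match: "686897696a7c876b7e"

HTTP/1.1 304 Not Modified
ETag: "686897696a7c876b7e"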

A.3.3 The Principles of REST

These principles apply to the practice of programming a service in the presence of the web architecture. To begin with, it is impossible to dictate browser implementations, and so the client concerns and the server concerns must be separated—there ought to be no client-to-server interface binding. This is fundamentally different from the RPC approach, in which such binding is prescribed.

Considering the scale of a web service, the server cannot keep the state of the application separate for each client, and so the server must be stateless. (Yet another reason for this is that a server is likely to be replicated, with several instances of it being load-balanced.) It follows that each request from a client to a server ought to contain all the information necessary for the server to understand the request.

Not every mechanism for supplying state-related information is acceptable to REST. There is a well-known mechanism in HTTP, called cookies, as specified in RFC 6265. A server places a “cookie”—a data structure describing the client state—in response to the initial request from a client, and this cookie is then exchanged (and possibly updated by the server) in all future interactions so that the cookie keeps the state of the client. This approach is not considered in line with the REST style, which prescribes driving the application state transitions through hypermedia (URI). In fact, [10] takes a strong exception to this in Section 6.3.4.2, noting a clash with the user's backing up (as in clicking on the Back button in the browser): when the user backs up “to a view prior to that reflected by the cookie, the browser's application state no longer matches the stored state represented within the cookie. Therefore, the next request sent to the same server will contain a cookie that misrepresents the current application context, leading to confusion on both sides.”

In the presence of intermediaries and caches, extra interactions ought to be eliminated, so that the correct cache that is closest to the client responds and eliminates further request propagation. To achieve this, all data within a response from a server are labeled as cacheable or non-cacheable.

The interface between a client and a server is constrained through (a) identification of resources, (b) manipulation of resources via their respective representations, (c) self-descriptive messages, and (d) using hypermedia as the means of state transition.

The REST style also prescribes layering (to encapsulate legacy services and to protect new services from legacy clients) and the use of code-on-demand, where the code can be brought—by means of hypermedia—to be executed at the client.

The REST style proscribes defining fixed resource names or hierarchies (in fact, any “typed” resources where the types need to be understood by clients); the instruction on constructing appropriate URIs must come from a server—as is done with HTML forms or URI templates. Reliance on any specific protocol is likewise proscribed.

The last point is often misunderstood. In his blog, Fielding stresses that a REST API must be defined in a protocol-independent manner. In fact, HTTP is not the only web application protocol. As we mentioned earlier, the IETF, recognizing the deficiency of single-transaction request/response protocols (which cannot support asynchronous “responses” such as notifications and thus necessitate expensive and awkward polling mechanisms), has developed and standardized a new full-duplex protocol called WebSocket.

We finish this section with an example that demonstrates how hypermedia can drive the state of a service. To make this example concrete, we explicitly use HTTP, at the same time demonstrating the power of HTTP redirection.

Consider a simple but rather typical service in which a user orders an item and a receipt is desirable. The user requests the item, represented by URI X, gets a form in response, fills in the form, and submits it.

If the service is implemented as in Figure A.14(a), where a form is obtained via GET X and returned to the server via POST X, the application ends up in a transient state. The browser may move elsewhere, and the user would never be able to return to the receipt by backing up. Unsure of the success of form submission—or desperate to get a receipt—the user might go through the same steps again, thus submitting a second form. (And so the user might end up paying for and owning two items, where only one was needed in the first place.)


Figure A.14 Eliminating the transient state: (a) a service with a transient state; (b) the same service with a permanent state.

Figure A.14(b) depicts an alternative implementation, which fixes this problem. Here, POST appends suffix y to X, thus creating a subordinate resource X/y. The response redirects the user to X/y, which represents a new permanent state, and so any subsequent GET X/y will result in returning the receipt.
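
To make the pattern of Figure A.14(b) concrete, here is a minimal sketch in Python using the Flask framework (the framework choice, resource names, in-memory store, and receipt format are our illustrative assumptions, not part of the original example): the POST handler creates the subordinate resource and answers with a redirect, so the receipt lives at a permanent URI of its own.

    # A minimal sketch of the permanent-state service of Figure A.14(b).
    # Flask is assumed to be installed; names and storage are illustrative only.
    import uuid
    from flask import Flask, redirect, request, url_for

    app = Flask(__name__)
    receipts = {}  # in-memory store of confirmed orders, keyed by the suffix y

    @app.route("/X", methods=["GET"])
    def order_form():
        # GET X returns the form; the hypermedia tells the client where to submit it.
        return '<form method="POST" action="/X"><input name="item"/><input type="submit"/></form>'

    @app.route("/X", methods=["POST"])
    def submit_form():
        # POST X creates the subordinate resource X/y ...
        y = uuid.uuid4().hex
        receipts[y] = {"item": request.form.get("item"), "status": "paid"}
        # ... and redirects the browser to it (303 See Other), leaving no transient state.
        return redirect(url_for("receipt", y=y), code=303)

    @app.route("/X/<y>", methods=["GET"])
    def receipt(y):
        # Any subsequent GET X/y (Back button, bookmark, reload) returns the same receipt.
        return receipts[y]

Because the receipt has a URI of its own, the browser history behaves as the user expects, and resubmitting the form is no longer the only way back to the receipt.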

A.4 Identity and Access Management Mechanisms

This section introduces further detail on the identity and access management mechanisms referenced in Chapter 7. Most of these mechanisms are standardized. We first examine password management. Then we introduce Kerberos, which is widely used for authentication in enterprises. Kerberos is, in fact, a complete system with a well-defined architecture and communication protocol. It supports mutual authentication and single sign-on by design. After Kerberos, we move to the topic of access control. We start with a review of two common approaches to implementing the access control matrix: storing the information in its non-empty cells by columns or by rows. The column approach yields access control lists, while the row approach yields capability lists. Next, the Bell–LaPadula model, a foundation of advanced access control technology, is reviewed. Afterward, we address the Security Assertion Markup Language (SAML), OAuth 2.0, and OpenID Connect, all of which have to do with identity federation. Finally, we discuss the Extensible Access Control Markup Language (XACML), which supports policy-based access control. Building on SAML, XACML aims at the specification of access control policies and queries.

A.4.1 Password Management

Authentication using passwords is problematic in part because of the limitations of our memory. Best practice prescribes that passwords be sufficiently long (e.g., over 10 characters in length), contain non-alphanumeric characters, appear to be meaningless, be changed regularly, and so on. Yet we cannot remember passwords that are considered strong. The situation only becomes progressively worse as Cloud services grow. Instead of one password, we each have many and cannot remember them. In the past, when forgetting a password, a user called a customer support center. These days, the user can reset the password by following a web link received at a pre-registered e-mail address.42 Given the end-to-end vulnerabilities of e-mail, the resetting steps may include a form of auxiliary authentication, which inevitably introduces side effects of its own. Usually, auxiliary authentication is based on the user's answers, provided previously, to a selected list of pre-set security questions. "What is your mother's maiden name?" is a standard question. In general, the questions are about common facts about the user. Here a dilemma arises. On the one hand, to maximize the chance that the user remembers the answers without writing them down, only truthful (or straightforward) answers should be provisioned in the system. On the other hand, such answers tend to be known or knowable by others in the age of Google. The dilemma can be eased somewhat by having the user customize the security questions so that the answers are obscure. For example, if a user had a history teacher with the nickname Turtle at high school, the user can choose to have the question "What does Turtle teach?" Increasing the number of security questions also helps. Overall, the set of answers to security questions is essentially another secret. This secret is relatively long-term and, in fact, needs to be protected accordingly.

Another major problem with passwords has to do with how they are stored in an authentication system. Keeping them in plain text with normal access control is obviously insufficient. An attacker breaking into the authentication system can easily steal the stored passwords and impersonate the users. Moreover, system administrators who have legitimate access to the stored passwords could misbehave. A standard practice is to cryptographically hash the passwords and store only the hash values. Cryptographic hashing is achieved with so-called one-way functions. This is not a formally defined concept. Roughly speaking, for a function H(p) to be one-way, it must have the property that it is “easy” to compute H(p) from p but “very hard” to solve an equation H(p) = Q for p. And even when one such solution is known, it is still “very hard” to find another one. “Easy” and “hard” are, of course, imprecise terms. What they refer to is computational complexity. If something can be computed fast (say, within seconds) it is “easy,” but if there is no known algorithm that would compute this in less than say 1000 years on a modern computer, then it might be considered “hard.”

With the password hashing scheme, to verify a password entered by the user, the authentication process computes its hash and checks the result against the stored hash value.43 Thus no one, not even the root, can look up a user's password. This is quite a feat! Encryption of passwords, for example, cannot achieve the same effect, since it is reversible. Whoever has access to the encryption key, legitimately or illegitimately, can know the password.

Roger Needham and Mike Guy are credited with inventing password hashing in 1963 [19], but standard cryptographic hash algorithms (e.g., MD544 and SHA-256) emerged much later. For a while, only "home-brewed" algorithms were used. One such algorithm was implemented in Multics and found to be flawed during a review, true to the common wisdom that cryptography is a hard subject. The Multics incident prompted Robert Morris and Ken Thompson to develop a new hash function [20] for Unix, and this was ultimately proved cryptographically secure in 2000 [21].

Despite its virtues, the password-hashing scheme is vulnerable to dictionary attacks. Such an attack uses a pre-constructed dictionary of possible passwords (such as common names and words from the Oxford English dictionary) along with their hash values computed using a known algorithm. The dictionary takes time to build, but this needs to be done only once. An attacker who manages to get hold of hashed passwords can then look up the hashes in the dictionary. If there are matches, the passwords are now identified. Dictionary attacks can be serious, since there are always people choosing bad passwords and dictionary look-ups are easy. Fortunately, when developing Unix at Bell Labs in the 1970s, Robert Morris and Ken Thompson anticipated the attacks and devised a technique to counter them [20]. The technique hinges on including an n-bit random number (called the salt) when the hash of a password is computed. In other words, what is hashed is not just the password but the concatenation of the password and the salt. The salt is specific to each password and is changed whenever the password is changed. It is stored in the clear together with the hash value. Now, when verifying a password entered by the user, the authentication process looks up the salt, computes the hash value of the concatenated salt and password, and compares the result with the stored value. As a result, the dictionary for pure password hashes no longer works and has to be reconstructed for every salt value. With an n-bit salt, this means 2^n new dictionaries. When first introduced in Unix, salts were 12 bits long. These days, with ever-more-powerful processors and cheaper storage, salts should be much longer (at least 64 bits in length) to increase the cost of pre-computation. Another way to mitigate dictionary attacks is to iterate the hashing operation multiple times. This, however, has an impact on run-time authentication performance.
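
As an illustration of salting and iterated hashing, the following sketch uses only the Python standard library; the 16-byte salt, the SHA-256-based PBKDF2 construction, and the iteration count are assumptions chosen for the example rather than recommendations.

    # A minimal sketch of salted, iterated password hashing (parameters are illustrative).
    import hashlib
    import hmac
    import os

    ITERATIONS = 200_000  # illustrative; tune to an acceptable authentication latency

    def enroll(password: str) -> tuple[bytes, bytes]:
        salt = os.urandom(16)  # per-password random salt, stored in the clear
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return salt, digest    # only the salt and the hash are stored, never the password

    def verify(password: str, salt: bytes, stored_digest: bytes) -> bool:
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return hmac.compare_digest(candidate, stored_digest)  # constant-time comparison

    salt, digest = enroll("correct horse battery staple")
    assert verify("correct horse battery staple", salt, digest)
    assert not verify("wrong guess", salt, digest)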

A.4.2 Kerberos

Kerberos was initially designed to authenticate a human user on any workstation (versus the user's own machine) in a distributed environment based on what the user knows. Its major goal has been to provide mutual authentication between a user of any computer and any designated resource (server) that belongs to the network. Kerberos has been a solution of choice in the enterprise world, given its built-in support for single sign-on. Developed at MIT,45 Kerberos has been standardized by the IETF. RFC 412046 contains the core specification of Kerberos v5. Most operating systems support Kerberos47 today, including Microsoft Windows™.

Fundamentally, user authentication is password-based in Kerberos. Yet passwords are never exchanged directly. This is done through a scheme based on the Needham–Schroeder protocol [22], which uses cryptographic operations to achieve mutual authentication and confidentiality protection. Central to the scheme is a Key Distribution Center (KDC), which shares a secret key with every user (as well as every server) in an administrative domain (known as a realm). Figure A.15 shows how the scheme works in principle.


Figure A.15 Kerberos at work (simplified).

Alice wants to access an application server providing, say, an e-mail service. She logs into the KDC, giving her name and the application server's name. Upon receiving Alice's information, the KDC generates a session key (K_A-AS) for Alice and the application server to share, encrypts the key with Alice's secret key (resulting in K_A(K_A-AS)), encrypts the key (together with Alice's name) with the application server's key (resulting in Ticket_AS, known as a ticket to the application server), and sends the two encrypted blobs to Alice. She decrypts the session key, encrypts a time stamp with the session key, and sends it together with the ticket (which is unreadable to her) to the application server. By decrypting the ticket, the application server learns the session key and Alice's name. Now both Alice and the application server are armed with the session key. They can authenticate each other by demonstrating knowledge of the key. This is achieved by Alice sending the encrypted time stamp and the application server responding in kind, except that the time stamp is incremented to avoid replay attacks. Alice gets serviced upon successful mutual authentication. Alice can use the same ticket to get the service later, as long as the ticket remains valid; she needs to get a new ticket if this is not the case.
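
The toy sketch below mimics the exchange of Figure A.15 using symmetric encryption from the third-party cryptography package (Fernet). The key names follow the figure, but the message layout and the use of Fernet are our own simplifications; real Kerberos defines its own message formats and key derivation.

    # Toy model of the simplified Kerberos exchange (message formats are illustrative).
    import json, time
    from cryptography.fernet import Fernet

    k_alice = Fernet.generate_key()  # long-term key shared by Alice and the KDC
    k_as = Fernet.generate_key()     # long-term key shared by the application server and the KDC

    # KDC: generate the session key K_A-AS, wrap it for Alice, and seal it inside the ticket.
    k_a_as = Fernet.generate_key()
    blob_for_alice = Fernet(k_alice).encrypt(k_a_as)                      # K_A(K_A-AS)
    ticket_as = Fernet(k_as).encrypt(json.dumps(                          # Ticket_AS
        {"client": "Alice", "session_key": k_a_as.decode()}).encode())

    # Alice: recover the session key and build an authenticator (encrypted time stamp).
    session_key = Fernet(k_alice).decrypt(blob_for_alice)
    authenticator = Fernet(session_key).encrypt(str(time.time()).encode())

    # Application server: open the ticket, learn the session key and the client's name,
    # check the authenticator, and reply with time stamp + 1 to prove its own identity.
    ticket = json.loads(Fernet(k_as).decrypt(ticket_as))
    ts = float(Fernet(ticket["session_key"].encode()).decrypt(authenticator))
    reply = Fernet(ticket["session_key"].encode()).encrypt(str(ts + 1).encode())

Mutual authentication follows from both sides demonstrating knowledge of K_A-AS: Alice by producing a fresh authenticator, the application server by returning the incremented time stamp.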

As far as the user interface is concerned, the secret key is transparent. Alice still supplies her name and the password when logging into the KDC. It is the Kerberos client's job to convert the password into the secret key and immediately erase the password. (Naturally, whenever Alice changes the password, her secret key is re-derived and the KDC is updated accordingly. Changing the password can actually be implemented as a service accessible through a Kerberos ticket.) But there are two problems with the authentication flow shown in Figure A.15. The first problem is that there is no way for the KDC to know whether Alice is indeed the one sending the request. It sends a reply regardless. This is harmless in itself, since nobody but Alice will be able to understand the reply. Nevertheless, an adversary can just keep sending requests to attack Alice's key, especially if it is derived from a password. To address this problem, when Alice sends a request, she may include a time stamp encrypted with her secret key. Now the KDC can know whether the sender is really Alice and send a reply only if this is the case.

The other problem is that the client must remember Alice's secret key during her entire login session in order to prove her identity when needed. This makes the long-term secret key vulnerable to attack. To address the shortcoming, a special application server is introduced into the KDC to act as a buffer. Called the Ticket-Granting Server (TGS), it is responsible for issuing tickets for all other application servers and shares a secret key with the authentication server in the KDC—as shown in Figure A.16. Now Alice gets a short-lived session key (i.e., K_A-TGS) and a special ticket (called a ticket-granting ticket) when first logging in. The client uses K_A-TGS and Ticket_TGS to interact with the TGS on her behalf whenever she seeks permission to access a server. As long as K_A-TGS and Ticket_TGS are valid, they can be used to get a ticket without Alice re-entering her password.


Figure A.16 Kerberos at work (improved).

Kerberos also supports the provision of tickets to other realms (e.g., organizations outside a given enterprise, with which the enterprise has an established relationship). The resulting cross-realm authentication is transparent to the user, who never needs to re-authenticate after entering the login and password for the hosting realm. The way it works is by having the TGS in one realm (say realm B) be registered in the other realm (say realm A). Alice (who belongs to realm A) can then access an application server in realm B by first obtaining a ticket from the TGS in realm A for the TGS in realm B, and then using that ticket to obtain from the TGS in realm B a ticket for an application server.

To summarize, the key features of Kerberos are that it:

  1. Authenticates the human user based on what he or she knows (i.e., a password), which the user can change at will;
  2. Supports single sign-on to all network resources from a host outside the network;
  3. Shields the user from all protocol complexities (including those associated with the generation and management of cryptographic keys, which actually provide much stronger authentication than a simple password scheme would);
  4. Protects the end user by ensuring that the permanent key is never stored anywhere outside the network key distribution center.

A.4.3 Access Control Lists

Access control lists are object-specific. An access control list specifies the subjects who can access a given object and the rights of each subject. The list is kept and managed centrally, often as part of an operating system. When a subject tries to access an object, the central system searches the ACL associated with the object. If the subject, together with the necessary rights, is on the list, access is granted. Otherwise, access is denied. Figure A.17 shows the ACLs corresponding to the example access control matrix in Chapter 7. We can see that ACLs could still be tedious; for one thing, the size of an ACL grows with the number of eligible users, and the ACL is subject to churn if the user population changes frequently. Hence, it would be useful if redundant information on the list could be reduced. One approach to this end is to use the group48 concept. A group consists of multiple subjects sharing the same rights. As a result, an ACL can refer to just a group rather than every subject in the group. In this sense, a group is a special subject. A regular subject can be assigned to one or more groups and can access an object based on either its individual rights or its groups' rights.
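
A minimal sketch of an ACL check with group support follows (plain Python; the subjects, groups, objects, and rights are invented for illustration and do not reproduce the matrix of Chapter 7).

    # Minimal ACL check with group support (all names and rights are illustrative).
    GROUPS = {"staff": {"alice", "bob"}, "admins": {"carol"}}

    # Each object's ACL maps a subject or a group to the set of granted rights.
    ACLS = {
        "object1": {"alice": {"read", "write"}, "staff": {"read"}},
        "object2": {"admins": {"read", "write", "execute"}},
    }

    def is_allowed(subject: str, obj: str, right: str) -> bool:
        acl = ACLS.get(obj, {})
        if right in acl.get(subject, set()):           # individual entry
            return True
        return any(right in rights                     # entries of groups the subject belongs to
                   for entry, rights in acl.items()
                   if subject in GROUPS.get(entry, set()))

    assert is_allowed("bob", "object1", "read")        # granted via the staff group
    assert not is_allowed("bob", "object1", "write")   # no individual or group grant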


Figure A.17 Access control lists.

ACLs were used by Multics and became widespread in Unix systems.49 An ACL associated with a file is typically represented as three triplets identifying the respective rights of the owner, the group, and the rest of the world (i.e., other). Each triplet consists of flags controlling whether the file can be read, written, and executed, respectively. A file with all the flags set will have the ACL that reads rwxrwxrwx; it is readable, writable, and executable by all. A file with stricter access might have the ACL that reads rwx------; it is readable, writable, and executable by the owner only. The Unix implementation of ACLs is a form of abbreviation to contain their sizes; it limits the permissions that the owner can assign. For instance, it is impossible for Alice (as the owner) to allow Bob to read her file, Chris to write to it, Debbie to read and write to it, and Eve to execute it. Various Unix-based operating systems have augmented the abbreviated ACLs with varied levels of sophistication.
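
The mapping between the abbreviated Unix ACL and the underlying mode bits can be inspected with the standard library, as the short sketch below suggests (the octal values are merely examples; the leading character in the output denotes the file type).

    # Unix permission triplets rendered from mode bits (example values only).
    import stat

    print(stat.filemode(stat.S_IFREG | 0o777))  # '-rwxrwxrwx': readable, writable, executable by all
    print(stat.filemode(stat.S_IFREG | 0o700))  # '-rwx------': owner only
    print(stat.filemode(stat.S_IFREG | 0o640))  # '-rw-r-----': owner read/write, group read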

Overall, with ACLs, it is easy to verify, at the time of access, whether a given user is indeed authorized for access. It is also easy for an owner of an object to revoke the rights given to a subject; the owner just deletes the subject's rights from the object's ACL. Downgrading the rights given to a subject is easy too: the specific rights are removed from the subject's entry in the ACL. Nevertheless, ACLs have limitations. As a start, ACLs are unsuitable for handling cases where a user needs to delegate authority to another user for a period of time, say a manager asking a subordinate to approve purchasing requests while he is on vacation. There is also the inherent difficulty of ascertaining privileges on a per-user basis from the multitude of ACLs. Yet such a determination is necessary when a certain user's access rights need to be revoked. The use of groups also adds another wrinkle. An ACL could contain both a user and a group to which the user belongs. If just the user's right to an object is revoked, the user can still access the object through the group membership.

A.4.4 Capability Lists

Capability lists are subject-specific. A capability list contains the capabilities granted to a given subject. A capability specifies a particular object along with the permitted operations on the object. Dennis and Van Horn [23] introduced this term in describing a mechanism for controlling access to objects in memory. Conceptually, a capability is similar to a Kerberos ticket or OAuth token, as described earlier. Figure A.18 shows the capability lists corresponding to the example access control matrix in Chapter 7. The subject's set of capabilities determines exactly which objects the subject may access. There is no need to authenticate the subject.


Figure A.18 Capability lists.

When accessing an intended object, a subject must present the corresponding capability. Normally, the subject obtains the capability beforehand and stores it for later use.50 As a result, it is essential that a subject cannot forge or modify a capability and then use it. In other words, a capability shall be tamper-proof and authenticable. An effective approach to this end is through cryptography. An example is the PKI tokens in OpenStack Keystone, which are signed and verifiable.

As another example, Andrew Tanenbaum et al. [24] developed a scheme for the Amoeba distributed operating system. The scheme works as follows. A capability consists of an authenticable checksum in addition to the usual object identifier and rights. The checksum is computed with a cryptographically secure one-way function (e.g., HMAC) over the object identifier, rights, and a secret key (actually a random number as implemented) known only to the access control system. When attempting to access the object, the subject sends the capability to the system as part of the request. The system computes the checksum using the object identifier and rights in the capability, and the secret key that it holds. If the checksum matches the one in the capability, the request is granted. Otherwise, it is rejected. If a subject changes the object identifier or rights, the checksum in the capability will become invalid; the subject also cannot produce the right checksum without the secret key. This scheme should remind you of the object storage access control mechanism discussed in Chapter 6, which works essentially in the same way.
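
A sketch of such a checksum-protected capability is given below (Python standard library; the field layout and the choice of HMAC-SHA-256 are assumptions made for illustration and not the actual Amoeba encoding).

    # Sketch of an Amoeba-style tamper-evident capability (field layout is illustrative).
    import hashlib, hmac, os

    SYSTEM_SECRET = os.urandom(32)   # known only to the access control system

    def issue_capability(object_id: str, rights: str) -> dict:
        check = hmac.new(SYSTEM_SECRET, f"{object_id}|{rights}".encode(),
                         hashlib.sha256).hexdigest()
        return {"object": object_id, "rights": rights, "check": check}

    def verify_capability(cap: dict) -> bool:
        expected = hmac.new(SYSTEM_SECRET, f"{cap['object']}|{cap['rights']}".encode(),
                            hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, cap["check"])

    cap = issue_capability("object3", "read")
    assert verify_capability(cap)

    tampered = dict(cap, rights="read,write")   # a subject tries to escalate its rights
    assert not verify_capability(tampered)      # the checksum no longer matches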

Another implication of a capability-based system is that a subject can pass copies of capabilities to other subjects without interacting with an authority, thereby giving them access to objects. This is a double-edged sword. On the one hand, sharing and delegation become simpler. On the other hand, it is harder to track who gives which capabilities to whom and whether the new holders of the given capabilities are authorized. It is, therefore, difficult to revoke the access rights of a selective set of subjects to an object. A work-around is to invalidate all outstanding capabilities associated with the object and to issue new capabilities to the eligible subjects. This is achieved in Amoeba by changing the secret key and in OpenStack Keystone by keeping track of revocation events.

A.4.5 The Bell–LaPadula Model

The Bell–LaPadula model [25] addresses control of information flow through two policy rules (called properties). One is the simple-security property: a subject can read only an object at the same or a lower security level. Hence, a general can read a soldier's documents but a soldier cannot read a general's documents. But the no-read-up rule is insufficient to stop leakage of information downward. A general could read a confidential document and write what he read to an unclassified document accessible to a soldier. To prevent this, a subject must not be allowed to write down the security hierarchy. Thus comes the confinement property or *-property,51 which postulates that a subject can modify an object only at the same or a higher security level. (This rule prevents a situation in which a general copies the content of a Top Secret file and pastes it into an Unclassified file.)
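
Expressed as code, the two rules amount to simple comparisons of security levels, as in the following sketch (the level labels and their numeric ordering are assumptions for illustration).

    # The two Bell-LaPadula properties as level comparisons (labels are illustrative).
    LEVELS = {"Unclassified": 0, "Confidential": 1, "Secret": 2, "Top Secret": 3}

    def can_read(subject_level: str, object_level: str) -> bool:
        # simple-security property: no read up
        return LEVELS[subject_level] >= LEVELS[object_level]

    def can_write(subject_level: str, object_level: str) -> bool:
        # *-property (confinement): no write down
        return LEVELS[subject_level] <= LEVELS[object_level]

    assert can_read("Secret", "Confidential") and not can_read("Confidential", "Secret")
    assert can_write("Confidential", "Secret") and not can_write("Secret", "Confidential")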

When both rules are enforced, as shown in Figure A.19, information can flow only upward. But this is problematic in practice. At some point, troops have to learn from their commanders where to go. Bell and LaPadula solved this problem by exempting a special group of trusted subjects from the rules. Another assumption of the Bell–LaPadula model is that the security levels of the involved actors stay the same (which is called the tranquility property). If security levels are allowed to change, information could flow in an undesired direction. Figure A.20 shows such a case, where Subject 4 lowers its own security level after reading from Object 3 in the previous example and writes to Object 1, leaking sensitive information as a result. The tranquility property actually has two versions: strong and weak. The strong version forbids changing security levels during system operation, while the weak version allows changes as long as the established security policy (e.g., no “read up” or “write down”) is not violated.


Figure A.19 Flow of information in a Bell–LaPadula system.


Figure A.20 Altered information flow in a Bell–LaPadula system.

Aimed at keeping secrets, the Bell–LaPadula model does not concern itself with either the trustworthiness or the integrity of the secrets. There is a danger here. With a naïve application of the model, a soldier may overwrite his superior's intelligence report. So it is possible for a general to get wrong information even though only the general can read the report. The gist of the problem is that the trustworthiness of an object created by a subject can be downgraded but not upgraded by the trustworthiness of the objects that the subject reads. Ken Biba addressed this problem by turning the Bell–LaPadula model upside down. His model [26] introduces two reverse properties: read up and write down. Analogous to the Bell–LaPadula properties, they are named as follows:

  1. The simple integrity property that a subject can read only objects at the same or a higher security (or integrity) level.
  2. The *-integrity property that a subject can write only objects at the same or a lower security (or integrity) level.

These properties reflect the so-called low-watermark principle: the integrity of an object composed of other objects is only as good as that of the least trustworthy member of the composing group. Essentially, the properties express the policy rules. When enforced, they ensure that the integrity of information is maintained: information can flow only from higher to lower integrity levels. The Biba model is the first formal, verifiable model based on information flow integrity. As in Bell–LaPadula, the strict uni-directional flow of information makes it difficult to apply the Biba model directly to practical applications. More often than not, exceptions to break the flow constraint are required, and they have to be handled on a case-by-case basis.52 Furthermore, protection against information disclosure and protection of integrity usually need to be addressed together, and yet the Biba and Bell–LaPadula models are contradictory, preventing communication between security levels.

A.4.6 Security Assertion Markup Language

SAML is a widely implemented standard based on XML.53 It was originally developed by OASIS and then adopted by other efforts. The adoption by the Identity Federation Framework (ID-FF) project at Liberty Alliance54 and the Shibboleth open-source project,55 in particular, led to concurrent adoption of SAML 1.0 while it was undergoing various revisions in OASIS. The ID-FF project, driven by provider needs, introduces the notion of the circle of trust and builds on SAML to effect identity federation therein. In contrast, the Shibboleth project addresses single sign-on and privacy-preserving access control, given its roots in the research- and education-oriented Internet2.56 Multiple SAML-based efforts, however, yielded incompatible variants. Fortunately, the stakeholders got together in time to correct the course and develop a harmonized version. The result is SAML 2.0, which was approved by OASIS in 2005. This is the version supported by most SAML implementations.

SAML 2.0 is composed of a family of specifications.57 At the core is the specification [27] that defines the notion of security assertions and the protocols for exchanging assertions between an identity provider and a relying party. (The core specification was also published as ITU-T Recommendation X.1141 [28] in 2006.) A security assertion is typically issued by an identity provider and used by a relying party to authenticate and authorize a subject (e.g., an end user). The assertion consists of a set of statements about the subject and contextual information such as the issuer, recipient, issuance time, and expiration time. A statement may concern authentication, attributes, or an authorization decision. An authentication statement describes an authentication transaction, including pertinent information such as the authentication method and the time of the transaction. (SAML 2.0 does not dictate a particular authentication method and can support a range of authentication methods with varied strengths, such as password, Kerberos ticket, and X.509 certificate.) An attribute statement describes the attributes associated with the subject. Finally, an authorization decision statement asserts whether to allow a subject to access the requested resource. To ensure its integrity, a SAML assertion is digitally signed by the issuer.

SAML 2.0 defines a set of request–response protocols, each of which serves a specific purpose. For example, the Assertion Query and Request Protocol allows a relying party to inquire about or request a SAML assertion (pertaining to attributes, authentication, or an authorization decision) about a subject; the Authentication Request Protocol allows a subject to obtain an authentication assertion from the identity provider; and the Name Identifier Mapping Protocol allows a relying party to obtain the new identifiers of a subject from the identity provider. The protocols, however, are not defined at a level that can be used directly for communication between the involved parties. They have to rely on an existing communication protocol (e.g., HTTP) for transporting messages (i.e., requests and responses) through protocol mappings. In the SAML nomenclature, mappings of SAML messages onto standard communication protocols are called bindings [29]. For example, the HTTP Redirect binding defines how SAML messages can be carried as URL parameters. Since the URL length is limited in practice, specialized encodings are needed. Furthermore, exceedingly large messages need to be transported through other bindings. One such binding is HTTP POST, which allows SAML messages to be transported as part of base-64-encoded form content. It goes without saying that the transport has to be protected (typically through TLS), regardless of the binding.
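
To give a feel for the HTTP Redirect binding, the sketch below applies the prescribed DEFLATE, base-64, and URL-encoding steps to a toy SAML message (Python standard library; the XML fragment and the identity provider endpoint are placeholders, not a complete, valid AuthnRequest).

    # DEFLATE + base64 + URL encoding of a (toy) SAML message for the HTTP Redirect binding.
    import base64, urllib.parse, zlib

    saml_request = b'<samlp:AuthnRequest ID="_abc123" Version="2.0"/>'  # placeholder XML

    deflated = zlib.compress(saml_request)[2:-4]    # raw DEFLATE: strip the zlib header and checksum
    encoded = base64.b64encode(deflated).decode()
    query = urllib.parse.urlencode({"SAMLRequest": encoded})
    redirect_url = "https://idp.example.org/sso?" + query    # hypothetical IdP endpoint

    # The receiving side reverses the steps.
    param = urllib.parse.parse_qs(urllib.parse.urlsplit(redirect_url).query)["SAMLRequest"][0]
    restored = zlib.decompress(base64.b64decode(param), wbits=-15)
    assert restored == saml_request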

Figure A.21 shows an example message flow for identity federation through the SAML HTTP Redirect binding. The assumption here is that the relying party and the identity provider have a pre-established relationship. The flow is high-level and goes as follows. Upon detecting that the user (through the user agent) requesting resource access has not been authenticated, the relying party issues a SAML authentication request, which is redirected to the identity provider. The identity provider then performs the steps to authenticate the user. Again, the steps are specific to the authentication method of choice and beyond the scope of SAML. Upon completion of the authentication steps, the identity provider sends the SAML authentication response carrying the assertion, which is redirected to the relying party. Based on the authentication response, the relying party then responds to the original resource request.


Figure A.21 SAML message flow for identity federation.

A.4.7 OAuth 2.0

This section provides the additional detail on OAuth 2.058 that we promised in Chapter 7. We begin with the types of authorization grants defined in OAuth 2.0, which are as follows:

  1. Authorization code. This type of authorization grant is generated and verified by the authorization server and is suitable for cases where the client interacts with a user via a user agent. The code can be used only once and is bound to the client; the binding necessitates authentication of the client by the authorization server when the client exchanges the code for an access token. It is worth noting that the authorization server alone dictates the structure of authorization codes, since it is responsible for both issuing and verifying them.
  2. Implicit. This type of authorization grant aims to optimize the performance of an in-browser client (implemented in JavaScript). With such a client, there is no assured way to perform authentication, so the client is given an access token directly, saving it the extra step of exchanging the authorization grant for the access token.
  3. Resource owner password credentials. Here the client obtains the resource owner's password and exchanges it directly for an access token. This grant type seems to defy the very goal of OAuth—never to divulge a user's password. Hence, it should be used only when there is a high degree of trust between the resource owner and the client, such as when the client is part of the user device.
  4. Client credentials. This type of authorization grant allows the client to access the protected resources under its control or under the control of the authorization server by prior arrangement.

OAuth 2.0 also allows new grant types to be defined. One emerging new grant type is the SAML 2.0 Bearer Assertion.59 Issued by an identity provider (as we discussed earlier), an assertion contains security-related information (e.g., identity and privileges) about a subject, which is usable by the authorization server. A bearer assertion is a particular type of assertion that the holder can use without providing any other proof (e.g., a cryptographic key). Hence, it is paramount that such assertions be properly protected both at rest and in motion. Given that different actors are involved in issuing and processing assertions, there is a need for a standard way to specify them, namely SAML 2.0 here.

The authorization server typically requires that the client authenticate itself before issuing an access token. Bound to HTTP, OAuth 2.0 supports authentication based on passwords (e.g., using the HTTP basic or digest authentication scheme60) and assertions.61 Obviously token requests and responses need to be protected appropriately; they include sensitive information (such as passwords, authorization codes, and access tokens). Mechanisms to this end include TLS 1.2, the HTTP “cache-control” header field (to effect no caching of sensitive information in HTTP caches), and sending OAuth-related information in the message body rather than the request URI (to prevent sensitive information from being logged at the user agent or possible intermediaries).

To streamline the user experience, the OAuth 2.0 protocol simply utilizes the HTTP redirection constructs. Figure A.22 shows the related message flow. The use of HTTP redirection, however, comes at a cost. For example, it is difficult, if not impossible, for a user to verify the redirect URI, given all the tricks that can be played with the graphical user interface. A user thus could authorize a rogue site without knowing it. To mitigate the problem, a countermeasure is for the authorization server to keep a white list of redirect URIs. In other words, all legitimate clients need to register their redirect URIs with the authorization server beforehand. Nevertheless, this is not common practice. Another problem is cross-site request forgery. The redirection back to the client provides an avenue for an adversary to inject its own authorization code or access token. A countermeasure is to allow the client to keep track of the authorization state. To this end, OAuth 2.0 supports a parameter in the protocol to carry state information. When redirecting the user agent to the authorization server, the client may include this parameter. It is then repeated in all related follow-up messages, up to the message that carries the authorization result back to the client. Since the value of the parameter is set and processed by the client alone, the client has total control of its structure. For state management, it can capture an authorization session identifier or some specific local state information. For additional security protection, it can include a signature. Again, because sensitive information (e.g., the authorization code and token) is transmitted across the network, the authorization exchange should be carried out over a secure transport protocol such as TLS.
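
The following sketch shows how a client might generate and check the state parameter around the authorization redirect (Python standard library; the endpoint URLs, client identifier, and redirect URI are hypothetical).

    # Generating and checking the OAuth 2.0 "state" parameter (endpoints are hypothetical).
    import secrets, urllib.parse

    pending_states = set()   # client-side record of outstanding authorization requests

    def build_authorization_url() -> str:
        state = secrets.token_urlsafe(32)   # unguessable value bound to this session
        pending_states.add(state)
        params = {
            "response_type": "code",
            "client_id": "my-client-id",                       # hypothetical
            "redirect_uri": "https://client.example.com/cb",   # hypothetical, pre-registered
            "scope": "profile",
            "state": state,
        }
        return "https://auth.example.com/authorize?" + urllib.parse.urlencode(params)

    def handle_callback(callback_url: str) -> str:
        query = urllib.parse.parse_qs(urllib.parse.urlsplit(callback_url).query)
        state = query.get("state", [""])[0]
        if state not in pending_states:     # reject forged or replayed redirects
            raise ValueError("state mismatch: possible cross-site request forgery")
        pending_states.discard(state)
        return query["code"][0]             # to be exchanged for an access token over TLS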


Figure A.22 OAuth 2.0 user authorization message flow.

A.4.8 OpenID Connect

OpenID Connect (OIDC) is the latest reincarnation of OpenID, although the name is the only link to its predecessor. Developed by the OpenID Foundation,62 OpenID was the first user-centric identity federation mechanism. With it, a user can select and maintain his own identifier (typically a URI) as well as have a choice in selecting his identity provider. OpenID also comes with its own bespoke federated authentication protocol that is tightly coupled to HTTP [30]. Compared with SAML, OpenID has a more focused scope and a lighter approach. Still, the use of XML and a custom message signature scheme makes correct implementations and their interoperation challenging. Then social networking services emerged and took off. The associated identity federation technology gradually overtook the once-promising OpenID and eventually forced it to move on.

Building on OAuth 2.0, OIDC automatically acquires the former's virtues, including support for REST, JSON, standard signing and encryption mechanisms, and varied deployment scenarios. As a result, interoperability is improved. But user-centric features (e.g., user-defined identifiers) are gone. According to the OpenID Foundation (the organization continuing to oversee OpenID's evolution), OIDC:

“allows Clients to verify the identity of the End-User based on the authentication performed by an Authorization Server, as well as to obtain basic profile information about the End-User in an interoperable and REST-like manner. OpenID Connect allows clients of all types, including Web-based, mobile, and JavaScript clients, to request and receive information about authenticated sessions and end-users. The specification suite is extensible, allowing participants to use optional features such as encryption of identity data, discovery of OpenID Providers, and session management, when it makes sense for them.”

Figure A.23 shows an example of the OIDC message flow, which essentially follows the authorization code flow in OAuth 2.0. The key differences according to the OIDC 1.0 core specification63 include the following:


Figure A.23 OpenID connect message flow.

  • A set of special values (e.g., openid, profile, and email) is defined for the scope parameter. The presence of openid in the authorization request (e.g., step 1 and step 2) is mandatory. Multiple additional values may be included as well. The information about the end user that the client can obtain depends on the presence of these scope values;
  • In addition to an access token, an ID token is also returned as part of the token response (i.e., step 7). The ID token is represented as a JSON web token with a JSON web signature based on an IETF standard.64 It contains a set of claims made by the authorization server. The claim set must include the information that identifies the issuer of the response, the intended audience (i.e., the client), the expiration time of the token, and the issuing time of the token. This information is used to further validate the token after its signature has been verified (a sketch of these checks follows this list);
  • Claims about the authenticated end user are treated as a protected resource accessible through the UserInfo endpoint of the resource server. To obtain claims about the end user, the client sends a request to the UserInfo endpoint, including an access token (as shown in step 8). The returned claims depend on the scope values in the access token. For example, if the value of the scope parameter is profile, the end user's default claims are returned. These claims reveal information such as full name, gender, birth date, and home page. Claims are normally represented as a JSON object, which may be signed, or encrypted, or both.
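
A sketch of the ID token checks mentioned above is given below, using the third-party PyJWT package; the issuer URL, client identifier, and key are placeholders, and a production client would obtain the provider's signing keys from its published key set.

    # Sketch of ID token validation (PyJWT assumed to be installed; values are placeholders).
    import jwt  # PyJWT

    def validate_id_token(id_token: str, provider_public_key, client_id: str) -> dict:
        # jwt.decode verifies the signature, checks the expiration time, and compares
        # the audience and issuer claims against the values supplied below.
        claims = jwt.decode(
            id_token,
            provider_public_key,
            algorithms=["RS256"],
            audience=client_id,                  # the intended audience must be this client
            issuer="https://op.example.com",     # expected issuer (placeholder)
        )
        return claims   # e.g., the "sub" claim identifies the authenticated end user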

A.4.9 Access Control Markup Language

Attribute-Based Access Control (ABAC) evolved from the RBAC model discussed in Chapter 7. It allows varied attributes (in terms of values and relations) to be considered at the time of object access. The attributes may be provided by the subject as part of an access request or inferred from the environment (as in the case of time and location). A development of ABAC is the Extensible Access Control Markup Language (XACML) [31, 32], standardized jointly by OASIS and the ITU-T. The language is designed for specifying access control policies and queries.

XACML follows a general policy control model, which employs the PDP and PEP described earlier for QoS support and, in addition, the constructs below:

  • Policy Administration Point (PAP), which administers policies, invoking typical operations such as create, update, and delete;
  • Policy repository, which is a database or a collection of databases storing policies (typically in the form of rules, such as IF <condition> THEN <action>).

Figure A.24 shows how these different constructs are related to each other through an example workflow involving the following steps (a minimal sketch of the decision and enforcement steps follows the list):


Figure A.24 Policy control workflow.

  1. The PEP receives an access request for a protected resource (or an object);
  2. The PEP passes the request to the PDP;
  3. The PDP fetches the applicable policy from the policy repository;
  4. The PDP, upon making the access decision, returns the result to the PEP;
  5. The PEP returns the requested resource or rejects the access request, enforcing the decision.
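
The sketch below (plain Python) mimics the PDP/PEP split of this workflow with a couple of attribute-based rules; the attribute names and the rules are invented for illustration, and no actual XACML syntax is involved.

    # A toy PDP/PEP pair evaluating attribute-based rules (all names are illustrative).
    POLICIES = [
        # IF <condition> THEN <decision>, expressed as (condition, decision) pairs.
        (lambda req: req["role"] == "manager" and req["action"] == "approve", "Permit"),
        (lambda req: req["action"] == "read", "Permit"),
    ]

    def pdp_decide(request: dict) -> str:
        for condition, decision in POLICIES:   # the PDP evaluates the applicable policies
            if condition(request):
                return decision
        return "Deny"                          # default decision

    def pep_enforce(request: dict) -> str:
        decision = pdp_decide(request)         # the PEP passes the request to the PDP
        return "resource returned" if decision == "Permit" else "access rejected"

    print(pep_enforce({"role": "manager", "action": "approve", "resource": "purchase-42"}))
    print(pep_enforce({"role": "employee", "action": "approve", "resource": "purchase-42"}))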

XACML builds on and is consistent with SAML. It has two key components:

  1. An XML-based language for expressing authorization and entitlement policies (e.g., who can do what, where, and when). Such policies are stored in the policy repository.
  2. Request and response messages between the PDP and PEP, where the request message is for triggering and feeding into the policy evaluation process at the PDP, and the response message from the PDP is for capturing the actions or obligations that the PEP needs to fulfill.

As an XML-based language, XACML is verbose and typically generated by machines. To support RBAC, two eponymous profiles [33, 34] have been developed for XACML 2.0 and 3.0, respectively. In both profiles, roles are expressed as Subject Attributes65 in general. Depending on the application environment, there may be either one role attribute whose values correspond to different roles (e.g., "employee," "manager," or "officer"), or different attribute identifiers, each indicating a different role. The following policy types are defined in both profiles:

  • Role, which associates a given role attribute and value with a permission;
  • Permission, which contains the actual permissions (i.e., policy elements and rules);
  • HasPrivilegesOfRole, which supports querying about whether a subject has privileges associated with a given role. It is also possible to express policies in which a user holds several roles simultaneously.

It is worth noting that in the RBAC profile of XACML 2.0 there is an extra policy type (i.e., Role Assignment) defined to handle the actual assignment of roles to subjects. But the question of what roles a subject can have is generally considered beyond the scope of XACML. The question is addressed by a Role Enablement Authority. The following text, common to the scope descriptions of [33, 34], explains:

“Such an entity may make use of XACML policies, but will need additional information … The policies specified in this profile assume all the roles for a given subject have already been enabled at the time an authorization decision is requested. They do not deal with an environment in which roles must be enabled dynamically based on the resource or actions a subject is attempting to perform. For this reason, the policies specified in this profile also do not deal with static or dynamic “Separation of Duty” … A future profile may address the requirements of this type of environment.”

Notes

References

  1. Birman, K.P. (2012) Guide to Reliable Distributed Systems: Building High-Assurance Applications and Cloud-Hosted Services. Springer-Verlag, London.
  2. OASIS (2013) Committee Specification 01: Topology and Orchestration Specification for Cloud Applications, version 1.0. http://docs.oasis-open.org/tosca/TOSCA/v1.0/cs01/TOSCA-v1.0-cs01.pdf.
  3. Prywes, N.S. (1977) Automatic program generation. Proceedings of National Computer Conference AFIPS ‘77, ACM, New York, pp. 679–689.
  4. Ahrens, J. and Prywes, N. (1995) Transition to a legacy- and reuse-based software life cycle. IEEE Computer, 28(10), 27–36.
  5. Binz, T., Breiter, G., Leymann, F., and Spatzier, T. (2012) Portable Cloud services using TOSCA. IEEE Internet Computing, 16(03), 80–85.
  6. Sunyaev, A. and Schneider, S. (2013) Cloud services certification. Communications of the ACM, 56(2), 33–36.
  7. Waixenegger, T., Wieland, M., Binz, T., et al. (2013) Policy4TOSCA: A policy-aware Cloud service provisioning approach to enable secure Cloud computing. Lecture Notes in Computer Science, 8185, 360–376.
  8. Liu, K. (2013) Development of TOSCA Service Templates for provisioning portable IT Services. Diploma Thesis No. 3428, University of Stuttgart, Faculty of Computer Science, Electrical Engineering and Information Technology.
  9. Fielding, R.T. and Taylor, R.N. (2000) Principled design of the modern Web architecture. Proceedings of the 22nd International Conference on Software Engineering, ACM, New York, pp. 407–416.
  10. Fielding, R.T. (2000) Architectural styles and the design of network-based software architectures. PhD dissertation, University of California, Irvine, CA. www.ics.uci.edu/∼fielding/pubs/dissertation.
  11. Bush, V. (1945) As we may think. The Atlantic Monthly, 176(1), 101–108. www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/.
  12. Nelson, T.H. (1965) Complex information processing: A file structure for the complex, the changing and the indeterminate. Proceedings of the ACM 20th National Conference, ACM, New York, pp. 84–100.
  13. Nabokov, V. (1963) Pale Fire. Lancer Books, New York.
  14. Rowberry, S. (2011) Pale Fire as a hypertextual network. Proceedings of the 22nd ACM Hypertext Conference, HT’11, ACM, New York, pp. 319–324.
  15. Gray, H.J. and Prywes, N.S. (1959) Outline for a multi-list organized system. Proceedings of ACM ‘59; Preprints of Papers Presented at the 14th National Meeting of the Association for Computing Machinery, ACM, New York, pp. 1–7.
  16. Prywes, N.S. and Gray, H.J. (1963) The organization of a multilist-type associative memory. Transactions of the American Institute of Electrical Engineers, Part I: Communication and Electronics, 82(4), 488–492.
  17. Barnet, B. (2013) Memory Machines: The evolution of hypertext. Anthem Press, London.
  18. Carmody, S., Gross, W., Nelson, T.H., et al. (1969) A hypertext editing system for the /360. Center for Computer & Information Sciences, Brown University, Providence, RI. File Number HES360-0, Form AVD-6903-0, pp. 26–27 (cited from [17]).
  19. Bonneau, J. (2012) Guessing human-chosen secrets. PhD dissertation, University of Cambridge.
  20. Morris, R. and Thompson, K. (1979) Password security: A case history. Communications of the ACM, 22(11), 594–597.
  21. Wagner, D. and Goldberg, I. (2000) Proofs of security for the Unix password hashing algorithm. In Okamoto, T. (ed.), Advances in Cryptology—ASIACRYPT 2000. Springer, Berlin, pp. 560–572.
  22. Needham, R.M. and Schroeder, M.D. (1978) Using encryption for authentication in large networks of computers. Communications of the ACM, 21(12), 993–999.
  23. Dennis, J.B. and Van Horn, E.C. (1966) Programming semantics for multiprogrammed computations. Communications of the ACM, 9(3), 143–155.
  24. Tanenbaum, A.S., Van Renesse, R., Van Staveren, H., et al. (1990) Experiences with the Amoeba distributed operating system. Communications of the ACM, 33(12), 46–63.
  25. La Padula, L.J. and Elliott Bell, D. (1973) Secure Computer Systems: Mathematical Foundations. MTR-2547-VOL-1, Mitre Corporation, Bedford, MA.
  26. Biba, K.J. (1977) Integrity Considerations for Secure Computer Systems. MTR-3153-REV-1, Mitre Corporation, Bedford, MA.
  27. OASIS (2005) Assertions and protocols for the OASIS Security Assertion Markup Language (SAML) V2.0. http://docs.oasis-open.org/security/saml/v2.0/saml-core-2.0-os.pdf.
  28. International Telecommunication Union (2006) ITU-T Recommendation X.1141: Security Assertion Markup Language (SAML 2.0). www.itu.int.
  29. OASIS (2005) Bindings for the OASIS Security Assertion Markup Language (SAML) V2.0. http://docs.oasis-open.org/security/saml/v2.0/saml-bindings-2.0-os.pdf.
  30. OpenID Foundation (2007) OpenID Authentication 2.0. http://openid.net/specs/openid-authentication-2_0.html.
  31. International Telecommunication Union (2006) ITU-T Recommendation X.1142: eXtensible Access Control Markup Language (XACML 2.0). www.itu.int.
  32. International Telecommunication Union (2013) ITU-T Recommendation X.1144: eXtensible Access Control Markup Language (XACML 3.0). www.itu.int.
  33. OASIS (2005) Core and hierarchical Role Based Access Control (RBAC) profile of XACML v2.0. http://docs.oasis-open.org/xacml/2.0/access_control-xacml-2.0-rbac-profile1-spec-os.pdf.
  34. OASIS (2014) Core and hierarchical Role Based Access Control (RBAC) profile of XACML v3.0. http://docs.oasis-open.org/xacml/3.0/rbac/v1.0/csprd04/xacml-3.0-rbac-v1.0-csprd04.pdf.