Chapter 2. The HTTP Protocol

Now that we have a definition for the Internet of Things, where do we start? It is safe to assume that most people who use a computer today have used the Hypertext Transfer Protocol (HTTP), perhaps without even knowing it. When they "surf the Web", they navigate between pages using a browser that communicates with servers over HTTP. Some even equate the Internet with the Web when they say they "go on the Internet" or "search the Internet".

Yet HTTP has become much more than a way to navigate between pages on the Internet. Today, it is also used in machine-to-machine (M2M) communication, automation, and the Internet of Things, among other things. So much is done over HTTP today because it is easily accessible and easy to relate to. For this reason, we start our study of the Internet of Things with HTTP. This will allow you to get a good grasp of its strengths and weaknesses, even though it is perhaps one of the more technically complex protocols. We will present the basic features of HTTP, look at the available HTTP methods, and study the request/response pattern as well as ways to handle events, user authentication, and web services.

Before we begin, here is an overview of the topics we will be looking at:

  • The basics of HTTP
  • How to add HTTP support to the sensor, actuator, and controller projects
  • How common communication patterns such as request/response and event subscription can be implemented using HTTP

HTTP basics

HTTP is a stateless request/response protocol in which clients request information from a server and the server responds to these requests accordingly. A request is made up of a method, a resource, some headers, and some optional content. A response is made up of a three-digit status code, some headers, and some optional content. This can be observed in the following diagram:

HTTP request/response pattern
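The parts of a request and a response described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a complete HTTP implementation: the function names `build_request` and `parse_response`, the `/sensor` resource, and the `example.com` host are all hypothetical.

```python
def build_request(method, resource, headers, body=b""):
    """Assemble a raw HTTP/1.1 request: method, resource, headers, optional content."""
    lines = [f"{method} {resource} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line separates the headers from the optional content.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_response(raw):
    """Split a raw HTTP response into status code, reason, headers, and content."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)
    code = int(parts[1])                      # the three-digit status code
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, reason, headers, body


# Hypothetical request for a sensor resource
request = build_request("GET", "/sensor", {"Host": "example.com"})

# Hypothetical response carrying a plain-text reading
response = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n21.5"
code, reason, headers, body = parse_response(response)
```

Header names are lowercased when parsed because HTTP header names are case-insensitive.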

Resources, originally thought of as a collection of hypertext or HTML documents, are each identified by a Uniform Resource Locator (URL). Clients use the GET method to request a resource from the corresponding server. In the URL structure presented next, the server is identified by the authority portion of the URL and the resource by the path. The PUT and DELETE methods allow clients to upload content to and remove content from the server, while the POST method allows them to send data to a resource on the server, for instance, from a web form. The structure of a URL is shown in the following diagram:

Structure of a Uniform Resource Locator (URL)
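As a small illustration of this structure, Python's standard `urllib.parse` module can split a URL into its parts. The URL below is a made-up example, and the helper name `describe_url` is an assumption introduced only for this sketch.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url):
    """Return the main components of a URL as a dictionary."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,        # protocol, for example http
        "authority": parts.netloc,     # identifies the server (host and port)
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,            # identifies the resource on the server
        "query": parse_qs(parts.query),
        "fragment": parts.fragment,
    }


# Hypothetical URL, used only for illustration
info = describe_url("http://example.com:8080/sensors/temperature?unit=C&format=xml#latest")
```

Here the authority is `example.com:8080` and the path is `/sensors/temperature`, matching the roles described in the text: the authority names the server and the path names the resource.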

HTTP defines a set of headers that can be used to attach meta-information to the requests and responses sent over the network. These headers are human-readable key-value text pairs that describe how content is encoded, for how long it is valid, what type of content is desired, and so on. The type of content being transmitted is identified by the Content-Type header. Headers also provide a means to authenticate clients to the server and a mechanism to introduce state in HTTP: by using cookies, which are text strings, a server can ask the client to remember a value, which the client then adds to each subsequent request it makes to that server.
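The following sketch shows how two such headers might be interpreted using only the Python standard library. The header values are invented for illustration, and the helper names `content_type_of` and `cookie_header_from` are assumptions rather than part of any HTTP library.

```python
from email.message import Message
from http.cookies import SimpleCookie


def content_type_of(header_value):
    """Split a Content-Type header value into media type and character set."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def cookie_header_from(set_cookie_value):
    """Build the Cookie request header a client would send back to the server."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only the name=value pairs are returned; attributes such as Path stay on the client.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical header values
media_type, charset = content_type_of("text/html; charset=UTF-8")
cookie_header = cookie_header_from("session=abc123; Path=/; Max-Age=3600")
```

The server sends the cookie once in a `Set-Cookie` response header; the client then repeats `session=abc123` in a `Cookie` header on later requests, which is how state is layered on top of a stateless protocol.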

HTTP works on top of the Internet Protocol (IP). In this protocol, machines are addressed using an IP address, which makes it possible to communicate between different local area networks (LANs) that might use different addressing schemes, although the most common ones are Ethernet-type networks that use media access control (MAC) addresses. Communication in HTTP is then done over a Transmission Control Protocol (TCP) connection between the client and the server. The TCP connection makes sure that packets are not lost and that they are received in the same order in which they were sent. Each connection endpoint is defined by an IP address and a port number. The assigned default port number for HTTP is 80, but other port numbers can also be used; the alternative HTTP port 8080 is common.
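To make the role of the TCP connection concrete, the sketch below starts a tiny HTTP server on the local machine and talks to it over a plain TCP socket. It is a minimal illustration, assuming the loopback address 127.0.0.1 is available; the handler class `HelloHandler` and the function `fetch_over_tcp` are hypothetical names, and the port is chosen by the operating system rather than being 80 or 8080.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with a short plain-text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_over_tcp():
    """Open a TCP connection to a local HTTP server and return the raw response."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS pick a free port
    host, port = server.server_address                   # the server endpoint: IP address + port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with socket.create_connection((host, port), timeout=5) as conn:
            request = f"GET / HTTP/1.1\r\nHost: {host}:{port}\r\nConnection: close\r\n\r\n"
            conn.sendall(request.encode("ascii"))
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:          # the server closed the connection
                    break
                chunks.append(data)
        return b"".join(chunks)
    finally:
        server.shutdown()
        server.server_close()


raw_response = fetch_over_tcp()
```

The HTTP request and response are just bytes written to and read from the TCP stream; TCP takes care of delivering them complete and in order.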

Tip

To simplify communication, Domain Name System (DNS) servers provide a mechanism of using host names instead of IP addresses when referencing a machine on the IP network.

Encryption can be done through the use of Secure Sockets Layer (SSL) or its successor, Transport Layer Security (TLS). When this is done, the protocol is normally named Hypertext Transfer Protocol Secure (HTTPS) and the communication is performed on a separate port, normally 443. In this case, the server (most commonly) and also the client can be authenticated using X.509 certificates that are based on a Public Key Infrastructure (PKI), where anybody with access to the public part of the certificate can encrypt data meant for the holder of the private part of the certificate. The private part is required to decrypt the information. These certificates allow the validation of the domain of the server or the identity of the client. They also provide a means to check who their issuer is and whether the certificates are invalid because they have been revoked. The Internet architecture is shown in the following diagram:

Internet architecture
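As a hedged illustration of the HTTPS setup described above, the following sketch prepares a client-side TLS configuration with Python's standard `ssl` and `http.client` modules. No network traffic is generated, because the connection object is created but never opened. The function name `make_https_connection` and the host `example.com` are assumptions made for this example.

```python
import http.client
import ssl


def make_https_connection(host, port=443):
    """Prepare (but do not open) an HTTPS connection that verifies the server certificate."""
    # The default context loads the system's trusted certificate authorities,
    # requires a valid server certificate, and checks that it matches the host name.
    context = ssl.create_default_context()
    connection = http.client.HTTPSConnection(host, port, context=context, timeout=10)
    return connection, context


connection, context = make_https_connection("example.com")
```

Calling `connection.request("GET", "/")` would then open the TCP connection on port 443, perform the TLS handshake, and validate the server's X.509 certificate before any HTTP data is exchanged.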

HTTP is a cornerstone of service-oriented architecture (SOA), where services published through HTTP are called web services. One important way of publishing web services is the Simple Object Access Protocol (SOAP), where web methods, their arguments, return values, bindings, and so on, are encoded in a specific XML format. The service is then documented using the Web Services Description Language (WSDL). Another popular way of publishing web services is Representational State Transfer (REST). This provides a simpler, loosely coupled architecture where methods are implemented using the normal HTTP methods and URL query parameters, instead of being encoded in XML as in SOAP.
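The difference between the two styles can be sketched by expressing the same call both ways. Everything specific here is hypothetical: the `GetTemperature` method, the `urn:example:sensors` namespace, the `/sensors/42/temperature` resource, and the helper names `soap_request` and `rest_request`. Only the SOAP 1.1 envelope namespace is a real, fixed value.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def soap_request(method, args, service_ns):
    """Encode a method call and its arguments as a SOAP envelope (sent via POST)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in args.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def rest_request(resource, params):
    """Express the same call as an HTTP method, a resource path, and query parameters."""
    return f"GET {resource}?{urlencode(params)}"


# Hypothetical service: read the temperature of sensor 42 in degrees Celsius
soap_xml = soap_request("GetTemperature", {"SensorId": 42, "Unit": "C"}, "urn:example:sensors")
rest_line = rest_request("/sensors/42/temperature", {"unit": "C"})
```

In the SOAP version the method name and arguments live inside the XML content, whereas in the REST version they are carried by the HTTP method, the path, and the query string.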

Recent developments based on the use of HTTP include Linked Data, a re-abstraction of the Web in which any type of data can be identified using a Uniform Resource Identifier (URI); the semantic representation of this data as semantic triples; and semantic data formats such as Resource Description Framework (RDF), readable by machines, or Terse RDF Triple Language (Turtle), more readily readable by humans. While the collection of HTTP-based Internet resources is called the Web, these later efforts are known as the Semantic Web.
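A semantic triple is simply a subject, a predicate, and an object, with URIs identifying the things being described. The sketch below, which assumes made-up URIs under `http://example.com/` and hypothetical names `Literal` and `to_ntriples`, writes such triples in the line-based N-Triples notation, which is also valid Turtle. It is an illustration only and does not use a real RDF library.

```python
class Literal(str):
    """Marks an object as a plain text value rather than a URI."""


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one statement per line."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            obj_text = '"' + escaped + '"'
        else:
            obj_text = "<" + obj + ">"   # URIs are written in angle brackets
        lines.append(f"<{subject}> <{predicate}> {obj_text} .")
    return "\n".join(lines)


# Hypothetical data: a sensor, its location, and its latest reading
triples = [
    ("http://example.com/sensor/42", "http://example.com/vocab#locatedIn", "http://example.com/room/kitchen"),
    ("http://example.com/sensor/42", "http://example.com/vocab#temperature", Literal("21.5")),
]
document = to_ntriples(triples)
```

Because every subject and predicate is a URI, data published this way can be linked to and combined with data published elsewhere on the Web.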

Tip

For a thorough review of HTTP, please see Appendix E, Fundamentals of HTTP.
