3.3. Unordered Structures

Unordered data structures don't occur often in nondistributed programming, but they are widely used in space-based programming. An unordered distributed data structure consists of a collection of objects. Unlike an ordered collection, such as an array, where we can address a particular element to read or change its value, an unordered structure typically supports just two operations: “put” and “get.” The “put” operation adds a new object to the collection, while “get” returns an arbitrary object from it. Because these data structures have no internal order, there is no criterion for returning any one object over another, so an arbitrary one is returned.
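In ordinary (nondistributed) Java, the put/get interface can be sketched like this. This is only a minimal local analogy for illustration, not a distributed space; the `Bag` class and its random pick are our own choices:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// A minimal, local illustration of an unordered "bag":
// put adds an object; get removes and returns an arbitrary one.
public class Bag<T> {
    private final List<T> items = new ArrayList<>();
    private final Random random = new Random();

    public void put(T item) {
        items.add(item);
    }

    // Removes and returns an arbitrary item, or null if the bag is empty.
    public T get() {
        if (items.isEmpty()) {
            return null;
        }
        return items.remove(random.nextInt(items.size()));
    }

    public int size() {
        return items.size();
    }
}
```

Note that callers can make no assumption about which item `get` returns; that indifference to order is exactly what makes a bag a bag.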

Most unordered data structures are called bags because they resemble a bag of objects. A bag of apples is a good example. If you have an apple and want to store it in a bag, you just throw it in. You don't need to place the apple in a particular spot, since there is no need to order the apples. When you are hungry and want an apple, you just reach in and grab one; apples are pretty much the same, so any apple you first lay your hands on is good enough.

In general, bags have many uses, and their use in space-based programming reflects the loosely coupled nature of space-based programs. We talked a bit about loose coupling in Chapter 1, and we'll return to the subject over the next several chapters. The connection here is that bags provide a distributed data structure that processes can use to request and provide services without having to know the specifics of whom they are requesting a service from or providing a service to. In this way the components of the distributed application are loosely coupled, because there is no direct link from one component to another.

Bags are exceedingly simple to create in space-based programming: to create a bag, you only need to define an entry class and deposit as many instances of that entry into a space as you like. While the entry may have fields that can be used to order the items (for instance, an apple may be marked with the date it was picked from the tree), those fields are usually not important to the processes that make use of the bag (in this case, the dates aren't important to eating, unless some of the apples are really old). When a process needs an item from a bag, it grabs one, since any one will do.

Let's take a look at bags and see how we might use them in space-based programming.

3.3.1. Bags

Let's start with a simple example: many online services provide a means of picking an anonymous game partner from a pool of other users that have registered interest in playing a game (say chess). To implement such a service, we first need to create an entry that represents a user's interest. In the case of someone interested in chess games, the entry might look like:

public class ChessPartner implements Entry {
  public String username;
}

For every person interested in playing chess, we instantiate a ChessPartner entry, filling in that person's username, and write the entry to a space.

To find a chess partner, we simply create a template and fish one out:

ChessPartner template = new ChessPartner();
ChessPartner partner = (ChessPartner)
  space.take(template, null, Long.MAX_VALUE);

Here we first create a template (with its username field set to null by default) that will match any ChessPartner entry in the space. We then take one such entry out of the space, waiting if necessary until one enters the space. Once we have an entry, we obviously need to contact the partner and begin the game, but we will leave those details to the online service.

Bags need not consist of a collection of all entries of a specific type—we can treat a subset of entries as a bag as well. For instance, we could define our chess partner entry like this:

public class ChessPartner implements Entry {
  public String username;
  public String skillLevel;
}

Here, we've added a skillLevel field that can be used to describe the desired skill level of the opponent. Now you can look for an opponent that might give you a more challenging game:

ChessPartner template = new ChessPartner();
template.skillLevel = "expert";
ChessPartner partner = (ChessPartner)
  space.take(template, null, Long.MAX_VALUE);

Here we are still choosing an arbitrary partner from a pool of potential candidates, but we are dividing the bag up into smaller bags that are still unordered.
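The null-as-wildcard matching that the space performs on our behalf can be sketched in plain Java. This is a simplified local model of entry matching, not the actual JavaSpaces implementation; the `matches` method is our own invention:

```java
// A simplified model of template matching: a null field in the
// template acts as a wildcard; a non-null field must match exactly.
public class ChessPartner {
    public String username;
    public String skillLevel;

    public boolean matches(ChessPartner template) {
        return (template.username == null
                    || template.username.equals(username))
                && (template.skillLevel == null
                    || template.skillLevel.equals(skillLevel));
    }
}
```

An empty template (all fields null) matches every ChessPartner entry, while a template with skillLevel set to "expert" narrows the match to the smaller "expert" bag.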

3.3.2. Task Bags and Result Bags

A common method of providing space-based services is through task bags and result bags. Conceptually a service works as follows: a process looking for a service places a task entry into a task bag, and a service looking for “orders to fill” removes the task from the bag and provides a service of some kind. When the service order is complete, the service places a result entry into a result bag, from which the original process can retrieve it.

While this scenario sounds like it is tailored to electronic commerce, task and result bags have wider application and are often used in distributed and parallel computations. In the remainder of this section we are going to take a quick peek at using task and result bags for parallel computation, which we will discuss in more depth in Chapter 6.

Replicated worker computing (sometimes called master-worker) is built on the observation that many computational problems can be broken into smaller pieces that can be computed by one or more processes in parallel. A great many computations fall into this class of problems, including ray tracing, weather model simulations, and parallel searches (to name a few).

If we look at the “skeleton” of these computations, many tend to look something like:

for (int i = 0; i < len; i++) {
   //... compute intensive code
}

That is, the computations are fairly simple, consisting of a loop over a common, usually compute-intensive, region of code. Often the number of iterations (dictated here by the variable len) can be quite large. In a ray tracing application, a computation might have to iterate over millions of pixels in an image, each of which may take several seconds to compute. These kinds of computations lend themselves well to being decomposed into smaller pieces that can be computed in parallel.
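One simple way to decompose such a loop is to split its iteration range into fixed-size chunks, each of which becomes one task. Here is a sketch; the `Task` class and the `split` helper are hypothetical names of our own, not part of any space API:

```java
import java.util.ArrayList;
import java.util.List;

public class LoopSplitter {
    // A hypothetical task covering loop iterations [start, end).
    public static class Task {
        public final int start;
        public final int end;
        public Task(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Split len iterations into tasks of at most chunkSize iterations each.
    public static List<Task> split(int len, int chunkSize) {
        List<Task> tasks = new ArrayList<>();
        for (int start = 0; start < len; start += chunkSize) {
            tasks.add(new Task(start, Math.min(start + chunkSize, len)));
        }
        return tasks;
    }
}
```

Each resulting task describes an independent slice of the original loop, so any worker can compute any slice in any order.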

Typically in replicated worker computing, a master process takes the work performed in the for loop and divides it up into a number of tasks that it deposits into a task bag. One or more processes, known as workers, grab these tasks, compute them, and place the results back into a result bag. The master process collects the results as they are computed and combines them into something meaningful, such as a ray-traced image.

Here is what the skeleton for a typical master process might look like:

public class Master {
  public void run() throws Exception {
    for (int i = 0; i < totalTasks; i++) {
       Task task = new Task(...);
       space.write(task, ...);
    }
    for (int i = 0; i < totalTasks; i++) {
       Result template = new Result(...);
       Result result = (Result)space.take(template, ...);
       //... combine results
    }
  }
}

Here we first deposit totalTasks requests into a bag of tasks, and then wait for totalTasks results from another bag. We don't care about the order in which the tasks are selected to be computed, as long as someone computes them and returns their results. Likewise, we don't care about the order in which results are retrieved from the results bag; we are happy to retrieve any results that have been computed. As we collect the results, we combine them into some meaningful form.

Here is what the skeleton for a typical worker process might look like:

public class Worker {
  public void run() throws Exception {
    for (;;) {
       Task template = new Task(...);
       Task task = (Task)space.take(template, ...);
       Result result = compute(task);
       space.write(result, ...);
    }
  }
}

Here the worker continually looks for new tasks in the space. Whenever it finds and takes one, it computes the task (which returns some result) and then writes the result back into the space for the master to pick up.

One of the things that makes this simple pattern so powerful is that it is loosely coupled. The master adds tasks to a space, but doesn't worry about who will ultimately receive the tasks and compute them. Similarly, the workers themselves just pick up tasks from the bag, without worrying about who is asking for the work to be done. Because this pattern is loosely coupled, we have the beginnings of a scalable architecture: as we add more workers, the computation is generally completed more quickly. The pattern also naturally load balances, because the workers compute tasks at their own pace. We will return to this topic and explore it more fully in Chapter 6.
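To see the pattern run end to end without a space, we can stand in for the task and result bags with local queues. This is a local simulation using java.util.concurrent, not the JavaSpaces API; all names here are our own, and squaring numbers stands in for the compute-intensive work:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ReplicatedWorkerDemo {
    // Local stand-ins for the task bag and the result bag.
    static final BlockingQueue<Integer> taskBag = new LinkedBlockingQueue<>();
    static final BlockingQueue<Integer> resultBag = new LinkedBlockingQueue<>();

    public static int run(int totalTasks, int workers) throws InterruptedException {
        // Workers: repeatedly take a task, compute it, write the result.
        for (int w = 0; w < workers; w++) {
            Thread worker = new Thread(() -> {
                try {
                    for (;;) {
                        int task = taskBag.take();   // like space.take(template, ...)
                        resultBag.put(task * task);  // like space.write(result, ...)
                    }
                } catch (InterruptedException e) {
                    // daemon threads simply exit with the demo
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        // Master: deposit the tasks...
        for (int i = 1; i <= totalTasks; i++) {
            taskBag.put(i);                          // like space.write(task, ...)
        }
        // ...then collect and combine the results (order doesn't matter).
        int sum = 0;
        for (int i = 0; i < totalTasks; i++) {
            sum += resultBag.take();                 // like space.take(template, ...)
        }
        return sum;
    }
}
```

Because the workers pull tasks at their own pace, adding more worker threads (or, with a real space, more worker processes on other machines) speeds up the computation without any change to the master.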
