11.2. The Crypt Application

We are now going to develop a parallel application on top of our compute server. Our application is an exercise in applied cryptography, which is another way of saying we are going to write an application that breaks encrypted passwords.

11.2.1. Background

A common method of encrypting passwords, particularly in UNIX systems, is the use of a one-way encryption algorithm, which takes the user's password as input and returns the user's encrypted password. For instance, a password of “ secret” may result in the output “ BgU8DFSLhhz6Q.” One-way functions that work well for encryption have the property that, given the encrypted version of a password, it is difficult to compute the inverse of the function to produce the original password (which makes it hard to guess passwords from their encrypted form).

The parallel application we are going to be developing will break passwords through the brute force technique of trying every possible combination of ASCII characters that make up a password. This technique works as follows: given a one-way function, let's call it crypt, and an encrypted password e, we can iterate through every possible ASCII password p and compute crypt(p). We then compare the result of crypt(p) with e, and if the two are equal, then p is the password. If not, then we keep trying.

It isn't difficult to enumerate every possible ASCII password; it can be done by generating a sequence like this:

aaaaaaaa
aaaaaaab
aaaaaaac
.
.
.
zzzzzzzz

with the caveat that characters other than a-z can be used for passwords (in general, all 95 printable ASCII characters can be used).

There are, of course, other methods of trying to break passwords. The most common method is to use a dictionary of words that are then encrypted and checked against the stored encrypted password. This technique often works well in practice, because users often choose common words as passwords. However, it isn't foolproof, because users may choose passwords that aren't in dictionaries.

Exploring the space of every possible password—not just dictionary words—can be a very compute-intensive problem (if not intractable in some cases). Because of this we are going to implement a scaled-down application that breaks passwords of four characters or less. Even with four characters, the space of possible passwords would take a standalone Java application on the order of several hours to compute, which makes a nice size for our parallel experiments.

Our encrypted passwords are based on the UNIX password encryption scheme. The garden-variety UNIX algorithm normally takes a password of eight characters that is prepended by two characters of “salt”—an arbitrary sequence that is prepended to prohibit simple dictionary schemes of password breaking. So to encrypt “secret,” UNIX adds two random characters, say “aw” and then encrypts “awsecret” via a one-way function.

The UNIX crypt one-way function is provided in a class called JCrypt, written by John F. Dumas for the Java platform, and can be found in the source code for this book. The details of the class are not important, with the exception of the crypt method that we make use of in our code. The signature for this method is as follows:

public static String crypt(byte[] password);

The crypt method is a static method that takes a password (already prepended with salt) in the form of a byte array and returns an encrypted version of the password in the form of a String.

With this knowledge under our belt, let's design and build the application. Our application is going to follow the standard replicated-worker pattern: Given an encrypted password, a master process enumerates all possible passwords and generates tasks for the workers to verify whether or not the possible passwords match the encrypted version. The workers pick up the tasks and do the verification by the technique mentioned above—they first encrypt the potential password and then compare it against the encrypted password. If the two match we've found the password.

11.2.2. The Generic Worker and Crypt Task

The task entry is the heart of the compute server computation—tasks are written into the space for the GenericWorker to compute. Because we make use of the command pattern, the GenericWorker simply calls each task's execute method. Even if a worker has never seen a specific subclass of a task entry before, the space and its underlying RMI transport take care of the details of shipping the executable behavior to the worker.

Our workers look for tasks of type TaskEntry. Here we extend the task entry to create a new task: the CryptTask.

public class CryptTask extends TaskEntry {
  public Integer tries;
  public byte[] word;
  public String encrypted;

  public CryptTask() {
  }

  public CryptTask(int tries, byte[] word, String encrypted) {
    this.tries = new Integer(tries);
    this.word = word;
    this.encrypted = encrypted;
  }

  public Entry execute(JavaSpace space) {
    int num = tries.intValue();
    System.out.println("Trying " + getPrintableWord(word));
    for (int i = 0; i < num; i++) {
      if (encrypted.equals(JCrypt.crypt(word))) {
         CryptResult result = new CryptResult(word);
         return result;
      }
      nextWord(word);
    }
    CryptResult result = new CryptResult(null);
    return result;
  }

}

Let's step through the CryptTask code. A crypt task holds three pieces of information: tries, an integer that specifies the number of potential passwords each task should enumerate and try as a possible match against the encrypted version; word, a byte array that holds the word with which to begin the enumeration; and encrypted, a string that holds the encrypted password we are trying to break. So each task is given a starting point (word) and a number of enumerations (tries), to attempt matching words against the encrypted password (encrypted). All three values can be initialized in a crypt task by calling the supplied convenience constructor.

Next we have the crypt task's execute method: we first convert tries from its wrapper class into a primitive integer that is assigned to num. The rest of execute consists of one main loop that iterates for num times. Each time through the loop, we encrypt a word using the JCrypt class's crypt method and then compare it against the encrypted word. If the two are equal, then we've broken the password, and we create a CryptResult object and return it to the Worker process. The CryptResult object is simply an entry that holds the broken password in the form of a byte array.

If the two are not equal, then we call nextWord, which is shown below:

static void nextWord(byte[] word) {
   int pos = 5;
   for (;;) {
      word[pos]++;
     if ((word[pos] & 0x80) != 0) {
         word[pos--] = (byte) '!';
     } else {
        break;
     }
  }
}

The nextWord method is a static method (as we will see it is also used by the CryptMaster) that takes a word in the form of a byte array and alters the array so that it is set to the next word in the ASCII sequence. For instance, if we start at the word “aaaa” and prepend it with the salt sequence “aw” then the word “awaaaa” looks like this if we run it through nextWord a few times:

awaaaa
awaaab
awaaac
.
.
.
awaaa!
awaaba
awaabb
.
.
.
awaa!!
awabaa
awabab
.
.
.

If we take a look at the code for nextWord, we see that the method first increments the right-most character. If this character goes beyond the end of the ASCII sequence (which is the hexadecimal number 0x80) then the character is set to ! (which is the first printable character in the ASCII set), and the neighboring character to the left is incremented by one. The test for 0x80 is then repeated on that character; if equal, the character is also set to ! and the character to the left of it is incremented. And so on.

If CryptTask iterates through num possible passwords without finding a match, then it falls out of the loop and returns a result entry that has its word field set to null (the result entry consists of one field, word, that is used to hold the password when it is found).

To recap, the CryptTask has instructions to check a fixed number of potential passwords against the encrypted password. If a password is found, it is returned in a result entry, but if no password is found, then a result entry is still returned with its word field set to null (we will see why shortly).

11.2.3. The Crypt Master

Now that we've seen the worker code in detail, let's take a look at the CryptMaster, which drives the entire computation. Let's first examine the user interface of the CryptMaster (shown in Figure 11.1), which will give us a sense of its operation before we dive into the code details.

Figure 11.1. The CryptMaster Interface


The CryptMaster interface works as follows: we enter salt and a password (in the first two text entries in the upper left corner of the interface), and then optionally tweak the parameters below the password text entry, such as the “tries per task” parameter, which controls how many words each task checks against the encrypted password. We then click on the “Start” button, which starts the application.

Once started, the CryptMaster encrypts our password and displays it just below the password text entry field. The compute server will now be asked to figure out what our original password was, given the encrypted version. To do this, the crypt master begins generating tasks and collecting results until the workers break the password. During the computation, the right side of the interface provides an information display that shows the total number of words that have been checked against the encrypted password, the average number of words being computed per minute by all workers, and the number of tasks in the space waiting to be computed (this is called the “water level,” which we will discuss shortly along with the “high mark” and “low mark” configuration information that appears in the interface).

Below is the skeleton of the CryptMaster code:

public class CryptMaster extends GenericMaster
  implements ActionListener
{
  // ... GUI declarations here

  private int lowmark;
  private int highmark;
  private int triesPerTask;
  private int waterlevel = 0;
  private boolean start = false;
  private long startTime;
  private String salt;
  private String unencrypted;

  public void init() {
    super.init();
    // ... GUI setup here
  }

  public void actionPerformed(ActionEvent event) {
    if (start) {
       return;
    }

    String msg = null;
    salt = saltTextField.getText();
    if (salt.equals("")) {
        msg = "Supply a salt value";
    }

    unencrypted = wordTextField.getText();
    if (unencrypted.equals("")) {
        msg = "Supply a word to encrypt";
    }

    if (unencrypted.length() != 4) {
        msg = "Supply a word of four characters";
    }

    // JCrypt expects 8 chars
    unencrypted = "!!!!" + unencrypted;

    try {
       triesPerTask = Integer.parseInt(
          triesTextField.getText());
    } catch (NumberFormatException e) {
       msg = "Enter an integer value in "Tries per Task".";
    }
    try {
       highmark = Integer.parseInt(highTextField.getText());
    } catch (NumberFormatException e) {
       msg = "Enter an integer value in "High Mark".";
    }
    try {
       lowmark = Integer.parseInt(lowTextField.getText());
     } catch (NumberFormatException e) {
        msg = "Enter an integer value in "Low Mark".";
    }

    if (msg == null) {
       start = true;
    } else {
       showStatus(msg);
    }
  }
  // ... generateTasks() goes here
  // ... collectResults() goes here
  // ... other helper methods go here
}

CryptMaster subclasses our GenericMaster class; this basic skeleton shows where fields are declared and initialized and where the user interface is set up.

The notable part of this code is the actionPerformed method, which is called when the “Start” button is clicked. This method first checks to see if the start field has already been set to true (indicating that this method has been called before), and if so simply returns. Otherwise we check the value of the fields in the user interface to make sure they contain appropriate values. If they do, we set the value of several fields, including the start field; otherwise we display an error message in the status area of the applet.

11.2.4. Generating Tasks

Now that our user interface code is out of the way, let's look at how CryptMaster generates and collects tasks. Recall that GenericMaster (which CryptMaster subclasses) creates two threads, one that calls generateTasks and another that calls collectResults. Here is the code for generateTasks:

public void generateTasks() {
  while (!start) {
    try {
       Thread.sleep(250);
    } catch (InterruptedException e) {
       return;     // thread told to stop
    }
  }

  startTime = System.currentTimeMillis();

  String encrypted = JCrypt.crypt(salt, unencrypted);
  encryptedTextField.setText(encrypted);

  byte[] testWord = getFirstWord();

  for (;;) {
        if (testWord[1] != salt.charAt(1)) {
           return;
        }
      CryptTask task =
        new CryptTask(triesPerTask, testWord, encrypted);
      System.out.println("Writing task");
      writeTask(task);
      for (int i = 0; i < triesPerTask; i++) {
         CryptTask.nextWord(testWord);
      }
  }
}

The generateTasks method begins by waiting for the start field to become true. As we saw in the last section, this field is set to true when the user clicks on the “Start” button.

Once start is set to true, the generateTasks method sets the startTime field to the current time and the encrypted field to the encrypted version of the password entered in the text entry field (prepended with salt); then the interface is updated to display the encrypted version. Next the method getFirstWord is called to obtain the first ASCII sequence (see the complete source code for the method definition), which gets assigned to the field testWord; this will be the starting point for enumerating all potential password matches.

Then we enter the main loop of generateTasks, which first tests to see if we are at the last word of the ASCII sequence. Recall from our explanation of the CryptTask's nextWord method that the sequence is enumerated by “incrementing” through the positions of the word from right to left. Here we check the first character of the password's salt. When this character is no longer the salt character (in other words, when it, too, has been incremented) then we know we've exhausted all password possibilities.

Next we create a CryptTask by calling its convenience constructor, supplying the number of tries per task, the encrypted version of the password, and the starting word to begin testing. We then write the crypt task to the space. Next we need to update testWord, so that the next task will begin at the word which is triesPerTask away from this task's starting word. To do this we simply call the nextWord method triesPerTask times (a very fast operation compared to doing the password check of each possibility).

That is the basic version of the generateTasks method, which simply enumerates all possible values and partitions them into a number of tasks that need to be checked by workers. The task generation is problematic, however, in that it can generate a very large number of tasks before the password is discovered. In fact, there are over 81 million possible passwords of length four or less. If we choose a “tries per task” of 1,000 then the master will (in the worst case) generate over 80,000 tasks. This could have a serious impact on the resources of the space. Let's look at one way to improve this code.

How many passwords are possible? For a four-character password we need to check just over 81 million potential passwords. This number comes from the following computation: a password has four character “slots” to be filled, each of which can hold one of 95 possible ASCII characters (we will assume that passwords of 3 characters or less are filled with the space character at the end). Because the choice of each slot is independent of the others we obtain all possibilities by computing 95*95*95*95 = 954 = 81,450,625.

How many eight-character passwords are possible?


11.2.5. Improving Space Usage with Watermarking

One observation we can make is that the space needs to hold only enough tasks to keep workers busy. A technique for keeping a space full enough, yet not too full, is called watermarking. The word watermarking comes from the two marks often found on shorelines marking high and low tide. Here we choose a high point and low point, meaning the upper and lower limits on the number of entries we'd like in the space. We fill the space with entries until it reaches the high mark, and then wait until the number of entries falls below the low mark, at which point we start filling it with tasks again.

As an illustration of how watermarking works, assume you have a set of ten workers at your disposal. You might want to set the high and low marks to, say, twelve and twenty respectively. That way you always have at least enough tasks for all workers but at most twenty entries in the space. Tuning watermarks is more of an art than a science. In our case we are going to assume that the high and low marks are set when the computation begins and never change. Often the number of workers varies over time, and the high and low values could be modified adaptively—but we won't get that sophisticated here.

Let's apply the watermarking principle to our generateTasks code. Here is a new version of the generateTasks main loop, updated to use watermarking:

for (;;) {
   waitForLowWaterMark(lowmark);

   while (waterlevel < highmark) {
    if (testWord[1] != salt.charAt(1)) {
       return;
    }
    CryptTask task =
      new CryptTask(triesPerTask, testWord, encrypted);
     System.out.println("Writing task");
     writeTask(task);
     changeWaterLevel(1);
     for (int i = 0; i < triesPerTask; i++) {
        CryptTask.nextWord(testWord);
     }
  }
}

And here are the definitions of a couple helper methods that are used:

synchronized void waitForLowWaterMark(int level) {
  while (waterlevel > level) {
    try {
       wait();
    } catch (InterruptedException e) {
      ; // continue
    }
  }
}

private synchronized void changeWaterLevel(int delta) {
   waterlevel += delta;
   waterMarkLabel.setText("Waterlevel is at " +
     waterlevel + " tasks");
   notifyAll();
}

Our first addition to the main loop is the call to waitForLowWaterMark, which causes the method to wait until the water level is equal to our low watermark (which is specified in the user interface). This is determined by the waterlevel field, which is initially set to zero in its declaration.

Next we've inserted an additional loop, which continues as long as the water level is less than the high watermark. For each task we write into the space we increase the water level by one, until it exceeds the high watermark. As we will see in the next section, as the results are collected this watermark is decremented. Only when the low watermark is reached do we begin adding more tasks.

11.2.6. Collecting Results

Now let's look at how results are collected. The code for collectResults is given below:

public void collectResults() {
  int count = 0;
  Entry template;

  try {
     template = space.snapshot(new CryptResult());
  } catch (RemoteException e) {
     throw new RuntimeException("Can't create a snapshot");
  }

  for (;;) {
     CryptResult result = (CryptResult)takeTask(template);
     if (result != null) {
        count++;
        triedLabel.setText("Tried " +
           (count * triesPerTask) + " words.");
        updatePerformance(count * triesPerTask);
        changeWaterLevel(-1);
        if (result.word != null) {
            String word =
               CryptTask.getPrintableWord(result.word);
            answerLabel.setText("Unencrypted: " +
              word.substring(6));
            addPoison();
            return;
       }
     }
  }
}

In the collectResults method we first declare a local variable count, which will be used to keep a count of the number of tasks that have been computed. We then create a snapshotted version of the CryptResult entry, which will be used as a template to take results from the space. Once again, the result template does not vary when it is used, so snapshotting the entry is better than reserializing the entry each time we call take.

Next we enter the main loop of the method; in this loop we first take a result entry from the space, waiting as long as necessary. Once we have retrieved a result entry, we increment the count variable and update the information display of the user interface by printing the total number of words that have been tried against the encrypted password (computed by count * triesPerTask). We then update the performance area of the information display by calling updatePerformance with the total number of word tries as a parameter. The code for updatePerformance is given as follows:

public void updatePerformance(long wordsTried) {
  long now = System.currentTimeMillis();
  long wordRate = wordsTried / ((now - startTime) / 1000);
  perfLabel.setText(wordRate + " words per second.");
}

First we get the current time and assign it to now. Then we compute the average number words computed per second, which is wordsTried divided by the number of seconds since CryptMaster began running (which is ((now - startTime)/1000)). This gives us an average number of words computed per second. We then set the appropriate label in the information display.

Now, returning to collectResults, after updating the display we decrement the water level, since we know a task has been removed and computed. We then check to see if the result entry we retrieved has the cracked password in it. If result.word is non-null, then we have the password and we print it in the information display.

At this point we are finished, but before we return from collectResults, we first call addPoison, which we describe in the next section.

11.2.7. A Little Poison

The addPoison method allows us to clean up the space, which is needed because, when a valid password is found, the space may be left with task entries that no longer need to be computed. There may also be result entries that need to be cleaned up as well. In the latter case, we can expect leases to take care of removing the results. However, the disadvantage of leaving the task entries around is that the workers will continue to compute them, which would be a waste of resources in a shared compute server environment.

One approach to cleaning up task entries is to have the master collect the remaining tasks until none remain. A better approach is to let the workers clean up the tasks themselves. This can be done using a variant of a barrier (see Chapter 4), which is referred to as a poison pill. Poison pills work like this: When a task's execute method is called by the worker, the task checks to see if a poison pill exists in the space. If one exists, then the task simply returns (and is therefore not computed), but if no pill exists, the task is computed.

To implement a poison pill we first create a poison entry:

public class PoisonPill implements Entry {
  public PoisonPill() {
  }
}

As you can see the PoisonPill is a simple entry, whose only purpose is to act as a barrier in the space.

To make use of the poison pill we also need to alter the execute method of our CryptTask entry:

public Entry execute(JavaSpace space) {
  PoisonPill template = new PoisonPill();
  try {
     if (space.readIfExists(template, null,
        JavaSpace.NO_WAIT) != null)
     {
        return null;
     }
  } catch (Exception e) {
    ; // continue on
  }
  int num = tries.intValue();
  System.out.println("Trying " + getPrintableWord(word));
  for (int i = 0; i < num; i++) {
     if (encrypted.equals(JCrypt.crypt(word))) {
        CryptResult result = new CryptResult(word);
        return result;
     }
  nextWord(word);
  }
  CryptResult result = new CryptResult(null);
  return result;
}

Here we first look for a poison pill in the space using readIfExists. If one exists then we skip the computation and return null. Otherwise, we compute the entry as normal. When the poison pill exists in the space, it has a flushing effect on the remaining task entries: Whenever a worker takes a task and calls execute, a poison pill is found and the method returns immediately, so tasks are removed from the space instead of being computed.

11.2.8. Exploring Crypt

Finally, our application is complete. Now it is time to fire it up and experiment. As with the previous version of the compute server we first start up one or more generic workers and then start up the CryptMaster applet. At this point you should get a feel for how quickly the words are computed using one machine. If you are using a garden variety machine, then you should expect to see thousands of words processed per second.

With that benchmark behind you, begin to experiment with adding more machines. Pay attention to how the word rate increases as processors are added to the computation.

At this point you may want to tweak the “tries per task” parameter. If you collect data across a number of runs, you should see the effect of computation versus communication. As the triespertask parameter is raised, computation plays a larger role in the runtime behavior; as it is lowered, communication plays a larger role.

If you have access to a large number of machines, you will also want to study the speedup gained with each new processor. For a reasonable number of processors and a large enough “tries per task” you should expect to see perfect speedup. Perfect speedup means that if you use n processors you will compute a problem n times as fast as one processor. However as more and more processors are added, the effect may eventually level off, given contention for the space resource.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset