Chapter 8. The Genetic Code

Up to this point we've used Perl to search for motifs, simulate DNA mutations, generate random sequences, and transcribe DNA to RNA. These are all important activities, and they serve as a good introduction to the computational techniques you can use to study biological systems.

In this chapter, we'll write Perl programs to simulate how the genetic code directs the translation of DNA into protein. I will start by introducing the hash datatype. Then, after a brief discussion of how different data structures (like hashes and arrays) and database systems can store and access experimental information, we will write a program to translate DNA to protein. We'll also continue exploring regular expressions and write code to handle FASTA files.

Hashes

There are three main datatypes in Perl. You've already seen two: scalar variables and arrays. Now we'll start to use the third: hashes (also called associative arrays).

A hash provides very fast lookup of the value associated with a key. As an example, say you have a hash called %english_dictionary. (Yes, hashes start with the percent sign.) If you want to look up the definition of the word "recreant," you say:

$definition = $english_dictionary{'recreant'};

The scalar 'recreant' is the key, and the scalar definition that's returned is the value. As you see from this example, hashes (like arrays) change their leading character to a dollar sign when you access a single element, because the value returned from a hash lookup is a scalar value. You can tell a hash lookup from an array element by the kind of brackets it uses: arrays use square brackets [ ]; hashes use curly braces { }.
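Here is a minimal sketch that puts an array element and a hash element side by side, so you can see both the shared dollar-sign rule and the different brackets. The array @words and its contents are made up for illustration; the definition is the one used in this section.

```perl
use strict;
use warnings;

# An array is indexed by position, using square brackets.
my @words = ('recreant', 'asp', 'robin');

# A hash is indexed by key, using curly braces.
my %english_dictionary = ('recreant' => "One who calls out in surrender.");

# Both single-element accesses start with '$' because each returns one scalar.
my $first_word = $words[0];
my $definition = $english_dictionary{'recreant'};

print "$first_word: $definition\n";
```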

If you want to assign a value to a key, it's similarly an easy, single statement:

$english_dictionary{'recreant'} = "One who calls out in surrender.";
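As a small sketch of what that assignment does, the following starts from an empty hash, assigns one key, and then looks up a key that was never assigned. It uses Perl's built-in exists and defined functions; the key 'asp' is just an arbitrary missing word chosen for illustration.

```perl
use strict;
use warnings;

my %english_dictionary;   # starts out empty

# Assigning to a key that is not yet present creates that key.
$english_dictionary{'recreant'} = "One who calls out in surrender.";

# exists reports whether a key is present in the hash.
if (exists $english_dictionary{'recreant'}) {
    print "recreant: $english_dictionary{'recreant'}\n";
}

# Reading a key that was never assigned returns undef (and does not add it).
my $missing = $english_dictionary{'asp'};
print "asp is not defined\n" unless defined $missing;
```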

Also, if you want to initialize a hash with some key-value pairs, it's done much like initializing an array, except that the elements are taken two at a time, with each pair becoming a key and its value:

%classification = (
    'dog',      'mammal',
    'robin',    'bird',
    'asp',      'reptile',
);

which initializes the key 'dog' with the value 'mammal', and so on. There's another way of writing this that shows the key-value relationship more clearly. The following does exactly the same thing as the preceding code:

%classification = (
    'dog'   => 'mammal',
    'robin' => 'bird',
    'asp'   => 'reptile',
);
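To check the claim that the two forms are equivalent, here is a short sketch that builds one hash each way and prints them side by side. The variable names %with_commas and %with_arrows are only for this illustration.

```perl
use strict;
use warnings;

# Plain commas: the list is read two elements at a time, as key then value.
my %with_commas = (
    'dog',   'mammal',
    'robin', 'bird',
    'asp',   'reptile',
);

# Fat commas (=>): the same list, with each pairing written explicitly.
my %with_arrows = (
    'dog'   => 'mammal',
    'robin' => 'bird',
    'asp'   => 'reptile',
);

# Print each key with the value from both hashes; sort fixes the order.
for my $animal (sort keys %with_commas) {
    print "$animal: $with_commas{$animal} / $with_arrows{$animal}\n";
}
```

The => operator also quotes a simple identifier on its left, so dog => 'mammal' would work without the quotes around dog; the example keeps the quotes to match the code above.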

You can get an array of all the keys of a hash:

@keys  = keys %my_hash;

You can get an array of all the values of a hash:

@values  = values %my_hash;
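One point worth seeing in practice: keys and values return their elements in no particular order. A minimal sketch, reusing the %classification hash from above, sorts the lists to get a predictable listing and walks the keys to keep each value paired with its key.

```perl
use strict;
use warnings;

my %classification = (
    'dog'   => 'mammal',
    'robin' => 'bird',
    'asp'   => 'reptile',
);

# keys and values come back in an unspecified order, so sort them
# when you need output that is the same from run to run.
my @keys   = sort keys %classification;
my @values = sort values %classification;

print "keys:   @keys\n";     # prints "keys:   asp dog robin"
print "values: @values\n";   # prints "values: bird mammal reptile"

# Sorting keys and values separately loses the pairing, so to list
# pairs, loop over the keys and look each value up.
for my $key (@keys) {
    print "$key => $classification{$key}\n";
}
```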

You use hashes in lots of different situations, especially when your data comes as key-value pairs or when you need to look up the value for a key quickly. For instance, later in this chapter, we'll develop programs that use hashes to retrieve information about a gene. The gene name is the key; the information about the gene is the value of that key.
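As a rough preview of that idea, here is a sketch of a gene lookup. The gene names and descriptions are hypothetical placeholders, not real data, and this is not the program developed later in the chapter; it only shows the key-to-value lookup pattern with a check for names that aren't stored.

```perl
use strict;
use warnings;

# Hypothetical gene records: the gene name is the key and a short
# description is the value. Both are placeholders for illustration.
my %gene_info = (
    'geneA' => 'hypothetical description of geneA',
    'geneB' => 'hypothetical description of geneB',
);

# Look up each requested name, checking first that it is present.
for my $name ('geneB', 'geneZ') {
    if (exists $gene_info{$name}) {
        print "$name: $gene_info{$name}\n";
    } else {
        print "$name: no information stored\n";
    }
}
```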

The name "hash" comes from something called a hash function, which practically any book on algorithms will define, if you've a mind to look it up. Mathematically, a Perl hash always represents a finite function. Let's skip the details of how they work under the hood and just talk about their behavior.
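To make the "finite function" description concrete without going under the hood, here is a small sketch. Each key maps to exactly one value, different keys may share a value, and the set of keys is finite. The 'bat' entry and the reassigned value 'songbird' are added only for this illustration.

```perl
use strict;
use warnings;

my %classification = (
    'dog'   => 'mammal',
    'robin' => 'bird',
    'bat'   => 'mammal',   # two keys may map to the same value
);

# A key holds only one value: a second assignment replaces the first.
$classification{'robin'} = 'songbird';

# The domain of the function is the finite set of keys.
my $size = scalar(keys %classification);
print "number of keys: $size\n";                     # prints 3
print "robin maps to: $classification{'robin'}\n";   # prints songbird
```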
