It’s easy to confuse the topic of encoding and decoding with encryption. The procedures are similar, but the purpose of encryption is to conceal and safeguard information. Encoding is done to transport information that may be too complex for the medium, to translate between different systems, or for other innocuous purposes. Regardless, the process of encoding and decoding has the potential to be action packed and full of intrigue.
Still, back in the early days of computer telecommunications, encoding and decoding were regular occurrences. I remember transferring my first program over a modem: 16 kilobytes that took 16 minutes to transfer. That program consisted of binary data, but it was transported as plain text. It required encoding on the sending end and decoding on the receiving end. Such magic happens today as well, though probably much faster.
To explore the concept of encoding and decoding, regardless of the thrills and dangers, you must:
None of these items is dreary, not like that book on 100 fun and legal home projects you can do with an ironing board. But if you want to know more about encryption, refer to chapter 4.
The computer doesn’t know text. The char data type is merely a tiny integer, ranging in value from 0 through 255 (unsigned) or -128 to 127 (signed). It’s only the presentation of the char data type that makes it look like a character.
In C, the putchar() function outputs a value as a character. The function’s man page declares the function’s argument as an integer, though it appears on the standard output device as a character.
The printf() function is a bit more understanding of characters. It outputs a char data type as a character but only when the %c placeholder is used in the format string. If you substitute %d, the decimal integer output placeholder, the data is output as a number.
But what thing is output? How does the computer know to match a specific value with a given character? The answer comes in the form of the venerable digital acronym, ASCII.
It’s important to note that ASCII is pronounced “ass-key.” That’s right: ass and key. Titter all you like, but if you say, “ask two,” everyone will know you’re a dork.
It’s unimportant to note that ASCII stands for the American Standard Code for Information Interchange. Yes, it’s a standard devised by people who sit around all day having fun creating standards. And though the standard was developed in the early 1960s, it wasn’t until the mid-1980s that pretty much every computer on the planet began using ASCII codes consistently.
By adopting the ASCII standard for assigning codes to characters, computers can exchange basic information without requiring any translation. Before it was widely adopted in the late 1970s, computers had to run translation programs to get even a text file to read properly from one system to the next. But today, a text file on your overpriced Macintosh is easily readable on my cheap-o Linux box that my friend Don built in the back of his shop for $499.
The way ASCII works is to assign codes, integer values, to common characters and symbols. This translation originated from the telegraph era, where the codes had to be consistent for a message to be translated—encoded and decoded—lest the Hole-in-the-Wall Gang rob the 12:10 yet again because old Hamer McCleary was taking a nap at the Belle Fourche station.
ASCII codes are devised in a clever pattern, which is amazing for any group of humans to produce. The pattern allows for all sorts of fun and creative things to happen, as covered in section 5.1.4. Figure 5.1 lists the ASCII code table in its common, four “stick” presentation. See whether you can spy any of the patterns.
From figure 5.1, you see that ASCII codes range from 0 through 127. These are binary values 000-0000 through 111-1111. For the C language char data type, these values are never negative, whether the variable is signed or unsigned.
Each of the four columns, or “sticks,” in the ASCII table (refer to figure 5.1) represents a different category of character types. Again, the codes are organized, probably thanks to lessons learned from earlier, abominable computer character codes that have since been taken out, placed in a dumpster, and set on fire with a jet engine.
The first stick consists of nonprinting control codes, which is why its output looks so dull in figure 5.1. Read more about the control codes in section 5.1.2.
Characters in the second stick in the ASCII table were selected for sorting purposes. The first few characters echo those on a teletype machine, the shifted number keys. These still hold true today for the most part: Shift+1 is the ! (exclamation point), Shift+3 is the # (hash), and so on.
The third stick contains uppercase letters, plus a few symbols.
The fourth stick contains lowercase letters, plus the rest of the symbols.
Miracles and magic surrounding the ASCII table and these codes are covered in the next few sections.
Having an ASCII table handy is vital to any programmer. Rather than sell you my handsome ASCII wall chart on Etsy, I decided that you must code your own ASCII table. Make the output appear exactly as shown in figure 5.1—which happens to be the output from my own ASCII program and looks like the wall chart. I often run my ASCII program as a reference because such information is useful and a program is a quick way to keep it handy, though I’m not making any money on Etsy.
The source code for my solution to this exercise is found in this book’s online repository as asciitable01.c. But please try creating your own before you just ape everything that I did.
I find the first stick of ASCII codes to be the most interesting, from both a historical and hilarious perspective. The control code names are adorable! “End of Text”? Try using that one in a meeting sometime, but just say “Control C” instead. Some people might get it.
“End of Text” is the official name of the Ctrl+C control code, ASCII code 3. Table 5.1 lists the details. Some of the codes or their keyboard equivalents might be familiar to you.