418 Intermediate C Programming
// mapping from ASCII to the in d exes of the code book69
int mapping [ N UMLETTER ];70
int ind ;71
for ( ind = 0; ind < NUMLETTER ; ind ++)72
{73
mapping [ ind ] = -1; // initial ized to invalid index74
int ind2 ;75
for ( ind2 = 0; ind2 < numRow ; ind2 ++)76
{77
i f ( codeboo k [ ind2 ][0] == ind )78
{79
mapping [ ind ] = ind2 ;80
}81
}82
}83
compress ( infile , outfile , codebook , mapping );84
If this is the input file:
ENNGGGGGGGTTTTTTTgggggggggggggggggggpppppppppppppppppppppppd
dddddddddddddddddddddddddddddddddddddddddddddhhhhhhhhhhhhhhh
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
This is the compressed file:
1h1d1p1T1E1N01G001g00000
1110100111010111101011110111110111110111110111110111110111110
1111100111001110011100111001110011100111111111111111111111111
1111111111111111111111111111111111111111111111111111110110110
1101101101101101101101101101101101101101101101101101101101101
0101010101010101010101010101010101010101010101010101010101010
1010101010101010101010101010100000000000000000000000000000000
000000000000000000000000000000000000000000
There is only one newline ’n’, which appears after the header. Line breaks are added
so that the very long sequence of 1s and 0s can fit on this page. Please notice that the
header does not include the characters’ frequencies because this information is unnecessary
for decoding.
24.2.6 Compress with Bits
You may have noticed that the “compressed” file is actually longer than the original
file. The original file has 178 bytes and the “compressed” file has 433 bytes. Why? Because
the compressed file uses ’1’ and ’0’ (8-bit characters) to represent single bits. That is a
lot of waste. In order to actually compress the file, only single bits should be used. This
will not reduce the file size to one eighth or the original; however, it should be smaller
than 178 bytes. The compressed file contains more than the bits describing the data. The
compressed file starts with the header that describes the code book. The header includes
the “commands” 1 or 0 to indicate whether the following information is a leaf node or not.
If a command bit is 1, it must be followed by an ASCII character. ASCII codes go from 0
to 127 (this book ignores the characters whose values are greater than 128), and 2
7
= 128,
which means that 7 bits is sufficient to store a character. This allows us to put a command