3

Digital principles

3.1 Binary codes

In digital audio, binary numbers express the values of the samples which represent the original analog waveform. Digital audio recording consists of storing such numbers on a suitable medium, where the goal is to reproduce the numbers unchanged. However, if it is required to manipulate the audio waveform, this can be done in the digital domain by changing the sample values. To see how this can be done requires some knowledge of binary codes.

Figure 3.1 shows some binary numbers and their equivalent in decimal. The radix point has the same significance in binary: symbols to the right of it represent one half, one quarter and so on. Binary numbers easily become very long, and writing them by hand is tedious and error-prone. The octal and hexadecimal notations are both used for writing binary since conversion is so simple. Figure 3.1 also shows that a binary number is split into groups of three or four digits starting at the least significant end, and the groups are individually converted to octal or hexadecimal digits. Since sixteen different symbols are required in hex, the letters A–F are used for the numbers above nine.
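As a rough illustration of the grouping process (the ten-bit value below is an arbitrary example, not one of the numbers in Figure 3.1), a few lines of Python show how the same word reads in binary, octal and hex:

```python
# Sketch of binary-to-octal and binary-to-hex conversion by grouping bits.
# The example value is arbitrary; any non-negative integer works.
value = 0b1101011010  # a ten-bit binary number (858 decimal)

print(format(value, 'b'))   # '1101011010'  binary
print(format(value, 'o'))   # '1532'        octal: groups of three bits
print(format(value, 'X'))   # '35A'         hex: groups of four bits

# Grouping explicitly, starting at the least significant end:
def group_bits(n, group):
    digits = []
    while n:
        digits.append(n & ((1 << group) - 1))  # take the bottom 'group' bits
        n >>= group
    return list(reversed(digits)) or [0]

print(group_bits(value, 3))  # octal digit values [1, 5, 3, 2]
print(group_bits(value, 4))  # hex digit values   [3, 5, 10]
```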

The fixed number of bits in a PCM sample determines the extent of the quantizing range. In the sixteen-bit samples commonly used, there are 65 536 different numbers, each representing a different analog signal voltage. Care must be taken during conversion to ensure that the signal does not go outside the convertor range, or it will be clipped. In Figure 3.2 it will be seen that in a sixteen-bit pure binary system, the number range goes from 0000 hex, which represents the smallest voltage, through to FFFF hex, which represents the largest positive voltage. Effectively the zero voltage level of the analog waveform has been shifted so that the positive and negative voltages in a real audio signal may be expressed by binary numbers which are only positive. This approach is called offset binary and unfortunately it is unsuitable for audio signal processing in the digital domain.

Figure 3.3 shows that the level of the signal is measured by how far the waveform deviates from mid-range, around which attenuation, gain and mixing all take place. Digital audio mixing is achieved by adding sample values from two or more different sources, but the correct result will only be obtained if the quantizing intervals are of the same size and there are no offsets. In other cases the binary numbers are not proportional to the signal voltage.

Figure 3.1 (a) Binary and decimal. (b) In octal, groups of three bits make one symbol 0–7. (c) In hex, groups of four bits make one symbol 0–F. Note how much shorter the number is in hex.

image

Figure 3.2 Offset binary coding is simple but causes problems in digital audio processing. It is seldom used.

image

Figure 3.3 Attenuation of an audio signal takes place with respect to midrange.

image

In the two’s complement system, the upper half of the pure binary number range has been redefined to represent negative quantities. If a pure binary counter is constantly incremented and allowed to overflow, it will produce all the numbers in the range permitted by the number of available bits, and these are shown for a four-bit example drawn around the circle in Figure 3.4. As a circle has no real beginning, it is possible to consider it to start wherever it is convenient. In two’s complement, the quantizing range represented by the circle of numbers does not start at zero, but starts on the diametrically opposite side of the circle. Zero is midrange, and all numbers with the MSB (most significant bit) set are considered negative. The MSB is thus the equivalent of a sign bit where 1 = minus. Two’s complement notation differs from pure binary in that the most significant bit is inverted in order to achieve the half-circle rotation.
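The wrap-around behaviour of the number circle can be sketched in a few lines of Python; the four-bit wordlength matches the example of Figure 3.4, and the function names are purely illustrative:

```python
# Two's complement encoding and decoding for a given wordlength.
# Four bits are used to match the -8..+7 example of Figure 3.4.
BITS = 4

def to_twos_complement(value, bits=BITS):
    """Encode a signed integer as an unsigned bit pattern."""
    assert -(1 << (bits - 1)) <= value < (1 << (bits - 1)), "outside number range"
    return value & ((1 << bits) - 1)      # wrap negative values round the circle

def from_twos_complement(code, bits=BITS):
    """Decode an unsigned bit pattern back to a signed integer."""
    if code & (1 << (bits - 1)):          # MSB set means negative
        return code - (1 << bits)
    return code

for v in (-8, -1, 0, 7):
    code = to_twos_complement(v)
    print(v, format(code, '04b'), from_twos_complement(code))
```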

Figure 3.4 In this example of a four-bit two’s complement code, the number range is from −8 to +7. Note that the MSB determines polarity.

image

Figure 3.5 shows how a real ADC is configured to produce two’s complement output. At (a) an analog offset voltage equal to one half the quantizing range is added to the bipolar analog signal in order to make it unipolar as at (b). The ADC produces positive only numbers at (c) which are proportional to the input voltage. The MSB is then inverted at (d) so that the all-zeros code moves to the centre of the quantizing range. The analog offset is often incorporated into the ADC as is the MSB inversion.

The two’s complement system allows two sample values to be added, or mixed in audio parlance, and the result will be referred to the system midrange; this is analogous to adding analog signals in an operational amplifier. Figure 3.6 illustrates how adding two’s complement samples simulates a bipolar mixing process. The waveform of input A is depicted by solid black samples, and that of B by samples with a solid outline. The result of mixing is the linear sum of the two waveforms obtained by adding pairs of sample values. The dashed lines depict the output values. Beneath each set of samples is the calculation which will be seen to give the correct result. Note that the calculations are pure binary. No special arithmetic is needed to handle two’s complement numbers.
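A minimal sketch of this in Python, assuming a hypothetical four-bit wordlength, shows that adding the raw bit patterns and discarding the carry gives the correctly mixed two's complement value:

```python
# Mixing two two's complement samples by ordinary binary addition.
# Any carry out of the MSB is simply discarded (masked off).
BITS = 4
MASK = (1 << BITS) - 1

def mix(code_a, code_b):
    return (code_a + code_b) & MASK       # pure binary add, ignore the carry out

# +3 (0011) mixed with -5 (1011) should give -2 (1110)
a = 0b0011
b = 0b1011
print(format(mix(a, b), '04b'))           # '1110', i.e. -2 in two's complement
```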

Figure 3.5 A two’s complement ADC. At (a) an analog offset voltage equal to one-half the quantizing range is added to the bipolar analog signal in order to make it unipolar as at (b). The ADC produces positive-only numbers at (c), but the MSB is then inverted at (d) to give a two’s complement output.

image

Figure 3.6 Using two’s complement arithmetic, single values from two waveforms are added together with respect to midrange to give a correct mixing function.

image

3.2 Introduction to digital logic

However complex a digital process, it can be broken down into smaller stages until finally one finds that there are really only two basic types of element in use, and these can be combined in some way and supplied with a clock to implement virtually any process. Figure 3.7 shows that the first type is a logic element. This produces an output which is a logical function of the input with minimal delay. The second type is a storage element which samples the state of the input(s) when clocked and holds or delays that state. The strength of binary logic is that the signal has only two states, and considerable noise and distortion of the binary waveform can be tolerated before the state becomes uncertain. At every logic element, the signal is compared with a threshold, and can thus pass through any number of stages without being degraded.

Figure 3.7 Logic elements have a finite propagation delay between input and output and cascading them delays the signal an arbitrary amount. Storage elements sample the input on a clock edge and can return a signal to near coincidence with the system clock. This is known as reclocking. Reclocking eliminates variations in propagation delay in logic elements.

image

In addition, the use of a storage element at regular locations throughout logic circuits eliminates time variations or jitter. Figure 3.7 also shows that if the inputs to a logic element change, the output will not change until the propagation delay of the element has elapsed. However, if the output of the logic element forms the input to a storage element, the output of that element will not change until the input is sampled at the next clock edge. In this way the signal edge is aligned to the system clock and the propagation delay of the logic becomes irrelevant. The process is known as reclocking.

The two states of the signal when measured with an oscilloscope are simply two voltages, usually referred to as high and low. As there are only two states, there can only be true or false meanings. The true state of the signal can be assigned by the designer to either voltage state. When a high voltage represents a true logic condition and a low voltage represents a false condition, the system is known as positive logic, or high true logic. This is the usual system, but sometimes the low voltage represents the true condition and the high voltage represents the false condition. This is known as negative logic or low true logic. Provided that everyone is aware of the logic convention in use, both work equally well.

In logic systems, all logical functions, however complex, can be configured from combinations of a few fundamental logic elements or gates. It is not profitable to spend too much time debating which are the truly fundamental ones, since most can be made from combinations of others. Figure 3.8 shows the important simple gates and their derivatives, and introduces the logical expressions to describe them, which can be compared with the truth-table notation. The figure also shows the important fact that when negative logic is used, the OR gate function interchanges with that of the AND gate.

If numerical quantities need to be conveyed down the two-state signal paths described here, then the only appropriate numbering system is binary, which has only two symbols, 0 and 1. Just as positive or negative logic could be used for the truth of a logical binary signal, it can also be used for a numerical binary signal. Normally, a high voltage level will represent a binary 1 and a low voltage will represent a binary 0, described as a ‘high for a one’ system. Clearly a ‘low for a one’ system is just as feasible. Decimal numbers have several columns, each of which represents a different power of ten; in binary the column position specifies the power of two.

Several binary digits or bits are needed to express the value of a binary audio sample. These bits can be conveyed at the same time by several signals to form a parallel system, which is most convenient inside equipment or for short distances because it is inexpensive, or one at a time down a single signal path, which is more complex, but convenient for cables between pieces of equipment because the connectors require fewer pins. When a binary system is used to convey numbers in this way, it can be called a digital system.

The basic memory element in logic circuits is the latch, which is constructed from two gates as shown in Figure 3.9(a), and which can be set or reset. A more useful variant is the D-type latch shown at (b) which remembers the state of the input at the time a separate clock either changes state for an edge-triggered device, or after it goes false for a level-triggered device. A shift register can be made from a series of latches by connecting the Q output of one latch to the D input of the next and connecting all the clock inputs in parallel. Data are delayed by the number of stages in the register. Shift registers are also useful for converting between serial and parallel data transmissions. Where large numbers of bits are to be stored, cross-coupled latches are less suitable because they are more complicated to fabricate inside integrated circuits than dynamic memory, and consume more current.

Figure 3.8 The basic logic gates compared.

image

In large random access memories (RAMs), the data bits are stored as the presence or absence of charge in a tiny capacitor as shown in Figure 3.9(c). The capacitor is formed by a metal electrode, insulated by a layer of silicon dioxide from a semiconductor substrate, hence the term MOS (metal oxide semiconductor). The charge will suffer leakage, and the value would become indeterminate after a few milliseconds. Where the delay needed is less than this, decay is of no consequence, as data will be read out before they have had a chance to decay. Where longer delays are necessary, such memories must be refreshed periodically by reading the bit value and writing it back to the same place. Most modern MOS RAM chips have suitable circuitry built-in. In large RAMs it is clearly impractical to have a connection to each bit. Instead, the desired bit has to be addressed before it can be read or written. The size of the chip package restricts the number of pins available, so that large memories may use the same address pins more than once. The bits are arranged internally as rows and columns, and the row address and the column address are input sequentially on the same pins.

Figure 3.9 Digital semiconductor memory types. In (a), one data bit can be stored in a simple set-reset latch, which has little application because the D-type latch in (b) can store the state of the single data input when the clock occurs. These devices can be implemented with bipolar transistors or FETs, and are called static memories because they can store indefinitely. They consume a lot of power.

In (c), a bit is stored as the charge in a potential well in the substrate of a chip. It is accessed by connecting the bit line with the field effect from the word line. The single well where the two lines cross can then be written or read. These devices are called dynamic RAMs because the charge decays, and they must be read and rewritten (refreshed) periodically.

image

The circuitry necessary for adding pure binary or two’s complement numbers is shown in Figure 3.10. Addition in binary requires two bits to be taken at a time from the same position in each word, starting at the least significant bit. Should both be ones, the output is zero, and there is a carry-out generated. Such a circuit is called a half adder, shown in Figure 3.10(a) and is suitable for the least significant bit of the calculation. All higher stages will require a circuit which can accept a carry input as well as two data inputs. This is known as a full adder (Figure 3.10(b)). When mixing by adding sample values, care has to be taken to ensure that if the sum of the two sample values exceeds the number range the result will be clipping rather than wraparound.
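The half adder and full adder can be expressed directly as Boolean functions; the following sketch mirrors the structure of Figure 3.10 in spirit (the ripple-carry word adder is an illustrative addition, not part of the figure):

```python
# Gate-level half adder and full adder expressed as Boolean functions.
def half_adder(a, b):
    total = a ^ b          # XOR gives the sum bit
    carry = a & b          # AND gives the carry out
    return total, carry

def full_adder(a, b, carry_in):
    s1, c1 = half_adder(a, b)
    total, c2 = half_adder(s1, carry_in)
    return total, c1 | c2  # carry out if either stage carried

# Ripple-carry addition of two 4-bit words, LSB first.
def add_words(a_bits, b_bits):
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry

print(add_words([1, 0, 1, 0], [1, 1, 0, 0]))  # 5 + 3 = 8 -> ([0, 0, 0, 1], 0)
# When mixing samples, a final carry should trigger clipping rather than wraparound.
```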

Figure 3.10 (a) Half adder; (b) full-adder circuit and truth table.

image

Figure 3.11 Two configurations which are common in processing. In (a) the feedback around the adder adds the previous sum to each input to perform accumulation or digital integration. In (b) an invertor allows the difference between successive inputs to be computed. This is differentiation.

image

A storage element can be combined with an adder to obtain a number of useful functional blocks which will frequently be found in audio equipment. Figure 3.11(a) shows that a latch is connected in a feedback loop around an adder. The latch contents are added to the input each time it is clocked. The configuration is known as an accumulator in computation because it adds up or accumulates values fed into it. In filtering, it is known as a discrete-time integrator. If the input is held at some constant value, the output increases by that amount on each clock. The output is thus a sampled ramp. Figure 3.11(b) shows that the addition of an invertor allows the difference between successive inputs to be obtained. This is digital differentiation. The output is proportional to the slope of the input.
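Both configurations are easily sketched in software (an illustrative Python rendering of Figure 3.11, not a hardware description):

```python
# Digital integration (accumulator) and differentiation using a one-sample delay.
def accumulate(samples):
    """Latch fed back around an adder: output is the running sum."""
    total = 0
    out = []
    for x in samples:
        total += x            # previous sum added to each new input
        out.append(total)
    return out

def differentiate(samples):
    """Adder with one input inverted: output is the difference between successive inputs."""
    previous = 0
    out = []
    for x in samples:
        out.append(x - previous)
        previous = x
    return out

print(accumulate([3, 3, 3, 3]))        # [3, 6, 9, 12]  a sampled ramp
print(differentiate([0, 3, 6, 9]))     # [0, 3, 3, 3]   proportional to the slope
```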

3.3 The computer

The computer is now a vital part of digital audio systems, being used both for control purposes and to process audio signals as data. In control, the computer finds applications in database management, automation, editing, and in electromechanical systems such as tape drives and robotic cassette handling. Once processing speeds advanced sufficiently, computers became able to manipulate digital audio in real time.

The computer is a programmable device in that its operation is not determined by its construction alone, but instead by a series of instructions forming a program. The program is supplied to the computer one instruction at a time so that the desired sequence of events takes place.

Programming of this kind has been used for over a century in electromechanical devices, including automated knitting machines and street organs which are programmed by punched cards. However, the computer differs from these devices in that the program is not fixed, but can be modified by the computer itself. This possibility led to the creation of the term software to suggest a contrast to the constancy of hardware.

Computer instructions are binary numbers each of which is interpreted in a specific way. As these instructions don’t differ from any other kind of data, they can be stored in RAM. The computer can change its own instructions by accessing the RAM. Most types of RAM are volatile, in that they lose data when power is removed. Clearly if a program is entirely stored in this way, the computer will not be able to recover from a power failure. The solution is to store a very simple starting or bootstrap program in non-volatile ROM; this contains instructions which bring in the main program from a storage system such as a disk drive after power is applied. As programs in ROM cannot be altered, they are sometimes referred to as firmware to indicate that they are classified between hardware and software.

Making a computer do useful work requires more than simply a program which performs the required computation. There is also a lot of mundane activity which does not differ significantly from one program to the next. This includes deciding which part of the RAM will be occupied by the program and which by the data, and issuing commands to the disk drive to read the input data from a file and to write back the results. It would be very inefficient if all programs had to handle these processes themselves. Consequently the concept of an operating system was developed. This manages all the mundane decisions and creates an environment in which useful programs or applications can execute.

The ability of the computer to change its own instructions makes it very powerful, but it also makes it vulnerable to abuse. Programs exist which are deliberately written to do damage. These viruses are generally attached to plausible messages or data files and enter computers through storage media or communications paths.

There is also the possibility that programs contain logical errors such that in certain combinations of circumstances the wrong result is obtained. If this results in the unwitting modification of an instruction, the next time that instruction is accessed the computer will crash. In consumer-grade software, written for the vast personal computer market, this kind of thing is unfortunately accepted.

For critical applications, software must be verified. This is a process which can prove that a program can recover from absolutely every combination of circumstances and keep running properly. This is a non-trivial process, because the number of combinations of states a computer can get into is staggering. As a result most software is unverified. It is of the utmost importance that networked computers which can suffer virus infection or computers running unverified software are never used in a life-support or critical application.

Figure 3.12 A simple computer system. All components are linked by a single data/address/ control bus. Although cheap and flexible, such a bus can only make one connection at a time, so it is slow.

image

Figure 3.12 shows a simple computer system. The various parts are linked by a bus which allows binary numbers to be transferred from one place to another. This will generally use tri-state logic so that when one device is sending to another, all other devices present a high impedance to the bus.

The ROM stores the startup program, the RAM stores the operating system, applications programs and the data to be processed. The disk drive stores large quantities of data in a non-volatile form. The RAM only needs to be able to hold part of one program as other parts can be brought from the disk as required. A program executes by fetching one instruction at a time from the RAM to the processor along the bus.

The bus also allows keyboard/mouse inputs and outputs to the display and printer. Inputs and outputs are generally abbreviated to I/O. Finally a programmable timer will be present which acts as a kind of alarm clock for the processor.

The processor or CPU (central processing unit) is the heart of the system. Figure 3.13 shows the data path of a simple CPU. The CPU has a bus interface which allows it to generate bus addresses and input or output data. Sequential instructions are stored in RAM at contiguously increasing locations so that a program can be executed by fetching instructions from a RAM address specified by the program counter (PC) to the instruction register in the CPU. As each instruction is completed, the PC is incremented so that it points to the next instruction. In this way the time taken to execute the instruction can vary.

The processor is notionally divided into data paths and control paths. The CPU contains a number of general-purpose registers or scratchpads which can be used to store partial results in complex calculations. Pairs of these registers can be addressed so that their contents go to the ALU (arithmetic logic unit). This performs various arithmetic (add, subtract, etc.) or logical (and, or, etc.) functions on the input data. The output of the ALU may be routed back to a register or output. By reversing this process it is possible to get data into the registers from the RAM. The ALU also outputs the conditions resulting from the calculation, which can control conditional instructions. Which function the ALU performs and which registers are involved are determined by the instruction currently in the instruction register, which is decoded in the control path. One pass through the ALU can be completed in one cycle of the processor’s clock. Instructions vary in complexity, as does the number of clock cycles needed to complete them. Incoming instructions are decoded and used to access a look-up table which converts them into microinstructions, one of which controls the CPU at each clock cycle.

Figure 3.13 The data path of a simple CPU. Under control of an instruction, the ALU will perform some function on a pair of input values from the registers and store or output the result.

image

3.4 Timebase correction

In Chapter 1 it was stated that a strength of digital technology is the ease with which delay can be provided. Accurate control of delay is the essence of timebase correction, necessary whenever the instantaneous time of arrival or rate from a data source does not match the destination. In digital audio, the destination will almost always require perfectly regular timing, namely the sampling rate clock of the final DAC. Timebase correction consists of aligning jittery signals from storage media or transmission channels with that stable reference.

A further function of timebase correction is to reverse the time compression applied prior to recording or transmission. As was shown in section 1.6, digital recorders compress data into blocks to facilitate editing and error correction as well as to permit head switching between blocks in rotary-head machines. Owing to the spaces between blocks, data arrive in bursts on replay, but must be fed to the output convertors in an unbroken stream at the sampling rate.

In computer hard-disk drives, which are used in digital audio workstations, time compression is also used, but a converse problem also arises. Data from the disk blocks arrive at a reasonably constant rate, but cannot necessarily be accepted at a steady rate by the logic because of contention for the use of buses and memory by the different parts of the system. In this case the data must be buffered by a relative of the timebase corrector which is usually referred to as a silo.

Although delay is easily implemented, it is not possible to advance a data stream. Most real machines cause instabilities balanced about the correct timing: the output jitters between too early and too late. Since the information cannot be advanced in the corrector, only delayed, the solution is to run the machine in advance of real time. In this case, correctly timed output signals will need a nominal delay to align them with reference timing. Early output signals will receive more delay, and late output signals will receive less delay.

Figure 3.14 Most TBCs are implemented as a memory addressed by a counter which periodically overflows to give a ring structure. The memory allows the read and write sides to be asynchronous.

image

Section 3.2 showed the principles of digital storage elements which can be used for delay purposes. The shift-register approach and the RAM approach to delay are very similar, as a shift register can be thought of as a memory whose address increases automatically when clocked. The data rate and the maximum delay determine the capacity of the RAM required. Figure 3.14 shows that the addressing of the RAM is by a counter that overflows endlessly from the end of the memory back to the beginning, giving the memory a ring-like structure. The write address is determined by the incoming data, and the read address is determined by the outgoing data.
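A sketch of such a ring memory, with independent write and read addresses that wrap round at the end of the memory (the buffer size and method names are illustrative only):

```python
# Ring-buffer timebase corrector: jittery writes, regular reads.
class RingBuffer:
    def __init__(self, size):
        self.memory = [0] * size
        self.size = size
        self.write_address = 0
        self.read_address = 0

    def write(self, sample):
        self.memory[self.write_address] = sample
        self.write_address = (self.write_address + 1) % self.size  # wrap round

    def read(self):
        sample = self.memory[self.read_address]
        self.read_address = (self.read_address + 1) % self.size    # wrap round
        return sample

buf = RingBuffer(8)
for s in (10, 20, 30):                    # samples arriving in a burst
    buf.write(s)
print([buf.read() for _ in range(3)])     # read out at a steady rate: [10, 20, 30]
```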

In hard disk systems, the data transfers to and from the disk itself must be at a rate determined by the rotation of the disk. If the data cannot be supplied or accepted at this rate, data will be lost. The solution is to use a relative of the timebase corrector, known as a silo, which is a kind of memory that can provide or accept data from the disk as required, and buffer the timing of that data from the timing of the rest of the system. With the use of a silo, a disk write would not be affected if the computer briefly suspended bus data flow to service an interrupt as the data would come from the silo.

3.5 Multiplexing

Multiplexing is used where several signals are to be transmitted down the same channel. The channel bit rate must be the same as or greater than the sum of the source bit rates. Figure 3.15 shows that when multiplexing is used, the data from each source have to be time compressed. This is done by buffering source data in a memory at the multiplexer. It is written into the memory in real time as it arrives, but will be read from the memory with a clock which has a much higher rate. This means that the readout occurs in a smaller timespan. If, for example, the clock frequency is raised by a factor of ten, the data for a given signal will be transmitted in a tenth of the normal time, leaving time in the multiplex for nine more such signals.

Figure 3.15 Multiplexing requires time compression on each input.

image

In the demultiplexer another buffer memory will be required. Only the data for the selected signal will be written into this memory at the bit rate of the multiplex. When the memory is read at the correct speed, the data will emerge with their original timebase.

In practice it is essential to have mechanisms to identify the separate signals to prevent them being mixed up and to convey the original signal clock frequency to the demultiplexer. In time-division multiplexing the timebase of the transmission is broken into equal slots, one for each signal. This makes it easy for the demultiplexer, but forces a rigid structure on all the signals such that they must all be locked to one another and have an unchanging bit rate. Packet multiplexing overcomes these limitations.

The multiplexer must switch between different time-compressed signals to create the bitstream and this is much easier to organize if each signal is in the form of data packets of constant size. Figure 3.16 shows a packet multiplexing system.

Each packet consists of two components: the header, which identifies the packet, and the payload, which is the data to be transmitted. The header will contain at least an identification code (ID) which is unique for each signal in the multiplex. The demultiplexer checks the ID codes of all incoming packets and discards those which do not have the wanted ID.
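In outline a demultiplexer of this kind does little more than compare IDs; the following sketch assumes a hypothetical packet format, since real multiplexes such as MPEG transport streams define their own headers:

```python
# Minimal packet demultiplexer: keep payloads whose header carries the wanted ID.
packets = [
    {"id": 0x21, "payload": b"audio-1"},
    {"id": 0x44, "payload": b"audio-2"},
    {"id": 0x21, "payload": b"audio-1 more"},
    {"id": 0x1FFF, "payload": b""},       # stuffing packet with an unrecognized ID
]

def demultiplex(packets, wanted_id):
    return [p["payload"] for p in packets if p["id"] == wanted_id]

print(demultiplex(packets, 0x21))   # [b'audio-1', b'audio-1 more']
```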

Figure 3.16 Packet multiplexing relies on headers to identify the packets.

image

Packet multiplexing has advantages over time-division multiplexing because it does not set the bit rate of each signal. A demultiplexer simply checks packet IDs and selects all packets with the wanted code. It will do this however frequently such packets arrive. Consequently it is practicable to have variable bit rate signals in a packet multiplex. The multiplexer has to ensure that the total bit rate does not exceed the rate of the channel, but that rate can be allocated arbitrarily between the various signals.

As a practical matter it is usually necessary to keep the bit rate of the multiplex constant. With variable rate inputs this is done by creating null packets which are generally called stuffing or packing. The headers of these packets contain a unique ID which the demultiplexer does not recognize, and so these packets are discarded on arrival.

In an MPEG environment, statistical multiplexing can be extremely useful because it allows for the varying difficulty of real program material. In a multiplex of several digital radio channels, it is unlikely that all the programs will encounter difficult material simultaneously. When one program encounters complex program material, more data rate can be allocated at the allowable expense of the remaining programs which are handling easy material.

3.6 Gain control

When making a digital recording, the gain of the analog input will usually be adjusted so that the quantizing range is fully exercised in order to make a recording of maximum signal-to-noise ratio. During post-production, the recording may be played back and mixed with other signals, and the desired effect can only be achieved if the level of each can be controlled independently. Gain is controlled in the digital domain by multiplying each sample value by a coefficient. If that coefficient is less than one, attenuation will result; if it is greater than one, amplification can be obtained, provided that the resultant larger sample values can be handled without clipping.
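In the digital domain this is a single multiplication per sample, plus a clipping check when the coefficient exceeds one; a sketch assuming 16-bit two's complement samples:

```python
# Gain control by multiplying each sample by a coefficient, with clipping.
MAX_16 = 32767
MIN_16 = -32768

def apply_gain(samples, coefficient):
    out = []
    for x in samples:
        y = int(round(x * coefficient))
        y = max(MIN_16, min(MAX_16, y))   # clip rather than wrap if the range is exceeded
        out.append(y)
    return out

print(apply_gain([1000, -2000, 30000], 0.5))   # attenuation
print(apply_gain([1000, -2000, 30000], 2.0))   # gain; 30000 clips to 32767
```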

Figure 3.17 Structure of fast multiplier. The input A is multiplied by 1, 2, 4, 8, etc. by bit shifting. The digits of the B input then determine which multiples of A should be added together by enabling AND gates between the shifters and the adder. For long wordlengths, the number of gates required becomes enormous, and the device is best implemented in a chip.

image

Multiplication in binary circuits is difficult. It can be performed by repeated adding, but this is too slow to be of any use. In fast multiplication, one of the inputs will be simultaneously multiplied by one, two, four, etc., by hard-wired bit shifting. Figure 3.17 shows that the other input bits will determine which of these powers will be added to produce the final sum, and which will be neglected. If multiplying by five, the process is the same as multiplying by four, multiplying by one, and adding the two products. This is achieved by adding the input to itself shifted two places. As the wordlength of such a device increases, the complexity increases rapidly, so this is a natural application for an integrated circuit.
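The shift-and-add principle can be sketched as follows; a real fast multiplier forms all the partial products in parallel hardware, whereas this illustrative loop takes them one bit at a time:

```python
# Multiply A by B using shifted copies of A selected by the bits of B.
def shift_add_multiply(a, b):
    product = 0
    shift = 0
    while b:
        if b & 1:                 # this bit of B enables the shifted copy of A
            product += a << shift
        b >>= 1
        shift += 1
    return product

print(shift_add_multiply(23, 5))   # 23*4 + 23*1 = 115
```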

3.7 Digital faders and controls

In a digital mixer, the gain coefficients will originate in hand-operated faders, just as in analog. Analog mixers having automated mixdown employ a system similar to the one shown in Figure 3.18. Here, the faders produce a varying voltage and this is converted to a digital code or gain coefficient in an ADC and recorded alongside the audio tracks. On replay the coefficients are converted back to analog voltages which control VCAs (voltage-controlled amplifiers) in series with the analog audio channels. A digital mixer has a similar structure, and the coefficients can be obtained in the same way. However, on replay, the coefficients are not converted back to analog, but remain in the digital domain and control multipliers in the digital audio channels directly. As the coefficients are digital, it is so easy to add automation to a digital mixer that there is not much point in building one without.

Figure 3.18 The automated mixdown system of an audio console digitizes fader positions for storage and uses the coefficients later to drive VCAs via convertors.

image

Gain coefficients can be obtained by digitizing the output of an analog fader, or directly in a digital fader. This is a form of displacement transducer in which the mechanical position of the knob is converted directly to a digital code. In practical equipment, the position of other controls, such as for equalizers or scrub wheels, will also need to be digitized. Controls can be linear or rotary, and absolute or relative. In an absolute control, the position of the knob determines the output directly. These are inconvenient in automated systems because unless the knob is motorized, the operator cannot see the setting the automation system has selected. In a relative control, the knob can be moved to increase or decrease the output, but its absolute position is meaningless. The absolute setting is displayed on a bar LED nearby. In a rotary control, the bar LED may take the form of a ring of LEDs around the control. The automation system setting can be seen on the display and no motor is needed. In a relative linear fader, the control may take the form of an endless ridged belt like a caterpillar track. If this is transparent, the bar LED may be seen through it.

Figure 3.19 An absolute linear fader uses a number of light beams which are interrupted in various combinations according to the position of a grating. A Gray code shown in Figure 3.20 must be used to prevent false codes.

image

Figure 3.20 (a) Binary cannot be used for position encoders because mechanical tolerances cause false codes to be produced. (b) In Gray code, only one bit (arrowed) changes in between positions, so no false codes can be generated.

image

Figure 3.19 shows an absolute linear fader. A grating is moved with respect to several light beams, one for each bit of the coefficient required. The interruption of the beams by the grating determines which photocells are illuminated. It is not possible to use a pure binary pattern on the grating because this results in transient false codes due to mechanical tolerances. Figure 3.20 shows some examples of these false codes. For example, on moving the fader from 3 to 4, the MSB goes true slightly before the middle bit goes false. This results in a momentary value of 4 + 2 = 6 between 3 and 4. The solution is to use a code in which only one bit ever changes in going from one value to the next. One such code is the Gray code, which was devised to overcome timing hazards in relay logic but is now used extensively in position encoders. Gray code can be converted to binary in a ROM, a gate array or by software.
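Conversion in both directions is simple enough to show directly; this is the standard reflected-binary construction, given here as a sketch rather than a transcription of the figures:

```python
# Reflected-binary (Gray) code conversion.
def binary_to_gray(n):
    return n ^ (n >> 1)

def gray_to_binary(g):
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

for value in range(8):
    gray = binary_to_gray(value)
    print(value, format(gray, '03b'), gray_to_binary(gray))
# Successive Gray codes differ in exactly one bit, so no false
# intermediate codes can appear as the fader grating moves.
```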

Figure 3.21 The fixed and rotating gratings produce moiré fringes which are detected by two light paths as quadrature sinusoids. The relative phase determines the direction, and the frequency is proportional to speed of rotation.

image

Figure 3.21 shows a rotary incremental encoder. This produces a sequence of pulses whose number is proportional to the angle through which it has been turned. The rotor carries a radial grating over its entire perimeter. This turns over a second fixed radial grating whose bars are not parallel to those of the first grating. The resultant moiré fringes travel inward or outward depending on the direction of rotation. Two suitably positioned light beams falling on photocells will produce outputs in quadrature. The relative phase determines the direction and the frequency is proportional to speed. The encoder outputs can be connected to a counter whose contents will increase or decrease according to the direction the rotor is turned. The counter provides the coefficient output and drives the display.
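A sketch of the counting process, assuming the photocell outputs have already been squared up into logic levels (the direction convention chosen here is arbitrary):

```python
# Incremental (quadrature) encoder decoding: count up or down according to
# the order in which the two photocell outputs A and B change.
def decode_quadrature(states):
    """states is a sequence of (A, B) pairs sampled fast enough to see every step."""
    count = 0
    prev_a, prev_b = states[0]
    for a, b in states[1:]:
        if a != prev_a or b != prev_b:
            # Direction is given by comparing the new A with the previous B.
            count += 1 if a ^ prev_b else -1
            prev_a, prev_b = a, b
    return count

clockwise = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(decode_quadrature(clockwise))                   # +4
print(decode_quadrature(list(reversed(clockwise))))   # -4
```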

For audio use, a logarithmic characteristic is required in gain control. Linear coefficients can conveniently be rendered logarithmic in a ROM or by software.

3.8 A digital mixer

The signal path of a simple digital mixer is shown in Figure 3.22. The two inputs are multiplied by their respective coefficients, and added together in two’s complement to achieve the mix as was shown in Figure 3.6. Peak limiting will be required. The sampling rate of the inputs must be exactly the same, and in the same phase, or the circuit will not be able to add on a sample-by-sample basis. If the inputs have come from different sources, those sources must be synchronized by the same master clock, and/or timebase correction must be provided on the inputs. Synchronization of audio sources follows the principle long established in video in which a reference signal is fed to all devices which then slave or genlock to it.

Some thought must be given to the wordlength of the system. If a sample is attenuated, it will develop bits which are below the radix point. For example, if an eight-bit sample is attenuated by 24 dB, the sample value will be shifted four places down. Extra bits must be available within the mixer to accommodate this shift. Digital mixers can have an internal wordlength of up to 32 bits. When several attenuated sources are added together to produce the final mix, the result will be a stream of 32-bit or longer samples. As the output will generally need to be of the same format as the input, the wordlength must be shortened. This must be done very carefully, as it is a form of quantizing and will require dithering. The necessary techniques will be treated in Chapter 4.

In practice a digital mixer would not have one multiplier for every input. Figure 3.22 also shows that a more economical system results when a timeshared bus system is used with only one multiplier followed by an accumulator. In one sample period, each of the input samples is fed in turn to the lower input of the multiplier at the same time as the corresponding coefficient is fed to the upper input. The products from the multiplier are accumulated during the sample period, so that at the end of the sample period, the accumulator holds the sum of all the products, which is the digitally mixed sample. The process then repeats for the next sample period. To facilitate the sharing of common circuits by many signals, tri-state logic devices can be used. The outputs of such devices can be wired in parallel, and the state of the parallel connection will be the state of the device whose output is enabled. Clearly only one output can be enabled at a time, and this will be ensured by a sequencer circuit connected to all the device enables. In digital signal processing (DSP), the processes shown above can be simulated in software.
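In software the time-shared multiplier/accumulator reduces to a multiply-accumulate loop over the inputs within each sample period; a sketch of the operation shown in Figure 3.22:

```python
# One multiplier/accumulator shared between several inputs in one sample period.
def mix_sample(input_samples, coefficients):
    accumulator = 0
    for sample, coeff in zip(input_samples, coefficients):
        accumulator += sample * coeff    # one multiply-accumulate per input
    return accumulator

# Four inputs mixed with individual gain coefficients during one sample period.
print(mix_sample([100, -50, 200, 0], [0.5, 1.0, 0.25, 0.8]))   # 50.0
```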

Figure 3.22 One multiplier/accumulator can be time shared between several signals by operating at a multiple of sampling rate. In this example, four multiplications are performed during one sample period.

image

In analog audio mixers, the controls have to be positioned close to the circuitry for performance reasons; thus one control knob is needed for every variable, and the control panel is physically large. Remote control is difficult with such construction. The order in which the signal passes through the various stages of the mixer is determined at the time of design, and any changes are difficult.

In a digital mixer,1,2 all the filters are controlled by simply changing the coefficients, and remote control is easy. Since control is by digital parameters, it is possible to use assignable controls, such that there need only be one set of filter and equalizer controls, whose setting is conveyed to any channel chosen by the operator.3 The use of digital processing allows the console to include a video display of the settings.

Since the audio processing in a digital mixer is by program control, the configuration of the desk can be changed at will by running the programs for the various functions in a different order. The operator can configure the desk to his own requirements by entering symbols on a block diagram on the video display, for example. The configuration and the setting of all the controls can be stored in memory or for a longer term, on disk, and recalled instantly. Such a desk can be in almost constant use, because it can be put back exactly to a known state easily after someone else has used it.

A further advantage of working in the digital domain is that delay can be controlled individually in the audio channels.4 This allows for the time of arrival of wavefronts at various microphones to be compensated despite their physical position.

Figure 3.23 shows a typical digital mixer installation.3 The analog microphone inputs are from remote units containing ADCs so that the length of analog cabling can be kept short. The input units communicate with the signal processor using digital fibre-optic links.

The sampling rate of a typical digital audio signal is low compared to the speed at which typical logic gates can operate. It is sensible to minimize the quantity of hardware necessary by making each perform many functions in one sampling period. Although general-purpose computers can be programmed to process digital audio, they are not ideal for the purpose. This has resulted in the development of specialized digital audio signal processors, almost always called DSP.5–7 These units are implemented with more internal registers than data processors to facilitate multi-point filter algorithms. The arithmetic unit will be designed to offer high-speed multiply/accumulate using techniques such as pipelining, which allows operations to overlap.8 The functions of the register set and the arithmetic unit are controlled by a microsequencer.

Figure 3.23 Digital mixer installation. The convenience of digital transmission without degradation allows the control panel to be physically remote from the processor.

image

External control of a DSP will generally be by a smaller processor, often in the operator’s console, which passes coefficients to the DSP as the operator moves the controls. In large systems, it is possible for several different consoles to control different sections of the DSP.9

The steadily improving economics of digital logic means that the true cost to the user of digital mixing consoles continues to fall, despite the inclusion of an increasing number of features. The traditional analog console is therefore under threat, except possibly at the highest quality level where equipment economics are not so dominant.

3.9 Filters

One of the most important processes in digital audio is filtering, and its parallel topic of transforms. Filters and transforms are relevant to sampling, conversion, recording, transmission and compression systems. Filtering is unavoidable. Sometimes a process has a filtering effect which is undesirable, for example the limited frequency response of a microphone, and we try to minimize it. On other occasions a filtering effect is specifically required. Filters are required in ADCs, DACs, in the data channels of digital recorders and transmission systems, in compression systems and in DSP.

Figure 3.24 shows that impulse response testing tells a great deal about a filter. In a perfect filter, all frequencies should experience the same time delay. If some groups of frequencies experience a different delay from others, there is a group-delay error. As an impulse contains an infinite spectrum, a filter suffering from group-delay error will separate the different frequencies of an impulse along the time axis.

A pure delay will cause a phase shift proportional to frequency, and a filter with this characteristic is said to be phase-linear. The impulse response of a phase-linear filter is symmetrical. If a filter suffers from group-delay error it cannot be phase-linear. It is almost impossible to make a perfectly phase-linear analog filter, and many filters have a group-delay equalization stage following them which is often as complex as the filter itself. In the digital domain it is straightforward to make a phase-linear filter, and phase equalization becomes unnecessary.

Because of the sampled nature of the signal, whatever the response at low frequencies may be, all PCM channels act as low-pass filters because they cannot contain frequencies above the Nyquist limit of half the sampling frequency.

Figure 3.24 Group delay time-displaces signals as a function of frequency.

image

Transforms are a useful subject because they can help either to understand processes which cause undesirable filtering or to design filters. The information itself may be subject to a transform, especially in compression schemes. Transforming converts the information into another analog. The information is still there, but expressed with respect to temporal or spatial frequency rather than time or space. Instead of binary numbers representing the magnitude of samples, there are binary numbers representing the magnitude of frequency coefficients. What happens in the frequency domain must always be consistent with what happens in the time or space domains. Every combination of frequency and phase response has a corresponding impulse response in the time domain.

Figure 3.25 If a signal having a given spectrum is passed into a filter, multiplying the two spectra will give the output spectrum at (a). Equally transforming the filter frequency response will yield the impulse response of the filter. If this is convolved with the time domain waveform, the result will be the output waveform, whose transform is the output spectrum (b).

image

Figure 3.25 shows the relationship between the domains. On the left is the frequency domain. Here an input signal having a given spectrum is input to a filter having a given frequency response. The output spectrum will be the product of the two functions. If the functions are expressed logarithmically in decibels, the product can be obtained by simple addition.

On the right, the time-domain output waveform represents the convolution of the impulse response with the input waveform. However, if the frequency transform of the output waveform is taken, it must be the same as the result obtained from the frequency response and the input spectrum. This is a useful result because it means that when audio sampling is considered, it will be possible to explain the process in both domains.

When a waveform is input to a system, the output waveform will be the convolution of the input waveform and the impulse response of the system. Convolution can be followed by reference to a graphic example shown in Figure 3.26. Where the impulse response is asymmetrical, the decaying tail occurs after the input. To obtain the correct result it is necessary to reverse the impulse response in time so that it is mirrored prior to sweeping it through the input waveform. If the impulse response is symmetrical, as would be the case with a linear phase filter, the mirroring process is superfluous.

Figure 3.26 In time discrete convolution, the mirrored impulse response is stepped through the input one sample period at a time. At each step, the sum of the cross-products is used to form an output value. As the input in this example is a constant-height pulse, the output is simply proportional to the sum of the coincident impulse response samples.

image

In the continuous domain, the output voltage is proportional to area where the two impulses overlap. However, in the sampled, or discrete time domain, both the impulse and the input are a set of discrete samples which clearly must have the same sample spacing. The impulse response only has value where impulses coincide. Elsewhere it is zero. The impulse response is therefore stepped through the input one sample period at a time. At each step, the area is still proportional to the output, but as the time steps are of uniform width, the area is proportional to the impulse height and so the output is obtained by adding up the lengths of overlap.
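Discrete-time convolution can be written down directly; the following sketch reproduces the stepping process of Figure 3.26 with an arbitrary short impulse response:

```python
# Discrete convolution: step the (mirrored) impulse response through the input,
# summing the cross-products at each step.
def convolve(signal, impulse_response):
    out_length = len(signal) + len(impulse_response) - 1
    out = [0.0] * out_length
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h   # each input sample launches a scaled impulse response
    return out

# A constant-height pulse convolved with a short impulse response.
print(convolve([1, 1, 1, 1], [0.25, 0.5, 0.25]))
# [0.25, 0.75, 1.0, 1.0, 0.75, 0.25]
```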

Figure 3.27 An FIR filter (a) responds only to an input, whereas the output of an IIR filter (b) continues indefinitely rather like a decaying echo.

image

Filters can be described in two main classes, as shown in Figure 3.27, according to the nature of the impulse response. Finite-impulse response (FIR) filters are always stable and, as their name suggests, respond to an impulse once, as they have only a forward path. In the temporal domain, the time for which the filter responds to an input is finite, fixed and readily established. The same is therefore true about the distance over which a FIR filter responds in the spatial domain. FIR filters can be made perfectly phase-linear if a significant processing delay is accepted. Most filters used for sampling rate conversion and oversampling fall into this category.

Infinite-impulse response (IIR) filters respond to an impulse indefinitely and are not necessarily stable, as they have a return path from the output to the input. For this reason they are also called recursive filters. As the impulse response is not symmetrical, IIR filters are not phase-linear. Audio equalizers often employ recursive filters.

3.10 FIR filters

A FIR filter performs convolution of the input waveform with its own impulse response. It does this by graphically constructing the impulse response for every input sample and superimposing all these responses. It is first necessary to establish the correct impulse response. Figure 3.28(a) shows an example of a lowpass filter which cuts off at one quarter of the sampling rate. The impulse response of an ideal low-pass filter is a sinx/x curve where the time between the two central zero crossings is the reciprocal of the cut-off frequency. According to the mathematical model, the waveform has always existed and carries on for ever.

The peak value of the output coincides with the input impulse. This means that the filter cannot be causal, because the output has changed before the input is known. Thus in all practical applications it is necessary to truncate the extreme ends of the impulse response, which causes an aperture effect, and to introduce a time delay in the filter equal to half the duration of the truncated impulse in order to make the filter causal. As an input impulse is shifted through the series of registers in Figure 3.28(b), the impulse response is created, because at each point it is multiplied by a coefficient as in (c).

These coefficients are simply the result of sampling and quantizing the desired impulse response. Clearly the sampling rate used to sample the impulse must be the same as the sampling rate for which the filter is being designed. In practice the coefficients are calculated, rather than attempting to sample an actual impulse response. The coefficient wordlength will be a compromise between cost and performance. Because the input sample shifts across the system registers to create the shape of the impulse response, the configuration is also known as a transversal filter. In operation with real sample streams, there will be several consecutive sample values in the filter registers at any time in order to convolve the input with the impulse response.

Simply truncating the impulse response causes an abrupt transition between input samples which matter and those which do not. Truncating the filter superimposes a rectangular shape on the time-domain impulse response. In the frequency domain the rectangular shape transforms to a sinx/x characteristic which is superimposed on the desired frequency response as a ripple. One consequence of this is known as Gibbs’ phenomenon; a tendency for the response to peak just before the cut-off frequency.10,11 As a result, the length of the impulse which must be considered will depend not only on the frequency response, but also on the amount of ripple which can be tolerated. If the relevant period of the impulse is measured in sample periods, the result will be the number of points or multiplications needed in the filter. Figure 3.29 compares the performance of filters with different numbers of points. A high-quality digital audio FIR filter may need in excess of 100 points. Rather than simply truncate the impulse response in time, it is better to make a smooth transition from samples which do not count to those that do. This can be done by multiplying the coefficients in the filter by a window function which peaks in the centre of the impulse.
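A sketch of how windowed-sinc coefficients might be generated for a low-pass FIR filter; the number of points and cut-off are arbitrary, and a Hamming window stands in for whatever window a real design would choose:

```python
import math

# Windowed-sinc coefficients for an FIR low-pass filter.
def lowpass_coefficients(num_points, cutoff):
    """cutoff is a fraction of the sampling rate, e.g. 0.25 for Fs/4."""
    centre = (num_points - 1) / 2
    coeffs = []
    for n in range(num_points):
        x = n - centre
        # sinx/x impulse response of the ideal low-pass filter
        sinc = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window smooths the truncation and reduces ripple
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_points - 1))
        coeffs.append(sinc * window)
    gain = sum(coeffs)
    return [c / gain for c in coeffs]     # normalize for unity gain at DC

coeffs = lowpass_coefficients(21, 0.25)
print(len(coeffs), round(sum(coeffs), 6))  # 21 points, DC gain 1.0
```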

If the coefficients are not quantized finely enough, it will be as if they had been calculated inaccurately, and the performance of the filter will be less than expected. Figure 3.30 shows an example of quantizing coefficients. Conversely, raising the wordlength of the coefficients increases cost.

The FIR structure is inherently phase-linear because it is easy to make the impulse response absolutely symmetrical. The individual samples in a digital system do not know in isolation what frequency they represent, and they can only pass through the filter at a rate determined by the clock. Because of this inherent phase-linearity, a FIR filter can be designed for a specific impulse response, and the frequency response will follow.

Figure 3.28 (a) The impulse response of an LPF is a sinx/x curve which stretches from −∞ to +∞ in time. The ends of the response must be neglected, and a delay introduced to make the filter causal.

image

Figure 3.28 (b) The structure of an FIR LPF. Input samples shift across the register and at each point are multiplied by different coefficients.

image

Figure 3.28 (c) When a single unit sample shifts across the circuit of Figure 3.28(b), the impulse response is created at the output as the impulse is multiplied by each coefficient in turn.

image

Figure 3.29 The truncation of the impulse in an FIR filter caused by the use of a finite number of points (N) results in ripple in the response. Shown here are three different numbers of points for the same impulse response. The filter is an LPF which rolls off at 0.4 of the fundamental interval. (Courtesy Philips Technical Review)

image

Figure 3.30 Frequency response of a 49-point transversal filter with infinite precision (solid line) shows ripple due to finite window size. Quantizing coefficients to twelve bits reduces attenuation in the stopband. (Responses courtesy Philips Technical Review)

image

The frequency response of the filter can be changed at will by changing the coefficients. A programmable filter only requires a series of ROMs to supply the coefficients; the address supplied to the ROMs will select the response. The frequency response of a digital filter will also change if the clock rate is changed, so it is often less ambiguous to specify a frequency of interest in a digital filter in terms of a fraction of the fundamental interval rather than in absolute terms.

3.11 Sampling-rate conversion

Sampling-rate conversion is an important enabling technology on which a large number of practical devices are based. There are numerous standard sampling rates for audio and it may be necessary to convert between them. In some low-bit rate applications such as Internet audio, the sampling rate may deliberately be reduced. To take advantage of oversampling convertors, an increase in sampling rate is necessary for DACs and a reduction in sampling rate is necessary for ADCs. In oversampling the factors by which the rates are changed are usually simpler than in other applications.

There are three basic but related categories of rate conversion, as shown in Figure 3.31. The most straightforward (a) changes the rate by an integer ratio, up or down. The timing of the system is thus simplified because all samples (input and output) are present on edges of the higher-rate sampling clock. Such a system is generally adopted for oversampling convertors; the exact sampling rate immediately adjacent to the analog domain is not critical, and will be chosen to make the filters easier to implement.

Figure 3.31 Categories of rate conversion. (a) Integer-ratio conversion, where the lower-rate samples are always coincident with those of the higher rate. There are a small number of phases needed. (b) Fractional-ratio conversion, where sample coincidence is periodic. A larger number of phases are required. Example here is conversion from 50.4 kHz to 44.1 kHz (8/7). (c) Variable-ratio conversion, where there is no fixed relationship, and a large number of phases are required.

image

Next in order of difficulty is the category shown at (b) where the rate is changed by the ratio of two small integers. Samples in the input periodically time-align with the output.

The most complex rate-conversion category is where there is no simple relationship between input and output sampling rates, and in fact they may vary. This situation, shown at (c), is known as variable-ratio conversion. The temporal or spatial relationship of input and output samples is arbitrary.

The technique of integer-ratio conversion is used in conjunction with oversampling convertors and in compression systems where sub-sampled versions of an audio waveform may be required.

Rate convertors incorporate two steps. The first is to control the system bandwidth: if the sampling rate is to be reduced, the bandwidth of the input signal must also be reduced to prevent aliasing. This stage is not required if the rate is to be increased. The second is the interpolation stage, which represents the original waveform by samples at new locations. Performing these steps separately is inefficient; combining the two processes into a single interpolating filter minimizes the amount of computation.
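As a rough illustration of integer-ratio rate reduction, the following sketch (my own, with an assumed windowed-sinc anti-alias filter) band-limits the input to the new Nyquist frequency and then keeps every nth sample; a practical decimating filter would combine the two steps and compute only the samples that are kept.

```python
# Hedged example: integer-ratio (n:1) sampling-rate reduction.
import numpy as np

def sinc_lpf(cutoff, taps=63):
    """Windowed sinx/x low-pass; cutoff is a fraction of the input sampling rate."""
    k = np.arange(taps) - (taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * k) * np.hamming(taps)
    return h / h.sum()

def decimate(x, n):
    """Band-limit to the new Nyquist frequency, then keep every nth sample."""
    y = np.convolve(x, sinc_lpf(0.5 / n), mode='same')   # prevents aliasing
    return y[::n]                                        # rate reduced by factor n

x = np.sin(2 * np.pi * 0.01 * np.arange(400))
y = decimate(x, 4)    # e.g. a 4:1 reduction ahead of further processing
```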

As the purpose of the system is purely to change the sampling rate, the filter must be as transparent as possible, and this suggests the use of an FIR structure displaying linear phase. The theoretical impulse response of such a filter is a sinx/x curve which has zero value at the position of adjacent input samples. In practice this impulse cannot be implemented because it is infinite. The impulse response used will be truncated and windowed as described earlier. To simplify this discussion, assume that a sinx/x impulse is to be used. There is a strong parallel with the operation of a DAC where the analog voltage is returned to the time-continuous state by summing the analog impulses due to each sample. In a digital interpolating filter, this process is duplicated.12

If the sampling rate is to be doubled, new samples must be interpolated exactly half-way between existing samples. The necessary impulse response is shown in Figure 3.32; it can be sampled at the output sample period and quantized to form coefficients. If a single input sample is multiplied by each of these coefficients in turn, the impulse response of that sample at the new sampling rate will be obtained. Note that every other coefficient is zero, which confirms that no computation is necessary on the existing samples; they are just transferred to the output. The intermediate sample is then computed by adding together the impulse responses of every input sample in the window. The figure shows how this mechanism operates. If the sampling rate is to be increased by a factor of four, three sample values must be interpolated between existing input samples. It is then necessary to sample the impulse response at one-quarter the period of input samples to obtain three sets of coefficients which will be used in turn. In hardware-implemented filters, the input sample which is passed straight to the output is transferred by using a fourth filter phase where all coefficients are zero except the central one, which is unity.
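A small Python sketch of this mechanism is given below; the window length, the Hamming weighting and the function name are my own choices, but the structure follows the description above: existing samples pass straight through, and each intermediate sample is the sum of sinx/x contributions from the surrounding input samples.

```python
# Sketch of a two-times oversampling interpolator (cf. Figure 3.32).
import numpy as np

def interpolate_2x(x, points=16):
    """Double the sampling rate of block x."""
    half = points // 2
    # sinx/x impulse sampled half-way between input samples, truncated and windowed
    offsets = np.arange(points) - half + 0.5
    coeffs = np.sinc(offsets) * np.hamming(points)
    coeffs /= coeffs.sum()                       # unity gain for a DC input
    xp = np.pad(x, (half, half))                 # zero-pad the ends of the block
    y = np.empty(2 * len(x))
    y[0::2] = x                                  # existing samples pass straight through
    for k in range(len(x)):
        window = xp[k + 1:k + 1 + points]        # input samples around the point k + 0.5
        y[2 * k + 1] = window @ coeffs           # sum of sinx/x contributions
    return y

y = interpolate_2x(np.sin(2 * np.pi * np.arange(64) / 16))   # output rate is doubled
```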

Fractional-ratio conversion allows interchange between the various standard audio sampling rates, which are generally related by simple fractions. Figure 3.31 showed that when the two sampling rates have a simple fractional relationship m/n, there is a periodicity in the relationship between samples in the two streams. It is possible to have a system clock running at the least-common-multiple frequency which will divide by different integers to give each sampling rate.13
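The sketch below (an assumption-laden illustration, not a production design) performs an m/n conversion in the brute-force way: raise the rate by the integer factor m, filter to whichever Nyquist limit is lower, and keep every nth sample. A polyphase implementation would compute only the retained outputs.

```python
# Illustrative m/n fractional-ratio converter.
import numpy as np

def fractional_convert(x, m, n, taps=127):
    up = np.zeros(len(x) * m)
    up[::m] = x * m                              # zero-stuff; gain restored by factor m
    cutoff = 0.5 / max(m, n)                     # lower of the two Nyquist limits,
    k = np.arange(taps) - (taps - 1) / 2         # normalized to the raised rate
    h = 2 * cutoff * np.sinc(2 * cutoff * k) * np.hamming(taps)
    h /= h.sum()
    y = np.convolve(up, h, mode='same')          # interpolating / anti-alias filter
    return y[::n]                                # discard to reach the output rate

x = np.random.randn(441)
y = fractional_convert(x, 7, 8)                  # 7/8, e.g. 50.4 kHz down to 44.1 kHz
```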

In a variable-ratio interpolator, values will exist for the points at which input samples were made, but it is necessary to compute what the sample values would have been at absolutely any point between available samples. The general concept of the interpolator is the same as for the fractional-ratio convertor, except that an infinite number of filter phases is ideally necessary. Since a realizable filter will have a finite number of phases, it is necessary to study the degradation this causes. The desired continuous temporal or spatial axis of the interpolator is quantized by the phase spacing, and a sample value needed at a particular point will be replaced by a value for the nearest available filter phase. The number of phases in the filter therefore determines the accuracy of the interpolation. The effects of calculating a value for the wrong point are identical to those of sampling with clock jitter, in that an error occurs proportional to the slope of the signal. The result is program-modulated noise. The higher the noise specification, the greater the desired time accuracy and the greater the number of phases required. The number of phases is equal to the number of sets of coefficients available, and should not be confused with the number of points in the filter, which is equal to the number of coefficients in a set (and the number of multiplications needed to calculate one output value).

Figure 3.32 A two times oversampling interpolator. To compute an intermediate sample, the input samples are imagined to be sinx/x impulses, and the contributions from each at the point of interest can be calculated. In practice, rather more samples on either side need to be taken into account.

image

3.12 Transforms and duality

The duality of transforms provides an interesting insight into what is happening in common processes. Fourier analysis holds that any periodic waveform can be reproduced by adding together an arbitrary number of harmonically related sinusoids of various amplitudes and phases. Figure 3.33 shows how a square wave can be built up of harmonics. The spectrum can be drawn by plotting the amplitude of the harmonics against frequency. It will be seen that this gives a spectrum which is a decaying wave. It passes through zero at all even multiples of the fundamental. The shape of the spectrum is a sinx/x curve. If a square wave has a sinx/x spectrum, it follows that a filter with a rectangular impulse response will have a sinx/x spectrum.

Figure 3.33 Fourier analysis of a square wave into fundamental and harmonics. A, amplitude, δ, phase of fundamental wave in degrees; 1, first harmonic (fundamental); 2, odd harmonics 3–15; 3, sum of harmonics 1–15; 4, ideal square wave.

image
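A quick numerical check of this synthesis is easy to write; the sketch below (my own illustration) sums the odd harmonics of Figure 3.33 with amplitudes falling as 1/n and produces a progressively better approximation to a square wave, while the even harmonics contribute nothing.

```python
# Building a square wave from odd harmonics, as in Figure 3.33.
import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)      # one period of the fundamental
square = np.zeros_like(t)
for n in range(1, 16, 2):                        # harmonics 1, 3, 5, ... 15
    square += (4 / (np.pi * n)) * np.sin(2 * np.pi * n * t)
# 'square' now approximates a unit square wave; adding more odd harmonics
# sharpens the edges, while even harmonics would add nothing.
```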

A low-pass filter has a rectangular spectrum, and this has a sinx/x impulse response. These characteristics are known as a transform pair. In transform pairs, if one domain has one shape of the pair, the other domain will have the other shape. Figure 3.34 shows a number of transform pairs.

At (a) a square wave has a sinx/x spectrum and a sinx/x impulse has a square spectrum. In general the product of equivalent parameters on either side of a transform remains constant, so that if one increases, the other must fall. If (a) shows a filter with a wider bandwidth, having a narrow impulse response, then (b) shows a filter of narrower bandwidth which has a wide impulse response. This is duality in action. The limiting case of this behaviour is where one parameter becomes zero and the other goes to infinity. At (c) a time-domain pulse of infinitely short duration has a flat spectrum; conversely a flat waveform, i.e. DC, has only a single component at zero frequency in its spectrum. The impulse response of the optics of a laser disk (d) has a sin²x/x² intensity function, and this is responsible for the triangular falling frequency response of the pickup. The lens is a rectangular aperture, but as there is no such thing as negative light, a sinx/x impulse response is impossible. The squaring process is consistent with a positive-only impulse response. Interestingly, the transform of a Gaussian response is still Gaussian.

Figure 3.34 Transform pairs. At (a) the dual of a rectangle is a sinx/x function. If one is time domain, the other is frequency domain. At (b), narrowing one domain widens the other. The limiting case of this is (c). The transform of the sin²x/x² function is triangular (d).

image

Duality also holds for sampled systems. A sampling process is periodic in the time domain. This results in a spectrum which is periodic in the frequency domain. If the time between the samples is reduced, the bandwidth of the system rises. Figure 3.35(a) shows that a continuous-time signal has a continuous spectrum, whereas (b) shows that the frequency transform of a sampled signal is itself discrete. In other words, sampled signals can only be analysed into a finite number of frequencies. The more accurate the frequency analysis has to be, the more samples are needed in the block. Making the block longer reduces the ability to locate a transient in time. This is the Heisenberg inequality, which is the limiting case of duality, because when infinite accuracy is achieved in one domain, there is no accuracy at all in the other.

Figure 3.35 Continuous time signal (a) has continuous spectrum. Discrete time signal (b) has discrete spectrum.

image
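The trade-off can be seen numerically: for an assumed sampling rate, the spacing of the analysed frequencies is simply the sampling rate divided by the block length, so doubling the block halves the spacing at the cost of poorer time location.

```python
# Frequency spacing of a sampled block's discrete spectrum (illustrative).
import numpy as np

fs = 48000                                        # assumed sampling rate
for block_len in (256, 1024):
    freqs = np.fft.rfftfreq(block_len, d=1 / fs)  # the finite set of analysed frequencies
    print(block_len, "samples ->", freqs[1], "Hz between frequencies")
```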

3.13 The Fourier transform

Figure 3.33 showed that if the amplitude and phase of each frequency component is known, linearly adding the resultant components in an inverse transform results in the original waveform. In digital systems the waveform is expressed as a number of discrete samples. As a result the Fourier transform analyses the signal into an equal number of discrete frequencies. This is known as a discrete Fourier transform or DFT in which the number of frequency coefficients is equal to the number of input samples. The fast Fourier transform is no more than an efficient way of computing the DFT.14

It will be evident from Figure 3.33 that the knowledge of the phase of the frequency component is vital, as changing the phase of any component will seriously alter the reconstructed waveform. Thus the DFT must accurately analyse the phase of the signal components.

Section 2.10 showed a point rotating about a fixed axis at constant speed. One way of defining the phase of a waveform is to specify the angle through which the point has rotated at time zero (T = 0). If a second point is made to revolve at 90° to the first, it would produce a cosine wave when translated. It is possible to produce a waveform having arbitrary phase by adding together the sine and cosine waves in various proportions and polarities. For example, adding the sine and cosine waves in equal proportion results in a waveform displaced by 45° from the sine wave. The proportions necessary are respectively the sine and the cosine of the phase angle. Thus the two methods of describing phase can be readily interchanged.

The discrete Fourier transform spectrum-analyses a string of samples by searching separately for each discrete target frequency. It does this by multiplying the input waveform by a sine wave, known as the basis function, having the target frequency, and adding up or integrating the products. Figure 3.36(a) shows that multiplying by the basis function gives a non-zero integral when the input frequency is the same, whereas (b) shows that with a different input frequency (indeed with any other frequency) the integral is zero, showing that no component at the target frequency exists. Thus, from a real waveform containing many frequencies, all frequencies except the target frequency are excluded. The magnitude of the integral is proportional to the amplitude of the target component.

Figure 3.36 The input waveform is multiplied by the target frequency and the result is averaged or integrated. At (a) the target frequency is present and a large integral results. With another input frequency the integral is zero as at (b). The correct frequency will also result in a zero integral shown at (c) if it is at 90° to the phase of the search frequency. This is overcome by making two searches in quadrature.

image

Figure 3.36(c) shows that the target frequency will not be detected if it is phase shifted 90° as the product of quadrature waveforms is always zero. Thus the discrete Fourier transform must make a further search for the target frequency using a cosine basis function. It follows from the arguments above that the relative proportions of the sine and cosine integrals reveal the phase of the input component. Thus each discrete frequency in the spectrum must be the result of a pair of quadrature searches.
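The quadrature search can be written out directly. The sketch below (function name and test signal are my own) correlates a block with sine and cosine basis functions at one target frequency; the two integrals together give the amplitude and phase of that component.

```python
# One bin of a discrete Fourier transform by quadrature correlation.
import numpy as np

def dft_bin(block, k):
    """Amplitude and phase of the component at k cycles per block."""
    n = np.arange(len(block))
    s = np.sum(block * np.sin(2 * np.pi * k * n / len(block)))  # sine search
    c = np.sum(block * np.cos(2 * np.pi * k * n / len(block)))  # quadrature (cosine) search
    amplitude = 2 * np.hypot(s, c) / len(block)
    phase = np.arctan2(-s, c)              # proportions of the two integrals give the phase
    return amplitude, phase

n = np.arange(64)
block = 0.5 * np.cos(2 * np.pi * 5 * n / 64 + 0.3)
print(dft_bin(block, 5))                   # approximately (0.5, 0.3)
```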

Searching for one frequency at a time as above will result in a DFT, but only after considerable computation. However, a lot of the calculations are repeated many times over in different searches. The fast Fourier transform gives the same result with less computation by logically gathering together all the places where the same calculation is needed and making the calculation once.

3.14 The discrete cosine transform (DCT)

The DCT is a special case of a discrete Fourier transform in which the sine components of the coefficients have been eliminated, leaving a single number for each coefficient. This is actually quite easy. Figure 3.37(a) shows a block of input samples to a transform process. By repeating the samples in a time-reversed order and performing a discrete Fourier transform on the double-length sample set a DCT is obtained. The effect of mirroring the input waveform is to turn it into an even function whose sine coefficients are all zero. The result can be understood by considering the effect of individually transforming the input block and the reversed block.

Figure 3.37 The DCT is obtained by mirroring the input block as shown at (a) prior to an FFT. The mirroring cancels out the sine components as at (b), leaving only cosine coefficients.

image

Figure 3.37(b) shows that the phases of all the components of one block are in the opposite sense to those in the other. This means that when the components are added to give the transform of the double-length block all the sine components cancel out, leaving only the cosine coefficients, hence the name of the transform.15 In practice the sine component calculation is eliminated. Another advantage is that doubling the block length by mirroring doubles the frequency resolution, so that twice as many useful coefficients are produced. In fact a DCT produces as many useful coefficients as input samples.
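This cancellation can be verified numerically. In the sketch below (my own illustration), the block is mirrored, transformed with an FFT, and the phases referred to the centre of the doubled block; the imaginary (sine) parts then vanish to rounding error, leaving cosine-only coefficients.

```python
# The mirroring argument of Figure 3.37, checked with an FFT.
import numpy as np

x = np.random.randn(8)                           # input block
mirrored = np.concatenate([x, x[::-1]])          # time-reversed repeat of the block
Y = np.fft.fft(mirrored)
k = np.arange(len(mirrored))
# Refer the phases to the centre of the mirrored block (a half-sample shift):
dct = Y * np.exp(-1j * np.pi * k / len(mirrored))
print(np.max(np.abs(dct.imag)))                  # ~1e-15: the sine components have cancelled
cosine_coeffs = dct.real[:len(x)]                # as many useful coefficients as input samples
```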

The DCT is primarily used in audio coding because it converts the input waveform into a form where redundancy can be easily detected and removed. More details of the DCT can be found in Chapter 5.

3.15 Modulo-n arithmetic

Conventional arithmetic which is in everyday use relates to the real world of counting actual objects, and to obtain correct answers the concepts of borrow and carry are necessary in the calculations.

There is an alternative type of arithmetic which has no borrow or carry, known as modulo arithmetic. In modulo-n arithmetic no result can equal or exceed n: if it would, n or whole multiples of n are subtracted until it does not. Thus 25 modulo-16 is 9 and 12 modulo-5 is 2. The output of a four-bit counter overflows when it reaches 1111 because the carry-out is ignored. If a number of clock pulses m are applied from the zero state, the state of the counter will be given by m Mod.16. Thus modulo arithmetic is appropriate to systems in which there is a fixed wordlength and this means that the range of values the system can have is restricted by that wordlength. A number range which is restricted in this way is called a finite field.

Modulo-2 is a numbering scheme which is used frequently in digital processes. Figure 3.38 shows that in modulo-2 the conventional addition and subtraction are replaced by the XOR function such that:

image

When multi-bit values are added Mod.2, each column is computed quite independently of any other. This makes Mod.2 circuitry very fast in operation as it is not necessary to wait for the carries from lower-order bits to ripple up to the high-order bits.

Figure 3.38 In modulo-2 calculations, there can be no carry or borrow operations and conventional addition and subtraction become identical. The XOR gate is a modulo-2 adder.

image

Modulo-2 arithmetic is not the same as conventional arithmetic and takes some getting used to. For example, adding something to itself in Mod.2 always gives the answer zero.
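A one-line demonstration makes the point: the XOR of two binary words is their modulo-2 sum (and difference), each column being computed independently with no carries.

```python
# Modulo-2 arithmetic with XOR (illustrative).
a, b = 0b1011, 0b0110
print(bin(a ^ b))          # 0b1101: modulo-2 sum, and also the modulo-2 difference
print(a ^ a)               # 0: adding anything to itself in Mod.2 gives zero
print(25 % 16, 12 % 5)     # 9 2: ordinary modulo-n results quoted earlier, for comparison
```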

3.16 The Galois field

Figure 3.39 shows a simple circuit consisting of three D-type latches which are clocked simultaneously. They are connected in series to form a shift register. At (a) a feedback connection has been taken from the output to the input and the result is a ring counter where the bits contained will recirculate endlessly. At (b) one XOR gate is added so that the output is fed back to more than one stage. The result is known as a twisted-ring counter and it has some interesting properties. Whenever the circuit is clocked, the left-hand bit moves to the right-hand latch, the centre bit moves to the left-hand latch and the centre latch becomes the XOR of the two outer latches. The figure shows that whatever the starting condition of the three bits in the latches, the same state will always be reached again after seven clocks, except if zero is used. The states of the latches form an endless ring of non-sequential numbers called a Galois field after the French mathematical prodigy Evariste Galois who discovered them. The states of the circuit form a maximum-length sequence because there are as many states as are permitted by the wordlength. As the states of the sequence have many of the characteristics of random numbers, yet are repeatable, the result can also be called a pseudo-random sequence (prs). As the all-zeros case is disallowed, the length of a maximum-length sequence generated by a register of m bits cannot exceed (2^m − 1) states. The Galois field, however, includes the zero term.

It is useful to explore the bizarre mathematics of Galois fields, which use modulo-2 arithmetic. Familiarity with such manipulations is helpful when studying error correction, particularly the Reed–Solomon codes used in recorders and treated in Chapter 6. They will also be found in processes which require pseudo-random numbers such as digital dither, considered in Chapter 3, and randomized channel codes used in, for example, DAB and discussed in Chapter 9.

Figure 3.39 The circuit shown is a twisted-ring counter which has an unusual feedback arrangement. Clocking the counter causes it to pass through a series of non-sequential values. See text for details.

image

The circuit of Figure 3.39 can be considered as a counter and the four points shown will then be representing different powers of 2 from the MSB on the left to the LSB on the right. The feedback connection from the MSB to the other stages means that whenever the MSB becomes 1, two other powers are also forced to one so that the code of 1011 is generated.

Each state of the circuit can be described by combinations of powers of x, such as

image

The fact that three bits have the same state because they are connected together is represented by the Mod.2 equation:

image

Let x = a, which is a primitive element. Now

image

In modulo-2

image

In this way it can be seen that the complete set of elements of the Galois field can be expressed by successive powers of the primitive element. Note that the twisted-ring circuit of Figure 3.39 simply raises a to higher and higher powers as it is clocked. Thus the seemingly complex multibit changes caused by a single clock of the register become simple to calculate using the correct primitive and the appropriate power.

The numbers produced by the twisted-ring counter are not random; they are completely predictable if the equation is known. However, the sequences produced are sufficiently similar to random numbers that in many cases they will be useful. They are thus referred to as pseudo-random sequences. The feedback connection is chosen such that the expression it implements will not factorize. Otherwise a maximum-length sequence could not be generated because the circuit might sequence around one or other of the factors depending on the initial condition. A useful analogy is to compare the operation of a pair of meshed gears. If the gears have numbers of teeth which are relatively prime, many revolutions are necessary to make the same pair of teeth touch again. If the numbers of teeth share a common factor, far fewer turns are needed.
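The behaviour of the twisted-ring counter is easy to reproduce in software. The sketch below simply applies the clocking rule given above (left bit to the right latch, centre bit to the left latch, centre latch reloaded from the XOR of the outer two); starting from any non-zero state it visits all seven non-zero states before repeating.

```python
# Simulation of the three-latch twisted-ring counter of Figure 3.39.
def twisted_ring(state=(1, 0, 0), clocks=8):
    l, c, r = state
    states = [(l, c, r)]
    for _ in range(clocks):
        l, c, r = c, l ^ r, l      # shift with modulo-2 feedback
        states.append((l, c, r))
    return states

for s in twisted_ring():
    print(s)                       # the starting state recurs after seven clocks
```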

3.17 The phase-locked loop

All digital audio systems need to be clocked at the appropriate rate in order to function properly. Whilst a clock may be obtained from a fixed-frequency oscillator such as a crystal, many operations in video require genlocking or synchronizing the clock to an external source. The phase-locked loop excels at this job, and many others, particularly in connection with recording and transmission.

In phase-locked loops, the oscillator can run at a range of frequencies according to the voltage applied to a control terminal. This is called a voltage-controlled oscillator or VCO. Figure 3.40 shows that the VCO is driven by a phase error measured between the output and some reference. The error changes the control voltage in such a way that the error is reduced, so that the output eventually has the same frequency as the reference. A low-pass filter is fitted in the control voltage path to prevent the loop becoming unstable. If a divider is placed between the VCO and the phase comparator, as in the figure, the VCO frequency can be made to be a multiple of the reference. This also has the effect of making the loop more heavily damped, so that it is less likely to change frequency if the input is irregular.

Figure 3.40 A phase-locked loop requires these components as a minimum. The filter in the control voltage serves to reduce clock jitter.

image
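A toy discrete-time model of the loop is sketched below; all the numbers (reference frequency, free-running frequency, gains) are arbitrary choices of mine for illustration, not values from the text. The phase error, smoothed by the loop filter, steers the VCO until it runs at the multiple of the reference set by the divider.

```python
# Toy simulation of the phase-locked loop of Figure 3.40 (illustrative values only).
import numpy as np

def pll(ref_freq=0.01, multiplier=4, free_run=0.03,
        gain=0.005, smoothing=0.01, steps=20000):
    vco_freq = free_run
    vco_phase = ref_phase = control = 0.0
    for _ in range(steps):
        ref_phase += 2 * np.pi * ref_freq
        vco_phase += 2 * np.pi * vco_freq
        # the divider means the comparator sees the VCO phase divided by 'multiplier'
        error = np.angle(np.exp(1j * (ref_phase - vco_phase / multiplier)))
        control += smoothing * (error - control)     # loop (low-pass) filter
        vco_freq = free_run + gain * control         # VCO steered by the control voltage
    return vco_freq

print(pll())    # settles near 0.04 cycles/step, i.e. multiplier x ref_freq
```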

Figure 3.41 shows how the 48 kHz sampling rate clock of professional digital audio may be obtained from the sync pulses of an analog video signal by such a multiplication process.

Figure 3.42 shows the NLL or numerically locked loop. This is similar to a phase-locked loop, except that the two phases concerned are represented by the state of a binary number. The NLL is useful to generate a remote clock from a master. The state of a clock count in the master is periodically transmitted to the NLL which will re-create the same clock frequency. The technique is used in MPEG transport streams.

Figure 3.41 Obtaining a 48 kHz sampling clock from the line frequency of 625/50 video using a phase-locked loop.

image

Figure 3.42 The numerically locked loop (NLL) is a digital version of the phase-locked loop.

image

References

1. Richards, J.W., Digital audio mixing. Radio and Electron. Eng., 53, 257–264 (1983)

2. Richards, J.W. and Craven, I., An experimental ‘all digital’ studio mixing desk. J. Audio Eng. Soc., 30, 117–126 (1982)

3. Jones, M.H., Processing systems for the digital audio studio. In Digital Audio, edited by B. Blesser, B. Locanthi and T.G. Stockham Jr, pp. 221–225, New York: Audio Engineering Society (1982)

4. Lidbetter, P.S., A digital delay processor and its applications. Presented at the 82nd Audio Engineering Society Convention (London, 1987), Preprint 2474(K-4)

5. McNally, G.J., COPAS: A high speed real time digital audio processor. BBC Research Dept Report, RD 1979/26

6. McNally, G.W., Digital audio: COPAS-2, a modular digital audio signal processor for use in a mixing desk. BBC Research Dept Report, RD 1982/13

7. Vandenbulcke, C. et al., An integrated digital audio signal processor. Presented at the 77th Audio Engineering Society Convention (Hamburg, 1985), Preprint 2181(B-7)

8. Moorer, J.A., The audio signal processor: the next step in digital audio. In Digital Audio, edited by B. Blesser, B. Locanthi and T.G. Stockham Jr, pp. 205–215, New York: Audio Engineering Society (1982)

9. Gourlaoen, R. and Delacroix, P., The digital sound mixing desk: architecture and integration in the future all-digital studio. Presented at the 80th Audio Engineering Society Convention (Montreux, 1986), Preprint 2327(D-1)

10. van den Enden, A.W.M. and Verhoeckx, N.A.M., Digital signal processing: theoretical background. Philips Tech. Rev., 42, 110–144 (1985)

11. McClellan, J.H., Parks, T.W. and Rabiner, L.R., A computer program for designing optimum FIR linear-phase digital filters. IEEE Trans. Audio and Electroacoustics, AU-21, 506–526 (1973)

12. Crochiere, R.E. and Rabiner, L.R., Interpolation and decimation of digital signals – a tutorial review. Proc. IEEE, 69, 300–331 (1981)

13. Rabiner, L.R., Digital techniques for changing the sampling rate of a signal. In Digital Audio, edited by B. Blesser, B. Locanthi and T.G. Stockham Jr, pp. 79–89, New York: Audio Engineering Society (1982)

14. Kraniauskas, P., Transforms in Signals and Systems, Wokingham: Addison-Wesley (1992)

15. Ahmed, N., Natarajan, T. and Rao, K., Discrete cosine transform. IEEE Trans. Computers, C-23, 90–93 (1974)
