The precedence relation and its graphical representation—a dataflow graph—define which operations must be completed before starting a new one. Consider a first example:
Example 10.3 The precedence graph of Algorithm 4.2 (carry-chain adder), with n = 4, is shown in Figure 10.3.
If one chooses to execute the algorithm in just one cycle (parallel implementation), then the precedence graph is practically equivalent to the corresponding combinational circuit. As an example, the carry-chain adder of Figure 11.3 can be directly deduced from the precedence graph of Figure 10.3. If a sequential implementation is considered, the precedence graph allows scheduling the operations, that is, deciding in which cycle every operation is performed.
A sequential implementation is given in Example 10.4.
Example 10.4 (Complete VHDL source code available.) The data flow graph corresponding to Algorithm 4.10 (long-operand addition), with n = 128 and s = 32, is shown in Figure 10.4. A possible 4-cycle schedule is indicated. In every cycle a 32-digit addition is performed so that only one 32-digit adder is necessary. The scheduled precedence graph also gives information about the memory resources: every arc crossing a cycle dotted line corresponds to a variable computed during some cycle and to be used during one of the following ones. So some memory resource is necessary in order to store its value (except in the case of inputs assumed constant during the whole computation). A practical implementation of the data path, with B = 2, is shown in Figure 10.5. The memory resources are four 32-bit registers—with a clock enable input—storing the four parts of the result and a D flip-flop that stores the successive carries. The corresponding cost C and delay T are equal to
The following VHDL model describes a generic version of the data path whose parameters are the number of bits n and the number of slices m = n/s:
entity data_path is port ( x, y: in long_operand; clk, reset, load_cy,: in std_logic; word_select: std_logic_vector(logm-1 downto 0); load: std_logic_vector(m-1 downto 0); z: out long_operand; carry: out std_logic ); end data_path; architecture circuit of data_path is signal op_1, op_2, adder_out: std_logic_vector(n downto 0); signal c_in, c_out: std_logic; begin op_1<=(‘0’&x(conv_integer(word_select))); op_2<=(‘0’&y(conv_integer(word_select))); adder_out<=op_1+op_2+c_in; c_out<=adder_out(n); process(clk) begin if reset=‘1’ then c_in<=‘0’; elsif clk'event and clk=‘1’ then if load_cy=‘1’ then c_in<=c_out; end if; end if; end process; process(clk) begin if clk'event and clk=‘1’ then for i in 0 to m-1 loop if load(i)=‘1’ then z(i)<=adder_out(n-1 downto 0); end if; end loop; end if; end process; carry<=c_in; end circuit;
An additional control unit must be designed in order to generate the control signals as well as controlling the start/done communication protocol. It can be modeled by a finite state machine:
entity control_unit is port ( clk, reset, start: in std_logic; done, load_cy: out std_logic; word_select: std_logic_vector(logm-1 downto 0); load: out std_logic_vector(m-1 downto 0) ); end control_unit; architecture fsm of control_unit is subtype state is integer range -3 to m; signal current_state: state; begin process(clk, reset) begin case current_state is when -3=>load<=conv_std_logic_vector(0, m); load_cy<= ‘0’; word_select<=conv_std_logic_vector(0, logm); done<=‘1’; when -2=>load<=conv_std_logic_vector(0, m); load_cy<= ‘0’; word_select<=conv_std_logic_vector(0, logm); done<=‘1’; when -1=>load<=conv_std_logic_vector(0, m); load_cy<= ‘0’; word_select<=conv_std_logic_vector(0, logm); done<=‘1’; when 0 to m-1=>load<=conv_std_logic_vector(2** (current_state), m); load_cy<=‘1’; word_select<= conv_std_logic_vector(current_state, logm); done<=‘0’; when m=>load<=conv_std_logic_vector(0, m); load_cy<= ‘0’; word_select<=conv_std_logic_vector(0, logm); done<=‘1’; end case; if reset=‘1’ then current_state<=-3; elsif clk'event and clk=‘1’ then case current_state is when -3=>if start=‘0’ then current_state<= current_state+1; end if; when -2=>if start=‘1’ then current_state<= current_state+1; end if; when -1=>current_state<=current_state+1; when 0 to m-1=>current_state<=current_state+1; when m=>current_state<=-3; end case; end if; end process; end fsm;
If a minimum number of state-encoding variables are used, and if the combinational cost is small compared with the state-register one, the cost of the control unit is roughly equal to log2(m + 4).CFF. The total cost of the long-operand adder is equal to
and its computation time to