10.2 PRECEDENCE RELATION AND SCHEDULING

The precedence relation and its graphical representation—a dataflow graph—define which operations must be completed before starting a new one. Consider a first example:

Example 10.3 The precedence graph of Algorithm 4.2 (carry-chain adder), with n = 4, is shown in Figure 10.3.

If one chooses to execute the algorithm in just one cycle (parallel implementation), then the precedence graph is practically equivalent to the corresponding combinational circuit. As an example, the carry-chain adder of Figure 11.3 can be directly deduced from the precedence graph of Figure 10.3. If a sequential implementation is considered, the precedence graph allows scheduling the operations, that is, deciding in which cycle every operation is performed.

A sequential implementation is given in Example 10.4.

Example 10.4 (Complete VHDL source code available.) The data flow graph corresponding to Algorithm 4.10 (long-operand addition), with n = 128 and s = 32, is shown in Figure 10.4. A possible 4-cycle schedule is indicated. In every cycle a 32-digit addition is performed so that only one 32-digit adder is necessary. The scheduled precedence graph also gives information about the memory resources: every arc crossing a cycle dotted line corresponds to a variable computed during some cycle and to be used during one of the following ones. So some memory resource is necessary in order to store its value (except in the case of inputs assumed constant during the whole computation). A practical implementation of the data path, with B = 2, is shown in Figure 10.5. The memory resources are four 32-bit registers—with a clock enable input—storing the four parts of the result and a D flip-flop that stores the successive carries. The corresponding cost C and delay T are equal to

image

Figure 10.3 Precedence graph of Algorithm 4.2.

image

image

The following VHDL model describes a generic version of the data path whose parameters are the number of bits n and the number of slices m = n/s:

entity data_path is
port (
  x, y: in long_operand;
  clk, reset, load_cy,: in std_logic;
  word_select: std_logic_vector(logm-1 downto 0);
  load: std_logic_vector(m-1 downto 0);
  z: out long_operand;
  carry: out std_logic
  );
end data_path;

architecture circuit of data_path is
  signal op_1, op_2, adder_out: std_logic_vector(n downto 0);
  signal c_in, c_out: std_logic;
begin
  op_1<=(‘0’&x(conv_integer(word_select)));
  op_2<=(‘0’&y(conv_integer(word_select)));
  adder_out<=op_1+op_2+c_in;
  c_out<=adder_out(n);
  process(clk)
  begin
    if reset=‘1’ then c_in<=‘0’;
    elsif clk'event and clk=‘1’ then
    if load_cy=‘1’ then c_in<=c_out; end if;
    end if;
  end process;
  process(clk)
  begin
    if clk'event and clk=‘1’ then
    for i in 0 to m-1 loop
    if load(i)=‘1’ then z(i)<=adder_out(n-1 downto 0);
    end if; end loop;
    end if;
  end process;
  carry<=c_in;
end circuit;

image

Figure 10.4 Precedence graph of Algorithm 4.10.

image

Figure 10.5 A 128-bit adder.

An additional control unit must be designed in order to generate the control signals as well as controlling the start/done communication protocol. It can be modeled by a finite state machine:

entity control_unit is
port (
  clk, reset, start: in std_logic;
  done, load_cy: out std_logic;
  word_select: std_logic_vector(logm-1 downto 0);
  load: out std_logic_vector(m-1 downto 0)
  );
end control_unit;

architecture fsm of control_unit is
  subtype state is integer range -3 to m;
  signal current_state: state;
  begin
  process(clk, reset)
  begin
    case current_state is
      when -3=>load<=conv_std_logic_vector(0, m); load_cy<=
      ‘0’; word_select<=conv_std_logic_vector(0, logm);
      done<=‘1’;
      when -2=>load<=conv_std_logic_vector(0, m); load_cy<=
      ‘0’; word_select<=conv_std_logic_vector(0, logm);
      done<=‘1’;
      when -1=>load<=conv_std_logic_vector(0, m); load_cy<=
      ‘0’; word_select<=conv_std_logic_vector(0, logm);
      done<=‘1’;
      when 0 to m-1=>load<=conv_std_logic_vector(2**
      (current_state), m); load_cy<=‘1’; word_select<=
      conv_std_logic_vector(current_state,
      logm); done<=‘0’;
      when m=>load<=conv_std_logic_vector(0, m); load_cy<=
      ‘0’; word_select<=conv_std_logic_vector(0, logm);
      done<=‘1’;
    end case;
    if reset=‘1’ then current_state<=-3;
    elsif clk'event and clk=‘1’ then
      case current_state is
        when -3=>if start=‘0’ then current_state<=
        current_state+1; end if;
        when -2=>if start=‘1’ then current_state<=
        current_state+1; end if;
        when -1=>current_state<=current_state+1;
        when 0 to m-1=>current_state<=current_state+1;
        when m=>current_state<=-3;
      end case;
    end if;
  end process;
end fsm;

If a minimum number of state-encoding variables are used, and if the combinational cost is small compared with the state-register one, the cost of the control unit is roughly equal to log2(m + 4).CFF. The total cost of the long-operand adder is equal to

image

and its computation time to

image

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset