7 AGC Design Using Multiagent Systems

 

A multiagent system (MAS) comprises two or more (intelligent) agents that cooperate to pursue a specific goal. The MAS is now a research reality, and MASs are rapidly gaining a critical presence in many areas and environments. Interested readers can find detailed reviews in several references.1–5 In the last two decades, MASs have been widely used in many fields of control engineering, such as power system control, manufacturing/industrial control, congestion control, distributed control, hybrid control, robotics and formation control, remote control, and traffic control.

With embedded learning capabilities, agents that are autonomous, proactive, and reactive are well suited for the modeling and control of various real-world complex systems, including the power industry. The MAS philosophy and its potential value in power system applications have been discussed.6–8 Major applications are in the areas of simulation, modeling, and design of trading platforms in restructured electricity markets.6,9,10

Recent published research on MAS-based AGC design is briefly reviewed in Chapter 3. In this chapter, an introduction to MASs is presented first, then a multiagent reinforcement-learning-based AGC scheme is introduced, and finally the proposed methodology is examined on some power system examples.

 

 

7.1 Multiagent System (MAS): An Introduction

When dealing with complex dynamic systems characterized by uncertainty, nonlinearity, information structure constraints, and high dimensionality, it is very difficult to satisfy all requirements of an intelligent control system, such as adaptation and learning, autonomy and intelligence, and structures and hierarchies, by using fuzzy controllers, neural networks, or evolutionary optimization methods such as genetic algorithms as single applications. There is no concept that incorporates all these methods in one common framework combining the advantages of the single methods.11 One approach to designing an intelligent control system that autonomously achieves a high level of control objectives is the application of multiagent systems.

Multiagent systems form a subfield of (distributed) artificial intelligence (AI). An MAS includes several agents and a mechanism for coordinating the independent agents' actions. Various definitions of an agent are given in the computer science and AI literature.12,13 An agent can be considered an intelligent entity operating in an environment with a degree of autonomy, specific goal(s), and knowledge.

An agent can alter the environment by taking actions, and can act autonomously in response to environmental changes. Autonomy means that the agent is able to fulfill its tasks without the direct intervention of a human, and the environment is everything (systems, hardware, and software) external to the agent. Of course, the agent itself is also a part of the environment. In a single-agent system, if there are other agents, they are also considered part of the environment.

There are several important reasons to use MASs in (specifically distributed) control system designs: suitability for representing and controlling interconnected/distributed systems; simplicity of mechanism, programming, and implementation; the capability of parallel processing/computation; scalability (handling numerous units); extensibility and flexibility (integration of new parts and entities); maintainability (because of the modularity that comes from using multiple components, i.e., agents); responsiveness (handling anomalies locally instead of propagating them to the whole system); robustness against failure; and reliability.

Figure 7.1 illustrates a conceptual view of a typical MAS. Here, each agent is shown as a unit that sends and receives messages and interacts (via sensors and actuators) with its environment autonomously. The agents may also interact directly, as indicated in the figure by the arrows between the agents.


FIGURE 7.1
A general multiagent framework.

There may be numerous agents with different structures, local goals, actions, and domain knowledge—with or without the ability to communicate with other agents directly. In addition to autonomy, veracity, and rationality, the main characteristics an agent may have are social ability, responsiveness, proactiveness, adaptability, mobility, and learning. These characteristics are well defined in Wooldridge and Jennings.2

Over the years, various approaches to implementing autonomous intelligent agents have been introduced, such as belief-desire-intention (BDI) agents, reactive agents, agents with layered architectures,13 and agents implemented using model-based programming.14 The BDI approach is based on mental models of an agent's beliefs, desires, and intentions. It considers agents to have beliefs (about themselves, other agents, and their environments), desires (about future states), and intentions (about their own future actions). Reactive agents are normally associated with a simpler, behavior-based model of intelligence. The fundamental property of reactive agents is that they do not perform internal symbolic reasoning about the environment. Instead, they react to inputs from their environment and messages from other agents.13

Several layered agent structures are discussed in Wooldridge and Weiss.13 In a layered agent, each layer is developed for a specific task. For example, consider agents with three layers:15 a layer for message handling, a layer for behavior analysis, and a layer for actions. In this case, the message handling layer is responsible for sending messages to and receiving messages from other agents, the behavioral layer instructs the message handling layer to inform other agents of new data, and the actions layer holds the core functional attributes of the agent needed to perform its actions.13

The task of an agent contains a set of essential activities. Its goal is to change (or maintain) the state of the domain in some desirable way (according to the interest of its human principal). To do so, it takes action from time to time. To take the proper action, it makes observations on the domain. Following observations on a domain, an agent performs inference based on its knowledge about the relations among domain events, and then estimates the state of the domain using an intelligent core. The activity of guessing the state of the domain from prior knowledge and observations is known as reasoning or inference. A multiagent system consists of a set of agents acting in a problem domain. Each agent carries only a partial knowledge representation about the domain and can observe the domain from a partial perspective. Although an agent in a multiagent system can reason and act autonomously, as in the single-agent paradigm, to overcome its limit in domain knowledge, perspective, and computational resources, it can benefit from other agents’ knowledge, perspectives, and computational resources through communication and coordination.16

In a multiagent system, at least one agent is usually equipped with intelligent inference. An intelligent core can play a major role in an intelligent agent for reasoning about the dynamic environment. Various intelligent cores/inferences, such as symbolic representation, if-then rules and fuzzy logic,17 artificial neural networks,18 reinforcement learning,19 and Bayesian networks,20 can be used in MASs. It may also be possible to use well-known control theories, such as sliding mode control, to suppress the effects of modeling uncertainties and disturbances, and to force the agent dynamics to move along a stabilizing manifold called the sliding manifold.21,22

Multiagent control systems represent control schemes that are inherently distributed and consist of multiple entities/agents. The control architecture for MASs can be broadly characterized as deliberative control, reactive control, and a combination of both. Deliberative control is based on planning, while reactive control is based on coupling between sensing and actuation. Strategies that require that action be mediated by some symbolic representation of the environment are often called deliberative. In contrast, reactive strategies do not exhibit a steadfast reliance on internal models. Instead of responding to entities within a model, the control system can respond directly to the perception of the real world.23


FIGURE 7.2
A typical intelligent agent architecture.

Complex control tasks can usually be hierarchically decomposed into several simpler subtasks. This naturally leads to agent architectures consisting of multiple layers. Figure 7.2 shows a schematic diagram of a three-layer agent architecture. In real-time control applications, the agents should be capable of reasoning about the best possible action without losing too much time on sending or receiving data. To deal with the timing constraints imposed by the real-time nature of the domain, it is therefore desirable that the agents perform this high-level reasoning efficiently; adopting a somewhat hybrid approach is an appropriate choice.24

Figure 7.2 shows the functional hierarchy of the agent architecture. As illustrated in this figure, the agent and environment form a closed-loop system. The bottom layer is the interfacing layer, which takes care of the interaction with the environment and hides the details of the environment from the other layers as much as possible. The middle layer is the signal processing/modeling layer, which provides a clear view of the world (environment), along with a set of possible choices, to the third layer. The highest layer in the architecture is the control layer, which contains the reasoning component of the system. In this layer, the best possible action is selected from the choices supplied by the observation/modeling layer, depending on the current environment state and the current strategy of the overall control system. The most recent environment state information is thus used by the control layer to reason about the best possible action. The action selected by the control layer is subsequently worked out in the second layer, which determines the appropriate actuator command. This command is then executed by the actuator control module in the interfacing layer.
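To make the three-layer loop more concrete, the following is a minimal Python sketch of one agent cycle through the interfacing, modeling, and control layers. All class names, method names, and the toy environment are illustrative assumptions, not part of the original design.

```python
# A minimal Python sketch of one agent cycle through the three layers described
# above. The classes, method names, and toy environment are illustrative only.

class ToyEnvironment:
    """Stand-in environment: exposes raw measurements and accepts a command."""
    def __init__(self):
        self.value = 0.3
    def read_sensors(self):
        return [self.value, 1.1 * self.value]   # raw measurements
    def apply(self, command):
        self.value -= command                   # crude closed-loop effect

class InterfacingLayer:
    """Bottom layer: hides sensor/actuator details from the upper layers."""
    def sense(self, env):
        return env.read_sensors()
    def actuate(self, env, command):
        env.apply(command)

class ModelingLayer:
    """Middle layer: builds a state view and works out actuator commands."""
    def estimate_state(self, raw):
        return sum(raw) / len(raw)              # e.g., an averaged signal
    def to_command(self, action):
        return 0.1 * action                     # map a discrete action to a setpoint change

class ControlLayer:
    """Top layer: selects the best possible action for the current state."""
    def select_action(self, state):
        return 1 if state > 0 else -1           # trivial stand-in reasoning rule

def agent_step(env, iface, model, ctrl):
    state = model.estimate_state(iface.sense(env))
    action = ctrl.select_action(state)
    iface.actuate(env, model.to_command(action))
    return state, action

env = ToyEnvironment()
iface, model, ctrl = InterfacingLayer(), ModelingLayer(), ControlLayer()
for _ in range(5):
    print(agent_step(env, iface, model, ctrl))
```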

 

 

7.2 Multiagent Reinforcement-Learning-Based AGC

One of the adaptive and nonlinear intelligent control techniques that can be effectively applied in power system AGC design is reinforcement learning (RL). Several efforts in this direction have been reported.25–29 The RL-based controllers learn and are adjusted to keep the area control error small enough at each sampling time of an AGC cycle. Since these controllers are based on learning methods, they are independent of environment conditions and can learn over a wide range of operating conditions. The RL-based AGC design is model-free, is easily scalable for large-scale systems, and is suitable for responding to load disturbances and power fluctuations.

The present section addresses the AGC design using an agent-based RL technique for an interconnected power system. Here, each control area includes an agent that communicates with the others to control the frequency and tie-line power across the whole interconnected system. Each agent (control agent) provides an appropriate control action according to the area control error (ACE) signal, using an RL algorithm. In a multiarea power system, the learning process is considered a multiagent RL process, and the agents of all areas learn together (not individually).

7.2.1 Multiagent Reinforcement Learning

This section presents a brief background on multiagent RL. The basic concepts and a comprehensive survey have been given previously.19,30 RL is learning what to do, i.e., how to map situations to actions, so as to maximize a numerical reward signal.19 In fact, the learner discovers which action should be taken by interacting with the environment and trying the different actions that may lead to the highest reward. RL evaluates the actions taken and gives the learner feedback on how good the action taken was and whether it should be repeated in the same situation. In other words, RL methods are intelligent agents that learn to solve a problem by interacting with their environment.

During the learning process, the agent interacts with the environment and, at time t, takes an action $a_t$ from a set of possible actions. This action affects the system and takes it to a new state $x_{t+1}$, and the agent is then provided with the corresponding reward signal ($r_{t+1}$). This agent-environment interaction is repeated until the desired objective is achieved. A state signal conveys the information required for making a decision; if it succeeds in retaining all relevant information, it is said to be Markov, or to have the Markov property,19 and an RL task that satisfies this property is called a finite Markov decision process (MDP). If an environment has the Markov property, then its dynamics enable one to predict the next state and expected next reward given the current state and action.

In each MDP, the objective is to maximize the sum of returned rewards over time. The expected sum of discounted rewards is defined by the following equation:

$$R = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \qquad (7.1)$$

where γ is the discount factor, 0 < γ < 1, which weights immediate rewards more heavily than those received further in the future.
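As a small numerical illustration of Equation 7.1 (a sketch with arbitrary values, not taken from the text), the discounted return of a finite reward sequence can be computed as:

```python
# Discounted return of Equation 7.1 for a finite reward sequence (arbitrary values).
gamma = 0.9                                  # discount factor, 0 < gamma < 1
rewards = [-0.4, -0.2, -0.05, 0.0, 0.0]      # r_{t+1}, r_{t+2}, ...
R = sum(gamma**k * r for k, r in enumerate(rewards))
print(R)   # earlier rewards dominate because gamma**k shrinks with k
```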

Another term is the value function, which is defined as the expected return or reward (E) when starting at state $x_t$ while following policy π(x,a) (see Equation 7.2). The policy describes the way the agent maps states to actions.36

$$V^{\pi}(x) = E_{\pi}\left\{ \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \;\middle|\; x_t = x \right\} \qquad (7.2)$$

An optimal policy is one that maximizes the value function. Therefore, once the optimal state value is derived, the optimal policy can be found as follows:

$$V^{*}(x) = \max_{\pi} V^{\pi}(x), \quad \forall x \in X \qquad (7.3)$$

In most RL methods, instead of calculating the state value, another term, known as the action value, is calculated (Equation 7.4); it is defined as the expected discounted reward when starting at state $x_t$ and taking action $a_t$.

$$Q^{\pi}(x,a) = E_{\pi}\left\{ \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \;\middle|\; x_t = x,\; a_t = a \right\} \qquad (7.4)$$

To calculate the optimal action value, Bellman’s equation,19 as shown in Equation 7.5, can be used. In general, an optimal policy is one that maximizes the Q-function, defined by

$$Q^{*}(x,a) = \max_{\pi} E_{\pi}\left\{ r_{t+1} + \gamma \max_{a'} Q^{*}(x_{t+1}, a') \;\middle|\; x_t = x,\; a_t = a \right\} \qquad (7.5)$$

Different RL methods have been proposed to solve the above equations. In some algorithms, the agent first approximates a model of the system in order to calculate the Q-function. The method used in the present chapter is of the temporal difference type, which does not need a model of the system under control; the only available information is the reward achieved by each action taken and the next state. The algorithm, called Q-learning, approximates the Q-function, and from the computed function the optimal policy, which maximizes this function, is derived.27

Well-understood algorithms with desirable convergence and consistency properties are available for solving the single-agent RL task, both when the agent knows the dynamics of the environment and the reward function and when it does not. However, the scalability of algorithms to realistic problem sizes is problematic in single-agent RL, and this is one of the main reasons to use multiagent RL.30 In addition to scalability and the benefits owing to the distributed nature of the multiagent solution, such as parallel processing, multiple RL agents may also benefit from sharing experience, e.g., by communication, teaching, or imitation.30 These properties make RL attractive for multiagent learning.

However, several new challenges arise for RL in MASs. In an MAS, other adapting agents make the environment nonstationary, violating the Markov property on which traditional single-agent behavior learning relies; this nonstationarity degrades the convergence properties of most single-agent RL algorithms.31 Another problem is the difficulty of defining an appropriate learning goal for the multiple RL agents;30 only with such a goal can an RL agent coordinate its behavior with the other agents. These challenges make multiagent RL design and learning difficult in large-scale applications, and one should use a special learning algorithm, such as those introduced in Bevrani et al.28 and Busoniu et al.30 and discussed in Section 7.2.3. Using such a learning algorithm, the violation of the Markov property caused by the multiagent structure, as well as the other problems, can be addressed.


FIGURE 7.3
The overall control framework for area i.

7.2.2 Area Control Agent

Figure 7.3 shows the proposed intelligent control framework for control area i in a multiarea power system. Each control area includes an RL-based control agent as an intelligent controller. The controller is responsible for producing an appropriate control action (∆PCi ) using RL, according to the measured ACE signal and tie-line power changes (∆Ptie–i ).

The intelligent controller (control agent) functions as follows: At each instant (on a discrete timescale k, k = 1, 2, …), the control agent observes the current state of the system xk and takes an action ak. The state vector consists of quantities that are normally available to the control agent. Here, the average of the ACE signal over the time interval k – 1 to k is used as the state vector at instant k. For the algorithm presented here, it is assumed that the set of all possible states X is finite. Therefore, the values of the various quantities that constitute the state information should be quantized.

The possible actions of the control agent are the various values of ∆PC that can be demanded as changes in the generation level within an AGC interval. The ∆PC is also discretized to a finite number of levels. Now, since both X and A = {ak; k = 1, 2, ...} are finite sets, a model for this dynamic system can be specified through a set of probabilities.
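A minimal sketch of this quantization step is shown below; the bin edges, limits, and level counts are arbitrary placeholders rather than design values from the text.

```python
import numpy as np

# Illustrative quantization of the averaged ACE (state) and of Delta P_C (action).
# Bin edges, limits, and level counts are arbitrary placeholders, not design values.
ace_bins = np.linspace(-0.05, 0.05, 7)      # edges spanning [ACE_min, ACE_max] (pu)
dpc_levels = np.linspace(-0.02, 0.02, 5)    # finite set of Delta P_C levels (pu)

def quantize_state(avg_ace):
    """Map the averaged ACE value to a discrete state index."""
    return int(np.digitize(avg_ace, ace_bins))

def action_value(action_index):
    """Map a discrete action index to an actual Delta P_C demand."""
    return float(dpc_levels[action_index])

print(quantize_state(-0.012), action_value(3))   # e.g., state index and +0.01 pu
```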

7.2.3 RL Algorithm

Here, similar to the algorithm introduced in Ahamed et al.,25 an RL algorithm is used for estimating Q* and the optimal policy (Equation 7.5). Suppose we have a sequence of samples (xk, xk+1, ak, rk), k = 1, 2, … . Each sample is such that xk+1 is the random state that results when action ak is performed in state xk, and rk = g(xk, xk+1, ak) is the consequent immediate reinforcement.

Such a sequence of samples can be obtained either through a simulation model of the system or by observing the actual system in operation. This sequence of samples (called the training set) can be used to estimate Q* using a specific algorithm. Suppose Qk is the estimated value of Q* at the kth iteration, and let the next sample be (xk, xk+1, ak, rk); then Qk+1 can be obtained as follows:

$$Q_{k+1}(x_k, a_k) = Q_k(x_k, a_k) + \alpha\left[ g(x_k, x_{k+1}, a_k) + \gamma \max_{a' \in A} Q_k(x_{k+1}, a') - Q_k(x_k, a_k) \right] \qquad (7.6)$$

where α is a constant called the step size of the learning algorithm, and 0 < α < 1.

At each time step, as determined by the sampling time of the AGC action, the state input vector x for the AGC is determined, and then an action in that state is selected and applied to the model. The model is integrated over a time interval equal to the AGC sampling time to obtain the state vector at the next time step.

Here, an exploration policy is used for choosing actions in different states. It is based on a learning automata algorithm called the pursuit algorithm.32 This is a stochastic policy in which, for each state x, actions are chosen based on a probability distribution over the action space. Let $P_x^k$ denote the probability distribution over the action set for state vector x at the kth iteration of learning; that is, $P_x^k(a)$ is the probability of choosing action a in state x at iteration k. A uniform probability distribution is considered at k = 0, that is,

$$P_x^{0}(a) = \frac{1}{|A|}, \quad \forall a \in A,\; \forall x \in X \qquad (7.7)$$

At the kth iteration, let the state xk be equal to x. An action ak is chosen randomly based on $P_x^k(\cdot)$; that is, $\mathrm{Prob}(a_k = a) = P_x^k(a)$. Using the simulation model, the system goes to the next state xk+1 by applying action a in state x and integrating over the next time interval. Then, Qk is updated to Qk+1 using Equation 7.6, and the probabilities are updated as follows:

$$\begin{aligned} P_x^{k+1}(a_g) &= P_x^{k}(a_g) + \beta\left(1 - P_x^{k}(a_g)\right) \\ P_x^{k+1}(a) &= P_x^{k}(a)\,(1-\beta), \quad \forall a \in A,\; a \neq a_g \\ P_y^{k+1}(a) &= P_y^{k}(a), \quad \forall a \in A,\; \forall y \in X,\; y \neq x \end{aligned} \qquad (7.8)$$

where β is a constant and 0 < β < 1. Thus, at iteration k, the probability of choosing the greedy action ag in state x is slightly increased, and the probabilities of choosing all other actions in state x are proportionally decreased.
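The following Python sketch combines the Q update of Equation 7.6 with the pursuit-algorithm probability update of Equations 7.7 and 7.8. The numbers of states and actions and the constants α, γ, and β are placeholders chosen only for illustration.

```python
import numpy as np

# Sketch of the learning step of Equations 7.6 to 7.8: tabular Q-learning with
# pursuit-algorithm exploration. The numbers of states/actions and the constants
# alpha, gamma, beta below are illustrative placeholders only.
n_states, n_actions = 8, 5
alpha, gamma, beta = 0.1, 0.9, 0.05

Q = np.zeros((n_states, n_actions))                  # action-value estimates
P = np.full((n_states, n_actions), 1.0 / n_actions)  # Equation 7.7: uniform start

def choose_action(x, rng):
    """Pick an action in state x from the pursuit probability distribution P_x."""
    return int(rng.choice(n_actions, p=P[x]))

def update(x, a, reward, x_next):
    """Apply the Q update of Equation 7.6 and the probability update of Equation 7.8."""
    Q[x, a] += alpha * (reward + gamma * Q[x_next].max() - Q[x, a])
    a_greedy = int(Q[x].argmax())
    P[x] *= (1.0 - beta)          # scale every action probability by (1 - beta) ...
    P[x, a_greedy] += beta        # ... then add beta back to the greedy action

rng = np.random.default_rng(0)
a = choose_action(2, rng)
update(x=2, a=a, reward=-0.01, x_next=3)
```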

In the present algorithm, the aim is to achieve the well-known AGC objective and to keep the ACE within a small band around zero. This choice is motivated by the fact that all the existing AGC implementations use this as a main control objective, and hence it will be possible to compare the proposed RL approach with conventional and other AGC design approaches.

As mentioned earlier, in this formulation each state vector consists of the average value of ACE as a state variable, and the control agent actions change the generation set point ∆PC. As is usual in RL applications, a finite number of states is assumed; accordingly, the state variable and the action variable should be discretized to finite levels as well.

The next step is to choose an immediate reinforcement function by defining the function g. The reward matrix is initially full of zeros. At each time step, the average value of the ACE signal is obtained; then, according to its discretized value, the state of the system is determined. Whenever the state is desirable (i.e., |ACE| is less than ε), the reward function g(xk, xk+1, ak) is assigned a zero value. When it is undesirable (i.e., |ACE| > ε), g(xk, xk+1, ak) is assigned the value –|ACE|. In this way, all actions that lead to an undesirable state are penalized with a negative value.
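This reward definition can be written directly as a small function; the ε value used below is only an illustrative placeholder.

```python
def reward(avg_ace, epsilon=0.005):
    """Immediate reinforcement g: zero in a desirable state, -|ACE| otherwise.
    The epsilon value here is only an illustrative placeholder."""
    return 0.0 if abs(avg_ace) < epsilon else -abs(avg_ace)

print(reward(0.002), reward(0.02))   # desirable -> 0.0, undesirable -> -0.02
```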

7.2.4 Application to a Thirty-Nine-Bus Test System

To illustrate the effectiveness of the proposed control strategy, the designed intelligent control scheme is applied to the thirty-nine-bus test system described in Figure 6.7. The power system is divided into three control areas, as explained in Chapter 6. Here, the purpose is essentially to show the various steps of implementation clearly and to illustrate the method. After the design choices are made, the controller is trained by running the simulation in the learning mode, as explained in the previous section. After completing the learning phase, the control actions at the various states converge to their optimal values.

The simulation is run as follows: At each AGC instant k, the control agent of each area averages the corresponding ACE signal instances, sampled every 0.1 s. The three average ACE values, each related to one area, form the current state vector xk, which is obtained according to the quantized states. When the state vectors of all areas are ready, the control agents choose the action signal ak, which consists of three ∆PC values for the three areas (the action signal is obtained according to the quantized actions and the exploration policy mentioned above), and the set points of the governors are changed using the values given by ak.

In the performed simulation studies, the input variable is obtained as follows: over each chosen AGC decision cycle, the ACE values of the three control areas are calculated. The averages of these values for the three areas form the state variable $(x^1_{avg1}, x^1_{avg2}, x^1_{avg3})$.

Since in the multiagent RL process the agents of all areas learn together, the state vector consists of the state vectors of all three areas, and the action vector consists of the action vectors of all three areas, as shown in the tuple <(X1, X2, X3), (A1, A2, A3), p, (r1, r2, r3)>, or <X, A, p, r>. Here $X_i = x^1_{avg\,i}$ is the discrete set of each area's states, X is the joint state, Ai is the discrete set of actions available to area i, and A is the joint action. At each time instant, after averaging ACEi for each area (over three instances), depending on the current joint state (X1, X2, X3), the joint action (∆PC1, ∆PC2, ∆PC3) is chosen according to the exploration policy.

Consequently, the reward r also depends on the joint action. Whenever the next state X is desirable (i.e., all |ACEi| are less than ε), the reward function r is fixed at a zero value. When the next state is undesirable (i.e., there exists an i such that |ACEi| > ε), r is assigned the average of the –|ACEi| values. In this algorithm, since all agents learn together, parallel computation speeds up the learning process. This RL algorithm is also more scalable than single-agent RL algorithms.
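Under the joint-state convention described above, the shared reward could be sketched as follows (again with ε as an illustrative placeholder):

```python
def joint_reward(ace_areas, epsilon=0.005):
    """Zero when every |ACE_i| is below epsilon; otherwise the average of -|ACE_i|.
    epsilon is an illustrative placeholder, not a value from the text."""
    if all(abs(a) < epsilon for a in ace_areas):
        return 0.0
    return -sum(abs(a) for a in ace_areas) / len(ace_areas)

print(joint_reward([0.001, -0.002, 0.0004]))   # desirable joint state -> 0.0
print(joint_reward([0.020, -0.001, 0.003]))    # undesirable -> negative average
```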

In the performed simulations, the proposed controllers are applied to the thirty-nine-bus, three-control area system, as simplified in Figure 7.4. In this section, the performance of the closed-loop system using the well-tuned conventional PI controllers is compared to that of the system using the designed multiagent RL controllers for a wide range of load disturbances.

As a serious test scenario, similar to that in Section 6.3.2, the following load disturbances (step increases in demand) are applied simultaneously to the three areas: 3.8% of the total area load at bus 8 in area 1, 4.3% of the total area load at bus 3 in area 2, and 6.4% of the total area load at bus 16 in area 3. The applied step load disturbances ΔPLi (pu), the output power of the wind farms PWT (MW), and the wind velocity VW (m/s) are shown in Figure 7.5.

The frequency deviation (∆f) and area control error (ACE) signals in the three areas are shown in Figures 7.6 and 7.7, respectively. The mechanical power produced by the AGC participating unit in area 2 (Pm2 for G9), the corresponding electrical power (Pe2), and the overall tie-line power for the same area (Ptie2) are shown in Figure 7.8.

The wind penetration in this system is considered as two individual wind farms, each with a capacity equivalent to about half of the total penetration.


FIGURE 7.4
Three-control area with RL-based control agents.


FIGURE 7.5
(a) Load step disturbances in three areas, (b) total wind power, and (c) the wind velocity pattern in area 1.


FIGURE 7.6
Frequency deviation in (a) Area 1, (b) Area 2, and (c) Area 3. Proposed intelligent method (solid), linear PI control (dotted).


FIGURE 7.7
ACE signal in (a) Area 1, (b) Area 2, and (c) Area 3. Proposed intelligent method (solid), linear PI control (dotted).

However, in the present simulation, the detailed dynamic nonlinear models of the thirty-nine-bus power system and the wind turbines are used, without applying an aggregation model for the generators or wind turbine units. That is why, in the simulation results, in addition to the long-term fluctuations, some fast oscillations on a timescale of 10 s are also observable.28,33

As shown in the simulation results, using the proposed method, the area control error and frequency deviation of all areas are properly driven close to zero. Furthermore, since the proposed algorithm is adaptive and based on learning methods, i.e., in each state it finds the locally optimal solution for the system objective (minimizing the ACE signal), the intelligent controllers provide smoother control action signals, and the area frequency deviations are smaller than those of the same system with conventional controllers.

 

 

7.3 Using GA to Determine Actions and States

The genetic algorithm (GA) can be used to obtain better results by tuning the quantized values of the state and action vectors. To quantize the state range and action range using the GA, each individual, which represents a candidate set of quantized values of states and actions, should be a double (real-valued) vector. Clearly, as the number of variables in a double vector increases, the states (quantized ACE signal values) are found more precisely. In the AGC problem, the system states are more important than the system actions (quantized ∆PC values) and have a greater effect on the overall system performance (keeping the ACE within a small band around zero), because a system with more states can learn more precisely than the same system with fewer states.


FIGURE 7.8
Area 2 power response using the proposed multiagent RL method.

First, the maximum number of states (ns) and the minimum useful number of actions (na) should be defined in the GA for the assumed AGC system. For the action variable, considering a small number of values increases the learning speed, because it is not necessary to examine extra actions in each state.26

7.3.1 Finding Individuals' Fitness and Variation Ranges

To find the eligibility (fitness) of each individual, na variables are randomly chosen from the individual, which contains (na + ns) variables, as the discretized values of the actions; these values are then scaled to the action range (each variable lies between 0 and 1, while the ∆PC action signal varies within [∆PCmin, ∆PCmax]). The remaining ns variables are the discretized values of the ACE signal, which are scaled to the valid range [ACEmin, ACEmax]. After scaling and finding the corresponding quantized state and action vectors, the model is run with these properties, and the individual's fitness is obtained from Equation 7.9. The individuals with the smallest fitness are the best.

$$\text{Individual Fitness} = \frac{\sum |ACE|}{\text{simulation time}} \qquad (7.9)$$
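A possible sketch of decoding a GA individual and evaluating Equation 7.9 is given below. The ranges, the values of na and ns, and the run_simulation stub are assumptions introduced only for illustration; the actual values come from the design choices discussed next.

```python
import numpy as np

# Illustrative decoding of a GA individual into quantized actions and states, and
# the fitness of Equation 7.9. The ranges, n_a, n_s, and the run_simulation stub
# are assumptions for illustration only.
n_a, n_s = 5, 8
dpc_min, dpc_max = -0.02, 0.02          # assumed [Delta P_C_min, Delta P_C_max] (pu)
ace_min, ace_max = -0.05, 0.05          # assumed [ACE_min, ACE_max] (pu)

def decode(individual):
    """individual: vector of (n_a + n_s) genes, each in [0, 1]."""
    genes = np.asarray(individual)
    actions = dpc_min + genes[:n_a] * (dpc_max - dpc_min)   # scaled action levels
    states = ace_min + genes[n_a:] * (ace_max - ace_min)    # scaled ACE levels
    return np.sort(actions), np.sort(states)

def fitness(individual, run_simulation, sim_time=100.0):
    """Equation 7.9: sum of |ACE| over the run divided by simulation time (smaller is better)."""
    actions, states = decode(individual)
    ace_trajectory = run_simulation(actions, states)        # user-supplied model run
    return float(np.sum(np.abs(ace_trajectory)) / sim_time)

# Dummy usage with a stand-in simulation that returns a synthetic ACE trajectory.
dummy_sim = lambda actions, states: 0.03 * np.exp(-np.linspace(0.0, 5.0, 1000))
print(fitness(np.random.rand(n_a + n_s), dummy_sim))
```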

The basic AGC task can be summarized as correcting the observed ACE within a limited range; if the ACE goes beyond this range, other emergency control steps may have to be taken by the operator. Let the range of ACE over which the AGC is expected to act properly be [ACEmin, ACEmax]. In practice, ACEmin and ACEmax must be determined by the operating policy of the area according to the existing frequency operating standards.33 ACEmax is the maximum ACE deviation that is expected to be corrected by the AGC (in practice, ACE deviations beyond this value are corrected only through operator intervention or may require an emergency control action). ACEmin is the minimum amplitude of ACE deviation required to trigger the AGC control loop.

The other variable to be quantized is the control action ∆PC. This also requires a design choice for the range [∆PCmin, ∆PCmax]. The ∆PCmax is automatically determined by the equipment constraints of the system; it is the maximum power change that can be effected within one AGC execution period. The ∆PCmin is the minimum change that can be demanded according to the dynamics of the AGC participating units. The following application example illustrates how the GA can be practically used to determine actions and states in a multiagent RL-based AGC design. Interested readers can find more details in Daneshfar and Bevrani26 and Daneshfar.34

7.3.2 Application to a Three-Control Area Power System

To illustrate the effectiveness of using GA in the RL algorithm for the proposed control strategy, a three-control area power system (same as the example used in Section 2.4 and Bevrani et al.35) is considered as a test system. Each control area includes three Gencos, and the power system parameters are given in Table 2.1. The schematic diagram of the system used for simulation studies is also shown in Figure 2.13.

After completing the design steps of the algorithm, the controller must be trained by running the simulation in the learning mode, as explained in Section 7.2. The performance results presented here correspond to the controllers after the learning phase has ended and the controllers' actions at the various states have converged to their optimal values. As described in the previous example (Section 7.2.4), at each AGC execution period, which is longer than the simulation sampling time, the control agent of each area averages all corresponding ACE signal instances measured by the sensors and all load change instances obtained during that execution period. The three average ACE values for the three areas, together with the three average load change values, form the current joint state vector xk, obtained according to the quantized states gained from the GA. Then, the control agents choose an action ak according to the quantized actions gained from the GA and the aforementioned exploration policy. Each joint action ak consists of three actions (∆PC1, ∆PC2, ∆PC3) that change the set points of the governors. Using these governor settings, the AGC process moves to the next execution period. During the next cycle (i.e., until the next AGC instant), the three average ACE values, one for each area, are formed for the next joint state xk+1.

In the presented simulation study, the input variable is obtained as follows: at each AGC execution period, the average values of the ACE signal instances corresponding to the three areas are calculated; they are the first state variables $(x^1_{avg1}, x^1_{avg2}, x^1_{avg3})$. In the multiagent RL process, the agents of all areas learn together; the joint state vector consists of the state vectors of all three areas, and the joint action vector consists of the action vectors of all three areas. This can be shown as the tuple <(X1, X2, X3), (A1, A2, A3), p, (r1, r2, r3)> or <X, A, p, r>, where $X_i = (x^1_{avg\,i}, x^2_i)$ is the discrete set of each area's states, X is the joint state, Ai is the discrete set of actions available to area i, and A is the joint action.

In each AGC execution period, after averaging ACEi for all areas (over the instances obtained in that period), depending on the current joint state (X1, X2, X3), the joint action (∆PC1, ∆PC2, ∆PC3) is chosen according to the exploration policy. Consequently, the reward r also depends on the joint action. Whenever the next state is desirable, i.e., all |ACEi| are less than ε, where ε is the smallest ACE signal value for which the AGC is expected to act, the reward function r is assigned a zero value. When the next state is undesirable, i.e., at least one |ACEi| is greater than ε, r is assigned the average of the –|ACEi| values.

For the simulation studies, the performance of the closed-loop system for the mentioned three-control area system using the linear robust proportional-integral (PI) controllers35 is compared to that of the designed multiagent RL controllers for the following simultaneous large load disturbances (step increases in demand) in the three areas:

ΔPL1 = 100 MW; ΔPL2 = 80 MW; ΔPL3 = 50 MW

The frequency deviation, ACE signal, and control action signals of the closed-loop system are shown in Figures 7.9 to 7.11. The simulation results show that the ACE and frequency deviation of all areas with the proposed intelligent GA-based multiagent RL controllers are properly driven back to zero, as they are with the robust PI controllers. The produced control action signals, which are proportional to the specified participation factors, are smooth enough to satisfy the physical generation constraints.


FIGURE 7.9
Frequency deviation in (a) Area 1, (b) Area 2, and (c) Area 3. Proposed intelligent method (solid), robust PI control (dotted).

 

 

7.4 An Agent for β Estimation

The frequency bias factor (β) is an important term in calculating the ACE (Equation 2.11). Since the control agents in the described AGC scheme provide control action signals based on the received ACE signals, to achieve more accurate results one may use an individual agent in each area for the estimation of β. Conventional approaches in tie-line bias control use the frequency bias coefficient –10B to offset the area's frequency response characteristic β. However, β is related to many factors, and only with –10B = β would the ACE react solely to internal disturbances. Therefore, several approaches have recently been proposed to approximate β for real-time applications instead of using a constant value.36–39

A multiagent AGC scheme including an estimator agent to estimate the β parameter is given in Daneshfar and Bevrani.26 The overall control framework is shown in Figure 7.12. The estimator agent in each control area calculates the β parameter and computes the ACE based on the received signals ∆Ptie, ∆f, ∆Pm, and ∆PL. The estimation algorithm is developed based on a dynamic representation of generation and load in the simplified AGC frequency response model (Figure 2.8), which can be described as follows:


FIGURE 7.10
ACE signal in (a) Area 1, (b) Area 2, and (c) Area 3. Proposed intelligent method (solid), robust PI control (dotted).

$$\sum_{j=1}^{n} \Delta P_{mji}(t) - \Delta P_{Li}(t) - \Delta P_{tie,i}(t) = 2H_i \frac{d}{dt}\Delta f_i(t) + D_i\,\Delta f_i(t) \qquad (7.10)$$

Substituting ∆Ptie,i (t) from Equation 2.1 in Equation 7.10 yields

$$\sum_{j=1}^{n} \Delta P_{mji}(t) - \Delta P_{Li}(t) + \beta_i\,\Delta f_i(t) - ACE_i(t) = 2H_i \frac{d}{dt}\Delta f_i(t) + D_i\,\Delta f_i(t) \qquad (7.11)$$

From Equation 7.11, ACE can be calculated in terms of other variables:

$$ACE_i(t) = \sum_{j=1}^{n} \Delta P_{mji}(t) - \Delta P_{Li}(t) + (\beta_i - D_i)\,\Delta f_i(t) - 2H_i \frac{d}{dt}\Delta f_i(t) \qquad (7.12)$$


FIGURE 7.11
Control action signal in (a) Area 1, (b) Area 2, and (c) Area 3. Proposed intelligent method (solid), robust PI control (dotted).


FIGURE 7.12
Control framework for area i, with estimator agent.

By applying the following definition for a moving average over a T second interval to Equation 7.12,

$$\bar{X}_T = \frac{1}{T}\int_{t_i}^{t_f} X(t)\,dt, \qquad t_f - t_i = T \text{ seconds} \qquad (7.13)$$

$\overline{ACE}_T$ can be obtained as follows:

$$\overline{ACE}_T = \frac{1}{T}\left\{ T\sum_{j=1}^{n} \overline{\Delta P}_{mji} - T\,\overline{\Delta P}_{Li} + (\beta_i - D_i)\,T\,\overline{\Delta f}_i \right\} - \frac{2H_i}{T}\left( \Delta f_i(t_f) - \Delta f_i(t_i) \right) \qquad (7.14)$$


FIGURE 7.13
The response of the estimator agent.

By applying the measured values of ACE and the other variables in the above equation over a time interval, the value of β can be estimated for the corresponding period. Since the value of β varies with system conditions, the model parameters have to be updated regularly using a recursive least squares (RLS) algorithm.40 Suitable values of the duration T depend on the system dynamic behavior: a larger T yields smoother β values, but the convergence to the proper value is slower. Figure 7.13 shows the estimated and calculated β over a 100 s simulation for area 1 of the three-control area system described in Section 7.3.2. For this test, the –10B of the target control area was set equal to βcal. As shown, βest converges rapidly to βcal and remains there over the rest of the run. Several simulation scenarios for the example at hand using an estimator agent are presented in Daneshfar and Bevrani.26 The application results of using the GA to optimize actions and states in the multiagent RL approach for the thirty-nine-bus test system are given in Daneshfar and Bevrani26 and Daneshfar.34
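A minimal sketch of such an estimator is given below: Equation 7.14 is rewritten as a linear relation y = β·Δf̄ and tracked with a scalar RLS update. The signal names, parameter values (D, H, T), and the forgetting factor are assumptions for illustration, not values from the example.

```python
class BetaEstimator:
    """Tracks the frequency bias factor beta with a scalar recursive least squares
    update, using the moving-average form of Equation 7.14. Parameter values are
    illustrative assumptions only."""

    def __init__(self, D=0.015, H=0.083, T=4.0, forgetting=0.98):
        self.D, self.H, self.T = D, H, T        # assumed area damping, inertia, window (s)
        self.lam = forgetting                   # RLS forgetting factor
        self.beta = 1.0                         # initial beta estimate
        self.P = 1000.0                         # initial (scalar) covariance

    def update(self, ace_bar, dpm_bar_sum, dpl_bar, df_bar, df_start, df_end):
        """One window update: rewrite Equation 7.14 as y = beta * df_bar, then one RLS step."""
        y = (ace_bar - dpm_bar_sum + dpl_bar
             + self.D * df_bar + (2.0 * self.H / self.T) * (df_end - df_start))
        phi = df_bar                            # scalar regressor
        k = self.P * phi / (self.lam + phi * self.P * phi)
        self.beta += k * (y - phi * self.beta)
        self.P = (self.P - k * phi * self.P) / self.lam
        return self.beta

est = BetaEstimator()
print(est.update(ace_bar=0.01, dpm_bar_sum=0.02, dpl_bar=0.015,
                 df_bar=-0.002, df_start=-0.001, df_end=-0.003))
```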

 

 

7.5 Summary

This chapter addresses the application of MASs to AGC design for interconnected power systems. General frameworks for agent-based control systems, based upon the foundations of agent theory, are discussed. A new multiagent AGC scheme has been introduced. The capability of reinforcement learning in the proposed AGC strategy is examined, and the application of the GA to determine actions and states during the learning process is discussed. The possibility of building additional agents, such as estimator agents, to cope with real-world AGC systems is explained.

Model independence, scalability, flexibility in specifying the control objective, the decentralized property, and the capability of parallel processing, as the main features of the new approach, make it very attractive for application in AGC design. The application results for the power system examples show that the RL-based multiagent control schemes perform well in comparison to robust PI control designs. In Chapter 8, another multiagent AGC scheme, based on Bayesian networks, is presented.

 

 

References

1. K. P. Sycara. 1998. Multiagent systems. AI Magazine 19(2):79–92.

2. M. Wooldridge, N. Jennings. 1995. Intelligent agents: Theory and practice. Knowledge Eng. Rev. 10(2):115–52.

3. G. Weiss. 1999. Multiagent systems: A modern approach to distributed artificial intelligence. Cambridge, MA: MIT Press.

4. P. Stone, M. Veloso. 2000. Multiagent systems: A survey from a machine learning perspective. Autonomous Robots 8(3):345–83.

5. M. Wooldridge, N. R. Jennings. 1995. Agent theories, architectures, and languages: A survey. In Intelligent agents, 1–39. Berlin: Springer.

6. Power Systems Engineering Research Center (PSERC). 2008. Agent modeling for integrated power systems. Final project report. PSERC. http://www.pserc.wisc.edu/documents/publications/.

7. S. D. J. McArthur, E. M. Davidson, V. M. Catterson, A. L. Dimeas, N. D. Hatziargyriou, F. Ponci, T. Funabashi. 2007. Multi-agent systems for power engineering applications. Part I. Concepts, applications and technical challenges. IEEE Trans. Power Syst. 22(4):1743–52.

8. S. D. J. McArthur, E. M. Davidson, V. M. Catterson, A. L. Dimeas, N. D. Hatziargyriou, F. Ponci, T. Funabashi. 2007. Multi-agent systems for power engineering applications. Part II. Technologies, standards and tools for multi-agent systems. IEEE Trans. Power Syst. 22(4):1753–59.

9. P. Wei, Y. Yan, Y. Ni, J. Yen, F. Wu. 2001. A decentralized approach for optimal wholesale cross-border trade planning using multi-agent technology. IEEE Trans. Power Syst. 16:833–38.

10. A. L. Dimeas, N. D. Hatziargyriou. 2005. Operation of a multiagent system for microgrid control. IEEE Trans. Power Syst. 20:1447–55.

11. H. Voos. 2000. Intelligent agents for supervision and control: A perspective. In Proceedings of the 15th IEEE International Symposium on Intelligent Control (ISIC), Rio Patras, Greece, pp. 339–44.

12. S. Russell, P. Norvig. 1995. Artificial intelligence: A modern approach. Englewood Cliffs, NJ: Prentice-Hall.

13. M. Wooldridge, G. Weiss, eds. 1999. Intelligent agents, in multi-agent systems, 3–51. Cambridge, MA: MIT Press.

14. B. C. Williams, M. D. Ingham, S. H. Chung, P. H. Elliott. 2003. Model-based programming of intelligent embedded systems and robotic space explorers. Proc. IEEE 91(1):212–37.

15. F. Bellifemine, A. Poggi, G. Rimassa. 2001. Developing multi-agent systems with jade. In Intelligent agents, ed. C. Castelfranchi, Y. Lesperance, 89–103. Vol. VII, no. 1571 of Lecture Notes in Artificial Intelligence. Heidelberg, DE: Springer-Verlag.

16. Y. Xiang. 2002. Probabilistic reasoning in multiagent systems: A graphical models approach. Cambridge: Cambridge University Press.

17. D. Poole, A. Mackworth, R. Goebel. 1998. Computational intelligence: A logical approach. New York: Oxford University Press.

18. M. H. Hassoun. 1995. Fundamentals of artificial neural networks. Cambridge, MA: MIT Press.

19. R. S. Sutton, A. G. Barto. 1998. Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

20. H. Bevrani, F. Daneshfar, P. R. Daneshmand, T. Hiyama. 2010. Intelligent automatic generation control: Multi-agent Bayesian networks approach. In Proceedings of IEEE International Conference on Control Applications, Yokohama, Japan, CD-ROM.

21. V. Gazi, B. Fidan. 2007. Coordination and control of multi-agent dynamic systems: Models and approaches. In Swarm robotics, ed. E. Sahin, et al., 71–102. LNCS 4433. Berlin Heidelberg: Springer-Verlag.

22. V. I. Utkin. 1992. Sliding modes in control and optimization. Berlin: Springer-Verlag.

23. H. Li. 2006. A framework for coordinated control of multi-agent systems. PhD thesis, University of Waterloo, Waterloo, Ontario, Canada.

24. R. de Boer, J. Kok. 2002. The incremental development of a synthetic multi-agent system: The UvA trilearn 2001 robotic soccer simulation team. Master’s thesis, Faculty of Science, University of Amsterdam.

25. T. P. I. Ahamed, P. S. N. Rao, P. S. Sastry. 2006. Reinforcement learning controllers for automatic generation control in power systems having reheat units with GRC and dead-band. Int. J. Power Energy Syst. 26:137–46.

26. F. Daneshfar, H. Bevrani. 2010. Load-frequency control: A GA-based multi-agent reinforcement learning. IET Gener. Transm. Distrib. 4(1):13–26.

27. S. Eftekharnejad, A. Feliachi. 2007. Stability enhancement through reinforcement learning: Load frequency control case study. Bulk Power Syst. Dynamics Control VII:1–8.

28. H. Bevrani, F. Daneshfar, P. R. Daneshmand. 2010. Intelligent power system frequency regulation concerning the integration of wind power units. In Wind power systems: Applications of computational intelligence, ed. L. F. Wang, C. Singh, A. Kusiak, 407–37. Springer Book Series on Green Energy and Technology. Heidelberg: Springer-Verlag.

29. H. Bevrani, F. Daneshfar, P. R. Daneshmand, T. Hiyama. 2010. Reinforcement learning based multi-agent LFC design concerning the integration of wind farms. In Proceedings of IEEE International Conference on Control Applications, Yokohama, Japan, CD-ROM.

30. L. Busoniu, R. Babuska, B. de Schutter. 2008. A comprehensive survey of multi-agent reinforcement learning. IEEE Trans. Syst. Man. Cyber. C Appl. Rev. 38:156–72.

31. E. Yang, D. Gu. 2004. Multi-agent reinforcement learning for multi-robot systems: A survey. Technical Report CSM-404, University of Essex, Colchester, UK.

32. M. A. L. Thathachar, B. R. Harita. 1998. An estimator algorithm for learning automata with changing number of actions. Int. J. Gen. Syst. 14:169–84.

33. H. Bevrani. 2009. Robust power system frequency control. New York: Springer.

34. F. Daneshfar. 2009. Automatic generation control using multi-agent systems. MSc dissertation, Department of Electrical and Computer Engineering, University of Kurdistan, Sanandaj, Iran.

35. H. Bevrani, Y. Mitani, K. Tsuji. 2004. Robust decentralised load-frequency control using an iterative linear matrix inequalities algorithm. IEE Proc. Gener. Transm. Distrib. 3(151):347–54.

36. N. B. Hoonchareon, C. M. Ong, R. A. Kramer. 2002. Feasibility of decomposing ACE to identify the impact of selected loads on CPS1 and CPS2. IEEE Trans. Power Syst. 22(5):752–56.

37. L. R. Chang-Chien, N. B. Hoonchareon, C. M. Ong, R. A. Kramer. 2003. Estimation of β for adaptive frequency bias setting in load frequency control. IEEE Trans. Power Syst. 18(2):904–11.

38. L. R. Chang-Chien, C. M. Ong, R. A. Kramer. 2002. Field tests and refinements of an ACE model. IEEE Trans. Power Syst. 18(2):898–903.

39. L. R. Chang-Chien, Y. J. Lin, C. C. Wu. 2007. An online approach to allocate operating reserve for an isolated power system. IEEE Trans. Power Syst. 22(3):1314–21.

40. E. K. P. Chong, S. H. Zak. 1996. An introduction to optimization. New York: John Wiley & Sons Press.
