Imagine that we need to design an enemy that selects different actions over time as the player progresses through the game and their patterns change, or a game for training different types of pets that have free will to some extent. For these types of tasks, we can use a family of techniques aimed at modeling learning based on experience. One of these algorithms is Q-learning, which will be implemented in this recipe.
Before delving into the main algorithm, we need certain data structures in place: a structure for game states, another for game actions, and a class for defining an instance of the problem. They can coexist in the same file.
The following is an example of the data structure for defining a game state:
public struct GameState
{
    // TODO
    // your state definition here
}
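For instance, if we were modeling a grid-based game, the TODO could be filled in as follows. This is only a sketch, and the x and y fields are assumptions; the equality and hashing overrides matter because states will later be used as dictionary keys:

// A hypothetical state for a grid-based game: the agent's cell coordinates
public struct GameState
{
    public int x;
    public int y;

    // Explicit equality and hashing avoid the slower
    // reflection-based defaults for structs
    public override bool Equals(object obj)
    {
        if (!(obj is GameState))
            return false;
        GameState other = (GameState)obj;
        return x == other.x && y == other.y;
    }

    public override int GetHashCode()
    {
        return x * 31 + y;
    }
}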
Next is an example of the data structure for defining a game action:
public struct GameAction
{
    // TODO
    // your action definition here
}
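Continuing the grid-based sketch, an action could simply wrap a movement direction. The Direction enum is an assumption for illustration, and the same advice about equality and hashing applies, since actions are also used as dictionary keys:

// A hypothetical action for the same grid-based game
public enum Direction { North, South, East, West }

public struct GameAction
{
    public Direction direction;
}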
Finally, we will build the data type for defining a problem instance:
public class ReinforcementProblem
{
    // Retrieves a random state of the game world
    public virtual GameState GetRandomState()
    {
        // TODO
        // Define your own behaviour
        return new GameState();
    }

    // Retrieves the actions available from a given state
    public virtual GameAction[] GetAvailableActions(GameState s)
    {
        // TODO
        // Define your own behaviour
        return new GameAction[0];
    }

    // Takes an action from a given state, computes the reward
    // obtained, and retrieves the resulting state
    public virtual GameState TakeAction(
            GameState s,
            GameAction a,
            ref float reward)
    {
        // TODO
        // Define your own behaviour
        reward = 0f;
        return new GameState();
    }
}
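To make the contract concrete, a subclass for the hypothetical grid world sketched earlier might look like the following. GridWorldProblem, Direction, and the x and y fields are assumptions carried over from those sketches, not part of the recipe:

using UnityEngine;

public class GridWorldProblem : ReinforcementProblem
{
    private const int size = 5; // a 5x5 grid, chosen arbitrarily

    public override GameState GetRandomState()
    {
        GameState s = new GameState();
        s.x = Random.Range(0, size);
        s.y = Random.Range(0, size);
        return s;
    }

    public override GameAction[] GetAvailableActions(GameState s)
    {
        // All four directions are always offered; walls clamp movement
        return new GameAction[] {
            new GameAction { direction = Direction.North },
            new GameAction { direction = Direction.South },
            new GameAction { direction = Direction.East },
            new GameAction { direction = Direction.West }
        };
    }

    public override GameState TakeAction(
            GameState s,
            GameAction a,
            ref float reward)
    {
        GameState n = s;
        if (a.direction == Direction.North) n.y = Mathf.Min(n.y + 1, size - 1);
        if (a.direction == Direction.South) n.y = Mathf.Max(n.y - 1, 0);
        if (a.direction == Direction.East) n.x = Mathf.Min(n.x + 1, size - 1);
        if (a.direction == Direction.West) n.x = Mathf.Max(n.x - 1, 0);
        // The agent is rewarded only upon reaching the goal corner
        reward = (n.x == size - 1 && n.y == size - 1) ? 1f : 0f;
        return n;
    }
}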
We will implement two classes. The first one stores the q values in a dictionary for learning purposes, and the second one actually runs the Q-learning algorithm. The following is the QValueStore class:

using UnityEngine;
using System.Collections.Generic;

public class QValueStore : MonoBehaviour
{
    // Maps each state to the actions tried from it and their q values
    private Dictionary<GameState, Dictionary<GameAction, float>> store;

    // MonoBehaviours should not define constructors,
    // so the dictionary is initialized in Awake instead
    void Awake()
    {
        store = new Dictionary<GameState, Dictionary<GameAction, float>>();
    }

    // Retrieves the q value for the given state-action pair
    public virtual float GetQValue(GameState s, GameAction a)
    {
        // TODO: your behaviour here
        return 0f;
    }

    // Retrieves the best action recorded for the given state
    public virtual GameAction GetBestAction(GameState s)
    {
        // TODO: your behaviour here
        return new GameAction();
    }

    public void StoreQValue(GameState s, GameAction a, float val)
    {
        // Create the inner dictionary the first time a state is seen
        if (!store.ContainsKey(s))
            store.Add(s, new Dictionary<GameAction, float>());
        // The indexer creates or overwrites the entry for the action
        store[s][a] = val;
    }
}
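The recipe leaves GetQValue and GetBestAction as TODOs. A minimal sketch of their bodies, assuming they are completed directly inside QValueStore, could be the following; note that GetBestAction returns a default action for states that have never been visited:

public virtual float GetQValue(GameState s, GameAction a)
{
    // Unseen state-action pairs default to a q value of zero
    Dictionary<GameAction, float> actions;
    if (store.TryGetValue(s, out actions) && actions.ContainsKey(a))
        return actions[a];
    return 0f;
}

public virtual GameAction GetBestAction(GameState s)
{
    GameAction best = new GameAction();
    float maxQ = float.MinValue;
    if (!store.ContainsKey(s))
        return best;
    // Linear scan over the actions recorded for this state
    foreach (KeyValuePair<GameAction, float> pair in store[s])
    {
        if (pair.Value > maxQ)
        {
            maxQ = pair.Value;
            best = pair.Key;
        }
    }
    return best;
}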
QLearning
class, which will run the algorithm:using UnityEngine; using System.Collections; public class QLearning : MonoBehaviour { public QValueStore store; }
Inside the same class, define a private function for picking a random action from a given set:

private GameAction GetRandomAction(GameAction[] actions)
{
    // Select a uniformly random index into the action array
    int n = actions.Length;
    return actions[Random.Range(0, n)];
}
Next, define the Learn coroutine, which runs the Q-learning algorithm; the steps that follow go inside its body:

public IEnumerator Learn(
        ReinforcementProblem problem,
        int numIterations,
        float alpha,    // learning rate
        float gamma,    // discount factor
        float rho,      // probability of exploring a random action
        float nu)       // probability of jumping to a new random state
{
    // next steps
}
First, validate that the store has been assigned; otherwise, stop the coroutine:

if (store == null)
    yield break;
Get a random state to start from, and run the main loop; the next steps go inside this loop:

GameState state = problem.GetRandomState();
for (int i = 0; i < numIterations; i++)
{
    // next steps
}
Yield at the start of each iteration, so the learning work is spread across frames:

yield return null;
With probability nu, jump to a new random state; otherwise, continue the walk from the current one:

if (Random.value < nu)
    state = problem.GetRandomState();
Get the actions available from the current game state:

GameAction[] actions = problem.GetAvailableActions(state);
GameAction action;
With probability rho, explore by picking a random action; otherwise, exploit the best action known so far:

if (Random.value < rho)
    action = GetRandomAction(actions);
else
    action = store.GetBestAction(state);
Take the chosen action, retrieving the reward obtained and the resulting state:

float reward = 0f;
GameState newState = problem.TakeAction(state, action, ref reward);
Get the q value, given the current state and the action taken, as well as the best action and maximum q value for the new state computed before:

float q = store.GetQValue(state, action);
GameAction bestAction = store.GetBestAction(newState);
float maxQ = store.GetQValue(newState, bestAction);
Apply the Q-learning update formula, blending the old estimate with the newly observed reward:

q = (1f - alpha) * q + alpha * (reward + gamma * maxQ);
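To see the blend with some illustrative numbers: with alpha = 0.5, gamma = 0.9, a stored q of 1, a reward of 2, and a maxQ of 3, the update yields q = 0.5 * 1 + 0.5 * (2 + 0.9 * 3) = 2.85, mixing the old estimate and the new evidence according to the learning rate.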
Finally, store the computed q value, using its parent state and action as indices, and move on to the new state:

store.StoreQValue(state, action, q);
state = newState;
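Everything can then be wired together from a regular component. The following driver is purely illustrative: TrainingDriver and GridWorldProblem are assumed names from the earlier sketches, and the parameter values are arbitrary starting points:

using UnityEngine;

public class TrainingDriver : MonoBehaviour
{
    void Start()
    {
        // Assumes QLearning (with its store set in the Inspector)
        // lives on the same game object
        QLearning learner = GetComponent<QLearning>();
        ReinforcementProblem problem = new GridWorldProblem();
        // 1000 iterations; alpha = 0.3, gamma = 0.9, rho = 0.2, nu = 0.1
        StartCoroutine(learner.Learn(problem, 1000, 0.3f, 0.9f, 0.2f, 0.1f));
    }
}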