Chapter 8
An MDP-Based Dynamic Optimization Methodology for Embedded Wireless Sensor Networks*

Embedded wireless sensor networks (EWSNs) are distributed systems consisting of spatially distributed autonomous sensor nodes that span diverse application domains. The EWSN application domains include security and defense systems, industrial monitoring, building automation, logistics, ecology, environment and ambient conditions monitoring, healthcare, home and office applications, vehicle tracking, and so on. However, this wide application diversity combined with increasing embedded sensor node complexity, functionality requirements, and highly constrained operating environments makes EWSN design very challenging.

One critical EWSN design challenge involves meeting application requirements such as reliability, lifetime, throughput, and delay (responsiveness) for a myriad of application domains. For example, a vineyard irrigation system may require less responsiveness to environmental stimuli (e.g., decreased irrigation during wet periods), but have a long lifetime requirement. On the other hand, in a disaster relief application, sensor nodes may require high responsiveness but have a short lifetime. Additional requirements may include high adaptability to rapid network changes as sensor nodes are destroyed. Meeting these application-specific requirements is critical to accomplishing the application's assigned function. Nevertheless, satisfying these demands in a scalable and cost-effective way is a challenging task.

Commercial off-the-shelf (COTS) embedded sensor nodes have difficulty meeting application requirements due to inherent manufacturing traits. In order to reduce manufacturing costs, generic COTS embedded sensor nodes capable of implementing nearly any application are produced in large volumes and are not specialized to meet any specific application requirements. In order to meet application requirements, sensor nodes must possess tunable parameters. Fortunately, some COTS sensor nodes have tunable parameters such as processor voltage, processor frequency, sensing frequency, radio transmission power, and radio transmission frequency.

Embedded sensor node parameter tuning is the process of determining appropriate parameter values that meet application requirements. However, determining such values presents several tuning challenges. First, application managers (the individuals responsible for EWSN deployment and management) typically lack sufficient technical expertise [297, 298], as many managers are nonexperts (e.g., biologists, teachers, structural engineers, agriculturists). In addition, parameter value tuning is still a cumbersome and time-consuming task even for expert application managers because of unpredictable EWSN environments and the difficulty of creating accurate simulation environments. Second, manually selected parameter values are often suboptimal. Given a highly configurable embedded sensor node with many tunable parameters, each with many possible values, choosing the optimal combination is difficult. In addition, unanticipated changes in the sensor node's environment can alter the optimal parameter values. For example, an embedded sensor node designed to monitor a short-lived volcanic eruption may need to operate for months or years longer than expected if earthquakes alter the magma flow.

To ease parameter value selection, dynamic optimizations enable embedded sensor nodes to dynamically tune their parameter values in situ according to application requirements and environmental stimuli. This dynamic tuning of parameters ensures that an EWSN performs the assigned task optimally, enabling the embedded sensor node to constantly conform to the changing environment. In addition, the application manager need not know embedded sensor node and/or dynamic optimization specifics, thus easing parameter tuning for nonexpert application managers.

Unfortunately, there exists little previous work on EWSN dynamic optimizations that relates high-level application requirements to low-level embedded sensor node parameters. Moreover, previous work does not address changes in application requirements over time. Hence, novel dynamic optimization methodologies that respond to changing application requirements and environmental stimuli are essential.

In this chapter, we propose an application-oriented dynamic tuning methodology for embedded sensor nodes in distributed EWSNs based on Markov decision processes (MDPs). Our MDP-based application-oriented tuning methodology performs dynamic voltage, frequency, and sensing (sampling) frequency scaling (DVFS2). MDP is an appropriate candidate for EWSN dynamic optimizations, where dynamic decision making is a requirement in light of changing environmental stimuli and wireless channel conditions. We focus on DVFS2 for several reasons. Traditional microprocessor-based systems use dynamic voltage and frequency scaling (DVFS) for energy optimizations. However, sensor nodes are distinct from traditional systems in that they have embedded sensors coupled with an embedded processor. Therefore, DVFS provides only a partial tuning methodology and does not consider sensing frequency. Sensing frequency tuning is essential for sensor nodes to meet application requirements because the sensed data delay (the delay between the sensor sensing the data and the data's reception by the application manager) depends on the sensor node's sensing frequency, which influences the amount of processed and communicated data. Thus, DVFS2 provides enhanced optimization potential as compared to DVFS with respect to EWSNs.

Our main contributions in this chapter are as follows:

  • We propose an MDP for dynamic optimization of embedded sensor nodes in EWSNs. MDP is suitable for embedded sensor node dynamic optimization because of MDP's inherent ability to perform dynamic decision making. This chapter presents a first step toward MDP-based dynamic optimization.
  • Our MDP-based dynamic optimization methodology gives a policy that performs DVFS2 and specifies a good-quality solution for embedded sensor node parameter tuning over the EWSN lifetime.
  • Our MDP-based dynamic tuning methodology adapts to changing application requirements and environmental stimuli.
  • We provide implementation guidelines for our proposed dynamic tuning methodology in embedded sensor nodes.

We compare our proposed MDP-based application-oriented dynamic tuning methodology with several fixed heuristics. The results show that our proposed methodology outperforms the other heuristics for the given application requirements. The broader impacts of our research include helping EWSN designers (persons who design EWSNs for an application) better meet application requirements by selecting suitable tunable parameter values for each sensor node. As this chapter presents a first step toward MDP-based dynamic optimization, our work can potentially spark further research in MDP-based optimizations for embedded sensor nodes in EWSNs.

The remainder of this chapter is organized as follows. A review of related work is given in Section 8.1. Section 8.2 provides an overview of MDP with respect to EWSNs and our proposed MDP-based application-oriented dynamic tuning methodology. Section 8.3 describes the formulation of our proposed methodology as an MDP. Section 8.4 provides implementation guidelines and our proposed methodology's computational complexity. Section 8.5 provides model extensions to our proposed policy. Numerical results are presented in Section 8.6. Section 8.7 concludes our study and outlines future research work directions.

8.1 Related Work

There is a large body of research in the area of dynamic optimizations [187–189, 299], but most previous work focuses on the processor or memory (cache) in computer systems. These endeavors can provide valuable insights into embedded sensor node dynamic optimizations, but they are not directly applicable to embedded sensor nodes because of different design spaces, platform particulars, and embedded sensor nodes' tight design constraints.

Stevens-Navarro et al. [299] applied MDPs to vertical handoff decisions in heterogeneous wireless networks. Although our work leverages the reward function idea from their work, our work, to the best of our knowledge, is the first to apply MDPs to dynamic optimizations of embedded sensor nodes in EWSNs.

Little previous work exists in the area of application-specific tuning and dynamic profiling in EWSNs. Sridharan and Lysecky [300] obtained accurate environmental stimuli by dynamically profiling the EWSN's operating environment; however, they did not propose any methodology to leverage these profiling statistics for optimizations. Tilak et al. [301] investigated infrastructure (referred to as sensor node characteristics, number of deployed sensors, and deployment strategy) trade-offs on application requirements. The application requirements considered were accuracy, latency, energy efficiency, fault tolerance, goodput (ratio of total number of packets received to the total number of packets sent), and scalability. However, the authors did not delineate the interdependence between low-level sensor node parameters and high-level application requirements. Kogekar et al. [227] proposed an approach for dynamic software reconfiguration in EWSNs using dynamically adaptive software. Their approach used tasks to detect environmental changes (event occurrences) and adapt the software to the new conditions. Their work did not consider embedded sensor node tunable parameters. Kadayif and Kandemir [302] proposed an automated strategy for data filtering to determine the amount of computation or data filtering to be done at the sensor nodes before transmitting data to the sink node. Unfortunately, the authors only studied the effects of data filtering tuning on energy consumption and did not consider other sensor node parameters and application requirements.

Some previous and current work investigates EWSN operation under changing application (mission) requirements and environmental stimuli. Marrón et al. [303] presented TinyCubus, an adaptive cross-layer architecture for TinyOS-based sensor networks that allowed dynamic management of components (e.g., caching, aggregation, broadcast strategies) and reliable code distribution considering EWSN topology. TinyCubus considered optimization parameters (e.g., energy, communication latency, and bandwidth), application requirements (e.g., reliability), and system parameters (e.g., mobility). The system parameters selected the best set of components based on current application requirements and optimization parameters. Vecchio [304] discussed adaptability in EWSNs at three different levels: the communication level (by tuning the communication scheme), the application level (by software changes), and the hardware level (by injecting new sensor nodes). The International Technology Alliance (ITA) in Network and Information Sciences, sponsored by the UK Ministry of Defence (MoD) and the US Army Research Laboratory (ARL), investigates task reassignment and reconfiguration (including physical movement of sensor nodes or reprocessing of data) of already deployed sensor nodes in the sensor field in response to current or predicted future conditions to provide the expected sensed information at a sufficient quality [305]. However, current ITA projects, to the best of our knowledge, do not consider sensor node parameter tuning, whereas our MDP-based parameter tuning can optimize EWSN operation under changing application requirements.

Several papers explored DVFS for reduced energy consumption. Pillai and Shin [306] proposed real-time dynamic voltage scaling (RT-DVS) algorithms that modify the operating system's real-time scheduler and task management service for reduced energy consumption. Childers et al. [307] proposed a technique for adjusting supply voltage and frequency at runtime to conserve energy. Their technique monitored a program's instruction-level parallelism (ILP) and adjusted processor voltage and speed in response to the current ILP. Their proposed technique allowed users to specify performance constraints, which the hardware maintained while running at the lowest energy consumption.

Liu and Svensson [308] investigated reducing processor speed by varying the supply and threshold voltages for low power consumption in complementary metal-oxide-semiconductor (CMOS) very-large-scale integration (VLSI). Results showed that an optimized threshold voltage revealed an 8× power savings without any negative performance impacts. In addition, significantly greater energy savings could be achieved by reducing processor speed in tandem with threshold voltage. Burd et al. [309] presented a microprocessor system that dynamically varied its supply voltage and clock frequency to deliver high throughput during critical high-speed execution periods and extended battery life during low-speed execution periods. Results revealed that dynamic voltage scaling (DVS) could improve energy efficiency by 10× for battery-powered processor systems without sacrificing peak throughput.

Min et al. [225] demonstrated that DVS in a sensor node's processor reduced energy consumption. Their technique used a voltage scheduler, running in tandem with the operating system's task scheduler, to adjust voltage and frequency based on a priori knowledge of the predicted sensor node's workload. Yuan and Qu [226] studied a DVFS system for sensor nodes, which required sensor nodes sending data to insert additional information into a transmitted data message's header such as the packet length, expected processing time, and deadline. The receiving sensor node used this message information to select an appropriate processor voltage and frequency to minimize the overall energy consumption.

Some previous works in EWSN optimizations explored greedy and simulated annealing (SA)-based methods. Lysecky and Vahid [310] proposed SA-based automated application-specific tuning of parameterized sensor-based embedded systems and found that automated tuning can improve EWSN operation by 40% on average. Verma [311] studied SA and particle swarm optimization (PSO) methods for automated application-specific tuning and observed that SA performed better than PSO because PSO often converged quickly to local minima. In prior work, Munir et al. [312] proposed greedy-based and SA-based algorithms for parameter tuning. Greedy- and SA-based algorithms are lightweight, but these algorithms do not guarantee convergence to an optimal solution.

Previous work related to DVFS exists, and several initiatives toward application-specific tuning have been taken. Nevertheless, the literature presents no mechanisms to determine a dynamic tuning policy for sensor node parameters in accordance with changing application requirements. To the best of our knowledge, we propose the first methodology to address EWSN dynamic optimizations with the goal of meeting application requirements in a dynamic environment.

8.2 MDP-Based Tuning Overview

In this section, we present our MDP-based tuning methodology along with an MDP overview with respect to EWSNs [137].

8.2.1 MDP-Based Tuning Methodology for Embedded Wireless Sensor Networks

Figure 8.1 depicts the process diagram for our MDP-based application-oriented dynamic tuning methodology. Our methodology consists of three logical domains: the application characterization domain, the communication domain, and the sensor node tuning domain.


Figure 8.1 Process diagram for our MDP-based application-oriented dynamic tuning methodology for embedded wireless sensor networks

The application characterization domain refers to the EWSN application's characterization/specification. In this domain, the application manager defines various application metrics (e.g., tolerable power consumption, tolerable delay), which are calculated from (or based on) application requirements. The application manager also assigns weight factors to the application metrics to signify each metric's importance. Weight factors provide application managers with an easy method to express the relative importance of the application metrics. The application manager defines an MDP reward function, which signifies the overall reward (revenue) for given application requirements. The application metrics along with the associated weight factors represent the MDP reward function parameters.

The communication domain contains the sink node (which gathers statistics from the embedded sensor nodes) and encompasses the communication network between the application manager and the sensor nodes. The application manager transmits the MDP reward function parameters to the sink node via the communication domain. The sink node in turn relays the reward function parameters to the embedded sensor nodes.

The sensor node tuning domain consists of embedded sensor nodes and performs sensor node tuning. Each sensor node contains an MDP controller module, which implements our MDP-based tuning methodology (summarized here and described in detail in Section 8.3). After an embedded sensor node receives reward function parameters from the sink node through the communication domain, the sensor node invokes the MDP controller module. The MDP controller module calculates the MDP-based policy. The MDP-based policy prescribes the sensor node actions to meet application requirements over the lifetime of the sensor node. An action prescribes the sensor node state (defined by processor voltage, processor frequency, and sensing frequency) to which the node should transition from its current state. The sensor node identifies its current operating state, determines the action $a$ prescribed by the MDP-based policy (i.e., whether to continue operation in the current state or transition to another state), and subsequently executes action $a$.

Our proposed MDP-based dynamic tuning methodology can adapt to changes in application requirements (since application requirements may change with time, for example, a defense system initially deployed to monitor enemy troop positions for 4 months may later be required to monitor troop activity for an extended period of 6 months). Whenever application requirements change, the application manager updates the reward function (and/or associated parameters) to reflect the new application requirements. Upon receiving the updated reward function, the sensor node reinvokes the MDP controller module and determines the new MDP-based policy to meet the new application requirements.

Our MDP-based application-oriented dynamic tuning methodology reacts to environmental stimuli via a dynamic profiler module in the sensor node tuning domain. The dynamic profiler module monitors environmental changes over time and captures unanticipated environmental situations not predictable at design time [300]. The dynamic profiler module may be connected to the sensor node and gathers profiling statistics (e.g., wireless channel condition, number of packets dropped, packet size, radio transmission power) when triggered by the EWSN application. The dynamic profiler module informs the application manager of the profiled statistics via the communication domain. After receiving the profiling statistics, the application manager evaluates the statistics and possibly updates the reward function parameters. This reevaluation process may be automated, thus eliminating the need for continuous application manager input. Based on these received profiling statistics and updated reward function parameters, the sensor node MDP controller module determines whether application requirements are met. If application requirements are not met, the MDP controller module reinvokes the MDP-based policy to determine a new operating state to better meet application requirements. This feedback process continues to ensure that the application requirements are met in the presence of changing environmental stimuli.

Algorithm 3 depicts our MDP-based dynamic tuning methodology in algorithmic form, which applies to all embedded sensor nodes in the EWSN (line 1) and spans the entire lifetime of the embedded sensor nodes (line 2). The algorithm determines an MDP-based policy $\pi_{MDP}$ according to the application's initial requirements (captured by the MDP reward function parameters) at the time of EWSN deployment (lines 3 and 4). The algorithm determines a new MDP-based policy to adapt to changing application requirements (lines 6 and 7). After determining $\pi_{MDP}$, the algorithm specifies the action $a$ prescribed by $\pi_{MDP}$ for each embedded sensor node state $s$, given the total number of states $M$ (lines 9–17).
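As a concrete illustration of Algorithm 3's control flow, the following Python sketch outlines the per-node tuning loop. It is a minimal sketch under stated assumptions: the helper routines compute_mdp_policy, receive_update, and apply_operating_state are hypothetical placeholders for the policy iteration solver, the sink-node communication, and the hardware-level DVFS2 actuation, respectively, and are not part of the chapter's actual implementation.

```python
def sensor_node_tuning_loop(states, reward_params, lifetime_epochs,
                            compute_mdp_policy, receive_update, apply_operating_state):
    """Sketch of Algorithm 3: runs on each embedded sensor node for its entire lifetime."""
    # Determine the initial MDP-based policy from the deployment-time reward function parameters.
    policy = compute_mdp_policy(states, reward_params)
    current = 0                                   # index of an arbitrary initial operating state
    for epoch in range(lifetime_epochs):          # one iteration per decision epoch
        update = receive_update()                 # updated reward function parameters, if any
        if update is not None:                    # application requirements changed
            reward_params = update
            policy = compute_mdp_policy(states, reward_params)
        action = policy[current]                  # action prescribed for the current state
        current = action                          # stay or transition as prescribed
        apply_operating_state(states[current])    # set (Vp, Fp, Fs) on the node's hardware
```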

8.2.2 MDP Overview with Respect to Embedded Wireless Sensor Networks

In this section, we define basic MDP terminology in the context of EWSNs and give an overview of our proposed MDP-based dynamic tuning policy formulation for embedded sensor nodes. MDPs, also known as stochastic dynamic programming, are used to model and solve dynamic decision-making problems. We use standard notations as defined in [313] for our MDP-based problem formulation. Table 8.1 presents a summary of key notations used in this chapter.

Table 8.1 Summary of MDP notations

Notation   Description
$s$   Sensor node state
$S$   State space
$N$   Number of decision epochs
$M$   Number of sensor node state tuples
$r_t(s, a)$   Reward at time $t$ given state $s$ and action $a$
$a_t^{i,j}$   Action to transition from state $i$ to state $j$
$A_s$   Allowable actions in state $s$
$p_t(\cdot \mid s, a)$   Transition probability function
$v^\pi(s)$   Expected reward of policy $\pi$ with initial state $s$
$\lambda$   Discount factor
$v$   Expected total reward
$v_\lambda$   Expected total discounted reward
$v_\lambda^*$   Maximum expected total discounted reward
$d^*$   Optimal decision rule
$d_t$   Decision rule at time $t$
$\pi^*$   Optimal policy
$\pi_{MDP}$   MDP-based policy

The basic elements of an MDP model are decision epochs and periods, states, action sets, transition probabilities, and rewards. An MDP is Markovian (memoryless) because the transition probabilities and rewards depend on the past only through the current state and the action selected by the decision maker in that state. The decision epochs refer to the points of time during a sensor node's lifetime at which the sensor node makes a decision. Specifically, a sensor node makes a decision regarding its operating state at these decision epochs (i.e., whether to continue operating at the current state (processor voltage, frequency, and sensing frequency) or transition to another state). We consider a discrete time process where time is divided into periods and a decision epoch corresponds to the beginning of a period. The set of decision epochs can be denoted as $T = \{1, 2, 3, \ldots, N\}$, where $N \leq \infty$ denotes the sensor node's lifetime (each individual time period in $T$ can be denoted as time $t$). The decision problem is referred to as a finite horizon problem when the decision-making horizon $N$ is finite and as an infinite horizon problem otherwise. In a finite horizon problem, the final decision is made at decision epoch $N$; hence, the finite horizon problem is also known as the $N$-period problem.

Algorithm 3 MDP-based dynamic tuning methodology for embedded sensor nodes

The system (a sensor node) operates in a particular state at each decision epoch, where $S$ denotes the complete set of possible system states. States specify particular sensor node parameter values, and each state represents a different combination of these values. An action set represents all allowable actions in all possible states. At each decision epoch, the sensor node decides whether to continue operating in the current state or to switch to another state. The sensor node state (in our problem) represents a tuple consisting of processor voltage ($V_p$), processor frequency ($F_p$), and sensing frequency ($F_s$). If the system is in state $s \in S$ at a decision epoch, the sensor node can choose an action $a$ from the set of allowable actions $A_s$ in state $s$. Thus, an action set can be written as $A = \bigcup_{s \in S} A_s$. We assume that $S$ and $A_s$ do not vary with time $t$ [313].

When a sensor node selects action $a \in A_s$ in state $s$, the sensor node receives a reward $r_t(s, a)$ and the transition probability distribution $p_t(\cdot \mid s, a)$ determines the system state at the next decision epoch. The real-valued function $r_t(s, a)$ denotes the value of the reward received at time $t$ in period $t$. The reward is referred to as income or cost depending on whether $r_t(s, a)$ is positive or negative, respectively. When the reward depends on the system state at the next decision epoch, we let $r_t(s, a, j)$ denote the value of the reward received at time $t$ when the system state at decision epoch $t + 1$ is $j$. The sensor node selects action $a \in A_s$, and the system occupies the state $s$ at decision epoch $t$. The sensor node evaluates $r_t(s, a)$ using [313]

8.1 $r_t(s, a) = \sum_{j \in S} r_t(s, a, j)\, p_t(j \mid s, a)$

where the nonnegative function $p_t(j \mid s, a)$ is called a transition probability function. $p_t(j \mid s, a)$ denotes the probability that the system occupies state $j$ at time $t + 1$ when the sensor node selects action $a$ in state $s$ at time $t$, and usually $\sum_{j \in S} p_t(j \mid s, a) = 1$. Formally, an MDP is defined as the collection of objects $\{T, S, A_s, p_t(\cdot \mid s, a), r_t(s, a)\}$.
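One possible in-memory representation of this collection of objects is sketched below; the field and type names are illustrative choices, not part of the formal model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Tuple[float, float, float]    # (processor voltage Vp, processor frequency Fp, sensing frequency Fs)
Action = int                          # index of the state that the action transitions to

@dataclass
class SensorNodeMDP:
    """Container mirroring the MDP collection {T, S, A_s, p_t(. | s, a), r_t(s, a)}."""
    horizon: int                                           # number of decision epochs N
    states: List[State]                                    # state space S
    allowed_actions: Dict[int, List[Action]]               # A_s for each state index
    transition_prob: Callable[[int, Action, int], float]   # p_t(j | s, a)
    reward: Callable[[int, Action], float]                 # r_t(s, a)
```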

A decision rule prescribes an action in each state at a specified decision epoch. Our decision rule for sensor nodes is a function $d_t : S \rightarrow A_s$, which specifies the action $d_t(s) \in A_s$ at time $t$ when the system is in state $s$, for each $s \in S$. This decision rule is both Markovian and deterministic.

A policy specifies the decision rule for all decision epochs. In the case of sensor nodes, the policy prescribes action selection under any possible system state. A policy $\pi$ is a sequence of decision rules, that is, $\pi = (d_1, d_2, d_3, \ldots, d_{N-1})$ for $t \in T$. A policy is stationary if $d_t = d$ for all $t \in T$, that is, a stationary policy has the form $\pi = (d, d, d, \ldots)$.

As a result of selecting and implementing a particular policy, the sensor node receives rewards at time periods $t = 1, 2, 3, \ldots, N$. The reward sequence is random because the rewards received in different periods are not known prior to policy implementation. The sensor node's optimization objective is to determine a policy that maximizes the corresponding random reward sequence.

8.3 Application-Specific Embedded Sensor Node Tuning Formulation as an MDP

In this section, we describe the formulation of our embedded sensor node application-specific DVFS2 tuning as an MDP. We formulate MDP-based policy constructs (i.e., state space, decision epochs, actions, state dynamics, policy, performance criterion, and reward function) for our system. We also introduce optimality equations and the policy iteration algorithm.

8.3.1 State Space

The state space for our MDP-based tuning methodology is a composite state space containing the Cartesian product of the embedded sensor node tunable parameters' state spaces. We define the state space $S$ as

8.2 $S = S_1 \times S_2 \times \cdots \times S_k \; : \; |S| = M$

where $\times$ denotes the Cartesian product, $k$ is the total number of sensor node tunable parameters, $S_i$ denotes the state space for tunable parameter $i$ where $i \in \{1, 2, \ldots, k\}$, and $|S| = M$ denotes the state space $S$ cardinality (the number of states in $S$).

The tunable parameter $i$'s state space $S_i$ consists of $n_i$ values

8.3 $S_i = \{s_{i_1}, s_{i_2}, s_{i_3}, \ldots, s_{i_{n_i}}\}$

where $n_i = |S_i|$ denotes the tunable parameter $i$'s state space cardinality (the number of tunable values in $S_i$). $S$ is a set of $k$-tuples where each $k$-tuple represents a sensor node state $s$. Each state $s \in S$ is a $k$-tuple, that is, $s = (v_1, v_2, \ldots, v_k)$, where $v_i \in S_i$. Note that some $k$-tuples in $S$ may not be feasible (e.g., all processor voltage and frequency pairs are not feasible) and can be regarded as do not care tuples.
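The composite state space can be enumerated directly as a Cartesian product. The sketch below uses illustrative parameter value sets and a placeholder feasibility rule to mark do-not-care tuples; it does not reflect the XSM platform's actual constraints.

```python
from itertools import product

# Illustrative tunable-parameter value sets (not a real platform's full ranges).
processor_voltage   = [2.7, 3.3, 4.0, 5.5]   # volts
processor_frequency = [2, 4, 6, 8]           # MHz
sensing_frequency   = [2, 4, 6, 8]           # kHz

def feasible(vp, fp, fs):
    """Placeholder feasibility check: discard voltage/frequency pairs the processor cannot sustain."""
    return fp <= 2 * vp   # assumed rule for illustration; a real check follows the processor datasheet

# State space S: Cartesian product of the parameter state spaces, minus infeasible tuples.
S = [(vp, fp, fs)
     for vp, fp, fs in product(processor_voltage, processor_frequency, sensing_frequency)
     if feasible(vp, fp, fs)]

print(len(S), "feasible states out of",
      len(processor_voltage) * len(processor_frequency) * len(sensing_frequency))
```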

Each embedded sensor node state has an associated power consumption, throughput, and delay. The power, throughput, and delay for state $s_i$ are denoted by $p_i$, $t_i$, and $d_i$, respectively. Since different sensor nodes may have different embedded processors and attached sensors, each node may have node-specific power consumption, throughput, and delay information for each state.

8.3.2 Decision Epochs and Actions

Embedded sensor nodes make decisions at decision epochs, which occur after fixed time periods. The sequence of decision epochs is represented as

8.4 $T = \{1, 2, 3, \ldots, N\}, \quad N \leq \infty$

where the random variable $N$ corresponds to the embedded sensor node's lifetime.

At each decision epoch, a sensor node's action determines the next state to transition to, given the current state. The sensor node action in state $i$ is defined as

8.5 $a_t^{i,j} \in \{0, 1\}, \quad \forall\, j \in S$

where $a_t^{i,j}$ denotes the action taken at time $t$ that causes the sensor node to transition from the current state $i$ to state $j$ at time $t + 1$. A policy determines whether an action is taken or not: if $a_t^{i,j} = 1$, the action is taken, and if $a_t^{i,j} = 0$, the action is not taken. For a given state $i$, a selected action cannot result in a transition to a state that is not in $S$. The action space can be defined as

8.6 $A = \left\{ a_t^{i,j} : i \in S,\; j \in S \right\}$

8.3.3 State Dynamics

The state dynamics of the system can be delineated by the state transition probabilities of the embedded Markov chain. We formulate our sensor node policy as a deterministic dynamic program (DDP) because the choice of an action determines the subsequent state with certainty. Our sensor node DDP policy formulation uses a transfer function to specify the next state. A transfer function defines a mapping $\omega_t(s, a)$ from $S \times A_s \rightarrow S$, which specifies the system state at time $t + 1$ when the sensor node selects action $a \in A_s$ in state $s$ at time $t$. To formulate our DDP as an MDP, we define the transition probability function as

8.7 $p_t(j \mid s, a) = \begin{cases} 1, & \text{if } \omega_t(s, a) = j \\ 0, & \text{if } \omega_t(s, a) \neq j \end{cases}$

8.3.4 Policy and Performance Criterion

For each given state $s \in S$, a sensor node selects an action $a \in A_s$ according to a policy $\pi \in \Pi$, where $\Pi$ is the set of admissible policies defined as

8.8 $\Pi = \left\{ \pi : S \rightarrow A \;\middle|\; \pi(s) \in A_s,\; \forall\, s \in S \right\}$

A performance criterion compares the performance of different policies. The sensor node selects an action prescribed by a policy based on the sensor node's current state. If the random variable $X_t$ denotes the state at decision epoch $t$ and the random variable $Y_t$ denotes the action selected at decision epoch $t$, then for the deterministic case, $Y_t = d_t(X_t)$.

As a result of selecting an action, the sensor node receives a reward $r(X_t, Y_t)$ at time $t$. The expected total reward denotes the total reward that can be expected over the decision-making horizon given a specific policy. Let $v^\pi(s)$ denote the expected total reward over the decision-making horizon when the horizon length $N$ is a random variable, the system is in state $s$ at the first decision epoch, and policy $\pi$ is used [300, 314]

8.9 $v^\pi(s) = E_s^\pi \left\{ E_N \left[ \sum_{t=1}^{N} r(X_t, Y_t) \right] \right\}$

where $E_s^\pi$ represents the expected reward with respect to policy $\pi$ and the initial state $s$ (the system state at the time of the expected reward calculation), and $E_N$ denotes the expected reward with respect to the probability distribution of the random variable $N$. We can write Eq. (8.9) as [313]

8.10 $v_\lambda^\pi(s) = E_s^\pi \left\{ \sum_{t=1}^{\infty} \lambda^{t-1} r(X_t, Y_t) \right\}$

which gives the expected total discounted reward. We assume that the random variable $N$ is geometrically distributed with parameter $\lambda$, and hence the distribution mean is $1/(1 - \lambda)$ [299]. The parameter $\lambda$ can be interpreted as a discount factor, which measures the present value of one unit of reward received for one period in the future. Thus, $v_\lambda^\pi(s)$ represents the expected total present value of the reward (income) stream obtained using policy $\pi$ [313]. Our objective is to find a policy that maximizes the expected total discounted reward, that is, a policy $\pi^*$ is optimal if

8.11 $v_\lambda^{\pi^*}(s) \geq v_\lambda^{\pi}(s), \quad \forall\, \pi \in \Pi,\; \forall\, s \in S$

8.3.5 Reward Function

The reward function captures application metrics and sensor node characteristics. Our reward function characterization considers the power consumption (which affects the sensor node's lifetime), throughput, and delay application metrics. We define the reward function $r(s, a)$, given the current sensor node state $s$ and the sensor node's selected action $a$, as

8.12 $r(s, a) = \omega_p f_p(s, a) + \omega_t f_t(s, a) + \omega_d f_d(s, a)$

where $f_p(s, a)$ denotes the power reward function, $f_t(s, a)$ denotes the throughput reward function, $f_d(s, a)$ denotes the delay reward function, and $\omega_p$, $\omega_t$, and $\omega_d$ represent the weight factors for power, throughput, and delay, respectively. The weight factors' constraints are given as $\omega_p + \omega_t + \omega_d = 1$, where $0 \leq \omega_p \leq 1$, $0 \leq \omega_t \leq 1$, and $0 \leq \omega_d \leq 1$. The weight factors are selected based on the relative importance of application metrics with respect to each other; for example, a habitat monitoring application taking camera images of the habitat requires a minimum image resolution to provide meaningful analysis, which necessitates a minimum throughput, and therefore throughput can be assigned a higher weight factor than the power metric for this application.

We define linear reward functions for the application metrics because an application metric reward (objective function) typically varies linearly, or piecewise linearly, between the minimum and maximum allowed values of the metric [300, 311]. However, a nonlinear characterization of the reward functions is also possible and depends on the particular application. We point out that our methodology works for any characterization of the reward function. The reward function characterization only defines the reward obtained from operating in a given state. Our MDP-based policy determines the reward by selecting an optimal/suboptimal operating state, given the sensor node design space and application requirements, for any reward function characterization. We consider linear reward functions as a typical example from the space of possible reward functions (e.g., piecewise linear, nonlinear) to illustrate our MDP-based policy. We define the power reward function (Fig. 8.2(a)) in Eq. (8.12) as

8.13 $f_p(s, a) = \begin{cases} 1, & 0 < p_a \leq L_p \\ \dfrac{U_p - p_a}{U_p - L_p}, & L_p < p_a < U_p \\ 0, & p_a \geq U_p \end{cases}$

where $p_a$ denotes the power consumption of the current state given action $a$ taken at time $t$, and the constant parameters $L_p$ and $U_p$ denote the minimum and maximum allowed/tolerated sensor node power consumption, respectively.


Figure 8.2 Reward functions: (a) power reward function $f_p(s, a)$; (b) throughput reward function $f_t(s, a)$; (c) delay reward function $f_d(s, a)$

We define the throughput reward function (Fig. 8.2(b)) in Eq. (8.12) as

8.14 $f_t(s, a) = \begin{cases} 0, & 0 < t_a \leq L_t \\ \dfrac{t_a - L_t}{U_t - L_t}, & L_t < t_a < U_t \\ 1, & t_a \geq U_t \end{cases}$

where $t_a$ denotes the throughput of the current state given action $a$ taken at time $t$, and the constant parameters $L_t$ and $U_t$ denote the minimum and maximum allowed/tolerated throughput, respectively.

We define the delay reward function (Fig. 8.2(c)) in Eq. (8.12) as

8.15 $f_d(s, a) = \begin{cases} 1, & 0 < d_a \leq L_d \\ \dfrac{U_d - d_a}{U_d - L_d}, & L_d < d_a < U_d \\ 0, & d_a \geq U_d \end{cases}$

where $d_a$ denotes the delay in the current state and the constant parameters $L_d$ and $U_d$ denote the minimum and maximum allowed/tolerated delay, respectively.

State transitioning incurs a cost associated with switching parameter values from the current state to the next state (typically in the form of power and/or execution overhead). We define the transition cost function $h(s, a)$ as

8.16 $h(s, a) = \begin{cases} H_{s,a}, & \text{if } \omega_t(s, a) \neq s \\ 0, & \text{if } \omega_t(s, a) = s \end{cases}$

where $H_{s,a}$ denotes the transition cost to switch from the current state $s$ to the next state as determined by action $a$. Note that a sensor node incurs no transition cost if action $a$ prescribes that the next state is the same as the current state.

Hence, the overall reward function $R(s, a)$ given state $s$ and action $a$ at time $t$ is

8.17 $R(s, a) = r(s, a) - h(s, a)$
which accounts for the power, throughput, and delay application metrics as well as state transition cost.
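A minimal sketch of this reward computation, assuming the linear characterization of Eqs. (8.13)–(8.15) and illustrative limit and weight values (not the Table 8.3 settings), is shown below.

```python
def piecewise_linear(value, low, high, higher_is_better):
    """Linear reward between the minimum (low) and maximum (high) tolerated values, clamped to [0, 1]."""
    if higher_is_better:                      # throughput-style metric
        if value <= low:
            return 0.0
        if value >= high:
            return 1.0
        return (value - low) / (high - low)
    else:                                     # power/delay-style metric (lower is better)
        if value <= low:
            return 1.0
        if value >= high:
            return 0.0
        return (high - value) / (high - low)

def overall_reward(power, throughput, delay, weights, limits, transition_cost):
    """R(s, a) = r(s, a) - h(s, a), with r(s, a) the weighted sum of the metric reward functions."""
    wp, wt, wd = weights                      # weight factors; must sum to 1
    r = (wp * piecewise_linear(power,      *limits["power"],      higher_is_better=False) +
         wt * piecewise_linear(throughput, *limits["throughput"], higher_is_better=True)  +
         wd * piecewise_linear(delay,      *limits["delay"],      higher_is_better=False))
    return r - transition_cost

# Example call with assumed limits (in mW, MIPS, ms) and assumed weights:
limits = {"power": (10, 50), "throughput": (2, 8), "delay": (300, 1300)}
print(overall_reward(power=15, throughput=4, delay=700,
                     weights=(0.5, 0.3, 0.2), limits=limits, transition_cost=0.0))
```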

We point out that many other application metrics (e.g., security, reliability, and lifetime) are of immense significance to EWSNs. For example, EWSNs are vulnerable to security attacks such as distributed denial of service and Sybil attacks, for which a security reward function can be included. A reliability reward function can encompass the reliability aspect of EWSNs since sensor nodes are often deployed in unattended and hostile environments and are susceptible to failures. Similarly, considering sensor nodes' constrained battery resources, power optimization techniques exist that put sensor nodes in sleep or low-power mode (where communication and/or processing functions are disabled) for power conservation when less activity is observed in the sensed region, as determined by previously sensed data. These power optimizations can be captured by a lifetime reward function. Incorporating these additional metrics into our model is the focus of our future work.

8.3.6 Optimality Equation

The optimality equation, also known as Bellman's equation, for the expected total discounted reward criterion is given as [313]

8.18 $v_\lambda^*(s) = \max_{a \in A_s} \left\{ R(s, a) + \sum_{j \in S} \lambda\, p(j \mid s, a)\, v_\lambda^*(j) \right\}$

where $v_\lambda^*(s)$ denotes the maximum expected total discounted reward. The following are the salient properties of the optimality equation: the optimality equation has a unique solution; an optimal policy exists under given conditions on states, actions, rewards, and transition probabilities; the value of the discounted MDP satisfies the optimality equation; and the optimality equation characterizes stationary policies.

The solution of Eq. (8.18) gives the maximum expected total discounted reward $v_\lambda^*$ and the MDP-based policy $\pi_{MDP}$, which attains this maximum. $\pi_{MDP}$ prescribes the action $a$ from the action set $A_s$, given the current state $s$, for all $s \in S$. There are several methods to solve the optimality equation (8.18), such as value iteration, policy iteration, and linear programming; however, in this work, we use the policy iteration algorithm.

8.3.7 Policy Iteration Algorithm

The policy iteration algorithm can be described in four steps:

  1. Set $n = 0$ and choose an arbitrary decision rule $d_0 \in D$, where $D$ is the set of all possible decision rules.
  2. Policy evaluation: obtain $v^n$ by solving the equations:

    8.19 $v^n(s) = R(s, d_n(s)) + \sum_{j \in S} \lambda\, p(j \mid s, d_n(s))\, v^n(j), \quad \forall\, s \in S$

  3. Policy improvement: select $d_{n+1}$ to satisfy the equations:

    8.20 $d_{n+1}(s) \in \arg\max_{a \in A_s} \left\{ R(s, a) + \sum_{j \in S} \lambda\, p(j \mid s, a)\, v^n(j) \right\}, \quad \forall\, s \in S$

    and setting $d_{n+1} = d_n$ if possible.

  4. If $d_{n+1} = d_n$, stop and set $d^* = d_n$, where $d^*$ denotes the optimal decision rule. If $d_{n+1} \neq d_n$, set $n = n + 1$ and go to step 2.

Step 2 is referred to as policy evaluation because, by solving Eq. (8.19), we obtain the expected total discounted reward for decision rule $d_n$. Step 3 is referred to as policy improvement because this step selects a $v^n$-improving decision rule. In step 4, setting $d_{n+1} = d_n$ whenever possible quells cycling because the maximizing decision rule is not necessarily unique.
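The four steps above translate directly into a small solver. The NumPy sketch below assumes that the reward matrix R[s, a] (Eq. (8.17)) and the transition tensor P[a, s, j] (Eq. (8.7)) have already been built; it is an illustration, not the chapter's reference implementation.

```python
import numpy as np

def policy_iteration(R, P, lam):
    """Policy iteration (Section 8.3.7).

    R[s, a]    : overall reward R(s, a)
    P[a, s, j] : transition probability p(j | s, a)
    lam        : discount factor, 0 <= lam < 1
    """
    n_states, n_actions = R.shape
    d = np.zeros(n_states, dtype=int)                        # step 1: arbitrary initial decision rule
    while True:
        # Step 2: policy evaluation -- solve (I - lam * P_d) v = R_d for v.
        P_d = np.array([P[d[s], s, :] for s in range(n_states)])
        R_d = np.array([R[s, d[s]] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - lam * P_d, R_d)
        # Step 3: policy improvement -- maximize R(s, a) + lam * sum_j p(j | s, a) v(j).
        q = R + lam * np.einsum("asj,j->sa", P, v)
        d_new = q.argmax(axis=1)
        # Keep the old action when it is still maximizing ("set d_{n+1} = d_n if possible").
        keep = np.isclose(q[np.arange(n_states), d], q.max(axis=1))
        d_new = np.where(keep, d, d_new)
        # Step 4: stop when the decision rule no longer changes.
        if np.array_equal(d_new, d):
            return d, v                                       # decision rule and value function
        d = d_new
```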

8.4 Implementation Guidelines and Complexity

In this section, we describe the implementation guidelines and computational complexity for our proposed MDP-based policy. The implementation guidelines describe the mapping of MDP specifications (e.g., state space, reward function) in our problem formulation (Section 8.3) to actual sensor node hardware. The computational complexity focuses on the convergence of the policy iteration algorithm and the data memory analysis for our MDP-based dynamic tuning methodology. The prototype implementation of our MDP-based tuning methodology on hardware sensor platforms is the focus of our future work.

8.4.1 Implementation Guidelines

In order to implement our MDP-based policy, particular values must be initially defined. The reward function in Eq. (8.17) uses the power, throughput, and delay values offered in an embedded sensor node state $s$ (Section 8.3.1). An application manager specifies the minimum and maximum power, throughput, and delay values required by Eqs. (8.13)–(8.15), respectively, and the power, throughput, and delay weight factors required in Eq. (8.12) according to application specifications.

A sensor node's embedded processor defines the transition cost $H_{s,a}$ as required in Eq. (8.16), which depends on the particular processor's power management and switching techniques. Processors have a set of available voltage and frequency pairs, which define the $V_p$ and $F_p$ values, respectively, in a sensor node state tuple (Section 8.3.1). Embedded sensors can operate at different defined sensing rates, which define the $F_s$ value in a sensor node state tuple (Section 8.3.1). The embedded processor and sensor characteristics determine the value of $M$, which characterizes the state space in Eq. (8.2) (i.e., the number of allowable processor voltage, processor frequency, and sensing frequency values determines the total number of sensor node operating states, and thus the value of $M$).

The sensor nodes perform parameter tuning decisions at decision epochs (Section 8.2.2). The decision epochs can be guided by the dynamic profiler module to adapt to environmental stimuli. For instance, for a target-tracking application and a fast-moving target, the decision epoch period should be small to better capture the fast-moving target. On the other hand, for stationary or slow-moving targets, the decision epoch period should be large to conserve battery energy. However, since the exact decision epoch period is application specific, the period should be adjusted to control the sensor node's lifetime.

Both the MDP controller module (which implements the policy iteration algorithm to calculate the MDP-based policy $\pi_{MDP}$) and the dynamic profiler module (Section 8.2.1) can be implemented either as software running on a sensor node's embedded processor or as custom hardware for faster execution.

One of the drawbacks of an MDP-based policy is that the computational and storage overhead increases as the number of states increases. Therefore, an EWSN designer may wish to restrict the number of embedded sensor node states (e.g., to 2, 4, or 16) to reduce the computational and storage overhead. If state restriction is not a viable option, the EWSN configuration could be augmented with a back-end base station node to run our MDP-based optimization, with the resulting operating states communicated to the sensor nodes. This communication of operating state information to sensor nodes by the base station node would not consume significant power resources, given that this state information is transmitted periodically and/or aperiodically after some minimum duration determined by the agility of the environmental stimuli (e.g., more frequent communication would be required for rapidly changing environmental stimuli as opposed to slowly changing environmental stimuli). This EWSN configuration could also consider global optimizations, which take into account sensor node interactions and dependencies and are a focus of our future work. We point out that global optimization storage and processing overhead increases rapidly as the number of sensor nodes in the EWSN increases.

8.4.2 Computational Complexity

Since sensor nodes have limited energy reserves and processing resources, it is critical to analyze our proposed MDP-based policy's computational complexity, which is related to the convergence of the policy iteration algorithm. Since our problem formulation (Section 8.3) consists of finite states and actions, Puterman [313] proves a theorem that establishes the convergence of the policy iteration algorithm for finite states and actions in a finite number of iterations. Another important computational complexity factor is the algorithm's convergence rate. Puterman [313] shows that for a finite number of states and actions, the policy iteration algorithm converges to the optimal/suboptimal value function at least quadratically fast. Empirical observations suggest that the policy iteration algorithm can converge in $O(\log M)$ iterations, where each iteration takes $O(M^3)$ time (for policy evaluation); however, no proof yet exists to verify these empirical observations [314]. Based on these empirical observations for convergence, the policy iteration algorithm can converge in four iterations for $M = 16$.

8.4.3 Data Memory Analysis

We performed data memory analysis using the 8-bit Atmel ATmega128L microprocessor [192] in XSM sensor nodes [315]. The Atmel ATmega128L microprocessor contains 128 KB of on-chip in-system reprogrammable Flash memory for program storage, 4 KB of internal SRAM data memory, and up to 64 KB of external SRAM data memory. Integer and floating-point data types require 2 and 4 B of storage, respectively. Our data memory analysis considers all storage requirements for our MDP-based dynamic tuning formulation (Section 8.3) including state space, action space, state dynamics (transition probability matrix), Bellman's equation (8.18), MDP reward function calculation (reward matrix), and policy iteration algorithm. We estimate data memory size for three sensor node configurations:

  • 4 sensor node states with 4 allowable actions in each state (16 actions in the action space) (Fig. 8.4);
  • 8 sensor node states with 8 allowable actions in each state (64 actions in the action space);
  • 16 sensor node states with 16 allowable actions in each state (256 actions in the action space).

Figure 8.4 Symbolic representation of our MDP model with four sensor node states

Data memory analysis revealed that the 4, 8, and 16 sensor node state configurations required approximately 1.55, 14.8, and 178.55 KB, respectively. Thus, currently available sensor node platforms contain enough memory resources to implement our MDP-based dynamic tuning methodology with 16 sensor node states or fewer. However, the memory requirements increase rapidly as the number of states increases due to the transition probability matrix and reward matrix specifications. Therefore, depending on the available memory resources, an application developer could restrict the number of states accordingly or otherwise resort to performing the computations on the back-end base station, as outlined in Section 8.4, to conserve power and storage.

8.5 Model Extensions

Our proposed MDP-based dynamic tuning methodology for EWSNs is highly adaptive to different EWSN characteristics and particulars, including additional sensor node tunable parameters (e.g., radio transmission power) and application metrics (e.g., reliability). Furthermore, our problem formulation can be extended to form MDP-based stochastic dynamic programs. Our current MDP-based dynamic optimization methodology provides a basis for MDP-based stochastic dynamic optimization that would react to changing environmental stimuli and wireless channel conditions by autonomously switching to an appropriate operating state. This stochastic capability would provide a major incentive to use an MDP-based policy (because MDPs can formulate stochastic dynamic programs) as opposed to lightweight heuristic policies (e.g., greedy or simulated annealing based) for parameter tuning, which can determine an appropriate operating state from a large state space without requiring large computational and memory resources [316].

To exemplify additional tuning parameters, we consider a sensor node's transceiver (radio) transmission power. The extended state space can be written as

8.21 $S_{ext} = S \times S_{tx}$

where $S_{tx}$ denotes the state space for a sensor node's radio transmission power.

We define the sensor node's radio transmission power state space $S_{tx}$ as

8.22 $S_{tx} = \{P_{tx_1}, P_{tx_2}, P_{tx_3}, \ldots, P_{tx_{n_{tx}}}\}$

where $P_{tx_i}$ denotes a radio transmission power, $n_{tx}$ denotes the number of radio transmission power values, and $|S_{tx}| = n_{tx}$ denotes the radio transmission power state space cardinality.

To exemplify the inclusion of additional application metrics, we consider reliability, which measures the reliability of sensed data, such as the total number of sensed data packets received at the sink node without error in an arbitrary time window. Reliability can be interpreted as the packet reception rate, which is the complement of the packet error rate (PER) [24]. The factors that affect reliability include wireless channel condition, network topology, traffic patterns, and the physical phenomenon that triggered the sensor node communication activity [24]. In general, the wireless channel condition has the greatest effect on the reliability metric because sensed data packets may experience different error rates depending on the channel condition. A sensor node may maintain application-specified reliability in different wireless channel conditions by tuning/changing the error correcting codes, modulation schemes, and/or transmission power. The dynamic profiler module in our proposed tuning methodology helps estimate the reliability metric at runtime by profiling the number of packet transmissions from each sensor node and the number of packet receptions at the sink node.

The sensor node's reliability can be added to the reward function, and the extended reward function can be written as

8.23 $r(s, a) = \omega_p f_p(s, a) + \omega_t f_t(s, a) + \omega_d f_d(s, a) + \omega_r f_r(s, a)$

where $f_r(s, a)$ denotes the reliability reward function, $\omega_r$ represents the weight factor for reliability, and the remainder of the terms in Eq. (8.23) have the same meaning as in Eq. (8.12). The weight factors' constraints are given as $\omega_p + \omega_t + \omega_d + \omega_r = 1$, where $0 \leq \omega_p \leq 1$, $0 \leq \omega_t \leq 1$, $0 \leq \omega_d \leq 1$, and $0 \leq \omega_r \leq 1$.

The reliability reward function (Fig. 8.3(a)) in Eq. (8.23) can be defined as

8.24 $f_r(s, a) = \begin{cases} 0, & 0 < r_a \leq L_r \\ \dfrac{r_a - L_r}{U_r - L_r}, & L_r < r_a < U_r \\ 1, & r_a \geq U_r \end{cases}$

where $r_a$ denotes the reliability offered in the current state given action $a$ taken at time $t$, and the constant parameters $L_r$ and $U_r$ denote the minimum and maximum allowed/tolerated reliability, respectively. The reliability may be represented as a multiple of a base reliability unit equal to 0.1, which represents a 10% packet reception rate [24].


Figure 8.3 Reliability reward functions: (a) linear variation; (b) quadratic variation

The reward function capturing an application's metrics can be defined according to particular application requirements, and may vary quadratically (Fig. 8.3(b)) instead of linearly (as defined earlier) over the minimum and maximum allowed parameter values and can be expressed as

8.25 $f_r(s, a) = \begin{cases} 0, & 0 < r_a \leq L_r \\ \left(\dfrac{r_a - L_r}{U_r - L_r}\right)^2, & L_r < r_a < U_r \\ 1, & r_a \geq U_r \end{cases}$

Thus, our proposed MDP-based dynamic tuning methodology works with any reward function formulation.
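For instance, a linear and a quadratic variant of the reliability reward function can be swapped without changing anything else in the formulation. The sketch below illustrates this; the quadratic shape (simple squaring) is an assumed example, not a prescribed form.

```python
def reliability_reward_linear(r, low, high):
    """Linear variation between the minimum and maximum tolerated reliability (Fig. 8.3(a))."""
    if r <= low:
        return 0.0
    if r >= high:
        return 1.0
    return (r - low) / (high - low)

def reliability_reward_quadratic(r, low, high):
    """Quadratic variation over the same range (Fig. 8.3(b)); squaring is an assumed example shape."""
    return reliability_reward_linear(r, low, high) ** 2

# Example: reliability expressed in base units of 0.1 (a 10% packet reception rate per unit).
print(reliability_reward_linear(0.7, 0.4, 0.9), reliability_reward_quadratic(0.7, 0.4, 0.9))
```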

Currently, our problem formulation considers a DDP with fixed states and state transition probabilities equal to 1 (Section 8.3.3); however, our formulation can be extended to form stochastic dynamic programs with different state transition probabilities [317]. One potential extension could include environmental stimuli and the wireless channel condition in the state space. The environmental stimuli and wireless channel condition vary with time and have a different probability of being in a certain state at a given point in time. For instance, considering the wireless channel condition along with the sensor node state, the state space vector $\bar{S}$ can be given as

8.26 $\bar{S} = [(s_1, c_1), (s_1, c_2), \ldots, (s_1, c_n), (s_2, c_1), \ldots, (s_m, c_n)]$

where $s_i$ and $c_j$ represent the sensor state and wireless channel state at time $t$, assuming that there are $m$ total sensor states and $n$ total wireless channel states. The state dynamics could be given by $p_t\big((s_{i'}, c_{j'}) \mid (s_i, c_j), a\big)$, which denotes the probability that the system occupies state $(s_{i'}, c_{j'})$ after $t$ time units given state $(s_i, c_j)$ and action $a$. If the wireless channel condition does not change state with time, then $p_t\big((s_{i'}, c_{j'}) \mid (s_i, c_j), a\big) \in \{0, 1\}$, thus forming a DDP. The determination of $p_t$ requires probabilistic modeling of the wireless channel condition over time and is the focus of our future work.
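A minimal sketch of such a stochastic extension is given below: the sensor state still changes deterministically with the chosen action, while an assumed two-state wireless channel evolves according to an illustrative transition matrix (not derived from measurements).

```python
import numpy as np

sensor_states  = [0, 1, 2, 3]            # indices of the (Vp, Fp, Fs) operating states
channel_states = [0, 1]                  # 0 = good channel, 1 = bad channel (assumed model)

# Assumed channel transition matrix: P_channel[c, c'] = Pr(channel goes from c to c').
P_channel = np.array([[0.9, 0.1],
                      [0.3, 0.7]])

def joint_transition_prob(s, c, a, s_next, c_next):
    """p((s', c') | (s, c), a): the action selects the next sensor state with certainty,
    while the channel changes state stochastically according to P_channel."""
    deterministic_part = 1.0 if s_next == a else 0.0
    return deterministic_part * P_channel[c, c_next]

print(joint_transition_prob(s=0, c=0, a=2, s_next=2, c_next=1))   # -> 0.1
```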

8.6 Numerical Results

In this section, we compare the performance (based on the expected total discounted reward criterion (Section 8.3.6)) of our proposed MDP-based DVFS2 policy ($\pi_{MDP}$) with several fixed heuristic policies using a representative EWSN platform. We use the MATLAB MDP toolbox [318] implementation of the policy iteration algorithm described in Section 8.3.7 to solve Bellman's equation (8.18) and determine the MDP-based policy. We select sensor node state parameters based on eXtreme Scale Motes (XSM) [315, 319]. The XSM motes have an average lifetime of 1000 h of continuous operation with two AA alkaline batteries, which can deliver 6 W h or an average of 6 mW [315]. The XSM platform integrates an Atmel ATmega128L microcontroller [192], a Chipcon CC1000 radio operating at 433 MHz, and a 4 Mbit serial Flash memory. The XSM motes contain infrared, magnetic, acoustic, photo, and temperature sensors.

To represent sensor node operation, we analyze sample application domains that represent a typical security system or defense application (henceforth referred to as a security/defense system) [137], healthcare application, and ambient conditions monitoring application. For brevity, we select a single sample EWSN platform configuration and several application domains, but we point out that our proposed MDP model and methodology works equally well for any other EWSN platform and application.

For each application domain, we evaluate the effects of different discount factors, different state transition costs, and different application metric weight factors on the expected total discounted reward for our MDP-based policy and several fixed heuristic policies (Section 8.6.1). The magnitude of difference in the total expected discounted reward for different policies is important as it provides relative comparisons between the different policies.

8.6.1 Fixed Heuristic Policies for Performance Comparisons

Due to the infancy of EWSN dynamic optimizations, there exist no dynamic sensor node tuning methods for comparison with our MDP-based policy. Therefore, we compare against several fixed heuristic policies (heuristic policies have been shown to be a viable comparison method [299]). To provide a consistent comparison, the fixed heuristic policies use the same reward function and associated parameter settings as our MDP-based policy. We consider the following four fixed heuristic policies (a minimal code sketch of these heuristics follows the list):

  • A fixed heuristic policy that always selects the state with the lowest power consumption.
  • A fixed heuristic policy that always selects the state with the highest throughput.
  • A fixed heuristic policy that spends an equal amount of time in each of the available states.
  • A fixed heuristic policy that spends an unequal amount of time in each of the available states based on a specified preference for each state. For example, given a system with four possible states, this policy may spend 40% of its time in the first state, 20% in the second state, 10% in the third state, and 30% in the fourth state.
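The following minimal sketches of these four heuristics return the index of the state to occupy at a decision epoch; the state attribute names and the preference vector are illustrative.

```python
import random

def lowest_power_policy(states):
    """Always select the state with the lowest power consumption."""
    return min(range(len(states)), key=lambda i: states[i]["power"])

def highest_throughput_policy(states):
    """Always select the state with the highest throughput."""
    return max(range(len(states)), key=lambda i: states[i]["throughput"])

def equal_time_policy(states, epoch):
    """Cycle through the states so each receives an equal share of time."""
    return epoch % len(states)

def preference_policy(states, preferences):
    """Spend an unequal share of time in each state, e.g. preferences = (0.4, 0.2, 0.1, 0.3)."""
    return random.choices(range(len(states)), weights=preferences, k=1)[0]
```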

8.6.2 MDP Specifications

We compare the different policies using the expected total discounted reward performance criterion (Section 8.3.6). The state transition probability for each sensor node state is given by Eq. (8.7). The sensor node's lifetime and the time between decision epochs are subjective and may be assigned by an application manager according to application requirements. A sensor node's mean lifetime is given by $1/(1 - \lambda)$ time units, where one time unit is the time between successive decision epochs (which we assume to be 1 h). For instance, for $\lambda = 0.999$, the sensor node's mean lifetime is $1/(1 - 0.999) = 1000$ h $\approx$ 42 days.

For our numerical results, we consider a sensor node capable of operating in four different states (i.e., $M = 4$ in Eq. (8.2)). Figure 8.4 shows the symbolic representation of our MDP model with four sensor node states. Each state has a set of allowed actions prescribing transitions to available states. For each allowed action $a$ in a state, there is an $(r, p)$ pair, where $r$ specifies the immediate reward obtained by taking action $a$ and $p$ denotes the probability of taking action $a$.

Table 8.2 summarizes the state parameter values for each of the four states $s_1$ through $s_4$. We define each state using a $[V_p, F_p, F_s]$ tuple, where the processor voltage $V_p$ is specified in volts, the processor frequency $F_p$ in MHz, and the sensing frequency $F_s$ in kHz. For instance, state one $s_1$ is defined as $[2.7, 2, 2]$, which corresponds to a processor voltage of 2.7 V, a processor frequency of 2 MHz, and a sensing frequency of 2 kHz (2000 samples/s). We represent each state's power consumption, throughput, and delay as multiples of base power, throughput, and delay units, respectively. We assume one base power unit is equal to 1 mW, one base throughput unit is equal to 0.5 MIPS (millions of instructions per second), and one base delay unit is equal to 50 ms. We assign base units that provide a convenient representation of the application metrics (power, throughput, delay); any other feasible base unit values could be assigned [299]. We assume, without loss of generality, that the transition cost for switching from one state to another is a constant $H_{i,j}$. The transition cost could be a function of power, throughput, and delay, but we assume a constant transition cost for simplicity, as it is typically constant for different state transitions [192].

Table 8.2 Parameters for wireless sensor node states $s_i = [V_p, F_p, F_s]$ ($V_p$ is specified in volts, $F_p$ in MHz, and $F_s$ in kHz)

Notation                     $s_1$    $s_2$    $s_3$    $s_4$
Power $P_i$ (units)           10       15       30       55
Throughput $R_i$ (units)       4        8       12       16
Delay $D_i$ (units)           26       14        8        6

Parameters are specified as multiples of base units, where one power unit equals 1 mW, one throughput unit equals 0.5 MIPS, and one delay unit equals 50 ms. Parameter values are based on the XSM mote; $P_i$, $R_i$, and $D_i$ denote the power consumption, throughput, and delay, respectively, in state $s_i$.
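As a concrete reading of Table 8.2 under these base units, state $s_1$ corresponds to

\[
P_1 = 10 \times 1 \text{ mW} = 10 \text{ mW}, \quad R_1 = 4 \times 0.5 \text{ MIPS} = 2 \text{ MIPS}, \quad D_1 = 26 \times 50 \text{ ms} = 1.3 \text{ s},
\]

where the 2 MIPS throughput is consistent with $s_1$'s 2 MHz processor frequency at 1 MIPS/MHz.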

Our selection of the state parameter values in Table 8.2 corresponds to XSM mote specifications [193, 316]. The XSM mote's Atmel ATmega128L microprocessor has an operating voltage range of 2.7–5.5 V and a processor frequency range of 0–8 MHz. The ATmega128L throughput varies with processor frequency at 1 MIPS/MHz, thus allowing an application manager to trade off power consumption against processing speed [192]. Our chosen sensing frequencies also correspond to standard sensor node specifications. The Honeywell HMC1002 magnetometer sensor [320] consumes on average 15 mW of power and can be sampled in 0.1 ms on the Atmel ATmega128L microprocessor, which results in a maximum sampling frequency of approximately 10 kHz (10,000 samples/s). The acoustic sensor embedded in the XSM mote has a maximum sensing frequency of approximately 8.192 kHz [315]. Although the power consumption in a state depends not only on the processor voltage and frequency but also on the processor utilization, which in turn depends on the sensing frequency, we report the average power consumption values in a state as derived from the data sheets [193, 322].

Table 8.3 summarizes the minimum and maximum reward function parameter values for the application metrics (power, throughput, and delay) and the associated weight factors for the security/defense, healthcare, and ambient conditions monitoring applications. Our selected reward function parameter values represent typical application requirements [81]. We describe below the relative importance of these application metrics for our considered applications.

Table 8.3 Minimum and maximum reward function parameter values and application metric weight factors for the security/defense, healthcare, and ambient conditions monitoring applications

Notation       Security/defense   Health care   Ambient monitoring
$P_{\min}$       12 units           8 units       5 units
$P_{\max}$       35 units          20 units      32 units
$R_{\min}$        6 units           3 units       2 units
$R_{\max}$       12 units           9 units       8 units
$D_{\min}$        7 units           8 units      12 units
$D_{\max}$       16 units          20 units      40 units
$\omega_p$        0.45              0.5           0.65
$\omega_t$        0.2               0.3           0.15
$\omega_d$        0.35              0.2           0.2

$P_{\min}$ and $P_{\max}$ denote the minimum and maximum acceptable power consumption, respectively; $R_{\min}$ and $R_{\max}$ denote the minimum and maximum acceptable throughput, respectively; $D_{\min}$ and $D_{\max}$ denote the minimum and maximum acceptable delay, respectively; $\omega_p$, $\omega_t$, and $\omega_d$ denote the weight factors for power, throughput, and delay, respectively.

Power is a primary concern for all EWSN applications, and tolerable power consumption values are specified based on the desired EWSN lifetime given the sensor nodes' limited battery resources. However, the relative importance of power for different applications is determined mainly by the feasibility of replacing a sensor node's battery in hostile environments. For example, battery replacement is nearly impossible for sensor nodes deployed in a war zone for a security/defense application or in an active volcano monitoring application. For a healthcare application with sensors attached to a patient to monitor physiological data (e.g., heart rate, glucose level), the sensor node's battery may be replaced, though power is still constrained because excessive heat dissipation could adversely affect the patient's health. Similarly, delay can be an important factor for security/defense when tracking an enemy target and for health care when a patient is in intensive care, whereas delay may be relatively less important for a humidity monitoring application. A data-sensitive security/defense system may require a comparatively large minimum throughput in order to obtain a sufficient number of sensed data samples for meaningful analysis. Although the relative importance and the minimum and maximum values of these application metrics can vary widely within and between application domains, we pick our parameter values (Table 8.3) for demonstration purposes to provide insight into our optimization methodology.

We point out that all of our considered application metrics, especially throughput, depend on the traffic pattern. EWSN throughput is a complex function of the number of nodes, the traffic volume and patterns, and the parameters of the medium access technique. As the number of nodes and the traffic volume increase, contention-based medium access methods result in an increased number of packet collisions, which waste energy without transmitting useful data. This contention and packet collision results in saturation, which decreases the effective throughput and sharply increases the delay. We briefly outline the variance in EWSN traffic patterns for our considered applications. The security/defense application would have infrequent bursts of heavy traffic (e.g., when an enemy target appears within the sensor nodes' sensing range), healthcare applications would have a steady flow of medium-to-high traffic, and ambient conditions monitoring applications would have a steady flow of low-to-medium traffic except during emergencies (e.g., a volcanic eruption). Although modeling the application metrics with respect to traffic patterns would better characterize these metric values at particular times, the metric values can still be bounded by minimum and maximum values, as captured by our reward functions (Section 8.3.5).

Given the reward function, the sensor node state parameters corresponding to the XSM mote, and the transition probabilities, our MATLAB MDP toolbox [318] implementation of the policy iteration algorithm solves Bellman's equation (8.18) to determine the MDP-based policy $\pi_{MDP}$ and the corresponding expected total discounted reward (Eq. (8.10)).
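For readers who want to experiment outside MATLAB, the following is a minimal Python sketch of policy iteration for a four-state node. It is not the MATLAB MDP toolbox implementation used in this chapter: the surrogate reward function, the deterministic transitions, and the transition cost handling are simplifying assumptions for illustration, with the per-state metrics and security/defense weight factors taken from Tables 8.2 and 8.3.

```python
import numpy as np

# Minimal policy-iteration sketch (not the chapter's MATLAB MDP toolbox setup).
GAMMA = 0.999                                 # discount factor -> mean lifetime 1/(1 - GAMMA) = 1000 h
N = 4                                         # number of sensor node states

power = np.array([10.0, 15.0, 30.0, 55.0])    # base power units (Table 8.2)
thrpt = np.array([4.0, 8.0, 12.0, 16.0])      # base throughput units (Table 8.2)
delay = np.array([26.0, 14.0, 8.0, 6.0])      # base delay units (Table 8.2)
wp, wt, wd = 0.45, 0.20, 0.35                 # security/defense weight factors (Table 8.3)
H = 0.1                                       # cost of switching to a different state (assumed)

def state_reward(i):
    # Assumed surrogate for the chapter's reward function (Section 8.3.5):
    # favor low power and delay and high throughput, each scaled to [0, 1].
    rp = 1.0 - (power[i] - power.min()) / (power.max() - power.min())
    rt = (thrpt[i] - thrpt.min()) / (thrpt.max() - thrpt.min())
    rd = 1.0 - (delay[i] - delay.min()) / (delay.max() - delay.min())
    return wp * rp + wt * rt + wd * rd

# Action a means "operate in state a next"; switching to a different state incurs H.
R = np.array([[state_reward(a) - (H if a != s else 0.0) for a in range(N)] for s in range(N)])
P = np.zeros((N, N, N))                       # P[s, a, s']: deterministic transitions assumed
for s in range(N):
    for a in range(N):
        P[s, a, a] = 1.0

policy = np.zeros(N, dtype=int)
while True:
    # Policy evaluation: solve (I - GAMMA * P_pi) V = R_pi for the current policy.
    P_pi = P[np.arange(N), policy]
    R_pi = R[np.arange(N), policy]
    V = np.linalg.solve(np.eye(N) - GAMMA * P_pi, R_pi)
    # Policy improvement: greedy with respect to the action-value function.
    new_policy = (R + GAMMA * (P @ V)).argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("policy:", policy, "expected total discounted reward per state:", V)
```

With these particular assumptions the greedy policy simply settles on the state whose surrogate reward is highest; the point of the sketch is the evaluation/improvement loop, which mirrors the algorithm of Section 8.3.7.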

8.6.3 Results for a Security/Defense System Application

8.6.3.1 The Effects of Different Discount Factors on the Expected Total Discounted Reward

Table 8.4 and Fig. 8.5 depict the effects of different discount factors $\lambda$ on the heuristic policies and $\pi_{MDP}$ for a security/defense system when the state transition cost $H_{i,j}$ is held constant at 0.1 for $i \neq j$ and the weight factors $\omega_p$, $\omega_t$, and $\omega_d$ are equal to 0.45, 0.2, and 0.35, respectively. Since we assume the time between successive decision epochs to be 1 h, the range of $\lambda$ from 0.94 to 0.99999 corresponds to a range of average sensor node lifetime from 16.67 to 100,000 h ($\approx$ 4167 days $\approx$ 11.4 years). Table 8.4 and Fig. 8.5 show that $\pi_{MDP}$ results in the highest expected total discounted reward for all values of $\lambda$ and the corresponding average sensor node lifetimes.

Table 8.4 The effects of different discount factors $\lambda$ for a security/defense system

Discount factor $\lambda$   Sensor lifetime (h)   $\pi_{MDP}$   $\pi_{POW}$   $\pi_{THP}$   $\pi_{EQU}$   $\pi_{PRF}$
0.94        16.67       10.0006     7.5111     9.0778     7.2692     7.5586
0.95        20          12.0302     9.0111    10.9111     8.723      9.0687
0.96        25          15.0747    11.2611    13.6611    10.9038    11.3339
0.97        33.33       20.1489    15.0111    18.2445    14.5383    15.1091
0.98        50          30.2972    22.5111    27.4111    21.8075    22.6596
0.99        100         60.7422    45.0111    54.9111    43.6150    45.3111
0.999       1000       608.7522   450.0111   549.9111   436.15     453.0381
0.9999      10,000
0.99999     100,000

$H_{i,j} = 0.1$ if $i \neq j$; $[\omega_p, \omega_t, \omega_d] = [0.45, 0.2, 0.35]$

Figure 8.5 The effects of different discount factors on the expected total discounted reward for a security/defense system. $H_{i,j} = 0.1$ if $i \neq j$; $[\omega_p, \omega_t, \omega_d] = [0.45, 0.2, 0.35]$

Figure 8.6 shows the percentage improvement in expected total discounted reward for $\pi_{MDP}$ for a security/defense system as compared to the fixed heuristic policies. The percentage improvement is calculated as $(R_{MDP} - R_X)/R_{MDP} \times 100\%$, where $R_{MDP}$ denotes the expected total discounted reward for $\pi_{MDP}$ and $R_X$ denotes the expected total discounted reward for the fixed heuristic policy $\pi_X$, with $X \in$ {POW, THP, EQU, PRF}. For instance, when the average sensor node lifetime is 1000 h ($\lambda = 0.999$), $\pi_{MDP}$ results in a 26.08%, 9.67%, 28.35%, and 25.58% increase in expected total discounted reward compared to $\pi_{POW}$, $\pi_{THP}$, $\pi_{EQU}$, and $\pi_{PRF}$, respectively. Figure 8.6 also depicts that $\pi_{MDP}$ shows increased savings as the average sensor node lifetime increases due to an increase in the number of decision epochs and thus prolonged operation of the sensor node in the optimal/suboptimal states prescribed by $\pi_{MDP}$. On average over all discount factors $\lambda$, $\pi_{MDP}$ results in a 25.57%, 9.48%, 27.91%, and 25.1% increase in expected total discounted reward compared to $\pi_{POW}$, $\pi_{THP}$, $\pi_{EQU}$, and $\pi_{PRF}$, respectively.
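As a check on the reported improvement over $\pi_{POW}$, the $\lambda = 0.999$ row of Table 8.4 gives

\[
\frac{R_{MDP} - R_{POW}}{R_{MDP}} \times 100\% = \frac{608.7522 - 450.0111}{608.7522} \times 100\% \approx 26.08\%.
\]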

Figure 8.6 Percentage improvement in expected total discounted reward for $\pi_{MDP}$ for a security/defense system as compared to the fixed heuristic policies. $H_{i,j} = 0.1$ if $i \neq j$; $[\omega_p, \omega_t, \omega_d] = [0.45, 0.2, 0.35]$

8.6.3.2 The Effects of Different State Transition Costs on the Expected Total Discounted Reward

Figure 8.7 depicts the effects of different state transition costs on the expected total discounted reward for a security/defense system with a fixed average sensor node lifetime of 1000 h ($\lambda = 0.999$) and $\omega_p$, $\omega_t$, and $\omega_d$ equal to 0.45, 0.2, and 0.35, respectively. Figure 8.7 shows that $\pi_{MDP}$ results in the highest expected total discounted reward for all transition cost values.

Figure 8.7 The effects of different state transition costs on the expected total discounted reward for a security/defense system. $\lambda = 0.999$; $[\omega_p, \omega_t, \omega_d] = [0.45, 0.2, 0.35]$

Figure 8.7 also shows that the expected total discounted reward for $\pi_{MDP}$ is relatively unaffected by the state transition cost. This relatively constant behavior can be explained by the fact that our MDP policy does not perform many state transitions. Requiring relatively few state transitions to reach the optimal/suboptimal state for the specified application metrics may be advantageous for application managers who consider the number of state transitions prescribed by a policy as a secondary evaluation criterion [299]. $\pi_{MDP}$ performs state transitions primarily at sensor node deployment or whenever a new MDP-based policy is determined as the result of changes in application requirements.

We further analyze the effects of different state transition costs on the fixed heuristic policies, which consistently result in a lower expected total discounted reward than $\pi_{MDP}$. The expected total discounted rewards for $\pi_{POW}$ and $\pi_{THP}$ are relatively unaffected by the state transition cost. The explanation for this behavior is that these heuristics perform state transitions only at initial sensor node deployment, when the sensor node transitions to the lowest power state and the highest throughput state, respectively, and remains in that state for the sensor node's entire lifetime. On the other hand, the state transition cost has the largest effect on the expected total discounted reward for $\pi_{EQU}$ due to its high state transition rate, because the policy spends an equal amount of time in all states. Similarly, high switching costs have a large effect on the expected total discounted reward for $\pi_{PRF}$ (although less severely than for $\pi_{EQU}$) because $\pi_{PRF}$ spends a specified percentage of time in each available state (Section 8.6.1), thus requiring comparatively fewer transitions than $\pi_{EQU}$.

8.6.3.3 The Effects of Different Reward Function Weight Factors on the Expected Total Discounted Reward

Figure 8.8 shows the effects of different reward function weight factors on the expected total discounted reward for a security/defense system when the average sensor node lifetime is 1000 h ($\lambda = 0.999$) and the state transition cost $H_{i,j}$ is held constant at 0.1 for $i \neq j$. We explore various weight factor combinations $[\omega_p, \omega_t, \omega_d]$ appropriate for different security/defense system specifics. Figure 8.8 reveals that $\pi_{MDP}$ results in the highest expected total discounted reward for all weight factor variations.

Figure 8.8 The effects of different reward function weight factors on the expected total discounted reward for a security/defense system. $\lambda = 0.999$; $H_{i,j} = 0.1$ if $i \neq j$

8.6.4 Results for a Healthcare Application

8.6.4.1 The Effects of Different Discount Factors on the Expected Total Discounted Reward

Figure 8.9 depicts the effects of different discount factors $\lambda$ for a healthcare application when the state transition cost $H_{i,j}$ is held constant at 0.1 for $i \neq j$ and $\omega_p$, $\omega_t$, and $\omega_d$ are equal to 0.5, 0.3, and 0.2, respectively. Figure 8.9 shows that $\pi_{MDP}$ results in the highest expected total discounted reward for all values of $\lambda$ and the corresponding average sensor node lifetimes as compared to the fixed heuristic policies.

Figure 8.9 The effects of different discount factors on the expected total discounted reward for a healthcare application. $H_{i,j} = 0.1$ if $i \neq j$; $[\omega_p, \omega_t, \omega_d] = [0.5, 0.3, 0.2]$

Figure 8.10 shows the percentage improvement in expected total discounted reward for $\pi_{MDP}$ for a healthcare application as compared to the fixed heuristic policies. For instance, when the average sensor node lifetime is 1000 h ($\lambda = 0.999$), $\pi_{MDP}$ results in a 16.39%, 10.43%, 27.22%, and 21.47% increase in expected total discounted reward compared to $\pi_{POW}$, $\pi_{THP}$, $\pi_{EQU}$, and $\pi_{PRF}$, respectively. On average over all discount factors $\lambda$, $\pi_{MDP}$ results in a 16.07%, 10.23%, 26.8%, and 21.04% increase in expected total discounted reward compared to $\pi_{POW}$, $\pi_{THP}$, $\pi_{EQU}$, and $\pi_{PRF}$, respectively.

Figure 8.10 Percentage improvement in expected total discounted reward for $\pi_{MDP}$ for a healthcare application as compared to the fixed heuristic policies. $H_{i,j} = 0.1$ if $i \neq j$; $[\omega_p, \omega_t, \omega_d] = [0.5, 0.3, 0.2]$

8.6.4.2 The Effects of Different State Transition Costs on the Expected Total Discounted Reward

Figure 8.11 shows the effects of different state transition costs on the expected total discounted reward for a healthcare application with a fixed average sensor node lifetime of 1000 h ($\lambda = 0.999$) and $\omega_p$, $\omega_t$, and $\omega_d$ equal to 0.5, 0.3, and 0.2, respectively. Figure 8.11 shows that $\pi_{MDP}$ results in the highest expected total discounted reward for all transition cost values. The fixed heuristic policies consistently result in a lower expected total discounted reward than $\pi_{MDP}$. Comparison of Figs. 8.7 and 8.11 reveals that a security/defense system and a healthcare application exhibit similar trends in the effect of state transition cost on the expected total discounted reward.

Figure 8.11 The effects of different state transition costs on the expected total discounted reward for a healthcare application. $\lambda = 0.999$; $[\omega_p, \omega_t, \omega_d] = [0.5, 0.3, 0.2]$

8.6.4.3 The Effects of Different Reward Function Weight Factors on the Expected Total Discounted Reward

Figure 8.12 depicts the effects of different reward function weight factors on the expected total discounted reward for a healthcare application when the average sensor node lifetime is 1000 h ($\lambda = 0.999$) and the state transition cost $H_{i,j}$ is held constant at 0.1 for $i \neq j$. We explore various weight factor combinations $[\omega_p, \omega_t, \omega_d]$ appropriate for different healthcare application specifics. Figure 8.12 shows that $\pi_{MDP}$ results in the highest expected total discounted reward for all weight factor variations.

Figure 8.12 The effects of different reward function weight factors on the expected total discounted reward for a healthcare application. $\lambda = 0.999$; $H_{i,j} = 0.1$ if $i \neq j$

Figures 8.8 and 8.12 show that the expected total discounted reward of $\pi_{POW}$ gradually increases as the power weight factor increases and eventually exceeds that of $\pi_{THP}$ for a security/defense system and a healthcare application, respectively. However, close observation reveals that the expected total discounted reward of $\pi_{POW}$ for a security/defense system is affected more sharply than for a healthcare application because of the more stringent constraint on maximum acceptable power for a healthcare application (Table 8.3). Figures 8.8 and 8.12 also show that $\pi_{PRF}$ tends to perform better than $\pi_{EQU}$ with increasing power weight factors because $\pi_{PRF}$ spends a greater percentage of time in low power states.

8.6.5 Results for an Ambient Conditions Monitoring Application

8.6.5.1 The Effects of Different Discount Factors on the Expected Total Discounted Reward

Figure 8.13 demonstrates the effects of different discount factors $\lambda$ for an ambient conditions monitoring application when the state transition cost $H_{i,j}$ is held constant at 0.1 for $i \neq j$ and $\omega_p$, $\omega_t$, and $\omega_d$ are equal to 0.65, 0.15, and 0.2, respectively. Figure 8.13 shows that $\pi_{MDP}$ results in the highest expected total discounted reward for all values of $\lambda$.

Figure 8.13 The effects of different discount factors on the expected total discounted reward for an ambient conditions monitoring application. $H_{i,j} = 0.1$ if $i \neq j$; $[\omega_p, \omega_t, \omega_d] = [0.65, 0.15, 0.2]$

Figure 8.14 shows the percentage improvement in expected total discounted reward for $\pi_{MDP}$ for an ambient conditions monitoring application as compared to the fixed heuristic policies. For instance, when the average sensor node lifetime is 1000 h ($\lambda = 0.999$), $\pi_{MDP}$ results in an 8.77%, 52.99%, 40.49%, and 32.11% increase in expected total discounted reward as compared to $\pi_{POW}$, $\pi_{THP}$, $\pi_{EQU}$, and $\pi_{PRF}$, respectively. On average over all discount factors $\lambda$, $\pi_{MDP}$ results in an 8.63%, 52.13%, 39.92%, and 31.59% increase in expected total discounted reward as compared to $\pi_{POW}$, $\pi_{THP}$, $\pi_{EQU}$, and $\pi_{PRF}$, respectively.

Figure 8.14 Percentage improvement in expected total discounted reward for $\pi_{MDP}$ for an ambient conditions monitoring application as compared to the fixed heuristic policies. $H_{i,j} = 0.1$ if $i \neq j$; $[\omega_p, \omega_t, \omega_d] = [0.65, 0.15, 0.2]$

8.6.5.2 The Effects of Different State Transition Costs on the Expected Total Discounted Reward

Figure 8.15 depicts the effects of different state transition costs on the expected total discounted reward for an ambient conditions monitoring application with a fixed average sensor node lifetime of 1000 h ($\lambda = 0.999$) and $\omega_p$, $\omega_t$, and $\omega_d$ equal to 0.65, 0.15, and 0.2, respectively. Figure 8.15 reveals that $\pi_{MDP}$ results in the highest expected total discounted reward for all transition cost values. The fixed heuristic policies consistently result in a lower expected total discounted reward than $\pi_{MDP}$. Figure 8.15 also reveals that the ambient conditions monitoring application exhibits trends with respect to different state transition costs similar to those of the security/defense system (Fig. 8.7) and the healthcare application (Fig. 8.11).

Figure 8.15 The effects of different state transition costs on the expected total discounted reward for an ambient conditions monitoring application. $\lambda = 0.999$; $[\omega_p, \omega_t, \omega_d] = [0.65, 0.15, 0.2]$

8.6.5.3 The Effects of Different Reward Function Weight Factors on the Expected Total Discounted Reward

Figure 8.16 shows the effects of different reward function weight factors on the expected total discounted reward for an ambient conditions monitoring application when the average sensor node lifetime is 1000 h ($\lambda = 0.999$) and the state transition cost $H_{i,j}$ is held constant at 0.1 for $i \neq j$. We explore various weight factor combinations $[\omega_p, \omega_t, \omega_d]$ appropriate for different ambient conditions monitoring application specifics. Figure 8.16 reveals that $\pi_{MDP}$ results in the highest expected total discounted reward for all weight factor variations.

Figure 8.16 The effects of different reward function weight factors on the expected total discounted reward for an ambient conditions monitoring application. $\lambda = 0.999$; $H_{i,j} = 0.1$ if $i \neq j$

For an ambient conditions monitoring application, Fig. 8.16 shows that the expected total discounted reward of $\pi_{POW}$ approaches that of $\pi_{MDP}$ as the power weight factor increases because, with higher power weight factors, $\pi_{MDP}$ spends more time in lower power states to meet application requirements. Figure 8.16 also shows that $\pi_{PRF}$ tends to perform better than $\pi_{EQU}$ with increasing power weight factors, similar to the security/defense system (Fig. 8.8) and the healthcare application (Fig. 8.12).

8.6.6 Sensitivity Analysis

An application manager can assign values to the MDP reward function parameters, such as the minimum and maximum acceptable power, throughput, and delay values and the associated weight factors (Table 8.3), before an EWSN's initial deployment according to projected/anticipated application requirements. However, the average sensor node lifetime (calculated from the discount factor $\lambda$) may not be accurately estimated at the time of initial EWSN deployment, as environmental stimuli and wireless channel conditions vary with time and may not be accurately anticipated. The sensor node's lifetime depends on sensor node activity (both processing and communication), which varies with the changing environmental stimuli and wireless channel conditions. Sensitivity analysis examines the effects of changes in the average sensor node lifetime after initial deployment on the expected total discounted reward. In other words, it answers the following question: if the actual lifetime differs from the estimated lifetime, how much expected total discounted reward is lost relative to the case in which the lifetime had been accurately predicted at deployment?

EWSN sensitivity analysis can be carried out with the following steps [299] (a minimal sketch follows the list):

  1. Determine the expected total discounted reward given the actual average sensor node lifetime $L_{act}$, referred to as the Optimal Reward $R_{opt}$.
  2. Let $L_{est}$ denote the estimated average sensor node lifetime, and let $\Delta L$ denote its percentage change from the actual average sensor node lifetime (i.e., $\Delta L = (L_{est} - L_{act})/L_{act} \times 100\%$). $L_{est}$ results in a suboptimal policy with a corresponding suboptimal expected total discounted reward, referred to as the Suboptimal Reward $R_{subopt}$.
  3. The Reward Ratio $RR$ is the ratio of the suboptimal reward to the optimal reward (i.e., $RR = R_{subopt}/R_{opt}$), which indicates how the suboptimal expected total discounted reward varies with the inaccuracy of the average sensor node lifetime estimate.
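The following Python sketch traces these three steps; it assumes the policy is recomputed from the lifetime estimate via its implied discount factor and averages the value function over start states when forming the ratio, and it uses illustrative names throughout (the chapter's exact reward-ratio accounting may differ).

```python
import numpy as np

def reward_ratio(R, P, life_actual, life_estimated):
    # Steps 1-3: optimal reward, suboptimal reward, and their ratio.
    # R[s, a] holds immediate rewards and P[s, a, s'] transition probabilities;
    # lifetimes are in decision epochs (hours here). Names are illustrative.
    gamma_act = 1.0 - 1.0 / life_actual       # mean lifetime = 1 / (1 - discount factor)
    gamma_est = 1.0 - 1.0 / life_estimated

    def evaluate(policy, gamma):
        n = R.shape[0]
        P_pi = P[np.arange(n), policy]
        R_pi = R[np.arange(n), policy]
        return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)

    def solve(gamma):
        policy = np.zeros(R.shape[0], dtype=int)
        while True:
            V = evaluate(policy, gamma)
            improved = (R + gamma * (P @ V)).argmax(axis=1)
            if np.array_equal(improved, policy):
                return policy
            policy = improved

    optimal = evaluate(solve(gamma_act), gamma_act).mean()      # Step 1
    suboptimal = evaluate(solve(gamma_est), gamma_act).mean()   # Step 2 (policy from estimate)
    return suboptimal / optimal                                 # Step 3
```

Reusing the R and P arrays from the earlier policy iteration sketch, a call such as reward_ratio(R, P, life_actual=1000, life_estimated=500) would quantify the reward lost to a 50% lifetime underestimate.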

It can be shown that the reward ratio varies within (0, 2] as $\Delta L$ varies within ($-$100%, 100%]. The reward ratio's ideal value is 1, which occurs when the average sensor node lifetime is accurately estimated/predicted ($\Delta L = 0$, corresponding to $L_{est} = L_{act}$). Sensitivity analysis revealed that our MDP-based policy is sensitive to the accurate determination of parameters, especially the average lifetime, because an inaccurate average sensor node lifetime results in a suboptimal expected total discounted reward. The dynamic profiler module (Fig. 8.1) measures/profiles the remaining battery energy (lifetime) and sends this information to the application manager along with other profiled statistics (Section 8.2.1), which helps in the accurate estimation of $L_{est}$. Estimating $L_{est}$ using the dynamic profiler's feedback ensures that the estimated average sensor node lifetime differs only slightly from the actual average sensor node lifetime and thus helps maintain a reward ratio close to 1.

8.6.7 Number of Iterations for Convergence

The policy iteration algorithm determines $\pi_{MDP}$ and the corresponding expected total discounted reward with a computational complexity on the order of $N^3$, where $N$ is the total number of states. In our numerical results with four sensor node states, the policy iteration algorithm converges in two iterations on average.

8.7 Chapter Summary

In this chapter, we presented an application-oriented dynamic tuning methodology for embedded sensor nodes in distributed EWSNs based on MDPs. Our MDP-based policy tuned the embedded sensor node's processor voltage, processor frequency, and sensing frequency in accordance with application requirements over the lifetime of a sensor node. Our proposed methodology was adaptive and dynamically determined a new MDP-based policy whenever application requirements changed (which may be in accordance with changing environmental stimuli). We compared our MDP-based policy with four fixed heuristic policies and concluded that our proposed MDP-based policy outperformed each heuristic policy for all sensor node lifetimes, state transition costs, and application metric weight factors. We provided implementation guidelines for our proposed policy on embedded sensor nodes. We showed that our proposed policy had a fast convergence rate and was computationally inexpensive, and thus could be considered for implementation in sensor nodes with limited processing resources.

Extensions of this work include enhancing our MDP model to incorporate additional high-level application metrics (e.g., security, reliability, energy, lifetime) as well as additional embedded sensor node tunable parameters (such as radio transmission power and radio transmission frequency). Furthermore, incorporating the wireless channel condition into the MDP state space, thus formulating a stochastic dynamic program that enables sensor node tuning in accordance with changing wireless channel conditions, will be an interesting research direction. Sensor node tuning automation can be enhanced using profiling statistics by architecting mechanisms that enable the sensor node to automatically react to environmental stimuli without the need for an application manager's feedback. Extending our MDP-based dynamic optimization methodology to perform global optimization (i.e., selecting sensor node tunable parameter settings to ensure that application requirements are met for the EWSN as a whole, where different sensor nodes collaborate with each other in determining optimal/suboptimal tunable parameter settings) will be another interesting research problem to explore.
