Chapter 8

Qualitative comparison

Abstract

This chapter concludes the second part of the book and essentially complements the previous chapters by attempting a qualitative comparison of the state-of-the-art malware diffusion modeling approaches provided in this second part of the book. The goal of this comparison is to highlight and recapitulate the special features of each modeling framework and the specific conditions under which it can be applied. Thus, the chapter attempts to identify and suggest the most suitable framework for different types of networks and operational scenarios, i.e. codify the most characteristic and strong features for each modeling approach in a cumulative fashion. The latter aspires to provide an added value to researchers and professional engineers that wish to exploit these approaches in their own theoretical and practical endeavors. Thus, this chapter may be considered as a qualitative comparison of the presented frameworks and a stepping stone for future research.

State-of-the-art malware diffusion models; Comparison; Complexity; Efficiency; Accuracy; Practical value; Sensitivity; Robustness

8.1. Introduction

In the previous four chapters of Part 2, namely, Chapters 47, we presented and analyzed in detail four generic modeling frameworks, developed for studying, and analyzing the malware diffusion processes and further controlling the dynamics of malware diffusion in various types of complex communication networks and different operational scenarios. For all these four frameworks, namely, the queuing, MRF, optimal control, and game-theoretic based frameworks, respectively, the corresponding techniques were developed mainly for one or more types of wireless complex networks. However, as also mentioned before, all of these four frameworks can be in principle generalized to arbitrary types of complex communications networks, e.g. wired and hybrid, in a straightforward manner.
Given these frameworks, their analysis, and provided numerical/simulation results, in this chapter, we attempt a qualitative comparison of the four generic approaches, in order to increase intuition and the potential for their practical exploitation. Considering prior experience and pitfalls identified while developing these frameworks, useful highlights, intuitions, and suggestions are shortlisted and presented in this chapter. We provide suggestions with respect to a number of important facets, such as computational complexity, implementation efficiency, practical value of the approaches, and others, by comparing the four approaches within an as common as possible set of assumptions. Thus, we provide in a codified manner, educated suggestions for applying the more suitable approach in theoretical and practical problems, given the network, operational, and other factors involved.
Consequently, this chapter further complements Part 2 of the book, toward creating longer-term value for the corresponding state-of-the-art techniques. It further aspires to highlight the most important aspects that each of these frameworks is characterized from, which can be exploited in the future in a more intense basis for modeling and analyzing other types of diffusion processes in similar or different types of complex networks. In other words, Chapter 8 further aids toward one of the initial goals of this book namely contributing to the evolution of Network Science, with respect to the diffusion of malicious information, and in general, in the study of information diffusion processes over various types of complex networks.
In the following, we first compare the four frameworks with respect to various designs and operational aspects, mentioned above, and then summarize the comparison in a tabular form.

8.2. Computational Complexity Comparison

In general, computational complexity characterizes the class of computational problems according to their inherent difficulty, and relating those classes to each other [56]. A problem is regarded as inherently difficult if its solution requires significant resources, whatever the algorithm used. Computational complexity formalizes this intuition by introducing mathematical models of computation to study these problems and quantifying the amount of resources needed to solve them, such as time and storage. Other complexity measures are also used, such as the amount of communication (communication complexity) and the number of processors.
In terms of the previously presented malware diffusion modeling frameworks, computational complexity refers to the difficulty posed by each framework in modeling malware diffusion and providing the desired solutions. Each of the four frameworks is based on a mathematical approach, i.e. queuing systems, MRFs, stochastic optimal control, and game theory, each providing various algorithms and solution techniques. In the framework of this study, computational complexity formalizes cumulatively the inherent processing (computational) load, communication (synchronization) load, etc., imposed by the corresponding decision or computational algorithms operating within each framework for obtaining the desired solutions.
Consequently, the computational comparison in this section will indicate roughly which approach requires more computational resources for obtaining a representation of malware diffusion and studying its impact on various types of networks. Computational resources of interest include processing resources, memory requirements, and communication, namely, signaling required to exchange between distributed agents, if so the case, by the corresponding algorithms.
In terms of computational complexity, the queuing-based and the MRF-based approaches have, in general, lower overall complexity than the other two. The first framework based on queuing systems does not employ any distributed or iterative algorithm. In order to acquire results on the network behavior, analytic expressions obtained can be directly applied to yield the desired evaluation metrics. At the same time, if simulation is involved, discrete-event simulation [63] can be applied by taking into account that two events are possible, infection or recovery. The corresponding simulator then determines which nodes are involved in the determined event and performs the required updates in the system state. Computationally, such algorithm is very simple, with very small computational complexity, namely, that of a for loop. However, the simulation should be run for sufficient time to ensure a large number of events take place, and this imposes CPU time requirements. Furthermore, if large networks are simulated, memory requirements also increase given the simulation ime required. Thus, the first framework has very low computational complexity, but typically imposes demanding CPU and memory requirements.
On the other hand, the MRF approach is based on an iterative algorithm (Simulated Annealing (SA) combined with Gibbs sampling). However, this is also characterized by low complexity in the order of O(N)image,1Nimage being the size of the network, for the case of the sequential implementation. The parallel and hybrid ones may have the same or smaller computational complexity, thus making this approach the best in terms of theoretical performance. Furthermore, the memory requirements scale in the order of O(N)image as well, requiring maintaining a simple vector with current and possibly previous state information. Regarding CPU time, again the number of iterations (sweeps) performed is usually in the range 1000–10,000, a load which is not of significant magnitude for modern machines. When a parallel implementation is employed, the computational complexity is even lower, but some signaling requirements emerge for the synchronization of the process. However, even in this case, the associated message passing is minor, scales well with a growing network size, and does not impose noteworthy requirements on the involved network. Overall, this approach emerges as the one with the lowest computational, communications and memory complexity requirements.
The malware diffusion control framework and the game-theoretic techniques, unless in cases where the solution can be shown to have simple (e.g. threshold-based) structures, lie on the more computationally demanding end. Both again provide analytic solutions, however, especially with the game-theoretic ones, solutions are typically computed iteratively. Convergence of the computational algorithms to the desired solutions can be an issue, especially when distributed approaches are required. In such cases, the selection of solution approach can have a significant impact on the speed of convergence to the corresponding solutions and even on the accuracy of the solution itself if time constraints do not permit an adequately ample amount of time to accommodate a sufficient number of iterations.
Thus, among the four frameworks, the MRF-based emerges as the most convenient in terms of computational complexity. Even though solutions are computed iteratively, and in principle infinite time is required for obtaining the desired solution, acceptable approximations are obtained rather fast compared to, e.g. game-theoretic solutions, where typically convergence of the corresponding iterative algorithms takes longer times. The more interested reader may consult more specialized treatments regarding the convergence behavior of the iterative algorithms typically employed in the four presented frameworks. Some relevant references include [228] for MRFs, [137] for optimal control based approaches, and [79] for game-theoretic based ones. The queuing-based framework is also very light in terms of computational complexity, but it is characterized by stricter operational assumptions and conditions that might not be always possible to meet in practical scenarios compared to the MRF or game-theoretic frameworks.

8.3. Implementation Efficiency Comparison

A significant aspect of each framework relevant to the computational complexity explained previously is the efficiency of implementation of each approach and its solutions. The efficiency of implementation refers to the total effort required to develop, deploy, test, and fully exploit the software implementing each framework, which can vary significantly if the framework is fully distributed, asynchronous, etc. The efficiency of implementation of each of the four presented frameworks can be very critical for different operational, network, and environmental conditions on a per case basis, and especially for their practical application. Furthermore, the implementation efficiency could improve or degrade computational complexity performance, by enabling obtaining solutions easier.
The queuing-based approach is perhaps the most agnostic of the four, since it does not require the computation of solutions iteratively and it only provides analytic results with respect to the parameters of the system analyzed. The implementation cost is minor requiring performing the corresponding computations with a suitable computational package, or in an ordinary programming language. Parameters can be modified easily and the results can be obtained in very fast-response scales. No signaling overhead exists and the exploitation of the results is direct. However, simulation of the network behavior can be tricky, as it depends on discrete-event techniques that typically take a significant amount of development and validation time.
The MRF framework for modeling malware diffusion also bears a light implementation load. Apart from the network/neighborhood implementation, which is necessarily common for all four frameworks and other traditional and emerging ones, this approach requires the implementation of the Gibbs sampler. If the Gibbs sampler is sequential, it is only required to compute implicitly the partition function based on the distributed potential function given by (5.4) for all sites in order to successfully compute the probability that each site will change state. A simple for loop is sufficient for this purpose. This can be easily extended for multiple states as well, if the infection model requires so. Furthermore, the same approach will be used for implementing a parallel Gibbs sampler. A minor complication might arise in the case of a synchronous parallel operation. In the event of an asynchronous one, sites do not have to exchange messages and thus implementation remains as simple as the sequential version of the approach. In the synchronous case, some signaling will be required to ensure that all agents (one for each site) have the same picture of system state, which will require message exchanges and thus slightly complicate implementation, especially if the underlying network substrate is wireless. In any case, piecemeal solutions for distributed agents communication available in the literature [12,228] can be employed, ensuring as little signaling overhead as possible.
The implementation of the optimal control framework is heavily dependent on the computation of solutions using Pontryagin’s maximum principle, which involve mixed boundary differential equations, i.e. differential equations for the states with initial conditions and differential equations for the costates with final conditions. Such implementations are readily available for various applications in the control systems domains, e.g. robots and vehicles. These implementations can be easily modified and used for obtaining solutions for malware modeling. Furthermore, the implementations are not difficult to achieve, even though the computational complexity could be in general high.
The implementation of the game-theoretic approaches is comparable but slightly more challenging than the optimal control framework. Specifically, the information structure of each player has to be carefully identified: for instance, whether players have no feedback information about the history of actions and states of the network (hence an open-loop scenario), or they have perfect or imperfect information of the current state, etc. Convergence issues and stability properties of the targeted solutions have to be considered carefully and evaluated within each specific network topology analyzed and attack scenario considered.
Overall, the queuing-based framework emerges as the more efficient for implementation, followed by the MRF-based approaches. However, given the fact that if the studied network changes, MRF can be applied with a minor only change, namely, the correct definition of new neighborhoods, while the queuing-based framework requires more complicated changes, MRF seems again the best “value-for-money” choice in terms of implementation efficiency.

8.4. Sensitivity Comparison

Sensitivity analysis can be, in principle, defined as the study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs. In plain terms, in sensitivity analysis, one studies the variation of outcomes of a model for varying input parameters. Another form of sensitivity analysis is to explore the validity/accuracy of outcomes of a model, when the assumptions are slightly varied, as well as the impact that each such variation of assumptions might have on the quality of solutions, speed of convergence, etc.
In terms of the above, the queuing systems based framework is rather sensitive to the provided assumptions. Especially those regarding the ergodicity of the queuing system are very critical for the validity of the underlying Markov process that is employed for obtaining results on the evolution of malware diffusion. Furthermore, environmental factors affecting the topology, e.g. channel quality, mobility and operational variations, can easily throw systems off their steady-state and thus affect ergodicity of the system. In those cases, careful consideration and potential modification, if possible, of the Markovian setting to ensure that the system remains ergodic in the new steady-state are critical.
The MRF framework is far less sensitive to the above considerations. First of all, the framework is fairly transparent to the type of network topology. As long as local interactions take place, which is the case in all communication networks at the physical layer, the spatial Markov property of the model can be exploited for studying the system. The only modification needed is in the definition of the neighborhoods of nodes. Thus, from the generic perspective, the framework is fairly robust. However, it appears more sensitive to all factors affecting the convergence of the Gibbs sampler. Depending on the type of sampler employed, sequential or parallel, synchronization aspects might have an impact on the speed of convergence. Similarly, topological variations, e.g. network churn and mobility, can affect the speed of convergence of the Gibbs sampler, and even convergence of the sampler itself if the topology variations are extremely frequent. Of course, in any case, the system is capable of following up with the variations, driving the solution search toward the correct convergence points. However, it might be the case that variations take place at such a rate that the sampler cannot follow up. This is a form of “dynamic convergence,” which given the extremely varying nature of this system can be acceptable to some degree.
The optimal control and game-theoretic frameworks more or less follow the same lines. Depending on the spreading model governing the interaction between susceptible and infected nodes, as well as the cost and reward functions, the changes in the obtained solutions for a variation in the parameters of the system can be considerable. Hence, convergence to the desired solution requires careful re-consideration when extrapolating the obtained solutions for obtaining similar ones with varied input parameters.

8.5. Practical Value Comparison

The four frameworks presented exhibit several overlaps in terms of their modeling objectives and features, e.g. focus on the macroscopic evolution of the systems analyzed and application of stochastic techniques. However, at the same time, they also complement each other in various facets of malware diffusion. One such aspect is the practical value of each framework, and the outcomes that can be obtained by each one of them.
The queuing-based framework aims toward a more theoretic indication of robustness capabilities of an attacked complex network. Thus, it allows assessing in a compact fashion the overall and macroscopic capabilities of a network under attack, by identifying which parameters, e.g. number of network nodes and transmission radius, are prone to make the network more vulnerable. The framework can provide ample information on such longer-term aspects of the malware diffusion dynamics and especially the network topology parameters. However, it cannot be used in, e.g. real-time diffusion control. Any control policies obtained using the corresponding analytic results will be necessarily of open-loop character, i.e. control policies will be equivalent to threshold-based decisions on how to react to given circumstances and system states.
This is not the case with the optimal control framework, which allows for a broader span of control scenarios to be accommodated and various parameterized control policies to be obtained. The main objective of this approach is centered around controlling the dynamics of the malware diffusion in a generic fashion, and thus, it seems straightforward that one should prefer this modeling substrate when control is the main objective of the imposed problem. Of course, the MRF and game-theoretic based frameworks enable developing optimization formulations that allow control of malware dynamics as well. However, since both operate iteratively, this can have an impact on real-time controls, complicating their implementation and quality of obtained results.
The MRF-based framework on its part is a very simple to implement and use approach, which can accommodate all types of network topologies and model generically various types of infection models (with many different states, rather than only binary). Thus, this modeling substrate emerges as the most general and “handy” for complex networks, further allowing the study of malware diffusion under complex conditions, e.g. mobility and network churn.
Finally, the game-theoretic framework, which is capable of more accurately modeling the dynamic interactions between pairs of susceptible-infected/malicious nodes, can be employed for those purposes where the impact of the interaction and node response is analyzed. For instance, when the focus of the analysis is on protocol-user interactions, the behavior of legitimate users, e.g. quarantine, and the strategies of attackers, e.g. which network parameter to exploit more, the game-theoretic framework allows more flexibility than, e.g. the queuing framework (an example of attack optimization based on the queuing models is presented in Chapter 9, Section  9.1). The MRF-based framework also models susceptible-infected interactions, especially in networks where these interactions are of local nature via the clique potential functions. However, such interactions are restricted on a physical connectivity scope, contrary to the game-theoretic framework, where arbitrary types of interactions between susceptible-infected pairs can be analyzed and studied with respect to their impact on malware diffusion.
One significant aspect of all the presented frameworks and their parameters is the extent to which all such model parameters are associated with their real counterparts. In most of them, e.g. average/instantaneous infection/recovery rate, node/edge churn rates, and link infection probability, it is possible to obtain them accurately by measurement and statical processing. This is possible for all model parameters that can be observed macroscopically in a network through a simple counting process (either in the long-term yielding average quantities, or instantaneously yielding infinitesimal quantities). An example of such case is the average infection/recovery rate, which can be measured by a distributed process running over a legitimate network and updating two counters after each infection/recovery event.
On the other hand, there are model parameters/assumptions that are more complicated to observe/measure and validate in real systems. For instance, in the queuing and optimal control approaches, it is assumed that infections arrive according to Poisson processes in the network. This a very tough assumption to validate in the case of malware diffusion applications, because very specialized software is required to collect the involved data, in addition to the willingness of users to collaborate. However, it seems more viable to validate such assumption in information dissemination applications that resemble malware diffusion (Chapter 9, Section  9.2), where data from online social networks can be collected (see, for instance, relevant datasets provided in [54]) and processed to reconstruct the sequence of events taking place. For all these model parameters and assumptions, great caution is required by the interested researchers, first in validating them, if possible, and then in the application of the provided frameworks and the extent of accuracy of the obtained results.

8.6. Modeling Differences

The malware diffusion modeling frameworks presented in Chapters 47 can be grouped into two major classes, the first group (queuing and MRF-based) mainly deals with SIS types of networks via probabilistic and Markovian tools, while the second (optimal control and game-theoretic techniques) mainly addresses epidemics models (SIRD type). By an overall inspection of the mathematical techniques involved in the basis of each framework, it can be readily observed that there exist some commonalities between them, but also fundamental differences as well. They are concisely explained in the following.
Regarding the employed node infection model, the queuing and MRF-based approaches assume both a SIS type of malware diffusion. This means that both study the long-term behavior of the attacked network, when possibly multiple threats propagate. Thus, both frameworks aim at characterizing the long-term behavior and robustness capabilities of the network and can be used extensively as benchmarks for more specific attacks. Moreover, they can be used for design purposes, analyzing the inherent defensive capabilities of networks, and allowing the design of efficient countermeasures. On the contrary, the optimal control and game-theoretic frameworks assume a SIRD type of malware diffusion, which is more suitable for the study of specific single attacks. Thus, these approaches can provide useful outcomes for further assessing and securing attacked networks, but in a more targeted manner. This means that the two emerging groups of approaches can be used complementary for the study of malware diffusion.
In both optimal control and game-theoretic frameworks in Chapters 6 and 7, the dynamics of the epidemic is subject to dynamic decisions of rational entities, in order to achieve desired objectives. The difference between the two frameworks is that in optimal control framework, only one agent, either the attacker or the defender of the network, is assumed to be able to dynamically manipulate the dynamics of the spread, while in the game-theoretic framework, both of them can. For instance, while in both Eq. (6.1) and (7.1) the worm is assumed to be able to dynamically decide the rate of killing the infective nodes, in (6.1), the contact rate of nodes as well as the rate of patch dissemination is assumed as given exogenous functions, while in (7.1), these rates can be dynamically decided by the defender, and the attacker can only reasonably assume they will be chosen optimally in response to its choice of the dynamic killing rate, and likewise for the defender. In an optimal control framework, the decision of the controlling entity should take into account the instantaneous cost of a control at a time as well as its effect on future costs by affecting the evolution of the spread. In a differential game framework, each player, in addition, needs to take into account the strategic effect of the control decisions at any given time as well.
A very important common assumption among all frameworks is homogeneity and uniformity. Homogeneous mixing and uniform infection/recovery rates have been considered for all legitimate network nodes. Regarding the second, it is straightforward to extend the analysis in all frameworks to include nonuniform infection/recovery rates. However, that would involve more complicated algebraic computations for obtaining analytical results. Mathematically, it seems easier to do so for the MRF-based framework, due to the simpler analysis and respective expressions emerging. With respect to the first, this is a broader open problem, which is further discussed in Chapter 10.
Regarding the node states, the first two approaches consider mainly two states (S,I) and the second two approaches four states (S,I,R,D). However, extending to cases with more states is straightforward for the queuing and MRF frameworks. Especially for the first aspect, this was shown for networks with churn, while extension of states in the MRF models is rather easy (extension of the phase space). This is not the case with the optimal control and game theory based approaches, where modeling is tightly related to the employed SIRD model and associated states.

8.7. Overall Comparison

In Table 8.1, we present cumulatively the main features of the comparison of the four state-of-the-art frameworks presented previously, modeling the diffusion of malware and information in general. Their main performance indices are provided in a codified manner in order to provide an overall picture of their qualities and drawbacks in exploiting them in future research and industrial applications.

Table 8.1

Qualitative Comparison of State-of-the-Art Malware Modeling Frameworks

FeaturesQueuingMRFOptimal ControlGame Theory
Computational complexityLowLowHighHigh
Implementation costLowLow/averageAverageHigh
SensitivityHighLowMediumMedium
Practical valueMediumHighHighMedium

image

In addition, in Chapter 10, an overview of open problems is provided for each of the approaches presented in Part 2 of the book. Such open problems also constitute a comparative basis, in the sense that they indicate the maturity of each approach, tough constraints that need to be currently removed in order to obtain even more accurate and generic solutions, as well as the complete potential of each modeling framework for successfully describing malware diffusion in complex wireless and possibly wired networks.

1 The Big-O notation employed here is defined as follows. Let f,gimage be two functions defined on some subset of the real numbers. Then f(x)=O(g(x))image as ximage, if and only if there is a positive constant Mimage such that for all sufficiently large values of ximage, the absolute value of f(x)image is at most Mimage multiplied by the absolute value of g(x)image, i.e. if and only if there exists a positive real number Mimage and a real number x0image such that f(x)Mg(x)image for all xx0image.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset