1. Introduction
Cyber attacks (CAs) have generally been one-dimensional, involving denial of service (DoS), computer viruses or worms, and unauthorized intrusion (hacking). Websites, mail servers, and client machines are the major targets. However, recent CAs have diversified to include multi-stage and multi-dimensional attacks carried out with a variety of tools and technologies. Next-generation security will require network management and intrusion detection systems that combine short-term sensor information with long-term knowledge databases to provide decision support and cyberspace command and control. One of the important capabilities is to efficiently and promptly predict the threat's tactical intent from the various network alerts generated by Intrusion Detection Sensors (IDSs) or Intrusion Prevention Sensors (IPSs).
Recent efforts to apply data fusion techniques to cyber situational awareness are promising (Salerno et al., 2005; Tadda et al., 2006), but assessing the potential impact of an attack and predicting intent, or high-level data fusion, continue to present substantive challenges. In this chapter, an adaptive Markov game approach is introduced to meet the challenge.
Game theory is not a new concept in the cyber defense domain. Current game theoretic approaches (Alpcan & Basar, 2003; Agah et al., 2004; Sallhammar et al., 2005) for cyber network intrusion detection and decision support are based on static matrix games and simple extensive games, which are usually solved by game trees. For example (see Fig. 1), the red side (attacker) has five options, while the blue side (IDS) has two reactions for information-set 1 (two blue bullets labeled as 1:1) about sub-system 1 and three possible actions for information-set 2 (three blue bullets labeled as 1:2) about sub-systems 2 and 3. The payoffs of both sides are shown to the right of each possible outcome (black bullets). A mixed Nash strategy pair is shown with black lines. The attacker will choose action "Attack Sub-system 1" with probability 3/20, "Attack Sub-system 2" with probability 3/20, and "Do not attack sub-system 2/3" with probability 7/10. The blue side will set an alarm for sub-system 1 with probability 1, sub-system 2 with probability 5/12, sub-system 3 with probability 47/90, and ignore with probability 11/180. It is not difficult to see that these matrix game models lack the sophistication to study multiple players with relatively large action spaces and long planning horizons.
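Mixed strategies of this kind can be computed with standard linear programming. The game of Fig. 1 is a non-zero-sum extensive game, so as a simpler illustration the sketch below solves a hypothetical 2x2 zero-sum attacker/defender matrix game (the payoff values are invented for illustration, not taken from Fig. 1): the attacker maximizes the value v it can guarantee against any defender response.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(payoff):
    """Mixed Nash strategy of the row player (attacker) in a zero-sum
    matrix game: maximize v subject to (payoff^T p)_j >= v for every
    defender column j, sum(p) = 1, p >= 0."""
    m, n = payoff.shape
    # Decision variables z = [p_1, ..., p_m, v]; linprog minimizes, so use -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # v - (payoff^T p)_j <= 0 for each defender column j.
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]              # v is free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Hypothetical payoffs: rows = attacker options, columns = defender alarms.
game = np.array([[3.0, -1.0],
                 [-2.0, 4.0]])
strategy, value = solve_zero_sum(game)  # strategy = [0.6, 0.4], value = 1.0
```

The same LP grows only linearly with the number of pure strategies, which illustrates why static matrix games stay tractable but, as noted above, cannot capture multi-stage dynamics.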
In this chapter, we propose a game theoretic situation awareness and impact assessment approach for cyber network defense systems that considers the changes of threat intents during cyber conflict. From a perspective of data fusion and adaptive control, we use a Markov (stochastic) game method to estimate the belief of each possible cyber attack pattern. Since the parameters in each game player's cost function are not accessible to other players, we design an adaptation scheme, based on the concept of Fictitious Play (FP), for the piecewise linearized Markov game model. A software tool is developed to demonstrate the performance of the adaptive game theoretic high level information fusion approach for cyber network defense, and a simulation example shows the enhanced understanding of cyber-network defense.
Our approach has the following advantages: (1) it is decentralized. Each network defense element or team makes decisions mostly based on local information. We give each group more autonomy, allowing for more flexibility; (2) a Markov decision process (MDP) can effectively model the uncertainties in the noisy cyber environment; (3) it is a game model with two players: red force (network attackers) and blue force (cyber defense resources); and (4) the Fictitious Play (FP) learning concept is integrated. Each player presumes that its opponents are playing stable strategies (Nash equilibria). Each player starts with some initial beliefs and chooses a best response to those beliefs as a strategy in this round. Then, after observing their opponents' actions, the players update their beliefs. The process is then repeated. It is known that if the process converges, then the point of convergence is a Nash equilibrium of the game.
The rest of the chapter is organized as follows. Section 2 describes our proposed framework. Section 3 presents a Markov model for cyber network defense. Then an adaptive design is described in Section 4. Section 5 discusses the simulation tool and simulation result, and Section 6 gives conclusions.
2. Framework for Cyber Threat Intent Inference
As indicated in Fig. 2, our cyberspace security framework has two fully coupled major parts: 1) a data fusion module and 2) a dynamic/adaptive feature recognition module. The data fusion module permits refinement of primitive awareness and assessment into the identification of new attacks, while the dynamic/adaptive feature recognition module generates estimates and learns about them. The adaptive Markov game method, a stochastic approach, is used to evaluate the prospects of each potential attack. Game theory captures the nature of cyber conflict: determining the attacker's strategies is closely allied to decisions on defense and vice versa.
Fig. 2 also charts the data mining and fusion structure. For instance, detection of new attack patterns is linked to Level-1 (L1) data fusion results in dynamic learning, including deception reasoning, trend/variation identification, and multi-agent learning. Our approach to deception detection is heavily rooted in the application of pattern-recognition techniques to locate and diagnose anomalous conditions in the cyber environment. Dynamic learning and refinement can also enhance Level-2 (L2) and Level-3 (L3) data fusion.
Various logs and alerts generated by Intrusion Detection Sensors (IDSs) or Intrusion Prevention Sensors (IPSs) are fed into the L1 data fusion components. The fused objects and related pedigree information are used by a feature/pattern recognition module to generate a primitive prediction of the intents of cyber attackers. If the observed features are already associated with adversary intents, we can easily obtain them by pattern recognition. In some time-critical applications, the primitive prediction can be used before it is refined, because the high-level data fusion refinement is relatively time-consuming due to its multiplicative probability calculations.
High-level (L2 and L3) data fusion based on the Markov game model is proposed to refine the primitive prediction generated in stage 1 and to capture new or unknown cyber attacks. The adaptive Markov (stochastic) game method (AMGM) is used to estimate the belief of each possible cyber attack graph. Game theory can capture the nature of cyber conflicts: the determination of the attacking-force strategies is tightly coupled to the determination of the defense-force strategies and vice versa. AMGM can also deal with the uncertainty and incompleteness of the available information. We propose a graphical model to represent the structure and evolution of the above-mentioned Markov game model so that we can efficiently solve the graphical game problem.
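The chapter does not spell out the belief-update rule, but a natural reading of "estimate the belief of each possible cyber attack graph" is a Bayesian refinement step: each incoming alert re-weights the prior belief over candidate attack patterns by how likely that alert is under each pattern. The sketch below uses hypothetical pattern names and likelihood values purely for illustration.

```python
def update_beliefs(beliefs, likelihoods):
    """One Bayesian refinement step: posterior ~ prior x likelihood,
    renormalized so the beliefs again sum to one."""
    posterior = {p: beliefs[p] * likelihoods.get(p, 1e-9) for p in beliefs}
    total = sum(posterior.values())
    return {p: v / total for p, v in posterior.items()}

# Hypothetical attack patterns with a uniform prior, and the likelihood of
# one observed IDS alert under each pattern (illustrative values).
beliefs = {"ddos": 1 / 3, "worm": 1 / 3, "root_hack": 1 / 3}
likelihoods = {"ddos": 0.7, "worm": 0.2, "root_hack": 0.1}
beliefs = update_beliefs(beliefs, likelihoods)  # "ddos" now dominates
```

Repeating this step as alerts arrive yields the evolving belief over attack patterns that the L2/L3 refinement consumes.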
The captured unknown or new cyber attack patterns are associated with the related L1 results in the dynamic learning block, which takes deception reasoning, trend/variation identification, and distribution models and calculations into account. Our approach to deception detection is heavily based on the application of pattern recognition techniques to detect and diagnose what we call out-of-normal (anomaly) conditions in the cyber environment. The results of dynamic learning or refinement are also used to enhance L2 and L3 data fusion. This adaptive process may be considered as Level-4 data fusion (process refinement; see the 2004 DFIG model (Blasch & Plano, 2005)).
In this chapter, we will focus on the L3 data fusion (adaptive Markov game approach) part in the overall framework shown in Fig. 2.
3. Markov game model for cyber network defense
In general, a Markov (stochastic) game (Shapley, 1953) is specified by (i) a finite set of players N; (ii) a set of states S; (iii) for each player i ∈ N, a finite set of available actions A^i, with joint action space A = A^1 × ... × A^|N|; (iv) a transition rule q: S × A → Δ(S), where Δ(S) is the set of probability distributions over S; and (v) a payoff function r: S × A → R^|N|. In our cyber defense game, the players are the red force (network attackers) and the blue force (network defense system).

The overall system state at time k is the collection of the states of all network nodes,

s(k) = [s_1(k), s_2(k), ..., s_n(k)],

where s_i(k) denotes the state of node i and n is the number of nodes (servers and workstations) in the network.
For the blue team (network defense system), we consider defense actions such as email-filter configuration, directional traffic blocking, host shutdown, and host reset.
For each network node (server or workstation), the state at time k is a triple of the form [operational status, active defense action, observed attack action].
For example, if the state of node 1 at time k is [“normal”, “NULL”, “NULL”], one component of the red action is “email-bombing node 1”, and one component of blue action is “email-filter–configuration-no-block for node 1”, then the probability distribution of all possible next states of node 1 is: [“normal”, “email-filter-configuration”, “email-bombing”] with probability 0.4; [“slow response”, “email-filter-configuration”, “email-bombing”] with probability 0.3; and [“crashed”, “email-filter-configuration”, “email-bombing”] with probability 0.3. The actual probabilities depend on the efficiency of attacking and defending actions.
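The transition model in this example can be encoded directly as a lookup table from (state, red action, blue action) to a distribution over next states. The sketch below reproduces the example's triple-valued node state and its illustrative probabilities; the table keys and the sampling helper are our own construction, not code from the chapter.

```python
import random

# Node state is a triple: (operational status, defense action, attack action).
# Probabilities are the illustrative values from the example above.
TRANSITIONS = {
    (("normal", "NULL", "NULL"),
     "email-bombing", "email-filter-configuration-no-block"): [
        (("normal", "email-filter-configuration", "email-bombing"), 0.4),
        (("slow response", "email-filter-configuration", "email-bombing"), 0.3),
        (("crashed", "email-filter-configuration", "email-bombing"), 0.3),
    ],
}

def next_state(state, red_action, blue_action, rng=random):
    """Sample the next node state from the transition distribution."""
    outcomes = TRANSITIONS[(state, red_action, blue_action)]
    r = rng.random()
    cum = 0.0
    for s, p in outcomes:
        cum += p
        if r < cum:
            return s
    return outcomes[-1][0]  # guard against floating-point round-off

s1 = next_state(("normal", "NULL", "NULL"),
                "email-bombing", "email-filter-configuration-no-block")
```

In practice the probability values would be calibrated to the efficiency of the attacking and defending actions, as the text notes.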
The lower level payoff functions are used by each team (blue or red) to determine the cooperative team actions for each team member based on the available local information. A lower level payoff function is defined for each blue team member and, similarly, lower level payoff functions are obtained for the red team members. The top level payoff function of each team at time k aggregates the lower level payoffs of its team members.
In our approach, the lower level payoffs are calculated in a distributed manner by each team member and sent back to the network administrator via communication networks.
In our proposed approach, the sub-optimal Nash solution to the Markov game is obtained via a K time-step look-ahead approach, in which we only optimize the solution over a K time-step horizon; K is typically 2, 3, 4, or 5. Such suboptimal look-ahead techniques have been used successfully for reasoning in games such as chess, backgammon, and Monopoly. In our case, the objective of each team at time k is to find a series of actions, or course of action (COA), that maximizes its team level payoff over the K-step horizon.
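The K-step look-ahead can be sketched as a recursive search over joint actions. The Markov game is simultaneous-move and general-sum, so the sketch below simplifies to a sequential max-min recursion over a zero-sum abstraction (red maximizes, blue minimizes a common damage payoff); the toy state, `step`, and `payoff` functions are hypothetical illustrations, not the chapter's model.

```python
def lookahead_value(state, k, red_actions, blue_actions, step, payoff):
    """K-step look-ahead value of `state` under a zero-sum abstraction:
    red maximizes and blue minimizes the cumulative payoff.
    step(state, r, b) -> next state; payoff(state) -> stage payoff."""
    if k == 0:
        return payoff(state)
    return max(
        min(
            payoff(state)
            + lookahead_value(step(state, r, b), k - 1,
                              red_actions, blue_actions, step, payoff)
            for b in blue_actions
        )
        for r in red_actions
    )

# Toy abstraction: state = accumulated damage; red adds damage, blue removes it.
step = lambda s, r, b: max(0, s + r - b)
payoff = lambda s: s
undefended = lookahead_value(0, 2, [0, 1], [0], step, payoff)      # blue passive
defended = lookahead_value(0, 2, [0, 1], [0, 1], step, payoff)     # blue responds
```

Even this toy shows the point of the look-ahead: with no blue response the 2-step value is 3 (damage compounds), while an active defender drives it to 0. The recursion's cost grows as (|red actions| x |blue actions|)^K, which is why K is kept small.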
In general, the system model is nonlinear. By the linearization transformation method (Sastry, 1999), the system dynamics can be approximately modeled, piecewise, by a linear system of the form

x(k+1) = A x(k) + B_1 u_1(k) + B_2 u_2(k),

where x(k) is the (local) system state, u_1(k) and u_2(k) are the action vectors of the two players, and A, B_1, and B_2 are matrices of appropriate dimensions.
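One standard way to obtain such a local linear model (a generic numerical sketch, not necessarily the transformation used in the chapter) is central-difference Jacobian linearization of the nonlinear map x(k+1) = f(x, u1, u2) about an operating point:

```python
import numpy as np

def linearize(f, x0, u10, u20, eps=1e-6):
    """Finite-difference linearization of x(k+1) = f(x, u1, u2) about
    (x0, u10, u20), returning the local model matrices A, B1, B2."""
    def jac(g, v0):
        v0 = np.asarray(v0, float)
        n, m = len(g(v0)), len(v0)
        J = np.zeros((n, m))
        for j in range(m):
            dv = np.zeros(m)
            dv[j] = eps
            J[:, j] = (g(v0 + dv) - g(v0 - dv)) / (2 * eps)  # central difference
        return J
    A = jac(lambda x: f(x, u10, u20), x0)
    B1 = jac(lambda u1: f(x0, u1, u20), u10)
    B2 = jac(lambda u2: f(x0, u10, u2), u20)
    return A, B1, B2

# Hypothetical scalar dynamics: quadratic in the state, linear in the actions.
f = lambda x, u1, u2: np.array([x[0] ** 2 + u1[0] - 2.0 * u2[0]])
A, B1, B2 = linearize(f, np.array([1.0]), np.array([0.0]), np.array([0.0]))
# About x0 = 1: A = [[2]], B1 = [[1]], B2 = [[-2]]
```

Repeating this at several operating points gives the piecewise linear models to which the adaptation design of Section 4 is applied.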
4. An adaptation design for linear quadratic games
Since the parameters in each side's cost function are not accessible to the other side, we propose to use an adaptation design based on the concept of Fictitious Play (FP) (Brown, 1951; Fudenberg & Levine, 1998) to learn these unknown parameters. As a learning concept, FP was first introduced by G. W. Brown in 1951. Within the learning scheme, each player presumes that its opponents are playing stable (possibly mixed) strategies. Each player starts with some initial beliefs and chooses a best response to those beliefs as a strategy in this round. Then, after observing their opponents' actions, the players update their beliefs according to some learning rule (e.g., temporal-differencing, Q-learning, or Bayes' rule). The process is then repeated. It is known (Fudenberg & Levine, 1998) that if the process converges, then the point of convergence is a Nash equilibrium of the game.
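The FP loop described above can be sketched on a small matrix game. The example below runs classical fictitious play on matching pennies (a standard zero-sum 2x2 game chosen for illustration, not the cyber game itself): each round, both players best-respond to the empirical frequency of the opponent's past actions, and the empirical frequencies converge toward the mixed Nash equilibrium (1/2, 1/2).

```python
import numpy as np

def fictitious_play(A, B, rounds=2000):
    """Fictitious play for a bimatrix game (A: row payoffs, B: column
    payoffs). Returns the empirical action frequencies of both players."""
    m, n = A.shape
    row_counts, col_counts = np.zeros(m), np.zeros(n)
    row_counts[0] = col_counts[0] = 1          # arbitrary initial beliefs
    for _ in range(rounds):
        col_belief = col_counts / col_counts.sum()
        row_belief = row_counts / row_counts.sum()
        i = int(np.argmax(A @ col_belief))     # row player's best response
        j = int(np.argmax(row_belief @ B))     # column player's best response
        row_counts[i] += 1
        col_counts[j] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Matching pennies: zero-sum, unique mixed Nash equilibrium at (1/2, 1/2).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
p, q = fictitious_play(A, -A)   # both frequencies approach [0.5, 0.5]
```

For zero-sum games this convergence is guaranteed; the chapter's adaptation scheme applies the same belief-update idea to the continuous feedback gains of the linearized game rather than to discrete action frequencies.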
4.1 Nash strategies of linear quadratic games
Let us first consider general two-person infinite-horizon simultaneous linear quadratic games with the system dynamics

x(k+1) = A x(k) + B_1 u_1(k) + B_2 u_2(k),    (12)

with the cost functions

J_i = Σ_{k=0}^{∞} [ x(k)ᵀ Q_i x(k) + u_i(k)ᵀ R_i u_i(k) ],  i = 1, 2,

where Q_i ≥ 0 and R_i > 0 are the state and control weighting matrices of player i. The state feedback Nash strategies have the form

u_1(k) = -L_1 x(k),  u_2(k) = -L_2 x(k),

where L1 and L2 are defined, respectively, by the coupled algebraic Riccati-type equations of the game (Basar & Olsder, 1999; Shen, 2006).
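One common way to compute such feedback Nash gains numerically (a generic sketch, not necessarily the procedure used in the chapter) is a best-response iteration: with the opponent's gain held fixed, each player's problem reduces to a standard discrete-time LQR, solvable with an ordinary Riccati equation; alternating these best responses until the gains stop changing yields an (approximate) Nash pair.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lq_nash_gains(A, B1, B2, Q1, Q2, R1, R2, iters=200):
    """Approximate state-feedback Nash gains (L1, L2) for the game
    x(k+1) = A x + B1 u1 + B2 u2 by iterated LQR best responses."""
    n = A.shape[0]
    L1 = np.zeros((B1.shape[1], n))
    L2 = np.zeros((B2.shape[1], n))
    for _ in range(iters):
        # Player 1's LQR with player 2's feedback substituted into the dynamics.
        A1 = A - B2 @ L2
        P1 = solve_discrete_are(A1, B1, Q1, R1)
        L1 = np.linalg.solve(R1 + B1.T @ P1 @ B1, B1.T @ P1 @ A1)
        # Player 2's LQR against player 1's updated gain.
        A2 = A - B1 @ L1
        P2 = solve_discrete_are(A2, B2, Q2, R2)
        L2 = np.linalg.solve(R2 + B2.T @ P2 @ B2, B2.T @ P2 @ A2)
    return L1, L2

# Hypothetical scalar example: an unstable mode that both players regulate.
A = np.array([[1.1]])
B1 = B2 = np.array([[1.0]])
Q = R = np.eye(1)
L1, L2 = lq_nash_gains(A, B1, B2, Q, Q, R, R)
```

For this symmetric example the iteration converges to identical gains for the two players, and the closed-loop matrix A - B1 L1 - B2 L2 is stable, as expected at a Nash fixed point.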
4.2 Adaptation schemes
In the one-side adaptation scheme, only one of the two players has perfect information about the cost functions of both players, i.e., that player knows the exact cost function parameters of both sides, while its opponent does not. We assume that player 1, who has the perfect information structure, applies its true state feedback Nash strategy, while player 2 must estimate player 1's feedback gain online.
Consider the system defined in (12) with player 1's Nash feedback controller held fixed. Since player 2 cannot compute player 1's gain directly, we introduce the estimation error between player 2's estimate and player 1's true feedback gain, and choose a normalized gradient adaptive law with adaptation gain matrix Γ, where 0 < Γ = Γᵀ. The structure of this one-side adaptation scheme is shown in Fig. 3. Similar to the proof of the normalized gradient algorithm (Tao, 2003, pp. 115-116), we define a parameter error measurement function and, using the eigenvalue decomposition of Γ, verify the key inequality showing that this function is non-increasing. It follows that the estimation error is bounded and that, provided suitable excitation conditions hold, the estimated gain converges to player 1's true Nash feedback gain.
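The normalized gradient law described above can be sketched concretely. The snippet below (our own minimal construction in the spirit of Tao (2003), not the chapter's exact equations) estimates an unknown feedback gain L from observed state/control pairs with u(k) = -L x(k), using the update L̂ ← L̂ - Γ r(k) x(k)ᵀ / (1 + x(k)ᵀ x(k)), where r(k) = u(k) + L̂ x(k) is the prediction residual:

```python
import numpy as np

def estimate_gain(xs, us, gamma=1.0):
    """Normalized-gradient estimation of an unknown state-feedback gain L
    from observed pairs (x(k), u(k)) satisfying u(k) = -L x(k)."""
    m, n = us[0].shape[0], xs[0].shape[0]
    L_hat = np.zeros((m, n))
    for x, u in zip(xs, us):
        r = u + L_hat @ x                              # prediction residual
        L_hat -= gamma * np.outer(r, x) / (1.0 + x @ x)  # normalized step
    return L_hat

# Hypothetical true gain; random states play the role of an exciting signal.
rng = np.random.default_rng(0)
L_true = np.array([[0.5, -0.3]])
xs = [rng.standard_normal(2) for _ in range(300)]
us = [-L_true @ x for x in xs]
L_hat = estimate_gain(xs, us)   # converges to L_true
```

The normalization term 1 + xᵀx keeps the step size bounded regardless of the state magnitude, and convergence here relies on the random states being persistently exciting, mirroring the excitation conditions discussed above.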
We propose two ways to satisfy the excitation conditions (Åström & Wittenmark, 1995, pp. 63-67): a reference signal tracking method and a small system disturbance method.
We also extend the adaptation design to a two-side adaptation scheme (Fig. 4), in which each player must estimate the control gain of its opponent. We assume that each player has access to its own cost function as well as the system dynamics, but does not know the parameters of the cost function of its opponent; i.e., player 1 knows its own cost parameters but not player 2's, and vice versa.
5. Simulation and visualization tool
To evaluate our game theoretic approach for cyber attack prediction and mitigation, we have constructed a Cyber Game Simulation Platform (CGSP) (as shown in Fig. 5) based on an open-source network experiment specification and visualization tool kit (ESVT). Through this event-based, interactive and visual simulation environment, various attack strategies (single stage or multi-staged) and scenarios can be easily played out and the effect of game theoretic attack prediction and mitigation can be visually and quantitatively evaluated.
The implemented network components in this platform include Computer (host), Switch, Open Shortest Path First (OSPF) Router or Firewall, Link (connection), and (Sub)Network (simulated by a single node).
Besides the ordinary network properties such as processing capacity, bandwidth, Pr{error}, and delay, CGSP components can be assigned a number of network attack containment or traffic mitigation properties to act in various defense roles, including smart IDS (intrusion detection system), incoming traffic block, and outgoing traffic block. Additionally, and more importantly, these defense roles or network defense properties can be deployed and re-deployed in real time during a game simulation run based on local intelligence and orders from higher-level command centers.
The color of a link represents the traffic volume on that link. Light gray: less than 1 percent of bandwidth; green: more than 1 percent of bandwidth; red: more than 30 percent of bandwidth; yellow: between the green and red thresholds. The color of a host indicates the host status. Red: infected node; green: vulnerable but not infected; gray: non-vulnerable node.
In our simulation platform, network attacks and defenses are simulated in CGSP by events. Live network packets and other communications are represented and simulated by the main network event queue. Users or software agents can inject packets or network events through the timed event (M/M/1) queue. Security alerts or logs are generated and stored in the security log queue.
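The event-queue architecture described above can be illustrated with a minimal discrete-event loop (a generic sketch; CGSP's actual internals are not shown in the chapter, and the "worm_probe" event type is hypothetical). Events are ordered by time in a heap; a handler may schedule follow-up events, here with exponential inter-arrival times in the spirit of the timed M/M/1 queue:

```python
import heapq
import random

def simulate(initial_events, horizon, rng=None):
    """Minimal discrete-event loop: pop the earliest (time, name, payload)
    event, log it, and let handlers schedule follow-up events."""
    rng = rng or random.Random(0)
    queue = list(initial_events)
    heapq.heapify(queue)
    log = []
    while queue:
        t, name, payload = heapq.heappop(queue)
        if t > horizon:
            break
        log.append((t, name))
        if name == "worm_probe":
            # Schedule the next probe with exponential inter-arrival time.
            dt = rng.expovariate(payload["rate"])
            heapq.heappush(queue, (t + dt, "worm_probe", payload))
    return log

events = simulate([(0.0, "worm_probe", {"rate": 2.0})], horizon=10.0)
```

Security alerts would simply be appended to a separate log queue from within the handlers, matching the three-queue structure (network events, timed events, security logs) described above.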
A number of cyber attacks are included in the CGSP implementation: port scan, buffer attack (to gain control), data bomb or email bomb from and to a single host, distributed denial of service (DDoS) from multiple hosts, worm attack, and root right hack (confidentiality loss). [Note: both buffer-attack victims and worm-infected hosts will join the distributed denial of service when they receive the DDoS command.]
The arsenal of the network defense team includes: smart IDS (with adjustable accuracy and false-positive rate), directional traffic block (outgoing or incoming), host shutdown, and host reset (shutdown and restart). [Note: both SHUTDOWN and RESET clear the infection status on the host.]
We simulated a scenario (Fig. 5) with 23 workstations, 2 routers, 7 switches, and 1 network. In this scenario, we first limit the number of look-ahead steps K to a small value.
In this case, we implemented Nash strategies for the cyber network defense side. We can see from Fig. 6 and Fig. 7 that a target computer (web server) is infected or hacked. The compromised web server is then used by the attacking force to infect other, more important target computers such as file servers or email servers. This two-step attacking scheme is based on two facts: 1) a public web server is easy to attack, and 2) an infected internal computer (the web server in this case) is a more efficient and stealthy platform than an external computer for attacking well-protected computers such as data servers or email servers.
The adaptive scheme is implemented and the estimation results are shown in Fig. 8.
In the next run, we set a different look-ahead step K.
6. Conclusions
We implemented an adaptive Markov game theoretic situation awareness and adversary intent inference approach in a data-fusion/data-mining cyber attack and network defense framework (Fig. 2). The network security system was evaluated and protected from a perspective of data fusion and adaptive control. The goal of our approach was to examine the estimation of network states and the projection of attack activities (similar to an extended course of action (ECOA) in a warfare scenario). We used the Markov game's ability to "step ahead" to infer possible adversary attack patterns. Since the parameters in each game player's cost function are not accessible to other players, we designed an adaptation scheme, based on the concept of Fictitious Play (FP), for the piecewise linearized Markov game model. A software tool was developed to demonstrate the performance of the adaptive game theoretic high level information fusion approach for cyber network defense, and simulations were performed to verify and illustrate the benefits of this model.
References
1. Agah, A.; Das, S. K. & Basu, K. (2004). A non-cooperative game approach for intrusion detection in sensor networks, Vehicular Technology Conference (VTC2004-Fall), pp. 2902-2906, 2004.
2. Alpcan, T. & Basar, T. (2003). A game theoretic application to decision and analysis in network intrusion detection, Proc. 42nd IEEE Conference on Decision and Control, pp. 2595-2600, Maui, Hawaii, USA, 2003.
3. Åström, K. J. & Wittenmark, B. (1995). Adaptive Control, Addison-Wesley Series in Electrical Engineering: Control Engineering, Addison-Wesley, second ed., 1995.
4. Basar, T. & Olsder, G. J. (1999). Dynamic Noncooperative Game Theory, SIAM Series in Classics in Applied Mathematics, second ed., January, 1999.
5. Blasch, E. & Plano, S. (2005). DFIG Level 5 (User Refinement) issues supporting Situational Awareness Reasoning, 7th International Conference on Information Fusion, pp. xxxv-xliii, ISBN 0-7803-9286-8, Philadelphia, PA, USA, July, 2005.
6. Brown, G. W. (1951). Iterative solutions of games by fictitious play, In: Activity Analysis of Production and Allocation (T. C. Koopmans, ed.), New York: Wiley, 1951.
7. Chen, G.; Shen, D.; Kwan, C.; Cruz, Jr., J. B.; Kruger, M. & Blasch, E. (2007). Game theoretic approach to threat prediction and situation awareness, Journal of Advances in Information Fusion, vol. 2, no. 1, pp. 35-48, June, 2007.
8. Fudenberg, D. & Levine, D. K. (1998). The Theory of Learning in Games, Cambridge: MIT Press, 1998.
9. Nash, J. (1951). Noncooperative games, Annals of Mathematics, vol. 54, pp. 286-295, 1951.
10. Salerno, J.; Hinman, M. & Boulware, D. (2005). A situation awareness model applied to multiple domains, Proc. Defense and Security Conference, Orlando, FL, 2005.
11. Sallhammar, K.; Knapskog, S. J. & Helvik, B. E. (2005). Using stochastic game theory to compute the expected behavior of attackers, Proc. 2005 Symposium on Applications and the Internet Workshops, 2005.
12. Sastry, S. S. (1999). Nonlinear Systems: Analysis, Stability and Control, Springer-Verlag, New York, NY, 1999.
13. Shapley, L. S. (1953). Stochastic games, Proc. National Academy of Sciences of the United States of America, vol. 39, pp. 1095-1100, 1953.
14. Shen, D. (2006). Nash Strategies for Dynamic Noncooperative Linear Quadratic Sequential Games, Ph.D. dissertation (adviser: Jose B. Cruz, Jr.), The Ohio State University, Electrical Engineering, 2006.
15. Tadda, G.; Salerno, J.; Boulware, D.; Hinman, M. & Gorton, S. (2006). Realizing situation awareness within a cyber environment, In: Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2006 (B. V. Dasarathy, ed.), Proc. of SPIE Vol. 6242, SPIE, Bellingham, WA, 2006.
16. Tao, G. (2003). Adaptive Control Design and Analysis, Adaptive and Learning Systems for Signal Processing, Communications and Control Series, Hoboken, NJ: Wiley-Interscience, 2003.