An Adaptive Markov Game Model for Cyber Threat Intent Inference

Dan Shen; Genshe Chen; Jose B. Cruz; Jr.; Erik Blasch; and Khanh Pham

doi:10.5772/6690

Author Information

Show +

Dan Shen
- Intelligent Automation, Inc.
Genshe Chen
- DCM Research Resources, LLC.
Jose B. Cruz
- The Ohio State University
Erik Blasch
- The Air Force Research Laboratory, USA
Khanh Pham
- The Air Force Research Laboratory, USA

*Address all correspondence to:

1. Introduction

Cyber attacks (CAs) have generally been one-dimensional, involving denial of service (DoS), computer viruses or worms, and unauthorized intrusion (hacking). Websites, mail servers, and client machines are the major targets. However, recent CAs have diversified to include multi-stage and multi-dimensional attacks with a variety of tools and technologies. Nextgeneration security will require network management and intrusion detection systems that combine short-term sensor information with long-term knowledge databases to provide decision support and cyberspace command and control. One of the important capabilities is to efficiently and promptly predict the threat's tactical intent from various network alerts generated by Intrusion Detection Sensors (IDSs) or Intrusion Prevention Sensors (IPSs).

Recent efforts to apply data fusion techniques to cyber situational awareness are promising (Salerno et al., 2005; Tadda et al., 2006), but assessing the potential impact of an attack and predicting intent, or high-level data fusion, continue to present substantive challenges. In this chapter, an adaptive Markov game approach is introduced to meet the challenge.

Game theory is not a new concept in the cyber defense domain. Current game theoretic approaches (Alpcan & Basar, 2003; Agah et al., 2004; Sallhammar et al., 2005) for cyber network intrusion detection and decision support are based on static matrix games and simple extensive games, which are usually solved by game trees. For example (see Fig. 1), red side (attacker) has five options while blue side (IDS) has two re-actions for informationset 1 (two blue bullets labeled as 1:1) about sub-system 1 and three possible actions for information-set 2 (three blue bullets labeled as 1:2) about sub-system 2 and 3. The payoffs of both sides are shown on the right side of each possible outcome (black bullets). A mixed Nash Strategy pair is shown with black lines. Attacker will choose action “Attack Subsystem 1” with probability 3/20, “Attack Sub-system 2” with probability 3/20, and “Do not attack sub-system 2/3” with probability 7/10. Blue side will set an alarm for sub-system 1 with probability 1, sub-system 2 with probability 5/12, sub-system 3 with probability 47/90, and ignore with probability 11/180. It is not difficult to see that these matrix game models lack the sophistication to study multi-players with relatively large actions spaces, and large planning horizons.

Fig. 1.
An example of decision support based on static matrix game model

In this chapter, we propose a game theoretic situation awareness and impact assessment approach for cyber network defense system to consider the changes of threat intents during cyber conflict. From a perspective of data fusion and adaptive control, we use a Markov (stochastic) game method to estimate the belief of each possible cyber attack pattern. With the consideration that the parameters in each game player's cost function is not accessible to other players, we design an adaptation scheme, based on the concept of Fictitious Play (FP), for the piecewise linearized Markov game model. A software tool is developed to demonstrate the performance of the adaptive game theoretic high level information fusion approach for cyber network defense and a simulation example shows the enhanced understating of cyber-network defense.

Our approach has the following advantages: (1) it is decentralized. Each network defense element or team makes decisions mostly based on local information. We put more autonomies in each group allowing for more flexibilities; (2) a Markov decision process (MDP) can effectively model the uncertainties in the noisy cyber environment; (3) it is a game model with two players: red force (network attackers) and blue force (cyber defense resources); and (4) FP learning concept is integrated. Each player presumes that his opponents are playing stable strategies (Nash equilibria). Each player starts with some initial beliefs and chooses a best response to those beliefs as a strategy in this round. Then, after observing their opponents' actions, the players update their beliefs. The process is then repeated. It is known that if it converges, then the point of convergence is a Nash equilibrium of the game.

The rest of the chapter is organized as follows. Section 2 describes our proposed framework. Section 3 presents a Markov model for cyber network defense. Then an adaptive design is described in Section 4. Section 5 discusses the simulation tool and simulation result, and Section 6 gives conclusions.

2. Framework for Cyber Threat Intent Inference

As indicated in Fig. 2, our cyberspace security framework has two fully coupled major parts: 1) Data fusion module (to refine primitive awareness and assessment, and to identify new cyber attacks); and 2) Dynamic/adaptive feature recognition module (to generate primitive estimations, and to learn new identified new or unknown cyber attacks).

Fig. 2.
Data Fusion Approach for Cyber Situation Awareness and Impact Assessment

The data fusion module permits refinement of primitive awareness and assessment to identification of new attacks while the dynamic/adaptive feature recognition module generates estimates and learns about them. The adaptive Markov game method, a stochastic approach, is used to evaluate the prospects of each potential attack. Game theory captures the nature of cyber conflict: determining the attacker's strategies is closely allied to decisions on defense and vice versa.

Fig. 2 also charts the data mining and fusion structure. For instance, detection of new attack patterns is linked to Level-1 (L1) data fusion results in dynamic learning, including deception reasoning, trend/variation identification, and multi-agent learning. Our approach to deception detection is heavily rooted in the application of pattern-recognition techniques to locate and diagnose anomalous conditions in the cyber environment. Dynamic learning and refinement can also enhance Level-2 (L2) and Level-3 (L3) data fusion.

Various logs and alerts generated by Intrusion Detection Sensors (IDSs) or Intrusion Prevention Sensors (IPSs) are fed into the L1 data fusion components. The fused objects and related pedigree information are used by a feature/pattern recognition module to generate primitive prediction of intents of cyber attackers. If the observed features are already associated with adversary intents, we can easily obtain them by pattern recognition. In some time-critical applications, the primitive prediction can be used before it is refined; because the high-level data fusion refinement operation is relatively time-consuming in the multiplicative of probability calculations.

High-level (L2 and L3) data fusion based on Markov game model is proposed to refine the primitive prediction generated in stage 1 and capture new or unknown cyber attacks. The adaptive Markov (Stochastic) game method (AMGM) is used to estimate the belief of each possible cyber attack graph. Game theory can capture the nature of cyber conflicts: the determination of the attacking-force strategies is tightly coupled to the determination of the defense-force strategies and vice versa. Also AMGM can deal with the uncertainty and incompleteness of the available information. We propose a graphical model to represent the structure and evolution of the above-mentioned Markov game model so that we can efficiently solve the graphical game problem.

The captured unknown or new cyber attack patterns will be associated to related L1 results in the dynamic learning block, which takes deception reasoning, trend/variation identification, and distribution models and calculations into account. Our approach to deception detection is heavily based on the application of pattern recognition techniques to detect and diagnose what we call out-of-normal (anomaly) conditions in the cyber environment. The results of dynamic learning or refinement shall also be used to enhance L2 and L3 data fusion. This adaptive process may be considered as level 4 data fusion (process refinement; see the 2004 DFIG model (Blasch & Plano, 2005)).

In this chapter, we will focus on the L3 data fusion (adaptive Markov game approach) part in the overall framework shown in Fig. 2.

3. Markov game model for cyber network defense

In general, a Markov (stochastic) game (Shapley, 1953) is specified by (i) a finite set of players N, (ii) a set of states S, (iii) for every player i ∈ N, a finite set of available actions Dⁱ (we denote the overall action space D =×_i∈NDⁱ), (iv) a transition rule q : S × D → Δ(S), (where Δ(S) is the space of all probability distributions over S), and (v) a payoff function r : S × D → R^N. For the cyber decision support and attacker intent inference problem, we obtain the following distributed discrete time Markov game (we revise the Markov game model (Chen et al., 2007) used for battle-space and focus on the cyber attack domain properties):

Players (Decision Makers) --- Cyber attackers and network defense system are two players of this Markov game model. We denote cyber attackers as “red team”, and the network defense system (IDSs, Firewalls, Email-Filters, Encryption) as “blue team”. The cooperation within the same team is also modeled so that the coordinated cyber network attacks can be captured and predicted.

State Space --- All the possible states of involved network nodes consist of the state space. For example, the web-server (IP = 26.134.3.125) is controlled by attackers. To determine the optimal IDS deployment, we include the defense status for each network nodes in the state space. So, for the i^th network node, there is a state vector sⁱ(k) at time k.

sik=fpaTE1

where f is the working status of the i^th network node, p is the protection status, a is the status of being attacked, and T is the transpose operator. “Normal” and “malfunction” are typical values of f with the meaning that the node is in the normal working status or malfunction (Recall that in battle space cases, the function status of any unit values can be “undestroyed”, “damaged”, or “destroyed”). p can be the defense unit/service (such as firewall, IDS and filter, with probability) assigned to the node and p = NULL means that the i^th node is unprotected. The type of attacks will be specified in Action Space.Remark 1:

It is not difficult to understand that the system states are determined by two factors: 1) previous states and 2) the current actions. So the whole system can be modelled by a first-order Markov decision process.

The overall system state at time k is

sk=s1ksk2⋯sMkE2

where M is the number of nodes in the involved cyber network.Remark 2:

The system states are updated based on the IDS and the control inputs from the decision makers. Our model does not require 100% accurate measurements from the IDS. The IDS can report several possible situations with the associated confidence vector. For example, one IDS can input “node 1 is damaged with the probability 0.85” and “node 1 is normal with probability 0.15”.

Action Space --- At every time step, each player chooses targets with associated actions based on its observed network information. For the red team (cyber network attackers), we consider the following types of network-based attacks:

Buffer overflow (web attack): it occurs when a program does not check to make sure the data it is putting into a space will actually fit into that space.

Semantic URL attack (web attack): In semantic URL attack, a client manually adjusts the parameters of its request by maintaining the URL's syntax but altering its semantic meaning. This attack is primarily used against Common Gateway Interface (CGI) driven websites.

E-mail Bombing (email attack): In Internet usage, an e-mail bomb is a form of net abuse consisting of sending huge e-mail volumes to an address in an attempt to overflow the mailbox or overwhelm the server.

E-mail spam (email attack): Spamming is the abuse of electronic messaging systems to send unsolicited, and/or undesired bulk messages.

MALware attachment (email attack): Malware is software designed to infiltrate or damage a computer system without the owner's informed consent. Some common MALware attacks are worms, viruses, Trojan horses, etc.

Denial-of-service (network attack): Denial-of-service (DoS) attack is an attempt to make a computer resource unavailable to its intended users. Typically the targets are high-profile web servers where the attack is aiming to cause the hosted web pages to be unavailable on the Internet. A distributed denial of service attack (DDoS) occurs when multiple compromised systems flood the bandwidth or resources of a targeted system, usually a web server(s). These systems are compromised by attackers using a variety of methods.Remark 3:

Some attacks may be multi-stage. For example, e-mail spam and MALware are used first to gain control of several temporal network nodes, which are usually not well protected servers. Then a DoS attack will be triggered to a specified and ultimate target. Our dynamic Markov game model can handle these attacks from a planning perspective. Our mixed Nash strategy pair is based on a fixed finite planning horizon. See Strategies for details.

For the blue team (network defense system), we consider the following defense actions:

IDS deployment: we assume that there are limited IDSs. IDS deployment is similar to resource allocation (target selection) problems in traditional battle-space situations. We try to find an optimal deployment strategy to maximize the chance of detecting all possible cyber network intrusions.

Firewall configuration: A firewall is an information technology (IT) security device which is configured to permit, deny or proxy data connections set and configured by the organization's security policy.

Email-filter configuration: Email filtering is the processing of organizing emails according to a specified criterion. Email filtering software inputs email and for its output, it might (a) pass the message through unchanged for delivery to the user's mailbox, (b) redirect the message for delivery elsewhere, or (c) throw the message away. Some e-mail filters are able to edit messages during processing.

Shut down or reset servers.

Transition Rule --- The objective of the transition rule is to calculate the probability distribution over the state space qsk+1skukBukR, where s_k, s_k+1 are system states at time k and k+1 respectively, ukB,ukR are the overall decisions of the blue team (network defense system) and the red team (cyber attackers), respectively, at time step k. How to decide the overall actions for each team are specified in Strategies.

For each network node (server or workstation), the state of time k+1 is determined by three things: 1) state at time k; 2) control strategies of the two players; and 3) the attack/defense efficiency. If we compare part 3) to battle-space domain, the efficiency is the analogue of kill probability of weapons.

For example, if the state of node 1 at time k is [“normal”, “NULL”, “NULL”], one component of the red action is “email-bombing node 1”, and one component of blue action is “email-filter–configuration-no-block for node 1”, then the probability distribution of all possible next states of node 1 is: [“normal”, “email-filter-configuration”, “email-bombing”] with probability 0.4; [“slow response”, “email-filter-configuration”, “email-bombing”] with probability 0.3; and [“crashed”, “email-filter-configuration”, “email-bombing”] with probability 0.3. The actual probabilities depend on the efficiency of attacking and defending actions.

Payoff Functions --- In this Markov game model, there are two levels of payoff functions for each player (red or blue): lower-level (cooperative within each team) and higher-level (non-cooperative between teams) payoff functions. This hierarchical structure is important to model the coordinated cyber network attacks and specify optimal coordinated network defense strategies and IDS deployment.

The lower level payoff functions are used by each team (blue or red) to determine the cooperative team actions for each team member based on the available local information. For the j^th unit of blue force, the payoff function at time k is defined as ϕjBs˜jBkujBkWBkk where s˜jBk⊆sk is the local information obtained by the j^th blue member, ujBk is the action taken by the blue team member at time k, and W^B (k), the weights for all possible action-target couples of blue force, is announced to all blue team members and determined according to the top level payoff functions from a team-optimal perspective.

ϕjBs˜jBkujBkWBkk=Us˜jBk−wWBkujBkCujBkE3

where, Us˜jBk is the utility or payoff of the current local network state. CujBk is the cost of action to be taken by the blue team member. Usually, U(•) is a negative value if a network node is in malfunction status due to a cyber attack. The specific value depends on the value of the network node. The counterpart in the battle-space domain is the target value of each platform. Function wWBkujBk will calculate the weight for any specified action decision for the j^th member of the blue team based on the received W^B(k), which is determined on a team level and indicates the preference and trend of team defense strategies.

Similarly, we obtain the lower level payoff functions for the j^th member of red player,

ϕjRs˜jRkujRkWRkk=Us˜jRk−wWRkujRkCujRkE4

The top level payoff functions at time k are used to evaluate the overall performance, V(•), of each team.

VBs˜BkukBk=∑j=1MBϕjBs˜jBkujBkWBkkE5

VRs˜RkukRk=∑j=1MRϕjRs˜jRkujRkWRkkE6

In our approach, the lower level payoffs are calculated distributedly by each team member and sent back to network administrator via communication networks.

Strategies --- In game theory, the Nash equilibrium (Nash, 1951) is a kind of optimal collective strategy in a game involving two or more players, where no player has anything to gain by changing only his or her own strategy. If each player has chosen a strategy and no player can benefit by changing his or her strategy while the other players keep theirs unchanged, then the current set of strategy choices and the corresponding payoffs constitute a Nash equilibrium.

In our proposed approach, the sub-optimal Nash solution to the Markov game is obtained via a K time-step look-ahead approach, in which we only optimize the solution in the K time-step horizon. K usually takes 2, 3, 4, or 5. The suboptimal technique is used successfully for reasoning in games such as chess, backgammon, and monopoly. For our case, the objective of each team at time k is to find a serial of actions or course of action (COA) to maximize the following team level payoffs, J(•), respectively,

JkBukBukRuk+1Buk+1R…,uk+KBuk+KR=∑i=kk+KVBs˜BiuiBiE7

JkRukBukRuk+1Buk+1R…uk+KBuk+KR=∑i=kk+KVRs˜RiuiRiE8

Remark 4:

The K-step look-ahead (or moving window) approach well fits the situations in which multi-step cyber network attacks occurs since we evaluate the performance of each team based on the sum of payoffs during a period of K-time steps. With bigger K, the game model can predicate or detect more complicated network attackers with more stages. The cost is the increased computation complexity.

In general, the system model is nonlinear. By the linearization transformation method (Sastry, 1999), the system dynamics can be approximately modeled by a linear system.

Xk+1=AkXk+DkBCkB+DkRCkRE9

where X_k is the state of network at time k. CkB and CkR are the action control of blue and red force, respectively, at time k. DkB and DkR are the related control gains. The objective of each side is to minimize its cost function

JB=XNTQNBRXN⏟Gain ofRedside−XNTQNBBXN⏟Gain of Blue side−∑k=1N−1CkBTPBCkB⏟Cost of Blue ActionsE10

JR=XNTQNRBXN⏟Gain of Blue side−XNTQNRRXN⏟Gain ofRedside−∑k=1N−1CkRTPRCkR⏟Cost ofRedActionsE11

where QkBR, QkBB, QkRB, and QkRR are the gain matrices. Each player determines its gain matrix based on the network topology and its goals. P^B and P^R are the cost matrix of blue player and red player, respectively.

4. An adaptation design for linear quadratic games

With the consideration that the parameters in each side’s cost function is not accessible to the other side, we propose to use an adaptation design based on the concept of Fictitious Play (FP) (Brown, 1951; Fudenberg & Levine, 1998) to learn these unknown properties. As a learning concept, FP was first introduced by G. W. Brown in 1951. Within the learning scheme, each player presumes that her opponents are playing stable (possibly mixed) strategies. Each player starts with some initial beliefs and chooses a best response to those beliefs as a strategy in this round. Then, after observing their opponents' actions, the players update their beliefs according to some learning rule (e.g. temporal-differencing, Q-learning, or Bayes' rule). The process is then repeated. It is known (Levine, 1998) that if it converges, then the point of convergence is a Nash equilibrium of the game.

4.1 Nash strategies of linear quadratic games

Let us first consider general two-person infinite-horizon simultaneous linear quadratic games

xk+1=AkXk+Bk1uk1+Bk2uk2E12

with the cost functions

J1=∑k=0+∞xkTQ1xk+uk1TR11uk1+uk2TR12uk2E13

J2=∑k=0N−1xkTQ2xk+uk1TR21uk1+uk2TR22uk2E14

where x_k∈Rⁿ, uki∈R, Qⁱ > 0, R^ij > 0, system (A, Bi) is stabilizable, (A, Ci) is detectable (where C^iT Cⁱ = Qⁱ), and |B^iTBⁱ|>0 for i=1, 2, and k=0, 1, 2, …. We assume that both players have perfect information structures, with which both players know the exact system dynamics xk+1=Akxk+Bk1uk1+Bk2uk2 and measure exact system states x_k. It is also assumed that the simultaneous game in (12) – (14) has a unique Nash strategy pair for both players. It is well known (Basar & Olsder, 1999) that the Nash strategy pair is specified by

uk1∗=Δarguk1minJ1=γ1xk=L1xkE15

uk2∗=Δarguk2minJ2=γ2xk=L2xkE16

where L1 and L2 are defined, respectively, by

L1=−R11+B1Tk1B1−1B1Tk1A+B2L2E17

L2=−R22+B2Tk2B2−1B2Tk2A+B1L1E18

K¹ and K² are specified in the following coupled discrete-time algebraic Riccati equations (DAREs) that, by assumption, have a unique positive semi-definite solution.

K1=Q1+L1TR11L1+L2TR12L2+A+B1L1+B2L2TK1A+B1L1+B2L2E19

K2=Q2+L1TR21L1+L2TR22L2+A+B1L1+B2L2TK2A+B1L1+B2L2E20

4.2 Adaptation schemes

In the one-side adaptation scheme, only one of the two players has perfect information of the cost functions of both players, i.e., the player knows exact Q¹, Q², R¹¹, R¹², R²¹, R²², while the other player only has access to its cost function and does not know the parameters of the cost function of its opponent. WLOG, we assume that player 1 knows exact Qⁱ, R^ij, A, Bⁱ while player 2 has access to Q², R²¹, R²², A, Bⁱ. We also assume that player 1 will apply its state feedback Nash strategies calculated based the information of system dynamics, system states and cost functions of both players. Player 2 knows in advance that its opponent will implement state feedback Nash strategies. Player 2 will estimate the control gain L¹ first, and then calculate/estimate its own control gain L².

We assume that player 1, who has perfect information structure, will apply its real state feedback Nash strategies u^1* = L¹x_k, so we follow the convectional indirect adaptive control design method (Tao, 2003).

Consider the system defined in (12) with fixed controller (15) for player 1, let Lˆk1 be an estimate of the control gain L¹ of player 1, from the point view of player 2. The block diagram of indirect adaptive control system is shown in Fig. 3.

First, we have

xˆk+1=Axk+B1Lˆk1xk+B2uk2E21

Fig. 3.
Indirect adaptive control design for one-side adaptation scheme

Since matrix B^1TB¹ is invertible, we obtain, from (14) and (21)

B1TB1−1B1Txˆk+1−xk+1=Lˆk1−L1xkE22

We introduce the estimation error ek=B1TB1−1B1Txˆk−xk, (e_k is a scalar), then

ek=L˜k−11ϕkE23

where L˜k−11=Lˆk−11−L1 and ϕk=1zxk=xk−1. (Note that 1zxk denotes the output of the system with transfer function 1z and input x_k.)

Choose the adaptive law for Lˆk1 as

Lˆk1=Lˆk−11−ekϕkTΓmk2E24

where 0<Γ=Γ^T <2I_n is a gain matrix, and

mk=κ+ϕkTϕk,κ>0

As shown in Fig. 3, for any Lˆk1, a design function or mapping will be used to calculate Lˆk2. From (18) and (20), we have a best response strategy for player 2 given the estimated control gain Lˆk1 of player 1,

Lˆk2=−R22+B2TKˆk2B2−1B2TKˆk2A+B1Lˆk1E25

Kˆk2=Q2+Lˆk1TR21Lˆk1+Lˆk2TR22Lˆk2+A+B1Lˆk1+B2Lˆk2TKˆk2A+B1Lˆk1+B2Lˆk2E26

It is proved that the Kˆk2 specified in (25) and (26) is the solution to the DARE

A¯TXA¯−X−A¯TXB¯B¯TXB¯+R¯−1B¯TXA¯+Q¯=0E27

where

A¯=A+B1Lˆk1,B¯=B2,R¯=R22andQ¯=Q2+Lˆk1TR21Lˆk1.

Remark 5:

We can easily obtain Kˆk2 by using commercial software such as MATLAB command dare A¯B¯Q¯R¯.

Similar to the proof of Normalized Gradient algorithm (Tao, 2003, page 115-116), we define a parameter error measurement function VLˆk1=Lˆk1Γ−1Lˆk1T for the adaptive law in (24). Then,

VL˜k1−VL˜k−11=L˜k−1−ekϕkTΓmk2Γ−1L˜k−11−ekϕkTΓmk2T=−1mk2L˜k−11ϕkek+ekϕkTL˜k−11T−ϕkTΓϕkek2mk2=−ek2mk22−ϕkTΓϕkκ+ϕkTϕk≤−αek2mk2E28

where α=2−λ_max[Γ]>0. λ_max[Γ] ∈(0,2) is the maximum eigenvalue of Γ which satisfies the condition 0<Γ<2I_n. In the above, the equality in the first line comes from (24) and L˜k1=Lˆk1−L1, so

Lˆk1=Lˆk−11−ekϕkTΓmk2⇒L˜k1=L˜k−11−ekϕkTΓmk2E30

Now let’s testify the inequality in the last line. By the eigenvalue decomposition of Γ, we can obtain Γ=W^ΤΛW where Λ=diag(λ₁,λ₂,…,λ_n) is the eigenvalue matrix, and W contains the eigenvectors ( and W satisfies the condition W^Τ=W⁻¹). Therefore,

ϕkTΓϕk=ϕkTWTΛWϕk≤λmaxΓϕkTWTWϕk=λmaxΓϕkTϕkE31

Since κ>0, we have κ+ϕkTϕk≥ϕkTϕk. Then

0≤ϕkTΓϕkκ+ϕkTϕk≤λmaxΓE32

This completes the verification of (28), from which we have

α∑k=1Nek2mk2≤VL˜01−VL˜N1≤VL˜01E33

So limN→+∞∑k=1Nek2mk2 is bounded from above by VL˜01. Since mk2=κ+ϕkTϕk>0, we obtain

limN→+∞ek=limN→+∞L˜k−11ϕk=0E34

If ϕ_k is persistently exciting, then

limN→+∞L˜k1=0⇒limN→+∞Lˆk1=L1E35

We proposed two ways to satisfy the excitation conditions (AstrÄom & Wittenmark, 1995, page 63-67). The first one is a reference signal tracking method, and the other is called small system disturbance.

We also attend the adaptation design to two-side adaptation scheme (Fig. 4), in which each player needs to estimate the control gain of its opponent. We assume that each player has access to its own cost function as well as the system dynamics, and does not know the parameters of the cost function of its opponent. i.e., player 1 knows Q¹, R¹², R¹¹, A, B¹ and B² while player 2 has access to Q², R²¹, R²², A, B¹ and B². We also assume that each player knows in advance that its opponent will implement state feedback Nash strategies. So, as shown in Fig. 4, player 2 (or player 1) will estimate the control gain L¹ (or L²) first, and then calculate/estimate its own control gain L² (or L¹). The convergence is proved (Shen, 2006) via two properties of “reaction-curve” like relations between control gains of two players.

Fig. 4.
Indirect adaptive control design for two-side adaptation scheme

5. Simulation and visualization tool

To evaluate our game theoretic approach for cyber attack prediction and mitigation, we have constructed a Cyber Game Simulation Platform (CGSP) (as shown in Fig. 5) based on an open-source network experiment specification and visualization tool kit (ESVT). Through this event-based, interactive and visual simulation environment, various attack strategies (single stage or multi-staged) and scenarios can be easily played out and the effect of game theoretic attack prediction and mitigation can be visually and quantitatively evaluated.

Fig. 5.
Cyber Game Simulation Platform (CGSP)

The implemented network components in this platform includes Computer (host), Switch, Open Shortest Path First (OSPF) Router or Firewall, Link (connection), and (Sub) Network (Simulated by a node).

Besides the ordinary network properties such as processing capacity, bandwidth, Pr{error}, and delay etc., CGSP components can be assigned a number of network attack containment or traffic mitigation properties to act as various defense roles, including smart IDS (intrusion detection systems), incoming traffic block, and outgoing traffic block. Additionally and more importantly, these defense roles or network defense properties can be deployed and re-deployed in real-time during a game simulation run-time based on the local intelligence and orders from higher-level command centers.

The color of a link represents the traffic volume on that link (in KBps and in Mbps). Light Gray: less than 1 percent of bandwidth; Green: more than 1 percent of bandwidth; Yellow: between green and red; Red: more than 30 percent of bandwidth. The color of a host indicates the host status. Red: Infected node; Green: Vulnerable node but not infected; Gray: Non-vulnerable node.

In our simulation platform, network attacks and defenses are simulated in CGSP by events. Live network packets and other communications are represented and simulated by the main network event queue. Users or software agents can inject packets or network events through the timed event (M/M/1) queue. Security alerts or logs are generated and stored in the security log queue.

There are a number of cyber attacks that are included in the CGSP implementation: Port scan, Buffer attack (to gain control), Data bomb or Email bomb from and to a single host, Distributed Denial of service from multiple hosts, Worm attack, and Root right hack (confidentiality loss). [Note: Both buffer attack victims and worm infectives will join the distributed denial of service when they receive the DDOS command.]

The arsenal of network defense team includes: Smart IDS (Accuracy and false positive adjustable), Directional traffic block (outgoing or incoming), Host Shutdown, Host Reset (shutdown and restart). [Note: Both SHUTDOWN and RESET will clear the infection status on the host.]

We simulated a scenario (Fig. 5) with 23 workstations, 2 routers, 7 switches, and 1 network. In this scenario, we first limit the look-ahead steps K to 2 (which means the defense side does not consider the multi-stage attacking patterns).

In this case, we implemented Nash strategies for cyber network defense side. We can see from Fig. 6 and Fig. 7 that a target computer (web server) is infected or hacked. Then the computer (web server) will be used by attacking force to infect other more important target computers such as file servers or email servers. This two-step attacking scheme is based on two facts: 1) a public web server is easy to attack and 2) an infected internal computer (web server in this case) is more efficient and stealthy than an external computer to attack well protected computers such as data servers or email servers.

Fig. 6.
A public web server is infected or hacked.

Fig. 7.
Three more important data servers are attacked by the infected internal server.

The adaptive scheme is implemented and the results are shown in Fig. 8. In this plot, Lˆkijs is the s^th element of Lˆkij, which is the estimate of state feedback control gain, L, of player j by player i, (i,j=1,2). We can see the convergence of Lˆkij to L_j. which is the actual value of control gain for player j. During the adaptation, the overshoots indicate that the decision maker will sometimes overact the changes (deviation from the current Nash equilibria) in the strategies of his opponents.

In the next run, we set the look-ahead step K=5. Then no network nodes are infected or hacked during the simulation of 2 hours. If a public server is infected, the defense side can foresee the enemy’s next attacking internal server from the infected network node. Then a shut-down or reboot action will be taken to destroy the multi-stage attack at the first stage.

6. Conclusions

We implemented an adaptive Markov game theoretic situation awareness and adversary intent inference approach in a data-fusion/data-mining cyber attack and network defense framework (Fig. 2). The network security system was evaluated and protected from a perspective of data fusion and adaptive control. The goal of our approach was to examine the estimation of network states and projection of attack activities (similar to extended course of action (ECOA) in the warfare scenario). We used Markov game theory’s ability to “step ahead” to infer possible adversary attack patterns. With the consideration that the parameters in each game player's cost function is not accessible to other players, we designed an adaptation scheme, based on the concept of Fictitious Play (FP), for the piecewise linearized Markov game model. A software tool was developed to demonstrate the performance of the adaptive game theoretic high level information fusion approach for cyber network defense and simulations were performed to verify and illustrate the benefits of this model.

References

1. Agah, A.; Das, S. K. & Basu, K. (2004). A non-cooperative game approach for intrusion detection in sensor networks, Vehicular Technology Conference, 2004. VTC2004-Fall. pp. 2902 – 2906, 2004.
2. Alpcan, T. & Basar, T. (2003). A game theoretic application to decision and analysis in Network Intrusion Detection, 42nd IEEE CDC 2003, pp. 2595-2600, Maui, Hawaii, USA, 2003.
3. AstrÄom, K.J. & Wittenmark, B. (1995). Adaptive Control. Addison-Wesley Series in Electrical Engineering: Control Engineering, Addison-Wesley, second ed., 1995.
4. Basar, T. & Olsder, G. J. (1999). Dynamic Noncooperative Game Theory, SIAM Series in Classics in Applied Mathematics, second ed., January, 1999.
5. Blasch, E. & Plano, S. (2005). DFIG Level 5 (User Refinement) issues supporting Situational Awareness Reasoning, 7th International Conference on Information Fusion, pp. xxxv-xliii, ISBN: 0-7803-9286-8, Philadelphia, PA, USA, July, 2005.
6. Brown, G. W. (1951). Iterative solutions of games by fictitious play, In: Activity Analysis of Production and Allocation (T. C. Koopmans, ed.), New York: Wiley, 1951.
7. Chen, G.; Shen, D.; Kwan, C.; Cruz, Jr., J. B.; Kruger, M. & Blasch, E. (2007). Game Theoretic Approach to Threat Prediction and Situation Awareness, Journal of Advances in Information Fusion, vol. 2, no. 1, pp. 35-48, June, 2007.
8. Fudenberg D. & Levine, D. K. (1998). The Theory of Learning in Games, Cambridge: MIT Press, 1998.
9. Nash, J. (1951). Noncooperative games, Annals of Mathematics, vol. 54, pp. 286-295, 1951.
10. Salerno, J.; Hinman, M. & Boulware, D. (2005). A Situation Awareness Model Applied To Multiple Domains, Proc. Defense and Security Conference, Orlando, FL, 2005.
11. Sallhammar, K.; Knapskog, S. J. & Helvik, B. E. (2005). Using Stochastic Game Theory to compute the expected Behavior of attackers, Proceedings, 2005 Symposium on Applications and the Internet Workshops, 2005.
12. Sastry, S. S. (1999). Nonlinear Systems: Analysis, Stability and Control, Springer-Verlag, New York, NY, 1999.
13. Shapley, L. S. (1953). Stochastic games, Proc. National Academy of Sciences of the United States of America, vol. 39, pp. 1095-1100, 1953.
14. Shen, D. (2006). Nash strategies for dynamic noncooperative linear quadratic sequential games, Ph.D. Dissertation, Adviser: Jose B. Cruz, Jr., Ohio State University, Electrical Engineering, 2006.
15. Tadda, G.; Salerno, J.; Boulware, D.; Hinman, M. & Gorton, S. (2006). Realizing Situation Awareness within a Cyber Environment, In: Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2006, edited by B. V. Dasarathy, Proc. of SPIE Vol. 6242, SPIE, Bellingham, WA, 2006.
16. Tao, G. (2003). Adaptive Control Design and Analysis, Adaptive and Learning Systems for Signal Processing, Communications and Control Series, Hoboken, N.J.: Wiley-Interscience, 2003.

[1] 1. Agah, A.; Das, S. K. & Basu, K. (2004). A non-cooperative game approach for intrusion detection in sensor networks, Vehicular Technology Conference, 2004. VTC2004-Fall. pp. 2902 – 2906, 2004.

[2] 2. Alpcan, T. & Basar, T. (2003). A game theoretic application to decision and analysis in Network Intrusion Detection, 42nd IEEE CDC 2003, pp. 2595-2600, Maui, Hawaii, USA, 2003.

[3] 3. AstrÄom, K.J. & Wittenmark, B. (1995). Adaptive Control. Addison-Wesley Series in Electrical Engineering: Control Engineering, Addison-Wesley, second ed., 1995.

[4] 4. Basar, T. & Olsder, G. J. (1999). Dynamic Noncooperative Game Theory, SIAM Series in Classics in Applied Mathematics, second ed., January, 1999.

[5] 5. Blasch, E. & Plano, S. (2005). DFIG Level 5 (User Refinement) issues supporting Situational Awareness Reasoning, 7th International Conference on Information Fusion, pp. xxxv-xliii, ISBN: 0-7803-9286-8, Philadelphia, PA, USA, July, 2005.

[6] 6. Brown, G. W. (1951). Iterative solutions of games by fictitious play, In: Activity Analysis of Production and Allocation (T. C. Koopmans, ed.), New York: Wiley, 1951.

[7] 7. Chen, G.; Shen, D.; Kwan, C.; Cruz, Jr., J. B.; Kruger, M. & Blasch, E. (2007). Game Theoretic Approach to Threat Prediction and Situation Awareness, Journal of Advances in Information Fusion, vol. 2, no. 1, pp. 35-48, June, 2007.

[8] 8. Fudenberg D. & Levine, D. K. (1998). The Theory of Learning in Games, Cambridge: MIT Press, 1998.

[9] 9. Nash, J. (1951). Noncooperative games, Annals of Mathematics, vol. 54, pp. 286-295, 1951.

[10] 10. Salerno, J.; Hinman, M. & Boulware, D. (2005). A Situation Awareness Model Applied To Multiple Domains, Proc. Defense and Security Conference, Orlando, FL, 2005.

[11] 11. Sallhammar, K.; Knapskog, S. J. & Helvik, B. E. (2005). Using Stochastic Game Theory to compute the expected Behavior of attackers, Proceedings, 2005 Symposium on Applications and the Internet Workshops, 2005.

[12] 12. Sastry, S. S. (1999). Nonlinear Systems: Analysis, Stability and Control, Springer-Verlag, New York, NY, 1999.

[13] 13. Shapley, L. S. (1953). Stochastic games, Proc. National Academy of Sciences of the United States of America, vol. 39, pp. 1095-1100, 1953.

[14] 14. Shen, D. (2006). Nash strategies for dynamic noncooperative linear quadratic sequential games, Ph.D. Dissertation, Adviser: Jose B. Cruz, Jr., Ohio State University, Electrical Engineering, 2006.

[15] 15. Tadda, G.; Salerno, J.; Boulware, D.; Hinman, M. & Gorton, S. (2006). Realizing Situation Awareness within a Cyber Environment, In: Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2006, edited by B. V. Dasarathy, Proc. of SPIE Vol. 6242, SPIE, Bellingham, WA, 2006.

[16] 16. Tao, G. (2003). Adaptive Control Design and Analysis, Adaptive and Learning Systems for Signal Processing, Communications and Control Series, Hoboken, N.J.: Wiley-Interscience, 2003.

An Adaptive Markov Game Model for Cyber Threat Intent Inference

Theory and Novel Applications of Machine Learning

Author Information

Dan Shen

Genshe Chen

Jose B. Cruz

Erik Blasch

Khanh Pham

1. Introduction

Fig. 1.

2. Framework for Cyber Threat Intent Inference

Fig. 2.

3. Markov game model for cyber network defense

4. An adaptation design for linear quadratic games

4.1 Nash strategies of linear quadratic games

4.2 Adaptation schemes

Fig. 3.

Fig. 4.

5. Simulation and visualization tool

Fig. 5.

Fig. 6.

Fig. 7.

Fig. 8.

6. Conclusions

References

Life-long Learning Through Task Rehearsal and Selective Knowledge Transfer

Your cart

An Adaptive Markov Game Model for Cyber Threat Intent Inference

Theory and Novel Applications of Machine Learning

Author Information

Dan Shen

Genshe Chen

Jose B. Cruz

Erik Blasch

Khanh Pham

1. Introduction

Fig. 1.

2. Framework for Cyber Threat Intent Inference

Fig. 2.

3. Markov game model for cyber network defense

4. An adaptation design for linear quadratic games

4.1 Nash strategies of linear quadratic games

4.2 Adaptation schemes

Fig. 3.

Fig. 4.

5. Simulation and visualization tool

Fig. 5.

Fig. 6.

Fig. 7.

Fig. 8.

6. Conclusions

References

Continue reading from the same book

Theory and Novel Applications of Machine Learning

Your cart