
Construction and Application of Learning Petri Net

Written By

Liangbing Feng, Masanao Obayashi, Takashi Kuremoto and Kunikazu Kobayashi

Submitted: 25 November 2011 Published: 29 August 2012

DOI: 10.5772/48398

From the Edited Volume

Petri Nets - Manufacturing and Computer Science

Edited by Pawel Pawlewski


1. Introduction

Petri nets are excellent networks which combine a well-defined mathematical theory with a graphical representation of the dynamic behavior of systems. The theoretical aspect of Petri nets allows precise modeling and analysis of system behavior, while the graphical representation enables visualization of the state changes of the modeled system [32]. Therefore, Petri nets are recognized as one of the most adequate and sound tools for the description and analysis of concurrent, asynchronous and distributed dynamical systems. However, traditional Petri nets do not have learning capability, so all the parameters which describe the characteristics of the system need to be set individually and empirically when the dynamic system is modeled. The fuzzy Petri net (FPN), which combines the Petri net approach with fuzzy theory, is a powerful modeling tool for knowledge systems based on fuzzy production rules. However, it lacks a learning mechanism, which is a significant weakness when modeling uncertain knowledge systems.

At the same time, intelligent computing is used to achieve the development and application of artificial intelligence (AI) methods, i.e. tools that exhibit characteristics associated with intelligence in human behaviour. Reinforcement learning (RL) and artificial neural networks have been widely used in pattern recognition, decision making, data clustering, and so on. Thus, if intelligent computing methods are introduced into Petri nets, Petri nets may acquire learning capability, and the performance and applicable areas of Petri net models will be widely expanded. A dynamic system can be modeled by a Petri net with learning capability, and the parameters of the system can then be adjusted by online (data-driven) learning. In the same way, if generalized FPNs are extended by adding neural networks and their learning capability, then FPNs are able to realize self-adapting and self-learning functions, and consequently to achieve automatic knowledge reasoning and fuzzy production rule learning.

Recently, there has been some research on making the Petri net have learning capability and making it optimize itself. Global variables are used to record all states of a colored Petri net while it is running [22]; the global variables are optimized and the colored Petri net is updated according to these global variables. A learning Petri net model which combines a Petri net with a neural network was proposed by Hirasawa et al. and applied to nonlinear system control [10]. In our former work [5, 6], a learning Petri net model based on reinforcement learning (RL) was proposed, in which RL is applied to optimize the parameters of the Petri net, and this learning Petri net model was applied to robot system control. Konar gave an algorithm to adjust the thresholds of a FPN through training instances [1]. In [1], the FPN architecture is built on connectionism, just like a neural network, and the model provides semantic justification of its hidden layer; it is capable of approximate reasoning and learning from noisy training instances. A generalized FPN model was proposed by Pedrycz et al., which can be transformed into neural networks with OR/AND logic neurons, so that the parameters of the corresponding neural networks can be learned (trained) [24]. Victor and Shen developed a reinforcement learning algorithm for high-level fuzzy Petri net models [23].

This chapter focuses on combining the Petri net and the fuzzy Petri net with intelligent learning methods to construct a learning Petri net and a learning fuzzy Petri net (LFPN), respectively. These are applied to dynamic system control and system optimization. The rest of this chapter is organized as follows. Section 2 elaborates on the learning Petri net construction and learning algorithm. Section 3 describes how to use the learning Petri net model in robot systems. Section 4 constructs a LFPN. Section 5 shows how the LFPN is used in the Web service discovery problem. Section 6 summarizes the models of Petri net described in the chapter and the results of their applications, and discusses future trends concerning learning Petri nets.


2. The learning Petri net model

The Learning Petri net (LPN) model is constructed based on high-level time Petri net (HLTPN). The definition of HLTPN is given firstly.

2.1. Definition of HLTPN

HLTPN is one of the extended Petri nets.

Definition 1: HLTPN has a 5-tuple structure, HLTPN = (NG, C, W, DT, M0) [9], where

  1. NG = (P, Tr, F) is called a "net graph":

P is a finite set of nodes, called "Places". ID: P → N is a function marking P, where N = {1, 2, …} is the set of natural numbers. p1, p2, …, pn represent the elements of P and n is the cardinality of the set P;

Tr is a finite set of nodes, called "Transitions", which is disjoint from P, P ∩ Tr = ∅. ID: Tr → N is a function marking Tr. tr1, tr2, …, trm represent the elements of Tr, and m is the cardinality of the set Tr;

F ⊆ (P×Tr) ∪ (Tr×P) is a finite set of directional arcs, known as the flow relation;

  2. C is a finite and non-empty color set for describing different types of data;

  3. W: F → C is a weight function on F. For an arc in (P×Tr), the weight function is Win, which decides which colored tokens can go through the arc and enable the transition to fire; these colored tokens are consumed when the transition fires. For an arc in (Tr×P), the weight function is Wout, which decides which colored tokens will be generated by the transition and put into P.

  4. DT: Tr → R is the delay time function of a transition, giving the time delay before an enabled transition fires, or the duration of a transition's firing.

  5. M0: P → ⋃_{p∈P} μC(p) such that ∀p ∈ P, M0(p) ∈ μC(p) is the initial marking function, which associates a multi-set of tokens of the correct type with each place.

2.2. Definition of LPN

In HLTPN, the weight functions of the input and output arcs of a transition decide the input and output tokens of the transition; these weight functions express the input-output mapping of transitions. If these weight functions can be updated according to changes of the system, the modeling ability of the Petri net will be expanded. The delay time of HLTPN expresses how long the preceding state lasts. If the delay time can be learnt while the system is running, the representing ability of the Petri net will be enhanced. RL is a learning method that interacts with a complex, uncertain environment to achieve an optimal policy for the selection of the learner's actions, and it is suited to updating dynamic system parameters through interaction with the environment [18]. Hence, we consider using RL to update the weight functions and the transitions' delay times of the Petri net to construct the LPN. In other words, the LPN is an extended HLTPN in which some transitions' input arc weight functions and transition delay times have a value item which records the reward from the environment.

Definition 2: LPN has a 3-tuple structure, LPN= (HLTPN, VW, VT), where

  1. HLTPN= (NG, C, W, DT, M0) is a High-Level Time Petri Net and NG= (P, Tr, F).

  2. VW (value of weight function): Win → R is a function marking Win. An arc f ∈ (P×Tr) has a set of weight functions Win, and each Win has a reward value item VW, a real number.

  3. VT (value of delay time): DT → R is a function marking DT. A transition has a set of delay times DT, and each DT has a reward value item VT, a real number.

Figure 1.

An example of LPN model

An example of the LPN model is shown in Figure 1. Using LPN, a mapping of input-output tokens is obtained. For example, in Figure 1, colored tokens Cij (i=1; j=1, 2, …, n) are input to P1 by Trinput. There are n weight functions W(<C1j>, VWC1j,1,j) on the same arc F1,j; which of the weight functions W(<Cij>, VWCij,i,j) token C1j obeys to fire a transition depends on the value VWCij,i,j. After token C1j has passed through arc Fi,j (i=1; j=1, 2, …, n), one of Tri,j (i=1; j=1, 2, …, n) fires and generates tokens Cij (i=2; j=1, 2, …, n) in P2. After P2 has colored tokens Cij (i=2; j=1, 2, …, n), Tri,j (i=2; j=1, 2, …, n) fires and different colored tokens Cij (i=3; j=1, 2, …, n) are generated. Then a mapping C1j → C3j is obtained. At the same time, a reward is obtained from the environment according to whether the C3j generated from C1j accords with the system rule. These rewards are propagated to every VWCij,i,j and adjust the VWCij,i,j. After training, the LPN is able to express a correct mapping of input-output tokens.

When LPN is used to model a dynamic system, the system state is modeled as the Petri net marking, i.e. the set of colored tokens in all places of the Petri net, and a change of the system state (a system action) is modeled as the firing of transitions. Some parameters of the system can be expressed as token numbers and colors, arc weight functions, transition delay times, and so on; for example, different system signals are expressed as tokens of different colors. When the system is modeled, some parameters are unknown or uncertain, so these parameters are set randomly. When the system runs, the system parameters are obtained gradually and appropriately through the system interacting with the environment and the effect of RL.
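As a concrete illustration, the sketch below shows one possible in-memory representation of the LPN elements described above. The class and field names (Place, Transition, weight_values, delay_values) are illustrative choices of this sketch, not part of the formal definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Place:
    pid: int
    tokens: List[str] = field(default_factory=list)   # colored tokens currently in the place

@dataclass
class Transition:
    tid: int
    inputs: List[int]                                  # ids of input places
    outputs: List[int]                                 # ids of output places
    # VW: one reward value item per candidate input weight function (colored token)
    weight_values: Dict[str, float] = field(default_factory=dict)
    # VT: one reward value item per candidate delay time
    delay_values: Dict[float, float] = field(default_factory=dict)

def marking(places: List[Place]) -> Dict[int, List[str]]:
    """The system state: which colored tokens every place holds."""
    return {p.pid: list(p.tokens) for p in places}
```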

2.3. Learning algorithm for LPN

In LPN, there are two kinds of parameters. One is discrete parameter −− the arc’s weight function which describes the input and output colored tokens for transition. The other is continuous parameter −− the delay time for the transition firing. Now, we will discuss two kinds of parameters which are learnt using RL.

2.3.1. Discrete parameter learning

In LPN, RL is used to adjust VW and VT through interaction with the environment. RL can learn the optimal policy of the dynamic system through observation of the environment state and improvement of its behavior through trial and error with the environment. The RL agent senses the environment and takes actions; it receives numerical rewards and punishments from some reward function. The agent learns to choose actions that maximize a long-term sum or average of the future reward it will receive.

The arc weight function learning algorithm is based on Q-learning, a kind of RL [18]. In the arc weight function learning algorithm, VWCij,i,j is set randomly at first, so the weight function on the arc is arbitrary. When the system runs, formula (1) is used to update VWCij,i,j.

$VW_{C_{ij},i,j} \leftarrow VW_{C_{ij},i,j} + \alpha\,\big[\,r + \gamma\,\overline{VW_{C_{i+1,j},\,i+1,j}} - VW_{C_{ij},i,j}\,\big]$    (1)

where,

  1. α is the step-size and γ is a discount rate.

  2. r is the reward which W(<Cij>, VWCij,i,j) gets when Tri,j is fired by <Cij>. Because the environment gives the system a reward only at the last step, a feedback learning method is used: if W(<Cij>, VWCij,i,j) through Tri,j generates token <Ci+1,j>, and W(<Ci+1,j>, VWCi+1,j,i+1,j) through Tri+1,j generates token <Ci+2,j>, then VWCi+1,j,i+1,j gets an update value, and this value is fed back as the next reward r of W(<Cij>, VWCij,i,j).

  3. $\overline{VW_{C_{i+1,j},\,i+1,j}}$ is the accumulated value of the weight function W(<Ci+1,j>, VWCi+1,j,i+1,j), updated as

$\big(\overline{VW_{C_{i+1,j},\,i+1,j}}\big)_t = \gamma\,\big(\overline{VW_{C_{i+1,j},\,i+1,j}}\big)_{t-1} + r_t$    (2)

where t is the time at which <Ci+1,j> is generated by W(<Cij>, VWCij,i,j).

When every weight function of the input arcs of the transitions has obtained a value, each transition has a value for its action, and the policy of action selection needs to be considered. The simplest action selection rule is to select the action with the highest estimated state-action value, i.e. the transition corresponding to the maximum VWCij,i,j; this action is called a greedy action. If a greedy action is selected, the learner (agent) exploits its current knowledge. If one of the non-greedy actions is selected instead, the agent explores in order to improve its policy. Exploitation is the right thing to do to maximize the expected reward on one play, while exploration may produce the greater total reward in the long run. Here, a near-greedy selection rule called the ε-greedy method is used in action selection; i.e., an action is selected randomly with a small probability ε, and the action which has the biggest VWCij,i,j is selected with probability 1−ε. The algorithm of LPN is listed in Table 1.

Algorithm 1. Weight function learning algorithm
Initialization: Set all VWij and r of all input arcs' weight functions to zero.
Initialize the learning Petri net, i.e. make the Petri net state M0.
Repeat i) and ii) until the system reaches the end state.
i) When a place gets a colored token Cij, a choice is made as to which arc weight function is obeyed if the functions include this token. This choice follows the ε-greedy selection policy (ε is set according to the execution environment by the user, usually 0 < ε ≪ 1):
A: select the function which has the biggest VWCij,i,j with probability 1−ε;
B: select a function randomly with probability ε.
ii) The transition correlated with the chosen function fires and the reward is observed. Adjust the weight function value using formula (1). At the same time, the update amount α[r + γ·(avg VW) − VWCij,i,j] is fed back to the weight function that generated Cij as its reward for the next time.

Table 1.

Weight function learning algorithm in LPN
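A minimal sketch of the ε-greedy selection and the value update of formula (1) is given below. The reward signal and the set of candidate weight functions are assumed to be supplied by the modeled system; the function and variable names are illustrative, not from the chapter.

```python
import random

def select_weight_function(vw: dict, candidates: list, epsilon: float = 0.1):
    """Choose which arc weight function the arriving colored token obeys (epsilon-greedy)."""
    if random.random() < epsilon:
        return random.choice(candidates)            # explore
    return max(candidates, key=lambda c: vw[c])     # exploit: largest VW

def update_vw(vw: dict, chosen, reward: float, next_value: float,
              alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Formula (1): VW <- VW + alpha * (r + gamma * avg(VW_next) - VW).
    The increment is returned so it can be fed back as the reward of the
    weight function that produced the current token."""
    delta = alpha * (reward + gamma * next_value - vw[chosen])
    vw[chosen] += delta
    return delta
```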

2.3.2. Continuous parameter learning

The delay time of transition is a continuous variable. So, the delay time learning is a problem of RL in continuous action spaces. Now, there are several methods of RL in continuous spaces: discretization method, function approximation method, and so on [4]. Here, discretization method and function approximation method are used in the delay time learning in LPN.

Discretization method

As shown in Figure 2(i), the transition tr1 has a delay time t1. When p1 has a token <tokenn>, the system is in the state in which p1 has a token; at this time transition tr1 is enabled. Because tr1 has a delay time t1, tr1 does not fire immediately. After time t1 has passed, tr1 fires, the token in p1 is taken out, and this state is terminated. Thus, during the delay time of tr1, the state in which p1 has a token continues.

Because the delay time is a continuous variable, the delay time is discretized so that RL can be used to optimize it. For example, tr1 in Figure 2(i) has an undefined delay time t1. Tr1 is discretized into several different transitions which have different delay times (shown in Figure 2(ii)), and every delay time has a value item Q. After Tr1 fires at delay time t1i, it gets a reward r immediately or after its successors get rewards. The value of Q is updated by formula (3).

$Q(P, Tr) \leftarrow Q(P, Tr) + \alpha\,\big[\,r + \gamma\,Q(P', Tr') - Q(P, Tr)\,\big]$    (3)

where Q(P, Tr) is the value of transition Tr at Petri net state P, Q(P′, Tr′) is the value of transition Tr′ at the next state P′ of P, α is a step-size, and γ is a discount rate.

Figure 2.

Transformation form from high-level Petri net to the learning model

After the renewal of Q, the optimal delay time will be selected. In Figure 2(ii), when tr11, …, tr1n get values Q11, …, Q1n, respectively, the transition is selected by the soft-max method according to a Gibbs distribution probability.

$\Pr\{t_t = t \mid p_t = p\} = \dfrac{e^{\beta Q(p,t)}}{\sum_{b \in A} e^{\beta Q(p,b)}}$    (4)

where Pr{t_t = t | p_t = p} is the probability of selecting transition t at state p, β is a positive inverse temperature constant, and A is the set of available transitions.

We now give the learning algorithm for the delay time of LPN using the discretization method; it is listed in Table 2.

Transition’s delay time learning algorithm 1 (Discretization method):
Initialization: discretize the delay time and set Q(p,t) of every transition’s delay time to zero.
Initialize Petri net, i.e. make the Petri net state as P1.
Repeat(i) and (ii) until system becomes end state.
Select a transition using formula (4).
After transition fired and reward is observed, value of Q(p,t) is adjusted using formula (3).
Step 3. Repeat Step2 until t is optimal as required.

Table 2.

Delay time learning algorithm using the discretization method
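The soft-max selection of formula (4) and the Q update of formula (3) over the discretized delay times can be sketched as follows; the candidate delay times and the environment reward are assumed to be given, and the names are illustrative.

```python
import math
import random

def softmax_select(q: dict, beta: float = 10.0):
    """Formula (4): pick a discretized delay time with Gibbs-distribution probabilities."""
    times = list(q.keys())
    weights = [math.exp(beta * q[t]) for t in times]
    return random.choices(times, weights=weights, k=1)[0]

def q_update(q: dict, t, reward: float, next_q: float,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Formula (3): Q(P,Tr) <- Q(P,Tr) + alpha * (r + gamma * Q(P',Tr') - Q(P,Tr))."""
    q[t] += alpha * (reward + gamma * next_q - q[t])
```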

Function approximation method

First, the transition delay time is selected randomly and executed. The value of the delay time is obtained using formula (3). When the system is executed m times, the data (t_i, Q_i(p, t_i)) (i = 1, 2, …, m) are obtained. The relation between the value Q of the delay time and the delay time t is supposed to be Q = F(t). Using the least squares method, F(t) is obtained as follows. It is supposed that F is a function class constituted by polynomials, i.e. that formula (5) holds.

$f(t) = \sum_{k=0}^{n} a_k t^k, \quad f \in F$    (5)

The data (t_i, Q_i(p, t_i)) are substituted into formula (5). Then:

$f(t_i) = \sum_{k=0}^{n} a_k t_i^k \quad (i = 1, 2, \ldots, m;\ m \ge n)$    (6)

Here, the number m of data points (t_i, Q_i(p, t_i)) is not less than the degree n of the polynomial in formula (5). According to the least squares method, we have (7).

$\|\delta\|^2 = \sum_{i=1}^{m}\delta_i^2 = \sum_{i=1}^{m}\Big[\sum_{k=0}^{n} a_k t_i^k - Q_i\Big]^2 \rightarrow \min$    (7)

In fact, (7) is the problem of evaluating the minimum of function (8).

$\|\delta\|^2 = \sum_{i=1}^{m}\Big[\sum_{k=0}^{n} a_k t_i^k - Q_i\Big]^2$    (8)

So, functions (9) and (10) are obtained from (8).

$\dfrac{\partial \|\delta\|^2}{\partial a_j} = 2\sum_{i=1}^{m}\Big(\sum_{k=0}^{n} a_k t_i^k - Q_i\Big)\, t_i^{\,j} = 0 \quad (j = 0, 1, \ldots, n)$    (9)

$\sum_{i=1}^{m}\Big(\sum_{k=0}^{n} t_i^{\,j+k}\, a_k\Big) = \sum_{i=1}^{m} t_i^{\,j}\, Q_i \quad (j = 0, 1, \ldots, n)$    (10)

The solution a_0, a_1, …, a_n of equation (10) can be deduced and Q = f(t) is attained. The solution t*_opt of Q = f(t) which maximizes Q is the expected optimal delay time; it is found among the stationary points of f, given by (11).

$\dfrac{\partial f(t)}{\partial t} = 0$    (11)

The multiple solutions of (11), t = t_opt (opt = 1, 2, …, n−1), are checked with function (5), and the t*_opt among them which makes f(t*_opt) = max f(t_opt) (opt = 1, 2, …, n−1) is the expected optimal delay time. t*_opt is then used as the delay time, the system is executed, and a new Q(p, t*_opt) is obtained. This (t*_opt, Q(p, t*_opt)) is used as new data, and the least squares method can be applied again to acquire a more precise delay time.

After the values of the actions are obtained, the soft-max method is used as the action selection policy. We then give the learning algorithm for the delay time of the learning Petri net using the function approximation method; it is listed in Table 3.

Transition’s delay time learning algorithm 2 ( Function approximation method):
Step 1. Initialization: Set Q(p,t) of every transition’s delay time to zero.
Step 2. Initialize Petri net, i.e. make the Petri net state as P1.
Repeat(i) and (ii) until system becomes end state.
Randomly select the transition delay time t.
After transition fires and reward is observed, the value of Q(p, t)is adjusted using formula (3).
Step 3. Repeat Step 2 until adequate data are obtained. Then evaluate the optimal t using the function approximation method.

Table 3.

Delay time learning algorithm using the function approximation method
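Assuming the (t_i, Q_i) pairs collected so far, the polynomial fit of formulas (5)-(10) and the search for the maximizing delay time of formula (11) can be sketched with NumPy; the polynomial degree and the synthetic data below are illustrative assumptions.

```python
import numpy as np

def optimal_delay_time(ts, qs, degree=2):
    """Fit Q = f(t) by least squares and return the delay time that maximizes f."""
    coeffs = np.polyfit(ts, qs, degree)          # solves the normal equations (10)
    f = np.poly1d(coeffs)
    stationary = f.deriv().roots                 # solutions of f'(t) = 0, formula (11)
    candidates = [r.real for r in stationary if abs(r.imag) < 1e-9]
    candidates += [min(ts), max(ts)]             # also check the interval ends
    return max(candidates, key=f)                # t*_opt with the largest fitted value

# Example: noisy parabolic relation between delay time and its value
ts = np.linspace(0.1, 2.0, 20)
qs = -(ts - 1.2) ** 2 + 1.0 + 0.05 * np.random.randn(ts.size)
print(optimal_delay_time(ts, qs))                # close to 1.2
```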


3. Applying LPN to robotic system control

3.1. Application for discrete event dynamic robotic system control

A discrete event dynamic system is a discrete-state, event-driven system in which the state evolution depends entirely on the occurrence of asynchronous discrete events over time [2]. Petri nets have been used to model various kinds of dynamic event-driven systems such as computer networks, communication systems, and so on. In this section, the LPN is used to model a Sony AIBO learning control system in order to verify the effectiveness of the proposed LPN.

AIBO voice command recognition system

AIBO (Artificial Intelligence roBOt) is a robotic pet designed and manufactured by Sony. AIBO is able to execute different actions, such as go ahead, move back, sit down, stand up, cry, and so on, and it can "listen" to voices via a microphone. A command and control system is constructed to make AIBO understand several human voice commands in Japanese and English and take the corresponding action. The simulation system is developed on Sony AIBO's OPEN-R (Open Architecture for Entertainment Robot) [19]. The architecture of the simulation system is shown in Figure 3. Because there are English and Japanese voice commands for the same AIBO action, the correspondences between voices and actions are established in part (4). The duration of an AIBO action is learned in part (5). After an AIBO action finishes, the rewards for the correctness of the action and for the action duration are given by touching different AIBO sensors.

Figure 3.

System architecture of voice command recognition

LPN model for AIBO voice command recognition system

In the LPN model for the AIBO voice command recognition system, AIBO action changes and action times are modeled as transitions and transition delays, respectively. The human voice command is modeled by tokens of different colors. The LPN model is shown in Figure 4. The meaning of every transition is listed below: Trinput changes a voice signal into colored tokens which describe the voice characteristics. Tr11, Tr12 and Tr13 analyze the voice signal. Tr1 generates 35 different tokens VL1…VL35 according to the voice length. Tr2 generates 8 different tokens E21…E28 according to the energy characteristic of the first twenty voice samples. Tr3 generates 8 different tokens E41…E48 according to the energy characteristic of the first forty voice samples [8]. These three types of tokens are compounded into a compound token <VLl>+<VE2m>+<VE4n> in p2 [12].

Tr2j generates the different voice tokens. The input arc's weight function is ((<VLl>+<VE2m>+<VE4n>), VWVlmn,2j) and the output arc's weight function is a different voice token. A voice token then generates a different action token through Tr3j. When p4~p8 have tokens, the AIBO action lasts. Tr4j takes the token out of p4~p8 and makes the corresponding AIBO action terminate. Tr4j has a delay time DT4i, and every DT4i has a value VT4i; which delay time DT4i the transition adopts depends on VT4i.

Results of simulation

When the system begins running, it can’t recognize the voice commands. A voice command comes and it is changed into a compound Token in p2. This compound Token will randomly generate a voice Token and puts into p3. This voice Token randomly arouses an action Token. A reward for action correctness is gotten, then, VW and VT are updated. For example, a compound colored Token (<VLl>+ <VE2m> + <VE4n>) fired Tr21and colored Token VC1 is put into p3. VC1 firesT32and AIBO acts "go". A reward is gotten according to correctness of action. VWVC1,32 is updated by this reward and VWVC1,32 updated value is fed back to p2 as next time reward value of (<VLl>+ <VE2m> + <VE4n>) fired Tr21. After an action finished, a reward for correctness of action time is gotten and VT is updated.

Figure 4.

LPN model of voice command recognition

Figure 5.

Relation between training times and recognition probability

Figure 5 shows the relation between the number of training trials and the voice command recognition probability. Probability 1 shows the success probability over the most recent 20 training trials. Probability 2 shows the success probability over all training trials. From the result of the simulation, we confirmed that LPN is correct and effective for the AIBO voice command control system.

3.2. Application for continuous parameter optimization

The proposed system is applied to guide dog robot system which uses RFID (Radio-frequency identification) to construct experiment environment. The RFID is used as navigation equipment for robot motion. The performance of the proposed system is evaluated through computer simulation and real robot experiment.

RFID environment construction

RFID tags are used to construct a blind road, as shown in Figure 6. There are forthright roads, corners and traffic light signal areas. The forthright roads have two groups of tags forming two lines of RFID tags. Every tag stores information about the road. The guide dog robot moves, turns or stops on the road according to the information in the tags. For example, if the guide dog robot reads a corner RFID tag, it will turn at the corner. If it reads either an outer-side or an inner-side RFID tag, this implies that the robot is about to deviate from the path and its motion direction needs adjusting. If it reads a traffic control RFID tag, it will stop or keep running according to the traffic light signal which is dynamically written to the RFID.

Figure 6.

The real experimental environment

LPN model for the guide dog

The extended LPN control model for guide dog robot system is presented in Figure 7. The meaning of place and transition in Figure 7 is listed below:

Figure 7.

The LPN model for the guide dog robot

When the system begins running, it first reads the RFID environment and gets the information, and a token is put into P2. These tokens fire one of the transitions Tr2 to Tr6 according to the weight functions on the arcs from P2 to Tr2, …, Tr6. Then the guide dog enters the stop, running, turning corner, left adjusting or right adjusting state. In the P3, P4, P5 states, the guide dog turns at a specific speed. The delay times of Tr7-Tr9 decide the amount of correction when the guide dog adjusts its motion direction.

Reward getting from environment

When Tr7, Tr8 or Tr9 fires, it gets reward r as in formula (12-b) when the guide dog does not read token <Left> or <Right> before reading token <corner>, i.e. the robot runs in the correct direction until arriving at the corner. It gets reward r as in formula (12-a) when it does read <Left> or <Right>, where t is the time from the transition firing to reading token <Left> or <Right>. On the contrary, it gets the punishment −1 as in (12-c) if the robot runs out of the road.

$r = \begin{cases} 1/e^{t} & \text{(12-a)} \\ 1 & \text{(12-b)} \\ -1 & \text{(12-c)} \end{cases}$    (12)
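A direct transcription of the piecewise reward (12) might look as follows; detecting a <Left>/<Right> tag, reaching the <corner> tag, and running off the road are assumed to be reported by the robot's RFID reader, and the event labels are illustrative.

```python
import math

def reward(event: str, t: float = 0.0) -> float:
    """Formula (12): reward after Tr7, Tr8 or Tr9 fires.
    'corner'   -> +1          (12-b)  correct direction kept until the corner
    'deviate'  -> 1 / e**t    (12-a)  <Left>/<Right> read t seconds after firing
    'off_road' -> -1          (12-c)  robot leaves the road
    """
    if event == "corner":
        return 1.0
    if event == "deviate":
        return 1.0 / math.exp(t)
    return -1.0
```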

Computer simulation and real robot experiment

When the robot reads the <Left>, <Right> or <corner> information, it must adjust the direction of its motion. The amount of adjustment is decided by the time the robot continues in the states P3, P4 and P5. So the delay times of Tr7, Tr8 and Tr9 need to be learned.

Figure 8.

Direction adjustment of the guide dog robot motion

Before the simulation, some robot motion parameter symbols are given as:

v velocity of the robot

ω angular velocity of the robot

tpre continuous time of the former state

t adjusting time

tpost last time of the state after adjusting

v, ω, tpre, tpost can be measured by system when the robot is running. The delay time of Tr7, Tr8 and Tr9, i.e. the robot motion adjusting time, is simulated in two cases.

  1. As shown in Figure 8(i), when the robot is running on the forthright road and meets the inside RFID line, its deviation angle θ is:

$\theta = \arcsin(d_1 / l_1) = \arcsin\big(d_1 / (t_{pre}\, v)\big)$    (13)

where d1 is the width of the area between the two inside lines and l1 is the moving distance between two successive readings of the RFID (see Figure 8).

The robot's adjusting time (transition delay time) is t.

If ωt − θ ≥ 0, then

$t_{post} = \dfrac{d_1}{v \sin(\omega t - \theta)}$    (14)

else

$t_{post} = \dfrac{d_2}{v \sin(\omega t - \theta)}$    (15)

Here, tpost is used to calculate reward r using formula (12). In the same way, the reward r can be calculated when the robot meets outside RFID line.

When the robot is running on the forthright road and meets the outside RFID line, the deviation angle θ is

$\theta = \arcsin\big(d_2 / (v\, t_{pre})\big)$    (16)

The robot's adjusting time (transition delay time) is t.

If ωt − θ ≥ 0, then

$t_{post} = \dfrac{d_2}{v \sin(\omega t - \theta)}$    (17)

else the robot will run out of the road, and the reward r is calculated using formula (12).

  2. As shown in Figure 8(ii), when the robot is running at the corner, it must adjust by θ = 90°. If θ ≠ 90°, the robot will read <Left> or <Right> after it turns the corner. Here, the case in which the robot reads the inner-line <Left> or <Right> is considered. The robot's adjusting time is t. If ωt − θ ≥ 0, then

$t_{post} = \dfrac{d_1}{2 v \sin(\omega t - \theta)}$    (18)

else

$t_{post} = \dfrac{d_2}{2 v \sin(\omega t - \theta)}$    (19)

As in case (1), tpost is used to calculate the reward r using formula (12), and in the same way the reward r can be calculated when the robot meets the outside RFID line. The calculation of the reward, computed from t, for the other cases of direction adjustment of the robot is handled as in the above two cases.
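For the inside-line case, the deviation angle (13) and the post-adjustment time (14)/(15) can be computed as below; d1, d2, v, ω and tpre are the quantities defined above, and the sign handling in the "else" branch is an assumption of this sketch so that the returned time stays positive.

```python
import math

def deviation_angle(d1: float, t_pre: float, v: float) -> float:
    """Formula (13): theta = arcsin(d1 / (t_pre * v))."""
    return math.asin(d1 / (t_pre * v))

def post_time(theta: float, t: float, omega: float, v: float,
              d1: float, d2: float) -> float:
    """Formulas (14)/(15): time until the next line is met after adjusting for t seconds."""
    if omega * t - theta >= 0:
        return d1 / (v * math.sin(omega * t - theta))    # (14)
    return d2 / (v * math.sin(theta - omega * t))        # (15), argument flipped to keep it positive
```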

In this simulation, the value of the delay time has only one maximum, at the optimal delay time point: the graph of the relation between the delay time and its value is a parabola. So, when the transition's delay time is learned by the function approximation method described in Section 2.3.2, the relation between the delay time and its value is assumed to be:

$Q = a_2 t^2 + a_1 t + a_0$    (20)

Computer simulations of the transition delay time learning algorithms were executed for all cases of robot direction adjustment. In the simulation of the discretization algorithm, the positive inverse temperature constant β was set to 10.0. After the delay times for the different cases were learned, they were recorded in a delay time table. Then the real robot experiment was carried out using the delay time table obtained by the simulation process.

Result of simulation and experiment

The simulation result of transition’s delay time learning algorithm in two cases is shown in Figure 9.

Figure 9.

Result of simulation for the guide dog robot

The simulation result for θ = 5° for the robot's movement adjustment on the forthright road is shown in Figure 9(i). The simulation result for the robot's movement adjustment at the corner is shown in Figure 9(ii). From the results, it is found that the function approximation method approaches the optimal delay time more quickly than the discretization method, but the discretization method approaches the optimal delay time more closely through longer learning.


4. Construction of the learning fuzzy Petri net model

A Petri net (PN) has the ability to represent and analyze concurrency and synchronization phenomena in an easy way. The PN approach can also be easily combined with other techniques and theories such as object-oriented programming, fuzzy theory, neural networks, etc. These modified PNs are widely used in the fields of manufacturing, robotics, knowledge based systems, process control, as well as other kinds of engineering applications [15]. The fuzzy Petri net (FPN), which combines PN and fuzzy theory, has been used for knowledge representation and reasoning in the presence of inexact data and in knowledge based systems. But the traditional FPN lacks a learning mechanism, which is its main weakness when modeling uncertain knowledge systems [25]. In this section, we propose a new learning model tool, the learning fuzzy Petri net (LFPN) [7]. Contrasting with the existing FPN, there are three extensions in the new model: 1) a place can possess different tokens which represent different propositions; 2) these propositions have different degrees of truth toward different transitions; 3) the truth degree of a proposition can be learned through adjustment of the arc weight functions. The LFPN model obtains the capability of learning fuzzy production rules through truth degree updating, just as an artificial neural network obtains its learning ability through weight adjustment. The LFPN learning algorithm, which introduces the network learning method into Petri net updating, is proposed and the convergence of the algorithm is analyzed.

4.1. The learning fuzzy Petri net model

Petri net is a directed, weighted, bipartite graph consisting of two kinds of nodes, called places and transitions, where arcs are either from a place to a transition or from a transition to a place. Tokens exist at different places. The use of the standard Petri net is inappropriate in situations where systems are difficult to be described precisely. Consequently, fuzzy Petri net is designed to deal with these situations where transitions, places, tokens or arcs are fuzzified.

The definition of fuzzy Petri net

A fuzzy place is associated with a predicate or property. A token in a fuzzy place is characterized by a predicate or property that belongs to the place, and this predicate or property has a level of belonging to the place. In this way, we may get a fuzzy proposition or conclusion, for example, "speed is low". A fuzzy transition may correspond, for instance, to an if-then fuzzy production rule and is realized with truth values by fuzzy inference algorithms [11, 20, 26].

Definition 1: FPN is an 8-tuple, given by FPN = <P, Tr, F, D, I, O, α, β>,

where:

P = {p1, p2, …, pn} is a finite set of places;

Tr = {tr1, tr2, …, trm} is a finite set of transitions;

F ⊆ (P×Tr) ∪ (Tr×P) is a finite set of directional arcs;

D = {d1, d2, …, dn} is a finite set of propositions, where proposition di corresponds to place pi; P ∩ Tr ∩ D = ∅; cardinality of (P) = cardinality of (D);

I: Tr → P is the input function, representing a mapping from transitions to bags of (their input) places, noted as *tr;

O: Tr → P is the output function, representing a mapping from transitions to bags of (their output) places, noted as tr*;

α: P → [0, 1] and β: P → D. A token value in place pi ∈ P is denoted by α(pi) ∈ [0, 1]. If α(pi) = yi, yi ∈ [0, 1] and β(pi) = di, then this states that the degree of truth of proposition di is yi.

A transition trk is enabled if, for all pi ∈ I(trk), α(pi) ≥ th, where th is a threshold value in the unit interval. If this transition fires, tokens are removed from its input places and tokens are deposited in each of its output places. The truth values of the output tokens are yi·uk, where uk is the confidence level value of trk. FPN has the capability of modeling fuzzy production rules. For example, the fuzzy production rule (21) can be modeled as shown in Figure 10.

IF $d_i$ THEN $d_j$ (with Certainty Factor (CF) $u_k$)    (21)

Figure 10.

A fuzzy Petri net model (FPN)

The definition of LFPN

In a FPN, a token in a place represents a proposition and a proposition has a degree of truth. Now, three extensions are made to the FPN and the learning fuzzy Petri net (LFPN) is constructed. First, a place may have different tokens (tokens are distinguished by numbers or colors) and the different tokens represent different propositions, i.e. a place has a set of propositions. Second, when a place has a specific token, there is a specified proposition; this proposition may have different degrees of truth toward the different transitions tr which regard this place as an input place *tr. Third, the weight of each arc is adjustable and is used to record the transition's input and output information.

Definition 3: LFPN is a 10-tuple, given by LFPN = <P, Tr, F, D, I, O, Th, W, α, β> (a LFPN model is shown in Figure 11),

where Tr, F, I, O are the same as in the definition of FPN.

P = {p1, p2, …, pi, …, pn, p′1, p′2, …, p′i, …, p′r} is a finite set of places, where the pi are input places and the p′i are output places.

D = {d11, …, d1N; d21, …, d2N; …, dij, …; dn1, …, dnN; d′11, …, d′1N; d′21, …, d′2N; …, d′ij, …; d′r1, …, d′rN} is a finite set of propositions, where proposition dij is the j-th proposition for input place pi and proposition d′ij is the j-th proposition for output place p′i.

W = {w11, w12, …, w1k, …, w1m; …; wi1, wi2, …, wik, …, wim; …; wn1, wn2, …, wnm; w′11, w′12, …, w′1r; …; w′k1, w′k2, …, w′kj, …, w′kr; …; w′m1, w′m2, …, w′mr} is the set of weights on the arcs, where wik is the weight from the i-th input place to the k-th transition and w′kj is the weight from the k-th transition to the j-th output place.

α(dij, trk) → [0, 1] and β: P → D. When pi ∈ P has a specific token_ij and β(token_ij, pi) = dij, the degree of truth of proposition dij in place pi toward transition trk is denoted by α(dij, trk) ∈ [0, 1]. When trk fires, the probability of proposition dij in pi is α(dij, trk).

Figure 11.

The model of learning fuzzy Petri net (LFPN)


Figure 12.

A LFPN model with one transition

Th = {th1, th2, …, thk, …, thm} represents a set of threshold values in the interval [0, 1] associated with transitions (tr1, tr2, …, trk, …, trm), respectively. If, for all pi ∈ I(trk), α(dij, trk) ≥ thk, then trk is enabled.

As shown in Figure 12, when pi has a token_ij, there is a proposition dij in pi. This proposition dij has different degrees of truth toward tr1, tr2, …, trk, …, trm. When a transition trk fires, tokens are put into p′1, …, p′r according to the weights w′k1, …, w′kr, and each of p′1, …, p′r gets a proposition.

Figure 11 shows a LFPN which has n input places, m transitions and r output places. To explain the truth computing, transition fire rule, token transfer rule and fuzzy production rule expression more clearly, a transition with its related arcs and places is drawn out of Figure 11 and shown in Figure 12.

Truth computing: As shown in Figure 12, wik is the perfect value for token_ij when trk fires. When a set of tokens = (token1j, token2j, …, token_ij, …, token_nj) is input to all places of *trk, β(token1j, p1) = d1j, …, β(token_nj, pn) = dnj. α(dij, trk) is computed using the degree of similarity between token_ij and wik; the calculation is shown in formula (22).

$\alpha(d_{ij}, tr_k) = 1 - \dfrac{|w_{ik} - token_{ij}|}{\max(|w_{ik}|,\, |token_{ij}|)}$    (22)

According to the LFPN models for different systems, the token and weight values may have different data types, and there are different methods for computing α(dij, trk) according to the data type. If the value types of tokens and weights are real numbers, α(dij, trk) is computed by formula (22). In Section 5, α(dij, trk) will be discussed for a LFPN model which has textual tokens and weights.

Transition fire rule: As shown in Figure 12, a set of tokens = (token1j, token2j, …, token_nj) is input to all places of *trk, with β(token1j, p1) = d1j, …, β(token_nj, pn) = dnj. If all α(dij, trk) (i = 1, 2, …, n) ≥ thk, trk is enabled. Several transitions may be enabled at the same time. If formula (23) holds, trk is fired.

$\alpha(d_{1j}, tr_k)\,\alpha(d_{2j}, tr_k)\cdots\alpha(d_{nj}, tr_k) = \max\big(\alpha(d_{1j}, tr_h)\,\alpha(d_{2j}, tr_h)\cdots\alpha(d_{nj}, tr_h) \mid 1 \le h \le m\big)$    (23)

Token transfer rule: As shown in Figure 12, after trk fires, tokens are taken out of p1~pn. The token-taking rule is:

If token_ij ≤ wik holds, token_ij in pi will be taken out.

If token_ij > wik holds, a token which equals token_ij − wik will be left in pi.

Thus, after a transition trk has fired, enabled transitions may still exist in the LFPN. An enabled transition is selected and fired according to formula (23) until no enabled transition remains.

After trk fires, the token corresponding to w′ki is put into p′i. For example, if the weight function of the arc from trk to p′i is w′ki, then a token which equals w′ki is put into p′i.

Fuzzy production rule expression: A LFPN is capable of modeling fuzzy production rules just as a FPN is. For example, in the case described under the transition fire rule and the token transfer rule, when trk is fired, the production rule below is expressed:

IF d1j AND d2j AND … AND dnj THEN d′1k AND d′2k AND … AND d′rk

$\big(\mathrm{CF} = \alpha(d_{1j}, tr_k)\,\alpha(d_{2j}, tr_k)\cdots\alpha(d_{nj}, tr_k)\big)$    (24)

The mathematical model of LFPN

In this section, the mathematical model of LFPN is elaborated. Firstly, some concepts are defined. When a token_ij is input to a place pi, the event pij is defined to occur, i.e. the proposition dij is generated, and the probability of event pij is Pr(pij). The firing of trk is defined as event trk, and the probability of event trk occurring is Pr(trk). Secondly, we assume that each transition tr1, tr2, …, trk, …, trm has the same fire probability in the whole event space, i.e.

$\Pr(tr_k) = \dfrac{1}{m}$    (25)

And when event trk occurs, the conditional probability of pij occurring is defined as Pr(pij | trk), i.e. α(dij, trk), which is the probability that proposition dij is generated when trk fires.

Suppose p1, p2, …, pn have token1j, token2j, …, token_nj, so that the events p1j, p2j, …, pnj occur. Then Pr(trk | p1j, p2j, …, pnj) is:

$\Pr(tr_k \mid p_{1j}, p_{2j}, \ldots, p_{nj}) = \dfrac{\Pr(p_{1j}, p_{2j}, \ldots, p_{nj} \mid tr_k)\,\Pr(tr_k)}{\Pr(p_{1j}, p_{2j}, \ldots, p_{nj})}$    (26)

When the events p1j, p2j, …, pnj have occurred, one of the transitions tr1, tr2, …, trk, …, trm will be fired, therefore

$\sum_{h=1}^{m} \Pr(tr_h \mid p_{1j}, p_{2j}, \ldots, p_{nj}) = 1$    (27)

From (25), (26) and (27), (28′) is obtained by the formula of total probability and the Bayesian formula.

$\Pr(tr_k \mid p_{1j}, p_{2j}, \ldots, p_{nj}) = \dfrac{\Pr(p_{1j}, p_{2j}, \ldots, p_{nj} \mid tr_k)\,\Pr(tr_k)}{\sum_{h=1}^{m}\Pr(tr_h)\,\Pr(p_{1j}, p_{2j}, \ldots, p_{nj} \mid tr_h)}$    (28′)

$\Pr(tr_k \mid p_{1j}, p_{2j}, \ldots, p_{nj}) = \dfrac{1}{m}\Pr(p_{1j}, p_{2j}, \ldots, p_{nj} \mid tr_k) = \dfrac{1}{m}\Pr(p_{1j} \mid tr_k)\,\Pr(p_{2j} \mid tr_k)\cdots\Pr(p_{nj} \mid tr_k) = \dfrac{1}{m}\,\alpha(d_{1j}, tr_k)\,\alpha(d_{2j}, tr_k)\cdots\alpha(d_{nj}, tr_k)$    (28)

The transformation from (28′) to (28) is according to the definition of α(dij, trk). As shown in Figure 11, when p1, p2, …, pn have token1j, token2j, …, token_nj, the occurrence probabilities of the transitions tr1, …, trk, …, trm are α(d1j, tr1)·α(d2j, tr1)·…·α(dnj, tr1)/m, …, α(d1j, trk)·α(d2j, trk)·…·α(dnj, trk)/m, …, α(d1j, trm)·α(d2j, trm)·…·α(dnj, trm)/m. Thus, the transition trk which has the maximum of α(d1j, trk)·α(d2j, trk)·…·α(dnj, trk) is selected and fired according to formula (23).
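For real-valued tokens, the truth degree (22) and the firing rule (23) can be sketched as follows; w is assumed to be the n×m input weight matrix W of Definition 3 stored as nested lists, and tokens the current token vector.

```python
def truth(w_ik: float, token: float) -> float:
    """Formula (22): alpha(d_ij, tr_k) = 1 - |w_ik - token| / max(|w_ik|, |token|)."""
    m = max(abs(w_ik), abs(token))
    return 1.0 if m == 0 else 1.0 - abs(w_ik - token) / m

def select_transition(w, tokens, th=0.2):
    """Formula (23): fire the transition whose product of truth degrees is largest,
    provided every individual truth degree reaches the threshold th."""
    best_k, best_product = None, -1.0
    n, m = len(w), len(w[0])
    for k in range(m):
        alphas = [truth(w[i][k], tokens[i]) for i in range(n)]
        if min(alphas) < th:                 # transition not enabled
            continue
        product = 1.0
        for a in alphas:
            product *= a
        if product > best_product:
            best_k, best_product = k, product
    return best_k                            # None means no enabled transition
```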

4.2. Learning algorithm for learning fuzzy Petri net

Learning algorithm

The learning fuzzy Petri net (LFPN) can be trained and made to learn fuzzy production rules. When a set of data is input to the LFPN, a set of propositions is produced in each input place. For example, when token vectors (token1j, token2j, …, token_nj) (j = 1, 2, …, N) are input to p1~pn, propositions d1j, d2j, …, dnj (j = 1, 2, …, N) are produced. To train a fuzzy production rule of the form IF d1j AND d2j AND … AND dnj THEN d′1k AND d′2k AND … AND d′rk, there are two tasks:

  1. α(d1j, trk)·α(d2j, trk)·…·α(dnj, trk) (k ∈ {1, 2, …, m}) needs to be updated so that formula (23) holds;

  2. the output weight functions of trk need to be updated so that the correct tokens are put into p′1~p′r, i.e. β(p′1) = d′1k, β(p′2) = d′2k, …, β(p′r) = d′rk.

To accomplish these two tasks, the weights w1k, w2k, …, wnk and w′k1, w′k2, …, w′kr are modified by the learning algorithm of LFPN. Firstly, we define the training data set as {(X1, Y1), (X2, Y2), …, (XN, YN)}, where X is the input token vector, Y is the output token vector, and Xj, Yj are defined as Xj = (x1j, x2j, …, xnj)^T, Yj = (y1j, y2j, …, yrj)^T, respectively. Thus,

X = (X1, X2, …, Xj, …, XN), Y = (Y1, Y2, …, Yj, …, YN), i.e.

$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2N} \\ \vdots & \vdots & & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nj} & \cdots & x_{nN} \end{bmatrix}, \qquad Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1j} & \cdots & y_{1N} \\ y_{21} & y_{22} & \cdots & y_{2j} & \cdots & y_{2N} \\ \vdots & \vdots & & \vdots & & \vdots \\ y_{r1} & y_{r2} & \cdots & y_{rj} & \cdots & y_{rN} \end{bmatrix}$

Secondly, the weight Wk = (w1k, w2k, …, wnk)^T is the weight on the arcs from *trk to trk and W′k = (w′k1, w′k2, …, w′kr)^T is the weight on the arcs from trk to trk*. W1, …, Wk, …, Wm and W′1, …, W′k, …, W′m are the input and output arc weights for tr1, …, trk, …, trm. Thus,

W = (W1, W2, …, Wk, …, Wm), W′ = (W′1, W′2, …, W′k, …, W′m), i.e.

$W = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1k} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2k} & \cdots & w_{2m} \\ \vdots & \vdots & & \vdots & & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nk} & \cdots & w_{nm} \end{bmatrix}$

Lastly, in the learning algorithm, when trk is fired, the degrees of truth of d′1j, d′2j, …, d′rj toward trk are defined as α(d′1j, trk) = 1 − |y1j − w′k1| / max(|w′k1|, |y1j|), α(d′2j, trk) = 1 − |y2j − w′k2| / max(|w′k2|, |y2j|), …, α(d′rj, trk) = 1 − |yrj − w′kr| / max(|w′kr|, |yrj|), according to Definition 3. The learning algorithm of the learning fuzzy Petri net is shown in Table 4.

Learning Algorithm of LFPN:
Step 1.Wand W′ are selected randomly.
Step 2.For every training data set (Xj, Yj)(j=1, 2, …, N), subject propositions d1j, d2j, …, dnj in p1~pn and propositions d′1j, d′2j, …, d′rjin p′1~p′r are produced. Then do step 3 to step 7;
Step 3.For i=1 to n
For h=1to m do
Compute α(dij, trh) according to formula (22);
Step 4.Compute maximum truth of transition
4.1 Max=α(d1j, tr1 ) •α(d2j, tr1 ) •…• α(dnj, tr1 ); k=1;
4.2 For h=1 to m do
If α(d1j, trh)·α(d2j, trh)·…·α(dnj, trh) > Max
Then { Max=α(d1j, trh ) •α(d2j, trh ) •…• α(dnj, trh );
k=h; }
Step 5. Fire trk;
Step 6. Make d1j, d2j, …, dnj have bigger truth toward trk:
Wk(new) = Wk(old) + γ(Xj − Wk(old))    (29)
(Wk(new) is the vector Wk after the update and Wk(old) is the vector Wk before the update. γ ∈ (0, 1) is the learning rate.)
Step 7. Make d′1j, d′2j, …, d′rj have bigger truth toward trk:
W′k(new) = W′k(old) + γ(Yj − W′k(old))    (30)
(W′k(new) is the vector W′k after the update and W′k(old) is the vector W′k before the update. γ ∈ (0, 1) is the learning rate.)
Step 8. Repeat steps 2-7 until the truths α(d1j, trk), α(d2j, trk), …, α(dnj, trk) meet the requirement.

Table 4.

Learning algorithm of learning fuzzy Petri net
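Steps 3-7 of the algorithm reduce to competitive updates of the input and output weights. A compact NumPy sketch under the same notation (Xj input vector, Yj output vector, learning rate γ) is given below; the transition-appending rule described in note 1) further down is omitted here for brevity, and the function names are illustrative.

```python
import numpy as np

def truth_vector(w_col, x):
    """Formula (22) applied component-wise to one transition's input weights."""
    denom = np.maximum(np.abs(w_col), np.abs(x))
    denom[denom == 0] = 1.0
    return 1.0 - np.abs(w_col - x) / denom

def train_lfpn(X, Y, W, W_out, gamma=0.2, epochs=50):
    """X: n x N inputs, Y: r x N outputs, W: n x m input weights, W_out: r x m output weights."""
    for _ in range(epochs):
        for j in range(X.shape[1]):
            x, y = X[:, j], Y[:, j]
            products = [truth_vector(W[:, k], x).prod() for k in range(W.shape[1])]
            k = int(np.argmax(products))              # steps 4-5: fire tr_k (formula 23)
            W[:, k] += gamma * (x - W[:, k])          # step 6: formula (29)
            W_out[:, k] += gamma * (y - W_out[:, k])  # step 7: formula (30)
    return W, W_out
```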


Some details in the algorithm need to be elaborated further.

  1. About the net construction: The number of input and output places can easily be set according to the real problem, but it is difficult to decide the number of transitions when the net is initialized. When the LFPN is used to solve a specific issue, the number of transitions is initially set according to the practical situation, empirically. Transitions can then be dynamically appended and deleted during training. If an input datum Xj has the maximal truth toward trk but one or several α(dij, trk) (1 ≤ i ≤ n) are less than thk (the threshold of trk), transition trk cannot fire according to Definition 3; thus the data Xj cannot fire any existing transition. This case means that W1, W2, …, Wk, …, Wm cannot describe the vector characteristics of Xj. Then a new transition trm+1 and the arcs which connect trm+1 with the input and output places are constructed, and Xj can be set as the weight Wm+1 directly. Second, if during a training episode there is no datum among X1, X2, …, XN that can fire transition trd, it means that Wd cannot describe the vector characteristics of any of the data X1, X2, …, XN. Then the transition trd and the arcs which connect trd with the input and output places will be deleted.

  2. About W and W′ initialization: to promote training efficiency in the first stage of training, W and W′ are set randomly in [Xmin, Xmax] and [Ymin, Ymax] (Xmin is a vector whose every component is the minimal component of the vector set X1, X2, …, XN; Xmax is a vector whose every component is the maximal component of the vector set X1, X2, …, XN; Ymin and Ymax have the same meaning with respect to Y).

  3. Training stop condition of the learning algorithm: According to the application case, th1, th2, …, thk, …, thm are generally set to a same value th. When training begins, the threshold th is set low (for example 0.2), and th increases as the training time increases. A threshold value th_last (for example 0.9) is set as the training stop condition, and the algorithm runs until α(d1j, trk) > th_last, α(d2j, trk) > th_last, …, α(dnj, trk) > th_last. From the transition-appending analysis, we understand that the number of transitions will approach the number of training data if the threshold of a transition is set near to 1. In this case, results will be obtained more correctly, but the training time and the LFPN running time will increase.

Analysis for convergence of LFPN learning algorithm

In this section, the convergence of the proposed algorithm is analyzed. In step 6 of the LFPN learning algorithm, formula (29) is used to make Wk(new) approach Xj more closely than Wk(old) does when Xj has fired a transition trk. It is proved as follows.

$W_k(\mathrm{new}) = W_k(\mathrm{old}) + \gamma\,\big(X_j - W_k(\mathrm{old})\big)$

Subtracting both sides from Xj, and dividing component-wise by (Xj − Wk(old)), we get formula (31).

$X_j - W_k(\mathrm{new}) = (1-\gamma)\,\big(X_j - W_k(\mathrm{old})\big), \qquad \dfrac{x_{ij} - w_{kj}(\mathrm{new})}{x_{ij} - w_{kj}(\mathrm{old})} = 1 - \gamma$    (31)

Hence, since 0 < γ < 1, Wk will converge to Xj after enough training times.

In the LFPN learning algorithm, there may be a class of training data Xj which are able to fire the same transition trk. In this case, Wk approaches this class of data Xj and converges to a point in the class of data Xj according to formula (31).

Now, we will discuss the point in the class of data Xj to which Wk converges. Suppose there are b1 data among X1, X2, …, Xj, …, XN which fire a certain transition trk at the first training episode, b2 data which fire trk at the second training episode, and so on. If the total number of training episodes is ep and the total number of data which fire trk is t, then $t = \sum_{i=1}^{ep} b_i$. According to the order in which the data fired trk, these t data are rewritten as Xk1, Xk2, …, Xkt, and the average of the training data Xk1, Xk2, …, Xkt is denoted $\bar{X}_k$. To record the update process of Wk simply, the successive updates of Wk are recorded as Wk1, Wk2, …, Wkt.

The learning rate γ (0 < γ < 1) decreases as the training time increases and approaches 0 at last, because a single training datum must not affect Wk too much in the last stage of training; otherwise Wk would oscillate at the last stage of training. If the learning rate γ is set to 1/(q+1) (q > 0) when training begins, then 1/(q+2), 1/(q+3), …, 1/(q+t) are set as the learning rates when trk is fired for the 2nd, 3rd, …, t-th time. Here, the initial value of Wk is set as Wk0 = W(0) × 1/q, where every component of W(0) is a random value in [Xmin, Xmax]. According to formula (29), we get

$W_{k0} = \dfrac{1}{q}W(0)$
$W_{k1} = W_{k0} + \dfrac{1}{q+1}\big(X_{k1} - W_{k0}\big) = \dfrac{1}{q+1}\big(W(0) + X_{k1}\big)$
$W_{k2} = W_{k1} + \dfrac{1}{q+2}\big(X_{k2} - W_{k1}\big) = \dfrac{1}{q+2}\big(W(0) + X_{k1} + X_{k2}\big)$
$\cdots$
$W_{kt} = W_{k,t-1} + \dfrac{1}{q+t}\big(X_{kt} - W_{k,t-1}\big) = \dfrac{1}{q+t}\big(W(0) + X_{k1} + \cdots + X_{k,t-1} + X_{kt}\big)$    (32)

When the training time increases, the training data set Xk1, Xk2, …, Xkt can be regarded as very large, i.e. t is large. From formula (32) we get:

$\lim_{t\to\infty} W_{kt} = \lim_{t\to\infty} \dfrac{1}{q+t}\big(W(0) + X_{k1} + \cdots + X_{k,t-1} + X_{kt}\big)$    (33)

Generally, q is a small positive constant and t is large. Then,

$\lim_{t\to\infty} W_{kt} \approx \lim_{t\to\infty} \dfrac{1}{t}\big(W(0) + X_{k1} + \cdots + X_{kt}\big) = \lim_{t\to\infty} \dfrac{1}{t}W(0) + \lim_{t\to\infty} \dfrac{1}{t}\big(X_{k1} + \cdots + X_{kt}\big) = \lim_{t\to\infty} \dfrac{1}{t}\big(X_{k1} + \cdots + X_{kt}\big) = \bar{X}_k$    (34)

Thus Wk converges to $\bar{X}_k$ (k = 1, 2, …, m). In the same way, $W'_k \to \bar{Y}_k$ (k = 1, 2, …, m) can be proved. Consequently, the learning algorithm of LFPN converges.

Now, we will analyze the convergence process and signification of convergence.

  1. Xk1, Xk2, …, Xkt are the data that fire a certain transition trk during training. As the training time increases, almost the same data fire the transition trk in every training episode; these data belong to a class k. We suppose that these data are Xk1, Xk2, …, Xks. When training begins, suppose there is a datum Xu which does not belong to Xk1, Xk2, …, Xks but fires trk. As the training time increases, Wk approaches Xk1, Xk2, …, Xks and the probability that Xu fires trk decreases. Hence, this type of data Xu forms only a very small part of Xk1, Xk2, …, Xkt and affects Wk little. On the other hand, when training begins, there may be a datum Xke which belongs to Xk1, Xk2, …, Xks but does not fire transition trk. As the training time increases, the probability that Xke fires trk increases; then Xk1, Xk2, …, Xks can approximately be regarded as firing trk during training. $\bar{X}_k$ is denoted as the average of the training data Xk1, Xk2, …, Xks.

  2. In the convergence demonstration, we use a special series of learning rates γ. From the analysis in 1), Xk1, Xk2, …, Xks can be regarded as a class of data which fires one transition trk, and the data series Xk1, Xk2, …, Xkt can be regarded as iterations over Xk1, Xk2, …, Xks. Wk can converge to a point near $\bar{X}_k$ with any damping learning rate series γ.

  3. After training, Wk = (w1k, w2k, …, wnk) comes near to the average of the data which belong to class k, i.e., Wk ≈ $\bar{X}_k$ = ($\bar{x}_{1k}$, $\bar{x}_{2k}$, …, $\bar{x}_{nk}$). When a datum Xkj belonging to class k comes, Xkj will have the same vector characteristics as Xk1, Xk2, …, Xks, i.e. x1,kj, x2,kj, …, xn,kj are near to w1k, w2k, …, wnk. Then each component xi,kj (1 ≤ i ≤ n) of this datum Xkj will have a bigger similarity to wik (1 ≤ i ≤ n) than to the i-th components of the other weights W according to formula (22), and Xkj will have the biggest truth toward trk according to formula (23). Thus, when a datum Xkj which belongs to the class of Xk1, Xk2, …, Xks is input to the LFPN, it will fire trk correctly and produce the correct output.
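The convergence argument can also be checked numerically: iterating the update (29) with the damped learning rate 1/(q+t) drives Wk toward the mean of the data that fire trk, as the short sketch below illustrates (the data are synthetic and the constants are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=0.5, size=(1000, 4))   # one class of training vectors X_k1..X_kt
q = 2.0
w = rng.uniform(0.0, 1.0, size=4) / q                    # W_k0 = W(0) / q

for t, x in enumerate(data, start=1):
    w += (x - w) / (q + t)                               # formula (29) with gamma = 1/(q+t)

print(np.round(w, 3), np.round(data.mean(axis=0), 3))    # w ends up close to the class mean
```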


5. Web service discovery based on learning fuzzy Petri net model

Web services are used for developing and integrating highly distributed and heterogeneous systems in various domains. They are described by the Web Services Description Language (WSDL). Web service discovery is key to dynamically locating desired Web services across the Internet [16]. It immediately raises the issue of evaluating the accuracy of the mapping in a heterogeneous environment when a user wants to invoke a service. Two aspects need to be evaluated. One is functional evaluation: the function the service provides should completely match the user's request. The other is non-functional evaluation, i.e. whether the Quality of Service (QoS) meets the user's requirement. UDDI (Universal Description, Discovery and Integration) is widely used as a discovery approach for functional evaluation, but as the number of published Web services increases, discovering proper services using the limited description provided by the UDDI standard becomes difficult [17], and UDDI cannot provide the QoS information of a service. To discover the most appropriate service, it is necessary to focus on developing feasible discovery mechanisms based on different service description methods and the service execution context. Segev proposed a service function selection method [21]: a two-step, context-based semantic approach to the problem of matching and ranking Web services for possible service composition, whose two steps are context extraction and evaluation of the proximity degree of a service. Cai proposed a service performance selection method [3]: the authors used a novel artificial neural network-based service selection algorithm according to the information about the cooperation between devices and the context information. In this section, we aim at analyzing the different contexts of services and constructing a service discovery model based on the LFPN. Firstly, different service functional descriptions are used to evaluate the service function, and an appropriate service is selected. Secondly, the QoS context is used to predict the QoS, and a more efficient service is selected. QoS data are real numbers, so the LFPN learning algorithm is used directly; but the service function description is textual. Therefore, a Learning Fuzzy Petri Net for service discovery model is proposed for keyword learning based on the LFPN.

5.1. Web services discovery model based on LFPN

To map a service’s function accurately, free textual service description, WSDL description, Web service’s operation and port parameters which are drawn from WSDL are used as input data here. Because the input data type is keyword, the proposed LFPN cannot deal with this type of data. Consequently, a Learning Fuzzy Petri Net for Web Services Discovery model (LFPNSD) is proposed. LFPNSD is a 10-tuple, given by LFPNSD =<P,Tr,F, W, D,I, O, Th, α, β > (as shown in Figure 13.)

where: Tr, I, O, Th, β are same with definition of LFPN.

P= {Pinput}∪{Poutput}={P11, P12, P13}∪{P21, P22, P23, P24}

Fx¯(Pinput×Tr)∪(Tr×Poutput)

W=FKeywords+, where weight function on Pinput×Trare different keywords of service description and weight function on Tr×Poutput are different service invoking information.

D = {d11,a, d12,b, d13,c}∪{d21,e, d22,f, d23,g, d24,h } is a finite set of propositions, where proposition d11, a is that P11 has a service description tokens; proposition d12, b is that P12 has a free textual description tokens; proposition d13, c is that P13 has a service operation and port parameters tokens. And the propositions d21, e, d22, f, d23, g, d24, h are that P21, P22, P23, P24have different invoking information tokens of services.

Figure 13.

The learning fuzzy Petri net for Web service discovery (LFPNSD)

α(dij, trk) → [0, 1]. α(dij, trk) = yi ∈ [0, 1] is the degree of truth of proposition dij toward trk. α(dij, trk) is computed by the rule below: if the input description has n keywords and the weight wik on the arc from Pi to trk has s of the same keywords, the degree of similarity between the weight keywords and the input description keywords is expressed as:

$\alpha(d_{ij}, tr_k) = \dfrac{s}{n}$    (36)

The fire rule of a transition: if α(d11,a, trk)·α(d12,b, trk)·α(d13,c, trk) = max((α(d11,a, tri)·α(d12,b, tri)·α(d13,c, tri)) 1≤i≤m) and all of α(d11,a, trk), α(d12,b, trk), α(d13,c, trk) are bigger than a threshold value th, then trk fires, the tokens in P11~P13 are taken out, and tokens corresponding to wk,21, wk,22, wk,23, wk,24 are put into P21~P24.

As shown in Figure 13, the service free textual description, the WSDL description, and the operation and port information are used as the input vector in the learning algorithm, and the service classification, WSDL address, all of the service operation names and the service SOAP messages are used as the output vector. Because the training data are keywords, the learning algorithm of LFPN is developed into a learning algorithm for LFPNSD. The learning algorithm of the learning fuzzy Petri net for Web service discovery is shown in Table 5.

Learning Algorithm of LFPNSD:
Step 1. Make all weights on arcsbe ∅;
Step 2. For every service in training data set,
Repeat:
Get free textual description; Draw out WSDL description and operation and port name from WSDL;
Set service textual description, WSDL description, operation and port information as input vector;
Compare the input with the keywords on the weight of input arc:
If every keyword in the weight is in the input data, then compute α(dij, trk) according to formula (36); else set α(dij, trk) = 0.
If each of α(dij, tr1), α(dij, tr2), …, α(dij, trm-1) equates 0 and the weight of trm is ∅, then set α(dij, trm) =1.
If each of α(dij, tr1), α(dij, tr2), …, α(dij, trm-1) equates 0 and trm doesn’t exist, a new transition trm and the arcs which connect trm with input and output place are constituted, set weight of arcs to be ∅ and α(dij, trm) =1.
If α(d11,a, trk) •α(d12,b, trk) •α(d13,c, trk) =max((α(d11,a, tri) •α(d12,b, tri) •α(d13,c, tri)) 1≤i≤m), then trk fires.
If the trk fired, get a keyword in service description but not in the weight, and add it into the weight.
If training time is t and the weight is ∅, t keywords in service description are gotten and they are added into the weight.
If the trk fired, compare out training data (service classification, WSDL address, service operation and message) with the weight of wk,21, wk,22, wk,23, wk,24, and calculate and record the correct rate of output.
Update wk,21, wk,22, wk,23, wk,24 according to output of training data.
Step 3. Repeat step 2, until each α(d11,a, trk), α(d12,b, trk), α(d13,c, trk) meets the requirement value thk.

Table 5.

Learning algorithm of learning fuzzy Petri net for Web service discovery
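
The training loop of Table 5 can be sketched in Python as below. This is a simplified reading of the algorithm, assuming formula (16) has the form s/n and omitting the output-side weight update (the last two steps of Step 2); all identifiers are ours.

```python
from math import prod

def _alpha(input_kw, weight_kw):
    """Step 2.2: if every keyword on the weight is in the input, alpha = s/n (formula 16), else 0."""
    if not input_kw or not weight_kw or not set(weight_kw) <= set(input_kw):
        return 0.0
    return len(set(weight_kw)) / len(set(input_kw))

def train_lfpnsd(services, n_places=3, th_target=0.6, max_epochs=50):
    """Hedged sketch of the Table 5 loop; keyword weights are grown one keyword per firing."""
    transitions, w_in = [], {}                     # w_in: (place index, transition) -> keyword set
    for _ in range(max_epochs):
        worst = 1.0
        for desc in services:                      # desc: [textual_kw, wsdl_kw, op_port_kw]
            alphas = {tr: [_alpha(desc[i], w_in[(i, tr)]) for i in range(n_places)]
                      for tr in transitions}
            if all(a == 0.0 for al in alphas.values() for a in al):
                # Steps 2.3-2.4: reuse a transition whose weights are still empty, or create a new one.
                tr = next((t for t in transitions
                           if not any(w_in[(i, t)] for i in range(n_places))), None)
                if tr is None:
                    tr = "tr%d" % (len(transitions) + 1)
                    transitions.append(tr)
                    for i in range(n_places):
                        w_in[(i, tr)] = set()
                alphas[tr] = [1.0] * n_places
            # Step 2.5: fire the transition with the largest product of degrees of truth.
            fired = max(alphas, key=lambda t: prod(alphas[t]))
            # Step 2.6: add one keyword that is in the input but not yet on the fired arc's weight.
            for i in range(n_places):
                missing = set(desc[i]) - w_in[(i, fired)]
                if missing:
                    w_in[(i, fired)].add(sorted(missing)[0])
            worst = min(worst, min(alphas[fired]))
        if transitions and worst >= th_target:     # Step 3: stop when every alpha meets th_k.
            break
    return transitions, w_in
```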

Discussion:

  1. We first discuss the learning rate γ in the learning algorithm of the LFPNSD. In the algorithm, keywords are learned and added into the weights one by one. Hereby, XjWk(new) = 1, while XjW(old) equals the difference between the number of input data keywords and the number of keywords on the arc weight. Because XjW(old) is not constant, the learning rate γ is different at each learning episode. For example, when the input data has 10 keywords and the arc weight initially has 6 keywords, one keyword is learnt from the input data and added into the weight; in this case, the learning rate is 1/(10-6) = 0.25 (see the sketch after this list).

  2. If keywords were not learned one by one, the keywords on W1, W2, …, Wk, …, Wm would be unbalanced at the beginning stage of training. Then, similar but differently described services would have unbalanced probabilities of firing transitions at the beginning stage of training. This makes similar but differently described services improperly fire a transition that has more keywords on its weight, which lowers the training efficiency.

  3. In Step 2.3 of the algorithm, when each of α(dij, tr1), α(dij, tr2), …, α(dij, trm-1) equals 0, it means that none of the weights on transitions tr1~trm-1 can describe this service; therefore, it is a new type of service. If there is a transition whose arc weights are still ∅, it is used to record the new type of service; otherwise a new transition needs to be constructed.
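
The per-episode learning rate described in item 1 can be written as the following small helper; the function name is ours and the computation simply reproduces the worked example from the text.

```python
def lfpnsd_learning_rate(n_input_keywords, n_weight_keywords):
    """Per-episode learning rate when keywords are learned one at a time:
    the new gain is a single keyword, the old gap is the remaining difference."""
    gap = n_input_keywords - n_weight_keywords
    return 1.0 / gap if gap > 0 else 0.0

# Example from the text: 10 input keywords, 6 already on the arc weight -> 1/(10-6) = 0.25
print(lfpnsd_learning_rate(10, 6))   # 0.25
```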

5.2. Simulation results

Two simulations are carried out. One is the selection of a more efficient service through QoS prediction using the LFPN; the other is the selection of a service with an appropriate function using the LFPNSD.

Simulation for more efficient Web service selection

During the process of Web services discovery, there may be several services that have the same function, and the one with the best QoS needs to be selected. Hereby, the service performance context is used to predict the QoS value for the next execution of a service. If the prediction is precise enough, an appropriate service can be selected.

In this simulation, the LFPN is used as the learning model for predicting service execution time, which is the main part of QoS. There are 11 inputs and 1 output in this model. The 11 inputs consist of the execution times of the last 10 invocations of a service and the reliability of the service. The output is a prediction of the execution time of the service's next invocation. 10 transitions of the LFPN are set at initialization.

A Web service performance dataset is employed for the simulation. This dataset includes 100 publicly available Web services located in more than 20 countries. 150 service users executed about 100 invocations of each Web service, and each service user recorded the execution times and invocation failures in the dataset [27]. We selected one user's invocation data as training data. The execution times of the last 10 invocations and the reliability of each service were set as the input, and the execution time of the next invocation was set as the output. 20 sets of training data were selected for each of the 100 services.

The initial threshold is set to 0.2 and the threshold is increased by 0.001 at every training episode. The initial learning rate is set to 1/1.1 for every transition; the learning rate becomes 1/(0.1+t) after a transition has fired t times. The prediction result and the training output data are denoted as Outputpredict and Outputtraining. The prediction precision probability Prepro is used to evaluate the precision of the result and is computed as:

Prepro = 1 − |Outputpredict − Outputtraining| / Outputtraining.
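
For concreteness, the threshold and learning rate schedules and the precision measure used in this simulation can be written as the following small helpers; the function names are ours.

```python
def prediction_precision(output_predict, output_training):
    """Pre_pro = 1 - |predicted - actual| / actual, as defined above."""
    return 1.0 - abs(output_predict - output_training) / output_training

def threshold_at(episode, th0=0.2, step=0.001):
    """Firing threshold schedule: starts at 0.2 and grows by 0.001 per training episode."""
    return th0 + step * episode

def learning_rate_after(fire_count):
    """Learning rate schedule: 1/(0.1 + t) after a transition has fired t times (1/1.1 initially)."""
    return 1.0 / (0.1 + fire_count)

# Example: a predicted execution time of 410 ms against an observed 400 ms.
print(prediction_precision(410.0, 400.0))   # 0.975
print(learning_rate_after(1))               # ~0.909 = 1/1.1
```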

Three different training stop conditions are set, corresponding to threshold values of 0.7, 0.8 and 0.9. The simulation results are listed in Table 6. The number of services whose execution time is precisely predicted increases as the training threshold value increases.

In [3], the authors improved the traditional BP algorithm using a three-term method consisting of a learning rate, a momentum factor and a proportional factor for predicting service performance according to the service context information. Here, this model is used to predict service execution time. The training data are the same as the LFPN's; the learning rate is 0.6, the momentum factor 0.9, the proportional factor 1 and the number of training iterations is 10,000. We compare the simulation results of the method of [3], i.e. the conventional method, with those of the LFPN in Table 7. Table 7 shows that the number of Web services predicted with high precision by the LFPN is larger than that of the BP algorithm, while the number of Web services predicted with low precision by the LFPN is smaller than that of the BP algorithm. Hereby, the result of the LFPN is better than the result of the three-term BP algorithm.

Precision | 0.99~1 | 0.98~0.99 | 0.95~0.98 | 0.9~0.95 | 0.8~0.9 | 0.7~0.8 | 0.6~0.7 | 0~0.6
Number of Web services (th = 0.9) | 21 | 14 | 17 | 15 | 10 | 8 | 9 | 6
Number of Web services (th = 0.8) | 17 | 12 | 14 | 11 | 10 | 12 | 10 | 14
Number of Web services (th = 0.7) | 10 | 10 | 16 | 8 | 8 | 11 | 19 | 18

Table 6.

Prediction ability of LFPN

Precision | 0.99~1 | 0.98~0.99 | 0.95~0.98 | 0.9~0.95 | 0.8~0.9 | 0.7~0.8 | 0.6~0.7 | 0~0.6
Number of Web services using the LFPN (th = 0.9) | 21 | 14 | 17 | 15 | 10 | 8 | 9 | 6
Number of Web services using the conventional method | 6 | 7 | 15 | 18 | 20 | 12 | 10 | 12

Table 7.

Comparison of the prediction ability of the two methods

Simulation for selection of Web service’s function

In this simulation, the LFPNSD is used as the learning model. The benchmark Web services listed at www.xmethods.net are used as training data. Each of these 260 services has a textual description and a WSDL address, and we can obtain the WSDL description and the operation and port parameters from the WSDL. We want to classify the Web services into four classes: 1) business, 2) finance, 3) nets and 4) life services. After training, Web services are invoked by natural language requests [14]. The natural language is decomposed into the three inputs of this model. For example, suppose we want to get a short message service (SMS) for sending a message to a mobile phone. The natural language of this discovery request is input and decomposed into three parts: 1) WSDL description: send a message to a mobile phone; 2) free textual service description: sending a message to a mobile phone through the Internet; 3) operation and port parameters, which may include operation names such as send messages, send message multiple recipients, and so on, and port names such as send serviceSOAP, and so on.

In this simulation, we first set 100 transitions for the LFPNSD model. The training stop condition is thk (1 ≤ k ≤ m) ≥ 0.6. The service selection precision is recorded after each round of training. As shown in Figures 14 and 15, using the LFPNSD model and its learning algorithm described in Section 5.1, the precision probability of every service class rises to more than 0.9 when the number of training rounds reaches 10.

Figure 14.

The results of the simulation using LFPNSD and its learning algorithm − discovery precision probability for total services

Figure 15.

The results of the simulation using LFPNSD and its learning algorithm − discovery precision probability for classification services

A method for evaluating the proximity of services was proposed in [21]. In that method, a WSDL document is represented as Dwsdl = {t1, t2, …, twsdl} and Ddesc = {t1, t2, …, tdesc} represents the textual description of the service. Because the LFPNSD model has the additional descriptor of operation and port parameters, we add this descriptor as Dop&port = {t1, t2, …, top&port} in order to compare the two methods. Here, twsdl, tdesc and top&port are the last keywords of the WSDL, the textual description and the operation and port parameters, respectively. In the proximity-of-services method, the descriptor of the natural language request provided by a user is Duser and the descriptor of the invoked service is Dinv. The three Context Overlaps (CO) are defined as the keywords shared between the Dwsdl, Ddesc and Dop&port descriptors of the user request and those of the invoked service. The proximity of the user-requested service and the invoked service is defined as the square root of the sum of the squares of the three COs. When a user invocation arrives, it is compared with all services in the service repository, and the service Dinv that has the biggest proximity value to Duser is selected. We compared the discovery precision probability of this method (the conventional method) with that of the proposed LFPNSD. The simulation results are shown in Figure 16. The LFPNSD method yielded higher precision probabilities than the conventional method proposed in [21]; especially when the number of services in the Web services repository becomes more than 88, the difference is much more significant. Here, a correct service is selected among 14 services, 24 services, 37 services, 54 services, 88 services and 151 services, just as in [21].
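
To make the comparison concrete, the proximity computation of the conventional method, extended with the operation and port descriptor as described above, can be sketched as follows; the descriptor keys ("wsdl", "desc", "op_port") and function names are our own assumptions.

```python
from math import sqrt

def context_overlap(user_keywords, service_keywords):
    """Context Overlap: number of keywords shared by the user request and the service descriptor."""
    return len(set(user_keywords) & set(service_keywords))

def proximity(user_desc, service_desc):
    """Proximity as in [21] (with the extra op&port descriptor we added):
    square root of the sum of the squared Context Overlaps over the three descriptors."""
    return sqrt(sum(context_overlap(user_desc[k], service_desc[k]) ** 2
                    for k in ("wsdl", "desc", "op_port")))

def select_service(user_desc, repository):
    """Return the repository entry with the largest proximity to the user request."""
    return max(repository, key=lambda svc: proximity(user_desc, svc))
```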

Figure 16.

Comparison of two discovery methods


6. Conclusion

In this chapter, the Learning Petri Net (LPN) was constructed based on the high-level time Petri net and reinforcement learning (RL). RL was used to adjust the parameters of the Petri net. Two kinds of learning algorithms were proposed, for learning discrete and continuous Petri net parameters, and a verification of the LPN was shown. The LPN model was applied to dynamical system control: we applied the LPN to robot system control, including the AIBO and the Guide Dog robot systems. LPN models were constructed for these robot systems and used to control them, so that the robot systems could adjust their parameters while running; the correctness and effectiveness of our proposed model were confirmed in these experiments. The LPN model was then extended to a hierarchical LPN model, which was applied to QoS optimization of Web service composition. The hierarchical LPN model was constructed based on the stochastic Petri net and RL: the Web service composition is modeled with a stochastic Petri net, and a dynamical Web service composition framework is proposed for optimizing the QoS of the composition. Furthermore, a neural network learning method was applied to the fuzzy Petri net and the Learning Fuzzy Petri Net (LFPN) was proposed. In contrast with existing FPNs, there are three extensions in the new model: a place can possess different tokens which represent different propositions; these propositions have different degrees of truth toward different transitions; and the truth degree of a proposition can be learnt through the adjustment of the arcs' weight functions. The LFPN model thus obtains the capability of learning fuzzy production rules through truth degree updating. The LFPN learning algorithm, which introduces the neural network learning method into Petri net updating, was proposed and the convergence of the algorithm was analyzed. The LFPN model was applied to the discovery of Web services: different service functional descriptions are first used to evaluate the service function and select an appropriate service, and secondly the QoS context is used to predict QoS and select a more efficient service.

In the future, different intelligent computing methods will be introduced into Petri nets to construct different types of LPN. The efficiency of the different types of LPN in different application areas will be compared, and an efficient LPN model for solving various problems will be established.

References

  1. Konar A., Chakraborty U. K., Wang P. P. Supervised Learning on a Fuzzy Petri Net. Information Sciences, 2005, 172(3-4).
  2. Hrúz B., Zhou M. C. Modeling and Control of Discrete-event Dynamic Systems: with Petri Nets and Other Tools. Springer, London, UK, 2007.
  3. Cai H., Hu X., Lu Q., Cao Q. A Novel Intelligent Service Selection Algorithm and Application for Ubiquitous Web Services Environment. Expert Systems with Applications, 2009, 36(2).
  4. Doya K. Reinforcement Learning in Continuous Time and Space. Neural Computation, 2000, 12(1).
  5. Feng L. B., Obayashi M., Kuremoto T., Kobayashi K. A Learning Petri Net Model Based on Reinforcement Learning. Proceedings of the 15th International Symposium on Artificial Life and Robotics (AROB 2010), 290-293.
  6. Feng L. B., Obayashi M., Kuremoto T., Kobayashi K. An Intelligent Control System Construction Using High-Level Time Petri Net and Reinforcement Learning. Proceedings of the International Conference on Control, Automation, and Systems (ICCAS 2010), Oct. 27-30, 535-539.
  7. Feng L. B., Obayashi M., Kuremoto T., Kobayashi K. A Learning Petri Net Model. IEEJ Transactions on Electrical and Electronic Engineering, 2012, 7(3), 274-282.
  8. Frederick J. R. Statistical Methods for Speech. The MIT Press, Cambridge, Massachusetts, USA, 1999.
  9. Guangming C., Minghong L., Xianghu W. The Definition of Extended High-level Time Petri Nets. Journal of Computer Science, 2006, 2(2), 127-143.
  10. Hirasawa K., Ohbayashi M., Sakai S., Hu J. Learning Petri Network and Its Application to Nonlinear System Control. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, 1998, 781-789.
  11. Virtanen H. E. A Study in Fuzzy Petri Nets and the Relationship to Fuzzy Logic Programming. Reports on Computer Science and Mathematics, 1995.
  12. Yan H. S., Jian J. Agile Concurrent Engineering. Integrated Manufacturing Systems, 1999, 10(2), 103-113.
  13. Wang J. Petri Nets for Dynamic Event-driven System Modeling. In: Handbook of Dynamic System Modeling (Ed.: Paul Fishwick), CRC Press, 2007, 1-17.
  14. Lim J. H., Lee K. H. Constructing Composite Web Services from Natural Language Requests. Web Semantics: Science, Services and Agents on the World Wide Web, 1-13.
  15. Li X., Yu W., Rosano F. L. Dynamic Knowledge Inference and Learning under Adaptive Fuzzy Petri Net Framework. IEEE Transactions on Systems, Man, and Cybernetics-Part C, 2000, 30(4), 442-450.
  16. Papazoglou M. P., Georgakopoulos D. Service-Oriented Computing. Communications of the ACM, 2003.
  17. Platzer C., Dustdar S. A Vector Space Search Engine for Web Services. Proceedings of the Third European Conference on Web Services (ECOWS'05), 2005.
  18. Sutton R. S., Barto A. G. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, Massachusetts, USA, 1998.
  19. Sony OPEN-R Programming Group. OPEN-R Programming Introduction. Sony Corporation, Japan, 2004.
  20. Tzafestas S. G., Rigatos G. G. Stability Analysis of an Adaptive Fuzzy Control System Using Petri Nets and Learning Automata. Mathematics and Computers in Simulation, 2000, 51(3).
  21. Segev A., Toch E. Context-Based Matching and Ranking of Web Services for Composition. IEEE Transactions on Services Computing, 2009, 2(3), 210-222.
  22. Baranauskas V., Sarkauskas K. Colored Petri Nets - Tool for Control System Learning. Electronics and Electrical Engineering, 2006, 4(68), 41-46.
  23. Victor R. L. Shen. Reinforcement Learning for High-level Fuzzy Petri Nets. IEEE Transactions on Systems, Man, and Cybernetics-Part B, 2003, 33(2).
  24. Pedrycz W., Gomide F. A Generalized Fuzzy Petri Net Model. IEEE Transactions on Fuzzy Systems, 1994, 2(4).
  25. Xu H., Wang Y., Jia P. Fuzzy Neural Petri Nets. Proceedings of the 4th International Symposium on Neural Networks: Part II - Advances in Neural Networks, 2007, 328-335.
  26. Ding Z. H., Bunke H., Schneider M., Kandel A. Fuzzy Time Petri Net Definitions, Properties, and Applications. Mathematical and Computer Modelling, 2005, 41(2-3), 345-360.
  27. Zheng Z. B., Lyu M. R. Collaborative Reliability Prediction for Service-Oriented Systems. Proceedings of the ACM/IEEE 32nd International Conference on Software Engineering (ICSE 2010).
