Key Laboratory of Integrated Automation of Process Industry, Northeastern University, P. R. China
Heng Yue
Key Laboratory of Integrated Automation of Process Industry, Northeastern University, P. R. China
Tianyou Chai
Key Laboratory of Integrated Automation of Process Industry, Northeastern University, P. R. China
*Address all correspondence to:
1. Introduction
The rotary kiln is a large-scale sintering device widely used in the metallurgical, cement, refractory materials, chemical and environmental protection industries. Its complicated working mechanism involves physical changes and chemical reactions of the material, the combustion process, and thermal transmission among the gaseous fluid, the solid material fluid and the liner. The automation problem of such processes remains unsolved because of the following inherent complexities. A rotary kiln is a typical distributed parameter system with correlated temperature distributions of the gaseous phase and the solid phase along its axis. Limited by device rotation and the technical design, sensors and actuators can be installed only at the kiln head and kiln tail, so lumped parameter control strategies must be employed to deal with distributed parameter problems. The rotary kiln process is thus a multivariable nonlinear system with strong coupling, large lag and uncertain disturbances. Moreover, the key controlled variable, the burning zone temperature, is measured with serious disturbances. Most rotary kilns are still under manual control, with a human operator observing the burning status. As a result, the product quality is hard to keep consistent, energy consumption remains high, the kiln liner wears out easily, and the kiln running rate and yield are low.
Although several advanced control strategies, including fuzzy control (Holmblad & Østergaard, 1995), intelligent control (Jarvensivu et al., 2001a; Jarvensivu et al., 2001b) and predictive control (Zanovello & Budman, 1999), have been introduced into rotary kiln process control, these studies focused on stabilizing some key controlled variables and are valid only when the boundary conditions do not change frequently. In practice, the boundary conditions of a rotary kiln change often. For example, the material load, water content and components of the raw material slurry vary frequently and severely. Moreover, the offline analysis data of the components of the raw material slurry reach the operator with a large time delay. Conventional control strategies therefore cannot achieve automatic control and keep the product quality consistent. To deal with the complexity of operating conditions, the authors have proposed an intelligent control system based on human-machine interaction for an alumina rotary kiln (Zhou et al., 2004; Zhou et al., 2006), in which a human intervention function was designed so that, when the operating conditions change markedly, the human operator observing the burning status can intervene in the control actions while the system remains in automatic control mode, enhancing the adaptability of the control system.
This chapter develops a supervisory control approach for the burning zone temperature based on Q-learning, in which the signals of human intervention are viewed as reinforcement learning signals. Section 2 briefly describes the process and the supervisory control system architecture. Section 3 discusses the detailed methodology of the Q-learning-based supervisory control approach. The implementation and industrial application are shown in Section 4. Finally, Section 5 draws the conclusion.
2. Process description and supervisory control system architecture
The alumina rotary kiln process is described as follows. Raw material slurry is sprayed into the rotary kiln from the upper end (the kiln tail). At the lower end (the kiln head), coal powder from the coal injector and primary air from the air blower are mixed into a bi-phase fuel flow, which is sprayed into the kiln head hood and combusts with the secondary air coming from the cooler. The heated gas is drawn to the kiln tail by the induced draft fan, while the material moves toward the kiln head through the rotation of the kiln and its own weight, counter to the direction of the gas. As the material passes through the drying zone, pre-heating zone, decomposing zone, burning zone and cooling zone in sequence, soluble sodium aluminate is generated in the clinker, which is the product of the kiln process. This process aims to reach a high digesting rate of alumina in the subsequent digestion procedure.
Figure 1.
Schematic diagram of the alumina rotary kiln.
The quality index control problem of kiln production is how to keep the liter weight of the clinker qualified under fluctuating boundary and operating conditions. The liter weight of clinker is hard to measure online and cannot be controlled directly. This chapter employs the following strategy to deal with the problem: some online measurable technological parameters closely related to the final quality index are chosen and controlled within ranges governed by the technical requirements, so that the quality index is controlled indirectly.
In the sintering process, the normal range of the sintering temperature Tsinter of the raw material depends upon the components of the raw material slurry. Variations of the slurry components require corresponding variations of the sintering temperature. If the real sintering temperature range is inconsistent with the requirement of the raw material, over burning or under burning will result and the clinker quality will be unsatisfactory. We therefore conclude that the components of the raw material slurry and the sintering temperature are the main factors influencing clinker quality; other factors include the particle size of the raw material and the residence time at Tsinter. The relationship between the desired Tsinter and the components of the raw material slurry can be viewed as an unknown nonlinear function
Tsinter = f([A/S], [N/R], [C/S], [F/A])   (1)
where [A/S] is the alumina-silica ratio of the raw material slurry, [N/R] is the alkali ratio, [C/S] is the calcium-silica ratio, and [F/A] is the iron-alumina ratio. Among them, the alumina-silica ratio has the strongest influence on Tsinter: Tsinter must be raised as [A/S] increases.
From the above analysis, one may conclude that there are two key issues in the quality index control problem of kiln production. One is how to keep the kiln temperature distribution satisfying the technical requirements under fluctuating boundary and operating conditions, i.e. how to keep the burning zone temperature, the kiln tail temperature and the residual oxygen content in the combustion gas within their technically required ranges. The other is how to adjust the setpoint range of the burning zone temperature so that the liter weight of the clinker can be kept qualified under fluctuating boundary and operating conditions.
Figure 2.
General structure of the supervisory control system for rotary kiln process.
A supervisory control system consisting of a supervisory level and a process control level has been constructed, whose general structure is shown in Fig. 2. The final target of this supervisory control system is to keep the production quality index, i.e. the liter weight of the clinker, acceptable even when the boundary conditions change. The related process control strategies in the process control level are: 1) a hybrid intelligent temperature controller that coordinates the coal feeding u1, the damper position of the induced draft fan u2, and the primary air flow u3 so that the burning zone temperature TBZ, the kiln tail temperature TBE, and the residual oxygen content in the combustion gas OX satisfy the technical requirements; TBZ is measured indirectly by an infrared pyrometer located at the kiln head hood, and TBE is obtained through a thermocouple; 2) individual PI controllers assigned to the basic loops of primary air flow, primary air pressure and flow rate of raw material slurry; and 3) a human-machine interaction (HMI) mechanism through which experienced operators can intervene in the coal feeding control in automatic control mode when the operating conditions change significantly. These process control strategies were described in our previous study (Zhou et al., 2004).
The main part of the supervisory level is an intelligent setting model of TBZ, which adjusts the setpoint range of TBZ according to the variations of the components of the raw material slurry. The setpoints of TBE, OX, the primary air pressure, the flow rate of raw material slurry and the kiln rotary speed n are given by the operators according to production scheduling and production experience.
The intelligent setting model of the burning zone temperature consists of a pre-setting model, a compensation model and a setting selector mechanism. The pre-setting model gives the upper and lower limits of the setpoint range of the burning zone temperature, denoted by T0BZ_SPHI and T0BZ_SPLO, calculated from the offline analysis data of the components of the raw material slurry. Fuzzy clustering analysis combined with case-based inference learning is employed to build the pre-setting model. Its core is a case base containing different upper and lower limits of the setpoint range of the burning zone temperature corresponding to different components of the raw material slurry. The case base is established through fuzzy-clustering-based data mining of a large number of process data samples under various slurry components. Details are not described in this chapter.
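Although the pre-setting model itself is not detailed here, its case-retrieval idea can be illustrated with a minimal sketch: given the offline-analysed slurry components, the nearest stored case supplies the setpoint limits. The case structure, the distance measure and all numerical values below are illustrative assumptions, not the plant's actual case base.

```python
# Illustrative sketch of case retrieval for the pre-setting model.
# Each case maps slurry components ([A/S], [N/R], [C/S], [F/A]) to the
# upper and lower limits (T0BZ_SPHI, T0BZ_SPLO) of the TBZ setpoint range.
# All numerical values and the distance measure are assumptions.

CASE_BASE = [
    # ((A/S, N/R, C/S, F/A), (upper limit, lower limit)) -- placeholder cases
    ((4.2, 1.00, 2.0, 6.0), (1280.0, 1240.0)),
    ((4.8, 1.02, 2.0, 6.5), (1300.0, 1260.0)),
]

def presetting_limits(components):
    """Return the (upper, lower) setpoint limits of the nearest stored case."""
    def distance(case_components):
        return sum((c - q) ** 2 for c, q in zip(case_components, components))
    _, limits = min(CASE_BASE, key=lambda case: distance(case[0]))
    return limits
```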
In fact, the main problem we face is that the components of the raw material slurry often change because of the unstable raw material mixing process, and the offline analysis data reach the operator with a large time delay, so neither the operator nor the pre-setting model can adjust the setpoint of TBZ promptly. As a result, a single intelligent temperature controller and a single pre-setting model of TBZ cannot maintain satisfactory performance. In such a case, a human operator usually rectifies the output of the temperature controller, i.e. the coal feeding, based on experience of observing the burning status, through the HMI embedded in the control system. Such interventions can adapt to the variation of operating conditions to a certain degree and sustain the product quality.
To deal with this problem, a compensation model and a setting selector are appended. When the offline analysis data of the components of the raw material slurry become known and are input into the system, i.e. at the lth sampled time, the setting selector mechanism triggers the pre-setting model to calculate the proper setpoint range of TBZ. When the components of the raw material slurry are unknown, the compensation model is triggered to calculate the proper upper and lower limits of the setpoint range of the burning zone temperature, denoted by T1BZ_SPHI and T1BZ_SPLO respectively. In the following section, a Q-learning strategy is employed to construct the compensation model, which learns the self-adjusting knowledge about the setpoint of TBZ online from the human intervention signals.
3. Setpoint adjustment approach based on Q-learning
3.1. Bases of Q-learning
Reinforcement learning is learning with a critic instead of a teacher. The only feedback provided by the critic is a scalar signal r called reinforcement, which can be regarded as a reward or a punishment. Reinforcement learning performs an online search to find an optimal decision policy in multi-stage decision problems.
Q-learning (Watkins & Dayan, 1992) is a reinforcement learning method in which the learner incrementally builds a Q-function that estimates the discounted future rewards for taking actions from given states. The output of the Q-function for state x and action a is denoted by Q(x,a). When action a has been chosen and applied, the environment moves to a new state x′ and a reinforcement signal r is received. Q(x,a) is updated by

Q_{k+1}(x, a) = (1 − α_k) Q_k(x, a) + α_k [ r + γ max_{a′ ∈ A(x′)} Q_k(x′, a′) ]   (2)

α_k = 1 / (1 + visits_k(x, a))   (3)

where A(x′) is the set of possible actions in state x′, γ is the discount factor, α_k is the learning rate, and visits_k(x,a) is the total number of times the state-action pair (x,a) has been visited up to and including the kth iteration.
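For concreteness, the update (2)-(3) can be written as a short routine. The following is a minimal sketch of the standard tabular Q-learning step, assuming the Q-function and visit counts are stored in dictionaries keyed by (state, action) pairs; the names and storage layout are illustrative, not taken from the original implementation.

```python
# Minimal sketch of the tabular Q-learning update (2)-(3).
# Q and visits are dictionaries keyed by (state, action) pairs;
# the names and storage layout are illustrative assumptions.

def q_update(Q, visits, x, a, r, x_next, actions_next, gamma=0.9):
    """Update Q(x, a) after applying action a in state x, receiving reward r
    and observing the next state x_next with admissible actions actions_next."""
    visits[(x, a)] = visits.get((x, a), 0) + 1
    alpha = 1.0 / (1.0 + visits[(x, a)])                    # learning rate, eq. (3)
    best_next = max(Q.get((x_next, a2), 0.0) for a2 in actions_next)
    Q[(x, a)] = (1.0 - alpha) * Q.get((x, a), 0.0) + alpha * (r + gamma * best_next)
    return Q[(x, a)]
```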
3.2. Principle of setpoint adjustment approach based on Q-learning
In this section, we design an online self-learning system based on reinforcement learning to gradually establish the optimal policy for adjusting the setpoint of TBZ. Although the offline analysis data cannot reach the operator in time, changes in the components of the raw material slurry may be reflected indirectly in certain measurements of the rotary kiln process. These measurements can be used to construct the environment state set of the learning system. Moreover, the human intervention information can be regarded as an evaluation of whether the setpoint of TBZ is proper, since human interventions usually occur when the performance is unsatisfactory. This kind of information can therefore be defined as the reward signal from the environment.
For the learning system, the environment includes the rotary kiln process, the temperature controller and the operator. The environment provides current states and reinforcement payoffs to the learning system, and the learning system supplies the compensated upper and lower limits of the setpoint range of TBZ to the temperature controller in the environment. The learning system consists of a state perceptron, a critic, a learner and an action selector, as shown in Fig. 3. The state perceptron first samples and processes selected measurements to construct the original state vector, and then converts this continuous state vector into a discrete feature vector x through a defined feature extraction function. The action selector employs an ε-greedy action selection strategy to produce an amendment of the setpoint of TBZ, i.e. ΔTBZ_SP, and the critic calculates an internal reward r based on some heuristic rules. The learner updates the value function of the state-action pair based on tabular Q-learning. The final outputs of the learning system are the compensated upper and lower limits of the setpoint range of TBZ, which are calculated respectively by
T1BZ_SPHI(k) = ΔTBZ_SP(k) + T1BZ_SPHI(k−1)   (4)

T1BZ_SPLO(k) = ΔTBZ_SP(k) + T1BZ_SPLO(k−1)   (5)
Figure 3.
Schematic diagram of setpoint adjustment approach for TBZ based on Q-learning.
In a Markov decision process (MDP), only the sequential nature of the decision process is relevant, not the amount of time that passes between decision stages. A generalization is the semi-Markov decision process (SMDP), in which the amount of time between one decision and the next is a random variable. For the learning process, we define τs as the state perception time span, during which the perceptron obtains the state of the environment, and τr as the reward calculation time span (also called the action execution time span), during which the critic calculates the internal reward. The shortest time from one decision to the next is τ = τs + τr.
The design of the learning system concerns the following key issues:
Construction of the environment perception state set;
Determination of the action set;
Determination of the immediate reward function;
Determination of the learning algorithm.
3.3. Construction of the state set
When the components of the raw material slurry fluctuate and the related offline analysis data are unavailable, we hope that the learning system can estimate the changes of the slurry components through the perceived information about the environment state. Following this idea, some related variables are selected from the online measurable variables of the kiln process based on human experience, and the state vector s is defined to build up the original state space S of the learning system, where s = [s1, s2, s3, s4, s5], s ∈ S. Here s1 is the averaged burning zone temperature T̄BZ, s2 is the averaged flow rate of raw material slurry Ḡ, s3 is the averaged coal feeding ū1, and s4 and s5 are the averaged upper and lower limits of the setpoint range of TBZ, denoted T̄BZ_SPHI and T̄BZ_SPLO respectively, all computed over τs. They are calculated from
T̄BZ = ∑_{j=1}^{J} TBZ(j) / J   (6)

Ḡ = ∑_{j=1}^{J} G(j) / J   (7)

ū1 = ∑_{j=1}^{J} u1(j) / J   (8)

T̄BZ_SPHI = ∑_{j=1}^{J} TBZ_SPHI(j) / J   (9)

T̄BZ_SPLO = ∑_{j=1}^{J} TBZ_SPLO(j) / J   (10)
where TBZ(j), G(j), u1(j), TBZ_SPHI(j) and TBZ_SPLO(j) denote the jth sampled values of TBZ, the flow rate of raw material slurry, the coal feeding, and the upper and lower limits of the setpoint range of TBZ during τs respectively, and J is the total number of samples during τs.
Since the state space S defined above is continuous, it is impossible to compute and store value functions for every possible state or state-action pair because of the curse of dimensionality. This issue is often addressed by generating a compact parametric representation, such as an artificial neural network, that approximates the value function and can guide future actions. Here we instead use a feature extraction method (Tsitsiklis & Van Roy, 1996) to map the original continuous state space into a finite feature space, so that tabular Q-learning can be employed.
By identifying one partition per possible feature vector, the feature extraction mapping F(s) = [f1(s1, s4, s5), f2(s1), f3(s2), f4(s3)] defines a partitioning of the original state space. The burning zone temperature bias (from the setpoint range) level feature f1, the temperature level feature f2, the raw material slurry flow level feature f3 and the coal feeding level feature f4 are defined respectively by
where L1 and L2 are the thresholds scaling the level of the burning zone temperature bias from the setpoint range.
Each feature function maps the state space S to a finite set Pm, m = 1, 2, 3, 4. We then associate the feature vector x = [x1, x2, x3, x4] = F(s) with each state s ∈ S. The resulting set of all possible feature vectors, defined as the feature space X, is the Cartesian product of the sets Pm.
Because the compensation model for the setpoint of the burning zone temperature needs to be applicable only under normal kiln operating conditions, the design of the state set requires certain filtration of the feature space X. The appearance of x3 = 3 or x4 = 3 may indicate abnormal operating conditions, such as a low flow rate of raw material slurry during the kiln starting phase or abnormal coal components. Feature vectors with such values are excluded from the state set.
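A compact way to picture the state construction of this section is the sketch below: it averages the sampled measurements over τs as in (6)-(10) and then discretizes the averaged state into a feature vector. The threshold values, the level encodings and the number of levels per feature are illustrative assumptions; the plant's actual feature functions (11)-(14) and thresholds L1, L2 are set by the process engineers.

```python
# Sketch of state perception: average the raw samples over tau_s as in (6)-(10)
# and map the averaged state to a discrete feature vector x = F(s).
# Threshold values and level encodings are illustrative assumptions only.

def averaged_state(samples):
    """samples: list of dicts with keys 'T_BZ', 'G', 'u1', 'SPHI', 'SPLO'."""
    J = len(samples)
    return {key: sum(smp[key] for smp in samples) / J
            for key in ('T_BZ', 'G', 'u1', 'SPHI', 'SPLO')}

def level(value, thresholds):
    """Return 1 for the lowest band, 2 for the next band, and so on."""
    for idx, threshold in enumerate(thresholds, start=1):
        if value <= threshold:
            return idx
    return len(thresholds) + 1

def feature_vector(s, L1=5.0, L2=15.0):
    """Map the averaged state s to the feature vector [x1, x2, x3, x4]."""
    centre = 0.5 * (s['SPHI'] + s['SPLO'])
    bias = abs(s['T_BZ'] - centre)                  # bias from the setpoint range centre
    x1 = level(bias, [L1, L2])                      # temperature bias level, f1
    x2 = level(s['T_BZ'], [1250.0, 1300.0])         # temperature level, f2 (assumed cuts)
    x3 = level(s['G'], [90.0, 120.0])               # slurry flow level, f3 (assumed cuts)
    x4 = level(s['u1'], [6.0, 9.0])                 # coal feeding level, f4 (assumed cuts)
    return (x1, x2, x3, x4)
```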
3.4. Action set
The learning system aims to deduce the proper or best setpoint adjustment action for TBZ from the specified environment state. The problem to be handled is how to choose ΔTBZ_SP according to the changes of the environment state. The action set is defined as A = {a1, a2, a3, a4, a5} = {−30, −15, 0, 15, 30}.
3.5. Immediate reward signal
During τr, after the action has been selected based on the current state judgment, the learning system determines the immediate reward signal r = R(Δu1MAN, Δu1AUTO), which represents how satisfied the environment is with the action execution under the current state, using the human intervention regulation of coal feeding Δu1MAN and the temperature controller regulation Δu1AUTO. The reward signal r is determined by Table 1.
r                 | |ΔCoalAUTO| ≤ L3 | ΔCoalAUTO > L3 | ΔCoalAUTO < −L3
|ΔCoalMAN| ≤ L3   | 0.4              | 0.4            | 0.4
ΔCoalMAN > L3     | −0.2             | 0.2            | −0.4
ΔCoalMAN < −L3    | −0.2             | −0.4           | 0.2

Table 1.
Definition of the immediate reward function R.
where L3 is a threshold constant and ΔCoalMAN denotes the total regulation of coal feeding from human intervention during τr, calculated by
ΔCoalMAN = ∑_{τr} Δu1MAN   (15)
and ΔCoalAUTO denotes the total regulation from the temperature controller during τr, calculated by
ΔCoalAUTO = ∑_{τr} Δu1AUTO   (16)
The immediate reward function R in Table 1 is derived from the following heuristic rules:
During τr, if |ΔCoalMAN| ≤ L3, meaning that the operator is satisfied with the regulation action of the control system and little human intervention occurs, a positive reward r = 0.4 is returned. If ΔCoalMAN and ΔCoalAUTO have the same regulation direction, meaning that the direction of the control system's regulation matches the operator's expectation although its amplitude falls short, a positive reward r = 0.2 is returned. If ΔCoalMAN > L3 or ΔCoalMAN < −L3 while |ΔCoalAUTO| ≤ L3, meaning that the control system barely acts while a large human intervention occurs, then r = −0.2. If ΔCoalMAN and ΔCoalAUTO have contrary regulation directions, meaning that the operator is not satisfied with the regulation action of the control system, a negative reward r = −0.4 is returned.
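These heuristic rules can be coded directly. The following is a minimal sketch of the immediate reward function R of Table 1; L3 is a plant-specific threshold whose default value here is only a placeholder.

```python
# Sketch of the immediate reward R from Table 1.
# L3 is a plant-specific threshold; the default value here is only a placeholder.

def immediate_reward(d_coal_man, d_coal_auto, L3=1.0):
    """d_coal_man, d_coal_auto: total coal-feeding regulations over tau_r, eqs. (15)-(16)."""
    if abs(d_coal_man) <= L3:
        return 0.4                    # little human intervention: operator satisfied
    if abs(d_coal_auto) <= L3:
        return -0.2                   # large intervention while the controller barely acted
    if (d_coal_man > 0) == (d_coal_auto > 0):
        return 0.2                    # same direction: right trend, insufficient amplitude
    return -0.4                       # opposite directions: operator dissatisfied
```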
3.6. Algorithm summary
The whole learning algorithm of the learning system under learning mode is summarized as follows:
Step 1: At initialization, the Q value table of state-action pairs is initialized according to expert experience; otherwise go to Step 2 directly.
Step 2: During τs, the state perceptron obtains and saves the measured burning zone temperature, flow rate of raw material slurry, coal feeding, and upper and lower limits of the setpoint range of the burning zone temperature, calculates the related averaged values using (6)-(10), and transforms them into the related level features to construct the feature vector x using (11)-(14).
Step 3: Search the Q table for a matching state; if unsuccessful, go to Step 2 to judge the state again, otherwise continue.
Step 4: The action selector chooses an amendment of the setpoint of TBZ as its output according to the ε-greedy action selection strategy (Sutton & Barto, 1998), where ε = 0.1.
Step 5: During τr, the critic determines the reward signal r of this state-action pair according to Table 1.
Step 6: When the current τr finishes and the next τs begins, the state perceptron judges the next state x′ and state matching is performed in the Q table; if unsuccessful, go to Step 2 to start the next learning round, otherwise the learner uses the reward signal r to calculate and update the Q value of the last state-action pair using (2)-(3), where γ = 0.9.
Step 7: Judge whether the learning should be finished. When the evaluation values of all state-action pairs in the Q table no longer change appreciably, the Q-function has converged and the compensation model is well trained.
On the problem of Q table initialization: there is no explicit tutor signal in reinforcement learning, and the learning procedure is carried out through constant interaction with the environment to obtain reward signals. Usually, less information from the environment results in low learning efficiency. In this work, different initial evaluation values are given to different actions under the same state based on expert experience, so that the convergence of the algorithm is sped up and the online learning efficiency is enhanced.
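Putting the pieces together, one decision stage of the learning procedure of this section can be summarized by the loop below. It reuses the hypothetical helpers sketched earlier (feature_vector, averaged_state, immediate_reward, q_update) together with an ε-greedy selector; the data-acquisition callables stand in for the DCS interface and are not part of the original implementation.

```python
import random

ACTIONS = (-30, -15, 0, 15, 30)   # candidate setpoint amendments, Section 3.4

def epsilon_greedy(Q, x, epsilon=0.1):
    """Step 4: choose a setpoint amendment, greedy w.r.t. Q with probability 1 - epsilon."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((x, a), 0.0))

def learning_round(Q, visits, read_samples, read_interventions, apply_amendment,
                   gamma=0.9, epsilon=0.1):
    """One decision stage of the SMDP: perceive over tau_s, act, then evaluate over tau_r.
    read_samples, read_interventions and apply_amendment are placeholders for the DCS interface."""
    x = feature_vector(averaged_state(read_samples()))        # Steps 2-3: state perception
    a = epsilon_greedy(Q, x, epsilon)                         # Step 4: action selection
    apply_amendment(a)                                        # shift both setpoint limits by a, eqs. (4)-(5)
    d_coal_man, d_coal_auto = read_interventions()            # totals over tau_r, eqs. (15)-(16)
    r = immediate_reward(d_coal_man, d_coal_auto)             # Step 5: critic
    x_next = feature_vector(averaged_state(read_samples()))   # Step 6: next-state perception
    q_update(Q, visits, x, a, r, x_next, ACTIONS, gamma)      # Step 6: Q value update, eqs. (2)-(3)
    return x_next
```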
3.7. Technical issues
The main task of the learning system is to estimate the variations of the kiln operating conditions continuously and to adjust the setpoint range of the burning zone temperature accordingly. Such adjustments should be made only when the burning zone temperature is being controlled fairly smoothly by the temperature controller. The corresponding judgment signal is provided by the hybrid intelligent temperature controller. If the temperature control is in an abnormal condition, the learning procedure must be postponed; in this case the setpoint range of the burning zone temperature is kept constant.
Moreover, setpoint adjustments should be made only when the learning system can make an accurate judgment about the kiln operating conditions. Because of the complexity and fluctuation of the kiln operating conditions, accurate judgment of the current state usually takes a long time, and the time span between two setpoint adjustments cannot be too short; otherwise the calculated immediate reward cannot reflect the real influence of the adjustment upon the behaviour and performance of the control system. Special attention should therefore be paid to the selection of τs and τr, which lays a solid foundation for the obtained environmental states and reinforcement payoffs to be effective.
After long-term running, large changes in the characteristics of the raw material slurry components, the coal and the kiln device may appear. The previously designed optimal compensation model for the setpoint of the burning zone temperature might become invalid under the new operating conditions, and a new optimal design is needed to keep the control system performing well over the long term. In this case, the reinforcement learning system should be switched into the learning mode, and the above models can be re-established through new learning to improve the performance, so that the control system has strong adaptability for long-term running. This is an important issue drawing the attention of the enterprise.
4. Implementation and industrial application

Shanxi Alumina Plant is the largest alumina plant in Asia, with a megaton production capacity. It has six oversize rotary kilns of φ4.5 × 110 m. Its production employs the series-parallel combination of the Bayer and sintering processes. This production technology makes the components of the raw material of the rotary kilns vary over a large range, so it is more difficult to keep a stable kiln operation than for an ordinary rotary kiln.
A supervisory control system based on the proposed structure and the setpoint adjustment approach for the burning zone temperature has been developed for the #4 rotary kiln of Shanxi Alumina Plant. It is implemented in the I/A Series 51 DCS of Foxboro, and the Q-learning-based strategy has been realized in the FoxDraw and ICC configuration environment of the DCS. The related parameters are chosen as τs = 30 min and τr = 120 min.
Figure 4.
The setpoint of burning zone temperature is properly adjusted after learning.
Fig. 4 shows that, after a period of learning, a set of relatively stable setpoint adjustment strategies has been established, so that the setpoint range of TBZ can be adjusted automatically to satisfy the sintering temperature requirement, according to the level of the raw material slurry flow, the level of the coal feeding, the level of TBZ and the level of the temperature bias. It can be seen that setpoint adjustments happen only when TBZ is controlled smoothly. The judgment signal, denoted by "control parameter" in Fig. 4, takes the value 0 when the burning zone temperature is being controlled fairly smoothly, and a different value otherwise.
The adjustment actions of the reinforcement learning system result in satisfactory performance of the kiln temperature controller, with reasonable and acceptable coal feeding regulation amplitudes and rhythms, so that the adaptability to variations of the operating conditions has been significantly enhanced and the production quality index, the liter weight of the clinker, can be kept within the technical requirement even when the boundary and operating conditions change. Meanwhile, human interventions have become progressively weaker as the model application has improved the system performance.
During the test run, the running rate of the supervisory control system reached 90%. Negative influences of human factors on the heating and operating conditions have been avoided, the rationality and stability of clinker production has been maintained, and the operational life span of the kiln liner has been prolonged remarkably. The qualification rate of the clinker liter weight has been raised from 78.67% to 84.77%, and the production capacity per kiln per unit time has been increased from 52.95 t/h to 55 t/h, a 3.9% increment. The kiln running rate has been raised by 1.5%. Based on an average 10 ℃ reduction of the kiln tail temperature and an average 2% decrease of the residual oxygen content in the combustion gas, it can be concluded that 1.5% of the energy consumption has been saved.
5. Conclusion

This chapter has focused on an implementation strategy for employing reinforcement learning in the control of a typical complex industrial process, in order to enhance the control performance and the adaptability of the automatic control system to variations of the operating conditions.
Because of their inherent complexities, the operation of large rotary kilns is difficult and relies on experienced human operators observing the burning status. The problem of human-machine coordination is therefore addressed in the design of the rotary kiln control system, and human intervention and adjustment can be introduced. Apart from emergency conditions that require urgent human operation for system safety, it is observed that human interventions in the automatic control system usually express the operator's dissatisfaction with the performance of the control system when the boundary conditions vary. Following this idea, an online reinforcement-learning-based supervisory control system is designed, in which the human interventions are defined as the environmental reward signals. The optimal mapping between the rotary kiln operating conditions and the adjustment of important controller setpoint parameters can thus be established gradually. The successful application of this strategy to an alumina rotary kiln has shown that the adaptability and performance of the control system are improved effectively.
Further research will focus on improving the setting model of the burning zone temperature by introducing the offline analysis data of the clinker liter weight, in order to reject other uncertain disturbances in the quality control of kiln production.
References
1. Holmblad, L. & Østergaard, J. (1995). The FLS application of fuzzy logic, Fuzzy Sets and Systems, Vol. 70, No. 2-3, (March 1995) 135-146, ISSN 0165-0114
2. Jarvensivu, M.; Saari, K. & Jamsa-Jounela, S. (2001a). Intelligent control system of an industrial lime kiln process, Control Engineering Practice, Vol. 9, No. 6, (June 2001) 589-606, ISSN 0967-0661
3. Jarvensivu, M.; Juuso, E. & Ahava, O. (2001b). Intelligent control of a rotary kiln fired with producer gas generated from biomass, Engineering Applications of Artificial Intelligence, Vol. 14, No. 5, (October 2001) 629-653, ISSN 0952-1976
4. Sutton, R. & Barto, A. (1998). Reinforcement Learning: An Introduction, MIT Press, ISBN 0-262-19398-1, Cambridge, MA
5. Tsitsiklis, J. & Van Roy, B. (1996). Feature-based methods for large scale dynamic programming, Machine Learning, Vol. 22, No. 1-3, (January-March 1996) 59-94, ISSN 0885-6125
6. Watkins, C. & Dayan, P. (1992). Q-learning, Machine Learning, Vol. 8, No. 3-4, (May 1992) 279-292, ISSN 0885-6125
7. Zanovello, R. & Budman, H. (1999). Model predictive control with soft constraints with application to lime kiln control, Computers and Chemical Engineering, Vol. 23, No. 6, (June 1999) 791-806, ISSN 0098-1354
8. Zhou, X.; Xu, D.; Zhang, L. & Chai, T. (2004). Integrated automation system of a rotary kiln process for alumina production, Journal of Jilin University (Engineering and Technology Edition), Vol. 34, Suppl., (August 2004) 350-353, ISSN 1671-5497 (in Chinese)
9. Zhou, X.; Yue, H.; Chai, T. & Fang, B. (2006). Supervisory control for rotary kiln temperature based on reinforcement learning, Proceedings of 2006 International Conference on Intelligent Computing, pp. 428-437, ISBN 3-540-37255-5, Kunming, China, August 2006, Springer-Verlag, Berlin, Germany