Open access peer-reviewed chapter - ONLINE FIRST

Improved Probabilistic Frequent Itemset Analysis Strategy of Learning Behaviors Based on Eclat Framework

By Xiaona Xia

Submitted: February 27th 2021. Reviewed: March 14th 2021. Published: April 24th 2021.

DOI: 10.5772/intechopen.97219

Abstract

An interactive learning environment is a key support for educational decision making, and the corresponding analytics and methodology are an important part of educational technology research and development. Learning behaviors are both an important part of this work and a research challenge: they are uncertain and produce complex data relationships, which makes learning analysis more difficult. This chapter studies the feasibility of applying the Eclat framework to educational decision making and reports the corresponding data analysis results. We take probabilistic frequent itemsets and association rules as research objectives, and extract and standardize multiple data subsets. Based on the Eclat framework and the vertical data format, we design and improve the models and algorithms used in data management and processing. The results show that the improved models and algorithms are effective and feasible. While preserving robustness and stability, they guarantee the mining quality of probabilistic frequent itemsets and association rules, which supports the construction of the key execution topology of learning behaviors and improves the accuracy and reliability of data association analysis and decision prediction. The analysis methods and demonstration processes can serve as references for the study of interactive learning environments, and as a source of decision suggestions and predictive feedback.

Keywords

  • Learning Analytics
  • Decision Making
  • Eclat Framework
  • Probabilistic Frequent Itemsets
  • Association Rules
  • Decision Prediction
  • Interactive Learning Environment

1. Introduction

Content resources, interaction patterns, collaborative models, organizational planning and the influencing factors of learning processes together constitute learning behaviors, and they are also the key elements for describing them [1, 2]. Learning processes supported by online technology and data technology ensure the completeness and continuity of learning behavior data. Massive learning behavior data are an important part of education big data, which makes the full development of learning analytics possible [3, 4]. From the perspective of data structure and feature attributes, learning behavior data can be divided into two categories: horizontal format and vertical format. Both categories are built on the components of learning behaviors, which are the atomic units that describe them, such as url, forumng and questionnaire. The horizontal format of learning behaviors is a vector set composed of multi-dimensional attributes, and the vertical format is a vector set composed of multi-level learners. Research based on the horizontal format defines learning behaviors as a collection of learners and applies appropriate learning analytics and tools to carry out data statistics and rule exploration. However, in this format it is difficult to calculate and compare the influences of the individual components of learning behaviors, which hinders the construction of new education modes and leaves the comparison of learning behavior influences relatively passive.

Learning behaviors represent continuous learning processes, with associated needs and execution results [5]. Analyzing learning behaviors in the vertical format can provide more intuitive and accurate characteristics for studying both the group and the individual properties of learning behavior components. However, analysis based on the vertical format is a complex, multi-factor problem, and it is difficult to find a suitable decision making and prediction framework. Sampling limits the breadth and depth of data processing, which makes a feasible decision hard to reach. Because of shortcomings and gaps in technology and models, the data that constitute learning behaviors, and their potential relationships, cannot be fully mined or completely analyzed. In terms of research methods and application practices of learning behaviors, many problems remain to be solved [6, 7].

In this chapter, we analyze the vertical data of a big data set of online learning behaviors, starting from its data structure and characteristics. Based on the Eclat framework, we design a probabilistic frequent itemset learning algorithm, then demonstrate and compare its feasibility and reliability. Within the effective performance indicators, probabilistic frequent itemsets and association rules are mined from the learning behavior components, and their correlation is demonstrated. We then explore the rules and characteristics of learning behaviors, and provide decision feedback and suggestions for improving the design of learning behaviors and the relationships among them.

2. Related work

Mining probabilistic frequent itemsets is a branch of data analytics. The explicit or implicit associations it uncovers are the key basis for the prediction, decision making and recommendation of other learning behavior components. On current big data platforms, decision and recommendation algorithms based on frequent itemset mining are used to track data. However, because of the particularity and complexity of learning behaviors, and the autonomy and randomness of learning processes, there is no general technical means to ensure the integrity and sufficiency of analysis and calculation aimed at decision making and prediction. It is therefore necessary to support the construction and recommendation of beneficial learning behavior components. Research on frequent itemsets faces an urgent technical demand in the field of education big data, and existing results already demonstrate the urgency and practicality of empirical methods and technical means.

A review of the relevant theoretical and applied results shows that research on probabilistic frequent itemsets of learning behavior components mainly focuses on the data statistics and association rules of the horizontal format, as reflected in the following aspects:

2.1 Frequent itemset mining based on Apriori

Frequent itemset mining based on Apriori takes the construction of itemset association rules as its premise. The mining process works on the horizontal format and extracts rules through an iterative search strategy. After the join and prune steps, the itemsets satisfying the association rules are formed [8, 9, 10, 11]. An itemset that satisfies the minimum support is defined as a frequent itemset, and the rules derived from it are kept when they also reach a given confidence. When the Apriori algorithm is used to analyze the relevance of learning behaviors, the main idea is to select the learning platform, locate the components of learning behaviors, and associate learning behaviors with learning effects, defining the learning behavior as the “cause” and the learning effect as the “result”. The traditional Apriori algorithm is adapted flexibly: data tracking is realized with the help of clustering, weighted balance, decision tree evaluation and other means. The research target is to optimize learning behaviors and improve learning efficiency. However, Apriori needs to scan the original data many times. When the original data set is large, the number of iterative scans becomes excessive, which seriously reduces the efficiency of the algorithm.

2.2 Frequent itemset mining based on FP-growth

Frequent itemset mining based on FP-growth also uses the horizontal data format, but its data structure is essentially different. Data analysis is divided into two steps: constructing the FP tree and mining frequent itemsets. The FP tree expresses the transactions associated with itemsets: one path of the FP tree corresponds to a transaction, and a transaction is composed of items. Different transactions may share items, which makes the paths of the FP tree overlap. The more the paths overlap, the more they can be compressed and the more efficient access to the FP tree becomes [12, 13, 14, 15]. When FP-growth is used to mine frequent itemsets of learning behaviors, the main idea is similar to Apriori. According to the research target, users select the data set of learning behaviors, define the itemsets and the research target, put forward a hypothesis test, explore the rules by means of classification, clustering and decision making, draw a conclusion, and verify existing education and teaching practice against the data analysis results. There are, however, some problems. Because of the diversity, randomness and complexity of learning behaviors, the FP-growth algorithm has obvious limitations in the study of learning behaviors. When the itemsets of learning behaviors are too numerous or their relationships too complex, the FP tree has too many child nodes, which greatly reduces the efficiency of the algorithm and prevents it from producing accurate and complete frequent itemsets. In addition, the FP-growth algorithm is difficult to learn.

2.3 Frequent itemset mining based on Eclat framework

Compared with Apriori and FP (Frequent Pattern)-growth, the fundamental difference of Eclat is that it uses the vertical data format and is essentially a depth-first search mechanism. The rule search space is divided into sets of subspaces through the concept lattice and equivalence relationships, so the support calculation of each itemset does not require repeated retrieval of the entire data set [16, 17, 18, 19]. Using the Eclat framework to study learning behaviors requires a big data set of learning behaviors; through data transposition and standardization, we obtain the itemsets and the transaction set. On this basis, the relevant models and algorithms of the Eclat framework are improved and redesigned. Frequent itemsets and association rules are then mined under the constraints of support, confidence and lift, and the final frequent itemsets and association rules serve as references. Vertical data analysis based on the Eclat framework can speed up data search, association and analysis, and can also improve the reliability of data validation results to a certain extent.

However, the Eclat framework is rarely used in the data processing of learning behaviors, so improvements to its algorithms and models have produced few effective results. This is directly related to the technical difficulty caused by the complexity of learning behaviors. If Eclat is used to transpose and intersect all items and transactions, or if the number of items and transactions is too large, the efficiency of the algorithm suffers. It is therefore more practical to support frequent itemset mining in the Eclat framework with other algorithms and tools. This chapter integrates the advantages and feasible attempts in applying the Eclat framework, such as technical improvement, model design and tool application, so as to provide more effective methods for the follow-up study of learning behavior big data and related fields.

3. Elements of learning behaviors and research problems

We select a big data set of learning behaviors from the Open University (UK), covering four periods in the most recent two years; the data scale reaches hundreds of millions. From the perspective of course category, we track and compare the learning behaviors of the same category and of different categories, and make adaptive decisions. The courses are divided into two categories: Literature and Technology. Two courses are selected for each category, namely L1 and L2, and T1 and T2. Different courses have different periods of learning behaviors, and the learning effects are measured through assessment. Learning behaviors and learning effects are correlated, and the components of learning behaviors both restrict and drive one another. The empirical problems and testing strategies are established between learning behaviors and learning effects, and the research conclusions and decision making reflections are the basis for improving and optimizing data-driven learning behaviors.

Tables 1-4 show the components and indicators of learning behaviors for the four courses L1, L2, T1 and T2. The four tables involve four learning periods: P1, P2, P3 and P4. The data distribution in the tables indicates that not every course has learning behaviors in every period. The indicators involve two statistics, the median and the mode, which are used to investigate the population trend. Different indicators are selected for different types of components. “assessment” represents the assessment method of a course and is an enumeration type with three components: CT (Computer Test), TT (Teacher Test) and exam (joint computer and teacher test). “final_result” represents the result of the course assessment and is also an enumeration type, with four components: excellent, pass, fail and withdrawn. “assessment” and “final_result” measure the group tendency of a course through the mode. The other components are the main parts of the interaction processes. They all describe interaction frequency, which reflects the autonomy and randomness of learners. The strength of interaction frequency is assessed by the median to investigate the distribution range.
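As a minimal illustration of the two indicators, the sketch below computes the median of an interaction component and the mode of an enumeration component using Python's standard `statistics` module. The click counts and results are hypothetical values chosen for illustration, not data from the chapter, and `summarize` is an assumed helper name.

```python
from statistics import median, mode

# Hypothetical per-learner click counts for one interaction component (e.g. "subpage")
clicks = [12, 40, 56, 53, 0, 90]

# Hypothetical per-learner values of the enumeration component "final_result"
results = ["pass", "withdrawn", "pass", "fail", "pass", "excellent"]


def summarize(click_counts, final_results):
    """Return the median of an interaction component and the mode of an enumeration component."""
    return {
        "median_clicks": median(click_counts),  # averages the two middle values when the count is even
        "mode_result": mode(final_results),     # most frequent enumeration value
    }


summary = summarize(clicks, results)
print(summary)
```

With an even number of learners the median is the mean of the two middle values, which is why some medians in the tables are half-integers.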

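| Component (P2) | Indicator (P2) | Component (P4) | Indicator (P4) |
| --- | --- | --- | --- |
| forumng | 668 | content | 1080 |
| homepage | 369 | resource | 8 |
| content | 147 | subpage | 16 |
| resource | 1 | url | 0 |
| assessment | TT | assessment | TT |
| final_result | pass | final_result | pass |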

Table 1.

Components and indicators of L1.

| Component | P3 | P4 |
| --- | --- | --- |
| forumng | 23 | 26 |
| homepage | 106 | 112 |
| collaborate | 0 | 0 |
| content | 17 | 24 |
| page | 0 | 0 |
| quiz | 276 | 312 |
| resource | 32 | 34 |
| subpage | 56 | 54.5 |
| url | 4 | 4 |
| assessment | TT | TT |
| final_result | withdrawn | withdrawn |

Table 2.

Components and indicators of L2.

| Component | P2 | P3 | P4 |
| --- | --- | --- | --- |
| dualpane | 3 | 0 | 0 |
| forumng | 112.5 | 73.5 | 126 |
| homepage | 195 | 160.5 | 189 |
| collaborate | 0 | 1 | 1 |
| content | 506 | 447.5 | 466 |
| wiki | 117 | 86.5 | 126 |
| page | 0 | 0 | 0 |
| quiz | 137 | 113 | 108 |
| resource | 7 | 11 | 9 |
| subpage | 22 | 17 | 19 |
| url | 27 | 27.5 | 22 |
| assessment | TT | TT | TT |
| final_result | withdrawn | withdrawn | pass |

Table 3.

Components and indicators of T1.

| Component | P1 | P2 | P3 | P4 |
| --- | --- | --- | --- | --- |
| dataplus | 0 | 0 | 0 | 0 |
| dualpane | 2 | 0 | 0 | 0 |
| folder | - | 1 | 1 | - |
| forumng | 229.5 | 183 | 143 | 150 |
| glossary | 0 | 0 | 0 | 0 |
| homepage | 282 | 234 | 201 | 229 |
| htmlactivity | - | - | - | 4 |
| elluminate | 8 | - | - | - |
| collaborate | - | 1 | 1 | 1 |
| content | 795 | 566 | 482 | 638 |
| wiki | 13 | 11 | 8 | 9 |
| page | 9 | 7 | 7 | 2 |
| questionnaire | 3.5 | 3 | 0 | 2 |
| quiz | 581.5 | 543 | 521 | 557 |
| repeatactivity | - | 0 | - | 0 |
| resource | 32 | 26 | 22 | 25 |
| subpage | 219.5 | 180 | 162 | 184 |
| url | 23 | 13 | 12 | 14 |
| assessment | TT | TT | TT | TT |
| final_result | pass | pass | pass | excellent |

Table 4.

Components and indicators of T2.

From Tables 1-4, we can see that the group selection of “assessment” is highly concentrated. Most learners completed the course assessment set by teachers (TT), but the assessment results differ considerably, and the results of the same course also differ across learning periods. In P4 of L2, as in P2 and P3 of T1, learners tend to give up the assessment. In P4 of T2, most learners obtain excellent assessment results, and most of them pass the course. In terms of “assessment” and “final_result”, the group indicators of Literature courses and Technology courses are similar.

As for the other components of learning behaviors, the data show that the categories of components and the participation in isomorphic components are strongly discrete. The types of interaction components are consistent across the two learning periods of L2 and across the three learning periods of T1, and their medians are relatively close, which indicates that learners' participation in these interactive components is distributed in basically the same way. The two learning periods of L2 have the same “final_result” mode, while the assessment results of T1 differ noticeably. For other courses, comparing the types or numbers of interaction components of the same course in different learning periods directly reveals differences: the interactions differ significantly, and there is a gap in the median of the same interaction component, such as “content” in the two learning periods of L1. At the same time, the types of interaction components in Literature or Technology courses depend on the course. The learners of L1 and L2 each have their own interactive components, and the same holds for T1 and T2.

Therefore, the interaction components of L1, L2, T1 and T2 reflect autonomous learning characteristics, and the component constraints of assessment methods and results differentiate the learners. The problems and relationships are shown in Figure 1 and are divided into the following four steps:

  1. The mining of frequent itemsets will take different interaction components as reference items, and realize the analysis and mining of frequent itemsets based on reference items according to certain probability;

  2. Taking three enumeration methods of “assessment” as component reference items, according to certain probability, the frequent itemsets analysis and mining based on reference items are realized;

  3. The four enumeration methods of “final_result” are component reference items. Based on a certain probability, the analysis and mining of frequent itemsets based on reference items are realized;

  4. Based on a certain probability, the intersection of the three groups of frequent itemsets obtained from (1), (2) and (3) is solved and analyzed, and the inherent association logic and restrictive conditions are evaluated. On this basis, the rule of data-driven learning behaviors, prediction direction and decision making are explored.

Figure 1.

The research problems and logical relationships.

The “certain probability” in the four steps depends on the requirements of the selected algorithm and on the support measure. Based on the improved Eclat framework, we complete the four steps of the research problems, use the three indicators “Support”, “Confidence” and “Lift” as constraints, analyze thresholds and test criteria, and mine probabilistic frequent itemsets and association rules.

4. Improved Eclat framework

For the four learning behavior data sets corresponding to L1, L2, T1 and T2, the execution results of the reference items can be described as probabilities, which the conventional “Support” calculation does not handle. The expected “Support” of reference items should therefore be used to describe the execution frequency of uncertain components [20]. This is a feasible and targeted analysis strategy, and it is the model basis for improving the Eclat framework.

4.1 Related models

The Related Models for the improvement of Eclat framework are as follows.

4.1.1 Expected “Support” of reference items

Given a probabilistic data set with $N$ reference item instances, the expected “Support” of a reference item $X$ is the cumulative value of its probabilities over the probabilistic data set. The calculation formula is $\mathit{expectsup}(X) = \sum_{i=1}^{N} p_i(X)$.

4.1.2 Frequent itemsets

Based on the expected “Support” of a reference item, given a probabilistic data set with $N$ reference item instances, the reference item $X$ is a frequent itemset if it satisfies $\mathit{expectsup}(X) \ge N \times \mathit{min\_RST}$. $\mathit{min\_RST}$ is the minimum relative “Support” threshold, calculated as the ratio of the minimum absolute “Support” threshold to the number of reference item instances. Generally, this value can be specified according to the data distribution.
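The two definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the chapter's implementation: the function names and the probability values are assumptions introduced here.

```python
def expected_support(probabilities):
    """expectsup(X): sum of the existence probabilities p_i(X) over all N instances."""
    return sum(probabilities)


def is_frequent(probabilities, min_rst):
    """Check expectsup(X) >= N * min_RST, with N taken as the number of instances."""
    n = len(probabilities)
    return expected_support(probabilities) >= n * min_rst


# Hypothetical existence probabilities of one reference item X in N = 5 instances
probs_x = [0.9, 0.5, 0.0, 0.8, 0.3]

print(expected_support(probs_x))   # about 2.5
print(is_frequent(probs_x, 0.4))   # threshold 5 * 0.4 = 2.0
print(is_frequent(probs_x, 0.6))   # threshold 5 * 0.6 = 3.0
```

An instance in which the item never occurs simply contributes a probability of 0, so the same list length can be used as $N$ for every item.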

4.1.3 Probability frequency

Combined with the conditions of frequent itemsets, given a probabilistic data set with $N$ reference item instances, the probability frequency of the reference item $X$ is defined as: $\mathit{proF}(X) = \mathit{proF}\{\mathit{expectsup}(X) \ge N \times \mathit{min\_RST}\}$.

4.1.4 Probabilistic frequent itemsets

Given a probabilistic data set with $N$ reference item instances, if $\mathit{proF}(X) \ge \mathit{min\_proF}$, the reference item $X$ is a probabilistic frequent itemset. $\mathit{min\_proF}$ is the minimum frequent probability threshold, which can also be specified according to the data distribution.
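To make the probability frequency concrete, the sketch below enumerates every possible world of a tiny data set. It rests on two assumptions that are not stated in the chapter: the probability frequency is read as the probability that the actual support count reaches $N \times \mathit{min\_RST}$ (the usual reading in uncertain-data mining), and the instances are treated as independent. The function names and values are hypothetical, and the enumeration is exponential in $N$, so it is only meant for checking small cases.

```python
import math
from itertools import product


def frequent_probability_bruteforce(probabilities, min_count):
    """Pr(support count >= min_count), summing over all 2^N possible worlds."""
    total = 0.0
    for world in product([0, 1], repeat=len(probabilities)):
        if sum(world) >= min_count:
            weight = 1.0
            for present, p in zip(world, probabilities):
                weight *= p if present else (1.0 - p)
            total += weight
    return total


def is_probabilistic_frequent(probabilities, min_rst, min_prof):
    """Check proF(X) >= min_proF, with the absolute threshold ceil(N * min_RST)."""
    min_count = math.ceil(len(probabilities) * min_rst)
    return frequent_probability_bruteforce(probabilities, min_count) >= min_prof


# Hypothetical existence probabilities of X in N = 3 instances
probs_x = [0.9, 0.8, 0.5]
prob_at_least_two = frequent_probability_bruteforce(probs_x, 2)
print(prob_at_least_two)                               # about 0.85
print(is_probabilistic_frequent(probs_x, 0.5, 0.7))
```

For these values the worlds with at least two occurrences have total weight 0.36 + 0.36 + 0.09 + 0.04 = 0.85, so the item passes a threshold of 0.7.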

4.2 Algorithm design

Most algorithms for mining frequent itemsets use the horizontal data format, with the transaction as the vector [5, 21]. The uncertainty of learning behavior data means that the analysis of learning behaviors needs the vertical data format. One complete learning behavior of a learner constitutes a transaction. Based on the Eclat framework, it is suitable to adopt the tidlist data structure and to add a probability parameter to each item of learning behaviors to indicate its likelihood in a specific transaction.

The vertical data format of learning behavior data is a two-element tuple $\langle x, \mathit{tidlist}(x) \rangle$ that represents the item set of learning behaviors, where $x$ is the identifier of each item, that is, the number of each learning behavior, and $\mathit{tidlist}(x)$ is the list of entries of $x$. If each entry contains an identifier $i_i$ and an existence probability $p^X(i_i)$, $\mathit{tidlist}(x)$ is expressed as the tuple $\{(i_1, p^X(i_1)), (i_2, p^X(i_2)), \ldots, (i_i, p^X(i_i))\}$. In the algorithm design for the vertical data format, it is necessary to calculate the probability frequency. Here we use a two-dimensional array $P^X_{i,j}$ to represent the probability mass function, which gives the probability that $X$ occurs $i$ times in the first $j$ reference items. The calculation of the probability frequency is described as the PFC (Frequent Pattern Calculation) program.
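The vertical layout described above can be sketched as a transposition from a horizontal, per-transaction view. The dictionary shapes, the helper name `to_vertical` and the sample probabilities are assumptions made for illustration; the chapter does not specify a concrete storage format.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Transpose {tid: {item: probability}} into {item: [(tid, probability), ...]}.

    Each resulting list plays the role of tidlist(x): one (identifier, existence
    probability) pair per transaction in which the item may occur.
    """
    tidlists = defaultdict(list)
    for tid in sorted(transactions):
        for item, prob in transactions[tid].items():
            if prob > 0:  # an item with probability 0 is absent from the transaction
                tidlists[item].append((tid, prob))
    return dict(tidlists)


# Hypothetical horizontal data: three learner transactions with uncertain components
horizontal = {
    1: {"forumng": 0.9, "quiz": 0.4},
    2: {"forumng": 0.6},
    3: {"quiz": 0.7, "url": 0.2},
}

vertical = to_vertical(horizontal)
print(vertical)
```

Sorting the transaction identifiers keeps every tidlist ordered, which makes later intersections straightforward.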

PFC program

Input: item set $X$ with entries $(i_i, p^X(i_i))$ // $1 \le i \le I$, where $I$ represents the maximum number of transactions.

Output: $P^X_{i,j}$

Process

  1. PFC()

  2. For j = 0 to I

  3. $P^X_{0,j} = 1$

  4. EndFor // Initialize the first row of $P^X_{i,j}$ to 1

  5. For j = 0 to I

  6. For i = 1 to min_Value(j, min_RST) // min_Value(j, min_RST) compares j and min_RST, then returns the minimum value.

  7.   If i > j then $P^X_{i,j} = 0$

  8.   Else if i = j then $P^X_{i,j} = \prod_{k=1}^{j} p^X(i_k)$

  9.    Else if i<j

  10.      Then $P^X_{i,j} = P^X_{i-1,j-1}\, p^X(i_j) + \max\{P^X_{i,j-1},\ P^X_{i-1,j-1}\, p^X(i_j)\}$

    /* This formula is a form of dynamic programming; the maximum probability frequency is obtained from the adjacent units. */

  11.      End If

  12.    End If

  13.   End If

  14.  End For

  15. End For

  16. Output: $P^X_{i,j}$
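The table-filling idea of the PFC listing can be sketched in Python. Note the substitution: instead of the max-based expression in step 10, this sketch uses the recurrence that is standard for independent uncertain transactions, where the second term is weighted by $(1 - p)$, and it reads $P[i][j]$ as the probability that $X$ occurs at least $i$ times in the first $j$ transactions. Both the reading and the name `pfc` as a Python function are assumptions, so this is an illustration of the technique rather than a transcription of the chapter's formula.

```python
def pfc(probabilities, min_count):
    """Fill P[i][j] = Pr(X occurs in at least i of the first j transactions).

    probabilities[j - 1] is the existence probability of X in transaction j.
    Transactions are assumed to be independent.
    """
    n = len(probabilities)
    table = [[0.0] * (n + 1) for _ in range(min_count + 1)]

    # Row 0: "at least 0 occurrences" is certain for every prefix length j.
    for j in range(n + 1):
        table[0][j] = 1.0

    for i in range(1, min_count + 1):
        for j in range(1, n + 1):
            if i > j:
                table[i][j] = 0.0  # cannot occur more often than there are transactions
            else:
                p = probabilities[j - 1]
                # Transaction j either contains X (need i - 1 more before it)
                # or does not (need all i occurrences before it).
                table[i][j] = table[i - 1][j - 1] * p + table[i][j - 1] * (1.0 - p)
    return table


# Hypothetical existence probabilities of X in three transactions
table = pfc([0.9, 0.8, 0.5], 2)
print(table[2][3])  # about 0.85
```

On the diagonal the second term vanishes, so $P[j][j]$ reduces to the product of the first $j$ probabilities, which matches step 8 of the listing. The table has (min_count + 1) rows and (N + 1) columns, so the cost is proportional to their product instead of growing exponentially with N.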

Based on the calculation results of probability frequency, Eclat algorithm is designed. There are three main steps:

Firstly, according to the vertical data format, the transactions and their corresponding items are extracted from the learning behavior data set, and the transactions are initialized with the help of a bi-directional sorting strategy. The items are stored in tidlist. The “Support” of the transactions stored in tidlist is then analyzed, and the transactions with lower “Support” (support < min_RST) are discarded.

Secondly, the items of learning behaviors are pruned and optimized, the k-itemsets are extracted from tidlist by intersection, and the probability frequency of each k-itemset is obtained by multiplication.

Thirdly, probabilistic frequent itemsets are mined recursively from the candidate itemsets. During mining, a pruning strategy based on tidlist is applied to reduce the search time complexity. Furthermore, based on the projection of the frequent k-itemsets, the probability data composed of frequent itemsets are obtained.
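The intersection-and-multiplication step in the second stage can be sketched as follows. The sketch assumes that the existence probabilities of different items in the same transaction are independent, so the probability of the joined itemset is their product; that assumption, the helper name and the sample values are not taken from the chapter.

```python
def intersect_tidlists(tidlist_a, tidlist_b):
    """Intersect two tidlists of (tid, probability) pairs.

    Only transactions present in both lists are kept, and the probability of the
    joined itemset in each of them is the product of the two input probabilities.
    """
    probs_b = dict(tidlist_b)
    return [(tid, p_a * probs_b[tid]) for tid, p_a in tidlist_a if tid in probs_b]


# Hypothetical tidlists of two learning behavior components
tidlist_forumng = [(1, 0.9), (2, 0.6), (4, 0.5)]
tidlist_quiz = [(1, 0.4), (3, 0.7), (4, 0.5)]

joined = intersect_tidlists(tidlist_forumng, tidlist_quiz)
print(joined)  # transactions 1 and 4 survive the intersection
```

Because the result has the same shape as its inputs, it can be intersected again with a third tidlist when extending a 2-itemset, as long as each item's probability is multiplied in only once.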

These three steps constitute a recursive process, and the whole algorithm process is described as LB(Learning Behavior)-Eclat program.

LB-Eclat Program

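Input: $T$ // $T$ is the data set stored in vertical data format.

Process:

  1. LB-Eclat(T)

  2. For each $X_i \in T$

  3. $I_i = \varnothing$

  4.  For each $X_j \in T$ && $\mathit{expectsup}(X_i) > \mathit{expectsup}(X_j)$

  5.   $X_{ij} = X_i \cup X_j$

  6.   $\mathit{tidlist}(X_{ij}) = \mathit{tidlist}(X_i) \cap \mathit{tidlist}(X_j)$

  7.   Call PFC($X_{ij}$) // call PFC program

  8.   If $P(X_{ij}) \ge \mathit{min\_proF}$

  9.   Then $T = T \cup X_{ij}$; $I_i = I_i \cup X_{ij}$

  10.   End If

  11.   End For

  12. End For

  13. While $I_i \ne \varnothing$

  14.   LB-Eclat($I_i$)

  15. End While

  16. Output: all probabilistic frequent itemsets.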
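The recursive structure of the listing can be sketched in Python. This is a simplified illustration, not the chapter's implementation: it follows the usual Eclat recursion over equivalence classes instead of extending $T$ in place, it computes the probability frequency with the standard recurrence for independent transactions, and the function names, data layout and sample values are all assumptions.

```python
import math


def frequent_probability(probs, min_count):
    """Pr(at least min_count of the independent events occur), using a rolling 1-D table."""
    if min_count > len(probs):
        return 0.0
    at_least = [1.0] + [0.0] * min_count
    for p in probs:
        for i in range(min_count, 0, -1):  # go downwards so at_least[i - 1] is still the old value
            at_least[i] = at_least[i - 1] * p + at_least[i] * (1.0 - p)
    return at_least[min_count]


def lb_eclat_sketch(vertical, n_transactions, min_rst, min_prof):
    """Mine probabilistic frequent itemsets from {item: {tid: probability}}."""
    min_count = math.ceil(n_transactions * min_rst)
    found = {}

    def mine(candidates):
        # Order by descending expected support, mirroring the condition in step 4.
        candidates = sorted(candidates, key=lambda c: sum(c[1].values()), reverse=True)
        for pos, (items_a, tids_a) in enumerate(candidates):
            extensions = []
            for items_b, tids_b in candidates[pos + 1:]:
                last = items_b[-1]
                # Intersect tidlists; multiply in only the probability of the new item.
                tids = {t: p * vertical[last][t] for t, p in tids_a.items() if t in tids_b}
                prob = frequent_probability(list(tids.values()), min_count)
                if prob >= min_prof:
                    joined = items_a + (last,)
                    found[frozenset(joined)] = prob
                    extensions.append((joined, tids))
            if extensions:
                mine(extensions)  # recurse on the itemsets that share the prefix items_a

    singles = []
    for item, tids in vertical.items():
        prob = frequent_probability(list(tids.values()), min_count)
        if prob >= min_prof:
            found[frozenset((item,))] = prob
            singles.append(((item,), dict(tids)))
    mine(singles)
    return found


# Hypothetical vertical data for four transactions
vertical = {
    "forumng": {1: 0.9, 2: 0.8, 3: 0.9, 4: 0.7},
    "homepage": {1: 0.9, 2: 0.9, 3: 0.8},
    "quiz": {4: 0.2},
}
found = lb_eclat_sketch(vertical, n_transactions=4, min_rst=0.5, min_prof=0.7)
print(sorted(sorted(itemset) for itemset in found))
```

In this toy input "quiz" is discarded at the first level, so only "forumng", "homepage" and their pair remain. Discarding infrequent single items before any join is what keeps the number of tidlist intersections small.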

5. Experiments

The learning behavior components shown in Tables 1-4 differ in scale and sparsity. The density of each learning behavior data set is shown in Table 5. To compare and test the algorithms, the traditional Eclat algorithm and the Eclat algorithm based on descending “Support” (DES-Eclat) are selected for the experiments.

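| Data set | Density |
| --- | --- |
| L1-P2 | sparse density |
| L1-P4 | sparse density |
| L2-P3 | moderate density |
| L2-P4 | dense density |
| T1-P2 | sparse density |
| T1-P3 | sparse density |
| T1-P4 | moderate density |
| T2-P1 | moderate density |
| T2-P2 | dense density |
| T2-P3 | moderate density |
| T2-P4 | dense density |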

Table 5.

Density of data sets.

5.1 Performance indicators

Based on the Eclat framework, the traditional Eclat algorithm, the DES-Eclat algorithm and the LB-Eclat algorithm are implemented in Python 3.7 and run in the same experimental configuration. Throughout the experiments, we set different values of min_RST to mine frequent itemsets and record the resulting indicators: the running time of the algorithm, the proportion of memory used and the number of probabilistic frequent itemsets.

The test of each indicator is divided into three series according to the sparsity of the data sets. The comparative running times are shown in Figures 2-4. In each subgraph, the larger the min_RST, the shorter the running time. Figure 2 shows that, for the same min_RST on sparse density data sets, the traditional Eclat algorithm has the advantage: the special sorting of data in DES-Eclat and LB-Eclat increases the time complexity and lengthens the analysis. In Figures 3 and 4, the execution time of the LB-Eclat algorithm is the lowest, which indicates that the improved algorithm is better suited to data sets with higher density and is more effective for mining and processing frequent itemsets of learning behaviors. The running times also show that the DES-Eclat algorithm, which is based on the reverse-order strategy, has a long running time.

Figure 2.

Comparison of running time of three algorithms on sparse density datasets.

Figure 3.

Comparison of running time of three algorithms on moderate density datasets.

Figure 4.

Comparison of running time of three algorithms on dense density datasets.

The comparative results for the memory space of the three algorithms are shown in Figures 5-7. Regardless of the density of the data set, the memory usage of the three algorithms follows the same distribution and the same trend, with LB-Eclat slightly lower than the other two. The higher the data set density, the smaller the space complexity of LB-Eclat relative to the traditional Eclat and DES-Eclat algorithms, which improves memory utilization.

Figure 5.

Comparison of memory space of three algorithms on sparse density datasets.

Figure 6.

Comparison of memory space of three algorithms on moderate density datasets.

Figure 7.

Comparison of memory space of three algorithms on dense density datasets.

The comparison results for the probabilistic frequent itemsets mined by the algorithms are shown in Figures 8-10. For a given min_RST, the number of probabilistic frequent itemsets depends on the items of learning behaviors and the density of transactions. Although the running time and memory space of the three algorithms differ on the same data set, the number of probabilistic frequent itemsets they obtain is basically the same. As min_RST increases, the number of probabilistic frequent itemsets decreases. The larger that number, the more transactions and items need to be analyzed and calculated, which inevitably increases the time complexity and space complexity.

Figure 8.

Comparison of probabilistic frequent itemsets of three algorithms on sparse density datasets.

Figure 9.

Comparison of probabilistic frequent itemsets of three algorithms on moderate density datasets.

Figure 10.

Comparison of probabilistic frequent itemsets of three algorithms on dense density datasets.

The experimental results show that the LB-Eclat algorithm is effective for studying the probabilistic frequent itemsets of uncertain learning behaviors. In terms of running time and memory space, LB-Eclat is better than the other two algorithms at mining and analyzing the probabilistic frequent itemsets of sparse density, moderate density and dense density data sets. Because the 11 learning behavior data sets all come from real learning processes, the comparison tests are comprehensive. The indicators show that the LB-Eclat algorithm is robust and realistic.

6. Probabilistic frequent itemsets analysis of learning behaviors

Based on the LB-Eclat algorithm, the probabilistic frequent itemsets of the 11 learning behavior data sets are mined, and the itemsets with high probability are found. The probabilistic frequent itemsets of each data set are mined under the constraints “Support” (>0.3) and “Confidence” (>0.7), and the association degree of the rules generated by the itemsets is then verified with “Lift”. If “Lift” > 1, the association degree of the relevant rules is high. In the mining results, 2-itemsets are the most numerous, as shown in Tables 6-8; the 3-itemsets and 4-itemsets are mainly based on the intersection and combination of 2-itemsets. The higher the density of a data set, the more frequent itemsets are mined. Under the constraints of “Support” and “Confidence”, some data sets are limited to 2-itemsets, such as L1-P2 and L1-P4.
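The three constraints can be sketched with the textbook definitions of confidence and lift, applied here to relative supports. The thresholds are the ones quoted above; the function names and the support values are hypothetical, and the chapter does not state exactly how it derives these measures from probabilistic data.

```python
def rule_metrics(sup_a, sup_b, sup_ab):
    """Metrics of the rule A -> B from relative supports of A, B and A together with B."""
    confidence = sup_ab / sup_a
    lift = confidence / sup_b
    return {"support": sup_ab, "confidence": confidence, "lift": lift}


def passes(metrics, min_support=0.3, min_confidence=0.7, min_lift=1.0):
    """Apply the thresholds Support > 0.3, Confidence > 0.7 and Lift > 1."""
    return (
        metrics["support"] > min_support
        and metrics["confidence"] > min_confidence
        and metrics["lift"] > min_lift
    )


# Hypothetical relative supports for the rule {content} -> {final_result}
strong = rule_metrics(sup_a=0.5, sup_b=0.6, sup_ab=0.4)
# Hypothetical rule that is confident but not better than chance (lift below 1)
weak = rule_metrics(sup_a=0.8, sup_b=0.9, sup_ab=0.7)

print(strong, passes(strong))
print(weak, passes(weak))
```

The second rule shows why “Lift” is checked separately: a consequent that is already very common can yield a high confidence even when the antecedent adds no information.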

| Data set | Probabilistic frequent 2-itemsets |
| --- | --- |
| L1-P2 | {forumng, homepage}; {forumng, content}; {content, final_result}; {forumng, final_result} |
| L1-P4 | {content, subpage}; {content, final_result} |
| T1-P2 | {forumng, homepage}; {homepage, content}; {homepage, wiki}; {homepage, subpage}; {homepage, url}; {content, wiki}; {content, quiz}; {content, url}; {wiki, url}; {subpage, url}; {homepage, final_result}; {content, final_result}; {wiki, final_result}; {url, final_result} |
| T1-P3 | {forumng, homepage}; {forumng, wiki}; {forumng, url}; {homepage, content}; {homepage, wiki}; {homepage, url}; {content, wiki}; {content, url}; {wiki, url}; {homepage, final_result}; {content, final_result}; {wiki, final_result}; {url, final_result} |

Table 6.

Probabilistic frequent 2-itemsets of sparse density data sets.

| Data set | Probabilistic frequent 2-itemsets |
| --- | --- |
| L2-P3 | {forumng, homepage}; {forumng, subpage}; {forumng, url}; {homepage, quiz}; {homepage, subpage}; {homepage, url}; {quiz, subpage}; {resource, subpage}; {subpage, url}; {homepage, final_result}; {page, final_result}; {quiz, final_result}; {resource, final_result}; {subpage, final_result} |
| T1-P4 | {forumng, homepage}; {forumng, wiki}; {forumng, url}; {homepage, content}; {homepage, wiki}; {homepage, url}; {content, wiki}; {wiki, url}; {forumng, final_result}; {homepage, final_result}; {content, final_result}; {wiki, final_result}; {quiz, final_result}; {subpage, final_result}; {url, final_result} |
| T2-P1 | {dataplus, content}; {dataplus, questionnaire}; {dataplus, url}; {dualpane, content}; {dualpane, questionnaire}; {dualpane, subpage}; {dualpane, url}; {forumng, homepage}; {homepage, content}; {homepage, questionnaire}; {homepage, subpage}; {homepage, url}; {content, page}; {content, questionnaire}; {content, quiz}; {content, resource}; {content, subpage}; {content, url}; {wiki, subpage}; {page, questionnaire}; {page, subpage}; {page, url}; {questionnaire, subpage}; {questionnaire, url}; {quiz, subpage}; {resource, subpage}; {subpage, url}; {dataplus, final_result}; {dualpane, final_result}; {forumng, final_result}; {homepage, final_result}; {content, final_result}; {page, final_result}; {questionnaire, final_result}; {quiz, final_result}; {resource, final_result}; {subpage, final_result}; {url, final_result} |
| T2-P3 | {forumng, homepage}; {forumng, subpage}; {homepage, content}; {homepage, wiki}; {homepage, questionnaire}; {homepage, subpage}; {homepage, url}; {content, wiki}; {content, questionnaire}; {content, subpage}; {content, url}; {wiki, questionnaire}; {wiki, subpage}; {wiki, url}; {questionnaire, subpage}; {questionnaire, url}; {quiz, subpage}; {subpage, url}; {dataplus, final_result}; {folder, final_result}; {forumng, final_result}; {homepage, final_result}; {content, final_result}; {questionnaire, final_result}; {quiz, final_result}; {subpage, final_result}; {url, final_result}; {dataplus, content}; {dataplus, questionnaire}; {dataplus, url}; {dataplus, subpage}; {folder, quiz}; {folder, subpage} |

Table 7.

Mining results of probabilistic frequent 2-itemsets of moderate density data sets.

| Data set | Probabilistic frequent 2-itemsets |
| --- | --- |
| L2-P4 | {forumng, homepage}; {homepage, subpage}; {quiz, subpage}; {forumng, final_result}; {homepage, final_result}; {page, final_result}; {quiz, final_result}; {subpage, final_result} |
| T2-P2 | {dataplus, questionnaire}; {dataplus, dualpane}; {dataplus, content}; {dataplus, page}; {dataplus, url}; {dualpane, content}; {dualpane, page}; {dualpane, questionnaire}; {dualpane, subpage}; {dualpane, url}; {folder, subpage}; {forumng, homepage}; {homepage, content}; {homepage, wiki}; {homepage, subpage}; {homepage, url}; {content, wiki}; {content, page}; {content, questionnaire}; {content, quiz}; {content, subpage}; {content, url}; {wiki, questionnaire}; {wiki, subpage}; {wiki, url}; {page, questionnaire}; {page, subpage}; {page, url}; {questionnaire, subpage}; {questionnaire, url}; {quiz, subpage}; {subpage, url}; {dataplus, final_result}; {dualpane, final_result}; {folder, final_result}; {forumng, final_result}; {homepage, final_result}; {content, final_result}; {wiki, final_result}; {page, final_result}; {questionnaire, final_result}; {quiz, final_result}; {subpage, final_result}; {url, final_result} |
| T2-P4 | {dataplus, dualpane}; {dataplus, content}; {dataplus, wiki}; {dataplus, page}; {dataplus, questionnaire}; {dataplus, subpage}; {dataplus, url}; {dualpane, page}; {dualpane, questionnaire}; {homepage, content}; {homepage, subpage}; {homepage, url}; {content, wiki}; {content, page}; {content, questionnaire}; {content, subpage}; {content, url}; {wiki, page}; {wiki, questionnaire}; {wiki, subpage}; {wiki, url}; {page, questionnaire}; {page, subpage}; {page, url}; {questionnaire, subpage}; {questionnaire, url}; {quiz, subpage}; {resource, subpage}; {subpage, url} |

Table 8.

Probabilistic frequent 2-itemsets of dense density data sets.

The distribution of frequent 2-itemsets in Tables 6-8 has the following characteristics:

  1. The components of learning behaviors are strongly correlated with one another, and some have an even more obvious impact on the learning results. Among data sets of similar density, Technology courses yield significantly more frequent itemsets than Literature courses. This shows that the learning behavior components of Technology courses are highly diverse and interact continuously and serially, so that learners participate in them with similar frequency. Compared with Literature courses, the components of Technology courses are more conducive to forming frequent itemsets of learning behaviors.

  2. For sparse data sets, “forumng”, “homepage” and “content” readily form frequent 2-itemsets with other components, which is evident in the data sets of both Literature and Technology courses; “wiki” also interacts frequently with other components in Technology courses. For moderate density and dense data sets, the frequent 2-itemsets are similar: “forumng”, “homepage”, “content”, “url”, “quiz” and “subpage” all show strong component correlation. For Technology courses, frequent itemsets involving “dataplus”, “dualpane”, “wiki” and “questionnaire” occur widely and frequently.

Three indicators are used to measure the association rules generated from the frequent itemsets of learning behavior components: “Support”, “Confidence” and “Lift”. “Support” measures how often the components occur together. “Lift” > 1 indicates a positive correlation, and the higher the “Lift”, the more valuable the association rule; “Lift” < 1 indicates a negative correlation, which is stronger the smaller the value; “Lift” = 1 means that the components are independent. The association rules with “Lift” > 1 and high confidence are listed in Table 9; these rules are the basis for tracking, adjusting and optimizing learning behaviors.

| Data set | Support | Confidence | Lift | Rule |
| --- | --- | --- | --- | --- |
| L1-P2 | 0.2301 | 0.8127 | 1.2832 | {homepage, content} → {forumng} |
| L1-P4 | | | | None |
| T1-P2 | 0.2152 | 0.7836 | 1.5241 | {homepage} → {forumng} |
| T1-P3 | 0.2701 | 0.7435 | 1.7195 | {content, wiki, subpage, url} → {homepage} |
| T1-P3 | 0.2558 | 0.8281 | 1.6399 | {content, wiki, subpage} → {url} |
| L2-P3 | 0.3291 | 0.8536 | 1.8408 | {homepage, subpage, url} → {forumng} |
| L2-P3 | 0.2453 | 0.7166 | 1.6807 | {quiz, subpage, url} → {homepage} |
| L2-P3 | 0.2132 | 0.5369 | 1.3577 | {subpage, quiz} → {final_result} |
| T1-P4 | 0.1731 | 0.8467 | 1.7063 | {homepage, wiki, url} → {forumng} |
| T1-P4 | 0.2229 | 0.8757 | 1.7530 | {content, wiki, url} → {homepage} |
| T1-P4 | 0.3681 | 0.5355 | 1.2122 | {content, wiki} → {final_result} |
| T2-P1 | 0.3522 | 0.7739 | 1.4773 | {content, questionnaire, url} → {dataplus} |
| T2-P1 | 0.4119 | 0.7049 | 1.6980 | {content, questionnaire, subpage, url} → {datapane} |
| T2-P1 | 0.3859 | 0.7978 | 1.5795 | {homepage} → {forumng} |
| T2-P1 | 0.4619 | 0.7953 | 2.0856 | {content, questionnaire, subpage, url} → {homepage} |
| T2-P1 | 0.4207 | 0.8682 | 1.7532 | {page, questionnaire, quiz, resource, subpage, url} → {content} |
| T2-P1 | 0.3361 | 0.7452 | 1.6985 | {questionnaire, subpage, url} → {page} |
| T2-P1 | 0.4747 | 0.7210 | 1.4747 | {subpage, url} → {questionnaire} |
| T2-P1 | 0.5151 | 0.8386 | 2.0553 | {resource, url} → {subpage} |
| T2-P1 | 0.3361 | 0.5548 | 1.2858 | {content, subpage, quiz} → {final_result} |
| T2-P3 | 0.2023 | 0.7734 | 1.6246 | {content, questionnaire, url, subpage} → {dataplus} |
| T2-P3 | 0.1285 | 0.8243 | 1.6447 | {homepage, subpage} → {forumng} |
| T2-P3 | 0.2704 | 0.8609 | 1.7790 | {content, wiki, questionnaire, subpage, url} → {homepage} |
| T2-P3 | 0.3022 | 0.8631 | 1.7404 | {wiki, questionnaire, subpage, url} → {content} |
| T2-P3 | 0.3253 | 0.8098 | 1.5003 | {url} → {subpage} |
| T2-P3 | 0.2521 | 0.5250 | 1.5236 | {folder, content, quiz, subpage} → {final_result} |
| L2-P4 | 0.1781 | 0.8473 | 1.4871 | {homepage} → {forumng} |
| L2-P4 | 0.3934 | 0.5563 | 1.0403 | {subpage} → {final_result} |
| T2-P2 | 0.1989 | 0.7732 | 1.6777 | {questionnaire, dualpane, content, page, url} → {dataplus} |
| T2-P2 | 0.2486 | 0.7531 | 1.6984 | {content, page, questionnaire, subpage, url} → {dualpane} |
| T2-P2 | 0.2019 | 0.8072 | 1.5534 | {homepage} → {forumng} |
| T2-P2 | 0.3947 | 0.7515 | 1.7227 | {content, wiki, subpage, url} → {homepage} |
| T2-P2 | 0.3025 | 0.8761 | 1.7971 | {wiki, page, questionnaire, quiz, subpage, url} → {content} |
| T2-P2 | 0.3342 | 0.7527 | 1.6687 | {questionnaire, subpage, url} → {page} |
| T2-P2 | 0.2624 | 0.7085 | 1.2884 | {subpage, url} → {quiz} |
| T2-P2 | 0.3128 | 0.8552 | 1.6777 | {url} → {subpage} |
| T2-P2 | 0.3760 | 0.5452 | 1.4449 | {folder, content, quiz, subpage} → {final_result} |
| T2-P4 | 0.2763 | 0.8002 | 1.7168 | {dualpane, content, wiki, page, questionnaire, subpage, url} → {dataplus} |
| T2-P4 | 0.2262 | 0.7934 | 1.7061 | {content, subpage, url} → {homepage} |
| T2-P4 | 0.2756 | 0.8858 | 1.7399 | {wiki, page, questionnaire, subpage, url} → {content} |
| T2-P4 | 0.3966 | 0.7606 | 1.7510 | {questionnaire, subpage, url} → {page} |
| T2-P4 | 0.3149 | 0.8222 | 1.9056 | {url} → {subpage} |

Table 9.

Association rules generated by probabilistic frequent itemsets.
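The three indicators reported in Table 9 can be sketched under their textbook definitions: support is the share of learners whose records contain both sides of a rule, confidence is that share relative to the antecedent, and lift is confidence divided by the support of the consequent. The sketch assumes deterministic learner-id sets rather than the probabilistic counts used by LB-Eclat, and the names `rule_metrics`, `interpret_lift` and the toy id sets are hypothetical.

```python
from typing import Set, Tuple


def rule_metrics(
    antecedent_ids: Set[int], consequent_ids: Set[int], n_learners: int
) -> Tuple[float, float, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    both = antecedent_ids & consequent_ids
    support = len(both) / n_learners
    confidence = len(both) / len(antecedent_ids)
    lift = confidence / (len(consequent_ids) / n_learners)
    return support, confidence, lift


def interpret_lift(lift: float, tolerance: float = 1e-9) -> str:
    """Map a lift value to the reading used in the text."""
    if abs(lift - 1.0) <= tolerance:
        return "independent"
    return "positive correlation" if lift > 1.0 else "negative correlation"


# Hypothetical toy data: ids of learners who used each side of a rule.
used_homepage = {1, 2, 3, 4, 5}
used_forumng = {1, 2, 3, 4, 6, 7}
support, confidence, lift = rule_metrics(used_homepage, used_forumng, n_learners=10)
reading = interpret_lift(lift)
```

If Table 9 follows the same definitions, the first row would imply a support for {forumng} in L1-P2 of roughly 0.8127 / 1.2832 ≈ 0.63; this is an inference from the formula, not a value reported in the chapter.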

On the whole, sparse data sets yield fewer association rules from their probabilistic frequent itemsets, and within the same density level Literature courses yield fewer rules than Technology courses [22]. For the moderate density and dense data sets of Technology courses, rules are formed among the components of learning behaviors, and some components produce rules with high credibility and strong relevance to the final assessment results.

Table 9 shows that some association rules recur across different data sets, which indicates that these rules are general. For Literature courses and Technology courses alike, the association rules show some similarities but also obvious differences. For the same course in different periods, the association rules of probabilistic frequent itemsets partly overlap and partly differ. The rules {content, questionnaire, subpage, url} → {homepage} and {resource, url} → {subpage} have the highest “Lift” values (above 2), indicating a very high association degree. Strong association rules tend to form around “questionnaire”, “quiz”, “forumng”, “homepage”, “resource”, “subpage”, “url” and so on, while “dataplus”, “dualpane”, “folder”, “wiki” and so on are strongly relevant in Technology courses. Some components have an obvious impact on the learning results. Extracting these association rules can greatly simplify the categories of components in Tables 1-4.
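The overlap between data sets can be checked mechanically. The following sketch groups rules by the data sets in which they occur; it uses only a hand-picked subset of the rules in Table 9, and the helper names `rule` and `common_rules` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], str]  # (antecedent components, consequent component)


def rule(antecedent: str, consequent: str) -> Rule:
    """Build a rule from a space-separated antecedent and one consequent."""
    return frozenset(antecedent.split()), consequent


def common_rules(
    rules_by_dataset: Dict[str, List[Rule]], min_datasets: int = 2
) -> Dict[Rule, List[str]]:
    """Return the rules that occur in at least min_datasets data sets."""
    seen: Dict[Rule, List[str]] = defaultdict(list)
    for dataset, rules in rules_by_dataset.items():
        for r in rules:
            seen[r].append(dataset)
    return {r: names for r, names in seen.items() if len(names) >= min_datasets}


# A subset of Table 9 (not all rules are transcribed here).
table9_subset = {
    "T1-P2": [rule("homepage", "forumng")],
    "T2-P1": [
        rule("homepage", "forumng"),
        rule("questionnaire subpage url", "page"),
        rule("resource url", "subpage"),
    ],
    "T2-P3": [rule("url", "subpage"), rule("homepage subpage", "forumng")],
    "L2-P4": [rule("homepage", "forumng"), rule("subpage", "final_result")],
    "T2-P2": [
        rule("homepage", "forumng"),
        rule("questionnaire subpage url", "page"),
        rule("url", "subpage"),
    ],
    "T2-P4": [rule("questionnaire subpage url", "page"), rule("url", "subpage")],
}
shared = common_rules(table9_subset)
```

On this subset, {homepage} → {forumng} is shared by both Literature and Technology data sets, whereas {url} → {subpage} and {questionnaire, subpage, url} → {page} recur only across periods of the second Technology course.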

The mining of probabilistic frequent itemsets and the learning of association rules support the evaluation and recommendation of components when learning behaviors are constructed [22, 23, 24]. On the one hand, the formation of learning behaviors can aggregate effective components according to these association rules; for the components that appear in the rules, elastic proximity relationships, timely guidance strategies and recommendation mechanisms can be built, which effectively guides the learning processes. On the other hand, according to the learning objectives, association rules of probabilistic frequent itemsets can be designed from historical data, which helps to analyze and predict feasible participation components.

Based on the data in Tables 6-9, the nodes and edges of the component interaction processes are constructed, and the key constituent units of the learning behavior data sets are generated with Gephi. Figure 11 shows the topological structure and relationship weights of the probabilistic frequent itemsets. It involves 14 participation components, and the weight of each relationship (edge) is calculated automatically. The thickness of a line indicates the strength of the relationship, and dotted lines represent potential relationships. The resulting key topology of learning behaviors, supported by probabilistic frequent itemsets, serves as a reference for data-driven learning behavior prediction and decision making.

Figure 11.

The key topology of learning behaviors based on probabilistic frequent itemsets and association rules.
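One way to prepare such a graph for Gephi is to turn the 2-itemsets into a weighted edge list. The sketch below is only an assumption about how this could be done: it weights an edge by the number of data sets in which the pair is frequent, which is not necessarily the weighting behind Figure 11, and it uses the L1-P2 and L1-P4 itemsets of Table 6 as sample input. The `Source`, `Target` and `Weight` column names follow the edge-table convention of Gephi's spreadsheet import; `edge_list_csv` is a hypothetical helper.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def edge_list_csv(itemsets_by_dataset: Dict[str, List[Pair]]) -> str:
    """Return CSV text with one undirected, weighted edge per component pair."""
    weights: Counter = Counter()
    for pairs in itemsets_by_dataset.values():
        # Sort each pair so that (a, b) and (b, a) count as the same edge,
        # and count a pair at most once per data set.
        for edge in {tuple(sorted(pair)) for pair in pairs}:
            weights[edge] += 1
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# 2-itemsets of the two sparse Literature data sets, taken from Table 6.
sample_itemsets = {
    "L1-P2": [
        ("forumng", "homepage"),
        ("forumng", "content"),
        ("content", "final_result"),
        ("forumng", "final_result"),
    ],
    "L1-P4": [("content", "subpage"), ("content", "final_result")],
}
edges_csv = edge_list_csv(sample_itemsets)
```

Writing `edges_csv` to a file gives a table that can be loaded as an edge list; with this weighting, only the pair {content, final_result} receives weight 2 in the sample.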

7. Decision-making scheme for improving learning behaviors

Studying learning behaviors through big data can help learners improve their learning processes and learning effects [25]. For the mining and association analysis of probabilistic frequent itemsets, we construct 11 data subsets of learning behaviors with components as the basic structural features. On the basis of the Eclat framework, the vertical data format is adopted to design and improve the data structure and analysis algorithm for learning behavior components. The indicator comparison with similar algorithms shows that the improved algorithm is effective and feasible for analyzing the data subsets, especially moderate density and dense data sets. “Support”, “Confidence” and “Lift” serve as the measurement indicators, with corresponding thresholds. The probabilistic frequent itemsets and association rules are mined, and the key topology of learning behaviors supported by the probabilistic frequent itemsets is constructed. The whole process of mining and analyzing probabilistic frequent itemsets is based on the vertical data format, which ensures the depth and breadth of the results for decision prediction.
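The role of the vertical data format for uncertain data can be sketched as follows. The example assumes the common expected-support model, in which each component has an independent existence probability per learner; the probability model actually used by LB-Eclat is defined earlier in the chapter and may differ. All names and numbers are hypothetical.

```python
from collections import defaultdict
from math import prod
from typing import Dict, Iterable

Horizontal = Dict[int, Dict[str, float]]  # learner id -> {component: probability}
Vertical = Dict[str, Dict[int, float]]    # component -> {learner id: probability}


def to_vertical(horizontal: Horizontal) -> Vertical:
    """Pivot learner-oriented records into component-oriented records."""
    vertical: Vertical = defaultdict(dict)
    for learner_id, components in horizontal.items():
        for component, probability in components.items():
            vertical[component][learner_id] = probability
    return dict(vertical)


def expected_support(vertical: Vertical, itemset: Iterable[str]) -> float:
    """Sum, over learners having every item, of the product of probabilities."""
    items = list(itemset)
    shared = set(vertical.get(items[0], {}))
    for item in items[1:]:
        shared &= set(vertical.get(item, {}))
    return sum(prod(vertical[item][tid] for item in items) for tid in shared)


# Hypothetical uncertain records for three learners.
toy_horizontal = {
    1: {"forumng": 0.9, "homepage": 0.8},
    2: {"forumng": 0.5, "homepage": 1.0, "url": 0.4},
    3: {"homepage": 0.6, "url": 0.5},
}
toy_vertical = to_vertical(toy_horizontal)
support_fh = expected_support(toy_vertical, ["forumng", "homepage"])
support_hu = expected_support(toy_vertical, ["homepage", "url"])
```

Because each component already carries the ids of its learners, the support of any candidate itemset is obtained by intersecting id sets instead of rescanning all learner records, which is the main reason the vertical format suits Eclat-style mining.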

The research of learning behaviors is a specific branch of big data, and its data characteristics differ from those of other data types. Because learning behaviors are periodic, continuous, collective and individual, the generated data may deviate from the expected data with considerable instability and discreteness. This makes data analysis and decision making very difficult, so appropriate data structures and algorithms [26, 27] must be designed to carry out multi-dimensional empirical studies of learning behaviors. From the work and results of the probabilistic frequent itemset analysis, the following decision schemes are obtained.

7.1 Learning content will affect the frequent itemsets of learning behaviors

Learning content determines learners’ tendencies. The learning behavior data covers two Literature courses and two Technology courses, each with multiple learning periods. On the whole, the learning process of Technology courses is more complicated, the learning behavior components are more diverse, and the online learning process is described more completely and comprehensively, which produces larger data sets. Learning content affects the data density, the components and the actual learning processes of learners, and thus determines the frequent itemset mining results. For example, the probabilistic frequent itemsets of the two learning periods of course L1 show that the online learning processes for this content have no advantage: there is no effective correlation between the components and the learning assessment results, and the benefits of the online learning mode are not obvious, so this content may be better suited to the teaching mode than to the online learning mode.

Therefore, the construction of learning behaviors depends on the learning content. Based on the frequent itemsets mined from historical data and the analysis of association rules, the learning mode of a course can be optimized in the new learning period. Based on the learning content, the components of learning behaviors can be guided or expanded to enhance learners’ interest.

7.2 Teaching goals will affect the frequent itemsets of learning behaviors

The same learning content can produce different learning behavior data densities in different learning periods, and therefore different frequent itemsets. Across learning periods, the frequent itemsets and association rules obtained by the algorithm are similar but also show obvious differences: the components are not the same, and some data sets differ considerably. On the one hand, learners in different periods have different teaching needs, which correspond to different teaching objectives. On the other hand, participation and traction in the learning process lead to different participation components, and the stickiness of the components differs, which determines the frequent itemsets and thus produces different association rules; it even affects learners’ assessment methods and learning results.

Therefore, the construction of learning behaviors should consider the learning periods and the actual learners, flexibly construct teaching objectives, and design adaptive learning behavior components. During the learning processes, we should also analyze the learning behaviors in a timely manner, mine the existing problems and learners’ preferences, adjust the components in time, and optimize the learning methods appropriately. We should build a real-time and effective data tracking and analysis mechanism.

7.3 The frequent itemsets of learning behavior have the characteristics of explicit and implicit association

The interaction modes of learning behavior components differ across platforms, but the demand they serve is the same: to realize the continuity of learning behaviors and achieve the learning effects through the interaction of components. The mining of probabilistic frequent itemsets and the analysis of association rules show that the components within a frequent itemset have explicit association features, and that different frequent itemsets may also have implicit association features. These features have strong recommendation value for the prediction and feedback of latent learning behaviors. The key topological relationships of learning behaviors shown in Figure 11 can provide references for the follow-up learning processes of similar or identical courses and help expand learning methods.

Therefore, the construction of learning behaviors should consider not only the learning content and teaching objectives but also historically effective learning behaviors. It also requires effective reform of the learning process and changes of learning strategy based on data analysis, so as to gradually help learners develop effective learning habits and methods and to construct new learning behavior components. According to the learning situation, stage learning feedback, potential behavior recommendation and implicit interest mining can be provided to improve the learning quality.
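Potential behavior recommendation from association rules can be sketched as a simple rule-firing step: a rule applies when all antecedent components have been observed for a learner, and its consequent is suggested if it has not been used yet. The sketch uses the T2-P4 rules of Table 9; ranking by “Lift” and the helper name `recommend` are assumptions, not a mechanism described in the chapter.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# (antecedent, consequent, confidence, lift), from the T2-P4 rows of Table 9.
RuleRow = Tuple[FrozenSet[str], str, float, float]

T2_P4_RULES: List[RuleRow] = [
    (
        frozenset(
            {"dualpane", "content", "wiki", "page", "questionnaire", "subpage", "url"}
        ),
        "dataplus",
        0.8002,
        1.7168,
    ),
    (frozenset({"content", "subpage", "url"}), "homepage", 0.7934, 1.7061),
    (
        frozenset({"wiki", "page", "questionnaire", "subpage", "url"}),
        "content",
        0.8858,
        1.7399,
    ),
    (frozenset({"questionnaire", "subpage", "url"}), "page", 0.7606, 1.7510),
    (frozenset({"url"}), "subpage", 0.8222, 1.9056),
]


def recommend(observed: Set[str], rules: List[RuleRow]) -> List[Tuple[str, float]]:
    """Suggest unused components whose rule antecedent is fully observed."""
    best: Dict[str, float] = {}
    for antecedent, consequent, _confidence, lift in rules:
        if antecedent <= observed and consequent not in observed:
            best[consequent] = max(lift, best.get(consequent, 0.0))
    # Highest lift first (an assumed ranking criterion).
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)


suggestions = recommend({"content", "questionnaire", "subpage", "url"}, T2_P4_RULES)
```

For a learner who has used “content”, “questionnaire”, “subpage” and “url”, the sketch suggests “page” first and “homepage” second; the rule {url} → {subpage} fires as well but adds nothing because “subpage” is already in use.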

7.4 Learning behavior needs the adaptive support of specific algorithm and data structure

The generation of learning behaviors is a multi-dimensional process. How thoroughly these data are studied determines how well learning behaviors are understood. There are different perspectives on the composition of learning behaviors, which lead to different research methods. How to model and process learning behaviors sufficiently is a challenge for learning analytics. Some existing software tools and analysis methods cannot guarantee appropriate quantification, standardization and initialization, so the analysis process and experimental conclusions may not be thorough and comprehensive. Compared with statistics and tests of learning behaviors carried out by sampling, an effective and comprehensive analysis of learning behaviors is more convincing.

Therefore, the empirical analysis of learning behaviors should be a comprehensive application of data-driven technologies and methods. The technical requirements should be derived from the data characteristics, and structures and algorithms suited to the data attributes and process characteristics should be designed. This area offers large research space and prospects in educational big data, and it poses challenges for researchers. Learning analytics of educational big data is essentially data analysis and a comprehensive application of computer science and technology, statistics, engineering, etc., and the design and development of general tools still need time [28]. For a specific data set, it is feasible and more realistic to design suitable data structures and algorithms for decision making.

8. Conclusion

The learning analysis of learning behaviors is a complex process. The data structure, attribute characteristics and relationship categories add further difficulty. Moreover, the data is highly uncertain and unstable, so technical unity and generality are difficult to achieve [29]. The development of the online learning model gives new definitions and norms to learning behaviors, and also requires new data structures, attribute characteristics, relationship categories, etc. Many technologies and methods that can be used in the research of learning behaviors may be inefficient for new data, or ineffective for new research branches. This research concerns the design and application of intelligent data mining technology on a big data set of learning behaviors. Based on the Eclat framework, the data structure and algorithms are improved; starting from the vertical data format, probabilistic frequent itemsets are mined, association rules are analyzed, and data-driven decision making is realized. In subsequent research on learning behaviors, we will continue in-depth research and demonstration of methods and technologies for uncertain data, improve the quality of data analysis and relationship perspective, and provide more valuable conclusions for decision making and prediction feedback of learning behaviors.

Compliance with ethical standards

The authors certify that there is no conflict of interest with any individual/organization for the present work.

A list of acronyms

FP: Frequent Pattern
PFC: Frequent Pattern Calculation
LB-Eclat: Learning Behavior Eclat
Descending Eclat: Descending “Support”

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Xiaona Xia (April 24th 2021). Improved Probabilistic Frequent Itemset Analysis Strategy of Learning Behaviors Based on Eclat Framework [Online First], IntechOpen, DOI: 10.5772/intechopen.97219. Available from:
