Abstract
Data mining is a knowledge discovery process. It has to deal with both exact and inexact information. Statistical methods deal with inexact information, but they are based on likelihood. Zadeh's fuzzy logic also deals with inexact information, but it is based on belief and is simple to use. Data mining consists of methods and classifications, which are discussed here for both exact and inexact information. Retrieval of information is important in data mining, and the time and space complexity of retrieval is high for big data, so both have to be reduced. The time complexity is reduced through the consecutive retrieval (C-R) property, and the space complexity is reduced with blackboard systems. Data mining for web data is discussed. In web data mining, the original data has to be disclosed; fuzzy web data mining is therefore discussed for security of data, along with fuzzy web programming. Data mining, fuzzy data mining, and web data mining are discussed through MapReduce algorithms.
Keywords
- data mining
- fuzzy logic
- fuzzy data mining
- web data mining
- fuzzy MapReduce algorithms
1. Introduction
Data mining is an emerging area for knowledge discovery to extract hidden and useful information from large amounts of data. Data mining methods like association rules, clustering, and classification use advanced algorithms such as decision trees and k-means for different purposes and goals. The research fields of data mining include machine learning, deep learning, and sentiment analysis. Information has to be retrieved within a reasonable time period for big data analysis. This may be achieved through the consecutive retrieval (C-R) of datasets for queries. The C-R property was first introduced by Ghosh [1]. After that, the C-R property was extended to statistical databases. The C-R cluster property presorts the datasets stored for clusters. In this chapter, the C-R property is extended to cluster analysis, and MapReduce algorithms are studied for cluster analysis. The time and space complexity can be reduced through the consecutive retrieval (C-R) cluster property. Security of the data is one of the major issues for data analytics and data science when the original data is not to be disclosed.
Web programming has to handle incomplete information. Web intelligence is an emerging area that performs data mining to handle incomplete information. The incomplete information is fuzzy rather than probabilistic. In this chapter, fuzzy web programming is discussed to deal with data mining using fuzzy logic. The fuzzy algorithmic language, called FUZZYALGOL, is discussed to design queries in data mining. Some examples are discussed for web programming with fuzzy data mining.
2. Data mining
Data mining [2, 3, 4, 5] is basically performed as a knowledge discovery process. Some of the well-known data mining methods are frequent itemset mining, association rule mining, and clustering. Data warehousing is the representation of a relational dataset in two or more dimensions. It is possible to reduce the space complexity of data mining with consecutive storage of data warehouses.
The relational dataset is a representation of data with attributes and tuples.
Definition: A relational dataset (or cluster dataset) R is defined as a collection of attributes A1, A2, …, Am and tuples t1, t2, …, tn, and is represented as
R = A1 × A2 × … × Am
ti = (ai1, ai2, …, aim), i = 1, 2, …, n, are the tuples,
or equivalently
R(A1, A2, …, Am), where R is a relation, and
R(ti) = (ai1, ai2, …, aim), i = 1, 2, …, n, are the tuples.
For instance, two sample datasets “price” and “sales” are given in Tables 1 and 2, respectively.
The lossless join of the datasets “price” and “sales” is given in Table 3.
In the following, some of the methods (frequency, association rule, and clustering) are discussed.
Consider the “purchase” relational dataset given in Table 4.
2.1 Frequency
Frequency counts repeatedly occurring data.
Consider the following query:
Find the customers who frequently purchase more than one item.
SELECT P.CNo, P.INo, COUNT(*)
FROM purchase P
GROUP BY P.CNo, P.INo
HAVING COUNT(*) > 1;
The output of this query is given in Table 5.
INo | IName | Price |
---|---|---|
I005 | Shirt | 100 |
I007 | Dress | 50 |
I004 | Pants | 80 |
I008 | Jacket | 60 |
I009 | Skirt | 100 |
Table 1.
Sample dataset “price.”
INo | IName | Sales |
---|---|---|
I005 | Shirt | 80 |
I007 | Dress | 60 |
I004 | Pants | 100 |
I008 | Jacket | 50 |
I009 | Skirt | 80 |
Table 2.
Sample dataset “sales.”
INo | IName | Sales | Price |
---|---|---|---|
I005 | Shirt | 80 | 100 |
I007 | Dress | 60 | 50 |
I004 | Pants | 100 | 80 |
I008 | Jacket | 50 | 60 |
I009 | Skirt | 80 | 100 |
Table 3.
Lossless join of the price and sales datasets.
CNo | INo | IName | Price |
---|---|---|---|
C001 | I005 | shirt | 100 |
C001 | I007 | Dress | 50 |
C003 | I004 | pants | 80 |
C002 | I007 | dress | 80 |
C001 | I008 | Jacket | 60 |
C002 | I005 | shirt | 100 |
Table 4.
Sample dataset “purchase.”
CNo | INo | COUNT |
---|---|---|
C001 | I005 | 2 |
C002 | I005 | 2 |
Table 5.
Frequency.
2.2 Association rule
An association rule expresses a relationship among the data.
Consider the following query:
Find the customers who purchase shirt and dress.
<shirt⇔ dress>
SELECT P1.CNo, P1.INo
FROM purchase P1, purchase P2
WHERE P1.CNo = P2.CNo
AND P1.IName = 'shirt' AND P2.IName = 'dress';
The output of this query is given in Table 6.
CNo | INo |
---|---|
C001 | I005 |
C002 | I005 |
Table 6.
Association.
2.3 Clustering
Clustering groups data with similar properties.
Consider the following query:
Group the customers who purchase dress and shirt.
The output of this query is given in Table 7.
CNo | INo | IName | Price |
---|---|---|---|
C001 | I007 | Dress | 50 |
| I005 | Shirt | 100 |
C002 | I007 | Dress | 80 |
| I005 | Shirt | 100 |
Table 7.
Clustering.
3. Data mining using C-R cluster property
The C-R (consecutive retrieval) property [1, 3] is the retrieval of the records of a database consecutively. Suppose R = {r1, r2, …, rn} is the dataset of records and C = {C1, C2, …, Cm} is the set of clusters.
The best type of file organization on a linear storage is one in which the records pertaining to the clusters are stored in consecutive locations without redundantly storing any data of R.
If there exists such an organization of R, then C is said to have the consecutive retrieval property, or C-R cluster property, with respect to dataset R. The C-R cluster property is thus applicable to linear storage.
The C-R cluster property is a binary relation between a cluster set and dataset.
Suppose a cluster in cluster set C is relevant to the data in dataset R; the relevancy is denoted by 1 and the irrelevancy by 0. Thus, the relevancy between cluster set C and dataset R can be represented as an (n × m) matrix, as shown in Table 8. The matrix is called the dataset-cluster incidence matrix (CIM).
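As an illustration, a minimal Python sketch of building a CIM and testing the consecutive 1's condition of a cluster column is given below; the helper names incidence_matrix and has_cr_property are illustrative, not from the chapter.

def incidence_matrix(records, clusters):
    # records: list of record values; clusters: list of predicates over a record
    return [[1 if c(r) else 0 for c in clusters] for r in records]

def has_cr_property(column):
    # a cluster has the C-R property if its 1's occupy consecutive rows
    ones = [i for i, v in enumerate(column) if v == 1]
    return not ones or ones == list(range(ones[0], ones[0] + len(ones)))

sales = [150, 30, 100, 50, 75, 120, 40]             # sales of r1..r7 (Table 9)
clusters = [lambda s: s >= 100, lambda s: s < 100]  # C1 and C2
cim = incidence_matrix(sorted(sales, reverse=True), clusters)
print([has_cr_property([row[j] for row in cim]) for j in range(2)])
# [True, True]: after sorting, both C1 and C2 have the C-R cluster property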
Consider the dataset for customer account given in Table 9.
The dataset given in Table 9 is reorganized in ascending order based on sorting, as shown in Table 10.
Consider the following clusters of queries:
C1 = Find the customers whose sales are greater than or equal to 100.
C2 = Find the customers whose sales are less than 100.
C3 = Find the customers whose sales are greater than or equal to the average sales.
C4 = Find the customers whose sales are less than the average sales.
The CIM is given in Table 11.
The dataset given in Table 11 is reorganized with sort on C1 in descending order, as shown in Table 12. Thus, C1 has C-R cluster property.
The dataset given in Table 11 is reorganized with sort on C2 in descending order, as shown in Table 13. Thus, C2 has C-R cluster property.
The dataset given in Table 11 is reorganized with sort on C3 in descending order, as shown in Table 14. Thus, C3 has C-R cluster property.
The dataset given in Table 11 is reorganized with sort on C4 in descending order, as shown in Table 15. Thus, C4 has a C-R cluster property.
The dataset for C1 ⋈ C2 has the C-R cluster property (Table 16).
The dataset for C3 ⋈ C4 has the C-R cluster property (Table 17).
The dataset for C1 ⋈ C3 has the C-R cluster property (Table 18).
The dataset for C2 ⋈ C4 has the C-R cluster property (Table 19).
The dataset for C2 ⋈ C3 has the C-R cluster property (Table 20).
The cluster set {C1 ⋈ C2, C3 ⋈ C4, C1 ⋈ C3, C2 ⋈ C4, C2 ⋈ C3} has the C-R cluster property. Thus, the cluster sets have the C-R cluster property with respect to dataset R.
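Reading ⋈ on incidence columns as an elementwise OR, an assumption consistent with Tables 16, 19, and 20, the combined column can be sketched in Python as:

def combine(col_a, col_b):
    # elementwise OR of two 0/1 incidence columns
    return [a | b for a, b in zip(col_a, col_b)]

C1 = [1, 1, 1, 0, 0, 0, 0]  # sorted row order of Table 12
C2 = [0, 0, 0, 1, 1, 1, 1]  # Table 13
print(combine(C1, C2))       # [1, 1, 1, 1, 1, 1, 1], as in Table 16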
3.1 Design of parallel C-R cluster property
The design of parallel clusters shall be studied through the C-R cluster property. It can be studied in two ways: parallel cluster design through a graph-theoretical approach and parallel cluster design through a response vector approach.
The C-R cluster property between cluster set C and dataset R can be stated in terms of the properties of vectors. The data cluster incidences of a cluster set C with the C-R cluster property may be represented as a response vector set V. For instance, the cluster set {C1, C2, C3, C4} has the response vector set {V1 = (1,1,1,0,0,0,0), V2 = (0,0,0,1,1,1,1), V3 = (1,1,1,1,0,0,0), V4 = (0,0,0,0,1,1,1)} (Tables 21–23).
R | C1 | C2 | … | Cm |
---|---|---|---|---|
r1 | 1 | 0 | … | 1 |
r2 | 0 | 1 | … | 0 |
… | … | … | … | … |
rn | 1 | 1 | … | 1 |
Table 8.
Incidence matrix.
R | CNo | IName | Sales |
---|---|---|---|
r1 | 70001 | Shirt | 150 |
r2 | 70002 | Dress | 30 |
r3 | 70003 | Pants | 100 |
r4 | 60001 | Dress | 50 |
r5 | 60002 | Jacket | 75 |
r6 | 60003 | Shirt | 120 |
r7 | 60004 | Dress | 40 |
Table 9.
Storage of sales.
R | CNo | IName | Sales |
---|---|---|---|
r1 | 70001 | Shirt | 150 |
r6 | 60003 | Shirt | 120 |
r3 | 70003 | Pants | 100 |
r5 | 60002 | Jacket | 75 |
r4 | 60001 | Dress | 50 |
r7 | 60004 | Dress | 40 |
r2 | 70002 | Dress | 30 |
Table 10.
Reorganizing for C-R cluster.
R | C1 | C2 | C3 | C4 |
---|---|---|---|---|
r1 | 1 | 0 | 1 | 0 |
r2 | 0 | 1 | 0 | 1 |
r3 | 1 | 0 | 1 | 0 |
r4 | 0 | 1 | 0 | 1 |
r5 | 0 | 1 | 1 | 0 |
r6 | 1 | 0 | 1 | 0 |
r7 | 0 | 1 | 0 | 1 |
Table 11.
Cluster incidence matrix.
R | C1 |
---|---|
r1 | 1 |
r3 | 1 |
r6 | 1 |
r2 | 0 |
r4 | 0 |
r5 | 0 |
r7 | 0 |
Table 12.
Sorting on C1.
R | C2 |
---|---|
r1 | 0 |
r3 | 0 |
r6 | 0 |
r2 | 1 |
r4 | 1 |
r5 | 1 |
r7 | 1 |
Table 13.
Sorting on C2.
R | C3 |
---|---|
r1 | 1 |
r3 | 1 |
r5 | 1 |
r6 | 1 |
r2 | 0 |
r4 | 0 |
r7 | 0 |
Table 14.
Sorting on C3.
R | C4 |
---|---|
r1 | 0 |
r3 | 0 |
r5 | 0 |
r6 | 0 |
r2 | 1 |
r4 | 1 |
r7 | 1 |
Table 15.
Sorting on C4.
R | C1 ⋈ C2 |
---|---|
r1 | 1 |
r3 | 1 |
r6 | 1 |
r2 | 1 |
r4 | 1 |
r5 | 1 |
r7 | 1 |
Table 16.
C1⋈C2.
R | C3 ⋈C4 |
---|---|
r1 | 1 |
r3 | 1 |
r5 | 1 |
r6 | 1 |
r2 | 1 |
r4 | 1 |
r7 | 1 |
Table 17.
C3⋈C4.
R | C1 ⋈C3 |
---|---|
r1 | 1 |
r3 | 1 |
r6 | 1 |
r2 | 1 |
r4 | 0 |
r5 | 0 |
r7 | 0 |
Table 18.
C1⋈C3.
R | C2 ⋈C4 |
---|---|
r1 | 0 |
r3 | 0 |
r6 | 0 |
r2 | 1 |
r4 | 1 |
r5 | 1 |
r7 | 1 |
Table 19.
C2⋈C4.
R | C2 ⋈ C3 |
---|---|
r1 | 1 |
r3 | 1 |
r6 | 1 |
r2 | 1 |
r4 | 1 |
r5 | 1 |
r7 | 1 |
Table 20.
C2⋈C3.
R | C1 | C2 |
---|---|---|
r1 | 1 | 0 |
r3 | 1 | 0 |
r6 | 1 | 0 |
r2 | 0 | 1 |
r4 | 0 | 1 |
r5 | 0 | 1 |
r7 | 0 | 1 |
Table 21.
{C1, C2}.
R | C3 | C4 |
---|---|---|
r1 | 1 | 0 |
r3 | 1 | 0 |
r6 | 1 | 0 |
r2 | 1 | 0 |
r4 | 0 | 1 |
r5 | 0 | 1 |
r7 | 0 | 1 |
Table 22.
{C3, C4}.
R | C2 | C3 |
---|---|---|
r1 | 0 | 1 |
r3 | 0 | 1 |
r6 | 0 | 1 |
r2 | 1 | 1 |
r4 | 1 | 0 |
r5 | 1 | 0 |
r7 | 1 | 0 |
Table 23.
{C2, C3}.
For instance, the response vector of the cluster C1 is given by column vector (1,1,1,0,0,0,0).
Suppose Ci and Cj are two clusters with response vectors Vi and Vj. If the intersection Vi ∩ Vj = Φ, then the cluster set {Ci, Cj} has the parallel cluster property. Consider the vectors V1 and V2 of C1 and C2. The intersection V1 ∩ V2 = Φ, so the cluster set {C1, C2} has the parallel cluster property. Similarly, the cluster set {C3, C4} has the parallel cluster property. The cluster set {C2, C3} does not have the parallel cluster property because V2 ∩ V3 ≠ Φ: r2 depends on both C2 and C3.
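A minimal Python sketch of this test over response vectors follows; disjointness of the 0/1 vectors is checked positionwise.

def parallel(v_i, v_j):
    # True when no record is relevant to both clusters, i.e., Vi ∩ Vj = Φ
    return all(not (a and b) for a, b in zip(v_i, v_j))

V1 = (1, 1, 1, 0, 0, 0, 0)  # C1
V2 = (0, 0, 0, 1, 1, 1, 1)  # C2
V3 = (1, 1, 1, 1, 0, 0, 0)  # C3
print(parallel(V1, V2))      # True: {C1, C2} has the parallel cluster property
print(parallel(V2, V3))      # False: r2 is relevant to both C2 and C3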
3.2 Visual design for parallel cluster
The C-R cluster property can also be studied with a graphical approach, which can be used for designing parallel cluster processing (PCP).
Suppose Vi is a vertex corresponding to a row of the cluster incidence matrix of C. The graph G(C) is defined by the vertices Vi, i = 1, 2, …, n, and two consecutive vertices have an edge Eij associated with the interval Ii = {Vi, Vi+1}, i = 1, …, n−1.
If G(C) has C-R cluster property, the vertices of G(C) have consecutive 1’s or 0’s.
Consider the cluster set {C1, C2, C3, C4}. G(C1) has the vertices (1,1,1,0,0,0,0), G(C2) has the vertices (0,0,0,1,1,1,1), G(C3) has the vertices (1,1,1,1,0,0,0), and G(C4) has the vertices (0,0,0,0,1,1,1).
The parallel cluster property exists if G(Ci) ∩G(Cj)=Ф.
For instance, consider G(C1) and G(C2). G(C1) ∩ G(C2) = Φ, so the cluster set {C1, C2} has the parallel cluster property. The graphical representation is shown in Figure 1.

Figure 1.
{C1, C2}.
Similarly, the cluster set {C3, C4} has the parallel cluster property (PCP). The cluster set {C2, C3} has no PCP because G(C2) ∩ G(C3) ≠ Φ.
The graphs with G(C1) ∩ G(C2) = Φ have the consecutive cluster property.
The graphs with G(C3) ∩ G(C4) = Φ have the consecutive cluster property. The graphical representation is shown in Figure 2.

Figure 2.
{C3, C4}.
The graph with G(C2) ∩ G(C3) ≠ Φ does not have the consecutive cluster property. The graphical representation is shown in Figure 3.

Figure 3.
{C2, C3}.
3.3 Parallel cluster design through genetic approach
Genetic algorithms (GAs) are based on Darwinian evolution [6]. GAs are used to learn and to optimize problems [7]. There are four evolution processes:
Selection
Reproduction
Mutation
Competition
Consider the following crossover with two cuts:
Parent #1 00001111
Parent #2 11110000
Parents #1 and #2 match with crossover.
The C-R cluster property can be studied through a genetic approach. This study will help in designing parallel cluster processing (PCP).
Definition: The gene G of a cluster G(C) is defined as its incidence sequence.
Suppose G(C1) is the parent and G(C2) the child genome of the cluster incidences for C1 and C2.
Suppose G(C1) is (1,1,1,0,0,0,0) and G(C2) is (0,0,0,1,1,1,1).
The parallel cluster property may be designed using genetic approach with the C-R cluster property.
Suppose C is a cluster set, R is a dataset, and G(C) is a genetic set.
The parallel cluster property exists if G(Ci) and G(Cj) match with crossover.
For instance,
G(C1) = 11110000
G(C2) = 00001111
G(C1) and G(C2) match with the crossover.
The cluster set {C1, C2} has parallel cluster property.
Similarly, the cluster set {C3, C4} has the parallel cluster property. The cluster set {C2, C3} has no PCP because G(C2) and G(C3) do not match with crossover.
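Under the reading that two genomes match with crossover when they are positionwise complementary, which is an assumption made here, the test may be sketched in Python as:

def matches_with_crossover(g1, g2):
    # every position carries a 1 in exactly one parent, so a crossover at the
    # cut reproduces both parents
    return all(a != b for a, b in zip(g1, g2))

print(matches_with_crossover("11110000", "00001111"))  # True: {C1, C2}
print(matches_with_crossover("0001111", "1111000"))    # False: C2 and C3 overlap at r2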
3.4 Parallel cluster design cluster analysis
Clustering groups data according to their properties; sample clusters C1 and C2 are given in Tables 24 and 25, respectively.
R | C1 |
---|---|
r1 | 1 |
r3 | 1 |
r6 | 1 |
Table 24.
Cluster C1.
R | C2 |
---|---|
r2 | 1 |
r4 | 1 |
r5 | 1 |
r7 | 1 |
Table 25.
Cluster C2.
Thus, C1 and C2 have the consecutive parallel cluster property (Tables 26 and 27).
R | C3 |
---|---|
r1 | 1 |
r3 | 1 |
r5 | 1 |
r6 | 1 |
Table 26.
Cluster C3.
R | C4 |
---|---|
r2 | 1 |
r4 | 1 |
r7 | 1 |
Table 27.
Cluster C4.
Thus, C3 and C4 have the consecutive parallel cluster property. C2 and C3 do not have the consecutive parallel cluster property because r2 is common to both.
4. Design of retrieval of cluster using blackboard system
Retrieval of clusters from a blackboard system [8] is the direct retrieval of data sources. When a query is processed, the entire database normally has to be brought into main memory, but in a blackboard architecture, the data item is sourced directly from the blackboard structure. For the retrieval of information for a query, the data item is retrieved directly from the blackboard, which contains the data item sources. A hash function may be used to store the data item set in the blackboard.
The blackboard system may be constructed with a data structure for the data item sources.
Consider the account (AC-No, AC-Name, AC-Balance).
Here AC-No is the key of the dataset.
Each data item is a data source, which is mapped by h(x).
These data items are stored in the blackboard structure.
When a transaction is processed, there is no need to take the entire database into main memory. It is sufficient to retrieve the particular data item of the particular transaction from the blackboard system (Figure 4).

Figure 4.
Blackboard system.
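A minimal sketch of such a blackboard in Python, with a dictionary standing in for the blackboard structure (the hash function h and the field names are illustrative), is:

blackboard = {}

def h(key):
    return hash(key) % 1024          # illustrative hash function

def post(record):                    # store a data item source on the blackboard
    blackboard[h(record["AC-No"])] = record

def retrieve(ac_no):                 # direct retrieval for one transaction
    return blackboard.get(h(ac_no))

post({"AC-No": "A100", "AC-Name": "Smith", "AC-Balance": 500})
print(retrieve("A100")["AC-Balance"])  # 500, fetched without scanning the database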
The advantage of the blackboard architecture is that it is highly secure for blockchain transactions. Blockchain technology has no third-party interference.
5. Fuzzy data mining
Sometimes, data mining is unable to deal with incomplete databases and unable to combine the data and reasoning. Fuzzy data mining [6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] combines the data and reasoning by defining them with fuzziness. The fuzzy MapReduce algorithms have two functions: mapping reads the fuzzy datasets, and reducing writes the results after the operations.
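A minimal Python sketch of the two phases is given below, with the reduce step writing the fuzzy union (maximum membership) per key; the function names are illustrative.

from collections import defaultdict

def map_phase(fuzzy_tuples):
    # map reads fuzzy tuples as (key, membership) pairs
    for key, mu in fuzzy_tuples:
        yield key, mu

def reduce_phase(pairs):
    # reduce writes the fuzzy union (max membership) for each key
    groups = defaultdict(list)
    for key, mu in pairs:
        groups[key].append(mu)
    return {key: max(mus) for key, mus in groups.items()}

sales = [("I005", 0.8), ("I007", 0.5), ("I005", 0.7)]
print(reduce_phase(map_phase(sales)))  # {'I005': 0.8, 'I007': 0.5}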
Definition: Given some universe of discourse X, a fuzzy set is defined as a pair {t, μd(t)}, where t is a tuple, d is a domain, and the membership function μd(t) takes values in the unit interval [0, 1], i.e., μd(t) → [0, 1], where t ∈ X (Table 28).
R1 | d1 | d2 | … | dm | μ |
---|---|---|---|---|---|
t1 | a11 | a12 | … | a1m | μd(t1) |
t2 | a21 | a22 | … | a2m | μd(t2) |
… | … | … | … | … | … |
tn | an1 | an2 | … | anm | μd(tn) |
Table 28.
Fuzzy dataset.
The sales are defined with fuzziness (Tables 29–32).
CNo | INo | IName | Demand |
---|---|---|---|
C001 | I005 | shirt | 0.9 |
C001 | I007 | Dress | 0.65 |
C003 | I004 | pants | 0.85 |
C002 | I007 | dress | 0.6 |
C001 | I008 | Jacket | 0.65 |
C002 | I005 | shirt | 0.9 |
Table 29.
Fuzzy demand.
CNo | INo | IName | Negation of price |
---|---|---|---|
C001 | I005 | shirt | 0.3 |
C001 | I007 | Dress | 0.5 |
C003 | I004 | pants | 0.4 |
C002 | I007 | dress | 0.5 |
C001 | I008 | Jacket | 0.4 |
C002 | I005 | shirt | 0.3 |
Table 30.
Negation of price.
CNo | INo | IName | Sales U price |
---|---|---|---|
C001 | I005 | Shirt | 0.8 |
C001 | I007 | Dress | 0.5 |
C003 | I004 | Pants | 0.6 |
C002 | I007 | Dress | 0.5 |
C001 | I008 | Jacket | 0.6 |
C002 | I005 | Shirt | 0.7 |
Table 31.
Sales U price.
INo | IName | Sales |
---|---|---|
I005 | Shirt | 0.8 |
I007 | Dress | 0.5 |
I004 | Pants | 0.6 |
I007 | Dress | 0.5 |
I008 | Jacket | 0.6 |
Table 32.
Items-sales.
μDemand(x) = 0.9/90 + 0.85/80 + 0.8/75 + 0.65/70
or the fuzziness may be defined with the membership function
μDemand(x) = (1 + (100 − Demand)/100)^−1 for Demand ≤ 100
μDemand(x) = 1 for Demand > 100
Negation (Table 30): μ¬A(x) = 1 − μA(x)
Union (Table 31): μA∪B(x) = max{μA(x), μB(x)}
For instance, the union for I005 = max{0.8, 0.7} = 0.8.
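These operations may be sketched in Python as follows; the form of mu_demand is an assumption, chosen so that sample grades such as 0.8/75 and 0.9/90 are reproduced.

def negation(mu):              # 1 − μ(x)
    return 1 - mu

def union(mu_a, mu_b):         # max{μA(x), μB(x)}
    return max(mu_a, mu_b)

def mu_demand(demand):         # assumed piecewise membership function from above
    return 1.0 if demand > 100 else 1 / (1 + (100 - demand) / 100)

print(union(0.8, 0.7))             # 0.8, the union for I005
print(round(mu_demand(75), 2))     # 0.8
print(round(mu_demand(90), 2))     # 0.91, close to the 0.9/90 grade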
The fuzzy semijoin sales ⋈ items-sales is given in Table 33.
CNo | INo | IName | Sales |
---|---|---|---|
C001 | I005 | shirt | 0.8 |
C001 | I007 | Dress | 0.5 |
C003 | I004 | pants | 0.6 |
C002 | I007 | dress | 0.5 |
C001 | I008 | Jacket | 0.7 |
C002 | I005 | shirt | 0.7 |
Table 33.
Fuzzy semijoin.
The fuzzy k-means clustering algorithm (FKCA) is an optimization algorithm for fuzzy datasets (Table 34).
CNo | INo | IName | Sales |
---|---|---|---|
C001 | I005⇔I007 | Shirt⇔Dress | 0.4 |
C003 | I004 | pants | 0.6 |
C002 | I007⇔I005 | Dress⇔shirt | 0.5 |
Table 34.
Association.
The fuzzy k-means cluster algorithm (FKCA), using FAD, is given by
best = R
k_means = best
for i in range(1, n):
    for j in range(1, n):
        if r[i].R == r[j].R:
            t[i] = fuzzy_union(r[i].R, r[j].R)
C = reduce(best)
k_means = best
return k_means
The fuzzy multivalued association property of data mining may be defined with multivalued fuzzy functional dependency.
The fuzzy multivalued association (FMVD) is a multivalued dependency (MVD). The fuzzy association multivalued dependency (FAMVD) may be defined using Mamdani fuzzy conditional inference [3].
If EQ(t1(X), t2(X), t3(X)) then EQ(t1(Y), t2(Y)) or EQ(t2(Y), t3(Y)) or EQ(t1(Y), t3(Y))
= min{EQ(t1(Y), t2(Y)), EQ(t2(Y), t3(Y)), EQ(t1(Y), t3(Y))}
= min{min(t1(Y), t2(Y)), min(t2(Y), t3(Y)), min(t1(Y), t3(Y))}
= min(t1(Y), t2(Y), t3(Y))
The fuzzy k-means clustering algorithm (FKCA) is an optimization algorithm for fuzzy datasets (Table 35).
CNo | INo | IName | Sales |
---|---|---|---|
C001 | I005⇔I007 ⇔I008 | Shirt⇔Dress ⇔Jacket | 0.8 0.4 0.5 |
C003 | I004 | Pants | 0.6 |
C002 | I007⇔I005 | Dress⇔shirt | 0.5 0.7 |
Table 35.
Association using AFMVD.
The fuzzy k-means cluster algorithm (FKCA), using FAMVD, is given by
best = R
k_means = best
for i in range(1, n):
    for j in range(1, n):
        for k in range(1, n):
            if r[i].R == r[j].R == r[k].R:
                t[i] = fuzzy_union(r[i].R, r[j].R, r[k].R)
C = reduce(best)
k_means = best
return k_means
The fuzzy k-means clustering algorithm (FKCA) for joining two fuzzy datasets is given by
k_means = n
for i in range(1, n):
    for j in range(1, n):
        if r[i].R == s[j].S:
            t[i] = fuzzy_union(r[i].R, s[j].S)
C = best
k_means = best
return k_means
For example, the sorted fuzzy sets of Table 5 are given in Table 36.
CNo | INo | IName | Sales ⋈ Price ⋈ Demand |
---|---|---|---|
C001 | I005 | Shirt | 0.8 |
C001 | I007 | Dress | 0.5 |
C003 | I004 | Pants | 0.6 |
C002 | I007 | Dress | 0.5 |
C001 | I008 | Jacket | 0.6 |
C002 | I005 | Shirt | 0.7 |
Table 36.
Fuzzy join.
6. Fuzzy security for data mining
Security methods like encryption and decryption are used cryptographically, but these methods are not fully secure. A fuzzy security method is based on the mind, and others cannot decrypt it. Zadeh [16] discussed web intelligence, world knowledge, and fuzzy logic. Current programming is unable to deal with question answering that contains approximate information, for instance, “which is the best car?”. Fuzzy data mining with security is a knowledge discovery process over the associated data.
Fuzzy relational databases may be defined with fuzzy set theory. Fuzzy set theory is another approach to approximate information, and security may be provided through approximate information.
Definition: Given some universe of discourse X, a relational database R1 is defined as a pair {t, d}, where t is a tuple and d is a domain (Table 37).
R1 | d1 | d2 | … | dm |
---|---|---|---|---|
t1 | a11 | a12 | … | a1m |
t2 | a21 | a22 | … | a2m |
… | … | … | … | … |
tn | an1 | an2 | … | anm |
Table 37.
Relational database.
Price = 0.4/50 + 0.5/60 + 0.7/80 + 0.8/100
The fuzzy security database of price is given in Table 38.
INo | IName | Price |
---|---|---|
I005 | Benz | 0.8 |
I007 | Suzuki | 0.4 |
I004 | Toyota | 0.7 |
I008 | Skoda | 0.5 |
I009 | Benz | 0.8 |
Table 38.
Price fuzzy set.
Demand = 0.4/50+0.5/60+0.7/80+0.8/100
The fuzzy security database of demand is given in Table 39.
INo | IName | Demand | μ |
---|---|---|---|
I005 | Benz | 80 | 0.7 |
I007 | Suzuki | 60 | 0.5 |
I004 | Toyota | 100 | 0.8 |
I008 | Skoda | 50 | 0.4 |
I009 | Benz | 80 | 0.7 |
Table 39.
Demand fuzzy set.
The lossless natural join of demand and price is union and is given in Table 40.

Table 40.
Lossless join.
The actual data has to be disclosed for analysis on the web. There is no need to disclose the data if it is inherently defined with fuzziness. For instance, “car with fuzziness > 0.7” may be defined in XML as
<CAR>
<COMPANY>
<NAME> Benz </NAME>
<FUZZ> 0.8 </FUZZ>
</COMPANY>
<COMPANY>
<NAME> Suzuki </NAME>
<FUZZ> 0.9 </FUZZ>
</COMPANY>
<COMPANY>
<NAME> Toyota </NAME>
<FUZZ> 0.6 </FUZZ>
</COMPANY>
<COMPANY>
<NAME> Skoda </NAME>
<FUZZ> 0.7 </FUZZ>
</COMPANY>
</CAR>
An XQuery using the projection operator for the demanded car is given as
Namespace default =
validate <CAR> {
for $company in /CAR/COMPANY
where $company/FUZZ > 0.7
return <COMPANY> {$company/NAME, $company/FUZZ} </COMPANY>
} </CAR>
Fuzzy reasoning may be applied for fuzzy data mining.
Consider the more-demand fuzzy database obtained by decomposition (Tables 41 and 42).
INo | IName | Demand |
---|---|---|
I005 | Benz | 0.8 |
I007 | Suzuki | 0.9 |
I004 | Toyota | 0.6 |
I008 | Skoda | 0.7 |
I009 | Benz | 0.9 |
Table 41.
Demand.
INo | IName | Price |
---|---|---|
I005 | Benz | 0.7 |
I007 | Suzuki | 0.4 |
I004 | Toyota | 0.6 |
I008 | Skoda | 0.5 |
I009 | Benz | 0.7 |
Table 42.
Price.
The fuzzy reasoning [14] may be performed using fuzzy conditional inference.
The Zadeh [14] fuzzy conditional inference is given by
if x is P1 and x is P2 … and x is Pn then x is Q
= min{1, 1 − min(μP1(x), μP2(x), …, μPn(x)) + μQ(x)}
The Mamdani [7] fuzzy conditional inference is given by
if x is P1 and x is P2 … and x is Pn then x is Q
= min{μP1(x), μP2(x), …, μPn(x), μQ(x)}
The Reddy [12] fuzzy conditional inference is given by
if x is P1 and x is P2 … and x is Pn then x is Q
= min(μP1(x), μP2(x), …, μPn(x))
If x is Demand then x is Price
x is more Demand
------------------------------------
x is more Demand ∘ (Demand → Price)
x is more Demand ∘ min{1, 1 − Demand + Price} (Zadeh)
x is more Demand ∘ min{Demand, Price} (Mamdani)
x is more Demand ∘ {Demand} (Reddy)
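The implication values of Table 44 can be reproduced from the memberships of Tables 41 and 42; a Python sketch for the Zadeh and Mamdani rules is:

def zadeh(demand, price):
    # min{1, 1 − Demand + Price}
    return min(1.0, round(1 - demand + price, 2))

def mamdani(demand, price):
    # min{Demand, Price}
    return min(demand, price)

demand = {"Benz": 0.8, "Suzuki": 0.9, "Toyota": 0.6, "Skoda": 0.7}
price = {"Benz": 0.7, "Suzuki": 0.4, "Toyota": 0.6, "Skoda": 0.5}
for name in demand:
    print(name, zadeh(demand[name], price[name]), mamdani(demand[name], price[name]))
# Benz 0.9 0.7, Suzuki 0.5 0.4, Toyota 1.0 0.6, Skoda 0.8 0.5 (cf. Table 44)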
“If x is more demand, then x is more prices” is given in Tables 43 and 44.
INo | IName | More demand |
---|---|---|
I005 | Benz | 0.89 |
I007 | Suzuki | 0.95 |
I004 | Toyota | 0.77 |
I008 | Skoda | 0.84 |
I009 | Benz | 0.95 |
Table 43.
More demand.
INo | IName | Zadeh | Mamdani | Reddy |
---|---|---|---|---|
I005 | Benz | 0.9 | 0.7 | 0.7 |
I007 | Suzuki | 0.5 | 0.4 | 0.4 |
I004 | Toyota | 1.0 | 0.6 | 0.6 |
I008 | Skoda | 0.8 | 0.5 | 0.5 |
I009 | Benz | 0.8 | 0.7 | 0.7 |
Table 44.
Demand➔Price.
The inference for price is given in Table 45.
INo | IName | Zadeh | Mamdani | Reddy |
---|---|---|---|---|
I005 | Benz | 0.89 | 0.7 | 0.7 |
I007 | Suzuki | 0.5 | 0.4 | 0.4 |
I004 | Toyota | 0.77 | 0.6 | 0.6 |
I008 | Skoda | 0.8 | 0.5 | 0.5 |
I009 | Benz | 0.8 | 0.7 | 0.7 |
Table 45.
Inference price.
So the business administrator can take the decision whether or not to increase the price.
7. Web intelligence and fuzzy data mining
Let C and D be fuzzy rough sets (Tables 46–51).
R1 | d1 | d2 | … | dm | μ |
---|---|---|---|---|---|
t1 | a11 | a12 | … | a1m | μd(t1) |
t2 | a21 | a22 | … | a2m | μd(t2) |
… | … | … | … | … | … |
tn | an1 | an2 | … | anm | μd(tn) |
Table 46.
Fuzzy database.
INo | IName | Price | μ |
---|---|---|---|
I005 | Shirt | 100 | 0.8 |
I007 | Dress | 50 | 0.4 |
I004 | Pants | 80 | 0.7 |
I008 | Jacket | 60 | 0.5 |
I009 | Skirt | 100 | 0.8 |
Table 47.
Price database.

Table 48.
Intersect of demand and price.
INo | IName | Demand | μ |
---|---|---|---|
I005 | Shirt | 80 | 0.8 |
I007 | Dress | 60 | 0.5 |
I004 | Pants | 100 | 0.8 |
I008 | Jacket | 50 | 0.5 |
I009 | Skirt | 80 | 0.8 |
Table 49.
Lossless decomposition of demand.
INo | IName | Price | μ |
---|---|---|---|
I005 | Shirt | 100 | 0.8 |
I007 | Dress | 50 | 0.5 |
I004 | Pants | 80 | 0.8 |
I108 | Jacket | 60 | 0.5 |
I009 | Skirt | 100 | 0.8 |
Table 50.
Lossless decomposition of price.
Company | μ |
---|---|
IBM | 0.8 |
Microsoft | 0.9 |
Google | 0.75 |
Table 51.
Best software company.
The operations on fuzzy rough sets of type 2 are given as
¬C = 1 − μC(x) (negation)
C ∨ D = max{μC(x), μD(x)} (union)
C ∧ D = min{μC(x), μD(x)} (intersection)
XML data may be defined as
<SOFTWARE>
<COMPANY>
<NAME> IBM </NAME>
<FUZZ> 0.8 </FUZZ>
</COMPANY>
<COMPANY>
<NAME> Microsoft </NAME>
<FUZZ> 0.9 </FUZZ>
</COMPANY>
<COMPANY>
<NAME> Google </NAME>
<FUZZ> 0.75 </FUZZ>
</COMPANY>
</SOFTWARE>
An XQuery using the projection operator for the best software company is given as
Namespace default =
validate <SOFTWARE> {
for $company in /SOFTWARE/COMPANY
where $company/FUZZ = max(/SOFTWARE/COMPANY/FUZZ)
return <COMPANY> {$company/NAME, $company/FUZZ} </COMPANY>
} </SOFTWARE>
Similarly, the following problem may be considered for web programming.
Let P be a fuzzy proposition in a question-answering system.
P = Which is the city with the tallest buildings?
The answer is “x is the tallest-buildings city.”
For instance, the fuzzy set “most tallest buildings city” may be defined as
most tallest buildings city = 0.6/Hong Kong + 0.6/Dubai + 0.7/New York + 0.8/Taipei + 0.5/Tokyo
For the above question, the output is “tallest buildings city” = 0.8/Taipei, obtained by projection.
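Projection over a discrete fuzzy set simply selects the element of maximum grade; a short Python sketch with the fuzzy set above:

tallest = {"Hong Kong": 0.6, "Dubai": 0.6, "New York": 0.7, "Taipei": 0.8, "Tokyo": 0.5}
city = max(tallest, key=tallest.get)
print(city, tallest[city])  # Taipei 0.8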
The fuzzy algorithm using FUZZYALGOL is given as follows:
BEGIN
Variable most tallest buildings city = 0.6/Hong Kong + 0.6/Dubai + 0.7/New York + 0.8/Taipei + 0.5/Tokyo
most tallest buildings city = 0.8/Taipei
Return URL, fuzziness = Taipei, 0.8
END
The problem is to find the “most visited PDF on type-2 fuzzy sets.”
The fuzzy algorithm is:
Go to the most visited fuzzy set sites.
Go to the most visited type-2 fuzzy sets.
Go to the most visited type-2 fuzzy set PDFs.
The web program gets “the most visited fuzzy sets” and puts them in order.
The web program then gets “the most visited type-2 fuzzy sets.”
The web program gets “the most visited PDFs on type-2 fuzzy sets.”
8. Conclusion
Data mining may have to deal with incomplete information. Bayesian theory needs exponential complexity to combine data; defining datasets with fuzziness inherently reduces this complexity. In this chapter, fuzzy MapReduce algorithms are studied based on functional dependencies, and the fuzzy k-means MapReduce algorithm is studied using fuzzy functional dependencies. Data mining and fuzzy data mining are discussed, and a brief overview of work on business intelligence is given as an example.
Most current web programming is unable to deal with incomplete information. In this chapter, a web intelligence system for fuzzy data mining is discussed. In addition, the fuzzy algorithmic language is discussed for designing fuzzy algorithms for data mining, and some examples are given for web intelligence and fuzzy data mining.
Acknowledgments
The author thanks the reviewers and the editor for the revision and review suggestions made on this work.