
## Abstract

Data mining is a knowledge discovery process that has to deal with both exact and inexact information. Statistical methods handle inexact information, but they are based on likelihood; Zadeh's fuzzy logic also handles inexact information, but it is based on belief and is simple to use. In this chapter, fuzzy logic is used to deal with inexact information. Data mining methods and classifications are discussed for both exact and inexact information. Retrieval of information is important in data mining, and the time and space complexity of retrieval is high for big data, so both have to be reduced. The time complexity is reduced through the consecutive retrieval (C-R) property, and the space complexity is reduced with blackboard systems. Data mining for web data is discussed. In web data mining, the original data has to be disclosed; fuzzy web data mining is therefore discussed for the security of the data, along with fuzzy web programming. Data mining, fuzzy data mining, and web data mining are discussed through MapReduce algorithms.

### Keywords

- data mining
- fuzzy logic
- fuzzy data mining
- web data mining
- fuzzy MapReduce algorithms

## 1. Introduction

Data mining is an emerging area of knowledge discovery that extracts hidden and useful information from large amounts of data. Data mining methods like association rules, clustering, and classification use advanced algorithms such as decision trees and k-means for different purposes and goals. The research fields of data mining include machine learning, deep learning, and sentiment analysis. For big data analysis, information has to be retrieved within a reasonable time period. This may be achieved through the consecutive retrieval (C-R) of datasets for queries. The C-R property was first introduced by Ghosh [1] and was later extended to statistical databases. The C-R cluster property is a presorting of the stored datasets for clusters. In this chapter, the C-R property is extended to cluster analysis, and MapReduce algorithms are studied for cluster analysis. The time and space complexity shall be reduced through the C-R cluster property. Security of the data is one of the major issues for data analytics and data science when the original data is not to be disclosed.

Web programming has to handle incomplete information. Web intelligence is an emerging area that performs data mining to handle incomplete information, and such incomplete information is fuzzy rather than probabilistic. In this chapter, fuzzy web programming is discussed to deal with data mining using fuzzy logic. A fuzzy algorithmic language, called FUZZYALGOL, is discussed for designing queries in data mining, and some examples of web programming with fuzzy data mining are given.

## 2. Data mining

Data mining [2, 3, 4, 5] is basically performed as a knowledge discovery process. Some of the well-known data mining methods are frequent itemset mining, association rule mining, and clustering. Data warehousing is the representation of a relational dataset in two or more dimensions. It is possible to reduce the space complexity of data mining with consecutive storage of data warehouses.

The relational dataset is a representation of data with attributes and tuples.

**Definition**: A relational dataset *R* (or cluster dataset) is defined as a collection of attributes *A*_{1}, *A*_{2}, …, *A*_{m} and tuples *t*_{1}, *t*_{2}, …, *t*_{n} and is represented as

*R* = *A*_{1} × *A*_{2} × … × *A*_{m}

*t*_{i} = (*a*_{i1}, *a*_{i2}, …, *a*_{im}), where *i* = 1, 2, …, *n*

or, equivalently,

*R*(*A*_{1}, *A*_{2}, …, *A*_{m}), where *R* is a relation, with tuples

*R*(*t*_{i}) = (*a*_{i1}, *a*_{i2}, …, *a*_{im}), where *i* = 1, 2, …, *n*

For instance, two sample datasets "price" and "sales" are given in Tables 1 and 2, respectively.

The lossless join of the datasets “price” and “sales” is given in Table 3.

In the following, some of the methods (frequency, association rule, and clustering) are discussed.

Consider the “purchase” relational dataset given in Table 4.

### 2.1 Frequency

*Frequency* is repeatedly occurring data.

Consider the following query:

Find the customers who frequently purchase more than one item.

SELECT P.CNo, P.INo, COUNT(*)

FROM purchase P

GROUP BY P.CNo, P.INo

HAVING COUNT(*) > 1;

The output of this query is given in Table 5.

INo | IName | Price |
---|---|---|
I005 | Shirt | 100 |
I007 | Dress | 50 |
I004 | Pants | 80 |
I008 | Jacket | 60 |
I009 | Skirt | 100 |

*Table 1. Sample dataset "price".*

INo | IName | Sales |
---|---|---|
I005 | Shirt | 80 |
I007 | Dress | 60 |
I004 | Pants | 100 |
I008 | Jacket | 50 |
I009 | Skirt | 80 |

*Table 2. Sample dataset "sales".*

INo | IName | Sales | Price |
---|---|---|---|
I005 | Shirt | 80 | 100 |
I007 | Dress | 60 | 50 |
I004 | Pants | 100 | 80 |
I008 | Jacket | 50 | 60 |
I009 | Skirt | 80 | 100 |

*Table 3. Lossless join of "price" and "sales".*

CNo | INo | IName | Price |
---|---|---|---|
C001 | I005 | Shirt | 100 |
C001 | I007 | Dress | 50 |
C003 | I004 | Pants | 80 |
C002 | I007 | Dress | 80 |
C001 | I008 | Jacket | 60 |
C002 | I005 | Shirt | 100 |

*Table 4. The "purchase" relational dataset.*

CNo | INo | COUNT |
---|---|---|
C001 | I005 | 2 |
C002 | I005 | 2 |

*Table 5. Output of the frequency query.*
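The frequency query can be sketched in plain Python; the rows below are a hypothetical mirror of the "purchase" dataset in Table 4, and `Counter` stands in for the SQL GROUP BY/COUNT:

```python
from collections import Counter

# Hypothetical rows mirroring the "purchase" dataset of Table 4:
# (CNo, INo, IName, Price)
purchase = [
    ("C001", "I005", "Shirt", 100),
    ("C001", "I007", "Dress", 50),
    ("C003", "I004", "Pants", 80),
    ("C002", "I007", "Dress", 80),
    ("C001", "I008", "Jacket", 60),
    ("C002", "I005", "Shirt", 100),
]

# Count purchases per customer, then keep the customers who bought more
# than one item - the analogue of GROUP BY CNo HAVING COUNT(*) > 1.
counts = Counter(cno for cno, _ino, _name, _price in purchase)
frequent = {cno: n for cno, n in counts.items() if n > 1}
```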

### 2.2 Association rule

*Association rule* is the relationship among the data.

Consider the following query:

Find the customers who purchase shirt and dress.

<shirt ⇔ dress>

SELECT DISTINCT P1.CNo, P1.INo

FROM purchase P1, purchase P2

WHERE P1.CNo = P2.CNo

AND P1.IName = 'shirt' AND P2.IName = 'dress';

The output of this query is given in Table 6.

CNo | INo |
---|---|
C001 | I005 |
C002 | I005 |

*Table 6. Output of the association rule query.*
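The association query can also be sketched in plain Python; the rows and the basket-building step below are illustrative assumptions, not the chapter's algorithm:

```python
from collections import defaultdict

# Hypothetical (CNo, IName) rows mirroring the "purchase" dataset of Table 4.
purchase = [
    ("C001", "shirt"), ("C001", "dress"), ("C003", "pants"),
    ("C002", "dress"), ("C001", "jacket"), ("C002", "shirt"),
]

# Build each customer's basket, then keep the customers whose basket
# contains both sides of the association <shirt, dress>.
baskets = defaultdict(set)
for cno, iname in purchase:
    baskets[cno].add(iname)

both = sorted(cno for cno, items in baskets.items()
              if {"shirt", "dress"} <= items)
```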

### 2.3 Clustering

*Clustering* is the grouping of related data.

Consider the following query:

Group the customers who purchase dress and shirt.

The output of this query is given in Table 7.

CNo | INo | IName | Price |
---|---|---|---|
C001 | I007 | Dress | 50 |
| I005 | Shirt | 100 |
C002 | I007 | Dress | 80 |
| I005 | Shirt | 100 |

*Table 7. Output of the clustering query.*

## 3. Data mining using C-R cluster property

The C-R (consecutive retrieval) property [1, 3] is the retrieval of the records of a database consecutively. Suppose *R* = {*r*_{1}, *r*_{2}, …, *r*_{n}} is the dataset of records and *C* = {*C*_{1}, *C*_{2}, …, *C*_{m}} is the set of clusters.

The best type of file organization on a linear storage is one in which the records pertaining to the clusters are stored in consecutive locations without redundantly storing any data of *R*.

If there exists such an organization of *R* for *C*, then *C* is said to have the consecutive retrieval property, or C-R cluster property, with respect to the dataset *R*. The C-R cluster property is applicable to linear storage.

The C-R cluster property is a binary relation between a cluster set and a dataset.

If a cluster in a cluster set *C* is relevant to a record in a dataset *R*, the relevancy is denoted by 1 and the irrelevancy by 0. Thus, the relevancy between the cluster set *C* and the dataset *R* can be represented as an (*n* × *m*) matrix, as shown in Table 8. This matrix is called the dataset-cluster incidence matrix (CIM).
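A minimal sketch of building such an incidence matrix in Python, assuming the sales figures of Table 9 and taking the cluster query definitions of this section literally as predicates (the computed average, about 80.7, is an assumption of this sketch):

```python
# Sales per record, following Table 9.
records = {"r1": 150, "r2": 30, "r3": 100, "r4": 50,
           "r5": 75, "r6": 120, "r7": 40}
avg = sum(records.values()) / len(records)  # about 80.7 for these figures

# Cluster predicates taken literally from the query definitions C1-C4.
clusters = {
    "C1": lambda s: s >= 100,
    "C2": lambda s: s < 100,
    "C3": lambda s: s >= avg,
    "C4": lambda s: s < avg,
}

# Entry 1 means "record relevant to cluster", 0 means irrelevant.
cim = {r: {c: int(pred(s)) for c, pred in clusters.items()}
       for r, s in records.items()}
```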

Consider the dataset for customer account given in Table 9.

The dataset given in Table 9 is reorganized in descending order of sales, as shown in Table 10.

Consider the following clusters of queries:

C1 = Find the customers whose sales are greater than or equal to 100.

C2 = Find the customers whose sales are less than 100.

C3 = Find the customers whose sales are greater than or equal to the average sales.

C4 = Find the customers whose sales are less than the average sales.

The CIM is given in Table 11.

The dataset given in Table 11 is reorganized with a sort on *C*_{1} in descending order, as shown in Table 12. Thus, *C*_{1} has the C-R cluster property.

The dataset given in Table 11 is reorganized with a sort on *C*_{2} in descending order, as shown in Table 13. Thus, *C*_{2} has the C-R cluster property.

The dataset given in Table 11 is reorganized with a sort on *C*_{3} in descending order, as shown in Table 14. Thus, *C*_{3} has the C-R cluster property.

The dataset given in Table 11 is reorganized with a sort on *C*_{4} in descending order, as shown in Table 15. Thus, *C*_{4} has the C-R cluster property.

The dataset for *C*_{1} ⋈ *C*_{2} has the C-R cluster property (Table 16).

The dataset for *C*_{3} ⋈ *C*_{4} has the C-R cluster property (Table 17).

The dataset for *C*_{1} ⋈ *C*_{3} has the C-R cluster property (Table 18).

The dataset for *C*_{2} ⋈ *C*_{4} has the C-R cluster property (Table 19).

The dataset for *C*_{2} ⋈ *C*_{3} has the C-R cluster property (Table 20).

The cluster set {*C*_{1} ⋈ *C*_{2}, *C*_{3} ⋈ *C*_{4}, *C*_{1} ⋈ *C*_{3}, *C*_{2} ⋈ *C*_{4}, *C*_{2} ⋈ *C*_{3}} has the C-R cluster property. Thus, the cluster sets have the C-R cluster property with respect to the dataset *R*.
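The C-R check itself is simple: a cluster column has the property exactly when its 1s sit in consecutive positions. A minimal sketch in Python, using the sorted columns of Tables 12 and 13 and the unsorted *C*_{1} column of Table 11:

```python
def has_cr_property(column):
    """True if all 1s of a 0/1 cluster column are stored consecutively."""
    ones = [i for i, bit in enumerate(column) if bit == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)

# Columns after sorting (Tables 12 and 13): the 1s are consecutive.
c1_sorted = [1, 1, 1, 0, 0, 0, 0]
c2_sorted = [0, 0, 0, 1, 1, 1, 1]
# C1 column in the original record order r1..r7 of Table 11: 1s scattered.
c1_unsorted = [1, 0, 1, 0, 0, 1, 0]
```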

### 3.1 Design of parallel C-R cluster property

The design of parallel clusters shall be studied through the C-R cluster property, in two ways: parallel cluster design through a graph-theoretical approach and parallel cluster design through a response vector approach.

The C-R cluster property between a cluster set *C* and a dataset *R* can be stated in terms of the properties of vectors. The data-cluster incidences of a cluster set *C* with the C-R cluster property may be represented as a response vector set *V*. For instance, the cluster set {*C*_{1}, *C*_{2}, *C*_{3}, *C*_{4}} has the response vector set {*V*_{1} = (1,1,1,0,0,0,0), *V*_{2} = (0,0,0,1,1,1,1), *V*_{3} = (1,1,1,1,0,0,0), *V*_{4} = (0,0,0,0,1,1,1)} (Tables 21–23).

R | C_{1} | C_{2} | … | C_{m} |
---|---|---|---|---|
r_{1} | 1 | 0 | … | 1 |
r_{2} | 0 | 1 | … | 0 |
… | … | … | … | … |
r_{n} | 1 | 1 | … | 1 |

*Table 8. Dataset-cluster incidence matrix (CIM).*

R | CNo | IName | Sales |
---|---|---|---|
r_{1} | 70001 | Shirt | 150 |
r_{2} | 70002 | Dress | 30 |
r_{3} | 70003 | Pants | 100 |
r_{4} | 60001 | Dress | 50 |
r_{5} | 60002 | Jacket | 75 |
r_{6} | 60003 | Shirt | 120 |
r_{7} | 60004 | Dress | 40 |

*Table 9. Customer account dataset.*

R | CNo | IName | Sales |
---|---|---|---|
r_{1} | 70001 | Shirt | 150 |
r_{6} | 60003 | Shirt | 120 |
r_{3} | 70003 | Pants | 100 |
r_{5} | 60002 | Jacket | 75 |
r_{4} | 60001 | Dress | 50 |
r_{7} | 60004 | Dress | 40 |
r_{2} | 70002 | Dress | 30 |

*Table 10. The dataset of Table 9 sorted in descending order of sales.*

R | C_{1} | C_{2} | C_{3} | C_{4} |
---|---|---|---|---|
r_{1} | 1 | 0 | 1 | 0 |
r_{2} | 0 | 1 | 0 | 1 |
r_{3} | 1 | 0 | 1 | 0 |
r_{4} | 0 | 1 | 0 | 1 |
r_{5} | 0 | 1 | 1 | 0 |
r_{6} | 1 | 0 | 1 | 0 |
r_{7} | 0 | 1 | 0 | 1 |

*Table 11. CIM for the clusters C_{1}–C_{4}.*

R | C_{1} |
---|---|
r_{1} | 1 |
r_{3} | 1 |
r_{6} | 1 |
r_{2} | 0 |
r_{4} | 0 |
r_{5} | 0 |
r_{7} | 0 |

*Table 12. Table 11 sorted on C_{1}.*

R | C_{2} |
---|---|
r_{1} | 0 |
r_{3} | 0 |
r_{6} | 0 |
r_{2} | 1 |
r_{4} | 1 |
r_{5} | 1 |
r_{7} | 1 |

*Table 13. Table 11 sorted on C_{2}.*

R | C_{3} |
---|---|
r_{1} | 1 |
r_{3} | 1 |
r_{5} | 1 |
r_{6} | 1 |
r_{2} | 0 |
r_{4} | 0 |
r_{7} | 0 |

*Table 14. Table 11 sorted on C_{3}.*

R | C_{4} |
---|---|
r_{1} | 0 |
r_{3} | 0 |
r_{5} | 0 |
r_{6} | 0 |
r_{2} | 1 |
r_{4} | 1 |
r_{7} | 1 |

*Table 15. Table 11 sorted on C_{4}.*

R | C_{1} ⋈ C_{2} |
---|---|
r_{1} | 1 |
r_{3} | 1 |
r_{6} | 1 |
r_{2} | 1 |
r_{4} | 1 |
r_{5} | 1 |
r_{7} | 1 |

*Table 16. C_{1} ⋈ C_{2}.*

R | C_{3} ⋈ C_{4} |
---|---|
r_{1} | 1 |
r_{3} | 1 |
r_{5} | 1 |
r_{6} | 1 |
r_{2} | 1 |
r_{4} | 1 |
r_{7} | 1 |

*Table 17. C_{3} ⋈ C_{4}.*

R | C_{1} ⋈ C_{3} |
---|---|
r_{1} | 1 |
r_{3} | 1 |
r_{6} | 1 |
r_{5} | 1 |
r_{2} | 0 |
r_{4} | 0 |
r_{7} | 0 |

*Table 18. C_{1} ⋈ C_{3}.*

R | C_{2} ⋈ C_{4} |
---|---|
r_{1} | 0 |
r_{3} | 0 |
r_{6} | 0 |
r_{2} | 1 |
r_{4} | 1 |
r_{5} | 1 |
r_{7} | 1 |

*Table 19. C_{2} ⋈ C_{4}.*

R | C_{2} ⋈ C_{3} |
---|---|
r_{1} | 1 |
r_{3} | 1 |
r_{6} | 1 |
r_{2} | 1 |
r_{4} | 1 |
r_{5} | 1 |
r_{7} | 1 |

*Table 20. C_{2} ⋈ C_{3}.*

R | C_{1} | C_{2} |
---|---|---|
r_{1} | 1 | 0 |
r_{3} | 1 | 0 |
r_{6} | 1 | 0 |
r_{2} | 0 | 1 |
r_{4} | 0 | 1 |
r_{5} | 0 | 1 |
r_{7} | 0 | 1 |

*Table 21. Parallel clusters C_{1} and C_{2}.*

R | C_{3} | C_{4} |
---|---|---|
r_{1} | 1 | 0 |
r_{3} | 1 | 0 |
r_{6} | 1 | 0 |
r_{2} | 1 | 0 |
r_{4} | 0 | 1 |
r_{5} | 0 | 1 |
r_{7} | 0 | 1 |

*Table 22. Parallel clusters C_{3} and C_{4}.*

R | C_{2} | C_{3} |
---|---|---|
r_{1} | 0 | 1 |
r_{3} | 0 | 1 |
r_{6} | 0 | 1 |
r_{2} | 1 | 1 |
r_{4} | 1 | 0 |
r_{5} | 1 | 0 |
r_{7} | 1 | 0 |

*Table 23. Clusters C_{2} and C_{3}.*

For instance, the response vector of the cluster *C*_{1} is given by the column vector (1,1,1,0,0,0,0).

Suppose *C*_{i} and *C*_{j} are two clusters with response vectors *V*_{i} and *V*_{j}. If the intersection *V*_{i} ∩ *V*_{j} = Ф, then the cluster set {*C*_{i}, *C*_{j}} has the parallel cluster property. Consider the vectors *V*_{1} and *V*_{2} of *C*_{1} and *C*_{2}. The intersection *V*_{1} ∩ *V*_{2} = Ф, so the cluster set {*C*_{1}, *C*_{2}} has the parallel cluster property. Similarly, the cluster set {*C*_{3}, *C*_{4}} has the parallel cluster property. The cluster set {*C*_{2}, *C*_{3}} does not have the parallel cluster property because *V*_{2} ∩ *V*_{3} ≠ Ф, as *r*_{2} depends on both *C*_{2} and *C*_{3}.
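The disjointness test on response vectors can be sketched in a few lines of Python; the vectors are those given above:

```python
def parallel_cluster(vi, vj):
    """Two clusters may be processed in parallel when no record is
    relevant to both, i.e. their response vectors are disjoint."""
    return all(not (a and b) for a, b in zip(vi, vj))

v1 = (1, 1, 1, 0, 0, 0, 0)  # response vector of C1
v2 = (0, 0, 0, 1, 1, 1, 1)  # response vector of C2
v3 = (1, 1, 1, 1, 0, 0, 0)  # response vector of C3
```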

### 3.2 Visual design for parallel cluster

The C-R cluster property can be studied with a graphical approach, which helps in designing parallel cluster processing (PCP).

Suppose *V*_{i} is a vertex of the record-cluster incidence matrix of *C*. The graph G(*C*) is defined by the vertices *V*_{i}, *i* = 1, 2, …, *n*, where two vertices have an edge *E*_{ij} associated with the interval *I*_{i} = {*V*_{i}, *V*_{i+1}}, *i* = 1, …, *n*-1.

If G(*C*) has the C-R cluster property, the vertices of G(*C*) have consecutive 1's or 0's.

Consider the cluster set {*C*_{1}, *C*_{2}}. G(*C*_{1}) has the vertices (1,1,1,0,0,0,0), G(*C*_{2}) has the vertices (0,0,0,1,1,1,1), G(*C*_{3}) has the vertices (1,1,1,1,0,0,0), and G(*C*_{4}) has the vertices (0,0,0,0,1,1,1).

The parallel cluster property exists if G(*C*_{i}) ∩ G(*C*_{j}) = Ф.

For instance, consider G(*C*_{1}) and G(*C*_{2}). G(*C*_{1}) ∩ G(*C*_{2}) = Ф, so the cluster set {*C*_{1}, *C*_{2}} has the parallel cluster property. The graphical representation is shown in Figure 1.

Similarly, the cluster set {*C*_{3}, *C*_{4}} has the parallel cluster property (PCP). The cluster set {*C*_{2}, *C*_{3}} has no PCP because G(*C*_{2}) ∩ G(*C*_{3}) ≠ Ф.

The graphs with G(*C*_{1}) ∩ G(*C*_{2}) = Ф have the consecutive cluster property.

The graphs with G(*C*_{3}) ∩ G(*C*_{4}) = Ф have the consecutive cluster property. The graphical representation is shown in Figure 2.

The graphs with G(*C*_{2}) ∩ G(*C*_{3}) ≠ Ф do not have the consecutive cluster property. The graphical representation is shown in Figure 3.

### 3.3 Parallel cluster design through genetic approach

Genetic algorithms (GAs) [6] are inspired by Darwinian evolution and are used to learn and optimize problems [7]. There are four evaluation processes:

- Selection
- Reproduction
- Mutation
- Competition

Consider the following crossover with two cuts:

Parent #1 00001111

Parent #2 11110000

Parents #1 and #2 match with the crossover.
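A two-cut (two-point) crossover of this kind can be sketched in Python; the cut positions chosen here are illustrative:

```python
def two_point_crossover(p1, p2, cut1, cut2):
    """Swap the middle segment [cut1:cut2] between two parent bit strings."""
    child1 = p1[:cut1] + p2[cut1:cut2] + p1[cut2:]
    child2 = p2[:cut1] + p1[cut1:cut2] + p2[cut2:]
    return child1, child2

# The two parents from the text; cuts after positions 2 and 6 are illustrative.
c1, c2 = two_point_crossover("00001111", "11110000", 2, 6)
```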

The C-R cluster property can also be studied through a genetic approach. This study will help in designing parallel cluster processing (PCP).

**Definition:** The gene G of a cluster G(*C*) is defined as its incidence sequence.

Suppose G(*C*_{1}) is the parent and G(*C*_{2}) is the child genome of the cluster incidences for *C*_{1} and *C*_{2}.

Suppose G(*C*_{1}) has (1,1,1,0,0,0,0) and G(*C*_{2}) has (0,0,0,1,1,1,1).

The parallel cluster property may be designed using genetic approach with the C-R cluster property.

Suppose *C* is cluster set, *R* is dataset and G(*C*) is genetic set.

The parallel cluster property exists if G(*Ci*) and G(*Cj*) matches with crossover.

For instance,

G(*C*_{1}) = 1110000

G(*C*_{2}) = 0001111

G(*C*_{1}) and G(*C*_{2}) match with the crossover.

The cluster set {*C*_{1}, *C*_{2}} has the parallel cluster property.

Similarly, the cluster set {*C*_{3}, *C*_{4}} has the parallel cluster property. The cluster set {*C*_{2}, *C*_{3}} has no PCP because G(*C*_{2}) and G(*C*_{3}) do not match with the crossover.

### 3.4 Parallel cluster design through cluster analysis

*Clustering* is the grouping of data according to their properties; sample clusters *C*_{1} and *C*_{2} are given in Tables 24 and 25, respectively.

R | C_{1} |
---|---|
r_{1} | 1 |
r_{3} | 1 |
r_{6} | 1 |

*Table 24. Cluster C_{1}.*

R | C_{2} |
---|---|
r_{2} | 1 |
r_{4} | 1 |
r_{5} | 1 |
r_{7} | 1 |

*Table 25. Cluster C_{2}.*

Thus, *C*_{1} and *C*_{2} have the consecutive parallel cluster property (Tables 26 and 27).

R | C_{3} |
---|---|
r_{1} | 1 |
r_{3} | 1 |
r_{5} | 1 |
r_{6} | 1 |

*Table 26. Cluster C_{3}.*

R | C_{4} |
---|---|
r_{2} | 1 |
r_{4} | 1 |
r_{7} | 1 |

*Table 27. Cluster C_{4}.*

Thus, *C*_{3} and *C*_{4} have the consecutive parallel cluster property. *C*_{2} and *C*_{3} do not have the consecutive parallel cluster property because *r*_{2} is common to both.

## 4. Design of retrieval of cluster using blackboard system

Retrieval of clusters from a blackboard system [8] is the direct retrieval of data sources. Usually, when a query is processed, the entire database has to be brought into main memory; in a blackboard architecture, however, the data item is sourced directly from the blackboard structure. To retrieve the information for a query, the data item is retrieved directly from the blackboard, which contains the data item sources. A hash function may be used to store the data item set in the blackboard.

The blackboard systems may be constructed with data structure for data item sources.

Consider the relation account(AC-No, AC-Name, AC-Balance).

Here AC-No is the key of the dataset.

Each data item is a data source, which is mapped by h(*x*).

These data items are stored in the blackboard structure.

When a transaction is processed, there is no need to bring the entire database into main memory. It is sufficient to retrieve the particular data item of the particular transaction from the blackboard system (Figure 4).
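A minimal sketch of such a blackboard in Python, using a dictionary keyed by a hash of AC-No; the account fields follow the text, while the sample values are hypothetical:

```python
# A minimal blackboard sketch: each data item is stored under a hash of its
# key, so a transaction touches only the item it needs, not the whole
# database. The account record below is illustrative.
blackboard = {}

def h(key):
    # Python's built-in hash stands in for the mapping function h(x).
    return hash(key)

def put(ac_no, record):
    blackboard[h(ac_no)] = record

def get(ac_no):
    return blackboard.get(h(ac_no))

put("AC-101", {"AC-Name": "Alice", "AC-Balance": 500})
```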

The advantage of the blackboard architecture is that it is highly secure for blockchain transactions; blockchain technology has no third-party interference.

## 5. Fuzzy data mining

Sometimes, data mining is unable to deal with incomplete databases and unable to combine data and reasoning. Fuzzy data mining [6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] combines data and reasoning by defining the data with fuzziness. The fuzzy MapReduce algorithms have two functions: *mapping* reads the fuzzy datasets, and *reducing* writes them after the operations.

**Definition**: Given some universe of discourse *X*, a fuzzy set is defined as a pair {*t*, μ_{d}(*t*)}, where *t* is a tuple, *d* is a domain, and the membership function μ_{d}(*t*) takes values on the unit interval [0, 1], i.e., μ_{d}(*t*) ➔ [0, 1], where *t*_{i} Є *X* (Table 28).

R1 | d_{1} | d_{2} | … | d_{m} | μ |
---|---|---|---|---|---|
t_{1} | a_{11} | a_{12} | … | a_{1m} | μ_{d}(t_{1}) |
t_{2} | a_{21} | a_{22} | … | a_{2m} | μ_{d}(t_{2}) |
… | … | … | … | … | … |
t_{n} | a_{n1} | a_{n2} | … | a_{nm} | μ_{d}(t_{n}) |

*Table 28. Fuzzy relational dataset.*

The sales are defined with fuzziness (Tables 29–32).

CNo | INo | IName | Demand |
---|---|---|---|
C001 | I005 | Shirt | 0.9 |
C001 | I007 | Dress | 0.65 |
C003 | I004 | Pants | 0.85 |
C002 | I007 | Dress | 0.6 |
C001 | I008 | Jacket | 0.65 |
C002 | I005 | Shirt | 0.9 |

*Table 29. Fuzzy demand.*

CNo | INo | IName | Negation of price |
---|---|---|---|
C001 | I005 | Shirt | 0.3 |
C001 | I007 | Dress | 0.5 |
C003 | I004 | Pants | 0.4 |
C002 | I007 | Dress | 0.5 |
C001 | I008 | Jacket | 0.4 |
C002 | I005 | Shirt | 0.3 |

*Table 30. Negation of price.*

CNo | INo | IName | Sales ∪ price |
---|---|---|---|
C001 | I005 | Shirt | 0.8 |
C001 | I007 | Dress | 0.5 |
C003 | I004 | Pants | 0.6 |
C002 | I007 | Dress | 0.5 |
C001 | I008 | Jacket | 0.6 |
C002 | I005 | Shirt | 0.7 |

*Table 31. Union of sales and price.*

INo | IName | Sales |
---|---|---|
I005 | Shirt | 0.8 |
I007 | Dress | 0.5 |
I004 | Pants | 0.6 |
I007 | Dress | 0.5 |
I008 | Jacket | 0.6 |

*Table 32. Fuzzy item sales.*

μ_{Demand}(*x*) = 0.9/90 + 0.85/80 + 0.8/75 + 0.65/70

or the fuzziness may be defined with a function:

μ_{Demand}(*x*) = (1 + (Demand − 100)/100)^{−1}, for Demand ≤ 100

μ_{Demand}(*x*) = 1, for Demand > 100

*Negation*: the negation is 1 − μ(*x*), as in Table 30.

*Union*: the union for I005 = max{0.8, 0.7} = 0.8, as in Table 31.
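The three operations can be sketched in Python; the membership grades 0.8 and 0.7 for item I005 follow the union example above:

```python
def f_not(mu):
    """Fuzzy negation: 1 - membership grade."""
    return 1 - mu

def f_union(mu_a, mu_b):
    """Fuzzy union: maximum of the grades."""
    return max(mu_a, mu_b)

def f_intersect(mu_a, mu_b):
    """Fuzzy intersection: minimum of the grades."""
    return min(mu_a, mu_b)

# Grades 0.8 and 0.7 for item I005 follow the union example in the text.
sales_i005, price_i005 = 0.8, 0.7
```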

Fuzzy semijoin is given by sales ⋈ items-sale as shown in Table 33.

CNo | INo | IName | Sales |
---|---|---|---|
C001 | I005 | Shirt | 0.8 |
C001 | I007 | Dress | 0.5 |
C003 | I004 | Pants | 0.6 |
C002 | I007 | Dress | 0.5 |
C001 | I008 | Jacket | 0.7 |
C002 | I005 | Shirt | 0.7 |

*Table 33. Fuzzy semijoin.*

The fuzzy k-means clustering algorithm (FKCA) is an optimization algorithm for fuzzy datasets (Table 34).

CNo | INo | IName | Sales |
---|---|---|---|
C001 | I005 ⇔ I007 | Shirt ⇔ Dress | 0.4 |
C003 | I004 | Pants | 0.6 |
C002 | I007 ⇔ I005 | Dress ⇔ Shirt | 0.5 |

*Table 34. Fuzzy clusters using FAD.*

The fuzzy k-means cluster algorithm (FKCA), using FAD, is given by:

best = R

k-means = best

for *i* in range(1, *n*)

for *j* in range(1, *n*)

if *r*_{i}.*R* = *r*_{j}.*R* then *t*_{i} = fuzzy union(*r*_{i}.*R* ∪ *r*_{j}.*R*)

*C* reduce best

k-means < best

return
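The core merge step of this reduction can be sketched as runnable Python, assuming records are (key, membership) pairs and taking max as the fuzzy union; this layout is an assumption of the sketch, not the chapter's exact algorithm:

```python
def fuzzy_merge(records):
    """Merge records that share a key by taking the fuzzy union (max)
    of their membership grades - the core step of the reduction above."""
    merged = {}
    for key, mu in records:
        merged[key] = max(mu, merged.get(key, 0.0))
    return merged

# (key, membership) pairs; duplicate keys are fuzzily unioned.
rows = [("I005", 0.8), ("I007", 0.5), ("I005", 0.7), ("I004", 0.6)]
clusters = fuzzy_merge(rows)
```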

The fuzzy multivalued association property of data mining may be defined with multivalued fuzzy functional dependency.

The fuzzy multivalued association (FMVD) is the multivalued dependency (MVD). The fuzzy association multivalued dependency (FAMVD) may be defined using Mamdani fuzzy conditional inference [3]:

If EQ(*t*_{1}(*X*), *t*_{2}(*X*), *t*_{3}(*X*)) then EQ(*t*_{1}(*Y*), *t*_{2}(*Y*)) or EQ(*t*_{2}(*Y*), *t*_{3}(*Y*)) or EQ(*t*_{1}(*Y*), *t*_{3}(*Y*))

= min{EQ(*t*_{1}(*Y*), *t*_{2}(*Y*)), EQ(*t*_{2}(*Y*), *t*_{3}(*Y*)), EQ(*t*_{1}(*Y*), *t*_{3}(*Y*))}

= min{min(*t*_{1}(*Y*), *t*_{2}(*Y*)), min(*t*_{2}(*Y*), *t*_{3}(*Y*)), min(*t*_{1}(*Y*), *t*_{3}(*Y*))}

= min(*t*_{1}(*Y*), *t*_{2}(*Y*), *t*_{3}(*Y*))

The fuzzy k-means clustering algorithm (FKCA) is the optimization algorithm for fuzzy datasets (Table 35).

CNo | INo | IName | Sales |
---|---|---|---|
C001 | I005 ⇔ I007 ⇔ I008 | Shirt ⇔ Dress ⇔ Jacket | 0.8 ⇔ 0.4 ⇔ 0.5 |
C003 | I004 | Pants | 0.6 |
C002 | I007 ⇔ I005 | Dress ⇔ Shirt | 0.5 ⇔ 0.7 |

*Table 35. Fuzzy clusters using FAMVD.*

The fuzzy k-means cluster algorithm (FKCA), using FAMVD, is given by:

best = R

k-means = best

for *i* in range(1, *n*)

for *j* in range(1, *n*)

for *k* in range(1, *n*)

if *r*_{i}.*R* = *r*_{j}.*R* = *r*_{k}.*R* then *t*_{i} = fuzzy union(*r*_{i}.*R* ∪ *r*_{j}.*R* ∪ *r*_{k}.*R*)

*C* reduce best

k-means < best

return

The fuzzy k-means clustering algorithm (FKCA) for joining two fuzzy datasets *R* and *S* is given by:

k-means = n

for *i* in range(1, *n*)

for *j* in range(1, *n*)

if *r*_{i}.*R* = *s*_{j}.*S* then *t*_{i} = fuzzy union(*r*_{i}.*R* ∪ *s*_{j}.*S*)

*C* = best

k-means < best

return

For example, the sorted fuzzy sets of Table 5 are given in Table 36.

CNo | INo | IName | Sales ⋈ Price ⋈ Demand |
---|---|---|---|
C001 | I005 | Shirt | 0.8 |
C001 | I007 | Dress | 0.5 |
C003 | I004 | Pants | 0.6 |
C002 | I007 | Dress | 0.5 |
C001 | I008 | Jacket | 0.6 |
C002 | I005 | Shirt | 0.7 |

*Table 36. Sorted fuzzy sets.*

## 6. Fuzzy security for data mining

Security methods like encryption and decryption are used in cryptography, but these methods are not fully secure. A fuzzy security method is based on meaning held in the mind, which others cannot decrypt. Zadeh [16] discussed web intelligence, world knowledge, and fuzzy logic. Current programming is unable to deal with question answering containing approximate information, for instance, "Which is the best car?" Fuzzy data mining with security is a knowledge discovery process with the data associated.

The fuzzy relational databases may be defined with fuzzy set theory. Fuzzy set theory is another approach to approximate information, and security may be provided by the approximate information.

**Definition**: Given some universe of discourse *X*, a relational database *R*1 is defined as a pair {*t*, *d*}, where *t* is a tuple and *d* is a domain (Table 37).

R1 | d_{1} | d_{2} | … | d_{m} |
---|---|---|---|---|
t_{1} | a_{11} | a_{12} | … | a_{1m} |
t_{2} | a_{21} | a_{22} | … | a_{2m} |
… | … | … | … | … |
t_{n} | a_{n1} | a_{n2} | … | a_{nm} |

*Table 37. Relational database.*

Price = 0.4/50 + 0.5/60 + 0.7/80 + 0.8/100

The fuzzy security database of price is given in Table 38.

INo | IName | Price |
---|---|---|
I005 | Benz | 0.8 |
I007 | Suzuki | 0.4 |
I004 | Toyota | 0.7 |
I008 | Skoda | 0.5 |
I009 | Benz | 0.8 |

*Table 38. Fuzzy security database of price.*

Demand = 0.4/50+0.5/60+0.7/80+0.8/100

The fuzzy security database of demand is given in Table 39.

INo | IName | Demand | μ |
---|---|---|---|
I005 | Benz | 80 | 0.7 |
I007 | Suzuki | 60 | 0.5 |
I004 | Toyota | 100 | 0.8 |
I008 | Skoda | 50 | 0.4 |
I009 | Benz | 80 | 0.7 |

*Table 39. Fuzzy security database of demand.*

The lossless natural join of demand and price is the union and is given in Table 40.

On the web, the actual data has to be disclosed for analysis. There is no need to disclose the data if it is inherently defined with fuzziness.

For instance, "car with fuzziness > 0.7" may be defined in XML as:

<CAR>

<COMPANY>

<NAME> Benz </NAME>

<FUZZ> 0.8 </FUZZ>

</COMPANY>

<COMPANY>

<NAME> Suzuki </NAME>

<FUZZ> 0.9 </FUZZ>

</COMPANY>

<COMPANY>

<NAME> Toyota </NAME>

<FUZZ> 0.6 </FUZZ>

</COMPANY>

<COMPANY>

<NAME> Skoda </NAME>

<FUZZ> 0.7 </FUZZ>

</COMPANY>

</CAR>

An XQuery using the projection operator for the demanded cars may be given as:

Name space default =

Validate <CAR> {

for $company in CAR/COMPANY

where $company/ Max($demand > 0.7)}

return <COMPANY> {$company/name, $company/fuzz} </COMPANY>

</CAR>

The fuzzy reasoning may be applied for fuzzy data mining.

Consider the more-demand fuzzy database obtained by decomposition (Tables 41 and 42).

INo | IName | Demand |
---|---|---|
I005 | Benz | 0.8 |
I007 | Suzuki | 0.9 |
I004 | Toyota | 0.6 |
I008 | Skoda | 0.7 |
I009 | Benz | 0.9 |

*Table 41. Fuzzy demand.*

INo | IName | Price |
---|---|---|
I005 | Benz | 0.7 |
I007 | Suzuki | 0.4 |
I004 | Toyota | 0.6 |
I008 | Skoda | 0.5 |
I009 | Benz | 0.7 |

*Table 42. Fuzzy price.*

The fuzzy reasoning [14] may be performed using fuzzy conditional inference.

The Zadeh [14] fuzzy conditional inference is given by

if x is P_{1} and x is P_{2} … and x is P_{n} then x is Q = min{1, 1 − min(μ_{P1}(x), μ_{P2}(x), …, μ_{Pn}(x)) + μ_{Q}(x)}

The Mamdani [7] fuzzy conditional inference is given by

if x is P_{1} and x is P_{2} … and x is P_{n} then x is Q = min{μ_{P1}(x), μ_{P2}(x), …, μ_{Pn}(x), μ_{Q}(x)}

The Reddy [12] fuzzy conditional inference is given by

if x is P_{1} and x is P_{2} … and x is P_{n} then x is Q = min(μ_{P1}(x), μ_{P2}(x), …, μ_{Pn}(x))
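The three inference rules can be sketched in Python for a single antecedent; the demand and price grades for Benz (I005) are taken from Tables 41 and 42:

```python
def zadeh(mu_p, mu_q):
    # Zadeh: min(1, 1 - mu_P + mu_Q)
    return min(1.0, 1 - mu_p + mu_q)

def mamdani(mu_p, mu_q):
    # Mamdani: min(mu_P, mu_Q)
    return min(mu_p, mu_q)

def reddy(mu_p, mu_q):
    # Reddy: only the antecedent grade is retained
    return mu_p

# Benz (I005): demand grade 0.8 (Table 41) and price grade 0.7 (Table 42).
z = zadeh(0.8, 0.7)
m = mamdani(0.8, 0.7)
```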

If x is Demand then x is Price

x is more Demand

------------------------------------

x is more Demand ∘ (Demand ➔ Price)

x is more Demand ∘ min{1, 1 − Demand + Price} (Zadeh)

x is more Demand ∘ min{Demand, Price} (Mamdani)

x is more Demand ∘ {Demand} (Reddy)

“If x is more demand, then x is more prices” is given in Tables 43 and 44.

INo | IName | More demand |
---|---|---|
I005 | Benz | 0.89 |
I007 | Suzuki | 0.95 |
I004 | Toyota | 0.77 |
I008 | Skoda | 0.84 |
I009 | Benz | 0.95 |

*Table 43. The fuzzy set "more demand".*

INo | IName | Zadeh | Mamdani | Reddy |
---|---|---|---|---|
I005 | Benz | 0.9 | 0.7 | 0.7 |
I007 | Suzuki | 0.5 | 0.4 | 0.4 |
I004 | Toyota | 1.0 | 0.6 | 0.6 |
I008 | Skoda | 0.8 | 0.5 | 0.5 |
I009 | Benz | 0.8 | 0.7 | 0.7 |

*Table 44. Inference by the three methods.*

The inference for price is given in Table 45.

INo | IName | Zadeh | Mamdani | Reddy |
---|---|---|---|---|
I005 | Benz | 0.89 | 0.7 | 0.7 |
I007 | Suzuki | 0.5 | 0.4 | 0.4 |
I004 | Toyota | 0.77 | 0.6 | 0.6 |
I008 | Skoda | 0.8 | 0.5 | 0.5 |
I009 | Benz | 0.8 | 0.7 | 0.7 |

*Table 45. Inference for price.*

So the business administrator can take the decision whether to increase the price or not.

## 7. Web intelligence and fuzzy data mining

Let C and D be the fuzzy rough sets (Tables 46–51).

R1 | d_{1} | d_{2} | … | d_{m} | μ |
---|---|---|---|---|---|
t_{1} | a_{11} | a_{12} | … | a_{1m} | μ_{d}(t_{1}) |
t_{2} | a_{21} | a_{22} | … | a_{2m} | μ_{d}(t_{2}) |
… | … | … | … | … | … |
t_{n} | a_{n1} | a_{n2} | … | a_{nm} | μ_{d}(t_{n}) |

*Table 46. Fuzzy relational dataset with membership.*

INo | IName | Price | μ |
---|---|---|---|
I005 | Shirt | 100 | 0.8 |
I007 | Dress | 50 | 0.4 |
I004 | Pants | 80 | 0.7 |
I008 | Jacket | 60 | 0.5 |
I009 | Skirt | 100 | 0.8 |

*Table 47. Fuzzy price.*

INo | IName | Demand | μ |
---|---|---|---|
I005 | Shirt | 80 | 0.8 |
I007 | Dress | 60 | 0.5 |
I004 | Pants | 100 | 0.8 |
I008 | Jacket | 50 | 0.5 |
I009 | Skirt | 80 | 0.8 |

*Table 48. Fuzzy demand.*

INo | IName | Price | μ |
---|---|---|---|
I005 | Shirt | 100 | 0.8 |
I007 | Dress | 50 | 0.5 |
I004 | Pants | 80 | 0.8 |
I008 | Jacket | 60 | 0.5 |
I009 | Skirt | 100 | 0.8 |

*Table 49. Fuzzy price.*

Company | μ |
---|---|
IBM | 0.8 |
Microsoft | 0.9 |
Google | 0.75 |

*Table 50. Fuzzy set of software companies.*

The operations on fuzzy rough sets of type 2 are given as:

1 − C = 1 − μ_{C}(x) (Negation)

C ∨ D = max{μ_{C}(x), μ_{D}(x)} (Union)

C ∧ D = min{μ_{C}(x), μ_{D}(x)} (Intersection)

XML data may be defined as

<SOFTWARE>

<COMPANY>

<NAME> IBM </NAME>

<FUZZ> 0.8 </FUZZ>

</COMPANY>

<COMPANY>

<NAME> Microsoft </NAME>

<FUZZ> 0.9 </FUZZ>

</COMPANY>

<COMPANY>

<NAME> Google </NAME>

<FUZZ> 0.75 </FUZZ>

</COMPANY>

</SOFTWARE>

An XQuery using the projection operator for the best software company may be given as:

Name space default =

Validate <SOFTWARE> {for $company in SOFTWARE/COMPANY where $company/ Max($fuzz)}

return <COMPANY> {$company/name, $company/fuzz} </COMPANY>

</SOFTWARE>

Similarly, the following problem may be considered for web programming.

Let P be a fuzzy proposition in a question-answering system.

P = Which is the city with the tallest buildings?

The answer is "x is the city with the tallest buildings."

For instance, the fuzzy set "most tallest buildings city" may be defined as

most tallest buildings city = 0.6/Hong Kong + 0.6/Dubai + 0.7/New York + 0.8/Taipei + 0.5/Tokyo

For the above question, the output is "tallest buildings city" = 0.8/Taipei, obtained by projection.
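The projection step can be sketched in Python; the grades follow the fuzzy set above, and the "Hong Kong" spelling is an assumption for the original city name:

```python
# The fuzzy set "most tallest buildings city" as element -> grade pairs;
# the "Hong Kong" spelling is an assumption for the original name.
city = {"Hong Kong": 0.6, "Dubai": 0.6, "New York": 0.7,
        "Taipei": 0.8, "Tokyo": 0.5}

# Projection: answer the question with the element of maximal membership.
answer = max(city, key=city.get)
grade = city[answer]
```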

The fuzzy algorithm using FUZZYALGOL is given as follows:

BEGIN

Variable most tallest buildings city = 0.6/Hong Kong + 0.6/Dubai + 0.7/New York + 0.8/Taipei + 0.5/Tokyo

most tallest buildings city = 0.8/Taipei

Return URL, fuzziness = Taipei, 0.8

END

The next problem is to find "the most visited PDF on type-2 fuzzy sets."

The fuzzy algorithm is:

Go to the most visited fuzzy set sites

Go to the most visited fuzzy sets type-2

Go to the most visited fuzzy sets type-2 PDF

The web program gets "the most visited fuzzy sets" and puts them in order.

The web program then gets "the most visited type-2 in fuzzy sets."

The web program then gets "the most visited PDF in type-2."

## 8. Conclusion

Data mining may have to deal with incomplete information. Bayesian theory needs exponential complexity to combine data; defining datasets with fuzziness inherently reduces this complexity. In this chapter, fuzzy MapReduce algorithms were studied based on functional dependencies; in particular, the fuzzy k-means MapReduce algorithm was studied using fuzzy functional dependencies. Data mining and fuzzy data mining were discussed, and a brief overview of the work on business intelligence was given as an example.

Most current web programming is unable to deal with incomplete information. In this chapter, a web intelligence system was discussed for fuzzy data mining. In addition, the fuzzy algorithmic language was discussed for designing fuzzy algorithms for data mining. Some examples were given for web intelligence and fuzzy data mining.

## Acknowledgments

The author thanks the reviewers and editors for the revision and review suggestions made on this work.