Open access peer-reviewed chapter

Classification in Multi-Label Datasets

Written By

Aouatef Mahani

Submitted: 17 September 2022 Reviewed: 05 December 2022 Published: 18 October 2023

DOI: 10.5772/intechopen.109352

From the Edited Volume

Information Systems Management

Edited by Rohit Raja and Hiral Raja


Abstract

Multi-label datasets contain several classes, where each class can take multiple values. They appear in several domains, such as categorization of music into emotions and directed marketing. In this chapter, we are interested in the most popular task of Data Mining, classification, and more precisely classification in multi-label datasets. To this end, we present the different methods used to extract knowledge from these datasets. These methods are divided into two categories: problem transformation methods and algorithm adaptation methods. The methods of the first category transform the multi-label classification problem into one or more single-label classification problems, while the methods of the second category extend a specific learning algorithm in order to handle multi-label datasets directly. We also present the different evaluation measures used to assess the quality of the extracted knowledge.

Keywords

  • classification
  • instance
  • classifier
  • rank
  • label

1. Introduction

Classification is the most popular task in Data Mining. It consists of assigning the appropriate class to an instance. Several kinds of classification problems exist, depending on the number of classes and the number of possible values of a class in a dataset. If a dataset contains a single class that can take two values, we speak of classification in binary datasets. If the single class has more than two values, we speak of classification in multi-class datasets. When a dataset contains several classes at a time, we speak of classification in multi-label datasets.

Multi-label datasets appear in several applications such as text categorization [1], image annotation [2, 3], web advertising [4], and music categorization [5]. In these applications, there are usually tens or hundreds of thousands of labels, and this number keeps increasing. Extracting knowledge from these datasets is important for decision making. Consequently, the problem of classification in this kind of dataset has become an important problem in machine learning, and it has attracted the attention of many researchers.

For supervised learning from multi-label datasets, there are two major tasks: multi-label classification (MLC) and label ranking (LR) [6]. The first task is concerned with learning a model that outputs a bipartition of the labels into relevant and irrelevant ones. The second task is concerned with learning a model that outputs an ordering of the class labels according to their relevance.

For both tasks, the approaches and techniques proposed to deal with classification in multi-label datasets are divided into two categories: problem transformation methods and algorithm adaptation methods [7]. In the first category, the multi-label classification problem is transformed into one or more single-label classification problems. In the second, existing learning algorithms are adapted to the problem at hand.

This chapter is organized as follows: Section 2 presents the notations used in the rest of the chapter. Section 3 presents the description measures of a multi-label dataset, and Section 4 presents the evaluation metrics used to assess performance on the test dataset. In Section 5, we detail the different approaches and techniques used to deal with the problem of classification in multi-label datasets. Finally, in Section 6, we make our concluding remarks.


2. Notations

In the rest of this chapter, we use the following notations:

D: the multi-label dataset under consideration.

N: the size of the multi-label dataset.

L: the set of labels.

Q: the number of labels.

Y: a set of labels, where Y ⊆ L and k = |Y|; it is known as a k-label set.

Ȳ: the complementary set of Y.


3. Description measures

In a multi-label dataset, the number of labels varies from one instance to another. For this reason, some instances may carry few labels compared with the total number of labels. This can influence the performance of the different methods and approaches used to deal with the classification problem in multi-label datasets. Therefore, a statistical analysis is necessary in order to describe a dataset [7, 8].

3.1 Label cardinality LC

LC indicates the average number of labels per instance (Eq. (1)).

LC = \frac{1}{N} \sum_{i=1}^{N} |Y_i| \qquad (1)

3.2 Label density LD

LD is the label cardinality divided by the total number of labels Q (Eq. (2)).

LD = \frac{1}{N} \sum_{i=1}^{N} \frac{|Y_i|}{Q} \qquad (2)

3.3 Distinct label sets DL

DL counts the number of label sets that are unique across the total number of instances (Eq. (3)).

DL = |\{ Y_i \mid \exists\, x_i \in X : (x_i, Y_i) \in D \}| \qquad (3)
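The three description measures can be computed directly from their definitions. A minimal sketch in Python, assuming each instance's label set is given as a Python set:

```python
# Sketch of the three description measures, assuming each instance's
# labels are given as a set drawn from the full label set L.
def label_cardinality(label_sets):
    """LC: average number of labels per instance (Eq. (1))."""
    return sum(len(y) for y in label_sets) / len(label_sets)

def label_density(label_sets, num_labels):
    """LD: label cardinality normalized by the number of labels Q (Eq. (2))."""
    return label_cardinality(label_sets) / num_labels

def distinct_label_sets(label_sets):
    """DL: number of unique label combinations in the dataset (Eq. (3))."""
    return len({frozenset(y) for y in label_sets})

# A toy dataset with N = 4 instances and Q = 3 labels
data = [{"a"}, {"a", "b"}, {"a", "b"}, {"b", "c"}]
print(label_cardinality(data))    # (1 + 2 + 2 + 2) / 4 = 1.75
print(distinct_label_sets(data))  # {a}, {a,b}, {b,c} -> 3
```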

4. Evaluation measures

For classical classification, different performance measures have been proposed, such as accuracy and coverage. However, the performance measures for classification in multi-label datasets are more complicated than those for single-label datasets. Consequently, a number of evaluation measures have been proposed specifically for multi-label datasets. These measures fall into two groups: example-based measures and label-based ones.

4.1 Example-based measures

These measures evaluate each instance of the test dataset and return the mean value. They are further divided into two groups: prediction-based measures and ranking-based ones. The former use a learning system and are calculated from the average difference between the actual and the predicted sets of labels over all test instances, whereas the latter evaluate the quality of the label ranking based on the scoring function f(·,·).

4.1.1 Prediction-based measures

Hamming Loss [9] represents the fraction of misclassified labels. The lower this measure, the better the performance of the classifier (Eq. (4)).

HL = \frac{1}{N} \sum_{i=1}^{N} \frac{|Y_i \,\Delta\, Z_i|}{Q} \qquad (4)

Where: Y_i Δ Z_i denotes the symmetric difference (XOR) of the true label set Y_i and the predicted label set Z_i.

Classification Accuracy [10] represents the fraction of well-classified instances. It is very strict, as it requires the predicted set of labels to be an exact match of the true set of labels. It is also known as Subset Accuracy [11] (Eq. (5)).

Classification\ Accuracy = \frac{1}{N} \sum_{i=1}^{N} I(Z_i = Y_i) \qquad (5)

Where: I(Z_i = Y_i) = 1 if Z_i = Y_i and 0 otherwise.

Accuracy [12] represents the percentage of correctly predicted labels among all predicted and true labels (Eq. (6)).

Accuracy = \frac{1}{N} \sum_{i=1}^{N} \frac{|Y_i \cap Z_i|}{|Y_i \cup Z_i|} \qquad (6)

Precision represents the proportion of true positive predictions (Eq. (7)) [13].

Precision = \frac{1}{N} \sum_{i=1}^{N} \frac{|Y_i \cap Z_i|}{|Z_i|} \qquad (7)

Recall estimates the proportion of true labels that have been predicted as positive (Eq. (8)) [13].

Recall = \frac{1}{N} \sum_{i=1}^{N} \frac{|Y_i \cap Z_i|}{|Y_i|} \qquad (8)
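The prediction-based measures follow directly from their set-based definitions. A minimal sketch, assuming Y[i] and Z[i] are the true and predicted label sets of instance i (and that no instance has both sets empty, which would make the Jaccard-style accuracy undefined):

```python
# Hedged sketch of the example-based prediction measures, where Y[i] is
# the true label set and Z[i] the predicted label set of instance i.
def hamming_loss(Y, Z, num_labels):
    # Fraction of label positions where truth and prediction disagree
    return sum(len(y ^ z) for y, z in zip(Y, Z)) / (len(Y) * num_labels)

def subset_accuracy(Y, Z):
    # Fraction of instances whose predicted set is an exact match
    return sum(y == z for y, z in zip(Y, Z)) / len(Y)

def accuracy(Y, Z):
    # Jaccard similarity of truth and prediction, averaged over instances
    return sum(len(y & z) / len(y | z) for y, z in zip(Y, Z)) / len(Y)

def precision(Y, Z):
    # Average fraction of predicted labels that are correct
    return sum(len(y & z) / len(z) for y, z in zip(Y, Z)) / len(Y)

def recall(Y, Z):
    # Average fraction of true labels that were predicted
    return sum(len(y & z) / len(y) for y, z in zip(Y, Z)) / len(Y)

Y = [{"a", "b"}, {"b"}]
Z = [{"a"}, {"b", "c"}]
print(subset_accuracy(Y, Z))  # no exact match -> 0.0
print(precision(Y, Z))        # (1/1 + 1/2) / 2 = 0.75
```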

4.1.2 Ranking-based metrics [14]

Coverage error evaluates how many steps are needed, on average, to move down the ranked label list so as to cover all the relevant labels of the instance (Eq. (9)).

Coverage\ error = \frac{1}{N} \sum_{i=1}^{N} \max_{y \in Y_i} rank_f(X_i, y) - 1 \qquad (9)

One-error computes how many times the top-ranked label is not in the true set of labels of the instance (Eq. (10)).

One\text{-}error = \frac{1}{N} \sum_{i=1}^{N} I\big(\arg\max_{y \in L} f(X_i, y) \notin Y_i\big) \qquad (10)
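Both ranking-based measures can be sketched from the scoring function alone. A minimal illustration, assuming each instance's scores are given as a dictionary mapping every label to f(x, y), with higher scores ranked first:

```python
# Sketch of the two ranking-based measures, assuming scores[i] maps each
# label to the value of the scoring function f(x_i, y); higher is better.
def rank(scores, label):
    # 1-based rank of `label` when labels are sorted by descending score
    return 1 + sum(s > scores[label] for s in scores.values())

def coverage_error(score_list, true_sets):
    # Average depth in the ranking needed to cover all relevant labels
    total = 0
    for scores, Y in zip(score_list, true_sets):
        total += max(rank(scores, y) for y in Y) - 1
    return total / len(true_sets)

def one_error(score_list, true_sets):
    # Fraction of instances whose top-ranked label is not relevant
    errors = 0
    for scores, Y in zip(score_list, true_sets):
        top = max(scores, key=scores.get)
        errors += top not in Y
    return errors / len(true_sets)

scores = [{"a": 0.9, "b": 0.2, "c": 0.5}, {"a": 0.1, "b": 0.8, "c": 0.6}]
truth = [{"a", "c"}, {"a"}]
print(coverage_error(scores, truth))  # (1 + 2) / 2 = 1.5
print(one_error(scores, truth))       # second top label "b" not in {a} -> 0.5
```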

4.2 Label-based measures

To present the label-based measures, we first compute, for each label y_i, the four components of its confusion matrix: TP_i, FP_i, TN_i, and FN_i, which represent respectively the true positives, false positives, true negatives, and false negatives (Eqs. (11)-(14)) [15].

TP_i = |\{ X_j \mid y_i \in Y_j \text{ and } y_i \in Z_j,\ 1 \le j \le N \}| \qquad (11)

FP_i = |\{ X_j \mid y_i \notin Y_j \text{ and } y_i \in Z_j,\ 1 \le j \le N \}| \qquad (12)

TN_i = |\{ X_j \mid y_i \notin Y_j \text{ and } y_i \notin Z_j,\ 1 \le j \le N \}| \qquad (13)

FN_i = |\{ X_j \mid y_i \in Y_j \text{ and } y_i \notin Z_j,\ 1 \le j \le N \}| \qquad (14)

The label measures evaluate each label, and they return the average. The calculation of the average of all the labels can be achieved using two operations: macro-average and micro-average [16]. In macro-average, we calculate the performance measure of each label (Eqs. (15) and (16)), and then we take the average. On the other hand, in micro-average, we calculate the average performance measure of all the labels (Eqs. (17) and (18)).

Macro-averaged precision: \frac{1}{L} \sum_{i=1}^{L} \frac{TP_i}{TP_i + FP_i} \qquad (15)

Macro-averaged recall: \frac{1}{L} \sum_{i=1}^{L} \frac{TP_i}{TP_i + FN_i} \qquad (16)

Micro-averaged precision: \frac{\sum_{i=1}^{L} TP_i}{\sum_{i=1}^{L} TP_i + \sum_{i=1}^{L} FP_i} \qquad (17)

Micro-averaged recall: \frac{\sum_{i=1}^{L} TP_i}{\sum_{i=1}^{L} TP_i + \sum_{i=1}^{L} FN_i} \qquad (18)
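The difference between the two averaging schemes is easy to see on a small example. A sketch, assuming tp[i] and fp[i] hold the per-label counts TP_i and FP_i:

```python
# Macro- vs micro-averaged precision from per-label confusion counts;
# tp[i] and fp[i] are assumed to be TP_i and FP_i for label i.
def macro_precision(tp, fp):
    # Average of per-label precisions: each label weighs equally
    return sum(t / (t + f) for t, f in zip(tp, fp)) / len(tp)

def micro_precision(tp, fp):
    # Precision of the pooled counts: each prediction weighs equally
    return sum(tp) / (sum(tp) + sum(fp))

# A frequent label (90 TP, 10 FP) and a rare one (10 TP, 10 FP)
tp, fp = [90, 10], [10, 10]
print(macro_precision(tp, fp))  # (0.9 + 0.5) / 2 = 0.7
print(micro_precision(tp, fp))  # 100 / 120, dominated by the frequent label
```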


5. Approaches and methods

The existing methods used to handle the classification problem in multi-label datasets are divided into two groups: problem transformation methods and algorithm adaptation methods.

5.1 Problem transformation methods

This group transforms the multi-label classification problem into one or more single-label classification problems [17].

5.1.1 Copy transformation method

This method [18] creates a single-label dataset from the original multi-label one: it replaces each multi-label instance having |Yi| labels by |Yi| single-label instances. The variants of this method are the copy-weight transformation, the select family of transformations, and the ignore transformation. The first variant associates a weight with each produced instance. In the second, for each set of created instances, only one instance is kept, by applying the select-max method (which keeps the instance with the most frequent label), the select-min method (the least frequent label), or the select-random method (a randomly chosen instance). The last variant simply deletes all multi-label instances.

5.1.2 Binary relevance (BR)

BR [17] is one of the most popular methods. It generates one dataset per label, where each dataset contains all instances but a single binary class, which may be positive or negative. For each instance of the ith dataset, if its set of labels contains the ith label, then its class is positive; otherwise, its class is negative.

For each dataset, a classifier is generated. To classify a new instance, the BR method returns the union of all labels predicted by generated classifiers.

Although BR is a simple transformation method, it has been strongly criticized due to its incapacity of handling label dependency information [19].
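The BR transformation itself is independent of the base learner. The sketch below assumes a generic fit/predict interface; the `Prototype` class is a toy nearest-mean stand-in introduced only to make the sketch runnable, not part of BR:

```python
# A minimal Binary Relevance sketch. `make_base()` must return an object
# with fit(X, y) and predict(x); Prototype is a toy stand-in base learner.
class Prototype:
    """Toy base learner: predicts 1 if x is closer to the positive mean."""
    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.pos = sum(pos) / len(pos) if pos else float("inf")
        self.neg = sum(neg) / len(neg) if neg else float("-inf")
    def predict(self, x):
        return 1 if abs(x - self.pos) <= abs(x - self.neg) else 0

def br_fit(X, label_sets, labels, make_base=Prototype):
    # One binary classifier per label: positive iff the label is present
    classifiers = {}
    for l in labels:
        y = [1 if l in s else 0 for s in label_sets]
        clf = make_base()
        clf.fit(X, y)
        classifiers[l] = clf
    return classifiers

def br_predict(classifiers, x):
    # Union of the labels predicted positive by each binary classifier
    return {l for l, clf in classifiers.items() if clf.predict(x) == 1}

# Toy 1-D dataset: small values carry "low", large values carry "high"
X = [1.0, 2.0, 8.0, 9.0]
S = [{"low"}, {"low"}, {"high"}, {"high"}]
model = br_fit(X, S, ["low", "high"])
print(br_predict(model, 1.5))  # {'low'}
```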

5.1.3 Label power set (LP)

The LP method [7] considers each set of labels of an instance as one class. To classify a new instance, LP outputs the most probable class.

LP takes into account label dependence, but it has two drawbacks. First, the learning step becomes difficult when the number of label sets increases, especially when this number is exponential [20]. Second, the class imbalance problem can appear when there are some label sets that are represented by very few instances in the training dataset [20].
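The LP transformation can be sketched in a few lines: each distinct label set becomes one atomic class of a single-label problem. Note how few instances some classes may have, which is exactly the imbalance drawback mentioned above:

```python
# A minimal Label Power set transformation: each distinct label set
# becomes one atomic class of a single-label problem.
def lp_transform(label_sets):
    classes = {}                       # label set -> atomic class id
    y = []
    for s in label_sets:
        key = frozenset(s)
        if key not in classes:
            classes[key] = len(classes)
        y.append(classes[key])
    return y, classes

S = [{"a"}, {"a", "b"}, {"a"}, {"b", "c"}]
y, classes = lp_transform(S)
print(y)             # [0, 1, 0, 2]
print(len(classes))  # 3 atomic classes; two of them have a single instance
```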

5.1.4 Random K-labelsets (RAKEL)

RAKEL [7] generates m Label Power set (LP) classifiers. To construct the ith LP classifier, we randomly select, without replacement, a k-label set Zi from the set of all k-label sets of L, and we build the corresponding training dataset. The number of iterations m and the size k of a label set are user-specified parameters. The different steps are detailed in this algorithm:

Input: training dataset D, set of labels L, parameters m and k
Output: m classifiers Hi and their corresponding k-label sets Zi
Begin

  1. Construct the set R of all k-label sets of L

  2. for i := 1 to min(m, |Lk|) do

    2.1. Select randomly the k-label set Zi from R; R := R \ {Zi}

    2.2. Construct the corresponding training dataset Di:

      • Di := Ø

      • For each instance (Xj, Sj) from D do

        • W := Sj ∩ Zi

        • If W = Ø, then W is the empty class

        • Di := Di ∪ {(Xj, W)}

    2.3. Build the classifier Hi using Di

End.

To classify a new instance, each classifier uses its corresponding k-label set as it is illustrated in this algorithm:

Input: new instance X, the m k-label sets Zj, L, the m LP classifiers Hj, and the threshold T
Output: vector of predictions V
Begin

  1. for i := 1 to |L| do sumi := 0; votesi := 0

  2. for j := 1 to m do

    for each label li ∈ Zj do sumi := sumi + Hj(X, li); votesi := votesi + 1

  3. for i := 1 to |L| do

    Avgi := sumi / votesi

    If (Avgi > T), then Vi := 1 else Vi := 0

End.

5.1.5 Ranking by pair-wise comparison (RPC)

RPC [21] produces Q*(Q-1)/2 binary datasets from the original dataset, one for each pair of labels (li, lj) with 1 ≤ i < j ≤ Q. Each dataset contains only the instances that have label li or lj, but not both, and it is used to train a binary classifier. To classify a new instance, each binary classifier votes for one of its two labels, and the labels are then ranked by majority vote.

5.1.6 Calibrated label ranking (CLR)

CLR [22] is a technique that extends RPC by introducing a new virtual label. The latter is known as the calibration label, and it is considered as a breaking point of the ranking that splits the set of labels into two sets: relevant labels and irrelevant labels.

5.1.7 Classifier chain model (CC)

CC [19] produces Q binary classifiers, as Binary Relevance does, but each classifier depends on the previous ones: the feature space of the jth classifier is extended with the values of the j-1 preceding labels.

Example:

The dataset below has five instances and four labels l1, …, l4:

Instance   l1   l2   l3   l4
X1          1    1    1    0
X2          0    0    1    1
X3          1    1    0    1
X4          0    0    1    0
X5          1    0    1    1

Classifier H1 is trained on the attributes alone to predict l1; H2 is trained on the attributes extended with l1 to predict l2; H3 on the attributes extended with (l1, l2) to predict l3; and H4 on the attributes extended with (l1, l2, l3) to predict l4.
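The chaining idea can be sketched as follows, assuming a generic base learner with a fit/predict interface; the `Memorize` class is a toy stand-in used only to make the sketch runnable:

```python
# Sketch of the Classifier Chain transformation: classifier j is trained
# on the original attributes augmented with the j-1 previous labels.
class Memorize:
    """Toy base learner: memorizes training rows, falls back to 0."""
    def fit(self, X, y):
        self.table = {tuple(x): t for x, t in zip(X, y)}
    def predict(self, x):
        return self.table.get(tuple(x), 0)

def chain_fit(X, label_matrix, make_base=Memorize):
    # label_matrix[i][j] is 1 if instance i has label j, else 0
    chain = []
    num_labels = len(label_matrix[0])
    for j in range(num_labels):
        # Augment each row with the true values of the previous labels
        Xj = [list(x) + [row[k] for k in range(j)]
              for x, row in zip(X, label_matrix)]
        clf = make_base()
        clf.fit(Xj, [row[j] for row in label_matrix])
        chain.append(clf)
    return chain

def chain_predict(chain, x):
    # At prediction time, previous *predicted* labels feed the next model
    preds = []
    for clf in chain:
        preds.append(clf.predict(list(x) + preds))
    return preds

X = [[1.0], [2.0], [3.0]]
Y = [[1, 0], [0, 1], [1, 1]]
chain = chain_fit(X, Y)
print(chain_predict(chain, [2.0]))  # [0, 1]
```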

5.1.8 Ensemble of classifier chains (ECC)

This technique [23] uses classifier chains as a base classifier. It trains several CC classifiers using a standard bagging scheme. The produced binary models of each chain are ordered according to a random seed. Each model predicts different label sets. These predictions are summed per label so that each label receives a number of votes. A threshold is used to select the most popular labels that form the final predicted multi-label set.

5.1.9 Pruned sets (PS)

PS [24] creates a new training dataset P from the original training dataset D by pruning infrequent label sets. This operation is controlled by a parameter p, which determines how often a label combination must occur for it not to be pruned. This algorithm summarizes the operation:

Input: the original dataset D and the parameter p
Output: the pruned dataset P and the set of label sets LC
Begin

  1. P := Ø; LC := Ø

  2. for each instance (Xi, Si) from D do

    If Si ∈ LC, then increment its frequency c by 1; else LC := LC ∪ {(Si, 1)}

  3. for each instance (Xi, Si) from D do

    • Use LC to retrieve the frequency of Si: (Si, c)

    • If c > p, then P := P ∪ {(Xi, Si)} else (Xi, Si) is considered as a pruned instance

  4. for each pruned instance (Xi, Si) do

    • Decompose Si into subsets si0, si1, …, sin, where each sij belongs to LC and its frequency c is > p

    • for each sij do form the new instance (Xi, sij); P := P ∪ {(Xi, sij)}

End.

The pruned instances are reintroduced into the training dataset in the form of new instances with smaller and more commonly found label sets. This preserves the instances and the information about their label sets. However, the size of the training dataset increases, and the average number of labels per instance becomes lower, which can in turn cause too few labels to be predicted at classification time.

5.1.10 Ensembles of pruned sets (EPS)

The PS method cannot create new multi-label sets that have not been seen in the training dataset. Consequently, it poses a problem when working with datasets whose labeling is particularly irregular or complex. To solve this problem, an ensemble of PS classifiers [24] has been proposed. The build phase of EPS is straightforward: over m iterations, a subset of the training set is sampled, and a PS classifier with relevant parameters is trained on this subset. For prediction, a threshold t is used, and the different multi-label predictions are combined into a final prediction. This final label set may not have been seen by any of the individual PS models, allowing greater classification potential.

5.1.11 Hierarchy of multilabel classifiers (HOMER)

HOMER [25] is an effective and computationally efficient method for the multi-label classification problem. Its principle consists of constructing a hierarchy of multi-label classifiers in the form of a tree, following the divide-and-conquer strategy. Each classifier deals with a much smaller set of labels compared with the full set L. Each node of the tree contains a set of labels and the produced classifier: the root contains the set of all labels, and each leaf contains a single label. Each internal node contains the union of the label sets of its children. The tree is constructed by following these steps:

  1. The root contains all labels.

  2. Train the classifier H1 using all training dataset.

  3. For each node n that contains more than a single label do:

    • Create k children.

    • Each child filters the training dataset of its parent by keeping the instances that have at least one of its own labels.

    • Train the classifier Hn using the filtered dataset.

The question is how to distribute the labels of a node among its k children.

The labels are evenly distributed into k subsets such that labels belonging to the same subset are as similar as possible. To do this, HOMER uses a balanced clustering algorithm known as balanced k-means.
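The node-splitting step can be sketched as follows. The real method clusters labels with balanced k-means; here a naive round-robin split stands in as an assumption, which keeps the subsets balanced but ignores label similarity:

```python
# A simplified sketch of HOMER's node-splitting step. Balanced k-means
# is replaced by a naive round-robin split (an assumption of this sketch).
def split_labels(labels, k):
    # Distribute the node's labels evenly over k children
    return [labels[i::k] for i in range(k)]

def filter_dataset(dataset, child_labels):
    # A child keeps only instances carrying at least one of its labels,
    # restricted to the child's own label set
    wanted = set(child_labels)
    return [(x, s & wanted) for x, s in dataset if s & wanted]

D = [("x1", {"a", "c"}), ("x2", {"b"}), ("x3", {"d"})]
children = split_labels(["a", "b", "c", "d"], 2)
print(children)                              # [['a', 'c'], ['b', 'd']]
print(len(filter_dataset(D, children[0])))   # only x1 reaches the first child
```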

5.2 Algorithm adaptation methods

5.2.1 Decision trees

The decision tree algorithm C4.5 [26] is an efficient and robust machine learning algorithm. It constructs a tree top-down, in which each node contains the most suitable attribute. The suitable attribute is selected using the information gain (Eq. (19)), which is the difference between the entropy of the remaining instances in the training dataset and the weighted sum of the entropies of the subsets obtained by partitioning on the values of that attribute.

information\ gain(D, A) = entropy(D) - \sum_{v \in V_A} \frac{|D_v|}{|D|}\, entropy(D_v) \qquad (19)

Where: D is the training dataset, A is the considered attribute, V_A is the set of possible values of the attribute A, D_v is the subset of training instances in which the attribute A takes the value v, and the entropy of a set of instances is defined in Eq. (20):

entropyD=i=1NpcilogpciE16

Where: p(ci) is the probability (relative frequency) of class ci in this set.

This entropy formula is specific to single-label data, where each leaf contains one class. Therefore, the C4.5 algorithm cannot handle multi-label datasets directly, and it is necessary to modify the entropy formula.

In [27], the learning process is adapted by allowing multiple labels in the leaves of the tree. The formula for calculating the entropy is modified to solve multi-label problems: the modified entropy sums the entropies of each individual class label (Eq. (21)).

entropy(D) = -\sum_{i=1}^{Q} \big( p(c_i) \log p(c_i) + q(c_i) \log q(c_i) \big) \qquad (21)

Where: p(ci) is the relative frequency of class label ci, and q(ci) = 1- p(ci).
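The modified entropy of Eq. (21) treats each label as a Bernoulli variable and sums the per-label entropies. A minimal sketch, assuming the labels of each instance are given as a Python set:

```python
import math

# Sketch of the multi-label entropy of Eq. (21): one Bernoulli entropy
# term per label, summed over the Q labels.
def multilabel_entropy(label_sets, labels):
    H = 0.0
    n = len(label_sets)
    for l in labels:
        p = sum(l in s for s in label_sets) / n   # relative frequency of l
        for q in (p, 1 - p):                      # p(ci) and q(ci) = 1 - p(ci)
            if q > 0:                             # 0 log 0 taken as 0
                H -= q * math.log2(q)
    return H

S = [{"a"}, {"a", "b"}, set(), {"b"}]
print(multilabel_entropy(S, ["a", "b"]))  # both labels are 50/50 -> 1 + 1 = 2.0
```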

5.2.2 K-nearest neighbors KNN

Several methods based on the KNN algorithm exist. ML-KNN [28] is the extension of KNN to the classification problem in multi-label datasets. It computes prior and posterior probabilities to determine the labels of a test instance. We introduce these notations before presenting ML-KNN:

  • The category vector y of an instance X: a vector of size Q, where y(l) = 1 if l ∈ Y and 0 otherwise.

  • The K-nearest neighbors of X: N(X).

  • The membership counting vector C: counts the frequency of each label among N(X).

  • The event H1l: the instance X has label l.

  • The event H0l: X does not have label l.

  • The event Ejl (j ∈ {0, 1, …, K}): among the K nearest neighbors of X, exactly j instances have label l.

To classify the test instance T, we follow these steps:

  • Compute the prior probability P(H1l) of each label l using all the training dataset (Eq. (22)):

    P(H_1^l) = \frac{s + \sum_{i=1}^{N} y_{x_i}(l)}{s \times 2 + N} \qquad (22)

Where: N is the size of the training dataset and s is an input argument, a smoothing parameter controlling the strength of the uniform prior.

  • Determine the K-nearest neighbors of T.

  • Compute the posterior probabilities P(E_j^l | H_b^l) for each label l and each count j (Eqs. (23) and (24)):

    P(E_j^l \mid H_1^l) = \frac{s + c_1[j]}{s \times (K + 1) + \sum_{p=0}^{K} c_1[p]} \qquad (23)

    P(E_j^l \mid H_0^l) = \frac{s + c_2[j]}{s \times (K + 1) + \sum_{p=0}^{K} c_2[p]} \qquad (24)

Where the vectors C1 and C2 are computed for each label and each instance.

  • Compute the prediction using the posterior probabilities.
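The first step, the smoothed prior of Eq. (22), can be sketched directly. A minimal illustration, assuming each training instance's labels are given as a Python set:

```python
# Sketch of the ML-kNN prior of Eq. (22), with smoothing parameter s.
def prior(label_sets, label, s=1.0):
    # Smoothed relative frequency of `label` over the N training instances
    n = len(label_sets)
    count = sum(label in y for y in label_sets)
    return (s + count) / (s * 2 + n)

S = [{"a"}, {"a", "b"}, {"b"}, {"b"}]
print(prior(S, "a"))  # (1 + 2) / (2 + 4) = 0.5
print(prior(S, "b"))  # (1 + 3) / (2 + 4)
```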

5.2.3 Support vector machine

Support vector machines (SVMs) have been extended to handle the multi-label problem. For example, Rank-SVM [29] defines a linear model based on a ranking system combined with a label set size predictor, with the aim of minimizing the ranking loss (Eq. (25)) and maximizing the margin.

RLoss = \frac{1}{N} \sum_{i=1}^{N} \frac{|R(x_i)|}{|Y_i| \cdot |\bar{Y}_i|} \qquad (25)

Where R(x_i) = {(l1, l2) ∈ Y_i × Ȳ_i | f(x_i, l1) ≤ f(x_i, l2)}, Ȳ_i denotes the complement of Y_i in L, and f is the scoring function that gives each label l a score interpreted as the probability that l is relevant.
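The ranking loss of Eq. (25) counts the pairs of (relevant, irrelevant) labels that the scoring function orders wrongly. A minimal sketch, assuming each instance's scores are given as a dictionary mapping labels to f(x, l):

```python
# Sketch of the ranking loss of Eq. (25): the fraction of label pairs
# (relevant, irrelevant) that the scoring function f orders wrongly.
def ranking_loss(score_list, true_sets, all_labels):
    total = 0.0
    for scores, Y in zip(score_list, true_sets):
        Ybar = set(all_labels) - Y
        # Count misordered pairs: a relevant label scored no higher
        # than an irrelevant one
        bad = sum(scores[r] <= scores[i] for r in Y for i in Ybar)
        total += bad / (len(Y) * len(Ybar))
    return total / len(true_sets)

scores = [{"a": 0.9, "b": 0.3, "c": 0.4}]
print(ranking_loss(scores, [{"a", "b"}], ["a", "b", "c"]))  # pair (b, c) misordered -> 0.5
```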

5.2.4 Ensemble methods

AdaBoost.MH [30] is the extension of the AdaBoost algorithm designed to minimize the Hamming loss; for more details, see [31]. The minimization is done by decomposing the problem into k orthogonal binary classification problems.

AdaBoost.MR [30] is designed to find a hypothesis that ranks the correct labels at the top.


6. Conclusion

In this chapter, we have presented the classification problem in multi-label datasets, an important problem because these datasets appear in several domains. We have presented the description measures and the suitable metrics to evaluate the performance of the extracted knowledge. Then, we have reviewed the different approaches and methods used to deal with this problem, which are divided into two main categories: problem transformation methods and algorithm adaptation methods.

In future work, we are planning to present a state of the art about different approaches and techniques used to handle the classification problem in imbalanced multi-label datasets.

References

  1. Dekel O, Shamir O. Multiclass-multilabel learning when the label set grows with the number of examples. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS 2010); 13-15 May 2010; Chia Laguna, Sardinia, Italy. 2010
  2. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009); 20-25 June 2009; Miami, Florida, USA. 2009. pp. 248-255
  3. Nguyen CT, Zhan DC, Zhou ZH. Multi-modal image annotation with multi-instance multi-label LDA. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI'13); 3-9 August 2013; Beijing, China. 2013. pp. 1558-1564
  4. Beygelzimer A, Langford J, Lifshits Y, Sorkin G, Strehl A. Conditional probability tree estimation analysis and algorithm. In: Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009); 18-21 June 2009; Montreal, Canada. 2009. pp. 51-58
  5. Turnbull D, Barrington L, Torres D, Lanckriet G. Semantic annotation and retrieval of music and sound effects. IEEE Transactions on Audio, Speech and Language Processing. 2008;16:467-476. DOI: 10.1109/TASL.2007.913750
  6. Vembu S, Gärtner T. Label ranking algorithms: A survey. In: Furnkranz J, Hullermeier E, editors. Preference Learning Handbook. Berlin, Heidelberg: Springer; 2009. pp. 45-64. DOI: 10.1007/978-3-642-14125-6_3
  7. Tsoumakas G, Vlahavas I. Random k-labelsets: An ensemble method for multilabel classification. In: Proceedings of the 18th European Conference on Machine Learning (ECML 2007); 17-21 September 2007; Warsaw, Poland. 2007. pp. 406-417
  8. Tsoumakas G, Katakis I, Vlahavas I. Mining multi-label data. In: Maimon O, Rokach L, editors. Data Mining and Knowledge Discovery Handbook. 2nd ed. Boston, MA: Springer; 2010. pp. 667-685. DOI: 10.1007/978-0-387-09823-4_34
  9. Xia Y, Nie L, Zhang L, Yang Y, Hong R, Li X. Weakly supervised multilabel clustering and its applications in computer vision. IEEE Transactions on Cybernetics. 2016;46:3220-3232. DOI: 10.1109/TCYB.2015.2501385
  10. Zhu S, Ji X, Xu W, Gong Y. Multi-labelled classification using maximum entropy method. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'05); 15-19 August 2005; Salvador, Brazil. 2005. pp. 274-281
  11. Ghamrawi N, McCallum A. Collective multi-label classification. In: Proceedings of the 2005 ACM Conference on Information and Knowledge Management (CIKM'05); 31 October - 5 November 2005; Bremen, Germany. 2005. pp. 195-200
  12. Godbole S, Sarawagi S. Discriminative methods for multi-labeled classification. In: Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2004); 26-28 May 2004; Sydney, Australia. 2004. pp. 22-30
  13. Kanj S. Learning methods for multi-label classification [thesis]. The University of Technology of Compiègne. 2013
  14. Zhang ML, Zhou ZH. A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering. 2014;26:1819-1837. DOI: 10.1109/TKDE.2013.39
  15. Madjarov G, Kocev D, Gjorgjevikj D, Dzeroski S. An extensive experimental comparison of methods for multi-label learning. Pattern Recognition. 2012;45:3084-3104. DOI: 10.1016/J.PATCOG.2012.03.004
  16. Yang Y. An evaluation of statistical approaches to text categorization. Information Retrieval. 1999;1:69-90. DOI: 10.1023/A:1009982220290
  17. Boutell M, Luo J, Shen X, Brown C. Learning multi-label scene classification. Pattern Recognition. 2004;37:1757-1771. DOI: 10.1016/j.patcog.2004.03.009
  18. Ganda D, Buch R. A survey on multi label classification. Recent Trends in Programming Languages. 2018;5:19-23. DOI: 10.37591/RTPL
  19. Read J, Pfahringer B, Holmes G, Frank E. Classifier chains for multi-label classification. Machine Learning. 2011;85:333-359. DOI: 10.1007/s10994-011-5256-5
  20. Cherman EA, Monard MC, Metz J. Multi-label problem transformation methods: A case study. CLEI Electronic Journal. 2011;14:4. DOI: 10.19153/cleiej.14.1.4
  21. Hüllermeier E, Fürnkranz J, Cheng W, Brinker K. Label ranking by learning pairwise preferences. Artificial Intelligence. 2008;172:1897-1916. DOI: 10.1016/j.artint.2008.08.002
  22. Fürnkranz J, Hüllermeier E, Mencia EL, Brinker K. Multilabel classification via calibrated label ranking. Machine Learning. 2008;73:133-153. DOI: 10.1007/s10994-008-5064-8
  23. Read J, Pfahringer B, Holmes G, Frank E. Classifier chains for multi-label classification. In: Proceedings of the 13th European Conference on Principles and Practice of Knowledge Discovery in Databases and 20th European Conference on Machine Learning (ECML PKDD 2009), Part II, LNAI 5782; 7-11 September 2009; Berlin. 2009. pp. 254-269
  24. Read J, Pfahringer B, Holmes G. Multi-label classification using ensembles of pruned sets. In: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining (ICDM'08); 15-19 December 2008; USA. 2008. pp. 995-1000
  25. Tsoumakas G, Katakis I, Vlahavas I. Effective and efficient multi-label classification in domains with large number of labels. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) Workshop on Mining Multidimensional Data; 15-19 September 2008; Belgium. 2008. pp. 30-44
  26. Lakshmi BN, Indumathi TS, Ravi N. A study on C.5 decision tree classification algorithm for risk predictions during pregnancy. Procedia Technology. 2016;24:1542-1549. DOI: 10.1016/J.PROTCY.2016.05.128
  27. Clare A, King RD. Knowledge discovery in multi-label phenotype data. In: Proceedings of the 5th European Conference on Principles of Knowledge Discovery in Databases (PKDD 2001); 3-7 September 2001; Freiburg, Baden-Württemberg, Germany. 2001. pp. 42-53
  28. Zhang ML, Zhou ZH. ML-kNN: A lazy learning approach to multi-label learning. Pattern Recognition. 2007;40:2038-2048. DOI: 10.1016/j.patcog.2006.12.019
  29. Elisseeff A, Weston J. A kernel method for multi-labelled classification. In: Proceedings of the Annual ACM Conference on Research and Development in Information Retrieval (SIGIR'05); 15-19 August 2005; Salvador, Brazil. 2005. pp. 274-281
  30. Schapire RE, Singer Y. BoosTexter: A boosting-based system for text categorization. Machine Learning. 2000;39:135-168. DOI: 10.1023/A:1007649029923
  31. Schapire RE, Singer Y. Improved boosting algorithms using confidence-rated predictions. Machine Learning. 1999;37:297-336. DOI: 10.1023/A:1007614523901

Written By

Aouatef Mahani

Submitted: 17 September 2022 Reviewed: 05 December 2022 Published: 18 October 2023