Open access peer-reviewed chapter

A Food Recommender Based on Frequent Sets of Food Mining Using Image Recognition

By Thunchanok Tangpong, Somkiet Leanghirun, Aran Hansuebsai and Kosuke Takano

Submitted: October 7th 2020; Reviewed: March 12th 2021; Published: April 19th 2021

DOI: 10.5772/intechopen.97186

Abstract

Food recommendation is an important service in daily life. To build such a system, we collected sets of food images that were shared or reviewed on social networks and the web, since they carry information about what people actually choose in daily life. In the field of representation learning, we propose a scalable architecture that integrates different deep neural networks (DNNs) by means of a reliability score assigned to each DNN. This score allows the integrated DNN to select a suitable recognition result from among the different DNNs, which are constructed independently. The food sets recognized in the images were then passed to the Apriori data mining algorithm, and the resulting frequent food sets were used for the food recommendation process. In this study, we evaluated the feasibility of the proposed method.

Keywords

  • food recommender
  • food data mining
  • image recognition
  • deep neural network
  • data mining algorithm

1. Introduction

People now consume more high-energy foods, fats and meat, and most do not eat enough fruit, vegetables and other dietary fiber. The make-up of a diversified and combined diet varies with individual characteristics such as age, gender, lifestyle and degree of physical activity, as well as with cultural context, locally available foods and dietary customs. As food services grow in importance in the food market, the food service industry has become a vital part of the economy [1, 2, 3]. The business relies on its management to control costs, keep customers happy, and ensure smooth daily operations. There are many types of food service, but the major categories are buffet and family-style service.

As an industry that serves human needs, food service is always at the forefront of innovation. Even though food safety practices have been continuously updated along with legislation, the industry still faces a number of issues related to food technologies and consumer trends. For example, a customer wants information about foods in order to choose a set of dishes for the table. Foreigners who are not familiar with local foods would like to enjoy meals in the style common to the country they are visiting. In addition, a food designer may be looking for new ideas for decorating food on the plate. Food recommendation is therefore an important tool for enriching our lives. A food recommender can be defined as a system that recommends items to users or customers within an environment based on their past activities.

It has been demonstrated that digital imaging can estimate food information in many environments and has many advantages over other methods [4, 5]. However, deriving food information such as food type, food combination and portion size from food images remains uncertain.

Accordingly, to achieve better food recommendation, it would be useful to analyze the foods that people actually eat in daily life. POS (Point of Sale) systems produce large-scale transaction data that reflect customers’ purchase tendencies [6]. However, the data are used only by the individual store and are not open to the public. Therefore, we cannot analyze food purchase data across different stores, restaurants, canteens, and so on. Amazon Go is a smart store where a purchase transaction can be detected by a camera. Compared with a POS system, Amazon Go provides automatic management of information about the foods that people bought, including the items associated with those products and the appearance of each item with individual preference [7]. In addition, the system can predict market expectations. Note that this kind of Amazon-Go-like system faces a similar constraint in collecting big data: the databases obtained from different sources vary. We can therefore say that integrating purchase transactions over different databases is limited, as shown in Figure 1.

Figure 1.

Limitation of integrating purchase transactions over different databases.

From the diagram, it seems meaningful to create a system that analyzes big data of food images from various communities, including companies, restaurants, and groups in social network systems, in order to extract people’s preferences for food combination, food design, and food appearance by applying image recognition technology. In the field of representation learning, there are many established models such as the Artificial Neural Network (ANN), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN).

ANN is a broad term that covers neural network models in general, including deep learning models. An ANN can be either shallow or deep depending on the number of hidden layers. CNNs are designed specifically for computer vision. Their layers differ from the standard layers of ANNs in that they are constructed to receive and process pixel data. RNNs are the “time series version” of ANNs: they are meant to process sequences of data and form the basis of forecasting models and language models. The most common kinds of recurrent layers are LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit). They contain a series of small ANNs (gates) that choose which existing information is allowed to flow through the model. That is how they establish “memory”.

However, food recognition with these models can be problematic because there is a tremendous number of foods. In addition, it remains a challenging issue owing to the complexity of emotional expressions arising from food variety, gender differentiation, and cross-cultural and age-related differences [8, 9]. Thus, it is hard to find a single model that can recognize all of them.

2. Related work

Food image recognition is one of the promising applications of visual object recognition, as it helps estimate food characteristics and analyze people’s eating choices in daily life. Many research works have made food recognition more practical by using the convolutional neural network (CNN) model [10, 11, 12]. In these works, a CNN was applied to the tasks of food detection and recognition through parameter optimization, using a dataset of the most frequent food items constructed from a publicly available food-logging system. The CNN showed significantly higher accuracy than a conventional method. In addition, a comparison of two groups of controlled trials showed that the color feature is not always helpful for improving accuracy. The accuracy of the CNN model was reported to be 70–80% on one dataset and 60% on the multi-food dataset. Improvements could be expected from collecting more images and optimizing the network architecture and hyper-parameters.

For example, the Deep Convolutional Neural Network (DCNN) was introduced for food recognition based on a combination of CNN-related techniques such as pre-training with the large-scale ImageNet data, fine-tuning, and activation features extracted from the pre-trained CNN [13, 14]. Another approach was based on two main steps: first, producing a food activation map on the input image (i.e. a heat map of probabilities) for generating bounding box proposals and, second, recognizing each of the food types or food-related objects present in each bounding box [15]. Interestingly, the max-pooling function was applied to the data, and the features extracted by this function were used to train the network. An accuracy of 86.97% was achieved for the classes of the FOOD-101 data set [16]. It was found that image classification could be extended using prominent features that categorize food images. Note that the feature-based approach and the multi-level classification approach (hierarchical approach) were highly effective in avoiding misclassifications when the number of classes increased. However, these methodologies required high computational time.

2.1 Concept of convolutional neural network (CNN)

A convolutional neural network is a network that employs a mathematical operation called convolution. There are two main processes in a CNN architecture: learning extraction and classification [17].

Step-1: Learning extraction

This process executes feature extraction from images through the following three layers:

  1. Convolution layer: this is the first layer to extract features from an input image. Matrix filters (kernels) are multiplied with the image to produce feature maps that capture features such as edges, blur, and color.

  2. Activation layer: this layer increases the non-linearity of the network without affecting the receptive fields of the convolution layer. With the rectified linear unit (ReLU), the output is ƒ(x) = max(0, x).

  3. Pooling layer: this layer reduces the number of parameters when the images are too large. It reduces the dimensionality of each feature map while keeping the important information.

Step-2: Classification

This process executes image classification. There are two main layers as follows:

  1. Fully connected layer (FC layer): connects every neuron in one layer to every neuron in the next layer. The feature matrix is flattened into a vector and fed into a fully connected layer, as in an ordinary neural network.

  2. Softmax layer: A special kind of activation layer, usually at the end of FC layer outputs. It produces a discrete probability distribution vector.

At this stage, pretrained models such as AlexNet, VGGNet, GoogLeNet, and ResNet are applied in order to integrate the CNNs. The procedure is to merge a fixed number of CNNs and train them after merging.
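The layer operations described in this section can be illustrated with a minimal sketch in plain Python. It is not the chapter's implementation (the prototype was written in MATLAB); the function names, the toy image, the kernel and the fully connected weights are all hypothetical and chosen only to make each step visible.

```python
import math

def conv2d(image, kernel):
    """'Valid' 2-D convolution, computed as cross-correlation as in most CNN libraries."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Activation layer: f(x) = max(0, x), applied element-wise."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Pooling layer: keep the largest value of each size x size block."""
    h = len(feature_map) // size
    w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(w)
        ]
        for i in range(h)
    ]

def flatten(feature_map):
    """Turn the 2-D feature map into a vector for the FC layer."""
    return [v for row in feature_map for v in row]

def fully_connected(vector, weights, biases):
    """FC layer: one weighted sum per output neuron."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Softmax layer: convert scores into a discrete probability distribution."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 5x5 image with a vertical edge, and a 2x2 edge-detecting kernel.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]

features = max_pool(relu(conv2d(image, kernel)))
# Hypothetical FC weights for two classes ("edge on the left", "edge on the right").
weights = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
probabilities = softmax(fully_connected(flatten(features), weights, [0.0, 0.0]))
print(features, probabilities)
```

In a real network the kernel and the fully connected weights are learned from data, and pretrained models such as GoogLeNet stack many such layers.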

2.2 Association rule

Association rule mining is used in the data mining phase. It is a machine learning method that finds relationships between items based on the frequency of item sets [18, 19]. To do this, an algorithm is needed to enumerate the frequent item sets, from which the association rules of food sets are extracted.

There are three measures in association rule task as shown in Figure 2:

  1. Support: It indicates how often the items appear in the data-set.

  2. Confidence: It indicates how often a rule is found to be true.

  3. Lift: It indicates how the antecedent and the consequent are related to one another. If the Lift value is 1, the antecedent and the consequent are independent. If the Lift value is less than 1, the antecedent has a negative effect on the occurrence of the consequent. If the Lift value is more than 1, the antecedent and the consequent are positively dependent, that is, the antecedent makes the consequent more likely.

Figure 2.

Basic terminologies of association rule measure.
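The three measures can be computed directly from a list of transactions using their standard definitions. The sketch below is illustrative only: the toy transactions are hypothetical and are not taken from the chapter's data.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence of the rule divided by the support of the consequent."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))

# Hypothetical food transactions: one set of recognized foods per tray image.
transactions = [
    {"steak", "french fries", "orange"},
    {"steak", "french fries"},
    {"steak", "orange"},
    {"spaghetti", "orange"},
    {"steak", "french fries", "orange"},
]

rule = ({"steak", "french fries"}, {"orange"})
print(support(transactions, rule[0] | rule[1]))  # 2 of 5 trays
print(confidence(transactions, *rule))           # 2 of the 3 steak-and-fries trays
print(lift(transactions, *rule))                 # below 1 for this toy data
```

For this toy data the lift of the rule is below 1, because orange appears in four of the five trays overall but in only two of the three trays that contain steak and French fries.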

3. Proposed method

Merging CNNs in the conventional way is uncertain because the number of CNNs is fixed and the merged network is hard to interpret [20]. A simpler and more scalable method for integrating multiple networks is therefore necessary, and we considered three features: a reliability score for each network, no further training after integration, and high scalability. The deep neural network (DNN) may be an alternative choice for solving this problem [21]. It has been studied as a model of the neural basis for efficient visual object recognition in humans. Its architecture comprises more than three layers. Each layer contains a combination of convolution, max-pooling and normalization stages, and the layers are fully connected. It inherently fuses the process of feature extraction with classification into learning using a Fuzzy Support Vector Machine (FSVM) and enables decision making [22, 23]. Many efforts have been made to speed up both the training time and the inference time of DNNs. Since flexibility and performance scalability in dealing with various types of networks are crucial requirements of DNN accelerator design [24], we proposed a scalable architecture for integrating different deep neural networks with a reliability score, in order to increase the probability of returning the correct class as the result of food recognition. The reliability score allowed the integrated DNN to select a suitable recognition result obtained from the different DNNs, which were independently constructed. In this study, we evaluated the feasibility of our proposed method.

The food sets recognized from the food images were passed to a data mining algorithm, such as the Apriori algorithm, to extract frequent food sets for the food recommendation process.

Figure 3 shows the schematic diagram of the food recommender. This system would recommend a set of foods suitable for users based on association rules, which were extracted by food set mining from food images.

Figure 3.

Schematic diagram of food recommender.

The process of food recognition and association rule extraction is given in Figure 4. First, the food set images were collected from the database. Second, the reliability score was calculated by Eq. (1). Then, food recognition and segmentation were performed on the images using the DNNs. Finally, association rules were extracted in the form of frequent food sets.

Figure 4.

Process of recognition of food set images and extraction of association rule.

$$\text{Reliability score} = A \times C \times I \times D \tag{1}$$

where A, C, I, and D represent the accuracy of the learning process, the number of classes, the number of learning images, and the difference between learning data and test data, respectively.
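Eq. (1) is a plain product of four factors. The chapter does not state how C, I and D are scaled, so the sketch below assumes, purely for illustration, that every factor has already been normalized to the range (0, 1]; the class name, field names and numeric values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReliabilityFactors:
    """Factors of Eq. (1); each is ASSUMED to be pre-normalized to (0, 1]."""
    accuracy: float    # A: accuracy of the learning process
    classes: float     # C: factor derived from the number of classes
    images: float      # I: factor derived from the number of learning images
    difference: float  # D: factor for the difference between learning and test data

def reliability_score(f: ReliabilityFactors) -> float:
    """Reliability score = A x C x I x D, as in Eq. (1)."""
    return f.accuracy * f.classes * f.images * f.difference

# Hypothetical factor values for one DNN.
example = ReliabilityFactors(accuracy=0.8, classes=0.5, images=1.0, difference=0.5)
print(reliability_score(example))
```

Because the score is a product, a low value for any single factor lowers the reliability of the whole network.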

We combined several DNNs and calculated a new probability with the reliability scores, as shown in Figure 5. Each DNN has a different reliability score depending on the following factors:

  • accuracy of the learning process

  • number of classes

  • number of learning images

  • difference between learning data and test data

Figure 5.

Features of the proposed method.

If the learning data and the test data are similar, the accuracy will be high, but it may not be reliable: when the DNN is tested with unseen data, it may predict the wrong answer. Therefore, the difference between the learning data and the test data needs to be taken into account.

We multiplied the weight output (W) of the fully connected layer by the reliability score to obtain the new probability, as in Eq. (2). The label associated with the new probability was provided as the predicted result.

$$\text{New probability} = W \times \text{Reliability score} \tag{2}$$
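A minimal sketch of how Eq. (2) could be used to merge the outputs of independently built DNNs is given below. It assumes that the integrated result is the label with the largest new probability across all networks; the chapter only states that a suitable result is selected, so this selection rule, the function name and all numbers are assumptions.

```python
def integrate_predictions(outputs, reliability):
    """Pick one label from several DNNs using Eq. (2).

    outputs:     {dnn_name: {label: W}}, W being the output of that DNN
    reliability: {dnn_name: reliability score of that DNN}
    Returns (label, dnn_name, new_probability) with the largest new probability.
    ASSUMPTION: "suitable" is interpreted as the maximum new probability.
    """
    best = (None, None, float("-inf"))
    for name, label_weights in outputs.items():
        for label, w in label_weights.items():
            new_probability = w * reliability[name]  # Eq. (2)
            if new_probability > best[2]:
                best = (label, name, new_probability)
    return best

# Hypothetical outputs of two DNNs for the same image region.
outputs = {
    "dnn_food": {"steak": 0.6, "spaghetti": 0.4},
    "dnn_fruit": {"kiwi": 0.9, "orange": 0.1},
}
reliability = {"dnn_food": 0.5, "dnn_fruit": 0.2}

print(integrate_predictions(outputs, reliability))
```

In this toy case the fruit network is more confident (0.9), but its lower reliability score lets the general food network's answer win. No retraining is involved, so a further DNN can be added by extending the two dictionaries.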

Accordingly, the food set could be recognized from food images using different DNNs with the reliability scores, as shown in Figure 6. Based on the recognition results, food transactions were formed and passed to conventional association rule mining in the data mining phase.

Figure 6.

Workflow of proposed food recommender.
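The data mining phase can be illustrated with a compact, level-wise Apriori sketch in plain Python. It is a textbook-style version under stated assumptions, not the chapter's MATLAB prototype: the transactions are hypothetical food sets, and the minimum support threshold is an arbitrary illustrative value.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset: support} using level-wise candidate generation."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical transactions: the foods recognized on each tray image.
transactions = [
    {"steak", "french fries", "orange"},
    {"steak", "french fries"},
    {"steak", "orange"},
    {"spaghetti", "orange"},
    {"steak", "french fries", "orange"},
]

for itemset, sup in sorted(apriori(transactions, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```

Each frequent food set found this way can then be split into an antecedent and a consequent to form candidate association rules for recommendation.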

4. Experiment and evaluation

We implemented a prototype that merges the results from different DNNs using MATLAB. We evaluated the recognition accuracy and the relationship between the recognition accuracy and the number of classes. The datasets of food images used for creating and evaluating the DNNs are shown in Table 1. For each dataset, three DNNs with different numbers of recognition classes were created, as given in Table 2. In addition, we made each DNN in two ways: from scratch and by applying transfer learning using GoogLeNet. Accordingly, we had 18 DNNs in total in this experiment.

| Type of dataset | Dataset | Number of classes | Number of images |
|---|---|---|---|
| Pretrained | Food-101 | 101 | 101,000 |
| Pretrained | UEC-256 | 256 | 31,395 |
| Pretrained | Fruit-360 | 81 | 77,917 |
| Test | UNIMIB2016 | 33 | 1,047 |

Table 1.

Dataset of food images.

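| Dataset | Number of classes | Scratch | GoogLeNet |
|---|---|---|---|
| food-101 | 34 | DNN1food | DNN1food* |
| food-101 | 50 | DNN2food | DNN2food* |
| food-101 | 101 | DNN3food | DNN3food* |
| uec-256 | 85 | DNN1uec | DNN1uec* |
| uec-256 | 128 | DNN2uec | DNN2uec* |
| uec-256 | 256 | DNN3uec | DNN3uec* |
| fruit-360 | 27 | DNN1fruit | DNN1fruit* |
| fruit-360 | 40 | DNN2fruit | DNN2fruit* |
| fruit-360 | 81 | DNN3fruit | DNN3fruit* |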

Table 2.

Deep neural network with different number of classes.

Recognition accuracy was calculated using the following equation:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

where TP, TN, FP, and FN were True Positive, True Negative, False Positive, and False Negative respectively.
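As a small worked example of Eq. (3), the function below computes accuracy from the four counts. The counts used in the example are hypothetical and do not come from the experiment.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Recognition accuracy as in Eq. (3): correct predictions over all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total

# Hypothetical counts: 45 TP, 30 TN, 15 FP, 10 FN out of 100 predictions.
print(accuracy(tp=45, tn=30, fp=15, fn=10))
```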

To evaluate the performance of the integrated deep neural network, we created two single DNNs and four integrated DNNs, as shown in Table 3. The UNIMIB2016 dataset was employed for the test. Recognition accuracy was then calculated using Eq. (3).

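| No. | Number of DNNs | Type of DNN | Dataset |
|---|---|---|---|
| 1 | 1 | GoogLeNet | Food-101 |
| 2 | 2 | GoogLeNet | Food-101 + UEC-256 |
| 3 | 3 | GoogLeNet | Food-101 + UEC-256 + Fruit-360 |
| 4 | 1 | Scratch | Food-101 |
| 5 | 2 | Scratch | Food-101 + UEC-256 |
| 6 | 3 | Scratch | Food-101 + UEC-256 + Fruit-360 |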

Table 3.

Integrated deep neural network.

To evaluate the accuracy of the extracted frequent food sets, we employed three types of food label, as shown in Table 4, for extracting the association rules.

| No. | Type of food label |
|---|---|
| 1 | Predicted label by integrated DNN using GoogLeNet |
| 2 | Predicted label by integrated DNN made from Scratch |
| 3 | Corrected label (no prediction) |

Table 4.

Food labels used for association rule.

5. Results and discussion

Figure 7 shows the recognition accuracy of the DNNs for the food-101, uec-256, and fruit-360 datasets, respectively. There was a relationship between the recognition accuracy of a DNN and the number of classes for each dataset: the recognition accuracy decreased slightly as the number of recognition classes increased. In addition, the performance of the DNNs to which GoogLeNet was applied was higher than that of the DNNs made from scratch.

Figure 7.

Recognition results of food-101, uec-256 and fruit-360.

Figure 8 shows the result of the integrated deep neural network. It implies that the recognition accuracy of the integrated DNNs improves as the number of networks increases. This is because the proposed reliability score allowed the integrated DNNs to select a suitable recognition result from among the three different networks, which were constructed from the food-101, uec-256 and fruit-360 datasets. In addition, the performance of the DNNs to which GoogLeNet was applied was higher than that of the DNNs made from scratch. Note that although the DNN constructed from the fruit-360 dataset showed a high accuracy of 97.72% for 10-class recognition, it could not return a suitable result for an image of a general food other than fruit. This is the disadvantage of a DNN constructed from a specific image dataset.

Figure 8.

Result of integrated deep neural network.

Tables 5-7 show the results for the food labels, with association rules sampled from the whole results. For “Kiwi, Doughnut” in Table 5, the numbers of bad and good results were almost equal, while “Pineapple mini, Ice-cream” in Table 6 gave better results. This confirmed that the integrated networks using GoogLeNet were preferable in terms of the number of good results. In Table 7, “Steak, French fries” showed good results relevant to the test data.

| Antecedents | Consequents | Result |
|---|---|---|
| Kiwi, Doughnut | Pineapple, Orange | Good |
| Kiwi, Doughnut | Orange, Miso soup | Bad |
| Kiwi, Doughnut | Spaghetti Bolognese, Orange | Good |
| Kiwi, Doughnut | Orange, Spam musubi | Bad |
| Kiwi, Doughnut | Pineapple, Miso soup | Bad |
| Kiwi, Doughnut | Pineapple, Spaghetti Bolognese | Good |
| Kiwi, Doughnut | Pineapple, Spam musubi | Bad |
| Kiwi, Doughnut | Spaghetti Bolognese, Miso soup | Bad |
| Kiwi, Doughnut | Spam musubi, Miso soup | Good |
| Kiwi, Doughnut | Spaghetti Bolognese, Spam musubi | Bad |

Table 5.

Results of integrated DNNs made from Scratch.

| Antecedents | Consequents | Result |
|---|---|---|
| Pineapple mini, Ice-cream | Apple red/yellow | Good |
| Pineapple mini, Ice-cream | Cantaloupe | Good |
| Pineapple mini, Ice-cream | Cocos | Good |
| Pineapple mini, Ice-cream | French loaf | Good |
| Pineapple mini, Ice-cream | Granadilla | Bad |
| Pineapple mini, Ice-cream | Grapefruit pink | Good |
| Pineapple mini, Ice-cream | Kiwi | Good |
| Pineapple mini, Ice-cream | Peach flat | Good |
| Pineapple mini, Ice-cream | Peach abate | Good |
| Pineapple mini, Ice-cream | Pitahaya Red | Good |

Table 6.

Results of integrated DNN using GoogLeNet.

| Antecedents | Consequents | Result |
|---|---|---|
| Steak, French fries | Banana, Roll bread | Good |
| Steak, French fries | Baklava, Orange | Good |
| Steak, French fries | Roll bread, Orange | Good |
| Steak, French fries | Spaghetti, Orange | Good |
| Steak, French fries | Baklava, Roll bread | Bad |
| Steak, French fries | Spaghetti, Baklava | Good |
| Steak, French fries | Spaghetti, Roll bread | Good |
| Steak, French fries | Yogurt, Roll bread | Good |
| Steak, French fries | Spaghetti, Yogurt | Good |
| Steak, French fries | Spaghetti | Good |

Table 7.

Results of corrected labels.

6. Conclusion

In the food recognition phase, the integrated networks (DNNs) showed higher recognition accuracy (80%) than a single network, since the proposed reliability score allowed the integrated networks to select a suitable recognition result from among the different networks with different domains. The networks to which GoogLeNet was applied gave higher recognition accuracy. In addition, it was found that when the test dataset differed from the training dataset, we could not obtain suitable results.

In the data mining phase, we could extract some meaningful rules by applying the Apriori algorithm to the recognition results of the canteen image dataset. In future work, we will modify the recognition phase of this system and increase the performance of the networks. We will evaluate the effectiveness of the modified system using a larger food dataset. In addition, we are developing visual food mining using a mixed model of DNN and RNN (recurrent neural network) to continue our research.

Acknowledgments

This research is part of an academic cooperation between Chulalongkorn University, Thailand and Kanagawa Institute of Technology (KAIT), Japan. The authors would like to thank KAIT for providing a scholarship and deeply appreciate Professor Kosuke Takano for his advice and the facilities in his laboratory at KAIT.

Conflict of interest

The authors declare no potential conflict of interest.

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Thunchanok Tangpong, Somkiet Leanghirun, Aran Hansuebsai and Kosuke Takano (April 19th 2021). A Food Recommender Based on Frequent Sets of Food Mining Using Image Recognition, Artificial Intelligence - Latest Advances, New Paradigms and Novel Applications, Eneko Osaba, Esther Villar, Jesús L. Lobo and Ibai Laña, IntechOpen, DOI: 10.5772/intechopen.97186. Available from: