Integrating Text Mining and Network Analysis for Topic Detection from Published Articles on Banana Sensory Characteristics

Published articles (28) on banana sensory characteristics, spanning 2002 to 2018, were retrieved from the PubMed database and mined with the KNIME software to detect the topics of discussion. The texts were tagged with the Open Source Chemistry Analysis Routines (OSCAR) chemical named entity tagger and preprocessed by filtering and stemming. Topics were then detected with Latent Dirichlet Allocation (LDA), and term co-occurrence was determined, using the KNIME data mining software. The co-occurring terms were converted to a node adjacency matrix and imported into the Gephi Graph Visualisation and Manipulation software version 0.02. Network statistics such as modularity class, degree centrality, betweenness centrality and closeness centrality were estimated. The majority of the OSCAR-tagged words (50.8%) were chemical compounds and 47.3% were ontology terms. The directed network consisted of 53 nodes and 904 edges, grouped into four modularity classes. The terms with high betweenness centrality (>45) were accept, fruit, analysis, coat, food, composite and banana. Three topics were detected from the documents, namely (1) quality of banana fruit and peel; (2) use of banana fruit in food and wine and (3) sensory acceptability of banana peel and flour in food products. This chapter provides details of each topic.


Introduction
Banana (Musa spp.), comprising an edible pulp and a peel, is one of the most produced and consumed fruits globally [1,2]. Between 2000 and 2016, world banana production was distributed among Asia (52.8%), America (26.6%), Africa (17.8%), Oceania (1.4%) and Europe (0.4%). Green banana flour (GBF) made from the pulp is rich in vitamins C and A, glutathione, flavonoids and phenolic compounds with potent antioxidant activity [3]. Banana peels are rich in phenolics and are a good source of antioxidants [1]. Banana peel and unripe banana fruit are rich in dietary fibre, indigestible carbohydrates, proteins, essential amino acids, cellulose, hemicellulose, lignin, starch, resistant starch, polyunsaturated fatty acids and potassium [4]. Interest in GBF relates to its high resistant starch (40.9-58.5%) and dietary fibre (6.0-15.5%) contents as well as its bioactive compounds [4]. The high resistant starch content might contribute to controlling glycemic index and cholesterol, and to gastric fullness, intestinal regularity and fermentation by intestinal bacteria, which produces short-chain fatty acids that can prevent cancer in intestinal cells [5]. In recent years, the health benefits and sensory properties of banana have stimulated the development of innovative food products. These scientific results have been communicated in scientific papers containing unstructured data, which use free-flowing natural language combined with domain-specific terminology and numeric phrases [6]. Manual abstraction of information from these papers for literature review is labour-intensive and slow, and it is a considerable source of error and data corruption. Hence, scientific papers are attractive targets for the development of machine processes for automatic information extraction using text mining [6].
Text mining uses Natural Language Processing (NLP) tools for the automatic discovery of previously unknown information from unstructured data [6,7]. It typically consists of four stages: (a) information retrieval, which gathers a set of textual materials on a given topic; (b) entity recognition, which identifies textual features in the gathered texts; (c) information extraction, which extracts relationships among the recognised textual features, such as the occurrence and co-occurrence of specific terms (indexing); and (d) knowledge discovery, in which the extracted relationships are used to identify useful patterns in the data set [8]. Network analysis is a sociological technique used to study relationships and community structures in social data; it has since been applied in other fields, such as bioinformatics, to find key molecular markers and communities within interaction networks [8]. It can also be used to study the co-occurrence of specific terms.
The text processing feature of the Konstanz Information Miner (KNIME) can read and process textual data and transform it into numerical data (document and term vectors), such as a term co-occurrence adjacency matrix, to which the regular KNIME data mining nodes can then be applied [9].
The objective of this chapter is to process the unstructured textual information related to banana sensory characteristics using a text mining and network analysis approach, in order to extract knowledge such as the use of banana in innovative food products and to visualise the associated relationships with banana sensory attributes and consumer acceptability.

Topic detection and network analysis methodology
Published articles (106) retrieved from the PubMed database for 'banana sensory' were loaded into the KNIME (Konstanz Information Miner) software using the PubMed document parser. Some of the articles did not relate to the subject of interest, so the documents were indexed for 'banana' in the title using the table indexer and index query nodes, resulting in 28 relevant documents. The texts were tagged with OSCAR (Open Source Chemistry Analysis Routines, an open-source extensible system for the automated annotation of chemistry in scientific articles) chemical named entities using the Oscar Tagger node, preprocessed by filtering and stemming, and transformed into a bag of words. The bag of words was filtered again so that only terms with a relative frequency between 0.02 and 1.0 were used as features (Figure 1a). The term co-occurrence counter node was used to count term co-occurrences at the sentence level, after which the documents were transformed into document vectors. Terms co-occurring in at least four sentences were converted to a node adjacency matrix (Figure 1b) and imported into the Gephi Graph Visualisation and Manipulation software version 0.02. Network statistics such as modularity class, degree centrality, betweenness centrality and closeness centrality were estimated. Degree centrality is the number of direct connections a node has: the more direct connections a term has, the more power it has in the network and so the more important it is. Betweenness centrality reflects the ability of a node to control communication between other nodes and resources in the network. Closeness centrality measures how close a node is to all other nodes in the network; a node with high closeness reaches the others through short paths and is less dependent on, or controlled by, other nodes. The Latent Dirichlet Allocation (LDA) node, which uses the MALLET (MAchine Learning for LanguagE Toolkit) topic modelling library, was applied to extract topics from the unstructured text (Figure 1c).
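The sentence-level co-occurrence step and the four-sentence threshold can be illustrated with a small, self-contained sketch. It is not the KNIME workflow itself: the stop-word list, the crude suffix-stripping stemmer (standing in for a proper stemmer such as Porter's) and the regular-expression sentence splitter are simplifying assumptions, and the 0.02-1.0 relative-frequency filter is interpreted here, also as an assumption, as the relative term frequency within each document. The resulting adjacency matrix is symmetric; the edge direction of the chapter's Gephi graph is not reproduced.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; KNIME's stop word filtering uses a much larger one.
STOP_WORDS = {"the", "of", "and", "a", "an", "in", "on", "with", "was", "were",
              "is", "to", "for", "by"}


def crude_stem(word: str) -> str:
    """Rough suffix stripping, standing in for a real stemmer such as Porter."""
    for suffix in ("ability", "ation", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence: str) -> list[str]:
    """Lower-case, tokenise, drop stop words and very short tokens, then stem."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS and len(t) > 2]


def cooccurrence_edges(documents, min_rel_freq=0.02, max_rel_freq=1.0, min_sentences=4):
    """Count sentence-level co-occurrences; keep pairs seen in >= min_sentences sentences."""
    pair_counts = Counter()
    for doc in documents:
        sentences = [preprocess(s) for s in re.split(r"[.!?]+", doc) if s.strip()]
        counts = Counter(t for s in sentences for t in s)
        total = sum(counts.values()) or 1
        # Assumption: relative frequency = term count / all terms in this document.
        keep = {t for t, c in counts.items() if min_rel_freq <= c / total <= max_rel_freq}
        for s in sentences:
            for a, b in combinations(sorted(set(s) & keep), 2):
                pair_counts[(a, b)] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_sentences}


def adjacency_matrix(edges):
    """Convert {(term_a, term_b): count} into a node list and a symmetric matrix."""
    nodes = sorted({t for pair in edges for t in pair})
    index = {t: i for i, t in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for (a, b), n in edges.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return nodes, matrix
```

With `min_sentences=4` the function applies the same threshold as the chapter; the node list and matrix could then be written to CSV for import into Gephi. The naive splitter also breaks sentences at abbreviations such as 'spp.', which a proper sentence detector avoids.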

Author network
The author network (Figure 2) is characterised by 113 nodes (researchers) and 217 edges, indicating connectedness among the researchers. Although there are five groups in the author network, the groups are poorly connected, with a network density of 3.4%.
Figure 3.
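The reported density can be checked by hand. Assuming the co-authorship graph is treated as undirected, density is the number of edges divided by the number of possible node pairs, 2E / (N(N - 1)). The illustrative function below (its name is hypothetical) reproduces the 3.4% figure from 113 nodes and 217 edges.

```python
def network_density(n_nodes: int, n_edges: int, directed: bool = False) -> float:
    """Observed edges divided by the maximum possible number of edges."""
    possible = n_nodes * (n_nodes - 1)  # ordered pairs, i.e. possible directed edges
    if not directed:
        # Assumption for the author network: undirected, one edge per unordered pair.
        possible //= 2
    return n_edges / possible


print(f"{network_density(113, 217):.1%}")
```

For the directed co-occurrence network described in the next section, the same function with `directed=True` gives 904 / (53 × 52), about 0.33.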
Terms tagged by OSCAR as chemical compounds in the banana sensory documents include modified atmosphere packaging (MAP), beta-glucan, green banana flour (GBF), carbon dioxide, alpha-amylase and ethylene. The reaction names include dehydration, reflecting the fact that flour from banana peel and pulp requires dehydration to remove moisture. Ontology terms (an ontology being a set of concepts and categories in a subject area) include food, nutrients, protein, process, pectin, antioxidants and ascorbic acid.

Co-occurrence term interaction in the banana sensory documents
The directed network of co-occurring terms consists of 53 nodes and 904 edges, with in-degree ranging from 6 to 37 (banana) and a mean of 17; on average, each term therefore receives ties from about 17 other terms. Nodes with many incoming ties, such as banana here, are said to be prominent or to have high prestige, because many nodes direct ties towards them, an indication of importance. In-degree measures popularity as the number of connections directed to a node [10], representing the amount of attention the node receives. Hence, banana co-occurred with more terms than any other term, which is expected because banana is the subject of the documents. The out-degree ranged from 0 to 44 (accept), also with a mean of 17. Nodes with high out-degree, such as accept, can influence others and are often described as influential. The high out-degree of accept is expected, since sensory evaluation is concerned with the acceptability of a product and the topic of interest is the sensory attributes of banana and its products. A limitation of using degree to quantify a node's significance is that every connection is valued equally: a connection to an important node counts as much as a connection to an unimportant one [10]. In practice, a connection to a company's chief executive confers more influence than a connection to an entry-level employee.
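In a directed network every edge adds one to the out-degree of its source and one to the in-degree of its target, so the mean in-degree and mean out-degree both equal E/N. With 904 edges and 53 nodes this is 904/53 ≈ 17.06, consistent with the mean of 17 reported for both. A minimal sketch, using a toy edge list whose term names are only examples:

```python
def degree_summary(nodes, edges):
    """In-degree, out-degree and mean degree for a directed edge list."""
    in_deg = {n: 0 for n in nodes}
    out_deg = {n: 0 for n in nodes}
    for source, target in edges:
        out_deg[source] += 1  # tie directed away from source
        in_deg[target] += 1   # tie directed towards target
    mean_degree = len(edges) / len(nodes)  # identical for in- and out-degree
    return in_deg, out_deg, mean_degree


in_deg, out_deg, mean = degree_summary(
    ["accept", "banana", "fruit"],
    [("accept", "banana"), ("accept", "fruit"), ("fruit", "banana")],
)
```

Here banana has the highest in-degree (2) and accept the highest out-degree (2), mirroring on a tiny scale the roles these terms play in the chapter's network.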
The connectedness of the nodes is detailed in Figure 4, where the nodes are coloured by the four modularity classes: the base class (green) contains 20.7% of the nodes, class 1 (red) 20.8%, class 2 (purple) 41.5% and class 3 (blue) 17.0%. Nodes with high betweenness centrality lie on many shortest paths between other nodes and therefore have considerable control over information diffusion in the network, acting as gatekeepers of information [10]. Terms with high betweenness centrality are also high in in-degree and eigenvector centrality and are therefore connected to many highly connected terms. Betweenness centrality ranged from 0 to 192. The terms with high betweenness centrality (>50) were accept (139)
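The measures discussed above can be sketched in plain Python for an unweighted directed graph stored as a dictionary of successor lists. The functions below are an illustrative implementation of Brandes' algorithm for unnormalised betweenness, a closeness score defined as the number of reachable nodes divided by the sum of their distances, and Newman's modularity Q for a given partition of an undirected graph. Gephi's exact conventions (normalisation, handling of unreachable nodes, and the Louvain search that finds the partition in the first place) may differ, so these are assumptions rather than a reproduction of the chapter's statistics.

```python
from collections import deque


def shortest_paths(graph, source):
    """BFS from source: visit order, predecessors, shortest-path counts, distances.

    Assumes every node appears as a key of graph (possibly with an empty list).
    """
    dist = {source: 0}
    sigma = {v: 0 for v in graph}
    sigma[source] = 1
    preds = {v: [] for v in graph}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return order, preds, sigma, dist


def betweenness(graph):
    """Brandes' algorithm, unnormalised, for a directed unweighted graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order, preds, sigma, _ = shortest_paths(graph, s)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):  # nodes in order of decreasing distance from s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


def closeness(graph):
    """Reachable nodes divided by the sum of their distances (0.0 if none reachable)."""
    result = {}
    for s in graph:
        dist = shortest_paths(graph, s)[3]
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result


def modularity(edges, community):
    """Newman's Q for an undirected simple graph and a node -> class mapping."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    internal = sum(1 for u, v in edges if community[u] == community[v])
    class_degree = {}
    for node, k in degree.items():
        class_degree[community[node]] = class_degree.get(community[node], 0) + k
    return internal / m - sum((d / (2 * m)) ** 2 for d in class_degree.values())
```

On a directed chain a → b → c, only b lies between other nodes, so it alone receives a betweenness of 1; a gatekeeper term in the co-occurrence network plays the same role on a larger scale.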

Discovered knowledge on banana sensory
Three topics, each consisting of five terms, were extracted from the banana sensory documents (Figure 5). Topic 0 (27.98%) relates to the quality of banana fruit and peel, Topic 1 (23.21%) to the use of banana fruit in food and wine, and Topic 2 (48.81%) to the sensory acceptability of banana peel and flour in food products. Each topic is discussed in detail below.
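Topic percentages such as those above can be obtained from an LDA model as the share of all word tokens assigned to each topic. The sketch below is a plain collapsed Gibbs sampler, the general family of inference on which MALLET builds, with hyperparameters alpha and beta fixed for illustration. MALLET's optimised sampler, its hyperparameter optimisation and KNIME's exact definition of the reported percentages may differ.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0, top_n=5):
    """Collapsed Gibbs sampling for LDA on lists of already preprocessed tokens.

    alpha and beta are illustrative fixed values, not the chapter's settings.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]               # tokens of topic k in doc d
    topic_word = [[0] * n_vocab for _ in range(n_topics)]    # times word v has topic k
    topic_total = [0] * n_topics                             # tokens assigned to topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    topics = range(n_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = assignments[d][i], word_id[w]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + n_vocab * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    n_tokens = sum(topic_total)
    proportions = [n / n_tokens for n in topic_total]  # share of tokens per topic
    top_terms = [sorted(vocab, key=lambda w: -topic_word[k][word_id[w]])[:top_n]
                 for k in topics]
    return proportions, top_terms
```

Calling it with `n_topics=3` on the stemmed token lists of the 28 documents would return three proportions and the five most frequent terms per topic, mirroring the structure of Figure 5; results vary with the random seed and the number of iterations.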

Quality of banana peel and fruit
Kudachikar et al. [11] reported that packing optimally matured (75-80% maturity) 'Robusta' bananas in modified atmosphere packaging with low-density polyethylene (LDPE) films, alone or in combination with green keeper as an ethylene absorbent, under low temperature (12 ± 1°C, 85-90% RH) extended the shelf life to 5 and 7 weeks, respectively, compared with 3 weeks for openly kept control fruits stored under similar conditions. In the green keeper treatment, three sachets containing KMnO4 (10 g/sachet) were placed in each LDPE film package as the ethylene absorbent. The sensory quality of the fully ripe fruits 5 days after the ethrel dip was very good. Banana fruits treated with 500 ppm ethrel ripened evenly in 6 days at 20 ± 1°C with excellent external colour, taste, flavour and overall quality [12]. Using image analysis, it was reported that banana peel browning occurred faster in bananas packaged with CO2 gas exchange packaging [13].
Banana Nutrition - Function and Processing Kinetics
Banana fruits coated with a mixture of 10% gum Arabic and 1% chitosan showed delayed colour development and reduced rates of respiration and ethylene production during storage, and maintained their overall sensory quality for 33 days [14]. The authors concluded that 10% gum Arabic with 1% chitosan could be used as an edible coating to extend the commercial storage of banana fruits for up to 33 days. Belayneh et al. [15] studied the physicochemical properties of four Ethiopian cooking banana varieties (Cardaba, Nijiru, Matoke and Kitawira). The Cardaba variety had high fruit weight, length, girth and volume, total soluble solids, ascorbic acid and dry matter, and low titratable acidity, and gave the best quality boiled pulp. Nijiru, Kitawira and Matoke were superior in producing chips of acceptable quality and are recommended for chip making by food processors in Ethiopia. Cardaba fruits were the heaviest and longest, containing 88% more edible portion per unit fresh weight than peel, whereas Kitawira and Nijiru fruits were the smallest, shortest and thinnest.
Extending the storage life of banana to 33 days with an edible gum Arabic and chitosan coating could greatly reduce wastage. Reducing browning will also contribute to the consumer appeal of banana products.

Use of banana fruit in food and wine
Mridula et al. [17] developed nutritious expanded snacks based on food grains (maize, defatted soy flour and sesame seed) and banana using extrusion cooking. Banana pulp was positively correlated with water solubility index, total minerals and iron content but negatively correlated with water absorption index, protein and overall acceptability. The optimised product was obtained by blending coarsely ground maize (78.5%), sesame seeds (7.5%) and defatted soy flour (14%) with 8 g ripe banana pulp, at a screw speed of 350 rpm and 14% feed moisture; it contained 15.5% protein, 401 kcal/100 g and 4.48 mg/100 g, and scored 7 for overall sensory acceptability on a 9-point hedonic scale. Higher levels of ripe banana pulp in the feed formulation resulted in increased Maillard reaction, leading to greater redness in the final product. The snack could supply about 50% of the protein and 20% of the calorie requirement of a 7-9-year-old child; hence, it has potential for combating protein-calorie malnutrition [17].
Vacuum-microwaved banana chips with 10% moisture were crisper, had significantly higher levels of volatiles and received higher sensory ratings than air-dried chips of similar moisture content [18]. In a study of gluten-free pasta [5], wheat pasta was produced with whole-wheat flour (60.6%) and whole egg (39.4%), while green banana pasta was produced with green banana flour (40.0%), egg white (31.5%), water (16.4%), guar gum (2.4%) and xanthan gum (2.5%). Egg white was used because of its strong influence on the quality of gluten-free pasta: its high protein content coagulates at low heat, and it is readily available and inexpensive [5]. The hydrocolloids were included to augment the action of the egg white. However, the authors did not check whether the interaction between guar gum and xanthan gum is additive, synergistic or antagonistic. Studies of the pasting properties of Bambara groundnut flour combined with carboxymethyl cellulose, starch and xanthan to obtain a gluten-free flour revealed that xanthan had no significant effect on the pasting properties [19,20]. Thus, in such a system xanthan may add cost without any functional benefit. Green banana pasta, containing approximately 98% less lipid, showed greater acceptance (84.5% among celiac and 61.2% among non-celiac individuals) than wheat pasta (53.6% among non-celiac individuals). The consumers did not identify any significant difference between the wheat and green banana pastas in appearance, aroma, flavour or overall quality [5]. Ogodo et al. [21] produced mixed fruit (pawpaw, banana and watermelon) wines using Saccharomyces cerevisiae isolated from palm wine. The acceptability of the wines was ranked pawpaw and banana > pawpaw and watermelon > pawpaw, watermelon and banana > banana and watermelon. These studies highlight the potential of banana pulp in extruded and fried food products and in wine.

Sensory acceptability of banana peel and flour in food products
Ripe banana pulp is rich in fibre, polyphenols and simple sugars (61.1 g/100 g), making it suitable for sucrose replacement in baked products. However, in cake formulations, the inclusion of ripe banana flour slightly lowered the specific volume and increased hardness, with a consequent decline in sensory acceptability [22]. Nevertheless, the added banana flour significantly improved the nutritional properties of the cakes, increasing dietary fibre and polyphenols and improving antioxidant capacity up to three-fold.
Arvanitoyannis and Mavromatis [23] reported that the physicochemical (pH, texture, vitamin C, ash, fat and mineral) and sensory properties of banana are related to genotype and growing conditions, with the minerals accurately discriminating banana cultivars of different geographical origin. The beneficial properties of banana relate to its high dietary fibre and antioxidant compounds, the latter being abundant in the peel. Banana peel extracts were used as antioxidants in freshly squeezed orange juice and in juice from concentrate [24]. Adding the extract to both types of orange juice increased their free radical scavenging capacity, and antioxidant capacity measured with the 2,2′-azino-bis-(3-ethylbenzothiazoline)-6-sulfonic acid (ABTS) radical increased at 5 mg or more of banana peel extract per ml of freshly squeezed juice. However, 10 mg or more of banana peel extract per ml of orange juice produced undesirable in-mouth sensations and colour. Thus, banana peel has potential as a natural free radical scavenging additive in orange juice.
Carvalho and Conti-Silva [25] used a mixture design to investigate the effect of banana peel flour (BPF), rice flakes and oat flour on the sensory acceptability of cereal bars. The formulation with the lowest quantity of banana peel flour produced cereal bars that scored higher for amount of rice flakes, chewiness and crispness. Formulations with intermediate and the highest quantities of banana peel flour were darker in colour, with a stronger banana aroma and a more bitter taste. The cereal bars were similar in hardness, adhesiveness, sweet taste and oat flavour. The feasibility of using BPF in acceptable cereal bars may diversify its use in new products for different market niches.
Maneerat et al. [26] extracted banana peel pectin (BPP) with HCl (pH 1.5) and with water (pH 6.0) for 30-120 min at 90°C. Acid extraction yielded 7-11% pectin (dry weight) with 42-47% galacturonic acid (GalA), 57-61% degree of methylation (DM) and a viscosity-average molecular weight (Mv) of 17-40 kDa. The water-extracted BPP was characterised by lower DM and higher GalA and Mv. The authors incorporated the BPP obtained by 60 min acid and water extraction into salad cream at 30% oil substitution. This decreased viscosity and lightness, while the salad creams remained stable against cream separation during 3 weeks of storage. However, the salad cream containing water-extracted BPP had a larger oil droplet size and a greater extent of droplet flocculation. The full-fat and reduced-fat salad creams were similar in thickness, smoothness and overall acceptability.
Borges et al. [27] reported on the quality of banana peel extract jellies. Based on sensory scores and purchase intention, the best formulations were obtained with a higher extract/sugar ratio (60/40) and a lower pectin level (0.5 g/100 g), with either the highest (20 ml) or the lowest (15 ml) level of citric acid; their scores for all attributes ranged from 'liked slightly' to 'liked moderately'.
In summary, the use of banana peel extract as an antioxidant is dose-related: 10 mg or more of extract per ml of juice is undesirable. A lower quantity of banana peel flour produces consumer-acceptable cereal bars, and the flour could be diversified into other niche products. Pectin can be extracted from banana peel; however, the yield depends on the ripeness of the peel.
Yangilar [28] studied the effect of green banana flour (GBF) on the physical, chemical, mineral and sensory properties of ice cream. GBF affected the moisture, acidity, fat and ash contents and viscosity positively, while meltdown, colour and overrun were negatively affected. Ice cream with 2% GBF received the highest sensory score.
Enrichment of fermented milk with green banana pulp (GBP) stabilised the probiotic strain Lactobacillus paracasei LBC 81 during refrigerated storage [29]. The hue, chroma and colour difference of the fermented milks were less affected as the amount of GBP increased. However, GBP increased syneresis and led to post-acidification during storage. These technological defects can be mitigated by using hydrocolloids and by greater control of the fermentation process [29]. More than 70% of the panellists gave mean scores ranging from 6 to 9 on a 9-point hedonic scale for all sensory attributes of the fermented milk with 6 g/100 g GBP. A sample is considered to have good acceptance when 70% or more of the individuals give mean scores higher than 5 on the 9-point hedonic scale [29]. At 9 g/100 g, green banana pulp reduced the acceptance of the product because of acidification, with rejection related to flavour and overall quality. The authors concluded that green banana pulp can contribute to the nutritional quality of fermented milk through its phenolic compounds, resistant starch and fibres, thereby benefiting consumer health through its probiotic and prebiotic effects.
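The acceptance criterion quoted above reduces to a simple proportion. The helper below, whose name and default arguments are illustrative, checks whether at least 70% of panellists gave a score above 5 on the 9-point hedonic scale; each list element is assumed to be one panellist's score (or that panellist's mean across attributes).

```python
def has_good_acceptance(scores, cutoff=5.0, required_share=0.70):
    """Return (meets_criterion, share), share being the fraction of scores above cutoff.

    Assumption: one entry per panellist on a 9-point hedonic scale.
    """
    if not scores:
        raise ValueError("at least one panellist score is required")
    share = sum(1 for s in scores if s > cutoff) / len(scores)
    return share >= required_share, share


ok, share = has_good_acceptance([6, 7, 8, 9, 5, 4, 7, 8, 6, 7])
```

In this example 8 of the 10 hypothetical panellists scored above 5, so the sample meets the 70% threshold; a score of exactly 5 does not count towards acceptance.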
A pork skin (PS) and green banana flour (GBF) gel (PS-GBFG) was produced from pork skin cooked at 80°C for 60 min, ground through a 3 mm plate and mixed with water and GBF in a 1:2:2 ratio (PS:water:GBF) in a cutter until completely homogenised. The gel was used as a fat replacer in bologna-type sausages [30]. The PS-GBFG decreased the fat content and significantly increased resistant starch without affecting protein levels. It also improved cooking loss and emulsion stability, and 60% substitution did not affect the colour and texture of the sausages. Although 60% substitution was effective in maintaining sensory quality, acceptable products were obtained with up to 100% substitution. Thus, PS-GBF gel was effective as a fat replacer in bologna-type sausages.
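For batch preparation, the 1:2:2 (PS:water:GBF) ratio can be scaled to any target gel mass. This illustrative helper assumes the ratio is by mass, which the original description does not state.

```python
def split_by_ratio(total_mass_g, ratio):
    """Split a batch mass in proportion to the given parts.

    Assumption: the parts are mass ratios (e.g. PS:water:GBF = 1:2:2 by weight).
    """
    parts = sum(ratio.values())
    return {name: total_mass_g * share / parts for name, share in ratio.items()}


batch = split_by_ratio(1000, {"pork_skin": 1, "water": 2, "gbf": 2})
```

For a 1000 g batch this gives 200 g cooked pork skin, 400 g water and 400 g GBF.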
GBF was reported to significantly improve the emulsion stability and cooking yield of chicken nuggets compared with the control. This was attributed to the increase in viscosity caused by the GBF fibres, which ultimately reduced shrinkage on cooking [3]. As a functional ingredient in chicken nuggets, GBF served as a good source of dietary fibre, with a positive impact on microbiological quality and sensory quality comparable to the control [3].
Wheat flour was substituted with 10% banana pseudo-stem flour (BPF) in bread [31], resulting in significantly higher moisture, ash, crude fibre and soluble, insoluble and total dietary fibre but lower protein, fat and carbohydrate than the control. The presence of BPF resulted in lower volume, a darker crumb and a lighter crust colour than the control; however, the addition of carboxymethyl cellulose (CMC) improved bread volume. All breads with BPF had greater total phenolics and antioxidant properties than the control, and the bread with BPF and 0.8% CMC had the highest overall acceptability of the BPF breads, comparable to the control. Saravanan and Aradhya [32] produced antioxidant-rich beverages from banana pseudo stem (BPS) and rhizome (BR). The pseudo stem is an aerial stem of closely packed leaf sheaths that functions as a vascular bridge for the flow of water and nutrients from the underground rhizome to the leaves and bunch. The rhizome is the modified stem of the banana plant that remains underground, bearing the plant above the surface and the roots below it. The pseudo stem is obtained after removal of the surrounding leaf sheaths at harvest. The BR juice had higher total phenolic and flavonoid contents, with correspondingly higher antioxidant activity, than the BPS juice. Ready-to-serve beverages containing 25% BPS juice and 20% BR juice, each with 15 °Brix total soluble solids and 0.3% acidity, were the best based on sensory quality.
Unripe bananas with green peel and firm pulp were washed, peeled, cut into 10 mm thick slices, steam blanched for 10 min, dried at 60°C for 24 h, milled and sieved into flour. The flour was mixed with water (10 g flour/3 ml water) and fermented for 24 h, and the fermented slurry was used as a starter for wheat bread made by the straight dough fermentation method [33]. Substituting wheat flour with the fermented banana flour increased the crude fibre, carbohydrate and protein contents of the bread. The wheat-unripe banana blend (90:10) produced bread of better quality and sensory acceptability.
Ripe banana pulp is high in fibre, polyphenolic compounds and simple sugars (61.1 g/100 g). Segundo et al. [22] reported the potential of ripe banana flour (RBF) at 20 and 40% replacement as a sucrose replacer in cake formulations. The inclusion of RBF significantly increased dietary fibre and polyphenols and improved antioxidant capacity up to three-fold. However, the increased batter consistency resulted in a slightly lower specific volume and higher hardness, contributing to the decline in consumer acceptability. This effect was minimised in layer cakes, where differences in volume were evident only at the higher substitution level.
Oliveira de Souza et al. [34] reported the replacement of fat (0-100%) and the reduction of sugar (0-50%) in pound cakes using green banana puree (GBP). The GBP was produced by washing 280 g of whole green bananas at the second stage of maturation (green with a trace of yellow) and cooking them under pressure at 120°C for 8 min. The cooked bananas were peeled, and the pulp (183 g) was mashed for 5 min in a multiprocessor with 100 g water to achieve a puree texture. Replacing fat with GBP resulted in changes in colour, slice size, compaction, odour, flavour and texture. Sugar reduction negatively affected appearance, resulting in a higher proportion of large alveoli, a beige or dark beige colour, a mild taste and a wheat flour flavour. GBP replacement and sugar reduction increased the lightness, colour saturation and hue of the crust. The authors concluded that it is possible to replace 25% of the fat in pound cakes with GBP and to reduce sugar by 20 and 40% in low-fat cakes with GBP with very little impact on acceptance and sensory characteristics.
Flour from whole (pulp and peel) overripe bananas (OWBF) was used as an ingredient in muffins. Muffins with OWBF at 400 and 500 g/kg of total flour were highly acceptable, with high dietary fibre (181.9 g/kg) and resistant starch (35 g/kg), low total starch (57 g/kg) and high simple sugars (714.2 g/kg of carbohydrates were glucose plus fructose). The muffins with OWBF were classified as having an intermediate glycemic load.

Conclusion
This chapter shows the power of integrating text mining and network analysis techniques in discovering interesting trends in the quality of banana fruit and peel, the use of banana fruit in food and wine, and the sensory acceptability of banana peel and flour in food products. Gum Arabic and chitosan coating can greatly reduce wastage by extending storage for up to 33 days. Ripe banana pulp is high in fibre, polyphenolic compounds and simple sugars (61.1 g/100 g). The reviewed studies highlight the potential of banana pulp in extruded and fried food products and in wine. Ripe banana flour significantly improved the nutritional properties of cakes, increasing dietary fibre and polyphenols and improving antioxidant capacity up to three-fold. Banana pulp flour, peel, pseudo stem and rhizome all have potential in consumer-acceptable food products.
© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Author details
Victoria Jideani*
Department of Food Science and Technology, Cape Peninsula University of Technology, Bellville, South Africa
*Address all correspondence to: jideaniv@cput.ac.za