Fundamental Concepts and Algorithms in Machine Learning

Adegboyega Adegboye

doi:10.5772/intechopen.1001886

Abstract

This chapter gives readers an overview of the machine learning landscape. It covers the fundamental concepts of machine learning, machine learning algorithms, the usefulness of machine learning, and other tips that will help readers get the best out of machine learning and its related fields. The chapter consists of 25 sections. Sections 1–10 cover, among other topics, an introduction to machine learning, prerequisites to machine learning, machine learning algorithms and their categories, and the applications of machine learning. Sections 11–20 include the perceptron, artificial neural networks (ANN), model evaluation, principal component analysis, and model parameters. Sections 21–25 cover topics such as errors in machine learning, bias, the life cycle of machine learning, data gathering methodology, data sets, population-based algorithms, and the conclusion. The chapter also discusses the future of machine learning and other key terms required to understand machine learning as a topic. It presents machine learning in such a way that new theory, knowledge, and understanding, both of the topic and of its areas of application, can emerge from reading it.

Keywords

machine learning
concepts
algorithm
future
data set
methodology and glance

Author Information

Adegboyega Adegboye*
- Achievers University, Owo, Nigeria

*Address all correspondence to: akanbi2090@yahoo.co.uk

1. Introduction

Machine learning, a subdivision of artificial intelligence, allows computer systems to use a chosen learning algorithm to learn and improve from accumulated experience without being explicitly programmed. Recently, it has become a popular topic in research and related fields because of its many practical applications in big data, in scientific fields such as bioinformatics, medicine, and astronomy, and in a variety of human tasks. It is the study of making machines more human-like in their decisions and actions by giving them the ability to learn and develop their own programs, with little or no human intervention.

2. Prerequisite to aid the understanding of machine learning

2.1 Learning

Learning is the process of converting experience into expertise or knowledge. In traditional learning, learning is done by memorizing past experiences. Automated learning, popularly known as Machine Learning (ML), additionally makes use of labels to distinguish between seen and unseen experiences, events, and entities. Machine learning is preferred because of its ability to adapt and to tackle complex problems.

2.2 Machine learning

Machine learning algorithms, through the dataset used to run them, can teach computers to do what comes naturally to animals and human beings: learn from experience, by means of computational techniques. As the number of samples available for learning increases, machine learning algorithms adaptively improve their performance.

2.3 Appropriate situation to use machine learning

Machine learning is used for complex tasks or problems that involve a huge amount of data and many variables, but for which no existing formula or equation is available.

3. Machine learning algorithms

Machine learning algorithms can be divided into the following types based on their learning style.

3.1 Supervised learning algorithms

The training data is provided along with labels, which direct the training process. The model is trained until the desired level of accuracy is attained on the training data set. Instances of such problems are classification and regression. Typical algorithms include Logistic Regression, Naive Bayes, Decision Trees, Linear Regression, Support Vector Machines (SVM), Nearest Neighbor, Neural Networks, and others. Example use cases are digital marketing, the Internet of Things (IoT), and asset maintenance.

3.2 Unsupervised machine learning algorithms

The input data used for unsupervised machine learning algorithms is not labeled. The model works by recognizing patterns present in the data fed into it. It can handle problems such as dimensionality reduction and clustering. The algorithms used for these categories of problems include, but are not limited to, the Apriori algorithm, Association Rules Mining, K-Means, and their variants. A real-life example of unsupervised machine learning is a supermarket that wishes to increase its revenue and decides to run a machine learning algorithm on its product data. It may observe that customers who buy Milo more often tend to buy cow milk, or that those who buy Lipton tea tend to buy pure honey.

3.3 Semi-supervised machine learning algorithms

Labeling the data in a dataset is relatively costly, as it can only be done well with the knowledge of highly skilled human experts. In semi-supervised learning, the input data is therefore a mixture of labeled and unlabeled data. The model makes predictions by learning the underlying patterns on its own, without human intervention. It can handle classification and clustering problems. It is applied in search engines, like Google, and in the analysis of images and audio.

3.4 Time series algorithms

A time series algorithm is a learning algorithm that attempts to find the best model and parameter values for a given dataset. A time series is a collection of observations of a well-defined quantity obtained over time through repeated measurements. For example, measuring the rainfall over an area of land each month of a given year produces a time series dataset. Time series forecasting uses a model to predict future values. Its applications include, but are not limited to, earthquake prediction, signal processing, weather forecasting, and pattern recognition. Time series analyses have limitations: they do not generalize well from a single study, appropriate measures are not easily obtainable, and it is sometimes difficult to identify the correct model to represent a given dataset. Areas of application in healthcare include forecasting pandemic spread, diagnosis, and medication planning.
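As a minimal sketch of forecasting from a time series, the snippet below predicts the next value as the mean of the most recent observations. The moving-average model, the function name `moving_average_forecast`, the window size, and the rainfall figures are all illustrative assumptions rather than a method prescribed by this chapter.

```python
def moving_average_forecast(series, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

# Hypothetical monthly rainfall totals (mm) for one area of land.
monthly_rainfall = [80.0, 95.0, 110.0, 120.0, 100.0, 90.0]
next_month = moving_average_forecast(monthly_rainfall)
```

More capable forecasting models exist; this one only illustrates the idea of using past observations to estimate a future value.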

4. Algorithms grouped by similarity of function

4.1 Regression algorithms

Regression is a process that identifies the relationship between the target output variables and the input attributes in order to make predictions about new data. Commonly used regression algorithms are Simple Linear Regression, Multiple Regression, Lasso Regression, Logistic Regression, Multivariate Regression, and so on.

4.2 Instance-based algorithms

An instance-based algorithm is a type of machine learning algorithm that compares new instances of a problem with those in the training dataset to find the best match and makes a prediction accordingly. Common examples of such algorithms are Locally Weighted Learning, Learning Vector Quantization, Self-Organizing Map, k-Nearest Neighbor, Support Vector Machines, et cetera.
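The matching idea behind instance-based learning can be sketched with k-Nearest Neighbor. The code below is an illustrative sketch only: the function name `knn_predict`, the choice of Euclidean distance, the value k = 3, and the toy data are assumptions made for the example.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training instance.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest instances and let them vote.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups in 2-D.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]

prediction_near_a = knn_predict(points, labels, (1.1, 1.0))
prediction_near_b = knn_predict(points, labels, (5.1, 5.0))
```

No model is fitted in advance: the training instances themselves are stored and consulted at prediction time.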

4.3 Regularization

Regularization is one of the techniques used for constraining the learning process on a particular set of features. A penalty on the magnitude of the weights attached to the features is added to the training objective; this prevents certain features from dominating the prediction process, which can otherwise result in overfitting. Typical regularization algorithms are Ridge Regression, Least-Angle Regression, and so on.
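To make the effect of regularization concrete, here is a small sketch of ridge regression for the simplest possible case: one feature and no intercept, where minimizing the squared error plus `penalty * w**2` has the closed form w = Σxy / (Σx² + penalty). The function name `fit_ridge_1d` and the data are hypothetical and chosen only for illustration.

```python
def fit_ridge_1d(xs, ys, penalty):
    """Closed-form ridge fit of y ~ w * x (single feature, no intercept)."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # The penalty term in the denominator shrinks the weight toward zero.
    return sum_xy / (sum_xx + penalty)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated from y = 2x

w_unregularized = fit_ridge_1d(xs, ys, penalty=0.0)    # ordinary least squares
w_regularized = fit_ridge_1d(xs, ys, penalty=10.0)     # shrunk weight
```

With no penalty the fit recovers the slope 2.0; with a penalty of 10 the weight shrinks to 1.5, which illustrates how regularization limits the influence of a feature.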

4.4 Decision tree algorithms

These methods construct a tree-based model from decisions made by examining the values of the attributes (features). Decision trees are used in classification and regression problems. Well-known decision tree algorithms include, but are not limited to, Classification and Regression Tree, C5.0, Conditional Decision Trees, Chi-squared Automatic Interaction Detection, and Decision Stump.

4.5 Bayesian algorithms

Bayesian algorithms make use of Bayes' theorem to solve classification and regression problems. They include Naive Bayes, Gaussian Naive Bayes, Multinomial Naive Bayes, Bayesian Belief Network, Bayesian Network, Averaged One-Dependence Estimators, and so on.

4.6 Clustering algorithms

Algorithms that group the data points in a dataset into different clusters are called clustering algorithms. Similar data points, with the same properties, are grouped together. In hard clustering the clusters are mutually exclusive, so each data point belongs to exactly one cluster. Clustering is widely used for statistical data analysis in many fields. Examples are k-Means and Hierarchical Clustering, to mention a few.
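The following is a minimal sketch of k-Means on one-dimensional data, alternating an assignment step and an update step (Lloyd's algorithm). The function name `k_means_1d`, the fixed starting centroids, and the data values are assumptions made for illustration; practical implementations usually choose the starting centroids more carefully.

```python
def k_means_1d(values, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = sum(members) / len(members)
    return centroids, clusters

values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
final_centroids, final_clusters = k_means_1d(values, centroids=[0.0, 5.0])
```

On this toy data the two centroids settle at the means of the low group and the high group, and every value ends up in exactly one cluster.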

4.7 Association rule learning algorithms

Association rule learning is a rule-based learning method for identifying relationships between variables in a very large dataset. It is commonly employed in market basket analysis. The Apriori algorithm and FP-Growth are typical examples.
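Market basket analysis usually rests on two quantities: the support of an item set (how often it appears) and the confidence of a rule (how often the consequent appears when the antecedent does). The sketch below computes both for a rule like "Milo implies milk", echoing the supermarket example given earlier; the baskets and the function names `support` and `confidence` are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical baskets, loosely following the supermarket example.
baskets = [
    {"Milo", "milk"},
    {"Milo", "milk", "bread"},
    {"Milo", "bread"},
    {"tea", "honey"},
    {"tea", "honey", "milk"},
]

milo_milk_support = support(baskets, {"Milo", "milk"})              # 2 of 5 baskets
milo_to_milk_confidence = confidence(baskets, {"Milo"}, {"milk"})   # 2 of 3 Milo baskets
```

Algorithms such as Apriori add an efficient search over candidate item sets on top of these two measures; that search is not shown here.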

4.8 Artificial neural network algorithms

Artificial neural network algorithms emulate the biological neurons in the human brain. They form a family of complex pattern matching and prediction processes for classification and regression problems. Some of the well-known artificial neural network algorithms are Perceptron, Multilayer Perceptrons, Stochastic Gradient Descent, Back-Propagation, Hopfield Network, Radial Basis Function Network, et cetera.

4.9 Deep learning algorithms

Deep learning algorithms are extended versions of artificial neural networks that can handle very large and complex collections of labeled data. Deep learning draws on more powerful computational resources, which enable it to handle text, image, audio, and video data effectively. Self-taught learning constructs with many hidden layers help deep learning cope with huge volumes of data. Deep learning algorithms include, but are not limited to, Convolutional Neural Networks, Recurrent Neural Networks, Deep Boltzmann Machines, Auto-Encoders, Deep Belief Networks, Long Short-Term Memory Networks, and so forth.

4.10 Dimensionality reduction algorithms

These algorithms exploit the essential structure of data in an unsupervised manner to express the data using a reduced set of information. They transform high-dimensional data into a lower dimension in such a way that supervised learning methods such as regression and classification can be used with it. Some of the well-known dimensionality reduction algorithms include, but are not limited to, Principal Component Analysis, Principal Component Regression, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Mixture Discriminant Analysis, Flexible Discriminant Analysis, and Sammon Mapping.

4.11 Ensemble algorithms

Ensemble techniques are models made up of several weaker models that are trained individually and whose individual predictions are combined using some methodology to obtain the final overall prediction. The quality of the output depends on the methodology used to combine the individual results. Some typical methods are Random Forest, Boosting, Bootstrapped Aggregation, AdaBoost, Stacked Generalization, Gradient Boosting Machines, Gradient Boosted Regression Trees, Weighted Average, and the like.
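One simple way of combining individual predictions is majority voting. The sketch below assumes three hand-written threshold rules standing in for individually trained weak models; the names `majority_vote`, `weak_models`, and `ensemble_predict` and the thresholds are illustrative only.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine one prediction per base model into a single label."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical weak classifiers, each a simple threshold rule.
weak_models = [
    lambda x: "high" if x > 3 else "low",
    lambda x: "high" if x > 5 else "low",
    lambda x: "high" if x > 7 else "low",
]

def ensemble_predict(x):
    """Ask every weak model, then return the most common answer."""
    return majority_vote([model(x) for model in weak_models])

label_for_6 = ensemble_predict(6)   # votes: high, high, low
label_for_4 = ensemble_predict(4)   # votes: high, low, low
```

Other combination methods named above, such as weighted averaging or stacking, replace the voting step with a different rule.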

5. Applications of machine learning

These algorithms are proficient at building intelligent systems that learn from past knowledge using historical data and give accurate results. Many industries now apply ML solutions to their business problems, or use them to create new and improved products and services that give them an edge over competitors. They find application in healthcare, defense, financial services, marketing, security services, and many more.

5.1 Facial recognition and image recognition

One application of machine learning is facial recognition; an example is its use on the iPhone. Typical use cases of facial recognition are security purposes such as identifying criminals, searching for missing individuals, and aiding forensic investigations. Intelligent marketing, diagnosing diseases, and tracking attendance in schools are some of its other uses.

5.2 Automatic speech recognition

Automatic speech recognition (ASR) converts speech into digital text. Its applications include authenticating users based on their voice and performing tasks based on a person's voice inputs. Speech patterns and vocabulary are fed into the system to train the model. ASR systems currently find a wide variety of applications in areas such as defense, aviation, medical assistance, industrial robotics, and security access control, as well as in home automation, the telecommunications industry, information technology, consumer electronics, forensics, law enforcement, and others.

5.3 Financial services

Machine learning has many applications in financial services. Machine learning algorithms have proved to be excellent at detecting fraud by monitoring the activities of a particular user and evaluating whether an attempted activity is characteristic of that user or not. Money laundering detection and monitoring is a typical financial security use case. Machine learning is also useful in trading decisions, with the help of algorithms that can analyze thousands of data sources concurrently. Its application extends to credit scoring and underwriting. It is also widely used in virtual personal assistants like Siri, Google Now, Alexa, and so on.

5.4 Marketing and sales

It helps businesses improve dynamic pricing by using regression techniques and models to make predictions, and it supports sentiment analysis to gauge consumer response to a specific product or marketing initiative. In computer vision, it helps brands identify their products in images and videos online, and so measure mentions that lack any relevant information in textual form.

5.5 Healthcare

A vital application is the diagnosis of diseases and ailments that are otherwise hard to diagnose using other methods. For example, it improves radiotherapy, clinical trials, and the prediction of epidemic outbreaks, giving better results at lower cost and in less time.

5.6 Recommendation systems

Today, businesses recognize and use recommendation systems for effective communication with the users of their sites. Relevant products such as movies, web series, songs, and much more are recommended to the user, together with relevant information that encourages the customer to patronize them. E-commerce sites like Amazon, and many others, are well-known recommendation system use cases.

6. Real-world machine learning use cases

The real world machine learning use cases can be found in:

6.1 Fraud detection

Machine learning algorithms can be trained to detect patterns of fraudulent activities. There are three basic types of fraud: asset misappropriation, bribery and corruption, and financial statement fraud.

6.2 Image and speech recognition

Machine learning algorithms can be used to recognize and classify objects, people, and spoken words in images and audio recordings into the categories to which they belong. For example, they can be used to classify the type of flower in a picture or to tell an apple from a banana.

6.3 Predictive maintenance

Predictive maintenance monitors the performance and condition of equipment during normal operation to reduce the likelihood of failures. This results in accurate prediction of downtime and makes maintenance more cost-effective.

6.4 Personalization

Personalization helps you gain insights into customer preferences and intent through data. It enables you to offer customers what they are likely to want, for example through recommendations on online shopping websites or streaming services.

6.5 Healthcare

Machine learning can be used to predict patient outcomes, to identify and predict potential outbreaks of infectious diseases that can lead to epidemics, and to assist in accurate diagnosis, timely treatment, and treatment planning. For example, a machine learning algorithm can be used in medical imaging (such as X-ray scans), using pattern recognition to look for patterns that indicate a particular disease.

6.6 Natural language processing

Machine learning can be used to understand and process human languages, enabling applications such as language translation and chatbots that reduce language barriers across the globe. For example, it enables systems to handle several languages, dialects, slang, and jargon.

7. Future of machine learning

Machine learning has a bright future, both in its development and in its application to various human endeavors. It has great potential to advance many fields such as science, research, technology, and society at large. Potential future uses for machine learning include, but are not limited to, intelligent assistants, personalized healthcare, self-driving automobiles, and tackling global issues like poverty and climate change. Further work will also go into developing new algorithms and techniques and into selecting the best algorithm for a particular task. Machine learning applications will increase, and new innovations will emerge in robotics, self-supervised learning, and multi-agent learning. Such systems will become more intelligent at completing tasks in the near future.

8. Over-fitting

Assume one trains a model on a dataset of 1000 faces. If we evaluate the model on the original dataset, its predictions are, say, 99% accurate. But when we run the model on a new (“unseen”) dataset of faces that are not in the original dataset, its accuracy is, say, 60%. Our model does not generalize well from the training data to the unseen data. This is known as overfitting, and it is a common problem in machine learning models.

8.1 Reasons for over-fitting

A model can overfit because it is too complex, because it has high variance, or because the training dataset is too small.

8.2 Ways to tackle over-fitting

Over-fitting can be tackled by using K-fold cross-validation or regularization techniques such as Lasso and Ridge, by training the model with sufficient data, and by adopting ensemble techniques.
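As a sketch of how K-fold cross-validation partitions a dataset, the generator below yields, for each fold, the indices used for training and the indices held out for validation. The name `k_fold_indices` is hypothetical, and the sketch assumes the samples have already been shuffled if their order matters.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
```

Each sample is held out exactly once, so averaging a model's score over the folds gives an estimate of how it behaves on data it was not trained on.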

9. Under-fitting

A model underfits a dataset if it cannot capture the underlying pattern in the training data, and as a result it also fails to generalize to new data that are not part of the training set. This is typically due to high bias and low variance.

9.1 Reasons for under-fitting

A model may underfit because the data used for training is not clean enough and contains noise (garbage values), because the model has high bias, because the training dataset is not large enough, or because the model is too simple to capture the needed information from the dataset.

9.2 Ways to tackle under-fitting

Under-fitting can be reduced or eliminated by increasing the number of features in the dataset, increasing model complexity, reducing noise in the dataset, and increasing the training time.

10. Training data

The training data set is, say, about 70% of the entire data. It is used to teach a machine learning model, that is, to set model parameters such as weights and biases.

10.1 Validation data

The validation data set is, say, no more than 10% of the whole data; it is data that the model is not trained on. It is used to select the right model, for example the number of hidden units in each layer, and to fine-tune hyper-parameters in order to prevent overfitting.

10.2 Testing data

The test data set is about 20% of the whole dataset. The model's performance is measured using the test data. Testing can be carried out by the tester directly or by a program or function that aids the tester.
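The 70% / 10% / 20% division described in this section can be sketched as follows. The function name `split_dataset`, its parameters, and the fixed random seed are assumptions for the example; the exact proportions are a matter of choice.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.1, seed=0):
    """Shuffle the samples and split them into training, validation, and test subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever remains (about 20% here)
    return train, validation, test

train_set, validation_set, test_set = split_dataset(range(100))
```

Shuffling before splitting avoids placing, for example, all the samples of one class in the test set when the original data is ordered.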

11. Perceptron: a unit of learning

A perceptron is a unit of learning in a machine learning algorithm: a single-layer neural network with a linear decision function that can be used as a binary classifier, trained with a supervised learning procedure. The input data is iterated over in a loop in order to teach the algorithm. This loop runs every time a dataset is fed to the machine. The algorithm adjusts its output based on its findings after each iteration. After many iterations, the output is more refined and more accurate. The neuron learns by processing the elements of the training set one at a time. The perceptron is also known as a Linear Binary Classifier [1].

11.1 Perceptron working principle

A perceptron is a single-layer neural network with four components: input values (input nodes), weights and bias, net sum, and an activation function. The perceptron multiplies each input value by its respective weight and adds these products together to form the weighted sum. This weighted sum is then passed to the activation function ‘f’, here a step function, to obtain the output.

11.2 Perceptron diagram

The architecture of an artificial neural network is loosely modeled on the human neural network. The perceptron has only one neuron. It is shown in Figure 1.

Figure 1.
A schematic diagram of an artificial neuron. Source: [2].

The perceptron has an input layer and a single neuron. The number of nodes in the input layer is equal to the number of attributes in the input dataset. Each input is multiplied by a weight (which is usually initialized to a random value) and the results are added. The sum then goes through an activation function, which processes it and provides an output. The output is given in Eq. (1), source: [3].

$$\text{Output} = W_1 X_1 + \dots + W_{n-1} X_{n-1} + W_n X_n \tag{1}$$

The output is zero (0) if the sum is below a certain threshold, or one (1) if the sum is above that threshold.

$X_1, \dots, X_{n-1}, X_n$ are the inputs to the neuron, while $W_1, \dots, W_{n-1}, W_n$ are the corresponding weights.
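The weighted sum, the step activation, and the iterative weight adjustment described in this section can be sketched as below. The perceptron learning rule used here, the zero initial weights, the learning rate of 1.0, the threshold of zero, and the logical AND data are assumptions chosen to keep the example small; the chapter itself does not prescribe them.

```python
def step(weighted_sum, threshold=0.0):
    """Step activation: 1 if the sum exceeds the threshold, otherwise 0."""
    return 1 if weighted_sum > threshold else 0

def perceptron_output(weights, bias, inputs):
    """Weighted sum of the inputs plus bias, passed through the step function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(weighted_sum)

def train_perceptron(samples, targets, learning_rate=1.0, epochs=20):
    """Perceptron learning rule: adjust weights in proportion to the prediction error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in zip(samples, targets):
            error = target - perceptron_output(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Logical AND is linearly separable, so a single perceptron can learn it.
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]
weights, bias = train_perceptron(and_inputs, and_targets)
and_predictions = [perceptron_output(weights, bias, x) for x in and_inputs]
```

A single perceptron can only separate classes with a straight line (or hyperplane), which is why a linearly separable problem was chosen for the sketch.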

12. Artificial neural network (ANN)

An artificial neural network is a model that processes information in a way inspired by how biological neural systems, such as the brain, process information [2]. An ANN is configured for a specific application, such as data classification or pattern recognition, through a learning process. Learning in an ANN involves modifying the weights of the connections that exist between neurons. In the network, a large number of strongly interconnected processing elements (neurons) work in parallel to solve a specific problem. Neural networks learn by example. There are three kinds of layers in a neural network, namely an input layer, a hidden layer, and an output layer, in that order. Each layer consists of one or more nodes, shown in Figure 2 as small circles. Every node in one layer is connected to all the nodes in the next layer [4]. Each node takes the weighted sum of its inputs and passes it through a non-linear activation function. The output of a node in one layer is an input of the nodes in the next layer. The flow of signal is from left to right. The final output is calculated by performing this computation for each of the nodes in turn.
Neural network training means learning the weights associated with all the edges.

Figure 2.
A feed-forward artificial neural network. Source: [4].

The lines between the nodes show the flow of information from one node to the next. In this kind of neural network, information flows only from the input to the output (that is, from left to right). The neuron is the primary processing element and the fundamental building block of the neural network. Each neuron performs a fraction of the computations involved in a typical network. A neuron’s output serves as the input of the next neuron, except at the final output.
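A forward pass through a small feed-forward network can be sketched as follows. The sigmoid activation, the layer sizes, and the hand-picked weights are assumptions for illustration; in practice the weights are learned during training rather than written by hand.

```python
import math

def sigmoid(z):
    """A common non-linear activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each node: weighted sum of all inputs plus bias, then activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def feed_forward(inputs, layers):
    """Pass the signal left to right through every (weights, biases) layer."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal

# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden nodes -> 1 output node.
network = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),   # hidden layer
    ([[1.0, -1.0]], [0.0]),                       # output layer
]
output = feed_forward([0.0, 0.0], network)
```

The output of each layer becomes the input of the next, which mirrors the left-to-right flow of information in the figure.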

13. Goal of classification algorithm

The principal goal of a classification algorithm is to identify the category (class) to which each member of a given dataset belongs; these algorithms are mainly used to predict a categorical output. The output of a classification algorithm is illustrated in Figure 3 below. In the diagram there are two classes, Class A and Class B. The members of each class have features that are similar to each other and dissimilar to those of the other class.

Figure 3.
A classification algorithms output. Source: [5].

An algorithm that implements the classification of a dataset is known as a classifier. There are two types of classifiers:

Binary classifier: a classifier with two possible outcomes. Examples of binary classifier outputs are YES or NO, MALE or FEMALE, SPAM or NOT SPAM, 0 or 1, present or absent, and so on.

Multi-class classifier: a classifier with more than two possible outcomes.

13.1 Evaluating a classification model

Once the model is completed and ready for use, it is necessary to evaluate its performance to know its capability. This can be done using log loss (cross-entropy loss), a confusion matrix, or the AUC-ROC curve.

13.1.1 Log loss or cross-entropy loss

Log loss is used for evaluating the performance of a classifier whose output is a probability between zero and one. For a high-quality binary classification model, the value of log loss should tend toward zero: the more accurate the model, the lower the log loss. For a single sample in binary classification, the cross-entropy is calculated as

$$-\bigl(y \log(p) + (1 - y)\log(1 - p)\bigr)$$

where y is the actual output and p is the predicted output.
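The cross-entropy expression can be evaluated directly, as in the sketch below, which averages the per-sample loss over a small set of predictions. The function name `binary_log_loss`, the clipping constant `eps`, and the example probabilities are assumptions made for illustration.

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)   # keep log() away from exactly 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

actual = [1, 0, 1, 0]
confident_and_right = binary_log_loss(actual, [0.9, 0.1, 0.9, 0.1])
uninformative = binary_log_loss(actual, [0.5, 0.5, 0.5, 0.5])
```

Predictions that are confident and correct give a loss close to zero, while predicting 0.5 for every sample gives log(2), roughly 0.693.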

13.1.2 Confusion matrix

The confusion matrix, or error matrix, summarizes the prediction results in a table that gives the total numbers of correct and incorrect predictions. The matrix is shown in Table 1 below:

		Actual
		Positive	Negative
Predicted	Positive	True Positive (TP)	False Positive (FP)
Predicted	Negative	False Negative (FN)	True Negative (TN)

Table 1.

Confusion matrix (source: [6]).

i. True positive (TP): correct positive prediction.

ii. False positive (FP): incorrect positive prediction.

iii. True negative (TN): correct negative prediction.

iv. False negative (FN): incorrect negative prediction.

Sensitivity

Sensitivity is the true positive rate of prediction. It is defined as in Eq. (2).

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$

Specificity

Specificity is true negative rate. It is defined as in Eq. (3).

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{3}$$

Classification Accuracy: Classifier performance is measured using Classification Accuracy (CA). It gives the proportion of correct predictions out of the total instances in the dataset, often expressed as a percentage. CA can be calculated from the confusion matrix using Eq. (4).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

$$\text{Accuracy} = \frac{TP + TN}{\text{Total population}}$$
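Eqs. (2) to (4) can be checked on a small example. The sketch below counts the four cells of the confusion matrix from paired actual and predicted labels and then applies the three formulas; the function name `confusion_counts` and the label lists are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, FN for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # correct positive prediction
        elif p == positive:
            fp += 1          # incorrect positive prediction
        elif a == positive:
            fn += 1          # incorrect negative prediction
        else:
            tn += 1          # correct negative prediction
    return tp, fp, tn, fn

actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
tp, fp, tn, fn = confusion_counts(actual, predicted)

sensitivity = tp / (tp + fn)                    # Eq. (2)
specificity = tn / (tn + fp)                    # Eq. (3)
accuracy = (tp + tn) / (tp + tn + fp + fn)      # Eq. (4)
```

For these labels the counts are TP = 3, FP = 2, TN = 4, and FN = 1, giving a sensitivity of 0.75, a specificity of about 0.67, and an accuracy of 0.7.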

13.1.3 AUC-ROC curve

ROC stands for Receiver Operating Characteristic, and AUC stands for Area Under the Curve. The ROC curve is a graph that shows the performance of a classification model at different thresholds. The AUC-ROC curve is used for visualizing the performance of classification models, including multi-class models. The ROC curve is plotted with the True Positive Rate (TPR) on the Y-axis and the False Positive Rate (FPR) on the X-axis.
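The points of a ROC curve and the area under it can be computed from a list of true labels and predicted scores, as in the sketch below. It assumes a binary problem with labels 0 and 1 and distinct scores; the function names `roc_points` and `auc` and the data are illustrative, and the area is approximated with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold past each score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Visit samples from highest to lowest score (scores assumed distinct).
    for score, label in sorted(zip(scores, y_true), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
```

An area of 1.0 corresponds to a model that ranks every positive above every negative, while 0.5 corresponds to random ranking; this example falls in between because one negative outranks one positive.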

14. Classification algorithms use cases example

Classification algorithms can be used in different areas such as email spam detection, speech recognition, identification of cancer tumor cells, and so on.

15. Principal component analysis (PCA)

Principal Component Analysis is an example of an unsupervised learning algorithm and is very often used for dimensionality reduction in machine learning. It uses a statistical procedure to convert observations of correlated features into a set of linearly uncorrelated features by applying an orthogonal transformation; the newly transformed features are known as the principal components. Some real-world applications of PCA are image processing and power utilization optimization.

15.1 Some common technical terms used in the PCA algorithm

Dimensionality: The number of features or variables present in a given dataset, that is, the number of columns in the dataset.
Correlation: This signifies how strongly two variables are related to each other. They can be either directly or inversely related.
Orthogonal: This means that the variables are not associated with each other; the correlation between the pair of variables is zero.
Eigenvectors: For a square matrix M, an eigenvector is a non-zero vector v for which Mv is a scalar multiple of v; that scalar is the corresponding eigenvalue.
Covariance Matrix: A matrix that contains the covariance values between pairs of elements of a random vector [5].

15.2 Principal components in PCA

Each principal component is a linear combination of the original features, the components are orthogonal to one another, and the importance of each component decreases when going from component 1 to component n.

15.3 Algorithm of principal component analysis (PCA)

Algorithm steps

Step 1: Get your data. Divide the input dataset into, say, set X as the training set and set Y as the validation set [5].

Step 2: Give your data a matrix structure.

Step 3: Standardize the dataset, so that features (columns) with high variance do not dominate features with lower variance. Call the standardized matrix Z.

Step 4: Compute the covariance matrix of Z.

Step 5: Compute the eigenvectors and eigenvalues of the covariance matrix.

Step 6: Rank the eigenvectors by decreasing eigenvalue.

Step 7: Calculate the new features by projecting Z onto the top-ranked eigenvectors.
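The steps above can be sketched in plain Python for the first principal component only. This is an illustrative sketch under several assumptions: the function names are hypothetical, the data is made up, and power iteration is used as one simple way of finding the eigenvector with the largest eigenvalue; it is not presented as the procedure used in the cited source. Ranking several components would require a full eigendecomposition, which is not shown.

```python
import math

def standardize(rows):
    """Scale every column to zero mean and unit variance (the matrix Z)."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
            for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def covariance_matrix(z):
    """Sample covariance of the (already centred) columns of z."""
    n, d = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenvector(matrix, iterations=200):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    d = len(matrix)
    vector = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    eigenvalue = sum(vector[i] * sum(matrix[i][j] * vector[j] for j in range(d))
                     for i in range(d))
    return vector, eigenvalue

# Hypothetical data: two strongly correlated features.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
z = standardize(data)                                   # Step 3
cov = covariance_matrix(z)                              # Step 4
component, variance = top_eigenvector(cov)              # Steps 5 and 6 (top one only)
scores = [sum(c * v for c, v in zip(component, row)) for row in z]   # Step 7
```

Because the two features are almost perfectly correlated, the first component carries nearly all of the variance, so the two columns can be replaced by the single column `scores` with little loss of information.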

15.4 Applications of principal component analysis

PCA is a dimensionality reduction technique commonly used in various Artificial Intelligence (AI) applications such as computer vision, image compression, and so on. Its application areas include finding hidden patterns in high-dimensional datasets in finance, psychology, et cetera.

15.5 Model parameter

Model parameters are explained using the graph below (Figure 4).

Figure 4.
A model parameter graph (source: [5]).

The graph is a model of a simple linear regression. Here, x is the independent variable and y is the dependent variable. The equation y = mx + c is the regression line; it describes the relationship between x and y. The value c is the intercept of the line and m is the slope. These two values are estimated by fitting the line so as to minimize the Root Mean Square Error (RMSE), and they are known as model parameters. Model parameters are the configuration variables that are internal to the model.
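The way the parameters m and c are learned from data can be sketched with the standard least-squares formulas for a straight line, which minimize the squared error and therefore the RMSE as well. The function names `fit_line` and `rmse` and the data points are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    c = mean_y - m * mean_x
    return m, c

def rmse(xs, ys, m, c):
    """Root mean square error of the line on the given points."""
    return (sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]      # generated from y = 2x + 1
m, c = fit_line(xs, ys)
error = rmse(xs, ys, m, c)
```

Here the data lies exactly on a line, so the fit recovers m = 2 and c = 1 with zero error; with noisy data the same formulas return the line with the smallest RMSE.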

Model parameters are important to machine learning algorithms because the model uses them for making predictions, the model learns them from the dataset presented to it, and they are not set manually. Examples of model parameters are the weights and biases in artificial neural networks, the support vectors in support vector machines, and the coefficients in logistic regression.

16. Model hyper-parameter

The parameters that are defined explicitly by the designer of the machine learning model (the engineer or user) to control the learning process are called hyper-parameters. They are defined manually; the best value is usually determined by trial and error or by using a rule of thumb (Table 2).

S/N	Parameters	Hyper-parameters
1.	Parameters are the configuration of a model and are internal to the model only.	Hyper-parameters are explicitly specified parameters that control the training process.
2.	Parameters are essential for making reliable predictions.	Hyper-parameters play a key role in optimizing the model.
3.	They are determined during model training.	They are set before training commences.
4.	They are internal to the model.	They are external to the model.
5.	They are learned and set by the model itself.	They are set manually by the machine learning engineer or practitioner.
6.	They depend on the dataset used for training.	They are independent of the dataset used for training.
7.	Optimization algorithms, for instance Gradient Descent, estimate the values of parameters.	Hyper-parameter tuning is used to estimate the values of hyper-parameters.
8.	The final parameters estimated after training determine the model's performance on unseen data.	The selected or fine-tuned hyper-parameters determine the quality of the model.
9.	Weights in an ANN and support vectors in an SVM are examples of model parameters.	The learning rate for training a neural network, K in the KNN algorithm, and so on are examples of model hyper-parameters.

Table 2.

Comparison between parameters and hyper-parameters (source: [5]).
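The trial-and-error approach to choosing a hyper-parameter can be sketched as follows. This is a minimal illustration, not a prescribed procedure: it tries several values of K for a one-dimensional KNN classifier and keeps the value that scores best on held-out validation data. The synthetic data, the candidate values of K, and the helper names are all assumptions made for the example.

```python
import random

def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = [label for _, label in neighbours]
    return max(set(votes), key=votes.count)

def accuracy(train, data, k):
    """Fraction of points in data whose label is predicted correctly."""
    correct = sum(knn_predict(train, x, k) == label for x, label in data)
    return correct / len(data)

def make_point():
    # Hypothetical data: class 0 is centred at 0.0 and class 1 at 2.0.
    label = random.choice([0, 1])
    return (random.gauss(2.0 * label, 1.0), label)

random.seed(0)
points = [make_point() for _ in range(200)]
train, validation = points[:150], points[150:]

# K is a hyper-parameter: it is chosen by the practitioner, not learned.
# Odd values avoid tied votes between the two classes.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)
print(scores, "best K:", best_k)
```

The validation set is kept separate from the training points so that the chosen K reflects performance on data the classifier has not seen.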

17. Errors in machine learning

An error in machine learning is a measure of how far the predictions of a machine learning algorithm deviate from the true values on an unseen dataset. Errors are either reducible or irreducible. A reducible error can be reduced to improve the prediction accuracy of the model; bias error and variance error are examples. Irreducible errors, by contrast, are always present in the model regardless of which algorithm is used. They are caused by unknown variables whose influence cannot be reduced by any method [5].

18. Bias

In a real-life situation, a machine learning model analyzes a given dataset, finds patterns in it, and makes predictions from them. During the training phase, the model learns these patterns and then applies them to test data in order to make predictions. Bias error is the difference between the values predicted by the model and the actual expected values. It can arise when a machine learning algorithm, such as Linear Regression, is unable to capture the true relationship between the data points in the dataset. Each algorithm begins with some assumptions built into the model, which make the target function easier to learn [6].

A high-bias model makes more assumptions, so it may be unable to capture the important features of a given dataset and may not perform well on new data. A low-bias model, by contrast, makes fewer assumptions about the form of the target function.

18.1 Ways to reduce high bias

Since high bias arises when the model is too simple and under-fits the data, it can be reduced by increasing the number of input features, decreasing the regularization term, and introducing more complex models, for example by using polynomial features.
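The effect of a polynomial feature on an under-fitted model can be illustrated with a small sketch. It assumes hypothetical data generated by y = x², and for brevity the richer model uses the single derived feature x² instead of x; in practice the new feature would normally be added alongside the original one. The helper names are illustrative.

```python
from math import sqrt

def fit_line(zs, ys):
    """Least-squares slope and intercept for y = m*z + c."""
    n = len(zs)
    z_mean, y_mean = sum(zs) / n, sum(ys) / n
    m = (sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
         / sum((z - z_mean) ** 2 for z in zs))
    return m, y_mean - m * z_mean

def rmse(zs, ys, m, c):
    """Root mean square error of y = m*z + c on the given points."""
    return sqrt(sum((m * z + c - y) ** 2 for z, y in zip(zs, ys)) / len(zs))

# Hypothetical curved relationship: y = x^2.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x * x for x in xs]

# High-bias model: a straight line in x cannot follow the curve.
m1, c1 = fit_line(xs, ys)
rmse_linear = rmse(xs, ys, m1, c1)

# Richer model: the same fitting routine applied to the polynomial feature z = x^2.
zs = [x * x for x in xs]
m2, c2 = fit_line(zs, ys)
rmse_poly = rmse(zs, ys, m2, c2)

print(f"RMSE with x only: {rmse_linear:.3f}")
print(f"RMSE with x^2:    {rmse_poly:.3f}")
```

The straight line leaves a large error on this data, while the model that sees the squared feature fits it closely, which is the sense in which added features reduce bias.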

18.2 Epoch

An epoch is a complete pass of the training dataset through the machine learning algorithm. The number of epochs is an important hyper-parameter for a machine learning algorithm [6].
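The role of the number of epochs can be sketched with a small training loop. The example below is an assumption-laden illustration: it fits a line to hypothetical data by batch gradient descent, where each iteration of the outer loop is one epoch. The function names, the data, and the learning rate of 0.05 are chosen for the example only.

```python
def train(xs, ys, learning_rate, epochs):
    """Fit y = m*x + c by batch gradient descent on the mean squared error."""
    m, c = 0.0, 0.0          # model parameters, learned from the data
    n = len(xs)
    for _ in range(epochs):  # one epoch = one complete pass over the training set
        grad_m = sum(2 * (m * x + c - y) * x for x, y in zip(xs, ys)) / n
        grad_c = sum(2 * (m * x + c - y) for x, y in zip(xs, ys)) / n
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c

def mse(xs, ys, m, c):
    """Mean squared error of y = m*x + c on the given points."""
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data generated by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# learning_rate and epochs are hyper-parameters chosen before training.
for epochs in (1, 10, 100, 1000):
    m, c = train(xs, ys, learning_rate=0.05, epochs=epochs)
    print(epochs, round(m, 3), round(c, 3), round(mse(xs, ys, m, c), 6))
```

With too few epochs the parameters are still far from the values that generated the data; with more epochs the error shrinks, at the cost of longer training.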

19. Variance error

Variance indicates how much the prediction of a model would change if a different training dataset were used. It shows how far the result produced by the model differs from its expected value. Low variance means that there is a small variation between the expected result and the result produced by the model, while high variance means a large variation between the predicted value and the expected value. Linear Regression is an example of a machine learning algorithm with low variance, while Support Vector Machine and K-nearest neighbors tend to have high variance.

20. Bias-variance trade-off

Balancing the bias error and the variance error enables a machine learning model to avoid over-fitting and under-fitting a given dataset. A simple model with few parameters tends to have low variance and high bias, while a model with a large number of parameters tends to have high variance and low bias. Striking this balance between the bias error and the variance error is known as the bias-variance trade-off.
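The trade-off can be observed in a small simulation. The sketch below is illustrative only: it assumes a hypothetical target function f(x) = x² with Gaussian noise, and compares a very simple model (always predict the average target) with a very flexible one (copy the nearest training point). Each model is retrained on many resampled training sets, and the squared bias and the variance of its prediction at one test point are estimated.

```python
import random

def true_f(x):
    """Hypothetical target function used to generate the data."""
    return x * x

def sample_training_set(n, noise):
    """Draw n noisy observations of true_f on the interval [0, 1]."""
    xs = [random.random() for _ in range(n)]
    return [(x, true_f(x) + random.gauss(0.0, noise)) for x in xs]

def mean_model(train, x):
    """Very simple model: ignores x and predicts the average target."""
    return sum(y for _, y in train) / len(train)

def nearest_neighbour_model(train, x):
    """Very flexible model: copies the target of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def bias_and_variance(model, x0, trials=500, n=20, noise=0.3):
    """Estimate squared bias and variance of the model's prediction at x0."""
    preds = [model(sample_training_set(n, noise), x0) for _ in range(trials)]
    avg = sum(preds) / trials
    bias_sq = (avg - true_f(x0)) ** 2
    variance = sum((p - avg) ** 2 for p in preds) / trials
    return bias_sq, variance

random.seed(1)
simple = bias_and_variance(mean_model, x0=0.9)
flexible = bias_and_variance(nearest_neighbour_model, x0=0.9)
print("mean model        bias^2=%.4f variance=%.4f" % simple)
print("nearest neighbour bias^2=%.4f variance=%.4f" % flexible)
```

Under these assumptions the simple model shows the larger squared bias and the flexible model the larger variance, which is the pattern the trade-off describes.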

21. Machine learning life cycle

The machine learning life cycle is a recurring process that builds an efficient solution to a given problem using machine learning techniques and procedures. The most important thing in the whole process is to understand the problem at hand and to know the purpose of solving it. This helps to generalize and interpret the results properly. To solve a problem using the life cycle process, a model is designed using a machine learning algorithm, trained on a set of data, and validated using test data.

21.1 Data gathering

The machine learning life cycle begins with the data-gathering process. Its purpose is to identify and obtain all data related to the problem to be solved. Likely sources of data are files, databases, the internet, mobile devices, and so on. The greater the quantity and quality of the collected data, the better the efficiency of the machine learning algorithm and the accuracy of its predictions.

21.2 Data preparation

Data preparation is used to gain a better understanding of the nature of the data being analyzed. It requires a proper understanding of the characteristics, format, and quality of the data. At this stage, we may find correlations, general trends, and outliers.

21.3 Data pre-processing

Before data analysis can be carried out, the data must undergo preprocessing. Preprocessing is important because it enhances the performance of the algorithm. It includes checking the dataset for missing values, noisy data, and other inconsistencies before running the algorithm on it.

21.4 Data wrangling

The process of cleaning and converting raw data into a usable format is known as data wrangling. It involves cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. Datasets collected in real-world applications may have various issues such as missing values, duplicated data, invalid data, and noise, so filtering techniques of various types are used to clean the data. It is necessary to detect and remove these issues to avoid their negative effect on the quality of the outcome and on the performance of the machine learning algorithm.
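A few of these cleaning steps can be sketched in plain Python. The records, the column names, the plausible age range, and the choice of mean imputation below are all assumptions made for illustration; a real project would pick rules that suit its own data.

```python
# Hypothetical raw records with typical real-world problems.
raw_rows = [
    {"id": 1, "age": 34, "income": 52000.0},
    {"id": 2, "age": None, "income": 61000.0},   # missing value
    {"id": 3, "age": 29, "income": 48000.0},
    {"id": 3, "age": 29, "income": 48000.0},     # duplicated record
    {"id": 4, "age": -5, "income": 39000.0},     # invalid value
    {"id": 5, "age": 41, "income": None},        # missing value
]

def clean(rows):
    """Remove duplicates and invalid rows, then fill in missing values."""
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:        # keep the first record for each id
            seen.add(row["id"])
            unique.append(dict(row))     # copy so the raw data stays untouched
    # Drop rows whose age is present but outside an assumed plausible range.
    valid = [r for r in unique if r["age"] is None or 0 <= r["age"] <= 120]
    # Replace each remaining missing value with the mean of its column.
    for column in ("age", "income"):
        known = [r[column] for r in valid if r[column] is not None]
        mean = sum(known) / len(known)
        for r in valid:
            if r[column] is None:
                r[column] = mean
    return valid

cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

Copying each record before modifying it keeps the raw data available for auditing, which is useful when a cleaning rule later turns out to be too aggressive.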

21.5 Data analysis

The cleaned and prepared data is now passed on to the analysis step. This step involves selecting analytical techniques, building models, and reviewing the results obtained.

21.6 Train model

We train our model to improve its performance and obtain better outcomes in problem-solving. Training is required for the model to learn the various patterns, rules, and features that are present in the dataset.

21.7 Test model

After the machine learning model has been trained on a given dataset, the next step is to test it on a new dataset to check its accuracy. Testing determines the percentage accuracy of the model against the requirements of the project or problem that it is designed to solve.
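The train-then-test procedure can be sketched as follows. The example assumes hypothetical one-dimensional data with two classes and a deliberately simple threshold classifier; the helper names, the 80/20 split, and the seeds are illustrative choices, not part of the chapter's method.

```python
import random

def train_test_split(data, test_fraction, seed):
    """Shuffle a copy of the data and hold out a fraction of it for testing."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_threshold(train):
    """Toy classifier: the threshold is the midpoint between the two class means."""
    xs0 = [x for x, label in train if label == 0]
    xs1 = [x for x, label in train if label == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def predict(threshold, x):
    # Assumes class 1 has the larger values, as in the data generated below.
    return 1 if x > threshold else 0

def accuracy(threshold, data):
    """Fraction of points in data that are classified correctly."""
    return sum(predict(threshold, x) == label for x, label in data) / len(data)

# Hypothetical data: class 0 centred at 0.0, class 1 centred at 3.0.
rng = random.Random(42)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
        + [(rng.gauss(3.0, 1.0), 1) for _ in range(100)])

train, test = train_test_split(data, test_fraction=0.2, seed=0)
threshold = fit_threshold(train)          # training uses the training set only
test_accuracy = accuracy(threshold, test) # testing uses data the model has not seen
print(f"accuracy on unseen test data: {test_accuracy:.0%}")
```

The reported percentage can then be compared with the accuracy that the project requires.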

21.8 Deployment

The last step of the machine learning life cycle is the deployment of the designed model in a real-world application. For the model to predict accurately, the algorithm used to implement it must maintain a balance between bias and variance; this is a core issue in machine learning. In practice, both cannot be minimized at the same time, because bias and variance are inversely related: as the variance decreases, the bias increases, and vice versa. It is therefore imperative to find a balance point that produces an optimal model.

22. How to get datasets for machine learning

A good machine learning engineer needs enough knowledge to prepare a suitable dataset for each kind of machine learning project.

23. Dataset

A dataset is a collection of data stored in a table or arranged in some order. The data contained in the dataset may be in the form of an array. Table 3 shows an example of a dataset.

S/N	Product Identification Number	Product Name	Quantity	Unit Price	Country
1.	031846	Jpeg	98	$4.00	USA
2.	040159	Fanta	1000	$2.00	Saudi Arabia
3.	039466	Green Tea	10	$1.00	China
4.	040384	Soft balloon	100	$20.00	Nigeria
5.	048392	Yellow cake	99	$3.00	Iceland

Table 3.

Source: Author (2023).

Each column corresponds to a particular variable, and each row corresponds to one record in the dataset [5].
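The row and column view of a dataset can be sketched in code using three of the rows from Table 3. The snake_case column names are assumptions introduced for the example, and the list-of-dictionaries layout is only one of several reasonable representations.

```python
# Column names are hypothetical identifiers for the headers of Table 3.
columns = ["product_id", "product_name", "quantity", "unit_price_usd", "country"]

# Three rows taken from Table 3; product identifiers are kept as strings
# so that their leading zeros are preserved.
rows = [
    ("031846", "Jpeg", 98, 4.00, "USA"),
    ("039466", "Green Tea", 10, 1.00, "China"),
    ("040384", "Soft balloon", 100, 20.00, "Nigeria"),
]

# Each row is one record: pair every value with its column name.
records = [dict(zip(columns, row)) for row in rows]

# Each column is one variable: collect its value from every record.
quantity = [record["quantity"] for record in records]

# Variables can be combined across a record, for example stock value per product.
total_value = sum(r["quantity"] * r["unit_price_usd"] for r in records)

print(records[0])
print("quantities:", quantity)
print("total stock value in USD:", total_value)
```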

24. Population based algorithm

Population-based algorithms and their variants have recently seen increasing use and have solved a wide range of problems. This is partly due to their inherent ability to escape local optima, their extensibility to multi-objective problems, and their ability to handle linear and non-linear inequality and equality constraints in a straightforward way. A few examples of such algorithms are the genetic algorithm, ant colony optimization, particle swarm optimization, and the bees algorithm.

24.1 Working principle of population based algorithm

A population-based algorithm uses the principles of natural selection. It is a search-based optimization technique. This class of algorithm can find optimal or near-optimal solutions to complex problems that would otherwise take a lifetime to solve. It generates high-quality solutions to optimization and search problems using operators such as mutation, crossover, and selection.

24.2 Basic structure of population based algorithm

The population-based algorithm draws on the theory of natural evolution. A significant feature of such an algorithm is the fitness function, which represents the main requirements of the desired solution to a problem (for example, the cheapest price, the shortest route, or the most compact arrangement). The five phases of the population-based algorithm are, in order, initial population, fitness function, selection, crossover, and mutation [1].

24.2.1 Initial population

The initial population is the first set of individuals selected; each individual represents a proposed solution. An individual is characterized by a set of parameters called genes, which are joined into a string known as a chromosome that represents a solution. The string can be a combination of 1s and 0s.

24.2.2 Fitness function

The fitness function determines how fit an individual is relative to the others. Its score is used to select the fittest individuals for the next population.

24.2.3 Selection

In the selection phase, the fittest individuals are selected to pass their genes on to the next generation [6]. Two pairs of parents are chosen based on their fitness scores. Individuals with higher fitness have a higher probability of being selected for reproduction.

24.2.4 Crossover

For each pair of parents to be mated, a crossover point is chosen at random from within the genes. Children are produced by exchanging the genes of the parents between themselves until the crossover point is reached. The new offspring are added to the population.

24.2.5 Mutation

In some of the new children formed, some genes can be subjected to mutation. This implies that some of the bits in the bit string can be flipped. Mutation occurs to maintain diversity within the population and prevent premature convergence. It causes movement within the search space (local or global) and restores lost information to the population [1].

24.2.6 Termination

The algorithm stops when the population has converged, that is, when it no longer produces offspring that differ significantly from the previous generation. The population-based algorithm is then said to have provided a set of solutions to the given problem.
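The phases described in this section can be put together in a minimal genetic algorithm sketch. Several choices here are assumptions made for illustration rather than part of the description above: the fitness function simply counts the 1 bits in a chromosome (the "OneMax" toy problem), selection is fitness-proportionate, each generation fully replaces the previous one, and convergence is approximated by stopping after a fixed number of generations without improvement. All constants and function names are hypothetical.

```python
import random

GENES = 20            # bits per chromosome
POPULATION_SIZE = 30  # kept even so that children can be added in pairs
MUTATION_RATE = 0.01  # probability of flipping each bit
MAX_GENERATIONS = 200
STALL_LIMIT = 25      # stop after this many generations without improvement

def fitness(chromosome):
    """Toy fitness function: the number of 1 bits in the chromosome."""
    return sum(chromosome)

def select(population):
    """Pick one parent with probability proportional to its fitness."""
    weights = [fitness(ind) + 1 for ind in population]  # +1 avoids all-zero weights
    return random.choices(population, weights=weights, k=1)[0]

def crossover(parent_a, parent_b):
    """Single-point crossover: swap the gene tails after a random point."""
    point = random.randint(1, GENES - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome):
    """Flip each bit independently with probability MUTATION_RATE."""
    return [1 - gene if random.random() < MUTATION_RATE else gene
            for gene in chromosome]

def run(seed=0):
    random.seed(seed)
    # Initial population: random bit strings.
    population = [[random.randint(0, 1) for _ in range(GENES)]
                  for _ in range(POPULATION_SIZE)]
    best = max(population, key=fitness)
    history = [fitness(best)]   # best fitness seen so far, per generation
    stall = 0
    for _ in range(MAX_GENERATIONS):
        children = []
        while len(children) < POPULATION_SIZE:
            child_a, child_b = crossover(select(population), select(population))
            children += [mutate(child_a), mutate(child_b)]
        population = children
        challenger = max(population, key=fitness)
        if fitness(challenger) > fitness(best):
            best, stall = challenger, 0
        else:
            stall += 1
        history.append(fitness(best))
        if stall >= STALL_LIMIT:   # rough proxy for convergence
            break
    return best, history

best, history = run()
print("best fitness:", fitness(best), "after", len(history) - 1, "generations")
```

For a real problem, only the chromosome encoding and the fitness function would normally need to change; the selection, crossover, and mutation operators can often be reused as they are.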

25. Application of machine learning methods to develop virtual assistants

Virtual Assistants (VAs) are machine learning-based systems that are aware of the user’s intention and respond accordingly. They are used as virtual customer assistants to respond to customer service requests rapidly and to provide uninterrupted 24/7 support. Unlike chatbots, they have the ability to understand human languages. They use semantics, deep neural networks, natural language processing, prediction models, recommendations, and personalization to assist people with their various needs or to automate tasks. A VA listens to and observes behaviors, builds and maintains data models, and predicts and recommends actions. VAs have several use cases, including virtual personal assistants, virtual customer assistants, and virtual employee assistants. The assistant records voice instructions, sends them to a server in the cloud, decodes them using ML algorithms, and acts accordingly. The Python programming language can be used for building digital virtual assistants. Virtual assistants can have a chat-based interface or be operated by voice commands without a visual interface. Google Assistant and Microsoft’s Cortana are examples of virtual assistants.

26. Conclusion

This chapter has discussed the fundamental concepts of machine learning, its algorithms, and its various application areas at present and in the near future. It presents the topics in a concise and precise manner to give readers a good understanding of them. Learners can acquire the basic skills needed to understand the theory and practical application of machine learning, while experts in the field can use the chapter to refresh their memory of the topics covered. Researchers in various fields will be able to use it for research work related to machine learning theory and practice. Practitioners should be able to apply the knowledge gained from reading the chapter to improve practice standards, and policymakers could use the insight gained from the topics presented to make informed decisions related to machine learning in government and private establishments. The chapter will improve the understanding and use of machine learning in theory and practice.

References

1. Sue. Introduction to Genetic Algorithms. 2021. ISBN:978-1-4020-9117-9
2. John. Biological Neurons, Neural Network and Artificial Neuron Lecture Series. 2022
3. Mbeledogu NN, Chiemeke SC, Imiavan AA. Soft computing stock market Price prediction for the Nigerian stock exchange. International Journal of Advance Engineering Management and Science. 2016;2(7):1028-1032
4. Imianvan AA, Amadin FL, Obi JC. Prototype of a neuro-fuzzy system for detection of environmental induced depression. International Journal of Application of Fuzzy Sets and Artificial Intelligence. 2012;2:79-90
5. Javatpoint. 2023. Available from: https://www.javatpoint.com/classification-algorithm-in-machine-learning
6. Adegboye A. Genetic Neuro Fuzzy Model for Diagnosing Clinical Depression: M.PHIL Thesis in the Department of Computer Sciences. Benin City, Nigeria: University of Benin; 2021

[1] 1. Sue. Introduction to Genetic Algorithms. 2021. ISBN:978-1-4020-9117-9

[2] 2. John. Biological Neurons, Neural Network and Artificial Neuron Lecture Series. 2022

[3] 3. Mbeledogu NN, Chiemeke SC, Imiavan AA. Soft computing stock market Price prediction for the Nigerian stock exchange. International Journal of Advance Engineering Management and Science. 2016;2(7):1028-1032

[4] 4. Imianvan AA, Amadin FL, Obi JC. Prototype of a neuro-fuzzy system for detection of environmental induced depression. International Journal of Application of Fuzzy Sets and Artificial Intelligence. 2012;2:79-90

[5] 5. Javatpoint. 2023. Available from: https://www.javatpoint.com/classification-algorithm-in-machine-learning

[6] 6. Adegboye A. Genetic Neuro Fuzzy Model for Diagnosing Clinical Depression: M.PHIL Thesis in the Department of Computer Sciences. Benin City, Nigeria: University of Benin; 2021