
Application of Central Composite Design with Design Expert v13 in Process Optimization

Written By

Chigoziri N. Njoku and Samuel K. Otisi

Submitted: 15 July 2022 Reviewed: 23 December 2022 Published: 23 January 2023

DOI: 10.5772/intechopen.109704

From the Edited Volume

Response Surface Methodology - Research Advances and Applications

Edited by Palanikumar Kayarogannam


Abstract

This chapter focuses on the application of central composite design in response surface methodology. We review the concept and apply it to optimize biodiesel yield from the transesterification of methanol and vegetable oil with a catalyst derived from eggshell, using Design Expert 13. The optimization considered reaction temperature, methanol-to-oil ratio, and catalyst loading as operating conditions. The example data were collected and analyzed from the work of Tshizanga et al., and the results of the randomized experiment showed, at a 95% confidence level, that the factors affected the product output. A yield of about 91% was obtained, with operating parameters optimized at a temperature of around 61°C, a methanol-to-oil ratio of 22.13:1, and a catalyst loading of around 3.7 wt%. The chapter provides a step-by-step guide to carrying out this analysis using Design Expert 13. A reduced quadratic model with a p-value of 0.0325 and an F-value of 3.57 shows the model is significant; an F-value this large would be caused by noise in only 3.25% of cases. The number of runs was reduced to 18, compared to the 20 runs originally used by Tshizanga et al.

Keywords

  • response surface methodology (RSM)
  • central composite design (CCD)
  • design of experiment (DOE)
  • design expert

1. Introduction

It has always proven difficult to quickly select an appropriate experimental design that can simply explicate many response factors, a situation that often calls for a quadratic surface model. CCD can be a good choice for this kind of model. The central composite design (CCD) has emerged as an experimental design that is very handy in process optimization and in the search for the ideal product from ongoing batches. In statistics, a central composite design is an experimental design, helpful in response surface methodology, for building a second-order (quadratic) model for the response variables without having to use a complete three-level factorial experiment [1]. After performing the designed experiment, linear regression is deployed, sometimes iteratively, to obtain results. Coded variables are frequently utilized when creating this design. Most optimizations are done by screening all the potential variables [2]. Here, all the possible independent factors are first identified, and these factors are further refined before response surface methodology is finally used to establish relationships between one or more process variables and their responses. The central composite design is sometimes referred to as the Box-Wilson central composite design, and it has been popular among researchers due to its accuracy.


2. Key terms in central composite design

Some important terms will be used throughout this chapter. This section equips the reader with the terminology needed to fully understand the concept of response surface methodology.

Response surface: This is the surface traced out by the response as the related variables change, typically shown as a two- or three-dimensional plot of the experimental results. Response surface methodology (RSM) describes the use of experimental designs that give response surfaces from which information about the experimental system is deduced [3].

Factor: This can also be called the parameter or predictor. It is an entity that controls an outcome: a change in the output is brought about by manipulating or tweaking the input factor(s). Factors can be set and reset at different levels depending on the needs and conditions of the experiment.

Levels of the factors: A design of experiments is named by the number of levels chosen for each factor; it could be a two- or three-level design. A level is the value of a factor prescribed in the experimental design. Levels may be high, mid, and low (three-level design), or only high and low (two-level design), and are often coded as +1 (high), 0 (mid), and −1 (low). Selecting levels for an experiment often requires field experience. For example, in a three-level experiment on a reactor, previous experience might suggest 30°C (−1), 40°C (0), and 50°C (+1) as suitable low, mid, and high levels respectively.
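As a small aside not in the original text, the coding rule itself is easy to state: a coded level is the actual value centered on the midpoint of the range and scaled by half the range. A minimal Python sketch, using the illustrative 30-50°C reactor levels above:

def to_coded(actual, low, high):
    """Map an actual setting onto the -1..+1 coded scale."""
    center = (low + high) / 2        # mid level, coded 0
    half_range = (high - low) / 2    # distance from center to either extreme
    return (actual - center) / half_range

def to_actual(coded, low, high):
    """Map a coded level back to the original units."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center + coded * half_range

print(to_coded(50, low=30, high=50))   # 1.0  -> high level
print(to_actual(-1, low=30, high=50))  # 30.0 -> low level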

Blocking: This tool is used to eliminate the effects of external disturbances and, in the process, improve the efficiency of the experimental design. External disturbances cause different forms of variation. The main goal is to arrange similar experimental runs into one group, so that the whole group becomes a homogeneous unit. For example, suppose a researcher is attempting to increase the yield of biodiesel from a transesterification reaction, and several factors that might have some impact on the yield have been chosen for the initial experimental trials, each studied at a two-level setting (i.e. a low value and a high value). Six experimental trials are chosen by the experimenter, but only four trials can be run per day. Here, each day can be handled separately as a different block [4].

Response: This is the observed result of an experiment, produced by changing the values of the predictors; for example, the yield, selectivity, or conversion of a reactant in a reactor.

Design of experiment (DOE): This is a statistical approach that involves planning, conducting, analyzing, and interpreting the data obtained from experiments [3].

Randomization: While designing and running an experiment, there are several external disturbances, often known as noise factors, which may influence how the experiment turns out. For example, variations in the quality of the raw material due to seasonal change, variations in temperature, and their effects on the overall reaction yield may affect the result, and such factors are difficult to control. Randomization is one of the methods to remove or reduce the errors arising from such uncontrollable factors, and it helps in averaging out the cumulative impact of external disturbances, if present, in the process [3].

Model: This is an equation expressing the relationship between the responses and the factors under study or investigation. Here the outcome is denoted as a function of the experimental factors. For example, a model that has only one parameter x could be expressed as:

y = f(x) + \varepsilon   (1)

A two-parameter model could be represented as:

y = f(x_1, x_2) + \varepsilon   (2)

For an n-parameter model, consider the following equation:

y = f(x_1, x_2, \ldots, x_n) + \varepsilon   (3)

The function f denotes the relationship between the parameters and the response y, with residuals \varepsilon, and is depicted through a polynomial equation. Three different models are described:

Linear model: This is the simplest polynomial model; it contains only linear terms and describes only the linear relationships between the variables and the responses. A linear model with two factors x_1 and x_2 is expressed as:

y = b_0 + b_1 x_1 + b_2 x_2 + \varepsilon   (4)

or, in general:

y = b_0 + \sum_{i=1}^{k} b_i x_i + \varepsilon   (5)

Here, y is the outcome, b_0 is the model intercept, the b_i are the model coefficients, i is the factor index running from 1 to k, and the x_i are the independent variables.

Interaction model: The interaction model holds some extra terms that depict interactions between the various variables, if any. For two factors, it is denoted as:

y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + \varepsilon   (6)

or, in general:

y = b_0 + \sum_{i=1}^{k} b_i x_i + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} b_{ij} x_i x_j + \varepsilon   (7)

where b_0, b_i, and b_{ij} are the regression (model) coefficients for the intercept, linear, and interaction terms, respectively, and x_i and x_j are the reaction factors.

Quadratic model: Quadratic terms are introduced in the model to help locate an optimal value; they capture any curvature that exists in the response. For two factors with interaction, the model can be represented as:

y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + b_{11} x_1^2 + b_{22} x_2^2 + \varepsilon   (8)

or, in general:

y = b_0 + \sum_{i=1}^{k} b_i x_i + \sum_{i=1}^{k} b_{ii} x_i^2 + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} b_{ij} x_i x_j + \varepsilon   (9)

where b_0, b_i, b_{ii}, and b_{ij} are the model coefficients for the intercept, linear, quadratic, and interaction terms, respectively, and x_i and x_j are the variables.

Note: The symbol \varepsilon in eqs. (1) to (9) represents the residuals; the linear and interaction models are used during the screening stage.
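To make eqs. (4) to (9) concrete, the sketch below (ours; the data are synthetic) fits the two-factor quadratic model of eq. (8) by ordinary least squares with NumPy, analogous to the regression step the software performs:

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 30)
x2 = rng.uniform(-1, 1, 30)
# Synthetic response following eq. (8), plus noise playing the role of the residuals.
y = 5 + 2*x1 - 3*x2 + 1.5*x1*x2 - 4*x1**2 - 1*x2**2 + rng.normal(0, 0.2, 30)

# Model matrix columns: intercept, linear, interaction, and quadratic terms.
X = np.column_stack([np.ones_like(x1), x1, x2, x1*x2, x1**2, x2**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# The estimates should land close to (5, 2, -3, 1.5, -4, -1).
print(dict(zip(["b0", "b1", "b2", "b12", "b11", "b22"], b.round(2))))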

Effects: These are the coefficients of the variables, and three kinds can be distinguished. The main effect is a factor's coefficient in the first-order model; the interaction effect is the coefficient of a product of linear terms; and the quadratic effect is the coefficient of the square of a linear term.

Replication: Replication means repeating the entire experiment, or a part of it, under the same operating conditions. It helps to obtain an estimate of the experimental error and to understand and estimate more precisely the factors and their interactions.


3. Response surface methodology for optimization design

The primary goal of optimization design is to minimize unfavorable or undesired outputs or to maximize the desired outputs. Sometimes, simple linear and interaction models are not enough to provide a clear picture of the process. For this study, our goal is to increase the biodiesel yield from the transesterification of methanol and vegetable oil using a catalyst derived from eggshell. The experiment has already been done, and the data are provided in reference [4]; we will use the information from this work to provide a thorough examination of central composite design in process optimization. The variables are reaction temperature, methanol-to-oil ratio, and catalyst weight. To locate an optimum inside the region in which the experiment is conducted, we need a mathematical model that can represent curvature so that it has a local optimum. The appropriate model is the quadratic model shown in eq. (9), which contains linear terms for all factors, squared terms for all factors, and products of all pairs of factors. Response surface designs are generally used for fitting quadratic models. A full factorial design with three levels for each input variable is one such design, but because it requires far more runs than are necessary to fit the model, it is typically not a good choice. The CCD and Box-Behnken designs are the two designs most commonly used in response surface modeling, although only central composite designs are explored in detail here. In these types of designs, the variables take on three or five distinct levels, but not all combinations of these values appear in the design. The steps in CCD for optimization are outlined below:

Preliminary stage: Here, the following steps are done:

  • Choosing the factors and desired levels

  • Determination of the Counts of experimental runs

  • Calculation of alpha (α) and the axial values

  • Selecting the response variables

  • Carrying out the experiments

  • Model selection

Analysis stage: At this stage, the following are done:

  • ANOVA is conducted, where the F-test and lack-of-fit test are used to check significance; the adjusted and predicted R² are also determined at this stage.

  • Next, the model equation is built

  • Comparing values predicted (from the model) and actual values

  • Using 2D and 3D contour plots or graphs to visualize the response(s).

Decision-making stage: Here, the predicted and actual values are compared to determine the residuals, and parameters such as the adjusted R², mean absolute error (MAE), or mean squared error (MSE) are employed to assess the model performance. If the result is acceptable, we can proceed to the final stage; if not, we return to the preliminary stage to see how the model can be adjusted.

Optimization stage: At this stage, the model is ready to be deployed for the optimization process. Design Expert version 13 is very handy for this entire workflow; all we need to do is specify the required values. The details of how to determine the CCD components are given later in this chapter.


4. Box-Behnken design (BBD)

The Box-Behnken design can fit the full quadratic response surface model [5]. Unlike the CCD, the BBD has no embedded factorial or fractional factorial design. In this design, the treatment combinations are at the midpoints of the edges of the cube and at the center, as shown in Figure 1. The BBD is a rotatable design and needs three levels for each factor. It should be considered for experiments with more than two factors, when the optimum is expected to lie in the middle of the factor ranges. A, B, and C in Figure 1 represent factors A, B, and C respectively.

Figure 1.

A representation of the Box-Behnken design.


5. Central composite design

Central composite design (CCD): This is a unique kind of response surface design that can fit a full quadratic model. It comprises a factorial (or fractional factorial) design with center points, augmented by a group of star or axial points. Using the included axial points is an effective method for estimating the coefficients of a second-degree polynomial for the factors [6]. A CCD can be depicted as a square (for a two-factor design) or a cube (for a three-factor design) whose corners represent the levels (high and low, coded +1 and −1 respectively), with star or axial points along the axes, at or outside the square, that account for curvature, and a center point at the origin. The general model for a two-factor full factorial CCD is represented graphically in Figure 2 below.

Figure 2.

A visual depiction of the CCD model for determination of total runs for all experiments for two factors full factorial design. K in the model is the number of factors, C is the replicated central points that help to eliminate pure error and N is the experiment runs required for the design.

Figure 3 displays a three-factor layout for a CCD, made up of a full factorial that forms the cube, where each side is coded −1 and +1 just as in Figure 2 above. The stars stand for axial points, and alpha is the distance from the edge of the cube to the stars.

Figure 3.

A graphical representation of three factors in a full factorial design.


6. Types of central composite design

There are three types of CCD namely:

  • Circumscribed Central Composite Design (CCC)

  • Inscribed Central Composite Design (CCI)

  • Face-Centered Central Composite Design (CCF)

The CCC is a type of CCD in which the axial points form new extremes beyond the levels already used for the factorial portion. The new extremes are determined by a value called alpha (the distance between the new extreme and the edge of the factorial points), bringing each factor up to five levels. Alpha is often chosen to achieve a rotatable design [7].

The CCI type is a modified form of CCC. The axial points are scaled to be within the limits of the factorial factor [8]. The CCI is also a rotatable type and has 5 levels just like the CCC.

For the CCF, the axial points are located at the center of each face of the cube in Figure 3 above (three-factor design), and the design is non-rotatable [9]. It has only three levels. Figure 4 below provides more insight into this type of CCD.

Figure 4.

Three types of central composite design [6].


7. Determining the components of central composite design

Before starting the CCD optimization process, we provide a walk-through of how to calculate all the parameters required to build the model.


8. Calculating the number of experiment runs

To design a CCD experiment in which each factor has two levels (+1 and −1), the full factorial portion contributes 2^k runs, while the axial points, as represented in Figure 2, contribute 2k runs. Let C represent the center points and n the number of times the center point is replicated to eliminate error. Then the total number of experiment runs is given as:

N = 2^k + 2k + nC   (10)

where k is the number of factors selected for the experiment. In our case we have three (3) factors, i.e. temperature, methanol-to-oil ratio, and catalyst weight, with 4 repetitions of the center point. Substituting k = 3, C = 1, and n = 4 (i.e. 4 repetitions) gives N = 18 runs. Conveniently, Design Expert 13 generates this value automatically once the number of factors and repetitions is provided. Keep in mind that the number of center points can also be adjusted by clicking the Options button in the software; in this case we set it to four.
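The arithmetic of eq. (10) can be made explicit in a one-line function (our sketch, not a Design Expert feature):

# Total CCD runs (eq. 10): 2**k factorial + 2*k axial + n*C center points.
def ccd_runs(k, n, C=1):
    return 2**k + 2*k + n*C

print(ccd_runs(k=3, n=4))  # 18 runs, as generated for this design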


9. Calculating alpha (α)

As can be seen, immediately after the factors, n, and C are provided, alpha is calculated automatically, because the minimum parameters needed to compute it have been specified; here we show how the program generates this value. As discussed earlier, alpha is the distance between the new extreme axial points and the edge formed by the factorial levels. The following equation gives the α value for any number of factors:

\alpha = (2^k)^{1/4}   (11)

In our case k is 3, and therefore α = 1.68179, which is in line with the value created by the software. Table 1 below lists the corresponding values for k from 2 to 5 factors.

Factor (k) | Alpha (α)
2 | 1.41421
3 | 1.68179
4 | 2
5 | 2.37841

Table 1.

Factors and corresponding α values.
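Table 1 can be reproduced by evaluating eq. (11) directly; a short sketch:

# Rotatable axial distance (eq. 11): alpha = (2**k) ** (1/4).
for k in range(2, 6):
    print(k, round((2 ** k) ** 0.25, 5))
# Prints 1.41421, 1.68179, 2.0, 2.37841 for k = 2..5, matching Table 1.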


10. Calculating axial values

Before determining the axial points, Table 2 below shows the factor levels and center points that will be used to compute them. The center points are coded as 0, while the low and high levels are designated −1 and +1 respectively. Also keep in mind that the experiment has already been performed, with the data provided from the work of Tshizanga et al. [4] (Table 2).

Factors | Symbols | Low level (−1) | Centre point (0) | High level (+1)
Temperature (°C) | X1 | 60 | 65 | 70
Methanol to Oil ratio | X2 | 15:1 | 22.5:1 | 30:1
Catalyst Weight (wt%) | X3 | 2.0 | 3.5 | 5.0

Table 2.

Experimental ranges of the independent variable.

To compute the axial values, the first step is to find the offset that is added to or subtracted from the mean of the factor levels. Adding gives the higher axial value, coded +α, while subtracting gives the lower axial value, coded −α. These two additional coded values (+α and −α) are the axial levels and bring each factor to a total of five levels. The two equations are given below:

+\alpha\ \text{value} = \bar{X} + \alpha \times (\text{High level} - \text{Low level})/2   (12)

-\alpha\ \text{value} = \bar{X} - \alpha \times (\text{High level} - \text{Low level})/2   (13)

where α can be found using eq. (11), already calculated as 1.68179, and \bar{X} is given by:

\bar{X} = (\text{Low level} + \text{Centre point} + \text{High level})/3   (14)

The divisor 3 is the number of level values being averaged (low, centre, and high), which here happens to coincide with the number of variables, k = 3. At this point, let us get our hands dirty with calculating the values for these three factors.

For temperature:

\bar{X}_1 = (60 + 65 + 70)/3 = 65
+α value = 65 + 1.68179 × (70 − 60)/2 = 73.4090°C (approx. to 4 d.p.)
−α value = 65 − 1.68179 × (70 − 60)/2 = 56.5911°C

For methanol-oil ratio:

\bar{X}_2 = (15 + 22.5 + 30)/3 = 22.5, i.e. 22.5:1
+α value = 22.5 + 1.68179 × (30 − 15)/2 = 35.1134, i.e. 35.1134:1
−α value = 22.5 − 1.68179 × (30 − 15)/2 = 9.8866, i.e. 9.8866:1

For catalyst weight:

\bar{X}_3 = (2 + 3.5 + 5)/3 = 3.5
+α value = 3.5 + 1.68179 × (5 − 2)/2 = 6.0227 wt%
−α value = 3.5 − 1.68179 × (5 − 2)/2 = 0.9773 wt%
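These hand calculations are easy to verify in a few lines of Python (our sketch; the factor ranges come from Table 2):

alpha = (2 ** 3) ** 0.25  # about 1.68179 for k = 3 factors (eq. 11)

def axial_values(low, high):
    """Lower and higher axial points per eqs. (12) and (13)."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center - alpha * half_range, center + alpha * half_range

for name, low, high in [("Temperature (degC)", 60, 70),
                        ("Methanol-Oil ratio", 15, 30),
                        ("Catalyst weight (wt%)", 2.0, 5.0)]:
    lo, hi = axial_values(low, high)
    print(f"{name}: -alpha = {lo:.3f}, +alpha = {hi:.3f}")
# Temperature: 56.591 / 73.409; ratio: 9.887 / 35.113; catalyst: 0.977 / 6.023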

We have now shown, step by step, how the software generates the alpha (α) and axial values as the components of the CCD; Table 3 below includes these axial points.

Factors | Symbols | Lower axial point (−α) | Low level (−1) | Centre point (0) | High level (+1) | Higher axial point (+α)
Temperature (°C) | X1 | 56.5911 | 60 | 65 | 70 | 73.4090
Methanol to Oil ratio | X2 | 9.8866 | 15:1 | 22.5:1 | 30:1 | 35.1134
Catalyst Weight (wt%) | X3 | 0.9773 | 2.0 | 3.5 | 5 | 6.0227

Table 3.

Experimental ranges of independent variables including calculated axial (star) values.

Upon specifying the required parameters for the CCD model, the software generates a table in which the experiments are to be conducted and the response recorded for each run. For this case study, our response is the biodiesel yield, which can be determined from the methyl ester and waste vegetable oil weights using the following equation:

\text{Yield}\ (\%) = \frac{\text{Weight of Biodiesel}}{\text{Weight of Oil}} \times 100   (15)


Std | Run | Factor 1, A: Temperature (°C) | Factor 2, B: Methanol-Oil ratio | Factor 3, C: Catalyst Weight (wt%) | Response 1, Biodiesel Yield (%)
1 | 3 | −1.000 | −1.000 | −1.000 |
2 | 15 | 1.000 | −1.000 | −1.000 |
3 | 11 | −1.000 | 1.000 | −1.000 |
4 | 4 | 1.000 | 1.000 | −1.000 |
5 | 2 | −1.000 | −1.000 | 1.000 |
6 | 12 | 1.000 | −1.000 | 1.000 |
7 | 17 | −1.000 | 1.000 | 1.000 |
8 | 8 | 1.000 | 1.000 | 1.000 |
9 | 7 | −1.682 | 0.000 | 0.000 |
10 | 1 | 1.682 | 0.000 | 0.000 |
11 | 16 | 0.000 | −1.682 | 0.000 |
12 | 14 | 0.000 | 1.682 | 0.000 |
13 | 13 | 0.000 | 0.000 | −1.682 |
14 | 6 | 0.000 | 0.000 | 1.682 |
15 | 9 | 0.000 | 0.000 | 0.000 |
16 | 18 | 0.000 | 0.000 | 0.000 |
17 | 5 | 0.000 | 0.000 | 0.000 |
18 | 10 | 0.000 | 0.000 | 0.000 |

Table 4.

Factors’ coded values organized in the standard order.

Immediately after we fill in the required CCD components, Design Expert provides a table of coded factor levels. This is used as a guide for specifying the actual values and their corresponding responses. The center point will be replicated four (4) times instead of six (as done by the original researchers) to reduce the number of experiment runs; the two sets of results will be compared after the optimization stage. Table 4 shows the coded factors, and Table 5 shows the actual values and their responses after experimenting in the laboratory.

Std | Run | Factor 1, A: Temperature (°C) | Factor 2, B: Methanol-Oil ratio | Factor 3, C: Catalyst Weight (wt%) | Response 1, Biodiesel Yield (%)
1 | 3 | 60 | 15 | 2 | 69.29
2 | 15 | 70 | 15 | 2 | 35.2
3 | 11 | 60 | 30 | 2 | 73.39
4 | 4 | 70 | 30 | 2 | 10.66
5 | 2 | 60 | 15 | 5 | 61.29
6 | 12 | 70 | 15 | 5 | 50.23
7 | 17 | 60 | 30 | 5 | 71.48
8 | 8 | 70 | 30 | 5 | 22.06
9 | 7 | 56.591 | 22.5 | 3.5 | 62.18
10 | 1 | 73.409 | 22.5 | 3.5 | 79
11 | 16 | 65 | 9.88655 | 3.5 | 42.68
12 | 14 | 65 | 35.1134 | 3.5 | 39.5
13 | 13 | 65 | 22.5 | 0.977311 | 66.52
14 | 6 | 65 | 22.5 | 6.02259 | 74.04
15 | 9 | 65 | 22.5 | 3.5 | 80.46
16 | 18 | 65 | 22.5 | 3.5 | 89.35
17 | 5 | 65 | 22.5 | 3.5 | 90.98
18 | 10 | 65 | 22.5 | 3.5 | 89.52

Table 5.

Actual factors’ values arranged in the standard order after the experiment.

At this point, we can replace the coded values with the actual values from the previous calculations. The factor columns are generated in a particular pattern, but that is beyond the scope of this chapter; to learn more, we recommend reading "RSM Simplified" by Anderson and Whitcomb [10].
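For readers who want to reproduce the coded layout of Table 4 outside the software, the following sketch (ours, under the design parameters used in this chapter) assembles the three blocks of a CCD, factorial, axial, and center, in standard order:

import itertools
import numpy as np

k, n_center = 3, 4
alpha = (2 ** k) ** 0.25

# Factorial block in standard (Yates) order: first factor varies fastest,
# hence the column reversal of itertools.product's output.
factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))[:, ::-1]
# Axial block: -alpha and +alpha on one axis at a time.
axial = np.vstack([sign * alpha * np.eye(k)[i]
                   for i in range(k) for sign in (-1, 1)])
# Replicated center points.
center = np.zeros((n_center, k))

design = np.vstack([factorial, axial, center])  # 18 x 3, matching Table 4
print(design.round(3))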

11. Results and analysis

Note: When entering the values for the methanol-to-oil ratio in Design Expert, you can ignore the ":1" part, since the amount of oil relative to methanol is always one unit for all the experiment runs.

We can now delve into understanding the data collected to build the model, perform analysis, and finally carry out the optimization. All these steps will be done in design expert software.

12. Understanding the data

The reason for this step is basically to understand the relationships that exist in the data. In a more statistical sense, we need to know whether there is a strong correlation between the variables and the response. If there is appreciable intra-correlation among the factors, then one of them has to be removed, because it will eventually harm the model. Design Expert provides a wonderful dashboard where we can carefully learn more about the data we have collected and make some sense of it. At the left of the software we will see the information panel; the Summary, Graph Columns, and Evaluation subsections are the places to dig the nuggets from the data.

In the Summary section we see the summary statistics of the data: the number of experiment runs, the type of design and model, and the minimum, maximum, mean, standard deviation, and ratio of maximum to minimum response values (Table 6).

Factor | Name | Units | Type | Subtype | Minimum | Maximum | Coded Low | Coded High | Mean | Std. Dev.
A | Temperature | °C | Numeric | Continuous | 56.59 | 73.41 | −1 ↔ 60.00 | +1 ↔ 70.00 | 65.00 | 4.24
B | Methanol-Oil ratio | | Numeric | Continuous | 9.89 | 35.11 | −1 ↔ 15.00 | +1 ↔ 30.00 | 22.50 | 6.36
C | Catalyst Weight | wt% | Numeric | Continuous | 0.9773 | 6.02 | −1 ↔ 2.00 | +1 ↔ 5.00 | 3.50 | 1.27

Table 6.

Summary statistics of the factors.

We have seen that the mean response is quite far from the minimum and maximum responses; this is the primary reason for building the model, to test the statistical significance of the result. If we are satisfied with the significance, we can proceed with the model built in the Evaluation tab (Table 7).

Response | Name | Units | Observations | Minimum | Maximum | Mean | Std. Dev. | Ratio
R1 | Biodiesel Yield | % | 18.00 | 10.66 | 90.98 | 61.55 | 23.49 | 8.53

Table 7.

Summary statistics of the response.

Moving over to the Graph Columns section, there are scatter plots, histograms, and box plots. To make the most sense of this data, the scatter plot is the handiest, since it shows how the factors are correlated with each other. The drop-down at the top left selects the factors to show in the scatter plots, while the correlation plot at the bottom displays correlations as values between −1 and 1 (blue to red): values close to −1 show a strong negative correlation and values close to +1 show a strong positive correlation. We now display the scatter plots of each factor against the biodiesel yield in Figures 5-7 respectively.

Figure 5.

A scatterplot of temperature vs. biodiesel yield.

Figure 6.

A scatterplot of methanol/oil ratio against biodiesel yield.

Figure 7.

A scatterplot of catalyst weight against biodiesel yield.

The plots show that temperature most negatively affects the yield. The correlation plot in Figure 8, produced by the Design Expert software, confirms this claim, since the box with the most blueish color lies at the intersection of the temperature and biodiesel yield columns.

Figure 8.

Correlation plots of all the relationships that exist in the data.
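The same correlation check can be reproduced outside the software; the sketch below (ours) uses the first ten runs of Table 5, in standard order, and prints the pairwise Pearson correlations that the correlation plot colors from blue to red:

import pandas as pd

# First ten runs of Table 5 (standard order), abbreviated for illustration.
df = pd.DataFrame({
    "Temperature": [60, 70, 60, 70, 60, 70, 60, 70, 56.591, 73.409],
    "MeOH_oil":    [15, 15, 30, 30, 15, 15, 30, 30, 22.5, 22.5],
    "Catalyst":    [2, 2, 2, 2, 5, 5, 5, 5, 3.5, 3.5],
    "Yield":       [69.29, 35.2, 73.39, 10.66, 61.29,
                    50.23, 71.48, 22.06, 62.18, 79.0],
})
print(df.corr().round(2))  # pairwise Pearson correlation matrix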

Finally, this section provides a unique tab called Evaluation, where the model is selected and all its parameters are shown. In this case, a quadratic model has been selected by the software, which is the best for CCD. There are two tabs, Results and Graphs, where the model parameters are evaluated.

In the Model tab, the model terms are listed together with their standard errors, variance inflation factors (VIF), Ri² values, and the power of the model, as shown in Table 8.

12.1 Model terms

Power calculations are performed using response type “Continuous” and parameters:

Delta = 2, Sigma = 1.

Power is evaluated over the −1 to +1 coded factor space. Standard errors should be similar to each other in a balanced design. Lower standard errors are better.

The ideal VIF value is 1.0. VIFs above 10 should cause concern. VIFs above 100 should cause alarm, indicating coefficients are poorly estimated due to multicollinearity.

Ideal Rᵢ2 is 0.0. High Rᵢ2 means terms are correlated with each other, possibly leading to poor models. If the design has multilinear constraints, then multicollinearity will exist to a higher extent. This inflates the VIFs and the Rᵢ2, rendering these statistics useless. Use FDS instead.
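For the curious, a VIF is simply 1/(1 − Ri²), where Ri² comes from regressing one model-matrix column on all the others. A minimal sketch of that definition (ours, not Design Expert's internals):

import numpy as np

def vifs(X):
    """VIF for each column of a model matrix X (intercept column excluded)."""
    out = []
    for j in range(X.shape[1]):
        yj = X[:, j]
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(len(X)), others])  # regress col j on rest
        beta, *_ = np.linalg.lstsq(Z, yj, rcond=None)
        resid = yj - Z @ beta
        r2 = 1 - resid.var() / yj.var()                 # Ri^2 for column j
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))                 # independent columns
print([round(v, 2) for v in vifs(X)])        # VIFs near 1, the ideal value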

The power calculation is the estimated chance of finding a significant effect with the current evaluation model. Power depends on the size and structure of the design, the signal-to-noise ratio (number of standard deviations) for the effect, and the model evaluated. The Options button on the Model tab allows the user to define three signal-to-noise ratios, i.e. the numbers of standard deviations to use. If the power is not large enough (80% or more) for a reasonably sized effect, then the design is underpowered. As can be seen in Table 8, we may consider removing the interaction terms, since they have lower power; this will be done after analyzing the model, if their p-values turn out to be higher than 0.05, which would mean they have hurt the performance of the model. Power is, however, an inappropriate tool for evaluating response surface designs; use the prediction-based metrics provided in this program via fraction of design space (FDS) statistics. Click on the Graphs tab to find the FDS graph; more information about FDS is available in the Help. Be sure that the model you select contains only terms you expect to be significant (Table 9).

12.2 Leverage

The leverage data shown in Table 9 measure the potential for a design point to influence the fit of the model coefficients, based on its position in the design space. Leverages approaching or at 1 indicate that the point will influence the model: a leverage of 1 means the model must exactly fit the observed value. A good design avoids leverages approaching 1. A design for the same model but having more runs will tend to have a lower leverage for each point.

Watch for leverages close to 1.0. Consider replicating these points or make sure they are run very carefully.
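Leverage can be computed directly as the diagonal of the hat matrix H = X(X'X)^{-1}X'; a small self-contained sketch (ours, illustrating the definition rather than Design Expert's output):

import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^-1 X'."""
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    return np.diag(H)

# Tiny demo: straight-line model over five equally spaced points.
X = np.column_stack([np.ones(5), np.arange(5.0)])
print(leverages(X).round(3))  # [0.6 0.3 0.2 0.3 0.6]
# Leverages always sum to the number of model terms p, so the
# average leverage is p divided by the number of runs n.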

The Graphs tab contains the FDS, perturbation, interaction, contour, cube, and 3D surface plots to help understand the data and the model parameters.

12.3 FDS graph

The FDS graph is used to compute the volume of the design space that has predicted variance less than or equal to the specified value. The fraction of the design space is calculated as this volume divided by the entire volume of the design space. The goal is to make a single plot that shows the cumulative fraction of the design space on the x-axis (from zero to one) versus the prediction variance on the y-axis.

For exploration and optimization, we advise an FDS score of at least 0.8, or 80%; for stability and robustness testing, such as showcasing the design space for quality by design (QbD) work, 100%. The FDS Graph tool provides options for assessing the FDS in relation to four different error categories: mean, prediction intervals, difference between pairs of observations, and tolerance. We use the mean error type, since the aim of this experiment is to find the optimized factor settings for specific response goals. Figure 9 below is the visualization of the FDS graph.

Figure 9.

FDS graph.

There are three parameters, delta, sigma, and alpha, for each type of error, and a fourth parameter, proportion, for the tolerance type of error.

Delta specifies the maximum acceptable half-width (margin of error) of the respective interval for the mean, prediction, and tolerance error types. One good way to find the delta is to answer the question, "plus or minus how much is an acceptable estimate?"

sigma is an estimate for the standard deviation that will appear on the ANOVA. It can be obtained from previous work with this system, work from a similar system, or outright guessing. A smaller sigma can be entered to enhance the FDS if the unexplained nuisance fluctuation can be reduced during the experiment.

Alpha is the significance level used throughout the statistical analysis. Our default is 0.05, or 5%; it is the acceptable risk of a type I error. FDS rises as alpha increases. The critical value is calculated using alpha/2 for two-sided intervals and alpha for one-sided intervals.

Proportion is only used for the tolerance type of error; it is the percentage of the individual outcomes that must fall within the tolerance range. Building a larger design, raising the delta, reducing the sigma, increasing the alpha, and/or decreasing the proportion will boost the FDS score [11, 12].

12.4 Interaction

An interaction occurs when the response to one factor depends on the setting of another. Interactions display as two non-parallel lines, showing that the effect of one factor depends on the level of the other. Figure 10 displays the standard error of the design with the interactions of the model parameters.

Figure 10.

Std error of the design with interactions of the model parameters.

13. Analysis

In the Analysis section in Design Expert, select no transform in the Configure tab and start the analysis using the button at the bottom. The interface should appear like Figure 11 below.

Figure 11.

Starting the analysis.

You can take advantage of the advanced options button to customize the model, such as changing the factor coding from coded to actual (though this is not recommended).

13.1 Fit summary

The regression calculations that fit all of the polynomial models to the chosen response are started when the Fit Summary button is clicked. The program calculates the effects of all model terms and produces statistics such as p-values, lack of fit, and R-squared values for comparing the models. The fit summary output is shown on screen in a report, which can also be printed and/or copied to another application. The "Suggested" model will be highlighted and noted by the program and set as the default model on the Model panel. We look for the following (Table 10):

  • A high-order model explains significantly more of the variation that is in the response (p-value small).

  • Insignificant lack of fit (p-value >0.10).

  • Adjusted R-squared and predicted R-squared have a reasonable level of agreement (within 0.2 of each other).

Term | Standard Error* | VIF | Ri² | Power
A | 0.2706 | 1 | 0.0000 | 91.4%
B | 0.2706 | 1 | 0.0000 | 91.4%
C | 0.2706 | 1 | 0.0000 | 91.4%
AB | 0.3536 | 1 | 0.0000 | 72.2%
AC | 0.3536 | 1 | 0.0000 | 72.2%
BC | 0.3536 | 1 | 0.0000 | 72.2%
A² | 0.2634 | 1.01827 | 0.0179 | 99.9%
B² | 0.2634 | 1.01827 | 0.0179 | 99.9%
C² | 0.2634 | 1.01827 | 0.0179 | 99.9%

Table 8.

Model parameters.

Run | Leverage | Space Type
1 | 0.6073 | Axial
2 | 0.6698 | Factorial
3 | 0.6073 | Axial
4 | 0.6698 | Factorial
5 | 0.1663 | Center
6 | 0.6073 | Axial
7 | 0.1663 | Center
8 | 0.1663 | Center
9 | 0.6698 | Factorial
10 | 0.6698 | Factorial
11 | 0.6698 | Factorial
12 | 0.6693 | Factorial
13 | 0.1663 | Center
14 | 0.1663 | Center
15 | 0.6073 | Axial
16 | 0.6698 | Factorial
17 | 0.6698 | Factorial
18 | 0.1663 | Center
19 | 0.6073 | Axial
20 | 0.6073 | Axial
Average | 0.5000 |

Table 9.

Leverage.

Response 1: Biodiesel Yield
Source | Sequential p-value | Lack of Fit p-value | Adjusted R² | Predicted R² |
Linear | 0.4974 | 0.0082 | −0.0302 | −0.3670 |
2FI | 0.7771 | 0.0060 | −0.1914 | −1.6146 |
Quadratic | 0.0279 | 0.0157 | 0.4440 | −0.9458 | Suggested
Cubic | 0.0635 | 0.0354 | 0.8292 | −6.2531 | Aliased

Table 10.

Fit summary.

Note: Aliased models should be avoided entirely.

13.2 Sequential model sum of squares

Table 11 shows the sum of squares, degrees of freedom, mean square, F-value, and p-value for the design model. The sequential model sum of squares is the sum of the squared deviations from the mean for each model: the SS for the mean is calculated first, followed by the blocks (if applicable), the linear model, the quadratic model, special cubic, cubic, residuals, and the total.

Source | Sum of Squares | df | Mean Square | F-value | p-value |
Mean vs. Total | 68182.63 | 1 | 68182.63 | | |
Linear vs. Mean | 1421.30 | 3 | 473.77 | 0.8337 | 0.4974 |
2FI vs. Linear | 726.96 | 3 | 242.32 | 0.3687 | 0.7771 |
Quadratic vs. 2FI | 4775.31 | 3 | 1591.77 | 5.19 | 0.0279 | Suggested
Cubic vs. Quadratic | 2076.83 | 4 | 519.21 | 5.51 | 0.0635 | Aliased
Residual | 376.84 | 4 | 94.21 | | |
Total | 77559.86 | 18 | 4308.88 | | |

Table 11.

Modeling sequentially, sum of squares.

For each source, the sum of squares divided by the degrees of freedom yields the mean square. This is used to compute the F-value for the models.

The F-value is used to test the significance of adding new model terms to those already in the model. For instance, the significance of the linear terms is tested after removing the effect of the average and the blocks; then the significance of the quadratic terms is tested after removing the average, block, and linear effects; and so on. Select the polynomial of the highest order for which the additional terms are significant and the model is not aliased.

13.3 Model summary statistics

R-squared is the coefficient of determination for the model. It should be close to one. We recommend using the adjusted R-squared for DOE evaluation.

The amount of variation that can be explained by the model is shown by the adjusted R-squared. This is the R-squared value after adjusting for how many terms are in the model relative to the number of design points. The Model summary statistics is shown in Table 12.

Source | Std. Dev. | R² | Adjusted R² | Predicted R² | PRESS |
Linear | 23.84 | 0.1516 | −0.0302 | −0.3670 | 12818.28 |
2FI | 25.64 | 0.2291 | −0.1914 | −1.6146 | 24517.49 |
Quadratic | 17.51 | 0.7383 | 0.4440 | −0.9458 | 18246.34 | Suggested
Cubic | 9.71 | 0.9598 | 0.8292 | −6.2531 | 68014.21 | Aliased

Table 12.

Model summary statistics.

Predicted R-squared is calculated from the PRESS statistic and represents the amount of variation in new data explained by the model. A negative predicted R-squared means that the overall mean is a better predictor than the model.

Focus on the model maximizing the Adjusted R2 and the Predicted R2.
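PRESS itself need not be computed by refitting the model n times; the leave-one-out identity e_i/(1 − h_ii) gives it directly from a single fit. A hedged sketch of PRESS and the predicted R² derived from it (assumes no leverage is exactly 1):

import numpy as np

def predicted_r2(X, y):
    """PRESS-based predicted R^2 for a least-squares fit of y on X."""
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    h = np.diag(H)                       # leverages
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                     # ordinary residuals
    press = np.sum((e / (1 - h)) ** 2)   # leave-one-out residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - press / sst               # can be negative, as seen here

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(15), rng.normal(size=(15, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, 15)
print(round(predicted_r2(X, y), 3))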

13.4 Lack of fit tests

The data for the lack-of-fit test are shown in Table 13 below. This is the p-value associated with the lack-of-fit calculation for each model. The best model should have an insignificant p-value; a typical cutoff is a p-value > 0.10 to conclude an insignificant lack of fit.

Source | Sum of Squares | df | Mean Square | F-value | p-value |
Linear | 7886.78 | 11 | 716.98 | 31.11 | 0.0082 |
2FI | 7159.83 | 8 | 894.98 | 38.83 | 0.0060 |
Quadratic | 2384.52 | 5 | 476.90 | 20.69 | 0.0157 | Suggested
Cubic | 307.68 | 1 | 307.68 | 13.35 | 0.0354 | Aliased
Pure Error | 69.15 | 3 | 23.05 | | |

Table 13.

Lack of fit tests.

The selected model should have an insignificant lack of fit.

13.5 ANOVA for quadratic model

Table 14 shows the ANOVA data, which is used to test the significance of the result obtained. The model probability (a.k.a. p-value) is the probability of obtaining a model F statistic at least as large as the computed value when in truth there are no factor effects (the data produced false effects). Probabilities less than the acceptable risk (alpha, by default 0.05) are deemed significant and indicate a model effect; values greater than the alpha risk suggest no significant effect.

Source | Sum of Squares | df | Mean Square | F-value | p-value |
Model | 6923.57 | 9 | 769.29 | 2.51 | 0.1049 | not significant
A-Temperature | 1218.74 | 1 | 1218.74 | 3.97 | 0.0813 |
B-Methanol-Oil ratio | 140.27 | 1 | 140.27 | 0.4573 | 0.5179 |
C-Catalyst Weight | 62.29 | 1 | 62.29 | 0.2031 | 0.6642 |
AB | 561.12 | 1 | 561.12 | 1.83 | 0.2132 |
AC | 165.07 | 1 | 165.07 | 0.5382 | 0.4841 |
BC | 0.7564 | 1 | 0.7564 | 0.0025 | 0.9616 |
A² | 836.53 | 1 | 836.53 | 2.73 | 0.1372 |
B² | 4358.27 | 1 | 4358.27 | 14.21 | 0.0055 |
C² | 859.23 | 1 | 859.23 | 2.80 | 0.1327 |
Residual | 2453.67 | 8 | 306.71 | | |
Lack of Fit | 2384.52 | 5 | 476.90 | 20.69 | 0.0157 | significant
Pure Error | 69.15 | 3 | 23.05 | | |
Cor Total | 9377.24 | 17 | | | |

Table 14.

Anova for quadratic model.

The degree to which the model fits the data is measured by the lack of fit. A strong lack of fit (p < 0.05) is an undesirable property, because it shows that the model does not fit the data well; little lack of fit (p > 0.1) is desirable.

The model is not significant relative to the noise, according to the model's F-value of 2.51; the likelihood of noise causing an F-value this large is 10.49%.

Model terms are considered significant when the p-value is less than 0.0500. In this situation, B² is the only significant model term. Model terms are not significant if the value is greater than 0.1000, and model reduction may enhance the model if it contains many insignificant terms (except those needed to maintain hierarchy).

The lack-of-fit F-value of 20.69 indicates that the lack of fit is significant; an F-value this large could be caused by noise in only 1.57% of cases. A significant lack of fit is undesirable, since we want the model to fit.

A negative Predicted R2 as shown in the Fit Statistics data in Table 15 implies that the overall mean may be a better predictor of your response than the current model. In some cases, a higher-order model might be more accurate.

Fit Statistics
Std. Dev. | 17.51 | R² | 0.7383
Mean | 61.55 | Adjusted R² | 0.4440
C.V. % | 28.46 | Predicted R² | −0.9458
| | Adeq Precision | 4.8223

Table 15.

Fit statistics.

Adeq Precision: The signal-to-noise ratio is measured by Adeq Precision, and a ratio of at least 4 is preferred. Our ratio of 4.822 shows an adequate signal, so this model can be used to navigate the design space.

14. Decision

From the ANOVA result, it is obvious the model cannot be deployed as-is; we need to tweak it before using it for optimization, or else the solutions it provides will be misleading. We will remove the interaction terms from the model, since they have lower power (see Table 8), and then repeat only the ANOVA step after this change (Table 16).

14.1 ANOVA for reduced quadratic model

The model is significant, as indicated by the model's F-value of 3.57; the likelihood of noise producing an F-value this large is only 3.25%.

Model terms are considered significant when the p-value is less than 0.0500; in this case, B² is a significant model term. Model terms are not significant if the value is higher than 0.1000. Model reduction may enhance the model if it contains a large number of insignificant terms (excluding those necessary to maintain hierarchy).

The lack-of-fit F-value of 16.87 implies the lack of fit is significant; there is only a 2.02% chance that a lack-of-fit F-value this large could occur due to noise. A significant lack of fit is not okay, since we want the model to fit, but there is some improvement over the full quadratic model, so we can work with this model (Table 17).

Source | Sum of Squares | df | Mean Square | F-value | p-value |
Model | 6196.61 | 6 | 1032.77 | 3.57 | 0.0325 | significant
A-Temperature | 1218.74 | 1 | 1218.74 | 4.21 | 0.0646 |
B-Methanol-Oil ratio | 140.27 | 1 | 140.27 | 0.4851 | 0.5006 |
C-Catalyst Weight | 62.29 | 1 | 62.29 | 0.2154 | 0.6516 |
A² | 836.53 | 1 | 836.53 | 2.89 | 0.1170 |
B² | 4358.27 | 1 | 4358.27 | 15.07 | 0.0026 |
C² | 859.23 | 1 | 859.23 | 2.97 | 0.1127 |
Residual | 3180.62 | 11 | 289.15 | | |
Lack of Fit | 3111.47 | 8 | 388.93 | 16.87 | 0.0202 | significant
Pure Error | 69.15 | 3 | 23.05 | | |
Cor Total | 9377.24 | 17 | | | |

Table 16.

ANOVA for reduced quadratic model.

Std. Dev. | 17.00 | R² | 0.6608
Mean | 61.55 | Adjusted R² | 0.4758
C.V. % | | Predicted R² | −0.3457
| | Adeq Precision | 5.4594

Table 17.

Fit statistics for RQM.

14.2 Fit statistics for RQM

A negative Predicted R2 implies that the overall mean may be a better predictor of your response than the current model. In some cases, a higher-order model may also predict better.

Adeq Precision measures the signal-to-noise ratio. A ratio greater than 4 is desirable. Our ratio of 5.459 indicates an adequate signal. This model can be used to navigate the design space. And there is an improvement in the Adjusted R2 using this reduced Quadratic Model.

14.3 Coefficients in terms of coded factors

The coefficient estimates in Table 18 show the anticipated change in the response for each unit change in the factor value. The intercept in an orthogonal design is the overall average response of all the runs, and the coefficients are adjustments around that average based on the factor settings. The VIFs are 1 when the factors are orthogonal; VIFs greater than 1 indicate multicollinearity, and the higher the VIF, the more severe the correlation of the terms. VIFs of less than 10 are generally acceptable.

Factor | Coefficient Estimate | df | Standard Error | 95% CI Low | 95% CI High | VIF
Intercept | 88.05 | 1 | 8.49 | 69.37 | 106.74 |
A-Temperature | −9.45 | 1 | 4.60 | −19.57 | 0.6808 | 1.0000
B-Methanol-Oil ratio | −3.20 | 1 | 4.60 | −13.33 | 6.92 | 1.0000
C-Catalyst Weight | 2.14 | 1 | 4.60 | −7.99 | 12.26 | 1.0000
A² | −8.13 | 1 | 4.78 | −18.66 | 2.39 | 1.08
B² | −18.56 | 1 | 4.78 | −29.09 | −8.04 | 1.08
C² | −8.24 | 1 | 4.78 | −18.76 | 2.28 | 1.08

Table 18.

Coefficients as codified factors.

14.4 Final equation in terms of coded factors

You can apply the equation in terms of coded factors in Table 19 to make predictions about the response at given levels of each factor. By default, the high levels of the factors are coded as +1 and the low levels as −1. The coded equation is useful for determining the relative importance of the factors by comparing their coefficients.

Biodiesel Yield = +80.46 − 9.45 A − 3.20 B + 2.13 C − 5.54 A² − 15.97 B² − 5.65 C²

Table 19.

Final equation in terms of coded factors.
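As an illustration (our sketch, not a Design Expert feature), the coded equation of Table 19 can be wrapped in a small function for quick predictions; the coded values are obtained from the ranges in Table 2:

def yield_coded(A, B, C):
    """Biodiesel yield predicted by the coded-factor equation of Table 19."""
    return (80.46 - 9.45*A - 3.20*B + 2.13*C
            - 5.54*A**2 - 15.97*B**2 - 5.65*C**2)

# At the center point (65 degC, 22.5:1, 3.5 wt%), all coded values are 0:
print(yield_coded(0, 0, 0))  # 80.46, the model's prediction at the center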

14.5 Final equation in terms of actual factors

The equation in terms of actual factors in Table 20 can be used to make predictions about the response at given levels of each factor, where the levels are specified in the original units. This equation should not be used to determine the relative importance of each factor, because the coefficients are scaled to accommodate the units of each factor and the intercept is not at the center of the design space.

Biodiesel Yield = −902.92058 + 26.92040 × Temperature + 12.34881 × Methanol-Oil ratio + 19.01074 × Catalyst Weight − 0.221613 × Temperature² − 0.283914 × Methanol-Oil ratio² − 2.51265 × Catalyst Weight²

Table 20.

Final equation using actual factors.

14.6 Diagnostics plots

Externally studentized residuals are the default, with raw residuals and internally studentized residuals also available. Unless all the runs in a design have the same leverage, the standard errors of the residuals differ, so each raw residual represents a different population (one for each different standard error). As a result, it is not recommended to validate the regression assumptions using raw residuals. Studentizing the residuals maps all of the individual normal distributions onto a single standard normal distribution. Externally studentized residuals, based on a deletion procedure, are the default because they are more sensitive to detecting issues with the analysis; internally studentized residuals are also available but are less sensitive to finding such problems. The diagnostics plots from the Design Expert software are shown in Figure 12.

Figure 12.

A diagnostics plots.

Normal Probability: If the residuals follow a normal distribution, they should follow a straight line, according to the normal probability plot. Even with typical data, expect some scatter. Only focus on distinct patterns, such as an “S-shaped” curve, which suggests that a response modification might lead to a more accurate analysis.

Residuals vs. Predicted: This is a plot of the residuals versus the ascending predicted response values. The idea of constant variance is tested. The plot needs to be random scatter (residuals should have a constant range across the graph). This plot’s expanding variance (“megaphone pattern”) suggests that a transformation is required.

Predicted vs. Actual: A graph of predicted versus actual response values. The purpose is to detect a value, or group of values, that is not easily predicted by the model.

Leverage: A measurement of each point's influence on the model's fit. When a point's leverage is 1, the model exactly fits the observation at that point, so that point strongly influences the model. A run with more than twice the average leverage is generally regarded as having high leverage; there are not many runs like it in the factor space. The average leverage is calculated by dividing the number of terms in the model by the number of design runs.

14.7 Model graphs

All the model graphs, which can be used to derive insights into the responses for all input data, are shown in Figures 13-18 respectively.

Figure 13.

All factors response.

Figure 14.

Interactions.

Figure 15.

Contour plot.

Figure 16.

Predicted vs. actual.

Figure 17.

3D surface plot.

Figure 18.

Cube plot.

15. Optimization

Here, our goal is to maximize the biodiesel yield with the factors kept within the ranges (lower and upper limits) summarized in Table 21 below.

15.1 Solutions

The Design Expert software iterated over the ranges of all the factors and found the maximum yield. There are 100 possible solutions; we select the one suggested by the software, shown below in Table 22.

Name | Goal | Lower Limit | Upper Limit | Lower Weight | Upper Weight | Importance
A: Temperature | is in range | 60 | 70 | 1 | 1 | 3
B: Methanol-Oil ratio | is in range | 15 | 30 | 1 | 1 | 3
C: Catalyst Weight | is in range | 2 | 5 | 1 | 1 | 3
Biodiesel Yield | maximize | 10.66 | 90.98 | 1 | 1 | 5

Table 21.

Constraints.
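Outside Design Expert, a comparable constrained maximization can be sketched with SciPy, using the actual-factor equation from Table 20 and the ranges from Table 21. This is an illustration rather than a reproduction of the software's desirability search, but the factor settings it returns are comparable to the solutions in Table 22:

from scipy.optimize import minimize

def neg_yield(x):
    T, R, W = x  # temperature, methanol-oil ratio, catalyst weight
    # Final equation in actual factors (Table 20), negated because we minimize.
    return -(-902.92058 + 26.92040*T + 12.34881*R + 19.01074*W
             - 0.221613*T**2 - 0.283914*R**2 - 2.51265*W**2)

res = minimize(neg_yield, x0=[65.0, 22.5, 3.5],
               bounds=[(60, 70), (15, 30), (2, 5)])
print(res.x.round(3), round(-res.fun, 2))  # settings and the predicted yield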

100 solutions found
Number | Temperature | Methanol-Oil ratio | Catalyst Weight | Biodiesel Yield | Desirability |
1 | 61.818 | 22.128 | 3.760 | 91.007 | 1.000 | Selected
2 | 62.522 | 21.568 | 3.704 | 90.987 | 1.000 |
3 | 62.223 | 21.803 | 3.668 | 91.064 | 1.000 |
4 | 61.717 | 21.882 | 3.708 | 91.025 | 1.000 |
5 | 62.126 | 21.973 | 3.759 | 91.052 | 1.000 |
6 | 62.021 | 21.746 | 3.772 | 91.045 | 1.000 |
7 | 62.295 | 21.826 | 3.807 | 91.013 | 1.000 |
8 | 62.025 | 22.087 | 3.706 | 91.052 | 1.000 |
9 | 62.183 | 21.870 | 3.759 | 91.055 | 1.000 |
10 | 62.188 | 22.001 | 3.632 | 91.049 | 1.000 |

Table 22.

Optimization solutions.

16. Conclusion

In this chapter, we have extensively applied central composite design to optimize biodiesel synthesis using an eggshell-derived catalyst, with Design Expert 13 used to provide in-depth statistical analysis. A reduced quadratic model with a significant p-value of 0.0325 was accepted, since the full quadratic model had an insignificant p-value. The model is significant, as indicated by its F-value of 3.57; an F-value this large would be caused by noise in only 3.25% of cases. The number of experimental runs was reduced to 18, compared to the 20 runs used by the original experimenters, and we also obtained a higher yield of 91%, compared to the 89% obtained in the original study.

Acknowledgments

I acknowledge my co-author, Dr. C.N. Njoku, for his help and support in gathering this information and for inspiring the success of this work. I also want to extend my gratitude to my mother for always providing her special support in the little ways she could.

References

  1. Bhattacharya S. Central composite design for response surface methodology and its application in pharmacy. In: Response Surface Methodology in Engineering Science. London, UK: IntechOpen; 2021. DOI: 10.5772/INTECHOPEN.95835
  2. Wikipedia contributors. Central composite design. In: Wikipedia, The Free Encyclopedia. 2020. Available from: https://en.wikipedia.org/w/index.php?title=Central_composite_design&oldid=954106283 [Accessed: May 14, 2022]
  3. Skartland LK, Mjos SA, Grung B. Experimental designs for modeling retention patterns and separation efficiency in the analysis of fatty acid methyl esters by gas chromatography-mass spectrometry. Journal of Chromatography A. 2011;1218:6823-6831
  4. Tshizanga N, Aransiola EF, Oyekola O. Optimization of biodiesel production from waste vegetable oil and eggshell ash. South African Journal of Chemical Engineering. 2017;23:145-156. DOI: 10.1016/j.sajce.2017.05.003
  5. Manohar M, Joseph J, Selvaraj T, Sivakumar D. Application of Box-Behnken design to optimize the parameters for turning Inconel 718 using coated carbide tools. International Journal of Scientific and Engineering Research. 2013;4(620):642
  6. Breyfogle FW. Chapter 17. In: Statistical Methods for Testing, Development, and Manufacturing. New York: John Wiley & Sons; 1992. 252 p
  7. Singh B, Kumar R, Ahuja N. Optimizing drug delivery systems using systematic "design of experiments." Part I: Fundamental aspects. Critical Reviews in Therapeutic Drug Carrier Systems. 2005;22(1):27-105
  8. Cavazzuti M. Design of experiments. In: Optimization Methods. Berlin, Heidelberg: Springer; 2013. pp. 13-42
  9. Hassanein HM, Abd-Rabou AS, Sakr SM. Design optimization of transverse flux linear motor for weight reduction and performance improvement using response surface methodology and genetic algorithms. IEEE Transactions on Energy Conversion. 2010;25(3):598-605
  10. Anderson MJ, Whitcomb PJ. RSM Simplified. New York: Productivity, Inc.; 2016
  11. De Gryze S, Langhans I, Vandebroek M. Using the correct intervals for prediction: A tutorial on tolerance intervals for ordinary least-squares regression. Chemometrics and Intelligent Laboratory Systems. 2007;87(2):147-154
  12. Zahran A, Anderson-Cook CM, Myers RH. Fraction of design space to assess prediction capability of response surface designs. Journal of Quality Technology. 2003;35(4):377-386
