Response Surface Methodology Applied to the Optimization of Phenolic Compound Extraction from Brassica

The response surface methodology (RSM) is a relevant mathematical and statistical tool for process optimization. A state of the art on the optimization of the extraction of phenolic compounds from Brassica has shown that this approach is not sufficiently used. The reason is most likely its apparent complexity in comparison with a one-factor-at-a-time (OFAT) optimization. The objective of this chapter is to show, in a didactic way, how to implement the response surface methodology on a case study: the extraction of sinapine from mustard bran. Using this approach, prediction models have been developed and validated to predict the sinapine content extracted as well as the purity of the extract in sinapine. The methodology presented in this chapter can be reproduced on any other application in the field of process engineering.


Introduction
Nowadays, bio-based molecules are increasingly popular and used in everyday consumer products. Certain molecules such as phenolic compounds (PCs) are highly valued for their biological activities, which make it possible to fight against aging or to act as antibacterial or antioxidant agents. Phenolic compounds are secondary metabolites of plants and are present in plant biomass as well as in agroindustrial by-products [1]. The latter are currently used in sectors with low added value such as methanization or animal feed. To add value to these agroindustrial co-products, phenolic compounds could be extracted and concentrated [2]. For this, separation processes will have to be implemented and optimized. Thus, maximizing the extraction of phenolic compounds has become a topic of interest which would improve the profitability of crops and of the by-products resulting from their industrial transformation [3].
Many studies focus on maximizing extraction efficiency using OFAT optimization. This method, which seems simpler, is often either time consuming or leads to partial conclusions (e.g. no interpretation of the interactions between variables). Thus, to achieve such an optimization, it is recommended, if conditions allow it, to use the response surface methodology. RSM is a mathematical and statistical tool for exploring the relationships between several explanatory variables, called factors, and one or more variables to be optimized, called response(s). RSM is particularly relevant when the response is suspected to evolve in a curved way.
In this chapter, we focus on the application of RSM for optimizing the extraction of phenolic compounds from Brassica. In the first part, we propose a state of the art of the studies on this topic with an analysis of the main tools used to determine the optimal operating conditions for the extraction of phenolic compounds. In the second part, a case study based on the work of Reungoat et al. (2020) is presented [4]. This study focuses on the optimization of a sustainable extraction process to improve the recovery (yield and purity) of sinapine from mustard bran. Sinapine has biological activities of its own; however, its main interest lies in the product of its hydrolysis, sinapic acid. It has been shown that providing bio-based sinapic acid is very relevant in various application fields [5]. Indeed, this platform molecule can be used for the chemo-enzymatic synthesis of various molecules such as an anti-UV agent [6,7], a non-endocrine disruptive antiradical additive [8] and a bisphenol A substitute for polymer/resin synthesis [9]. The study is detailed from a methodological rather than an application point of view, with the presentation of the different steps that led to the optimum operating conditions of the extraction process.

State of the art on the optimization of PC extraction from Brassica
The studies reported in Table 1 deal with the optimization of the extraction process of phenolic compounds from Brassica. These all relate to the use of a design of experiments (DoE). OFAT optimization has been excluded.
Twenty papers have been identified on various raw materials belonging to Brassica (rapeseed, mustard, cabbage, broccoli, cauliflower). The extraction processes implemented are the most popular ones: conventional solvent extraction (CSE), ultrasound-accelerated extraction (UAE), microwave-accelerated extraction (MAE). One study deals with an extraction assisted by pulsed electric field (PEF) [13] and another with accelerated solvent extraction (ASE) [5].
The operating conditions most often optimized are the extraction temperature, the solvent concentration in water, the solvent-to-matter ratio, and the extraction time. Some specific conditions can also be investigated, such as ultrasonic or microwave power when UAE and MAE are carried out.
The predicted responses are diverse: individual phenolic contents obtained by HPLC, total phenolic compounds (TPC), total flavonoid content (TFC), or antioxidant activity (AA), which can be measured by different methods (Table 2).
Most studies have used RSM to model and/or predict responses. A mixture design was also used to determine the composition of an extraction solvent from three pure solvents; a simplex centroid mixture design was carried out [3]. Some studies model responses using first-order polynomial equations. These models are obtained from factorial designs of experiments [5,15,16]. Concerning the implementation of RSM, the experimental designs carried out are mainly Box-Behnken (BB) [8,10,13,14,17] and Central Composite (CC) [1,6,7,9,18-20]. We also found a D-optimal [4] and a full factorial [2] design. However, these DoEs are rarely associated with RSM. The predictions made by RSM are associated with second-order polynomial models.
Compared to all the studies that exist in the literature on the extraction of phenolic compounds from Brassica, only a small proportion uses RSM.

Context of the study
Mustard bran is one of the main by-products of the mustard seed industry, whose production peaked at 710 thousand tonnes in 2018 [29]. By-products from seed processing represent up to 60% (w/w) of the seeds [30]. Mustard bran is rich in water, with a content of 53 ± 1%. The dry matter is mainly composed of proteins (27 ± 7%), lipids (18 ± 1%), carbohydrates (34 ± 5%) and ash (12 ± 5%) [30-32].
Phenolic compounds represent between 1 and 4% of the wet matter of defatted mustard seeds [33]. They are mainly derivatives of sinapic acid, present at 90% as sinapine with relatively small amounts of sinapic acid. Sinapine can be used directly due to its many bioactivities [5,34] or be hydrolyzed to sinapic acid by chemical or enzymatic means [35]. Thus, our work will focus on the extraction of sinapine from mustard bran. Moreover, bio-based sinapic acid is highly sought after thanks to its many applications, whether in cosmetics (anti-aging, anti-UV) or in the field of polymers [6,8].
Thus, the implementation of a green extraction process to recover sinapine seems particularly relevant. The most widely used process in the various studies found in the literature is conventional solvent extraction (CSE). This is a solid/liquid extraction, the liquid being a solvent whose properties will define the sustainability of the process. Solvents such as acetone, methanol, ethanol or water, as well as their mixtures, have been used [36]. To follow the principles of green extraction [37], the extraction process developed will use aqueous ethanol as solvent, the percentage of which will be determined during the optimization of the process.

Material and methods
Mustard bran was supplied by Charbonneaux-Brabant (Reims, France). The mustard (B. juncea) was grown in Canada and processed in France. The seeds were treated by cold mechanical pressing. The material was not defatted, ground or dried. The raw mustard bran was stored in a cold room at 4°C until use.
A CSE using an ethanol/water mixture was implemented to extract sinapine from mustard bran. A fixed volume of 100 mL of solvent was used for each experiment. The extraction temperature was regulated with a digital thermometer in contact with the solvent and connected to the heating plate (IKA-RCT). Magnetic stirring was maintained throughout the duration of the extraction (2 h). Centrifugation (4713 g, 10 min) was used to separate the liquid extract from the solid residue. The sinapine content was measured by HPLC. More details on the materials and the methods can be found in Reungoat et al., 2020 [1].

Implementation of RSM
RSM is the recommended approach to optimize process operating conditions, for example to maximize extraction yield or minimize impurity content. Indeed, the implementation of the RSM, and therefore of a design of experiments, makes it possible to minimize the number of experiments, to determine the quadratic effect of a factor or the interaction between several factors and to obtain a high precision on the prediction of an optimal value.
The implementation of RSM requires the identification of the factors that will be involved in the model. Thus, RSM is often used after a screening plan which allows the discrimination of the operating conditions leading to a significant variation in the response. Sometimes, prior knowledge of the process is sufficient to avoid the screening step and RSM can be applied after arbitrary choice of factors by the experimenter. RSM is a relevant approach if the response surface is suspected to be curved. Indeed, the equation of the model used includes quadratic terms which make it possible to translate the curvature of the response.
In order to apply RSM, it is necessary to follow a rigorous approach so as not to end up with wrong conclusions or an unusable data set for the prediction of an optimum. This approach is illustrated in Figure 1.
For each step, the reasoning adopted for our case study will be detailed, the choices will be explained so that the methodology can be easily implemented on other cases.

Definition of the objectives
The objective of the optimization study must be defined according to the overall objective of the application. In our case, the operating conditions of the extraction process leading to a maximum yield of sinapine are sought. However, the global objective of the application is to produce sinapine, that is, to obtain a high purity sinapine extract. Thus, a second variable to be optimized emerges in addition to the yield of sinapine: the purity of the sinapine extract. Under such considerations, the optimum operating conditions sought will be a compromise between those allowing to maximize the yield of sinapine and the ones that maximize the purity of sinapine. Failure to correctly define the objective may lead to an incorrect definition of the responses, factors and their levels and thus induce a partial conclusion at the end of the study.

Definition of the responses
A response is defined as a variable to be explained. For the choice of responses, it is necessary to ensure that the measurement tools are sufficiently repeatable. Indeed, in statistics, the more dispersed the measured values are, the more difficult it is to highlight significant differences and therefore to obtain a valid prediction model. This is why the presence of a triplicate in the DoE is essential to quantify the repeatability of the measurement. If the dispersion is too large, the DoE will not be able to generate a valid model.
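As an illustration of how the repeatability of a triplicate could be quantified, the following minimal sketch computes the mean, the sample standard deviation and the coefficient of variation. The three yield values are hypothetical and are not taken from the study; the function name `repeatability` is also an assumption made for this example.

```python
import statistics

def repeatability(replicates):
    """Return mean, sample standard deviation and coefficient of variation (%)."""
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # sample standard deviation (n - 1)
    return mean, sd, 100.0 * sd / mean

# Hypothetical center-point yields (mg/g BDM), for illustration only
center_point_yields = [8.1, 8.3, 8.2]
mean, sd, cv_percent = repeatability(center_point_yields)
print(f"mean = {mean:.2f}, sd = {sd:.2f}, CV = {cv_percent:.1f} %")
```

A small coefficient of variation at the replicated point suggests that differences between runs of the design can be attributed to the factors rather than to measurement noise.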
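In our case study, the two responses to be optimized are the yield of sinapine (Y1), expressed relative to the dry matter of the mustard bran, and the purity of sinapine in % (Y2), defined by Eqs. (1) and (2) (unit conversion factors are omitted):
$$Y_1 = \frac{C_{\mathrm{sinapine}} \times V_{\mathrm{solvent}}}{m_{\mathrm{BDM}}} \tag{1}$$
with C_sinapine the sinapine content measured by HPLC in mg/L, V_solvent the volume of solvent added during the extraction and m_BDM the mass of dry matter in mustard bran;
$$Y_2 = \frac{m_{\mathrm{sinapine}}}{m_{\mathrm{EDM}}} \times 100 \tag{2}$$
with m_sinapine (g) the mass of sinapine in the extract determined from C_sinapine and m_EDM the mass of the dry matter in the extract (g).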
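The two responses can be computed from the raw measurements as in the short sketch below. It assumes that the yield is the mass of sinapine in the extract divided by the mass of bran dry matter, and that the purity is the mass of sinapine divided by the mass of extract dry matter; the numerical values are hypothetical and only the 100 mL solvent volume comes from the protocol.

```python
def sinapine_yield(c_sinapine_mg_per_l, v_solvent_l, m_bdm_g):
    """Sinapine yield in mg per g of bran dry matter (mg/g BDM)."""
    return c_sinapine_mg_per_l * v_solvent_l / m_bdm_g

def sinapine_purity(m_sinapine_g, m_edm_g):
    """Sinapine purity in % of the extract dry matter (% EDM)."""
    return 100.0 * m_sinapine_g / m_edm_g

# Hypothetical measurements, for illustration only
c_sinapine = 400.0   # mg/L, HPLC result
v_solvent = 0.100    # L (100 mL of solvent per experiment)
m_bdm = 5.0          # g of bran dry matter
m_edm = 1.0          # g of extract dry matter

y1 = sinapine_yield(c_sinapine, v_solvent, m_bdm)
m_sinapine = c_sinapine * v_solvent / 1000.0  # mg converted to g
y2 = sinapine_purity(m_sinapine, m_edm)
print(f"Y1 = {y1:.1f} mg/g BDM, Y2 = {y2:.1f} % EDM")
```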

Definition of the factors
A factor is defined as a variable that provides information to explain a response. Two strategies can be used to define the factors: to apply a screening plan (factorial or Plackett-Burman) or to use expertise on the process. In our case, the factors were chosen based on prior knowledge about the extraction process [4]. Note that the factors must be independent for the implementation of the experimental design. This should be checked before establishing the matrix of experiments.
According to theory, the liquid/solid extraction processes are influenced by a set of parameters which can modify their efficiency. These relate to: (i) the equipment used (stirring power, the configuration of the reactor), (ii) the operating conditions (extraction time, extraction temperature and pressure), and (iii) the biomass and the solvent (solvent-to-matter ratio, state of the biomass, nature of the solvent).
Some of these parameters are often fixed in the design of the experiments. Indeed, for laboratory experiments, the extraction reactor is always the same, as is the stirring system (type and power). Conventionally, the extraction time corresponds to the time required to reach equilibrium. In our case, the biomass is wet and in the form of bran, so it cannot be crushed or sieved. The state of the biomass therefore cannot be taken as a factor. In addition, two constraints were imposed: to conduct the experiments at atmospheric pressure and to work with ethanol (pure or aqueous) in order to design a sustainable process. Thus, the parameters that could be included as factors in the design of experiments are the solvent-to-matter ratio, the extraction temperature and the ethanol concentration. These parameters being independent, three factors will be used in the models developed using RSM. The last point to be defined is the variation range of each factor.

Range of extraction temperature
Technological limits exist for the choice of extraction temperatures. The experimental domain cannot be extended above 75°C to avoid evaporation phenomena due to the boiling temperature of ethanol. Thus, the extraction temperature can vary between room temperature and 75°C. However, according to the literature, experiments at temperatures close to room temperature are of little interest, since an increase in temperature is known to improve the extraction of phenolic compounds. In addition, too wide a range of values can adversely affect the quality and accuracy of the prediction model. We have, therefore, chosen to limit our temperature range to between 45°C and 75°C.

Range of solvent-to-matter ratio (S/M)
There is also a technological limit for this factor. Indeed, it is not possible to extract with less than 10 mL per gram of mustard bran. In addition, the objective being not to consume too much solvent, no more than 30 mL per gram of mustard bran will be used. Thus, the range of the S/M factor will be between 10 and 30 mL/g.

Range of ethanol concentration
No technological limit was found for this factor. The use of extreme values (water or pure ethanol) is not interesting because the best extraction yields are obtained with aqueous ethanol. According to preliminary experiments, to maximize the yield of sinapine (Y1), the values to be studied should be between 40 and 80%. Considering the purity of sinapine (Y2), the values to be studied should be between 60 and 100% in order to limit the extraction of impurities such as sugars and proteins.
To define the range of variation of the ethanol concentration, we merged the two previous intervals by removing the extreme values so as not to widen the range of values to be studied too much. Thus, ethanol concentrations between 45 and 95% were studied in the design of experiments.
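Once the three ranges are fixed, each factor is usually rescaled to a coded variable between -1 and +1 before building the design. The sketch below shows this standard linear coding with the ranges chosen above; the dictionary keys and function names are assumptions made for the example.

```python
# Factor ranges chosen in the text: (low, high) in natural units
FACTOR_RANGES = {
    "T": (45.0, 75.0),   # extraction temperature, °C
    "E": (45.0, 95.0),   # ethanol concentration, % v/v
    "SM": (10.0, 30.0),  # solvent-to-matter ratio, mL/g
}

def to_coded(name, value):
    """Map a natural value onto the coded scale where low = -1 and high = +1."""
    low, high = FACTOR_RANGES[name]
    center = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return (value - center) / half_range

def to_natural(name, coded):
    """Inverse mapping, from the coded scale back to natural units."""
    low, high = FACTOR_RANGES[name]
    center = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return center + coded * half_range

print(to_coded("E", 70.0))     # center of the ethanol range
print(to_natural("T", 1.0))    # upper temperature level
```

Working in coded units makes the model coefficients comparable between factors that have different units and ranges.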

Choice and implementation of the design of experiments
The two most widely used designs of experiments for the implementation of RSM are the central composite (CC) and Box-Behnken (BB) designs.
For the same number of factors and levels, a BB design generates fewer experiments than a CC design. However, BB designs have a certain rigidity in their implementation since the number of levels per factor is fixed. In addition, BB designs do not include experiments at the extreme values of the variation ranges of the factors. This can be a problem when precise knowledge of the interval is available and/or when the extreme values need to be tested.
In addition, a CC design can integrate preliminary experiments. Thus, the results of certain experiments from a screening plan carried out upstream can be reused as experiments in the CC design, which decreases the number of new assays to perform.
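To make the comparison of design sizes concrete, the sketch below counts the runs of a central composite design with a full two-level factorial core, of a Box-Behnken design and of a three-level full factorial. The Box-Behnken counts are entered as a small lookup table of commonly tabulated values rather than computed, and the number of center points (three) is an assumption for the example.

```python
def central_composite_runs(n_factors, n_center):
    """Corners of a full 2-level factorial + 2 axial points per factor + center points."""
    return 2 ** n_factors + 2 * n_factors + n_center

# Commonly tabulated Box-Behnken run counts, center points excluded
BOX_BEHNKEN_BASE_RUNS = {3: 12, 4: 24, 5: 40}

def box_behnken_runs(n_factors, n_center):
    return BOX_BEHNKEN_BASE_RUNS[n_factors] + n_center

def three_level_full_factorial_runs(n_factors):
    return 3 ** n_factors

for k in (3, 4, 5):
    print(k,
          central_composite_runs(k, n_center=3),
          box_behnken_runs(k, n_center=3),
          three_level_full_factorial_runs(k))
```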
The face-centered central composite (CCF) design belongs to the category of CC designs. Its axial experiments are located at the center of each face of the experimental domain.
In our application, a CCF design was used to optimize the extraction process for the recovery of sinapine. A total of 17 experiments, including the repetitions at the central point, constitutes the set of experiments used to implement RSM. The different assays are presented in Table 2 in the form of coded and uncoded variables. X1 (extraction temperature; 45, 60, 75°C), X2 (concentration of ethanol; 45, 70, 95% v/v) and X3 (solvent-to-matter ratio; 10, 20, 30 mL/g BDM) are the independent variables used to explain the responses Y1 (sinapine yield on the mustard bran dry matter in g/g BDM) and Y2 (sinapine purity on the extract dry matter in % EDM).
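A face-centered central composite design for three factors can be generated in a few lines, as sketched below. The split into 8 corner points, 6 face-center points and 3 center points is an assumption that is consistent with the total of 17 runs; the actual run list of the study is the one given in Table 2.

```python
from itertools import product

# Natural levels (low, center, high) taken from the text
LEVELS = {"T": (45, 60, 75), "E": (45, 70, 95), "SM": (10, 20, 30)}
FACTORS = ("T", "E", "SM")

def ccf_design(n_factors=3, n_center=3):
    """Return the coded runs of a face-centered central composite design."""
    corners = [list(p) for p in product((-1, 1), repeat=n_factors)]
    faces = []
    for i in range(n_factors):
        for sign in (-1, 1):
            point = [0] * n_factors
            point[i] = sign          # center of one face of the cube
            faces.append(point)
    centers = [[0] * n_factors for _ in range(n_center)]
    return corners + faces + centers

def decode(point):
    """Convert coded levels (-1, 0, +1) into natural units."""
    return tuple(LEVELS[name][c + 1] for name, c in zip(FACTORS, point))

design = ccf_design()
for run, point in enumerate(design, start=1):
    print(run, point, decode(point))
```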
The experimental data were fitted using a second-order polynomial (Eq. (3)):
$$Y_q = \beta_0 + \sum_{i=1}^{3} \beta_i X_i + \sum_{i=1}^{3} \beta_{ii} X_i^2 + \sum_{i=1}^{2} \sum_{j=i+1}^{3} \beta_{ij} X_i X_j + \varepsilon_q \tag{3}$$
where Yq are the different responses (q = 1-2); β0, βi, βij and βii are the regression coefficients for the mean, linear, interaction and quadratic terms, respectively; Xi and Xj are the independent variables; and εq are the residuals between the observed and the predicted values.
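The fit of such a second-order polynomial is an ordinary least-squares problem. The sketch below, written with the standard library only, expands the coded factors into constant, linear, quadratic and interaction terms and solves the normal equations by Gaussian elimination. The responses are synthetic, generated from hypothetical coefficients, so the example only checks that the procedure recovers them; it is not the algorithm used by MODDE.

```python
from itertools import combinations, product

def quadratic_terms(x):
    """Expand coded factors into [1, x_i, x_i^2, x_i * x_j] model terms."""
    row = [1.0] + list(x) + [xi * xi for xi in x]
    row += [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return row

def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef

def fit_quadratic(points, responses):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    x = [quadratic_terms(p) for p in points]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, responses)) for i in range(p)]
    return solve(xtx, xty)

# Coded CCF runs: 8 corners, 6 face centers, 3 center points (assumed split)
points = [list(p) for p in product((-1, 1), repeat=3)]
points += [[s if i == k else 0 for i in range(3)] for k in range(3) for s in (-1, 1)]
points += [[0, 0, 0]] * 3

# Hypothetical coefficients, order: 1, T, E, SM, T^2, E^2, SM^2, T*E, T*SM, E*SM
true_beta = [8.0, 0.5, 0.3, 0.0, 0.0, -1.2, 0.0, 0.0, 0.0, 0.0]
responses = [sum(b * t for b, t in zip(true_beta, quadratic_terms(p))) for p in points]

fitted = fit_quadratic(points, responses)
print([round(b, 6) for b in fitted])
```

With real, noisy data the fitted coefficients would of course differ from any underlying values, which is why the significance of each term has to be tested afterwards.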

Run the experiments
This step corresponds to data collection. Assays can be performed in random order. The material and methods of analysis were briefly introduced. More details can be found in Reungoat et al. (2020) [1].
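A random run order can be drawn with a seeded generator so that the sequence is reproducible and can be recorded in the laboratory notebook. The seed value and the function name below are arbitrary choices made for this sketch.

```python
import random

def randomized_run_order(n_runs, seed):
    """Return the run numbers 1..n_runs in a reproducible random order."""
    rng = random.Random(seed)   # local generator, does not touch global state
    order = list(range(1, n_runs + 1))
    rng.shuffle(order)
    return order

order = randomized_run_order(17, seed=2020)
print(order)
```

Randomizing the order helps to prevent a slow drift (for example in raw material moisture or instrument response) from being confounded with the effect of a factor.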

Development and validation of the model
Once the data have been collected, they are processed by software to generate a model and indicators that allow its quality to be assessed (fit to the data, ability to predict). The software used to carry out our case study is the commercial software MODDE v.12.0 (Umetrics AB, Sweden). First, it is necessary to determine whether the model should be reduced. Reducing a model means removing the terms whose coefficients are not significant. Significance tests are carried out for this purpose. The p-values obtained indicate whether the value of a coefficient can be considered equal to 0. If so, the corresponding term is considered to have no effect on the response. Table 3 presents the scaled and centered coefficients of the model associated with each term as well as the results of the significance test for each coefficient.
The p-values in red in Table 3 indicate the significant coefficients and factors to keep in the model.

Analysis of the prediction model of Y1
The significant coefficients are β0 (constant), β1 (T) and β22 (E*E). Since the quadratic term E*E is significant, the variable E cannot be removed from the model. Thus, the factors to keep are temperature and ethanol concentration. The S/M ratio has no significant effect on the sinapine yield. The data must be reprocessed by the software keeping the terms T, E and E*E. New values are found for the coefficients of the reduced model. Sinapine yield can be predicted according to Eq. (4) with unscaled coefficients.
Secondly, the indicators calculated on each reduced model are interpreted to assess whether the correlation between the model and the experimental data is acceptable and whether these models can be considered as good prediction tools. These indicators are presented in Table 4.
The coefficients of determination being close to 1, the reduced models fit the experimental data well. The values of the adjusted coefficients of determination are high enough to suggest a satisfactory correlation between the values predicted by the model and the values observed in the experiments. The p-values obtained by the ANOVA on the model regression are less than 0.01%, which validates the models obtained. The condition number, which does not exceed 10, indicates a good orthogonality of the design for both models. The reproducibility of each model is also excellent, with a value close to 1. All these statistical parameters indicate that the relationships between the variables and the responses are well described by the models.
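The coefficient of determination and its adjusted version can be computed directly from observed and predicted values, as in the sketch below. The five observed and predicted yields are hypothetical; the other indicators reported by the software (condition number, reproducibility) are not reproduced here.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_runs, n_terms):
    """Penalize R2 for the number of model terms (constant excluded)."""
    return 1.0 - (1.0 - r2) * (n_runs - 1) / (n_runs - n_terms - 1)

# Hypothetical observed and predicted yields (mg/g BDM), for illustration only
observed = [5.3, 6.9, 8.1, 8.9, 7.4]
predicted = [5.5, 6.8, 8.0, 8.7, 7.6]

r2 = r_squared(observed, predicted)
# Three terms besides the constant, as in the reduced model (T, E, E*E)
r2_adj = adjusted_r_squared(r2, n_runs=len(observed), n_terms=3)
print(f"R2 = {r2:.3f}, adjusted R2 = {r2_adj:.3f}")
```

The adjusted value is always lower than or equal to R2 when at least one term is fitted, which is why it is the more conservative indicator when comparing a full and a reduced model.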

Determination of the optima and validation of the model
In order to determine the optimal operating conditions for each response, the 3D response surfaces were plotted. Then, the software optimizer tool, based on the Nelder-Mead simplex method, was used to obtain the optimal operating conditions. Figure 2 presents the evolution of Y1 according to the extraction temperature and the ethanol concentration for a solvent-to-matter ratio of 10 mL/g.
Table 4. Indicators to assess the fit and quality of reduced models.

Figure 2. 3D response surface for Y1 at a solvent-to-matter ratio of 10 mL/g.
Variations of the sinapine yield from 5.3 to 8.9 mg/g BDM were found among the 17 experiments of the CCF design.
As can be seen in Figure 2, the sinapine yield follows a parabolic shape. This can be explained by a strong influence of the quadratic term of the ethanol concentration. The maximum sinapine yield is achieved in the range 65-80% ethanol. The extraction temperature has a positive effect on the sinapine yield, as shown in Figure 2 by the inclination of the response surface towards the high temperature zone.
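For a reduced model that is linear in temperature and quadratic in ethanol concentration, the position of the maximum can be reasoned out analytically: the temperature is pushed to the bound favored by the sign of its coefficient, and the ethanol optimum sits at the vertex of the parabola, clipped to the experimental domain. The sketch below applies this reasoning in coded units with hypothetical coefficients; it is not the Nelder-Mead optimizer of the software.

```python
def optimum_of_reduced_model(b0, b_t, b_e, b_ee):
    """Maximize y = b0 + b_t*T + b_e*E + b_ee*E^2 for coded T and E in [-1, 1]."""
    # Linear term: the maximum lies at the bound given by the sign of b_t
    t_opt = 1.0 if b_t > 0 else -1.0
    if b_ee < 0:
        # Concave parabola: vertex at -b_e / (2 * b_ee), clipped to the domain
        e_opt = max(-1.0, min(1.0, -b_e / (2.0 * b_ee)))
    else:
        # Convex or linear in E: the maximum lies on one of the bounds
        e_opt = max((-1.0, 1.0), key=lambda e: b_e * e + b_ee * e * e)
    y_opt = b0 + b_t * t_opt + b_e * e_opt + b_ee * e_opt ** 2
    return t_opt, e_opt, y_opt

# Hypothetical coded coefficients, for illustration only
t_opt, e_opt, y_opt = optimum_of_reduced_model(b0=8.0, b_t=0.5, b_e=0.2, b_ee=-1.0)
print(t_opt, e_opt, round(y_opt, 2))
```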
The optimal operating conditions determined for Y1 by the MODDE software are 70% ethanol and 75°C.
An experimental sinapine yield of 8.8 ± 0.1 mg/g was achieved under these conditions. Figure 3 presents the evolution of Y2 according to the extraction temperature and the ethanol concentration for a solvent-to-matter ratio of 10 mL/g BDM.
Variations of the sinapine purity from 1.4 to 4.4% EDM were found among the 17 experiments of the CCF design.
For a solvent-to-matter ratio of 20 mL/g, the response surface is flat: quadratic terms have little influence. The extract with the highest proportion of sinapine relative to the other extracted solutes is obtained at maximum temperature and ethanol concentration. This may be due to the low solubility of proteins, sugars and minerals in ethanol compared to sinapine. However, an increase of the solvent-to-matter ratio favors the dissolution of those impurities and decreases the sinapine purity in the extract.
The optimal operating conditions determined for Y2 by the MODDE software are 100% ethanol, 75°C and 10 mL/g BDM. An experimental sinapine purity of 4.4 ± 0.1% EDM was achieved under these conditions.
Since the two optima are not the same, it is necessary to find operating conditions that maximize both responses at the same time. Figure 4 presents the response surfaces for Y1 and Y2 on the same graph.
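One simple way to look for such a compromise is to rescale each predicted response between 0 and 1 over the experimental domain and to maximize the geometric mean of the two scaled values. The sketch below does this with an exhaustive grid search in coded units. Both models use hypothetical coefficients, and the grid search with a desirability-style score is a stand-in chosen for readability, not the optimizer implemented in MODDE.

```python
from itertools import product

def y1_model(t, e, sm):
    """Hypothetical yield model (coded units): quadratic in ethanol."""
    return 8.0 + 0.5 * t + 0.2 * e - 1.0 * e * e

def y2_model(t, e, sm):
    """Hypothetical purity model (coded units): linear in all factors."""
    return 3.0 + 0.4 * t + 0.8 * e - 0.3 * sm

def scaled(value, low, high):
    """Rescale a response to [0, 1] over the range seen on the grid."""
    return (value - low) / (high - low)

def best_compromise(steps=21):
    grid = [-1.0 + 2.0 * i / (steps - 1) for i in range(steps)]
    candidates = list(product(grid, repeat=3))
    y1 = [y1_model(*c) for c in candidates]
    y2 = [y2_model(*c) for c in candidates]
    lo1, hi1, lo2, hi2 = min(y1), max(y1), min(y2), max(y2)

    def score(index):
        # Geometric mean: a poor value of either response lowers the score
        return (scaled(y1[index], lo1, hi1) * scaled(y2[index], lo2, hi2)) ** 0.5

    best = max(range(len(candidates)), key=score)
    return candidates[best], y1[best], y2[best]

(t_best, e_best, sm_best), y1_best, y2_best = best_compromise()
print(t_best, round(e_best, 2), sm_best, round(y1_best, 2), round(y2_best, 2))
```

With these illustrative models the compromise falls at the highest temperature, the lowest solvent-to-matter ratio and an ethanol level between the two individual optima, which mirrors qualitatively the behavior described in the text.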
The MODDE software determined that the optimal operating conditions providing the highest yield of sinapine while maintaining a high purity are 83% ethanol, 75°C and 10 mL/g BDM. An experimental sinapine yield of 8.0 ± 0.1 mg/g was obtained under these conditions with a purity of 4.2 ± 0.1% EDM.
The last step to be carried out is the validation of the models on new experiments. For this, experiments were performed in duplicate under the optimal conditions corresponding to the maximization of Y1 and to the compromise between Y1 and Y2. Student's t-tests were performed to determine if the predicted values given by the models can be considered equivalent to the observed values. The results are shown in Table 5.
The experimental values can be considered equivalent to the predicted values since the p-values are greater than 0.05. Thus, the models developed by RSM are validated and can be used as prediction tools.
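The comparison between a predicted value and duplicate measurements can be sketched as a one-sample, two-sided Student's t-test. With two observations there is a single degree of freedom, and the t distribution then coincides with the standard Cauchy distribution, so the p-value has a closed form. The exact test configuration used in the study is not detailed here, so this set-up is an assumption, and the observed and predicted values are hypothetical.

```python
import math
import statistics

def t_test_duplicate(observed, predicted):
    """Two-sided one-sample t-test of two observations against a predicted value."""
    if len(observed) != 2:
        raise ValueError("this closed-form p-value is valid for duplicates only")
    mean = statistics.mean(observed)
    # Standard error of the mean; identical duplicates would give zero here
    se = statistics.stdev(observed) / math.sqrt(2)
    t = (mean - predicted) / se
    # With 1 degree of freedom, Student's t is the standard Cauchy distribution
    p_value = 1.0 - 2.0 * math.atan(abs(t)) / math.pi
    return t, p_value

# Hypothetical duplicate yields and model prediction (mg/g BDM)
t_stat, p_value = t_test_duplicate([8.7, 8.9], predicted=8.6)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

A p-value above 0.05 means that the difference between the observed mean and the prediction is not statistically significant at the 5% level, which is the criterion applied in the text.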

Conclusions
Concerning the extraction of sinapine from mustard bran, a CCF design was used to optimize the extraction process. A total of 17 experiments including a repetition at the central point constituted the set of the experiments to implement the RSM.
Two prediction models have thus been developed. These models have been validated, making it possible to predict the yield and the purity of sinapine from the operating conditions of the extraction process (extraction temperature, ethanol concentration and solvent-to-matter ratio). An optimal sinapine content of 8.8 ± 0.1 mg/g was obtained at 75°C, 70% ethanol and 10 mL/g BDM whereas an optimal purity of sinapine in the extract (4.2 ± 0.1% EDM) was achieved under different operating conditions (75°C, 100% ethanol and 10 mL/g BDM).
To stay as close as possible to the two optima, the MODDE software determined that the most appropriate operating conditions were 75°C, 83% ethanol and 10 mL/g BDM. The loss in yield and purity remains low since a sinapine yield of 8.0 ± 0.1 mg/g and a purity of 4.0 ± 0.1% EDM are obtained.

Figure 4. 3D response surfaces for Y1 and Y2 at a solvent-to-matter ratio of 10 mL/g BDM.
Table 5. Validation of the models by performing Student's t-tests between predicted and observed values.

The use of rigorous mathematical tools for optimization in process engineering remains under-exploited as we have shown for the extraction of phenolic compounds from Brassica. To remedy this, a generalization of the learning and use of experimental designs in universities and in the research community should be put in place. This is to encourage experimenters to optimize their process in a structured way rather than using OFAT approaches which seem easy to understand at first glance, but which may prevent the full exploitation of the information provided by the experiments. The case study, presented here, illustrated the potential in terms of process optimization using RSM.