A Bayesian Multiple-Trait and Multiple-Environment Model Using the Matrix Normal Distribution

Osval A. Montesinos-López; Abelardo Montesinos-López; José
Cricelio Montesinos-López; José Crossa; Francisco Javier Luna-
Vázquez; Josafhat Salinas-Ruiz

doi:10.5772/intechopen.71521

Abstract

Genomic selection (GS) is playing a major role in plant breeding for the selection of candidate individuals (animal or plants) early in time. However, for improving GS better statistical models are required. For this reason, in this chapter book we provide an improved version of the Bayesian multiple-trait and multiple-environment (BMTME) model of Montesinos-López et al. that takes into account the correlation between traits (genetic and residual) and between environments since allows general covariance’s matrices. This improved version of the BMTME model was derived using the matrix normal distribution that allows a more easy derivation of all full conditional distributions required, allows a more efficient model in terms of time of implementation. We tested the proposed model using simulated and real data sets. According to our results we have elements to conclude that this model improved considerably in terms of time of implementation and it is better than a Bayesian multiple-trait, multiple-environment model that not take into account general covariance structure for covariance’s of the traits and environments.

Keywords

genomic selection
multiple-trait and multiple-environment
Bayesian
general covariance’s matrices

Author Information

Show +

Osval A. Montesinos-López*
- Facultad de Telemática, Universidad de Colima, México
Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, México
José Cricelio Montesinos-López
- Centro de Investigación en Matemáticas (CIMAT), México
José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), México
Francisco Javier Luna-Vázquez
- Facultad de Telemática, Universidad de Colima, México
Josafhat Salinas-Ruiz
- Colegio de Postgraduados, Campus Córdoba,, México

*Address all correspondence to: oamontes1@ucol.mx

1. Introduction

Genomic selection is revolutionizing plant breeding, since allows the selection of candidate individuals (animal or plants) early in time. However, the success of genomic selection is linked directly to the use of statistical models, since the process of selection of candidate individuals is done using statistical models. However, most of the models currently used in genomic selection are univariate models mostly for continuous phenotypes, which not exploit the existing correlation between traits when the selection of individuals (genotypes or animals) is done with the purpose to improve simultaneously multiple-traits. The advantage of jointly modeling multiple-traits compared to analyzing each trait separately, is that the inference process appropriately accounts for the correlation among the traits, which helps to increase prediction accuracy, statistical power, parameter estimation accuracy, and reduce trait selection bias [1, 2]. For this reason, there is a great interest of plant and animal scientist to develop appropriate genomic selection models for multiple-traits and multiple-environments to take advantage of this correlation and to improve the prediction accuracy in the selection of candidate individuals.

For this reason, in this chapter we propose an improved version of the Bayesian multiple-trait, multiple-environment (BMTME) model proposed by Montesinos-López et al. [3] that is appropriate for correlated multiple-traits and multiple-environments but instead of building this model using the multivariate normal distribution we propose to build it using the matrix normal distribution which should avoid that the number of rows of the datasets grows proportional to the number of traits under study.

Also, the BMTME model was improved adding a general covariance structure for the genetic covariance of environments in place of assuming a diagonal matrix as the original BMTME model. Additionally, in this chapter we compare the improved model in terms of prediction accuracy and time of implementation with the original BMTME model of Montesinos-López et al. [3] and with a multiple-trait and multiple-environment model where it is ignored the correlation between traits and between environments. Our hypothesis is that the improved model should be similar in terms of prediction accuracy, but considerably faster in terms of time of implementation with regard to the original BMTME of Montesinos-López et al. [3] and a little better in terms of prediction accuracy that a multiple-trait and multiple-environment model that ignore the correlation between traits and environments. Also, we propose to implement the proposed model with simulated and real data sets. Our results suggest that the construction and implementation of the proposed model should be of great help for breeding scientist and programs since will help to select candidate genotypes early in time with more accuracy.

2. Material and methods

2.1. Matrix normal distribution

The matrix normal distribution is a probability distribution that is a generalization of the multivariate normal distribution to matrix-valued random variables. According with Rowe [4] the n × p matrix normal distribution can be derived as a special case of the np-variate Multivariate Normal distribution when the covariance matrix is separable. A np-dimensional vector x is distributed according to multivariate normal distribution with np-dimensional mean μ and np × np covariance matrix Ω if its probability density function is given by

Pxμ,Ω=2π−np2Ω−12e−12x−μTΩ−1x−μE1

When the covariance matrix Ω is separable, that is, is one of the form Ω = Σ ⊗ Φ, where ⊗ is the Kronecker product which multiplies every entry of its first matrix argument by its entire second matrix argument, Eq. (1) becomes

pXM,Σ,Φ=2π−np2Σ−n2Φ−p2e−12trΣ−1X−MTΦ−1X−ME2

upon using the following matrix identities

Ω=Σ⊗Φ=ΣΦE3

and

x−μTΣ⊗Φ−1x−μ=trΣ−1X−MTΦ−1X−ME4

where X and M are matrix of dimension n × p such that x = vec(X) and μ = vec(M), with vec is the vec operator that stacks the columns of its matrix argument from left to right into a single vector, tr(.) is the trace operator which gives the sum of the diagonal elements of a square matrix argument.

Then, according with Rowe [4] the density function given in Eq. (2) correspond to a random variable that follows a n × p matrix normal distribution and it is denoted as

X∣M,Σ,Φ∼MNn×p(M,Φ,Σ)E5

where (M, Σ, Φ) parametrize the above distribution with M∈R^n × p, and Σ and Φ are positive defined matrix of dimension n × n and p × p, respectively. The matrices Σ and Φ are commonly referred to as the within and between covariance matrices. Sometimes they are referred to as the right and left covariance matrices [4].

Some useful properties of the matrix normal distribution are: the mean and model is equal to E(X| M, Σ, Φ) = M and the variance var(vec(X)| M, Σ, Φ) = Σ ⊗ Φ, which can be found by integration and differentiation. Since X follows a Matrix Normal distribution, the conditional and marginal distributions of any row or column subset are Multivariate Normal distributions [4].

2.2. Univariate model with genotype by environment interaction (M1)

First, for each trait we considered the following univariate linear mixed model:

yij=Ei+gj+gEij+eijE6

were y_ij represents the normal response from the jth line in the ith environment (i = 1, 2, …, I, j = 1, 2, …, J). For illustration purposes, we will use I = 3. E_i represents the fixed effect of the ith environment and is assumed as a fixed effect, g_j represents the random effect of the genomic effect of jth line, with g=gj,…,gJT∼N0,σ12Gg,Gg is of order J × J and represents the Genomic Relationship Matrix (GRM) and is calculated using the VanRaden method [5] as Gg=WWTp, with W as the matrix of marker of order J × p. gE_ij is the random interaction term between the genomic effect of the jth line and the ith environment where gE=gE11…gEIJT∼N0,σ22II⊗G and e_ij is a random error term associated with the jth line in the ith environment distributed as N(0, σ²). As previously mentioned, this model was used for each of the l = 1, …, L traits, where L denotes the number of traits under study.

2.3. Multivariate correlated model with multiple-trait and multiple-environment (M2)

To account for the correlation between traits, all of the L traits given in Eq. (6) should be jointly modeled in a whole multiple-trait, multiple-environment mixed model as the following:

Y=Xβ+Z1b1+Z2b2+eE7

where Y is of order n × L, with n = I × J, X is of order n × I, β is of order I × L and contains the beta coefficients of the environment by trait combinations, Z₁ is of order n × J, Z₂ is of order n × IJ, b₁ is of order J × L and follows a normal matrix distribution MN_J × L(0, G_g, Σ_t), b₂ is of order IJ × L with a normal matrix distribution b₂ ∼ MN_IJ × L(0, Σ_E ⊗ G_g, Σ_t) and e is of order n × L with a normal matrix distribution e ∼ MN_n × L(0, I_n, R_e), where Σ_t is the genetic covariance matrix between traits and it is assumed unstructured (or general), ⊗ denotes a Kronecker product, Σ_E is assumed as a general matrix of order I × I, R_e is the residual general covariance matrix between traits. It is important to point out that the trait × environment (T × E) interaction term is included in the fixed effects, while the trait × genotype (T × G) interaction term is included in the random effect b₁ and the three-way (T × G × E) interaction term is included in b₂.

2.4. Joint posterior density and prior specification

In this section, we provide the joint posterior density and prior specification for the improved BMTME model. Assuming independent prior distributions for β, Σ_t, Σ_E, and R_e, the joint posterior density of the parameter vector becomes:

P(β,b1,b2,Σt,ΣE,Re)  ∝P(Y|β,b1,b2,Re)P(β) P(b1|Σt)P(b2|Σt,ΣE)P(Σt)P(ΣE)P(Re)E8

where P(β), P(Σ_t), P(Σ_E) and P(R_e) denote the density prior distributions of β, Σ_t, Σ_E, and R_e, respectively. Specifically, we are assuming an Inverse-Wishart (IW) for Ω_v with shape parameter κ and scale matrix parameter B, and is denoted by Ω_v ∼ IW(κ, B), with density function given by PΩv∝Bκ2Ωv−κ+p+12exp−12trBΩv−1,κ>0,B,Ωv both are positive definite matrices. For the remaining parameters we are assuming the following prior distributions: ∼ MN_n × p(β₀, I_I, I_L), b₁|Σ_t ∼ MN_J × L(0, G_g, Σ_t), Σ_t ∼ IW(ν_t + L − 1, S_t), b₂|Σ_t, Σ_E ∼ MN_IJ × L(0, Σ_E ⊗ G_g, Σ_t), Σ_E ∼ IW(ν_E + I − 1, S_E), and R_e ∼ IW(ν_e + L − 1, S_e). Next we combine the joint posterior density of the parameter vector with the priors to obtain the full conditional distribution for parameters β, b₁, b₂, Σ_t, R_e. All full conditionals, as well as details of their derivations, are given in Appendix A.

2.5. Gibbs sampler

In order to produce posterior means for all relevant model parameters, below we outline the exact Gibbs sampler procedure that we proposed for estimating the parameters of interest. The ordering of draws is somewhat arbitrary; however, we suggest the following order:

Step 1. Simulate β according to the normal distribution given in Appendix A (A.1).

Step 2. Simulate b_h for h = 1, 2, according to the normal distribution given in Appendix A (A.2 and A.3).

Step 3. Simulate Σ_t according to the IW distribution given in Appendix A (A.4).

Step 4. Simulate Σ_E according to the IW distribution given in Appendix A (A.5).

Step 5. Simulate R_e according to the IW distribution given in Appendix A (A.6).

Step 6. Return to step 1 or terminate when chain length is adequate to meet convergence diagnostics.

2.6. Multivariate uncorrelated model with multiple-trait and multiple-environment (M3)

To compare the model given in Eq. (7) we considered also model M3 (Eq. 6) that consists of using the following multi-trait, multi-environment model that ignore the correlation between traits and between environments:

yijl=Ei+gj+Tl+gEij+TEil+gTjl+gETijl+eijlE9

where y_ijl represents the normal response from the jth line in the ith environment for trait l (i = 1, 2, …, I, j = 1, 2, …, J, l = 1, …, L). T_l represents the fixed effect of the lth trait, TE_il is the fixed interaction term between the lth trait and the ith environment, gT_jl represents the random effect of the interaction of genotype j and the lth trait, with gT=gT11…gTJLT∼N0,σ112G⊗IL, gET_ijl is the three-way interaction of genotype j, the ith environment and the lth trait, with gET=gET111…gETIJLT∼N0,σ222II⊗G⊗IL and e_ijl is a random error term associated with the jth line in the ith environment distributed as N(0, σ²).

2.7. Experimental data sets

2.7.1. Simulate data sets

For testing the proposed models and methods we simulated multiple-trait and multiple-environment data using model in Eq. (7). We studied six scenarios depending of the parameters used. For the first scenario (S1) we used the following parameters: three environments, three traits, 80 genotypes, 1 replication for environment-trait-genotype combination. We assumed that β^T = [15, 12, 7, 14, 10, 9, 13, 11, 8], where the first three beta coefficients belong to traits 1, 2 and 3 in environment 1, the second three values for the three traits in environment 2 and the last three for environment 3. We assumed that the genomic relationship matrix is known and is equal to G_g = 0.3I₈₀ + 0.7J₈₀, where I₈₀ is an identity matrix of order 80 and J₈₀ is a matrix of order 80 × 80 of ones. Therefore, the total number of observations is 3 × 80 × 3 × 1 = 720, that is, 240 for each trait. Since a covariance matrix can be expressed in terms of a correlation matrix (R_r) and a standard deviation matrix (Dr1/2) as:Σr=Dr1/2RrDr1/2, with r = t, E, e, where r = t represent the genetic covariance between traits, r = E represents the genetic covariance matrix between environments and r = e, represents the residual covariance matrix between traits. For the three covariance matrices (r = t, E, e) in this scenario we used R_r = 0.15I₃ + 0.85J₃, where J₃ is a matrix of order 3x3 of ones, andDt1/2=diag(0.9,0.8,0.9), DE1/2=diag(0.5,0.65,0.75) and De1/2=diag(6,0.43,0.33). For the second scenario (S2) we used exactly the same set of parameters defined in S1 except that for the correlation matrix now we assumed that the pair of correlations between traits and between environments is equal to 0.5, that is, R_r = 0.5I₃ + 0.5J₃, while the third scenario (S3) also is exactly as S1 with the exception that R_r = 0.75I₃ + 0.25J₃, that is, the pair of correlations between traits and between environments is equal to 0.25. These three set of correlation matrices given in S1, S2 and S3 were proposed in order to study the performance of the methods proposed in the context of high correlation (S1), medium (S2) and low correlation (S3) between traits (genetic and residual) and between environments. Other 3 scenarios were studied: scenario 4 (S4) is exactly as scenario S1 but in place of 80 lines were used 100 lines, scenario 5 (S5) was exactly as scenario S2 but with 100 lines and the last scenario (S6) was exactly as scenario S3 but using 100 lines in place of 80.

2.7.2. Real wheat data set

Here, we present the information on the first real data set used for implementing the proposed models. This real data set composed of 250 wheat lines that were extracted from a large set of 39 yield trials grown during the 2013–2014 crop season in Ciudad Obregon, Sonora, Mexico [6]. The trials under study were days to heading (DTHD), grain yield (GRYLD), plant height (PTHT) and the green normalized difference vegetation index (GNDVI), each of these traits were evaluated in three environments (Bed2IR, Bed5IR and Drip). The marker information used after editing was 12,083 markers. This data set was also used by Montesinos-López et al. [3] for this reason those interested in more details of this data set see this publication.

2.7.3. Real maize data set

The second real data set used for implementing the proposed models is composed of 309 double-haploid maize lines. Traits available in this data set include grain yield (Yield), anthesis-silking interval (ASI), and plant height (PH); each of these traits were evaluated in three optimum rainfed environments (EBU, KAT, and KTI). The marker information used after editing was 12,083 markers. Also, this data set was also used by Montesinos-López et al. [3] for this reason those interested in more details of this data set see this publication.

2.8. Assessing prediction accuracy

For assessing prediction accuracy for the simulated and real data sets a 20 training (trn)-testing (tst) random partitions were implemented under a cross-validation that mimicked a situation where lines were evaluated in some environments for the traits of interest; however, some lines were missing in all traits in the other environments, this cross-validation scheme is called CV1. Under this cross-validation, we assigned 80% of the lines to the trn set and the remaining 20% to the tst set. We used the Pearson correlation and mean square error of prediction (MSEP) to compare the predictive performance of the proposed models. Models with Pearson correlation closet to one indicated better predictions, while under the MSEP values closed to zero are better in terms of prediction accuracy. It is important to point out that model M2 was implemented with R code done for the authors implementing the Gibbs sampler given above for this model, while model M3 was implemented in the R package BGLR [7].

3. Results

The results are presented in two sections. The first section presents the results of the simulated data set, while the second the results with the real data sets.

3.1. Simulated data sets

In Table 1, under scenario S1 we can observe that the proposed model M2 was the best in terms of prediction accuracy (with Pearson correlation and MSEP) since in the 9 trait-environment combinations model M2 (improved BMTME model) was better than model M3 (uncorrelated multiple-trait multiple-environment). In average in terms of Pearson correlation the model M2 was better than the model M3 by 8.72%, while in terms of MSEP model M2 was better than model M3 in average by 6.24%. Under scenario S2, in terms of Pearson correlation model M2 was better in 7 out of 9 trait-environment combinations and in 6 out of 9 trait-environment combination in terms of MSEP. In terms of Pearson correlation model M2 was better than M3 in average by 7.76%, while in terms of MSEP was better by 2.27% in average (Table 1). While under scenario S3 also model M2 was better than model M3, since in 7 out of 9 trait-environment was the best, while under MSEP model M2 was better than M3 in 5 out of 9 trait-environment combination, however, in average model M2 was better than model M3 by 3.98 and 1.028% in terms of Pearson correlation and MSEP, respectively (Table 1).

Scenario	Trait_Env	M2				M3
Scenario	Trait_Env	CorP	SE	MSEP	SE	CorP	SE	MSEP	SE
	11	0.401	0.052	0.693	0.050	0.375	0.048	0.723	0.050
	21	0.481	0.044	0.561	0.033	0.434	0.044	0.605	0.035
	31	0.563	0.042	0.494	0.033	0.530	0.043	0.522	0.033
	12	0.408	0.037	0.658	0.045	0.343	0.041	0.715	0.046
S1	22	0.485	0.049	0.648	0.049	0.393	0.053	0.728	0.056
	32	0.506	0.042	0.580	0.049	0.420	0.049	0.642	0.049
	13	0.595	0.030	0.528	0.033	0.570	0.034	0.535	0.034
	23	0.473	0.043	0.565	0.036	0.461	0.039	0.582	0.039
	33	0.629	0.031	0.424	0.027	0.619	0.036	0.441	0.033
	Average	0.505	0.041	0.572	0.040	0.461	0.043	0.610	0.042
	11	0.349	0.054	0.748	0.057	0.302	0.052	0.750	0.055
	21	0.486	0.044	0.571	0.030	0.447	0.042	0.603	0.031
	31	0.503	0.044	0.588	0.033	0.508	0.045	0.579	0.031
S2	12	0.384	0.037	0.590	0.045	0.335	0.038	0.602	0.040
	22	0.476	0.049	0.664	0.047	0.407	0.053	0.726	0.057
	32	0.415	0.044	0.626	0.059	0.368	0.048	0.651	0.061
	13	0.599	0.028	0.548	0.043	0.566	0.030	0.537	0.037
	23	0.373	0.051	0.719	0.058	0.374	0.048	0.723	0.057
	33	0.565	0.034	0.530	0.043	0.584	0.037	0.513	0.047
	Average	0.448	0.044	0.632	0.046	0.413	0.045	0.646	0.046
	11	0.326	0.054	0.764	0.055	0.297	0.053	0.777	0.055
	21	0.480	0.043	0.588	0.030	0.443	0.041	0.616	0.030
	31	0.446	0.045	0.657	0.035	0.465	0.047	0.629	0.030
S3	12	0.404	0.038	0.545	0.045	0.391	0.038	0.553	0.039
	22	0.470	0.047	0.661	0.045	0.402	0.050	0.721	0.055
	32	0.343	0.045	0.630	0.062	0.311	0.048	0.648	0.064
	13	0.567	0.035	0.598	0.048	0.552	0.030	0.592	0.042
	23	0.327	0.054	0.832	0.067	0.324	0.052	0.831	0.067
	33	0.498	0.034	0.615	0.055	0.522	0.036	0.584	0.056
	Average	0.429	0.044	0.654	0.049	0.412	0.044	0.661	0.049

Table 1.

Comparison in terms of prediction accuracy of models M2 and M3 under scenarios S1, S2 and S3.

CorP: average of Pearson correlation; SE: standard error, MSEP: mean square error of prediction. S1: scenario with high correlation (0.85); S2: scenario with medium correlation (0.5); S3: scenario with low correlation (0.25). The values of this table correspond to the simulations done with 80 lines in each environment. In bold are the best predictions of each row (Trait-Env).

In Table 2, under scenario S4 model M2 was the best in terms of prediction accuracy (with Pearson correlation and MSEP) since in the 9 trait-environment combinations was better than model M3. In average in terms of Pearson correlation and MSEP model M2 was better than model M3 by 4.4 and 4.1%, respectively. Also, under scenario S5, in terms of Pearson correlation and MSEP, model M2 was better than model M3 in 7 out of 9 and in 6 out of 9 trait-environment combinations, respectively. Model M2 was better than M3 in average by 1.6% in terms of Pearson correlation and by 1.2% in average in terms of MSEP (Table 2). While under scenario S6 also model M2 was better than model M3 in terms of Pearson correlation, since in 7 out of 9 trait-environment was the best, while under MSEP model M2 was better than M3 in 5 out of 9 trait-environment combination, however, in average model M2 was better than model M3 by 1.6 and 1.02% in terms of Pearson correlation and MSEP, respectively (Table 2).

Scenario	Trait_Env	M2				M3
Scenario	Trait_Env	CorP	SE	MSEP	SE	CorP	SE	MSEP	SE
	11	0.495	0.042	0.782	0.052	0.483	0.043	0.800	0.056
	21	0.569	0.028	0.693	0.050	0.534	0.035	0.731	0.055
	31	0.621	0.028	0.589	0.038	0.596	0.033	0.619	0.044
	12	0.467	0.043	0.814	0.044	0.449	0.044	0.850	0.043
S4	22	0.471	0.040	0.689	0.040	0.440	0.041	0.740	0.046
	32	0.572	0.034	0.548	0.035	0.534	0.035	0.597	0.034
	13	0.498	0.040	0.975	0.060	0.486	0.035	0.984	0.060
	23	0.535	0.035	0.812	0.051	0.520	0.032	0.824	0.054
	33	0.631	0.034	0.638	0.043	0.604	0.029	0.674	0.044
	Average	0.540	0.036	0.727	0.046	0.516	0.036	0.758	0.049
	11	0.403	0.052	0.805	0.055	0.405	0.050	0.807	0.056
	21	0.537	0.029	0.666	0.047	0.510	0.035	0.688	0.049
	31	0.567	0.031	0.595	0.040	0.555	0.032	0.608	0.042
S5	12	0.399	0.051	0.899	0.051	0.397	0.053	0.907	0.053
	22	0.432	0.041	0.722	0.043	0.406	0.043	0.749	0.048
	32	0.509	0.034	0.554	0.037	0.503	0.035	0.564	0.035
	13	0.416	0.043	1.025	0.056	0.413	0.040	1.024	0.055
	23	0.487	0.033	0.791	0.042	0.488	0.034	0.784	0.045
	33	0.588	0.037	0.625	0.040	0.589	0.032	0.630	0.038
	Average	0.482	0.039	0.742	0.046	0.474	0.039	0.751	0.047
	11	0.370	0.054	0.798	0.057	0.369	0.052	0.802	0.057
	21	0.512	0.028	0.635	0.043	0.485	0.033	0.654	0.043
	31	0.521	0.034	0.587	0.040	0.511	0.034	0.596	0.041
S6	12	0.367	0.052	0.945	0.057	0.364	0.055	0.948	0.060
	22	0.412	0.040	0.759	0.045	0.382	0.042	0.776	0.047
	32	0.449	0.034	0.576	0.036	0.466	0.034	0.568	0.034
	13	0.379	0.045	1.013	0.053	0.374	0.042	1.016	0.051
	23	0.462	0.032	0.759	0.038	0.463	0.033	0.751	0.039
	33	0.542	0.039	0.618	0.040	0.558	0.035	0.610	0.036
	Average	0.446	0.040	0.743	0.045	0.441	0.040	0.747	0.045

Table 2.

Comparison in terms of prediction accuracy of models M2 and M3 under scenarios S4, S5 and S6.

CorP: average of Pearson correlation obtained across all trait-environment combination; SE: standard error; MSEP: mean square error of prediction. S4: scenario with high correlation (0.85); S5 the scenario with medium correlation (0.5); S6: scenario with low correlation (0.25). The values of this table correspond to the simulations done with 100 lines in each environment. In bold are the best predictions of each row (Trait-Env).

3.2. Real data sets

In Table 3 we can observe that in the wheat data set the best predictions were observed under the proposed improved BMTME model (M2), since in all trait-environment combinations was better model M2 in terms of Pearson correlation and in 10 out of 12 was the better in terms of MSEP than model M3 (that ignore the correlation between traits and between environments). However, in the maize data set the best predictions were observed under model M3, since in 5 out of 9 trait-environment combinations this model was superior to model M2, however there is not a great superiority of the results under model M3 regarded to model M2. This results obtained in the maize data set are in agreement with the correlation study performed since this data set has a very low genetic correlation between traits and between environments.

Data set	Trait_Env	M2				M3
Data set	Trait_Env	CorP	SE	MSEP	SE	CorP	SE	MSEP	SE
	DTHD_Bed2IR	0.876	0.008	8.117	0.692	0.875	0.009	10.636	0.882
	GNDVI_Bed2IR	0.848	0.008	0.000	0.000	0.009	0.022	0.103	0.006
	GRYLD_Bed2IR	0.639	0.014	0.055	0.002	0.463	0.015	0.161	0.007
	PTHT_Bed2IR	0.658	0.014	22.527	0.841	0.566	0.020	25.798	0.895
	DTHD_Bed5IR	0.873	0.007	13.074	0.733	0.845	0.010	15.312	0.508
Wheat	GNDVI_Bed5IR	0.758	0.019	0.000	0.000	0.496	0.023	0.219	0.011
	GRYLD_Bed5IR	0.178	0.023	0.251	0.008	0.175	0.020	0.336	0.014
	PTHT_Bed5IR	0.076	0.016	24.064	0.620	0.245	0.023	20.831	0.721
	DTHD_Drip	0.915	0.005	4.514	0.201	0.895	0.006	3.321	0.224
	GNDVI_Drip	0.681	0.012	0.000	0.000	−0.262	0.022	0.123	0.008
	GRYLD_Drip	0.653	0.011	0.126	0.005	0.638	0.011	0.144	0.005
	PTHT_Drip	0.658	0.019	21.565	0.531	0.602	0.012	21.306	0.728
	Average	0.651	0.013	7.858	0.303	0.462	0.016	8.191	0.334
	Yield_EBU	0.320	0.019	0.789	0.018	0.365	0.018	0.731	0.017
	ASI_EBU	0.501	0.016	0.396	0.012	0.510	0.015	0.391	0.012
	PH_EBU	0.308	0.025	0.015	0.003	0.305	0.011	0.010	0.000
	Yield_KAK	0.402	0.022	0.446	0.020	0.416	0.020	0.438	0.019
Maize	ASI_KAK	0.389	0.015	0.936	0.043	0.423	0.018	0.822	0.029
	PH_KAK	0.462	0.025	0.011	0.001	0.369	0.022	0.013	0.001
	Yield_KTI	0.276	0.015	0.848	0.022	0.318	0.018	0.825	0.024
	ASI_KTI	0.290	0.018	0.607	0.018	0.280	0.020	0.614	0.019
	PH_KTI	0.460	0.017	0.019	0.001	0.443	0.017	0.020	0.001
	Average	0.379	0.019	0.452	0.015	0.381	0.018	0.429	0.014

Table 3.

Comparison in terms of prediction accuracy of models M2 and M3 using the two real data sets.

CorP: average of Pearson correlation obtained across all trait-environment combination; SE: standard error; MSEP: mean square error of prediction. Trait_Env means trait-environment combination. In bold are the best predictions of each row (Trait-Env).

According to the results observed with the simulated data sets (Tables 1 and 2) and real data sets (Table 3) there is evidence that the larger the correlation between traits (genetic and residual) and environments (genetic) the better the performance of the proposed improved BMTME (M2) model with regard to the uncorrelated multiple-trait and multiple-environment model (M3), which means that when the there is considerable correlation between traits and between environments this help to increase prediction accuracy.

4. Conclusions

In this paper we proposed an improved version of the Bayesian multiple-trait multiple-environment (BMTME) model of Montesinos-López et al. [3] that was derived using the matrix normal distribution. The advantage of the proposed model (M2) is that it is more efficient in terms of time of implementation since this improved version works using as rows the genotypes by environment combinations in place of using as rows the combination of traits, genotypes and environments which allows a more practical implementation of the Gibbs sampler in terms of time of implementation. Another, improvement of the BMTME model is that now allows unstructured covariance matrix for modeling environments in place of only a diagonal matrix as the original BMTME model. We compared the extended model (M2) with an uncorrelated multiple-trait and multiple-environment model (M3) that ignores the general correlation between traits (genetic and residual) and between environments and we found that the proposed improved BMTME model (M2) outperforms model (M3) in all the scenarios under study with simulation, however the larger the correlation between traits and between environments the better the performance in terms of prediction accuracy of the improved BMTME model. Additionally, we provided all full conditionals required for the implementation of the improved BMTME model (see Gibbs sampler section and Appendix A). However, we are aware that more empirical evidence with real and simulated data is needed to support our findings, and for this reason, we encourage researcher to implement our proposed improved model and compare with models that ignore the correlation between traits and between environments like the model M3 given in Eq. (8).

Full conditional distribution for vec(β)

PvecβELSE∝exp{−12trRe−1Y−Xβ−Z1b1−Z2b2TInY−Xβ−Z1b1−Z2b2−12trIL−1β−β0TII−1β−β0}∝exp{−12vecY−Xβ−Z1b1−Z2b2TRe−1⊗InvecY−Xβ−Z1b1−Z2b2−12vecβ−β0TIL−1⊗II−1β−β0}∝exp{ −12[vec(Y)−(IL⊗X)vec(β)−vec(Z1b1) −vec(Z2b2)]T[ Re−1⊗In ][ vec(Y)−(IL⊗X)vec(β)−vec(Z1b1)−vec(Z2b2 ]−12vecβ−vecβ0TIL−1⊗II−1vecβ−vecβ0}∝exp{−12[vecβ−vec(β∼)]TΣ∼β−1[vecβ−vec(β∼)]}∝N(vec(β∼),Σ∼β)EA1

where Σ∼β=IL−1⊗II−1+Re−1⊗XTX−1, vecβ∼=Σ∼βIL−1⊗II−1vecβ0+Re−1⊗XT vecY−vecZ1b1−vecZ2b2}.

In the simplification of some calculations the following properties were involved: tr(AB) = vec(A^T)^Tvec(B) = vec(B)^Tvec(A^T), and vec (AXB) = (B^T ⊗ A)vec(X).

Full conditional for vec(b₁)

P(vec(b1)|ELSE)∝exp{ −12[vec(Y)−vec(Xβ)−(IL⊗Z1)vec(b1)−vec(Z2b2)]T [ Re−1⊗In ][ vec(Y)−vec(Xβ)−(IL⊗Z1)vec(b1)−vec(Z2b2) ]−12[ vec(b1)T[ Σt−1⊗Gg−1 ] ]vec(b1) }∝exp{ −12[ vec(b1)−vec(b˜1) ]TΣ˜b1−1[ vec(b1)−vec(b˜1) ] }∝N(vec(b˜1),Σ˜b1)EA2

where Σ∼b1=Σt−1⊗Gg−1+Re−1⊗Z1TZ1−1 and vecb∼1=Σ∼b1Re−1⊗Z1TvecY−vecXβ−vecZ2b2.

Full conditional for vec(b₂)

P(vec(b2)|ELSE)∝exp{ −12 [ vec(Y)−vec(Xβ)−vec(Z1b1) −(IL⊗Z2)vec(b2) ]T(Re−1⊗In)[ vec(Y)−vec(Xβ)−vec(Z1b1)−(IL⊗Z2)vec(b2) ]−12vec(b2)T(Σt−1⊗ΣE−1⊗Gg−1)vec(b2) }∝exp{ −12[ vec(b2)−vec(b˜2) ]TΣ˜b2−1[ vec(b2)−vec(b˜2) ] }∝N(vec(b˜2),Σ˜b2)EA3

where Σ∼b2=Σt−1⊗ΣE−1⊗Gg−1+Re−1⊗Z2TZ2−1 and vecb∼2=Σ∼b2Re−1⊗Z2TvecY−vecXβ−vecZ1b1.

Full conditional for Σ_t

PΣtELSE∝Pb1ΣtPb2ΣtPΣt∝Σt−J2Gg−L2exp−12trb1*TGg−1b1*Σt−1×Σt−IJ2ΣE⊗Gg−L2exp−12trb2*TΣE−1⊗Gg−1b2*Σt−1PΣt∝Σt−J+IJ2exp−12trb1*TGg−1b1*+b2*TΣE−1⊗Gg−1b2*Σt−1Stυt+L−12×Σt−υt+L2exp−12trStΣt−1∝Σt−υt+J+IJ+L−1+12exp−12trb1*TGg−1b1*+b2*TΣE−1⊗Gg−1b2*+StΣt−1∝IWυt+J+L+IJ−1b1*TGg−1b1*+b2*TΣE−1⊗Gg−1b2*+StEA4

Full conditional for Σ_E

PΣEELSE∝Pb2ΣEPΣE∝ΣE−JL2ΣT⊗Gg−I2exp−12trb2*TΣt−1⊗Gg−1b2*ΣE−1SEυE+I−12×ΣE−υE+I2exp−12trSEΣE−1∝ΣE−υE+I+JL2exp−12trb2*TΣt−1⊗Gg−1b2*+SEΣE−1SEυE+I−12∝IWυE+JL+I−1b2*TΣt−1⊗Gg−1b2*+SEEA5

Full conditional for R_e

PReELSE∝PYβ,b1,b2,RePRe∝Re−n2exp−12trY−Xβ−Z1b1−Z2b2TInY−Xβ−Z1b1−Z2b2Re−1×Seυe+L−12Re−υe+L2exp−12trSeRe−1∝Re−υe+n+L−1+12exp−12trY−Xβ−Z1b1−Z2b2TY−Xβ−Z1b1−Z2b2+SeRe−1∝IWυe+n+L−1Y−Xβ−Z1b1−Z2b2+SeEA6

where ‖(Y − Xβ − Z₁b₁ − Z₂b₂)‖ = (Y − Xβ − Z₁b₁ − Z₂b₂)^T(Y − Xβ − Z₁b₁ − Z₂b₂) .

References

1. Henderson CR, Quaas RL. Multiple trait evaluation using relatives' records. Journal of Animal Science. 1976;43(6):1188-1197
2. Pollak EJ, Van der Werf J, Quaas RL. Selection bias and multiple trait evaluation. Journal of Dairy Science. 1984;67(7):1590-1595
3. Montesinos-López OA, Montesinos-López A, Crossa J, Toledo F, Pérez-Hernández O, Eskridge KM, Rutkoski JA. Genomic Bayesian multi-trait and multi-environment model. G3: Genes|Genomes|Genetics. 2016;6(9):2725-2744
4. Rowe DB. Multivariate Bayesian Statistics. Models for Source Separation and Signal Unmixing. Boca Raton, FL: Chapman & Hall/CRC; 2003
5. VanRaden PM. Efficient methods to compute genomic predictions. Journal of Dairy Science. 2008;91(11):4414-4423
6. Rutkoski J, Poland J, Mondal S, Autrique E, González-Pérez L, Crossa J, Reynolds M, Singh R. Predictor traits from high-throughput phenotyping improve accuracy of pedigree and genomic selection for grain yield in wheat. G3: Genes|Genomes|Genetics. 2016;6(3):2799-2808
7. de los Campos G, Pérez-Rodríguez P. Bayesian Generalized Linear Regression. R Package Version 1.0.4 http://CRAN.R-project.org/package=BGLR. 2014

[1] 1. Henderson CR, Quaas RL. Multiple trait evaluation using relatives' records. Journal of Animal Science. 1976;43(6):1188-1197

[2] 2. Pollak EJ, Van der Werf J, Quaas RL. Selection bias and multiple trait evaluation. Journal of Dairy Science. 1984;67(7):1590-1595

[3] 3. Montesinos-López OA, Montesinos-López A, Crossa J, Toledo F, Pérez-Hernández O, Eskridge KM, Rutkoski JA. Genomic Bayesian multi-trait and multi-environment model. G3: Genes|Genomes|Genetics. 2016;6(9):2725-2744

[4] 4. Rowe DB. Multivariate Bayesian Statistics. Models for Source Separation and Signal Unmixing. Boca Raton, FL: Chapman & Hall/CRC; 2003

[5] 5. VanRaden PM. Efficient methods to compute genomic predictions. Journal of Dairy Science. 2008;91(11):4414-4423

[6] 6. Rutkoski J, Poland J, Mondal S, Autrique E, González-Pérez L, Crossa J, Reynolds M, Singh R. Predictor traits from high-throughput phenotyping improve accuracy of pedigree and genomic selection for grain yield in wheat. G3: Genes|Genomes|Genetics. 2016;6(3):2799-2808

[7] 7. de los Campos G, Pérez-Rodríguez P. Bayesian Generalized Linear Regression. R Package Version 1.0.4 http://CRAN.R-project.org/package=BGLR. 2014

A Bayesian Multiple-Trait and Multiple-Environment Model Using the Matrix Normal Distribution

Physical Methods for Stimulation of Plant and Mushroom Development

Abstract

Keywords

Author Information

Osval A. Montesinos-López*

Abelardo Montesinos-López

José Cricelio Montesinos-López

José Crossa

Francisco Javier Luna-Vázquez

Josafhat Salinas-Ruiz

1. Introduction

2. Material and methods

2.1. Matrix normal distribution

2.2. Univariate model with genotype by environment interaction (M1)

2.3. Multivariate correlated model with multiple-trait and multiple-environment (M2)

2.4. Joint posterior density and prior specification

2.5. Gibbs sampler

2.6. Multivariate uncorrelated model with multiple-trait and multiple-environment (M3)

2.7. Experimental data sets

2.7.1. Simulate data sets

2.7.2. Real wheat data set

2.7.3. Real maize data set

2.8. Assessing prediction accuracy

3. Results

3.1. Simulated data sets

Table 1.

Table 2.

3.2. Real data sets

Table 3.

4. Conclusions

References

Branch Formation and Yield by Flower Bud or Shoot Removal in Tomato

A Bayesian Multiple-Trait and Multiple-Environment Model Using the Matrix Normal Distribution

Physical Methods for Stimulation of Plant and Mushroom Development

Abstract

Keywords

Author Information

Osval A. Montesinos-López*

Abelardo Montesinos-López

José Cricelio Montesinos-López

José Crossa

Francisco Javier Luna-Vázquez

Josafhat Salinas-Ruiz

1. Introduction

2. Material and methods

2.1. Matrix normal distribution

2.2. Univariate model with genotype by environment interaction (M1)

2.3. Multivariate correlated model with multiple-trait and multiple-environment (M2)

2.4. Joint posterior density and prior specification

2.5. Gibbs sampler

2.6. Multivariate uncorrelated model with multiple-trait and multiple-environment (M3)

2.7. Experimental data sets

2.7.1. Simulate data sets

2.7.2. Real wheat data set

2.7.3. Real maize data set

2.8. Assessing prediction accuracy

3. Results

3.1. Simulated data sets

Table 1.

Table 2.

3.2. Real data sets

Table 3.

4. Conclusions

References

Continue reading from the same book

Physical Methods for Stimulation of Plant and Mushroom Development