Most traits important to agriculture, biology, and biomedicine are complex traits, determined by both genetic and environmental factors. The complex traits that change their phenotypes over different stages of development are called dynamic traits. Traditional quantitative trait loci (QTLs) mapping approaches ignore the dynamic changes of complex traits. Functional mapping, as a powerful statistical tool, can not only map QTLs that control the developmental pattern and process of complex traits, but also describe the dynamic changes of complex traits. In this study, we used functional mapping to identify those QTLs that affect height growth in 10th generation recombinant inbred lines derived from two different Arabidopsis thaliana accessions. Functional mapping identified 48 QTLs that are related to height traits. The growth curves of different genotypes can be drawn for each significant locus. By GO gene function annotations, we found that these QTLs detected are associated with the synthesis of biological macromolecules and the regulation of biological functions. Our findings provide unique insights into the genetic control of height growth of A. thaliana and will provide a theoretical basis for the study of complex traits.
- complex traits
- functional mapping
- Arabidopsis thaliana
Complex traits are genetic traits controlled by multiple genes. They are sensitive to environmental changes and easily affected by the environment . The phenotypic expression of complex traits in individuals within a population displays a continuous variation and generally a normal distribution. Most important biochemical, medical, and agronomic traits and the majority of human diseases are complex traits that are controlled by interconnected genetic networks and environmental factors. In these gene networks, the effect of each gene is small . The characteristics also change at different developmental stages and demonstrate the dynamic features of complex traits. The study of these complex traits is an important topic in modern biology.
Genetic mapping aimed at mapping underlying genes to genomic locations is a powerful tool for dissecting the genetic architecture of complex traits. Lander and Botstein have proposed an approach for mapping quantitative trait loci (QTLs) based on a sparse-density linkage map of molecular markers. This so-called interval mapping method can overcome the confounding problem of marker-QTL recombination . Composite interval mapping includes other markers as covariates to control the overall genetic background, which displays increased power in QTL detection . Considering QTL-QTL epistatic interactions in a linkage map, Kao et al. proposed using multiple marker intervals to map QTLs . Currently, statistical methodologies for QTL mapping include regression analysis, maximum likelihood, and the Bayesian approach. With the development of high-throughput single nucleotide polymorphism (SNP) genotyping techniques, genome-wide association studies (GWAS) have provided a powerful means of mapping a complete set of genes underlying complex traits [6, 7, 8]. The genetic structure of a trait is explained by GWAS, which identify the numbers and chromosomal locations of each gene, the size of each gene’s unique and pleiotropic effects, and the relative contributions of additive, dominant, and epistatic genetic effects. This provides an unprecedented tool for preparing a genotype-phenotype map . So far, GWAS have detected many genetic variants for a wide range of complex traits, including those pertaining to agriculture, forestry, and human disease [9, 10].
It should be noted, however, that traits such as height and weight vary with time or other independent environmental stimuli. The traditional QTL mapping method directly associates a single marker with a single phenotype at a time point, which ignores the dynamic characteristics of organisms at different developmental stages, and cannot exactly reflect the whole genetic architecture of complex traits. Although thousands of QTLs were detected in many individuals, only a small number of QTLs were cloned and separated . The reason for this problem is that the QTLs that have undergone rigorous statistical testing are divorced from biological relevance, which limited the projections of the genetic structure of traits. Genetic analysis of dynamic traits presents a serious statistical challenge. To solve these problems, Ma et al.  proposed a QTL mapping method based on a logistic-mixture model . The QTL effect on developmental traits during ontogeny is considered as a function of time, and a series of growth formulas can be derived from the logistic curve describing plant height, size, and weight , arriving at a model which is expected to be improved in parameter estimation and statistical inference over previous models. Ma et al.  developed a maximum likelihood statistical framework based on a logistic-mixture model for the characteristics of function-valued traits, which change as a function of a specific variable. This QTL mapping strategy is called functional mapping.
Functional mapping combines mathematical functions that describe biological processes and assembles mathematical formulas into the statistical framework of QTL mapping to study the interactions between genes and phenotypic traits of organisms during growth and development. We estimate the parameters of a specific genotype that determines the development of a trait by statistical hypothesis testing and parameter estimation, rather than directly estimating the genetic effects of the gene at all-time points. Because of the combination of biological laws (defined by mathematical models) and the reduced number of genetic parameters that need to be estimated, functional mapping increased the potency of detecting significant QTLs . Statistical merits of functional mapping will be exemplified in which case data are recorded irregularly, bringing on data sparsity. Such sparse longitudinal data cannot be well analyzed by traditional mapping for two reasons. First, because the problem of missing data existing at a given time point, traditional method is unable to use all of individuals, thus leading to a biased parameter estimation and reduced power of QTL detection. Second, individuals are measured at a few number of time points, limiting the fit of growth equation. Functional mapping is robust for handling longitudinal sparse data in which no single time point has the phenotypic data for all individuals, facilitating the QTL mapping to study the genetic architecture of hard-to-measure traits.
A set of tree data is used to assess the statistical validity of functional mapping. Several QTLs affecting the developmental trajectories of poplar were detected with the QTL mapping method based on the logistic-mixture model , and these QTLs were located on a genetic linkage map constructed by polymorphic markers. Studies have shown that functional mapping is useful in establishing gene-phenotype relationships and predicting QTL phenotypes of individual organisms based on the control of a trait. Functional mapping combines the principles of Mendelian genetics with statistical and developmental mechanisms, and is superior to traditional QTL mapping methods that combine the principles of statistics and genetics. To date, functional mapping has been used for mapping dynamic QTLs in poplars , jujubes , soybeans , rice , maize , yeast , oysters , mice , humans , and drug responses [25, 26].
2. Materials and methods
In this study, the mapping population included 114 RILs derived from two cultivars of
For vernalization of
2.2. Phenotypic data acquisition
Plant height was measured after 1 week of planting. Each line of
The plant height phenotype was measured for 1160
The growth of
2.3. Acquisition of SNP data
2.3.1. DNA quality inspection
The degree of degradation of the DNA was analyzed by agarose gel electrophoresis and imaged with an ultraviolet gel imager. The DNA sample was determined to be without degradation if the sample did exhibit dispersion phenomena. All of the samples were determined not to be degraded and were sequenced.
We measured the absorbance of the DNA samples at wavelengths of 260 and 280 nm and calculated the value of the optical density (OD) 260/OD280. Samples were determined to be of sufficient quality if this value was greater than 1.8 and less than 2.2. After testing, all samples were found to meet the requirements for sequencing.
2.3.2. Sequencing and mapping
DNA from the sample was sequenced on the Illumina HiSeq 2000 platform to generate 125 bp paired-end reads at greater depth. All these RILs sequencing were performed at Total Genomics Solution (TGS) Institute. After sequencing quantity control, the data of 107 samples reached 117.68 G, with an average of 1.10 G per sample. The percentage of Q30 bases was more than 90%, the percentage of Q20 bases was more than 95%, and the distribution of GC was normal. Thus, the quantity and quality of the sequencing met the requirements for subsequent analysis.
The per-base coverage depth across all contigs was calculated by mapping raw reads from each RIL against reference genomes. The results showed that the average mapping ratio and sequencing coverage of the samples were higher than 95 and 92%, respectively. The sequence depth of the samples was 9.91×. In addition to the line 110, the other lines are better for subsequent variation detection and analysis.
2.4. SNP calling
SNP detection was performed with the widely accepted mutation detection software GATK.SNP. The screening criteria were as follows: The depth of sequencing for each sample was greater than or equal to 4, otherwise the sample was marked as missing. Additionally, the quality value of the comparison must be greater than or equal to 20, and the variation detection quality value must be greater than or equal to 50. If the allele frequency was less than 5% or the absence rate of the sample was greater than 50%, the site was filtered out.
The RIL population contains 105 progeny and two parents; the parent number is “A518_LER” and “A518_SHA.” The reads of each sample were compared with the reference genome. The average ratio of the samples was above 95%, and the coverage was greater than 92%. The average sequencing depth was 9.19×. By processing, we obtained 107 samples and 1,023,325 whole genome SNP markers. According to the situation of this population, individuals with heterozygous genotype ratios over 25% and markers indicating that parents possessed heterozygous genotypes were eliminated. Finally, 609,427 SNPs and 80 individuals met the model requirements and could be used to detect the QTLs that affect the growth height of
3. Functional mapping
3.1. Statistical model
Let denote the vector of trait values for RIL,
where indicates the unknown parameters including the time-dependent effects of different QTL genotypes and the time-dependent residual variance and correlations. and are a multivariate normal distribution with a time-dependent mean vector for genotype
(T × T)-dimensional longitudinal covariance matrix is expressed as , which can be modeled by using a statistical approach such as the first-order autoregressive [AR(1)] model or an autoregressive moving-average process (ARMA). The maximum-likelihood estimates (MLEs) of the unknown parameters are implemented with the simplex algorithm in R software .
3.2. Modeling the mean vector
One of the most important equations for capturing time-specific change in growth is the logistic curve [49, 50], which we used to describe height growth of the QTL genotype according to the following expression:
3.3. Modeling the covariance structure
The first-order autoregressive [AR(1)] model has been successfully applied to model the structure of the within-subject covariance matrix for functional mapping. The AR(1) model includes two basic assumptions: (1) variance stationarity (the residual variance,
where 0 <
3.4. Hypothesis tests
After the MLEs of the parameters are obtained, the hypothesis concerning the existence of a QTL affecting overall growth can be written as:
where the null hypothesis H0 corresponds to the reduced model and the alternative hypothesis H1 corresponds to the full model. The test statistics for testing the hypotheses is the log-likelihood ratio (LR) of the full over reduced model. The critical threshold is determined from permutation tests.
3.5. Candidate gene function annotation
Gene ontology (GO) annotation analysis was performed using Blast2GO . Finally, R script language programming was used to translate the GO annotation results into charts.
4.1. QTL detection
By plotting the total growth against growth week, it was observed that each of the 116 mapped genotypes followed the logistic growth curve. Figure 3 illustrates the growth trajectories of height for each RIL over 9 weeks. A least-squares approach was used to fit height growth with the logistic curve (Eq. (3)) for each RIL. Based on statistical tests, all genotypes can be fit by a logistic curve containing parameters a, b, and r (P < 0.01, Figure 3).
According to the fit results, the relative growth rate of
Functional mapping was implemented to analyze the mapping population. A Manhattan plot of LR values against the genome locations of SNPs distributed throughout the genome is shown in Figure 4. We found QTLs that affect the growth trajectory of plant height on the fourth chromosome of the RIL population (Figure 4). The genome-wide empirical estimate of the critical value is obtained from permutation tests. The profile of the LR value of the full and reduced model across the length of chromosome 4 has a clear peak from 7.9 to 22.5 kb. A total of 48 significant QTLs responsible for growth trajectories of plant heights were identified. The LR value at this peak is 437.78, which is beyond the empirical critical threshold at the significance level
Functional mapping can also be used to observe the dynamic expression of a QTL over time such as when a QTL starts to affect a growth process. The parameter combination of the
On the basis of the hypothesis test (6), this QTL is detected to be inactive until
4.2. Candidate gene function annotation
GO classification is widely used for gene classification and functional annotation, and GO provides three types of semantic terms, including cell component, molecular function, and biological process, to describe the characteristics of genes and gene products [52, 53]. To further summarize the data of GO classification, the cytological components include cell structure, tissue, protein complex, extracellular structure, and cell process. Biological processes include developmental processes, physiological processes, regulatory processes, and the processes of responding to stress. Molecular functions include binding, catalysis, activation, structural molecules, and transcriptional regulatory functions (Figure 6).
Functional prediction and classification analysis of 48 loci were screened with the National Center for Biotechnology and the Joint Genome Institute (JGI) databases. There are 20 gene families in these loci, which include the F-box and calcium-binding EF-hand protein families. Pathway cluster analysis was used to compare the 48 genes with the known protein sequences in the JGI database.
It was found that 7 of the 48 genes corresponded to functions in the JGI database. They were divided into nine groups, including carbon fixation pathways in prokaryotes, biosynthesis of antibiotics, one-carbon pool by folate, starch, and sucrose metabolism, glycerolipid metabolism (synthase), glycerolipid metabolism (lipase), polyketide sugar unit biosynthesis, streptomycin biosynthesis, and pentose and glucuronate interconversions. We found that these nine pathways are mostly associated with carbohydrate, energy, and amino acid metabolism. These biological processes play an important role in the development of height in
We also found that AT4G00160.1 encodes an F-box protein in the signal transduction pathway. It has been shown that F-box is an auxin receptor . Plant height development is regulated by gibberellin (GA) and auxin (indole-3-acetic acid [IAA]) , and GA20ox and GA30ox are encoded by multiple genes, and mutations in these loci can result in dwarfing of the plant in the later stage of GA biosynthesis . The most significant site, AT4G01150.1, is related to protein curvature thylakoid chloroplastic. Protein curvature thylakoid chloroplastic tends to be located in leaves and stems, and plays an important role in plant photosynthesis, which affects plant growth .
Gene mapping has been shown to be a powerful approach for the study of the genetic architecture of complex traits. It has been instrumental for the characterization of QTLs that control quantitative traits of interest to agriculture, biology, and human disease [58, 59]. However, traditional mapping strategies do not provide much insight into the genetic control mechanisms for phenotypic variations if some statistical and biological issues related to the approach are not resolved. Ma et al. (2002) integrated some fundamental biological principles into the mapping framework, aimed at generating more biologically meaningful discoveries related to trait formation and development, further proposing so-called functional mapping . Functional mapping attempts to combine strong statistical and molecular genetics with the developmental mechanisms of biological features, and to elucidate the genetic mechanisms of complex traits. Since functional mapping combines different mathematical functions with biological significance, it possesses three advantages over traditional mapping methods in QTL mapping: (1) because the underlying biological mechanism is considered, the results of functional mapping are closer to biological reality; (2) a smaller sample size can be used to achieve sufficient accuracy for QTL detection because multiple measurements of the same individual improve mapping accuracy; and (3) by treating the growth process as a smooth curve, a large number of variables can be analyzed simultaneously, and the estimation of a small number of parameters can improve the accuracy of the parameter estimation and flexibility of the model.
With the development of high-throughput sequencing technology and the reduction of sequencing cost, GWAS have become an important tool for studying complex traits and have been widely used in genetic studies of complex traits in humans, animals, and plants . Most GWAS only use single phenotypic data to perform regression analysis with each SNP such as with Plink software . In addition, some GWAS have been developed to solve the false positive loci of population structure and genetic relationship [62, 63, 64]. The successes and potential of GWAS have not been explored when complex phenotypes arise as a curve. In any regard, a curve is more informative than a point in describing the biological features of a trait. To apply functional mapping to GWAS by integrating GWAS and functional aspects of dynamic traits, a new analytical model for genome-wide association analysis of dynamic data, called functional GWAS (fGWAS), has been derived . There are two advantages to fGWAS over GWAS: (1) fGWAS is able to identify genes that determine the final form of the trait and (2) it provides the ability to study the temporal pattern of genetic control over a time course.
The regulation network of plant height traits has been studied intensively in molecular biology. We already know that the development of plant height traits is regulated by growth hormones such as GA and IAA. Using functional mapping, we found 48 growth QTLs in
Functional mapping is far from enough to fully study complex traits, and there are still many limitations in describing the developmental pathways leading to the final phenotype and revealing the underlying genetic mechanisms for the formation and development of these traits. It is too simple to draw a complete dynamic diagram of complex traits. Wu  extends functional mapping to system mapping. By identifying the dynamic formation process of complex traits as a system and decomposing it into several parts, the QTL that controls the interaction of each component during the development of complex characters is identified. From the point of view of ecology, the process of character formation is extremely complex. To draw a complete quantitative genetic structure, we need to study the characteristics of an organism affected by its own genes as well as the influence from the community partner genome. In nature, most organisms live in groups, and individuals compete with each other. Wu combined game theory with QTL mapping, which opens up new opportunities for improving the accuracy and resolution of complex phenotype QTL recognition .
This work is supported by National Natural Science Foundation of China (grant 31700576) and grant 201404102 from the State Administration of Forestry of China.