Most traits important to agriculture, biology, and biomedicine are complex traits, determined by both genetic and environmental factors. The complex traits that change their phenotypes over different stages of development are called dynamic traits. Traditional quantitative trait loci (QTLs) mapping approaches ignore the dynamic changes of complex traits. Functional mapping, as a powerful statistical tool, can not only map QTLs that control the developmental pattern and process of complex traits, but also describe the dynamic changes of complex traits. In this study, we used functional mapping to identify those QTLs that affect height growth in 10th generation recombinant inbred lines derived from two different Arabidopsis thaliana accessions. Functional mapping identified 48 QTLs that are related to height traits. The growth curves of different genotypes can be drawn for each significant locus. By GO gene function annotations, we found that these QTLs detected are associated with the synthesis of biological macromolecules and the regulation of biological functions. Our findings provide unique insights into the genetic control of height growth of A. thaliana and will provide a theoretical basis for the study of complex traits.
- complex traits
- functional mapping
- Arabidopsis thaliana
Complex traits are genetic traits controlled by multiple genes. They are sensitive to environmental changes and easily affected by the environment . The phenotypic expression of complex traits in individuals within a population displays a continuous variation and generally a normal distribution. Most important biochemical, medical, and agronomic traits and the majority of human diseases are complex traits that are controlled by interconnected genetic networks and environmental factors. In these gene networks, the effect of each gene is small . The characteristics also change at different developmental stages and demonstrate the dynamic features of complex traits. The study of these complex traits is an important topic in modern biology.
Genetic mapping aimed at mapping underlying genes to genomic locations is a powerful tool for dissecting the genetic architecture of complex traits. Lander and Botstein have proposed an approach for mapping quantitative trait loci (QTLs) based on a sparse-density linkage map of molecular markers. This so-called interval mapping method can overcome the confounding problem of marker-QTL recombination . Composite interval mapping includes other markers as covariates to control the overall genetic background, which displays increased power in QTL detection . Considering QTL-QTL epistatic interactions in a linkage map, Kao et al. proposed using multiple marker intervals to map QTLs . Currently, statistical methodologies for QTL mapping include regression analysis, maximum likelihood, and the Bayesian approach. With the development of high-throughput single nucleotide polymorphism (SNP) genotyping techniques, genome-wide association studies (GWAS) have provided a powerful means of mapping a complete set of genes underlying complex traits [6, 7, 8]. The genetic structure of a trait is explained by GWAS, which identify the numbers and chromosomal locations of each gene, the size of each gene’s unique and pleiotropic effects, and the relative contributions of additive, dominant, and epistatic genetic effects. This provides an unprecedented tool for preparing a genotype-phenotype map . So far, GWAS have detected many genetic variants for a wide range of complex traits, including those pertaining to agriculture, forestry, and human disease [9, 10].
It should be noted, however, that traits such as height and weight vary with time or other independent environmental stimuli. The traditional QTL mapping method directly associates a single marker with a single phenotype at a time point, which ignores the dynamic characteristics of organisms at different developmental stages, and cannot exactly reflect the whole genetic architecture of complex traits. Although thousands of QTLs were detected in many individuals, only a small number of QTLs were cloned and separated . The reason for this problem is that the QTLs that have undergone rigorous statistical testing are divorced from biological relevance, which limited the projections of the genetic structure of traits. Genetic analysis of dynamic traits presents a serious statistical challenge. To solve these problems, Ma et al.  proposed a QTL mapping method based on a logistic-mixture model . The QTL effect on developmental traits during ontogeny is considered as a function of time, and a series of growth formulas can be derived from the logistic curve describing plant height, size, and weight , arriving at a model which is expected to be improved in parameter estimation and statistical inference over previous models. Ma et al.  developed a maximum likelihood statistical framework based on a logistic-mixture model for the characteristics of function-valued traits, which change as a function of a specific variable. This QTL mapping strategy is called functional mapping.
Functional mapping combines mathematical functions that describe biological processes and assembles mathematical formulas into the statistical framework of QTL mapping to study the interactions between genes and phenotypic traits of organisms during growth and development. We estimate the parameters of a specific genotype that determines the development of a trait by statistical hypothesis testing and parameter estimation, rather than directly estimating the genetic effects of the gene at all-time points. Because of the combination of biological laws (defined by mathematical models) and the reduced number of genetic parameters that need to be estimated, functional mapping increased the potency of detecting significant QTLs . Statistical merits of functional mapping will be exemplified in which case data are recorded irregularly, bringing on data sparsity. Such sparse longitudinal data cannot be well analyzed by traditional mapping for two reasons. First, because the problem of missing data existing at a given time point, traditional method is unable to use all of individuals, thus leading to a biased parameter estimation and reduced power of QTL detection. Second, individuals are measured at a few number of time points, limiting the fit of growth equation. Functional mapping is robust for handling longitudinal sparse data in which no single time point has the phenotypic data for all individuals, facilitating the QTL mapping to study the genetic architecture of hard-to-measure traits.
A set of tree data is used to assess the statistical validity of functional mapping. Several QTLs affecting the developmental trajectories of poplar were detected with the QTL mapping method based on the logistic-mixture model , and these QTLs were located on a genetic linkage map constructed by polymorphic markers. Studies have shown that functional mapping is useful in establishing gene-phenotype relationships and predicting QTL phenotypes of individual organisms based on the control of a trait. Functional mapping combines the principles of Mendelian genetics with statistical and developmental mechanisms, and is superior to traditional QTL mapping methods that combine the principles of statistics and genetics. To date, functional mapping has been used for mapping dynamic QTLs in poplars , jujubes , soybeans , rice , maize , yeast , oysters , mice , humans , and drug responses [25, 26].
Arabidopsis thaliana is a small, annual or winter annual, rosette plant. A. thaliana is a central genetic model and universal reference organism in plant and crop science. The successful integration of different fields of research in the study of A. thaliana has made a large contribution to our molecular understanding of key concepts in biology. The Arabidopsis reference genome sequence was the first published nuclear genome of a flowering plant in 2000 (
2. Materials and methods
In this study, the mapping population included 114 RILs derived from two cultivars of Arabidopsis. The main phenotype of the LER cultivar includes round leaves, short petiole, short pedicel, flowers clustered at inflorescence tips, short pod width, sharp tip, inflorescence compact, and a plant height of 10–25 cm. The general traits of the SHA cultivar are described in The Arabidopsis Information Resource database as a slightly narrow leaf with a height of about 30 cm. To ensure adequate seeds for the sustainability of the material, each RIL carried out a generation of expansion. For the sterilization of Arabidopsis seeds, the seeds were shaken in a centrifuge tube with 10% sodium hypochlorite for 6 min; 1 ml of 95% alcohol was then added, and the tubes were shaken for 5 min. The seeds were rinsed 5–6 times to thoroughly wash away the remaining sodium hypochlorite solution on the seed surface. The alcohol was poured out, and the tubes were dried at a super clean bench to obtain sterile Arabidopsis seeds.
For vernalization of A. thaliana seeds, distilled water was added to the seeds and the seeds were placed in a 4°C refrigerator for 3 days of dark treatment before sowing. To sow the seeds after vernalization, vermiculite, peat, and limestone were mixed evenly in a 1:1:1 ratio and used as the soil. An 18 × 18 cm plastic grid with 90 squares was placed over the plant pot. The seeds were placed into each mesh with a toothpick. The planting diagram of the experiment is shown in Figure 1. Circles and triangles represent the two different lines of A. thaliana, planted in the same growing space. After planting, the plant pot was covered with plastic wrap and then transferred to the long day (16 h) and short day (8 h) artificial climate room (22°C). When the Arabidopsis sprouts displayed green shoots, the plastic wrap was removed.
2.2. Phenotypic data acquisition
Plant height was measured after 1 week of planting. Each line of A. thaliana was randomly selected for 10 strains. In total, 1160 strains of Arabidopsis were selected. The height of Arabidopsis plants were measured once per week until the end of the plant’s life cycle by manual measurement with a 1 mm ruler (accurate to 1 mm).
The plant height phenotype was measured for 1160 A. thaliana strains. The height of 1–60 lines of Arabidopsis was measured seven times, and the height of 60–116 lines of Arabidopsis was measured seven times. Because of environmental or genotypic reasons, nine lineages of Arabidopsis did not complete a life cycle (numbers 1–4, 22, 28–30, and 60).
The growth of Arabidopsis strains can be seen in Figure 2a–c, which indicates the growth of the strains for 1, 3, and 6 weeks, respectively.
2.3. Acquisition of SNP data
A. thaliana DNA was extracted with a DNA extraction kit (TIANGEN DP305) according to the manufacturer’s instructions. The plant material used was 100 mg of the young leaf. After DNA was detected in the samples, the DNA sequence was determined by an outside company. The resulting sequence data was subjected to SNP calling.
2.3.1. DNA quality inspection
The degree of degradation of the DNA was analyzed by agarose gel electrophoresis and imaged with an ultraviolet gel imager. The DNA sample was determined to be without degradation if the sample did exhibit dispersion phenomena. All of the samples were determined not to be degraded and were sequenced.
We measured the absorbance of the DNA samples at wavelengths of 260 and 280 nm and calculated the value of the optical density (OD) 260/OD280. Samples were determined to be of sufficient quality if this value was greater than 1.8 and less than 2.2. After testing, all samples were found to meet the requirements for sequencing.
2.3.2. Sequencing and mapping
DNA from the sample was sequenced on the Illumina HiSeq 2000 platform to generate 125 bp paired-end reads at greater depth. All these RILs sequencing were performed at Total Genomics Solution (TGS) Institute. After sequencing quantity control, the data of 107 samples reached 117.68 G, with an average of 1.10 G per sample. The percentage of Q30 bases was more than 90%, the percentage of Q20 bases was more than 95%, and the distribution of GC was normal. Thus, the quantity and quality of the sequencing met the requirements for subsequent analysis.
The per-base coverage depth across all contigs was calculated by mapping raw reads from each RIL against reference genomes. The results showed that the average mapping ratio and sequencing coverage of the samples were higher than 95 and 92%, respectively. The sequence depth of the samples was 9.91×. In addition to the line 110, the other lines are better for subsequent variation detection and analysis.
2.4. SNP calling
SNP detection was performed with the widely accepted mutation detection software GATK.SNP. The screening criteria were as follows: The depth of sequencing for each sample was greater than or equal to 4, otherwise the sample was marked as missing. Additionally, the quality value of the comparison must be greater than or equal to 20, and the variation detection quality value must be greater than or equal to 50. If the allele frequency was less than 5% or the absence rate of the sample was greater than 50%, the site was filtered out.
The RIL population contains 105 progeny and two parents; the parent number is “A518_LER” and “A518_SHA.” The reads of each sample were compared with the reference genome. The average ratio of the samples was above 95%, and the coverage was greater than 92%. The average sequencing depth was 9.19×. By processing, we obtained 107 samples and 1,023,325 whole genome SNP markers. According to the situation of this population, individuals with heterozygous genotype ratios over 25% and markers indicating that parents possessed heterozygous genotypes were eliminated. Finally, 609,427 SNPs and 80 individuals met the model requirements and could be used to detect the QTLs that affect the growth height of A. thaliana.
3. Functional mapping
3.1. Statistical model
Let denote the vector of trait values for RIL, i measured at T time-points. Consider a SNP with two alleles Q and q, generating two genotypes: QQ with n1 RILs and qq with n2 RILs. In this study, the likelihood for height growth data of A. thaliana is expressed as
where indicates the unknown parameters including the time-dependent effects of different QTL genotypes and the time-dependent residual variance and correlations. and are a multivariate normal distribution with a time-dependent mean vector for genotype QQ and qq,
(T × T)-dimensional longitudinal covariance matrix is expressed as , which can be modeled by using a statistical approach such as the first-order autoregressive [AR(1)] model or an autoregressive moving-average process (ARMA). The maximum-likelihood estimates (MLEs) of the unknown parameters are implemented with the simplex algorithm in R software .
3.2. Modeling the mean vector
One of the most important equations for capturing time-specific change in growth is the logistic curve [49, 50], which we used to describe height growth of the QTL genotype according to the following expression:
where g(t) represents the trait value at time point t, a indicates the asymptotic value of g when , b is a parameter to position the curve on the time axis, and r indicates the relative growth rate. Consequently, any specific growth characteristics described by the logistic growth equation can be captured by parameter a, b, and r, and these can be used to determine the coordinates of biologically important benchmarks along the growth trajectory. The mean vector for the QTL genotypes QQ and qq from time 1 to T in the multivariate normal density function is expressed as:
3.3. Modeling the covariance structure
The first-order autoregressive [AR(1)] model has been successfully applied to model the structure of the within-subject covariance matrix for functional mapping. The AR(1) model includes two basic assumptions: (1) variance stationarity (the residual variance, σ2) is unchanged over time points and (2) covariance stationarity (the correlation between different time points) decreases proportionally (in ρ) with increased time interval. The AR(1) is described as:
where 0 < ρ < 1 is the proportion parameter with which the correlation decays with a time point.
3.4. Hypothesis tests
After the MLEs of the parameters are obtained, the hypothesis concerning the existence of a QTL affecting overall growth can be written as:
where the null hypothesis H0 corresponds to the reduced model and the alternative hypothesis H1 corresponds to the full model. The test statistics for testing the hypotheses is the log-likelihood ratio (LR) of the full over reduced model. The critical threshold is determined from permutation tests.
3.5. Candidate gene function annotation
Gene ontology (GO) annotation analysis was performed using Blast2GO . Finally, R script language programming was used to translate the GO annotation results into charts.
4.1. QTL detection
By plotting the total growth against growth week, it was observed that each of the 116 mapped genotypes followed the logistic growth curve. Figure 3 illustrates the growth trajectories of height for each RIL over 9 weeks. A least-squares approach was used to fit height growth with the logistic curve (Eq. (3)) for each RIL. Based on statistical tests, all genotypes can be fit by a logistic curve containing parameters a, b, and r (P < 0.01, Figure 3).
According to the fit results, the relative growth rate of A. thaliana was r = 0.61, and the asymptotic value of plant height was calculated to be a = 10.08. We can see from Figure 3 that the plant height of A. thaliana increases exponentially over a period of time, and as growth time increases, the trend of the curve begins to flatten, until the plant height does not change anymore. In addition, different genotypes showed different growth curves, suggesting the possibility of genetic control over the growth trajectories. The statistical model built upon the logistic growth curve model is used to map QTLs responsible for growth trajectories in plant height.
Functional mapping was implemented to analyze the mapping population. A Manhattan plot of LR values against the genome locations of SNPs distributed throughout the genome is shown in Figure 4. We found QTLs that affect the growth trajectory of plant height on the fourth chromosome of the RIL population (Figure 4). The genome-wide empirical estimate of the critical value is obtained from permutation tests. The profile of the LR value of the full and reduced model across the length of chromosome 4 has a clear peak from 7.9 to 22.5 kb. A total of 48 significant QTLs responsible for growth trajectories of plant heights were identified. The LR value at this peak is 437.78, which is beyond the empirical critical threshold at the significance level p = 0.05.
Functional mapping can also be used to observe the dynamic expression of a QTL over time such as when a QTL starts to affect a growth process. The parameter combination of the QQ genotype was a1 = 9.07, b1 = 370.08, and r1 = 0.92. The parameter combination of the qq genotype was a2 = 39.62, b2 = 114.61, and r2 = 0.28. Different QTL genotypes corresponded to different parameter combinations, which indicate that the QTL controls the developmental trajectory of the plant height. Using the estimates from the growth curves, we drew two different curves, each corresponding to a genotype at each of the detected QTLs on the fourth chromosome (Figure 5).
On the basis of the hypothesis test (6), this QTL is detected to be inactive until A. thaliana grew to 4 weeks, and its effect on height growth increased with time. At 9 weeks, the genotype qq exhibited height growth more than its alternative QQ. The effect of qq on plant height was more significant than that of QQ. If different genotypes at a given QTL correspond to different trajectories, the QTL must affect the differentiation of this trait. Apparently, this mapped QTL interacts significantly with time to affect the height growth of A. thaliana.
4.2. Candidate gene function annotation
GO classification is widely used for gene classification and functional annotation, and GO provides three types of semantic terms, including cell component, molecular function, and biological process, to describe the characteristics of genes and gene products [52, 53]. To further summarize the data of GO classification, the cytological components include cell structure, tissue, protein complex, extracellular structure, and cell process. Biological processes include developmental processes, physiological processes, regulatory processes, and the processes of responding to stress. Molecular functions include binding, catalysis, activation, structural molecules, and transcriptional regulatory functions (Figure 6).
Functional prediction and classification analysis of 48 loci were screened with the National Center for Biotechnology and the Joint Genome Institute (JGI) databases. There are 20 gene families in these loci, which include the F-box and calcium-binding EF-hand protein families. Pathway cluster analysis was used to compare the 48 genes with the known protein sequences in the JGI database.
It was found that 7 of the 48 genes corresponded to functions in the JGI database. They were divided into nine groups, including carbon fixation pathways in prokaryotes, biosynthesis of antibiotics, one-carbon pool by folate, starch, and sucrose metabolism, glycerolipid metabolism (synthase), glycerolipid metabolism (lipase), polyketide sugar unit biosynthesis, streptomycin biosynthesis, and pentose and glucuronate interconversions. We found that these nine pathways are mostly associated with carbohydrate, energy, and amino acid metabolism. These biological processes play an important role in the development of height in Arabidopsis.
We also found that AT4G00160.1 encodes an F-box protein in the signal transduction pathway. It has been shown that F-box is an auxin receptor . Plant height development is regulated by gibberellin (GA) and auxin (indole-3-acetic acid [IAA]) , and GA20ox and GA30ox are encoded by multiple genes, and mutations in these loci can result in dwarfing of the plant in the later stage of GA biosynthesis . The most significant site, AT4G01150.1, is related to protein curvature thylakoid chloroplastic. Protein curvature thylakoid chloroplastic tends to be located in leaves and stems, and plays an important role in plant photosynthesis, which affects plant growth .
Gene mapping has been shown to be a powerful approach for the study of the genetic architecture of complex traits. It has been instrumental for the characterization of QTLs that control quantitative traits of interest to agriculture, biology, and human disease [58, 59]. However, traditional mapping strategies do not provide much insight into the genetic control mechanisms for phenotypic variations if some statistical and biological issues related to the approach are not resolved. Ma et al. (2002) integrated some fundamental biological principles into the mapping framework, aimed at generating more biologically meaningful discoveries related to trait formation and development, further proposing so-called functional mapping . Functional mapping attempts to combine strong statistical and molecular genetics with the developmental mechanisms of biological features, and to elucidate the genetic mechanisms of complex traits. Since functional mapping combines different mathematical functions with biological significance, it possesses three advantages over traditional mapping methods in QTL mapping: (1) because the underlying biological mechanism is considered, the results of functional mapping are closer to biological reality; (2) a smaller sample size can be used to achieve sufficient accuracy for QTL detection because multiple measurements of the same individual improve mapping accuracy; and (3) by treating the growth process as a smooth curve, a large number of variables can be analyzed simultaneously, and the estimation of a small number of parameters can improve the accuracy of the parameter estimation and flexibility of the model.
With the development of high-throughput sequencing technology and the reduction of sequencing cost, GWAS have become an important tool for studying complex traits and have been widely used in genetic studies of complex traits in humans, animals, and plants . Most GWAS only use single phenotypic data to perform regression analysis with each SNP such as with Plink software . In addition, some GWAS have been developed to solve the false positive loci of population structure and genetic relationship [62, 63, 64]. The successes and potential of GWAS have not been explored when complex phenotypes arise as a curve. In any regard, a curve is more informative than a point in describing the biological features of a trait. To apply functional mapping to GWAS by integrating GWAS and functional aspects of dynamic traits, a new analytical model for genome-wide association analysis of dynamic data, called functional GWAS (fGWAS), has been derived . There are two advantages to fGWAS over GWAS: (1) fGWAS is able to identify genes that determine the final form of the trait and (2) it provides the ability to study the temporal pattern of genetic control over a time course.
The regulation network of plant height traits has been studied intensively in molecular biology. We already know that the development of plant height traits is regulated by growth hormones such as GA and IAA. Using functional mapping, we found 48 growth QTLs in A. thaliana. Through the GO annotation of QTLs, we found that there are many genes among the significant loci identified in this study that are related to the pathways for synthesis and conduction of growth hormones, such as AT4G00160.1, which encodes an F-box protein in the signal transduction pathway. It has been shown that the F-box is an auxin receptor. Thus, we can see that the QTLs identified in this study may not only be applicable to A. thaliana, but also to other plants. These results show that functional mapping can reveal more intricate details of dynamic traits such as height growth and other phenotypes.
Functional mapping is far from enough to fully study complex traits, and there are still many limitations in describing the developmental pathways leading to the final phenotype and revealing the underlying genetic mechanisms for the formation and development of these traits. It is too simple to draw a complete dynamic diagram of complex traits. Wu  extends functional mapping to system mapping. By identifying the dynamic formation process of complex traits as a system and decomposing it into several parts, the QTL that controls the interaction of each component during the development of complex characters is identified. From the point of view of ecology, the process of character formation is extremely complex. To draw a complete quantitative genetic structure, we need to study the characteristics of an organism affected by its own genes as well as the influence from the community partner genome. In nature, most organisms live in groups, and individuals compete with each other. Wu combined game theory with QTL mapping, which opens up new opportunities for improving the accuracy and resolution of complex phenotype QTL recognition .
This work is supported by National Natural Science Foundation of China (grant 31700576) and grant 201404102 from the State Administration of Forestry of China.