Soybean is one of the most important crops: it provides edible oil and plant protein for human consumption and, because of its high protein and oil content, is also widely used as animal feed. This review summarizes progress in QTL mapping, candidate gene cloning and functional analysis, and the regulation of soybean oil and seed storage protein accumulation. Furthermore, now that the soybean genome has been sequenced and released, multi-omics approaches and advanced biotechnology should be combined and applied to refine this research and support high-quality breeding.
- seed oil content
- seed storage protein
1.1. Soybean protein and oil content QTL analysis
Soybean oil and protein content are quantitative traits affected by multiple genes and environmental factors [2, 3]. More than 312 soybean oil QTLs and 231 soybean protein QTLs have been detected in different populations and environments (SoyBase, http://www.soybase.org). The main mapping methods include analysis of variance (ANOVA), interval mapping (IM; [5, 6, 7]), composite interval mapping (CIM; [8, 9]), multiple interval mapping (MIM) and inclusive composite interval mapping (ICIM). Among the published soybean oil content QTLs, some fall in ‘hot regions’, that is, intervals identified four or more times at the same or similar positions in different studies. These include Gm05: 35.2–40.8 Mb, Gm09: 40.3–46.8 Mb, Gm12: 34.1–40.6 Mb, Gm14: 33.8–49.2 Mb, Gm15: 0.8–13.9 Mb, Gm18: 51.6–59.8 Mb, Gm19: 32.9–48.0 Mb and Gm20: 23.5–34.6 Mb. For soybean protein content, the ‘hot regions’ include Gm04: 43.6–47.7 Mb, Gm05: 39.7–41.4 Mb, Gm07: 4.2–9.6 Mb, Gm08: 5.8–10.2 Mb, Gm14: 4.8–9.6 Mb, Gm15: 0.0–7.5 Mb, Gm18: 47.9–54.0 Mb, Gm19: 35.5–42.1 Mb and Gm20: 2.1–34.2 Mb [13, 14]. Meta-analysis is a statistical method that combines results from different sources in a single study; it can increase QTL precision and validity by using mathematical models to refine the integration of QTLs, and its early applications included maize and soybean. Qi et al. have also used meta-analysis to analyze soybean oil and protein content separately [19, 20].
However, soybean oil and protein content are generally negatively correlated [21, 22]: observations and data from many classical genetic analyses show that high-oil varieties tend to have lower protein content and high-protein varieties tend to have lower oil content. Many classical genetics and breeding references also note this inverse relationship [2, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]. Although loci that increase soybean oil and protein content at the same time are very hard to find, the large body of QTL mapping results includes a few regions that contribute in the same direction to both traits within the same genetic population. Orf et al. mapped an additive QTL affecting soybean oil content at 39.5–41.2 Mb on Gm05 in a population derived from a cross between Minsoy and Noir1; the results implied that Minsoy carries the positive alleles for increasing both oil and protein content. Specht et al., however, identified a similar region with the opposite result, in which Noir1 carries the positive alleles. Hyten et al. identified a QTL at 4.8–8.7 Mb on Gm07 for which the parent Williams carries the positive alleles for both traits. Reinprecht et al. also demonstrated that the variety OX948 carries the positive alleles. Mao et al. identified additive QTLs affecting soybean oil and protein content at 51.2–56.3 Mb on Gm01, 1.0–2.3 Mb on Gm09 and 39.4–46.1 Mb on Gm19 in a population derived from a cross between Hefeng47 and Heinong37, which indicated that Heinong37 carries positive alleles in those regions that can increase oil and protein content at the same time. Based on published data, Heinong37 is the only Chinese variety that may carry positive alleles for both traits.
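As a rough illustration of how candidate regions for both traits might be shortlisted, the sketch below intersects the oil and protein ‘hot regions’ listed in Section 1.1, chromosome by chromosome. The interval coordinates are transcribed from the text above; the function name `shared_intervals` and the simple overlap rule are assumptions made for this example. Physical overlap alone does not show that a single locus is involved or that its alleles act in the same direction on both traits.

```python
# Hot regions (Mb) transcribed from Section 1.1: chromosome -> (start, end).
OIL = {
    "Gm05": (35.2, 40.8), "Gm09": (40.3, 46.8), "Gm12": (34.1, 40.6),
    "Gm14": (33.8, 49.2), "Gm15": (0.8, 13.9), "Gm18": (51.6, 59.8),
    "Gm19": (32.9, 48.0), "Gm20": (23.5, 34.6),
}
PROTEIN = {
    "Gm04": (43.6, 47.7), "Gm05": (39.7, 41.4), "Gm07": (4.2, 9.6),
    "Gm08": (5.8, 10.2), "Gm14": (4.8, 9.6), "Gm15": (0.0, 7.5),
    "Gm18": (47.9, 54.0), "Gm19": (35.5, 42.1), "Gm20": (2.1, 34.2),
}


def shared_intervals(a, b):
    """Return {chromosome: (start, end)} where the intervals in a and b overlap."""
    shared = {}
    for chrom in sorted(a.keys() & b.keys()):
        start = max(a[chrom][0], b[chrom][0])
        end = min(a[chrom][1], b[chrom][1])
        if start < end:  # keep only intervals with a non-empty overlap
            shared[chrom] = (start, end)
    return shared


for chrom, (start, end) in shared_intervals(OIL, PROTEIN).items():
    print(f"{chrom}: {start}-{end} Mb")
```

With these inputs, the overlapping segments fall on Gm05, Gm15, Gm18, Gm19 and Gm20; the Gm14 intervals for the two traits do not overlap.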
1.2. Soybean fatty acid composition biosynthesis and transcriptional regulation
The accumulation of starch, lipid and protein supplies the raw materials and energy for soybean seed growth and maturation. Lipid is one of these three major raw materials; although the biochemical pathway of lipid synthesis has been studied thoroughly, its regulatory mechanism remains unclear [41, 42, 43, 44, 45, 46, 47]. De novo fatty acid synthesis takes place mainly in the plastid. Acetyl-CoA, an important intermediate of many cellular metabolic pathways that is produced in large amounts in plant cells, is the precursor of soybean seed fatty acid synthesis. Acetyl-CoA carboxylase (ACCase) catalyzes the first committed step of fatty acid synthesis, the carboxylation of acetyl-CoA to malonyl-CoA. The fatty acid synthase (FAS) complex then uses malonyl-CoA in successive condensation reactions that extend the acyl chain by two carbons per cycle. The growing acyl chain is bound to acyl carrier protein (ACP), and elongation is terminated by an acyl-ACP thioesterase or an acyltransferase acting on the acyl-ACP. The resulting acyl chains of different lengths are then converted to acyl-CoA by acyl-CoA synthetase and transferred from the plastid to the endoplasmic reticulum or the cytoplasm. Finally, fatty acids are attached to glycerol by three different acyltransferases to form triacylglycerols (TAGs) [49, 50, 51, 52]. Seed oil content can be increased by changing the expression levels of individual enzymes involved in oil metabolism [53, 54, 55, 56, 57, 58, 59]. However, the key enzyme responsible for TAG assembly is encoded by diacylglycerol acyltransferase 1 (
Fatty acid composition is determined mainly by five fatty acids: palmitic (C16:0), stearic (C18:0), oleic (C18:1), linoleic (C18:2) and linolenic (C18:3) acid [67, 68]. Most palmitic acid (16:0) produced by the type II synthase is elongated to stearic acid (18:0) [67, 69]. In recent decades, many QTLs have been reported for each fatty acid component, and ‘hot regions’ have emerged for these traits as well. For seed linoleic acid, they include Gm05 39.36–40.87 Mb and Gm18 48.35–50.78 Mb (original QTLs from Diers and Shoemaker; Bachlava et al.; Li et al.; Xie et al.). For seed linolenic acid, they include Gm02 17.07–34.9 Mb, Gm09 34.56–37.74 Mb, Gm14 17.08–39.5 Mb and 45.68–46.78 Mb, Gm15 6.7–7.71 Mb and 13.07–25.6 Mb, and Gm19 35.75–37.38 Mb (original QTLs from Li et al.; Bachlava et al.; Diers and Shoemaker; Spencer et al.; Reinprecht et al.; Xie et al.; Shibata et al.; Hyten et al.). For seed oleic acid, they include Gm05 39.07–40.80 Mb and Gm18 49.24–51.95 Mb (original QTLs from Diers and Shoemaker; Reinprecht et al.; Xie et al.). For seed palmitic acid, they include Gm05 2.84–3.92 Mb, Gm09 7.74–11.83 Mb and 34.59–38.73 Mb, Gm15 9.13–13.16 Mb, Gm17 7.60–9.45 Mb and Gm18 38.38–41.09 Mb (original QTLs from Li et al.; Wang et al.; Xie et al.; Hyten et al.; Li et al.; Kim et al.; Reinprecht et al.). In soybean, stearoyl-acyl carrier protein desaturase (SAD) catalyzes the first desaturation step in seed oil biosynthesis, converting stearoyl-ACP to oleoyl-ACP, and plays a key role in determining the ratio of total saturated to unsaturated fatty acids in plants [35, 78, 79]. Then, microsomal oleate desaturase (FAD2) and linoleoyl desaturase (FAD3) catalyze the desaturation of oleic acid to linoleic acid and of linoleic acid to linolenic acid, respectively, mainly at the sn-2 position, and fatty acid elongase converts fatty acids into long-chain fatty acids. The
However, overexpression of a single gene of fatty acid synthesis does not significantly improve the fatty acid biosynthesis [88, 89]. Fatty acid synthesis is regulated by some major classical transcription factors coupling with seed development, including
In addition, other transcription factors have been identified to affect oil content in Arabidopsis, including
1.3. Soybean seed storage protein (SSP) and transcriptional regulation
Soybean seed storage proteins (SSP) have been identified and classified into four basic categories: albumins (water-soluble), globulins (salt-soluble), prolamins (alcohol-soluble) and glutelins (weak acid/weak base-soluble) [113, 114]. Globulin is the main component of SSP and can be divided into four groups according to sedimentation coefficient: 2S (including trypsin inhibitors, cytochrome and other components), 7S (β-conglycinin), 11S (glycinin) and 15S (polymer of glycinin). 7S and 11S are the main components of soybean seed storage protein, accounting for 60–80% of the total [116, 117, 118, 119, 120]. The genetic mechanisms underlying the 7S and 11S globulin subunits are now broadly understood [121, 122, 123, 124]. β-conglycinin accounts for roughly 30–40% of the total seed protein and is mainly composed of α (76 kD), α′ (72 kD) and β (53 kD) subunits [125, 126, 127]. Glycinin accounts for roughly 40–60% of the total seed protein and is mainly composed of G1, G2, G3, G4 and G5 subunits (approximately 56, 54, 54, 64 and 58 kD, respectively) [113, 118, 128]. In the past several years, few QTL mapping studies have been conducted for soybean seed 7S and 11S. The QTL regions for 11S include Gm09 45.6–47.6 Mb and 103.7–105.8 Mb, Gm17 79–81 Mb, Gm19 55.1–57.1 Mb, Gm19 60.3–62.35 Mb and Gm20 81.7–83.7 Mb; the QTL regions for 7S include one QTL for α′-7S located on Gm08 35.7–37.7 Mb and nine QTLs for β-7S located on Gm01 65–104 Mb, Gm03 75.4–77.49 Mb, Gm17 26–81 Mb, Gm19 30–31 Mb, 100.7–115 Mb and Gm20 92–98 Mb [129, 130]. Genes encoding the 11S and 7S subunits have been reported; the genes of the 11S subunits include
Accumulation of soybean seed storage protein is always coupled with that of TAGs, and some key transcription factors are involved in the process. B3-type transcription factors can act directly on the expression of SSP genes. The B3 domain, identified as the DNA-binding motif, recognizes the RY motif (CATGCA) as its target sequence. The RY motif is a cis-acting element found in seed-specific promoters, and most legume seed storage protein genes contain one or more RY repeat elements [65, 128]. Several studies have shown that the binding of the
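The RY motif described above can be located in a promoter sequence with a simple string scan. The following is a minimal sketch, not a published pipeline: the function names are hypothetical, the input is a made-up toy sequence rather than a real soybean promoter, and scanning both strands is an assumption of this example.

```python
RY_MOTIF = "CATGCA"  # RY motif sequence as given in the text


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_ry_motifs(promoter):
    """Return sorted (0-based position, strand) hits of the RY motif.

    A "-" hit means the motif lies on the reverse strand, i.e. its reverse
    complement (TGCATG) is found at that position on the given strand.
    """
    seq = promoter.upper()
    hits = []
    for strand, motif in (("+", RY_MOTIF), ("-", reverse_complement(RY_MOTIF))):
        pos = seq.find(motif)
        while pos != -1:
            hits.append((pos, strand))
            pos = seq.find(motif, pos + 1)  # step by one to allow overlapping hits
    return sorted(hits)


# Toy sequence (hypothetical): one forward-strand and one reverse-strand motif.
print(find_ry_motifs("AACATGCAGGTTTGCATGAA"))
```

Counting the hits returned for a promoter gives a quick estimate of how many RY repeat elements it carries; any threshold for calling a promoter seed-specific would need to come from experimental evidence.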
1.4. Small RNA regulation of seed composition
Small RNAs, such as microRNAs (miRNAs) and short interfering RNAs (siRNAs), are key components of the evolutionarily conserved system of gene regulation in eukaryotes. miRNAs are a class of non-coding small RNAs 20–24 nt in length that play an important role in plant growth and development. Structurally, apart from segment-specific features, all miRNA precursors fold into well-predicted stem-loop hairpin structures with low free energy. miRBase (http://www.mirbase.org/) is a searchable database of published miRNA sequences and annotations. According to miRBase, miRNA information has been collected for 1269 species, including 399 soybean miRNAs. For example, gma-MIR156d belongs to the MIR156 gene family (MIPF0000008), described as
There are few studies on miRNAs related to plant quality traits. Soybean cotyledons affect soybean seed yield and quality. Goettel et al. analyzed 304 miRNA genes expressed in soybean cotyledons and predicted a complex network linking these miRNAs to 1910 genes. By analyzing extensive biological pathways present in soybean cotyledons, the evolutionary pathways of soybean miR15/49 in soybean cotyledons were further demonstrated. Ye et al. performed a genome-wide identification and analysis of endogenous miRNA target mimics (eTMs) and phased siRNA-generating loci (PHAS) in soybean, with a focus on lipid metabolism-related genes. Lipid metabolism was found to be regulated by a potentially complex non-coding network in soybean, of which 28 may be miRNA-regulated and nine may be further regulated.
2. Conclusion and perspectives
With the development of sequencing technology, the genome of the cultivar Williams 82 was released by Schmutz et al., and the assembly quality of this reference genome has been updated year by year. In the present version (
Although many genes and regulators of seed oil content and SSP have been identified and their associated regulatory networks have been well studied in Arabidopsis, they remain unclear in soybean, in addition to
This study was supported by the National Key R&D Program of China (2016YFD0100500, 2016YFD0100300, 2016YFD0100201-21), the National Natural Science Foundation of China (31701449, 31471516, 31401465, 31400074, 31501332), the Natural Science Foundation of Heilongjiang (QC2017013), the Young Innovative Talent training plan of undergraduate colleges and universities in Heilongjiang province (UNPYSCT-2016144), special financial aid to post-doctor research fellow in Heilongjiang (to Qi Zhaoming), the Heilongjiang Funds for Distinguished Young Scientists (JC2016004), the Outstanding Academic Leaders Projects of Harbin, China (2015RQXXJ018), the China Post Doctoral Project (2015M581419), the Dongnongxuezhe Project (to Chen Qingshan), the Young Talent Project (to Qi Zhaoming, 518062) of Northeast Agricultural University and the SIPT project of Northeast Agricultural University (2018-171, 2018-172).