Soybean is one of the most important crops, providing edible oil and plant protein for humans as well as animal feed, because of its high protein and oil content. This review summarizes progress in QTL mapping, candidate gene cloning and functional analysis, and the regulation of soybean oil and seed storage protein accumulation. Furthermore, because the soybean genome has been sequenced and released, multiple omics approaches and advanced biotechnology should be combined and applied for more refined research and high-quality breeding.
- seed oil content
- seed storage protein
Soybean (Glycine max [L.] Merr.) accounts for around 60% of the world’s oilseed consumption and 68% of world protein meal consumption (
1.1. Soybean protein and oil content QTL analysis
Soybean oil and protein content are quantitative traits affected by multiple genes and environmental factors [2, 3]; over 312 soybean oil QTLs and 231 soybean protein QTLs have been detected in different populations and environments (SoyBase,
However, soybean oil and protein content are generally negatively correlated [21, 22]: observations and data from many classical genetic analyses show that high-oil varieties have lower protein content and high-protein varieties have lower oil content. Many classical genetics and breeding books and datasets also note this negative relationship between soybean oil and protein content [2, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]. Although it is very hard to find a locus that increases soybean oil and protein content at the same time, the large body of QTL mapping results shows that a few regions contribute in the same direction to oil and protein content within the same genetic population. Orf et al. mapped an additive QTL affecting soybean oil content at 39.5–41.2 Mb of Gm05 in a population derived from a cross between Minsoy and Noir1; the results implied that Minsoy carries the positive alleles for increasing soybean oil and protein content. However, Specht et al. identified a similar region with the opposite result, namely that Noir1 carries the positive alleles. Hyten et al. identified a QTL at 4.8–8.7 Mb of Gm07 for which the parent Williams carries the positive alleles for both traits. Reinprecht et al. also demonstrated that the variety OX948 carries the positive alleles. Mao et al. identified additive QTLs affecting soybean oil and protein content at 51.2–56.3 Mb of Gm01, 1.0–2.3 Mb of Gm09 and 39.4–46.1 Mb of Gm19 in a population derived from a cross between Hefeng47 and Heinong37, which indicated that Heinong37 carries positive alleles in those regions that could increase soybean oil and protein content at the same time. Based on published data, Heinong37 is the only Chinese variety that may carry positive alleles for both traits.
1.2. Soybean fatty acid composition biosynthesis and transcriptional regulation
The accumulation of starch, lipid and protein supplies the raw materials and energy for soybean seed growth and maturation. Lipid is one of these three major storage materials; although the biochemical pathway of lipid synthesis has been studied thoroughly, its regulatory mechanism remains unclear [41, 42, 43, 44, 45, 46, 47]. De novo synthesis of fatty acids starts mainly in the plastid. Acetyl-CoA, the precursor of soybean seed fatty acid synthesis, is an important intermediate of many cellular metabolic pathways and is synthesized in large amounts in plant cells. Acetyl-CoA carboxylase (ACCase) catalyzes the first committed step of fatty acid synthesis, the carboxylation of acetyl-CoA to malonyl-CoA. Malonyl-CoA is then used by the fatty acid synthase (FAS) complex in a series of condensation reactions that extend the acyl chain by two carbons per cycle. The growing acyl chain is bound to acyl carrier protein (ACP), and elongation is terminated by an acyl-ACP thioesterase or acyltransferase acting on the acyl-ACP. Acyl chains of different lengths are then converted to acyl-CoA by acyl-CoA synthetase and transferred from the plastid to the endoplasmic reticulum or the cytoplasm. Finally, the fatty acids are attached to glycerol by three different acyltransferases to synthesize triacylglycerols (TAGs) [49, 50, 51, 52]. To date, seed oil content has been increased by changing the expression levels of individual enzymes involved in oil metabolism [53, 54, 55, 56, 57, 58, 59]. In particular, diacylglycerol acyltransferase 1 (DGAT1) is a key enzyme responsible for TAG assembly [59, 60, 61], and expression of DGAT1 can be used to draw fatty acids into TAG; overexpression of DGAT1 increased both seed oil content (by 9–12%) and seed weight (by 40–100%) in Arabidopsis. Overexpression of TmDGAT1a and TmDGAT1b increased soybean seed oil content.
SiDGAT1, which encodes an acyl-CoA:diacylglycerol acyltransferase, could also increase soybean seed oil content. Expression of VgDGAT1A (from Vernonia galamensis) markedly increased soybean oil content. Furthermore, a rate-limiting component of fatty acid biosynthesis in dicotyledonous plants is biotin carboxylase (BC), a vital subunit of acetyl-CoA carboxylase (ACCase). Li et al. cloned four genes encoding BC from Brassica napus and elucidated the evolution and regulation of ACCase in Brassica. The cytosolic enzyme glyceraldehyde-3-phosphate dehydrogenase (GAPC) catalyzes a key reaction in glycolysis, and its levels are directly correlated with seed oil accumulation.
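As a small worked example of the two-carbons-per-cycle elongation by FAS described above, the sketch below counts how many elongation cycles are needed to reach a given chain length. It assumes the chain starts from a two-carbon acetyl primer, which is a standard textbook assumption and is not stated explicitly in this review; the function name `fas_cycles` is purely illustrative.

```python
def fas_cycles(chain_length: int, primer_carbons: int = 2,
               carbons_per_cycle: int = 2) -> int:
    """Count the elongation cycles needed to build an acyl chain.

    Assumption (not from the review): elongation starts from a
    two-carbon acetyl primer and each cycle adds exactly two carbons.
    """
    if chain_length < primer_carbons:
        raise ValueError("chain is shorter than the primer")
    extra = chain_length - primer_carbons
    if extra % carbons_per_cycle != 0:
        raise ValueError("chain length cannot be reached in whole cycles")
    return extra // carbons_per_cycle


# Two of the saturated fatty acids discussed in this section.
for name, carbons in [("palmitic (C16:0)", 16), ("stearic (C18:0)", 18)]:
    print(f"{name}: {fas_cycles(carbons)} cycles")
```

Under these assumptions, a C16 chain requires seven cycles and a C18 chain eight, which is consistent with palmitic acid being elongated by one further cycle to stearic acid.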
Fatty acid composition is determined mainly by five fatty acids: palmitic (C16:0), stearic (C18:0), oleic (C18:1), linoleic (C18:2) and linolenic (C18:3) [67, 68]. Most palmitic acid (C16:0) produced by the type II synthase is elongated to stearic acid (C18:0) [67, 69]. In recent decades, many QTLs have been reported for each fatty acid component, and several "hot regions" have emerged. For seed linoleic acid, these include Gm05 39.36–40.87 Mb and Gm18 48.35–50.78 Mb (original QTLs from Diers and Shoemaker; Bachlava et al.; Li et al.; Xie et al.). For seed linolenic acid, they include Gm02 17.07–34.9 Mb, Gm09 34.56–37.74 Mb, Gm14 17.08–39.5 Mb and 45.68–46.78 Mb, Gm15 6.7–7.71 Mb and 13.07–25.6 Mb, and Gm19 35.75–37.38 Mb (original QTLs from Li et al.; Bachlava et al.; Diers and Shoemaker; Spencer et al.; Reinprecht et al.; Xie et al.; Shibata et al.; Hyten et al.). For seed oleic acid, they include Gm05 39.07–40.80 Mb and Gm18 49.24–51.95 Mb (original QTLs from Diers and Shoemaker; Reinprecht et al.; Xie et al.). For seed palmitic acid, they include Gm05 2.84–3.92 Mb, Gm09 7.74–11.83 Mb and 34.59–38.73 Mb, Gm15 9.13–13.16 Mb, Gm17 7.60–9.45 Mb and Gm18 38.38–41.09 Mb (original QTLs from Li et al.; Wang et al.; Xie et al.; Hyten et al.; Li et al.; Kim et al.; Reinprecht et al.). In soybean, stearoyl-acyl carrier protein desaturase (SAD) catalyzes the first desaturation step in seed oil biosynthesis, converting stearoyl-ACP to oleoyl-ACP, and plays a key role in determining the ratio of total saturated to unsaturated fatty acids in plants [35, 78, 79]. Then, microsomal oleate desaturase (FAD2) and linoleoyl desaturase (FAD3) sequentially desaturate oleic acid to linoleic acid and linoleic acid to α-linolenic acid, mainly at the sn-2 position, and fatty acid elongase converts fatty acids into long-chain fatty acids.
The FAD2 gene family of soybean consists of at least five members in four genomic regions and is responsible for the conversion of oleic acid to linoleic acid [81, 82, 83, 84]. The FAD3 enzyme contributes to the synthesis of α-linolenic acid (18:3) in the polyunsaturated fatty acid pathway. One aim in improving soybean oil quality is to reduce the percentage of α-linolenic acid. GmFAD3 mutants have reduced α-linolenic acid content in soybean seed oil, which has been verified in many studies [58, 85, 86, 87].
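The linoleic and oleic acid "hot regions" listed above fall on the same chromosomes (Gm05 and Gm18), so it is natural to ask where they physically overlap. The following is a minimal sketch of that interval comparison; the coordinates are copied from the text, while the function names and the dictionary layout are illustrative assumptions rather than any established tool.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # (start_Mb, end_Mb)

# Hot regions copied from the text (chromosome -> interval in Mb).
linoleic: Dict[str, Interval] = {"Gm05": (39.36, 40.87), "Gm18": (48.35, 50.78)}
oleic: Dict[str, Interval] = {"Gm05": (39.07, 40.80), "Gm18": (49.24, 51.95)}


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the shared part of two intervals, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def shared_regions(trait1: Dict[str, Interval],
                   trait2: Dict[str, Interval]) -> Dict[str, Interval]:
    """Collect overlapping intervals on chromosomes present for both traits."""
    shared = {}
    for chrom in sorted(trait1.keys() & trait2.keys()):
        hit = overlap(trait1[chrom], trait2[chrom])
        if hit is not None:
            shared[chrom] = hit
    return shared


for chrom, (start, end) in shared_regions(linoleic, oleic).items():
    print(f"{chrom}: {start:.2f}-{end:.2f} Mb")
```

With the intervals given in the text, the shared segments are Gm05 39.36–40.80 Mb and Gm18 49.24–50.78 Mb. Physical overlap alone does not show that the same gene underlies both traits; it only flags regions worth closer inspection.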
However, overexpression of a single fatty acid synthesis gene does not significantly improve fatty acid biosynthesis [88, 89]. Fatty acid synthesis is regulated by several major classical transcription factors coupled with seed development, including WRINKLED1 (WRI1), LEAFY COTYLEDON1 (LEC1), LEC2, ABSCISIC ACID INSENSITIVE3 (ABI3) and FUSCA3 (FUS3) [90, 91, 92, 93, 94, 95]. LEC2, ABI3 and FUS3 belong to the plant-specific B3 transcription factor family, LEC1 is an NFY-B-type (CCAAT-binding factor-type) transcription factor, and WRI1 encodes a transcription factor of the APETALA2-ethylene responsive element-binding protein (AP2-EREBP) family. WRI1 is a potential global regulator of de novo fatty acid biosynthesis and, as a direct target of LEC2, specifies the regulatory action of LEC2. Overexpression of WRI1, which controls the expression of genes involved in lipid metabolism, including glycolysis and fatty acid biosynthesis, increased seed oil content by 10–20% compared to the wild type [40, 90, 98, 99, 100, 101]. LEC1 function is partially dependent on ABI3, FUS3 and WRI1 in the regulation of fatty acid biosynthesis; both LEC1 and LEC1-like genes act as key regulators that coordinate the expression of fatty acid biosynthetic genes. LEC2 can regulate WRI1 directly and is necessary for the regulation of fatty acid metabolism. Ectopic expression of FUS3 can trigger the expression of fatty acid biosynthetic genes, and the interaction of FUS3 and AKIN10 positively regulates auxin biosynthesis and indirectly regulates fatty acid biosynthesis.
Furthermore, a few new soybean transcription factors involved in fatty acid biosynthesis have been identified in recent years. GmbZIP123 regulates lipid accumulation indirectly through sugar translocation; GmMYB73 functions as a repressor of the negative regulator GLABRA2 (GL2) and relieves GL2-inhibited expression of PLDα1 to accelerate the conversion of phosphatidylcholine to TAG; GmZF351 improves oil accumulation by directly activating WRI1, BCCP2, KASIII, TAG1 and OLEO2; GmNFYA was identified as increasing seed oil content based on RNA-seq and gene coexpression networks; and GmDOF4 and GmDOF11 increase seed lipid content by directly activating lipid biosynthesis genes [41, 105]. Recently, understanding of the regulatory mechanisms of seed oil content has been updated through studies of duplicated genes in soybean.
In addition, other transcription factors have been identified to affect oil content in Arabidopsis, including GL2, TT1, TT2, bZIP67, MED, MYB [58, 107, 108] and BASS2 [43, 107, 108, 109, 110, 111, 112].
1.3. Soybean seed storage protein (SSP) and transcriptional regulation
Soybean seed storage proteins (SSP) have been classified into four basic categories: albumins (water-soluble), globulins (salt-soluble), prolamins (alcohol-soluble) and glutelins (weak acid/weak base-soluble) [113, 114]. Globulin is the main component of SSP and can be divided into four groups according to sedimentation coefficient: 2S (including trypsin inhibitors, cytochrome and other components), 7S (β-conglycinin), 11S (glycinin) and 15S (polymer of glycinin). 7S and 11S are the main components of soybean seed storage protein and account for 60–80% of the total [116, 117, 118, 119, 120]. To date, the genetic mechanisms of the 7S and 11S globulin subunits are generally clear [121, 122, 123, 124]. β-conglycinin accounts for roughly 30–40% of the total seed protein and is mainly composed of α (76 kD), α’ (72 kD) and β (53 kD) subunits [125, 126, 127]. Glycinin accounts for roughly 40–60% of the total seed protein and is mainly composed of G1, G2, G3, G4 and G5 subunits (approximately 56, 54, 54, 64 and 58 kD, respectively) [113, 118, 128]. In the past several years, a few QTL mapping studies have been conducted for soybean seed 7S and 11S. The QTL regions for 11S include Gm09 45.6–47.6 Mb and 103.7–105.8 Mb, Gm17 79–81 Mb, Gm19 55.1–57.1 Mb and 60.3–62.35 Mb, and Gm20 81.7–83.7 Mb; the QTL regions for 7S include one QTL for α’-7S located on Gm08 35.7–37.7 Mb and nine QTLs for β-7S located on Gm01 65–104 Mb, Gm03 75.4–77.49 Mb, Gm17 26–81 Mb, Gm19 30–31 Mb and 100.7–115 Mb, and Gm20 92–98 Mb [129, 130]. The genes encoding 11S and 7S have been reported: the 11S subunit genes include Gy1, Gy2, Gy3, Gy4, Gy5 and Gy7, and the 7S subunit genes mainly include CG-alpha-1 (7sα), CG-alpha’-1 (7sα’) and CG-beta-1 (7sβ) [131, 132, 133, 134]. Three genes encoding 11S, AtCRU1, AtCRU2 and AtCRU3, have been verified in Arabidopsis thaliana. Wang et al. mapped a QTL, qBSC-1 (7S), that could regulate SSP. Knockdown of 7S globulin subunits can change the nitrogen content of transgenic soybean seeds. Furthermore, the ratio of 11S to 7S ranges from 0.5 to 1.7 among soybean cultivars and directly affects the nutritional quality and functional properties of soybean seed storage protein [138, 139]. Interestingly, the contents of 7S and 11S are significantly negatively correlated. Yang et al. demonstrated that the lack of 11S4A induced compensatory accumulation of 7S globulins. Adjusting the subunit composition of soybean seed storage protein can efficiently remove allergenic proteins and is also an approach to improving the nutritional quality, production and processing of soy protein [42, 103, 142, 143].
Accumulation of soybean seed storage protein is always coupled with that of TAGs, and some key transcription factors are involved in the process. B3-type transcription factors can act directly on the expression of SSP genes. The B3 domain, identified as the DNA-binding motif, recognizes the RY motif (CATGCA) as its target sequence. The RY motif is a cis-acting element in seed-specific promoters, and most legume seed storage protein genes contain one or more RY repeat elements [65, 128]. Several studies have shown that binding of ABI3 to the RY motif can regulate the accumulation of storage proteins in Arabidopsis seeds [147, 148, 149, 150]. The seed-specific B3 domain transcription factors LEC2, FUS3 and ABI3 have been identified, and mutations in these genes often reduce the accumulation of seed storage proteins [151, 152, 153, 154]. In addition to ABI3, ABI4 and LEC1 also interact to regulate SSP [96, 155]. Some previous studies showed that these genes directly affect the induction of storage protein gene expression [156, 157, 158, 159]. Furthermore, expression of OLEOSIN requires activation by LEC2 and two RY elements in its promoter. Both LEC1 and LEC2 act as positive regulators upstream of ABI3 and FUS3, and functional analysis showed that they influence the expression of SSP genes [44, 153, 158, 160, 161]. LEC1 and L1L can activate the promoter of CRUCIFERIN C (CRC), and LEC1 can also regulate CRC and other SSP genes together with FUS3 and ABI3. In addition to RY motifs, the presence of G-box elements is also required for proper activation of the target promoters of LEC1, LEC2, ABI3 and FUS3. Some studies showed that LEC2, ABI3 and FUS3 collaborate with bZIP transcription factors that interact with these G-box elements to activate SSP genes [163, 164]. Furthermore, GmDOF4 and GmDOF11 can bind to the promoter of CRA1 to regulate the expression of SSP.
GmDREBL can be upregulated by GmABI3 and GmABI5 and is regulated during the late stage of SSP gene expression. DGAT can reduce the soluble carbohydrate content of mature seeds and increase the seed protein content at the same time. Therefore, in addition to WRI1, LEC1, LEC2, ABI3 and FUS3, transcription factors of the MYB, bZIP, MADS, DOF or AP2 families are also involved, as partners or direct target genes, in the accumulation of storage compounds (oil and SSPs) and in the regulatory network of seed development.
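Because the B3 domain recognizes the RY motif (CATGCA), a first step when screening a candidate promoter is often simply to locate RY elements on both strands. The sketch below illustrates this under stated assumptions: the promoter string is a made-up sequence for demonstration only, and the helper names are hypothetical.

```python
import re
from typing import List, Tuple

RY_MOTIF = "CATGCA"  # RY motif sequence given in the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq: str, motif: str = RY_MOTIF) -> List[Tuple[int, str]]:
    """Return sorted (0-based start, strand) hits of the motif on both strands.

    A minus-strand hit is reported at the position where the reverse
    complement of the motif starts on the given (plus) strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # A lookahead is used so that overlapping occurrences are all found.
        for match in re.finditer(f"(?={re.escape(pattern)})", seq):
            hits.append((match.start(), strand))
    return sorted(hits)


# Hypothetical promoter fragment, invented purely for illustration.
promoter = "TTGACATGCATGCAAATTTGCATGGC"
for position, strand in find_motif(promoter):
    print(f"RY element at {position} ({strand} strand)")
```

A sequence match of this kind only indicates a candidate site; whether a given RY element is bound by ABI3, FUS3 or LEC2 in vivo has to be established experimentally.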
1.4. Small RNA regulation of seed composition
Small RNAs, such as miRNAs and short interfering RNAs (siRNAs), are key components of the evolutionarily conserved system of gene regulation in eukaryotes. Among them, microRNAs (miRNAs) are a class of non-coding small RNAs of 20–24 nt in length that play an important role in plant growth and development. Structurally, apart from the characteristics of the mature segments, all miRNA precursors have well-predicted stem-loop hairpin structures, and this fold-back hairpin structure has a low free energy. The microRNA database (
There are few studies on miRNAs related to plant quality. Soybean cotyledons affect soybean seed yield and quality. Goettel et al. analyzed 304 miRNA genes expressed in soybean cotyledons and predicted complex miRNA networks targeting 1910 genes. By analyzing the extensive biological pathways present in soybean cotyledons, the evolutionary pathways of soybean miR15/49 in cotyledons were further demonstrated. Ye et al. performed genome-wide identification and analysis of miRNA endogenous target mimics (eTMs) and phased siRNA-generating loci (PHAS) in soybean, with a focus on lipid metabolism-related genes. Lipid metabolism was found to be regulated by a potentially complex non-coding network in soybean, of which 28 may be miRNA-regulated and nine may be further regulated.
2. Conclusion and perspectives
With the development of soybean genome sequencing, the genome of the cultivar Williams 82 was released by Schmutz et al., and the assembly quality of the reference genome has been updated year by year. In the present version (Glycine max Wm82.a2.v1), 56,044 protein-coding loci and 88,647 transcripts have been predicted, and all related data have been released in Phytozome (
Although many genes and regulators of seed oil content and SSP have been identified and their regulatory networks have been well studied in Arabidopsis, apart from WRI1, LEC1, LEC2, ABI3 and FUS3 they remain unclear in soybean because of the 75% duplication of its genome. Multiple omics approaches (genomics, functional genomics, transcriptomics, proteomics and epigenomics) and advanced biotechnology (genome editing) need to be combined and applied to clarify the genes and regulatory networks underlying soybean seed oil content and SSP. Secondary populations, including recombinant heterozygous lines (RHL), chromosome segment substitution lines (CSSL) and/or near-isogenic lines (NIL), need to be used to reduce background variation when analyzing the effects of single genes or transcription factors, and to identify effective alleles and evaluate their effects and contributions. Combinations of general loci could further be used to design selection chip assays, which may lay the foundation for high-oil or high-seed-storage-protein breeding.
This study was supported by the National Key R&D Program of China (2016YFD0100500, 2016YFD0100300, 2016YFD0100201-21), the National Natural Science Foundation of China (31701449, 31471516, 31401465, 31400074, 31501332), the Natural Science Foundation of Heilongjiang (QC2017013), the Young Innovative Talent training plan of undergraduate colleges and universities in Heilongjiang province (UNPYSCT-2016144), special financial aid to post-doctor research fellow in Heilongjiang (to Qi Zhaoming), the Heilongjiang Funds for Distinguished Young Scientists (JC2016004), the Outstanding Academic Leaders Projects of Harbin, China (2015RQXXJ018), the China Post Doctoral Project (2015M581419), the Dongnongxuezhe Project (to Chen Qingshan), the Young Talent Project (to Qi Zhaoming, 518062) of Northeast Agricultural University and the SIPT project of Northeast Agricultural University (2018-171, 2018-172).