Recombinant inbred rodents form immortal genome-types that can be resampled deeply at many stages, in both sexes, and under multiple experimental conditions to model genome-environment interactions and to test genome-phenome predictions. This allows for experimental precision medicine, for which sophisticated causal models of complex interactions among DNA variants, phenotype variants at many levels, and innumerable environmental factors are required. Large families and populations of isogenic lines of mice and rats are now available and have been used across fields of biology. We will use the BXD recombinant inbred family and their derived diallel cross population as an example for predictive, experimental precision medicine and biology.
- experimental precision medicine
- systems genetics
- personalized medicine
- recombinant inbred strains
- diallel cross
One of the major objectives of modern biology and medicine is prediction: being able to take information about an individual’s genome and environment and accurately predict their phenotype. This effort has taken many forms and many names in different fields over time, including population genetics, statistical genetics, quantitative genetics, genetical genomics, complex trait analysis, systems genetics [5, 6], systems medicine [7, 8], personalized medicine, predictive medicine, and precision medicine [10, 11]. In humans, this has been greatly constrained by the
How then, if there is so much complication in this one-to-one relationship (one gene variant to one phenotype), can we uncover the true many-to-many-to-many relationships that occur in biology? Phenotypes at many levels, including behavior, organ systems, cells, proteins, metabolites, and mRNAs, interact with sets of many gene variants and with an individual’s current and previous environmental exposures. We need to understand gene–gene (epistasis), gene–age, gene–sex, gene–treatment, and gene–environment interactions, and all their combinations. One answer lies in the use of recombinant inbred (RI) populations and their derivatives.
2. Recombinant inbred families
Recombinant inbred (RI) populations are a seemingly simple idea: two inbred strains are crossed, and their F1 progeny are then intercrossed to produce an F2. Pairs of these F2 animals are mated, and new lines are established through repeated rounds of sib-mating (Figure 1A). By generation F20, we have a population of strains that are each ~99% inbred. Each strain is a unique mosaic of homozygous genetic regions from both parents, and an effectively unlimited number of genetically identical individuals can be produced from each one [24, 25]. This combination of genetic variability between strains and identical genomes within strains allows linkage between genotype and phenotype to be mapped. The design has been extended in a variety of ways. One is increasing the number of parental strains (e.g. the 8 founders used for the Collaborative Cross mice [27, 28]) to increase the number of variants that segregate in the population. Another is using multiple rounds of crossing before inbreeding, producing so-called Advanced Intercross RI strains (AI-RI), to increase the number of recombinations and therefore the precision of mapping (Figure 1B). Although RI strains were first developed in mice, and it is mice that we will concentrate on in this chapter, the design has now been used for a wide variety of organisms, including
These RI families are an essential complement to data collected in humans, allowing us to build experimental platforms for what is now called precision medicine. Each isogenic RI strain within a family is effectively an immortal genome-type. This is important because it allows the same genome to be resampled using any tissue, at any age, with any method, and under any environmental exposure or treatment the researcher chooses. This allows us to model higher-order genome-environment interactions: the many-to-many-to-many problem stated above.
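The ~99% figure for sib-mated lines can be checked with the classical recursion for the inbreeding coefficient under full-sib mating, F_t = (1 + 2F_{t-1} + F_{t-2}) / 4. The sketch below simplifies the RI breeding scheme. It assumes a founding pair that is unrelated and non-inbred (F = 0) and counts generations of sib mating. It therefore gives a ballpark value, not an exact match to the F20 generation count used above.

```python
def sib_mating_inbreeding(generations: int) -> list[float]:
    """Expected inbreeding coefficient after each generation of full-sib mating.

    Classical recursion F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4, assuming the
    founding pair is unrelated and non-inbred (F_{-1} = F_0 = 0).
    Returns [F_0, F_1, ..., F_generations].
    """
    f_two_back, f_one_back = 0.0, 0.0  # F_{t-2}, F_{t-1}
    history = [0.0]                    # F_0
    for _ in range(generations):
        f_t = (1.0 + 2.0 * f_one_back + f_two_back) / 4.0
        history.append(f_t)
        f_two_back, f_one_back = f_one_back, f_t
    return history


f = sib_mating_inbreeding(20)
for t in (5, 10, 20):
    print(t, round(f[t], 3))
# 5 0.672
# 10 0.886
# 20 0.986
```

Under these assumptions, about 1.4% residual heterozygosity remains after 20 generations of sib mating. This is in line with the "~99% inbred" description.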
Whereas in human cohorts we have to imagine a counterfactual (e.g. what would have happened had I exercised more?), in isogenic strains we can effectively run this counterfactual. Almost perfectly genomically and environmentally matched individuals can be phenotyped with only a single environmental perturbation between them. Even better, we can include multiple replicates of these identical genome-types within each arm of the study. This reduces the effect of unwanted environmental perturbations and increases our power to detect true associations. However, in some sense, this is still an
The goal is accurate genome-phenome prediction. With this goal in mind, we will use the BXD family of isogenic mouse strains as our example of how this can be achieved. The BXDs are by a wide margin the largest and most deeply phenotyped mammalian family and can be used as a testbed for experimental precision medicine.
3. The BXD family
The BXD family were among the first RI strains to be produced [24, 42, 43]. This work was started by Benjamin A. Taylor, who crossed female C57BL/6J (B6 or B) and male DBA/2J (D2 or D) strains, hence BXD (Figure 1A). The first sets of BXDs were intended for mapping Mendelian loci [42, 44]. However, the family was also used to map complex traits such as cancer and cardiovascular disease [45, 46, 47, 48], variation in CNS structure [49, 50, 51, 52], and behavioral and pharmacological differences [53, 54, 55, 56, 57, 58, 59, 60, 61, 62]. Twenty-seven of the original 32 BXD strains are still available from The Jackson Laboratory (JAX). In the mid-1990s, Taylor began production of a second set of BXDs and added nine new strains (BXD33–BXD42). BXD1–BXD42 carry the strain suffix “/TyJ”.
We started production of another wave of BXDs at UTHSC in the late 1990s. These new lines were derived from advanced intercross (AI) progeny that had accumulated chromosomal recombination events across 8 to 14 generations (Figure 1B). These AI-derived BXDs incorporate roughly twice as many recombinations between parental genomes as conventional F2-derived BXDs do [63, 64, 65, 66, 67], which improves mapping precision nearly two-fold. Strains BXD43 and above from UTHSC were donated to JAX once fully inbred, and carry the strain suffix “/RwwJ”.
The BXD family has been used to define specific genes and even sequence variants corresponding to 20 or more QTLs. These include two tightly linked genes,
Two things now set the BXD family apart from all other recombinant inbred populations: the number of strains within the family, and the deep, coherent phenome that has been collected for them.
3.1 The largest mammalian recombinant inbred family
The BXD family is the largest mammalian recombinant inbred population, having expanded during its lifetime from ~20, to ~35, to ~80, to a total of 198 strains with data on GeneNetwork.org. There are 123 BXD strains currently distributed by The Jackson Laboratory (JAX) and an additional seventeen strains available at UTHSC, soon to be donated to JAX. All 140 of these strains are available under a standard material transfer agreement. This expanded number of easily accessible strains increases the power and precision of linkage studies.
As the number of strains increases, so does the number of recombination junctions within the population. Consequently, quantitative trait loci (QTLs) can be narrowed down to smaller intervals. This is improved further by the fact that approximately half of the BXD family are derived from advanced intercrosses, each of which carries more recombinations than its F2-derived cousins. We have demonstrated that when using approximately half of the family (60–80 strains), precision is close to 1 Mb for many traits. This is also partially due to two other features of the family. The first, common to all RIs, is that the effective heritability of the trait can be boosted by resampling the same genome-type. The second is that, because there are only two parents in the population, the two haplotypes are well balanced across the genome (the mean minor allele frequency is ~0.44).
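The boost from resampling can be made concrete with a standard expression for the heritability of strain means: h²_mean = σ²_G / (σ²_G + σ²_E / n), where n is the number of replicate animals per strain. The sketch below assumes individual phenotypic variance scaled to 1 and environmental deviations that are independent across replicates. It is an illustration under those assumptions, not the calculation used in the studies cited here.

```python
def strain_mean_heritability(h2: float, n_replicates: int) -> float:
    """Heritability of strain means when each isogenic strain is phenotyped n times.

    Simple model: individual phenotypic variance is scaled to 1 and split into
    genetic variance h2 and environmental variance (1 - h2). Environmental
    deviations are assumed independent across replicate animals, so averaging
    n of them divides the environmental variance by n.
    """
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("h2 must lie in [0, 1]")
    if n_replicates < 1:
        raise ValueError("n_replicates must be at least 1")
    var_g = h2
    var_e_of_mean = (1.0 - h2) / n_replicates
    return var_g / (var_g + var_e_of_mean)


for n in (1, 2, 4, 8):
    print(n, round(strain_mean_heritability(0.25, n), 2))
```

For a trait with an individual-level heritability of 0.25, this gives 0.40 with two replicates per strain and about 0.57 with four.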
When carrying out QTL mapping, the largest gain in power comes from increasing the number of genome-types tested [38, 73]. Therefore, as the largest RI family, the BXD have the most power to detect genotype–phenotype linkage. A simple app for estimating power to detect QTLs in the BXD is available at http://power.genenetwork.org. When we examine power in the BXD family, we see a fact that might seem counter-intuitive to some. Power is always increased more by adding strains than by adding within-strain biological replicates, even when heritability is low. Even at low-to-moderate heritabilities, increasing the number of within-strain replicates above 6 gives very little improvement in power.
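The diminishing return from extra replicates follows from a simple strain-mean model. Replicates shrink only the environmental term σ²_E / n, whereas each added strain contributes a new independent observation of the genotype–phenotype relationship. The sketch below is a deliberately crude normal approximation and is not the model behind power.genenetwork.org. It assumes a QTL that explains a fixed share of the genetic variance and a noncentrality of m·R² / (1 − R²) for m strains. The genome-wide threshold of α = 5 × 10⁻⁵ is a hypothetical placeholder.

```python
from statistics import NormalDist


def approx_qtl_power(n_strains: int, n_replicates: int, h2: float,
                     qtl_share: float, alpha: float = 5e-5) -> float:
    """Crude power approximation for detecting one QTL from strain means.

    Illustrative assumptions only:
      * individual phenotypic variance is 1 and broad-sense heritability is h2;
      * the QTL explains `qtl_share` of the genetic variance;
      * strain means are regressed on QTL genotype, with noncentrality
        lambda = m * R2 / (1 - R2) for m strains and a normal approximation;
      * alpha is a placeholder genome-wide threshold, not a calibrated one.
    """
    if not (0.0 < h2 < 1.0 and 0.0 < qtl_share <= 1.0):
        raise ValueError("need 0 < h2 < 1 and 0 < qtl_share <= 1")
    var_g, var_e = h2, 1.0 - h2
    # Share of the variance among strain means that is due to the QTL.
    r2 = qtl_share * var_g / (var_g + var_e / n_replicates)
    lam = n_strains * r2 / (1.0 - r2)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(lam ** 0.5 - z_crit)


# Same total of 160 animals, allocated two ways.
print(round(approx_qtl_power(80, 2, h2=0.3, qtl_share=0.2), 3))
print(round(approx_qtl_power(40, 4, h2=0.3, qtl_share=0.2), 3))
```

In this example, 160 animals spread over 80 strains with 2 replicates give higher approximate power than 40 strains with 4 replicates. Adding replicates can never push R² above the QTL's share of the genetic variance, whereas adding strains scales the noncentrality directly.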
We should also note that the effect sizes seen in the BXD family (and other two-parent RIs) may appear high. This is expected, because effect size is highly dependent upon the population being studied. Effect sizes measured in families of inbred lines are typically much higher than those measured in an otherwise matched analysis of intercrosses, heterogeneous stock, or diversity outbred stock. Two factors contribute to the higher proportion of variance explained by loci in inbred panels. The first is replication. When effect size is expressed as the proportion of the total variance among strain means explained by the QTL, it will increase as environmental effects decrease due to replication. That is, resampling decreases the standard error of the mean, suppressing environmental “noise”. This is in addition to the increase in heritability described above (i.e. an increase in the proportion of total variance explained by genomic variance).
The second reason is that nearly all loci in inbred panels are homozygous. As a result, the same number of sampled animals will account for twice as much genetic variance as in an F2 cross, and four times as much as in a backcross. When phenotyping fully homozygous strains, we are only examining the extreme ends of the genotype distribution, which boosts power to detect additive effects. The downside is obvious: we cannot detect non-additive effects. However, if we add members of the diallel cross population (DAX), we can estimate both dominance and parent-of-origin effects, a topic we return to later.
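The two-fold and four-fold figures follow directly from the genotype frequencies at a single additive locus. The sketch below codes genotypic values as -a, 0, and +a (no dominance). It computes the variance for three cases:

- an inbred panel (BB and DD only, 1:1);
- an F2 (1:2:1);
- a backcross to DBA/2J (BD and DD, 1:1), chosen here purely for illustration.

Exact fractions are used so the ratios are easy to see.

```python
from fractions import Fraction


def additive_variance(genotype_freqs: dict[int, Fraction],
                      a: Fraction = Fraction(1)) -> Fraction:
    """Variance of genotypic values at one purely additive locus.

    Genotypes are coded by the number of D alleles (0 = BB, 1 = BD, 2 = DD)
    with values -a, 0 and +a, i.e. no dominance.
    """
    mean = sum(f * (g - 1) * a for g, f in genotype_freqs.items())
    return sum(f * ((g - 1) * a - mean) ** 2 for g, f in genotype_freqs.items())


half, quarter = Fraction(1, 2), Fraction(1, 4)
inbred_panel = {0: half, 2: half}                  # RI strains: BB or DD, 1:1
f2_intercross = {0: quarter, 1: half, 2: quarter}  # F2: 1:2:1
backcross_to_d = {1: half, 2: half}                # F1 x DBA/2J: BD or DD, 1:1

print(additive_variance(inbred_panel),
      additive_variance(f2_intercross),
      additive_variance(backcross_to_d))
# 1 1/2 1/4
```

The variances are a², a²/2, and a²/4, matching the ratios given above.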
3.2 The deepest phenome for any family
As well as being the largest recombinant inbred family, the BXD are also the most deeply phenotyped. Over 40 years of data are now openly and publicly available at genenetwork.org, providing an unrivaled resource. This dense and well-integrated phenome consists of over 10,000 classical phenotypes. The phenome extends from Taylor’s 1973 analysis of cadmium toxicity to recent quantitative studies of addiction [84, 85, 86], behavior [87, 88, 89, 90], vision, infectious disease [92, 93, 94], epigenetics [95, 96], and even indirect genetic effects [97, 98, 99]. The BXDs have been used to test specific developmental and evolutionary hypotheses [49, 100, 101]. They have allowed the study of gene-by-environment interactions. Exposures studied include alcohol and drugs of abuse [86, 102, 103, 104, 105], infectious agents [71, 106, 107, 108, 109], dietary modifications [110, 111, 112, 113, 114, 115], and stress [116, 117]. The consequences of interventions and treatments as a function of genome, diet, age, and sex have been quantified [90, 96, 115, 118, 119, 120], and gene pleiotropy has been identified.
Beyond this, there are now extensive omics data for the BXD. Both parents have been fully sequenced [75, 122, 123], and deep linked-read and long-read sequencing of 152 members of the BXD family is underway. Over 100 transcriptome datasets are available (e.g., [124, 125]), as well as more recent miRNA [84, 126], proteome [118, 120, 127], metabolome [75, 118, 125], epigenome [95, 128], and metagenome [93, 129] profiles. Nevertheless, much remains to be done, as many of these measures have only been taken in the liver or in specific brain regions [118, 120]. However, as each new dataset is added, it will be fully coherent with previous datasets, multiplicatively increasing the usefulness of the whole phenome.
Access to this plethora of data is freely available from open-source web services, allowing users to download the data, or to make use of powerful statistical tools designed for global analyses that are integrated into websites (e.g. GeneNetwork.org, bxd.vital-it.ch, and Systems-Genetics.org) [125, 130, 131].
It cannot be overstated how important it is that those using the BXDs gain access to coherent genomes and quantitative phenomes generated under diverse laboratory and environmental conditions [83, 132]. New data can be compared to thousands of publicly available quantitative traits. With each addition, the number of possible network connections grows quadratically, enabling powerful multi-systems analysis for all users [73, 111, 112, 118, 125, 133]. Causal pathways can be traced from genome variants, to gene expression, to metabolite levels, to phenotype. Within minutes of finding a gene of interest, a researcher can look for correlations between its expression and that of thousands of other genes, across dozens of tissues. Enrichment analysis can then be carried out on these ‘gene-friends’, suggesting pathways and networks with which the gene of interest may be associated. Correlations can be found between the expression of the gene and over 10,000 phenotypes, suggesting its role at the whole-organism level. Shared QTLs, where both the gene expression and a phenotype of interest are associated with the same locus, provide strong evidence of a genetic link. Using GeneNetwork.org, we can build biological networks with just a few keystrokes and without touching a lab bench [134, 135, 136]. These networks move from genetic variant, to expression difference, to protein expression, to whole-system outcomes. Entire manuscripts can be written without leaving a web browser. This is a massive step forward that is under-appreciated by many.
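At its core, a ‘gene-friends’ search of the kind described above ranks traits by their correlation with a query trait across the strains in which both were measured. The sketch below assumes each trait is a dictionary of strain means keyed by strain name. The strain names and values are hypothetical placeholders, and this is neither the GeneNetwork API nor its data format. Traits are ranked by absolute Pearson correlation, and those measured in too few shared strains are skipped, because BXD datasets often cover different subsets of strains.

```python
from math import sqrt
from typing import Optional


def pearson(x: list[float], y: list[float]) -> Optional[float]:
    """Pearson correlation; None if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def rank_correlates(query: dict[str, float],
                    traits: dict[str, dict[str, float]],
                    min_shared: int = 5) -> list[tuple[str, float, int]]:
    """Rank traits by |r| with a query trait over the strains both were measured in.

    Each trait maps strain name -> strain mean. Traits sharing fewer than
    `min_shared` strains with the query, or with no variance, are skipped.
    Returns (trait name, r, number of shared strains), strongest first.
    """
    results = []
    for name, values in traits.items():
        shared = sorted(query.keys() & values.keys())
        if len(shared) < min_shared:
            continue
        r = pearson([query[s] for s in shared], [values[s] for s in shared])
        if r is not None:
            results.append((name, r, len(shared)))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical strain means; names and values are placeholders.
query = {"BXD_A": 1.0, "BXD_B": 2.0, "BXD_C": 3.0,
         "BXD_D": 4.0, "BXD_E": 5.0, "BXD_F": 6.0}
traits = {
    "trait_1": {s: 2.0 * v for s, v in query.items()},
    "trait_2": dict(zip(query, [3.0, 1.0, 4.0, 1.0, 5.0, 9.0])),
    "trait_3": {"BXD_A": 1.0, "BXD_B": 0.0, "BXD_C": 2.0},  # too few strains
}
for name, r, n in rank_correlates(query, traits):
    print(name, round(r, 2), n)
```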
The above demonstrates how the BXD can help us achieve our goal of predictive modeling of disease risk and of the efficacy of interventions. Indeed, the family has already been used to test specific functional predictions of behavior based on neuroanatomical variation. The BXD family is well placed to address questions that encompass both high levels of genetic variation and gene–environment interactions: our many-to-many-to-many problem. This is bolstered by the family’s easy extendibility into a massive diallel cross population (DAX).
4. Diallel crosses
The diallel cross is another simple idea that has been with us for over 60 years [140, 141, 142]. We now have a major opportunity to take full advantage of this approach using large panels of fully sequenced isogenic strains. A DAX is the set of all possible matings between several genome-types (Figure 1D). For C57BL/6J and DBA/2J there are the two reciprocal F1s, and these have been used to study parent-of-origin effects and to estimate heritability. As the number of parental strains increases, the number of potential diallel crosses increases quadratically, and tools have been developed to deal with large DAXs. Although we have learnt much about the genetic architecture of traits [53, 143, 144, 145, 146, 147], QTL mapping has been more difficult, given the relatively small number of strains used. We can now imagine the full DAX for the BXD family of 140 strains: 19,460 replicable isogenic F1s, all of which have a reproducible, entirely defined genome, and any subset of which can be generated efficiently for
At the first level, this has important consequences for power and precision. The number of genome-types phenotyped can be increased massively, giving power to detect loci with even the weakest effect sizes. Precision can also be enhanced, as F1s can be chosen whose genomes differ over only a narrow region, producing a small QTL interval containing fewer genes. All the data collected in these F1s can be coherently integrated into the phenome already aggregated for the BXD. Every new phenotype measured therefore adds quadratically to the network of phenome connections, and any user of these F1s has access to over 40 years of data.
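Which regions an F1 carries in heterozygous form is fixed entirely by where its two inbred parents differ. Pairs of strains, and hence F1s, that differ over only a narrow region can therefore be screened from marker genotypes alone. The sketch below assumes a hypothetical marker map and B/D genotype calls for each parent (placeholder data, not a GeneNetwork file format). It reports runs of consecutive markers at which the parents differ. True recombination breakpoints lie somewhere between flanking markers, so the reported ends are approximate.

```python
def differing_segments(markers, geno_1, geno_2):
    """Runs of consecutive markers at which two inbred strains carry different alleles.

    markers        : list of (chromosome, position_mb), sorted by chromosome then position
    geno_1, geno_2 : per-marker genotype calls ('B' or 'D') for the two strains
    Returns (chromosome, first_mb, last_mb) for each run. At marker resolution,
    an F1 of the two strains is heterozygous over these runs and homozygous elsewhere.
    """
    segments = []
    current = None  # [chromosome, first_mb, last_mb] of the run being extended
    for (chrom, pos), g1, g2 in zip(markers, geno_1, geno_2):
        if g1 != g2:
            if current is not None and current[0] == chrom:
                current[2] = pos
            else:
                if current is not None:
                    segments.append(tuple(current))
                current = [chrom, pos, pos]
        elif current is not None:
            segments.append(tuple(current))
            current = None
    if current is not None:
        segments.append(tuple(current))
    return segments


# Hypothetical marker map and genotypes for two strains.
markers = [("1", 10.0), ("1", 20.0), ("1", 30.0), ("2", 5.0), ("2", 15.0)]
strain_x = ["B", "B", "D", "D", "B"]
strain_y = ["B", "D", "B", "B", "B"]
print(differing_segments(markers, strain_x, strain_y))
# [('1', 20.0, 30.0), ('2', 5.0, 5.0)]
```

Across a whole panel, one would compute this for every pair and keep the pairs whose total differing length is small around a QTL of interest.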
At the next level up, it also allows us to detect, for example, the dominance and parent-of-origin effects mentioned above. Small DAXs of mouse strains have been able to identify parent-of-origin effects, epistasis, and dominance, but have been unable to map the loci causing these effects [53, 143, 144, 145, 146, 149, 150]. Reciprocal crosses of inbred strains (e.g. BXD001xBXD002F1 vs. BXD002xBXD001F1) produce isogenic litters whose members are all genetically identical. The only differences between the reciprocal litters are due to parent-of-origin effects (Figure 1C). By building a large DAX of reciprocal crosses, the genomic loci causing these dominance, epistatic, and/or parent-of-origin effects can be identified. Mapping of these non-additive effects is a complete dark zone in fully homozygous inbred populations.
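For a single pair of strains, the simplest contrasts that separate these effects use the two inbred parents and both reciprocal F1s. The sketch below computes one common set of such contrasts. Scaling conventions differ between diallel models, and this is a per-cross summary, not a mapping method. One way to map these effects would be to compute the contrasts across many pairs and treat them as traits. Note that the reciprocal difference pools every source of parent-of-origin variation, including imprinting, the maternal environment, and the sex chromosomes and mitochondria inherited from each parent.

```python
def reciprocal_contrasts(p1: float, p2: float,
                         f1_12: float, f1_21: float) -> dict[str, float]:
    """Simple contrasts from two inbred strains and their reciprocal F1s.

    p1, p2 : mean phenotype of inbred strains 1 and 2
    f1_12  : mean of F1s with strain 1 as dam (1 x 2)
    f1_21  : mean of F1s with strain 2 as dam (2 x 1)
    Scaling conventions differ between diallel models; these are one choice.
    """
    midparent = (p1 + p2) / 2
    f1_mean = (f1_12 + f1_21) / 2
    return {
        "additive": (p1 - p2) / 2,          # half the difference between homozygotes
        "dominance": f1_mean - midparent,   # F1 deviation from the midparent
        "parent_of_origin": f1_12 - f1_21,  # difference between reciprocal F1s
    }


# Hypothetical strain means.
print(reciprocal_contrasts(p1=10.0, p2=20.0, f1_12=18.0, f1_21=16.0))
# {'additive': -5.0, 'dominance': 2.0, 'parent_of_origin': 2.0}
```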
Finally, and most importantly, the DAX provides a population for testing predictions. Using the BXD family, we have enough strains to make associations with high power, whether gene–phenotype, environment–phenotype, or gene–environment–phenotype. However, using only the inbred BXD lines, we do not have a second population in which to test predicted associations. The BXD DAX provides a matrix of 19,600 isogenic genome-types. If only the ‘diagonal’ of inbred BXD strains is used to detect associations and make predictions, any of the 19,460 isogenic F1s is available to test these associations and predictions (Figure 1D).
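The split between the inbred diagonal and the off-diagonal F1s is easy to enumerate. The sketch below does two things:

- it counts the cells of a full diallel among n strains, with reciprocal crosses counted separately;
- it forms the simplest additive prediction for each F1, the mean of its two inbred parents, so that observed F1 phenotypes can be scored against it.

Strain names and values are hypothetical. A real test would use genotype-based predictions from mapped loci rather than midparent values, and systematic residuals would point to non-additive effects.

```python
from itertools import permutations


def diallel_counts(n_strains: int) -> dict[str, int]:
    """Cells in a full n x n diallel; reciprocal crosses are counted separately."""
    return {
        "cells": n_strains * n_strains,                    # every (dam, sire) pair
        "inbred": n_strains,                               # the diagonal: dam == sire
        "f1": n_strains * (n_strains - 1),                 # off-diagonal isogenic F1s
        "f1_unordered": n_strains * (n_strains - 1) // 2,  # ignoring cross direction
    }


def midparent_predictions(inbred_means: dict[str, float]) -> dict[tuple[str, str], float]:
    """Purely additive prediction for every F1 (dam, sire): the mean of its parents."""
    return {
        (dam, sire): (inbred_means[dam] + inbred_means[sire]) / 2
        for dam, sire in permutations(inbred_means, 2)
    }


def residuals(predicted, observed):
    """Observed minus predicted, for F1s that have actually been phenotyped."""
    return {cross: observed[cross] - predicted[cross]
            for cross in observed if cross in predicted}


print(diallel_counts(140))
# {'cells': 19600, 'inbred': 140, 'f1': 19460, 'f1_unordered': 9730}

inbred = {"BXD_A": 10.0, "BXD_B": 20.0, "BXD_C": 14.0}  # hypothetical strain means
pred = midparent_predictions(inbred)
print(residuals(pred, {("BXD_A", "BXD_B"): 16.0}))
# {('BXD_A', 'BXD_B'): 1.0}
```

For 140 strains, the counts reproduce the 19,600 cells and 19,460 F1s given above. The count grows with the square of the number of strains.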
We can expand the DAX even further using easily available isogenic strains. There are approximately 200 RI strains from other two-parent mouse populations, including AXB/BXA (29 strains), AKXD (20), BXH (11), BRX58N (7), CXB (19), ILSXISS (60), LGXSM (~18), NXSM (16) and SWXJ (12), plus approximately 55–75 strains from the Collaborative Cross 8-parent RI population. From these inbred parents, there are over 152,100 isogenic F1s that can be produced and replicated. An additional expansion of this design is to cross RI families to genetically engineered disease models.
5. Diallel crosses to genetically modified strains
Genetically modified animals, including humanized, transgenic and knockout mouse models, have been a vital piece in uncovering genotype–phenotype associations, but they have often suffered from the same
An excellent example of this already exists: the Alzheimer’s disease BXD (AD-BXD) panel developed by Kaczorowski and colleagues [175, 176]. They crossed C57BL/6J-congenic females hemizygous for the humanized 5xFAD transgene (JAX Stock No. 008730) to males from BXD strains. Half of each resulting litter carried the 5xFAD transgene (AD-BXD), and half did not (non-transgenic BXD). The whole litter is genetically and environmentally identical except for the presence of the transgene, giving an immediate and directly comparable control (Figure 1C). By crossing the humanized 5xFAD line on a single genetic background to a diverse but defined set of BXDs, they produced a population that incorporates high levels of sequence variation mirroring that of humans. They have mapped genetic and molecular causes of cognitive loss in AD-BXD mice [154, 175, 176, 177, 178, 179]. These mice show a broad spectrum of cognitive loss similar to that of humans with familial and late-onset AD. The human transgenes in the 5xFAD line sensitize BXD hybrids to a greater or lesser degree. Some begin to lose conditioned fear memory as early as 6 months, others well after a year, demonstrating a gene-by-gene-by-age interaction. This variation is highly heritable and mappable. It gives a powerful means by which to define genetic causality and mechanisms of memory loss, non-cognitive loss, and resilience to loss.
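Transgenic and non-transgenic littermates share the same F1 background, so a natural per-strain summary is the difference between the two groups. The sketch below assumes records of (strain, transgene status, phenotype). The strain labels and values are hypothetical, and this is not the analysis pipeline of the cited studies. Strains in which only one group was phenotyped are skipped. Variation of this difference across BXD backgrounds is the background-by-transgene interaction, which can then be treated as a mappable trait.

```python
from statistics import mean


def transgene_effects(records):
    """Per-strain difference between transgenic and non-transgenic littermates.

    records : iterable of (strain, is_transgenic, phenotype value)
    Returns {strain: mean(transgenic) - mean(non-transgenic)} for strains
    in which both groups were phenotyped.
    """
    groups = {}
    for strain, is_tg, value in records:
        groups.setdefault(strain, {True: [], False: []})[is_tg].append(value)
    return {
        strain: mean(g[True]) - mean(g[False])
        for strain, g in groups.items()
        if g[True] and g[False]
    }


# Hypothetical memory scores (e.g. percent freezing) for three F1 backgrounds.
records = [
    ("BXD_A", True, 40.0), ("BXD_A", True, 44.0), ("BXD_A", False, 60.0),
    ("BXD_B", True, 58.0), ("BXD_B", False, 60.0),
    ("BXD_C", True, 50.0),  # no non-transgenic littermate: skipped
]
print(transgene_effects(records))
# {'BXD_A': -18.0, 'BXD_B': -2.0}
```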
Neuner et al. were also able to demonstrate ‘reverse translation’ from human genomic data to mouse phenotype. They generated a polygenic risk score using 21 human genes that increase Alzheimer’s disease risk, and showed that allele dosage was significantly associated with cognitive outcomes in the AD-BXD. This confirms, first, that naturally occurring variation in these networks has overlapping effects in mice and humans, and second, that gene–phenotype associations translate across species. This approach can be applied to many other phenotypes.
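In its simplest form, a polygenic score is a weighted sum of risk-allele dosages. The chapter does not specify how Neuner et al. weighted the 21 genes, so the sketch below shows only the generic form. The variant identifiers and weights are hypothetical. Once each animal has a score, its association with an outcome can be tested, for example by correlating scores with cognitive measures across strains.

```python
def polygenic_score(dosages: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages over the scored variants.

    dosages : variant id -> number of risk alleles carried (0, 1 or 2)
    weights : variant id -> effect weight (e.g. a log odds ratio from a GWAS)
    Variants absent from `dosages` contribute zero, one of several possible
    conventions for missing data.
    """
    return sum(w * dosages.get(variant, 0) for variant, w in weights.items())


# Hypothetical variant ids and weights, not the 21-gene score itself.
weights = {"variant_1": 0.5, "variant_2": 1.0, "variant_3": -0.2}
animals = {
    "mouse_1": {"variant_1": 2, "variant_2": 1},
    "mouse_2": {"variant_1": 0, "variant_2": 0, "variant_3": 1},
}
for animal, dosage in animals.items():
    print(animal, polygenic_score(dosage, weights))
```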
Given that phenotypes from genetically engineered mice on a single genetic background cannot be reliably generalized to other mouse genetic backgrounds, it is unsurprising that there are difficulties in generalizing to other species. By crossing genetically modified lines to RI strains to produce a DAX, we overcome this problem and allow the integration and translation of data to other populations and other species.
6. Integration and translation with other populations
Compared to conventional F2s and advanced intercrosses (AIs), outcrossed heterogeneous stock, or diversity outbred stock, the BXD are particularly advantageous when the heritability of a trait is moderate or low. This is because the genetic signal can be boosted greatly by resampling isogenic members of the same line many times. The drawbacks of the BXDs are lower precision and less genetic variation in the population compared to, e.g., multiparent families (such as the Collaborative Cross and the Diversity Outbred), with a consequent decrease in total phenotypic variance. We consider this an acceptable drawback. We have shown that medically relevant phenotypes vary within the family, and that subcentimorgan mapping precision can be achieved using only half of the full set of strains. Beyond this level of precision, an efficient method to transition from QTLs to causal genes, variants, and mechanisms is to take advantage of complementary resources. These include sets of other murine mapping resources, efficient
As a specific example of combining murine populations, Taylor’s cadmium testicular toxicity mutation (BXD Phenotype 13035), which was unmappable in 1973, now maps to a 3 Mb interval on GeneNetwork.org. When combined with SNP data for common strains, the variant can be restricted to a 400 kb region that includes the causal
Mouse-to-human genetic translation has at least a 20-year history, but has taken off now that GWAS are routine [48, 78, 111, 112, 123, 125, 185, 186]. Human GWAS data can be used to refine QTLs found in mice. For example, the power to detect associations in the BXD can be used to identify a homologous region in humans, and the precision of human GWAS can then be used to identify a candidate gene [185, 186, 187].
More importantly, mouse data can be used to determine the function and causal pathway for associations made in humans. Finding variant–phenotype associations for any phenotype with GWAS is now limited only by one’s ability to collect phenotypes. Interpreting these variants and determining their function is far more difficult, given the environmental and genetic variation in any human population. RI mice, such as the BXD, provide a method of ‘reverse translation’, from human to mouse. Again, the work of Kaczorowski and colleagues above provides an excellent example that can be applied to any other phenotype.
Despite occasional arguments to the contrary [188, 189], mice, when used correctly, are a good model of human biology and medicine [12, 190, 191, 192]. Indeed, at least 40 Nobel Prizes have been awarded for research involving mice (http://www.animalresearch.info/en/medical-advances/nobel-prizes), and their use has been vital in understanding the pathogenesis of many diseases. For true predictive medicine, we need to understand all gene-by-gene-by-environment-by-age-by-sex-by-treatment interactions, and animal models are the only way to do this at scale. The importance of using genetically diverse mice has often been overlooked, leading to difficulties with translation. RI families, such as the BXDs, and their expansions, including diallel crosses and reduced complexity crosses [194, 195], overcome this problem and are a vital step towards accurate, individualized, predictive medicine.
The UTHSC Center for Integrative and Translational Genomics (CITG) has supported production of the BXD colony at UTHSC and will continue to support this colony for the duration of the grant. The CITG also provides generous support for computer hardware and programming associated with GeneNetwork, and for our Galaxy and UCSC Genome Browser instances. We gratefully acknowledge the support of CITG, and funds from the UT-ORNL Governor’s Chair, NIDA grant P30 DA044223, NIAAA U01 AA013499 and U01 AA016662, NHLBI R01 HL151438, and NIDDK R01 DK120567, for the work at UTHSC.
Conflict of interest
The authors have no conflicts of interest.