Prevalences of major risk factors for VTE.
Thrombophilia is defined as a disorder of hemostasis in which there is a tendency for the occurrence of thrombosis in veins or arteries due to abnormalities in blood composition, blood flow, or the vascular wall. The pathogenesis of venous versus arterial thrombosis is very distinct and these are often considered as separate diseases. The term thrombophilia is most often used in combination with venous thrombosis. VTE encompasses mainly deep vein thrombosis and pulmonary embolism.
Venous thromboembolism is a common disease with an age-dependent incidence of 1-3 individuals per 1000 per year (Naess et al, 2007). VTE is a serious disease with a 30-day case-fatality rate of 6.4% after a first VTE event, and this rate is twice as high for pulmonary embolism (9.7%) as for deep vein thrombosis (4.6%) (Naess et al, 2007). VTE can also lead to complications such as post-thrombotic syndrome, which is characterized by pain and ulceration.
Although both sexes are equally affected by a first VTE, men have a more than 2-fold higher risk for a recurrent VTE as compared to women (Douketis et al, 2011).
VTE is a complex common disease in which multiple risk factors, both acquired and genetic, are involved in the development of the disease. Many acquired risk factors have been identified such as surgery, immobilization, trauma, oral contraceptive or hormone replacement therapy use, pregnancy, malignancy, and advanced age.
This chapter will focus on the genetic risk factors for VTE that have been identified to date and the research methods that were used to identify these factors in the past as well as new technological innovations used for the discovery of new genetic risk factors for VTE.
2.1. Thrombophilia as monogenetic disease
In 1937, Nygaard and Brown introduced the designation “essential thrombophilia” in a report describing five cases of vascular disease characterized by recurrent episodes of acute occlusion in the large and small vessels of extremities, heart, kidney, and brain (Nygaard & Brown, 1937). In 1956 a survey of the literature that described a familial tendency for thrombosis was published that also used the term thrombophilia, now to indicate the hereditary nature of the disease (Jordan & Nandorff, 1956). Such a connection between inheritance and thrombosis was described as early as 1911 (Schnitzler, 1926). Further studies into the genetic predisposition to thrombosis at the time were hampered by the lack of suitable tests and limited insight into the pathophysiology of VTE. Thrombophilia was considered a monogenetic disease starting with the identification of a family with hereditary antithrombin deficiency in 1965 (Egeberg, 1965).
In 1969, another heritable trait was found to be associated with thrombosis risk: non-O blood group. Blood group O is less often seen in thrombosis patients than one of the other blood groups (Jick et al, 1969). Protein C and protein S deficiencies were identified as genetic risk factors in thrombosis patients following the unraveling of the protein C anticoagulant system in the late 1970s and 1980s (Griffin et al, 1981, Schwarz et al, 1984). With improvement in DNA technology, mutations in the genes for antithrombin, protein C, and protein S that caused the deficiency states could be identified. In the 1990s activated protein C resistance and the Factor V Leiden mutation were discovered as well as the prothrombin mutation 20210G>A (Bertina et al, 1994; Poort et al, 1996). The prevalences of these known ‘classic’ genetic risk factors are represented in table 1.
|Genetic risk factor|General population|VTE patients|Thrombophilia families|
|---|---|---|---|
|Factor V Leiden|1-15%|10-50%|45%|
|Non-O blood group|57%|73%||
2.1.1. Antithrombin deficiency
Antithrombin (AT) deficiency was first described in a Scandinavian family in which several family members presented with thrombotic events and relatively low levels of AT in plasma. Heterozygous AT deficiency is a rare disorder with a prevalence of 1:500-1:5000 in the general population (Tait et al, 1994, Wells et al, 1994). AT deficiency is inherited as an autosomal dominant trait. Most cases are heterozygous and homozygous AT deficiency is hardly compatible with life and probably embryonic lethal. Heterozygous AT deficiency is observed in 4% of the thrombophilia families and in 1% of consecutive deep vein thrombosis patients (Lane et al, 1996).
DNA analyses resulted in the identification of loss of function mutations in the AT gene (SERPINC1) in people with AT deficiency. AT deficiency can be divided into two subtypes: type I (quantitative deficiency) and type II (qualitative deficiency). Type I deficiency is characterized by a reduction of activity and protein levels and accounts for 80% of the symptomatic patients with AT deficiency. Type I deficiencies are most commonly caused by short deletions and insertions and to a lesser extent by point mutations. Deletions are scattered throughout the AT gene, but three regions are often affected (codon 81, codon 106/107, codon 244/245). Recently, large deletions (more than 30 bp) were also identified in 8% of the AT deficient patients by using multiplex ligation-dependent probe amplification analysis (Luxembourg et al, 2011).
Type II deficiency is characterized by low activity and normal protein levels. Type II deficiencies result most often from single base pair substitutions that affect the reactive domain (type IIa) and heparin-binding domain (type IIb). Type IIc, a category including so-called pleiotropic defects, is often caused by mutations located in the strand Ic that impair the function of the reactive domain (Patnaik & Moll, 2008). The Human Gene Mutation Database describes at present 235 different mutations in the AT gene (Stenson et al, 2009).
Considering all known inherited thrombophilias, AT deficiency appears to lead to the highest risk for VTE. Risk for developing VTE depends on an individual’s family history, presence of other mutations, and the subtype of AT deficiency. In particular subtype IIb confers a lower risk than the other subtypes (Finazzi et al, 1987). Risk estimates for developing VTE in the presence of AT deficiency are mainly based on family studies and these show a 10-20 fold increased risk (Lijfering et al, 2009, Mahmoodi et al, 2010, van Boven & Lane, 1997). The risk for developing a recurrent VTE is 10.5% per year without long-term anticoagulant treatment. With long-term anticoagulant treatment it still is 2.7% per year (Vossen et al, 2005).
2.1.2. Protein C deficiency
In 1981 the first patient with protein C (PC) deficiency and recurrent venous thromboembolism was described (Griffin et al, 1981). The prevalence of PC deficiency in the general population is 0.2-0.4% and 3-5% for VTE patients, although variation is observed among different study populations (Franco & Reitsma, 2001).
Homozygous or compound heterozygous PC deficiency is very rare and causes severe thromboembolic disease and purpura fulminans in newborns (Marlar & Mastovich, 1990). Heterozygous PC deficiency is more frequently observed and is associated with an increased risk of developing venous thromboembolism. The inheritance pattern of heterozygous PC deficiency is not as clear as that of AT deficiency. In general PC deficiency is inherited as an autosomal dominant disorder, but often with incomplete penetrance. For homozygous and compound heterozygous PC deficiency a recessive inheritance pattern seems to fit better (Bafunno & Margaglione, 2010, Bereczky et al, 2010).
PC deficiency is primarily caused by loss of function mutations in the protein C gene (PROC). Mutations are very heterogeneous and the majority are single nucleotide substitutions in the coding regions of PROC (Bereczky et al, 2010).
PC deficiency is generally subdivided into two types: type I (quantitative deficiency) and type II (qualitative deficiency). Most PC deficiencies are type I and result mainly from single nucleotide substitutions in the coding regions of PROC. Type II deficiency is observed in 10-15% of the cases and often results from missense mutations in regions encoding for the Gla-domain, the propeptide, or the serine protease domain. In total, 275 distinct mutations in the PROC gene have been entered into the HGMD database (Stenson et al, 2009). However, still in 10-30% of families with PC deficiency no mutations have been found (Koeleman et al, 1997).
Heterozygous PC deficiency is associated with an increased risk for VTE. Risk estimates for the development of VTE depend on the population studied and vary between a 3 and 11 fold enhanced risk. The annual recurrent incidence rate is rather high with 5.1% in men and women combined. In men only, the recurrence risk is 10.8% per year (Vossen et al, 2005).
2.1.3. Protein S deficiency
Three years after the description of the first PC deficient patient, the first protein S (PS) deficient patient was reported. This patient also experienced recurrent VTE (Schwarz et al, 1984). The prevalence of PS deficiency in the general Caucasian population is 0.03-0.13% and 1-5% in VTE patients, but these numbers vary between different populations (Franco & Reitsma, 2001). Especially in Asians, PS deficiency appears to be more prevalent than in Caucasians. In Asia, PS deficiency prevalences of 0.48-0.63% (general population), and 12.7% (VTE patients) have been claimed (Adachi, 2005).
Homozygous or compound heterozygous PS deficiency is rare and causes similar clinical symptoms as homozygous or compound heterozygous PC deficiency. Almost all PS deficiency cases are heterozygous. Heterozygous PS deficiency is usually inherited as an autosomal dominant trait and the mutation spectrum is rather heterogeneous.
Three subtypes of PS deficiency can be distinguished: type I (low activity, total, and free PS), type II (low activity), and type III (low activity and free PS). Type I PS deficiency is most frequently observed and often a consequence of missense mutations in the protein S gene (PROS1). Copy number variations were found in 33% of a group of missense negative patients with PS deficiency (Pintao et al, 2009). These copy number variations included deletion of the whole PROS1 gene, partial gene deletions and partial duplications. In the Japanese population, one particular missense mutation, K196E, in the second EGF like domain of protein S is very abundant and was shown to be a risk factor for DVT (Kimura et al, 2006).
Type II PS deficiency is diagnosed in about 5% of the cases. This type of PS deficiency is mainly characterized by mutations in sequences of the PROS1 gene that encode the Gla-domain and the EGF4-domain (Baroni et al, 2006). Type I and III deficiency often occur in the same family as phenotypic variants of the same genetic defect. An age-dependent increase of PS levels might play a role in these phenotypic expression variations (Simmonds et al, 1997). However, also families with only type III have been described. In the HGMD database 243 different mutations have been submitted at this moment.
Heterozygous PS deficiency is associated with a 5-11.5 fold increased risk of VTE in family-based studies, but this could not be confirmed in population-based studies (Rezende et al, 2004). The recurrence rate is, like for PC deficiency, also higher for men (10.5%) than for women (3.1%). This risk is not apparent in patients using anticoagulants for a long-term period (Vossen et al, 2005).
2.1.4. Blood group
During a drug surveillance program, patients treated with anticoagulants for venous thromboembolism were found to have a non-O blood group more often than expected. Following this observation, a cooperative study was performed among women from the USA, UK, and Sweden who developed venous thrombosis while taking oral contraceptives, during pregnancy or the puerperium, or at other times. This study confirmed that there was a deficit of patients with blood group O, and the difference was larger when venous thromboembolism was associated with either oral contraceptive use or pregnancy (Jick et al, 1969).
Blood group O is associated with lower levels of von Willebrand Factor (VWF) and Factor VIII. About 30% of the variation in plasma VWF levels was shown to be explained by ABO blood group (Orstavik et al, 1985). Blood group non-O is associated with a 2.6 fold increased risk for developing venous thrombosis. Blood group A is the main group responsible for the risk. The risk associated with VWF levels completely disappeared after adjustment for a particular blood group. However, the risk due to Factor VIII was not changed after adjustment for blood groups, which indicates that Factor VIII is an independent risk factor for venous thromboembolism (Tirado et al, 2005).
2.1.5. Activated protein C resistance and factor V Leiden
Activated protein C resistance (APCR) was identified in 1993 as an inherited abnormality that was highly prevalent in VTE patients within a family. In some family members, the activated partial thromboplastin time did not prolong by addition of activated PC to the plasma (Dahlback et al, 1993). This observation was referred to as activated protein C resistance and was later detected in 10-50% of VTE patients (Franco & Reitsma, 2001). With complementation tests, Bertina et al. discovered that APCR could be restored by adding coagulation factor V. A mutation in the factor V gene that is responsible for the APCR was identified in 1994 (Bertina et al, 1994). This mutation, often called factor V Leiden, causes a substitution of guanosine by adenosine at nucleotide position 1691, leading to an amino acid change from arginine to glutamine at position 506 of the protein. Factor V Leiden is a gain of function mutation because activated factor V is less sensitive to inactivation by APC, which facilitates the formation of more thrombin.
Factor V Leiden is quite prevalent in Caucasians (2-13%) (Bafunno & Margaglione, 2010), but its prevalence varies among different geographical regions (Figure 1). The distribution of factor V Leiden is centered in Europe and extends into north India in the east. Factor V Leiden was introduced into America and Australia through emigration of Europeans. Factor V Leiden is prevalent in Europe and America, but also in Saudi Arabia and Israel. The mutation is rare in native populations from Eastern Asia, Africa, and America. In Basques and in Inuit from Greenland factor V Leiden is nearly absent. These populations represent autochthonous European groups that show limited mixing with other European populations. Based on the worldwide distribution of the prevalences of factor V Leiden a single origin for this mutation has been hypothesized. Haplotype analysis also supports this hypothesis, and factor V Leiden is therefore thought to be a founder mutation that occurred about 21,000 to 30,000 years ago (Zivelin et al, 1997). The factor V Leiden mutation might have arisen after the separation of Orientals and Caucasians, as clear differences among races have been observed (Bauduer & Lacombe, 2005, Herrmann et al, 1997, Rees, 1996).
In Europeans, factor V Leiden is the most common genetic defect involved in the etiology of VTE. Factor V Leiden is an autosomal dominant trait and heterozygotes have a 5 fold increased risk to develop VTE, while homozygotes have a 50 fold increased risk (Koster et al, 1993, Rosendaal et al, 1995).
Studies of the risk of developing recurrent venous thrombosis in the presence of factor V Leiden showed contradicting results. Most studies do not find an increased risk as compared to mutation negative subjects (Christiansen et al, 2005, De Stefano et al, 1999, Eichinger et al, 2002). Some studies found only an increased risk in men but not in women (Ridker et al, 1995, Vossen et al, 2005).
2.1.6. Prothrombin 20210G>A
The second most prevalent genetic abnormality causing thrombophilia was identified by a candidate gene approach in 1996 in patients from families with unexplained thrombophilia (Poort et al, 1996). This mutation is located in the 3’-untranslated region of the prothrombin gene, at position 20210. The nucleotide change from a guanosine to an adenosine causes no amino acid change, but probably enhances polyadenylation, thereby increasing mRNA and protein expression and leading to increased plasma levels of prothrombin (Leitner et al, 2008). In heterozygous carriers the plasma levels of prothrombin are increased by 30% and in homozygous carriers by 70% (Bafunno & Margaglione, 2010).
Prothrombin 20210G>A is inherited as an autosomal dominant trait and is almost only observed in Caucasians from Europe. Outside Europe, only one case was observed in India (Rees et al, 1999). This mutation is found in 1-3% of the general population, in 6% of VTE patients, and in 10% of probands from thrombophilic families (Franco & Reitsma, 2001). This mutation was also suggested to originate from a single mutational event that occurred after the divergence of Africans from non-Africans and of Caucasoid from Mongoloid subpopulations, like the Factor V Leiden mutation (Zivelin et al, 1998).
Risk for venous thromboembolism is increased 2-5 fold in the presence of the prothrombin 20210G>A mutation. In combination with the Factor V Leiden mutation, the risks combine multiplicatively, resulting in a 20 fold increased risk for VTE (Emmerich et al, 2001). Recurrence risk for VTE is not increased (Margaglione et al, 1999).
2.1.7. Fibrinogen γ variants
Fibrinogen consists of 3 polypeptides, Aα, Bβ, and γ, which are encoded by three separate genes. The gene for the fibrinogen γ polypeptide encodes two isoforms. The major mRNA form contains all 10 exons whereas the minor form (γ’) is the result of alternative splicing and includes intron 9 (de Moerloose et al, 2010). Approximately 10% of fibrinogen contains the minor isoform and this fibrinogen bears a high-affinity nonsubstrate-binding site for thrombin, which can cause an inhibition of thrombin activity (Mosesson, 2003).
Elevated levels of fibrinogen are associated with a 4 fold increased risk to develop a VTE (Kamphuisen et al, 1999), possibly by enhancing blood viscosity and platelet aggregation. An association study investigating linkage of haplotypes of the three fibrinogen genes showed that haplotype H2 of the FGG gene was associated with an increased risk for deep vein thrombosis (Uitte de Willige et al, 2005). This haplotype was also associated with reduced fibrinogen γ’ levels and fibrinogen γ’ / total fibrinogen ratio, but not with fibrinogen levels. The risk conferred by this haplotype was proposed to result from SNP 10034C>T (rs2066865). About 6% of individuals carry the variant 10034C>T and this increases the risk for VTE two-fold in Caucasians (de Moerloose et al, 2010; Grunbacher et al, 2007, Uitte de Willige et al, 2009). In African Americans variant 10034C>T only increases the VTE risk marginally (Uitte de Willige et al, 2009). Variant 10034C>T is located in a GT-rich sequence region at the 3’-untranslated region of the FGG gene, which contains a putative cleavage stimulation factor binding site and is involved in the regulation of the polyadenylation signals (Uitte de Willige et al, 2007). Another variant, 9340T>C, was discovered to reduce VTE risk in Caucasians but not in African Americans. Variant 9340T>C reduces the risk approximately two-fold (Uitte de Willige et al, 2009).
3.1. Thrombophilia as oligogenetic disease
With the discovery of deficiencies of protein S, protein C, and antithrombin, VTE was first suggested to be a monogenetic disease. However, in particular protein C deficiency showed variability in penetrance of the thrombotic phenotype within and between families, suggesting that additional genetic risk factors were present in these thrombosis prone families. This notion was confirmed with the finding of APC resistance. Individuals from families with protein S, protein C, or antithrombin deficiency but also APC resistance had a higher risk for VTE when they inherited combined defects rather than only one defect (Koeleman et al, 1994, van Boven et al, 1996). Thus, the penetrance of thrombosis increases in these protein C deficient families after introduction of the factor V Leiden allele in the pedigree (Brenner et al, 1996). Carriers of combinations of defects also presented with thrombosis earlier in life and more frequently. The same was observed in protein S deficient families, where combined defects of the protein S gene and either the Factor V Leiden mutation or the prothrombin 20210G>A were found in 40% (Koeleman et al, 1995, Zoller et al, 1995) and 30% (Castaman et al, 2000) of families, respectively. As a result, thrombophilia was then suggested to be an oligogenetic disease in which inherited predisposition results from 2 or more mutations in genes involved in blood clotting (Miletich et al, 1993).
The heritability of VTE was investigated in family and twin studies resulting in an estimated heritability of 50-60% (Heit et al, 2004; Larsen et al, 2003, Souto et al, 2000a). Heritability was also determined for individual coagulation factors involved in clot formation, like prothrombin (49-57%), factor V (44-62%), and von Willebrand Factor (34-75%) (de Lange et al, 2001; Souto et al, 2000b). In addition, 20-30% of consecutive VTE patients report one or more first-degree family members with VTE (Heijboer et al, 1990, van Sluis et al, 2006). These findings reaffirm that genetic risk factors do play an important role in the development of VTE.
An important question in the field remains whether there are a multitude of genetic risk factors that remain to be identified. In 13% of thrombophilia families already two or more genetic risk factors have been identified. In 60% and 27% of families 1 or no genetic factor was found, respectively (Bertina, 2001). This firmly suggests that we are still ‘missing’ genetic risk factors that predispose to venous thromboembolism.
3.2. Investigation of unexplained heritability
After establishing high levels of heritability in many complex diseases, including VTE, investigators concluded that many genetic risk factors remained to be discovered. New hypotheses were formulated to explain in which part of the genome sequence these missing genetic risk factors were to be found. One of these hypotheses was the ‘common disease common variant (CDCV) hypothesis’. The CDCV hypothesis states that several common allelic variants - with appreciable frequency in the population and low penetrance - would account for the genetically determined variance in disease susceptibility of complex diseases like VTE. The central idea behind the CDCV hypothesis is that variants causing common diseases are reasonably frequent in the population, ranging from 1-10% (Collins et al, 1998, Lander, 1996). Other premises of this hypothesis are that the original mutation arose more than 100,000 years ago and that the model included absence of selection for or against these variants to make it possible for the variants to persist at a high frequency in the population. Evolutionary data suggests a proliferation of the human population from a rather small group of founders to 6 billion plus and this would be supplementary evidence for the CDCV theory. The mutation spectrum was likely to be narrow in the founders and a specific mutant could remain quite common during an expansion of the population (Iyengar & Elston, 2007).
Before the introduction of the CDCV hypothesis, new genetic determinants were investigated by linkage studies. Whole genome linkage studies have been performed in family studies with mini- and microsatellite markers, but later also with single nucleotide polymorphisms (SNPs). Linkage analysis assumes that many families share defects in the same locus, while there often is considerable locus heterogeneity in complex diseases, which will dilute linkage signals. Therefore, association studies of unrelated individuals using genotyping of a large set of single nucleotide polymorphisms (SNPs) are more appropriate to use for complex diseases. This approach is directly based on the CDCV hypothesis. With the finishing of the Human Genome Project (Collins et al, 2003) and the International HapMap Project (The International HapMap Consortium, 2005) and technological improvements, like microarrays, more genome wide association studies (GWAS) became feasible.
In the human genome around 20 million SNPs have been identified and validated (NCBI dbSNP Build 132). DNA sequences are inherited in blocks with high linkage disequilibrium. The pattern of SNPs in a block is called a haplotype. These blocks may contain a large number of SNPs, but genotyping only a few SNPs, so-called tagSNPs, is required to identify the haplotypes in a block. The HapMap data in particular have been employed as a source of information about haplotypes in different populations and tagSNPs. These tagSNPs were used in GWAS studies to examine genomes for association with a certain phenotype.
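The idea that one SNP can stand in for another within a block can be illustrated with the squared correlation r² between two biallelic SNPs, a commonly used measure of linkage disequilibrium. The sketch below is illustrative only: the choice of r² as the measure, the function name, and all frequencies are assumptions and are not taken from the studies discussed in this chapter.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation (r^2) between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    # D measures how far the observed haplotype frequency departs from
    # the value expected if the two alleles were inherited independently.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical frequencies, for illustration only.
# Alleles A and B always occur together: one SNP fully predicts the other.
perfect = ld_r_squared(p_ab=0.3, p_a=0.3, p_b=0.3)
# Haplotype frequency equals the product of allele frequencies: no association.
independent = ld_r_squared(p_ab=0.09, p_a=0.3, p_b=0.3)

print(f"r^2 (complete LD): {perfect:.2f}")
print(f"r^2 (no LD):       {independent:.2f}")
```

An r² close to 1 means that genotyping one of the two SNPs is nearly as informative as genotyping both, which is the property a tagSNP relies on.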
A GWAS study should be designed very carefully to prevent bias and other problems in the subsequent analyses. The study populations most used in GWASs are case-control studies. Cases and controls should have the same ethnicity and geographical background to avoid false positive results due to population-stratification. For case inclusion, strict criteria should be taken into account to prevent inclusion of phenocopies within the study population.
Type I errors, i.e. false-positive results, can be avoided by choosing an appropriate significance level. In GWAS studies, multiple tests are performed and the significance levels should be corrected for these multiple comparisons. One way is to apply the Bonferroni correction, which adjusts the significance level dependent on the number of independent comparisons that were performed (Johnson et al, 2010, Risch & Merikangas, 1996). The Bonferroni correction might be too strict when tested SNPs are in linkage disequilibrium and therefore should not be considered as independent comparisons. Type II errors, false negative results, can be avoided by using large sample sizes.
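As a minimal sketch of the Bonferroni correction, the per-test significance level is the overall level divided by the number of comparisons. The overall level of 0.05 is an assumption chosen for illustration; 317,139 is the number of SNPs in one of the genome wide association studies discussed later in this chapter.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the overall type I error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumed overall significance level of 0.05, spread over 317,139 SNPs.
threshold = bonferroni_threshold(0.05, 317_139)
print(f"per-SNP significance threshold: {threshold:.2e}")
```

Because the calculation treats every SNP as an independent test, the resulting threshold is conservative when many of the SNPs are in linkage disequilibrium with each other.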
Positive results should be replicated in at least 2 other populations. The effect size and significance of a positive result are often overestimated in the first study. Consequently, the sample size of a replication study should be larger than that of the original GWAS study.
The CDCV hypothesis was not accepted by the whole field. Opponents argued that in many complex diseases already a spectrum of disease associated rare variants had become known in direct contradiction of the CDCV hypothesis which states that only a few variants would account for the risk in complex diseases (Pritchard, 2001).
The alternative hypothesis put forward by opponents of the CDCV hypothesis was the ‘common disease rare variant hypothesis (CDRV)’ that argues that multiple rare variants, with relatively high penetrance, are the major contributors to genetic susceptibility to complex diseases. The rare variants would be more important because they were more likely to be functional or have phenotypic effects (Gorlov et al, 2008, Pritchard, 2001, Schork et al, 2009). Also the observation of familial clustering of complex diseases strengthened the CDRV hypothesis (Schork et al, 2009). This hypothesis gained increasing support when the genetic variation found with GWAS studies explained collectively only a small fraction of the heritability of any disease in the population.
GWAS studies are not powered for the detection of rare variants. The only strategy available to identify such rare variants is to sequence DNA directly, either in candidate genes or in the whole genome. Performing large studies with conventional Sanger sequencing is very costly and time consuming, and therefore impossible in practice. With the introduction of next generation sequencing technologies, high-throughput sequencing of many genes became feasible at a reasonable price.
Next generation sequencing can be used for de novo sequencing and re-sequencing purposes. For humans, re-sequencing is used because the reference sequence is already known from the Human Genome project (Collins et al, 2003) and will be further improved by the 1000 Genomes Project (The 1000 Genomes Project Consortium, 2010), which just finished the pilot study at the end of 2010.
Next generation sequencing was first used for targeted re-sequencing of candidate genes in just a few subjects. Nowadays, whole exome sequencing can also be performed for a reasonable price, although the sample sizes in most studies are still limited. The best, non-hypothesis driven, method would be whole genome sequencing, but this is still quite expensive, especially when using large sample sizes.
Targeted re-sequencing of candidate genes was initially executed by first amplifying the target sequences by PCR and then sequencing these PCR products with a next generation sequencer. The PCR steps are very time consuming and, to accelerate the whole sample preparation process, a new method was developed: target enrichment. This method uses predesigned probes to enrich the DNA for the selected target genes and wash away remaining non-selected DNA sequences.
Next generation sequencers are improving constantly and are generating more and more reads with increasing read length. As a consequence, the total data output from one sequencing run is increasing and all these data need to be analyzed. The data analysis of the sequencing reactions remains a challenge. Especially the distinction of sequencing errors from real mutations is difficult and is best served by using a high coverage level, i.e. the same sequence is analysed multiple times. However, PCR errors that originated in the sample preparation phase cannot always be distinguished from real mutations and this problem is not solved by using higher coverage levels. Therefore, findings from next generation sequencing still have to be confirmed with Sanger sequencing.
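Why higher coverage helps against random sequencing errors can be sketched with a simple binomial model. This is a simplified illustration, not a description of any actual variant caller: it assumes that errors in different reads are independent, that every error produces the same wrong base, and that a variant is called when at least 30% of the reads show it. The 1% error rate and both coverage levels are hypothetical.

```python
from math import comb


def prob_at_least_k_errors(n_reads: int, k: int, error_rate: float) -> float:
    """P(X >= k) for X ~ Binomial(n_reads, error_rate).

    Gives the chance that k or more of n_reads reads show an error
    at the same position purely by chance.
    """
    return sum(
        comb(n_reads, i) * error_rate**i * (1 - error_rate) ** (n_reads - i)
        for i in range(k, n_reads + 1)
    )


# Hypothetical settings: 1% per-base error rate, calling threshold of 30% of reads.
low_coverage = prob_at_least_k_errors(n_reads=10, k=3, error_rate=0.01)
high_coverage = prob_at_least_k_errors(n_reads=100, k=30, error_rate=0.01)

print(f"false call probability at 10x coverage:  {low_coverage:.2e}")
print(f"false call probability at 100x coverage: {high_coverage:.2e}")
```

The model does not cover PCR errors introduced during sample preparation. Such an error is copied into many reads, so the independence assumption fails and additional coverage does not reduce it.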
In the next three sections, some genome wide linkage analysis studies, GWAS studies, and high-throughput sequencing studies in the field of thrombophilia will be discussed. High-throughput sequencing results for VTE are not available and therefore we will discuss some results from other complex diseases.
3.3. Results from genome wide linkage studies
Several genome wide linkage studies have been performed for venous thromboembolism. The first one was executed in the Genetic Analysis of Idiopathic Thrombophilia (GAIT) study (Souto et al, 2000a). The GAIT study consists of 21 extended Spanish pedigrees. Twelve of these families were selected through probands with idiopathic thrombophilia. The other 9 families were selected irrespective of any phenotype. Several genome scans were performed in the GAIT study. For the first scan 363 microsatellite markers were genotyped (Soria et al, 2002) while 485 microsatellite markers were used in the second scan (Lopez et al, 2008). Later, a scan employing 307,984 SNPs was performed (Buil et al, 2010, Malarstig et al, 2009). The investigators focussed on associations between genetic markers and intermediate phenotypes of VTE, like lipoprotein(a) levels (Lopez et al, 2008), factor XII levels (Soria et al, 2002), total plasma homocysteine (Malarstig et al, 2009), and C4BP plasma levels (Buil et al, 2010). In these studies quantitative trait loci were discovered, but these loci often included the structural gene for the investigated intermediate phenotype. The studies investigating total plasma homocysteine and C4BP plasma levels were combinations of linkage and association studies. In one of these, associations were found for SNPs near the ZNF366 gene and the PTPRD gene, which might suggest novel pathways for homocysteine metabolism.
The second main genome wide linkage analysis was performed in the Kindred Vermont II study, which includes a single large pedigree with a high rate of VTE, partly due to type I protein C deficiency resulting from a single mutation in the protein C gene. Only a subset of the carriers of this mutation experienced a VTE and therefore a genome scan was performed including 375 microsatellite markers to investigate the presence of a second thrombophilic mutation in this pedigree (Hasstedt et al, 2004). Three potential gene loci were found and 109 genes within these loci were re-sequenced. Only one SNP in the CADM1 gene was associated with VTE, but this association was limited to the subjects with PC deficiency (Hasstedt et al, 2009).
The GENES study included 22 families with unexplained thrombophilia (Wichers et al, 2009). Families were included through a proband with VTE and absence of known thrombophilic defects. This study found that the endogenous thrombin potential (ETP) was associated with VTE and therefore ETP was used as an intermediate phenotype for VTE. However, the heritability of ETP was mainly caused by only one large family (128 individuals). In this family, a genome wide linkage scan was performed for quantitative trait loci influencing ETP and other coagulation and fibrinolysis variables (Tanck et al, 2011). The highest LOD score (4.8) for PC levels was found on chromosome 20q11. Candidate gene analysis revealed that a locus of the PROCR gene is a genetic determinant for PC levels, as well as for soluble EPCR levels (Pintao et al, 2011).
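As a reading aid for the LOD scores reported in these linkage scans, the block below gives the standard definition of a LOD score as a base-10 log likelihood ratio. The symbols L(H1) and L(H0) are notation introduced here for illustration and are not taken from the cited studies, whose exact linkage models may differ.

```latex
\[
\mathrm{LOD} \;=\; \log_{10}\frac{L(H_1)}{L(H_0)}
\]
% H_1: a trait locus is linked to the marker position
% H_0: no linkage
% Under this definition, a LOD score of 4.8 means the observed data are
% about 10^{4.8} \approx 6.3 \times 10^{4} times more likely under
% linkage than under no linkage.
```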
3.4. Results from genome wide association studies
The first large scale association analysis for VTE had a multistage design (Bezemer et al, 2008). In the first stage 19,682 SNPs, selected based on their potential effect on gene function or expression, were genotyped in pooled DNA samples from the Leiden Thrombophilia Study (LETS), including 443 cases and 453 controls. This resulted in 1,206 SNPs that were significantly associated with VTE. These 1,206 SNPs were then tested for replication in pooled DNA samples from a subset of the Multiple Environmental and Genetic Assessment of Risk Factors for Venous Thrombosis (MEGA) study (1,398 cases and 1,757 controls). Of these, 104 SNPs were significantly associated with VTE in this population, and these SNPs were subsequently genotyped again in both populations, now in the individual samples. Eighteen SNPs remained associated and were tested for replication in another subset of the MEGA study. Eventually, four SNPs located in the CYP4V2/KLKB1/F11 gene cluster and the GP6 and SERPINC1 genes were consistently associated with VTE, as well as one SNP in the FV gene (Bezemer et al, 2010). Odds ratios ranged from 1.10 to 1.49.
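To make the reported effect sizes concrete, the following minimal sketch shows how an allelic odds ratio and an approximate 95% confidence interval can be computed from a 2x2 table of allele counts in cases and controls. The function name `odds_ratio_ci`, the log-based (Woolf) interval, and the example counts are assumptions chosen for illustration; they are not the analysis methods or data of the studies cited above.

```python
import math


def odds_ratio_ci(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 allele table.

    The interval uses the log-based (Woolf) approximation:
    log(OR) +/- z * sqrt(1/a + 1/b + 1/c + 1/d).
    """
    counts = (case_minor, case_major, ctrl_minor, ctrl_major)
    if any(c <= 0 for c in counts):
        raise ValueError("all four cell counts must be positive")

    # Odds of carrying the minor allele in cases divided by the odds in controls.
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)

    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    log_or = math.log(odds_ratio)

    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical allele counts, invented purely for illustration.
    or_, lower, upper = odds_ratio_ci(300, 700, 250, 750)
    print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An odds ratio in the range quoted above (1.10 to 1.49) corresponds to a modest increase in the odds of VTE per risk allele, which is why large case-control samples are needed for the confidence interval to exclude 1.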
The second large-scale association analysis was a genome wide study, including 317,139 SNPs (Tregouet et al, 2009). These SNPs were genotyped in 453 cases and 1,327 controls and the significant results were tested for replication in two independent case-control studies. This study found SNPs consistently associated with VTE only in two known VTE susceptibility genes: the FV and ABO blood group genes. The same authors also attempted to replicate the significant results found by Bezemer et al. in the two replication populations and confirmed the associations in the genes CYP4V2 and GP6.
A genome wide association study investigating the intermediate phenotype plasma protein C levels was performed in a large population of individuals of European ancestry in the Atherosclerosis Risk in Communities (ARIC) study (Tang et al, 2010). In this study approximately 2.5 million SNPs were genotyped in 8,048 subjects. Plasma protein C levels were associated with SNPs in the genes GCKR, PROC, PROCR, and EDEM2. All 4 loci were confirmed in a replication study including 1,376 subjects. A fifth locus in the gene BAZ1B was identified after pooling the results of the original study and the replication study.
3.5. High-throughput sequencing results for complex diseases
High-throughput sequencing technology was first used to sequence a limited number of candidate genes. The first study that published next generation sequencing driven data investigated 10 candidate genes in type I diabetes (Nejentsev et al, 2009). These candidate genes were chosen based on positive association signals found in these genes in a GWAS. Exons and regulatory sequences of the 10 genes were sequenced in pools of DNA of 48 subjects, with 480 cases and 480 controls in total. Four rare variants were found in the gene IFIH1. Association analysis in over 30,000 participants showed that these variants were associated with a reduced risk of type I diabetes, with odds ratios of 0.51-0.74.
Targeted re-sequencing was also performed for two intervals including the two candidate genes, FAAH and MGLL, for extreme obesity (Harismendy et al, 2010). These intervals were sequenced in 142 obese people and 147 controls. Rare variants were found in or near promoter sequences and other regulatory elements like transcriptional enhancers of these genes. The intervals including rare variants were associated with extreme obesity. Most of these variants had minor allele frequencies of <0.01.
For autism, whole exome sequencing was performed in 20 patients and their parents (O'Roak et al, 2011). Twenty-one de novo mutations were identified and 11 of these were protein altering. Most of the protein altering mutations were found in highly conserved amino acid residues. Potentially causative mutations were identified in 4 of the more severely affected probands, in the genes FOXP1, GRIN2B, SCN1A, and LAMC3.
4. Future directions
Future research into the genetics of venous thrombosis and other complex diseases will be largely based on the technologies that are now becoming available. When the costs of high-throughput sequencing experiments decline further, larger populations can be sequenced, as well as larger regions of the genome. It is already possible to capture the whole exome of the human genome for sequencing, but this is still too expensive to be performed in larger study populations. The ultimate goal is to sequence the whole human genome. This would be the most unbiased method to investigate genetic risk factors, because no assumption is made about the location of variants in the genome or about any pathway that is involved in VTE. Sequencers can generate an increasing amount of data, but the limiting factor is now the data analysis and the interpretation of the results for the disease under investigation. Improvements are still required in this field to support research into rare variants in complex diseases.
Rare variants are probably not the only biological elements that account for the unexplained heritability of VTE. Future research should also focus on other mechanisms that influence gene regulation and gene expression. Non-coding RNA molecules can be involved in chromatin modification, transcriptional regulation, and translational efficiency. Genetic variability in and expression of these non-coding RNAs might also have an effect on the development of diseases. Epigenetic mechanisms, like DNA methylation, also participate in the regulation of gene expression in a heritable manner. These epigenetic changes have already been associated with the aetiology of some diseases, like cancer, diabetes, and neurological disorders. Furthermore, it might be worthwhile to use pathway directed methods in the investigation of complex diseases. Because of gene-gene interactions, variability in biological systems as a whole might be more important than genetic variability in separate candidate genes in isolation. This might also be the reason why replication of the results of candidate gene association studies often fails.
If we get more insight from these data into the genetic architecture of venous thromboembolism and the pathways that are important in the development of this disease, personalized prediction and management might become reality.
Early studies of genetic risk factors for venous thromboembolism have revealed several genetic variations, like factor V Leiden and the prothrombin mutation, which increase the risk of developing venous thromboembolism. Based on studies in thrombophilia families that showed variability in penetrance of the phenotype, thrombophilia was proposed to be an oligogenetic disease. However, the established genetic risk factors do not explain the total heritability of venous thromboembolism, suggesting that genetic risk factors remain to be discovered. Association studies have attempted to make such discoveries by searching for common susceptibility variants, but the contribution of these studies has been limited. Other studies have to be performed to find new genetic determinants of venous thromboembolism. The most recent hypothesis is that unique, rare variants can explain much of the genetic susceptibility to VTE. With the introduction of high-throughput sequencing technology, rare variants can now be directly identified by candidate gene or whole exome sequencing approaches. The data analysis remains the biggest challenge of these types of studies. The most appropriate and unbiased method to determine new genes and pathways involved in disease would be a whole genome sequencing approach, but financially it is not yet possible to do this in large study populations. Although the focus of research in complex diseases is now mostly on rare variants, we have to realize that the unexplained heritability of venous thromboembolism might also reside in other elements that do not change the DNA sequence, but influence gene expression and regulation through other biological mechanisms.