Genetic Association Studies on Prostate Cancer

Modern research on the molecular basis of prostate cancer (PCa) development includes studies that aim to identify genetic markers which could be used in diagnostics and/or monitoring of PCa. Genome-wide association studies (GWASs) have identified over 75 variants associated with PCa risk. One of the major PCa-related regions identified through GWASs is a segment of 8q24. Other important PCa-susceptibility regions are 17q12, 17q24, 10q11, and 19q13. The candidate gene-based approach has also provided evidence of association between PCa risk and genetic variants located in functionally significant genes (both protein-coding and noncoding RNA genes) involved in normal prostatic cell growth, malignant transformation, or the development of metastases. Nevertheless, the success of these studies is questionable, since numerous candidate PCa-susceptibility variants were identified, but the results often failed to replicate. The main aim of both types of genetic association studies on PCa is the identification of potential PCa genetic markers which could be used for constructing reliable algorithms for evaluating the risk of PCa development and/or PCa progression.


Introduction
Alarming statistics on prostate cancer (PCa) incidence and mortality, as well as the results of epidemiological studies, have focused research efforts on discovering the molecular mechanisms underlying its onset and progression [1]. Still, the molecular basis of PCa pathogenesis remains largely unknown, while the results of studies in this area of research suggest that

Genome-wide association studies
The Human Genome Project was critical for making high-throughput genome-wide analyses possible. This project not only yielded DNA sequence information but also provided the basis for the development of methodology, including high-throughput genotyping assays, as well as software tools for analyzing large amounts of genetic data [13]. Therefore, sequencing of human DNA provided the basis for GWASs, including those on PCa [14].
To date (February 2016), GWASs have identified over 75 variants associated with prostate cancer risk, predominantly in populations of European ancestry (Figure 1) [15,16]. The first GWASs were conducted in 2007; for these, large collections of samples were obtained from PCa patients and healthy controls, and databases with the patients' clinical data were constructed [17][18][19]. The need for a large number of subjects in this type of study was obvious even in this early period. As in other complex diseases, PCa GWASs are usually designed in a multistage manner, with the whole set of tag single-nucleotide polymorphisms (tag-SNPs) being evaluated in the first phase, and only the subset of the most significant SNPs being tested for replication in much larger groups of patients and controls in subsequent phases [20,21]. In this way, only associations that remain significant across successive stages are retained as the most robust results [20].
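The multistage logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the allele counts, the SNP labels, and the stage-1 threshold are invented for the example, and the test used is a plain allelic chi-square test on a 2x2 table, which is one common choice rather than the method of any particular study cited here.

```python
import math

def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Return (odds_ratio, chi2, p_value) for a 2x2 table of allele counts."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table (1 degree of freedom, no continuity correction)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value

def select_for_replication(stage1_counts, threshold):
    """Keep only SNPs whose stage-1 p-value is below the (assumed) threshold."""
    return [snp for snp, counts in stage1_counts.items()
            if allelic_test(*counts)[2] < threshold]

# Hypothetical stage-1 allele counts: (case risk, case other, control risk, control other)
stage1 = {
    "snp_A": (620, 1380, 500, 1500),
    "snp_B": (510, 1490, 500, 1500),
}
# The threshold is an arbitrary illustrative value, not a recommended cutoff
carried_forward = select_for_replication(stage1, threshold=1e-3)
print(carried_forward)
```

In a real study the SNPs carried forward would then be genotyped in a larger, independent group, and the final claim would rest on the combined evidence and a genome-wide significance threshold.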
The results of initial GWASs showed that most of the PCa-associated genetic variants are located in so-called "gene deserts". The lack of protein-coding genes in these regions was explained by the supposed presence of regulatory sequences of major proto-oncogenes and tumor-suppressor genes [22,23]. Today, an additional explanation is the presence of genes encoding regulatory RNA molecules within PCa-risk regions [24].

8q24 region
One of the major PCa-related regions was found to be 8q24. A segment of approximately 1 million base pairs within 8q24 harbors multiple variants associated with PCa [25]. This region was first identified as associated with PCa susceptibility in a genome-wide linkage study conducted in the Icelandic population [26]. Later on, the association of genetic variants within this region with PCa risk was shown in the initial GWASs from 2007. Gudmundsson et al., Haiman et al., and Yeager et al. showed the association between the previously reported rs1447295 and PCa risk [17][18][19]. These first GWASs also identified other PCa susceptibility variants within 8q24, namely rs6983267 and rs16901979. Subsequent GWASs provided evidence for the association of other single-nucleotide genetic variants (SNVs) from 8q24 with PCa risk, such as rs4242382, rs7017300, and rs7837688 [27,28]. In recent years, by incorporating clinical data and using a case-only design, both GWASs and validation studies have provided evidence for an association of several loci within 8q24 with PCa aggressiveness or survival [29][30][31][32].
The PCa-susceptibility region within 8q24 was defined as a gene desert, since no known protein-coding genes were located within it. Nevertheless, the possible biological explanation for the effect of genetic variants located in 8q24 on PCa risk was their influence on the regulation of the expression of nearby genes, mainly C-MYC. It was suggested that regulatory sequences controlling the transcription rate of the C-MYC gene are located in 8q24, and that functional genetic variants which are in strong linkage disequilibrium (LD) with one or several PCa susceptibility loci affect the sequence, and therefore the function, of these regulatory elements [23]. Previous studies on molecular mechanisms of PCa pathogenesis have shown the functional significance of C-MYC, both by analyzing mutational signatures of malignant prostate tissue and by conducting functional analyses in cell cultures, which included stimulation or silencing of C-MYC expression [33]. Besides prostate cancer, several other malignancies were associated with 8q24, including breast and colorectal cancer. Some of the subregions of 8q24 associated with these cancers overlap with those related to PCa, while others differ (Figure 2) [34].
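Since the argument above rests on functional variants being in strong LD with the tag-SNPs found by GWASs, a short sketch of how pairwise LD is commonly quantified may help. It computes the standard measures D, D' and r² from one haplotype frequency and two allele frequencies; the function name and the numeric frequencies are hypothetical and are not taken from any 8q24 data.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    Returns (D, D_prime, r_squared).
    """
    d = p_ab - p_a * p_b  # deviation from independence of the two loci
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # D' rescales |D| to the maximum attainable given the allele frequencies
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared

# Hypothetical frequencies, chosen only to illustrate the calculation
d, d_prime, r_squared = linkage_disequilibrium(p_ab=0.18, p_a=0.20, p_b=0.25)
print(round(d, 4), round(d_prime, 4), round(r_squared, 4))
```

A high r² between a tag-SNP and an untyped variant means the tag-SNP is a good statistical proxy for it, which is why a GWAS hit in a gene desert may point to a nearby functional variant rather than being causal itself.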

17q12
17q12 is another PCa susceptibility region identified through the initial GWASs. Two of the genetic variants located in 17q12, rs7501939 and rs3760511, were found to be associated with the risk of developing PCa in the study by Gudmundsson et al. conducted in 2007 [35]. In this GWAS, minor alleles of these two single-nucleotide genetic variants were found to confer an increased risk of PCa in cohorts of participants from Iceland, the Netherlands, and the USA, while in the group of Hispanics this genetic association was not shown [35]. The results of this GWAS were further validated in multiple populations, mostly of European origin [36][37][38][39][40][41][42][43]. Validation studies were even conducted in African populations, in which genetic association studies on PCa are scarce [37][44][45][46][47]. The most recent meta-analysis of both GWASs and validation studies has also shown the association of these genetic variants with PCa risk [43].
SNVs rs7501939 and rs3760511 are located in the first intron of the gene encoding hepatocyte nuclear factor 1β (HNF1β), also known as transcription factor 2 (TCF2), which is a transcription factor showing a tissue-specific expression pattern. Therefore, the association of genetic variants located in 17q12 with PCa risk could be explained by the effect of functional genetic variants on HNF1β function or expression [41].

17q24
Another PCa-susceptibility region on chromosome 17 is 17q24. The genetic variants within this region that were found to be associated with PCa are intergenic. Similar to the 8q24 genetic variants, those located in 17q24 are found in a gene desert, probably harboring multiple regulatory sequences controlling the expression of surrounding genes [48]. One of the most proximal genes is SOX9, which is an important proto-oncogene in prostatic tissue. Recent findings have shown that PCa-associated genetic variants are located in an enhancer looping to the SOX9 gene [48]. Among these genetic variants are a tag-SNP previously identified through GWAS, as well as potentially functional genetic variants found by deep sequencing of the PCa-susceptibility region [35,48].

10q11
Two out of the three GWASs which were published in 2008 in the same issue of Nature Genetics identified PCa-associated genetic variants in the region 10q11 [27,36]. Afterward, other studies, including both GWASs and validation studies, provided additional evidence to support the association between 10q11 and PCa susceptibility [49][50][51][52][53][54][55]. One of these PCa risk-associated genetic variants is located in close proximity to the transcription start site of the gene Microseminoprotein B (MSMB), which encodes a tumor suppressor, and was therefore considered potentially functional. The risk allele of this genetic variant was further shown to reduce the expression of the MSMB gene [56,57]. The other gene in proximity to this genetic variant is Nuclear receptor coactivator 4 (NCOA4). The NCOA4 protein interacts with the androgen receptor (AR) and acts as a coactivator of androgen-responsive genes. Therefore, functional genetic variants in LD with GWAS hits could potentially contribute to PCa risk by affecting the expression of these two genes, or of others in proximity [58].

19q13
Region 19q13, harboring the kallikrein genes KLK2 and KLK3, was found to be associated with PCa susceptibility through GWASs [59]. Several genetic variants associated with PCa risk are located in the KLK3 gene, such as the missense SNV rs17632542 identified by fine-mapping of the PCa-associated subregion 19q13.33. These genes encode serine proteases, one of which is PSA, used for PCa diagnosis and disease monitoring. Therefore, the association of PCa-risk genetic variants with serum PSA level was evaluated, yielding statistically significant results for the potentially functional SNV rs17632542 [15,59].
Another subregion associated with PCa risk is 19q13.4, in which a GWAS hit is, in the Chinese population, in strong LD with a germline deletion affecting the LILRA3 gene, which is involved in inflammatory pathways [60].

Candidate gene-based approaches
Even before GWASs, the need to conduct association studies in order to identify low- and moderate-penetrance genetic variants that contribute to PCa risk was obvious. Therefore, numerous candidate genes were analyzed for genetic variants associated with PCa, with questionable success due to false discoveries and the lack of replication [61]. Candidates were selected based on their potential functional significance in normal prostatic cell growth, malignant transformation, or the development of metastases. Among these candidate genes are therefore those encoding proteins involved in androgen signaling, cell-cycle control mechanisms, major tumor suppressors, or proto-oncogenes, as well as those involved in cellular adhesion or communication with surrounding cellular or matrix components of prostate epithelium [62,63]. This implies the need for prior knowledge when designing case-control studies using the candidate gene approach [64].
Even though these studies were common before GWASs, they are still conducted in numerous populations, aiming to confirm previously found associations, or to identify new ones by analyzing other candidates, selected by using modern research results, such as those involved in regulatory functions of non-coding RNAs [65].

Androgen signaling
Since androgen signaling is essential for the growth and survival of prostate epithelial cells, genes involved in androgen biosynthesis, signal reception and transduction, as well as in androgen metabolism have emerged as candidates for case-control studies [63]. Most of these studies involved the Androgen receptor (AR) gene, as AR is the major component of androgen signaling and of the regulation of expression of androgen-responsive genes. A large proportion of these studies analyzed the potential association with PCa risk of the length of the CAG repeat string within exon 1, which encodes a polyglutamine tract of AR [66]. This homopolymeric tract is located in the N-terminal domain of AR, which possesses transactivational properties, and its length is inversely correlated with transactivation function [67]. Even though initial results were promising, the supposed association was not confirmed in a large percentage of later studies, and the effect sizes were not large enough to support a substantial biological role. Therefore, the association of this genetic variant with PCa risk remains controversial [68][69][70].
Another three-nucleotide (GGN) repeat string, encoding a polyglycine tract in AR, was analyzed for a potential association between its length and PCa risk. This repeat string is also located in exon 1, but it is less studied than the CAG repeat tract, possibly due to technical problems in amplifying GC-rich DNA regions [71]. The effect of the length of the GGN repeat string on the transactivational properties of AR is still unclear, and the other proposed mechanism of potential functional significance is an effect on AR translation [72]. Studies on the potential association of this microsatellite with PCa risk and progression yielded contrasting results [73][74][75][76][77][78][79].
Mixed results were also found for SRD5A2 (type II steroid 5α-reductase), which is the major enzyme converting testosterone to dihydrotestosterone. Similarly, studies analyzing genetic variants within CYP17, CYP19A, HSD17B, and HSD3B have shown initial promising results, lacking consistent validation [63,80].
Two genetic variants within PON1, L55M and Q192R, have been analyzed in multiple populations. The results to date are inconclusive, but the meta-analysis conducted in 2012 suggested an association of the L55M missense variant with PCa risk [82]. Also, a recent meta-analysis of only three PCa studies on Q192R showed a statistically significant association under several genetic models [83].
The most commonly analyzed SNVs in CYP1A1 are missense variants rs1048943 (p.Ile462Val) and rs4646903, which are also called MspI polymorphisms, since they alter the recognition site for MspI restriction enzyme. Numerous studies and also the recent meta-analyses showed the association between these SNVs and PCa risk [84][85][86].
The results obtained for genetic variants in CYP1B1 and CYP3A4 are controversial, with the recent meta-analyses suggesting the association of L432V, N453S, and A119S polymorphisms of CYP1B1 and A392G in CYP3A4 with PCa susceptibility [87,88].

DNA repair, cell cycle control, and apoptosis
Dysfunctions of DNA repair pathways, apoptosis regulation, and cell cycle control mechanisms alter the cell's response to DNA damage and lead to uncontrolled proliferation, progression, and metastasis of malignant diseases. Genetic variants in genes involved in these processes could therefore potentially contribute to cancer susceptibility and/or progression risk [62].
The most common SNVs in XRCC1 analyzed in case-control studies on cancer risk are rs1799782 (p.Arg194Trp), rs25489 (p.Arg280His), and rs25487 (p.Arg399Gln) [89,103]. These genetic variants were also analyzed for their potential association with PCa risk in numerous studies, but the obtained results were inconsistent [89,90]. For rs25489, an association with radiation-induced late toxicity in PCa patients was also shown [104]. Similarly, rs861539 (p.Thr241Met) in XRCC3 was found to be associated with early adverse effects induced by radiotherapy, based on quantitative data synthesis of 6 studies [105].
A recent study conducted in Spain showed the association of rs11615 in ERCC1 and rs17503908 in ATM with PCa aggressiveness [93]. Genetic variants in the same chromosomal region as ERCC1 were previously analyzed in a large study that provided opposing results. Nevertheless, this previous study was designed to include subjects from multiple populations, and its results could therefore be influenced by the genetic backgrounds of study participants [93,106].
Among genetic variants located in MDM2, the variant SNP309 in the promoter region was most frequently analyzed. This SNV was found to be associated with both PCa risk and aggressiveness in multiple studies [107,108]. The first study on this subject yielded no evidence of the supposed association [109]. Nevertheless, results obtained in several later studies suggested an association of SNP309 with the risk of PCa progression to a more advanced stage, or at least reached a statistical trend toward significance [108].
Numerous studies were conducted on a potential association between the CCND1 genetic variant rs603965 (p.Ala870Gly) and PCa risk, yielding inconsistent results [99]. This SNV was found to affect alternative splicing and thus alter the C-terminal domain. Other genetic variants within this gene were shown to be associated with the risk of PCa biochemical recurrence after radical prostatectomy [110].
The most extensively analyzed SNV located in TP53 gene is rs1042522 (p.Arg72Pro). This genetic variant was found to be associated with PCa risk, especially among Caucasians [102]. When it comes to BCL2, encoding the founding member of apoptosis regulatory proteins, promoter SNV c.-938C > A was associated with PCa risk, although lacking replication, as well as with disease-free survival and biochemical recurrence of PCa after radical prostatectomy [100,101,111].

Vitamin D signaling
Vitamin D signaling in PCa has a stimulatory effect on apoptosis, as well as an inhibitory effect on the progression of the cell cycle. Therefore, multiple genetic variants within the gene encoding the vitamin D receptor (VDR) were analyzed for their potential association with PCa risk and/or progression. The most frequently tested were the loci named FokI, BsmI, ApaI, and TaqI, according to the restriction enzyme used for genotyping, as well as Cdx2 in the promoter region and the polyA microsatellite [112][113][114].
Even though the initial results on these loci were promising, they were not replicated in multiple populations [113,115]. The association of these genetic variants with PCa progression parameters and disease outcome also remains inconclusive [113,116].

Chronic inflammation and angiogenesis
Numerous genes involved in chronic inflammation have been studied for the association of genetic variants that reside within them with PCa risk and/or progression [117]. Also, the importance of vascular support for cancer growth stimulated association studies on PCa analyzing genetic variants located in angiogenesis-related genes [62]. Since these processes are codependent, numerous genes primarily found to be involved in chronic inflammation are also discussed as angiogenesis-related genes, and vice versa.
There have been various PCa case-control studies involving the Vascular endothelial growth factor (VEGF) gene, encoding an important proangiogenic growth factor, as well as genes encoding Interleukin 8 (IL-8) and Interleukin 10 (IL-10) [96][124][125][126][127][128]. For the genetic variant rs1570360 (c.-1154G > A), located in the promoter region of VEGF, a statistically significant association with PCa risk was shown in several studies [126]. Most other VEGF genetic variants analyzed for potential association with PCa risk and/or progression are also located in the promoter region [126][129][130][131]. These SNVs could be associated with the transcription rate of VEGF [132], which is positively correlated with tumor stage and Gleason score, as well as with a shorter period of disease-free survival [133].
Candidates for this type of study were also genes encoding factors which regulate the expression of VEGF, such as Hypoxia inducible factor 1 (HIF1A), Epidermal growth factor (EGF), and Lymphotoxin α (LTA). Nevertheless, except for HIF1A, an association of genetic variants within these genes with PCa risk was not shown, or was mostly found in small-sample studies and poorly replicated [62,125,126,134]. Fibroblast growth factors (FGFs) are also among the key regulators of angiogenesis. Therefore, the gene encoding the receptor FGFR4 has been analyzed for genetic variants associated with PCa risk and/or progression. The most commonly tested SNV is the missense variant rs351855 (p.Gly388Arg), found to be associated with PCa risk and aggressiveness in a relatively small number of studies [135].
Among the most extensively analyzed candidate genes in PCa-related case-control studies are NOS3 and NOS2A, encoding nitric oxide synthases [136]. Both the endothelial and the inducible nitric oxide synthase, encoded by these genes, are enzymes that catalyze the production of NO and L-citrulline from L-arginine [137]. Being the major producer of NO in endothelial cells, eNOS, encoded by NOS3, is involved in the control of vascular tone and angiogenesis, which is essential for tumor growth and the development of metastases. Yet, the synthesis of NO is also associated with apoptosis, which has the opposing effect on carcinogenesis [138]. Numerous genetic variants within these genes, especially NOS3, have been analyzed for potential association with PCa risk and/or progression [136]. The most commonly analyzed SNVs are -786 T > C (rs2070744) and 894G > T (rs1799983) [139][140][141][142][143][144][145][146][147], while several studies included the insertion-deletion polymorphism 4a4b located in intron 4 of NOS3 [140,146,148,149]. The missense variant rs1799983 was hypothesized to affect NOS3 stability [150]. The other common SNV, rs2070744, affects promoter activity, since allele C creates a binding site for replication protein A1 (RPA1) [151].
Angiogenesis and tumor invasion also require degradation of the extracellular matrix and basement membranes, which is catalyzed by matrix metalloproteinases. Among the genes encoding this class of enzymes, MMP2 and MMP9 have been analyzed for genetic variants associated with PCa risk, and also with disease aggressiveness, due to their functional significance in tumor invasiveness [139][152][153][154][155][156]. A commonly analyzed genetic variant in the MMP2 promoter is rs243865. The minor allele of this SNV was shown to be associated with a reduced transcription rate of MMP2 [157].

Cellular adhesion
Among genes involved in cellular adhesion, CDH1, encoding E-cadherin, was the candidate gene in the largest number of case-control studies on PCa. Since aberrant expression of this gene is correlated with the increased metastatic potential of PCa, genetic variants in its promoter region were analyzed for potential association with PCa risk and progression [158,159]. The most extensively studied SNV, −160C > A, was found to affect CDH1 expression and was identified as a PCa susceptibility genetic variant in multiple populations [158,160].
Only a few studies also included genetic variants in genes encoding intercellular adhesion molecules (ICAMs), proteins involved in cellular adhesion and signaling. The analyzed genetic variants are located in the ICAM-1, ICAM-4, and ICAM-5 genes and need further evaluation for potential association with PCa risk and/or progression [161,162].

Long noncoding RNA genes
The potential involvement of long noncoding RNAs (lncRNAs) in prostate carcinogenesis was suggested not only by the results of expression analyses, which showed several known oncogenic and/or tumor-suppressive lncRNAs to be aberrantly expressed in malignant prostatic tissue or plasma samples from patients with PCa, but also by the identification of several PCa-specific lncRNAs [163,164].
Several SNVs in lncRNA genes were identified as PCa susceptibility variants in case-control studies on PCa. In their study published in 2011, Jin et al. stated that eight SNVs identified up to that time through GWASs are located in lncRNA intervals [165]. They also identified an SNV in a putative lncRNA, although it was not later experimentally confirmed as a PCa-susceptibility variant [165]. In a study published in 2013, Xue et al. showed the association between two tag-SNPs in Prostate cancer gene expression marker 1 (PCGEM1) and PCa risk in the Chinese population [166]. In another PCa-specific gene, prostate cancer associated 3 (PCA3), the length of a TAAA repeat string in the promoter region was analyzed. This genetic variant was also found to be associated with PCa risk [167]. In a GWAS published in 2014, Cook et al. identified rs7918885 in the RP11-543F8.2 gene as a PCa-susceptibility SNV in West African men, although the GWAS statistical significance threshold was not reached [168]. Also, by using fine-mapping and resequencing of a PCa-susceptibility subregion of 8q24, the lncRNA gene prostate cancer noncoding RNA 1 (PRNCR1) was found to be located between the most significantly associated genetic variants [169].

MicroRNA genes
Dysregulation of diverse regulatory mechanisms based on microRNA activity has been implicated in prostate carcinogenesis. Therefore, possibly functional genetic variants located in microRNA genes emerged as potential PCa-associated loci. Among these genetic variants are those that potentially influence microRNA biogenesis, stability of mature microRNAs, efficiency of target gene regulation, as well as target specificity. By affecting these features of microRNA regulatory mechanisms, microRNA SNVs could be associated with aberrant expression of various important PCa-related oncogenes or tumor-suppressive genes [170][171][172].
MicroRNA genetic variants have been analyzed for their potential association with PCa in only a few studies conducted in Asian populations and in a single population of European origin. These studies have provided discordant results on the effects on PCa risk of rs2910164 in hsa-miR-146a [173][174][175][176], a genetic variant in hsa-miR-196a2 [174,176,177], and rs3746444 in the hsa-miR-499 gene [174,177]. In a recent study, rs4705342, located in the hsa-miR-143 gene promoter, was found to be associated with the risk of developing PCa [178]. Since the number of conducted studies is small, additional findings from multiple populations are needed in order to draw further conclusions.
Another SNV, rs895819, located in the gene encoding miR-27a, which is androgen-regulated and stimulates androgen signaling in a positive feedback loop, was found to be associated with PCa risk, as well as with the development of distant metastases. Nevertheless, these results are derived from a single, relatively recent study on PCa risk and rs895819, and they need further validation [177].

Replication, validation studies, and meta-analyses
Differences in genetic backgrounds are an important issue in genetic association studies, so the interpretation of data requires discussing the potential differences between populations. To analyze such differences, multiple validation analyses are conducted in various populations and ethnicities. These studies are designed to resemble as closely as possible the original study that yielded the genetic associations, or the lack thereof. The rationale for conducting such studies is the possible lack of association between identified PCa-susceptibility variants and PCa risk in certain populations, or differences in effect sizes [179]. Replication studies, conducted in a confirmation group of participants from the same population in which the initial results were found, are a method of checking reproducibility and evaluating possible false positives and effect overestimation [179,180].
Currently, replications and validations are conducted both for GWAS results and for results from candidate gene-based studies. Of utmost importance is conducting replication and validation analyses of hits from studies with relatively small sample sizes, with poorly clinically characterized cases lacking data on possible confounders, or with questionable recruitment of controls [180]. Another important issue in case-control studies on PCa is the type of control group, which in some cases consists of healthy controls and in others of patients with benign prostatic hyperplasia (BPH). Furthermore, the classification systems for patients with PCa that are used for evaluating potential genetic associations with PCa progression differ between studies, which, together with the small sizes of patient groups, calls for replication of statistically significant findings.
All of these issues are potential reasons for the opposing results on the association of most genetic variants analyzed in multiple studies with PCa risk and progression. Therefore, in order to elucidate the effect of these genetic variants, meta-analyses of eligible studies are frequently conducted. Combining the results from smaller studies through data synthesis in a meta-analysis can increase statistical power [181]. Meta-analyses could thus provide more precise estimates, as well as insight into the potential effect of confounders [182], such as ethnicity, participant recruitment strategy, or study size.
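The gain in precision from data synthesis can be illustrated with a minimal fixed-effect, inverse-variance pooling of odds ratios. This is a sketch under explicit assumptions: the per-study odds ratios and 95% confidence intervals below are invented, the function name is hypothetical, and a fixed-effect model is used only for brevity. Published meta-analyses often use random-effects models and heterogeneity tests, which this sketch does not attempt.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval

def fixed_effect_meta(studies):
    """Pool (odds_ratio, ci_lower, ci_upper) tuples by inverse-variance weighting.

    Returns (pooled_or, pooled_ci_lower, pooled_ci_upper).
    """
    weights, weighted_logs = [], []
    for odds_ratio, lower, upper in studies:
        # Standard error of log(OR), recovered from the width of the reported 95% CI
        se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
        w = 1.0 / se ** 2
        weights.append(w)
        weighted_logs.append(w * math.log(odds_ratio))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z_95 * pooled_se),
            math.exp(pooled_log + Z_95 * pooled_se))

# Hypothetical small studies: neither excludes OR = 1 on its own
studies = [(1.2, 0.9, 1.6), (1.2, 0.9, 1.6)]
pooled_or, pooled_lo, pooled_hi = fixed_effect_meta(studies)
print(round(pooled_or, 3), round(pooled_lo, 3), round(pooled_hi, 3))
```

With these made-up inputs the pooled estimate stays at the same odds ratio while its confidence interval becomes narrower than that of either study, which is the increase in statistical power referred to above.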

Future perspectives
The main aim of genetic association studies on PCa is the identification of potential PCa genetic markers which could be used for constructing reliable algorithms for evaluating the risk for PCa development and/or PCa progression [1]. Therefore, it is important not only to identify these PCa-related genetic variants, but also to precisely characterize their effect sizes. In order to do that, ethnic differences need to be taken into account [179]. Other important issues in interpreting results of association studies are gene-gene and gene-environment interactions. Therefore, future research and designing such algorithms require integration of knowledge on genetic associations, cellular pathways, and statistical epistasis in which real biological interaction could be reflected.
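One simple form such a risk algorithm could take is an additive score that weights each person's risk-allele count by the logarithm of the variant's odds ratio. The sketch below assumes exactly that; the variant labels, odds ratios, and genotype are hypothetical placeholders, not published effect sizes, and the additive model deliberately ignores the gene-gene and gene-environment interactions mentioned above.

```python
import math

def polygenic_risk_score(genotype, odds_ratios):
    """Sum of risk-allele counts (0, 1 or 2) weighted by log odds ratios.

    genotype: dict mapping variant label -> number of risk alleles carried
    odds_ratios: dict mapping variant label -> per-allele odds ratio
    Variants missing from the genotype are treated as zero risk alleles.
    """
    return sum(genotype.get(variant, 0) * math.log(odds_ratio)
               for variant, odds_ratio in odds_ratios.items())

# Hypothetical per-allele odds ratios; real values would come from validated studies
illustrative_ors = {"variant_1": 1.25, "variant_2": 1.10, "variant_3": 1.40}
# Hypothetical individual: two, zero and one risk alleles at the three variants
person = {"variant_1": 2, "variant_2": 0, "variant_3": 1}

score = polygenic_risk_score(person, illustrative_ors)
# Under a multiplicative model, exp(score) is the odds relative to carrying no risk alleles
relative_odds = math.exp(score)
print(round(relative_odds, 4))
```

Because effect sizes can differ between populations, the weights in a score of this kind would need to be estimated, or at least validated, in the population where the score is applied.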
Since the major problem in clinical practice related to PCa is the overdiagnosis and monitoring of patients [4], additional studies on PCa aggressiveness with clinically well characterized groups of PCa patients are needed to identify genetic variants associated with PCa progression risk. The later implementation of algorithms based on these genetic variants could greatly improve clinical protocols in monitoring and treating PCa.

Conclusion
Efforts to improve clinical protocols in PCa diagnostics, monitoring, and treatment have resulted in genetic association studies on PCa. These studies aim to identify potential PCa genetic markers and to characterize their association with PCa risk and/or progression by measuring effect sizes. The identified and validated genetic markers could then be used for constructing reliable algorithms for evaluating the risk of PCa development and, more importantly, of PCa progression. Implementing such algorithms in clinical practice is expected to improve the distinction between early diagnosed PCa cases that require aggressive treatment and latent PCa cases which remain indolent during the patient's lifetime.