Principles and Application of Microarray Technology in Thyroid Cancer Research


Introduction
In recent decades, the ongoing development of microarrays and microarray platforms has revolutionized biological research. In cancer research especially, microarray technology has yielded many new insights and has been widely applied to elucidate the biological interrelations, effects, pathways and aetiology of cancers such as thyroid cancer (Kundel et al., 2010; Williams et al., 2011; Rousset et al., 2011; Cheng et al., 2011; Vierlinger et al., 2011). Microarrays have opened a new era of research, with new challenges for scientists. Moreover, these assays have contributed to the identification of potential therapeutic targets for drug development and of biomarker candidates for improving diagnostics. Microarrays are therefore expected to help improve therapy and enable personalized medicine.
We start this chapter by summarizing the flow of information from DNA to protein, and discussing points of interest for cancer research and cancer diagnostics.
The biological information of all eukaryotic organisms is stored in DNA, which is arranged in chromosomes to achieve a compact structure. DNA offers multiple points of interest for cancer research. These comprise, on the one hand, sequence-based variations, including structural and copy number variations, and, on the other hand, epigenetic variations, which have a regulatory effect on gene expression without changing the DNA sequence (Bird, 2007). In this chapter, we focus mainly on points of interest that can be examined experimentally and analyzed by microarray technology. Therefore, microarrays detecting sequence variations (SNP arrays), copy number alterations (array CGH) and DNA methylation will be discussed. There are also microarrays for histone acetylation studies based on chromatin immunoprecipitation (ChIP), in which chromatin fragmented by sonication or enzyme digestion is precipitated with specific antibodies and the precipitated DNA is then hybridized onto microarrays (chips). This technique is therefore called "ChIP on chip" and is used to identify DNA sequences within chromatin regions bound by modified histones. ChIP on chip is the method of choice for investigating histone modifications, which, together with DNA methylation, are major players in the epigenetic regulation of gene expression (Collas, 2010; Russo et al., 2011). The method is also useful for studying (non-epigenetic) gene regulation, but we do not discuss it in detail in this chapter and refer the reader to the review "The current state of chromatin immunoprecipitation" by Philippe Collas (Collas, 2010). Russo et al. recently reviewed the important role of epigenetic changes in thyroid tumorigenesis (Russo et al., 2011). They summarized epigenetic effects on many genes involved in thyroid carcinogenesis, including those involved in the reduced ability of the tumour to concentrate radioiodine.
The most prominent example of microarrays in cancer research is gene expression profiling of mRNA, whereby the expression levels of the entire transcriptome can be measured simultaneously. This, however, does not account for post-transcriptional gene silencing, which occurs when a small RNA (miRNA) interacts with mature mRNA. The result is a double-stranded RNA, which cannot be read by the ribosome, so the production of protein is blocked (Fire et al., 1998; Bagasra and Prilliman, 2004; MacRae et al., 2006; MacRae et al., 2007). Dysregulation of miRNAs has been associated with cancer (L. He et al., 2005; Mraz et al., 2009). Mazeh et al., for example, demonstrated the feasibility of using the mir-221 miRNA to detect papillary thyroid carcinoma in fine needle aspiration biopsy (FNAB) samples (Mazeh et al., 2011). Another study, by Kitano et al., evaluated the power of miRNAs to distinguish papillary from follicular thyroid carcinoma. They identified two miRNAs (miR-126 and miR-7) with high diagnostic accuracy (0.81 and 0.77) (Kitano et al., 2011).
Although most protein-based studies to date have been done by mass spectrometry, protein microarrays offer highly multiplexed analyses of protein abundance, as well as the identification of tumour autoantibodies. Tumour autoantibodies are produced as a result of a humoral immune response to antigens of the tumour itself (Tan et al., 2009). On microarrays we can either immobilize the antigens, to detect tumour-specific antibodies in the sample, or immobilize antibodies, to identify the antigens or proteins and their modifications within a sample.
In the following sections of this chapter, we discuss the general principles of microarray technology as well as the commercially available platforms used in thyroid cancer research.

The development of the microarray technology
The most prominent and most commonly used molecules in microarray technology are nucleic acids (DNA and RNA). In 1953, Watson and Crick laid the cornerstone for microarray technology with their description of the DNA double helix (Watson and Crick, 1953). Further knowledge of DNA in biological as well as technical contexts followed soon after, when it was found that the two strands of DNA could be separated by heat or alkali treatment (Cavalieri et al., 1962; Uhlenhopp and Krasna, 1969). The main principle of DNA hybridization is based on denaturation and the corresponding reverse process, renaturation. Renaturation was first described by Marmur and Doty and is highly specific under proper conditions (Marmur and Doty, 1961). Analysis of nucleic acids by hybridization has been a key method since the early 1960s. Another cornerstone for improving DNA hybridization was laid in the 1970s by Ed Southern (Southern, 1975), who introduced the pioneering technology of Southern blotting. This method combines the separation of DNA fragments by length using electrophoresis with subsequent hybridization of a probe for the specific detection of DNA fragments. A few years later, scientists developed a technology in which known molecules were immobilized on a nitrocellulose membrane or glass plate (Bains and Smith, 1988; Drmanac et al., 1989; Khrapko et al., 1989). One of the first scientists to use such membranes with high feature density was Jörg Hoheisel (Hoheisel et al., 1994). It was also Hoheisel who increased the feature density on the surfaces simply by replacing manual procedures with robotics for the production of so-called macroarrays. That invention took the technology a substantial step further, as it not only increased the feature density but also removed human error and made microarray technology reproducible and accurate (Wheelan et al., 2008).
The term microarray was first introduced by Schena et al. in 1995 (Schena et al., 1995), and the first complete microarray investigation of the genome of a eukaryotic species (Saccharomyces cerevisiae) was published in 1997 (Lashkari et al., 1997). In the last few years, further improvements were made, especially by replacing immobilized DNA probes derived from clone libraries with chemically synthesized oligonucleotides. These improvements became possible after entire genomes had been sequenced by large consortium projects such as the Human Genome Project. The sequence information from these projects laid the groundwork for the generations of arrays covering entire genomes and transcriptomes which are available today. Some commercially available arrays cover genomes at a density of several hundred thousand or even millions of features, which can be analysed in parallel in a single experiment. With the growing number of microarray applications and falling costs, the number of publications using microarray technologies in thyroid cancer research, as well as in the life sciences generally, has increased enormously in recent years (Figure 3). The range of applications and of questions that microarrays can answer is still growing, and microarrays have become a standard tool in all areas of the life sciences.

Principles of microarray analyses
The collective term "microarray" describes a state-of-the-art technology in molecular biology which allows high-throughput and highly parallel analyses of up to several thousand points of interest (e.g. genes, mRNA, proteins). With some platforms, up to several hundred thousand parallel measurements (Sandoval et al., 2011) can be produced in one experiment using nanogram amounts of sample material. Parallel to the increase in throughput, quality has also increased due to technical improvements in the array production process and in molecular labelling methods (Leung and Cavalieri, 2003; Priness et al., 2007; Thirlwell et al., 2010; Karakach et al., 2010; McCall et al., 2011).
The microarray itself consists of a carrier material. A very common, easy-to-handle surface is glass. The surfaces of the glass slides are usually modified with reactive molecules (e.g. aldehyde or epoxy groups) onto which the biomolecules (probes) can be immobilized. These probes are either printed on the surfaces using microarray spotters or synthesized directly on the surface using automated synthesizers; in either case, the reactive area between spots has to be blocked before hybridization of the targets. Direct synthesis is used by several commercial microarray providers and enables the production of high-density arrays. Most microarray formats are the size of a standard microscope slide and can be easily handled. This allows many samples to be processed per assay and results in the generation of a large amount of data (Howbrook et al., 2003; Karakach et al., 2010).
An overview of the necessary experimental steps is given in Figure 4. Although different types of biomolecules and different strategies are used to generate microarray-data, the technical workflows are almost identical. The key event in the microarray-processing is the interaction between a probe (e.g. oligonucleotide) immobilized on the surface of the array and the target (e.g. fluorescent-labelled DNA). The immobilized molecules on the surfaces are referred to as probes, whereas the molecules which are being detected upon hybridization and binding towards the probes are called targets (Wheelan et al., 2008).
Every microarray study starts with the isolation of the respective targets (e.g. DNA or RNA). In principle, nucleic acids are isolated after cell disruption by mechanical or enzymatic methods and precipitated at increased salt concentrations or with ethanol. State-of-the-art methods use selective binding to silica membranes or silica-coated magnetic beads at high salt or ethanol concentrations; proteins are washed off and the nucleic acids are eluted from the silica resin using water or low-salt buffers (e.g. 10 mM Tris-Cl, pH 8). The next step is labelling of the molecules with fluorescent dyes, for which various methods exist. Often the labelling is done during an enzymatic amplification reaction in which fluorescently labelled nucleotides or primers are incorporated into the newly synthesized amplicons (Schaferling and Nagl, 2006). Nowadays a broad range of fluorescent dyes with different excitation and emission wavelengths is available. The different excitation and emission maxima allow fluorescent dyes to be combined. In microarray analyses the fluorescent dyes Cy3 and Cy5 are widely used (Liang et al., 2003). After purification, the labelled amplicons are mixed with a hybridization buffer, applied to the microarrays and incubated overnight. After hybridization, unbound molecules have to be washed off before detection by laser scanning at dye-specific wavelengths. The detection step generates an image of the microarray, which is used for raw data extraction: the fluorescence intensity of each single spot on the microarray is measured and written to a results file along with the spot coordinates and the specific "gene" identifier. The intensity of the signal depends on the amount of molecules (targets) that have bound to the probe molecules within one spot (also called a feature).
The last step in a microarray experiment is the bioinformatic analysis of the data from a single slide (as in aCGH; see the following sections) or from many samples of distinct classes (e.g. n tumour samples vs. n normal tissues) processed in parallel within one experiment. Dealing with such a large amount of data is very challenging and requires considerable computing power as well as well-established bioinformatics tools for, to mention just a few examples, image acquisition and analysis, normalization, and statistical analyses such as class prediction and pathway analysis (Leung and Cavalieri, 2003). Experimental strategies for microarray analyses have to be planned carefully to permit the generation of conclusive results. In principle, experiments can be conducted as either single-colour or dual-colour experiments. As already mentioned, the commonly used dyes Cy3 (green, excitation 535 nm) and Cy5 (red, excitation 635 nm) enable distinct colour separation, so two targets can be hybridized in parallel on a single array. The combination of both colours can therefore be used for parallel hybridization of, for example, a tumour sample (red) and a reference sample (green). These so-called two-colour experiments are less error prone. The result is a ratio of the two colours, which depends on the contributions of sample 1 and sample 2, labelled with different fluorescent dyes, to the total amount of bound molecules (Patterson et al., 2006). Single-colour experiments are conducted by using a separate array for every sample and for every reference sample. The choice between single-colour and dual-colour experiments depends on the array platform used and on the specific experimental aims. These prerequisites have to be taken into account in experimental planning, and the interested reader should refer to the specialist literature (Simon et al., 2003).
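As an illustration of the two-colour ratio described above, the following minimal Python sketch computes a log2(Cy5/Cy3) ratio per spot and then median-centres the ratios, a simple global normalization. The function names, the intensity floor of 1.0 and the toy intensities are assumptions made for this example; they are not taken from any of the cited studies or from a specific analysis package.

```python
import math


def log2_ratios(cy5, cy3, floor=1.0):
    """Return log2(Cy5/Cy3) for each spot.

    `floor` is a hypothetical lower bound on the intensities that
    avoids taking the logarithm of zero for empty spots.
    """
    return [
        math.log2(max(red, floor) / max(green, floor))
        for red, green in zip(cy5, cy3)
    ]


def median_center(values):
    """Subtract the median so that the typical spot has a log ratio of 0."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return [v - median for v in values]


# Toy intensities (made up): tumour sample in Cy5, reference in Cy3.
tumour_cy5 = [1200.0, 400.0, 800.0, 3200.0]
reference_cy3 = [600.0, 400.0, 1600.0, 800.0]

raw = log2_ratios(tumour_cy5, reference_cy3)
centred = median_center(raw)
print(raw)
print(centred)
```

A positive log ratio indicates a stronger signal in the Cy5-labelled sample and a negative one a stronger signal in the Cy3-labelled reference. Real pipelines usually add background correction and intensity-dependent normalization, which this sketch omits.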

RNA expression microarrays
Analysis of mRNA expression profiles is still one of the most prominent applications of microarray technology. In expression profiling experiments, the mRNA is typically isolated from two samples (e.g. the normal tissue and tumour tissue of one individual) and subsequently reverse transcribed in an enzymatic reaction by reverse transcriptase polymerase chain reaction (RT-PCR) to generate complementary DNA (cDNA). The two cDNA samples are fluorescently labelled with different dyes (usually Cy3 and Cy5). The labelled cDNAs are then pooled and cohybridized onto the arrays. Finally, dual-colour images are generated by scanning the microarrays (Figure 5) (Schena et al., 1995; Duggan et al., 1999).
Prominent commercially available platforms are Affymetrix, Nimblegen, Agilent and Illumina, with the technology being pioneered by Affymetrix. At the time of writing this article (June 2011), Affymetrix offers five different arrays for expression analyses on human samples and nine for mouse and rat. Affymetrix arrays have been successfully applied in a number of studies on thyroid cancer.

RNA expression microarrays in thyroid cancer research
Most research on thyroid cancer utilizing microarrays has been conducted using gene expression microarrays. A number of gene expression studies have been performed for thyroid cancer in general, but also for its different subtypes. In Table 1 we list some interesting gene expression studies with a high impact on thyroid cancer research. The studies pursued different objectives, such as elucidating the biological processes involved in carcinogenesis, pathway analysis, and defining new diagnostic, prognostic and predictive markers. Barden et al. (Barden et al., 2003), for example, aimed to elucidate differences in the gene expression profiles of follicular thyroid adenomas (FTAs) and follicular thyroid carcinomas (FTCs) using Affymetrix chips (GeneChip Hu95 array). Their main result was a list of 105 genes with different expression profiles between the two types of thyroid nodule. With those genes they were able to correctly classify five follicular tumours whose final diagnosis had been undisclosed. In 2004, Finley et al. (Finley et al., 2004) subjected different thyroid nodules to gene expression profiling with Affymetrix's U95 GeneChip, with the objective of creating gene lists capable of distinguishing between malignant and benign cases. Their analysis was able to classify the 62 samples used into malignant and benign groups with high sensitivity and high specificity. Gene expression profiling has not only confirmed known thyroid cancer associated genes, but has also revealed genes which had not previously been known to be associated with thyroid cancer. Mazzanti et al. (Mazzanti et al., 2004) sought to discriminate between benign and malignant thyroid tumours using fine needle aspirations (FNA). They were able to set up a gene list with high discrimination power (87.1% accuracy; 12.9% error rate) between benign and malignant thyroid nodules.
Translating these findings into routine clinical practice could improve the accuracy of diagnosis and hence patient care. With the increasing use of FNAs, Kundel et al. (Kundel et al., 2010) investigated the usability of FNAs compared to tissue specimens in microarray experiments, utilizing the U133 GeneChip from Affymetrix. They concluded that FNAs are a good alternative to tissue specimens, as clustering analysis could pair together the concordant pairs with perfect sensitivity and specificity. In 2005, Eszlinger et al. (Eszlinger et al., 2005) used Affymetrix GeneChips to demonstrate that RAS-MAPK signalling does not contribute to cold thyroid nodules (CTN), which had been in question for that subtype of thyroid cancer. In addition, this study described 31 genes that were differentially regulated between CTN and the surrounding tissue. Lacroix et al. (Lacroix et al., 2005) and Giordano et al. (Giordano et al., 2006) investigated whether a subset of FTCs with a PAX8/PPARG translocation possesses a unique gene expression profile compared to other thyroid tumours. Lacroix et al. used a custom array system from Agilent, whereas Giordano et al. used the Affymetrix U133A GeneChip. Both studies revealed a distinct gene expression profile of FTCs with the PAX8/PPARG translocation. Lacroix et al. defined a list of 93 genes, which also included non-thyroid-specific genes, and Giordano et al. described four genes (ANGPTL4, AQP7, ENO3, PGF) strongly associated with the translocation. Finn et al. (Finn et al., 2007b) used a genome-wide gene expression microarray system from Applied Biosystems to search for markers to discriminate between PTC and the "follicular variant" of PTC (FVPTC), a subtype of PTC which is difficult to diagnose. They were able to identify 15 new genes (CD14, CD74, CTSC, CTSH, CTSS, DPP6, ETHE1, HLA-A, HLA-DMA, HLA-DPB1, HLA-DQB1, HLA-DRA, OSTF1, TDO2 and one uncharacterized/unnamed gene) which were associated with FVPTC, and found that the identified genes cover a narrow repertoire of functions.
Since the Chernobyl disaster in 1986, an increased incidence of thyroid carcinomas has been observed, especially in the juvenile population. For this reason, Stein et al. and Kim et al. (among others) performed studies to assess radiation-induced DNA damage. Stein et al. (Stein et al., 2010) revealed a set of differentially regulated genes in radiation-induced papillary thyroid carcinoma in Chernobyl paediatric patients, and Kim et al. (Kim et al., 2010) sought to define a gene list which could distinguish between papillary thyroid carcinomas and papillary thyroid microcarcinomas (PTM). For that purpose, Kim et al. used Affymetrix's Human Genome U133A Chip, comparing PTCs and PTMs with the corresponding normal tissue counterparts. They identified over 200 statistically significantly upregulated genes and over 150 downregulated genes in both groups, but they did not find any statistically significant differentially expressed genes when comparing the expression profiles of PTC and PTM. Kim's study demonstrated that a great deal of information on the different groups could be obtained in a simple but well-planned experiment. In a very recent study, Williams et al. (Williams et al., 2011) analysed thyroid nodules on the Affymetrix U133 GeneChip with the aim of setting up gene lists with discriminatory power between aggressive and nonaggressive follicular carcinomas. They reported three gene lists which discriminated between histologically normal thyroid tissue and follicular neoplasms (421 genes), between FTCs and FTAs (94 genes) and between aggressive FTC and nonaggressive FTC (4 genes; NID2, TM7SF2, TRIM2, and GLTSCR2). Recently, Rousset et al. (Rousset et al., 2011) identified 19 genes that distinguish between malignant and benign thyroid tumours and can be applied for improved diagnostic testing using molecular methods.
Given the bulk of publicly available data on thyroid cancer, we performed a meta-analysis of publicly available gene expression data from microarray experiments. A meta-analysis combines the data from different studies and applies statistical methods to remove the bias caused by the different origins (laboratories, array types, etc.) of the data sets. Conventionally, discrimination between benign and malignant thyroid nodules is done by fine needle aspiration biopsy (FNAB) followed by cytological assessment. Thyroid nodules are typically classified by their histology into benign types, such as Nodular Goiter (NG) and Follicular Thyroid Adenoma (FTA), and malignant entities, namely Follicular Thyroid Carcinoma (FTC), Papillary Thyroid Carcinoma (PTC), Medullary Thyroid Carcinoma (MTC) and Anaplastic Thyroid Carcinoma (ATC). Only approximately 5%-10% of thyroid nodules are malignant (Mazzaferri, 1992), the majority of which are papillary carcinomas. Despite many advances in the diagnosis and treatment of thyroid nodules and thyroid cancer, conventional diagnostic methods have a well-known low specificity (Cooper et al., 2006), resulting in an "indeterminate" or "suspicious" diagnosis in 10%-20% of cases. These patients usually undergo surgery, although the nodules are actually malignant in only 20% of these cases (Chang et al., 1997; Ravetto et al., 2000). As a result, a number of patients are treated unnecessarily for malignant disease. We therefore followed the approach of using microarray gene expression profiles to obtain a diagnostic gene signature, with the potential of allowing a precise and reliable diagnosis from fine needle aspirates in the future. Before starting our own gene expression experiments in the lab using 44k whole genome arrays, we used publicly available microarray data sets from four studies (Huang et al., 2001; Jarzab et al., 2005; H. He et al., 2005) on PTC and applied an adopted meta-analysis approach. The methodology included bias removal between the four different studies using distance weighted discrimination (DWD) (Benito et al., 2004) (Figure 6).
Fig. 6. DWD integration. The effect of DWD on the first two principal components (PC) and hierarchical clustering of the data. DWD was able to remove the separation between the datasets, as indicated by the PC plots and by the mixing of the branches in the dendrogram. The PC plots show that biological information is preserved after DWD integration (samples cluster by dataset before integration and by tumour entity thereafter). Leaves in the dendrogram are coloured by tumour entity and branches are coloured according to dataset.
From this meta-analysis, we could identify a one-gene classifier (SERPINA1) for PTC (Vierlinger et al., 2011). Identification of papillary thyroid disease was further validated by rigorous study cross-validation, in which the classification of papillary thyroid disease with SERPINA1 as a single marker was achieved with 99% accuracy in leave-one-out cross-validation and 93% accuracy by external real-time PCR validation using a data set generated in our own laboratory. In the latter dataset we analysed 82 thyroid samples from different entities, PTC (n=19), NG (n=18), FTC (n=13), FTA (n=18), ATC (n=3), MTC (n=6) and normal thyroid tissue (n=5), and tested the discriminative power of SERPINA1. Figure 7 shows the signal intensities and ROC plots of the SERPINA1 probe across the different entities in the meta-analysis data and our real-time PCR validation.

Encouraged by the results from our meta-analysis on papillary carcinoma, which indicated a huge potential for future diagnostic applications, we performed microarray analysis in our laboratory on 49 N2-frozen thyroid tumours from all major histological entities using Agilent 44k whole genome microarrays. From these data, we successfully selected features which had, in combination, a high discriminative power between (1) benign and malignant nodules and (2) follicular adenoma and follicular carcinoma. These two sets of features (20 genes for malignancy and 23 genes for the follicular classification task) were then tested on independent published datasets using leave-one-out cross-validation (nearest shrunken centroid classification). We successfully tested the genes for classification task 1 (malignancy) on a total of 246 samples from eight different studies with an accuracy of 92% (19 misclassified) and the genes for classification task 2 (FTC vs. FTA) on 60 samples from three studies with an accuracy of 98% (one sample misclassified).
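The leave-one-out cross-validation mentioned above can be sketched in a few lines of Python. Note that this sketch uses a plain nearest-centroid classifier rather than the nearest shrunken centroid method named in the text (it omits the shrinkage step), and that the expression values, function names and labels are invented purely for illustration. The last helper simply reproduces the arithmetic behind the reported accuracies.

```python
def centroid(rows):
    """Mean expression vector of a list of samples."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(train_x, train_y, sample):
    """Assign `sample` to the class with the closest centroid."""
    by_class = {}
    for row, label in zip(train_x, train_y):
        by_class.setdefault(label, []).append(row)
    centroids = {label: centroid(rows) for label, rows in by_class.items()}
    return min(centroids, key=lambda lab: squared_distance(centroids[lab], sample))


def leave_one_out_accuracy(x, y):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)


def accuracy_from_errors(total, misclassified):
    """Accuracy implied by a count of misclassified samples."""
    return (total - misclassified) / total


# Toy data (made up): two genes measured in six samples.
expression = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1],
              [4.0, 1.0], [4.2, 0.8], [3.9, 1.1]]
labels = ["benign"] * 3 + ["malignant"] * 3

toy_accuracy = leave_one_out_accuracy(expression, labels)
task1 = accuracy_from_errors(246, 19)  # malignancy task reported above
task2 = accuracy_from_errors(60, 1)    # FTC vs. FTA task reported above
print(toy_accuracy, round(task1, 2), round(task2, 2))
```

Holding out each sample in turn means that no sample is ever classified by a model that was trained on it, which is why this scheme is a common choice when only a few dozen samples are available.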

aCGH
Array comparative genomic hybridization (aCGH) is a method to detect copy number alterations in a genome (Shinawi and Cheung, 2008). aCGH is an array-based variant of comparative (chromosomal) genomic hybridization, a cytogenetic method to detect copy number variations in DNA (Cheung et al., 2005).
The process employs a test DNA (e.g. from cancerous tissue) and a normal reference DNA. The two DNA samples are labelled with different fluorescent dyes and subsequently hybridized to the microarray. The result is a colour ratio of the two samples, from which copy number changes (gain or loss) can be detected along all the chromosomes (Figure 8). Early CGH-based methods used entire chromosomes, which were painted (in principle using fluorescent in situ hybridization, FISH, techniques), and the colour ratios were measured along the chromosomes. Compared to modern microarray-based approaches, these methods had several limitations, such as a low optical resolution. To detect single-copy losses within a genome, the losses had to be at least 5-10 Mb in length. On array platforms, copy number variations of 5-10 kb can be detected. Today, high-resolution arrays are available which allow the detection of copy number variations as small as 200 bp (Urban et al., 2006). Therefore, even microdeletions and duplications in different diseases (e.g. cancer) can be detected.
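To make the step from colour ratio to copy number call concrete, here is a minimal Python sketch. It assumes a diploid reference (two copies), a pure test sample and a symmetric calling threshold of 0.3 on the log2 scale; the threshold, the function names and the probe values are hypothetical choices for this example and are not taken from the cited studies.

```python
import math


def expected_log2_ratio(test_copies, reference_copies=2):
    """Ideal log2(test/reference) ratio for a given copy number."""
    return math.log2(test_copies / reference_copies)


def call_copy_number(log2_ratio, threshold=0.3):
    """Classify a probe as gain, loss or normal.

    `threshold` is an assumed cut-off; real analyses tune it to the
    noise of the platform and usually segment neighbouring probes first.
    """
    if log2_ratio >= threshold:
        return "gain"
    if log2_ratio <= -threshold:
        return "loss"
    return "normal"


# Made-up log2 ratios for three hypothetical probes.
probes = [("probe_A", -0.95), ("probe_B", 0.55), ("probe_C", 0.05)]
calls = {name: call_copy_number(ratio) for name, ratio in probes}

print(expected_log2_ratio(1))            # single-copy loss
print(round(expected_log2_ratio(3), 3))  # single-copy gain
print(calls)
```

Under these assumptions a single-copy loss gives a log2 ratio of -1, while a single-copy gain gives only about 0.585, so gains produce a weaker signal than losses of the same size. Contamination of the tumour sample with normal cells pulls both values towards zero.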

aCGH applications in thyroid cancer
In 2007, Rodrigues et al. (Rodrigues et al., 2007) screened for copy number variations within aneuploid PTC. They found copy number gains as well as losses in all analyzed samples. Nine gains in DNA copy number occurred in at least 50% of the analyzed cases; the most frequent gain, in the 5q region, was found in over 70% of cases, followed by gains of 7p, 7q and 12q in 65% of the carcinomas. Copy number losses were much less frequent than gains, and fewer samples were affected by them. Only one loss (9q) occurred in more than 50% of the cases and five losses (1p, 9q, 22q, 11q, 13q) were found in 35-50% of the samples. Finn et al. (Finn et al., 2007a) investigated copy number gains and losses in PTC and found that chromosomal imbalances are more frequent than previously assumed, and that a gain in PDGFB alone was seen in tumours free of the BRAF mutation (the BRAF mutation had been identified as contributing to sporadic PTC). Finn et al. correlated the overexpression of FGF4 and PDGF with a gain in copy number. Copy number changes have also been identified in ATCs. Lee et al. (Lee et al., 2008) investigated ATCs in aCGH studies and found copy number changes in all analysed ATCs, especially in the genes CCND1 and UBCH10; a characteristic of ATC is the overexpression of the CCND1 gene product, which is due to a gain in copy number. The previously mentioned study by Stein et al. (Stein et al., 2010) employed aCGH analyses, in addition to expression analyses, to examine the genomic effects of radiation from the Chernobyl disaster. They were able to detect a number of regions with copy number alterations, including regions which had never before been associated with PTCs and are therefore unique to radiation-induced PTCs. They also concluded that gains are more frequent than deletions.

SNP arrays
A single nucleotide polymorphism (SNP) is a variation in a single base pair in DNA. The human genome contains approximately 10 million SNPs, which are conserved during evolution and within populations. SNP arrays, such as those from Affymetrix (Genome-Wide Human SNP Array 6.0, www.affymetrix.com), can be applied not only to detect polymorphisms throughout the entire human genome, but also as an aCGH platform, using the allele ratios as an indicator of copy number variations. SNP arrays use the "single base extension" (SBE) principle. In single base extension, dideoxy-NTPs are used instead of deoxy-NTPs (ddNTP vs. dNTP). Because of the chemical structure of ddNTPs, the polymerase can attach just one fluorescently labelled ddNTP to the SBE primer; further elongation is not possible. For each SNP variant (e.g. C or T at a single genomic location) a specific probe is present (thus two probes are necessary to detect the C/T variant in this example). The incorporated ddNTP is complementary to the investigated SNP. A signal is therefore generated on a specific spot only when the complementary ddNTP can be bound at that microarray spot. SNP arrays based on a similar detection principle are also offered by Illumina. Nowadays these two companies offer the most comprehensive range of SNP arrays, at different resolutions and for different organisms.
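The detection logic described above can be illustrated with a short Python sketch: the labelled ddNTP added by the polymerase is the complement of the base at the SNP position, and a genotype can be called from the relative signal of the two allele-specific probes. The fraction cut-offs of 0.2 and 0.8, the function names and the signal values are assumptions made for this example and do not describe the calling algorithm of any commercial platform.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extended_ddntp(template_base):
    """Base of the labelled ddNTP incorporated opposite the SNP position."""
    return COMPLEMENT[template_base]


def call_genotype(signal_a, signal_b, allele_a, allele_b, low=0.2, high=0.8):
    """Call a genotype from the signals of the two allele-specific probes.

    `low` and `high` are assumed cut-offs on the allele-A signal fraction.
    Returns None when neither probe produced any signal.
    """
    total = signal_a + signal_b
    if total == 0:
        return None
    fraction_a = signal_a / total
    if fraction_a >= high:
        return allele_a + allele_a
    if fraction_a <= low:
        return allele_b + allele_b
    return allele_a + allele_b


# Made-up probe signals for a C/T SNP in three hypothetical samples.
samples = {"s1": (950, 50), "s2": (480, 520), "s3": (30, 970)}
genotypes = {
    name: call_genotype(sig_c, sig_t, "C", "T")
    for name, (sig_c, sig_t) in samples.items()
}
print(extended_ddntp("C"))
print(genotypes)
```

In this toy example the intermediate signal fraction of the second sample is read as a heterozygous genotype. The same allele fractions, tracked along a chromosome, are what allows SNP arrays to serve as an indicator of copy number changes.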

SNP array application in thyroid cancer
SNPs have not been well investigated in thyroid cancer, although they could contribute to cancer development. Few papers investigating SNPs in thyroid cancer have been published so far, and only one genome-wide association study has been done. That study investigated 192 cases and 37196 controls (both Icelandic) and identified two SNPs, at 9q22.3 (nearest gene FOXE1) and 14q13.3 (nearest gene TTF1), which are highly associated with thyroid cancer. The risk of developing PTC and FTC is 5.7 times higher in carriers of the mutation than in non-carriers. The study also found that both alleles contribute to low concentrations of thyroid stimulating hormone (TSH) and that the 9q22.3 variant is also associated with low T4 and high T3 concentrations (Gudmundsson et al., 2009).

DNA methylation arrays
Major players of epigenetic regulation are CpG methylation of DNA and histone modifications such as methylation, acetylation and phosphorylation (Huang et al., 2010). These changes do not affect the DNA sequence itself, but affect gene transcription through structural changes of the chromatin conformation that enable or disable access of transcription factors. DNA methylation and histone modification interact with each other, and clarifying these mechanisms is a relatively young area of research. Histone modifications can be analysed by "ChIP on chip", using promoter and CpG-island arrays as well as tiling arrays (which carry immobilized probes representing the specified genomic DNA regions). These methods are comparable to those used in aCGH but first require a chromatin immunoprecipitation with a specific antibody.
In this part we focus on DNA methylation, which occurs at the carbon-5 position of the cytosine ring. Epigenetic events play a major role in gene expression (Sandoval et al., 2011), and aberrant DNA methylation changes are an early event as well as a key event in human cancer development, affecting transcriptional regulation (Berdasco and Esteller, 2010).
Many research groups have investigated DNA methylation, which was first discovered in the late 1940s (Hotchkiss, 1948). Further important knowledge about DNA methylation with respect to cancer research was generated 40 years later by Adrian Bird, who characterized genomic regions with a high density of CG dinucleotides, called CpG-islands (Bird, 1986). In the vertebrate genome only cytosine residues within CpG dinucleotides can be methylated, creating 5-methylcytosine (mC). Methylation of CpG dinucleotides within CpG-islands is associated with transcriptional silencing of genes. In mammalian development, DNA methylation has a major impact on X-inactivation and imprinting of genes (Senner, 2011). The great advantage of analyzing DNA methylation compared to other epigenetic modifications is its stability (Senner, 2011) in the various types of biological and clinical material available to the researcher. A number of studies have established a link between hypermethylation of CpG-islands in promoter regions and tumorigenesis (Kass et al., 1997; Baylin and Herman, 2000). Since then, research activity on DNA methylation in cancer has increased dramatically. This development goes along with the need for improved high-throughput techniques, and companies have responded by manufacturing DNA methylation arrays. Methods for analysing DNA methylation patterns throughout the genome include methylated-DNA immunoprecipitation (MeDIP, similar to ChIP but using an antibody specific to mC), sodium-bisulfite-based DNA deamination (unmethylated cytosine is converted to uracil by deamination, whereas methylated cytosine is not converted), and digestion with methylation-sensitive restriction enzymes (MSREs, which are blocked by methylated DNA). DNA processed by these methods can be applied either to promoter arrays (which investigate DNA methylation within the CpG-island or promoter regions of genes, e.g. Agilent or Affymetrix promoter arrays) or to microarrays using bead-based technology (e.g. Illumina). After bisulfite deamination, single CpG methylation events (C has to be differentiated from U) are detected by methods similar to those used for SNP arrays. At the moment Illumina offers the Infinium 450k BeadChip, which is currently the most comprehensive microarray for genome-wide DNA methylation studies. This chip allows the simultaneous investigation of 450000 CpGs throughout the entire human genome and is not restricted to the promoter regions of genes (www.illumina.com).
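The bisulfite principle described above can be sketched in a few lines. The example assumes a toy sequence and a hypothetical set of methylated positions. Unmethylated cytosines are written as T, because the uracil produced by deamination is read as thymine after amplification. The second function shows a commonly used per-CpG methylation score (often called a beta value) computed from a methylated and an unmethylated signal; the offset of 100 is a widely used stabilizing constant and is treated here as an assumption, not a specification of any particular platform.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite conversion of one DNA strand.

    seq: DNA sequence (string of A, C, G, T).
    methylated_positions: set of 0-based indices of methylated cytosines
    (hypothetical input for illustration).
    Unmethylated C is converted (C -> U, read as T); methylated C stays C.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def beta_value(meth_signal, unmeth_signal, offset=100.0):
    """Methylation score between 0 (unmethylated) and 1 (fully methylated)."""
    return meth_signal / (meth_signal + unmeth_signal + offset)


if __name__ == "__main__":
    # The C at index 1 is methylated and survives; the C at index 4 is converted.
    print(bisulfite_convert("ACGTCGA", {1}))
    print(beta_value(900.0, 0.0))
```

After conversion, a methylated and an unmethylated CpG differ by a single base (C versus T), which is why the readout can reuse SNP-style detection.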

DNA methylation in thyroid cancer
A number of studies have revealed the potential of DNA methylation profiles in tumour diagnostics. In the last decade several studies dealing with epigenetic modifications leading to, or affecting, thyroid cancer were performed; however, large microarray-based studies are missing. The first PCR-based (not microarray-based) studies were performed by Elisei et al. (Elisei et al., 1998) and Xing et al. (Xing et al., 2004), who investigated p16INK4A and RASSF1A, respectively. Elisei et al. found hypermethylated regions of p16INK4A in 30% of the thyroid carcinomas. Xing et al. found over 25% of the RASSF1A alleles methylated in 20% of the PTCs, in 44% of the benign thyroid tumours and in 75% of the FTCs, and hypothesized that RASSF1A methylation contributes to the development of tumours. In the following years more studies were performed, focusing on the different subtypes of thyroid cancer. Those studies showed that hypermethylated genes occur with varying frequency in thyroid cancer subtypes. Zuo et al. (Zuo et al., 2010) reported a hypermethylated Rap1GAP gene in 71% of all PTCs. Alvarez-Nunez et al. (Alvarez-Nunez et al., 2006) showed that a modulator of the PI3K/Akt pathway (the PTEN gene) was hypermethylated in 100% of the FTCs and in 50% of the PTC cases. The studies by Guan et al. (Guan et al., 2008) and Hu et al. (Hu et al., 2006) identified five genes (hMLH1, SLC5A8, TIMP-3, DAPK, RARβ2) that were hypermethylated in PTCs. Although the frequency of hypermethylation of these five genes varied in PTC (ranging from 22% for RARβ2 to 53% for TIMP-3), all of them were associated with BRAF mutations. All of the mentioned studies focused on a few genes, since large genome-wide DNA methylation studies are still missing or not yet published.
Recently we performed a microarray-based methylation study with the aim of identifying methylation markers for different thyroid nodules. We used a self-manufactured targeted microarray called the "AIT CpG 360 cancer array", which targets CpG-islands of 323 genes (patent number: WO2010086389A1). Six histological classes (normal thyroid tissue [SD]; struma nodosa [SN, benign]; FTA; FTC; PTC; MTC) were subjected to microarray analyses. The focus was on methylation markers that distinguish in general between benign (struma nodosa, FTA) and malignant (FTC, PTC) thyroid tissue, but we also aimed to identify methylation markers capable of distinguishing between FTC and FTA (which are diagnostically difficult to separate), between PTC and FTA, and between struma nodosa and FTC or PTC. We generated 10 classifiers (Table 2) which have high discriminatory power between the different groups of thyroid nodules (patent number: WO2010086389A1). The classifiers were created by applying a statistical method for class prediction and contained between 5 and 37 genes, with which samples can be classified correctly with high specificity and sensitivity.
In this context we wish to point to cluster 7, for which we defined an 18-gene classifier for the discrimination between FTA and FTC, which can be difficult to distinguish by cytology. With this classifier, 100% of the FTA and FTC samples (n=37) were classified correctly. The classifiers of clusters 2 and 3 also offer high discriminatory power, with 93% correct classification between the predefined groups.

High density protein microarrays for tumour autoantibody detection
In recent years, a great deal of effort has gone into developing screens for biomarkers at the proteomic level. Proteomics has improved greatly through separation techniques based on high-resolution 2D gel electrophoresis, HPLC and others, through detection limits in the femtogram range of target molecules achieved by developments in mass spectrometry, and through combined bioinformatic data analysis. These technical improvements will help generate new insights into cancer biology and enable future diagnostic applications. With respect to microarray applications, there is growing interest in using serum antibodies against tumour-associated antigens (TAAs) as serological cancer biomarkers. The persistence and stability of autoantibodies in the serum of cancer patients is an advantage over other potential markers, including the TAAs themselves, some of which are released by tumours but rapidly degrade or are cleared after circulating in the serum for a limited time. Antibody profiles of a patient's serum can be easily detected using protein microarrays with spotted antigens. Immunoglobulins in serum bind to the immobilized antigens and can be detected using a fluorescently labelled detection antibody. Because of the simple test principle, minimally invasive testing using serum autoantibody profiles has great potential for improving early diagnosis, which is an unequivocal prerequisite for successful and efficient cancer therapy.
It has been shown for several cancers that panels of autoantigens rather than individual antigens enhance the likelihood of detecting cancer antigens with diagnostic potential (Fernandez, 2005). Therefore our research group went on to establish high-density protein microarrays which can be used for autoantibody screening. For method optimization and proof of principle we started with a microarray that included candidate marker proteins identified by previous SEREX (serological identification of antigens by recombinant expression cloning) screening of brain and lung cancer and by screening macroarrays of a fetal brain cDNA expression library (Sahin et al., 1995). First, "antigens" for microarray printing had to be generated. To this end, recombinant candidate protein expression from E. coli expression clones was set up and optimized in a 96-well plate format. His (histidine)-tagged recombinant proteins were purified using Ni-NTA (nickel immobilized onto agarose resin via nitrilotriacetic acid) sepharose and then printed onto epoxy-coated glass slides for the production of protein microarrays. These were incubated with minute amounts (10 µl of serum diluted 1:10) of serum from brain and lung tumour patients. With this experiment we could show that SEREX-derived expression clones are suitable for microarray-based classification of patients. Repeated serum testing on different microarray slides confirmed the high reproducibility of the antibody signal patterns obtained, with correlation coefficients ranging from 0.92 to 0.96, thereby clearly demonstrating the potential of protein microarrays (Stempfer et al., 2010).
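The reproducibility figure quoted above is a correlation coefficient between the signal patterns of replicate slides. As a minimal sketch, assuming the Pearson coefficient is meant and that each slide is reduced to a list of spot intensities in the same antigen order, it could be computed as follows. The intensity values are invented for illustration and are not data from the cited study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant signals")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical spot intensities of the same serum on two replicate slides.
    slide_a = [120.0, 450.0, 980.0, 60.0, 300.0]
    slide_b = [135.0, 430.0, 1010.0, 75.0, 280.0]
    print(round(pearson(slide_a, slide_b), 3))
```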
Recently, protein microarray technologies have improved, and arrays with either spotted antibodies or antigens are available for research. In particular, microarrays for the elucidation of tumour-specific autoantibody profiles have been found to be very useful in enabling diagnostics, and many studies have been published regarding their utility in different (non-cancerous) diseases, as well as in cancer. Table 3 illustrates the potential of this testing principle, highlighting "colon cancer" studies (Carpelan-Holmstrom et al., 1995; Ran et al., 2008; Liu et al., 2009; Babel et al., 2009; Chan et al., 2010). To the best of our knowledge, systematic studies using this approach are lacking for thyroid cancer diagnostics, although this approach may enable minimally invasive early diagnosis and even pre-symptomatic screening of patients.

Table 3. Examples of the diagnostic potential of immunological and tumour-autoantibody-based studies for serum-based testing of colorectal cancer (CRC).

Bioinformatics
Microarray experiments require very careful planning and the use of proper statistical methods to analyze the highly multiplexed data (Simon, 2009). The fundamental idea behind microarray-based studies in (thyroid) cancer research is the identification of genes which behave differently between distinct classes (e.g. tumour versus reference). Because many different features are measured on a single sample and a great number of data points are generated in parallel experiments on multiple samples, performing statistical tests on many genes in parallel has a serious consequence, known as the multiplicity of p-values. For example, when analyzing 10000 genes at a significance threshold of p < 0.01, one would expect about 100 genes to appear significant by chance alone. Although there is a trade-off between controlling false positive and false negative results, the only way to improve both rates is to increase the number of individuals analysed in a (microarray) study. Thus, when planning microarray experiments, the sample size has to be defined prior to analysis so that statistically significant differences between groups can be detected at an acceptable statistical power. The power corresponds to the percentage of the differentially expressed genes that are likely to be detected by the experiment. In addition, the sample size depends on how large a difference one wants to be able to detect. The classical way to estimate the number of replicates (sample size) in a microarray experiment is by power analysis. Solutions for this are implemented in statistical software, which enables estimation of the number of individuals needed per group in a microarray experiment. The number of replicates per group also affects data analysis, because it can be used to determine the fold change that is detectable in a gene or feature.
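The multiplicity problem can be made concrete with a short calculation: if none of m genes is truly different, the expected number of genes passing a threshold alpha is m × alpha, i.e. 10000 × 0.01 = 100. The sketch below reproduces this number and then applies the Benjamini-Hochberg procedure, a widely used way to control the false discovery rate. It is shown as one common option, not as the method used in the studies discussed here, and the p-values are invented for illustration.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of chance hits when no gene is truly different."""
    return n_tests * alpha


def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True if rejected at the given FDR."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            largest_rank = rank
    # Reject every hypothesis ranked at or below that position.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    print(expected_false_positives(10000, 0.01))
    toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(toy_p, fdr=0.05))
```

In the toy example five p-values fall below 0.05, but only the two smallest survive the correction, which illustrates how adjustment for multiplicity trades sensitivity for fewer chance findings.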
For data analysis of the various microarray applications, different bioinformatic concepts and solutions exist, and they have filled many specialist books over recent years.
Most microarray experiments aim to 1) identify differentially "expressed" genes in one class of samples versus another class, 2) elucidate the relationship between "genes" or "samples", and 3) classify new samples based on a classifier generated in an array experiment (Stekel, 2004). To address aim (1), parametric tests (such as the t-test) and non-parametric tests are frequently used to analyse the differences between two groups. For analyses of more complex experiments in which there might be more than two groups, ANOVA and linear models are the methods of choice. These are also suitable for the analysis of experiments in which the response to more than one variable is measured. To study aim (2), the relationship between genes or samples that behave in a similar manner, correlations between parameters are identified by different distance measures. For visualization of the high-dimensional data, principal component analysis and multidimensional scaling are well suited to illustrating the distance matrix between multiple genes and/or samples. In addition, clustering is a widely used analysis tool for arranging gene and sample profiles into a tree so that concordant genes or samples are located close together. Clusters of genes and/or samples are thus built so that the differences between genes/samples within a cluster are smaller than the differences between clusters. Clustering and cluster trees, or dendrograms, enable unsupervised elucidation of similarities and associations as well as "visualization" and simplification of complex data. Especially for improving diagnostics (aim 3), classification of patients and samples is a very exciting area of microarray analysis. In supervised learning, a training set with well-known classes (e.g. benign vs. malignant) is subjected to statistical analysis, which examines the differences between the groups with the aim of finding a classifier consisting of a small number of "genes" (biomarkers) that can predict to which group each individual belongs. Based on those data a prediction rule is established which enables the classification of new samples. The "classifier" genes or biomarkers can then be used in future molecular tests, such as targeted microarrays, qPCR and other simpler methods for diagnostic testing. Classification algorithms applied in microarray analyses include the compound covariate predictor, diagonal linear discriminant analysis, k-nearest neighbor and nearest centroid predictors, and support vector machines. These methods are powerful for classifying samples, and each has advantages and disadvantages. After a classifier has been built by any of these methods, it has to be validated using separate training and test-set samples or by cross-validation. Bioinformatic tools implement options for defining training and test-set samples as well as several cross-validation strategies. Although these analyses are computationally intensive, today's standard personal computers usually have sufficient performance for the analysis of an experiment of 100 whole-genome expression arrays with more than 40000 features. Data analysis principles established along with microarray developments (especially gene expression analyses) will also be useful for most other applications such as miRNA, DNA methylation, copy number variation and protein arrays, as well as for analyses of highly parallel data derived from genome sequencing. For aCGH, SNP and ChIP on chip arrays, several other aspects of data analysis have to be considered that are not discussed here.
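The supervised workflow outlined above (build a classifier on a training set, then validate it by cross-validation) can be sketched with one of the listed algorithms, the nearest centroid predictor, combined with leave-one-out cross-validation. The sketch is a toy illustration under stated assumptions: the two-feature "profiles" and their labels are invented, the function names are hypothetical, and no feature selection is performed. In a real analysis any gene selection step would have to be repeated inside each cross-validation round to avoid an optimistic bias.

```python
def centroid(rows):
    """Feature-wise mean of a list of equally long profiles."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def squared_distance(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(samples, labels):
    """Compute one centroid per class from the training samples."""
    by_class = {}
    for row, label in zip(samples, labels):
        by_class.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in by_class.items()}


def predict(centroids, sample):
    """Assign the class whose centroid is closest to the sample."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], sample))


def leave_one_out_accuracy(samples, labels):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit_centroids(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


if __name__ == "__main__":
    # Invented two-gene profiles for illustration only.
    samples = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.3],
               [4.0, 1.0], [4.2, 1.1], [3.8, 0.9]]
    labels = ["benign"] * 3 + ["malignant"] * 3
    print(leave_one_out_accuracy(samples, labels))
    print(predict(fit_centroids(samples, labels), [4.1, 1.2]))
```

Leave-one-out is only one of the cross-validation strategies mentioned in the text; with larger sample sets, k-fold schemes or a fully independent test set are common alternatives.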
For further information the interested reader should consult specific publications dealing with these types of microarrays (and the companies selling the specific tools) as well as books about statistical bioinformatics and microarray analysis (e.g. Lee, 2010; Simon et al., 2003).

Genome sequencing technologies
Since the invention in 1975 of Sanger's chain-terminating DNA sequencing approach, which became the standard method (Sanger and Coulson, 1975; Sanger et al., 1977), many technological improvements have been made in the field of DNA sequencing. Those improvements have made DNA sequencing more effective and affordable to a broad range of scientists. While the sequencing of the whole human genome by Sanger sequencing required billions of dollars, even a $1000 genome has now come within reach (Rusk, 2009). The so-called next generation sequencing (NGS) technologies offer great applications for research, and many of the analyses of interest in cancer research that are performed on microarrays can also be carried out by genome sequencing approaches. Thus gene mutations and sequence variations, RNA expression, DNA methylation, and also ChIP on chip (then called ChIP-Seq) can be elucidated in a genome-wide manner by NGS. NGS technology became commercially available in 2004, and the platforms of Roche, Illumina and Applied Biosystems are currently the three big players in the field, using different biochemical principles for sequencing (Mardis, 2008). These and other companies are working on improving technologies that might enable increased sequencing throughput at decreased costs. The upcoming third-generation sequencing technologies have in common that the results can be monitored in real time. Of three such platforms, one is already commercially available (Helicos Genetic Analysis System) and one is ready to launch (Pacific Biosciences). Both platforms utilize a single-molecule sequencing approach that incorporates fluorescently labelled nucleotides (Rusk, 2009; McCarthy, 2010). The third technology, developed by Oxford Nanopore, uses nanopores through which a DNA strand is pulled base by base. The sequence is read via signal changes that occur when nucleotides migrate through the nanopore and block an electrical current in the nanopore. No labelling of the nucleotides is required, and even methylcytosine can be detected without any prior DNA modification, such as bisulfite conversion (Clarke et al., 2009; Schadt et al., 2010).

Conclusion
In this chapter we have described various molecular-genetic high-throughput analyses based on microarray technology, which have been widely applied over the past decade in clinical research. These techniques have provided considerable insights into biological processes and pathways for the elucidation of disease mechanisms. Although many gene expression studies have been conducted in thyroid cancer patients, studies for the elucidation of epigenetic changes are lacking. In addition, integration and combination of the genomic and transcription data already available, as well as integration of other omics data (such as epigenomics, proteomics and metabolomics), would enable a "systems biology approach in thyroid cancer", and might help to increase knowledge of thyroid cancer biology and uncover novel biological clues in cancer development and progression.
Although genome-sequencing technologies have developed rapidly over the last 10 years and have become more affordable over time, the application of microarrays is still a state-of-the-art technology. Genome sequencing approaches will improve life science research and replace microarrays in several applications. For future research, the aims, the experimental design and the costs will have to be considered when deciding between array-based and sequencing-based approaches. Microarray technologies will likely maintain a role in thyroid cancer research in the future, since they are already "mature technologies".