Cancer is one of the most important public health problems in Mexico and worldwide; in the female population, breast and cervical cancer (CC) are the most frequent types. Incidence rates of CC are higher in developing countries (40/100,000 women per year) than in developed countries (10/100,000) (1). In Mexico, 12,000 new cases are reported every year (2). The absence of screening programs, or comparatively ineffective screening programs, leads to relatively late diagnosis of the disease and also to differences in human papillomavirus (HPV) infection (3). Several types of HPV are associated with CC worldwide (4, 5), HPV16 being the most frequent oncogenic type.
Epidemiological and experimental studies suggest that high-risk HPV has an important role in cervical carcinogenesis. Persistent viral infection and genetic background, in combination with constitutive expression of the viral oncogenes E6 and E7, are decisive steps for malignant transformation, because these oncoproteins interact with the tumour suppressor proteins p53 and pRB, respectively, and promote their degradation (6, 7). These interactions could induce cellular proliferation and genetic instability, which could in turn promote the accumulation of mutations and aneuploidy (8). Viral oncoproteins therefore have a general impact on the global profile of expressed genes, which can be analyzed by high-throughput methodologies. One of these techniques is DNA oligonucleotide-based microarray technology, which allows rapid, high-throughput detection of thousands of transcripts simultaneously (9-11).
Several studies on gene expression profiles in HPV-infected cells have been published. These reports are mainly based on gene expression levels altered by the HPV oncoproteins E6 and E7 (12-17). Regarding changes in gene expression profiles in cervical cancer samples, a few papers compare the genes expressed in normal cervix with those expressed in tumor samples (12, 18); the major aim of those studies was to find potential tumor markers with clinical value. At present the list of potential markers is short (p16, survivin).
We have previously published work on alterations in gene expression in CC (19, 20). In those reports we observed that the WNT pathway, the calcium pathway and some cellular proteases (MMP11, cathepsin F) could be involved in cervical carcinogenesis. These findings contribute to our knowledge of alterations in CC pathogenesis.
In the present work, our data from a microarray assay on CC samples (20) were re-processed and re-analyzed using new bioinformatics software suites, in order to identify other genes altered in cervical cancer and to define signaling pathways probably implicated in this type of cancer.
2. Material and methods
Biological samples. Eight squamous CC tissues, stage IIB (according to the International Federation of Gynecology and Obstetrics, FIGO) and HPV16 positive, and two healthy “normal” cervical samples were originally studied. The normal samples were collected after hysterectomy for uterine myomatosis from patients without a history of HPV infection. All DNA from healthy and cervical cancer samples was subjected to PCR using general oligonucleotides against HPV, and the positive samples were then sequenced to confirm the HPV type (data not shown). Importantly, to avoid false negatives, only samples harbouring at least 70% tumour cells or normal epithelial cells were analyzed.
Total RNA was extracted from squamous CC and normal tissues using TRIzol reagent (GIBCO BRL, USA). RNA was synthesized and labeled with the CodeLink Express Assay Reagent Kit (Applied Microarrays, GE Healthcare, USA).
2.1. Microarray platform
CodeLink™ Human Whole Genome Microarrays offer comprehensive coverage of the human genome; this array has ~57,000 probes covering transcripts and ESTs (expressed sequence tags). The system includes 1,200 genes for oncogenesis, 1,400 for cell cycle, 1,000 for cell signaling, 3,000 for metabolism, 1,400 for developmental processes, 2,700 for transcription and translation, 1,100 for immune and inflammatory response, 800 for protein phosphorylation, 600 for apoptosis, 1,150 for ion transport, 400 for synaptic transmission and 200 for kinases, among others.
This array harbors 45,674 genes (based on unique UniGene IDs), 360 positive controls, 384 negative controls and 100 housekeeping genes; each target is detected by one specific, functionally validated oligonucleotide probe of 30 nucleotides (30-mer).
This platform has been applied in different biological models (21-23). CodeLink Bioarrays are recently introduced, single-color oligonucleotide microarrays that differ from Affymetrix GeneChips in the following aspects: 1) CodeLink Bioarrays use a single pre-synthesized, pre-validated 30-mer probe to detect each target transcript, whereas GeneChips use multiple in situ synthesized 25-mer probes; and 2) the surface of CodeLink Bioarrays is made of a 3-dimensional aqueous gel matrix, whereas that of Affymetrix GeneChips is made of a 2-dimensional glass matrix. These characteristics suggest that CodeLink Bioarrays may behave differently from GeneChips and may require normalization strategies different from those optimized for GeneChips (24) (Figure 1).
3. Analysis of the data
Partek® Genomics Suite™ is a suite of advanced statistics and interactive data visualization tools designed to extract biological signals from noisy data. This commercial software supports microarray and next-generation sequencing technologies, including gene expression and digital gene expression, exon/alternative splicing, RNA-Seq, copy number and association, ChIP-chip, ChIP-seq and microRNAs, in a single package, allowing multiple applications to be analyzed in one solution.
This kind of analysis provides results with minimal noise arising from the internal controls. The suite offers three probe-level algorithms: the MAS5 statistical algorithm, Probe Logarithmic Intensity Error (PLIER) and Robust Multi-array Average (RMA). The goal of these algorithms is to establish differences and similarities between internal controls and to obtain the most reliable data. In the present case, the normalized samples were analyzed with RMA, eliminating the tags that showed variation.
With global gene expression data it is difficult to understand what happens to all genes in different cellular processes, including cancer. In this context, one way of visualizing the data is principal component analysis (PCA) (25). This mathematical technique reduces the dimensionality of the data by projecting the expression profile of each sample into a low-dimensional space, in which samples with similar global gene expression lie close together. Next, to visualize changes in gene expression across all samples, we performed a clustering analysis (26) using a K-means algorithm, which classifies the samples according to their similarity and dissimilarity. Finally, we applied systems biology methods such as Ingenuity Pathway Analysis (IPA) and gene classification to obtain a new list of candidates for subsequent laboratory verification, which might help in the search for a cure for cancer.
3.1. Networks by Ingenuity Pathways Analysis (IPA)
This software (www.ingenuity.com) helps life science researchers explore, interpret and analyze complex biological systems, and is used to analyze 'omics data and to model biological systems. The analysis was performed to identify networks of interacting genes and other functional groups. A cut-off ratio of 2 was used to define differentially expressed genes.
A widely used approach for identifying disease-associated genes is monitoring gene expression values in different samples using microarray technology. One shortcoming of microarray data is that the number of samples is small relative to the number of genes. This reduces classification accuracy, so gene selection is essential to improve predictive accuracy and to identify potential marker genes for a disease. Among the numerous existing methods for gene selection, Partek has become one of the leading tools, but its performance can be reduced by small sample size, noisy data and the removal of redundant genes.
The original cervical dataset was published in a previous report (20). This dataset was obtained using the CodeLink microarray platform and provides the expression levels of ~57,000 probes for 2 normal tissues and 8 HPV16-positive cervical cancers. The data were pre-processed by a base-10 logarithmic transformation and normalized (see Figure 2). After a first analysis of the data using significance analysis of microarrays, 3,248 well-annotated genes were identified.
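As an illustration of the pre-processing step described above, the following minimal sketch applies a base-10 logarithmic transformation to the raw intensities of one array and then centres the array on its median. The text does not state which normalization was applied, so median centring is used here only as an assumed, simple stand-in; the function names, the `floor` parameter and the toy intensities are hypothetical.

```python
import math
from statistics import median


def log10_transform(intensities, floor=1.0):
    """Return log10 of each raw intensity.

    Values below `floor` are clamped first so the logarithm is defined
    (an assumption made for this sketch, not a step stated in the text).
    """
    return [math.log10(max(value, floor)) for value in intensities]


def median_center(log_values):
    """Subtract the array median so that arrays share a common centre."""
    centre = median(log_values)
    return [value - centre for value in log_values]


# Toy raw intensities for a single array (hypothetical values).
raw_array = [10.0, 100.0, 1000.0, 10000.0, 100000.0]
normalized = median_center(log10_transform(raw_array))
print(normalized)
```

In practice each of the ten arrays would be processed in the same way before samples are compared.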
After that, the normalized data were managed and analyzed using Partek Genomics Suite version 6.5 (Partek GS), yielding a list of differentially expressed genes (tumor versus normal). New and specific procedures are now used in the analysis of microarray data. For instance, after internal control analysis, the raw data are normalized using commonly available online software or other tools such as Partek GS. This suite provides rigorous and easy-to-use statistical tests for differential expression of genes or exons, and a flexible statistical test to detect alternative splicing based on a mixed-model analysis of variance.
Using a one-way ANOVA, a small group of 208 genes was selected with a false discovery rate < 10%; of these, 111 genes were overexpressed with a fold change > 2 and 97 genes were downregulated with a fold change < -2 (Tables 1 and 2).
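The selection criteria above (false discovery rate below 10% and an absolute fold change above 2) can be sketched as follows. This is not the Partek GS implementation: the Benjamini-Hochberg procedure is assumed here as the false discovery rate correction, the signed fold-change convention (negative values for downregulation) is inferred from the thresholds quoted in the text, and the gene names, means and p-values are hypothetical.

```python
def signed_fold_change(tumor_mean, normal_mean):
    """Signed fold change on linear-scale means.

    Returns the ratio when tumor >= normal, and the negative inverse
    ratio otherwise, so a two-fold decrease is reported as -2.
    """
    ratio = tumor_mean / normal_mean
    return ratio if ratio >= 1.0 else -1.0 / ratio


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_genes(records, fdr=0.10, fold=2.0):
    """Split significant genes into up- and down-regulated lists.

    records: list of (gene, tumor_mean, normal_mean, p_value) tuples,
    where p_value would come from the ANOVA (tumor versus normal).
    """
    q_values = benjamini_hochberg([record[3] for record in records])
    up, down = [], []
    for (gene, tumor, normal, _), q in zip(records, q_values):
        if q >= fdr:
            continue  # fails the false discovery rate threshold
        change = signed_fold_change(tumor, normal)
        if change > fold:
            up.append(gene)
        elif change < -fold:
            down.append(gene)
    return up, down


# Hypothetical records: (gene, tumor mean, normal mean, ANOVA p-value).
records = [
    ("GENE_A", 800.0, 100.0, 0.001),
    ("GENE_B", 50.0, 400.0, 0.004),
    ("GENE_C", 120.0, 100.0, 0.030),
    ("GENE_D", 900.0, 100.0, 0.600),
]
up, down = select_genes(records)
print(up, down)
```

Note that a gene must pass both filters: GENE_C is significant but changes too little, and GENE_D changes strongly but is not significant.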
4.1. Analyzing microarray data with novel software suite
To display the data in just two dimensions, as much of the variation in the data as possible should be captured in those two dimensions. Principal component analysis (PCA) was developed for this purpose. Applying PCA to the cervical data, we observed some expected differences (Figure 3).
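The idea of capturing most of the variation in two dimensions can be illustrated with a small, self-contained sketch. It computes the first two principal components of a samples-by-genes table by power iteration on the covariance matrix, which is an illustrative choice and not the algorithm used by Partek GS; the toy expression values and all function names are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract each column (gene) mean from a samples x genes table."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (genes x genes) of centred data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def dominant_eigenpair(matrix, iterations=1000):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    d = len(matrix)
    start = [float(i + 1) for i in range(d)]  # arbitrary non-zero start
    scale = math.sqrt(sum(x * x for x in start))
    vector = [x / scale for x in start]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(d))
                   for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            return 0.0, vector  # no variance left in this direction
        vector = [x / norm for x in product]
        eigenvalue = norm
    return eigenvalue, vector


def pca_2d(rows):
    """Return 2-D sample scores and the variance fraction of each axis."""
    centered = center_columns(rows)
    cov = covariance(centered)
    d = len(cov)
    val1, pc1 = dominant_eigenpair(cov)
    # Deflation: remove the first component before finding the second.
    deflated = [[cov[i][j] - val1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    val2, pc2 = dominant_eigenpair(deflated)
    scores = [(sum(a * b for a, b in zip(row, pc1)),
               sum(a * b for a, b in zip(row, pc2))) for row in centered]
    total = sum(cov[i][i] for i in range(d))
    return scores, (val1 / total, val2 / total)


# Toy log-expression table: rows are samples, columns are three genes.
samples = [
    [1.0, 1.1, 0.9],  # normal-like
    [1.1, 0.9, 1.2],  # normal-like
    [3.0, 0.2, 2.7],  # tumor-like
    [3.2, 0.1, 3.1],  # tumor-like
]
scores, explained = pca_2d(samples)
print(scores, explained)
```

With real microarray data the number of genes is far larger than the number of samples, so a dedicated numerical library would normally be used instead of this direct covariance computation.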
In order to obtain a graphical representation of the differences between normal and tumor tissues, hierarchical cluster analysis was performed on all samples, with a pseudo-color visualization matrix of the 208 selected genes grouped so as to maximize intra-group similarity and differences between groups. The dendrogram resulting from the complete-linkage hierarchical clustering algorithm is shown in Figure 4. The figure shows the genes that change with respect to healthy cervical tissue. In this clustering method, relationships among objects (genes or samples) are represented by a tree whose branch lengths reflect the degree of similarity between the objects, as assessed by a pairwise similarity function. The computed trees can be used to arrange individual samples in the original data table, so that samples or groups of samples with similar expression patterns are shown adjacent to each other.
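Complete-linkage hierarchical clustering, as named above, can be sketched in a few lines. The pairwise similarity function used in the analysis is not specified in the text, so Euclidean distance is assumed here; the sample names and expression values are hypothetical, and the function returns only the merge history from which a tree could be drawn.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles):
    """Agglomerative clustering with complete linkage.

    profiles: dict mapping sample name -> list of expression values.
    Returns the merge history as (members_a, members_b, distance) tuples,
    in the order in which the clusters were joined.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the farthest members.
                dist = max(euclidean(profiles[a], profiles[b])
                           for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Toy profiles over two genes (hypothetical values).
profiles = {
    "N1": [1.0, 1.0],
    "N2": [1.2, 0.9],
    "T1": [4.0, 0.2],
    "T2": [5.0, 3.0],
}
merges = complete_linkage(profiles)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

In this toy example the two normal-like profiles merge at a much smaller distance than the two tumor-like profiles, mirroring the situation in which tumor samples are more heterogeneous than normal ones.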
In general, tumor samples showed heterogeneity among themselves compared with normal samples, which had a more homogeneous gene expression profile. Thus, clustering analysis in CC failed to show significant segregation of patients based on expression profiling, possibly due to the heterogeneous nature of the samples as well as the relatively small number of samples in this study. Even though the samples were subjected to rigorous analysis procedures and the patients were carefully selected (by age, clinical stage, HPV16 positivity and oral contraceptive status) to avoid bias, at this stage of the carcinogenesis process (stage IIB) the pattern of gene expression differs considerably between samples.
We used IPA to investigate the biological relevance of the observed genome-wide gene expression changes by categorizing our data set into biological functions and/or diseases. The annotated list of 208 genes from the Partek analysis was submitted to the IPA visualization tool. This bioinformatics tool is employed for visualizing expression data in the context of KEGG biological pathways; its importance is that it retrieves an impact factor (IF) for the genes involved in an entire pathway, which helps to obtain a clearer notion of the level of alteration in each biological pathway and to understand the complexity of these different processes in the cancer cell. We imported a list of significantly up- and down-regulated genes (as a .txt file) into the program to convert the expression data into illustrations, in an attempt to explore altered mechanisms in CC. To avoid any incorrect IF in altered pathways due to different list sizes, we submitted similar numbers of up- and down-regulated genes. This confirmed that genes involved in several metabolic pathways were altered in CC (see networks).
We were able to associate biological functions and diseases with the experimental results. Fifteen pathways were obtained with a high score. Table 3 shows the genes and the top three disorders/diseases of the “small networks” based on the analysis of the data. As can be seen, no clear, known cancer route was observed, although some of the genes have previously been associated with cancer; nevertheless, these data give important information involving “non-canonical” pathways in cancer.
Finally, Figure 5 shows a “hypothetical network in CC” based on the 15 small networks. In addition to gene expression values, the proposed method uses Gene Ontology, which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results.
In our results, non-classical “cancer genes” were retained, as opposed to expected genes such as MYC, FOS, RB, P53, HIF, etc. However, in the strict sense of the word, when is a gene considered a cancer gene? The criteria could include, for instance, over-expression, down-regulation, point mutation, amplification, loss of heterozygosity, polymorphisms, epigenetic changes, etc. Thus, any gene could be considered a cancer gene if it meets specific criteria, as was recently reported (27).
In this context, we decided to explore two genes not previously related to cervical cancer. The first is PARK2. Interestingly, PARK2 mutations (point mutations and exonic deletions) were first identified in autosomal recessive juvenile-onset parkinsonism. This gene maps to 6q25.2-q27, contains 12 small exons, and encodes the parkin protein, which functions as an E3 ligase, ubiquitinating proteins for destruction by the proteasome. Several substrates for parkin have been identified, including a 22 kD glycosylated form of synuclein, parkin-associated endothelin receptor-like receptor (Pael-R), and CDCrel-1. Over-expression of Pael-R causes it to become ubiquitinated, insoluble and unfolded, leading to endoplasmic reticulum stress and cell death (for review see 28). Parkin is located in a chromosomal region that is frequently deleted in multiple tumor types, including hepatocellular carcinoma (HCC), ovarian cancer and breast cancer. The Parkin gene lies within FRA6E, the third most active common fragile site (29, 30). Interestingly, all three fragile site regions were found to be consistently deleted in HCC (31) as well as in ovarian, breast and prostate cancers. Furthermore, parkin protein overexpression did not lead to increased sensitivity to all pro-apoptotic stimuli but may show specificity for a certain type of cellular stress. At present, Parkin could be considered a new tumor suppressor gene (32). In our case, Parkin expression in CC is interesting per se. It is widely described that the master gene p53 is constitutively expressed, but under stress conditions such as HPV infection, DNA damage or point mutations the half-life of the protein is increased. A similar situation might apply to Parkin, given its increased expression in CC. This could be supported by the fact that the 6q25 cytogenetic region is not altered in CC, as is also the case for the TP53 gene (33). In this respect, we hypothesize that the newly observed overexpression of Parkin could be involved in invasive cervical carcinogenesis. These findings suggest that the genetic context in which a mutation occurs can play a significant role in determining the type of illness produced or associated.
It has been established that although we inherit two copies of all genes (except those that reside on the sex chromosomes), there is a subset of these genes in which only the paternal or maternal copy is functional. This phenomenon of monoallelic, parent-of-origin expression of genes is termed genomic imprinting. Imprinted genes are normally involved in embryonic growth and behavioral development, but occasionally they also function inappropriately as oncogenes and tumor suppressor genes (34). Furthermore, it is well known that a variety of genetic changes influence the development and progression of cancer. These changes may result from inherited or spontaneous mutations that are not corrected by repair mechanisms prior to DNA replication. It is increasingly clear that so-called epigenetic effects, which do not affect the primary sequence of the genome, also play an important role in tumorigenesis (35).
Another overexpressed gene seen in this analysis was PEG10, which maps to chromosome 7q21. The PEG10 protein prevents apoptosis in hepatocellular carcinoma cells through interaction with SIAH1, a mediator of apoptosis. It may also have a role in cell growth promotion and hepatoma formation, and it inhibits TGF-beta signaling by interacting with the TGF-beta receptor ALK1. PEG10 is a paternally expressed imprinted gene that encodes transcripts containing two overlapping open reading frames (ORFs), RF1 and RF1/RF2, as well as retroviral-like slippage and pseudoknot elements, which can induce a -1 nucleotide frame-shift. Increased expression of this gene is associated with hepatocellular carcinomas. These findings link cancer genetics and epigenetics by showing that a classic proto-oncogene, MYC, acts directly upstream of a proliferation-positive imprinted gene, PEG10 (36, 37).
The HOX genes are a family of transcription factors that bind to specific DNA sequences in target genes, regulating their expression. The role of HOX genes in adult cell differentiation is still obscure, but growing evidence suggests that they may play an important role in the development of cancer. We have previously reported that some HOX genes could be related to CC. Specifically, HOXA9 expression was observed in cervical cancer by end-point RT-PCR. In the present work, the data show that HOXA9 is differentially expressed in CC with statistical significance. Together with HOXB13, HOXD9, HOXD10 and the HOXC cluster genes (HOXC9, C11–C13), this family of genes might be an important factor in CC (35).
It is clear that the most altered genes in CC are not commonly associated with the cancer process. This could suggest the following. 1) The “classic cancer genes” are significantly altered but with small changes in value, although there are exceptions in specific tumor types, such as N-MYC in neuroblastoma and Her2/neu in breast cancer. 2) At least in stage IIB of cervical carcinogenesis, genes related to “cellular economy” but not regarded as cancer genes could be involved. This is supported by recent reports showing molecular alterations in genes not previously related to cancer (27). 3) Extreme values (high or low) in microarray analysis do not always represent strong candidate markers in the models studied. What about the most frequently altered genes? 4) Integrative genomics and multicentre protocols with well-selected samples, stratified stages and clinical follow-up will be the key to identifying cancer hallmarks. In addition, the study of the molecular function of selected genes strengthens the hypothesis that these genes are involved in the process of cancer growth.
The data obtained from microarray analysis should be validated, because false positives and false negatives can arise from the nature of these massive assays. Therefore, to confirm the microarray data, additional molecular tools such as end-point PCR, real-time PCR, northern blot or immunohistochemistry should be used (Mendez S. An Integrative microarray gene expression analysis, approach identifies candidates’ array multi-experiments in Ovary Tumours, submitted for publication 2011).
This work was partially supported by CONACYT (México) grants 69719 and 87244 from FONDOS SECTORIALES. We appreciate the technical assistance of Laboratorio de Oncología Genómica, CIS, HO-IMSS. Sergio JUAREZ and Mauricio SALCEDO contributed equally to the present work.