Cancer is one of the most important public health problems in Mexico and worldwide. In the female population, breast and cervical cancer (CC) are the most frequent types. Incidence rates of CC are higher in developing countries (40/100,000 women per year) than in developed countries (10/100,000) (1). In Mexico, 12,000 new cases are reported every year (2). The absence of screening programs, or comparatively ineffective screening programs, leads to relatively late diagnosis of the disease and also to differences in human papillomavirus (HPV) infection (3). Several HPV types are associated with CC worldwide (4, 5), with HPV16 being the most frequent oncogenic type.
Epidemiological and experimental studies suggest that high-risk HPV types have an important role in cervical carcinogenesis. Persistent viral infection and genetic background, in combination with constitutive expression of the viral oncogenes E6 and E7, are decisive steps for malignant transformation, because these oncoproteins interact with the tumour suppressor proteins p53 and pRB, respectively, and promote their degradation (6, 7). These interactions could, for example, induce cellular proliferation and genetic instability, which could in turn promote the accumulation of mutations and aneuploidy (8). Consequently, viral oncoproteins have a general impact on the global profile of expressed genes, which can be analyzed by high-throughput methodologies. One of these techniques is DNA oligonucleotide-based microarray technology, which allows rapid, high-throughput detection of thousands of transcripts simultaneously (9-11).
Several studies on gene expression profiles in HPV-infected cells have been published. These reports are mainly based on gene expression levels altered by the HPV oncoproteins E6 and E7 (12-17). Regarding changes in gene expression profiles in cervical cancer samples, a few papers compare genes expressed in normal cervix with those expressed in tumor samples (12, 18); the major aim of those studies was to find potential tumor markers with clinical value. At present, the list of potential markers is short (p16, survivin).
We have previously published work on alterations in gene expression in CC (19, 20). In those reports we observed that the WNT pathway, the calcium pathway and some cellular proteases (MMP11, cathepsin F) could be involved in cervical carcinogenesis. These findings contribute to our knowledge of the alterations involved in CC pathogenesis.
In the present work, our microarray data obtained from CC samples (20) were re-processed and re-analyzed with a new bioinformatics software suite in order to identify other genes altered in cervical cancer and to define the signaling pathways probably implicated in this type of cancer.
2. Material and methods
Biological samples. Eight HPV16-positive squamous CC tissues, stage IIB (according to the International Federation of Gynecology and Obstetrics, FIGO), and two healthy “normal” cervical samples were originally studied. The normal samples were collected after hysterectomy for uterine myomatosis from patients without a history of HPV infection. DNA from all healthy and cervical cancer samples was subjected to PCR using general oligonucleotides against HPV, and the positive samples were then sequenced to confirm the HPV type (data not shown). Importantly, to eliminate false negatives, only samples containing at least 70% tumour cells or normal epithelial cells were analyzed.
Total RNA was extracted from squamous CC and normal tissues using TRIzol reagent (GIBCO BRL, USA). Labeled cRNA was synthesized with the CodeLink Express Assay Reagent Kit (Applied Microarrays, GE Healthcare, USA).
2.1. Microarray platform
CodeLink™ Human Whole Genome Microarrays offer comprehensive coverage of the human genome. The array has ~57,000 probes covering transcripts and ESTs (expressed sequence tags). The system includes, among others: 1,200 genes for oncogenesis, 1,400 for cell cycle, 1,000 for cell signaling, 3,000 for metabolism, 1,400 for developmental processes, 2,700 for transcription and translation, 1,100 for immune and inflammatory response, 800 for protein phosphorylation, 600 for apoptosis, 1,150 for ion transport, 400 for synaptic transmission and 200 for kinases.
This array harbors 45,674 genes (based on unique UniGene IDs), 360 positive controls, 384 negative controls and 100 housekeeping genes. Each target is detected by one specific, functionally validated oligonucleotide probe with a length of 30 nucleotides (30-mer).
This platform has been applied in different biological models (21-23). CodeLink Bioarrays are recently introduced, single-color oligonucleotide microarrays that differ from Affymetrix GeneChips in the following aspects: 1) CodeLink Bioarrays use a single pre-synthesized, pre-validated 30-mer probe to detect each target transcript, whereas GeneChips use multiple in-situ synthesized 25-mer probes; and 2) the surface of CodeLink Bioarrays is made of a 3-dimensional aqueous gel matrix, whereas that of Affymetrix GeneChips is made of a 2-dimensional glass matrix. These characteristics suggest that CodeLink Bioarrays may behave differently from GeneChips and may require normalization strategies different from those optimized for GeneChips (24) (Figure 1).
3. Analysis of the data
Partek® Genomics Suite™ is a commercial suite of advanced statistics and interactive data visualization tools designed to extract biological signals from noisy data. The software supports microarray and next-generation sequencing technologies, including gene expression and digital gene expression, exon/alternative splicing, RNA-Seq, copy number and association, ChIP-chip, ChIP-seq and microRNAs, in a single software package, which allows multiple applications to be analyzed in one environment.
This kind of analysis provides results with minimal noise from the internal controls. To perform it, a software suite offering three different statistical algorithms is needed: the MAS5 statistical algorithm, Probe Logarithmic Intensity Error (PLIER) and Robust Multichip Analysis (RMA). The goal of these algorithms is to establish differences and similarities between internal controls and to obtain the most reliable data. In the present case, the normalized samples were analyzed with the RMA algorithm, eliminating the tags harboring variations.
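RMA, as it is generally described, includes a quantile-normalization step that forces every array to share the same intensity distribution. The block below is a minimal sketch of that single step only, assuming a small probes-by-arrays matrix without tied values; the function name `quantile_normalize` is hypothetical and the code does not reproduce the Partek implementation.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns (arrays) of a probes-by-arrays matrix.

    Assumption: no tied values within a column (ties are not averaged here).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                  # sort order of each array
    sorted_x = np.take_along_axis(x, order, axis=0)
    rank_means = sorted_x.mean(axis=1)             # mean intensity at each rank
    out = np.empty_like(x)
    # write the rank mean back to the position each value originally occupied
    np.put_along_axis(out, order, rank_means[:, None], axis=0)
    return out

# Hypothetical intensities: 4 probes (rows) measured on 3 arrays (columns)
raw = [[5, 4, 3],
       [2, 1, 4],
       [3, 6, 6],
       [4, 2, 8]]
normalized = quantile_normalize(raw)
```

After this step every column contains the same set of values, so remaining differences between arrays reflect only the rank of each probe within its array.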
In global gene expression studies it is difficult to understand what happens to all genes in different cellular processes, including cancer. In this context, one way of visualizing the data is Principal Component Analysis, or PCA (25). This method is a mathematical technique that reduces the gene expression profile of each sample to a low-dimensional space; when there are fewer changes in global gene expression, fewer dimensions are needed. Next, to visualize the changes in gene expression across all samples, we performed a clustering analysis (26) using a K-means algorithm, which classifies the samples according to their similarities and dissimilarities. Finally, we applied systems biology methods, such as Ingenuity Pathway Analysis, and gene classification to determine a new list of candidates for subsequent laboratory verification, which might help in the search for a cure for cancers.
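The K-means step mentioned above can be illustrated with a short sketch. It assumes Euclidean distance and, to stay deterministic, seeds the centroids with the first k samples; both choices, the function name `kmeans` and the toy data are assumptions made for illustration and are not taken from the software used in this work.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's K-means on a samples-by-features matrix.

    Assumption: the first k samples are used as initial centroids.
    Returns (labels, centroids).
    """
    x = np.asarray(points, dtype=float)
    centroids = x[:k].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # distance of every sample to every centroid
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members):                 # keep old centroid if cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break                            # assignments have stabilized
        centroids = new_centroids
    return labels, centroids

# Hypothetical two-feature profiles for five samples
toy = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0]]
labels, centroids = kmeans(toy, k=2)
```

With real expression data the rows would be samples and the columns the selected genes; the result depends on the seeds, so several initializations are normally compared.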
3.1. Networks by Ingenuity Pathways Analysis (IPA)
This software (www.ingenuity.com) helps life science researchers explore, interpret and analyze complex biological systems, and is used to analyze 'omics data and to model biological systems. Here, the analysis was used to identify networks of interacting genes and other functional groups. A cut-off ratio of 2 was used to select genes.
Monitoring gene expression values in different samples with microarray technology is a powerful method for identifying disease-associated genes. One shortcoming of microarray data is that the number of samples is small relative to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve predictive accuracy and to identify potential marker genes for a disease. Among the numerous existing methods for gene selection, PARTEK has become one of the leading ones, but its performance can be reduced by small sample size, noisy data and the fact that the methods remove redundant genes.
The original cervical dataset was published in a previous report (20). This dataset was obtained with the CodeLink microarray platform and provides the expression levels of 57,000 probes for 2 normal tissues and 8 HPV16-positive cervical cancers. The data were pre-processed by a base 10 logarithmic transformation and normalized (see Figure 2). After a first analysis of the data using significance analysis of microarrays, 3,248 well-annotated genes were identified.
The normalized data were then managed and analyzed with Partek Genomics Suite version 6.5 (Partek GS) to obtain a list of differentially expressed genes (tumor versus normal). Specific, new procedures are currently used in the analysis of microarray data. For instance, after internal control analysis, the raw data are normalized using common software available online or other software such as Partek GS. This suite provides rigorous and easy-to-use statistical tests for differential expression of genes or exons, and a flexible statistical test to detect alternative splicing based on a mixed-model analysis of variance.
Using a one-way ANOVA statistical test, a small group of 208 genes was selected with a false discovery rate < 10%. Of these, 111 genes were overexpressed with a fold change > 2, and 97 genes were downregulated with a fold change < -2 (Tables 1 and 2).
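The selection rule described above (false discovery rate below 10% combined with a fold change above 2 or below -2) can be sketched as follows. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names and the toy p-values; fold changes are taken as signed values, with -2 meaning a two-fold decrease.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, starting from the largest p-value
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(q_sorted, 0.0, 1.0)
    return q

def select_genes(pvalues, fold_changes, fdr=0.10, min_fold=2.0):
    """Indices of up- and down-regulated genes passing both thresholds."""
    q = benjamini_hochberg(pvalues)
    fc = np.asarray(fold_changes, dtype=float)
    significant = q < fdr
    up = np.flatnonzero(significant & (fc > min_fold))
    down = np.flatnonzero(significant & (fc < -min_fold))
    return up, down

# Hypothetical ANOVA p-values and signed fold changes for five genes
p = [0.001, 0.008, 0.039, 0.041, 0.9]
fold = [3.0, -2.5, 1.5, -4.0, 5.0]
up, down = select_genes(p, fold)
```

In this toy input the last gene has a large fold change but is not significant, and the third gene is significant but changes less than two-fold, so neither is retained.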
4.1. Analyzing microarray data with a novel software suite
To display the data in just two dimensions, as much of the variation in the data as possible should be captured in those two dimensions. Principal component analysis (PCA) was developed for this purpose. Applying PCA to the cervical data, we observed some expected differences (Figure 3).
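As an illustration of how PCA captures most of the variation in two dimensions, the sketch below projects samples onto their first two principal components using a singular value decomposition. The function name `pca_scores` and the toy matrix are hypothetical, and this is not the routine used by the software suite.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """Project samples (rows) onto their first principal components.

    Returns (scores, explained_variance_ratio).
    """
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)              # center each gene (column)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    variance = s ** 2                          # proportional to component variance
    ratio = variance[:n_components] / variance.sum()
    return scores, ratio

# Hypothetical data: 4 samples x 3 genes varying along a single direction
toy = [[0, 0, 0], [1, 2, 2], [2, 4, 4], [3, 6, 6]]
scores, ratio = pca_scores(toy)
```

The explained variance ratio indicates how much of the total variation the two plotted axes retain; the sign of each component is arbitrary, so only relative positions of the samples are meaningful.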
In order to obtain a graphical representation of the differences between normal and tumor tissues, hierarchical cluster analysis was performed on all samples, with a pseudo-color visualization matrix of the 208 selected genes, grouping them so that intra-group similarity and differences between groups are greatest. The dendrogram resulting from the hierarchical complete-linkage clustering algorithm is shown in Figure 4. The figure shows the genes whose expression changes with respect to healthy cervical tissue. In this clustering method, relationships among objects (genes or samples) are represented by a tree whose branch lengths reflect the degree of similarity between the objects, as assessed by a pairwise similarity function. The computed trees can be used to arrange individual samples in the original data table, so that samples or groups of samples with similar expression patterns are shown adjacent to each other.
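The complete-linkage procedure can be sketched in a few lines. The pairwise similarity function is not specified in the text, so Euclidean distance is assumed here; the function name `complete_linkage` and the toy points are hypothetical, and the naive loop is meant only to show the merging rule, not to replace an optimized library routine.

```python
import numpy as np

def complete_linkage(points, n_clusters):
    """Naive agglomerative clustering with complete linkage.

    points: samples-by-features matrix (Euclidean distance is assumed).
    Returns a sorted list of clusters, each a sorted list of sample indices.
    """
    x = np.asarray(points, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: distance between the two farthest members
                d = dist[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)

# Hypothetical two-feature profiles for five samples
toy = [[0, 0], [0, 1], [10, 10], [10, 11], [0.5, 0.5]]
groups = complete_linkage(toy, n_clusters=2)
```

Recording the distance at which each merge occurs would give the branch heights of the tree; stopping at a chosen number of clusters, as done here, corresponds to cutting that tree at one level.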
In general, tumor samples were heterogeneous compared with normal samples, which had a more homogeneous gene expression profile. Thus, clustering analysis in CC failed to show significant segregation of patients based on expression profiling, possibly due to the heterogeneous nature of the samples as well as the relatively small number of samples in this study. Even though the samples were subjected to rigorous analysis procedures and the patients were carefully selected (by age, clinical stage, HPV16 positivity and oral contraceptive status) to avoid bias, at this stage of the carcinogenesis process (stage IIB) the pattern of gene expression differs considerably between samples.
We used IPA to investigate the biological relevance of the observed genome-wide changes in gene expression by categorizing our data set into biological functions and/or diseases. The annotated list of 208 genes from the PARTEK analysis was submitted to the IPA visualization tool. This bioinformatics tool is used to visualize expression data in the context of KEGG biological pathways. The importance of IPA is that it retrieves an impact factor (IF) for the genes involved in an entire pathway, which helps to obtain a clearer notion of the level of alteration in each biological pathway and to understand the complexity of the different processes of the cancer cell. We imported a list of significantly up- and down-regulated genes (as a .txt file) into the program to convert the expression data into illustrations, in an attempt to explore altered mechanisms in CC. To avoid an incorrect IF in altered pathways due to different sample sizes, we submitted similar numbers of up- and down-regulated genes. This confirmed that genes involved in several metabolic pathways were altered in CC (see networks).
We were able to associate biological functions and diseases with the experimental results. Fifteen pathways were obtained with a high score. Table 3 shows the genes and the top three disorders/diseases of the “small networks” based on the analysis of the data. As can be seen, a clear, known cancer route was not observed, although some of the genes have previously been associated with cancer; however, these data give important information involving
Finally, Figure 5 shows a “hypothetical network in CC” built from the 15 small networks. In addition to gene expression values, the proposed method uses Gene Ontology, which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as a small number of samples and erroneous measurement results.
In our results, non-classical “cancer genes” were retained rather than expected genes such as MYC, FOS, RB, P53, HIF, etc. However, in the “strict sense of the word”, when is a gene considered a cancer gene? For instance, when it shows over-expression, down-regulation, point mutation, amplification, loss of heterozygosity, polymorphisms, epigenetic changes, etc.? Thus, any gene could be considered a cancer gene if it meets special criteria, as was recently reported (27).
In this context, we decided to explore two non-related genes in cervical cancer
increased sensitivity to all pro-apoptotic induction but may show specificity for a certain type of cellular stress. At present,
It has been established that although we inherit two copies of all genes (except those that reside on the sex chromosomes), there is a subset of these genes in which only the paternal or maternal copy is functional. This phenomenon of monoallelic, parent-of-origin expression of genes is termed genomic imprinting. Imprinted genes are normally involved in embryonic growth and behavioral development, but occasionally they also function inappropriately as oncogenes and tumor suppressor genes (34). Furthermore, it is well known that a variety of genetic changes influence the development and progression of cancer. These changes may result from inherited or spontaneous mutations that are not corrected by repair mechanisms prior to DNA replication. It is increasingly clear that so-called epigenetic effects, which do not affect the primary sequence of the genome, also play an important role in tumorigenesis (35).
Another overexpressed gene seen in this analysis was
It is clear that the most altered genes in CC are not commonly associated with the cancer process. This fact could suggest: 1) the “classic genes of cancer” are altered in a statistically significant manner but with small values, but there are some exceptions and specific tumor types, such as neuroblastomas and
The information obtained from microarray analysis should be validated, because false positives and false negatives can appear owing to the nature of these massive assays. In this context, to confirm the microarray data, additional molecular tools such as end-point PCR, real-time PCR, northern blot or immunohistochemistry should be used. (Mendez S. An Integrative microarray gene expression analysis, approach identifies candidates’ array multi-experiments in Ovary Tumours, submitted for publication 2011).
This work was partially supported by CONACYT (México) grants 69719 and 87244 from FONDOS SECTORIALES. We appreciate the technical assistance of the Laboratorio de Oncología Genómica, CIS, HO-IMSS. Sergio JUAREZ and Mauricio SALCEDO contributed equally to the present work.