Genes displaying altered expression in both psoriasis and Crohn's disease.
Psoriasis (a skin disease) and Crohn’s disease (a disease of the intestinal epithelium) are multifactorial diseases caused by abnormalities in genetic machinery regulation. Both pathologies disturb the immune system, and the pathological processes are triggered by environmental factors. In the case of psoriasis, these factors are psychoemotional stresses, infections (group A streptococci and Staphylococcus aureus), drugs (lithium-containing, antimalarial, and antituberculous agents and Novocain), smoking, and mechanical damages (the so-called Koebner phenomenon) [Bowcock A et al., 2004]. Psoriasis vulgaris is one of the most prevalent chronic inflammatory skin diseases affecting approximately 2% of individuals in Western societies, and found worldwide in all populations. Psoriasis is a complex disease affecting cellular, gene and protein levels and presented as skin lesions. The skin lesions are characterized by abnormal keratinocyte differentiation, hyperproliferation of keratinocytes, and infiltration of inflammatory cells [Boehncke WH et al. 1996; Ortonne JP, 1996]. The factors triggering Crohn’s disease include psychoemotional stresses, infections (Mycobacterium avium ssp. paratuberculosis and invasive Escherichia coli variants), drugs (antibiotics and nonsteroid antiimﬂammatory agents), smoking, and nutritional regimen [Sartor R., 2006]. Crohn’s disease known only since the 1920s [Crohn B et al., 1932] and now affecting up to 0.15% of the northwest European and North American population [Binder V., 2005].
Both psoriasis and Crohn’s disease are now regarded as incurable, and the goal of their therapy is to extend the remission periods and decrease the severity of the disease. These two diseases are tightly related at the genetic level, as over ﬁve genetic loci are involved in the development of both psoriasis and Crohn’s disease.
The mechanisms of both psoriasis and Crohn’s disease are complex and involve genetic and environmental factors. As we gain more knowledge about molecular pathways implicated in diseases, novel therapies emerge (such as etanercept and infliximab that target TNF-α or CD11a- mediated pathways [Pastore S et al., 2008; Gisondi P et al., 2007]).
We have studied earlier the components of AP-1 transcription factor as psoriasis candidate genes. This study was performed by bioinformatics analysis of the transcription data using the GEO DataSets database (http://www.ncbi.nlm.nih.gov/geo/) [Piruzian ES et al., 2007]. The same approach was used by other researchers to detect potential therapeutic targets for psoriasis [Yao Y., et al., 2008]. In next step, we performed a comparative analysis of the molecular processes involved in the pathogenesis of two diseases, psoriasis and Crohn’s disease [Piruzian ES et al., 2009].
Despite the fact that psoriasis and Crohn`s disease affect completely different body systems (skin and intestine), they are much more similar that it may seem at first glance. Both skin and intestinal epithelium are barrier organs, that are the first to resist the environmental factors, including microorganisms. Both pathologies are immune-mediated inflammatory diseases, that is also marked by the same drug therapies. Finally, they have a lot of common susceptibility loci (Fig. 1).
In recent years, microarray mRNA expression profiling [Oestreicher JL et al., 2001; Bowcock AM et al., 2001; Zhou X et al., 2003; Quekenborn-Trinquet V et al., 2005] of lesional psoriatic skin revealed over 1,300 differentially expressed genes. Enrichment analysis (EA) showed that these genes encode proteins involved in regeneration, hyperkeratosis, metabolic function, immune response, and inflammation and revealed a number of modulating signaling pathways. These efforts may help to develop new-generation drugs. However, enrichment analysis limits our understanding of altered molecular interactions in psoriasis as it provides a relative ranking based on ontology terms resulting in the representation of fragmented and disconnected perturbed pathways. Furthermore, analysis of gene expression alone is not sufficient for understanding the whole variety of pathological changes at different levels of cellular organization. Indeed, new methodologies have been applied to the analysis of OMICs data in complex diseases that include algorithm-based biological network analysis [Nikolskaya T, et al., 2009; Nikolsky Y et al., 2005; Bhavnani SK et al., 2009; Ideker T et al., 2008; Chuang HY et al., 2007] and meta-analysis of multiple datasets of different types [Cox B et al., 2005; Wise LH et al., 1999; Ghosh D et al., 2003; Warnat P et al., 2005; Hack CJ, 2004; Menezes R et al., 2009]. Here, we applied several techniques of network and meta-analysis to reveal the similarities and differences between transcriptomics- and proteomics-level perturbations in psoriasis lesions. We particularly focused on revealing novel regulatory pathways playing a role in psoriasis development and progression.
2. Transcriptomic and proteomic data, network analysis
Data preparation. The data deposited with the public database of microarray experiments, GEO (http://www.ncbi.nlm.nih.gov/geo/), were analyzed. The expression data on psoriasis were contained in entry GDS1391, and on Crohn’s disease, in entry GDS1330. Since these data were obtained using different microarrays and experimental schemes, analysis was individually performed for each disease with subsequent comparison of the lists of genes with altered expression for each case.
Two sets were selected from the overall data on psoriasis, namely, four experiments with gene expression in psoriatic skin lesions, and four, with gene expression in the healthy skin of the same patients. The selected data for Crohn’s disease were also represented by two sets: 10 experiments on expression in intestinal epithelial lesions, and 11, on expression in the intestinal tissue of healthy individuals. The data were prepared for analysis using the GeneSpring GX (http://www.chem.agilent.com/scripts/pds.asp?lpage=27881) software package. This processing comprised discarding of the genes with poorly detectable expression and normalization of the remaining data. In addition to the values of expression, the so-called absent call ﬂags were added for psoriasis cases; these ﬂags characterize the signiﬁcance of the difference in expression of a particular gene from the background noise. The genes displaying the ﬂag value of A (i.e., absent, which means that the expression of a particular gene in experiment is undetectable) in over 50% of experiments were discarded from further analysis. This information was unavailable for Crohn’s disease; therefore, this step was omitted. The results were normalized by the median gene expression in the corresponding experiment to make them comparable with one another.
Detection of the genes with altered expression. Differentially expressed genes were sought using Welch’s t-test [Welch B.L., 1947]. This test does not require that the distribution variances for the compared samples be equal; therefore, it is more convenient for analyzing expression data than a simple t-test. FDR algorithm [Benjamini Y et al., 1995] with a signiﬁcance threshold of 0.1 was used to control the type I errors in ﬁnding differentially expressed genes; in this case, the threshold determined the expected rate of false positive predictions in the ﬁnal set of genes after statistical control.
Detection of common biological processes. The resulting gene lists were compared, and the molecular processes mediated by the genes displaying altered expressions in both diseases were sought using the MetaCore (GeneGo Inc., www.genego.com) program. The signiﬁcance of the biological processes where the genes displaying altered expressions in both diseases was assessed according to the degree to which overlapping between the list of differentially expressed genes and the list of genes ascribed to the process exceeded random overlapping. Hypergeometric distribution [Draghici S et al., 2007] was used as a model of random overlapping between the gene lists. The measure of signiﬁ- cance for the input gene list, the p value, in this distribution is calculated as
where N is the number of genes in the MetaCore database; R, the number of genes ascribed to a particular process; n, the size of the input gene list; and r, the number of genes from the input list related to this process.
Three ontologies of biological processes were used in this work: GO (www.gene-ontology.org) and two ontologies included in the MetaCore, Canonical pathways and GeneGo process networks. The processes contained in the MetaCore ontologies are gene networks, which reﬂect the interaction of proteins involved in a particular biological regulatory or metabolic pathway. The processes for all three ontologies were prioritized by the negative logarithm of p value.
The common molecular biological pathways were determined based on the analysis of signiﬁcant biological processes and expressions of the genes involved in these processes. The MetaCore contains the algorithms providing for detection in the total network of gene interactions the particular regulatory pathways and subnetworks saturated with the objects of research interest, in this case, the genes with altered expression. The resulting assumptions on the pattern of common biological pathways were visualized as a gene network using the MetaCore.
Skin biopsies. Acquisition of the human tissue was approved by the Vavilov Institute of General Genetics of Russian Academy of Sciences review board and the study was conducted after patient's consent and according to the Declaration of Helsinki Principles. A total of 6 paired nonlesional and lesional (all were plaque-type) skin biopsies from 3 psoriatic patients were profiled using 2D electrophoresis. All the donors who gave biopsy tissue (both healthy controls and individuals with psoriasis) provided a written informed consent for the tissue to be taken and used in this study. Clinical data for all patients are listed in Table 3. Full-thickness punch biopsies were taken from uninvolved skin (at least 2 cm distant from any psoriatic lesion; 6 mm diameter) and from the involved margin of a psoriatic plaque (6 mm diameter) from every patient.
Microarray data analysis. We used recently published data set [Yao Y, et al., 2008] from GEO data base (http://www.ncbi.nlm.nih.gov/geo/; accession number GSE14095). We compared 28 pairs of samples (in each pair there was a sample of lesional skin and a sample of healthy skin from the same patient). Values for each sample were normalized by sample median value in order to unify distributions of expression signals. For assessment of differential expression we used paired Welch ttest with FDR correction [Benjamini Y et al., 1995]. Probe set was considered as differentially expressed if its average fold change exceeded 2.5 and FDR corrected p-value was less than 0.01.
Overconnection analysis. All network-based analyses were conducted with MetaCore software suite http://www.genego.com. This software employs a dense and manually curated database of interactions between biological objects and variety of tools for functional analysis of high-throughput data. We defined a gene as overconnected with the gene set of interest if the corresponding node had more direct interactions with the nodes of interest than it would be expected by chance. Significance of overconnection was estimated using hypergeometric distribution with parameters r - number of interactions between examined node and the list of interest; R - degree of examined node, n - sum of interactions involving genes of interest and N - total number of interactions in the database:
Hidden nodes analysis. In addition to direct interacting objects, we also used objects that may not interact directly with objects of interest but are important upstream regulators of those [Dezso Z et al., 2009]. The approach is generally the same as described above, but the shortest paths instead of direct links are taken into account. As we were interested in transcriptional regulation, we defined a transcriptional activation shortest path as the preferred shortest path from any object in the MetaCore database to the transcription factor target object from the data set. We added an additional condition to include the uneven number of inhibiting interactions in the path (that's required for the path to have activating effect). If the number of such paths containing examined gene and leading to one of objects of interest were higher than expected by chance, this gene was considered as significant hidden regulator. The significance of a node's importance was estimated using hypergeometric distribution with parameters r - number of shortest paths between containing currently examined gene; R - total number of shortest paths leading to a gene of interest through transcriptional factor, n - total number of transcription activation shortest paths containing examined gene and N - total number of transcription activation shortest paths in the database.
Rank aggregation. Both topology significance approaches produced lists of genes significantly linked to a gene or protein set of interest, ranked by corresponding p-values. To combine results of these two approaches, we used a weighted rank aggregation method described in [Pihur V et al., 2009]. Weighted Spearman distance was used as distance measure and the genetic algorithm was employed to select the optimal aggregated list of size 20. This part of work was accomplished in R 2.8.1 http://www.r-project.org.
Network analysis. In addition to topology analysis, we examined overexpressed genes and proteins using various algorithms for selecting connected biologically meaningful subnetworks enriched with objects of interest. Significance of enrichment is estimated using hypergeometric distribution. We first used an algorithm intended to find regulatory pathways that are presumably activated under pathological conditions. It defines a set of transcription factors that are directly regulating genes of interest and a set of receptors whose ligands are in the list of interest and then constructs series of networks; one for each receptor. Each network contains all shortest paths from a receptor to the selected transcriptional factors and their targets. This approach allows us to reveal the most important areas of regulatory machinery affected under the investigated pathological condition. Networks are sorted by enrichment p-value. The second applied algorithm used was aimed to define the most influential transcription factors. It considers a transcriptional factor from the data base and gradually expands the subnetwork around it until it reaches a predefined threshold size (we used networks of 50 nodes). Networks are sorted by enrichment p-value.
3. A comparative analysis of the molecular genetic processes in the pathogenesis of psoriasis and Crohn’s disease
Constructing List of the Genes with Altered Expression in Both Pathologies We detected the lists of differentially expressed genes separately in each dataset and compared these lists at the system level. This approach to analysis was dictated by the properties of expression data in general (a high noise level and a large volume of analyzed data) and individual properties of datasets selected for analysis, which were obtained using different microarrays with different numbers of probes. That is why the datasets were incomparable in a direct fashion. The dataset on psoriasis initially contained information on the expression levels of 12626 probes from eight experiments (four specimens of skin lesions, and four of the healthy skin from the same patients). After discarding the probes with poorly detectable expression (see Materials and Methods), the set was reduced to 5076 probes. The list of the probes with statistically signiﬁcant differences in expression between the lesion and healthy tissue contained 410 items at a signiﬁcance level of 0.1.
The dataset on Crohn’s disease contained information on the expression level of 24016 probes from 21 experiments (11 specimens of epithelial lesions and 10 specimens of healthy epithelium). The list of probes displaying statistically signiﬁcant differences in expression between the lesion and healthy tissue contained 3850 probes at a signiﬁcance level of 0.1. This pronounced difference in the sizes of gene lists result from the fact that the algorithm used for controlling type I errors (FDR) depends on the input set. The larger the initial gene list, the larger number of genes will pass the FDR control at a similar p-value distribution; in our case, the number of analyzed probes in the dataset for Crohn’s disease is ﬁve times larger than that in the dataset for psoriasis.
The lists of differentially expressed genes were input into the MetaCore program. Because microarrays contained not only gene probes, but also a large number of ESTs with unidentiﬁed functions, the size of gene lists at this stage changed because not all the probes had the corresponding gene in the MetaCore database and because some probes corresponded to more than one gene. The lists of recognized genes comprised 425 and 2033 items for psoriasis and Crohn’s disease, respectively.
The common part for the compared lists comprised 49 genes, which is a signiﬁcant overlapping (p value = 4.94 × 10 –2 ). The significance was estimated using Fisher’s test. The complete set contained 9017 genes present in both studied datasets (this set was identiﬁed by comparing the complete lists of genes for both microarrays in MetaCore). The lists of genes with altered expression were reduced to the subset of genes present in both datasets. Thus, these particular 49 genes were selected for further analysis (Table 1).
It was of interest to determine the molecular processes with which the genes common to psoriasis and Crohn’s disease are associated. Table 2 consolidates the most probable cell processes with involvement of the genes listed in Table 1, as determined by the MetaCore software tools. These processes (Table 2) fall into two main groups—related to inﬂammation and cell cycle. Indeed, the pathological lesions in both psoriasis and Crohn’s disease are inﬂammatory foci. The cell cycle is also considerably affected in both pathologies. An increased proliferation of keratinocytes is observed in the psoriatic skin, an inﬂammatory focus.
|Inflammation: interferon signaling pathways||2.19E-03|
|Signal transduction: Wnt signaling pathways||1.20E-02|
|Regulation of translation initiation||5.66E-02|
|Morphogenesis of blood vessels||9.76E-02|
|Inflammation: amphoterin signaling pathways||1.19E-01|
|Proteolysis determined by the cell cycle and apoptosis||1.29E-01|
|Interleukin regulation of the cell cycle in G1-S phase||1.29E-01|
|Signal transduction: androgen receptor signaling pathways||1.34E-01|
For a more detailed description of the inﬂammatory response and cell cycle in the parts of them most tightly related to the genes listed in Table 1, we constructed gene networks, which are fragments of the larger gene networks describing the inﬂammatory response (Fig. 2) and cell cycle control (Fig. 3). Figure 2 shows that the inﬂammatory response is initiated by such well-known cytokines as TNF-α, IFN-γ, IL-2, IL-6, IL-17, and IL-23. Then protein kinases activate the transcription factors AP-1, STAT3, C/EBP, NF-κB, ISGF3, and others.
Figure 3 shows that the key cell cycle regulators that changed gene expression are the transcription factors AP-1, c-Myc, and STAT3. It is also evident that the genes encoding AP-1 transcription factor components are involved in both the inﬂammatory response and cell cycle control. It is known that the genes depending on AP-1 play an important role in regulation of proliferation, morphogenesis, apoptosis, and cell differentiation. Induction of cell differentiation activates transcription of the genes encoding the components of AP-1 complex [Turpaev K.T., 2006]. We assume that the genes of AP-1 transcription factor are the candidate genes involved in the pathogenesis of both psoriasis and Crohn’s disease; moreover, this hypothesis is particularly based on the bioinformatics analysis of microarray data. Therefore, it was interesting to compare our data with the available information about the chromosome localization of the loci associated with psoriasis and Crohn’s disease.
4. Integrated network analysis of transcriptomic and proteomic data in psoriasis
Differentially abundant proteins. Protein abundance was determined by densitometric quantification of the protein spots on 2D-electophoresis gel (Figure 4) followed by MALDI-TOF mass spectrometry. Total of 10 proteins were over-abundant at least 2-fold in lesional skin compared with uninvolved skin: Keratin 14, Keratin 16, Keratin 17, Squamous cell carcinoma antigen, Squamous cell carcinoma antigen-2, Enolase 1, Superoxide dismutase [Mn], Galectin-7, S100 calcium-binding protein A9 and S100 calcium-binding protein A7.
Several of these proteins were previously reported to be over-abundant in psoriatic plaques [Leigh IM et al., 1995; Madsen P et al., 1991; Vorum H et al., 1996; Takeda A et al.., 2002]. The proteins belonged to a diverse set of pathways and processes.
We attempted to connect the proteins into a network using a collection of over 300,000 manually curated protein interactions and several variants of "shortest path" algorithms applied in MetaCore suite [Nikolsky Y et al, 2009] (Figure 5). The genes encoding overabundant proteins were found to be regulated by several common transcription factors (TFs) including members of the NFkB and AP-1 complexes, STAT1, STAT3, c-Myc and SP1. Moreover, the upstream pathways activating these TFs were initiated by the overabundant S100A9 through its receptor RAGE [Ghavami S et al., 2008] and signal transduction kinases (JAK2, ERK, p38 MAPK). This network also included a positive feedback loop as S100A9 expression was determined to be controlled by NF-kB [Schreiber J et al., 2006]. The topology of this proteomics-derived network was confirmed by several transcriptomics studies [Tsuruta D, 2009; Sano S et al., 2008; Ghoreschi K et al., 2003; Piruzian ES et al., 2009; Gandarillas A & Watt FM, 1997; Arnold I & Watt FM, 2001] which showed overexpression of these TFs in psoriasis lesions. Transiently expressed TFs normally have low protein level and, therefore, usually fail to be detected by proteomics methods.
RAGE receptor is clearly the key regulator on this network and plays the major role in orchestrating observed changes of protein abundance. This protein is abundant in both keratinocytes and leukocytes, though normally its expression is low [Lohwasser C et al., 2006]. RAGE participates in a range of processes in these cell types, including inflammation. It is being investigated as a drug target for treatment of various inflammatory disorders [Santilli F et al., 2009]. Thus, we may propose that RAGE can also play significant role in psoriasis.
We used Affymetrix gene expression data set from the recent study [Yao Y et al., 2008] involving 33 psoriasis patients. Originally, more than 1300 probe sets were found to be upregulated in lesions as compared with unlesional skin of the same people. We identified 451 genes overexpressed in lesional skin under more stringent statistical criteria (28 samples of lesional skin were matched with their nonlesional counterparts from the same patients in order to exclude individual expression variations, genes with fold change >2.5 and FDR-adjusted p-value < 0.01 were considered as upregulated). The genes encoding 7 out of 10 proteomic markers were overexpressed, well consistent with proteomics data. Expression of Enolase 1, Keratin 14 and Galectin 7 was not altered.
Despite good consistency between the proteomics and expression datasets, the two orders of magnitude difference in list size make direct correlation analysis difficult. Therefore, we applied interactome methods for the analysis of common upstream regulation of the two datasets at the level of transcription factors. First, we defined the sets of the most influential transcription factors using two recently developed methods of interactome analysis [Nikolsky Y et al., 2008] and the "hidden nodes" algorithm [Nikolskaya T et al., 2009]. The former method ranks TFs based on their one-step overconnectivity with the dataset of interest compared to randomly expected number of interactions. The latter approach takes into account direct and more distant regulation, calculating the p-values for local subnetworks by an aggregation algorithm [Nikolskaya T et al., 2009]. We calculated and ranked the top 20 TFs for each data type and added several TFs identified by network analysis approaches (data not shown). The TFs common for both data types were taken as set of 'important pathological signal transducers' (Figure 6). Noticeably, they closely resemble the set of TFs regulating the protein network on Figure 5.
In the next step, we applied "hidden nodes" algorithm to identify the most influential receptors that could trigger maximal possible transcriptional response. In total, we found 226 membrane receptors significantly involved into regulation of 462 differentially expressed genes ('hidden nodes' p-value < 0.05). Assuming that topological significance alone does not necessarily prove that all receptors are involved in real signaling or are even expressed in the sample; we filtered this list by expression performance. The receptors used were those whose encoding genes or corresponding ligands were overexpressed greater than 2.5 fold. We assumed that the pathways initiated by over-expressed receptors and ligands are more likely to be activated in psoriasis. Here we assumed that expression alterations and protein abundance are at least collinear. An additional criterion was that the candidate receptors had to participate in the same signaling pathways with at least one of the common TFs. No receptor was rejected based on this criterion. In total, 44 receptors passed the transcription cut-off. Of these 24 receptor genes were overexpressed; 23 had overexpressed ligands and 3 cases had overexpression of both ligands and receptors (IL2RB, IL8RA and CCR5; see Figures 7 and 8). Interestingly, for several receptors, more than one ligand was overexpressed (Figure 7). Several receptors are composed of several subunits, only one of which was upregulated (for example, IL-2 receptor has only gamma subunit gene significantly upregulated). Out of 44 receptors we identified by topology analysis, 21 were previously reported as psoriasis markers (they are listed in Table 3 with corresponding references). The other 23 receptors were not reported to be linked to psoriasis or known to be implicated in other inflammatory diseases. These receptors belong to different cellular processes (development, cell adhesion, chemotaxis, apoptosis and immune response) (Table 6).
|Gene||Connection to psoriasis||Gene||Connection to psoriasis|
|EPHA2||No||AGER||Yes [Foell D et al., 2003]|
|EPHB2||No||CCR1||Yes [Horuk R, 2005]|
|FCER1G||No||CCR2||Yes [Vestergaard C et al., 2004]|
|INSR||No||CCR3||Yes [Rottman JB et al., 2001]|
|LTBR||No||CCR5||Yes [de Groot M et al., 2007]|
|PLAUR||No||CD2||Yes [Ellis CN & Krueger GG., 2001]|
|TNFRSF10A||No||CD27||Yes [De Rie MA et al., 1996]|
|TNFRSF10B||No||CD36||Yes [Prens E et al., 1996]|
|CD44||Possible [Reichrath J et al, 1997]||CD3D||Yes [Haider AS, et al., 2007]|
|CSF2RB||Possible [Kelly R et al., 1993]||EGFR||Yes [Castelijns FA et al., 1999]|
|CXCR4||Possible [Gu J et al., 2002]||IL17RA||Yes [Johansen C et al., 2009]|
|FZD4||Possible[Reischl J et al., 2007]||IL1R1||Yes [Debets R et al., 1997]|
|GABBR1||Possible[Shiina T et al., 2009]||IL8RA||Yes [Schulz BS et al., 1993]|
|IL10RA||Possible [Asadullah K et al., 1998]||IL8RB||Yes [Schulz BS et al., 1993]|
|IL13RA1||Possible [Cancino-Diaz JC et al., 2002]||ITGAL||Yes [Guttman-Yassky E et al., 2008]|
|IL2RB||Possible [Pietrzak A et al., 2008]||ITGB2||Yes [Sjogren F et al., 1999]|
|IL2RG||Possible [Pietrzak A et al., 2008]||LRP1||Yes [Curry JL et al., 2003]|
|IL4R||Possible [Martin R, 2003]||PTPRC||Yes [Vissers WH et al., 2004]|
|LILRB2||Possible [Penna G et al., 2005]||SDC3||Yes [Patterson AM et al., 2008]|
|LRP2||Possible [Fu X et al., 2009]||SELE||Yes [Wakita H &Takigawa M, 1994]|
|LRP8||Possible [Fu X et al., 2009]||SELPLG||Yes [Chu A et al., 1999]|
|ROR2||Possible [Reischl J et al., 2007]||TLR4||Yes [Seung NR et al., 2007]|
Meta-analysis of multiple OMICs data types and studies is becoming an important research tool in understanding complex diseases. Several methods were developed for correlation analysis between the datasets of different type, such as mRNA and proteomics [Hack CJ, 2004; Le Naour F et al., 2001; Steiling K et al., 2009; Conway JP & Kinter M, 2005; Di Pietro C et al., 2009]. However, there are many technological challenges to resolve, including mismatching protein IDs and mRNA probes, fundamental differences in OMICs technologies, differences in experimental set-ups in studies done by different groups etc [Mijalski T et al., 2005]. Moreover, biological reasons such as differences in RNA and protein degradation processes also contribute to variability of different data types. As a result, transcriptome and proteome datasets usually show only weak positive correlation although were considered as complimentary. More recent studies focused on functional similarities and differences observed for different levels of cellular organization and reflected in different types of OMICs data [Habermann JK et al., 2007; Chen YR et al., 2006; Shachaf CM et al., 2008; Zhao C et al., 2009]. For example, common interacting objects were found for distinct altered transcripts and proteins in type 2 diabetes [Gerling IC et al., 2006]. In one leukemia study [Zheng PZ et al., 2005] authors found that distinct alterations at transcriptomics and proteomic levels reflect different sides of the same deregulated cellular processes.
The overall concordance between mRNA and protein expression landscapes was addressed in earlier studies, although the data types were compared mostly at the gene/protein level with limited functional analysis [Cox B et al., 2005; Mijalski T et al., 2005]. Later, ontology enrichment co-examination of transcriptomics and proteomic data has shown that the two data types affect similar biological processes and are complimentary [Chen YR, et al., 2006; Zheng PZ et al., 2005; Zhao C et al., 2009]. However, the key issue of biological causality and functional consequences of distinct regulation events at both mRNA and protein levels of cellular organization were not yet specifically addressed. These issues cannot be resolved by low resolution functional methods like enrichment analysis. Instead, one has to apply more precise computational methods such as topology and biological networks, which take into consideration directed binary interactions and multi-step pathways connecting objects between the datasets of different types regardless of their direct overlap at gene/protein level [Ideker T & Sharan R, 2008; Chuang HY et al., 2007]. For example, topology methods such as "hidden nodes" [Dezso Z et al., 2009; Nikolsky Y et al., 2008] can identify and rank the upstream regulatory genes responsible for expression and protein level alterations while network tools help to uncover functional modules most affected in the datasets, identify the most influential genes/proteins within the modules and suggest how specific modules contribution to clinical phenotype [Nikolsky Y et al., 2005; Gerling IC et al., 2006].
In this study, we observed substantial direct overlap between transcriptomics and proteomics data, as 7 out of 10 over-abundant proteins in psoriasis lesions were encoded by differentially over-expressed genes. However, the two orders of magnitude difference in dataset size (462 genes versus 10 proteins) made the standard correlation methods inapplicable. Besides, proteomics datasets display a systematic bias in function of abundant proteins, favoring "effector" proteins such as structural, inflammatory, core metabolism proteins but not the transiently expressed and fast degradable signaling proteins. Therefore, we applied topological network methods to identify common regulators for two datasets such as the most influential transcription factors and receptors. We have identified some key regulators of the "proteomics" set among differentially expressed genes, including transcription factors, membrane receptors and extracellular ligands, thus reconstructing upstream signaling pathways in psoriasis. In particular, we identified 24 receptors previously not linked to psoriasis.
Importantly, many ligands and receptors defined as putative starts of signaling pathways were activated by transcription factors at the same pathways, clearly indicating on positive regulatory loops activated in psoriasis. The versatility and the variety of signaling pathways activated in psoriasis is also impressive, which is evident from differentially overexpression of 44 membrane receptors and ligands in skin lesions. This complexity and redundancy of psoriasis signaling likely contributes to the inefficiency of current treatments, even novel therapies such as monoclonal antibodies against TNF-α and IL-23. Thus, the key regulator, RAGE receptor, triggers multiple signaling pathways which stay activated even when certain immunological pathways are blocked. Our study suggests that combination therapy targeting multiple pathways may be more efficient for psoriasis (particularly considering feasibility for topical formulations). In addition, the 24 receptors we identified by topology analysis and previously not linked with psoriasis can be tested as potential novel targets for disease therapy. The functional machinery of psoriasis is still not complete and additional studies can be helpful in "filling the gaps" of our understanding of its molecular mechanisms. For instance, kinase activity is still unaccounted for, as signaling kinases are activated only transiently and are often missed in gene expression studies. Topological analysis methods such as "hidden nodes" [Dezso Z et al., 2004] may help to reconstruct regulatory events missing in the data. Also, the emerging phosphoproteomics methodology may prove to become a helpful and complimentary OMICs technology. The network analysis methodology is not dependent on the type of data analyzed and or any gene/protein content overlap between the studies and is well applicable for functional integration of multiple data types.
Thus, we succeeded in comparing the molecular processes characteristic of psoriasis and Crohn’s disease and detecting the candidate genes involved in the processes common for both pathologies and critical for their development. Identification of the proteins encoded by these genes is an important aspect of the research performed, because the proteins are particular targets for elaborating new approaches to treating psoriasis and Crohn’s disease. Our data obtained by analyzing expression of the candidate genes for psoriasis and Crohn’s disease can enhance the search for new biological targets for the corresponding therapeutics.
In order to gain insight into molecular machinery underlying the disease, we conducted a comprehensive meta-analysis of proteomics and transcriptomics of psoriatic lesions from independent studies. Network-based analysis revealed similarities in regulation at both proteomics and transcriptomics level. We identified a group of transcription factors responsible for overexpression of psoriasis genes and a number of previously unknown signaling pathways that may play a role in this process. We also evaluated functional synergy between transcriptomics and proteomics results.
We have successfully applied network-based methods to integrate and explore two distinct high-throughput disease data sets of different origin and size. Through identification of common regulatory machinery that is likely to cause overexpression of genes and proteins, we came to the signaling pathways that might contribute to the altered state of regulatory network in psoriatic lesion. Our approach allows easy integrative investigation of different data types and produces biologically meaningful results, leading to new potential therapy targets. We have demonstrated that pathology can be caused and maintained by a great amount of various cascades, many previously not described as implicated in psoriasis; therefore, combined therapies targeting multiple pathways might be effective in treatment.
This work was supported by grants 16.512.11.2049 from Ministry of Education and Science of the Russian Federation, grant from Russian Academy of Sciences ("Fundamental Sciences to Medicine" program), and grant P1309 from Ministry of Education and Science of the Russian Federation (Federal Special Program "Scientific and Educational Human Resources of Innovative Russia" for 2009 - 2013)