Comprehensive Network Analysis of Cancer Stem Cell Signalling through Systematic Integration of Post-Translational Modification Dynamics

Post-translational modifications, such as phosphorylation, acetylation and ubiquitination, are widely known to play important roles in cellular signalling. Recent advances in mass spectrometry-based proteomics technology enable us not only to comprehensively identify expressed proteins but also to unveil their post-translational modifications with high sensitivity. In our advanced proteome bioinformatics frameworks, statistical network analyses of large-scale information on various post-translational modification dynamics were conducted to define the key machinery underlying cancer stem cell properties. The bioinformatical approaches using IPA (Ingenuity Pathway Analysis), NetworKIN and a newly developed platform named PTMapper (post-translational modification mapper) allowed us to perform network-wide prediction of upstream interactors/kinases together with related information on diseases and functions, leading to the systematic discovery of novel drug candidates for regulating aberrant signalling in cancer stem cells. In this chapter, we use patient-derived glioblastoma stem cells as a representative model of cancer stem cells to introduce some useful platforms for statistical and mathematical network analyses based on large-scale phosphoproteome data.


Introduction
Glioblastoma (GBM) is the most common and aggressive brain tumour in adults. Despite many years of enormous effort to overcome this tumour, the median survival of GBM patients remains only about 1 year [1]. GBM is characterized by high invasiveness and intratumoral heterogeneity (ITH) [2,3]. To date, GBM-ITH is known to contribute to resistance to chemotherapy, radiation and surgical resection. Since functional diversity is the main feature of multilineage differentiation of cancer stem cells (CSCs) [4,5], glioblastoma stem cells (GSCs) have been regarded as major therapeutic targets in GBM. Furthermore, post-translational modifications (PTMs) in GSCs are reported to tightly regulate the highly tumourigenic potential of GSCs through aberrant signalling [6,7]. Therefore, it is important to comprehensively elucidate PTM-based GSC signalling networks in order to develop effective treatments for GBM.
Advanced nanoscale liquid chromatography-tandem mass spectrometry (nanoLC-MS/MS) enables us to identify and quantify thousands of proteins in a single experiment [8]. Moreover, by coupling the nanoLC-MS/MS system with high-affinity enrichment methods for PTM-bearing peptides, we can also acquire in-depth biological information on PTM dynamics. In this chapter, we introduce high-resolution shotgun proteomics technology for large-scale PTM determination in combination with statistical bioinformatics platforms such as IPA [9], NetworKIN [10,11] and PTMapper [12].

System-wide proteomic analysis of PTM dynamics
PTMs are widely known to play crucial roles in cell fate control, such as proliferation, differentiation and apoptosis. More than 500 kinds of PTMs found in eukaryotes and prokaryotes have been registered in Unimod, a comprehensive database of protein modifications for mass spectrometry [13]. Recent technological advances in mass spectrometry-based proteomics, in combination with appropriate enrichment techniques for each PTM, enable us to perform comprehensive identification and quantification of PTMs [14]. Here, we introduce biochemical purification methods for highly sensitive detection of three representative PTMs: phosphorylation, acetylation and ubiquitination (Figure 1).

Phosphorylation
Protein phosphorylation is recognized as one of the most important and well-studied PTMs and regulates a variety of biological processes by transmitting diverse external signals [15,16]. About 280,000 phosphorylation sites have already been registered in PhosphoSitePlus, a knowledgebase containing non-redundant mammalian PTMs [17]. Titanium dioxide (TiO2), which has very high affinity for phosphorylated peptides, is widely used for large-scale phosphoproteome analysis [18,19].

Acetylation
Lysine acetylation plays a key role in modulating transcriptional regulation through the coordinated function of histone acetyltransferases (HATs) and histone deacetylases (HDACs) [20]. The stabilization of p53, one of the most important transcription factors, is reported to greatly depend on lysine acetylation [21]. Thousands of lysine acetylation sites can be identified using an antibody against acetyl-lysine in combination with a high-resolution mass spectrometry system [22,23].

Ubiquitination
The ubiquitin system transmits protein degradation signals to the proteasome and also regulates multiple cellular functions such as cell-cycle progression, DNA repair and transcriptional regulation. Dysfunction of this system leads to various pathological conditions [24]. Ubiquitination sites are detected as diglycine (Gly-Gly) remnants on the modified lysine residues, which are generated by tryptic digestion of ubiquitinated proteins [25,26].
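In the mass spectrometer, each of the three modifications described above is recognized by a characteristic mass increase on the modified residue. The following minimal sketch lists the standard monoisotopic mass shifts (as catalogued in Unimod) and converts them into the corresponding m/z shift for a peptide ion of a given charge. The names `PTM_MASS_SHIFT` and `mz_shift` are illustrative assumptions and do not belong to any tool cited in this chapter.

```python
# Monoisotopic mass shifts (Da) of the three PTMs discussed above.
PTM_MASS_SHIFT = {
    "phosphorylation": 79.966331,  # HPO3 added to Ser/Thr/Tyr
    "acetylation": 42.010565,      # C2H2O added to Lys
    "diglycine": 114.042927,       # Gly-Gly remnant left on Lys after tryptic digestion
}


def mz_shift(ptm: str, charge: int) -> float:
    """Return the m/z increase that one PTM adds to a peptide ion of the given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return PTM_MASS_SHIFT[ptm] / charge


if __name__ == "__main__":
    # Print the shift observed for doubly charged peptide ions.
    for name in PTM_MASS_SHIFT:
        print(name, round(mz_shift(name, 2), 4))
```

Database search engines use these mass shifts as variable modifications when matching tandem mass spectra to peptide sequences, which is how the modified residue is localized.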

Systematic characterization of the phosphoproteome dynamics in GSCs
Quantitative information on phosphoproteome dynamics can provide a systematic description of the key machinery for cellular signalling. In this section, we introduce two examples of global phosphoproteome analyses of GSCs using a SILAC (stable isotope labelling by amino acids in cell culture)-based quantitative technique [27,28] (Figure 2). One was carried out using epidermal growth factor (EGF) to elucidate the mechanism for stemness maintenance of GSCs [29], whereas the other was conducted through serum-induced differentiation of GSCs to unveil the key pathways responsible for disrupting stemness characteristics [30].

Global quantitative phosphoproteome analyses of EGF-stimulated GSCs
EGF is known to be essential for maintenance and growth of GSCs [31]. The quantitative phosphoproteomic analysis of EGF-stimulated GSCs was performed to acquire network-wide information on the molecules related to stemness maintenance. As a result, a total of 6073 phosphopeptides from 2282 phosphorylated proteins were identified, leading to quantitative classification of 516 upregulated and 275 downregulated phosphorylation sites [29].
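The classification into upregulated and downregulated sites can be illustrated with a minimal sketch based on SILAC ratios (stimulated versus control). The 2-fold cutoff, the function name `classify_site` and the example ratios are assumptions made purely for illustration; the criteria actually applied are described in the original study [29].

```python
import math


def classify_site(ratio: float, fold_cutoff: float = 2.0) -> str:
    """Classify a phosphorylation site from its SILAC ratio (stimulated / control).

    The default 2-fold cutoff is an assumed value, not one taken from the study.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    log2_ratio = math.log2(ratio)
    threshold = math.log2(fold_cutoff)
    if log2_ratio >= threshold:
        return "upregulated"
    if log2_ratio <= -threshold:
        return "downregulated"
    return "unchanged"


# Hypothetical example ratios; the site names are placeholders.
example_ratios = {"siteA": 3.2, "siteB": 0.4, "siteC": 1.1}
calls = {site: classify_site(r) for site, r in example_ratios.items()}
```

Working on the log2 scale makes the cutoff symmetric, so that a 2-fold increase and a 2-fold decrease are treated equivalently.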

Upstream kinase prediction analysis
Protein phosphorylation is known to be controlled by specific kinases depending on consensus sequence motifs of substrates [32]. The motif-x algorithm [33,34] is applicable to statistical extraction of significant consensus sequence motifs from the large-scale phosphoproteome data on EGF-stimulated GSCs (Figure 4(A) and (B)).
NetworKIN [10,11] is designed to predict upstream kinases based on the sequence motifs around the functionally regulated phosphorylation sites through construction of the related protein-protein interaction (PPI) networks using STRING [35]. The NetworKIN algorithm enables further interpretation of the results obtained from the motif-x analyses (Figure 4 (C)).
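Both motif extraction and kinase prediction start from the amino acid sequence surrounding each phosphorylation site. The sketch below shows only this pre-processing step: cutting a fixed-width window centred on each site and counting residues at a given offset. It is not the motif-x or NetworKIN algorithm itself; the flank of 6 residues, the padding character, the function names and the toy sequence are all assumptions for illustration.

```python
from collections import Counter


def site_window(sequence: str, position: int, flank: int = 6, pad: str = "_") -> str:
    """Return the residues around a 1-based site position, padded at the termini."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside the sequence")
    centre = position - 1
    padded = pad * flank + sequence + pad * flank
    # After padding, the central residue sits at index centre + flank,
    # so the window starts at index centre and spans 2 * flank + 1 residues.
    return padded[centre : centre + 2 * flank + 1]


def residue_counts(windows: list[str], offset: int, flank: int = 6) -> Counter:
    """Count the residues found at a given offset from the central site."""
    return Counter(w[flank + offset] for w in windows)


# Toy sequence (not a real protein) with serines at positions 4, 10 and 17.
toy_protein = "MARSPTKRLSPEKAAYSPLRG"
windows = [site_window(toy_protein, p) for p in (4, 10, 17)]
plus_one = residue_counts(windows, +1)  # residues immediately C-terminal to the site
```

In this toy example every site is followed by proline at position +1, the kind of positional bias that a motif extraction tool would then test for statistical significance against a background set of sequences.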

Global quantitative phosphoproteome analyses of serum-induced GSCs
CSCs are regarded as one of the most clinically important cell populations in causing tumour heterogeneity, which is responsible for resistance to chemotherapy [36]. As recent studies have demonstrated that non-CSCs can also readily acquire CSC-like characteristics [37], it is very important to clarify the detailed mechanisms underlying CSC differentiation and to understand the principle of their heterogeneity. Serum-induced phosphoproteome dynamics in GSCs were measured to systematically elucidate the regulatory nodes for stemness alteration over the entire signalling network [30]. Among the 2876 phosphorylation sites identified on 1584 proteins, 732 phosphorylation sites on 419 proteins were found to be regulated through serum-induced differentiation. Integrative network analyses of the quantitative phosphoproteome data using various bioinformatical tools, including IPA and NetworKIN, indicated that transforming growth factor-β receptor type-2 (TGFBR2) might be one of the crucial upstream regulators of GSC alteration (Figure 5).

Development of advanced bioinformatical platforms for complicated kinase-substrate interaction networks
Although the shotgun proteomics strategy based on advanced nanoLC-MS/MS systems can provide large-scale information on various kinds of PTMs, only a few PTM-based network analysis tools are available compared with those for conventional protein-protein interaction (PPI) networks. Recently, CEASAR (connecting enzymes and substrates at amino acid resolution) [39] and PhosphoPath [40] were developed to visualize kinase-substrate interactions in a phosphorylation site-oriented manner. CEASAR was designed to provide a high-resolution map of kinase-phosphorylation networks based on functional protein microarrays and bioinformatics analysis. PhosphoPath, on the other hand, was developed as a Cytoscape app [41] to visualize both quantitative proteome and phosphoproteome data using PPI information extracted from BioGRID [42] and PhosphoSitePlus [17]. We have also recently developed a Cytoscape-based bioinformatical platform named 'post-translational modification mapper (PTMapper)' to visualize kinase-substrate interactions involving multiple phosphorylation sites on signalling molecules (Figure 6) [12]. The kinase-phosphorylation site interaction dataset for this platform was generated by integrating PhosphoSitePlus [17], Phospho.ELM [43], PhosphoNetworks [44] and UniProtKB [45], leading to the construction of phosphorylation site-oriented PPI networks using Pathway Commons [46]. We applied this platform to extract crucial kinase-substrate interactions from the quantitative phosphoproteome data on EGF-stimulated GSCs [29]. As a result, p70S6K and Lyn were extracted as significant key regulators (Figure 7).
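One common way to ask whether a kinase is over-represented among regulated sites is a one-sided hypergeometric test on a kinase-substrate site table. The sketch below illustrates that general idea only; it is not a description of how PTMapper or the other tools named above compute significance, and the kinase names, site identifiers and counts are hypothetical placeholders.

```python
from math import comb


def hypergeom_tail(population: int, annotated: int, drawn: int, overlap: int) -> float:
    """Return P(X >= overlap) when `drawn` sites are sampled from `population`,
    of which `annotated` are known substrate sites of the kinase."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    hits = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


# Hypothetical kinase -> substrate-site table (placeholder identifiers).
kinase_sites = {
    "KINASE_A": {"s1", "s2", "s3", "s4"},
    "KINASE_B": {"s5", "s6"},
}
all_sites = {f"s{i}" for i in range(1, 11)}  # 10 quantified sites
regulated = {"s1", "s2", "s3", "s7", "s8"}   # 5 regulated sites

# One-sided enrichment p-value per kinase.
p_values = {
    kinase: hypergeom_tail(
        len(all_sites), len(sites), len(regulated), len(sites & regulated)
    )
    for kinase, sites in kinase_sites.items()
}
```

With real data, the kinase-site table would come from curated resources such as those listed above, and the resulting p-values would need correction for multiple testing across kinases.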

Perspectives and conclusions
The bioinformatical description of GSC signalling dynamics based on the global quantitative phosphoproteome data led to network-wide extraction of critical molecules and their related pathways for defining stemness characteristics. Further integrative description of multiple PTM dynamics in GSCs will deepen our understanding of the nature of their cell signalling complexity at the network level. We believe that shotgun proteomics-based quantitative analyses of cancer stem cell signalling networks in combination with various statistical and mathematical platforms will pave the way to establish new directions towards systematic evaluation of drug targets in a cell-type specific manner.

Author details
Hiroko Kozuka-Hata and Masaaki Oyama*
*Address all correspondence to: moyama@ims.u-tokyo.ac.jp
Medical Proteomics Laboratory, The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan