Proteomics in Acute Myeloid Leukemia

Acute myeloid leukemia (AML) is an extremely heterogeneous and deadly hematological cancer. Cytogenetic abnormalities and genetic mutations, though well recognized and highly prognostic, do not fully capture the degree of heterogeneities manifested in AML clinically. Additionally, current treatment of AML still largely depends on chemotherapy and allogeneic stem cell transplantation, with few options for personalized and molecularly targeted therapies. Proteomics holds promise for unraveling biological heterogeneities in AML beyond the scope of cytogenetics and genomics. In recent years, proteomics has emerged as an important tool for discovering new diagnostic biomarkers, enabling more prognostic patient classifications, and identifying novel therapeutic targets. In this chapter, we review recent advances in proteomic studies of AML, including an overview of AML pathology, popular proteomic techniques, various applications of proteomics in AML from biomarker discovery to target identification, challenges and future directions in this field.


Introduction
Acute myeloid leukemia (AML) is a hematological cancer characterized by rapid proliferation and accumulation of immature, abnormally differentiated clonal hematopoietic cells in the bone marrow and blood [1]. The American Cancer Society estimated that about 21,380 new cases of AML and about 10,590 deaths from AML would occur in the United States in 2017. AML is an age-associated disease: the average age of AML patients is 67 years, and almost all deaths from AML are in adults. Increasing age is also an adverse prognostic factor in AML. The disease has a much lower cure rate in older patients (5-15% over 60 years old) compared to younger patients (35-40% under 60 years old) [2], in part because the elderly are unable to tolerate intensive chemotherapy.
In contrast to breakthroughs made in treating other cancers, the progress of treatment in AML has been slow overall. The main challenge is that the biology of AML is enormously heterogeneous.
It has long been hoped that genetic mutations in AML can provide critical prognostic information to complement cytogenetics and help direct individualized therapy. Indeed, recurrent mutations of genes (e.g. FLT3, NPM1, CEBPA, KIT, DNMT3A, IDH1/2 and TET2) have been identified in AML, some of which were found to associate with patient outcome, and identification of these mutations has already been incorporated into the standard-of-care testing and classification system [11,12]. Our understanding of the genomic and epigenomic landscape in AML has also been greatly improved in the last decade thanks to the development of next-generation sequencing techniques. In a recent study by the Cancer Genome Atlas Research Network [13], whole-genome (50 cases) and whole-exome sequencing (150 cases), along with RNA and microRNA sequencing and DNA-methylation analysis, were used to analyze the genomes of 200 adult AML patients. The study revealed that AML genomes have on average only 13 mutations in genes, fewer than most other adult cancers. Furthermore, on average 5 of these 13 mutations are in genes that are recurrently mutated in AML. The limited number of genetic mutations, in contrast to the degree of heterogeneity observed clinically, indicates the existence and importance of AML heterogeneity beyond genes.
Despite the extensive adoption of genomic approaches in cancer research, it is widely recognized that genomics alone is insufficient to provide an accurate picture of all cellular changes and dynamic states [14]. First, the same mRNA transcript often does not correspond to a single protein but to multiple protein counterparts, thanks to alternative splicing, protein cleavage, and post-translational modifications (PTMs). In particular, PTMs (e.g. phosphorylation, acetylation, methylation, glycosylation, ubiquitination) play important roles in cellular processes by affecting the folding, location, and function of proteins. Proteins from the same mRNA transcript can have opposite effects on cellular processes with different PTMs, and it is currently not possible to predict the fate of PTMs from the protein sequence. Second, most cellular processes are executed and regulated by interactions between proteins and by interactions between proteins and DNA. An understanding of these interactions, which is unattainable via genomic approaches, is crucial for predicting cell behavior and discovering new drug targets. Moreover, the discovery of genetic mutations and abnormal gene expression often does not offer an immediate therapeutic solution, as most drugs target proteins instead of genes.
Though nascent and overshadowed by genomics in the research community, proteomics can compensate for the limitations of genomic approaches and advance the discovery of biomarkers and personalized treatments for AML. As the workhorses in cells, proteins can more accurately reflect the real dynamic changes in cellular processes, and offer insights into a level of disease heterogeneity beyond the scope of genomics. In an analogy to screenwriting, genomics is a copy of a script, whereas proteomics is a movie produced from the script. With the same script, different actors, actresses and directors, stage settings and lighting effects will result in different productions. It is also extremely hard to judge whether the show will be a success based on the script alone, because execution matters and one can only be sure after seeing it in action. Therefore, proteomics can capture the real action in cells (e.g. the effects of the cellular environment and the response of the cell) that cannot be foreseen by genomics.
Proteomics has many potential applications in AML. First, proteomics can be used either to establish new patient classification systems by itself or to improve the current risk stratification system by complementing cytogenetics and genomics. Not all genetic mutations are equally important in driving the disease or in determining a patient's response to therapy. Some genetic mutations might not make a difference at the proteomic level, whereas some proteomic patterns and cell signaling behaviors might not manifest at the genetic level. The combination of proteomic and genomic approaches would be particularly beneficial for subclassifying patients that are currently lumped together in the intermediate risk group. Second, proteomic measurements can serve as biomarkers to guide therapy. Certain protein expression and PTM levels could be effective indicators of whether a patient will develop chemoresistance and hence whether the patient should be referred to allogeneic stem cell transplantation. Moreover, abnormally expressed proteins can potentially be molecularly targeted, creating more personalized therapy options for AML patients.
The workflow of a typical proteomic project in AML is shown in Figure 1. In this chapter, we focus on reviewing the main proteomic techniques and the various applications of proteomics in AML research, the topics of the next two sections. In the last section, we will discuss the main challenges and issues in AML proteomic research by covering topics related to sample collection considerations and proteomic data analysis techniques.

Overview of proteomic techniques
The development of proteomic techniques in the past 20 years has enabled many research studies to identify the roles of proteins and PTMs in biology and human diseases at a large scale. It has also inspired the Human Proteome Project [15], a global effort that aims to "generate the map of protein based molecular architecture of the human body and become a resource to help elucidate biological and molecular function and advance diagnosis and treatment of diseases". Current proteomic approaches can be divided into two sub-categories: mass spectrometry (MS)-based, and antibody-based. Here, we describe the fundamentals of each technique and their recent applications in AML.

MS-based methods
One intuitive way to identify a protein is by measuring its mass directly. MS is a widely-used analytical technique that ionizes a sample (solid, liquid, or gas) and measures the mass based on the mass-to-charge ratios of the ions. The ionization causes the molecules to break into charged fragments, which pass through an electric (e.g. time-of-flight (TOF)) or magnetic field that sorts ions by their mass-to-charge ratios. The relative abundance of ions detected as a function of the mass-to-charge ratio is usually presented in a mass spectrum for deciphering the identity of the molecule. MS is often used in tandem with liquid chromatography (termed LC-MS or LC/MS) which separates the liquid compounds chromatographically before passing them through the mass spectrometer.
When applying MS to detect proteins, one can take either a "top-down" or a "bottom-up" approach [16][17][18]. The "top-down" approach ionizes the intact protein directly, and is usually limited to low-throughput single protein studies. On the other hand, the "bottom-up" approach first digests the protein into peptides using enzymes such as trypsin, and then analyzes the peptides using tandem mass spectrometry. The "bottom-up" approaches using LC-MS are also referred to as "shotgun proteomics" [19]. The "bottom-up" approach is more widely adopted than the "top-down" approach in proteomic studies because it is much easier to handle small tryptic peptides and determine their masses with high accuracy than to handle intact protein ions. However, limited protein sequence coverage by peptides, loss of PTM information, and redundant peptides of ambiguous origin are some of the disadvantages of "bottom-up" approaches. Notably, an intermediate approach, "middle-down", was proposed to break proteins into larger proteolytic peptides (2-20 kDa) instead of small tryptic peptides (which are ~8-25 residues long) using proteases such as OmpT [20]. This hybrid approach potentially combines the benefits of the "top-down" and "bottom-up" approaches and overcomes their drawbacks.
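To make the digestion step of the "bottom-up" approach concrete, the following is a minimal sketch of an in silico tryptic digest. It assumes the commonly used simplification that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P). The function name `tryptic_digest`, the `missed_cleavages` parameter, and the toy sequence are illustrative assumptions and are not taken from any of the cited studies.

```python
import re

def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from a simplified in silico trypsin digest."""
    # Cut after K or R, except when the following residue is P.
    # Splitting on a zero-width pattern requires Python 3.7 or later.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides

# Hypothetical toy sequence, used only to show the cleavage rule.
toy_protein = "MKWVTFISLLRPSAYSRGVFRR"
print(tryptic_digest(toy_protein))
print(tryptic_digest(toy_protein, missed_cleavages=1))
```

In this toy sequence the arginine followed by proline is not cleaved, which illustrates why peptide lengths vary and why some regions of a protein may be poorly covered in a bottom-up experiment.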
Electrospray ionization (ESI) [21] and matrix-assisted laser desorption/ionization (MALDI) [22] are two primary methods for ionizing proteins and peptides. ESI generates ionized molecules by applying a high electric field and dispersing the liquid sample into an aerosol. In contrast, MALDI ionizes the sample by firing laser pulses at the sample mixed with an energy-absorbing matrix. Both methods are considered to be "soft" ways of obtaining ions of large molecules with low fragmentation. The main advantage of ESI is that it produces multiply charged ions, extending the mass detection range of the analyzer. MALDI, on the other hand, is advantageous for its robustness and high speed. ESI is frequently coupled with LC, whereas MALDI is most often used with TOF. A more recent method, surface-enhanced laser desorption/ionization (SELDI) [23], was proposed as an alternative to MALDI. SELDI is similar to MALDI with the exception that the sample is bound to a surface in SELDI instead of being mixed with a matrix material. The SELDI surface allows for more retention of analytes and therefore is more suitable for detecting proteins in lower concentrations. SELDI is usually coupled with TOF, and it was shown that SELDI-TOF-MS can detect proteins from as little as 1 μL of serum or as few as 25-50 cells [24], which can be very beneficial when studying clinical samples.
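The benefit of multiply charged ions can be illustrated with a short calculation. The sketch below assumes positive-mode ionization in which each charge comes from one added proton, so that m/z = (M + z × 1.007276) / z for a neutral mass M in daltons. The function name `mz` and the 16,950 Da example mass are hypothetical and chosen only for illustration.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons

def mz(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

# A hypothetical 16,950 Da protein observed at several charge states.
for z in (1, 10, 20):
    print(z, round(mz(16950.0, z), 3))
```

As the charge state increases, the same molecule appears at a progressively lower m/z, which is how a large protein can fall within the limited m/z window of an analyzer.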
To quantify protein levels (termed "quantitative proteomics"), three major groups of methods can be used in the proteomic workflow: label-free, stable isotope labeling, and multiple reaction monitoring [25]. As the name suggests, label-free methods (e.g. spectral counting and peptide peak intensity measurement) do not use any isotope-containing compound to bind to and label proteins [26]. Though easy to perform, inexpensive, high-throughput, and offering a wider dynamic range, label-free methods are in general less accurate [27]. Stable isotope labeling approaches use differential stable isotopes to label and distinguish samples via either metabolic labeling or chemical labeling. One example of a metabolic labeling approach is stable isotope labeling by amino acids in cell culture (SILAC) [28], which feeds cells from different samples with heavy and light forms of arginine or lysine through the growth medium. SILAC generates precise quantitation of proteins, but can only be applied to living or metabolically active samples. An alternative method, "super-SILAC", was developed to extend SILAC to human tissue samples by using a mixture of SILAC-labeled cell lines as the internal standard [29]. A super-SILAC mix based on five AML cell lines (Molm-13, NB4, MV4-11, THP-1, and OCI-AML3) was recently established for quantifying patient AML cells [30].
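As an illustration of label-free quantitation by spectral counting, the sketch below computes the normalized spectral abundance factor (NSAF), one common way to correct spectral counts for protein length. NSAF is not named in the text above and is used here only as an example of the general idea. The protein names, counts, and lengths are made up.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)
    """
    # Spectral abundance factor: spectral count divided by protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were counted for any protein")
    return {p: value / total for p, value in saf.items()}

# Hypothetical spectral counts and protein lengths (in residues).
counts = {"protA": 30, "protB": 30, "protC": 5}
lengths = {"protA": 300, "protB": 600, "protC": 100}
print(nsaf(counts, lengths))
```

Here protA and protB have the same raw count, but protB is twice as long, so its length-normalized abundance is half that of protA. This is the kind of bias that simple spectral counting would otherwise miss.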
While most MS-based methods profile proteins from cell lysates, mass cytometry is a fusion technology of MS and flow cytometry that can be used to measure protein levels in single cells [31]. Mass cytometry is also referred to as cytometry by time-of-flight (CyTOF), which is the current commercialized implementation. Mass cytometry overcomes the spectral overlap in flow cytometry by conjugating probes (often antibodies) with heavy-metal isotopes, instead of fluorophores, as expression reporters. The metal-conjugated antibodies, ionized and detected using the TOF mass spectrometer, greatly increase the number of parameters measurable in single cells because of their minimal signal overlap. Currently, mass cytometry can be used to detect up to 40 parameters per cell (up to 100 parameters theoretically), including protein levels, PTMs and proteolysis products. Mass cytometry was recently used in pediatric AML to profile both the surface markers and intracellular signaling proteins in single cells [32]. Notably, the study discovered that the surface phenotypes and their regulatory intracellular signaling phenotypes are decoupled in AML, rendering the surface markers unreliable for reporting signaling states. The study also identified a gene signature associated with the primitive signaling phenotype that is predictive of survival.

Antibody-based methods
The other group of methods for detecting and quantifying proteins is based on the use of antibodies. Antibodies can be engineered to specifically recognize not only proteins but also their PTMs, which is very favorable for profiling kinases and signaling activities. Commonly used techniques such as western blot and enzyme-linked immunosorbent assay (ELISA) already use antibodies to measure protein expression. However, these methods are low-throughput, and they are therefore unsuitable for profiling a large number of proteins or samples in a timely fashion. Using microarray technologies, multiple types of high-throughput antibody-based methods were developed to enable profiling proteins at a much larger scale, including tissue microarrays (TMA) and protein microarrays. TMA is a proteomic technique applied to tissue samples [33]. TMA assembles up to 1000 tissue samples into one paraffin block to enable simultaneous evaluation of biomarkers. Since tissue samples are of more importance in solid tumors than in leukemia, we will focus the discussion on protein microarrays.
Based on the application purpose, protein microarrays can be divided into two categories: analytical protein arrays and functional protein arrays [34]. Functional protein arrays print a large number of individually purified proteins on an array to investigate their biochemical activities. Functional arrays are mostly used in basic research, including identifying protein-protein, protein-DNA, protein-antibody, protein-lipid, protein-RNA, and protein-small molecule interactions, and identifying substrates or enzymes for protein modifications. On the other hand, analytical protein arrays use well-characterized antibodies to measure the amounts of specific proteins on a large scale. These arrays are widely used in clinical research for biomarker discovery and protein expression profiling, and can be applied in disease diagnosis in the clinic.
There are two types of analytical protein arrays: forward-phase protein array (FPPA) and reverse-phase protein array (RPPA) [35]. The major difference between FPPA and RPPA is whether antibodies or samples are immobilized. In FPPA, various antibodies are printed on a slide as bait molecules, where each spot on the array is one type of antibody. Each slide is then exposed to a single protein lysate (sample), and multiple protein expression levels are measured. The main advantage of FPPA is that a single slide can provide measurements of many proteins simultaneously. However, FPPA needs two highly specific antibodies (similar to "sandwich ELISA") for assaying each protein, and it also requires a higher amount of the protein lysate sample (which is often a luxury in clinical research). In contrast, RPPA immobilizes protein lysates, where each spot on the slide is a sample from a different source or condition. Each slide is then probed with one type of antibody and provides a read-out of the corresponding protein level across all printed samples, allowing for a direct comparison between samples. To profile multiple proteins, one can prepare a batch of identical slides printed with the same samples (which is straightforward to do), and process them in parallel, each slide with a unique type of antibody. RPPA is known to be highly sensitive and robust, and it is particularly advantageous for clinical applications because it requires lower amounts of samples. In the past decade, RPPA was used in multiple research studies to generate protein profiles and identify biomarkers in AML [36][37][38][39][40][41].
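The RPPA layout described above, in which each slide carries the same printed samples and is probed with one antibody, can be sketched as a simple data-assembly step. This is an illustrative sketch only: the function name `rppa_matrix`, the sample identifiers, and the intensity values are hypothetical, and real RPPA pipelines also include steps such as background correction and normalization that are omitted here.

```python
def rppa_matrix(sample_names, slides):
    """Combine per-antibody slide readouts into a sample-by-protein table.

    sample_names: samples in the order they were printed on every slide.
    slides: maps each antibody to the spot intensities read from its slide.
    """
    table = {name: {} for name in sample_names}
    for antibody, intensities in slides.items():
        # Every slide must carry exactly the same printed samples.
        if len(intensities) != len(sample_names):
            raise ValueError(f"slide {antibody!r} does not match the print layout")
        for name, value in zip(sample_names, intensities):
            table[name][antibody] = value
    return table

# Hypothetical readouts: three printed samples, two slides (one antibody each).
samples = ["patient_01", "patient_02", "control_01"]
slides = {
    "antibody_A": [1.8, 0.4, 0.5],
    "antibody_B": [0.9, 1.1, 1.0],
}
for name, profile in rppa_matrix(samples, slides).items():
    print(name, profile)
```

Because each slide yields one column of the table, adding another protein to the panel only requires probing one more identical slide, which is the practical appeal of the reverse-phase format.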
Compared to MS-based methods, antibody-based methods are less of a de novo discovery approach, and provide less coverage of the proteome. This is mainly because antibody-based methods only profile proteins that are known ahead of the experiment, and the coverage of these methods depends on the availability of specific antibodies. It is still an ongoing effort to generate antibodies that specifically recognize all protein isoforms present in the human proteome. The Human Protein Atlas project, started in 2003, maps the expression and location of proteins in cells, normal tissues and cancers using an antibody-based approach. Its latest version (16th release) now includes more than 25,000 antibodies that cover about 86% of all human protein-coding genes [42,43]. In addition, the quality of antibodies is key to the success of any antibody-based method. Before printing an array, antibodies need to be validated to ensure that they are highly specific and do not cross-react with other proteins in the lysate. Otherwise, the accuracy of the profiling will be compromised by false signals. Antibodypedia (https://www.antibodypedia.com/), a public database containing validation data of more than one million antibodies, is a useful resource for antibody-based research [44].

Discovery of diagnostic biomarkers
One application of proteomics in AML is diagnostic marker discovery. Comparing the proteomes between AML and healthy samples or between AML and other leukemic subtypes can shed light on the unique disease mechanisms present in AML. Differential protein expression levels can potentially serve as biomarkers for the early detection of the disease and for assisting the current diagnostic system to distinguish AML patients from other leukemic subtypes (for example, acute lymphocytic leukemia (ALL) or myelodysplastic syndromes (MDS)) as well as to classify patients within AML. The identification of differentially expressed proteins specific to AML or to a certain AML subtype will provide a deeper understanding of its heterogeneous disease mechanism and facilitate development of personalized therapy.
Multiple studies have compared the proteomes between AML and normal healthy samples to look for AML-specific protein signatures. Using two-dimensional electrophoresis (2-DE) and MS, Kwak et al. identified 8 proteins that were differentially expressed between 12 AML patients and 12 healthy subjects, of which 5 proteins (α-2-HS-glycoprotein, complement-associated protein SP-40, RBP4 gene product, lipoprotein C-III, and an unknown protein) were down-regulated and 3 proteins (immunoglobulin heavy-chain variant, proteasome 26S ATPase subunit 1, and haptoglobin-1) were up-regulated in AML [45]. In another study using 2-DE and MALDI-TOF peptide mass fingerprinting analysis [46], seven proteins (alpha-enolase, RhoGDI2, annexin A10, catalase, peroxiredoxin 2, tropomyosin 3, and lipocortin 1 (annexin 1)) were found to have significantly altered expression in AML blast cells compared to normal mononuclear blood cells. In a comparison of the proteome of AML against that of normal white blood cells, 31 proteins (including myeloid-related protein 8 and 14, myosin light chain 2 and 3) with significantly altered expression were identified [47].
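Case-versus-control comparisons of this kind rest on a test of whether a protein's abundance differs between two small groups. The statistical procedures of the cited studies are not described here, so the following is only a generic sketch of one possible approach, a two-sided permutation test on the difference of group means. The function name `permutation_pvalue`, its parameters, and the abundance values are hypothetical. In a real study, p-values across many proteins would also need correction for multiple testing.

```python
import random

def permutation_pvalue(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_gap(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_gap(pooled)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels and recompute the statistic.
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point rounding in ties.
        if mean_gap(pooled) >= observed - 1e-12:
            hits += 1
    # Add-one correction avoids reporting an impossible p-value of zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical normalized abundances of one protein in two groups.
aml = [2.1, 2.4, 1.9, 2.6, 2.2, 2.5]
healthy = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0]
print(permutation_pvalue(aml, healthy))
```

A permutation test is chosen for this sketch because it makes no distributional assumptions, which suits the small sample sizes typical of these studies.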
Proteomic comparisons between AML and other leukemia-related diseases may reveal biomarkers to distinguish AML from similar diseases in the clinic. Cui et al. identified 27 proteins with differential expression between AML and ALL, including myeloperoxidase [47]. Aiming to characterize the proteomic mechanism underlying MDS progression to AML, Braoudaki et al. identified MOES, ZRI and AIFM1 as potential biomarkers for AML using 2-DE and MALDI-TOF, since these proteins were found to be up-regulated in AML [48]. Foss et al. demonstrated the use of alignment-based label-free quantitation approaches in LC-MS/MS to distinguish AML from ALL and CD34+ cells from healthy donors [49]. Based on the same data generated in Foss et al.'s study, Elo et al. used a more advanced statistical method (reproducibility optimized test statistics (ROTS)) to identify biomarkers from the proteomic data and from the transcriptomic data. They found that the alignment-based proteomic method was able to generate novel and significant biomarkers that were not detected by the transcriptomic assay [50]. From the proteomic profiles of 151 AML bone marrow samples generated by SELDI-TOF-MS, Xu et al. developed a proteomic-based decision tree model to classify patients into APL, AML-granulocytic, AML-monocytic, ALL, and control (healthy volunteers) [51].
AML subtypes display unique proteomic patterns, which may present therapeutic opportunities for each of these subtypes. In a study of 38 AML-M1/M2 patients and 17 healthy volunteers [52], Luczak et al. demonstrated the use of 2-DE-MS to distinguish between M1 and M2 patients. They identified five proteins that were differentially accumulated between M1 and M2, of which Annexin III, L-plastin and 6-phosphogluconate dehydrogenase were found exclusively in M2. Comparing the protein expression levels across AML FAB classes, Cui et al. identified 23 proteins differentially expressed between the granulocytic lineage (M1, M2, M3) and monocytic lineage (M5), where they found 7 proteins up-regulated in both M2 and M3, and 15 proteins tightly associated with M3 (e.g. cathepsin G) [47]. In an RPPA study of 256 newly diagnosed AML patients [36], 24 of the 51 proteins tested were found to differ significantly in expression between FAB subtypes. The proteins were found to belong to three clusters: (1) total and phosphorylated signal transduction proteins (PKCA, PKCA.p, ERK2, AKT.p308, P38.p, P70S6K, P70S6K.p, and Src.p527), with lower expression in myeloid subtypes (M0, M1, and M2); (2) PTEN and PTEN.p, with lower expression in M6 and M7; (3) apoptosis, cell cycle or differentiation regulating proteins and activated STAT proteins that have higher expression in myeloid subtypes.
Differences in proteomics (expression patterns, protein interaction pathways, and PTMs) were also found between cytogenetic abnormalities. In a study of 42 AML patients using 2-DE MALDI-TOF-MS [53], Balkhi et al. showed that there were significant differences in protein expression levels, protein interaction networks and PTMs between cytogenetic groups. PTMs specific to cytogenetic abnormalities were identified, including a β-O-linked N-acetylglucosamine (O-GlcNAc) of hnRNPH1 in patients with 11q23 translocation, an acetylation of calreticulin in patients with t(8;21), and methylation of hnRNPA2/B1 in patients with t(8;21) and inv(16). In an RPPA study, increased MET phosphorylation levels were found to associate with t(15;17) and t(8;21) cytogenetic subtypes [54].
Proteomic comparisons of relapsed against newly diagnosed patients or patients in remission can reveal biomarkers for early detection of relapse and non-invasive monitoring of minimal residual disease (MRD). Using MALDI-TOF-MS and high performance LC (HPLC)-ESI-MS/MS [55], Bai et al. identified 47 peptides that were differentially expressed between AML and healthy controls. Specifically, they built a quality classifier model based on three peptides (ubiquitin-like modifier activating enzyme 1 (UBA1), isoform 1 of fibrinogen alpha chain precursor and platelet factor 4 (PF4)). UBA1 was up-regulated in newly diagnosed AML, decreased to normal level after complete remission, and then was elevated again in relapse, whereas the other two peptides had the opposite response. The three proteins were shown to correlate with patient outcome and can serve as biomarkers for monitoring MRD and detecting relapse. In another study, Pierce et al. performed RPPA on 511 AML patient samples, and found that the expression of protein transglutaminase 2 was higher at relapse compared to diagnosis [41].
Leukemic stem-like cells (LSC) are believed to play critical roles in patient chemoresistance, refractory disease and relapse. To investigate the biological differences between LSC and common myeloid progenitors (CMP), Kornblau et al. profiled the expression of 121 proteins in Bulk (CD3/CD19 depleted), CD34−, CD34+ (CMP), CD34+CD38+ and CD34+CD38− (LSC) cell populations from AML patients using RPPA [40]. Significant differences in protein expression and protein network patterns were found between LSC and the rest of the cells, indicating unique AML biology existing in LSC. The differentially expressed proteins in LSC (e.g. Mcl1, cIAP, Survivin, and Bcl2) may serve as therapeutic targets for selectively targeting LSC.

Discovery of prognostic factors
Proteomics enables discovery of abnormal expressions of proteins or PTMs that are predictive of patient outcome. Profiling protein expressions in 511 AML patients using RPPA [37], Kornblau et al. found that patients with high levels of FOXO3A phosphorylation have higher rates of primary resistance and shorter remission durations. The prognostic value of highly phosphorylated FOXO3A is independent of cytogenetics, since FOXO3A phosphorylation levels were not found to associate with karyotypes. In another study of the same patient cohort, the overexpression of FLI1 protein was identified as another adverse prognostic factor in AML [38]. In the study by Cui et al. [47], NM23-H1 was identified as a prognostic factor, since it is up-regulated in all FAB subtypes except M3a, a favorable prognosis subtype.
The prognostic protein signatures can potentially complement cytogenetics and genomics to build better classification systems. In a study of 54 AML samples using SELDI-TOF [56], Nicolas et al. showed that proteomic signatures can stratify patient outcome and complement cytogenetic classifications. Based on the proteomic profile, they grouped patients into two clusters and found significant differences in overall and event-free survival between the two clusters. The proteomic-defined clusters were also able to stratify the overall and event-free survival in specific cytogenetic categories: the intermediate risk group was divided into a group of patients with similar outcome to the favorable and a group with similar outcome to the unfavorable; the unfavorable group was divided into a group with similar outcome to the intermediate and a group of similar outcome to the unfavorable. In addition, they isolated a biomarker, S100A8, the expression of which is a predictor of poor survival.
The mutation of p53, resulting in p53 stabilization, is associated with adverse survival, though the mutation is observed in only 5-8% of newly diagnosed AML patients. A recent RPPA study showed that p53 stabilization also occurs in a significant portion of wild-type p53 patients, where the expression of the p53 negative regulator Mdm2 is elevated [39]. Furthermore, patients with overexpressed Mdm2 are subject to poor outcomes similar to patients with p53 mutations. This finding has significant clinical implications as it unveils p53 dysfunction in wild-type p53 patients who were previously assumed to have intact p53 function, and it highlights the value of proteomics in complementing genetic testing for classifying patients and guiding treatments.
Recently, as part of the Dialog for Reverse Engineering Assessment and Methods (DREAM), a crowdsourcing effort was launched to build, compare and assess prediction algorithms for AML prognosis (DREAM 9 AML Outcome Prediction Challenge) [57]. Based on data consisting of 40 clinical attributes and 231 RPPA measurements in 191 AML patients (the released training data), participants were asked to build models that predict response to therapy (sub-challenge 1), remission duration (sub-challenge 2) and overall survival time (sub-challenge 3) in 100 AML patients (withheld as test data for model evaluation). As one of its conclusions, the study showed that the RPPA data substantially improved the performance of the top-performing models in predicting response to therapy in AML, illustrating the prognostic and predictive value of proteomics and the potential of combining proteomics with current prognostic factors for more accurate outcome assessment. In addition, the expression of PI3KCA was identified as a highly informative protein biomarker for predicting patient response to therapy.

Identification of target proteins
Proteomics can provide insights into the effects and mechanisms of genetic mutations and help identify novel drug targets associated with specific mutations. Transcription factor CCAAT enhancer binding protein α (C/EBPα) is an important regulator of myeloid differentiation. Its mutant form, C/EBPα-p30, is present in about 9% of AML patients. Using 2-DE MALDI-TOF-MS, Geletu et al. identified Ubc9 (an E2-conjugating enzyme) as a target protein for C/EBPα-p30 [58]. The expression of Ubc9 was found to increase when inducing C/EBPα-p30, and the overexpression of Ubc9 was also observed in patients with C/EBPα-p30. In another study using 2-DE-MS proteomic screening, Pulikkan et al. uncovered the association of PIN1 overexpression with C/EBPα-p30 [59]. They then demonstrated that the elevated levels of PIN1 block granulocyte differentiation via c-Jun, and that the inhibition of PIN1 restores myeloid differentiation in primary AML blasts with C/EBPα mutation. This discovery suggests a potential treatment strategy of inhibiting PIN1 for AML patients with C/EBPα mutation.
As another example, RAS mutations occur in 10-25% of AML patients; however, the mutation is not known to be prognostic. An RPPA study of 609 patients (11% with RAS mutation) showed that the RAS-Raf-MAP kinase and PI3K signaling pathways are up-regulated in patients with RAS mutation, which indicates RAS and PI3K signaling pathways as potential inhibitory targets for treating patients with RAS mutations [60].

Proteomics in AML cell lines
Due to the limited availability and difficult culturing conditions of primary patient cells, AML cell lines are often used to study disease mechanisms and to discover biomarkers. In this well-controlled and less heterogeneous experimental environment, one can compare the proteomic profiles of AML cell lines derived from different sources and carrying different mutations or cytogenetic abnormalities, and then extrapolate the findings to the patient category from which each cell line originated. Cell lines are also easier to manipulate (for example, by up- or down-regulating certain proteins or by introducing mutations), and are therefore a useful platform for studying signaling networks and discovering target proteins.
Recently, Matondo et al. used large-scale quantitative SILAC-MS to identify proteins regulated by proteasome inhibition in two AML cell lines of different maturation stages: KG1a cells (immature) and U937 cells (mature) [61]. From over 7000 proteins quantified in the two cell lines, the study identified novel regulation targets of proteasome inhibition, including IL-32, the apoptosis inducing factor SIVA, and MORF family mortality factors, in addition to known regulation targets such as heat shock and cell cycle proteins. Using 2D-DIGE MALDI-TOF/MS, Hu et al. compared the proteomic profiles of two leukemia cell lines, HL-60 (drug-sensitive) and HL-60/ADR (adriamycin-resistant) [62]. Sixteen differentially expressed proteins were identified. Among these, the up-regulation of nucleophosmin/B23 (NPM B23) and nucleolin C23 was validated in AML patient samples, and the two proteins may be indicators of drug resistance and predictors of prognosis. To investigate how AML exosomes affect the function of hematopoietic stem and progenitor cells (HSPC), Huan et al. compared the proteomes of HSPCs treated with exosomes from the AML cell line Molm-14 against the proteomes of HSPCs treated with media (control) [63]. They identified 282 proteins that were differentially expressed between the two conditions, and functional annotation of these proteins pinpointed candidate pathways involved in the exosome-mediated modulation of HSPC function.
Proteomics has been used in multiple studies to investigate the effects and mechanisms of drugs in cell line models. Using SILAC-MS, Weber et al. quantified 10,975 distinct phosphorylation sites to characterize the phosphoproteomic changes in the AML cell line KG1 upon pharmacological intervention with erlotinib and gefitinib [64]. They found that the cellular perturbation by the two drugs is rather specific, with fewer than 50 phosphorylation sites significantly changed upon treatment. Most of these phosphorylation changes occur in a network of tyrosine phosphorylated proteins, suggesting that the drugs interfere with leukemic activities by inhibiting signal transduction via Src family kinases and the tyrosine kinases Btk and Syk. Proteomics can also be used to compare the mechanisms of two drugs. In a study using 2-DE MALDI-TOF/TOF-MS, Buchi et al. quantified protein expression levels in AML1/ETO positive leukemic cells treated with azacitidine and with decitabine [65]. The identification of proteins differentially expressed in both conditions, as well as proteins differentially expressed exclusively in one condition, provides insights into the biological effects of these DNMT inhibitors and the differences in their mechanisms.
To develop drugs that specifically target leukemic cells, an understanding of the surface proteomes of cell lines is crucial. Using MALDI-TOF/TOF, Strassberger et al. generated surface proteomes for four AML cell lines (HL60, NB4, PLB985, THP1) [66]. Comparing the AML surface proteomes to that of granulocytes from normal human peripheral blood, they identified multiple proteins that were up-regulated in AML cell lines, including CD33, CD166, and integrin alpha-4. An antibody-drug conjugate was then developed using a human monoclonal antibody targeting CD166 and a duocarmycin derivative as the cytotoxic agent, and was shown to kill AML cells in vitro. The study serves as a good example of, and a basis for, developing anti-AML therapeutic strategies using knowledge from cell surface proteomes.

Challenges and future directions
Although proteomic technologies are advancing rapidly, several challenges remain. We discuss these challenges below, together with future directions that may address them.
One issue is the choice of control samples. There is no consensus on which samples should serve as controls for AML samples. Often, samples from healthy subjects are used as controls to represent the normal biology of hematopoiesis, yet samples from patients in complete remission could also make meaningful controls. Another question is which specific samples from healthy subjects should be used: for example, should the samples be derived from healthy bone marrow or from peripheral blood, and should they come from cryopreserved cells or from fresh lysates? Though most studies found similar protein expression patterns between bone marrow derived and blood derived samples in AML [36,48], a comprehensive comparison has yet to be done in a large cohort of healthy subjects. Recently, the influence of freezing on proteomes in AML cells was reported [67], underscoring the importance of establishing more standardized sample collection and preservation procedures. In addition, the number of control samples included in these studies is usually small, which makes it hard to draw statistically valid conclusions and does not account for the full degree of heterogeneity among healthy individuals.
One challenge facing clinical research is the scarcity of primary patient samples. Most studies profiled proteomes in fewer than a hundred patients, and in some cases fewer than five patients were used. Considering the extremely heterogeneous biology of AML, the proteomic patterns and biomarkers discovered in a small group of patients may not generalize to the whole AML population. Because so few clinical samples cannot represent the AML population comprehensively, the classification and prognostic power of proteomics is also limited. To gain access to a larger cohort of patients, one potential solution is to use peripheral blood samples instead of bone marrow samples. The proteome of blood samples was found to be similar to the proteome of bone marrow samples in multiple studies [36,48], indicating that blood samples may be a substitute for bone marrow samples in proteomic research. Since obtaining blood samples is less invasive and much more convenient than obtaining bone marrow aspirates, the use of blood samples may grant researchers access to proteomic profiles at more time points (e.g. at diagnosis, through treatment and remission). For this approach to work, more comprehensive comparisons of the proteomes of blood and bone marrow in both AML and healthy subjects need to be carried out. Another potential remedy for the sample availability problem is to openly share data sets generated from quantitative proteomics through common platforms. More statistical power can be achieved by merging findings from multiple datasets from different sources, for example using meta-analysis. This approach will greatly benefit from standardizing the choice of control samples and the data processing procedures across different studies.
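As a rough illustration of how merging findings across datasets can increase statistical power, the sketch below pools one protein's effect size from several studies using an inverse-variance (fixed-effect) meta-analysis. This is only one of several possible meta-analytic models and is not taken from the studies cited here; the function name `fixed_effect_meta` and all numeric values are hypothetical and chosen purely for illustration. It also assumes that the studies report comparable effect sizes (here, log2 fold changes against a similar control) with standard errors, which is exactly where standardized controls and processing matter.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect sizes with inverse-variance weights.

    effects:    per-study effect sizes (e.g. log2 fold change, AML vs control)
    std_errors: standard error of each effect size
    Returns (pooled_effect, pooled_standard_error, two_sided_p_value).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect size")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # More precise studies (smaller standard error) receive larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Two-sided p-value under a normal approximation: 2 * (1 - Phi(|z|)).
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


if __name__ == "__main__":
    # Hypothetical values for a single protein measured in three studies.
    log2_fold_changes = [0.9, 1.2, 0.7]
    standard_errors = [0.5, 0.6, 0.4]
    pooled, se, p = fixed_effect_meta(log2_fold_changes, standard_errors)
    print(f"pooled log2FC = {pooled:.3f}, SE = {se:.3f}, p = {p:.4f}")
```

The pooled standard error is always smaller than the smallest per-study standard error, which is the sense in which combining small cohorts recovers power. A fixed-effect model assumes a single shared effect across studies; given the heterogeneity of AML, a random-effects model may be the more appropriate choice in practice.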
Due to their convenience and almost unlimited supply, cell lines are commonly used as a substitute for primary patient samples for discovering new biomarkers and therapeutic targets, screening for new drugs, and investigating therapeutic effects and resistance. However, cell lines may not faithfully represent the biology of AML patients, as cell lines adapt to culture conditions and selection pressure. The validity of the cell line model is further compromised by the heterogeneity of AML biology. Even if a cell line does preserve the biology of its origin (which is unlikely), it represents at most a tiny fraction of the AML population. To make cell lines more relevant, comprehensive proteomic profiles of cell lines and primary patients are needed to investigate the degree of biological change present in cell lines and to match cell lines to specific patient subpopulations. The hope is that cell lines may preserve the biology of their origins in some pathways, and that by matching cell lines to specific patient categories we can use cell lines to personalize treatments for the corresponding patient subpopulations.
To realize the full potential of proteomics in both research and the clinic, more advanced computational techniques should be adopted or developed in AML proteomics research. When analyzing proteomic profiles, most studies use standard statistical tests to compile a list of differentially expressed proteins, and some carry out tests to correlate protein expression patterns with clinical attributes and genetic mutations. While these tests are necessary, few studies go further and generate pathway-level insights by examining protein expression in the context of protein interactions. Network-based approaches can be very useful in this regard for organizing and visualizing protein expression in protein networks [39], using protein interaction information from public databases (e.g. STRING [68]) or from graphical reconstruction models. Insights into abnormal pathway regulation, beyond the identification of abnormal expression of single proteins, can open the door to new drug targets. Another challenge on the computational side is the increasing dimensionality of proteomic data, a consequence of improvements in the throughput and coverage of proteomic experimental techniques. In this case, more powerful clustering [69,70] and dimension reduction techniques [71], as well as interactive visualization tools [72], can help researchers benefit from this increase in data size and empower them to make data-driven hypotheses and discoveries. Crowdsourcing competitions have also proved to be an effective way to encourage innovative solutions to these challenging computational issues [57,73].
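The step from a flat list of differentially expressed proteins to a network-level view can be sketched as follows. This is a minimal, assumed example rather than the method of any cited study: it restricts an interaction graph to the differentially expressed proteins and groups them into connected modules by breadth-first search. The function name `differential_modules`, the placeholder protein identifiers (`P1` to `P6`) and the edge list are all hypothetical; in practice the edges would be taken from an interaction resource such as the one cited above.

```python
from collections import defaultdict, deque


def differential_modules(edges, hits):
    """Group differentially expressed proteins into connected modules.

    edges: iterable of (protein_a, protein_b) interaction pairs
    hits:  iterable of differentially expressed protein identifiers
    Returns a list of modules (sorted lists of proteins), largest first.
    """
    hits = set(hits)

    # Keep only interactions in which both partners are differentially expressed.
    adjacency = defaultdict(set)
    for a, b in edges:
        if a in hits and b in hits:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    modules = []
    for start in sorted(hits):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        modules.append(sorted(component))

    # Larger modules hint at coordinated changes within a pathway.
    return sorted(modules, key=len, reverse=True)


if __name__ == "__main__":
    # Hypothetical interaction edges and differential expression hits.
    toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")]
    toy_hits = ["P1", "P2", "P3", "P5"]
    for module in differential_modules(toy_edges, toy_hits):
        print(module)
```

In this toy input, three hits form one connected module while the fourth remains isolated. A module of interacting proteins that change together is a more plausible lead for pathway-level dysregulation than a single isolated protein, although connectivity alone does not establish a functional relationship.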
In summary, proteomics in AML is enabling the identification of new biomarkers and improving the classification of patients. Moreover, new experimental protocols and data analysis methods and tools are emerging to capitalize on the richness of the personalized data from the proteomic screens. Together these technological advances can provide new insight into the heterogeneities and hallmarks of AML.