With the advances in biotechnology, identification of new therapeutic targets, and better understanding of human diseases, pharmaceutical companies and academic institutions have accelerated their efforts in drug discovery. The pipeline to obtain therapeutics often involves target identification and validation, lead discovery and optimization, pre-clinical animal studies, and eventually clinical trials to test the safety and effectiveness of the new drugs. In most cases, screening using genome-scale RNA interference (RNAi) technology or diverse compound libraries comprises the first step of the drug discovery initiatives. Small interfering RNA (siRNA, a class of double-stranded RNA molecules 20-25 nucleotides in length capable of interfering with the expression of specific genes with complementary nucleotide sequence) screen is an effective tool to identify upstream or downstream regulators of a specific target gene, which may also potentially serve as drug targets for a more efficient and successful treatment. On the other hand, screening of diverse small molecule libraries against a known target or disease-relevant pathway facilitates the discovery of chemical tools as candidates for further development.
Conducting either genome-wide RNAi or small molecule screens has become possible with the advances in high throughput (HT) technologies, which are indispensible to carry out massive screens in a timely manner (Macarron 2006; Martis et al. 2011; Pereira and Williams 2007). In screening campaigns, large quantities of data are collected in a considerably short period of time, making rapid data analysis and subsequent data mining a challenging task (Harper and Pickett 2006). Numerous automatic instruments and operational steps participate in an HT screening process, requiring appropriate data processing tools for data quality assessment and statistical analysis. In addition to quality control (QC) and “hit” selection strategies, pre- and post-processing of the screening data are essential steps in a comprehensive HT operation for subsequent interpretation and annotation of the large data sets. In this chapter, we review statistical data analysis methods developed to meet the needs for handling large datasets generated from HT campaigns. We first discuss the influence of proper assay design on statistical outcomes of the HT screening data. We then highlight similarities and differences among various methods for data normalization, quality assessment and “hit” selection. Information presented here provides guidance to researchers on the major aspects of high throughput screening data interpretation.
2. Role of statistics in HT screening design
2.1. HT screening process
A typical HT screening campaign can be divided into five major steps regardless of the assay type and the assay read-out (Fig. 1). Once target or pathway is identified, assay development is performed to explore the optimal assay conditions, and to miniaturize the assay to a microtiter plate format. Performance of an HT assay is usually quantified with statistical parameters such as signal window, signal variability and Z-factor (see definition in section 4). To achieve acceptable assay performances, one should carefully choose the appropriate reagents, experimental controls and numerous other assay variables such as cell density or protein/substrate concentrations.
The final distribution of the activities from a screening data set depends highly on the target and pathway (for siRNA) or the diversity of the compound libraries, and efforts have been continuously made to generate more diverse libraries (Entzeroth et al. 2009; Gillet 2008; Kummel and Parker 2011; Zhao et al. 2005). Furthermore, the quality and reliability of the screening data is affected by the stability and the purity of the test samples in the screening libraries, where storage conditions should be monitored and validated in a timely manner (Baillargeon et al. 2011; Waybright et al. 2009). For small molecules, certain compounds might interfere with the detection system by emitting fluorescence or by absorbing light, and they should be avoided whenever possible to obtain reliable screening results.
Assay development is often followed by a primary screen, which is carried out at a single concentration (small molecule) or single point measurements (siRNA). As the “hits” identified in the primary screen are followed-up in a subsequent confirmatory screen, it is crucial to optimize the assay to satisfactory standards. Sensitivity - the ability to identify an siRNA or compound as a “hit” when it is a true “hit”, and specificity - the ability to classify an siRNA or compound as a “non-hit” when it is not a true “hit”, are two critical aspects to identify as many candidates while minimizing false discovery rates. Specificity is commonly emphasized in the confirmatory screens which follow the primary screens. For instance, the confirmatory screen for small molecules often consists of multiple measurements of each compound’s activity at various concentrations using different assay formats to assess the compound’s potency and selectivity. The confirmatory stage of an RNAi screen using pooled siRNA may be performed in a deconvolution mode, where each well contains a single siRNA. Pooling strategy is also applicable to primary small molecule screens, where a keen pooling design is necessary (Kainkaryam and Woolf 2009). The confirmatory screens of compounds identified from small molecule libraries are followed by lead optimization efforts involving structure-activity relationship investigations and molecular scaffold clustering. Pathway and genetic clustering analysis, on the other hand, are widespread hit follow-up practices for RNAi screens. The processes encompassing hit identification from primary screens and lead optimization methods require powerful software tools with advanced statistical capabilities.
Accuracy and precision of an assay are also critical parameters to consider for a successful campaign. While accuracy is a measurement of how close a measured value is to its true value, precision is the proximity of the measured values to each other. Therefore, accuracy of an assay is highly dependent on the performance of the HT instruments in use. Precision, on the other hand, can be a function of sample size and control performances as well as instrument specifications, indicating that the experimental design has a significant impact on the statistical evaluation of the screening data.
2.2. Classical versus robust (resistant) statistics
One of the main assumptions when analyzing HT screening data is that the data is normally distributed, or it complies with the central limit theorem, where the mean of the distributed values converge to normal distribution unless there are systematic errors associated with the screen (Coma et al. 2009). Therefore, log transformations are often applied to the data in the pre-processing stage to achieve more symmetrically distributed data around the mean as in a normal distribution, to represent the relationship between variables in a more linear way especially for cell growth assays, and to make an efficient use of the assay quality assessment parameters (Sui and Wu 2007).
In HT screening practices, the presence of outliers - data points that do not fall within the range of the rest of the data - is generally experienced. Distortions to the normal distribution of the data caused by outliers impact the results negatively. Therefore, an HT data set with outliers needs to be analyzed carefully to avoid an unreliable and inefficient “hit” selection process. Although outliers in control wells can be easily identified, it should be clear that outliers in the test sample may be misinterpreted as real “hits” instead of random errors.
There are two approaches for statistical analysis of data sets with outliers: classical and robust. One can choose to replace or remove outliers based on the truncated mean or similar approaches, and continue the analysis process with classical methods. However, robust statistical approaches have gained popularity in HT screening data analysis in recent decades. In robust statistics, median and median absolute deviation (MAD) are utilized as statistical parameters as opposed to mean and standard deviation (std), respectively, to diminish the effect of outliers on the final analysis results. Although there are numerous approaches to detect and abolish/replace outliers with statistical methods (Hund et al. 2002; Iglewicz and Hoaglin 1993; Singh 1996), robust statistics is preferred for its insensitivity to outliers (Huber 1981). In statistics, while the robustness of an analysis technique can be determined by two main approaches, i.e. influence functions (Hampel et al. 1986) and breakdown point (Hampel 1971), the latter is a more intuitive technique in the concept of HT screening, where the breakdown point of a sample series is defined as the amount of outlier data points that can be tolerated by the statistical parameters before the parameters take on drastically different values that are not representing anymore distribution of the original dataset. In a demonstrated example on a five sample data set, robust parameters were shown to perform superior to the classical parameters after the data set was contaminated with outliers (Rousseeuw 1991). It was also emphasized that median and MAD have a breakdown point of 50%, while mean and std have 0%, indicating that sample sets with 50% outlier density can still be successfully handled with robust statistics.
2.3. False discovery rates
As mentioned previously, depending on the specificity and sensitivity of an HT assay, erroneous assessment of “hits” and “non-hits” is likely. Especially in genome-wide siRNA screens, false positive and negative results may mislead the scientists in the confirmatory studies. While the cause of false discovery results may be due to indirect biological regulations of the gene of interest through other pathways that are not in the scope of the experiment, it may also be due to random errors experienced in the screening process. Although the latter can be easily resolved in the follow-up screens, the former may require a better assay design (Stone et al. 2007). Lower false discovery rates can also be achieved by careful selection of assay reagents to avoid inconsistent measurements (outliers) during screening. The biological interference effects of the reagents in RNAi screens can be considered in two categories: sequence-dependent and sequence-independent (Echeverri et al. 2006; Mohr and Perrimon 2012). Therefore, off-target effects and low transfection efficiencies are the main challenges to be overcome in these screens. Moreover, selection of the appropriate controls for either small molecule or RNAi screens is very crucial for screen quality assessment as well as for “hit” selection, so that the false discovery rates can be inherently reduced. Positive controls are often chosen from small-molecule compounds or gene silencing agents that are known to have the desired effect on the target of interest; however, this may be a difficult task if very little is known about the biological process (Zhang et al. 2008a). On the other hand, selection of negative controls from non-targeting reagents is more challenging due to higher potential of biological off-target effects in RNAi screens compared to the negative controls used in small-molecule screens (Birmingham et al. 2009). Another factor that might interfere with the biological process in an HT screening assay is the bioactive contaminants that may be released from the consumables used in the screening campaign, such as plastic tips and microplates (McDonald et al. 2008; Watson et al. 2009). Unreliable and misleading screening results may be obtained from altered assay conditions caused by leached materials, and boosted false discovery rates may be unavoidable. Hence, the effects of laboratory consumables on the assay readout should be carefully examined during assay development.
The false discovery rates are also highly dependent on the analysis methods used for “hit” selection, and they can be statistically controlled. False discovery rate is defined as the ratio of false discoveries to the total number of discoveries. A t-test and the associated p value are often used for hypothesis testing in a single experiment and can be interpreted as the false positive discovery rate (Chen et al. 2010). However, the challenge arises when multiple hypothesis testing is needed or when the comparison of results across multiple experiments is required. For HT applications, a Bayesian approach was developed to enable plate-wise and experiment-wise comparison of results in a single process, while the false discovery rates can still be controlled (Zhang et al. 2008b). Another method utilizing the strictly standardized mean difference (SSMD) parameter was proven to control the false discovery and non-discovery rates in RNAi screens (Zhang 2007a; Zhang 2010 b; Zhang et al. 2010). By taking the data variability into account, SSMD method is capable of determining “hits” with higher assurance compared to the Z-score and t-test methods.
3. Normalization and systematic error corrections
3.1. Normalization for assay variability
Despite meticulous assay optimization efforts considering all the factors mentioned previously, it is expected to observe variances in the raw data across plates even within the same experiment. Here, we consider these variances as “random” assay variability, which is separate from the systematic errors that can be linked to a known reason, such as failure of an instrument. Uneven assay performances may unpredictably occur at any given time during screening. Hence, normalization of data within each plate is necessary to enable comparable results across plates or experiments allowing a single cut-off for the selection of “hits”.
When normalizing the HT screening data, two main approaches can be followed: controls-based and non-controls-based. In controls-based approaches, the assay-specific in-plate positive and negative controls are used as the upper (100%) and lower (0%) bounds of the assay activity, and the activities of the test samples are calculated with respect to these values. Although, it is an intuitive and easily interpretable method, there are several concerns with the use of controls for normalization purposes. With controls-based methods, too high or too low variability in the control wells does not necessarily represent the variability in the sample wells, and the outliers and biases within the control wells might impair the upper and lower activity bounds (Brideau et al. 2003; Coma et al. 2009). Therefore, non-control-based normalizations are favored for better understanding of the overall activity distribution based on the sample activities per se. In this method, most of the samples are assumed to be inactive in order to serve as their own “negative controls”. However, this approach may be misleading when the majority of the wells in a plate consist of true “hits” (such as screening a library of bioactive molecules) or siRNAs (e.g., focused library). Since the basal activity level would shift upwards under these conditions, non-controls-based method would result in erroneous decision making.
Plate-wise versus experiment-wise normalization and “hit” picking is another critical point to consider when choosing the best fitting analysis technique for a screen. Experiment-wise normalizations are advantageous in screens where active samples are clustered within certain plates. In this case, each plate is processed in the context of all plates in the experiment. On the other hand, plate-wise normalizations can effectively correct systematic errors occurring in a plate-specific manner without disrupting the results in other plates (Zhang et al. 2006). Therefore, the normalization method that fits best with one’s experimental results should be carefully chosen to perform efficient “hit” selection with low false discovery rates.
The calculation used in the most common controls-based normalization methods are as follows:
Percent of control (PC): Activity of the ith sample (Si) is divided by the mean of either the positive or negative control wells (C).
Normalized percent inhibition (NPI): Activity of the ith sample is normalized to the activity of positive and negative controls. The sample activity is subtracted from the high control (Chigh) which is then divided by the difference between mean of the low control (Clow) and the mean of the high control. This parameter may be termed normalized percent activity if the final result is subtracted from 100. Additionally, control means may be preferably substituted with the medians.
The calculation used in the most common non-controls-based normalization methods are as follows.
Percent of samples (PS): The mean of the control wells in the PC parameter (only when negative control is the control of interest) is replaced with the mean of all samples (Sall).
Robust percent of samples (RPS): In order to desensitize the PS calculation to the outliers, robust statistics approach is preferred, where mean of Sall in PS calculation is replaced with the median of Sall.
Z-score: Unlike the above parameters, this method accounts for the signal variability in the sample wells by dividing the difference of Si and the mean of Sall by the std of Sall. Z-score is a widely used measure to successfully correct for additive and multiplicative offsets between plates in a plate-wise approach (Brideau et al. 2003).
Robust Z-score: Since Z-score calculation is highly affected by outliers, robust version of Z-score is available for calculations insensitive to outliers. In this parameter, the mean and std are replaced with median and MAD, respectively.
3.2. Normalization for systematic errors
Besides the data variability between plates due to random fluctuations in assay performance, systematic errors are one of the major concerns in HT screening. For instance plate-wise spatial patterns play a crucial role in cell-based assay failures. As an example, incubation conditions might be adjusted to the exact desired temperature and humidity settings, but the perturbed air circulations inside the incubator unit might cause an uneven temperature gradient, resulting in different cell-growth rates in each well due to evaporation issues. Therefore, depending on the positions of the plates inside the incubator, column-wise, row-wise or bowl-shape edge effects may be observed within plates (Zhang 2008b; Zhang 2011b). On the other hand, instrumental failures such as inaccurate dispensing of reagents from individual dispenser channels might cause evident temporal patterns in the final readout. Therefore, experiment-wise patterns should be carefully examined via proper visual tools. Although some of these issues might be fixed at the validation stage such as performing routine checks to test the instrument performances, there are numerous algorithms developed to diminish these patterns during data analysis, and the most common ones are listed as follows and summarized in Table 1.
Median polish: Tukey’s two-way median polish (Tukey 1977) is utilized to calculate the row and column effects within plates using a non-controls-based approach. In this method, the row and column medians are iteratively subtracted from all wells until the maximum tolerance value is reached for the row and column medians as wells as for the row and column effects. The residuals in pth plate ( are then calculated by subtracting the estimated plate average (, ith row effect ( and jth column effect ( from the true sample value (Sijp). Since median parameter is used in the calculations, this method is relatively insensitive to outliers.
B-score: This is a normalization parameter which involves the residual values calculated from median polish and the sample MAD to account for data variability. The details of median polish technique and an advanced B-score method, which accounts for plate-to-plate variances by smoothing are provided in (Brideau et al. 2003).
BZ-score: This is a modified version of the B-score method, where the median polish is followed by Z-score calculations. While BZ-score is more advantageous to Z-score because of its capability to correct for row and column effects, it is less powerful than B-score and does not fit very well with the normal distribution model (Wu et al. 2008).
Background correction: In this correction method, the background signal corresponding to each well is calculated by averaging the activities within each well (S’ijp representing the normalized signal of a well in ith row and jth column in pth plate) across all plates. Then, a polynomial fitting is performed to generate an experiment-wise background surface for a single screening run. The offset of the background surface from a zero plane is considered to be the consequence of present systematic errors, and the correction is performed by subtracting the background surface from each plate data in the screen. The background correction performed on pre-normalized data was found to be more efficient, and exclusion of the control wells was recommended in the background surface calculations. The detailed description of the algorithm is found in (Kevorkov and Makarenkov 2005).
Well-correction: This method follows an analogous strategy to the background correction method; however, a least-squares approximation or polynomial fitting is performed independently for each well across all plates. The fitted values are then subtracted from each data point to obtain the corrected data set. In a study comparing the systematic error correction methods discussed so far, well-correction method was found to be the most effective for successful “hit” selection (Makarenkov et al. 2007).
Diffusion-state model: As mentioned previously, the majority of the spatial effects are caused by uneven temperature gradients across assay plates due to inefficient incubation conditions. To predict the amount of evaporation in each well in a time and space dependent manner, and its effect on the resulting data set, a diffusion-state model was developed by (Carralot et al. 2012). As opposed to the above mentioned correction methods, the diffusion model can be generated based on the data from a single control column instead of sample wells. The edge effect correction is then applied to each plate in the screening run based on the generated model.
Before automatically applying a systematic error correction algorithm on the raw data set, it should be carefully considered whether there is a real need for such data manipulation. To detect the presence of systematic errors, several statistical methods were developed (Coma et al. 2009; Root et al. 2003). In a demonstrated study, the assessment of row and column effects was performed based on a robust linear model, so called R score, and it was shown that performing a positional correction using R score on the data that has no or very small spatial effects results in lower specificity. However, correcting a data set with large spatial effects decreases the false discovery rates considerably (Wu et al. 2008). In the same study, receiver operating characteristics (ROC) curves were generated to compare the performance of several positional correction algorithms based on sensitivity and “1-specificity” values, and R-score was found to be the most superior. On the other hand, application of well-correction or diffusion model on data sets with no spatial effects was shown to have no adverse effect on the final “hit” selection (Carralot et al. 2012; Makarenkov et al. 2007). Additionally, reduction of thermal gradients and associated edge effects in cell-based assays was shown to be possible by easy adjustments to the assay workflow, such as incubating the plates at room temperature for 1 hour immediately after dispensing the cells into the wells (Lundholt et al. 2003).
4. QC methods
There are various environmental, instrumental and biological factors that contribute to assay performance in an HT setting. Therefore, one of the key steps in the analysis of HT screening data is the examination of the assay quality. To determine if the data collected from each plate meet the minimum quality requirements, and if any patterns exist before and after data normalization, the distribution of control and test sample data should be examined at experiment-, plate- and well-level. While there are numerous graphical methods and tools available for the visualization of the screening data in various formats (Gribbon et al. 2005; Gunter et al. 2003; Wu and Wu 2010), such as scatter plots, heat maps and frequency plots, there are also many statistical parameters for the quantitative assessment of assay quality. Same as for the normalization techniques, both controls-based and non-controls-based approaches exist for data QC methods. The most commonly-used QC parameters in HT screening are listed as follows and summarized in Table 2.
Signal-to-background (S/B): This is a simple measure of the ratio of the positive control mean to the background signal mean (i.e. negative control).
Signal-to-noise (S/N): This is a similar measure to S/B with the inclusion of signal variability in the formulation. Two alternative versions of S/N are presented below. Both S/B and S/N are considered week parameters to represent dynamic signal range for an HT screen and are rarely used.
Signal window (SW): This is a more indicative measure of the data range in an HT assay than the above parameters. Two alternative versions of the SW are presented below, which only differ by denominator.
Assay variability ratio (AVR): This parameter captures the data variability in both controls as opposed to SW, and can be defined as (1-Z’-factor) as presented below.
Z’-factor: Despite of the fact that AVR and Z’-factor has similar statistical properties, the latter is the most widely used QC criterion, where the separation between positive (Cpos) and negative (Cneg) controls is calculated as a measure of the signal range of a particular assay in a single plate. Z’-factor has its basis on normality assumption, and the use of 3 std’s of the mean of the group comes from the 99.73% confidence limit (Zhang et al. 1999). While Z’-factor accounts for the variability in the control wells, positional effects or any other variability in the sample wells are not captured. Although Z’-factor is an intuitive method to determine the assay quality, several concerns were raised about the reliability of this parameter as an assay quality measure. Major issues associated with the Z’-factor method are that the magnitude of the Z’-factor does not necessarily correlate with the hit confirmation rates, and that Z’-factor is not an appropriate measure to compare the assay quality across different screens and assay types (Coma et al. 2009; Gribbon et al. 2005).
Z-factor: This is the modified version of the Z’-factor, where the mean and std of the negative control are substituted with the ones for the test samples. Although Z-factor is more advantageous than Z’-factor due to its ability to incorporate sample variability in the calculations, other issues associated with Z’-factor (as discussed above) still apply. Additionally, in a focused library in which many possible “hits” are clustered in certain plates, Z-factor would not be an appropriate QC parameter. While assays with Z’- or Z-factor values above 0.5 are considered to be excellent, one may want to include additional measures, such as visual inspection or more advanced formulations in the decision process, especially for cell-based assays with inherently high signal variability. The power of the above mentioned parameters were discussed in multiple studies (Gribbon et al. 2005; Iversen et al. 2006; Macarron and Hertzberg 2009; Stevens et al. 1998).
SSMD: It is an alternative quality metric to Z’- and Z-factor, which was recently developed to assess the assay quality in HT screens (Zhang 2007a; Zhang 2007b). Due to its basis on probabilistic and statistical theories, SSMD was shown to be a more meaningful parameter than previously mentioned methods for QC purposes. SSMD differs from Z’- and Z-factor by its ability to handle controls with different effects, which enables the selection of multiple QC criteria for assays (Zhang et al. 2008a). The application of SSMD-based QC criterion was demonstrated in multiple studies in comparison to other commonly-used methods (Zhang 2008b; Zhang 2011b; Zhang et al. 2008a). Although SSMD was developed primarily for RNAi screens, it can also be used for small molecule screens.
5. “Hit” selection methods
The main purpose of HT screens is to obtain a list of compounds or siRNAs with desirable activity for further confirmation. Therefore, the ultimate goal of an HT screening campaign is to narrow down a big and comprehensive compound or siRNA library to a manageable number of “hits” with low false discovery rates. While the initial library of test samples undergoes multiple phases of elimination, the most critical factor is to select as many true “hits” as possible. After data normalization is applied as necessary, “hit” selection is performed on the plates that pass the QC criterion. As stated previously in Section 2.1, HT processes in primary and confirmatory screens differ in design. The “hit” selection process following a primary screen is similar for RNAi and small-molecule screens, where the screening run is often performed in single copy, and a single data point (obtained from either endpoint or kinetic reading) is collected for each sample. On the other hand, a confirmatory RNAi screen is typically performed in replicates using pooled or individual siRNA, while the confirmatory small-molecule screens are executed in dose-response mode. Here, we classify the “hit” selection methodologies in two major categories: primary and confirmatory screen analysis.
5.1. “Hit” selection in primary screen
Although RNAi and small molecule assays differ in many ways, a common aim is to classify the test samples with relatively higher or lower activities than the reference wells as “hits”. Hence, it is required to select an activity cut-off, where test samples with values above or below the cut-off are identified as “hits”. It is very crucial to select a sensible cut-off value with enough difference from the noise level in order to reduce false positive rates. Depending on the specific goals of the projects, the cut-off might need to be a reasonable value that leads to a manageable quantity of “hits” for follow-up studies. To guide scientists in the process, numerous “hit” selection methods have been developed for HT screens as presented below.
Percent inhibition cut-off: The “hits” from HT screening data that is normalized for percent inhibition (NPI method in Section 3.1) can be selected based on a percent cut-off value that is arbitrarily assigned relative to an assay’s signal window. As this method does not have much statistical basis to it, it is primarily preferred for small molecule screens with strong controls.
Mean +/- k std: In this method, cut-off is set to the value that is k std’s above or below the sample mean. While the cut-off can be applied to the normalized data, a k value of 3 is typically used, which is associated with the false positive error rate of 0.00135 (Zhang et al. 2006). As this cut-off calculation method is primarily based on normality assumption, it is also equivalent to a Z-score of 3. Since the use of mean and std make this method sensitive to outliers, a more robust version is presented next.
Median +/- k MAD: To desensitize the “hit” selection to outliers, a cut-off that is k MADs above or below the sample median was developed, and a study comparing the std- and MAD-based “hit” selection methods showed lower false non-discovery rates with the latter (Chung et al. 2008).
Quartile-based method: Similar to the previous approaches, the quartile-based “hit” selection method is based on the idea of treating the true “hits” as outliers and identifying them by setting upper and lower cut-off boundaries based on the quartiles and interquartiles of the data. The major advantage of the quartile-based method over median +/- k MAD is its more effective cut-off calculation formulation for non-symmetrical data, where upper and lower cut-offs can be determined independently. In the comparison of the three “hit” selection criteria presented so far, the quartile-based method outperformed the other two methods to detect true “hits” with moderate effects (Zhang et al. 2006).
SSMD and Robust SSMD: This parameter has become a widely-used method for RNAi screening data analysis mainly due to its ability to quantify RNAi effects with a statistical basis, and its better control on false negative and false positive rates (Zhang 2007a; Zhang 2007b; Zhang 2009; Zhang 2010a; Zhang 2010 b; Zhang 2011b; Zhang et al. 2010). SSMD is a robust parameter to capture the magnitude of the RNAi effects with various sample sizes. This scoring method also provides comparison of values across screens. Mean and std in the standard SSMD formula is substituted with median and MAD in the robust version. The SSMD parameter used for the primary screens without replicates holds a linear relationship with the Z-score method.
Bayesian method: This method is used to combine both plate-wise and experiment-wise information within single “hit” selection calculation based on Bayesian hypothesis testing (Zhang et al. 2008b). Bayesian statistics incorporates a prior data distribution and a likelihood function to generate a posterior distribution function. In HT screening data analysis using this method, the experiment- and plate-wise information is incorporated into the prior and likelihood functions, respectively. With the availability of various prior distribution models, the Bayesian method can be applied either with positive and negative controls or with test sample wells. As this method enables the control of false discovery rates, it is a more powerful “hit” selection measure than the median +/- k MAD when the sample data is used to generate the prior distribution.
5.2. “Hit” selection in confirmatory screen
Different strategies are pursued for the confirmation of “hits” from RNAi and small molecule primary screens. While dose response screens are very common to test the compound activities in a dose-dependent manner in small molecule screens, this is not applicable to RNAi screens. Here, we will present the “hit” selection methods for screens with replicates in two categories: dose-response analysis and others.
5.2.1. Dose-response analysis
After running a primary screen, in which a single concentration of compound is used, a subset of compounds is selected for a more quantitative assessment. These molecules are tested at various concentrations and plotted against the corresponding assay response. These types of curves are commonly referred to as “dose-response” or “concentration-response” curves, and they are generally defined by four parameters: top asymptote (maximal response), bottom asymptote (baseline response), slope (Hill slope or Hill coefficient), and the EC50 value.
A plot of signal as a function of concentration results in a rectangular hyperbola when the hill coefficient is 1 (Fig. 2A). Because the concentration range covers several orders of magnitude, the x-axis is normally displayed in the logarithm scale, resulting in a sigmoidal curve (Fig. 2B), which is generally fitted with the Hill equation:
The most accepted benchmark for drug potency is the EC50 value, which corresponds to the concentration of compound (x) that generates a signal midway between the top (T) and bottom (B) asymptotes (Fig. 2B). The steepness is indicated by the Hill slope (h), also known as the Hill coefficient or the slope factor (Fig. 2C).
It is preferable to apply the Hill equation to concentrations on a logarithmic scale, because the error associated with the EC50 (log form) follows a Gaussian distribution (Motulsky and Neubig 2010), as indicated in Eq. 21. The x values represent log[compound].
In biochemical experiments, a Hill coefficient of 1 is indicative of a 1:1 stoichiometry of enzyme-inhibitor or protein-ligand complexes. Under such condition, an increase from 10% to 90% response requires 81-fold change in compound concentration. Hill coefficient values that deviate from unity could reflect mechanistic implications (such as cooperativity or multiple binding sites) or non-ideal behavior of the compound (acting as protein denaturant or causing micelle formation) (Copeland 2005).
For symmetrical curves, the inflection point corresponds to the relative EC50 value, which lies halfway between the asymptotes. This relative EC50 may be different from the actual EC50 if the top and bottom plateaus do not accurately represent 0% and 100% response. For instance, in Fig. 2D, the midpoint in the black curve dictates a value of 60% based on the positive and negative controls. When using the relative EC50, careful analysis of data fitting is necessary to avoid deceptive results, as exemplified by the green curve in Fig. 2D. Curve fitting would provide a relative EC50 value of 1 for both the green and black curves, but based on controls, the compound associated with the green curve would inhibit the assay only by 20%. Therefore, it is argued that the best approach is to use a two-parameter curve fit, where only two parameters are allowed to float (EC50 and Hill coefficient values), while fixing the top and bottom boundaries as presented in Fig. 2E. (Copeland 2005).
Although EC50 is normally the main criterion to categorize compounds for downstream analysis, the value is highly dependent on assay conditions, such as cell number and enzyme/substrate amount (Copeland 2003). For enzymatic assays, a more attractive approach is to consider relative affinities. Cheng and Prusoff formulated a way to convert EC50 values to dissociation constants, thus reducing the overload of performing multiple titrations associated with standard enzyme kinetics (Cheng and Prusoff 1973). Nevertheless, the caveat of using this convenient alternative is to recognize the inhibitory modality of the compounds (Copeland 2005): competitive (Eq. 22), non-competitive (Eq. 23) and uncompetitive (Eq. 24).
The dissociation constant of a reversible compound (Ki) can be calculated based on a single substrate concentration (S) and the Michaelis constant (KM). The constant α delineates the effect of inhibitor binding on the affinity of the substrate for the enzyme. It becomes evident that EC50 and Ki are roughly the same at much lower substrate concentration relative to KM (Eq. 22) or when α=1 (Eq. 23).
Dose-response curves can follow various patterns, depending on the biological system to be investigated. For assays with certain basal level, increasing concentrations of a full agonist triggers a maximal response for the system (Fig. 2F, red curve). A partial agonist displays a reduced response (efficacy) relative to a full agonist (Fig. 2F, black curve), even though they both exhibit the same potency (i.e. same EC50 values). An antagonist might have certain affinity or potency, but it would not show any change in basal activity as the efficacy has a value of zero (Fig. 2F, green curve). However, an antagonist reverses the actions of an agonist. In pharmacological terms, the effects of a competitive antagonist can be overcome by augmenting the amount of agonist, but such agonist increment has no effect on the effects of non-competitive antagonists. Inverse agonists reduce the basal response of systems with constitutive activity (Fig. 2F, blue curve).
5.2.2. Other methods
In “hit” selection for confirmatory screens with single concentration of compound or siRNA, hypothesis testing is a commonly-used method to incorporate sample variability of each sample from its replicates. Therefore, confirmatory screens (or some primary screens) are chosen to be performed in replicates to statistically calculate the significance of the sample activity in relation to a negative reference group. Since previously listed Z- and robust Z-score methods assume that the variability of the test samples and the negative controls or references is equal, it is not a reliable measure for confirmatory screens with replicates, where the sample variability can be individually calculated. The most common methods to analyze screening data with replicates are listed below.
t-test: For “hit” selection in confirmatory screens, t statistics and the associated p value is used to calculate if a sample compound or siRNA is behaving significantly different than the majority of the test samples or controls. A t-test determines whether the null hypothesis, which is the mean of a test sample being equal to the mean of the negative reference group, is accepted or not. Paired t-test (first pairing of the test sample and reference value within each plate, then calculating t statistic on the paired values) is often preferred to avoid the distortion of results due to inter-plate variability, whereas unpaired t-test is used for global comparison of the sample replicates with all reference values in the experiment (Zhang 2011a). The p value calculated from t statistic is then used to determine the significance of the sample activity compared to the reference. An alternative to standard t-test, namely randomized variance model (RVM) t-test (Wright and Simon 2003), was found to be more advantageous for screens with few replicates to detect relatively less strong “hits” (Malo et al. 2010).
SSMD: While t-test is a useful method to calculate the significance of the sample activity by incorporating its variability across replicates, it lacks the ability to rank the samples by their effect sizes. As an alternative to t-test, SSMD-based “hit” selection method for replicates was proposed to enable the calculation of RNAi effects as previously illustrated in (Zhang 2011a). While SSMD-based method is more robust with small sample sizes as opposed to t-test (Zhang 2008a), at least 4 replicates is recommended in confirmatory screens to identify samples with moderate or higher effects (Zhang and Heyse 2009).
Various other p value calculation methods (e.g., redundant siRNA activity, or RSA) (Konig et al. 2007) and rank products method (Breitling et al. 2004)) are available, which can be adapted to detect “hits” in RNAi screens (Birmingham et al. 2009).
HT screening is a comprehensive process to discover new drug targets using siRNA and drug candidates from small molecule libraries. Statistical evaluation of the assay performance is a very critical step in HT screening data analysis. A number of data analysis methods have been developed to correct for plate-to-plate assay variability and systematic errors, and assess assay quality. Statistical analysis is also pivotal in the “hit” selection process from primary screens and in the evaluation during confirmatory screens. While some of these methods may be intuitively applied using spreadsheet programs (e.g., Microsoft Excel), others may require the development of computer programs using more advanced programming environments (e.g., R, Perl, C++, Java, MATLAB). Besides commercially available comprehensive analysis tools, there are also numerous open-access software packages designed for HT screening data management and analysis for scientist with little or no programming knowledge. A short compilation of freely available analysis tools is listed in Table 3. The growing number of statistical methods will accelerate the discovery of drug candidates with higher confidence.