
Mathematics » "Bayesian Inference", book edited by Javier Prieto Tejedor, ISBN 978-953-51-3578-4, Print ISBN 978-953-51-3577-7, Published: November 2, 2017 under CC BY 3.0 license. © The Author(s).

Chapter 4

Hypothesis Testing for High-Dimensional Problems

By Naveen K. Bansal
DOI: 10.5772/intechopen.70210


Abstract

For high-dimensional hypothesis problems, new approaches have emerged since the publication. The most promising of them use a Bayesian approach. In this chapter, we review some of the past approaches, which are applicable only to low-dimensional hypotheses testing, and contrast them with the modern approaches to high-dimensional hypotheses testing. We review some of the new results based on Bayesian decision theory and show how a Bayesian approach can be used to accommodate directional hypotheses testing and skewness in the alternatives. A real example of gene expression data is used to demonstrate a Bayesian decision-theoretic approach to directional hypotheses testing with skewed alternatives.

Keywords: multiple directional hypotheses, false discovery rate, familywise error rate, gene expression, skew-normal distribution

1. Introduction

In today’s world, most statistical inference problems involve high-dimensional multiple hypothesis testing. Whenever we collect data, we collect data on multiple features, in some cases involving very high-dimensional variables. For example, gene expression data consist of gene expressions on thousands of genes; image data consist of image expressions on multiple voxels. The statistical analysis of these types of data involves multiple hypotheses testing (MHT). It is well known that univariate methods cannot be applied directly to test hypotheses on the multiple features simultaneously. The reason is that the error rates of the univariate analyses accumulate under MHT, and as a result the actual error rate can be very high. To understand the main issue of multiplicity, consider the following example. Suppose there are, say, 100 misspelled words in a book, and each of these words occurs in 5% of the pages. You pick a page at random. For each misspelled word, the probability of finding that word on the page is 0.05. However, the probability that you will find at least one of the 100 misspelled words is much higher. If these words were independently distributed, then the probability of finding at least one misspelled word would be $1 - (0.95)^{100} \approx 0.994$. If the placements of the misspelled words were positively dependent, then the probability would be lower than 0.994. For example, in the extreme case of dependence in which they all occur together, the probability would be 0.05. The same phenomenon occurs in MHT. Statistical inference based on the error rate of individual hypothesis tests can lead to a very high error rate for the combined hypotheses. Thus, for MHT, an adjustment in the error rate needs to be made. Note that the adjustment may depend on the dependence structure, but due to the complexity of the dependence structure in high dimensions, dependency is usually ignored in the literature [1].
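The arithmetic in the misspelled-word example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `prob_at_least_one` is ours, and the words are assumed to be independent, as in the text.

```python
def prob_at_least_one(per_item_prob: float, n_items: int) -> float:
    """P(at least one of n independent events occurs), each with the same probability."""
    return 1.0 - (1.0 - per_item_prob) ** n_items


# The chance of at least one "hit" grows quickly with the number of items.
for n in (1, 10, 100):
    print(n, round(prob_at_least_one(0.05, n), 4))
```

With a single word the probability is the nominal 0.05; with 100 independent words it is close to 1, which is the multiplicity problem described above.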

The statistical inference depends on how we define the error rate for the combined hypotheses testing. Suppose that there are m hypotheses to test, $H_{0i}$ vs. $H_{ai}$, $i = 1, 2, \ldots, m$. If we do not want to make even one false discovery, then we should control the familywise error rate (FWER), which is defined as

$$\mathrm{FWER} = \Pr\{\text{falsely reject } H_{0i} \text{ for at least one } i,\ i = 1, 2, \ldots, m\}$$
(1)

There are many methods for controlling $\mathrm{FWER} \le \alpha_F$ (e.g., $\alpha_F = 0.05$). The simplest method is Bonferroni’s procedure. Let $T_i$ be the test statistic for testing $H_{0i}$ vs. $H_{ai}$, with corresponding p-value $p_i$. Then, Bonferroni’s procedure rejects $H_{0i}$ if $p_i < \alpha_F/m$. To see that this controls the FWER, let $I_0$ be the set of all $i$ for which $H_{0i}$ is true. A false rejection occurs when $p_j < \alpha_F/m$ for at least one $j \in I_0$. Then, using Boole’s inequality, we have, from Eq. (1),

$$\mathrm{FWER} = \Pr\Big\{\bigcup_{i \in I_0} \{p_i < \alpha_F/m\}\Big\} \le \sum_{i \in I_0} \Pr\{p_i < \alpha_F/m\}$$
(2)

Now, since $p_i \sim U(0, 1)$ under $H_{0i}$, we have $\Pr\{p_i < \alpha_F/m\} = \alpha_F/m$. Then, assuming that there are $m_0$ elements in $I_0$, we have, from Eq. (2),

$$\mathrm{FWER} \le m_0 \frac{\alpha_F}{m} \le \alpha_F$$

Holm [2] gave a modified version of Bonferroni’s procedure which also controls the familywise error rate. Holm’s procedure is the following: first rank all the p-values, $p_{(1)} \le p_{(2)} \le \ldots \le p_{(m)}$, and let $H_{0(1)}, H_{0(2)}, \ldots, H_{0(m)}$ be their associated null hypotheses. Let $l$ be the smallest index such that $p_{(l)} > \alpha_F/(m - l + 1)$. Then, reject only the null hypotheses $H_{0(1)}, H_{0(2)}, \ldots, H_{0(l-1)}$; if no such index exists, reject all of them. Note that the selected hypotheses have p-values with $p_{(1)} \le \alpha_F/m$, $p_{(2)} \le \alpha_F/(m - 1)$, …, $p_{(l-1)} \le \alpha_F/(m - l + 2)$. Holm’s procedure is thus more powerful than Bonferroni’s procedure, since every hypothesis that is selected under Bonferroni’s procedure is also selected under Holm’s procedure.
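The two FWER-controlling rules described above can be sketched in Python as follows. The function names `bonferroni` and `holm` and the example p-values are illustrative assumptions, not taken from the chapter.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0i when p_i < alpha / m."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm's step-down rule: walk through the ordered p-values and stop at the
    first rank l with p_(l) > alpha / (m - l + 1); reject everything before it."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] > alpha / (m - rank + 1):
            break
        reject[idx] = True
    return reject


p = [0.001, 0.012, 0.02, 0.04, 0.3]  # hypothetical p-values
print(bonferroni(p))  # only the smallest p-value is below 0.05 / 5
print(holm(p))        # the second one also passes its threshold 0.05 / 4
```

In this toy example, Holm’s rule rejects one more hypothesis than Bonferroni’s, in line with the power comparison made in the text.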

The above Bonferroni-type procedures are not very satisfactory when m is very high. Suppose m = 10,000 (this is actually not very high for most high-dimensional problems), and suppose we want to control the FWER at αF = 0.05. Then, for Holm’s procedure, the smallest p-value has to be lower than 0.000005 in order to reject at least one hypothesis, which may be very hard to achieve. The problem is not really with Holm’s procedure; the problem is with the use of FWER as an error rate. For a high-dimensional problem, it is unrealistic to seek a procedure that makes not even one false discovery. Benjamini and Hochberg [1] proposed a new error criterion, the false discovery rate (FDR), together with a procedure that works much better for high-dimensional MHT.

In Section 2, we review the FDR procedure and Bayesian procedures for two-sided alternatives. An extension to directional hypotheses is presented in Section 3, where we also discuss Bayesian procedures under skewed alternatives. In Section 4, the problem of directional hypotheses is considered by converting p-values to normally distributed test statistics. In Section 4, we also discuss a Bayes procedure under skew-normal alternatives and an application using real gene expression data. Some concluding remarks are made in Section 5.

2. False discovery rate (FDR), Benjamini and Hochberg’s (BH) procedure, and Bayesian procedures

For each test of $H_{0i}$ vs. $H_{ai}$, suppose a statistical procedure either rejects the null hypothesis $H_{0i}$ or fails to reject $H_{0i}$. For the sake of simplicity, we equate failing to reject $H_{0i}$ with accepting the null $H_{0i}$. However, for small sample sizes, it would be unwise to conclude that $H_{0i}$ is accepted. From now on, rejections of the null will be called discoveries. Table 1 shows the possible outcomes of a procedure, where, for example, V is the total number of discoveries, of which V0 are false discoveries.

| | Accept H0 | Reject H0 | Total |
|---|---|---|---|
| H0 is true | U0 | V0 | m0 |
| Ha is true | Ua | Va | m − m0 |
| Total | U | V | m |

Table 1. Total number of decisions made.

Thus, the proportion of false discoveries is V0/max(V, 1). The FDR is defined as the expected proportion of false discoveries, that is,

$$\mathrm{FDR} = E\left[\frac{V_0}{\max(V, 1)}\right]$$
(3)

If, for example, FDR = 0.05, then we can expect, on average, 5% of all discoveries to be false. In other words, under repeated experiments, 5% of the discoveries are false on average (in a frequentist’s sense). Note that FDR ≤ FWER = P(V0 ≥ 1), as the following inequality shows:

$$\mathrm{FDR} = E\left[\frac{V_0}{\max(V, 1)}\right] = E\left[\frac{V_0}{\max(V, 1)}\, I(V_0 \ge 1)\right] \le E\left[I(V_0 \ge 1)\right] = P(V_0 \ge 1).$$

Thus, we are likely to make a higher number of discoveries under the FDR approach than under FWER, since if a procedure controls FWER (≤ α), then it also controls FDR (≤ α), but not vice versa.

2.1. Benjamini and Hochberg’s procedure

Benjamini and Hochberg [1] proposed the following BH procedure which controls the FDR.

Let $p_i$ be the p-value for the ith hypothesis under a test statistic $T_i$. Suppose $T_1, T_2, \ldots, T_m$ are independently distributed. Let $p_{[1]} < p_{[2]} < \ldots < p_{[m]}$ be the ordered p-values, with the corresponding null hypotheses denoted by $H_{0[1]}, H_{0[2]}, \ldots, H_{0[m]}$. Let

$$i_0 = \max\left\{i : p_{[i]} \le \frac{i}{m}\alpha\right\}$$

Then, reject $H_{0[i]}$ for all $i \le i_0$.

This procedure controls $\mathrm{FDR} \le \frac{m_0}{m}\alpha \le \alpha$. Since $m_0$ is unknown, the sharper upper bound $\frac{m_0}{m}\alpha$ is not very useful in practice. If $m_0$ can be estimated reliably, a better bound is possible.

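The above result was proven in [1] under the independence of the test statistics. Benjamini and Yekutieli [3] extended the result to positively dependent test statistics, and they also proposed a modified, more conservative BH procedure that controls the FDR under arbitrary dependence, with $i_0$ defined as

$$i_0 = \max\left\{i : p_{[i]} \le \frac{i}{m\, c_m}\alpha\right\},$$

where $c_m = \sum_{i=1}^{m} \frac{1}{i}$.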
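A minimal Python sketch of the BH step-up rule, with an optional switch for the variant that divides the threshold by $c_m$, is given below. The function name `bh_reject`, the flag `use_cm`, and the example p-values are illustrative assumptions.

```python
def bh_reject(p_values, alpha=0.05, use_cm=False):
    """Step-up rule: find the largest rank i0 with p_[i0] <= i0 * alpha / (m * c),
    then reject the i0 hypotheses with the smallest p-values.
    c = 1 gives the BH rule; c = c_m = sum_{i<=m} 1/i gives the modified rule."""
    m = len(p_values)
    c = sum(1.0 / i for i in range(1, m + 1)) if use_cm else 1.0
    order = sorted(range(m), key=lambda i: p_values[i])
    i0 = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c):
            i0 = rank  # keep the largest rank that satisfies the condition
    reject = [False] * m
    for idx in order[:i0]:
        reject[idx] = True
    return reject


# Hypothetical p-values: the third one misses its own threshold (0.015) but is
# still rejected because the fourth one passes its threshold (0.02).
p = [0.001, 0.008, 0.019, 0.0195, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bh_reject(p)), sum(bh_reject(p, use_cm=True)))
```

The example illustrates the step-up nature of the rule: rejection is decided by the largest qualifying rank, not by each p-value separately. The $c_m$ variant uses a smaller threshold and therefore rejects fewer hypotheses here.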

2.2. Bayesian procedures

Under the Bayesian setting, we assume that $H_{0i}$ and $H_{ai}$, $i = 1, 2, \ldots, m$, are generated probabilistically with

$$P(H_{0i}) = p \quad \text{and} \quad P(H_{ai}) = 1 - p$$

Under this setting, Efron et al. [4] developed the concept of the local false discovery rate (fdr). Let $T_i$, $i = 1, 2, \ldots, m$, be test statistics with pdfs $T_i \mid H_0 \sim f_0(t)$ and $T_i \mid H_a \sim f_a(t)$. Then, marginally, $T_i \sim f(t) = p f_0(t) + (1 - p) f_a(t)$, and

$$\mathrm{fdr}(t) = P(H_{0i} \mid T_i = t) = \frac{p f_0(t)}{f(t)}$$
(4)

The idea is that if $T_i \in [t, t + \delta t]$, where $\delta t \to 0$, then fdr(t) represents the proportion of times that $H_{0i}$ is true. If t is very high, then fdr(t) will be very small, indicating that the probability of $H_{0i}$ is very small (i.e., the false discovery rate will be very small). In Eq. (4), p and f(t) are unknown, but they can be estimated (see [4]).
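As a sketch of the local false discovery rate, the following Python snippet evaluates fdr(t) for a two-group model in which all ingredients are known. The choices $f_0 = N(0, 1)$, $f_a = N(3, 1)$ and $p = 0.9$ are assumptions made only for illustration; in practice p and f(t) would be estimated as noted above.

```python
from statistics import NormalDist


def local_fdr(t, p_null, f0, fa):
    """fdr(t) = p f0(t) / f(t), with f(t) = p f0(t) + (1 - p) fa(t)."""
    null_part = p_null * f0.pdf(t)
    return null_part / (null_part + (1.0 - p_null) * fa.pdf(t))


f0 = NormalDist(0.0, 1.0)  # assumed null density
fa = NormalDist(3.0, 1.0)  # assumed alternative density (hypothetical)

for t in (0.0, 2.0, 4.0):
    print(t, round(local_fdr(t, 0.9, f0, fa), 4))
```

Under these assumed densities, fdr(t) decreases as t grows, which matches the statement that a large observed statistic makes the null hypothesis unlikely.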

Storey [5] proposed the positive false discovery rate

$$\mathrm{pFDR} = E\left[\frac{V_0}{V} \,\Big|\, V > 0\right],$$
(5)

where the expectation is with respect to the distribution of $(T_i, \theta_i)$, $i = 1, 2, \ldots, m$. Under the assumption that $T_1, T_2, \ldots, T_m$ are independently and identically distributed, Storey [6] proved that

$$\mathrm{pFDR}(\Gamma) = P(H_0 \mid T \in \Gamma),$$

for a procedure that rejects $H_{0i}$ when $T_i \in \Gamma$. Based on this, the q-value for multiple hypotheses (analogous to the p-value for a single hypothesis) is defined as the smallest value of pFDR(Γ) such that the observed $T_i = t_i \in \Gamma$; see [6]. In most cases, q-value$(t_i) = P(H_0 \mid T_i > t_i)$. This gives a procedure for multiple hypotheses that rejects $H_{0i}$ if q-value$(t_i) < \alpha$.
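The tail-area form of the q-value, $P(H_0 \mid T_i > t_i)$, can be sketched in the same way by Bayes’ theorem. The densities $N(0, 1)$ and $N(3, 1)$ and the null proportion 0.9 are again hypothetical choices; the function name `q_value` is ours.

```python
from statistics import NormalDist


def q_value(t, p_null, f0, fa):
    """P(H0 | T > t) for a two-group model with known null and alternative."""
    tail_null = 1.0 - f0.cdf(t)  # P(T > t | H0)
    tail_alt = 1.0 - fa.cdf(t)   # P(T > t | Ha)
    return p_null * tail_null / (p_null * tail_null + (1.0 - p_null) * tail_alt)


null_dist = NormalDist(0.0, 1.0)  # assumed null distribution
alt_dist = NormalDist(3.0, 1.0)   # assumed alternative distribution (hypothetical)

for t in (1.0, 2.0, 3.0):
    print(t, round(q_value(t, 0.9, null_dist, alt_dist), 4))
```

A hypothesis would then be rejected when its q-value falls below the chosen level α.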

3. Directional hypotheses testing

As described earlier, the null hypothesis H0i is either accepted or rejected. In most cases, however, rejection of null hypotheses is not sufficient. After rejecting H0i, finding the direction of the alternatives may also be important. A detailed discussion of the directional hypotheses can be found in [7].

Directional hypotheses testing involves testing $H_{0i}$ against the directional hypotheses $H_{-i}$ and $H_{+i}$, and the objective is to obtain a selection region $\{T_i \in \Gamma_-\}$ for selecting $H_{-i}$ and a selection region $\{T_i \in \Gamma_+\}$ for selecting $H_{+i}$. In other words, $H_{0i}$ will be rejected if $T_i \in \Gamma_-$ or $T_i \in \Gamma_+$, and the direction $H_{-i}$ or $H_{+i}$ is determined according to $T_i \in \Gamma_-$ or $T_i \in \Gamma_+$, respectively. Analogous to Table 1, we now have Table 2.

| | Accept H0 | Select H− | Select H+ | Total |
|---|---|---|---|---|
| H0 is true | U0 | V0 | W0 | m0 |
| H− is true | U− | V− | W− | m− |
| H+ is true | U+ | V+ | W+ | m+ |
| Total | U | V | W | m |

Table 2. Number of decisions under directional hypotheses.

Table 2 shows the possible outcomes when accepting H0, selecting H−, or selecting H+. For example, out of the V times that H− is selected, an error is made V0 times because H0 is in fact true, and V+ times because H+ is in fact true. In other words, when selecting H−, not only is H0 falsely rejected V0 times, but the direction is also falsely selected V+ times. This leads to the concept of the directional false discovery rate (DFDR), defined as

$$\mathrm{DFDR} = E\left[\frac{V_0 + V_+ + W_0 + W_-}{\max(V + W, 1)}\right].$$
(6)

This is analogous to FDR for two-sided alternatives. Benjamini and Yekutieli [8] showed that, in most cases, DFDR-controlling procedures for directional hypotheses can be treated as FDR-controlling procedures for two-sided multiple hypotheses, with the direction determined by the sign of the test statistics.

Bansal and Miescke [9] considered a decision-theoretic formulation of multiple hypotheses problems. The approach assumes parametric modeling. Suppose the model for the observed data x is represented by $P(x \mid \theta, \eta)$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_m)'$ is the parameter vector of interest, and η is a nuisance parameter. The problem of interest is to test

$$H_{0i}: \theta_i = 0 \quad \text{vs.} \quad H_{-i}: \theta_i < 0 \ \text{ or } \ H_{+i}: \theta_i > 0$$
(7)

Let the loss function of a decision rule $d(x) = (d_1(x), d_2(x), \ldots, d_m(x))$ be given by

$$L(\theta, d(x)) = \sum_{i=1}^{m} l_i(\theta, d_i(x)),$$
(8)

where $l_i(\theta, d_i(x))$ is the individual loss of $d_i$. Here, $d_i \in \{-1, 0, 1\}$, where $d_i = 0$, $d_i = -1$, and $d_i = 1$ mean accepting $H_{0i}$, selecting $H_{-i}$, and selecting $H_{+i}$, respectively. Note that for the “0-1” loss, that is, when $l_i = 0$ for a correct decision and $l_i = 1$ for an incorrect decision, L is the total number of incorrect decisions. Thus, minimizing $E[L(\theta, d(X))]$ for the “0-1” loss amounts to minimizing the expected number of incorrect decisions.

Now, suppose that under the Bayesian setting, $\theta_i$, $i = 1, 2, \ldots, m$, are generated from

$$\pi(\theta) = p_- \pi_-(\theta) + p_0 I(\theta = 0) + p_+ \pi_+(\theta),$$
(9)

where $\pi_-$ is the prior density over (−∞, 0) and $\pi_+$ is the prior density over (0, ∞). A special case of prior (9) is $\pi_-(\theta) = \pi_+(-\theta)$. In this case, $p_-$ and $p_+$ reflect the skewness in the alternative hypotheses. For example, if $p_- = p_+$, then we have a symmetric case. In this case, selecting $H_-$ or $H_+$ after rejecting $H_0$ based on the sign of the test statistic makes sense. On the other hand, if $p_- < p_+$, then more of the θi's are positive than negative. For many gene expression data analyses, this is a useful case, since over-expressed genes may occur more frequently than under-expressed genes as a result of gene mutation (naturally or as a result of external factors). For specific examples, see [9, 10].

From now on, we focus on the “0-1” loss. The results can be easily extended to other loss functions. The “0-1” loss can be written as

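$$L(\theta, d) = \sum_{i=1}^{m} \Big[1 - \sum_{j=-1}^{1} I(d_i = j)\, I(\nu_i(\theta) = j)\Big],$$

where $\nu_i(\theta) \in \{-1, 0, 1\}$ is an indicator variable, with $\nu_i(\theta) = -1$ when $\theta_i < 0$, $\nu_i(\theta) = 0$ when $\theta_i = 0$, and $\nu_i(\theta) = 1$ when $\theta_i > 0$. It is easy to see that minimizing the posterior expected loss yields the selection rule that selects $H_{-i}$, $H_{0i}$, or $H_{+i}$ according to which of the three attains $\max\{v_i^-, v_i^0, v_i^+\}$, where $v_i^- = P(H_{-i} \mid x)$, $v_i^0 = P(H_{0i} \mid x)$, and $v_i^+ = P(H_{+i} \mid x)$.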
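The unconstrained Bayes rule under the “0-1” loss amounts to picking the hypothesis with the largest posterior probability. A minimal sketch follows; the function names and the tie-breaking order (ties go to the first maximum in the order −1, 0, +1) are our assumptions.

```python
def bayes_decision(v_minus, v_zero, v_plus):
    """Return -1, 0, or +1 for the hypothesis with the largest posterior probability."""
    posterior = {-1: v_minus, 0: v_zero, 1: v_plus}
    return max(posterior, key=posterior.get)


def zero_one_loss(decisions, truths):
    """Total number of incorrect decisions, i.e. the "0-1" loss L(theta, d)."""
    return sum(d != t for d, t in zip(decisions, truths))


# Hypothetical posterior probabilities (v-, v0, v+) for three hypotheses.
posteriors = [(0.7, 0.2, 0.1), (0.1, 0.6, 0.3), (0.1, 0.2, 0.7)]
decisions = [bayes_decision(*v) for v in posteriors]
print(decisions)
```

Here `zero_one_loss` compares decisions with the true directions $\nu_i(\theta)$, which are of course only available in simulations.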

3.1. The constrained Bayes rule

The Bayes procedure described earlier accommodates skewness in the prior, but it does not control any type of false discovery rate. In order to control a false discovery rate, we need to obtain a constrained Bayes rule that minimizes the posterior expected loss subject to a constraint on the false discovery rate.

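The directional false discovery rate (6) is defined in a frequentist’s manner, in which the expectation is with respect to X|θ. We denote Eq. (6) by BDFDR when the expectation is taken with respect to X|θ and then a further expectation is taken with respect to θ. We define the posterior version of Eq. (6), denoted PDFDR, by taking the expectation with respect to the posterior distribution of θ|X = x. It can be shown that

$$\mathrm{PDFDR} = 1 - \frac{\sum_{i=1}^{m} \left\{ I(d_i = -1)\, v_i^- + I(d_i = +1)\, v_i^+ \right\}}{\max(|D_-| + |D_+|,\ 1)}$$
(10)

Here, $|D_-| = \sum_{i=1}^{m} I(d_i = -1)$ and $|D_+| = \sum_{i=1}^{m} I(d_i = 1)$.

A constrained Bayes rule can be obtained by minimizing the posterior expected loss subject to the constraint that PDFDR ≤ α. There can be many approaches to this constrained minimization. We present here the approach given in [9], which is as follows:

Consider the sets $D_-^B$ and $D_+^B$ of indices for which the unconstrained Bayes rule selects $H_{-i}$ and $H_{+i}$, respectively, that is, the indices with $v_i^- = \max\{v_i^-, v_i^0, v_i^+\}$ and $v_i^+ = \max\{v_i^-, v_i^0, v_i^+\}$, respectively. Define $\xi_i = v_i^-$ for $i \in D_-^B$ and $\xi_i = v_i^+$ for $i \in D_+^B$, and then rank all $\xi_i$, $i \in D_-^B \cup D_+^B$, from the lowest to the highest. Let the ranked values be denoted by $\xi_{[1]} \le \xi_{[2]} \le \ldots \le \xi_{[\hat{k}]}$, where $\hat{k} = |D_-^B \cup D_+^B|$. Denote

$$\hat{i}_0 = \max\Big\{ j \le \hat{k} : \frac{1}{j} \sum_{i=1}^{j} \xi_{[\hat{k} - i + 1]} \ge 1 - \alpha \Big\}.$$

Let $D_\xi$ denote the set of indices corresponding to $\xi_{[\hat{k}]} \ge \xi_{[\hat{k}-1]} \ge \ldots \ge \xi_{[\hat{k} - \hat{i}_0 + 1]}$. Now, select $H_{-i}$ for $i \in D_-^B \cap D_\xi$, and $H_{+i}$ for $i \in D_+^B \cap D_\xi$. By Eq. (10), the PDFDR of the resulting rule is one minus the average of the selected $\xi_i$, so the condition defining $\hat{i}_0$ ensures PDFDR ≤ α.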
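The ranking-and-truncation steps above can be sketched as follows. This is an illustrative reading of the procedure, not the authors’ code: the function name `constrained_bayes`, the handling of ties (a direction is a candidate when its posterior probability is at least as large as the other two), and the example posterior probabilities are all assumptions.

```python
def constrained_bayes(posteriors, alpha=0.05):
    """posteriors: list of (v_minus, v_zero, v_plus), one triple per hypothesis.
    Returns decisions in {-1, 0, +1} whose posterior DFDR is at most alpha."""
    candidates = []  # (xi, index, direction) for indices selected by the Bayes rule
    for i, (v_minus, v_zero, v_plus) in enumerate(posteriors):
        if v_minus >= max(v_zero, v_plus):
            candidates.append((v_minus, i, -1))
        elif v_plus >= max(v_zero, v_minus):
            candidates.append((v_plus, i, +1))
    # Largest xi first, so the first j entries are xi_[k], ..., xi_[k-j+1].
    candidates.sort(key=lambda c: c[0], reverse=True)
    i0, running_sum = 0, 0.0
    for j, (xi, _, _) in enumerate(candidates, start=1):
        running_sum += xi
        if running_sum / j >= 1.0 - alpha:
            i0 = j  # largest j whose top-j average is at least 1 - alpha
    decisions = [0] * len(posteriors)
    for _, i, direction in candidates[:i0]:
        decisions[i] = direction
    return decisions


# Hypothetical posterior probabilities for five hypotheses.
posteriors = [(0.99, 0.01, 0.0), (0.0, 0.03, 0.97), (0.0, 0.2, 0.8),
              (0.1, 0.85, 0.05), (0.0, 0.45, 0.55)]
decisions = constrained_bayes(posteriors, alpha=0.05)
selected = [max(vm, vp) for (vm, _, vp), d in zip(posteriors, decisions) if d != 0]
posterior_dfdr = 1.0 - sum(selected) / len(selected)
print(decisions, round(posterior_dfdr, 3))
```

In this example the unconstrained rule would make four directional selections, but only the two with the highest posterior probabilities survive the constraint, giving a posterior DFDR of about 0.02.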

3.2. Estimating mixture parameters

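The above procedure requires estimation of the parameters $(p_-, p_0, p_+)$ and of the nuisance parameter η. Note that marginally,

$$X_i \sim p_- f_-(x_i \mid \eta) + p_0 f_0(x_i \mid \eta) + p_+ f_+(x_i \mid \eta),$$

where $f_0(x_i \mid \eta) = f(x_i \mid 0, \eta)$, and

$$f_-(x_i \mid \eta) = \int_{-\infty}^{0} f(x_i \mid \theta, \eta)\, \pi_-(\theta)\, d\theta, \qquad f_+(x_i \mid \eta) = \int_{0}^{\infty} f(x_i \mid \theta, \eta)\, \pi_+(\theta)\, d\theta,$$

and $X_1, X_2, \ldots, X_m$ are independently distributed. Estimates of the parameters of the mixture density can be obtained by using the EM algorithm. It is easy to see that the EM estimators of $(p_-, p_0, p_+)$ follow the iterative scheme

$$p_-^{(j+1)} = \frac{1}{m} \sum_{i=1}^{m} \frac{p_-^{(j)} f_-(x_i \mid \eta)}{p_-^{(j)} f_-(x_i \mid \eta) + p_0^{(j)} f_0(x_i \mid \eta) + p_+^{(j)} f_+(x_i \mid \eta)},$$
$$p_0^{(j+1)} = \frac{1}{m} \sum_{i=1}^{m} \frac{p_0^{(j)} f_0(x_i \mid \eta)}{p_-^{(j)} f_-(x_i \mid \eta) + p_0^{(j)} f_0(x_i \mid \eta) + p_+^{(j)} f_+(x_i \mid \eta)},$$
$$p_+^{(j+1)} = \frac{1}{m} \sum_{i=1}^{m} \frac{p_+^{(j)} f_+(x_i \mid \eta)}{p_-^{(j)} f_-(x_i \mid \eta) + p_0^{(j)} f_0(x_i \mid \eta) + p_+^{(j)} f_+(x_i \mid \eta)}.$$

The nuisance parameter η can also be estimated iteratively by using the EM algorithm or by other means. See [9] for more details.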
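A small sketch of the EM iteration for the mixture weights is shown below, treating the three component densities as known. To keep the example short, the components are taken to be $N(-2, 1)$, $N(0, 1)$ and $N(2, 1)$; these are hypothetical stand-ins for $f_-$, $f_0$ and $f_+$, not the marginal densities defined above, and the simulated data and the function name `em_weights` are likewise assumptions.

```python
import random
from statistics import NormalDist


def em_weights(xs, f_minus, f_zero, f_plus, n_iter=100):
    """EM updates for (p_minus, p_zero, p_plus) with fixed component densities."""
    p_minus = p_zero = p_plus = 1.0 / 3.0
    dens = [(f_minus(x), f_zero(x), f_plus(x)) for x in xs]
    m = len(xs)
    for _ in range(n_iter):
        s_minus = s_zero = s_plus = 0.0
        for d_minus, d_zero, d_plus in dens:
            total = p_minus * d_minus + p_zero * d_zero + p_plus * d_plus
            # Posterior membership probabilities of observation x_i.
            s_minus += p_minus * d_minus / total
            s_zero += p_zero * d_zero / total
            s_plus += p_plus * d_plus / total
        p_minus, p_zero, p_plus = s_minus / m, s_zero / m, s_plus / m
    return p_minus, p_zero, p_plus


# Simulated data with true weights (0.05, 0.80, 0.15); purely illustrative.
random.seed(0)
xs = []
for _ in range(2000):
    u = random.random()
    mean = -2.0 if u < 0.05 else (2.0 if u > 0.85 else 0.0)
    xs.append(random.gauss(mean, 1.0))

weights = em_weights(xs, NormalDist(-2, 1).pdf, NormalDist(0, 1).pdf, NormalDist(2, 1).pdf)
print([round(w, 3) for w in weights])
```

Each update averages the posterior membership probabilities, so the three weights always sum to one; a skewed set of alternatives shows up as unequal estimates of $p_-$ and $p_+$.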

4. Bayes rules by converting p-values to normally distributed test statistics

Let $T_i$, $i = 1, 2, \ldots, m$, be independently and identically distributed test statistics. Let $P_i = P(T_i \le t_i \mid H_{0i})$ be the corresponding p-values. Note that under $H_{0i}$, $P_i \sim U(0, 1)$. Let $X_i = \Phi^{-1}(P_i)$ be the corresponding z-score. Then, under $H_{0i}$, $X_i \sim N(0, 1)$. Efron [11] suggested using $X_i \sim N(0, \sigma^2)$ under $H_{0i}$, with $\sigma^2$ appropriately estimated. Efron pointed out that, in practice, $\sigma^2$ may not be equal to 1 due to possible correlation among the multiple components. Under the alternative, we assume that $X_i \sim N(\theta_i, \sigma^2)$, where the θi's are generated from the distribution described in Eq. (9). This assumption is admittedly a big leap. In practice, however, it can be tested, and if it holds, it can lead to very powerful results. Bansal and Miescke [9] assumed that $\pi_+(\theta)$ is the normal distribution $N(0, \sigma^2/\omega)$ truncated to (0, ∞), and $\pi_-(\theta) = \pi_+(-\theta)$, where ω is some positive constant depending upon how inflated we believe the alternative θi's are. It can be seen that

$$v_i^- \propto p_- T_-(x_i), \quad v_i^+ \propto p_+ T_+(x_i), \quad \text{and} \quad v_i^0 \propto p_0$$
(11)

with the proportionality constant $[p_- T_-(x_i) + p_+ T_+(x_i) + p_0]^{-1}$. Also, $T_-(x_i) = T_+(-x_i)$, and

$$T_+(x_i) = \exp\left\{\frac{x_i^2}{2(1 + \omega)\sigma^2}\right\} \Phi\left(\frac{x_i}{\sigma\sqrt{1 + \omega}}\right)$$
(12)

In order to apply the Bayes procedure as discussed in Section 3, all we need are Eqs. (11) and (12). For computation details, see [9].
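The posterior probabilities in Eq. (11) can be computed directly from Eq. (12), read here as $T_+(x) = \exp\{x^2 / (2(1+\omega)\sigma^2)\}\,\Phi(x / (\sigma\sqrt{1+\omega}))$ with $T_-(x) = T_+(-x)$. The sketch below assumes the mixture weights, σ and ω are already available; the numerical values and function names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cdf


def t_plus(x, sigma, omega):
    """T+(x) as in Eq. (12)."""
    return exp(x * x / (2.0 * (1.0 + omega) * sigma ** 2)) * _PHI(x / (sigma * sqrt(1.0 + omega)))


def t_minus(x, sigma, omega):
    """T-(x) = T+(-x)."""
    return t_plus(-x, sigma, omega)


def posterior_probs(x, p_minus, p_zero, p_plus, sigma=1.0, omega=0.5):
    """Return (v_minus, v_zero, v_plus) for one z-score, normalized as in Eq. (11)."""
    a = p_minus * t_minus(x, sigma, omega)
    b = p_plus * t_plus(x, sigma, omega)
    total = a + p_zero + b
    return a / total, p_zero / total, b / total


# Hypothetical weights with more positive than negative alternatives.
for x in (-3.0, 0.0, 3.0):
    print(x, [round(v, 3) for v in posterior_probs(x, 0.05, 0.8, 0.15)])
```

With unequal weights $p_- < p_+$, the same absolute z-score gives a larger posterior probability to the positive direction than to the negative one, which is how skewness enters the decision.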

4.1. Skew-normal alternatives

In the above discussion, we assumed that the θi's are generated from a distribution with pdf (9). Bansal et al. [12] considered the case when the θi's are generated from a skew-normal distribution under the alternative hypotheses. The skew-normal distribution was first introduced in [13]. It has the important property that if $(\xi_1, \xi_2)$ is bivariate normal with mean 0, then the conditional distribution of $\xi_1$ given $\xi_2 > 0$ is skew-normal. Its pdf is given by

$$g_+(\xi_1) = 2\, \frac{1}{\sigma_1}\, \varphi\left(\frac{\xi_1}{\sigma_1}\right) \Phi\left(\lambda \frac{\xi_1}{\sigma_1}\right),$$

and the distribution is denoted by $SN(0, \sigma_1, \lambda)$. Here, λ is a skewness parameter. If λ = 0, then this distribution is $N(0, \sigma_1)$. The implication of this result is the following: suppose that within a normal system an outcome follows a normal distribution; if a correlated factor starts exerting a positive effect, then the outcome variable will start following a skew-normal distribution. For example, consider RNA experiments and assume that the genes are in a normal state. Suppose a gene mutation occurs at a later stage and starts exerting a positive effect on the affected genes. In this case, based on the above property of the skew-normal distribution, we can assume that the expressions of the affected genes will follow a skew-normal distribution.

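Under this formulation, we assume that $\theta_i$, $i = 1, 2, \ldots, m$, are generated from

$$\pi_\lambda(\theta_i) = p\, I(\theta_i = 0) + (1 - p)\, \frac{2}{\sigma_1}\, \varphi\left(\frac{\theta_i}{\sigma_1}\right) \Phi\left(\lambda \frac{\theta_i}{\sigma_1}\right)$$

Now, similar to Eq. (11), it can be seen that

$$v_i^- \propto (1 - p)\, T_-(x_i), \quad v_i^+ \propto (1 - p)\, T_+(x_i), \quad v_i^0 \propto p$$

with proportionality constant $[(1 - p)(T_+(x_i) + T_-(x_i)) + p]^{-1}$, where

$$T_+(x_i) = \frac{2}{\sigma_1} \int_{0}^{\infty} \exp\left(\frac{x_i \theta}{\sigma^2}\right) \varphi\left(\sqrt{\frac{1}{\sigma_1^2} + \frac{1}{\sigma^2}}\; \theta\right) \Phi\left(\lambda \frac{\theta}{\sigma_1}\right) d\theta,$$

and

$$T_-(x_i) = \frac{2}{\sigma_1} \int_{-\infty}^{0} \exp\left(\frac{x_i \theta}{\sigma^2}\right) \varphi\left(\sqrt{\frac{1}{\sigma_1^2} + \frac{1}{\sigma^2}}\; \theta\right) \Phi\left(\lambda \frac{\theta}{\sigma_1}\right) d\theta.$$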
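The two integrals have no simple closed form when λ ≠ 0, but they are easy to approximate numerically. The sketch below uses a composite Simpson rule on a truncated range; the truncation width, the number of subintervals, and the function names are our assumptions, and the integrands are read as $\exp(x\theta/\sigma^2)\,\varphi(a\theta)\,\Phi(\lambda\theta/\sigma_1)$ with $a = \sqrt{1/\sigma_1^2 + 1/\sigma^2}$.

```python
from math import exp, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal pdf and cdf


def _simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def _scale_and_range(x, sigma, sigma1):
    a = sqrt(1.0 / sigma1 ** 2 + 1.0 / sigma ** 2)
    # The integrand is negligible beyond roughly 12 standard deviations past its peak.
    width = abs(x) / (sigma ** 2 * a ** 2) + 12.0 / a
    return a, width


def _integrand(theta, x, sigma, sigma1, lam, a):
    return exp(x * theta / sigma ** 2) * _STD.pdf(a * theta) * _STD.cdf(lam * theta / sigma1)


def t_plus_sn(x, sigma, sigma1, lam):
    """Numerical approximation of T+(x) under the skew-normal prior."""
    a, width = _scale_and_range(x, sigma, sigma1)
    return 2.0 / sigma1 * _simpson(lambda th: _integrand(th, x, sigma, sigma1, lam, a), 0.0, width)


def t_minus_sn(x, sigma, sigma1, lam):
    """Numerical approximation of T-(x) under the skew-normal prior."""
    a, width = _scale_and_range(x, sigma, sigma1)
    return 2.0 / sigma1 * _simpson(lambda th: _integrand(th, x, sigma, sigma1, lam, a), -width, 0.0)


# Hypothetical parameter values for illustration.
print(round(t_plus_sn(2.0, 0.8, 1.5, 0.2), 4), round(t_minus_sn(-2.0, 0.8, 1.5, 0.2), 4))
```

For λ = 0 the integral can be done in closed form, which gives a convenient check of the numerical routine; for λ > 0 the value of T+ at x exceeds that of T− at −x, reflecting the positive skew of the prior.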

The sets $D_-^B$ and $D_+^B$ can be written as

$$D_-^B = \{i : x_i < -c_1\} \quad \text{and} \quad D_+^B = \{i : x_i > c_2\}$$

where $c_1 > 0$ and $c_2 > 0$ are determined as shown in Figure 1 by considering the points of intersection of $y = p/(1 - p)$ with $y = T_-(x)$ and of $y = p/(1 - p)$ with $y = T_+(x)$, respectively. Note that when λ > 0, the intersection point Q (as shown in the figure) will be to the left of x = 0, and when λ < 0, Q will be to the right of x = 0. Thus, when λ > 0, $c_1 > c_2$, and the opposite is true when λ < 0. When λ = 0, $T_-(x) = T_+(-x)$ and thus $c_1 = c_2$. If λ → ∞, $T_-(x) \to 0$ and thus $D_-^B$ is an empty set, which is equivalent to a one-tailed test. As discussed in Section 3, the procedure based on Eq. (13) by itself does not control BDFDR. However, $c_1$ and $c_2$ can be further shrunk so that the resulting procedure achieves BDFDR ≤ α; see [12] for details.

Figure 1. Graph of $T_+(x)$ and $T_-(x)$ with cutoff values $-c_1$ and $c_2$ such that $T_+(x) \ge \frac{p}{1-p}$ and $T_-(x) \ge \frac{p}{1-p}$.

To illustrate the above procedure, and to compare it with the standard FDR procedure (BY) of [8], which selects the direction based on the sign of the test statistic, we consider the HIV data described in [14]. For a detailed analysis, see [12]; here, we describe the analysis very briefly. The data consist of eight microarrays, four from cells of HIV-infected subjects and four from uninfected subjects, each with expression levels of 7680 genes. For each gene, we obtained a two-sample t-statistic comparing the infected with the uninfected subjects, which was then transformed to a z-value, $z_i = \Phi^{-1}\{F_6(t_i)\}$. Here, $F_6(\cdot)$ denotes the cumulative distribution function (cdf) of the t-distribution with six degrees of freedom. Figure 2 shows the histogram of the z-values with a skew-normal fit. Although the null distribution of $Z_i$ should be N(0, 1), we use, as suggested in [11], the null distribution $N(-0.11, 0.75^2)$. Thus, we formulate our problem as testing hypotheses (7) with test statistics $Z_i \sim N(-0.11 + \theta_i, 0.75^2)$.
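The transformation from a t-statistic to a z-value can be sketched with the Python standard library alone. The closed-form expression used for the t cdf is the standard trigonometric series for an even number of degrees of freedom, specialized to six; it is our choice for this illustration rather than something taken from the chapter, and the function names are hypothetical.

```python
from math import atan, cos, sin, sqrt
from statistics import NormalDist


def t6_cdf(t: float) -> float:
    """cdf of the t-distribution with 6 degrees of freedom (closed form for even df)."""
    theta = atan(t / sqrt(6.0))
    c2 = cos(theta) ** 2
    return 0.5 + 0.5 * sin(theta) * (1.0 + c2 / 2.0 + 3.0 * c2 * c2 / 8.0)


def z_from_t(t: float) -> float:
    """z = Phi^{-1}(F6(t)); inv_cdf raises an error if F6(t) rounds to 0 or 1."""
    return NormalDist().inv_cdf(t6_cdf(t))


for t in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(t, round(z_from_t(t), 3))
```

Because the t-distribution has heavier tails than the normal, the z-values are pulled toward zero relative to the t-statistics, while the sign and the ordering are preserved.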

Figure 2. Histogram of the HIV data with cutoff points by BY and the Bayes method under skew-normal prior.

The BY procedure resulted in cutoffs (−3.94, 3.94), which gave 18 total discoveries, with two genes declared under-expressed and 16 over-expressed. For the constrained Bayes rule, we first used the EM algorithm to obtain the parameter estimates p̂ = 0.9, σ̂ = 0.79, σ̂1 = 1.54, and λ̂ = 0.22. The Bayes procedure ended up with cutoff points (−2.82, 2.70) and a total of 86 discoveries (under-expressed genes: 23; over-expressed genes: 63). Note that the number of discoveries by the Bayes rule is much higher than by the BY procedure.

5. Concluding remarks

There are many different methods of testing multiple hypotheses. The methodology, however, depends on the criterion we choose. When the dimension of the multiple hypotheses is not very high, the familywise error rate (FWER) is an appropriate criterion, which safeguards against making even one false discovery. However, when the dimension of the multiple hypotheses is very high, the FWER is not very useful; instead, a false discovery rate (FDR) criterion is a good approach. Although FDR was originally defined as a frequentist’s concept, it can be re-interpreted in a Bayesian framework. The Bayesian framework brings many advantages. For example, a decision-theoretic formulation is easy to implement, directional hypotheses are easy to handle, and skewness in the alternatives is easy to accommodate. A drawback is that we need to make an assumption about the prior distributions under the alternatives. Some work has been done based on nonparametric priors; however, much more work is needed.

References

1 - Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B. 1995;57(1):289-300
2 - Holm S. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics. 1979;6(2):65-70
3 - Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Annals of Statistics. 2001;29(4):1165-1188
4 - Efron B, Tibshirani R, Storey JD, Tusher V. Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association. 2001;96(456):1151-1160
5 - Storey JD. A direct approach to false discovery rates. Journal of the Royal Statistical Society B. 2002;64(3):479-498
6 - Storey JD. The positive false discovery rate: A Bayesian interpretation and the q value. The Annals of Statistics. 2003;31(6):2013-2035
7 - Shaffer JP. Multiplicity, directional (Type III) errors, and the null hypothesis. Psychological Methods. 2002;7(3):356-369
8 - Benjamini Y, Yekutieli D. False discovery rate controlling confidence intervals for selected parameters. Journal of the American Statistical Association. 2005:71-80
9 - Bansal NK, Miescke KJ. A Bayesian decision theoretic approach to directional multiple hypotheses problems. Journal of Multivariate Analysis. 2013:205-215
10 - Bansal NK, Jiang H, Pradeep P. A Bayesian methodology for detecting targeted genes under two related experiments. Statistics in Medicine. 2015;34(25):3362-3375
11 - Efron B. Correlation and large-scale simultaneous significance testing. Journal of the American Statistical Association. 2007:93-103
12 - Bansal NK, Hamedani GG, Maadooliat M. Testing multiple hypotheses with skewed alternatives. Biometrics. 2016;72(2):494-502
13 - Azzalini A. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics. 1985;12(2):171-178
14 - van’t Wout AB, Lehrman GK, Mikheeva SA, O’Keeffe GC, Katze MG, Bumgarner RE, Mullins JI. Cellular gene expression upon human immunodeficiency virus type 1 infection of CD4+-T-cell lines. Journal of Virology. 2003;77(2):1392-1402