On Statistical Assessments of Racial/Ethnic Inequalities in Cigarette Purchase Price among Daily Smokers in the United States: Non-Hispanic Whites Pay Least

We discuss statistical methods suitable for comparing multiple populations versus one reference population and consider two common problems: (1) detecting all significant mean differences and (2) demonstrating that all mean differences are significant. The methods discussed include the Bonferroni approach (both problems), the Min test (problem 2), and the Strassburger-Bretz-Hochberg (SBH) confidence interval for estimating the smallest mean difference (problem 2). We illustrate the methods using the pooled 2010–2015 Tobacco Use Supplement to the Current Population Survey (TUS-CPS) data on the cigarette purchase price (per pack) reported by adult daily smokers (n = 34,728). The goal was to show that among seven considered racial/ethnic groups of daily smokers, non-Hispanic (NH) Whites paid the least for cigarettes (on average). We used design-based multiple linear regression to derive the estimates and raw p-values. The Min test supported the study goal. Likewise, the SBH lower 95% confidence interval bound was $0.08, indicating that the other racial/ethnic groups of daily smokers paid at least eight cents more for a pack of cigarettes (on average) than did non-Hispanic Whites. However, the Bonferroni method (which was originally proposed for problem 1) failed to support the study goal. The study highlights the importance of choosing the right statistical method for a given problem.


Introduction
In this chapter, we discuss statistical methods for comparing multiple populations relative to one population (termed "reference"). These types of multiple comparisons commonly arise in behavioral science, for example, when multiple racial/ethnic groups are compared to non-Hispanic (NH) White smokers in terms of tobacco-use-related behaviors [1][2][3][4]. When the statistical parameter of interest is the mean difference, the study goal is most commonly one of the following two. Goal 1 is to detect all significant mean differences among the considered populations (versus the reference population), that is, to draw an individual conclusion regarding significance of each mean difference. Goal 2 is to demonstrate that all mean differences among the considered ones are significant. Note that if one assesses Goal 1 and concludes that each mean difference is significant, then one has (indirectly) assessed Goal 2 as well. Other more intricate study goals, such as the ones arising in pharmaceutical statistics which involve a hierarchical structure among the primary and secondary end points, were addressed elsewhere and are outside of the scope of this chapter [5][6][7][8][9][10][11].
We discuss how Goals 1 and 2 can be assessed in a study of racial and ethnic disparities, where the Hispanic (H) population and five non-Hispanic populations, namely American Indian/Alaska Native (AIAN), Asian (ASIAN), Black/African American (BAA), Hawaiian/Pacific Islander (HPI), and Multiracial (MULT), are compared to the non-Hispanic White (W) population in terms of the mean differences. Suppose the overall error rate for assessing Goal 1 (Goal 2) is fixed at level α. Then to assess Goal 1 (to detect all significant mean differences), we should first rescale each p-value $p_i$, for example, via the Bonferroni, Holm, or Hochberg approaches [6,12,13,14]. This rescaling is essential to control the overall error rate at the nominal level α. For example, in our case with six null hypotheses, the p-values rescaled via the Bonferroni method are given as $6p_i$ (i.e., we multiply each original p-value by six). Second, we compare each rescaled p-value with α. If the rescaled p-value is at most α, then we reject $H_{0i}$ and conclude that the ith difference is significant (positive); if the rescaled p-value exceeds α, then we accept $H_{0i}$ and conclude that the ith difference is not significant. As a result, we draw an individual conclusion regarding significance of each mean difference. As an alternative to the above hypothesis testing, one could construct the lower $100(1 - \alpha/6)\%$ confidence intervals for the mean differences in Eq. (1) and use the lower bounds to differentiate between the significant and insignificant mean differences; this approach was discussed elsewhere [15].
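The rescaling step for Goal 1 can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function names and the six raw p-values are hypothetical, and only the Bonferroni and Holm adjustments are shown (Hochberg is omitted).

```python
def bonferroni(p_values):
    """Multiply each raw p-value by the number of hypotheses, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1);
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def significant(adjusted_p_values, alpha=0.05):
    """Individual decision for each comparison (Goal 1)."""
    return [p <= alpha for p in adjusted_p_values]


# Hypothetical raw p-values for six comparisons versus the reference group.
raw = [0.001, 0.004, 0.012, 0.020, 0.030, 0.200]

print(significant(bonferroni(raw)))
print(significant(holm(raw)))
```

With these made-up inputs, Holm flags one more comparison than Bonferroni, which reflects the general fact that Holm is never less powerful than Bonferroni while still controlling the overall error rate.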
To assess Goal 2 (to demonstrate that all differences are significant), one can use the Min test, which is an intersection-union test [16][17][18][19][20][21]. The p-value for the Min test, denoted by $p$, is simply the largest p-value among the six individual p-values: $p = \max\{p_1, \ldots, p_6\}$. If $p \le \alpha$, then we reject the null hypothesis and conclude that all mean differences are significant; if $p > \alpha$, then we accept the null hypothesis and conclude that there is at least one insignificant mean difference. Note that we cannot comment on the significance of an individual mean difference, because we tested whether all mean differences are significant (i.e., whether the smallest mean difference is significant). Nonetheless, the Min test is more suitable for assessing Goal 2 than the Bonferroni approach or another approach proposed for assessing Goal 1. A statistical method is usually proposed for a specific problem, and thus the methods should be used accordingly: the union-intersection hypothesis (Goal 1) should be tested via Bonferroni or another union-intersection test, while the intersection-union hypothesis (Goal 2) should be tested via the Min or another intersection-union test [12,22].
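As a minimal sketch of the Min test decision rule described above, the following Python function takes the raw (unadjusted) individual p-values and returns the Min test p-value together with the Goal 2 decision. The function name and the example p-values are hypothetical.

```python
def min_test(p_values, alpha=0.05):
    """Min test for Goal 2.

    The test p-value is the largest raw p-value; no multiplicity
    rescaling is applied. Returns (p_value, all_significant).
    """
    p = max(p_values)
    return p, p <= alpha


# Hypothetical raw p-values for six comparisons versus the reference group.
raw = [0.001, 0.004, 0.012, 0.020, 0.030, 0.040]

p, all_significant = min_test(raw)
print(p, all_significant)
```

Note that the result is a single decision about all comparisons jointly; it says nothing about which individual differences are significant when the test fails to reject.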
As an alternative to the Min test, we can use the Strassburger-Bretz-Hochberg (SBH) confidence interval approach as follows [23,24]. First, we compute the lower $100(1 - \alpha)\%$ confidence intervals for the mean differences in Eq. (1). Let the lower bounds of these intervals be denoted as $L_1, \ldots, L_6$. Second, let $L$ denote the smallest bound among these bounds, that is, $L = \min\{L_1, \ldots, L_6\}$. Then the SBH lower $100(1 - \alpha)\%$ confidence interval for the smallest mean difference is given by $(L, \infty)$. If $L > 0$, we conclude that all mean differences are significant, and if $L \le 0$, then we conclude that there is at least one insignificant mean difference.
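The SBH procedure reduces to taking the minimum of the individual lower bounds. The sketch below assumes the individual lower confidence bounds have already been computed at the unadjusted level; the function name and the six bounds are hypothetical.

```python
import math


def sbh_lower_interval(lower_bounds):
    """SBH lower confidence interval for the smallest mean difference.

    `lower_bounds` holds the individual lower confidence bounds, each
    computed at the same unadjusted level. Returns the interval
    (L, infinity) and whether it lies entirely above zero.
    """
    smallest_bound = min(lower_bounds)
    return (smallest_bound, math.inf), smallest_bound > 0


# Hypothetical individual lower bounds (in dollars) for six comparisons.
bounds = [0.35, 0.22, 0.41, 0.18, 0.12, 0.05]

interval, all_significant = sbh_lower_interval(bounds)
print(interval, all_significant)
```

Unlike the Min test p-value, the interval also quantifies how far above zero the smallest mean difference is estimated to be.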
We note that one needs to identify the appropriate statistical method to compute the individual p-values and confidence bounds. The choice depends on the study design, probability distributions, and other statistical considerations. The Min test and the SBH interval were discussed for parallel and factorial designs, where sample mean responses followed normal distributions with known variances or unknown (common) variance, as well as Binomial and several other distributions [20,21,23,24,25,26,27]. In addition, one needs to decide whether the analyses should adjust for explanatory factors, for example, sociodemographic characteristics [28][29][30]. Such adjustments may help reduce the effect of confounding factors and, therefore, improve estimation [31,32]. For example, Golden et al. examined how much smokers pay for a pack of cigarettes, on average, in the United States using data from the 2010-2011 Tobacco Use Supplement to the Current Population Survey (TUS-CPS) [1]. Among several design-based multiple linear regression models for the mean purchase price per pack (PPP) used in the study, one model adjusted for smokers' sociodemographic and smoking-related characteristics, cigarette purchase attributes, and the survey wave [1].
Despite the availability and benefits of the Min test and SBH interval, these methods have not received much attention in behavioral sciences. We illustrate the benefits of using these methods over the Bonferroni method and the simplicity of applying them. We consider a study of racial and ethnic disparities in cigarette purchase prices conducted to demonstrate that W daily smokers, on average, purchase cigarettes at lower prices than do AIAN, ASIAN, BAA, H, HPI, and MULT daily smokers in the United States. This goal was motivated by results of a prior study revealing that BAA, H, and ASIAN/HPI (ASIAN and HPI combined) smokers paid higher PPP, on average, relative to W smokers in the United States in the period from 2010 to 2011 [1].

Using data to derive the p-values and lower confidence interval bounds
We used the pooled 2010-2011 and 2014-2015 TUS-CPS data for adult daily smokers (n = 34,728) who reported the price of the last self-purchased pack or carton of cigarettes. The reported prices were used to compute the (average) PPP. The overall cohort was representative of about 23,370,261 adult daily smokers, where 12% were 18-24 years old, 38% were 25-44 years old, and 50% were 45+ years old, and 54% were men and 47% were women. The racial/ethnic representation was as follows: 76% were W, 11% were BAA, 8% were H, 2% were MULT, 2% were ASIAN, 1% were AIAN, and less than 1% were HPI. All racial/ethnic groups were well represented in the sample: the smallest number of respondents (96) corresponded to HPI daily smokers. Additional sample characteristics have been described in a prior study of purchasing cigarettes on Indian reservations [33].
We fixed the overall error rate at α = 5% and fitted a design-based multiple linear regression (R² ≈ 30%, F(25, 160) ≈ 257, p < 0.0001) to model the mean PPP as a function of daily smokers' characteristics, location of the purchase (on/off Indian reservation), survey mode (phone, in-person), and survey period (2010-2011, 2014-2015). The daily smokers' characteristics included race/ethnicity, age, sex, marital status, education, employment record, region of residency (West, South, Midwest, and Northeast), metropolitan area of residency (metro, nonmetro), and heavy smoking indicator. The analysis incorporated statistical methods recommended in the methodological guidelines for analysis of the CPS and CPS supplements [34,35]. Specifically, because the CPS incorporates complex sampling, we estimated variance using balanced repeated replications [36]. The main and 160 replicate weights for this approach have been made available for public use by the U.S. Census Bureau [34,35]. The analysis was performed using SAS®9.4 software [37]; the SAS®9.4 Survey Package procedures suitable for analysis of TUS-CPS have been discussed elsewhere [38]. Table 1 depicts the estimated model coefficients and their standard errors for all covariates. As is shown in Table 1, smokers' sex and survey mode (phone, in-person) were not significant.
The individual lower 95% confidence interval bounds for the mean PPP difference for each racial/ethnic population of daily smokers relative to W daily smokers were computed using the formula: $L_i = \hat{d}_i - t_{0.95,160}\,\mathrm{SE}(\hat{d}_i)$, where $\hat{d}_i$ is the estimated ith mean PPP difference, $\mathrm{SE}(\hat{d}_i)$ is its standard error, and $t_{0.95,160}$ is the 95th percentile of the central t-distribution with 160 degrees of freedom (the number of degrees of freedom matches the number of the replicate weights) [34][35][36]. We note that there are alternative methods to construct the lower bounds, for example, using the standard normal distribution instead of the central t-distribution [34,36]. "upper," and "alpha = 0.05" options) when fitting the model using SAS software.
Alternatively, we could use the lsmeans statement (with "adj = bon," "cl," and "alpha = 0.1" options), and select the comparisons of interest out of all 21 pair-wise comparisons reported and note the lower bound of the two-sided 90% confidence interval reported in the output.
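For readers working outside SAS, the lower-bound computation (estimate minus critical value times standard error) can be sketched as follows. This is an illustration under stated assumptions: the estimate and standard error are hypothetical, the t critical value is hard-coded as an approximate rounded value for 160 degrees of freedom, and the function name is made up for this example. The normal-based alternative mentioned above is shown for comparison.

```python
from statistics import NormalDist

# Approximate 95th percentile of the central t-distribution with 160
# degrees of freedom (assumed, rounded to three decimals).
T_95_DF160 = 1.654

# 95th percentile of the standard normal distribution (about 1.645).
Z_95 = NormalDist().inv_cdf(0.95)


def lower_bound(estimate, std_error, critical_value=T_95_DF160):
    """One-sided lower confidence bound: estimate - critical value * SE."""
    return estimate - critical_value * std_error


# Hypothetical estimated mean PPP difference (dollars) and standard error.
estimate, std_error = 0.30, 0.10

print(lower_bound(estimate, std_error))        # t-based bound
print(lower_bound(estimate, std_error, Z_95))  # normal-based bound
```

Because the t critical value is slightly larger than the normal one, the t-based bound is slightly smaller (more conservative); with 160 degrees of freedom the two differ only in the third decimal place here.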

Demonstrating the study goal via the Min test and SBH confidence interval
The p-value for the Min test is p = 0.0087, indicating that at the 5% significance level we reject the null hypothesis in favor of the alternative. The corresponding SBH lower 95% confidence interval bound for the smallest mean PPP difference is $0.08 (see Figure 1). Therefore, all six racial/ethnic groups of daily smokers paid, on average, higher PPP relative to W daily smokers in the United States in the periods from 2010-2011 to 2014-2015.
If instead of the Min test we used the Bonferroni approach, then the adjusted p-values would be less than 0.0006 for four comparisons (AIAN versus W, ASIAN versus W, BAA versus W, and H versus W), 0.0012 for one comparison (HPI versus W), and 0.0522 for one comparison (MULT versus W). Therefore, we would conclude that only AIAN, ASIAN, BAA, H, and HPI daily smokers pay higher PPP, on average, than do W daily smokers, and we would fail to demonstrate that all six considered racial/ethnic groups of daily smokers pay higher PPP, on average, relative to W daily smokers.
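The arithmetic behind the two conflicting conclusions can be checked directly from the figures reported above. The snippet below is only a worked check using the reported Min test p-value (the largest raw p-value) and the number of comparisons; the variable names are illustrative.

```python
ALPHA = 0.05
NUM_COMPARISONS = 6

# Reported Min test p-value, i.e., the largest of the six raw p-values.
p_min_test = 0.0087

# Bonferroni rescaling of that same raw p-value; this is consistent with
# the adjusted value of 0.0522 reported for MULT versus W.
p_bonferroni_largest = NUM_COMPARISONS * p_min_test

min_test_supports_goal = p_min_test <= ALPHA
bonferroni_supports_goal = p_bonferroni_largest <= ALPHA

print(round(p_bonferroni_largest, 4))
print(min_test_supports_goal, bonferroni_supports_goal)
```

The same raw p-value thus leads to rejection under the Min test and to non-rejection under Bonferroni, because only the latter multiplies it by the number of comparisons.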

Discussion
The choice of the reference group as "W daily smokers" was based on the study goal and prior studies of cigarette purchasing behaviors of smokers [1,33]. The choice of the reference group as well as the statistical methods should always align with the study goal and should be made prior to the data analysis. Specifically, when examining racial/ethnic disparities, using "W" as the reference group could be logical in some studies but not in others. For example, if the study goal is to show that purchasing cigarettes on Indian reservations is most prevalent among AIAN smokers, then "AIAN smokers" should be chosen as the reference group. In addition, while both the Bonferroni method and the Min test are simple to use in practice, only the Bonferroni method results in individual conclusions regarding each comparison. However, the Bonferroni method is less powerful than the Min test when applied to an intersection-union problem (to assess Goal 2) [6,12].
The study indicated that W daily smokers paid significantly less for cigarettes, on average, than the other six racial/ethnic groups of daily smokers in the United States in the period from 2010-2011 to 2014-2015. The earlier reported finding (see model 6 in [1]) was that non-Hispanic White smokers, on average, paid significantly less for cigarettes than did BAA, AIAN, ASIAN/HPI (combined), and H smokers, and paid prices similar to those paid by "other non-Hispanic" smokers [1]. While the results might seem to disagree, direct comparisons between these two findings are problematic, because the studies concerned different populations of smokers (daily smokers in our study, and daily and occasional smokers in the prior study) and time periods (overall 2010-2011 and 2014-2015 in our study, and 2010-2011 in the prior study). Moreover, the authors considered the union-intersection problem, which is conceptually different from the intersection-union problem addressed in our study, although they did not mention the method they used to adjust for multiple comparisons, if any [1].
Our study has several potential limitations. First, we considered the population of daily smokers, and thus, results should not be generalized to other populations of smokers such as occasional smokers. Indeed, daily and occasional smokers have very different cigarette purchasing behaviors; for example, daily smokers are more likely to purchase cigarettes in cartons rather than packs and to travel to another state or Indian reservations to purchase cigarettes at lower prices [1,39,40]. Second, the analysis was based on a certain regression model where the mean PPP was modeled as a function of smokers' characteristics, location of the purchase, survey mode, and survey period. Another model could potentially lead to a different conclusion; for example, only two out of six models indicated significantly higher mean PPP for AIAN smokers relative to W smokers [1]. Another potential limitation is the lack of a theoretical proof that the SBH interval for the smallest mean PPP difference indeed has a confidence level of $100(1 - \alpha)\%$. The probability coverage of the SBH confidence interval depends on the probability coverage of the individual confidence intervals for the mean differences [23]. Because we used the statistical methods outlined in the CPS methodological guidelines for constructing the individual intervals, we believe that the resulting SBH interval has probability coverage close to the $100(1 - \alpha)\%$ level. Future research may target development and implementation of procedures for the Min test and SBH interval. Specifically, the software packages developed for analysis of complex survey data currently offer just a few multiple comparison methods. For example, the SAS Survey Package offers a built-in procedure for Bonferroni adjustments but lacks procedures for the multiple testing (interval estimation) such as the Min test (SBH interval).
Availability of the "Min test" and "SBH interval" procedures would enable researchers to incorporate these methods directly in their analyses of complex survey data.

Conclusion
In our study, results of the Min test (and SBH interval) were different from the results of the Bonferroni method. Specifically, using the Min test (and SBH