Surveying Sensitive Topics with Indirect Questioning

Data reliability is a common concern, especially when asking about sensitive topics such as sexual misconduct, domestic violence, or drug and alcohol abuse. Sensitive topics might cause refusals in surveys due to the privacy concerns of the subjects. Unit nonresponse occurs when sampled subjects fail to participate in a study; item nonresponse occurs when sampled subjects do not respond to certain survey questions. Unit nonresponse reduces sample size and study power; it might also increase bias. Respondents who do participate, on the other hand, might answer sensitive questions in a manner that will be viewed favorably by others instead of answering truthfully. Social desirability bias (SDB) has long been recognized as a serious problem in surveying sensitive topics. Various indirect questioning methods have been developed to reduce SDB and increase data reliability, one of them being the randomized response technique (RRT). In this chapter, we review some of the important indirect questioning techniques proposed for binary responses, with a special focus on RRTs. We discuss the advantages and disadvantages of some of these techniques and describe some recent novel methods.


Introduction: surveying sensitive topics
Data reliability is a common concern across all studies that use surveys, but more so when asking sensitive questions. Sensitive questions involve sacred, private, or potentially exposing information that could be incriminating or discriminating for a respondent, or for the social group represented by the respondent [1]. For example, in studies which evaluate exposure to HIV infection, respondents are often asked sensitive questions regarding their opposite- or same-sex sexual practices. As another example, in studies which aim to assess substance use and abuse, respondents might suppress disclosure of their drug and alcohol misuse to avoid embarrassment or potentially harmful/unwanted consequences. Estimating the prevalence of such sensitive attributes is particularly important for health care researchers to build scientific knowledge, create necessary public health interventions, and develop political strategies.
Two problems typically arise when studying sensitive topics: (1) nonresponse rates increase, and (2) social desirability bias (SDB), defined as the tendency to answer questions in a socially acceptable fashion rather than truthfully, occurs. Nonresponse rates can be reduced by strategies such as using advance letters, offering incentives, using more experienced interviewers, and making the topic salient to respondents [2][3][4][5][6]. However, if these strategies are more effective for some subpopulations than for others, a reduction in nonresponse can in reality increase nonresponse bias. Statistical techniques can also be used to minimize the effects of unit nonresponse after the data are collected [7]; yet, none of these approaches can prevent SDB. Many researchers have suggested using self-administered modes such as mail, web, computer-assisted self-interviewing (CASI), audio computer-assisted self-interviewing (ACASI), telephone audio computer-assisted self-interviewing (T-ACASI), or touchtone data entry (TDE) in order to reduce SDB [8][9][10][11]. However, self-administered modes have their own drawbacks. For example, all self-administered surveys are known to be susceptible to producing low-quality data, since they lack the interviewer feedback that helps clarify questions when respondents do not understand them. As another example, it is known that computer-based surveys, such as CASI or ACASI, are mostly completed by younger, more computer-savvy respondents, which could introduce bias into estimates. T-ACASI, on the other hand, mainly suffers from high break-off rates. There are additional issues regarding the feasibility of utilizing T-ACASI in survey tools with the elderly [12]. In addition, when surveying disadvantaged populations, self-administered surveys might not be a viable option.
In fact, illiteracy, poor vision, respondent preference, or other reasons can cause self-administration not to occur: in a self-administered component of a computer-assisted personal survey, only 79% of CASI cases were actually fully self-administered [13]. For an extensive review of advantages and disadvantages of some of the common survey modes, see Smith and Kim [14].
An effective alternative way to improve response rates and prevent SDB simultaneously is to increase the perceived privacy of the respondents. If the respondents' privacy can be guaranteed, their tendency to refuse to participate and/or provide untruthful answers decreases. All indirect questioning techniques aim to achieve this goal via different approaches. In the next section, we review some of the indirect questioning techniques that have been developed to increase the perceived privacy of the respondents when the characteristic under study (the outcome) is binary in nature. Some of the techniques explained here have been extended to cases where the characteristic under study is quantitative or polychotomous in nature; here we focus only on binary outcomes, such as in yes/no questions. Note that all indirect questioning techniques have produced extensive research areas: they have all been modified and/or extended since they were first introduced; here we briefly summarize the most important ones and present their main aspects for conciseness.

Indirect questioning techniques
Several indirect surveying methodologies have been developed to increase respondents' confidentiality when the characteristic under study has a sensitive nature. Among them, the most commonly used ones for binary outcomes are the unmatched count technique (UCT), the network scale-up technique (NST), the nonrandomized response technique (NRRT), and the randomized response technique (RRT).
UCT, which is also called the item count technique, was first introduced by Raghavarao and Federer [15] under the name "block total response procedure," but it was formally developed by Miller [16] in the form that we use today. Since then, UCT has been applied by many researchers such as Miller et al. [17], LaBrie and Earleywine [18], Biemer and Brown [19], Wolter and Laier [20], Gervais and Najle [21], and so forth. UCT provides privacy by embedding a sensitive behavior (which is of interest) within several nonsensitive behaviors. Both the nonsensitive behaviors and the sensitive behavior should be binary (yes/no) outcomes. In applying the technique, survey participants are randomly divided into two groups. Individuals in one group are provided with a list of nonsensitive behaviors (say, k items), whereas individuals in the other group are provided with the same list of nonsensitive behaviors plus the one sensitive behavior that is under study (in total k + 1 items). Participants in both groups are asked to report the total number of activities that they have engaged in. The prevalence of the sensitive behavior is then estimated by calculating the difference between the two independent group means. Several researchers have modified and/or extended UCT in recent years, such as Tsuchiya [22], Chaudhuri and Christofides [23], Hussain et al. [24], Ibrahim [25], and so on. As an example of how to apply UCT, consider the study of Zimmerman and Langer [26] in which the technique was utilized to improve the prevalence estimates of marihuana use and of ever having had sex with a person of the same gender among the tenth grade students in the Miami-Dade County Public Schools.
In their study, the sample (1524 students) was randomly assigned to receive one of three lists on the survey: 33% of the respondents received a list with four nonsensitive items only; 33% received a list with the four nonsensitive items plus the sensitive item about sex with someone of the same gender; 33% received a list with the four nonsensitive items plus the sensitive item about ever using marihuana. Their UCT estimate of the prevalence of having sex with someone of the same gender was 80% higher than the prevalence from direct questioning (DQ) (15.9 vs. 8.7%); similarly, their UCT estimate of marihuana use was 21% higher than the prevalence from DQ (20.6 vs. 16.7%). Assuming that, for socially undesirable behaviors, higher prevalence estimates are more accurate than lower ones, Zimmerman and Langer [26] concluded that their results demonstrate the superiority of UCT over DQ.
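The UCT estimator is simply the difference between the two group means, with a standard error obtained from the two independent samples. A minimal sketch with simulated data (all counts and prevalences below are assumed for illustration, not taken from [26]):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated UCT data: each respondent reports only a total count.
# The control group sees k = 4 nonsensitive items; the treatment
# group sees the same 4 items plus the sensitive item.
true_prev = 0.16   # assumed true prevalence of the sensitive behavior
n = 2000           # respondents per group

base = rng.binomial(4, 0.5, size=2 * n)          # nonsensitive item counts
sensitive = rng.binomial(1, true_prev, size=n)   # sensitive item (treatment only)

control = base[:n]
treatment = base[n:] + sensitive

# UCT estimate: difference of the two independent group means.
prev_hat = treatment.mean() - control.mean()

# Standard error of a difference of two independent means.
se = np.sqrt(treatment.var(ddof=1) / n + control.var(ddof=1) / n)
print(f"estimated prevalence: {prev_hat:.3f} (SE {se:.3f})")
```

Note that each respondent reveals only a total, never which items were endorsed; this is the source of the privacy protection.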
The NST was first proposed by Bernard et al. [27] in order to estimate the size of a population at risk and was first used to get an estimate of the number of victims killed in the 1985 Mexico City earthquake [28]. The method was later refined and used to estimate the number of HIV-seropositive persons in the USA [29]. NST basically involves two steps: (1) the personal network size of the members of a random sample of a population is estimated, and (2) an estimate of the number of members of the hidden subpopulation is obtained using the information from step 1. The method heavily relies on the assumption that people's social networks on average are representative of the general population; for example, if respondents report knowing 500 people on average, five of whom are sex workers, we can estimate that 1% of the general population is sex workers. The estimated prevalence is then combined with known information about the size of the general population, say the population of the USA, to produce an estimate of the number of people in the USA who are sex workers (see Russell et al. [30] for more details on NST and its limitations).
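The scale-up arithmetic in this example can be written out directly; the population total below is an assumed round figure for the USA, used only for illustration:

```python
# Network scale-up arithmetic (illustrative numbers from the text;
# the US population total is an assumed round figure).
avg_network_size = 500       # people an average respondent knows
avg_known_in_group = 5       # of whom belong to the hidden subpopulation
general_population = 330_000_000

prevalence = avg_known_in_group / avg_network_size   # fraction in hidden group
subpop_size = prevalence * general_population        # scale up to the population

print(f"estimated prevalence: {prevalence:.1%}")
print(f"estimated subpopulation size: {subpop_size:,.0f}")
```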
NRRT was first introduced by Swensson [31] and later modified by Takahasi and Sakasegawa [32]. The main idea behind Swensson's NRRT was to combine a nonsensitive behavior with a sensitive behavior in the same question so that it would not be possible for the interviewer to know which behavior a "yes" answer refers to; therefore, respondents are provided with some level of privacy. Swensson's NRRT requires two independent samples to calculate the estimate of the sensitive characteristic; for this purpose, survey participants can be randomly divided into two groups. Let U be the nonsensitive behavior, such as being married, and let A be the sensitive behavior under study, such as abusing prescription drugs. Let us demonstrate the combined question using Table 1 given below.
Respondents in the first group receive the question "Do you belong to one of the groups a, b, or c in the table?" whereas respondents in the second group receive the question "Do you belong to one of the groups a, b, or d in the table?" In Table 1, A denotes the sensitive question ("Do you use prescription pain relievers without a doctor's prescription?") and U denotes the nonsensitive question ("Are you married?"). Denote the unknown proportion of the population members who have the sensitive characteristic A, i.e., the prevalence of the sensitive characteristic, by π_A. Realize that the probabilities of belonging to groups a, b, c, and d, written P(a), P(b), P(c), and P(d), sum to 1 and satisfy P(a) + P(b) = π_A. If λ_j denotes the probability of a "yes" response in sample j of size n_j, j = 1, 2, then the probability of getting a "yes" response from sample 1 becomes

λ_1 = P(a) + P(b) + P(c) = 1 - P(d) (1)

and the probability of getting a "yes" response from sample 2 becomes

λ_2 = P(a) + P(b) + P(d) = 1 - P(c). (2)

Since P(a) + P(b) = π_A and the four probabilities sum to 1, using Eqs. (1) and (2) we can write

λ_1 + λ_2 = 1 + π_A. (3)

From Eq. (3), π_A can be estimated using the proportions of "yes" responses λ̂_1 and λ̂_2, which are calculated from the samples as:

π̂_AS = λ̂_1 + λ̂_2 - 1. (4)

One can easily show that the estimator given by (4) is an unbiased estimator of π_A.

            U = Yes    U = No
A = Yes        a          b
A = No         c          d

Table 1. Groups a, b, c, and d defined by the joint answers to the sensitive question A and the nonsensitive question U.
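A minimal simulation sketch of Swensson's NRRT (all parameter values are assumed; independence of A and U is used only to generate the data, since the estimator itself does not require it):

```python
import numpy as np

rng = np.random.default_rng(1)

pi_a, p = 0.15, 0.40   # assumed prevalences of A (sensitive) and U (nonsensitive)
n1 = n2 = 2000         # sizes of the two independent samples

def yes_answers(n, yes_groups):
    """Simulate answers to 'Do you belong to one of groups ...?'"""
    a = rng.binomial(1, pi_a, n).astype(bool)   # has the sensitive trait A
    u = rng.binomial(1, p, n).astype(bool)      # has the nonsensitive trait U
    # Cells: a = (A yes, U yes), b = (A yes, U no), c = (A no, U yes), d = (A no, U no)
    cell = np.where(a, np.where(u, "a", "b"), np.where(u, "c", "d"))
    return np.isin(cell, yes_groups)

lam1 = yes_answers(n1, ["a", "b", "c"]).mean()   # sample 1 asks about a, b, c
lam2 = yes_answers(n2, ["a", "b", "d"]).mean()   # sample 2 asks about a, b, d

pi_a_hat = lam1 + lam2 - 1                       # the Swensson estimator
var_hat = lam1 * (1 - lam1) / n1 + lam2 * (1 - lam2) / n2
print(f"pi_A estimate: {pi_a_hat:.3f} (SE {np.sqrt(var_hat):.3f})")
```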
The variance of the estimator given in Eq. (4) is derived in a few steps:

Var(π̂_AS) = Var(λ̂_1) + Var(λ̂_2) = λ_1(1 - λ_1)/n_1 + λ_2(1 - λ_2)/n_2, (5)

which can be estimated by replacing λ_1 and λ_2 with the sample proportions λ̂_1 and λ̂_2. Swensson's NRRT was later modified by Takahasi and Sakasegawa [32] as follows: In the first stage, all respondents are asked a nonsensitive binary question, such as "If you have to choose between adopting a cat or a dog, which would you prefer?" but are directed not to report their answers to the interviewer (replies are silent). In the second stage, the sensitive behavior is combined with the previous nonsensitive question, and respondents are asked to reply in the format given below:
• If you are a dog person, and use prescription pain relievers without a doctor's prescription, say 0.
• If you are a dog person, and do not use prescription pain relievers without a doctor's prescription, say 1.
• If you are a cat person, and use prescription pain relievers without a doctor's prescription, say 1.
• If you are a cat person, and do not use prescription pain relievers without a doctor's prescription, say 0.
In this NRRT, to be able to obtain the estimates, the nonsensitive and sensitive behaviors need to be independent (U and A should not be related). As we explain in the next section, if the nonsensitive binary question's prevalence in the population is known, then Takahasi and Sakasegawa's NRRT is equivalent to Warner's model. If the nonsensitive binary question's prevalence in the population is not known, then this method requires the use of two independent samples. By relaxing the assumption of independence of the nonsensitive and sensitive behaviors, Takahasi and Sakasegawa [32] proposed an additional NRRT as well, where the nonsensitive behavior has three outcomes instead of two; as a result, in this NRRT, three independent samples are required instead of two independent samples. For a comprehensive review of NRRTs, interested readers are referred to Tian and Tang [33].
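The four reply instructions above collapse to a single rule: the reply is 1 exactly when cat-person status and user status coincide, so a single reply reveals neither private answer on its own. A small sketch of the coding:

```python
def reply(is_cat_person: bool, is_user: bool) -> int:
    """Second-stage reply under Takahasi and Sakasegawa's NRRT.

    Encodes the four instructions from the text: the reply is 1
    exactly when cat-person status and user status coincide.
    """
    return int(is_cat_person == is_user)

# The four instruction lines, in order:
assert reply(False, True) == 0   # dog person, user        -> say 0
assert reply(False, False) == 1  # dog person, not a user  -> say 1
assert reply(True, True) == 1    # cat person, user        -> say 1
assert reply(True, False) == 0   # cat person, not a user  -> say 0
print("coding matches all four instructions")
```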
Unlike the previous indirect questioning techniques, RRTs provide confidentiality by utilizing a randomization device, such as a deck of playing cards, a pair of dice, or a spinning game wheel, while asking the sensitive question. The interviewer does not observe which playing card is chosen or what numbers the dice show; thus, a respondent's privacy is guaranteed by the RRT process because the interviewer cannot know which statement the participant is responding to. In the next section, we review some of the main RRTs that have been proposed as well as some of the recently developed techniques.

Randomized response techniques
The first RRT was introduced by Warner [34]. Warner's RRT asks the sensitive question by providing respondents with a randomization device carrying two statements that appear with known probabilities θ and 1 - θ. For example, consider a spinning game wheel which is divided into two sectors with areas θ and 1 - θ. The first statement on the device possesses the sensitive characteristic, and the other statement is simply the complement of the first one. Let us suppose that the sensitive characteristic is being HIV+. The two statements on the wheel would read "I am HIV+" and "I am HIV-" (see Figure 1). The respondents spin the wheel in private and answer truthfully with a "yes" or "no" to the statement on which the arrowhead lands. Since the spinning process is not observed by the interviewer, this model assures privacy protection for the respondents [34].
Let the unknown proportion of the population members who have the sensitive characteristic A be denoted by π_A, where 0 ≤ π_A ≤ 1, and let

X_i = 1 if the ith respondent answers "yes" and X_i = 0 otherwise, i = 1, 2, …, n; (6)

then the probability of getting a "yes" response can be written as

P(X_i = 1) = θπ_A + (1 - θ)(1 - π_A) (7)

and, from Eq. (7), an unbiased estimator of π_A can be derived as

π̂_AW = (π̂ - (1 - θ))/(2θ - 1), where θ ≠ 0.5 and π̂ = Σ_{i=1}^{n} X_i/n.

The variance of Warner's estimator can be derived as [34]

Var(π̂_AW) = π_A(1 - π_A)/n + θ(1 - θ)/(n(2θ - 1)²). (8)

Now, let us compare Swensson's NRRT with Warner's model. For simplicity, let us assume that n_1 = n_2 = n/2 and that the prevalence of U (say, the proportion of married people in the population) is known from previous studies. Let us also assume that U and A are unrelated behaviors. In Swensson's NRRT, if we denote the prevalence of U by p, the probabilities of getting a "yes" response from samples 1 and 2 can be written as

λ_1 = π_A + (1 - π_A)p and λ_2 = π_A + (1 - π_A)(1 - p),

respectively, assuming that the nonsensitive and sensitive behaviors are independent. Since λ_1(1 - λ_1) + λ_2(1 - λ_2) = π_A(1 - π_A) + 2p(1 - p)(1 - π_A)², the variance of π̂_AS can be simplified as

Var(π̂_AS) = (2/n)[π_A(1 - π_A) + 2p(1 - p)(1 - π_A)²]. (9)

In order to compare Warner's RRT with Swensson's NRRT, we calculated theoretical relative efficiencies (REs) from Eqs. (8) and (9) for various combinations of π_A and θ = p values using a FORTRAN code and provide our results in Table 2. The FORTRAN code used is given in the Appendix.
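A minimal simulation sketch of Warner's procedure and its estimator (the prevalence, design probability, and sample size below are assumed values):

```python
import numpy as np

rng = np.random.default_rng(2)

pi_a, theta, n = 0.10, 0.30, 5000  # assumed prevalence, wheel probability, sample size

# Each respondent privately lands on one of the two statements:
# with probability theta the statement is "I am HIV+", otherwise
# "I am HIV-"; the respondent then answers truthfully yes/no.
has_a = rng.binomial(1, pi_a, n).astype(bool)
statement_is_a = rng.binomial(1, theta, n).astype(bool)
says_yes = np.where(statement_is_a, has_a, ~has_a)

# Invert P(yes) = theta*pi_A + (1 - theta)*(1 - pi_A) for pi_A.
p_hat = says_yes.mean()
pi_a_hat = (p_hat - (1 - theta)) / (2 * theta - 1)

# Warner's theoretical variance of the estimator.
var = pi_a * (1 - pi_a) / n + theta * (1 - theta) / (n * (2 * theta - 1) ** 2)
print(f"pi_A estimate: {pi_a_hat:.3f} (theoretical SE {np.sqrt(var):.4f})")
```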
One can conclude from Table 2 that as θ approaches 0.5, Warner's estimator π̂_AW loses its efficiency. The reason for this efficiency loss is the fact that as θ → 0.5, the variance in Eq. (8) → ∞. However, keep in mind that small θ values are not preferable in Warner's model since they do not provide sufficient privacy protection [33,35]. In order to further investigate how Warner's RRT competes with Swensson's NRRT, we also considered the case where θ ≠ p and calculated the theoretical REs as given above, with π_A set to 0.1, 0.2, and 0.3. The results are provided in Table 3. Realize that we only present the RE values for 0.05 ≤ p ≤ 0.5 and 0.05 ≤ θ ≤ 0.45, since the variances in Eqs. (8) and (9), and hence the REs, are symmetric about 0.5 in both θ and p. The FORTRAN code used for Table 3 can be obtained from the author upon request.
One can observe from Table 3 that Warner's model is more efficient than Swensson's NRRT only for small θ values. One can also observe that the REs become higher as π_A increases, except for some special cases where p ≤ 0.25 and θ ≥ 0.10. Note that in both Tables 2 and 3, the REs → ∞ when θ = 0.5.
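The RE computation behind Tables 2 and 3 is a straightforward ratio of the two variance formulas and can be reproduced in a few lines; the sketch below is in Python rather than FORTRAN, and it assumes RE is defined as Var(π̂_AW)/Var(π̂_AS), so that values below 1 mark where Warner's model is more efficient (the direction of the ratio is our assumption, since the tables themselves are not reproduced here):

```python
import numpy as np

def var_warner(pi_a, theta, n):
    """Variance of Warner's estimator."""
    return pi_a * (1 - pi_a) / n + theta * (1 - theta) / (n * (2 * theta - 1) ** 2)

def var_swensson(pi_a, p, n):
    """Variance of Swensson's estimator under independence, with n1 = n2 = n/2."""
    return (2 / n) * (pi_a * (1 - pi_a) + 2 * p * (1 - p) * (1 - pi_a) ** 2)

n = 1000  # the sample size cancels in the ratio
for pi_a in (0.1, 0.2, 0.3):
    res = [var_warner(pi_a, th, n) / var_swensson(pi_a, th, n)  # theta = p case
           for th in np.arange(0.05, 0.50, 0.05)]
    print(f"pi_A = {pi_a}: " + "  ".join(f"{re:6.2f}" for re in res))
```

Under this definition, small θ is indeed the region where Warner's model wins: for example, at π_A = 0.1 the ratio is below 1 when θ = p = 0.05 and far above 1 when θ = p = 0.45.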
When the nonsensitive binary question's prevalence in the population is known (say, the prevalence of being a dog person is 1 - θ, and the prevalence of being a cat person is θ), Takahasi and Sakasegawa's NRRT becomes equivalent to Warner's model. To see this, let the unknown proportion of the population members who have the sensitive characteristic A be denoted by π_A, where 0 ≤ π_A ≤ 1, and let X_i (i = 1, 2, …, n) be defined as in Eq. (6). If θ is known, the probability of getting a "1" response from Takahasi and Sakasegawa's NRRT can be written as

P(X_i = 1) = θπ_A + (1 - θ)(1 - π_A), (10)

which is equal to the probability of getting a "yes" response from Warner's model, i.e., Eq. (7), and thus provides the same estimator when solved for π_A.
There have been many different RRTs developed since Warner's original method, such as those in Greenberg et al. [36,37], Gupta [38], Gupta et al. [39][40][41], Yu et al. [42], Sihm et al. [43], Gupta and Shabbir [44], and so on. Efforts have specifically been made to improve the efficiency of the technique by reducing the variance, and thus the confidence intervals, because the primary disadvantage of Warner's model, and in fact of all RRTs, is that the variances of the estimators are higher than the ones that could be obtained from DQ [35,45]; for a comprehensive review of RRTs, interested readers are referred to Chaudhuri and Mukerjee [35], Chaudhuri [46], and Chaudhuri and Christofides [47]. In the next subsection, we review some real-life applications of RRTs and their comparisons with other surveying techniques.

Applications of RRTs
Two meta-analyses of 6 validation and 32 comparative studies which utilized RRTs showed that in various settings, the RRT results are superior to those from DQ and become more valid as the sensitivity of a topic increases [48].
The benefits of the RRTs have also been demonstrated by many statistical methodology researchers via theorems and simulation studies; however, their use in large or national surveys has been somewhat limited. In fact, to our knowledge, the only study which applied an RRT on a national level was done by Kirtadze et al. [49] in the country of Georgia. Kirtadze et al. [49] used multistage cluster sampling and surveyed 4805 respondents to assess under-reporting of drug abuse in the Republic of Georgia. They utilized the unrelated question RRT to ask questions such as "During the last 12 months, have you taken hashish or marihuana?" They found that all RRT estimates for the prevalence of controlled substance use were higher than the DQ estimates, which indicates under-reporting with DQ. For example, the lifetime cannabis use estimate was 88.24% higher from RRT than from DQ. Kirtadze et al. [49], however, did not use a gold standard such as urinalysis and thus did not know the "true" prevalence of illegal drug use in the Republic of Georgia's study population.

Table 3. Relative efficiencies of the estimators from Warner's model and Swensson's NRRT for various θ and p values when π_A = 0.1, 0.2, 0.3. The shaded cells show the cases where Warner's model is more efficient than Swensson's model.
Although not nationwide, there have been other researchers who incorporated RRTs in their surveys. Fisher et al. [50], for example, used the forced RRT to estimate the prevalence of substance use and sexual activity among high school students enrolled in their clinic. They compared their results with those from a nonanonymous questionnaire completed by the same students earlier in the same academic year. While the RRT provided higher rates for the substance use-related questions compared to DQ, it provided similar rates for the sexual activity-related questions (36% from DQ vs. 31% from RRT). Fisher et al. [50] concluded that admitting to sexual activity in the school setting might carry less stigma and perceived risk than admitting to marihuana or cocaine use [50]. We suggest, however, that this result might indicate that the high school students who participated in the study overstated their sexual behavior when asked directly in order to live up to peer expectations; in other words, we suggest that the forced RRT corrected the overreporting.
In another study, Srivastava et al. [51] used Warner's RRT to assess the extent of sexual abuse among children in several districts of the Uttar Pradesh state of India. They found that the estimates from RRT were higher than the national estimates obtained from the Ministry of Women and Child Development, Government of India, which is an indicator of potential under-reporting with DQ.
In a more recent study, Chhabra et al. [52] used convenience sampling and asked the question "Have you ever been a victim of sexual abuse by a friend or family member?" to 585 students at a college in Delhi, India. They divided their sample into three equal randomly selected groups and asked the sexual abuse-related question using (1) DQ, (2) the RRT proposed by Sihm et al. [43], and (3) the confidential method, which served as their gold standard for comparing the results. The prevalence of sexual abuse was 14% with the gold standard, 8% with the DQ method, and 12% with the RRT. Note that their confidential method, however, was itself a surveying technique in which the participants wrote down their answers and put them into a closed box.

Inflated variance
Although all RRTs lead to unbiased (i.e., accurate) estimates of the sensitive characteristic of interest, their variances are larger than those from the DQ technique. Thus, the price for using an RRT instead of the DQ is the inflated variance, which is due to the randomization process. Consequently, if the question of interest is not considered to be really sensitive by most of the respondents in a specific population, using an RRT instead of the DQ inflates the variance of the estimates unnecessarily. Besides, it is known that there are cultural or social differences in the extent to which topics are perceived as sensitive. For example, smoking marihuana is a less threatening topic in the Netherlands than in the USA, and questions regarding education are considered to be sensitive in Sweden [1]. Similarly, questions regarding HIV status or sexual practices might not be considered sensitive by patients who visit an HIV clinic for treatment. Unfortunately, once an RRT is incorporated within a survey, even if the question of interest turns out not to be sensitive after the data are collected, one has to proceed with the inefficient estimate obtained from the RRT. Motivated by the idea that researchers who do not have a priori knowledge about the sensitivity level of a question in a specific population should be able to select between the estimators from an RRT and the DQ, Ardah and Oral [45] proposed a novel two-stage RRT for a binary response, utilizing Warner's RRT in the second stage. Their model provides unbiased estimators of both the prevalence of the sensitive characteristic and the proportion of cheating in the population simultaneously, and it allows one to obtain better estimates by avoiding the unnecessary penalty, i.e., the inflated variance, if the question of interest turns out not to be significantly sensitive.
Although there are some other RRTs which can estimate the sensitivity level of a question, such as in Gupta [38], none of these RRTs have the ability to avoid the inflated variance if the question's sensitivity level turns out to be low.
Ardah and Oral [45] denoted the unknown true proportion of population members that have the sensitive characteristic A by π_A, where 0 ≤ π_A ≤ 1. They showed that, in a self-protective no-saying model (i.e., when a respondent cheats, s/he always answers in favor of the least stigmatizing category (see [53])), if one uses the DQ, the proportion of respondents who have the sensitive characteristic A becomes

π_A = π_D + π_C, (11)

where π_D is the proportion of respondents who reply with a "yes" to the sensitive question when asked directly and π_C is the proportion of respondents who cheat (i.e., the proportion of respondents who reply with a "no" to the sensitive question even though they have the characteristic A). Realize that in Eq. (11), π_C is the SDB, and if all respondents answer truthfully in the DQ, i.e., if π_C = 0, then π_A = π_D. Unfortunately, the SDB, i.e., the proportion of cheaters, cannot be measured using the DQ. Ardah and Oral's [45] proposed model, however, allows one both to estimate it and to select the unbiased estimator with the minimum variance depending on the sensitivity level of the question (or the proportion of cheating in the population). Ardah and Oral [45] utilized the unrelated question RRT in the second stage of their model as well; in fact, their model can easily be extended to any desired RRT [45].
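A small simulation sketch of the self-protective no-saying model (the prevalence and cheating rate below are assumed), illustrating that the DQ "yes" proportion estimates π_D = π_A - π_C rather than π_A, exactly as Eq. (11) states:

```python
import numpy as np

rng = np.random.default_rng(3)

pi_a = 0.20        # assumed true prevalence of the sensitive characteristic A
cheat_rate = 0.40  # assumed fraction of carriers who answer "no" under DQ
n = 10_000

has_a = rng.binomial(1, pi_a, n).astype(bool)
cheats = has_a & rng.binomial(1, cheat_rate, n).astype(bool)  # no-saying cheaters

# Under DQ, cheaters deny the characteristic despite having it.
says_yes = has_a & ~cheats

pi_d_hat = says_yes.mean()   # what DQ actually estimates (pi_D)
pi_c_hat = cheats.mean()     # the SDB (pi_C), unobservable in a real DQ survey

# The two pieces always add up to the sample proportion carrying A.
print(f"pi_D = {pi_d_hat:.3f}, pi_C = {pi_c_hat:.3f}, "
      f"pi_D + pi_C = {pi_d_hat + pi_c_hat:.3f} (sample pi_A: {has_a.mean():.3f})")
```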

Conclusion
Numerous indirect questioning techniques have been developed to ask survey questions on sensitive topics or stigmatizing characteristics. All indirect questioning techniques have their own limitations: In applying the UCT, only one of the groups provides information on the sensitive characteristic of interest [25]. The NST is known to suffer from recall bias, barrier effects, transmission error, and response bias [54]. NRRTs can be as vulnerable to cheating due to distrust as the RRTs [55]. RRTs have some disadvantages as well: integrating a randomization device into a survey tool might not be practical in some situations, such as when researchers plan to use venue sampling to reach LGBT community members in gay bars or clubs. RRTs are also known not to work well if the respondents do not understand the process and/or do not follow the instructions properly. Besides, there might be cultural, social, or personal differences in the extent to which topics are perceived to be sensitive. Thus, we suggest that researchers select the optimal surveying technique by jointly considering various aspects of their study, such as the target population, the sensitivity level of the question, the available resources, and the practicality of integrating a specific technique. As in Erdmann [55], we also suggest that researchers not rely on the more-is-better assumption, i.e., assuming that higher prevalence estimates are more accurate than lower ones, when comparing different techniques; instead, we suggest using a valid gold standard (such as urinalysis or a lie detector) for comparisons whenever possible, perhaps on a small subsample.