Estimating the Effect of Voters' Media Awareness on the 2016 US Presidential Election

We examine whether voters' media awareness of the 2016 US Presidential election campaign influenced the election, using a logit model to estimate the probability that a voter with certain characteristics votes for one of the two candidates. Our results indicate that the more active voters were on social media, the more likely they were to vote for Trump, whereas the more aware they were of the electoral campaign (watching TV, listening to the radio, reading newspapers, etc.) and the more interested they were in the news and politics, the less likely they were to vote for Trump. The impact of these awareness variables was smaller than that of voters' sociodemographic characteristics.


Introduction
The 2016 US Presidential election stands out as an anomaly in election history. A candidate with no prior political experience used his advantage on social media and, in particular, on Twitter to reach the Oval Office. The election took over the media through extensive news coverage, TV ads, and social media trending. The awareness this election generated due to explicit and implicit advertising may have had a large impact on the outcome. Obama was the first candidate to utilize Twitter and other social media platforms in order to communicate directly with voters during the 2012 election (see Ref. [1]). Trump stormed the media and constantly trended on Facebook and Twitter throughout the campaign and used Twitter to speak freely about his platform (see Ref. [2]).
We examine whether voter awareness of the electoral campaign affects voting decisions. Following Schofield et al [3], we model voters' utility functions as depending on their preferences on an economic and a social policy dimension and on their sociodemographic characteristics (age, race, gender, education, income, and home state). Voters' utility is also influenced by their awareness of the campaign through TV news, radio, and social media trending, by their social media activity, by their self-reported ideology, and by their perception of candidates' ideology. We also include in voters' utilities other policy dimensions (stance on state spending on law enforcement, approval of the military, increasing the number of police officers, harsher punishments for previous offenders, and environmental policies) and a random shock. We derive the probability of voting for each candidate from voters' utilities, assuming voters vote for the candidate that maximizes their utility.
Using responses to the 2016 Cooperative Congressional Election Survey and voters' utility functions, we estimate the probability that a voter with certain characteristics votes for Trump relative to Clinton. Our findings indicate that a voter who is more aware of media outlets (TV and radio) is less likely to vote for Trump (relative to Clinton), whereas a voter with a higher level of social media activity is more likely to do so. We also find that voters' awareness of the campaign affected their voting decisions, though this impact is weaker than the effect of voters' sociodemographic characteristics. Advertising and awareness, in the form of active use of social media, influenced the election. Trump raised and spent significantly less than Clinton did, an indication that campaign advertising is not just a matter of dollars but that voters' awareness of the campaign also affects their voting decisions (see Ref. [4]).
Section 2 summarizes the findings in the literature on the effect that media has on US elections. Section 3 models the utility voters derive from each candidate and uses this utility to derive the probability that a voter votes for Trump relative to Clinton. Section 4 gives the descriptive statistics of our data, with results presented in Section 5. Final comments are given in Section 6, and the Appendix contains tables that support the analysis carried out in Section 5.

Literature review
We first review the literature on the effects of campaign advertising, the impact of Twitter on elections, and on modeling voters' choices using their preferences.
Huber and Arceneaux [5] study whether advertising mobilizes, informs, or persuades citizens in non-battleground states in the 2000 Presidential election, as candidates' advertising campaigns did not target these voters. Using the overlapping nature of media markets (TV) across states, they examine whether campaign advertising aimed at swing states, but also airing in non-battleground states, affects voting in non-battleground states. They argue that the volume and partisan balance of advertising in swing states is uncorrelated with voter behavior in non-battleground states. They find that advertising campaigns did not mobilize or inform citizens but had a strong persuasive effect, with moderately aware individuals being the most susceptible to advertising-induced changes in opinion.
Gordon and Hartmann [6] use the 2000 and 2004 US elections to analyze the effect of market-level advertising on county-level vote shares. They use gross ratings points (GRP) from the Campaign Media Analysis Group as their advertising variable, measuring the number of exposures to ads per capita. After controlling for other factors, they find that an increase of 1000 GRPs increases the probability of voting for the Republican and Democratic candidates by 1.5 and 1.7%, respectively.
Hong [7] uses a sample of the 112th US House of Representatives' activity on social media to study the impact of Twitter on politicians' campaign finances from June 8 to 22, 2011. He finds that politicians' adoption of social media increases donations from outside their constituencies, that politicians with extreme ideologies benefit more from social media, and that social media tends to react to salient ideas more easily and is thus more likely to benefit political extremists. He finds that an increase in out-of-state donations allows candidates to become more ideologically extreme, concluding that social media bridges the gap between politicians and citizens, which may lead to increased inequality and polarization of candidate platforms.
Schofield et al [3] build stochastic models of the 2000 and 2004 US Presidential elections with valences (see footnote 1) that affect voters' decisions. After placing voters in an economic and social policy space using factor analysis, they estimate a cleavage line, from a binomial logit model, dividing likely Democratic and Republican voters, and find that voters' valence judgments and policy preferences significantly influence candidates' policy choices.
For the 2008 Presidential election, Clarke et al [8] model voters' choices using a valence model (including stance on social, economic, and education issues), partisanship, and party leader images. They find that McCain had a positive image, as voters viewed him as more experienced, patriotic, and trustworthy than Obama, but that voters believed Obama would improve America's standing. Despite the presence of racial resentment, meaning that those with it had a negative view of Obama and a positive view of McCain, they find that Obama inspired hope with a "yes we can" attitude, typical in valence politics, and attribute Obama's higher valence to the belief that he could tackle the nation's issues and get the job done.
The literature finds that campaign advertising has an impact and a persuasive effect on voters' decisions. The large discrepancy between Clinton's and Trump's Twitter followers and number of tweets indicates that Trump had an advantage on social media, as Trump used social media more effectively to connect directly with voters (see Ref. [9]). We study the effect that voters' awareness of the campaign had on their choice of candidate after taking into account the effect of differences between voters' and candidates' economic and social policy preferences, voters' sociodemographic characteristics, and their stance on other policy dimensions.

Modeling voters' electoral choices
In this section, we first model voters' electoral choices using the utility they derive from each candidate and then, using the assumptions made on the shock affecting their utility, derive the probability that a voter votes for Trump relative to Clinton.
We model the utility voter i derives from candidate j, for j = Clinton, Trump, as a function of i's preferences, characteristics, and a random shock observed only by i, and assume i votes for the candidate that maximizes i's utility. The utility voter i derives from candidate j, $U_{ij}$, is given by Eq. (1), which sums the policy disutility terms and the sociodemographic, awareness, political participation, ideology, other policy, and competence valences described below.

We assume voters have preferences over the economic and social policies they would like candidates to implement if elected. Voter i's ideal, or most preferred, economic and social policies are given by $e_i$ and $s_i$ in Eq. (1). Prior to the election, candidates announce their economic and social policy platforms. Candidate j's economic and social policy platforms are given by $E_j$ and $S_j$ in Eq. (1). The positive coefficients $\alpha_e$ and $\alpha_s$ in Eq. (1) measure the importance voters give to the economic and social policy dimensions. The terms $-\alpha_e (e_i - E_j)^2$ and $-\alpha_s (s_i - S_j)^2$ capture the disutility i experiences when j's economic and social policy platforms differ from i's ideal policies, so that the farther j's policies, $E_j$ and $S_j$, are from i's ideal policies, $e_i$ and $s_i$, the lower is i's utility from candidate j.

Voters' individual sociodemographic characteristics (age, education, gender, income, and race) affect their voting behavior independent of their policy positions, through i's sociodemographic valence for j, $(\beta_j \cdot \text{sociodemo}_i)$ in Eq. (1), given by

$$\beta_j \cdot \text{sociodemo}_i = \beta_{j1}\,\text{age}_i + \beta_{j2}\,\text{educ}_i + \beta_{j3}\,\text{gender}_i + \beta_{j4}\,\text{income}_i + \beta_{j5}\,\text{race}_i .$$

We allow voters' awareness of the electoral campaign to affect their utility function to examine whether their awareness of the campaign influences their choice of candidate. Aware measures the number of media-related things the voter did in the past 24 hours (watch TV news, listen to the radio, read the newspaper, read a blog), with higher values measuring higher engagement by the voter in these events. A higher socialmedia value indicates i's greater participation in social media activities. Voter i's political activities include i's political meeting attendance and postings of political signs. The newsint variable indicates the self-reported level of interest the voter had in the news. These variables affect voters' choices through voters' awareness valence, $\gamma_j \cdot \text{awareness}_i$ in Eq. (1). The political participation valence in Eq. (1) measures whether the voter worked on a political campaign or donated money to candidate j.

Voters' ideology and their perception of candidates' ideology may affect their voting decisions. The self-reported ideology is rated on a scale of very conservative (7) to very liberal (1), whereas voters' perception of candidates' ideology is rated from very liberal (1) to very conservative (7). These variables capture voters' beliefs of where they stand relative to their perception of candidates' ideology and enter through the ideology valence, $\rho_j \cdot \text{ideology}_i$ in Eq. (1).

We also added other policy variables that may affect the utility voters derive from candidates. We included voters' opinions on increasing state spending on law enforcement and their approval of the military ensuring the supply of oil. We also incorporated voters' opinions on increasing the number of police officers (crime a), their support for harsher prison sentences for individuals with prior offenses (crime b), and their stance on environmental policies. We grouped these variables in what we call the other policy valence, $(\theta_j \cdot \text{other}_i)$ in Eq. (1).

As in Schofield et al [3], we model i's belief of j's competence, or ability to govern, through the competence valence, $(\lambda_j + u_{ij})$ in Eq. (1), where $\lambda_j$ denotes the mean of voters' beliefs of j's competence or ability to govern and $u_{ij}$ is the idiosyncratic component of i's belief of j's competence that is only observed by i and that varies around $\lambda_j$ according to a type I extreme value distribution.

Voter i's utility from j has an observable (O) component, $U^O_{ij}$, that depends on voters' disutility from candidates' platforms differing from their ideals and on the valences (sociodemographic, media, political participation, ideology, other, and competence), and a random component, $u_{ij}$, so that $U_{ij}$ in Eq. (1) is given by

$$U_{ij} = U^O_{ij} + u_{ij}. \tag{2}$$

We assume that only Clinton and Trump run in the election, code the vote of i for Trump as 1 ($Y_i = 1$), and make Clinton the base candidate ($Y_i = 0$), so that the dependent variable $Y_i$ is coded as

$$Y_i = \begin{cases} 1 & \text{if } i \text{ votes for Trump,} \\ 0 & \text{if } i \text{ votes for Clinton.} \end{cases}$$

Voter i votes for Trump when the utility i derives from Trump is greater than that of voting for Clinton, i.e., when $U_{i,\text{Trump}} > U_{i,\text{Clinton}}$, and votes for Clinton otherwise. Since i's utility from j is affected by a random component, $u_{ij}$, the probability that i votes for Trump is given by $\text{Prob}_{i,\text{Trump}} = \Pr[U_{i,\text{Trump}} > U_{i,\text{Clinton}}]$, and since $u_{ij}$ is drawn from a type I extreme value distribution, the probability that i votes for Trump has a logit specification, i.e.,

$$\Pr[Y_i = 1 \mid x] = \frac{\exp(x\beta)}{1 + \exp(x\beta)}, \tag{3}$$

where exp is the exponential function, x is the vector of factors included in the observable component of i's utility function, $U^O_{i,\text{Trump}}$ given in Eq. (2), and $\beta$ is the corresponding vector of logit coefficients (see footnote 3).

The marginal impact of an explanatory variable on the probability that i votes for Trump is obtained by finding the marginal effect that, say, variable $x_k$ has on Eq. (3), holding all other factors in i's utility function in Eq. (1) constant at some specified value, usually their means. The marginal effect of $x_k$ on $\text{Prob}_{i,\text{Trump}}$ in Eq. (3), obtained by taking the partial derivative of Eq. (3) with respect to $x_k$, is given by

$$\frac{\partial \Pr[Y_i = 1 \mid x]}{\partial x_k} = \beta_k \, \frac{\exp(x\beta)}{\left[1 + \exp(x\beta)\right]^2} \tag{4}$$

$$\phantom{\frac{\partial \Pr[Y_i = 1 \mid x]}{\partial x_k}} = \beta_k \, \Pr[Y_i = 1 \mid x] \, \Pr[Y_i = 0 \mid x], \tag{5}$$

where $\Pr[Y_i = 1 \mid x]$ is the probability that i votes for Trump and $\Pr[Y_i = 0 \mid x]$ is that of voting for Clinton. These probabilities change in a nonlinear manner as $x_k$ changes. As shown in Eqs. (4) and (5), the marginal effect of $x_k$ on $\text{Prob}_{i,\text{Trump}}$ is the product of the logit coefficient, $\beta_k$, and the probabilities of voting for the two candidates. The marginal effect measures the impact of a one-unit change in the explanatory variable on the probability that an individual votes for Trump relative to Clinton, the base candidate, holding all other variables at the mean.

Footnote 1: Valence is voters' non-policy evaluations of candidates and, in particular, in [3] measures voters' beliefs on the ability of a candidate to govern effectively.
Footnote 3: The coefficients in Clinton's utility function (the base candidate) are standardized to zero.
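The logit probability and its marginal effect described above can be sketched numerically. The following is a minimal illustration, not the estimation code used in the paper: the coefficient vector `beta`, the regressor means `x_mean`, and all function names are hypothetical. It also checks by simulation that, when each candidate's random utility component is an independent type I extreme value draw, the share of simulated voters preferring Trump approaches the logit probability.

```python
import math
import random

def logit_prob(x, beta):
    """Pr[Y = 1 | x] = exp(x.beta) / (1 + exp(x.beta)), the logit form of Eq. (3)."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-index))

def marginal_effect(x, beta, k):
    """Derivative of Pr[Y = 1 | x] with respect to x_k: beta_k * p * (1 - p)."""
    p = logit_prob(x, beta)
    return beta[k] * p * (1.0 - p)

def gumbel(rng):
    """One draw from a standard type I extreme value (Gumbel) distribution."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulated_trump_share(index, draws, seed=0):
    """Share of simulated voters with U_Trump > U_Clinton.

    Clinton is the base candidate, so her observable utility is set to 0.
    """
    rng = random.Random(seed)
    wins = sum(index + gumbel(rng) > gumbel(rng) for _ in range(draws))
    return wins / draws

# Hypothetical inputs: a constant plus two regressors evaluated at their means.
x_mean = [1.0, 0.5, 2.0]
beta = [-0.4, 1.2, -0.3]  # illustrative coefficients, NOT estimates from the paper

p_trump = logit_prob(x_mean, beta)
me_1 = marginal_effect(x_mean, beta, 1)
share = simulated_trump_share(-0.4, draws=100_000)  # x_mean . beta = -0.4
print(p_trump, me_1, share)
```

Because the marginal effect scales with p(1 - p), the same coefficient implies a larger change in probability for a voter near p = 0.5 than for one whose choice is nearly certain, which is why the evaluation point (here, the means) has to be stated.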

Descriptive statistics
We now provide the descriptive statistics of the variables used in the analysis. Our data come from the 2016 Cooperative Congressional Election Survey (CCES), a nationally representative sample of the voting age population, which interviewed 64,600 respondents before and after the election. We exclude those not voting for Clinton or Trump from our sample. Since the post-election follow-up survey asked the same individuals "For whom did you vote for President of the United States?," we know whom each individual voted for, assuming truthful revelation.
The 2016 CCES survey includes a wide range of responses to related questions that convey similar, though not identical, information on voters' preferences. Given the high correlation among these questions, these variables should not be simultaneously included in the regressions, to avoid multicollinearity effects that may render the regression coefficient estimates unstable and make it difficult to interpret the effect of these variables on the probability of voting for Trump relative to Clinton. Rather than including a large number of highly correlated variables, we use principal component analysis (PCA) to reduce the number of correlated variables included in the regression. The PCA performs orthogonal transformations to convert correlated variables into a smaller set of linearly uncorrelated variables called principal components (see footnote 5). The PCA gives the factor loading (see footnote 6) of each principal component variable and identifies a smaller set of latent dimensions along which voters make their decisions.
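As a rough illustration of how a PCA collapses correlated survey items into one component, the sketch below extracts the first principal component of a small, made-up response matrix using power iteration on the covariance matrix. The data, the function names, and the use of a unit-length eigenvector as component weights are assumptions for illustration only; statistical packages may scale reported loadings differently, and this is not the procedure or data used in the paper.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix and the mean-centered data."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    return cov, centered

def first_component(cov, iters=500):
    """Leading eigenvector and eigenvalue of cov by power iteration."""
    m = len(cov)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m))
                     for a in range(m))
    return v, eigenvalue

# Hypothetical coded responses: items 0 and 1 move together, item 2 does not.
responses = [
    [1, 1, 3],
    [2, 2, 1],
    [3, 3, 2],
    [4, 4, 3],
    [5, 5, 1],
]

cov, centered = covariance_matrix(responses)
weights, eigenvalue = first_component(cov)

# Share of total variance captured by the first component.
share_explained = eigenvalue / sum(cov[j][j] for j in range(len(cov)))

# Each respondent's score on the first component.
scores = [sum(w * c for w, c in zip(weights, row)) for row in centered]
print(weights, share_explained, scores)
```

In this toy data the two perfectly correlated items receive identical weights, which is the sense in which highly correlated questions are grouped into a single component.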
Following Schofield et al [3] (see footnote 7), we perform a PCA on 12 survey questions relating to voters' stances on the military, welfare spending, condition of the economy, approval of Obama, gun control, immigration, abortion, gay marriage, budget cuts, personal ideology, tax increases, and racism. Table A.1 in the Appendix contains the questions used in the analysis and the coding of possible responses. We use the PCA factor loadings for each question and each voter's response to each question to derive each voter's preferences along the dimensions identified in the PCA.
The PCA revealed two latent dimensions, labeled as the social and economic dimensions. Table 1 shows the PCA factor loadings for each survey question. The first component has two heavy loadings, racism (consistent with Schofield et al [3]) and military, on the economic dimension. We anticipated that voters' opinions on economic problems and spending would load strongly on the economic dimension, as found in the literature; however, this was not the case in our sample. Perhaps the 2016 election was too different from previous elections. The loadings indicate which component is associated with our social dimension and are consistent with previous literature (such as abortion and gay marriage).

Footnote 5: For example, if supporting gay marriage and abortion have a high correlation, the PCA would group these two variables into a single component.
Footnote 6: Factor loadings represent how much a component explains the latent variable in the factor analysis. Their values range between −1 and 1, with values close in absolute value to 1 (0) indicating that the component has a strong (weak) effect on the latent variable.
Footnote 7: In their study of the 2000 and 2004 US presidential elections, [3] find that voters tend to make voting decisions along two latent dimensions, economic and social. After locating voters along these two dimensions, they find that, to maximize vote share, candidates locate close to the electoral mean, the average of voters' locations along these two dimensions.
We multiplied each factor loading in Table 1 that is significantly different from zero by the voter's corresponding response to that question and then aggregated these products according to their identification in the economic or social dimension to find voters' locations along these two latent dimensions; we assume these locations represent their preferences in these dimensions. Voter i's locations along these two dimensions, $e_i$ and $s_i$ in Eq. (1), are estimated using the factor loadings in Table 1.
Right on the economic axis (horizontal) in Figure 1 represents an individual who approves of the military and is fearful of people of other races. We interpret north on the social axis (vertical) as liberal concerning, for example, civil rights issues. Figure 1 shows that, while the Clinton and Trump voters are clearly divided along the social dimension, there is no strong divide among them along the economic axis. Table 2 echoes Figure 1, indicating that, on average, Trump voters are more conservative (−1.757) in their social values and Clinton voters more liberal (1.753). The statistics of the two candidates' voters along the economic dimension are relatively similar in mean, median, and standard deviation.
As in Schofield et al [3], candidates' platforms, $S_j$ and $E_j$, are at the mean of voters' ideal policies, so that we can estimate voters' disutility when candidates adopt policies that differ from their ideals, $-(s_i - S_j)^2$ and $-(e_i - E_j)^2$ in Eq. (3).
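The aggregation step described above can be sketched as follows. The loadings, question names, and responses are invented for illustration and are not the values in Table 1. The sketch also assumes that each candidate's platform is the mean location of the voters who chose that candidate, which is one possible reading of "the mean of voters' ideal policies".

```python
def location(responses, loadings):
    """Loading-weighted sum of a voter's coded responses on one dimension."""
    return sum(loadings[q] * responses[q] for q in loadings)

def mean(values):
    return sum(values) / len(values)

# Hypothetical loadings (NOT the Table 1 values).
econ_loadings = {"military": 0.6, "racism": 0.5}
social_loadings = {"abortion": 0.7, "gay_marriage": 0.6}

# Hypothetical coded survey responses.
voters = [
    {"vote": "Trump", "military": 4, "racism": 3, "abortion": 1, "gay_marriage": 1},
    {"vote": "Trump", "military": 3, "racism": 4, "abortion": 2, "gay_marriage": 1},
    {"vote": "Clinton", "military": 2, "racism": 1, "abortion": 4, "gay_marriage": 4},
    {"vote": "Clinton", "military": 3, "racism": 2, "abortion": 3, "gay_marriage": 4},
]

# Voter locations e_i and s_i.
for v in voters:
    v["e"] = location(v, econ_loadings)
    v["s"] = location(v, social_loadings)

# Candidate platforms (E_j, S_j): assumed here to be the mean location of own voters.
platforms = {}
for cand in ("Clinton", "Trump"):
    own = [v for v in voters if v["vote"] == cand]
    platforms[cand] = (mean([v["e"] for v in own]), mean([v["s"] for v in own]))

def policy_disutility(v, cand):
    """The pair (-(e_i - E_j)^2, -(s_i - S_j)^2) for voter v and candidate cand."""
    E, S = platforms[cand]
    return (-(v["e"] - E) ** 2, -(v["s"] - S) ** 2)

for v in voters:
    print(v["vote"], policy_disutility(v, "Clinton"), policy_disutility(v, "Trump"))
```

Both terms are zero when a voter sits exactly at a candidate's platform and become more negative with distance, so they enter the logit regressions as distance-based regressors.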
Since voters' decisions depend on more than their economic and social stances, we control for voters' sociodemographic characteristics, reported ideologies, "other" policy variables, and our awareness valence. Table 3 shows that, while a larger proportion of women voted for Clinton, a larger proportion of men voted for Trump (see Figure 2a where 0 = male and 1 = female). Clinton had a higher proportion of nonwhite voters, as 70% of her voters were white compared with 88% of Trump's (see Figure 2b where 0 = nonwhite and 1 = white), with a higher proportion of educated and young individuals voting for Clinton (Figures 2c and d).
In Table 4, the self-reported ideologies and voters' perceived ideologies of each candidate show that Trump voters, on average, identify themselves as "conservative" and perceive Clinton as "very liberal" and Trump as "somewhat conservative." Clinton voters, on average, identify themselves as "somewhat liberal" and perceive Clinton as "somewhat liberal" and Trump as "very conservative." Clinton voters are more pro-environmental than Trump's, and Trump (Clinton) voters prefer to increase (maintain) state spending on law enforcement and support (oppose) increasing the number of police officers.
It is well known that voters' choice of candidate may depend on their state of residence (see also Refs. [10,11]) and that candidates carry out their campaigns mostly in swing states. To control for differences across states, we create the state swingness variable: Democratic, Republican, and Swing dummy variables for voters living in Democratic, Republican, and swing states, coded using Politico's June 2016 list of swing states (see Table A.2 in the Appendix and [12]).
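A minimal sketch of this dummy coding is given below. The state names and their classification are placeholders, not the Politico list in Table A.2, and the choice of the Democratic category as the omitted base is an assumption made here so that the dummies are not collinear with a regression constant.

```python
# Placeholder classification; the actual coding is in Table A.2 of the Appendix.
STATE_TYPE = {
    "State A": "Democratic",
    "State B": "Republican",
    "State C": "Swing",
}

CATEGORIES = ("Democratic", "Republican", "Swing")

def swingness_dummies(state, base="Democratic"):
    """Return 0/1 indicators for the voter's state type, omitting the base category."""
    kind = STATE_TYPE[state]
    return {c: int(kind == c) for c in CATEGORIES if c != base}

for state in STATE_TYPE:
    print(state, swingness_dummies(state))
```

With the base category omitted, each remaining coefficient is read as the difference in the utility index relative to voters living in base-category states.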

Estimating the probability of voting for Trump
We examine the effect that the various components in voters' utility function in Eq. (1) have on the probability that voters choose a particular candidate in Eq. (3). We estimate a set of logit models, sequentially adding groups of variables to show the effect these variables, as a group, have on the models' decision criteria, and later discuss their marginal effects on the probability of voting for Trump.
Table notes: |z-score| in parentheses. *** prob < 0.001, ** prob < 0.05, * prob < 0.1.
The preferred model specification includes all of the previous variables plus the "other" policy variables and gives the best fit according to the decision criteria statistics. The signs and significance of the coefficients of all variables, except the economic dimension, attend, and social media, are stable across model specifications.
The economic dimension becomes insignificant in column (6) after introducing other policy variables, which improves the model fit as seen in the decision criteria statistics.
We take an alternative approach to the fixed effects used in the literature by incorporating the real swingness of each state as reported by Politico in June 2016 prior to the election. Tables 5 and 6 show that the full model with the state swingness variable gives a better fit to the data.
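The sequential comparison of nested logit specifications can be illustrated with a small, self-contained sketch. Everything here is an assumption for illustration: the data are simulated rather than drawn from the CCES, the models are fitted with a plain Newton-Raphson routine, and AIC and BIC stand in for the decision criteria, which the text does not name.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logit(X, y, iters=25):
    """Maximum likelihood logit via Newton-Raphson; returns (beta, log-likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in X]
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p))
                for j in range(k)]
        info = [[sum(pi * (1 - pi) * row[a] * row[b] for row, pi in zip(X, p))
                 for b in range(k)] for a in range(k)]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    p = [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in X]
    loglik = sum(math.log(pi) if yi else math.log(1.0 - pi) for yi, pi in zip(y, p))
    return beta, loglik

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik

# Simulated data: two regressors with hypothetical true coefficients.
rng = random.Random(0)
rows, y = [], []
for _ in range(500):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append((x1, x2))
    y.append(1 if rng.random() < sigmoid(-0.2 + 1.0 * x1 - 0.8 * x2) else 0)

X_small = [[1.0, x1] for x1, _ in rows]       # constant + first regressor
X_full = [[1.0, x1, x2] for x1, x2 in rows]   # adds the second regressor

beta_small, ll_small = fit_logit(X_small, y)
beta_full, ll_full = fit_logit(X_full, y)
print("small:", aic(ll_small, 2), bic(ll_small, 2, len(y)))
print("full: ", aic(ll_full, 3), bic(ll_full, 3, len(y)))
```

Lower AIC or BIC indicates a better trade-off between fit and the number of parameters. The log-likelihood alone cannot decrease when regressors are added to a nested model, which is why a penalized criterion is needed to compare specifications of different size.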
The logit coefficients given in Tables 5 and 6 do not measure the marginal effect that a variable has on the probability of voting for Trump. As shown in Eqs. (3)-(5), this probability varies in a nonlinear manner with changes in variable x k while holding all other variables at their mean. A positive marginal effect indicates that an increase in the variable results in an increased probability that an individual with mean characteristics votes for Trump (relative to Clinton), whereas a negative marginal effect decreases this probability. Table 7 shows the marginal effect of each variable on the probability of voting for Trump and its significance level, holding all other variables at their mean (given in column 2). A white voter that has the mean characteristics in all the other variables is 31.6% (which is significantly different from zero) more likely to vote for Trump relative to Clinton. A mean voter who approves of using the military for securing the oil supply is 14.2% more likely to vote for Trump, one who cares about the environment is 54.1% less likely to vote for Trump, and one who views themselves as very liberal on the ideology scale is 22.9% less likely to vote for Trump. An increase in the level of education, from say high school to some college, decreases the probability of voting for Trump by almost 4%.

Conclusion
In this paper, we examine which factors influence the probability that an individual votes for Trump relative to Clinton in the 2016 US Presidential election. Our major contribution is the addition of variables that measure voters' awareness of the electoral campaign after controlling for other factors that the literature finds significantly affect voters' choice of US Presidential candidate. Others in the literature include the number of advertisements per capita or the amounts spent on advertising. We opt for a different approach, looking at the effect of the media on voters' choices by using data at the voter level. That is, our awareness variables measure voters' direct interest in the news, their use of social media, and their interest in the electoral campaign. By measuring variables at the voter level, we capture the impact that voters' media awareness had on their voting decisions. We also estimate voters' positions along the economic and social dimensions to study how the disutility voters derive from candidates adopting positions that differ from their ideal policies influenced their voting decisions.
We estimate a set of binomial logit regressions to examine the probability that a voter with certain characteristics votes for Trump relative to Clinton. Our results indicate that the more active a voter with mean characteristics is on social media, the more likely she/he was to vote for Trump. We also find that the more aware the mean voter was of the media (TV, radio, reading newspapers) and the more interested she/he was in the news, the less likely she/he was to vote for Trump. Even though the mean voter's awareness of the campaign affected her/his decision, stances on social and economic issues, perceived ideologies, and voters' sociodemographic characteristics had a greater impact on her/his voting decision.
The 2016 US election was the first election in which a candidate adopted a Twitter platform to communicate directly with voters. Our results indicate that future candidates should capitalize on this low-cost approach to bridging the gap between themselves and voters. Hong [7] argues that social media allows the voter to self-select in the form of a "follow" or "friend" to reinforce their ideological positions. Trump sent almost four times more tweets than Clinton did, finding many supporters along the way (see Ref. [9]). Furthermore, social media tends to react to salient ideas more easily and faster and is therefore more likely to benefit political extremists (see Ref. [7]). Although voter awareness had a lower impact than, say, race in the 2016 election, it still influenced the mean voter's choice of candidate. Future candidates can learn from the 2016 election, as it may have changed the political campaign battleground forever.