Abstract
We develop a method to incorporate model uncertainty by model averaging in generalized linear models subject to multiple endogeneity and instrumentation. Our approach builds on a Gibbs sampler for the instrumental variable framework that incorporates model uncertainty in both outcome and instrumentation stages. Direct evaluation of model probabilities is intractable in this setting. However, we show that by nesting model moves inside the Gibbs sampler, a model comparison can be performed via conditional Bayes factors, leading to straightforward calculations. This new Gibbs sampler is only slightly more involved than the original algorithm and exhibits no evidence of mixing difficulties. We further show how the same principle may be employed to evaluate the validity of instrumentation choices. We conclude with an empirical marketing study that estimates opening box office revenue using three endogenous regressors (prerelease advertising, opening screens, and production budget).
Keywords
- multiple endogeneity
- instrumental variables
- Bayesian model averaging
- conditional Bayes factors
- box office forecasting
1. Introduction
Market response modeling focuses on estimating the effects of marketing activities on performance. However, marketing managers are often strategic in their use of marketing activities and adapt them in response to factors unobserved by the researcher [1, 2, 3]. Endogeneity arises, for example, when a firm’s marketing strategies such as advertising spending, channel selection, and pricing are nonrandom and influenced by the firm- and industry-level factors [4, 5, 6]. Strategic management decisions are endogenous to their expected effects on market performance. Therefore, empirical market response models that seek to estimate the causal effect of multiple marketing instruments need to account for such strategic planning of marketing activities, or otherwise may suffer from an endogeneity problem, leading to biased estimates of the effects of the marketing activities on performance [1, 3, 4, 7]. Dealing with endogeneity has been extensively discussed in the marketing literature, especially concerning different forms of regression and panel models [1, 5, 8, 9, 10], choice models [11, 12], endogeneity correction based on a control function approach [13, 14], as well as structural equations models [4]. However, little research addresses incorporating model uncertainty related to endogeneity in generalized linear models.
We consider the problem of incorporating instruments and covariate uncertainty into the Bayesian estimation of an instrumental variable (IV) regression system. The concepts of model uncertainty and model averaging have received widespread attention in the economics literature for the standard linear regression framework [15, 16, 17, 18] and in generalized linear models [19, 20, 21, 22]. For a good introduction to Bayesian model averaging (BMA), see [23]. Primarily, these frameworks do not directly address the case of multiple endogenous variables, and only recently has attention been paid to model uncertainty involving multiple endogenous variables. Unfortunately, the nested nature of IV estimation renders direct model comparison difficult. In the economics literature, this has led to several different approaches [24, 25]. Durlauf et al. [25] consider approximations of marginal likelihoods in a framework similar to two-stage least squares. Lenkoski et al. [16] continue this development with the two-stage Bayesian model averaging (2SBMA) methodology, which uses a framework developed by Kleibergen and Zivot [26] to propose a two-stage extension of the unit information prior [27]. Similar approaches in closely related models have been developed by [15, 28].
Koop et al. [29] developed a fully Bayesian methodology that does not utilize approximations to integrated likelihoods. They present a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm [30], which extends the methodology of Holmes et al. [31]. The authors then show that the method can handle a variety of priors, including those of [32, 33], and [34]. However, the authors note that the direct application of RJMCMC leads to significant mixing difficulties and relies on a complicated model move procedure similar to simulated tempering to escape local model modes. There is a more straightforward and relatively general model search procedure. Madigan and York [35] proposed the Markov Chain Monte Carlo Model Composition (MC3) in which one applies the same idea of a Metropolis-Hastings step for model jumps from RJMCMC but in a simplified fashion.
We propose an alternative solution to this problem: Instrumental Variable Bayesian Model Averaging (IVBMA). Our method builds on a Gibbs sampler for the IV framework, extended from that discussed in Rossi et al. [36]. While direct model comparisons are intractable, we introduce the notion of a conditional Bayes factor (CBF), first discussed by Dickey and Gunel [37] and employed in a seemingly unrelated regression context by [31]. The CBF compares two models in a nested hierarchical system, conditional on parameters not influenced by the models under consideration. We show that the CBF for both outcome and instrumental equations is exceedingly straightforward to calculate and essentially reduces to the normalizing constants of a multivariate normal distribution.
Further, we note that our method can handle generalized linear mixed models with multiple endogenous variables in a straightforward fashion. This leads to a procedure in which model moves are embedded in a Gibbs sampler, which we term MC3-within-Gibbs. Based on this order of operations, IVBMA is only trivially more complicated than a Gibbs sampler that does not incorporate model uncertainty and thus appears to have limited issues regarding mixing. This feature is essential as it shows more complicated scenarios involving endogeneity, instrumentation, and model uncertainty can be handled within this framework, an important feature when constructing more involved Bayesian hierarchical models.
When working with a large system of equations subject to endogeneity and instrumentation, there is a natural concern that the instrument assumptions may not hold. A host of frequentist hypothesis tests have been proposed to examine the instrument conditions; the most familiar to applied researchers is the test of Sargan [38]. To our knowledge, no similar checks of instrument validity have been proposed in the Bayesian IV literature outside of the approximate method advocated in [16]. We offer a new check of instrument validity, also based on CBFs, which appears to be the Bayesian analog of the Sargan test. This method integrates seamlessly with the IVBMA framework.
The article proceeds as follows. Section 2 discusses the basic framework we consider and the Gibbs sampler that ignores model uncertainty. Section 3 reviews the concept of model uncertainty, introduces the notion of CBFs, and derives the conditional model probabilities used by IVBMA. In Section 4, we propose our method of assessing instrument validity. Section 5 presents an empirical illustration of the proposed model for predicting box office revenues. Section 6 summarizes and concludes with potential applications of the IVBMA approach.
2. The instrumental variable model with multiple endogenous variables
We consider the following classic linear system model with multiple endogenous variables:
where
while
for
When
Generalized linear mixed models provide a unified approach that directly acknowledges multiple levels of dependency and models different data types [39, 40, 41, 42]. Extensions to generalized linear models implicitly assume a continuous response with Gaussian errors. Extending these developments to alternative sampling models is straightforward in the context of a random-effects framework. Let
while the remaining
We proceed by discussing the Bayesian estimation of these parameters under standard conjugate priors, following the developments of [36]. Accordingly, with each parameter vector, we assume
and
where
where
Let
Fix
where
Set
The act of conditioning, therefore, turns the original system into a simple linear regression problem, and via standard results, we have that
where
Finally, suppose that all
where
with each
Eq. (4) and Eq. (5) thereby give the full conditionals necessary for the Gibbs sampler. For a basic introduction to MCMC sampling with illustration, see [43]. Our approach differs slightly from that of Rossi et al. [36], in that their Gibbs sampler features a more involved manner of updating the instrumental covariates
For a Poisson regression using a log link in Eq. (3), the term
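The conditioning argument used in this section for the outcome equation can be illustrated with a short numerical sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a standard normal prior on the outcome coefficients, treats the first-stage residuals and the residual covariance matrix as known at the current Gibbs iteration, and uses hypothetical names (`draw_outcome_coefs`, `V`, `eta`, `Sigma`).

```python
import numpy as np

def draw_outcome_coefs(y, V, eta, Sigma, rng):
    """One conditional draw of the outcome-equation coefficients.

    y     : (n,) response
    V     : (n, p) outcome-stage design (endogenous regressors and covariates)
    eta   : (n, r) current first-stage residuals
    Sigma : (r+1, r+1) residual covariance, outcome error ordered first
    Assumes a N(0, I) prior on the coefficients (an illustrative choice).
    """
    s11, s12, S22 = Sigma[0, 0], Sigma[0, 1:], Sigma[1:, 1:]
    rho = np.linalg.solve(S22, s12)       # regression of the outcome error on eta
    omega = s11 - s12 @ rho               # conditional variance of the outcome error
    y_tilde = y - eta @ rho               # remove the part of y explained by eta
    prec = V.T @ V / omega + np.eye(V.shape[1])          # posterior precision
    mean = np.linalg.solve(prec, V.T @ y_tilde / omega)  # posterior mean
    chol = np.linalg.cholesky(np.linalg.inv(prec))
    return mean + chol @ rng.standard_normal(V.shape[1]), mean

# Simulated example with one endogenous regressor (all values are illustrative).
rng = np.random.default_rng(0)
n = 2000
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
z = rng.normal(size=n)
errs = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
eps, eta = errs[:, 0], errs[:, 1:]
x = 1.0 * z + eta[:, 0]                   # first stage
y = 1.5 * x + eps                         # outcome equation, true coefficient 1.5
V = x[:, None]

draw, post_mean = draw_outcome_coefs(y, V, eta, Sigma, rng)
ols = (x @ y) / (x @ x)                   # naive estimate that ignores endogeneity
print(post_mean, ols)
```

In this simulation the conditional posterior mean should sit close to the true coefficient, while the naive least-squares estimate is pulled upward by the correlation between the two error terms. In an actual sampler the residuals and covariance would themselves be updated at every iteration rather than held at their true values.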
3. Incorporating model uncertainty
We outline our method for incorporating model uncertainty into the framework in Eq. (1) and Eq. (2). To explain the motivation behind our CBF approach, we first review a classic Bayesian model selection method. We then show how the concept of Bayes Factors can be usefully embedded in a Gibbs sampler yielding CBFs. These CBFs are then shown to yield straightforward calculations.
3.1 Model selection and Bayes factors
In a general framework, incorporating model uncertainty involves considering a collection of candidate models
By letting the model become an additional parameter to be assessed in the posterior, we aim to calculate the posterior model probabilities given the data
where
The integrated likelihood
where
One possibility for pairwise comparison of models is offered by the Bayes factor (BF), which is in most cases defined together with the posterior odds [22, 46]. The posterior odds of model
where
denote the Bayes factor and the prior odds of
When the integrated likelihood in Eq. (7) and thus the BF can be computed directly, a straightforward method for exploring the model space, Markov Chain Monte Carlo Model Composition (MC3), was developed by Madigan and York [35]. MC3 determines posterior model probabilities by generating a stochastic process that moves through the model space
and (c) sets
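The MC3 transition described in this subsection can be sketched for an ordinary linear regression in which the integrated likelihood is available in closed form. This is a minimal sketch under stated assumptions, not the IVBMA sampler: it assumes a uniform prior over models, a neighborhood formed by adding or dropping a single variable, and Zellner's g-prior with g = n as the source of the integrated likelihood. The function names are hypothetical.

```python
import numpy as np

def log_marginal_g_prior(y, X, model, g):
    """Log integrated likelihood under Zellner's g-prior, up to a constant
    shared by all models (the null model is normalized to zero)."""
    n = len(y)
    k = int(model.sum())
    if k == 0:
        return 0.0
    yc = y - y.mean()
    Xm = X[:, model] - X[:, model].mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xm, yc, rcond=None)
    resid = yc - Xm @ beta
    r2 = 1.0 - (resid @ resid) / (yc @ yc)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

def mc3(y, X, n_iter=5000, seed=0):
    """Random walk over inclusion vectors; returns visit-based inclusion frequencies."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    g = float(n)
    model = np.zeros(p, dtype=bool)            # start from the null model
    log_ml = log_marginal_g_prior(y, X, model, g)
    visits = np.zeros(p)
    for _ in range(n_iter):
        proposal = model.copy()
        j = rng.integers(p)                    # (a) pick a neighboring model
        proposal[j] = not proposal[j]
        log_ml_prop = log_marginal_g_prior(y, X, proposal, g)
        # (b) accept with probability min(1, Bayes factor); prior odds are 1 here
        if np.log(rng.uniform()) < log_ml_prop - log_ml:
            model, log_ml = proposal, log_ml_prop
        visits += model                        # (c) record the current model
    return visits / n_iter

# Illustrative data: only predictors 0 and 3 enter the true model.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
incl = mc3(y, X)
print(np.round(incl, 2))
```

Because the proposal is symmetric and the model prior is uniform, the acceptance ratio reduces to the Bayes factor between the proposed and current models. The fraction of iterations in which a variable is present estimates its posterior inclusion probability.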
3.2 Model determination
Incorporating model uncertainty into the system Eq. (1) involves considering a separate model space
Therefore, an implementation of MC3 in the product space of
Given the system Eq. (1), fix
Thus, the conditional posterior odds depend on calculating a Bayes factor conditional on the current state of
Calculating the relevant terms in Eq. (6) is straightforward. In particular, we note that
which is, in essence, an integrated likelihood for model
where
The power of this result is that the model
Since MC3 constitutes a valid MCMC transition in the model space
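The statement that the conditional Bayes factor reduces to normalizing constants of a multivariate normal distribution can be checked numerically in a simplified setting. The sketch below is an assumption-laden illustration rather than the paper's exact expression: it takes a Gaussian regression with known error variance `sigma2` and a N(0, I) prior on the coefficients, so the integrated likelihood follows from completing the square. The function names are hypothetical.

```python
import numpy as np

def log_integrated_likelihood(y, X, sigma2=1.0):
    """log of the integral of N(y | X b, sigma2 * I) * N(b | 0, I) over b."""
    n, p = X.shape
    const = -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * (y @ y) / sigma2
    if p == 0:
        return const                               # empty model: nothing to integrate
    omega = X.T @ X / sigma2 + np.eye(p)           # precision of the completed square
    b = X.T @ y / sigma2                           # linear term of the canonical form
    _, logdet = np.linalg.slogdet(omega)
    return const - 0.5 * logdet + 0.5 * b @ np.linalg.solve(omega, b)

def log_bayes_factor(y, X, cols_a, cols_b, sigma2=1.0):
    """Log Bayes factor of the model using cols_a against the model using cols_b."""
    return (log_integrated_likelihood(y, X[:, cols_a], sigma2)
            - log_integrated_likelihood(y, X[:, cols_b], sigma2))

# Illustrative data in which only the first column matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] + rng.normal(size=100)
lbf = log_bayes_factor(y, X, [0], [])
print(lbf)
```

The terms that do not depend on the model cancel in the ratio, so only the determinant and the quadratic form of the completed square remain. In the IV system, the response entering such a calculation would be the one adjusted for the quantities being conditioned on, which this sketch does not attempt to reproduce.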
4. Assessing instrument validity
For the estimates
Suppose that all residuals were known. Let
The essential notion of the Sargan test is to consider the model,
and test whether
Our approach is to model this in a Bayesian context. In particular, we consider two models:
Let
Note that
and therefore, we have reduced the problem of assessing
while
Evaluation of these integrals thus requires the specification of priors
which yields
For
which yields
where
This approach offers performance similar to that of the Sargan test while having two desirable features: it is fully Bayesian, as opposed to the approximate test of [16], and it can be directly embedded in the Gibbs sampling procedures outlined above. We emphasize in the discussion section that further work can be done on this diagnostic.
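For comparison, the classical frequentist Sargan statistic that motivates this diagnostic can be sketched as follows. This is the textbook overidentification statistic, not the Bayesian CBF check proposed in this section; the function name and the simulated data are hypothetical, and variables are assumed to be centered so that no intercept is needed.

```python
import numpy as np

def sargan_statistic(y, X, Z):
    """Classical Sargan statistic: n times the uncentered R^2 from regressing
    the 2SLS residuals on the instruments.

    y : (n,) outcome; X : (n, k) endogenous regressors; Z : (n, q) instruments, q > k.
    Under valid instruments it is asymptotically chi-square with q - k degrees of freedom.
    """
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]    # two-stage least squares estimate
    u = y - X @ beta                                   # 2SLS residuals
    u_fit = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]   # projection of residuals on Z
    return len(y) * (u_fit @ u_fit) / (u @ u), beta

# Simulated example: three instruments, one endogenous regressor.
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 3))
v = rng.normal(size=n)
u_err = 0.5 * v + rng.normal(size=n)       # outcome error correlated with v
x = Z.sum(axis=1) + v
X = x[:, None]
y_valid = 1.0 * x + u_err                  # instruments excluded from the outcome
y_invalid = y_valid + 2.0 * Z[:, 0]        # first instrument enters the outcome directly

s_valid, _ = sargan_statistic(y_valid, X, Z)
s_invalid, _ = sargan_statistic(y_invalid, X, Z)
print(s_valid, s_invalid)
```

When an instrument affects the outcome directly, the 2SLS residuals remain correlated with the instrument set and the statistic becomes large relative to its chi-square reference distribution.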
5. Empirical study: determinants of opening box office
In this section, we consider a generalized linear model with an identity link in the presence of multiple endogenous variables and covariates based on the IVBMA framework incorporating model uncertainty. Based on previous studies of box office revenues, we estimate the effects of three endogenous predictors, prelaunch advertising spending, the number of screens, and production budget with other covariates on opening box office.
Several studies have established a significant link between advertising expenditures and box-office grosses [47, 48, 49, 50]. The fact that almost 90% of a movie’s advertising budget is allocated in the weeks leading up to the theatrical launch [49] shows the importance of prerelease advertising. The number of screens on which a movie is released has been recognized as one of the most significant factors related to the box office [51, 52, 53]. Prerelease advertising spending and the number of opening screens need to be considered endogenous because movies that are expected to generate high box office gross plausibly receive more advertising and wider distribution. That is, advertising spending and distribution are likely to be determined by expected box office revenues.
Major studios dominate the movie marketplace regarding film production and distribution. The production budget is an essential predictor because big budgets translate into the casting of top actors and directors, lavish sets and costumes, special effects, and expensive digital manipulations, leading to heightened audience attractiveness [54, 55]. Previous studies [55, 56, 57] used production budget as a direct influencer or moderating variable, but it is also a strategic decision that the studio makes using knowledge about viewers and competitors’ actions; that is, the data reflect the firm’s strategic behavior [58]. While researchers have examined endogeneity in advertising responsiveness using a control function approach [14] or price endogeneity using a Gaussian copula [9], they did not simultaneously control for multiple endogenous variables or incorporate model uncertainty. The proposed approach can test the effects of three endogenous variables in a generalized framework.
5.1 Description of the data
Starting from all movies released by major studios from 2006 to 2007, we analyzed 130 movies, including 16 animation and 50 R-rated movies, based on the IMDb database. We excluded films without complete prerelease advertising information from TNS Media Intelligence. Advertising data include the total dollar value of prerelease media expenditure across 17 different media. The number of opening screens, production budget, and opening box office gross are obtained from IMDb.com and BoxOfficeMojo.com. Table 1 shows the summary statistics of the dependent variable and three endogenous variables. Opening box office gross varies from less than a million to over 100 million dollars. The production budget represents the most significant expense for movie studios [49]. For movies in our sample, production budgets average about $52 million and range from $4 million to $210 million. Films with high production costs must succeed at the box office to recover those costs, which results in higher advertising spending and showings at more theaters.
Variable | Mean | Median | SD | Range |
---|---|---|---|---|
Opening box office ($ million) | 20.48 | 14.32 | 17.90 | (0.72, 102.75) |
Prerelease advertising | 4.39 | 4.16 | 2.17 | (0.69, 10.79) |
Number of opening screens | 2729 | 2692 | 707 | (825, 4054) |
Production budget ($ million) | 52.15 | 35.0 | 44.21 | (4, 210) |
The three endogenous predictors were regressed on eleven potential instruments and thirteen additional covariates, summarized in Table 2. Covariates such as genre, MPAA rating, animation, sequels, and release date are publicly available on IMDb and The Numbers. Genre is classified into seven categories (action, comedy, drama, horror, Sci-Fi, mystery/suspense, and romance), with all other genres as the reference category, and the MPAA rating is coded with two dummy variables (R and PG-13), with all other ratings as the reference category.
Group | Variable | Description |
---|---|---|
Instrumental variables (Z) | Release time | Period indicator based on 10-year box office gross (1 = May, June, July, December; 0 = other months) |
 | Expert | Marketability ratings of industry experts |
 | Direct | Production and distribution by the same company |
 | Distributor | Production studio dummy variables (SD1–SD6): 1 = FOX, 2 = COLUMBIA, 3 = WARNER BROTHERS, 4 = UNIVERSAL, 5 = PARAMOUNT, 6 = DREAMWORKS, 7 = Others |
Covariates (W) | Seasonality | Seasonal index by decomposition model |
 | Sequels | Dummy variable |
 | Animation | Animation movies |
 | Critics review | Movie ratings from Rotten Tomatoes (0–100 points) |
 | GD1–GD7 | Genre dummy variables: 1 = action/adventure, 2 = comedy, 3 = drama, 4 = horror, 5 = Sci-Fi, 6 = mystery/suspense, 7 = romance, 8 = others |
 | RD1–RD2 | MPAA rating dummy variables: 1 = R, 2 = PG13, 3 = Others |
The MPAA rating is related to the potential size of the audience. Movies without an R rating are open to more moviegoers from the outset, making wider releases and intensive advertising necessary. Critics’ ratings are obtained from the Rotten Tomatoes website, which gives a composite score of 0–100 based on evaluations from movie critics. A monthly seasonality index was obtained by estimating a decomposition model using a time series of monthly box office gross. The seasonal parameter was optimized at 0.56, with a mean absolute percentage error of 10.5%.
For the two endogenous variables, prerelease advertising and number of opening screens, we have used four common sets of instruments drawn from the 11 variables: (a) movie distributors, (b) release time, (c) average marketability ratings by three industry experts in one of the major studios, and (d) whether the same studio did production and distribution. Studios have considerable discretion over the amount and schedule of prelaunch advertising they allocate to each movie [51]. Because advertising elasticities for motion pictures are significantly higher than in other industries [52], studios’ decisions on prerelease advertising spending and opening screens would have a significant impact on box office success. We have included eight major studios to examine any studio-specific effects on advertising and distribution. Release time is another critical characteristic since movie advertising is seasonal, as heavily supported movies are usually released in peak seasons [51]. Based on the monthly box office gross from 2001 to 2010, we have found a substantial increase in box office gross in May–July and December. A dummy variable is used to indicate those months. For the third endogenous variable, production budget, we exclude release time and expert ratings since they are unavailable at the time of the budget decision. Similarly, the seasonal index and critics’ reviews were also excluded from the regression of the production budget. Some major studios like 20th Century Fox and Paramount are vertically integrated, with their own distribution divisions. A dummy variable
5.2 Results
Table 3 shows the IVBMA posterior estimates of the first stage. The sum of the models’ posterior probabilities containing the variable is called the inclusion probability [16, 23]. In Table 3, column
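Variable | Prerelease advertising: IncProb | Prerelease advertising: Mean | Prerelease advertising: Quantile | Opening screens: IncProb | Opening screens: Mean | Opening screens: Quantile | Production budget: IncProb | Production budget: Mean | Production budget: Quantile |
---|---|---|---|---|---|---|---|---|---|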
Intercept | 0.436 | −0.130 | (−1.437, 0.872) | 0.542 | 0.356 | (−0.908, 2.189) | 1 | 16.496 | (16.037, 16.909) |
Sequels | 1 | −0.415 | (−0.480, −0.348) | 0.201 | 0.025 | (0, 0.211) | 0.909 | 0.491 | (0, 0.914) |
Animation | 1 | 0.197 | (0.133, 0.266) | 0.365 | 0.061 | (0, 0.296) | 0.901 | 0.562 | (0, 1.083) |
Seasonal index | 1 | 1.418 | (1.004, 1.836) | 0.817 | 0.745 | (0, 1.727) | |||
Critics review | 1 | 1.111 | (1.037, 1.184) | 0.070 | 0 | (−0.025, 0.032) | |||
R | 0.089 | 0.004 | (0, 0.068) | 0.853 | −0.139 | (−0.288, 0) | 0.242 | 0.041 | (−0.123, 0.445) |
PG-13 | 1 | 0.511 | (0.467, 0.564) | 0.189 | −0.017 | (−0.174, 0) | 0.964 | 0.436 | (0, 0.791) |
Action/adventure | 0.026 | 0 | (0, 0) | 0.055 | 0.001 | (0, 0.029) | 0.987 | 0.494 | (0.183, 0.771) |
Comedy | 0.048 | −0.002 | (−0.033, 0) | 0.062 | 0.002 | (0, 0.038) | 0.539 | 0.181 | (0, 0.631) |
Drama | 0.048 | 0.001 | (0, 0.030) | 0.903 | −0.149 | (−0.259, 0) | 0.168 | −0.009 | (−0.225, 0.124) |
Horror | 1 | −0.229 | (−0.289, −0.169) | 0.081 | −0.004 | (−0.084, 0) | 0.446 | −0.139 | (−0.595, 0) |
Sci-Fi | 0.062 | 0.003 | (0, 0.051) | 0.102 | −0.006 | (−0.112, 0) | 0.202 | 0.017 | (−0.178, 0.331) |
Mystery/suspense | 0.057 | 0.002 | (0, 0.035) | 0.046 | 0 | (0, 0) | 0.212 | 0.036 | (−0.004, 0.358) |
Romance | 0.095 | −0.005 | (−0.068, 0) | 0.125 | −0.011 | (−0.150, 0) | 0.316 | −0.085 | (−0.600, 0.046) |
Direct | 0.028 | 0 | (0, 0) | 0.093 | 0.005 | (0, 0.087) | 0.724 | 0.222 | (0, 0.543) |
Fox | 0.999 | −0.147 | (−0.203, −0.090) | 0.065 | 0 | (−0.025, 0.015) | 0.219 | 0.021 | (−0.196, 0.383) |
Columbia | 0.028 | 0 | (0, 0) | 0.074 | −0.002 | (−0.057, 0.003) | 0.724 | 0.331 | (0, 0.838) |
Paramount | 0.030 | 0 | (0, 0) | 0.114 | 0.009 | (0, 0.134) | 0.994 | 0.725 | (0.311, 1.124) |
Universal | 0.034 | 0 | (0, 0) | 0.072 | 0.003 | (0, 0.077) | 0.885 | 0.488 | (0, 0.943) |
Warner Brothers | 0.053 | 0.002 | (0, 0.036) | 0.071 | −0.001 | (−0.041, 0.020) | 0.987 | 0.712 | (0.253, 1.144) |
MGM | 0.038 | 0.001 | (0, 0) | 0.089 | 0.005 | (0, 0.118) | 0.249 | 0.053 | (−0.088, 0.504) |
Lions Gate | 1 | −0.421 | (−0.498, −0.343) | 0.404 | −0.073 | (−0.314, 0) | 0.948 | −0.676 | (−1.161, 0) |
Buena Vista | 0.040 | 0 | (0, 0) | 0.092 | 0 | (−0.051, 0.057) | 0.208 | 0.008 | (−0.262, 0.348) |
Expert | 1 | 2.134 | (1.886, 2.449) | 1 | 1.596 | (1.139, 1.947) | |||
Release Time | 0.026 | 0 | (0, 0) | 0.068 | 0.002 | (0, 0.054) | |||
As expected, a seasonal index shows a high inclusion probability for both endogenous variables, which aligns with the common belief that movies with high expected gross are carefully scheduled to be released in peak seasons.
For prerelease advertising, the PG-13 rating is included with probability one. This relates to the size of the potential audience, since non-R ratings imply greater reach among moviegoers, which may result in a higher level of advertising. Several systematic investigations provide empirical evidence that R-rated movies generate smaller revenues than those with less restrictive ratings [47, 62]. We also find that the dummy variable GD4 for horror films is an important predictor of prerelease advertising, with an inclusion probability of one. This result may reflect the popular trend at that time. There are 15 horror movies in the sample including
In contrast, critical reviews were not included in explaining opening screens. It is consistent with the findings that the relationship between reviews and distributor’s decision is spurious [65], and there is only a positivity bias of exhibitors such that an excellent review allows a movie to stay longer on-screen while negative reviews do not shorten a film’s run [66]. That is, critical reviews do not influence an exhibitor’s decision to keep or withdraw a movie from a theater.
As shown in Table 3, regarding production budget, distributor effects are evident from the high inclusion probabilities of the studios besides movie characteristics such as
Table 4 shows the IVBMA posterior estimates of the second-stage regression. We tested instrument validity using the Bayesian approach described in Section 4, in which the validity score represents the probability that the instrument condition is not satisfied. The validity scores for all instruments used in the study are essentially zero, which strongly supports the validity of the instrumentation choices. In the second stage, several variables are essential predictors of opening box office revenues. As expected, the number of opening screens and prerelease advertising are important determinants of opening box office gross, with high inclusion probabilities. Though it is difficult to disentangle the causal effect of advertising on sales using data on actual box office receipts, this result is consistent with previous findings that prerelease advertising has a positive and statistically significant impact on public awareness of a movie and its box office performance [47, 49, 50, 68]. While Elberse and Eliashberg [52] argue that movie attributes and advertising expenditures mostly influence revenues indirectly through their impact on exhibitors’ screen allocations, this result supports a significant direct effect of advertising. The number of opening screens is the most important predictor, with an inclusion probability of one, which is also consistent with previous findings [53, 69, 70]. It seems that the more screens on which a new movie is released, the bigger its initial audience. The larger the audience for a movie in the opening weekend, the larger its audience the following week. While audiences inevitably drop off over time, a movie’s cinema run would be longer if it got off to a good start. Considering the typically high correlation between opening screens and prerelease advertising, studios’ advertising and distribution approaches may be very similar.
Other than these two factors, Sequels and Drama show high inclusion probabilities, which may only reflect the characteristics of successful movies in the sample. Though we initially expected a strong effect of seasonality, its inclusion probability is only moderate (0.569). Production budget has a low inclusion probability (0.120), which suggests that a movie’s production cost is an indicator of the creative talent involved or the extent to which the movie incorporates expensive special effects or uses elaborate set designs [49], but not a good indicator of success. Of about 90 films released in the United States from 2008 to 2012 with budgets of more than $100 million, most failed to generate enough revenue at the box office to cover their costs [71]. After all, big budgets do not guarantee success, and the only way to know how audiences react to a movie is to wait until it has been released and moviegoers have had the opportunity to see it.
Variable | IncProb | Mean | SD | Quantile | Conditional mean | Conditional SD |
---|---|---|---|---|---|---|
Constant | 0.529 | −0.284 | 0.725 | (−2.078, 0.989) | −0.525 | 0.926 |
Sequels | 0.963 | 0.564 | 0.203 | (0, 0.915) | 0.585 | 0.174 |
Animation | 0.147 | −0.001 | 0.065 | (−0.166, 0.163) | −0.001 | 0.170 |
R | 0.103 | 0.002 | 0.041 | (−0.071, 0.096) | 0.016 | 0.126 |
PG-13 | 0.201 | 0.034 | 0.096 | (−0.001, 0.345) | 0.172 | 0.149 |
Action/adventure | 0.098 | 0.002 | 0.034 | (−0.041, 0.097) | 0.031 | 0.104 |
Comedy | 0.142 | −0.015 | 0.056 | (−0.205, 0.004) | −0.102 | 0.116 |
Drama | 0.797 | −0.238 | 0.157 | (−0.509, 0) | −0.298 | 0.113 |
Horror | 0.264 | 0.051 | 0.113 | (0, 0.383) | 0.193 | 0.143 |
Sci-Fi | 0.150 | 0.007 | 0.070 | (−0.117, 0.215) | 0.054 | 0.173 |
Mystery/suspense | 0.136 | −0.012 | 0.050 | (−0.183, 0.007) | −0.089 | 0.108 |
Romance | 0.129 | 0.003 | 0.052 | (−0.104, 0.138) | 0.019 | 0.143 |
Seasonal index | 0.569 | −0.446 | 0.668 | (−2.013, 0.445) | −0.781 | 0.723 |
Critics review | 0.291 | 0.038 | 0.216 | (−0.371, 0.670) | 0.130 | 0.384 |
Prerelease advertising | 0.918 | 0.452 | 0.200 | (0, 0.758) | 0.492 | 0.153 |
Opening screens | 1 | 1.287 | 0.306 | (0.756, 1.963) | 1.285 | 0.306 |
Production budget | 0.120 | 0.008 | 0.046 | (−0.019, 0.164) | 0.071 | 0.115 |
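The columns reported in Table 4 can be related to sampler output with a short sketch. It is a hedged illustration, not the authors' code: it assumes that the sampler stores an exact zero for a coefficient whenever the variable is excluded from the current model, and that the reported quantiles are the 2.5% and 97.5% points, which the text here does not confirm. The function name is hypothetical.

```python
import numpy as np

def summarize_draws(draws, probs=(0.025, 0.975)):
    """Summaries of coefficient draws of shape (n_iter, p).

    A draw equal to zero is taken to mean that the variable was excluded
    at that iteration (an assumption about how the sampler stores output).
    """
    included = draws != 0
    counts = included.sum(axis=0)
    inc_prob = included.mean(axis=0)              # share of iterations with the variable in
    mean = draws.mean(axis=0)                     # unconditional mean, zeros included
    sd = draws.std(axis=0)
    quant = np.quantile(draws, probs, axis=0)     # assumed 2.5% and 97.5% points
    cond_mean = np.where(counts > 0,
                         draws.sum(axis=0) / np.maximum(counts, 1),
                         np.nan)                  # mean over iterations where included
    return inc_prob, mean, sd, quant, cond_mean

# Synthetic draws: one variable always included, one included about 20% of the time.
rng = np.random.default_rng(0)
n_iter = 4000
always = rng.normal(1.3, 0.3, size=n_iter)
sometimes = np.where(rng.uniform(size=n_iter) < 0.2,
                     rng.normal(0.5, 0.1, size=n_iter), 0.0)
draws = np.column_stack([always, sometimes])

inc_prob, mean, sd, quant, cond_mean = summarize_draws(draws)
print(np.round(inc_prob, 3), np.round(mean, 3), np.round(cond_mean, 3))
```

Under this convention the unconditional mean equals the inclusion probability times the conditional mean. The Drama row of Table 4 is consistent with that relation: 0.797 × (−0.298) ≈ −0.238.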
6. Conclusion
Market response models often use endogenous regressors since marketing activities are nonrandom and reflect the firm’s strategic behavior. Thus, ignoring the endogeneity of marketing actions will lead to incorrect estimates of response parameters and, consequently, to biased inferences [4, 58]. While researchers have developed various approaches to dealing with endogeneity, including the control function approach, Gaussian copula, or instrument-free approaches, the IV approach remains the technique of choice when dealing with endogeneity in econometrics and other areas of applied research. Almost invariably, empirical work in economics and marketing will be subject to much uncertainty about model specifications. This may result from the existence of different theories, from different ways in which theories can be implemented in empirical models, or from other aspects such as assumptions about heterogeneity or independence of the observables [72]. It is important to realize that this uncertainty is an inherent part of marketing response modeling.
We have proposed a computationally efficient solution to the problem of incorporating model uncertainty into IV estimation. The IVBMA method leverages an existing Gibbs sampler and shows that by nesting model moves inside this framework, model averaging can be performed with minimal additional effort. In contrast to the approximate solution proposed by [16], our method yields a theoretically justified, fully Bayesian procedure. The applied examples show this method’s benefit, by enabling additional factors to be entertained by the researcher, which are either incorporated where appropriate or promptly dropped.
The CBF approach is only one manner of incorporating model uncertainty in the framework considered. Two other options would be reversible jump schemes [29, 30] or spike-and-slab priors [73]. We have chosen our approach because it fits nicely into the Gibbs sampling framework, unlike the reversible jump procedure of Koop et al. [29], and still explicitly incorporates uncertainty at the model level, unlike spike-and-slab priors, which operate at the variable level. However, additional research is needed to explore the tradeoffs between these alternative methods of incorporating model uncertainty.
One assumption crucial to the Gibbs sampler’s functioning is the multivariate normality of the residuals in Eq. (2). Conley et al. [74] discuss a Bayesian approach that allows nonparametric estimation of the distribution of error terms in a set of simultaneous equations using a Dirichlet process mixture (DPM). We note that the IVBMA methodology can readily incorporate the DPM framework by simply replacing the IV kernel distributions of [36] with IVBMA kernel distributions. A nonparametric IVBMA approach based on non-normal errors will be one of the model extensions in the future. Another critical issue is assessing instruments’ validity in implementing IV methods. The Bayesian version of the Sargan test that we have proposed serves as a natural starting point for more involved methodologies, including latent factors though many features still need to be investigated on this front compared to other strategies.
IVBMA has the potential to be extended to more complicated likelihood frameworks. The proposed model can be extended to latent constructs in the context of structural equations modeling with latent Gaussian factors and, at the same time, selecting the suitable path model [75]. Survival analysis is another area that can benefit from the IVBMA approach in dealing with multiple endogenous regressors and implementing more flexible hazard specifications beyond the proportional hazard model [76]. Since the entire method uses a Gibbs framework, it can be incorporated in any setting where endogeneity, model uncertainty, and latent normality are present. In particular, the linear specification can be relaxed using semiparametric methods such as splines or more flexible approaches involving Gaussian processes. While the algorithms involved would understandably become more complex, the central concept involving using CBFs to assess model uncertainty would remain pertinent.
Here we outline the calculation of
Let
where
We can now see that the term in the integral is the canonical form of a Gaussian distribution. Appropriate completion therefore yields
Let
and for
where
The MCMC for this model roughly follows the algorithm mentioned above, but with the additional handling of the random effect
where
with
Further, denote
Writing
we have
Hence, by setting
we may sample
Once all
References
1. Papies D, Ebbes P, Van Heerde HJ. Addressing endogeneity in marketing models. In: Leeflang PSH, Wieringa JE, Bijmolt THA, Pauwels KH, editors. Advanced Methods for Modeling Markets. Cham: Springer International Publishing; 2017. pp. 581-627. DOI: 10.1007/978-3-319-53469-5_18
2. Dong X, Chintagunta PK, Manchanda P. A new multivariate count data model to study multi-category physician prescription behavior. Quantitative Marketing and Economics. 2011;9:301-337. DOI: 10.1007/s11129-011-9102-7
3. Chintagunta P, Erdem T, Rossi PE, Wedel M. Structural modeling in marketing: Review and assessment. Marketing Science. 2006;25:604-616. DOI: 10.1287/mksc.1050.0161
4. Hult GTM, Hair JF, Proksch D, Sarstedt M, Pinkwart A, Ringle CM. Addressing endogeneity in international marketing applications of partial least squares structural equation modeling. Journal of International Marketing. 2018;26:1-21. DOI: 10.1509/jim.17.0151
5. Manchanda P, Rossi PE, Chintagunta PK. Response modeling with nonrandom marketing-mix variables. Journal of Marketing Research. 2004;41:467-478. DOI: 10.1509/jmkr.41.4.467.47005
6. Chintagunta PK. Endogeneity and heterogeneity in a probit demand model: Estimation using aggregate data. Marketing Science. 2001;20:442-456. DOI: 10.1287/mksc.20.4.442.9751
7. Villas-Boas JM, Winer RS. Endogeneity in brand choice models. Management Science. 1999;45:1324-1338. DOI: 10.1287/mnsc.45.10.1324
8. Rossi PE. Even the rich can make themselves poor: A critical examination of IV methods in marketing applications. Marketing Science. 2014;33:655-672. DOI: 10.1287/mksc.2014.0860
9. Park S, Gupta S. Handling endogenous regressors by joint estimation using copulas. Marketing Science. 2012;31:567-586. DOI: 10.1287/mksc.1120.0718
10. Ebbes P, Papies D, Heerde HJ. The sense and non-sense of holdout sample validation in the presence of endogeneity. Marketing Science. 2011;30:1115-1122. DOI: 10.1287/mksc.1110.0666
11. Kuksov D, Villas-Boas JM. Endogeneity and individual consumer choice. Journal of Marketing Research. 2008;45:702-714. DOI: 10.1509/jmkr.45.6.702
12. Louviere J, Train K, Ben-Akiva M, Bhat C, Brownstone D, Cameron TA, et al. Recent progress on endogeneity in choice modeling. Marketing Letters. 2005;16:255-265. DOI: 10.1007/s11002-005-5890-4
13.
Petrin A, Train K. A control function approach to endogeneity in consumer choice models. Journal of Marketing Research. 2010; 47 :3-13. DOI: 10.1509/jmkr.47.1.3 - 14.
Luan Y, Sudhir K. Forecasting marketing-mix responsiveness for new products. Journal of Marketing Research. 2010; 47 :444-457. DOI: 10.1509/jmkr.47.3.444 - 15.
Moral-Benito E. Dynamic panels with predetermined regressors: Likelihood-based estimation and Bayesian averaging with an application to cross-country growth. Banco de Espana Working Paper. 2011. DOI: 10.2139/ssrn.1844186 - 16.
Lenkoski A, Eicher TS, Raftery AE. Two-stage Bayesian model averaging in endogenous variable models. Econometric Reviews. 2014; 33 :37-41. DOI: 10.1080/07474938.2013.807150 - 17.
Fernández C, Ley E, Steel MFJ. Benchmark priors for Bayesian model averaging. Journal of Econometrics. 2001; 100 :381-427. DOI: 10.1016/S0304-4076(00)00076-2 - 18.
Moral-Benito E. Model averaging in economics: An overview. Journal of Economic Surveys. 2015; 29 :46-75. DOI: 10.1111/joes.12044 - 19.
Abrevaya J, Hausman JA, Khan S. Testing for causal effects in a generalized regression model with endogenous regressors. Econometrica. 2010; 78 :2043-2061. DOI: 10.3982/ecta7133 - 20.
Lewis SM, Eccleston JA, Russell KG. Designs for generalized linear models with several variables and model uncertainty. Technometrics. 2006; 48 :284-292. DOI: 10.1198/004017005000000571 - 21.
Clyde M, George EI. Model uncertainty. Statistical Science. 2004; 19 :81-94. DOI: 10.1214/088342304000000035 - 22.
Raftery A. Approximate Bayes factors and accounting for model uncertainty in generalized linear models. Biometrika. 1996; 83 :251-266. DOI: 10.1093/biomet/83.2.251 - 23.
Hinne M, Gronau QF, van den Bergh D, Wagenmakers E-J. A conceptual introduction to Bayesian model averaging. Advances in Methods and Practices in Psychological Science. 2020; 3 (2):200-215. DOI: 10.1177/2515245919898657 - 24.
Cohen-Cole E, Durlauf S, Fagan J, Nagin D. Model uncertainty and the deterrent effect of capital punishment. American Law and Economics Review. 2009; 11 :335-369. DOI: 10.1093/aler/ahn001 - 25.
Durlauf SN, Kourtellos A, Tan CM. Is god in the details? A reexamination of the role of religion in economic growth. Journal of Applied Econometrics. 2012; 27 :1059-1075. DOI: 10.1002/jae.1245 - 26.
Kleibergen F, Zivot E. Bayesian and classical approaches to instrumental variable regression. Journal of Econometrics. 2003; 114 :29-72. DOI: 10.1016/S0304-4076(02)00219-1 - 27.
Kass RE, Wasserman L. A reference test for nested hypotheses with large samples. Journal of the American Statistical Association. 1995; 90 :928-934. DOI: 10.1080/01621459.1995.10476592 - 28.
Mirestean AT, Tsangarides CG, Chen H. Limited information Bayesian model averaging for dynamic panels with short time periods. IMF Working Papers. 2009; 2009 :A001. DOI: 10.5089/9781451872217.001 - 29.
Koop G, Leon-Gonzalez R, Strachan R. Bayesian model averaging in the instrumental variable regression model. Journal of Econometrics. 2012; 171 :237-250. DOI: 10.1016/j.jeconom.2012.06.005 - 30.
Green PJ. Reversible jump Markov Chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995; 82 :711-732. DOI: 10.1093/biomet/82.4.711 - 31.
Holmes CC, Denison DGT, Mallick BK. Accounting for model uncertainty in seemingly unrelated regressions. Journal of Computational and Graphical Statistics. 2002; 11 :533-551. DOI: 10.1198/106186002475 - 32.
Drèze JH. Bayesian limited information analysis of the simultaneous equations model. Econometrica. 1976; 44 :1045-1075. DOI: 10.2307/1911544 - 33.
Kleibergen F, van Dijk HK. Bayesian simultaneous equations analysis using reduced rank structures. Econometric Theory. 1998; 14 :701-743. DOI: 10.1017/S0266466698146017 - 34.
Strachan R, Inder B. Bayesian analysis of the error correction model. Journal of Econometrics. 2004; 123 :307-325. DOI: 10.1016/j.jeconom.2003.12.004 - 35.
Madigan D, York J. Bayesian graphical models for discrete data. International Statistical Review. 1995; 63 :215-232. DOI: 10.2307/1403615 - 36.
Rossi PE, Allenby GM, McCulloch R. Bayesian Statistics and Marketing. New York: Wiley; 2006. DOI: 10.1002/0470863692 - 37.
Dickey JM, Gunel E. Bayes factors from mixed probabilities. Journal of the Royal Statistical Society: Series B: Methodological. 1978; 40 :43-46. DOI: 10.1111/j.2517-6161.1978.tb01645.x - 38.
Sargan JD. The estimation of economic relationships with instrumental variables. Econometrica. 1958; 26 :393-415. DOI: 10.2307/1907619 - 39.
McCulloch CE, Searle S, Neuhaus JM. Generalized, Linear, and Mixed Models. 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc.; 2008 - 40.
Natarajan R, Kass RE. Reference Bayesian methods for generalized linear mixed models. Journal of the American Statistical Association. 2000; 95 :227-237. DOI: 10.1080/01621459.2000.10473916 - 41.
Zeger SL, Karim MR. Generalized linear models with random effects: A Gibbs sampling approach. Journal of the American Statistical Association. 1991; 86 :79-86. DOI: 10.2307/2289717 - 42.
McCulloch CE, Nelder JA. Generalized Linear Models. 2nd ed. Chapman and Hall; 1989 - 43.
van Ravenzwaaij D, Cassey P, Brown SD. A simple introduction to Markov Chain Monte–Carlo sampling. Psychological Bulletin & Review. 2018; 25 :143-154. DOI: 10.3758/s13423-016-1015-8 - 44.
Hall DB. Zero-inflated Poisson and binomial regression with random effects: A case study. Biometrics. 2000; 56 :1030-1039. DOI: 10.1111/j.0006-341X.2000.01030.x - 45.
Albert J. A Bayesian analysis of a Poisson random effects model for home run hitters. The American Statistician. 1992; 46 :246-253. DOI: 10.2307/2685306 - 46.
Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association. 1995; 90 :773-795. DOI: 10.1080/01621459.1995.10476572 - 47.
Gunter B. Predicting Movie Success at the Box Office. Palgrave Macmillan Cham: Springer International Publishing; 2018 - 48.
Rao VR, Ravid SAA, Gretz RT, Chen J, Basuroy S. The impact of advertising content on movie revenues. Marketing Letters. 2017; 28 :341-355. DOI: 10.1007/s11002-017-9418-5 - 49.
Elberse A, Anand B. The effectiveness of pre-release advertising for motion pictures: An empirical investigation using a simulated market. Information Economics and Policy. 2007; 19 :319-343. DOI: 10.1016/j.infoecopol.2007.06.003 - 50.
Zufryden FS. Linking advertising to box office performance of new film releases - a marketing planning model. Journal of Advertising Research. 1996; 36 :29 - 51.
Joshi A, Hanssens DM. The direct and indirect effects of advertising spending on firm value. Journal of Marketing. 2010; 74 :20-33. DOI: 10.1509/jmkg.74.1.20 - 52.
Elberse A, Eliashberg J. Demand and supply dynamics for sequentially released products in international markets: The case of motion pictures. Marketing Science. 2003; 22 :329-354. DOI: 10.1287/mksc.22.3.329.17740 - 53.
Neelamegham R, Chintagunta PK. A Bayesian model to forecast new product performance in domestic and international markets. Marketing Science. 1999; 18 :115-136. DOI: 10.1287/mksc.18.2.115 - 54.
Chang BH, Ki EJ. Devising a practical model for predicting theatrical movie success: Focusing on the experience good property. Journal of Media Economics. 2005; 18 :247-269. DOI: 10.1207/s15327736me1804_2 - 55.
Basuroy S, Chatterjee S, Ravid SA. How critical are critical reviews? The box office effects of film critics, star power, and budgets. Journal of Marketing. 2003; 67 :103-117. DOI: 10.1509/jmkg.67.4.103.18692 - 56.
Wasserman M, Mukherjee S, Scott K, Zeng XHT, Radicchi F, Amaral LAN. Correlations between user voting data, budget, and box office for films in the internet movie database. Journal of the Association for Information Science and Technology. 2015; 66 :858-868. DOI: 10.1002/asi.23213 - 57.
Simonton DK. Cinematic creativity and production budgets: Does money make the movie? The Journal of Creative Behavior. 2005; 39 :1-15. DOI: 10.1002/j.2162-6057.2005.tb01246.x - 58.
Dong X, Manchanda P, Chintagunta P. Quantifying the benefits of individual-level targeting in the presence of firm strategic behavior. Journal of Marketing Research. 2009; 46 :207-221. DOI: 10.1509/jmkr.46.2.207 - 59.
Sood S, Drze X. Brand extensions of experiential goods: Movie sequel evaluations. Journal of Consumer Research. 2006; 33 :352-360. DOI: 10.1086/508520 - 60.
Basuroy S, Chatterjee S. Fast and frequent: Investigating box office revenues of motion picture sequels. Journal of Business Research. 2008; 61 :798-803. DOI: 10.1016/j.jbusres.2007.07.030 - 61.
Smith D, Park C. The effects of brand extensions on market share and advertising efficiency. Journal of Marketing Research. 1992; 29 (3):296-313. DOI: 10.2307/3172741 - 62.
Ravid SA. Information, blockbusters and stars? A study of the film industry. The Journal of Business. 1999; 72 :463-492. DOI: 10.1086/209624 - 63.
Joshi M, Das D, Gimpel K, Smith NA. Movie reviews and revenues: An experiment in text regression. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, USA: Association for Computational Linguistics: 2010, p. 293–296. - 64.
Debenedetti S, Ghariani G. To quote or not to quote? Critics’ quotations in film advertisements as indicators of the continuing authority of film criticism. Poetics. 2018; 66 :30-41. DOI: 10.1016/j.poetic.2018.02.003 - 65.
Reinstein D, Snyder MC. The influence of expert reviews on consumer demand for experience goods: A case study of movie critics. Journal of Industrial Economics. 2005; 53 :27-51. DOI: 10.1111/j.0022-1821.2005.00244.x - 66.
Legoux R, Larocque D, Laporte S, Belmati S, Boquet T. The effect of critical reviews on exhibitors’ decisions: Do reviews affect the survival of a movie on screen? International Journal of Research in Marketing. 2016; 33 :357-374. DOI: 10.1016/j.ijresmar.2015.07.003 - 67.
Filson D, Switzer D, Besocke P. At the movies: The economics of exhibition contracts. Economic Inquiry. 2005; 43 :354-369. DOI: 10.1093/ei/cbi024 - 68.
Eliashberg J, Jonker JJ, Sawhney MS, Wierenga B. MOVIEMOD: An implementable decision-support system for prerelease market evaluation of motion pictures. Marketing Science. 2000; 19 :226-243. DOI: 10.1287/mksc.19.3.226.11796 - 69.
Rao A, Hartmann W. Quality vs. variety: Trading larger screens for more shows in the era of digital cinema. Quantitative Marketing and Economics. 2015; 13 :117-134. DOI: 10.1007/s11129-015-9156-z - 70.
Moul CC, Shugan SM. Theatrical release and the launching of motion pictures. In: Moul CC, editor. A Concise Handbook of Movie Industry Economics. Cambridge: Cambridge University Press; 2005. pp. 80-137. DOI: 10.1017/CBO9780511614422.005 - 71.
Ghiassi M, Lio D, Moon B. Pre-production forecasting of movie revenues with a dynamic artificial neural network. Expert Systems with Applications. 2015; 42 :3176-3193. DOI: 10.1016/j.eswa.2014.11.022 - 72.
Steel MFJ. Model averaging and its use in economics. Journal of Economic Literature. 2020; 58 :644-719. DOI: 10.1257/jel.20191385 - 73.
George EI, McCulloch RE. Variable selection via Gibbs sampling. Journal of the American Statistical Association. 1993; 88 :881-889. DOI: 10.1080/01621459.1993.10476353 - 74.
Conley TG, Hansen CB, McCulloch RE, Rossi PE. A semi-parametric Bayesian approach to the instrumental variable problem. Journal of Econometrics. 2008; 144 :276-305. DOI: 10.1016/j.jeconom.2008.01.007 - 75.
Kaplan D, Lee C. Bayesian model averaging over directed acyclic graphs with implications for the predictive performance of structural equation models. Structural Equation Modeling: A Multidisciplinary Journal. 2016; 23 :343-353. DOI: 10.1080/10705511.2015.1092088 - 76.
Volinsky CT, Madigan D, Raftery AE, Kronmal RA. Bayesian model averaging in proportional hazard models: Assessing the risk of a stroke. Journal of the Royal Statistical Society: Series C: Applied Statistics. 1997; 46 :433-448. DOI: 10.1111/1467-9876.00082