A Bayesian Model for Investment Decisions in Early Ventures

In this research, we present a Bayesian model to aid investment decisions in early-stage start-ups and ventures. The model addresses both the venture capital and the angel investing markets. It is informed both by previous academic literature on entrepreneurship and by venture capital investment practices. The model is validated through an anonymized experiment in which reviewers with previous experience in entrepreneurship, investment, or both scored a list of 20 anonymous real companies for which we knew the outcome a priori. The experiment revealed that the model and the online scoring platform that we built provide an accuracy of 83% in identifying companies that would later fail and where the investments would be lost. The model also performs fairly well in identifying companies where the investors would not lose their money but would either have to wait a very long time for their returns or would not receive a large return on investment (ROI). We also show that the model performs only modestly in identifying “big exit” companies, that is, companies where the investors would receive a high ROI in a fairly short amount of time.


Introduction
One of the biggest challenges facing early-stage investors is a lack of actionable data and effective analytics. Most investment decisions are made based on the instinct (heuristics) of the investor, who may or may not have experience in the sector, and such decisions are often inherently biased. The investment environment is increasingly complex, and investors cannot process all of the factors that are critical to the success of a potential investment and still make a well-informed decision. Research suggests that well-built analytic models make better decisions than human experts across virtually every field [1].
Some of the newest data on the returns on angel investment show that these are about 2.5 times the value of the initial investment and the average period of recovery of investment is 3.6 years [2].
In general, there is little literature on automated techniques or models for investment decisions. A very recently published paper presents an interesting risk analysis model that would reduce the risk of investing in early entrepreneurs [3]. This research takes a similar approach, namely reducing the number of "bad" investment decisions, but it uses a different model, a Bayesian network, which performs well in identifying the future failures of new ventures.
While there is understandably little academic literature on forecasting future start-up success and its relationship to investment decision-making, due to the confidentiality of the data in this business, decision-making practice in the venture capital and angel investment industries relies heavily on the experience of the investors and on the "collective" thinking of the investors who gather to rate or assess the pitches or business proposals for various funding rounds. Therefore, this chapter presents a model for investment decision-making that is informed mainly by practitioners and is intended to be applied to investment practice. Its aim is to be a tool that helps make the process of rating seed and start-up ventures more informative and transparent for both investors and entrepreneurs.
The model built for this research is mainly informed by interviews and discussions conducted with investors during the summer of 2014. The nodes of the model and the dependencies between the nodes were created based on these interviews, while the distributions of the prior probabilities were informed by the academic literature where such information could be found; otherwise, they are normal.
This research describes the model in general terms, how it has been implemented in practice, and the results of two experiments that were run to validate its forecasting accuracy. The construction, implementation, and validation of the model, as well as a discussion of the findings, are presented in the following sections.
The rest of this chapter is structured as follows: Section 2 describes the model and the rationale behind building it; Section 3 describes the experiments that were conducted using this model, mainly with the purpose of validating its accuracy; Section 4 presents the results from the experiments and an analysis of the accuracy of the model; and Section 5 summarizes succinctly the conclusions of this research.

The Bayesian investment decision model
We used Bayesian network modeling to build a probabilistic assessment model of early-stage companies or ventures. We based our selection of nodes/factors on a series of interviews and on close work with practitioners in venture capital funding. We afterwards implemented this model on an online platform, available at www.exogenius.net (see Figure 1).
The Bayesian model scores the potential performance of a company/start-up on a scale of [0, 100] by identifying three key measures: business execution, value proposition, and exit potential (see Figure 2). These measures are aggregated (nonlinearly) into an overall performance score. Each of these three measures scores the future potential of a project or start-up with regard to one aspect: its value proposition (which may be a technological innovation, a social value, or any business value that the entrepreneur presents as the core proposition), its ability to sustain, carry out, and fulfill that proposition (business execution), and the potential of the new venture to exit (through an IPO, a buy-out, or in any manner that would be satisfactory for the investor).
Each of these three measures is a child of five subnetworks in the model, each of which is represented by more granular parent-child nodes. These subnetworks are business/entrepreneurship factors or indicators that measure the new venture on the following aspects of the business proposal: technical difficulty, uniqueness of innovation, readiness for market, customer engagement, team performance, entrepreneurial and managerial experience, founders and incorporation of the company, and many more. Each of the granular nodes in the model has three to five states, and its prior is informed either by evidence from the published literature (as described below) or otherwise by a uniform distribution [4].
The conditional tables of each node were readjusted after a sensitivity analysis was performed, using data and facts previously published in the literature on entrepreneurship and high-growth companies [5][6][7]. For example, the states of the technology node (marginal versus breakthrough) are defined according to the literature on entrepreneurship [7][8][9]. The states of the number-of-founders node are also determined based on these prior findings: the state of 2-4 founders has the highest positive impact on the final score, while the other states have a low or negative impact (more than 5 founders lowers the chances of success significantly) [5,6].
The nodes representing team complementarity, coordination, and learning are based on the findings of the Startup Genome Project, which was run at Berkeley and Stanford Universities [5,10,11]. In other words, since the findings show that team complementarity and learning are critically important for the success of early ventures, the team node in the model reflects these findings through the prior distribution over its states.
Similarly, the nodes that assess the infrastructure of the start-up (broadly construed as not only the physical requirements to develop the proposed technology, but also the legislative, financial, or logistic infrastructure) are informed by the probabilistic values published in previous studies on organizational emergence [12].
The placement of the new venture in the current market is also assessed, and this is done based on the assessment of the projected growth of the company relative to the projected growth of the market or of the industry [7,12].
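To make the structure described above more concrete, the following is a minimal sketch of a discrete Bayesian network with two granular parent nodes feeding one key measure. It is not the authors' model: the node names, states, conditional probabilities, and the mapping of states to a [0, 100] score are all hypothetical values chosen for illustration, and inference is done by plain enumeration rather than with UnBBayes or GeNIe/SMILE.

```python
from itertools import product

# Hypothetical granular nodes with uniform priors, as the text describes for
# nodes that have no literature-based prior. All numbers are illustrative.
PRIORS = {
    "founders": {"1": 1 / 3, "2-4": 1 / 3, "5+": 1 / 3},
    "team": {"weak": 0.5, "strong": 0.5},
}

# Hypothetical conditional table P(execution | founders, team). The "2-4"
# founders state is given the most favorable rows, echoing the text.
CPT_EXECUTION = {
    ("1", "weak"): {"low": 0.6, "medium": 0.3, "high": 0.1},
    ("1", "strong"): {"low": 0.3, "medium": 0.4, "high": 0.3},
    ("2-4", "weak"): {"low": 0.4, "medium": 0.4, "high": 0.2},
    ("2-4", "strong"): {"low": 0.1, "medium": 0.3, "high": 0.6},
    ("5+", "weak"): {"low": 0.7, "medium": 0.2, "high": 0.1},
    ("5+", "strong"): {"low": 0.4, "medium": 0.4, "high": 0.2},
}

# Assumed mapping from the states of the measure to a score in [0, 100].
STATE_VALUE = {"low": 0.0, "medium": 50.0, "high": 100.0}


def posterior_execution(evidence):
    """Return P(execution | evidence), summing over unobserved parents."""
    totals = {state: 0.0 for state in STATE_VALUE}
    for f, t in product(PRIORS["founders"], PRIORS["team"]):
        # Skip parent configurations that contradict the observed answers.
        if evidence.get("founders", f) != f or evidence.get("team", t) != t:
            continue
        weight = PRIORS["founders"][f] * PRIORS["team"][t]
        for state, p in CPT_EXECUTION[(f, t)].items():
            totals[state] += weight * p
    norm = sum(totals.values())
    return {state: value / norm for state, value in totals.items()}


def expected_score(dist):
    """Collapse a distribution over states into a single [0, 100] score."""
    return sum(STATE_VALUE[state] * p for state, p in dist.items())


if __name__ == "__main__":
    # A reviewer answers only the founders question; team stays unobserved.
    dist = posterior_execution({"founders": "2-4"})
    print(dist, expected_score(dist))
```

In this sketch, answering more questions simply adds entries to the evidence dictionary, which narrows the set of parent configurations that contribute to the posterior.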
For the development of the model, we used both UnBBayes [13] and GeNIe/SMILE [14], open-source software packages dedicated to Bayesian modeling. After the model was built, tested, and developed, it was migrated to the online platform, which has an easy-to-use interface and is where we ran our experiments.
The implementation of the model on an online platform facilitated experimentation on forecasting accuracy. The nodes of the model that provide new evidence, specific to each venture, are represented as a series of 23 questions in a user-friendly interface. For example, the evidence node in the model that represents the uniqueness of the offering became the question "How unique is the proposed offering (idea/innovation/technology/product/service)?" in the online platform. The nodes that were not evidence nodes in the model are not represented as questions in the online implementation. The reviewers/users can see the progression of the three key scores (value proposition, business execution, and exit potential), as well as the final score, as they answer the individual assessment questions.

The experimental design for model validation
In order to validate the accuracy of the model scores, an anonymized experiment was designed in which 20 case studies were recreated from real, historical companies. These case studies described the state of funding and the potential of the companies while they were start-ups, before their first or second seed funding. The aim of the experiment was to show whether the exit scores or the overall scores of the model align statistically with what happened in real life.
For the experiment, 20 historical cases were randomly picked for which we know the ground truth about their financial history (how they started, how much initial funding they received, and how large their exit was), using publicly available information from the Crunchbase website, Wikipedia, and various postmortem case studies of failed start-ups. The companies in the sample had either high exits (were bought for more than $500 million), medium exits (were bought for 100-1000K or took a very long time to exit, e.g., 20 years), or no exits (they shut down or went bankrupt soon after their launch).
Each of these 20 case studies was recreated as an anonymous business proposal, given the information available at the time when the company was seeking initial funding (e.g., 2010). Each anonymized case study therefore included the following information: the year to which the reviewer had to "travel back in time" (e.g., 2010), with a hyperlink to a published summary of the most important business and technological events of that year (e.g., The Economist); the company location; the number of founders; the type of incorporation; anonymized information about the founders' experience; information about the market and industry at that time; information about the customers, the team, and the infrastructure; the financial past of the company, if it existed; and, most importantly, information about the product or technology without disclosing its brand name. The reviewers were also free to look for additional information on the web regarding the state of technology and business at that particular time in the past. The oldest case study was placed in 1999 and the newest one in 2014.
In other words, all the information about a company that was available prior to the time of its initial funding request was included, as long as it could be anonymized.
We conducted two experiments: one with experts in business or investing and the other with MBA students at the University of Maryland.
The first experiment was carried out by 24 volunteer reviewers, each of whom reviewed five of these anonymous case studies by answering, for each assigned case study, the questions on the online platform that serves as the front end of our model. The reviewers in this experiment are experienced as either entrepreneurs or investors; the experiment was therefore completed by a panel of experts.
The second experiment was carried out by MBA students at the University of Maryland in a 1-hour session. The students were also randomly assigned five case studies each and answered the same questions on the online platform as the experts did.
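The random assignment of five case studies to each reviewer can be sketched as follows. The text does not say how the randomization was actually performed, so the function name, the reviewer and case identifiers, and the use of a seeded generator are assumptions made only for illustration.

```python
import random


def assign_cases(reviewers, case_ids, per_reviewer=5, seed=None):
    """Give each reviewer `per_reviewer` distinct, randomly chosen cases."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return {r: rng.sample(case_ids, per_reviewer) for r in reviewers}


if __name__ == "__main__":
    # Hypothetical identifiers: 24 reviewers and 20 anonymized case studies.
    reviewers = [f"reviewer_{i:02d}" for i in range(1, 25)]
    cases = [f"case_{i:02d}" for i in range(1, 21)]
    for reviewer, assigned in assign_cases(reviewers, cases, seed=0).items():
        print(reviewer, assigned)
```

Sampling independently per reviewer does not guarantee that every case receives the same number of reviews; a balanced design would need an additional constraint.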

Results and accuracy analysis
The first experiment started on March 22, 2016, and by April 13, 2016, 54% of the reviewers had completed their reviews. We collected 68 (reviews) × 4 (scores) data points. The second experiment was carried out during one day in October 2016. A reviewer provides the observations for the evidence nodes/questions in the model. The model then provides a distribution over all scores as output, conditional on these observations. Thus, the Bayesian model here is a three-layer model in which the metrics are at the top level of the network and the observations (market evaluation, team evaluation, etc.) are at the bottom layer of granular nodes.
Both the measures in the model and the observations are discrete.
The data from the anonymized experiments were rematched with the ground truth data from the real case studies, and the experimental scores were compared with the evidence for three groups of companies (high exits, medium exits, no exits). The distributions of the exit scores and the overall scores from the experiments for each of these groups are plotted in Figures 3-7. For the exit node (as opposed to the overall score shown in Figure 3), the professional reviewers scored the low exits mainly with values close to 0 and the medium exits with values between 10 and 60, while the scores for the high exits were very close to a uniform distribution. We can observe from these distributions that the "no exits" or "failures" scored low in both experiments, that the medium exits had medium scores in both experiments, and that the high exits had low, medium, and high scores in both experiments, whether we look at the final overall score or only at the key intermediate exit score (see Figure 8). In other words, there is consistency between the two groups of reviewers with respect to each of the three groups of companies. Moreover, the reviewers' responses are consistent with the ground truth data for low-exit and medium-exit companies, but less so for high-exit companies. We can therefore use this model to identify failures or low exits, but less so to identify high exits; the model is thus best suited to pruning "bad" proposals from a pool of varied investment opportunities.
Between the two experiments, we can also observe that the experts are still slightly better than MBA students at identifying low and medium exits.
The responses from the experiment for the "no exits" had a mean exit score of 20% and a median exit score of 16% and a mean and median overall score of 27% with a standard deviation of 16-17%. This means that the companies that failed in real life were reviewed with scores in the range of 16-27% in our model.
The medium exits experimental data had a mean exit score of 31%, a median of 28% and an overall mean and median of 34 and 36%, respectively, with standard deviations of 20 and 17%, respectively. This means that the companies that had medium exits (either low in capital value or took very long to exit) scored around the probabilities of 28-36% in our model.
The high exits had a mean and median exit score of 42%, an overall mean and median of 46%, and standard deviations of 28 and 25%, respectively. This means that companies that were bought for more than $500 million in real life scored around 42-46% in our model (see Table 1).
The accuracy of the model was analyzed using simple quantitative forecasting analysis. Specifically, the mean absolute deviation was used as the metric for the forecasting error. A resolution value of 1 was assigned to the companies with high exits, 0.5 to those with medium exits, and 0 to the failed or no-exit companies. The forecasting error was calculated as the mean absolute deviation between these resolutions and the probabilities given by the reviewers. Based on this calculation, the overall accuracy of the model is 75%, the accuracy for the no exits is 83%, and the accuracy for the medium and high exits is 77 and 41%, respectively (see Table 1).
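The calculation can be sketched as below. The resolution values are those stated in the text; the relation accuracy = 1 - (mean absolute deviation) is an assumption about how the reported percentages were derived, and the review scores in the usage example are made-up numbers, not the experimental data.

```python
from statistics import mean

# Resolution values as stated in the text.
RESOLUTION = {"high": 1.0, "medium": 0.5, "none": 0.0}


def mad_accuracy(reviews):
    """reviews: list of (group, score) pairs with score on a 0-1 scale.

    Assumes accuracy = 1 - mean absolute deviation from the resolution.
    """
    errors = [abs(RESOLUTION[group] - score) for group, score in reviews]
    return 1.0 - mean(errors)


def accuracy_by_group(reviews):
    """Return the assumed accuracy per exit group and overall."""
    reviews = list(reviews)
    result = {}
    for group in RESOLUTION:
        subset = [(g, s) for g, s in reviews if g == group]
        if subset:  # skip groups with no reviews
            result[group] = mad_accuracy(subset)
    result["overall"] = mad_accuracy(reviews)
    return result


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    sample = [("none", 0.2), ("none", 0.1), ("medium", 0.3), ("high", 0.4)]
    print(accuracy_by_group(sample))
```

Under this definition, a group whose true resolution is 1 but whose scores cluster near the middle of the scale is necessarily penalized, which matches the direction of the lower accuracy reported for high exits.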

Conclusions
In this research, a probabilistic model that assesses the potential for exit and the overall performance of new ventures (start-ups) is presented, from its construction based on practice and published statistical data to its implementation in a readily available online platform that can be used by entrepreneurs and investors alike. The model is designed to assess quantitatively the potential of businesses while they are still at the very initial stages. The model is well informed by facts known from previous academic literature on entrepreneurship and high-growth companies, and it is informed in detail by venture capital experience and practices, obtained by working closely with venture capital practitioners during the development phase of the model.
The model is validated using two anonymized experiments, one with experts in the field and one with MBA students, and is currently being translated into a commercial product. The results of these experiments and the details of the model are presented in this chapter both as a validation method and as a viable metric or indicator that can detect future failures and "bad investments" ahead of time. The model can thus also be used by entrepreneurs to self-assess and identify points of weakness in their proposals and current seed ventures. Therefore, this research presents a tool for investment decisions that can be easily automated and scaled up for use by any potential investor, whether angel or venture, or by any entrepreneur.
At the same time, these research efforts are also a good pathway toward bringing more transparency to the investment road map.