Assessing Venture Capital Investor Performance Drivers With Machine Learning

Data-driven decision making is growing fast, arguably faster than ever. Venture Capital funds can already use machine learning as an investment evaluation tool to predict factors such as company relevance, thematics, growth, and the likelihood of fundraising. Venture Capital funds know, by and large, what is happening in their thematic industries and which investors are visible in the media. However, until now it has been difficult to measure other investors’ performance objectively. Investor rankings are usually subjective and most likely biased.

My thesis work: Building an investor evaluation algorithm for NGP Capital

NGP Capital has an extensive data and analytics platform called “Q”. The objective of my thesis work is to create a more comprehensive and structured evaluation of investor performance, so that Q can learn which common factors drive venture capital investors’ performance. More importantly, the model can further improve Q’s knowledge structure and generate a less biased selection of companies based on systematic investor analysis. NGP can broaden its search, improve its early evaluation funnel, and shift some of the attention currently spent on evaluating investors to other work.

Machine learning looks for patterns in data features to predict a chosen data point, known as the target label. In our case, the label is investor performance. Therefore, I built a database of investor strategies and behaviors encompassing tens of thousands of distinct investors in order to learn what makes a great investor. However, many machine learning algorithms are black boxes that do not explain their decision-making logic. That is why my work uses explainable artificial intelligence (XAI) libraries to measure how a model's prediction changes when we change its input features.
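To make the idea of measuring a prediction's sensitivity to its inputs concrete, here is a minimal sketch of permutation importance, one common model-agnostic explanation technique. It is not necessarily the method or library used in the thesis. The feature names, the toy model, and the data are all hypothetical and exist only for illustration.

```python
import random


def permutation_importance(predict, rows, targets, n_features, n_repeats=20, seed=0):
    """Mean increase in mean absolute error when one feature column is shuffled."""
    rng = random.Random(seed)

    def mae(data):
        return sum(abs(predict(r) - t) for r, t in zip(data, targets)) / len(targets)

    baseline = mae(rows)
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j across investors, leaving other columns intact
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mae(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical features per investor: [prior_large_exits, network_centrality, noise]
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]


def toy_model(row):
    # Stand-in for a trained model (an assumption): it ignores the third feature
    return 3.0 * row[0] + 1.0 * row[1]


targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets, n_features=3)
for name, value in zip(["prior_large_exits", "network_centrality", "noise"], importances):
    print(f"{name}: {value:.3f}")
```

A feature the model relies on heavily produces a large error increase when shuffled, while a feature the model ignores produces none. In this toy setup the ordering of the three scores follows the weights inside `toy_model`.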

The investor profile labels are based on two factors: further funding raised by portfolio companies and the exit valuations of these companies. The thesis argues that larger funding amounts correlate with valuation growth, and that exits correlate with fund IRR. Because access to fund return data is limited and returns materialize with a lag, we cannot use fund returns directly and instead use these two proxies to measure investor returns.
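As an illustration of how two proxies could be combined into a single label, the sketch below ranks investors on each proxy, averages the two ranks, and marks the top share as high performers. The equal weighting, the 25% cutoff, the function names, and the numbers are all assumptions made for this example, not the labeling scheme of the thesis.

```python
def percentile_ranks(values):
    """Fraction of values strictly below each value (0.0 for the smallest)."""
    n = len(values)
    return [sum(1 for other in values if other < v) / n for v in values]


def label_investors(follow_on_funding, exit_value, top_share=0.25):
    """Return 1 for investors in the top share of the combined proxy score, else 0."""
    funding_rank = percentile_ranks(follow_on_funding)
    exit_rank = percentile_ranks(exit_value)
    # Equal weighting of the two proxies is an assumption of this sketch
    scores = [(f + e) / 2 for f, e in zip(funding_rank, exit_rank)]
    n_top = max(1, int(len(scores) * top_share))
    cutoff = sorted(scores, reverse=True)[n_top - 1]
    return [1 if s >= cutoff else 0 for s in scores]


# Hypothetical figures per investor (in millions): follow-on funding and exit value
follow_on_funding = [10, 200, 50, 5]
exit_value = [0, 900, 100, 20]
labels = label_investors(follow_on_funding, exit_value)
print(labels)
```

Rank-based scores keep a single very large exit from dominating the label, which is one reason to prefer them over raw sums when the underlying amounts are heavily skewed.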

The machine learning features are based on an extensive academic literature review about investor performance factors. The thesis concludes that there are four key feature categories with enough data:

Firstly, syndicates & networks determine who the investors know and who they syndicate with.

Secondly, investors build their reputation out of large exits and deals, and this can, in turn, help investors raise more and larger funds.

Thirdly, fund characteristics include the industry and stage focus of an investor and affect an investor's capability to select and grow companies.

Finally, target company features capture portfolio company factors at the time of investment, such as founder experience and media traction, as well as funding and employee growth.

Key findings

Our analysis shows that reputation and fund characteristics are important feature categories. Large exits are visible in the media, and investors taking part in these exits can possibly raise more and larger funds. Furthermore, early-stage funding experience and industry focus are beneficial, possibly because they reflect an ability to choose the best industries as well as the best companies within an industry. In addition, investors benefit from a central position in the Venture Capital network and from syndicating with the most networked players, since these networks expand the resources available to a portfolio company. Finally, the greatest investors tend to invest in founders with prior exits, in young companies, and in companies with high website traffic.
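One simple way to quantify a "central position" in the syndication network is degree centrality: the share of all other investors with whom an investor has co-invested at least once. The sketch below is illustrative only. The thesis may use other network measures, and the investor names and deals are hypothetical.

```python
from itertools import combinations


def degree_centrality(deals):
    """Share of other investors each investor has co-invested with at least once."""
    partners = {}
    for syndicate in deals:
        members = sorted(set(syndicate))
        for investor in members:
            partners.setdefault(investor, set())  # keeps solo investors in the graph
        for a, b in combinations(members, 2):
            partners[a].add(b)
            partners[b].add(a)
    n = len(partners)
    if n < 2:
        return {investor: 0.0 for investor in partners}
    return {investor: len(p) / (n - 1) for investor, p in partners.items()}


# Hypothetical deals, each listed as the investors in the round
deals = [["A", "B", "C"], ["A", "D"], ["E"]]
centrality = degree_centrality(deals)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

Here investor A has syndicated with three of the four other investors, so it is the most central player, while E, which only invested alone, has a centrality of zero.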

Machine learning and XAI can help evaluate industry actions, and thus our own actions. This knowledge can improve results and help in decision-making. The findings of this study help VC funds funnel down companies with an automated business intelligence process. With machine learning, we are making this complex data simpler.

Limitations of the study

This study is limited to the dataset in the Q database. Many features are left out, so these results are a simplification of all the factors behind investor performance. Examples of omitted features include management team networks and technical background, the activities through which investors help portfolio companies grow, and market conditions and industry financial data. However, the model tries to represent investor actions with the data we have access to. The difference between the top-performing investors might not be that big, but we can filter out the players with average performance.


During the course of this study, it has become apparent that the four common traits that make investors successful not only help VC funds optimize investor syndication but can also indicate areas to focus on to further develop VC investment success. The model is a more comprehensive evaluation toolkit for early pipeline selection: it alleviates our biases and frees up cognitive resources.

Henri Lencioni of Aalto University is the winner of the 2021 master's thesis competition. You can find the competition pages here.