
The map above looks at the estimated GDP per capita of various European regions in 1750, just as the Industrial Revolution was starting.
On a per capita basis, Northern Italy, the Dutch coast, Southern England and central Spain were among the richest areas.
Now, any estimate of historic GDP involves some guesstimating (see: Roman Empire GDP), so here’s how the authors arrived at this data.
We provide new estimates using machine learning to augment historical GDPs per capita.
Specifically, we use data on the places of birth, death, and occupations of more than half a million famous figures to estimate the GDP per capita of dozens of countries and hundreds of regions in Europe and North America for the past 700 years.
We find that our model generates accurate out-of-sample estimates, explaining 90% of the variance of independent test data. Also, it significantly improves upon a baseline model accounting for persistence in income levels.
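To make "explaining 90% of the variance" concrete, here is a minimal sketch of how an out-of-sample R² can be computed and compared against a persistence baseline. The numbers are made up purely for illustration and are not taken from the paper; the function name r_squared and the idea of using the previous period's value as the baseline prediction are assumptions on our part.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total

# Hypothetical held-out GDP per capita values (toy numbers, arbitrary units).
actual = [10.0, 12.0, 9.0, 15.0]

# Hypothetical model estimates for the same held-out regions.
model_estimates = [10.5, 11.5, 9.5, 14.0]

# Hypothetical persistence baseline: reuse each region's previous-period value.
persistence_estimates = [8.0, 13.0, 11.0, 12.0]

model_r2 = r_squared(actual, model_estimates)
baseline_r2 = r_squared(actual, persistence_estimates)
print(f"model R^2: {model_r2:.3f}, baseline R^2: {baseline_r2:.3f}")
```

The key point is that the score is computed only on data the model never saw during training, which is what makes it an out-of-sample check.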
We externally validate our estimates in two ways. First, we show that our estimates support findings by @DAcemogluMIT et al. on Atlantic trade as a driver of the reversal of fortunes between southwestern and northwestern Europe prior to 1800. Second, our GDP per capita estimates correlate with other proxies of economic development such as urbanization rates over the past 500 years, average body height in the 18th century, wellbeing in 1850, and city-level church building activity in the 14th and 15th centuries.
But let’s take a step back. Why should biographies of famous figures from Wikipedia and Wikidata tell us something about economic development?
Some notable individuals directly contributed to wealth. Take James Watt. Some are attracted by wealthy places or are by-products of wealth, e.g. Michelangelo. All these channels imply a positive correlation with GDP per capita that should be mineable from biographical records.
Equipped with hundreds of geographical features derived from the biographies of famous figures per period, we train a set of supervised machine learning models (elastic net regression models) to select the most relevant features and generate out-of-sample estimates.
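For readers curious what an elastic net does, the sketch below fits one by coordinate descent on synthetic data and scores it on held-out rows. It is not the authors' code: the data, the five made-up features, and the alpha and l1_ratio values are all assumptions chosen for illustration. In practice you would more likely reach for a library implementation; this version uses only the standard library so the mechanics stay visible.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero; this is what lets the L1 penalty drop features."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*l1_ratio*|w|_1
    + 0.5*alpha*(1 - l1_ratio)*||w||^2 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]  # centred columns
    y_mean = sum(y) / n
    residual = [yi - y_mean for yi in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(v * (r + w[j] * v) for v, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            residual = [r - (new_w - w[j]) * v for r, v in zip(residual, col)]
            w[j] = new_w
    intercept = y_mean - sum(wj * m for wj, m in zip(w, means))
    return intercept, w

def predict(intercept, w, X):
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]

def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total

random.seed(0)

def make_rows(n):
    # Synthetic stand-in for biography-derived features: only the first two matter.
    X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(n)]
    y = [2.0 * row[0] - 1.0 * row[1] + random.gauss(0, 0.1) for row in X]
    return X, y

X_train, y_train = make_rows(80)
X_test, y_test = make_rows(40)   # held out, never used for fitting

intercept, weights = fit_elastic_net(X_train, y_train)
test_r2 = r_squared(y_test, predict(intercept, weights, X_test))
print("weights:", [round(wj, 3) for wj in weights])
print("out-of-sample R^2:", round(test_r2, 3))
```

The weights on the three irrelevant features should end up at or near zero, which is the feature-selection behaviour the authors rely on when they start from hundreds of candidate features.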
This data makes it possible to compare the economic development of countries and regions in Europe and North America going back 700 years. Also, this validates the use of granular biographical data with machine learning as a method to augment historical GDP per capita data.
Take Lisbon in the previous plot. Its GDP per capita drops massively after 1750, due to the disastrous earthquake in 1755. The fact that this dark time in Lisbon’s history is that visible in our estimates, even compared to other regions in Portugal, supports their validity.
We investigate two more use cases in the manuscript: the history of German regions during the Protestant Reformation, and the development of Charleston, SC, in the 18th and 19th centuries. We also explore feature importance using Shapley values.
We make all our estimates, together with the collected source data, publicly available. Check out our interactive website to explore how rich Vienna was at the time of Mozart or Tuscany at the time of Michelangelo:
You can read the full paper here.
Which area surprised you the most?







