How should we analyse survey forecasts of AI timelines?

Read at AI Impacts

The Expert Survey on Progress in AI (ESPAI) is a large survey of AI researchers about the future of AI, conducted in 2016, 2022, and 2023. One main focus of the survey is the timing of progress in AI1.

The timing-related results of the survey are usually presented as a cumulative distribution function (CDF) showing probabilities as a function of years, in the aggregated opinion of respondents. Respondents gave triples of (year, probability) pairs for various AI milestones. Starting from these responses, two key steps of processing are required to obtain such a CDF:

  • Fitting a continuous probability distribution to each response
  • Aggregating these distributions

These two steps require a number of judgement calls. In addition, summarising and presenting the results involves many other implicit choices.

In this report, I investigate these choices and their impact on the results of the survey (for the 2023 iteration). I provide recommendations for how the survey results should be analysed and presented in the future.

Headline results

This plot represents a summary of my best guesses as to how the ESPAI data should be analysed and presented.

CDF of ESPAI survey showing median and central 50% of expert responses.

Previous headline results

Thousands of AI Authors on the Future of AI, Figure 3. I added annotations to the 20%, 50%, and 80% points, for comparison with my plot.

I differ from previous authors in four main ways:

  • Show distribution of responses. Previous summary plots showed a random subset of responses, rather than quantifying the range of opinion among experts. I show a shaded area representing the central 50% of individual-level CDFs (25th to 75th percentile). More
  • Aggregate task and occupation questions. Previous analyses only showed task (HLMI) and occupation (FAOL) results separately, whereas I provide a single estimate combining both. By not providing a single headline result, previous approaches made summarization more difficult, and left room for selective interpretations. I find evidence that task automation (HLMI) numbers have been far more widely reported than occupation automation (FAOL). More
  • Median aggregation. I’m quite uncertain as to which method is most appropriate in this context for aggregating the individual distributions into a single distribution. The arithmetic mean of probabilities, used by previous authors, is a reasonable option. I choose the median merely because it has the convenient property that we get the same result whether we take the median in the vertical direction (probabilities) or the horizontal (years). More
  • Flexible distributions: I fit individual-level CDF data to “flexible” interpolation-based distributions that can match the input data exactly. The original authors use the Gamma distribution. This change (and distribution fitting in general) makes only a small difference to the aggregate results. More

Direct comparison of three of the four elements of our approach with previous results

Combined effect of our approach (three of the four elements), compared with previous results. For legibility, this does not show the range of responses, although I consider this one of the most important innovations over previous analyses.

| CDF | Framing of automation | Distribution family | Loss function | Aggregation | p20 | p50 | p80 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Blue (ours) | Aggregate of tasks (HLMI) and occupations (FAOL) | Flexible | Not applicable | Median | 2048 | 2073 | 2103 |
| Orange (previous) | Tasks (HLMI) | Gamma | MSE of probabilities | Arithmetic mean of probabilities | 2031 | 2047 | 2110 |
| Green (previous) | Occupations (FAOL) | Gamma | MSE of probabilities | Arithmetic mean of probabilities | 2051 | 2110 | 2843 |

Note: Although previous authors give equal prominence to the orange (tasks, HLMI) and green (occupations, FAOL) results, I find evidence that the orange (tasks, HLMI) curve has been far more widely reported (More).

The last two points (aggregation and distribution fitting) directly affect the numerical results. The first two are about how the headline result of the survey should be conceived of and communicated.

These four choices vary in both their impact, and in my confidence that they represent an improvement over previous analyses. The two tables below summarise my views on the topic.

| Choice | Impact on understanding and communication of main results | Confidence it’s an improvement |
| --- | --- | --- |
| Show range of responses (More) | High | Very high |
| Aggregate FAOL and HLMI (More) | High | Moderate |

| Choice | Numerical impact on aggregate CDF | Confidence it’s an improvement |
| --- | --- | --- |
| Median aggregation (More) | High | Very low |
| Flexible distributions (More) | Minimal | High |

Even if you disagree with these choices, you can still benefit from my work! The code used to implement these new variations is open source. It provides user-friendly configuration objects that make it easy to run your own analysis and produce your own plots. The source data is included in version control. AI Impacts plans to use this code when analysing future iterations of ESPAI. I also welcome engagement from the wider research community.

Suggested textual description

If you need a textual description of the results in the plot, I would recommend:

Experts were asked when it will be feasible to automate all tasks or occupations. The median expert thinks this is 20% likely by 2048, and 80% likely by 2103. There was substantial disagreement among experts. For automation by 2048, the middle half of experts assigned it a probability between 1% and 60% (meaning ¼ assigned it a chance lower than 1%, and ¼ gave a chance higher than 60%). For automation by 2103, the central half of experts’ forecasts ranged from a 25% chance to a 100% chance.2

This description still contains big simplifications (e.g. using “the median expert thinks” even though no expert directly answered questions about 2048 or 2103). However, it communicates both:

  • The uncertainty represented by the aggregated CDF (using the 60% belief interval from 20% to 80%)
  • The range of disagreement among experts (using the central 50% of responses)

In some cases, this may be too much information. I recommend, if at all possible, that the results not be reduced to a single number: the year by which experts expect a 50% chance of advanced AI. Instead, emphasise that we have a probability distribution over years by giving two points on the distribution. So if a very concise summary is required, you could use:

Surveyed experts think it’s unlikely (20%) it will become feasible to automate all tasks or occupations by 2048, but it probably will (80%) by 2103.

If even greater simplicity is required, I would urge something like the following, over just using the median year:

AI experts think full automation is most likely to become feasible between 2048 and 2103.

Contents

  1. Headline results
    1. Suggested textual description
  2. The distribution of raw responses
    1. Example: Retail Salesperson occupation
    2. Timing of human-level performance
  3. Aggregation
    1. Possible methods
    2. Mean vs Median aggregation
    3. Why the mean and median differ
    4. Aside: the winsorized geometric mean of odds
  4. Distribution fitting
    1. Why fit a distribution?
    2. Limitations of previous analyses
      1. Constraints of Gamma distribution
      2. Inappropriate loss function
    3. Flexible distributions
    4. Other distributions
  5. Displaying the distribution of responses
  6. Aggregating across the task and occupation framings
  7. Codebase

The distribution of raw responses

Even readers who are familiar with ESPAI may only have seen the results after processing. It can be helpful to look at the raw data, i.e. respondents’ answers to questions before any processing, to remind ourselves how the survey was conducted.

All questions about how soon a milestone would be reached were framed in two ways: fixed-years and fixed-probabilities. Half of respondents were asked to estimate the probability that a milestone would be reached by a given year (“fixed-years framing”), while the other half were asked to estimate the year by which the milestone would be feasible with a given probability (“fixed-probabilities framing”).

Example: Retail Salesperson occupation

Responses about one such milestone (say, the occupation of retail salesperson in the example below), if shown as a scatterplot, form three horizontal lines for fixed probabilities, and three vertical lines for fixed years. These correspond to the six questions being asked about the milestone:

Scatterplot of survey responses for automating retail salesperson occupation, showing fixed probabilities and fixed years.

The scatterplot is a helpful reminder of the data’s shape in its rawest form. However, all scatterplots will form three horizontal and three vertical lines. We can show more structured information about the distribution of responses for each of the six questions by using six box and whisker plots3, as shown below:

Box plots showing probability and year distributions for automating retail salesperson occupation.

We can see several useful things in this set of box plots:

  • There is a large framing effect, whereby the fixed-years framing produces later predictions. (This effect is familiar from previous analyses of ESPAI, where it has been shown to occur systematically).
    • For example, the prediction (2043, 50%) is the 85th percentile of responses for the 50% question in the fixed-probabilities framing, while the same prediction is the median response for the 2043 question in the fixed-years framing.
    • When asked about 2073 in the fixed-years framing, the median response was a 90% probability; 2073 is much later than even the 85th percentile response to the 90% question in the fixed-probabilities framing.
  • Responses follow a skewed distribution
    • For all three questions in the fixed-probabilities framing, the responses have a large right skew
    • In the fixed-years framing, the 2033 question produces a right skew (up skew in the boxplot), whereas the 2073 question produces a left-skew (down skew in the boxplot), with more than 25% of respondents giving a probability of 100%.
  • There is a wide range of responses, indicating substantial disagreement among respondents. For example, when asked about 2043, the interval (centred on the median) that contains half of responses ranged from a 30% chance to a 90% chance. The interval that contains 70% of responses ranged from a 10% chance to a 98% chance.

We can now look at the distribution of raw responses for the timing of human-level performance.

Timing of human-level performance

When the survey investigated the timing of human-level performance, the question was framed in two ways, as tasks, and as occupations4:

  • “High-Level Machine Intelligence” (HLMI): when unaided machines can accomplish every task better and more cheaply than human workers.
  • “Full Automation of Labor” (FAOL): when for any occupation, machines could be built to carry it out better and more cheaply than human workers.

We can now take each of these in turn (expand the collapsible sections below).

Raw data

Box plots for FAOL in the fixed-years and fixed-probabilities framings.

In the fixed probabilities framing, respondents were asked for the number of years until a 10%, 50%, and 90% probability of FAOL.

| Probability of FAOL | Mean response (years) | 15th percentile response | Median response | 85th percentile response |
| --- | --- | --- | --- | --- |
| 10% | 508,177 | 10 | 40 | 100 |
| 50% | 783,590 | 20 | 70 | 200 |
| 90% | 1,011,970 | 35 | 100 | 500 |

In the fixed years framing, respondents were asked for the probability of FAOL within 10, 20, and 50 years.

| Years until FAOL | Mean response | 15th percentile response | Median response | 85th percentile response |
| --- | --- | --- | --- | --- |
| 10 | 6.02% | 0.00% | 0.00% | 10.00% |
| 20 | 12.30% | 0.00% | 2.00% | 30.00% |
| 50 | 24.66% | 0.00% | 10.00% | 60.00% |

Box plots for HLMI in the fixed-years and fixed-probabilities framings.

In the fixed probabilities framing, respondents were asked for the number of years until a 10%, 50%, and 90% probability of HLMI.

| Probability of HLMI | Mean response (years) | 15th percentile response | Median response | 85th percentile response |
| --- | --- | --- | --- | --- |
| 10% | 41.19 | 2 | 5 | 20 |
| 50% | 1,307.74 | 7 | 20 | 50 |
| 90% | 457,537 | 15 | 50 | 100 |

In the fixed years framing, respondents were asked for the probability of HLMI within 10, 20, and 40 years.

| Years until HLMI | Mean response | 15th percentile response | Median response | 85th percentile response |
| --- | --- | --- | --- | --- |
| 10 | 18.25% | 0.00% | 10.00% | 50.00% |
| 20 | 34.70% | 4.00% | 30.00% | 75.00% |
| 40 | 54.58% | 10.00% | 50.00% | 95.00% |

Aggregation

Possible methods

All previous analyses produced the aggregate distribution by taking the average of CDF values, that is, by taking the mean of probability values at each year.

There are many other possible aggregation methods. We can put these into two categories:

  • Vertical methods, like the one above, aggregate probability values at each year
  • Horizontal methods aggregate year values at each probability

This figure illustrates both methods on a very simple example with two CDFs to aggregate.

Diagram showing vertical and horizontal methods for aggregating CDFs.

For both vertical and horizontal aggregation, we need not take the mean of values. In principle any aggregation function could be used, of which the mean and median are only the two most obvious examples.

When it comes to aggregating probabilities (vertical aggregation), there are additional complications. The topic has been well studied, and many aggregation methods have been proposed.

A full assessment of the topic would take us far beyond the scope of this report, so I will only briefly mention one prominent recommendation: taking the geometric mean of odds. The core observation is that the arithmetic mean of probabilities ignores information from extreme predictions. This can be seen with a simple example. In scenario A, we aggregate the two predictions (1%, 10%), whereas in scenario B the two predictions are (0.1%, 10%). The arithmetic mean of probabilities is close to 5% in both cases (5.5% for A and 5.05% for B). It gives very little weight to the difference between 1% and 0.1%, which is after all a factor of 10. The geometric mean of odds reacts much more strongly to this much more extreme prediction, being about 3.2% in scenario A, and 1.0% in scenario B.
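To make the example concrete, here is a minimal Python sketch (not part of the ESPAI codebase) that reproduces the two-prediction scenarios above:

```python
# Arithmetic mean of probabilities vs geometric mean of odds for scenarios A and B.
import numpy as np

def geo_mean_of_odds(probs):
    odds = np.asarray(probs) / (1 - np.asarray(probs))
    agg_odds = np.exp(np.mean(np.log(odds)))   # geometric mean in odds space
    return agg_odds / (1 + agg_odds)           # convert the aggregate odds back to a probability

for label, preds in [("A", [0.01, 0.10]), ("B", [0.001, 0.10])]:
    print(label, f"arithmetic mean = {np.mean(preds):.2%}", f"geo mean of odds = {geo_mean_of_odds(preds):.2%}")
# A: arithmetic mean ≈ 5.50%, geo mean of odds ≈ 3.2%
# B: arithmetic mean ≈ 5.05%, geo mean of odds ≈ 1.0%
```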

This behaviour of the geometric mean of odds is theoretically appealing, but it is only advisable if extreme predictions are really to be taken at face value. We might worry that such extreme predictions are especially likely to be overconfident.

As a further complication, in the case of ESPAI, we cannot in practice apply the geometric mean of odds. This is because for nearly every year (every vertical line) we might consider, many of the respondents’ fitted CDFs take values indistinguishable from 0 or 1. This causes the geometric mean of odds to immediately become zero or one5.

Visualization showing the problems with using the geometric mean of odds for aggregation, where extreme probabilities lead to the aggregate CDF reaching 0 or 1.

Aggregating years is also problematic. Because the input is bounded on the left but not the right, the arithmetic mean of responses is inevitably dominated by extremely large values. This method would produce a CDF where any probability of the event is essentially infinitely many years away. We might hope to address this problem by using the geometric mean of years, but this in turn suffers from numerical and conceptual issues similar to those of the geometric mean of odds. Ultimately, taking the median of years is the only method of aggregating years I was able to apply.

The median of years and median of probabilities give the same answer. This makes intuitive sense since CDFs are strictly increasing6. So I simply call this the “median”.
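Readers who want to verify this numerically can do so with a small sketch like the following, where made-up lognormal CDFs stand in for respondents’ fitted distributions (this is an illustration, not the survey data):

```python
# The pointwise median computed vertically (median probability at each year) and
# horizontally (median year at each probability) traces out the same curve.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
cdfs = [stats.lognorm(s=s, scale=scale)                      # one stand-in CDF per "respondent"
        for s, scale in zip(rng.uniform(0.3, 1.5, 51), rng.uniform(10, 80, 51))]

years = np.linspace(1, 3000, 50_000)
vertical_median = np.median([d.cdf(years) for d in cdfs], axis=0)

probs = np.linspace(0.01, 0.99, 99)
horizontal_median = np.median([d.ppf(probs) for d in cdfs], axis=0)

# The horizontal-median points should lie on the vertical-median curve (up to grid error):
print(np.max(np.abs(np.interp(horizontal_median, years, vertical_median) - probs)))
```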

Mean vs Median aggregation

As a result of these difficulties, I will present only the following aggregation methods:

  • (Arithmetic) mean of probabilities
  • Median of probabilities

These plots use the Gamma distribution with the mean square error (MSE) of probabilities as the loss function, so the mean aggregation line corresponds to the results of previous analyses.


Comparison of mean and median aggregation methods for FAOL. Extended comparison of mean and median aggregation for FAOL with additional data points.

Comparison of mean and median aggregation methods for HLMI.

Comparison of mean and median aggregation methods for Truck Driver occupation.

Comparison of mean and median aggregation methods for Surgeon occupation.

Comparison of mean and median aggregation methods for Retail Salesperson occupation.

Comparison of mean and median aggregation methods for AI Researcher occupation.

We see a notable pattern in each of these cases. As we go from left to right, the median always starts below the mean (i.e. the median initially gives later predictions), but eventually overtakes the mean (i.e. the median eventually gives earlier predictions). Median aggregation also always gives rise to a more confident probability distribution: one whose probability mass is more concentrated.

Why the mean and median differ

When the mean is very different from the median, this means that the distribution of responses is highly skewed. We can illustrate this by displaying a histogram of CDF values for a given year.

Histogram showing distribution of FAOL CDF values, comparing mean and median aggregation.

The results for automation of all occupations (FAOL) are quite interesting. For the years 2040 and 2060, the results are extremely skewed. A large majority assigns very low probabilities, but there is a right tail of high probabilities, which causes the mean to greatly exceed the median. For the years 2080 and 2100, a bimodal distribution emerges. We have a big cluster with probabilities near zero and a big cluster with probabilities near 1. Opinion is extremely polarised. By 2200 the median exceeds the mean. When we reach 2500, a majority think FAOL is near-certain, but a significant left tail causes the mean to lag far behind the median.

Histogram showing distribution of HLMI CDF values, comparing mean and median aggregation.

With task automation (HLMI), we see the same basic pattern (repeated everywhere): the median at first trails behind the mean, and for later years and higher probabilities, the median overtakes the mean. However, skewness is less extreme than for FAOL, and we do not see a strongly bimodal histogram (extreme polarisation) at any point.

Aside: the winsorized geometric mean of odds

There is one way to use the geometric mean of odds that avoids the problem of zeroes and ones. This is to winsorize the data: to replace the most extreme values with less extreme values. For example, we could replace all values less than 0.1 with 0.1, and all values greater than 0.9 with 0.9.

Of course, this introduces a highly subjective choice that massively affects the results. We could replace all values less than 0.01 with 0.01, or all values less than 0.001 with 0.001. Therefore, I do not consider this technique suitable for the creation of a headline result.

However, it lets us do some potentially interesting explorations. Winsorising essentially means we do not trust the most extreme predictions. We can now explore what the geometric mean of odds would look like under various degrees of winsorisation. e.g. what does it look like if we ignore all predictions more extreme than 1:100? What about 1:1000?
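As a sketch of what this involves (not the ESPAI codebase itself), a winsorised geometric mean of odds can be computed as follows, here applied to a made-up set of CDF values at a single year; the `floor` parameter is the winsorisation threshold:

```python
import numpy as np

def winsorised_geo_mean_odds(probs, floor):
    p = np.clip(np.asarray(probs, dtype=float), floor, 1 - floor)  # winsorise both tails
    log_odds = np.log(p) - np.log(1 - p)
    agg_odds = np.exp(np.mean(log_odds))    # geometric mean in odds space
    return agg_odds / (1 + agg_odds)        # back to a probability

example = [0.0, 0.02, 0.1, 0.5, 0.95, 1.0]  # hypothetical probabilities for one year
for floor in [0.1, 0.01, 0.001]:
    print(floor, round(winsorised_geo_mean_odds(example, floor), 4))
```

Note that without the clipping step, the 0.0 and 1.0 values in this example would make the geometric mean of odds undefined, which is exactly the problem described earlier.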

Plot showing how winsorization levels affect geometric mean aggregation for FAOL. Plot showing how winsorization levels affect geometric mean aggregation for HLMI.

Very informally, we can see that for HLMI (tasks) and FAOL (occupations), the arithmetic mean of probabilities roughly corresponds to the geometric mean of odds with a winsorisation level of about 1:10. We’ve already discussed the well-known effect that the arithmetic mean of probabilities ignores more extreme predictions, compared to the geometric mean of odds. For this particular data, we can roughly quantify this effect, and see that it is equivalent to ignoring all predictions <10% and >90%. I find this to be quite an extreme level of winsorisation. Consider, for example, that the fixed-probabilities framing explicitly asked for predictions at the 10% and 90% levels – it would be odd to simultaneously consider these probabilities too extreme to be trusted.

Distribution fitting

All previous analyses of the ESPAI data fitted each respondent’s CDF data (triples of (year, probability)) to a Gamma distribution before aggregating these distributions.

Why fit a distribution?

Creating a full continuous distribution from three CDF points necessarily imposes some assumptions that were not present in the data. And recall, the respondents just gave numbers in text fields, and never saw the distribution that was later fitted to their CDF data.

So to begin with, it’s worth asking: why fit a distribution at all?

If we are only looking at a particular framing and question, for example FAOL with fixed years, it may indeed be preferable to look directly at the raw data. This allows us to talk strictly about what respondents said, without any additional assumptions. Even in this restricted setting, however, we might want to be able to get predictions for other years or probabilities than those respondents were asked about; this requires a full CDF. A simple example where this is required is for making comparisons across different iterations of the ESPAI survey in the fixed-years setting. Each survey asks for predictions about a fixed number of years from the date of the survey. So the fixed-years question asks about different calendar years each time the survey is made.

A more fundamental problem for the raw data approach is that we wish to aggregate the results of different framings into a single estimate. We can only aggregate across the fixed-years and fixed-probability framings by aggregating full distributions. In addition, even within the fixed-years framing, we cannot aggregate the occupations (FAOL) and tasks (HLMI) framings, because different years were used (10, 20, and 40 years for HLMI and 10, 20, and 50 years for FAOL).

Limitations of previous analyses

Constraints of Gamma distribution

While creating a full CDF from three points inevitably imposes assumptions not present in the data, we might think that, at a minimum, it would be desirable to have this CDF pass through the three points.

Previous analyses used a Gamma distribution. The gamma distribution is a 2-parameter distribution that is able to exactly match 2 points of CDF data, but not 3 points. The gamma distribution (and any 2-parameter distribution) necessarily loses information and distorts the expert’s beliefs even for the points where we know exactly what they believe.

Here are 9 examples from the fixed-probabilities framing7. They are representative of the bottom half of fits (from the 50th to 90th percentile). Each subplot shows an example where the fitted gamma CDF (shown as a gray curve) attempts to match three points from a respondent’s data (shown as red crosses).

Gamma distribution fits to fixed probabilities survey data, showing respondent points and Gamma CDFs.

First, we can see that the gamma is not flexible enough to match the three points. While the median fit (A1) is acceptable, some fits are poor.

Inappropriate loss function

In addition, if we look carefully we can see an interesting systematic pattern in the poor fits. We can see that when the Gamma has trouble fitting the data, it prefers to fit two points well, even at the expense of a very poor fit on the third point, rather than choosing a middle ground with an acceptable fit on all three points. This begins to be visible in row B (67th to 87th percentile), and becomes blatantly clear in row C (84th to 95th percentile). In fact, in C2 and C3, the Gamma fits two points exactly and completely ignores the third. When this happens, the worst-fit point is always the 0.9 or 0.1 point, never the 0.5 point.

The errors at the 0.1 and 0.9 points can completely change the nature of the prediction. This becomes clear if we go to odds space, and express the odds ratio between the data and the gamma CDF.

Gamma distribution fits to fixed probabilities survey data, showing respondent points and Gamma CDFs, with annotations

While a 2x odds ratio (e.g. in B2) is already substantial, when we move to the worst 15% of the fits, the odds ratio for the worst of the three points becomes astronomical.

The reason this happens is that the loss function used in previous work is not the appropriate one.

Previous analyses used mean squared error (MSE) of probabilities as their loss function: $L_{\text{MSE}} = \sum_i (p_i - \hat{p}_i)^2$, where $p_i$ are the probabilities from the respondent’s data and $\hat{p}_i$ are the probabilities from the fitted CDF. This loss function treats all probability differences equally, regardless of where they occur in the distribution. For instance, it considers a deviation of 0.05 to be equally bad whether it occurs at $p=0.5$ or at $p=0.99$.

This is inappropriate when fitting CDF data. Consider the case depicted in C2, where the respondent thinks the event in question is 90% likely by 150 years from the date of the survey. Meanwhile, the Gamma CDF fitted by MSE gives a probability of 99.98% at 150 years. This dramatic departure from the respondent’s beliefs is represented in the 777x odds ratio. A 777x odds ratio at $p=0.5$ would mean changing from even odds (1:1) to odds of 777:1, or a probability of >99.8%. (A 13x odds ratio, as seen for the 0.1 point in C1 (84th percentile), would mean changing from even odds to odds of 13:1, or a probability of 93%.)

The appropriate loss function for CDF data is the log loss, also known as the cross-entropy loss: $L_{\text{log}} = -\sum_i [p_i \log(\hat{p}_i) + (1-p_i)\log(1-\hat{p}_i)]$. This loss function naturally accounts for the fact that probability differences near 0 and 1 represent much larger differences in beliefs than the same probability differences near 0.5.8
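To illustrate the difference, here is a minimal sketch (not the original analysis code) that fits a Gamma CDF to a hypothetical respondent’s three points under each loss function; the years and probabilities are made up for the example:

```python
import numpy as np
from scipy import stats, optimize

# Hypothetical fixed-probabilities respondent: 10% by 10 years, 50% by 30, 90% by 150.
years = np.array([10.0, 30.0, 150.0])
probs = np.array([0.1, 0.5, 0.9])

def fitted_probs(params):
    shape, scale = np.exp(params)            # optimise in log space to keep parameters positive
    return stats.gamma.cdf(years, a=shape, scale=scale)

def mse_loss(params):
    return np.sum((probs - fitted_probs(params)) ** 2)

def log_loss(params):
    p_hat = np.clip(fitted_probs(params), 1e-12, 1 - 1e-12)
    return -np.sum(probs * np.log(p_hat) + (1 - probs) * np.log(1 - p_hat))

x0 = np.log([1.0, 30.0])
for name, loss in [("MSE", mse_loss), ("log loss", log_loss)]:
    fit = optimize.minimize(loss, x0, method="Nelder-Mead")
    shape, scale = np.exp(fit.x)
    print(name, np.round(stats.gamma.cdf(years, a=shape, scale=scale), 4))  # fitted CDF at the three points
```

The log-loss fit penalises errors near 0.1 and 0.9 much more heavily in odds terms, so it does not sacrifice those points to improve the fit at 0.5.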

As expected from this theoretical argument, we can see that the log loss, unlike the MSE of probabilities, does not display the pathological behaviour of ignoring the 0.1 or 0.9 point, and so avoids extreme odds ratios (see especially C1-C3):

Comparison of log loss and MSE on Gamma distribution fits for fixed probabilities framing.

As an informal analysis, this plot suggests that the MSE leads to extremely poor fits on >15% of the data, but also that most of the MSE fits are close to the log loss fits.

When we create the aggregate CDF, we see hardly any impact of the loss function:

Aggregate CDFs for HLMI and FAOL using Gamma and flexible distributions.

Flexible distributions

Regardless of the loss function used, we know that the Gamma distribution cannot match the three points of CDF data given by the expert9. When we use any such distribution, our results do not merely reflect the expert’s beliefs, they also reflect the mathematical constraint we imposed upon those beliefs.

Overriding expert responses in this way may be appropriate when we have a strong theoretical justification to impose a particular distribution family. For example, if we have a strong reason to believe that experts think (or ought to think) of a variable as a sum of many small independent contributions, we may wish to impose a normal distribution, even if the responses they gave us are incompatible with a normal distribution.

However, the authors of previous analyses did not justify the choice of the gamma distribution at any point. In addition, I am not aware of any strong argument to impose a particular distribution family in this case.

As noted above, while creating a full CDF from three points inevitably imposes assumptions not present in the data, at a minimum it would be desirable to have this CDF pass through the three points.

To achieve this, I used proprietary probability distributions which I call ‘flexible distributions’. I developed these over the last several years, for precisely the class of use cases faced by ESPAI. These distributions have the following properties:

  • Always exactly match three CDF points (or indeed an arbitrary number of them)…
  • …while taking a simple and smooth shape
  • Can be unbounded, or given an upper or lower bound, or both

The distributions I used in this analysis are based on interpolation theory. The full mathematical and algorithmic details are proprietary, and the distributions are available as a service at MakeDistribution.com. However, you can see how these distributions behave with the free interactive web UI below (select interpolation-based families under expert settings). In addition, to make this work reproducible, the specific fitted CDFs used in the ESPAI analysis are open source10.
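The proprietary distributions can’t be reproduced here, but as a rough stand-in that conveys the general idea (an interpolation-based CDF passing exactly through the elicited points), here is a sketch using SciPy’s monotone PCHIP interpolator with assumed anchor points at year 0 and at an arbitrary far-right year; it is not the distribution family used in the analysis:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical fixed-probabilities respondent: 10% at 10 years, 50% at 30, 90% at 150.
years = np.array([10.0, 30.0, 150.0])
probs = np.array([0.1, 0.5, 0.9])

# Assumed anchors so the interpolant behaves like a CDF on this range.
x = np.concatenate([[0.0], years, [1000.0]])
y = np.concatenate([[0.0], probs, [0.999]])

cdf = PchipInterpolator(x, y)   # shape-preserving: monotone between the given points
print(cdf(years))               # recovers [0.1, 0.5, 0.9] exactly
```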

Try the MakeDistribution interface


Comparison of Gamma and flexible distribution fits for fixed probabilities framing.

When we aggregate the individual distributions, however, we find that the choice of distribution has a very limited impact, barely any more than the impact of the loss function.

Aggregate CDFs for HLMI and FAOL using Gamma and flexible distributions.

It may be somewhat surprising to see so little difference in aggregate, when we consider that there appeared to be systematic patterns in the poor gamma fits11. However, this might be explained by the fact that the majority of fits were of acceptable quality.

I ran many variations of this analysis (and so can you, using the open-source codebase). None showed a dramatic effect of the distribution family.

Other distributions

In addition to flexible distributions, I also investigated the use of alternative ‘traditional’ distributions, such as the Weibull or Generalised gamma9. For each distribution family, I tried fitting them both with the MSE of probabilities loss used by previous authors, and with the log loss. These had little impact on the aggregate CDF, which might be considered unsurprising since even the flexible distribution did not have large effects on aggregate results.

Displaying the distribution of responses

What is the range of opinion12 among experts? Previous analyses gave only an informal sense of this by displaying a few dozen randomly selected CDFs:

Scatterplot of previous ESPAI survey CDFs.

When Will AI Exceed Human Performance? Evidence from AI Experts, Figure 1.

Their plots also included a 95% bootstrap confidence interval for the mean CDF. This is a measure of statistical variability in the estimate of the mean due to the finite sample size, not a measure of the dispersion of responses. Since ESPAI sample sizes are quite large, and the mean hence quite precisely estimated, I believe this bootstrap confidence interval is of secondary importance.

I dispense with the bootstrap CI and instead use the shaded area around the aggregate CDF to show the distribution of responses, specifically the central half of CDFs, from the 25th to the 75th percentile. This is a more systematic and quantitative alternative to displaying a random subset of individual CDFs.
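For readers of the codebase, the essence of that computation is just a pointwise percentile over the individual fitted CDFs; a sketch with a placeholder array (respondents × years), not the survey data:

```python
import numpy as np

cdf_values = np.random.default_rng(0).random((1000, 300))  # placeholder: one row per respondent's fitted CDF

lower, middle, upper = np.percentile(cdf_values, [25, 50, 75], axis=0)
# `lower` and `upper` bound the shaded band (central 50% of individual CDFs);
# `middle` is the median-aggregated CDF drawn as the central line.
```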

It is clear that authors of previous ESPAI analyses are well aware of what the bootstrap CI measures and interpret it correctly. However, it’s possible that some casual readers did not become fully aware of this. For the avoidance of doubt, the 95% bootstrap CI is radically different from (and radically narrower than) the interval containing 95% of individual CDFs. The latter would cover almost the entire plot:

Shaded area showing central 95% of individual CDFs for HLMI

The degree of disagreement among respondents is such that instead of 95%, I show the central 50% in my plots. This is the widest interval that I found sufficiently visually informative. More typical intervals like the central 80% or 70% would cover such a wide range of predictions as to be less informative.

Distribution of responses

Shaded area showing central 95% of individual CDFs for FAOL

Shaded area showing central 80% of individual CDFs for FAOL

Shaded area showing central 70% of individual CDFs for FAOL

Shaded area showing central 50% of individual CDFs for FAOL

Shaded area showing central 95% of individual CDFs for HLMI

Shaded area showing central 80% of individual CDFs for HLMI

Shaded area showing central 70% of individual CDFs for HLMI

Shaded area showing central 50% of individual CDFs for HLMI

Aggregating across the task and occupation framings

Before being asked for their forecasts, respondents were shown the following definitions for HLMI (High-Level Machine Intelligence) and FAOL (Full Automation of Labor):

HLMI (tasks):

High-level machine intelligence (HLMI) is achieved when unaided machines can accomplish every task better and more cheaply than human workers. Ignore aspects of tasks for which being a human is intrinsically advantageous, e.g., being accepted as a jury member. Think feasibility, not adoption.

FAOL (occupations):

Say an occupation becomes fully automatable when unaided machines can accomplish it better and more cheaply than human workers. Ignore aspects of occupations for which being a human is intrinsically advantageous, e.g., being accepted as a jury member. Think feasibility, not adoption. Say we have reached ‘full automation of labor’ when all occupations are fully automatable. That is, when for any occupation, machines could be built to carry out the task better and more cheaply than human workers.

The two questions are very similar. The main difference is that HLMI is phrased in terms of tasks, while FAOL asks about occupations. In principle, we should expect the same prediction on both questions. As noted by the authors,

since occupations might naturally be understood either as complex tasks, composed of tasks, or closely connected with one of these, achieving HLMI seems to either imply having already achieved FAOL, or suggest being close.

So it is legitimate to think of these as two different framings of the same question.

Despite their similarity, these framings yield very different predictions. The figures below show the result of using my preferred settings (median aggregation, flexible distributions), except that HLMI and FAOL are shown separately instead of aggregated:

HLMI vs FAOL

Previous analyses never aggregated the task and occupation results, only presenting them separately. Recall that using their methodology13, the authors reported the median year for all human tasks was 2047, while for occupations it was 2116 (a difference of 69 years!).

Presenting results separately allows for a deeper understanding for patient and sophisticated readers. However, we must be realistic: it is very likely that a single “headline” result will be most widely spread and remembered. Attempting to prevent this by only presenting HLMI (tasks) and FAOL (occupations) separately is largely futile, in my opinion. While it may sometimes encourage nuance, more often it will make it easier for readers to choose whichever of the two results best fits their preconceptions.

Indeed, my brief investigation suggests that citations of the 2023 survey results are strongly biased towards tasks (HLMI) over occupations (FAOL). Out of the 20 articles14 on the first two pages of Google Scholar citations of the 2024 preprint, 7 reported at least one of HLMI or FAOL. Among these:

  • 6 out of 7 (86%) reported tasks (HLMI) only
  • 1 out of 7 (14%) reported tasks and occupations
  • None (0%) reported occupations (FAOL) only

Therefore, I consider it preferable, when providing headline results, to aggregate across HLMI and FAOL to provide a single estimate of when all tasks or occupations will be automatable.

I achieve this by simply including answers to both questions prior to aggregation, i.e. no special form of aggregation is used for aggregating tasks (HLMI) and occupations (FAOL). Since more respondents were asked about tasks than occupations, I achieve equal weight by resampling from the occupations (FAOL) responses.
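In code, the pooling amounts to something like the following sketch (placeholder arrays, not the survey data; the real analysis lives in the open-source codebase):

```python
import numpy as np

rng = np.random.default_rng(0)
hlmi = rng.random((800, 300))   # placeholder: HLMI respondents' fitted CDFs on a common year grid
faol = rng.random((500, 300))   # placeholder: FAOL respondents' fitted CDFs on the same grid

# Resample the smaller group (with replacement) so both framings carry equal weight.
faol_resampled = faol[rng.integers(0, len(faol), size=len(hlmi))]
pooled = np.vstack([hlmi, faol_resampled])

aggregate = np.median(pooled, axis=0)   # then aggregate as usual (median, as in the headline plot)
```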

Codebase

For this analysis, I wrote a fully new codebase, available at github.com/tadamcz/espai. This was necessary because the system used for previous analyses relied on a collection of Jupyter notebooks that required manually running cells in a specific, undocumented order to produce results.

This new codebase, written in Python, makes our analyses reproducible for the first time. The codebase includes a robust test suite.

We are open sourcing the codebase and invite scrutiny and contributions from other researchers. It provides user-friendly configuration objects that make it easy for you to run your own variations of the analysis and produce your own plots.

  1. Timing will be my sole focus. I ignore ESPAI’s questions about whether the overall impact of AI will be positive or negative, the preferred rate of progress, etc. 

  2. This uses plain language as much as possible. Depending on your audience, you may wish to replace “central half” with “interquartile range”, or use phrases like “75th percentile”. Also, you can round 2048 to 2050 and 2103 to 2100 without losing anything of value. 

  3. Note that the ‘whiskers’ of our box plot are slightly nonstandard: they show the 15th and 85th percentile responses. Whiskers are more commonly used to represent the 1.5 IQR value: from above the upper quartile (75th percentile), a distance of 1.5 times the interquartile range (IQR) is measured out and a whisker is drawn up to the largest observed data point from the dataset that falls within this distance. 

  4. As a further subtlety, “the question sets do differ beyond definitions: only the HLMI questions are preceded by the instruction to ‘assume that human scientific activity continues without major negative disruption,’ and the FAOL block asks a sequence of questions about the automation of specific occupations before asking about full automation of labor” (Thousands of AI Authors on the Future of AI).

  5. In reality there are even more complications that I elide in the main text. If a set of probabilities contains both values of exactly 1, and values of exactly 0, the geometric mean of odds is undefined. If a one is present and there are no zeroes, the aggregate is one; if a zero is present and there are no ones, the aggregate is zero. However, floating point numbers by design have much more precision near zero than near one. For example, we can represent extremely small numbers like 1e-18, but 1 - 1e-18 just gets represented as 1.0. This means that very high probabilities get represented as 1 when equally extreme low probabilities do not get represented as zero. As a result, high probabilities get an “unfair advantage”. It should be possible to circumvent some of these problems by using alternative representations of the probabilities. However, many respondents directly give probabilities of 0% or 100% (as opposed to their fitted CDFs merely reaching these values). This poses a more fundamental problem for the geometric mean of odds. 

  6. I believe this is probably a theorem (with the possible exception of some degenerate cases), but I am not entirely sure since I have not attempted to actually write down or locate a proof. If you’ve got a proof or counter-example please contact me. 

  7. I give examples only for the fixed-probabilities framing in the main text because it’s easier to explain in the context of the loss functions we are using, which all use probabilities. However, we can see similar phenomena when looking at the fixed-years data. These are 9 plots representative of the bottom half of fixed-years Gamma fits. Gamma distribution fits to fixed years survey data, showing respondent points and Gamma CDFs. Since I am in this section aiming for expository clarity rather than the greatest rigour, I also elided the following complication in the main text. All distributions shown are Gammas fitted by previous authors, using the MSE of probabilities as the loss function. However, to produce the ranking of fits used to select which examples to plot, I used a different loss function. This was the MSE of years (horizontal direction) for the fixed-years plot, and the log loss for the fixed-probabilities plot. These loss functions make my examples more intuitive, while still being somewhat systematic (instead of cherry-picking examples). 

  8. The log loss can be motivated by analogy to the Kullback-Leibler (KL) divergence between discrete distributions. For each point in a respondent’s CDF data, we can think of it as a binary probability distribution $(p, 1-p)$. The fitted CDF gives us another binary distribution $(q, 1-q)$ at that point. The KL divergence between these distributions would be

    $$\begin{align*} D_{KL}(p\|q) &= p \log(p/q) + (1-p)\log\big((1-p)/(1-q)\big) \\ &= p\log(p) - p\log(q) + (1-p)\log(1-p) - (1-p)\log(1-q) \end{align*}$$

    The log loss $-[p\log(q) + (1-p)\log(1-q)]$ differs from this only by dropping the terms that don’t depend on $q$, and thus has the same minimum. However, this is merely an intuitive motivation: we are not actually comparing two discrete distributions, but rather measuring how well our continuous CDF matches specific points. 

  9. Note that although the generalised gamma distribution has three parameters, as far as I can tell it does not have the flexibility to fit three arbitrary points of CDF data. I came to this conclusion by extensive empirical investigation, but I haven’t been able to locate or write a proof to conclusively establish this one way or another. Please write to me if you know the answer. By the way, I don’t know of any parametric 3-parameter distribution that has this property. I used flexible distributions for ESPAI because they are the only solution I am aware of. 

  10. The code uses the paid MakeDistribution API, but a copy of all API responses needed to perform the analysis is stored in the repository. 

  11. I informally explored possible biases in the Gamma fits using the following histograms of residuals. While several of the residual distributions seem clearly biased, they also in most cases have 80% of the probability mass quite close to a residual of zero. I still do not fully understand why the effect of this data on aggregate CDFs is so muted, but I have not prioritised a more rigorous analysis.

    Histogram of residuals for Gamma fits in FAOL. Histogram of residuals for Gamma fits in HLMI.

  12. Due to the large framing effects of both tasks vs occupations, and fixed-years vs fixed-probabilities, which have been consistently observed, one may reasonably quarrel with describing this plot as showing “disagreement among respondents” or “the range of opinion among experts”. Part of why the range is so wide is that responses are highly sensitive to framing. Rather than saying experts disagree per se, purists might wish to say that expert opinion is undefined or unstable.

    This is a rather philosophical point. The more practical version of it is to ask whether we should aggregate across these framings, or just present them separately.

    My position (further discussed here) is that while disaggregated results should also be available, aggregation is necessary to produce useful results. Aggregating things that have some commonalities and some differences is indeed inherent to science. While previous authors presented HLMI and FAOL separately, they did not present fixed-years and fixed-probabilities separately, which would be required if we take the anti-aggregation argument to its full conclusion. 

  13. Their methodology is different from what I used in the plots above, but yields very similar results for the median year. 

  14. Here is the full table:

    | Title | Year | Link | Citation |
    | --- | --- | --- | --- |
    | Artificial intelligence: Arguments for catastrophic risk | 2024 | Link | HLMI only |
    | Safety cases for frontier AI | 2024 | Link | No numbers |
    | Me, myself and AI: How gender, personality and emotions determine willingness to use Strong AI for self-improvement | 2024 | Link | HLMI only |
    | Theory Is All You Need: AI, Human Cognition, and Causal Reasoning | 2024 | Link | No numbers |
    | Shared Awareness Across Domain‐Specific Artificial Intelligence | 2024 | Link | No numbers |
    | Existential risk from transformative AI: an economic perspective | 2024 | Link | HLMI only |
    | Theory is all you need: AI, human cognition, and decision making | 2024 | Link | No numbers |
    | Generative artificial intelligence usage by researchers at work | 2024 | Link | No numbers |
    | AI Horizon Scanning, White Paper p3395, IEEE-SA. Part I: Areas of Attention | 2024 | Link | Two tasks (build a payment processing site, fine-tune an LLM) |
    | Generative AI, Ingenuity, and Law | 2024 | Link | HLMI only |
    | AI Emergency Preparedness: Examining the federal government’s ability to detect and respond to AI-related national security threats | 2024 | Link | No numbers |
    | Transformative AI, existential risk, and real interest rates | 2024 | Link | HLMI only |
    | Misrepresented Technological Solutions in Imagined Futures: The Origins and Dangers of AI Hype in the Research Community | 2024 | Link | No numbers |
    | Eliciting the Priors of Large Language Models using Iterated In-Context Learning | 2024 | Link | HLMI only |
    | Strategic Insights from Simulation Gaming of AI Race Dynamics | 2024 | Link | No numbers |
    | Evolutionary debunking and value alignment | 2024 | Link | No numbers |
    | Robust Technology Regulation | 2024 | Link | Extinction risk only |
    | Interpreting Affine Recurrence Learning in GPT-style Transformers | 2024 | Link | No numbers |
    | Malicious use of AI and challenges to psychological security: Future risks | 2024 | Link | HLMI and FAOL |
    | Grow Your Artificial Intelligence Competence | 2024 | Link | No numbers |

December 16, 2024

Elevated computing

I made a mount to lift my laptop off the surface of my desk:

I operate my laptop with its lid closed, using only an external monitor. The downside is that the closed laptop is a useless rectangle that takes up a lot of precious desk space. This type of product would have solved 80% of the problem:

best-vertical-laptop-stand-macbook.jpg

However, even more space could be saved by removing the laptop from the desk entirely. A pole or stand of some sort is already needed anyway to hold up the monitor, so why not use it to hold the laptop as well?

My solution consists of two repurposed commercially available products, along with a piece of 6mm birch plywood to hold them together.

Attached to the monitor pole, we have a VESA mount that can tilt and swivel. Instead of holding a monitor, it’s holding the wooden plank.

vesa

On the other side of the plank, we have 4 pieces of metal sold as an under-desk laptop mount:

under-desk

Details

To keep the setup compact, we want the pole-to-VESA piece to be as short as possible. There are zillions of pole-mounted VESA monitor arms to choose from, but finding something without the arm was a bit more difficult.

All inputs were sourced from the internet:

  • I ordered the wood cut to measure from woodsheets.com, which cost £8.60 including shipping.
  • The pole mount was £13.77 from Amazon
  • The under-desk mounting solution was £16.99 from Amazon

I used linseed oil to give the wood a nicer finish; this is of course optional but the aesthetic improvement is significant.

varnish-before.JPG varnish-after.JPG

The “under-desk” bits come with little silicone-ish pads to avoid scratching the laptop. Similarly, I put plastic caps on the screw heads on the laptop side:

IMG_3120 IMG_3119

Had I realised these would be needed, I would probably have chosen to use glue instead of screws to attach the wood to the VESA mount.

After completing this project, I also mounted a 3.5 inch external hard drive enclosure in a similar way. I glued it directly to the VESA mount. Arbitrary objects could be thus held aloft. The scope of this innovation has no limits currently known to science.

IMG_3329

As I have two poles, I put the hard drive on the other pole, but it would have comfortably fit on the first pole.

Previous versions of this idea

For my previous attempt, I hand-sawed a piece of scrap wood from a previous project. This looked tolerable, but for £8.60 it was worth having another piece cut to size:

IMG_3118

My initial idea was to use a VESA mount and a cheap laptop sleeve. I used this for several months:

However, inserting and removing the laptop was rather inconvenient because the sleeve lacks rigidity. With the wood-based solution, it’s a simple one-handed process. The sleeve also has worse heat dissipation, despite the addition of aeration holes.

August 3, 2023

How much of the fall in fertility could be explained by lower mortality?

Many people think that lower child mortality causes fertility to decline.

One prominent theory for this relationship, as described by Our World in Data1, is that “infant survival reduces the parents’ demand for children”2. (Infants are children under 1 year old).

In this article, I want to look at how we can precisify that theory, and what magnitude the effect could possibly take. What fraction of the decline in birth rates could the theory explain?

Important. I don’t want to make claims here about how parents actually make fertility choices. I only want to examine the implications of various models, and specifically how much of the observed changes in fertility the models could explain.

Constant number of children

One natural interpretation of “increasing infant survival reduces the parents’ demand for children” is that parents are adjusting the number of births to keep the number of surviving children constant.

Looking at Our World in Data’s graph, we can see that in most of the countries depicted, the infant survival rate went from about 80% to essentially 100%. This is a factor of 1.25. Meanwhile, there were 1/3 as many births. If parents were adjusting the number of births to keep the number of surviving children constant, the decline in infant mortality would explain a change in births by a factor of 1/1.25=0.8, a -0.2 change that is only 30% of the -2/3 change in births.

The basic mathematical reason this happens is that even when mortality is tragically high, the survival rate is still thankfully much closer to 1 than to 0, so even a very large proportional fall in mortality will only amount to a small proportional increase in survival.

Some children survive infancy but die later in childhood. Although Our World in Data’s quote focuses on infant mortality, it makes sense to consider older children too. I’ll look at under-5 mortality, which generally has better data than older age groups, and also captures a large fraction of all child mortality3.

England (1861-1951)

England is a country with an early demographic transition and good data available.

Doepke 2005 quotes the following numbers:

|  | 1861 | 1951 |
| --- | --- | --- |
| Infant mortality | 16% | 3% |
| 1-5 yo mortality | 13% | 0.5% |
| 0-5 yo mortality | 27% | 3.5% |
| Survival to 5 years | 73% | 96.5% |
| Fertility | 4.9 | 2.1 |

Fertility fell by 57%, while survival to 5 years rose by 32%. Hence, if parents aim to keep the number of surviving children constant, the change in child survival can explain 43%4 of the actual fall in fertility. (It would have explained only 23% had we erroneously considered only the change in infant survival.)
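The arithmetic behind these percentages, under the constant-surviving-children assumption, is simple enough to spell out; a sketch using the Doepke figures quoted above:

```python
# Births must scale by the inverse of the survival ratio to keep surviving children constant.
def fraction_explained(survival_before, survival_after, fertility_before, fertility_after):
    implied_fall_in_births = 1 - survival_before / survival_after   # fall implied by the model
    actual_fall_in_births = 1 - fertility_after / fertility_before  # fall actually observed
    return implied_fall_in_births / actual_fall_in_births

print(fraction_explained(0.73, 0.965, 4.9, 2.1))  # ~0.43, using survival to age 5
print(fraction_explained(0.84, 0.97, 4.9, 2.1))   # ~0.23, using infant survival only
```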

Sub-Saharan Africa (1990-2017)

If we look now at sub-Saharan Africa data from the World Bank, the 1990-2017 change in fertility is from 6.3 to 4.8, a 25% decrease, whereas the 5-year survival rate went from 0.82 to 0.92, a 12% increase. So the fraction of the actual change in fertility that could be explained by the survival rate is 44%. (This would have been 23% had we looked only at infant survival).

Source data and calculations

So far, we have seen that this very simple theory of parental decision-making can explain 30-44% of the decline in fertility, while also noticing that considering childhood mortality beyond infancy was important to giving the theory its full due.

However, in more sophisticated models of fertility choices, the theory looks worse.

A more sophisticated model of fertility decisions

Let us imagine that instead of holding it constant, parents treat the number of surviving children as one good among many in an optimization problem.

An increase in the child survival rate can be seen as a decrease in the cost of surviving children. Parents will then substitute away from other goods and increase their target number of surviving children. If your child is less likely to die as an infant, you may decide to aim to have more children: the risk of experiencing the loss of a child is lower.5

For a more formal analysis, we can turn to the Barro and Becker (1989) model of fertility. I’ll be giving a simplified version of the presentation in Doepke 2005.

In this model, parents care about their own consumption as well as their number of surviving children. The parents maximise6

$U(c,n) = u(c) + n^\epsilon V$

where

  • $n$ is the number of surviving children and $V$ is the value of a surviving child
  • $\epsilon$ is a constant $\in (0,1)$
  • $u(c)$ is the part of utility that depends on consumption7

The income of a parent is $w$, and there is a cost per birth of $p$ and an additional cost of $q$ per surviving child8. The parents choose $b$, the number of births. $s$ is the probability of survival of a child, so that $n = sb$.

Consumption is therefore $c = w - (p+qs)b$ and the problem becomes $\max_{b} U = u(w-(p+qs)b) + (sb)^\epsilon V$

Letting $b^{*}(s)$ denote the optimal number of births as a function of $s$, what are its properties?

The simplest one is that $sb^{*}(s)$, the number of surviving children, is increasing in $s$. This is the substitution effect we described intuitively earlier in this section. This means that if $s$ is multiplied by a factor $x$ (say 1.25), $b^{*}(s)$ will be multiplied by more than $1/x$ (i.e. by more than 0.8).
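To make this concrete, here is a small numerical sketch of the maximisation problem above, with made-up parameter values (not Doepke’s calibration). With these particular values, $b^{*}(s)$ itself falls slightly as $s$ rises, while $s b^{*}(s)$ rises:

```python
import numpy as np
from scipy import optimize

# Made-up parameters: income, cost per birth, cost per surviving child, value of a child,
# curvature of the child term, and the CRRA coefficient for consumption (footnote 7).
w, p, q, V, eps, sigma = 1.0, 0.03, 0.07, 0.3, 0.5, 0.8

def utility(b, s):
    c = w - (p + q * s) * b                  # consumption after child costs
    if c <= 0:
        return -np.inf
    return c ** (1 - sigma) / (1 - sigma) + (s * b) ** eps * V

def b_star(s):
    res = optimize.minimize_scalar(lambda b: -utility(b, s),
                                   bounds=(1e-6, w / (p + q * s) - 1e-6),
                                   method="bounded")
    return res.x

for s in [0.73, 0.85, 0.965]:
    b = b_star(s)
    print(f"s={s:.3f}  b*={b:.2f}  surviving children={s * b:.2f}")
```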

When we looked at the simplest model, with a constant number of children, we guessed that it could explain 30-44% of the fall in fertility. That number is a strict upper bound on what the current model could explain.

What we really want to know, to answer the original question, is how $b^{*}(s)$ itself depends on $s$. To do this, we need to get a little bit more into the relative magnitude of the cost per birth $p$ and the additional cost $q$ per surviving child. As Doepke writes,

If a major fraction of the total cost of children accrues for every birth, fertility [i.e. $b^{*}(s)$] would tend to increase with the survival probability; the opposite holds if children are expensive only after surviving infancy9.

This tells us that falling mortality could actually cause fertility to increase rather than decrease.10

To go further, we need to plug in actual values for the model parameters. Doepke does this, using numbers that reflect the child mortality situation of England in 1861 and 1951, but also what seem to be some pretty arbitrary assumptions about the parent’s preferences (the shape of $u$ and the value of $\epsilon$).

With these assumptions, he finds that “the total fertility rate falls from 5.0 (the calibrated target) to 4.2 when mortality rates are lowered to the 1951 level”11, a 16% decrease. This represents 28% of the actually observed fall in fertility to 2.1.

Extensions of Barro-Becker model

The paper then considers various extensions of the basic Barro-Becker model to see if they could explain the large decrease in fertility that we observe.

For example, it has been hypothesized that when there is uncertainty about whether a child will survive (hitherto absent from the models), parents want to avoid the possibility of ending up with zero surviving children. They therefore have many children as a precautionary measure. Declining mortality (which reduces uncertainty, since survival rates are thankfully greater than 0.5) would then have a strong negative impact on births.

However, Doepke also considers a third model, that incorporates not only stochastic mortality but also sequential fertility choice, where parents may condition their fertility decisions on the observed survival of children that were born previously. The sequential aspect reduces the uncertainty that parents face over the number of surviving children they will end up with.

The stochastic and sequential models make no clear-cut predictions based on theory alone. Using the England numbers, however, Doepke finds a robust conclusion. In the stochastic+sequential model, for almost all reasonable parameter values, the expected number of surviving children still increases with $s$ (my emphasis):

To illustrate this point, let us consider the extreme case [where] utility from consumption is close to linear, while risk aversion with regards to the number of surviving children is high. … [W]hen we move (with the same parameters) to the more realistic sequential model, where parents can replace children who die early, … despite the high risk aversion with regards to the number of children, total fertility drops only to 4.0, and net fertility rises to 3.9, just as with the benchmark parameters. … Thus, in the sequential setup the conclusion that mortality decline raises net fertility is robust to different preference specifications, even if we deliberately emphasize the precautionary motive for hoarding children.

So even here, the fall in mortality would only explain 35% of the actually observed change in fertility. It seems that the ability to “replace” children who did not survive in the sequential model is enough to make its predictions pretty similar to the simple Barro-Becker model.

  1. The quote in context on Our World in Data’s child mortality page: “the causal link between infant [<1 year old] survival and fertility is established in both directions: Firstly, increasing infant survival reduces the parents’ demand for children. And secondly, a decreasing fertility allows the parents to devote more attention and resources to their children.” 

  2. As an aside, my impression is that if you asked an average educated person “Why do women in developing countries have more children?”, their first idea would be: “because child mortality is higher”. It’s almost a trope, and I feel that it’s often mentioned pretty glibly, without actually thinking about the decisions and trade-offs faced by the people concerned. That’s just an aside though – the theory clearly has prima facie plausibility, and is also cited in serious places like academia and Our World in Data. It deserves closer examination. 

  3. It should be possible to conduct the Africa analysis for different ages using IHME’s more granular data, but it’s a bit more work. (There appears to be no direct data on deaths per birth as opposed to per capita, and data on fertility is contained in a different dataset from the main Global Burden of Disease data.) 

  4. All things decay. Should this Google Sheets spreadsheet become inaccessible, you can download this .xlsx copy which is stored together with this blog. 

  5. In this light, we can see that the constant model is not really compatible with parents viewing additional surviving children as a (normal) good. Nor of course is it compatible with viewing children as a bad, for then parents would choose to have 0 children. Instead, it could for example be used to represent parents aiming for a socially normative number of surviving children. 

  6. I collapse Doepke’s $\beta$ and $V$ into a single constant $V$, since they can be treated as such in Model A, the only model that I will present mathematically in this post. 

  7. Its actual expression, that I omit from the main presentation for simplicity, is $u(c)=\frac{c^{1-\sigma}}{1-\sigma}$, the constant relative risk-aversion utility function. 

  8. There is nothing in the model that compels us to call $p$ the “cost per birth”; this is merely for ease of exposition. The model itself only assumes that there are two periods for each child: in the first period, costing $p$ to start, children face a mortality risk; and in the second period, those who survived the first face zero mortality risk and cost $q$. 

  9. Once again, Doepke calls the model’s early period “infancy”, but this is not inherent in the model. 

  10. It’s difficult to speculate about the relative magnitude of $p$ and $q$, especially if, departing from Doepke, we make the early period of the model, say, the first 5 years of life. If the first period is only infancy, it seems plausible to me that $q \gg p$, but then we also fail to capture any deaths after infancy. On the other hand, extending the early period to 5 incorrectly assumes that parents get no utility from children before they reach the age of 5. 

  11. The following additional context may be helpful to understand this quote:

    The survival parameters are chosen to correspond to the situation in England in 1861. According to Preston et al. (1972) the infant mortality rate (death rate until first birthday) was 16%, while the child mortality rate (death rate between first and fifth birthday) was 13%. Accordingly, I set $s_{i}=0.84$ and $s_{y}=0.87$ in the sequential model, and $s=s_{i} s_{y}=0.73$ in the other models. Finally, the altruism factor $\beta$ is set in each model to match the total fertility rate, which was 4.9 in 1861 (Chenais 1992). Since fertility choice is discrete in Models B and C, I chose a total fertility rate of 5.0 as the target.

    Each model is thus calibrated to reproduce the relationship of fertility and infant and child mortality in 1861. I now examine how fertility adjusts when mortality rates fall to the level observed in 1951, which is 3% for infant mortality and 0.5% for child mortality. The results for fertility can be compared to the observed total fertility rate of 2.1 in 1951.

    In Model A (Barro-Becker with continuous fertility choice), the total fertility rate falls from 5.0 (the calibrated target) to 4.2 when mortality rates are lowered to the 1951 level. The expected number of surviving children increases from 3.7 to 4.0. Thus, there is a small decline in total fertility, but (as was to be expected given Proposition 1) an increase in the net fertility rate.

August 5, 2021