The relative importance of hardware and software progress: evidence from computer chess

I did this analysis in September 2018. Soon after, my career took a turn away from this topic, and I never bothered to publish it. In September 2022, I finally got around to sharing it here. This page is simply the unmodified LaTeX file I used at the time, converted to Markdown using Pandoc. Some tables are included as screenshots because it was simpler that way.

I don't want to invest the time to vet this piece; it's plausible that it contains a mistake, or is otherwise embarrassing. It's a snapshot from 2018.

Given the increase in attention to AI forecasting in recent years, I also suspect this piece is well behind the state of the art today.

Summary

One topic within AI forecasting is: what is the relative importance of hardware and software to AI progress (Grace 2015, 2017)?

In order to make such forecasts, one option is to look at past events in a relevant reference class. In this document, I present new evidence on the relative importance of hardware and software in explaining the last 33 years of progress in computer chess. I construct and analyse a novel dataset using previously unexploited raw data of 54,919 games organised by the Swedish Computer Chess Association between 1985 and 2018.

This dataset contains instances of the same chess programme run on a range of hardware setups, as well as cases where a single piece of hardware was used to run many different programmes. This allows me to isolate the independent effect of hardware and software on performance.

One approach is to use dummy variables to directly capture the effect of each chess programme, controlling for clock speed. This approach estimates the effect of a clock speed doubling at 76 Elo. As for the dummy coefficients, they tell us, for example, that the software improvement from Fritz 1 to Deep Fritz 8 was 462 Elo. Further interpreting these numbers would require enough background knowledge of computer chess to develop an intuitive sense of how much intellectual progress particular algorithms represented.

A second approach introduces the date a programme was released as a variable, instead of dummies. Because I measure only a proxy for the date of release, this estimate is noisier, but it is more readily interpretable. The date-based estimate suggests that every additional year of chess programme development produces a gain of 10 to 20 Elo, while every clock speed doubling produces an increase of between 69 and 120 Elo, depending on the specification. In a CPU database covering 1971-2014, a clock speed doubling occurs every 3.46 years, suggesting that hardware has been roughly twice as important as software in explaining historical chess progress.

This echoes previous findings that ‘gains from algorithmic progress have been roughly fifty to one hundred percent as large as those from hardware progress’ (Grace 2013). However, the present report goes beyond previous analyses in two ways. First, it ensures that Elo score comparisons are meaningful by using data from a single chess rating list. In addition, it uses multiple regression to more systematically analyse a larger dataset covering a longer time period.

The Elo system

Performance in chess is traditionally measured using Elo scores. In the Elo system, according to Wikipedia,

Performance isn’t measured absolutely; it is inferred from wins, losses, and draws against other players. Players’ ratings depend on the ratings of their opponents, and the results scored against them. The difference in rating between two players determines an estimate for the expected score between them. […] A player’s expected score is their probability of winning plus half their probability of drawing.

If Player A has a rating of \(R_A\) and player B a rating of \(R_B\), the expected score of Player A is \(E_A=(1 + 10^{(R_B-R_A )/400})^{-1}\).

When a player’s actual tournament scores exceed their expected scores, the Elo system takes this as evidence that player’s rating is too low, and needs to be adjusted upward. Similarly when a player’s actual tournament scores fall short of their expected scores, that player’s rating is adjusted downward. Elo’s original suggestion, which is still widely used, was a simple linear adjustment proportional to the amount by which a player overperformed or underperformed their expected score. The maximum possible adjustment per game, called the K-factor, was set at K = 16 for masters and K = 32 for weaker players.

Suppose Player A was expected to score \(E_{A}\) points but actually scored \(S_{A}\) points. The formula for updating their rating is \(R_{A}^{\prime }=R_{A}+K(S_{A}-E_{A})\).

This update can be performed after each game or each tournament, or after any suitable rating period. An example may help clarify. Suppose Player A has a rating of 1613, and plays in a five-round tournament. He or she loses to a player rated 1609, draws with a player rated 1477, defeats a player rated 1388, defeats a player rated 1586, and loses to a player rated 1720. The player’s actual score is (0 + 0.5 + 1 + 1 + 0) = 2.5. The expected score, calculated according to the formula above, was (0.51 + 0.69 + 0.79 + 0.54 + 0.35) = 2.88. Therefore, the player’s new rating is (1613 + 32(2.5 - 2.88)) = 1601, assuming that a K-factor of 32 is used. Equivalently, each game the player can be said to have put an ante of K times their expected score for the game into a pot, the opposing player also puts K times their expected score into the pot, and the winner collects the full pot of value K; in the event of a draw the players split the pot and receive K/2 points each.
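As a quick sanity check, here is a minimal Python sketch that reproduces the worked example above (the function names are my own):

```python
def expected_score(r_a, r_b):
    """Expected score of a player rated r_a against an opponent rated r_b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def updated_rating(r_a, opponents, scores, k=32):
    """New rating after a rating period, using R'_A = R_A + K(S_A - E_A)."""
    e_a = sum(expected_score(r_a, r_b) for r_b in opponents)
    s_a = sum(scores)
    return r_a + k * (s_a - e_a)

# The five-round tournament from the example: losses score 0, draws 0.5, wins 1.
opponents = [1609, 1477, 1388, 1586, 1720]
scores = [0, 0.5, 1, 1, 0]
print(round(updated_rating(1613, opponents, scores)))  # 1601
```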

Simpler approaches and why they fail

Approaches which use only variation in software cannot compare the effects of software and hardware

CCRL (Computer Chess Rating Lists) is an organisation that tests computer chess programmes’ strength by playing the programmes against each other. Each programme is given the same thinking time, or “time control”. On the CCRL 40/40 list, the programmes have the equivalent of 40 minutes for 40 moves on an AMD X2 4600+ processor at 2.4GHz. The programme Crafty 19.17 BH is run as a benchmark on the tester’s computer to determine the equivalent time control for their machine.

All programmes on the CCRL list use equivalent hardware, so any difference in their performance can be attributed to software. Grace (2013) writes:

To confirm that substantial software progress does occur, we can look at the CCRL (2013) comparison of Rybka engines. Rybka 1.1 64-bit was the best of its time (the year 2006), but on equivalent hardware to Rybka 4.1 it is rated 204 points worse (2,892 vs. 3,102)

So we can estimate the increase in Elo scores on the CCRL 40/40 list that is due to improvements in software. But we still don’t know how this compares to the improvements that have come from hardware. For this we need variation in both software and hardware. Furthermore, we need variation in both hardware and software within a single chess ratings list. It may not be valid to compare the Elo improvement from software in one list to the Elo improvement from hardware estimated from a different list1. This is because Elo scores cannot be directly compared across lists. The Elo system only measures the relative performance of players. In human FIDE (World Chess Federation) chess, Elo ratings can be calibrated on a single scale, because the same humans play under identical conditions across different tournaments (there is a single time control for all major FIDE events). According to Wikipedia, computer chess rating lists have no direct relation to FIDE Elo ratings:

there is no calibration between any of these [computer] rating lists and [human] player pools. Hence, the results which matter are the ranks and the differences between the ratings, and not the absolute values. Also, each list calibrates their Elo via a different method. Therefore, no Elo comparisons can be made between the lists.

For example, in the CCRL 40/4 list, the top rated programme has an Elo of 3560, while the current champion of the 40/40 list, which allows 10 times more thinking time, has an Elo of only 3439.

Estimating software progress as what is left unexplained by hardware suffers from omitted variable bias

Supposing we had data from a single ratings list with variation in both software and hardware, we might reason as follows. A chess programme is a piece of software run on some hardware, so hardware and software are jointly exhaustive categories of inputs to chess performance. Hence whatever difference in performance isn’t explained by hardware must be due to software, and vice versa. One could run a regression of Elo scores on some measure(s) of hardware (say, processor clock speed and RAM), the residuals of which would be a hardware-adjusted Elo score. Any change in hardware-adjusted Elo scores would be due to software. However, this line of reasoning is flawed. Hardware is not exogenously determined: it is correlated with the residual “software” term in the regression, because modern chess programmes are run on much better hardware. The regression would therefore suffer from omitted variable bias, overestimating the effect of hardware.

This document’s approach

The SSDF dataset

On computer chess ranking lists, hardware is usually scrupulously standardised in order to compare chess programmes in the fairest way possible.

I could find only one source of data in which hardware varies: the SSDF (Swedish Computer Chess Association) list, which has recorded chess games since 1985 and has changed its hardware several times. I process and merge SSDF’s raw data to produce a novel dataset of 366 programme-hardware pairs and their Elo ratings. The data methods are detailed in section 5.

Measuring software

If we want to include software as an independent variable, we need to measure the “quality of software” of each chess programme in a meaningful and interpretable way. I use two different approaches.

Using programme dummies

When SSDF changed their hardware in 2008, they commented2:

Six [programmes] have been tested on both Q6600 and Athlon 1200 MHz, which makes a comparison possible. The total effect of a faster processor, four instead of one CPU and the use of 64-bit operating system instead of 32 bit has in average given a rating increase of 120 points. Deep Fritz 8 gained the most with 142 points whereas Deep Junior 8 increased the least with 84 points.

This example allows us to compare the gains due to hardware (Athlon to Q6600) to the gains due to software (e.g. Deep Fritz 8 to Deep Rybka 3). We have a 2x2 table of Elo scores:

     
|              | Q6600 | Athlon |
|--------------|-------|--------|
| Deep Rybka 3 | 3193  | 3075   |
| Deep Fritz 8 | 2898  | 2781   |

The generalisation of this approach is to conduct a regression in which \(n-1\) dummy variables are created for \(n\) programmes in the dataset. This allows us to quantify the effect of moving from some “baseline programme” (which has no dummy) to any other programme, controlling for hardware.

I am able to identify 46 instances where a programme was used on more than one hardware configuration, totalling 96 data points (because four programmes were used on three configurations). Out of these, 82 have clock speed data available3. I use the natural log of clock speed throughout, in keeping with Moore’s law. I use Fritz 1 as the baseline programme.

Table 1 presents the results. As expected, this regression explains virtually all the variation, since hardware and software are jointly exhaustive categories of computer chess inputs. The regression coefficient on clock speed is estimated with very high precision.

Since we are using log clock speed, the coefficient implies that each clock speed doubling is worth \(\ln(2)\times 110 \approx 76\) Elo points. This is about the same as the difference between Conchess Glasgow and Fritz 1. For comparison, data from the Stanford CPU Database (Danowitz et al. 2012), which covers 1971-2014, suggests that a clock speed doubling has historically occurred every 3.46 years.
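For concreteness, a regression of this form could be set up as follows. This is an illustrative Python sketch, not the actual analysis (which lives in r.R); the file name and the columns elo, clock_mhz and programme are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# One row per programme-hardware pair, restricted to programmes that
# appear on more than one hardware configuration.
df = pd.read_csv("dummies-data.csv")  # hypothetical file name

# C(...) creates n-1 dummies relative to the baseline programme (Fritz 1);
# clock speed enters in logs, in keeping with Moore's law.
model = smf.ols(
    "elo ~ np.log(clock_mhz) + C(programme, Treatment(reference='Fritz 1'))",
    data=df,
).fit()

# Elo gain per clock speed doubling: coefficient on log clock speed times ln(2).
print(np.log(2) * model.params["np.log(clock_mhz)"])  # roughly 76 in the text
```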

We can see the progression of the Fritz engines, controlling for clock speed:

     
| Programme    | Improvement from Fritz 1 (Elo) | Release date |
|--------------|--------------------------------|--------------|
| Fritz 1      | 0                              | 1992         |
| Fritz 3      | 192                            | 1995         |
| Deep Fritz   | 395                            | 2000         |
| Deep Fritz 7 | 428                            | 2002         |
| Deep Fritz 8 | 462                            | 2004         |

The Fritz line of software has improved by about 39 Elo per year on average, suggesting that software has been responsible for almost twice as much progress as hardware when it comes to Fritz.

![Table 1: Regression of Elo scores on log clock speed with programme dummies](img.png)

Using the date of release

Another approach is to use the release date of a programme as a (very noisy) measure of the amount of developer effort that went into its design. There are two main advantages to this approach. First, more data is available since we are not restricted to instances where a programme was used on more than one hardware configuration. Second, the regression coefficient is easily interpretable as measuring the effect of one year of computer chess development on performance. The main disadvantage is the noisiness of the proxy.

To approximate the release date of a programme, I use the date at which the programme was first played in an SSDF game. The approximation is very good for the most part, except in some cases where the organisers intentionally test very old “legacy” programmes on new hardware for the first time.

I conduct two regressions, presented in Table 2. Regression (1) (\(n=233\)) uses release date and clock speed only. Regression (2) (\(n=138\)) employs a finer-grained measure of hardware that includes clock speed, RAM, and the product of clock speed and number of cores (total speed). All coefficients are estimated with very high statistical precision.

It appears that after accounting for clock speed, the number of cores tells us little about performance, while the coefficient on RAM is actually small and negative. This could be because the small amount of variation in RAM and the number of cores in this dataset (see section 5.2.3) is highly correlated with changes in clock speed. The effect of clock speed is of similar magnitude to that found using dummies: between 69 and 120 Elo per doubling. The effect of the release date is between 10 and 20 Elo points per year. Recall that a clock speed doubling has historically occurred every 3.46 years (Danowitz et al. 2012). This would suggest that hardware has mattered roughly twice as much as software for progress in computer chess.
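A similar sketch for the date-based specification and for the hardware-versus-software comparison in the summary. Again the file and column names are hypothetical, and whether RAM and total speed enter in logs is my guess at the functional form:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("date-data.csv")  # hypothetical: one row per programme-hardware pair

# Regression (2): release year plus the finer-grained hardware measures.
m2 = smf.ols(
    "elo ~ release_year + np.log(clock_mhz) + np.log(ram_mb) + np.log(total_mhz)",
    data=df,
).fit()

elo_per_year_software = m2.params["release_year"]              # ~10-20 in the text
elo_per_doubling = np.log(2) * m2.params["np.log(clock_mhz)"]  # ~69-120 in the text

# With a clock speed doubling roughly every 3.46 years historically,
# hardware's contribution per year is approximately:
elo_per_year_hardware = elo_per_doubling / 3.46
print(elo_per_year_hardware / elo_per_year_software)  # roughly 2
```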

![Table 2: Regressions of Elo scores on release date and hardware measures](img_1.png)

Data processing methods

Sourcing and importing the data

SSDF ranking

The file ssdf-summary-data-original.txt contains the current rating of 366 programme-hardware pairs in tab-separated plain text format, as well as information on hundreds of games. I truncate the file to retain only the ranked list, and convert it to CSV ssdf-summary-data-clean.csv. I call this dataset su.

SSDF data on 54,919 games

I download the plain-text database SSDF.PGN, which fully describes each of the 54,919 games (including every move made by each player). An example entry:

[Event "Testspel av Tony Hed"]

[Site "?"]

[Date "1992.01.01"]

[Round "?"]

[White "Fritz 1 486/33 MHz"]

[Black "Mephisto Academy 6502 5 MHz"]

[Result "0-1"]

[ECO "A28"]

[PlyCount "83"]

1. c4 e5 2. Nc3 Nf6 3. Nf3 Nc6 4. d4 e4 5. Ng5 h6 6. Ngxe4 Nxe4 7. Nxe4 Qh4 8. Nc3 Qxd4 9. e3 Qxd1+ 10. Kxd1 Be7 11. Nd5 Bd8 12. Be2 O-O 13. Bd2 d6 14. Bc3 Ne5 15. Ba5 c6 16. Bxd8 Rxd8 17. Ne7+ Kh7 18. Nxc8 Raxc8 19. Ke1 d5 20. b3 dxc4 21. Bxc4 Rc7 22. Rd1 Rxd1+ 23. Kxd1 Rd7+ 24. Kc2 Nxc4 25. bxc4 Kg6 26. Rd1 Rxd1 27. Kxd1 Kf5 28. Ke2 c5 29. h3 a6 30. Kd3 b5 31. f4 g5 32. g3 bxc4+ 33. Kxc4 g4 34. e4+ Kxe4 35. hxg4 f6 36. Kxc5 Kf3 37. g5 hxg5 38. Kd5 Kxg3 39. fxg5 fxg5 40. Kc5 g4 41. Kb6 Kf4 42. Kxa6 0-1

I write the Python script extract-earliest-dates.py, which uses regular expressions to clean up the file. In the resulting output3.txt, there is one line for every game; each line gives the date and the programme-hardware pair which played White, separated by a comma. I call this data full.
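The script itself is not reproduced here, but the extraction it performs is roughly as follows (my reconstruction in Python; the file encoding is a guess):

```python
import re

date_re = re.compile(r'\[Date "([^"]*)"\]')
white_re = re.compile(r'\[White "([^"]*)"\]')

# For each game, emit one line: the date and the White programme-hardware pair.
with open("SSDF.PGN", encoding="latin-1") as f, open("output3.txt", "w") as out:
    date = None
    for line in f:
        m = date_re.match(line)
        if m:
            date = m.group(1)
            continue
        m = white_re.match(line)
        if m and date is not None:
            out.write(f"{date},{m.group(1)}\n")
            date = None
```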

Data analysis in R

Cleaning both data sets

All subsequent data cleaning and analysis are conducted in r.R. In the source data, the software-hardware pairs are in unstructured plain text. For example:

Rebel Century 3 K6-2 450 MHz

P.Fritz 3 Glaurung 2.1 PXA270 520 MHz

Pocket Fritz 2 Shredder PXA255 400 MHz

Goliath Light K6-2 450 MHz

Fritz 5.32 64MB P200 MHz MMX

Crafty 17.07 CB K6-2 450 MHz

Nimzo 99 K6-2 450 MHz

Resurrection Rybka 2.2 ARM 203 MHz

MChess Pro 8 K6-2 450 MHz

Genius 6.5 K6-2 450 MHz

Chessmaster 6000 64MB P200 MHz MMX

Hiarcs 7.32 64MB P200 MHz MMX

Fritz 5 PB29% 67MB P200 MHz MMX

ChessGenius 3 ZTE Apex3 ARM A53 1.3 GHz

Fritz 4 Pentium 90 MHz

Kallisto 1.98 Pentium 90 MHz

MChess Pro 5 486/50-66 MHz

Rebel 7 486/50-66 MHz

Significant data cleaning is required. I define 5 functions using regular expressions, which are applied in the order below (a sketch of the clock-speed logic follows the list). For consistency, I apply the same functions to both su and full.

  • extract_clock_speeds() uses two different methods. First, it matches strings of the form 2.4 GHz where a decimal number is followed by GHz or MHz. Second, it captures anything to the right of a processor (see extract_processors() below), since clock speeds are usually given immediately after the name of the processor.

  • clean_clock_speeds() removes extraneous characters and turns strings with GHz and MHz into numeric values in MHz.

  • extract_processors() catches anything that matches a manually collected list of 32 processors.

  • standardise_progs() removes whitespace and trailing zeroes. Next, and somewhat contentiously, it removes x64 and MP (“multi-processor”). This is because these labels are inconsistently applied between the two datasets, and to my understanding also within SSDF.PGN. Keeping these labels would make merging much more difficult, and the date-based analysis relies crucially on merging datasets. For the dummies analysis, the choice is less defensible, but allows much more data to be used. The hope is that there is not too much difference between the x64 and MP versions and other versions of a single programme. If I did further work with this data, I would look at the impact of these choices on the results.

  • clean_progs() removes extraneous characters.
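As promised above, here is an illustrative Python version of the clock-speed logic. The real implementations are R functions in r.R, and they also use the processor-based second method, which is not shown here:

```python
import re

GHZ_MHZ = re.compile(r"(\d+(?:\.\d+)?)\s*(GHz|MHz)", re.IGNORECASE)

def extract_clock_speed(name):
    """Return the raw clock speed string found in a programme-hardware label, if any."""
    m = GHZ_MHZ.search(name)
    return m.group(0) if m else None

def clean_clock_speed(raw):
    """Convert a raw '2.4 GHz' / '450 MHz' string into a numeric value in MHz."""
    if raw is None:
        return None
    m = GHZ_MHZ.search(raw)
    value, unit = float(m.group(1)), m.group(2).lower()
    return value * 1000 if unit == "ghz" else value

print(clean_clock_speed(extract_clock_speed("Rebel Century 3 K6-2 450 MHz")))            # 450.0
print(clean_clock_speed(extract_clock_speed("ChessGenius 3 ZTE Apex3 ARM A53 1.3 GHz")))  # 1300.0
```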

Processing full

full contains missing dates, indicated using question marks. The function unknown_date_default() deals with missing values in the following way (a small sketch follows the list):

  • An unknown year defaults to 2100, since we are only interested in the earliest appearance of a programme

  • An unknown month or day defaults to 01 (the more principled choice would be 31 for a missing day and 12 for a missing month, but this would require a more cumbersome regular expression, and I don’t need that much precision)
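A minimal sketch of this defaulting rule (illustrative Python; the real function is defined in r.R):

```python
def unknown_date_default(date):
    """Replace '?' components of a 'YYYY.MM.DD' PGN date with defaults.

    An unknown year defaults to 2100 (so the game can never count as a
    programme's earliest appearance); unknown months and days default to 01.
    """
    year, month, day = date.split(".")
    year = "2100" if "?" in year else year
    month = "01" if "?" in month else month
    day = "01" if "?" in day else day
    return f"{year}.{month}.{day}"

print(unknown_date_default("1992.??.??"))  # 1992.01.01
print(unknown_date_default("????.??.??"))  # 2100.01.01
```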

Then I drop all observations that are not the earliest appearance of a programme. I also remove games that are claimed to have occurred in 1900.

Imputing RAM and number of cores

The number of cores in a processor and the amount of RAM on the computer used for a game are usually not given in the source data (only the clock speed is). This information needs to be added manually. On hardware.htm, the following is written:

The hardware of the different hardware levels:

  • Intel Pentium 90 MHz - 8-16 MB RAM

  • Intel Pentium 200 MHz MMX - 32-64 MB RAM

  • AMD K6-2 450 MHz - Single processor, 32 bit OS, 128 MB RAM, 5 piece TableBases on HHD

  • AMD Athlon Thunderbird 1200 MHz - Single processor, 32 bit OS, 256 MB RAM, 5p TB on HHD

  • Intel Core2Quad Q6600 2400 MHz - Quad Core processor, 64 bit OS, 2 GB RAM, 5p TB on HHD

  • AMD Ryzen 7 1800X 3600 MHz - Octa Core processor, 64 bit OS, 16 GB RAM, 6p Syzygy on SSD (or 5p Nalimov on SSD)

I manually write this in processor-info.csv, to be merged later. Further work on this could involve digging up more such information. Sometimes, when a new version of the list is released, SSDF adds a comment on comment.htm. I have extracted all such comment pages from the Internet Archive; they can be found in the directory datacomments. I have not studied them, but more information on the hardware setups that were historically used could be hidden there.

Merging

For the date-based analysis, I now finally merge su and full, matching by the programme name. I also merge the imputed information in ramcores.

For the dummies analysis, I only merge ramcores.

When information on the number of cores is available, I compute the clock speed times the number of cores, which is the total number of operations per second available to the processor.

Further data

I use the Ruby script wayback_machine_downloader (Hartator 2018) to pull all versions of the entire SSDF site stored on the Wayback Machine. I have not yet done anything with this data.

When analysing the Stanford CPU database (Danowitz et al. 2012), I use only the summary file processor.csv. I regress the date on the natural log of clock speed, then multiply the coefficient by \(\ln(2)\) and divide by 365 to obtain the average number of years per clock speed doubling.
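In code, that calculation looks roughly like the following; the column names in processor.csv are assumptions on my part:

```python
import numpy as np
import pandas as pd

cpu = pd.read_csv("processor.csv")

days = pd.to_datetime(cpu["date"]).map(lambda d: d.toordinal())  # release date in days
log_clock = np.log(cpu["clock_mhz"])                             # hypothetical column name

# Slope of date on ln(clock speed): days per unit increase in log clock speed.
slope = np.polyfit(log_clock, days, 1)[0]

# A doubling increases ln(clock speed) by ln(2); convert days to years.
years_per_doubling = slope * np.log(2) / 365
print(years_per_doubling)  # ~3.46 according to the text
```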

References

Danowitz, Andrew, Kyle Kelley, James Mao, John P. Stevenson, and Mark Horowitz. 2012. “CPU DB: Recording Microprocessor History.” *Commun. ACM* 55 (4): 55–63. <https://doi.org/10.1145/2133806.2133822>.
Grace, Katja. 2013. “Algorithmic Progress in Six Domains,” 60.
———. 2015. “Research Topic: Hardware, Software and AI.” *AI Impacts*. https://aiimpacts.org/research-topic-hardware-software-and-ai/.
———. 2017. “Effect of Marginal Hardware on Artificial General Intelligence.” *AI Impacts*. https://aiimpacts.org/effect-of-marginal-hardware-on-artificial-general-intelligence/.
Hartator. 2018. “Download an Entire Website from the Wayback Machine.” https://github.com/hartator/wayback-machine-downloader.
  1. To the extent that Elos from different lists are not comparable, analyses such as that in section 5.1.3 of Grace (2013) would be invalidated. 

  2. See datacomments/comment_003.htm 

  3. I do not use the RAM and cores data I have collected (see section 5), since this would reduce the number of data points even more, but such a regression could easily be conducted. 

September 21, 2022

How to run Cronicle (a cron replacement) in a Docker container

I really don’t like cron jobs and crontab:

  • crontab has a horrible syntax (it’s from 1975…)
  • logging the output of jobs needs to be specified manually
  • viewing logs is inconvenient (even just for checking whether a job ran or not!)
  • cron jobs run in a minimal environment that’s inconvenient to modify

Cronicle is a friendlier alternative (“a task scheduler with a web based front-end UI”).

Very important: the default username/password for the web interface is admin/admin. This could let anyone run arbitrary shell commands on your server! Change the password immediately after setting up.

Here’s how to run Cronicle Dockerized.

docker run -d \
  -v /cronicle-data/data:/opt/cronicle/data:rw \
  -v /cronicle-data/logs:/opt/cronicle/logs:rw \
  -v /cronicle-data/plugins:/opt/cronicle/plugins:rw \
  -v /cronicle-data/app:/app:rw \
  --hostname your_hostname.com -p 11531:3012 \
  -e CRONICLE_base_app_url='http://your_hostname.com:11531' \
  --name cronicle \
  bluet/cronicle-docker:latest

You can now point your_hostname.com to the server’s IP address and visit http://your_hostname.com:11531 in your browser to access the Cronicle web interface.

Comments:

  • 11531 is an arbitrarily chosen port number. You would normally use port 80, the default HTTP port; for me that port is occupied by other applications I run on the host web server.
  • The source for the Docker image bluet/cronicle-docker is here.
  • On 7 May 2022, I confirmed that these steps work on a brand new server, with revision 3e4211e of bluet/cronicle-docker.
May 7, 2022

How much of the fall in fertility could be explained by lower mortality?

Many people think that lower child mortality causes fertility to decline.

One prominent theory for this relationship, as described by Our World in Data1, is that “infant survival reduces the parents’ demand for children”2. (Infants are children under 1 year old.)

In this article, I want to look at how we can precisify that theory, and what magnitude the effect could possibly take. What fraction of the decline in birth rates could the theory explain?

Important. I don’t want to make claims here about how parents actually make fertility choices. I only want to examine the implications of various models, and specifically how much of the observed changes in fertility the models could explain.

Constant number of children

One natural interpretation of “increasing infant survival reduces the parents’ demand for children” is that parents are adjusting the number of births to keep the number of surviving children constant.

Looking at Our World in Data’s graph, we can see that in most of the countries depicted, the infant survival rate went from about 80% to essentially 100%. This is a factor of 1.25. Meanwhile, there were 1/3 as many births. If parents were adjusting the number of births to keep the number of surviving children constant, the rise in infant survival would explain a fall in births by a factor of 1/1.25 = 0.8, a 20% decline that is only 30% of the observed two-thirds decline in births.

The basic mathematical reason this happens is that even when mortality is tragically high, the survival rate is still thankfully much closer to 1 than to 0, so even a very large proportional fall in mortality will only amount to a small proportional increase in survival.

Some children survive infancy but die later in childhood. Although Our World in Data’s quote focuses on infant mortality, it makes sense to consider older children too. I’ll look at under-5 mortality, which generally has better data than older age groups, and also captures a large fraction of all child mortality3.

England (1861-1951)

England is a country with an early demographic transition and good data available.

Doepke 2005 quotes the following numbers:

|                     | 1861 | 1951  |
|---------------------|------|-------|
| Infant mortality    | 16%  | 3%    |
| 1-5 yo mortality    | 13%  | 0.5%  |
| 0-5 yo mortality    | 27%  | 3.5%  |
| Survival to 5 years | 73%  | 96.5% |
| Fertility           | 4.9  | 2.1   |

Fertility fell by 57%, while survival to 5 years rose by 32%. Hence, if parents aim to keep the number of surviving children constant, the change in child survival can explain 43%4 of the actual fall in fertility. (It would have explained only 23% had we erroneously considered only the change in infant survival.)

Sub-Saharan Africa (1990-2017)

If we look now at sub-Saharan Africa data from the World Bank, the 1990-2017 change in fertility is from 6.3 to 4.8, a 25% decrease, whereas the 5-year survival rate went from 0.82 to 0.92, a 12% increase. So the fraction of the actual change in fertility that could be explained by the survival rate is 44%. (This would have been 23% had we looked only at infant survival).
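These percentages follow from a one-line calculation: the fall in births implied by holding surviving children constant, divided by the observed fall in fertility. A sketch using the rounded figures quoted above (the 44% for sub-Saharan Africa uses slightly more precise inputs from the linked spreadsheet):

```python
def fraction_explained(surv_before, surv_after, fert_before, fert_after):
    """Share of the observed fertility decline explained if parents held the
    number of surviving children constant (births scale with 1/survival)."""
    decline_implied = 1 - surv_before / surv_after   # fall in births implied by survival change
    decline_observed = 1 - fert_after / fert_before  # observed fall in fertility
    return decline_implied / decline_observed

print(fraction_explained(0.73, 0.965, 4.9, 2.1))  # England 1861-1951: ~0.43
print(fraction_explained(0.82, 0.92, 6.3, 4.8))   # Sub-Saharan Africa 1990-2017: ~0.46 with these rounded inputs
```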

Source data and calculations. Chart not showing up? Go to the .svg file.

So far, we have seen that this very simple theory of parental decision-making can explain 30-44% of the decline in fertility, while also noticing that considering childhood mortality beyond infancy was important to giving the theory its full due.

However, in more sophisticated models of fertility choices, the theory looks worse.

A more sophisticated model of fertility decisions

Let us imagine that instead of holding it constant, parents treat the number of surviving children as one good among many in an optimization problem.

An increase in the child survival rate can be seen as a decrease in the cost of surviving children. Parents will then substitute away from other goods and increase their target number of surviving children. If your child is less likely to die as an infant, you may decide to aim to have more children: the risk of experiencing the loss of a child is lower.5

For a more formal analysis, we can turn to the Barro and Becker (1989) model of fertility. I’ll be giving a simplified version of the presentation in Doepke 2005.

In this model, parents care about their own consumption as well as their number of surviving children. The parents maximise6

\[U(c,n) = u(c) + n^\epsilon V\]

where

  • \(n\) is the number of surviving children and \(V\) is the value of a surviving child
  • \(\epsilon\) is a constant \(\in (0,1)\)
  • \(u(c)\) is the part of utility that depends on consumption7

The income of a parent is \(w\), and there is a cost per birth of \(p\) and an additional cost of \(q\) per surviving child8. The parents choose \(b\), the number of births. \(s\) is the probability of survival of a child, so that \(n=sb\).

Consumption is therefore \(c=w-(p+qs)b\) and the problem becomes \(\max_{b} U = u(w-(p+qs)b) + (sb)^\epsilon V\)

Letting \(b^{*}(s)\) denote the optimal number of births as a function of \(s\), what are its properties?

The simplest one is that \(sb^*(s)\), the number of surviving children, is increasing in \(s\). This is the substitution effect we described intuitively earlier in this section. This means that if \(s\) is multiplied by a factor \(x\) (say 1.25), \(b^*(s)\) will be multiplied by more than \(1/x\) (i.e. by more than 0.8).

When we looked at the simplest model, with a constant number of children, we guessed that it could explain 30-44% of the fall in fertility. That number is a strict upper bound on what the current model could explain.

What we really want to know, to answer the original question, is how \(b^*(s)\) itself depends on \(s\). To do this, we need to get a little bit more into the relative magnitude of the cost per birth \(p\) and the additional cost \(q\) per surviving child. As Doepke writes,

If a major fraction of the total cost of children accrues for every birth, fertility [i.e. \(b^*(s)\)] would tend to increase with the survival probability; the opposite holds if children are expensive only after surviving infancy9.

This tells us that falling mortality could actually cause fertility to increase rather than decrease.10
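To make this concrete, here is a small numerical sketch of the model above, with entirely arbitrary illustrative parameters (not Doepke’s calibration). A grid search over \(b\) shows \(b^*(s)\) rising with \(s\) when the per-birth cost \(p\) dominates and falling when the per-survivor cost \(q\) dominates, while the number of surviving children \(sb^*(s)\) rises in both cases:

```python
import numpy as np

def optimal_births(s, p, q, w=1.0, sigma=0.5, eps=0.5, V=0.5):
    """Grid-search the optimal number of births b* in the simplified Barro-Becker model."""
    b = np.linspace(1e-6, w / (p + q * s) - 1e-6, 200_000)   # feasible births (c > 0)
    c = w - (p + q * s) * b                                  # parental consumption
    U = c ** (1 - sigma) / (1 - sigma) + V * (s * b) ** eps  # CRRA utility plus children term
    return b[np.argmax(U)]

for p, q, label in [(0.05, 0.01, "cost mostly per birth"),
                    (0.01, 0.05, "cost mostly per survivor")]:
    b_lo, b_hi = optimal_births(0.73, p, q), optimal_births(0.965, p, q)
    print(f"{label}: b*(0.73)={b_lo:.1f}, b*(0.965)={b_hi:.1f}, "
          f"surviving children {0.73 * b_lo:.1f} -> {0.965 * b_hi:.1f}")
```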

To go further, we need to plug in actual values for the model parameters. Doepke does this, using numbers that reflect the child mortality situation of England in 1861 and 1951, but also what seem to be some pretty arbitrary assumptions about the parent’s preferences (the shape of \(u\) and the value of \(\epsilon\)).

With these assumptions, he finds that “the total fertility rate falls from 5.0 (the calibrated target) to 4.2 when mortality rates are lowered to the 1951 level”11, a 16% decrease. This represents 28% of the actually observed fall in fertility to 2.1.

Extensions of Barro-Becker model

The paper then considers various extensions of the basic Barro-Becker model to see if they could explain the large decrease in fertility that we observe.

For example, it has been hypothesized that when there is uncertainty about whether a child will survive (hitherto absent from the models), parents want to avoid the possibility of ending up with zero surviving children. They therefore have many children as a precautionary measure. Declining mortality (which reduces uncertainty, since survival rates are thankfully greater than 0.5) would then have a strong negative impact on births.

However, Doepke also considers a third model, which incorporates not only stochastic mortality but also sequential fertility choice, where parents may condition their fertility decisions on the observed survival of children that were born previously. The sequential aspect reduces the uncertainty that parents face over the number of surviving children they will end up with.

The stochastic and sequential models make no clear-cut predictions based on theory alone. Using the England numbers, however, Doepke finds a robust conclusion. In the stochastic+sequential model, for almost all reasonable parameter values, the expected number of surviving children still increases with \(s\) (my emphasis):

To illustrate this point, let us consider the extreme case [where] utility from consumption is close to linear, while risk aversion with regards to the number of surviving children is high. … [W]hen we move (with the same parameters) to the more realistic sequential model, where parents can replace children who die early, … despite the high risk aversion with regards to the number of children, total fertility drops only to 4.0, and net fertility rises to 3.9, just as with the benchmark parameters. … Thus, in the sequential setup the conclusion that mortality decline raises net fertility is robust to different preference specifications, even if we deliberately emphasize the precautionary motive for hoarding children.

So even here, the fall in mortality would only explain 35% of the actually observed change in fertility. It seems that the ability to “replace” children who did not survive in the sequential model is enough to make its predictions pretty similar to the simple Barro-Becker model.

  1. The quote in context on Our World in Data’s child mortality page: “the causal link between infant [<1 year old] survival and fertility is established in both directions: Firstly, increasing infant survival reduces the parents’ demand for children. And secondly, a decreasing fertility allows the parents to devote more attention and resources to their children.” 

  2. As an aside, my impression is that if you asked an average educated person “Why do women in developing countries have more children?”, their first idea would be: “because child mortality is higher”. It’s almost a trope, and I feel that it’s often mentioned pretty glibly, without actually thinking about the decisions and trade-offs faced by the people concerned. That’s just an aside though – the theory clearly has prima facie plausibility, and is also cited in serious places like academia and Our World in Data. It deserves closer examination. 

  3. It should be possible to conduct the Africa analysis for different ages using IHME’s more granular data, but it’s a bit more work. (There appears to be no direct data on deaths per birth as opposed to per capita, and data on fertility is contained in a different dataset from the main Global Burden of Disease data.) 

  4. All things decay. Should this Google Sheets spreadsheet become inaccessible, you can download this .xlsx copy which is stored together with this blog. 

  5. In this light, we can see that the constant model is not really compatible with parents viewing additional surviving children as a (normal) good. Nor of course is it compatible with viewing children as a bad, for then parents would choose to have 0 children. Instead, it could for example be used to represent parents aiming for a socially normative number of surviving children. 

  6. I collapse Doepke’s \(\beta\) and \(V\) into a single constant \(V\), since they can be treated as such in Model A, the only model that I will present mathematically in this post. 

  7. Its actual expression, that I omit from the main presentation for simplicity, is \(u(c)=\frac{c^{1-\sigma}}{1-\sigma}\), the constant relative risk-aversion utility function. 

  8. There is nothing in the model that compels us to call \(p\) the “cost per birth”, this is merely for ease of exposition. The model itself only assumes that there are two periods for each child: in the first period, costing \(p\) to start, children face a mortality risk; and in the second period, those who survived the first face zero mortality risk and cost \(q\). 

  9. Once again, Doepke calls the model’s early period “infancy”, but this is not inherent in the model. 

  10. It’s difficult to speculate about the relative magnitude of \(p\) and \(q\), especially if, departing from Doepke, we make the early period of the model, say, the first 5 years of life. If the first period is only infancy, it seems plausible to me that \(q \gg p\), but then we also fail to capture any deaths after infancy. On the other hand, extending the early period to 5 incorrectly assumes that parents get no utility from children before they reach the age of 5. 

  11. The following additional context may be helpful to understand this quote:

    The survival parameters are chosen to correspond to the situation in England in 1861. According to Preston et al. (1972) the infant mortality rate (death rate until first birthday) was \(16\%\), while the child mortality rate (death rate between first and fifth birthday) was \(13\%\). Accordingly, I set \(s_{i}=0.84\) and \(s_{y}=0.87\) in the sequential model, and \(s=s_{i} s_{y}=0.73\) in the other models. Finally, the altruism factor \(\beta\) is set in each model to match the total fertility rate, which was \(4.9\) in 1861 (Chesnais 1992). Since fertility choice is discrete in Models B and C, I chose a total fertility rate of \(5.0\) as the target.

    Each model is thus calibrated to reproduce the relationship of fertility and infant and child mortality in 1861. I now examine how fertility adjusts when mortality rates fall to the level observed in 1951, which is \(3\%\) for infant mortality and \(0.5\%\) for child mortality. The results for fertility can be compared to the observed total fertility rate of \(2.1\) in 1951.

    In Model A (Barro-Becker with continuous fertility choice), the total fertility rate falls from \(5.0\) (the calibrated target) to \(4.2\) when mortality rates are lowered to the 1951 level. The expected number of surviving children increases from \(3.7\) to \(4.0\). Thus, there is a small decline in total fertility, but (as was to be expected given Proposition 1) an increase in the net fertility rate.

August 5, 2021