Simpson's Paradox is a Special Case of Omitted Variable Bias



The goal of this post is to illustrate a point made in a recent tweet by Amit Ghandi that Simpson's Paradox is a special case of omitted variable bias (OVB).

I will use an example from How SIMPSON'S PARADOX explains weird COVID19 statistics. (This example is for illustrative purposes only. This post is about statistics, not COVID-19).
The video compares those diagnosed with COVID-19 in China and Italy between March and May 2020.
Outcome by country: people infected with COVID-19 were more likely to survive in China than Italy.
Outcome by country-age group: at each age bracket, people infected with COVID-19 were more likely to survive in Italy than China.

Let's illustrate this with a simulation in the Julia Language.
The variables are defined in the table below: \begin{array}{|l|l|l|} \hline \text{Outcome: } Y_{i} & \text{Treatment: } X_{i} & \text{Confounder: } Z_{i}\\ \hline Y_{i} \equiv 0 \text{ if person i survives} & X_{i} \equiv 0 \text{ if person i is in China} & Z_{i} \equiv 0 \text{ if person i's age } \leq 59 \\ Y_{i} \equiv 1 \text{ if person i dies} & X_{i} \equiv 1 \text{ if person i is in Italy} & Z_{i} \equiv 1 \text{ if person i's age } > 59 \\ \hline \end{array} Let's assume the true data generating process (DGP) is: $Y_{i} = \beta_{0} + \beta_{xy} X_{i} + \beta_{zy} Z_{i} + \varepsilon_{i}$
Assume $\beta_{0}=10, \beta_{xy} = -5, \beta_{zy} = 10$ (coefficients are in %).
Let's generate some artificial data.
Suppose we have N=200 observations (half China, half Italy).
Suppose 80% of China's population is young $Z_{i} =0$ and 20% is old $Z_{i} = 1$.
Suppose 20% of Italy's population is young $Z_{i} =0$ and 80% is old $Z_{i} = 1$.
    
  using DataFrames, Plots
  N = 200;           #200 observations = 100 in China + 100 in Italy.
  β_0 = 10.0; β_Italy = -5.0; β_Age = 10.0;
  #
  df = DataFrame(
      Y =        [ones(8);zeros(80-8);  ones(4);zeros(20-4);  
                  ones(1);zeros(20-1);  ones(12);zeros(80-12);], 
      Intercept = ones(N), 
      Italy     = [zeros(100); ones(100)], 
      Age = [zeros(80);ones(100-80); 
             zeros(20);ones(100-20);],
      )
    
1. Let's estimate the unconditional probability of death from COVID-19 in this data:
$$Y_{i} = \beta_{0} + \varepsilon_{i}$$
    
    y = df.Y; X = ones(N);
    β = X \ y   # 12.5%
    mean(y)     # 12.5% 
    
\begin{array}{|l|l|l|} \hline P\left( \text{Death from COVID-19} \right) = 12.5\% \\ \hline \end{array}
2. Let's estimate the probability of death from COVID-19 conditional only on country:
$$Y_{i} = \beta_{0} + \beta_{xy} X_{i} + \varepsilon_{i}$$
    
    y = df.Y; X = [ df.Intercept df.Italy];
    β = X \ y   
    β[1]         # 12%
    β[1] + β[2]  # 13% 
    
\begin{array}{|l|l|l|} \hline P\left( \text{Death from COVID-19 } | \text{ China} \right) = 12\% \\ \hline P\left( \text{Death from COVID-19 } | \text{ Italy} \right) = 13\% \\ \hline \end{array}
3. Let's estimate the probability of death from COVID-19 conditional on country and age:
$$Y_{i} = \beta_{0} + \beta_{xy} X_{i} + \beta_{zy} Z_{i} + \varepsilon_{i}$$
    
    y = df.Y; X = [ df.Intercept df.Italy df.Age];
    β = X \ y   
    β[1]         # 10%
    β[1] + β[2]  # 5%
    
Summarize $P\left( \text{Death from COVID-19 } | \text{ Country, Age} \right)$ in the following table: \begin{array}{|l|l|l|} \hline & \text{Young } (Z_{i} = 0) & \text{Old } (Z_{i} = 1) \\ \hline \text{China } (X_{i} = 0) & 10\% & 20\% \\ \hline \text{Italy } (X_{i} = 1) & 5\% & 15\% \\ \hline \end{array}
Without conditioning on age, patients in Italy have a 1% higher probability of Death than China (13% vs 12%).
Conditioning on age, patients in Italy have a 5% lower probability of Death than China within both age brackets.

Next we will show how Simpson's paradox is a special case of OVB.
Suppose the true model is: $$Y_{i} = \beta_{0} + \beta_{xy} X_{i} + \beta_{zy} Z_{i} + \nu_{i}$$
Suppose you omit $Z_{i}$ and instead estimate: $$Y_{i} = \beta_{0} + \beta_{xy} X_{i} + u_{i} \Rightarrow u_{i} = \beta_{zy} Z_{i} + \nu_{i} $$
Suppose $X_{i}$ predicts $Z_{i}$: $$Z_{i} = \delta_{xz} X_{i} + w_{i} \Rightarrow \delta_{xz} = \frac{\sigma_{xz}}{\sigma_{x}^2} = \rho_{xz}\times \frac{\sigma_{z}}{\sigma_{x}} $$
Denote the OLS estimate (from the equation that omits age) $\hat{\beta}_{xy}$.
We have: $E[\hat{\beta}_{xy}|X_{i}] = \beta_{xy} + \underbrace{\delta_{xz} \beta_{zy}}_{\text{bias}}$ [1]
$\text{Bias} = \delta_{xz} \beta_{zy} = \left( \rho_{xz}\times \frac{\sigma_{z}}{\sigma_{x}} \right) \times \left( \rho_{zy}\times \frac{\sigma_{y}}{\sigma_{z}} \right) = \rho_{xz}\times \rho_{zy} \times \frac{\sigma_{y}}{\sigma_{x}} $
The bias is the product of (1) the impact of the treatment on the OV $\delta_{xz}$ and (2) the impact of the OV on the outcome $\beta_{zy}$.
The estimate will be unbiased if either (1) the treatment is uncorrelated w/ the OV, or (2) the OV is uncorrelated w/ the outcome.
Alternatively we can think about this through the Law of Iterated Expectations: \begin{align*} E[Y_i|\text{ China }] =& E[Y_i|\text{ China, Young}] \times P\left(\text{China, Young}\right) + E[Y_i|\text{ China, Old}] \times P\left(\text{China, Old}\right) \\ 12\% =& 10\% \times 80\% + 20\% \times 20\% \\ =& 8\% + 4\% \end{align*}

[1] To derive the bias, slightly abuse notation by stacking a column of ones and $X_{i}$ into a matrix "X", and stack $\beta_{0}$ and $\beta_{xy}$ into $\beta$:

\begin{align*} \hat{\beta} =& (X'X)^{-1} X'Y \\ =& (X'X)^{-1} X'(X\beta + Z\beta_{zy} + \nu_{i}) \\ =& \beta + (X'X)^{-1} X'Z \beta_{zy} + (X'X)^{-1} X'Z \nu_{i} \\ E[\hat{\beta}_{xy}|X ] =& \beta + (X'X)^{-1} X'Z \beta_{zy} \\ =& \beta + \delta_{xz} \beta_{zy} \end{align*}



Let's start with a definition:
Case Fatality Rate (CFR): the proportion of people who die from a specified disease among all individuals diagnosed with the disease over a certain period of time.

The non-causal interpretation of $\beta_{xy}$ is:
Assumption:
The causal interpretation of $\beta_{xy}$ is:
Assumption:
AZ: even w/o causal assumption, still useful for non-causal prediction



$X \sim N\left(\mu, \sigma^2 \right)$