CDF:
RV | Julia | MATLAB | Base R | STATA | Mathematica | Python SciPy |
---|---|---|---|---|---|---|
$N(0,1)$ | cdf(Normal(0,1),x) | normcdf(x) | pnorm(x) | normal(x) | CDF[NormalDistribution[0, 1],x] | norm.cdf(x) |
$\chi^2_{r}$ | cdf(Chisq(r),x) | chi2cdf(x,r) | pchisq(x,r) | chi2(r,x) | CDF[ChiSquareDistribution[r],x] | chi2.cdf(x, r) |
$t_r$ | cdf(TDist(r),x) | tcdf(x,r) | pt(x,r) | 1-ttail(r,x) | CDF[StudentTDistribution[r],x] | t.cdf(x, r) |
$F_{r,k}$ | cdf(FDist(r,k),x) | fcdf(x,r,k) | pf(x,r,k) | F(r,k,x) | CDF[FRatioDistribution[r,k],x] | f.cdf(x, r, k) |
$D(\theta)$ | cdf(D(θ),x) | Dcdf(x,θ) | pD(x,θ) | ? | CDF[D[θ],x] | D.cdf(x,θ) |
Inverse Probabilities (quantiles):
RV | Julia | MATLAB | Base R | STATA | Mathematica | Python SciPy |
---|---|---|---|---|---|---|
$N(0,1)$ | quantile(Normal(0,1),p) | norminv(p) | qnorm(p) | invnormal(p) | Quantile[NormalDistribution[],p] | norm.ppf(p) |
$\chi^2_{r}$ | quantile(Chisq(r),p) | chi2inv(p,r) | qchisq(p,r) | invchi2(r,p) | Quantile[ChiSquareDistribution[r],p] | chi2.ppf(p, r) |
$t_r$ | quantile(TDist(r),p) | tinv(p,r) | qt(p,r) | invttail(r,1-p) | Quantile[StudentTDistribution[r],p] | t.ppf(p, r) |
$F_{r,k}$ | quantile(FDist(r,k),p) | finv(p,r,k) | qf(p,r,k) | invF(r,k,p) | Quantile[FRatioDistribution[r,k],p] | f.ppf(p, r, k) |
$D(\theta)$ | quantile(D(θ),p) | Dinv(p,θ) | qD(p,θ) | invD(p,θ) | Quantile[D[θ],p] | D.ppf(p,θ) |
Other Properties:
Property | Julia | MATLAB | Base R | STATA | Mathematica | Python SciPy |
---|---|---|---|---|---|---|
cdf | cdf(D(θ),x) | Dcdf(x,θ) | pD(x,θ) | ? | CDF[D[θ],x] | D.cdf(x,θ) |
pdf/pmf | pdf(D(θ),x) | Dpdf(x,θ) | dD(x,θ) | ? | PDF[D[θ],x] | D.pdf(x,θ) |
quantile | quantile(D(θ),p) | Dinv(p,θ) | qD(p,θ) | invD(p,θ) | Quantile[D[θ],p] | D.ppf(p,θ) |
random | rand(D(θ),N) | Drnd(θ,N) | rD(N,θ) | ? | RandomVariate[D[θ],N] | D.rvs(θ, size=N) |
mean | mean(D(θ)) | - | - | - | Mean[D[θ]] | D.mean(θ) |
entropy | entropy(D(θ)) | - | - | - | - | D.entropy(θ) |
fit | fit(D, data) | - | - | - | FindDistributionParameters[data,D] | D.fit(data) |
A key distinction between how the packages above handle random variables is that in Julia and Mathematica a random variable is itself a type. In R, by contrast, you cannot refer to the underlying random variable; you can only compute its properties, such as pchisq(x,r).
General syntax in Julia:
Distributions.jl distinguishes between a random variable's parameters and its property variables.
A random variable is a type, such as Chisq(r) or D(θ).
A property of a random variable, such as its CDF or mean, is (typically) a functional which takes the random variable as its argument, along with any necessary property-specific variables.
Note: some properties take no extra arguments, such as mean(D(θ)).
Note: the fit(D, data) function requires a distribution type without parameters, D, as opposed to D(θ).
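A minimal sketch of this syntax (the concrete values here are illustrative):

```julia
using Distributions

d = Chisq(5)        # the random variable χ²₅ is itself a type instance
cdf(d, 3.0)         # property with a property-specific variable x
quantile(d, 0.95)   # inverse CDF at p = 0.95
mean(d)             # property that takes no extra arguments

# fit takes the distribution *type* without parameters, plus data:
fit(Normal, rand(Normal(2, 3), 1_000))
```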
In general a random variables package does three things: (1) define random variables, (2) generate random draws and fit distributions to data, and (3) compute properties of random variables.
Here is an overview of current features:
- Define random variables: D(θ), e.g. Chisq(r), FDist(r,k), etc.; mixtures: MixtureModel([Normal(0,1),Cauchy(0,1)], [0.5,0.5]); truncations: Truncated(Cauchy(0,1), 0.25, 1.8); convolutions: convolve(Cauchy(0,1), Cauchy(5,2)); products: product_distribution([Normal(),Cauchy()])
- Generate random draws: rand(D(θ),N), e.g. rand(Cauchy(0,1), 100); fit distributions: fit(D, data), fit(Histogram, data)
- Compute properties: property(D(θ)) or property(D(θ),x), where θ is the vector of distribution parameters and x is the vector of property variables.
```julia
d = LogNormal()
mean(d), median(d), mode(d), var(d), std(d)
skewness(d), kurtosis(d), entropy(d)
pdf(d, 2), cdf(d, 2), quantile(d, .9), gradlogpdf(d, 2)
```
Distributions.expectation(LogNormal(), cos) computes $E[\cos(X)]$ where $X \sim \text{LogNormal}(0,1)$.
Numerical vs Symbolic:
I discussed the following examples on Discourse.
Distributions.jl currently doesn’t operate on transformations of random variables.
Mathematica can handle transformations of a distribution when it can solve the problem symbolically.
It can even handle the same distribution with symbolic parameters, BetaDistribution[α,β].
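Transformations can still be handled numerically in Julia by sampling; a minimal sketch (my own illustration, not a Distributions.jl feature):

```julia
using Distributions, Statistics

B = Beta(2, 3)            # concrete-parameter analogue of BetaDistribution[α,β]
x = rand(B, 1_000_000)    # Monte Carlo draws from X ~ Beta(2,3)
y = x .^ 2                # the transformed variable Y = X²

# Moments of Y approximated by simulation:
mean(y), var(y)
# Exact check: E[Y] = E[X²] = var(B) + mean(B)^2 = 0.04 + 0.16 = 0.2
```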
From the paper: the R type hierarchy and its p-d-q-r naming convention; the Julia analogues include fit(D, data) and fit(Histogram, data).
The tables above add Julia, Mathematica, and Python to similar comparisons:
Python: https://github.com/QuantEcon/rvlib
R: https://github.com/alan-turing-institute/distr6
Compare syntax: https://hyperpolyglot.org/scripting
Let’s start with some definitions:
Simpson’s Paradox: a statistical phenomenon in which an association between two variables in a population emerges, disappears, or reverses when the population is divided into subpopulations.
Omitted Variable Bias (OVB): arises when a statistical model leaves out one or more variables that are correlated with both the treatment and the outcome.
Case Fatality Rate (CFR): the proportion of people who die from a specified disease among all individuals diagnosed with the disease over a certain period of time.
Let’s use an example from How Simpson’s paradox explains weird COVID19 statistics.
(This example is for illustrative purposes only; this post is about interpreting statistics, not COVID-19.)
The video compares those diagnosed with COVID-19 in China and Italy between March and May 2020.
CFR by country: people infected with COVID-19 were more likely to die in Italy than China.
CFR by country-age group: at each age bracket, people infected with COVID-19 were more likely to die in China than Italy.
Let’s illustrate this with a simulation in the Julia Language.
The variables are defined in the table below:
Let’s assume the true data generating process (DGP) is:
$Y_{i} = \beta_{0} + \beta_{xy} X_{i} + \beta_{zy} Z_{i} + \varepsilon_{i}$
Under the true DGP, $\text{CFR}\left(X_i, Z_i \right) = P\left(Y_i =1 \mid X_i, Z_i \right) = E\left[Y_i \mid X_i, Z_i \right]$.
Assume $\beta_{0}=10, \beta_{xy} = -5, \beta_{zy} = 10$ (coefficients are in %).
China-Young: $\text{CFR}\left(0, 0\right) = P\left(Y_i =1 | X_i=0, Z_i=0 \right) = \beta_{0} = 10\%$
China-Old: $\text{CFR}\left(0, 1\right) = P\left(Y_i =1 | X_i=0, Z_i=1 \right) = \beta_{0} + \beta_{zy} = 20\%$
Italy-Young: $\text{CFR}\left(1, 0\right) = P\left(Y_i =1 | X_i=1, Z_i=0 \right) = \beta_{0} + \beta_{xy} = 5\%$
Italy-Old: $\text{CFR}\left(1, 1\right) = P\left(Y_i =1 | X_i=1, Z_i=1 \right) = \beta_{0} + \beta_{xy} + \beta_{zy} = 15\%$
Let’s generate artificial data consistent with the DGP.
Suppose we have N=200 observations (half China, half Italy).
Suppose 80% of China’s population is young $Z_{i} =0$ and 20% is old $Z_{i} = 1$.
Suppose 20% of Italy’s population is young $Z_{i} =0$ and 80% is old $Z_{i} = 1$.
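A minimal sketch of the data-generating step (the seed, variable names, and Bernoulli draws are my own choices):

```julia
using Random, Distributions

Random.seed!(1)
X = vcat(zeros(100), ones(100))   # country: 0 = China, 1 = Italy (N = 200)
# Age: 20% of the Chinese sample is old; 80% of the Italian sample is old
Z = Float64.(vcat(rand(Bernoulli(0.2), 100), rand(Bernoulli(0.8), 100)))
# Death indicator from the linear probability DGP (coefficients in %)
p = (10 .- 5 .* X .+ 10 .* Z) ./ 100
Y = Float64.(rand.(Bernoulli.(p)))
```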
1) Let’s estimate the unconditional probability of death from COVID-19 in this data:
\(Y_{i} = \beta_{0} + \varepsilon_{i}\)
2) Let’s estimate the probability of death from COVID-19 conditional only on country:
\(Y_{i} = \beta_{0} + \beta_{xy} X_{i} + \varepsilon_{i}\)
3) Let’s estimate the probability of death from COVID-19 conditional on country and age:
\(Y_{i} = \beta_{0} + \beta_{xy} X_{i} + \beta_{zy} Z_{i} + \varepsilon_{i}\)
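The three regressions can be run with GLM.jl; a self-contained sketch (the package choice and variable names are my own, and the data are regenerated as in the setup described):

```julia
using Random, Distributions, DataFrames, GLM

Random.seed!(1)
X = vcat(zeros(100), ones(100))   # 0 = China, 1 = Italy
Z = Float64.(vcat(rand(Bernoulli(0.2), 100), rand(Bernoulli(0.8), 100)))
Y = Float64.(rand.(Bernoulli.((10 .- 5 .* X .+ 10 .* Z) ./ 100)))
df = DataFrame(Y=Y, X=X, Z=Z)

lm(@formula(Y ~ 1), df)      # (1) unconditional probability of death
lm(@formula(Y ~ X), df)      # (2) conditional on country only (biased estimate)
lm(@formula(Y ~ X + Z), df)  # (3) conditional on country and age
```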
Summarize $P\left( \text{Death from COVID-19 } | \text{ Country, Age} \right)$ in the following table:
$$\begin{array}{|lll|}\hline & \text{Young}({Z}_{i}=0)& \text{Old}({Z}_{i}=1)\\ \text{China}({X}_{i}=0)& 10\mathrm{\%}& 20\mathrm{\%}\\ \text{Italy}({X}_{i}=1)& 5\mathrm{\%}& 15\mathrm{\%}\\ \hline\end{array}$$
Without conditioning on age, patients in Italy have a 1% higher probability of death than in China (13% vs 12%).
Conditioning on age, patients in Italy have a 5% lower probability of death than in China within both age brackets.
Next we will show how Simpson’s paradox is a special case of OVB.
Suppose the true model is:
\(Y_{i} = \beta_{0} + \beta_{xy} X_{i} + \beta_{zy} Z_{i} + \nu_{i}\)
Suppose you omit $Z_{i}$ and instead estimate:
\(Y_{i} = \beta_{0} + \beta_{xy} X_{i} + u_{i} \Rightarrow u_{i} = \beta_{zy} Z_{i} + \nu_{i}\)
Suppose $X_{i}$ predicts $Z_{i}$:
\(Z_{i} = \delta_{xz} X_{i} + w_{i} \Rightarrow \delta_{xz} = \frac{\sigma_{xz}}{\sigma_{x}^2} = \rho_{xz}\times \frac{\sigma_{z}}{\sigma_{x}}\)
Denote the OLS estimate (from the equation that omits age) $\hat{\beta}_{xy}$.
We have:
$E\left[ \hat{\beta}_{xy} | X_{i} \right] = \beta_{xy} + \underbrace{\delta_{xz} \beta_{zy}}_{\text{Bias}}$ (derivation below^{1})
$\text{Bias} = \delta_{xz} \beta_{zy} = \left( \rho_{xz} \frac{\sigma_{z}}{\sigma_{x}} \right) \times \left( \rho_{zy} \frac{\sigma_{y}}{\sigma_{z}} \right) = \rho_{xz}\, \rho_{zy}\, \frac{\sigma_{y}}{\sigma_{x}}$
The bias is the product of (1) the impact of the treatment on the OV, $\delta_{xz}$, and (2) the impact of the OV on the outcome, $\beta_{zy}$.
The estimate will be unbiased if either (1) the treatment is uncorrelated with the OV, or (2) the OV is uncorrelated with the outcome.
Simpson’s reversal occurs when the sign of the estimated coefficient switches after including the confounder (when the bias is big enough in the opposite direction of the true effect): $\text{sign}\left( \hat{\beta}_{xy} \right) \neq \text{sign}\left( \beta_{xy} \right)$ $\Leftrightarrow$ $\text{sign}\left( \beta_{xy} + \delta_{xz} \beta_{zy} \right) \neq \text{sign}\left( \beta_{xy} \right)$.
In our case, the true effect is $\beta_{xy} = -5\%$ and the bias is $\delta_{xz} \beta_{zy} = 60\% \times 10\% = 6\%$ (since $X_i$ is binary, $\delta_{xz} = E[Z_i | X_i=1] - E[Z_i | X_i=0] = 80\% - 20\% = 60\%$), so the OVB is big enough to cause a reversal:
\(\begin{align*}
\hat{\beta}_{xy} &= 1\% && \text{Non-causal effect, estimated when excluding } Z \\
\beta_{xy} &= -5\% && \text{Causal effect, estimated when including } Z \\
\delta_{xz} \beta_{zy} &= 60\% \times 10\% = 6\% && \text{Bias} \\
\beta_{xy} + \delta_{xz} \beta_{zy} &= -5\% + 6\% = 1\%
\end{align*}\)
Suppose we estimate: \(Y_{i} = \beta_{0} + \beta_{xy} X_{i} + \varepsilon_{i}\)
Non-causal interpretation of $\hat{\beta}_{xy} = 1\%$: the probability of a diagnosed patient dying from COVID-19 is 1% higher in Italy than in China.
Assumption 1: the Chinese and Italian data was correctly measured and reported.
Note: the assumption required for the non-causal interpretation is relatively mild.
Causal interpretation of $\hat{\beta}_{xy} = 1\%$: if we intervene and move a diagnosed patient from China to Italy, the probability of the patient dying from COVID-19 will be 1% higher.
Assumption 1: the Chinese and Italian data was correctly measured and reported.
Assumption 2: the “treatment” (China vs Italy) is uncorrelated with unobserved determinants of survival.
This is the famous conditional mean independence (CMI) assumption: $E\left[ \varepsilon | X \right] = 0$.
Note: the identifying assumption (CMI) required for a causal interpretation is very strong.
In general, treatments are often correlated with variables that are also correlated with the outcome.
In this case the confounder is age: Italy’s population is older than China’s.
In general, each reader can decide how convinced they are by the identifying assumption, and thus how to interpret an estimate.
Importantly, non-causal estimates are often still very useful in contexts where our goal is to make predictions.
The true DGP above assumed the treatment effect was the same across age bins:
\(Y_{i} = \beta_{0} + \beta_{xy} X_{i} + \beta_{zy} Z_{i} + \nu_{i}\)
Thus the CFR was $\beta_{xy} = -5\%$ lower for both young and old patients in Italy.
Suppose there was treatment effect heterogeneity and the true DGP was:
\(Y_{i} = \beta_{0} + \beta_{xy} X_{i} + \beta_{zy} Z_{i} + \beta_{xzy} X_{i} Z_{i} + \nu_{i}\)
In this case, estimating the model while omitting the interaction term (an omitted non-linearity) would also cause OVB.
To derive the bias,
slightly abuse notation by stacking a column of ones and $X_{i}$ into a matrix $X$,
and stacking $\beta_{0}$ and $\beta_{xy}$ into $\beta$:
\(\begin{align*}
\hat{\beta} &= (X'X)^{-1} X'Y \\
&= (X'X)^{-1} X'(X\beta + Z\beta_{zy} + \nu) \\
&= \beta + (X'X)^{-1} X'Z \beta_{zy} + (X'X)^{-1} X'\nu \\
\delta_{xz} &\equiv (X'X)^{-1} X'Z \\
\hat{\beta} &= \beta + \delta_{xz} \beta_{zy} + (X'X)^{-1} X'\nu \\
E\left[ \hat{\beta} \,|\, X \right] &= \beta + \delta_{xz} \beta_{zy}
\end{align*}\)
↩