Home
Overview
Most scientific questions are causal and can be answered by a finite-dimensional causal estimand. Perhaps the most famous of them is the Average Treatment Effect. If certain conditions are met (no unobserved confounders, overlap), statistical methods can be employed to estimate these causal estimands from data. Semi-parametric methods have gained interest in the past decade because they are more likely to capture the true data generating process than their more restrictive parametric counterparts. This is particularly true in modern data science, where datasets are increasingly complex and unlikely to be well represented by parametric models.
TMLE.jl implements such semi-parametric methods. More precisely, it is an implementation of the Targeted Minimum Loss-Based Estimation (TMLE) framework. If you are interested in leveraging the power of modern machine-learning methods while preserving interpretability and statistical inference guarantees, you are in the right place. TMLE.jl is compatible with any MLJ-compliant algorithm and any dataset wrapped in a DataFrame object.
The following plot illustrates the bias reduction achieved by TMLE over a mis-specified parametric linear model in the presence of confounding. Note that in this case, TMLE also uses mis-specified models but still achieves a lower bias due to the targeting step.
Installation
TMLE.jl can be installed via the Package Manager and supports Julia v1.10 and greater.
Pkg> add TMLE
Quick Start
To run an estimation procedure, we need 3 ingredients:
1. A dataset: here a simulation dataset
For illustration, assume we know the actual data generating process is as follows:
\[\begin{aligned} W &\sim \mathcal{Uniform}(0, 1) \\ T &\sim \mathcal{Bernoulli}(logistic(1-2 \cdot W)) \\ Y &\sim \mathcal{Normal}(1 + 3 \cdot T - T \cdot W, 0.01) \end{aligned}\]
Because we know the data generating process, we can simulate some data accordingly:
using TMLE
using Distributions
using StableRNGs
using Random
using CategoricalArrays
using MLJLinearModels
using LogExpFunctions
using DataFrames
rng = StableRNG(123)
n = 100
W = rand(rng, Uniform(), n)
T = rand(rng, Uniform(), n) .< logistic.(1 .- 2W)
Y = 1 .+ 3T .- T.*W .+ rand(rng, Normal(0, 0.01), n)
dataset = DataFrame(Y=Y, T=categorical(T), W=W)
2. A quantity of interest: here the Average Treatment Effect (ATE)
The Average Treatment Effect of $T$ on $Y$ confounded by $W$ is defined as:
Ψ = ATE(
    outcome=:Y,
    treatment_values=(T=(case=true, control=false),),
    treatment_confounders=(T=[:W],)
)
StatisticalATE
- Outcome: Y
- Treatment: T => (control = false, case = true)
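As a point of reference, under the identification conditions mentioned in the overview (no unobserved confounders, overlap), this causal estimand corresponds to a statistical estimand that only involves the observed data distribution. The expression below uses standard textbook notation, with $\Psi$ denoting the estimand; it is not taken from the package output:
\[\begin{aligned} \Psi &= \mathbb{E}_W\left[\mathbb{E}[Y \mid T = 1, W] - \mathbb{E}[Y \mid T = 0, W]\right] \end{aligned}\]
Here $T = 1$ stands for the case level (true) and $T = 0$ for the control level (false) of the treatment.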
3. An estimator: here a Targeted Maximum Likelihood Estimator (TMLE)
tmle = Tmle()
result, _ = tmle(Ψ, dataset, verbosity=0);
result
One sample t-test
-----------------
Population details:
parameter of interest: Mean
value under h_0: 0
point estimate: 2.49315
95% confidence interval: (2.434, 2.552)
Test summary:
outcome with 95% confidence: reject h_0
two-sided p-value: <1e-93
Details:
number of observations: 100
t-statistic: 83.8265438241116
degrees of freedom: 99
empirical standard error: 0.029741766265683128
We are comforted to see that the confidence interval of our estimator covers the ground truth! 🥳
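The ground truth can be read off the data generating process given in step 1. A short derivation, assuming the identification conditions hold (which they do here by construction, since $W$ is the only confounder):
\[\begin{aligned} \mathbb{E}[Y \mid T = 1, W] &= 1 + 3 - W = 4 - W \\ \mathbb{E}[Y \mid T = 0, W] &= 1 \\ ATE &= \mathbb{E}_W\left[(4 - W) - 1\right] = 3 - \mathbb{E}[W] = 3 - 0.5 = 2.5 \end{aligned}\]
The reported 95% confidence interval (2.434, 2.552) contains this value of 2.5.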
Scope and Distinguishing Features
The goal of this package is to provide an entry point for semi-parametric, asymptotically unbiased and efficient estimation in Julia. The two main general estimators known to achieve these properties are the One-Step estimator and the Targeted Maximum-Likelihood estimator. Most of the current effort has centered on estimands that are compositions of the counterfactual mean.
Distinguishing Features:
- Estimands: Counterfactual Mean, Average Treatment Effect, Interactions, Any composition thereof
- Estimators: TMLE, CV-TMLE, C-TMLE, One-Step, CV-One-Step.
- Machine-Learning: Any MLJ compatible model
- Dataset: Any dataset wrapped in a DataFrame.
- Factorial Treatment Variables:
  - Multiple treatments
  - Categorical treatment values
Citing TMLE.jl
If you use TMLE.jl for your own work and would like to cite us, here are the BibTeX and APA formats:
- BibTeX
@software{Labayle_TMLE_jl,
author = {Labayle, Olivier and Khamseh, Ava and Ponting, Chris and Beentjes, Sjoerd},
title = {{TMLE.jl}},
url = {https://github.com/olivierlabayle/TMLE.jl}
}
- APA
Labayle, O., Beentjes, S., Khamseh, A., & Ponting, C. TMLE.jl [Computer software]. https://github.com/olivierlabayle/TMLE.jl