Home

TMLE.jl is a Julia package for estimating causal parameters using targeted minimum loss-based estimation (TMLE). Like double machine learning (DML), TMLE provides doubly robust estimators that combine causal inference theory with modern machine learning algorithms. The key advantage of TMLE is that it performs the debiasing step in function space rather than in parameter space, so the resulting estimates inherently respect the natural bounds of the estimand (e.g., probabilities remain between 0 and 1).
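To make the point about bounds concrete, here is a toy Julia sketch of the logistic fluctuation that TMLE commonly uses for a binary outcome. It is not TMLE.jl's internal code. Initial probability estimates `Q0` are shifted on the logit scale along a "clever covariate" `H` by an amount `ε`, then mapped back with the logistic function. All three (`Q0`, `H`, `ε`) are hypothetical numbers chosen for illustration. For contrast, the sketch also applies the same shift additively on the probability scale.

```julia
# Toy illustration of a logistic fluctuation (hypothetical numbers, not TMLE.jl internals)
logistic(x) = 1 / (1 + exp(-x))
logit(p) = log(p / (1 - p))

# Hypothetical initial estimates of P(Y = 1 | T, W) for five observations
Q0 = [0.02, 0.10, 0.50, 0.90, 0.98]
# Hypothetical clever covariate values H(T, W) and fluctuation size ε
H = [2.0, -1.5, 0.5, 3.0, 4.0]
ε = 0.3

# Function-space update: shift on the logit scale, then map back to probabilities
Q_targeted = logistic.(logit.(Q0) .+ ε .* H)

# The same shift applied additively on the probability scale
Q_additive = Q0 .+ ε .* H

println("targeted: ", round.(Q_targeted, digits=3))
println("additive: ", round.(Q_additive, digits=3))
```

Whatever the value of `ε`, the logistic update stays strictly between 0 and 1 (up to floating-point rounding for extreme inputs). With these numbers, the additive update produces some values below 0 and others above 1.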

Installation

TMLE.jl can be installed via the Package Manager and supports Julia v1.10 and greater.

Pkg> add TMLE

Quick Start

To run an estimation procedure, we need 3 ingredients:

1. A dataset: here, a simulated dataset

For illustration, assume we know that the actual data-generating process is as follows:

\[\begin{aligned} W &\sim \mathrm{Uniform}(0, 1) \\ T &\sim \mathrm{Bernoulli}(\mathrm{logistic}(1 - 2 \cdot W)) \\ Y &\sim \mathrm{Normal}(1 + 3 \cdot T - T \cdot W, 0.01) \end{aligned}\]

Because we know the data generating process, we can simulate some data accordingly:

using TMLE
using Distributions
using StableRNGs
using Random
using CategoricalArrays
using MLJLinearModels
using LogExpFunctions
using DataFrames

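rng = StableRNG(123)                                  # reproducible random number generator
n = 100                                               # sample size
W = rand(rng, Uniform(), n)                           # W ~ Uniform(0, 1)
T = rand(rng, Uniform(), n) .< logistic.(1 .- 2W)     # T ~ Bernoulli(logistic(1 - 2W))
Y = 1 .+ 3T .- T.*W .+ rand(rng, Normal(0, 0.01), n)  # Y ~ Normal(1 + 3T - TW, 0.01)
dataset = DataFrame(Y=Y, T=categorical(T), W=W)       # store the treatment as a categorical column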

2. A quantity of interest: here the Average Treatment Effect (ATE)

The Average Treatment Effect of $T$ on $Y$ confounded by $W$ is defined as:

Ψ = ATE(
    outcome=:Y,
    treatment_values=(T=(case=true, control=false),),
    treatment_confounders=(T=[:W],)
)

StatisticalATE
	- Outcome: Y
	- Treatment: T => (control = false, case = true)

3. An estimator: here a Targeted Maximum Likelihood Estimator (TMLE)

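tmle = Tmle()                               # TMLE estimator with its default settings
result, _ = tmle(Ψ, dataset, verbosity=0);  # estimate Ψ; the second returned value is not needed here
result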

Targeted Minimum Loss Based Estimator
-------------------------------------
- point estimate         : 2.4931
- 95% confidence interval: [2.4341, 2.5522]
- p-value                : 9.43e-94
- mean influence curve   : 2.39e-16

Full test results can be obtained with `significance_test`

We are comforted to see that the 95% confidence interval covers the ground truth! 🥳
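For reference, the ground truth follows directly from the data-generating process. Because the noise term has mean zero, the potential outcomes satisfy $\mathbb{E}[Y(1) - Y(0) \mid W] = (1 + 3 - W) - 1 = 3 - W$. Averaging over $W \sim \mathrm{Uniform}(0, 1)$ gives:

\[\begin{aligned} \Psi &= \mathbb{E}[Y(1) - Y(0)] \\ &= \mathbb{E}[3 - W] \\ &= 3 - \tfrac{1}{2} = 2.5 \end{aligned}\]

This value lies inside the reported 95% confidence interval [2.4341, 2.5522].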

Why TMLE?

Most scientific questions are causal and can be answered by a finite-dimensional causal estimand, perhaps the most famous of which is the Average Treatment Effect. If certain conditions are met (no unobserved confounders, overlap), statistical methods can be used to estimate these causal estimands from data. Semi-parametric methods have gained interest in the past decade because they make fewer assumptions about the data-generating process than their restricted parametric counterparts and are therefore less likely to be misspecified. This is particularly true in modern-day data science, where datasets are increasingly complex and unlikely to be well represented by parametric models.
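As a minimal illustration of why adjusting for confounders matters, the following sketch re-simulates the Quick Start data-generating process with a large sample, using only Julia's standard library. The seed, the sample size, and the use of `Random.Xoshiro` in place of StableRNGs are arbitrary choices. The sketch compares the unadjusted difference in group means with an oracle adjusted contrast. The oracle averages the known conditional effect $\mathbb{E}[Y \mid T=1, W] - \mathbb{E}[Y \mid T=0, W] = 3 - W$ over the observed $W$. In practice that conditional mean is unknown and must be estimated from data, which is where machine learning and targeting come in.

```julia
# Sketch using only Julia's standard library; seed and sample size are arbitrary
using Random, Statistics

logistic(x) = 1 / (1 + exp(-x))

rng = Xoshiro(2024)
n = 100_000                                   # large n keeps sampling noise small
W = rand(rng, n)                              # W ~ Uniform(0, 1)
T = rand(rng, n) .< logistic.(1 .- 2 .* W)    # P(T = 1 | W) decreases with W
Y = 1 .+ 3 .* T .- T .* W .+ 0.01 .* randn(rng, n)

# Unadjusted contrast: treated and untreated groups differ in their W distribution
naive = mean(Y[T]) - mean(Y[.!T])

# Known outcome regression of this simulation: E[Y | T = t, W = w]
Qbar(t, w) = 1 + 3t - t * w

# Oracle adjusted contrast: average Qbar(1, W) - Qbar(0, W) over the observed W
oracle = mean(Qbar.(1, W) .- Qbar.(0, W))

println("naive = ", round(naive; digits=3), ", oracle = ", round(oracle; digits=3))
```

Treated units tend to have smaller values of $W$, and smaller $W$ means a larger treated outcome, so the unadjusted contrast should land noticeably above the true value of 2.5. The oracle adjusted contrast should be close to 2.5.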

TMLE.jl implements such semi-parametric methods. More precisely, it is an implementation of the Targeted Minimum Loss-Based Estimation (TMLE) framework. If you are interested in leveraging the power of modern machine learning methods while preserving interpretability and statistical inference guarantees, you are in the right place. TMLE.jl is compatible with any MLJ-compliant model and any dataset wrapped in a DataFrame object.

The following plot illustrates the bias reduction achieved by TMLE over a misspecified parametric linear model in the presence of confounding. Note that in this case TMLE also relies on misspecified models but still achieves a lower bias thanks to the targeting step.

[Figure: Home Illustration]

Scope and Distinguishing Features

The goal of this package is to provide an entry point for asymptotically unbiased and efficient semi-parametric estimation in Julia. The two main general estimators known to achieve these properties are the One-Step estimator and the Targeted Maximum-Likelihood estimator. Most of the current effort has been centered on estimands that are composites of counterfactual means.
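For binary treatments, these compositions take a simple form. Write $\mathrm{CM}(t) = \mathbb{E}[Y(t)]$ for the counterfactual mean under treatment level $t$, and $\mathrm{CM}(t_1, t_2) = \mathbb{E}[Y(t_1, t_2)]$ for two treatments $T_1, T_2$. The ATE and the additive interaction effect of $T_1$ and $T_2$ are then linear combinations of counterfactual means. These are the standard textbook definitions, shown here for illustration:

\[\begin{aligned} \mathrm{ATE} &= \mathrm{CM}(1) - \mathrm{CM}(0) \\ \Psi_{\text{int}} &= \mathrm{CM}(1, 1) - \mathrm{CM}(1, 0) - \mathrm{CM}(0, 1) + \mathrm{CM}(0, 0) \end{aligned}\]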

Distinguishing Features:

- Estimands: Counterfactual Mean, Average Treatment Effect, Interactions, and any composition thereof
- Estimators: TMLE, CV-TMLE, C-TMLE, One-Step, CV-One-Step
- Machine learning: any MLJ-compatible model
- Dataset: any dataset wrapped in a DataFrame
- Factorial treatment variables:
    - Multiple treatments
    - Categorical treatment values
Citing TMLE.jl

If you use TMLE.jl for your own work and would like to cite us, here are the BibTeX and APA formats:

BibTeX

@software{Labayle_TMLE_jl,
    author = {Labayle, Olivier and Khamseh, Ava and Ponting, Chris and Beentjes, Sjoerd},
    title = {{TMLE.jl}},
    url = {https://github.com/olivierlabayle/TMLE.jl}
}

APA

Labayle, O., Beentjes, S., Khamseh, A., & Ponting, C. TMLE.jl [Computer software]. https://github.com/olivierlabayle/TMLE.jl