Home

Overview

Most scientific questions are causal and can be answered by a finite-dimensional causal estimand, the most famous of which is perhaps the Average Treatment Effect. If certain conditions are met (no unobserved confounders, overlap), statistical methods can be used to estimate these causal estimands from data. Semi-parametric methods have gained interest over the past decade because they are more likely to capture the true data generating process than their more restrictive parametric counterparts. This is particularly true in modern data science, where datasets are increasingly complex and unlikely to be well represented by parametric models.

TMLE.jl implements such semi-parametric methods. More precisely, it is an implementation of the Targeted Minimum Loss-Based Estimation (TMLE) framework. If you want to leverage modern machine-learning methods while preserving interpretability and statistical inference guarantees, you are in the right place. TMLE.jl is compatible with any MLJ-compliant algorithm and any dataset wrapped in a DataFrame object.

The following plot illustrates the bias reduction achieved by TMLE over a mis-specified parametric linear model in the presence of confounding. Note that in this case, TMLE also uses mis-specified models but still achieves a lower bias due to the targeting step.

[Figure: Home Illustration]

Installation

TMLE.jl can be installed via the Package Manager and supports Julia v1.10 and greater.

Pkg> add TMLE

Quick Start

To run an estimation procedure, we need 3 ingredients:

1. A dataset: here a simulation dataset

For illustration, assume we know the actual data generating process is as follows:

\[\begin{aligned} W &\sim \mathrm{Uniform}(0, 1) \\ T &\sim \mathrm{Bernoulli}(\mathrm{logistic}(1 - 2 \cdot W)) \\ Y &\sim \mathrm{Normal}(1 + 3 \cdot T - T \cdot W, 0.01) \end{aligned}\]

Because we know the data generating process, we can simulate some data accordingly:

using TMLE
using Distributions
using StableRNGs
using Random
using CategoricalArrays
using MLJLinearModels
using LogExpFunctions
using DataFrames

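rng = StableRNG(123)  # fixed seed so the simulation is reproducible
n = 100
# Confounder
W = rand(rng, Uniform(), n)
# Binary treatment: a Bernoulli draw with probability logistic(1 - 2W)
T = rand(rng, Uniform(), n) .< logistic.(1 .- 2W)
# Outcome: mean 1 + 3T - T*W plus Gaussian noise with standard deviation 0.01
Y = 1 .+ 3T .- T.*W .+ rand(rng, Normal(0, 0.01), n)
# Store the treatment as a categorical variable
dataset = DataFrame(Y=Y, T=categorical(T), W=W)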

2. A quantity of interest: here the Average Treatment Effect (ATE)

The Average Treatment Effect of $T$ on $Y$ confounded by $W$ is defined as:

Ψ = ATE(
    outcome=:Y,
    treatment_values=(T=(case=true, control = false),),
    treatment_confounders=(T=[:W],)
)
StatisticalATE
- Outcome: Y
- Treatment: T => (control = false, case = true)
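
For reference, under the identification conditions mentioned in the overview (no unobserved confounders, overlap), the ATE specified above is commonly written as the statistical estimand below. This is the standard textbook form, given here as a reading aid rather than as notation taken from the package itself; the case value $T = 1$ corresponds to true and the control value $T = 0$ to false.

\[\Psi = \mathbb{E}_W\left[\mathbb{E}[Y \mid T = 1, W] - \mathbb{E}[Y \mid T = 0, W]\right]\]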

3. An estimator: here a Targeted Maximum Likelihood Estimator (TMLE)

tmle = Tmle()
result, _ = tmle(Ψ, dataset, verbosity=0);
result
One sample t-test
-----------------
Population details:
    parameter of interest:   Mean
    value under h_0:         0
    point estimate:          2.49315
    95% confidence interval: (2.434, 2.552)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-93

Details:
    number of observations:   100
    t-statistic:              83.8265438241116
    degrees of freedom:       99
    empirical standard error: 0.029741766265683128

Reassuringly, the 95% confidence interval of our estimate covers the ground truth (an ATE of 2.5)! 🥳
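
The ground truth can be worked out by hand from the data generating process given earlier, using the standard adjustment formula for the ATE. The short derivation below is a sanity check, not output from the package; it only uses the conditional mean of $Y$, which is $1 + 3 \cdot T - T \cdot W$, and the fact that a Uniform(0, 1) variable has mean $1/2$.

\[\begin{aligned} \mathbb{E}[Y \mid T = 1, W] - \mathbb{E}[Y \mid T = 0, W] &= (1 + 3 - W) - 1 = 3 - W \\ \mathrm{ATE} &= \mathbb{E}[3 - W] = 3 - \tfrac{1}{2} = 2.5 \end{aligned}\]

The point estimate of 2.49315 and its 95% confidence interval (2.434, 2.552) are consistent with this value.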

Scope and Distinguishing Features

The goal of this package is to provide an entry point for semi-parametric, asymptotically unbiased and efficient estimation in Julia. The two main general estimators known to achieve these properties are the One-Step estimator and the Targeted Maximum-Likelihood estimator. Most of the current effort has centered on estimands that are compositions of the counterfactual mean.

Distinguishing Features:

  • Estimands: Counterfactual Mean, Average Treatment Effect, Interactions, and any composition thereof
  • Estimators: TMLE, CV-TMLE, C-TMLE, One-Step, CV-One-Step
  • Machine-Learning: Any MLJ-compatible model
  • Dataset: Any dataset wrapped in a DataFrame
  • Factorial Treatment Variables:
    • Multiple treatments
    • Categorical treatment values

Citing TMLE.jl

If you use TMLE.jl for your own work and would like to cite us, here are the BibTeX and APA formats:

  • BibTeX
@software{Labayle_TMLE_jl,
    author = {Labayle, Olivier and Khamseh, Ava and Ponting, Chris and Beentjes, Sjoerd},
    title = {{TMLE.jl}},
    url = {https://github.com/olivierlabayle/TMLE.jl}
}
  • APA

Labayle, O., Beentjes, S., Khamseh, A., & Ponting, C. TMLE.jl [Computer software]. https://github.com/olivierlabayle/TMLE.jl