API Reference

TMLE.Configuration - Method
Configuration(;estimands, scm=nothing, adjustment=nothing)

A Configuration is a set of estimands to be estimated. If the set of estimands contains causal (identifiable) estimands, these will be identified using the provided scm and adjustment method.
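
For example, two causal estimands built with factorialEstimand (documented below) can be bundled into a configuration as follows; this is a minimal sketch where the treatment, outcome and estimand names are illustrative:

using TMLE
Ψ₁ = factorialEstimand(ATE, (T₁=(0, 1),), :Y₁)    # causal ATE components for T₁ on Y₁
Ψ₂ = factorialEstimand(ATE, (T₂=(0, 1),), :Y₂)    # causal ATE components for T₂ on Y₂
configuration = Configuration(estimands=[Ψ₁, Ψ₂])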

source
TMLE.OSE - Method
OSE(;models=default_models(), resampling=nothing, ps_lowerbound=1e-8, machine_cache=false)

Defines a One Step Estimator using the specified models for estimation of the nuisance parameters. The estimator is a function that can be applied to estimate estimands for a dataset.

Arguments

  • models: A Dict(variable => model, ...) where the variables are the outcome variables modeled by the models.
  • resampling: Outer resampling strategy. Setting it to nothing (default) falls back to vanilla estimation, while any valid MLJ.ResamplingStrategy will result in CV-OSE.

  • ps_lowerbound: Lower bound for the propensity score to avoid division by 0. The special value nothing results in a data-adaptive definition as described here.

  • machine_cache: Whether MLJ.machine created during estimation should cache data.

Example

using MLJLinearModels
models = Dict(:Y => LinearRegressor(), :T => LogisticClassifier())
ose = OSE(models=models)
Ψ̂ₙ, cache = ose(Ψ, dataset)
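
For CV-OSE, pass any valid MLJ.ResamplingStrategy as the resampling argument, for instance (a sketch assuming the MLJ package is loaded for CV):

using MLJ
cv_ose = OSE(models=models, resampling=CV(nfolds=3))   # 3-fold cross-validated OSE
Ψ̂ₙ, cache = cv_ose(Ψ, dataset)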
source
TMLE.SCM - Type

An SCM is simply a wrapper around a MetaGraph over a Directed Acyclic Graph.
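
For instance, a plate SCM can be constructed via the StaticSCM convenience constructor documented below (a minimal sketch with illustrative variable names):

using TMLE
scm = StaticSCM([:Y], [:T₁, :T₂], [:W₁, :W₂]; outcome_extra_covariates=[:C])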

source
TMLE.TMLEE - Method
TMLEE(;models=default_models(), resampling=nothing, ps_lowerbound=1e-8, weighted=false, tol=nothing, max_iter=1, machine_cache=false)

Defines a TMLE estimator using the specified models for estimation of the nuisance parameters. The estimator is a function that can be applied to estimate estimands for a dataset.

Arguments

  • models (default: default_models()): A Dict(variable => model, ...) where the variables are the outcome variables modeled by the models.
  • resampling (default: nothing): Outer resampling strategy. Setting it to nothing falls back to vanilla TMLE, while any valid MLJ.ResamplingStrategy will result in CV-TMLE.

  • ps_lowerbound (default: 1e-8): Lower bound for the propensity score to avoid division by 0. The special value nothing results in a data-adaptive definition as described here.

  • weighted (default: false): Whether the fluctuation model is a classic GLM or a weighted version. The weighted fluctuation has been shown to be more robust to positivity violations in practice.

  • tol (default: nothing): Convergence threshold for the TMLE algorithm iterations. If nothing (default), 1/(sample size) will be used. See also max_iter.
  • max_iter (default: 1): Maximum number of iterations for the TMLE algorithm.
  • machine_cache (default: false): Whether MLJ.machine created during estimation should cache data.

Example

using MLJLinearModels
tmle = TMLEE()
Ψ̂ₙ, cache = tmle(Ψ, dataset)
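
A cross-validated TMLE with a weighted fluctuation can be obtained similarly (a sketch assuming the MLJ package is loaded for CV; Ψ and dataset are assumed to be defined):

using MLJ
cv_tmle = TMLEE(resampling=CV(nfolds=3), weighted=true)   # CV-TMLE with weighted fluctuation
Ψ̂ₙ, cache = cv_tmle(Ψ, dataset)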
source
TMLE.TreatmentTransformer - Method
TreatmentTransformer(;encoder=encoder())

Treatments in TMLE are represented by CategoricalArrays. If a treatment column has type OrderedFactor, then its integer representation is used; make sure the levels correspond to your expectations. All other columns are one-hot encoded.
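
As an illustration, the transformer can be composed with a downstream learner via MLJ's pipeline syntax (a sketch assuming MLJ and MLJLinearModels are loaded):

using MLJ, MLJLinearModels
pipe = TMLE.TreatmentTransformer() |> LogisticClassifier()   # encode treatments, then classify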

source
TMLE.StaticSCM - Method

A plate Structural Causal Model where:

  • For all outcomes: oᵢ = fᵢ(treatments, confounders, outcome_extra_covariates)
  • For all treatments: tⱼ = fⱼ(confounders)

Example

StaticSCM([:Y], [:T₁, :T₂], [:W₁, :W₂, :W₃]; outcome_extra_covariates=[:C])

source
TMLE.brute_force_ordering - Method
brute_force_ordering(estimands; η_counts = nuisance_function_counts(estimands))

Finds an optimal ordering of the estimands to minimize the maximum cache size. The approach is brute force: all permutations are generated and evaluated, and if a minimum is found early it is immediately returned. The theoretical complexity is O(N!), but thanks to the early-stopping approach and the shuffling, the actual cost is expected to be much smaller in practice.
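
For instance, the nuisance function counts can be computed once and reused (a sketch where estimands is assumed to be a vector of estimands to be estimated together):

η_counts = nuisance_function_counts(estimands)                           # nuisance function counts across estimands
ordered_estimands = brute_force_ordering(estimands; η_counts=η_counts)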

source
TMLE.compose - Method
compose(f, estimation_results::Vararg{EICEstimate, N}) where N

Provides an estimator of f(estimation_results...).

Mathematical details

The following is a summary from Asymptotic Statistics, A. W. van der Vaart.

Consider k TMLEs computed from a dataset of size n and embodied by Tₙ = (T₁,ₙ, ..., Tₖ,ₙ). Since each of them is asymptotically normal, the multivariate CLT provides the joint distribution:

√n(Tₙ - Ψ₀) ↝ N(0, Σ),

where Σ is the covariance matrix of the TMLEs' influence curves.

Let f:ℝᵏ→ℝᵐ be a map differentiable at Ψ₀. Then the delta method provides the limiting distribution of √n(f(Tₙ) - f(Ψ₀)). Because Tₙ is asymptotically normal, the result is:

√n(f(Tₙ) - f(Ψ₀)) ↝ N(0, ∇f(Ψ₀) ⋅ Σ ⋅ (∇f(Ψ₀))ᵀ),

where ∇f(Ψ₀):ℝᵏ→ℝᵐ is a linear map such that, abusing notation and identifying the function with its multiplication matrix, ∇f(Ψ₀):h ↦ ∇f(Ψ₀) ⋅ h. The matrix ∇f(Ψ₀) is the Jacobian of f at Ψ₀.

Hence, the only thing we need to do is:

  • Compute the covariance matrix Σ.
  • Compute the Jacobian ∇f, which can be done using Julia's automatic differentiation facilities.
  • The final estimator is normal with mean f₀ = f(Ψ₀) and variance σ₀ = ∇f(Ψ₀) ⋅ Σ ⋅ (∇f(Ψ₀))ᵀ.

Arguments

  • f: An array-input differentiable map.
  • estimation_results: 1 or more EICEstimate structs.

Examples

Assuming res₁ and res₂ are TMLEs:

f(x, y) = [x^2 - y, y - 3x]
compose(f, res₁, res₂)
source
TMLE.default_models - Method
default_models(;Q_binary=LinearBinaryClassifier(), Q_continuous=LinearRegressor(), G=LinearBinaryClassifier())

Create a Dictionary containing default models to be used by downstream estimators. Each provided model is prepended (in a MLJ.Pipeline) with an MLJ.ContinuousEncoder.

By default:

  • Q_binary is a LinearBinaryClassifier
  • Q_continuous is a LinearRegressor
  • G is a LinearBinaryClassifier

Example

The following changes the default Q_binary to a LogisticClassifier and provides a RidgeRegressor for special_y.

using MLJLinearModels
models = default_models(
    Q_binary  = LogisticClassifier(),
    special_y = RidgeRegressor()
)
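
The resulting dictionary can then be passed to any downstream estimator, for instance (a sketch):

tmle = TMLEE(models=models)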
source
TMLE.epsilons - Method
epsilons(cache)

Retrieves the fluctuations' epsilons corresponding to each targeting step from the cache.
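
The cache is the second value returned by an estimator call, for instance (a sketch where Ψ and dataset are assumed to be defined):

tmle = TMLEE()
Ψ̂ₙ, cache = tmle(Ψ, dataset)
εs = epsilons(cache)   # one epsilon (fluctuation coefficient) per targeting step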

source
TMLE.estimates - Method
estimates(cache)

Retrieves the estimates corresponding to each targeting step from the cache.

source
TMLE.factorialEstimand - Method
factorialEstimand(
    constructor::Union{typeof(CM), typeof(ATE), typeof(AIE)},
    treatments, outcome; 
    confounders=nothing,
    dataset=nothing, 
    outcome_extra_covariates=(),
    positivity_constraint=nothing,
    freq_table=nothing,
    verbosity=1
)

Generates a factorial JointEstimand with components of type constructor (CM, ATE, AIE).

For the ATE and the AIE, the generated components are restricted to the Cartesian Product of single treatment levels transitions. For example, consider two treatment variables T₁ and T₂ each taking three possible values (0, 1, 2). For each treatment variable, the single treatment levels transitions are defined by (0 → 1, 1 → 2). Then, the Cartesian Product of these transitions is taken, resulting in a 2 x 2 = 4 dimensional joint estimand:

  • (T₁: 0 → 1, T₂: 0 → 1)
  • (T₁: 0 → 1, T₂: 1 → 2)
  • (T₁: 1 → 2, T₂: 0 → 1)
  • (T₁: 1 → 2, T₂: 1 → 2)

Return

A JointEstimand with causal or statistical components.

Args

  • constructor: CM, ATE or AIE.
  • treatments: An AbstractDictionary/NamedTuple of treatment levels (e.g. (T=(0, 1, 2),)), or an iterator of treatment variables, in which case a dataset must be provided to infer the levels from it.
  • outcome: The outcome variable.
  • confounders=nothing: The generated components will inherit these confounding variables. If nothing, causal estimands are generated.
  • outcome_extra_covariates=(): The generated components will inherit these outcome_extra_covariates.
  • dataset: An optional dataset to enforce a positivity constraint and infer treatment levels.
  • positivity_constraint=nothing: Only components that pass the positivity constraint are added to the JointEstimand. A dataset must then be provided.
  • freq_table: This is only meant to be used by factorialEstimands to avoid unnecessary computations.
  • verbosity=1: Verbosity level.

Examples:

  • An Average Treatment Effect with causal components:
factorialEstimand(ATE, (T₁ = (0, 1), T₂=(0, 1, 2)), :Y₁)
  • An Average Interaction Effect with statistical components:
factorialEstimand(AIE, (T₁ = (0, 1, 2), T₂=(0, 1, 2)), :Y₁, confounders=[:W₁, :W₂])
  • With a dataset, the treatment levels can be inferred and a positivity constraint enforced:


factorialEstimand(ATE, [:T₁, :T₂], :Y₁, 
    confounders=[:W₁, :W₂], 
    dataset=dataset, 
    positivity_constraint=0.1
)
source
TMLE.factorialEstimands - Method

factorialEstimands(constructor::Union{typeof(ATE), typeof(AIE)}, dataset, treatments, outcomes; confounders=nothing, outcome_extra_covariates=(), positivity_constraint=nothing, verbosity=1)

Generates a JointEstimand for each outcome in outcomes. See factorialEstimand.
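
For example (a sketch where the dataset, treatment, confounder and outcome names are illustrative):

estimands = factorialEstimands(ATE, dataset, [:T₁, :T₂], [:Y₁, :Y₂];
    confounders=[:W₁, :W₂],
    positivity_constraint=0.1
)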

source
TMLE.gradients - Method
gradients(cache)

Retrieves the gradients corresponding to each targeting step from the cache.

source
TMLE.groups_ordering - Method
groups_ordering(estimands)

This will order estimands based on: propensity score first, outcome mean second. This heuristic should work reasonably well in practice. It could be optimized further by:

  • Organising the propensity score groups that share similar components to be close together.
  • Brute forcing the ordering of these groups to find an optimal one.
source
TMLE.significance_test - Function
significance_test(estimate::JointEstimate, Ψ₀=zeros(size(estimate.estimate, 1)))

Performs a TTest if the estimate is one-dimensional and a HotellingT2Test otherwise.
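
For example, to test a previously obtained estimate Ψ̂ₙ against the null (a sketch assuming HypothesisTests is available for pvalue):

using HypothesisTests
test_result = significance_test(Ψ̂ₙ)   # TTest or HotellingT2Test depending on dimension
pvalue(test_result)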

source