Integrations

CausalTables.jl

The CausalTables.jl package provides a simple Tables-compliant interface allowing users to wrap data and structural causal information together into one object. It also allows users to easily simulate data from a known structural causal model for experimentation purposes.

TMLE.jl estimators can take CausalTable objects as input, in which case the user does not need to identify a statistical estimand from a causal one: it is identified automatically from the causal structure stored in the CausalTable.

Using CausalTables.jl, one can define a structural causal model where the distribution of each variable is known, and sample from it using the rand function. This yields a CausalTable object that stores the underlying causal structure (the same information as that contained in an SCM object in TMLE.jl).
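To make the sampling step concrete, here is a minimal sketch in plain Julia of what drawing from such a model amounts to. It uses the same three variables as the example further down, but it does not use CausalTables.jl at all; the seed, the sample size and the variable names `true_ate` and `naive` are choices made for this illustration only.

```julia
using Distributions
using Random
using Statistics

Random.seed!(1234)  # arbitrary seed, only for reproducibility
n = 100_000         # larger than the 100 rows used in the text, to reduce noise

# Each variable is drawn conditionally on its parents, in causal order
W = rand(Beta(2, 2), n)            # confounder
A = rand.(Binomial.(1, W))         # binary treatment, P(A = 1 | W) = W
Y = rand.(Normal.(A .+ W, 0.5))    # outcome, E[Y | A, W] = A + W

# True ATE: set A to 1 and to 0 in the outcome equation, then average over W
true_ate = mean((1 .+ W) .- (0 .+ W))

# Naive contrast that ignores W; confounding biases it upward
naive = mean(Y[A .== 1]) - mean(Y[A .== 0])

println("true ATE = ", true_ate)
println("naive difference in means = ", naive)
```

Because the outcome mean is A + W, the true average treatment effect in this model is 1. The unadjusted difference in means converges to about 1.2 instead, since E[W | A = 1] = 0.6 and E[W | A = 0] = 0.4 under these distributions. This gap is what adjusting for W is meant to remove.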

Estimating a causal quantity in this scenario is now simpler: one does not need to use the identify function or define the variables needed in the statistical estimand; just call the estimator with the CausalTable object!
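For contrast, the manual route mentioned above can be sketched as follows. The causal estimand names no adjustment variables, whereas a statistical estimand written by hand must list them. The `treatment_confounders` keyword is assumed here from TMLE.jl's estimand constructors, and the names `Ψ_causal` and `Ψ_statistical` are illustrative.

```julia
using TMLE

# Causal estimand: no confounders are named
Ψ_causal = ATE(
    outcome = :Y,
    treatment_values = (A = (case = 1, control = 0),)
)

# Statistical estimand written by hand: the adjustment set for A is spelled out.
# `treatment_confounders` is assumed to be the relevant keyword.
Ψ_statistical = ATE(
    outcome = :Y,
    treatment_values = (A = (case = 1, control = 0),),
    treatment_confounders = (A = [:W],)
)
```

With a CausalTable, neither the confounder list nor a separate identification step is needed, as the following example shows.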

using TMLE
using CausalTables
using Distributions
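# Sample a random dataset endowed with causal structure
# using the CausalTables.jl package
scm = StructuralCausalModel(@dgp(
        W ~ Beta(2,2),             # confounder
        A ~ Binomial.(1, W),       # binary treatment, P(A = 1 | W) = W
        Y ~ Normal.(A .+ W, 0.5)   # outcome, mean A + W and standard deviation 0.5
    ); treatment = :A, response = :Y)

# Draw 100 observations; the result is a CausalTable
ct = rand(scm, 100)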

# Define a causal estimand and estimate it using TMLE
Ψ = ATE(outcome = :Y, treatment_values = (A = (case = 1, control = 0),))
estimator = TMLEE()
Ψ̂, cache = estimator(Ψ, ct; verbosity=0)
(TMLE.TMLEstimate{Float64}(TMLE.StatisticalATE(:Y, OrderedCollections.OrderedDict{Symbol, @NamedTuple{control::Int64, case::Int64}}(:A => (control = 0, case = 1)), OrderedCollections.OrderedDict(:A => (:W,)), ()), 1.0733924805617898, 0.8875901218222116, 100, [0.966422503429709, -0.2553479464604733, 0.5343016547980471, 0.026490078331557355, 0.13981194990134674, 1.2605490332528977, -0.6451132682401461, -0.18562197012122947, -0.39965828819463817, -1.2898015039718977  …  0.21480028140548818, 0.9288035688022321, -0.6763389847484979, 0.8381612567058337, 0.18081399576365925, -0.2001035935726814, -0.8464439406119452, 1.0500883774034087, 1.0725925603415007, -1.9886733917911623]), Dict{Any, Any}(:targeted_factors => TMLE.MLCMRelevantFactors(…), …))
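The printed estimate can be turned into a rough confidence interval by hand. The sketch below uses only the numbers shown in the output above and assumes that the second number (0.8875…) is the standard deviation of the influence curve, so that the standard error is that value divided by the square root of the sample size. TMLE.jl has its own inference helpers; this sketch deliberately avoids them. The names `ψ̂`, `σ̂`, `se`, `lower` and `upper` are illustrative.

```julia
using Distributions

# Numbers copied from the TMLEstimate printed above
ψ̂ = 1.0733924805617898   # point estimate
σ̂ = 0.8875901218222116   # assumed: standard deviation of the influence curve
n = 100                   # sample size

se = σ̂ / sqrt(n)                 # standard error of the estimator
z = quantile(Normal(), 0.975)    # standard normal quantile, about 1.96
lower, upper = ψ̂ - z * se, ψ̂ + z * se

println("95% CI ≈ (", round(lower, digits = 3), ", ", round(upper, digits = 3), ")")
```

Under that assumption the interval is roughly 0.90 to 1.25, which covers the value 1 implied by the outcome equation Y ~ Normal(A + W, 0.5).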

Serialization to JSON / YAML

Estimands and estimates can be serialized to disk in JSON or YAML format using TMLE.write_json or TMLE.write_yaml. Let's serialize the estimand and estimate from the previous example.

using JSON
using YAML

TMLE.write_json("estimand.json", Ψ)
TMLE.write_yaml("estimate.yaml", Ψ̂)

Now let's deserialize the estimand and run TMLE again, passing in the cache returned by the first run.

deserialized_Ψ = TMLE.read_json("estimand.json")
Ψ̂_from_serialized_Ψ, cache = estimator(deserialized_Ψ, ct; cache=cache, verbosity=0)
(TMLE.TMLEstimate{Float64}(TMLE.StatisticalATE(:Y, OrderedCollections.OrderedDict{Symbol, @NamedTuple{control::Int64, case::Int64}}(:A => (control = 0, case = 1)), OrderedCollections.OrderedDict(:A => (:W,)), ()), 1.0733924805617898, 0.8875901218222116, 100, [0.966422503429709, -0.2553479464604733, 0.5343016547980471, 0.026490078331557355, 0.13981194990134674, 1.2605490332528977, -0.6451132682401461, -0.18562197012122947, -0.39965828819463817, -1.2898015039718977  …  0.21480028140548818, 0.9288035688022321, -0.6763389847484979, 0.8381612567058337, 0.18081399576365925, -0.2001035935726814, -0.8464439406119452, 1.0500883774034087, 1.0725925603415007, -1.9886733917911623]), Dict{Any, Any}(:targeted_factors => TMLE.MLCMRelevantFactors(…), …))

The new estimate should match the previously serialized one:

deserialized_Ψ̂ = TMLE.read_yaml("estimate.yaml")
TMLE.estimate(deserialized_Ψ̂) ≈ TMLE.estimate(Ψ̂_from_serialized_Ψ)
true
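The comparison uses `≈`, which is Julia's `isapprox`, rather than `==`. A small sketch in plain Julia shows the difference. It reuses the point estimate printed earlier as an example value. Whether the YAML round trip itself is bit-exact is not stated here, so `≈` is the safer choice when one side has passed through a text format.

```julia
x = 1.0733924805617898          # the point estimate printed earlier
y = parse(Float64, string(x))   # round trip through its decimal text form

println(x == y)                   # exact equality after a text round trip
println(isapprox(x, x + 1e-12))   # tiny discrepancy: within the default tolerance
println(isapprox(x, x + 1e-3))    # visible discrepancy: outside the default tolerance
```

For `Float64` values the default relative tolerance of `isapprox` is `sqrt(eps(Float64))`, about 1.5e-8, so differences in the last few digits are ignored while a genuinely different estimate is still detected.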