Integrations

CausalTables.jl

The CausalTables.jl package provides a simple Tables.jl-compliant interface that wraps data and structural causal information together in a single object. It also lets users simulate data from a known structural causal model for experimentation purposes.

TMLE.jl estimators accept CausalTable objects as input. In that case the user does not need to identify a statistical estimand from a causal one; identification is performed automatically from the causal structure stored in the CausalTable.

Using CausalTables.jl, one can define a structural causal model where the distribution of each variable is known, and sample from it using the rand function. This yields a CausalTable object that stores the underlying causal structure (the same information as that contained in an SCM object in TMLE.jl).

Estimating a causal quantity in this scenario is now simpler: there is no need to call the identify function or to list the variables required by the statistical estimand. Simply call the estimator with the CausalTable object.

using TMLE
using CausalTables
using Distributions
# Sample a random dataset endowed with causal structure
# using the CausalTables.jl package
scm = StructuralCausalModel(@dgp(
        W ~ Beta(2,2),
        A ~ Binomial.(1, W),
        Y ~ Normal.(A .+ W, 0.5)
    ); treatment = :A, response = :Y)

ct = rand(scm, 100)

# Define a causal estimand and estimate it using TMLE
Ψ = ATE(outcome = :Y, treatment_values = (A = (case = 1, control = 0),))
estimator = Tmle()
Ψ̂, cache = estimator(Ψ, ct; verbosity=0)
(TMLE.TMLEstimate{Float64}(TMLE.StatisticalATE(:Y, OrderedCollections.OrderedDict{Symbol, @NamedTuple{control::Int64, case::Int64}}(:A => (control = 0, case = 1)), OrderedCollections.OrderedDict(:A => (:W,)), ()), 1.0736802470209694, 0.8906936757877406, 100, [0.9538506698206282, -0.2424797482929671, 0.5390113404931455, 0.030262343668806962, 0.13637607944142366, 1.262534595356361, -0.6445450639834266, -0.19606960274179958, -0.3976328715606551, -1.2730108514244205  …  0.21541863492529686, 0.9270425566099547, -0.6873827963101032, 0.839166366634344, 0.1746973256799527, -0.18841445646555793, -0.8635907926563152, 1.0469161133152005, 1.0652677834000555, -1.9960270345858337]), Dict{Any, Any}(…))
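
As a sanity check on the point estimate printed above (1.0736802470209694), the true ATE of this data-generating process can be worked out by hand. The outcome mean is A + W, so switching A from 0 to 1 shifts Y by exactly 1 for every unit, and the true ATE is 1. The sketch below re-creates the same structural equations using only Distributions.jl (it does not use CausalTables.jl or TMLE.jl) and also computes the unadjusted difference in means, which is biased because W drives both A and Y. The seed, the sample size and the variable names are illustrative assumptions, not part of the original example.

```julia
using Distributions, Random, Statistics

Random.seed!(42)   # arbitrary seed, used only to make the sketch reproducible
n = 200_000        # large sample so Monte Carlo error is small

# Same structural equations as the @dgp above, written out by hand
W = rand(Beta(2, 2), n)        # confounder
A = rand.(Binomial.(1, W))     # binary treatment with P(A = 1 | W) = W
ε = rand(Normal(0, 0.5), n)    # outcome noise
Y = A .+ W .+ ε                # equivalent to Y ~ Normal(A + W, 0.5)

# Counterfactual outcomes: set A to 1 and to 0 in the outcome equation
Y1 = 1 .+ W .+ ε
Y0 = 0 .+ W .+ ε
true_ate = mean(Y1 .- Y0)      # equals 1 up to floating-point error

# Unadjusted contrast: confounded by W, close to 1.2 in expectation
# because E[W | A = 1] = 0.6 and E[W | A = 0] = 0.4 for W ~ Beta(2, 2)
naive = mean(Y[A .== 1]) - mean(Y[A .== 0])

println("true ATE ≈ ", true_ate)
println("naive difference in means ≈ ", naive)
```

The gap between the naive contrast and the true value of 1 is the confounding bias that adjusting for W, as the TMLE estimator does with the identified estimand, is meant to remove.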

Serialization to JSON / YAML

Estimands and estimates can be serialized to disk in JSON or YAML format using TMLE.write_json or TMLE.write_yaml. Let's serialize the estimand and estimate from the previous example.

using JSON
using YAML

TMLE.write_json("estimand.json", Ψ)
TMLE.write_yaml("estimate.yaml", Ψ̂)

Now let's deserialize the estimand and run TMLE again, passing the cache returned by the first run:

deserialized_Ψ = TMLE.read_json("estimand.json")
Ψ̂_from_serialized_Ψ, cache = estimator(deserialized_Ψ, ct; cache=cache, verbosity=0)
(TMLE.TMLEstimate{Float64}(TMLE.StatisticalATE(:Y, OrderedCollections.OrderedDict{Symbol, @NamedTuple{control::Int64, case::Int64}}(:A => (control = 0, case = 1)), OrderedCollections.OrderedDict(:A => (:W,)), ()), 1.0736802470209694, 0.8906936757877406, 100, [0.9538506698206282, -0.2424797482929671, 0.5390113404931455, 0.030262343668806962, 0.13637607944142366, 1.262534595356361, -0.6445450639834266, -0.19606960274179958, -0.3976328715606551, -1.2730108514244205  …  0.21541863492529686, 0.9270425566099547, -0.6873827963101032, 0.839166366634344, 0.1746973256799527, -0.18841445646555793, -0.8635907926563152, 1.0469161133152005, 1.0652677834000555, -1.9960270345858337]), Dict{Any, Any}(…))

The new estimate should match the previously serialized one:

deserialized_Ψ̂ = TMLE.read_yaml("estimate.yaml")
TMLE.estimate(deserialized_Ψ̂) ≈ TMLE.estimate(Ψ̂_from_serialized_Ψ)
true