Resampling Strategies

We also provide additional resampling strategies compliant with the MLJ.ResamplingStrategy interface.

AdaptiveResampling

The AdaptiveResampling strategies determine the number of cross-validation folds adaptively, based on the available data. This is inspired by this paper on practical considerations for super learning.

AdaptiveCV determines the number of folds adaptively and performs a classic cross-validation split:

TMLECLI.AdaptiveCV (Type)
AdaptiveCV(;shuffle=nothing, rng=nothing)

A CV (see MLJBase.CV) resampling strategy where the number of folds is determined data-adaptively, based on the rule of thumb described here.


AdaptiveStratifiedCV determines the number of folds adaptively and performs a stratified cross-validation split:

TMLECLI.AdaptiveStratifiedCV (Type)
AdaptiveStratifiedCV(;shuffle=nothing, rng=nothing)

A StratifiedCV (see MLJBase.StratifiedCV) resampling strategy where the number of folds is determined data-adaptively, based on the rule of thumb described here.

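To give a rough idea of what "data-adaptive" means for both adaptive strategies, the sketch below picks the number of folds from the sample size, using more folds when data are scarce and fewer when they are plentiful. The function name illustrative_nfolds and every threshold in it are assumptions made up for illustration; the rule actually applied is the one in the linked reference and may differ.

```julia
# Hypothetical helper, NOT part of TMLECLI: the thresholds below are
# illustrative placeholders, not the values used by the package.
function illustrative_nfolds(n::Integer)
    n < 30 && return n        # very small samples: one fold per observation
    n < 500 && return 20      # small samples: many folds
    n < 5_000 && return 10    # medium samples: a common default
    return 5                  # large samples: few folds are enough
end

for n in (20, 100, 1_000, 100_000)
    println("n = ", n, " -> ", illustrative_nfolds(n), " folds")
end
```

The design intent is that each training fold stays as large as possible when observations are limited, while the computational cost is kept down when they are not.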

JointStratifiedCV

Sometimes the treatment variables (or some other features) are imbalanced, and naively performing cross-validation or stratified cross-validation could violate the positivity hypothesis. To overcome this difficulty, JointStratifiedCV performs a stratified cross-validation based on both the feature variables and the outcome variable.

TMLECLI.JointStratifiedCV (Type)
JointStratifiedCV(;patterns=nothing, resampling=StratifiedCV())

Applies a stratified cross-validation strategy based on a variable constructed from X and y. A composite variable is built from:

  • x variables from X matching any of patterns and satisfying autotype(x) <: Union{Missing, Finite}. If no pattern is provided, then only the second condition is considered.

  • y if autotype(y) <: Union{Missing, Finite}.

The resampling must be a stratification-compliant resampling strategy; at the moment, this means one of StratifiedCV or AdaptiveStratifiedCV.

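The composite variable described above can be pictured with a small, dependency-free sketch. It is not the package's implementation: composite_labels is a hypothetical helper, the table is a plain NamedTuple of vectors, every column is assumed to be categorical already (the autotype check is skipped), and joining values with "_" is an arbitrary choice.

```julia
# Hypothetical helper, NOT part of TMLECLI: builds one label per row by
# joining the selected columns of X with the outcome y.
# Assumption: all columns of X and y are already categorical.
function composite_labels(X::NamedTuple, y::AbstractVector; patterns=nothing)
    # Keep columns whose name matches any pattern (all columns if none is given).
    selected = [name for name in keys(X)
                if patterns === nothing || any(p -> occursin(p, string(name)), patterns)]
    columns = Any[X[name] for name in selected]
    push!(columns, y)  # the outcome also takes part in the stratification
    return [join((string(col[i]) for col in columns), "_") for i in eachindex(y)]
end

X = (T1 = ["a", "a", "b", "b"], T2 = ["x", "y", "x", "y"], W = ["u", "u", "u", "v"])
y = [0, 1, 0, 1]

# Stratify on the treatment columns (names starting with "T") and the outcome.
labels = composite_labels(X, y; patterns=[r"^T"])
println(labels)
```

A stratified splitter given these labels instead of y alone would try to spread each joint treatment and outcome combination across all folds, which is the behaviour the docstring describes.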