Resampling Strategies
We also provide additional resampling strategies compliant with the MLJ.ResamplingStrategy interface.
AdaptiveResampling
The AdaptiveResampling strategies determine the number of cross-validation folds adaptively, based on the available data. This is inspired by this paper on practical considerations for super learning.
The AdaptiveCV strategy determines the number of folds adaptively and performs a classic cross-validation split:
TMLECLI.AdaptiveCV (Type)
AdaptiveCV(;shuffle=nothing, rng=nothing)
A CV (see MLJBase.CV) resampling strategy where the number of folds is determined data-adaptively, based on the rule of thumb described here.
The AdaptiveStratifiedCV strategy determines the number of folds adaptively and performs a stratified cross-validation split:
TMLECLI.AdaptiveStratifiedCV (Type)
AdaptiveStratifiedCV(;shuffle=nothing, rng=nothing)
A StratifiedCV (see MLJBase.StratifiedCV) resampling strategy where the number of folds is determined data-adaptively, based on the rule of thumb described here.
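As a rough illustration of what a data-adaptive fold rule can look like, here is a minimal sketch in plain Julia. The helper name adaptive_nfolds, the definition of the effective sample size and the thresholds are all assumptions made for illustration; they are not taken from the TMLECLI source, and the referenced rule of thumb remains authoritative.

```julia
# Hypothetical helper, NOT part of TMLECLI: sketches a data-adaptive fold rule.
# The effective sample size definition and every threshold are assumptions.
function adaptive_nfolds(y::AbstractVector)
    n = length(y)
    # Effective sample size: n for a continuous outcome, otherwise capped
    # at five times the size of the rarest class.
    neff = if eltype(y) <: AbstractFloat
        n
    else
        counts = Dict{eltype(y), Int}()
        for v in y
            counts[v] = get(counts, v, 0) + 1
        end
        min(n, 5 * minimum(values(counts)))
    end
    # Fewer effective observations lead to more folds.
    neff < 30     && return neff
    neff < 500    && return 20
    neff < 5_000  && return 10
    neff < 10_000 && return 5
    return 2
end
```

The design intent of such a rule is that small or imbalanced datasets get many folds, so each training set keeps most of the scarce observations, while large datasets get few folds to save computation.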
JointStratifiedCV
Sometimes the treatment variables (or some other features) are imbalanced, and naively performing cross-validation or stratified cross-validation could result in a violation of the positivity hypothesis. To overcome this difficulty, JointStratifiedCV performs a stratified cross-validation based on both feature variables and the outcome variable.
TMLECLI.JointStratifiedCV (Type)
JointStratifiedCV(;patterns=nothing, resampling=StratifiedCV())
Applies a stratified cross-validation strategy based on a variable constructed from X and y. A composite variable is built from:
- x variables from X matching any of patterns and satisfying autotype(x) <: Union{Missing, Finite}. If no pattern is provided, then only the second condition is considered.
- y if autotype(y) <: Union{Missing, Finite}
The resampling argument must be a stratification-compliant resampling strategy, currently one of StratifiedCV or AdaptiveStratifiedCV.
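The idea of a composite stratification variable can be sketched in plain Julia as follows. This is a simplified illustration, not the TMLECLI implementation: the helper name composite_stratum is hypothetical, X is assumed to be a NamedTuple of column vectors, and the autotype check is skipped, so every matched column and y are assumed to be finite (categorical).

```julia
# Hypothetical helper, NOT the TMLECLI implementation.
# Assumes every matched column of X, and y, are categorical (no autotype check).
function composite_stratum(X::NamedTuple, y::AbstractVector, patterns::Vector{Regex})
    # Keep the columns whose name matches at least one pattern.
    cols = Any[col for (name, col) in pairs(X)
               if any(p -> occursin(p, String(name)), patterns)]
    # The outcome also takes part in the stratification.
    push!(cols, y)
    # One label per row, joining the values of all retained variables.
    return [join((string(c[i]) for c in cols), "_") for i in eachindex(y)]
end

X = (T1 = [0, 0, 1, 1], T2 = ["a", "b", "a", "b"], W = [0.3, 1.2, 0.7, 2.1])
y = [1, 0, 1, 0]
strata = composite_stratum(X, y, [r"^T"])
# strata == ["0_a_1", "0_b_0", "1_a_1", "1_b_0"]
```

A stratified strategy applied to such labels keeps each observed combination of treatment values and outcome represented across folds, which is the motivation stated above for protecting positivity.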