Models
Because TMLE.jl is built on top of MLJ, it can support any model that respects the MLJ interface. At the moment, we readily support all models from the following packages:
- MLJLinearModels: Generalized Linear Models in Julia.
- XGBoost.jl: Julia wrapper of the famous XGBoost package.
- EvoTrees.jl: A pure Julia implementation of histogram-based gradient boosting trees (a subset of XGBoost).
- GLMNet: A Julia wrapper of the glmnet package. See the GLMNet section.
- MLJModels: General utilities such as the OneHotEncoder or InteractionTransformer.
Support for more packages can be added on request, please file an issue.
Also, because the estimator file used by the TMLE CLI is a pure Julia file, you can use it to install additional packages and define additional models from them.
Finally, we also provide some additional models described in Additional models provided by TMLECLI.jl.
Additional models provided by TMLECLI.jl
GLMNet
This is a simple wrapper around the glmnetcv function from the GLMNet.jl package. The only difference is that resampling is performed using MLJ resampling strategies.
TMLECLI.GLMNetRegressor — Method
GLMNetRegressor(;resampling=CV(), params...)
A GLMNet regressor for continuous outcomes based on the glmnetcv function from the GLMNet.jl package.
Arguments:
- resampling: An MLJ ResamplingStrategy, see MLJ resampling strategies
- params: Additional parameters to the glmnetcv function
Examples:
A glmnet with alpha=0.
model = GLMNetRegressor(resampling=CV(nfolds=3), alpha=0)
mach = machine(model, X, y)
fit!(mach, verbosity=0)
TMLECLI.GLMNetClassifier — Method
GLMNetClassifier(;resampling=StratifiedCV(), params...)
A GLMNet classifier for binary/multinomial outcomes based on the glmnetcv function from the GLMNet.jl package.
Arguments:
- resampling: An MLJ ResamplingStrategy, see MLJ resampling strategies
- params: Additional parameters to the glmnetcv function
Examples:
A glmnet with alpha=0.
model = GLMNetClassifier(resampling=StratifiedCV(nfolds=3), alpha=0)
mach = machine(model, X, y)
fit!(mach, verbosity=0)
RestrictedInteractionTransformer
This transformer generates interaction terms based on a set of primary variables in order to limit the combinatorial explosion.
TMLECLI.RestrictedInteractionTransformer — Type
RestrictedInteractionTransformer(;order=2, primary_variables=Symbol[], primary_variables_patterns=Regex[])
Definition
This transformer generates interaction terms based on a set of primary variables. Every generated interaction term is composed of primary variables and at most one remaining variable from the provided table. If (T₁, T₂) define the set of primary variables and (W₁, W₂) are the remaining variables in the table, the interaction terms generated at order 2 will include:
- T₁xT₂
- T₁xW₂
- W₁xT₂
but W₁xW₂ will not be generated because it would contain 2 remaining variables.
Arguments:
- order: All interaction features up to the given order will be computed
- primary_variables: A set of column names to generate the interactions
- primary_variables_patterns: A set of regular expressions that can additionally be used to identify primary_variables.
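The rule described above (each term contains at most one remaining variable) can be sketched in plain Julia. This is only an illustration of the rule as stated, not the TMLECLI.jl implementation: the function names `subsets` and `restricted_interactions` are hypothetical, and the actual transformer operates on tables and may order or select terms differently.

```julia
# Hypothetical helper: all size-k subsets of xs, preserving order.
function subsets(xs::Vector{Symbol}, k::Int)
    k == 0 && return [Symbol[]]
    length(xs) < k && return Vector{Symbol}[]
    with_first = [vcat(xs[1], s) for s in subsets(xs[2:end], k - 1)]
    without_first = subsets(xs[2:end], k)
    return vcat(with_first, without_first)
end

# Hypothetical sketch: enumerate interaction terms of size 2..order
# that contain at most one remaining (non-primary) variable.
function restricted_interactions(primary::Vector{Symbol}, remaining::Vector{Symbol}; order=2)
    terms = Vector{Vector{Symbol}}()
    for k in 2:order
        # Terms made only of primary variables
        for combo in subsets(primary, k)
            push!(terms, combo)
        end
        # Terms made of k-1 primary variables and exactly one remaining variable
        for combo in subsets(primary, k - 1), w in remaining
            push!(terms, vcat(combo, w))
        end
    end
    return terms
end

# W1 x W2 never appears because it would contain two remaining variables.
terms = restricted_interactions([:T1, :T2], [:W1, :W2]; order=2)
```

Restricting terms this way keeps the number of generated features roughly linear in the number of remaining variables, rather than growing combinatorially with the full table width.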
BiAllelicSNPEncoder
This transformer, mostly useful for genetic studies, converts bi-allelic single nucleotide polymorphism (SNP) columns, encoded as Strings, to a count of one of the two alleles.
TMLECLI.BiAllelicSNPEncoder — Type
BiAllelicSNPEncoder(patterns=Symbol[])
Encodes bi-allelic SNP columns, identified by the provided patterns Regex, as a count of a reference allele determined dynamically (not necessarily the minor allele).
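To make the encoding concrete, here is a minimal sketch of counting a reference allele in genotype strings. It is an assumption-laden illustration rather than the package's code: the helper `encode_snp` is hypothetical, it takes the first allele of the first genotype as the reference (the actual encoder may choose its reference differently), and it does not handle missing values.

```julia
# Hypothetical helper, for illustration only (not the TMLECLI.jl implementation).
function encode_snp(genotypes::AbstractVector{<:AbstractString})
    # Assumption: use the first allele of the first genotype as the reference
    ref = first(first(genotypes))
    # Count occurrences of the reference allele in each genotype string
    return [count(==(ref), g) for g in genotypes]
end

# Reference allele here is 'A'
encoded = encode_snp(["AG", "GG", "AA", "GA"])  # → [1, 0, 2, 1]
```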