- New function `compare_models()` compares the performance of two models with a permutation test (#295, @courtneyarmour).
- Fixed a bug where `cv_times` did not affect the reported repeats for cross-validation (#291, @kelly-sovacool).

This minor patch fixes a test failure on platforms with no long doubles. The actual package code remains unchanged.

- Allow `kfold >= length(groups)` (#285, @kelly-sovacool).
    - Groups are kept together in cross-validation (CV) partitions when `kfold` <= the number of groups in the training set. Previously, an error was thrown if this condition was not met. Now, if there are not enough groups in the training set for groups to be kept together during CV, groups are allowed to be split up across CV partitions.
- New parameter `cross_val` added to `run_ml()` allows users to define their own custom cross-validation scheme (#278, @kelly-sovacool).
    - Also added a new parameter `calculate_performance`, which controls whether performance metrics are calculated (default: `TRUE`). Users may wish to skip performance calculations when training models with no cross-validation.
- New parameter `group_partitions` added to `run_ml()` allows users to control which groups should go to which partition of the train/test split (#281, @kelly-sovacool).
- Modified the `training_frac` parameter in `run_ml()` (#281, @kelly-sovacool).
    - By default, `training_frac` is a fraction between 0 and 1 that specifies how much of the dataset should be used in the training fraction of the train/test split.
    - Alternatively, users can give `training_frac` a vector of indices that correspond to which rows of the dataset should go in the training fraction of the train/test split. This gives users direct control over exactly which observations are in the training fraction if desired.
- `group_correlated_features()` is now a user-facing function.
- Correlation methods supported by `stats::cor` can now be selected with the `corr_method` parameter: `get_feature_importance(corr_method = "pearson")`
- Fixed a bug where `preprocess_data()` converted the outcome column to a character vector (#273, @kelly-sovacool, @ecmaggioncalda).
- New option in `preprocess_data()`: `prefilter_threshold` (#240, @kelly-sovacool, @courtneyarmour).
    - Removes any features that appear in `prefilter_threshold` or fewer rows in the data.
    - New function `remove_singleton_columns()` is called by `preprocess_data()` to carry this out.
- New option in `get_feature_importance()`: `groups` (#246, @kelly-sovacool).
    - `groups` is `NULL` by default; in this case, correlated features above `corr_thresh` are grouped together.
- `preprocess_data()` now replaces spaces in the outcome column with underscores (#247, @kelly-sovacool, @JonnyTran).
- Optional progress bar for `preprocess_data()` and `get_feature_importance()` using the progressr package (#257, @kelly-sovacool, @JonnyTran, @FedericoComoglio).
- Fixed test failures due to `stringsAsFactors` behavior.
- Moved `rpart` from Suggests to Imports for consistency with other packages used during model training.

This is the first release version of mikropml! 🎉

- Added a `NEWS.md` file to track changes to the package.
- New functions:
    - `run_ml()`
    - `preprocess_data()`
    - `plot_model_performance()`
    - `plot_hp_performance()`
- Support for ML methods in `run_ml()`:
    - `glmnet`: logistic and linear regression
    - `rf`: random forest
    - `rpart2`: decision trees
    - `svmRadial`: support vector machines
    - `xgbTree`: gradient-boosted trees
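
The method names listed above are passed to `run_ml()` as its second argument. The following is a minimal sketch rather than an excerpt from the package documentation. It assumes a recent mikropml version that bundles the `otu_mini_bin` example dataset with an outcome column named `dx`. The small `kfold` and `cv_times` values are chosen only to keep the run short and are not recommended settings.

```r
# Sketch: train one model with run_ml() and inspect the result.
library(mikropml)

# Assumption: `otu_mini_bin` is the small example dataset shipped with
# recent mikropml versions, and its outcome column is named `dx`.
results <- run_ml(
  otu_mini_bin,
  "glmnet",               # one of: glmnet, rf, rpart2, svmRadial, xgbTree
  outcome_colname = "dx",
  kfold = 2,              # small values keep this example fast
  cv_times = 5,
  seed = 2019             # fixed seed so the data split is reproducible
)

# The result is a named list; `performance` holds the model's metrics.
names(results)
results$performance
```

Swapping `"glmnet"` for another method name from the list should be the only change needed to try a different model type, though training time may differ considerably between methods.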