swag package
swag is a package that trains a meta-learning procedure which combines screening and wrapper methods to find a set of extremely low-dimensional attribute combinations.
First install the remotes package, then install swag with the following code:
## if not installed
## install.packages("remotes")
remotes::install_github("SMAC-Group/SWAG-R-Package")
library(swag) #load the new package
We use the BreastCancer dataset, readily available from the mlbench package, to give an overview of swag.
# After having installed the mlbench package
data(BreastCancer, package = "mlbench")
# Pre-processing of the data
y <- BreastCancer$Class # response variable
x <- as.matrix(BreastCancer[setdiff(names(BreastCancer),c("Id","Class"))]) # features
# remove missing values and change to 'numeric'
id <- which(apply(x,1,function(x) sum(is.na(x)))>0)
y <- y[-id]
x <- x[-id,]
x <- apply(x,2,as.numeric)
# Training and test set
set.seed(180) # for replication
ind <- sample(1:dim(x)[1],dim(x)[1]*0.2)
y_test <- y[ind]
y_train <- y[-ind]
x_test <- x[ind,]
x_train <-x[-ind,]
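The preprocessing and split above can be sketched on a small toy matrix. This is only an illustration under stated assumptions: x_toy, y_toy and the other names are hypothetical, and the filter rowSums(is.na(.)) == 0 is used instead of a negative index so that the code also works when no row has missing values.

```r
# Hypothetical toy data: 12 observations, 2 character-coded features, 2 rows with NA
x_toy <- cbind(a = c("1", "2", NA, "4", "5", "6", "7", "8", "9", "10", "3", "2"),
               b = c("3", "1", "1", "2", NA, "5", "4", "4", "6", "8", "2", "1"))
y_toy <- factor(rep(c("benign", "malignant"), times = 6))

# Keep only complete rows, then convert every column to numeric
keep  <- rowSums(is.na(x_toy)) == 0
x_num <- apply(x_toy[keep, , drop = FALSE], 2, as.numeric)
y_num <- y_toy[keep]

# Hold out 20% of the rows; floor() makes the truncation explicit
set.seed(180)
n       <- nrow(x_num)
test_id <- sample(seq_len(n), floor(0.2 * n))

x_test_toy  <- x_num[test_id, , drop = FALSE]
x_train_toy <- x_num[-test_id, , drop = FALSE]
y_test_toy  <- y_num[test_id]
y_train_toy <- y_num[-test_id]

print(dim(x_train_toy))
print(dim(x_test_toy))
```

With 10 complete rows, 2 are held out for testing and 8 remain for training.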
Now we are ready to train with swag! The first step is to define the meta-parameters of the swag procedure: (p_{max}), the maximum dimension of the attribute combinations; (\alpha), a performance quantile that represents the percentage of learners selected at each dimension; and (m), the maximum number of learners trained at each dimension. We can set all these meta-parameters with the swagControl() function, together with a seed for replicability and verbose = TRUE to get a message as each dimension is completed. swagControl() behaves similarly to the trControl = argument of caret.
# Meta-parameters chosen for the breast cancer dataset
swagcon <- swagControl(pmax = 4L,
                       alpha = 0.5,
                       m = 20L,
                       seed = 163L, # for replicability
                       verbose = TRUE # keeps track of completed dimensions
)
# Given the low dimensional dataset, we can afford a wider search
# by fixing alpha = 0.5 as a smaller alpha may also stop the
# training procedure earlier than expected.
Having set up the meta-parameters as explained above, we are now ready to train swag. We start with the linear Support Vector Machine learner:
### SVM Linear Learner ###
train_swag_svml <- swag(
  # arguments for swag
  x = x_train,
  y = y_train,
  control = swagcon,
  auto_control = FALSE,
  # arguments for caret
  trControl = caret::trainControl(method = "repeatedcv", number = 10, repeats = 1, allowParallel = FALSE),
  metric = "Accuracy",
  method = "svmLinear", # Use method = "svmRadial" to train this alternative learner
  preProcess = c("center", "scale")
)
## [1] "Dimension explored: 1 - CV errors at alpha: 0.115"
## [1] "Dimension explored: 2 - CV errors at alpha: 0.0549"
## [1] "Dimension explored: 3 - CV errors at alpha: 0.0403"
## [1] "Dimension explored: 4 - CV errors at alpha: 0.0394"
The only difference with respect to the classic caret train() function is the specification of the swag arguments explained previously. In the chunk above, for the svmLinear learner, we estimate the out-of-sample accuracy with 10-fold cross-validation repeated once. For this specific case, we center and rescale the data, as is usual for SVMs, and the cost parameter that controls the margin is automatically fixed at one (i.e. (C=1)).
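As a rough illustration of what preProcess = c("center", "scale") does to each attribute, the base R scale() function applies the same kind of standardization. This is a sketch on a hypothetical toy matrix (x_toy is not part of the example data); caret estimates the centering and scaling constants from the training data rather than calling this code.

```r
# Hypothetical toy matrix with two attributes on very different scales
x_toy <- cbind(size  = c(1, 2, 3, 4, 5),
               count = c(100, 300, 200, 500, 400))

# Center each column to mean 0 and rescale it to standard deviation 1
x_std <- scale(x_toy, center = TRUE, scale = TRUE)

print(round(colMeans(x_std), 10)) # column means after centering
print(apply(x_std, 2, sd))        # column standard deviations after scaling
```

After standardization both attributes contribute on a comparable scale, which matters for margin-based learners such as SVMs.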
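Let’s have a look at the typical output of a swag training object for the svmLinear learner. The three outputs below are, in order: the CV errors of each learner explored at each dimension, the attribute combinations (one per column) tested at each dimension, and the cut-off CV error at each dimension determined by the choice of alpha: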
## [[1]]
## [1] 0.14094276 0.06959836 0.07499399 0.15157407 0.10811688 0.08592593 0.11502886
## [8] 0.12070707 0.22122896
##
## [[2]]
## [1] 0.05107744 0.06225950 0.03852213 0.05492304 0.06030544 0.04377104
## [7] 0.05108225 0.06212121 0.07485570 0.05491582
##
## [[3]]
## [1] 0.04010101 0.04761063 0.03848846 0.04030784 0.04575758 0.04016835 0.03841991
## [8] 0.04387205 0.05105099
##
## [[4]]
## [1] 0.03464646 0.04572751 0.04030664 0.03852213
## [[1]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 1 2 3 4 5 6 7 8 9
##
## [[2]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 2 2 2 2 3 3 3 5 5 6
## [2,] 3 5 6 7 5 6 7 6 7 7
##
## [[3]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 2 2 2 3 2 2 3 3 5
## [2,] 3 3 6 6 3 5 5 5 6
## [3,] 6 7 7 7 5 6 6 7 7
##
## [[4]]
## [,1] [,2] [,3] [,4]
## [1,] 2 2 2 3
## [2,] 3 3 5 5
## [3,] 6 5 6 6
## [4,] 7 6 7 7
## [1] 0.11502886 0.05491943 0.04030784 0.03941438
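The printed values are consistent with a simple reading of the screening step: the cut-off at each dimension equals the 0.5-quantile (alpha = 0.5) of that dimension's CV errors, and the two-attribute models are all pairs of the attributes whose CV error is at or below the first cut-off. The base R sketch below reproduces this from the numbers above. It illustrates the idea and is not the package's internal code; all variable names are hypothetical.

```r
# CV errors per dimension, copied from the output above
cv_errors <- list(
  c(0.14094276, 0.06959836, 0.07499399, 0.15157407, 0.10811688,
    0.08592593, 0.11502886, 0.12070707, 0.22122896),
  c(0.05107744, 0.06225950, 0.03852213, 0.05492304, 0.06030544,
    0.04377104, 0.05108225, 0.06212121, 0.07485570, 0.05491582),
  c(0.04010101, 0.04761063, 0.03848846, 0.04030784, 0.04575758,
    0.04016835, 0.03841991, 0.04387205, 0.05105099),
  c(0.03464646, 0.04572751, 0.04030664, 0.03852213)
)

# Cut-off at each dimension: the alpha-quantile of the CV errors
alpha    <- 0.5
cv_alpha <- sapply(cv_errors, quantile, probs = alpha, names = FALSE)
print(round(cv_alpha, 4))

# Attributes kept after dimension 1: CV error at or below the cut-off
kept <- which(cv_errors[[1]] <= cv_alpha[1])
print(kept)

# All pairs of kept attributes, one candidate model per column
pairs_dim2 <- combn(kept, 2)
print(pairs_dim2)
```

The rounded cut-offs match the "CV errors at alpha" messages printed during training, and the ten pairs match the columns of the second attribute matrix. Since ten is below m = 20, no sub-sampling of candidate models is needed at this dimension.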
The other two learners that we have implemented in swag are lasso (glmnet package required) and random forest (randomForest package required for method = "rf"). The training phase for these learners differs slightly from the SVM one. We can look at the random forest for a practical example:
### Random Forest Learner ###
train_swag_rf <- swag(
  # arguments for swag
  x = x,
  y = y,
  control = swagcon,
  auto_control = FALSE,
  # arguments for caret
  trControl = caret::trainControl(method = "repeatedcv", number = 10, repeats = 1, allowParallel = FALSE),
  metric = "Accuracy",
  method = "rf",
  # dynamically modify arguments for caret
  caret_args_dyn = function(list_arg, iter) {
    list_arg$tuneGrid <- expand.grid(.mtry = sqrt(iter))
    list_arg
  }
)
## [1] "Dimension explored: 1 - CV errors at alpha: 0.0996"
## [1] "Dimension explored: 2 - CV errors at alpha: 0.0534"
## [1] "Dimension explored: 3 - CV errors at alpha: 0.0461"
## [1] "Dimension explored: 4 - CV errors at alpha: 0.0425"
The newly introduced argument caret_args_dyn enables the user to modify the hyper-parameters of a given learner dynamically, since they can change as the dimension grows up to the desired (p_{max}). Here it is used to adapt the mtry hyper-parameter as the dimension grows: in the example above, we have fixed mtry to the square root of the number of attributes at each step, as is usually done in practice.
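To see what this function produces at each step, it can be called by hand. The sketch below assumes that swag passes the current list of caret arguments as list_arg and the current dimension as iter; base_args is a hypothetical stand-in for that list.

```r
# Same function as in the random forest example above
caret_args_dyn <- function(list_arg, iter) {
  list_arg$tuneGrid <- expand.grid(.mtry = sqrt(iter))
  list_arg
}

# Hypothetical stand-in for the list of caret arguments
base_args <- list(method = "rf", metric = "Accuracy")

# mtry value that would be used at each dimension up to pmax = 4
for (d in 1:4) {
  args_d <- caret_args_dyn(base_args, d)
  cat("dimension", d, "-> mtry =", round(args_d$tuneGrid$.mtry, 3), "\n")
}
```

The function returns a modified copy, so the original list of arguments is left untouched between dimensions.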
You can tailor the learning arguments of swag() as you like, for example by introducing grids for the hyper-parameters specific to a given learner, or by updating these grids as the dimension increases, similarly to what is usually done with the caret package. This gives you a lot of flexibility in the training phase.
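For instance, instead of a single value, the grid itself can grow with the dimension. The sketch below is a hypothetical variant (caret_args_grid is not part of swag) that would let caret choose mtry among 1, ..., d at dimension d. It assumes the same calling convention as the caret_args_dyn function used for the random forest learner.

```r
# Hypothetical variant: a grid of mtry candidates that grows with the dimension
caret_args_grid <- function(list_arg, iter) {
  list_arg$tuneGrid <- expand.grid(.mtry = seq_len(iter))
  list_arg
}

# Grid that would be used at dimension 3
grid_dim3 <- caret_args_grid(list(method = "rf"), 3)$tuneGrid
print(grid_dim3)
```

A larger grid means more models fitted per attribute combination, so training time grows accordingly.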
To conclude this brief introduction, we present the usual predict() function, which can be applied to a trained swag object as in many other R packages. We pick the random forest learner for this purpose.
# best learner predictions
# if `newdata` is not specified, then predict gives predictions based on the training
# sample
sapply(predict(object = train_swag_rf), function(x) head(x))
## $predictions
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 1
## [5,] 1
## [6,] 2
##
## $models
## $models[[1]]
## [1] 3 5 6 7
# best learner predictions on the test set
best_pred <- predict(object = train_swag_rf,
newdata = x_test)
sapply(best_pred, function(x) head(x))
## $predictions
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 2
## [5,] 1
## [6,] 1
##
## $models
## $models[[1]]
## [1] 3 5 6 7
# predictions for a given dimension
dim_pred <- predict(
object = train_swag_rf,
newdata = x_test,
type = "attribute",
attribute = 4L)
sapply(dim_pred,function(x) head(x))
## $predictions
## [,1] [,2] [,3] [,4]
## [1,] 1 1 1 1
## [2,] 1 1 1 1
## [3,] 1 1 1 1
## [4,] 2 2 2 2
## [5,] 1 1 1 1
## [6,] 1 1 1 1
##
## $models
## $models[[1]]
## [1] 2 3 5 6
##
## $models[[2]]
## [1] 2 3 5 7
##
## $models[[3]]
## [1] 3 5 6 7
##
## $models[[4]]
## [1] 2 3 6 7
# predictions below a given CV error
cv_pred <- predict(
object = train_swag_rf,
newdata = x_test,
type = "cv_performance",
cv_performance = 0.04)
sapply(cv_pred,function(x) head(x))
## $predictions
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 2
## [5,] 1
## [6,] 1
##
## $models
## $models[[1]]
## [1] 3 5 6 7
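Now we can evaluate the performance of the best learner selected by swag with the confusionMatrix() function of caret. Note that train_swag_rf above was fitted on the full x and y, which include the test observations, so the accuracy reported below is optimistic; fit on x_train and y_train to obtain a genuine out-of-sample estimate.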
# transform predictions into a factor with the levels of `y_test`
best_learn <- factor(levels(y_test)[best_pred$predictions], levels = levels(y_test))
caret::confusionMatrix(best_learn,y_test)
## Confusion Matrix and Statistics
##
## Reference
## Prediction benign malignant
## benign 90 0
## malignant 0 46
##
## Accuracy : 1
## 95% CI : (0.9732, 1)
## No Information Rate : 0.6618
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.0000
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 1.0000
## Prevalence : 0.6618
## Detection Rate : 0.6618
## Detection Prevalence : 0.6618
## Balanced Accuracy : 1.0000
##
## 'Positive' Class : benign
##
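The headline statistics in this output can be recomputed by hand from the four counts of the table. The base R sketch below only uses the numbers printed above; the variable names are hypothetical.

```r
# Counts copied from the confusion matrix above (rows: prediction, columns: reference)
cm <- matrix(c(90, 0, 0, 46), nrow = 2,
             dimnames = list(Prediction = c("benign", "malignant"),
                             Reference  = c("benign", "malignant")))

# Accuracy: share of observations on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)

# No Information Rate: share of the most frequent reference class
nir <- max(colSums(cm)) / sum(cm)

# Sensitivity for the 'Positive' class (benign)
sensitivity <- cm["benign", "benign"] / sum(cm[, "benign"])

print(accuracy)
print(round(nir, 4))
print(sensitivity)
```

The No Information Rate of 0.6618 is the accuracy obtained by always predicting benign, which is the baseline the reported accuracy is tested against.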
Thanks for your attention. You can now definitely say that you have worked with swag!