Getting started with sparsegl

This package provides tools for fitting regularization paths for sparse group-lasso penalized learning problems. The model is fit for a sequence of values of the regularization parameter lambda.

Compared with other sparse group-lasso packages, this package offers the following strengths and improvements:

Installing

This package is not on CRAN yet, so it can be installed using the [devtools](https://cran.r-project.org/package=devtools) package:

devtools::install_github("dajmcdon/sparsegl", ref = "main")

Building the vignettes, such as this getting-started guide, takes a significant amount of time, so they are not included in the package by default. If you want to include the vignettes, use this modified command:

devtools::install_github("dajmcdon/sparsegl", ref = "main",
                         build_vignettes = TRUE, dependencies = TRUE)

For this getting-started vignette, we first randomly generate \(X\), an input matrix of predictors with \(n\) observations and \(p\) features. We then generate the response \(y\), a real-valued vector (a vector rather than a matrix), as

\[
y = X\beta^* + \epsilon,
\]

where the coefficient vector \(\beta^*\) is specified below, and the white noise \(\epsilon\), following a standard normal distribution, provides the data variation. The sparse group-lasso problem is then formulated as the sum of the mean squared error (linear regression) or logistic loss (logistic regression) and a convex combination of the lasso and group-lasso penalties:

\[
\min_{\beta} \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda (1 - \alpha) \sum_{g=1}^{G} \sqrt{b_g}\, \lVert \beta^{(g)} \rVert_2 + \lambda \alpha \lVert \beta \rVert_1,
\]

where \(\beta^{(g)}\) is the subvector of coefficients in the \(g\)-th group, \(b_g\) is the size of group \(g\), and \(\alpha \in [0, 1]\) controls the balance between the lasso and group-lasso penalties.

library(sparsegl)
set.seed(1010)
n <- 100
p <- 200
X <- matrix(rnorm(n * p, mean = 0, sd = 1), nrow = n, ncol = p)
beta_star <- c(
  rep(5, 5), c(5, -5, 2, 0, 0), rep(-5, 5), c(2, -3, 8, 0, 0),
  rep(0, p - 20)
)
groups <- rep(1:(p / 5), each = 5)

# Linear regression model
eps <- rnorm(n, mean = 0, sd = 1)
y <- X %*% beta_star + eps

# Logistic regression model
pr <- 1 / (1 + exp(-X %*% beta_star))
y_binary <- rbinom(n, 1, pr)

sparsegl()

Given an input matrix X and a response vector y (or a matrix with one column), sparsegl() fits a sparse group-lasso regularized linear model, by penalized maximum likelihood, for a sequence of penalty parameter values. The penalty is composed of the lasso penalty and the group-lasso penalty. The other main arguments users may supply are:

It returns a sparsegl object, and the main attribute of this object is:

fit1 <- sparsegl(X, y, group = groups)
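Beyond group, you can adjust how the penalty is constructed. Here is a minimal sketch using the asparse argument (the relative weight on the lasso term versus the group-lasso term) and nlambda (the length of the lambda sequence); the values shown are arbitrary illustrations, so check ?sparsegl for the defaults:

fit2 <- sparsegl(X, y, group = groups,
                 asparse = 0.2,  # put more weight on the lasso (sparsity) term
                 nlambda = 50)   # fit a shorter lambda path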

Plotting function plot() for sparsegl objects

This function produces coefficient curves (nonzero coefficients only) for each value of the penalty parameter lambda along the regularization path of a fitted sparsegl object. The arguments of this function are:

To elaborate on y_axis and x_axis:

plot(fit1, y_axis = "group", x_axis = "lambda")

plot(fit1, y_axis = "coef", x_axis = "penalty", add_legend = FALSE)

coef(), predict(), and print() for sparsegl objects

All three functions take a fitted sparsegl object as their first argument:

coefs <- coef(fit1, s = c(0.02, 0.03))
# drop = FALSE keeps newx as a 1-row matrix rather than a vector
pred <- predict(fit1, newx = X[100, , drop = FALSE], s = fit1$lambda[2:3])
print(fit1)

cv.sparsegl()

This function performs k-fold cross-validation (cv) for sparsegl. It takes the same arguments X, y, group as above, plus the additional argument pred.loss, which can be set to either "L2" or "L1" for the linear regression model, or "loss" or "misclass" for the logistic regression model, indicating the loss to use for the cv error. It returns a cv.sparsegl object.

fit_l1 <- cv.sparsegl(X, y, group = groups, pred.loss = "L1")
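For the logistic regression model, run the cross-validation on the binary response instead. A minimal sketch, assuming a family argument that accepts "binomial" (check ?cv.sparsegl for your installed version), with the misclassification loss mentioned above:

fit_misclass <- cv.sparsegl(X, y_binary, group = groups, family = "binomial",
                            pred.loss = "misclass")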

plot(), coef() and predict() for cv.sparsegl object

plot(fit_l1)

coefs <- coef(fit_l1, s = "lambda.1se")
pred <- predict(fit_l1, newx = X[50:80, ], s = "lambda.min")

estimate_risk()

This function returns an information criterion for a sparsegl model at each lambda: the negative log-likelihood plus a penalty term determined by the chosen criterion type. It provides a means for model selection by quantifying the trade-off between the goodness of fit and the complexity of the model. It takes the same arguments X and y as the function sparsegl(). The additional arguments it needs are:

where df is the degrees of freedom and n is the sample size.
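For reference, the classical forms of these criteria are shown below; the package may use a rescaled variant internally, so treat these as orientation rather than the exact formulas used in the code:

\[
\mathrm{AIC} = -2\log L + 2\,\mathrm{df}, \qquad
\mathrm{BIC} = -2\log L + \log(n)\,\mathrm{df}, \qquad
\mathrm{GCV} = \frac{\mathrm{RSS}}{n\,(1 - \mathrm{df}/n)^2}.
\]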

FYI: Degrees of freedom are a tool to assess the complexity of a statistical modeling procedure. The approximation stored in object$df is the number of nonzero coefficients in the model. Note that it can take some time to calculate the unbiased estimate of the exact degrees of freedom if X is complicated. This calculation is implemented following the method in https://arxiv.org/pdf/1212.6478.pdf.
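Since the exact computation can be slow, setting approx_df = TRUE uses the nonzero-coefficient approximation described above. A sketch (the default value of approx_df may differ across versions):

risk_approx <- estimate_risk(fit1, X, type = "BIC", approx_df = TRUE)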

risk <- estimate_risk(fit1, X, type = "AIC", approx_df = FALSE)
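The result can then be used for model selection, for example by choosing the lambda that minimizes the criterion. A sketch, assuming the returned object contains per-lambda columns named lambda and AIC (adjust the names if your version differs):

best_lambda <- risk$lambda[which.min(risk$AIC)]
coef_best <- coef(fit1, s = best_lambda)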