Note. In version 0.3.0, released on April 17th, 2023, we corrected an issue in the \(\texttt{penAFT.cv}\) function where the output \(\texttt{lambda.min}\) returned the largest candidate tuning parameter, not the tuning parameter with the smallest cross-validated error (as the documentation stated). This did not affect any results in the Statistics in Medicine article.

1. Introduction

In this document, we detail how to use the \(\texttt{penAFT}\) R package through a number of short simulated examples. It is intended to serve as a user guide accompanying the documentation in the \(\texttt{penAFT}\) R package. The primary purpose of the package is to fit, tune, and apply the penalized Gehan estimator in high-dimensional settings.

For \(i = 1, \dots, n\), let \(y_i = \min(t_i, c_i)\) where \(t_i\) is the time-to-event and \(c_i\) is the censoring time for the \(i\)th subject; let \(x_i \in \mathbb{R}^p\) be the measured predictors for the \(i\)th subject; and let \(\delta_i = \mathbf{1}(y_i = t_i)\) be the indicator that the event was observed (i.e., \(\delta_i = 0\) if the \(i\)th subject was censored). The penalized Gehan estimator is defined as \[\begin{equation} \label{eq:penalizedGehan} \arg\min_{{\beta} \in \mathbb{R}^p} \left\{ \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \delta_i \{\log y_i - \log y_j - \beta'(x_i - x_j) \}^{-} + \lambda g({\beta})\right\} \end{equation}\] where \(\{a\}^{-} = \max(-a, 0)\), \(\lambda > 0\) is a tuning parameter, and \(g\) is a closed, proper, convex penalty function. In the \(\texttt{penAFT}\) package, one can fit this model and perform cross-validation for two classes of penalties. The weighted elastic net penalty is defined as \[\texttt{(penalty = "EN")}~~~~~~ g(\beta) = \alpha \| w \circ \beta\|_1 + \frac{(1-\alpha)}{2}\|\beta\|_2^2\] where \(w \in \mathbb{R}^p\) is a set of non-negative weights (which can be specified in the \(\texttt{weight.set}\) argument). The weighted sparse group-lasso penalty, which one may also use, is defined as \[\texttt{(penalty = "SG")}~~~~~~ g(\beta)= \alpha \| w \circ \beta\|_1 + (1-\alpha)\sum_{l=1}^G v_l\|\beta_{\mathcal{G}_l}\|_2\] where again, \(w \in \mathbb{R}^p\) is a set of non-negative weights, and \(v \in \mathbb{R}^G\) is a vector of non-negative weights with \(v_l\) applied to the \(l\)th of the \(G\) (user-specified) groups.
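For intuition, the unpenalized Gehan loss above is a double sum over pairs of subjects. A direct (and computationally naive) R implementation might look like the sketch below; this is purely illustrative and is not how \(\texttt{penAFT}\) computes its estimates.

gehan.loss <- function(beta, X, logY, delta) {
  # e_i = log y_i - x_i' beta
  resid <- as.numeric(logY - X %*% beta)
  # diffs[i, j] = e_i - e_j for every pair (i, j)
  diffs <- outer(resid, resid, "-")
  # {a}^- = max(-a, 0); delta recycles over rows, matching the delta_i in the sum
  sum(delta * pmax(-diffs, 0)) / length(logY)^2
}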

Note that \(\alpha = 1\) and \(\texttt{penalty = "EN"}\) are the defaults. When using the sparse group-lasso penalty \(\texttt{penalty = "SG"}\), setting \(\alpha = 1\) removes the group term entirely, so the fit is identical to \(\texttt{penalty = "EN"}\) with \(\alpha = 1\). Thus, one must use some value of \(\alpha\) other than 1 for \(\texttt{penalty = "SG"}\) to differ from the elastic net.

First, install the \(\texttt{penAFT}\) package from GitHub. This requires the \(\texttt{devtools}\) package in R.

library(devtools)
## Loading required package: usethis
devtools::install_github("ajmolstad/penAFT", force=TRUE)
## Downloading GitHub repo ajmolstad/penAFT@HEAD
## ── R CMD build ─────────────────────────────────────────────────────────────────
## ✔  checking for file ‘/private/var/folders/zx/flplz1454t38lr_qm3jxv1s80000gn/T/Rtmp7GjHeq/remotes8fb8170889ca/ajmolstad-penAFT-aa475df/DESCRIPTION’
## ─  preparing ‘penAFT’:
## ✔  checking DESCRIPTION meta-information
## ─  cleaning src
## ─  checking for LF line-endings in source and make files and shell scripts
## ─  checking for empty or unneeded directories
##    Omitted ‘LazyData’ from DESCRIPTION
## ─  building ‘penAFT_0.3.0.tar.gz’
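Alternatively, if a released version of \(\texttt{penAFT}\) is available on CRAN, it can be installed in the usual way:

# install a released version (if one is available on CRAN)
install.packages("penAFT")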

To begin, we generate data from a log-logistic accelerated failure time model.

library(penAFT)
set.seed(1)
n <- 200
p <- 250
# AR(1) covariance for the predictors: (SigmaX)_{jk} = 0.5^|j - k|
SigmaX <- matrix(0, nrow=p, ncol=p)
for (j in 1:p) {
  for (k in 1:p) {
    SigmaX[j,k] <- 0.5^abs(j-k)
  }
}
# draw predictors with covariance SigmaX via its eigendecomposition
eo <- eigen(SigmaX)
X <- matrix(rnorm(n*p), nrow=n) %*% eo$vectors %*% diag(eo$values^0.5) %*% t(eo$vectors)
# three truly nonzero regression coefficients
beta <- rep(0, p)
beta[c(1, 20, 200)] <- 2
# log survival times from a log-logistic AFT model; exponential censoring
logT <- X %*% beta + rlogis(n, loc=0, scale=2)
C <- rexp(n, rate=1/quantile(exp(logT), 0.65))
logY <- pmin(logT, log(C))
delta <- 1*(logY == logT)   # event indicator: 1 if uncensored
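Before fitting, it can be useful to check the realized censoring rate. The censoring times were calibrated to the 0.65 quantile of \(\exp(\log T)\), so a nontrivial fraction of subjects are censored; the exact proportion varies with the seed and is not reported here.

# observed proportion of censored subjects
mean(delta == 0)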

2. Elastic net penalized Gehan estimator

2.1 Basics

Let us begin by fitting the model with the elastic net penalty.

fit.en <- penAFT(X, logY, delta, alpha = 0.5, nlambda = 100)
str(fit.en)
## List of 6
##  $ beta       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. ..@ i       : int [1:2269] 19 19 19 19 19 199 19 199 19 199 ...
##   .. ..@ p       : int [1:101] 0 1 2 3 4 6 8 10 12 14 ...
##   .. ..@ Dim     : int [1:2] 250 100
##   .. ..@ Dimnames:List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : NULL
##   .. ..@ x       : num [1:2269] 0.00124 0.02285 0.03876 0.05623 0.07831 ...
##   .. ..@ factors : list()
##  $ lambda     : num [1:100] 0.51 0.499 0.487 0.476 0.465 ...
##  $ standardize: logi TRUE
##  $ X.mean     : num [1:250] 0.0361 0.0266 -0.0517 -0.1021 -0.0357 ...
##  $ X.sd       : num [1:250] 0.94 1.01 1.06 1.1 1.07 ...
##  $ alpha      : num 0.5
##  - attr(*, "class")= chr "penAFT"

The output of \(\texttt{fit.en}\) is a list containing

  • \(\texttt{beta}\): A sparse matrix containing the regression coefficients along the solution path. To extract the coefficients at a particular tuning parameter, one must use the \(\texttt{penAFT.coef}\) function.
  • \(\texttt{lambda}\): The set of candidate tuning parameter values.
  • \(\texttt{standardize}\): An indicator of whether the predictors were standardized for model fitting.
  • \(\texttt{X.mean}\), \(\texttt{X.sd}\): The means and standard deviations of the predictors. These are used to return coefficients and linear predictors on the original scale in the \(\texttt{penAFT.coef}\) and \(\texttt{penAFT.predict}\) functions.
  • \(\texttt{alpha}\): The value of \(\alpha\) used in the elastic net penalty.

We can look at a trace plot for the fitted model.

penAFT.trace(fit.en)

Using the functions \(\texttt{penAFT.predict}\) and \(\texttt{penAFT.coef}\), we can also obtain the linear predictor for new subjects and extract the coefficients at a particular value of \(\lambda\).

preds <- penAFT.predict(fit.en, Xnew = rnorm(p), lambda = fit.en$lambda[20])
str(preds)
##  num [1, 1] 0.667
coef.est20 <- penAFT.coef(fit.en, lambda = fit.en$lambda[20])
coef.est20$beta[1:20]
##  [1] 0.1449862 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [8] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [15] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.4225436
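Per the output above, the estimates at the 20th tuning parameter are sparse; we can list the indices of the nonzero coefficients directly:

# indices of nonzero coefficient estimates (these include predictors 1 and 20)
which(coef.est20$beta != 0)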

To select the tuning parameter, we can run cross-validation using \(\texttt{penAFT.cv}\).

fit.en.cv <- penAFT.cv(X, logY, delta, alpha = 0.5, nlambda = 100, nfolds = 5)
## CV through:  ###                 20 % 
## CV through:  ### ###             40 % 
## CV through:  ### ### ###         60 % 
## CV through:  ### ### ### ###     80 % 
## CV through:  ### ### ### ### ###  100 %
str(fit.en.cv)
## List of 5
##  $ full.fit      :List of 6
##   ..$ beta       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. ..@ i       : int [1:2269] 19 19 19 19 19 199 19 199 19 199 ...
##   .. .. ..@ p       : int [1:101] 0 1 2 3 4 6 8 10 12 14 ...
##   .. .. ..@ Dim     : int [1:2] 250 100
##   .. .. ..@ Dimnames:List of 2
##   .. .. .. ..$ : NULL
##   .. .. .. ..$ : NULL
##   .. .. ..@ x       : num [1:2269] 0.00124 0.02285 0.03876 0.05623 0.07831 ...
##   .. .. ..@ factors : list()
##   ..$ lambda     : num [1:100] 0.51 0.499 0.487 0.476 0.465 ...
##   ..$ standardize: logi TRUE
##   ..$ X.mean     : num [1:250] 0.0361 0.0266 -0.0517 -0.1021 -0.0357 ...
##   ..$ X.sd       : num [1:250] 0.94 1.01 1.06 1.1 1.07 ...
##   ..$ alpha      : num 0.5
##  $ cv.err.linPred: num [1:100] 1.87 1.86 1.86 1.86 1.85 ...
##  $ cv.err.obj    : num [1:5, 1:100] 1.59 2.02 1.66 1.94 1.97 ...
##  $ cv.index      :List of 5
##   ..$ : int [1:41] 6 18 19 34 36 49 67 70 73 74 ...
##   ..$ : int [1:41] 1 3 14 21 24 29 32 58 59 69 ...
##   ..$ : int [1:40] 11 16 28 30 31 35 37 47 54 63 ...
##   ..$ : int [1:39] 10 13 44 45 46 56 57 62 82 91 ...
##   ..$ : int [1:39] 9 50 52 53 65 75 94 95 97 98 ...
##  $ lambda.min    : num 0.0851
##  - attr(*, "class")= chr "penAFT.cv"

We see that the object \(\texttt{fit.en.cv}\) contains exactly the \(\texttt{fit.en}\) object in the sublist \(\texttt{full.fit}\). The other output includes

  • \(\texttt{cv.err.linPred}\): the cross-validated linear predictor scores (the default metric for cross-validation)
  • \(\texttt{cv.err.obj}\): the cross-validated Gehan loss evaluated on each held-out fold
  • \(\texttt{lambda.min}\): the tuning parameter value which minimized the cross-validated linear predictor scores
  • \(\texttt{cv.index}\): the indices of the subjects belonging to each fold.
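To see how these pieces fit together, \(\texttt{lambda.min}\) can be recovered by hand from the other output (as of version 0.3.0, this is the candidate value with the smallest cross-validated error; see the note at the top of this document):

# candidate tuning parameter minimizing the CV linear predictor scores
fit.en.cv$full.fit$lambda[which.min(fit.en.cv$cv.err.linPred)]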

We can plot the cross-validation error curves using the \(\texttt{penAFT.plot}\) function.

penAFT.plot(fit.en.cv)

By calling \(\texttt{penAFT.coef}\) or \(\texttt{penAFT.predict}\) without specifying a tuning parameter, we extract coefficients (or compute linear predictors) at the tuning parameter minimizing the cross-validated linear predictor scores.

preds.cv <- penAFT.predict(fit.en.cv, Xnew = matrix(rnorm(p), nrow=1))
str(preds.cv)
##  num [1, 1] -5.1
coef.cv <- penAFT.coef(fit.en.cv)
coef.cv$beta[1:20]
##  [1]  0.96916871  0.23761024  0.00000000  0.00000000  0.00000000  0.00000000
##  [7]  0.00000000  0.00000000 -0.06805915  0.12349614  0.00000000  0.00000000
## [13]  0.00000000  0.00000000 -0.01219268  0.00000000  0.00000000  0.00000000
## [19]  0.00000000  1.68950662

2.2 Weighted elastic net penalization

Suppose the first predictor is one we do not want to penalize at all, that is, a predictor we require to be included in the model. (Conversely, a predictor we would rather not include, if possible, could be penalized doubly by giving it a weight of 2.) Our software can easily handle this situation. First, we construct the weight set: a list with element \(w \in \mathbb{R}^p_+\) and, if using the sparse group lasso, an element \(v\) whose dimension equals the number of groups. Below, the first entry of \(w\) is zero, so the first predictor is left unpenalized.

weight.set <- list("w" = c(0, rep(1, p-1)))
fit.en.cv.weighted <- penAFT.cv(X, logY, delta, alpha = 0.5, weight.set = weight.set, nlambda = 100, nfolds = 5)
## CV through:  ###                 20 % 
## CV through:  ### ###             40 % 
## CV through:  ### ### ###         60 % 
## CV through:  ### ### ### ###     80 % 
## CV through:  ### ### ### ### ###  100 %
penAFT.trace(fit.en.cv.weighted)
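Because its weight is zero, the first predictor is unpenalized, so we would expect its coefficient to be nonzero across the entire solution path. A quick, illustrative check:

# number of candidate lambdas at which the first coefficient is nonzero;
# we would expect this to equal nlambda = 100
sum(fit.en.cv.weighted$full.fit$beta[1, ] != 0)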

2.3 Advanced options

Suppose the smallest candidate tuning parameter is too small, causing unnecessarily long computing times. In this case, we would want to adjust the range of candidate tuning parameters to consider larger values of \(\lambda\). For this, we set \(\texttt{lambda.ratio.min}\), the ratio of the smallest to the largest candidate tuning parameter, to be greater than its default value of 0.1.

set.seed(20)
n <- 200
p <- 1000   # a larger number of predictors than before
SigmaX <- matrix(0, nrow=p, ncol=p)
for (j in 1:p) {
  for (k in 1:p) {
    SigmaX[j,k] <- 0.5^abs(j-k)
  }
}
eo <- eigen(SigmaX)
X <- matrix(rnorm(n*p), nrow=n) %*% eo$vectors %*% diag(eo$values^0.5) %*% t(eo$vectors)
beta <- rep(0, p)
beta[c(1, 20, 200)] <- 2
logT <- X %*% beta + rlogis(n, loc=0, scale=2)
C <- rexp(n, rate=1/quantile(exp(logT), 0.65))
logY <- pmin(logT, log(C))
delta <- 1*(logY == logT)
# restrict the candidate lambdas to the upper portion of the default range
fit.lasso <- penAFT(X, logY, delta, lambda.ratio.min = 0.25, nlambda = 50)
penAFT.trace(fit.lasso)
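To verify the effect, we can inspect the candidate tuning parameters; assuming \(\texttt{lambda.ratio.min}\) is the ratio of the smallest to the largest candidate value, the following ratio should be approximately 0.25:

range(fit.lasso$lambda)
min(fit.lasso$lambda) / max(fit.lasso$lambda)   # should be about 0.25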

Note that one should use this option with caution: if \(\texttt{lambda.ratio.min}\) is too small, the solution may no longer be affected by \(\lambda\) (i.e., the minimizer is not unique). Conversely, we recommend examining the solution path and making \(\texttt{lambda.ratio.min}\) smaller if the lowest cross-validation error occurs at the smallest candidate tuning parameter value.

If computing time is an issue, one may also relax the convergence tolerances. We recommend relaxing the relative tolerance first; its default value is \(2.5 \times 10^{-4}\). Conversely, if one requires a more accurate solution, one may decrease the relative tolerance, although doing so carelessly can result in much longer computing times.

ptm <- proc.time()
fit.lasso.relaxed <- penAFT(X, logY, delta, tol.rel = 1e-3, nlambda = 100)
proc.time() - ptm
##    user  system elapsed 
##   2.954   0.038   2.991
ptm <- proc.time()
fit.lasso.strict <- penAFT(X, logY, delta, tol.rel = 1e-4, nlambda = 100)
proc.time() - ptm
##    user  system elapsed 
##  31.205   0.195  31.427
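Since the two fits differ only in their convergence tolerance, their solutions should agree closely. Assuming both fits generated the same candidate tuning parameters, a quick comparison:

# largest absolute disagreement in estimated coefficients across the path
max(abs(fit.lasso.relaxed$beta - fit.lasso.strict$beta))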

3. Sparse group-lasso penalized Gehan estimator

3.1 Basics

Next, we consider the application of our method to the setting where predictors are grouped.

Here, we break the coefficients into groups of size 25. By construction (see the data-generating code above), only three coefficients are truly nonzero: coefficients 1 and 20, which belong to the first group, and coefficient 200, which belongs to the eighth.

groups <- rep(1:10, each = 25)
fit.sg.cv <- penAFT.cv(X, logY, delta, penalty = "SG", groups = groups, alpha = 0.5, nlambda = 100, nfolds = 5)
## CV through:  ###                 20 % 
## CV through:  ### ###             40 % 
## CV through:  ### ### ###         60 % 
## CV through:  ### ### ### ###     80 % 
## CV through:  ### ### ### ### ###  100 %
penAFT.trace(fit.sg.cv)

Now, let’s examine the estimated coefficients.

penAFT.coef(fit.sg.cv)$beta[1:50]
##  [1] 0.08604489 0.34588667 0.52262444 0.09997556 0.19207822 0.20749261
##  [7] 0.08911747 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## [13] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## [19] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## [25] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## [31] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## [37] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## [43] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## [49] 0.00000000 0.00000000
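One could also summarize the estimated sparsity pattern by group, for example by tabulating the number of nonzero coefficients per group (a sketch, assuming \(\texttt{groups}\) has length equal to the number of predictors):

b.sg <- as.numeric(penAFT.coef(fit.sg.cv)$beta)
# number of nonzero estimated coefficients in each of the 10 groups
tapply(b.sg != 0, groups, sum)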

We see that among the fifty displayed coefficients (the first two groups), only a handful, all belonging to the first group, are estimated to be nonzero. Now, let’s remove the \(L_1\) penalty so that we penalize only at the group level. This can be done by setting \(\alpha = 0\).

groups <- rep(1:10, each = 25)
fit.sg.cv.0 <- penAFT.cv(X, logY, delta, penalty = "SG", groups = groups, alpha = 0, nlambda = 100, nfolds = 5)
## CV through:  ###                 20 % 
## CV through:  ### ###             40 % 
## CV through:  ### ### ###         60 % 
## CV through:  ### ### ### ###     80 % 
## CV through:  ### ### ### ### ###  100 %
penAFT.trace(fit.sg.cv.0)

penAFT.plot(fit.sg.cv.0)

In this situation, we would select the model with only the first group.

3.2 Weighted sparse group lasso

Now, suppose we want to incorporate weights. Say we do not want to penalize the \(L_2\) norm of the first group, nor the first two predictors individually. We can do so by specifying the weight set as shown below. Note that when any weights are equal to zero, the default choice of candidate tuning parameters may be too small: in this case, you may input your own set of candidate tuning parameters.

groups <- rep(1:10, each = 25)
weight.set <- list("w"= c(0, 0, rep(1,p-2)), "v" = c(0, rep(1,9)))
fit.sg.cv.weighted <- penAFT.cv(X, logY, delta, penalty = "SG", groups = groups, 
                                lambda = 10^seq(-0.5, -2, length=50), weight.set = weight.set, alpha = 0.5, nlambda = 100, nfolds = 5)
## Warning in penAFT.cv(X, logY, delta, penalty = "SG", groups = groups, lambda =
## 10^seq(-0.5, : It is recommended to let tuning parameters be chosen
## automatically: see documentation.
## CV through:  ###                 20 % 
## CV through:  ### ###             40 % 
## CV through:  ### ### ###         60 % 
## CV through:  ### ### ### ###     80 % 
## CV through:  ### ### ### ### ###  100 %
penAFT.trace(fit.sg.cv.weighted)
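As a quick sanity check (output not shown here), the first two predictors are unpenalized, so we would expect their estimated coefficients at the selected tuning parameter to be nonzero:

# the first two coefficients should be nonzero since w[1] = w[2] = 0
penAFT.coef(fit.sg.cv.weighted)$beta[1:2]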