In this document, we provide a short tutorial on how to use the \(\texttt{MSRL}\) R package. If you encounter any errors or strange behavior, please report the issue at https://github.com/ajmolstad/MSRL. We start by downloading this package from GitHub.

install.packages("devtools")
library(devtools)
devtools::install_github("ajmolstad/MSRL")
library(MSRL)

The multivariate square-root lasso

The multivariate square-root lasso estimator is defined as \[\begin{equation}\label{eq:MSRL} \arg\min_{\beta \in \mathbb{R}^{p \times q}} \left\{ \frac{1}{\sqrt{n}}\|\mathbf{Y} - \mathbf{X}\beta\|_* + \lambda \sum_{j,k}w_{j,k}|\beta_{j,k}|\right\}, \end{equation}\] where \(\|\cdot\|_*\) is the nuclear norm, i.e., the norm which sums the singular values of its matrix argument, and the \(w_{j,k}\) are nonnegative weights specified by the user. Note that in this definition, \(\mathbf{Y}\) and \(\mathbf{X}\) have been centered by their columnwise means (to remove the intercept); the \(Y\) and \(X\) input into the function should not be centered, as the centering is handled internally. The tuning parameter \(\lambda > 0\) is also user-specified: we provide functions below to help the user determine an appropriate value for \(\lambda\).
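
To make the criterion concrete, here is a small sketch (not part of the package) that evaluates the objective above for a given coefficient matrix, assuming unit weights \(w_{j,k} = 1\) by default:

# Sketch (not part of MSRL): evaluate the objective in the display above.
# W may be a scalar or a p x q matrix of weights; W = 1 gives unit weights.
msrl.objective <- function(X, Y, beta, lambda, W = 1) {
  Yc <- scale(Y, center = TRUE, scale = FALSE)  # columnwise centering, as in the definition
  Xc <- scale(X, center = TRUE, scale = FALSE)
  R <- Yc - Xc %*% beta                         # residual matrix
  sum(svd(R)$d) / sqrt(nrow(Y)) +               # nuclear norm term
    lambda * sum(W * abs(beta))                 # weighted l1 penalty
}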

Generate data

To demonstrate how to use the package, we generate some data from Model 1 of the article “Insights and algorithms for the multivariate square-root lasso”.

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Matrix_1.2-18
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.6.1  magrittr_1.5    tools_3.6.1     htmltools_0.4.0
##  [5] yaml_2.2.1      Rcpp_1.0.4      stringi_1.4.6   rmarkdown_2.1  
##  [9] grid_3.6.1      knitr_1.28      stringr_1.4.0   xfun_0.12      
## [13] digest_0.6.25   rlang_0.4.5     lattice_0.20-40 evaluate_0.14
# --------------------------------
# Preliminaries
# --------------------------------
set.seed(1)
p <- 100
n <- 70
q <- 20
ntest <- 100

# --------------------------------
# Generate predictors
# --------------------------------
SigmaX <- matrix(0, nrow=p, ncol=p)
for(k in 1:p){
  for(j in 1:p){
    SigmaX[j,k] <- .5^abs(j-k)
  }
}
diag(SigmaX) <- 1
eo <- eigen(SigmaX)
SigmaXsqrt <- eo$vec%*%diag(eo$val^.5)%*%t(eo$vec)
X <- matrix(rnorm(n*p), nrow=n)%*%SigmaXsqrt
Xtest <- matrix(rnorm(ntest*p), nrow=ntest)%*%SigmaXsqrt
# -----------------------------------
# Generate regression coefficients
# -----------------------------------
beta <- matrix(0, nrow=p, ncol=q)
beta[sample(1:(p*q), floor(p*q*.04), replace=TRUE)] <- 1

# -------------------------------------
# Generate responses
# -------------------------------------
Sigma <- matrix(0, nrow=q, ncol=q)
for(k in 1:q){
  for(j in 1:q){
    Sigma[j,k] <- .9^abs(j-k)
  }
}
diag(Sigma) <- 1
eo <- eigen(Sigma)
Sigmasqrt <- eo$vec%*%diag(eo$val^.5)%*%t(eo$vec)
Y <- X%*%beta + matrix(rnorm(n*q), nrow=n)%*%Sigmasqrt
Ytest <- Xtest%*%beta + matrix(rnorm(ntest*q), nrow=ntest)%*%Sigmasqrt

Model fitting

First, we show how to fit the model without cross-validation by setting the argument \(\texttt{nfolds = NULL}\). To track progress, we set \(\texttt{quiet = FALSE}\). Since \(n > q\), we leave \(\texttt{ADMM = FALSE}\); the function will switch to ADMM automatically if the residual matrix has fewer than \(q\) nonzero singular values.

mod1 <- MSRL.cv(X = X, Y = Y, nlambda = 10, standardize = FALSE, ADMM = FALSE, 
                nfolds = NULL,  weighted = FALSE,  delta = .25, tol = 1e-8, quiet = FALSE,
                inner.quiet = TRUE)
## 1 : non-zero =  0 
## # ------------------------------  
## 2 : non-zero =  3 
## # ------------------------------  
## 3 : non-zero =  12 
## # ------------------------------  
## 4 : non-zero =  30 
## # ------------------------------  
## 5 : non-zero =  57 
## # ------------------------------  
## 6 : non-zero =  77 
## # ------------------------------  
## 7 : non-zero =  102 
## # ------------------------------  
## 8 : non-zero =  127 
## # ------------------------------  
## 9 : non-zero =  180 
## # ------------------------------  
## 10 : non-zero =  269 
## # ------------------------------

If we set \(\texttt{inner.quiet = FALSE}\) and \(\texttt{ADMM = FALSE}\), we also print the quantities \(r_1\) and \(r_2\) defined below (displayed as \(\texttt{r1}\) and \(\texttt{s1}\) in the output). These measure how nearly the first-order conditions are satisfied and are printed every 10 iterations: \[ r_1 = \frac{\sum_{j,k} 1\left(\hat{\beta}_{j,k} = 0 \cap |[\mathbf{X}'UV']_{j,k}| \geq \sqrt{n} w_{j,k}\lambda\right) }{\sum_{j,k} 1(\hat{\beta}_{j,k} = 0)}\] and \[ r_2 = \max_{(j,k):\hat{\beta}_{j,k} \neq 0} \left|\frac{1}{\sqrt{n}}[\mathbf{X}'UV']_{j,k} - w_{j,k}\lambda\,{\rm sign}(\hat{\beta}_{j,k})\right|,\] where \((U,D,V) = {\rm svd}(\mathbf{Y} - \mathbf{X}\hat{\beta})\). Setting \(\texttt{inner.quiet = FALSE}\) may be useful if convergence is slow with default settings. Here, we decrease the convergence tolerance to display (slightly) more output for the sake of example.
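
For reference, these quantities could be computed from a fitted coefficient matrix as in the following sketch (a hypothetical helper, not part of the package; it assumes unit weights and centered data \(\texttt{Xc}\) and \(\texttt{Yc}\)):

# Sketch: check the first-order conditions for a candidate solution,
# assuming unit weights w_{j,k} = 1 and centered data Xc, Yc.
check.KKT <- function(Xc, Yc, betahat, lambda) {
  n <- nrow(Yc)
  sv <- svd(Yc - Xc %*% betahat)
  G <- crossprod(Xc, sv$u %*% t(sv$v))   # X'UV'
  zero <- (betahat == 0)
  # fraction of zero entries violating the subgradient bound
  r1 <- sum(zero & (abs(G) >= sqrt(n) * lambda)) / sum(zero)
  # worst violation of the stationarity condition over nonzero entries
  r2 <- max(abs(G[!zero] / sqrt(n) - lambda * sign(betahat[!zero])))
  list(r1 = r1, r2 = r2)
}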

mod2 <- MSRL.cv(X = X, Y = Y, nlambda = 10, standardize = FALSE, ADMM = FALSE, 
                nfolds = NULL, weighted = FALSE,   delta = .1, tol = 1e-16, quiet = FALSE,
                inner.quiet = FALSE)
## 1 : non-zero =  0 
## # ------------------------------  
## 2 : non-zero =  8 
## # ------------------------------  
## r1 =  2.850173e-14 ; s1 =  1.767457e-14 
## 3 : non-zero =  34 
## # ------------------------------  
## r1 =  9.753467e-13 ; s1 =  5.645532e-13 
## 4 : non-zero =  77 
## # ------------------------------  
## r1 =  2.501008e-15 ; s1 =  1.289917e-15 
## 5 : non-zero =  114 
## # ------------------------------  
## 6 : non-zero =  209 
## # ------------------------------  
## 7 : non-zero =  370 
## # ------------------------------  
## r1 =  5.166193e-12 ; s1 =  1.472256e-12 
## 8 : non-zero =  580 
## # ------------------------------  
## r1 =  6.010297e-10 ; s1 =  1.219004e-10 
## r1 =  3.470459e-13 ; s1 =  6.63826e-14 
## r1 =  8.001176e-16 ; s1 =  1.51717e-16 
## 9 : non-zero =  801 
## # ------------------------------  
## r1 =  4.99698e-08 ; s1 =  8.959355e-09 
## r1 =  1.076174e-09 ; s1 =  2.036444e-10 
## r1 =  2.688728e-11 ; s1 =  5.183093e-12 
## r1 =  9.157168e-13 ; s1 =  1.72084e-13 
## r1 =  3.685779e-14 ; s1 =  6.921522e-15 
## r1 =  1.559806e-15 ; s1 =  2.957224e-16 
## 10 : non-zero =  1001 
## # ------------------------------

Checking the output, we have:

str(mod2)
## List of 13
##  $ beta        :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. ..@ i       : int [1:3194] 508 540 660 746 786 839 1256 1654 0 229 ...
##   .. ..@ p       : int [1:11] 0 0 8 42 119 233 442 812 1392 2193 ...
##   .. ..@ Dim     : int [1:2] 2000 10
##   .. ..@ Dimnames:List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : NULL
##   .. ..@ x       : num [1:3194] 0.2392 0.0802 0.5712 0.033 0.1546 ...
##   .. ..@ factors : list()
##  $ sparsity.mat: num [1:10] 0 8 34 77 114 ...
##  $ err.pred    : NULL
##  $ err.wpred   : NULL
##  $ err.spec    : NULL
##  $ err.nuc     : NULL
##  $ Y.mean      : num [1:20] 0.333 0.009 0.04 0.186 -0.222 ...
##  $ X.mean      : num [1:100] 0.1468 0.0199 0.0464 -0.027 0.0928 ...
##  $ Y.sd        : num [1:20] 2.5 3.29 2.72 2.8 1.7 ...
##  $ X.sd        : num [1:100] 0.938 0.831 1.099 1.012 0.951 ...
##  $ lambda.vec  : num [1:10] 0.803 0.622 0.482 0.373 0.289 ...
##  $ lam.min     : NULL
##  $ standardize : logi FALSE
##  - attr(*, "class")= chr "MSRL"

Note that the estimates of \(\beta\) for all candidate tuning parameters are stored together as a single sparse matrix; to extract the coefficient matrix for a particular tuning parameter, use the \(\texttt{MSRL.coef}\) function detailed below.
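
Each column of \(\texttt{mod2\$beta}\) holds the vectorized \(p \times q\) coefficient matrix for one candidate tuning parameter (note the dimension \(2000 \times 10 = pq \times \texttt{nlambda}\) above). \(\texttt{MSRL.coef}\) handles the reshaping (and any rescaling) for you, but as a quick illustration:

# Illustration: reshape the 5th column back into a p x q coefficient matrix
# (MSRL.coef is the supported interface for this)
beta5 <- matrix(mod2$beta[, 5], nrow = p, ncol = q)
sum(beta5 != 0)  # should match mod2$sparsity.mat[5]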

Cross-validation

Now we set \(\texttt{nfolds = 5}\) to perform 5-fold cross-validation. Note that this actually fits the model \(\texttt{nfolds} + 1\) times since it first fits the model to the entire dataset.

mod3 <- MSRL.cv(X = X, Y = Y, nlambda = 50, standardize = FALSE, ADMM = FALSE, 
                nfolds = 5,  weighted = FALSE, delta = .1, tol = 1e-8, quiet = FALSE, 
                inner.quiet = TRUE)
## 1 : non-zero =  0 
## # ------------------------------  
## 2 : non-zero =  1 
## # ------------------------------  
## 3 : non-zero =  1 
## # ------------------------------  
## 4 : non-zero =  2 
## # ------------------------------  
## 5 : non-zero =  5 
## # ------------------------------  
## 6 : non-zero =  7 
## # ------------------------------  
## 7 : non-zero =  9 
## # ------------------------------  
## 8 : non-zero =  14 
## # ------------------------------  
## 9 : non-zero =  21 
## # ------------------------------  
## 10 : non-zero =  24 
## # ------------------------------  
## 11 : non-zero =  31 
## # ------------------------------  
## 12 : non-zero =  35 
## # ------------------------------  
## 13 : non-zero =  46 
## # ------------------------------  
## 14 : non-zero =  57 
## # ------------------------------  
## 15 : non-zero =  60 
## # ------------------------------  
## 16 : non-zero =  71 
## # ------------------------------  
## 17 : non-zero =  76 
## # ------------------------------  
## 18 : non-zero =  86 
## # ------------------------------  
## 19 : non-zero =  92 
## # ------------------------------  
## 20 : non-zero =  101 
## # ------------------------------  
## 21 : non-zero =  103 
## # ------------------------------  
## 22 : non-zero =  108 
## # ------------------------------  
## 23 : non-zero =  117 
## # ------------------------------  
## 24 : non-zero =  130 
## # ------------------------------  
## 25 : non-zero =  144 
## # ------------------------------  
## 26 : non-zero =  158 
## # ------------------------------  
## 27 : non-zero =  177 
## # ------------------------------  
## 28 : non-zero =  203 
## # ------------------------------  
## 29 : non-zero =  228 
## # ------------------------------  
## 30 : non-zero =  254 
## # ------------------------------  
## 31 : non-zero =  284 
## # ------------------------------  
## 32 : non-zero =  307 
## # ------------------------------  
## 33 : non-zero =  348 
## # ------------------------------  
## 34 : non-zero =  386 
## # ------------------------------  
## 35 : non-zero =  419 
## # ------------------------------  
## 36 : non-zero =  457 
## # ------------------------------  
## 37 : non-zero =  490 
## # ------------------------------  
## 38 : non-zero =  539 
## # ------------------------------  
## 39 : non-zero =  578 
## # ------------------------------  
## 40 : non-zero =  622 
## # ------------------------------  
## 41 : non-zero =  660 
## # ------------------------------  
## 42 : non-zero =  697 
## # ------------------------------  
## 43 : non-zero =  737 
## # ------------------------------  
## 44 : non-zero =  769 
## # ------------------------------  
## 45 : non-zero =  818 
## # ------------------------------  
## 46 : non-zero =  864 
## # ------------------------------  
## 47 : non-zero =  893 
## # ------------------------------  
## 48 : non-zero =  914 
## # ------------------------------  
## 49 : non-zero =  947 
## # ------------------------------  
## 50 : non-zero =  1001 
## # ------------------------------  
## # ------------------------------  
## Through CV fold 1 
## # ------------------------------  
## # ------------------------------  
## Through CV fold 2 
## # ------------------------------  
## # ------------------------------  
## Through CV fold 3 
## # ------------------------------  
## # ------------------------------  
## Through CV fold 4 
## # ------------------------------  
## # ------------------------------  
## Through CV fold 5 
## # ------------------------------

In the output, there are four different metrics used to measure performance in cross-validation. Let \(\mathbf{Y}_{-k}\) and \(\mathbf{X}_{-k}\) be the left-out-fold responses and predictors, which have been centered (and standardized, when \(\texttt{standardize = TRUE}\)) based on the left-in-fold data. The four metrics, stored as \(\texttt{err.pred}\), \(\texttt{err.wpred}\), \(\texttt{err.spec}\), and \(\texttt{err.nuc}\), are the squared Frobenius norm, a weighted version of the squared Frobenius norm, the spectral norm, and the nuclear norm of the left-out residual matrix \(\mathbf{Y}_{-k} - \mathbf{X}_{-k}\hat{\beta}_\lambda\), respectively.
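
For instance, the first metric for a single fold could be computed as in the following sketch (the in-fold data \(\texttt{Xin}\), \(\texttt{Yin}\) and held-out data \(\texttt{Xout}\), \(\texttt{Yout}\) are hypothetical placeholders):

# Sketch: squared Frobenius prediction error on one held-out fold,
# centering the held-out data by the in-fold columnwise means.
fold.err <- function(Xin, Yin, Xout, Yout, betahat) {
  Yc <- scale(Yout, center = colMeans(Yin), scale = FALSE)
  Xc <- scale(Xout, center = colMeans(Xin), scale = FALSE)
  sum((Yc - Xc %*% betahat)^2)  # squared Frobenius norm
}

Below, we plot the average squared Frobenius norm prediction error across folds, with a vertical line at the minimizing tuning parameter: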

plot(x = log2(mod3$lambda.vec), y = rowMeans(mod3$err.pred), pch=20, 
     xlab=expression(paste(log[2], "(", lambda,")", sep="")), ylab="CV prediction error")
abline(v = log2(mod3$lambda.vec[which.min(rowMeans(mod3$err.pred))]), lty=2)

Additional options

The other options work as follows:

- \(\texttt{nlambda}\): the number of candidate tuning parameters.
- \(\texttt{delta}\): the ratio of the smallest to the largest candidate tuning parameter; candidates are log-spaced between \(\lambda_{\max}\), the smallest \(\lambda\) which sets \(\hat{\beta} = 0\), and \(\delta\lambda_{\max}\).
- \(\texttt{nfolds}\): the number of cross-validation folds; set \(\texttt{nfolds = NULL}\) to skip cross-validation.
- \(\texttt{standardize}\): if \(\texttt{TRUE}\), the columns of \(X\) are scaled to have unit standard deviation before model fitting.
- \(\texttt{weighted}\): if \(\texttt{TRUE}\), nonconstant weights \(w_{j,k}\) are used in the penalty; with \(\texttt{weighted = FALSE}\), all \(w_{j,k} = 1\).
- \(\texttt{ADMM}\): if \(\texttt{TRUE}\), the ADMM algorithm is used for model fitting; as noted above, the function switches to ADMM automatically when the residual matrix has fewer than \(q\) nonzero singular values.
- \(\texttt{tol}\): the convergence tolerance for the fitting algorithm.
- \(\texttt{quiet}\) and \(\texttt{inner.quiet}\): control the progress printing demonstrated above.
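
For example, a call requesting standardized predictors and a weighted penalty looks like this (not run here; same data as above):

# Example (not run): standardized predictors and weighted penalty
mod4 <- MSRL.cv(X = X, Y = Y, nlambda = 25, standardize = TRUE, ADMM = FALSE,
                nfolds = 5, weighted = TRUE, delta = .1, tol = 1e-8,
                quiet = TRUE, inner.quiet = TRUE)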

Predictions

We can obtain predicted values for a new set of predictors. As a default tuning parameter after cross-validation, we use the tuning parameter which minimizes the squared Frobenius norm prediction error (stored as \(\texttt{lam.min}\)).

fitted.preds <- MSRL.predict(Xnew = Xtest, fit = mod3, lambda = mod3$lam.min)
str(fitted.preds)
## List of 3
##  $ pred : num [1:100, 1:20] -2.994 -2.116 -1.218 1.008 -0.335 ...
##  $ beta0: num [1:20, 1] 0.0932 0.0362 0.0835 0.124 0.0399 ...
##  $ beta : num [1:100, 1:20] 0.958 0.904 0 0 0 ...
fitted.preds$pred[1:10, 1:10]
##             [,1]       [,2]       [,3]        [,4]        [,5]       [,6]
##  [1,] -2.9943838  1.7307768  1.7349441 -1.90440454  1.11935961 -1.0451513
##  [2,] -2.1159994 -1.4359749  0.0388384  0.97334015 -0.17965675 -0.3284218
##  [3,] -1.2182449 -0.2460927 -0.6745564 -4.38767909  0.31264379  0.8669372
##  [4,]  1.0083038 -2.0744347  0.8545038 -0.58502275 -0.78632120  0.7552486
##  [5,] -0.3354972 -2.3121015  1.3303996 -2.08789979 -1.70686707 -0.8036046
##  [6,]  0.8315909  2.6911306  2.8036538  0.03086671 -0.86009981 -0.5728383
##  [7,] -1.7031399  0.7694887 -2.5919184  0.88837062 -0.02330312 -0.3914578
##  [8,]  0.9085682 -0.4710284  0.3668158  1.43535635  3.06529147 -1.1743479
##  [9,]  1.2981999 -1.4923025 -1.5186350  3.78140926 -0.66816132 -0.4353316
## [10,]  4.1480585  2.9838544 -0.3917409  1.83579080 -1.42196579 -0.2157332
##             [,7]       [,8]        [,9]       [,10]
##  [1,] -0.9263431  0.3798833  0.55144178  1.14999685
##  [2,]  1.7694311 -0.9972666  0.43319530 -4.02035785
##  [3,]  3.6927030  0.2378163  0.60106551  2.45014105
##  [4,] -1.6760365  0.7431860  1.44357221 -1.39582094
##  [5,] -4.6227727  2.1014535  0.83328784  1.12653724
##  [6,]  0.9129801  1.4583513 -0.59350206  4.18385564
##  [7,]  2.7009649 -2.3050342 -1.01243142  0.53281047
##  [8,] -1.7403944 -0.3645940 -1.53570486 -1.49968315
##  [9,]  0.3362206 -0.4548674 -0.00745502 -5.27493974
## [10,]  1.4488326 -1.8292514 -0.33752744 -0.04516904
plot(fitted.preds$pred[,1], Ytest[,1], pch=20, ylab="True response", 
     xlab="Predicted response", main="Response variable #1")
abline(0,1, lty=2)
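
The fitted values are the slope term plus the intercept recycled across rows; we can verify this decomposition directly (a quick check using the list elements shown above):

# Check: predictions should equal Xnew %*% beta plus the intercept,
# with beta0 repeated across the rows of the new data.
manual.pred <- Xtest %*% fitted.preds$beta +
  tcrossprod(rep(1, ntest), fitted.preds$beta0)
max(abs(manual.pred - fitted.preds$pred))  # should be numerically zero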

Finally, we can also extract both the regression coefficient matrix and the intercept. Below, we display the estimated coefficient matrix with nonzero entries highlighted in grey, followed by the true \(\beta\) for comparison.

fitted.coefs <- MSRL.coef(fit = mod3, lambda = mod3$lambda.vec[12])
str(fitted.coefs) 
## List of 2
##  $ beta0: num [1:20, 1] 0.3119 0.009 0.0507 0.178 -0.0887 ...
##  $ beta : num [1:100, 1:20] 0.144 0 0 0 0 ...
temp <- bquote("Estimate of "~beta~"with"~lambda~ "="~.(round(mod3$lambda.vec[12], 4)))
image(t(abs(fitted.coefs$beta)), main=temp, xlab="Responses", ylab="Predictors", col = grey(100:0/100))

image(t(abs(beta)), main=expression(paste("True ", beta)),  xlab="Responses", ylab="Predictors", col = grey(100:0/100))
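
To quantify the comparison, we can also tabulate how well the estimated support matches the truth:

# Compare the estimated support to the true support
TP <- sum(fitted.coefs$beta != 0 & beta != 0)  # correctly identified nonzero entries
FP <- sum(fitted.coefs$beta != 0 & beta == 0)  # spurious nonzero entries
FN <- sum(fitted.coefs$beta == 0 & beta != 0)  # missed nonzero entries
c(TP = TP, FP = FP, FN = FN)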