In this document, we provide a short tutorial on using the \(\texttt{MCMVR}\) R package. If you encounter any errors or strange behavior, please report the issue at https://github.com/ajmolstad/MCMVR. We start by installing the package from GitHub.

install.packages("devtools")
library(devtools)
devtools::install_github("ajmolstad/MCMVR")
library(MCMVR)

First, we generate data from the “errors-in-variables” data generating model described in Section 3.1 of the article.

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] RcppArmadillo_0.9.700.2.0 Rcpp_1.0.2               
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.6.1  magrittr_1.5    tools_3.6.1     htmltools_0.3.6
##  [5] yaml_2.2.0      stringi_1.4.3   rmarkdown_1.14  knitr_1.24     
##  [9] stringr_1.4.0   xfun_0.8        digest_0.6.20   evaluate_0.14
set.seed(1)
p <- 50
q <- 10
n <- 100

# sparse p x q coefficient matrix: each entry is nonzero with probability 0.1
beta <- matrix(rnorm(p*q)*sample(c(0,1), p*q, prob = c(.9, .1), replace=TRUE), nrow=p, ncol=q)

# training data: Z is the unobserved predictor; X is Z observed with error
Z <- matrix(rnorm(n*p), nrow=n, ncol=p)
Y <- Z %*% beta + matrix(rnorm(n*q, sd=1), nrow=n)
X <- Z + matrix(rnorm(n*p, sd=sqrt(0.5)), nrow=n)

# independent test set generated in the same way
Znew <- matrix(rnorm(n*p), nrow=n, ncol=p)
Xnew <- Znew + matrix(rnorm(n*p, sd=sqrt(0.5)), nrow=n)
Ynew <- Znew %*% beta + matrix(rnorm(n*q, sd=1), nrow=n)
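
As a quick sanity check (not part of the original analysis), we can verify that roughly ten percent of the entries of \(\beta\) are nonzero and that the measurement error added to \(Z\) has approximately the intended variance:

```r
# proportion of nonzero coefficients; should be near 0.1 by construction
mean(beta != 0)

# empirical variance of the measurement error X - Z; should be near 0.5
var(as.vector(X - Z))
```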

We fit the model using the cross-validation function. There are several key arguments. The first is \(\texttt{tau.vec}\), a vector of candidate tuning parameters \(\tau\) over which to fit the model: \[ \arg \min_{\beta \in \mathbb{R}^{p \times q}} \left\{ {\rm tr}\left[n^{-1}(Y - X\beta) (\beta'\beta + \tau I_q)^{-1}(Y - X\beta)' \right] + \frac{\lambda}{\tau}{\rm Pen}(\beta)\right\}.\] Another important argument is \(\texttt{nfolds}\): if set to NULL, cross-validation is not performed and the model is fit to the complete data. Finally, a user must decide which penalty to use. The current options are \(\texttt{penalty="L1"}\) and \(\texttt{penalty="NN"}\), which set \({\rm Pen}(\beta) = \sum_{j,k}|\beta_{j,k}|\) and \({\rm Pen}(\beta) = \sum_{j=1}^{\min (p,q)} \varphi_j(\beta)\), respectively, where \(\varphi_j(\beta)\) is the \(j\)th largest singular value of \(\beta\). Note that the candidate \(\lambda\) values need not be specified directly: one inputs only the number of candidate \(\lambda\), \(\texttt{nlambda}\), and the ratio of the smallest to largest candidate \(\lambda\), \(\texttt{delta}\).
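
For intuition, a log-spaced candidate grid following this \(\texttt{nlambda}\)/\(\texttt{delta}\) convention can be constructed as below. This is only an illustration of the convention (with a hypothetical \(\texttt{lambda.max}\)), not necessarily the grid the package computes internally:

```r
# illustrative log-spaced grid from lambda.max down to delta * lambda.max
lambda.max <- 10   # hypothetical largest candidate lambda
nlambda <- 50
delta <- .01
lambda.grid <- lambda.max * 10^seq(0, log10(delta), length.out = nlambda)
range(lambda.grid)  # spans delta * lambda.max to lambda.max
```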

# ---------------------------------------------------
# Perform 5-fold CV for a grid of tuning parameters 
# ---------------------------------------------------
tau.vec <- 10^seq(3, 0, by=-.5)
fit <- MCMVR.cv(X = X, Y = Y, tau.vec = tau.vec, nlambda = 50, nfolds = 5, 
  delta = .01, tol = 1e-8, quiet= TRUE, inner.quiet= TRUE, penalty="L1")
str(fit)
## List of 10
##  $ beta        : num [1:3500, 1:50] 0 0 0 0 0 0 0 0 0 0 ...
##  $ sparsity.mat: num [1:50, 1:7] 0 1 2 2 2 2 5 6 8 8 ...
##  $ err.pred    : num [1:50, 1:7, 1:5] 6.66 6.58 6.51 6.45 6.4 ...
##  $ err.wpred   : num [1:50, 1:7, 1:5] 0.969 0.964 0.959 0.955 0.951 ...
##  $ Y.offset    : num [1:10] 0.2821 -0.1201 0.0626 -0.1167 -0.2071 ...
##  $ X.offset    : num [1:50] -0.03591 0.00857 -0.18562 -0.15828 -0.01975 ...
##  $ lambda.vec  : num [1:50] 6.12 5.57 5.07 4.62 4.2 ...
##  $ tau.vec     : num [1:7] 1000 316.2 100 31.6 10 ...
##  $ tau.min     : num 3.16
##  $ lam.min     : num 0.44
##  - attr(*, "class")= chr "MCMVR"
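
The selected tuning parameter pair is stored in \(\texttt{tau.min}\) and \(\texttt{lam.min}\). Assuming \(\texttt{err.pred}\) is indexed as (\(\lambda\), \(\tau\), fold), as its dimensions suggest, we can recover the same pair by hand:

```r
# average CV prediction error over the five folds
cv.err <- apply(fit$err.pred, c(1, 2), mean)
inds <- which(cv.err == min(cv.err), arr.ind = TRUE)
fit$lambda.vec[inds[1]]  # should match fit$lam.min
fit$tau.vec[inds[2]]     # should match fit$tau.min
```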
# ---------------------------------------------------
# visualize CV error
# ---------------------------------------------------
library(lattice)
levelplot(apply(fit$err.pred, c(1,2), mean), col.regions=grey(100:0/100),
          xlab=expression(lambda), ylab=expression(paste(tau, " index")), aspect="fill")

# ---------------------------------------------------
# get coefs from model which minimized CV error
# ---------------------------------------------------
betas <- MCMVR.coef(fit)$beta

par(mfrow=c(1,2))
image(betas!=0, col=grey(100:0/100), main="Nonzero entries of estimate")
image(beta!=0, col=grey(100:0/100), main="Nonzero entries of truth")

# -------------------------------------------------
# make predictions for Xnew
# ------------------------------------------------
preds <- MCMVR.predict(Xnew, fit)

par(mfrow=c(1,2))
plot(preds$preds[,1], Ynew[,1], pch=20, main="Response 1", 
     xlab="Our prediction", ylab="True response")
abline(0,1, lty=2)
plot(preds$preds[,2], Ynew[,2], pch=20, main="Response 2", 
     xlab="Our prediction", ylab="True response")
abline(0,1, lty=2)
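
Beyond the plots, test-set accuracy can be summarized numerically, for example via the mean squared prediction error:

```r
# mean squared prediction error on the test set, overall and per response
mean((preds$preds - Ynew)^2)
colMeans((preds$preds - Ynew)^2)
```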

# ---------------------------------------------------
# get coefs for user-specified tuning parameters
# ---------------------------------------------------
betas <- MCMVR.coef(fit, tau = fit$tau.vec[1], lambda=fit$lambda.vec[12])$beta

par(mfrow=c(1,2))
image(betas!=0, col=grey(100:0/100), main="Nonzero entries of estimate")
image(beta!=0, col=grey(100:0/100), main="Nonzero entries of truth")
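
To quantify how well the sparsity pattern is recovered, we can also compare the estimated and true supports directly. This is a rough diagnostic of our own, not a function provided by the package:

```r
# true and false positive rates for support recovery
TP <- sum(betas != 0 & beta != 0)
FP <- sum(betas != 0 & beta == 0)
c(TPR = TP / sum(beta != 0), FPR = FP / sum(beta == 0))
```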

We now fit the nuclear norm penalized version of the model. Note that it is sometimes useful to adjust the convergence tolerance for this version. We also turn off the \(\texttt{quiet}\) option: when \(\texttt{penalty = "NN"}\), this prints the nuclear norm of the estimate at each candidate tuning parameter pair; when \(\texttt{penalty = "L1"}\), it prints the number of nonzero entries in the estimate of \(\beta_*\). Note that \(\texttt{inner.quiet = FALSE}\) should only be used when determining an appropriate value of \(\texttt{tol}\).

# ---------------------------------------------------
# Perform 5-fold CV for a grid of tuning parameters 
# ---------------------------------------------------
tau.vec <- 10^seq(3, 0, by=-.5)
fit <- MCMVR.cv(X = X, Y = Y, tau.vec = tau.vec, nlambda = 20, nfolds = 5, 
  delta = .01, tol = 1e-10, quiet = FALSE, inner.quiet= TRUE, penalty="NN")
## tau = 1000 : lambda = 9.989233 : nuclear norm = 0 
## tau = 1000 : lambda = 7.83915 : nuclear norm = 0.4969919 
## tau = 1000 : lambda = 6.151851 : nuclear norm = 1.491432 
## tau = 1000 : lambda = 4.827727 : nuclear norm = 2.895799 
## tau = 1000 : lambda = 3.788607 : nuclear norm = 4.394054 
## tau = 1000 : lambda = 2.973147 : nuclear norm = 6.037959 
## tau = 1000 : lambda = 2.333207 : nuclear norm = 7.633972 
## tau = 1000 : lambda = 1.831007 : nuclear norm = 9.085348 
## tau = 1000 : lambda = 1.436901 : nuclear norm = 10.42884 
## tau = 1000 : lambda = 1.127623 : nuclear norm = 11.61476 
## tau = 1000 : lambda = 0.884913 : nuclear norm = 12.65818 
## tau = 1000 : lambda = 0.6944443 : nuclear norm = 13.56972 
## tau = 1000 : lambda = 0.5449721 : nuclear norm = 14.36509 
## tau = 1000 : lambda = 0.4276723 : nuclear norm = 15.04664 
## tau = 1000 : lambda = 0.3356201 : nuclear norm = 15.62766 
## tau = 1000 : lambda = 0.2633812 : nuclear norm = 16.11633 
## tau = 1000 : lambda = 0.206691 : nuclear norm = 16.5221 
## tau = 1000 : lambda = 0.1622028 : nuclear norm = 16.85975 
## tau = 1000 : lambda = 0.1272903 : nuclear norm = 17.13038 
## tau = 1000 : lambda = 0.09989233 : nuclear norm = 17.35113 
## tau = 316.2278 : lambda = 9.989233 : nuclear norm = 0 
## tau = 316.2278 : lambda = 7.83915 : nuclear norm = 0.5005064 
## tau = 316.2278 : lambda = 6.151851 : nuclear norm = 1.498667 
## tau = 316.2278 : lambda = 4.827727 : nuclear norm = 2.905169 
## tau = 316.2278 : lambda = 3.788607 : nuclear norm = 4.403946 
## tau = 316.2278 : lambda = 2.973147 : nuclear norm = 6.048851 
## tau = 316.2278 : lambda = 2.333207 : nuclear norm = 7.645307 
## tau = 316.2278 : lambda = 1.831007 : nuclear norm = 9.097865 
## tau = 316.2278 : lambda = 1.436901 : nuclear norm = 10.44415 
## tau = 316.2278 : lambda = 1.127623 : nuclear norm = 11.6329 
## tau = 316.2278 : lambda = 0.884913 : nuclear norm = 12.67971 
## tau = 316.2278 : lambda = 0.6944443 : nuclear norm = 13.59708 
## tau = 316.2278 : lambda = 0.5449721 : nuclear norm = 14.39732 
## tau = 316.2278 : lambda = 0.4276723 : nuclear norm = 15.08625 
## tau = 316.2278 : lambda = 0.3356201 : nuclear norm = 15.67322 
## tau = 316.2278 : lambda = 0.2633812 : nuclear norm = 16.16929 
## tau = 316.2278 : lambda = 0.206691 : nuclear norm = 16.58112 
## tau = 316.2278 : lambda = 0.1622028 : nuclear norm = 16.92016 
## tau = 316.2278 : lambda = 0.1272903 : nuclear norm = 17.19877 
## tau = 316.2278 : lambda = 0.09989233 : nuclear norm = 17.42263 
## tau = 100 : lambda = 9.989233 : nuclear norm = 0 
## tau = 100 : lambda = 7.83915 : nuclear norm = 0.5115146 
## tau = 100 : lambda = 6.151851 : nuclear norm = 1.520727 
## tau = 100 : lambda = 4.827727 : nuclear norm = 2.933977 
## tau = 100 : lambda = 3.788607 : nuclear norm = 4.434609 
## tau = 100 : lambda = 2.973147 : nuclear norm = 6.080995 
## tau = 100 : lambda = 2.333207 : nuclear norm = 7.679731 
## tau = 100 : lambda = 1.831007 : nuclear norm = 9.135377 
## tau = 100 : lambda = 1.436901 : nuclear norm = 10.48816 
## tau = 100 : lambda = 1.127623 : nuclear norm = 11.68609 
## tau = 100 : lambda = 0.884913 : nuclear norm = 12.74612 
## tau = 100 : lambda = 0.6944443 : nuclear norm = 13.68037 
## tau = 100 : lambda = 0.5449721 : nuclear norm = 14.498 
## tau = 100 : lambda = 0.4276723 : nuclear norm = 15.20645 
## tau = 100 : lambda = 0.3356201 : nuclear norm = 15.81302 
## tau = 100 : lambda = 0.2633812 : nuclear norm = 16.3281 
## tau = 100 : lambda = 0.206691 : nuclear norm = 16.75909 
## tau = 100 : lambda = 0.1622028 : nuclear norm = 17.11421 
## tau = 100 : lambda = 0.1272903 : nuclear norm = 17.40677 
## tau = 100 : lambda = 0.09989233 : nuclear norm = 17.64404 
## tau = 31.62278 : lambda = 9.989233 : nuclear norm = 0 
## tau = 31.62278 : lambda = 7.83915 : nuclear norm = 0.5472109 
## tau = 31.62278 : lambda = 6.151851 : nuclear norm = 1.588968 
## tau = 31.62278 : lambda = 4.827727 : nuclear norm = 3.02009 
## tau = 31.62278 : lambda = 3.788607 : nuclear norm = 4.523484 
## tau = 31.62278 : lambda = 2.973147 : nuclear norm = 6.172042 
## tau = 31.62278 : lambda = 2.333207 : nuclear norm = 7.773269 
## tau = 31.62278 : lambda = 1.831007 : nuclear norm = 9.235778 
## tau = 31.62278 : lambda = 1.436901 : nuclear norm = 10.6041 
## tau = 31.62278 : lambda = 1.127623 : nuclear norm = 11.82657 
## tau = 31.62278 : lambda = 0.884913 : nuclear norm = 12.92126 
## tau = 31.62278 : lambda = 0.6944443 : nuclear norm = 13.90005 
## tau = 31.62278 : lambda = 0.5449721 : nuclear norm = 14.77008 
## tau = 31.62278 : lambda = 0.4276723 : nuclear norm = 15.53716 
## tau = 31.62278 : lambda = 0.3356201 : nuclear norm = 16.20631 
## tau = 31.62278 : lambda = 0.2633812 : nuclear norm = 16.7826 
## tau = 31.62278 : lambda = 0.206691 : nuclear norm = 17.2738 
## tau = 31.62278 : lambda = 0.1622028 : nuclear norm = 17.68696 
## tau = 31.62278 : lambda = 0.1272903 : nuclear norm = 18.03076 
## tau = 31.62278 : lambda = 0.09989233 : nuclear norm = 18.31303 
## tau = 10 : lambda = 9.989233 : nuclear norm = 0 
## tau = 10 : lambda = 7.83915 : nuclear norm = 0.6627698 
## tau = 10 : lambda = 6.151851 : nuclear norm = 1.788089 
## tau = 10 : lambda = 4.827727 : nuclear norm = 3.248576 
## tau = 10 : lambda = 3.788607 : nuclear norm = 4.740305 
## tau = 10 : lambda = 2.973147 : nuclear norm = 6.378854 
## tau = 10 : lambda = 2.333207 : nuclear norm = 7.967467 
## tau = 10 : lambda = 1.831007 : nuclear norm = 9.424426 
## tau = 10 : lambda = 1.436901 : nuclear norm = 10.80619 
## tau = 10 : lambda = 1.127623 : nuclear norm = 12.06395 
## tau = 10 : lambda = 0.884913 : nuclear norm = 13.21911 
## tau = 10 : lambda = 0.6944443 : nuclear norm = 14.2844 
## tau = 10 : lambda = 0.5449721 : nuclear norm = 15.26726 
## tau = 10 : lambda = 0.4276723 : nuclear norm = 16.17295 
## tau = 10 : lambda = 0.3356201 : nuclear norm = 17.00502 
## tau = 10 : lambda = 0.2633812 : nuclear norm = 17.76579 
## tau = 10 : lambda = 0.206691 : nuclear norm = 18.45903 
## tau = 10 : lambda = 0.1622028 : nuclear norm = 19.08842 
## tau = 10 : lambda = 0.1272903 : nuclear norm = 19.65529 
## tau = 10 : lambda = 0.09989233 : nuclear norm = 20.16657 
## tau = 3.162278 : lambda = 9.989233 : nuclear norm = 0 
## tau = 3.162278 : lambda = 7.83915 : nuclear norm = 1.038415 
## tau = 3.162278 : lambda = 6.151851 : nuclear norm = 2.249844 
## tau = 3.162278 : lambda = 4.827727 : nuclear norm = 3.677558 
## tau = 3.162278 : lambda = 3.788607 : nuclear norm = 5.069798 
## tau = 3.162278 : lambda = 2.973147 : nuclear norm = 6.647228 
## tau = 3.162278 : lambda = 2.333207 : nuclear norm = 8.119999 
## tau = 3.162278 : lambda = 1.831007 : nuclear norm = 9.474282 
## tau = 3.162278 : lambda = 1.436901 : nuclear norm = 10.78757 
## tau = 3.162278 : lambda = 1.127623 : nuclear norm = 12.01057 
## tau = 3.162278 : lambda = 0.884913 : nuclear norm = 13.17215 
## tau = 3.162278 : lambda = 0.6944443 : nuclear norm = 14.29402 
## tau = 3.162278 : lambda = 0.5449721 : nuclear norm = 15.3968 
## tau = 3.162278 : lambda = 0.4276723 : nuclear norm = 16.50087 
## tau = 3.162278 : lambda = 0.3356201 : nuclear norm = 17.61678 
## tau = 3.162278 : lambda = 0.2633812 : nuclear norm = 18.73428 
## tau = 3.162278 : lambda = 0.206691 : nuclear norm = 19.83772 
## tau = 3.162278 : lambda = 0.1622028 : nuclear norm = 20.9153 
## tau = 3.162278 : lambda = 0.1272903 : nuclear norm = 21.96227 
## tau = 3.162278 : lambda = 0.09989233 : nuclear norm = 22.97429 
## tau = 1 : lambda = 9.989233 : nuclear norm = 0 
## tau = 1 : lambda = 7.83915 : nuclear norm = 1.401798 
## tau = 1 : lambda = 6.151851 : nuclear norm = 2.744489 
## tau = 1 : lambda = 4.827727 : nuclear norm = 3.88227 
## tau = 1 : lambda = 3.788607 : nuclear norm = 5.222365 
## tau = 1 : lambda = 2.973147 : nuclear norm = 6.613793 
## tau = 1 : lambda = 2.333207 : nuclear norm = 7.758001 
## tau = 1 : lambda = 1.831007 : nuclear norm = 8.889974 
## tau = 1 : lambda = 1.436901 : nuclear norm = 10.02008 
## tau = 1 : lambda = 1.127623 : nuclear norm = 11.08257 
## tau = 1 : lambda = 0.884913 : nuclear norm = 12.118 
## tau = 1 : lambda = 0.6944443 : nuclear norm = 13.14494 
## tau = 1 : lambda = 0.5449721 : nuclear norm = 14.17462 
## tau = 1 : lambda = 0.4276723 : nuclear norm = 15.21416 
## tau = 1 : lambda = 0.3356201 : nuclear norm = 16.26758 
## tau = 1 : lambda = 0.2633812 : nuclear norm = 17.33527 
## tau = 1 : lambda = 0.206691 : nuclear norm = 18.41499 
## tau = 1 : lambda = 0.1622028 : nuclear norm = 19.50372 
## tau = 1 : lambda = 0.1272903 : nuclear norm = 20.59899 
## tau = 1 : lambda = 0.09989233 : nuclear norm = 21.69823 
## Through CV fold 1 
## Through CV fold 2 
## Through CV fold 3 
## Through CV fold 4 
## Through CV fold 5
str(fit)
## List of 10
##  $ beta        : num [1:3500, 1:20] 0 0 0 0 0 0 0 0 0 0 ...
##  $ sparsity.mat: num [1:20, 1:7] 0 500 500 500 500 500 500 500 500 500 ...
##  $ err.pred    : num [1:20, 1:7, 1:5] 6.61 6.4 6.07 5.58 5.11 ...
##  $ err.wpred   : num [1:20, 1:7, 1:5] 0.979 0.954 0.906 0.836 0.775 ...
##  $ Y.offset    : num [1:10] 0.2821 -0.1201 0.0626 -0.1167 -0.2071 ...
##  $ X.offset    : num [1:50] -0.03591 0.00857 -0.18562 -0.15828 -0.01975 ...
##  $ lambda.vec  : num [1:20] 9.99 7.84 6.15 4.83 3.79 ...
##  $ tau.vec     : num [1:7] 1000 316.2 100 31.6 10 ...
##  $ tau.min     : num 1000
##  $ lam.min     : num 1.44
##  - attr(*, "class")= chr "MCMVR"
levelplot(apply(fit$err.pred, c(1,2), mean), col.regions=grey(100:0/100),
          xlab=expression(lambda), ylab=expression(tau), aspect="fill")
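
Since the nuclear norm penalty encourages a low-rank estimate, it can also be informative to inspect the singular values of the CV-selected coefficient matrix; many should be zero or nearly so:

```r
# singular values of the CV-selected nuclear norm estimate
betas.nn <- MCMVR.coef(fit)$beta
round(svd(betas.nn)$d, 4)
```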