Title: | High Dimensional Confidence Interval Based on Lasso and Bootstrap |
---|---|
Description: | Fits regression models on high dimensional data to estimate coefficients and use bootstrap method to obtain confidence intervals. Choices for regression models are Lasso, Lasso+OLS, Lasso partial ridge, Lasso+OLS partial ridge. |
Authors: | Hanzhong Liu, Xin Xu, Jingyi Jessica Li |
Maintainer: | Xin Xu <[email protected]> |
License: | GNU General Public License version 2 |
Version: | 1.0-2 |
Built: | 2025-03-10 04:03:48 UTC |
Source: | https://github.com/xinxuyale/hdci |
Does residual (or paired) bootstrap Lasso and produces confidence intervals for regression coefficients.
bootLasso(x, y, B = 500, type.boot = "residual", alpha = 0.05, cv.method = "cv", nfolds = 10, foldid, cv.OLS = FALSE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, parallel.boot = FALSE, ncores.boot = 1, ...)
bootLasso(x, y, B = 500, type.boot = "residual", alpha = 0.05, cv.method = "cv", nfolds = 10, foldid, cv.OLS = FALSE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, parallel.boot = FALSE, ncores.boot = 1, ...)
x |
Input matrix as in glmnet, of dimension nobs x nvars; each row is an observation vector. |
y |
Response variable. |
B |
Number of replications in the bootstrap – default is 500. |
type.boot |
Bootstrap method which can take one of the following two values: "residual" or "paired". The default is residual. |
alpha |
Significance level – default is 0.05. |
cv.method |
The method used to select lambda in the Lasso – can be cv, cv1se, and escv; the default is cv. |
nfolds , foldid , cv.OLS , tau , parallel
|
Arguments that can be passed to escv.glmnet. |
standardize |
Logical flag for x variable standardization, prior to fitting the model. Default is standardize=TRUE. |
intercept |
Should intercept be fitted (default is TRUE) or set to zero (FALSE). |
parallel.boot |
If TRUE, use parallel foreach to run the bootstrap replication. Must register parallel before hand, such as doParallel or others. See the example below. |
ncores.boot |
Number of cores used in the bootstrap replication. |
... |
Other arguments that can be passed to glmnet. |
The function runs residual (type.boot="residual") or paired (type.boot="paired") bootstrap Lasso procedure, and produces confidence interval for each individual regression coefficient. Note that there are two arguments related to parallel, "parallel" and "parallel.boot": "parallel" is used for parallel foreach in the escv.glmnet; while, "paralle.boot" is used for the parallel foreach in the bootstrap replication precodure.
A list consisting of the following elements is returned.
lambda.opt |
The optimal value of lambda selected by cv/cv1se/escv. |
Beta |
An estimate of the regression coefficients. |
interval |
A 2 by p matrix containing the confidence intervals – the first row is the lower bounds of the confidence intervals for each of the coefficients and the second row is the upper bounds of the confidence intervals. |
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## residual bootstrap Lasso set.seed(0) obj <- bootLasso(x = x, y = y, B = 10) # confidence interval obj$interval sum((obj$interval[1,]<=beta) & (obj$interval[2,]>=beta)) ## using parallel in the bootstrap replication #library("doParallel") #registerDoParallel(2) #set.seed(0) #system.time(obj <- bootLasso(x = x, y = y)) #system.time(obj <- bootLasso(x = x, y = y, parallel.boot = TRUE, ncores.boot = 2))
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## residual bootstrap Lasso set.seed(0) obj <- bootLasso(x = x, y = y, B = 10) # confidence interval obj$interval sum((obj$interval[1,]<=beta) & (obj$interval[2,]>=beta)) ## using parallel in the bootstrap replication #library("doParallel") #registerDoParallel(2) #set.seed(0) #system.time(obj <- bootLasso(x = x, y = y)) #system.time(obj <- bootLasso(x = x, y = y, parallel.boot = TRUE, ncores.boot = 2))
Does residual (or paired) bootstrap Lasso+OLS and produces confidence intervals for regression coefficients.
bootLassoOLS(x, y, B = 500, type.boot = "residual", alpha = 0.05, OLS = TRUE, cv.method = "cv", nfolds = 10, foldid, cv.OLS = TRUE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, parallel.boot = FALSE, ncores.boot = 1, ...)
bootLassoOLS(x, y, B = 500, type.boot = "residual", alpha = 0.05, OLS = TRUE, cv.method = "cv", nfolds = 10, foldid, cv.OLS = TRUE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, parallel.boot = FALSE, ncores.boot = 1, ...)
x |
Input matrix as in glmnet, of dimension nobs x nvars; each row is an observation vector. |
y |
Response variable. |
B |
Number of replications in the bootstrap – default is 500. |
type.boot |
Bootstrap method which can take one of the following two values: "residual" or "paired". The default is "residual". |
alpha |
Significance level – default is 0.05. |
OLS |
If TRUE, this function runs residual (or paired) bootstrap Lasso+OLS; if FALSE, it runs residual (or paired) bootstrap Lasso. The default value is TRUE. |
cv.method |
The method used to select lambda in the Lasso+OLS – can be cv, cv1se, and escv; the default is cv. |
nfolds , foldid , cv.OLS , tau , parallel
|
Arguments that can be passed to escv.glmnet. |
standardize |
Logical flag for x variable standardization, prior to fitting the model. Default is standardize=TRUE. |
intercept |
Should intercept be fitted (default is TRUE) or set to zero (FALSE). |
parallel.boot |
If TRUE, use parallel foreach to run the bootstrap replication. Must register parallel before hand, such as doParallel or others. See the example below. |
ncores.boot |
Number of cores used in the bootstrap replication. |
... |
Other arguments that can be passed to glmnet. |
The function runs residual (type.boot="residual") or paired (type.boot="paired") bootstrap Lasso+OLS (if OLS=TRUE) procedure, and produces confidence interval for each individual regression coefficient. When the argument OLS=FALSE, it is the same as the function bootLasso, which runs residual (type.boot="residual") or paired (type.boot="paired") bootstrap Lasso procedure. Note that there are two arguments related to parallel, "parallel" and "parallel.boot": "parallel" is used for parallel foreach in the escv.glmnet; while, "paralle.boot" is used for the parallel foreach in the bootstrap replication precodure.
A list consisting of the following elements is returned.
lambda.opt |
The optimal value of lambda selected by cv/cv1se/escv. |
Beta.Lasso |
Lasso estimate of the regression coefficients. |
Beta.LassoOLS |
Lasso+OLS estimate of the regression coefficients. It gives back to Lasso estimate if the argument OLS=FALSE. |
interval.Lasso |
A 2 by p matrix containing the bootstrap Lasso confidence intervals – the first row is the lower bounds of the confidence intervals for each of the coefficients and the second row is the upper bounds of the confidence intervals. |
interval.LassoOLS |
A 2 by p matrix containing the bootstrap Lasso+OLS confidence intervals – the first row is the lower bounds of the confidence intervals for each of the coefficients and the second row is the upper bounds of the confidence intervals. It equals interval.Lasso if the argument OLS=FALSE. |
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## residual bootstrap Lasso+OLS set.seed(0) obj <- bootLassoOLS(x = x, y = y, B = 10) # confidence interval obj$interval sum((obj$interval[1,]<=beta) & (obj$interval[2,]>=beta)) ## using parallel in the bootstrap replication #library("doParallel") #registerDoParallel(2) #set.seed(0) #system.time(obj <- bootLassoOLS(x = x, y = y)) #system.time(obj <- bootLassoOLS(x = x, y = y, parallel.boot = TRUE, ncores.boot = 2))
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## residual bootstrap Lasso+OLS set.seed(0) obj <- bootLassoOLS(x = x, y = y, B = 10) # confidence interval obj$interval sum((obj$interval[1,]<=beta) & (obj$interval[2,]>=beta)) ## using parallel in the bootstrap replication #library("doParallel") #registerDoParallel(2) #set.seed(0) #system.time(obj <- bootLassoOLS(x = x, y = y)) #system.time(obj <- bootLassoOLS(x = x, y = y, parallel.boot = TRUE, ncores.boot = 2))
Combines residual (or paired) bootstrap Lasso+Partial Ridge and residual (or paired) bootstrap Lasso+OLS, and produces confidence intervals for regression coefficients.
bootLOPR(x, y, lambda2, B = 500, type.boot = "residual", thres = 0.5, alpha = 0.05, OLS = TRUE, cv.method = "cv", nfolds = 10, foldid, cv.OLS = TRUE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, parallel.boot = FALSE, ncores.boot = 1, ...)
bootLOPR(x, y, lambda2, B = 500, type.boot = "residual", thres = 0.5, alpha = 0.05, OLS = TRUE, cv.method = "cv", nfolds = 10, foldid, cv.OLS = TRUE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, parallel.boot = FALSE, ncores.boot = 1, ...)
x |
Input matrix as in glmnet, of dimension nobs x nvars; each row is an observation vector. |
y |
Response variable. |
lambda2 |
Tuning parameter in the Partial Ridge. If missing, lambda2 will be set to 1/nobs, where nobs is the number of observations. |
B |
Number of replications in the bootstrap – default is 500. |
type.boot |
Bootstrap method which can take one of the following two values: "residual" or "paired". The default is residual. |
thres |
A threshold parameter. For the variables/predictors with selection probability (obtained by bootstrap) larger than thres, this function uses bootstrap Lasso+OLS (if OLS=TRUE) or bootstrap Lasso (if OLS=FALSE) to produce confidence intervals; while, for the variables/predictors with selection probability (obtained by bootstrap) smaller than thres, this function uses bootstrap Lasso+Partial Ridge to produce confidence intervals. |
alpha |
Significance level – default is 0.05. |
OLS |
If TRUE, this function uses Lasso+OLS estimator to compute the residuals for residual bootstrap Lasso+Partial Ridge; otherwise, it uses Lasso estimator to compute the residuals for residual bootstrap Lasso+Partial Ridge. The default value is TRUE. This argument can be ignored for paired bootstrap Lasso+Partial Ridge. |
cv.method |
The method used to select lambda in the Lasso – can be cv, cv1se, and escv; the default is cv. |
nfolds , foldid , cv.OLS , tau , parallel
|
Arguments that can be passed to escv.glmnet. |
standardize |
Logical flag for x variable standardization, prior to fitting the model. Default is standardize=TRUE. |
intercept |
Should intercept be fitted (default is TRUE) or set to zero (FALSE). |
parallel.boot |
If TRUE, use parallel foreach to run the bootstrap replication. Must register parallel before hand, such as doParallel or others. See the example below. |
ncores.boot |
Number of cores used in the bootstrap replication. |
... |
Other arguments that can be passed to glmnet. |
The function combines the performance of bootstrap Lasso+Partial Ridge and bootstrap Lasso+OLS (if OLS=TRUE). For "large" regression coefficient in the sense that their selection probability (obtained by bootstrap) is larger than a threshold value (thres), it uses bootstrap Lasso+OLS to produce confidence intervals which can decrease the interval length ; while, for "small" regression coefficients meaning that their selection probability (obtained by bootstrap) is smaller than a threshold value (thres), it uses bootstrap Lasso+Partial Ridge to produce confidence intervals which can guarantee coverage. Note that there are two arguments related to parallel, "parallel" and "parallel.boot": "parallel" is used for parallel foreach in the escv.glmnet; while, "paralle.boot" is used for the parallel foreach in the bootstrap replication precodure.
A list consisting of the following elements is returned.
lambda.opt |
The optimal value of lambda selected by cv/cv1se/escv. |
Beta |
Lasso+OLS (if OLS=TRUE) or Lasso (if OLS=FALSE) estimate of the regression coefficients. |
Beta.LPR |
Lasso+Partial Ridge estimate of the regression coefficients. |
interval |
A 2 by p matrix containing the bootstrap Lasso+OLS (if OLS=TRUE) or bootstrap Lasso (if OLS=FALSE) confidence intervals – the first row is the lower bounds of the confidence intervals for each of the coefficients and the second row is the upper bounds of the confidence intervals. |
interval.LPR |
A 2 by p matrix containing the bootstrap Lasso+Partial Ridge confidence intervals – the first row is the lower bounds of the confidence intervals for each of the coefficients and the second row is the upper bounds of the confidence intervals. |
interval.LOPR |
A 2 by p matrix containing the combining confidence intervals of bootstrap Lasso+Partial Ridge and bootstrap Lasso+OLS (or bootstrap Lasso if OLS=FALSE) – the first row is the lower bounds of the confidence intervals for each of the coefficients and the second row is the upper bounds of the confidence intervals. |
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## residual bootstrap Lasso OLS + Partial Ridge set.seed(0) obj <- bootLOPR(x = x, y = y, B = 10) # confidence interval obj$interval sum((obj$interval[1,]<=beta) & (obj$interval[2,]>=beta)) ## using parallel in the bootstrap replication #library("doParallel") #registerDoParallel(2) #set.seed(0) #system.time(obj <- bootLOPR(x = x, y = y)) #system.time(obj <- bootLOPR(x = x, y = y, parallel.boot = TRUE, ncores.boot = 2))
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## residual bootstrap Lasso OLS + Partial Ridge set.seed(0) obj <- bootLOPR(x = x, y = y, B = 10) # confidence interval obj$interval sum((obj$interval[1,]<=beta) & (obj$interval[2,]>=beta)) ## using parallel in the bootstrap replication #library("doParallel") #registerDoParallel(2) #set.seed(0) #system.time(obj <- bootLOPR(x = x, y = y)) #system.time(obj <- bootLOPR(x = x, y = y, parallel.boot = TRUE, ncores.boot = 2))
Does residual (or paired) bootstrap Lasso+Partial Ridge and produces confidence intervals for regression coefficients.
bootLPR(x, y, lambda2, B = 500, type.boot = "paired", alpha = 0.05, OLS = TRUE, cv.method = "cv", nfolds = 10, foldid, cv.OLS = TRUE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, parallel.boot = FALSE, ncores.boot = 1, ...)
bootLPR(x, y, lambda2, B = 500, type.boot = "paired", alpha = 0.05, OLS = TRUE, cv.method = "cv", nfolds = 10, foldid, cv.OLS = TRUE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, parallel.boot = FALSE, ncores.boot = 1, ...)
x |
Input matrix as in glmnet, of dimension nobs x nvars; each row is an observation vector. |
y |
Response variable. |
lambda2 |
Tuning parameter in the Partial Ridge. If missing, lambda2 will be set to 1/nobs, where nobs is the number of observations. |
B |
Number of replications in the bootstrap – default is 500. |
type.boot |
Bootstrap method which can take one of the following two values: "residual" or "paired". The default is residual. |
alpha |
Significance level – default is 0.05. |
OLS |
If TRUE, this function uses Lasso+OLS estimator to compute the residuals for residual bootstrap Lasso+Partial Ridge; otherwise, it uses Lasso estimator to compute the residuals for residual bootstrap Lasso+Partial Ridge. The default value is TRUE. This argument can be ignored for paired bootstrap Lasso+Partial Ridge. |
cv.method |
The method used to select lambda in the Lasso – can be cv, cv1se, and escv; the default is cv. |
nfolds , foldid , cv.OLS , tau , parallel
|
Arguments that can be passed to escv.glmnet. |
standardize |
Logical flag for x variable standardization, prior to fitting the model. Default is standardize=TRUE. |
intercept |
Should intercept be fitted (default is TRUE) or set to zero (FALSE). |
parallel.boot |
If TRUE, use parallel foreach to run the bootstrap replication. Must register parallel before hand, such as doParallel or others. See the example below. |
ncores.boot |
Number of cores used in the bootstrap replication. |
... |
Other arguments that can be passed to glmnet. |
The function runs residual (type.boot="residual") or paired (type.boot="paired") bootstrap Lasso+Partial Ridge procedure, and produces confidence interval for each individual regression coefficient. If the argument OLS=TRUE, it uses Lasso+OLS estimator to compute the residuals for residual bootstrap Lasso+Partial Ridge; otherwise, it uses Lasso estimator to compute the residuals. Note that there are two arguments related to parallel, "parallel" and "parallel.boot": "parallel" is used for parallel foreach in the escv.glmnet; while, "paralle.boot" is used for the parallel foreach in the bootstrap replication precodure.
A list consisting of the following elements is returned.
lambda.opt |
The optimal value of lambda selected by cv/cv1se/escv. |
Beta |
Lasso+OLS (if OLS=TRUE) or Lasso (if OLS=FALSE) estimate of the regression coefficients. |
Beta.LPR |
Lasso+Partial Ridge estimate of the regression coefficients. |
interval |
A 2 by p matrix containing the bootstrap Lasso+OLS (if OLS=TRUE) or bootstrap Lasso (if OLS=FALSE) confidence intervals – the first row is the lower bounds of the confidence intervals for each of the coefficients and the second row is the upper bounds of the confidence intervals. |
interval.LPR |
A 2 by p matrix containing the bootstrap Lasso+Partial Ridge confidence intervals – the first row is the lower bounds of the confidence intervals for each of the coefficients and the second row is the upper bounds of the confidence intervals. |
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## residual bootstrap Lasso+Partial Ridge set.seed(0) obj <- bootLPR(x = x, y = y, type.boot = "residual", B = 10) # confidence interval obj$interval sum((obj$interval[1,]<=beta) & (obj$interval[2,]>=beta)) ## using parallel in the bootstrap replication #library("doParallel") #registerDoParallel(2) #set.seed(0) #system.time(obj <- bootLPR(x = x, y = y)) #system.time(obj <- bootLPR(x = x, y = y, parallel.boot = TRUE, ncores.boot = 2))
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## residual bootstrap Lasso+Partial Ridge set.seed(0) obj <- bootLPR(x = x, y = y, type.boot = "residual", B = 10) # confidence interval obj$interval sum((obj$interval[1,]<=beta) & (obj$interval[2,]>=beta)) ## using parallel in the bootstrap replication #library("doParallel") #registerDoParallel(2) #set.seed(0) #system.time(obj <- bootLPR(x = x, y = y)) #system.time(obj <- bootLPR(x = x, y = y, parallel.boot = TRUE, ncores.boot = 2))
Gets bootstrap confidence intervals for each of the coefficients of variables/predictors.
ci(Beta, Beta_bootstrap, alpha = 0.05, type = c("basic", "quantile", "bca", "basic2"), a, Beta2)
ci(Beta, Beta_bootstrap, alpha = 0.05, type = c("basic", "quantile", "bca", "basic2"), a, Beta2)
Beta |
An estimate of the coefficients. |
Beta_bootstrap |
Bootstrap estimates of the coefficients – a B by p matrix, where B is the number of replications in the bootstrap and p is number of variables/predictors. |
alpha |
Significance level – default is 0.05. |
type |
Different type of confidence interval: basic, quantile, bca (adjusted bootstrap confidence intervals) and basic2 (a modified basic confidence interval). |
a |
Parameter in the bca confidence interval. |
Beta2 |
An estimator of the coefficients used in the basic2 confidence interval. |
interval |
A 2 by p matrix containing the confidence intervals – the first row is the lower bounds of the confidence intervals for each of the coefficients and the second row is the upper bounds of the confidence intervals. |
Does k-fold estimation stability with cross-validation (escv) for glmnet and returns optimal values for lambda.
escv.glmnet(x, y, lambda = NULL, nfolds = 10, foldid, cv.OLS = FALSE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, ...)
escv.glmnet(x, y, lambda = NULL, nfolds = 10, foldid, cv.OLS = FALSE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, ...)
x |
Input matrix as in glmnet, of dimension nobs x nvars; each row is an observation vector. Can be in sparse matrix format (inherit from class "sparseMatrix" as in package Matrix). |
y |
Response variable. |
lambda |
Optional user-supplied lambda sequence for the Lasso; default is NULL, and glmnet chooses its own sequence. |
nfolds |
Number of folds - default is 10. |
foldid |
An optional vector of values between 1 and nfold identifying what fold each observation is in. If supplied, nfolds can be missing. |
cv.OLS |
If TRUE, uses two-stage estimator Lasso+OLS in the fits (using Lasso to select variables/predictors and then using OLS to refit the coefficients for the selected variables/predictors. The default value is FALSE. |
tau |
Tuning parameter in modified Least Squares (mls). Default value is 0, which corresponds to OLS. |
parallel |
If TRUE, use parallel foreach to fit each fold. Must register parallel before hand, such as doParallel or others. See the example below. |
standardize |
Logical flag for x variable standardization, prior to fitting the model sequence. Default is standardize=TRUE. |
intercept |
Should intercept be fitted (default is TRUE) or set to zero (FALSE). |
... |
Other arguments that can be passed to glmnet. |
The function is similar to cv.glmnet, and returns the values of lambda selected by cross-validation (cv), by cross-validation within 1 standard error (cv1se) and by estimation stability with cross-validation (escv). The function runs glmnet nfolds+1 times; the first to get the lambda sequence, and then the remainder to compute the first stage fit (i.e., Lasso) with each of the folds omitted. The error (cv and also es) is accumulated, and the average error and standard deviation over the folds is computed. Note that, similar to cv.glmnet, the results of escv.glmnet are random, since the folds are selected at random. Users can reduce this randomness by running escv.glmnet many times, and averaging the error curves.
A list consisting of the following elements is returned.
lambda |
The values of lambda used in the fits. |
glmnet.fit |
A fitted glmnet object for the full data. |
cv |
The mean cross-validated error - a vector of length length(lambda). |
cv.error |
Estimate of standard error of cv. |
es |
The mean estimation stability (es) value - a vector of length length(lambda). |
es.error |
Estimate of standard error of es. |
lambda.cv |
Value of lambda that gives minimum cv. |
lambda.cv1se |
Largest value of lambda such that cross-validated error is within 1 standard error of the minimum. |
lambda.escv |
Value of lambda selected by escv – giving the minimum es within the range of lambdas which are no less than lambda.cv. |
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## escv without parallel # using Lasso+OLS in the cv fit. set.seed(0) obj <- escv.glmnet(x, y, cv.OLS = TRUE) # using Lasso in the cv fit. set.seed(0) obj <- escv.glmnet(x, y) ## escv with parallel #library("doParallel") #library("doRNG") #registerDoParallel(2) # using Lasso+OLS in the cv fit. #registerDoRNG(seed = 0) #obj <- escv.glmnet(x, y, cv.OLS = TRUE, nfolds = 4, parallel = TRUE) # using Lasso in the cv fit. #registerDoRNG(seed = 0) #obj <- escv.glmnet(x, y, parallel = TRUE)
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## escv without parallel # using Lasso+OLS in the cv fit. set.seed(0) obj <- escv.glmnet(x, y, cv.OLS = TRUE) # using Lasso in the cv fit. set.seed(0) obj <- escv.glmnet(x, y) ## escv with parallel #library("doParallel") #library("doRNG") #registerDoParallel(2) # using Lasso+OLS in the cv fit. #registerDoRNG(seed = 0) #obj <- escv.glmnet(x, y, cv.OLS = TRUE, nfolds = 4, parallel = TRUE) # using Lasso in the cv fit. #registerDoRNG(seed = 0) #obj <- escv.glmnet(x, y, parallel = TRUE)
Gets Lasso estimator for a given value of lambda or for the value of lambda choosing by cross-validation (or escv).
Lasso(x, y, lambda = NULL, fix.lambda = TRUE, cv.method = "cv", nfolds = 10, foldid, cv.OLS = FALSE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE , ...)
Lasso(x, y, lambda = NULL, fix.lambda = TRUE, cv.method = "cv", nfolds = 10, foldid, cv.OLS = FALSE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE , ...)
x |
Input matrix as in glmnet, of dimension nobs x nvars; each row is an observation vector. |
y |
Response variable. |
lambda |
A value of lambda - default is NULL. lambda should be given a value when fix.lambda=TRUE. |
fix.lambda |
If TRUE, computes Lasso+OLS (or Lasso) for a fix value of lambda given by the argument "lambda"; otherwise, computes Lasso+OLS (or Lasso) for the value of lambda choosing by cv/cv1se/escv. |
cv.method |
The method used to select lambda – can be cv, cv1se, and escv; the default is cv. cv.method is useful only when fix.lambda=FALSE. |
nfolds , foldid , cv.OLS , tau , parallel
|
Arguments that can be passed to escv.glmnet (useful only when fix.lambda=FALSE). |
standardize |
Logical flag for x variable standardization, prior to fitting the model. Default is standardize=TRUE. |
intercept |
Should intercept be fitted (default is TRUE) or set to zero (FALSE). |
... |
Other arguments that can be passed to glmnet. |
The function computes the Lasso estimator for a give value of lambda (if fix.lambda=TRUE) or for the value of lambda choosing by cv/cv1se/escv (if fix.lambda=FALSE).
A list consisting of the following elements is returned.
beta |
The Lasso estimate for the coefficients of variables/predictors. |
beta0 |
A value of intercept term. |
lambda |
The value/values of lambda. |
meanx |
The mean vector of variables/predictors if intercept=TRUE, otherwise is a vector of 0's. |
mu |
The mean of the response if intercept=TRUE, otherwise is 0. |
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## Lasso estimator # for a given value of lambda set.seed(0) obj.escv <- escv.glmnet(x, y) obj <- Lasso(x, y, lambda = obj.escv$lambda.cv) # Lasso estimate of the regression coefficients obj$beta # intercept term obj$beta0 # prediction mypredict(obj, newx = matrix(rnorm(10*p), 10, p)) # for lambda choosing by cross-validation (cv) which uses Lasso in the cv fit set.seed(0) obj <- Lasso(x, y, fix.lambda = FALSE) # for lambda choosing by cross-validation (cv) which uses Lasso+OLS in the cv fit set.seed(0) obj <- Lasso(x, y, fix.lambda = FALSE, cv.OLS = TRUE)
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## Lasso estimator # for a given value of lambda set.seed(0) obj.escv <- escv.glmnet(x, y) obj <- Lasso(x, y, lambda = obj.escv$lambda.cv) # Lasso estimate of the regression coefficients obj$beta # intercept term obj$beta0 # prediction mypredict(obj, newx = matrix(rnorm(10*p), 10, p)) # for lambda choosing by cross-validation (cv) which uses Lasso in the cv fit set.seed(0) obj <- Lasso(x, y, fix.lambda = FALSE) # for lambda choosing by cross-validation (cv) which uses Lasso+OLS in the cv fit set.seed(0) obj <- Lasso(x, y, fix.lambda = FALSE, cv.OLS = TRUE)
Computes the two-stage estimator Lasso+OLS (default) or the Lasso estimator (if OLS=FALSE).
LassoOLS(x, y, OLS = TRUE, lambda = NULL, fix.lambda = TRUE, cv.method = "cv", nfolds = 10, foldid, cv.OLS = TRUE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, ...)
LassoOLS(x, y, OLS = TRUE, lambda = NULL, fix.lambda = TRUE, cv.method = "cv", nfolds = 10, foldid, cv.OLS = TRUE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, ...)
x |
Input matrix as in glmnet, of dimension nobs x nvars; each row is an observation vector. |
y |
Response variable. |
OLS |
If TRUE, computes Lasso+OLS; otherwise, computes Lasso estimator. The default is TRUE. |
lambda |
A value of lambda - default is NULL. lambda should be given a value when fix.lambda=TRUE. |
fix.lambda |
If TRUE, computes Lasso+OLS (or Lasso) for a fix value of lambda given by the argument "lambda"; otherwise, computes Lasso+OLS (or Lasso) for the value of lambda choosing by cv/cv1se/escv. |
cv.method |
The method used to select lambda – can be cv, cv1se, and escv; the default is cv. cv.method is useful only when fix.lambda=FALSE. |
nfolds , foldid , cv.OLS , tau , parallel
|
Arguments that can be passed to escv.glmnet (useful only when fix.lambda=FALSE). Note that, the default value of cv.OLS is TRUE, which means using Lasso+OLS in the cv fits. |
standardize |
Logical flag for x variable standardization, prior to fitting the model. Default is standardize=TRUE. |
intercept |
Should intercept be fitted (default is TRUE) or set to zero (FALSE). |
... |
Other arguments that can be passed to glmnet. |
If OLS=TRUE (default), this function computes the Lasso+OLS estimator for a give value of lambda (if fix.lambda=TRUE) or for the value of lambda choosing by cv/cv1se/escv (if fix.lambda=FALSE). If OLS=FALSE, this function computes the Lasso estimator in the same way as the function "Lasso". Note that, we use the easy-to-understand notation "Lasso+OLS" denoting the "Lasso+mLS" estimator defined in the paper: Liu H, Yu B. Asymptotic Properties of Lasso+mLS and Lasso+Ridge in Sparse High-dimensional Linear Regression. Electronic Journal of Statistics, 2013, 7.
A list consisting of the following elements is returned.
beta |
The Lasso+OLS (or Lasso when OLS=FALSE) estimate for the coefficients of variables/predictors. |
beta0 |
A value of intercept term. |
lambda |
The value/values of lambda. |
meanx |
The mean vector of variables/predictors if intercept=TRUE, otherwise is a vector of 0's. |
mu |
The mean of the response if intercept=TRUE, otherwise is 0. |
tau |
Tuning parameter in modified Least Squares (mls). |
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## Lasso+OLS estimator # for a given value of lambda set.seed(0) obj.escv <- escv.glmnet(x, y) obj <- LassoOLS(x, y, lambda = obj.escv$lambda.cv) # Lasso+OLS estimate of the regression coefficients obj$beta # intercept term obj$beta0 # prediction mypredict(obj, newx = matrix(rnorm(10*p), 10, p)) # for lambda choosing by cross-validation (cv) which uses Lasso+OLS in the cv fit set.seed(0) obj <- LassoOLS(x, y, fix.lambda = FALSE) # for lambda choosing by cross-validation (cv) which uses Lasso in the cv fit set.seed(0) obj <- LassoOLS(x, y, fix.lambda = FALSE, cv.OLS = FALSE)
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## Lasso+OLS estimator # for a given value of lambda set.seed(0) obj.escv <- escv.glmnet(x, y) obj <- LassoOLS(x, y, lambda = obj.escv$lambda.cv) # Lasso+OLS estimate of the regression coefficients obj$beta # intercept term obj$beta0 # prediction mypredict(obj, newx = matrix(rnorm(10*p), 10, p)) # for lambda choosing by cross-validation (cv) which uses Lasso+OLS in the cv fit set.seed(0) obj <- LassoOLS(x, y, fix.lambda = FALSE) # for lambda choosing by cross-validation (cv) which uses Lasso in the cv fit set.seed(0) obj <- LassoOLS(x, y, fix.lambda = FALSE, cv.OLS = FALSE)
Computes the two-stage estimator Lasso+Partial Ridge.
LPR(x, y, lambda = NULL, fix.lambda = TRUE, lambda2, cv.method = "cv", nfolds = 10, foldid, cv.OLS = TRUE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, ...)
LPR(x, y, lambda = NULL, fix.lambda = TRUE, lambda2, cv.method = "cv", nfolds = 10, foldid, cv.OLS = TRUE, tau = 0, parallel = FALSE, standardize = TRUE, intercept = TRUE, ...)
x |
Input matrix as in glmnet, of dimension nobs x nvars; each row is an observation vector. |
y |
Response variable. |
lambda |
lambda: A value of lambda - default is NULL. lambda should be given a value when fix.lambda=TRUE. |
fix.lambda |
If TRUE, computes Lasso+Partial Ridge estimator for a fix value of lambda given by the argument "lambda"; otherwise, computes Lasso+Partial Ridge estimator for the value of lambda choosing by cv/cv1se/escv. |
lambda2 |
Tuning parameter in the Partial Ridge. If missing, lambda2 will be set to 1/nobs, where nobs is the number of observations. |
cv.method |
The method used to select lambda – can be cv, cv1se, and escv; the default is cv. cv.method is useful only when fix.lambda=FALSE. |
nfolds , foldid , cv.OLS , tau , parallel
|
Arguments that can be passed to escv.glmnet (useful only when fix.lambda=FALSE). Note that, the default value of cv.OLS is TRUE, which means using Lasso+OLS in the cv fits. |
standardize |
Logical flag for x variable standardization, prior to fitting the model. Default is standardize=TRUE. |
intercept |
Should intercept be fitted (default is TRUE) or set to zero (FALSE). |
... |
Other arguments that can be passed to glmnet. |
This function computes the Lasso+Partial Ridge estimator for a give value of lambda (if fix.lambda=TRUE) or for the value of lambda choosing by cv/cv1se/escv (if fix.lambda=FALSE).
A list consisting of the following elements is returned.
beta |
The Lasso+Partial Ridge estimator for the coefficients of variables/predictors. |
beta0 |
A value of intercept term. |
lambda |
The value/values of lambda. |
lambda2 |
The value of lambda2. |
meanx |
The mean vector of variables/predictors if intercept=TRUE, otherwise is a vector of 0's. |
mu |
The mean of the response if intercept=TRUE, otherwise is 0. |
normx |
The vector of standard error of variables/predictors if standardize=TRUE, otherwise is a vector of 1's. |
tau |
Tuning parameter in modified Least Squares (mls). |
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## Lasso+Partial Ridge estimator # for a given value of lambda set.seed(0) obj.escv <- escv.glmnet(x, y) obj <- LPR(x, y, lambda = obj.escv$lambda.cv) # Lasso+OLS estimate of the regression coefficients obj$beta # intercept term obj$beta0 # prediction mypredict(obj, newx = matrix(rnorm(10*p), 10, p)) # for lambda choosing by cross-validation (cv) which uses Lasso+OLS in the cv fit set.seed(0) obj <- LPR(x, y, fix.lambda = FALSE) # for lambda choosing by cross-validation (cv) which uses Lasso in the cv fit set.seed(0) obj <- LPR(x, y, fix.lambda = FALSE, cv.OLS = FALSE)
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## Lasso+Partial Ridge estimator # for a given value of lambda set.seed(0) obj.escv <- escv.glmnet(x, y) obj <- LPR(x, y, lambda = obj.escv$lambda.cv) # Lasso+OLS estimate of the regression coefficients obj$beta # intercept term obj$beta0 # prediction mypredict(obj, newx = matrix(rnorm(10*p), 10, p)) # for lambda choosing by cross-validation (cv) which uses Lasso+OLS in the cv fit set.seed(0) obj <- LPR(x, y, fix.lambda = FALSE) # for lambda choosing by cross-validation (cv) which uses Lasso in the cv fit set.seed(0) obj <- LPR(x, y, fix.lambda = FALSE, cv.OLS = FALSE)
Computes modified Least Squares estimate.
mls(x, y, tau = 0, standardize = TRUE, intercept = TRUE)
mls(x, y, tau = 0, standardize = TRUE, intercept = TRUE)
x |
Input matrix as in glmnet, of dimension nobs x nvars; each row is an observation vector. |
y |
Response variable. |
tau |
Tuning parameter in modified Least Squares (mls). Default value is 0, which corresponds to Ordinary Least Squares (OLS). |
standardize |
Logical flag for x variable standardization, prior to fitting the model. Default is standardize=TRUE. |
intercept |
Should intercept be fitted (default is TRUE) or set to zero (FALSE). |
The function is used to compute the modified Least Squares (mLS) estimator defined in the paper: Liu H, Yu B. Asymptotic Properties of Lasso+mLS and Lasso+Ridge in Sparse High-dimensional Linear Regression. Electronic Journal of Statistics, 2013, 7.
A list consisting of the following elements is returned.
beta |
The mLS coefficient of variables/predictors. |
beta0 |
A value of intercept term. |
meanx |
The mean vector of variables/predictors if intercept=TRUE, otherwise is a vector of 0's. |
mu |
The mean of the response if intercept=TRUE, otherwise is 0. |
normx |
The vector of standard error of variables/predictors if standardize=TRUE, otherwise is a vector of 1's. |
tau |
The tuning parameter in mLS. |
library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## modified Least Squares set.seed(0) obj <- mls(x = x[, 1:20], y = y) # the OLS estimate of the regression coefficients obj$beta # intercept term obj$beta0 # prediction mypredict(obj, newx = matrix(rnorm(10*20), 10, 20))
library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## modified Least Squares set.seed(0) obj <- mls(x = x[, 1:20], y = y) # the OLS estimate of the regression coefficients obj$beta # intercept term obj$beta0 # prediction mypredict(obj, newx = matrix(rnorm(10*20), 10, 20))
Returns the predicted values.
mypredict(object, newx)
mypredict(object, newx)
object |
An object from mls, Lasso, LassoOLS or PartialRidge. |
newx |
Matrix of the values of variables/predictors for doing prediction; each row is an observation vector. |
The predicted values for a give newx matrix is returned.
library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## modified Least Squares set.seed(0) obj <- mls(x = x[, 1:20], y = y) # the OLS estimate of the regression coefficients obj$beta # intercept term obj$beta0 # prediction mypredict(obj, newx = matrix(rnorm(10*20), 10, 20))
library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## modified Least Squares set.seed(0) obj <- mls(x = x[, 1:20], y = y) # the OLS estimate of the regression coefficients obj$beta # intercept term obj$beta0 # prediction mypredict(obj, newx = matrix(rnorm(10*20), 10, 20))
Computes the Partial Ridge estimator.
PartRidge(x, y, lambda2 = 0, varset, standardize = TRUE, intercept = TRUE)
PartRidge(x, y, lambda2 = 0, varset, standardize = TRUE, intercept = TRUE)
x |
Input matrix as in glmnet, of dimension nobs x nvars; each row is an observation vector. |
y |
Response variable. |
lambda2 |
Tuning parameter for the Partial Ridge. The default value is 0, which gives back to the OLS estimator. |
varset |
A set indicating which variable/predictors are not penalized. Partial Ridge puts l2 penalty only on the coefficients of the variables/predictors not included in the varset. |
standardize |
Logical flag for x variable standardization, prior to fitting the model. Default is standardize=TRUE. |
intercept |
Should intercept be fitted (default is TRUE) or set to zero (FALSE). |
This function computes the Partial Ridge estimator, which adds l2 penalty only on the coefficients of variabels/predictors not included in the set varset, to the loss function (residual sum of squares).
A list consisting of the following elements is returned.
beta |
The Lasso+Partial Ridge estimator for the coefficients of variables/predictors. |
beta0 |
A value of intercept term. |
meanx |
The mean vector of variables/predictors if intercept=TRUE, otherwise is a vector of 0's. |
mu |
The mean of the response if intercept=TRUE, otherwise is 0. |
normx |
The vector of standard error of variables/predictors if standardize=TRUE, otherwise is a vector of 1's. |
lambda2 |
The value of lambda2. |
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## Partial Ridge Regression # Lasso set.seed(0) obj.escv <- escv.glmnet(x, y) obj <- Lasso(x, y, lambda = obj.escv$lambda.cv) # variable set betalasso <- obj$beta selectvar <- betalasso != 0 # partial ridge PR.obj <- PartRidge(x = x, y = y, lambda2 = 1/n, varset = selectvar) # parial ridge estimate of the regression coefficients beta <- PR.obj$beta # intercept term beta0 <- PR.obj$beta0 # prediction mypredict(PR.obj, newx = matrix(rnorm(10*p), 10, p))
library("glmnet") library("mvtnorm") ## generate the data set.seed(2015) n <- 200 # number of obs p <- 500 s <- 10 beta <- rep(0, p) beta[1:s] <- runif(s, 1/3, 1) x <- rmvnorm(n = n, mean = rep(0, p), method = "svd") signal <- sqrt(mean((x %*% beta)^2)) sigma <- as.numeric(signal / sqrt(10)) # SNR=10 y <- x %*% beta + rnorm(n) ## Partial Ridge Regression # Lasso set.seed(0) obj.escv <- escv.glmnet(x, y) obj <- Lasso(x, y, lambda = obj.escv$lambda.cv) # variable set betalasso <- obj$beta selectvar <- betalasso != 0 # partial ridge PR.obj <- PartRidge(x = x, y = y, lambda2 = 1/n, varset = selectvar) # parial ridge estimate of the regression coefficients beta <- PR.obj$beta # intercept term beta0 <- PR.obj$beta0 # prediction mypredict(PR.obj, newx = matrix(rnorm(10*p), 10, p))