SC19027 is a simple R package developed for the 'Statistical Computing' course. It includes a data generation function datageneration (generates a dataset of linear model) and a linear regression function for scaled MCP or SCAD scalescad_mcp (implements the scaled penalized methods with the MCP or SCAD penalty). The Rcpp function resultC provides the bias of estimated sigma and average model size. We can compare the results of two methods.
This package also includes all my homework answers.
The datageneration function generates a dataset of linear model.
The source R code for datageneration is as follows:
datageneration<-function(n,p,r0){ S <- matrix(0,p,p)+1*diag(p) for(i in 1:50){ for(j in 1:50){ if(i!=j) S[i,j]<-r0 } } R <- chol(S) #t(R)%*%R=S X <- scale(matrix(rnorm(n*p),n,p)%*%R)*sqrt(n-1)/sqrt(n) attributes(X) <- NULL X <- matrix(X,n,p) true_beta<-c(rep(1/sqrt(3),3),rep(0,p-3)) Y <- X%*%true_beta+rnorm(n) return(list(X=X, Y=c(Y), true_beta = true_beta)) }
The dataset generated by this function includes design matrix X,response Y and coefficient beta. X has independent and identically distributed Gaussian rows with marginal distribution N(0,1),and correlated first 50 columns with correlation coefficient r0. The first three coefficients are nonzero,1/sqrt(3). The model has a normal noise level.
Set n=200,p=2000,r0=0.5.
library(SC19027) n<-200;p<-2000 r0<-0.5 data<-datageneration(n,p,r0) X<-data$X Y<-data$Y beta<-data$true_beta round(X[1:10,1:10],3) round(Y[1:10],3) round(beta[1:10],3)
The scalescad_mcp function implements the scaled penalized methods with the MCP or SCAD penalty.This method is put forward by Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression.Biometrika, 99:879–898. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual square and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs little beyond the computation of a path or grid of the sparse regression estimator for penalty levels above a proper threshold.
The source R code for scalescad_mcp is as follows:
scalescad_mcp<-function(X,y,lam0,method){ X <- as.matrix(X) y <- as.numeric(y) nX=dim(X)[1]; pX=dim(X)[2] #calculation of gamma m1<-t(X)%*%X m<-m1-diag(diag(m1)) gamma=2/(1-max(abs(m))/nX) #iterative algorithm obj=picasso(X,y,method=method,gamma=gamma,intercept = FALSE) sigmaint=0.1; sigmanew=5; flag=0 while(abs(sigmaint-sigmanew)>0.0001 & flag <= 100){ flag=flag+1 sigmaint=sigmanew; lam=lam0*sigmaint fit<-picasso(X,y,lambda = lam,method = method,gamma=gamma,intercept = FALSE) hy=as.numeric(X %*% fit$beta[,1]) sigmanew=sqrt(mean((y-hy)^2)) } hsigma=sigmanew; hlam=lam hbeta=picasso(X,y,lambda = lam,method = method,gamma=gamma,intercept = FALSE)$beta[,1] #results return(list(hsigma=hsigma,coefficients=hbeta)) }
The input needed is design matrix X,response y,initial penalty level lam0 and the penalty method we choose. The output values include the estimated noise level and the estimated coefficients. We can compare the estimated results with the true values of sigma and beta. Let's take an example.
library(SC19027) n<-200;p<-2000 set.seed(123) data<-datageneration(n,p,0.5) X<-data$X Y<-data$Y lam0<-sqrt(2*log(p)/n) mcp<-scalescad_mcp(X,Y,lam0,method="mcp") sigma<-mcp$hsigma beta<-mcp$coefficients print(sigma) print(beta[1:10])
The Rcpp function resultC gives the bias of estimated sigma and average model size for the estimated linear model.
The source C++ code for resultC is as follows:
NumericVector resultC(double hsigma, NumericVector hbeta) { NumericVector out(2); out[0]=hsigma/1-1; int n=hbeta.size(); int k=0; for(int i = 0; i < n; ++i) { if(hbeta[i]>0) k=k+1; } out[1]=k; return out; }
According to this function,we can compare two scaled methods.
n<-200;p<-2000 set.seed(124) data<-datageneration(n,p,0) X<-data$X Y<-data$Y lam0<-sqrt(2*log(p)/n) mcp<-scalescad_mcp(X,Y,lam0,method="mcp") scad<-scalescad_mcp(X,Y,lam0,method="scad") r1<-resultC(mcp$hsigma,mcp$coefficients) r2<-resultC(scad$hsigma,scad$coefficients) a<-as.matrix(rbind(r1,r2),ncol=2) colnames(a)<-c("bias","AMS") rownames(a)<-c("scaled MCP","scaled SCAD") a
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.